DPDK patches and discussions
* [PATCH 00/27] net/mlx5: HW steering PMD update
@ 2022-09-23 14:43 Suanming Mou
  2022-09-23 14:43 ` [PATCH 01/27] net/mlx5: fix invalid flow attributes Suanming Mou
                   ` (31 more replies)
  0 siblings, 32 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-23 14:43 UTC (permalink / raw)
  Cc: dev

The skeleton of mlx5 HW steering (HWS) was merged upstream quite a
long time ago, but it has not been updated since then because the
low-level steering layer code was missing. Luckily, better late than
never, the steering layer finally arrives[1].

This series will add more features to the existing PMD code:
 - FDB and metadata copy.
 - Modify field.
 - Meter.
 - Counter.
 - Aging.
 - Action template pre-parser optimization.
 - Connection tracking.


Some features, such as meter/aging/CT, touch the public API, and the
public API changes were sent to the ML much earlier in separate threads
so that they would not be swallowed by this big series.

The patches this series depends on are listed below:
 [1]https://patches.dpdk.org/project/dpdk/cover/20220922190345.394-1-valex@nvidia.com/
 [2]https://patches.dpdk.org/project/dpdk/cover/20220921021133.2982954-1-akozyrev@nvidia.com/
 [3]https://patches.dpdk.org/project/dpdk/cover/20220921145409.511328-1-michaelba@nvidia.com/
 [4]https://patches.dpdk.org/project/dpdk/patch/20220920071036.20878-1-suanmingm@nvidia.com/
 [5]https://patches.dpdk.org/project/dpdk/patch/20220920071141.21769-1-suanmingm@nvidia.com/
 [6]https://patches.dpdk.org/project/dpdk/patch/20220921143202.1790802-1-dsosnowski@nvidia.com/


Alexander Kozyrev (7):
  ethdev: add meter profiles/policies config
  net/mlx5: add HW steering meter action
  net/mlx5: add meter color flow matching in dv
  net/mlx5: add meter color flow matching in hws
  net/mlx5: implement profile/policy get
  net/mlx5: implement METER MARK action for HWS
  net/mlx5: implement METER MARK indirect action for HWS

Bing Zhao (3):
  net/mlx5: enable mark flag for all ports in the same domain
  net/mlx5: add extended metadata mode for hardware steering
  net/mlx5: add support for ASO return register

Dariusz Sosnowski (5):
  net/mlx5: validate modify field action template
  net/mlx5: create port actions
  net/mlx5: support DR action template API
  net/mlx5: add pattern and table attribute validation
  net/mlx5: add meta item support in egress

Gregory Etelson (1):
  net/mlx5: add HW steering VLAN push, pop and VID modify flow actions

Suanming Mou (8):
  net/mlx5: fix invalid flow attributes
  net/mlx5: fix IPv6 and TCP RSS hash fields
  net/mlx5: add shared header reformat support
  net/mlx5: add modify field hws support
  net/mlx5: support caching queue action
  net/mlx5: fix indirect action validate
  lib/ethdev: add connection tracking configuration
  net/mlx5: add HW steering connection tracking support

Xiaoyu Min (3):
  net/mlx5: add HW steering counter action
  net/mlx5: update indirect actions ops to HW variation
  net/mlx5: support indirect count action for HW steering

 doc/guides/nics/mlx5.rst             |    9 +
 drivers/common/mlx5/mlx5_devx_cmds.c |   50 +
 drivers/common/mlx5/mlx5_devx_cmds.h |   27 +
 drivers/common/mlx5/mlx5_prm.h       |   21 +-
 drivers/common/mlx5/version.map      |    1 +
 drivers/net/mlx5/linux/mlx5_os.c     |   41 +-
 drivers/net/mlx5/meson.build         |    1 +
 drivers/net/mlx5/mlx5.c              |   39 +-
 drivers/net/mlx5/mlx5.h              |  160 +-
 drivers/net/mlx5/mlx5_defs.h         |    2 +
 drivers/net/mlx5/mlx5_flow.c         |  247 +-
 drivers/net/mlx5/mlx5_flow.h         |  265 +-
 drivers/net/mlx5/mlx5_flow_aso.c     |  313 +-
 drivers/net/mlx5/mlx5_flow_dv.c      |  840 ++--
 drivers/net/mlx5/mlx5_flow_hw.c      | 5434 +++++++++++++++++++++++---
 drivers/net/mlx5/mlx5_flow_meter.c   |  967 ++++-
 drivers/net/mlx5/mlx5_flow_verbs.c   |    4 +-
 drivers/net/mlx5/mlx5_hws_cnt.c      |  523 +++
 drivers/net/mlx5/mlx5_hws_cnt.h      |  558 +++
 drivers/net/mlx5/mlx5_trigger.c      |   80 +-
 lib/ethdev/rte_flow.h                |   27 +-
 21 files changed, 8556 insertions(+), 1053 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_hws_cnt.c
 create mode 100644 drivers/net/mlx5/mlx5_hws_cnt.h

-- 
2.25.1



* [PATCH 01/27] net/mlx5: fix invalid flow attributes
  2022-09-23 14:43 [PATCH 00/27] net/mlx5: HW steering PMD update Suanming Mou
@ 2022-09-23 14:43 ` Suanming Mou
  2022-09-23 14:43 ` [PATCH 02/27] net/mlx5: fix IPv6 and TCP RSS hash fields Suanming Mou
                   ` (30 subsequent siblings)
  31 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-23 14:43 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev

In the function flow_get_drv_type(), attr is dereferenced in non-HWS
mode. If the user calls an HWS API while the PMD runs in SWS mode, the
HWS wrapper functions must pass a valid attr instead of NULL, otherwise
the dereference causes a crash.
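
A minimal illustration of the failing call path (not part of the patch;
the port number and devargs value are placeholders): the device is
probed in SWS mode, yet the application calls one of the template/async
API entry points, e.g. rte_flow_info_get(), whose mlx5 wrapper used to
pass attr == NULL into flow_get_drv_type().

    /* Device probed in SWS mode, e.g. with devargs "dv_flow_en=1". */
    uint16_t port_id = 0;
    struct rte_flow_port_info info;
    struct rte_flow_queue_info queue_info;
    struct rte_flow_error err;

    /*
     * Routed to mlx5_flow_info_get(); before this fix the wrapper
     * called flow_get_drv_type(dev, NULL), which crashed in SWS mode.
     */
    rte_flow_info_get(port_id, &info, &queue_info, &err);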

Fixes: 572801ab860f ("ethdev: backport upstream rte_flow_async codes")

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/net/mlx5/mlx5_flow.c | 38 ++++++++++++++++++++++++------------
 1 file changed, 26 insertions(+), 12 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 45109001ca..3abb39aa92 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -3740,6 +3740,8 @@ flow_get_drv_type(struct rte_eth_dev *dev, const struct rte_flow_attr *attr)
 	 */
 	if (priv->sh->config.dv_flow_en == 2)
 		return MLX5_FLOW_TYPE_HW;
+	if (!attr)
+		return MLX5_FLOW_TYPE_MIN;
 	/* If no OS specific type - continue with DV/VERBS selection */
 	if (attr->transfer && priv->sh->config.dv_esw_en)
 		type = MLX5_FLOW_TYPE_DV;
@@ -8252,8 +8254,9 @@ mlx5_flow_info_get(struct rte_eth_dev *dev,
 		   struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8287,8 +8290,9 @@ mlx5_flow_port_configure(struct rte_eth_dev *dev,
 			 struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8319,8 +8323,9 @@ mlx5_flow_pattern_template_create(struct rte_eth_dev *dev,
 		struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW) {
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW) {
 		rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8350,8 +8355,9 @@ mlx5_flow_pattern_template_destroy(struct rte_eth_dev *dev,
 				   struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8385,8 +8391,9 @@ mlx5_flow_actions_template_create(struct rte_eth_dev *dev,
 			struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW) {
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW) {
 		rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8416,8 +8423,9 @@ mlx5_flow_actions_template_destroy(struct rte_eth_dev *dev,
 				   struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8457,8 +8465,9 @@ mlx5_flow_table_create(struct rte_eth_dev *dev,
 		       struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW) {
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW) {
 		rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8494,8 +8503,9 @@ mlx5_flow_table_destroy(struct rte_eth_dev *dev,
 			struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8542,8 +8552,9 @@ mlx5_flow_async_flow_create(struct rte_eth_dev *dev,
 			    struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW) {
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW) {
 		rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8585,8 +8596,9 @@ mlx5_flow_async_flow_destroy(struct rte_eth_dev *dev,
 			     struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8621,8 +8633,9 @@ mlx5_flow_pull(struct rte_eth_dev *dev,
 	       struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8650,8 +8663,9 @@ mlx5_flow_push(struct rte_eth_dev *dev,
 	       struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
-- 
2.25.1



* [PATCH 02/27] net/mlx5: fix IPv6 and TCP RSS hash fields
  2022-09-23 14:43 [PATCH 00/27] net/mlx5: HW steering PMD update Suanming Mou
  2022-09-23 14:43 ` [PATCH 01/27] net/mlx5: fix invalid flow attributes Suanming Mou
@ 2022-09-23 14:43 ` Suanming Mou
  2022-09-23 14:43 ` [PATCH 03/27] net/mlx5: add shared header reformat support Suanming Mou
                   ` (29 subsequent siblings)
  31 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-23 14:43 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev

In the flow_dv_hashfields_set() function, when item_flags was 0, the
code always took the first if branch and the else branches never had a
chance to be evaluated. As a result, the IPv6 and TCP hash fields
handled in those else branches were never set.

This commit adds a dedicated HW steering hash field set function to
generate the RSS hash fields.
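
A self-contained illustration of the short-circuit described above (the
flag values below are placeholders, not the real MLX5_FLOW_LAYER_*
definitions): with items == 0 the trailing "|| !items" makes the first
branch win unconditionally, so the IPv6/TCP branches are dead code.

    #include <stdint.h>
    #include <stdio.h>

    #define L3_IPV4 (1u << 0) /* stands in for MLX5_FLOW_LAYER_OUTER_L3_IPV4 */
    #define L3_IPV6 (1u << 1) /* stands in for MLX5_FLOW_LAYER_OUTER_L3_IPV6 */

    int main(void)
    {
        uint64_t items = 0; /* HWS shared RSS passes no item flags */

        if ((items & L3_IPV4) || !items)
            puts("IPv4 hash fields selected"); /* always taken */
        else if ((items & L3_IPV6) || !items)
            puts("IPv6 hash fields selected"); /* never reached */
        return 0;
    }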

Fixes: 6540da0b93b5 ("net/mlx5: fix RSS scaling issue")
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/net/mlx5/mlx5_flow_dv.c | 12 +++----
 drivers/net/mlx5/mlx5_flow_hw.c | 59 ++++++++++++++++++++++++++++++++-
 2 files changed, 62 insertions(+), 9 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 885b4c5588..3e5e6781bf 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -11302,8 +11302,7 @@ flow_dv_hashfields_set(uint64_t item_flags,
 		rss_inner = 1;
 #endif
 	if ((rss_inner && (items & MLX5_FLOW_LAYER_INNER_L3_IPV4)) ||
-	    (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L3_IPV4)) ||
-	     !items) {
+	    (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L3_IPV4))) {
 		if (rss_types & MLX5_IPV4_LAYER_TYPES) {
 			if (rss_types & RTE_ETH_RSS_L3_SRC_ONLY)
 				fields |= IBV_RX_HASH_SRC_IPV4;
@@ -11313,8 +11312,7 @@ flow_dv_hashfields_set(uint64_t item_flags,
 				fields |= MLX5_IPV4_IBV_RX_HASH;
 		}
 	} else if ((rss_inner && (items & MLX5_FLOW_LAYER_INNER_L3_IPV6)) ||
-		   (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L3_IPV6)) ||
-		   !items) {
+		   (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L3_IPV6))) {
 		if (rss_types & MLX5_IPV6_LAYER_TYPES) {
 			if (rss_types & RTE_ETH_RSS_L3_SRC_ONLY)
 				fields |= IBV_RX_HASH_SRC_IPV6;
@@ -11337,8 +11335,7 @@ flow_dv_hashfields_set(uint64_t item_flags,
 		return;
 	}
 	if ((rss_inner && (items & MLX5_FLOW_LAYER_INNER_L4_UDP)) ||
-	    (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L4_UDP)) ||
-	    !items) {
+	    (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L4_UDP))) {
 		if (rss_types & RTE_ETH_RSS_UDP) {
 			if (rss_types & RTE_ETH_RSS_L4_SRC_ONLY)
 				fields |= IBV_RX_HASH_SRC_PORT_UDP;
@@ -11348,8 +11345,7 @@ flow_dv_hashfields_set(uint64_t item_flags,
 				fields |= MLX5_UDP_IBV_RX_HASH;
 		}
 	} else if ((rss_inner && (items & MLX5_FLOW_LAYER_INNER_L4_TCP)) ||
-		   (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L4_TCP)) ||
-		   !items) {
+		   (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L4_TCP))) {
 		if (rss_types & RTE_ETH_RSS_TCP) {
 			if (rss_types & RTE_ETH_RSS_L4_SRC_ONLY)
 				fields |= IBV_RX_HASH_SRC_PORT_TCP;
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 7343d59f1f..46c4169b4f 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -62,6 +62,63 @@ flow_hw_rxq_flag_set(struct rte_eth_dev *dev, bool enable)
 	priv->mark_enabled = enable;
 }
 
+/**
+ * Set the hash fields according to the @p rss_desc information.
+ *
+ * @param[in] rss_desc
+ *   Pointer to the mlx5_flow_rss_desc.
+ * @param[out] hash_fields
+ *   Pointer to the RSS hash fields.
+ */
+static void
+flow_hw_hashfields_set(struct mlx5_flow_rss_desc *rss_desc,
+		       uint64_t *hash_fields)
+{
+	uint64_t fields = 0;
+	int rss_inner = 0;
+	uint64_t rss_types = rte_eth_rss_hf_refine(rss_desc->types);
+
+#ifdef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
+	if (rss_desc->level >= 2)
+		rss_inner = 1;
+#endif
+	if (rss_types & MLX5_IPV4_LAYER_TYPES) {
+		if (rss_types & RTE_ETH_RSS_L3_SRC_ONLY)
+			fields |= IBV_RX_HASH_SRC_IPV4;
+		else if (rss_types & RTE_ETH_RSS_L3_DST_ONLY)
+			fields |= IBV_RX_HASH_DST_IPV4;
+		else
+			fields |= MLX5_IPV4_IBV_RX_HASH;
+	} else if (rss_types & MLX5_IPV6_LAYER_TYPES) {
+		if (rss_types & RTE_ETH_RSS_L3_SRC_ONLY)
+			fields |= IBV_RX_HASH_SRC_IPV6;
+		else if (rss_types & RTE_ETH_RSS_L3_DST_ONLY)
+			fields |= IBV_RX_HASH_DST_IPV6;
+		else
+			fields |= MLX5_IPV6_IBV_RX_HASH;
+	}
+	if (rss_types & RTE_ETH_RSS_UDP) {
+		if (rss_types & RTE_ETH_RSS_L4_SRC_ONLY)
+			fields |= IBV_RX_HASH_SRC_PORT_UDP;
+		else if (rss_types & RTE_ETH_RSS_L4_DST_ONLY)
+			fields |= IBV_RX_HASH_DST_PORT_UDP;
+		else
+			fields |= MLX5_UDP_IBV_RX_HASH;
+	} else if (rss_types & RTE_ETH_RSS_TCP) {
+		if (rss_types & RTE_ETH_RSS_L4_SRC_ONLY)
+			fields |= IBV_RX_HASH_SRC_PORT_TCP;
+		else if (rss_types & RTE_ETH_RSS_L4_DST_ONLY)
+			fields |= IBV_RX_HASH_DST_PORT_TCP;
+		else
+			fields |= MLX5_TCP_IBV_RX_HASH;
+	}
+	if (rss_types & RTE_ETH_RSS_ESP)
+		fields |= IBV_RX_HASH_IPSEC_SPI;
+	if (rss_inner)
+		fields |= IBV_RX_HASH_INNER;
+	*hash_fields = fields;
+}
+
 /**
  * Generate the pattern item flags.
  * Will be used for shared RSS action.
@@ -225,7 +282,7 @@ flow_hw_tir_action_register(struct rte_eth_dev *dev,
 		       MLX5_RSS_HASH_KEY_LEN);
 		rss_desc.key_len = MLX5_RSS_HASH_KEY_LEN;
 		rss_desc.types = !rss->types ? RTE_ETH_RSS_IP : rss->types;
-		flow_dv_hashfields_set(0, &rss_desc, &rss_desc.hash_fields);
+		flow_hw_hashfields_set(&rss_desc, &rss_desc.hash_fields);
 		flow_dv_action_rss_l34_hash_adjust(rss->types,
 						   &rss_desc.hash_fields);
 		if (rss->level > 1) {
-- 
2.25.1



* [PATCH 03/27] net/mlx5: add shared header reformat support
  2022-09-23 14:43 [PATCH 00/27] net/mlx5: HW steering PMD update Suanming Mou
  2022-09-23 14:43 ` [PATCH 01/27] net/mlx5: fix invalid flow attributes Suanming Mou
  2022-09-23 14:43 ` [PATCH 02/27] net/mlx5: fix IPv6 and TCP RSS hash fields Suanming Mou
@ 2022-09-23 14:43 ` Suanming Mou
  2022-09-23 14:43 ` [PATCH 04/27] net/mlx5: add modify field hws support Suanming Mou
                   ` (28 subsequent siblings)
  31 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-23 14:43 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev

As the rte_flow_async API defines, an action whose mask field value is
non-zero is used as a shared action by all the flows in the table.

A header reformat action whose action mask field is non-zero is
therefore created as a constant, shared action. For the encapsulation
header reformat action there are two kinds of encapsulation data:
raw_encap_data and rte_flow_item encap_data. For both kinds, the action
mask conf indicates whether the data is constant or not.

Examples:
1. VXLAN encap (encap_data: rte_flow_item)
	action conf (eth/ipv4/udp/vxlan_hdr)

	a. action mask conf (eth/ipv4/udp/vxlan_hdr)
	  - items are constant.
	b. action mask conf (NULL)
	  - items will change.

2. RAW encap (encap_data: raw)
	action conf (raw_data)

	a. action mask conf (not NULL)
	  - encap_data constant.
	b. action mask conf (NULL)
	  - encap_data will change.
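
A minimal sketch of case 1 from the application side (illustration
only, not part of the patch; the item spec contents are omitted). The
same actions array is paired with one of two masks arrays for
rte_flow_actions_template_create(), and the mask conf alone decides
whether the encapsulation data is constant (shared) or per-flow:

    #include <rte_flow.h>

    static struct rte_flow_item_eth eth;
    static struct rte_flow_item_ipv4 ipv4;
    static struct rte_flow_item_udp udp;
    static struct rte_flow_item_vxlan vxlan;

    static struct rte_flow_item encap_items[] = {
        { .type = RTE_FLOW_ITEM_TYPE_ETH, .spec = &eth },
        { .type = RTE_FLOW_ITEM_TYPE_IPV4, .spec = &ipv4 },
        { .type = RTE_FLOW_ITEM_TYPE_UDP, .spec = &udp },
        { .type = RTE_FLOW_ITEM_TYPE_VXLAN, .spec = &vxlan },
        { .type = RTE_FLOW_ITEM_TYPE_END },
    };
    static const struct rte_flow_action_vxlan_encap encap = {
        .definition = encap_items,
    };
    static const struct rte_flow_action actions[] = {
        { .type = RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP, .conf = &encap },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };
    /* 1.a: non-NULL mask conf -> constant, shared encap header. */
    static const struct rte_flow_action masks_shared[] = {
        { .type = RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP, .conf = &encap },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };
    /* 1.b: NULL mask conf -> encap items provided per flow rule. */
    static const struct rte_flow_action masks_dynamic[] = {
        { .type = RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP, .conf = NULL },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };

Either masks array would then be passed together with actions to
rte_flow_actions_template_create().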

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/net/mlx5/mlx5_flow.h    |   6 +-
 drivers/net/mlx5/mlx5_flow_hw.c | 124 ++++++++++----------------------
 2 files changed, 39 insertions(+), 91 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 1ad75fc8c6..74cb1cd235 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1065,10 +1065,6 @@ struct mlx5_action_construct_data {
 	uint16_t action_dst; /* mlx5dr_rule_action dst offset. */
 	union {
 		struct {
-			/* encap src(item) offset. */
-			uint16_t src;
-			/* encap dst data offset. */
-			uint16_t dst;
 			/* encap data len. */
 			uint16_t len;
 		} encap;
@@ -1111,6 +1107,8 @@ struct mlx5_hw_jump_action {
 /* Encap decap action struct. */
 struct mlx5_hw_encap_decap_action {
 	struct mlx5dr_action *action; /* Action object. */
+	/* Is header_reformat action shared across flows in table. */
+	bool shared;
 	size_t data_size; /* Action metadata size. */
 	uint8_t data[]; /* Action data. */
 };
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 46c4169b4f..b6978bd051 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -402,10 +402,6 @@ __flow_hw_act_data_general_append(struct mlx5_priv *priv,
  *   Offset of source rte flow action.
  * @param[in] action_dst
  *   Offset of destination DR action.
- * @param[in] encap_src
- *   Offset of source encap raw data.
- * @param[in] encap_dst
- *   Offset of destination encap raw data.
  * @param[in] len
  *   Length of the data to be updated.
  *
@@ -418,16 +414,12 @@ __flow_hw_act_data_encap_append(struct mlx5_priv *priv,
 				enum rte_flow_action_type type,
 				uint16_t action_src,
 				uint16_t action_dst,
-				uint16_t encap_src,
-				uint16_t encap_dst,
 				uint16_t len)
 {	struct mlx5_action_construct_data *act_data;
 
 	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
 	if (!act_data)
 		return -1;
-	act_data->encap.src = encap_src;
-	act_data->encap.dst = encap_dst;
 	act_data->encap.len = len;
 	LIST_INSERT_HEAD(&acts->act_list, act_data, next);
 	return 0;
@@ -523,53 +515,6 @@ flow_hw_shared_action_translate(struct rte_eth_dev *dev,
 	return 0;
 }
 
-/**
- * Translate encap items to encapsulation list.
- *
- * @param[in] dev
- *   Pointer to the rte_eth_dev data structure.
- * @param[in] acts
- *   Pointer to the template HW steering DR actions.
- * @param[in] type
- *   Action type.
- * @param[in] action_src
- *   Offset of source rte flow action.
- * @param[in] action_dst
- *   Offset of destination DR action.
- * @param[in] items
- *   Encap item pattern.
- * @param[in] items_m
- *   Encap item mask indicates which part are constant and dynamic.
- *
- * @return
- *    0 on success, negative value otherwise and rte_errno is set.
- */
-static __rte_always_inline int
-flow_hw_encap_item_translate(struct rte_eth_dev *dev,
-			     struct mlx5_hw_actions *acts,
-			     enum rte_flow_action_type type,
-			     uint16_t action_src,
-			     uint16_t action_dst,
-			     const struct rte_flow_item *items,
-			     const struct rte_flow_item *items_m)
-{
-	struct mlx5_priv *priv = dev->data->dev_private;
-	size_t len, total_len = 0;
-	uint32_t i = 0;
-
-	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++, items_m++, i++) {
-		len = flow_dv_get_item_hdr_len(items->type);
-		if ((!items_m->spec ||
-		    memcmp(items_m->spec, items->spec, len)) &&
-		    __flow_hw_act_data_encap_append(priv, acts, type,
-						    action_src, action_dst, i,
-						    total_len, len))
-			return -1;
-		total_len += len;
-	}
-	return 0;
-}
-
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -611,7 +556,7 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 	const struct rte_flow_action_raw_encap *raw_encap_data;
 	const struct rte_flow_item *enc_item = NULL, *enc_item_m = NULL;
 	uint16_t reformat_pos = MLX5_HW_MAX_ACTS, reformat_src = 0;
-	uint8_t *encap_data = NULL;
+	uint8_t *encap_data = NULL, *encap_data_m = NULL;
 	size_t data_size = 0;
 	bool actions_end = false;
 	uint32_t type, i;
@@ -718,9 +663,9 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			MLX5_ASSERT(reformat_pos == MLX5_HW_MAX_ACTS);
 			enc_item = ((const struct rte_flow_action_vxlan_encap *)
 				   actions->conf)->definition;
-			enc_item_m =
-				((const struct rte_flow_action_vxlan_encap *)
-				 masks->conf)->definition;
+			if (masks->conf)
+				enc_item_m = ((const struct rte_flow_action_vxlan_encap *)
+					     masks->conf)->definition;
 			reformat_pos = i++;
 			reformat_src = actions - action_start;
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_L2_TO_TNL_L2;
@@ -729,9 +674,9 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			MLX5_ASSERT(reformat_pos == MLX5_HW_MAX_ACTS);
 			enc_item = ((const struct rte_flow_action_nvgre_encap *)
 				   actions->conf)->definition;
-			enc_item_m =
-				((const struct rte_flow_action_nvgre_encap *)
-				actions->conf)->definition;
+			if (masks->conf)
+				enc_item_m = ((const struct rte_flow_action_nvgre_encap *)
+					     masks->conf)->definition;
 			reformat_pos = i++;
 			reformat_src = actions - action_start;
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_L2_TO_TNL_L2;
@@ -743,6 +688,11 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_TNL_L2_TO_L2;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
+			raw_encap_data =
+				(const struct rte_flow_action_raw_encap *)
+				 masks->conf;
+			if (raw_encap_data)
+				encap_data_m = raw_encap_data->data;
 			raw_encap_data =
 				(const struct rte_flow_action_raw_encap *)
 				 actions->conf;
@@ -773,22 +723,17 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 	}
 	if (reformat_pos != MLX5_HW_MAX_ACTS) {
 		uint8_t buf[MLX5_ENCAP_MAX_LEN];
+		bool shared_rfmt = true;
 
 		if (enc_item) {
 			MLX5_ASSERT(!encap_data);
-			if (flow_dv_convert_encap_data
-				(enc_item, buf, &data_size, error) ||
-			    flow_hw_encap_item_translate
-				(dev, acts, (action_start + reformat_src)->type,
-				 reformat_src, reformat_pos,
-				 enc_item, enc_item_m))
+			if (flow_dv_convert_encap_data(enc_item, buf, &data_size, error))
 				goto err;
 			encap_data = buf;
-		} else if (encap_data && __flow_hw_act_data_encap_append
-				(priv, acts,
-				 (action_start + reformat_src)->type,
-				 reformat_src, reformat_pos, 0, 0, data_size)) {
-			goto err;
+			if (!enc_item_m)
+				shared_rfmt = false;
+		} else if (encap_data && !encap_data_m) {
+			shared_rfmt = false;
 		}
 		acts->encap_decap = mlx5_malloc(MLX5_MEM_ZERO,
 				    sizeof(*acts->encap_decap) + data_size,
@@ -802,12 +747,22 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 		acts->encap_decap->action = mlx5dr_action_create_reformat
 				(priv->dr_ctx, refmt_type,
 				 data_size, encap_data,
-				 rte_log2_u32(table_attr->nb_flows),
-				 mlx5_hw_act_flag[!!attr->group][type]);
+				 shared_rfmt ? 0 : rte_log2_u32(table_attr->nb_flows),
+				 mlx5_hw_act_flag[!!attr->group][type] |
+				 (shared_rfmt ? MLX5DR_ACTION_FLAG_SHARED : 0));
 		if (!acts->encap_decap->action)
 			goto err;
 		acts->rule_acts[reformat_pos].action =
 						acts->encap_decap->action;
+		acts->rule_acts[reformat_pos].reformat.data =
+						acts->encap_decap->data;
+		if (shared_rfmt)
+			acts->rule_acts[reformat_pos].reformat.offset = 0;
+		else if (__flow_hw_act_data_encap_append(priv, acts,
+				 (action_start + reformat_src)->type,
+				 reformat_src, reformat_pos, data_size))
+			goto err;
+		acts->encap_decap->shared = shared_rfmt;
 		acts->encap_decap_pos = reformat_pos;
 	}
 	acts->acts_num = i;
@@ -972,6 +927,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			.ingress = 1,
 	};
 	uint32_t ft_flag;
+	size_t encap_len = 0;
 
 	memcpy(rule_acts, hw_acts->rule_acts,
 	       sizeof(*rule_acts) * hw_acts->acts_num);
@@ -989,9 +945,6 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	} else {
 		attr.ingress = 1;
 	}
-	if (hw_acts->encap_decap && hw_acts->encap_decap->data_size)
-		memcpy(buf, hw_acts->encap_decap->data,
-		       hw_acts->encap_decap->data_size);
 	LIST_FOREACH(act_data, &hw_acts->act_list, next) {
 		uint32_t jump_group;
 		uint32_t tag;
@@ -1050,23 +1003,20 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
 			enc_item = ((const struct rte_flow_action_vxlan_encap *)
 				   action->conf)->definition;
-			rte_memcpy((void *)&buf[act_data->encap.dst],
-				   enc_item[act_data->encap.src].spec,
-				   act_data->encap.len);
+			if (flow_dv_convert_encap_data(enc_item, buf, &encap_len, NULL))
+				return -1;
 			break;
 		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
 			enc_item = ((const struct rte_flow_action_nvgre_encap *)
 				   action->conf)->definition;
-			rte_memcpy((void *)&buf[act_data->encap.dst],
-				   enc_item[act_data->encap.src].spec,
-				   act_data->encap.len);
+			if (flow_dv_convert_encap_data(enc_item, buf, &encap_len, NULL))
+				return -1;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
 			raw_encap_data =
 				(const struct rte_flow_action_raw_encap *)
 				 action->conf;
-			rte_memcpy((void *)&buf[act_data->encap.dst],
-				   raw_encap_data->data, act_data->encap.len);
+			rte_memcpy((void *)buf, raw_encap_data->data, act_data->encap.len);
 			MLX5_ASSERT(raw_encap_data->size ==
 				    act_data->encap.len);
 			break;
@@ -1074,7 +1024,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			break;
 		}
 	}
-	if (hw_acts->encap_decap) {
+	if (hw_acts->encap_decap && !hw_acts->encap_decap->shared) {
 		rule_acts[hw_acts->encap_decap_pos].reformat.offset =
 				job->flow->idx - 1;
 		rule_acts[hw_acts->encap_decap_pos].reformat.data = buf;
-- 
2.25.1



* [PATCH 04/27] net/mlx5: add modify field hws support
  2022-09-23 14:43 [PATCH 00/27] net/mlx5: HW steering PMD update Suanming Mou
                   ` (2 preceding siblings ...)
  2022-09-23 14:43 ` [PATCH 03/27] net/mlx5: add shared header reformat support Suanming Mou
@ 2022-09-23 14:43 ` Suanming Mou
  2022-09-23 14:43 ` [PATCH 05/27] net/mlx5: validate modify field action template Suanming Mou
                   ` (27 subsequent siblings)
  31 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-23 14:43 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, Dariusz Sosnowski

This patch introduces support for modify_field rte_flow actions in HWS
mode. Support includes:

- Ingress and egress domains,
- SET and ADD operations,
- usage of arbitrary bit offsets and widths for packet and metadata
  fields.

Support is implemented in two phases:

1. On flow table creation the hardware commands are generated, based
   on rte_flow action templates, and stored alongside the action
   template.
2. On flow rule creation/queueing the hardware commands are updated
   with values provided by the user. Any masks over immediate values,
   provided in action templates, are applied to these values before
   enqueueing rules for creation.
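
A minimal sketch of an actions template entry using a masked immediate
value (illustration only; the TTL value and mask contents are
arbitrary). A non-zero src.value in the mask marks the immediate as
constant, so the modification commands can be fully precompiled in
phase 1; a zeroed mask conf defers the value to phase 2:

    #include <stdint.h>
    #include <rte_flow.h>

    static const struct rte_flow_action_modify_field set_ttl = {
        .operation = RTE_FLOW_MODIFY_SET,
        .dst = { .field = RTE_FLOW_FIELD_IPV4_TTL },
        .src = { .field = RTE_FLOW_FIELD_VALUE, .value = { 64 } },
        .width = 8,
    };
    /* Fully masked immediate: 64 is shared by all flows in the table. */
    static const struct rte_flow_action_modify_field set_ttl_mask = {
        .operation = RTE_FLOW_MODIFY_SET,
        .dst = { .field = RTE_FLOW_FIELD_IPV4_TTL },
        .src = { .field = RTE_FLOW_FIELD_VALUE,
                 .value = { 0xff, 0xff, 0xff, 0xff } },
        .width = UINT32_MAX,
    };
    static const struct rte_flow_action actions[] = {
        { .type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD, .conf = &set_ttl },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };
    static const struct rte_flow_action masks[] = {
        { .type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD, .conf = &set_ttl_mask },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };

Both arrays would then be passed to rte_flow_actions_template_create()
when building the table's action templates.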

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/common/mlx5/mlx5_prm.h   |   1 +
 drivers/net/mlx5/linux/mlx5_os.c |  18 +-
 drivers/net/mlx5/mlx5.h          |   1 +
 drivers/net/mlx5/mlx5_flow.h     |  96 ++++++
 drivers/net/mlx5/mlx5_flow_dv.c  | 538 ++++++++++++++++---------------
 drivers/net/mlx5/mlx5_flow_hw.c  | 445 ++++++++++++++++++++++++-
 6 files changed, 825 insertions(+), 274 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index b5624e7cd1..628bae72b2 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -751,6 +751,7 @@ enum mlx5_modification_field {
 	MLX5_MODI_IN_TCP_ACK_NUM = 0x5C,
 	MLX5_MODI_GTP_TEID = 0x6E,
 	MLX5_MODI_OUT_IP_ECN = 0x73,
+	MLX5_MODI_TUNNEL_HDR_DW_1 = 0x75,
 };
 
 /* Total number of metadata reg_c's. */
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 6906914ba8..1877b6bec8 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1539,6 +1539,15 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 				       mlx5_hrxq_clone_free_cb);
 	if (!priv->hrxqs)
 		goto error;
+	mlx5_set_metadata_mask(eth_dev);
+	if (sh->config.dv_xmeta_en != MLX5_XMETA_MODE_LEGACY &&
+	    !priv->sh->dv_regc0_mask) {
+		DRV_LOG(ERR, "metadata mode %u is not supported "
+			     "(no metadata reg_c[0] is available)",
+			     sh->config.dv_xmeta_en);
+			err = ENOTSUP;
+			goto error;
+	}
 	rte_rwlock_init(&priv->ind_tbls_lock);
 	if (priv->vport_meta_mask)
 		flow_hw_set_port_info(eth_dev);
@@ -1560,15 +1569,6 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 		err = -err;
 		goto error;
 	}
-	mlx5_set_metadata_mask(eth_dev);
-	if (sh->config.dv_xmeta_en != MLX5_XMETA_MODE_LEGACY &&
-	    !priv->sh->dv_regc0_mask) {
-		DRV_LOG(ERR, "metadata mode %u is not supported "
-			     "(no metadata reg_c[0] is available)",
-			     sh->config.dv_xmeta_en);
-			err = ENOTSUP;
-			goto error;
-	}
 	/* Query availability of metadata reg_c's. */
 	if (!priv->sh->metadata_regc_check_flag) {
 		err = mlx5_flow_discover_mreg_c(eth_dev);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 48ae2244da..f3bd45d4c5 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -343,6 +343,7 @@ struct mlx5_hw_q_job {
 	struct rte_flow_hw *flow; /* Flow attached to the job. */
 	void *user_data; /* Job user data. */
 	uint8_t *encap_data; /* Encap data. */
+	struct mlx5_modification_cmd *mhdr_cmd;
 };
 
 /* HW steering job descriptor LIFO pool. */
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 74cb1cd235..a7235b524d 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1008,6 +1008,51 @@ flow_items_to_tunnel(const struct rte_flow_item items[])
 	return items[0].spec;
 }
 
+/**
+ * Fetch 1, 2, 3 or 4 byte field from the byte array
+ * and return as unsigned integer in host-endian format.
+ *
+ * @param[in] data
+ *   Pointer to data array.
+ * @param[in] size
+ *   Size of field to extract.
+ *
+ * @return
+ *   converted field in host endian format.
+ */
+static inline uint32_t
+flow_dv_fetch_field(const uint8_t *data, uint32_t size)
+{
+	uint32_t ret;
+
+	switch (size) {
+	case 1:
+		ret = *data;
+		break;
+	case 2:
+		ret = rte_be_to_cpu_16(*(const unaligned_uint16_t *)data);
+		break;
+	case 3:
+		ret = rte_be_to_cpu_16(*(const unaligned_uint16_t *)data);
+		ret = (ret << 8) | *(data + sizeof(uint16_t));
+		break;
+	case 4:
+		ret = rte_be_to_cpu_32(*(const unaligned_uint32_t *)data);
+		break;
+	default:
+		MLX5_ASSERT(false);
+		ret = 0;
+		break;
+	}
+	return ret;
+}
+
+struct field_modify_info {
+	uint32_t size; /* Size of field in protocol header, in bytes. */
+	uint32_t offset; /* Offset of field in protocol header, in bytes. */
+	enum mlx5_modification_field id;
+};
+
 /* HW steering flow attributes. */
 struct mlx5_flow_attr {
 	uint32_t port_id; /* Port index. */
@@ -1068,6 +1113,29 @@ struct mlx5_action_construct_data {
 			/* encap data len. */
 			uint16_t len;
 		} encap;
+		struct {
+			/* Modify header action offset in pattern. */
+			uint16_t mhdr_cmds_off;
+			/* Offset in pattern after modify header actions. */
+			uint16_t mhdr_cmds_end;
+			/*
+			 * True if this action is masked and does not need to
+			 * be generated.
+			 */
+			bool shared;
+			/*
+			 * Modified field definitions in dst field (SET, ADD)
+			 * or src field (COPY).
+			 */
+			struct field_modify_info field[MLX5_ACT_MAX_MOD_FIELDS];
+			/* Modified field definitions in dst field (COPY). */
+			struct field_modify_info dcopy[MLX5_ACT_MAX_MOD_FIELDS];
+			/*
+			 * Masks applied to field values to generate
+			 * PRM actions.
+			 */
+			uint32_t mask[MLX5_ACT_MAX_MOD_FIELDS];
+		} modify_header;
 		struct {
 			uint64_t types; /* RSS hash types. */
 			uint32_t level; /* RSS level. */
@@ -1093,6 +1161,7 @@ struct rte_flow_actions_template {
 	struct rte_flow_actions_template_attr attr;
 	struct rte_flow_action *actions; /* Cached flow actions. */
 	struct rte_flow_action *masks; /* Cached action masks.*/
+	uint16_t mhdr_off; /* Offset of DR modify header action. */
 	uint32_t refcnt; /* Reference counter. */
 };
 
@@ -1113,6 +1182,22 @@ struct mlx5_hw_encap_decap_action {
 	uint8_t data[]; /* Action data. */
 };
 
+#define MLX5_MHDR_MAX_CMD ((MLX5_MAX_MODIFY_NUM) * 2 + 1)
+
+/* Modify field action struct. */
+struct mlx5_hw_modify_header_action {
+	/* Reference to DR action */
+	struct mlx5dr_action *action;
+	/* Modify header action position in action rule table. */
+	uint16_t pos;
+	/* Is MODIFY_HEADER action shared across flows in table. */
+	bool shared;
+	/* Amount of modification commands stored in the precompiled buffer. */
+	uint32_t mhdr_cmds_num;
+	/* Precompiled modification commands. */
+	struct mlx5_modification_cmd mhdr_cmds[MLX5_MHDR_MAX_CMD];
+};
+
 /* The maximum actions support in the flow. */
 #define MLX5_HW_MAX_ACTS 16
 
@@ -1122,6 +1207,7 @@ struct mlx5_hw_actions {
 	LIST_HEAD(act_list, mlx5_action_construct_data) act_list;
 	struct mlx5_hw_jump_action *jump; /* Jump action. */
 	struct mlx5_hrxq *tir; /* TIR action. */
+	struct mlx5_hw_modify_header_action *mhdr; /* Modify header action. */
 	/* Encap/Decap action. */
 	struct mlx5_hw_encap_decap_action *encap_decap;
 	uint16_t encap_decap_pos; /* Encap/Decap action position. */
@@ -2200,6 +2286,16 @@ int flow_dv_action_query(struct rte_eth_dev *dev,
 size_t flow_dv_get_item_hdr_len(const enum rte_flow_item_type item_type);
 int flow_dv_convert_encap_data(const struct rte_flow_item *items, uint8_t *buf,
 			   size_t *size, struct rte_flow_error *error);
+void mlx5_flow_field_id_to_modify_info
+		(const struct rte_flow_action_modify_data *data,
+		 struct field_modify_info *info, uint32_t *mask,
+		 uint32_t width, struct rte_eth_dev *dev,
+		 const struct rte_flow_attr *attr, struct rte_flow_error *error);
+int flow_dv_convert_modify_action(struct rte_flow_item *item,
+			      struct field_modify_info *field,
+			      struct field_modify_info *dcopy,
+			      struct mlx5_flow_dv_modify_hdr_resource *resource,
+			      uint32_t type, struct rte_flow_error *error);
 
 #define MLX5_PF_VPORT_ID 0
 #define MLX5_ECPF_VPORT_ID 0xFFFE
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 3e5e6781bf..5d3e2d37bb 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -241,12 +241,6 @@ rte_col_2_mlx5_col(enum rte_color rcol)
 	return MLX5_FLOW_COLOR_UNDEFINED;
 }
 
-struct field_modify_info {
-	uint32_t size; /* Size of field in protocol header, in bytes. */
-	uint32_t offset; /* Offset of field in protocol header, in bytes. */
-	enum mlx5_modification_field id;
-};
-
 struct field_modify_info modify_eth[] = {
 	{4,  0, MLX5_MODI_OUT_DMAC_47_16},
 	{2,  4, MLX5_MODI_OUT_DMAC_15_0},
@@ -379,45 +373,6 @@ mlx5_update_vlan_vid_pcp(const struct rte_flow_action *action,
 	}
 }
 
-/**
- * Fetch 1, 2, 3 or 4 byte field from the byte array
- * and return as unsigned integer in host-endian format.
- *
- * @param[in] data
- *   Pointer to data array.
- * @param[in] size
- *   Size of field to extract.
- *
- * @return
- *   converted field in host endian format.
- */
-static inline uint32_t
-flow_dv_fetch_field(const uint8_t *data, uint32_t size)
-{
-	uint32_t ret;
-
-	switch (size) {
-	case 1:
-		ret = *data;
-		break;
-	case 2:
-		ret = rte_be_to_cpu_16(*(const unaligned_uint16_t *)data);
-		break;
-	case 3:
-		ret = rte_be_to_cpu_16(*(const unaligned_uint16_t *)data);
-		ret = (ret << 8) | *(data + sizeof(uint16_t));
-		break;
-	case 4:
-		ret = rte_be_to_cpu_32(*(const unaligned_uint32_t *)data);
-		break;
-	default:
-		MLX5_ASSERT(false);
-		ret = 0;
-		break;
-	}
-	return ret;
-}
-
 /**
  * Convert modify-header action to DV specification.
  *
@@ -446,7 +401,7 @@ flow_dv_fetch_field(const uint8_t *data, uint32_t size)
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
-static int
+int
 flow_dv_convert_modify_action(struct rte_flow_item *item,
 			      struct field_modify_info *field,
 			      struct field_modify_info *dcopy,
@@ -1464,7 +1419,32 @@ mlx5_flow_item_field_width(struct rte_eth_dev *dev,
 	return 0;
 }
 
-static void
+static __rte_always_inline uint8_t
+flow_modify_info_mask_8(uint32_t length, uint32_t off)
+{
+	return (0xffu >> (8 - length)) << off;
+}
+
+static __rte_always_inline uint16_t
+flow_modify_info_mask_16(uint32_t length, uint32_t off)
+{
+	return rte_cpu_to_be_16((0xffffu >> (16 - length)) << off);
+}
+
+static __rte_always_inline uint32_t
+flow_modify_info_mask_32(uint32_t length, uint32_t off)
+{
+	return rte_cpu_to_be_32((0xffffffffu >> (32 - length)) << off);
+}
+
+static __rte_always_inline uint32_t
+flow_modify_info_mask_32_masked(uint32_t length, uint32_t off, uint32_t post_mask)
+{
+	uint32_t mask = (0xffffffffu >> (32 - length)) << off;
+	return rte_cpu_to_be_32(mask & post_mask);
+}
+
+void
 mlx5_flow_field_id_to_modify_info
 		(const struct rte_flow_action_modify_data *data,
 		 struct field_modify_info *info, uint32_t *mask,
@@ -1473,323 +1453,340 @@ mlx5_flow_field_id_to_modify_info
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	uint32_t idx = 0;
-	uint32_t off = 0;
-
-	switch (data->field) {
+	uint32_t off_be = 0;
+	uint32_t length = 0;
+	switch ((int)data->field) {
 	case RTE_FLOW_FIELD_START:
 		/* not supported yet */
 		MLX5_ASSERT(false);
 		break;
 	case RTE_FLOW_FIELD_MAC_DST:
-		off = data->offset > 16 ? data->offset - 16 : 0;
-		if (mask) {
-			if (data->offset < 16) {
-				info[idx] = (struct field_modify_info){2, 4,
-						MLX5_MODI_OUT_DMAC_15_0};
-				if (width < 16) {
-					mask[1] = rte_cpu_to_be_16(0xffff >>
-								 (16 - width));
-					width = 0;
-				} else {
-					mask[1] = RTE_BE16(0xffff);
-					width -= 16;
-				}
-				if (!width)
-					break;
-				++idx;
-			}
-			info[idx] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_DMAC_47_16};
-			mask[0] = rte_cpu_to_be_32((0xffffffff >>
-						    (32 - width)) << off);
+		MLX5_ASSERT(data->offset + width <= 48);
+		off_be = 48 - (data->offset + width);
+		if (off_be < 16) {
+			info[idx] = (struct field_modify_info){2, 4,
+					MLX5_MODI_OUT_DMAC_15_0};
+			length = off_be + width <= 16 ? width : 16 - off_be;
+			if (mask)
+				mask[1] = flow_modify_info_mask_16(length,
+								   off_be);
+			else
+				info[idx].offset = off_be;
+			width -= length;
+			if (!width)
+				break;
+			off_be = 0;
+			idx++;
 		} else {
-			if (data->offset < 16)
-				info[idx++] = (struct field_modify_info){2, 0,
-						MLX5_MODI_OUT_DMAC_15_0};
-			info[idx] = (struct field_modify_info){4, off,
-						MLX5_MODI_OUT_DMAC_47_16};
+			off_be -= 16;
 		}
+		info[idx] = (struct field_modify_info){4, 0,
+				MLX5_MODI_OUT_DMAC_47_16};
+		if (mask)
+			mask[0] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_MAC_SRC:
-		off = data->offset > 16 ? data->offset - 16 : 0;
-		if (mask) {
-			if (data->offset < 16) {
-				info[idx] = (struct field_modify_info){2, 4,
-						MLX5_MODI_OUT_SMAC_15_0};
-				if (width < 16) {
-					mask[1] = rte_cpu_to_be_16(0xffff >>
-								 (16 - width));
-					width = 0;
-				} else {
-					mask[1] = RTE_BE16(0xffff);
-					width -= 16;
-				}
-				if (!width)
-					break;
-				++idx;
-			}
-			info[idx] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_SMAC_47_16};
-			mask[0] = rte_cpu_to_be_32((0xffffffff >>
-						    (32 - width)) << off);
+		MLX5_ASSERT(data->offset + width <= 48);
+		off_be = 48 - (data->offset + width);
+		if (off_be < 16) {
+			info[idx] = (struct field_modify_info){2, 4,
+					MLX5_MODI_OUT_SMAC_15_0};
+			length = off_be + width <= 16 ? width : 16 - off_be;
+			if (mask)
+				mask[1] = flow_modify_info_mask_16(length,
+								   off_be);
+			else
+				info[idx].offset = off_be;
+			width -= length;
+			if (!width)
+				break;
+			off_be = 0;
+			idx++;
 		} else {
-			if (data->offset < 16)
-				info[idx++] = (struct field_modify_info){2, 0,
-						MLX5_MODI_OUT_SMAC_15_0};
-			info[idx] = (struct field_modify_info){4, off,
-						MLX5_MODI_OUT_SMAC_47_16};
+			off_be -= 16;
 		}
+		info[idx] = (struct field_modify_info){4, 0,
+				MLX5_MODI_OUT_SMAC_47_16};
+		if (mask)
+			mask[0] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_VLAN_TYPE:
 		/* not supported yet */
 		break;
 	case RTE_FLOW_FIELD_VLAN_ID:
+		MLX5_ASSERT(data->offset + width <= 12);
+		off_be = 12 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_FIRST_VID};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0x0fff >> (12 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_MAC_TYPE:
+		MLX5_ASSERT(data->offset + width <= 16);
+		off_be = 16 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_ETHERTYPE};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0xffff >> (16 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_IPV4_DSCP:
+		MLX5_ASSERT(data->offset + width <= 6);
+		off_be = 6 - (data->offset + width);
 		info[idx] = (struct field_modify_info){1, 0,
 					MLX5_MODI_OUT_IP_DSCP};
 		if (mask)
-			mask[idx] = 0x3f >> (6 - width);
+			mask[idx] = flow_modify_info_mask_8(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_IPV4_TTL:
+		MLX5_ASSERT(data->offset + width <= 8);
+		off_be = 8 - (data->offset + width);
 		info[idx] = (struct field_modify_info){1, 0,
 					MLX5_MODI_OUT_IPV4_TTL};
 		if (mask)
-			mask[idx] = 0xff >> (8 - width);
+			mask[idx] = flow_modify_info_mask_8(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_IPV4_SRC:
+		MLX5_ASSERT(data->offset + width <= 32);
+		off_be = 32 - (data->offset + width);
 		info[idx] = (struct field_modify_info){4, 0,
 					MLX5_MODI_OUT_SIPV4};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_32(0xffffffff >>
-						     (32 - width));
+			mask[idx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_IPV4_DST:
+		MLX5_ASSERT(data->offset + width <= 32);
+		off_be = 32 - (data->offset + width);
 		info[idx] = (struct field_modify_info){4, 0,
 					MLX5_MODI_OUT_DIPV4};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_32(0xffffffff >>
-						     (32 - width));
+			mask[idx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_IPV6_DSCP:
+		MLX5_ASSERT(data->offset + width <= 6);
+		off_be = 6 - (data->offset + width);
 		info[idx] = (struct field_modify_info){1, 0,
 					MLX5_MODI_OUT_IP_DSCP};
 		if (mask)
-			mask[idx] = 0x3f >> (6 - width);
+			mask[idx] = flow_modify_info_mask_8(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_IPV6_HOPLIMIT:
+		MLX5_ASSERT(data->offset + width <= 8);
+		off_be = 8 - (data->offset + width);
 		info[idx] = (struct field_modify_info){1, 0,
 					MLX5_MODI_OUT_IPV6_HOPLIMIT};
 		if (mask)
-			mask[idx] = 0xff >> (8 - width);
+			mask[idx] = flow_modify_info_mask_8(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
-	case RTE_FLOW_FIELD_IPV6_SRC:
-		if (mask) {
-			if (data->offset < 32) {
-				info[idx] = (struct field_modify_info){4, 12,
-						MLX5_MODI_OUT_SIPV6_31_0};
-				if (width < 32) {
-					mask[3] =
-						rte_cpu_to_be_32(0xffffffff >>
-								 (32 - width));
-					width = 0;
-				} else {
-					mask[3] = RTE_BE32(0xffffffff);
-					width -= 32;
-				}
-				if (!width)
-					break;
-				++idx;
-			}
-			if (data->offset < 64) {
-				info[idx] = (struct field_modify_info){4, 8,
-						MLX5_MODI_OUT_SIPV6_63_32};
-				if (width < 32) {
-					mask[2] =
-						rte_cpu_to_be_32(0xffffffff >>
-								 (32 - width));
-					width = 0;
-				} else {
-					mask[2] = RTE_BE32(0xffffffff);
-					width -= 32;
-				}
-				if (!width)
-					break;
-				++idx;
-			}
-			if (data->offset < 96) {
-				info[idx] = (struct field_modify_info){4, 4,
-						MLX5_MODI_OUT_SIPV6_95_64};
-				if (width < 32) {
-					mask[1] =
-						rte_cpu_to_be_32(0xffffffff >>
-								 (32 - width));
-					width = 0;
-				} else {
-					mask[1] = RTE_BE32(0xffffffff);
-					width -= 32;
-				}
-				if (!width)
-					break;
-				++idx;
+	case RTE_FLOW_FIELD_IPV6_SRC: {
+		/*
+		 * Fields corresponding to IPv6 source address bytes
+		 * arranged according to network byte ordering.
+		 */
+		struct field_modify_info fields[] = {
+			{ 4, 0, MLX5_MODI_OUT_SIPV6_127_96 },
+			{ 4, 4, MLX5_MODI_OUT_SIPV6_95_64 },
+			{ 4, 8, MLX5_MODI_OUT_SIPV6_63_32 },
+			{ 4, 12, MLX5_MODI_OUT_SIPV6_31_0 },
+		};
+		/* First mask to be modified is the mask of 4th address byte. */
+		uint32_t midx = 3;
+
+		MLX5_ASSERT(data->offset + width <= 128);
+		off_be = 128 - (data->offset + width);
+		while (width > 0 && midx > 0) {
+			if (off_be < 32) {
+				info[idx] = fields[midx];
+				length = off_be + width <= 32 ?
+					 width : 32 - off_be;
+				if (mask)
+					mask[midx] = flow_modify_info_mask_32
+						(length, off_be);
+				else
+					info[idx].offset = off_be;
+				width -= length;
+				off_be = 0;
+				idx++;
+			} else {
+				off_be -= 32;
 			}
-			info[idx] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_SIPV6_127_96};
-			mask[0] = rte_cpu_to_be_32(0xffffffff >> (32 - width));
-		} else {
-			if (data->offset < 32)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_SIPV6_31_0};
-			if (data->offset < 64)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_SIPV6_63_32};
-			if (data->offset < 96)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_SIPV6_95_64};
-			if (data->offset < 128)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_SIPV6_127_96};
+			midx--;
 		}
+		if (!width)
+			break;
+		info[idx] = fields[midx];
+		if (mask)
+			mask[midx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
-	case RTE_FLOW_FIELD_IPV6_DST:
-		if (mask) {
-			if (data->offset < 32) {
-				info[idx] = (struct field_modify_info){4, 12,
-						MLX5_MODI_OUT_DIPV6_31_0};
-				if (width < 32) {
-					mask[3] =
-						rte_cpu_to_be_32(0xffffffff >>
-								 (32 - width));
-					width = 0;
-				} else {
-					mask[3] = RTE_BE32(0xffffffff);
-					width -= 32;
-				}
-				if (!width)
-					break;
-				++idx;
-			}
-			if (data->offset < 64) {
-				info[idx] = (struct field_modify_info){4, 8,
-						MLX5_MODI_OUT_DIPV6_63_32};
-				if (width < 32) {
-					mask[2] =
-						rte_cpu_to_be_32(0xffffffff >>
-								 (32 - width));
-					width = 0;
-				} else {
-					mask[2] = RTE_BE32(0xffffffff);
-					width -= 32;
-				}
-				if (!width)
-					break;
-				++idx;
-			}
-			if (data->offset < 96) {
-				info[idx] = (struct field_modify_info){4, 4,
-						MLX5_MODI_OUT_DIPV6_95_64};
-				if (width < 32) {
-					mask[1] =
-						rte_cpu_to_be_32(0xffffffff >>
-								 (32 - width));
-					width = 0;
-				} else {
-					mask[1] = RTE_BE32(0xffffffff);
-					width -= 32;
-				}
-				if (!width)
-					break;
-				++idx;
+	}
+	case RTE_FLOW_FIELD_IPV6_DST: {
+		/*
+		 * Fields corresponding to IPv6 destination address bytes
+		 * arranged according to network byte ordering.
+		 */
+		struct field_modify_info fields[] = {
+			{ 4, 0, MLX5_MODI_OUT_DIPV6_127_96 },
+			{ 4, 4, MLX5_MODI_OUT_DIPV6_95_64 },
+			{ 4, 8, MLX5_MODI_OUT_DIPV6_63_32 },
+			{ 4, 12, MLX5_MODI_OUT_DIPV6_31_0 },
+		};
+		/* First mask to be modified is the mask of 4th address byte. */
+		uint32_t midx = 3;
+
+		MLX5_ASSERT(data->offset + width <= 128);
+		off_be = 128 - (data->offset + width);
+		while (width > 0 && midx > 0) {
+			if (off_be < 32) {
+				info[idx] = fields[midx];
+				length = off_be + width <= 32 ?
+					 width : 32 - off_be;
+				if (mask)
+					mask[midx] = flow_modify_info_mask_32
+						(length, off_be);
+				else
+					info[idx].offset = off_be;
+				width -= length;
+				off_be = 0;
+				idx++;
+			} else {
+				off_be -= 32;
 			}
-			info[idx] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_DIPV6_127_96};
-			mask[0] = rte_cpu_to_be_32(0xffffffff >> (32 - width));
-		} else {
-			if (data->offset < 32)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_DIPV6_31_0};
-			if (data->offset < 64)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_DIPV6_63_32};
-			if (data->offset < 96)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_DIPV6_95_64};
-			if (data->offset < 128)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_DIPV6_127_96};
+			midx--;
 		}
+		if (!width)
+			break;
+		info[idx] = fields[midx];
+		if (mask)
+			mask[midx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
+	}
 	case RTE_FLOW_FIELD_TCP_PORT_SRC:
+		MLX5_ASSERT(data->offset + width <= 16);
+		off_be = 16 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_TCP_SPORT};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0xffff >> (16 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_TCP_PORT_DST:
+		MLX5_ASSERT(data->offset + width <= 16);
+		off_be = 16 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_TCP_DPORT};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0xffff >> (16 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_TCP_SEQ_NUM:
+		MLX5_ASSERT(data->offset + width <= 32);
+		off_be = 32 - (data->offset + width);
 		info[idx] = (struct field_modify_info){4, 0,
 					MLX5_MODI_OUT_TCP_SEQ_NUM};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_32(0xffffffff >>
-						     (32 - width));
+			mask[idx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_TCP_ACK_NUM:
+		MLX5_ASSERT(data->offset + width <= 32);
+		off_be = 32 - (data->offset + width);
 		info[idx] = (struct field_modify_info){4, 0,
 					MLX5_MODI_OUT_TCP_ACK_NUM};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_32(0xffffffff >>
-						     (32 - width));
+			mask[idx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_TCP_FLAGS:
+		MLX5_ASSERT(data->offset + width <= 9);
+		off_be = 9 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_TCP_FLAGS};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0x1ff >> (9 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_UDP_PORT_SRC:
+		MLX5_ASSERT(data->offset + width <= 16);
+		off_be = 16 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_UDP_SPORT};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0xffff >> (16 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_UDP_PORT_DST:
+		MLX5_ASSERT(data->offset + width <= 16);
+		off_be = 16 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_UDP_DPORT};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0xffff >> (16 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_VXLAN_VNI:
-		/* not supported yet */
+		MLX5_ASSERT(data->offset + width <= 24);
+		/* VNI is on bits 31-8 of TUNNEL_HDR_DW_1. */
+		off_be = 24 - (data->offset + width) + 8;
+		info[idx] = (struct field_modify_info){4, 0,
+					MLX5_MODI_TUNNEL_HDR_DW_1};
+		if (mask)
+			mask[idx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_GENEVE_VNI:
 		/* not supported yet*/
 		break;
 	case RTE_FLOW_FIELD_GTP_TEID:
+		MLX5_ASSERT(data->offset + width <= 32);
+		off_be = 32 - (data->offset + width);
 		info[idx] = (struct field_modify_info){4, 0,
 					MLX5_MODI_GTP_TEID};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_32(0xffffffff >>
-						     (32 - width));
+			mask[idx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_TAG:
 		{
-			int reg = mlx5_flow_get_reg_id(dev, MLX5_APP_TAG,
-						   data->level, error);
+			MLX5_ASSERT(data->offset + width <= 32);
+			int reg;
+
+			if (priv->sh->config.dv_flow_en == 2)
+				reg = REG_C_1;
+			else
+				reg = mlx5_flow_get_reg_id(dev, MLX5_APP_TAG,
+							   data->level, error);
 			if (reg < 0)
 				return;
 			MLX5_ASSERT(reg != REG_NON);
@@ -1797,15 +1794,18 @@ mlx5_flow_field_id_to_modify_info
 			info[idx] = (struct field_modify_info){4, 0,
 						reg_to_field[reg]};
 			if (mask)
-				mask[idx] =
-					rte_cpu_to_be_32(0xffffffff >>
-							 (32 - width));
+				mask[idx] = flow_modify_info_mask_32
+					(width, data->offset);
+			else
+				info[idx].offset = data->offset;
 		}
 		break;
 	case RTE_FLOW_FIELD_MARK:
 		{
 			uint32_t mark_mask = priv->sh->dv_mark_mask;
 			uint32_t mark_count = __builtin_popcount(mark_mask);
+			RTE_SET_USED(mark_count);
+			MLX5_ASSERT(data->offset + width <= mark_count);
 			int reg = mlx5_flow_get_reg_id(dev, MLX5_FLOW_MARK,
 						       0, error);
 			if (reg < 0)
@@ -1815,14 +1815,18 @@ mlx5_flow_field_id_to_modify_info
 			info[idx] = (struct field_modify_info){4, 0,
 						reg_to_field[reg]};
 			if (mask)
-				mask[idx] = rte_cpu_to_be_32((mark_mask >>
-					 (mark_count - width)) & mark_mask);
+				mask[idx] = flow_modify_info_mask_32_masked
+					(width, data->offset, mark_mask);
+			else
+				info[idx].offset = data->offset;
 		}
 		break;
 	case RTE_FLOW_FIELD_META:
 		{
 			uint32_t meta_mask = priv->sh->dv_meta_mask;
 			uint32_t meta_count = __builtin_popcount(meta_mask);
+			RTE_SET_USED(meta_count);
+			MLX5_ASSERT(data->offset + width <= meta_count);
 			int reg = flow_dv_get_metadata_reg(dev, attr, error);
 			if (reg < 0)
 				return;
@@ -1831,16 +1835,22 @@ mlx5_flow_field_id_to_modify_info
 			info[idx] = (struct field_modify_info){4, 0,
 						reg_to_field[reg]};
 			if (mask)
-				mask[idx] = rte_cpu_to_be_32((meta_mask >>
-					(meta_count - width)) & meta_mask);
+				mask[idx] = flow_modify_info_mask_32_masked
+					(width, data->offset, meta_mask);
+			else
+				info[idx].offset = data->offset;
 		}
 		break;
 	case RTE_FLOW_FIELD_IPV4_ECN:
 	case RTE_FLOW_FIELD_IPV6_ECN:
+		MLX5_ASSERT(data->offset + width <= 2);
+		off_be = 2 - (data->offset + width);
 		info[idx] = (struct field_modify_info){1, 0,
 					MLX5_MODI_OUT_IP_ECN};
 		if (mask)
-			mask[idx] = 0x3 >> (2 - width);
+			mask[idx] = flow_modify_info_mask_8(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_POINTER:
 	case RTE_FLOW_FIELD_VALUE:
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index b6978bd051..b89d2cc44f 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -319,6 +319,11 @@ __flow_hw_action_template_destroy(struct rte_eth_dev *dev,
 		mlx5_hlist_unregister(priv->sh->flow_tbls, &grp->entry);
 		acts->jump = NULL;
 	}
+	if (acts->mhdr) {
+		if (acts->mhdr->action)
+			mlx5dr_action_destroy(acts->mhdr->action);
+		mlx5_free(acts->mhdr);
+	}
 }
 
 /**
@@ -425,6 +430,37 @@ __flow_hw_act_data_encap_append(struct mlx5_priv *priv,
 	return 0;
 }
 
+static __rte_always_inline int
+__flow_hw_act_data_hdr_modify_append(struct mlx5_priv *priv,
+				     struct mlx5_hw_actions *acts,
+				     enum rte_flow_action_type type,
+				     uint16_t action_src,
+				     uint16_t action_dst,
+				     uint16_t mhdr_cmds_off,
+				     uint16_t mhdr_cmds_end,
+				     bool shared,
+				     struct field_modify_info *field,
+				     struct field_modify_info *dcopy,
+				     uint32_t *mask)
+{
+	struct mlx5_action_construct_data *act_data;
+
+	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
+	if (!act_data)
+		return -1;
+	act_data->modify_header.mhdr_cmds_off = mhdr_cmds_off;
+	act_data->modify_header.mhdr_cmds_end = mhdr_cmds_end;
+	act_data->modify_header.shared = shared;
+	rte_memcpy(act_data->modify_header.field, field,
+		   sizeof(*field) * MLX5_ACT_MAX_MOD_FIELDS);
+	rte_memcpy(act_data->modify_header.dcopy, dcopy,
+		   sizeof(*dcopy) * MLX5_ACT_MAX_MOD_FIELDS);
+	rte_memcpy(act_data->modify_header.mask, mask,
+		   sizeof(*mask) * MLX5_ACT_MAX_MOD_FIELDS);
+	LIST_INSERT_HEAD(&acts->act_list, act_data, next);
+	return 0;
+}
+
 /**
  * Append shared RSS action to the dynamic action list.
  *
@@ -515,6 +551,257 @@ flow_hw_shared_action_translate(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static __rte_always_inline bool
+flow_hw_action_modify_field_is_shared(const struct rte_flow_action *action,
+				      const struct rte_flow_action *mask)
+{
+	const struct rte_flow_action_modify_field *v = action->conf;
+	const struct rte_flow_action_modify_field *m = mask->conf;
+
+	if (v->src.field == RTE_FLOW_FIELD_VALUE) {
+		uint32_t j;
+
+		if (m == NULL)
+			return false;
+		for (j = 0; j < RTE_DIM(m->src.value); ++j) {
+			/*
+			 * Immediate value is considered to be masked
+			 * (and thus shared by all flow rules), if mask
+			 * is non-zero. Partial mask over immediate value
+			 * is not allowed.
+			 */
+			if (m->src.value[j])
+				return true;
+		}
+		return false;
+	}
+	if (v->src.field == RTE_FLOW_FIELD_POINTER)
+		return m->src.pvalue != NULL;
+	/*
+	 * Source field types other than VALUE and
+	 * POINTER are always shared.
+	 */
+	return true;
+}
+
+static __rte_always_inline bool
+flow_hw_should_insert_nop(const struct mlx5_hw_modify_header_action *mhdr,
+			  const struct mlx5_modification_cmd *cmd)
+{
+	struct mlx5_modification_cmd last_cmd = { { 0 } };
+	struct mlx5_modification_cmd new_cmd = { { 0 } };
+	const uint32_t cmds_num = mhdr->mhdr_cmds_num;
+	unsigned int last_type;
+	bool should_insert = false;
+
+	if (cmds_num == 0)
+		return false;
+	last_cmd = *(&mhdr->mhdr_cmds[cmds_num - 1]);
+	last_cmd.data0 = rte_be_to_cpu_32(last_cmd.data0);
+	last_cmd.data1 = rte_be_to_cpu_32(last_cmd.data1);
+	last_type = last_cmd.action_type;
+	new_cmd = *cmd;
+	new_cmd.data0 = rte_be_to_cpu_32(new_cmd.data0);
+	new_cmd.data1 = rte_be_to_cpu_32(new_cmd.data1);
+	switch (new_cmd.action_type) {
+	case MLX5_MODIFICATION_TYPE_SET:
+	case MLX5_MODIFICATION_TYPE_ADD:
+		if (last_type == MLX5_MODIFICATION_TYPE_SET ||
+		    last_type == MLX5_MODIFICATION_TYPE_ADD)
+			should_insert = new_cmd.field == last_cmd.field;
+		else if (last_type == MLX5_MODIFICATION_TYPE_COPY)
+			should_insert = new_cmd.field == last_cmd.dst_field;
+		else if (last_type == MLX5_MODIFICATION_TYPE_NOP)
+			should_insert = false;
+		else
+			MLX5_ASSERT(false); /* Other types are not supported. */
+		break;
+	case MLX5_MODIFICATION_TYPE_COPY:
+		if (last_type == MLX5_MODIFICATION_TYPE_SET ||
+		    last_type == MLX5_MODIFICATION_TYPE_ADD)
+			should_insert = (new_cmd.field == last_cmd.field ||
+					 new_cmd.dst_field == last_cmd.field);
+		else if (last_type == MLX5_MODIFICATION_TYPE_COPY)
+			should_insert = (new_cmd.field == last_cmd.dst_field ||
+					 new_cmd.dst_field == last_cmd.dst_field);
+		else if (last_type == MLX5_MODIFICATION_TYPE_NOP)
+			should_insert = false;
+		else
+			MLX5_ASSERT(false); /* Other types are not supported. */
+		break;
+	default:
+		/* Other action types should be rejected on AT validation. */
+		MLX5_ASSERT(false);
+		break;
+	}
+	return should_insert;
+}
+
+static __rte_always_inline int
+flow_hw_mhdr_cmd_nop_append(struct mlx5_hw_modify_header_action *mhdr)
+{
+	struct mlx5_modification_cmd *nop;
+	uint32_t num = mhdr->mhdr_cmds_num;
+
+	if (num + 1 >= MLX5_MHDR_MAX_CMD)
+		return -ENOMEM;
+	nop = mhdr->mhdr_cmds + num;
+	nop->data0 = 0;
+	nop->action_type = MLX5_MODIFICATION_TYPE_NOP;
+	nop->data0 = rte_cpu_to_be_32(nop->data0);
+	nop->data1 = 0;
+	mhdr->mhdr_cmds_num = num + 1;
+	return 0;
+}
+
+static __rte_always_inline int
+flow_hw_mhdr_cmd_append(struct mlx5_hw_modify_header_action *mhdr,
+			struct mlx5_modification_cmd *cmd)
+{
+	uint32_t num = mhdr->mhdr_cmds_num;
+
+	if (num + 1 >= MLX5_MHDR_MAX_CMD)
+		return -ENOMEM;
+	mhdr->mhdr_cmds[num] = *cmd;
+	mhdr->mhdr_cmds_num = num + 1;
+	return 0;
+}
+
+static __rte_always_inline int
+flow_hw_converted_mhdr_cmds_append(struct mlx5_hw_modify_header_action *mhdr,
+				   struct mlx5_flow_dv_modify_hdr_resource *resource)
+{
+	uint32_t idx;
+	int ret;
+
+	for (idx = 0; idx < resource->actions_num; ++idx) {
+		struct mlx5_modification_cmd *src = &resource->actions[idx];
+
+		if (flow_hw_should_insert_nop(mhdr, src)) {
+			ret = flow_hw_mhdr_cmd_nop_append(mhdr);
+			if (ret)
+				return ret;
+		}
+		ret = flow_hw_mhdr_cmd_append(mhdr, src);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+static __rte_always_inline void
+flow_hw_modify_field_init(struct mlx5_hw_modify_header_action *mhdr,
+			  struct rte_flow_actions_template *at)
+{
+	memset(mhdr, 0, sizeof(*mhdr));
+	/* Modify header action without any commands is shared by default. */
+	mhdr->shared = true;
+	mhdr->pos = at->mhdr_off;
+}
+
+static __rte_always_inline int
+flow_hw_modify_field_compile(struct rte_eth_dev *dev,
+			     const struct rte_flow_attr *attr,
+			     const struct rte_flow_action *action_start, /* Start of AT actions. */
+			     const struct rte_flow_action *action, /* Current action from AT. */
+			     const struct rte_flow_action *action_mask, /* Current mask from AT. */
+			     struct mlx5_hw_actions *acts,
+			     struct mlx5_hw_modify_header_action *mhdr,
+			     struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_action_modify_field *conf = action->conf;
+	union {
+		struct mlx5_flow_dv_modify_hdr_resource resource;
+		uint8_t data[sizeof(struct mlx5_flow_dv_modify_hdr_resource) +
+			     sizeof(struct mlx5_modification_cmd) * MLX5_MHDR_MAX_CMD];
+	} dummy;
+	struct mlx5_flow_dv_modify_hdr_resource *resource;
+	struct rte_flow_item item = {
+		.spec = NULL,
+		.mask = NULL
+	};
+	struct field_modify_info field[MLX5_ACT_MAX_MOD_FIELDS] = {
+						{0, 0, MLX5_MODI_OUT_NONE} };
+	struct field_modify_info dcopy[MLX5_ACT_MAX_MOD_FIELDS] = {
+						{0, 0, MLX5_MODI_OUT_NONE} };
+	uint32_t mask[MLX5_ACT_MAX_MOD_FIELDS] = { 0 };
+	uint32_t type, value = 0;
+	uint16_t cmds_start, cmds_end;
+	bool shared;
+	int ret;
+
+	/*
+	 * Modify header action is shared if previous modify_field actions
+	 * are shared and currently compiled action is shared.
+	 */
+	shared = flow_hw_action_modify_field_is_shared(action, action_mask);
+	mhdr->shared &= shared;
+	if (conf->src.field == RTE_FLOW_FIELD_POINTER ||
+	    conf->src.field == RTE_FLOW_FIELD_VALUE) {
+		type = conf->operation == RTE_FLOW_MODIFY_SET ? MLX5_MODIFICATION_TYPE_SET :
+								MLX5_MODIFICATION_TYPE_ADD;
+		/* For SET/ADD fill the destination field (field) first. */
+		mlx5_flow_field_id_to_modify_info(&conf->dst, field, mask,
+						  conf->width, dev,
+						  attr, error);
+		item.spec = conf->src.field == RTE_FLOW_FIELD_POINTER ?
+				(void *)(uintptr_t)conf->src.pvalue :
+				(void *)(uintptr_t)&conf->src.value;
+		if (conf->dst.field == RTE_FLOW_FIELD_META ||
+		    conf->dst.field == RTE_FLOW_FIELD_TAG) {
+			value = *(const unaligned_uint32_t *)item.spec;
+			value = rte_cpu_to_be_32(value);
+			item.spec = &value;
+		}
+	} else {
+		type = MLX5_MODIFICATION_TYPE_COPY;
+		/* For COPY fill the destination field (dcopy) without mask. */
+		mlx5_flow_field_id_to_modify_info(&conf->dst, dcopy, NULL,
+						  conf->width, dev,
+						  attr, error);
+		/* Then construct the source field (field) with mask. */
+		mlx5_flow_field_id_to_modify_info(&conf->src, field, mask,
+						  conf->width, dev,
+						  attr, error);
+	}
+	item.mask = &mask;
+	memset(&dummy, 0, sizeof(dummy));
+	resource = &dummy.resource;
+	ret = flow_dv_convert_modify_action(&item, field, dcopy, resource, type, error);
+	if (ret)
+		return ret;
+	MLX5_ASSERT(resource->actions_num > 0);
+	/*
+	 * If the previous modify field action collides with this one, then
+	 * insert a NOP command. This NOP command will not be a part of the
+	 * action's command range used to update commands on rule creation.
+	 */
+	if (flow_hw_should_insert_nop(mhdr, &resource->actions[0])) {
+		ret = flow_hw_mhdr_cmd_nop_append(mhdr);
+		if (ret)
+			return rte_flow_error_set(error, ret, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+						  NULL, "too many modify field operations specified");
+	}
+	cmds_start = mhdr->mhdr_cmds_num;
+	ret = flow_hw_converted_mhdr_cmds_append(mhdr, resource);
+	if (ret)
+		return rte_flow_error_set(error, ret, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "too many modify field operations specified");
+
+	cmds_end = mhdr->mhdr_cmds_num;
+	if (shared)
+		return 0;
+	ret = __flow_hw_act_data_hdr_modify_append(priv, acts, RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+						   action - action_start, mhdr->pos,
+						   cmds_start, cmds_end, shared,
+						   field, dcopy, mask);
+	if (ret)
+		return rte_flow_error_set(error, ENOMEM, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "not enough memory to store modify field metadata");
+	return 0;
+}
+
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -558,10 +845,12 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 	uint16_t reformat_pos = MLX5_HW_MAX_ACTS, reformat_src = 0;
 	uint8_t *encap_data = NULL, *encap_data_m = NULL;
 	size_t data_size = 0;
+	struct mlx5_hw_modify_header_action mhdr = { 0 };
 	bool actions_end = false;
 	uint32_t type, i;
 	int err;
 
+	flow_hw_modify_field_init(&mhdr, at);
 	if (attr->transfer)
 		type = MLX5DR_TABLE_TYPE_FDB;
 	else if (attr->egress)
@@ -714,6 +1003,15 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			reformat_pos = i++;
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_TNL_L2_TO_L2;
 			break;
+		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
+			if (mhdr.pos == UINT16_MAX)
+				mhdr.pos = i++;
+			err = flow_hw_modify_field_compile(dev, attr, action_start,
+							   actions, masks, acts, &mhdr,
+							   error);
+			if (err)
+				goto err;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -721,6 +1019,31 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			break;
 		}
 	}
+	if (mhdr.pos != UINT16_MAX) {
+		uint32_t flags;
+		uint32_t bulk_size;
+		size_t mhdr_len;
+
+		acts->mhdr = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*acts->mhdr),
+					 0, SOCKET_ID_ANY);
+		if (!acts->mhdr)
+			goto err;
+		rte_memcpy(acts->mhdr, &mhdr, sizeof(*acts->mhdr));
+		mhdr_len = sizeof(struct mlx5_modification_cmd) * acts->mhdr->mhdr_cmds_num;
+		flags = mlx5_hw_act_flag[!!attr->group][type];
+		if (acts->mhdr->shared) {
+			flags |= MLX5DR_ACTION_FLAG_SHARED;
+			bulk_size = 0;
+		} else {
+			bulk_size = rte_log2_u32(table_attr->nb_flows);
+		}
+		acts->mhdr->action = mlx5dr_action_create_modify_header
+				(priv->dr_ctx, mhdr_len, (__be64 *)acts->mhdr->mhdr_cmds,
+				 bulk_size, flags);
+		if (!acts->mhdr->action)
+			goto err;
+		acts->rule_acts[acts->mhdr->pos].action = acts->mhdr->action;
+	}
 	if (reformat_pos != MLX5_HW_MAX_ACTS) {
 		uint8_t buf[MLX5_ENCAP_MAX_LEN];
 		bool shared_rfmt = true;
@@ -884,6 +1207,100 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static __rte_always_inline int
+flow_hw_mhdr_cmd_is_nop(const struct mlx5_modification_cmd *cmd)
+{
+	struct mlx5_modification_cmd cmd_he = {
+		.data0 = rte_be_to_cpu_32(cmd->data0),
+		.data1 = 0,
+	};
+
+	return cmd_he.action_type == MLX5_MODIFICATION_TYPE_NOP;
+}
+
+/**
+ * Construct the modify header commands of a flow rule.
+ *
+ * For an action template containing MODIFY_FIELD actions whose arguments
+ * are not fully masked in the template, the per-rule immediate values are
+ * patched into the modification commands copied into the job buffer.
+ *
+ * @param[in] job
+ *   Pointer to the flow job descriptor owning the command buffer.
+ * @param[in] act_data
+ *   Pointer to the construct data of this MODIFY_FIELD action.
+ * @param[in] hw_acts
+ *   Pointer to the translated actions from the template.
+ * @param[in] action
+ *   MODIFY_FIELD action of the rte_flow rule being created.
+ *
+ * @return
+ *    0 on success, negative value otherwise.
+ */
+static __rte_always_inline int
+flow_hw_modify_field_construct(struct mlx5_hw_q_job *job,
+			       struct mlx5_action_construct_data *act_data,
+			       const struct mlx5_hw_actions *hw_acts,
+			       const struct rte_flow_action *action)
+{
+	const struct rte_flow_action_modify_field *mhdr_action = action->conf;
+	uint8_t values[16] = { 0 };
+	unaligned_uint32_t *value_p;
+	uint32_t i;
+	struct field_modify_info *field;
+
+	if (!hw_acts->mhdr)
+		return -1;
+	if (hw_acts->mhdr->shared || act_data->modify_header.shared)
+		return 0;
+	MLX5_ASSERT(mhdr_action->operation == RTE_FLOW_MODIFY_SET ||
+		    mhdr_action->operation == RTE_FLOW_MODIFY_ADD);
+	if (mhdr_action->src.field != RTE_FLOW_FIELD_VALUE &&
+	    mhdr_action->src.field != RTE_FLOW_FIELD_POINTER)
+		return 0;
+	if (mhdr_action->src.field == RTE_FLOW_FIELD_VALUE)
+		rte_memcpy(values, &mhdr_action->src.value, sizeof(values));
+	else
+		rte_memcpy(values, mhdr_action->src.pvalue, sizeof(values));
+	if (mhdr_action->dst.field == RTE_FLOW_FIELD_META ||
+	    mhdr_action->dst.field == RTE_FLOW_FIELD_TAG) {
+		value_p = (unaligned_uint32_t *)values;
+		*value_p = rte_cpu_to_be_32(*value_p);
+	}
+	i = act_data->modify_header.mhdr_cmds_off;
+	field = act_data->modify_header.field;
+	do {
+		uint32_t off_b;
+		uint32_t mask;
+		uint32_t data;
+		const uint8_t *mask_src;
+
+		if (i >= act_data->modify_header.mhdr_cmds_end)
+			return -1;
+		if (flow_hw_mhdr_cmd_is_nop(&job->mhdr_cmd[i])) {
+			++i;
+			continue;
+		}
+		mask_src = (const uint8_t *)act_data->modify_header.mask;
+		mask = flow_dv_fetch_field(mask_src + field->offset, field->size);
+		if (!mask) {
+			++field;
+			continue;
+		}
+		off_b = rte_bsf32(mask);
+		data = flow_dv_fetch_field(values + field->offset, field->size);
+		data = (data & mask) >> off_b;
+		job->mhdr_cmd[i++].data1 = rte_cpu_to_be_32(data);
+		++field;
+	} while (field->size);
+	return 0;
+}
+
 /**
  * Construct flow action array.
  *
@@ -928,6 +1345,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	};
 	uint32_t ft_flag;
 	size_t encap_len = 0;
+	int ret;
 
 	memcpy(rule_acts, hw_acts->rule_acts,
 	       sizeof(*rule_acts) * hw_acts->acts_num);
@@ -945,6 +1363,18 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	} else {
 		attr.ingress = 1;
 	}
+	if (hw_acts->mhdr && hw_acts->mhdr->mhdr_cmds_num > 0) {
+		uint16_t pos = hw_acts->mhdr->pos;
+
+		if (!hw_acts->mhdr->shared) {
+			rule_acts[pos].modify_header.offset =
+						job->flow->idx - 1;
+			rule_acts[pos].modify_header.data =
+						(uint8_t *)job->mhdr_cmd;
+			rte_memcpy(job->mhdr_cmd, hw_acts->mhdr->mhdr_cmds,
+				   sizeof(*job->mhdr_cmd) * hw_acts->mhdr->mhdr_cmds_num);
+		}
+	}
 	LIST_FOREACH(act_data, &hw_acts->act_list, next) {
 		uint32_t jump_group;
 		uint32_t tag;
@@ -1020,6 +1450,14 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			MLX5_ASSERT(raw_encap_data->size ==
 				    act_data->encap.len);
 			break;
+		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
+			ret = flow_hw_modify_field_construct(job,
+							     act_data,
+							     hw_acts,
+							     action);
+			if (ret)
+				return -1;
+			break;
 		default:
 			break;
 		}
@@ -2093,6 +2531,8 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		}
 		mem_size += (sizeof(struct mlx5_hw_q_job *) +
 			    sizeof(uint8_t) * MLX5_ENCAP_MAX_LEN +
+			    sizeof(struct mlx5_modification_cmd) *
+			    MLX5_MHDR_MAX_CMD +
 			    sizeof(struct mlx5_hw_q_job)) *
 			    queue_attr[0]->size;
 	}
@@ -2104,6 +2544,7 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	}
 	for (i = 0; i < nb_queue; i++) {
 		uint8_t *encap = NULL;
+		struct mlx5_modification_cmd *mhdr_cmd = NULL;
 
 		priv->hw_q[i].job_idx = queue_attr[i]->size;
 		priv->hw_q[i].size = queue_attr[i]->size;
@@ -2115,8 +2556,10 @@ flow_hw_configure(struct rte_eth_dev *dev,
 					    &job[queue_attr[i - 1]->size];
 		job = (struct mlx5_hw_q_job *)
 		      &priv->hw_q[i].job[queue_attr[i]->size];
-		encap = (uint8_t *)&job[queue_attr[i]->size];
+		mhdr_cmd = (struct mlx5_modification_cmd *)&job[queue_attr[i]->size];
+		encap = (uint8_t *)&mhdr_cmd[queue_attr[i]->size * MLX5_MHDR_MAX_CMD];
 		for (j = 0; j < queue_attr[i]->size; j++) {
+			job[j].mhdr_cmd = &mhdr_cmd[j * MLX5_MHDR_MAX_CMD];
 			job[j].encap_data = &encap[j * MLX5_ENCAP_MAX_LEN];
 			priv->hw_q[i].job[j] = &job[j];
 		}
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH 05/27] net/mlx5: validate modify field action template
  2022-09-23 14:43 [PATCH 00/27] net/mlx5: HW steering PMD update Suanming Mou
                   ` (3 preceding siblings ...)
  2022-09-23 14:43 ` [PATCH 04/27] net/mlx5: add modify field hws support Suanming Mou
@ 2022-09-23 14:43 ` Suanming Mou
  2022-09-23 14:43 ` [PATCH 06/27] net/mlx5: enable mark flag for all ports in the same domain Suanming Mou
                   ` (26 subsequent siblings)
  31 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-23 14:43 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, Dariusz Sosnowski

From: Dariusz Sosnowski <dsosnowski@nvidia.com>

This patch adds a validation step for action templates which checks
that the fields of RTE_FLOW_ACTION_TYPE_MODIFY_FIELD actions are
properly masked.

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 drivers/net/mlx5/mlx5_flow_hw.c | 132 ++++++++++++++++++++++++++++++++
 1 file changed, 132 insertions(+)
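
Note: the following is an illustrative sketch (not taken from this patch)
of an actions template that satisfies these masking rules. The DSCP field,
the immediate value and the ingress attribute are arbitrary examples, and
the port is assumed to be already set up with rte_flow_configure():

#include <stdint.h>
#include <rte_flow.h>

/* Example: MODIFY_FIELD actions template with all required masks set.
 * The immediate value mask is non-zero, so the value is shared by every
 * rule created from this template.
 */
static struct rte_flow_actions_template *
create_dscp_set_template(uint16_t port_id, struct rte_flow_error *error)
{
	static const struct rte_flow_action_modify_field conf = {
		.operation = RTE_FLOW_MODIFY_SET,
		.dst = { .field = RTE_FLOW_FIELD_IPV4_DSCP },
		.src = {
			.field = RTE_FLOW_FIELD_VALUE,
			.value = { 0x2e }, /* example immediate value */
		},
		.width = 6,
	};
	static const struct rte_flow_action_modify_field mask = {
		.operation = RTE_FLOW_MODIFY_SET, /* must match the action */
		.dst = {
			.field = RTE_FLOW_FIELD_IPV4_DSCP,
			.level = UINT32_MAX,  /* must be fully masked */
			.offset = UINT32_MAX, /* must be fully masked */
		},
		.src = {
			.field = RTE_FLOW_FIELD_VALUE,
			.value = { [0 ... 15] = 0xff }, /* shared immediate */
		},
		.width = UINT32_MAX,          /* must be fully masked */
	};
	const struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD, .conf = &conf },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	const struct rte_flow_action masks[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD, .conf = &mask },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	const struct rte_flow_actions_template_attr attr = { .ingress = 1 };

	return rte_flow_actions_template_create(port_id, &attr,
						actions, masks, error);
}

Leaving the mask's src.value all-zero instead would make the immediate
value a per-rule argument supplied with each rule at creation time.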

diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index b89d2cc44f..1f98e1248a 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -2047,6 +2047,136 @@ flow_hw_table_destroy(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static int
+flow_hw_validate_action_modify_field(const struct rte_flow_action *action,
+				     const struct rte_flow_action *mask,
+				     struct rte_flow_error *error)
+{
+	const struct rte_flow_action_modify_field *action_conf =
+		action->conf;
+	const struct rte_flow_action_modify_field *mask_conf =
+		mask->conf;
+
+	if (action_conf->operation != mask_conf->operation)
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"modify_field operation mask and template are not equal");
+	if (action_conf->dst.field != mask_conf->dst.field)
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"destination field mask and template are not equal");
+	if (action_conf->dst.field == RTE_FLOW_FIELD_POINTER ||
+	    action_conf->dst.field == RTE_FLOW_FIELD_VALUE)
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"immediate value and pointer cannot be used as destination");
+	if (mask_conf->dst.level != UINT32_MAX)
+		return rte_flow_error_set(error, EINVAL,
+			RTE_FLOW_ERROR_TYPE_ACTION, action,
+			"destination encapsulation level must be fully masked");
+	if (mask_conf->dst.offset != UINT32_MAX)
+		return rte_flow_error_set(error, EINVAL,
+			RTE_FLOW_ERROR_TYPE_ACTION, action,
+			"destination offset level must be fully masked");
+	if (action_conf->src.field != mask_conf->src.field)
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"source field mask and template are not equal");
+	if (action_conf->src.field != RTE_FLOW_FIELD_POINTER &&
+	    action_conf->src.field != RTE_FLOW_FIELD_VALUE) {
+		if (mask_conf->src.level != UINT32_MAX)
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"source encapsulation level must be fully masked");
+		if (mask_conf->src.offset != UINT32_MAX)
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"source offset level must be fully masked");
+	}
+	if (mask_conf->width != UINT32_MAX)
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"modify_field width field must be fully masked");
+	return 0;
+}
+
+static int
+flow_hw_action_validate(const struct rte_flow_action actions[],
+			const struct rte_flow_action masks[],
+			struct rte_flow_error *error)
+{
+	int i;
+	bool actions_end = false;
+	int ret;
+
+	for (i = 0; !actions_end; ++i) {
+		const struct rte_flow_action *action = &actions[i];
+		const struct rte_flow_action *mask = &masks[i];
+
+		if (action->type != mask->type)
+			return rte_flow_error_set(error, ENOTSUP,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  action,
+						  "mask type does not match action type");
+		switch (action->type) {
+		case RTE_FLOW_ACTION_TYPE_VOID:
+			break;
+		case RTE_FLOW_ACTION_TYPE_INDIRECT:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_MARK:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_DROP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_JUMP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_QUEUE:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_RSS:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_VXLAN_DECAP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_NVGRE_DECAP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_DECAP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
+			ret = flow_hw_validate_action_modify_field(action,
+									mask,
+									error);
+			if (ret < 0)
+				return ret;
+			break;
+		case RTE_FLOW_ACTION_TYPE_END:
+			actions_end = true;
+			break;
+		default:
+			return rte_flow_error_set(error, ENOTSUP,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  action,
+						  "action not supported in template API");
+		}
+	}
+	return 0;
+}
+
 /**
  * Create flow action template.
  *
@@ -2075,6 +2205,8 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 	int len, act_len, mask_len, i;
 	struct rte_flow_actions_template *at;
 
+	if (flow_hw_action_validate(actions, masks, error))
+		return NULL;
 	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS,
 				NULL, 0, actions, error);
 	if (act_len <= 0)
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH 06/27] net/mlx5: enable mark flag for all ports in the same domain
  2022-09-23 14:43 [PATCH 00/27] net/mlx5: HW steering PMD update Suanming Mou
                   ` (4 preceding siblings ...)
  2022-09-23 14:43 ` [PATCH 05/27] net/mlx5: validate modify field action template Suanming Mou
@ 2022-09-23 14:43 ` Suanming Mou
  2022-09-23 14:43 ` [PATCH 07/27] net/mlx5: create port actions Suanming Mou
                   ` (25 subsequent siblings)
  31 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-23 14:43 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko, Raja Zidane; +Cc: dev, Bing Zhao

From: Bing Zhao <bingz@nvidia.com>

In switchdev mode, there is a single FDB domain shared by all the
representors, and only the E-Switch manager can insert rules into
this domain.

Consider a flow rule like the following:
flow create 0 ingress transfer pattern port_id id is X / eth / end
actions mark id 25 ...
It acts on behalf of representor X, but the mark flag was not enabled
on the Rx queues of that port.

To fix this, when the mark flag needs to be enabled in the FDB case,
it is now set once for the Rx queues of all the ports belonging to
the same E-Switch domain.

Fixes: e211aca851a7 ("net/mlx5: fix mark enabling for Rx")

Signed-off-by: Bing Zhao <bingz@nvidia.com>
---
 drivers/net/mlx5/mlx5.h      |  2 ++
 drivers/net/mlx5/mlx5_flow.c | 28 ++++++++++++++++++++++++----
 2 files changed, 26 insertions(+), 4 deletions(-)
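
Note: for reference (not part of this patch), a minimal sketch of how an
application observes the mark on a representor's Rx queue once the flag
is propagated; the port and queue ids are placeholders:

#include <stdio.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

/* Poll one Rx queue and print the MARK id of marked packets. The mark
 * only reaches the mbuf when rxq.mark is enabled, which this patch now
 * does for every port of the shared domain.
 */
static void
poll_marked_packets(uint16_t repr_port_id, uint16_t queue_id)
{
	struct rte_mbuf *pkts[32];
	uint16_t i, n;

	n = rte_eth_rx_burst(repr_port_id, queue_id, pkts, RTE_DIM(pkts));
	for (i = 0; i < n; i++) {
		if (pkts[i]->ol_flags & RTE_MBUF_F_RX_FDIR_ID)
			printf("port %u: mark %u\n", repr_port_id,
			       pkts[i]->hash.fdir.hi);
		rte_pktmbuf_free(pkts[i]);
	}
}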

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index f3bd45d4c5..18d70e795f 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1202,6 +1202,8 @@ struct mlx5_dev_ctx_shared {
 	uint32_t flow_priority_check_flag:1; /* Check Flag for flow priority. */
 	uint32_t metadata_regc_check_flag:1; /* Check Flag for metadata REGC. */
 	uint32_t hws_tags:1; /* Check if tags info for HWS initialized. */
+	uint32_t shared_mark_enabled:1;
+	/* If mark action is enabled on Rxqs (shared E-Switch domain). */
 	uint32_t max_port; /* Maximal IB device port index. */
 	struct mlx5_bond_info bond; /* Bonding information. */
 	struct mlx5_common_device *cdev; /* Backend mlx5 device. */
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 3abb39aa92..c856d249db 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -1481,13 +1481,32 @@ flow_rxq_mark_flag_set(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_rxq_ctrl *rxq_ctrl;
+	uint16_t port_id;
 
-	if (priv->mark_enabled)
+	if (priv->sh->shared_mark_enabled)
 		return;
-	LIST_FOREACH(rxq_ctrl, &priv->rxqsctrl, next) {
-		rxq_ctrl->rxq.mark = 1;
+	if (priv->master || priv->representor) {
+		MLX5_ETH_FOREACH_DEV(port_id, dev->device) {
+			struct mlx5_priv *opriv =
+				rte_eth_devices[port_id].data->dev_private;
+
+			if (!opriv ||
+			    opriv->sh != priv->sh ||
+			    opriv->domain_id != priv->domain_id ||
+			    opriv->mark_enabled)
+				continue;
+			LIST_FOREACH(rxq_ctrl, &opriv->rxqsctrl, next) {
+				rxq_ctrl->rxq.mark = 1;
+			}
+			opriv->mark_enabled = 1;
+		}
+	} else {
+		LIST_FOREACH(rxq_ctrl, &priv->rxqsctrl, next) {
+			rxq_ctrl->rxq.mark = 1;
+		}
+		priv->mark_enabled = 1;
 	}
-	priv->mark_enabled = 1;
+	priv->sh->shared_mark_enabled = 1;
 }
 
 /**
@@ -1623,6 +1642,7 @@ flow_rxq_flags_clear(struct rte_eth_dev *dev)
 		rxq->ctrl->rxq.tunnel = 0;
 	}
 	priv->mark_enabled = 0;
+	priv->sh->shared_mark_enabled = 0;
 }
 
 /**
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH 07/27] net/mlx5: create port actions
  2022-09-23 14:43 [PATCH 00/27] net/mlx5: HW steering PMD update Suanming Mou
                   ` (5 preceding siblings ...)
  2022-09-23 14:43 ` [PATCH 06/27] net/mlx5: enable mark flag for all ports in the same domain Suanming Mou
@ 2022-09-23 14:43 ` Suanming Mou
  2022-09-23 14:43 ` [PATCH 08/27] net/mlx5: add extended metadata mode for hardware steering Suanming Mou
                   ` (24 subsequent siblings)
  31 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-23 14:43 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, Dariusz Sosnowski

From: Dariusz Sosnowski <dsosnowski@nvidia.com>

This patch implements creating and caching of port actions for use with
HW Steering FDB flows.

Actions are created on flow template API configuration and only on the
port designated as the master. Attaching and detaching ports in the
same switching domain causes an update to the port actions cache by,
respectively, creating and destroying actions.

A new devarg, fdb_def_rule_en, is added to control whether the PMD
implicitly creates the dedicated default E-Switch rule; the PMD sets
this value to 1 by default.
If set to 0, the default E-Switch rule will not be created and the
user can create the required E-Switch rules on the root table if
needed.

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 doc/guides/nics/mlx5.rst           |    9 +
 drivers/net/mlx5/linux/mlx5_os.c   |   12 +
 drivers/net/mlx5/mlx5.c            |   14 +
 drivers/net/mlx5/mlx5.h            |   24 +-
 drivers/net/mlx5/mlx5_flow.c       |   68 +-
 drivers/net/mlx5/mlx5_flow.h       |   22 +-
 drivers/net/mlx5/mlx5_flow_dv.c    |   93 +-
 drivers/net/mlx5/mlx5_flow_hw.c    | 1350 +++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_flow_verbs.c |    4 +-
 drivers/net/mlx5/mlx5_trigger.c    |   69 +-
 10 files changed, 1554 insertions(+), 111 deletions(-)
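
Note: an illustrative invocation (not part of this patch); the PCI
address and representor list below are placeholders:

  dpdk-testpmd -a 0000:08:00.0,dv_flow_en=2,representor=vf[0-1],fdb_def_rule_en=0 -- -i

With fdb_def_rule_en=0 the PMD does not install the implicit jump rule
on the E-Switch root table, and the application may insert its own
rules on the root table (group 0) instead.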

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 631f0840eb..c42ac482d8 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -1118,6 +1118,15 @@ for an additional list of options shared with other mlx5 drivers.
 
   By default, the PMD will set this value to 1.
 
+- ``fdb_def_rule_en`` parameter [int]
+
+  A non-zero value enables the PMD to create a dedicated rule on the E-Switch
+  root table. This rule forwards all incoming packets into table 1, and other
+  rules are created on the original E-Switch table level plus one, which
+  improves the flow insertion rate by skipping the root table managed by the
+  firmware. If set to 0, all rules are created on the original table level.
+
+  By default, the PMD will set this value to 1.
 
 Supported NICs
 --------------
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 1877b6bec8..28220d10ad 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1554,6 +1554,13 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	if (priv->sh->config.dv_flow_en == 2) {
 		/* Only HWS requires this information. */
 		flow_hw_init_tags_set(eth_dev);
+		if (priv->sh->config.dv_esw_en &&
+		    flow_hw_create_vport_action(eth_dev)) {
+			DRV_LOG(ERR, "port %u failed to create vport action",
+				eth_dev->data->port_id);
+			err = EINVAL;
+			goto error;
+		}
 		return eth_dev;
 	}
 	/* Port representor shares the same max priority with pf port. */
@@ -1614,6 +1621,11 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	return eth_dev;
 error:
 	if (priv) {
+		if (eth_dev &&
+		    priv->sh &&
+		    priv->sh->config.dv_flow_en == 2 &&
+		    priv->sh->config.dv_esw_en)
+			flow_hw_destroy_vport_action(eth_dev);
 		if (priv->mreg_cp_tbl)
 			mlx5_hlist_destroy(priv->mreg_cp_tbl);
 		if (priv->sh)
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 556709c697..a21b8c69a9 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -172,6 +172,9 @@
 /* Device parameter to configure the delay drop when creating Rxqs. */
 #define MLX5_DELAY_DROP "delay_drop"
 
+/* Device parameter to create the fdb default rule in PMD */
+#define MLX5_FDB_DEFAULT_RULE_EN "fdb_def_rule_en"
+
 /* Shared memory between primary and secondary processes. */
 struct mlx5_shared_data *mlx5_shared_data;
 
@@ -1239,6 +1242,8 @@ mlx5_dev_args_check_handler(const char *key, const char *val, void *opaque)
 		config->decap_en = !!tmp;
 	} else if (strcmp(MLX5_ALLOW_DUPLICATE_PATTERN, key) == 0) {
 		config->allow_duplicate_pattern = !!tmp;
+	} else if (strcmp(MLX5_FDB_DEFAULT_RULE_EN, key) == 0) {
+		config->fdb_def_rule = !!tmp;
 	}
 	return 0;
 }
@@ -1274,6 +1279,7 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 		MLX5_RECLAIM_MEM,
 		MLX5_DECAP_EN,
 		MLX5_ALLOW_DUPLICATE_PATTERN,
+		MLX5_FDB_DEFAULT_RULE_EN,
 		NULL,
 	};
 	int ret = 0;
@@ -1285,6 +1291,7 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 	config->dv_flow_en = 1;
 	config->decap_en = 1;
 	config->allow_duplicate_pattern = 1;
+	config->fdb_def_rule = 1;
 	if (mkvlist != NULL) {
 		/* Process parameters. */
 		ret = mlx5_kvargs_process(mkvlist, params,
@@ -1360,6 +1367,7 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 	DRV_LOG(DEBUG, "\"decap_en\" is %u.", config->decap_en);
 	DRV_LOG(DEBUG, "\"allow_duplicate_pattern\" is %u.",
 		config->allow_duplicate_pattern);
+	DRV_LOG(DEBUG, "\"fdb_def_rule_en\" is %u.", config->fdb_def_rule);
 	return 0;
 }
 
@@ -1943,6 +1951,7 @@ mlx5_dev_close(struct rte_eth_dev *dev)
 	mlx5_flex_parser_ecpri_release(dev);
 	mlx5_flex_item_port_cleanup(dev);
 #if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
+	flow_hw_destroy_vport_action(dev);
 	flow_hw_resource_release(dev);
 #endif
 	flow_hw_clear_port_info(dev);
@@ -2644,6 +2653,11 @@ mlx5_probe_again_args_validate(struct mlx5_common_device *cdev,
 			sh->ibdev_name);
 		goto error;
 	}
+	if (sh->config.fdb_def_rule ^ config->fdb_def_rule) {
+		DRV_LOG(ERR, "\"fdb_def_rule_en\" configuration mismatch for shared %s context.",
+			sh->ibdev_name);
+		goto error;
+	}
 	if (sh->config.l3_vxlan_en ^ config->l3_vxlan_en) {
 		DRV_LOG(ERR, "\"l3_vxlan_en\" "
 			"configuration mismatch for shared %s context.",
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 18d70e795f..77dbe3593e 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -309,6 +309,7 @@ struct mlx5_sh_config {
 	uint32_t allow_duplicate_pattern:1;
 	uint32_t lro_allowed:1; /* Whether LRO is allowed. */
 	/* Allow/Prevent the duplicate rules pattern. */
+	uint32_t fdb_def_rule:1; /* Create FDB default jump rule */
 };
 
 
@@ -337,6 +338,8 @@ enum {
 	MLX5_HW_Q_JOB_TYPE_DESTROY, /* Flow destroy job type. */
 };
 
+#define MLX5_HW_MAX_ITEMS (16)
+
 /* HW steering flow management job descriptor. */
 struct mlx5_hw_q_job {
 	uint32_t type; /* Job type. */
@@ -344,6 +347,8 @@ struct mlx5_hw_q_job {
 	void *user_data; /* Job user data. */
 	uint8_t *encap_data; /* Encap data. */
 	struct mlx5_modification_cmd *mhdr_cmd;
+	struct rte_flow_item *items;
+	struct rte_flow_item_ethdev port_spec;
 };
 
 /* HW steering job descriptor LIFO pool. */
@@ -1452,6 +1457,12 @@ struct mlx5_obj_ops {
 
 #define MLX5_RSS_HASH_FIELDS_LEN RTE_DIM(mlx5_rss_hash_fields)
 
+struct mlx5_hw_ctrl_flow {
+	LIST_ENTRY(mlx5_hw_ctrl_flow) next;
+	struct rte_eth_dev *owner_dev;
+	struct rte_flow *flow;
+};
+
 struct mlx5_priv {
 	struct rte_eth_dev_data *dev_data;  /* Pointer to device data. */
 	struct mlx5_dev_ctx_shared *sh; /* Shared device context. */
@@ -1492,6 +1503,12 @@ struct mlx5_priv {
 	unsigned int reta_idx_n; /* RETA index size. */
 	struct mlx5_drop drop_queue; /* Flow drop queues. */
 	void *root_drop_action; /* Pointer to root drop action. */
+	rte_spinlock_t hw_ctrl_lock;
+	LIST_HEAD(hw_ctrl_flow, mlx5_hw_ctrl_flow) hw_ctrl_flows;
+	struct mlx5dr_action **hw_vport;
+	struct rte_flow_template_table *hw_esw_sq_miss_root_tbl;
+	struct rte_flow_template_table *hw_esw_sq_miss_tbl;
+	struct rte_flow_template_table *hw_esw_zero_tbl;
 	struct mlx5_indexed_pool *flows[MLX5_FLOW_TYPE_MAXI];
 	/* RTE Flow rules. */
 	uint32_t ctrl_flows; /* Control flow rules. */
@@ -1553,10 +1570,9 @@ struct mlx5_priv {
 	/* HW steering rte flow table list header. */
 	LIST_HEAD(flow_hw_tbl, rte_flow_template_table) flow_hw_tbl;
 	/* HW steering global drop action. */
-	struct mlx5dr_action *hw_drop[MLX5_HW_ACTION_FLAG_MAX]
-				     [MLX5DR_TABLE_TYPE_MAX];
-	/* HW steering global drop action. */
-	struct mlx5dr_action *hw_tag[MLX5_HW_ACTION_FLAG_MAX];
+	struct mlx5dr_action *hw_drop[2];
+	/* HW steering global tag action. */
+	struct mlx5dr_action *hw_tag[2];
 	struct mlx5_indexed_pool *acts_ipool; /* Action data indexed pool. */
 #endif
 };
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index c856d249db..9c44b2e99b 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -999,6 +999,7 @@ static const struct rte_flow_ops mlx5_flow_ops = {
 	.flex_item_create = mlx5_flow_flex_item_create,
 	.flex_item_release = mlx5_flow_flex_item_release,
 	.info_get = mlx5_flow_info_get,
+	.pick_transfer_proxy = mlx5_flow_pick_transfer_proxy,
 	.configure = mlx5_flow_port_configure,
 	.pattern_template_create = mlx5_flow_pattern_template_create,
 	.pattern_template_destroy = mlx5_flow_pattern_template_destroy,
@@ -1242,7 +1243,7 @@ mlx5_get_lowest_priority(struct rte_eth_dev *dev,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 
-	if (!attr->group && !attr->transfer)
+	if (!attr->group && !(attr->transfer && priv->fdb_def_rule))
 		return priv->sh->flow_max_priority - 2;
 	return MLX5_NON_ROOT_FLOW_MAX_PRIO - 1;
 }
@@ -1269,11 +1270,14 @@ mlx5_get_matcher_priority(struct rte_eth_dev *dev,
 	uint16_t priority = (uint16_t)attr->priority;
 	struct mlx5_priv *priv = dev->data->dev_private;
 
+	/* NIC root rules */
 	if (!attr->group && !attr->transfer) {
 		if (attr->priority == MLX5_FLOW_LOWEST_PRIO_INDICATOR)
 			priority = priv->sh->flow_max_priority - 1;
 		return mlx5_os_flow_adjust_priority(dev, priority, subpriority);
-	} else if (!external && attr->transfer && attr->group == 0 &&
+	/* FDB root rules */
+	} else if (attr->transfer && (!external || !priv->fdb_def_rule) &&
+		   attr->group == 0 &&
 		   attr->priority == MLX5_FLOW_LOWEST_PRIO_INDICATOR) {
 		return (priv->sh->flow_max_priority - 1) * 3;
 	}
@@ -2828,8 +2832,8 @@ mlx5_flow_validate_item_tcp(const struct rte_flow_item *item,
  *   Item specification.
  * @param[in] item_flags
  *   Bit-fields that holds the items detected until now.
- * @param[in] attr
- *   Flow rule attributes.
+ * @param root
+ *   Whether action is on root table.
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -2841,7 +2845,7 @@ mlx5_flow_validate_item_vxlan(struct rte_eth_dev *dev,
 			      uint16_t udp_dport,
 			      const struct rte_flow_item *item,
 			      uint64_t item_flags,
-			      const struct rte_flow_attr *attr,
+			      bool root,
 			      struct rte_flow_error *error)
 {
 	const struct rte_flow_item_vxlan *spec = item->spec;
@@ -2878,12 +2882,11 @@ mlx5_flow_validate_item_vxlan(struct rte_eth_dev *dev,
 	if (priv->sh->steering_format_version !=
 	    MLX5_STEERING_LOGIC_FORMAT_CONNECTX_5 ||
 	    !udp_dport || udp_dport == MLX5_UDP_PORT_VXLAN) {
-		/* FDB domain & NIC domain non-zero group */
-		if ((attr->transfer || attr->group) && priv->sh->misc5_cap)
+		/* non-root table */
+		if (!root && priv->sh->misc5_cap)
 			valid_mask = &nic_mask;
 		/* Group zero in NIC domain */
-		if (!attr->group && !attr->transfer &&
-		    priv->sh->tunnel_header_0_1)
+		if (!root && priv->sh->tunnel_header_0_1)
 			valid_mask = &nic_mask;
 	}
 	ret = mlx5_flow_item_acceptable
@@ -3122,11 +3125,11 @@ mlx5_flow_validate_item_gre_option(struct rte_eth_dev *dev,
 	if (mask->checksum_rsvd.checksum || mask->sequence.sequence) {
 		if (priv->sh->steering_format_version ==
 		    MLX5_STEERING_LOGIC_FORMAT_CONNECTX_5 ||
-		    ((attr->group || attr->transfer) &&
+		    ((attr->group || (attr->transfer && priv->fdb_def_rule)) &&
 		     !priv->sh->misc5_cap) ||
 		    (!(priv->sh->tunnel_header_0_1 &&
 		       priv->sh->tunnel_header_2_3) &&
-		    !attr->group && !attr->transfer))
+		    !attr->group && (!attr->transfer || !priv->fdb_def_rule)))
 			return rte_flow_error_set(error, EINVAL,
 						  RTE_FLOW_ERROR_TYPE_ITEM,
 						  item,
@@ -6183,7 +6186,8 @@ flow_create_split_metadata(struct rte_eth_dev *dev,
 	}
 	if (qrss) {
 		/* Check if it is in meter suffix table. */
-		mtr_sfx = attr->group == (attr->transfer ?
+		mtr_sfx = attr->group ==
+			  ((attr->transfer && priv->fdb_def_rule) ?
 			  (MLX5_FLOW_TABLE_LEVEL_METER - 1) :
 			  MLX5_FLOW_TABLE_LEVEL_METER);
 		/*
@@ -11106,3 +11110,43 @@ int mlx5_flow_get_item_vport_id(struct rte_eth_dev *dev,
 
 	return 0;
 }
+
+int
+mlx5_flow_pick_transfer_proxy(struct rte_eth_dev *dev,
+			      uint16_t *proxy_port_id,
+			      struct rte_flow_error *error)
+{
+	const struct mlx5_priv *priv = dev->data->dev_private;
+	uint16_t port_id;
+
+	if (!priv->sh->config.dv_esw_en)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL,
+					  "unable to provide a proxy port"
+					  " without E-Switch configured");
+	if (!priv->master && !priv->representor)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL,
+					  "unable to provide a proxy port"
+					  " for port which is not a master"
+					  " or a representor port");
+	if (priv->master) {
+		*proxy_port_id = dev->data->port_id;
+		return 0;
+	}
+	MLX5_ETH_FOREACH_DEV(port_id, dev->device) {
+		const struct rte_eth_dev *port_dev = &rte_eth_devices[port_id];
+		const struct mlx5_priv *port_priv = port_dev->data->dev_private;
+
+		if (port_priv->master &&
+		    port_priv->domain_id == priv->domain_id) {
+			*proxy_port_id = port_id;
+			return 0;
+		}
+	}
+	return rte_flow_error_set(error, EINVAL,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "unable to find a proxy port");
+}
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index a7235b524d..f661f858c7 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1152,6 +1152,11 @@ struct rte_flow_pattern_template {
 	struct mlx5dr_match_template *mt; /* mlx5 match template. */
 	uint64_t item_flags; /* Item layer flags. */
 	uint32_t refcnt;  /* Reference counter. */
+	/*
+	 * If true, then rule pattern should be prepended with
+	 * represented_port pattern item.
+	 */
+	bool implicit_port;
 };
 
 /* Flow action template struct. */
@@ -1227,6 +1232,7 @@ struct mlx5_hw_action_template {
 /* mlx5 flow group struct. */
 struct mlx5_flow_group {
 	struct mlx5_list_entry entry;
+	struct rte_eth_dev *dev; /* Reference to corresponding device. */
 	struct mlx5dr_table *tbl; /* HWS table object. */
 	struct mlx5_hw_jump_action jump; /* Jump action. */
 	enum mlx5dr_table_type type; /* Table type. */
@@ -1483,6 +1489,9 @@ void flow_hw_clear_port_info(struct rte_eth_dev *dev);
 void flow_hw_init_tags_set(struct rte_eth_dev *dev);
 void flow_hw_clear_tags_set(struct rte_eth_dev *dev);
 
+int flow_hw_create_vport_action(struct rte_eth_dev *dev);
+void flow_hw_destroy_vport_action(struct rte_eth_dev *dev);
+
 typedef int (*mlx5_flow_validate_t)(struct rte_eth_dev *dev,
 				    const struct rte_flow_attr *attr,
 				    const struct rte_flow_item items[],
@@ -2055,7 +2064,7 @@ int mlx5_flow_validate_item_vxlan(struct rte_eth_dev *dev,
 				  uint16_t udp_dport,
 				  const struct rte_flow_item *item,
 				  uint64_t item_flags,
-				  const struct rte_flow_attr *attr,
+				  bool root,
 				  struct rte_flow_error *error);
 int mlx5_flow_validate_item_vxlan_gpe(const struct rte_flow_item *item,
 				      uint64_t item_flags,
@@ -2312,4 +2321,15 @@ int flow_dv_translate_items_hws(const struct rte_flow_item *items,
 				uint32_t key_type, uint64_t *item_flags,
 				uint8_t *match_criteria,
 				struct rte_flow_error *error);
+
+int mlx5_flow_pick_transfer_proxy(struct rte_eth_dev *dev,
+				  uint16_t *proxy_port_id,
+				  struct rte_flow_error *error);
+
+int mlx5_flow_hw_flush_ctrl_flows(struct rte_eth_dev *dev);
+
+int mlx5_flow_hw_esw_create_mgr_sq_miss_flow(struct rte_eth_dev *dev);
+int mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev,
+					 uint32_t txq);
+int mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev);
 #endif /* RTE_PMD_MLX5_FLOW_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 5d3e2d37bb..d0f78cae8e 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -2460,8 +2460,8 @@ flow_dv_validate_item_gtp(struct rte_eth_dev *dev,
  *   Previous validated item in the pattern items.
  * @param[in] gtp_item
  *   Previous GTP item specification.
- * @param[in] attr
- *   Pointer to flow attributes.
+ * @param root
+ *   Whether action is on root table.
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -2472,7 +2472,7 @@ static int
 flow_dv_validate_item_gtp_psc(const struct rte_flow_item *item,
 			      uint64_t last_item,
 			      const struct rte_flow_item *gtp_item,
-			      const struct rte_flow_attr *attr,
+			      bool root,
 			      struct rte_flow_error *error)
 {
 	const struct rte_flow_item_gtp *gtp_spec;
@@ -2497,7 +2497,7 @@ flow_dv_validate_item_gtp_psc(const struct rte_flow_item *item,
 			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM, item,
 			 "GTP E flag must be 1 to match GTP PSC");
 	/* Check the flow is not created in group zero. */
-	if (!attr->transfer && !attr->group)
+	if (root)
 		return rte_flow_error_set
 			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
 			 "GTP PSC is not supported for group 0");
@@ -3362,20 +3362,19 @@ flow_dv_validate_action_set_tag(struct rte_eth_dev *dev,
 /**
  * Indicates whether ASO aging is supported.
  *
- * @param[in] sh
- *   Pointer to shared device context structure.
- * @param[in] attr
- *   Attributes of flow that includes AGE action.
+ * @param[in] priv
+ *   Pointer to device private context structure.
+ * @param[in] root
+ *   Whether action is on root table.
  *
  * @return
  *   True when ASO aging is supported, false otherwise.
  */
 static inline bool
-flow_hit_aso_supported(const struct mlx5_dev_ctx_shared *sh,
-		const struct rte_flow_attr *attr)
+flow_hit_aso_supported(const struct mlx5_priv *priv, bool root)
 {
-	MLX5_ASSERT(sh && attr);
-	return (sh->flow_hit_aso_en && (attr->transfer || attr->group));
+	MLX5_ASSERT(priv);
+	return (priv->sh->flow_hit_aso_en && !root);
 }
 
 /**
@@ -3387,8 +3386,8 @@ flow_hit_aso_supported(const struct mlx5_dev_ctx_shared *sh,
  *   Indicator if action is shared.
  * @param[in] action_flags
  *   Holds the actions detected until now.
- * @param[in] attr
- *   Attributes of flow that includes this action.
+ * @param[in] root
+ *   Whether action is on root table.
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -3398,7 +3397,7 @@ flow_hit_aso_supported(const struct mlx5_dev_ctx_shared *sh,
 static int
 flow_dv_validate_action_count(struct rte_eth_dev *dev, bool shared,
 			      uint64_t action_flags,
-			      const struct rte_flow_attr *attr,
+			      bool root,
 			      struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -3410,7 +3409,7 @@ flow_dv_validate_action_count(struct rte_eth_dev *dev, bool shared,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "duplicate count actions set");
 	if (shared && (action_flags & MLX5_FLOW_ACTION_AGE) &&
-	    !flow_hit_aso_supported(priv->sh, attr))
+	    !flow_hit_aso_supported(priv, root))
 		return rte_flow_error_set(error, EINVAL,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "old age and indirect count combination is not supported");
@@ -3641,8 +3640,8 @@ flow_dv_validate_action_raw_encap_decap
  *   Holds the actions detected until now.
  * @param[in] item_flags
  *   The items found in this flow rule.
- * @param[in] attr
- *   Pointer to flow attributes.
+ * @param root
+ *   Whether action is on root table.
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -3653,12 +3652,12 @@ static int
 flow_dv_validate_action_aso_ct(struct rte_eth_dev *dev,
 			       uint64_t action_flags,
 			       uint64_t item_flags,
-			       const struct rte_flow_attr *attr,
+			       bool root,
 			       struct rte_flow_error *error)
 {
 	RTE_SET_USED(dev);
 
-	if (attr->group == 0 && !attr->transfer)
+	if (root)
 		return rte_flow_error_set(error, ENOTSUP,
 					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 					  NULL,
@@ -4908,6 +4907,8 @@ flow_dv_validate_action_modify_ttl(const uint64_t action_flags,
  *   Pointer to the modify action.
  * @param[in] attr
  *   Pointer to the flow attributes.
+ * @param root
+ *   Whether action is on root table.
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -4920,6 +4921,7 @@ flow_dv_validate_action_modify_field(struct rte_eth_dev *dev,
 				   const uint64_t action_flags,
 				   const struct rte_flow_action *action,
 				   const struct rte_flow_attr *attr,
+				   bool root,
 				   struct rte_flow_error *error)
 {
 	int ret = 0;
@@ -4967,7 +4969,7 @@ flow_dv_validate_action_modify_field(struct rte_eth_dev *dev,
 	}
 	if (action_modify_field->src.field != RTE_FLOW_FIELD_VALUE &&
 	    action_modify_field->src.field != RTE_FLOW_FIELD_POINTER) {
-		if (!attr->transfer && !attr->group)
+		if (root)
 			return rte_flow_error_set(error, ENOTSUP,
 					RTE_FLOW_ERROR_TYPE_ACTION, action,
 					"modify field action is not"
@@ -5057,8 +5059,7 @@ flow_dv_validate_action_modify_field(struct rte_eth_dev *dev,
 	    action_modify_field->src.field == RTE_FLOW_FIELD_IPV4_ECN ||
 	    action_modify_field->dst.field == RTE_FLOW_FIELD_IPV6_ECN ||
 	    action_modify_field->src.field == RTE_FLOW_FIELD_IPV6_ECN)
-		if (!hca_attr->modify_outer_ip_ecn &&
-		    !attr->transfer && !attr->group)
+		if (!hca_attr->modify_outer_ip_ecn && root)
 			return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_ACTION, action,
 				"modifications of the ECN for current firmware is not supported");
@@ -5092,11 +5093,12 @@ flow_dv_validate_action_jump(struct rte_eth_dev *dev,
 			     bool external, struct rte_flow_error *error)
 {
 	uint32_t target_group, table = 0;
+	struct mlx5_priv *priv = dev->data->dev_private;
 	int ret = 0;
 	struct flow_grp_info grp_info = {
 		.external = !!external,
 		.transfer = !!attributes->transfer,
-		.fdb_def_rule = 1,
+		.fdb_def_rule = !!priv->fdb_def_rule,
 		.std_tbl_fix = 0
 	};
 	if (action_flags & (MLX5_FLOW_FATE_ACTIONS |
@@ -5676,6 +5678,8 @@ flow_dv_modify_clone_free_cb(void *tool_ctx, struct mlx5_list_entry *entry)
  *   Pointer to the COUNT action in sample action list.
  * @param[out] fdb_mirror_limit
  *   Pointer to the FDB mirror limitation flag.
+ * @param root
+ *   Whether action is on root table.
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -5692,6 +5696,7 @@ flow_dv_validate_action_sample(uint64_t *action_flags,
 			       const struct rte_flow_action_rss **sample_rss,
 			       const struct rte_flow_action_count **count,
 			       int *fdb_mirror_limit,
+			       bool root,
 			       struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -5793,7 +5798,7 @@ flow_dv_validate_action_sample(uint64_t *action_flags,
 		case RTE_FLOW_ACTION_TYPE_COUNT:
 			ret = flow_dv_validate_action_count
 				(dev, false, *action_flags | sub_action_flags,
-				 attr, error);
+				 root, error);
 			if (ret < 0)
 				return ret;
 			*count = act->conf;
@@ -7273,7 +7278,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 		case RTE_FLOW_ITEM_TYPE_VXLAN:
 			ret = mlx5_flow_validate_item_vxlan(dev, udp_dport,
 							    items, item_flags,
-							    attr, error);
+							    is_root, error);
 			if (ret < 0)
 				return ret;
 			last_item = MLX5_FLOW_LAYER_VXLAN;
@@ -7367,7 +7372,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			break;
 		case RTE_FLOW_ITEM_TYPE_GTP_PSC:
 			ret = flow_dv_validate_item_gtp_psc(items, last_item,
-							    gtp_item, attr,
+							    gtp_item, is_root,
 							    error);
 			if (ret < 0)
 				return ret;
@@ -7584,7 +7589,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 		case RTE_FLOW_ACTION_TYPE_COUNT:
 			ret = flow_dv_validate_action_count(dev, shared_count,
 							    action_flags,
-							    attr, error);
+							    is_root, error);
 			if (ret < 0)
 				return ret;
 			count = actions->conf;
@@ -7878,7 +7883,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			rw_act_num += MLX5_ACT_NUM_SET_TAG;
 			break;
 		case MLX5_RTE_FLOW_ACTION_TYPE_AGE:
-			if (!attr->transfer && !attr->group)
+			if (is_root)
 				return rte_flow_error_set(error, ENOTSUP,
 						RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 									   NULL,
@@ -7903,7 +7908,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			 * Validate the regular AGE action (using counter)
 			 * mutual exclusion with indirect counter actions.
 			 */
-			if (!flow_hit_aso_supported(priv->sh, attr)) {
+			if (!flow_hit_aso_supported(priv, is_root)) {
 				if (shared_count)
 					return rte_flow_error_set
 						(error, EINVAL,
@@ -7959,6 +7964,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 							     rss, &sample_rss,
 							     &sample_count,
 							     &fdb_mirror_limit,
+							     is_root,
 							     error);
 			if (ret < 0)
 				return ret;
@@ -7975,6 +7981,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 								   action_flags,
 								   actions,
 								   attr,
+								   is_root,
 								   error);
 			if (ret < 0)
 				return ret;
@@ -7988,8 +7995,8 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			break;
 		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
 			ret = flow_dv_validate_action_aso_ct(dev, action_flags,
-							     item_flags, attr,
-							     error);
+							     item_flags,
+							     is_root, error);
 			if (ret < 0)
 				return ret;
 			action_flags |= MLX5_FLOW_ACTION_CT;
@@ -9189,15 +9196,18 @@ flow_dv_translate_item_vxlan(struct rte_eth_dev *dev,
 	if (MLX5_ITEM_VALID(item, key_type))
 		return;
 	MLX5_ITEM_UPDATE(item, key_type, vxlan_v, vxlan_m, &nic_mask);
-	if (item->mask == &nic_mask &&
-	    ((!attr->group && !priv->sh->tunnel_header_0_1) ||
-	    (attr->group && !priv->sh->misc5_cap)))
+	if ((item->mask == &nic_mask) &&
+	    ((!attr->group && !(attr->transfer && priv->fdb_def_rule) &&
+	    !priv->sh->tunnel_header_0_1) ||
+	    ((attr->group || (attr->transfer && priv->fdb_def_rule)) &&
+	    !priv->sh->misc5_cap)))
 		vxlan_m = &rte_flow_item_vxlan_mask;
 	if ((priv->sh->steering_format_version ==
 	     MLX5_STEERING_LOGIC_FORMAT_CONNECTX_5 &&
 	     dport != MLX5_UDP_PORT_VXLAN) ||
-	    (!attr->group && !attr->transfer) ||
-	    ((attr->group || attr->transfer) && !priv->sh->misc5_cap)) {
+	    (!attr->group && !(attr->transfer && priv->fdb_def_rule)) ||
+	    ((attr->group || (attr->transfer && priv->fdb_def_rule)) &&
+	    !priv->sh->misc5_cap)) {
 		misc_v = MLX5_ADDR_OF(fte_match_param, key, misc_parameters);
 		size = sizeof(vxlan_m->vni);
 		vni_v = MLX5_ADDR_OF(fte_match_set_misc, misc_v, vxlan_vni);
@@ -14169,7 +14179,7 @@ flow_dv_translate(struct rte_eth_dev *dev,
 			 */
 			if (action_flags & MLX5_FLOW_ACTION_AGE) {
 				if ((non_shared_age && count) ||
-				    !flow_hit_aso_supported(priv->sh, attr)) {
+				    !flow_hit_aso_supported(priv, !dev_flow->dv.group)) {
 					/* Creates age by counters. */
 					cnt_act = flow_dv_prepare_counter
 								(dev, dev_flow,
@@ -18318,6 +18328,7 @@ flow_dv_action_validate(struct rte_eth_dev *dev,
 			struct rte_flow_error *err)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	/* called from RTE API */
 
 	RTE_SET_USED(conf);
 	switch (action->type) {
@@ -18345,7 +18356,7 @@ flow_dv_action_validate(struct rte_eth_dev *dev,
 						"Indirect age action not supported");
 		return flow_dv_validate_action_age(0, action, dev, err);
 	case RTE_FLOW_ACTION_TYPE_COUNT:
-		return flow_dv_validate_action_count(dev, true, 0, NULL, err);
+		return flow_dv_validate_action_count(dev, true, 0, false, err);
 	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
 		if (!priv->sh->ct_aso_en)
 			return rte_flow_error_set(err, ENOTSUP,
@@ -18522,6 +18533,8 @@ flow_dv_validate_mtr_policy_acts(struct rte_eth_dev *dev,
 	bool def_green = false;
 	bool def_yellow = false;
 	const struct rte_flow_action_rss *rss_color[RTE_COLORS] = {NULL};
+	/* Called from RTE API */
+	bool is_root = !(attr->group || (attr->transfer && priv->fdb_def_rule));
 
 	if (!dev_conf->dv_esw_en)
 		def_domain &= ~MLX5_MTR_DOMAIN_TRANSFER_BIT;
@@ -18723,7 +18736,7 @@ flow_dv_validate_mtr_policy_acts(struct rte_eth_dev *dev,
 				break;
 			case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
 				ret = flow_dv_validate_action_modify_field(dev,
-					action_flags[i], act, attr, &flow_err);
+					action_flags[i], act, attr, is_root, &flow_err);
 				if (ret < 0)
 					return -rte_mtr_error_set(error,
 					  ENOTSUP,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 1f98e1248a..004eacc334 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -20,6 +20,14 @@
 /* Default queue to flush the flows. */
 #define MLX5_DEFAULT_FLUSH_QUEUE 0
 
+/* Maximum number of rules in control flow tables */
+#define MLX5_HW_CTRL_FLOW_NB_RULES (4096)
+
+/* Flow group for SQ miss default flows. */
+#define MLX5_HW_SQ_MISS_GROUP (UINT32_MAX)
+
+static int flow_hw_flush_all_ctrl_flows(struct rte_eth_dev *dev);
+
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops;
 
 /* DR action flags with different table. */
@@ -802,6 +810,77 @@ flow_hw_modify_field_compile(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static int
+flow_hw_represented_port_compile(struct rte_eth_dev *dev,
+				 const struct rte_flow_attr *attr,
+				 const struct rte_flow_action *action_start,
+				 const struct rte_flow_action *action,
+				 const struct rte_flow_action *action_mask,
+				 struct mlx5_hw_actions *acts,
+				 uint16_t action_dst,
+				 struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_action_ethdev *v = action->conf;
+	const struct rte_flow_action_ethdev *m = action_mask->conf;
+	int ret;
+
+	if (!attr->group)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ATTR, NULL,
+					  "represented_port action cannot"
+					  " be used on group 0");
+	if (!attr->transfer)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ATTR_TRANSFER,
+					  NULL,
+					  "represented_port action requires"
+					  " transfer attribute");
+	if (attr->ingress || attr->egress)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ATTR, NULL,
+					  "represented_port action cannot"
+					  " be used with direction attributes");
+	if (!priv->master)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  "represented_port action must"
+					  " be used on proxy port");
+	if (m && !!m->port_id) {
+		struct mlx5_priv *port_priv;
+
+		port_priv = mlx5_port_to_eswitch_info(v->port_id, false);
+		if (port_priv == NULL)
+			return rte_flow_error_set
+					(error, EINVAL,
+					 RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					 "port does not exist or unable to"
+					 " obtain E-Switch info for port");
+		MLX5_ASSERT(priv->hw_vport != NULL);
+		if (priv->hw_vport[v->port_id]) {
+			acts->rule_acts[action_dst].action =
+					priv->hw_vport[v->port_id];
+		} else {
+			return rte_flow_error_set
+					(error, EINVAL,
+					 RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					 "cannot use represented_port action"
+					 " with this port");
+		}
+	} else {
+		ret = __flow_hw_act_data_general_append
+				(priv, acts, action->type,
+				 action - action_start, action_dst);
+		if (ret)
+			return rte_flow_error_set
+					(error, ENOMEM,
+					 RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					 "not enough memory to store"
+					 " vport action");
+	}
+	return 0;
+}
+
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -879,7 +958,7 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			break;
 		case RTE_FLOW_ACTION_TYPE_DROP:
 			acts->rule_acts[i++].action =
-				priv->hw_drop[!!attr->group][type];
+				priv->hw_drop[!!attr->group];
 			break;
 		case RTE_FLOW_ACTION_TYPE_MARK:
 			acts->mark = true;
@@ -1012,6 +1091,13 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			if (err)
 				goto err;
 			break;
+		case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
+			if (flow_hw_represented_port_compile
+					(dev, attr, action_start, actions,
+					 masks, acts, i, error))
+				goto err;
+			i++;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -1334,11 +1420,13 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			  struct mlx5dr_rule_action *rule_acts,
 			  uint32_t *acts_num)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow_template_table *table = job->flow->table;
 	struct mlx5_action_construct_data *act_data;
 	const struct rte_flow_action *action;
 	const struct rte_flow_action_raw_encap *raw_encap_data;
 	const struct rte_flow_item *enc_item = NULL;
+	const struct rte_flow_action_ethdev *port_action = NULL;
 	uint8_t *buf = job->encap_data;
 	struct rte_flow_attr attr = {
 			.ingress = 1,
@@ -1458,6 +1546,13 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			if (ret)
 				return -1;
 			break;
+		case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
+			port_action = action->conf;
+			if (!priv->hw_vport[port_action->port_id])
+				return -1;
+			rule_acts[act_data->action_dst].action =
+					priv->hw_vport[port_action->port_id];
+			break;
 		default:
 			break;
 		}
@@ -1470,6 +1565,52 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static const struct rte_flow_item *
+flow_hw_get_rule_items(struct rte_eth_dev *dev,
+		       struct rte_flow_template_table *table,
+		       const struct rte_flow_item items[],
+		       uint8_t pattern_template_index,
+		       struct mlx5_hw_q_job *job)
+{
+	if (table->its[pattern_template_index]->implicit_port) {
+		const struct rte_flow_item *curr_item;
+		unsigned int nb_items;
+		bool found_end;
+		unsigned int i;
+
+		/* Count number of pattern items. */
+		nb_items = 0;
+		found_end = false;
+		for (curr_item = items; !found_end; ++curr_item) {
+			++nb_items;
+			if (curr_item->type == RTE_FLOW_ITEM_TYPE_END)
+				found_end = true;
+		}
+		/* Prepend represented port item. */
+		job->port_spec = (struct rte_flow_item_ethdev){
+			.port_id = dev->data->port_id,
+		};
+		job->items[0] = (struct rte_flow_item){
+			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+			.spec = &job->port_spec,
+		};
+		found_end = false;
+		for (i = 1; i < MLX5_HW_MAX_ITEMS && i - 1 < nb_items; ++i) {
+			job->items[i] = items[i - 1];
+			if (items[i - 1].type == RTE_FLOW_ITEM_TYPE_END) {
+				found_end = true;
+				break;
+			}
+		}
+		if (i >= MLX5_HW_MAX_ITEMS && !found_end) {
+			rte_errno = ENOMEM;
+			return NULL;
+		}
+		return job->items;
+	}
+	return items;
+}
+
 /**
  * Enqueue HW steering flow creation.
  *
@@ -1521,6 +1662,7 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	struct mlx5_hw_actions *hw_acts;
 	struct rte_flow_hw *flow;
 	struct mlx5_hw_q_job *job;
+	const struct rte_flow_item *rule_items;
 	uint32_t acts_num, flow_idx;
 	int ret;
 
@@ -1547,15 +1689,23 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	job->user_data = user_data;
 	rule_attr.user_data = job;
 	hw_acts = &table->ats[action_template_index].acts;
-	/* Construct the flow action array based on the input actions.*/
-	flow_hw_actions_construct(dev, job, hw_acts, pattern_template_index,
-				  actions, rule_acts, &acts_num);
+	/* Construct the flow actions based on the input actions. */
+	if (flow_hw_actions_construct(dev, job, hw_acts, pattern_template_index,
+				  actions, rule_acts, &acts_num)) {
+		rte_errno = EINVAL;
+		goto free;
+	}
+	rule_items = flow_hw_get_rule_items(dev, table, items,
+					    pattern_template_index, job);
+	if (!rule_items)
+		goto free;
 	ret = mlx5dr_rule_create(table->matcher,
-				 pattern_template_index, items,
+				 pattern_template_index, rule_items,
 				 action_template_index, rule_acts,
 				 &rule_attr, &flow->rule);
 	if (likely(!ret))
 		return (struct rte_flow *)flow;
+free:
 	/* Flow created fail, return the descriptor and flow memory. */
 	mlx5_ipool_free(table->flow, flow_idx);
 	priv->hw_q[queue].job_idx++;
@@ -1736,7 +1886,9 @@ __flow_hw_pull_comp(struct rte_eth_dev *dev,
 	struct rte_flow_op_result comp[BURST_THR];
 	int ret, i, empty_loop = 0;
 
-	flow_hw_push(dev, queue, error);
+	ret = flow_hw_push(dev, queue, error);
+	if (ret < 0)
+		return ret;
 	while (pending_rules) {
 		ret = flow_hw_pull(dev, queue, comp, BURST_THR, error);
 		if (ret < 0)
@@ -2021,8 +2173,12 @@ flow_hw_table_destroy(struct rte_eth_dev *dev,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	int i;
+	uint32_t fidx = 1;
 
-	if (table->refcnt) {
+	/* Build ipool allocated object bitmap. */
+	mlx5_ipool_flush_cache(table->flow);
+	/* Check if ipool has allocated objects. */
+	if (table->refcnt || mlx5_ipool_get_next(table->flow, &fidx)) {
 		DRV_LOG(WARNING, "Table %p is still in using.", (void *)table);
 		return rte_flow_error_set(error, EBUSY,
 				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
@@ -2101,7 +2257,51 @@ flow_hw_validate_action_modify_field(const struct rte_flow_action *action,
 }
 
 static int
-flow_hw_action_validate(const struct rte_flow_action actions[],
+flow_hw_validate_action_represented_port(struct rte_eth_dev *dev,
+					 const struct rte_flow_action *action,
+					 const struct rte_flow_action *mask,
+					 struct rte_flow_error *error)
+{
+	const struct rte_flow_action_ethdev *action_conf = action->conf;
+	const struct rte_flow_action_ethdev *mask_conf = mask->conf;
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (!priv->sh->config.dv_esw_en)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  "cannot use represented_port actions"
+					  " without an E-Switch");
+	if (mask_conf->port_id) {
+		struct mlx5_priv *port_priv;
+		struct mlx5_priv *dev_priv;
+
+		port_priv = mlx5_port_to_eswitch_info(action_conf->port_id, false);
+		if (!port_priv)
+			return rte_flow_error_set(error, rte_errno,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  action,
+						  "failed to obtain E-Switch"
+						  " info for port");
+		dev_priv = mlx5_dev_to_eswitch_info(dev);
+		if (!dev_priv)
+			return rte_flow_error_set(error, rte_errno,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  action,
+						  "failed to obtain E-Switch"
+						  " info for transfer proxy");
+		if (port_priv->domain_id != dev_priv->domain_id)
+			return rte_flow_error_set(error, rte_errno,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  action,
+						  "cannot forward to port from"
+						  " a different E-Switch");
+	}
+	return 0;
+}
+
+static int
+flow_hw_action_validate(struct rte_eth_dev *dev,
+			const struct rte_flow_action actions[],
 			const struct rte_flow_action masks[],
 			struct rte_flow_error *error)
 {
@@ -2164,6 +2364,12 @@ flow_hw_action_validate(const struct rte_flow_action actions[],
 			if (ret < 0)
 				return ret;
 			break;
+		case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
+			ret = flow_hw_validate_action_represented_port
+					(dev, action, mask, error);
+			if (ret < 0)
+				return ret;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -2205,7 +2411,7 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 	int len, act_len, mask_len, i;
 	struct rte_flow_actions_template *at;
 
-	if (flow_hw_action_validate(actions, masks, error))
+	if (flow_hw_action_validate(dev, actions, masks, error))
 		return NULL;
 	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS,
 				NULL, 0, actions, error);
@@ -2288,6 +2494,46 @@ flow_hw_actions_template_destroy(struct rte_eth_dev *dev __rte_unused,
 	return 0;
 }
 
+static struct rte_flow_item *
+flow_hw_copy_prepend_port_item(const struct rte_flow_item *items,
+			       struct rte_flow_error *error)
+{
+	const struct rte_flow_item *curr_item;
+	struct rte_flow_item *copied_items;
+	bool found_end;
+	unsigned int nb_items;
+	unsigned int i;
+	size_t size;
+
+	/* Count number of pattern items. */
+	nb_items = 0;
+	found_end = false;
+	for (curr_item = items; !found_end; ++curr_item) {
+		++nb_items;
+		if (curr_item->type == RTE_FLOW_ITEM_TYPE_END)
+			found_end = true;
+	}
+	/* Allocate new array of items and prepend REPRESENTED_PORT item. */
+	size = sizeof(*copied_items) * (nb_items + 1);
+	copied_items = mlx5_malloc(MLX5_MEM_ZERO, size, 0, rte_socket_id());
+	if (!copied_items) {
+		rte_flow_error_set(error, ENOMEM,
+				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				   NULL,
+				   "cannot allocate item template");
+		return NULL;
+	}
+	copied_items[0] = (struct rte_flow_item){
+		.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+		.spec = NULL,
+		.last = NULL,
+		.mask = &rte_flow_item_ethdev_mask,
+	};
+	for (i = 1; i < nb_items + 1; ++i)
+		copied_items[i] = items[i - 1];
+	return copied_items;
+}
+
 /**
  * Create flow item template.
  *
@@ -2311,9 +2557,35 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow_pattern_template *it;
+	struct rte_flow_item *copied_items = NULL;
+	const struct rte_flow_item *tmpl_items;
 
+	if (priv->sh->config.dv_esw_en && attr->ingress) {
+		/*
+		 * Disallow pattern template with ingress and egress/transfer
+		 * attributes in order to forbid implicit port matching
+		 * on egress and transfer traffic.
+		 */
+		if (attr->egress || attr->transfer) {
+			rte_flow_error_set(error, EINVAL,
+					   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					   NULL,
+					   "item template for ingress traffic"
+					   " cannot be used for egress/transfer"
+					   " traffic when E-Switch is enabled");
+			return NULL;
+		}
+		copied_items = flow_hw_copy_prepend_port_item(items, error);
+		if (!copied_items)
+			return NULL;
+		tmpl_items = copied_items;
+	} else {
+		tmpl_items = items;
+	}
 	it = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*it), 0, rte_socket_id());
 	if (!it) {
+		if (copied_items)
+			mlx5_free(copied_items);
 		rte_flow_error_set(error, ENOMEM,
 				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				   NULL,
@@ -2321,8 +2593,10 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 		return NULL;
 	}
 	it->attr = *attr;
-	it->mt = mlx5dr_match_template_create(items, attr->relaxed_matching);
+	it->mt = mlx5dr_match_template_create(tmpl_items, attr->relaxed_matching);
 	if (!it->mt) {
+		if (copied_items)
+			mlx5_free(copied_items);
 		mlx5_free(it);
 		rte_flow_error_set(error, rte_errno,
 				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
@@ -2330,9 +2604,12 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 				   "cannot create match template");
 		return NULL;
 	}
-	it->item_flags = flow_hw_rss_item_flags_get(items);
+	it->item_flags = flow_hw_rss_item_flags_get(tmpl_items);
+	it->implicit_port = !!copied_items;
 	__atomic_fetch_add(&it->refcnt, 1, __ATOMIC_RELAXED);
 	LIST_INSERT_HEAD(&priv->flow_hw_itt, it, next);
+	if (copied_items)
+		mlx5_free(copied_items);
 	return it;
 }
 
@@ -2458,6 +2735,7 @@ flow_hw_grp_create_cb(void *tool_ctx, void *cb_ctx)
 			goto error;
 		grp_data->jump.root_action = jump;
 	}
+	grp_data->dev = dev;
 	grp_data->idx = idx;
 	grp_data->group_id = attr->group;
 	grp_data->type = dr_tbl_attr.type;
@@ -2526,7 +2804,8 @@ flow_hw_grp_match_cb(void *tool_ctx __rte_unused, struct mlx5_list_entry *entry,
 	struct rte_flow_attr *attr =
 			(struct rte_flow_attr *)ctx->data;
 
-	return (grp_data->group_id != attr->group) ||
+	return (grp_data->dev != ctx->dev) ||
+		(grp_data->group_id != attr->group) ||
 		((grp_data->type != MLX5DR_TABLE_TYPE_FDB) &&
 		attr->transfer) ||
 		((grp_data->type != MLX5DR_TABLE_TYPE_NIC_TX) &&
@@ -2589,6 +2868,545 @@ flow_hw_grp_clone_free_cb(void *tool_ctx, struct mlx5_list_entry *entry)
 	mlx5_ipool_free(sh->ipool[MLX5_IPOOL_HW_GRP], grp_data->idx);
 }
 
+/**
+ * Create and cache a vport action for given @p dev port. vport actions
+ * cache is used in HWS with FDB flows.
+ *
+ * This function does not create any action if the proxy port for @p dev port
+ * was not configured for HW Steering.
+ *
+ * This function assumes that E-Switch is enabled and PMD is running with
+ * HW Steering configured.
+ *
+ * @param dev
+ *   Pointer to Ethernet device which will be the action destination.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise.
+ */
+int
+flow_hw_create_vport_action(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_eth_dev *proxy_dev;
+	struct mlx5_priv *proxy_priv;
+	uint16_t port_id = dev->data->port_id;
+	uint16_t proxy_port_id = port_id;
+	int ret;
+
+	ret = mlx5_flow_pick_transfer_proxy(dev, &proxy_port_id, NULL);
+	if (ret)
+		return ret;
+	proxy_dev = &rte_eth_devices[proxy_port_id];
+	proxy_priv = proxy_dev->data->dev_private;
+	if (!proxy_priv->hw_vport)
+		return 0;
+	if (proxy_priv->hw_vport[port_id]) {
+		DRV_LOG(ERR, "port %u HWS vport action already created",
+			port_id);
+		return -EINVAL;
+	}
+	proxy_priv->hw_vport[port_id] = mlx5dr_action_create_dest_vport
+			(proxy_priv->dr_ctx, priv->dev_port,
+			 MLX5DR_ACTION_FLAG_HWS_FDB);
+	if (!proxy_priv->hw_vport[port_id]) {
+		DRV_LOG(ERR, "port %u unable to create HWS vport action",
+			port_id);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+/**
+ * Destroys the vport action associated with @p dev device
+ * from actions' cache.
+ *
+ * This function does not destroy any action if there is no action cached
+ * for @p dev or proxy port was not configured for HW Steering.
+ *
+ * This function assumes that E-Switch is enabled and PMD is running with
+ * HW Steering configured.
+ *
+ * @param dev
+ *   Pointer to Ethernet device which will be the action destination.
+ */
+void
+flow_hw_destroy_vport_action(struct rte_eth_dev *dev)
+{
+	struct rte_eth_dev *proxy_dev;
+	struct mlx5_priv *proxy_priv;
+	uint16_t port_id = dev->data->port_id;
+	uint16_t proxy_port_id = port_id;
+
+	if (mlx5_flow_pick_transfer_proxy(dev, &proxy_port_id, NULL))
+		return;
+	proxy_dev = &rte_eth_devices[proxy_port_id];
+	proxy_priv = proxy_dev->data->dev_private;
+	if (!proxy_priv->hw_vport || !proxy_priv->hw_vport[port_id])
+		return;
+	mlx5dr_action_destroy(proxy_priv->hw_vport[port_id]);
+	proxy_priv->hw_vport[port_id] = NULL;
+}
+
+static int
+flow_hw_create_vport_actions(struct mlx5_priv *priv)
+{
+	uint16_t port_id;
+
+	MLX5_ASSERT(!priv->hw_vport);
+	priv->hw_vport = mlx5_malloc(MLX5_MEM_ZERO,
+				     sizeof(*priv->hw_vport) * RTE_MAX_ETHPORTS,
+				     0, SOCKET_ID_ANY);
+	if (!priv->hw_vport)
+		return -ENOMEM;
+	DRV_LOG(DEBUG, "port %u :: creating vport actions", priv->dev_data->port_id);
+	DRV_LOG(DEBUG, "port %u ::    domain_id=%u", priv->dev_data->port_id, priv->domain_id);
+	MLX5_ETH_FOREACH_DEV(port_id, NULL) {
+		struct mlx5_priv *port_priv = rte_eth_devices[port_id].data->dev_private;
+
+		if (!port_priv ||
+		    port_priv->domain_id != priv->domain_id)
+			continue;
+		DRV_LOG(DEBUG, "port %u :: for port_id=%u, calling mlx5dr_action_create_dest_vport() with ibport=%u",
+			priv->dev_data->port_id, port_id, port_priv->dev_port);
+		priv->hw_vport[port_id] = mlx5dr_action_create_dest_vport
+				(priv->dr_ctx, port_priv->dev_port,
+				 MLX5DR_ACTION_FLAG_HWS_FDB);
+		DRV_LOG(DEBUG, "port %u :: priv->hw_vport[%u]=%p",
+			priv->dev_data->port_id, port_id, (void *)priv->hw_vport[port_id]);
+		if (!priv->hw_vport[port_id])
+			return -EINVAL;
+	}
+	return 0;
+}
+
+static void
+flow_hw_free_vport_actions(struct mlx5_priv *priv)
+{
+	uint16_t port_id;
+
+	if (!priv->hw_vport)
+		return;
+	for (port_id = 0; port_id < RTE_MAX_ETHPORTS; ++port_id)
+		if (priv->hw_vport[port_id])
+			mlx5dr_action_destroy(priv->hw_vport[port_id]);
+	mlx5_free(priv->hw_vport);
+	priv->hw_vport = NULL;
+}
+
+/**
+ * Creates a flow pattern template used to match on E-Switch Manager.
+ * This template is used to set up a table for SQ miss default flow.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow pattern template on success, NULL otherwise.
+ */
+static struct rte_flow_pattern_template *
+flow_hw_create_ctrl_esw_mgr_pattern_template(struct rte_eth_dev *dev)
+{
+	struct rte_flow_pattern_template_attr attr = {
+		.relaxed_matching = 0,
+		.transfer = 1,
+	};
+	struct rte_flow_item_ethdev port_spec = {
+		.port_id = MLX5_REPRESENTED_PORT_ESW_MGR,
+	};
+	struct rte_flow_item_ethdev port_mask = {
+		.port_id = UINT16_MAX,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+			.spec = &port_spec,
+			.mask = &port_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+
+	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
+}
+
+/**
+ * Creates a flow pattern template used to match on a TX queue.
+ * This template is used to set up a table for SQ miss default flow.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow pattern template on success, NULL otherwise.
+ */
+static struct rte_flow_pattern_template *
+flow_hw_create_ctrl_sq_pattern_template(struct rte_eth_dev *dev)
+{
+	struct rte_flow_pattern_template_attr attr = {
+		.relaxed_matching = 0,
+		.transfer = 1,
+	};
+	struct mlx5_rte_flow_item_tx_queue queue_mask = {
+		.queue = UINT32_MAX,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = (enum rte_flow_item_type)
+				MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
+			.mask = &queue_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+
+	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
+}
+
+/**
+ * Creates a flow pattern template with unmasked represented port matching.
+ * This template is used to set up a table for default transfer flows
+ * directing packets to group 1.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow pattern template on success, NULL otherwise.
+ */
+static struct rte_flow_pattern_template *
+flow_hw_create_ctrl_port_pattern_template(struct rte_eth_dev *dev)
+{
+	struct rte_flow_pattern_template_attr attr = {
+		.relaxed_matching = 0,
+		.transfer = 1,
+	};
+	struct rte_flow_item_ethdev port_mask = {
+		.port_id = UINT16_MAX,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+			.mask = &port_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+
+	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
+}
+
+/**
+ * Creates a flow actions template with an unmasked JUMP action. Flows
+ * based on this template will perform a jump to some group. This template
+ * is used to set up tables for control flows.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param group
+ *   Destination group for this action template.
+ *
+ * @return
+ *   Pointer to flow actions template on success, NULL otherwise.
+ */
+static struct rte_flow_actions_template *
+flow_hw_create_ctrl_jump_actions_template(struct rte_eth_dev *dev,
+					  uint32_t group)
+{
+	struct rte_flow_actions_template_attr attr = {
+		.transfer = 1,
+	};
+	struct rte_flow_action_jump jump_v = {
+		.group = group,
+	};
+	struct rte_flow_action_jump jump_m = {
+		.group = UINT32_MAX,
+	};
+	struct rte_flow_action actions_v[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump_v,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+	struct rte_flow_action actions_m[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump_m,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+
+	return flow_hw_actions_template_create(dev, &attr, actions_v, actions_m,
+					       NULL);
+}
+
+/**
+ * Creates a flow actions template with an unmasked REPRESENTED_PORT action.
+ * It is used to create control flow tables.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow action template on success, NULL otherwise.
+ */
+static struct rte_flow_actions_template *
+flow_hw_create_ctrl_port_actions_template(struct rte_eth_dev *dev)
+{
+	struct rte_flow_actions_template_attr attr = {
+		.transfer = 1,
+	};
+	struct rte_flow_action_ethdev port_v = {
+		.port_id = 0,
+	};
+	struct rte_flow_action actions_v[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT,
+			.conf = &port_v,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+	struct rte_flow_action_ethdev port_m = {
+		.port_id = 0,
+	};
+	struct rte_flow_action actions_m[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT,
+			.conf = &port_m,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+
+	return flow_hw_actions_template_create(dev, &attr, actions_v, actions_m,
+					       NULL);
+}
+
+/**
+ * Creates a root control flow table used to redirect traffic originating from
+ * the E-Switch Manager in group 0 to the SQ miss group.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param it
+ *   Pointer to flow pattern template.
+ * @param at
+ *   Pointer to flow actions template.
+ *
+ * @return
+ *   Pointer to flow table on success, NULL otherwise.
+ */
+static struct rte_flow_template_table*
+flow_hw_create_ctrl_sq_miss_root_table(struct rte_eth_dev *dev,
+				       struct rte_flow_pattern_template *it,
+				       struct rte_flow_actions_template *at)
+{
+	struct rte_flow_template_table_attr attr = {
+		.flow_attr = {
+			.group = 0,
+			.priority = 0,
+			.ingress = 0,
+			.egress = 0,
+			.transfer = 1,
+		},
+		.nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
+	};
+
+	return flow_hw_table_create(dev, &attr, &it, 1, &at, 1, NULL);
+}
+
+
+/**
+ * Creates a control flow table in group MLX5_HW_SQ_MISS_GROUP used to forward
+ * SQ miss traffic, matched by TX queue, to the destination represented port.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param it
+ *   Pointer to flow pattern template.
+ * @param at
+ *   Pointer to flow actions template.
+ *
+ * @return
+ *   Pointer to flow table on success, NULL otherwise.
+ */
+static struct rte_flow_template_table*
+flow_hw_create_ctrl_sq_miss_table(struct rte_eth_dev *dev,
+				  struct rte_flow_pattern_template *it,
+				  struct rte_flow_actions_template *at)
+{
+	struct rte_flow_template_table_attr attr = {
+		.flow_attr = {
+			.group = MLX5_HW_SQ_MISS_GROUP,
+			.priority = 0,
+			.ingress = 0,
+			.egress = 0,
+			.transfer = 1,
+		},
+		.nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
+	};
+
+	return flow_hw_table_create(dev, &attr, &it, 1, &at, 1, NULL);
+}
+
+/**
+ * Creates a control flow table used to transfer traffic
+ * from group 0 to group 1.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param it
+ *   Pointer to flow pattern template.
+ * @param at
+ *   Pointer to flow actions template.
+ *
+ * @return
+ *   Pointer to flow table on success, NULL otherwise.
+ */
+static struct rte_flow_template_table *
+flow_hw_create_ctrl_jump_table(struct rte_eth_dev *dev,
+			       struct rte_flow_pattern_template *it,
+			       struct rte_flow_actions_template *at)
+{
+	struct rte_flow_template_table_attr attr = {
+		.flow_attr = {
+			.group = 0,
+			.priority = 15, /* TODO: Flow priority discovery. */
+			.ingress = 0,
+			.egress = 0,
+			.transfer = 1,
+		},
+		.nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
+	};
+
+	return flow_hw_table_create(dev, &attr, &it, 1, &at, 1, NULL);
+}
+
+/**
+ * Creates a set of flow tables used to create control flows used
+ * when E-Switch is engaged.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise.
+ */
+static __rte_unused int
+flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_flow_pattern_template *esw_mgr_items_tmpl = NULL;
+	struct rte_flow_pattern_template *sq_items_tmpl = NULL;
+	struct rte_flow_pattern_template *port_items_tmpl = NULL;
+	struct rte_flow_actions_template *jump_sq_actions_tmpl = NULL;
+	struct rte_flow_actions_template *port_actions_tmpl = NULL;
+	struct rte_flow_actions_template *jump_one_actions_tmpl = NULL;
+
+	/* Item templates */
+	esw_mgr_items_tmpl = flow_hw_create_ctrl_esw_mgr_pattern_template(dev);
+	if (!esw_mgr_items_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create E-Switch Manager item"
+			" template for control flows", dev->data->port_id);
+		goto error;
+	}
+	sq_items_tmpl = flow_hw_create_ctrl_sq_pattern_template(dev);
+	if (!sq_items_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create SQ item template for"
+			" control flows", dev->data->port_id);
+		goto error;
+	}
+	port_items_tmpl = flow_hw_create_ctrl_port_pattern_template(dev);
+	if (!port_items_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create port item template for"
+			" control flows", dev->data->port_id);
+		goto error;
+	}
+	/* Action templates */
+	jump_sq_actions_tmpl = flow_hw_create_ctrl_jump_actions_template(dev,
+									 MLX5_HW_SQ_MISS_GROUP);
+	if (!jump_sq_actions_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create jump action template"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
+	port_actions_tmpl = flow_hw_create_ctrl_port_actions_template(dev);
+	if (!port_actions_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create port action template"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
+	jump_one_actions_tmpl = flow_hw_create_ctrl_jump_actions_template(dev, 1);
+	if (!jump_one_actions_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create jump action template"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
+	/* Tables */
+	MLX5_ASSERT(priv->hw_esw_sq_miss_root_tbl == NULL);
+	priv->hw_esw_sq_miss_root_tbl = flow_hw_create_ctrl_sq_miss_root_table
+			(dev, esw_mgr_items_tmpl, jump_sq_actions_tmpl);
+	if (!priv->hw_esw_sq_miss_root_tbl) {
+		DRV_LOG(ERR, "port %u failed to create table for default sq miss (root table)"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
+	MLX5_ASSERT(priv->hw_esw_sq_miss_tbl == NULL);
+	priv->hw_esw_sq_miss_tbl = flow_hw_create_ctrl_sq_miss_table(dev, sq_items_tmpl,
+								     port_actions_tmpl);
+	if (!priv->hw_esw_sq_miss_tbl) {
+		DRV_LOG(ERR, "port %u failed to create table for default sq miss (non-root table)"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
+	MLX5_ASSERT(priv->hw_esw_zero_tbl == NULL);
+	priv->hw_esw_zero_tbl = flow_hw_create_ctrl_jump_table(dev, port_items_tmpl,
+							       jump_one_actions_tmpl);
+	if (!priv->hw_esw_zero_tbl) {
+		DRV_LOG(ERR, "port %u failed to create table for default jump to group 1"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
+	return 0;
+error:
+	if (priv->hw_esw_zero_tbl) {
+		flow_hw_table_destroy(dev, priv->hw_esw_zero_tbl, NULL);
+		priv->hw_esw_zero_tbl = NULL;
+	}
+	if (priv->hw_esw_sq_miss_tbl) {
+		flow_hw_table_destroy(dev, priv->hw_esw_sq_miss_tbl, NULL);
+		priv->hw_esw_sq_miss_tbl = NULL;
+	}
+	if (priv->hw_esw_sq_miss_root_tbl) {
+		flow_hw_table_destroy(dev, priv->hw_esw_sq_miss_root_tbl, NULL);
+		priv->hw_esw_sq_miss_root_tbl = NULL;
+	}
+	if (jump_one_actions_tmpl)
+		flow_hw_actions_template_destroy(dev, jump_one_actions_tmpl, NULL);
+	if (port_actions_tmpl)
+		flow_hw_actions_template_destroy(dev, port_actions_tmpl, NULL);
+	if (jump_sq_actions_tmpl)
+		flow_hw_actions_template_destroy(dev, jump_sq_actions_tmpl, NULL);
+	if (port_items_tmpl)
+		flow_hw_pattern_template_destroy(dev, port_items_tmpl, NULL);
+	if (sq_items_tmpl)
+		flow_hw_pattern_template_destroy(dev, sq_items_tmpl, NULL);
+	if (esw_mgr_items_tmpl)
+		flow_hw_pattern_template_destroy(dev, esw_mgr_items_tmpl, NULL);
+	return -EINVAL;
+}
+
 /**
  * Configure port HWS resources.
  *
@@ -2606,7 +3424,6 @@ flow_hw_grp_clone_free_cb(void *tool_ctx, struct mlx5_list_entry *entry)
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
-
 static int
 flow_hw_configure(struct rte_eth_dev *dev,
 		  const struct rte_flow_port_attr *port_attr,
@@ -2629,6 +3446,14 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		.free = mlx5_free,
 		.type = "mlx5_hw_action_construct_data",
 	};
+	/* One additional queue is added at the end for PMD internal usage.
+	 * This last queue is reserved for control flow operations.
+	 */
+	uint16_t nb_q_updated;
+	struct rte_flow_queue_attr **_queue_attr = NULL;
+	struct rte_flow_queue_attr ctrl_queue_attr = {0};
+	bool is_proxy = !!(priv->sh->config.dv_esw_en && priv->master);
+	int ret;
 
 	if (!port_attr || !nb_queue || !queue_attr) {
 		rte_errno = EINVAL;
@@ -2637,7 +3462,7 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	/* In case re-configuring, release existing context at first. */
 	if (priv->dr_ctx) {
 		/* */
-		for (i = 0; i < nb_queue; i++) {
+		for (i = 0; i < priv->nb_queue; i++) {
 			hw_q = &priv->hw_q[i];
 			/* Make sure all queues are empty. */
 			if (hw_q->size != hw_q->job_idx) {
@@ -2647,26 +3472,42 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		}
 		flow_hw_resource_release(dev);
 	}
+	ctrl_queue_attr.size = queue_attr[0]->size;
+	nb_q_updated = nb_queue + 1;
+	_queue_attr = mlx5_malloc(MLX5_MEM_ZERO,
+				  nb_q_updated *
+				  sizeof(struct rte_flow_queue_attr *),
+				  64, SOCKET_ID_ANY);
+	if (!_queue_attr) {
+		rte_errno = ENOMEM;
+		goto err;
+	}
+
+	memcpy(_queue_attr, queue_attr,
+	       sizeof(void *) * nb_queue);
+	_queue_attr[nb_queue] = &ctrl_queue_attr;
 	priv->acts_ipool = mlx5_ipool_create(&cfg);
 	if (!priv->acts_ipool)
 		goto err;
 	/* Allocate the queue job descriptor LIFO. */
-	mem_size = sizeof(priv->hw_q[0]) * nb_queue;
-	for (i = 0; i < nb_queue; i++) {
+	mem_size = sizeof(priv->hw_q[0]) * nb_q_updated;
+	for (i = 0; i < nb_q_updated; i++) {
 		/*
 		 * Check if the queues' size are all the same as the
 		 * limitation from HWS layer.
 		 */
-		if (queue_attr[i]->size != queue_attr[0]->size) {
+		if (_queue_attr[i]->size != _queue_attr[0]->size) {
 			rte_errno = EINVAL;
 			goto err;
 		}
 		mem_size += (sizeof(struct mlx5_hw_q_job *) +
+			    sizeof(struct mlx5_hw_q_job) +
 			    sizeof(uint8_t) * MLX5_ENCAP_MAX_LEN +
 			    sizeof(struct mlx5_modification_cmd) *
 			    MLX5_MHDR_MAX_CMD +
-			    sizeof(struct mlx5_hw_q_job)) *
-			    queue_attr[0]->size;
+			    sizeof(struct rte_flow_item) *
+			    MLX5_HW_MAX_ITEMS) *
+			    _queue_attr[i]->size;
 	}
 	priv->hw_q = mlx5_malloc(MLX5_MEM_ZERO, mem_size,
 				 64, SOCKET_ID_ANY);
@@ -2674,58 +3515,82 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		rte_errno = ENOMEM;
 		goto err;
 	}
-	for (i = 0; i < nb_queue; i++) {
+	for (i = 0; i < nb_q_updated; i++) {
 		uint8_t *encap = NULL;
 		struct mlx5_modification_cmd *mhdr_cmd = NULL;
+		struct rte_flow_item *items = NULL;
 
-		priv->hw_q[i].job_idx = queue_attr[i]->size;
-		priv->hw_q[i].size = queue_attr[i]->size;
+		priv->hw_q[i].job_idx = _queue_attr[i]->size;
+		priv->hw_q[i].size = _queue_attr[i]->size;
 		if (i == 0)
 			priv->hw_q[i].job = (struct mlx5_hw_q_job **)
-					    &priv->hw_q[nb_queue];
+					    &priv->hw_q[nb_q_updated];
 		else
 			priv->hw_q[i].job = (struct mlx5_hw_q_job **)
-					    &job[queue_attr[i - 1]->size];
+				&job[_queue_attr[i - 1]->size - 1].items
+				 [MLX5_HW_MAX_ITEMS];
 		job = (struct mlx5_hw_q_job *)
-		      &priv->hw_q[i].job[queue_attr[i]->size];
-		mhdr_cmd = (struct mlx5_modification_cmd *)&job[queue_attr[i]->size];
-		encap = (uint8_t *)&mhdr_cmd[queue_attr[i]->size * MLX5_MHDR_MAX_CMD];
-		for (j = 0; j < queue_attr[i]->size; j++) {
+		      &priv->hw_q[i].job[_queue_attr[i]->size];
+		mhdr_cmd = (struct mlx5_modification_cmd *)
+			   &job[_queue_attr[i]->size];
+		encap = (uint8_t *)
+			 &mhdr_cmd[_queue_attr[i]->size * MLX5_MHDR_MAX_CMD];
+		items = (struct rte_flow_item *)
+			 &encap[_queue_attr[i]->size * MLX5_ENCAP_MAX_LEN];
+		for (j = 0; j < _queue_attr[i]->size; j++) {
 			job[j].mhdr_cmd = &mhdr_cmd[j * MLX5_MHDR_MAX_CMD];
 			job[j].encap_data = &encap[j * MLX5_ENCAP_MAX_LEN];
+			job[j].items = &items[j * MLX5_HW_MAX_ITEMS];
 			priv->hw_q[i].job[j] = &job[j];
 		}
 	}
 	dr_ctx_attr.pd = priv->sh->cdev->pd;
-	dr_ctx_attr.queues = nb_queue;
+	dr_ctx_attr.queues = nb_q_updated;
 	/* Queue size should all be the same. Take the first one. */
-	dr_ctx_attr.queue_size = queue_attr[0]->size;
+	dr_ctx_attr.queue_size = _queue_attr[0]->size;
 	dr_ctx = mlx5dr_context_open(priv->sh->cdev->ctx, &dr_ctx_attr);
 	/* rte_errno has been updated by HWS layer. */
 	if (!dr_ctx)
 		goto err;
 	priv->dr_ctx = dr_ctx;
-	priv->nb_queue = nb_queue;
+	priv->nb_queue = nb_q_updated;
+	rte_spinlock_init(&priv->hw_ctrl_lock);
+	LIST_INIT(&priv->hw_ctrl_flows);
 	/* Add global actions. */
 	for (i = 0; i < MLX5_HW_ACTION_FLAG_MAX; i++) {
-		for (j = 0; j < MLX5DR_TABLE_TYPE_MAX; j++) {
-			priv->hw_drop[i][j] = mlx5dr_action_create_dest_drop
-				(priv->dr_ctx, mlx5_hw_act_flag[i][j]);
-			if (!priv->hw_drop[i][j])
-				goto err;
-		}
+		uint32_t act_flags = 0;
+
+		act_flags = mlx5_hw_act_flag[i][0] | mlx5_hw_act_flag[i][1];
+		if (is_proxy)
+			act_flags |= mlx5_hw_act_flag[i][2];
+		priv->hw_drop[i] = mlx5dr_action_create_dest_drop(priv->dr_ctx, act_flags);
+		if (!priv->hw_drop[i])
+			goto err;
 		priv->hw_tag[i] = mlx5dr_action_create_tag
 			(priv->dr_ctx, mlx5_hw_act_flag[i][0]);
 		if (!priv->hw_tag[i])
 			goto err;
 	}
+	if (is_proxy) {
+		ret = flow_hw_create_vport_actions(priv);
+		if (ret) {
+			rte_errno = -ret;
+			goto err;
+		}
+		ret = flow_hw_create_ctrl_tables(dev);
+		if (ret) {
+			rte_errno = -ret;
+			goto err;
+		}
+	}
+	if (_queue_attr)
+		mlx5_free(_queue_attr);
 	return 0;
 err:
+	flow_hw_free_vport_actions(priv);
 	for (i = 0; i < MLX5_HW_ACTION_FLAG_MAX; i++) {
-		for (j = 0; j < MLX5DR_TABLE_TYPE_MAX; j++) {
-			if (priv->hw_drop[i][j])
-				mlx5dr_action_destroy(priv->hw_drop[i][j]);
-		}
+		if (priv->hw_drop[i])
+			mlx5dr_action_destroy(priv->hw_drop[i]);
 		if (priv->hw_tag[i])
 			mlx5dr_action_destroy(priv->hw_tag[i]);
 	}
@@ -2737,6 +3602,8 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		mlx5_ipool_destroy(priv->acts_ipool);
 		priv->acts_ipool = NULL;
 	}
+	if (_queue_attr)
+		mlx5_free(_queue_attr);
 	return rte_flow_error_set(error, rte_errno,
 				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
 				  "fail to configure port");
@@ -2755,10 +3622,11 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 	struct rte_flow_template_table *tbl;
 	struct rte_flow_pattern_template *it;
 	struct rte_flow_actions_template *at;
-	int i, j;
+	int i;
 
 	if (!priv->dr_ctx)
 		return;
+	flow_hw_flush_all_ctrl_flows(dev);
 	while (!LIST_EMPTY(&priv->flow_hw_tbl)) {
 		tbl = LIST_FIRST(&priv->flow_hw_tbl);
 		flow_hw_table_destroy(dev, tbl, NULL);
@@ -2772,13 +3640,12 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 		flow_hw_actions_template_destroy(dev, at, NULL);
 	}
 	for (i = 0; i < MLX5_HW_ACTION_FLAG_MAX; i++) {
-		for (j = 0; j < MLX5DR_TABLE_TYPE_MAX; j++) {
-			if (priv->hw_drop[i][j])
-				mlx5dr_action_destroy(priv->hw_drop[i][j]);
-		}
+		if (priv->hw_drop[i])
+			mlx5dr_action_destroy(priv->hw_drop[i]);
 		if (priv->hw_tag[i])
 			mlx5dr_action_destroy(priv->hw_tag[i]);
 	}
+	flow_hw_free_vport_actions(priv);
 	if (priv->acts_ipool) {
 		mlx5_ipool_destroy(priv->acts_ipool);
 		priv->acts_ipool = NULL;
@@ -3021,4 +3888,397 @@ const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.action_query = flow_dv_action_query,
 };
 
+static uint32_t
+flow_hw_get_ctrl_queue(struct mlx5_priv *priv)
+{
+	MLX5_ASSERT(priv->nb_queue > 0);
+	return priv->nb_queue - 1;
+}
+
+/**
+ * Creates a control flow using flow template API on @p proxy_dev device,
+ * on behalf of @p owner_dev device.
+ *
+ * This function uses locks internally to synchronize access to the
+ * flow queue.
+ *
+ * Created flow is stored in private list associated with @p proxy_dev device.
+ *
+ * @param owner_dev
+ *   Pointer to Ethernet device on behalf of which flow is created.
+ * @param proxy_dev
+ *   Pointer to Ethernet device on which flow is created.
+ * @param table
+ *   Pointer to flow table.
+ * @param items
+ *   Pointer to flow rule items.
+ * @param item_template_idx
+ *   Index of an item template associated with @p table.
+ * @param actions
+ *   Pointer to flow rule actions.
+ * @param action_template_idx
+ *   Index of an action template associated with @p table.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno set.
+ */
+static __rte_unused int
+flow_hw_create_ctrl_flow(struct rte_eth_dev *owner_dev,
+			 struct rte_eth_dev *proxy_dev,
+			 struct rte_flow_template_table *table,
+			 struct rte_flow_item items[],
+			 uint8_t item_template_idx,
+			 struct rte_flow_action actions[],
+			 uint8_t action_template_idx)
+{
+	struct mlx5_priv *priv = proxy_dev->data->dev_private;
+	uint32_t queue = flow_hw_get_ctrl_queue(priv);
+	struct rte_flow_op_attr op_attr = {
+		.postpone = 0,
+	};
+	struct rte_flow *flow = NULL;
+	struct mlx5_hw_ctrl_flow *entry = NULL;
+	int ret;
+
+	rte_spinlock_lock(&priv->hw_ctrl_lock);
+	entry = mlx5_malloc(MLX5_MEM_ZERO | MLX5_MEM_SYS, sizeof(*entry),
+			    0, SOCKET_ID_ANY);
+	if (!entry) {
+		DRV_LOG(ERR, "port %u not enough memory to create control flows",
+			proxy_dev->data->port_id);
+		rte_errno = ENOMEM;
+		ret = -rte_errno;
+		goto error;
+	}
+	flow = flow_hw_async_flow_create(proxy_dev, queue, &op_attr, table,
+					 items, item_template_idx,
+					 actions, action_template_idx,
+					 NULL, NULL);
+	if (!flow) {
+		DRV_LOG(ERR, "port %u failed to enqueue create control"
+			" flow operation", proxy_dev->data->port_id);
+		ret = -rte_errno;
+		goto error;
+	}
+	ret = flow_hw_push(proxy_dev, queue, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "port %u failed to drain control flow queue",
+			proxy_dev->data->port_id);
+		goto error;
+	}
+	ret = __flow_hw_pull_comp(proxy_dev, queue, 1, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "port %u failed to insert control flow",
+			proxy_dev->data->port_id);
+		rte_errno = EINVAL;
+		ret = -rte_errno;
+		goto error;
+	}
+	entry->owner_dev = owner_dev;
+	entry->flow = flow;
+	LIST_INSERT_HEAD(&priv->hw_ctrl_flows, entry, next);
+	rte_spinlock_unlock(&priv->hw_ctrl_lock);
+	return 0;
+error:
+	if (entry)
+		mlx5_free(entry);
+	rte_spinlock_unlock(&priv->hw_ctrl_lock);
+	return ret;
+}
+
+/**
+ * Destroys a control flow @p flow using flow template API on @p dev device.
+ *
+ * This function uses locks internally to synchronize access to the
+ * flow queue.
+ *
+ * If the @p flow is stored on any private list/pool, then caller must free up
+ * the relevant resources.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param flow
+ *   Pointer to flow rule.
+ *
+ * @return
+ *   0 on success, non-zero value otherwise.
+ */
+static int
+flow_hw_destroy_ctrl_flow(struct rte_eth_dev *dev, struct rte_flow *flow)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint32_t queue = flow_hw_get_ctrl_queue(priv);
+	struct rte_flow_op_attr op_attr = {
+		.postpone = 0,
+	};
+	int ret;
+
+	rte_spinlock_lock(&priv->hw_ctrl_lock);
+	ret = flow_hw_async_flow_destroy(dev, queue, &op_attr, flow, NULL, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "port %u failed to enqueue destroy control"
+			" flow operation", dev->data->port_id);
+		goto exit;
+	}
+	ret = flow_hw_push(dev, queue, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "port %u failed to drain control flow queue",
+			dev->data->port_id);
+		goto exit;
+	}
+	ret = __flow_hw_pull_comp(dev, queue, 1, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "port %u failed to destroy control flow",
+			dev->data->port_id);
+		rte_errno = EINVAL;
+		ret = -rte_errno;
+		goto exit;
+	}
+exit:
+	rte_spinlock_unlock(&priv->hw_ctrl_lock);
+	return ret;
+}
+
+/**
+ * Destroys control flows created on behalf of @p owner_dev device.
+ *
+ * @param owner_dev
+ *   Pointer to Ethernet device owning control flows.
+ *
+ * @return
+ *   0 on success, otherwise negative error code is returned and
+ *   rte_errno is set.
+ */
+int
+mlx5_flow_hw_flush_ctrl_flows(struct rte_eth_dev *owner_dev)
+{
+	struct mlx5_priv *owner_priv = owner_dev->data->dev_private;
+	struct rte_eth_dev *proxy_dev;
+	struct mlx5_priv *proxy_priv;
+	struct mlx5_hw_ctrl_flow *cf;
+	struct mlx5_hw_ctrl_flow *cf_next;
+	uint16_t owner_port_id = owner_dev->data->port_id;
+	uint16_t proxy_port_id = owner_dev->data->port_id;
+	int ret;
+
+	if (owner_priv->sh->config.dv_esw_en) {
+		if (rte_flow_pick_transfer_proxy(owner_port_id, &proxy_port_id, NULL)) {
+			DRV_LOG(ERR, "Unable to find proxy port for port %u",
+				owner_port_id);
+			rte_errno = EINVAL;
+			return -rte_errno;
+		}
+		proxy_dev = &rte_eth_devices[proxy_port_id];
+		proxy_priv = proxy_dev->data->dev_private;
+	} else {
+		proxy_dev = owner_dev;
+		proxy_priv = owner_priv;
+	}
+	cf = LIST_FIRST(&proxy_priv->hw_ctrl_flows);
+	while (cf != NULL) {
+		cf_next = LIST_NEXT(cf, next);
+		if (cf->owner_dev == owner_dev) {
+			ret = flow_hw_destroy_ctrl_flow(proxy_dev, cf->flow);
+			if (ret) {
+				rte_errno = ret;
+				return -ret;
+			}
+			LIST_REMOVE(cf, next);
+			mlx5_free(cf);
+		}
+		cf = cf_next;
+	}
+	return 0;
+}
+
+/**
+ * Destroys all control flows created on @p dev device.
+ *
+ * @param owner_dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, otherwise negative error code is returned and
+ *   rte_errno is set.
+ */
+static int
+flow_hw_flush_all_ctrl_flows(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hw_ctrl_flow *cf;
+	struct mlx5_hw_ctrl_flow *cf_next;
+	int ret;
+
+	cf = LIST_FIRST(&priv->hw_ctrl_flows);
+	while (cf != NULL) {
+		cf_next = LIST_NEXT(cf, next);
+		ret = flow_hw_destroy_ctrl_flow(dev, cf->flow);
+		if (ret) {
+			rte_errno = ret;
+			return -ret;
+		}
+		LIST_REMOVE(cf, next);
+		mlx5_free(cf);
+		cf = cf_next;
+	}
+	return 0;
+}
+
+int
+mlx5_flow_hw_esw_create_mgr_sq_miss_flow(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_flow_item_ethdev port_spec = {
+		.port_id = MLX5_REPRESENTED_PORT_ESW_MGR,
+	};
+	struct rte_flow_item_ethdev port_mask = {
+		.port_id = MLX5_REPRESENTED_PORT_ESW_MGR,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+			.spec = &port_spec,
+			.mask = &port_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	struct rte_flow_action_jump jump = {
+		.group = MLX5_HW_SQ_MISS_GROUP,
+	};
+	struct rte_flow_action actions[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		},
+	};
+
+	MLX5_ASSERT(priv->master);
+	if (!priv->dr_ctx ||
+	    !priv->hw_esw_sq_miss_root_tbl)
+		return 0;
+	return flow_hw_create_ctrl_flow(dev, dev,
+					priv->hw_esw_sq_miss_root_tbl,
+					items, 0, actions, 0);
+}
+
+int
+mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
+{
+	uint16_t port_id = dev->data->port_id;
+	struct mlx5_rte_flow_item_tx_queue queue_spec = {
+		.queue = txq,
+	};
+	struct mlx5_rte_flow_item_tx_queue queue_mask = {
+		.queue = UINT32_MAX,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = (enum rte_flow_item_type)
+				MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
+			.spec = &queue_spec,
+			.mask = &queue_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	struct rte_flow_action_ethdev port = {
+		.port_id = port_id,
+	};
+	struct rte_flow_action actions[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT,
+			.conf = &port,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		},
+	};
+	struct rte_eth_dev *proxy_dev;
+	struct mlx5_priv *proxy_priv;
+	uint16_t proxy_port_id = dev->data->port_id;
+	int ret;
+
+	RTE_SET_USED(txq);
+	ret = rte_flow_pick_transfer_proxy(port_id, &proxy_port_id, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "Unable to pick proxy port for port %u", port_id);
+		return ret;
+	}
+	proxy_dev = &rte_eth_devices[proxy_port_id];
+	proxy_priv = proxy_dev->data->dev_private;
+	if (!proxy_priv->dr_ctx)
+		return 0;
+	if (!proxy_priv->hw_esw_sq_miss_root_tbl ||
+	    !proxy_priv->hw_esw_sq_miss_tbl) {
+		DRV_LOG(ERR, "port %u proxy port %u was configured but default"
+			" flow tables are not created",
+			port_id, proxy_port_id);
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	return flow_hw_create_ctrl_flow(dev, proxy_dev,
+					proxy_priv->hw_esw_sq_miss_tbl,
+					items, 0, actions, 0);
+}
+
+int
+mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev)
+{
+	uint16_t port_id = dev->data->port_id;
+	struct rte_flow_item_ethdev port_spec = {
+		.port_id = port_id,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+			.spec = &port_spec,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	struct rte_flow_action_jump jump = {
+		.group = 1,
+	};
+	struct rte_flow_action actions[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+	struct rte_eth_dev *proxy_dev;
+	struct mlx5_priv *proxy_priv;
+	uint16_t proxy_port_id = dev->data->port_id;
+	int ret;
+
+	ret = rte_flow_pick_transfer_proxy(port_id, &proxy_port_id, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "Unable to pick proxy port for port %u", port_id);
+		return ret;
+	}
+	proxy_dev = &rte_eth_devices[proxy_port_id];
+	proxy_priv = proxy_dev->data->dev_private;
+	if (!proxy_priv->dr_ctx)
+		return 0;
+	if (!proxy_priv->hw_esw_zero_tbl) {
+		DRV_LOG(ERR, "port %u proxy port %u was configured but default"
+			" flow tables are not created",
+			port_id, proxy_port_id);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	return flow_hw_create_ctrl_flow(dev, proxy_dev,
+					proxy_priv->hw_esw_zero_tbl,
+					items, 0, actions, 0);
+}
+
 #endif
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index fd902078f8..7ffaf4c227 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -1245,12 +1245,14 @@ flow_verbs_validate(struct rte_eth_dev *dev,
 	uint16_t ether_type = 0;
 	bool is_empty_vlan = false;
 	uint16_t udp_dport = 0;
+	bool is_root;
 
 	if (items == NULL)
 		return -1;
 	ret = mlx5_flow_validate_attributes(dev, attr, error);
 	if (ret < 0)
 		return ret;
+	is_root = ret;
 	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
 		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
 		int ret = 0;
@@ -1380,7 +1382,7 @@ flow_verbs_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ITEM_TYPE_VXLAN:
 			ret = mlx5_flow_validate_item_vxlan(dev, udp_dport,
 							    items, item_flags,
-							    attr, error);
+							    is_root, error);
 			if (ret < 0)
 				return ret;
 			last_item = MLX5_FLOW_LAYER_VXLAN;
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index c68b32cf14..3ef31671b1 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -1280,6 +1280,48 @@ mlx5_dev_stop(struct rte_eth_dev *dev)
 	return 0;
 }
 
+static int
+mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	unsigned int i;
+	int ret;
+
+	if (priv->sh->config.dv_esw_en && priv->master) {
+		if (mlx5_flow_hw_esw_create_mgr_sq_miss_flow(dev))
+			goto error;
+	}
+	for (i = 0; i < priv->txqs_n; ++i) {
+		struct mlx5_txq_ctrl *txq = mlx5_txq_get(dev, i);
+		uint32_t queue;
+
+		if (!txq)
+			continue;
+		if (txq->is_hairpin)
+			queue = txq->obj->sq->id;
+		else
+			queue = txq->obj->sq_obj.sq->id;
+		if ((priv->representor || priv->master) &&
+		    priv->sh->config.dv_esw_en) {
+			if (mlx5_flow_hw_esw_create_sq_miss_flow(dev, queue)) {
+				mlx5_txq_release(dev, i);
+				goto error;
+			}
+		}
+		mlx5_txq_release(dev, i);
+	}
+	if ((priv->master || priv->representor) && priv->sh->config.dv_esw_en) {
+		if (mlx5_flow_hw_esw_create_default_jump_flow(dev))
+			goto error;
+	}
+	return 0;
+error:
+	ret = rte_errno;
+	mlx5_flow_hw_flush_ctrl_flows(dev);
+	rte_errno = ret;
+	return -rte_errno;
+}
+
 /**
  * Enable traffic flows configured by control plane
  *
@@ -1316,6 +1358,8 @@ mlx5_traffic_enable(struct rte_eth_dev *dev)
 	unsigned int j;
 	int ret;
 
+	if (priv->sh->config.dv_flow_en == 2)
+		return mlx5_traffic_enable_hws(dev);
 	/*
 	 * Hairpin txq default flow should be created no matter if it is
 	 * isolation mode. Or else all the packets to be sent will be sent
@@ -1346,13 +1390,17 @@ mlx5_traffic_enable(struct rte_eth_dev *dev)
 		}
 		mlx5_txq_release(dev, i);
 	}
-	if (priv->sh->config.dv_esw_en) {
-		if (mlx5_flow_create_esw_table_zero_flow(dev))
-			priv->fdb_def_rule = 1;
-		else
-			DRV_LOG(INFO, "port %u FDB default rule cannot be"
-				" configured - only Eswitch group 0 flows are"
-				" supported.", dev->data->port_id);
+	if (priv->sh->config.fdb_def_rule) {
+		if (priv->sh->config.dv_esw_en) {
+			if (mlx5_flow_create_esw_table_zero_flow(dev))
+				priv->fdb_def_rule = 1;
+			else
+				DRV_LOG(INFO, "port %u FDB default rule cannot be configured - only Eswitch group 0 flows are supported.",
+					dev->data->port_id);
+		}
+	} else {
+		DRV_LOG(INFO, "port %u FDB default rule is disabled",
+			dev->data->port_id);
 	}
 	if (!priv->sh->config.lacp_by_user && priv->pf_bond >= 0) {
 		ret = mlx5_flow_lacp_miss(dev);
@@ -1470,7 +1518,12 @@ mlx5_traffic_enable(struct rte_eth_dev *dev)
 void
 mlx5_traffic_disable(struct rte_eth_dev *dev)
 {
-	mlx5_flow_list_flush(dev, MLX5_FLOW_TYPE_CTL, false);
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (priv->sh->config.dv_flow_en == 2)
+		mlx5_flow_hw_flush_ctrl_flows(dev);
+	else
+		mlx5_flow_list_flush(dev, MLX5_FLOW_TYPE_CTL, false);
 }
 
 /**
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH 08/27] net/mlx5: add extended metadata mode for hardware steering
  2022-09-23 14:43 [PATCH 00/27] net/mlx5: HW steering PMD update Suanming Mou
                   ` (6 preceding siblings ...)
  2022-09-23 14:43 ` [PATCH 07/27] net/mlx5: create port actions Suanming Mou
@ 2022-09-23 14:43 ` Suanming Mou
  2022-09-23 14:43 ` [PATCH 09/27] ethdev: add meter profiles/policies config Suanming Mou
                   ` (23 subsequent siblings)
  31 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-23 14:43 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, Bing Zhao

From: Bing Zhao <bingz@nvidia.com>

The new mode 4 of devarg "dv_xmeta_en" is added for HWS only. In this
mode, the Rx / Tx metadata with 32b width copy between FDB and NIC is
supported. The mark is only supported in NIC and there is no copy
supported.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_os.c |  10 +-
 drivers/net/mlx5/mlx5.c          |   7 +-
 drivers/net/mlx5/mlx5.h          |   8 +-
 drivers/net/mlx5/mlx5_flow.c     |   8 +-
 drivers/net/mlx5/mlx5_flow.h     |  14 +
 drivers/net/mlx5/mlx5_flow_dv.c  |  21 +-
 drivers/net/mlx5/mlx5_flow_hw.c  | 862 ++++++++++++++++++++++++++++---
 drivers/net/mlx5/mlx5_trigger.c  |   3 +
 8 files changed, 851 insertions(+), 82 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 28220d10ad..41940d7ce7 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1552,6 +1552,15 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	if (priv->vport_meta_mask)
 		flow_hw_set_port_info(eth_dev);
 	if (priv->sh->config.dv_flow_en == 2) {
+		if (priv->sh->config.dv_esw_en &&
+		    priv->sh->config.dv_xmeta_en != MLX5_XMETA_MODE_LEGACY &&
+		    priv->sh->config.dv_xmeta_en != MLX5_XMETA_MODE_META32_HWS) {
+			DRV_LOG(ERR,
+				"metadata mode %u is not supported in HWS eswitch mode",
+				priv->sh->config.dv_xmeta_en);
+			err = ENOTSUP;
+			goto error;
+		}
 		/* Only HWS requires this information. */
 		flow_hw_init_tags_set(eth_dev);
 		if (priv->sh->config.dv_esw_en &&
@@ -1563,7 +1572,6 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 		}
 		return eth_dev;
 	}
-	/* Port representor shares the same max priority with pf port. */
 	if (!priv->sh->flow_priority_check_flag) {
 		/* Supported Verbs flow priority number detection. */
 		err = mlx5_flow_discover_priorities(eth_dev);
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index a21b8c69a9..4abb207077 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1218,7 +1218,8 @@ mlx5_dev_args_check_handler(const char *key, const char *val, void *opaque)
 		if (tmp != MLX5_XMETA_MODE_LEGACY &&
 		    tmp != MLX5_XMETA_MODE_META16 &&
 		    tmp != MLX5_XMETA_MODE_META32 &&
-		    tmp != MLX5_XMETA_MODE_MISS_INFO) {
+		    tmp != MLX5_XMETA_MODE_MISS_INFO &&
+		    tmp != MLX5_XMETA_MODE_META32_HWS) {
 			DRV_LOG(ERR, "Invalid extensive metadata parameter.");
 			rte_errno = EINVAL;
 			return -rte_errno;
@@ -2849,6 +2850,10 @@ mlx5_set_metadata_mask(struct rte_eth_dev *dev)
 		meta = UINT32_MAX;
 		mark = (reg_c0 >> rte_bsf32(reg_c0)) & MLX5_FLOW_MARK_MASK;
 		break;
+	case MLX5_XMETA_MODE_META32_HWS:
+		meta = UINT32_MAX;
+		mark = MLX5_FLOW_MARK_MASK;
+		break;
 	default:
 		meta = 0;
 		mark = 0;
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 77dbe3593e..3364c4735c 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -298,8 +298,8 @@ struct mlx5_sh_config {
 	uint32_t reclaim_mode:2; /* Memory reclaim mode. */
 	uint32_t dv_esw_en:1; /* Enable E-Switch DV flow. */
 	/* Enable DV flow. 1 means SW steering, 2 means HW steering. */
-	unsigned int dv_flow_en:2;
-	uint32_t dv_xmeta_en:2; /* Enable extensive flow metadata. */
+	uint32_t dv_flow_en:2; /* Enable DV flow. */
+	uint32_t dv_xmeta_en:3; /* Enable extensive flow metadata. */
 	uint32_t dv_miss_info:1; /* Restore packet after partial hw miss. */
 	uint32_t l3_vxlan_en:1; /* Enable L3 VXLAN flow creation. */
 	uint32_t vf_nl_en:1; /* Enable Netlink requests in VF mode. */
@@ -312,7 +312,6 @@ struct mlx5_sh_config {
 	uint32_t fdb_def_rule:1; /* Create FDB default jump rule */
 };
 
-
 /* Structure for VF VLAN workaround. */
 struct mlx5_vf_vlan {
 	uint32_t tag:12;
@@ -1279,12 +1278,12 @@ struct mlx5_dev_ctx_shared {
 	struct mlx5_lb_ctx self_lb; /* QP to enable self loopback for Devx. */
 	unsigned int flow_max_priority;
 	enum modify_reg flow_mreg_c[MLX5_MREG_C_NUM];
+	/* Availability of mreg_c's. */
 	void *devx_channel_lwm;
 	struct rte_intr_handle *intr_handle_lwm;
 	pthread_mutex_t lwm_config_lock;
 	uint32_t host_shaper_rate:8;
 	uint32_t lwm_triggered:1;
-	/* Availability of mreg_c's. */
 	struct mlx5_dev_shared_port port[]; /* per device port data array. */
 };
 
@@ -1509,6 +1508,7 @@ struct mlx5_priv {
 	struct rte_flow_template_table *hw_esw_sq_miss_root_tbl;
 	struct rte_flow_template_table *hw_esw_sq_miss_tbl;
 	struct rte_flow_template_table *hw_esw_zero_tbl;
+	struct rte_flow_template_table *hw_tx_meta_cpy_tbl;
 	struct mlx5_indexed_pool *flows[MLX5_FLOW_TYPE_MAXI];
 	/* RTE Flow rules. */
 	uint32_t ctrl_flows; /* Control flow rules. */
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 9c44b2e99b..b570ed7f69 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -1107,6 +1107,8 @@ mlx5_flow_get_reg_id(struct rte_eth_dev *dev,
 			return REG_C_0;
 		case MLX5_XMETA_MODE_META32:
 			return REG_C_1;
+		case MLX5_XMETA_MODE_META32_HWS:
+			return REG_C_1;
 		}
 		break;
 	case MLX5_METADATA_TX:
@@ -1119,11 +1121,14 @@ mlx5_flow_get_reg_id(struct rte_eth_dev *dev,
 			return REG_C_0;
 		case MLX5_XMETA_MODE_META32:
 			return REG_C_1;
+		case MLX5_XMETA_MODE_META32_HWS:
+			return REG_C_1;
 		}
 		break;
 	case MLX5_FLOW_MARK:
 		switch (config->dv_xmeta_en) {
 		case MLX5_XMETA_MODE_LEGACY:
+		case MLX5_XMETA_MODE_META32_HWS:
 			return REG_NON;
 		case MLX5_XMETA_MODE_META16:
 			return REG_C_1;
@@ -4442,7 +4447,8 @@ static bool flow_check_modify_action_type(struct rte_eth_dev *dev,
 		return true;
 	case RTE_FLOW_ACTION_TYPE_FLAG:
 	case RTE_FLOW_ACTION_TYPE_MARK:
-		if (priv->sh->config.dv_xmeta_en != MLX5_XMETA_MODE_LEGACY)
+		if (priv->sh->config.dv_xmeta_en != MLX5_XMETA_MODE_LEGACY &&
+		    priv->sh->config.dv_xmeta_en != MLX5_XMETA_MODE_META32_HWS)
 			return true;
 		else
 			return false;
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index f661f858c7..15c5826d8a 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -49,6 +49,12 @@ enum mlx5_rte_flow_action_type {
 	MLX5_RTE_FLOW_ACTION_TYPE_RSS,
 };
 
+/* Private (internal) Field IDs for MODIFY_FIELD action. */
+enum mlx5_rte_flow_field_id {
+		MLX5_RTE_FLOW_FIELD_END = INT_MIN,
+			MLX5_RTE_FLOW_FIELD_META_REG,
+};
+
 #define MLX5_INDIRECT_ACTION_TYPE_OFFSET 30
 
 enum {
@@ -1168,6 +1174,7 @@ struct rte_flow_actions_template {
 	struct rte_flow_action *masks; /* Cached action masks.*/
 	uint16_t mhdr_off; /* Offset of DR modify header action. */
 	uint32_t refcnt; /* Reference counter. */
+	uint16_t rx_cpy_pos; /* Action position of Rx metadata to be copied. */
 };
 
 /* Jump action struct. */
@@ -1244,6 +1251,11 @@ struct mlx5_flow_group {
 #define MLX5_HW_TBL_MAX_ITEM_TEMPLATE 2
 #define MLX5_HW_TBL_MAX_ACTION_TEMPLATE 32
 
+struct mlx5_flow_template_table_cfg {
+	struct rte_flow_template_table_attr attr; /* Table attributes passed through flow API. */
+	bool external; /* True if created by flow API, false if table is internal to PMD. */
+};
+
 struct rte_flow_template_table {
 	LIST_ENTRY(rte_flow_template_table) next;
 	struct mlx5_flow_group *grp; /* The group rte_flow_template_table uses. */
@@ -1253,6 +1265,7 @@ struct rte_flow_template_table {
 	/* Action templates bind to the table. */
 	struct mlx5_hw_action_template ats[MLX5_HW_TBL_MAX_ACTION_TEMPLATE];
 	struct mlx5_indexed_pool *flow; /* The table's flow ipool. */
+	struct mlx5_flow_template_table_cfg cfg;
 	uint32_t type; /* Flow table type RX/TX/FDB. */
 	uint8_t nb_item_templates; /* Item template number. */
 	uint8_t nb_action_templates; /* Action template number. */
@@ -2332,4 +2345,5 @@ int mlx5_flow_hw_esw_create_mgr_sq_miss_flow(struct rte_eth_dev *dev);
 int mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev,
 					 uint32_t txq);
 int mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev);
+int mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev);
 #endif /* RTE_PMD_MLX5_FLOW_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index d0f78cae8e..d1f0d63fdc 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -1783,7 +1783,8 @@ mlx5_flow_field_id_to_modify_info
 			int reg;
 
 			if (priv->sh->config.dv_flow_en == 2)
-				reg = REG_C_1;
+				reg = flow_hw_get_reg_id(RTE_FLOW_ITEM_TYPE_TAG,
+							 data->level);
 			else
 				reg = mlx5_flow_get_reg_id(dev, MLX5_APP_TAG,
 							   data->level, error);
@@ -1852,6 +1853,24 @@ mlx5_flow_field_id_to_modify_info
 		else
 			info[idx].offset = off_be;
 		break;
+	case MLX5_RTE_FLOW_FIELD_META_REG:
+		{
+			uint32_t meta_mask = priv->sh->dv_meta_mask;
+			uint32_t meta_count = __builtin_popcount(meta_mask);
+			uint32_t reg = data->level;
+
+			RTE_SET_USED(meta_count);
+			MLX5_ASSERT(data->offset + width <= meta_count);
+			MLX5_ASSERT(reg != REG_NON);
+			MLX5_ASSERT(reg < RTE_DIM(reg_to_field));
+			info[idx] = (struct field_modify_info){4, 0, reg_to_field[reg]};
+			if (mask)
+				mask[idx] = flow_modify_info_mask_32_masked
+					(width, data->offset, meta_mask);
+			else
+				info[idx].offset = data->offset;
+		}
+		break;
 	case RTE_FLOW_FIELD_POINTER:
 	case RTE_FLOW_FIELD_VALUE:
 	default:
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 004eacc334..dfbf885530 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -20,13 +20,27 @@
 /* Default queue to flush the flows. */
 #define MLX5_DEFAULT_FLUSH_QUEUE 0
 
-/* Maximum number of rules in control flow tables */
+/* Maximum number of rules in control flow tables. */
 #define MLX5_HW_CTRL_FLOW_NB_RULES (4096)
 
-/* Flow group for SQ miss default flows/ */
-#define MLX5_HW_SQ_MISS_GROUP (UINT32_MAX)
+/* Lowest flow group usable by an application. */
+#define MLX5_HW_LOWEST_USABLE_GROUP (1)
+
+/* Maximum group index usable by user applications for transfer flows. */
+#define MLX5_HW_MAX_TRANSFER_GROUP (UINT32_MAX - 1)
+
+/* Lowest priority for HW root table. */
+#define MLX5_HW_LOWEST_PRIO_ROOT 15
+
+/* Lowest priority for HW non-root table. */
+#define MLX5_HW_LOWEST_PRIO_NON_ROOT (UINT32_MAX)
 
 static int flow_hw_flush_all_ctrl_flows(struct rte_eth_dev *dev);
+static int flow_hw_translate_group(struct rte_eth_dev *dev,
+				   const struct mlx5_flow_template_table_cfg *cfg,
+				   uint32_t group,
+				   uint32_t *table_group,
+				   struct rte_flow_error *error);
 
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops;
 
@@ -210,12 +224,12 @@ flow_hw_rss_item_flags_get(const struct rte_flow_item items[])
  */
 static struct mlx5_hw_jump_action *
 flow_hw_jump_action_register(struct rte_eth_dev *dev,
-			     const struct rte_flow_attr *attr,
+			     const struct mlx5_flow_template_table_cfg *cfg,
 			     uint32_t dest_group,
 			     struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct rte_flow_attr jattr = *attr;
+	struct rte_flow_attr jattr = cfg->attr.flow_attr;
 	struct mlx5_flow_group *grp;
 	struct mlx5_flow_cb_ctx ctx = {
 		.dev = dev,
@@ -223,9 +237,13 @@ flow_hw_jump_action_register(struct rte_eth_dev *dev,
 		.data = &jattr,
 	};
 	struct mlx5_list_entry *ge;
+	uint32_t target_group;
 
-	jattr.group = dest_group;
-	ge = mlx5_hlist_register(priv->sh->flow_tbls, dest_group, &ctx);
+	target_group = dest_group;
+	if (flow_hw_translate_group(dev, cfg, dest_group, &target_group, error))
+		return NULL;
+	jattr.group = target_group;
+	ge = mlx5_hlist_register(priv->sh->flow_tbls, target_group, &ctx);
 	if (!ge)
 		return NULL;
 	grp = container_of(ge, struct mlx5_flow_group, entry);
@@ -757,7 +775,8 @@ flow_hw_modify_field_compile(struct rte_eth_dev *dev,
 				(void *)(uintptr_t)conf->src.pvalue :
 				(void *)(uintptr_t)&conf->src.value;
 		if (conf->dst.field == RTE_FLOW_FIELD_META ||
-		    conf->dst.field == RTE_FLOW_FIELD_TAG) {
+		    conf->dst.field == RTE_FLOW_FIELD_TAG ||
+		    conf->dst.field == (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG) {
 			value = *(const unaligned_uint32_t *)item.spec;
 			value = rte_cpu_to_be_32(value);
 			item.spec = &value;
@@ -849,6 +868,9 @@ flow_hw_represented_port_compile(struct rte_eth_dev *dev,
 	if (m && !!m->port_id) {
 		struct mlx5_priv *port_priv;
 
+		if (!v)
+			return rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ACTION,
+						  action, "port index was not provided");
 		port_priv = mlx5_port_to_eswitch_info(v->port_id, false);
 		if (port_priv == NULL)
 			return rte_flow_error_set
@@ -892,8 +914,8 @@ flow_hw_represented_port_compile(struct rte_eth_dev *dev,
  *
  * @param[in] dev
  *   Pointer to the rte_eth_dev structure.
- * @param[in] table_attr
- *   Pointer to the table attributes.
+ * @param[in] cfg
+ *   Pointer to the table configuration.
  * @param[in] item_templates
  *   Item template array to be binded to the table.
  * @param[in/out] acts
@@ -908,12 +930,13 @@ flow_hw_represented_port_compile(struct rte_eth_dev *dev,
  */
 static int
 flow_hw_actions_translate(struct rte_eth_dev *dev,
-			  const struct rte_flow_template_table_attr *table_attr,
+			  const struct mlx5_flow_template_table_cfg *cfg,
 			  struct mlx5_hw_actions *acts,
 			  struct rte_flow_actions_template *at,
 			  struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_template_table_attr *table_attr = &cfg->attr;
 	const struct rte_flow_attr *attr = &table_attr->flow_attr;
 	struct rte_flow_action *actions = at->actions;
 	struct rte_flow_action *action_start = actions;
@@ -980,7 +1003,7 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 					((const struct rte_flow_action_jump *)
 					actions->conf)->group;
 				acts->jump = flow_hw_jump_action_register
-						(dev, attr, jump_group, error);
+						(dev, cfg, jump_group, error);
 				if (!acts->jump)
 					goto err;
 				acts->rule_acts[i].action = (!!attr->group) ?
@@ -1090,6 +1113,16 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 							   error);
 			if (err)
 				goto err;
+			/*
+			 * Adjust the action source position for the following.
+			 * ... / MODIFY_FIELD: rx_cpy_pos / (QUEUE|RSS) / ...
+			 * The next action will be Q/RSS, there will not be
+			 * another adjustment and the real source position of
+			 * the following actions will be decreased by 1.
+			 * No change of the total actions in the new template.
+			 */
+			if ((actions - action_start) == at->rx_cpy_pos)
+				action_start += 1;
 			break;
 		case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
 			if (flow_hw_represented_port_compile
@@ -1354,7 +1387,8 @@ flow_hw_modify_field_construct(struct mlx5_hw_q_job *job,
 	else
 		rte_memcpy(values, mhdr_action->src.pvalue, sizeof(values));
 	if (mhdr_action->dst.field == RTE_FLOW_FIELD_META ||
-	    mhdr_action->dst.field == RTE_FLOW_FIELD_TAG) {
+	    mhdr_action->dst.field == RTE_FLOW_FIELD_TAG ||
+	    mhdr_action->dst.field == (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG) {
 		value_p = (unaligned_uint32_t *)values;
 		*value_p = rte_cpu_to_be_32(*value_p);
 	}
@@ -1492,7 +1526,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			jump_group = ((const struct rte_flow_action_jump *)
 						action->conf)->group;
 			jump = flow_hw_jump_action_register
-				(dev, &attr, jump_group, NULL);
+				(dev, &table->cfg, jump_group, NULL);
 			if (!jump)
 				return -1;
 			rule_acts[act_data->action_dst].action =
@@ -1689,7 +1723,13 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	job->user_data = user_data;
 	rule_attr.user_data = job;
 	hw_acts = &table->ats[action_template_index].acts;
-	/* Construct the flow actions based on the input actions.*/
+	/*
+	 * Construct the flow actions based on the input actions.
+	 * The implicitly appended action is always fixed, like metadata
+	 * copy action from FDB to NIC Rx.
+	 * No need to copy and construct a new "actions" list based on the
+	 * user's input, in order to save the cost.
+	 */
 	if (flow_hw_actions_construct(dev, job, hw_acts, pattern_template_index,
 				  actions, rule_acts, &acts_num)) {
 		rte_errno = EINVAL;
@@ -1997,8 +2037,8 @@ flow_hw_q_flow_flush(struct rte_eth_dev *dev,
  *
  * @param[in] dev
  *   Pointer to the rte_eth_dev structure.
- * @param[in] attr
- *   Pointer to the table attributes.
+ * @param[in] table_cfg
+ *   Pointer to the table configuration.
  * @param[in] item_templates
  *   Item template array to be binded to the table.
  * @param[in] nb_item_templates
@@ -2015,7 +2055,7 @@ flow_hw_q_flow_flush(struct rte_eth_dev *dev,
  */
 static struct rte_flow_template_table *
 flow_hw_table_create(struct rte_eth_dev *dev,
-		     const struct rte_flow_template_table_attr *attr,
+		     const struct mlx5_flow_template_table_cfg *table_cfg,
 		     struct rte_flow_pattern_template *item_templates[],
 		     uint8_t nb_item_templates,
 		     struct rte_flow_actions_template *action_templates[],
@@ -2027,6 +2067,7 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	struct rte_flow_template_table *tbl = NULL;
 	struct mlx5_flow_group *grp;
 	struct mlx5dr_match_template *mt[MLX5_HW_TBL_MAX_ITEM_TEMPLATE];
+	const struct rte_flow_template_table_attr *attr = &table_cfg->attr;
 	struct rte_flow_attr flow_attr = attr->flow_attr;
 	struct mlx5_flow_cb_ctx ctx = {
 		.dev = dev,
@@ -2067,6 +2108,7 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	tbl = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*tbl), 0, rte_socket_id());
 	if (!tbl)
 		goto error;
+	tbl->cfg = *table_cfg;
 	/* Allocate flow indexed pool. */
 	tbl->flow = mlx5_ipool_create(&cfg);
 	if (!tbl->flow)
@@ -2110,7 +2152,7 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 			goto at_error;
 		}
 		LIST_INIT(&tbl->ats[i].acts.act_list);
-		err = flow_hw_actions_translate(dev, attr,
+		err = flow_hw_actions_translate(dev, &tbl->cfg,
 						&tbl->ats[i].acts,
 						action_templates[i], error);
 		if (err) {
@@ -2153,6 +2195,96 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	return NULL;
 }
 
+/**
+ * Translates group index specified by the user in @p attr to internal
+ * group index.
+ *
+ * Translation is done by incrementing group index, so group n becomes n + 1.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] cfg
+ *   Pointer to the template table configuration.
+ * @param[in] group
+ *   Currently used group index (table group or jump destination).
+ * @param[out] table_group
+ *   Pointer to output group index.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success. Otherwise, returns negative error code, rte_errno is set
+ *   and error structure is filled.
+ */
+static int
+flow_hw_translate_group(struct rte_eth_dev *dev,
+			const struct mlx5_flow_template_table_cfg *cfg,
+			uint32_t group,
+			uint32_t *table_group,
+			struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_attr *flow_attr = &cfg->attr.flow_attr;
+
+	if (priv->sh->config.dv_esw_en && cfg->external && flow_attr->transfer) {
+		if (group > MLX5_HW_MAX_TRANSFER_GROUP)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
+						  NULL,
+						  "group index not supported");
+		*table_group = group + 1;
+	} else {
+		*table_group = group;
+	}
+	return 0;
+}
+
+/**
+ * Create flow table.
+ *
+ * This function is a wrapper over @ref flow_hw_table_create(), which translates parameters
+ * provided by user to proper internal values.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] attr
+ *   Pointer to the table attributes.
+ * @param[in] item_templates
+ *   Item template array to be binded to the table.
+ * @param[in] nb_item_templates
+ *   Number of item templates.
+ * @param[in] action_templates
+ *   Action template array to be binded to the table.
+ * @param[in] nb_action_templates
+ *   Number of action templates.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   Table on success, Otherwise, returns negative error code, rte_errno is set
+ *   and error structure is filled.
+ */
+static struct rte_flow_template_table *
+flow_hw_template_table_create(struct rte_eth_dev *dev,
+			      const struct rte_flow_template_table_attr *attr,
+			      struct rte_flow_pattern_template *item_templates[],
+			      uint8_t nb_item_templates,
+			      struct rte_flow_actions_template *action_templates[],
+			      uint8_t nb_action_templates,
+			      struct rte_flow_error *error)
+{
+	struct mlx5_flow_template_table_cfg cfg = {
+		.attr = *attr,
+		.external = true,
+	};
+	uint32_t group = attr->flow_attr.group;
+
+	if (flow_hw_translate_group(dev, &cfg, group, &cfg.attr.flow_attr.group, error))
+		return NULL;
+	return flow_hw_table_create(dev, &cfg, item_templates, nb_item_templates,
+				    action_templates, nb_action_templates, error);
+}
+
 /**
  * Destroy flow table.
  *
@@ -2271,10 +2403,13 @@ flow_hw_validate_action_represented_port(struct rte_eth_dev *dev,
 					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
 					  "cannot use represented_port actions"
 					  " without an E-Switch");
-	if (mask_conf->port_id) {
+	if (mask_conf && mask_conf->port_id) {
 		struct mlx5_priv *port_priv;
 		struct mlx5_priv *dev_priv;
 
+		if (!action_conf)
+			return rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ACTION,
+						  action, "port index was not provided");
 		port_priv = mlx5_port_to_eswitch_info(action_conf->port_id, false);
 		if (!port_priv)
 			return rte_flow_error_set(error, rte_errno,
@@ -2299,20 +2434,77 @@ flow_hw_validate_action_represented_port(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static inline int
+flow_hw_action_meta_copy_insert(const struct rte_flow_action actions[],
+				const struct rte_flow_action masks[],
+				const struct rte_flow_action *ins_actions,
+				const struct rte_flow_action *ins_masks,
+				struct rte_flow_action *new_actions,
+				struct rte_flow_action *new_masks,
+				uint16_t *ins_pos)
+{
+	uint16_t idx, total = 0;
+	bool ins = false;
+	bool act_end = false;
+
+	MLX5_ASSERT(actions && masks);
+	MLX5_ASSERT(new_actions && new_masks);
+	MLX5_ASSERT(ins_actions && ins_masks);
+	for (idx = 0; !act_end; idx++) {
+		if (idx >= MLX5_HW_MAX_ACTS)
+			return -1;
+		if (actions[idx].type == RTE_FLOW_ACTION_TYPE_RSS ||
+		    actions[idx].type == RTE_FLOW_ACTION_TYPE_QUEUE) {
+			ins = true;
+			*ins_pos = idx;
+		}
+		if (actions[idx].type == RTE_FLOW_ACTION_TYPE_END)
+			act_end = true;
+	}
+	if (!ins)
+		return 0;
+	else if (idx == MLX5_HW_MAX_ACTS)
+		return -1; /* No more space. */
+	total = idx;
+	/* Before the position, no change for the actions. */
+	for (idx = 0; idx < *ins_pos; idx++) {
+		new_actions[idx] = actions[idx];
+		new_masks[idx] = masks[idx];
+	}
+	/* Insert the new action and mask to the position. */
+	new_actions[idx] = *ins_actions;
+	new_masks[idx] = *ins_masks;
+	/* Remaining content is right shifted by one position. */
+	for (; idx < total; idx++) {
+		new_actions[idx + 1] = actions[idx];
+		new_masks[idx + 1] = masks[idx];
+	}
+	return 0;
+}
+
 static int
 flow_hw_action_validate(struct rte_eth_dev *dev,
+			const struct rte_flow_actions_template_attr *attr,
 			const struct rte_flow_action actions[],
 			const struct rte_flow_action masks[],
 			struct rte_flow_error *error)
 {
-	int i;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint16_t i;
 	bool actions_end = false;
 	int ret;
 
+	/* FDB actions are only valid to proxy port. */
+	if (attr->transfer && (!priv->sh->config.dv_esw_en || !priv->master))
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL,
+					  "transfer actions are only valid to proxy port");
 	for (i = 0; !actions_end; ++i) {
 		const struct rte_flow_action *action = &actions[i];
 		const struct rte_flow_action *mask = &masks[i];
 
+		MLX5_ASSERT(i < MLX5_HW_MAX_ACTS);
 		if (action->type != mask->type)
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ACTION,
@@ -2409,21 +2601,77 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	int len, act_len, mask_len, i;
-	struct rte_flow_actions_template *at;
+	struct rte_flow_actions_template *at = NULL;
+	uint16_t pos = MLX5_HW_MAX_ACTS;
+	struct rte_flow_action tmp_action[MLX5_HW_MAX_ACTS];
+	struct rte_flow_action tmp_mask[MLX5_HW_MAX_ACTS];
+	const struct rte_flow_action *ra;
+	const struct rte_flow_action *rm;
+	const struct rte_flow_action_modify_field rx_mreg = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_B,
+		},
+		.src = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_C_1,
+		},
+		.width = 32,
+	};
+	const struct rte_flow_action_modify_field rx_mreg_mask = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.src = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.width = UINT32_MAX,
+	};
+	const struct rte_flow_action rx_cpy = {
+		.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+		.conf = &rx_mreg,
+	};
+	const struct rte_flow_action rx_cpy_mask = {
+		.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+		.conf = &rx_mreg_mask,
+	};
 
-	if (flow_hw_action_validate(dev, actions, masks, error))
+	if (flow_hw_action_validate(dev, attr, actions, masks, error))
 		return NULL;
-	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS,
-				NULL, 0, actions, error);
+	if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
+	    priv->sh->config.dv_esw_en) {
+		if (flow_hw_action_meta_copy_insert(actions, masks, &rx_cpy, &rx_cpy_mask,
+						    tmp_action, tmp_mask, &pos)) {
+			rte_flow_error_set(error, EINVAL,
+					   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					   "Failed to concatenate new action/mask");
+			return NULL;
+		}
+	}
+	/* Application should make sure only one Q/RSS exist in one rule. */
+	if (pos == MLX5_HW_MAX_ACTS) {
+		ra = actions;
+		rm = masks;
+	} else {
+		ra = tmp_action;
+		rm = tmp_mask;
+	}
+	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, NULL, 0, ra, error);
 	if (act_len <= 0)
 		return NULL;
 	len = RTE_ALIGN(act_len, 16);
-	mask_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS,
-				 NULL, 0, masks, error);
+	mask_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, NULL, 0, rm, error);
 	if (mask_len <= 0)
 		return NULL;
 	len += RTE_ALIGN(mask_len, 16);
-	at = mlx5_malloc(MLX5_MEM_ZERO, len + sizeof(*at), 64, rte_socket_id());
+	at = mlx5_malloc(MLX5_MEM_ZERO, len + sizeof(*at),
+			 RTE_CACHE_LINE_SIZE, rte_socket_id());
 	if (!at) {
 		rte_flow_error_set(error, ENOMEM,
 				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
@@ -2431,18 +2679,20 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 				   "cannot allocate action template");
 		return NULL;
 	}
+	/* Actions part is in the first half. */
 	at->attr = *attr;
 	at->actions = (struct rte_flow_action *)(at + 1);
-	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, at->actions, len,
-				actions, error);
+	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, at->actions,
+				len, ra, error);
 	if (act_len <= 0)
 		goto error;
-	at->masks = (struct rte_flow_action *)
-		    (((uint8_t *)at->actions) + act_len);
+	/* Masks part is in the second half. */
+	at->masks = (struct rte_flow_action *)(((uint8_t *)at->actions) + act_len);
 	mask_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, at->masks,
-				 len - act_len, masks, error);
+				 len - act_len, rm, error);
 	if (mask_len <= 0)
 		goto error;
+	at->rx_cpy_pos = pos;
 	/*
 	 * mlx5 PMD hacks indirect action index directly to the action conf.
 	 * The rte_flow_conv() function copies the content from conf pointer.
@@ -2459,7 +2709,8 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 	LIST_INSERT_HEAD(&priv->flow_hw_at, at, next);
 	return at;
 error:
-	mlx5_free(at);
+	if (at)
+		mlx5_free(at);
 	return NULL;
 }
 
@@ -2534,6 +2785,80 @@ flow_hw_copy_prepend_port_item(const struct rte_flow_item *items,
 	return copied_items;
 }
 
+static int
+flow_hw_pattern_validate(struct rte_eth_dev *dev,
+			 const struct rte_flow_pattern_template_attr *attr,
+			 const struct rte_flow_item items[],
+			 struct rte_flow_error *error)
+{
+	int i;
+	bool items_end = false;
+	RTE_SET_USED(dev);
+	RTE_SET_USED(attr);
+
+	for (i = 0; !items_end; i++) {
+		int type = items[i].type;
+
+		switch (type) {
+		case RTE_FLOW_ITEM_TYPE_TAG:
+		{
+			int reg;
+			const struct rte_flow_item_tag *tag =
+				(const struct rte_flow_item_tag *)items[i].spec;
+
+			reg = flow_hw_get_reg_id(RTE_FLOW_ITEM_TYPE_TAG, tag->index);
+			if (reg == REG_NON)
+				return rte_flow_error_set(error, EINVAL,
+							  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+							  NULL,
+							  "Unsupported tag index");
+			break;
+		}
+		case MLX5_RTE_FLOW_ITEM_TYPE_TAG:
+		{
+			const struct rte_flow_item_tag *tag =
+				(const struct rte_flow_item_tag *)items[i].spec;
+			struct mlx5_priv *priv = dev->data->dev_private;
+			uint8_t regcs = (uint8_t)priv->sh->cdev->config.hca_attr.set_reg_c;
+
+			if (!((1 << (tag->index - REG_C_0)) & regcs))
+				return rte_flow_error_set(error, EINVAL,
+							  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+							  NULL,
+							  "Unsupported internal tag index");
+		}
+		case RTE_FLOW_ITEM_TYPE_VOID:
+		case RTE_FLOW_ITEM_TYPE_ETH:
+		case RTE_FLOW_ITEM_TYPE_VLAN:
+		case RTE_FLOW_ITEM_TYPE_IPV4:
+		case RTE_FLOW_ITEM_TYPE_IPV6:
+		case RTE_FLOW_ITEM_TYPE_UDP:
+		case RTE_FLOW_ITEM_TYPE_TCP:
+		case RTE_FLOW_ITEM_TYPE_GTP:
+		case RTE_FLOW_ITEM_TYPE_GTP_PSC:
+		case RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT:
+		case RTE_FLOW_ITEM_TYPE_VXLAN:
+		case MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE:
+		case RTE_FLOW_ITEM_TYPE_META:
+		case RTE_FLOW_ITEM_TYPE_GRE:
+		case RTE_FLOW_ITEM_TYPE_GRE_KEY:
+		case RTE_FLOW_ITEM_TYPE_GRE_OPTION:
+		case RTE_FLOW_ITEM_TYPE_ICMP:
+		case RTE_FLOW_ITEM_TYPE_ICMP6:
+			break;
+		case RTE_FLOW_ITEM_TYPE_END:
+			items_end = true;
+			break;
+		default:
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+						  NULL,
+						  "Unsupported item type");
+		}
+	}
+	return 0;
+}
+
 /**
  * Create flow item template.
  *
@@ -2560,6 +2885,8 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 	struct rte_flow_item *copied_items = NULL;
 	const struct rte_flow_item *tmpl_items;
 
+	if (flow_hw_pattern_validate(dev, attr, items, error))
+		return NULL;
 	if (priv->sh->config.dv_esw_en && attr->ingress) {
 		/*
 		 * Disallow pattern template with ingress and egress/transfer
@@ -2994,6 +3321,17 @@ flow_hw_free_vport_actions(struct mlx5_priv *priv)
 	priv->hw_vport = NULL;
 }
 
+static uint32_t
+flow_hw_usable_lsb_vport_mask(struct mlx5_priv *priv)
+{
+	uint32_t usable_mask = ~priv->vport_meta_mask;
+
+	if (usable_mask)
+		return (1 << rte_bsf32(usable_mask));
+	else
+		return 0;
+}
+
 /**
  * Creates a flow pattern template used to match on E-Switch Manager.
  * This template is used to set up a table for SQ miss default flow.
@@ -3032,7 +3370,10 @@ flow_hw_create_ctrl_esw_mgr_pattern_template(struct rte_eth_dev *dev)
 }
 
 /**
- * Creates a flow pattern template used to match on a TX queue.
+ * Creates a flow pattern template used to match REG_C_0 and a TX queue.
+ * Matching on REG_C_0 is set up to match on least significant bit usable
+ * by user-space, which is set when packet was originated from E-Switch Manager.
+ *
  * This template is used to set up a table for SQ miss default flow.
  *
  * @param dev
@@ -3042,16 +3383,30 @@ flow_hw_create_ctrl_esw_mgr_pattern_template(struct rte_eth_dev *dev)
  *   Pointer to flow pattern template on success, NULL otherwise.
  */
 static struct rte_flow_pattern_template *
-flow_hw_create_ctrl_sq_pattern_template(struct rte_eth_dev *dev)
+flow_hw_create_ctrl_regc_sq_pattern_template(struct rte_eth_dev *dev)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint32_t marker_bit = flow_hw_usable_lsb_vport_mask(priv);
 	struct rte_flow_pattern_template_attr attr = {
 		.relaxed_matching = 0,
 		.transfer = 1,
 	};
+	struct rte_flow_item_tag reg_c0_spec = {
+		.index = (uint8_t)REG_C_0,
+	};
+	struct rte_flow_item_tag reg_c0_mask = {
+		.index = 0xff,
+	};
 	struct mlx5_rte_flow_item_tx_queue queue_mask = {
 		.queue = UINT32_MAX,
 	};
 	struct rte_flow_item items[] = {
+		{
+			.type = (enum rte_flow_item_type)
+				MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+			.spec = &reg_c0_spec,
+			.mask = &reg_c0_mask,
+		},
 		{
 			.type = (enum rte_flow_item_type)
 				MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
@@ -3062,6 +3417,12 @@ flow_hw_create_ctrl_sq_pattern_template(struct rte_eth_dev *dev)
 		},
 	};
 
+	if (!marker_bit) {
+		DRV_LOG(ERR, "Unable to set up pattern template for SQ miss table");
+		return NULL;
+	}
+	reg_c0_spec.data = marker_bit;
+	reg_c0_mask.data = marker_bit;
 	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
 }
 
@@ -3099,6 +3460,132 @@ flow_hw_create_ctrl_port_pattern_template(struct rte_eth_dev *dev)
 	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
 }
 
+/*
+ * Creating a flow pattern template with all ETH packets matching.
+ * This template is used to set up a table for default Tx copy (Tx metadata
+ * to REG_C_1) flow rule usage.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow pattern template on success, NULL otherwise.
+ */
+static struct rte_flow_pattern_template *
+flow_hw_create_tx_default_mreg_copy_pattern_template(struct rte_eth_dev *dev)
+{
+	struct rte_flow_pattern_template_attr tx_pa_attr = {
+		.relaxed_matching = 0,
+		.egress = 1,
+	};
+	struct rte_flow_item_eth promisc = {
+		.dst.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+		.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+		.type = 0,
+	};
+	struct rte_flow_item eth_all[] = {
+		[0] = {
+			.type = RTE_FLOW_ITEM_TYPE_ETH,
+			.spec = &promisc,
+			.mask = &promisc,
+		},
+		[1] = {
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	struct rte_flow_error drop_err;
+
+	RTE_SET_USED(drop_err);
+	return flow_hw_pattern_template_create(dev, &tx_pa_attr, eth_all, &drop_err);
+}
+
+/**
+ * Creates a flow actions template with modify field action and masked jump action.
+ * Modify field action sets the least significant bit of REG_C_0 (usable by user-space)
+ * to 1, meaning that packet was originated from E-Switch Manager. Jump action
+ * transfers steering to group 1.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow actions template on success, NULL otherwise.
+ */
+static struct rte_flow_actions_template *
+flow_hw_create_ctrl_regc_jump_actions_template(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint32_t marker_bit = flow_hw_usable_lsb_vport_mask(priv);
+	uint32_t marker_bit_mask = UINT32_MAX;
+	struct rte_flow_actions_template_attr attr = {
+		.transfer = 1,
+	};
+	struct rte_flow_action_modify_field set_reg_v = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_C_0,
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+		},
+		.width = 1,
+	};
+	struct rte_flow_action_modify_field set_reg_m = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+		},
+		.width = UINT32_MAX,
+	};
+	struct rte_flow_action_jump jump_v = {
+		.group = MLX5_HW_LOWEST_USABLE_GROUP,
+	};
+	struct rte_flow_action_jump jump_m = {
+		.group = UINT32_MAX,
+	};
+	struct rte_flow_action actions_v[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+			.conf = &set_reg_v,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump_v,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+	struct rte_flow_action actions_m[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+			.conf = &set_reg_m,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump_m,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+
+	if (!marker_bit) {
+		DRV_LOG(ERR, "Unable to set up actions template for SQ miss table");
+		return NULL;
+	}
+	set_reg_v.dst.offset = rte_bsf32(marker_bit);
+	rte_memcpy(set_reg_v.src.value, &marker_bit, sizeof(marker_bit));
+	rte_memcpy(set_reg_m.src.value, &marker_bit_mask, sizeof(marker_bit_mask));
+	return flow_hw_actions_template_create(dev, &attr, actions_v, actions_m, NULL);
+}
+
 /**
  * Creates a flow actions template with an unmasked JUMP action. Flows
  * based on this template will perform a jump to some group. This template
@@ -3193,6 +3680,73 @@ flow_hw_create_ctrl_port_actions_template(struct rte_eth_dev *dev)
 					       NULL);
 }
 
+/*
+ * Creating an actions template to use header modify action for register
+ * copying. This template is used to set up a table for copy flow.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow actions template on success, NULL otherwise.
+ */
+static struct rte_flow_actions_template *
+flow_hw_create_tx_default_mreg_copy_actions_template(struct rte_eth_dev *dev)
+{
+	struct rte_flow_actions_template_attr tx_act_attr = {
+		.egress = 1,
+	};
+	const struct rte_flow_action_modify_field mreg_action = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_C_1,
+		},
+		.src = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_A,
+		},
+		.width = 32,
+	};
+	const struct rte_flow_action_modify_field mreg_mask = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.src = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.width = UINT32_MAX,
+	};
+	const struct rte_flow_action copy_reg_action[] = {
+		[0] = {
+			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+			.conf = &mreg_action,
+		},
+		[1] = {
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		},
+	};
+	const struct rte_flow_action copy_reg_mask[] = {
+		[0] = {
+			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+			.conf = &mreg_mask,
+		},
+		[1] = {
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		},
+	};
+	struct rte_flow_error drop_err;
+
+	RTE_SET_USED(drop_err);
+	return flow_hw_actions_template_create(dev, &tx_act_attr, copy_reg_action,
+					       copy_reg_mask, &drop_err);
+}
+
 /**
  * Creates a control flow table used to transfer traffic from E-Switch Manager
  * and TX queues from group 0 to group 1.
@@ -3222,8 +3776,12 @@ flow_hw_create_ctrl_sq_miss_root_table(struct rte_eth_dev *dev,
 		},
 		.nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
 	};
+	struct mlx5_flow_template_table_cfg cfg = {
+		.attr = attr,
+		.external = false,
+	};
 
-	return flow_hw_table_create(dev, &attr, &it, 1, &at, 1, NULL);
+	return flow_hw_table_create(dev, &cfg, &it, 1, &at, 1, NULL);
 }
 
 
@@ -3248,16 +3806,56 @@ flow_hw_create_ctrl_sq_miss_table(struct rte_eth_dev *dev,
 {
 	struct rte_flow_template_table_attr attr = {
 		.flow_attr = {
-			.group = MLX5_HW_SQ_MISS_GROUP,
-			.priority = 0,
+			.group = 1,
+			.priority = MLX5_HW_LOWEST_PRIO_NON_ROOT,
 			.ingress = 0,
 			.egress = 0,
 			.transfer = 1,
 		},
 		.nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
 	};
+	struct mlx5_flow_template_table_cfg cfg = {
+		.attr = attr,
+		.external = false,
+	};
 
-	return flow_hw_table_create(dev, &attr, &it, 1, &at, 1, NULL);
+	return flow_hw_table_create(dev, &cfg, &it, 1, &at, 1, NULL);
+}
+
+/*
+ * Creating the default Tx metadata copy table on NIC Tx group 0.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param pt
+ *   Pointer to flow pattern template.
+ * @param at
+ *   Pointer to flow actions template.
+ *
+ * @return
+ *   Pointer to flow table on success, NULL otherwise.
+ */
+static struct rte_flow_template_table*
+flow_hw_create_tx_default_mreg_copy_table(struct rte_eth_dev *dev,
+					  struct rte_flow_pattern_template *pt,
+					  struct rte_flow_actions_template *at)
+{
+	struct rte_flow_template_table_attr tx_tbl_attr = {
+		.flow_attr = {
+			.group = 0, /* Root */
+			.priority = MLX5_HW_LOWEST_PRIO_ROOT,
+			.egress = 1,
+		},
+		.nb_flows = 1, /* One default flow rule for all. */
+	};
+	struct mlx5_flow_template_table_cfg tx_tbl_cfg = {
+		.attr = tx_tbl_attr,
+		.external = false,
+	};
+	struct rte_flow_error drop_err;
+
+	RTE_SET_USED(drop_err);
+	return flow_hw_table_create(dev, &tx_tbl_cfg, &pt, 1, &at, 1, &drop_err);
 }
 
 /**
@@ -3282,15 +3880,19 @@ flow_hw_create_ctrl_jump_table(struct rte_eth_dev *dev,
 	struct rte_flow_template_table_attr attr = {
 		.flow_attr = {
 			.group = 0,
-			.priority = 15, /* TODO: Flow priority discovery. */
+			.priority = MLX5_HW_LOWEST_PRIO_ROOT,
 			.ingress = 0,
 			.egress = 0,
 			.transfer = 1,
 		},
 		.nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
 	};
+	struct mlx5_flow_template_table_cfg cfg = {
+		.attr = attr,
+		.external = false,
+	};
 
-	return flow_hw_table_create(dev, &attr, &it, 1, &at, 1, NULL);
+	return flow_hw_table_create(dev, &cfg, &it, 1, &at, 1, NULL);
 }
 
 /**
@@ -3308,11 +3910,14 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow_pattern_template *esw_mgr_items_tmpl = NULL;
-	struct rte_flow_pattern_template *sq_items_tmpl = NULL;
+	struct rte_flow_pattern_template *regc_sq_items_tmpl = NULL;
 	struct rte_flow_pattern_template *port_items_tmpl = NULL;
-	struct rte_flow_actions_template *jump_sq_actions_tmpl = NULL;
+	struct rte_flow_pattern_template *tx_meta_items_tmpl = NULL;
+	struct rte_flow_actions_template *regc_jump_actions_tmpl = NULL;
 	struct rte_flow_actions_template *port_actions_tmpl = NULL;
 	struct rte_flow_actions_template *jump_one_actions_tmpl = NULL;
+	struct rte_flow_actions_template *tx_meta_actions_tmpl = NULL;
+	uint32_t xmeta = priv->sh->config.dv_xmeta_en;
 
 	/* Item templates */
 	esw_mgr_items_tmpl = flow_hw_create_ctrl_esw_mgr_pattern_template(dev);
@@ -3321,8 +3926,8 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 			" template for control flows", dev->data->port_id);
 		goto error;
 	}
-	sq_items_tmpl = flow_hw_create_ctrl_sq_pattern_template(dev);
-	if (!sq_items_tmpl) {
+	regc_sq_items_tmpl = flow_hw_create_ctrl_regc_sq_pattern_template(dev);
+	if (!regc_sq_items_tmpl) {
 		DRV_LOG(ERR, "port %u failed to create SQ item template for"
 			" control flows", dev->data->port_id);
 		goto error;
@@ -3333,11 +3938,18 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 			" control flows", dev->data->port_id);
 		goto error;
 	}
+	if (xmeta == MLX5_XMETA_MODE_META32_HWS) {
+		tx_meta_items_tmpl = flow_hw_create_tx_default_mreg_copy_pattern_template(dev);
+		if (!tx_meta_items_tmpl) {
+			DRV_LOG(ERR, "port %u failed to create Tx metadata copy pattern"
+				" template for control flows", dev->data->port_id);
+			goto error;
+		}
+	}
 	/* Action templates */
-	jump_sq_actions_tmpl = flow_hw_create_ctrl_jump_actions_template(dev,
-									 MLX5_HW_SQ_MISS_GROUP);
-	if (!jump_sq_actions_tmpl) {
-		DRV_LOG(ERR, "port %u failed to create jump action template"
+	regc_jump_actions_tmpl = flow_hw_create_ctrl_regc_jump_actions_template(dev);
+	if (!regc_jump_actions_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create REG_C set and jump action template"
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
@@ -3347,23 +3959,32 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
-	jump_one_actions_tmpl = flow_hw_create_ctrl_jump_actions_template(dev, 1);
+	jump_one_actions_tmpl = flow_hw_create_ctrl_jump_actions_template
+			(dev, MLX5_HW_LOWEST_USABLE_GROUP);
 	if (!jump_one_actions_tmpl) {
 		DRV_LOG(ERR, "port %u failed to create jump action template"
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
+	if (xmeta == MLX5_XMETA_MODE_META32_HWS) {
+		tx_meta_actions_tmpl = flow_hw_create_tx_default_mreg_copy_actions_template(dev);
+		if (!tx_meta_actions_tmpl) {
+			DRV_LOG(ERR, "port %u failed to create Tx metadata copy actions"
+				" template for control flows", dev->data->port_id);
+			goto error;
+		}
+	}
 	/* Tables */
 	MLX5_ASSERT(priv->hw_esw_sq_miss_root_tbl == NULL);
 	priv->hw_esw_sq_miss_root_tbl = flow_hw_create_ctrl_sq_miss_root_table
-			(dev, esw_mgr_items_tmpl, jump_sq_actions_tmpl);
+			(dev, esw_mgr_items_tmpl, regc_jump_actions_tmpl);
 	if (!priv->hw_esw_sq_miss_root_tbl) {
 		DRV_LOG(ERR, "port %u failed to create table for default sq miss (root table)"
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
 	MLX5_ASSERT(priv->hw_esw_sq_miss_tbl == NULL);
-	priv->hw_esw_sq_miss_tbl = flow_hw_create_ctrl_sq_miss_table(dev, sq_items_tmpl,
+	priv->hw_esw_sq_miss_tbl = flow_hw_create_ctrl_sq_miss_table(dev, regc_sq_items_tmpl,
 								     port_actions_tmpl);
 	if (!priv->hw_esw_sq_miss_tbl) {
 		DRV_LOG(ERR, "port %u failed to create table for default sq miss (non-root table)"
@@ -3378,6 +3999,16 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
+	if (xmeta == MLX5_XMETA_MODE_META32_HWS) {
+		MLX5_ASSERT(priv->hw_tx_meta_cpy_tbl == NULL);
+		priv->hw_tx_meta_cpy_tbl = flow_hw_create_tx_default_mreg_copy_table(dev,
+					tx_meta_items_tmpl, tx_meta_actions_tmpl);
+		if (!priv->hw_tx_meta_cpy_tbl) {
+			DRV_LOG(ERR, "port %u failed to create table for default"
+				" Tx metadata copy flow rule", dev->data->port_id);
+			goto error;
+		}
+	}
 	return 0;
 error:
 	if (priv->hw_esw_zero_tbl) {
@@ -3392,16 +4023,20 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 		flow_hw_table_destroy(dev, priv->hw_esw_sq_miss_root_tbl, NULL);
 		priv->hw_esw_sq_miss_root_tbl = NULL;
 	}
+	if (xmeta == MLX5_XMETA_MODE_META32_HWS && tx_meta_actions_tmpl)
+		flow_hw_actions_template_destroy(dev, tx_meta_actions_tmpl, NULL);
 	if (jump_one_actions_tmpl)
 		flow_hw_actions_template_destroy(dev, jump_one_actions_tmpl, NULL);
 	if (port_actions_tmpl)
 		flow_hw_actions_template_destroy(dev, port_actions_tmpl, NULL);
-	if (jump_sq_actions_tmpl)
-		flow_hw_actions_template_destroy(dev, jump_sq_actions_tmpl, NULL);
+	if (regc_jump_actions_tmpl)
+		flow_hw_actions_template_destroy(dev, regc_jump_actions_tmpl, NULL);
+	if (xmeta == MLX5_XMETA_MODE_META32_HWS && tx_meta_items_tmpl)
+		flow_hw_pattern_template_destroy(dev, tx_meta_items_tmpl, NULL);
 	if (port_items_tmpl)
 		flow_hw_pattern_template_destroy(dev, port_items_tmpl, NULL);
-	if (sq_items_tmpl)
-		flow_hw_pattern_template_destroy(dev, sq_items_tmpl, NULL);
+	if (regc_sq_items_tmpl)
+		flow_hw_pattern_template_destroy(dev, regc_sq_items_tmpl, NULL);
 	if (esw_mgr_items_tmpl)
 		flow_hw_pattern_template_destroy(dev, esw_mgr_items_tmpl, NULL);
 	return -EINVAL;
@@ -3453,7 +4088,7 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	struct rte_flow_queue_attr **_queue_attr = NULL;
 	struct rte_flow_queue_attr ctrl_queue_attr = {0};
 	bool is_proxy = !!(priv->sh->config.dv_esw_en && priv->master);
-	int ret;
+	int ret = 0;
 
 	if (!port_attr || !nb_queue || !queue_attr) {
 		rte_errno = EINVAL;
@@ -3604,6 +4239,9 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	}
 	if (_queue_attr)
 		mlx5_free(_queue_attr);
+	/* Do not overwrite the internal errno information. */
+	if (ret)
+		return ret;
 	return rte_flow_error_set(error, rte_errno,
 				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
 				  "fail to configure port");
@@ -3712,17 +4350,17 @@ void flow_hw_init_tags_set(struct rte_eth_dev *dev)
 		return;
 	unset |= 1 << (priv->mtr_color_reg - REG_C_0);
 	unset |= 1 << (REG_C_6 - REG_C_0);
-	if (meta_mode == MLX5_XMETA_MODE_META32_HWS) {
-		unset |= 1 << (REG_C_1 - REG_C_0);
+	if (priv->sh->config.dv_esw_en)
 		unset |= 1 << (REG_C_0 - REG_C_0);
-	}
+	if (meta_mode == MLX5_XMETA_MODE_META32_HWS)
+		unset |= 1 << (REG_C_1 - REG_C_0);
 	masks &= ~unset;
 	if (mlx5_flow_hw_avl_tags_init_cnt) {
 		for (i = 0; i < MLX5_FLOW_HW_TAGS_MAX; i++) {
 			if (mlx5_flow_hw_avl_tags[i] != REG_NON && !!((1 << i) & masks)) {
 				copy[mlx5_flow_hw_avl_tags[i] - REG_C_0] =
 						mlx5_flow_hw_avl_tags[i];
-				copy_masks |= (1 << i);
+				copy_masks |= (1 << (mlx5_flow_hw_avl_tags[i] - REG_C_0));
 			}
 		}
 		if (copy_masks != masks) {
@@ -3864,7 +4502,6 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 	return flow_dv_action_destroy(dev, handle, error);
 }
 
-
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.info_get = flow_hw_info_get,
 	.configure = flow_hw_configure,
@@ -3872,7 +4509,7 @@ const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.pattern_template_destroy = flow_hw_pattern_template_destroy,
 	.actions_template_create = flow_hw_actions_template_create,
 	.actions_template_destroy = flow_hw_actions_template_destroy,
-	.template_table_create = flow_hw_table_create,
+	.template_table_create = flow_hw_template_table_create,
 	.template_table_destroy = flow_hw_table_destroy,
 	.async_flow_create = flow_hw_async_flow_create,
 	.async_flow_destroy = flow_hw_async_flow_destroy,
@@ -3888,13 +4525,6 @@ const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.action_query = flow_dv_action_query,
 };
 
-static uint32_t
-flow_hw_get_ctrl_queue(struct mlx5_priv *priv)
-{
-	MLX5_ASSERT(priv->nb_queue > 0);
-	return priv->nb_queue - 1;
-}
-
 /**
  * Creates a control flow using flow template API on @p proxy_dev device,
  * on behalf of @p owner_dev device.
@@ -3932,7 +4562,7 @@ flow_hw_create_ctrl_flow(struct rte_eth_dev *owner_dev,
 			 uint8_t action_template_idx)
 {
 	struct mlx5_priv *priv = proxy_dev->data->dev_private;
-	uint32_t queue = flow_hw_get_ctrl_queue(priv);
+	uint32_t queue = priv->nb_queue - 1;
 	struct rte_flow_op_attr op_attr = {
 		.postpone = 0,
 	};
@@ -4007,7 +4637,7 @@ static int
 flow_hw_destroy_ctrl_flow(struct rte_eth_dev *dev, struct rte_flow *flow)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	uint32_t queue = flow_hw_get_ctrl_queue(priv);
+	uint32_t queue = priv->nb_queue - 1;
 	struct rte_flow_op_attr op_attr = {
 		.postpone = 0,
 	};
@@ -4144,10 +4774,24 @@ mlx5_flow_hw_esw_create_mgr_sq_miss_flow(struct rte_eth_dev *dev)
 			.type = RTE_FLOW_ITEM_TYPE_END,
 		},
 	};
+	struct rte_flow_action_modify_field modify_field = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+		},
+		.width = 1,
+	};
 	struct rte_flow_action_jump jump = {
-		.group = MLX5_HW_SQ_MISS_GROUP,
+		.group = 1,
 	};
 	struct rte_flow_action actions[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+			.conf = &modify_field,
+		},
 		{
 			.type = RTE_FLOW_ACTION_TYPE_JUMP,
 			.conf = &jump,
@@ -4170,6 +4814,12 @@ int
 mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
 {
 	uint16_t port_id = dev->data->port_id;
+	struct rte_flow_item_tag reg_c0_spec = {
+		.index = (uint8_t)REG_C_0,
+	};
+	struct rte_flow_item_tag reg_c0_mask = {
+		.index = 0xff,
+	};
 	struct mlx5_rte_flow_item_tx_queue queue_spec = {
 		.queue = txq,
 	};
@@ -4177,6 +4827,12 @@ mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
 		.queue = UINT32_MAX,
 	};
 	struct rte_flow_item items[] = {
+		{
+			.type = (enum rte_flow_item_type)
+				MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+			.spec = &reg_c0_spec,
+			.mask = &reg_c0_mask,
+		},
 		{
 			.type = (enum rte_flow_item_type)
 				MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
@@ -4202,6 +4858,7 @@ mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
 	struct rte_eth_dev *proxy_dev;
 	struct mlx5_priv *proxy_priv;
 	uint16_t proxy_port_id = dev->data->port_id;
+	uint32_t marker_bit;
 	int ret;
 
 	RTE_SET_USED(txq);
@@ -4222,6 +4879,14 @@ mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
 		rte_errno = ENOMEM;
 		return -rte_errno;
 	}
+	marker_bit = flow_hw_usable_lsb_vport_mask(proxy_priv);
+	if (!marker_bit) {
+		DRV_LOG(ERR, "Unable to set up control flow in SQ miss table");
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	reg_c0_spec.data = marker_bit;
+	reg_c0_mask.data = marker_bit;
 	return flow_hw_create_ctrl_flow(dev, proxy_dev,
 					proxy_priv->hw_esw_sq_miss_tbl,
 					items, 0, actions, 0);
@@ -4281,4 +4946,53 @@ mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev)
 					items, 0, actions, 0);
 }
 
+int
+mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_flow_item_eth promisc = {
+		.dst.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+		.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+		.type = 0,
+	};
+	struct rte_flow_item eth_all[] = {
+		[0] = {
+			.type = RTE_FLOW_ITEM_TYPE_ETH,
+			.spec = &promisc,
+			.mask = &promisc,
+		},
+		[1] = {
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	struct rte_flow_action_modify_field mreg_action = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_C_1,
+		},
+		.src = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_A,
+		},
+		.width = 32,
+	};
+	struct rte_flow_action copy_reg_action[] = {
+		[0] = {
+			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+			.conf = &mreg_action,
+		},
+		[1] = {
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		},
+	};
+
+	MLX5_ASSERT(priv->master);
+	if (!priv->dr_ctx || !priv->hw_tx_meta_cpy_tbl)
+		return 0;
+	return flow_hw_create_ctrl_flow(dev, dev,
+					priv->hw_tx_meta_cpy_tbl,
+					eth_all, 0, copy_reg_action, 0);
+}
+
 #endif
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 3ef31671b1..9e458356a0 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -1290,6 +1290,9 @@ mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
 	if (priv->sh->config.dv_esw_en && priv->master) {
 		if (mlx5_flow_hw_esw_create_mgr_sq_miss_flow(dev))
 			goto error;
+		if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS)
+			if (mlx5_flow_hw_create_tx_default_mreg_copy_flow(dev))
+				goto error;
 	}
 	for (i = 0; i < priv->txqs_n; ++i) {
 		struct mlx5_txq_ctrl *txq = mlx5_txq_get(dev, i);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH 09/27] ethdev: add meter profiles/policies config
  2022-09-23 14:43 [PATCH 00/27] net/mlx5: HW steering PMD update Suanming Mou
                   ` (7 preceding siblings ...)
  2022-09-23 14:43 ` [PATCH 08/27] net/mlx5: add extended metadata mode for hardware steering Suanming Mou
@ 2022-09-23 14:43 ` Suanming Mou
  2022-09-23 14:43 ` [PATCH 10/27] net/mlx5: add HW steering meter action Suanming Mou
                   ` (22 subsequent siblings)
  31 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-23 14:43 UTC (permalink / raw)
  To: Ori Kam, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
  Cc: dev, Alexander Kozyrev

From: Alexander Kozyrev <akozyrev@nvidia.com>

Provide the ability to specify the number of meter profiles/policies
alongside the number of meters during the flow engine configuration.
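
As a usage illustration, a sketch of requesting these resources at
configuration time follows; the counts, the queue size, and port_id are
placeholders, and only nb_meter_profiles/nb_meter_policies are new in this
patch:

#include <rte_flow.h>

static int
configure_meter_resources(uint16_t port_id)
{
	const struct rte_flow_port_attr port_attr = {
		.nb_meters = 1024,
		.nb_meter_profiles = 64,
		.nb_meter_policies = 64,
	};
	const struct rte_flow_queue_attr queue_attr = { .size = 64 };
	const struct rte_flow_queue_attr *queue_attrs[] = { &queue_attr };
	struct rte_flow_error error;

	return rte_flow_configure(port_id, &port_attr, 1, queue_attrs, &error);
}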

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
---
 lib/ethdev/rte_flow.h | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
index a79f1e7ef0..abb475bdee 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -4898,10 +4898,20 @@ struct rte_flow_port_info {
 	 */
 	uint32_t max_nb_aging_objects;
 	/**
-	 * Maximum number traffic meters.
+	 * Maximum number of traffic meters.
 	 * @see RTE_FLOW_ACTION_TYPE_METER
 	 */
 	uint32_t max_nb_meters;
+	/**
+	 * Maximum number of traffic meter profiles.
+	 * @see RTE_FLOW_ACTION_TYPE_METER
+	 */
+	uint32_t max_nb_meter_profiles;
+	/**
+	 * Maximum number of traffic meter policies.
+	 * @see RTE_FLOW_ACTION_TYPE_METER
+	 */
+	uint32_t max_nb_meter_policies;
 };
 
 /**
@@ -4971,6 +4981,16 @@ struct rte_flow_port_attr {
 	 * @see RTE_FLOW_ACTION_TYPE_METER
 	 */
 	uint32_t nb_meters;
+	/**
+	 * Number of traffic meter profiles to configure.
+	 * @see RTE_FLOW_ACTION_TYPE_METER
+	 */
+	uint32_t nb_meter_profiles;
+	/**
+	 * Number of traffic meter policies to configure.
+	 * @see RTE_FLOW_ACTION_TYPE_METER
+	 */
+	uint32_t nb_meter_policies;
 };
 
 /**
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH 10/27] net/mlx5: add HW steering meter action
  2022-09-23 14:43 [PATCH 00/27] net/mlx5: HW steering PMD update Suanming Mou
                   ` (8 preceding siblings ...)
  2022-09-23 14:43 ` [PATCH 09/27] ethdev: add meter profiles/policies config Suanming Mou
@ 2022-09-23 14:43 ` Suanming Mou
  2022-09-23 14:43 ` [PATCH 11/27] net/mlx5: add HW steering counter action Suanming Mou
                   ` (21 subsequent siblings)
  31 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-23 14:43 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, Alexander Kozyrev

From: Alexander Kozyrev <akozyrev@nvidia.com>

This commit adds the meter action for HW steering.

The HW steering meter is based on ASO. The number of meters that will
be used by flows should be specified in advance in the flow
configure API.
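
As a usage illustration, a sketch of an actions template carrying the meter
action follows; the meter ID, the attribute flags, and port_id are
placeholders, and it assumes rte_flow_configure() was already called with a
non-zero nb_meters:

#include <stdint.h>
#include <rte_flow.h>

static struct rte_flow_actions_template *
create_meter_actions_template(uint16_t port_id, struct rte_flow_error *error)
{
	/* The mask marks the meter ID as per-rule data. */
	const struct rte_flow_action_meter meter_v = { .mtr_id = 1 };
	const struct rte_flow_action_meter meter_m = { .mtr_id = UINT32_MAX };
	const struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_METER, .conf = &meter_v },
		{ .type = RTE_FLOW_ACTION_TYPE_QUEUE },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	const struct rte_flow_action masks[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_METER, .conf = &meter_m },
		{ .type = RTE_FLOW_ACTION_TYPE_QUEUE },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	const struct rte_flow_actions_template_attr attr = { .ingress = 1 };

	return rte_flow_actions_template_create(port_id, &attr, actions, masks,
						error);
}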

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
---
 drivers/net/mlx5/mlx5.h            |  58 +-
 drivers/net/mlx5/mlx5_flow.c       |  71 +++
 drivers/net/mlx5/mlx5_flow.h       |  50 ++
 drivers/net/mlx5/mlx5_flow_aso.c   |  30 +-
 drivers/net/mlx5/mlx5_flow_dv.c    |  25 -
 drivers/net/mlx5/mlx5_flow_hw.c    | 113 +++-
 drivers/net/mlx5/mlx5_flow_meter.c | 851 ++++++++++++++++++++++++++++-
 7 files changed, 1138 insertions(+), 60 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 3364c4735c..263b502d37 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -357,6 +357,9 @@ struct mlx5_hw_q {
 	struct mlx5_hw_q_job **job; /* LIFO header. */
 } __rte_cache_aligned;
 
+
+
+
 #define MLX5_COUNTERS_PER_POOL 512
 #define MLX5_MAX_PENDING_QUERIES 4
 #define MLX5_CNT_CONTAINER_RESIZE 64
@@ -782,15 +785,29 @@ struct mlx5_flow_meter_policy {
 	/* Is meter action in policy table. */
 	uint32_t hierarchy_drop_cnt:1;
 	/* Is any meter in hierarchy contains drop_cnt. */
+	uint32_t skip_r:1;
+	/* If red color policy is skipped. */
 	uint32_t skip_y:1;
 	/* If yellow color policy is skipped. */
 	uint32_t skip_g:1;
 	/* If green color policy is skipped. */
 	uint32_t mark:1;
 	/* If policy contains mark action. */
+	uint32_t initialized:1;
+	/* Initialized. */
+	uint16_t group;
+	/* The group. */
 	rte_spinlock_t sl;
 	uint32_t ref_cnt;
 	/* Use count. */
+	struct rte_flow_pattern_template *hws_item_templ;
+	/* Hardware steering item templates. */
+	struct rte_flow_actions_template *hws_act_templ[MLX5_MTR_DOMAIN_MAX];
+	/* Hardware steering action templates. */
+	struct rte_flow_template_table *hws_flow_table[MLX5_MTR_DOMAIN_MAX];
+	/* Hardware steering tables. */
+	struct rte_flow *hws_flow_rule[MLX5_MTR_DOMAIN_MAX][RTE_COLORS];
+	/* Hardware steering rules. */
 	struct mlx5_meter_policy_action_container act_cnt[MLX5_MTR_RTE_COLORS];
 	/* Policy actions container. */
 	void *dr_drop_action[MLX5_MTR_DOMAIN_MAX];
@@ -865,6 +882,7 @@ struct mlx5_flow_meter_info {
 	 */
 	uint32_t transfer:1;
 	uint32_t def_policy:1;
+	uint32_t initialized:1;
 	/* Meter points to default policy. */
 	uint32_t color_aware:1;
 	/* Meter is color aware mode. */
@@ -880,6 +898,10 @@ struct mlx5_flow_meter_info {
 	/**< Flow meter action. */
 	void *meter_action_y;
 	/**< Flow meter action for yellow init_color. */
+	uint32_t meter_offset;
+	/**< Flow meter offset. */
+	uint16_t group;
+	/**< Flow meter group. */
 };
 
 /* PPS(packets per second) map to BPS(Bytes per second).
@@ -914,6 +936,7 @@ struct mlx5_flow_meter_profile {
 	uint32_t ref_cnt; /**< Use count. */
 	uint32_t g_support:1; /**< If G color will be generated. */
 	uint32_t y_support:1; /**< If Y color will be generated. */
+	uint32_t initialized:1; /**< Initialized. */
 };
 
 /* 2 meters in each ASO cache line */
@@ -934,13 +957,20 @@ enum mlx5_aso_mtr_state {
 	ASO_METER_READY, /* CQE received. */
 };
 
+/* ASO flow meter type. */
+enum mlx5_aso_mtr_type {
+	ASO_METER_INDIRECT,
+	ASO_METER_DIRECT,
+};
+
 /* Generic aso_flow_meter information. */
 struct mlx5_aso_mtr {
 	LIST_ENTRY(mlx5_aso_mtr) next;
+	enum mlx5_aso_mtr_type type;
 	struct mlx5_flow_meter_info fm;
 	/**< Pointer to the next aso flow meter structure. */
 	uint8_t state; /**< ASO flow meter state. */
-	uint8_t offset;
+	uint32_t offset;
 };
 
 /* Generic aso_flow_meter pool structure. */
@@ -964,6 +994,14 @@ struct mlx5_aso_mtr_pools_mng {
 	struct mlx5_aso_mtr_pool **pools; /* ASO flow meter pool array. */
 };
 
+/* Bulk management structure for ASO flow meter. */
+struct mlx5_mtr_bulk {
+	uint32_t size; /* Number of ASO objects. */
+	struct mlx5dr_action *action; /* HWS action */
+	struct mlx5_devx_obj *devx_obj; /* DEVX object. */
+	struct mlx5_aso_mtr *aso; /* Array of ASO objects. */
+};
+
 /* Meter management structure for global flow meter resource. */
 struct mlx5_flow_mtr_mng {
 	struct mlx5_aso_mtr_pools_mng pools_mng;
@@ -1017,6 +1055,7 @@ struct mlx5_flow_tbl_resource {
 #define MLX5_FLOW_TABLE_LEVEL_METER (MLX5_MAX_TABLES - 3)
 #define MLX5_FLOW_TABLE_LEVEL_POLICY (MLX5_MAX_TABLES - 4)
 #define MLX5_MAX_TABLES_EXTERNAL MLX5_FLOW_TABLE_LEVEL_POLICY
+#define MLX5_FLOW_TABLE_HWS_POLICY (MLX5_MAX_TABLES - 10)
 #define MLX5_MAX_TABLES_FDB UINT16_MAX
 #define MLX5_FLOW_TABLE_FACTOR 10
 
@@ -1303,6 +1342,12 @@ TAILQ_HEAD(mlx5_mtr_profiles, mlx5_flow_meter_profile);
 /* MTR list. */
 TAILQ_HEAD(mlx5_legacy_flow_meters, mlx5_legacy_flow_meter);
 
+struct mlx5_mtr_config {
+	uint32_t nb_meters; /**< Number of configured meters */
+	uint32_t nb_meter_profiles; /**< Number of configured meter profiles */
+	uint32_t nb_meter_policies; /**< Number of configured meter policies */
+};
+
 /* RSS description. */
 struct mlx5_flow_rss_desc {
 	uint32_t level;
@@ -1539,12 +1584,16 @@ struct mlx5_priv {
 	struct mlx5_nl_vlan_vmwa_context *vmwa_context; /* VLAN WA context. */
 	struct mlx5_hlist *mreg_cp_tbl;
 	/* Hash table of Rx metadata register copy table. */
+	struct mlx5_mtr_config mtr_config; /* Meter configuration */
 	uint8_t mtr_sfx_reg; /* Meter prefix-suffix flow match REG_C. */
 	uint8_t mtr_color_reg; /* Meter color match REG_C. */
 	struct mlx5_legacy_flow_meters flow_meters; /* MTR list. */
 	struct mlx5_l3t_tbl *mtr_profile_tbl; /* Meter index lookup table. */
+	struct mlx5_flow_meter_profile *mtr_profile_arr; /* Profile array. */
 	struct mlx5_l3t_tbl *policy_idx_tbl; /* Policy index lookup table. */
+	struct mlx5_flow_meter_policy *mtr_policy_arr; /* Policy array. */
 	struct mlx5_l3t_tbl *mtr_idx_tbl; /* Meter index lookup table. */
+	struct mlx5_mtr_bulk mtr_bulk; /* Meter index mapping for HWS */
 	uint8_t skip_default_rss_reta; /* Skip configuration of default reta. */
 	uint8_t fdb_def_rule; /* Whether fdb jump to table 1 is configured. */
 	struct mlx5_mp_id mp_id; /* ID of a multi-process process */
@@ -1579,6 +1628,7 @@ struct mlx5_priv {
 
 #define PORT_ID(priv) ((priv)->dev_data->port_id)
 #define ETH_DEV(priv) (&rte_eth_devices[PORT_ID(priv)])
+#define CTRL_QUEUE_ID(priv) ((priv)->nb_queue - 1)
 
 struct rte_hairpin_peer_info {
 	uint32_t qp_id;
@@ -1890,6 +1940,10 @@ void mlx5_pmd_socket_uninit(void);
 
 /* mlx5_flow_meter.c */
 
+int mlx5_flow_meter_init(struct rte_eth_dev *dev,
+			 uint32_t nb_meters,
+			 uint32_t nb_meter_profiles,
+			 uint32_t nb_meter_policies);
 int mlx5_flow_meter_ops_get(struct rte_eth_dev *dev, void *arg);
 struct mlx5_flow_meter_info *mlx5_flow_meter_find(struct mlx5_priv *priv,
 		uint32_t meter_id, uint32_t *mtr_idx);
@@ -1964,7 +2018,7 @@ int mlx5_aso_flow_hit_queue_poll_stop(struct mlx5_dev_ctx_shared *sh);
 void mlx5_aso_queue_uninit(struct mlx5_dev_ctx_shared *sh,
 		enum mlx5_access_aso_opc_mod aso_opc_mod);
 int mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
-		struct mlx5_aso_mtr *mtr);
+		struct mlx5_aso_mtr *mtr, struct mlx5_mtr_bulk *bulk);
 int mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh,
 		struct mlx5_aso_mtr *mtr);
 int mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index b570ed7f69..fb3be940e5 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -8331,6 +8331,40 @@ mlx5_flow_port_configure(struct rte_eth_dev *dev,
 	return fops->configure(dev, port_attr, nb_queue, queue_attr, error);
 }
 
+/**
+ * Validate item template.
+ *
+ * @param[in] dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param[in] attr
+ *   Pointer to the item template attributes.
+ * @param[in] items
+ *   The template item pattern.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_flow_pattern_validate(struct rte_eth_dev *dev,
+		const struct rte_flow_pattern_template_attr *attr,
+		const struct rte_flow_item items[],
+		struct rte_flow_error *error)
+{
+	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
+
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW) {
+		rte_flow_error_set(error, ENOTSUP,
+			RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+			"pattern validate with incorrect steering mode");
+		return -ENOTSUP;
+	}
+	fops = flow_get_drv_ops(MLX5_FLOW_TYPE_HW);
+	return fops->pattern_validate(dev, attr, items, error);
+}
+
 /**
  * Create flow item template.
  *
@@ -8396,6 +8430,43 @@ mlx5_flow_pattern_template_destroy(struct rte_eth_dev *dev,
 	return fops->pattern_template_destroy(dev, template, error);
 }
 
+/**
+ * Validate flow actions template.
+ *
+ * @param[in] dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param[in] attr
+ *   Pointer to the action template attributes.
+ * @param[in] actions
+ *   Associated actions (list terminated by the END action).
+ * @param[in] masks
+ *   List of actions that marks which of the action's member is constant.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_flow_actions_validate(struct rte_eth_dev *dev,
+			const struct rte_flow_actions_template_attr *attr,
+			const struct rte_flow_action actions[],
+			const struct rte_flow_action masks[],
+			struct rte_flow_error *error)
+{
+	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
+
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW) {
+		rte_flow_error_set(error, ENOTSUP,
+			RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+			"actions validate with incorrect steering mode");
+		return -ENOTSUP;
+	}
+	fops = flow_get_drv_ops(MLX5_FLOW_TYPE_HW);
+	return fops->actions_validate(dev, attr, actions, masks, error);
+}
+
 /**
  * Create flow item template.
  *
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 15c5826d8a..c5190b1d4f 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1653,6 +1653,11 @@ typedef int (*mlx5_flow_port_configure_t)
 			 uint16_t nb_queue,
 			 const struct rte_flow_queue_attr *queue_attr[],
 			 struct rte_flow_error *err);
+typedef int (*mlx5_flow_pattern_validate_t)
+			(struct rte_eth_dev *dev,
+			 const struct rte_flow_pattern_template_attr *attr,
+			 const struct rte_flow_item items[],
+			 struct rte_flow_error *error);
 typedef struct rte_flow_pattern_template *(*mlx5_flow_pattern_template_create_t)
 			(struct rte_eth_dev *dev,
 			 const struct rte_flow_pattern_template_attr *attr,
@@ -1662,6 +1667,12 @@ typedef int (*mlx5_flow_pattern_template_destroy_t)
 			(struct rte_eth_dev *dev,
 			 struct rte_flow_pattern_template *template,
 			 struct rte_flow_error *error);
+typedef int (*mlx5_flow_actions_validate_t)
+			(struct rte_eth_dev *dev,
+			 const struct rte_flow_actions_template_attr *attr,
+			 const struct rte_flow_action actions[],
+			 const struct rte_flow_action masks[],
+			 struct rte_flow_error *error);
 typedef struct rte_flow_actions_template *(*mlx5_flow_actions_template_create_t)
 			(struct rte_eth_dev *dev,
 			 const struct rte_flow_actions_template_attr *attr,
@@ -1778,8 +1789,10 @@ struct mlx5_flow_driver_ops {
 	mlx5_flow_item_update_t item_update;
 	mlx5_flow_info_get_t info_get;
 	mlx5_flow_port_configure_t configure;
+	mlx5_flow_pattern_validate_t pattern_validate;
 	mlx5_flow_pattern_template_create_t pattern_template_create;
 	mlx5_flow_pattern_template_destroy_t pattern_template_destroy;
+	mlx5_flow_actions_validate_t actions_validate;
 	mlx5_flow_actions_template_create_t actions_template_create;
 	mlx5_flow_actions_template_destroy_t actions_template_destroy;
 	mlx5_flow_table_create_t template_table_create;
@@ -1861,6 +1874,8 @@ mlx5_aso_meter_by_idx(struct mlx5_priv *priv, uint32_t idx)
 
 	/* Decrease to original index. */
 	idx--;
+	if (priv->mtr_bulk.aso)
+		return priv->mtr_bulk.aso + idx;
 	MLX5_ASSERT(idx / MLX5_ASO_MTRS_PER_POOL < pools_mng->n);
 	rte_rwlock_read_lock(&pools_mng->resize_mtrwl);
 	pool = pools_mng->pools[idx / MLX5_ASO_MTRS_PER_POOL];
@@ -1963,6 +1978,32 @@ mlx5_translate_tunnel_etypes(uint64_t pattern_flags)
 
 int flow_hw_q_flow_flush(struct rte_eth_dev *dev,
 			 struct rte_flow_error *error);
+
+/*
+ * Convert rte_mtr_color to mlx5 color.
+ *
+ * @param[in] rcol
+ *   rte_mtr_color.
+ *
+ * @return
+ *   mlx5 color.
+ */
+static inline int
+rte_col_2_mlx5_col(enum rte_color rcol)
+{
+	switch (rcol) {
+	case RTE_COLOR_GREEN:
+		return MLX5_FLOW_COLOR_GREEN;
+	case RTE_COLOR_YELLOW:
+		return MLX5_FLOW_COLOR_YELLOW;
+	case RTE_COLOR_RED:
+		return MLX5_FLOW_COLOR_RED;
+	default:
+		break;
+	}
+	return MLX5_FLOW_COLOR_UNDEFINED;
+}
+
 int mlx5_flow_group_to_table(struct rte_eth_dev *dev,
 			     const struct mlx5_flow_tunnel *tunnel,
 			     uint32_t group, uint32_t *table,
@@ -2346,4 +2387,13 @@ int mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev,
 					 uint32_t txq);
 int mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev);
 int mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev);
+int mlx5_flow_actions_validate(struct rte_eth_dev *dev,
+		const struct rte_flow_actions_template_attr *attr,
+		const struct rte_flow_action actions[],
+		const struct rte_flow_action masks[],
+		struct rte_flow_error *error);
+int mlx5_flow_pattern_validate(struct rte_eth_dev *dev,
+		const struct rte_flow_pattern_template_attr *attr,
+		const struct rte_flow_item items[],
+		struct rte_flow_error *error);
 #endif /* RTE_PMD_MLX5_FLOW_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow_aso.c b/drivers/net/mlx5/mlx5_flow_aso.c
index 4129e3a9e0..60d0280367 100644
--- a/drivers/net/mlx5/mlx5_flow_aso.c
+++ b/drivers/net/mlx5/mlx5_flow_aso.c
@@ -642,7 +642,8 @@ mlx5_aso_flow_hit_queue_poll_stop(struct mlx5_dev_ctx_shared *sh)
 static uint16_t
 mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 			       struct mlx5_aso_sq *sq,
-			       struct mlx5_aso_mtr *aso_mtr)
+			       struct mlx5_aso_mtr *aso_mtr,
+			       struct mlx5_mtr_bulk *bulk)
 {
 	volatile struct mlx5_aso_wqe *wqe = NULL;
 	struct mlx5_flow_meter_info *fm = NULL;
@@ -653,6 +654,7 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	uint32_t dseg_idx = 0;
 	struct mlx5_aso_mtr_pool *pool = NULL;
 	uint32_t param_le;
+	int id;
 
 	rte_spinlock_lock(&sq->sqsl);
 	res = size - (uint16_t)(sq->head - sq->tail);
@@ -666,14 +668,19 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	/* Fill next WQE. */
 	fm = &aso_mtr->fm;
 	sq->elts[sq->head & mask].mtr = aso_mtr;
-	pool = container_of(aso_mtr, struct mlx5_aso_mtr_pool,
-			mtrs[aso_mtr->offset]);
-	wqe->general_cseg.misc = rte_cpu_to_be_32(pool->devx_obj->id +
-			(aso_mtr->offset >> 1));
-	wqe->general_cseg.opcode = rte_cpu_to_be_32(MLX5_OPCODE_ACCESS_ASO |
-			(ASO_OPC_MOD_POLICER <<
-			WQE_CSEG_OPC_MOD_OFFSET) |
-			sq->pi << WQE_CSEG_WQE_INDEX_OFFSET);
+	if (aso_mtr->type == ASO_METER_INDIRECT) {
+		pool = container_of(aso_mtr, struct mlx5_aso_mtr_pool,
+				    mtrs[aso_mtr->offset]);
+		id = pool->devx_obj->id;
+	} else {
+		id = bulk->devx_obj->id;
+	}
+	wqe->general_cseg.misc = rte_cpu_to_be_32(id +
+						  (aso_mtr->offset >> 1));
+	wqe->general_cseg.opcode =
+		rte_cpu_to_be_32(MLX5_OPCODE_ACCESS_ASO |
+			(ASO_OPC_MOD_POLICER << WQE_CSEG_OPC_MOD_OFFSET) |
+			 sq->pi << WQE_CSEG_WQE_INDEX_OFFSET);
 	/* There are 2 meters in one ASO cache line. */
 	dseg_idx = aso_mtr->offset & 0x1;
 	wqe->aso_cseg.data_mask =
@@ -811,14 +818,15 @@ mlx5_aso_mtr_completion_handle(struct mlx5_aso_sq *sq)
  */
 int
 mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
-			struct mlx5_aso_mtr *mtr)
+			struct mlx5_aso_mtr *mtr,
+			struct mlx5_mtr_bulk *bulk)
 {
 	struct mlx5_aso_sq *sq = &sh->mtrmng->pools_mng.sq;
 	uint32_t poll_wqe_times = MLX5_MTR_POLL_WQE_CQE_TIMES;
 
 	do {
 		mlx5_aso_mtr_completion_handle(sq);
-		if (mlx5_aso_mtr_sq_enqueue_single(sh, sq, mtr))
+		if (mlx5_aso_mtr_sq_enqueue_single(sh, sq, mtr, bulk))
 			return 0;
 		/* Waiting for wqe resource. */
 		rte_delay_us_sleep(MLX5_ASO_WQE_CQE_RESPONSE_DELAY);
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index d1f0d63fdc..80539fd75d 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -216,31 +216,6 @@ flow_dv_attr_init(const struct rte_flow_item *item, union flow_dv_attr *attr,
 	attr->valid = 1;
 }
 
-/*
- * Convert rte_mtr_color to mlx5 color.
- *
- * @param[in] rcol
- *   rte_mtr_color.
- *
- * @return
- *   mlx5 color.
- */
-static inline int
-rte_col_2_mlx5_col(enum rte_color rcol)
-{
-	switch (rcol) {
-	case RTE_COLOR_GREEN:
-		return MLX5_FLOW_COLOR_GREEN;
-	case RTE_COLOR_YELLOW:
-		return MLX5_FLOW_COLOR_YELLOW;
-	case RTE_COLOR_RED:
-		return MLX5_FLOW_COLOR_RED;
-	default:
-		break;
-	}
-	return MLX5_FLOW_COLOR_UNDEFINED;
-}
-
 struct field_modify_info modify_eth[] = {
 	{4,  0, MLX5_MODI_OUT_DMAC_47_16},
 	{2,  4, MLX5_MODI_OUT_DMAC_15_0},
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index dfbf885530..959d566d68 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -903,6 +903,38 @@ flow_hw_represented_port_compile(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static __rte_always_inline int
+flow_hw_meter_compile(struct rte_eth_dev *dev,
+		      const struct mlx5_flow_template_table_cfg *cfg,
+		      uint32_t  start_pos, const struct rte_flow_action *action,
+		      struct mlx5_hw_actions *acts, uint32_t *end_pos,
+		      struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr *aso_mtr;
+	const struct rte_flow_action_meter *meter = action->conf;
+	uint32_t pos = start_pos;
+	uint32_t group = cfg->attr.flow_attr.group;
+
+	aso_mtr = mlx5_aso_meter_by_idx(priv, meter->mtr_id);
+	acts->rule_acts[pos].action = priv->mtr_bulk.action;
+	acts->rule_acts[pos].aso_meter.offset = aso_mtr->offset;
+	acts->jump = flow_hw_jump_action_register
+			(dev, cfg, aso_mtr->fm.group, error);
+	if (!acts->jump) {
+		*end_pos = start_pos;
+		return -ENOMEM;
+	}
+	acts->rule_acts[++pos].action = (!!group) ?
+				    acts->jump->hws_action :
+				    acts->jump->root_action;
+	*end_pos = pos;
+	if (mlx5_aso_mtr_wait(priv->sh, aso_mtr)) {
+		*end_pos = start_pos;
+		return -ENOMEM;
+	}
+	return 0;
+}
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -1131,6 +1163,21 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 				goto err;
 			i++;
 			break;
+		case RTE_FLOW_ACTION_TYPE_METER:
+			if (actions->conf && masks->conf &&
+			    ((const struct rte_flow_action_meter *)
+			     masks->conf)->mtr_id) {
+				err = flow_hw_meter_compile(dev, cfg,
+						i, actions, acts, &i, error);
+				if (err)
+					goto err;
+			} else if (__flow_hw_act_data_general_append(priv, acts,
+							actions->type,
+							actions - action_start,
+							i))
+				goto err;
+			i++;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -1461,6 +1508,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	const struct rte_flow_action_raw_encap *raw_encap_data;
 	const struct rte_flow_item *enc_item = NULL;
 	const struct rte_flow_action_ethdev *port_action = NULL;
+	const struct rte_flow_action_meter *meter = NULL;
 	uint8_t *buf = job->encap_data;
 	struct rte_flow_attr attr = {
 			.ingress = 1,
@@ -1468,6 +1516,8 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	uint32_t ft_flag;
 	size_t encap_len = 0;
 	int ret;
+	struct mlx5_aso_mtr *mtr;
+	uint32_t mtr_id;
 
 	memcpy(rule_acts, hw_acts->rule_acts,
 	       sizeof(*rule_acts) * hw_acts->acts_num);
@@ -1587,6 +1637,29 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			rule_acts[act_data->action_dst].action =
 					priv->hw_vport[port_action->port_id];
 			break;
+		case RTE_FLOW_ACTION_TYPE_METER:
+			meter = action->conf;
+			mtr_id = meter->mtr_id;
+			mtr = mlx5_aso_meter_by_idx(priv, mtr_id);
+			rule_acts[act_data->action_dst].action =
+				priv->mtr_bulk.action;
+			rule_acts[act_data->action_dst].aso_meter.offset =
+								mtr->offset;
+			jump = flow_hw_jump_action_register
+				(dev, &table->cfg, mtr->fm.group, NULL);
+			if (!jump)
+				return -1;
+			MLX5_ASSERT
+				(!rule_acts[act_data->action_dst + 1].action);
+			rule_acts[act_data->action_dst + 1].action =
+					(!!attr.group) ? jump->hws_action :
+							 jump->root_action;
+			job->flow->jump = jump;
+			job->flow->fate_type = MLX5_FLOW_FATE_JUMP;
+			(*acts_num)++;
+			if (mlx5_aso_mtr_wait(priv->sh, mtr))
+				return -1;
+			break;
 		default:
 			break;
 		}
@@ -2483,7 +2556,7 @@ flow_hw_action_meta_copy_insert(const struct rte_flow_action actions[],
 }
 
 static int
-flow_hw_action_validate(struct rte_eth_dev *dev,
+flow_hw_actions_validate(struct rte_eth_dev *dev,
 			const struct rte_flow_actions_template_attr *attr,
 			const struct rte_flow_action actions[],
 			const struct rte_flow_action masks[],
@@ -2549,6 +2622,9 @@ flow_hw_action_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_RAW_DECAP:
 			/* TODO: Validation logic */
 			break;
+		case RTE_FLOW_ACTION_TYPE_METER:
+			/* TODO: Validation logic */
+			break;
 		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
 			ret = flow_hw_validate_action_modify_field(action,
 									mask,
@@ -2642,7 +2718,7 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 		.conf = &rx_mreg_mask,
 	};
 
-	if (flow_hw_action_validate(dev, attr, actions, masks, error))
+	if (flow_hw_actions_validate(dev, attr, actions, masks, error))
 		return NULL;
 	if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
 	    priv->sh->config.dv_esw_en) {
@@ -2988,15 +3064,27 @@ flow_hw_pattern_template_destroy(struct rte_eth_dev *dev __rte_unused,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-flow_hw_info_get(struct rte_eth_dev *dev __rte_unused,
-		 struct rte_flow_port_info *port_info __rte_unused,
-		 struct rte_flow_queue_info *queue_info __rte_unused,
+flow_hw_info_get(struct rte_eth_dev *dev,
+		 struct rte_flow_port_info *port_info,
+		 struct rte_flow_queue_info *queue_info,
 		 struct rte_flow_error *error __rte_unused)
 {
-	/* Nothing to be updated currently. */
+	uint16_t port_id = dev->data->port_id;
+	struct rte_mtr_capabilities mtr_cap;
+	int ret;
+
 	memset(port_info, 0, sizeof(*port_info));
 	/* Queue size is unlimited from low-level. */
+	port_info->max_nb_queues = UINT32_MAX;
 	queue_info->max_size = UINT32_MAX;
+
+	memset(&mtr_cap, 0, sizeof(struct rte_mtr_capabilities));
+	ret = rte_mtr_capabilities_get(port_id, &mtr_cap, NULL);
+	if (!ret) {
+		port_info->max_nb_meters = mtr_cap.n_max;
+		port_info->max_nb_meter_profiles = UINT32_MAX;
+		port_info->max_nb_meter_policies = UINT32_MAX;
+	}
 	return 0;
 }
 
@@ -4191,6 +4279,13 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	priv->nb_queue = nb_q_updated;
 	rte_spinlock_init(&priv->hw_ctrl_lock);
 	LIST_INIT(&priv->hw_ctrl_flows);
+	/* Initialize meter library. */
+	if (port_attr->nb_meters)
+		if (mlx5_flow_meter_init(dev,
+					port_attr->nb_meters,
+					port_attr->nb_meter_profiles,
+					port_attr->nb_meter_policies))
+			goto err;
 	/* Add global actions. */
 	for (i = 0; i < MLX5_HW_ACTION_FLAG_MAX; i++) {
 		uint32_t act_flags = 0;
@@ -4505,8 +4600,10 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.info_get = flow_hw_info_get,
 	.configure = flow_hw_configure,
+	.pattern_validate = flow_hw_pattern_validate,
 	.pattern_template_create = flow_hw_pattern_template_create,
 	.pattern_template_destroy = flow_hw_pattern_template_destroy,
+	.actions_validate = flow_hw_actions_validate,
 	.actions_template_create = flow_hw_actions_template_create,
 	.actions_template_destroy = flow_hw_actions_template_destroy,
 	.template_table_create = flow_hw_template_table_create,
@@ -4562,7 +4659,7 @@ flow_hw_create_ctrl_flow(struct rte_eth_dev *owner_dev,
 			 uint8_t action_template_idx)
 {
 	struct mlx5_priv *priv = proxy_dev->data->dev_private;
-	uint32_t queue = priv->nb_queue - 1;
+	uint32_t queue = CTRL_QUEUE_ID(priv);
 	struct rte_flow_op_attr op_attr = {
 		.postpone = 0,
 	};
@@ -4637,7 +4734,7 @@ static int
 flow_hw_destroy_ctrl_flow(struct rte_eth_dev *dev, struct rte_flow *flow)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	uint32_t queue = priv->nb_queue - 1;
+	uint32_t queue = CTRL_QUEUE_ID(priv);
 	struct rte_flow_op_attr op_attr = {
 		.postpone = 0,
 	};
diff --git a/drivers/net/mlx5/mlx5_flow_meter.c b/drivers/net/mlx5/mlx5_flow_meter.c
index d4aafe4eea..b69021f6a0 100644
--- a/drivers/net/mlx5/mlx5_flow_meter.c
+++ b/drivers/net/mlx5/mlx5_flow_meter.c
@@ -18,6 +18,157 @@
 static int mlx5_flow_meter_disable(struct rte_eth_dev *dev,
 		uint32_t meter_id, struct rte_mtr_error *error);
 
+static void
+mlx5_flow_meter_uninit(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (priv->mtr_policy_arr) {
+		mlx5_free(priv->mtr_policy_arr);
+		priv->mtr_policy_arr = NULL;
+	}
+	if (priv->mtr_profile_arr) {
+		mlx5_free(priv->mtr_profile_arr);
+		priv->mtr_profile_arr = NULL;
+	}
+	if (priv->mtr_bulk.aso) {
+		mlx5_free(priv->mtr_bulk.aso);
+		priv->mtr_bulk.aso = NULL;
+		priv->mtr_bulk.size = 0;
+		mlx5_aso_queue_uninit(priv->sh, ASO_OPC_MOD_POLICER);
+	}
+	if (priv->mtr_bulk.action) {
+		mlx5dr_action_destroy(priv->mtr_bulk.action);
+		priv->mtr_bulk.action = NULL;
+	}
+	if (priv->mtr_bulk.devx_obj) {
+		claim_zero(mlx5_devx_cmd_destroy(priv->mtr_bulk.devx_obj));
+		priv->mtr_bulk.devx_obj = NULL;
+	}
+}
+
+int
+mlx5_flow_meter_init(struct rte_eth_dev *dev,
+		     uint32_t nb_meters,
+		     uint32_t nb_meter_profiles,
+		     uint32_t nb_meter_policies)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_devx_obj *dcs = NULL;
+	uint32_t log_obj_size;
+	int ret = 0;
+	int reg_id;
+	struct mlx5_aso_mtr *aso;
+	uint32_t i;
+	struct rte_mtr_error error;
+
+	if (!nb_meters || !nb_meter_profiles || !nb_meter_policies) {
+		ret = ENOTSUP;
+		rte_mtr_error_set(&error, ENOMEM,
+					RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter configuration is invalid.");
+		goto err;
+	}
+	if (!priv->mtr_en || !priv->sh->meter_aso_en) {
+		ret = ENOTSUP;
+		rte_mtr_error_set(&error, ENOMEM,
+					RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter ASO is not supported.");
+		goto err;
+	}
+	priv->mtr_config.nb_meters = nb_meters;
+	if (mlx5_aso_queue_init(priv->sh, ASO_OPC_MOD_POLICER)) {
+		ret = ENOMEM;
+		rte_mtr_error_set(&error, ENOMEM,
+					RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter ASO queue allocation failed.");
+		goto err;
+	}
+	log_obj_size = rte_log2_u32(nb_meters >> 1);
+	dcs = mlx5_devx_cmd_create_flow_meter_aso_obj
+		(priv->sh->cdev->ctx, priv->sh->cdev->pdn,
+			log_obj_size);
+	if (!dcs) {
+		ret = ENOMEM;
+		rte_mtr_error_set(&error, ENOMEM,
+					RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter ASO object allocation failed.");
+		goto err;
+	}
+	priv->mtr_bulk.devx_obj = dcs;
+	reg_id = mlx5_flow_get_reg_id(dev, MLX5_MTR_COLOR, 0, NULL);
+	if (reg_id < 0) {
+		ret = ENOTSUP;
+		rte_mtr_error_set(&error, ENOMEM,
+					RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter register is not available.");
+		goto err;
+	}
+	priv->mtr_bulk.action = mlx5dr_action_create_aso_meter
+			(priv->dr_ctx, (struct mlx5dr_devx_obj *)dcs,
+				reg_id - REG_C_0, MLX5DR_ACTION_FLAG_HWS_RX |
+				MLX5DR_ACTION_FLAG_HWS_TX |
+				MLX5DR_ACTION_FLAG_HWS_FDB);
+	if (!priv->mtr_bulk.action) {
+		ret = ENOMEM;
+		rte_mtr_error_set(&error, ENOMEM,
+					RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter action creation failed.");
+		goto err;
+	}
+	priv->mtr_bulk.aso = mlx5_malloc(MLX5_MEM_ZERO,
+						sizeof(struct mlx5_aso_mtr) * nb_meters,
+						RTE_CACHE_LINE_SIZE,
+						SOCKET_ID_ANY);
+	if (!priv->mtr_bulk.aso) {
+		ret = ENOMEM;
+		rte_mtr_error_set(&error, ENOMEM,
+					RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter bulk ASO allocation failed.");
+		goto err;
+	}
+	priv->mtr_bulk.size = nb_meters;
+	aso = priv->mtr_bulk.aso;
+	for (i = 0; i < priv->mtr_bulk.size; i++) {
+		aso->type = ASO_METER_DIRECT;
+		aso->state = ASO_METER_WAIT;
+		aso->offset = i;
+		aso++;
+	}
+	priv->mtr_config.nb_meter_profiles = nb_meter_profiles;
+	priv->mtr_profile_arr =
+		mlx5_malloc(MLX5_MEM_ZERO,
+				sizeof(struct mlx5_flow_meter_profile) *
+				nb_meter_profiles,
+				RTE_CACHE_LINE_SIZE,
+				SOCKET_ID_ANY);
+	if (!priv->mtr_profile_arr) {
+		ret = ENOMEM;
+		rte_mtr_error_set(&error, ENOMEM,
+					RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter profile allocation failed.");
+		goto err;
+	}
+	priv->mtr_config.nb_meter_policies = nb_meter_policies;
+	priv->mtr_policy_arr =
+		mlx5_malloc(MLX5_MEM_ZERO,
+				sizeof(struct mlx5_flow_meter_policy) *
+				nb_meter_policies,
+				RTE_CACHE_LINE_SIZE,
+				SOCKET_ID_ANY);
+	if (!priv->mtr_policy_arr) {
+		ret = ENOMEM;
+		rte_mtr_error_set(&error, ENOMEM,
+					RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter policy allocation failed.");
+		goto err;
+	}
+	return 0;
+err:
+	mlx5_flow_meter_uninit(dev);
+	return ret;
+}
+
 /**
  * Create the meter action.
  *
@@ -98,6 +249,8 @@ mlx5_flow_meter_profile_find(struct mlx5_priv *priv, uint32_t meter_profile_id)
 	union mlx5_l3t_data data;
 	int32_t ret;
 
+	if (priv->mtr_profile_arr)
+		return &priv->mtr_profile_arr[meter_profile_id];
 	if (mlx5_l3t_get_entry(priv->mtr_profile_tbl,
 			       meter_profile_id, &data) || !data.ptr)
 		return NULL;
@@ -145,17 +298,29 @@ mlx5_flow_meter_profile_validate(struct rte_eth_dev *dev,
 					  RTE_MTR_ERROR_TYPE_METER_PROFILE,
 					  NULL, "Meter profile is null.");
 	/* Meter profile ID must be valid. */
-	if (meter_profile_id == UINT32_MAX)
-		return -rte_mtr_error_set(error, EINVAL,
-					  RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
-					  NULL, "Meter profile id not valid.");
-	/* Meter profile must not exist. */
-	fmp = mlx5_flow_meter_profile_find(priv, meter_profile_id);
-	if (fmp)
-		return -rte_mtr_error_set(error, EEXIST,
-					  RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
-					  NULL,
-					  "Meter profile already exists.");
+	if (priv->mtr_profile_arr) {
+		if (meter_profile_id >= priv->mtr_config.nb_meter_profiles)
+			return -rte_mtr_error_set(error, EINVAL,
+					RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					NULL, "Meter profile id not valid.");
+		fmp = mlx5_flow_meter_profile_find(priv, meter_profile_id);
+		/* Meter profile must not exist. */
+		if (fmp->initialized)
+			return -rte_mtr_error_set(error, EEXIST,
+					RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					NULL, "Meter profile already exists.");
+	} else {
+		if (meter_profile_id == UINT32_MAX)
+			return -rte_mtr_error_set(error, EINVAL,
+					RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					NULL, "Meter profile id not valid.");
+		fmp = mlx5_flow_meter_profile_find(priv, meter_profile_id);
+		/* Meter profile must not exist. */
+		if (fmp)
+			return -rte_mtr_error_set(error, EEXIST,
+					RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					NULL, "Meter profile already exists.");
+	}
 	if (!priv->sh->meter_aso_en) {
 		/* Old version is even not supported. */
 		if (!priv->sh->cdev->config.hca_attr.qos.flow_meter_old)
@@ -574,6 +739,96 @@ mlx5_flow_meter_profile_delete(struct rte_eth_dev *dev,
 	return 0;
 }
 
+/**
+ * Callback to add MTR profile with HWS.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] meter_profile_id
+ *   Meter profile id.
+ * @param[in] profile
+ *   Pointer to meter profile detail.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_profile_hws_add(struct rte_eth_dev *dev,
+			uint32_t meter_profile_id,
+			struct rte_mtr_meter_profile *profile,
+			struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_meter_profile *fmp;
+	int ret;
+
+	if (!priv->mtr_profile_arr)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "Meter profile array is not allocated");
+	/* Check input params. */
+	ret = mlx5_flow_meter_profile_validate(dev, meter_profile_id,
+					       profile, error);
+	if (ret)
+		return ret;
+	fmp = mlx5_flow_meter_profile_find(priv, meter_profile_id);
+	/* Fill profile info. */
+	fmp->id = meter_profile_id;
+	fmp->profile = *profile;
+	fmp->initialized = 1;
+	/* Fill the flow meter parameters for the PRM. */
+	return mlx5_flow_meter_param_fill(fmp, error);
+}
+
+/**
+ * Callback to delete MTR profile with HWS.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] meter_profile_id
+ *   Meter profile id.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_profile_hws_delete(struct rte_eth_dev *dev,
+			uint32_t meter_profile_id,
+			struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_meter_profile *fmp;
+
+	if (!priv->mtr_profile_arr)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "Meter profile array is not allocated");
+	/* Meter id must be valid. */
+	if (meter_profile_id >= priv->mtr_config.nb_meter_profiles)
+		return -rte_mtr_error_set(error, EINVAL,
+					  RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					  &meter_profile_id,
+					  "Meter profile id not valid.");
+	/* Meter profile must exist. */
+	fmp = mlx5_flow_meter_profile_find(priv, meter_profile_id);
+	if (!fmp->initialized)
+		return -rte_mtr_error_set(error, ENOENT,
+					  RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					  &meter_profile_id,
+					  "Meter profile id is invalid.");
+	/* Check profile is unused. */
+	if (fmp->ref_cnt)
+		return -rte_mtr_error_set(error, EBUSY,
+					  RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					  NULL, "Meter profile is in use.");
+	memset(fmp, 0, sizeof(struct mlx5_flow_meter_profile));
+	return 0;
+}
+
 /**
  * Find policy by id.
  *
@@ -594,6 +849,11 @@ mlx5_flow_meter_policy_find(struct rte_eth_dev *dev,
 	struct mlx5_flow_meter_sub_policy *sub_policy = NULL;
 	union mlx5_l3t_data data;
 
+	if (priv->mtr_policy_arr) {
+		if (policy_idx)
+			*policy_idx = policy_id;
+		return &priv->mtr_policy_arr[policy_id];
+	}
 	if (policy_id > MLX5_MAX_SUB_POLICY_TBL_NUM || !priv->policy_idx_tbl)
 		return NULL;
 	if (mlx5_l3t_get_entry(priv->policy_idx_tbl, policy_id, &data) ||
@@ -710,6 +970,43 @@ mlx5_flow_meter_policy_validate(struct rte_eth_dev *dev,
 	return 0;
 }
 
+/**
+ * Callback to check MTR policy action validate for HWS
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] actions
+ *   Pointer to meter policy action detail.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_policy_hws_validate(struct rte_eth_dev *dev,
+	struct rte_mtr_meter_policy_params *policy,
+	struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_actions_template_attr attr = {
+		.transfer = priv->sh->config.dv_esw_en ? 1 : 0 };
+	int ret;
+	int i;
+
+	if (!priv->mtr_en || !priv->sh->meter_aso_en)
+		return -rte_mtr_error_set(error, ENOTSUP,
+				RTE_MTR_ERROR_TYPE_METER_POLICY,
+				NULL, "meter policy unsupported.");
+	for (i = 0; i < RTE_COLORS; i++) {
+		ret = mlx5_flow_actions_validate(dev, &attr, policy->actions[i],
+						 policy->actions[i], NULL);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
 static int
 __mlx5_flow_meter_policy_delete(struct rte_eth_dev *dev,
 			uint32_t policy_id,
@@ -1004,6 +1301,338 @@ mlx5_flow_meter_policy_delete(struct rte_eth_dev *dev,
 	return 0;
 }
 
+/**
+ * Callback to delete MTR policy for HWS.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] policy_id
+ *   Meter policy id.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_policy_hws_delete(struct rte_eth_dev *dev,
+			  uint32_t policy_id,
+			  struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_meter_policy *mtr_policy;
+	uint32_t i, j;
+	uint32_t nb_flows = 0;
+	int ret;
+	struct rte_flow_op_attr op_attr = { .postpone = 1 };
+	struct rte_flow_op_result result[RTE_COLORS * MLX5_MTR_DOMAIN_MAX];
+
+	if (!priv->mtr_policy_arr)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "Meter policy array is not allocated");
+	/* Meter id must be valid. */
+	if (policy_id >= priv->mtr_config.nb_meter_policies)
+		return -rte_mtr_error_set(error, EINVAL,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+					  &policy_id,
+					  "Meter policy id not valid.");
+	/* Meter policy must exist. */
+	mtr_policy = mlx5_flow_meter_policy_find(dev, policy_id, NULL);
+	if (!mtr_policy->initialized)
+		return -rte_mtr_error_set(error, ENOENT,
+			RTE_MTR_ERROR_TYPE_METER_POLICY_ID, NULL,
+			"Meter policy does not exist.");
+	/* Check policy is unused. */
+	if (mtr_policy->ref_cnt)
+		return -rte_mtr_error_set(error, EBUSY,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+					  NULL, "Meter policy is in use.");
+	rte_spinlock_lock(&priv->hw_ctrl_lock);
+	for (i = 0; i < MLX5_MTR_DOMAIN_MAX; i++) {
+		for (j = 0; j < RTE_COLORS; j++) {
+			if (mtr_policy->hws_flow_rule[i][j]) {
+				ret = rte_flow_async_destroy(dev->data->port_id,
+					CTRL_QUEUE_ID(priv), &op_attr,
+					mtr_policy->hws_flow_rule[i][j],
+					NULL, NULL);
+				if (ret < 0)
+					continue;
+				nb_flows++;
+			}
+		}
+	}
+	ret = rte_flow_push(dev->data->port_id, CTRL_QUEUE_ID(priv), NULL);
+	while (nb_flows && (ret >= 0)) {
+		ret = rte_flow_pull(dev->data->port_id,
+					CTRL_QUEUE_ID(priv), result,
+					nb_flows, NULL);
+		nb_flows -= ret;
+	}
+	for (i = 0; i < MLX5_MTR_DOMAIN_MAX; i++) {
+		if (mtr_policy->hws_flow_table[i])
+			rte_flow_template_table_destroy(dev->data->port_id,
+				 mtr_policy->hws_flow_table[i], NULL);
+	}
+	for (i = 0; i < RTE_COLORS; i++) {
+		if (mtr_policy->hws_act_templ[i])
+			rte_flow_actions_template_destroy(dev->data->port_id,
+				 mtr_policy->hws_act_templ[i], NULL);
+	}
+	if (mtr_policy->hws_item_templ)
+		rte_flow_pattern_template_destroy(dev->data->port_id,
+				mtr_policy->hws_item_templ, NULL);
+	rte_spinlock_unlock(&priv->hw_ctrl_lock);
+	memset(mtr_policy, 0, sizeof(struct mlx5_flow_meter_policy));
+	return 0;
+}
+
+/**
+ * Callback to add MTR policy for HWS.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[out] policy_id
+ *   Pointer to policy id
+ * @param[in] actions
+ *   Pointer to meter policy action detail.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_policy_hws_add(struct rte_eth_dev *dev,
+			uint32_t policy_id,
+			struct rte_mtr_meter_policy_params *policy,
+			struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_meter_policy *mtr_policy = NULL;
+	const struct rte_flow_action *act;
+	const struct rte_flow_action_meter *mtr;
+	struct mlx5_flow_meter_info *fm;
+	struct mlx5_flow_meter_policy *plc;
+	uint8_t domain_color = MLX5_MTR_ALL_DOMAIN_BIT;
+	bool is_rss = false;
+	bool is_hierarchy = false;
+	int i, j;
+	uint32_t nb_colors = 0;
+	uint32_t nb_flows = 0;
+	int color;
+	int ret;
+	struct rte_flow_pattern_template_attr pta = {0};
+	struct rte_flow_actions_template_attr ata = {0};
+	struct rte_flow_template_table_attr ta = { {0}, 0 };
+	struct rte_flow_op_attr op_attr = { .postpone = 1 };
+	struct rte_flow_op_result result[RTE_COLORS * MLX5_MTR_DOMAIN_MAX];
+	const uint32_t color_mask = (UINT32_C(1) << MLX5_MTR_COLOR_BITS) - 1;
+	int color_reg_c_idx = mlx5_flow_get_reg_id(dev, MLX5_MTR_COLOR,
+						   0, NULL);
+	struct rte_flow_item_tag tag_spec = {
+		.data = 0,
+		.index = color_reg_c_idx
+	};
+	struct rte_flow_item_tag tag_mask = {
+		.data = color_mask,
+		.index = 0xff};
+	struct rte_flow_item pattern[] = {
+		[0] = {
+			.type = (enum rte_flow_item_type)
+				MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+			.spec = &tag_spec,
+			.mask = &tag_mask,
+		},
+		[1] = { .type = RTE_FLOW_ITEM_TYPE_END }
+	};
+
+	if (!priv->mtr_policy_arr)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY,
+					  NULL, "Meter policy array is not allocated.");
+	if (policy_id >= priv->mtr_config.nb_meter_policies)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+					  NULL, "Meter policy id not valid.");
+	mtr_policy = mlx5_flow_meter_policy_find(dev, policy_id, NULL);
+	if (mtr_policy->initialized)
+		return -rte_mtr_error_set(error, EEXIST,
+			RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+			NULL, "Meter policy already exists.");
+	if (!policy ||
+	    !policy->actions[RTE_COLOR_RED] ||
+	    !policy->actions[RTE_COLOR_YELLOW] ||
+	    !policy->actions[RTE_COLOR_GREEN])
+		return -rte_mtr_error_set(error, EINVAL,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY,
+					  NULL, "Meter policy actions are not valid.");
+	if (policy->actions[RTE_COLOR_RED] == RTE_FLOW_ACTION_TYPE_END)
+		mtr_policy->skip_r = 1;
+	if (policy->actions[RTE_COLOR_YELLOW] == RTE_FLOW_ACTION_TYPE_END)
+		mtr_policy->skip_y = 1;
+	if (policy->actions[RTE_COLOR_GREEN] == RTE_FLOW_ACTION_TYPE_END)
+		mtr_policy->skip_g = 1;
+	if (mtr_policy->skip_r && mtr_policy->skip_y && mtr_policy->skip_g)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+					  NULL, "Meter policy actions are empty.");
+	for (i = 0; i < RTE_COLORS; i++) {
+		act = policy->actions[i];
+		while (act && act->type != RTE_FLOW_ACTION_TYPE_END) {
+			switch (act->type) {
+			case RTE_FLOW_ACTION_TYPE_PORT_ID:
+				/* fall-through. */
+			case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
+				domain_color &= ~(MLX5_MTR_DOMAIN_INGRESS_BIT |
+						  MLX5_MTR_DOMAIN_EGRESS_BIT);
+				break;
+			case RTE_FLOW_ACTION_TYPE_RSS:
+				is_rss = true;
+				/* fall-through. */
+			case RTE_FLOW_ACTION_TYPE_QUEUE:
+				domain_color &= ~(MLX5_MTR_DOMAIN_EGRESS_BIT |
+						  MLX5_MTR_DOMAIN_TRANSFER_BIT);
+				break;
+			case RTE_FLOW_ACTION_TYPE_METER:
+				is_hierarchy = true;
+				mtr = act->conf;
+				fm = mlx5_flow_meter_find(priv,
+							  mtr->mtr_id, NULL);
+				if (!fm)
+					return -rte_mtr_error_set(error, EINVAL,
+						RTE_MTR_ERROR_TYPE_MTR_ID, NULL,
+						"Meter not found in meter hierarchy.");
+				plc = mlx5_flow_meter_policy_find(dev,
+								  fm->policy_id,
+								  NULL);
+				MLX5_ASSERT(plc);
+				domain_color &= MLX5_MTR_ALL_DOMAIN_BIT &
+					(plc->ingress <<
+					 MLX5_MTR_DOMAIN_INGRESS);
+				domain_color &= MLX5_MTR_ALL_DOMAIN_BIT &
+					(plc->egress <<
+					 MLX5_MTR_DOMAIN_EGRESS);
+				domain_color &= MLX5_MTR_ALL_DOMAIN_BIT &
+					(plc->transfer <<
+					 MLX5_MTR_DOMAIN_TRANSFER);
+				break;
+			default:
+				break;
+			}
+			act++;
+		}
+	}
+	if (!domain_color)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+					  NULL, "Meter policy domains are conflicting.");
+	mtr_policy->is_rss = is_rss;
+	mtr_policy->ingress = !!(domain_color & MLX5_MTR_DOMAIN_INGRESS_BIT);
+	pta.ingress = mtr_policy->ingress;
+	mtr_policy->egress = !!(domain_color & MLX5_MTR_DOMAIN_EGRESS_BIT);
+	pta.egress = mtr_policy->egress;
+	mtr_policy->transfer = !!(domain_color & MLX5_MTR_DOMAIN_TRANSFER_BIT);
+	pta.transfer = mtr_policy->transfer;
+	mtr_policy->group = MLX5_FLOW_TABLE_HWS_POLICY - policy_id;
+	mtr_policy->is_hierarchy = is_hierarchy;
+	mtr_policy->initialized = 1;
+	rte_spinlock_lock(&priv->hw_ctrl_lock);
+	mtr_policy->hws_item_templ =
+		rte_flow_pattern_template_create(dev->data->port_id,
+						 &pta, pattern, NULL);
+	if (!mtr_policy->hws_item_templ)
+		goto policy_add_err;
+	for (i = 0; i < RTE_COLORS; i++) {
+		if (mtr_policy->skip_g && i == RTE_COLOR_GREEN)
+			continue;
+		if (mtr_policy->skip_y && i == RTE_COLOR_YELLOW)
+			continue;
+		if (mtr_policy->skip_r && i == RTE_COLOR_RED)
+			continue;
+		mtr_policy->hws_act_templ[nb_colors] =
+			rte_flow_actions_template_create(dev->data->port_id,
+						&ata, policy->actions[i],
+						policy->actions[i], NULL);
+		if (!mtr_policy->hws_act_templ[nb_colors])
+			goto policy_add_err;
+		nb_colors++;
+	}
+	for (i = 0; i < MLX5_MTR_DOMAIN_MAX; i++) {
+		memset(&ta, 0, sizeof(ta));
+		ta.nb_flows = RTE_COLORS;
+		ta.flow_attr.group = mtr_policy->group;
+		if (i == MLX5_MTR_DOMAIN_INGRESS) {
+			if (!mtr_policy->ingress)
+				continue;
+			ta.flow_attr.ingress = 1;
+		} else if (i == MLX5_MTR_DOMAIN_EGRESS) {
+			if (!mtr_policy->egress)
+				continue;
+			ta.flow_attr.egress = 1;
+		} else if (i == MLX5_MTR_DOMAIN_TRANSFER) {
+			if (!mtr_policy->transfer)
+				continue;
+			ta.flow_attr.transfer = 1;
+		}
+		mtr_policy->hws_flow_table[i] =
+			rte_flow_template_table_create(dev->data->port_id,
+					&ta, &mtr_policy->hws_item_templ, 1,
+					mtr_policy->hws_act_templ, nb_colors,
+					NULL);
+		if (!mtr_policy->hws_flow_table[i])
+			goto policy_add_err;
+		nb_colors = 0;
+		for (j = 0; j < RTE_COLORS; j++) {
+			if (mtr_policy->skip_g && j == RTE_COLOR_GREEN)
+				continue;
+			if (mtr_policy->skip_y && j == RTE_COLOR_YELLOW)
+				continue;
+			if (mtr_policy->skip_r && j == RTE_COLOR_RED)
+				continue;
+			color = rte_col_2_mlx5_col((enum rte_color)j);
+			tag_spec.data = color;
+			mtr_policy->hws_flow_rule[i][j] =
+				rte_flow_async_create(dev->data->port_id,
+					CTRL_QUEUE_ID(priv), &op_attr,
+					mtr_policy->hws_flow_table[i],
+					pattern, 0, policy->actions[j],
+					nb_colors, NULL, NULL);
+			if (!mtr_policy->hws_flow_rule[i][j])
+				goto policy_add_err;
+			nb_colors++;
+			nb_flows++;
+		}
+		ret = rte_flow_push(dev->data->port_id,
+				    CTRL_QUEUE_ID(priv), NULL);
+		if (ret < 0)
+			goto policy_add_err;
+		while (nb_flows) {
+			ret = rte_flow_pull(dev->data->port_id,
+					    CTRL_QUEUE_ID(priv), result,
+					    nb_flows, NULL);
+			if (ret < 0)
+				goto policy_add_err;
+			for (j = 0; j < ret; j++) {
+				if (result[j].status == RTE_FLOW_OP_ERROR)
+					goto policy_add_err;
+			}
+			nb_flows -= ret;
+		}
+	}
+	rte_spinlock_unlock(&priv->hw_ctrl_lock);
+	return 0;
+policy_add_err:
+	rte_spinlock_unlock(&priv->hw_ctrl_lock);
+	ret = mlx5_flow_meter_policy_hws_delete(dev, policy_id, error);
+	memset(mtr_policy, 0, sizeof(struct mlx5_flow_meter_policy));
+	if (ret)
+		return ret;
+	return -rte_mtr_error_set(error, ENOTSUP,
+				  RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Failed to create meter policy.");
+}
+
 /**
  * Check meter validation.
  *
@@ -1087,7 +1716,8 @@ mlx5_flow_meter_action_modify(struct mlx5_priv *priv,
 	if (priv->sh->meter_aso_en) {
 		fm->is_enable = !!is_enable;
 		aso_mtr = container_of(fm, struct mlx5_aso_mtr, fm);
-		ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr);
+		ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr,
+						   &priv->mtr_bulk);
 		if (ret)
 			return ret;
 		ret = mlx5_aso_mtr_wait(priv->sh, aso_mtr);
@@ -1336,7 +1966,8 @@ mlx5_flow_meter_create(struct rte_eth_dev *dev, uint32_t meter_id,
 	/* If ASO meter supported, update ASO flow meter by wqe. */
 	if (priv->sh->meter_aso_en) {
 		aso_mtr = container_of(fm, struct mlx5_aso_mtr, fm);
-		ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr);
+		ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr,
+						   &priv->mtr_bulk);
 		if (ret)
 			goto error;
 		if (!priv->mtr_idx_tbl) {
@@ -1369,6 +2000,90 @@ mlx5_flow_meter_create(struct rte_eth_dev *dev, uint32_t meter_id,
 		NULL, "Failed to create devx meter.");
 }
 
+/**
+ * Create meter rules.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] meter_id
+ *   Meter id.
+ * @param[in] params
+ *   Pointer to rte meter parameters.
+ * @param[in] shared
+ *   Meter shared with other flow or not.
+ * @param[out] error
+ *   Pointer to rte meter error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_hws_create(struct rte_eth_dev *dev, uint32_t meter_id,
+		       struct rte_mtr_params *params, int shared,
+		       struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_meter_profile *profile;
+	struct mlx5_flow_meter_info *fm;
+	struct mlx5_flow_meter_policy *policy = NULL;
+	struct mlx5_aso_mtr *aso_mtr;
+	int ret;
+
+	if (!priv->mtr_profile_arr ||
+	    !priv->mtr_policy_arr ||
+	    !priv->mtr_bulk.aso)
+		return -rte_mtr_error_set(error, ENOTSUP,
+			RTE_MTR_ERROR_TYPE_UNSPECIFIED, NULL,
+			"Meter bulk array is not allocated.");
+	/* Meter profile must exist. */
+	profile = mlx5_flow_meter_profile_find(priv, params->meter_profile_id);
+	if (!profile->initialized)
+		return -rte_mtr_error_set(error, ENOENT,
+			RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+			NULL, "Meter profile id not valid.");
+	/* Meter policy must exist. */
+	policy = mlx5_flow_meter_policy_find(dev,
+			params->meter_policy_id, NULL);
+	if (!policy->initialized)
+		return -rte_mtr_error_set(error, ENOENT,
+			RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+			NULL, "Meter policy id not valid.");
+	/* Meter ID must be valid. */
+	if (meter_id >= priv->mtr_config.nb_meters)
+		return -rte_mtr_error_set(error, EINVAL,
+			RTE_MTR_ERROR_TYPE_MTR_ID,
+			NULL, "Meter id not valid.");
+	/* Find ASO object. */
+	aso_mtr = mlx5_aso_meter_by_idx(priv, meter_id);
+	fm = &aso_mtr->fm;
+	if (fm->initialized)
+		return -rte_mtr_error_set(error, ENOENT,
+					  RTE_MTR_ERROR_TYPE_MTR_ID,
+					  NULL, "Meter object already exists.");
+	/* Fill the flow meter parameters. */
+	fm->meter_id = meter_id;
+	fm->policy_id = params->meter_policy_id;
+	fm->profile = profile;
+	fm->meter_offset = meter_id;
+	fm->group = policy->group;
+	/* Add to the flow meter list. */
+	fm->active_state = 1; /* Config meter starts as active. */
+	fm->is_enable = params->meter_enable;
+	fm->shared = !!shared;
+	fm->initialized = 1;
+	/* Update ASO flow meter by wqe. */
+	ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr,
+					   &priv->mtr_bulk);
+	if (ret)
+		return -rte_mtr_error_set(error, ENOTSUP,
+			RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+			NULL, "Failed to create devx meter.");
+	fm->active_state = params->meter_enable;
+	__atomic_add_fetch(&fm->profile->ref_cnt, 1, __ATOMIC_RELAXED);
+	__atomic_add_fetch(&policy->ref_cnt, 1, __ATOMIC_RELAXED);
+	return 0;
+}
+
 static int
 mlx5_flow_meter_params_flush(struct rte_eth_dev *dev,
 			struct mlx5_flow_meter_info *fm,
@@ -1475,6 +2190,58 @@ mlx5_flow_meter_destroy(struct rte_eth_dev *dev, uint32_t meter_id,
 	return 0;
 }
 
+/**
+ * Destroy meter rules.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] meter_id
+ *   Meter id.
+ * @param[out] error
+ *   Pointer to rte meter error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_hws_destroy(struct rte_eth_dev *dev, uint32_t meter_id,
+			struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr *aso_mtr;
+	struct mlx5_flow_meter_info *fm;
+	struct mlx5_flow_meter_policy *policy;
+
+	if (!priv->mtr_profile_arr ||
+	    !priv->mtr_policy_arr ||
+	    !priv->mtr_bulk.aso)
+		return -rte_mtr_error_set(error, ENOTSUP,
+			RTE_MTR_ERROR_TYPE_METER_POLICY, NULL,
+			"Meter bulk array is not allocated.");
+	/* Find ASO object. */
+	aso_mtr = mlx5_aso_meter_by_idx(priv, meter_id);
+	fm = &aso_mtr->fm;
+	if (!fm->initialized)
+		return -rte_mtr_error_set(error, ENOENT,
+					  RTE_MTR_ERROR_TYPE_MTR_ID,
+					  NULL, "Meter object id not valid.");
+	/* Meter object must not have any owner. */
+	if (fm->ref_cnt > 0)
+		return -rte_mtr_error_set(error, EBUSY,
+					  RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "Meter object is being used.");
+	/* Destroy the meter profile. */
+	__atomic_sub_fetch(&fm->profile->ref_cnt,
+						1, __ATOMIC_RELAXED);
+	/* Destroy the meter policy. */
+	policy = mlx5_flow_meter_policy_find(dev,
+			fm->policy_id, NULL);
+	__atomic_sub_fetch(&policy->ref_cnt,
+						1, __ATOMIC_RELAXED);
+	memset(fm, 0, sizeof(struct mlx5_flow_meter_info));
+	return 0;
+}
+
 /**
  * Modify meter state.
  *
@@ -1798,6 +2565,23 @@ static const struct rte_mtr_ops mlx5_flow_mtr_ops = {
 	.stats_read = mlx5_flow_meter_stats_read,
 };
 
+static const struct rte_mtr_ops mlx5_flow_mtr_hws_ops = {
+	.capabilities_get = mlx5_flow_mtr_cap_get,
+	.meter_profile_add = mlx5_flow_meter_profile_hws_add,
+	.meter_profile_delete = mlx5_flow_meter_profile_hws_delete,
+	.meter_policy_validate = mlx5_flow_meter_policy_hws_validate,
+	.meter_policy_add = mlx5_flow_meter_policy_hws_add,
+	.meter_policy_delete = mlx5_flow_meter_policy_hws_delete,
+	.create = mlx5_flow_meter_hws_create,
+	.destroy = mlx5_flow_meter_hws_destroy,
+	.meter_enable = mlx5_flow_meter_enable,
+	.meter_disable = mlx5_flow_meter_disable,
+	.meter_profile_update = mlx5_flow_meter_profile_update,
+	.meter_dscp_table_update = NULL,
+	.stats_update = NULL,
+	.stats_read = NULL,
+};
+
 /**
  * Get meter operations.
  *
@@ -1812,7 +2596,12 @@ static const struct rte_mtr_ops mlx5_flow_mtr_ops = {
 int
 mlx5_flow_meter_ops_get(struct rte_eth_dev *dev __rte_unused, void *arg)
 {
-	*(const struct rte_mtr_ops **)arg = &mlx5_flow_mtr_ops;
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (priv->sh->config.dv_flow_en == 2)
+		*(const struct rte_mtr_ops **)arg = &mlx5_flow_mtr_hws_ops;
+	else
+		*(const struct rte_mtr_ops **)arg = &mlx5_flow_mtr_ops;
 	return 0;
 }
 
@@ -1841,6 +2630,12 @@ mlx5_flow_meter_find(struct mlx5_priv *priv, uint32_t meter_id,
 	union mlx5_l3t_data data;
 	uint16_t n_valid;
 
+	if (priv->mtr_bulk.aso) {
+		if (mtr_idx)
+			*mtr_idx = meter_id;
+		aso_mtr = priv->mtr_bulk.aso + meter_id;
+		return &aso_mtr->fm;
+	}
 	if (priv->sh->meter_aso_en) {
 		rte_rwlock_read_lock(&pools_mng->resize_mtrwl);
 		n_valid = pools_mng->n_valid;
@@ -2185,6 +2980,7 @@ mlx5_flow_meter_flush(struct rte_eth_dev *dev, struct rte_mtr_error *error)
 	struct mlx5_flow_meter_profile *fmp;
 	struct mlx5_legacy_flow_meter *legacy_fm;
 	struct mlx5_flow_meter_info *fm;
+	struct mlx5_flow_meter_policy *policy;
 	struct mlx5_flow_meter_sub_policy *sub_policy;
 	void *tmp;
 	uint32_t i, mtr_idx, policy_idx;
@@ -2219,6 +3015,14 @@ mlx5_flow_meter_flush(struct rte_eth_dev *dev, struct rte_mtr_error *error)
 				NULL, "MTR object meter profile invalid.");
 		}
 	}
+	if (priv->mtr_bulk.aso) {
+		for (i = 1; i <= priv->mtr_config.nb_meters; i++) {
+			aso_mtr = mlx5_aso_meter_by_idx(priv, i);
+			fm = &aso_mtr->fm;
+			if (fm->initialized)
+				mlx5_flow_meter_hws_destroy(dev, i, error);
+		}
+	}
 	if (priv->policy_idx_tbl) {
 		MLX5_L3T_FOREACH(priv->policy_idx_tbl, i, entry) {
 			policy_idx = *(uint32_t *)entry;
@@ -2244,6 +3048,15 @@ mlx5_flow_meter_flush(struct rte_eth_dev *dev, struct rte_mtr_error *error)
 		mlx5_l3t_destroy(priv->policy_idx_tbl);
 		priv->policy_idx_tbl = NULL;
 	}
+	if (priv->mtr_policy_arr) {
+		for (i = 0; i < priv->mtr_config.nb_meter_policies; i++) {
+			policy = mlx5_flow_meter_policy_find(dev, i,
+							     &policy_idx);
+			if (policy->initialized)
+				mlx5_flow_meter_policy_hws_delete(dev, i,
+								  error);
+		}
+	}
 	if (priv->mtr_profile_tbl) {
 		MLX5_L3T_FOREACH(priv->mtr_profile_tbl, i, entry) {
 			fmp = entry;
@@ -2257,9 +3070,19 @@ mlx5_flow_meter_flush(struct rte_eth_dev *dev, struct rte_mtr_error *error)
 		mlx5_l3t_destroy(priv->mtr_profile_tbl);
 		priv->mtr_profile_tbl = NULL;
 	}
+	if (priv->mtr_profile_arr) {
+		for (i = 0; i < priv->mtr_config.nb_meter_profiles; i++) {
+			fmp = mlx5_flow_meter_profile_find(priv, i);
+			if (fmp->initialized)
+				mlx5_flow_meter_profile_hws_delete(dev, i,
+								   error);
+		}
+	}
 	/* Delete default policy table. */
 	mlx5_flow_destroy_def_policy(dev);
 	if (priv->sh->refcnt == 1)
 		mlx5_flow_destroy_mtr_drop_tbls(dev);
+	/* Destroy HWS configuration. */
+	mlx5_flow_meter_uninit(dev);
 	return 0;
 }
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH 11/27] net/mlx5: add HW steering counter action
  2022-09-23 14:43 [PATCH 00/27] net/mlx5: HW steering PMD update Suanming Mou
                   ` (9 preceding siblings ...)
  2022-09-23 14:43 ` [PATCH 10/27] net/mlx5: add HW steering meter action Suanming Mou
@ 2022-09-23 14:43 ` Suanming Mou
  2022-09-23 14:43 ` [PATCH 12/27] net/mlx5: support caching queue action Suanming Mou
                   ` (20 subsequent siblings)
  31 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-23 14:43 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko, Ray Kinsella; +Cc: dev, Xiaoyu Min

From: Xiaoyu Min <jackmin@nvidia.com>

This commit adds HW steering counter action support.
The pool mechanism is the basic data structure for the HW steering counter.

The HW steering counter pool is based on the zero-copy variation of
rte_ring.

There are two global rte_rings:
1. free_list:
     Stores the counter indexes that are ready for use.
2. wait_reset_list:
     Stores the counter indexes that have just been freed by the user;
     the hardware counter must be queried to get the reset value before
     such a counter can be reused.

The counter pool also supports a cache per HW steering queue, which is
also based on the zero-copy variation of rte_ring.

The cache size, preload, threshold, and fetch size are all configurable
and exposed via device arguments.

The main operations of the counter pool are as follows:

 - Get one counter from the pool:
   1. The user calls the _get_* API.
   2. If the cache is enabled, dequeue one counter index from the local
      cache:
       2.A: If the counter dequeued from the local cache is still in
	reset status (its query_gen_when_free is equal to the pool's
	query gen):
	I. Flush all counters in local cache back to global
	   wait_reset_list.
	II. Fetch _fetch_sz_ counters into the cache from the global
	    free list.
	III. Fetch one counter from the cache.
   3. If the cache is empty, fetch _fetch_sz_ counters from the global
      free list into the cache and fetch one counter from the cache.
 - Free one counter into the pool:
   1. The user calls _put_* API.
   2. Put the counter into the local cache.
   3. If the local cache is full:
      3.A: Write back all counters above _threshold_ into the global
           wait_reset_list.
      3.B: Also, write back this counter into the global wait_reset_list.

When the local cache is disabled, _get_/_put_ operate directly on the
global lists.
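
For illustration only, a condensed sketch of the "get" path described
above is shown below. It uses the plain rte_ring burst API for brevity
(the actual pool uses the zero-copy ring variants), and the structure
and function names are assumptions, not the code added by this patch:

#include <errno.h>
#include <stdint.h>
#include <rte_ring.h>

/* Hypothetical, simplified pool layout. */
struct cnt_cache {
	struct rte_ring *ring;  /* Per-queue cache of counter indexes. */
	uint32_t fetch_sz;      /* Counters fetched on a cache miss. */
};

struct cnt_pool {
	struct rte_ring *free_list;       /* Indexes ready for use. */
	struct rte_ring *wait_reset_list; /* Freed, pending HW query. */
	struct cnt_cache *cache;          /* One cache per HWS queue. */
};

/* Get one counter index for queue @qid (steps 2-3 above; the
 * query_gen_when_free check of step 2.A is omitted for brevity).
 */
static int
cnt_pool_get(struct cnt_pool *pool, uint32_t qid, uintptr_t *idx)
{
	struct cnt_cache *c = &pool->cache[qid];
	void *burst[c->fetch_sz];
	void *obj;
	unsigned int n;

	if (rte_ring_dequeue(c->ring, &obj) == 0) {
		*idx = (uintptr_t)obj; /* Cache hit. */
		return 0;
	}
	/* Cache empty: refill from the global free list. */
	n = rte_ring_dequeue_burst(pool->free_list, burst, c->fetch_sz, NULL);
	if (n == 0)
		return -ENOENT; /* Pool exhausted. */
	*idx = (uintptr_t)burst[0];
	/* Keep the remainder in the local cache. */
	rte_ring_enqueue_burst(c->ring, &burst[1], n - 1, NULL);
	return 0;
}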

Signed-off-by: Xiaoyu Min <jackmin@nvidia.com>

net/mlx5: move ASO query counter into ASO file

Move the ASO counter query logic into the dedicated ASO file.
The function name is changed accordingly.
Also use the max SQ size for ASO counter query.

Signed-off-by: Xiaoyu Min <jackmin@nvidia.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c |  50 +++
 drivers/common/mlx5/mlx5_devx_cmds.h |  27 ++
 drivers/common/mlx5/mlx5_prm.h       |  20 +-
 drivers/common/mlx5/version.map      |   1 +
 drivers/net/mlx5/meson.build         |   1 +
 drivers/net/mlx5/mlx5.c              |  14 +
 drivers/net/mlx5/mlx5.h              |  27 ++
 drivers/net/mlx5/mlx5_defs.h         |   2 +
 drivers/net/mlx5/mlx5_flow.c         |  27 +-
 drivers/net/mlx5/mlx5_flow.h         |   2 +
 drivers/net/mlx5/mlx5_flow_aso.c     | 261 ++++++++++++-
 drivers/net/mlx5/mlx5_flow_hw.c      | 142 ++++++-
 drivers/net/mlx5/mlx5_flow_meter.c   |   8 +-
 drivers/net/mlx5/mlx5_hws_cnt.c      | 523 +++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_hws_cnt.h      | 558 +++++++++++++++++++++++++++
 15 files changed, 1635 insertions(+), 28 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_hws_cnt.c
 create mode 100644 drivers/net/mlx5/mlx5_hws_cnt.h

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index ac6891145d..eef7a98248 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -176,6 +176,41 @@ mlx5_devx_cmd_register_write(void *ctx, uint16_t reg_id, uint32_t arg,
 	return 0;
 }
 
+struct mlx5_devx_obj *
+mlx5_devx_cmd_flow_counter_alloc_general(void *ctx,
+		struct mlx5_devx_counter_attr *attr)
+{
+	struct mlx5_devx_obj *dcs = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*dcs),
+						0, SOCKET_ID_ANY);
+	uint32_t in[MLX5_ST_SZ_DW(alloc_flow_counter_in)]   = {0};
+	uint32_t out[MLX5_ST_SZ_DW(alloc_flow_counter_out)] = {0};
+
+	if (!dcs) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(alloc_flow_counter_in, in, opcode,
+		 MLX5_CMD_OP_ALLOC_FLOW_COUNTER);
+	if (attr->bulk_log_max_alloc)
+		MLX5_SET(alloc_flow_counter_in, in, flow_counter_bulk_log_size,
+			 attr->flow_counter_bulk_log_size);
+	else
+		MLX5_SET(alloc_flow_counter_in, in, flow_counter_bulk,
+			 attr->bulk_n_128);
+	if (attr->pd_valid)
+		MLX5_SET(alloc_flow_counter_in, in, pd, attr->pd);
+	dcs->obj = mlx5_glue->devx_obj_create(ctx, in,
+					      sizeof(in), out, sizeof(out));
+	if (!dcs->obj) {
+		DRV_LOG(ERR, "Can't allocate counters - error %d", errno);
+		rte_errno = errno;
+		mlx5_free(dcs);
+		return NULL;
+	}
+	dcs->id = MLX5_GET(alloc_flow_counter_out, out, flow_counter_id);
+	return dcs;
+}
+
 /**
  * Allocate flow counters via devx interface.
  *
@@ -967,6 +1002,16 @@ mlx5_devx_cmd_query_hca_attr(void *ctx,
 					 general_obj_types) &
 			      MLX5_GENERAL_OBJ_TYPES_CAP_CONN_TRACK_OFFLOAD);
 	attr->rq_delay_drop = MLX5_GET(cmd_hca_cap, hcattr, rq_delay_drop);
+	attr->max_flow_counter_15_0 = MLX5_GET(cmd_hca_cap, hcattr,
+			max_flow_counter_15_0);
+	attr->max_flow_counter_31_16 = MLX5_GET(cmd_hca_cap, hcattr,
+			max_flow_counter_31_16);
+	attr->alloc_flow_counter_pd = MLX5_GET(cmd_hca_cap, hcattr,
+			alloc_flow_counter_pd);
+	attr->flow_counter_access_aso = MLX5_GET(cmd_hca_cap, hcattr,
+			flow_counter_access_aso);
+	attr->flow_access_aso_opc_mod = MLX5_GET(cmd_hca_cap, hcattr,
+			flow_access_aso_opc_mod);
 	if (attr->crypto) {
 		attr->aes_xts = MLX5_GET(cmd_hca_cap, hcattr, aes_xts);
 		hcattr = mlx5_devx_get_hca_cap(ctx, in, out, &rc,
@@ -989,6 +1034,11 @@ mlx5_devx_cmd_query_hca_attr(void *ctx,
 		}
 		attr->log_min_stride_wqe_sz = MLX5_GET(cmd_hca_cap_2, hcattr,
 						       log_min_stride_wqe_sz);
+		attr->flow_counter_bulk_log_max_alloc = MLX5_GET(cmd_hca_cap_2,
+				hcattr, flow_counter_bulk_log_max_alloc);
+		attr->flow_counter_bulk_log_granularity =
+			MLX5_GET(cmd_hca_cap_2, hcattr,
+				 flow_counter_bulk_log_granularity);
 	}
 	if (attr->log_min_stride_wqe_sz == 0)
 		attr->log_min_stride_wqe_sz = MLX5_MPRQ_LOG_MIN_STRIDE_WQE_SIZE;
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index d69dad613e..15b46f2acd 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -15,6 +15,16 @@
 #define MLX5_DEVX_MAX_KLM_ENTRIES ((UINT16_MAX - \
 		MLX5_ST_SZ_DW(create_mkey_in) * 4) / (MLX5_ST_SZ_DW(klm) * 4))
 
+struct mlx5_devx_counter_attr {
+	uint32_t pd_valid:1;
+	uint32_t pd:24;
+	uint32_t bulk_log_max_alloc:1;
+	union {
+		uint8_t flow_counter_bulk_log_size;
+		uint8_t bulk_n_128;
+	};
+};
+
 struct mlx5_devx_mkey_attr {
 	uint64_t addr;
 	uint64_t size;
@@ -263,6 +273,18 @@ struct mlx5_hca_attr {
 	uint32_t set_reg_c:8;
 	uint32_t nic_flow_table:1;
 	uint32_t modify_outer_ip_ecn:1;
+	union {
+		uint32_t max_flow_counter;
+		struct {
+			uint16_t max_flow_counter_15_0;
+			uint16_t max_flow_counter_31_16;
+		};
+	};
+	uint32_t flow_counter_bulk_log_max_alloc:5;
+	uint32_t flow_counter_bulk_log_granularity:5;
+	uint32_t alloc_flow_counter_pd:1;
+	uint32_t flow_counter_access_aso:1;
+	uint32_t flow_access_aso_opc_mod:8;
 };
 
 /* LAG Context. */
@@ -593,6 +615,11 @@ struct mlx5_devx_crypto_login_attr {
 
 /* mlx5_devx_cmds.c */
 
+__rte_internal
+struct mlx5_devx_obj *
+mlx5_devx_cmd_flow_counter_alloc_general(void *ctx,
+				struct mlx5_devx_counter_attr *attr);
+
 __rte_internal
 struct mlx5_devx_obj *mlx5_devx_cmd_flow_counter_alloc(void *ctx,
 						       uint32_t bulk_sz);
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 628bae72b2..88121d5563 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -1169,8 +1169,10 @@ struct mlx5_ifc_alloc_flow_counter_in_bits {
 	u8 reserved_at_10[0x10];
 	u8 reserved_at_20[0x10];
 	u8 op_mod[0x10];
-	u8 flow_counter_id[0x20];
-	u8 reserved_at_40[0x18];
+	u8 reserved_at_40[0x8];
+	u8 pd[0x18];
+	u8 reserved_at_60[0x13];
+	u8 flow_counter_bulk_log_size[0x5];
 	u8 flow_counter_bulk[0x8];
 };
 
@@ -1404,7 +1406,13 @@ enum {
 #define MLX5_STEERING_LOGIC_FORMAT_CONNECTX_6DX 0x1
 
 struct mlx5_ifc_cmd_hca_cap_bits {
-	u8 reserved_at_0[0x20];
+	u8 access_other_hca_roce[0x1];
+	u8 alloc_flow_counter_pd[0x1];
+	u8 flow_counter_access_aso[0x1];
+	u8 reserved_at_3[0x5];
+	u8 flow_access_aso_opc_mod[0x8];
+	u8 reserved_at_10[0xf];
+	u8 vhca_resource_manager[0x1];
 	u8 hca_cap_2[0x1];
 	u8 reserved_at_21[0xf];
 	u8 vhca_id[0x10];
@@ -2111,7 +2119,11 @@ struct mlx5_ifc_cmd_hca_cap_2_bits {
 	u8 format_select_dw_8_6_ext[0x1];
 	u8 reserved_at_1ac[0x14];
 	u8 general_obj_types_127_64[0x40];
-	u8 reserved_at_200[0x80];
+	u8 reserved_at_200[0x53];
+	u8 flow_counter_bulk_log_max_alloc[0x5];
+	u8 reserved_at_258[0x3];
+	u8 flow_counter_bulk_log_granularity[0x5];
+	u8 reserved_at_260[0x20];
 	u8 format_select_dw_gtpu_dw_0[0x8];
 	u8 format_select_dw_gtpu_dw_1[0x8];
 	u8 format_select_dw_gtpu_dw_2[0x8];
diff --git a/drivers/common/mlx5/version.map b/drivers/common/mlx5/version.map
index 413dec14ab..4f72900519 100644
--- a/drivers/common/mlx5/version.map
+++ b/drivers/common/mlx5/version.map
@@ -40,6 +40,7 @@ INTERNAL {
 	mlx5_devx_cmd_create_virtq;
 	mlx5_devx_cmd_destroy;
 	mlx5_devx_cmd_flow_counter_alloc;
+	mlx5_devx_cmd_flow_counter_alloc_general;
 	mlx5_devx_cmd_flow_counter_query;
 	mlx5_devx_cmd_flow_dump;
 	mlx5_devx_cmd_flow_single_dump;
diff --git a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build
index f9b266c900..4433849c89 100644
--- a/drivers/net/mlx5/meson.build
+++ b/drivers/net/mlx5/meson.build
@@ -37,6 +37,7 @@ sources = files(
         'mlx5_vlan.c',
         'mlx5_utils.c',
         'mlx5_devx.c',
+	'mlx5_hws_cnt.c',
 )
 
 if is_linux
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 4abb207077..314176022a 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -175,6 +175,12 @@
 /* Device parameter to create the fdb default rule in PMD */
 #define MLX5_FDB_DEFAULT_RULE_EN "fdb_def_rule_en"
 
+/* HW steering counter configuration. */
+#define MLX5_HWS_CNT_SERVICE_CORE "service_core"
+
+/* HW steering counter's query interval. */
+#define MLX5_HWS_CNT_CYCLE_TIME "svc_cycle_time"
+
 /* Shared memory between primary and secondary processes. */
 struct mlx5_shared_data *mlx5_shared_data;
 
@@ -1245,6 +1251,10 @@ mlx5_dev_args_check_handler(const char *key, const char *val, void *opaque)
 		config->allow_duplicate_pattern = !!tmp;
 	} else if (strcmp(MLX5_FDB_DEFAULT_RULE_EN, key) == 0) {
 		config->fdb_def_rule = !!tmp;
+	} else if (strcmp(MLX5_HWS_CNT_SERVICE_CORE, key) == 0) {
+		config->cnt_svc.service_core = tmp;
+	} else if (strcmp(MLX5_HWS_CNT_CYCLE_TIME, key) == 0) {
+		config->cnt_svc.cycle_time = tmp;
 	}
 	return 0;
 }
@@ -1281,6 +1291,8 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 		MLX5_DECAP_EN,
 		MLX5_ALLOW_DUPLICATE_PATTERN,
 		MLX5_FDB_DEFAULT_RULE_EN,
+		MLX5_HWS_CNT_SERVICE_CORE,
+		MLX5_HWS_CNT_CYCLE_TIME,
 		NULL,
 	};
 	int ret = 0;
@@ -1293,6 +1305,8 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 	config->decap_en = 1;
 	config->allow_duplicate_pattern = 1;
 	config->fdb_def_rule = 1;
+	config->cnt_svc.cycle_time = MLX5_CNT_SVC_CYCLE_TIME_DEFAULT;
+	config->cnt_svc.service_core = rte_get_main_lcore();
 	if (mkvlist != NULL) {
 		/* Process parameters. */
 		ret = mlx5_kvargs_process(mkvlist, params,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 263b502d37..8d82c68569 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -308,6 +308,10 @@ struct mlx5_sh_config {
 	uint32_t hw_fcs_strip:1; /* FCS stripping is supported. */
 	uint32_t allow_duplicate_pattern:1;
 	uint32_t lro_allowed:1; /* Whether LRO is allowed. */
+	struct {
+		uint16_t service_core;
+		uint32_t cycle_time; /* Query cycle time in milliseconds. */
+	} cnt_svc; /* Configuration of the HW steering counter service. */
 	/* Allow/Prevent the duplicate rules pattern. */
 	uint32_t fdb_def_rule:1; /* Create FDB default jump rule */
 };
@@ -1224,6 +1228,22 @@ struct mlx5_flex_item {
 	struct mlx5_flex_pattern_field map[MLX5_FLEX_ITEM_MAPPING_NUM];
 };
 
+#define HWS_CNT_ASO_SQ_NUM 4
+
+struct mlx5_hws_aso_mng {
+	uint16_t sq_num;
+	struct mlx5_aso_sq sqs[HWS_CNT_ASO_SQ_NUM];
+};
+
+struct mlx5_hws_cnt_svc_mng {
+	uint32_t refcnt;
+	uint32_t service_core;
+	uint32_t query_interval;
+	pthread_t service_thread;
+	uint8_t svc_running;
+	struct mlx5_hws_aso_mng aso_mng __rte_cache_aligned;
+};
+
 /*
  * Shared Infiniband device context for Master/Representors
  * which belong to same IB device with multiple IB ports.
@@ -1323,6 +1343,7 @@ struct mlx5_dev_ctx_shared {
 	pthread_mutex_t lwm_config_lock;
 	uint32_t host_shaper_rate:8;
 	uint32_t lwm_triggered:1;
+	struct mlx5_hws_cnt_svc_mng *cnt_svc;
 	struct mlx5_dev_shared_port port[]; /* per device port data array. */
 };
 
@@ -1623,6 +1644,7 @@ struct mlx5_priv {
 	/* HW steering global tag action. */
 	struct mlx5dr_action *hw_tag[2];
 	struct mlx5_indexed_pool *acts_ipool; /* Action data indexed pool. */
+	struct mlx5_hws_cnt_pool *hws_cpool; /* HW steering's counter pool. */
 #endif
 };
 
@@ -2036,6 +2058,11 @@ mlx5_get_supported_sw_parsing_offloads(const struct mlx5_hca_attr *attr);
 uint32_t
 mlx5_get_supported_tunneling_offloads(const struct mlx5_hca_attr *attr);
 
+int mlx5_aso_cnt_queue_init(struct mlx5_dev_ctx_shared *sh);
+void mlx5_aso_cnt_queue_uninit(struct mlx5_dev_ctx_shared *sh);
+int mlx5_aso_cnt_query(struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_hws_cnt_pool *cpool);
+
 /* mlx5_flow_flex.c */
 
 struct rte_flow_item_flex_handle *
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index 585afb0a98..d064abfef3 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -188,4 +188,6 @@
 #define static_assert _Static_assert
 #endif
 
+#define MLX5_CNT_SVC_CYCLE_TIME_DEFAULT 500
+
 #endif /* RTE_PMD_MLX5_DEFS_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index fb3be940e5..658cc69750 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -7832,24 +7832,33 @@ mlx5_flow_isolate(struct rte_eth_dev *dev,
  */
 static int
 flow_drv_query(struct rte_eth_dev *dev,
-	       uint32_t flow_idx,
+	       struct rte_flow *eflow,
 	       const struct rte_flow_action *actions,
 	       void *data,
 	       struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	const struct mlx5_flow_driver_ops *fops;
-	struct rte_flow *flow = mlx5_ipool_get(priv->flows[MLX5_FLOW_TYPE_GEN],
-					       flow_idx);
-	enum mlx5_flow_drv_type ftype;
+	struct rte_flow *flow = NULL;
+	enum mlx5_flow_drv_type ftype = MLX5_FLOW_TYPE_MIN;
 
+	if (priv->sh->config.dv_flow_en == 2) {
+#if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
+		flow = eflow;
+		ftype = MLX5_FLOW_TYPE_HW;
+#endif
+	} else {
+		flow = (struct rte_flow *)mlx5_ipool_get(priv->flows[MLX5_FLOW_TYPE_GEN],
+				(uintptr_t)(void *)eflow);
+	}
 	if (!flow) {
 		return rte_flow_error_set(error, ENOENT,
 			  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 			  NULL,
 			  "invalid flow handle");
 	}
-	ftype = flow->drv_type;
+	if (ftype == MLX5_FLOW_TYPE_MIN)
+		ftype = flow->drv_type;
 	MLX5_ASSERT(ftype > MLX5_FLOW_TYPE_MIN && ftype < MLX5_FLOW_TYPE_MAX);
 	fops = flow_get_drv_ops(ftype);
 
@@ -7870,14 +7879,8 @@ mlx5_flow_query(struct rte_eth_dev *dev,
 		struct rte_flow_error *error)
 {
 	int ret;
-	struct mlx5_priv *priv = dev->data->dev_private;
 
-	if (priv->sh->config.dv_flow_en == 2)
-		return rte_flow_error_set(error, ENOTSUP,
-			  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-			  NULL,
-			  "Flow non-Q query not supported");
-	ret = flow_drv_query(dev, (uintptr_t)(void *)flow, actions, data,
+	ret = flow_drv_query(dev, flow, actions, data,
 			     error);
 	if (ret < 0)
 		return ret;
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index c5190b1d4f..cdea4076d8 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1104,6 +1104,7 @@ struct rte_flow_hw {
 	};
 	struct rte_flow_template_table *table; /* The table flow allcated from. */
 	struct mlx5dr_rule rule; /* HWS layer data struct. */
+	uint32_t cnt_id;
 } __rte_packed;
 
 /* rte flow action translate to DR action struct. */
@@ -1225,6 +1226,7 @@ struct mlx5_hw_actions {
 	uint16_t encap_decap_pos; /* Encap/Decap action position. */
 	uint32_t acts_num:4; /* Total action number. */
 	uint32_t mark:1; /* Indicate the mark action. */
+	uint32_t cnt_id; /* Counter id. */
 	/* Translated DR action array from action template. */
 	struct mlx5dr_rule_action rule_acts[MLX5_HW_MAX_ACTS];
 };
diff --git a/drivers/net/mlx5/mlx5_flow_aso.c b/drivers/net/mlx5/mlx5_flow_aso.c
index 60d0280367..ed9272e583 100644
--- a/drivers/net/mlx5/mlx5_flow_aso.c
+++ b/drivers/net/mlx5/mlx5_flow_aso.c
@@ -12,6 +12,9 @@
 
 #include "mlx5.h"
 #include "mlx5_flow.h"
+#include "mlx5_hws_cnt.h"
+
+#define MLX5_ASO_CNT_QUEUE_LOG_DESC 14
 
 /**
  * Free MR resources.
@@ -79,6 +82,33 @@ mlx5_aso_destroy_sq(struct mlx5_aso_sq *sq)
 	memset(sq, 0, sizeof(*sq));
 }
 
+/**
+ * Initialize Send Queue used for ASO access counter.
+ *
+ * @param[in] sq
+ *   ASO SQ to initialize.
+ */
+static void
+mlx5_aso_cnt_init_sq(struct mlx5_aso_sq *sq)
+{
+	volatile struct mlx5_aso_wqe *restrict wqe;
+	int i;
+	int size = 1 << sq->log_desc_n;
+
+	/* All the next fields state should stay constant. */
+	for (i = 0, wqe = &sq->sq_obj.aso_wqes[0]; i < size; ++i, ++wqe) {
+		wqe->general_cseg.sq_ds = rte_cpu_to_be_32((sq->sqn << 8) |
+							  (sizeof(*wqe) >> 4));
+		wqe->aso_cseg.operand_masks = rte_cpu_to_be_32
+			(0u |
+			 (ASO_OPER_LOGICAL_OR << ASO_CSEG_COND_OPER_OFFSET) |
+			 (ASO_OP_ALWAYS_FALSE << ASO_CSEG_COND_1_OPER_OFFSET) |
+			 (ASO_OP_ALWAYS_FALSE << ASO_CSEG_COND_0_OPER_OFFSET) |
+			 (BYTEWISE_64BYTE << ASO_CSEG_DATA_MASK_MODE_OFFSET));
+		wqe->aso_cseg.data_mask = RTE_BE64(UINT64_MAX);
+	}
+}
+
 /**
  * Initialize Send Queue used for ASO access.
  *
@@ -191,7 +221,7 @@ mlx5_aso_ct_init_sq(struct mlx5_aso_sq *sq)
  */
 static int
 mlx5_aso_sq_create(struct mlx5_common_device *cdev, struct mlx5_aso_sq *sq,
-		   void *uar)
+		   void *uar, uint16_t log_desc_n)
 {
 	struct mlx5_devx_cq_attr cq_attr = {
 		.uar_page_id = mlx5_os_get_devx_uar_page_id(uar),
@@ -212,12 +242,12 @@ mlx5_aso_sq_create(struct mlx5_common_device *cdev, struct mlx5_aso_sq *sq,
 	int ret;
 
 	if (mlx5_devx_cq_create(cdev->ctx, &sq->cq.cq_obj,
-				MLX5_ASO_QUEUE_LOG_DESC, &cq_attr,
+				log_desc_n, &cq_attr,
 				SOCKET_ID_ANY))
 		goto error;
 	sq->cq.cq_ci = 0;
-	sq->cq.log_desc_n = MLX5_ASO_QUEUE_LOG_DESC;
-	sq->log_desc_n = MLX5_ASO_QUEUE_LOG_DESC;
+	sq->cq.log_desc_n = log_desc_n;
+	sq->log_desc_n = log_desc_n;
 	sq_attr.cqn = sq->cq.cq_obj.cq->id;
 	/* for mlx5_aso_wqe that is twice the size of mlx5_wqe */
 	log_wqbb_n = sq->log_desc_n + 1;
@@ -269,7 +299,8 @@ mlx5_aso_queue_init(struct mlx5_dev_ctx_shared *sh,
 				    sq_desc_n, &sh->aso_age_mng->aso_sq.mr))
 			return -1;
 		if (mlx5_aso_sq_create(cdev, &sh->aso_age_mng->aso_sq,
-				       sh->tx_uar.obj)) {
+				       sh->tx_uar.obj,
+				       MLX5_ASO_QUEUE_LOG_DESC)) {
 			mlx5_aso_dereg_mr(cdev, &sh->aso_age_mng->aso_sq.mr);
 			return -1;
 		}
@@ -277,7 +308,7 @@ mlx5_aso_queue_init(struct mlx5_dev_ctx_shared *sh,
 		break;
 	case ASO_OPC_MOD_POLICER:
 		if (mlx5_aso_sq_create(cdev, &sh->mtrmng->pools_mng.sq,
-				       sh->tx_uar.obj))
+				       sh->tx_uar.obj, MLX5_ASO_QUEUE_LOG_DESC))
 			return -1;
 		mlx5_aso_mtr_init_sq(&sh->mtrmng->pools_mng.sq);
 		break;
@@ -287,7 +318,7 @@ mlx5_aso_queue_init(struct mlx5_dev_ctx_shared *sh,
 				    &sh->ct_mng->aso_sq.mr))
 			return -1;
 		if (mlx5_aso_sq_create(cdev, &sh->ct_mng->aso_sq,
-				       sh->tx_uar.obj)) {
+				       sh->tx_uar.obj, MLX5_ASO_QUEUE_LOG_DESC)) {
 			mlx5_aso_dereg_mr(cdev, &sh->ct_mng->aso_sq.mr);
 			return -1;
 		}
@@ -1403,3 +1434,219 @@ mlx5_aso_ct_available(struct mlx5_dev_ctx_shared *sh,
 	rte_errno = EBUSY;
 	return -rte_errno;
 }
+
+int
+mlx5_aso_cnt_queue_init(struct mlx5_dev_ctx_shared *sh)
+{
+	struct mlx5_hws_aso_mng *aso_mng = NULL;
+	uint8_t idx;
+	struct mlx5_aso_sq *sq;
+
+	MLX5_ASSERT(sh);
+	MLX5_ASSERT(sh->cnt_svc);
+	aso_mng = &sh->cnt_svc->aso_mng;
+	aso_mng->sq_num = HWS_CNT_ASO_SQ_NUM;
+	for (idx = 0; idx < HWS_CNT_ASO_SQ_NUM; idx++) {
+		sq = &aso_mng->sqs[idx];
+		if (mlx5_aso_sq_create(sh->cdev, sq, sh->tx_uar.obj,
+					MLX5_ASO_CNT_QUEUE_LOG_DESC))
+			goto error;
+		mlx5_aso_cnt_init_sq(sq);
+	}
+	return 0;
+error:
+	mlx5_aso_cnt_queue_uninit(sh);
+	return -1;
+}
+
+void
+mlx5_aso_cnt_queue_uninit(struct mlx5_dev_ctx_shared *sh)
+{
+	uint16_t idx;
+
+	for (idx = 0; idx < sh->cnt_svc->aso_mng.sq_num; idx++)
+		mlx5_aso_destroy_sq(&sh->cnt_svc->aso_mng.sqs[idx]);
+	sh->cnt_svc->aso_mng.sq_num = 0;
+}
+
+static uint16_t
+mlx5_aso_cnt_sq_enqueue_burst(struct mlx5_hws_cnt_pool *cpool,
+		struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_aso_sq *sq, uint32_t n,
+		uint32_t offset, uint32_t dcs_id_base)
+{
+	volatile struct mlx5_aso_wqe *wqe;
+	uint16_t size = 1 << sq->log_desc_n;
+	uint16_t mask = size - 1;
+	uint16_t max;
+	uint32_t upper_offset = offset;
+	uint64_t addr;
+	uint32_t ctrl_gen_id = 0;
+	uint8_t opcmod = sh->cdev->config.hca_attr.flow_access_aso_opc_mod;
+	rte_be32_t lkey = rte_cpu_to_be_32(cpool->raw_mng->mr.lkey);
+	uint16_t aso_n = (uint16_t)(RTE_ALIGN_CEIL(n, 4) / 4);
+	uint32_t ccntid;
+
+	max = RTE_MIN(size - (uint16_t)(sq->head - sq->tail), aso_n);
+	if (unlikely(!max))
+		return 0;
+	upper_offset += (max * 4);
+	/* Because there is only one burst at a time, we can use the same elt. */
+	sq->elts[0].burst_size = max;
+	ctrl_gen_id = dcs_id_base;
+	ctrl_gen_id /= 4;
+	do {
+		ccntid = upper_offset - max * 4;
+		wqe = &sq->sq_obj.aso_wqes[sq->head & mask];
+		rte_prefetch0(&sq->sq_obj.aso_wqes[(sq->head + 1) & mask]);
+		wqe->general_cseg.misc = rte_cpu_to_be_32(ctrl_gen_id);
+		wqe->general_cseg.flags = RTE_BE32(MLX5_COMP_ONLY_FIRST_ERR <<
+							 MLX5_COMP_MODE_OFFSET);
+		wqe->general_cseg.opcode = rte_cpu_to_be_32
+						(MLX5_OPCODE_ACCESS_ASO |
+						 (opcmod <<
+						  WQE_CSEG_OPC_MOD_OFFSET) |
+						 (sq->pi <<
+						  WQE_CSEG_WQE_INDEX_OFFSET));
+		addr = (uint64_t)RTE_PTR_ADD(cpool->raw_mng->raw,
+				ccntid * sizeof(struct flow_counter_stats));
+		wqe->aso_cseg.va_h = rte_cpu_to_be_32((uint32_t)(addr >> 32));
+		wqe->aso_cseg.va_l_r = rte_cpu_to_be_32((uint32_t)addr | 1u);
+		wqe->aso_cseg.lkey = lkey;
+		sq->pi += 2; /* Each WQE contains 2 WQEBB's. */
+		sq->head++;
+		sq->next++;
+		ctrl_gen_id++;
+		max--;
+	} while (max);
+	wqe->general_cseg.flags = RTE_BE32(MLX5_COMP_ALWAYS <<
+							 MLX5_COMP_MODE_OFFSET);
+	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
+			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
+			   !sh->tx_uar.dbnc);
+	return sq->elts[0].burst_size;
+}
+
+static uint16_t
+mlx5_aso_cnt_completion_handle(struct mlx5_aso_sq *sq)
+{
+	struct mlx5_aso_cq *cq = &sq->cq;
+	volatile struct mlx5_cqe *restrict cqe;
+	const unsigned int cq_size = 1 << cq->log_desc_n;
+	const unsigned int mask = cq_size - 1;
+	uint32_t idx;
+	uint32_t next_idx = cq->cq_ci & mask;
+	const uint16_t max = (uint16_t)(sq->head - sq->tail);
+	uint16_t i = 0;
+	int ret;
+	if (unlikely(!max))
+		return 0;
+	idx = next_idx;
+	next_idx = (cq->cq_ci + 1) & mask;
+	rte_prefetch0(&cq->cq_obj.cqes[next_idx]);
+	cqe = &cq->cq_obj.cqes[idx];
+	ret = check_cqe(cqe, cq_size, cq->cq_ci);
+	/*
+	 * Be sure owner read is done before any other cookie field or
+	 * opaque field.
+	 */
+	rte_io_rmb();
+	if (unlikely(ret != MLX5_CQE_STATUS_SW_OWN)) {
+		if (likely(ret == MLX5_CQE_STATUS_HW_OWN))
+			return 0; /* return immediately. */
+		mlx5_aso_cqe_err_handle(sq);
+	}
+	i += sq->elts[0].burst_size;
+	sq->elts[0].burst_size = 0;
+	cq->cq_ci++;
+	if (likely(i)) {
+		sq->tail += i;
+		rte_io_wmb();
+		cq->cq_obj.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci);
+	}
+	return i;
+}
+
+static uint16_t
+mlx5_aso_cnt_query_one_dcs(struct mlx5_dev_ctx_shared *sh,
+			   struct mlx5_hws_cnt_pool *cpool,
+			   uint8_t dcs_idx, uint32_t num)
+{
+	uint32_t dcs_id = cpool->dcs_mng.dcs[dcs_idx].obj->id;
+	uint64_t cnt_num = cpool->dcs_mng.dcs[dcs_idx].batch_sz;
+	uint64_t left;
+	uint32_t iidx = cpool->dcs_mng.dcs[dcs_idx].iidx;
+	uint32_t offset;
+	uint16_t mask;
+	uint16_t sq_idx;
+	uint64_t burst_sz = (uint64_t)(1 << MLX5_ASO_CNT_QUEUE_LOG_DESC) * 4 *
+		sh->cnt_svc->aso_mng.sq_num;
+	uint64_t qburst_sz = burst_sz / sh->cnt_svc->aso_mng.sq_num;
+	uint64_t n;
+	struct mlx5_aso_sq *sq;
+
+	cnt_num = RTE_MIN(num, cnt_num);
+	left = cnt_num;
+	while (left) {
+		mask = 0;
+		for (sq_idx = 0; sq_idx < sh->cnt_svc->aso_mng.sq_num;
+				sq_idx++) {
+			if (left == 0) {
+				mask |= (1 << sq_idx);
+				continue;
+			}
+			n = RTE_MIN(left, qburst_sz);
+			offset = cnt_num - left;
+			offset += iidx;
+			mlx5_aso_cnt_sq_enqueue_burst(cpool, sh,
+					&sh->cnt_svc->aso_mng.sqs[sq_idx], n,
+					offset, dcs_id);
+			left -= n;
+		}
+		do {
+			for (sq_idx = 0; sq_idx < sh->cnt_svc->aso_mng.sq_num;
+					sq_idx++) {
+				sq = &sh->cnt_svc->aso_mng.sqs[sq_idx];
+				if (mlx5_aso_cnt_completion_handle(sq))
+					mask |= (1 << sq_idx);
+			}
+		} while (mask < ((1 << sh->cnt_svc->aso_mng.sq_num) - 1));
+	}
+	return cnt_num;
+}
+
+/*
+ * Query the FW counters via ASO WQEs.
+ *
+ * The ASO counter query works in _sync_ mode, which means:
+ * 1. Each SQ issues one burst with several WQEs.
+ * 2. A CQE is requested on the last WQE.
+ * 3. The CQ of each SQ is busy-polled.
+ * 4. Once all SQs' CQEs are received, go back to step 1 and issue the
+ *    next burst.
+ *
+ * @param[in] sh
+ *   Pointer to shared device.
+ * @param[in] cpool
+ *   Pointer to counter pool.
+ *
+ * @return
+ *   0 on success, -1 on failure.
+ */
+int
+mlx5_aso_cnt_query(struct mlx5_dev_ctx_shared *sh,
+		   struct mlx5_hws_cnt_pool *cpool)
+{
+	uint32_t idx;
+	uint32_t num;
+	uint32_t cnt_num = mlx5_hws_cnt_pool_get_size(cpool) -
+		rte_ring_count(cpool->free_list);
+
+	for (idx = 0; idx < cpool->dcs_mng.batch_total; idx++) {
+		num = RTE_MIN(cnt_num, cpool->dcs_mng.dcs[idx].batch_sz);
+		mlx5_aso_cnt_query_one_dcs(sh, cpool, idx, num);
+		cnt_num -= num;
+		if (cnt_num == 0)
+			break;
+	}
+	return 0;
+}
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 959d566d68..8891f4a4e3 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -10,6 +10,9 @@
 #include "mlx5_rx.h"
 
 #if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
+#include "mlx5dr_context.h"
+#include "mlx5dr_send.h"
+#include "mlx5_hws_cnt.h"
 
 /* The maximum actions support in the flow. */
 #define MLX5_HW_MAX_ACTS 16
@@ -350,6 +353,10 @@ __flow_hw_action_template_destroy(struct rte_eth_dev *dev,
 			mlx5dr_action_destroy(acts->mhdr->action);
 		mlx5_free(acts->mhdr);
 	}
+	if (mlx5_hws_cnt_id_valid(acts->cnt_id)) {
+		mlx5_hws_cnt_shared_put(priv->hws_cpool, &acts->cnt_id);
+		acts->cnt_id = 0;
+	}
 }
 
 /**
@@ -935,6 +942,30 @@ flow_hw_meter_compile(struct rte_eth_dev *dev,
 	}
 	return 0;
 }
+
+static __rte_always_inline int
+flow_hw_cnt_compile(struct rte_eth_dev *dev, uint32_t  start_pos,
+		      struct mlx5_hw_actions *acts)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint32_t pos = start_pos;
+	cnt_id_t cnt_id;
+	int ret;
+
+	ret = mlx5_hws_cnt_shared_get(priv->hws_cpool, &cnt_id);
+	if (ret != 0)
+		return ret;
+	ret = mlx5_hws_cnt_pool_get_action_offset
+				(priv->hws_cpool,
+				 cnt_id,
+				 &acts->rule_acts[pos].action,
+				 &acts->rule_acts[pos].counter.offset);
+	if (ret != 0)
+		return ret;
+	acts->cnt_id = cnt_id;
+	return 0;
+}
+
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -1178,6 +1209,20 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 				goto err;
 			i++;
 			break;
+		case RTE_FLOW_ACTION_TYPE_COUNT:
+			if (masks->conf &&
+			    ((const struct rte_flow_action_count *)
+			     masks->conf)->id) {
+				err = flow_hw_cnt_compile(dev, i, acts);
+				if (err)
+					goto err;
+			} else if (__flow_hw_act_data_general_append
+					(priv, acts, actions->type,
+					 actions - action_start, i)) {
+				goto err;
+			}
+			i++;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -1499,7 +1544,8 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			  const uint8_t it_idx,
 			  const struct rte_flow_action actions[],
 			  struct mlx5dr_rule_action *rule_acts,
-			  uint32_t *acts_num)
+			  uint32_t *acts_num,
+			  uint32_t queue)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow_template_table *table = job->flow->table;
@@ -1553,6 +1599,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		uint64_t item_flags;
 		struct mlx5_hw_jump_action *jump;
 		struct mlx5_hrxq *hrxq;
+		cnt_id_t cnt_id;
 
 		action = &actions[act_data->action_src];
 		MLX5_ASSERT(action->type == RTE_FLOW_ACTION_TYPE_INDIRECT ||
@@ -1660,6 +1707,21 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			if (mlx5_aso_mtr_wait(priv->sh, mtr))
 				return -1;
 			break;
+		case RTE_FLOW_ACTION_TYPE_COUNT:
+			ret = mlx5_hws_cnt_pool_get(priv->hws_cpool, &queue,
+					&cnt_id);
+			if (ret != 0)
+				return ret;
+			ret = mlx5_hws_cnt_pool_get_action_offset
+				(priv->hws_cpool,
+				 cnt_id,
+				 &rule_acts[act_data->action_dst].action,
+				 &rule_acts[act_data->action_dst].counter.offset
+				 );
+			if (ret != 0)
+				return ret;
+			job->flow->cnt_id = cnt_id;
+			break;
 		default:
 			break;
 		}
@@ -1669,6 +1731,8 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 				job->flow->idx - 1;
 		rule_acts[hw_acts->encap_decap_pos].reformat.data = buf;
 	}
+	if (mlx5_hws_cnt_id_valid(hw_acts->cnt_id))
+		job->flow->cnt_id = hw_acts->cnt_id;
 	return 0;
 }
 
@@ -1804,7 +1868,7 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	 * user's input, in order to save the cost.
 	 */
 	if (flow_hw_actions_construct(dev, job, hw_acts, pattern_template_index,
-				  actions, rule_acts, &acts_num)) {
+				  actions, rule_acts, &acts_num, queue)) {
 		rte_errno = EINVAL;
 		goto free;
 	}
@@ -1934,6 +1998,13 @@ flow_hw_pull(struct rte_eth_dev *dev,
 				flow_hw_jump_release(dev, job->flow->jump);
 			else if (job->flow->fate_type == MLX5_FLOW_FATE_QUEUE)
 				mlx5_hrxq_obj_release(dev, job->flow->hrxq);
+			if (mlx5_hws_cnt_id_valid(job->flow->cnt_id) &&
+			    mlx5_hws_cnt_is_shared
+				(priv->hws_cpool, job->flow->cnt_id) == false) {
+				mlx5_hws_cnt_pool_put(priv->hws_cpool, &queue,
+						&job->flow->cnt_id);
+				job->flow->cnt_id = 0;
+			}
 			mlx5_ipool_free(job->flow->table->flow, job->flow->idx);
 		}
 		priv->hw_q[queue].job[priv->hw_q[queue].job_idx++] = job;
@@ -2638,6 +2709,9 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 			if (ret < 0)
 				return ret;
 			break;
+		case RTE_FLOW_ACTION_TYPE_COUNT:
+			/* TODO: Validation logic */
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -4315,6 +4389,12 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	}
 	if (_queue_attr)
 		mlx5_free(_queue_attr);
+	if (port_attr->nb_counters) {
+		priv->hws_cpool = mlx5_hws_cnt_pool_create(dev, port_attr,
+				nb_queue);
+		if (priv->hws_cpool == NULL)
+			goto err;
+	}
 	return 0;
 err:
 	flow_hw_free_vport_actions(priv);
@@ -4383,6 +4463,8 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 		mlx5_ipool_destroy(priv->acts_ipool);
 		priv->acts_ipool = NULL;
 	}
+	if (priv->hws_cpool)
+		mlx5_hws_cnt_pool_destroy(priv->sh, priv->hws_cpool);
 	mlx5_free(priv->hw_q);
 	priv->hw_q = NULL;
 	claim_zero(mlx5dr_context_close(priv->dr_ctx));
@@ -4597,6 +4679,61 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 	return flow_dv_action_destroy(dev, handle, error);
 }
 
+static int
+flow_hw_query_counter(const struct rte_eth_dev *dev, uint32_t counter,
+		      void *data, struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hws_cnt *cnt;
+	struct rte_flow_query_count *qc = data;
+	uint32_t iidx = mlx5_hws_cnt_iidx(priv->hws_cpool, counter);
+	uint64_t pkts, bytes;
+
+	if (!mlx5_hws_cnt_id_valid(counter))
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				"counter are not available");
+	cnt = &priv->hws_cpool->pool[iidx];
+	__hws_cnt_query_raw(priv->hws_cpool, counter, &pkts, &bytes);
+	qc->hits_set = 1;
+	qc->bytes_set = 1;
+	qc->hits = pkts - cnt->reset.hits;
+	qc->bytes = bytes - cnt->reset.bytes;
+	if (qc->reset) {
+		cnt->reset.bytes = bytes;
+		cnt->reset.hits = pkts;
+	}
+	return 0;
+}
+
+static int
+flow_hw_query(struct rte_eth_dev *dev,
+	      struct rte_flow *flow __rte_unused,
+	      const struct rte_flow_action *actions __rte_unused,
+	      void *data __rte_unused,
+	      struct rte_flow_error *error __rte_unused)
+{
+	int ret = -EINVAL;
+	struct rte_flow_hw *hw_flow = (struct rte_flow_hw *)flow;
+
+	for (; actions->type != RTE_FLOW_ACTION_TYPE_END; actions++) {
+		switch (actions->type) {
+		case RTE_FLOW_ACTION_TYPE_VOID:
+			break;
+		case RTE_FLOW_ACTION_TYPE_COUNT:
+			ret = flow_hw_query_counter(dev, hw_flow->cnt_id, data,
+						  error);
+			break;
+		default:
+			return rte_flow_error_set(error, ENOTSUP,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  actions,
+						  "action not supported");
+		}
+	}
+	return ret;
+}
+
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.info_get = flow_hw_info_get,
 	.configure = flow_hw_configure,
@@ -4620,6 +4757,7 @@ const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.action_destroy = flow_dv_action_destroy,
 	.action_update = flow_dv_action_update,
 	.action_query = flow_dv_action_query,
+	.query = flow_hw_query,
 };
 
 /**
diff --git a/drivers/net/mlx5/mlx5_flow_meter.c b/drivers/net/mlx5/mlx5_flow_meter.c
index b69021f6a0..7221bfb642 100644
--- a/drivers/net/mlx5/mlx5_flow_meter.c
+++ b/drivers/net/mlx5/mlx5_flow_meter.c
@@ -61,6 +61,7 @@ mlx5_flow_meter_init(struct rte_eth_dev *dev,
 	struct mlx5_aso_mtr *aso;
 	uint32_t i;
 	struct rte_mtr_error error;
+	uint32_t flags;
 
 	if (!nb_meters || !nb_meter_profiles || !nb_meter_policies) {
 		ret = ENOTSUP;
@@ -104,11 +105,12 @@ mlx5_flow_meter_init(struct rte_eth_dev *dev,
 					NULL, "Meter register is not available.");
 		goto err;
 	}
+	flags = MLX5DR_ACTION_FLAG_HWS_RX | MLX5DR_ACTION_FLAG_HWS_TX;
+	if (priv->sh->config.dv_esw_en && priv->master)
+		flags |= MLX5DR_ACTION_FLAG_HWS_FDB;
 	priv->mtr_bulk.action = mlx5dr_action_create_aso_meter
 			(priv->dr_ctx, (struct mlx5dr_devx_obj *)dcs,
-				reg_id - REG_C_0, MLX5DR_ACTION_FLAG_HWS_RX |
-				MLX5DR_ACTION_FLAG_HWS_TX |
-				MLX5DR_ACTION_FLAG_HWS_FDB);
+				reg_id - REG_C_0, flags);
 	if (!priv->mtr_bulk.action) {
 		ret = ENOMEM;
 		rte_mtr_error_set(&error, ENOMEM,
diff --git a/drivers/net/mlx5/mlx5_hws_cnt.c b/drivers/net/mlx5/mlx5_hws_cnt.c
new file mode 100644
index 0000000000..f7bf36de09
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_hws_cnt.c
@@ -0,0 +1,523 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2022 Mellanox Technologies, Ltd
+ */
+
+#include <stdint.h>
+#include <rte_malloc.h>
+#include <mlx5_malloc.h>
+#include <rte_ring.h>
+#include <mlx5_devx_cmds.h>
+#include <rte_cycles.h>
+
+#include "mlx5_utils.h"
+#include "mlx5_hws_cnt.h"
+
+#define HWS_CNT_CACHE_SZ_DEFAULT 511
+#define HWS_CNT_CACHE_PRELOAD_DEFAULT 254
+#define HWS_CNT_CACHE_FETCH_DEFAULT 254
+#define HWS_CNT_CACHE_THRESHOLD_DEFAULT 254
+#define HWS_CNT_ALLOC_FACTOR_DEFAULT 20
+
+static void
+__hws_cnt_id_load(struct mlx5_hws_cnt_pool *cpool)
+{
+	uint32_t preload;
+	uint32_t q_num = cpool->cache->q_num;
+	uint32_t cnt_num = mlx5_hws_cnt_pool_get_size(cpool);
+	cnt_id_t cnt_id, iidx = 0;
+	uint32_t qidx;
+	struct rte_ring *qcache = NULL;
+
+	/*
+	 * Counter ID order is important for tracking the maximum number of
+	 * in-use counters to query, which means the counter internal index
+	 * order must go from zero to the number the user configured,
+	 * i.e. 0 - 8000000.
+	 * The counter IDs need to be loaded in this order into the caches
+	 * first and then into the global free list.
+	 * In the end, the user fetches counters from the minimum index to
+	 * the maximum.
+	 */
+	preload = RTE_MIN(cpool->cache->preload_sz, cnt_num / q_num);
+	for (qidx = 0; qidx < q_num; qidx++) {
+		for (; iidx < preload * (qidx + 1); iidx++) {
+			cnt_id = mlx5_hws_cnt_id_gen(cpool, iidx);
+			qcache = cpool->cache->qcache[qidx];
+			if (qcache)
+				rte_ring_enqueue_elem(qcache, &cnt_id,
+						sizeof(cnt_id));
+		}
+	}
+	for (; iidx < cnt_num; iidx++) {
+		cnt_id = mlx5_hws_cnt_id_gen(cpool, iidx);
+		rte_ring_enqueue_elem(cpool->free_list, &cnt_id,
+				sizeof(cnt_id));
+	}
+}
+
+static void
+__mlx5_hws_cnt_svc(struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_hws_cnt_pool *cpool)
+{
+	struct rte_ring *reset_list = cpool->wait_reset_list;
+	struct rte_ring *reuse_list = cpool->reuse_list;
+	uint32_t reset_cnt_num;
+	struct rte_ring_zc_data zcdr = {0};
+	struct rte_ring_zc_data zcdu = {0};
+
+	reset_cnt_num = rte_ring_count(reset_list);
+	do {
+		cpool->query_gen++;
+		mlx5_aso_cnt_query(sh, cpool);
+		zcdr.n1 = 0;
+		zcdu.n1 = 0;
+		rte_ring_enqueue_zc_burst_elem_start(reuse_list,
+				sizeof(cnt_id_t), reset_cnt_num, &zcdu,
+				NULL);
+		rte_ring_dequeue_zc_burst_elem_start(reset_list,
+				sizeof(cnt_id_t), reset_cnt_num, &zcdr,
+				NULL);
+		__hws_cnt_r2rcpy(&zcdu, &zcdr, reset_cnt_num);
+		rte_ring_dequeue_zc_elem_finish(reset_list,
+				reset_cnt_num);
+		rte_ring_enqueue_zc_elem_finish(reuse_list,
+				reset_cnt_num);
+		reset_cnt_num = rte_ring_count(reset_list);
+	} while (reset_cnt_num > 0);
+}
+
+static void
+mlx5_hws_cnt_raw_data_free(struct mlx5_dev_ctx_shared *sh,
+			   struct mlx5_hws_cnt_raw_data_mng *mng)
+{
+	if (mng == NULL)
+		return;
+	sh->cdev->mr_scache.dereg_mr_cb(&mng->mr);
+	mlx5_free(mng->raw);
+	mlx5_free(mng);
+}
+
+__rte_unused
+static struct mlx5_hws_cnt_raw_data_mng *
+mlx5_hws_cnt_raw_data_alloc(struct mlx5_dev_ctx_shared *sh, uint32_t n)
+{
+	struct mlx5_hws_cnt_raw_data_mng *mng = NULL;
+	int ret;
+	size_t sz = n * sizeof(struct flow_counter_stats);
+
+	mng = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO, sizeof(*mng), 0,
+			SOCKET_ID_ANY);
+	if (mng == NULL)
+		goto error;
+	mng->raw = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO, sz, 0,
+			SOCKET_ID_ANY);
+	if (mng->raw == NULL)
+		goto error;
+	ret = sh->cdev->mr_scache.reg_mr_cb(sh->cdev->pd, mng->raw, sz,
+					    &mng->mr);
+	if (ret) {
+		rte_errno = errno;
+		goto error;
+	}
+	return mng;
+error:
+	mlx5_hws_cnt_raw_data_free(sh, mng);
+	return NULL;
+}
+
+static void *
+mlx5_hws_cnt_svc(void *opaque)
+{
+	struct mlx5_dev_ctx_shared *sh =
+		(struct mlx5_dev_ctx_shared *)opaque;
+	uint64_t interval =
+		(uint64_t)sh->cnt_svc->query_interval * (US_PER_S / MS_PER_S);
+	uint16_t port_id;
+	uint64_t start_cycle, query_cycle = 0;
+	uint64_t query_us;
+	uint64_t sleep_us;
+
+	while (sh->cnt_svc->svc_running != 0) {
+		start_cycle = rte_rdtsc();
+		MLX5_ETH_FOREACH_DEV(port_id, sh->cdev->dev) {
+			struct mlx5_priv *opriv =
+				rte_eth_devices[port_id].data->dev_private;
+			if (opriv != NULL &&
+			    opriv->sh == sh &&
+			    opriv->hws_cpool != NULL) {
+				__mlx5_hws_cnt_svc(sh, opriv->hws_cpool);
+			}
+		}
+		query_cycle = rte_rdtsc() - start_cycle;
+		query_us = query_cycle / (rte_get_timer_hz() / US_PER_S);
+		sleep_us = interval - query_us;
+		if (interval > query_us)
+			rte_delay_us_sleep(sleep_us);
+	}
+	return NULL;
+}
+
+struct mlx5_hws_cnt_pool *
+mlx5_hws_cnt_pool_init(const struct mlx5_hws_cnt_pool_cfg *pcfg,
+		const struct mlx5_hws_cache_param *ccfg)
+{
+	char mz_name[RTE_MEMZONE_NAMESIZE];
+	struct mlx5_hws_cnt_pool *cntp;
+	uint64_t cnt_num = 0;
+	uint32_t qidx;
+
+	MLX5_ASSERT(pcfg);
+	MLX5_ASSERT(ccfg);
+	cntp = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO, sizeof(*cntp), 0,
+			   SOCKET_ID_ANY);
+	if (cntp == NULL)
+		return NULL;
+
+	cntp->cfg = *pcfg;
+	cntp->cache = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO,
+			sizeof(*cntp->cache) +
+			sizeof(((struct mlx5_hws_cnt_pool_caches *)0)->qcache[0])
+				* ccfg->q_num, 0, SOCKET_ID_ANY);
+	if (cntp->cache == NULL)
+		goto error;
+	 /* store the necessary cache parameters. */
+	cntp->cache->fetch_sz = ccfg->fetch_sz;
+	cntp->cache->preload_sz = ccfg->preload_sz;
+	cntp->cache->threshold = ccfg->threshold;
+	cntp->cache->q_num = ccfg->q_num;
+	cnt_num = pcfg->request_num * (100 + pcfg->alloc_factor) / 100;
+	if (cnt_num > UINT32_MAX) {
+		DRV_LOG(ERR, "counter number %lu is out of 32bit range",
+			cnt_num);
+		goto error;
+	}
+	cntp->pool = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO,
+			sizeof(struct mlx5_hws_cnt) *
+			pcfg->request_num * (100 + pcfg->alloc_factor) / 100,
+			0, SOCKET_ID_ANY);
+	if (cntp->pool == NULL)
+		goto error;
+	snprintf(mz_name, sizeof(mz_name), "%s_F_RING", pcfg->name);
+	cntp->free_list = rte_ring_create_elem(mz_name, sizeof(cnt_id_t),
+			(uint32_t)cnt_num, SOCKET_ID_ANY,
+			RING_F_SP_ENQ | RING_F_MC_HTS_DEQ | RING_F_EXACT_SZ);
+	if (cntp->free_list == NULL) {
+		DRV_LOG(ERR, "failed to create free list ring");
+		goto error;
+	}
+	snprintf(mz_name, sizeof(mz_name), "%s_R_RING", pcfg->name);
+	cntp->wait_reset_list = rte_ring_create_elem(mz_name, sizeof(cnt_id_t),
+			(uint32_t)cnt_num, SOCKET_ID_ANY,
+			RING_F_MP_HTS_ENQ | RING_F_SC_DEQ | RING_F_EXACT_SZ);
+	if (cntp->wait_reset_list == NULL) {
+		DRV_LOG(ERR, "failed to create free list ring");
+		goto error;
+	}
+	snprintf(mz_name, sizeof(mz_name), "%s_U_RING", pcfg->name);
+	cntp->reuse_list = rte_ring_create_elem(mz_name, sizeof(cnt_id_t),
+			(uint32_t)cnt_num, SOCKET_ID_ANY,
+			RING_F_SP_ENQ | RING_F_MC_HTS_DEQ | RING_F_EXACT_SZ);
+	if (cntp->reuse_list == NULL) {
+		DRV_LOG(ERR, "failed to create reuse list ring");
+		goto error;
+	}
+	for (qidx = 0; qidx < ccfg->q_num; qidx++) {
+		snprintf(mz_name, sizeof(mz_name), "%s_cache/%u", pcfg->name,
+				qidx);
+		cntp->cache->qcache[qidx] = rte_ring_create(mz_name, ccfg->size,
+				SOCKET_ID_ANY,
+				RING_F_SP_ENQ | RING_F_SC_DEQ |
+				RING_F_EXACT_SZ);
+		if (cntp->cache->qcache[qidx] == NULL)
+			goto error;
+	}
+	return cntp;
+error:
+	mlx5_hws_cnt_pool_deinit(cntp);
+	return NULL;
+}
+
+void
+mlx5_hws_cnt_pool_deinit(struct mlx5_hws_cnt_pool * const cntp)
+{
+	uint32_t qidx = 0;
+	if (cntp == NULL)
+		return;
+	rte_ring_free(cntp->free_list);
+	rte_ring_free(cntp->wait_reset_list);
+	rte_ring_free(cntp->reuse_list);
+	if (cntp->cache) {
+		for (qidx = 0; qidx < cntp->cache->q_num; qidx++)
+			rte_ring_free(cntp->cache->qcache[qidx]);
+	}
+	mlx5_free(cntp->cache);
+	mlx5_free(cntp->raw_mng);
+	mlx5_free(cntp->pool);
+	mlx5_free(cntp);
+}
+
+int
+mlx5_hws_cnt_service_thread_create(struct mlx5_dev_ctx_shared *sh)
+{
+	char name[NAME_MAX];
+	cpu_set_t cpuset;
+	int ret;
+	uint32_t service_core = sh->cnt_svc->service_core;
+
+	CPU_ZERO(&cpuset);
+	sh->cnt_svc->svc_running = 1;
+	ret = pthread_create(&sh->cnt_svc->service_thread, NULL,
+			mlx5_hws_cnt_svc, sh);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Failed to create HW steering's counter service thread.");
+		return -ENOSYS;
+	}
+	snprintf(name, NAME_MAX - 1, "%s/svc@%d",
+		 sh->ibdev_name, service_core);
+	rte_thread_setname(sh->cnt_svc->service_thread, name);
+	CPU_SET(service_core, &cpuset);
+	pthread_setaffinity_np(sh->cnt_svc->service_thread, sizeof(cpuset),
+				&cpuset);
+	return 0;
+}
+
+void
+mlx5_hws_cnt_service_thread_destroy(struct mlx5_dev_ctx_shared *sh)
+{
+	if (sh->cnt_svc->service_thread == 0)
+		return;
+	sh->cnt_svc->svc_running = 0;
+	pthread_join(sh->cnt_svc->service_thread, NULL);
+	sh->cnt_svc->service_thread = 0;
+}
+
+int
+mlx5_hws_cnt_pool_dcs_alloc(struct mlx5_dev_ctx_shared *sh,
+			    struct mlx5_hws_cnt_pool *cpool)
+{
+	struct mlx5_hca_attr *hca_attr = &sh->cdev->config.hca_attr;
+	uint32_t max_log_bulk_sz = 0;
+	uint32_t log_bulk_sz;
+	uint32_t idx, alloced = 0;
+	unsigned int cnt_num = mlx5_hws_cnt_pool_get_size(cpool);
+	struct mlx5_devx_counter_attr attr = {0};
+	struct mlx5_devx_obj *dcs;
+
+	if (hca_attr->flow_counter_bulk_log_max_alloc == 0) {
+		DRV_LOG(ERR,
+			"Fw doesn't support bulk log max alloc");
+		return -1;
+	}
+	max_log_bulk_sz = 23; /* hard code to 8M (1 << 23). */
+	cnt_num = RTE_ALIGN_CEIL(cnt_num, 4); /* minimal 4 counter in bulk. */
+	log_bulk_sz = RTE_MIN(max_log_bulk_sz, rte_log2_u32(cnt_num));
+	attr.pd = sh->cdev->pdn;
+	attr.pd_valid = 1;
+	attr.bulk_log_max_alloc = 1;
+	attr.flow_counter_bulk_log_size = log_bulk_sz;
+	idx = 0;
+	dcs = mlx5_devx_cmd_flow_counter_alloc_general(sh->cdev->ctx, &attr);
+	if (dcs == NULL)
+		goto error;
+	cpool->dcs_mng.dcs[idx].obj = dcs;
+	cpool->dcs_mng.dcs[idx].batch_sz = (1 << log_bulk_sz);
+	cpool->dcs_mng.batch_total++;
+	idx++;
+	cpool->dcs_mng.dcs[0].iidx = 0;
+	alloced = cpool->dcs_mng.dcs[0].batch_sz;
+	if (cnt_num > cpool->dcs_mng.dcs[0].batch_sz) {
+		for (; idx < MLX5_HWS_CNT_DCS_NUM; idx++) {
+			attr.flow_counter_bulk_log_size = --max_log_bulk_sz;
+			dcs = mlx5_devx_cmd_flow_counter_alloc_general
+				(sh->cdev->ctx, &attr);
+			if (dcs == NULL)
+				goto error;
+			cpool->dcs_mng.dcs[idx].obj = dcs;
+			cpool->dcs_mng.dcs[idx].batch_sz =
+				(1 << max_log_bulk_sz);
+			cpool->dcs_mng.dcs[idx].iidx = alloced;
+			alloced += cpool->dcs_mng.dcs[idx].batch_sz;
+			cpool->dcs_mng.batch_total++;
+		}
+	}
+	return 0;
+error:
+	DRV_LOG(DEBUG,
+		"Cannot alloc device counter, allocated[%" PRIu32 "] request[%" PRIu32 "]",
+		alloced, cnt_num);
+	for (idx = 0; idx < cpool->dcs_mng.batch_total; idx++) {
+		mlx5_devx_cmd_destroy(cpool->dcs_mng.dcs[idx].obj);
+		cpool->dcs_mng.dcs[idx].obj = NULL;
+		cpool->dcs_mng.dcs[idx].batch_sz = 0;
+		cpool->dcs_mng.dcs[idx].iidx = 0;
+	}
+	cpool->dcs_mng.batch_total = 0;
+	return -1;
+}
+
+void
+mlx5_hws_cnt_pool_dcs_free(struct mlx5_dev_ctx_shared *sh,
+			   struct mlx5_hws_cnt_pool *cpool)
+{
+	uint32_t idx;
+
+	if (cpool == NULL)
+		return;
+	for (idx = 0; idx < MLX5_HWS_CNT_DCS_NUM; idx++)
+		mlx5_devx_cmd_destroy(cpool->dcs_mng.dcs[idx].obj);
+	if (cpool->raw_mng) {
+		mlx5_hws_cnt_raw_data_free(sh, cpool->raw_mng);
+		cpool->raw_mng = NULL;
+	}
+}
+
+int
+mlx5_hws_cnt_pool_action_create(struct mlx5_priv *priv,
+		struct mlx5_hws_cnt_pool *cpool)
+{
+	uint32_t idx;
+	int ret = 0;
+	struct mlx5_hws_cnt_dcs *dcs;
+	uint32_t flags;
+
+	flags = MLX5DR_ACTION_FLAG_HWS_RX | MLX5DR_ACTION_FLAG_HWS_TX;
+	if (priv->sh->config.dv_esw_en && priv->master)
+		flags |= MLX5DR_ACTION_FLAG_HWS_FDB;
+	for (idx = 0; idx < cpool->dcs_mng.batch_total; idx++) {
+		dcs = &cpool->dcs_mng.dcs[idx];
+		dcs->dr_action = mlx5dr_action_create_counter(priv->dr_ctx,
+					(struct mlx5dr_devx_obj *)dcs->obj,
+					flags);
+		if (dcs->dr_action == NULL) {
+			mlx5_hws_cnt_pool_action_destroy(cpool);
+			ret = -ENOSYS;
+			break;
+		}
+	}
+	return ret;
+}
+
+void
+mlx5_hws_cnt_pool_action_destroy(struct mlx5_hws_cnt_pool *cpool)
+{
+	uint32_t idx;
+	struct mlx5_hws_cnt_dcs *dcs;
+
+	for (idx = 0; idx < cpool->dcs_mng.batch_total; idx++) {
+		dcs = &cpool->dcs_mng.dcs[idx];
+		if (dcs->dr_action != NULL) {
+			mlx5dr_action_destroy(dcs->dr_action);
+			dcs->dr_action = NULL;
+		}
+	}
+}
+
+struct mlx5_hws_cnt_pool *
+mlx5_hws_cnt_pool_create(struct rte_eth_dev *dev,
+		const struct rte_flow_port_attr *pattr, uint16_t nb_queue)
+{
+	struct mlx5_hws_cnt_pool *cpool = NULL;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hws_cache_param cparam = {0};
+	struct mlx5_hws_cnt_pool_cfg pcfg = {0};
+	char *mp_name;
+	int ret = 0;
+	size_t sz;
+
+	/* Initialize the counter service if it has not been started yet. */
+	if (priv->sh->cnt_svc == NULL) {
+		ret = mlx5_hws_cnt_svc_init(priv->sh);
+		if (ret != 0)
+			return NULL;
+	}
+	cparam.fetch_sz = HWS_CNT_CACHE_FETCH_DEFAULT;
+	cparam.preload_sz = HWS_CNT_CACHE_PRELOAD_DEFAULT;
+	cparam.q_num = nb_queue;
+	cparam.threshold = HWS_CNT_CACHE_THRESHOLD_DEFAULT;
+	cparam.size = HWS_CNT_CACHE_SZ_DEFAULT;
+	pcfg.alloc_factor = HWS_CNT_ALLOC_FACTOR_DEFAULT;
+	mp_name = mlx5_malloc(MLX5_MEM_ZERO, RTE_MEMZONE_NAMESIZE, 0,
+			SOCKET_ID_ANY);
+	if (mp_name == NULL)
+		goto error;
+	snprintf(mp_name, RTE_MEMZONE_NAMESIZE, "MLX5_HWS_CNT_POOL_%u",
+			dev->data->port_id);
+	pcfg.name = mp_name;
+	pcfg.request_num = pattr->nb_counters;
+	cpool = mlx5_hws_cnt_pool_init(&pcfg, &cparam);
+	if (cpool == NULL)
+		goto error;
+	ret = mlx5_hws_cnt_pool_dcs_alloc(priv->sh, cpool);
+	if (ret != 0)
+		goto error;
+	sz = RTE_ALIGN_CEIL(mlx5_hws_cnt_pool_get_size(cpool), 4);
+	cpool->raw_mng = mlx5_hws_cnt_raw_data_alloc(priv->sh, sz);
+	if (cpool->raw_mng == NULL)
+		goto error;
+	__hws_cnt_id_load(cpool);
+	/*
+	 * Bump the query gen right after pool creation so that the
+	 * pre-loaded counters can be used directly: they already have
+	 * their init value and do not need to wait for a query.
+	 */
+	cpool->query_gen = 1;
+	ret = mlx5_hws_cnt_pool_action_create(priv, cpool);
+	if (ret != 0)
+		goto error;
+	priv->sh->cnt_svc->refcnt++;
+	return cpool;
+error:
+	mlx5_hws_cnt_pool_destroy(priv->sh, cpool);
+	return NULL;
+}
+
+void
+mlx5_hws_cnt_pool_destroy(struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_hws_cnt_pool *cpool)
+{
+	if (cpool == NULL)
+		return;
+	if (--sh->cnt_svc->refcnt == 0)
+		mlx5_hws_cnt_svc_deinit(sh);
+	mlx5_hws_cnt_pool_action_destroy(cpool);
+	mlx5_hws_cnt_pool_dcs_free(sh, cpool);
+	mlx5_hws_cnt_raw_data_free(sh, cpool->raw_mng);
+	mlx5_free((void *)cpool->cfg.name);
+	mlx5_hws_cnt_pool_deinit(cpool);
+}
+
+int
+mlx5_hws_cnt_svc_init(struct mlx5_dev_ctx_shared *sh)
+{
+	int ret;
+
+	sh->cnt_svc = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO,
+			sizeof(*sh->cnt_svc), 0, SOCKET_ID_ANY);
+	if (sh->cnt_svc == NULL)
+		return -1;
+	sh->cnt_svc->query_interval = sh->config.cnt_svc.cycle_time;
+	sh->cnt_svc->service_core = sh->config.cnt_svc.service_core;
+	ret = mlx5_aso_cnt_queue_init(sh);
+	if (ret != 0) {
+		mlx5_free(sh->cnt_svc);
+		sh->cnt_svc = NULL;
+		return -1;
+	}
+	ret = mlx5_hws_cnt_service_thread_create(sh);
+	if (ret != 0) {
+		mlx5_aso_cnt_queue_uninit(sh);
+		mlx5_free(sh->cnt_svc);
+		sh->cnt_svc = NULL;
+	}
+	return 0;
+}
+
+void
+mlx5_hws_cnt_svc_deinit(struct mlx5_dev_ctx_shared *sh)
+{
+	if (sh->cnt_svc == NULL)
+		return;
+	mlx5_hws_cnt_service_thread_destroy(sh);
+	mlx5_aso_cnt_queue_uninit(sh);
+	mlx5_free(sh->cnt_svc);
+	sh->cnt_svc = NULL;
+}
diff --git a/drivers/net/mlx5/mlx5_hws_cnt.h b/drivers/net/mlx5/mlx5_hws_cnt.h
new file mode 100644
index 0000000000..312b053c59
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_hws_cnt.h
@@ -0,0 +1,558 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2022 Mellanox Technologies, Ltd
+ */
+
+#ifndef _MLX5_HWS_CNT_H_
+#define _MLX5_HWS_CNT_H_
+
+#include <rte_ring.h>
+#include "mlx5_utils.h"
+#include "mlx5_flow.h"
+
+/*
+ * COUNTER ID's layout
+ *       3                   2                   1                   0
+ *     1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    | T |       | D |                                               |
+ *    ~ Y |       | C |                    IDX                        ~
+ *    | P |       | S |                                               |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *
+ *    Bit 31:30 = TYPE = MLX5_INDIRECT_ACTION_TYPE_COUNT = b'10
+ *    Bit 25:24 = DCS index
+ *    Bit 23:00 = IDX within the DCS bulk this counter belongs to.
+ */
+typedef uint32_t cnt_id_t;
+
+#define MLX5_HWS_CNT_DCS_NUM 4
+#define MLX5_HWS_CNT_DCS_IDX_OFFSET 24
+#define MLX5_HWS_CNT_DCS_IDX_MASK 0x3
+#define MLX5_HWS_CNT_IDX_MASK ((1UL << MLX5_HWS_CNT_DCS_IDX_OFFSET) - 1)
+
+struct mlx5_hws_cnt_dcs {
+	void *dr_action;
+	uint32_t batch_sz;
+	uint32_t iidx; /* internal index of first counter in this bulk. */
+	struct mlx5_devx_obj *obj;
+};
+
+struct mlx5_hws_cnt_dcs_mng {
+	uint32_t batch_total;
+	struct mlx5_hws_cnt_dcs dcs[MLX5_HWS_CNT_DCS_NUM];
+};
+
+struct mlx5_hws_cnt {
+	struct flow_counter_stats reset;
+	union {
+		uint32_t share: 1;
+		/*
+		 * share is set to 1 when this counter is used as an indirect
+		 * action. Only meaningful while the user owns this counter.
+		 */
+		uint32_t query_gen_when_free;
+		/*
+		 * When the PMD owns this counter (i.e. the user has put the
+		 * counter back into the PMD counter pool), this field records
+		 * the value of the pool's query generation at the time the
+		 * user released the counter.
+		 */
+	};
+};
+
+struct mlx5_hws_cnt_raw_data_mng {
+	struct flow_counter_stats *raw;
+	struct mlx5_pmd_mr mr;
+};
+
+struct mlx5_hws_cache_param {
+	uint32_t size;
+	uint32_t q_num;
+	uint32_t fetch_sz;
+	uint32_t threshold;
+	uint32_t preload_sz;
+};
+
+struct mlx5_hws_cnt_pool_cfg {
+	char *name;
+	uint32_t request_num;
+	uint32_t alloc_factor;
+};
+
+struct mlx5_hws_cnt_pool_caches {
+	uint32_t fetch_sz;
+	uint32_t threshold;
+	uint32_t preload_sz;
+	uint32_t q_num;
+	struct rte_ring *qcache[];
+};
+
+struct mlx5_hws_cnt_pool {
+	struct mlx5_hws_cnt_pool_cfg cfg __rte_cache_aligned;
+	struct mlx5_hws_cnt_dcs_mng dcs_mng __rte_cache_aligned;
+	uint32_t query_gen __rte_cache_aligned;
+	struct mlx5_hws_cnt *pool;
+	struct mlx5_hws_cnt_raw_data_mng *raw_mng;
+	struct rte_ring *reuse_list;
+	struct rte_ring *free_list;
+	struct rte_ring *wait_reset_list;
+	struct mlx5_hws_cnt_pool_caches *cache;
+} __rte_cache_aligned;
+
+/**
+ * Translate a counter id into the internal index (starting from 0), which
+ * can be used as an index into the raw data and counter pools.
+ *
+ * @param cpool
+ *   The pointer to the counter pool.
+ * @param cnt_id
+ *   The external counter id.
+ * @return
+ *   The internal index.
+ */
+static __rte_always_inline cnt_id_t
+mlx5_hws_cnt_iidx(struct mlx5_hws_cnt_pool *cpool, cnt_id_t cnt_id)
+{
+	uint8_t dcs_idx = cnt_id >> MLX5_HWS_CNT_DCS_IDX_OFFSET;
+	uint32_t offset = cnt_id & MLX5_HWS_CNT_IDX_MASK;
+
+	dcs_idx &= MLX5_HWS_CNT_DCS_IDX_MASK;
+	return (cpool->dcs_mng.dcs[dcs_idx].iidx + offset);
+}
+
+/**
+ * Check whether the given counter id is valid.
+ */
+static __rte_always_inline bool
+mlx5_hws_cnt_id_valid(cnt_id_t cnt_id)
+{
+	return (cnt_id >> MLX5_INDIRECT_ACTION_TYPE_OFFSET) ==
+		MLX5_INDIRECT_ACTION_TYPE_COUNT ? true : false;
+}
+
+/**
+ * Generate Counter id from internal index.
+ *
+ * @param cpool
+ *   The pointer to counter pool
+ * @param index
+ *   The internal counter index.
+ *
+ * @return
+ *   Counter id
+ */
+static __rte_always_inline cnt_id_t
+mlx5_hws_cnt_id_gen(struct mlx5_hws_cnt_pool *cpool, cnt_id_t iidx)
+{
+	struct mlx5_hws_cnt_dcs_mng *dcs_mng = &cpool->dcs_mng;
+	uint32_t idx;
+	uint32_t offset;
+	cnt_id_t cnt_id;
+
+	for (idx = 0, offset = iidx; idx < dcs_mng->batch_total; idx++) {
+		if (dcs_mng->dcs[idx].batch_sz <= offset)
+			offset -= dcs_mng->dcs[idx].batch_sz;
+		else
+			break;
+	}
+	cnt_id = offset;
+	cnt_id |= (idx << MLX5_HWS_CNT_DCS_IDX_OFFSET);
+	return (MLX5_INDIRECT_ACTION_TYPE_COUNT <<
+			MLX5_INDIRECT_ACTION_TYPE_OFFSET) | cnt_id;
+}
+
+static __rte_always_inline void
+__hws_cnt_query_raw(struct mlx5_hws_cnt_pool *cpool, cnt_id_t cnt_id,
+		uint64_t *raw_pkts, uint64_t *raw_bytes)
+{
+	struct mlx5_hws_cnt_raw_data_mng *raw_mng = cpool->raw_mng;
+	struct flow_counter_stats s[2];
+	uint8_t i = 0x1;
+	size_t stat_sz = sizeof(s[0]);
+	uint32_t iidx = mlx5_hws_cnt_iidx(cpool, cnt_id);
+
+	memcpy(&s[0], &raw_mng->raw[iidx], stat_sz);
+	do {
+		memcpy(&s[i & 1], &raw_mng->raw[iidx], stat_sz);
+		if (memcmp(&s[0], &s[1], stat_sz) == 0) {
+			*raw_pkts = rte_be_to_cpu_64(s[0].hits);
+			*raw_bytes = rte_be_to_cpu_64(s[0].bytes);
+			break;
+		}
+		i = ~i;
+	} while (1);
+}
+
+/**
+ * Copy elements from one zero-copy ring to another zero-copy ring in place.
+ *
+ * The input is an rte_ring zero-copy data struct, which has two pointers.
+ * In case a wrap-around happened, ptr2 is meaningful.
+ *
+ * So this routine needs to consider the situation where the addresses given
+ * by both the source and the destination could be wrapped.
+ * First, calculate the number of elements to be copied up to the first
+ * wrapped address, which could be in the source or in the destination.
+ * Second, copy the remaining elements up to the second wrapped address. If in
+ * the first step the wrapped address was in the source, then this time it
+ * must be in the destination, and vice versa.
+ * Third, copy all remaining elements.
+ *
+ * In the worst case, we need to copy three pieces of contiguous memory.
+ *
+ * @param zcdd
+ *   A pointer to the zero-copy data of the destination ring.
+ * @param zcds
+ *   A pointer to the zero-copy data of the source ring.
+ * @param n
+ *   Number of elements to copy.
+ */
+static __rte_always_inline void
+__hws_cnt_r2rcpy(struct rte_ring_zc_data *zcdd, struct rte_ring_zc_data *zcds,
+		unsigned int n)
+{
+	unsigned int n1, n2, n3;
+	void *s1, *s2, *s3;
+	void *d1, *d2, *d3;
+
+	s1 = zcds->ptr1;
+	d1 = zcdd->ptr1;
+	n1 = RTE_MIN(zcdd->n1, zcds->n1);
+	if (zcds->n1 > n1) {
+		n2 = zcds->n1 - n1;
+		s2 = RTE_PTR_ADD(zcds->ptr1, sizeof(cnt_id_t) * n1);
+		d2 = zcdd->ptr2;
+		n3 = n - n1 - n2;
+		s3 = zcds->ptr2;
+		d3 = RTE_PTR_ADD(zcdd->ptr2, sizeof(cnt_id_t) * n2);
+	} else {
+		n2 = zcdd->n1 - n1;
+		s2 = zcds->ptr2;
+		d2 = RTE_PTR_ADD(zcdd->ptr1, sizeof(cnt_id_t) * n1);
+		n3 = n - n1 - n2;
+		s3 = RTE_PTR_ADD(zcds->ptr2, sizeof(cnt_id_t) * n2);
+		d3 = zcdd->ptr2;
+	}
+	memcpy(d1, s1, n1 * sizeof(cnt_id_t));
+	if (n2 != 0) {
+		memcpy(d2, s2, n2 * sizeof(cnt_id_t));
+		if (n3 != 0)
+			memcpy(d3, s3, n3 * sizeof(cnt_id_t));
+	}
+}
+
+static __rte_always_inline int
+mlx5_hws_cnt_pool_cache_flush(struct mlx5_hws_cnt_pool *cpool,
+			      uint32_t queue_id)
+{
+	unsigned int ret;
+	struct rte_ring_zc_data zcdr = {0};
+	struct rte_ring_zc_data zcdc = {0};
+	struct rte_ring *reset_list = NULL;
+	struct rte_ring *qcache = cpool->cache->qcache[queue_id];
+
+	ret = rte_ring_dequeue_zc_burst_elem_start(qcache,
+			sizeof(cnt_id_t), rte_ring_count(qcache), &zcdc,
+			NULL);
+	MLX5_ASSERT(ret);
+	reset_list = cpool->wait_reset_list;
+	rte_ring_enqueue_zc_burst_elem_start(reset_list,
+			sizeof(cnt_id_t), ret, &zcdr, NULL);
+	__hws_cnt_r2rcpy(&zcdr, &zcdc, ret);
+	rte_ring_enqueue_zc_elem_finish(reset_list, ret);
+	rte_ring_dequeue_zc_elem_finish(qcache, ret);
+	return 0;
+}
+
+static __rte_always_inline int
+mlx5_hws_cnt_pool_cache_fetch(struct mlx5_hws_cnt_pool *cpool,
+			      uint32_t queue_id)
+{
+	struct rte_ring *qcache = cpool->cache->qcache[queue_id];
+	struct rte_ring *free_list = NULL;
+	struct rte_ring *reuse_list = NULL;
+	struct rte_ring *list = NULL;
+	struct rte_ring_zc_data zcdf = {0};
+	struct rte_ring_zc_data zcdc = {0};
+	struct rte_ring_zc_data zcdu = {0};
+	struct rte_ring_zc_data zcds = {0};
+	struct mlx5_hws_cnt_pool_caches *cache = cpool->cache;
+	unsigned int ret;
+
+	reuse_list = cpool->reuse_list;
+	ret = rte_ring_dequeue_zc_burst_elem_start(reuse_list,
+			sizeof(cnt_id_t), cache->fetch_sz, &zcdu, NULL);
+	zcds = zcdu;
+	list = reuse_list;
+	if (unlikely(ret == 0)) { /* no reuse counter. */
+		rte_ring_dequeue_zc_elem_finish(reuse_list, 0);
+		free_list = cpool->free_list;
+		ret = rte_ring_dequeue_zc_burst_elem_start(free_list,
+				sizeof(cnt_id_t), cache->fetch_sz, &zcdf, NULL);
+		zcds = zcdf;
+		list = free_list;
+		if (unlikely(ret == 0)) { /* no free counter. */
+			rte_ring_dequeue_zc_elem_finish(free_list, 0);
+			if (rte_ring_count(cpool->wait_reset_list))
+				return -EAGAIN;
+			return -ENOENT;
+		}
+	}
+	rte_ring_enqueue_zc_burst_elem_start(qcache, sizeof(cnt_id_t),
+			ret, &zcdc, NULL);
+	__hws_cnt_r2rcpy(&zcdc, &zcds, ret);
+	rte_ring_dequeue_zc_elem_finish(list, ret);
+	rte_ring_enqueue_zc_elem_finish(qcache, ret);
+	return 0;
+}
+
+static __rte_always_inline int
+__mlx5_hws_cnt_pool_enqueue_revert(struct rte_ring *r, unsigned int n,
+		struct rte_ring_zc_data *zcd)
+{
+	uint32_t current_head = 0;
+	uint32_t revert2head = 0;
+
+	MLX5_ASSERT(r->prod.sync_type == RTE_RING_SYNC_ST);
+	MLX5_ASSERT(r->cons.sync_type == RTE_RING_SYNC_ST);
+	current_head = __atomic_load_n(&r->prod.head, __ATOMIC_RELAXED);
+	MLX5_ASSERT(n <= r->capacity);
+	MLX5_ASSERT(n <= rte_ring_count(r));
+	revert2head = current_head - n;
+	r->prod.head = revert2head; /* This ring should be SP. */
+	__rte_ring_get_elem_addr(r, revert2head, sizeof(cnt_id_t), n,
+			&zcd->ptr1, &zcd->n1, &zcd->ptr2);
+	/* Update tail */
+	__atomic_store_n(&r->prod.tail, revert2head, __ATOMIC_RELEASE);
+	return n;
+}
+
+/**
+ * Put one counter back into the counter pool.
+ *
+ * @param cpool
+ *   A pointer to the counter pool structure.
+ * @param cnt_id
+ *   A counter id to be put back.
+ * @return
+ *   - 0: Success; the counter was put back.
+ *   - -ENOENT: the counter could not be put back.
+ */
+static __rte_always_inline int
+mlx5_hws_cnt_pool_put(struct mlx5_hws_cnt_pool *cpool,
+		uint32_t *queue, cnt_id_t *cnt_id)
+{
+	unsigned int ret = 0;
+	struct rte_ring_zc_data zcdc = {0};
+	struct rte_ring_zc_data zcdr = {0};
+	struct rte_ring *qcache = NULL;
+	unsigned int wb_num = 0; /* cache write-back number. */
+	cnt_id_t iidx;
+
+	iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
+	cpool->pool[iidx].query_gen_when_free =
+		__atomic_load_n(&cpool->query_gen, __ATOMIC_RELAXED);
+	if (likely(queue != NULL))
+		qcache = cpool->cache->qcache[*queue];
+	if (unlikely(qcache == NULL)) {
+		ret = rte_ring_enqueue_elem(cpool->wait_reset_list, cnt_id,
+				sizeof(cnt_id_t));
+		MLX5_ASSERT(ret == 0);
+		return ret;
+	}
+	ret = rte_ring_enqueue_burst_elem(qcache, cnt_id, sizeof(cnt_id_t), 1,
+					  NULL);
+	if (unlikely(ret == 0)) { /* cache is full. */
+		wb_num = rte_ring_count(qcache) - cpool->cache->threshold;
+		MLX5_ASSERT(wb_num < rte_ring_count(qcache));
+		__mlx5_hws_cnt_pool_enqueue_revert(qcache, wb_num, &zcdc);
+		rte_ring_enqueue_zc_burst_elem_start(cpool->wait_reset_list,
+				sizeof(cnt_id_t), wb_num, &zcdr, NULL);
+		__hws_cnt_r2rcpy(&zcdr, &zcdc, wb_num);
+		rte_ring_enqueue_zc_elem_finish(cpool->wait_reset_list, wb_num);
+		/* write-back THIS counter too */
+		ret = rte_ring_enqueue_burst_elem(cpool->wait_reset_list,
+				cnt_id, sizeof(cnt_id_t), 1, NULL);
+	}
+	return ret == 1 ? 0 : -ENOENT;
+}
+
+/**
+ * Get one counter from the pool.
+ *
+ * If @p queue is not NULL, the counter is retrieved first from the queue's
+ * cache and subsequently from the common pool. Note that it can return
+ * -ENOENT when the local cache and the common pool are empty, even if the
+ * caches of other queues are full.
+ *
+ * @param cpool
+ *   A pointer to the counter pool structure.
+ * @param queue
+ *   A pointer to the HWS queue. If NULL, fetch from the common pool.
+ * @param cnt_id
+ *   A pointer to a cnt_id_t (counter id) that will be filled.
+ * @return
+ *   - 0: Success; objects taken.
+ *   - -ENOENT: Not enough entries in the mempool; no object is retrieved.
+ *   - -EAGAIN: counter is not ready; try again.
+ */
+static __rte_always_inline int
+mlx5_hws_cnt_pool_get(struct mlx5_hws_cnt_pool *cpool,
+		uint32_t *queue, cnt_id_t *cnt_id)
+{
+	unsigned int ret;
+	struct rte_ring_zc_data zcdc = {0};
+	struct rte_ring *qcache = NULL;
+	uint32_t query_gen = 0;
+	cnt_id_t iidx, tmp_cid = 0;
+
+	if (likely(queue != NULL))
+		qcache = cpool->cache->qcache[*queue];
+	if (unlikely(qcache == NULL)) {
+		ret = rte_ring_dequeue_elem(cpool->reuse_list, &tmp_cid,
+				sizeof(cnt_id_t));
+		if (unlikely(ret != 0)) {
+			ret = rte_ring_dequeue_elem(cpool->free_list, &tmp_cid,
+					sizeof(cnt_id_t));
+			if (unlikely(ret != 0)) {
+				if (rte_ring_count(cpool->wait_reset_list))
+					return -EAGAIN;
+				return -ENOENT;
+			}
+		}
+		*cnt_id = tmp_cid;
+		iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
+		__hws_cnt_query_raw(cpool, *cnt_id,
+				    &cpool->pool[iidx].reset.hits,
+				    &cpool->pool[iidx].reset.bytes);
+		return 0;
+	}
+	ret = rte_ring_dequeue_zc_burst_elem_start(qcache, sizeof(cnt_id_t), 1,
+			&zcdc, NULL);
+	if (unlikely(ret == 0)) { /* local cache is empty. */
+		rte_ring_dequeue_zc_elem_finish(qcache, 0);
+		/* let's fetch from global free list. */
+		ret = mlx5_hws_cnt_pool_cache_fetch(cpool, *queue);
+		if (unlikely(ret != 0))
+			return ret;
+		rte_ring_dequeue_zc_burst_elem_start(qcache, sizeof(cnt_id_t),
+				1, &zcdc, NULL);
+	}
+	/* get one from local cache. */
+	*cnt_id = (*(cnt_id_t *)zcdc.ptr1);
+	iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
+	query_gen = cpool->pool[iidx].query_gen_when_free;
+	if (cpool->query_gen == query_gen) { /* counter is waiting to reset. */
+		rte_ring_dequeue_zc_elem_finish(qcache, 0);
+		/* write-back counter to reset list. */
+		mlx5_hws_cnt_pool_cache_flush(cpool, *queue);
+		/* let's fetch from global free list. */
+		ret = mlx5_hws_cnt_pool_cache_fetch(cpool, *queue);
+		if (unlikely(ret != 0))
+			return ret;
+		rte_ring_dequeue_zc_burst_elem_start(qcache, sizeof(cnt_id_t),
+				1, &zcdc, NULL);
+		*cnt_id = *(cnt_id_t *)zcdc.ptr1;
+	}
+	__hws_cnt_query_raw(cpool, *cnt_id, &cpool->pool[iidx].reset.hits,
+			    &cpool->pool[iidx].reset.bytes);
+	rte_ring_dequeue_zc_elem_finish(qcache, 1);
+	cpool->pool[iidx].share = 0;
+	return 0;
+}
+
+static __always_inline unsigned int
+mlx5_hws_cnt_pool_get_size(struct mlx5_hws_cnt_pool *cpool)
+{
+	return rte_ring_get_capacity(cpool->free_list);
+}
+
+static __always_inline int
+mlx5_hws_cnt_pool_get_action_offset(struct mlx5_hws_cnt_pool *cpool,
+		cnt_id_t cnt_id, struct mlx5dr_action **action,
+		uint32_t *offset)
+{
+	uint8_t idx = cnt_id >> MLX5_HWS_CNT_DCS_IDX_OFFSET;
+
+	idx &= MLX5_HWS_CNT_DCS_IDX_MASK;
+	*action = cpool->dcs_mng.dcs[idx].dr_action;
+	*offset = cnt_id & MLX5_HWS_CNT_IDX_MASK;
+	return 0;
+}
+
+static __rte_always_inline int
+mlx5_hws_cnt_shared_get(struct mlx5_hws_cnt_pool *cpool, cnt_id_t *cnt_id)
+{
+	int ret;
+	uint32_t iidx;
+
+	ret = mlx5_hws_cnt_pool_get(cpool, NULL, cnt_id);
+	if (ret != 0)
+		return ret;
+	iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
+	MLX5_ASSERT(cpool->pool[iidx].share == 0);
+	cpool->pool[iidx].share = 1;
+	return 0;
+}
+
+static __rte_always_inline int
+mlx5_hws_cnt_shared_put(struct mlx5_hws_cnt_pool *cpool, cnt_id_t *cnt_id)
+{
+	int ret;
+	uint32_t iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
+
+	cpool->pool[iidx].share = 0;
+	ret = mlx5_hws_cnt_pool_put(cpool, NULL, cnt_id);
+	if (unlikely(ret != 0))
+		cpool->pool[iidx].share = 1; /* fail to release, restore. */
+	return ret;
+}
+
+static __rte_always_inline bool
+mlx5_hws_cnt_is_shared(struct mlx5_hws_cnt_pool *cpool, cnt_id_t cnt_id)
+{
+	uint32_t iidx = mlx5_hws_cnt_iidx(cpool, cnt_id);
+
+	return cpool->pool[iidx].share ? true : false;
+}
+
+/* init HWS counter pool. */
+struct mlx5_hws_cnt_pool *
+mlx5_hws_cnt_pool_init(const struct mlx5_hws_cnt_pool_cfg *pcfg,
+		const struct mlx5_hws_cache_param *ccfg);
+
+void
+mlx5_hws_cnt_pool_deinit(struct mlx5_hws_cnt_pool *cntp);
+
+int
+mlx5_hws_cnt_service_thread_create(struct mlx5_dev_ctx_shared *sh);
+
+void
+mlx5_hws_cnt_service_thread_destroy(struct mlx5_dev_ctx_shared *sh);
+
+int
+mlx5_hws_cnt_pool_dcs_alloc(struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_hws_cnt_pool *cpool);
+void
+mlx5_hws_cnt_pool_dcs_free(struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_hws_cnt_pool *cpool);
+
+int
+mlx5_hws_cnt_pool_action_create(struct mlx5_priv *priv,
+		struct mlx5_hws_cnt_pool *cpool);
+
+void
+mlx5_hws_cnt_pool_action_destroy(struct mlx5_hws_cnt_pool *cpool);
+
+struct mlx5_hws_cnt_pool *
+mlx5_hws_cnt_pool_create(struct rte_eth_dev *dev,
+		const struct rte_flow_port_attr *pattr, uint16_t nb_queue);
+
+void
+mlx5_hws_cnt_pool_destroy(struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_hws_cnt_pool *cpool);
+
+int
+mlx5_hws_cnt_svc_init(struct mlx5_dev_ctx_shared *sh);
+
+void
+mlx5_hws_cnt_svc_deinit(struct mlx5_dev_ctx_shared *sh);
+
+#endif /* _MLX5_HWS_CNT_H_ */
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH 12/27] net/mlx5: support caching queue action
  2022-09-23 14:43 [PATCH 00/27] net/mlx5: HW steering PMD update Suanming Mou
                   ` (10 preceding siblings ...)
  2022-09-23 14:43 ` [PATCH 11/27] net/mlx5: add HW steering counter action Suanming Mou
@ 2022-09-23 14:43 ` Suanming Mou
  2022-09-23 14:43 ` [PATCH 13/27] net/mlx5: support DR action template API Suanming Mou
                   ` (19 subsequent siblings)
  31 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-23 14:43 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev

If the port is stopped, the Rx queue data will also be destroyed. At that
time, creating a table with an RSS action would fail due to the lack of Rx
queue data.

This commit adds caching of the queue create operation while the port is
stopped. In case the port is stopped, tables are added to the ongoing list
first, and the action translation is done only when the port starts.
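
The intended call order, as seen from the application, is sketched below.
This is only an illustration: template and attribute setup is omitted, and
the helper name is made up for this example.

#include <rte_ethdev.h>
#include <rte_flow.h>

/* Hypothetical helper: create a template table while the port is stopped
 * and rely on the deferred action translation performed at port start. */
static struct rte_flow_template_table *
create_table_before_start(uint16_t port_id,
			  const struct rte_flow_template_table_attr *attr,
			  struct rte_flow_pattern_template *pt,
			  struct rte_flow_actions_template *at,
			  struct rte_flow_error *err)
{
	struct rte_flow_template_table *tbl;

	/* Port is stopped: the table is only queued on the ongoing list,
	 * no Rx queue data is needed yet. */
	tbl = rte_flow_template_table_create(port_id, attr, &pt, 1, &at, 1, err);
	if (tbl == NULL)
		return NULL;
	/* Starting the port runs flow_hw_table_update(), which translates
	 * the cached actions now that the Rx queues exist. */
	if (rte_eth_dev_start(port_id) != 0) {
		rte_flow_template_table_destroy(port_id, tbl, err);
		return NULL;
	}
	return tbl;
}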

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/net/mlx5/mlx5.h         |  2 +
 drivers/net/mlx5/mlx5_flow.h    |  2 +
 drivers/net/mlx5/mlx5_flow_hw.c | 95 +++++++++++++++++++++++++++++----
 drivers/net/mlx5/mlx5_trigger.c |  8 +++
 4 files changed, 97 insertions(+), 10 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 8d82c68569..be60038810 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1643,6 +1643,8 @@ struct mlx5_priv {
 	struct mlx5dr_action *hw_drop[2];
 	/* HW steering global tag action. */
 	struct mlx5dr_action *hw_tag[2];
+	/* HW steering create ongoing rte flow table list header. */
+	LIST_HEAD(flow_hw_tbl_ongo, rte_flow_template_table) flow_hw_tbl_ongo;
 	struct mlx5_indexed_pool *acts_ipool; /* Action data indexed pool. */
 	struct mlx5_hws_cnt_pool *hws_cpool; /* HW steering's counter pool. */
 #endif
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index cdea4076d8..746cf439fc 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -2398,4 +2398,6 @@ int mlx5_flow_pattern_validate(struct rte_eth_dev *dev,
 		const struct rte_flow_pattern_template_attr *attr,
 		const struct rte_flow_item items[],
 		struct rte_flow_error *error);
+int flow_hw_table_update(struct rte_eth_dev *dev,
+			 struct rte_flow_error *error);
 #endif /* RTE_PMD_MLX5_FLOW_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 8891f4a4e3..fe40b02c49 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -992,11 +992,11 @@ flow_hw_cnt_compile(struct rte_eth_dev *dev, uint32_t  start_pos,
  *    Table on success, NULL otherwise and rte_errno is set.
  */
 static int
-flow_hw_actions_translate(struct rte_eth_dev *dev,
-			  const struct mlx5_flow_template_table_cfg *cfg,
-			  struct mlx5_hw_actions *acts,
-			  struct rte_flow_actions_template *at,
-			  struct rte_flow_error *error)
+__flow_hw_actions_translate(struct rte_eth_dev *dev,
+			    const struct mlx5_flow_template_table_cfg *cfg,
+			    struct mlx5_hw_actions *acts,
+			    struct rte_flow_actions_template *at,
+			    struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	const struct rte_flow_template_table_attr *table_attr = &cfg->attr;
@@ -1309,6 +1309,40 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 				  "fail to create rte table");
 }
 
+/**
+ * Translate rte_flow actions to DR action.
+ *
+ * @param[in] dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param[in] tbl
+ *   Pointer to the flow template table.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *    0 on success, negative value otherwise and rte_errno is set.
+ */
+static int
+flow_hw_actions_translate(struct rte_eth_dev *dev,
+			  struct rte_flow_template_table *tbl,
+			  struct rte_flow_error *error)
+{
+	uint32_t i;
+
+	for (i = 0; i < tbl->nb_action_templates; i++) {
+		if (__flow_hw_actions_translate(dev, &tbl->cfg,
+						&tbl->ats[i].acts,
+						tbl->ats[i].action_template,
+						error))
+			goto err;
+	}
+	return 0;
+err:
+	while (i--)
+		__flow_hw_action_template_destroy(dev, &tbl->ats[i].acts);
+	return -1;
+}
+
 /**
  * Get shared indirect action.
  *
@@ -1837,6 +1871,10 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	uint32_t acts_num, flow_idx;
 	int ret;
 
+	if (unlikely((!dev->data->dev_started))) {
+		rte_errno = EINVAL;
+		goto error;
+	}
 	if (unlikely(!priv->hw_q[queue].job_idx)) {
 		rte_errno = ENOMEM;
 		goto error;
@@ -2231,6 +2269,7 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	struct mlx5_list_entry *ge;
 	uint32_t i, max_tpl = MLX5_HW_TBL_MAX_ITEM_TEMPLATE;
 	uint32_t nb_flows = rte_align32pow2(attr->nb_flows);
+	bool port_started = !!dev->data->dev_started;
 	int err;
 
 	/* HWS layer accepts only 1 item template with root table. */
@@ -2295,21 +2334,26 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 			rte_errno = EINVAL;
 			goto at_error;
 		}
+		tbl->ats[i].action_template = action_templates[i];
 		LIST_INIT(&tbl->ats[i].acts.act_list);
-		err = flow_hw_actions_translate(dev, &tbl->cfg,
-						&tbl->ats[i].acts,
-						action_templates[i], error);
+		if (!port_started)
+			continue;
+		err = __flow_hw_actions_translate(dev, &tbl->cfg,
+						  &tbl->ats[i].acts,
+						  action_templates[i], error);
 		if (err) {
 			i++;
 			goto at_error;
 		}
-		tbl->ats[i].action_template = action_templates[i];
 	}
 	tbl->nb_action_templates = nb_action_templates;
 	tbl->type = attr->flow_attr.transfer ? MLX5DR_TABLE_TYPE_FDB :
 		    (attr->flow_attr.egress ? MLX5DR_TABLE_TYPE_NIC_TX :
 		    MLX5DR_TABLE_TYPE_NIC_RX);
-	LIST_INSERT_HEAD(&priv->flow_hw_tbl, tbl, next);
+	if (port_started)
+		LIST_INSERT_HEAD(&priv->flow_hw_tbl, tbl, next);
+	else
+		LIST_INSERT_HEAD(&priv->flow_hw_tbl_ongo, tbl, next);
 	return tbl;
 at_error:
 	while (i--) {
@@ -2339,6 +2383,33 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	return NULL;
 }
 
+/**
+ * Update flow template table.
+ *
+ * @param[in] dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *    0 on success, negative value otherwise and rte_errno is set.
+ */
+int
+flow_hw_table_update(struct rte_eth_dev *dev,
+		     struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_flow_template_table *tbl;
+
+	while ((tbl = LIST_FIRST(&priv->flow_hw_tbl_ongo)) != NULL) {
+		if (flow_hw_actions_translate(dev, tbl, error))
+			return -1;
+		LIST_REMOVE(tbl, next);
+		LIST_INSERT_HEAD(&priv->flow_hw_tbl, tbl, next);
+	}
+	return 0;
+}
+
 /**
  * Translates group index specified by the user in @p attr to internal
  * group index.
@@ -4440,6 +4511,10 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 	if (!priv->dr_ctx)
 		return;
 	flow_hw_flush_all_ctrl_flows(dev);
+	while (!LIST_EMPTY(&priv->flow_hw_tbl_ongo)) {
+		tbl = LIST_FIRST(&priv->flow_hw_tbl_ongo);
+		flow_hw_table_destroy(dev, tbl, NULL);
+	}
 	while (!LIST_EMPTY(&priv->flow_hw_tbl)) {
 		tbl = LIST_FIRST(&priv->flow_hw_tbl);
 		flow_hw_table_destroy(dev, tbl, NULL);
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 9e458356a0..ab2b83a870 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -1170,6 +1170,14 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 			dev->data->port_id, rte_strerror(rte_errno));
 		goto error;
 	}
+	if (priv->sh->config.dv_flow_en == 2) {
+		ret = flow_hw_table_update(dev, NULL);
+		if (ret) {
+			DRV_LOG(ERR, "port %u failed to update HWS tables",
+				dev->data->port_id);
+			goto error;
+		}
+	}
 	ret = mlx5_traffic_enable(dev);
 	if (ret) {
 		DRV_LOG(ERR, "port %u failed to set defaults flows",
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH 13/27] net/mlx5: support DR action template API
  2022-09-23 14:43 [PATCH 00/27] net/mlx5: HW steering PMD update Suanming Mou
                   ` (11 preceding siblings ...)
  2022-09-23 14:43 ` [PATCH 12/27] net/mlx5: support caching queue action Suanming Mou
@ 2022-09-23 14:43 ` Suanming Mou
  2022-09-23 14:43 ` [PATCH 14/27] net/mlx5: fix indirect action validate Suanming Mou
                   ` (18 subsequent siblings)
  31 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-23 14:43 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, Dariusz Sosnowski

From: Dariusz Sosnowski <dsosnowski@nvidia.com>

This patch adapts mlx5 PMD to changes in mlx5dr API regarding action
templates. It changes the following:

1. Actions template creation:

    - Flow action types are translated to mlx5dr action types in order
      to create the mlx5dr_action_template object.
    - An offset is assigned to each flow action. This offset is used to
      predetermine the action's location in the rule_acts array passed on
      rule creation (see the sketch after this list).

2. Template table creation:

    - Fixed actions are created and put in the rule_acts cache using
      predetermined offsets.
    - mlx5dr matcher is parametrized by action templates bound to
      template table.
    - mlx5dr matcher is configured to optimize rule creation based on
      passed rule indices.

3. Flow rule creation:

    - mlx5dr rule is parametrized by the action template on which the
      rule's actions are based.
    - Rule index hint is provided to mlx5dr.
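
To make the offset bookkeeping concrete, below is a hypothetical template
with MARK, METER and COUNT actions; the enum names are placeholders for the
mlx5dr action types, not identifiers from this patch.

#include <stdint.h>

/* METER expands into two DR actions (ASO_METER plus an FT jump), so COUNT
 * lands at DR offset 3 even though it is RTE action number 2. */
enum { DR_TAG, DR_ASO_METER, DR_FT, DR_CTR, DR_LAST };

static const int action_types[] = {
	DR_TAG,		/* MARK  -> DR offset 0 */
	DR_ASO_METER,	/* METER -> DR offset 1 ... */
	DR_FT,		/* ...plus the implicit jump at DR offset 2 */
	DR_CTR,		/* COUNT -> DR offset 3 */
	DR_LAST,
};

/* Per-RTE-action offsets stored in the actions template; on rule creation
 * each constructed action is written straight to rule_acts[actions_off[i]]. */
static const uint16_t actions_off[] = { 0, 1, 3 };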

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 drivers/net/mlx5/mlx5_flow.h    |   6 +-
 drivers/net/mlx5/mlx5_flow_hw.c | 351 ++++++++++++++++++++++++--------
 2 files changed, 268 insertions(+), 89 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 746cf439fc..c982cb953a 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1173,6 +1173,11 @@ struct rte_flow_actions_template {
 	struct rte_flow_actions_template_attr attr;
 	struct rte_flow_action *actions; /* Cached flow actions. */
 	struct rte_flow_action *masks; /* Cached action masks.*/
+	struct mlx5dr_action_template *tmpl; /* mlx5dr action template. */
+	uint16_t dr_actions_num; /* Amount of DR rules actions. */
+	uint16_t actions_num; /* Amount of flow actions */
+	uint16_t *actions_off; /* DR action offset for given rte action offset. */
+	uint16_t reformat_off; /* Offset of DR reformat action. */
 	uint16_t mhdr_off; /* Offset of DR modify header action. */
 	uint32_t refcnt; /* Reference counter. */
 	uint16_t rx_cpy_pos; /* Action position of Rx metadata to be copied. */
@@ -1224,7 +1229,6 @@ struct mlx5_hw_actions {
 	/* Encap/Decap action. */
 	struct mlx5_hw_encap_decap_action *encap_decap;
 	uint16_t encap_decap_pos; /* Encap/Decap action position. */
-	uint32_t acts_num:4; /* Total action number. */
 	uint32_t mark:1; /* Indicate the mark action. */
 	uint32_t cnt_id; /* Counter id. */
 	/* Translated DR action array from action template. */
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index fe40b02c49..6a1ed7e790 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -913,33 +913,29 @@ flow_hw_represented_port_compile(struct rte_eth_dev *dev,
 static __rte_always_inline int
 flow_hw_meter_compile(struct rte_eth_dev *dev,
 		      const struct mlx5_flow_template_table_cfg *cfg,
-		      uint32_t  start_pos, const struct rte_flow_action *action,
-		      struct mlx5_hw_actions *acts, uint32_t *end_pos,
+		      uint16_t aso_mtr_pos,
+		      uint16_t jump_pos,
+		      const struct rte_flow_action *action,
+		      struct mlx5_hw_actions *acts,
 		      struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_aso_mtr *aso_mtr;
 	const struct rte_flow_action_meter *meter = action->conf;
-	uint32_t pos = start_pos;
 	uint32_t group = cfg->attr.flow_attr.group;
 
 	aso_mtr = mlx5_aso_meter_by_idx(priv, meter->mtr_id);
-	acts->rule_acts[pos].action = priv->mtr_bulk.action;
-	acts->rule_acts[pos].aso_meter.offset = aso_mtr->offset;
-		acts->jump = flow_hw_jump_action_register
+	acts->rule_acts[aso_mtr_pos].action = priv->mtr_bulk.action;
+	acts->rule_acts[aso_mtr_pos].aso_meter.offset = aso_mtr->offset;
+	acts->jump = flow_hw_jump_action_register
 		(dev, cfg, aso_mtr->fm.group, error);
-	if (!acts->jump) {
-		*end_pos = start_pos;
+	if (!acts->jump)
 		return -ENOMEM;
-	}
-	acts->rule_acts[++pos].action = (!!group) ?
+	acts->rule_acts[jump_pos].action = (!!group) ?
 				    acts->jump->hws_action :
 				    acts->jump->root_action;
-	*end_pos = pos;
-	if (mlx5_aso_mtr_wait(priv->sh, aso_mtr)) {
-		*end_pos = start_pos;
+	if (mlx5_aso_mtr_wait(priv->sh, aso_mtr))
 		return -ENOMEM;
-	}
 	return 0;
 }
 
@@ -1007,12 +1003,15 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
 	enum mlx5dr_action_reformat_type refmt_type = 0;
 	const struct rte_flow_action_raw_encap *raw_encap_data;
 	const struct rte_flow_item *enc_item = NULL, *enc_item_m = NULL;
-	uint16_t reformat_pos = MLX5_HW_MAX_ACTS, reformat_src = 0;
+	uint16_t reformat_src = 0;
 	uint8_t *encap_data = NULL, *encap_data_m = NULL;
 	size_t data_size = 0;
 	struct mlx5_hw_modify_header_action mhdr = { 0 };
 	bool actions_end = false;
-	uint32_t type, i;
+	uint32_t type;
+	bool reformat_used = false;
+	uint16_t action_pos;
+	uint16_t jump_pos;
 	int err;
 
 	flow_hw_modify_field_init(&mhdr, at);
@@ -1022,46 +1021,53 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
 		type = MLX5DR_TABLE_TYPE_NIC_TX;
 	else
 		type = MLX5DR_TABLE_TYPE_NIC_RX;
-	for (i = 0; !actions_end; actions++, masks++) {
+	for (; !actions_end; actions++, masks++) {
 		switch (actions->type) {
 		case RTE_FLOW_ACTION_TYPE_INDIRECT:
+			action_pos = at->actions_off[actions - action_start];
 			if (!attr->group) {
 				DRV_LOG(ERR, "Indirect action is not supported in root table.");
 				goto err;
 			}
 			if (actions->conf && masks->conf) {
 				if (flow_hw_shared_action_translate
-				(dev, actions, acts, actions - action_start, i))
+				(dev, actions, acts, actions - action_start, action_pos))
 					goto err;
 			} else if (__flow_hw_act_data_general_append
 					(priv, acts, actions->type,
-					 actions - action_start, i)){
+					 actions - action_start, action_pos)){
 				goto err;
 			}
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			break;
 		case RTE_FLOW_ACTION_TYPE_DROP:
-			acts->rule_acts[i++].action =
+			action_pos = at->actions_off[actions - action_start];
+			acts->rule_acts[action_pos].action =
 				priv->hw_drop[!!attr->group];
 			break;
 		case RTE_FLOW_ACTION_TYPE_MARK:
+			action_pos = at->actions_off[actions - action_start];
 			acts->mark = true;
-			if (masks->conf)
-				acts->rule_acts[i].tag.value =
+			if (masks->conf &&
+			    ((const struct rte_flow_action_mark *)
+			     masks->conf)->id)
+				acts->rule_acts[action_pos].tag.value =
 					mlx5_flow_mark_set
 					(((const struct rte_flow_action_mark *)
 					(masks->conf))->id);
 			else if (__flow_hw_act_data_general_append(priv, acts,
-				actions->type, actions - action_start, i))
+				actions->type, actions - action_start, action_pos))
 				goto err;
-			acts->rule_acts[i++].action =
+			acts->rule_acts[action_pos].action =
 				priv->hw_tag[!!attr->group];
 			flow_hw_rxq_flag_set(dev, true);
 			break;
 		case RTE_FLOW_ACTION_TYPE_JUMP:
-			if (masks->conf) {
+			action_pos = at->actions_off[actions - action_start];
+			if (masks->conf &&
+			    ((const struct rte_flow_action_jump *)
+			     masks->conf)->group) {
 				uint32_t jump_group =
 					((const struct rte_flow_action_jump *)
 					actions->conf)->group;
@@ -1069,76 +1075,77 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
 						(dev, cfg, jump_group, error);
 				if (!acts->jump)
 					goto err;
-				acts->rule_acts[i].action = (!!attr->group) ?
+				acts->rule_acts[action_pos].action = (!!attr->group) ?
 						acts->jump->hws_action :
 						acts->jump->root_action;
 			} else if (__flow_hw_act_data_general_append
 					(priv, acts, actions->type,
-					 actions - action_start, i)){
+					 actions - action_start, action_pos)){
 				goto err;
 			}
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_QUEUE:
-			if (masks->conf) {
+			action_pos = at->actions_off[actions - action_start];
+			if (masks->conf &&
+			    ((const struct rte_flow_action_queue *)
+			     masks->conf)->index) {
 				acts->tir = flow_hw_tir_action_register
 				(dev,
 				 mlx5_hw_act_flag[!!attr->group][type],
 				 actions);
 				if (!acts->tir)
 					goto err;
-				acts->rule_acts[i].action =
+				acts->rule_acts[action_pos].action =
 					acts->tir->action;
 			} else if (__flow_hw_act_data_general_append
 					(priv, acts, actions->type,
-					 actions - action_start, i)) {
+					 actions - action_start, action_pos)) {
 				goto err;
 			}
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RSS:
-			if (masks->conf) {
+			action_pos = at->actions_off[actions - action_start];
+			if (actions->conf && masks->conf) {
 				acts->tir = flow_hw_tir_action_register
 				(dev,
 				 mlx5_hw_act_flag[!!attr->group][type],
 				 actions);
 				if (!acts->tir)
 					goto err;
-				acts->rule_acts[i].action =
+				acts->rule_acts[action_pos].action =
 					acts->tir->action;
 			} else if (__flow_hw_act_data_general_append
 					(priv, acts, actions->type,
-					 actions - action_start, i)) {
+					 actions - action_start, action_pos)) {
 				goto err;
 			}
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
-			MLX5_ASSERT(reformat_pos == MLX5_HW_MAX_ACTS);
+			MLX5_ASSERT(!reformat_used);
 			enc_item = ((const struct rte_flow_action_vxlan_encap *)
 				   actions->conf)->definition;
 			if (masks->conf)
 				enc_item_m = ((const struct rte_flow_action_vxlan_encap *)
 					     masks->conf)->definition;
-			reformat_pos = i++;
+			reformat_used = true;
 			reformat_src = actions - action_start;
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_L2_TO_TNL_L2;
 			break;
 		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
-			MLX5_ASSERT(reformat_pos == MLX5_HW_MAX_ACTS);
+			MLX5_ASSERT(!reformat_used);
 			enc_item = ((const struct rte_flow_action_nvgre_encap *)
 				   actions->conf)->definition;
 			if (masks->conf)
 				enc_item_m = ((const struct rte_flow_action_nvgre_encap *)
 					     masks->conf)->definition;
-			reformat_pos = i++;
+			reformat_used = true;
 			reformat_src = actions - action_start;
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_L2_TO_TNL_L2;
 			break;
 		case RTE_FLOW_ACTION_TYPE_VXLAN_DECAP:
 		case RTE_FLOW_ACTION_TYPE_NVGRE_DECAP:
-			MLX5_ASSERT(reformat_pos == MLX5_HW_MAX_ACTS);
-			reformat_pos = i++;
+			MLX5_ASSERT(!reformat_used);
+			reformat_used = true;
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_TNL_L2_TO_L2;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
@@ -1152,25 +1159,23 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
 				 actions->conf;
 			encap_data = raw_encap_data->data;
 			data_size = raw_encap_data->size;
-			if (reformat_pos != MLX5_HW_MAX_ACTS) {
+			if (reformat_used) {
 				refmt_type = data_size <
 				MLX5_ENCAPSULATION_DECISION_SIZE ?
 				MLX5DR_ACTION_REFORMAT_TYPE_TNL_L3_TO_L2 :
 				MLX5DR_ACTION_REFORMAT_TYPE_L2_TO_TNL_L3;
 			} else {
-				reformat_pos = i++;
+				reformat_used = true;
 				refmt_type =
 				MLX5DR_ACTION_REFORMAT_TYPE_L2_TO_TNL_L2;
 			}
 			reformat_src = actions - action_start;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RAW_DECAP:
-			reformat_pos = i++;
+			reformat_used = true;
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_TNL_L2_TO_L2;
 			break;
 		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
-			if (mhdr.pos == UINT16_MAX)
-				mhdr.pos = i++;
 			err = flow_hw_modify_field_compile(dev, attr, action_start,
 							   actions, masks, acts, &mhdr,
 							   error);
@@ -1188,40 +1193,46 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
 				action_start += 1;
 			break;
 		case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
+			action_pos = at->actions_off[actions - action_start];
 			if (flow_hw_represented_port_compile
 					(dev, attr, action_start, actions,
-					 masks, acts, i, error))
+					 masks, acts, action_pos, error))
 				goto err;
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_METER:
+			/*
+			 * METER action is compiled to 2 DR actions - ASO_METER and FT.
+			 * Calculated DR offset is stored only for ASO_METER and FT
+			 * is assumed to be the next action.
+			 */
+			action_pos = at->actions_off[actions - action_start];
+			jump_pos = action_pos + 1;
 			if (actions->conf && masks->conf &&
 			    ((const struct rte_flow_action_meter *)
 			     masks->conf)->mtr_id) {
 				err = flow_hw_meter_compile(dev, cfg,
-						i, actions, acts, &i, error);
+						action_pos, jump_pos, actions, acts, error);
 				if (err)
 					goto err;
 			} else if (__flow_hw_act_data_general_append(priv, acts,
 							actions->type,
 							actions - action_start,
-							i))
+							action_pos))
 				goto err;
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_COUNT:
+			action_pos = at->actions_off[actions - action_start];
 			if (masks->conf &&
 			    ((const struct rte_flow_action_count *)
 			     masks->conf)->id) {
-				err = flow_hw_cnt_compile(dev, i, acts);
+				err = flow_hw_cnt_compile(dev, action_pos, acts);
 				if (err)
 					goto err;
 			} else if (__flow_hw_act_data_general_append
 					(priv, acts, actions->type,
-					 actions - action_start, i)) {
+					 actions - action_start, action_pos)) {
 				goto err;
 			}
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
@@ -1255,10 +1266,11 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
 			goto err;
 		acts->rule_acts[acts->mhdr->pos].action = acts->mhdr->action;
 	}
-	if (reformat_pos != MLX5_HW_MAX_ACTS) {
+	if (reformat_used) {
 		uint8_t buf[MLX5_ENCAP_MAX_LEN];
 		bool shared_rfmt = true;
 
+		MLX5_ASSERT(at->reformat_off != UINT16_MAX);
 		if (enc_item) {
 			MLX5_ASSERT(!encap_data);
 			if (flow_dv_convert_encap_data(enc_item, buf, &data_size, error))
@@ -1286,20 +1298,17 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
 				 (shared_rfmt ? MLX5DR_ACTION_FLAG_SHARED : 0));
 		if (!acts->encap_decap->action)
 			goto err;
-		acts->rule_acts[reformat_pos].action =
-						acts->encap_decap->action;
-		acts->rule_acts[reformat_pos].reformat.data =
-						acts->encap_decap->data;
+		acts->rule_acts[at->reformat_off].action = acts->encap_decap->action;
+		acts->rule_acts[at->reformat_off].reformat.data = acts->encap_decap->data;
 		if (shared_rfmt)
-			acts->rule_acts[reformat_pos].reformat.offset = 0;
+			acts->rule_acts[at->reformat_off].reformat.offset = 0;
 		else if (__flow_hw_act_data_encap_append(priv, acts,
 				 (action_start + reformat_src)->type,
-				 reformat_src, reformat_pos, data_size))
+				 reformat_src, at->reformat_off, data_size))
 			goto err;
 		acts->encap_decap->shared = shared_rfmt;
-		acts->encap_decap_pos = reformat_pos;
+		acts->encap_decap_pos = at->reformat_off;
 	}
-	acts->acts_num = i;
 	return 0;
 err:
 	err = rte_errno;
@@ -1574,16 +1583,17 @@ flow_hw_modify_field_construct(struct mlx5_hw_q_job *job,
 static __rte_always_inline int
 flow_hw_actions_construct(struct rte_eth_dev *dev,
 			  struct mlx5_hw_q_job *job,
-			  const struct mlx5_hw_actions *hw_acts,
+			  const struct mlx5_hw_action_template *hw_at,
 			  const uint8_t it_idx,
 			  const struct rte_flow_action actions[],
 			  struct mlx5dr_rule_action *rule_acts,
-			  uint32_t *acts_num,
 			  uint32_t queue)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow_template_table *table = job->flow->table;
 	struct mlx5_action_construct_data *act_data;
+	const struct rte_flow_actions_template *at = hw_at->action_template;
+	const struct mlx5_hw_actions *hw_acts = &hw_at->acts;
 	const struct rte_flow_action *action;
 	const struct rte_flow_action_raw_encap *raw_encap_data;
 	const struct rte_flow_item *enc_item = NULL;
@@ -1599,11 +1609,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	struct mlx5_aso_mtr *mtr;
 	uint32_t mtr_id;
 
-	memcpy(rule_acts, hw_acts->rule_acts,
-	       sizeof(*rule_acts) * hw_acts->acts_num);
-	*acts_num = hw_acts->acts_num;
-	if (LIST_EMPTY(&hw_acts->act_list))
-		return 0;
+	rte_memcpy(rule_acts, hw_acts->rule_acts, sizeof(*rule_acts) * at->dr_actions_num);
 	attr.group = table->grp->group_id;
 	ft_flag = mlx5_hw_act_flag[!!table->grp->group_id][table->type];
 	if (table->type == MLX5DR_TABLE_TYPE_FDB) {
@@ -1737,7 +1743,6 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 							 jump->root_action;
 			job->flow->jump = jump;
 			job->flow->fate_type = MLX5_FLOW_FATE_JUMP;
-			(*acts_num)++;
 			if (mlx5_aso_mtr_wait(priv->sh, mtr))
 				return -1;
 			break;
@@ -1864,11 +1869,10 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 		.burst = attr->postpone,
 	};
 	struct mlx5dr_rule_action rule_acts[MLX5_HW_MAX_ACTS];
-	struct mlx5_hw_actions *hw_acts;
 	struct rte_flow_hw *flow;
 	struct mlx5_hw_q_job *job;
 	const struct rte_flow_item *rule_items;
-	uint32_t acts_num, flow_idx;
+	uint32_t flow_idx;
 	int ret;
 
 	if (unlikely((!dev->data->dev_started))) {
@@ -1897,7 +1901,7 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	job->flow = flow;
 	job->user_data = user_data;
 	rule_attr.user_data = job;
-	hw_acts = &table->ats[action_template_index].acts;
+	rule_attr.rule_idx = flow_idx;
 	/*
 	 * Construct the flow actions based on the input actions.
 	 * The implicitly appended action is always fixed, like metadata
@@ -1905,8 +1909,8 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	 * No need to copy and contrust a new "actions" list based on the
 	 * user's input, in order to save the cost.
 	 */
-	if (flow_hw_actions_construct(dev, job, hw_acts, pattern_template_index,
-				  actions, rule_acts, &acts_num, queue)) {
+	if (flow_hw_actions_construct(dev, job, &table->ats[action_template_index],
+				      pattern_template_index, actions, rule_acts, queue)) {
 		rte_errno = EINVAL;
 		goto free;
 	}
@@ -1915,7 +1919,7 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	if (!rule_items)
 		goto free;
 	ret = mlx5dr_rule_create(table->matcher,
-				 pattern_template_index, items,
+				 pattern_template_index, rule_items,
 				 action_template_index, rule_acts,
 				 &rule_attr, &flow->rule);
 	if (likely(!ret))
@@ -2249,6 +2253,7 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	struct rte_flow_template_table *tbl = NULL;
 	struct mlx5_flow_group *grp;
 	struct mlx5dr_match_template *mt[MLX5_HW_TBL_MAX_ITEM_TEMPLATE];
+	struct mlx5dr_action_template *at[MLX5_HW_TBL_MAX_ACTION_TEMPLATE];
 	const struct rte_flow_template_table_attr *attr = &table_cfg->attr;
 	struct rte_flow_attr flow_attr = attr->flow_attr;
 	struct mlx5_flow_cb_ctx ctx = {
@@ -2304,6 +2309,7 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	tbl->grp = grp;
 	/* Prepare matcher information. */
 	matcher_attr.priority = attr->flow_attr.priority;
+	matcher_attr.optimize_using_rule_idx = true;
 	matcher_attr.mode = MLX5DR_MATCHER_RESOURCE_MODE_RULE;
 	matcher_attr.rule.num_log = rte_log2_u32(nb_flows);
 	/* Build the item template. */
@@ -2319,10 +2325,6 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 		mt[i] = item_templates[i]->mt;
 		tbl->its[i] = item_templates[i];
 	}
-	tbl->matcher = mlx5dr_matcher_create
-		(tbl->grp->tbl, mt, nb_item_templates, NULL, 0, &matcher_attr);
-	if (!tbl->matcher)
-		goto it_error;
 	tbl->nb_item_templates = nb_item_templates;
 	/* Build the action template. */
 	for (i = 0; i < nb_action_templates; i++) {
@@ -2334,6 +2336,7 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 			rte_errno = EINVAL;
 			goto at_error;
 		}
+		at[i] = action_templates[i]->tmpl;
 		tbl->ats[i].action_template = action_templates[i];
 		LIST_INIT(&tbl->ats[i].acts.act_list);
 		if (!port_started)
@@ -2347,6 +2350,10 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 		}
 	}
 	tbl->nb_action_templates = nb_action_templates;
+	tbl->matcher = mlx5dr_matcher_create
+		(tbl->grp->tbl, mt, nb_item_templates, at, nb_action_templates, &matcher_attr);
+	if (!tbl->matcher)
+		goto at_error;
 	tbl->type = attr->flow_attr.transfer ? MLX5DR_TABLE_TYPE_FDB :
 		    (attr->flow_attr.egress ? MLX5DR_TABLE_TYPE_NIC_TX :
 		    MLX5DR_TABLE_TYPE_NIC_RX);
@@ -2366,7 +2373,6 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	while (i--)
 		__atomic_sub_fetch(&item_templates[i]->refcnt,
 				   1, __ATOMIC_RELAXED);
-	mlx5dr_matcher_destroy(tbl->matcher);
 error:
 	err = rte_errno;
 	if (tbl) {
@@ -2796,6 +2802,154 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static enum mlx5dr_action_type mlx5_hw_dr_action_types[] = {
+	[RTE_FLOW_ACTION_TYPE_MARK] = MLX5DR_ACTION_TYP_TAG,
+	[RTE_FLOW_ACTION_TYPE_DROP] = MLX5DR_ACTION_TYP_DROP,
+	[RTE_FLOW_ACTION_TYPE_JUMP] = MLX5DR_ACTION_TYP_FT,
+	[RTE_FLOW_ACTION_TYPE_QUEUE] = MLX5DR_ACTION_TYP_TIR,
+	[RTE_FLOW_ACTION_TYPE_RSS] = MLX5DR_ACTION_TYP_TIR,
+	[RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP] = MLX5DR_ACTION_TYP_L2_TO_TNL_L2,
+	[RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP] = MLX5DR_ACTION_TYP_L2_TO_TNL_L2,
+	[RTE_FLOW_ACTION_TYPE_VXLAN_DECAP] = MLX5DR_ACTION_TYP_TNL_L2_TO_L2,
+	[RTE_FLOW_ACTION_TYPE_NVGRE_DECAP] = MLX5DR_ACTION_TYP_TNL_L2_TO_L2,
+	[RTE_FLOW_ACTION_TYPE_MODIFY_FIELD] = MLX5DR_ACTION_TYP_MODIFY_HDR,
+	[RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT] = MLX5DR_ACTION_TYP_VPORT,
+	[RTE_FLOW_ACTION_TYPE_COUNT] = MLX5DR_ACTION_TYP_CTR,
+};
+
+static int
+flow_hw_dr_actions_template_handle_shared(const struct rte_flow_action *mask,
+					  unsigned int action_src,
+					  enum mlx5dr_action_type *action_types,
+					  uint16_t *curr_off,
+					  struct rte_flow_actions_template *at)
+{
+	uint32_t act_idx;
+	uint32_t type;
+
+	if (!mask->conf) {
+		DRV_LOG(WARNING, "Unable to determine indirect action type "
+			"without a mask specified");
+		return -EINVAL;
+	}
+	act_idx = (uint32_t)(uintptr_t)mask->conf;
+	type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+	switch (type) {
+	case MLX5_INDIRECT_ACTION_TYPE_RSS:
+		at->actions_off[action_src] = *curr_off;
+		action_types[*curr_off] = MLX5DR_ACTION_TYP_TIR;
+		*curr_off = *curr_off + 1;
+		break;
+	default:
+		DRV_LOG(WARNING, "Unsupported shared action type: %d", type);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+/**
+ * Create DR action template based on a provided sequence of flow actions.
+ *
+ * @param[in] at
+ *   Pointer to flow actions template to be updated.
+ *
+ * @return
+ *   DR action template pointer on success and action offsets in @p at are updated.
+ *   NULL otherwise.
+ */
+static struct mlx5dr_action_template *
+flow_hw_dr_actions_template_create(struct rte_flow_actions_template *at)
+{
+	struct mlx5dr_action_template *dr_template;
+	enum mlx5dr_action_type action_types[MLX5_HW_MAX_ACTS] = { MLX5DR_ACTION_TYP_LAST };
+	unsigned int i;
+	uint16_t curr_off;
+	enum mlx5dr_action_type reformat_act_type = MLX5DR_ACTION_TYP_TNL_L2_TO_L2;
+	uint16_t reformat_off = UINT16_MAX;
+	uint16_t mhdr_off = UINT16_MAX;
+	int ret;
+	for (i = 0, curr_off = 0; at->actions[i].type != RTE_FLOW_ACTION_TYPE_END; ++i) {
+		const struct rte_flow_action_raw_encap *raw_encap_data;
+		size_t data_size;
+		enum mlx5dr_action_type type;
+
+		if (curr_off >= MLX5_HW_MAX_ACTS)
+			goto err_actions_num;
+		switch (at->actions[i].type) {
+		case RTE_FLOW_ACTION_TYPE_VOID:
+			break;
+		case RTE_FLOW_ACTION_TYPE_INDIRECT:
+			ret = flow_hw_dr_actions_template_handle_shared(&at->masks[i], i,
+									action_types,
+									&curr_off, at);
+			if (ret)
+				return NULL;
+			break;
+		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
+		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
+		case RTE_FLOW_ACTION_TYPE_VXLAN_DECAP:
+		case RTE_FLOW_ACTION_TYPE_NVGRE_DECAP:
+			MLX5_ASSERT(reformat_off == UINT16_MAX);
+			reformat_off = curr_off++;
+			reformat_act_type = mlx5_hw_dr_action_types[at->actions[i].type];
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
+			raw_encap_data = at->actions[i].conf;
+			data_size = raw_encap_data->size;
+			if (reformat_off != UINT16_MAX) {
+				reformat_act_type = data_size < MLX5_ENCAPSULATION_DECISION_SIZE ?
+					MLX5DR_ACTION_TYP_TNL_L3_TO_L2 :
+					MLX5DR_ACTION_TYP_L2_TO_TNL_L3;
+			} else {
+				reformat_off = curr_off++;
+				reformat_act_type = MLX5DR_ACTION_TYP_L2_TO_TNL_L2;
+			}
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_DECAP:
+			reformat_off = curr_off++;
+			reformat_act_type = MLX5DR_ACTION_TYP_TNL_L2_TO_L2;
+			break;
+		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
+			if (mhdr_off == UINT16_MAX) {
+				mhdr_off = curr_off++;
+				type = mlx5_hw_dr_action_types[at->actions[i].type];
+				action_types[mhdr_off] = type;
+			}
+			break;
+		case RTE_FLOW_ACTION_TYPE_METER:
+			at->actions_off[i] = curr_off;
+			action_types[curr_off++] = MLX5DR_ACTION_TYP_ASO_METER;
+			if (curr_off >= MLX5_HW_MAX_ACTS)
+				goto err_actions_num;
+			action_types[curr_off++] = MLX5DR_ACTION_TYP_FT;
+			break;
+		default:
+			type = mlx5_hw_dr_action_types[at->actions[i].type];
+			at->actions_off[i] = curr_off;
+			action_types[curr_off++] = type;
+			break;
+		}
+	}
+	if (curr_off >= MLX5_HW_MAX_ACTS)
+		goto err_actions_num;
+	if (mhdr_off != UINT16_MAX)
+		at->mhdr_off = mhdr_off;
+	if (reformat_off != UINT16_MAX) {
+		at->reformat_off = reformat_off;
+		action_types[reformat_off] = reformat_act_type;
+	}
+	dr_template = mlx5dr_action_template_create(action_types);
+	if (dr_template)
+		at->dr_actions_num = curr_off;
+	else
+		DRV_LOG(ERR, "Failed to create DR action template: %d", rte_errno);
+	return dr_template;
+err_actions_num:
+	DRV_LOG(ERR, "Number of HW actions (%u) exceeded maximum (%u) allowed in template",
+		curr_off, MLX5_HW_MAX_ACTS);
+	return NULL;
+}
+
 /**
  * Create flow action template.
  *
@@ -2821,7 +2975,8 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 			struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	int len, act_len, mask_len, i;
+	int len, act_num, act_len, mask_len;
+	unsigned int i;
 	struct rte_flow_actions_template *at = NULL;
 	uint16_t pos = MLX5_HW_MAX_ACTS;
 	struct rte_flow_action tmp_action[MLX5_HW_MAX_ACTS];
@@ -2891,6 +3046,11 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 	if (mask_len <= 0)
 		return NULL;
 	len += RTE_ALIGN(mask_len, 16);
+	/* Count flow actions to allocate required space for storing DR offsets. */
+	act_num = 0;
+	for (i = 0; ra[i].type != RTE_FLOW_ACTION_TYPE_END; ++i)
+		act_num++;
+	len += RTE_ALIGN(act_num * sizeof(*at->actions_off), 16);
 	at = mlx5_malloc(MLX5_MEM_ZERO, len + sizeof(*at),
 			 RTE_CACHE_LINE_SIZE, rte_socket_id());
 	if (!at) {
@@ -2900,19 +3060,26 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 				   "cannot allocate action template");
 		return NULL;
 	}
-	/* Actions part is in the first half. */
+	/* Actions part is in the first part. */
 	at->attr = *attr;
 	at->actions = (struct rte_flow_action *)(at + 1);
 	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, at->actions,
 				len, ra, error);
 	if (act_len <= 0)
 		goto error;
-	/* Masks part is in the second half. */
+	/* Masks part is in the second part. */
 	at->masks = (struct rte_flow_action *)(((uint8_t *)at->actions) + act_len);
 	mask_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, at->masks,
 				 len - act_len, rm, error);
 	if (mask_len <= 0)
 		goto error;
+	/* DR actions offsets in the third part. */
+	at->actions_off = (uint16_t *)((uint8_t *)at->masks + mask_len);
+	at->actions_num = act_num;
+	for (i = 0; i < at->actions_num; ++i)
+		at->actions_off[i] = UINT16_MAX;
+	at->reformat_off = UINT16_MAX;
+	at->mhdr_off = UINT16_MAX;
 	at->rx_cpy_pos = pos;
 	/*
 	 * mlx5 PMD hacks indirect action index directly to the action conf.
@@ -2926,12 +3093,18 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 			at->masks[i].conf = masks->conf;
 		}
 	}
+	at->tmpl = flow_hw_dr_actions_template_create(at);
+	if (!at->tmpl)
+		goto error;
 	__atomic_fetch_add(&at->refcnt, 1, __ATOMIC_RELAXED);
 	LIST_INSERT_HEAD(&priv->flow_hw_at, at, next);
 	return at;
 error:
-	if (at)
+	if (at) {
+		if (at->tmpl)
+			mlx5dr_action_template_destroy(at->tmpl);
 		mlx5_free(at);
+	}
 	return NULL;
 }
 
@@ -2962,6 +3135,8 @@ flow_hw_actions_template_destroy(struct rte_eth_dev *dev __rte_unused,
 				   "action template in using");
 	}
 	LIST_REMOVE(template, next);
+	if (template->tmpl)
+		mlx5dr_action_template_destroy(template->tmpl);
 	mlx5_free(template);
 	return 0;
 }
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH 14/27] net/mlx5: fix indirect action validate
  2022-09-23 14:43 [PATCH 00/27] net/mlx5: HW steering PMD update Suanming Mou
                   ` (12 preceding siblings ...)
  2022-09-23 14:43 ` [PATCH 13/27] net/mlx5: support DR action template API Suanming Mou
@ 2022-09-23 14:43 ` Suanming Mou
  2022-09-23 14:43 ` [PATCH 15/27] net/mlx5: update indirect actions ops to HW variation Suanming Mou
                   ` (17 subsequent siblings)
  31 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-23 14:43 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko, Dariusz Sosnowski; +Cc: dev

For indirect actions, the action mask type indicates the indirect
action type. An action mask conf of NULL means the indirect action
will be provided by the flow action conf.

This commit fixes the indirect action validation accordingly.
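
As an illustration of that convention (a hedged sketch, not code taken from
this patch), an actions template carrying an indirect counter could be
declared as:

#include <rte_flow.h>

/* The action slot uses INDIRECT with a NULL conf, so the actual handle is
 * supplied through the flow action conf later; the mask slot carries the
 * underlying action type (COUNT here) so the PMD can pick the matching DR
 * action type when the template is created. */
static const struct rte_flow_action tmpl_actions[] = {
	{ .type = RTE_FLOW_ACTION_TYPE_INDIRECT, .conf = NULL },
	{ .type = RTE_FLOW_ACTION_TYPE_END },
};
static const struct rte_flow_action tmpl_masks[] = {
	{ .type = RTE_FLOW_ACTION_TYPE_COUNT, .conf = NULL },
	{ .type = RTE_FLOW_ACTION_TYPE_END },
};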

Fixes: 393e0eb555c0 ("net/mlx5: support DR action template API")

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/net/mlx5/mlx5_flow_hw.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 6a1ed7e790..d828d49613 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -2726,7 +2726,8 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 		const struct rte_flow_action *mask = &masks[i];
 
 		MLX5_ASSERT(i < MLX5_HW_MAX_ACTS);
-		if (action->type != mask->type)
+		if (action->type != RTE_FLOW_ACTION_TYPE_INDIRECT &&
+		    action->type != mask->type)
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ACTION,
 						  action,
@@ -2824,22 +2825,25 @@ flow_hw_dr_actions_template_handle_shared(const struct rte_flow_action *mask,
 					  uint16_t *curr_off,
 					  struct rte_flow_actions_template *at)
 {
-	uint32_t act_idx;
 	uint32_t type;
 
-	if (!mask->conf) {
+	if (!mask) {
 		DRV_LOG(WARNING, "Unable to determine indirect action type "
 			"without a mask specified");
 		return -EINVAL;
 	}
-	act_idx = (uint32_t)(uintptr_t)mask->conf;
-	type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+	type = mask->type;
 	switch (type) {
-	case MLX5_INDIRECT_ACTION_TYPE_RSS:
+	case RTE_FLOW_ACTION_TYPE_RSS:
 		at->actions_off[action_src] = *curr_off;
 		action_types[*curr_off] = MLX5DR_ACTION_TYP_TIR;
 		*curr_off = *curr_off + 1;
 		break;
+	case RTE_FLOW_ACTION_TYPE_COUNT:
+		at->actions_off[action_src] = *curr_off;
+		action_types[*curr_off] = MLX5DR_ACTION_TYP_CTR;
+		*curr_off = *curr_off + 1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type: %d", type);
 		return -EINVAL;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH 15/27] net/mlx5: update indirect actions ops to HW variation
  2022-09-23 14:43 [PATCH 00/27] net/mlx5: HW steering PMD update Suanming Mou
                   ` (13 preceding siblings ...)
  2022-09-23 14:43 ` [PATCH 14/27] net/mlx5: fix indirect action validate Suanming Mou
@ 2022-09-23 14:43 ` Suanming Mou
  2022-09-23 14:43 ` [PATCH 16/27] net/mlx5: support indirect count action for HW steering Suanming Mou
                   ` (16 subsequent siblings)
  31 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-23 14:43 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, Xiaoyu Min

From: Xiaoyu Min <jackmin@nvidia.com>

Each flow engine should have its own callback functions for each
flow's ops.

Create new callback functions for the indirect actions' ops, which are
actually wrappers of their mlx5_hw_async_* counterparts.

Signed-off-by: Xiaoyu Min <jackmin@nvidia.com>
---
 drivers/net/mlx5/mlx5_flow_hw.c | 98 +++++++++++++++++++++++++++++++--
 1 file changed, 94 insertions(+), 4 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index d828d49613..de82396a04 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -4988,6 +4988,96 @@ flow_hw_query(struct rte_eth_dev *dev,
 	return ret;
 }
 
+/**
+ * Create indirect action.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] conf
+ *   Shared action configuration.
+ * @param[in] action
+ *   Action specification used to create indirect action.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. Initialized in case of
+ *   error only.
+ *
+ * @return
+ *   A valid shared action handle in case of success, NULL otherwise and
+ *   rte_errno is set.
+ */
+static struct rte_flow_action_handle *
+flow_hw_action_create(struct rte_eth_dev *dev,
+		       const struct rte_flow_indir_action_conf *conf,
+		       const struct rte_flow_action *action,
+		       struct rte_flow_error *err)
+{
+	return flow_hw_action_handle_create(dev, UINT32_MAX, NULL, conf, action,
+					    NULL, err);
+}
+
+/**
+ * Destroy the indirect action.
+ * Release action related resources on the NIC and the memory.
+ * Lock free, (mutex should be acquired by caller).
+ * Dispatcher for action type specific call.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] handle
+ *   The indirect action object handle to be removed.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. Initialized in case of
+ *   error only.
+ *
+ * @return
+ *   0 on success, otherwise negative errno value.
+ */
+static int
+flow_hw_action_destroy(struct rte_eth_dev *dev,
+		       struct rte_flow_action_handle *handle,
+		       struct rte_flow_error *error)
+{
+	return flow_hw_action_handle_destroy(dev, UINT32_MAX, NULL, handle,
+			NULL, error);
+}
+
+/**
+ * Updates in place shared action configuration.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] handle
+ *   The indirect action object handle to be updated.
+ * @param[in] update
+ *   Action specification used to modify the action pointed by *handle*.
+ *   *update* could be of same type with the action pointed by the *handle*
+ *   handle argument, or some other structures like a wrapper, depending on
+ *   the indirect action type.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. Initialized in case of
+ *   error only.
+ *
+ * @return
+ *   0 on success, otherwise negative errno value.
+ */
+static int
+flow_hw_action_update(struct rte_eth_dev *dev,
+		      struct rte_flow_action_handle *handle,
+		      const void *update,
+		      struct rte_flow_error *err)
+{
+	return flow_hw_action_handle_update(dev, UINT32_MAX, NULL, handle,
+			update, NULL, err);
+}
+
+static int
+flow_hw_action_query(struct rte_eth_dev *dev,
+		     const struct rte_flow_action_handle *handle, void *data,
+		     struct rte_flow_error *error)
+{
+	return flow_dv_action_query(dev, handle, data, error);
+}
+
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.info_get = flow_hw_info_get,
 	.configure = flow_hw_configure,
@@ -5007,10 +5097,10 @@ const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.async_action_destroy = flow_hw_action_handle_destroy,
 	.async_action_update = flow_hw_action_handle_update,
 	.action_validate = flow_dv_action_validate,
-	.action_create = flow_dv_action_create,
-	.action_destroy = flow_dv_action_destroy,
-	.action_update = flow_dv_action_update,
-	.action_query = flow_dv_action_query,
+	.action_create = flow_hw_action_create,
+	.action_destroy = flow_hw_action_destroy,
+	.action_update = flow_hw_action_update,
+	.action_query = flow_hw_action_query,
 	.query = flow_hw_query,
 };
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH 16/27] net/mlx5: support indirect count action for HW steering
  2022-09-23 14:43 [PATCH 00/27] net/mlx5: HW steering PMD update Suanming Mou
                   ` (14 preceding siblings ...)
  2022-09-23 14:43 ` [PATCH 15/27] net/mlx5: update indirect actions ops to HW variation Suanming Mou
@ 2022-09-23 14:43 ` Suanming Mou
  2022-09-23 14:43 ` [PATCH 17/27] net/mlx5: add pattern and table attribute validation Suanming Mou
                   ` (15 subsequent siblings)
  31 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-23 14:43 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, Xiaoyu Min

From: Xiaoyu Min <jackmin@nvidia.com>

The indirect counter action is taken as a _shared_ counter between
the flows that use it.

This _shared_ counter is taken from the counter pool when the indirect
action is created, and is put back to the counter pool when the indirect
action is destroyed.
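
From the application side the lifecycle looks roughly as follows (a minimal
sketch; error handling is trimmed and the port is assumed to be configured
for HW steering):

#include <rte_flow.h>

/* Create a shared counter: the returned handle encodes the counter taken
 * from the pool, and any number of flows may reference it. */
static struct rte_flow_action_handle *
shared_counter_create(uint16_t port_id, struct rte_flow_error *err)
{
	const struct rte_flow_indir_action_conf conf = { .ingress = 1 };
	const struct rte_flow_action action = {
		.type = RTE_FLOW_ACTION_TYPE_COUNT,
	};

	return rte_flow_action_handle_create(port_id, &conf, &action, err);
}

/* Destroying the handle puts the counter back into the pool. */
static int
shared_counter_destroy(uint16_t port_id, struct rte_flow_action_handle *handle,
		       struct rte_flow_error *err)
{
	return rte_flow_action_handle_destroy(port_id, handle, err);
}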

Signed-off-by: Xiaoyu Min <jackmin@nvidia.com>
---
 drivers/net/mlx5/mlx5_flow.h    |   3 +
 drivers/net/mlx5/mlx5_flow_hw.c | 104 +++++++++++++++++++++++++++++++-
 2 files changed, 104 insertions(+), 3 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index c982cb953a..a39dacc60a 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1148,6 +1148,9 @@ struct mlx5_action_construct_data {
 			uint32_t level; /* RSS level. */
 			uint32_t idx; /* Shared action index. */
 		} shared_rss;
+		struct {
+			uint32_t id;
+		} shared_counter;
 	};
 };
 
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index de82396a04..92b61b63d1 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -536,6 +536,44 @@ __flow_hw_act_data_shared_rss_append(struct mlx5_priv *priv,
 	return 0;
 }
 
+/**
+ * Append shared counter action to the dynamic action list.
+ *
+ * @param[in] priv
+ *   Pointer to the port private data structure.
+ * @param[in] acts
+ *   Pointer to the template HW steering DR actions.
+ * @param[in] type
+ *   Action type.
+ * @param[in] action_src
+ *   Offset of source rte flow action.
+ * @param[in] action_dst
+ *   Offset of destination DR action.
+ * @param[in] cnt_id
+ *   Shared counter id.
+ *
+ * @return
+ *    0 on success, negative value otherwise and rte_errno is set.
+ */
+static __rte_always_inline int
+__flow_hw_act_data_shared_cnt_append(struct mlx5_priv *priv,
+				     struct mlx5_hw_actions *acts,
+				     enum rte_flow_action_type type,
+				     uint16_t action_src,
+				     uint16_t action_dst,
+				     cnt_id_t cnt_id)
+{	struct mlx5_action_construct_data *act_data;
+
+	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
+	if (!act_data)
+		return -1;
+	act_data->type = type;
+	act_data->shared_counter.id = cnt_id;
+	LIST_INSERT_HEAD(&acts->act_list, act_data, next);
+	return 0;
+}
+
+
 /**
  * Translate shared indirect action.
  *
@@ -577,6 +615,13 @@ flow_hw_shared_action_translate(struct rte_eth_dev *dev,
 		    action_src, action_dst, idx, shared_rss))
 			return -1;
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
+		if (__flow_hw_act_data_shared_cnt_append(priv, acts,
+			(enum rte_flow_action_type)
+			MLX5_RTE_FLOW_ACTION_TYPE_COUNT,
+			action_src, action_dst, act_idx))
+			return -1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type:%d", type);
 		break;
@@ -1454,6 +1499,13 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev,
 				(dev, &act_data, item_flags, rule_act))
 			return -1;
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
+		if (mlx5_hws_cnt_pool_get_action_offset(priv->hws_cpool,
+				act_idx,
+				&rule_act->action,
+				&rule_act->counter.offset))
+			return -1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type:%d", type);
 		break;
@@ -1761,6 +1813,17 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 				return ret;
 			job->flow->cnt_id = cnt_id;
 			break;
+		case MLX5_RTE_FLOW_ACTION_TYPE_COUNT:
+			ret = mlx5_hws_cnt_pool_get_action_offset
+				(priv->hws_cpool,
+				 act_data->shared_counter.id,
+				 &rule_acts[act_data->action_dst].action,
+				 &rule_acts[act_data->action_dst].counter.offset
+				 );
+			if (ret != 0)
+				return ret;
+			job->flow->cnt_id = act_data->shared_counter.id;
+			break;
 		default:
 			break;
 		}
@@ -4860,10 +4923,28 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 			     void *user_data,
 			     struct rte_flow_error *error)
 {
+	struct rte_flow_action_handle *handle = NULL;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	cnt_id_t cnt_id;
+
 	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
 	RTE_SET_USED(user_data);
-	return flow_dv_action_create(dev, conf, action, error);
+	switch (action->type) {
+	case RTE_FLOW_ACTION_TYPE_COUNT:
+		if (mlx5_hws_cnt_shared_get(priv->hws_cpool, &cnt_id))
+			rte_flow_error_set(error, ENODEV,
+					RTE_FLOW_ERROR_TYPE_ACTION,
+					NULL,
+					"counter are not configured!");
+		else
+			handle = (struct rte_flow_action_handle *)
+				 (uintptr_t)cnt_id;
+		break;
+	default:
+		handle = flow_dv_action_create(dev, conf, action, error);
+	}
+	return handle;
 }
 
 /**
@@ -4927,10 +5008,19 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 			      void *user_data,
 			      struct rte_flow_error *error)
 {
+	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
+	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+	struct mlx5_priv *priv = dev->data->dev_private;
+
 	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
 	RTE_SET_USED(user_data);
-	return flow_dv_action_destroy(dev, handle, error);
+	switch (type) {
+	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
+		return mlx5_hws_cnt_shared_put(priv->hws_cpool, &act_idx);
+	default:
+		return flow_dv_action_destroy(dev, handle, error);
+	}
 }
 
 static int
@@ -5075,7 +5165,15 @@ flow_hw_action_query(struct rte_eth_dev *dev,
 		     const struct rte_flow_action_handle *handle, void *data,
 		     struct rte_flow_error *error)
 {
-	return flow_dv_action_query(dev, handle, data, error);
+	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
+	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+
+	switch (type) {
+	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
+		return flow_hw_query_counter(dev, act_idx, data, error);
+	default:
+		return flow_dv_action_query(dev, handle, data, error);
+	}
 }
 
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH 17/27] net/mlx5: add pattern and table attribute validation
  2022-09-23 14:43 [PATCH 00/27] net/mlx5: HW steering PMD update Suanming Mou
                   ` (15 preceding siblings ...)
  2022-09-23 14:43 ` [PATCH 16/27] net/mlx5: support indirect count action for HW steering Suanming Mou
@ 2022-09-23 14:43 ` Suanming Mou
  2022-09-23 14:43 ` [PATCH 18/27] net/mlx5: add meta item support in egress Suanming Mou
                   ` (14 subsequent siblings)
  31 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-23 14:43 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, Dariusz Sosnowski

From: Dariusz Sosnowski <dsosnowski@nvidia.com>

This patch adds validation of direction attributes of pattern templates
and template tables.

In case of pattern templates the following configurations are allowed
(noting whether a given configuration requires the addition of implicit
pattern items):

1. If E-Switch is enabled (i.e. dv_esw_en devarg is set to 1):
    1. If a port is a VF/SF representor:
        1. Ingress only - implicit pattern items are added.
        2. Egress only - implicit pattern items are added.
    2. If a port is a transfer proxy port (E-Switch Manager/PF
       representor):
        1. Ingress, egress and transfer - no implicit items are added.
           This setting is useful for applications which need to
           receive traffic from devices connected to the E-Switch
           that did not hit any transfer flow rules.
        2. Ingress only - implicit pattern items are added.
        3. Egress only - implicit pattern items are added.
        4. Transfer only - no implicit pattern items are added.
2. If E-Switch is disabled (i.e. dv_esw_en devarg is set to 0):
    1. Ingress only - no implicit pattern items are added.
    2. Egress only - no implicit pattern items are added.
    3. Ingress and egress - no implicit pattern items are added.
    4. Transfer is not allowed.

In case of template tables, the table attributes must be consistent
with the attributes of the associated pattern templates.
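
The direction rules above can be condensed into the following standalone
sketch; it only restates the logic added to flow_hw_pattern_validate()
below and is not the PMD code itself.

#include <stdbool.h>

struct dir_attr {
	bool ingress;
	bool egress;
	bool transfer;
};

/* Return true when the pattern template direction attributes are allowed. */
static bool
pattern_dirs_valid(bool esw_en, bool is_transfer_proxy, struct dir_attr a)
{
	int set = a.ingress + a.egress + a.transfer;

	if (set == 0)
		return false;                /* at least one direction required */
	if (!esw_en)
		return !a.transfer;          /* transfer needs E-Switch enabled */
	if (is_transfer_proxy)
		return set == 1 || set == 3; /* one direction or all three */
	/* VF/SF representor: no transfer and no ingress+egress mix. */
	return !a.transfer && set == 1;
}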

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 drivers/net/mlx5/mlx5_flow_hw.c | 80 +++++++++++++++++++++++++--------
 1 file changed, 62 insertions(+), 18 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 92b61b63d1..dfbc434d54 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -2379,6 +2379,13 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	for (i = 0; i < nb_item_templates; i++) {
 		uint32_t ret;
 
+		if ((flow_attr.ingress && !item_templates[i]->attr.ingress) ||
+		    (flow_attr.egress && !item_templates[i]->attr.egress) ||
+		    (flow_attr.transfer && !item_templates[i]->attr.transfer)) {
+			DRV_LOG(ERR, "pattern template and template table attribute mismatch");
+			rte_errno = EINVAL;
+			goto it_error;
+		}
 		ret = __atomic_add_fetch(&item_templates[i]->refcnt, 1,
 					 __ATOMIC_RELAXED);
 		if (ret <= 1) {
@@ -2557,6 +2564,7 @@ flow_hw_template_table_create(struct rte_eth_dev *dev,
 			      uint8_t nb_action_templates,
 			      struct rte_flow_error *error)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_flow_template_table_cfg cfg = {
 		.attr = *attr,
 		.external = true,
@@ -2565,6 +2573,12 @@ flow_hw_template_table_create(struct rte_eth_dev *dev,
 
 	if (flow_hw_translate_group(dev, &cfg, group, &cfg.attr.flow_attr.group, error))
 		return NULL;
+	if (priv->sh->config.dv_esw_en && cfg.attr.flow_attr.egress) {
+		rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ATTR, NULL,
+				  "egress flows are not supported with HW Steering"
+				  " when E-Switch is enabled");
+		return NULL;
+	}
 	return flow_hw_table_create(dev, &cfg, item_templates, nb_item_templates,
 				    action_templates, nb_action_templates, error);
 }
@@ -3254,11 +3268,48 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 			 const struct rte_flow_item items[],
 			 struct rte_flow_error *error)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
 	int i;
 	bool items_end = false;
-	RTE_SET_USED(dev);
-	RTE_SET_USED(attr);
 
+	if (!attr->ingress && !attr->egress && !attr->transfer)
+		return rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ATTR, NULL,
+					  "at least one of the direction attributes"
+					  " must be specified");
+	if (priv->sh->config.dv_esw_en) {
+		MLX5_ASSERT(priv->master || priv->representor);
+		if (priv->master) {
+			/*
+			 * It is allowed to specify ingress, egress and transfer attributes
+			 * at the same time, in order to construct flows catching all missed
+			 * FDB traffic and forwarding it to the master port.
+			 */
+			if (!(attr->ingress ^ attr->egress ^ attr->transfer))
+				return rte_flow_error_set(error, EINVAL,
+							  RTE_FLOW_ERROR_TYPE_ATTR, NULL,
+							  "only one or all direction attributes"
+							  " at once can be used on transfer proxy"
+							  " port");
+		} else {
+			if (attr->transfer)
+				return rte_flow_error_set(error, EINVAL,
+							  RTE_FLOW_ERROR_TYPE_ATTR_TRANSFER, NULL,
+							  "transfer attribute cannot be used with"
+							  " port representors");
+			if (attr->ingress && attr->egress)
+				return rte_flow_error_set(error, EINVAL,
+							  RTE_FLOW_ERROR_TYPE_ATTR, NULL,
+							  "ingress and egress direction attributes"
+							  " cannot be used at the same time on"
+							  " port representors");
+		}
+	} else {
+		if (attr->transfer)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ATTR_TRANSFER, NULL,
+						  "transfer attribute cannot be used when"
+						  " E-Switch is disabled");
+	}
 	for (i = 0; !items_end; i++) {
 		int type = items[i].type;
 
@@ -3289,7 +3340,15 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 							  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 							  NULL,
 							  "Unsupported internal tag index");
+			break;
 		}
+		case RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT:
+			if (attr->ingress || attr->egress)
+				return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ITEM, NULL,
+						  "represented port item cannot be used"
+						  " when transfer attribute is set");
+			break;
 		case RTE_FLOW_ITEM_TYPE_VOID:
 		case RTE_FLOW_ITEM_TYPE_ETH:
 		case RTE_FLOW_ITEM_TYPE_VLAN:
@@ -3299,7 +3358,6 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ITEM_TYPE_TCP:
 		case RTE_FLOW_ITEM_TYPE_GTP:
 		case RTE_FLOW_ITEM_TYPE_GTP_PSC:
-		case RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT:
 		case RTE_FLOW_ITEM_TYPE_VXLAN:
 		case MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE:
 		case RTE_FLOW_ITEM_TYPE_META:
@@ -3350,21 +3408,7 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 
 	if (flow_hw_pattern_validate(dev, attr, items, error))
 		return NULL;
-	if (priv->sh->config.dv_esw_en && attr->ingress) {
-		/*
-		 * Disallow pattern template with ingress and egress/transfer
-		 * attributes in order to forbid implicit port matching
-		 * on egress and transfer traffic.
-		 */
-		if (attr->egress || attr->transfer) {
-			rte_flow_error_set(error, EINVAL,
-					   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					   NULL,
-					   "item template for ingress traffic"
-					   " cannot be used for egress/transfer"
-					   " traffic when E-Switch is enabled");
-			return NULL;
-		}
+	if (priv->sh->config.dv_esw_en && attr->ingress && !attr->egress && !attr->transfer) {
 		copied_items = flow_hw_copy_prepend_port_item(items, error);
 		if (!copied_items)
 			return NULL;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH 18/27] net/mlx5: add meta item support in egress
  2022-09-23 14:43 [PATCH 00/27] net/mlx5: HW steering PMD update Suanming Mou
                   ` (16 preceding siblings ...)
  2022-09-23 14:43 ` [PATCH 17/27] net/mlx5: add pattern and table attribute validation Suanming Mou
@ 2022-09-23 14:43 ` Suanming Mou
  2022-09-23 14:43 ` [PATCH 19/27] net/mlx5: add support for ASO return register Suanming Mou
                   ` (13 subsequent siblings)
  31 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-23 14:43 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, Dariusz Sosnowski

From: Dariusz Sosnowski <dsosnowski@nvidia.com>

This patch adds support for META item in HW Steering mode, in NIC TX
domain.

Due to API limitations, usage of the META item requires that all mlx5
ports use the same configuration of the dv_esw_en and dv_xmeta_en device
arguments in order to consistently translate the META item to the
appropriate register. If mlx5 ports use different configurations, the
configuration of the first probed device is used.
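
A simplified sketch of the resulting register selection (mirroring the
change to flow_hw_get_reg_id() below; the register names are reduced to
an enum here for illustration only):

#include <stdbool.h>

enum meta_reg { META_REG_A, META_REG_C_1 };

static enum meta_reg
meta_item_register(bool esw_en, bool xmeta_is_meta32_hws)
{
	/* FDB with 32-bit extended metadata: META maps to REG_C_1. */
	if (esw_en && xmeta_is_meta32_hws)
		return META_REG_C_1;
	/*
	 * Otherwise REG_A is used, which is only valid for NIC TX;
	 * ingress usage is rejected at pattern template validation.
	 */
	return META_REG_A;
}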

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_os.c |  1 +
 drivers/net/mlx5/mlx5.c          |  4 ++-
 drivers/net/mlx5/mlx5_flow.h     | 22 +++++++++++-
 drivers/net/mlx5/mlx5_flow_hw.c  | 61 ++++++++++++++++++++++++++++++--
 4 files changed, 84 insertions(+), 4 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 41940d7ce7..54e7164663 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1563,6 +1563,7 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 		}
 		/* Only HWS requires this information. */
 		flow_hw_init_tags_set(eth_dev);
+		flow_hw_init_flow_metadata_config(eth_dev);
 		if (priv->sh->config.dv_esw_en &&
 		    flow_hw_create_vport_action(eth_dev)) {
 			DRV_LOG(ERR, "port %u failed to create vport action",
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 314176022a..87cbcd473d 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1970,8 +1970,10 @@ mlx5_dev_close(struct rte_eth_dev *dev)
 	flow_hw_resource_release(dev);
 #endif
 	flow_hw_clear_port_info(dev);
-	if (priv->sh->config.dv_flow_en == 2)
+	if (priv->sh->config.dv_flow_en == 2) {
+		flow_hw_clear_flow_metadata_config();
 		flow_hw_clear_tags_set(dev);
+	}
 	if (priv->rxq_privs != NULL) {
 		/* XXX race condition if mlx5_rx_burst() is still running. */
 		rte_delay_us_sleep(1000);
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index a39dacc60a..dae2fe6b37 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1485,6 +1485,13 @@ flow_hw_get_wire_port(struct ibv_context *ibctx)
 	return NULL;
 }
 
+extern uint32_t mlx5_flow_hw_flow_metadata_config_refcnt;
+extern uint8_t mlx5_flow_hw_flow_metadata_esw_en;
+extern uint8_t mlx5_flow_hw_flow_metadata_xmeta_en;
+
+void flow_hw_init_flow_metadata_config(struct rte_eth_dev *dev);
+void flow_hw_clear_flow_metadata_config(void);
+
 /*
  * Convert metadata or tag to the actual register.
  * META: Can only be used to match in the FDB in this stage, fixed C_1.
@@ -1496,7 +1503,20 @@ flow_hw_get_reg_id(enum rte_flow_item_type type, uint32_t id)
 {
 	switch (type) {
 	case RTE_FLOW_ITEM_TYPE_META:
-		return REG_C_1;
+		if (mlx5_flow_hw_flow_metadata_esw_en &&
+		    mlx5_flow_hw_flow_metadata_xmeta_en == MLX5_XMETA_MODE_META32_HWS) {
+			return REG_C_1;
+		}
+		/*
+		 * On root table - PMD allows only egress META matching, thus
+		 * REG_A matching is sufficient.
+		 *
+		 * On non-root tables - REG_A corresponds to general_purpose_lookup_field,
+		 * which translates to REG_A in NIC TX and to REG_B in NIC RX.
+		 * However, current FW does not implement REG_B case right now, so
+		 * REG_B case should be rejected on pattern template validation.
+		 */
+		return REG_A;
 	case RTE_FLOW_ITEM_TYPE_TAG:
 		MLX5_ASSERT(id < MLX5_FLOW_HW_TAGS_MAX);
 		return mlx5_flow_hw_avl_tags[id];
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index dfbc434d54..55a14d39eb 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -3332,7 +3332,6 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 		{
 			const struct rte_flow_item_tag *tag =
 				(const struct rte_flow_item_tag *)items[i].spec;
-			struct mlx5_priv *priv = dev->data->dev_private;
 			uint8_t regcs = (uint8_t)priv->sh->cdev->config.hca_attr.set_reg_c;
 
 			if (!((1 << (tag->index - REG_C_0)) & regcs))
@@ -3349,6 +3348,17 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 						  "represented port item cannot be used"
 						  " when transfer attribute is set");
 			break;
+		case RTE_FLOW_ITEM_TYPE_META:
+			if (!priv->sh->config.dv_esw_en ||
+			    priv->sh->config.dv_xmeta_en != MLX5_XMETA_MODE_META32_HWS) {
+				if (attr->ingress)
+					return rte_flow_error_set(error, EINVAL,
+								  RTE_FLOW_ERROR_TYPE_ITEM, NULL,
+								  "META item is not supported"
+								  " on current FW with ingress"
+								  " attribute");
+			}
+			break;
 		case RTE_FLOW_ITEM_TYPE_VOID:
 		case RTE_FLOW_ITEM_TYPE_ETH:
 		case RTE_FLOW_ITEM_TYPE_VLAN:
@@ -3360,7 +3370,6 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ITEM_TYPE_GTP_PSC:
 		case RTE_FLOW_ITEM_TYPE_VXLAN:
 		case MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE:
-		case RTE_FLOW_ITEM_TYPE_META:
 		case RTE_FLOW_ITEM_TYPE_GRE:
 		case RTE_FLOW_ITEM_TYPE_GRE_KEY:
 		case RTE_FLOW_ITEM_TYPE_GRE_OPTION:
@@ -4938,6 +4947,54 @@ void flow_hw_clear_tags_set(struct rte_eth_dev *dev)
 		       sizeof(enum modify_reg) * MLX5_FLOW_HW_TAGS_MAX);
 }
 
+uint32_t mlx5_flow_hw_flow_metadata_config_refcnt;
+uint8_t mlx5_flow_hw_flow_metadata_esw_en;
+uint8_t mlx5_flow_hw_flow_metadata_xmeta_en;
+
+/**
+ * Initializes static configuration of META flow items.
+ *
+ * As a temporary workaround, META flow item is translated to a register,
+ * based on statically saved dv_esw_en and dv_xmeta_en device arguments.
+ * It is a workaround for flow_hw_get_reg_id() where port specific information
+ * is not available at runtime.
+ *
+ * Values of dv_esw_en and dv_xmeta_en device arguments are taken from the first opened port.
+ * This means that each mlx5 port will use the same configuration for translation
+ * of META flow items.
+ *
+ * @param[in] dev
+ *    Pointer to Ethernet device.
+ */
+void
+flow_hw_init_flow_metadata_config(struct rte_eth_dev *dev)
+{
+	uint32_t refcnt;
+
+	refcnt = __atomic_fetch_add(&mlx5_flow_hw_flow_metadata_config_refcnt, 1,
+				    __ATOMIC_RELAXED);
+	if (refcnt > 0)
+		return;
+	mlx5_flow_hw_flow_metadata_esw_en = MLX5_SH(dev)->config.dv_esw_en;
+	mlx5_flow_hw_flow_metadata_xmeta_en = MLX5_SH(dev)->config.dv_xmeta_en;
+}
+
+/**
+ * Clears statically stored configuration related to META flow items.
+ */
+void
+flow_hw_clear_flow_metadata_config(void)
+{
+	uint32_t refcnt;
+
+	refcnt = __atomic_sub_fetch(&mlx5_flow_hw_flow_metadata_config_refcnt, 1,
+				    __ATOMIC_RELAXED);
+	if (refcnt > 0)
+		return;
+	mlx5_flow_hw_flow_metadata_esw_en = 0;
+	mlx5_flow_hw_flow_metadata_xmeta_en = 0;
+}
+
 /**
  * Create shared action.
  *
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH 19/27] net/mlx5: add support for ASO return register
  2022-09-23 14:43 [PATCH 00/27] net/mlx5: HW steering PMD update Suanming Mou
                   ` (17 preceding siblings ...)
  2022-09-23 14:43 ` [PATCH 18/27] net/mlx5: add meta item support in egress Suanming Mou
@ 2022-09-23 14:43 ` Suanming Mou
  2022-09-23 14:43 ` [PATCH 20/27] lib/ethdev: add connection tracking configuration Suanming Mou
                   ` (12 subsequent siblings)
  31 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-23 14:43 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, Bing Zhao

From: Bing Zhao <bingz@nvidia.com>

A REG_C_x metadata register is needed to store the result after an
ASO action. Like in the SWS, the meter color register is used for
all the ASO actions right now and this register was already filtered
out from the available tags.

For now, it is assumed that all the devices inside one application
are using the same meter color register.

In the next stage, the available tags and other metadata registers
allocation will be stored per device.

Signed-off-by: Bing Zhao <bingz@nvidia.com>
---
 drivers/net/mlx5/mlx5_flow.c    | 1 +
 drivers/net/mlx5/mlx5_flow.h    | 3 +++
 drivers/net/mlx5/mlx5_flow_hw.c | 2 ++
 3 files changed, 6 insertions(+)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 658cc69750..cbf9c31984 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -49,6 +49,7 @@ struct flow_hw_port_info mlx5_flow_hw_port_infos[RTE_MAX_ETHPORTS];
  */
 uint32_t mlx5_flow_hw_avl_tags_init_cnt;
 enum modify_reg mlx5_flow_hw_avl_tags[MLX5_FLOW_HW_TAGS_MAX] = {REG_NON};
+enum modify_reg mlx5_flow_hw_aso_tag;
 
 struct tunnel_default_miss_ctx {
 	uint16_t *queue;
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index dae2fe6b37..a6bd002dca 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1445,6 +1445,7 @@ extern struct flow_hw_port_info mlx5_flow_hw_port_infos[RTE_MAX_ETHPORTS];
 #define MLX5_FLOW_HW_TAGS_MAX 8
 extern uint32_t mlx5_flow_hw_avl_tags_init_cnt;
 extern enum modify_reg mlx5_flow_hw_avl_tags[];
+extern enum modify_reg mlx5_flow_hw_aso_tag;
 
 /*
  * Get metadata match tag and mask for given rte_eth_dev port.
@@ -1517,6 +1518,8 @@ flow_hw_get_reg_id(enum rte_flow_item_type type, uint32_t id)
 		 * REG_B case should be rejected on pattern template validation.
 		 */
 		return REG_A;
+	case RTE_FLOW_ITEM_TYPE_CONNTRACK:
+		return mlx5_flow_hw_aso_tag;
 	case RTE_FLOW_ITEM_TYPE_TAG:
 		MLX5_ASSERT(id < MLX5_FLOW_HW_TAGS_MAX);
 		return mlx5_flow_hw_avl_tags[id];
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 55a14d39eb..b9d4402aed 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -4903,6 +4903,7 @@ void flow_hw_init_tags_set(struct rte_eth_dev *dev)
 		unset |= 1 << (REG_C_1 - REG_C_0);
 	masks &= ~unset;
 	if (mlx5_flow_hw_avl_tags_init_cnt) {
+		MLX5_ASSERT(mlx5_flow_hw_aso_tag == priv->mtr_color_reg);
 		for (i = 0; i < MLX5_FLOW_HW_TAGS_MAX; i++) {
 			if (mlx5_flow_hw_avl_tags[i] != REG_NON && !!((1 << i) & masks)) {
 				copy[mlx5_flow_hw_avl_tags[i] - REG_C_0] =
@@ -4925,6 +4926,7 @@ void flow_hw_init_tags_set(struct rte_eth_dev *dev)
 		}
 	}
 	priv->sh->hws_tags = 1;
+	mlx5_flow_hw_aso_tag = (enum modify_reg)priv->mtr_color_reg;
 	mlx5_flow_hw_avl_tags_init_cnt++;
 }
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH 20/27] lib/ethdev: add connection tracking configuration
  2022-09-23 14:43 [PATCH 00/27] net/mlx5: HW steering PMD update Suanming Mou
                   ` (18 preceding siblings ...)
  2022-09-23 14:43 ` [PATCH 19/27] net/mlx5: add support for ASO return register Suanming Mou
@ 2022-09-23 14:43 ` Suanming Mou
  2022-09-23 14:43 ` [PATCH 21/27] net/mlx5: add HW steering connection tracking support Suanming Mou
                   ` (11 subsequent siblings)
  31 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-23 14:43 UTC (permalink / raw)
  To: Ori Kam, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko; +Cc: dev

This commit adds the maximum connection tracking number configuration
for the async flow engine.
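
A hypothetical configuration sketch using the new field (the queue count
and sizes are illustrative only):

#include <rte_flow.h>

static int
configure_ct_capacity(uint16_t port_id, struct rte_flow_error *err)
{
	const struct rte_flow_port_attr port_attr = {
		.nb_cts = 1 << 12, /* maximum connection tracking objects */
	};
	const struct rte_flow_queue_attr qattr = { .size = 64 };
	const struct rte_flow_queue_attr *qlist[] = { &qattr };

	return rte_flow_configure(port_id, &port_attr, 1, qlist, err);
}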

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 lib/ethdev/rte_flow.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
index abb475bdee..e9a1bce38b 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -4991,6 +4991,11 @@ struct rte_flow_port_attr {
 	 * @see RTE_FLOW_ACTION_TYPE_METER
 	 */
 	uint32_t nb_meter_policies;
+	/**
+	 * Number of connection tracking to configure.
+	 * @see RTE_FLOW_ACTION_TYPE_CONNTRACK
+	 */
+	uint32_t nb_cts;	
 };
 
 /**
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH 21/27] net/mlx5: add HW steering connection tracking support
  2022-09-23 14:43 [PATCH 00/27] net/mlx5: HW steering PMD update Suanming Mou
                   ` (19 preceding siblings ...)
  2022-09-23 14:43 ` [PATCH 20/27] lib/ethdev: add connection tracking configuration Suanming Mou
@ 2022-09-23 14:43 ` Suanming Mou
  2022-09-23 14:43 ` [PATCH 22/27] net/mlx5: add HW steering VLAN push, pop and VID modify flow actions Suanming Mou
                   ` (10 subsequent siblings)
  31 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-23 14:43 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev

This commit adds connection tracking support to HW steering, matching
what SW steering supported before.

Unlike the SW steering implementation, HW steering takes advantage of
bulk action allocation support, so only one single CT pool is needed.

An indexed pool is introduced to record the actions allocated from the
bulk, the CT action state, etc. Once a CT action is allocated from the
bulk, an indexed object is also allocated from the indexed pool, and
similarly on deallocation. This way mlx5_aso_ct_action objects can be
managed by the indexed pool and no longer need to be reserved inside
mlx5_aso_ct_pool. The single CT pool is also saved directly in the
mlx5_aso_ct_action struct.

The ASO operation functions are shared with the SW steering
implementation.
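
A condensed sketch of the allocation flow described above, with
hypothetical ipool_zmalloc()/ipool_free() helpers standing in for the
mlx5 indexed pool API (indexed pool indices are 1-based, 0 meaning
invalid, so the ASO offset inside the single DevX bulk is idx - 1):

#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

struct ct_obj {
	uint32_t offset;  /* offset inside the bulk DevX CT object */
	bool is_original; /* connection direction */
	uint16_t peer;    /* peer port id */
};

/* Hypothetical stand-ins for mlx5_ipool_zmalloc()/mlx5_ipool_free(). */
struct ct_obj *ipool_zmalloc(void *pool, uint32_t *idx);
void ipool_free(void *pool, uint32_t idx);

static struct ct_obj *
ct_alloc(void *single_ct_pool, bool original, uint16_t peer)
{
	uint32_t idx = 0;
	struct ct_obj *ct = ipool_zmalloc(single_ct_pool, &idx);

	if (ct == NULL)
		return NULL;
	ct->offset = idx - 1; /* 1-based pool index -> 0-based ASO offset */
	ct->is_original = original;
	ct->peer = peer;
	return ct;
}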

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/net/mlx5/mlx5.h          |  27 ++-
 drivers/net/mlx5/mlx5_flow.h     |   4 +
 drivers/net/mlx5/mlx5_flow_aso.c |  19 +-
 drivers/net/mlx5/mlx5_flow_dv.c  |   6 +-
 drivers/net/mlx5/mlx5_flow_hw.c  | 342 ++++++++++++++++++++++++++++++-
 5 files changed, 388 insertions(+), 10 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index be60038810..ee4823f649 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1159,7 +1159,12 @@ enum mlx5_aso_ct_state {
 
 /* Generic ASO connection tracking structure. */
 struct mlx5_aso_ct_action {
-	LIST_ENTRY(mlx5_aso_ct_action) next; /* Pointer to the next ASO CT. */
+	union {
+		LIST_ENTRY(mlx5_aso_ct_action) next;
+		/* Pointer to the next ASO CT. Used only in SWS. */
+		struct mlx5_aso_ct_pool *pool;
+		/* Pointer to action pool. Used only in HWS. */
+	};
 	void *dr_action_orig; /* General action object for original dir. */
 	void *dr_action_rply; /* General action object for reply dir. */
 	uint32_t refcnt; /* Action used count in device flows. */
@@ -1173,15 +1178,30 @@ struct mlx5_aso_ct_action {
 #define MLX5_ASO_CT_UPDATE_STATE(c, s) \
 	__atomic_store_n(&((c)->state), (s), __ATOMIC_RELAXED)
 
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+
 /* ASO connection tracking software pool definition. */
 struct mlx5_aso_ct_pool {
 	uint16_t index; /* Pool index in pools array. */
+	/* Free ASO CT index in the pool. Used by HWS. */
+	struct mlx5_indexed_pool *cts;
 	struct mlx5_devx_obj *devx_obj;
-	/* The first devx object in the bulk, used for freeing (not yet). */
-	struct mlx5_aso_ct_action actions[MLX5_ASO_CT_ACTIONS_PER_POOL];
+	union {
+		void *dummy_action;
+		/* Dummy action to increase the reference count in the driver. */
+		struct mlx5dr_action *dr_action;
+		/* HWS action. */
+	};
+	struct mlx5_aso_ct_action actions[0];
 	/* CT action structures bulk. */
 };
 
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
 LIST_HEAD(aso_ct_list, mlx5_aso_ct_action);
 
 /* Pools management structure for ASO connection tracking pools. */
@@ -1647,6 +1667,7 @@ struct mlx5_priv {
 	LIST_HEAD(flow_hw_tbl_ongo, rte_flow_template_table) flow_hw_tbl_ongo;
 	struct mlx5_indexed_pool *acts_ipool; /* Action data indexed pool. */
 	struct mlx5_hws_cnt_pool *hws_cpool; /* HW steering's counter pool. */
+	struct mlx5_aso_ct_pool *hws_ctpool; /* HW steering's CT pool. */
 #endif
 };
 
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index a6bd002dca..f7bedd9605 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -83,6 +83,10 @@ enum {
 #define MLX5_INDIRECT_ACT_CT_GET_IDX(index) \
 	((index) & ((1 << MLX5_INDIRECT_ACT_CT_OWNER_SHIFT) - 1))
 
+#define MLX5_ACTION_CTX_CT_GET_IDX  MLX5_INDIRECT_ACT_CT_GET_IDX
+#define MLX5_ACTION_CTX_CT_GET_OWNER MLX5_INDIRECT_ACT_CT_GET_OWNER
+#define MLX5_ACTION_CTX_CT_GEN_IDX MLX5_INDIRECT_ACT_CT_GEN_IDX
+
 /* Matches on selected register. */
 struct mlx5_rte_flow_item_tag {
 	enum modify_reg id;
diff --git a/drivers/net/mlx5/mlx5_flow_aso.c b/drivers/net/mlx5/mlx5_flow_aso.c
index ed9272e583..34fed3f4b8 100644
--- a/drivers/net/mlx5/mlx5_flow_aso.c
+++ b/drivers/net/mlx5/mlx5_flow_aso.c
@@ -903,6 +903,15 @@ mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh,
 	return -1;
 }
 
+static inline struct mlx5_aso_ct_pool*
+__mlx5_aso_ct_get_pool(struct mlx5_dev_ctx_shared *sh,
+		       struct mlx5_aso_ct_action *ct)
+{
+	if (likely(sh->config.dv_flow_en == 2))
+		return ct->pool;
+	return container_of(ct, struct mlx5_aso_ct_pool, actions[ct->offset]);
+}
+
 /*
  * Post a WQE to the ASO CT SQ to modify the context.
  *
@@ -945,7 +954,7 @@ mlx5_aso_ct_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	MLX5_ASO_CT_UPDATE_STATE(ct, ASO_CONNTRACK_WAIT);
 	sq->elts[sq->head & mask].ct = ct;
 	sq->elts[sq->head & mask].query_data = NULL;
-	pool = container_of(ct, struct mlx5_aso_ct_pool, actions[ct->offset]);
+	pool = __mlx5_aso_ct_get_pool(sh, ct);
 	/* Each WQE will have a single CT object. */
 	wqe->general_cseg.misc = rte_cpu_to_be_32(pool->devx_obj->id +
 						  ct->offset);
@@ -1113,7 +1122,7 @@ mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
 	wqe_idx = sq->head & mask;
 	sq->elts[wqe_idx].ct = ct;
 	sq->elts[wqe_idx].query_data = data;
-	pool = container_of(ct, struct mlx5_aso_ct_pool, actions[ct->offset]);
+	pool = __mlx5_aso_ct_get_pool(sh, ct);
 	/* Each WQE will have a single CT object. */
 	wqe->general_cseg.misc = rte_cpu_to_be_32(pool->devx_obj->id +
 						  ct->offset);
@@ -1231,7 +1240,7 @@ mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
 		/* Waiting for wqe resource. */
 		rte_delay_us_sleep(10u);
 	} while (--poll_wqe_times);
-	pool = container_of(ct, struct mlx5_aso_ct_pool, actions[ct->offset]);
+	pool = __mlx5_aso_ct_get_pool(sh, ct);
 	DRV_LOG(ERR, "Fail to send WQE for ASO CT %d in pool %d",
 		ct->offset, pool->index);
 	return -1;
@@ -1267,7 +1276,7 @@ mlx5_aso_ct_wait_ready(struct mlx5_dev_ctx_shared *sh,
 		/* Waiting for CQE ready, consider should block or sleep. */
 		rte_delay_us_sleep(MLX5_ASO_WQE_CQE_RESPONSE_DELAY);
 	} while (--poll_cqe_times);
-	pool = container_of(ct, struct mlx5_aso_ct_pool, actions[ct->offset]);
+	pool = __mlx5_aso_ct_get_pool(sh, ct);
 	DRV_LOG(ERR, "Fail to poll CQE for ASO CT %d in pool %d",
 		ct->offset, pool->index);
 	return -1;
@@ -1383,7 +1392,7 @@ mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh,
 		else
 			rte_delay_us_sleep(10u);
 	} while (--poll_wqe_times);
-	pool = container_of(ct, struct mlx5_aso_ct_pool, actions[ct->offset]);
+	pool = __mlx5_aso_ct_get_pool(sh, ct);
 	DRV_LOG(ERR, "Fail to send WQE for ASO CT %d in pool %d",
 		ct->offset, pool->index);
 	return -1;
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 80539fd75d..e2794c1d26 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -12790,6 +12790,7 @@ flow_dv_ct_pool_create(struct rte_eth_dev *dev,
 	struct mlx5_devx_obj *obj = NULL;
 	uint32_t i;
 	uint32_t log_obj_size = rte_log2_u32(MLX5_ASO_CT_ACTIONS_PER_POOL);
+	size_t mem_size;
 
 	obj = mlx5_devx_cmd_create_conn_track_offload_obj(priv->sh->cdev->ctx,
 							  priv->sh->cdev->pdn,
@@ -12799,7 +12800,10 @@ flow_dv_ct_pool_create(struct rte_eth_dev *dev,
 		DRV_LOG(ERR, "Failed to create conn_track_offload_obj using DevX.");
 		return NULL;
 	}
-	pool = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*pool), 0, SOCKET_ID_ANY);
+	mem_size = sizeof(struct mlx5_aso_ct_action) *
+		   MLX5_ASO_CT_ACTIONS_PER_POOL +
+		   sizeof(*pool);
+	pool = mlx5_malloc(MLX5_MEM_ZERO, mem_size, 0, SOCKET_ID_ANY);
 	if (!pool) {
 		rte_errno = ENOMEM;
 		claim_zero(mlx5_devx_cmd_destroy(obj));
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index b9d4402aed..a4a0882d15 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -14,9 +14,19 @@
 #include "mlx5dr_send.h"
 #include "mlx5_hws_cnt.h"
 
+#define MLX5_HW_INV_QUEUE UINT32_MAX
+
 /* The maximum actions support in the flow. */
 #define MLX5_HW_MAX_ACTS 16
 
+/*
+ * The default ipool threshold value indicates which per_core_cache
+ * value to set.
+ */
+#define MLX5_HW_IPOOL_SIZE_THRESHOLD (1 << 19)
+/* The default min local cache size. */
+#define MLX5_HW_IPOOL_CACHE_MIN (1 << 9)
+
 /* Default push burst threshold. */
 #define BURST_THR 32u
 
@@ -323,6 +333,24 @@ flow_hw_tir_action_register(struct rte_eth_dev *dev,
 	return hrxq;
 }
 
+static __rte_always_inline int
+flow_hw_ct_compile(struct rte_eth_dev *dev, uint32_t idx,
+		   struct mlx5dr_rule_action *rule_act)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_ct_action *ct;
+
+	ct = mlx5_ipool_get(priv->hws_ctpool->cts, idx);
+	if (!ct || mlx5_aso_ct_available(priv->sh, ct))
+		return -1;
+	rule_act->action = priv->hws_ctpool->dr_action;
+	rule_act->aso_ct.offset = ct->offset;
+	rule_act->aso_ct.direction = ct->is_original ?
+		MLX5DR_ACTION_ASO_CT_DIRECTION_INITIATOR :
+		MLX5DR_ACTION_ASO_CT_DIRECTION_RESPONDER;
+	return 0;
+}
+
 /**
  * Destroy DR actions created by action template.
  *
@@ -622,6 +650,10 @@ flow_hw_shared_action_translate(struct rte_eth_dev *dev,
 			action_src, action_dst, act_idx))
 			return -1;
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_CT:
+		if (flow_hw_ct_compile(dev, idx, &acts->rule_acts[action_dst]))
+			return -1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type:%d", type);
 		break;
@@ -1057,6 +1089,7 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
 	bool reformat_used = false;
 	uint16_t action_pos;
 	uint16_t jump_pos;
+	uint32_t ct_idx;
 	int err;
 
 	flow_hw_modify_field_init(&mhdr, at);
@@ -1279,6 +1312,20 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
 				goto err;
 			}
 			break;
+		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+			action_pos = at->actions_off[actions - action_start];
+			if (masks->conf) {
+				ct_idx = MLX5_ACTION_CTX_CT_GET_IDX
+					 ((uint32_t)(uintptr_t)actions->conf);
+				if (flow_hw_ct_compile(dev, ct_idx,
+						       &acts->rule_acts[action_pos]))
+					goto err;
+			} else if (__flow_hw_act_data_general_append
+					(priv, acts, actions->type,
+					 actions - action_start, action_pos)) {
+				goto err;
+			}
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -1506,6 +1553,10 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev,
 				&rule_act->counter.offset))
 			return -1;
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_CT:
+		if (flow_hw_ct_compile(dev, idx, rule_act))
+			return -1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type:%d", type);
 		break;
@@ -1691,6 +1742,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		uint64_t item_flags;
 		struct mlx5_hw_jump_action *jump;
 		struct mlx5_hrxq *hrxq;
+		uint32_t ct_idx;
 		cnt_id_t cnt_id;
 
 		action = &actions[act_data->action_src];
@@ -1824,6 +1876,13 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 				return ret;
 			job->flow->cnt_id = act_data->shared_counter.id;
 			break;
+		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+			ct_idx = MLX5_ACTION_CTX_CT_GET_IDX
+				 ((uint32_t)(uintptr_t)action->conf);
+			if (flow_hw_ct_compile(dev, ct_idx,
+					       &rule_acts[act_data->action_dst]))
+				return -1;
+			break;
 		default:
 			break;
 		}
@@ -2348,6 +2407,8 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	if (nb_flows < cfg.trunk_size) {
 		cfg.per_core_cache = 0;
 		cfg.trunk_size = nb_flows;
+	} else if (nb_flows <= MLX5_HW_IPOOL_SIZE_THRESHOLD) {
+		cfg.per_core_cache = MLX5_HW_IPOOL_CACHE_MIN;
 	}
 	/* Check if we requires too many templates. */
 	if (nb_item_templates > max_tpl ||
@@ -2867,6 +2928,9 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_COUNT:
 			/* TODO: Validation logic */
 			break;
+		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+			/* TODO: Validation logic */
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -2893,6 +2957,7 @@ static enum mlx5dr_action_type mlx5_hw_dr_action_types[] = {
 	[RTE_FLOW_ACTION_TYPE_MODIFY_FIELD] = MLX5DR_ACTION_TYP_MODIFY_HDR,
 	[RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT] = MLX5DR_ACTION_TYP_VPORT,
 	[RTE_FLOW_ACTION_TYPE_COUNT] = MLX5DR_ACTION_TYP_CTR,
+	[RTE_FLOW_ACTION_TYPE_CONNTRACK] = MLX5DR_ACTION_TYP_ASO_CT,
 };
 
 static int
@@ -2921,6 +2986,11 @@ flow_hw_dr_actions_template_handle_shared(const struct rte_flow_action *mask,
 		action_types[*curr_off] = MLX5DR_ACTION_TYP_CTR;
 		*curr_off = *curr_off + 1;
 		break;
+	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+		at->actions_off[action_src] = *curr_off;
+		action_types[*curr_off] = MLX5DR_ACTION_TYP_ASO_CT;
+		*curr_off = *curr_off + 1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type: %d", type);
 		return -EINVAL;
@@ -3375,6 +3445,7 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ITEM_TYPE_GRE_OPTION:
 		case RTE_FLOW_ITEM_TYPE_ICMP:
 		case RTE_FLOW_ITEM_TYPE_ICMP6:
+		case RTE_FLOW_ITEM_TYPE_CONNTRACK:
 			break;
 		case RTE_FLOW_ITEM_TYPE_END:
 			items_end = true;
@@ -4570,6 +4641,84 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 	return -EINVAL;
 }
 
+static void
+flow_hw_ct_pool_destroy(struct rte_eth_dev *dev __rte_unused,
+			struct mlx5_aso_ct_pool *pool)
+{
+	if (pool->dr_action)
+		mlx5dr_action_destroy(pool->dr_action);
+	if (pool->devx_obj)
+		claim_zero(mlx5_devx_cmd_destroy(pool->devx_obj));
+	if (pool->cts)
+		mlx5_ipool_destroy(pool->cts);
+	mlx5_free(pool);
+}
+
+static struct mlx5_aso_ct_pool *
+flow_hw_ct_pool_create(struct rte_eth_dev *dev,
+		       const struct rte_flow_port_attr *port_attr)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_ct_pool *pool;
+	struct mlx5_devx_obj *obj;
+	uint32_t nb_cts = rte_align32pow2(port_attr->nb_cts);
+	uint32_t log_obj_size = rte_log2_u32(nb_cts);
+	struct mlx5_indexed_pool_config cfg = {
+		.size = sizeof(struct mlx5_aso_ct_action),
+		.trunk_size = 1 << 12,
+		.per_core_cache = 1 << 13,
+		.need_lock = 1,
+		.release_mem_en = !!priv->sh->config.reclaim_mode,
+		.malloc = mlx5_malloc,
+		.free = mlx5_free,
+		.type = "mlx5_hw_ct_action",
+	};
+	int reg_id;
+	uint32_t flags;
+
+	pool = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*pool), 0, SOCKET_ID_ANY);
+	if (!pool) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	obj = mlx5_devx_cmd_create_conn_track_offload_obj(priv->sh->cdev->ctx,
+							  priv->sh->cdev->pdn,
+							  log_obj_size);
+	if (!obj) {
+		rte_errno = ENODATA;
+		DRV_LOG(ERR, "Failed to create conn_track_offload_obj using DevX.");
+		goto err;
+	}
+	pool->devx_obj = obj;
+	reg_id = mlx5_flow_get_reg_id(dev, MLX5_ASO_CONNTRACK, 0, NULL);
+	flags = MLX5DR_ACTION_FLAG_HWS_RX | MLX5DR_ACTION_FLAG_HWS_TX;
+	if (priv->sh->config.dv_esw_en && priv->master)
+		flags |= MLX5DR_ACTION_FLAG_HWS_FDB;
+	pool->dr_action = mlx5dr_action_create_aso_ct(priv->dr_ctx,
+						      (struct mlx5dr_devx_obj *)obj,
+						      reg_id - REG_C_0, flags);
+	if (!pool->dr_action)
+		goto err;
+	/*
+	 * No need for local cache if CT number is a small number. Since
+	 * flow insertion rate will be very limited in that case. Here let's
+	 * set the number to less than default trunk size 4K.
+	 */
+	if (nb_cts <= cfg.trunk_size) {
+		cfg.per_core_cache = 0;
+		cfg.trunk_size = nb_cts;
+	} else if (nb_cts <= MLX5_HW_IPOOL_SIZE_THRESHOLD) {
+		cfg.per_core_cache = MLX5_HW_IPOOL_CACHE_MIN;
+	}
+	pool->cts = mlx5_ipool_create(&cfg);
+	if (!pool->cts)
+		goto err;
+	return pool;
+err:
+	flow_hw_ct_pool_destroy(dev, pool);
+	return NULL;
+}
+
 /**
  * Configure port HWS resources.
  *
@@ -4755,6 +4904,11 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	}
 	if (_queue_attr)
 		mlx5_free(_queue_attr);
+	if (port_attr->nb_cts) {
+		priv->hws_ctpool = flow_hw_ct_pool_create(dev, port_attr);
+		if (!priv->hws_ctpool)
+			goto err;
+	}
 	if (port_attr->nb_counters) {
 		priv->hws_cpool = mlx5_hws_cnt_pool_create(dev, port_attr,
 				nb_queue);
@@ -4763,6 +4917,10 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	}
 	return 0;
 err:
+	if (priv->hws_ctpool) {
+		flow_hw_ct_pool_destroy(dev, priv->hws_ctpool);
+		priv->hws_ctpool = NULL;
+	}
 	flow_hw_free_vport_actions(priv);
 	for (i = 0; i < MLX5_HW_ACTION_FLAG_MAX; i++) {
 		if (priv->hw_drop[i])
@@ -4835,6 +4993,10 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 	}
 	if (priv->hws_cpool)
 		mlx5_hws_cnt_pool_destroy(priv->sh, priv->hws_cpool);
+	if (priv->hws_ctpool) {
+		flow_hw_ct_pool_destroy(dev, priv->hws_ctpool);
+		priv->hws_ctpool = NULL;
+	}
 	mlx5_free(priv->hw_q);
 	priv->hw_q = NULL;
 	claim_zero(mlx5dr_context_close(priv->dr_ctx));
@@ -4997,6 +5159,169 @@ flow_hw_clear_flow_metadata_config(void)
 	mlx5_flow_hw_flow_metadata_xmeta_en = 0;
 }
 
+static int
+flow_hw_conntrack_destroy(struct rte_eth_dev *dev __rte_unused,
+			  uint32_t idx,
+			  struct rte_flow_error *error)
+{
+	uint16_t owner = (uint16_t)MLX5_ACTION_CTX_CT_GET_OWNER(idx);
+	uint32_t ct_idx = MLX5_ACTION_CTX_CT_GET_IDX(idx);
+	struct rte_eth_dev *owndev = &rte_eth_devices[owner];
+	struct mlx5_priv *priv = owndev->data->dev_private;
+	struct mlx5_aso_ct_pool *pool = priv->hws_ctpool;
+	struct mlx5_aso_ct_action *ct;
+
+	ct = mlx5_ipool_get(pool->cts, ct_idx);
+	if (!ct) {
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL,
+				"Invalid CT destruction index");
+	}
+	__atomic_store_n(&ct->state, ASO_CONNTRACK_FREE,
+				 __ATOMIC_RELAXED);
+	mlx5_ipool_free(pool->cts, ct_idx);
+	return 0;
+}
+
+static int
+flow_hw_conntrack_query(struct rte_eth_dev *dev, uint32_t idx,
+			struct rte_flow_action_conntrack *profile,
+			struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_ct_pool *pool = priv->hws_ctpool;
+	struct mlx5_aso_ct_action *ct;
+	uint16_t owner = (uint16_t)MLX5_ACTION_CTX_CT_GET_OWNER(idx);
+	uint32_t ct_idx;
+
+	if (owner != PORT_ID(priv))
+		return rte_flow_error_set(error, EACCES,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL,
+				"Can't query CT object owned by another port");
+	ct_idx = MLX5_ACTION_CTX_CT_GET_IDX(idx);
+	ct = mlx5_ipool_get(pool->cts, ct_idx);
+	if (!ct) {
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL,
+				"Invalid CT query index");
+	}
+	profile->peer_port = ct->peer;
+	profile->is_original_dir = ct->is_original;
+	if (mlx5_aso_ct_query_by_wqe(priv->sh, ct, profile))
+		return rte_flow_error_set(error, EIO,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL,
+				"Failed to query CT context");
+	return 0;
+}
+
+
+static int
+flow_hw_conntrack_update(struct rte_eth_dev *dev, uint32_t queue,
+			 const struct rte_flow_modify_conntrack *action_conf,
+			 uint32_t idx, struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_ct_pool *pool = priv->hws_ctpool;
+	struct mlx5_aso_ct_action *ct;
+	const struct rte_flow_action_conntrack *new_prf;
+	uint16_t owner = (uint16_t)MLX5_ACTION_CTX_CT_GET_OWNER(idx);
+	uint32_t ct_idx;
+	int ret = 0;
+
+	if (PORT_ID(priv) != owner)
+		return rte_flow_error_set(error, EACCES,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL,
+					  "Can't update CT object owned by another port");
+	ct_idx = MLX5_ACTION_CTX_CT_GET_IDX(idx);
+	ct = mlx5_ipool_get(pool->cts, ct_idx);
+	if (!ct) {
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL,
+				"Invalid CT update index");
+	}
+	new_prf = &action_conf->new_ct;
+	if (action_conf->direction)
+		ct->is_original = !!new_prf->is_original_dir;
+	if (action_conf->state) {
+		/* Only validate the profile when it needs to be updated. */
+		ret = mlx5_validate_action_ct(dev, new_prf, error);
+		if (ret)
+			return ret;
+		ret = mlx5_aso_ct_update_by_wqe(priv->sh, ct, new_prf);
+		if (ret)
+			return rte_flow_error_set(error, EIO,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL,
+					"Failed to send CT context update WQE");
+		if (queue != MLX5_HW_INV_QUEUE)
+			return 0;
+		/* Block until ready or a failure in synchronous mode. */
+		ret = mlx5_aso_ct_available(priv->sh, ct);
+		if (ret)
+			rte_flow_error_set(error, rte_errno,
+					   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					   NULL,
+					   "Timeout to get the CT update");
+	}
+	return ret;
+}
+
+static struct rte_flow_action_handle *
+flow_hw_conntrack_create(struct rte_eth_dev *dev, uint32_t queue,
+			 const struct rte_flow_action_conntrack *pro,
+			 struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_ct_pool *pool = priv->hws_ctpool;
+	struct mlx5_aso_ct_action *ct;
+	uint32_t ct_idx = 0;
+	int ret;
+
+	if (!pool) {
+		rte_flow_error_set(error, EINVAL,
+				   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+				   "CT is not enabled");
+		return 0;
+	}
+	ct = mlx5_ipool_zmalloc(pool->cts, &ct_idx);
+	if (!ct) {
+		rte_flow_error_set(error, rte_errno,
+				   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+				   "Failed to allocate CT object");
+		return 0;
+	}
+	ct->offset = ct_idx - 1;
+	ct->is_original = !!pro->is_original_dir;
+	ct->peer = pro->peer_port;
+	ct->pool = pool;
+	if (mlx5_aso_ct_update_by_wqe(priv->sh, ct, pro)) {
+		mlx5_ipool_free(pool->cts, ct_idx);
+		rte_flow_error_set(error, EBUSY,
+				   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+				   "Failed to update CT");
+		return 0;
+	}
+	if (queue == MLX5_HW_INV_QUEUE) {
+		ret = mlx5_aso_ct_available(priv->sh, ct);
+		if (ret) {
+			mlx5_ipool_free(pool->cts, ct_idx);
+			rte_flow_error_set(error, rte_errno,
+					   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					   NULL,
+					   "Timeout to get the CT update");
+			return 0;
+		}
+	}
+	return (struct rte_flow_action_handle *)(uintptr_t)
+		MLX5_ACTION_CTX_CT_GEN_IDX(PORT_ID(priv), ct_idx);
+}
+
 /**
  * Create shared action.
  *
@@ -5044,6 +5369,9 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 			handle = (struct rte_flow_action_handle *)
 				 (uintptr_t)cnt_id;
 		break;
+	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+		handle = flow_hw_conntrack_create(dev, queue, action->conf, error);
+		break;
 	default:
 		handle = flow_dv_action_create(dev, conf, action, error);
 	}
@@ -5079,10 +5407,18 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 			     void *user_data,
 			     struct rte_flow_error *error)
 {
+	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
+	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+
 	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
 	RTE_SET_USED(user_data);
-	return flow_dv_action_update(dev, handle, update, error);
+	switch (type) {
+	case MLX5_INDIRECT_ACTION_TYPE_CT:
+		return flow_hw_conntrack_update(dev, queue, update, act_idx, error);
+	default:
+		return flow_dv_action_update(dev, handle, update, error);
+	}
 }
 
 /**
@@ -5121,6 +5457,8 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 	switch (type) {
 	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
 		return mlx5_hws_cnt_shared_put(priv->hws_cpool, &act_idx);
+	case MLX5_INDIRECT_ACTION_TYPE_CT:
+		return flow_hw_conntrack_destroy(dev, act_idx, error);
 	default:
 		return flow_dv_action_destroy(dev, handle, error);
 	}
@@ -5274,6 +5612,8 @@ flow_hw_action_query(struct rte_eth_dev *dev,
 	switch (type) {
 	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
 		return flow_hw_query_counter(dev, act_idx, data, error);
+	case MLX5_INDIRECT_ACTION_TYPE_CT:
+		return flow_hw_conntrack_query(dev, act_idx, data, error);
 	default:
 		return flow_dv_action_query(dev, handle, data, error);
 	}
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH 22/27] net/mlx5: add HW steering VLAN push, pop and VID modify flow actions
  2022-09-23 14:43 [PATCH 00/27] net/mlx5: HW steering PMD update Suanming Mou
                   ` (20 preceding siblings ...)
  2022-09-23 14:43 ` [PATCH 21/27] net/mlx5: add HW steering connection tracking support Suanming Mou
@ 2022-09-23 14:43 ` Suanming Mou
  2022-09-23 14:43 ` [PATCH 23/27] net/mlx5: add meter color flow matching in dv Suanming Mou
                   ` (9 subsequent siblings)
  31 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-23 14:43 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, Gregory Etelson

From: Gregory Etelson <getelson@nvidia.com>

Add PMD implementation for HW steering VLAN push, pop and modify flow
actions.

HWS VLAN push flow action is triggered by a sequence of mandatory
OF_PUSH_VLAN, OF_SET_VLAN_VID and optional OF_SET_VLAN_PCP
flow action commands.
The commands must be arranged in the exact order:
OF_PUSH_VLAN / OF_SET_VLAN_VID [ / OF_SET_VLAN_PCP ].
In masked HWS VLAN push flow action template *ALL* the above flow
actions must be masked.
In non-masked HWS VLAN push flow action template *ALL* the above flow
actions must not be masked.

Example:

flow actions_template <port id> create \
actions_template_id <action id> \
template \
  of_push_vlan / \
  of_set_vlan_vid \
  [ / of_set_vlan_pcp  ] / end \
mask \
  of_push_vlan ethertype 0 / \
  of_set_vlan_vid vlan_vid 0 \
  [ / of_set_vlan_pcp vlan_pcp 0 ] / end\

flow actions_template <port id> create \
actions_template_id <action id> \
template \
  of_push_vlan ethertype <E>/ \
  of_set_vlan_vid vlan_vid <VID>\
  [ / of_set_vlan_pcp  <PCP>] / end \
mask \
  of_push_vlan ethertype <type != 0> / \
  of_set_vlan_vid vlan_vid <vid_mask != 0>\
  [ / of_set_vlan_pcp vlan_pcp <pcp_mask != 0> ] / end\

HWS VLAN pop flow action is triggered by OF_POP_VLAN
flow action command.
HWS VLAN pop action template is always non-masked.

Example:

flow actions_template <port id> create \
actions_template_id <action id> \
template of_pop_vlan / end mask of_pop_vlan / end

HWS VLAN VID modify flow action is triggered by a standalone
OF_SET_VLAN_VID flow action command.
HWS VLAN VID modify action template can be either masked or non-masked.

Example:

flow actions_template <port id> create \
actions_template_id <action id> \
template of_set_vlan_vid / end mask of_set_vlan_vid vlan_vid 0 / end

flow actions_template <port id> create \
actions_template_id <action id> \
template of_set_vlan_vid vlan_vid 0x101 / end \
mask of_set_vlan_vid vlan_vid 0xffff / end
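
For reference, a hypothetical application-side sketch of a fully masked
push template built with the async flow API; the attribute, VID and
EtherType values are illustrative only.

#include <rte_byteorder.h>
#include <rte_ether.h>
#include <rte_flow.h>

static struct rte_flow_actions_template *
make_push_vlan_template(uint16_t port_id, struct rte_flow_error *err)
{
	const struct rte_flow_actions_template_attr attr = { .egress = 1 };
	const struct rte_flow_action_of_push_vlan push = {
		.ethertype = RTE_BE16(RTE_ETHER_TYPE_VLAN),
	};
	const struct rte_flow_action_of_set_vlan_vid vid = {
		.vlan_vid = RTE_BE16(0x123),
	};
	/* Mandatory order: OF_PUSH_VLAN / OF_SET_VLAN_VID [/ OF_SET_VLAN_PCP]. */
	const struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN, .conf = &push },
		{ .type = RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID, .conf = &vid },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	const struct rte_flow_action_of_push_vlan push_mask = {
		.ethertype = RTE_BE16(0xffff),
	};
	const struct rte_flow_action_of_set_vlan_vid vid_mask = {
		.vlan_vid = RTE_BE16(0x0fff),
	};
	/* Masked template: all VLAN push actions carry non-zero masks. */
	const struct rte_flow_action masks[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN, .conf = &push_mask },
		{ .type = RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID, .conf = &vid_mask },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};

	return rte_flow_actions_template_create(port_id, &attr,
						actions, masks, err);
}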

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
---
 drivers/net/mlx5/mlx5.h         |   2 +
 drivers/net/mlx5/mlx5_flow.h    |   4 +
 drivers/net/mlx5/mlx5_flow_dv.c |   2 +-
 drivers/net/mlx5/mlx5_flow_hw.c | 360 ++++++++++++++++++++++++++++++--
 4 files changed, 348 insertions(+), 20 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index ee4823f649..ec08014832 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1590,6 +1590,8 @@ struct mlx5_priv {
 	void *root_drop_action; /* Pointer to root drop action. */
 	rte_spinlock_t hw_ctrl_lock;
 	LIST_HEAD(hw_ctrl_flow, mlx5_hw_ctrl_flow) hw_ctrl_flows;
+	struct mlx5dr_action *hw_push_vlan[MLX5DR_TABLE_TYPE_MAX];
+	struct mlx5dr_action *hw_pop_vlan[MLX5DR_TABLE_TYPE_MAX];
 	struct mlx5dr_action **hw_vport;
 	struct rte_flow_template_table *hw_esw_sq_miss_root_tbl;
 	struct rte_flow_template_table *hw_esw_sq_miss_tbl;
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index f7bedd9605..2d1a9dba27 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -2434,4 +2434,8 @@ int mlx5_flow_pattern_validate(struct rte_eth_dev *dev,
 		struct rte_flow_error *error);
 int flow_hw_table_update(struct rte_eth_dev *dev,
 			 struct rte_flow_error *error);
+int mlx5_flow_item_field_width(struct rte_eth_dev *dev,
+			   enum rte_flow_field_id field, int inherit,
+			   const struct rte_flow_attr *attr,
+			   struct rte_flow_error *error);
 #endif /* RTE_PMD_MLX5_FLOW_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index e2794c1d26..36059beb71 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -1326,7 +1326,7 @@ flow_dv_convert_action_modify_ipv6_dscp
 					     MLX5_MODIFICATION_TYPE_SET, error);
 }
 
-static int
+int
 mlx5_flow_item_field_width(struct rte_eth_dev *dev,
 			   enum rte_flow_field_id field, int inherit,
 			   const struct rte_flow_attr *attr,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index a4a0882d15..7e7b48f884 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -48,12 +48,22 @@
 /* Lowest priority for HW non-root table. */
 #define MLX5_HW_LOWEST_PRIO_NON_ROOT (UINT32_MAX)
 
+#define MLX5_HW_VLAN_PUSH_TYPE_IDX 0
+#define MLX5_HW_VLAN_PUSH_VID_IDX 1
+#define MLX5_HW_VLAN_PUSH_PCP_IDX 2
+
 static int flow_hw_flush_all_ctrl_flows(struct rte_eth_dev *dev);
 static int flow_hw_translate_group(struct rte_eth_dev *dev,
 				   const struct mlx5_flow_template_table_cfg *cfg,
 				   uint32_t group,
 				   uint32_t *table_group,
 				   struct rte_flow_error *error);
+static __rte_always_inline int
+flow_hw_set_vlan_vid_construct(struct rte_eth_dev *dev,
+			       struct mlx5_hw_q_job *job,
+			       struct mlx5_action_construct_data *act_data,
+			       const struct mlx5_hw_actions *hw_acts,
+			       const struct rte_flow_action *action);
 
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops;
 
@@ -1039,6 +1049,52 @@ flow_hw_cnt_compile(struct rte_eth_dev *dev, uint32_t  start_pos,
 	return 0;
 }
 
+static __rte_always_inline bool
+is_of_vlan_pcp_present(const struct rte_flow_action *actions)
+{
+	/*
+	 * Order of RTE VLAN push actions is
+	 * OF_PUSH_VLAN / OF_SET_VLAN_VID [ / OF_SET_VLAN_PCP ]
+	 */
+	return actions[MLX5_HW_VLAN_PUSH_PCP_IDX].type ==
+		RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_PCP;
+}
+
+static __rte_always_inline bool
+is_template_masked_push_vlan(const struct rte_flow_action_of_push_vlan *mask)
+{
+	/*
+	 * In masked push VLAN template all RTE push actions are masked.
+	 */
+	return mask && mask->ethertype != 0;
+}
+
+static rte_be32_t vlan_hdr_to_be32(const struct rte_flow_action *actions)
+{
+/*
+ * OpenFlow Switch Specification defines 801.1q VID as 12+1 bits.
+ */
+	rte_be32_t type, vid, pcp;
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+	rte_be32_t vid_lo, vid_hi;
+#endif
+
+	type = ((const struct rte_flow_action_of_push_vlan *)
+		actions[MLX5_HW_VLAN_PUSH_TYPE_IDX].conf)->ethertype;
+	vid = ((const struct rte_flow_action_of_set_vlan_vid *)
+		actions[MLX5_HW_VLAN_PUSH_VID_IDX].conf)->vlan_vid;
+	pcp = is_of_vlan_pcp_present(actions) ?
+	      ((const struct rte_flow_action_of_set_vlan_pcp *)
+		      actions[MLX5_HW_VLAN_PUSH_PCP_IDX].conf)->vlan_pcp : 0;
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+	vid_hi = vid & 0xff;
+	vid_lo = vid >> 8;
+	return (((vid_lo << 8) | (pcp << 5) | vid_hi) << 16) | type;
+#else
+	return (type << 16) | (pcp << 13) | vid;
+#endif
+}
+
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -1141,6 +1197,26 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
 				priv->hw_tag[!!attr->group];
 			flow_hw_rxq_flag_set(dev, true);
 			break;
+		case RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN:
+			action_pos = at->actions_off[actions - at->actions];
+			acts->rule_acts[action_pos].action =
+				priv->hw_push_vlan[type];
+			if (is_template_masked_push_vlan(masks->conf))
+				acts->rule_acts[action_pos].push_vlan.vlan_hdr =
+					vlan_hdr_to_be32(actions);
+			else if (__flow_hw_act_data_general_append
+					(priv, acts, actions->type,
+					 actions - action_start, action_pos))
+				goto err;
+			actions += is_of_vlan_pcp_present(actions) ?
+					MLX5_HW_VLAN_PUSH_PCP_IDX :
+					MLX5_HW_VLAN_PUSH_VID_IDX;
+			break;
+		case RTE_FLOW_ACTION_TYPE_OF_POP_VLAN:
+			action_pos = at->actions_off[actions - at->actions];
+			acts->rule_acts[action_pos].action =
+				priv->hw_pop_vlan[type];
+			break;
 		case RTE_FLOW_ACTION_TYPE_JUMP:
 			action_pos = at->actions_off[actions - action_start];
 			if (masks->conf &&
@@ -1746,8 +1822,17 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		cnt_id_t cnt_id;
 
 		action = &actions[act_data->action_src];
-		MLX5_ASSERT(action->type == RTE_FLOW_ACTION_TYPE_INDIRECT ||
-			    (int)action->type == act_data->type);
+		/*
+		 * action template construction replaces
+		 * OF_SET_VLAN_VID with MODIFY_FIELD
+		 */
+		if (action->type == RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID)
+			MLX5_ASSERT(act_data->type ==
+				    RTE_FLOW_ACTION_TYPE_MODIFY_FIELD);
+		else
+			MLX5_ASSERT(action->type ==
+				    RTE_FLOW_ACTION_TYPE_INDIRECT ||
+				    (int)action->type == act_data->type);
 		switch (act_data->type) {
 		case RTE_FLOW_ACTION_TYPE_INDIRECT:
 			if (flow_hw_shared_action_construct
@@ -1763,6 +1848,10 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			      (action->conf))->id);
 			rule_acts[act_data->action_dst].tag.value = tag;
 			break;
+		case RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN:
+			rule_acts[act_data->action_dst].push_vlan.vlan_hdr =
+				vlan_hdr_to_be32(action);
+			break;
 		case RTE_FLOW_ACTION_TYPE_JUMP:
 			jump_group = ((const struct rte_flow_action_jump *)
 						action->conf)->group;
@@ -1814,10 +1903,16 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 				    act_data->encap.len);
 			break;
 		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
-			ret = flow_hw_modify_field_construct(job,
-							     act_data,
-							     hw_acts,
-							     action);
+			if (action->type == RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID)
+				ret = flow_hw_set_vlan_vid_construct(dev, job,
+								     act_data,
+								     hw_acts,
+								     action);
+			else
+				ret = flow_hw_modify_field_construct(job,
+								     act_data,
+								     hw_acts,
+								     action);
 			if (ret)
 				return -1;
 			break;
@@ -2841,6 +2936,56 @@ flow_hw_action_meta_copy_insert(const struct rte_flow_action actions[],
 	return 0;
 }
 
+static int
+flow_hw_validate_action_push_vlan(struct rte_eth_dev *dev,
+				  const
+				  struct rte_flow_actions_template_attr *attr,
+				  const struct rte_flow_action *action,
+				  const struct rte_flow_action *mask,
+				  struct rte_flow_error *error)
+{
+#define X_FIELD(ptr, t, f) ((t *)((ptr)->conf))->f
+	/*
+	 * 1. Mandatory actions order:
+	 * OF_PUSH_VLAN / OF_SET_VLAN_VID [ / OF_SET_VLAN_PCP ]
+	 * 2. All actions ether masked or not.
+	 */
+	const bool masked_action = action[MLX5_HW_VLAN_PUSH_TYPE_IDX].conf &&
+		X_FIELD(action + MLX5_HW_VLAN_PUSH_TYPE_IDX,
+			const struct rte_flow_action_of_push_vlan,
+			ethertype) != 0;
+	bool masked_param;
+
+	RTE_SET_USED(dev);
+	RTE_SET_USED(attr);
+	RTE_SET_USED(mask);
+	if (action[MLX5_HW_VLAN_PUSH_VID_IDX].type !=
+		RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION,
+					  action, "OF_PUSH_VLAN: invalid actions order");
+	masked_param = action[MLX5_HW_VLAN_PUSH_VID_IDX].conf &&
+		X_FIELD(action + MLX5_HW_VLAN_PUSH_VID_IDX,
+			const struct rte_flow_action_of_set_vlan_vid, vlan_vid);
+	if (!(masked_action & masked_param))
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION,
+					  action, "OF_SET_VLAN_VID: template mask does not match OF_PUSH_VLAN");
+	if (is_of_vlan_pcp_present(action)) {
+		masked_param = action[MLX5_HW_VLAN_PUSH_PCP_IDX].conf &&
+			X_FIELD(action + MLX5_HW_VLAN_PUSH_PCP_IDX,
+				const struct rte_flow_action_of_set_vlan_pcp,
+					vlan_pcp);
+		if (!(masked_action & masked_param))
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  action, "OF_SET_VLAN_PCP: template mask does not match OF_PUSH_VLAN");
+	}
+
+	return 0;
+#undef X_FIELD
+}
+
 static int
 flow_hw_actions_validate(struct rte_eth_dev *dev,
 			const struct rte_flow_actions_template_attr *attr,
@@ -2931,6 +3076,18 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
 			/* TODO: Validation logic */
 			break;
+		case RTE_FLOW_ACTION_TYPE_OF_POP_VLAN:
+		case RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID:
+			break;
+		case RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN:
+			ret = flow_hw_validate_action_push_vlan
+					(dev, attr, action, mask, error);
+			if (ret != 0)
+				return ret;
+			i += is_of_vlan_pcp_present(action) ?
+				MLX5_HW_VLAN_PUSH_PCP_IDX :
+				MLX5_HW_VLAN_PUSH_VID_IDX;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -2958,6 +3115,8 @@ static enum mlx5dr_action_type mlx5_hw_dr_action_types[] = {
 	[RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT] = MLX5DR_ACTION_TYP_VPORT,
 	[RTE_FLOW_ACTION_TYPE_COUNT] = MLX5DR_ACTION_TYP_CTR,
 	[RTE_FLOW_ACTION_TYPE_CONNTRACK] = MLX5DR_ACTION_TYP_ASO_CT,
+	[RTE_FLOW_ACTION_TYPE_OF_POP_VLAN] = MLX5DR_ACTION_TYP_POP_VLAN,
+	[RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN] = MLX5DR_ACTION_TYP_PUSH_VLAN,
 };
 
 static int
@@ -3074,6 +3233,14 @@ flow_hw_dr_actions_template_create(struct rte_flow_actions_template *at)
 				goto err_actions_num;
 			action_types[curr_off++] = MLX5DR_ACTION_TYP_FT;
 			break;
+		case RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN:
+			type = mlx5_hw_dr_action_types[at->actions[i].type];
+			at->actions_off[i] = curr_off;
+			action_types[curr_off++] = type;
+			i += is_of_vlan_pcp_present(at->actions + i) ?
+				MLX5_HW_VLAN_PUSH_PCP_IDX :
+				MLX5_HW_VLAN_PUSH_VID_IDX;
+			break;
 		default:
 			type = mlx5_hw_dr_action_types[at->actions[i].type];
 			at->actions_off[i] = curr_off;
@@ -3101,6 +3268,95 @@ flow_hw_dr_actions_template_create(struct rte_flow_actions_template *at)
 	return NULL;
 }
 
+static void
+flow_hw_set_vlan_vid(struct rte_eth_dev *dev,
+		     const struct rte_flow_action *actions,
+		     const struct rte_flow_action *masks,
+		     struct rte_flow_action *ra, struct rte_flow_action *rm,
+		     struct rte_flow_action_modify_field *spec,
+		     struct rte_flow_action_modify_field *mask,
+		     uint32_t act_num, int set_vlan_vid_ix)
+{
+	struct rte_flow_error error;
+	const bool masked = masks[set_vlan_vid_ix].conf &&
+		(((const struct rte_flow_action_of_set_vlan_vid *)
+			masks[set_vlan_vid_ix].conf)->vlan_vid != 0);
+	const struct rte_flow_action_of_set_vlan_vid *conf =
+		actions[set_vlan_vid_ix].conf;
+	rte_be16_t vid = masked ? conf->vlan_vid : 0;
+	int width = mlx5_flow_item_field_width(dev, RTE_FLOW_FIELD_VLAN_ID, 0,
+					       NULL, &error);
+	if (actions == ra) {
+		size_t copy_sz = sizeof(ra[0]) * act_num;
+		rte_memcpy(ra, actions, copy_sz);
+		rte_memcpy(rm, masks, copy_sz);
+	}
+	*spec = (typeof(*spec)) {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = RTE_FLOW_FIELD_VLAN_ID,
+			.level = 0, .offset = 0,
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+			.level = vid,
+			.offset = 0,
+		},
+		.width = width,
+	};
+	*mask = (typeof(*mask)) {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = RTE_FLOW_FIELD_VLAN_ID,
+			.level = 0xffffffff, .offset = 0xffffffff,
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+			.level = masked ? (1U << width) - 1 : 0,
+			.offset = 0,
+		},
+		.width = 0xffffffff,
+	};
+	ra[set_vlan_vid_ix].type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD;
+	ra[set_vlan_vid_ix].conf = spec;
+	rm[set_vlan_vid_ix].type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD;
+	rm[set_vlan_vid_ix].conf = mask;
+}
+
+static __rte_always_inline int
+flow_hw_set_vlan_vid_construct(struct rte_eth_dev *dev,
+			       struct mlx5_hw_q_job *job,
+			       struct mlx5_action_construct_data *act_data,
+			       const struct mlx5_hw_actions *hw_acts,
+			       const struct rte_flow_action *action)
+{
+	struct rte_flow_error error;
+	rte_be16_t vid = ((const struct rte_flow_action_of_set_vlan_vid *)
+			   action->conf)->vlan_vid;
+	int width = mlx5_flow_item_field_width(dev, RTE_FLOW_FIELD_VLAN_ID, 0,
+					       NULL, &error);
+	struct rte_flow_action_modify_field conf = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = RTE_FLOW_FIELD_VLAN_ID,
+			.level = 0, .offset = 0,
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+			.level = vid,
+			.offset = 0,
+		},
+		.width = width,
+	};
+	struct rte_flow_action modify_action = {
+		.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+		.conf = &conf
+	};
+
+	return flow_hw_modify_field_construct(job, act_data, hw_acts,
+					      &modify_action);
+}
+
 /**
  * Create flow action template.
  *
@@ -3132,8 +3388,11 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 	uint16_t pos = MLX5_HW_MAX_ACTS;
 	struct rte_flow_action tmp_action[MLX5_HW_MAX_ACTS];
 	struct rte_flow_action tmp_mask[MLX5_HW_MAX_ACTS];
-	const struct rte_flow_action *ra;
-	const struct rte_flow_action *rm;
+	struct rte_flow_action *ra = (void *)(uintptr_t)actions;
+	struct rte_flow_action *rm = (void *)(uintptr_t)masks;
+	int set_vlan_vid_ix = -1;
+	struct rte_flow_action_modify_field set_vlan_vid_spec = {0, };
+	struct rte_flow_action_modify_field set_vlan_vid_mask = {0, };
 	const struct rte_flow_action_modify_field rx_mreg = {
 		.operation = RTE_FLOW_MODIFY_SET,
 		.dst = {
@@ -3173,22 +3432,42 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 		return NULL;
 	if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
 	    priv->sh->config.dv_esw_en) {
+		/* Application should make sure only one Q/RSS exists in one rule. */
 		if (flow_hw_action_meta_copy_insert(actions, masks, &rx_cpy, &rx_cpy_mask,
 						    tmp_action, tmp_mask, &pos)) {
 			rte_flow_error_set(error, EINVAL,
 					   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					   "Failed to concatenate new action/mask");
 			return NULL;
+		} else if (pos != MLX5_HW_MAX_ACTS) {
+			ra = tmp_action;
+			rm = tmp_mask;
 		}
 	}
-	/* Application should make sure only one Q/RSS exist in one rule. */
-	if (pos == MLX5_HW_MAX_ACTS) {
-		ra = actions;
-		rm = masks;
-	} else {
-		ra = tmp_action;
-		rm = tmp_mask;
+	for (i = 0; ra[i].type != RTE_FLOW_ACTION_TYPE_END; ++i) {
+		switch (ra[i].type) {
+		/* OF_PUSH_VLAN *MUST* come before OF_SET_VLAN_VID */
+		case RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN:
+			i += is_of_vlan_pcp_present(ra + i) ?
+				MLX5_HW_VLAN_PUSH_PCP_IDX :
+				MLX5_HW_VLAN_PUSH_VID_IDX;
+			break;
+		case RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID:
+			set_vlan_vid_ix = i;
+			break;
+		default:
+			break;
+		}
 	}
+	/* Count flow actions to allocate required space for storing DR offsets. */
+	act_num = i;
+	if (act_num >= MLX5_HW_MAX_ACTS) {
+		rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ACTION,
+				   NULL, "Too many actions");
+		return NULL;
+	}
+	if (set_vlan_vid_ix != -1)
+		flow_hw_set_vlan_vid(dev, actions, masks, ra, rm,
+				     &set_vlan_vid_spec, &set_vlan_vid_mask,
+				     act_num, set_vlan_vid_ix);
 	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, NULL, 0, ra, error);
 	if (act_len <= 0)
 		return NULL;
@@ -3197,10 +3476,6 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 	if (mask_len <= 0)
 		return NULL;
 	len += RTE_ALIGN(mask_len, 16);
-	/* Count flow actions to allocate required space for storing DR offsets. */
-	act_num = 0;
-	for (i = 0; ra[i].type != RTE_FLOW_ACTION_TYPE_END; ++i)
-		act_num++;
 	len += RTE_ALIGN(act_num * sizeof(*at->actions_off), 16);
 	at = mlx5_malloc(MLX5_MEM_ZERO, len + sizeof(*at),
 			 RTE_CACHE_LINE_SIZE, rte_socket_id());
@@ -4719,6 +4994,48 @@ flow_hw_ct_pool_create(struct rte_eth_dev *dev,
 	return NULL;
 }
 
+static void
+flow_hw_destroy_vlan(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	enum mlx5dr_table_type i;
+
+	for (i = MLX5DR_TABLE_TYPE_NIC_RX; i < MLX5DR_TABLE_TYPE_MAX; i++) {
+		if (priv->hw_pop_vlan[i]) {
+			mlx5dr_action_destroy(priv->hw_pop_vlan[i]);
+			priv->hw_pop_vlan[i] = NULL;
+		}
+		if (priv->hw_push_vlan[i]) {
+			mlx5dr_action_destroy(priv->hw_push_vlan[i]);
+			priv->hw_push_vlan[i] = NULL;
+		}
+	}
+}
+
+static int
+flow_hw_create_vlan(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	enum mlx5dr_table_type i;
+	const enum mlx5dr_action_flags flags[MLX5DR_TABLE_TYPE_MAX] = {
+		MLX5DR_ACTION_FLAG_HWS_RX,
+		MLX5DR_ACTION_FLAG_HWS_TX,
+		MLX5DR_ACTION_FLAG_HWS_FDB
+	};
+
+	for (i = MLX5DR_TABLE_TYPE_NIC_RX; i < MLX5DR_TABLE_TYPE_MAX; i++) {
+		priv->hw_pop_vlan[i] =
+			mlx5dr_action_create_pop_vlan(priv->dr_ctx, flags[i]);
+		if (!priv->hw_pop_vlan[i])
+			return -ENOENT;
+		priv->hw_push_vlan[i] =
+			mlx5dr_action_create_push_vlan(priv->dr_ctx, flags[i]);
+		if (!priv->hw_push_vlan[i])
+			return -ENOENT;
+	}
+	return 0;
+}
+
 /**
  * Configure port HWS resources.
  *
@@ -4915,6 +5232,9 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		if (priv->hws_cpool == NULL)
 			goto err;
 	}
+	ret = flow_hw_create_vlan(dev);
+	if (ret)
+		goto err;
 	return 0;
 err:
 	if (priv->hws_ctpool) {
@@ -4928,6 +5248,7 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		if (priv->hw_tag[i])
 			mlx5dr_action_destroy(priv->hw_tag[i]);
 	}
+	flow_hw_destroy_vlan(dev);
 	if (dr_ctx)
 		claim_zero(mlx5dr_context_close(dr_ctx));
 	mlx5_free(priv->hw_q);
@@ -4986,6 +5307,7 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 		if (priv->hw_tag[i])
 			mlx5dr_action_destroy(priv->hw_tag[i]);
 	}
+	flow_hw_destroy_vlan(dev);
 	flow_hw_free_vport_actions(priv);
 	if (priv->acts_ipool) {
 		mlx5_ipool_destroy(priv->acts_ipool);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH 23/27] net/mlx5: add meter color flow matching in dv
  2022-09-23 14:43 [PATCH 00/27] net/mlx5: HW steering PMD update Suanming Mou
                   ` (21 preceding siblings ...)
  2022-09-23 14:43 ` [PATCH 22/27] net/mlx5: add HW steering VLAN push, pop and VID modify flow actions Suanming Mou
@ 2022-09-23 14:43 ` Suanming Mou
  2022-09-23 14:43 ` [PATCH 24/27] net/mlx5: add meter color flow matching in hws Suanming Mou
                   ` (8 subsequent siblings)
  31 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-23 14:43 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, Alexander Kozyrev

From: Alexander Kozyrev <akozyrev@nvidia.com>

Create firmware and software steering meter color support.
Allow matching on a meter color in both root and non-root groups.
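
Not part of this patch, just for reference: a minimal usage sketch of the
METER_COLOR item with the classic synchronous rte_flow API. The helper name
and the assumption that a meter is already attached upstream of group 1 are
hypothetical; error handling is omitted.

#include <rte_meter.h>
#include <rte_flow.h>

/* Hypothetical helper: drop packets colored RED in group 1 (non-root). */
static struct rte_flow *
drop_red_packets(uint16_t port_id, struct rte_flow_error *err)
{
	const struct rte_flow_attr attr = { .group = 1, .ingress = 1 };
	const struct rte_flow_item_meter_color color_spec = {
		.color = RTE_COLOR_RED,
	};
	const struct rte_flow_item pattern[] = {
		{
			.type = RTE_FLOW_ITEM_TYPE_METER_COLOR,
			.spec = &color_spec,
			/* NULL mask: the default meter color mask is used. */
		},
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};
	const struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_DROP },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};

	return rte_flow_create(port_id, &attr, pattern, actions, err);
}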

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
---
 drivers/net/mlx5/mlx5_flow.h    |   3 +
 drivers/net/mlx5/mlx5_flow_dv.c | 113 ++++++++++++++++++++++++++++++++
 2 files changed, 116 insertions(+)

diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 2d1a9dba27..99d3c40f36 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -208,6 +208,9 @@ enum mlx5_feature_name {
 #define MLX5_FLOW_ITEM_PORT_REPRESENTOR (UINT64_C(1) << 41)
 #define MLX5_FLOW_ITEM_REPRESENTED_PORT (UINT64_C(1) << 42)
 
+/* Meter color item */
+#define MLX5_FLOW_ITEM_METER_COLOR (UINT64_C(1) << 44)
+
 /* Outer Masks. */
 #define MLX5_FLOW_LAYER_OUTER_L3 \
 	(MLX5_FLOW_LAYER_OUTER_L3_IPV4 | MLX5_FLOW_LAYER_OUTER_L3_IPV6)
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 36059beb71..e1db68b532 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -3676,6 +3676,69 @@ flow_dv_validate_action_aso_ct(struct rte_eth_dev *dev,
 	return 0;
 }
 
+/**
+ * Validate METER_COLOR item.
+ *
+ * @param[in] dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param[in] item
+ *   Item specification.
+ * @param[in] attr
+ *   Attributes of flow that includes this item.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_dv_validate_item_meter_color(struct rte_eth_dev *dev,
+			   const struct rte_flow_item *item,
+			   const struct rte_flow_attr *attr __rte_unused,
+			   struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_item_meter_color *spec = item->spec;
+	const struct rte_flow_item_meter_color *mask = item->mask;
+	struct rte_flow_item_meter_color nic_mask = {
+		.color = RTE_COLORS
+	};
+	int ret;
+
+	if (priv->mtr_color_reg == REG_NON)
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_ITEM, item,
+					  "meter color register"
+					  " isn't available");
+	ret = mlx5_flow_get_reg_id(dev, MLX5_MTR_COLOR, 0, error);
+	if (ret < 0)
+		return ret;
+	if (!spec)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ITEM_SPEC,
+					  item->spec,
+					  "data cannot be empty");
+	if (spec->color > RTE_COLORS)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION_CONF,
+					  &spec->color,
+					  "meter color is invalid");
+	if (!mask)
+		mask = &rte_flow_item_meter_color_mask;
+	if (!mask->color)
+		return rte_flow_error_set(error, EINVAL,
+					RTE_FLOW_ERROR_TYPE_ITEM_SPEC, NULL,
+					"mask cannot be zero");
+
+	ret = mlx5_flow_item_acceptable(item, (const uint8_t *)mask,
+				(const uint8_t *)&nic_mask,
+				sizeof(struct rte_flow_item_meter_color),
+				MLX5_ITEM_RANGE_NOT_ACCEPTED, error);
+	if (ret < 0)
+		return ret;
+	return 0;
+}
+
 int
 flow_dv_encap_decap_match_cb(void *tool_ctx __rte_unused,
 			     struct mlx5_list_entry *entry, void *cb_ctx)
@@ -7410,6 +7473,13 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			if (ret < 0)
 				return ret;
 			break;
+		case RTE_FLOW_ITEM_TYPE_METER_COLOR:
+			ret = flow_dv_validate_item_meter_color(dev, items,
+								attr, error);
+			if (ret < 0)
+				return ret;
+			last_item = MLX5_FLOW_ITEM_METER_COLOR;
+			break;
 		default:
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ITEM,
@@ -10485,6 +10555,45 @@ flow_dv_translate_item_flex(struct rte_eth_dev *dev, void *matcher, void *key,
 	mlx5_flex_flow_translate_item(dev, matcher, key, item, is_inner);
 }
 
+/**
+ * Add METER_COLOR item to matcher
+ *
+ * @param[in] dev
+ *   The device to configure through.
+ * @param[in, out] key
+ *   Flow matcher value.
+ * @param[in] item
+ *   Flow pattern to translate.
+ * @param[in] key_type
+ *   Set flow matcher mask or value.
+ */
+static void
+flow_dv_translate_item_meter_color(struct rte_eth_dev *dev, void *key,
+			    const struct rte_flow_item *item,
+			    uint32_t key_type)
+{
+	const struct rte_flow_item_meter_color *color_m = item->mask;
+	const struct rte_flow_item_meter_color *color_v = item->spec;
+	uint32_t value, mask;
+	int reg = REG_NON;
+
+	MLX5_ASSERT(color_v);
+	if (MLX5_ITEM_VALID(item, key_type))
+		return;
+	MLX5_ITEM_UPDATE(item, key_type, color_v, color_m,
+		&rte_flow_item_meter_color_mask);
+	value = rte_col_2_mlx5_col(color_v->color);
+	mask = color_m ?
+		color_m->color : (UINT32_C(1) << MLX5_MTR_COLOR_BITS) - 1;
+	if (!!(key_type & MLX5_SET_MATCHER_SW))
+		reg = mlx5_flow_get_reg_id(dev, MLX5_MTR_COLOR, 0, NULL);
+	else
+		reg = flow_hw_get_reg_id(RTE_FLOW_ITEM_TYPE_METER_COLOR, 0);
+	if (reg == REG_NON)
+		return;
+	flow_dv_match_meta_reg(key, (enum modify_reg)reg, value, mask);
+}
+
 static uint32_t matcher_zero[MLX5_ST_SZ_DW(fte_match_param)] = { 0 };
 
 #define HEADER_IS_ZERO(match_criteria, headers)				     \
@@ -13234,6 +13343,10 @@ flow_dv_translate_items(struct rte_eth_dev *dev,
 		/* No other protocol should follow eCPRI layer. */
 		last_item = MLX5_FLOW_LAYER_ECPRI;
 		break;
+	case RTE_FLOW_ITEM_TYPE_METER_COLOR:
+		flow_dv_translate_item_meter_color(dev, key, items, key_type);
+		last_item = MLX5_FLOW_ITEM_METER_COLOR;
+		break;
 	default:
 		break;
 	}
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH 24/27] net/mlx5: add meter color flow matching in hws
  2022-09-23 14:43 [PATCH 00/27] net/mlx5: HW steering PMD update Suanming Mou
                   ` (22 preceding siblings ...)
  2022-09-23 14:43 ` [PATCH 23/27] net/mlx5: add meter color flow matching in dv Suanming Mou
@ 2022-09-23 14:43 ` Suanming Mou
  2022-09-23 14:43 ` [PATCH 25/27] net/mlx5: implement profile/policy get Suanming Mou
                   ` (7 subsequent siblings)
  31 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-23 14:43 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, Alexander Kozyrev

From: Alexander Kozyrev <akozyrev@nvidia.com>

Create hardware steering meter color support.
Allow matching on a meter color using hardware steering.
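
Not part of this patch, just for reference: a minimal sketch of a pattern
template carrying the METER_COLOR item for the template (async) API; the
actual color value is supplied per rule at enqueue time. The helper name and
the assumption that the port was already set up with rte_flow_configure()
are hypothetical.

#include <rte_meter.h>
#include <rte_flow.h>

/* Hypothetical helper: ingress pattern template matching on meter color. */
static struct rte_flow_pattern_template *
color_pattern_template(uint16_t port_id, struct rte_flow_error *err)
{
	const struct rte_flow_pattern_template_attr attr = { .ingress = 1 };
	const struct rte_flow_item_meter_color color_mask = {
		.color = RTE_COLORS,	/* all color bits are relevant */
	};
	const struct rte_flow_item pattern[] = {
		{
			.type = RTE_FLOW_ITEM_TYPE_METER_COLOR,
			.mask = &color_mask,
		},
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};

	return rte_flow_pattern_template_create(port_id, &attr, pattern, err);
}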

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
---
 drivers/net/mlx5/mlx5_flow.h    |  1 +
 drivers/net/mlx5/mlx5_flow_dv.c | 32 ++++++++++++++++++++++++++++++--
 drivers/net/mlx5/mlx5_flow_hw.c | 12 ++++++++++++
 3 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 99d3c40f36..514903dbe1 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1526,6 +1526,7 @@ flow_hw_get_reg_id(enum rte_flow_item_type type, uint32_t id)
 		 */
 		return REG_A;
 	case RTE_FLOW_ITEM_TYPE_CONNTRACK:
+	case RTE_FLOW_ITEM_TYPE_METER_COLOR:
 		return mlx5_flow_hw_aso_tag;
 	case RTE_FLOW_ITEM_TYPE_TAG:
 		MLX5_ASSERT(id < MLX5_FLOW_HW_TAGS_MAX);
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index e1db68b532..0785734217 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -1387,6 +1387,7 @@ mlx5_flow_item_field_width(struct rte_eth_dev *dev,
 		return inherit < 0 ? 0 : inherit;
 	case RTE_FLOW_FIELD_IPV4_ECN:
 	case RTE_FLOW_FIELD_IPV6_ECN:
+	case RTE_FLOW_FIELD_METER_COLOR:
 		return 2;
 	default:
 		MLX5_ASSERT(false);
@@ -1846,6 +1847,31 @@ mlx5_flow_field_id_to_modify_info
 				info[idx].offset = data->offset;
 		}
 		break;
+	case RTE_FLOW_FIELD_METER_COLOR:
+		{
+			const uint32_t color_mask =
+				(UINT32_C(1) << MLX5_MTR_COLOR_BITS) - 1;
+			int reg;
+
+			if (priv->sh->config.dv_flow_en == 2)
+				reg = flow_hw_get_reg_id
+					(RTE_FLOW_ITEM_TYPE_METER_COLOR, 0);
+			else
+				reg = mlx5_flow_get_reg_id(dev, MLX5_MTR_COLOR,
+						       0, error);
+			if (reg < 0)
+				return;
+			MLX5_ASSERT(reg != REG_NON);
+			MLX5_ASSERT((unsigned int)reg < RTE_DIM(reg_to_field));
+			info[idx] = (struct field_modify_info){4, 0,
+						reg_to_field[reg]};
+			if (mask)
+				mask[idx] = flow_modify_info_mask_32_masked
+					(width, data->offset, color_mask);
+			else
+				info[idx].offset = data->offset;
+		}
+		break;
 	case RTE_FLOW_FIELD_POINTER:
 	case RTE_FLOW_FIELD_VALUE:
 	default:
@@ -1893,7 +1919,7 @@ flow_dv_convert_action_modify_field
 	uint32_t type, meta = 0;
 
 	if (conf->src.field == RTE_FLOW_FIELD_POINTER ||
-	    conf->src.field == RTE_FLOW_FIELD_VALUE) {
+	    conf->src.field == RTE_FLOW_FIELD_VALUE) {
 		type = MLX5_MODIFICATION_TYPE_SET;
 		/** For SET fill the destination field (field) first. */
 		mlx5_flow_field_id_to_modify_info(&conf->dst, field, mask,
@@ -1902,7 +1928,9 @@ flow_dv_convert_action_modify_field
 		item.spec = conf->src.field == RTE_FLOW_FIELD_POINTER ?
 					(void *)(uintptr_t)conf->src.pvalue :
 					(void *)(uintptr_t)&conf->src.value;
-		if (conf->dst.field == RTE_FLOW_FIELD_META) {
+		if (conf->dst.field == RTE_FLOW_FIELD_META ||
+		    conf->dst.field == RTE_FLOW_FIELD_TAG ||
+		    conf->dst.field == RTE_FLOW_FIELD_METER_COLOR) {
 			meta = *(const unaligned_uint32_t *)item.spec;
 			meta = rte_cpu_to_be_32(meta);
 			item.spec = &meta;
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 7e7b48f884..87b3e34cb4 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -870,6 +870,7 @@ flow_hw_modify_field_compile(struct rte_eth_dev *dev,
 				(void *)(uintptr_t)&conf->src.value;
 		if (conf->dst.field == RTE_FLOW_FIELD_META ||
 		    conf->dst.field == RTE_FLOW_FIELD_TAG ||
+		    conf->dst.field == RTE_FLOW_FIELD_METER_COLOR ||
 		    conf->dst.field == (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG) {
 			value = *(const unaligned_uint32_t *)item.spec;
 			value = rte_cpu_to_be_32(value);
@@ -1702,6 +1703,7 @@ flow_hw_modify_field_construct(struct mlx5_hw_q_job *job,
 		rte_memcpy(values, mhdr_action->src.pvalue, sizeof(values));
 	if (mhdr_action->dst.field == RTE_FLOW_FIELD_META ||
 	    mhdr_action->dst.field == RTE_FLOW_FIELD_TAG ||
+	    mhdr_action->dst.field == RTE_FLOW_FIELD_METER_COLOR ||
 	    mhdr_action->dst.field == (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG) {
 		value_p = (unaligned_uint32_t *)values;
 		*value_p = rte_cpu_to_be_32(*value_p);
@@ -3704,6 +3706,16 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 								  " attribute");
 			}
 			break;
+		case RTE_FLOW_ITEM_TYPE_METER_COLOR:
+		{
+			int reg = flow_hw_get_reg_id(RTE_FLOW_ITEM_TYPE_METER_COLOR, 0);
+			if (reg == REG_NON)
+				return rte_flow_error_set(error, EINVAL,
+							  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+							  NULL,
+							  "Unsupported meter color register");
+			break;
+		}
 		case RTE_FLOW_ITEM_TYPE_VOID:
 		case RTE_FLOW_ITEM_TYPE_ETH:
 		case RTE_FLOW_ITEM_TYPE_VLAN:
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH 25/27] net/mlx5: implement profile/policy get
  2022-09-23 14:43 [PATCH 00/27] net/mlx5: HW steering PMD update Suanming Mou
                   ` (23 preceding siblings ...)
  2022-09-23 14:43 ` [PATCH 24/27] net/mlx5: add meter color flow matching in hws Suanming Mou
@ 2022-09-23 14:43 ` Suanming Mou
  2022-09-23 14:43 ` [PATCH 26/27] net/mlx5: implement METER MARK action for HWS Suanming Mou
                   ` (6 subsequent siblings)
  31 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-23 14:43 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, Alexander Kozyrev

From: Alexander Kozyrev <akozyrev@nvidia.com>

Add callback functions for both software and hardware steering
to get pointers to a meter profile/policy by their IDs.
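
Not part of this patch, just for reference: a minimal usage sketch assuming
the rte_mtr_meter_profile_get()/rte_mtr_meter_policy_get() prototypes from
the dependent ethdev meter configuration series; the helper name is
hypothetical.

#include <errno.h>
#include <rte_mtr.h>

/* Hypothetical helper: look up previously added profile/policy objects so
 * they can later be referenced from a METER_MARK flow action.
 */
static int
lookup_meter_objects(uint16_t port_id, uint32_t profile_id, uint32_t policy_id,
		     struct rte_flow_meter_profile **profile,
		     struct rte_flow_meter_policy **policy)
{
	struct rte_mtr_error error;

	*profile = rte_mtr_meter_profile_get(port_id, profile_id, &error);
	*policy = rte_mtr_meter_policy_get(port_id, policy_id, &error);
	return (*profile != NULL && *policy != NULL) ? 0 : -ENOENT;
}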

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
---
 drivers/net/mlx5/mlx5_flow_meter.c | 65 ++++++++++++++++++++++++++++++
 1 file changed, 65 insertions(+)

diff --git a/drivers/net/mlx5/mlx5_flow_meter.c b/drivers/net/mlx5/mlx5_flow_meter.c
index 7221bfb642..893dc42cef 100644
--- a/drivers/net/mlx5/mlx5_flow_meter.c
+++ b/drivers/net/mlx5/mlx5_flow_meter.c
@@ -741,6 +741,36 @@ mlx5_flow_meter_profile_delete(struct rte_eth_dev *dev,
 	return 0;
 }
 
+/**
+ * Callback to get MTR profile.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] meter_profile_id
+ *   Meter profile id.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   A valid handle in case of success, NULL otherwise.
+ */
+static struct rte_flow_meter_profile *
+mlx5_flow_meter_profile_get(struct rte_eth_dev *dev,
+			  uint32_t meter_profile_id,
+			  struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (!priv->mtr_en) {
+		rte_mtr_error_set(error, ENOTSUP,
+				  RTE_MTR_ERROR_TYPE_UNSPECIFIED, NULL,
+				  "Meter is not supported");
+		return NULL;
+	}
+	return (void *)(uintptr_t)mlx5_flow_meter_profile_find(priv,
+							meter_profile_id);
+}
+
 /**
  * Callback to add MTR profile with HWS.
  *
@@ -1303,6 +1333,37 @@ mlx5_flow_meter_policy_delete(struct rte_eth_dev *dev,
 	return 0;
 }
 
+/**
+ * Callback to get MTR policy.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] policy_id
+ *   Meter policy id.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   A valid handle in case of success, NULL otherwise.
+ */
+static struct rte_flow_meter_policy *
+mlx5_flow_meter_policy_get(struct rte_eth_dev *dev,
+			  uint32_t policy_id,
+			  struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint32_t policy_idx;
+
+	if (!priv->mtr_en) {
+		rte_mtr_error_set(error, ENOTSUP,
+				  RTE_MTR_ERROR_TYPE_UNSPECIFIED, NULL,
+				  "Meter is not supported");
+		return NULL;
+	}
+	return (void *)(uintptr_t)mlx5_flow_meter_policy_find(dev, policy_id,
+							      &policy_idx);
+}
+
 /**
  * Callback to delete MTR policy for HWS.
  *
@@ -2554,9 +2615,11 @@ static const struct rte_mtr_ops mlx5_flow_mtr_ops = {
 	.capabilities_get = mlx5_flow_mtr_cap_get,
 	.meter_profile_add = mlx5_flow_meter_profile_add,
 	.meter_profile_delete = mlx5_flow_meter_profile_delete,
+	.meter_profile_get = mlx5_flow_meter_profile_get,
 	.meter_policy_validate = mlx5_flow_meter_policy_validate,
 	.meter_policy_add = mlx5_flow_meter_policy_add,
 	.meter_policy_delete = mlx5_flow_meter_policy_delete,
+	.meter_policy_get = mlx5_flow_meter_policy_get,
 	.create = mlx5_flow_meter_create,
 	.destroy = mlx5_flow_meter_destroy,
 	.meter_enable = mlx5_flow_meter_enable,
@@ -2571,9 +2634,11 @@ static const struct rte_mtr_ops mlx5_flow_mtr_hws_ops = {
 	.capabilities_get = mlx5_flow_mtr_cap_get,
 	.meter_profile_add = mlx5_flow_meter_profile_hws_add,
 	.meter_profile_delete = mlx5_flow_meter_profile_hws_delete,
+	.meter_profile_get = mlx5_flow_meter_profile_get,
 	.meter_policy_validate = mlx5_flow_meter_policy_hws_validate,
 	.meter_policy_add = mlx5_flow_meter_policy_hws_add,
 	.meter_policy_delete = mlx5_flow_meter_policy_hws_delete,
+	.meter_policy_get = mlx5_flow_meter_policy_get,
 	.create = mlx5_flow_meter_hws_create,
 	.destroy = mlx5_flow_meter_hws_destroy,
 	.meter_enable = mlx5_flow_meter_enable,
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH 26/27] net/mlx5: implement METER MARK action for HWS
  2022-09-23 14:43 [PATCH 00/27] net/mlx5: HW steering PMD update Suanming Mou
                   ` (24 preceding siblings ...)
  2022-09-23 14:43 ` [PATCH 25/27] net/mlx5: implement profile/policy get Suanming Mou
@ 2022-09-23 14:43 ` Suanming Mou
  2022-09-23 14:43 ` [PATCH 27/27] net/mlx5: implement METER MARK indirect " Suanming Mou
                   ` (5 subsequent siblings)
  31 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-23 14:43 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, Alexander Kozyrev

From: Alexander Kozyrev <akozyrev@nvidia.com>

Implement the METER_MARK action for the hardware steering case.
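
Not part of this patch, just for reference: a minimal sketch of an actions
template where METER_MARK is fully specified (masked) at template time, so
the PMD can translate it once. The helper name is hypothetical and the
profile pointer is assumed to come from rte_mtr_meter_profile_get().

#include <rte_flow.h>

/* Hypothetical helper: ingress actions template with a masked METER_MARK
 * followed by a QUEUE fate action.
 */
static struct rte_flow_actions_template *
meter_mark_actions_template(uint16_t port_id,
			    struct rte_flow_meter_profile *profile,
			    struct rte_flow_error *err)
{
	const struct rte_flow_actions_template_attr attr = { .ingress = 1 };
	const struct rte_flow_action_meter_mark mtr = {
		.profile = profile,
		.color_mode = 0,	/* color-blind */
		.state = 1,		/* meter enabled */
	};
	const struct rte_flow_action_queue queue = { .index = 0 };
	const struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_METER_MARK, .conf = &mtr },
		{ .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	/* A non-NULL mask conf marks the action as fixed at template time. */
	const struct rte_flow_action masks[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_METER_MARK, .conf = &mtr },
		{ .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};

	return rte_flow_actions_template_create(port_id, &attr, actions,
						masks, err);
}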

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
---
 drivers/net/mlx5/mlx5.h            |   9 ++-
 drivers/net/mlx5/mlx5_flow.h       |   2 +
 drivers/net/mlx5/mlx5_flow_aso.c   |   7 +-
 drivers/net/mlx5/mlx5_flow_hw.c    | 116 +++++++++++++++++++++++++++--
 drivers/net/mlx5/mlx5_flow_meter.c | 107 ++++++++++++++++++--------
 5 files changed, 204 insertions(+), 37 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index ec08014832..ff02d4cf13 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -969,12 +969,16 @@ enum mlx5_aso_mtr_type {
 
 /* Generic aso_flow_meter information. */
 struct mlx5_aso_mtr {
-	LIST_ENTRY(mlx5_aso_mtr) next;
+	union {
+		LIST_ENTRY(mlx5_aso_mtr) next;
+		struct mlx5_aso_mtr_pool *pool;
+	};
 	enum mlx5_aso_mtr_type type;
 	struct mlx5_flow_meter_info fm;
 	/**< Pointer to the next aso flow meter structure. */
 	uint8_t state; /**< ASO flow meter state. */
 	uint32_t offset;
+	enum rte_color init_color;
 };
 
 /* Generic aso_flow_meter pool structure. */
@@ -983,6 +987,8 @@ struct mlx5_aso_mtr_pool {
 	/*Must be the first in pool*/
 	struct mlx5_devx_obj *devx_obj;
 	/* The devx object of the minimum aso flow meter ID. */
+	struct mlx5dr_action *action; /* HWS action. */
+	struct mlx5_indexed_pool *idx_pool; /* HWS index pool. */
 	uint32_t index; /* Pool index in management structure. */
 };
 
@@ -1670,6 +1676,7 @@ struct mlx5_priv {
 	struct mlx5_indexed_pool *acts_ipool; /* Action data indexed pool. */
 	struct mlx5_hws_cnt_pool *hws_cpool; /* HW steering's counter pool. */
 	struct mlx5_aso_ct_pool *hws_ctpool; /* HW steering's CT pool. */
+	struct mlx5_aso_mtr_pool *hws_mpool; /* Meter mark indexed pool. */
 #endif
 };
 
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 514903dbe1..e1eb0ab697 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1112,6 +1112,7 @@ struct rte_flow_hw {
 	struct rte_flow_template_table *table; /* The table flow allcated from. */
 	struct mlx5dr_rule rule; /* HWS layer data struct. */
 	uint32_t cnt_id;
+	uint32_t mtr_id;
 } __rte_packed;
 
 /* rte flow action translate to DR action struct. */
@@ -1241,6 +1242,7 @@ struct mlx5_hw_actions {
 	uint16_t encap_decap_pos; /* Encap/Decap action position. */
 	uint32_t mark:1; /* Indicate the mark action. */
 	uint32_t cnt_id; /* Counter id. */
+	uint32_t mtr_id; /* Meter id. */
 	/* Translated DR action array from action template. */
 	struct mlx5dr_rule_action rule_acts[MLX5_HW_MAX_ACTS];
 };
diff --git a/drivers/net/mlx5/mlx5_flow_aso.c b/drivers/net/mlx5/mlx5_flow_aso.c
index 34fed3f4b8..8bb7d4ef39 100644
--- a/drivers/net/mlx5/mlx5_flow_aso.c
+++ b/drivers/net/mlx5/mlx5_flow_aso.c
@@ -700,8 +700,11 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	fm = &aso_mtr->fm;
 	sq->elts[sq->head & mask].mtr = aso_mtr;
 	if (aso_mtr->type == ASO_METER_INDIRECT) {
-		pool = container_of(aso_mtr, struct mlx5_aso_mtr_pool,
-				    mtrs[aso_mtr->offset]);
+		if (likely(sh->config.dv_flow_en == 2))
+			pool = aso_mtr->pool;
+		else
+			pool = container_of(aso_mtr, struct mlx5_aso_mtr_pool,
+					    mtrs[aso_mtr->offset]);
 		id = pool->devx_obj->id;
 	} else {
 		id = bulk->devx_obj->id;
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 87b3e34cb4..90a6c0c78f 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -395,6 +395,10 @@ __flow_hw_action_template_destroy(struct rte_eth_dev *dev,
 		mlx5_hws_cnt_shared_put(priv->hws_cpool, &acts->cnt_id);
 		acts->cnt_id = 0;
 	}
+	if (acts->mtr_id) {
+		mlx5_ipool_free(priv->hws_mpool->idx_pool, acts->mtr_id);
+		acts->mtr_id = 0;
+	}
 }
 
 /**
@@ -1096,6 +1100,70 @@ static rte_be32_t vlan_hdr_to_be32(const struct rte_flow_action *actions)
 #endif
 }
 
+static __rte_always_inline struct mlx5_aso_mtr *
+flow_hw_meter_mark_alloc(struct rte_eth_dev *dev,
+			   const struct rte_flow_action *action)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
+	const struct rte_flow_action_meter_mark *meter_mark = action->conf;
+	struct mlx5_aso_mtr *aso_mtr;
+	struct mlx5_flow_meter_info *fm;
+	uint32_t mtr_id;
+
+	aso_mtr = mlx5_ipool_malloc(priv->hws_mpool->idx_pool, &mtr_id);
+	if (!aso_mtr)
+		return NULL;
+	/* Fill the flow meter parameters. */
+	aso_mtr->type = ASO_METER_INDIRECT;
+	fm = &aso_mtr->fm;
+	fm->meter_id = mtr_id;
+	fm->profile = (struct mlx5_flow_meter_profile *)(meter_mark->profile);
+	fm->is_enable = meter_mark->state;
+	fm->color_aware = meter_mark->color_mode;
+	aso_mtr->pool = pool;
+	aso_mtr->state = ASO_METER_WAIT;
+	aso_mtr->offset = mtr_id - 1;
+	aso_mtr->init_color = (meter_mark->color_mode) ?
+		meter_mark->init_color : RTE_COLOR_GREEN;
+	/* Update ASO flow meter by wqe. */
+	if (mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr, &priv->mtr_bulk)) {
+		mlx5_ipool_free(pool->idx_pool, mtr_id);
+		return NULL;
+	}
+	/* Wait for ASO object completion. */
+	if (mlx5_aso_mtr_wait(priv->sh, aso_mtr)) {
+		mlx5_ipool_free(pool->idx_pool, mtr_id);
+		return NULL;
+	}
+	return aso_mtr;
+}
+
+static __rte_always_inline int
+flow_hw_meter_mark_compile(struct rte_eth_dev *dev,
+			   uint16_t aso_mtr_pos,
+			   const struct rte_flow_action *action,
+			   struct mlx5dr_rule_action *acts,
+			   uint32_t *index)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
+	struct mlx5_aso_mtr *aso_mtr;
+
+	aso_mtr = flow_hw_meter_mark_alloc(dev, action);
+	if (!aso_mtr)
+		return -1;
+
+	/* Compile METER_MARK action */
+	acts[aso_mtr_pos].action = pool->action;
+	acts[aso_mtr_pos].aso_meter.offset = aso_mtr->offset;
+	acts[aso_mtr_pos].aso_meter.init_color =
+		(enum mlx5dr_action_aso_meter_color)
+		rte_col_2_mlx5_col(aso_mtr->init_color);
+	*index = aso_mtr->fm.meter_id;
+	return 0;
+}
+
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -1403,6 +1471,23 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
 				goto err;
 			}
 			break;
+		case RTE_FLOW_ACTION_TYPE_METER_MARK:
+			action_pos = at->actions_off[actions - action_start];
+			if (actions->conf && masks->conf &&
+			    ((const struct rte_flow_action_meter_mark *)
+			     masks->conf)->profile) {
+				ret = flow_hw_meter_mark_compile(dev,
+							action_pos, actions,
+							acts->rule_acts,
+							&acts->mtr_id);
+				if (ret)
+					goto err;
+			} else if (__flow_hw_act_data_general_append(priv, acts,
+							actions->type,
+							actions - action_start,
+							action_pos))
+				goto err;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -1788,7 +1873,6 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	size_t encap_len = 0;
 	int ret;
 	struct mlx5_aso_mtr *mtr;
-	uint32_t mtr_id;
 
 	rte_memcpy(rule_acts, hw_acts->rule_acts, sizeof(*rule_acts) * at->dr_actions_num);
 	attr.group = table->grp->group_id;
@@ -1822,6 +1906,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		struct mlx5_hrxq *hrxq;
 		uint32_t ct_idx;
 		cnt_id_t cnt_id;
+		uint32_t mtr_id;
 
 		action = &actions[act_data->action_src];
 		/*
@@ -1928,13 +2013,13 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_METER:
 			meter = action->conf;
 			mtr_id = meter->mtr_id;
-			mtr = mlx5_aso_meter_by_idx(priv, mtr_id);
+			aso_mtr = mlx5_aso_meter_by_idx(priv, mtr_id);
 			rule_acts[act_data->action_dst].action =
 				priv->mtr_bulk.action;
 			rule_acts[act_data->action_dst].aso_meter.offset =
-								mtr->offset;
+								aso_mtr->offset;
 			jump = flow_hw_jump_action_register
-				(dev, &table->cfg, mtr->fm.group, NULL);
+				(dev, &table->cfg, aso_mtr->fm.group, NULL);
 			if (!jump)
 				return -1;
 			MLX5_ASSERT
@@ -1944,7 +2029,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 							 jump->root_action;
 			job->flow->jump = jump;
 			job->flow->fate_type = MLX5_FLOW_FATE_JUMP;
-			if (mlx5_aso_mtr_wait(priv->sh, mtr))
+			if (mlx5_aso_mtr_wait(priv->sh, aso_mtr))
 				return -1;
 			break;
 		case RTE_FLOW_ACTION_TYPE_COUNT:
@@ -1980,6 +2065,13 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 					       &rule_acts[act_data->action_dst]))
 				return -1;
 			break;
+		case RTE_FLOW_ACTION_TYPE_METER_MARK:
+			ret = flow_hw_meter_mark_compile(dev,
+				act_data->action_dst, action,
+				rule_acts, &job->flow->mtr_id);
+			if (ret != 0)
+				return ret;
+			break;
 		default:
 			break;
 		}
@@ -2242,6 +2334,7 @@ flow_hw_pull(struct rte_eth_dev *dev,
 	     struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
 	struct mlx5_hw_q_job *job;
 	int ret, i;
 
@@ -2266,6 +2359,10 @@ flow_hw_pull(struct rte_eth_dev *dev,
 						&job->flow->cnt_id);
 				job->flow->cnt_id = 0;
 			}
+			if (job->flow->mtr_id) {
+				mlx5_ipool_free(pool->idx_pool,	job->flow->mtr_id);
+				job->flow->mtr_id = 0;
+			}
 			mlx5_ipool_free(job->flow->table->flow, job->flow->idx);
 		}
 		priv->hw_q[queue].job[priv->hw_q[queue].job_idx++] = job;
@@ -3059,6 +3156,9 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_METER:
 			/* TODO: Validation logic */
 			break;
+		case RTE_FLOW_ACTION_TYPE_METER_MARK:
+			/* TODO: Validation logic */
+			break;
 		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
 			ret = flow_hw_validate_action_modify_field(action,
 									mask,
@@ -3243,6 +3343,12 @@ flow_hw_dr_actions_template_create(struct rte_flow_actions_template *at)
 				MLX5_HW_VLAN_PUSH_PCP_IDX :
 				MLX5_HW_VLAN_PUSH_VID_IDX;
 			break;
+		case RTE_FLOW_ACTION_TYPE_METER_MARK:
+			if (curr_off >= MLX5_HW_MAX_ACTS)
+				goto err_actions_num;
+			at->actions_off[i] = curr_off;
+			action_types[curr_off++] = MLX5DR_ACTION_TYP_ASO_METER;
+			break;
 		default:
 			type = mlx5_hw_dr_action_types[at->actions[i].type];
 			at->actions_off[i] = curr_off;
diff --git a/drivers/net/mlx5/mlx5_flow_meter.c b/drivers/net/mlx5/mlx5_flow_meter.c
index 893dc42cef..1c8bb5fc8c 100644
--- a/drivers/net/mlx5/mlx5_flow_meter.c
+++ b/drivers/net/mlx5/mlx5_flow_meter.c
@@ -17,6 +17,13 @@
 
 static int mlx5_flow_meter_disable(struct rte_eth_dev *dev,
 		uint32_t meter_id, struct rte_mtr_error *error);
+/*
+ * The default ipool threshold value indicates which per_core_cache
+ * value to set.
+ */
+#define MLX5_MTR_IPOOL_SIZE_THRESHOLD (1 << 19)
+/* The default min local cache size. */
+#define MLX5_MTR_IPOOL_CACHE_MIN (1 << 9)
 
 static void
 mlx5_flow_meter_uninit(struct rte_eth_dev *dev)
@@ -31,6 +38,11 @@ mlx5_flow_meter_uninit(struct rte_eth_dev *dev)
 		mlx5_free(priv->mtr_profile_arr);
 		priv->mtr_profile_arr = NULL;
 	}
+	if (priv->hws_mpool) {
+		mlx5_ipool_destroy(priv->hws_mpool->idx_pool);
+		mlx5_free(priv->hws_mpool);
+		priv->hws_mpool = NULL;
+	}
 	if (priv->mtr_bulk.aso) {
 		mlx5_free(priv->mtr_bulk.aso);
 		priv->mtr_bulk.aso = NULL;
@@ -62,27 +74,39 @@ mlx5_flow_meter_init(struct rte_eth_dev *dev,
 	uint32_t i;
 	struct rte_mtr_error error;
 	uint32_t flags;
+	uint32_t nb_mtrs = rte_align32pow2(nb_meters);
+	struct mlx5_indexed_pool_config cfg = {
+		.size = sizeof(struct mlx5_aso_mtr),
+		.trunk_size = 1 << 12,
+		.per_core_cache = 1 << 13,
+		.need_lock = 1,
+		.release_mem_en = !!priv->sh->config.reclaim_mode,
+		.malloc = mlx5_malloc,
+		.max_idx = nb_meters,
+		.free = mlx5_free,
+		.type = "mlx5_hw_mtr_mark_action",
+	};
 
 	if (!nb_meters || !nb_meter_profiles || !nb_meter_policies) {
 		ret = ENOTSUP;
 		rte_mtr_error_set(&error, ENOMEM,
-					RTE_MTR_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter configuration is invalid.");
+				  RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter configuration is invalid.");
 		goto err;
 	}
 	if (!priv->mtr_en || !priv->sh->meter_aso_en) {
 		ret = ENOTSUP;
 		rte_mtr_error_set(&error, ENOMEM,
-					RTE_MTR_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter ASO is not supported.");
+				  RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter ASO is not supported.");
 		goto err;
 	}
 	priv->mtr_config.nb_meters = nb_meters;
 	if (mlx5_aso_queue_init(priv->sh, ASO_OPC_MOD_POLICER)) {
 		ret = ENOMEM;
 		rte_mtr_error_set(&error, ENOMEM,
-					RTE_MTR_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter ASO queue allocation failed.");
+				  RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter ASO queue allocation failed.");
 		goto err;
 	}
 	log_obj_size = rte_log2_u32(nb_meters >> 1);
@@ -92,8 +116,8 @@ mlx5_flow_meter_init(struct rte_eth_dev *dev,
 	if (!dcs) {
 		ret = ENOMEM;
 		rte_mtr_error_set(&error, ENOMEM,
-					RTE_MTR_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter ASO object allocation failed.");
+				  RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter ASO object allocation failed.");
 		goto err;
 	}
 	priv->mtr_bulk.devx_obj = dcs;
@@ -101,8 +125,8 @@ mlx5_flow_meter_init(struct rte_eth_dev *dev,
 	if (reg_id < 0) {
 		ret = ENOTSUP;
 		rte_mtr_error_set(&error, ENOMEM,
-					RTE_MTR_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter register is not available.");
+				  RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter register is not available.");
 		goto err;
 	}
 	flags = MLX5DR_ACTION_FLAG_HWS_RX | MLX5DR_ACTION_FLAG_HWS_TX;
@@ -114,19 +138,20 @@ mlx5_flow_meter_init(struct rte_eth_dev *dev,
 	if (!priv->mtr_bulk.action) {
 		ret = ENOMEM;
 		rte_mtr_error_set(&error, ENOMEM,
-					RTE_MTR_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter action creation failed.");
+				  RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter action creation failed.");
 		goto err;
 	}
 	priv->mtr_bulk.aso = mlx5_malloc(MLX5_MEM_ZERO,
-						sizeof(struct mlx5_aso_mtr) * nb_meters,
-						RTE_CACHE_LINE_SIZE,
-						SOCKET_ID_ANY);
+					 sizeof(struct mlx5_aso_mtr) *
+					 nb_meters,
+					 RTE_CACHE_LINE_SIZE,
+					 SOCKET_ID_ANY);
 	if (!priv->mtr_bulk.aso) {
 		ret = ENOMEM;
 		rte_mtr_error_set(&error, ENOMEM,
-					RTE_MTR_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter bulk ASO allocation failed.");
+				  RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter bulk ASO allocation failed.");
 		goto err;
 	}
 	priv->mtr_bulk.size = nb_meters;
@@ -137,32 +162,56 @@ mlx5_flow_meter_init(struct rte_eth_dev *dev,
 		aso->offset = i;
 		aso++;
 	}
+	priv->hws_mpool = mlx5_malloc(MLX5_MEM_ZERO,
+				sizeof(struct mlx5_aso_mtr_pool),
+				RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
+	if (!priv->hws_mpool) {
+		ret = ENOMEM;
+		rte_mtr_error_set(&error, ENOMEM,
+				  RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter ipool allocation failed.");
+		goto err;
+	}
+	priv->hws_mpool->devx_obj = priv->mtr_bulk.devx_obj;
+	priv->hws_mpool->action = priv->mtr_bulk.action;
+	/*
+	 * No need for local cache if Meter number is a small number.
+	 * Since flow insertion rate will be very limited in that case.
+	 * Here let's set the number to less than default trunk size 4K.
+	 */
+	if (nb_mtrs <= cfg.trunk_size) {
+		cfg.per_core_cache = 0;
+		cfg.trunk_size = nb_mtrs;
+	} else if (nb_mtrs <= MLX5_MTR_IPOOL_SIZE_THRESHOLD) {
+		cfg.per_core_cache = MLX5_MTR_IPOOL_CACHE_MIN;
+	}
+	priv->hws_mpool->idx_pool = mlx5_ipool_create(&cfg);
 	priv->mtr_config.nb_meter_profiles = nb_meter_profiles;
 	priv->mtr_profile_arr =
 		mlx5_malloc(MLX5_MEM_ZERO,
-				sizeof(struct mlx5_flow_meter_profile) *
-				nb_meter_profiles,
-				RTE_CACHE_LINE_SIZE,
-				SOCKET_ID_ANY);
+			    sizeof(struct mlx5_flow_meter_profile) *
+			    nb_meter_profiles,
+			    RTE_CACHE_LINE_SIZE,
+			    SOCKET_ID_ANY);
 	if (!priv->mtr_profile_arr) {
 		ret = ENOMEM;
 		rte_mtr_error_set(&error, ENOMEM,
-					RTE_MTR_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter profile allocation failed.");
+				  RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter profile allocation failed.");
 		goto err;
 	}
 	priv->mtr_config.nb_meter_policies = nb_meter_policies;
 	priv->mtr_policy_arr =
 		mlx5_malloc(MLX5_MEM_ZERO,
-				sizeof(struct mlx5_flow_meter_policy) *
-				nb_meter_policies,
-				RTE_CACHE_LINE_SIZE,
-				SOCKET_ID_ANY);
+			    sizeof(struct mlx5_flow_meter_policy) *
+			    nb_meter_policies,
+			    RTE_CACHE_LINE_SIZE,
+			    SOCKET_ID_ANY);
 	if (!priv->mtr_policy_arr) {
 		ret = ENOMEM;
 		rte_mtr_error_set(&error, ENOMEM,
-					RTE_MTR_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter policy allocation failed.");
+				  RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter policy allocation failed.");
 		goto err;
 	}
 	return 0;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH 27/27] net/mlx5: implement METER MARK indirect action for HWS
  2022-09-23 14:43 [PATCH 00/27] net/mlx5: HW steering PMD update Suanming Mou
                   ` (25 preceding siblings ...)
  2022-09-23 14:43 ` [PATCH 26/27] net/mlx5: implement METER MARK action for HWS Suanming Mou
@ 2022-09-23 14:43 ` Suanming Mou
  2022-09-28  3:31 ` [PATCH v2 00/17] net/mlx5: HW steering PMD update Suanming Mou
                   ` (4 subsequent siblings)
  31 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-23 14:43 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, Alexander Kozyrev

From: Alexander Kozyrev <akozyrev@nvidia.com>

Add the ability to create an indirect action handle for METER_MARK.
It allows one meter to be shared between several different flow rules.
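
Not part of this patch, just for reference: a minimal sketch of creating a
shared METER_MARK handle through the async API on a given queue. The helper
name is hypothetical; the returned handle is then referenced from rules via
RTE_FLOW_ACTION_TYPE_INDIRECT, and the completion still has to be collected
with rte_flow_pull().

#include <rte_flow.h>

/* Hypothetical helper: create an indirect (shared) METER_MARK action. */
static struct rte_flow_action_handle *
shared_meter_mark_create(uint16_t port_id, uint32_t queue,
			 struct rte_flow_meter_profile *profile,
			 struct rte_flow_error *err)
{
	const struct rte_flow_op_attr op_attr = { .postpone = 0 };
	const struct rte_flow_indir_action_conf indir_conf = { .ingress = 1 };
	const struct rte_flow_action_meter_mark mtr = {
		.profile = profile,
		.color_mode = 0,	/* color-blind */
		.state = 1,		/* meter enabled */
	};
	const struct rte_flow_action action = {
		.type = RTE_FLOW_ACTION_TYPE_METER_MARK,
		.conf = &mtr,
	};

	return rte_flow_async_action_handle_create(port_id, queue, &op_attr,
						   &indir_conf, &action,
						   NULL, err);
}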

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
---
 drivers/net/mlx5/mlx5_flow.c    |   6 ++
 drivers/net/mlx5/mlx5_flow.h    |  25 ++++-
 drivers/net/mlx5/mlx5_flow_hw.c | 160 +++++++++++++++++++++++++++++++-
 3 files changed, 183 insertions(+), 8 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index cbf9c31984..9627ffc979 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -4221,6 +4221,12 @@ flow_action_handles_translate(struct rte_eth_dev *dev,
 						MLX5_RTE_FLOW_ACTION_TYPE_COUNT;
 			translated[handle->index].conf = (void *)(uintptr_t)idx;
 			break;
+		case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
+			translated[handle->index].type =
+						(enum rte_flow_action_type)
+						MLX5_RTE_FLOW_ACTION_TYPE_METER_MARK;
+			translated[handle->index].conf = (void *)(uintptr_t)idx;
+			break;
 		case MLX5_INDIRECT_ACTION_TYPE_AGE:
 			if (priv->sh->flow_hit_aso_en) {
 				translated[handle->index].type =
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index e1eb0ab697..30b8e1df99 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -47,6 +47,7 @@ enum mlx5_rte_flow_action_type {
 	MLX5_RTE_FLOW_ACTION_TYPE_COUNT,
 	MLX5_RTE_FLOW_ACTION_TYPE_JUMP,
 	MLX5_RTE_FLOW_ACTION_TYPE_RSS,
+	MLX5_RTE_FLOW_ACTION_TYPE_METER_MARK,
 };
 
 /* Private (internal) Field IDs for MODIFY_FIELD action. */
@@ -55,22 +56,35 @@ enum mlx5_rte_flow_field_id {
 			MLX5_RTE_FLOW_FIELD_META_REG,
 };
 
-#define MLX5_INDIRECT_ACTION_TYPE_OFFSET 30
+#define MLX5_INDIRECT_ACTION_TYPE_OFFSET 29
 
 enum {
 	MLX5_INDIRECT_ACTION_TYPE_RSS,
 	MLX5_INDIRECT_ACTION_TYPE_AGE,
 	MLX5_INDIRECT_ACTION_TYPE_COUNT,
 	MLX5_INDIRECT_ACTION_TYPE_CT,
+	MLX5_INDIRECT_ACTION_TYPE_METER_MARK,
 };
 
-/* Now, the maximal ports will be supported is 256, action number is 4M. */
-#define MLX5_INDIRECT_ACT_CT_MAX_PORT 0x100
+enum MLX5_SET_MATCHER {
+	MLX5_SET_MATCHER_SW_V = 1 << 0,
+	MLX5_SET_MATCHER_SW_M = 1 << 1,
+	MLX5_SET_MATCHER_HS_V = 1 << 2,
+	MLX5_SET_MATCHER_HS_M = 1 << 3,
+};
+
+#define MLX5_SET_MATCHER_SW (MLX5_SET_MATCHER_SW_V | MLX5_SET_MATCHER_SW_M)
+#define MLX5_SET_MATCHER_HS (MLX5_SET_MATCHER_HS_V | MLX5_SET_MATCHER_HS_M)
+#define MLX5_SET_MATCHER_V (MLX5_SET_MATCHER_SW_V | MLX5_SET_MATCHER_HS_V)
+#define MLX5_SET_MATCHER_M (MLX5_SET_MATCHER_SW_M | MLX5_SET_MATCHER_HS_M)
+
+/* Now, the maximal number of supported ports is 16, action number is 32M. */
+#define MLX5_INDIRECT_ACT_CT_MAX_PORT 0x10
 
 #define MLX5_INDIRECT_ACT_CT_OWNER_SHIFT 22
 #define MLX5_INDIRECT_ACT_CT_OWNER_MASK (MLX5_INDIRECT_ACT_CT_MAX_PORT - 1)
 
-/* 30-31: type, 22-29: owner port, 0-21: index. */
+/* 29-31: type, 25-28: owner port, 0-24: index */
 #define MLX5_INDIRECT_ACT_CT_GEN_IDX(owner, index) \
 	((MLX5_INDIRECT_ACTION_TYPE_CT << MLX5_INDIRECT_ACTION_TYPE_OFFSET) | \
 	 (((owner) & MLX5_INDIRECT_ACT_CT_OWNER_MASK) << \
@@ -1159,6 +1173,9 @@ struct mlx5_action_construct_data {
 		struct {
 			uint32_t id;
 		} shared_counter;
+		struct {
+			uint32_t id;
+		} shared_meter;
 	};
 };
 
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 90a6c0c78f..e114bf11c1 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -615,6 +615,42 @@ __flow_hw_act_data_shared_cnt_append(struct mlx5_priv *priv,
 	return 0;
 }
 
+/**
+ * Append shared meter_mark action to the dynamic action list.
+ *
+ * @param[in] priv
+ *   Pointer to the port private data structure.
+ * @param[in] acts
+ *   Pointer to the template HW steering DR actions.
+ * @param[in] type
+ *   Action type.
+ * @param[in] action_src
+ *   Offset of source rte flow action.
+ * @param[in] action_dst
+ *   Offset of destination DR action.
+ * @param[in] mtr_id
+ *   Shared meter id.
+ *
+ * @return
+ *    0 on success, negative value otherwise and rte_errno is set.
+ */
+static __rte_always_inline int
+__flow_hw_act_data_shared_mtr_append(struct mlx5_priv *priv,
+				     struct mlx5_hw_actions *acts,
+				     enum rte_flow_action_type type,
+				     uint16_t action_src,
+				     uint16_t action_dst,
+				     cnt_id_t mtr_id)
+{
+	struct mlx5_action_construct_data *act_data;
+
+	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
+	if (!act_data)
+		return -1;
+	act_data->type = type;
+	act_data->shared_meter.id = mtr_id;
+	LIST_INSERT_HEAD(&acts->act_list, act_data, next);
+	return 0;
+}
 
 /**
  * Translate shared indirect action.
@@ -668,6 +704,13 @@ flow_hw_shared_action_translate(struct rte_eth_dev *dev,
 		if (flow_hw_ct_compile(dev, idx, &acts->rule_acts[action_dst]))
 			return -1;
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
+		if (__flow_hw_act_data_shared_mtr_append(priv, acts,
+			(enum rte_flow_action_type)
+			MLX5_RTE_FLOW_ACTION_TYPE_METER_MARK,
+			action_src, action_dst, idx))
+			return -1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type:%d", type);
 		break;
@@ -1682,8 +1725,10 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev,
 				struct mlx5dr_rule_action *rule_act)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
 	struct mlx5_action_construct_data act_data;
 	struct mlx5_shared_action_rss *shared_rss;
+	struct mlx5_aso_mtr *aso_mtr;
 	uint32_t act_idx = (uint32_t)(uintptr_t)action->conf;
 	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
 	uint32_t idx = act_idx &
@@ -1719,6 +1764,17 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev,
 		if (flow_hw_ct_compile(dev, idx, rule_act))
 			return -1;
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
+		/* Find ASO object. */
+		aso_mtr = mlx5_ipool_get(pool->idx_pool, idx);
+		if (!aso_mtr)
+			return -1;
+		rule_act->action = pool->action;
+		rule_act->aso_meter.offset = aso_mtr->offset;
+		rule_act->aso_meter.init_color =
+			(enum mlx5dr_action_aso_meter_color)
+			rte_col_2_mlx5_col(aso_mtr->init_color);
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type:%d", type);
 		break;
@@ -1856,6 +1912,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			  uint32_t queue)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
 	struct rte_flow_template_table *table = job->flow->table;
 	struct mlx5_action_construct_data *act_data;
 	const struct rte_flow_actions_template *at = hw_at->action_template;
@@ -2065,6 +2122,21 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 					       &rule_acts[act_data->action_dst]))
 				return -1;
 			break;
+		case MLX5_RTE_FLOW_ACTION_TYPE_METER_MARK:
+			mtr_id = act_data->shared_meter.id &
+				((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
+			/* Find ASO object. */
+			aso_mtr = mlx5_ipool_get(pool->idx_pool, mtr_id);
+			if (!aso_mtr)
+				return -1;
+			rule_acts[act_data->action_dst].action =
+							pool->action;
+			rule_acts[act_data->action_dst].aso_meter.offset =
+							aso_mtr->offset;
+			rule_acts[act_data->action_dst].aso_meter.init_color =
+				(enum mlx5dr_action_aso_meter_color)
+				rte_col_2_mlx5_col(aso_mtr->init_color);
+			break;
 		case RTE_FLOW_ACTION_TYPE_METER_MARK:
 			ret = flow_hw_meter_mark_compile(dev,
 				act_data->action_dst, action,
@@ -3252,6 +3324,11 @@ flow_hw_dr_actions_template_handle_shared(const struct rte_flow_action *mask,
 		action_types[*curr_off] = MLX5DR_ACTION_TYP_ASO_CT;
 		*curr_off = *curr_off + 1;
 		break;
+	case RTE_FLOW_ACTION_TYPE_METER_MARK:
+		at->actions_off[action_src] = *curr_off;
+		action_types[*curr_off] = MLX5DR_ACTION_TYP_ASO_METER;
+		*curr_off = *curr_off + 1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type: %d", type);
 		return -EINVAL;
@@ -5793,7 +5870,9 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 {
 	struct rte_flow_action_handle *handle = NULL;
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr *aso_mtr;
 	cnt_id_t cnt_id;
+	uint32_t mtr_id;
 
 	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
@@ -5812,6 +5891,14 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
 		handle = flow_hw_conntrack_create(dev, queue, action->conf, error);
 		break;
+	case RTE_FLOW_ACTION_TYPE_METER_MARK:
+		aso_mtr = flow_hw_meter_mark_alloc(dev, action);
+		if (!aso_mtr)
+			break;
+		mtr_id = (MLX5_INDIRECT_ACTION_TYPE_METER_MARK <<
+			MLX5_INDIRECT_ACTION_TYPE_OFFSET) | (aso_mtr->fm.meter_id);
+		handle = (struct rte_flow_action_handle *)(uintptr_t)mtr_id;
+		break;
 	default:
 		handle = flow_dv_action_create(dev, conf, action, error);
 	}
@@ -5847,18 +5934,58 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 			     void *user_data,
 			     struct rte_flow_error *error)
 {
-	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
-	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
-
 	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
 	RTE_SET_USED(user_data);
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
+	const struct rte_flow_update_meter_mark *upd_meter_mark =
+		(const struct rte_flow_update_meter_mark *)update;
+	const struct rte_flow_action_meter_mark *meter_mark;
+	struct mlx5_aso_mtr *aso_mtr;
+	struct mlx5_flow_meter_info *fm;
+	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
+	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+	uint32_t idx = act_idx & ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
+
 	switch (type) {
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
 		return flow_hw_conntrack_update(dev, queue, update, act_idx, error);
+	case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
+		meter_mark = &upd_meter_mark->meter_mark;
+		/* Find ASO object. */
+		aso_mtr = mlx5_ipool_get(pool->idx_pool, idx);
+		if (!aso_mtr)
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL, "Invalid meter_mark update index");
+		fm = &aso_mtr->fm;
+		if (upd_meter_mark->profile_valid)
+			fm->profile = (struct mlx5_flow_meter_profile *)
+							(meter_mark->profile);
+		if (upd_meter_mark->color_mode_valid)
+			fm->color_aware = meter_mark->color_mode;
+		if (upd_meter_mark->init_color_valid)
+			aso_mtr->init_color = (meter_mark->color_mode) ?
+				meter_mark->init_color : RTE_COLOR_GREEN;
+		if (upd_meter_mark->state_valid)
+			fm->is_enable = meter_mark->state;
+		/* Update ASO flow meter by wqe. */
+		if (mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr,
+						 &priv->mtr_bulk))
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL, "Unable to update ASO meter WQE");
+		/* Wait for ASO object completion. */
+		if (mlx5_aso_mtr_wait(priv->sh, aso_mtr))
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL, "Unable to wait for ASO meter CQE");
+		return 0;
 	default:
-		return flow_dv_action_update(dev, handle, update, error);
+		break;
 	}
+	return flow_dv_action_update(dev, handle, update, error);
 }
 
 /**
@@ -5889,7 +6016,11 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 {
 	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
 	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+	uint32_t idx = act_idx & ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
+	struct mlx5_aso_mtr *aso_mtr;
+	struct mlx5_flow_meter_info *fm;
 
 	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
@@ -5899,6 +6030,27 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 		return mlx5_hws_cnt_shared_put(priv->hws_cpool, &act_idx);
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
 		return flow_hw_conntrack_destroy(dev, act_idx, error);
+	case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
+		aso_mtr = mlx5_ipool_get(pool->idx_pool, idx);
+		if (!aso_mtr)
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL, "Invalid meter_mark destroy index");
+		fm = &aso_mtr->fm;
+		fm->is_enable = 0;
+		/* Update ASO flow meter by wqe. */
+		if (mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr,
+						 &priv->mtr_bulk))
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL, "Unable to update ASO meter WQE");
+		/* Wait for ASO object completion. */
+		if (mlx5_aso_mtr_wait(priv->sh, aso_mtr))
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL, "Unable to wait for ASO meter CQE");
+		mlx5_ipool_free(pool->idx_pool, idx);
+		return 0;
 	default:
 		return flow_dv_action_destroy(dev, handle, error);
 	}
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v2 00/17] net/mlx5: HW steering PMD update
  2022-09-23 14:43 [PATCH 00/27] net/mlx5: HW steering PMD update Suanming Mou
                   ` (26 preceding siblings ...)
  2022-09-23 14:43 ` [PATCH 27/27] net/mlx5: implement METER MARK indirect " Suanming Mou
@ 2022-09-28  3:31 ` Suanming Mou
  2022-09-28  3:31   ` [PATCH v2 01/17] net/mlx5: fix invalid flow attributes Suanming Mou
                     ` (16 more replies)
  2022-09-30 12:52 ` [PATCH v3 00/17] net/mlx5: HW steering PMD update Suanming Mou
                   ` (3 subsequent siblings)
  31 siblings, 17 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-28  3:31 UTC (permalink / raw)
  Cc: dev, rasland, orika

The skeleton of mlx5 HW steering (HWS) was merged upstream quite
a long time ago, but has not been updated since then because the
low-level steering layer code was missing. Luckily, better late
than never, the steering layer finally arrives[1].

This series will add more features to the existing PMD code:
 - FDB and metadata copy.
 - Modify field.
 - Meter color.
 - Counter.
 - Aging.
 - Action template pre-parser optimization.
 - Connection tracking.

Some features such as meter/aging/ct touch the public API, and the
public API changes were sent to the ML much earlier in other threads
so that they would not be swallowed by this big series.

The dependent patches are listed below:
 [1]https://patches.dpdk.org/project/dpdk/cover/20220922190345.394-1-valex@nvidia.com/
 [2]https://patches.dpdk.org/project/dpdk/cover/20220921021133.2982954-1-akozyrev@nvidia.com/
 [3]https://patches.dpdk.org/project/dpdk/cover/20220921145409.511328-1-michaelba@nvidia.com/
 [4]https://patches.dpdk.org/project/dpdk/patch/20220920071036.20878-1-suanmingm@nvidia.com/
 [5]https://patches.dpdk.org/project/dpdk/patch/20220920071141.21769-1-suanmingm@nvidia.com/
 [6]https://patches.dpdk.org/project/dpdk/patch/20220921143202.1790802-1-dsosnowski@nvidia.com/

---

 v2:
  - Remove the rte_flow patches as they will be integrated in another thread.
  - Fix compilation issues.
  - Organize the patches better.

---

Alexander Kozyrev (2):
  net/mlx5: add HW steering meter action
  net/mlx5: implement METER MARK indirect action for HWS

Bing Zhao (1):
  net/mlx5: add extended metadata mode for hardware steering

Dariusz Sosnowski (4):
  net/mlx5: add HW steering port action
  net/mlx5: support DR action template API
  net/mlx5: support device control for E-Switch default rule
  net/mlx5: support device control of representor matching

Gregory Etelson (2):
  net/mlx5: add HW steering VLAN push, pop and VID modify flow actions
  net/mlx5: support flow integrity in HWS group 0

Michael Baum (1):
  net/mlx5: add HWS AGE action support

Suanming Mou (6):
  net/mlx5: fix invalid flow attributes
  net/mlx5: fix IPv6 and TCP RSS hash fields
  net/mlx5: add shared header reformat support
  net/mlx5: add modify field hws support
  net/mlx5: add HW steering connection tracking support
  net/mlx5: add async action push and pull support

Xiaoyu Min (1):
  net/mlx5: add HW steering counter action

 doc/guides/nics/mlx5.rst             |    9 +
 drivers/common/mlx5/mlx5_devx_cmds.c |   50 +
 drivers/common/mlx5/mlx5_devx_cmds.h |   27 +
 drivers/common/mlx5/mlx5_prm.h       |   64 +-
 drivers/common/mlx5/version.map      |    1 +
 drivers/net/mlx5/linux/mlx5_os.c     |   76 +-
 drivers/net/mlx5/meson.build         |    1 +
 drivers/net/mlx5/mlx5.c              |  126 +-
 drivers/net/mlx5/mlx5.h              |  318 +-
 drivers/net/mlx5/mlx5_defs.h         |    5 +
 drivers/net/mlx5/mlx5_flow.c         |  415 +-
 drivers/net/mlx5/mlx5_flow.h         |  312 +-
 drivers/net/mlx5/mlx5_flow_aso.c     |  793 ++-
 drivers/net/mlx5/mlx5_flow_dv.c      | 1115 ++--
 drivers/net/mlx5/mlx5_flow_hw.c      | 7874 +++++++++++++++++++++++---
 drivers/net/mlx5/mlx5_flow_meter.c   |  771 ++-
 drivers/net/mlx5/mlx5_flow_verbs.c   |    8 +-
 drivers/net/mlx5/mlx5_hws_cnt.c      | 1198 ++++
 drivers/net/mlx5/mlx5_hws_cnt.h      |  703 +++
 drivers/net/mlx5/mlx5_trigger.c      |  254 +-
 drivers/net/mlx5/mlx5_tx.h           |    1 +
 drivers/net/mlx5/mlx5_txq.c          |   47 +
 drivers/net/mlx5/rte_pmd_mlx5.h      |   17 +
 drivers/net/mlx5/version.map         |    1 +
 24 files changed, 12536 insertions(+), 1650 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_hws_cnt.c
 create mode 100644 drivers/net/mlx5/mlx5_hws_cnt.h

-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v2 01/17] net/mlx5: fix invalid flow attributes
  2022-09-28  3:31 ` [PATCH v2 00/17] net/mlx5: HW steering PMD update Suanming Mou
@ 2022-09-28  3:31   ` Suanming Mou
  2022-09-28  3:31   ` [PATCH v2 02/17] net/mlx5: fix IPv6 and TCP RSS hash fields Suanming Mou
                     ` (15 subsequent siblings)
  16 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-28  3:31 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika

In the function flow_get_drv_type(), attr is dereferenced in non-HWS
mode. If a user calls the HWS-only APIs in SWS mode, a valid attr must
be passed to these functions, otherwise the NULL pointer causes a crash.
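
For quick reference, a minimal standalone sketch of the calling pattern
this fix establishes: the selector bails out before touching attr, and
HWS-only entry points pass a zeroed attr instead of NULL. The names
(get_drv_type(), hw_only_api()) and types are simplified stand-ins for
the mlx5 helpers, not the actual driver code:

#include <stdio.h>
#include <stddef.h>

enum flow_type { FLOW_TYPE_MIN, FLOW_TYPE_DV, FLOW_TYPE_HW };

struct flow_attr { int transfer; };

/* Driver-type selection: bail out before touching attr in SWS mode. */
static enum flow_type
get_drv_type(int dv_flow_en, const struct flow_attr *attr)
{
	if (dv_flow_en == 2)
		return FLOW_TYPE_HW;	/* HWS enabled, attr content unused. */
	if (attr == NULL)
		return FLOW_TYPE_MIN;	/* SWS mode: never dereference NULL. */
	return attr->transfer ? FLOW_TYPE_DV : FLOW_TYPE_MIN;
}

/* HWS-only entry points now pass a zeroed attr instead of NULL. */
static int
hw_only_api(int dv_flow_en)
{
	struct flow_attr attr = {0};

	return get_drv_type(dv_flow_en, &attr) == FLOW_TYPE_HW ? 0 : -1;
}

int
main(void)
{
	printf("SWS-mode call returns %d (no crash)\n", hw_only_api(1));
	return 0;
}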

Fixes: 572801ab860f ("ethdev: backport upstream rte_flow_async codes")

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/net/mlx5/mlx5_flow.c | 38 ++++++++++++++++++++++++------------
 1 file changed, 26 insertions(+), 12 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 45109001ca..3abb39aa92 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -3740,6 +3740,8 @@ flow_get_drv_type(struct rte_eth_dev *dev, const struct rte_flow_attr *attr)
 	 */
 	if (priv->sh->config.dv_flow_en == 2)
 		return MLX5_FLOW_TYPE_HW;
+	if (!attr)
+		return MLX5_FLOW_TYPE_MIN;
 	/* If no OS specific type - continue with DV/VERBS selection */
 	if (attr->transfer && priv->sh->config.dv_esw_en)
 		type = MLX5_FLOW_TYPE_DV;
@@ -8252,8 +8254,9 @@ mlx5_flow_info_get(struct rte_eth_dev *dev,
 		   struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8287,8 +8290,9 @@ mlx5_flow_port_configure(struct rte_eth_dev *dev,
 			 struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8319,8 +8323,9 @@ mlx5_flow_pattern_template_create(struct rte_eth_dev *dev,
 		struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW) {
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW) {
 		rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8350,8 +8355,9 @@ mlx5_flow_pattern_template_destroy(struct rte_eth_dev *dev,
 				   struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8385,8 +8391,9 @@ mlx5_flow_actions_template_create(struct rte_eth_dev *dev,
 			struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW) {
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW) {
 		rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8416,8 +8423,9 @@ mlx5_flow_actions_template_destroy(struct rte_eth_dev *dev,
 				   struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8457,8 +8465,9 @@ mlx5_flow_table_create(struct rte_eth_dev *dev,
 		       struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW) {
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW) {
 		rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8494,8 +8503,9 @@ mlx5_flow_table_destroy(struct rte_eth_dev *dev,
 			struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8542,8 +8552,9 @@ mlx5_flow_async_flow_create(struct rte_eth_dev *dev,
 			    struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW) {
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW) {
 		rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8585,8 +8596,9 @@ mlx5_flow_async_flow_destroy(struct rte_eth_dev *dev,
 			     struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8621,8 +8633,9 @@ mlx5_flow_pull(struct rte_eth_dev *dev,
 	       struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8650,8 +8663,9 @@ mlx5_flow_push(struct rte_eth_dev *dev,
 	       struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v2 02/17] net/mlx5: fix IPv6 and TCP RSS hash fields
  2022-09-28  3:31 ` [PATCH v2 00/17] net/mlx5: HW steering PMD update Suanming Mou
  2022-09-28  3:31   ` [PATCH v2 01/17] net/mlx5: fix invalid flow attributes Suanming Mou
@ 2022-09-28  3:31   ` Suanming Mou
  2022-09-28  3:31   ` [PATCH v2 03/17] net/mlx5: add shared header reformat support Suanming Mou
                     ` (14 subsequent siblings)
  16 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-28  3:31 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika

In the flow_dv_hashfields_set() function, when item_flags was 0,
the code always took the first if branch, so the else branches
never had a chance to be checked. As a result, the IPv6 and TCP hash
fields handled in the else branches were never set.

This commit adds a dedicated HW steering hash field set function
to generate the RSS hash fields.
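
A toy illustration (not driver code) of why the "|| !items" shortcuts
shadow the later branches, and how selecting purely on the requested RSS
types avoids it; all names and bit values below are made up:

#include <stdio.h>

/* With items == 0, every "|| !items" makes the first branch win,
 * so the IPv4/UDP cases always shadow the IPv6/TCP ones. */
static const char *
pick_old(unsigned int items)
{
	if ((items & 0x1) || !items)
		return "IPv4";		/* always taken when items == 0 */
	else if ((items & 0x2) || !items)
		return "IPv6";		/* unreachable when items == 0 */
	return "none";
}

/* The fix selects the hash fields from the RSS types instead. */
static const char *
pick_new(unsigned int rss_types)
{
	if (rss_types & 0x1)
		return "IPv4";
	else if (rss_types & 0x2)
		return "IPv6";
	return "none";
}

int
main(void)
{
	/* Prints "IPv4 vs IPv6": the old logic ignores the IPv6 request. */
	printf("%s vs %s\n", pick_old(0), pick_new(0x2));
	return 0;
}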

Fixes: 6540da0b93b5 ("net/mlx5: fix RSS scaling issue")
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/net/mlx5/mlx5_flow_dv.c | 12 +++----
 drivers/net/mlx5/mlx5_flow_hw.c | 59 ++++++++++++++++++++++++++++++++-
 2 files changed, 62 insertions(+), 9 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index a4c59f3762..a9b0e84736 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -11302,8 +11302,7 @@ flow_dv_hashfields_set(uint64_t item_flags,
 		rss_inner = 1;
 #endif
 	if ((rss_inner && (items & MLX5_FLOW_LAYER_INNER_L3_IPV4)) ||
-	    (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L3_IPV4)) ||
-	     !items) {
+	    (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L3_IPV4))) {
 		if (rss_types & MLX5_IPV4_LAYER_TYPES) {
 			if (rss_types & RTE_ETH_RSS_L3_SRC_ONLY)
 				fields |= IBV_RX_HASH_SRC_IPV4;
@@ -11313,8 +11312,7 @@ flow_dv_hashfields_set(uint64_t item_flags,
 				fields |= MLX5_IPV4_IBV_RX_HASH;
 		}
 	} else if ((rss_inner && (items & MLX5_FLOW_LAYER_INNER_L3_IPV6)) ||
-		   (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L3_IPV6)) ||
-		   !items) {
+		   (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L3_IPV6))) {
 		if (rss_types & MLX5_IPV6_LAYER_TYPES) {
 			if (rss_types & RTE_ETH_RSS_L3_SRC_ONLY)
 				fields |= IBV_RX_HASH_SRC_IPV6;
@@ -11337,8 +11335,7 @@ flow_dv_hashfields_set(uint64_t item_flags,
 		return;
 	}
 	if ((rss_inner && (items & MLX5_FLOW_LAYER_INNER_L4_UDP)) ||
-	    (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L4_UDP)) ||
-	    !items) {
+	    (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L4_UDP))) {
 		if (rss_types & RTE_ETH_RSS_UDP) {
 			if (rss_types & RTE_ETH_RSS_L4_SRC_ONLY)
 				fields |= IBV_RX_HASH_SRC_PORT_UDP;
@@ -11348,8 +11345,7 @@ flow_dv_hashfields_set(uint64_t item_flags,
 				fields |= MLX5_UDP_IBV_RX_HASH;
 		}
 	} else if ((rss_inner && (items & MLX5_FLOW_LAYER_INNER_L4_TCP)) ||
-		   (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L4_TCP)) ||
-		   !items) {
+		   (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L4_TCP))) {
 		if (rss_types & RTE_ETH_RSS_TCP) {
 			if (rss_types & RTE_ETH_RSS_L4_SRC_ONLY)
 				fields |= IBV_RX_HASH_SRC_PORT_TCP;
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 7343d59f1f..46c4169b4f 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -62,6 +62,63 @@ flow_hw_rxq_flag_set(struct rte_eth_dev *dev, bool enable)
 	priv->mark_enabled = enable;
 }
 
+/**
+ * Set the hash fields according to the @p rss_desc information.
+ *
+ * @param[in] rss_desc
+ *   Pointer to the mlx5_flow_rss_desc.
+ * @param[out] hash_fields
+ *   Pointer to the RSS hash fields.
+ */
+static void
+flow_hw_hashfields_set(struct mlx5_flow_rss_desc *rss_desc,
+		       uint64_t *hash_fields)
+{
+	uint64_t fields = 0;
+	int rss_inner = 0;
+	uint64_t rss_types = rte_eth_rss_hf_refine(rss_desc->types);
+
+#ifdef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
+	if (rss_desc->level >= 2)
+		rss_inner = 1;
+#endif
+	if (rss_types & MLX5_IPV4_LAYER_TYPES) {
+		if (rss_types & RTE_ETH_RSS_L3_SRC_ONLY)
+			fields |= IBV_RX_HASH_SRC_IPV4;
+		else if (rss_types & RTE_ETH_RSS_L3_DST_ONLY)
+			fields |= IBV_RX_HASH_DST_IPV4;
+		else
+			fields |= MLX5_IPV4_IBV_RX_HASH;
+	} else if (rss_types & MLX5_IPV6_LAYER_TYPES) {
+		if (rss_types & RTE_ETH_RSS_L3_SRC_ONLY)
+			fields |= IBV_RX_HASH_SRC_IPV6;
+		else if (rss_types & RTE_ETH_RSS_L3_DST_ONLY)
+			fields |= IBV_RX_HASH_DST_IPV6;
+		else
+			fields |= MLX5_IPV6_IBV_RX_HASH;
+	}
+	if (rss_types & RTE_ETH_RSS_UDP) {
+		if (rss_types & RTE_ETH_RSS_L4_SRC_ONLY)
+			fields |= IBV_RX_HASH_SRC_PORT_UDP;
+		else if (rss_types & RTE_ETH_RSS_L4_DST_ONLY)
+			fields |= IBV_RX_HASH_DST_PORT_UDP;
+		else
+			fields |= MLX5_UDP_IBV_RX_HASH;
+	} else if (rss_types & RTE_ETH_RSS_TCP) {
+		if (rss_types & RTE_ETH_RSS_L4_SRC_ONLY)
+			fields |= IBV_RX_HASH_SRC_PORT_TCP;
+		else if (rss_types & RTE_ETH_RSS_L4_DST_ONLY)
+			fields |= IBV_RX_HASH_DST_PORT_TCP;
+		else
+			fields |= MLX5_TCP_IBV_RX_HASH;
+	}
+	if (rss_types & RTE_ETH_RSS_ESP)
+		fields |= IBV_RX_HASH_IPSEC_SPI;
+	if (rss_inner)
+		fields |= IBV_RX_HASH_INNER;
+	*hash_fields = fields;
+}
+
 /**
  * Generate the pattern item flags.
  * Will be used for shared RSS action.
@@ -225,7 +282,7 @@ flow_hw_tir_action_register(struct rte_eth_dev *dev,
 		       MLX5_RSS_HASH_KEY_LEN);
 		rss_desc.key_len = MLX5_RSS_HASH_KEY_LEN;
 		rss_desc.types = !rss->types ? RTE_ETH_RSS_IP : rss->types;
-		flow_dv_hashfields_set(0, &rss_desc, &rss_desc.hash_fields);
+		flow_hw_hashfields_set(&rss_desc, &rss_desc.hash_fields);
 		flow_dv_action_rss_l34_hash_adjust(rss->types,
 						   &rss_desc.hash_fields);
 		if (rss->level > 1) {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v2 03/17] net/mlx5: add shared header reformat support
  2022-09-28  3:31 ` [PATCH v2 00/17] net/mlx5: HW steering PMD update Suanming Mou
  2022-09-28  3:31   ` [PATCH v2 01/17] net/mlx5: fix invalid flow attributes Suanming Mou
  2022-09-28  3:31   ` [PATCH v2 02/17] net/mlx5: fix IPv6 and TCP RSS hash fields Suanming Mou
@ 2022-09-28  3:31   ` Suanming Mou
  2022-09-28  3:31   ` [PATCH v2 04/17] net/mlx5: add modify field hws support Suanming Mou
                     ` (13 subsequent siblings)
  16 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-28  3:31 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika

As the rte_flow_async API defines, an action mask with a non-zero field
value means the action will be shared by all the flows in the table.

A header reformat action whose action mask field is non-zero will be
created as a constant shared action. For the encapsulation header
reformat action, there are two kinds of encapsulation data: raw_encap_data
and rte_flow_item encap_data. Both kinds of data can be identified
from the action mask conf as constant or not.

Examples:
1. VXLAN encap (encap_data: rte_flow_item)
	action conf (eth/ipv4/udp/vxlan_hdr)

	a. action mask conf (eth/ipv4/udp/vxlan_hdr)
	  - items are constant.
	b. action mask conf (NULL)
	  - items will change.

2. RAW encap (encap_data: raw)
	action conf (raw_data)

	a. action mask conf (not NULL)
	  - encap_data constant.
	b. action mask conf (NULL)
	  - encap_data will change.
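
A hedged usage sketch of case 2 above, built on the public rte_flow
async template API; port setup and error handling are omitted, and the
helper name create_encap_template() is invented for illustration:

#include <stdint.h>
#include <stddef.h>
#include <rte_flow.h>

static struct rte_flow_actions_template *
create_encap_template(uint16_t port, uint8_t *hdr, size_t len, int constant)
{
	struct rte_flow_actions_template_attr attr = { .egress = 1 };
	struct rte_flow_action_raw_encap conf = { .data = hdr, .size = len };
	struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_RAW_ENCAP, .conf = &conf },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	/* Non-NULL mask conf -> encap_data is constant and shared by all
	 * flows in the table; NULL mask conf -> data provided per rule. */
	struct rte_flow_action masks[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_RAW_ENCAP,
		  .conf = constant ? &conf : NULL },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	struct rte_flow_error err;

	return rte_flow_actions_template_create(port, &attr, actions,
						masks, &err);
}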

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/net/mlx5/mlx5_flow.h    |   6 +-
 drivers/net/mlx5/mlx5_flow_hw.c | 124 ++++++++++----------------------
 2 files changed, 39 insertions(+), 91 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 4b53912b79..1c9f5fc1d5 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1064,10 +1064,6 @@ struct mlx5_action_construct_data {
 	uint16_t action_dst; /* mlx5dr_rule_action dst offset. */
 	union {
 		struct {
-			/* encap src(item) offset. */
-			uint16_t src;
-			/* encap dst data offset. */
-			uint16_t dst;
 			/* encap data len. */
 			uint16_t len;
 		} encap;
@@ -1110,6 +1106,8 @@ struct mlx5_hw_jump_action {
 /* Encap decap action struct. */
 struct mlx5_hw_encap_decap_action {
 	struct mlx5dr_action *action; /* Action object. */
+	/* Is header_reformat action shared across flows in table. */
+	bool shared;
 	size_t data_size; /* Action metadata size. */
 	uint8_t data[]; /* Action data. */
 };
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 46c4169b4f..b6978bd051 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -402,10 +402,6 @@ __flow_hw_act_data_general_append(struct mlx5_priv *priv,
  *   Offset of source rte flow action.
  * @param[in] action_dst
  *   Offset of destination DR action.
- * @param[in] encap_src
- *   Offset of source encap raw data.
- * @param[in] encap_dst
- *   Offset of destination encap raw data.
  * @param[in] len
  *   Length of the data to be updated.
  *
@@ -418,16 +414,12 @@ __flow_hw_act_data_encap_append(struct mlx5_priv *priv,
 				enum rte_flow_action_type type,
 				uint16_t action_src,
 				uint16_t action_dst,
-				uint16_t encap_src,
-				uint16_t encap_dst,
 				uint16_t len)
 {	struct mlx5_action_construct_data *act_data;
 
 	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
 	if (!act_data)
 		return -1;
-	act_data->encap.src = encap_src;
-	act_data->encap.dst = encap_dst;
 	act_data->encap.len = len;
 	LIST_INSERT_HEAD(&acts->act_list, act_data, next);
 	return 0;
@@ -523,53 +515,6 @@ flow_hw_shared_action_translate(struct rte_eth_dev *dev,
 	return 0;
 }
 
-/**
- * Translate encap items to encapsulation list.
- *
- * @param[in] dev
- *   Pointer to the rte_eth_dev data structure.
- * @param[in] acts
- *   Pointer to the template HW steering DR actions.
- * @param[in] type
- *   Action type.
- * @param[in] action_src
- *   Offset of source rte flow action.
- * @param[in] action_dst
- *   Offset of destination DR action.
- * @param[in] items
- *   Encap item pattern.
- * @param[in] items_m
- *   Encap item mask indicates which part are constant and dynamic.
- *
- * @return
- *    0 on success, negative value otherwise and rte_errno is set.
- */
-static __rte_always_inline int
-flow_hw_encap_item_translate(struct rte_eth_dev *dev,
-			     struct mlx5_hw_actions *acts,
-			     enum rte_flow_action_type type,
-			     uint16_t action_src,
-			     uint16_t action_dst,
-			     const struct rte_flow_item *items,
-			     const struct rte_flow_item *items_m)
-{
-	struct mlx5_priv *priv = dev->data->dev_private;
-	size_t len, total_len = 0;
-	uint32_t i = 0;
-
-	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++, items_m++, i++) {
-		len = flow_dv_get_item_hdr_len(items->type);
-		if ((!items_m->spec ||
-		    memcmp(items_m->spec, items->spec, len)) &&
-		    __flow_hw_act_data_encap_append(priv, acts, type,
-						    action_src, action_dst, i,
-						    total_len, len))
-			return -1;
-		total_len += len;
-	}
-	return 0;
-}
-
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -611,7 +556,7 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 	const struct rte_flow_action_raw_encap *raw_encap_data;
 	const struct rte_flow_item *enc_item = NULL, *enc_item_m = NULL;
 	uint16_t reformat_pos = MLX5_HW_MAX_ACTS, reformat_src = 0;
-	uint8_t *encap_data = NULL;
+	uint8_t *encap_data = NULL, *encap_data_m = NULL;
 	size_t data_size = 0;
 	bool actions_end = false;
 	uint32_t type, i;
@@ -718,9 +663,9 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			MLX5_ASSERT(reformat_pos == MLX5_HW_MAX_ACTS);
 			enc_item = ((const struct rte_flow_action_vxlan_encap *)
 				   actions->conf)->definition;
-			enc_item_m =
-				((const struct rte_flow_action_vxlan_encap *)
-				 masks->conf)->definition;
+			if (masks->conf)
+				enc_item_m = ((const struct rte_flow_action_vxlan_encap *)
+					     masks->conf)->definition;
 			reformat_pos = i++;
 			reformat_src = actions - action_start;
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_L2_TO_TNL_L2;
@@ -729,9 +674,9 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			MLX5_ASSERT(reformat_pos == MLX5_HW_MAX_ACTS);
 			enc_item = ((const struct rte_flow_action_nvgre_encap *)
 				   actions->conf)->definition;
-			enc_item_m =
-				((const struct rte_flow_action_nvgre_encap *)
-				actions->conf)->definition;
+			if (masks->conf)
+				enc_item_m = ((const struct rte_flow_action_nvgre_encap *)
+					     masks->conf)->definition;
 			reformat_pos = i++;
 			reformat_src = actions - action_start;
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_L2_TO_TNL_L2;
@@ -743,6 +688,11 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_TNL_L2_TO_L2;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
+			raw_encap_data =
+				(const struct rte_flow_action_raw_encap *)
+				 masks->conf;
+			if (raw_encap_data)
+				encap_data_m = raw_encap_data->data;
 			raw_encap_data =
 				(const struct rte_flow_action_raw_encap *)
 				 actions->conf;
@@ -773,22 +723,17 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 	}
 	if (reformat_pos != MLX5_HW_MAX_ACTS) {
 		uint8_t buf[MLX5_ENCAP_MAX_LEN];
+		bool shared_rfmt = true;
 
 		if (enc_item) {
 			MLX5_ASSERT(!encap_data);
-			if (flow_dv_convert_encap_data
-				(enc_item, buf, &data_size, error) ||
-			    flow_hw_encap_item_translate
-				(dev, acts, (action_start + reformat_src)->type,
-				 reformat_src, reformat_pos,
-				 enc_item, enc_item_m))
+			if (flow_dv_convert_encap_data(enc_item, buf, &data_size, error))
 				goto err;
 			encap_data = buf;
-		} else if (encap_data && __flow_hw_act_data_encap_append
-				(priv, acts,
-				 (action_start + reformat_src)->type,
-				 reformat_src, reformat_pos, 0, 0, data_size)) {
-			goto err;
+			if (!enc_item_m)
+				shared_rfmt = false;
+		} else if (encap_data && !encap_data_m) {
+			shared_rfmt = false;
 		}
 		acts->encap_decap = mlx5_malloc(MLX5_MEM_ZERO,
 				    sizeof(*acts->encap_decap) + data_size,
@@ -802,12 +747,22 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 		acts->encap_decap->action = mlx5dr_action_create_reformat
 				(priv->dr_ctx, refmt_type,
 				 data_size, encap_data,
-				 rte_log2_u32(table_attr->nb_flows),
-				 mlx5_hw_act_flag[!!attr->group][type]);
+				 shared_rfmt ? 0 : rte_log2_u32(table_attr->nb_flows),
+				 mlx5_hw_act_flag[!!attr->group][type] |
+				 (shared_rfmt ? MLX5DR_ACTION_FLAG_SHARED : 0));
 		if (!acts->encap_decap->action)
 			goto err;
 		acts->rule_acts[reformat_pos].action =
 						acts->encap_decap->action;
+		acts->rule_acts[reformat_pos].reformat.data =
+						acts->encap_decap->data;
+		if (shared_rfmt)
+			acts->rule_acts[reformat_pos].reformat.offset = 0;
+		else if (__flow_hw_act_data_encap_append(priv, acts,
+				 (action_start + reformat_src)->type,
+				 reformat_src, reformat_pos, data_size))
+			goto err;
+		acts->encap_decap->shared = shared_rfmt;
 		acts->encap_decap_pos = reformat_pos;
 	}
 	acts->acts_num = i;
@@ -972,6 +927,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			.ingress = 1,
 	};
 	uint32_t ft_flag;
+	size_t encap_len = 0;
 
 	memcpy(rule_acts, hw_acts->rule_acts,
 	       sizeof(*rule_acts) * hw_acts->acts_num);
@@ -989,9 +945,6 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	} else {
 		attr.ingress = 1;
 	}
-	if (hw_acts->encap_decap && hw_acts->encap_decap->data_size)
-		memcpy(buf, hw_acts->encap_decap->data,
-		       hw_acts->encap_decap->data_size);
 	LIST_FOREACH(act_data, &hw_acts->act_list, next) {
 		uint32_t jump_group;
 		uint32_t tag;
@@ -1050,23 +1003,20 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
 			enc_item = ((const struct rte_flow_action_vxlan_encap *)
 				   action->conf)->definition;
-			rte_memcpy((void *)&buf[act_data->encap.dst],
-				   enc_item[act_data->encap.src].spec,
-				   act_data->encap.len);
+			if (flow_dv_convert_encap_data(enc_item, buf, &encap_len, NULL))
+				return -1;
 			break;
 		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
 			enc_item = ((const struct rte_flow_action_nvgre_encap *)
 				   action->conf)->definition;
-			rte_memcpy((void *)&buf[act_data->encap.dst],
-				   enc_item[act_data->encap.src].spec,
-				   act_data->encap.len);
+			if (flow_dv_convert_encap_data(enc_item, buf, &encap_len, NULL))
+				return -1;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
 			raw_encap_data =
 				(const struct rte_flow_action_raw_encap *)
 				 action->conf;
-			rte_memcpy((void *)&buf[act_data->encap.dst],
-				   raw_encap_data->data, act_data->encap.len);
+			rte_memcpy((void *)buf, raw_encap_data->data, act_data->encap.len);
 			MLX5_ASSERT(raw_encap_data->size ==
 				    act_data->encap.len);
 			break;
@@ -1074,7 +1024,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			break;
 		}
 	}
-	if (hw_acts->encap_decap) {
+	if (hw_acts->encap_decap && !hw_acts->encap_decap->shared) {
 		rule_acts[hw_acts->encap_decap_pos].reformat.offset =
 				job->flow->idx - 1;
 		rule_acts[hw_acts->encap_decap_pos].reformat.data = buf;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v2 04/17] net/mlx5: add modify field hws support
  2022-09-28  3:31 ` [PATCH v2 00/17] net/mlx5: HW steering PMD update Suanming Mou
                     ` (2 preceding siblings ...)
  2022-09-28  3:31   ` [PATCH v2 03/17] net/mlx5: add shared header reformat support Suanming Mou
@ 2022-09-28  3:31   ` Suanming Mou
  2022-09-28  3:31   ` [PATCH v2 05/17] net/mlx5: add HW steering port action Suanming Mou
                     ` (12 subsequent siblings)
  16 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-28  3:31 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Dariusz Sosnowski

This patch introduces support for modify_field rte_flow actions in HWS
mode. Support includes:

- Ingress and egress domains,
- SET and ADD operations,
- usage of arbitrary bit offsets and widths for packet and metadata
  fields.

Support is implemented in two phases:

1. On flow table creation the hardware commands are generated, based
   on rte_flow action templates, and stored alongside action template.
2. On flow rule creation/queueing the hardware commands are updated with
   values provided by the user. Any masks over immediate values, provided
   in action templates, are applied to these values before enqueueing rules
   for creation.
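
A hedged sketch of an action template exercising this support: a SET of
IPv4 TTL from an immediate value. A fully-masked action in "masks" is
meant to be folded into the precompiled commands at template/table
creation (phase 1); an empty mask would defer the value to rule enqueue
time (phase 2). The helper name, the all-ones masking convention, and
the immediate-value byte layout are assumptions for illustration:

#include <stdint.h>
#include <string.h>
#include <rte_flow.h>

static struct rte_flow_actions_template *
create_ttl_set_template(uint16_t port)
{
	struct rte_flow_actions_template_attr attr = { .ingress = 1 };
	struct rte_flow_action_modify_field conf = {
		.operation = RTE_FLOW_MODIFY_SET,
		.dst = { .field = RTE_FLOW_FIELD_IPV4_TTL },
		.src = { .field = RTE_FLOW_FIELD_VALUE },
		.width = 8,
	};
	struct rte_flow_action_modify_field mask;
	struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD, .conf = &conf },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	struct rte_flow_action masks[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD, .conf = &mask },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	struct rte_flow_error err;

	conf.src.value[0] = 64;		/* immediate TTL (byte layout assumed) */
	memset(&mask, 0xff, sizeof(mask));	/* fully masked: template constant */
	return rte_flow_actions_template_create(port, &attr, actions,
						masks, &err);
}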

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/common/mlx5/mlx5_prm.h   |   2 +
 drivers/net/mlx5/linux/mlx5_os.c |  18 +-
 drivers/net/mlx5/mlx5.h          |   1 +
 drivers/net/mlx5/mlx5_flow.h     |  96 +++++
 drivers/net/mlx5/mlx5_flow_dv.c  | 551 ++++++++++++++--------------
 drivers/net/mlx5/mlx5_flow_hw.c  | 595 ++++++++++++++++++++++++++++++-
 6 files changed, 988 insertions(+), 275 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index f832bd77cb..c82ec94465 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -746,6 +746,8 @@ enum mlx5_modification_field {
 	MLX5_MODI_IN_TCP_ACK_NUM = 0x5C,
 	MLX5_MODI_GTP_TEID = 0x6E,
 	MLX5_MODI_OUT_IP_ECN = 0x73,
+	MLX5_MODI_TUNNEL_HDR_DW_1 = 0x75,
+	MLX5_MODI_GTPU_FIRST_EXT_DW_0 = 0x76,
 };
 
 /* Total number of metadata reg_c's. */
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index aed55e6a62..b7cc11a2ef 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1540,6 +1540,15 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 				       mlx5_hrxq_clone_free_cb);
 	if (!priv->hrxqs)
 		goto error;
+	mlx5_set_metadata_mask(eth_dev);
+	if (sh->config.dv_xmeta_en != MLX5_XMETA_MODE_LEGACY &&
+	    !priv->sh->dv_regc0_mask) {
+		DRV_LOG(ERR, "metadata mode %u is not supported "
+			     "(no metadata reg_c[0] is available)",
+			     sh->config.dv_xmeta_en);
+			err = ENOTSUP;
+			goto error;
+	}
 	rte_rwlock_init(&priv->ind_tbls_lock);
 	if (priv->sh->config.dv_flow_en == 2) {
 #ifdef HAVE_IBV_FLOW_DV_SUPPORT
@@ -1566,15 +1575,6 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 		err = -err;
 		goto error;
 	}
-	mlx5_set_metadata_mask(eth_dev);
-	if (sh->config.dv_xmeta_en != MLX5_XMETA_MODE_LEGACY &&
-	    !priv->sh->dv_regc0_mask) {
-		DRV_LOG(ERR, "metadata mode %u is not supported "
-			     "(no metadata reg_c[0] is available)",
-			     sh->config.dv_xmeta_en);
-			err = ENOTSUP;
-			goto error;
-	}
 	/* Query availability of metadata reg_c's. */
 	if (!priv->sh->metadata_regc_check_flag) {
 		err = mlx5_flow_discover_mreg_c(eth_dev);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index e855dc6ab5..a93af75baa 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -343,6 +343,7 @@ struct mlx5_hw_q_job {
 	struct rte_flow_hw *flow; /* Flow attached to the job. */
 	void *user_data; /* Job user data. */
 	uint8_t *encap_data; /* Encap data. */
+	struct mlx5_modification_cmd *mhdr_cmd;
 };
 
 /* HW steering job descriptor LIFO pool. */
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 1c9f5fc1d5..0eab3a3797 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1007,6 +1007,51 @@ flow_items_to_tunnel(const struct rte_flow_item items[])
 	return items[0].spec;
 }
 
+/**
+ * Fetch 1, 2, 3 or 4 byte field from the byte array
+ * and return as unsigned integer in host-endian format.
+ *
+ * @param[in] data
+ *   Pointer to data array.
+ * @param[in] size
+ *   Size of field to extract.
+ *
+ * @return
+ *   converted field in host endian format.
+ */
+static inline uint32_t
+flow_dv_fetch_field(const uint8_t *data, uint32_t size)
+{
+	uint32_t ret;
+
+	switch (size) {
+	case 1:
+		ret = *data;
+		break;
+	case 2:
+		ret = rte_be_to_cpu_16(*(const unaligned_uint16_t *)data);
+		break;
+	case 3:
+		ret = rte_be_to_cpu_16(*(const unaligned_uint16_t *)data);
+		ret = (ret << 8) | *(data + sizeof(uint16_t));
+		break;
+	case 4:
+		ret = rte_be_to_cpu_32(*(const unaligned_uint32_t *)data);
+		break;
+	default:
+		MLX5_ASSERT(false);
+		ret = 0;
+		break;
+	}
+	return ret;
+}
+
+struct field_modify_info {
+	uint32_t size; /* Size of field in protocol header, in bytes. */
+	uint32_t offset; /* Offset of field in protocol header, in bytes. */
+	enum mlx5_modification_field id;
+};
+
 /* HW steering flow attributes. */
 struct mlx5_flow_attr {
 	uint32_t port_id; /* Port index. */
@@ -1067,6 +1112,29 @@ struct mlx5_action_construct_data {
 			/* encap data len. */
 			uint16_t len;
 		} encap;
+		struct {
+			/* Modify header action offset in pattern. */
+			uint16_t mhdr_cmds_off;
+			/* Offset in pattern after modify header actions. */
+			uint16_t mhdr_cmds_end;
+			/*
+			 * True if this action is masked and does not need to
+			 * be generated.
+			 */
+			bool shared;
+			/*
+			 * Modified field definitions in dst field (SET, ADD)
+			 * or src field (COPY).
+			 */
+			struct field_modify_info field[MLX5_ACT_MAX_MOD_FIELDS];
+			/* Modified field definitions in dst field (COPY). */
+			struct field_modify_info dcopy[MLX5_ACT_MAX_MOD_FIELDS];
+			/*
+			 * Masks applied to field values to generate
+			 * PRM actions.
+			 */
+			uint32_t mask[MLX5_ACT_MAX_MOD_FIELDS];
+		} modify_header;
 		struct {
 			uint64_t types; /* RSS hash types. */
 			uint32_t level; /* RSS level. */
@@ -1092,6 +1160,7 @@ struct rte_flow_actions_template {
 	struct rte_flow_actions_template_attr attr;
 	struct rte_flow_action *actions; /* Cached flow actions. */
 	struct rte_flow_action *masks; /* Cached action masks.*/
+	uint16_t mhdr_off; /* Offset of DR modify header action. */
 	uint32_t refcnt; /* Reference counter. */
 };
 
@@ -1112,6 +1181,22 @@ struct mlx5_hw_encap_decap_action {
 	uint8_t data[]; /* Action data. */
 };
 
+#define MLX5_MHDR_MAX_CMD ((MLX5_MAX_MODIFY_NUM) * 2 + 1)
+
+/* Modify field action struct. */
+struct mlx5_hw_modify_header_action {
+	/* Reference to DR action */
+	struct mlx5dr_action *action;
+	/* Modify header action position in action rule table. */
+	uint16_t pos;
+	/* Is MODIFY_HEADER action shared across flows in table. */
+	bool shared;
+	/* Amount of modification commands stored in the precompiled buffer. */
+	uint32_t mhdr_cmds_num;
+	/* Precompiled modification commands. */
+	struct mlx5_modification_cmd mhdr_cmds[MLX5_MHDR_MAX_CMD];
+};
+
 /* The maximum actions support in the flow. */
 #define MLX5_HW_MAX_ACTS 16
 
@@ -1121,6 +1206,7 @@ struct mlx5_hw_actions {
 	LIST_HEAD(act_list, mlx5_action_construct_data) act_list;
 	struct mlx5_hw_jump_action *jump; /* Jump action. */
 	struct mlx5_hrxq *tir; /* TIR action. */
+	struct mlx5_hw_modify_header_action *mhdr; /* Modify header action. */
 	/* Encap/Decap action. */
 	struct mlx5_hw_encap_decap_action *encap_decap;
 	uint16_t encap_decap_pos; /* Encap/Decap action position. */
@@ -2201,6 +2287,16 @@ int flow_dv_action_query(struct rte_eth_dev *dev,
 size_t flow_dv_get_item_hdr_len(const enum rte_flow_item_type item_type);
 int flow_dv_convert_encap_data(const struct rte_flow_item *items, uint8_t *buf,
 			   size_t *size, struct rte_flow_error *error);
+void mlx5_flow_field_id_to_modify_info
+		(const struct rte_flow_action_modify_data *data,
+		 struct field_modify_info *info, uint32_t *mask,
+		 uint32_t width, struct rte_eth_dev *dev,
+		 const struct rte_flow_attr *attr, struct rte_flow_error *error);
+int flow_dv_convert_modify_action(struct rte_flow_item *item,
+			      struct field_modify_info *field,
+			      struct field_modify_info *dcopy,
+			      struct mlx5_flow_dv_modify_hdr_resource *resource,
+			      uint32_t type, struct rte_flow_error *error);
 
 #define MLX5_PF_VPORT_ID 0
 #define MLX5_ECPF_VPORT_ID 0xFFFE
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index a9b0e84736..e4af9d910b 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -241,12 +241,6 @@ rte_col_2_mlx5_col(enum rte_color rcol)
 	return MLX5_FLOW_COLOR_UNDEFINED;
 }
 
-struct field_modify_info {
-	uint32_t size; /* Size of field in protocol header, in bytes. */
-	uint32_t offset; /* Offset of field in protocol header, in bytes. */
-	enum mlx5_modification_field id;
-};
-
 struct field_modify_info modify_eth[] = {
 	{4,  0, MLX5_MODI_OUT_DMAC_47_16},
 	{2,  4, MLX5_MODI_OUT_DMAC_15_0},
@@ -379,45 +373,6 @@ mlx5_update_vlan_vid_pcp(const struct rte_flow_action *action,
 	}
 }
 
-/**
- * Fetch 1, 2, 3 or 4 byte field from the byte array
- * and return as unsigned integer in host-endian format.
- *
- * @param[in] data
- *   Pointer to data array.
- * @param[in] size
- *   Size of field to extract.
- *
- * @return
- *   converted field in host endian format.
- */
-static inline uint32_t
-flow_dv_fetch_field(const uint8_t *data, uint32_t size)
-{
-	uint32_t ret;
-
-	switch (size) {
-	case 1:
-		ret = *data;
-		break;
-	case 2:
-		ret = rte_be_to_cpu_16(*(const unaligned_uint16_t *)data);
-		break;
-	case 3:
-		ret = rte_be_to_cpu_16(*(const unaligned_uint16_t *)data);
-		ret = (ret << 8) | *(data + sizeof(uint16_t));
-		break;
-	case 4:
-		ret = rte_be_to_cpu_32(*(const unaligned_uint32_t *)data);
-		break;
-	default:
-		MLX5_ASSERT(false);
-		ret = 0;
-		break;
-	}
-	return ret;
-}
-
 /**
  * Convert modify-header action to DV specification.
  *
@@ -446,7 +401,7 @@ flow_dv_fetch_field(const uint8_t *data, uint32_t size)
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
-static int
+int
 flow_dv_convert_modify_action(struct rte_flow_item *item,
 			      struct field_modify_info *field,
 			      struct field_modify_info *dcopy,
@@ -1464,7 +1419,32 @@ mlx5_flow_item_field_width(struct rte_eth_dev *dev,
 	return 0;
 }
 
-static void
+static __rte_always_inline uint8_t
+flow_modify_info_mask_8(uint32_t length, uint32_t off)
+{
+	return (0xffu >> (8 - length)) << off;
+}
+
+static __rte_always_inline uint16_t
+flow_modify_info_mask_16(uint32_t length, uint32_t off)
+{
+	return rte_cpu_to_be_16((0xffffu >> (16 - length)) << off);
+}
+
+static __rte_always_inline uint32_t
+flow_modify_info_mask_32(uint32_t length, uint32_t off)
+{
+	return rte_cpu_to_be_32((0xffffffffu >> (32 - length)) << off);
+}
+
+static __rte_always_inline uint32_t
+flow_modify_info_mask_32_masked(uint32_t length, uint32_t off, uint32_t post_mask)
+{
+	uint32_t mask = (0xffffffffu >> (32 - length)) << off;
+	return rte_cpu_to_be_32(mask & post_mask);
+}
+
+void
 mlx5_flow_field_id_to_modify_info
 		(const struct rte_flow_action_modify_data *data,
 		 struct field_modify_info *info, uint32_t *mask,
@@ -1473,323 +1453,340 @@ mlx5_flow_field_id_to_modify_info
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	uint32_t idx = 0;
-	uint32_t off = 0;
-
-	switch (data->field) {
+	uint32_t off_be = 0;
+	uint32_t length = 0;
+	switch ((int)data->field) {
 	case RTE_FLOW_FIELD_START:
 		/* not supported yet */
 		MLX5_ASSERT(false);
 		break;
 	case RTE_FLOW_FIELD_MAC_DST:
-		off = data->offset > 16 ? data->offset - 16 : 0;
-		if (mask) {
-			if (data->offset < 16) {
-				info[idx] = (struct field_modify_info){2, 4,
-						MLX5_MODI_OUT_DMAC_15_0};
-				if (width < 16) {
-					mask[1] = rte_cpu_to_be_16(0xffff >>
-								 (16 - width));
-					width = 0;
-				} else {
-					mask[1] = RTE_BE16(0xffff);
-					width -= 16;
-				}
-				if (!width)
-					break;
-				++idx;
-			}
-			info[idx] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_DMAC_47_16};
-			mask[0] = rte_cpu_to_be_32((0xffffffff >>
-						    (32 - width)) << off);
+		MLX5_ASSERT(data->offset + width <= 48);
+		off_be = 48 - (data->offset + width);
+		if (off_be < 16) {
+			info[idx] = (struct field_modify_info){2, 4,
+					MLX5_MODI_OUT_DMAC_15_0};
+			length = off_be + width <= 16 ? width : 16 - off_be;
+			if (mask)
+				mask[1] = flow_modify_info_mask_16(length,
+								   off_be);
+			else
+				info[idx].offset = off_be;
+			width -= length;
+			if (!width)
+				break;
+			off_be = 0;
+			idx++;
 		} else {
-			if (data->offset < 16)
-				info[idx++] = (struct field_modify_info){2, 0,
-						MLX5_MODI_OUT_DMAC_15_0};
-			info[idx] = (struct field_modify_info){4, off,
-						MLX5_MODI_OUT_DMAC_47_16};
+			off_be -= 16;
 		}
+		info[idx] = (struct field_modify_info){4, 0,
+				MLX5_MODI_OUT_DMAC_47_16};
+		if (mask)
+			mask[0] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_MAC_SRC:
-		off = data->offset > 16 ? data->offset - 16 : 0;
-		if (mask) {
-			if (data->offset < 16) {
-				info[idx] = (struct field_modify_info){2, 4,
-						MLX5_MODI_OUT_SMAC_15_0};
-				if (width < 16) {
-					mask[1] = rte_cpu_to_be_16(0xffff >>
-								 (16 - width));
-					width = 0;
-				} else {
-					mask[1] = RTE_BE16(0xffff);
-					width -= 16;
-				}
-				if (!width)
-					break;
-				++idx;
-			}
-			info[idx] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_SMAC_47_16};
-			mask[0] = rte_cpu_to_be_32((0xffffffff >>
-						    (32 - width)) << off);
+		MLX5_ASSERT(data->offset + width <= 48);
+		off_be = 48 - (data->offset + width);
+		if (off_be < 16) {
+			info[idx] = (struct field_modify_info){2, 4,
+					MLX5_MODI_OUT_SMAC_15_0};
+			length = off_be + width <= 16 ? width : 16 - off_be;
+			if (mask)
+				mask[1] = flow_modify_info_mask_16(length,
+								   off_be);
+			else
+				info[idx].offset = off_be;
+			width -= length;
+			if (!width)
+				break;
+			off_be = 0;
+			idx++;
 		} else {
-			if (data->offset < 16)
-				info[idx++] = (struct field_modify_info){2, 0,
-						MLX5_MODI_OUT_SMAC_15_0};
-			info[idx] = (struct field_modify_info){4, off,
-						MLX5_MODI_OUT_SMAC_47_16};
+			off_be -= 16;
 		}
+		info[idx] = (struct field_modify_info){4, 0,
+				MLX5_MODI_OUT_SMAC_47_16};
+		if (mask)
+			mask[0] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_VLAN_TYPE:
 		/* not supported yet */
 		break;
 	case RTE_FLOW_FIELD_VLAN_ID:
+		MLX5_ASSERT(data->offset + width <= 12);
+		off_be = 12 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_FIRST_VID};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0x0fff >> (12 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_MAC_TYPE:
+		MLX5_ASSERT(data->offset + width <= 16);
+		off_be = 16 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_ETHERTYPE};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0xffff >> (16 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_IPV4_DSCP:
+		MLX5_ASSERT(data->offset + width <= 6);
+		off_be = 6 - (data->offset + width);
 		info[idx] = (struct field_modify_info){1, 0,
 					MLX5_MODI_OUT_IP_DSCP};
 		if (mask)
-			mask[idx] = 0x3f >> (6 - width);
+			mask[idx] = flow_modify_info_mask_8(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_IPV4_TTL:
+		MLX5_ASSERT(data->offset + width <= 8);
+		off_be = 8 - (data->offset + width);
 		info[idx] = (struct field_modify_info){1, 0,
 					MLX5_MODI_OUT_IPV4_TTL};
 		if (mask)
-			mask[idx] = 0xff >> (8 - width);
+			mask[idx] = flow_modify_info_mask_8(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_IPV4_SRC:
+		MLX5_ASSERT(data->offset + width <= 32);
+		off_be = 32 - (data->offset + width);
 		info[idx] = (struct field_modify_info){4, 0,
 					MLX5_MODI_OUT_SIPV4};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_32(0xffffffff >>
-						     (32 - width));
+			mask[idx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_IPV4_DST:
+		MLX5_ASSERT(data->offset + width <= 32);
+		off_be = 32 - (data->offset + width);
 		info[idx] = (struct field_modify_info){4, 0,
 					MLX5_MODI_OUT_DIPV4};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_32(0xffffffff >>
-						     (32 - width));
+			mask[idx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_IPV6_DSCP:
+		MLX5_ASSERT(data->offset + width <= 6);
+		off_be = 6 - (data->offset + width);
 		info[idx] = (struct field_modify_info){1, 0,
 					MLX5_MODI_OUT_IP_DSCP};
 		if (mask)
-			mask[idx] = 0x3f >> (6 - width);
+			mask[idx] = flow_modify_info_mask_8(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_IPV6_HOPLIMIT:
+		MLX5_ASSERT(data->offset + width <= 8);
+		off_be = 8 - (data->offset + width);
 		info[idx] = (struct field_modify_info){1, 0,
 					MLX5_MODI_OUT_IPV6_HOPLIMIT};
 		if (mask)
-			mask[idx] = 0xff >> (8 - width);
+			mask[idx] = flow_modify_info_mask_8(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
-	case RTE_FLOW_FIELD_IPV6_SRC:
-		if (mask) {
-			if (data->offset < 32) {
-				info[idx] = (struct field_modify_info){4, 12,
-						MLX5_MODI_OUT_SIPV6_31_0};
-				if (width < 32) {
-					mask[3] =
-						rte_cpu_to_be_32(0xffffffff >>
-								 (32 - width));
-					width = 0;
-				} else {
-					mask[3] = RTE_BE32(0xffffffff);
-					width -= 32;
-				}
-				if (!width)
-					break;
-				++idx;
-			}
-			if (data->offset < 64) {
-				info[idx] = (struct field_modify_info){4, 8,
-						MLX5_MODI_OUT_SIPV6_63_32};
-				if (width < 32) {
-					mask[2] =
-						rte_cpu_to_be_32(0xffffffff >>
-								 (32 - width));
-					width = 0;
-				} else {
-					mask[2] = RTE_BE32(0xffffffff);
-					width -= 32;
-				}
-				if (!width)
-					break;
-				++idx;
-			}
-			if (data->offset < 96) {
-				info[idx] = (struct field_modify_info){4, 4,
-						MLX5_MODI_OUT_SIPV6_95_64};
-				if (width < 32) {
-					mask[1] =
-						rte_cpu_to_be_32(0xffffffff >>
-								 (32 - width));
-					width = 0;
-				} else {
-					mask[1] = RTE_BE32(0xffffffff);
-					width -= 32;
-				}
-				if (!width)
-					break;
-				++idx;
+	case RTE_FLOW_FIELD_IPV6_SRC: {
+		/*
+		 * Fields corresponding to IPv6 source address bytes
+		 * arranged according to network byte ordering.
+		 */
+		struct field_modify_info fields[] = {
+			{ 4, 0, MLX5_MODI_OUT_SIPV6_127_96 },
+			{ 4, 4, MLX5_MODI_OUT_SIPV6_95_64 },
+			{ 4, 8, MLX5_MODI_OUT_SIPV6_63_32 },
+			{ 4, 12, MLX5_MODI_OUT_SIPV6_31_0 },
+		};
+		/* First mask to be modified is the mask of 4th address byte. */
+		uint32_t midx = 3;
+
+		MLX5_ASSERT(data->offset + width <= 128);
+		off_be = 128 - (data->offset + width);
+		while (width > 0 && midx > 0) {
+			if (off_be < 32) {
+				info[idx] = fields[midx];
+				length = off_be + width <= 32 ?
+					 width : 32 - off_be;
+				if (mask)
+					mask[midx] = flow_modify_info_mask_32
+						(length, off_be);
+				else
+					info[idx].offset = off_be;
+				width -= length;
+				off_be = 0;
+				idx++;
+			} else {
+				off_be -= 32;
 			}
-			info[idx] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_SIPV6_127_96};
-			mask[0] = rte_cpu_to_be_32(0xffffffff >> (32 - width));
-		} else {
-			if (data->offset < 32)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_SIPV6_31_0};
-			if (data->offset < 64)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_SIPV6_63_32};
-			if (data->offset < 96)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_SIPV6_95_64};
-			if (data->offset < 128)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_SIPV6_127_96};
+			midx--;
 		}
+		if (!width)
+			break;
+		info[idx] = fields[midx];
+		if (mask)
+			mask[midx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
-	case RTE_FLOW_FIELD_IPV6_DST:
-		if (mask) {
-			if (data->offset < 32) {
-				info[idx] = (struct field_modify_info){4, 12,
-						MLX5_MODI_OUT_DIPV6_31_0};
-				if (width < 32) {
-					mask[3] =
-						rte_cpu_to_be_32(0xffffffff >>
-								 (32 - width));
-					width = 0;
-				} else {
-					mask[3] = RTE_BE32(0xffffffff);
-					width -= 32;
-				}
-				if (!width)
-					break;
-				++idx;
-			}
-			if (data->offset < 64) {
-				info[idx] = (struct field_modify_info){4, 8,
-						MLX5_MODI_OUT_DIPV6_63_32};
-				if (width < 32) {
-					mask[2] =
-						rte_cpu_to_be_32(0xffffffff >>
-								 (32 - width));
-					width = 0;
-				} else {
-					mask[2] = RTE_BE32(0xffffffff);
-					width -= 32;
-				}
-				if (!width)
-					break;
-				++idx;
-			}
-			if (data->offset < 96) {
-				info[idx] = (struct field_modify_info){4, 4,
-						MLX5_MODI_OUT_DIPV6_95_64};
-				if (width < 32) {
-					mask[1] =
-						rte_cpu_to_be_32(0xffffffff >>
-								 (32 - width));
-					width = 0;
-				} else {
-					mask[1] = RTE_BE32(0xffffffff);
-					width -= 32;
-				}
-				if (!width)
-					break;
-				++idx;
+	}
+	case RTE_FLOW_FIELD_IPV6_DST: {
+		/*
+		 * Fields corresponding to IPv6 destination address bytes
+		 * arranged according to network byte ordering.
+		 */
+		struct field_modify_info fields[] = {
+			{ 4, 0, MLX5_MODI_OUT_DIPV6_127_96 },
+			{ 4, 4, MLX5_MODI_OUT_DIPV6_95_64 },
+			{ 4, 8, MLX5_MODI_OUT_DIPV6_63_32 },
+			{ 4, 12, MLX5_MODI_OUT_DIPV6_31_0 },
+		};
+		/* First mask to be modified is the mask of 4th address byte. */
+		uint32_t midx = 3;
+
+		MLX5_ASSERT(data->offset + width <= 128);
+		off_be = 128 - (data->offset + width);
+		while (width > 0 && midx > 0) {
+			if (off_be < 32) {
+				info[idx] = fields[midx];
+				length = off_be + width <= 32 ?
+					 width : 32 - off_be;
+				if (mask)
+					mask[midx] = flow_modify_info_mask_32
+						(length, off_be);
+				else
+					info[idx].offset = off_be;
+				width -= length;
+				off_be = 0;
+				idx++;
+			} else {
+				off_be -= 32;
 			}
-			info[idx] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_DIPV6_127_96};
-			mask[0] = rte_cpu_to_be_32(0xffffffff >> (32 - width));
-		} else {
-			if (data->offset < 32)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_DIPV6_31_0};
-			if (data->offset < 64)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_DIPV6_63_32};
-			if (data->offset < 96)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_DIPV6_95_64};
-			if (data->offset < 128)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_DIPV6_127_96};
+			midx--;
 		}
+		if (!width)
+			break;
+		info[idx] = fields[midx];
+		if (mask)
+			mask[midx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
+	}
 	case RTE_FLOW_FIELD_TCP_PORT_SRC:
+		MLX5_ASSERT(data->offset + width <= 16);
+		off_be = 16 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_TCP_SPORT};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0xffff >> (16 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_TCP_PORT_DST:
+		MLX5_ASSERT(data->offset + width <= 16);
+		off_be = 16 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_TCP_DPORT};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0xffff >> (16 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_TCP_SEQ_NUM:
+		MLX5_ASSERT(data->offset + width <= 32);
+		off_be = 32 - (data->offset + width);
 		info[idx] = (struct field_modify_info){4, 0,
 					MLX5_MODI_OUT_TCP_SEQ_NUM};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_32(0xffffffff >>
-						     (32 - width));
+			mask[idx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_TCP_ACK_NUM:
+		MLX5_ASSERT(data->offset + width <= 32);
+		off_be = 32 - (data->offset + width);
 		info[idx] = (struct field_modify_info){4, 0,
 					MLX5_MODI_OUT_TCP_ACK_NUM};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_32(0xffffffff >>
-						     (32 - width));
+			mask[idx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_TCP_FLAGS:
+		MLX5_ASSERT(data->offset + width <= 9);
+		off_be = 9 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_TCP_FLAGS};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0x1ff >> (9 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_UDP_PORT_SRC:
+		MLX5_ASSERT(data->offset + width <= 16);
+		off_be = 16 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_UDP_SPORT};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0xffff >> (16 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_UDP_PORT_DST:
+		MLX5_ASSERT(data->offset + width <= 16);
+		off_be = 16 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_UDP_DPORT};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0xffff >> (16 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_VXLAN_VNI:
-		/* not supported yet */
+		MLX5_ASSERT(data->offset + width <= 24);
+		/* VNI is on bits 31-8 of TUNNEL_HDR_DW_1. */
+		off_be = 24 - (data->offset + width) + 8;
+		info[idx] = (struct field_modify_info){4, 0,
+					MLX5_MODI_TUNNEL_HDR_DW_1};
+		if (mask)
+			mask[idx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_GENEVE_VNI:
 		/* not supported yet*/
 		break;
 	case RTE_FLOW_FIELD_GTP_TEID:
+		MLX5_ASSERT(data->offset + width <= 32);
+		off_be = 32 - (data->offset + width);
 		info[idx] = (struct field_modify_info){4, 0,
 					MLX5_MODI_GTP_TEID};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_32(0xffffffff >>
-						     (32 - width));
+			mask[idx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_TAG:
 		{
-			int reg = mlx5_flow_get_reg_id(dev, MLX5_APP_TAG,
-						   data->level, error);
+			MLX5_ASSERT(data->offset + width <= 32);
+			int reg;
+
+			if (priv->sh->config.dv_flow_en == 2)
+				reg = REG_C_1;
+			else
+				reg = mlx5_flow_get_reg_id(dev, MLX5_APP_TAG,
+							   data->level, error);
 			if (reg < 0)
 				return;
 			MLX5_ASSERT(reg != REG_NON);
@@ -1797,15 +1794,18 @@ mlx5_flow_field_id_to_modify_info
 			info[idx] = (struct field_modify_info){4, 0,
 						reg_to_field[reg]};
 			if (mask)
-				mask[idx] =
-					rte_cpu_to_be_32(0xffffffff >>
-							 (32 - width));
+				mask[idx] = flow_modify_info_mask_32
+					(width, data->offset);
+			else
+				info[idx].offset = data->offset;
 		}
 		break;
 	case RTE_FLOW_FIELD_MARK:
 		{
 			uint32_t mark_mask = priv->sh->dv_mark_mask;
 			uint32_t mark_count = __builtin_popcount(mark_mask);
+			RTE_SET_USED(mark_count);
+			MLX5_ASSERT(data->offset + width <= mark_count);
 			int reg = mlx5_flow_get_reg_id(dev, MLX5_FLOW_MARK,
 						       0, error);
 			if (reg < 0)
@@ -1815,14 +1815,18 @@ mlx5_flow_field_id_to_modify_info
 			info[idx] = (struct field_modify_info){4, 0,
 						reg_to_field[reg]};
 			if (mask)
-				mask[idx] = rte_cpu_to_be_32((mark_mask >>
-					 (mark_count - width)) & mark_mask);
+				mask[idx] = flow_modify_info_mask_32_masked
+					(width, data->offset, mark_mask);
+			else
+				info[idx].offset = data->offset;
 		}
 		break;
 	case RTE_FLOW_FIELD_META:
 		{
 			uint32_t meta_mask = priv->sh->dv_meta_mask;
 			uint32_t meta_count = __builtin_popcount(meta_mask);
+			RTE_SET_USED(meta_count);
+			MLX5_ASSERT(data->offset + width <= meta_count);
 			int reg = flow_dv_get_metadata_reg(dev, attr, error);
 			if (reg < 0)
 				return;
@@ -1831,16 +1835,32 @@ mlx5_flow_field_id_to_modify_info
 			info[idx] = (struct field_modify_info){4, 0,
 						reg_to_field[reg]};
 			if (mask)
-				mask[idx] = rte_cpu_to_be_32((meta_mask >>
-					(meta_count - width)) & meta_mask);
+				mask[idx] = flow_modify_info_mask_32_masked
+					(width, data->offset, meta_mask);
+			else
+				info[idx].offset = data->offset;
 		}
 		break;
 	case RTE_FLOW_FIELD_IPV4_ECN:
 	case RTE_FLOW_FIELD_IPV6_ECN:
+		MLX5_ASSERT(data->offset + width <= 2);
+		off_be = 2 - (data->offset + width);
 		info[idx] = (struct field_modify_info){1, 0,
 					MLX5_MODI_OUT_IP_ECN};
 		if (mask)
-			mask[idx] = 0x3 >> (2 - width);
+			mask[idx] = flow_modify_info_mask_8(width, off_be);
+		else
+			info[idx].offset = off_be;
+		break;
+	case RTE_FLOW_FIELD_GTP_PSC_QFI:
+		MLX5_ASSERT(data->offset + width <= 8);
+		off_be = data->offset + 8;
+		info[idx] = (struct field_modify_info){4, 0,
+					MLX5_MODI_GTPU_FIRST_EXT_DW_0};
+		if (mask)
+			mask[idx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_POINTER:
 	case RTE_FLOW_FIELD_VALUE:
@@ -1890,7 +1910,8 @@ flow_dv_convert_action_modify_field
 
 	if (conf->src.field == RTE_FLOW_FIELD_POINTER ||
 	    conf->src.field == RTE_FLOW_FIELD_VALUE) {
-		type = MLX5_MODIFICATION_TYPE_SET;
+		type = conf->operation == RTE_FLOW_MODIFY_SET ?
+			MLX5_MODIFICATION_TYPE_SET : MLX5_MODIFICATION_TYPE_ADD;
 		/** For SET fill the destination field (field) first. */
 		mlx5_flow_field_id_to_modify_info(&conf->dst, field, mask,
 						  conf->width, dev,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index b6978bd051..3321e17fef 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -319,6 +319,11 @@ __flow_hw_action_template_destroy(struct rte_eth_dev *dev,
 		mlx5_hlist_unregister(priv->sh->flow_tbls, &grp->entry);
 		acts->jump = NULL;
 	}
+	if (acts->mhdr) {
+		if (acts->mhdr->action)
+			mlx5dr_action_destroy(acts->mhdr->action);
+		mlx5_free(acts->mhdr);
+	}
 }
 
 /**
@@ -425,6 +430,37 @@ __flow_hw_act_data_encap_append(struct mlx5_priv *priv,
 	return 0;
 }
 
+static __rte_always_inline int
+__flow_hw_act_data_hdr_modify_append(struct mlx5_priv *priv,
+				     struct mlx5_hw_actions *acts,
+				     enum rte_flow_action_type type,
+				     uint16_t action_src,
+				     uint16_t action_dst,
+				     uint16_t mhdr_cmds_off,
+				     uint16_t mhdr_cmds_end,
+				     bool shared,
+				     struct field_modify_info *field,
+				     struct field_modify_info *dcopy,
+				     uint32_t *mask)
+{
+	struct mlx5_action_construct_data *act_data;
+
+	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
+	if (!act_data)
+		return -1;
+	act_data->modify_header.mhdr_cmds_off = mhdr_cmds_off;
+	act_data->modify_header.mhdr_cmds_end = mhdr_cmds_end;
+	act_data->modify_header.shared = shared;
+	rte_memcpy(act_data->modify_header.field, field,
+		   sizeof(*field) * MLX5_ACT_MAX_MOD_FIELDS);
+	rte_memcpy(act_data->modify_header.dcopy, dcopy,
+		   sizeof(*dcopy) * MLX5_ACT_MAX_MOD_FIELDS);
+	rte_memcpy(act_data->modify_header.mask, mask,
+		   sizeof(*mask) * MLX5_ACT_MAX_MOD_FIELDS);
+	LIST_INSERT_HEAD(&acts->act_list, act_data, next);
+	return 0;
+}
+
 /**
  * Append shared RSS action to the dynamic action list.
  *
@@ -515,6 +551,265 @@ flow_hw_shared_action_translate(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static __rte_always_inline bool
+flow_hw_action_modify_field_is_shared(const struct rte_flow_action *action,
+				      const struct rte_flow_action *mask)
+{
+	const struct rte_flow_action_modify_field *v = action->conf;
+	const struct rte_flow_action_modify_field *m = mask->conf;
+
+	if (v->src.field == RTE_FLOW_FIELD_VALUE) {
+		uint32_t j;
+
+		if (m == NULL)
+			return false;
+		for (j = 0; j < RTE_DIM(m->src.value); ++j) {
+			/*
+			 * Immediate value is considered to be masked
+			 * (and thus shared by all flow rules), if mask
+			 * is non-zero. Partial mask over immediate value
+			 * is not allowed.
+			 */
+			if (m->src.value[j])
+				return true;
+		}
+		return false;
+	}
+	if (v->src.field == RTE_FLOW_FIELD_POINTER)
+		return m->src.pvalue != NULL;
+	/*
+	 * Source field types other than VALUE and
+	 * POINTER are always shared.
+	 */
+	return true;
+}
+
+static __rte_always_inline bool
+flow_hw_should_insert_nop(const struct mlx5_hw_modify_header_action *mhdr,
+			  const struct mlx5_modification_cmd *cmd)
+{
+	struct mlx5_modification_cmd last_cmd = { { 0 } };
+	struct mlx5_modification_cmd new_cmd = { { 0 } };
+	const uint32_t cmds_num = mhdr->mhdr_cmds_num;
+	unsigned int last_type;
+	bool should_insert = false;
+
+	if (cmds_num == 0)
+		return false;
+	last_cmd = *(&mhdr->mhdr_cmds[cmds_num - 1]);
+	last_cmd.data0 = rte_be_to_cpu_32(last_cmd.data0);
+	last_cmd.data1 = rte_be_to_cpu_32(last_cmd.data1);
+	last_type = last_cmd.action_type;
+	new_cmd = *cmd;
+	new_cmd.data0 = rte_be_to_cpu_32(new_cmd.data0);
+	new_cmd.data1 = rte_be_to_cpu_32(new_cmd.data1);
+	switch (new_cmd.action_type) {
+	case MLX5_MODIFICATION_TYPE_SET:
+	case MLX5_MODIFICATION_TYPE_ADD:
+		if (last_type == MLX5_MODIFICATION_TYPE_SET ||
+		    last_type == MLX5_MODIFICATION_TYPE_ADD)
+			should_insert = new_cmd.field == last_cmd.field;
+		else if (last_type == MLX5_MODIFICATION_TYPE_COPY)
+			should_insert = new_cmd.field == last_cmd.dst_field;
+		else if (last_type == MLX5_MODIFICATION_TYPE_NOP)
+			should_insert = false;
+		else
+			MLX5_ASSERT(false); /* Other types are not supported. */
+		break;
+	case MLX5_MODIFICATION_TYPE_COPY:
+		if (last_type == MLX5_MODIFICATION_TYPE_SET ||
+		    last_type == MLX5_MODIFICATION_TYPE_ADD)
+			should_insert = (new_cmd.field == last_cmd.field ||
+					 new_cmd.dst_field == last_cmd.field);
+		else if (last_type == MLX5_MODIFICATION_TYPE_COPY)
+			should_insert = (new_cmd.field == last_cmd.dst_field ||
+					 new_cmd.dst_field == last_cmd.dst_field);
+		else if (last_type == MLX5_MODIFICATION_TYPE_NOP)
+			should_insert = false;
+		else
+			MLX5_ASSERT(false); /* Other types are not supported. */
+		break;
+	default:
+		/* Other action types should be rejected on AT validation. */
+		MLX5_ASSERT(false);
+		break;
+	}
+	return should_insert;
+}
+
+static __rte_always_inline int
+flow_hw_mhdr_cmd_nop_append(struct mlx5_hw_modify_header_action *mhdr)
+{
+	struct mlx5_modification_cmd *nop;
+	uint32_t num = mhdr->mhdr_cmds_num;
+
+	if (num + 1 >= MLX5_MHDR_MAX_CMD)
+		return -ENOMEM;
+	nop = mhdr->mhdr_cmds + num;
+	nop->data0 = 0;
+	nop->action_type = MLX5_MODIFICATION_TYPE_NOP;
+	nop->data0 = rte_cpu_to_be_32(nop->data0);
+	nop->data1 = 0;
+	mhdr->mhdr_cmds_num = num + 1;
+	return 0;
+}
+
+static __rte_always_inline int
+flow_hw_mhdr_cmd_append(struct mlx5_hw_modify_header_action *mhdr,
+			struct mlx5_modification_cmd *cmd)
+{
+	uint32_t num = mhdr->mhdr_cmds_num;
+
+	if (num + 1 >= MLX5_MHDR_MAX_CMD)
+		return -ENOMEM;
+	mhdr->mhdr_cmds[num] = *cmd;
+	mhdr->mhdr_cmds_num = num + 1;
+	return 0;
+}
+
+static __rte_always_inline int
+flow_hw_converted_mhdr_cmds_append(struct mlx5_hw_modify_header_action *mhdr,
+				   struct mlx5_flow_dv_modify_hdr_resource *resource)
+{
+	uint32_t idx;
+	int ret;
+
+	for (idx = 0; idx < resource->actions_num; ++idx) {
+		struct mlx5_modification_cmd *src = &resource->actions[idx];
+
+		if (flow_hw_should_insert_nop(mhdr, src)) {
+			ret = flow_hw_mhdr_cmd_nop_append(mhdr);
+			if (ret)
+				return ret;
+		}
+		ret = flow_hw_mhdr_cmd_append(mhdr, src);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+static __rte_always_inline void
+flow_hw_modify_field_init(struct mlx5_hw_modify_header_action *mhdr,
+			  struct rte_flow_actions_template *at)
+{
+	memset(mhdr, 0, sizeof(*mhdr));
+	/* Modify header action without any commands is shared by default. */
+	mhdr->shared = true;
+	mhdr->pos = at->mhdr_off;
+}
+
+static __rte_always_inline int
+flow_hw_modify_field_compile(struct rte_eth_dev *dev,
+			     const struct rte_flow_attr *attr,
+			     const struct rte_flow_action *action_start, /* Start of AT actions. */
+			     const struct rte_flow_action *action, /* Current action from AT. */
+			     const struct rte_flow_action *action_mask, /* Current mask from AT. */
+			     struct mlx5_hw_actions *acts,
+			     struct mlx5_hw_modify_header_action *mhdr,
+			     struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_action_modify_field *conf = action->conf;
+	union {
+		struct mlx5_flow_dv_modify_hdr_resource resource;
+		uint8_t data[sizeof(struct mlx5_flow_dv_modify_hdr_resource) +
+			     sizeof(struct mlx5_modification_cmd) * MLX5_MHDR_MAX_CMD];
+	} dummy;
+	struct mlx5_flow_dv_modify_hdr_resource *resource;
+	struct rte_flow_item item = {
+		.spec = NULL,
+		.mask = NULL
+	};
+	struct field_modify_info field[MLX5_ACT_MAX_MOD_FIELDS] = {
+						{0, 0, MLX5_MODI_OUT_NONE} };
+	struct field_modify_info dcopy[MLX5_ACT_MAX_MOD_FIELDS] = {
+						{0, 0, MLX5_MODI_OUT_NONE} };
+	uint32_t mask[MLX5_ACT_MAX_MOD_FIELDS] = { 0 };
+	uint32_t type, value = 0;
+	uint16_t cmds_start, cmds_end;
+	bool shared;
+	int ret;
+
+	/*
+	 * Modify header action is shared if previous modify_field actions
+	 * are shared and currently compiled action is shared.
+	 */
+	shared = flow_hw_action_modify_field_is_shared(action, action_mask);
+	mhdr->shared &= shared;
+	if (conf->src.field == RTE_FLOW_FIELD_POINTER ||
+	    conf->src.field == RTE_FLOW_FIELD_VALUE) {
+		type = conf->operation == RTE_FLOW_MODIFY_SET ? MLX5_MODIFICATION_TYPE_SET :
+								MLX5_MODIFICATION_TYPE_ADD;
+		/* For SET/ADD fill the destination field (field) first. */
+		mlx5_flow_field_id_to_modify_info(&conf->dst, field, mask,
+						  conf->width, dev,
+						  attr, error);
+		item.spec = conf->src.field == RTE_FLOW_FIELD_POINTER ?
+				(void *)(uintptr_t)conf->src.pvalue :
+				(void *)(uintptr_t)&conf->src.value;
+		if (conf->dst.field == RTE_FLOW_FIELD_META ||
+		    conf->dst.field == RTE_FLOW_FIELD_TAG) {
+			value = *(const unaligned_uint32_t *)item.spec;
+			value = rte_cpu_to_be_32(value);
+			item.spec = &value;
+		} else if (conf->dst.field == RTE_FLOW_FIELD_GTP_PSC_QFI) {
+			/*
+			 * QFI is passed as a uint8_t integer, but it is accessed through
+			 * the 2nd least significant byte of a 32-bit field in modify header command.
+			 */
+			value = *(const uint8_t *)item.spec;
+			value = rte_cpu_to_be_32(value << 8);
+			item.spec = &value;
+		}
+	} else {
+		type = MLX5_MODIFICATION_TYPE_COPY;
+		/* For COPY fill the destination field (dcopy) without mask. */
+		mlx5_flow_field_id_to_modify_info(&conf->dst, dcopy, NULL,
+						  conf->width, dev,
+						  attr, error);
+		/* Then construct the source field (field) with mask. */
+		mlx5_flow_field_id_to_modify_info(&conf->src, field, mask,
+						  conf->width, dev,
+						  attr, error);
+	}
+	item.mask = &mask;
+	memset(&dummy, 0, sizeof(dummy));
+	resource = &dummy.resource;
+	ret = flow_dv_convert_modify_action(&item, field, dcopy, resource, type, error);
+	if (ret)
+		return ret;
+	MLX5_ASSERT(resource->actions_num > 0);
+	/*
+	 * If the previous modify field action collides with this one, insert a NOP command.
+	 * This NOP command will not be part of the action's command range used to update
+	 * commands on rule creation.
+	 */
+	if (flow_hw_should_insert_nop(mhdr, &resource->actions[0])) {
+		ret = flow_hw_mhdr_cmd_nop_append(mhdr);
+		if (ret)
+			return rte_flow_error_set(error, ret, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+						  NULL, "too many modify field operations specified");
+	}
+	cmds_start = mhdr->mhdr_cmds_num;
+	ret = flow_hw_converted_mhdr_cmds_append(mhdr, resource);
+	if (ret)
+		return rte_flow_error_set(error, ret, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "too many modify field operations specified");
+
+	cmds_end = mhdr->mhdr_cmds_num;
+	if (shared)
+		return 0;
+	ret = __flow_hw_act_data_hdr_modify_append(priv, acts, RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+						   action - action_start, mhdr->pos,
+						   cmds_start, cmds_end, shared,
+						   field, dcopy, mask);
+	if (ret)
+		return rte_flow_error_set(error, ENOMEM, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "not enough memory to store modify field metadata");
+	return 0;
+}
+
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -558,10 +853,12 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 	uint16_t reformat_pos = MLX5_HW_MAX_ACTS, reformat_src = 0;
 	uint8_t *encap_data = NULL, *encap_data_m = NULL;
 	size_t data_size = 0;
+	struct mlx5_hw_modify_header_action mhdr = { 0 };
 	bool actions_end = false;
 	uint32_t type, i;
 	int err;
 
+	flow_hw_modify_field_init(&mhdr, at);
 	if (attr->transfer)
 		type = MLX5DR_TABLE_TYPE_FDB;
 	else if (attr->egress)
@@ -714,6 +1011,15 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			reformat_pos = i++;
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_TNL_L2_TO_L2;
 			break;
+		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
+			if (mhdr.pos == UINT16_MAX)
+				mhdr.pos = i++;
+			err = flow_hw_modify_field_compile(dev, attr, action_start,
+							   actions, masks, acts, &mhdr,
+							   error);
+			if (err)
+				goto err;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -721,6 +1027,31 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			break;
 		}
 	}
+	if (mhdr.pos != UINT16_MAX) {
+		uint32_t flags;
+		uint32_t bulk_size;
+		size_t mhdr_len;
+
+		acts->mhdr = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*acts->mhdr),
+					 0, SOCKET_ID_ANY);
+		if (!acts->mhdr)
+			goto err;
+		rte_memcpy(acts->mhdr, &mhdr, sizeof(*acts->mhdr));
+		mhdr_len = sizeof(struct mlx5_modification_cmd) * acts->mhdr->mhdr_cmds_num;
+		flags = mlx5_hw_act_flag[!!attr->group][type];
+		if (acts->mhdr->shared) {
+			flags |= MLX5DR_ACTION_FLAG_SHARED;
+			bulk_size = 0;
+		} else {
+			bulk_size = rte_log2_u32(table_attr->nb_flows);
+		}
+		acts->mhdr->action = mlx5dr_action_create_modify_header
+				(priv->dr_ctx, mhdr_len, (__be64 *)acts->mhdr->mhdr_cmds,
+				 bulk_size, flags);
+		if (!acts->mhdr->action)
+			goto err;
+		acts->rule_acts[acts->mhdr->pos].action = acts->mhdr->action;
+	}
 	if (reformat_pos != MLX5_HW_MAX_ACTS) {
 		uint8_t buf[MLX5_ENCAP_MAX_LEN];
 		bool shared_rfmt = true;
@@ -884,6 +1215,110 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static __rte_always_inline int
+flow_hw_mhdr_cmd_is_nop(const struct mlx5_modification_cmd *cmd)
+{
+	struct mlx5_modification_cmd cmd_he = {
+		.data0 = rte_be_to_cpu_32(cmd->data0),
+		.data1 = 0,
+	};
+
+	return cmd_he.action_type == MLX5_MODIFICATION_TYPE_NOP;
+}
+
+/**
+ * Construct the modify header commands for a single modify_field action.
+ *
+ * For action templates containing non-masked modify_field actions, the
+ * per-rule modify header commands are updated from the rte_flow action
+ * provided during flow creation.
+ *
+ * @param[in] job
+ *   Pointer to the job descriptor holding the per-rule command buffer.
+ * @param[in] act_data
+ *   Pointer to the action construct data of this modify_field action.
+ * @param[in] hw_acts
+ *   Pointer to the translated actions from the template.
+ * @param[in] action
+ *   Pointer to the rte_flow modify_field action given at flow creation.
+ *
+ * @return
+ *    0 on success, negative value otherwise.
+ */
+static __rte_always_inline int
+flow_hw_modify_field_construct(struct mlx5_hw_q_job *job,
+			       struct mlx5_action_construct_data *act_data,
+			       const struct mlx5_hw_actions *hw_acts,
+			       const struct rte_flow_action *action)
+{
+	const struct rte_flow_action_modify_field *mhdr_action = action->conf;
+	uint8_t values[16] = { 0 };
+	unaligned_uint32_t *value_p;
+	uint32_t i;
+	struct field_modify_info *field;
+
+	if (!hw_acts->mhdr)
+		return -1;
+	if (hw_acts->mhdr->shared || act_data->modify_header.shared)
+		return 0;
+	MLX5_ASSERT(mhdr_action->operation == RTE_FLOW_MODIFY_SET ||
+		    mhdr_action->operation == RTE_FLOW_MODIFY_ADD);
+	if (mhdr_action->src.field != RTE_FLOW_FIELD_VALUE &&
+	    mhdr_action->src.field != RTE_FLOW_FIELD_POINTER)
+		return 0;
+	if (mhdr_action->src.field == RTE_FLOW_FIELD_VALUE)
+		rte_memcpy(values, &mhdr_action->src.value, sizeof(values));
+	else
+		rte_memcpy(values, mhdr_action->src.pvalue, sizeof(values));
+	if (mhdr_action->dst.field == RTE_FLOW_FIELD_META ||
+	    mhdr_action->dst.field == RTE_FLOW_FIELD_TAG) {
+		value_p = (unaligned_uint32_t *)values;
+		*value_p = rte_cpu_to_be_32(*value_p);
+	} else if (mhdr_action->dst.field == RTE_FLOW_FIELD_GTP_PSC_QFI) {
+		uint32_t tmp;
+
+		/*
+		 * QFI is passed as a uint8_t integer, but it is accessed through
+		 * the 2nd least significant byte of a 32-bit field in modify header command.
+		 */
+		tmp = values[0];
+		value_p = (unaligned_uint32_t *)values;
+		*value_p = rte_cpu_to_be_32(tmp << 8);
+	}
+	i = act_data->modify_header.mhdr_cmds_off;
+	field = act_data->modify_header.field;
+	do {
+		uint32_t off_b;
+		uint32_t mask;
+		uint32_t data;
+		const uint8_t *mask_src;
+
+		if (i >= act_data->modify_header.mhdr_cmds_end)
+			return -1;
+		if (flow_hw_mhdr_cmd_is_nop(&job->mhdr_cmd[i])) {
+			++i;
+			continue;
+		}
+		mask_src = (const uint8_t *)act_data->modify_header.mask;
+		mask = flow_dv_fetch_field(mask_src + field->offset, field->size);
+		if (!mask) {
+			++field;
+			continue;
+		}
+		off_b = rte_bsf32(mask);
+		data = flow_dv_fetch_field(values + field->offset, field->size);
+		data = (data & mask) >> off_b;
+		job->mhdr_cmd[i++].data1 = rte_cpu_to_be_32(data);
+		++field;
+	} while (field->size);
+	return 0;
+}
+
 /**
  * Construct flow action array.
  *
@@ -928,6 +1363,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	};
 	uint32_t ft_flag;
 	size_t encap_len = 0;
+	int ret;
 
 	memcpy(rule_acts, hw_acts->rule_acts,
 	       sizeof(*rule_acts) * hw_acts->acts_num);
@@ -945,6 +1381,18 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	} else {
 		attr.ingress = 1;
 	}
+	if (hw_acts->mhdr && hw_acts->mhdr->mhdr_cmds_num > 0) {
+		uint16_t pos = hw_acts->mhdr->pos;
+
+		if (!hw_acts->mhdr->shared) {
+			rule_acts[pos].modify_header.offset =
+						job->flow->idx - 1;
+			rule_acts[pos].modify_header.data =
+						(uint8_t *)job->mhdr_cmd;
+			rte_memcpy(job->mhdr_cmd, hw_acts->mhdr->mhdr_cmds,
+				   sizeof(*job->mhdr_cmd) * hw_acts->mhdr->mhdr_cmds_num);
+		}
+	}
 	LIST_FOREACH(act_data, &hw_acts->act_list, next) {
 		uint32_t jump_group;
 		uint32_t tag;
@@ -1020,6 +1468,14 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			MLX5_ASSERT(raw_encap_data->size ==
 				    act_data->encap.len);
 			break;
+		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
+			ret = flow_hw_modify_field_construct(job,
+							     act_data,
+							     hw_acts,
+							     action);
+			if (ret)
+				return -1;
+			break;
 		default:
 			break;
 		}
@@ -1609,6 +2065,136 @@ flow_hw_table_destroy(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static int
+flow_hw_validate_action_modify_field(const struct rte_flow_action *action,
+				     const struct rte_flow_action *mask,
+				     struct rte_flow_error *error)
+{
+	const struct rte_flow_action_modify_field *action_conf =
+		action->conf;
+	const struct rte_flow_action_modify_field *mask_conf =
+		mask->conf;
+
+	if (action_conf->operation != mask_conf->operation)
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"modify_field operation mask and template are not equal");
+	if (action_conf->dst.field != mask_conf->dst.field)
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"destination field mask and template are not equal");
+	if (action_conf->dst.field == RTE_FLOW_FIELD_POINTER ||
+	    action_conf->dst.field == RTE_FLOW_FIELD_VALUE)
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"immediate value and pointer cannot be used as destination");
+	if (mask_conf->dst.level != UINT32_MAX)
+		return rte_flow_error_set(error, EINVAL,
+			RTE_FLOW_ERROR_TYPE_ACTION, action,
+			"destination encapsulation level must be fully masked");
+	if (mask_conf->dst.offset != UINT32_MAX)
+		return rte_flow_error_set(error, EINVAL,
+			RTE_FLOW_ERROR_TYPE_ACTION, action,
+			"destination offset must be fully masked");
+	if (action_conf->src.field != mask_conf->src.field)
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"source field mask and template are not equal");
+	if (action_conf->src.field != RTE_FLOW_FIELD_POINTER &&
+	    action_conf->src.field != RTE_FLOW_FIELD_VALUE) {
+		if (mask_conf->src.level != UINT32_MAX)
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"source encapsulation level must be fully masked");
+		if (mask_conf->src.offset != UINT32_MAX)
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"source offset must be fully masked");
+	}
+	if (mask_conf->width != UINT32_MAX)
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"modify_field width field must be fully masked");
+	return 0;
+}
+
+static int
+flow_hw_action_validate(const struct rte_flow_action actions[],
+			const struct rte_flow_action masks[],
+			struct rte_flow_error *error)
+{
+	int i;
+	bool actions_end = false;
+	int ret;
+
+	for (i = 0; !actions_end; ++i) {
+		const struct rte_flow_action *action = &actions[i];
+		const struct rte_flow_action *mask = &masks[i];
+
+		if (action->type != mask->type)
+			return rte_flow_error_set(error, ENOTSUP,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  action,
+						  "mask type does not match action type");
+		switch (action->type) {
+		case RTE_FLOW_ACTION_TYPE_VOID:
+			break;
+		case RTE_FLOW_ACTION_TYPE_INDIRECT:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_MARK:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_DROP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_JUMP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_QUEUE:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_RSS:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_VXLAN_DECAP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_NVGRE_DECAP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_DECAP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
+			ret = flow_hw_validate_action_modify_field(action,
+									mask,
+									error);
+			if (ret < 0)
+				return ret;
+			break;
+		case RTE_FLOW_ACTION_TYPE_END:
+			actions_end = true;
+			break;
+		default:
+			return rte_flow_error_set(error, ENOTSUP,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  action,
+						  "action not supported in template API");
+		}
+	}
+	return 0;
+}
+
 /**
  * Create flow action template.
  *
@@ -1637,6 +2223,8 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 	int len, act_len, mask_len, i;
 	struct rte_flow_actions_template *at;
 
+	if (flow_hw_action_validate(actions, masks, error))
+		return NULL;
 	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS,
 				NULL, 0, actions, error);
 	if (act_len <= 0)
@@ -2093,6 +2681,8 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		}
 		mem_size += (sizeof(struct mlx5_hw_q_job *) +
 			    sizeof(uint8_t) * MLX5_ENCAP_MAX_LEN +
+			    sizeof(struct mlx5_modification_cmd) *
+			    MLX5_MHDR_MAX_CMD +
 			    sizeof(struct mlx5_hw_q_job)) *
 			    queue_attr[0]->size;
 	}
@@ -2104,6 +2694,7 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	}
 	for (i = 0; i < nb_queue; i++) {
 		uint8_t *encap = NULL;
+		struct mlx5_modification_cmd *mhdr_cmd = NULL;
 
 		priv->hw_q[i].job_idx = queue_attr[i]->size;
 		priv->hw_q[i].size = queue_attr[i]->size;
@@ -2115,8 +2706,10 @@ flow_hw_configure(struct rte_eth_dev *dev,
 					    &job[queue_attr[i - 1]->size];
 		job = (struct mlx5_hw_q_job *)
 		      &priv->hw_q[i].job[queue_attr[i]->size];
-		encap = (uint8_t *)&job[queue_attr[i]->size];
+		mhdr_cmd = (struct mlx5_modification_cmd *)&job[queue_attr[i]->size];
+		encap = (uint8_t *)&mhdr_cmd[queue_attr[i]->size * MLX5_MHDR_MAX_CMD];
 		for (j = 0; j < queue_attr[i]->size; j++) {
+			job[j].mhdr_cmd = &mhdr_cmd[j * MLX5_MHDR_MAX_CMD];
 			job[j].encap_data = &encap[j * MLX5_ENCAP_MAX_LEN];
 			priv->hw_q[i].job[j] = &job[j];
 		}
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v2 05/17] net/mlx5: add HW steering port action
  2022-09-28  3:31 ` [PATCH v2 00/17] net/mlx5: HW steering PMD update Suanming Mou
                     ` (3 preceding siblings ...)
  2022-09-28  3:31   ` [PATCH v2 04/17] net/mlx5: add modify field hws support Suanming Mou
@ 2022-09-28  3:31   ` Suanming Mou
  2022-09-28  3:31   ` [PATCH v2 06/17] net/mlx5: add extended metadata mode for hardware steering Suanming Mou
                     ` (11 subsequent siblings)
  16 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-28  3:31 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Dariusz Sosnowski

From: Dariusz Sosnowski <dsosnowski@nvidia.com>

This patch implements creating and caching of port actions for use with
HW Steering FDB flows.

Actions are created when the flow template API is configured and only on
the port designated as the master. Attaching and detaching ports in the
same switching domain updates the port actions cache by, respectively,
creating and destroying actions.

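As an illustration only (not part of this patch), the sketch below shows an
application-side actions template that forwards packets to a representor
through such a cached port action. RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT,
struct rte_flow_action_ethdev and rte_flow_actions_template_create() are
existing rte_flow template APIs; the helper name, its parameters and the
attribute handling are assumptions made for the example:

  #include <stdint.h>
  #include <rte_flow.h>

  /*
   * Illustrative sketch: build an actions template on the proxy (master)
   * port that forwards matching packets to the representor identified by
   * dst_port_id. The caller supplies template attributes suitable for its
   * rte_flow API version (typically marking the template as transfer).
   */
  static struct rte_flow_actions_template *
  fwd_to_representor_template(uint16_t proxy_port_id, uint16_t dst_port_id,
                              const struct rte_flow_actions_template_attr *at_attr)
  {
      struct rte_flow_action_ethdev port = { .port_id = dst_port_id };
      /* Fully masked port_id: the target port is fixed at template time. */
      struct rte_flow_action_ethdev port_mask = { .port_id = UINT16_MAX };
      struct rte_flow_action actions[] = {
          {
              .type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT,
              .conf = &port,
          },
          { .type = RTE_FLOW_ACTION_TYPE_END },
      };
      struct rte_flow_action masks[] = {
          {
              .type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT,
              .conf = &port_mask,
          },
          { .type = RTE_FLOW_ACTION_TYPE_END },
      };
      struct rte_flow_error error;

      return rte_flow_actions_template_create(proxy_port_id, at_attr,
                                              actions, masks, &error);
  }
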
A new devarg, fdb_def_rule_en, is added to control whether the default
dedicated E-Switch rule is created implicitly by the PMD; the PMD sets
this value to 1 by default. If set to 0, the default E-Switch rule is not
created and the user can create a specific E-Switch rule on the root
table if needed.

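For example (illustrative only; the PCI address is a placeholder and
dv_flow_en=2 selects the HW steering flow engine), the default rule could
be disabled at probe time through the device arguments:

  dpdk-testpmd -a 0000:08:00.0,dv_flow_en=2,fdb_def_rule_en=0 -- -i
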
Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 doc/guides/nics/mlx5.rst           |    9 +
 drivers/net/mlx5/linux/mlx5_os.c   |   14 +
 drivers/net/mlx5/mlx5.c            |   14 +
 drivers/net/mlx5/mlx5.h            |   26 +-
 drivers/net/mlx5/mlx5_flow.c       |   96 +-
 drivers/net/mlx5/mlx5_flow.h       |   22 +-
 drivers/net/mlx5/mlx5_flow_dv.c    |   93 +-
 drivers/net/mlx5/mlx5_flow_hw.c    | 1356 +++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_flow_verbs.c |    4 +-
 drivers/net/mlx5/mlx5_trigger.c    |   77 +-
 10 files changed, 1594 insertions(+), 117 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 631f0840eb..c42ac482d8 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -1118,6 +1118,15 @@ for an additional list of options shared with other mlx5 drivers.
 
   By default, the PMD will set this value to 1.
 
+- ``fdb_def_rule_en`` parameter [int]
+
+  A non-zero value enables the PMD to create a dedicated rule on the E-Switch
+  root table. This dedicated rule forwards all incoming packets into table 1;
+  other rules are then created at the original E-Switch table level plus one.
+  This improves the flow insertion rate by skipping the firmware-managed root
+  table. If set to 0, all rules are created on the original E-Switch table level.
+
+  By default, the PMD will set this value to 1.
 
 Supported NICs
 --------------
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index b7cc11a2ef..e0586a4d6f 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1556,6 +1556,13 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 			flow_hw_set_port_info(eth_dev);
 		/* Only HWS requires this information. */
 		flow_hw_init_tags_set(eth_dev);
+		if (priv->sh->config.dv_esw_en &&
+		    flow_hw_create_vport_action(eth_dev)) {
+			DRV_LOG(ERR, "port %u failed to create vport action",
+				eth_dev->data->port_id);
+			err = EINVAL;
+			goto error;
+		}
 		return eth_dev;
 #else
 		DRV_LOG(ERR, "DV support is missing for HWS.");
@@ -1620,6 +1627,13 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	return eth_dev;
 error:
 	if (priv) {
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+		if (eth_dev &&
+		    priv->sh &&
+		    priv->sh->config.dv_flow_en == 2 &&
+		    priv->sh->config.dv_esw_en)
+			flow_hw_destroy_vport_action(eth_dev);
+#endif
 		if (priv->mreg_cp_tbl)
 			mlx5_hlist_destroy(priv->mreg_cp_tbl);
 		if (priv->sh)
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index b39ef1ecbe..74adb677f4 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -172,6 +172,9 @@
 /* Device parameter to configure the delay drop when creating Rxqs. */
 #define MLX5_DELAY_DROP "delay_drop"
 
+/* Device parameter to create the fdb default rule in PMD */
+#define MLX5_FDB_DEFAULT_RULE_EN "fdb_def_rule_en"
+
 /* Shared memory between primary and secondary processes. */
 struct mlx5_shared_data *mlx5_shared_data;
 
@@ -1239,6 +1242,8 @@ mlx5_dev_args_check_handler(const char *key, const char *val, void *opaque)
 		config->decap_en = !!tmp;
 	} else if (strcmp(MLX5_ALLOW_DUPLICATE_PATTERN, key) == 0) {
 		config->allow_duplicate_pattern = !!tmp;
+	} else if (strcmp(MLX5_FDB_DEFAULT_RULE_EN, key) == 0) {
+		config->fdb_def_rule = !!tmp;
 	}
 	return 0;
 }
@@ -1274,6 +1279,7 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 		MLX5_RECLAIM_MEM,
 		MLX5_DECAP_EN,
 		MLX5_ALLOW_DUPLICATE_PATTERN,
+		MLX5_FDB_DEFAULT_RULE_EN,
 		NULL,
 	};
 	int ret = 0;
@@ -1285,6 +1291,7 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 	config->dv_flow_en = 1;
 	config->decap_en = 1;
 	config->allow_duplicate_pattern = 1;
+	config->fdb_def_rule = 1;
 	if (mkvlist != NULL) {
 		/* Process parameters. */
 		ret = mlx5_kvargs_process(mkvlist, params,
@@ -1360,6 +1367,7 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 	DRV_LOG(DEBUG, "\"decap_en\" is %u.", config->decap_en);
 	DRV_LOG(DEBUG, "\"allow_duplicate_pattern\" is %u.",
 		config->allow_duplicate_pattern);
+	DRV_LOG(DEBUG, "\"fdb_def_rule_en\" is %u.", config->fdb_def_rule);
 	return 0;
 }
 
@@ -1943,6 +1951,7 @@ mlx5_dev_close(struct rte_eth_dev *dev)
 	mlx5_flex_parser_ecpri_release(dev);
 	mlx5_flex_item_port_cleanup(dev);
 #if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
+	flow_hw_destroy_vport_action(dev);
 	flow_hw_resource_release(dev);
 	flow_hw_clear_port_info(dev);
 	if (priv->sh->config.dv_flow_en == 2)
@@ -2644,6 +2653,11 @@ mlx5_probe_again_args_validate(struct mlx5_common_device *cdev,
 			sh->ibdev_name);
 		goto error;
 	}
+	if (sh->config.fdb_def_rule ^ config->fdb_def_rule) {
+		DRV_LOG(ERR, "\"fdb_def_rule_en\" configuration mismatch for shared %s context.",
+			sh->ibdev_name);
+		goto error;
+	}
 	if (sh->config.l3_vxlan_en ^ config->l3_vxlan_en) {
 		DRV_LOG(ERR, "\"l3_vxlan_en\" "
 			"configuration mismatch for shared %s context.",
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index a93af75baa..84f6937c95 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -309,6 +309,7 @@ struct mlx5_sh_config {
 	uint32_t allow_duplicate_pattern:1;
 	uint32_t lro_allowed:1; /* Whether LRO is allowed. */
 	/* Allow/Prevent the duplicate rules pattern. */
+	uint32_t fdb_def_rule:1; /* Create FDB default jump rule */
 };
 
 
@@ -337,6 +338,8 @@ enum {
 	MLX5_HW_Q_JOB_TYPE_DESTROY, /* Flow destroy job type. */
 };
 
+#define MLX5_HW_MAX_ITEMS (16)
+
 /* HW steering flow management job descriptor. */
 struct mlx5_hw_q_job {
 	uint32_t type; /* Job type. */
@@ -344,6 +347,8 @@ struct mlx5_hw_q_job {
 	void *user_data; /* Job user data. */
 	uint8_t *encap_data; /* Encap data. */
 	struct mlx5_modification_cmd *mhdr_cmd;
+	struct rte_flow_item *items;
+	struct rte_flow_item_ethdev port_spec;
 };
 
 /* HW steering job descriptor LIFO pool. */
@@ -1202,6 +1207,8 @@ struct mlx5_dev_ctx_shared {
 	uint32_t flow_priority_check_flag:1; /* Check Flag for flow priority. */
 	uint32_t metadata_regc_check_flag:1; /* Check Flag for metadata REGC. */
 	uint32_t hws_tags:1; /* Check if tags info for HWS initialized. */
+	uint32_t shared_mark_enabled:1;
+	/* If mark action is enabled on Rxqs (shared E-Switch domain). */
 	uint32_t max_port; /* Maximal IB device port index. */
 	struct mlx5_bond_info bond; /* Bonding information. */
 	struct mlx5_common_device *cdev; /* Backend mlx5 device. */
@@ -1450,6 +1457,12 @@ struct mlx5_obj_ops {
 
 #define MLX5_RSS_HASH_FIELDS_LEN RTE_DIM(mlx5_rss_hash_fields)
 
+struct mlx5_hw_ctrl_flow {
+	LIST_ENTRY(mlx5_hw_ctrl_flow) next;
+	struct rte_eth_dev *owner_dev;
+	struct rte_flow *flow;
+};
+
 struct mlx5_priv {
 	struct rte_eth_dev_data *dev_data;  /* Pointer to device data. */
 	struct mlx5_dev_ctx_shared *sh; /* Shared device context. */
@@ -1490,6 +1503,11 @@ struct mlx5_priv {
 	unsigned int reta_idx_n; /* RETA index size. */
 	struct mlx5_drop drop_queue; /* Flow drop queues. */
 	void *root_drop_action; /* Pointer to root drop action. */
+	rte_spinlock_t hw_ctrl_lock;
+	LIST_HEAD(hw_ctrl_flow, mlx5_hw_ctrl_flow) hw_ctrl_flows;
+	struct rte_flow_template_table *hw_esw_sq_miss_root_tbl;
+	struct rte_flow_template_table *hw_esw_sq_miss_tbl;
+	struct rte_flow_template_table *hw_esw_zero_tbl;
 	struct mlx5_indexed_pool *flows[MLX5_FLOW_TYPE_MAXI];
 	/* RTE Flow rules. */
 	uint32_t ctrl_flows; /* Control flow rules. */
@@ -1550,11 +1568,11 @@ struct mlx5_priv {
 	struct mlx5_hw_q *hw_q;
 	/* HW steering rte flow table list header. */
 	LIST_HEAD(flow_hw_tbl, rte_flow_template_table) flow_hw_tbl;
+	struct mlx5dr_action **hw_vport;
 	/* HW steering global drop action. */
-	struct mlx5dr_action *hw_drop[MLX5_HW_ACTION_FLAG_MAX]
-				     [MLX5DR_TABLE_TYPE_MAX];
-	/* HW steering global drop action. */
-	struct mlx5dr_action *hw_tag[MLX5_HW_ACTION_FLAG_MAX];
+	struct mlx5dr_action *hw_drop[2];
+	/* HW steering global tag action. */
+	struct mlx5dr_action *hw_tag[2];
 	struct mlx5_indexed_pool *acts_ipool; /* Action data indexed pool. */
 #endif
 };
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 3abb39aa92..9c44b2e99b 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -999,6 +999,7 @@ static const struct rte_flow_ops mlx5_flow_ops = {
 	.flex_item_create = mlx5_flow_flex_item_create,
 	.flex_item_release = mlx5_flow_flex_item_release,
 	.info_get = mlx5_flow_info_get,
+	.pick_transfer_proxy = mlx5_flow_pick_transfer_proxy,
 	.configure = mlx5_flow_port_configure,
 	.pattern_template_create = mlx5_flow_pattern_template_create,
 	.pattern_template_destroy = mlx5_flow_pattern_template_destroy,
@@ -1242,7 +1243,7 @@ mlx5_get_lowest_priority(struct rte_eth_dev *dev,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 
-	if (!attr->group && !attr->transfer)
+	if (!attr->group && !(attr->transfer && priv->fdb_def_rule))
 		return priv->sh->flow_max_priority - 2;
 	return MLX5_NON_ROOT_FLOW_MAX_PRIO - 1;
 }
@@ -1269,11 +1270,14 @@ mlx5_get_matcher_priority(struct rte_eth_dev *dev,
 	uint16_t priority = (uint16_t)attr->priority;
 	struct mlx5_priv *priv = dev->data->dev_private;
 
+	/* NIC root rules */
 	if (!attr->group && !attr->transfer) {
 		if (attr->priority == MLX5_FLOW_LOWEST_PRIO_INDICATOR)
 			priority = priv->sh->flow_max_priority - 1;
 		return mlx5_os_flow_adjust_priority(dev, priority, subpriority);
-	} else if (!external && attr->transfer && attr->group == 0 &&
+	/* FDB root rules */
+	} else if (attr->transfer && (!external || !priv->fdb_def_rule) &&
+		   attr->group == 0 &&
 		   attr->priority == MLX5_FLOW_LOWEST_PRIO_INDICATOR) {
 		return (priv->sh->flow_max_priority - 1) * 3;
 	}
@@ -1481,13 +1485,32 @@ flow_rxq_mark_flag_set(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_rxq_ctrl *rxq_ctrl;
+	uint16_t port_id;
 
-	if (priv->mark_enabled)
+	if (priv->sh->shared_mark_enabled)
 		return;
-	LIST_FOREACH(rxq_ctrl, &priv->rxqsctrl, next) {
-		rxq_ctrl->rxq.mark = 1;
+	if (priv->master || priv->representor) {
+		MLX5_ETH_FOREACH_DEV(port_id, dev->device) {
+			struct mlx5_priv *opriv =
+				rte_eth_devices[port_id].data->dev_private;
+
+			if (!opriv ||
+			    opriv->sh != priv->sh ||
+			    opriv->domain_id != priv->domain_id ||
+			    opriv->mark_enabled)
+				continue;
+			LIST_FOREACH(rxq_ctrl, &opriv->rxqsctrl, next) {
+				rxq_ctrl->rxq.mark = 1;
+			}
+			opriv->mark_enabled = 1;
+		}
+	} else {
+		LIST_FOREACH(rxq_ctrl, &priv->rxqsctrl, next) {
+			rxq_ctrl->rxq.mark = 1;
+		}
+		priv->mark_enabled = 1;
 	}
-	priv->mark_enabled = 1;
+	priv->sh->shared_mark_enabled = 1;
 }
 
 /**
@@ -1623,6 +1646,7 @@ flow_rxq_flags_clear(struct rte_eth_dev *dev)
 		rxq->ctrl->rxq.tunnel = 0;
 	}
 	priv->mark_enabled = 0;
+	priv->sh->shared_mark_enabled = 0;
 }
 
 /**
@@ -2808,8 +2832,8 @@ mlx5_flow_validate_item_tcp(const struct rte_flow_item *item,
  *   Item specification.
  * @param[in] item_flags
  *   Bit-fields that holds the items detected until now.
- * @param[in] attr
- *   Flow rule attributes.
+ * @param root
+ *   Whether the rule is created on root table.
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -2821,7 +2845,7 @@ mlx5_flow_validate_item_vxlan(struct rte_eth_dev *dev,
 			      uint16_t udp_dport,
 			      const struct rte_flow_item *item,
 			      uint64_t item_flags,
-			      const struct rte_flow_attr *attr,
+			      bool root,
 			      struct rte_flow_error *error)
 {
 	const struct rte_flow_item_vxlan *spec = item->spec;
@@ -2858,12 +2882,11 @@ mlx5_flow_validate_item_vxlan(struct rte_eth_dev *dev,
 	if (priv->sh->steering_format_version !=
 	    MLX5_STEERING_LOGIC_FORMAT_CONNECTX_5 ||
 	    !udp_dport || udp_dport == MLX5_UDP_PORT_VXLAN) {
-		/* FDB domain & NIC domain non-zero group */
-		if ((attr->transfer || attr->group) && priv->sh->misc5_cap)
+		/* non-root table */
+		if (!root && priv->sh->misc5_cap)
 			valid_mask = &nic_mask;
 		/* Group zero in NIC domain */
-		if (!attr->group && !attr->transfer &&
-		    priv->sh->tunnel_header_0_1)
+		if (!root && priv->sh->tunnel_header_0_1)
 			valid_mask = &nic_mask;
 	}
 	ret = mlx5_flow_item_acceptable
@@ -3102,11 +3125,11 @@ mlx5_flow_validate_item_gre_option(struct rte_eth_dev *dev,
 	if (mask->checksum_rsvd.checksum || mask->sequence.sequence) {
 		if (priv->sh->steering_format_version ==
 		    MLX5_STEERING_LOGIC_FORMAT_CONNECTX_5 ||
-		    ((attr->group || attr->transfer) &&
+		    ((attr->group || (attr->transfer && priv->fdb_def_rule)) &&
 		     !priv->sh->misc5_cap) ||
 		    (!(priv->sh->tunnel_header_0_1 &&
 		       priv->sh->tunnel_header_2_3) &&
-		    !attr->group && !attr->transfer))
+		    !attr->group && (!attr->transfer || !priv->fdb_def_rule)))
 			return rte_flow_error_set(error, EINVAL,
 						  RTE_FLOW_ERROR_TYPE_ITEM,
 						  item,
@@ -6163,7 +6186,8 @@ flow_create_split_metadata(struct rte_eth_dev *dev,
 	}
 	if (qrss) {
 		/* Check if it is in meter suffix table. */
-		mtr_sfx = attr->group == (attr->transfer ?
+		mtr_sfx = attr->group ==
+			  ((attr->transfer && priv->fdb_def_rule) ?
 			  (MLX5_FLOW_TABLE_LEVEL_METER - 1) :
 			  MLX5_FLOW_TABLE_LEVEL_METER);
 		/*
@@ -11086,3 +11110,43 @@ int mlx5_flow_get_item_vport_id(struct rte_eth_dev *dev,
 
 	return 0;
 }
+
+int
+mlx5_flow_pick_transfer_proxy(struct rte_eth_dev *dev,
+			      uint16_t *proxy_port_id,
+			      struct rte_flow_error *error)
+{
+	const struct mlx5_priv *priv = dev->data->dev_private;
+	uint16_t port_id;
+
+	if (!priv->sh->config.dv_esw_en)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL,
+					  "unable to provide a proxy port"
+					  " without E-Switch configured");
+	if (!priv->master && !priv->representor)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL,
+					  "unable to provide a proxy port"
+					  " for port which is not a master"
+					  " or a representor port");
+	if (priv->master) {
+		*proxy_port_id = dev->data->port_id;
+		return 0;
+	}
+	MLX5_ETH_FOREACH_DEV(port_id, dev->device) {
+		const struct rte_eth_dev *port_dev = &rte_eth_devices[port_id];
+		const struct mlx5_priv *port_priv = port_dev->data->dev_private;
+
+		if (port_priv->master &&
+		    port_priv->domain_id == priv->domain_id) {
+			*proxy_port_id = port_id;
+			return 0;
+		}
+	}
+	return rte_flow_error_set(error, EINVAL,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "unable to find a proxy port");
+}
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 0eab3a3797..93f0e189d4 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1151,6 +1151,11 @@ struct rte_flow_pattern_template {
 	struct mlx5dr_match_template *mt; /* mlx5 match template. */
 	uint64_t item_flags; /* Item layer flags. */
 	uint32_t refcnt;  /* Reference counter. */
+	/*
+	 * If true, then rule pattern should be prepended with
+	 * represented_port pattern item.
+	 */
+	bool implicit_port;
 };
 
 /* Flow action template struct. */
@@ -1226,6 +1231,7 @@ struct mlx5_hw_action_template {
 /* mlx5 flow group struct. */
 struct mlx5_flow_group {
 	struct mlx5_list_entry entry;
+	struct rte_eth_dev *dev; /* Reference to corresponding device. */
 	struct mlx5dr_table *tbl; /* HWS table object. */
 	struct mlx5_hw_jump_action jump; /* Jump action. */
 	enum mlx5dr_table_type type; /* Table type. */
@@ -1484,6 +1490,9 @@ void flow_hw_clear_port_info(struct rte_eth_dev *dev);
 void flow_hw_init_tags_set(struct rte_eth_dev *dev);
 void flow_hw_clear_tags_set(struct rte_eth_dev *dev);
 
+int flow_hw_create_vport_action(struct rte_eth_dev *dev);
+void flow_hw_destroy_vport_action(struct rte_eth_dev *dev);
+
 typedef int (*mlx5_flow_validate_t)(struct rte_eth_dev *dev,
 				    const struct rte_flow_attr *attr,
 				    const struct rte_flow_item items[],
@@ -2056,7 +2065,7 @@ int mlx5_flow_validate_item_vxlan(struct rte_eth_dev *dev,
 				  uint16_t udp_dport,
 				  const struct rte_flow_item *item,
 				  uint64_t item_flags,
-				  const struct rte_flow_attr *attr,
+				  bool root,
 				  struct rte_flow_error *error);
 int mlx5_flow_validate_item_vxlan_gpe(const struct rte_flow_item *item,
 				      uint64_t item_flags,
@@ -2313,4 +2322,15 @@ int flow_dv_translate_items_hws(const struct rte_flow_item *items,
 				uint32_t key_type, uint64_t *item_flags,
 				uint8_t *match_criteria,
 				struct rte_flow_error *error);
+
+int mlx5_flow_pick_transfer_proxy(struct rte_eth_dev *dev,
+				  uint16_t *proxy_port_id,
+				  struct rte_flow_error *error);
+
+int mlx5_flow_hw_flush_ctrl_flows(struct rte_eth_dev *dev);
+
+int mlx5_flow_hw_esw_create_mgr_sq_miss_flow(struct rte_eth_dev *dev);
+int mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev,
+					 uint32_t txq);
+int mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev);
 #endif /* RTE_PMD_MLX5_FLOW_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index e4af9d910b..ace69c2b40 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -2471,8 +2471,8 @@ flow_dv_validate_item_gtp(struct rte_eth_dev *dev,
  *   Previous validated item in the pattern items.
  * @param[in] gtp_item
  *   Previous GTP item specification.
- * @param[in] attr
- *   Pointer to flow attributes.
+ * @param root
+ *   Whether the rule is created on root table.
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -2483,7 +2483,7 @@ static int
 flow_dv_validate_item_gtp_psc(const struct rte_flow_item *item,
 			      uint64_t last_item,
 			      const struct rte_flow_item *gtp_item,
-			      const struct rte_flow_attr *attr,
+			      bool root,
 			      struct rte_flow_error *error)
 {
 	const struct rte_flow_item_gtp *gtp_spec;
@@ -2508,7 +2508,7 @@ flow_dv_validate_item_gtp_psc(const struct rte_flow_item *item,
 			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM, item,
 			 "GTP E flag must be 1 to match GTP PSC");
 	/* Check the flow is not created in group zero. */
-	if (!attr->transfer && !attr->group)
+	if (root)
 		return rte_flow_error_set
 			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
 			 "GTP PSC is not supported for group 0");
@@ -3373,20 +3373,19 @@ flow_dv_validate_action_set_tag(struct rte_eth_dev *dev,
 /**
  * Indicates whether ASO aging is supported.
  *
- * @param[in] sh
- *   Pointer to shared device context structure.
- * @param[in] attr
- *   Attributes of flow that includes AGE action.
+ * @param[in] priv
+ *   Pointer to device private context structure.
+ * @param[in] root
+ *   Whether action is on root table.
  *
  * @return
  *   True when ASO aging is supported, false otherwise.
  */
 static inline bool
-flow_hit_aso_supported(const struct mlx5_dev_ctx_shared *sh,
-		const struct rte_flow_attr *attr)
+flow_hit_aso_supported(const struct mlx5_priv *priv, bool root)
 {
-	MLX5_ASSERT(sh && attr);
-	return (sh->flow_hit_aso_en && (attr->transfer || attr->group));
+	MLX5_ASSERT(priv);
+	return (priv->sh->flow_hit_aso_en && !root);
 }
 
 /**
@@ -3398,8 +3397,8 @@ flow_hit_aso_supported(const struct mlx5_dev_ctx_shared *sh,
  *   Indicator if action is shared.
  * @param[in] action_flags
  *   Holds the actions detected until now.
- * @param[in] attr
- *   Attributes of flow that includes this action.
+ * @param[in] root
+ *   Whether action is on root table.
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -3409,7 +3408,7 @@ flow_hit_aso_supported(const struct mlx5_dev_ctx_shared *sh,
 static int
 flow_dv_validate_action_count(struct rte_eth_dev *dev, bool shared,
 			      uint64_t action_flags,
-			      const struct rte_flow_attr *attr,
+			      bool root,
 			      struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -3421,7 +3420,7 @@ flow_dv_validate_action_count(struct rte_eth_dev *dev, bool shared,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "duplicate count actions set");
 	if (shared && (action_flags & MLX5_FLOW_ACTION_AGE) &&
-	    !flow_hit_aso_supported(priv->sh, attr))
+	    !flow_hit_aso_supported(priv, root))
 		return rte_flow_error_set(error, EINVAL,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "old age and indirect count combination is not supported");
@@ -3652,8 +3651,8 @@ flow_dv_validate_action_raw_encap_decap
  *   Holds the actions detected until now.
  * @param[in] item_flags
  *   The items found in this flow rule.
- * @param[in] attr
- *   Pointer to flow attributes.
+ * @param root
+ *   Whether action is on root table.
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -3664,12 +3663,12 @@ static int
 flow_dv_validate_action_aso_ct(struct rte_eth_dev *dev,
 			       uint64_t action_flags,
 			       uint64_t item_flags,
-			       const struct rte_flow_attr *attr,
+			       bool root,
 			       struct rte_flow_error *error)
 {
 	RTE_SET_USED(dev);
 
-	if (attr->group == 0 && !attr->transfer)
+	if (root)
 		return rte_flow_error_set(error, ENOTSUP,
 					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 					  NULL,
@@ -4919,6 +4918,8 @@ flow_dv_validate_action_modify_ttl(const uint64_t action_flags,
  *   Pointer to the modify action.
  * @param[in] attr
  *   Pointer to the flow attributes.
+ * @param root
+ *   Whether action is on root table.
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -4931,6 +4932,7 @@ flow_dv_validate_action_modify_field(struct rte_eth_dev *dev,
 				   const uint64_t action_flags,
 				   const struct rte_flow_action *action,
 				   const struct rte_flow_attr *attr,
+				   bool root,
 				   struct rte_flow_error *error)
 {
 	int ret = 0;
@@ -4978,7 +4980,7 @@ flow_dv_validate_action_modify_field(struct rte_eth_dev *dev,
 	}
 	if (action_modify_field->src.field != RTE_FLOW_FIELD_VALUE &&
 	    action_modify_field->src.field != RTE_FLOW_FIELD_POINTER) {
-		if (!attr->transfer && !attr->group)
+		if (root)
 			return rte_flow_error_set(error, ENOTSUP,
 					RTE_FLOW_ERROR_TYPE_ACTION, action,
 					"modify field action is not"
@@ -5068,8 +5070,7 @@ flow_dv_validate_action_modify_field(struct rte_eth_dev *dev,
 	    action_modify_field->src.field == RTE_FLOW_FIELD_IPV4_ECN ||
 	    action_modify_field->dst.field == RTE_FLOW_FIELD_IPV6_ECN ||
 	    action_modify_field->src.field == RTE_FLOW_FIELD_IPV6_ECN)
-		if (!hca_attr->modify_outer_ip_ecn &&
-		    !attr->transfer && !attr->group)
+		if (!hca_attr->modify_outer_ip_ecn && root)
 			return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_ACTION, action,
 				"modifications of the ECN for current firmware is not supported");
@@ -5103,11 +5104,12 @@ flow_dv_validate_action_jump(struct rte_eth_dev *dev,
 			     bool external, struct rte_flow_error *error)
 {
 	uint32_t target_group, table = 0;
+	struct mlx5_priv *priv = dev->data->dev_private;
 	int ret = 0;
 	struct flow_grp_info grp_info = {
 		.external = !!external,
 		.transfer = !!attributes->transfer,
-		.fdb_def_rule = 1,
+		.fdb_def_rule = !!priv->fdb_def_rule,
 		.std_tbl_fix = 0
 	};
 	if (action_flags & (MLX5_FLOW_FATE_ACTIONS |
@@ -5687,6 +5689,8 @@ flow_dv_modify_clone_free_cb(void *tool_ctx, struct mlx5_list_entry *entry)
  *   Pointer to the COUNT action in sample action list.
  * @param[out] fdb_mirror_limit
  *   Pointer to the FDB mirror limitation flag.
+ * @param root
+ *   Whether action is on root table.
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -5703,6 +5707,7 @@ flow_dv_validate_action_sample(uint64_t *action_flags,
 			       const struct rte_flow_action_rss **sample_rss,
 			       const struct rte_flow_action_count **count,
 			       int *fdb_mirror_limit,
+			       bool root,
 			       struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -5804,7 +5809,7 @@ flow_dv_validate_action_sample(uint64_t *action_flags,
 		case RTE_FLOW_ACTION_TYPE_COUNT:
 			ret = flow_dv_validate_action_count
 				(dev, false, *action_flags | sub_action_flags,
-				 attr, error);
+				 root, error);
 			if (ret < 0)
 				return ret;
 			*count = act->conf;
@@ -7284,7 +7289,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 		case RTE_FLOW_ITEM_TYPE_VXLAN:
 			ret = mlx5_flow_validate_item_vxlan(dev, udp_dport,
 							    items, item_flags,
-							    attr, error);
+							    is_root, error);
 			if (ret < 0)
 				return ret;
 			last_item = MLX5_FLOW_LAYER_VXLAN;
@@ -7378,7 +7383,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			break;
 		case RTE_FLOW_ITEM_TYPE_GTP_PSC:
 			ret = flow_dv_validate_item_gtp_psc(items, last_item,
-							    gtp_item, attr,
+							    gtp_item, is_root,
 							    error);
 			if (ret < 0)
 				return ret;
@@ -7595,7 +7600,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 		case RTE_FLOW_ACTION_TYPE_COUNT:
 			ret = flow_dv_validate_action_count(dev, shared_count,
 							    action_flags,
-							    attr, error);
+							    is_root, error);
 			if (ret < 0)
 				return ret;
 			count = actions->conf;
@@ -7889,7 +7894,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			rw_act_num += MLX5_ACT_NUM_SET_TAG;
 			break;
 		case MLX5_RTE_FLOW_ACTION_TYPE_AGE:
-			if (!attr->transfer && !attr->group)
+			if (is_root)
 				return rte_flow_error_set(error, ENOTSUP,
 						RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 									   NULL,
@@ -7914,7 +7919,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			 * Validate the regular AGE action (using counter)
 			 * mutual exclusion with indirect counter actions.
 			 */
-			if (!flow_hit_aso_supported(priv->sh, attr)) {
+			if (!flow_hit_aso_supported(priv, is_root)) {
 				if (shared_count)
 					return rte_flow_error_set
 						(error, EINVAL,
@@ -7970,6 +7975,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 							     rss, &sample_rss,
 							     &sample_count,
 							     &fdb_mirror_limit,
+							     is_root,
 							     error);
 			if (ret < 0)
 				return ret;
@@ -7986,6 +7992,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 								   action_flags,
 								   actions,
 								   attr,
+								   is_root,
 								   error);
 			if (ret < 0)
 				return ret;
@@ -7999,8 +8006,8 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			break;
 		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
 			ret = flow_dv_validate_action_aso_ct(dev, action_flags,
-							     item_flags, attr,
-							     error);
+							     item_flags,
+							     is_root, error);
 			if (ret < 0)
 				return ret;
 			action_flags |= MLX5_FLOW_ACTION_CT;
@@ -9200,15 +9207,18 @@ flow_dv_translate_item_vxlan(struct rte_eth_dev *dev,
 	if (MLX5_ITEM_VALID(item, key_type))
 		return;
 	MLX5_ITEM_UPDATE(item, key_type, vxlan_v, vxlan_m, &nic_mask);
-	if (item->mask == &nic_mask &&
-	    ((!attr->group && !priv->sh->tunnel_header_0_1) ||
-	    (attr->group && !priv->sh->misc5_cap)))
+	if ((item->mask == &nic_mask) &&
+	    ((!attr->group && !(attr->transfer && priv->fdb_def_rule) &&
+	    !priv->sh->tunnel_header_0_1) ||
+	    ((attr->group || (attr->transfer && priv->fdb_def_rule)) &&
+	    !priv->sh->misc5_cap)))
 		vxlan_m = &rte_flow_item_vxlan_mask;
 	if ((priv->sh->steering_format_version ==
 	     MLX5_STEERING_LOGIC_FORMAT_CONNECTX_5 &&
 	     dport != MLX5_UDP_PORT_VXLAN) ||
-	    (!attr->group && !attr->transfer) ||
-	    ((attr->group || attr->transfer) && !priv->sh->misc5_cap)) {
+	    (!attr->group && !(attr->transfer && priv->fdb_def_rule)) ||
+	    ((attr->group || (attr->transfer && priv->fdb_def_rule)) &&
+	    !priv->sh->misc5_cap)) {
 		misc_v = MLX5_ADDR_OF(fte_match_param, key, misc_parameters);
 		size = sizeof(vxlan_m->vni);
 		vni_v = MLX5_ADDR_OF(fte_match_set_misc, misc_v, vxlan_vni);
@@ -14180,7 +14190,7 @@ flow_dv_translate(struct rte_eth_dev *dev,
 			 */
 			if (action_flags & MLX5_FLOW_ACTION_AGE) {
 				if ((non_shared_age && count) ||
-				    !flow_hit_aso_supported(priv->sh, attr)) {
+				    !flow_hit_aso_supported(priv, !dev_flow->dv.group)) {
 					/* Creates age by counters. */
 					cnt_act = flow_dv_prepare_counter
 								(dev, dev_flow,
@@ -18329,6 +18339,7 @@ flow_dv_action_validate(struct rte_eth_dev *dev,
 			struct rte_flow_error *err)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	/* called from RTE API */
 
 	RTE_SET_USED(conf);
 	switch (action->type) {
@@ -18356,7 +18367,7 @@ flow_dv_action_validate(struct rte_eth_dev *dev,
 						"Indirect age action not supported");
 		return flow_dv_validate_action_age(0, action, dev, err);
 	case RTE_FLOW_ACTION_TYPE_COUNT:
-		return flow_dv_validate_action_count(dev, true, 0, NULL, err);
+		return flow_dv_validate_action_count(dev, true, 0, false, err);
 	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
 		if (!priv->sh->ct_aso_en)
 			return rte_flow_error_set(err, ENOTSUP,
@@ -18533,6 +18544,8 @@ flow_dv_validate_mtr_policy_acts(struct rte_eth_dev *dev,
 	bool def_green = false;
 	bool def_yellow = false;
 	const struct rte_flow_action_rss *rss_color[RTE_COLORS] = {NULL};
+	/* Called from RTE API */
+	bool is_root = !(attr->group || (attr->transfer && priv->fdb_def_rule));
 
 	if (!dev_conf->dv_esw_en)
 		def_domain &= ~MLX5_MTR_DOMAIN_TRANSFER_BIT;
@@ -18734,7 +18747,7 @@ flow_dv_validate_mtr_policy_acts(struct rte_eth_dev *dev,
 				break;
 			case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
 				ret = flow_dv_validate_action_modify_field(dev,
-					action_flags[i], act, attr, &flow_err);
+					action_flags[i], act, attr, is_root, &flow_err);
 				if (ret < 0)
 					return -rte_mtr_error_set(error,
 					  ENOTSUP,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 3321e17fef..728370328c 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -20,6 +20,14 @@
 /* Default queue to flush the flows. */
 #define MLX5_DEFAULT_FLUSH_QUEUE 0
 
+/* Maximum number of rules in control flow tables */
+#define MLX5_HW_CTRL_FLOW_NB_RULES (4096)
+
+/* Flow group for SQ miss default flows. */
+#define MLX5_HW_SQ_MISS_GROUP (UINT32_MAX)
+
+static int flow_hw_flush_all_ctrl_flows(struct rte_eth_dev *dev);
+
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops;
 
 /* DR action flags with different table. */
@@ -57,6 +65,9 @@ flow_hw_rxq_flag_set(struct rte_eth_dev *dev, bool enable)
 	for (i = 0; i < priv->rxqs_n; ++i) {
 		struct mlx5_rxq_ctrl *rxq_ctrl = mlx5_rxq_ctrl_get(dev, i);
 
+		/* With RXQ start/stop feature, RXQ might be stopped. */
+		if (!rxq_ctrl)
+			continue;
 		rxq_ctrl->rxq.mark = enable;
 	}
 	priv->mark_enabled = enable;
@@ -810,6 +821,77 @@ flow_hw_modify_field_compile(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static int
+flow_hw_represented_port_compile(struct rte_eth_dev *dev,
+				 const struct rte_flow_attr *attr,
+				 const struct rte_flow_action *action_start,
+				 const struct rte_flow_action *action,
+				 const struct rte_flow_action *action_mask,
+				 struct mlx5_hw_actions *acts,
+				 uint16_t action_dst,
+				 struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_action_ethdev *v = action->conf;
+	const struct rte_flow_action_ethdev *m = action_mask->conf;
+	int ret;
+
+	if (!attr->group)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ATTR, NULL,
+					  "represented_port action cannot"
+					  " be used on group 0");
+	if (!attr->transfer)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ATTR_TRANSFER,
+					  NULL,
+					  "represented_port action requires"
+					  " transfer attribute");
+	if (attr->ingress || attr->egress)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ATTR, NULL,
+					  "represented_port action cannot"
+					  " be used with direction attributes");
+	if (!priv->master)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  "represented_port action must"
+					  " be used on proxy port");
+	if (m && !!m->port_id) {
+		struct mlx5_priv *port_priv;
+
+		port_priv = mlx5_port_to_eswitch_info(v->port_id, false);
+		if (port_priv == NULL)
+			return rte_flow_error_set
+					(error, EINVAL,
+					 RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					 "port does not exist or unable to"
+					 " obtain E-Switch info for port");
+		MLX5_ASSERT(priv->hw_vport != NULL);
+		if (priv->hw_vport[v->port_id]) {
+			acts->rule_acts[action_dst].action =
+					priv->hw_vport[v->port_id];
+		} else {
+			return rte_flow_error_set
+					(error, EINVAL,
+					 RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					 "cannot use represented_port action"
+					 " with this port");
+		}
+	} else {
+		ret = __flow_hw_act_data_general_append
+				(priv, acts, action->type,
+				 action - action_start, action_dst);
+		if (ret)
+			return rte_flow_error_set
+					(error, ENOMEM,
+					 RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					 "not enough memory to store"
+					 " vport action");
+	}
+	return 0;
+}
+
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -887,7 +969,7 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			break;
 		case RTE_FLOW_ACTION_TYPE_DROP:
 			acts->rule_acts[i++].action =
-				priv->hw_drop[!!attr->group][type];
+				priv->hw_drop[!!attr->group];
 			break;
 		case RTE_FLOW_ACTION_TYPE_MARK:
 			acts->mark = true;
@@ -1020,6 +1102,13 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			if (err)
 				goto err;
 			break;
+		case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
+			if (flow_hw_represented_port_compile
+					(dev, attr, action_start, actions,
+					 masks, acts, i, error))
+				goto err;
+			i++;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -1352,11 +1441,13 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			  struct mlx5dr_rule_action *rule_acts,
 			  uint32_t *acts_num)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow_template_table *table = job->flow->table;
 	struct mlx5_action_construct_data *act_data;
 	const struct rte_flow_action *action;
 	const struct rte_flow_action_raw_encap *raw_encap_data;
 	const struct rte_flow_item *enc_item = NULL;
+	const struct rte_flow_action_ethdev *port_action = NULL;
 	uint8_t *buf = job->encap_data;
 	struct rte_flow_attr attr = {
 			.ingress = 1,
@@ -1476,6 +1567,13 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			if (ret)
 				return -1;
 			break;
+		case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
+			port_action = action->conf;
+			if (!priv->hw_vport[port_action->port_id])
+				return -1;
+			rule_acts[act_data->action_dst].action =
+					priv->hw_vport[port_action->port_id];
+			break;
 		default:
 			break;
 		}
@@ -1488,6 +1586,52 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static const struct rte_flow_item *
+flow_hw_get_rule_items(struct rte_eth_dev *dev,
+		       struct rte_flow_template_table *table,
+		       const struct rte_flow_item items[],
+		       uint8_t pattern_template_index,
+		       struct mlx5_hw_q_job *job)
+{
+	if (table->its[pattern_template_index]->implicit_port) {
+		const struct rte_flow_item *curr_item;
+		unsigned int nb_items;
+		bool found_end;
+		unsigned int i;
+
+		/* Count number of pattern items. */
+		nb_items = 0;
+		found_end = false;
+		for (curr_item = items; !found_end; ++curr_item) {
+			++nb_items;
+			if (curr_item->type == RTE_FLOW_ITEM_TYPE_END)
+				found_end = true;
+		}
+		/* Prepend represented port item. */
+		job->port_spec = (struct rte_flow_item_ethdev){
+			.port_id = dev->data->port_id,
+		};
+		job->items[0] = (struct rte_flow_item){
+			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+			.spec = &job->port_spec,
+		};
+		found_end = false;
+		for (i = 1; i < MLX5_HW_MAX_ITEMS && i - 1 < nb_items; ++i) {
+			job->items[i] = items[i - 1];
+			if (items[i - 1].type == RTE_FLOW_ITEM_TYPE_END) {
+				found_end = true;
+				break;
+			}
+		}
+		if (i >= MLX5_HW_MAX_ITEMS && !found_end) {
+			rte_errno = ENOMEM;
+			return NULL;
+		}
+		return job->items;
+	}
+	return items;
+}
+
 /**
  * Enqueue HW steering flow creation.
  *
@@ -1539,6 +1683,7 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	struct mlx5_hw_actions *hw_acts;
 	struct rte_flow_hw *flow;
 	struct mlx5_hw_q_job *job;
+	const struct rte_flow_item *rule_items;
 	uint32_t acts_num, flow_idx;
 	int ret;
 
@@ -1565,15 +1710,23 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	job->user_data = user_data;
 	rule_attr.user_data = job;
 	hw_acts = &table->ats[action_template_index].acts;
-	/* Construct the flow action array based on the input actions.*/
-	flow_hw_actions_construct(dev, job, hw_acts, pattern_template_index,
-				  actions, rule_acts, &acts_num);
+	/* Construct the flow actions based on the input actions. */
+	if (flow_hw_actions_construct(dev, job, hw_acts, pattern_template_index,
+				  actions, rule_acts, &acts_num)) {
+		rte_errno = EINVAL;
+		goto free;
+	}
+	rule_items = flow_hw_get_rule_items(dev, table, items,
+					    pattern_template_index, job);
+	if (!rule_items)
+		goto free;
 	ret = mlx5dr_rule_create(table->matcher,
-				 pattern_template_index, items,
+				 pattern_template_index, rule_items,
 				 action_template_index, rule_acts,
 				 &rule_attr, &flow->rule);
 	if (likely(!ret))
 		return (struct rte_flow *)flow;
+free:
 	/* Flow created fail, return the descriptor and flow memory. */
 	mlx5_ipool_free(table->flow, flow_idx);
 	priv->hw_q[queue].job_idx++;
@@ -1754,7 +1907,9 @@ __flow_hw_pull_comp(struct rte_eth_dev *dev,
 	struct rte_flow_op_result comp[BURST_THR];
 	int ret, i, empty_loop = 0;
 
-	flow_hw_push(dev, queue, error);
+	ret = flow_hw_push(dev, queue, error);
+	if (ret < 0)
+		return ret;
 	while (pending_rules) {
 		ret = flow_hw_pull(dev, queue, comp, BURST_THR, error);
 		if (ret < 0)
@@ -2039,8 +2194,12 @@ flow_hw_table_destroy(struct rte_eth_dev *dev,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	int i;
+	uint32_t fidx = 1;
 
-	if (table->refcnt) {
+	/* Build ipool allocated object bitmap. */
+	mlx5_ipool_flush_cache(table->flow);
+	/* Check if ipool has allocated objects. */
+	if (table->refcnt || mlx5_ipool_get_next(table->flow, &fidx)) {
 		DRV_LOG(WARNING, "Table %p is still in using.", (void *)table);
 		return rte_flow_error_set(error, EBUSY,
 				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
@@ -2052,8 +2211,6 @@ flow_hw_table_destroy(struct rte_eth_dev *dev,
 		__atomic_sub_fetch(&table->its[i]->refcnt,
 				   1, __ATOMIC_RELAXED);
 	for (i = 0; i < table->nb_action_templates; i++) {
-		if (table->ats[i].acts.mark)
-			flow_hw_rxq_flag_set(dev, false);
 		__flow_hw_action_template_destroy(dev, &table->ats[i].acts);
 		__atomic_sub_fetch(&table->ats[i].action_template->refcnt,
 				   1, __ATOMIC_RELAXED);
@@ -2119,7 +2276,51 @@ flow_hw_validate_action_modify_field(const struct rte_flow_action *action,
 }
 
 static int
-flow_hw_action_validate(const struct rte_flow_action actions[],
+flow_hw_validate_action_represented_port(struct rte_eth_dev *dev,
+					 const struct rte_flow_action *action,
+					 const struct rte_flow_action *mask,
+					 struct rte_flow_error *error)
+{
+	const struct rte_flow_action_ethdev *action_conf = action->conf;
+	const struct rte_flow_action_ethdev *mask_conf = mask->conf;
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (!priv->sh->config.dv_esw_en)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  "cannot use represented_port actions"
+					  " without an E-Switch");
+	if (mask_conf->port_id) {
+		struct mlx5_priv *port_priv;
+		struct mlx5_priv *dev_priv;
+
+		port_priv = mlx5_port_to_eswitch_info(action_conf->port_id, false);
+		if (!port_priv)
+			return rte_flow_error_set(error, rte_errno,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  action,
+						  "failed to obtain E-Switch"
+						  " info for port");
+		dev_priv = mlx5_dev_to_eswitch_info(dev);
+		if (!dev_priv)
+			return rte_flow_error_set(error, rte_errno,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  action,
+						  "failed to obtain E-Switch"
+						  " info for transfer proxy");
+		if (port_priv->domain_id != dev_priv->domain_id)
+			return rte_flow_error_set(error, rte_errno,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  action,
+						  "cannot forward to port from"
+						  " a different E-Switch");
+	}
+	return 0;
+}
+
+static int
+flow_hw_action_validate(struct rte_eth_dev *dev,
+			const struct rte_flow_action actions[],
 			const struct rte_flow_action masks[],
 			struct rte_flow_error *error)
 {
@@ -2182,6 +2383,12 @@ flow_hw_action_validate(const struct rte_flow_action actions[],
 			if (ret < 0)
 				return ret;
 			break;
+		case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
+			ret = flow_hw_validate_action_represented_port
+					(dev, action, mask, error);
+			if (ret < 0)
+				return ret;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -2223,7 +2430,7 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 	int len, act_len, mask_len, i;
 	struct rte_flow_actions_template *at;
 
-	if (flow_hw_action_validate(actions, masks, error))
+	if (flow_hw_action_validate(dev, actions, masks, error))
 		return NULL;
 	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS,
 				NULL, 0, actions, error);
@@ -2306,6 +2513,46 @@ flow_hw_actions_template_destroy(struct rte_eth_dev *dev __rte_unused,
 	return 0;
 }
 
+static struct rte_flow_item *
+flow_hw_copy_prepend_port_item(const struct rte_flow_item *items,
+			       struct rte_flow_error *error)
+{
+	const struct rte_flow_item *curr_item;
+	struct rte_flow_item *copied_items;
+	bool found_end;
+	unsigned int nb_items;
+	unsigned int i;
+	size_t size;
+
+	/* Count number of pattern items. */
+	nb_items = 0;
+	found_end = false;
+	for (curr_item = items; !found_end; ++curr_item) {
+		++nb_items;
+		if (curr_item->type == RTE_FLOW_ITEM_TYPE_END)
+			found_end = true;
+	}
+	/* Allocate new array of items and prepend REPRESENTED_PORT item. */
+	size = sizeof(*copied_items) * (nb_items + 1);
+	copied_items = mlx5_malloc(MLX5_MEM_ZERO, size, 0, rte_socket_id());
+	if (!copied_items) {
+		rte_flow_error_set(error, ENOMEM,
+				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				   NULL,
+				   "cannot allocate item template");
+		return NULL;
+	}
+	copied_items[0] = (struct rte_flow_item){
+		.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+		.spec = NULL,
+		.last = NULL,
+		.mask = &rte_flow_item_ethdev_mask,
+	};
+	for (i = 1; i < nb_items + 1; ++i)
+		copied_items[i] = items[i - 1];
+	return copied_items;
+}
+
 /**
  * Create flow item template.
  *
@@ -2329,9 +2576,35 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow_pattern_template *it;
+	struct rte_flow_item *copied_items = NULL;
+	const struct rte_flow_item *tmpl_items;
 
+	if (priv->sh->config.dv_esw_en && attr->ingress) {
+		/*
+		 * Disallow pattern template with ingress and egress/transfer
+		 * attributes in order to forbid implicit port matching
+		 * on egress and transfer traffic.
+		 */
+		if (attr->egress || attr->transfer) {
+			rte_flow_error_set(error, EINVAL,
+					   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					   NULL,
+					   "item template for ingress traffic"
+					   " cannot be used for egress/transfer"
+					   " traffic when E-Switch is enabled");
+			return NULL;
+		}
+		copied_items = flow_hw_copy_prepend_port_item(items, error);
+		if (!copied_items)
+			return NULL;
+		tmpl_items = copied_items;
+	} else {
+		tmpl_items = items;
+	}
 	it = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*it), 0, rte_socket_id());
 	if (!it) {
+		if (copied_items)
+			mlx5_free(copied_items);
 		rte_flow_error_set(error, ENOMEM,
 				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				   NULL,
@@ -2339,8 +2612,10 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 		return NULL;
 	}
 	it->attr = *attr;
-	it->mt = mlx5dr_match_template_create(items, attr->relaxed_matching);
+	it->mt = mlx5dr_match_template_create(tmpl_items, attr->relaxed_matching);
 	if (!it->mt) {
+		if (copied_items)
+			mlx5_free(copied_items);
 		mlx5_free(it);
 		rte_flow_error_set(error, rte_errno,
 				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
@@ -2348,9 +2623,12 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 				   "cannot create match template");
 		return NULL;
 	}
-	it->item_flags = flow_hw_rss_item_flags_get(items);
+	it->item_flags = flow_hw_rss_item_flags_get(tmpl_items);
+	it->implicit_port = !!copied_items;
 	__atomic_fetch_add(&it->refcnt, 1, __ATOMIC_RELAXED);
 	LIST_INSERT_HEAD(&priv->flow_hw_itt, it, next);
+	if (copied_items)
+		mlx5_free(copied_items);
 	return it;
 }
 
@@ -2476,6 +2754,7 @@ flow_hw_grp_create_cb(void *tool_ctx, void *cb_ctx)
 			goto error;
 		grp_data->jump.root_action = jump;
 	}
+	grp_data->dev = dev;
 	grp_data->idx = idx;
 	grp_data->group_id = attr->group;
 	grp_data->type = dr_tbl_attr.type;
@@ -2544,7 +2823,8 @@ flow_hw_grp_match_cb(void *tool_ctx __rte_unused, struct mlx5_list_entry *entry,
 	struct rte_flow_attr *attr =
 			(struct rte_flow_attr *)ctx->data;
 
-	return (grp_data->group_id != attr->group) ||
+	return (grp_data->dev != ctx->dev) ||
+		(grp_data->group_id != attr->group) ||
 		((grp_data->type != MLX5DR_TABLE_TYPE_FDB) &&
 		attr->transfer) ||
 		((grp_data->type != MLX5DR_TABLE_TYPE_NIC_TX) &&
@@ -2607,6 +2887,545 @@ flow_hw_grp_clone_free_cb(void *tool_ctx, struct mlx5_list_entry *entry)
 	mlx5_ipool_free(sh->ipool[MLX5_IPOOL_HW_GRP], grp_data->idx);
 }
 
+/**
+ * Create and cache a vport action for given @p dev port. vport actions
+ * cache is used in HWS with FDB flows.
+ *
+ * This function does not create any action if the proxy port for @p dev port
+ * was not configured for HW Steering.
+ *
+ * This function assumes that E-Switch is enabled and PMD is running with
+ * HW Steering configured.
+ *
+ * @param dev
+ *   Pointer to Ethernet device which will be the action destination.
+ *
+ * @return
+ *   0 on success, negative value otherwise.
+ */
+int
+flow_hw_create_vport_action(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_eth_dev *proxy_dev;
+	struct mlx5_priv *proxy_priv;
+	uint16_t port_id = dev->data->port_id;
+	uint16_t proxy_port_id = port_id;
+	int ret;
+
+	ret = mlx5_flow_pick_transfer_proxy(dev, &proxy_port_id, NULL);
+	if (ret)
+		return ret;
+	proxy_dev = &rte_eth_devices[proxy_port_id];
+	proxy_priv = proxy_dev->data->dev_private;
+	if (!proxy_priv->hw_vport)
+		return 0;
+	if (proxy_priv->hw_vport[port_id]) {
+		DRV_LOG(ERR, "port %u HWS vport action already created",
+			port_id);
+		return -EINVAL;
+	}
+	proxy_priv->hw_vport[port_id] = mlx5dr_action_create_dest_vport
+			(proxy_priv->dr_ctx, priv->dev_port,
+			 MLX5DR_ACTION_FLAG_HWS_FDB);
+	if (!proxy_priv->hw_vport[port_id]) {
+		DRV_LOG(ERR, "port %u unable to create HWS vport action",
+			port_id);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+/**
+ * Destroys the vport action associated with @p dev device
+ * from actions' cache.
+ *
+ * This function does not destroy any action if there is no action cached
+ * for @p dev or proxy port was not configured for HW Steering.
+ *
+ * This function assumes that E-Switch is enabled and PMD is running with
+ * HW Steering configured.
+ *
+ * @param dev
+ *   Pointer to Ethernet device which will be the action destination.
+ */
+void
+flow_hw_destroy_vport_action(struct rte_eth_dev *dev)
+{
+	struct rte_eth_dev *proxy_dev;
+	struct mlx5_priv *proxy_priv;
+	uint16_t port_id = dev->data->port_id;
+	uint16_t proxy_port_id = port_id;
+
+	if (mlx5_flow_pick_transfer_proxy(dev, &proxy_port_id, NULL))
+		return;
+	proxy_dev = &rte_eth_devices[proxy_port_id];
+	proxy_priv = proxy_dev->data->dev_private;
+	if (!proxy_priv->hw_vport || !proxy_priv->hw_vport[port_id])
+		return;
+	mlx5dr_action_destroy(proxy_priv->hw_vport[port_id]);
+	proxy_priv->hw_vport[port_id] = NULL;
+}
+
+static int
+flow_hw_create_vport_actions(struct mlx5_priv *priv)
+{
+	uint16_t port_id;
+
+	MLX5_ASSERT(!priv->hw_vport);
+	priv->hw_vport = mlx5_malloc(MLX5_MEM_ZERO,
+				     sizeof(*priv->hw_vport) * RTE_MAX_ETHPORTS,
+				     0, SOCKET_ID_ANY);
+	if (!priv->hw_vport)
+		return -ENOMEM;
+	DRV_LOG(DEBUG, "port %u :: creating vport actions", priv->dev_data->port_id);
+	DRV_LOG(DEBUG, "port %u ::    domain_id=%u", priv->dev_data->port_id, priv->domain_id);
+	MLX5_ETH_FOREACH_DEV(port_id, NULL) {
+		struct mlx5_priv *port_priv = rte_eth_devices[port_id].data->dev_private;
+
+		if (!port_priv ||
+		    port_priv->domain_id != priv->domain_id)
+			continue;
+		DRV_LOG(DEBUG, "port %u :: for port_id=%u, calling mlx5dr_action_create_dest_vport() with ibport=%u",
+			priv->dev_data->port_id, port_id, port_priv->dev_port);
+		priv->hw_vport[port_id] = mlx5dr_action_create_dest_vport
+				(priv->dr_ctx, port_priv->dev_port,
+				 MLX5DR_ACTION_FLAG_HWS_FDB);
+		DRV_LOG(DEBUG, "port %u :: priv->hw_vport[%u]=%p",
+			priv->dev_data->port_id, port_id, (void *)priv->hw_vport[port_id]);
+		if (!priv->hw_vport[port_id])
+			return -EINVAL;
+	}
+	return 0;
+}
+
+static void
+flow_hw_free_vport_actions(struct mlx5_priv *priv)
+{
+	uint16_t port_id;
+
+	if (!priv->hw_vport)
+		return;
+	for (port_id = 0; port_id < RTE_MAX_ETHPORTS; ++port_id)
+		if (priv->hw_vport[port_id])
+			mlx5dr_action_destroy(priv->hw_vport[port_id]);
+	mlx5_free(priv->hw_vport);
+	priv->hw_vport = NULL;
+}
+
+/**
+ * Creates a flow pattern template used to match on E-Switch Manager.
+ * This template is used to set up a table for SQ miss default flow.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow pattern template on success, NULL otherwise.
+ */
+static struct rte_flow_pattern_template *
+flow_hw_create_ctrl_esw_mgr_pattern_template(struct rte_eth_dev *dev)
+{
+	struct rte_flow_pattern_template_attr attr = {
+		.relaxed_matching = 0,
+		.transfer = 1,
+	};
+	struct rte_flow_item_ethdev port_spec = {
+		.port_id = MLX5_REPRESENTED_PORT_ESW_MGR,
+	};
+	struct rte_flow_item_ethdev port_mask = {
+		.port_id = UINT16_MAX,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+			.spec = &port_spec,
+			.mask = &port_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+
+	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
+}
+
+/**
+ * Creates a flow pattern template used to match on a TX queue.
+ * This template is used to set up a table for SQ miss default flow.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow pattern template on success, NULL otherwise.
+ */
+static struct rte_flow_pattern_template *
+flow_hw_create_ctrl_sq_pattern_template(struct rte_eth_dev *dev)
+{
+	struct rte_flow_pattern_template_attr attr = {
+		.relaxed_matching = 0,
+		.transfer = 1,
+	};
+	struct mlx5_rte_flow_item_tx_queue queue_mask = {
+		.queue = UINT32_MAX,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = (enum rte_flow_item_type)
+				MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
+			.mask = &queue_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+
+	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
+}
+
+/**
+ * Creates a flow pattern template with unmasked represented port matching.
+ * This template is used to set up a table for default transfer flows
+ * directing packets to group 1.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow pattern template on success, NULL otherwise.
+ */
+static struct rte_flow_pattern_template *
+flow_hw_create_ctrl_port_pattern_template(struct rte_eth_dev *dev)
+{
+	struct rte_flow_pattern_template_attr attr = {
+		.relaxed_matching = 0,
+		.transfer = 1,
+	};
+	struct rte_flow_item_ethdev port_mask = {
+		.port_id = UINT16_MAX,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+			.mask = &port_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+
+	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
+}
+
+/**
+ * Creates a flow actions template with an unmasked JUMP action. Flows
+ * based on this template will perform a jump to some group. This template
+ * is used to set up tables for control flows.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param group
+ *   Destination group for this action template.
+ *
+ * @return
+ *   Pointer to flow actions template on success, NULL otherwise.
+ */
+static struct rte_flow_actions_template *
+flow_hw_create_ctrl_jump_actions_template(struct rte_eth_dev *dev,
+					  uint32_t group)
+{
+	struct rte_flow_actions_template_attr attr = {
+		.transfer = 1,
+	};
+	struct rte_flow_action_jump jump_v = {
+		.group = group,
+	};
+	struct rte_flow_action_jump jump_m = {
+		.group = UINT32_MAX,
+	};
+	struct rte_flow_action actions_v[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump_v,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+	struct rte_flow_action actions_m[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump_m,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+
+	return flow_hw_actions_template_create(dev, &attr, actions_v, actions_m,
+					       NULL);
+}
+
+/**
+ * Creates a flow actions template with an unmasked REPRESENTED_PORT action.
+ * It is used to create control flow tables.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow action template on success, NULL otherwise.
+ */
+static struct rte_flow_actions_template *
+flow_hw_create_ctrl_port_actions_template(struct rte_eth_dev *dev)
+{
+	struct rte_flow_actions_template_attr attr = {
+		.transfer = 1,
+	};
+	struct rte_flow_action_ethdev port_v = {
+		.port_id = 0,
+	};
+	struct rte_flow_action actions_v[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT,
+			.conf = &port_v,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+	struct rte_flow_action_ethdev port_m = {
+		.port_id = 0,
+	};
+	struct rte_flow_action actions_m[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT,
+			.conf = &port_m,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+
+	return flow_hw_actions_template_create(dev, &attr, actions_v, actions_m,
+					       NULL);
+}
+
+/**
+ * Creates a control flow table used to transfer traffic from E-Switch Manager
+ * and TX queues from group 0 to group 1.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param it
+ *   Pointer to flow pattern template.
+ * @param at
+ *   Pointer to flow actions template.
+ *
+ * @return
+ *   Pointer to flow table on success, NULL otherwise.
+ */
+static struct rte_flow_template_table*
+flow_hw_create_ctrl_sq_miss_root_table(struct rte_eth_dev *dev,
+				       struct rte_flow_pattern_template *it,
+				       struct rte_flow_actions_template *at)
+{
+	struct rte_flow_template_table_attr attr = {
+		.flow_attr = {
+			.group = 0,
+			.priority = 0,
+			.ingress = 0,
+			.egress = 0,
+			.transfer = 1,
+		},
+		.nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
+	};
+
+	return flow_hw_table_create(dev, &attr, &it, 1, &at, 1, NULL);
+}
+
+
+/**
+ * Creates a control flow table used to forward traffic, matched on a TX queue,
+ * to the destination represented port (SQ miss default flows, non-root table).
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param it
+ *   Pointer to flow pattern template.
+ * @param at
+ *   Pointer to flow actions template.
+ *
+ * @return
+ *   Pointer to flow table on success, NULL otherwise.
+ */
+static struct rte_flow_template_table*
+flow_hw_create_ctrl_sq_miss_table(struct rte_eth_dev *dev,
+				  struct rte_flow_pattern_template *it,
+				  struct rte_flow_actions_template *at)
+{
+	struct rte_flow_template_table_attr attr = {
+		.flow_attr = {
+			.group = MLX5_HW_SQ_MISS_GROUP,
+			.priority = 0,
+			.ingress = 0,
+			.egress = 0,
+			.transfer = 1,
+		},
+		.nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
+	};
+
+	return flow_hw_table_create(dev, &attr, &it, 1, &at, 1, NULL);
+}
+
+/**
+ * Creates a control flow table used to transfer traffic
+ * from group 0 to group 1.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param it
+ *   Pointer to flow pattern template.
+ * @param at
+ *   Pointer to flow actions template.
+ *
+ * @return
+ *   Pointer to flow table on success, NULL otherwise.
+ */
+static struct rte_flow_template_table *
+flow_hw_create_ctrl_jump_table(struct rte_eth_dev *dev,
+			       struct rte_flow_pattern_template *it,
+			       struct rte_flow_actions_template *at)
+{
+	struct rte_flow_template_table_attr attr = {
+		.flow_attr = {
+			.group = 0,
+			.priority = 15, /* TODO: Flow priority discovery. */
+			.ingress = 0,
+			.egress = 0,
+			.transfer = 1,
+		},
+		.nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
+	};
+
+	return flow_hw_table_create(dev, &attr, &it, 1, &at, 1, NULL);
+}
+
+/**
+ * Creates a set of flow tables used to create control flows used
+ * when E-Switch is engaged.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, negative value (-EINVAL) otherwise
+ */
+static __rte_unused int
+flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_flow_pattern_template *esw_mgr_items_tmpl = NULL;
+	struct rte_flow_pattern_template *sq_items_tmpl = NULL;
+	struct rte_flow_pattern_template *port_items_tmpl = NULL;
+	struct rte_flow_actions_template *jump_sq_actions_tmpl = NULL;
+	struct rte_flow_actions_template *port_actions_tmpl = NULL;
+	struct rte_flow_actions_template *jump_one_actions_tmpl = NULL;
+
+	/* Item templates */
+	esw_mgr_items_tmpl = flow_hw_create_ctrl_esw_mgr_pattern_template(dev);
+	if (!esw_mgr_items_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create E-Switch Manager item"
+			" template for control flows", dev->data->port_id);
+		goto error;
+	}
+	sq_items_tmpl = flow_hw_create_ctrl_sq_pattern_template(dev);
+	if (!sq_items_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create SQ item template for"
+			" control flows", dev->data->port_id);
+		goto error;
+	}
+	port_items_tmpl = flow_hw_create_ctrl_port_pattern_template(dev);
+	if (!port_items_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create port item template for"
+			" control flows", dev->data->port_id);
+		goto error;
+	}
+	/* Action templates */
+	jump_sq_actions_tmpl = flow_hw_create_ctrl_jump_actions_template(dev,
+									 MLX5_HW_SQ_MISS_GROUP);
+	if (!jump_sq_actions_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create jump action template"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
+	port_actions_tmpl = flow_hw_create_ctrl_port_actions_template(dev);
+	if (!port_actions_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create port action template"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
+	jump_one_actions_tmpl = flow_hw_create_ctrl_jump_actions_template(dev, 1);
+	if (!jump_one_actions_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create jump action template"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
+	/* Tables */
+	MLX5_ASSERT(priv->hw_esw_sq_miss_root_tbl == NULL);
+	priv->hw_esw_sq_miss_root_tbl = flow_hw_create_ctrl_sq_miss_root_table
+			(dev, esw_mgr_items_tmpl, jump_sq_actions_tmpl);
+	if (!priv->hw_esw_sq_miss_root_tbl) {
+		DRV_LOG(ERR, "port %u failed to create table for default sq miss (root table)"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
+	MLX5_ASSERT(priv->hw_esw_sq_miss_tbl == NULL);
+	priv->hw_esw_sq_miss_tbl = flow_hw_create_ctrl_sq_miss_table(dev, sq_items_tmpl,
+								     port_actions_tmpl);
+	if (!priv->hw_esw_sq_miss_tbl) {
+		DRV_LOG(ERR, "port %u failed to create table for default sq miss (non-root table)"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
+	MLX5_ASSERT(priv->hw_esw_zero_tbl == NULL);
+	priv->hw_esw_zero_tbl = flow_hw_create_ctrl_jump_table(dev, port_items_tmpl,
+							       jump_one_actions_tmpl);
+	if (!priv->hw_esw_zero_tbl) {
+		DRV_LOG(ERR, "port %u failed to create table for default jump to group 1"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
+	return 0;
+error:
+	if (priv->hw_esw_zero_tbl) {
+		flow_hw_table_destroy(dev, priv->hw_esw_zero_tbl, NULL);
+		priv->hw_esw_zero_tbl = NULL;
+	}
+	if (priv->hw_esw_sq_miss_tbl) {
+		flow_hw_table_destroy(dev, priv->hw_esw_sq_miss_tbl, NULL);
+		priv->hw_esw_sq_miss_tbl = NULL;
+	}
+	if (priv->hw_esw_sq_miss_root_tbl) {
+		flow_hw_table_destroy(dev, priv->hw_esw_sq_miss_root_tbl, NULL);
+		priv->hw_esw_sq_miss_root_tbl = NULL;
+	}
+	if (jump_one_actions_tmpl)
+		flow_hw_actions_template_destroy(dev, jump_one_actions_tmpl, NULL);
+	if (port_actions_tmpl)
+		flow_hw_actions_template_destroy(dev, port_actions_tmpl, NULL);
+	if (jump_sq_actions_tmpl)
+		flow_hw_actions_template_destroy(dev, jump_sq_actions_tmpl, NULL);
+	if (port_items_tmpl)
+		flow_hw_pattern_template_destroy(dev, port_items_tmpl, NULL);
+	if (sq_items_tmpl)
+		flow_hw_pattern_template_destroy(dev, sq_items_tmpl, NULL);
+	if (esw_mgr_items_tmpl)
+		flow_hw_pattern_template_destroy(dev, esw_mgr_items_tmpl, NULL);
+	return -EINVAL;
+}
+
 /**
  * Configure port HWS resources.
  *
@@ -2624,7 +3443,6 @@ flow_hw_grp_clone_free_cb(void *tool_ctx, struct mlx5_list_entry *entry)
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
-
 static int
 flow_hw_configure(struct rte_eth_dev *dev,
 		  const struct rte_flow_port_attr *port_attr,
@@ -2647,6 +3465,14 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		.free = mlx5_free,
 		.type = "mlx5_hw_action_construct_data",
 	};
+	/* One extra queue is added for internal PMD usage.
+	 * The last queue is reserved for the PMD control flows.
+	 */
+	uint16_t nb_q_updated;
+	struct rte_flow_queue_attr **_queue_attr = NULL;
+	struct rte_flow_queue_attr ctrl_queue_attr = {0};
+	bool is_proxy = !!(priv->sh->config.dv_esw_en && priv->master);
+	int ret;
 
 	if (!port_attr || !nb_queue || !queue_attr) {
 		rte_errno = EINVAL;
@@ -2655,7 +3481,7 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	/* In case re-configuring, release existing context at first. */
 	if (priv->dr_ctx) {
 		/* */
-		for (i = 0; i < nb_queue; i++) {
+		for (i = 0; i < priv->nb_queue; i++) {
 			hw_q = &priv->hw_q[i];
 			/* Make sure all queues are empty. */
 			if (hw_q->size != hw_q->job_idx) {
@@ -2665,26 +3491,42 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		}
 		flow_hw_resource_release(dev);
 	}
+	ctrl_queue_attr.size = queue_attr[0]->size;
+	nb_q_updated = nb_queue + 1;
+	_queue_attr = mlx5_malloc(MLX5_MEM_ZERO,
+				  nb_q_updated *
+				  sizeof(struct rte_flow_queue_attr *),
+				  64, SOCKET_ID_ANY);
+	if (!_queue_attr) {
+		rte_errno = ENOMEM;
+		goto err;
+	}
+
+	memcpy(_queue_attr, queue_attr,
+	       sizeof(void *) * nb_queue);
+	_queue_attr[nb_queue] = &ctrl_queue_attr;
 	priv->acts_ipool = mlx5_ipool_create(&cfg);
 	if (!priv->acts_ipool)
 		goto err;
 	/* Allocate the queue job descriptor LIFO. */
-	mem_size = sizeof(priv->hw_q[0]) * nb_queue;
-	for (i = 0; i < nb_queue; i++) {
+	mem_size = sizeof(priv->hw_q[0]) * nb_q_updated;
+	for (i = 0; i < nb_q_updated; i++) {
 		/*
 		 * Check if the queues' size are all the same as the
 		 * limitation from HWS layer.
 		 */
-		if (queue_attr[i]->size != queue_attr[0]->size) {
+		if (_queue_attr[i]->size != _queue_attr[0]->size) {
 			rte_errno = EINVAL;
 			goto err;
 		}
 		mem_size += (sizeof(struct mlx5_hw_q_job *) +
+			    sizeof(struct mlx5_hw_q_job) +
 			    sizeof(uint8_t) * MLX5_ENCAP_MAX_LEN +
 			    sizeof(struct mlx5_modification_cmd) *
 			    MLX5_MHDR_MAX_CMD +
-			    sizeof(struct mlx5_hw_q_job)) *
-			    queue_attr[0]->size;
+			    sizeof(struct rte_flow_item) *
+			    MLX5_HW_MAX_ITEMS) *
+			    _queue_attr[i]->size;
 	}
 	priv->hw_q = mlx5_malloc(MLX5_MEM_ZERO, mem_size,
 				 64, SOCKET_ID_ANY);
@@ -2692,58 +3534,82 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		rte_errno = ENOMEM;
 		goto err;
 	}
-	for (i = 0; i < nb_queue; i++) {
+	for (i = 0; i < nb_q_updated; i++) {
 		uint8_t *encap = NULL;
 		struct mlx5_modification_cmd *mhdr_cmd = NULL;
+		struct rte_flow_item *items = NULL;
 
-		priv->hw_q[i].job_idx = queue_attr[i]->size;
-		priv->hw_q[i].size = queue_attr[i]->size;
+		priv->hw_q[i].job_idx = _queue_attr[i]->size;
+		priv->hw_q[i].size = _queue_attr[i]->size;
 		if (i == 0)
 			priv->hw_q[i].job = (struct mlx5_hw_q_job **)
-					    &priv->hw_q[nb_queue];
+					    &priv->hw_q[nb_q_updated];
 		else
 			priv->hw_q[i].job = (struct mlx5_hw_q_job **)
-					    &job[queue_attr[i - 1]->size];
+				&job[_queue_attr[i - 1]->size - 1].items
+				 [MLX5_HW_MAX_ITEMS];
 		job = (struct mlx5_hw_q_job *)
-		      &priv->hw_q[i].job[queue_attr[i]->size];
-		mhdr_cmd = (struct mlx5_modification_cmd *)&job[queue_attr[i]->size];
-		encap = (uint8_t *)&mhdr_cmd[queue_attr[i]->size * MLX5_MHDR_MAX_CMD];
-		for (j = 0; j < queue_attr[i]->size; j++) {
+		      &priv->hw_q[i].job[_queue_attr[i]->size];
+		mhdr_cmd = (struct mlx5_modification_cmd *)
+			   &job[_queue_attr[i]->size];
+		encap = (uint8_t *)
+			 &mhdr_cmd[_queue_attr[i]->size * MLX5_MHDR_MAX_CMD];
+		items = (struct rte_flow_item *)
+			 &encap[_queue_attr[i]->size * MLX5_ENCAP_MAX_LEN];
+		for (j = 0; j < _queue_attr[i]->size; j++) {
 			job[j].mhdr_cmd = &mhdr_cmd[j * MLX5_MHDR_MAX_CMD];
 			job[j].encap_data = &encap[j * MLX5_ENCAP_MAX_LEN];
+			job[j].items = &items[j * MLX5_HW_MAX_ITEMS];
 			priv->hw_q[i].job[j] = &job[j];
 		}
 	}
 	dr_ctx_attr.pd = priv->sh->cdev->pd;
-	dr_ctx_attr.queues = nb_queue;
+	dr_ctx_attr.queues = nb_q_updated;
 	/* Queue size should all be the same. Take the first one. */
-	dr_ctx_attr.queue_size = queue_attr[0]->size;
+	dr_ctx_attr.queue_size = _queue_attr[0]->size;
 	dr_ctx = mlx5dr_context_open(priv->sh->cdev->ctx, &dr_ctx_attr);
 	/* rte_errno has been updated by HWS layer. */
 	if (!dr_ctx)
 		goto err;
 	priv->dr_ctx = dr_ctx;
-	priv->nb_queue = nb_queue;
+	priv->nb_queue = nb_q_updated;
+	rte_spinlock_init(&priv->hw_ctrl_lock);
+	LIST_INIT(&priv->hw_ctrl_flows);
 	/* Add global actions. */
 	for (i = 0; i < MLX5_HW_ACTION_FLAG_MAX; i++) {
-		for (j = 0; j < MLX5DR_TABLE_TYPE_MAX; j++) {
-			priv->hw_drop[i][j] = mlx5dr_action_create_dest_drop
-				(priv->dr_ctx, mlx5_hw_act_flag[i][j]);
-			if (!priv->hw_drop[i][j])
-				goto err;
-		}
+		uint32_t act_flags = 0;
+
+		act_flags = mlx5_hw_act_flag[i][0] | mlx5_hw_act_flag[i][1];
+		if (is_proxy)
+			act_flags |= mlx5_hw_act_flag[i][2];
+		priv->hw_drop[i] = mlx5dr_action_create_dest_drop(priv->dr_ctx, act_flags);
+		if (!priv->hw_drop[i])
+			goto err;
 		priv->hw_tag[i] = mlx5dr_action_create_tag
 			(priv->dr_ctx, mlx5_hw_act_flag[i][0]);
 		if (!priv->hw_tag[i])
 			goto err;
 	}
+	if (is_proxy) {
+		ret = flow_hw_create_vport_actions(priv);
+		if (ret) {
+			rte_errno = -ret;
+			goto err;
+		}
+		ret = flow_hw_create_ctrl_tables(dev);
+		if (ret) {
+			rte_errno = -ret;
+			goto err;
+		}
+	}
+	if (_queue_attr)
+		mlx5_free(_queue_attr);
 	return 0;
 err:
+	flow_hw_free_vport_actions(priv);
 	for (i = 0; i < MLX5_HW_ACTION_FLAG_MAX; i++) {
-		for (j = 0; j < MLX5DR_TABLE_TYPE_MAX; j++) {
-			if (priv->hw_drop[i][j])
-				mlx5dr_action_destroy(priv->hw_drop[i][j]);
-		}
+		if (priv->hw_drop[i])
+			mlx5dr_action_destroy(priv->hw_drop[i]);
 		if (priv->hw_tag[i])
 			mlx5dr_action_destroy(priv->hw_tag[i]);
 	}
@@ -2755,6 +3621,8 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		mlx5_ipool_destroy(priv->acts_ipool);
 		priv->acts_ipool = NULL;
 	}
+	if (_queue_attr)
+		mlx5_free(_queue_attr);
 	return rte_flow_error_set(error, rte_errno,
 				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
 				  "fail to configure port");
@@ -2773,10 +3641,12 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 	struct rte_flow_template_table *tbl;
 	struct rte_flow_pattern_template *it;
 	struct rte_flow_actions_template *at;
-	int i, j;
+	int i;
 
 	if (!priv->dr_ctx)
 		return;
+	flow_hw_rxq_flag_set(dev, false);
+	flow_hw_flush_all_ctrl_flows(dev);
 	while (!LIST_EMPTY(&priv->flow_hw_tbl)) {
 		tbl = LIST_FIRST(&priv->flow_hw_tbl);
 		flow_hw_table_destroy(dev, tbl, NULL);
@@ -2790,13 +3660,12 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 		flow_hw_actions_template_destroy(dev, at, NULL);
 	}
 	for (i = 0; i < MLX5_HW_ACTION_FLAG_MAX; i++) {
-		for (j = 0; j < MLX5DR_TABLE_TYPE_MAX; j++) {
-			if (priv->hw_drop[i][j])
-				mlx5dr_action_destroy(priv->hw_drop[i][j]);
-		}
+		if (priv->hw_drop[i])
+			mlx5dr_action_destroy(priv->hw_drop[i]);
 		if (priv->hw_tag[i])
 			mlx5dr_action_destroy(priv->hw_tag[i]);
 	}
+	flow_hw_free_vport_actions(priv);
 	if (priv->acts_ipool) {
 		mlx5_ipool_destroy(priv->acts_ipool);
 		priv->acts_ipool = NULL;
@@ -3039,4 +3908,397 @@ const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.action_query = flow_dv_action_query,
 };
 
+static uint32_t
+flow_hw_get_ctrl_queue(struct mlx5_priv *priv)
+{
+	MLX5_ASSERT(priv->nb_queue > 0);
+	return priv->nb_queue - 1;
+}
+
+/**
+ * Creates a control flow using flow template API on @p proxy_dev device,
+ * on behalf of @p owner_dev device.
+ *
+ * This function uses locks internally to synchronize access to the
+ * flow queue.
+ *
+ * Created flow is stored in private list associated with @p proxy_dev device.
+ *
+ * @param owner_dev
+ *   Pointer to Ethernet device on behalf of which flow is created.
+ * @param proxy_dev
+ *   Pointer to Ethernet device on which flow is created.
+ * @param table
+ *   Pointer to flow table.
+ * @param items
+ *   Pointer to flow rule items.
+ * @param item_template_idx
+ *   Index of an item template associated with @p table.
+ * @param actions
+ *   Pointer to flow rule actions.
+ * @param action_template_idx
+ *   Index of an action template associated with @p table.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno set.
+ */
+static __rte_unused int
+flow_hw_create_ctrl_flow(struct rte_eth_dev *owner_dev,
+			 struct rte_eth_dev *proxy_dev,
+			 struct rte_flow_template_table *table,
+			 struct rte_flow_item items[],
+			 uint8_t item_template_idx,
+			 struct rte_flow_action actions[],
+			 uint8_t action_template_idx)
+{
+	struct mlx5_priv *priv = proxy_dev->data->dev_private;
+	uint32_t queue = flow_hw_get_ctrl_queue(priv);
+	struct rte_flow_op_attr op_attr = {
+		.postpone = 0,
+	};
+	struct rte_flow *flow = NULL;
+	struct mlx5_hw_ctrl_flow *entry = NULL;
+	int ret;
+
+	rte_spinlock_lock(&priv->hw_ctrl_lock);
+	entry = mlx5_malloc(MLX5_MEM_ZERO | MLX5_MEM_SYS, sizeof(*entry),
+			    0, SOCKET_ID_ANY);
+	if (!entry) {
+		DRV_LOG(ERR, "port %u not enough memory to create control flows",
+			proxy_dev->data->port_id);
+		rte_errno = ENOMEM;
+		ret = -rte_errno;
+		goto error;
+	}
+	flow = flow_hw_async_flow_create(proxy_dev, queue, &op_attr, table,
+					 items, item_template_idx,
+					 actions, action_template_idx,
+					 NULL, NULL);
+	if (!flow) {
+		DRV_LOG(ERR, "port %u failed to enqueue create control"
+			" flow operation", proxy_dev->data->port_id);
+		ret = -rte_errno;
+		goto error;
+	}
+	ret = flow_hw_push(proxy_dev, queue, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "port %u failed to drain control flow queue",
+			proxy_dev->data->port_id);
+		goto error;
+	}
+	ret = __flow_hw_pull_comp(proxy_dev, queue, 1, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "port %u failed to insert control flow",
+			proxy_dev->data->port_id);
+		rte_errno = EINVAL;
+		ret = -rte_errno;
+		goto error;
+	}
+	entry->owner_dev = owner_dev;
+	entry->flow = flow;
+	LIST_INSERT_HEAD(&priv->hw_ctrl_flows, entry, next);
+	rte_spinlock_unlock(&priv->hw_ctrl_lock);
+	return 0;
+error:
+	if (entry)
+		mlx5_free(entry);
+	rte_spinlock_unlock(&priv->hw_ctrl_lock);
+	return ret;
+}
+
+/**
+ * Destroys a control flow @p flow using flow template API on @p dev device.
+ *
+ * This function uses locks internally to synchronize access to the
+ * flow queue.
+ *
+ * If the @p flow is stored on any private list/pool, then caller must free up
+ * the relevant resources.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param flow
+ *   Pointer to flow rule.
+ *
+ * @return
+ *   0 on success, non-zero value otherwise.
+ */
+static int
+flow_hw_destroy_ctrl_flow(struct rte_eth_dev *dev, struct rte_flow *flow)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint32_t queue = flow_hw_get_ctrl_queue(priv);
+	struct rte_flow_op_attr op_attr = {
+		.postpone = 0,
+	};
+	int ret;
+
+	rte_spinlock_lock(&priv->hw_ctrl_lock);
+	ret = flow_hw_async_flow_destroy(dev, queue, &op_attr, flow, NULL, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "port %u failed to enqueue destroy control"
+			" flow operation", dev->data->port_id);
+		goto exit;
+	}
+	ret = flow_hw_push(dev, queue, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "port %u failed to drain control flow queue",
+			dev->data->port_id);
+		goto exit;
+	}
+	ret = __flow_hw_pull_comp(dev, queue, 1, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "port %u failed to destroy control flow",
+			dev->data->port_id);
+		rte_errno = EINVAL;
+		ret = -rte_errno;
+		goto exit;
+	}
+exit:
+	rte_spinlock_unlock(&priv->hw_ctrl_lock);
+	return ret;
+}
+
+/**
+ * Destroys control flows created on behalf of @p owner_dev device.
+ *
+ * @param owner_dev
+ *   Pointer to Ethernet device owning control flows.
+ *
+ * @return
+ *   0 on success, otherwise negative error code is returned and
+ *   rte_errno is set.
+ */
+int
+mlx5_flow_hw_flush_ctrl_flows(struct rte_eth_dev *owner_dev)
+{
+	struct mlx5_priv *owner_priv = owner_dev->data->dev_private;
+	struct rte_eth_dev *proxy_dev;
+	struct mlx5_priv *proxy_priv;
+	struct mlx5_hw_ctrl_flow *cf;
+	struct mlx5_hw_ctrl_flow *cf_next;
+	uint16_t owner_port_id = owner_dev->data->port_id;
+	uint16_t proxy_port_id = owner_dev->data->port_id;
+	int ret;
+
+	if (owner_priv->sh->config.dv_esw_en) {
+		if (rte_flow_pick_transfer_proxy(owner_port_id, &proxy_port_id, NULL)) {
+			DRV_LOG(ERR, "Unable to find proxy port for port %u",
+				owner_port_id);
+			rte_errno = EINVAL;
+			return -rte_errno;
+		}
+		proxy_dev = &rte_eth_devices[proxy_port_id];
+		proxy_priv = proxy_dev->data->dev_private;
+	} else {
+		proxy_dev = owner_dev;
+		proxy_priv = owner_priv;
+	}
+	cf = LIST_FIRST(&proxy_priv->hw_ctrl_flows);
+	while (cf != NULL) {
+		cf_next = LIST_NEXT(cf, next);
+		if (cf->owner_dev == owner_dev) {
+			ret = flow_hw_destroy_ctrl_flow(proxy_dev, cf->flow);
+			if (ret) {
+				rte_errno = ret;
+				return -ret;
+			}
+			LIST_REMOVE(cf, next);
+			mlx5_free(cf);
+		}
+		cf = cf_next;
+	}
+	return 0;
+}
+
+/**
+ * Destroys all control flows created on @p dev device.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, otherwise negative error code is returned and
+ *   rte_errno is set.
+ */
+static int
+flow_hw_flush_all_ctrl_flows(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hw_ctrl_flow *cf;
+	struct mlx5_hw_ctrl_flow *cf_next;
+	int ret;
+
+	cf = LIST_FIRST(&priv->hw_ctrl_flows);
+	while (cf != NULL) {
+		cf_next = LIST_NEXT(cf, next);
+		ret = flow_hw_destroy_ctrl_flow(dev, cf->flow);
+		if (ret) {
+			rte_errno = ret;
+			return -ret;
+		}
+		LIST_REMOVE(cf, next);
+		mlx5_free(cf);
+		cf = cf_next;
+	}
+	return 0;
+}
+
+int
+mlx5_flow_hw_esw_create_mgr_sq_miss_flow(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_flow_item_ethdev port_spec = {
+		.port_id = MLX5_REPRESENTED_PORT_ESW_MGR,
+	};
+	struct rte_flow_item_ethdev port_mask = {
+		.port_id = MLX5_REPRESENTED_PORT_ESW_MGR,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+			.spec = &port_spec,
+			.mask = &port_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	struct rte_flow_action_jump jump = {
+		.group = MLX5_HW_SQ_MISS_GROUP,
+	};
+	struct rte_flow_action actions[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		},
+	};
+
+	MLX5_ASSERT(priv->master);
+	if (!priv->dr_ctx ||
+	    !priv->hw_esw_sq_miss_root_tbl)
+		return 0;
+	return flow_hw_create_ctrl_flow(dev, dev,
+					priv->hw_esw_sq_miss_root_tbl,
+					items, 0, actions, 0);
+}
+
+int
+mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
+{
+	uint16_t port_id = dev->data->port_id;
+	struct mlx5_rte_flow_item_tx_queue queue_spec = {
+		.queue = txq,
+	};
+	struct mlx5_rte_flow_item_tx_queue queue_mask = {
+		.queue = UINT32_MAX,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = (enum rte_flow_item_type)
+				MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
+			.spec = &queue_spec,
+			.mask = &queue_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	struct rte_flow_action_ethdev port = {
+		.port_id = port_id,
+	};
+	struct rte_flow_action actions[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT,
+			.conf = &port,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		},
+	};
+	struct rte_eth_dev *proxy_dev;
+	struct mlx5_priv *proxy_priv;
+	uint16_t proxy_port_id = dev->data->port_id;
+	int ret;
+
+	RTE_SET_USED(txq);
+	ret = rte_flow_pick_transfer_proxy(port_id, &proxy_port_id, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "Unable to pick proxy port for port %u", port_id);
+		return ret;
+	}
+	proxy_dev = &rte_eth_devices[proxy_port_id];
+	proxy_priv = proxy_dev->data->dev_private;
+	if (!proxy_priv->dr_ctx)
+		return 0;
+	if (!proxy_priv->hw_esw_sq_miss_root_tbl ||
+	    !proxy_priv->hw_esw_sq_miss_tbl) {
+		DRV_LOG(ERR, "port %u proxy port %u was configured but default"
+			" flow tables are not created",
+			port_id, proxy_port_id);
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	return flow_hw_create_ctrl_flow(dev, proxy_dev,
+					proxy_priv->hw_esw_sq_miss_tbl,
+					items, 0, actions, 0);
+}
+
+int
+mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev)
+{
+	uint16_t port_id = dev->data->port_id;
+	struct rte_flow_item_ethdev port_spec = {
+		.port_id = port_id,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+			.spec = &port_spec,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	struct rte_flow_action_jump jump = {
+		.group = 1,
+	};
+	struct rte_flow_action actions[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+	struct rte_eth_dev *proxy_dev;
+	struct mlx5_priv *proxy_priv;
+	uint16_t proxy_port_id = dev->data->port_id;
+	int ret;
+
+	ret = rte_flow_pick_transfer_proxy(port_id, &proxy_port_id, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "Unable to pick proxy port for port %u", port_id);
+		return ret;
+	}
+	proxy_dev = &rte_eth_devices[proxy_port_id];
+	proxy_priv = proxy_dev->data->dev_private;
+	if (!proxy_priv->dr_ctx)
+		return 0;
+	if (!proxy_priv->hw_esw_zero_tbl) {
+		DRV_LOG(ERR, "port %u proxy port %u was configured but default"
+			" flow tables are not created",
+			port_id, proxy_port_id);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	return flow_hw_create_ctrl_flow(dev, proxy_dev,
+					proxy_priv->hw_esw_zero_tbl,
+					items, 0, actions, 0);
+}
+
 #endif
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index fd902078f8..7ffaf4c227 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -1245,12 +1245,14 @@ flow_verbs_validate(struct rte_eth_dev *dev,
 	uint16_t ether_type = 0;
 	bool is_empty_vlan = false;
 	uint16_t udp_dport = 0;
+	bool is_root;
 
 	if (items == NULL)
 		return -1;
 	ret = mlx5_flow_validate_attributes(dev, attr, error);
 	if (ret < 0)
 		return ret;
+	is_root = ret;
 	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
 		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
 		int ret = 0;
@@ -1380,7 +1382,7 @@ flow_verbs_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ITEM_TYPE_VXLAN:
 			ret = mlx5_flow_validate_item_vxlan(dev, udp_dport,
 							    items, item_flags,
-							    attr, error);
+							    is_root, error);
 			if (ret < 0)
 				return ret;
 			last_item = MLX5_FLOW_LAYER_VXLAN;
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index c68b32cf14..6313602a66 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -1280,6 +1280,52 @@ mlx5_dev_stop(struct rte_eth_dev *dev)
 	return 0;
 }
 
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+
+static int
+mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	unsigned int i;
+	int ret;
+
+	if (priv->sh->config.dv_esw_en && priv->master) {
+		if (mlx5_flow_hw_esw_create_mgr_sq_miss_flow(dev))
+			goto error;
+	}
+	for (i = 0; i < priv->txqs_n; ++i) {
+		struct mlx5_txq_ctrl *txq = mlx5_txq_get(dev, i);
+		uint32_t queue;
+
+		if (!txq)
+			continue;
+		if (txq->is_hairpin)
+			queue = txq->obj->sq->id;
+		else
+			queue = txq->obj->sq_obj.sq->id;
+		if ((priv->representor || priv->master) &&
+		    priv->sh->config.dv_esw_en) {
+			if (mlx5_flow_hw_esw_create_sq_miss_flow(dev, queue)) {
+				mlx5_txq_release(dev, i);
+				goto error;
+			}
+		}
+		mlx5_txq_release(dev, i);
+	}
+	if ((priv->master || priv->representor) && priv->sh->config.dv_esw_en) {
+		if (mlx5_flow_hw_esw_create_default_jump_flow(dev))
+			goto error;
+	}
+	return 0;
+error:
+	ret = rte_errno;
+	mlx5_flow_hw_flush_ctrl_flows(dev);
+	rte_errno = ret;
+	return -rte_errno;
+}
+
+#endif
+
 /**
  * Enable traffic flows configured by control plane
  *
@@ -1316,6 +1362,10 @@ mlx5_traffic_enable(struct rte_eth_dev *dev)
 	unsigned int j;
 	int ret;
 
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+	if (priv->sh->config.dv_flow_en == 2)
+		return mlx5_traffic_enable_hws(dev);
+#endif
 	/*
 	 * Hairpin txq default flow should be created no matter if it is
 	 * isolation mode. Or else all the packets to be sent will be sent
@@ -1346,13 +1396,17 @@ mlx5_traffic_enable(struct rte_eth_dev *dev)
 		}
 		mlx5_txq_release(dev, i);
 	}
-	if (priv->sh->config.dv_esw_en) {
-		if (mlx5_flow_create_esw_table_zero_flow(dev))
-			priv->fdb_def_rule = 1;
-		else
-			DRV_LOG(INFO, "port %u FDB default rule cannot be"
-				" configured - only Eswitch group 0 flows are"
-				" supported.", dev->data->port_id);
+	if (priv->sh->config.fdb_def_rule) {
+		if (priv->sh->config.dv_esw_en) {
+			if (mlx5_flow_create_esw_table_zero_flow(dev))
+				priv->fdb_def_rule = 1;
+			else
+				DRV_LOG(INFO, "port %u FDB default rule cannot be configured - only Eswitch group 0 flows are supported.",
+					dev->data->port_id);
+		}
+	} else {
+		DRV_LOG(INFO, "port %u FDB default rule is disabled",
+			dev->data->port_id);
 	}
 	if (!priv->sh->config.lacp_by_user && priv->pf_bond >= 0) {
 		ret = mlx5_flow_lacp_miss(dev);
@@ -1470,7 +1524,14 @@ mlx5_traffic_enable(struct rte_eth_dev *dev)
 void
 mlx5_traffic_disable(struct rte_eth_dev *dev)
 {
-	mlx5_flow_list_flush(dev, MLX5_FLOW_TYPE_CTL, false);
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (priv->sh->config.dv_flow_en == 2)
+		mlx5_flow_hw_flush_ctrl_flows(dev);
+	else
+#endif
+		mlx5_flow_list_flush(dev, MLX5_FLOW_TYPE_CTL, false);
 }
 
 /**
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v2 06/17] net/mlx5: add extended metadata mode for hardware steering
  2022-09-28  3:31 ` [PATCH v2 00/17] net/mlx5: HW steering PMD update Suanming Mou
                     ` (4 preceding siblings ...)
  2022-09-28  3:31   ` [PATCH v2 05/17] net/mlx5: add HW steering port action Suanming Mou
@ 2022-09-28  3:31   ` Suanming Mou
  2022-09-28  3:31   ` [PATCH v2 07/17] net/mlx5: add HW steering meter action Suanming Mou
                     ` (10 subsequent siblings)
  16 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-28  3:31 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Bing Zhao

From: Bing Zhao <bingz@nvidia.com>

The new mode 4 of the devarg "dv_xmeta_en" is added for HWS only. In this
mode, copying the 32-bit wide Rx / Tx metadata between the FDB and NIC
domains is supported. The mark is supported only in the NIC domain and is
not copied.
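
For example, this mode can be selected together with HW steering through
devargs (the PCI address below is only a placeholder):

  dpdk-testpmd -a 0000:08:00.0,dv_flow_en=2,dv_xmeta_en=4 -- -i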

Signed-off-by: Bing Zhao <bingz@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_os.c |  10 +-
 drivers/net/mlx5/mlx5.c          |   7 +-
 drivers/net/mlx5/mlx5.h          |   8 +-
 drivers/net/mlx5/mlx5_flow.c     |   8 +-
 drivers/net/mlx5/mlx5_flow.h     |  14 +
 drivers/net/mlx5/mlx5_flow_dv.c  |  21 +-
 drivers/net/mlx5/mlx5_flow_hw.c  | 862 ++++++++++++++++++++++++++++---
 drivers/net/mlx5/mlx5_trigger.c  |   3 +
 8 files changed, 851 insertions(+), 82 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index e0586a4d6f..061b825e7b 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1554,6 +1554,15 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 #ifdef HAVE_IBV_FLOW_DV_SUPPORT
 		if (priv->vport_meta_mask)
 			flow_hw_set_port_info(eth_dev);
+		if (priv->sh->config.dv_esw_en &&
+		    priv->sh->config.dv_xmeta_en != MLX5_XMETA_MODE_LEGACY &&
+		    priv->sh->config.dv_xmeta_en != MLX5_XMETA_MODE_META32_HWS) {
+			DRV_LOG(ERR,
+				"metadata mode %u is not supported in HWS eswitch mode",
+				priv->sh->config.dv_xmeta_en);
+			err = ENOTSUP;
+			goto error;
+		}
 		/* Only HWS requires this information. */
 		flow_hw_init_tags_set(eth_dev);
 		if (priv->sh->config.dv_esw_en &&
@@ -1569,7 +1578,6 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 		goto error;
 #endif
 	}
-	/* Port representor shares the same max priority with pf port. */
 	if (!priv->sh->flow_priority_check_flag) {
 		/* Supported Verbs flow priority number detection. */
 		err = mlx5_flow_discover_priorities(eth_dev);
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 74adb677f4..cf5146d677 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1218,7 +1218,8 @@ mlx5_dev_args_check_handler(const char *key, const char *val, void *opaque)
 		if (tmp != MLX5_XMETA_MODE_LEGACY &&
 		    tmp != MLX5_XMETA_MODE_META16 &&
 		    tmp != MLX5_XMETA_MODE_META32 &&
-		    tmp != MLX5_XMETA_MODE_MISS_INFO) {
+		    tmp != MLX5_XMETA_MODE_MISS_INFO &&
+		    tmp != MLX5_XMETA_MODE_META32_HWS) {
 			DRV_LOG(ERR, "Invalid extensive metadata parameter.");
 			rte_errno = EINVAL;
 			return -rte_errno;
@@ -2849,6 +2850,10 @@ mlx5_set_metadata_mask(struct rte_eth_dev *dev)
 		meta = UINT32_MAX;
 		mark = (reg_c0 >> rte_bsf32(reg_c0)) & MLX5_FLOW_MARK_MASK;
 		break;
+	case MLX5_XMETA_MODE_META32_HWS:
+		meta = UINT32_MAX;
+		mark = MLX5_FLOW_MARK_MASK;
+		break;
 	default:
 		meta = 0;
 		mark = 0;
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 84f6937c95..49981a8d33 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -298,8 +298,8 @@ struct mlx5_sh_config {
 	uint32_t reclaim_mode:2; /* Memory reclaim mode. */
 	uint32_t dv_esw_en:1; /* Enable E-Switch DV flow. */
 	/* Enable DV flow. 1 means SW steering, 2 means HW steering. */
-	unsigned int dv_flow_en:2;
-	uint32_t dv_xmeta_en:2; /* Enable extensive flow metadata. */
+	uint32_t dv_flow_en:2; /* Enable DV flow. */
+	uint32_t dv_xmeta_en:3; /* Enable extensive flow metadata. */
 	uint32_t dv_miss_info:1; /* Restore packet after partial hw miss. */
 	uint32_t l3_vxlan_en:1; /* Enable L3 VXLAN flow creation. */
 	uint32_t vf_nl_en:1; /* Enable Netlink requests in VF mode. */
@@ -312,7 +312,6 @@ struct mlx5_sh_config {
 	uint32_t fdb_def_rule:1; /* Create FDB default jump rule */
 };
 
-
 /* Structure for VF VLAN workaround. */
 struct mlx5_vf_vlan {
 	uint32_t tag:12;
@@ -1279,12 +1278,12 @@ struct mlx5_dev_ctx_shared {
 	struct mlx5_lb_ctx self_lb; /* QP to enable self loopback for Devx. */
 	unsigned int flow_max_priority;
 	enum modify_reg flow_mreg_c[MLX5_MREG_C_NUM];
+	/* Availability of mreg_c's. */
 	void *devx_channel_lwm;
 	struct rte_intr_handle *intr_handle_lwm;
 	pthread_mutex_t lwm_config_lock;
 	uint32_t host_shaper_rate:8;
 	uint32_t lwm_triggered:1;
-	/* Availability of mreg_c's. */
 	struct mlx5_dev_shared_port port[]; /* per device port data array. */
 };
 
@@ -1508,6 +1507,7 @@ struct mlx5_priv {
 	struct rte_flow_template_table *hw_esw_sq_miss_root_tbl;
 	struct rte_flow_template_table *hw_esw_sq_miss_tbl;
 	struct rte_flow_template_table *hw_esw_zero_tbl;
+	struct rte_flow_template_table *hw_tx_meta_cpy_tbl;
 	struct mlx5_indexed_pool *flows[MLX5_FLOW_TYPE_MAXI];
 	/* RTE Flow rules. */
 	uint32_t ctrl_flows; /* Control flow rules. */
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 9c44b2e99b..b570ed7f69 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -1107,6 +1107,8 @@ mlx5_flow_get_reg_id(struct rte_eth_dev *dev,
 			return REG_C_0;
 		case MLX5_XMETA_MODE_META32:
 			return REG_C_1;
+		case MLX5_XMETA_MODE_META32_HWS:
+			return REG_C_1;
 		}
 		break;
 	case MLX5_METADATA_TX:
@@ -1119,11 +1121,14 @@ mlx5_flow_get_reg_id(struct rte_eth_dev *dev,
 			return REG_C_0;
 		case MLX5_XMETA_MODE_META32:
 			return REG_C_1;
+		case MLX5_XMETA_MODE_META32_HWS:
+			return REG_C_1;
 		}
 		break;
 	case MLX5_FLOW_MARK:
 		switch (config->dv_xmeta_en) {
 		case MLX5_XMETA_MODE_LEGACY:
+		case MLX5_XMETA_MODE_META32_HWS:
 			return REG_NON;
 		case MLX5_XMETA_MODE_META16:
 			return REG_C_1;
@@ -4442,7 +4447,8 @@ static bool flow_check_modify_action_type(struct rte_eth_dev *dev,
 		return true;
 	case RTE_FLOW_ACTION_TYPE_FLAG:
 	case RTE_FLOW_ACTION_TYPE_MARK:
-		if (priv->sh->config.dv_xmeta_en != MLX5_XMETA_MODE_LEGACY)
+		if (priv->sh->config.dv_xmeta_en != MLX5_XMETA_MODE_LEGACY &&
+		    priv->sh->config.dv_xmeta_en != MLX5_XMETA_MODE_META32_HWS)
 			return true;
 		else
 			return false;
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 93f0e189d4..a8b27ea494 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -48,6 +48,12 @@ enum mlx5_rte_flow_action_type {
 	MLX5_RTE_FLOW_ACTION_TYPE_RSS,
 };
 
+/* Private (internal) Field IDs for MODIFY_FIELD action. */
+enum mlx5_rte_flow_field_id {
+	MLX5_RTE_FLOW_FIELD_END = INT_MIN,
+	MLX5_RTE_FLOW_FIELD_META_REG,
+};
+
 #define MLX5_INDIRECT_ACTION_TYPE_OFFSET 30
 
 enum {
@@ -1167,6 +1173,7 @@ struct rte_flow_actions_template {
 	struct rte_flow_action *masks; /* Cached action masks.*/
 	uint16_t mhdr_off; /* Offset of DR modify header action. */
 	uint32_t refcnt; /* Reference counter. */
+	uint16_t rx_cpy_pos; /* Action position of Rx metadata to be copied. */
 };
 
 /* Jump action struct. */
@@ -1243,6 +1250,11 @@ struct mlx5_flow_group {
 #define MLX5_HW_TBL_MAX_ITEM_TEMPLATE 2
 #define MLX5_HW_TBL_MAX_ACTION_TEMPLATE 32
 
+struct mlx5_flow_template_table_cfg {
+	struct rte_flow_template_table_attr attr; /* Table attributes passed through flow API. */
+	bool external; /* True if created by flow API, false if table is internal to PMD. */
+};
+
 struct rte_flow_template_table {
 	LIST_ENTRY(rte_flow_template_table) next;
 	struct mlx5_flow_group *grp; /* The group rte_flow_template_table uses. */
@@ -1252,6 +1264,7 @@ struct rte_flow_template_table {
 	/* Action templates bind to the table. */
 	struct mlx5_hw_action_template ats[MLX5_HW_TBL_MAX_ACTION_TEMPLATE];
 	struct mlx5_indexed_pool *flow; /* The table's flow ipool. */
+	struct mlx5_flow_template_table_cfg cfg;
 	uint32_t type; /* Flow table type RX/TX/FDB. */
 	uint8_t nb_item_templates; /* Item template number. */
 	uint8_t nb_action_templates; /* Action template number. */
@@ -2333,4 +2346,5 @@ int mlx5_flow_hw_esw_create_mgr_sq_miss_flow(struct rte_eth_dev *dev);
 int mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev,
 					 uint32_t txq);
 int mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev);
+int mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev);
 #endif /* RTE_PMD_MLX5_FLOW_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index ace69c2b40..1ed0a8ab80 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -1783,7 +1783,8 @@ mlx5_flow_field_id_to_modify_info
 			int reg;
 
 			if (priv->sh->config.dv_flow_en == 2)
-				reg = REG_C_1;
+				reg = flow_hw_get_reg_id(RTE_FLOW_ITEM_TYPE_TAG,
+							 data->level);
 			else
 				reg = mlx5_flow_get_reg_id(dev, MLX5_APP_TAG,
 							   data->level, error);
@@ -1862,6 +1863,24 @@ mlx5_flow_field_id_to_modify_info
 		else
 			info[idx].offset = off_be;
 		break;
+	case MLX5_RTE_FLOW_FIELD_META_REG:
+		{
+			uint32_t meta_mask = priv->sh->dv_meta_mask;
+			uint32_t meta_count = __builtin_popcount(meta_mask);
+			uint32_t reg = data->level;
+
+			RTE_SET_USED(meta_count);
+			MLX5_ASSERT(data->offset + width <= meta_count);
+			MLX5_ASSERT(reg != REG_NON);
+			MLX5_ASSERT(reg < RTE_DIM(reg_to_field));
+			info[idx] = (struct field_modify_info){4, 0, reg_to_field[reg]};
+			if (mask)
+				mask[idx] = flow_modify_info_mask_32_masked
+					(width, data->offset, meta_mask);
+			else
+				info[idx].offset = data->offset;
+		}
+		break;
 	case RTE_FLOW_FIELD_POINTER:
 	case RTE_FLOW_FIELD_VALUE:
 	default:
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 728370328c..e32e673d1a 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -20,13 +20,27 @@
 /* Default queue to flush the flows. */
 #define MLX5_DEFAULT_FLUSH_QUEUE 0
 
-/* Maximum number of rules in control flow tables */
+/* Maximum number of rules in control flow tables. */
 #define MLX5_HW_CTRL_FLOW_NB_RULES (4096)
 
-/* Flow group for SQ miss default flows/ */
-#define MLX5_HW_SQ_MISS_GROUP (UINT32_MAX)
+/* Lowest flow group usable by an application. */
+#define MLX5_HW_LOWEST_USABLE_GROUP (1)
+
+/* Maximum group index usable by user applications for transfer flows. */
+#define MLX5_HW_MAX_TRANSFER_GROUP (UINT32_MAX - 1)
+
+/* Lowest priority for HW root table. */
+#define MLX5_HW_LOWEST_PRIO_ROOT 15
+
+/* Lowest priority for HW non-root table. */
+#define MLX5_HW_LOWEST_PRIO_NON_ROOT (UINT32_MAX)
 
 static int flow_hw_flush_all_ctrl_flows(struct rte_eth_dev *dev);
+static int flow_hw_translate_group(struct rte_eth_dev *dev,
+				   const struct mlx5_flow_template_table_cfg *cfg,
+				   uint32_t group,
+				   uint32_t *table_group,
+				   struct rte_flow_error *error);
 
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops;
 
@@ -213,12 +227,12 @@ flow_hw_rss_item_flags_get(const struct rte_flow_item items[])
  */
 static struct mlx5_hw_jump_action *
 flow_hw_jump_action_register(struct rte_eth_dev *dev,
-			     const struct rte_flow_attr *attr,
+			     const struct mlx5_flow_template_table_cfg *cfg,
 			     uint32_t dest_group,
 			     struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct rte_flow_attr jattr = *attr;
+	struct rte_flow_attr jattr = cfg->attr.flow_attr;
 	struct mlx5_flow_group *grp;
 	struct mlx5_flow_cb_ctx ctx = {
 		.dev = dev,
@@ -226,9 +240,13 @@ flow_hw_jump_action_register(struct rte_eth_dev *dev,
 		.data = &jattr,
 	};
 	struct mlx5_list_entry *ge;
+	uint32_t target_group;
 
-	jattr.group = dest_group;
-	ge = mlx5_hlist_register(priv->sh->flow_tbls, dest_group, &ctx);
+	target_group = dest_group;
+	if (flow_hw_translate_group(dev, cfg, dest_group, &target_group, error))
+		return NULL;
+	jattr.group = target_group;
+	ge = mlx5_hlist_register(priv->sh->flow_tbls, target_group, &ctx);
 	if (!ge)
 		return NULL;
 	grp = container_of(ge, struct mlx5_flow_group, entry);
@@ -760,7 +778,8 @@ flow_hw_modify_field_compile(struct rte_eth_dev *dev,
 				(void *)(uintptr_t)conf->src.pvalue :
 				(void *)(uintptr_t)&conf->src.value;
 		if (conf->dst.field == RTE_FLOW_FIELD_META ||
-		    conf->dst.field == RTE_FLOW_FIELD_TAG) {
+		    conf->dst.field == RTE_FLOW_FIELD_TAG ||
+		    conf->dst.field == (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG) {
 			value = *(const unaligned_uint32_t *)item.spec;
 			value = rte_cpu_to_be_32(value);
 			item.spec = &value;
@@ -860,6 +879,9 @@ flow_hw_represented_port_compile(struct rte_eth_dev *dev,
 	if (m && !!m->port_id) {
 		struct mlx5_priv *port_priv;
 
+		if (!v)
+			return rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ACTION,
+						  action, "port index was not provided");
 		port_priv = mlx5_port_to_eswitch_info(v->port_id, false);
 		if (port_priv == NULL)
 			return rte_flow_error_set
@@ -903,8 +925,8 @@ flow_hw_represented_port_compile(struct rte_eth_dev *dev,
  *
  * @param[in] dev
  *   Pointer to the rte_eth_dev structure.
- * @param[in] table_attr
- *   Pointer to the table attributes.
+ * @param[in] cfg
+ *   Pointer to the table configuration.
  * @param[in] item_templates
  *   Item template array to be binded to the table.
  * @param[in/out] acts
@@ -919,12 +941,13 @@ flow_hw_represented_port_compile(struct rte_eth_dev *dev,
  */
 static int
 flow_hw_actions_translate(struct rte_eth_dev *dev,
-			  const struct rte_flow_template_table_attr *table_attr,
+			  const struct mlx5_flow_template_table_cfg *cfg,
 			  struct mlx5_hw_actions *acts,
 			  struct rte_flow_actions_template *at,
 			  struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_template_table_attr *table_attr = &cfg->attr;
 	const struct rte_flow_attr *attr = &table_attr->flow_attr;
 	struct rte_flow_action *actions = at->actions;
 	struct rte_flow_action *action_start = actions;
@@ -991,7 +1014,7 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 					((const struct rte_flow_action_jump *)
 					actions->conf)->group;
 				acts->jump = flow_hw_jump_action_register
-						(dev, attr, jump_group, error);
+						(dev, cfg, jump_group, error);
 				if (!acts->jump)
 					goto err;
 				acts->rule_acts[i].action = (!!attr->group) ?
@@ -1101,6 +1124,16 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 							   error);
 			if (err)
 				goto err;
+			/*
+			 * Adjust the action source position for the following case:
+			 * ... / MODIFY_FIELD: rx_cpy_pos / (QUEUE|RSS) / ...
+			 * The next action will be Q/RSS; there will be no further
+			 * adjustment, and the real source position of the
+			 * following actions will be decreased by 1.
+			 * The total number of actions in the template is unchanged.
+			 */
+			if ((actions - action_start) == at->rx_cpy_pos)
+				action_start += 1;
 			break;
 		case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
 			if (flow_hw_represented_port_compile
@@ -1365,7 +1398,8 @@ flow_hw_modify_field_construct(struct mlx5_hw_q_job *job,
 	else
 		rte_memcpy(values, mhdr_action->src.pvalue, sizeof(values));
 	if (mhdr_action->dst.field == RTE_FLOW_FIELD_META ||
-	    mhdr_action->dst.field == RTE_FLOW_FIELD_TAG) {
+	    mhdr_action->dst.field == RTE_FLOW_FIELD_TAG ||
+	    mhdr_action->dst.field == (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG) {
 		value_p = (unaligned_uint32_t *)values;
 		*value_p = rte_cpu_to_be_32(*value_p);
 	} else if (mhdr_action->dst.field == RTE_FLOW_FIELD_GTP_PSC_QFI) {
@@ -1513,7 +1547,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			jump_group = ((const struct rte_flow_action_jump *)
 						action->conf)->group;
 			jump = flow_hw_jump_action_register
-				(dev, &attr, jump_group, NULL);
+				(dev, &table->cfg, jump_group, NULL);
 			if (!jump)
 				return -1;
 			rule_acts[act_data->action_dst].action =
@@ -1710,7 +1744,13 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	job->user_data = user_data;
 	rule_attr.user_data = job;
 	hw_acts = &table->ats[action_template_index].acts;
-	/* Construct the flow actions based on the input actions.*/
+	/*
+	 * Construct the flow actions based on the input actions.
+	 * The implicitly appended action is always fixed, like the metadata
+	 * copy action from FDB to NIC Rx.
+	 * There is no need to copy and construct a new "actions" list based
+	 * on the user's input; this saves the copy cost.
+	 */
 	if (flow_hw_actions_construct(dev, job, hw_acts, pattern_template_index,
 				  actions, rule_acts, &acts_num)) {
 		rte_errno = EINVAL;
@@ -2018,8 +2058,8 @@ flow_hw_q_flow_flush(struct rte_eth_dev *dev,
  *
  * @param[in] dev
  *   Pointer to the rte_eth_dev structure.
- * @param[in] attr
- *   Pointer to the table attributes.
+ * @param[in] table_cfg
+ *   Pointer to the table configuration.
  * @param[in] item_templates
  *   Item template array to be binded to the table.
  * @param[in] nb_item_templates
@@ -2036,7 +2076,7 @@ flow_hw_q_flow_flush(struct rte_eth_dev *dev,
  */
 static struct rte_flow_template_table *
 flow_hw_table_create(struct rte_eth_dev *dev,
-		     const struct rte_flow_template_table_attr *attr,
+		     const struct mlx5_flow_template_table_cfg *table_cfg,
 		     struct rte_flow_pattern_template *item_templates[],
 		     uint8_t nb_item_templates,
 		     struct rte_flow_actions_template *action_templates[],
@@ -2048,6 +2088,7 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	struct rte_flow_template_table *tbl = NULL;
 	struct mlx5_flow_group *grp;
 	struct mlx5dr_match_template *mt[MLX5_HW_TBL_MAX_ITEM_TEMPLATE];
+	const struct rte_flow_template_table_attr *attr = &table_cfg->attr;
 	struct rte_flow_attr flow_attr = attr->flow_attr;
 	struct mlx5_flow_cb_ctx ctx = {
 		.dev = dev,
@@ -2088,6 +2129,7 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	tbl = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*tbl), 0, rte_socket_id());
 	if (!tbl)
 		goto error;
+	tbl->cfg = *table_cfg;
 	/* Allocate flow indexed pool. */
 	tbl->flow = mlx5_ipool_create(&cfg);
 	if (!tbl->flow)
@@ -2131,7 +2173,7 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 			goto at_error;
 		}
 		LIST_INIT(&tbl->ats[i].acts.act_list);
-		err = flow_hw_actions_translate(dev, attr,
+		err = flow_hw_actions_translate(dev, &tbl->cfg,
 						&tbl->ats[i].acts,
 						action_templates[i], error);
 		if (err) {
@@ -2174,6 +2216,96 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	return NULL;
 }
 
+/**
+ * Translates group index specified by the user in @p attr to internal
+ * group index.
+ *
+ * Translation is done by incrementing group index, so group n becomes n + 1.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] cfg
+ *   Pointer to the template table configuration.
+ * @param[in] group
+ *   Currently used group index (table group or jump destination).
+ * @param[out] table_group
+ *   Pointer to output group index.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success. Otherwise, returns negative error code, rte_errno is set
+ *   and error structure is filled.
+ */
+static int
+flow_hw_translate_group(struct rte_eth_dev *dev,
+			const struct mlx5_flow_template_table_cfg *cfg,
+			uint32_t group,
+			uint32_t *table_group,
+			struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_attr *flow_attr = &cfg->attr.flow_attr;
+
+	if (priv->sh->config.dv_esw_en && cfg->external && flow_attr->transfer) {
+		if (group > MLX5_HW_MAX_TRANSFER_GROUP)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
+						  NULL,
+						  "group index not supported");
+		*table_group = group + 1;
+	} else {
+		*table_group = group;
+	}
+	return 0;
+}
+
+/**
+ * Create flow table.
+ *
+ * This function is a wrapper over @ref flow_hw_table_create(), which translates parameters
+ * provided by user to proper internal values.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] attr
+ *   Pointer to the table attributes.
+ * @param[in] item_templates
+ *   Item template array to be binded to the table.
+ * @param[in] nb_item_templates
+ *   Number of item templates.
+ * @param[in] action_templates
+ *   Action template array to be binded to the table.
+ * @param[in] nb_action_templates
+ *   Number of action templates.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   Table on success, Otherwise, returns negative error code, rte_errno is set
+ *   and error structure is filled.
+ */
+static struct rte_flow_template_table *
+flow_hw_template_table_create(struct rte_eth_dev *dev,
+			      const struct rte_flow_template_table_attr *attr,
+			      struct rte_flow_pattern_template *item_templates[],
+			      uint8_t nb_item_templates,
+			      struct rte_flow_actions_template *action_templates[],
+			      uint8_t nb_action_templates,
+			      struct rte_flow_error *error)
+{
+	struct mlx5_flow_template_table_cfg cfg = {
+		.attr = *attr,
+		.external = true,
+	};
+	uint32_t group = attr->flow_attr.group;
+
+	if (flow_hw_translate_group(dev, &cfg, group, &cfg.attr.flow_attr.group, error))
+		return NULL;
+	return flow_hw_table_create(dev, &cfg, item_templates, nb_item_templates,
+				    action_templates, nb_action_templates, error);
+}
+
 /**
  * Destroy flow table.
  *
@@ -2290,10 +2422,13 @@ flow_hw_validate_action_represented_port(struct rte_eth_dev *dev,
 					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
 					  "cannot use represented_port actions"
 					  " without an E-Switch");
-	if (mask_conf->port_id) {
+	if (mask_conf && mask_conf->port_id) {
 		struct mlx5_priv *port_priv;
 		struct mlx5_priv *dev_priv;
 
+		if (!action_conf)
+			return rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ACTION,
+						  action, "port index was not provided");
 		port_priv = mlx5_port_to_eswitch_info(action_conf->port_id, false);
 		if (!port_priv)
 			return rte_flow_error_set(error, rte_errno,
@@ -2318,20 +2453,77 @@ flow_hw_validate_action_represented_port(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static inline int
+flow_hw_action_meta_copy_insert(const struct rte_flow_action actions[],
+				const struct rte_flow_action masks[],
+				const struct rte_flow_action *ins_actions,
+				const struct rte_flow_action *ins_masks,
+				struct rte_flow_action *new_actions,
+				struct rte_flow_action *new_masks,
+				uint16_t *ins_pos)
+{
+	uint16_t idx, total = 0;
+	bool ins = false;
+	bool act_end = false;
+
+	MLX5_ASSERT(actions && masks);
+	MLX5_ASSERT(new_actions && new_masks);
+	MLX5_ASSERT(ins_actions && ins_masks);
+	for (idx = 0; !act_end; idx++) {
+		if (idx >= MLX5_HW_MAX_ACTS)
+			return -1;
+		if (actions[idx].type == RTE_FLOW_ACTION_TYPE_RSS ||
+		    actions[idx].type == RTE_FLOW_ACTION_TYPE_QUEUE) {
+			ins = true;
+			*ins_pos = idx;
+		}
+		if (actions[idx].type == RTE_FLOW_ACTION_TYPE_END)
+			act_end = true;
+	}
+	if (!ins)
+		return 0;
+	else if (idx == MLX5_HW_MAX_ACTS)
+		return -1; /* No more space. */
+	total = idx;
+	/* Before the position, no change for the actions. */
+	for (idx = 0; idx < *ins_pos; idx++) {
+		new_actions[idx] = actions[idx];
+		new_masks[idx] = masks[idx];
+	}
+	/* Insert the new action and mask to the position. */
+	new_actions[idx] = *ins_actions;
+	new_masks[idx] = *ins_masks;
+	/* Remaining content is right shifted by one position. */
+	for (; idx < total; idx++) {
+		new_actions[idx + 1] = actions[idx];
+		new_masks[idx + 1] = masks[idx];
+	}
+	return 0;
+}
+
 static int
 flow_hw_action_validate(struct rte_eth_dev *dev,
+			const struct rte_flow_actions_template_attr *attr,
 			const struct rte_flow_action actions[],
 			const struct rte_flow_action masks[],
 			struct rte_flow_error *error)
 {
-	int i;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint16_t i;
 	bool actions_end = false;
 	int ret;
 
+	/* FDB actions are only valid to proxy port. */
+	if (attr->transfer && (!priv->sh->config.dv_esw_en || !priv->master))
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL,
+					  "transfer actions are only valid to proxy port");
 	for (i = 0; !actions_end; ++i) {
 		const struct rte_flow_action *action = &actions[i];
 		const struct rte_flow_action *mask = &masks[i];
 
+		MLX5_ASSERT(i < MLX5_HW_MAX_ACTS);
 		if (action->type != mask->type)
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ACTION,
@@ -2428,21 +2620,77 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	int len, act_len, mask_len, i;
-	struct rte_flow_actions_template *at;
+	struct rte_flow_actions_template *at = NULL;
+	uint16_t pos = MLX5_HW_MAX_ACTS;
+	struct rte_flow_action tmp_action[MLX5_HW_MAX_ACTS];
+	struct rte_flow_action tmp_mask[MLX5_HW_MAX_ACTS];
+	const struct rte_flow_action *ra;
+	const struct rte_flow_action *rm;
+	const struct rte_flow_action_modify_field rx_mreg = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_B,
+		},
+		.src = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_C_1,
+		},
+		.width = 32,
+	};
+	const struct rte_flow_action_modify_field rx_mreg_mask = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.src = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.width = UINT32_MAX,
+	};
+	const struct rte_flow_action rx_cpy = {
+		.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+		.conf = &rx_mreg,
+	};
+	const struct rte_flow_action rx_cpy_mask = {
+		.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+		.conf = &rx_mreg_mask,
+	};
 
-	if (flow_hw_action_validate(dev, actions, masks, error))
+	if (flow_hw_action_validate(dev, attr, actions, masks, error))
 		return NULL;
-	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS,
-				NULL, 0, actions, error);
+	if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
+	    priv->sh->config.dv_esw_en) {
+		if (flow_hw_action_meta_copy_insert(actions, masks, &rx_cpy, &rx_cpy_mask,
+						    tmp_action, tmp_mask, &pos)) {
+			rte_flow_error_set(error, EINVAL,
+					   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					   "Failed to concatenate new action/mask");
+			return NULL;
+		}
+	}
+	/* The application should ensure only one Q/RSS exists in one rule. */
+	if (pos == MLX5_HW_MAX_ACTS) {
+		ra = actions;
+		rm = masks;
+	} else {
+		ra = tmp_action;
+		rm = tmp_mask;
+	}
+	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, NULL, 0, ra, error);
 	if (act_len <= 0)
 		return NULL;
 	len = RTE_ALIGN(act_len, 16);
-	mask_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS,
-				 NULL, 0, masks, error);
+	mask_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, NULL, 0, rm, error);
 	if (mask_len <= 0)
 		return NULL;
 	len += RTE_ALIGN(mask_len, 16);
-	at = mlx5_malloc(MLX5_MEM_ZERO, len + sizeof(*at), 64, rte_socket_id());
+	at = mlx5_malloc(MLX5_MEM_ZERO, len + sizeof(*at),
+			 RTE_CACHE_LINE_SIZE, rte_socket_id());
 	if (!at) {
 		rte_flow_error_set(error, ENOMEM,
 				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
@@ -2450,18 +2698,20 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 				   "cannot allocate action template");
 		return NULL;
 	}
+	/* Actions part is in the first half. */
 	at->attr = *attr;
 	at->actions = (struct rte_flow_action *)(at + 1);
-	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, at->actions, len,
-				actions, error);
+	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, at->actions,
+				len, ra, error);
 	if (act_len <= 0)
 		goto error;
-	at->masks = (struct rte_flow_action *)
-		    (((uint8_t *)at->actions) + act_len);
+	/* Masks part is in the second half. */
+	at->masks = (struct rte_flow_action *)(((uint8_t *)at->actions) + act_len);
 	mask_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, at->masks,
-				 len - act_len, masks, error);
+				 len - act_len, rm, error);
 	if (mask_len <= 0)
 		goto error;
+	at->rx_cpy_pos = pos;
 	/*
 	 * mlx5 PMD hacks indirect action index directly to the action conf.
 	 * The rte_flow_conv() function copies the content from conf pointer.
@@ -2478,7 +2728,8 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 	LIST_INSERT_HEAD(&priv->flow_hw_at, at, next);
 	return at;
 error:
-	mlx5_free(at);
+	if (at)
+		mlx5_free(at);
 	return NULL;
 }
 
@@ -2553,6 +2804,80 @@ flow_hw_copy_prepend_port_item(const struct rte_flow_item *items,
 	return copied_items;
 }
 
+static int
+flow_hw_pattern_validate(struct rte_eth_dev *dev,
+			 const struct rte_flow_pattern_template_attr *attr,
+			 const struct rte_flow_item items[],
+			 struct rte_flow_error *error)
+{
+	int i;
+	bool items_end = false;
+	RTE_SET_USED(dev);
+	RTE_SET_USED(attr);
+
+	for (i = 0; !items_end; i++) {
+		int type = items[i].type;
+
+		switch (type) {
+		case RTE_FLOW_ITEM_TYPE_TAG:
+		{
+			int reg;
+			const struct rte_flow_item_tag *tag =
+				(const struct rte_flow_item_tag *)items[i].spec;
+
+			reg = flow_hw_get_reg_id(RTE_FLOW_ITEM_TYPE_TAG, tag->index);
+			if (reg == REG_NON)
+				return rte_flow_error_set(error, EINVAL,
+							  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+							  NULL,
+							  "Unsupported tag index");
+			break;
+		}
+		case MLX5_RTE_FLOW_ITEM_TYPE_TAG:
+		{
+			const struct rte_flow_item_tag *tag =
+				(const struct rte_flow_item_tag *)items[i].spec;
+			struct mlx5_priv *priv = dev->data->dev_private;
+			uint8_t regcs = (uint8_t)priv->sh->cdev->config.hca_attr.set_reg_c;
+
+			if (!((1 << (tag->index - REG_C_0)) & regcs))
+				return rte_flow_error_set(error, EINVAL,
+							  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+							  NULL,
+							  "Unsupported internal tag index");
+		}
+		case RTE_FLOW_ITEM_TYPE_VOID:
+		case RTE_FLOW_ITEM_TYPE_ETH:
+		case RTE_FLOW_ITEM_TYPE_VLAN:
+		case RTE_FLOW_ITEM_TYPE_IPV4:
+		case RTE_FLOW_ITEM_TYPE_IPV6:
+		case RTE_FLOW_ITEM_TYPE_UDP:
+		case RTE_FLOW_ITEM_TYPE_TCP:
+		case RTE_FLOW_ITEM_TYPE_GTP:
+		case RTE_FLOW_ITEM_TYPE_GTP_PSC:
+		case RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT:
+		case RTE_FLOW_ITEM_TYPE_VXLAN:
+		case MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE:
+		case RTE_FLOW_ITEM_TYPE_META:
+		case RTE_FLOW_ITEM_TYPE_GRE:
+		case RTE_FLOW_ITEM_TYPE_GRE_KEY:
+		case RTE_FLOW_ITEM_TYPE_GRE_OPTION:
+		case RTE_FLOW_ITEM_TYPE_ICMP:
+		case RTE_FLOW_ITEM_TYPE_ICMP6:
+			break;
+		case RTE_FLOW_ITEM_TYPE_END:
+			items_end = true;
+			break;
+		default:
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+						  NULL,
+						  "Unsupported item type");
+		}
+	}
+	return 0;
+}
+
 /**
  * Create flow item template.
  *
@@ -2579,6 +2904,8 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 	struct rte_flow_item *copied_items = NULL;
 	const struct rte_flow_item *tmpl_items;
 
+	if (flow_hw_pattern_validate(dev, attr, items, error))
+		return NULL;
 	if (priv->sh->config.dv_esw_en && attr->ingress) {
 		/*
 		 * Disallow pattern template with ingress and egress/transfer
@@ -3013,6 +3340,17 @@ flow_hw_free_vport_actions(struct mlx5_priv *priv)
 	priv->hw_vport = NULL;
 }
 
+static uint32_t
+flow_hw_usable_lsb_vport_mask(struct mlx5_priv *priv)
+{
+	uint32_t usable_mask = ~priv->vport_meta_mask;
+
+	if (usable_mask)
+		return (1 << rte_bsf32(usable_mask));
+	else
+		return 0;
+}
+
 /**
  * Creates a flow pattern template used to match on E-Switch Manager.
  * This template is used to set up a table for SQ miss default flow.
@@ -3051,7 +3389,10 @@ flow_hw_create_ctrl_esw_mgr_pattern_template(struct rte_eth_dev *dev)
 }
 
 /**
- * Creates a flow pattern template used to match on a TX queue.
+ * Creates a flow pattern template used to match REG_C_0 and a TX queue.
+ * Matching on REG_C_0 is set up to match on the least significant bit usable
+ * by user-space, which is set when a packet originates from the E-Switch Manager.
+ *
  * This template is used to set up a table for SQ miss default flow.
  *
  * @param dev
@@ -3061,16 +3402,30 @@ flow_hw_create_ctrl_esw_mgr_pattern_template(struct rte_eth_dev *dev)
  *   Pointer to flow pattern template on success, NULL otherwise.
  */
 static struct rte_flow_pattern_template *
-flow_hw_create_ctrl_sq_pattern_template(struct rte_eth_dev *dev)
+flow_hw_create_ctrl_regc_sq_pattern_template(struct rte_eth_dev *dev)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint32_t marker_bit = flow_hw_usable_lsb_vport_mask(priv);
 	struct rte_flow_pattern_template_attr attr = {
 		.relaxed_matching = 0,
 		.transfer = 1,
 	};
+	struct rte_flow_item_tag reg_c0_spec = {
+		.index = (uint8_t)REG_C_0,
+	};
+	struct rte_flow_item_tag reg_c0_mask = {
+		.index = 0xff,
+	};
 	struct mlx5_rte_flow_item_tx_queue queue_mask = {
 		.queue = UINT32_MAX,
 	};
 	struct rte_flow_item items[] = {
+		{
+			.type = (enum rte_flow_item_type)
+				MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+			.spec = &reg_c0_spec,
+			.mask = &reg_c0_mask,
+		},
 		{
 			.type = (enum rte_flow_item_type)
 				MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
@@ -3081,6 +3436,12 @@ flow_hw_create_ctrl_sq_pattern_template(struct rte_eth_dev *dev)
 		},
 	};
 
+	if (!marker_bit) {
+		DRV_LOG(ERR, "Unable to set up pattern template for SQ miss table");
+		return NULL;
+	}
+	reg_c0_spec.data = marker_bit;
+	reg_c0_mask.data = marker_bit;
 	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
 }
 
@@ -3118,6 +3479,132 @@ flow_hw_create_ctrl_port_pattern_template(struct rte_eth_dev *dev)
 	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
 }
 
+/*
+ * Creates a flow pattern template matching all ETH packets.
+ * This template is used to set up a table for default Tx copy (Tx metadata
+ * to REG_C_1) flow rule usage.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow pattern template on success, NULL otherwise.
+ */
+static struct rte_flow_pattern_template *
+flow_hw_create_tx_default_mreg_copy_pattern_template(struct rte_eth_dev *dev)
+{
+	struct rte_flow_pattern_template_attr tx_pa_attr = {
+		.relaxed_matching = 0,
+		.egress = 1,
+	};
+	struct rte_flow_item_eth promisc = {
+		.dst.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+		.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+		.type = 0,
+	};
+	struct rte_flow_item eth_all[] = {
+		[0] = {
+			.type = RTE_FLOW_ITEM_TYPE_ETH,
+			.spec = &promisc,
+			.mask = &promisc,
+		},
+		[1] = {
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	struct rte_flow_error drop_err;
+
+	RTE_SET_USED(drop_err);
+	return flow_hw_pattern_template_create(dev, &tx_pa_attr, eth_all, &drop_err);
+}
+
+/**
+ * Creates a flow actions template with modify field action and masked jump action.
+ * Modify field action sets the least significant bit of REG_C_0 (usable by user-space)
+ * to 1, meaning that packet was originated from E-Switch Manager. Jump action
+ * transfers steering to group 1.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow actions template on success, NULL otherwise.
+ */
+static struct rte_flow_actions_template *
+flow_hw_create_ctrl_regc_jump_actions_template(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint32_t marker_bit = flow_hw_usable_lsb_vport_mask(priv);
+	uint32_t marker_bit_mask = UINT32_MAX;
+	struct rte_flow_actions_template_attr attr = {
+		.transfer = 1,
+	};
+	struct rte_flow_action_modify_field set_reg_v = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_C_0,
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+		},
+		.width = 1,
+	};
+	struct rte_flow_action_modify_field set_reg_m = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+		},
+		.width = UINT32_MAX,
+	};
+	struct rte_flow_action_jump jump_v = {
+		.group = MLX5_HW_LOWEST_USABLE_GROUP,
+	};
+	struct rte_flow_action_jump jump_m = {
+		.group = UINT32_MAX,
+	};
+	struct rte_flow_action actions_v[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+			.conf = &set_reg_v,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump_v,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+	struct rte_flow_action actions_m[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+			.conf = &set_reg_m,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump_m,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+
+	if (!marker_bit) {
+		DRV_LOG(ERR, "Unable to set up actions template for SQ miss table");
+		return NULL;
+	}
+	set_reg_v.dst.offset = rte_bsf32(marker_bit);
+	rte_memcpy(set_reg_v.src.value, &marker_bit, sizeof(marker_bit));
+	rte_memcpy(set_reg_m.src.value, &marker_bit_mask, sizeof(marker_bit_mask));
+	return flow_hw_actions_template_create(dev, &attr, actions_v, actions_m, NULL);
+}
+
 /**
  * Creates a flow actions template with an unmasked JUMP action. Flows
  * based on this template will perform a jump to some group. This template
@@ -3212,6 +3699,73 @@ flow_hw_create_ctrl_port_actions_template(struct rte_eth_dev *dev)
 					       NULL);
 }
 
+/*
+ * Creates an actions template that uses the header modify action for register
+ * copying. This template is used to set up a table for the copy flow.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow actions template on success, NULL otherwise.
+ */
+static struct rte_flow_actions_template *
+flow_hw_create_tx_default_mreg_copy_actions_template(struct rte_eth_dev *dev)
+{
+	struct rte_flow_actions_template_attr tx_act_attr = {
+		.egress = 1,
+	};
+	const struct rte_flow_action_modify_field mreg_action = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_C_1,
+		},
+		.src = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_A,
+		},
+		.width = 32,
+	};
+	const struct rte_flow_action_modify_field mreg_mask = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.src = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.width = UINT32_MAX,
+	};
+	const struct rte_flow_action copy_reg_action[] = {
+		[0] = {
+			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+			.conf = &mreg_action,
+		},
+		[1] = {
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		},
+	};
+	const struct rte_flow_action copy_reg_mask[] = {
+		[0] = {
+			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+			.conf = &mreg_mask,
+		},
+		[1] = {
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		},
+	};
+	struct rte_flow_error drop_err;
+
+	RTE_SET_USED(drop_err);
+	return flow_hw_actions_template_create(dev, &tx_act_attr, copy_reg_action,
+					       copy_reg_mask, &drop_err);
+}
+
 /**
  * Creates a control flow table used to transfer traffic from E-Switch Manager
  * and TX queues from group 0 to group 1.
@@ -3241,8 +3795,12 @@ flow_hw_create_ctrl_sq_miss_root_table(struct rte_eth_dev *dev,
 		},
 		.nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
 	};
+	struct mlx5_flow_template_table_cfg cfg = {
+		.attr = attr,
+		.external = false,
+	};
 
-	return flow_hw_table_create(dev, &attr, &it, 1, &at, 1, NULL);
+	return flow_hw_table_create(dev, &cfg, &it, 1, &at, 1, NULL);
 }
 
 
@@ -3267,16 +3825,56 @@ flow_hw_create_ctrl_sq_miss_table(struct rte_eth_dev *dev,
 {
 	struct rte_flow_template_table_attr attr = {
 		.flow_attr = {
-			.group = MLX5_HW_SQ_MISS_GROUP,
-			.priority = 0,
+			.group = 1,
+			.priority = MLX5_HW_LOWEST_PRIO_NON_ROOT,
 			.ingress = 0,
 			.egress = 0,
 			.transfer = 1,
 		},
 		.nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
 	};
+	struct mlx5_flow_template_table_cfg cfg = {
+		.attr = attr,
+		.external = false,
+	};
 
-	return flow_hw_table_create(dev, &attr, &it, 1, &at, 1, NULL);
+	return flow_hw_table_create(dev, &cfg, &it, 1, &at, 1, NULL);
+}
+
+/*
+ * Creates the default Tx metadata copy table on NIC Tx group 0.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param pt
+ *   Pointer to flow pattern template.
+ * @param at
+ *   Pointer to flow actions template.
+ *
+ * @return
+ *   Pointer to flow table on success, NULL otherwise.
+ */
+static struct rte_flow_template_table*
+flow_hw_create_tx_default_mreg_copy_table(struct rte_eth_dev *dev,
+					  struct rte_flow_pattern_template *pt,
+					  struct rte_flow_actions_template *at)
+{
+	struct rte_flow_template_table_attr tx_tbl_attr = {
+		.flow_attr = {
+			.group = 0, /* Root */
+			.priority = MLX5_HW_LOWEST_PRIO_ROOT,
+			.egress = 1,
+		},
+		.nb_flows = 1, /* One default flow rule for all. */
+	};
+	struct mlx5_flow_template_table_cfg tx_tbl_cfg = {
+		.attr = tx_tbl_attr,
+		.external = false,
+	};
+	struct rte_flow_error drop_err;
+
+	RTE_SET_USED(drop_err);
+	return flow_hw_table_create(dev, &tx_tbl_cfg, &pt, 1, &at, 1, &drop_err);
 }
 
 /**
@@ -3301,15 +3899,19 @@ flow_hw_create_ctrl_jump_table(struct rte_eth_dev *dev,
 	struct rte_flow_template_table_attr attr = {
 		.flow_attr = {
 			.group = 0,
-			.priority = 15, /* TODO: Flow priority discovery. */
+			.priority = MLX5_HW_LOWEST_PRIO_ROOT,
 			.ingress = 0,
 			.egress = 0,
 			.transfer = 1,
 		},
 		.nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
 	};
+	struct mlx5_flow_template_table_cfg cfg = {
+		.attr = attr,
+		.external = false,
+	};
 
-	return flow_hw_table_create(dev, &attr, &it, 1, &at, 1, NULL);
+	return flow_hw_table_create(dev, &cfg, &it, 1, &at, 1, NULL);
 }
 
 /**
@@ -3327,11 +3929,14 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow_pattern_template *esw_mgr_items_tmpl = NULL;
-	struct rte_flow_pattern_template *sq_items_tmpl = NULL;
+	struct rte_flow_pattern_template *regc_sq_items_tmpl = NULL;
 	struct rte_flow_pattern_template *port_items_tmpl = NULL;
-	struct rte_flow_actions_template *jump_sq_actions_tmpl = NULL;
+	struct rte_flow_pattern_template *tx_meta_items_tmpl = NULL;
+	struct rte_flow_actions_template *regc_jump_actions_tmpl = NULL;
 	struct rte_flow_actions_template *port_actions_tmpl = NULL;
 	struct rte_flow_actions_template *jump_one_actions_tmpl = NULL;
+	struct rte_flow_actions_template *tx_meta_actions_tmpl = NULL;
+	uint32_t xmeta = priv->sh->config.dv_xmeta_en;
 
 	/* Item templates */
 	esw_mgr_items_tmpl = flow_hw_create_ctrl_esw_mgr_pattern_template(dev);
@@ -3340,8 +3945,8 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 			" template for control flows", dev->data->port_id);
 		goto error;
 	}
-	sq_items_tmpl = flow_hw_create_ctrl_sq_pattern_template(dev);
-	if (!sq_items_tmpl) {
+	regc_sq_items_tmpl = flow_hw_create_ctrl_regc_sq_pattern_template(dev);
+	if (!regc_sq_items_tmpl) {
 		DRV_LOG(ERR, "port %u failed to create SQ item template for"
 			" control flows", dev->data->port_id);
 		goto error;
@@ -3352,11 +3957,18 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 			" control flows", dev->data->port_id);
 		goto error;
 	}
+	if (xmeta == MLX5_XMETA_MODE_META32_HWS) {
+		tx_meta_items_tmpl = flow_hw_create_tx_default_mreg_copy_pattern_template(dev);
+		if (!tx_meta_items_tmpl) {
+			DRV_LOG(ERR, "port %u failed to create Tx metadata copy pattern"
+				" template for control flows", dev->data->port_id);
+			goto error;
+		}
+	}
 	/* Action templates */
-	jump_sq_actions_tmpl = flow_hw_create_ctrl_jump_actions_template(dev,
-									 MLX5_HW_SQ_MISS_GROUP);
-	if (!jump_sq_actions_tmpl) {
-		DRV_LOG(ERR, "port %u failed to create jump action template"
+	regc_jump_actions_tmpl = flow_hw_create_ctrl_regc_jump_actions_template(dev);
+	if (!regc_jump_actions_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create REG_C set and jump action template"
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
@@ -3366,23 +3978,32 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
-	jump_one_actions_tmpl = flow_hw_create_ctrl_jump_actions_template(dev, 1);
+	jump_one_actions_tmpl = flow_hw_create_ctrl_jump_actions_template
+			(dev, MLX5_HW_LOWEST_USABLE_GROUP);
 	if (!jump_one_actions_tmpl) {
 		DRV_LOG(ERR, "port %u failed to create jump action template"
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
+	if (xmeta == MLX5_XMETA_MODE_META32_HWS) {
+		tx_meta_actions_tmpl = flow_hw_create_tx_default_mreg_copy_actions_template(dev);
+		if (!tx_meta_actions_tmpl) {
+			DRV_LOG(ERR, "port %u failed to create Tx metadata copy actions"
+				" template for control flows", dev->data->port_id);
+			goto error;
+		}
+	}
 	/* Tables */
 	MLX5_ASSERT(priv->hw_esw_sq_miss_root_tbl == NULL);
 	priv->hw_esw_sq_miss_root_tbl = flow_hw_create_ctrl_sq_miss_root_table
-			(dev, esw_mgr_items_tmpl, jump_sq_actions_tmpl);
+			(dev, esw_mgr_items_tmpl, regc_jump_actions_tmpl);
 	if (!priv->hw_esw_sq_miss_root_tbl) {
 		DRV_LOG(ERR, "port %u failed to create table for default sq miss (root table)"
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
 	MLX5_ASSERT(priv->hw_esw_sq_miss_tbl == NULL);
-	priv->hw_esw_sq_miss_tbl = flow_hw_create_ctrl_sq_miss_table(dev, sq_items_tmpl,
+	priv->hw_esw_sq_miss_tbl = flow_hw_create_ctrl_sq_miss_table(dev, regc_sq_items_tmpl,
 								     port_actions_tmpl);
 	if (!priv->hw_esw_sq_miss_tbl) {
 		DRV_LOG(ERR, "port %u failed to create table for default sq miss (non-root table)"
@@ -3397,6 +4018,16 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
+	if (xmeta == MLX5_XMETA_MODE_META32_HWS) {
+		MLX5_ASSERT(priv->hw_tx_meta_cpy_tbl == NULL);
+		priv->hw_tx_meta_cpy_tbl = flow_hw_create_tx_default_mreg_copy_table(dev,
+					tx_meta_items_tmpl, tx_meta_actions_tmpl);
+		if (!priv->hw_tx_meta_cpy_tbl) {
+			DRV_LOG(ERR, "port %u failed to create table for default"
+				" Tx metadata copy flow rule", dev->data->port_id);
+			goto error;
+		}
+	}
 	return 0;
 error:
 	if (priv->hw_esw_zero_tbl) {
@@ -3411,16 +4042,20 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 		flow_hw_table_destroy(dev, priv->hw_esw_sq_miss_root_tbl, NULL);
 		priv->hw_esw_sq_miss_root_tbl = NULL;
 	}
+	if (xmeta == MLX5_XMETA_MODE_META32_HWS && tx_meta_actions_tmpl)
+		flow_hw_actions_template_destroy(dev, tx_meta_actions_tmpl, NULL);
 	if (jump_one_actions_tmpl)
 		flow_hw_actions_template_destroy(dev, jump_one_actions_tmpl, NULL);
 	if (port_actions_tmpl)
 		flow_hw_actions_template_destroy(dev, port_actions_tmpl, NULL);
-	if (jump_sq_actions_tmpl)
-		flow_hw_actions_template_destroy(dev, jump_sq_actions_tmpl, NULL);
+	if (regc_jump_actions_tmpl)
+		flow_hw_actions_template_destroy(dev, regc_jump_actions_tmpl, NULL);
+	if (xmeta == MLX5_XMETA_MODE_META32_HWS && tx_meta_items_tmpl)
+		flow_hw_pattern_template_destroy(dev, tx_meta_items_tmpl, NULL);
 	if (port_items_tmpl)
 		flow_hw_pattern_template_destroy(dev, port_items_tmpl, NULL);
-	if (sq_items_tmpl)
-		flow_hw_pattern_template_destroy(dev, sq_items_tmpl, NULL);
+	if (regc_sq_items_tmpl)
+		flow_hw_pattern_template_destroy(dev, regc_sq_items_tmpl, NULL);
 	if (esw_mgr_items_tmpl)
 		flow_hw_pattern_template_destroy(dev, esw_mgr_items_tmpl, NULL);
 	return -EINVAL;
@@ -3472,7 +4107,7 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	struct rte_flow_queue_attr **_queue_attr = NULL;
 	struct rte_flow_queue_attr ctrl_queue_attr = {0};
 	bool is_proxy = !!(priv->sh->config.dv_esw_en && priv->master);
-	int ret;
+	int ret = 0;
 
 	if (!port_attr || !nb_queue || !queue_attr) {
 		rte_errno = EINVAL;
@@ -3623,6 +4258,9 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	}
 	if (_queue_attr)
 		mlx5_free(_queue_attr);
+	/* Do not overwrite the internal errno information. */
+	if (ret)
+		return ret;
 	return rte_flow_error_set(error, rte_errno,
 				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
 				  "fail to configure port");
@@ -3732,17 +4370,17 @@ void flow_hw_init_tags_set(struct rte_eth_dev *dev)
 		return;
 	unset |= 1 << (priv->mtr_color_reg - REG_C_0);
 	unset |= 1 << (REG_C_6 - REG_C_0);
-	if (meta_mode == MLX5_XMETA_MODE_META32_HWS) {
-		unset |= 1 << (REG_C_1 - REG_C_0);
+	if (priv->sh->config.dv_esw_en)
 		unset |= 1 << (REG_C_0 - REG_C_0);
-	}
+	if (meta_mode == MLX5_XMETA_MODE_META32_HWS)
+		unset |= 1 << (REG_C_1 - REG_C_0);
 	masks &= ~unset;
 	if (mlx5_flow_hw_avl_tags_init_cnt) {
 		for (i = 0; i < MLX5_FLOW_HW_TAGS_MAX; i++) {
 			if (mlx5_flow_hw_avl_tags[i] != REG_NON && !!((1 << i) & masks)) {
 				copy[mlx5_flow_hw_avl_tags[i] - REG_C_0] =
 						mlx5_flow_hw_avl_tags[i];
-				copy_masks |= (1 << i);
+				copy_masks |= (1 << (mlx5_flow_hw_avl_tags[i] - REG_C_0));
 			}
 		}
 		if (copy_masks != masks) {
@@ -3884,7 +4522,6 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 	return flow_dv_action_destroy(dev, handle, error);
 }
 
-
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.info_get = flow_hw_info_get,
 	.configure = flow_hw_configure,
@@ -3892,7 +4529,7 @@ const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.pattern_template_destroy = flow_hw_pattern_template_destroy,
 	.actions_template_create = flow_hw_actions_template_create,
 	.actions_template_destroy = flow_hw_actions_template_destroy,
-	.template_table_create = flow_hw_table_create,
+	.template_table_create = flow_hw_template_table_create,
 	.template_table_destroy = flow_hw_table_destroy,
 	.async_flow_create = flow_hw_async_flow_create,
 	.async_flow_destroy = flow_hw_async_flow_destroy,
@@ -3908,13 +4545,6 @@ const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.action_query = flow_dv_action_query,
 };
 
-static uint32_t
-flow_hw_get_ctrl_queue(struct mlx5_priv *priv)
-{
-	MLX5_ASSERT(priv->nb_queue > 0);
-	return priv->nb_queue - 1;
-}
-
 /**
  * Creates a control flow using flow template API on @p proxy_dev device,
  * on behalf of @p owner_dev device.
@@ -3952,7 +4582,7 @@ flow_hw_create_ctrl_flow(struct rte_eth_dev *owner_dev,
 			 uint8_t action_template_idx)
 {
 	struct mlx5_priv *priv = proxy_dev->data->dev_private;
-	uint32_t queue = flow_hw_get_ctrl_queue(priv);
+	uint32_t queue = priv->nb_queue - 1;
 	struct rte_flow_op_attr op_attr = {
 		.postpone = 0,
 	};
@@ -4027,7 +4657,7 @@ static int
 flow_hw_destroy_ctrl_flow(struct rte_eth_dev *dev, struct rte_flow *flow)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	uint32_t queue = flow_hw_get_ctrl_queue(priv);
+	uint32_t queue = priv->nb_queue - 1;
 	struct rte_flow_op_attr op_attr = {
 		.postpone = 0,
 	};
@@ -4164,10 +4794,24 @@ mlx5_flow_hw_esw_create_mgr_sq_miss_flow(struct rte_eth_dev *dev)
 			.type = RTE_FLOW_ITEM_TYPE_END,
 		},
 	};
+	struct rte_flow_action_modify_field modify_field = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+		},
+		.width = 1,
+	};
 	struct rte_flow_action_jump jump = {
-		.group = MLX5_HW_SQ_MISS_GROUP,
+		.group = 1,
 	};
 	struct rte_flow_action actions[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+			.conf = &modify_field,
+		},
 		{
 			.type = RTE_FLOW_ACTION_TYPE_JUMP,
 			.conf = &jump,
@@ -4190,6 +4834,12 @@ int
 mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
 {
 	uint16_t port_id = dev->data->port_id;
+	struct rte_flow_item_tag reg_c0_spec = {
+		.index = (uint8_t)REG_C_0,
+	};
+	struct rte_flow_item_tag reg_c0_mask = {
+		.index = 0xff,
+	};
 	struct mlx5_rte_flow_item_tx_queue queue_spec = {
 		.queue = txq,
 	};
@@ -4197,6 +4847,12 @@ mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
 		.queue = UINT32_MAX,
 	};
 	struct rte_flow_item items[] = {
+		{
+			.type = (enum rte_flow_item_type)
+				MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+			.spec = &reg_c0_spec,
+			.mask = &reg_c0_mask,
+		},
 		{
 			.type = (enum rte_flow_item_type)
 				MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
@@ -4222,6 +4878,7 @@ mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
 	struct rte_eth_dev *proxy_dev;
 	struct mlx5_priv *proxy_priv;
 	uint16_t proxy_port_id = dev->data->port_id;
+	uint32_t marker_bit;
 	int ret;
 
 	RTE_SET_USED(txq);
@@ -4242,6 +4899,14 @@ mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
 		rte_errno = ENOMEM;
 		return -rte_errno;
 	}
+	marker_bit = flow_hw_usable_lsb_vport_mask(proxy_priv);
+	if (!marker_bit) {
+		DRV_LOG(ERR, "Unable to set up control flow in SQ miss table");
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	reg_c0_spec.data = marker_bit;
+	reg_c0_mask.data = marker_bit;
 	return flow_hw_create_ctrl_flow(dev, proxy_dev,
 					proxy_priv->hw_esw_sq_miss_tbl,
 					items, 0, actions, 0);
@@ -4301,4 +4966,53 @@ mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev)
 					items, 0, actions, 0);
 }
 
+int
+mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_flow_item_eth promisc = {
+		.dst.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+		.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+		.type = 0,
+	};
+	struct rte_flow_item eth_all[] = {
+		[0] = {
+			.type = RTE_FLOW_ITEM_TYPE_ETH,
+			.spec = &promisc,
+			.mask = &promisc,
+		},
+		[1] = {
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	struct rte_flow_action_modify_field mreg_action = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_C_1,
+		},
+		.src = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_A,
+		},
+		.width = 32,
+	};
+	struct rte_flow_action copy_reg_action[] = {
+		[0] = {
+			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+			.conf = &mreg_action,
+		},
+		[1] = {
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		},
+	};
+
+	MLX5_ASSERT(priv->master);
+	if (!priv->dr_ctx || !priv->hw_tx_meta_cpy_tbl)
+		return 0;
+	return flow_hw_create_ctrl_flow(dev, dev,
+					priv->hw_tx_meta_cpy_tbl,
+					eth_all, 0, copy_reg_action, 0);
+}
+
 #endif
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 6313602a66..ccefebefc9 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -1292,6 +1292,9 @@ mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
 	if (priv->sh->config.dv_esw_en && priv->master) {
 		if (mlx5_flow_hw_esw_create_mgr_sq_miss_flow(dev))
 			goto error;
+		if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS)
+			if (mlx5_flow_hw_create_tx_default_mreg_copy_flow(dev))
+				goto error;
 	}
 	for (i = 0; i < priv->txqs_n; ++i) {
 		struct mlx5_txq_ctrl *txq = mlx5_txq_get(dev, i);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v2 07/17] net/mlx5: add HW steering meter action
  2022-09-28  3:31 ` [PATCH v2 00/17] net/mlx5: HW steering PMD update Suanming Mou
                     ` (5 preceding siblings ...)
  2022-09-28  3:31   ` [PATCH v2 06/17] net/mlx5: add extended metadata mode for hardware steering Suanming Mou
@ 2022-09-28  3:31   ` Suanming Mou
  2022-09-28  3:31   ` [PATCH v2 08/17] net/mlx5: add HW steering counter action Suanming Mou
                     ` (9 subsequent siblings)
  16 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-28  3:31 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Alexander Kozyrev

From: Alexander Kozyrev <akozyrev@nvidia.com>

This commit adds the meter action for HW steering (HWS).

The HW steering meter is based on ASO. The number of meters that
will be used by flows should be specified in advance in the
flow configure API.
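
For illustration only (not part of this change), a rough sketch of such a
pre-allocation, assuming a single flow queue; the object counts below are
placeholders picked for the example.

#include <rte_flow.h>

/* Sketch: reserve ASO meter objects up front at port configuration time. */
static int
configure_port_with_meters(uint16_t port_id)
{
	const struct rte_flow_port_attr port_attr = {
		.nb_meters = 1024, /* placeholder: meters usable by HWS flows */
	};
	const struct rte_flow_queue_attr queue_attr = {
		.size = 64, /* placeholder: outstanding flow ops per queue */
	};
	const struct rte_flow_queue_attr *queue_attrs[] = { &queue_attr };
	struct rte_flow_error error;

	return rte_flow_configure(port_id, &port_attr, 1, queue_attrs, &error);
}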

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
---
 drivers/net/mlx5/mlx5.h            |  61 ++-
 drivers/net/mlx5/mlx5_flow.c       |  71 +++
 drivers/net/mlx5/mlx5_flow.h       |  50 ++
 drivers/net/mlx5/mlx5_flow_aso.c   |  30 +-
 drivers/net/mlx5/mlx5_flow_dv.c    |  25 -
 drivers/net/mlx5/mlx5_flow_hw.c    | 264 ++++++++++-
 drivers/net/mlx5/mlx5_flow_meter.c | 702 ++++++++++++++++++++++++++++-
 7 files changed, 1142 insertions(+), 61 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 49981a8d33..9fbb6ee2b0 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -357,6 +357,9 @@ struct mlx5_hw_q {
 	struct mlx5_hw_q_job **job; /* LIFO header. */
 } __rte_cache_aligned;
 
+
+
+
 #define MLX5_COUNTERS_PER_POOL 512
 #define MLX5_MAX_PENDING_QUERIES 4
 #define MLX5_CNT_CONTAINER_RESIZE 64
@@ -782,15 +785,29 @@ struct mlx5_flow_meter_policy {
 	/* Is meter action in policy table. */
 	uint32_t hierarchy_drop_cnt:1;
 	/* Is any meter in hierarchy contains drop_cnt. */
+	uint32_t skip_r:1;
+	/* If red color policy is skipped. */
 	uint32_t skip_y:1;
 	/* If yellow color policy is skipped. */
 	uint32_t skip_g:1;
 	/* If green color policy is skipped. */
 	uint32_t mark:1;
 	/* If policy contains mark action. */
+	uint32_t initialized:1;
+	/* Initialized. */
+	uint16_t group;
+	/* The group. */
 	rte_spinlock_t sl;
 	uint32_t ref_cnt;
 	/* Use count. */
+	struct rte_flow_pattern_template *hws_item_templ;
+	/* Hardware steering item templates. */
+	struct rte_flow_actions_template *hws_act_templ[MLX5_MTR_DOMAIN_MAX];
+	/* Hardware steering action templates. */
+	struct rte_flow_template_table *hws_flow_table[MLX5_MTR_DOMAIN_MAX];
+	/* Hardware steering tables. */
+	struct rte_flow *hws_flow_rule[MLX5_MTR_DOMAIN_MAX][RTE_COLORS];
+	/* Hardware steering rules. */
 	struct mlx5_meter_policy_action_container act_cnt[MLX5_MTR_RTE_COLORS];
 	/* Policy actions container. */
 	void *dr_drop_action[MLX5_MTR_DOMAIN_MAX];
@@ -865,6 +882,7 @@ struct mlx5_flow_meter_info {
 	 */
 	uint32_t transfer:1;
 	uint32_t def_policy:1;
+	uint32_t initialized:1;
 	/* Meter points to default policy. */
 	uint32_t color_aware:1;
 	/* Meter is color aware mode. */
@@ -880,6 +898,10 @@ struct mlx5_flow_meter_info {
 	/**< Flow meter action. */
 	void *meter_action_y;
 	/**< Flow meter action for yellow init_color. */
+	uint32_t meter_offset;
+	/**< Flow meter offset. */
+	uint16_t group;
+	/**< Flow meter group. */
 };
 
 /* PPS(packets per second) map to BPS(Bytes per second).
@@ -914,6 +936,7 @@ struct mlx5_flow_meter_profile {
 	uint32_t ref_cnt; /**< Use count. */
 	uint32_t g_support:1; /**< If G color will be generated. */
 	uint32_t y_support:1; /**< If Y color will be generated. */
+	uint32_t initialized:1; /**< Initialized. */
 };
 
 /* 2 meters in each ASO cache line */
@@ -934,13 +957,20 @@ enum mlx5_aso_mtr_state {
 	ASO_METER_READY, /* CQE received. */
 };
 
+/* ASO flow meter type. */
+enum mlx5_aso_mtr_type {
+	ASO_METER_INDIRECT,
+	ASO_METER_DIRECT,
+};
+
 /* Generic aso_flow_meter information. */
 struct mlx5_aso_mtr {
 	LIST_ENTRY(mlx5_aso_mtr) next;
+	enum mlx5_aso_mtr_type type;
 	struct mlx5_flow_meter_info fm;
 	/**< Pointer to the next aso flow meter structure. */
 	uint8_t state; /**< ASO flow meter state. */
-	uint8_t offset;
+	uint32_t offset;
 };
 
 /* Generic aso_flow_meter pool structure. */
@@ -964,6 +994,14 @@ struct mlx5_aso_mtr_pools_mng {
 	struct mlx5_aso_mtr_pool **pools; /* ASO flow meter pool array. */
 };
 
+/* Bulk management structure for ASO flow meter. */
+struct mlx5_mtr_bulk {
+	uint32_t size; /* Number of ASO objects. */
+	struct mlx5dr_action *action; /* HWS action */
+	struct mlx5_devx_obj *devx_obj; /* DEVX object. */
+	struct mlx5_aso_mtr *aso; /* Array of ASO objects. */
+};
+
 /* Meter management structure for global flow meter resource. */
 struct mlx5_flow_mtr_mng {
 	struct mlx5_aso_mtr_pools_mng pools_mng;
@@ -1017,6 +1055,7 @@ struct mlx5_flow_tbl_resource {
 #define MLX5_FLOW_TABLE_LEVEL_METER (MLX5_MAX_TABLES - 3)
 #define MLX5_FLOW_TABLE_LEVEL_POLICY (MLX5_MAX_TABLES - 4)
 #define MLX5_MAX_TABLES_EXTERNAL MLX5_FLOW_TABLE_LEVEL_POLICY
+#define MLX5_FLOW_TABLE_HWS_POLICY (MLX5_MAX_TABLES - 10)
 #define MLX5_MAX_TABLES_FDB UINT16_MAX
 #define MLX5_FLOW_TABLE_FACTOR 10
 
@@ -1303,6 +1342,12 @@ TAILQ_HEAD(mlx5_mtr_profiles, mlx5_flow_meter_profile);
 /* MTR list. */
 TAILQ_HEAD(mlx5_legacy_flow_meters, mlx5_legacy_flow_meter);
 
+struct mlx5_mtr_config {
+	uint32_t nb_meters; /**< Number of configured meters */
+	uint32_t nb_meter_profiles; /**< Number of configured meter profiles */
+	uint32_t nb_meter_policies; /**< Number of configured meter policies */
+};
+
 /* RSS description. */
 struct mlx5_flow_rss_desc {
 	uint32_t level;
@@ -1538,12 +1583,16 @@ struct mlx5_priv {
 	struct mlx5_nl_vlan_vmwa_context *vmwa_context; /* VLAN WA context. */
 	struct mlx5_hlist *mreg_cp_tbl;
 	/* Hash table of Rx metadata register copy table. */
+	struct mlx5_mtr_config mtr_config; /* Meter configuration */
 	uint8_t mtr_sfx_reg; /* Meter prefix-suffix flow match REG_C. */
 	uint8_t mtr_color_reg; /* Meter color match REG_C. */
 	struct mlx5_legacy_flow_meters flow_meters; /* MTR list. */
 	struct mlx5_l3t_tbl *mtr_profile_tbl; /* Meter index lookup table. */
+	struct mlx5_flow_meter_profile *mtr_profile_arr; /* Profile array. */
 	struct mlx5_l3t_tbl *policy_idx_tbl; /* Policy index lookup table. */
+	struct mlx5_flow_meter_policy *mtr_policy_arr; /* Policy array. */
 	struct mlx5_l3t_tbl *mtr_idx_tbl; /* Meter index lookup table. */
+	struct mlx5_mtr_bulk mtr_bulk; /* Meter index mapping for HWS */
 	uint8_t skip_default_rss_reta; /* Skip configuration of default reta. */
 	uint8_t fdb_def_rule; /* Whether fdb jump to table 1 is configured. */
 	struct mlx5_mp_id mp_id; /* ID of a multi-process process */
@@ -1557,13 +1606,13 @@ struct mlx5_priv {
 	struct mlx5_flex_item flex_item[MLX5_PORT_FLEX_ITEM_NUM];
 	/* Flex items have been created on the port. */
 	uint32_t flex_item_map; /* Map of allocated flex item elements. */
+	uint32_t nb_queue; /* HW steering queue number. */
 #if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
 	/* Item template list. */
 	LIST_HEAD(flow_hw_itt, rte_flow_pattern_template) flow_hw_itt;
 	/* Action template list. */
 	LIST_HEAD(flow_hw_at, rte_flow_actions_template) flow_hw_at;
 	struct mlx5dr_context *dr_ctx; /**< HW steering DR context. */
-	uint32_t nb_queue; /* HW steering queue number. */
 	/* HW steering queue polling mechanism job descriptor LIFO. */
 	struct mlx5_hw_q *hw_q;
 	/* HW steering rte flow table list header. */
@@ -1579,6 +1628,7 @@ struct mlx5_priv {
 
 #define PORT_ID(priv) ((priv)->dev_data->port_id)
 #define ETH_DEV(priv) (&rte_eth_devices[PORT_ID(priv)])
+#define CTRL_QUEUE_ID(priv) ((priv)->nb_queue - 1)
 
 struct rte_hairpin_peer_info {
 	uint32_t qp_id;
@@ -1890,6 +1940,11 @@ void mlx5_pmd_socket_uninit(void);
 
 /* mlx5_flow_meter.c */
 
+int mlx5_flow_meter_init(struct rte_eth_dev *dev,
+			 uint32_t nb_meters,
+			 uint32_t nb_meter_profiles,
+			 uint32_t nb_meter_policies);
+void mlx5_flow_meter_uninit(struct rte_eth_dev *dev);
 int mlx5_flow_meter_ops_get(struct rte_eth_dev *dev, void *arg);
 struct mlx5_flow_meter_info *mlx5_flow_meter_find(struct mlx5_priv *priv,
 		uint32_t meter_id, uint32_t *mtr_idx);
@@ -1964,7 +2019,7 @@ int mlx5_aso_flow_hit_queue_poll_stop(struct mlx5_dev_ctx_shared *sh);
 void mlx5_aso_queue_uninit(struct mlx5_dev_ctx_shared *sh,
 		enum mlx5_access_aso_opc_mod aso_opc_mod);
 int mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
-		struct mlx5_aso_mtr *mtr);
+		struct mlx5_aso_mtr *mtr, struct mlx5_mtr_bulk *bulk);
 int mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh,
 		struct mlx5_aso_mtr *mtr);
 int mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index b570ed7f69..fb3be940e5 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -8331,6 +8331,40 @@ mlx5_flow_port_configure(struct rte_eth_dev *dev,
 	return fops->configure(dev, port_attr, nb_queue, queue_attr, error);
 }
 
+/**
+ * Validate item template.
+ *
+ * @param[in] dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param[in] attr
+ *   Pointer to the item template attributes.
+ * @param[in] items
+ *   The template item pattern.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_flow_pattern_validate(struct rte_eth_dev *dev,
+		const struct rte_flow_pattern_template_attr *attr,
+		const struct rte_flow_item items[],
+		struct rte_flow_error *error)
+{
+	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
+
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW) {
+		rte_flow_error_set(error, ENOTSUP,
+			RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+			"pattern validate with incorrect steering mode");
+		return -ENOTSUP;
+	}
+	fops = flow_get_drv_ops(MLX5_FLOW_TYPE_HW);
+	return fops->pattern_validate(dev, attr, items, error);
+}
+
 /**
  * Create flow item template.
  *
@@ -8396,6 +8430,43 @@ mlx5_flow_pattern_template_destroy(struct rte_eth_dev *dev,
 	return fops->pattern_template_destroy(dev, template, error);
 }
 
+/**
+ * Validate flow actions template.
+ *
+ * @param[in] dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param[in] attr
+ *   Pointer to the action template attributes.
+ * @param[in] actions
+ *   Associated actions (list terminated by the END action).
+ * @param[in] masks
+ *   List of actions marking which members of each action are constant.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_flow_actions_validate(struct rte_eth_dev *dev,
+			const struct rte_flow_actions_template_attr *attr,
+			const struct rte_flow_action actions[],
+			const struct rte_flow_action masks[],
+			struct rte_flow_error *error)
+{
+	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
+
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW) {
+		rte_flow_error_set(error, ENOTSUP,
+			RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+			"actions validate with incorrect steering mode");
+		return -ENOTSUP;
+	}
+	fops = flow_get_drv_ops(MLX5_FLOW_TYPE_HW);
+	return fops->actions_validate(dev, attr, actions, masks, error);
+}
+
 /**
  * Create flow item template.
  *
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index a8b27ea494..3bde95c927 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1654,6 +1654,11 @@ typedef int (*mlx5_flow_port_configure_t)
 			 uint16_t nb_queue,
 			 const struct rte_flow_queue_attr *queue_attr[],
 			 struct rte_flow_error *err);
+typedef int (*mlx5_flow_pattern_validate_t)
+			(struct rte_eth_dev *dev,
+			 const struct rte_flow_pattern_template_attr *attr,
+			 const struct rte_flow_item items[],
+			 struct rte_flow_error *error);
 typedef struct rte_flow_pattern_template *(*mlx5_flow_pattern_template_create_t)
 			(struct rte_eth_dev *dev,
 			 const struct rte_flow_pattern_template_attr *attr,
@@ -1663,6 +1668,12 @@ typedef int (*mlx5_flow_pattern_template_destroy_t)
 			(struct rte_eth_dev *dev,
 			 struct rte_flow_pattern_template *template,
 			 struct rte_flow_error *error);
+typedef int (*mlx5_flow_actions_validate_t)
+			(struct rte_eth_dev *dev,
+			 const struct rte_flow_actions_template_attr *attr,
+			 const struct rte_flow_action actions[],
+			 const struct rte_flow_action masks[],
+			 struct rte_flow_error *error);
 typedef struct rte_flow_actions_template *(*mlx5_flow_actions_template_create_t)
 			(struct rte_eth_dev *dev,
 			 const struct rte_flow_actions_template_attr *attr,
@@ -1779,8 +1790,10 @@ struct mlx5_flow_driver_ops {
 	mlx5_flow_item_update_t item_update;
 	mlx5_flow_info_get_t info_get;
 	mlx5_flow_port_configure_t configure;
+	mlx5_flow_pattern_validate_t pattern_validate;
 	mlx5_flow_pattern_template_create_t pattern_template_create;
 	mlx5_flow_pattern_template_destroy_t pattern_template_destroy;
+	mlx5_flow_actions_validate_t actions_validate;
 	mlx5_flow_actions_template_create_t actions_template_create;
 	mlx5_flow_actions_template_destroy_t actions_template_destroy;
 	mlx5_flow_table_create_t template_table_create;
@@ -1862,6 +1875,8 @@ mlx5_aso_meter_by_idx(struct mlx5_priv *priv, uint32_t idx)
 
 	/* Decrease to original index. */
 	idx--;
+	if (priv->mtr_bulk.aso)
+		return priv->mtr_bulk.aso + idx;
 	MLX5_ASSERT(idx / MLX5_ASO_MTRS_PER_POOL < pools_mng->n);
 	rte_rwlock_read_lock(&pools_mng->resize_mtrwl);
 	pool = pools_mng->pools[idx / MLX5_ASO_MTRS_PER_POOL];
@@ -1964,6 +1979,32 @@ mlx5_translate_tunnel_etypes(uint64_t pattern_flags)
 
 int flow_hw_q_flow_flush(struct rte_eth_dev *dev,
 			 struct rte_flow_error *error);
+
+/*
+ * Convert rte_mtr_color to mlx5 color.
+ *
+ * @param[in] rcol
+ *   rte_mtr_color.
+ *
+ * @return
+ *   mlx5 color.
+ */
+static inline int
+rte_col_2_mlx5_col(enum rte_color rcol)
+{
+	switch (rcol) {
+	case RTE_COLOR_GREEN:
+		return MLX5_FLOW_COLOR_GREEN;
+	case RTE_COLOR_YELLOW:
+		return MLX5_FLOW_COLOR_YELLOW;
+	case RTE_COLOR_RED:
+		return MLX5_FLOW_COLOR_RED;
+	default:
+		break;
+	}
+	return MLX5_FLOW_COLOR_UNDEFINED;
+}
+
 int mlx5_flow_group_to_table(struct rte_eth_dev *dev,
 			     const struct mlx5_flow_tunnel *tunnel,
 			     uint32_t group, uint32_t *table,
@@ -2347,4 +2388,13 @@ int mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev,
 					 uint32_t txq);
 int mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev);
 int mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev);
+int mlx5_flow_actions_validate(struct rte_eth_dev *dev,
+		const struct rte_flow_actions_template_attr *attr,
+		const struct rte_flow_action actions[],
+		const struct rte_flow_action masks[],
+		struct rte_flow_error *error);
+int mlx5_flow_pattern_validate(struct rte_eth_dev *dev,
+		const struct rte_flow_pattern_template_attr *attr,
+		const struct rte_flow_item items[],
+		struct rte_flow_error *error);
 #endif /* RTE_PMD_MLX5_FLOW_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow_aso.c b/drivers/net/mlx5/mlx5_flow_aso.c
index 4129e3a9e0..60d0280367 100644
--- a/drivers/net/mlx5/mlx5_flow_aso.c
+++ b/drivers/net/mlx5/mlx5_flow_aso.c
@@ -642,7 +642,8 @@ mlx5_aso_flow_hit_queue_poll_stop(struct mlx5_dev_ctx_shared *sh)
 static uint16_t
 mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 			       struct mlx5_aso_sq *sq,
-			       struct mlx5_aso_mtr *aso_mtr)
+			       struct mlx5_aso_mtr *aso_mtr,
+			       struct mlx5_mtr_bulk *bulk)
 {
 	volatile struct mlx5_aso_wqe *wqe = NULL;
 	struct mlx5_flow_meter_info *fm = NULL;
@@ -653,6 +654,7 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	uint32_t dseg_idx = 0;
 	struct mlx5_aso_mtr_pool *pool = NULL;
 	uint32_t param_le;
+	int id;
 
 	rte_spinlock_lock(&sq->sqsl);
 	res = size - (uint16_t)(sq->head - sq->tail);
@@ -666,14 +668,19 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	/* Fill next WQE. */
 	fm = &aso_mtr->fm;
 	sq->elts[sq->head & mask].mtr = aso_mtr;
-	pool = container_of(aso_mtr, struct mlx5_aso_mtr_pool,
-			mtrs[aso_mtr->offset]);
-	wqe->general_cseg.misc = rte_cpu_to_be_32(pool->devx_obj->id +
-			(aso_mtr->offset >> 1));
-	wqe->general_cseg.opcode = rte_cpu_to_be_32(MLX5_OPCODE_ACCESS_ASO |
-			(ASO_OPC_MOD_POLICER <<
-			WQE_CSEG_OPC_MOD_OFFSET) |
-			sq->pi << WQE_CSEG_WQE_INDEX_OFFSET);
+	if (aso_mtr->type == ASO_METER_INDIRECT) {
+		pool = container_of(aso_mtr, struct mlx5_aso_mtr_pool,
+				    mtrs[aso_mtr->offset]);
+		id = pool->devx_obj->id;
+	} else {
+		id = bulk->devx_obj->id;
+	}
+	wqe->general_cseg.misc = rte_cpu_to_be_32(id +
+						  (aso_mtr->offset >> 1));
+	wqe->general_cseg.opcode =
+		rte_cpu_to_be_32(MLX5_OPCODE_ACCESS_ASO |
+			(ASO_OPC_MOD_POLICER << WQE_CSEG_OPC_MOD_OFFSET) |
+			 sq->pi << WQE_CSEG_WQE_INDEX_OFFSET);
 	/* There are 2 meters in one ASO cache line. */
 	dseg_idx = aso_mtr->offset & 0x1;
 	wqe->aso_cseg.data_mask =
@@ -811,14 +818,15 @@ mlx5_aso_mtr_completion_handle(struct mlx5_aso_sq *sq)
  */
 int
 mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
-			struct mlx5_aso_mtr *mtr)
+			struct mlx5_aso_mtr *mtr,
+			struct mlx5_mtr_bulk *bulk)
 {
 	struct mlx5_aso_sq *sq = &sh->mtrmng->pools_mng.sq;
 	uint32_t poll_wqe_times = MLX5_MTR_POLL_WQE_CQE_TIMES;
 
 	do {
 		mlx5_aso_mtr_completion_handle(sq);
-		if (mlx5_aso_mtr_sq_enqueue_single(sh, sq, mtr))
+		if (mlx5_aso_mtr_sq_enqueue_single(sh, sq, mtr, bulk))
 			return 0;
 		/* Waiting for wqe resource. */
 		rte_delay_us_sleep(MLX5_ASO_WQE_CQE_RESPONSE_DELAY);
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 1ed0a8ab80..90441fbd6e 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -216,31 +216,6 @@ flow_dv_attr_init(const struct rte_flow_item *item, union flow_dv_attr *attr,
 	attr->valid = 1;
 }
 
-/*
- * Convert rte_mtr_color to mlx5 color.
- *
- * @param[in] rcol
- *   rte_mtr_color.
- *
- * @return
- *   mlx5 color.
- */
-static inline int
-rte_col_2_mlx5_col(enum rte_color rcol)
-{
-	switch (rcol) {
-	case RTE_COLOR_GREEN:
-		return MLX5_FLOW_COLOR_GREEN;
-	case RTE_COLOR_YELLOW:
-		return MLX5_FLOW_COLOR_YELLOW;
-	case RTE_COLOR_RED:
-		return MLX5_FLOW_COLOR_RED;
-	default:
-		break;
-	}
-	return MLX5_FLOW_COLOR_UNDEFINED;
-}
-
 struct field_modify_info modify_eth[] = {
 	{4,  0, MLX5_MODI_OUT_DMAC_47_16},
 	{2,  4, MLX5_MODI_OUT_DMAC_15_0},
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index e32e673d1a..6938e77609 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -914,6 +914,38 @@ flow_hw_represented_port_compile(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static __rte_always_inline int
+flow_hw_meter_compile(struct rte_eth_dev *dev,
+		      const struct mlx5_flow_template_table_cfg *cfg,
+		      uint32_t start_pos, const struct rte_flow_action *action,
+		      struct mlx5_hw_actions *acts, uint32_t *end_pos,
+		      struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr *aso_mtr;
+	const struct rte_flow_action_meter *meter = action->conf;
+	uint32_t pos = start_pos;
+	uint32_t group = cfg->attr.flow_attr.group;
+
+	aso_mtr = mlx5_aso_meter_by_idx(priv, meter->mtr_id);
+	acts->rule_acts[pos].action = priv->mtr_bulk.action;
+	acts->rule_acts[pos].aso_meter.offset = aso_mtr->offset;
+	acts->jump = flow_hw_jump_action_register
+			(dev, cfg, aso_mtr->fm.group, error);
+	if (!acts->jump) {
+		*end_pos = start_pos;
+		return -ENOMEM;
+	}
+	acts->rule_acts[++pos].action = (!!group) ?
+				    acts->jump->hws_action :
+				    acts->jump->root_action;
+	*end_pos = pos;
+	if (mlx5_aso_mtr_wait(priv->sh, aso_mtr)) {
+		*end_pos = start_pos;
+		return -ENOMEM;
+	}
+	return 0;
+}
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -1142,6 +1174,21 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 				goto err;
 			i++;
 			break;
+		case RTE_FLOW_ACTION_TYPE_METER:
+			if (actions->conf && masks->conf &&
+			    ((const struct rte_flow_action_meter *)
+			     masks->conf)->mtr_id) {
+				err = flow_hw_meter_compile(dev, cfg,
+						i, actions, acts, &i, error);
+				if (err)
+					goto err;
+			} else if (__flow_hw_act_data_general_append(priv, acts,
+							actions->type,
+							actions - action_start,
+							i))
+				goto err;
+			i++;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -1482,6 +1529,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	const struct rte_flow_action_raw_encap *raw_encap_data;
 	const struct rte_flow_item *enc_item = NULL;
 	const struct rte_flow_action_ethdev *port_action = NULL;
+	const struct rte_flow_action_meter *meter = NULL;
 	uint8_t *buf = job->encap_data;
 	struct rte_flow_attr attr = {
 			.ingress = 1,
@@ -1489,6 +1537,8 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	uint32_t ft_flag;
 	size_t encap_len = 0;
 	int ret;
+	struct mlx5_aso_mtr *mtr;
+	uint32_t mtr_id;
 
 	memcpy(rule_acts, hw_acts->rule_acts,
 	       sizeof(*rule_acts) * hw_acts->acts_num);
@@ -1608,6 +1658,29 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			rule_acts[act_data->action_dst].action =
 					priv->hw_vport[port_action->port_id];
 			break;
+		case RTE_FLOW_ACTION_TYPE_METER:
+			meter = action->conf;
+			mtr_id = meter->mtr_id;
+			mtr = mlx5_aso_meter_by_idx(priv, mtr_id);
+			rule_acts[act_data->action_dst].action =
+				priv->mtr_bulk.action;
+			rule_acts[act_data->action_dst].aso_meter.offset =
+								mtr->offset;
+			jump = flow_hw_jump_action_register
+				(dev, &table->cfg, mtr->fm.group, NULL);
+			if (!jump)
+				return -1;
+			MLX5_ASSERT
+				(!rule_acts[act_data->action_dst + 1].action);
+			rule_acts[act_data->action_dst + 1].action =
+					(!!attr.group) ? jump->hws_action :
+							 jump->root_action;
+			job->flow->jump = jump;
+			job->flow->fate_type = MLX5_FLOW_FATE_JUMP;
+			(*acts_num)++;
+			if (mlx5_aso_mtr_wait(priv->sh, mtr))
+				return -1;
+			break;
 		default:
 			break;
 		}
@@ -2502,7 +2575,7 @@ flow_hw_action_meta_copy_insert(const struct rte_flow_action actions[],
 }
 
 static int
-flow_hw_action_validate(struct rte_eth_dev *dev,
+flow_hw_actions_validate(struct rte_eth_dev *dev,
 			const struct rte_flow_actions_template_attr *attr,
 			const struct rte_flow_action actions[],
 			const struct rte_flow_action masks[],
@@ -2568,6 +2641,9 @@ flow_hw_action_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_RAW_DECAP:
 			/* TODO: Validation logic */
 			break;
+		case RTE_FLOW_ACTION_TYPE_METER:
+			/* TODO: Validation logic */
+			break;
 		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
 			ret = flow_hw_validate_action_modify_field(action,
 									mask,
@@ -2661,7 +2737,7 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 		.conf = &rx_mreg_mask,
 	};
 
-	if (flow_hw_action_validate(dev, attr, actions, masks, error))
+	if (flow_hw_actions_validate(dev, attr, actions, masks, error))
 		return NULL;
 	if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
 	    priv->sh->config.dv_esw_en) {
@@ -3007,15 +3083,27 @@ flow_hw_pattern_template_destroy(struct rte_eth_dev *dev __rte_unused,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-flow_hw_info_get(struct rte_eth_dev *dev __rte_unused,
-		 struct rte_flow_port_info *port_info __rte_unused,
-		 struct rte_flow_queue_info *queue_info __rte_unused,
+flow_hw_info_get(struct rte_eth_dev *dev,
+		 struct rte_flow_port_info *port_info,
+		 struct rte_flow_queue_info *queue_info,
 		 struct rte_flow_error *error __rte_unused)
 {
-	/* Nothing to be updated currently. */
+	uint16_t port_id = dev->data->port_id;
+	struct rte_mtr_capabilities mtr_cap;
+	int ret;
+
 	memset(port_info, 0, sizeof(*port_info));
 	/* Queue size is unlimited from low-level. */
+	port_info->max_nb_queues = UINT32_MAX;
 	queue_info->max_size = UINT32_MAX;
+
+	memset(&mtr_cap, 0, sizeof(struct rte_mtr_capabilities));
+	ret = rte_mtr_capabilities_get(port_id, &mtr_cap, NULL);
+	if (!ret) {
+		port_info->max_nb_meters = mtr_cap.n_max;
+		port_info->max_nb_meter_profiles = UINT32_MAX;
+		port_info->max_nb_meter_policies = UINT32_MAX;
+	}
 	return 0;
 }
 
@@ -4210,6 +4298,13 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	priv->nb_queue = nb_q_updated;
 	rte_spinlock_init(&priv->hw_ctrl_lock);
 	LIST_INIT(&priv->hw_ctrl_flows);
+	/* Initialize meter library. */
+	if (port_attr->nb_meters)
+		if (mlx5_flow_meter_init(dev,
+					port_attr->nb_meters,
+					port_attr->nb_meter_profiles,
+					port_attr->nb_meter_policies))
+			goto err;
 	/* Add global actions. */
 	for (i = 0; i < MLX5_HW_ACTION_FLAG_MAX; i++) {
 		uint32_t act_flags = 0;
@@ -4525,8 +4620,10 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.info_get = flow_hw_info_get,
 	.configure = flow_hw_configure,
+	.pattern_validate = flow_hw_pattern_validate,
 	.pattern_template_create = flow_hw_pattern_template_create,
 	.pattern_template_destroy = flow_hw_pattern_template_destroy,
+	.actions_validate = flow_hw_actions_validate,
 	.actions_template_create = flow_hw_actions_template_create,
 	.actions_template_destroy = flow_hw_actions_template_destroy,
 	.template_table_create = flow_hw_template_table_create,
@@ -4582,7 +4679,7 @@ flow_hw_create_ctrl_flow(struct rte_eth_dev *owner_dev,
 			 uint8_t action_template_idx)
 {
 	struct mlx5_priv *priv = proxy_dev->data->dev_private;
-	uint32_t queue = priv->nb_queue - 1;
+	uint32_t queue = CTRL_QUEUE_ID(priv);
 	struct rte_flow_op_attr op_attr = {
 		.postpone = 0,
 	};
@@ -4657,7 +4754,7 @@ static int
 flow_hw_destroy_ctrl_flow(struct rte_eth_dev *dev, struct rte_flow *flow)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	uint32_t queue = priv->nb_queue - 1;
+	uint32_t queue = CTRL_QUEUE_ID(priv);
 	struct rte_flow_op_attr op_attr = {
 		.postpone = 0,
 	};
@@ -5015,4 +5112,155 @@ mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev)
 					eth_all, 0, copy_reg_action, 0);
 }
 
+void
+mlx5_flow_meter_uninit(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (priv->mtr_policy_arr) {
+		mlx5_free(priv->mtr_policy_arr);
+		priv->mtr_policy_arr = NULL;
+	}
+	if (priv->mtr_profile_arr) {
+		mlx5_free(priv->mtr_profile_arr);
+		priv->mtr_profile_arr = NULL;
+	}
+	if (priv->mtr_bulk.aso) {
+		mlx5_free(priv->mtr_bulk.aso);
+		priv->mtr_bulk.aso = NULL;
+		priv->mtr_bulk.size = 0;
+		mlx5_aso_queue_uninit(priv->sh, ASO_OPC_MOD_POLICER);
+	}
+	if (priv->mtr_bulk.action) {
+		mlx5dr_action_destroy(priv->mtr_bulk.action);
+		priv->mtr_bulk.action = NULL;
+	}
+	if (priv->mtr_bulk.devx_obj) {
+		claim_zero(mlx5_devx_cmd_destroy(priv->mtr_bulk.devx_obj));
+		priv->mtr_bulk.devx_obj = NULL;
+	}
+}
+
+int
+mlx5_flow_meter_init(struct rte_eth_dev *dev,
+		     uint32_t nb_meters,
+		     uint32_t nb_meter_profiles,
+		     uint32_t nb_meter_policies)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_devx_obj *dcs = NULL;
+	uint32_t log_obj_size;
+	int ret = 0;
+	int reg_id;
+	struct mlx5_aso_mtr *aso;
+	uint32_t i;
+	struct rte_flow_error error;
+
+	if (!nb_meters || !nb_meter_profiles || !nb_meter_policies) {
+		ret = ENOTSUP;
+		rte_flow_error_set(&error, ENOTSUP,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter configuration is invalid.");
+		goto err;
+	}
+	if (!priv->mtr_en || !priv->sh->meter_aso_en) {
+		ret = ENOTSUP;
+		rte_flow_error_set(&error, ENOTSUP,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter ASO is not supported.");
+		goto err;
+	}
+	priv->mtr_config.nb_meters = nb_meters;
+	if (mlx5_aso_queue_init(priv->sh, ASO_OPC_MOD_POLICER)) {
+		ret = ENOMEM;
+		rte_flow_error_set(&error, ENOMEM,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter ASO queue allocation failed.");
+		goto err;
+	}
+	log_obj_size = rte_log2_u32(nb_meters >> 1);
+	dcs = mlx5_devx_cmd_create_flow_meter_aso_obj
+		(priv->sh->cdev->ctx, priv->sh->cdev->pdn,
+			log_obj_size);
+	if (!dcs) {
+		ret = ENOMEM;
+		rte_flow_error_set(&error, ENOMEM,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter ASO object allocation failed.");
+		goto err;
+	}
+	priv->mtr_bulk.devx_obj = dcs;
+	reg_id = mlx5_flow_get_reg_id(dev, MLX5_MTR_COLOR, 0, NULL);
+	if (reg_id < 0) {
+		ret = ENOTSUP;
+		rte_flow_error_set(&error, ENOTSUP,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter register is not available.");
+		goto err;
+	}
+	priv->mtr_bulk.action = mlx5dr_action_create_aso_meter
+			(priv->dr_ctx, (struct mlx5dr_devx_obj *)dcs,
+				reg_id - REG_C_0, MLX5DR_ACTION_FLAG_HWS_RX |
+				MLX5DR_ACTION_FLAG_HWS_TX |
+				MLX5DR_ACTION_FLAG_HWS_FDB);
+	if (!priv->mtr_bulk.action) {
+		ret = ENOMEM;
+		rte_flow_error_set(&error, ENOMEM,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter action creation failed.");
+		goto err;
+	}
+	priv->mtr_bulk.aso = mlx5_malloc(MLX5_MEM_ZERO,
+						sizeof(struct mlx5_aso_mtr) * nb_meters,
+						RTE_CACHE_LINE_SIZE,
+						SOCKET_ID_ANY);
+	if (!priv->mtr_bulk.aso) {
+		ret = ENOMEM;
+		rte_flow_error_set(&error, ENOMEM,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter bulk ASO allocation failed.");
+		goto err;
+	}
+	priv->mtr_bulk.size = nb_meters;
+	aso = priv->mtr_bulk.aso;
+	for (i = 0; i < priv->mtr_bulk.size; i++) {
+		aso->type = ASO_METER_DIRECT;
+		aso->state = ASO_METER_WAIT;
+		aso->offset = i;
+		aso++;
+	}
+	priv->mtr_config.nb_meter_profiles = nb_meter_profiles;
+	priv->mtr_profile_arr =
+		mlx5_malloc(MLX5_MEM_ZERO,
+				sizeof(struct mlx5_flow_meter_profile) *
+				nb_meter_profiles,
+				RTE_CACHE_LINE_SIZE,
+				SOCKET_ID_ANY);
+	if (!priv->mtr_profile_arr) {
+		ret = ENOMEM;
+		rte_flow_error_set(&error, ENOMEM,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter profile allocation failed.");
+		goto err;
+	}
+	priv->mtr_config.nb_meter_policies = nb_meter_policies;
+	priv->mtr_policy_arr =
+		mlx5_malloc(MLX5_MEM_ZERO,
+				sizeof(struct mlx5_flow_meter_policy) *
+				nb_meter_policies,
+				RTE_CACHE_LINE_SIZE,
+				SOCKET_ID_ANY);
+	if (!priv->mtr_policy_arr) {
+		ret = ENOMEM;
+		rte_flow_error_set(&error, ENOMEM,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter policy allocation failed.");
+		goto err;
+	}
+	return 0;
+err:
+	mlx5_flow_meter_uninit(dev);
+	return ret;
+}
+
 #endif
diff --git a/drivers/net/mlx5/mlx5_flow_meter.c b/drivers/net/mlx5/mlx5_flow_meter.c
index d4aafe4eea..792b945c98 100644
--- a/drivers/net/mlx5/mlx5_flow_meter.c
+++ b/drivers/net/mlx5/mlx5_flow_meter.c
@@ -98,6 +98,8 @@ mlx5_flow_meter_profile_find(struct mlx5_priv *priv, uint32_t meter_profile_id)
 	union mlx5_l3t_data data;
 	int32_t ret;
 
+	if (priv->mtr_profile_arr)
+		return &priv->mtr_profile_arr[meter_profile_id];
 	if (mlx5_l3t_get_entry(priv->mtr_profile_tbl,
 			       meter_profile_id, &data) || !data.ptr)
 		return NULL;
@@ -145,17 +147,29 @@ mlx5_flow_meter_profile_validate(struct rte_eth_dev *dev,
 					  RTE_MTR_ERROR_TYPE_METER_PROFILE,
 					  NULL, "Meter profile is null.");
 	/* Meter profile ID must be valid. */
-	if (meter_profile_id == UINT32_MAX)
-		return -rte_mtr_error_set(error, EINVAL,
-					  RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
-					  NULL, "Meter profile id not valid.");
-	/* Meter profile must not exist. */
-	fmp = mlx5_flow_meter_profile_find(priv, meter_profile_id);
-	if (fmp)
-		return -rte_mtr_error_set(error, EEXIST,
-					  RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
-					  NULL,
-					  "Meter profile already exists.");
+	if (priv->mtr_profile_arr) {
+		if (meter_profile_id >= priv->mtr_config.nb_meter_profiles)
+			return -rte_mtr_error_set(error, EINVAL,
+					RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					NULL, "Meter profile id not valid.");
+		fmp = mlx5_flow_meter_profile_find(priv, meter_profile_id);
+		/* Meter profile must not exist. */
+		if (fmp->initialized)
+			return -rte_mtr_error_set(error, EEXIST,
+					RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					NULL, "Meter profile already exists.");
+	} else {
+		if (meter_profile_id == UINT32_MAX)
+			return -rte_mtr_error_set(error, EINVAL,
+					RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					NULL, "Meter profile id not valid.");
+		fmp = mlx5_flow_meter_profile_find(priv, meter_profile_id);
+		/* Meter profile must not exist. */
+		if (fmp)
+			return -rte_mtr_error_set(error, EEXIST,
+					RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					NULL, "Meter profile already exists.");
+	}
 	if (!priv->sh->meter_aso_en) {
 		/* Old version is even not supported. */
 		if (!priv->sh->cdev->config.hca_attr.qos.flow_meter_old)
@@ -574,6 +588,96 @@ mlx5_flow_meter_profile_delete(struct rte_eth_dev *dev,
 	return 0;
 }
 
+/**
+ * Callback to add MTR profile with HWS.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] meter_profile_id
+ *   Meter profile id.
+ * @param[in] profile
+ *   Pointer to meter profile detail.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_profile_hws_add(struct rte_eth_dev *dev,
+			uint32_t meter_profile_id,
+			struct rte_mtr_meter_profile *profile,
+			struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_meter_profile *fmp;
+	int ret;
+
+	if (!priv->mtr_profile_arr)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "Meter profile array is not allocated");
+	/* Check input params. */
+	ret = mlx5_flow_meter_profile_validate(dev, meter_profile_id,
+					       profile, error);
+	if (ret)
+		return ret;
+	fmp = mlx5_flow_meter_profile_find(priv, meter_profile_id);
+	/* Fill profile info. */
+	fmp->id = meter_profile_id;
+	fmp->profile = *profile;
+	fmp->initialized = 1;
+	/* Fill the flow meter parameters for the PRM. */
+	return mlx5_flow_meter_param_fill(fmp, error);
+}
+
+/**
+ * Callback to delete MTR profile with HWS.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] meter_profile_id
+ *   Meter profile id.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_profile_hws_delete(struct rte_eth_dev *dev,
+			uint32_t meter_profile_id,
+			struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_meter_profile *fmp;
+
+	if (!priv->mtr_profile_arr)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "Meter profile array is not allocated");
+	/* Meter profile id must be valid. */
+	if (meter_profile_id >= priv->mtr_config.nb_meter_profiles)
+		return -rte_mtr_error_set(error, EINVAL,
+					  RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					  &meter_profile_id,
+					  "Meter profile id not valid.");
+	/* Meter profile must exist. */
+	fmp = mlx5_flow_meter_profile_find(priv, meter_profile_id);
+	if (!fmp->initialized)
+		return -rte_mtr_error_set(error, ENOENT,
+					  RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					  &meter_profile_id,
+					  "Meter profile id is invalid.");
+	/* Check profile is unused. */
+	if (fmp->ref_cnt)
+		return -rte_mtr_error_set(error, EBUSY,
+					  RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					  NULL, "Meter profile is in use.");
+	memset(fmp, 0, sizeof(struct mlx5_flow_meter_profile));
+	return 0;
+}
+
 /**
  * Find policy by id.
  *
@@ -594,6 +698,11 @@ mlx5_flow_meter_policy_find(struct rte_eth_dev *dev,
 	struct mlx5_flow_meter_sub_policy *sub_policy = NULL;
 	union mlx5_l3t_data data;
 
+	if (priv->mtr_policy_arr) {
+		if (policy_idx)
+			*policy_idx = policy_id;
+		return &priv->mtr_policy_arr[policy_id];
+	}
 	if (policy_id > MLX5_MAX_SUB_POLICY_TBL_NUM || !priv->policy_idx_tbl)
 		return NULL;
 	if (mlx5_l3t_get_entry(priv->policy_idx_tbl, policy_id, &data) ||
@@ -710,6 +819,43 @@ mlx5_flow_meter_policy_validate(struct rte_eth_dev *dev,
 	return 0;
 }
 
+/**
+ * Callback to validate MTR policy actions for HWS.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] policy
+ *   Pointer to meter policy action detail.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_policy_hws_validate(struct rte_eth_dev *dev,
+	struct rte_mtr_meter_policy_params *policy,
+	struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_actions_template_attr attr = {
+		.transfer = priv->sh->config.dv_esw_en ? 1 : 0 };
+	int ret;
+	int i;
+
+	if (!priv->mtr_en || !priv->sh->meter_aso_en)
+		return -rte_mtr_error_set(error, ENOTSUP,
+				RTE_MTR_ERROR_TYPE_METER_POLICY,
+				NULL, "meter policy unsupported.");
+	for (i = 0; i < RTE_COLORS; i++) {
+		ret = mlx5_flow_actions_validate(dev, &attr, policy->actions[i],
+						 policy->actions[i], NULL);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
 static int
 __mlx5_flow_meter_policy_delete(struct rte_eth_dev *dev,
 			uint32_t policy_id,
@@ -1004,6 +1150,338 @@ mlx5_flow_meter_policy_delete(struct rte_eth_dev *dev,
 	return 0;
 }
 
+/**
+ * Callback to delete MTR policy for HWS.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] policy_id
+ *   Meter policy id.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_policy_hws_delete(struct rte_eth_dev *dev,
+			  uint32_t policy_id,
+			  struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_meter_policy *mtr_policy;
+	uint32_t i, j;
+	uint32_t nb_flows = 0;
+	int ret;
+	struct rte_flow_op_attr op_attr = { .postpone = 1 };
+	struct rte_flow_op_result result[RTE_COLORS * MLX5_MTR_DOMAIN_MAX];
+
+	if (!priv->mtr_policy_arr)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "Meter policy array is not allocated");
+	/* Meter policy id must be valid. */
+	if (policy_id >= priv->mtr_config.nb_meter_policies)
+		return -rte_mtr_error_set(error, EINVAL,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+					  &policy_id,
+					  "Meter policy id not valid.");
+	/* Meter policy must exist. */
+	mtr_policy = mlx5_flow_meter_policy_find(dev, policy_id, NULL);
+	if (!mtr_policy->initialized)
+		return -rte_mtr_error_set(error, ENOENT,
+			RTE_MTR_ERROR_TYPE_METER_POLICY_ID, NULL,
+			"Meter policy does not exist.");
+	/* Check policy is unused. */
+	if (mtr_policy->ref_cnt)
+		return -rte_mtr_error_set(error, EBUSY,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+					  NULL, "Meter policy is in use.");
+	rte_spinlock_lock(&priv->hw_ctrl_lock);
+	for (i = 0; i < MLX5_MTR_DOMAIN_MAX; i++) {
+		for (j = 0; j < RTE_COLORS; j++) {
+			if (mtr_policy->hws_flow_rule[i][j]) {
+				ret = rte_flow_async_destroy(dev->data->port_id,
+					CTRL_QUEUE_ID(priv), &op_attr,
+					mtr_policy->hws_flow_rule[i][j],
+					NULL, NULL);
+				if (ret < 0)
+					continue;
+				nb_flows++;
+			}
+		}
+	}
+	ret = rte_flow_push(dev->data->port_id, CTRL_QUEUE_ID(priv), NULL);
+	while (nb_flows && (ret >= 0)) {
+		ret = rte_flow_pull(dev->data->port_id,
+					CTRL_QUEUE_ID(priv), result,
+					nb_flows, NULL);
+		nb_flows -= ret;
+	}
+	for (i = 0; i < MLX5_MTR_DOMAIN_MAX; i++) {
+		if (mtr_policy->hws_flow_table[i])
+			rte_flow_template_table_destroy(dev->data->port_id,
+				 mtr_policy->hws_flow_table[i], NULL);
+	}
+	for (i = 0; i < RTE_COLORS; i++) {
+		if (mtr_policy->hws_act_templ[i])
+			rte_flow_actions_template_destroy(dev->data->port_id,
+				 mtr_policy->hws_act_templ[i], NULL);
+	}
+	if (mtr_policy->hws_item_templ)
+		rte_flow_pattern_template_destroy(dev->data->port_id,
+				mtr_policy->hws_item_templ, NULL);
+	rte_spinlock_unlock(&priv->hw_ctrl_lock);
+	memset(mtr_policy, 0, sizeof(struct mlx5_flow_meter_policy));
+	return 0;
+}
+
+/**
+ * Callback to add MTR policy for HWS.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] policy_id
+ *   Meter policy id.
+ * @param[in] policy
+ *   Pointer to meter policy action detail.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_policy_hws_add(struct rte_eth_dev *dev,
+			uint32_t policy_id,
+			struct rte_mtr_meter_policy_params *policy,
+			struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_meter_policy *mtr_policy = NULL;
+	const struct rte_flow_action *act;
+	const struct rte_flow_action_meter *mtr;
+	struct mlx5_flow_meter_info *fm;
+	struct mlx5_flow_meter_policy *plc;
+	uint8_t domain_color = MLX5_MTR_ALL_DOMAIN_BIT;
+	bool is_rss = false;
+	bool is_hierarchy = false;
+	int i, j;
+	uint32_t nb_colors = 0;
+	uint32_t nb_flows = 0;
+	int color;
+	int ret;
+	struct rte_flow_pattern_template_attr pta = {0};
+	struct rte_flow_actions_template_attr ata = {0};
+	struct rte_flow_template_table_attr ta = { {0}, 0 };
+	struct rte_flow_op_attr op_attr = { .postpone = 1 };
+	struct rte_flow_op_result result[RTE_COLORS * MLX5_MTR_DOMAIN_MAX];
+	const uint32_t color_mask = (UINT32_C(1) << MLX5_MTR_COLOR_BITS) - 1;
+	int color_reg_c_idx = mlx5_flow_get_reg_id(dev, MLX5_MTR_COLOR,
+						   0, NULL);
+	struct rte_flow_item_tag tag_spec = {
+		.data = 0,
+		.index = color_reg_c_idx
+	};
+	struct rte_flow_item_tag tag_mask = {
+		.data = color_mask,
+		.index = 0xff};
+	struct rte_flow_item pattern[] = {
+		[0] = {
+			.type = (enum rte_flow_item_type)
+				MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+			.spec = &tag_spec,
+			.mask = &tag_mask,
+		},
+		[1] = { .type = RTE_FLOW_ITEM_TYPE_END }
+	};
+
+	if (!priv->mtr_policy_arr)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY,
+					  NULL, "Meter policy array is not allocated.");
+	if (policy_id >= priv->mtr_config.nb_meter_policies)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+					  NULL, "Meter policy id not valid.");
+	mtr_policy = mlx5_flow_meter_policy_find(dev, policy_id, NULL);
+	if (mtr_policy->initialized)
+		return -rte_mtr_error_set(error, EEXIST,
+			RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+			NULL, "Meter policy already exists.");
+	if (!policy ||
+	    !policy->actions[RTE_COLOR_RED] ||
+	    !policy->actions[RTE_COLOR_YELLOW] ||
+	    !policy->actions[RTE_COLOR_GREEN])
+		return -rte_mtr_error_set(error, EINVAL,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY,
+					  NULL, "Meter policy actions are not valid.");
+	if (policy->actions[RTE_COLOR_RED]->type == RTE_FLOW_ACTION_TYPE_END)
+		mtr_policy->skip_r = 1;
+	if (policy->actions[RTE_COLOR_YELLOW]->type == RTE_FLOW_ACTION_TYPE_END)
+		mtr_policy->skip_y = 1;
+	if (policy->actions[RTE_COLOR_GREEN]->type == RTE_FLOW_ACTION_TYPE_END)
+		mtr_policy->skip_g = 1;
+	if (mtr_policy->skip_r && mtr_policy->skip_y && mtr_policy->skip_g)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+					  NULL, "Meter policy actions are empty.");
+	for (i = 0; i < RTE_COLORS; i++) {
+		act = policy->actions[i];
+		while (act && act->type != RTE_FLOW_ACTION_TYPE_END) {
+			switch (act->type) {
+			case RTE_FLOW_ACTION_TYPE_PORT_ID:
+				/* fall-through. */
+			case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
+				domain_color &= ~(MLX5_MTR_DOMAIN_INGRESS_BIT |
+						  MLX5_MTR_DOMAIN_EGRESS_BIT);
+				break;
+			case RTE_FLOW_ACTION_TYPE_RSS:
+				is_rss = true;
+				/* fall-through. */
+			case RTE_FLOW_ACTION_TYPE_QUEUE:
+				domain_color &= ~(MLX5_MTR_DOMAIN_EGRESS_BIT |
+						  MLX5_MTR_DOMAIN_TRANSFER_BIT);
+				break;
+			case RTE_FLOW_ACTION_TYPE_METER:
+				is_hierarchy = true;
+				mtr = act->conf;
+				fm = mlx5_flow_meter_find(priv,
+							  mtr->mtr_id, NULL);
+				if (!fm)
+					return -rte_mtr_error_set(error, EINVAL,
+						RTE_MTR_ERROR_TYPE_MTR_ID, NULL,
+						"Meter not found in meter hierarchy.");
+				plc = mlx5_flow_meter_policy_find(dev,
+								  fm->policy_id,
+								  NULL);
+				MLX5_ASSERT(plc);
+				domain_color &= MLX5_MTR_ALL_DOMAIN_BIT &
+					(plc->ingress <<
+					 MLX5_MTR_DOMAIN_INGRESS);
+				domain_color &= MLX5_MTR_ALL_DOMAIN_BIT &
+					(plc->egress <<
+					 MLX5_MTR_DOMAIN_EGRESS);
+				domain_color &= MLX5_MTR_ALL_DOMAIN_BIT &
+					(plc->transfer <<
+					 MLX5_MTR_DOMAIN_TRANSFER);
+				break;
+			default:
+				break;
+			}
+			act++;
+		}
+	}
+	if (!domain_color)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+					  NULL, "Meter policy domains are conflicting.");
+	mtr_policy->is_rss = is_rss;
+	mtr_policy->ingress = !!(domain_color & MLX5_MTR_DOMAIN_INGRESS_BIT);
+	pta.ingress = mtr_policy->ingress;
+	mtr_policy->egress = !!(domain_color & MLX5_MTR_DOMAIN_EGRESS_BIT);
+	pta.egress = mtr_policy->egress;
+	mtr_policy->transfer = !!(domain_color & MLX5_MTR_DOMAIN_TRANSFER_BIT);
+	pta.transfer = mtr_policy->transfer;
+	mtr_policy->group = MLX5_FLOW_TABLE_HWS_POLICY - policy_id;
+	mtr_policy->is_hierarchy = is_hierarchy;
+	mtr_policy->initialized = 1;
+	rte_spinlock_lock(&priv->hw_ctrl_lock);
+	mtr_policy->hws_item_templ =
+		rte_flow_pattern_template_create(dev->data->port_id,
+						 &pta, pattern, NULL);
+	if (!mtr_policy->hws_item_templ)
+		goto policy_add_err;
+	for (i = 0; i < RTE_COLORS; i++) {
+		if (mtr_policy->skip_g && i == RTE_COLOR_GREEN)
+			continue;
+		if (mtr_policy->skip_y && i == RTE_COLOR_YELLOW)
+			continue;
+		if (mtr_policy->skip_r && i == RTE_COLOR_RED)
+			continue;
+		mtr_policy->hws_act_templ[nb_colors] =
+			rte_flow_actions_template_create(dev->data->port_id,
+						&ata, policy->actions[i],
+						policy->actions[i], NULL);
+		if (!mtr_policy->hws_act_templ[nb_colors])
+			goto policy_add_err;
+		nb_colors++;
+	}
+	for (i = 0; i < MLX5_MTR_DOMAIN_MAX; i++) {
+		memset(&ta, 0, sizeof(ta));
+		ta.nb_flows = RTE_COLORS;
+		ta.flow_attr.group = mtr_policy->group;
+		if (i == MLX5_MTR_DOMAIN_INGRESS) {
+			if (!mtr_policy->ingress)
+				continue;
+			ta.flow_attr.ingress = 1;
+		} else if (i == MLX5_MTR_DOMAIN_EGRESS) {
+			if (!mtr_policy->egress)
+				continue;
+			ta.flow_attr.egress = 1;
+		} else if (i == MLX5_MTR_DOMAIN_TRANSFER) {
+			if (!mtr_policy->transfer)
+				continue;
+			ta.flow_attr.transfer = 1;
+		}
+		mtr_policy->hws_flow_table[i] =
+			rte_flow_template_table_create(dev->data->port_id,
+					&ta, &mtr_policy->hws_item_templ, 1,
+					mtr_policy->hws_act_templ, nb_colors,
+					NULL);
+		if (!mtr_policy->hws_flow_table[i])
+			goto policy_add_err;
+		nb_colors = 0;
+		for (j = 0; j < RTE_COLORS; j++) {
+			if (mtr_policy->skip_g && j == RTE_COLOR_GREEN)
+				continue;
+			if (mtr_policy->skip_y && j == RTE_COLOR_YELLOW)
+				continue;
+			if (mtr_policy->skip_r && j == RTE_COLOR_RED)
+				continue;
+			color = rte_col_2_mlx5_col((enum rte_color)j);
+			tag_spec.data = color;
+			mtr_policy->hws_flow_rule[i][j] =
+				rte_flow_async_create(dev->data->port_id,
+					CTRL_QUEUE_ID(priv), &op_attr,
+					mtr_policy->hws_flow_table[i],
+					pattern, 0, policy->actions[j],
+					nb_colors, NULL, NULL);
+			if (!mtr_policy->hws_flow_rule[i][j])
+				goto policy_add_err;
+			nb_colors++;
+			nb_flows++;
+		}
+		ret = rte_flow_push(dev->data->port_id,
+				    CTRL_QUEUE_ID(priv), NULL);
+		if (ret < 0)
+			goto policy_add_err;
+		while (nb_flows) {
+			ret = rte_flow_pull(dev->data->port_id,
+					    CTRL_QUEUE_ID(priv), result,
+					    nb_flows, NULL);
+			if (ret < 0)
+				goto policy_add_err;
+			for (j = 0; j < ret; j++) {
+				if (result[j].status == RTE_FLOW_OP_ERROR)
+					goto policy_add_err;
+			}
+			nb_flows -= ret;
+		}
+	}
+	rte_spinlock_unlock(&priv->hw_ctrl_lock);
+	return 0;
+policy_add_err:
+	rte_spinlock_unlock(&priv->hw_ctrl_lock);
+	ret = mlx5_flow_meter_policy_hws_delete(dev, policy_id, error);
+	memset(mtr_policy, 0, sizeof(struct mlx5_flow_meter_policy));
+	if (ret)
+		return ret;
+	return -rte_mtr_error_set(error, ENOTSUP,
+				  RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Failed to create meter policy.");
+}
+
 /**
  * Check meter validation.
  *
@@ -1087,7 +1565,8 @@ mlx5_flow_meter_action_modify(struct mlx5_priv *priv,
 	if (priv->sh->meter_aso_en) {
 		fm->is_enable = !!is_enable;
 		aso_mtr = container_of(fm, struct mlx5_aso_mtr, fm);
-		ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr);
+		ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr,
+						   &priv->mtr_bulk);
 		if (ret)
 			return ret;
 		ret = mlx5_aso_mtr_wait(priv->sh, aso_mtr);
@@ -1336,7 +1815,8 @@ mlx5_flow_meter_create(struct rte_eth_dev *dev, uint32_t meter_id,
 	/* If ASO meter supported, update ASO flow meter by wqe. */
 	if (priv->sh->meter_aso_en) {
 		aso_mtr = container_of(fm, struct mlx5_aso_mtr, fm);
-		ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr);
+		ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr,
+						   &priv->mtr_bulk);
 		if (ret)
 			goto error;
 		if (!priv->mtr_idx_tbl) {
@@ -1369,6 +1849,90 @@ mlx5_flow_meter_create(struct rte_eth_dev *dev, uint32_t meter_id,
 		NULL, "Failed to create devx meter.");
 }
 
+/**
+ * Create meter rules.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] meter_id
+ *   Meter id.
+ * @param[in] params
+ *   Pointer to rte meter parameters.
+ * @param[in] shared
+ *   Meter shared with other flow or not.
+ * @param[out] error
+ *   Pointer to rte meter error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_hws_create(struct rte_eth_dev *dev, uint32_t meter_id,
+		       struct rte_mtr_params *params, int shared,
+		       struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_meter_profile *profile;
+	struct mlx5_flow_meter_info *fm;
+	struct mlx5_flow_meter_policy *policy = NULL;
+	struct mlx5_aso_mtr *aso_mtr;
+	int ret;
+
+	if (!priv->mtr_profile_arr ||
+	    !priv->mtr_policy_arr ||
+	    !priv->mtr_bulk.aso)
+		return -rte_mtr_error_set(error, ENOTSUP,
+			RTE_MTR_ERROR_TYPE_UNSPECIFIED, NULL,
+			"Meter bulk array is not allocated.");
+	/* Meter profile must exist. */
+	profile = mlx5_flow_meter_profile_find(priv, params->meter_profile_id);
+	if (!profile->initialized)
+		return -rte_mtr_error_set(error, ENOENT,
+			RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+			NULL, "Meter profile id not valid.");
+	/* Meter policy must exist. */
+	policy = mlx5_flow_meter_policy_find(dev,
+			params->meter_policy_id, NULL);
+	if (!policy->initialized)
+		return -rte_mtr_error_set(error, ENOENT,
+			RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+			NULL, "Meter policy id not valid.");
+	/* Meter ID must be valid. */
+	if (meter_id >= priv->mtr_config.nb_meters)
+		return -rte_mtr_error_set(error, EINVAL,
+			RTE_MTR_ERROR_TYPE_MTR_ID,
+			NULL, "Meter id not valid.");
+	/* Find ASO object. */
+	aso_mtr = mlx5_aso_meter_by_idx(priv, meter_id);
+	fm = &aso_mtr->fm;
+	if (fm->initialized)
+		return -rte_mtr_error_set(error, EEXIST,
+					  RTE_MTR_ERROR_TYPE_MTR_ID,
+					  NULL, "Meter object already exists.");
+	/* Fill the flow meter parameters. */
+	fm->meter_id = meter_id;
+	fm->policy_id = params->meter_policy_id;
+	fm->profile = profile;
+	fm->meter_offset = meter_id;
+	fm->group = policy->group;
+	/* Add to the flow meter list. */
+	fm->active_state = 1; /* Config meter starts as active. */
+	fm->is_enable = params->meter_enable;
+	fm->shared = !!shared;
+	fm->initialized = 1;
+	/* Update ASO flow meter by wqe. */
+	ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr,
+					   &priv->mtr_bulk);
+	if (ret)
+		return -rte_mtr_error_set(error, ENOTSUP,
+			RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+			NULL, "Failed to create devx meter.");
+	fm->active_state = params->meter_enable;
+	__atomic_add_fetch(&fm->profile->ref_cnt, 1, __ATOMIC_RELAXED);
+	__atomic_add_fetch(&policy->ref_cnt, 1, __ATOMIC_RELAXED);
+	return 0;
+}
+
 static int
 mlx5_flow_meter_params_flush(struct rte_eth_dev *dev,
 			struct mlx5_flow_meter_info *fm,
@@ -1475,6 +2039,58 @@ mlx5_flow_meter_destroy(struct rte_eth_dev *dev, uint32_t meter_id,
 	return 0;
 }
 
+/**
+ * Destroy meter rules.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] meter_id
+ *   Meter id.
+ * @param[out] error
+ *   Pointer to rte meter error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_hws_destroy(struct rte_eth_dev *dev, uint32_t meter_id,
+			struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr *aso_mtr;
+	struct mlx5_flow_meter_info *fm;
+	struct mlx5_flow_meter_policy *policy;
+
+	if (!priv->mtr_profile_arr ||
+	    !priv->mtr_policy_arr ||
+	    !priv->mtr_bulk.aso)
+		return -rte_mtr_error_set(error, ENOTSUP,
+			RTE_MTR_ERROR_TYPE_METER_POLICY, NULL,
+			"Meter bulk array is not allocated.");
+	/* Find ASO object. */
+	aso_mtr = mlx5_aso_meter_by_idx(priv, meter_id);
+	fm = &aso_mtr->fm;
+	if (!fm->initialized)
+		return -rte_mtr_error_set(error, ENOENT,
+					  RTE_MTR_ERROR_TYPE_MTR_ID,
+					  NULL, "Meter object id not valid.");
+	/* Meter object must not have any owner. */
+	if (fm->ref_cnt > 0)
+		return -rte_mtr_error_set(error, EBUSY,
+					  RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "Meter object is being used.");
+	/* Release the meter profile reference. */
+	__atomic_sub_fetch(&fm->profile->ref_cnt,
+			   1, __ATOMIC_RELAXED);
+	/* Release the meter policy reference. */
+	policy = mlx5_flow_meter_policy_find(dev,
+			fm->policy_id, NULL);
+	__atomic_sub_fetch(&policy->ref_cnt,
+			   1, __ATOMIC_RELAXED);
+	memset(fm, 0, sizeof(struct mlx5_flow_meter_info));
+	return 0;
+}
+
 /**
  * Modify meter state.
  *
@@ -1798,6 +2414,23 @@ static const struct rte_mtr_ops mlx5_flow_mtr_ops = {
 	.stats_read = mlx5_flow_meter_stats_read,
 };
 
+static const struct rte_mtr_ops mlx5_flow_mtr_hws_ops = {
+	.capabilities_get = mlx5_flow_mtr_cap_get,
+	.meter_profile_add = mlx5_flow_meter_profile_hws_add,
+	.meter_profile_delete = mlx5_flow_meter_profile_hws_delete,
+	.meter_policy_validate = mlx5_flow_meter_policy_hws_validate,
+	.meter_policy_add = mlx5_flow_meter_policy_hws_add,
+	.meter_policy_delete = mlx5_flow_meter_policy_hws_delete,
+	.create = mlx5_flow_meter_hws_create,
+	.destroy = mlx5_flow_meter_hws_destroy,
+	.meter_enable = mlx5_flow_meter_enable,
+	.meter_disable = mlx5_flow_meter_disable,
+	.meter_profile_update = mlx5_flow_meter_profile_update,
+	.meter_dscp_table_update = NULL,
+	.stats_update = NULL,
+	.stats_read = NULL,
+};
+
 /**
  * Get meter operations.
  *
@@ -1812,7 +2445,12 @@ static const struct rte_mtr_ops mlx5_flow_mtr_ops = {
 int
 mlx5_flow_meter_ops_get(struct rte_eth_dev *dev __rte_unused, void *arg)
 {
-	*(const struct rte_mtr_ops **)arg = &mlx5_flow_mtr_ops;
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (priv->sh->config.dv_flow_en == 2)
+		*(const struct rte_mtr_ops **)arg = &mlx5_flow_mtr_hws_ops;
+	else
+		*(const struct rte_mtr_ops **)arg = &mlx5_flow_mtr_ops;
 	return 0;
 }
 
@@ -1841,6 +2479,12 @@ mlx5_flow_meter_find(struct mlx5_priv *priv, uint32_t meter_id,
 	union mlx5_l3t_data data;
 	uint16_t n_valid;
 
+	if (priv->mtr_bulk.aso) {
+		if (mtr_idx)
+			*mtr_idx = meter_id;
+		aso_mtr = priv->mtr_bulk.aso + meter_id;
+		return &aso_mtr->fm;
+	}
 	if (priv->sh->meter_aso_en) {
 		rte_rwlock_read_lock(&pools_mng->resize_mtrwl);
 		n_valid = pools_mng->n_valid;
@@ -2185,6 +2829,7 @@ mlx5_flow_meter_flush(struct rte_eth_dev *dev, struct rte_mtr_error *error)
 	struct mlx5_flow_meter_profile *fmp;
 	struct mlx5_legacy_flow_meter *legacy_fm;
 	struct mlx5_flow_meter_info *fm;
+	struct mlx5_flow_meter_policy *policy;
 	struct mlx5_flow_meter_sub_policy *sub_policy;
 	void *tmp;
 	uint32_t i, mtr_idx, policy_idx;
@@ -2219,6 +2864,14 @@ mlx5_flow_meter_flush(struct rte_eth_dev *dev, struct rte_mtr_error *error)
 				NULL, "MTR object meter profile invalid.");
 		}
 	}
+	if (priv->mtr_bulk.aso) {
+		for (i = 1; i <= priv->mtr_config.nb_meters; i++) {
+			aso_mtr = mlx5_aso_meter_by_idx(priv, i);
+			fm = &aso_mtr->fm;
+			if (fm->initialized)
+				mlx5_flow_meter_hws_destroy(dev, i, error);
+		}
+	}
 	if (priv->policy_idx_tbl) {
 		MLX5_L3T_FOREACH(priv->policy_idx_tbl, i, entry) {
 			policy_idx = *(uint32_t *)entry;
@@ -2244,6 +2897,15 @@ mlx5_flow_meter_flush(struct rte_eth_dev *dev, struct rte_mtr_error *error)
 		mlx5_l3t_destroy(priv->policy_idx_tbl);
 		priv->policy_idx_tbl = NULL;
 	}
+	if (priv->mtr_policy_arr) {
+		for (i = 0; i < priv->mtr_config.nb_meter_policies; i++) {
+			policy = mlx5_flow_meter_policy_find(dev, i,
+							     &policy_idx);
+			if (policy->initialized)
+				mlx5_flow_meter_policy_hws_delete(dev, i,
+								  error);
+		}
+	}
 	if (priv->mtr_profile_tbl) {
 		MLX5_L3T_FOREACH(priv->mtr_profile_tbl, i, entry) {
 			fmp = entry;
@@ -2257,9 +2919,21 @@ mlx5_flow_meter_flush(struct rte_eth_dev *dev, struct rte_mtr_error *error)
 		mlx5_l3t_destroy(priv->mtr_profile_tbl);
 		priv->mtr_profile_tbl = NULL;
 	}
+	if (priv->mtr_profile_arr) {
+		for (i = 0; i < priv->mtr_config.nb_meter_profiles; i++) {
+			fmp = mlx5_flow_meter_profile_find(priv, i);
+			if (fmp->initialized)
+				mlx5_flow_meter_profile_hws_delete(dev, i,
+								   error);
+		}
+	}
 	/* Delete default policy table. */
 	mlx5_flow_destroy_def_policy(dev);
 	if (priv->sh->refcnt == 1)
 		mlx5_flow_destroy_mtr_drop_tbls(dev);
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+	/* Destroy HWS configuration. */
+	mlx5_flow_meter_uninit(dev);
+#endif
 	return 0;
 }
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v2 08/17] net/mlx5: add HW steering counter action
  2022-09-28  3:31 ` [PATCH v2 00/17] net/mlx5: HW steering PMD update Suanming Mou
                     ` (6 preceding siblings ...)
  2022-09-28  3:31   ` [PATCH v2 07/17] net/mlx5: add HW steering meter action Suanming Mou
@ 2022-09-28  3:31   ` Suanming Mou
  2022-09-28  3:31   ` [PATCH v2 09/17] net/mlx5: support DR action template API Suanming Mou
                     ` (8 subsequent siblings)
  16 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-28  3:31 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko, Ray Kinsella
  Cc: dev, rasland, orika, Xiaoyu Min

From: Xiaoyu Min <jackmin@nvidia.com>

This commit adds HW steering counter action support.
A pool mechanism is the basic data structure for the HW steering counters.

The HW steering counter pool is based on the zero-copy variant of
rte_ring.

There are two global rte_rings:
1. free_list:
     Stores the counter indexes that are ready for use.
2. wait_reset_list:
     Stores the counter indexes that have just been freed by the user;
     the hardware counter must be queried to read back the reset value
     before such a counter can be reused.

The counter pool also supports a per-queue cache for the HW steering
queues, also based on the zero-copy rte_ring variant.

The cache size, preload, threshold, and fetch size are configurable and
all of them are exposed via device arguments.

The main operations of the counter pool are as follows:

 - Get one counter from the pool:
   1. The user calls the _get_* API.
   2. If the cache is enabled, dequeue one counter index from the local
      cache:
      2.A: If the counter dequeued from the local cache is still in
	reset status (its query_gen_when_free is equal to the pool's
	query gen):
	I. Flush all counters in local cache back to global
	   wait_reset_list.
	II. Fetch _fetch_sz_ counters into the cache from the global
	    free list.
	III. Fetch one counter from the cache.
   3. If the cache is empty, fetch _fetch_sz_ counters from the global
      free list into the cache and fetch one counter from the cache.
 - Free one counter into the pool:
   1. The user calls _put_* API.
   2. Put the counter into the local cache.
   3. If the local cache is full:
      3.A: Write back all counters above _threshold_ into the global
           wait_reset_list.
      3.B: Also, write back this counter into the global wait_reset_list.

When the local cache is disabled, _get_/_put_ operate directly on the
global lists.
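
For illustration only, below is a minimal C sketch of the get/put paths
described above. All names (cnt_pool, cnt_cache, cnt_id_t and their
fields) are made up for this sketch rather than taken from the driver,
the plain rte_ring element API is used instead of the zero-copy variant
the PMD relies on, and the background query/reset cycle is omitted.

/* Illustrative sketch only; names do not match the driver. */
#include <errno.h>
#include <stdint.h>
#include <rte_common.h>
#include <rte_ring.h>

typedef uint32_t cnt_id_t;

struct cnt_pool {
	struct rte_ring *free_list;       /* Indexes ready for use. */
	struct rte_ring *wait_reset_list; /* Freed, reset value not read yet. */
	uint32_t query_gen;               /* Advanced after each query cycle. */
	uint32_t *query_gen_when_free;    /* Per counter, set on put(). */
};

struct cnt_cache { /* One instance per HW steering queue. */
	struct rte_ring *ring;
	uint32_t fetch_sz;
	uint32_t threshold;
};

/* Refill the local cache from the global free list and take one index. */
static inline int
cnt_refill_and_take(struct cnt_pool *p, struct cnt_cache *c, cnt_id_t *id)
{
	cnt_id_t buf[64];
	unsigned int n;

	n = rte_ring_dequeue_burst_elem(p->free_list, buf, sizeof(buf[0]),
					RTE_MIN(c->fetch_sz, RTE_DIM(buf)),
					NULL);
	if (n == 0)
		return -ENOENT;
	/* Keep one index for the caller, cache the rest. */
	rte_ring_enqueue_burst_elem(c->ring, &buf[1], sizeof(buf[0]),
				    n - 1, NULL);
	*id = buf[0];
	return 0;
}

/* Get one counter index (steps 2/3 of the "get" flow above). */
static inline int
cnt_get(struct cnt_pool *p, struct cnt_cache *c, cnt_id_t *id)
{
	cnt_id_t stale;

	if (rte_ring_dequeue_elem(c->ring, id, sizeof(*id)) == 0) {
		/* 2.A: freed in the current query generation, i.e. its
		 * reset value has not been read back yet.
		 */
		if (p->query_gen_when_free[*id] != p->query_gen)
			return 0;
		/* I. Flush the local cache back to wait_reset_list. */
		while (rte_ring_dequeue_elem(c->ring, &stale,
					     sizeof(stale)) == 0)
			rte_ring_enqueue_elem(p->wait_reset_list, &stale,
					      sizeof(stale));
		rte_ring_enqueue_elem(p->wait_reset_list, id, sizeof(*id));
		/* II + III. Refill from the free list and take one. */
		return cnt_refill_and_take(p, c, id);
	}
	/* 3. Cache empty: refill from the free list and take one. */
	return cnt_refill_and_take(p, c, id);
}

/* Put one counter index back (steps 2/3 of the "put" flow above). */
static inline void
cnt_put(struct cnt_pool *p, struct cnt_cache *c, cnt_id_t id)
{
	cnt_id_t spill;

	p->query_gen_when_free[id] = p->query_gen;
	if (rte_ring_enqueue_elem(c->ring, &id, sizeof(id)) == 0)
		return;
	/* 3.A: cache full, spill the entries above the threshold. */
	while (rte_ring_count(c->ring) > c->threshold &&
	       rte_ring_dequeue_elem(c->ring, &spill, sizeof(spill)) == 0)
		rte_ring_enqueue_elem(p->wait_reset_list, &spill,
				      sizeof(spill));
	/* 3.B: the freed counter itself also waits for the reset query. */
	rte_ring_enqueue_elem(p->wait_reset_list, &id, sizeof(id));
}

The actual pool, cache and query-cycle handling added by this patch live
in mlx5_hws_cnt.c and mlx5_hws_cnt.h (see the diffstat below).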

Signed-off-by: Xiaoyu Min <jackmin@nvidia.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c |  50 +++
 drivers/common/mlx5/mlx5_devx_cmds.h |  27 ++
 drivers/common/mlx5/mlx5_prm.h       |  62 ++-
 drivers/common/mlx5/version.map      |   1 +
 drivers/net/mlx5/meson.build         |   1 +
 drivers/net/mlx5/mlx5.c              |  14 +
 drivers/net/mlx5/mlx5.h              |  27 ++
 drivers/net/mlx5/mlx5_defs.h         |   2 +
 drivers/net/mlx5/mlx5_flow.c         |  27 +-
 drivers/net/mlx5/mlx5_flow.h         |   5 +
 drivers/net/mlx5/mlx5_flow_aso.c     | 261 ++++++++++++-
 drivers/net/mlx5/mlx5_flow_hw.c      | 340 +++++++++++++++-
 drivers/net/mlx5/mlx5_hws_cnt.c      | 528 +++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_hws_cnt.h      | 558 +++++++++++++++++++++++++++
 14 files changed, 1871 insertions(+), 32 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_hws_cnt.c
 create mode 100644 drivers/net/mlx5/mlx5_hws_cnt.h

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index ac6891145d..eef7a98248 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -176,6 +176,41 @@ mlx5_devx_cmd_register_write(void *ctx, uint16_t reg_id, uint32_t arg,
 	return 0;
 }
 
+struct mlx5_devx_obj *
+mlx5_devx_cmd_flow_counter_alloc_general(void *ctx,
+		struct mlx5_devx_counter_attr *attr)
+{
+	struct mlx5_devx_obj *dcs = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*dcs),
+						0, SOCKET_ID_ANY);
+	uint32_t in[MLX5_ST_SZ_DW(alloc_flow_counter_in)]   = {0};
+	uint32_t out[MLX5_ST_SZ_DW(alloc_flow_counter_out)] = {0};
+
+	if (!dcs) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(alloc_flow_counter_in, in, opcode,
+		 MLX5_CMD_OP_ALLOC_FLOW_COUNTER);
+	if (attr->bulk_log_max_alloc)
+		MLX5_SET(alloc_flow_counter_in, in, flow_counter_bulk_log_size,
+			 attr->flow_counter_bulk_log_size);
+	else
+		MLX5_SET(alloc_flow_counter_in, in, flow_counter_bulk,
+			 attr->bulk_n_128);
+	if (attr->pd_valid)
+		MLX5_SET(alloc_flow_counter_in, in, pd, attr->pd);
+	dcs->obj = mlx5_glue->devx_obj_create(ctx, in,
+					      sizeof(in), out, sizeof(out));
+	if (!dcs->obj) {
+		DRV_LOG(ERR, "Can't allocate counters - error %d", errno);
+		rte_errno = errno;
+		mlx5_free(dcs);
+		return NULL;
+	}
+	dcs->id = MLX5_GET(alloc_flow_counter_out, out, flow_counter_id);
+	return dcs;
+}
+
 /**
  * Allocate flow counters via devx interface.
  *
@@ -967,6 +1002,16 @@ mlx5_devx_cmd_query_hca_attr(void *ctx,
 					 general_obj_types) &
 			      MLX5_GENERAL_OBJ_TYPES_CAP_CONN_TRACK_OFFLOAD);
 	attr->rq_delay_drop = MLX5_GET(cmd_hca_cap, hcattr, rq_delay_drop);
+	attr->max_flow_counter_15_0 = MLX5_GET(cmd_hca_cap, hcattr,
+			max_flow_counter_15_0);
+	attr->max_flow_counter_31_16 = MLX5_GET(cmd_hca_cap, hcattr,
+			max_flow_counter_31_16);
+	attr->alloc_flow_counter_pd = MLX5_GET(cmd_hca_cap, hcattr,
+			alloc_flow_counter_pd);
+	attr->flow_counter_access_aso = MLX5_GET(cmd_hca_cap, hcattr,
+			flow_counter_access_aso);
+	attr->flow_access_aso_opc_mod = MLX5_GET(cmd_hca_cap, hcattr,
+			flow_access_aso_opc_mod);
 	if (attr->crypto) {
 		attr->aes_xts = MLX5_GET(cmd_hca_cap, hcattr, aes_xts);
 		hcattr = mlx5_devx_get_hca_cap(ctx, in, out, &rc,
@@ -989,6 +1034,11 @@ mlx5_devx_cmd_query_hca_attr(void *ctx,
 		}
 		attr->log_min_stride_wqe_sz = MLX5_GET(cmd_hca_cap_2, hcattr,
 						       log_min_stride_wqe_sz);
+		attr->flow_counter_bulk_log_max_alloc = MLX5_GET(cmd_hca_cap_2,
+				hcattr, flow_counter_bulk_log_max_alloc);
+		attr->flow_counter_bulk_log_granularity =
+			MLX5_GET(cmd_hca_cap_2, hcattr,
+				 flow_counter_bulk_log_granularity);
 	}
 	if (attr->log_min_stride_wqe_sz == 0)
 		attr->log_min_stride_wqe_sz = MLX5_MPRQ_LOG_MIN_STRIDE_WQE_SIZE;
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index d69dad613e..15b46f2acd 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -15,6 +15,16 @@
 #define MLX5_DEVX_MAX_KLM_ENTRIES ((UINT16_MAX - \
 		MLX5_ST_SZ_DW(create_mkey_in) * 4) / (MLX5_ST_SZ_DW(klm) * 4))
 
+struct mlx5_devx_counter_attr {
+	uint32_t pd_valid:1;
+	uint32_t pd:24;
+	uint32_t bulk_log_max_alloc:1;
+	union {
+		uint8_t flow_counter_bulk_log_size;
+		uint8_t bulk_n_128;
+	};
+};
+
 struct mlx5_devx_mkey_attr {
 	uint64_t addr;
 	uint64_t size;
@@ -263,6 +273,18 @@ struct mlx5_hca_attr {
 	uint32_t set_reg_c:8;
 	uint32_t nic_flow_table:1;
 	uint32_t modify_outer_ip_ecn:1;
+	union {
+		uint32_t max_flow_counter;
+		struct {
+			uint16_t max_flow_counter_15_0;
+			uint16_t max_flow_counter_31_16;
+		};
+	};
+	uint32_t flow_counter_bulk_log_max_alloc:5;
+	uint32_t flow_counter_bulk_log_granularity:5;
+	uint32_t alloc_flow_counter_pd:1;
+	uint32_t flow_counter_access_aso:1;
+	uint32_t flow_access_aso_opc_mod:8;
 };
 
 /* LAG Context. */
@@ -593,6 +615,11 @@ struct mlx5_devx_crypto_login_attr {
 
 /* mlx5_devx_cmds.c */
 
+__rte_internal
+struct mlx5_devx_obj *
+mlx5_devx_cmd_flow_counter_alloc_general(void *ctx,
+				struct mlx5_devx_counter_attr *attr);
+
 __rte_internal
 struct mlx5_devx_obj *mlx5_devx_cmd_flow_counter_alloc(void *ctx,
 						       uint32_t bulk_sz);
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index c82ec94465..8514ca8fc4 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -1161,8 +1161,10 @@ struct mlx5_ifc_alloc_flow_counter_in_bits {
 	u8 reserved_at_10[0x10];
 	u8 reserved_at_20[0x10];
 	u8 op_mod[0x10];
-	u8 flow_counter_id[0x20];
-	u8 reserved_at_40[0x18];
+	u8 reserved_at_40[0x8];
+	u8 pd[0x18];
+	u8 reserved_at_60[0x13];
+	u8 flow_counter_bulk_log_size[0x5];
 	u8 flow_counter_bulk[0x8];
 };
 
@@ -1382,7 +1384,13 @@ enum {
 #define MLX5_STEERING_LOGIC_FORMAT_CONNECTX_6DX 0x1
 
 struct mlx5_ifc_cmd_hca_cap_bits {
-	u8 reserved_at_0[0x20];
+	u8 access_other_hca_roce[0x1];
+	u8 alloc_flow_counter_pd[0x1];
+	u8 flow_counter_access_aso[0x1];
+	u8 reserved_at_3[0x5];
+	u8 flow_access_aso_opc_mod[0x8];
+	u8 reserved_at_10[0xf];
+	u8 vhca_resource_manager[0x1];
 	u8 hca_cap_2[0x1];
 	u8 reserved_at_21[0xf];
 	u8 vhca_id[0x10];
@@ -2058,8 +2066,52 @@ struct mlx5_ifc_cmd_hca_cap_2_bits {
 	u8 log_conn_track_max_alloc[0x5];
 	u8 reserved_at_d8[0x3];
 	u8 log_max_conn_track_offload[0x5];
-	u8 reserved_at_e0[0x20]; /* End of DW7. */
-	u8 reserved_at_100[0x700];
+	u8 reserved_at_e0[0xc0];
+	u8 reserved_at_1a0[0xb];
+	u8 format_select_dw_8_6_ext[0x1];
+	u8 reserved_at_1ac[0x14];
+	u8 general_obj_types_127_64[0x40];
+	u8 reserved_at_200[0x53];
+	u8 flow_counter_bulk_log_max_alloc[0x5];
+	u8 reserved_at_258[0x3];
+	u8 flow_counter_bulk_log_granularity[0x5];
+	u8 reserved_at_260[0x20];
+	u8 format_select_dw_gtpu_dw_0[0x8];
+	u8 format_select_dw_gtpu_dw_1[0x8];
+	u8 format_select_dw_gtpu_dw_2[0x8];
+	u8 format_select_dw_gtpu_first_ext_dw_0[0x8];
+	u8 reserved_at_2a0[0x560];
+};
+
+struct mlx5_ifc_wqe_based_flow_table_cap_bits {
+	u8 reserved_at_0[0x3];
+	u8 log_max_num_ste[0x5];
+	u8 reserved_at_8[0x3];
+	u8 log_max_num_stc[0x5];
+	u8 reserved_at_10[0x3];
+	u8 log_max_num_rtc[0x5];
+	u8 reserved_at_18[0x3];
+	u8 log_max_num_header_modify_pattern[0x5];
+	u8 reserved_at_20[0x3];
+	u8 stc_alloc_log_granularity[0x5];
+	u8 reserved_at_28[0x3];
+	u8 stc_alloc_log_max[0x5];
+	u8 reserved_at_30[0x3];
+	u8 ste_alloc_log_granularity[0x5];
+	u8 reserved_at_38[0x3];
+	u8 ste_alloc_log_max[0x5];
+	u8 reserved_at_40[0xb];
+	u8 rtc_reparse_mode[0x5];
+	u8 reserved_at_50[0x3];
+	u8 rtc_index_mode[0x5];
+	u8 reserved_at_58[0x3];
+	u8 rtc_log_depth_max[0x5];
+	u8 reserved_at_60[0x10];
+	u8 ste_format[0x10];
+	u8 stc_action_type[0x80];
+	u8 header_insert_type[0x10];
+	u8 header_remove_type[0x10];
+	u8 trivial_match_definer[0x20];
 };
 
 struct mlx5_ifc_esw_cap_bits {
diff --git a/drivers/common/mlx5/version.map b/drivers/common/mlx5/version.map
index 413dec14ab..4f72900519 100644
--- a/drivers/common/mlx5/version.map
+++ b/drivers/common/mlx5/version.map
@@ -40,6 +40,7 @@ INTERNAL {
 	mlx5_devx_cmd_create_virtq;
 	mlx5_devx_cmd_destroy;
 	mlx5_devx_cmd_flow_counter_alloc;
+	mlx5_devx_cmd_flow_counter_alloc_general;
 	mlx5_devx_cmd_flow_counter_query;
 	mlx5_devx_cmd_flow_dump;
 	mlx5_devx_cmd_flow_single_dump;
diff --git a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build
index 6a84d96380..f2d7bcaff6 100644
--- a/drivers/net/mlx5/meson.build
+++ b/drivers/net/mlx5/meson.build
@@ -38,6 +38,7 @@ sources = files(
         'mlx5_vlan.c',
         'mlx5_utils.c',
         'mlx5_devx.c',
+	'mlx5_hws_cnt.c',
 )
 
 if is_linux
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index cf5146d677..b6a66f12ee 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -175,6 +175,12 @@
 /* Device parameter to create the fdb default rule in PMD */
 #define MLX5_FDB_DEFAULT_RULE_EN "fdb_def_rule_en"
 
+/* HW steering counter configuration. */
+#define MLX5_HWS_CNT_SERVICE_CORE "service_core"
+
+/* HW steering counter's query interval. */
+#define MLX5_HWS_CNT_CYCLE_TIME "svc_cycle_time"
+
 /* Shared memory between primary and secondary processes. */
 struct mlx5_shared_data *mlx5_shared_data;
 
@@ -1245,6 +1251,10 @@ mlx5_dev_args_check_handler(const char *key, const char *val, void *opaque)
 		config->allow_duplicate_pattern = !!tmp;
 	} else if (strcmp(MLX5_FDB_DEFAULT_RULE_EN, key) == 0) {
 		config->fdb_def_rule = !!tmp;
+	} else if (strcmp(MLX5_HWS_CNT_SERVICE_CORE, key) == 0) {
+		config->cnt_svc.service_core = tmp;
+	} else if (strcmp(MLX5_HWS_CNT_CYCLE_TIME, key) == 0) {
+		config->cnt_svc.cycle_time = tmp;
 	}
 	return 0;
 }
@@ -1281,6 +1291,8 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 		MLX5_DECAP_EN,
 		MLX5_ALLOW_DUPLICATE_PATTERN,
 		MLX5_FDB_DEFAULT_RULE_EN,
+		MLX5_HWS_CNT_SERVICE_CORE,
+		MLX5_HWS_CNT_CYCLE_TIME,
 		NULL,
 	};
 	int ret = 0;
@@ -1293,6 +1305,8 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 	config->decap_en = 1;
 	config->allow_duplicate_pattern = 1;
 	config->fdb_def_rule = 1;
+	config->cnt_svc.cycle_time = MLX5_CNT_SVC_CYCLE_TIME_DEFAULT;
+	config->cnt_svc.service_core = rte_get_main_lcore();
 	if (mkvlist != NULL) {
 		/* Process parameters. */
 		ret = mlx5_kvargs_process(mkvlist, params,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 9fbb6ee2b0..8875b96faf 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -308,6 +308,10 @@ struct mlx5_sh_config {
 	uint32_t hw_fcs_strip:1; /* FCS stripping is supported. */
 	uint32_t allow_duplicate_pattern:1;
 	uint32_t lro_allowed:1; /* Whether LRO is allowed. */
+	struct {
+		uint16_t service_core;
+		uint32_t cycle_time; /* Query cycle time in milliseconds. */
+	} cnt_svc; /* Configuration of the HW steering counter service. */
 	/* Allow/Prevent the duplicate rules pattern. */
 	uint32_t fdb_def_rule:1; /* Create FDB default jump rule */
 };
@@ -1224,6 +1228,22 @@ struct mlx5_flex_item {
 	struct mlx5_flex_pattern_field map[MLX5_FLEX_ITEM_MAPPING_NUM];
 };
 
+#define HWS_CNT_ASO_SQ_NUM 4
+
+struct mlx5_hws_aso_mng {
+	uint16_t sq_num;
+	struct mlx5_aso_sq sqs[HWS_CNT_ASO_SQ_NUM];
+};
+
+struct mlx5_hws_cnt_svc_mng {
+	uint32_t refcnt;
+	uint32_t service_core;
+	uint32_t query_interval;
+	pthread_t service_thread;
+	uint8_t svc_running;
+	struct mlx5_hws_aso_mng aso_mng __rte_cache_aligned;
+};
+
 /*
  * Shared Infiniband device context for Master/Representors
  * which belong to same IB device with multiple IB ports.
@@ -1323,6 +1343,7 @@ struct mlx5_dev_ctx_shared {
 	pthread_mutex_t lwm_config_lock;
 	uint32_t host_shaper_rate:8;
 	uint32_t lwm_triggered:1;
+	struct mlx5_hws_cnt_svc_mng *cnt_svc;
 	struct mlx5_dev_shared_port port[]; /* per device port data array. */
 };
 
@@ -1607,6 +1628,7 @@ struct mlx5_priv {
 	/* Flex items have been created on the port. */
 	uint32_t flex_item_map; /* Map of allocated flex item elements. */
 	uint32_t nb_queue; /* HW steering queue number. */
+	struct mlx5_hws_cnt_pool *hws_cpool; /* HW steering's counter pool. */
 #if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
 	/* Item template list. */
 	LIST_HEAD(flow_hw_itt, rte_flow_pattern_template) flow_hw_itt;
@@ -2037,6 +2059,11 @@ mlx5_get_supported_sw_parsing_offloads(const struct mlx5_hca_attr *attr);
 uint32_t
 mlx5_get_supported_tunneling_offloads(const struct mlx5_hca_attr *attr);
 
+int mlx5_aso_cnt_queue_init(struct mlx5_dev_ctx_shared *sh);
+void mlx5_aso_cnt_queue_uninit(struct mlx5_dev_ctx_shared *sh);
+int mlx5_aso_cnt_query(struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_hws_cnt_pool *cpool);
+
 /* mlx5_flow_flex.c */
 
 struct rte_flow_item_flex_handle *
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index 585afb0a98..d064abfef3 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -188,4 +188,6 @@
 #define static_assert _Static_assert
 #endif
 
+#define MLX5_CNT_SVC_CYCLE_TIME_DEFAULT 500
+
 #endif /* RTE_PMD_MLX5_DEFS_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index fb3be940e5..658cc69750 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -7832,24 +7832,33 @@ mlx5_flow_isolate(struct rte_eth_dev *dev,
  */
 static int
 flow_drv_query(struct rte_eth_dev *dev,
-	       uint32_t flow_idx,
+	       struct rte_flow *eflow,
 	       const struct rte_flow_action *actions,
 	       void *data,
 	       struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	const struct mlx5_flow_driver_ops *fops;
-	struct rte_flow *flow = mlx5_ipool_get(priv->flows[MLX5_FLOW_TYPE_GEN],
-					       flow_idx);
-	enum mlx5_flow_drv_type ftype;
+	struct rte_flow *flow = NULL;
+	enum mlx5_flow_drv_type ftype = MLX5_FLOW_TYPE_MIN;
 
+	if (priv->sh->config.dv_flow_en == 2) {
+#if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
+		flow = eflow;
+		ftype = MLX5_FLOW_TYPE_HW;
+#endif
+	} else {
+		flow = (struct rte_flow *)mlx5_ipool_get(priv->flows[MLX5_FLOW_TYPE_GEN],
+				(uintptr_t)(void *)eflow);
+	}
 	if (!flow) {
 		return rte_flow_error_set(error, ENOENT,
 			  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 			  NULL,
 			  "invalid flow handle");
 	}
-	ftype = flow->drv_type;
+	if (ftype == MLX5_FLOW_TYPE_MIN)
+		ftype = flow->drv_type;
 	MLX5_ASSERT(ftype > MLX5_FLOW_TYPE_MIN && ftype < MLX5_FLOW_TYPE_MAX);
 	fops = flow_get_drv_ops(ftype);
 
@@ -7870,14 +7879,8 @@ mlx5_flow_query(struct rte_eth_dev *dev,
 		struct rte_flow_error *error)
 {
 	int ret;
-	struct mlx5_priv *priv = dev->data->dev_private;
 
-	if (priv->sh->config.dv_flow_en == 2)
-		return rte_flow_error_set(error, ENOTSUP,
-			  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-			  NULL,
-			  "Flow non-Q query not supported");
-	ret = flow_drv_query(dev, (uintptr_t)(void *)flow, actions, data,
+	ret = flow_drv_query(dev, flow, actions, data,
 			     error);
 	if (ret < 0)
 		return ret;
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 3bde95c927..8f1b66eaac 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1103,6 +1103,7 @@ struct rte_flow_hw {
 	};
 	struct rte_flow_template_table *table; /* The table flow allcated from. */
 	struct mlx5dr_rule rule; /* HWS layer data struct. */
+	uint32_t cnt_id;
 } __rte_packed;
 
 /* rte flow action translate to DR action struct. */
@@ -1146,6 +1147,9 @@ struct mlx5_action_construct_data {
 			uint32_t level; /* RSS level. */
 			uint32_t idx; /* Shared action index. */
 		} shared_rss;
+		struct {
+			uint32_t id;
+		} shared_counter;
 	};
 };
 
@@ -1224,6 +1228,7 @@ struct mlx5_hw_actions {
 	uint16_t encap_decap_pos; /* Encap/Decap action position. */
 	uint32_t acts_num:4; /* Total action number. */
 	uint32_t mark:1; /* Indicate the mark action. */
+	uint32_t cnt_id; /* Counter id. */
 	/* Translated DR action array from action template. */
 	struct mlx5dr_rule_action rule_acts[MLX5_HW_MAX_ACTS];
 };
diff --git a/drivers/net/mlx5/mlx5_flow_aso.c b/drivers/net/mlx5/mlx5_flow_aso.c
index 60d0280367..ed9272e583 100644
--- a/drivers/net/mlx5/mlx5_flow_aso.c
+++ b/drivers/net/mlx5/mlx5_flow_aso.c
@@ -12,6 +12,9 @@
 
 #include "mlx5.h"
 #include "mlx5_flow.h"
+#include "mlx5_hws_cnt.h"
+
+#define MLX5_ASO_CNT_QUEUE_LOG_DESC 14
 
 /**
  * Free MR resources.
@@ -79,6 +82,33 @@ mlx5_aso_destroy_sq(struct mlx5_aso_sq *sq)
 	memset(sq, 0, sizeof(*sq));
 }
 
+/**
+ * Initialize Send Queue used for ASO access counter.
+ *
+ * @param[in] sq
+ *   ASO SQ to initialize.
+ */
+static void
+mlx5_aso_cnt_init_sq(struct mlx5_aso_sq *sq)
+{
+	volatile struct mlx5_aso_wqe *restrict wqe;
+	int i;
+	int size = 1 << sq->log_desc_n;
+
+	/* The state of all the following fields should stay constant. */
+	for (i = 0, wqe = &sq->sq_obj.aso_wqes[0]; i < size; ++i, ++wqe) {
+		wqe->general_cseg.sq_ds = rte_cpu_to_be_32((sq->sqn << 8) |
+							  (sizeof(*wqe) >> 4));
+		wqe->aso_cseg.operand_masks = rte_cpu_to_be_32
+			(0u |
+			 (ASO_OPER_LOGICAL_OR << ASO_CSEG_COND_OPER_OFFSET) |
+			 (ASO_OP_ALWAYS_FALSE << ASO_CSEG_COND_1_OPER_OFFSET) |
+			 (ASO_OP_ALWAYS_FALSE << ASO_CSEG_COND_0_OPER_OFFSET) |
+			 (BYTEWISE_64BYTE << ASO_CSEG_DATA_MASK_MODE_OFFSET));
+		wqe->aso_cseg.data_mask = RTE_BE64(UINT64_MAX);
+	}
+}
+
 /**
  * Initialize Send Queue used for ASO access.
  *
@@ -191,7 +221,7 @@ mlx5_aso_ct_init_sq(struct mlx5_aso_sq *sq)
  */
 static int
 mlx5_aso_sq_create(struct mlx5_common_device *cdev, struct mlx5_aso_sq *sq,
-		   void *uar)
+		   void *uar, uint16_t log_desc_n)
 {
 	struct mlx5_devx_cq_attr cq_attr = {
 		.uar_page_id = mlx5_os_get_devx_uar_page_id(uar),
@@ -212,12 +242,12 @@ mlx5_aso_sq_create(struct mlx5_common_device *cdev, struct mlx5_aso_sq *sq,
 	int ret;
 
 	if (mlx5_devx_cq_create(cdev->ctx, &sq->cq.cq_obj,
-				MLX5_ASO_QUEUE_LOG_DESC, &cq_attr,
+				log_desc_n, &cq_attr,
 				SOCKET_ID_ANY))
 		goto error;
 	sq->cq.cq_ci = 0;
-	sq->cq.log_desc_n = MLX5_ASO_QUEUE_LOG_DESC;
-	sq->log_desc_n = MLX5_ASO_QUEUE_LOG_DESC;
+	sq->cq.log_desc_n = log_desc_n;
+	sq->log_desc_n = log_desc_n;
 	sq_attr.cqn = sq->cq.cq_obj.cq->id;
 	/* for mlx5_aso_wqe that is twice the size of mlx5_wqe */
 	log_wqbb_n = sq->log_desc_n + 1;
@@ -269,7 +299,8 @@ mlx5_aso_queue_init(struct mlx5_dev_ctx_shared *sh,
 				    sq_desc_n, &sh->aso_age_mng->aso_sq.mr))
 			return -1;
 		if (mlx5_aso_sq_create(cdev, &sh->aso_age_mng->aso_sq,
-				       sh->tx_uar.obj)) {
+				       sh->tx_uar.obj,
+				       MLX5_ASO_QUEUE_LOG_DESC)) {
 			mlx5_aso_dereg_mr(cdev, &sh->aso_age_mng->aso_sq.mr);
 			return -1;
 		}
@@ -277,7 +308,7 @@ mlx5_aso_queue_init(struct mlx5_dev_ctx_shared *sh,
 		break;
 	case ASO_OPC_MOD_POLICER:
 		if (mlx5_aso_sq_create(cdev, &sh->mtrmng->pools_mng.sq,
-				       sh->tx_uar.obj))
+				       sh->tx_uar.obj, MLX5_ASO_QUEUE_LOG_DESC))
 			return -1;
 		mlx5_aso_mtr_init_sq(&sh->mtrmng->pools_mng.sq);
 		break;
@@ -287,7 +318,7 @@ mlx5_aso_queue_init(struct mlx5_dev_ctx_shared *sh,
 				    &sh->ct_mng->aso_sq.mr))
 			return -1;
 		if (mlx5_aso_sq_create(cdev, &sh->ct_mng->aso_sq,
-				       sh->tx_uar.obj)) {
+				       sh->tx_uar.obj, MLX5_ASO_QUEUE_LOG_DESC)) {
 			mlx5_aso_dereg_mr(cdev, &sh->ct_mng->aso_sq.mr);
 			return -1;
 		}
@@ -1403,3 +1434,219 @@ mlx5_aso_ct_available(struct mlx5_dev_ctx_shared *sh,
 	rte_errno = EBUSY;
 	return -rte_errno;
 }
+
+int
+mlx5_aso_cnt_queue_init(struct mlx5_dev_ctx_shared *sh)
+{
+	struct mlx5_hws_aso_mng *aso_mng = NULL;
+	uint8_t idx;
+	struct mlx5_aso_sq *sq;
+
+	MLX5_ASSERT(sh);
+	MLX5_ASSERT(sh->cnt_svc);
+	aso_mng = &sh->cnt_svc->aso_mng;
+	aso_mng->sq_num = HWS_CNT_ASO_SQ_NUM;
+	for (idx = 0; idx < HWS_CNT_ASO_SQ_NUM; idx++) {
+		sq = &aso_mng->sqs[idx];
+		if (mlx5_aso_sq_create(sh->cdev, sq, sh->tx_uar.obj,
+					MLX5_ASO_CNT_QUEUE_LOG_DESC))
+			goto error;
+		mlx5_aso_cnt_init_sq(sq);
+	}
+	return 0;
+error:
+	mlx5_aso_cnt_queue_uninit(sh);
+	return -1;
+}
+
+void
+mlx5_aso_cnt_queue_uninit(struct mlx5_dev_ctx_shared *sh)
+{
+	uint16_t idx;
+
+	for (idx = 0; idx < sh->cnt_svc->aso_mng.sq_num; idx++)
+		mlx5_aso_destroy_sq(&sh->cnt_svc->aso_mng.sqs[idx]);
+	sh->cnt_svc->aso_mng.sq_num = 0;
+}
+
+static uint16_t
+mlx5_aso_cnt_sq_enqueue_burst(struct mlx5_hws_cnt_pool *cpool,
+		struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_aso_sq *sq, uint32_t n,
+		uint32_t offset, uint32_t dcs_id_base)
+{
+	volatile struct mlx5_aso_wqe *wqe;
+	uint16_t size = 1 << sq->log_desc_n;
+	uint16_t mask = size - 1;
+	uint16_t max;
+	uint32_t upper_offset = offset;
+	uint64_t addr;
+	uint32_t ctrl_gen_id = 0;
+	uint8_t opcmod = sh->cdev->config.hca_attr.flow_access_aso_opc_mod;
+	rte_be32_t lkey = rte_cpu_to_be_32(cpool->raw_mng->mr.lkey);
+	uint16_t aso_n = (uint16_t)(RTE_ALIGN_CEIL(n, 4) / 4);
+	uint32_t ccntid;
+
+	max = RTE_MIN(size - (uint16_t)(sq->head - sq->tail), aso_n);
+	if (unlikely(!max))
+		return 0;
+	upper_offset += (max * 4);
+	/* Only one burst is in flight at a time, so the same elt is reused. */
+	sq->elts[0].burst_size = max;
+	ctrl_gen_id = dcs_id_base;
+	ctrl_gen_id /= 4;
+	do {
+		ccntid = upper_offset - max * 4;
+		wqe = &sq->sq_obj.aso_wqes[sq->head & mask];
+		rte_prefetch0(&sq->sq_obj.aso_wqes[(sq->head + 1) & mask]);
+		wqe->general_cseg.misc = rte_cpu_to_be_32(ctrl_gen_id);
+		wqe->general_cseg.flags = RTE_BE32(MLX5_COMP_ONLY_FIRST_ERR <<
+							 MLX5_COMP_MODE_OFFSET);
+		wqe->general_cseg.opcode = rte_cpu_to_be_32
+						(MLX5_OPCODE_ACCESS_ASO |
+						 (opcmod <<
+						  WQE_CSEG_OPC_MOD_OFFSET) |
+						 (sq->pi <<
+						  WQE_CSEG_WQE_INDEX_OFFSET));
+		addr = (uint64_t)RTE_PTR_ADD(cpool->raw_mng->raw,
+				ccntid * sizeof(struct flow_counter_stats));
+		wqe->aso_cseg.va_h = rte_cpu_to_be_32((uint32_t)(addr >> 32));
+		wqe->aso_cseg.va_l_r = rte_cpu_to_be_32((uint32_t)addr | 1u);
+		wqe->aso_cseg.lkey = lkey;
+		sq->pi += 2; /* Each WQE contains 2 WQEBB's. */
+		sq->head++;
+		sq->next++;
+		ctrl_gen_id++;
+		max--;
+	} while (max);
+	wqe->general_cseg.flags = RTE_BE32(MLX5_COMP_ALWAYS <<
+							 MLX5_COMP_MODE_OFFSET);
+	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
+			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
+			   !sh->tx_uar.dbnc);
+	return sq->elts[0].burst_size;
+}
+
+static uint16_t
+mlx5_aso_cnt_completion_handle(struct mlx5_aso_sq *sq)
+{
+	struct mlx5_aso_cq *cq = &sq->cq;
+	volatile struct mlx5_cqe *restrict cqe;
+	const unsigned int cq_size = 1 << cq->log_desc_n;
+	const unsigned int mask = cq_size - 1;
+	uint32_t idx;
+	uint32_t next_idx = cq->cq_ci & mask;
+	const uint16_t max = (uint16_t)(sq->head - sq->tail);
+	uint16_t i = 0;
+	int ret;
+	if (unlikely(!max))
+		return 0;
+	idx = next_idx;
+	next_idx = (cq->cq_ci + 1) & mask;
+	rte_prefetch0(&cq->cq_obj.cqes[next_idx]);
+	cqe = &cq->cq_obj.cqes[idx];
+	ret = check_cqe(cqe, cq_size, cq->cq_ci);
+	/*
+	 * Be sure owner read is done before any other cookie field or
+	 * opaque field.
+	 */
+	rte_io_rmb();
+	if (unlikely(ret != MLX5_CQE_STATUS_SW_OWN)) {
+		if (likely(ret == MLX5_CQE_STATUS_HW_OWN))
+			return 0; /* return immediately. */
+		mlx5_aso_cqe_err_handle(sq);
+	}
+	i += sq->elts[0].burst_size;
+	sq->elts[0].burst_size = 0;
+	cq->cq_ci++;
+	if (likely(i)) {
+		sq->tail += i;
+		rte_io_wmb();
+		cq->cq_obj.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci);
+	}
+	return i;
+}
+
+static uint16_t
+mlx5_aso_cnt_query_one_dcs(struct mlx5_dev_ctx_shared *sh,
+			   struct mlx5_hws_cnt_pool *cpool,
+			   uint8_t dcs_idx, uint32_t num)
+{
+	uint32_t dcs_id = cpool->dcs_mng.dcs[dcs_idx].obj->id;
+	uint64_t cnt_num = cpool->dcs_mng.dcs[dcs_idx].batch_sz;
+	uint64_t left;
+	uint32_t iidx = cpool->dcs_mng.dcs[dcs_idx].iidx;
+	uint32_t offset;
+	uint16_t mask;
+	uint16_t sq_idx;
+	uint64_t burst_sz = (uint64_t)(1 << MLX5_ASO_CNT_QUEUE_LOG_DESC) * 4 *
+		sh->cnt_svc->aso_mng.sq_num;
+	uint64_t qburst_sz = burst_sz / sh->cnt_svc->aso_mng.sq_num;
+	uint64_t n;
+	struct mlx5_aso_sq *sq;
+
+	cnt_num = RTE_MIN(num, cnt_num);
+	left = cnt_num;
+	while (left) {
+		mask = 0;
+		for (sq_idx = 0; sq_idx < sh->cnt_svc->aso_mng.sq_num;
+				sq_idx++) {
+			if (left == 0) {
+				mask |= (1 << sq_idx);
+				continue;
+			}
+			n = RTE_MIN(left, qburst_sz);
+			offset = cnt_num - left;
+			offset += iidx;
+			mlx5_aso_cnt_sq_enqueue_burst(cpool, sh,
+					&sh->cnt_svc->aso_mng.sqs[sq_idx], n,
+					offset, dcs_id);
+			left -= n;
+		}
+		do {
+			for (sq_idx = 0; sq_idx < sh->cnt_svc->aso_mng.sq_num;
+					sq_idx++) {
+				sq = &sh->cnt_svc->aso_mng.sqs[sq_idx];
+				if (mlx5_aso_cnt_completion_handle(sq))
+					mask |= (1 << sq_idx);
+			}
+		} while (mask < ((1 << sh->cnt_svc->aso_mng.sq_num) - 1));
+	}
+	return cnt_num;
+}
+
+/*
+ * Query FW counter via ASO WQE.
+ *
+ * The ASO counter query uses _sync_ mode, which means:
+ * 1. each SQ issues one burst with several WQEs
+ * 2. a CQE is requested on the last WQE
+ * 3. the CQ of each SQ is busy-polled
+ * 4. once all SQs' CQEs are received, go to step 1 and issue the next burst
+ *
+ * @param[in] sh
+ *   Pointer to shared device.
+ * @param[in] cpool
+ *   Pointer to counter pool.
+ *
+ * @return
+ *   0 on success, -1 on failure.
+ */
+int
+mlx5_aso_cnt_query(struct mlx5_dev_ctx_shared *sh,
+		   struct mlx5_hws_cnt_pool *cpool)
+{
+	uint32_t idx;
+	uint32_t num;
+	uint32_t cnt_num = mlx5_hws_cnt_pool_get_size(cpool) -
+		rte_ring_count(cpool->free_list);
+
+	for (idx = 0; idx < cpool->dcs_mng.batch_total; idx++) {
+		num = RTE_MIN(cnt_num, cpool->dcs_mng.dcs[idx].batch_sz);
+		mlx5_aso_cnt_query_one_dcs(sh, cpool, idx, num);
+		cnt_num -= num;
+		if (cnt_num == 0)
+			break;
+	}
+	return 0;
+}
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 6938e77609..6778536031 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -10,6 +10,7 @@
 #include "mlx5_rx.h"
 
 #if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
+#include "mlx5_hws_cnt.h"
 
 /* The maximum actions support in the flow. */
 #define MLX5_HW_MAX_ACTS 16
@@ -353,6 +354,10 @@ __flow_hw_action_template_destroy(struct rte_eth_dev *dev,
 			mlx5dr_action_destroy(acts->mhdr->action);
 		mlx5_free(acts->mhdr);
 	}
+	if (mlx5_hws_cnt_id_valid(acts->cnt_id)) {
+		mlx5_hws_cnt_shared_put(priv->hws_cpool, &acts->cnt_id);
+		acts->cnt_id = 0;
+	}
 }
 
 /**
@@ -532,6 +537,44 @@ __flow_hw_act_data_shared_rss_append(struct mlx5_priv *priv,
 	return 0;
 }
 
+/**
+ * Append shared counter action to the dynamic action list.
+ *
+ * @param[in] priv
+ *   Pointer to the port private data structure.
+ * @param[in] acts
+ *   Pointer to the template HW steering DR actions.
+ * @param[in] type
+ *   Action type.
+ * @param[in] action_src
+ *   Offset of source rte flow action.
+ * @param[in] action_dst
+ *   Offset of destination DR action.
+ * @param[in] cnt_id
+ *   Shared counter id.
+ *
+ * @return
+ *    0 on success, negative value otherwise and rte_errno is set.
+ */
+static __rte_always_inline int
+__flow_hw_act_data_shared_cnt_append(struct mlx5_priv *priv,
+				     struct mlx5_hw_actions *acts,
+				     enum rte_flow_action_type type,
+				     uint16_t action_src,
+				     uint16_t action_dst,
+				     cnt_id_t cnt_id)
+{	struct mlx5_action_construct_data *act_data;
+
+	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
+	if (!act_data)
+		return -1;
+	act_data->type = type;
+	act_data->shared_counter.id = cnt_id;
+	LIST_INSERT_HEAD(&acts->act_list, act_data, next);
+	return 0;
+}
+
+
 /**
  * Translate shared indirect action.
  *
@@ -573,6 +616,13 @@ flow_hw_shared_action_translate(struct rte_eth_dev *dev,
 		    action_src, action_dst, idx, shared_rss))
 			return -1;
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
+		if (__flow_hw_act_data_shared_cnt_append(priv, acts,
+			(enum rte_flow_action_type)
+			MLX5_RTE_FLOW_ACTION_TYPE_COUNT,
+			action_src, action_dst, act_idx))
+			return -1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type:%d", type);
 		break;
@@ -946,6 +996,30 @@ flow_hw_meter_compile(struct rte_eth_dev *dev,
 	}
 	return 0;
 }
+
+static __rte_always_inline int
+flow_hw_cnt_compile(struct rte_eth_dev *dev, uint32_t  start_pos,
+		      struct mlx5_hw_actions *acts)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint32_t pos = start_pos;
+	cnt_id_t cnt_id;
+	int ret;
+
+	ret = mlx5_hws_cnt_shared_get(priv->hws_cpool, &cnt_id);
+	if (ret != 0)
+		return ret;
+	ret = mlx5_hws_cnt_pool_get_action_offset
+				(priv->hws_cpool,
+				 cnt_id,
+				 &acts->rule_acts[pos].action,
+				 &acts->rule_acts[pos].counter.offset);
+	if (ret != 0)
+		return ret;
+	acts->cnt_id = cnt_id;
+	return 0;
+}
+
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -1189,6 +1263,20 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 				goto err;
 			i++;
 			break;
+		case RTE_FLOW_ACTION_TYPE_COUNT:
+			if (masks->conf &&
+			    ((const struct rte_flow_action_count *)
+			     masks->conf)->id) {
+				err = flow_hw_cnt_compile(dev, i, acts);
+				if (err)
+					goto err;
+			} else if (__flow_hw_act_data_general_append
+					(priv, acts, actions->type,
+					 actions - action_start, i)) {
+				goto err;
+			}
+			i++;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -1377,6 +1465,13 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev,
 				(dev, &act_data, item_flags, rule_act))
 			return -1;
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
+		if (mlx5_hws_cnt_pool_get_action_offset(priv->hws_cpool,
+				act_idx,
+				&rule_act->action,
+				&rule_act->counter.offset))
+			return -1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type:%d", type);
 		break;
@@ -1520,7 +1615,8 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			  const uint8_t it_idx,
 			  const struct rte_flow_action actions[],
 			  struct mlx5dr_rule_action *rule_acts,
-			  uint32_t *acts_num)
+			  uint32_t *acts_num,
+			  uint32_t queue)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow_template_table *table = job->flow->table;
@@ -1574,6 +1670,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		uint64_t item_flags;
 		struct mlx5_hw_jump_action *jump;
 		struct mlx5_hrxq *hrxq;
+		cnt_id_t cnt_id;
 
 		action = &actions[act_data->action_src];
 		MLX5_ASSERT(action->type == RTE_FLOW_ACTION_TYPE_INDIRECT ||
@@ -1681,6 +1778,32 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			if (mlx5_aso_mtr_wait(priv->sh, mtr))
 				return -1;
 			break;
+		case RTE_FLOW_ACTION_TYPE_COUNT:
+			ret = mlx5_hws_cnt_pool_get(priv->hws_cpool, &queue,
+					&cnt_id);
+			if (ret != 0)
+				return ret;
+			ret = mlx5_hws_cnt_pool_get_action_offset
+				(priv->hws_cpool,
+				 cnt_id,
+				 &rule_acts[act_data->action_dst].action,
+				 &rule_acts[act_data->action_dst].counter.offset
+				 );
+			if (ret != 0)
+				return ret;
+			job->flow->cnt_id = cnt_id;
+			break;
+		case MLX5_RTE_FLOW_ACTION_TYPE_COUNT:
+			ret = mlx5_hws_cnt_pool_get_action_offset
+				(priv->hws_cpool,
+				 act_data->shared_counter.id,
+				 &rule_acts[act_data->action_dst].action,
+				 &rule_acts[act_data->action_dst].counter.offset
+				 );
+			if (ret != 0)
+				return ret;
+			job->flow->cnt_id = act_data->shared_counter.id;
+			break;
 		default:
 			break;
 		}
@@ -1690,6 +1813,8 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 				job->flow->idx - 1;
 		rule_acts[hw_acts->encap_decap_pos].reformat.data = buf;
 	}
+	if (mlx5_hws_cnt_id_valid(hw_acts->cnt_id))
+		job->flow->cnt_id = hw_acts->cnt_id;
 	return 0;
 }
 
@@ -1825,7 +1950,7 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	 * user's input, in order to save the cost.
 	 */
 	if (flow_hw_actions_construct(dev, job, hw_acts, pattern_template_index,
-				  actions, rule_acts, &acts_num)) {
+				  actions, rule_acts, &acts_num, queue)) {
 		rte_errno = EINVAL;
 		goto free;
 	}
@@ -1955,6 +2080,13 @@ flow_hw_pull(struct rte_eth_dev *dev,
 				flow_hw_jump_release(dev, job->flow->jump);
 			else if (job->flow->fate_type == MLX5_FLOW_FATE_QUEUE)
 				mlx5_hrxq_obj_release(dev, job->flow->hrxq);
+			if (mlx5_hws_cnt_id_valid(job->flow->cnt_id) &&
+			    mlx5_hws_cnt_is_shared
+				(priv->hws_cpool, job->flow->cnt_id) == false) {
+				mlx5_hws_cnt_pool_put(priv->hws_cpool, &queue,
+						&job->flow->cnt_id);
+				job->flow->cnt_id = 0;
+			}
 			mlx5_ipool_free(job->flow->table->flow, job->flow->idx);
 		}
 		priv->hw_q[queue].job[priv->hw_q[queue].job_idx++] = job;
@@ -2657,6 +2789,9 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 			if (ret < 0)
 				return ret;
 			break;
+		case RTE_FLOW_ACTION_TYPE_COUNT:
+			/* TODO: Validation logic */
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -4334,6 +4469,12 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	}
 	if (_queue_attr)
 		mlx5_free(_queue_attr);
+	if (port_attr->nb_counters) {
+		priv->hws_cpool = mlx5_hws_cnt_pool_create(dev, port_attr,
+				nb_queue);
+		if (priv->hws_cpool == NULL)
+			goto err;
+	}
 	return 0;
 err:
 	flow_hw_free_vport_actions(priv);
@@ -4403,6 +4544,8 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 		mlx5_ipool_destroy(priv->acts_ipool);
 		priv->acts_ipool = NULL;
 	}
+	if (priv->hws_cpool)
+		mlx5_hws_cnt_pool_destroy(priv->sh, priv->hws_cpool);
 	mlx5_free(priv->hw_q);
 	priv->hw_q = NULL;
 	claim_zero(mlx5dr_context_close(priv->dr_ctx));
@@ -4544,10 +4687,28 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 			     void *user_data,
 			     struct rte_flow_error *error)
 {
+	struct rte_flow_action_handle *handle = NULL;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	cnt_id_t cnt_id;
+
 	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
 	RTE_SET_USED(user_data);
-	return flow_dv_action_create(dev, conf, action, error);
+	switch (action->type) {
+	case RTE_FLOW_ACTION_TYPE_COUNT:
+		if (mlx5_hws_cnt_shared_get(priv->hws_cpool, &cnt_id))
+			rte_flow_error_set(error, ENODEV,
+					RTE_FLOW_ERROR_TYPE_ACTION,
+					NULL,
+					"counter are not configured!");
+		else
+			handle = (struct rte_flow_action_handle *)
+				 (uintptr_t)cnt_id;
+		break;
+	default:
+		handle = flow_dv_action_create(dev, conf, action, error);
+	}
+	return handle;
 }
 
 /**
@@ -4611,10 +4772,172 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 			      void *user_data,
 			      struct rte_flow_error *error)
 {
+	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
+	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+	struct mlx5_priv *priv = dev->data->dev_private;
+
 	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
 	RTE_SET_USED(user_data);
-	return flow_dv_action_destroy(dev, handle, error);
+	switch (type) {
+	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
+		return mlx5_hws_cnt_shared_put(priv->hws_cpool, &act_idx);
+	default:
+		return flow_dv_action_destroy(dev, handle, error);
+	}
+}
+
+static int
+flow_hw_query_counter(const struct rte_eth_dev *dev, uint32_t counter,
+		      void *data, struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hws_cnt *cnt;
+	struct rte_flow_query_count *qc = data;
+	uint32_t iidx = mlx5_hws_cnt_iidx(priv->hws_cpool, counter);
+	uint64_t pkts, bytes;
+
+	if (!mlx5_hws_cnt_id_valid(counter))
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				"counter are not available");
+	cnt = &priv->hws_cpool->pool[iidx];
+	__hws_cnt_query_raw(priv->hws_cpool, counter, &pkts, &bytes);
+	qc->hits_set = 1;
+	qc->bytes_set = 1;
+	qc->hits = pkts - cnt->reset.hits;
+	qc->bytes = bytes - cnt->reset.bytes;
+	if (qc->reset) {
+		cnt->reset.bytes = bytes;
+		cnt->reset.hits = pkts;
+	}
+	return 0;
+}
+
+static int
+flow_hw_query(struct rte_eth_dev *dev,
+	      struct rte_flow *flow __rte_unused,
+	      const struct rte_flow_action *actions __rte_unused,
+	      void *data __rte_unused,
+	      struct rte_flow_error *error __rte_unused)
+{
+	int ret = -EINVAL;
+	struct rte_flow_hw *hw_flow = (struct rte_flow_hw *)flow;
+
+	for (; actions->type != RTE_FLOW_ACTION_TYPE_END; actions++) {
+		switch (actions->type) {
+		case RTE_FLOW_ACTION_TYPE_VOID:
+			break;
+		case RTE_FLOW_ACTION_TYPE_COUNT:
+			ret = flow_hw_query_counter(dev, hw_flow->cnt_id, data,
+						  error);
+			break;
+		default:
+			return rte_flow_error_set(error, ENOTSUP,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  actions,
+						  "action not supported");
+		}
+	}
+	return ret;
+}
+
+/**
+ * Create indirect action.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] conf
+ *   Shared action configuration.
+ * @param[in] action
+ *   Action specification used to create indirect action.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. Initialized in case of
+ *   error only.
+ *
+ * @return
+ *   A valid shared action handle in case of success, NULL otherwise and
+ *   rte_errno is set.
+ */
+static struct rte_flow_action_handle *
+flow_hw_action_create(struct rte_eth_dev *dev,
+		       const struct rte_flow_indir_action_conf *conf,
+		       const struct rte_flow_action *action,
+		       struct rte_flow_error *err)
+{
+	return flow_hw_action_handle_create(dev, UINT32_MAX, NULL, conf, action,
+					    NULL, err);
+}
+
+/**
+ * Destroy the indirect action.
+ * Release action related resources on the NIC and the memory.
+ * Lock free, (mutex should be acquired by caller).
+ * Dispatcher for action type specific call.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] handle
+ *   The indirect action object handle to be removed.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. Initialized in case of
+ *   error only.
+ *
+ * @return
+ *   0 on success, otherwise negative errno value.
+ */
+static int
+flow_hw_action_destroy(struct rte_eth_dev *dev,
+		       struct rte_flow_action_handle *handle,
+		       struct rte_flow_error *error)
+{
+	return flow_hw_action_handle_destroy(dev, UINT32_MAX, NULL, handle,
+			NULL, error);
+}
+
+/**
+ * Updates in place shared action configuration.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] handle
+ *   The indirect action object handle to be updated.
+ * @param[in] update
+ *   Action specification used to modify the action pointed by *handle*.
+ *   *update* could be of same type with the action pointed by the *handle*
+ *   handle argument, or some other structures like a wrapper, depending on
+ *   the indirect action type.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. Initialized in case of
+ *   error only.
+ *
+ * @return
+ *   0 on success, otherwise negative errno value.
+ */
+static int
+flow_hw_action_update(struct rte_eth_dev *dev,
+		      struct rte_flow_action_handle *handle,
+		      const void *update,
+		      struct rte_flow_error *err)
+{
+	return flow_hw_action_handle_update(dev, UINT32_MAX, NULL, handle,
+			update, NULL, err);
+}
+
+static int
+flow_hw_action_query(struct rte_eth_dev *dev,
+		     const struct rte_flow_action_handle *handle, void *data,
+		     struct rte_flow_error *error)
+{
+	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
+	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+
+	switch (type) {
+	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
+		return flow_hw_query_counter(dev, act_idx, data, error);
+	default:
+		return flow_dv_action_query(dev, handle, data, error);
+	}
 }
 
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
@@ -4636,10 +4959,11 @@ const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.async_action_destroy = flow_hw_action_handle_destroy,
 	.async_action_update = flow_hw_action_handle_update,
 	.action_validate = flow_dv_action_validate,
-	.action_create = flow_dv_action_create,
-	.action_destroy = flow_dv_action_destroy,
-	.action_update = flow_dv_action_update,
-	.action_query = flow_dv_action_query,
+	.action_create = flow_hw_action_create,
+	.action_destroy = flow_hw_action_destroy,
+	.action_update = flow_hw_action_update,
+	.action_query = flow_hw_action_query,
+	.query = flow_hw_query,
 };
 
 /**
diff --git a/drivers/net/mlx5/mlx5_hws_cnt.c b/drivers/net/mlx5/mlx5_hws_cnt.c
new file mode 100644
index 0000000000..e2408ef36d
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_hws_cnt.c
@@ -0,0 +1,528 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2020 Mellanox Technologies, Ltd
+ */
+
+#include <stdint.h>
+#include <rte_malloc.h>
+#include <mlx5_malloc.h>
+#include <rte_ring.h>
+#include <mlx5_devx_cmds.h>
+#include <rte_cycles.h>
+
+#if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
+
+#include "mlx5_utils.h"
+#include "mlx5_hws_cnt.h"
+
+#define HWS_CNT_CACHE_SZ_DEFAULT 511
+#define HWS_CNT_CACHE_PRELOAD_DEFAULT 254
+#define HWS_CNT_CACHE_FETCH_DEFAULT 254
+#define HWS_CNT_CACHE_THRESHOLD_DEFAULT 254
+#define HWS_CNT_ALLOC_FACTOR_DEFAULT 20
+
+static void
+__hws_cnt_id_load(struct mlx5_hws_cnt_pool *cpool)
+{
+	uint32_t preload;
+	uint32_t q_num = cpool->cache->q_num;
+	uint32_t cnt_num = mlx5_hws_cnt_pool_get_size(cpool);
+	cnt_id_t cnt_id, iidx = 0;
+	uint32_t qidx;
+	struct rte_ring *qcache = NULL;
+
+	/*
+	 * Counter ID order is important for tracking the maximum number of
+	 * in-use counters for querying: the counter internal index order
+	 * must go from zero to the number the user configured, i.e. 0 - 8000000.
+	 * Counter IDs are loaded in this order into the cache first,
+	 * and then into the global free list.
+	 * In the end, the user fetches counters from minimum to maximum index.
+	 */
+	preload = RTE_MIN(cpool->cache->preload_sz, cnt_num / q_num);
+	for (qidx = 0; qidx < q_num; qidx++) {
+		for (; iidx < preload * (qidx + 1); iidx++) {
+			cnt_id = mlx5_hws_cnt_id_gen(cpool, iidx);
+			qcache = cpool->cache->qcache[qidx];
+			if (qcache)
+				rte_ring_enqueue_elem(qcache, &cnt_id,
+						sizeof(cnt_id));
+		}
+	}
+	for (; iidx < cnt_num; iidx++) {
+		cnt_id = mlx5_hws_cnt_id_gen(cpool, iidx);
+		rte_ring_enqueue_elem(cpool->free_list, &cnt_id,
+				sizeof(cnt_id));
+	}
+}
+
+static void
+__mlx5_hws_cnt_svc(struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_hws_cnt_pool *cpool)
+{
+	struct rte_ring *reset_list = cpool->wait_reset_list;
+	struct rte_ring *reuse_list = cpool->reuse_list;
+	uint32_t reset_cnt_num;
+	struct rte_ring_zc_data zcdr = {0};
+	struct rte_ring_zc_data zcdu = {0};
+
+	reset_cnt_num = rte_ring_count(reset_list);
+	do {
+		cpool->query_gen++;
+		mlx5_aso_cnt_query(sh, cpool);
+		zcdr.n1 = 0;
+		zcdu.n1 = 0;
+		rte_ring_enqueue_zc_burst_elem_start(reuse_list,
+				sizeof(cnt_id_t), reset_cnt_num, &zcdu,
+				NULL);
+		rte_ring_dequeue_zc_burst_elem_start(reset_list,
+				sizeof(cnt_id_t), reset_cnt_num, &zcdr,
+				NULL);
+		__hws_cnt_r2rcpy(&zcdu, &zcdr, reset_cnt_num);
+		rte_ring_dequeue_zc_elem_finish(reset_list,
+				reset_cnt_num);
+		rte_ring_enqueue_zc_elem_finish(reuse_list,
+				reset_cnt_num);
+		reset_cnt_num = rte_ring_count(reset_list);
+	} while (reset_cnt_num > 0);
+}
+
+static void
+mlx5_hws_cnt_raw_data_free(struct mlx5_dev_ctx_shared *sh,
+			   struct mlx5_hws_cnt_raw_data_mng *mng)
+{
+	if (mng == NULL)
+		return;
+	sh->cdev->mr_scache.dereg_mr_cb(&mng->mr);
+	mlx5_free(mng->raw);
+	mlx5_free(mng);
+}
+
+__rte_unused
+static struct mlx5_hws_cnt_raw_data_mng *
+mlx5_hws_cnt_raw_data_alloc(struct mlx5_dev_ctx_shared *sh, uint32_t n)
+{
+	struct mlx5_hws_cnt_raw_data_mng *mng = NULL;
+	int ret;
+	size_t sz = n * sizeof(struct flow_counter_stats);
+
+	mng = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO, sizeof(*mng), 0,
+			SOCKET_ID_ANY);
+	if (mng == NULL)
+		goto error;
+	mng->raw = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO, sz, 0,
+			SOCKET_ID_ANY);
+	if (mng->raw == NULL)
+		goto error;
+	ret = sh->cdev->mr_scache.reg_mr_cb(sh->cdev->pd, mng->raw, sz,
+					    &mng->mr);
+	if (ret) {
+		rte_errno = errno;
+		goto error;
+	}
+	return mng;
+error:
+	mlx5_hws_cnt_raw_data_free(sh, mng);
+	return NULL;
+}
+
+static void *
+mlx5_hws_cnt_svc(void *opaque)
+{
+	struct mlx5_dev_ctx_shared *sh =
+		(struct mlx5_dev_ctx_shared *)opaque;
+	uint64_t interval =
+		(uint64_t)sh->cnt_svc->query_interval * (US_PER_S / MS_PER_S);
+	uint16_t port_id;
+	uint64_t start_cycle, query_cycle = 0;
+	uint64_t query_us;
+	uint64_t sleep_us;
+
+	while (sh->cnt_svc->svc_running != 0) {
+		start_cycle = rte_rdtsc();
+		MLX5_ETH_FOREACH_DEV(port_id, sh->cdev->dev) {
+			struct mlx5_priv *opriv =
+				rte_eth_devices[port_id].data->dev_private;
+			if (opriv != NULL &&
+			    opriv->sh == sh &&
+			    opriv->hws_cpool != NULL) {
+				__mlx5_hws_cnt_svc(sh, opriv->hws_cpool);
+			}
+		}
+		query_cycle = rte_rdtsc() - start_cycle;
+		query_us = query_cycle / (rte_get_timer_hz() / US_PER_S);
+		sleep_us = interval - query_us;
+		if (interval > query_us)
+			rte_delay_us_sleep(sleep_us);
+	}
+	return NULL;
+}
+
+struct mlx5_hws_cnt_pool *
+mlx5_hws_cnt_pool_init(const struct mlx5_hws_cnt_pool_cfg *pcfg,
+		const struct mlx5_hws_cache_param *ccfg)
+{
+	char mz_name[RTE_MEMZONE_NAMESIZE];
+	struct mlx5_hws_cnt_pool *cntp;
+	uint64_t cnt_num = 0;
+	uint32_t qidx;
+
+	MLX5_ASSERT(pcfg);
+	MLX5_ASSERT(ccfg);
+	cntp = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO, sizeof(*cntp), 0,
+			   SOCKET_ID_ANY);
+	if (cntp == NULL)
+		return NULL;
+
+	cntp->cfg = *pcfg;
+	cntp->cache = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO,
+			sizeof(*cntp->cache) +
+			sizeof(((struct mlx5_hws_cnt_pool_caches *)0)->qcache[0])
+				* ccfg->q_num, 0, SOCKET_ID_ANY);
+	if (cntp->cache == NULL)
+		goto error;
+	 /* store the necessary cache parameters. */
+	cntp->cache->fetch_sz = ccfg->fetch_sz;
+	cntp->cache->preload_sz = ccfg->preload_sz;
+	cntp->cache->threshold = ccfg->threshold;
+	cntp->cache->q_num = ccfg->q_num;
+	cnt_num = pcfg->request_num * (100 + pcfg->alloc_factor) / 100;
+	if (cnt_num > UINT32_MAX) {
+		DRV_LOG(ERR, "counter number %"PRIu64" is out of 32bit range",
+			cnt_num);
+		goto error;
+	}
+	cntp->pool = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO,
+			sizeof(struct mlx5_hws_cnt) *
+			pcfg->request_num * (100 + pcfg->alloc_factor) / 100,
+			0, SOCKET_ID_ANY);
+	if (cntp->pool == NULL)
+		goto error;
+	snprintf(mz_name, sizeof(mz_name), "%s_F_RING", pcfg->name);
+	cntp->free_list = rte_ring_create_elem(mz_name, sizeof(cnt_id_t),
+			(uint32_t)cnt_num, SOCKET_ID_ANY,
+			RING_F_SP_ENQ | RING_F_MC_HTS_DEQ | RING_F_EXACT_SZ);
+	if (cntp->free_list == NULL) {
+		DRV_LOG(ERR, "failed to create free list ring");
+		goto error;
+	}
+	snprintf(mz_name, sizeof(mz_name), "%s_R_RING", pcfg->name);
+	cntp->wait_reset_list = rte_ring_create_elem(mz_name, sizeof(cnt_id_t),
+			(uint32_t)cnt_num, SOCKET_ID_ANY,
+			RING_F_MP_HTS_ENQ | RING_F_SC_DEQ | RING_F_EXACT_SZ);
+	if (cntp->wait_reset_list == NULL) {
+		DRV_LOG(ERR, "failed to create free list ring");
+		goto error;
+	}
+	snprintf(mz_name, sizeof(mz_name), "%s_U_RING", pcfg->name);
+	cntp->reuse_list = rte_ring_create_elem(mz_name, sizeof(cnt_id_t),
+			(uint32_t)cnt_num, SOCKET_ID_ANY,
+			RING_F_SP_ENQ | RING_F_MC_HTS_DEQ | RING_F_EXACT_SZ);
+	if (cntp->reuse_list == NULL) {
+		DRV_LOG(ERR, "failed to create reuse list ring");
+		goto error;
+	}
+	for (qidx = 0; qidx < ccfg->q_num; qidx++) {
+		snprintf(mz_name, sizeof(mz_name), "%s_cache/%u", pcfg->name,
+				qidx);
+		cntp->cache->qcache[qidx] = rte_ring_create(mz_name, ccfg->size,
+				SOCKET_ID_ANY,
+				RING_F_SP_ENQ | RING_F_SC_DEQ |
+				RING_F_EXACT_SZ);
+		if (cntp->cache->qcache[qidx] == NULL)
+			goto error;
+	}
+	return cntp;
+error:
+	mlx5_hws_cnt_pool_deinit(cntp);
+	return NULL;
+}
+
+void
+mlx5_hws_cnt_pool_deinit(struct mlx5_hws_cnt_pool * const cntp)
+{
+	uint32_t qidx = 0;
+	if (cntp == NULL)
+		return;
+	rte_ring_free(cntp->free_list);
+	rte_ring_free(cntp->wait_reset_list);
+	rte_ring_free(cntp->reuse_list);
+	if (cntp->cache) {
+		for (qidx = 0; qidx < cntp->cache->q_num; qidx++)
+			rte_ring_free(cntp->cache->qcache[qidx]);
+	}
+	mlx5_free(cntp->cache);
+	mlx5_free(cntp->raw_mng);
+	mlx5_free(cntp->pool);
+	mlx5_free(cntp);
+}
+
+int
+mlx5_hws_cnt_service_thread_create(struct mlx5_dev_ctx_shared *sh)
+{
+#define CNT_THREAD_NAME_MAX 256
+	char name[CNT_THREAD_NAME_MAX];
+	rte_cpuset_t cpuset;
+	int ret;
+	uint32_t service_core = sh->cnt_svc->service_core;
+
+	CPU_ZERO(&cpuset);
+	sh->cnt_svc->svc_running = 1;
+	ret = pthread_create(&sh->cnt_svc->service_thread, NULL,
+			mlx5_hws_cnt_svc, sh);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Failed to create HW steering's counter service thread.");
+		return -ENOSYS;
+	}
+	snprintf(name, CNT_THREAD_NAME_MAX - 1, "%s/svc@%d",
+		 sh->ibdev_name, service_core);
+	rte_thread_setname(sh->cnt_svc->service_thread, name);
+	CPU_SET(service_core, &cpuset);
+	pthread_setaffinity_np(sh->cnt_svc->service_thread, sizeof(cpuset),
+				&cpuset);
+	return 0;
+}
+
+void
+mlx5_hws_cnt_service_thread_destroy(struct mlx5_dev_ctx_shared *sh)
+{
+	if (sh->cnt_svc->service_thread == 0)
+		return;
+	sh->cnt_svc->svc_running = 0;
+	pthread_join(sh->cnt_svc->service_thread, NULL);
+	sh->cnt_svc->service_thread = 0;
+}
+
+int
+mlx5_hws_cnt_pool_dcs_alloc(struct mlx5_dev_ctx_shared *sh,
+			    struct mlx5_hws_cnt_pool *cpool)
+{
+	struct mlx5_hca_attr *hca_attr = &sh->cdev->config.hca_attr;
+	uint32_t max_log_bulk_sz = 0;
+	uint32_t log_bulk_sz;
+	uint32_t idx, alloced = 0;
+	unsigned int cnt_num = mlx5_hws_cnt_pool_get_size(cpool);
+	struct mlx5_devx_counter_attr attr = {0};
+	struct mlx5_devx_obj *dcs;
+
+	if (hca_attr->flow_counter_bulk_log_max_alloc == 0) {
+		DRV_LOG(ERR,
+			"Fw doesn't support bulk log max alloc");
+		return -1;
+	}
+	max_log_bulk_sz = 23; /* hard code to 8M (1 << 23). */
+	cnt_num = RTE_ALIGN_CEIL(cnt_num, 4); /* minimal 4 counter in bulk. */
+	log_bulk_sz = RTE_MIN(max_log_bulk_sz, rte_log2_u32(cnt_num));
+	attr.pd = sh->cdev->pdn;
+	attr.pd_valid = 1;
+	attr.bulk_log_max_alloc = 1;
+	attr.flow_counter_bulk_log_size = log_bulk_sz;
+	idx = 0;
+	dcs = mlx5_devx_cmd_flow_counter_alloc_general(sh->cdev->ctx, &attr);
+	if (dcs == NULL)
+		goto error;
+	cpool->dcs_mng.dcs[idx].obj = dcs;
+	cpool->dcs_mng.dcs[idx].batch_sz = (1 << log_bulk_sz);
+	cpool->dcs_mng.batch_total++;
+	idx++;
+	cpool->dcs_mng.dcs[0].iidx = 0;
+	alloced = cpool->dcs_mng.dcs[0].batch_sz;
+	if (cnt_num > cpool->dcs_mng.dcs[0].batch_sz) {
+		for (; idx < MLX5_HWS_CNT_DCS_NUM; idx++) {
+			attr.flow_counter_bulk_log_size = --max_log_bulk_sz;
+			dcs = mlx5_devx_cmd_flow_counter_alloc_general
+				(sh->cdev->ctx, &attr);
+			if (dcs == NULL)
+				goto error;
+			cpool->dcs_mng.dcs[idx].obj = dcs;
+			cpool->dcs_mng.dcs[idx].batch_sz =
+				(1 << max_log_bulk_sz);
+			cpool->dcs_mng.dcs[idx].iidx = alloced;
+			alloced += cpool->dcs_mng.dcs[idx].batch_sz;
+			cpool->dcs_mng.batch_total++;
+		}
+	}
+	return 0;
+error:
+	DRV_LOG(DEBUG,
+		"Cannot alloc device counter, allocated[%" PRIu32 "] request[%" PRIu32 "]",
+		alloced, cnt_num);
+	for (idx = 0; idx < cpool->dcs_mng.batch_total; idx++) {
+		mlx5_devx_cmd_destroy(cpool->dcs_mng.dcs[idx].obj);
+		cpool->dcs_mng.dcs[idx].obj = NULL;
+		cpool->dcs_mng.dcs[idx].batch_sz = 0;
+		cpool->dcs_mng.dcs[idx].iidx = 0;
+	}
+	cpool->dcs_mng.batch_total = 0;
+	return -1;
+}
+
+void
+mlx5_hws_cnt_pool_dcs_free(struct mlx5_dev_ctx_shared *sh,
+			   struct mlx5_hws_cnt_pool *cpool)
+{
+	uint32_t idx;
+
+	if (cpool == NULL)
+		return;
+	for (idx = 0; idx < MLX5_HWS_CNT_DCS_NUM; idx++)
+		mlx5_devx_cmd_destroy(cpool->dcs_mng.dcs[idx].obj);
+	if (cpool->raw_mng) {
+		mlx5_hws_cnt_raw_data_free(sh, cpool->raw_mng);
+		cpool->raw_mng = NULL;
+	}
+}
+
+int
+mlx5_hws_cnt_pool_action_create(struct mlx5_priv *priv,
+		struct mlx5_hws_cnt_pool *cpool)
+{
+	uint32_t idx;
+	int ret = 0;
+	struct mlx5_hws_cnt_dcs *dcs;
+	uint32_t flags;
+
+	flags = MLX5DR_ACTION_FLAG_HWS_RX | MLX5DR_ACTION_FLAG_HWS_TX;
+	if (priv->sh->config.dv_esw_en && priv->master)
+		flags |= MLX5DR_ACTION_FLAG_HWS_FDB;
+	for (idx = 0; idx < cpool->dcs_mng.batch_total; idx++) {
+		dcs = &cpool->dcs_mng.dcs[idx];
+		dcs->dr_action = mlx5dr_action_create_counter(priv->dr_ctx,
+					(struct mlx5dr_devx_obj *)dcs->obj,
+					flags);
+		if (dcs->dr_action == NULL) {
+			mlx5_hws_cnt_pool_action_destroy(cpool);
+			ret = -ENOSYS;
+			break;
+		}
+	}
+	return ret;
+}
+
+void
+mlx5_hws_cnt_pool_action_destroy(struct mlx5_hws_cnt_pool *cpool)
+{
+	uint32_t idx;
+	struct mlx5_hws_cnt_dcs *dcs;
+
+	for (idx = 0; idx < cpool->dcs_mng.batch_total; idx++) {
+		dcs = &cpool->dcs_mng.dcs[idx];
+		if (dcs->dr_action != NULL) {
+			mlx5dr_action_destroy(dcs->dr_action);
+			dcs->dr_action = NULL;
+		}
+	}
+}
+
+struct mlx5_hws_cnt_pool *
+mlx5_hws_cnt_pool_create(struct rte_eth_dev *dev,
+		const struct rte_flow_port_attr *pattr, uint16_t nb_queue)
+{
+	struct mlx5_hws_cnt_pool *cpool = NULL;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hws_cache_param cparam = {0};
+	struct mlx5_hws_cnt_pool_cfg pcfg = {0};
+	char *mp_name;
+	int ret = 0;
+	size_t sz;
+
+	/* init cnt service if not. */
+	if (priv->sh->cnt_svc == NULL) {
+		ret = mlx5_hws_cnt_svc_init(priv->sh);
+		if (ret != 0)
+			return NULL;
+	}
+	cparam.fetch_sz = HWS_CNT_CACHE_FETCH_DEFAULT;
+	cparam.preload_sz = HWS_CNT_CACHE_PRELOAD_DEFAULT;
+	cparam.q_num = nb_queue;
+	cparam.threshold = HWS_CNT_CACHE_THRESHOLD_DEFAULT;
+	cparam.size = HWS_CNT_CACHE_SZ_DEFAULT;
+	pcfg.alloc_factor = HWS_CNT_ALLOC_FACTOR_DEFAULT;
+	mp_name = mlx5_malloc(MLX5_MEM_ZERO, RTE_MEMZONE_NAMESIZE, 0,
+			SOCKET_ID_ANY);
+	if (mp_name == NULL)
+		goto error;
+	snprintf(mp_name, RTE_MEMZONE_NAMESIZE, "MLX5_HWS_CNT_POOL_%u",
+			dev->data->port_id);
+	pcfg.name = mp_name;
+	pcfg.request_num = pattr->nb_counters;
+	cpool = mlx5_hws_cnt_pool_init(&pcfg, &cparam);
+	if (cpool == NULL)
+		goto error;
+	ret = mlx5_hws_cnt_pool_dcs_alloc(priv->sh, cpool);
+	if (ret != 0)
+		goto error;
+	sz = RTE_ALIGN_CEIL(mlx5_hws_cnt_pool_get_size(cpool), 4);
+	cpool->raw_mng = mlx5_hws_cnt_raw_data_alloc(priv->sh, sz);
+	if (cpool->raw_mng == NULL)
+		goto error;
+	__hws_cnt_id_load(cpool);
+	/*
+	 * Bump the query generation right after pool creation so that
+	 * the pre-loaded counters can be used directly: they already
+	 * hold their initial values, so there is no need to wait for
+	 * the first query.
+	 */
+	cpool->query_gen = 1;
+	ret = mlx5_hws_cnt_pool_action_create(priv, cpool);
+	if (ret != 0)
+		goto error;
+	priv->sh->cnt_svc->refcnt++;
+	return cpool;
+error:
+	mlx5_hws_cnt_pool_destroy(priv->sh, cpool);
+	return NULL;
+}
+
+void
+mlx5_hws_cnt_pool_destroy(struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_hws_cnt_pool *cpool)
+{
+	if (cpool == NULL)
+		return;
+	if (--sh->cnt_svc->refcnt == 0)
+		mlx5_hws_cnt_svc_deinit(sh);
+	mlx5_hws_cnt_pool_action_destroy(cpool);
+	mlx5_hws_cnt_pool_dcs_free(sh, cpool);
+	mlx5_hws_cnt_raw_data_free(sh, cpool->raw_mng);
+	mlx5_free((void *)cpool->cfg.name);
+	mlx5_hws_cnt_pool_deinit(cpool);
+}
+
+int
+mlx5_hws_cnt_svc_init(struct mlx5_dev_ctx_shared *sh)
+{
+	int ret;
+
+	sh->cnt_svc = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO,
+			sizeof(*sh->cnt_svc), 0, SOCKET_ID_ANY);
+	if (sh->cnt_svc == NULL)
+		return -1;
+	sh->cnt_svc->query_interval = sh->config.cnt_svc.cycle_time;
+	sh->cnt_svc->service_core = sh->config.cnt_svc.service_core;
+	ret = mlx5_aso_cnt_queue_init(sh);
+	if (ret != 0) {
+		mlx5_free(sh->cnt_svc);
+		sh->cnt_svc = NULL;
+		return -1;
+	}
+	ret = mlx5_hws_cnt_service_thread_create(sh);
+	if (ret != 0) {
+		mlx5_aso_cnt_queue_uninit(sh);
+		mlx5_free(sh->cnt_svc);
+		sh->cnt_svc = NULL;
+	}
+	return 0;
+}
+
+void
+mlx5_hws_cnt_svc_deinit(struct mlx5_dev_ctx_shared *sh)
+{
+	if (sh->cnt_svc == NULL)
+		return;
+	mlx5_hws_cnt_service_thread_destroy(sh);
+	mlx5_aso_cnt_queue_uninit(sh);
+	mlx5_free(sh->cnt_svc);
+	sh->cnt_svc = NULL;
+}
+
+#endif
diff --git a/drivers/net/mlx5/mlx5_hws_cnt.h b/drivers/net/mlx5/mlx5_hws_cnt.h
new file mode 100644
index 0000000000..5fab4ba597
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_hws_cnt.h
@@ -0,0 +1,558 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2022 Mellanox Technologies, Ltd
+ */
+
+#ifndef _MLX5_HWS_CNT_H_
+#define _MLX5_HWS_CNT_H_
+
+#include <rte_ring.h>
+#include "mlx5_utils.h"
+#include "mlx5_flow.h"
+
+/*
+ * COUNTER ID's layout
+ *       3                   2                   1                   0
+ *     1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    | T |       | D |                                               |
+ *    ~ Y |       | C |                    IDX                        ~
+ *    | P |       | S |                                               |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *
+ *    Bit 31:30 = TYPE = MLX5_INDIRECT_ACTION_TYPE_COUNT = b'10
+ *    Bit 25:24 = DCS index
+ *    Bit 23:00 = IDX in this counter belonged DCS bulk.
+ */
+typedef uint32_t cnt_id_t;
+
+#define MLX5_HWS_CNT_DCS_NUM 4
+#define MLX5_HWS_CNT_DCS_IDX_OFFSET 24
+#define MLX5_HWS_CNT_DCS_IDX_MASK 0x3
+#define MLX5_HWS_CNT_IDX_MASK ((1UL << MLX5_HWS_CNT_DCS_IDX_OFFSET) - 1)
+
+struct mlx5_hws_cnt_dcs {
+	void *dr_action;
+	uint32_t batch_sz;
+	uint32_t iidx; /* internal index of first counter in this bulk. */
+	struct mlx5_devx_obj *obj;
+};
+
+struct mlx5_hws_cnt_dcs_mng {
+	uint32_t batch_total;
+	struct mlx5_hws_cnt_dcs dcs[MLX5_HWS_CNT_DCS_NUM];
+};
+
+struct mlx5_hws_cnt {
+	struct flow_counter_stats reset;
+	union {
+		uint32_t share: 1;
+		/*
+		 * share is set to 1 when this counter is used as an indirect
+		 * action. Only meaningful while the user owns this counter.
+		 */
+		uint32_t query_gen_when_free;
+		/*
+		 * When the PMD owns this counter (i.e. the user has put it
+		 * back into the PMD counter pool), this field records the
+		 * pool's query generation at the time the user released it.
+		 */
+	};
+};
+
+struct mlx5_hws_cnt_raw_data_mng {
+	struct flow_counter_stats *raw;
+	struct mlx5_pmd_mr mr;
+};
+
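+/* Per-queue cache sizing parameters for the counter pool. */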
+struct mlx5_hws_cache_param {
+	uint32_t size;
+	uint32_t q_num;
+	uint32_t fetch_sz;
+	uint32_t threshold;
+	uint32_t preload_sz;
+};
+
+struct mlx5_hws_cnt_pool_cfg {
+	char *name;
+	uint32_t request_num;
+	uint32_t alloc_factor;
+};
+
+struct mlx5_hws_cnt_pool_caches {
+	uint32_t fetch_sz;
+	uint32_t threshold;
+	uint32_t preload_sz;
+	uint32_t q_num;
+	struct rte_ring *qcache[];
+};
+
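+/*
+ * Counter pool. A counter typically cycles through the rings roughly as
+ * follows (inferred from the helpers below): free_list/reuse_list ->
+ * per-queue cache -> in use -> wait_reset_list -> (background query and
+ * reset) -> reuse_list.
+ */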
+struct mlx5_hws_cnt_pool {
+	struct mlx5_hws_cnt_pool_cfg cfg __rte_cache_aligned;
+	struct mlx5_hws_cnt_dcs_mng dcs_mng __rte_cache_aligned;
+	uint32_t query_gen __rte_cache_aligned;
+	struct mlx5_hws_cnt *pool;
+	struct mlx5_hws_cnt_raw_data_mng *raw_mng;
+	struct rte_ring *reuse_list;
+	struct rte_ring *free_list;
+	struct rte_ring *wait_reset_list;
+	struct mlx5_hws_cnt_pool_caches *cache;
+} __rte_cache_aligned;
+
+/**
+ * Translate counter id into internal index (starting from 0), which can be
+ * used as an index into the raw/cnt pool.
+ *
+ * @param cpool
+ *   The pointer to the counter pool.
+ * @param cnt_id
+ *   The external counter id.
+ * @return
+ *   Internal index.
+ */
+static __rte_always_inline cnt_id_t
+mlx5_hws_cnt_iidx(struct mlx5_hws_cnt_pool *cpool, cnt_id_t cnt_id)
+{
+	uint8_t dcs_idx = cnt_id >> MLX5_HWS_CNT_DCS_IDX_OFFSET;
+	uint32_t offset = cnt_id & MLX5_HWS_CNT_IDX_MASK;
+
+	dcs_idx &= MLX5_HWS_CNT_DCS_IDX_MASK;
+	return (cpool->dcs_mng.dcs[dcs_idx].iidx + offset);
+}
+
+/**
+ * Check whether the given counter id is valid (carries the COUNT action type).
+ */
+static __rte_always_inline bool
+mlx5_hws_cnt_id_valid(cnt_id_t cnt_id)
+{
+	return (cnt_id >> MLX5_INDIRECT_ACTION_TYPE_OFFSET) ==
+		MLX5_INDIRECT_ACTION_TYPE_COUNT ? true : false;
+}
+
+/**
+ * Generate counter id from internal index.
+ *
+ * @param cpool
+ *   The pointer to the counter pool.
+ * @param iidx
+ *   The internal counter index.
+ *
+ * @return
+ *   Counter id
+ */
+static __rte_always_inline cnt_id_t
+mlx5_hws_cnt_id_gen(struct mlx5_hws_cnt_pool *cpool, cnt_id_t iidx)
+{
+	struct mlx5_hws_cnt_dcs_mng *dcs_mng = &cpool->dcs_mng;
+	uint32_t idx;
+	uint32_t offset;
+	cnt_id_t cnt_id;
+
+	for (idx = 0, offset = iidx; idx < dcs_mng->batch_total; idx++) {
+		if (dcs_mng->dcs[idx].batch_sz <= offset)
+			offset -= dcs_mng->dcs[idx].batch_sz;
+		else
+			break;
+	}
+	cnt_id = offset;
+	cnt_id |= (idx << MLX5_HWS_CNT_DCS_IDX_OFFSET);
+	return (MLX5_INDIRECT_ACTION_TYPE_COUNT <<
+			MLX5_INDIRECT_ACTION_TYPE_OFFSET) | cnt_id;
+}
+
+static __rte_always_inline void
+__hws_cnt_query_raw(struct mlx5_hws_cnt_pool *cpool, cnt_id_t cnt_id,
+		uint64_t *raw_pkts, uint64_t *raw_bytes)
+{
+	struct mlx5_hws_cnt_raw_data_mng *raw_mng = cpool->raw_mng;
+	struct flow_counter_stats s[2];
+	uint8_t i = 0x1;
+	size_t stat_sz = sizeof(s[0]);
+	uint32_t iidx = mlx5_hws_cnt_iidx(cpool, cnt_id);
+
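+	/*
+	 * The raw counter area may be updated asynchronously by the
+	 * background query, so keep re-reading until two consecutive
+	 * snapshots match, in order to return a consistent hits/bytes pair.
+	 */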
+	memcpy(&s[0], &raw_mng->raw[iidx], stat_sz);
+	do {
+		memcpy(&s[i & 1], &raw_mng->raw[iidx], stat_sz);
+		if (memcmp(&s[0], &s[1], stat_sz) == 0) {
+			*raw_pkts = rte_be_to_cpu_64(s[0].hits);
+			*raw_bytes = rte_be_to_cpu_64(s[0].bytes);
+			break;
+		}
+		i = ~i;
+	} while (1);
+}
+
+/**
+ * Copy elements from one zero-copy ring to another zero-copy ring in place.
+ *
+ * The input is an rte ring zero-copy data structure, which has two pointers.
+ * When a wrap-around happens, the second pointer (ptr2) becomes meaningful.
+ *
+ * Therefore this routine needs to handle the case where the address ranges
+ * given by both the source and the destination are wrapped.
+ * First, copy the number of elements up to the first wrap-around boundary,
+ * which may be in either the source or the destination.
+ * Second, copy elements up to the second wrap-around boundary. If the first
+ * boundary was in the source, this one must be in the destination, and
+ * vice versa.
+ * Third, copy all remaining elements.
+ *
+ * In the worst case, three pieces of contiguous memory are copied.
+ *
+ * @param zcdd
+ *   A pointer to zero-copy data of the destination ring.
+ * @param zcds
+ *   A pointer to zero-copy data of the source ring.
+ * @param n
+ *   Number of elements to copy.
+ */
+static __rte_always_inline void
+__hws_cnt_r2rcpy(struct rte_ring_zc_data *zcdd, struct rte_ring_zc_data *zcds,
+		unsigned int n)
+{
+	unsigned int n1, n2, n3;
+	void *s1, *s2, *s3;
+	void *d1, *d2, *d3;
+
+	s1 = zcds->ptr1;
+	d1 = zcdd->ptr1;
+	n1 = RTE_MIN(zcdd->n1, zcds->n1);
+	if (zcds->n1 > n1) {
+		n2 = zcds->n1 - n1;
+		s2 = RTE_PTR_ADD(zcds->ptr1, sizeof(cnt_id_t) * n1);
+		d2 = zcdd->ptr2;
+		n3 = n - n1 - n2;
+		s3 = zcds->ptr2;
+		d3 = RTE_PTR_ADD(zcdd->ptr2, sizeof(cnt_id_t) * n2);
+	} else {
+		n2 = zcdd->n1 - n1;
+		s2 = zcds->ptr2;
+		d2 = RTE_PTR_ADD(zcdd->ptr1, sizeof(cnt_id_t) * n1);
+		n3 = n - n1 - n2;
+		s3 = RTE_PTR_ADD(zcds->ptr2, sizeof(cnt_id_t) * n2);
+		d3 = zcdd->ptr2;
+	}
+	memcpy(d1, s1, n1 * sizeof(cnt_id_t));
+	if (n2 != 0) {
+		memcpy(d2, s2, n2 * sizeof(cnt_id_t));
+		if (n3 != 0)
+			memcpy(d3, s3, n3 * sizeof(cnt_id_t));
+	}
+}
+
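+/**
+ * Move all counter ids cached for the given queue into the wait-reset list
+ * using a zero-copy ring-to-ring copy.
+ */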
+static __rte_always_inline int
+mlx5_hws_cnt_pool_cache_flush(struct mlx5_hws_cnt_pool *cpool,
+			      uint32_t queue_id)
+{
+	unsigned int ret;
+	struct rte_ring_zc_data zcdr = {0};
+	struct rte_ring_zc_data zcdc = {0};
+	struct rte_ring *reset_list = NULL;
+	struct rte_ring *qcache = cpool->cache->qcache[queue_id];
+
+	ret = rte_ring_dequeue_zc_burst_elem_start(qcache,
+			sizeof(cnt_id_t), rte_ring_count(qcache), &zcdc,
+			NULL);
+	MLX5_ASSERT(ret);
+	reset_list = cpool->wait_reset_list;
+	rte_ring_enqueue_zc_burst_elem_start(reset_list,
+			sizeof(cnt_id_t), ret, &zcdr, NULL);
+	__hws_cnt_r2rcpy(&zcdr, &zcdc, ret);
+	rte_ring_enqueue_zc_elem_finish(reset_list, ret);
+	rte_ring_dequeue_zc_elem_finish(qcache, ret);
+	return 0;
+}
+
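+/**
+ * Refill the given queue's cache with up to fetch_sz counter ids, taken from
+ * the reuse list first, then from the global free list when the reuse list
+ * is empty.
+ */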
+static __rte_always_inline int
+mlx5_hws_cnt_pool_cache_fetch(struct mlx5_hws_cnt_pool *cpool,
+			      uint32_t queue_id)
+{
+	struct rte_ring *qcache = cpool->cache->qcache[queue_id];
+	struct rte_ring *free_list = NULL;
+	struct rte_ring *reuse_list = NULL;
+	struct rte_ring *list = NULL;
+	struct rte_ring_zc_data zcdf = {0};
+	struct rte_ring_zc_data zcdc = {0};
+	struct rte_ring_zc_data zcdu = {0};
+	struct rte_ring_zc_data zcds = {0};
+	struct mlx5_hws_cnt_pool_caches *cache = cpool->cache;
+	unsigned int ret;
+
+	reuse_list = cpool->reuse_list;
+	ret = rte_ring_dequeue_zc_burst_elem_start(reuse_list,
+			sizeof(cnt_id_t), cache->fetch_sz, &zcdu, NULL);
+	zcds = zcdu;
+	list = reuse_list;
+	if (unlikely(ret == 0)) { /* no reuse counter. */
+		rte_ring_dequeue_zc_elem_finish(reuse_list, 0);
+		free_list = cpool->free_list;
+		ret = rte_ring_dequeue_zc_burst_elem_start(free_list,
+				sizeof(cnt_id_t), cache->fetch_sz, &zcdf, NULL);
+		zcds = zcdf;
+		list = free_list;
+		if (unlikely(ret == 0)) { /* no free counter. */
+			rte_ring_dequeue_zc_elem_finish(free_list, 0);
+			if (rte_ring_count(cpool->wait_reset_list))
+				return -EAGAIN;
+			return -ENOENT;
+		}
+	}
+	rte_ring_enqueue_zc_burst_elem_start(qcache, sizeof(cnt_id_t),
+			ret, &zcdc, NULL);
+	__hws_cnt_r2rcpy(&zcdc, &zcds, ret);
+	rte_ring_dequeue_zc_elem_finish(list, ret);
+	rte_ring_enqueue_zc_elem_finish(qcache, ret);
+	return 0;
+}
+
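+/**
+ * Revert the last @p n elements enqueued on a single-producer ring and
+ * expose their slots through zero-copy descriptors so that the entries can
+ * be copied into another ring.
+ */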
+static __rte_always_inline int
+__mlx5_hws_cnt_pool_enqueue_revert(struct rte_ring *r, unsigned int n,
+		struct rte_ring_zc_data *zcd)
+{
+	uint32_t current_head = 0;
+	uint32_t revert2head = 0;
+
+	MLX5_ASSERT(r->prod.sync_type == RTE_RING_SYNC_ST);
+	MLX5_ASSERT(r->cons.sync_type == RTE_RING_SYNC_ST);
+	current_head = __atomic_load_n(&r->prod.head, __ATOMIC_RELAXED);
+	MLX5_ASSERT(n <= r->capacity);
+	MLX5_ASSERT(n <= rte_ring_count(r));
+	revert2head = current_head - n;
+	r->prod.head = revert2head; /* This ring should be SP. */
+	__rte_ring_get_elem_addr(r, revert2head, sizeof(cnt_id_t), n,
+			&zcd->ptr1, &zcd->n1, &zcd->ptr2);
+	/* Update tail */
+	__atomic_store_n(&r->prod.tail, revert2head, __ATOMIC_RELEASE);
+	return n;
+}
+
+/**
+ * Put one counter back into the pool.
+ *
+ * @param cpool
+ *   A pointer to the counter pool structure.
+ * @param queue
+ *   A pointer to the HWS queue. If NULL, the counter is returned to the
+ *   common pool (wait-reset list) directly.
+ * @param cnt_id
+ *   A pointer to the counter id to be put back.
+ * @return
+ *   - 0: Success; the counter was put back.
+ *   - -ENOENT: The counter could not be enqueued.
+ */
+static __rte_always_inline int
+mlx5_hws_cnt_pool_put(struct mlx5_hws_cnt_pool *cpool,
+		uint32_t *queue, cnt_id_t *cnt_id)
+{
+	unsigned int ret = 0;
+	struct rte_ring_zc_data zcdc = {0};
+	struct rte_ring_zc_data zcdr = {0};
+	struct rte_ring *qcache = NULL;
+	unsigned int wb_num = 0; /* cache write-back number. */
+	cnt_id_t iidx;
+
+	iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
+	cpool->pool[iidx].query_gen_when_free =
+		__atomic_load_n(&cpool->query_gen, __ATOMIC_RELAXED);
+	if (likely(queue != NULL))
+		qcache = cpool->cache->qcache[*queue];
+	if (unlikely(qcache == NULL)) {
+		ret = rte_ring_enqueue_elem(cpool->wait_reset_list, cnt_id,
+				sizeof(cnt_id_t));
+		MLX5_ASSERT(ret == 0);
+		return ret;
+	}
+	ret = rte_ring_enqueue_burst_elem(qcache, cnt_id, sizeof(cnt_id_t), 1,
+					  NULL);
+	if (unlikely(ret == 0)) { /* cache is full. */
+		wb_num = rte_ring_count(qcache) - cpool->cache->threshold;
+		MLX5_ASSERT(wb_num < rte_ring_count(qcache));
+		__mlx5_hws_cnt_pool_enqueue_revert(qcache, wb_num, &zcdc);
+		rte_ring_enqueue_zc_burst_elem_start(cpool->wait_reset_list,
+				sizeof(cnt_id_t), wb_num, &zcdr, NULL);
+		__hws_cnt_r2rcpy(&zcdr, &zcdc, wb_num);
+		rte_ring_enqueue_zc_elem_finish(cpool->wait_reset_list, wb_num);
+		/* write-back THIS counter too */
+		ret = rte_ring_enqueue_burst_elem(cpool->wait_reset_list,
+				cnt_id, sizeof(cnt_id_t), 1, NULL);
+	}
+	return ret == 1 ? 0 : -ENOENT;
+}
+
+/**
+ * Get one counter from the pool.
+ *
+ * If @p queue is not NULL, counters are retrieved first from the queue's
+ * cache and then from the common pool. Note that this can return -ENOENT
+ * when the local cache and the common pool are empty, even if the caches of
+ * other queues are full.
+ *
+ * @param cpool
+ *   A pointer to the counter pool structure.
+ * @param queue
+ *   A pointer to the HWS queue. If NULL, the counter is fetched from the
+ *   common pool.
+ * @param cnt_id
+ *   A pointer to a cnt_id_t (counter id) that will be filled in.
+ * @return
+ *   - 0: Success; a counter was taken.
+ *   - -ENOENT: Not enough entries in the pool; no counter is retrieved.
+ *   - -EAGAIN: The counter is not ready yet; try again.
+ */
+static __rte_always_inline int
+mlx5_hws_cnt_pool_get(struct mlx5_hws_cnt_pool *cpool,
+		uint32_t *queue, cnt_id_t *cnt_id)
+{
+	unsigned int ret;
+	struct rte_ring_zc_data zcdc = {0};
+	struct rte_ring *qcache = NULL;
+	uint32_t query_gen = 0;
+	cnt_id_t iidx, tmp_cid = 0;
+
+	if (likely(queue != NULL))
+		qcache = cpool->cache->qcache[*queue];
+	if (unlikely(qcache == NULL)) {
+		ret = rte_ring_dequeue_elem(cpool->reuse_list, &tmp_cid,
+				sizeof(cnt_id_t));
+		if (unlikely(ret != 0)) {
+			ret = rte_ring_dequeue_elem(cpool->free_list, &tmp_cid,
+					sizeof(cnt_id_t));
+			if (unlikely(ret != 0)) {
+				if (rte_ring_count(cpool->wait_reset_list))
+					return -EAGAIN;
+				return -ENOENT;
+			}
+		}
+		*cnt_id = tmp_cid;
+		iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
+		__hws_cnt_query_raw(cpool, *cnt_id,
+				    &cpool->pool[iidx].reset.hits,
+				    &cpool->pool[iidx].reset.bytes);
+		return 0;
+	}
+	ret = rte_ring_dequeue_zc_burst_elem_start(qcache, sizeof(cnt_id_t), 1,
+			&zcdc, NULL);
+	if (unlikely(ret == 0)) { /* local cache is empty. */
+		rte_ring_dequeue_zc_elem_finish(qcache, 0);
+		/* let's fetch from global free list. */
+		ret = mlx5_hws_cnt_pool_cache_fetch(cpool, *queue);
+		if (unlikely(ret != 0))
+			return ret;
+		rte_ring_dequeue_zc_burst_elem_start(qcache, sizeof(cnt_id_t),
+				1, &zcdc, NULL);
+	}
+	/* get one from local cache. */
+	*cnt_id = (*(cnt_id_t *)zcdc.ptr1);
+	iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
+	query_gen = cpool->pool[iidx].query_gen_when_free;
+	if (cpool->query_gen == query_gen) { /* counter is waiting to reset. */
+		rte_ring_dequeue_zc_elem_finish(qcache, 0);
+		/* write-back counter to reset list. */
+		mlx5_hws_cnt_pool_cache_flush(cpool, *queue);
+		/* let's fetch from global free list. */
+		ret = mlx5_hws_cnt_pool_cache_fetch(cpool, *queue);
+		if (unlikely(ret != 0))
+			return ret;
+		rte_ring_dequeue_zc_burst_elem_start(qcache, sizeof(cnt_id_t),
+				1, &zcdc, NULL);
+		*cnt_id = *(cnt_id_t *)zcdc.ptr1;
+		iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
+	}
+	__hws_cnt_query_raw(cpool, *cnt_id, &cpool->pool[iidx].reset.hits,
+			    &cpool->pool[iidx].reset.bytes);
+	rte_ring_dequeue_zc_elem_finish(qcache, 1);
+	cpool->pool[iidx].share = 0;
+	return 0;
+}
+
+static __rte_always_inline unsigned int
+mlx5_hws_cnt_pool_get_size(struct mlx5_hws_cnt_pool *cpool)
+{
+	return rte_ring_get_capacity(cpool->free_list);
+}
+
+static __rte_always_inline int
+mlx5_hws_cnt_pool_get_action_offset(struct mlx5_hws_cnt_pool *cpool,
+		cnt_id_t cnt_id, struct mlx5dr_action **action,
+		uint32_t *offset)
+{
+	uint8_t idx = cnt_id >> MLX5_HWS_CNT_DCS_IDX_OFFSET;
+
+	idx &= MLX5_HWS_CNT_DCS_IDX_MASK;
+	*action = cpool->dcs_mng.dcs[idx].dr_action;
+	*offset = cnt_id & MLX5_HWS_CNT_IDX_MASK;
+	return 0;
+}
+
+static __rte_always_inline int
+mlx5_hws_cnt_shared_get(struct mlx5_hws_cnt_pool *cpool, cnt_id_t *cnt_id)
+{
+	int ret;
+	uint32_t iidx;
+
+	ret = mlx5_hws_cnt_pool_get(cpool, NULL, cnt_id);
+	if (ret != 0)
+		return ret;
+	iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
+	MLX5_ASSERT(cpool->pool[iidx].share == 0);
+	cpool->pool[iidx].share = 1;
+	return 0;
+}
+
+static __rte_always_inline int
+mlx5_hws_cnt_shared_put(struct mlx5_hws_cnt_pool *cpool, cnt_id_t *cnt_id)
+{
+	int ret;
+	uint32_t iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
+
+	cpool->pool[iidx].share = 0;
+	ret = mlx5_hws_cnt_pool_put(cpool, NULL, cnt_id);
+	if (unlikely(ret != 0))
+		cpool->pool[iidx].share = 1; /* fail to release, restore. */
+	return ret;
+}
+
+static __rte_always_inline bool
+mlx5_hws_cnt_is_shared(struct mlx5_hws_cnt_pool *cpool, cnt_id_t cnt_id)
+{
+	uint32_t iidx = mlx5_hws_cnt_iidx(cpool, cnt_id);
+
+	return cpool->pool[iidx].share ? true : false;
+}
+
+/* init HWS counter pool. */
+struct mlx5_hws_cnt_pool *
+mlx5_hws_cnt_pool_init(const struct mlx5_hws_cnt_pool_cfg *pcfg,
+		const struct mlx5_hws_cache_param *ccfg);
+
+void
+mlx5_hws_cnt_pool_deinit(struct mlx5_hws_cnt_pool *cntp);
+
+int
+mlx5_hws_cnt_service_thread_create(struct mlx5_dev_ctx_shared *sh);
+
+void
+mlx5_hws_cnt_service_thread_destroy(struct mlx5_dev_ctx_shared *sh);
+
+int
+mlx5_hws_cnt_pool_dcs_alloc(struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_hws_cnt_pool *cpool);
+void
+mlx5_hws_cnt_pool_dcs_free(struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_hws_cnt_pool *cpool);
+
+int
+mlx5_hws_cnt_pool_action_create(struct mlx5_priv *priv,
+		struct mlx5_hws_cnt_pool *cpool);
+
+void
+mlx5_hws_cnt_pool_action_destroy(struct mlx5_hws_cnt_pool *cpool);
+
+struct mlx5_hws_cnt_pool *
+mlx5_hws_cnt_pool_create(struct rte_eth_dev *dev,
+		const struct rte_flow_port_attr *pattr, uint16_t nb_queue);
+
+void
+mlx5_hws_cnt_pool_destroy(struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_hws_cnt_pool *cpool);
+
+int
+mlx5_hws_cnt_svc_init(struct mlx5_dev_ctx_shared *sh);
+
+void
+mlx5_hws_cnt_svc_deinit(struct mlx5_dev_ctx_shared *sh);
+
+#endif /* _MLX5_HWS_CNT_H_ */
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v2 09/17] net/mlx5: support DR action template API
  2022-09-28  3:31 ` [PATCH v2 00/17] net/mlx5: HW steering PMD update Suanming Mou
                     ` (7 preceding siblings ...)
  2022-09-28  3:31   ` [PATCH v2 08/17] net/mlx5: add HW steering counter action Suanming Mou
@ 2022-09-28  3:31   ` Suanming Mou
  2022-09-28  3:31   ` [PATCH v2 10/17] net/mlx5: add HW steering connection tracking support Suanming Mou
                     ` (7 subsequent siblings)
  16 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-28  3:31 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Dariusz Sosnowski

From: Dariusz Sosnowski <dsosnowski@nvidia.com>

This patch adapts mlx5 PMD to changes in mlx5dr API regarding action
templates. It changes the following:

1. Actions template creation:

    - Flow action types are translated to mlx5dr action types in order
      to create mlx5dr_action_template object.
    - An offset is assigned to each flow action. This offset is used to
      predetermine the action's location in the rule_acts array passed on
      rule creation.

2. Template table creation:

    - Fixed actions are created and put in rule_acts cache using
      predetermined offsets.
    - mlx5dr matcher is parametrized by action templates bound to
      template table.
    - mlx5dr matcher is configured to optimize rule creation based on
      passed rule indices.

3. Flow rule creation:

    - mlx5dr rule is parametrized by the action template on which the
      rule's actions are based.
    - Rule index hint is provided to mlx5dr.
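
For context, the following rough sketch (illustrative only, not part of this
patch) shows how such templates are typically driven through the generic
rte_flow asynchronous API; port_id, queue_id, pattern, actions, masks and the
prior rte_flow_configure()/queue setup are assumed to exist elsewhere:

    struct rte_flow_pattern_template_attr pt_attr = { .ingress = 1 };
    struct rte_flow_actions_template_attr at_attr = { .ingress = 1 };
    struct rte_flow_template_table_attr tbl_attr = {
            .flow_attr = { .group = 1, .ingress = 1 },
            .nb_flows = 1 << 16,
    };
    struct rte_flow_op_attr op_attr = { .postpone = 0 };
    struct rte_flow_pattern_template *pt;
    struct rte_flow_actions_template *at;
    struct rte_flow_template_table *tbl;
    struct rte_flow *flow;
    struct rte_flow_error err;

    pt = rte_flow_pattern_template_create(port_id, &pt_attr, pattern, &err);
    /* Step 1: flow action types are translated to mlx5dr action types and
     * DR offsets are assigned here.
     */
    at = rte_flow_actions_template_create(port_id, &at_attr, actions, masks,
                                          &err);
    /* Step 2: fixed actions are pre-created and the mlx5dr matcher is
     * parametrized by the bound action templates.
     */
    tbl = rte_flow_template_table_create(port_id, &tbl_attr, &pt, 1, &at, 1,
                                         &err);
    /* Step 3: rule creation reuses the precomputed offsets and passes a
     * rule index hint to mlx5dr.
     */
    flow = rte_flow_async_create(port_id, queue_id, &op_attr, tbl,
                                 pattern, 0, actions, 0, NULL, &err);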

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_os.c |   1 +
 drivers/net/mlx5/mlx5.c          |   4 +-
 drivers/net/mlx5/mlx5.h          |   2 +
 drivers/net/mlx5/mlx5_flow.h     |  30 +-
 drivers/net/mlx5/mlx5_flow_hw.c  | 598 +++++++++++++++++++++++++------
 drivers/net/mlx5/mlx5_trigger.c  |  10 +
 6 files changed, 523 insertions(+), 122 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 061b825e7b..65795da516 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1565,6 +1565,7 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 		}
 		/* Only HWS requires this information. */
 		flow_hw_init_tags_set(eth_dev);
+		flow_hw_init_flow_metadata_config(eth_dev);
 		if (priv->sh->config.dv_esw_en &&
 		    flow_hw_create_vport_action(eth_dev)) {
 			DRV_LOG(ERR, "port %u failed to create vport action",
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index b6a66f12ee..cf7b7b7158 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1969,8 +1969,10 @@ mlx5_dev_close(struct rte_eth_dev *dev)
 	flow_hw_destroy_vport_action(dev);
 	flow_hw_resource_release(dev);
 	flow_hw_clear_port_info(dev);
-	if (priv->sh->config.dv_flow_en == 2)
+	if (priv->sh->config.dv_flow_en == 2) {
+		flow_hw_clear_flow_metadata_config();
 		flow_hw_clear_tags_set(dev);
+	}
 #endif
 	if (priv->rxq_privs != NULL) {
 		/* XXX race condition if mlx5_rx_burst() is still running. */
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 8875b96faf..26f627ae1b 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1644,6 +1644,8 @@ struct mlx5_priv {
 	struct mlx5dr_action *hw_drop[2];
 	/* HW steering global tag action. */
 	struct mlx5dr_action *hw_tag[2];
+	/* HW steering rte flow tables created before port start, pending translation. */
+	LIST_HEAD(flow_hw_tbl_ongo, rte_flow_template_table) flow_hw_tbl_ongo;
 	struct mlx5_indexed_pool *acts_ipool; /* Action data indexed pool. */
 #endif
 };
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 8f1b66eaac..ae1417f10e 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1175,6 +1175,11 @@ struct rte_flow_actions_template {
 	struct rte_flow_actions_template_attr attr;
 	struct rte_flow_action *actions; /* Cached flow actions. */
 	struct rte_flow_action *masks; /* Cached action masks.*/
+	struct mlx5dr_action_template *tmpl; /* mlx5dr action template. */
+	uint16_t dr_actions_num; /* Number of DR rule actions. */
+	uint16_t actions_num; /* Number of flow actions. */
+	uint16_t *actions_off; /* DR action offset for each rte flow action. */
+	uint16_t reformat_off; /* Offset of DR reformat action. */
 	uint16_t mhdr_off; /* Offset of DR modify header action. */
 	uint32_t refcnt; /* Reference counter. */
 	uint16_t rx_cpy_pos; /* Action position of Rx metadata to be copied. */
@@ -1226,7 +1231,6 @@ struct mlx5_hw_actions {
 	/* Encap/Decap action. */
 	struct mlx5_hw_encap_decap_action *encap_decap;
 	uint16_t encap_decap_pos; /* Encap/Decap action position. */
-	uint32_t acts_num:4; /* Total action number. */
 	uint32_t mark:1; /* Indicate the mark action. */
 	uint32_t cnt_id; /* Counter id. */
 	/* Translated DR action array from action template. */
@@ -1482,6 +1486,13 @@ flow_hw_get_wire_port(struct ibv_context *ibctx)
 }
 #endif
 
+extern uint32_t mlx5_flow_hw_flow_metadata_config_refcnt;
+extern uint8_t mlx5_flow_hw_flow_metadata_esw_en;
+extern uint8_t mlx5_flow_hw_flow_metadata_xmeta_en;
+
+void flow_hw_init_flow_metadata_config(struct rte_eth_dev *dev);
+void flow_hw_clear_flow_metadata_config(void);
+
 /*
  * Convert metadata or tag to the actual register.
  * META: Can only be used to match in the FDB in this stage, fixed C_1.
@@ -1493,7 +1504,20 @@ flow_hw_get_reg_id(enum rte_flow_item_type type, uint32_t id)
 {
 	switch (type) {
 	case RTE_FLOW_ITEM_TYPE_META:
-		return REG_C_1;
+		if (mlx5_flow_hw_flow_metadata_esw_en &&
+		    mlx5_flow_hw_flow_metadata_xmeta_en == MLX5_XMETA_MODE_META32_HWS) {
+			return REG_C_1;
+		}
+		/*
+		 * On root table - PMD allows only egress META matching, thus
+		 * REG_A matching is sufficient.
+		 *
+		 * On non-root tables - REG_A corresponds to general_purpose_lookup_field,
+		 * which translates to REG_A in NIC TX and to REG_B in NIC RX.
+		 * However, current FW does not implement REG_B case right now, so
+		 * REG_B case should be rejected on pattern template validation.
+		 */
+		return REG_A;
 	case RTE_FLOW_ITEM_TYPE_TAG:
 		MLX5_ASSERT(id < MLX5_FLOW_HW_TAGS_MAX);
 		return mlx5_flow_hw_avl_tags[id];
@@ -2402,4 +2426,6 @@ int mlx5_flow_pattern_validate(struct rte_eth_dev *dev,
 		const struct rte_flow_pattern_template_attr *attr,
 		const struct rte_flow_item items[],
 		struct rte_flow_error *error);
+int flow_hw_table_update(struct rte_eth_dev *dev,
+			 struct rte_flow_error *error);
 #endif /* RTE_PMD_MLX5_FLOW_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 6778536031..2798fedd00 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -967,33 +967,29 @@ flow_hw_represented_port_compile(struct rte_eth_dev *dev,
 static __rte_always_inline int
 flow_hw_meter_compile(struct rte_eth_dev *dev,
 		      const struct mlx5_flow_template_table_cfg *cfg,
-		      uint32_t  start_pos, const struct rte_flow_action *action,
-		      struct mlx5_hw_actions *acts, uint32_t *end_pos,
+		      uint16_t aso_mtr_pos,
+		      uint16_t jump_pos,
+		      const struct rte_flow_action *action,
+		      struct mlx5_hw_actions *acts,
 		      struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_aso_mtr *aso_mtr;
 	const struct rte_flow_action_meter *meter = action->conf;
-	uint32_t pos = start_pos;
 	uint32_t group = cfg->attr.flow_attr.group;
 
 	aso_mtr = mlx5_aso_meter_by_idx(priv, meter->mtr_id);
-	acts->rule_acts[pos].action = priv->mtr_bulk.action;
-	acts->rule_acts[pos].aso_meter.offset = aso_mtr->offset;
-		acts->jump = flow_hw_jump_action_register
+	acts->rule_acts[aso_mtr_pos].action = priv->mtr_bulk.action;
+	acts->rule_acts[aso_mtr_pos].aso_meter.offset = aso_mtr->offset;
+	acts->jump = flow_hw_jump_action_register
 		(dev, cfg, aso_mtr->fm.group, error);
-	if (!acts->jump) {
-		*end_pos = start_pos;
+	if (!acts->jump)
 		return -ENOMEM;
-	}
-	acts->rule_acts[++pos].action = (!!group) ?
+	acts->rule_acts[jump_pos].action = (!!group) ?
 				    acts->jump->hws_action :
 				    acts->jump->root_action;
-	*end_pos = pos;
-	if (mlx5_aso_mtr_wait(priv->sh, aso_mtr)) {
-		*end_pos = start_pos;
+	if (mlx5_aso_mtr_wait(priv->sh, aso_mtr))
 		return -ENOMEM;
-	}
 	return 0;
 }
 
@@ -1046,11 +1042,11 @@ flow_hw_cnt_compile(struct rte_eth_dev *dev, uint32_t  start_pos,
  *    Table on success, NULL otherwise and rte_errno is set.
  */
 static int
-flow_hw_actions_translate(struct rte_eth_dev *dev,
-			  const struct mlx5_flow_template_table_cfg *cfg,
-			  struct mlx5_hw_actions *acts,
-			  struct rte_flow_actions_template *at,
-			  struct rte_flow_error *error)
+__flow_hw_actions_translate(struct rte_eth_dev *dev,
+			    const struct mlx5_flow_template_table_cfg *cfg,
+			    struct mlx5_hw_actions *acts,
+			    struct rte_flow_actions_template *at,
+			    struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	const struct rte_flow_template_table_attr *table_attr = &cfg->attr;
@@ -1061,12 +1057,15 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 	enum mlx5dr_action_reformat_type refmt_type = 0;
 	const struct rte_flow_action_raw_encap *raw_encap_data;
 	const struct rte_flow_item *enc_item = NULL, *enc_item_m = NULL;
-	uint16_t reformat_pos = MLX5_HW_MAX_ACTS, reformat_src = 0;
+	uint16_t reformat_src = 0;
 	uint8_t *encap_data = NULL, *encap_data_m = NULL;
 	size_t data_size = 0;
 	struct mlx5_hw_modify_header_action mhdr = { 0 };
 	bool actions_end = false;
-	uint32_t type, i;
+	uint32_t type;
+	bool reformat_used = false;
+	uint16_t action_pos;
+	uint16_t jump_pos;
 	int err;
 
 	flow_hw_modify_field_init(&mhdr, at);
@@ -1076,46 +1075,53 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 		type = MLX5DR_TABLE_TYPE_NIC_TX;
 	else
 		type = MLX5DR_TABLE_TYPE_NIC_RX;
-	for (i = 0; !actions_end; actions++, masks++) {
+	for (; !actions_end; actions++, masks++) {
 		switch (actions->type) {
 		case RTE_FLOW_ACTION_TYPE_INDIRECT:
+			action_pos = at->actions_off[actions - action_start];
 			if (!attr->group) {
 				DRV_LOG(ERR, "Indirect action is not supported in root table.");
 				goto err;
 			}
 			if (actions->conf && masks->conf) {
 				if (flow_hw_shared_action_translate
-				(dev, actions, acts, actions - action_start, i))
+				(dev, actions, acts, actions - action_start, action_pos))
 					goto err;
 			} else if (__flow_hw_act_data_general_append
 					(priv, acts, actions->type,
-					 actions - action_start, i)){
+					 actions - action_start, action_pos)){
 				goto err;
 			}
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			break;
 		case RTE_FLOW_ACTION_TYPE_DROP:
-			acts->rule_acts[i++].action =
+			action_pos = at->actions_off[actions - action_start];
+			acts->rule_acts[action_pos].action =
 				priv->hw_drop[!!attr->group];
 			break;
 		case RTE_FLOW_ACTION_TYPE_MARK:
+			action_pos = at->actions_off[actions - action_start];
 			acts->mark = true;
-			if (masks->conf)
-				acts->rule_acts[i].tag.value =
+			if (masks->conf &&
+			    ((const struct rte_flow_action_mark *)
+			     masks->conf)->id)
+				acts->rule_acts[action_pos].tag.value =
 					mlx5_flow_mark_set
 					(((const struct rte_flow_action_mark *)
 					(masks->conf))->id);
 			else if (__flow_hw_act_data_general_append(priv, acts,
-				actions->type, actions - action_start, i))
+				actions->type, actions - action_start, action_pos))
 				goto err;
-			acts->rule_acts[i++].action =
+			acts->rule_acts[action_pos].action =
 				priv->hw_tag[!!attr->group];
 			flow_hw_rxq_flag_set(dev, true);
 			break;
 		case RTE_FLOW_ACTION_TYPE_JUMP:
-			if (masks->conf) {
+			action_pos = at->actions_off[actions - action_start];
+			if (masks->conf &&
+			    ((const struct rte_flow_action_jump *)
+			     masks->conf)->group) {
 				uint32_t jump_group =
 					((const struct rte_flow_action_jump *)
 					actions->conf)->group;
@@ -1123,76 +1129,77 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 						(dev, cfg, jump_group, error);
 				if (!acts->jump)
 					goto err;
-				acts->rule_acts[i].action = (!!attr->group) ?
+				acts->rule_acts[action_pos].action = (!!attr->group) ?
 						acts->jump->hws_action :
 						acts->jump->root_action;
 			} else if (__flow_hw_act_data_general_append
 					(priv, acts, actions->type,
-					 actions - action_start, i)){
+					 actions - action_start, action_pos)){
 				goto err;
 			}
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_QUEUE:
-			if (masks->conf) {
+			action_pos = at->actions_off[actions - action_start];
+			if (masks->conf &&
+			    ((const struct rte_flow_action_queue *)
+			     masks->conf)->index) {
 				acts->tir = flow_hw_tir_action_register
 				(dev,
 				 mlx5_hw_act_flag[!!attr->group][type],
 				 actions);
 				if (!acts->tir)
 					goto err;
-				acts->rule_acts[i].action =
+				acts->rule_acts[action_pos].action =
 					acts->tir->action;
 			} else if (__flow_hw_act_data_general_append
 					(priv, acts, actions->type,
-					 actions - action_start, i)) {
+					 actions - action_start, action_pos)) {
 				goto err;
 			}
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RSS:
-			if (masks->conf) {
+			action_pos = at->actions_off[actions - action_start];
+			if (actions->conf && masks->conf) {
 				acts->tir = flow_hw_tir_action_register
 				(dev,
 				 mlx5_hw_act_flag[!!attr->group][type],
 				 actions);
 				if (!acts->tir)
 					goto err;
-				acts->rule_acts[i].action =
+				acts->rule_acts[action_pos].action =
 					acts->tir->action;
 			} else if (__flow_hw_act_data_general_append
 					(priv, acts, actions->type,
-					 actions - action_start, i)) {
+					 actions - action_start, action_pos)) {
 				goto err;
 			}
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
-			MLX5_ASSERT(reformat_pos == MLX5_HW_MAX_ACTS);
+			MLX5_ASSERT(!reformat_used);
 			enc_item = ((const struct rte_flow_action_vxlan_encap *)
 				   actions->conf)->definition;
 			if (masks->conf)
 				enc_item_m = ((const struct rte_flow_action_vxlan_encap *)
 					     masks->conf)->definition;
-			reformat_pos = i++;
+			reformat_used = true;
 			reformat_src = actions - action_start;
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_L2_TO_TNL_L2;
 			break;
 		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
-			MLX5_ASSERT(reformat_pos == MLX5_HW_MAX_ACTS);
+			MLX5_ASSERT(!reformat_used);
 			enc_item = ((const struct rte_flow_action_nvgre_encap *)
 				   actions->conf)->definition;
 			if (masks->conf)
 				enc_item_m = ((const struct rte_flow_action_nvgre_encap *)
 					     masks->conf)->definition;
-			reformat_pos = i++;
+			reformat_used = true;
 			reformat_src = actions - action_start;
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_L2_TO_TNL_L2;
 			break;
 		case RTE_FLOW_ACTION_TYPE_VXLAN_DECAP:
 		case RTE_FLOW_ACTION_TYPE_NVGRE_DECAP:
-			MLX5_ASSERT(reformat_pos == MLX5_HW_MAX_ACTS);
-			reformat_pos = i++;
+			MLX5_ASSERT(!reformat_used);
+			reformat_used = true;
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_TNL_L2_TO_L2;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
@@ -1206,25 +1213,23 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 				 actions->conf;
 			encap_data = raw_encap_data->data;
 			data_size = raw_encap_data->size;
-			if (reformat_pos != MLX5_HW_MAX_ACTS) {
+			if (reformat_used) {
 				refmt_type = data_size <
 				MLX5_ENCAPSULATION_DECISION_SIZE ?
 				MLX5DR_ACTION_REFORMAT_TYPE_TNL_L3_TO_L2 :
 				MLX5DR_ACTION_REFORMAT_TYPE_L2_TO_TNL_L3;
 			} else {
-				reformat_pos = i++;
+				reformat_used = true;
 				refmt_type =
 				MLX5DR_ACTION_REFORMAT_TYPE_L2_TO_TNL_L2;
 			}
 			reformat_src = actions - action_start;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RAW_DECAP:
-			reformat_pos = i++;
+			reformat_used = true;
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_TNL_L2_TO_L2;
 			break;
 		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
-			if (mhdr.pos == UINT16_MAX)
-				mhdr.pos = i++;
 			err = flow_hw_modify_field_compile(dev, attr, action_start,
 							   actions, masks, acts, &mhdr,
 							   error);
@@ -1242,40 +1247,46 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 				action_start += 1;
 			break;
 		case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
+			action_pos = at->actions_off[actions - action_start];
 			if (flow_hw_represented_port_compile
 					(dev, attr, action_start, actions,
-					 masks, acts, i, error))
+					 masks, acts, action_pos, error))
 				goto err;
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_METER:
+			/*
+			 * METER action is compiled to 2 DR actions - ASO_METER and FT.
+			 * Calculated DR offset is stored only for ASO_METER and FT
+			 * is assumed to be the next action.
+			 */
+			action_pos = at->actions_off[actions - action_start];
+			jump_pos = action_pos + 1;
 			if (actions->conf && masks->conf &&
 			    ((const struct rte_flow_action_meter *)
 			     masks->conf)->mtr_id) {
 				err = flow_hw_meter_compile(dev, cfg,
-						i, actions, acts, &i, error);
+						action_pos, jump_pos, actions, acts, error);
 				if (err)
 					goto err;
 			} else if (__flow_hw_act_data_general_append(priv, acts,
 							actions->type,
 							actions - action_start,
-							i))
+							action_pos))
 				goto err;
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_COUNT:
+			action_pos = at->actions_off[actions - action_start];
 			if (masks->conf &&
 			    ((const struct rte_flow_action_count *)
 			     masks->conf)->id) {
-				err = flow_hw_cnt_compile(dev, i, acts);
+				err = flow_hw_cnt_compile(dev, action_pos, acts);
 				if (err)
 					goto err;
 			} else if (__flow_hw_act_data_general_append
 					(priv, acts, actions->type,
-					 actions - action_start, i)) {
+					 actions - action_start, action_pos)) {
 				goto err;
 			}
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
@@ -1309,10 +1320,11 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			goto err;
 		acts->rule_acts[acts->mhdr->pos].action = acts->mhdr->action;
 	}
-	if (reformat_pos != MLX5_HW_MAX_ACTS) {
+	if (reformat_used) {
 		uint8_t buf[MLX5_ENCAP_MAX_LEN];
 		bool shared_rfmt = true;
 
+		MLX5_ASSERT(at->reformat_off != UINT16_MAX);
 		if (enc_item) {
 			MLX5_ASSERT(!encap_data);
 			if (flow_dv_convert_encap_data(enc_item, buf, &data_size, error))
@@ -1340,20 +1352,17 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 				 (shared_rfmt ? MLX5DR_ACTION_FLAG_SHARED : 0));
 		if (!acts->encap_decap->action)
 			goto err;
-		acts->rule_acts[reformat_pos].action =
-						acts->encap_decap->action;
-		acts->rule_acts[reformat_pos].reformat.data =
-						acts->encap_decap->data;
+		acts->rule_acts[at->reformat_off].action = acts->encap_decap->action;
+		acts->rule_acts[at->reformat_off].reformat.data = acts->encap_decap->data;
 		if (shared_rfmt)
-			acts->rule_acts[reformat_pos].reformat.offset = 0;
+			acts->rule_acts[at->reformat_off].reformat.offset = 0;
 		else if (__flow_hw_act_data_encap_append(priv, acts,
 				 (action_start + reformat_src)->type,
-				 reformat_src, reformat_pos, data_size))
+				 reformat_src, at->reformat_off, data_size))
 			goto err;
 		acts->encap_decap->shared = shared_rfmt;
-		acts->encap_decap_pos = reformat_pos;
+		acts->encap_decap_pos = at->reformat_off;
 	}
-	acts->acts_num = i;
 	return 0;
 err:
 	err = rte_errno;
@@ -1363,6 +1372,40 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 				  "fail to create rte table");
 }
 
+/**
+ * Translate rte_flow actions to DR action.
+ *
+ * @param[in] dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param[in] tbl
+ *   Pointer to the flow template table.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *    0 on success, negative value otherwise and rte_errno is set.
+ */
+static int
+flow_hw_actions_translate(struct rte_eth_dev *dev,
+			  struct rte_flow_template_table *tbl,
+			  struct rte_flow_error *error)
+{
+	uint32_t i;
+
+	for (i = 0; i < tbl->nb_action_templates; i++) {
+		if (__flow_hw_actions_translate(dev, &tbl->cfg,
+						&tbl->ats[i].acts,
+						tbl->ats[i].action_template,
+						error))
+			goto err;
+	}
+	return 0;
+err:
+	while (i--)
+		__flow_hw_action_template_destroy(dev, &tbl->ats[i].acts);
+	return -1;
+}
+
 /**
  * Get shared indirect action.
  *
@@ -1611,16 +1654,17 @@ flow_hw_modify_field_construct(struct mlx5_hw_q_job *job,
 static __rte_always_inline int
 flow_hw_actions_construct(struct rte_eth_dev *dev,
 			  struct mlx5_hw_q_job *job,
-			  const struct mlx5_hw_actions *hw_acts,
+			  const struct mlx5_hw_action_template *hw_at,
 			  const uint8_t it_idx,
 			  const struct rte_flow_action actions[],
 			  struct mlx5dr_rule_action *rule_acts,
-			  uint32_t *acts_num,
 			  uint32_t queue)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow_template_table *table = job->flow->table;
 	struct mlx5_action_construct_data *act_data;
+	const struct rte_flow_actions_template *at = hw_at->action_template;
+	const struct mlx5_hw_actions *hw_acts = &hw_at->acts;
 	const struct rte_flow_action *action;
 	const struct rte_flow_action_raw_encap *raw_encap_data;
 	const struct rte_flow_item *enc_item = NULL;
@@ -1636,11 +1680,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	struct mlx5_aso_mtr *mtr;
 	uint32_t mtr_id;
 
-	memcpy(rule_acts, hw_acts->rule_acts,
-	       sizeof(*rule_acts) * hw_acts->acts_num);
-	*acts_num = hw_acts->acts_num;
-	if (LIST_EMPTY(&hw_acts->act_list))
-		return 0;
+	rte_memcpy(rule_acts, hw_acts->rule_acts, sizeof(*rule_acts) * at->dr_actions_num);
 	attr.group = table->grp->group_id;
 	ft_flag = mlx5_hw_act_flag[!!table->grp->group_id][table->type];
 	if (table->type == MLX5DR_TABLE_TYPE_FDB) {
@@ -1774,7 +1814,6 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 							 jump->root_action;
 			job->flow->jump = jump;
 			job->flow->fate_type = MLX5_FLOW_FATE_JUMP;
-			(*acts_num)++;
 			if (mlx5_aso_mtr_wait(priv->sh, mtr))
 				return -1;
 			break;
@@ -1912,13 +1951,16 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 		.burst = attr->postpone,
 	};
 	struct mlx5dr_rule_action rule_acts[MLX5_HW_MAX_ACTS];
-	struct mlx5_hw_actions *hw_acts;
 	struct rte_flow_hw *flow;
 	struct mlx5_hw_q_job *job;
 	const struct rte_flow_item *rule_items;
-	uint32_t acts_num, flow_idx;
+	uint32_t flow_idx;
 	int ret;
 
+	if (unlikely((!dev->data->dev_started))) {
+		rte_errno = EINVAL;
+		goto error;
+	}
 	if (unlikely(!priv->hw_q[queue].job_idx)) {
 		rte_errno = ENOMEM;
 		goto error;
@@ -1941,7 +1983,12 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	job->flow = flow;
 	job->user_data = user_data;
 	rule_attr.user_data = job;
-	hw_acts = &table->ats[action_template_index].acts;
+	/*
+	 * Indexed pool returns 1-based indices, but mlx5dr expects 0-based indices for rule
+	 * insertion hints.
+	 */
+	MLX5_ASSERT(flow_idx > 0);
+	rule_attr.rule_idx = flow_idx - 1;
 	/*
 	 * Construct the flow actions based on the input actions.
 	 * The implicitly appended action is always fixed, like metadata
@@ -1949,8 +1996,8 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	 * No need to copy and contrust a new "actions" list based on the
 	 * user's input, in order to save the cost.
 	 */
-	if (flow_hw_actions_construct(dev, job, hw_acts, pattern_template_index,
-				  actions, rule_acts, &acts_num, queue)) {
+	if (flow_hw_actions_construct(dev, job, &table->ats[action_template_index],
+				      pattern_template_index, actions, rule_acts, queue)) {
 		rte_errno = EINVAL;
 		goto free;
 	}
@@ -1959,7 +2006,7 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	if (!rule_items)
 		goto free;
 	ret = mlx5dr_rule_create(table->matcher,
-				 pattern_template_index, items,
+				 pattern_template_index, rule_items,
 				 action_template_index, rule_acts,
 				 &rule_attr, &flow->rule);
 	if (likely(!ret))
@@ -2293,6 +2340,7 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	struct rte_flow_template_table *tbl = NULL;
 	struct mlx5_flow_group *grp;
 	struct mlx5dr_match_template *mt[MLX5_HW_TBL_MAX_ITEM_TEMPLATE];
+	struct mlx5dr_action_template *at[MLX5_HW_TBL_MAX_ACTION_TEMPLATE];
 	const struct rte_flow_template_table_attr *attr = &table_cfg->attr;
 	struct rte_flow_attr flow_attr = attr->flow_attr;
 	struct mlx5_flow_cb_ctx ctx = {
@@ -2313,6 +2361,7 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	struct mlx5_list_entry *ge;
 	uint32_t i, max_tpl = MLX5_HW_TBL_MAX_ITEM_TEMPLATE;
 	uint32_t nb_flows = rte_align32pow2(attr->nb_flows);
+	bool port_started = !!dev->data->dev_started;
 	int err;
 
 	/* HWS layer accepts only 1 item template with root table. */
@@ -2347,12 +2396,20 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	tbl->grp = grp;
 	/* Prepare matcher information. */
 	matcher_attr.priority = attr->flow_attr.priority;
+	matcher_attr.optimize_using_rule_idx = true;
 	matcher_attr.mode = MLX5DR_MATCHER_RESOURCE_MODE_RULE;
 	matcher_attr.rule.num_log = rte_log2_u32(nb_flows);
 	/* Build the item template. */
 	for (i = 0; i < nb_item_templates; i++) {
 		uint32_t ret;
 
+		if ((flow_attr.ingress && !item_templates[i]->attr.ingress) ||
+		    (flow_attr.egress && !item_templates[i]->attr.egress) ||
+		    (flow_attr.transfer && !item_templates[i]->attr.transfer)) {
+			DRV_LOG(ERR, "pattern template and template table attribute mismatch");
+			rte_errno = EINVAL;
+			goto it_error;
+		}
 		ret = __atomic_add_fetch(&item_templates[i]->refcnt, 1,
 					 __ATOMIC_RELAXED);
 		if (ret <= 1) {
@@ -2362,10 +2419,6 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 		mt[i] = item_templates[i]->mt;
 		tbl->its[i] = item_templates[i];
 	}
-	tbl->matcher = mlx5dr_matcher_create
-		(tbl->grp->tbl, mt, nb_item_templates, NULL, 0, &matcher_attr);
-	if (!tbl->matcher)
-		goto it_error;
 	tbl->nb_item_templates = nb_item_templates;
 	/* Build the action template. */
 	for (i = 0; i < nb_action_templates; i++) {
@@ -2377,21 +2430,31 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 			rte_errno = EINVAL;
 			goto at_error;
 		}
+		at[i] = action_templates[i]->tmpl;
+		tbl->ats[i].action_template = action_templates[i];
 		LIST_INIT(&tbl->ats[i].acts.act_list);
-		err = flow_hw_actions_translate(dev, &tbl->cfg,
-						&tbl->ats[i].acts,
-						action_templates[i], error);
+		if (!port_started)
+			continue;
+		err = __flow_hw_actions_translate(dev, &tbl->cfg,
+						  &tbl->ats[i].acts,
+						  action_templates[i], error);
 		if (err) {
 			i++;
 			goto at_error;
 		}
-		tbl->ats[i].action_template = action_templates[i];
 	}
 	tbl->nb_action_templates = nb_action_templates;
+	tbl->matcher = mlx5dr_matcher_create
+		(tbl->grp->tbl, mt, nb_item_templates, at, nb_action_templates, &matcher_attr);
+	if (!tbl->matcher)
+		goto at_error;
 	tbl->type = attr->flow_attr.transfer ? MLX5DR_TABLE_TYPE_FDB :
 		    (attr->flow_attr.egress ? MLX5DR_TABLE_TYPE_NIC_TX :
 		    MLX5DR_TABLE_TYPE_NIC_RX);
-	LIST_INSERT_HEAD(&priv->flow_hw_tbl, tbl, next);
+	if (port_started)
+		LIST_INSERT_HEAD(&priv->flow_hw_tbl, tbl, next);
+	else
+		LIST_INSERT_HEAD(&priv->flow_hw_tbl_ongo, tbl, next);
 	return tbl;
 at_error:
 	while (i--) {
@@ -2404,7 +2467,6 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	while (i--)
 		__atomic_sub_fetch(&item_templates[i]->refcnt,
 				   1, __ATOMIC_RELAXED);
-	mlx5dr_matcher_destroy(tbl->matcher);
 error:
 	err = rte_errno;
 	if (tbl) {
@@ -2421,6 +2483,33 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	return NULL;
 }
 
+/**
+ * Update flow template table.
+ *
+ * @param[in] dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *    0 on success, negative value otherwise and rte_errno is set.
+ */
+int
+flow_hw_table_update(struct rte_eth_dev *dev,
+		     struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_flow_template_table *tbl;
+
+	while ((tbl = LIST_FIRST(&priv->flow_hw_tbl_ongo)) != NULL) {
+		if (flow_hw_actions_translate(dev, tbl, error))
+			return -1;
+		LIST_REMOVE(tbl, next);
+		LIST_INSERT_HEAD(&priv->flow_hw_tbl, tbl, next);
+	}
+	return 0;
+}
+
 /**
  * Translates group index specified by the user in @p attr to internal
  * group index.
@@ -2499,6 +2588,7 @@ flow_hw_template_table_create(struct rte_eth_dev *dev,
 			      uint8_t nb_action_templates,
 			      struct rte_flow_error *error)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_flow_template_table_cfg cfg = {
 		.attr = *attr,
 		.external = true,
@@ -2507,6 +2597,12 @@ flow_hw_template_table_create(struct rte_eth_dev *dev,
 
 	if (flow_hw_translate_group(dev, &cfg, group, &cfg.attr.flow_attr.group, error))
 		return NULL;
+	if (priv->sh->config.dv_esw_en && cfg.attr.flow_attr.egress) {
+		rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ATTR, NULL,
+				  "egress flows are not supported with HW Steering"
+				  " when E-Switch is enabled");
+		return NULL;
+	}
 	return flow_hw_table_create(dev, &cfg, item_templates, nb_item_templates,
 				    action_templates, nb_action_templates, error);
 }
@@ -2729,7 +2825,8 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 		const struct rte_flow_action *mask = &masks[i];
 
 		MLX5_ASSERT(i < MLX5_HW_MAX_ACTS);
-		if (action->type != mask->type)
+		if (action->type != RTE_FLOW_ACTION_TYPE_INDIRECT &&
+		    action->type != mask->type)
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ACTION,
 						  action,
@@ -2805,6 +2902,157 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static enum mlx5dr_action_type mlx5_hw_dr_action_types[] = {
+	[RTE_FLOW_ACTION_TYPE_MARK] = MLX5DR_ACTION_TYP_TAG,
+	[RTE_FLOW_ACTION_TYPE_DROP] = MLX5DR_ACTION_TYP_DROP,
+	[RTE_FLOW_ACTION_TYPE_JUMP] = MLX5DR_ACTION_TYP_FT,
+	[RTE_FLOW_ACTION_TYPE_QUEUE] = MLX5DR_ACTION_TYP_TIR,
+	[RTE_FLOW_ACTION_TYPE_RSS] = MLX5DR_ACTION_TYP_TIR,
+	[RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP] = MLX5DR_ACTION_TYP_L2_TO_TNL_L2,
+	[RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP] = MLX5DR_ACTION_TYP_L2_TO_TNL_L2,
+	[RTE_FLOW_ACTION_TYPE_VXLAN_DECAP] = MLX5DR_ACTION_TYP_TNL_L2_TO_L2,
+	[RTE_FLOW_ACTION_TYPE_NVGRE_DECAP] = MLX5DR_ACTION_TYP_TNL_L2_TO_L2,
+	[RTE_FLOW_ACTION_TYPE_MODIFY_FIELD] = MLX5DR_ACTION_TYP_MODIFY_HDR,
+	[RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT] = MLX5DR_ACTION_TYP_VPORT,
+	[RTE_FLOW_ACTION_TYPE_COUNT] = MLX5DR_ACTION_TYP_CTR,
+};
+
+static int
+flow_hw_dr_actions_template_handle_shared(const struct rte_flow_action *mask,
+					  unsigned int action_src,
+					  enum mlx5dr_action_type *action_types,
+					  uint16_t *curr_off,
+					  struct rte_flow_actions_template *at)
+{
+	uint32_t type;
+
+	if (!mask) {
+		DRV_LOG(WARNING, "Unable to determine indirect action type "
+			"without a mask specified");
+		return -EINVAL;
+	}
+	type = mask->type;
+	switch (type) {
+	case RTE_FLOW_ACTION_TYPE_RSS:
+		at->actions_off[action_src] = *curr_off;
+		action_types[*curr_off] = MLX5DR_ACTION_TYP_TIR;
+		*curr_off = *curr_off + 1;
+		break;
+	case RTE_FLOW_ACTION_TYPE_COUNT:
+		at->actions_off[action_src] = *curr_off;
+		action_types[*curr_off] = MLX5DR_ACTION_TYP_CTR;
+		*curr_off = *curr_off + 1;
+		break;
+	default:
+		DRV_LOG(WARNING, "Unsupported shared action type: %d", type);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+/**
+ * Create DR action template based on a provided sequence of flow actions.
+ *
+ * @param[in] at
+ *   Pointer to flow actions template to be updated.
+ *
+ * @return
+ *   DR action template pointer on success and action offsets in @p at are updated.
+ *   NULL otherwise.
+ */
+static struct mlx5dr_action_template *
+flow_hw_dr_actions_template_create(struct rte_flow_actions_template *at)
+{
+	struct mlx5dr_action_template *dr_template;
+	enum mlx5dr_action_type action_types[MLX5_HW_MAX_ACTS] = { MLX5DR_ACTION_TYP_LAST };
+	unsigned int i;
+	uint16_t curr_off;
+	enum mlx5dr_action_type reformat_act_type = MLX5DR_ACTION_TYP_TNL_L2_TO_L2;
+	uint16_t reformat_off = UINT16_MAX;
+	uint16_t mhdr_off = UINT16_MAX;
+	int ret;
+
+	for (i = 0, curr_off = 0; at->actions[i].type != RTE_FLOW_ACTION_TYPE_END; ++i) {
+		const struct rte_flow_action_raw_encap *raw_encap_data;
+		size_t data_size;
+		enum mlx5dr_action_type type;
+
+		if (curr_off >= MLX5_HW_MAX_ACTS)
+			goto err_actions_num;
+		switch (at->actions[i].type) {
+		case RTE_FLOW_ACTION_TYPE_VOID:
+			break;
+		case RTE_FLOW_ACTION_TYPE_INDIRECT:
+			ret = flow_hw_dr_actions_template_handle_shared(&at->masks[i], i,
+									action_types,
+									&curr_off, at);
+			if (ret)
+				return NULL;
+			break;
+		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
+		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
+		case RTE_FLOW_ACTION_TYPE_VXLAN_DECAP:
+		case RTE_FLOW_ACTION_TYPE_NVGRE_DECAP:
+			MLX5_ASSERT(reformat_off == UINT16_MAX);
+			reformat_off = curr_off++;
+			reformat_act_type = mlx5_hw_dr_action_types[at->actions[i].type];
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
+			raw_encap_data = at->actions[i].conf;
+			data_size = raw_encap_data->size;
+			if (reformat_off != UINT16_MAX) {
+				reformat_act_type = data_size < MLX5_ENCAPSULATION_DECISION_SIZE ?
+					MLX5DR_ACTION_TYP_TNL_L3_TO_L2 :
+					MLX5DR_ACTION_TYP_L2_TO_TNL_L3;
+			} else {
+				reformat_off = curr_off++;
+				reformat_act_type = MLX5DR_ACTION_TYP_L2_TO_TNL_L2;
+			}
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_DECAP:
+			reformat_off = curr_off++;
+			reformat_act_type = MLX5DR_ACTION_TYP_TNL_L2_TO_L2;
+			break;
+		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
+			if (mhdr_off == UINT16_MAX) {
+				mhdr_off = curr_off++;
+				type = mlx5_hw_dr_action_types[at->actions[i].type];
+				action_types[mhdr_off] = type;
+			}
+			break;
+		case RTE_FLOW_ACTION_TYPE_METER:
+			at->actions_off[i] = curr_off;
+			action_types[curr_off++] = MLX5DR_ACTION_TYP_ASO_METER;
+			if (curr_off >= MLX5_HW_MAX_ACTS)
+				goto err_actions_num;
+			action_types[curr_off++] = MLX5DR_ACTION_TYP_FT;
+			break;
+		default:
+			type = mlx5_hw_dr_action_types[at->actions[i].type];
+			at->actions_off[i] = curr_off;
+			action_types[curr_off++] = type;
+			break;
+		}
+	}
+	if (curr_off >= MLX5_HW_MAX_ACTS)
+		goto err_actions_num;
+	if (mhdr_off != UINT16_MAX)
+		at->mhdr_off = mhdr_off;
+	if (reformat_off != UINT16_MAX) {
+		at->reformat_off = reformat_off;
+		action_types[reformat_off] = reformat_act_type;
+	}
+	dr_template = mlx5dr_action_template_create(action_types);
+	if (dr_template)
+		at->dr_actions_num = curr_off;
+	else
+		DRV_LOG(ERR, "Failed to create DR action template: %d", rte_errno);
+	return dr_template;
+err_actions_num:
+	DRV_LOG(ERR, "Number of HW actions (%u) exceeded maximum (%u) allowed in template",
+		curr_off, MLX5_HW_MAX_ACTS);
+	return NULL;
+}
+
 /**
  * Create flow action template.
  *
@@ -2830,7 +3078,8 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 			struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	int len, act_len, mask_len, i;
+	int len, act_num, act_len, mask_len;
+	unsigned int i;
 	struct rte_flow_actions_template *at = NULL;
 	uint16_t pos = MLX5_HW_MAX_ACTS;
 	struct rte_flow_action tmp_action[MLX5_HW_MAX_ACTS];
@@ -2900,6 +3149,11 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 	if (mask_len <= 0)
 		return NULL;
 	len += RTE_ALIGN(mask_len, 16);
+	/* Count flow actions to allocate required space for storing DR offsets. */
+	act_num = 0;
+	for (i = 0; ra[i].type != RTE_FLOW_ACTION_TYPE_END; ++i)
+		act_num++;
+	len += RTE_ALIGN(act_num * sizeof(*at->actions_off), 16);
 	at = mlx5_malloc(MLX5_MEM_ZERO, len + sizeof(*at),
 			 RTE_CACHE_LINE_SIZE, rte_socket_id());
 	if (!at) {
@@ -2909,19 +3163,26 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 				   "cannot allocate action template");
 		return NULL;
 	}
-	/* Actions part is in the first half. */
+	/* Actions are stored in the first part. */
 	at->attr = *attr;
 	at->actions = (struct rte_flow_action *)(at + 1);
 	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, at->actions,
 				len, ra, error);
 	if (act_len <= 0)
 		goto error;
-	/* Masks part is in the second half. */
+	/* Masks are stored in the second part. */
 	at->masks = (struct rte_flow_action *)(((uint8_t *)at->actions) + act_len);
 	mask_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, at->masks,
 				 len - act_len, rm, error);
 	if (mask_len <= 0)
 		goto error;
+	/* DR action offsets are stored in the third part. */
+	at->actions_off = (uint16_t *)((uint8_t *)at->masks + mask_len);
+	at->actions_num = act_num;
+	for (i = 0; i < at->actions_num; ++i)
+		at->actions_off[i] = UINT16_MAX;
+	at->reformat_off = UINT16_MAX;
+	at->mhdr_off = UINT16_MAX;
 	at->rx_cpy_pos = pos;
 	/*
 	 * mlx5 PMD hacks indirect action index directly to the action conf.
@@ -2935,12 +3196,18 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 			at->masks[i].conf = masks->conf;
 		}
 	}
+	at->tmpl = flow_hw_dr_actions_template_create(at);
+	if (!at->tmpl)
+		goto error;
 	__atomic_fetch_add(&at->refcnt, 1, __ATOMIC_RELAXED);
 	LIST_INSERT_HEAD(&priv->flow_hw_at, at, next);
 	return at;
 error:
-	if (at)
+	if (at) {
+		if (at->tmpl)
+			mlx5dr_action_template_destroy(at->tmpl);
 		mlx5_free(at);
+	}
 	return NULL;
 }
 
@@ -2971,6 +3238,8 @@ flow_hw_actions_template_destroy(struct rte_eth_dev *dev __rte_unused,
 				   "action template in using");
 	}
 	LIST_REMOVE(template, next);
+	if (template->tmpl)
+		mlx5dr_action_template_destroy(template->tmpl);
 	mlx5_free(template);
 	return 0;
 }
@@ -3021,11 +3290,48 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 			 const struct rte_flow_item items[],
 			 struct rte_flow_error *error)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
 	int i;
 	bool items_end = false;
-	RTE_SET_USED(dev);
-	RTE_SET_USED(attr);
 
+	if (!attr->ingress && !attr->egress && !attr->transfer)
+		return rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ATTR, NULL,
+					  "at least one of the direction attributes"
+					  " must be specified");
+	if (priv->sh->config.dv_esw_en) {
+		MLX5_ASSERT(priv->master || priv->representor);
+		if (priv->master) {
+			/*
+			 * It is allowed to specify ingress, egress and transfer attributes
+			 * at the same time, in order to construct flows catching all missed
+			 * FDB traffic and forwarding it to the master port.
+			 */
+			if (!(attr->ingress ^ attr->egress ^ attr->transfer))
+				return rte_flow_error_set(error, EINVAL,
+							  RTE_FLOW_ERROR_TYPE_ATTR, NULL,
+							  "only one or all direction attributes"
+							  " at once can be used on transfer proxy"
+							  " port");
+		} else {
+			if (attr->transfer)
+				return rte_flow_error_set(error, EINVAL,
+							  RTE_FLOW_ERROR_TYPE_ATTR_TRANSFER, NULL,
+							  "transfer attribute cannot be used with"
+							  " port representors");
+			if (attr->ingress && attr->egress)
+				return rte_flow_error_set(error, EINVAL,
+							  RTE_FLOW_ERROR_TYPE_ATTR, NULL,
+							  "ingress and egress direction attributes"
+							  " cannot be used at the same time on"
+							  " port representors");
+		}
+	} else {
+		if (attr->transfer)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ATTR_TRANSFER, NULL,
+						  "transfer attribute cannot be used when"
+						  " E-Switch is disabled");
+	}
 	for (i = 0; !items_end; i++) {
 		int type = items[i].type;
 
@@ -3048,7 +3354,6 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 		{
 			const struct rte_flow_item_tag *tag =
 				(const struct rte_flow_item_tag *)items[i].spec;
-			struct mlx5_priv *priv = dev->data->dev_private;
 			uint8_t regcs = (uint8_t)priv->sh->cdev->config.hca_attr.set_reg_c;
 
 			if (!((1 << (tag->index - REG_C_0)) & regcs))
@@ -3056,7 +3361,26 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 							  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 							  NULL,
 							  "Unsupported internal tag index");
+			break;
 		}
+		case RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT:
+			if (attr->ingress || attr->egress)
+				return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ITEM, NULL,
+						  "represented port item cannot be used"
+						  " when transfer attribute is set");
+			break;
+		case RTE_FLOW_ITEM_TYPE_META:
+			if (!priv->sh->config.dv_esw_en ||
+			    priv->sh->config.dv_xmeta_en != MLX5_XMETA_MODE_META32_HWS) {
+				if (attr->ingress)
+					return rte_flow_error_set(error, EINVAL,
+								  RTE_FLOW_ERROR_TYPE_ITEM, NULL,
+								  "META item is not supported"
+								  " on current FW with ingress"
+								  " attribute");
+			}
+			break;
 		case RTE_FLOW_ITEM_TYPE_VOID:
 		case RTE_FLOW_ITEM_TYPE_ETH:
 		case RTE_FLOW_ITEM_TYPE_VLAN:
@@ -3066,10 +3390,8 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ITEM_TYPE_TCP:
 		case RTE_FLOW_ITEM_TYPE_GTP:
 		case RTE_FLOW_ITEM_TYPE_GTP_PSC:
-		case RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT:
 		case RTE_FLOW_ITEM_TYPE_VXLAN:
 		case MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE:
-		case RTE_FLOW_ITEM_TYPE_META:
 		case RTE_FLOW_ITEM_TYPE_GRE:
 		case RTE_FLOW_ITEM_TYPE_GRE_KEY:
 		case RTE_FLOW_ITEM_TYPE_GRE_OPTION:
@@ -3117,21 +3439,7 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 
 	if (flow_hw_pattern_validate(dev, attr, items, error))
 		return NULL;
-	if (priv->sh->config.dv_esw_en && attr->ingress) {
-		/*
-		 * Disallow pattern template with ingress and egress/transfer
-		 * attributes in order to forbid implicit port matching
-		 * on egress and transfer traffic.
-		 */
-		if (attr->egress || attr->transfer) {
-			rte_flow_error_set(error, EINVAL,
-					   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					   NULL,
-					   "item template for ingress traffic"
-					   " cannot be used for egress/transfer"
-					   " traffic when E-Switch is enabled");
-			return NULL;
-		}
+	if (priv->sh->config.dv_esw_en && attr->ingress && !attr->egress && !attr->transfer) {
 		copied_items = flow_hw_copy_prepend_port_item(items, error);
 		if (!copied_items)
 			return NULL;
@@ -4521,6 +4829,10 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 		return;
 	flow_hw_rxq_flag_set(dev, false);
 	flow_hw_flush_all_ctrl_flows(dev);
+	while (!LIST_EMPTY(&priv->flow_hw_tbl_ongo)) {
+		tbl = LIST_FIRST(&priv->flow_hw_tbl_ongo);
+		flow_hw_table_destroy(dev, tbl, NULL);
+	}
 	while (!LIST_EMPTY(&priv->flow_hw_tbl)) {
 		tbl = LIST_FIRST(&priv->flow_hw_tbl);
 		flow_hw_table_destroy(dev, tbl, NULL);
@@ -4658,6 +4970,54 @@ void flow_hw_clear_tags_set(struct rte_eth_dev *dev)
 		       sizeof(enum modify_reg) * MLX5_FLOW_HW_TAGS_MAX);
 }
 
+uint32_t mlx5_flow_hw_flow_metadata_config_refcnt;
+uint8_t mlx5_flow_hw_flow_metadata_esw_en;
+uint8_t mlx5_flow_hw_flow_metadata_xmeta_en;
+
+/**
+ * Initializes static configuration of META flow items.
+ *
+ * As a temporary workaround, META flow item is translated to a register,
+ * based on statically saved dv_esw_en and dv_xmeta_en device arguments.
+ * It is a workaround for flow_hw_get_reg_id() where port specific information
+ * is not available at runtime.
+ *
+ * Values of dv_esw_en and dv_xmeta_en device arguments are taken from the first opened port.
+ * This means that each mlx5 port will use the same configuration for translation
+ * of META flow items.
+ *
+ * @param[in] dev
+ *    Pointer to Ethernet device.
+ */
+void
+flow_hw_init_flow_metadata_config(struct rte_eth_dev *dev)
+{
+	uint32_t refcnt;
+
+	refcnt = __atomic_fetch_add(&mlx5_flow_hw_flow_metadata_config_refcnt, 1,
+				    __ATOMIC_RELAXED);
+	if (refcnt > 0)
+		return;
+	mlx5_flow_hw_flow_metadata_esw_en = MLX5_SH(dev)->config.dv_esw_en;
+	mlx5_flow_hw_flow_metadata_xmeta_en = MLX5_SH(dev)->config.dv_xmeta_en;
+}
+
+/**
+ * Clears statically stored configuration related to META flow items.
+ */
+void
+flow_hw_clear_flow_metadata_config(void)
+{
+	uint32_t refcnt;
+
+	refcnt = __atomic_sub_fetch(&mlx5_flow_hw_flow_metadata_config_refcnt, 1,
+				    __ATOMIC_RELAXED);
+	if (refcnt > 0)
+		return;
+	mlx5_flow_hw_flow_metadata_esw_en = 0;
+	mlx5_flow_hw_flow_metadata_xmeta_en = 0;
+}
+
 /**
  * Create shared action.
  *
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index ccefebefc9..2603196933 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -1170,6 +1170,16 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 			dev->data->port_id, rte_strerror(rte_errno));
 		goto error;
 	}
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+	if (priv->sh->config.dv_flow_en == 2) {
+		ret = flow_hw_table_update(dev, NULL);
+		if (ret) {
+			DRV_LOG(ERR, "port %u failed to update HWS tables",
+				dev->data->port_id);
+			goto error;
+		}
+	}
+#endif
 	ret = mlx5_traffic_enable(dev);
 	if (ret) {
 		DRV_LOG(ERR, "port %u failed to set defaults flows",
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v2 10/17] net/mlx5: add HW steering connection tracking support
  2022-09-28  3:31 ` [PATCH v2 00/17] net/mlx5: HW steering PMD update Suanming Mou
                     ` (8 preceding siblings ...)
  2022-09-28  3:31   ` [PATCH v2 09/17] net/mlx5: support DR action template API Suanming Mou
@ 2022-09-28  3:31   ` Suanming Mou
  2022-09-28  3:31   ` [PATCH v2 11/17] net/mlx5: add HW steering VLAN push, pop and VID modify flow actions Suanming Mou
                     ` (6 subsequent siblings)
  16 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-28  3:31 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika

This commit adds connection tracking support to HW steering, providing
the same functionality as SW steering.

Unlike the SW steering implementation, HW steering takes advantage of
bulk action allocation, so only one single CT pool is needed.

An indexed pool is introduced to record the actions allocated from the
bulk, the CT action state and so on. Whenever a CT action is allocated
from the bulk, an indexed object is also allocated from the indexed
pool, and likewise released on deallocation. This way mlx5_aso_ct_action
can be managed entirely by the indexed pool and does not need to be
reserved from mlx5_aso_ct_pool. The single CT pool is also stored
directly in the mlx5_aso_ct_action struct.

The ASO operation functions are shared with the SW steering
implementation.
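
For illustration only, below is a minimal standalone sketch of the
allocation scheme described above. The struct and field names loosely
mirror mlx5_aso_ct_pool and mlx5_aso_ct_action, but the free-index
stack is a simplified stand-in for the driver's indexed pool, not the
real mlx5 API.

#include <stdint.h>
#include <stdlib.h>

struct ct_action {            /* models mlx5_aso_ct_action */
	uint32_t offset;      /* offset of the object inside the DevX bulk */
	void *pool;           /* back-pointer to the single CT pool (HWS) */
};

struct ct_pool {              /* models mlx5_aso_ct_pool */
	uint32_t nb;          /* bulk size */
	uint32_t *free_idx;   /* free-index stack, stand-in for the ipool */
	uint32_t free_top;
	struct ct_action acts[]; /* CT action bulk */
};

static struct ct_pool *ct_pool_create(uint32_t nb)
{
	struct ct_pool *p = calloc(1, sizeof(*p) + nb * sizeof(p->acts[0]));

	if (!p)
		return NULL;
	p->free_idx = malloc(nb * sizeof(*p->free_idx));
	if (!p->free_idx) {
		free(p);
		return NULL;
	}
	p->nb = nb;
	for (uint32_t i = 0; i < nb; i++)
		p->free_idx[p->free_top++] = nb - 1 - i;
	return p;
}

/* Allocate: pop an index, bind the action to its bulk offset and pool. */
static struct ct_action *ct_alloc(struct ct_pool *p)
{
	if (!p->free_top)
		return NULL;
	uint32_t idx = p->free_idx[--p->free_top];
	struct ct_action *ct = &p->acts[idx];

	ct->offset = idx; /* later used to address the DevX object */
	ct->pool = p;     /* HWS keeps the pool pointer in the action */
	return ct;
}

/* Free: just return the index; the DevX bulk stays for the pool's life. */
static void ct_free(struct ct_pool *p, struct ct_action *ct)
{
	p->free_idx[p->free_top++] = ct->offset;
}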

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_os.c |   8 +-
 drivers/net/mlx5/mlx5.c          |   3 +-
 drivers/net/mlx5/mlx5.h          |  54 ++++-
 drivers/net/mlx5/mlx5_flow.c     |   1 +
 drivers/net/mlx5/mlx5_flow.h     |   7 +
 drivers/net/mlx5/mlx5_flow_aso.c | 212 +++++++++++++----
 drivers/net/mlx5/mlx5_flow_dv.c  |  28 ++-
 drivers/net/mlx5/mlx5_flow_hw.c  | 381 ++++++++++++++++++++++++++++++-
 8 files changed, 617 insertions(+), 77 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 65795da516..60a1a391fb 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1349,9 +1349,11 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 			DRV_LOG(DEBUG, "Flow Hit ASO is supported.");
 		}
 #endif /* HAVE_MLX5_DR_CREATE_ACTION_ASO */
-#if defined(HAVE_MLX5_DR_CREATE_ACTION_ASO) && \
-	defined(HAVE_MLX5_DR_ACTION_ASO_CT)
-		if (hca_attr->ct_offload && priv->mtr_color_reg == REG_C_3) {
+#if defined (HAVE_MLX5_DR_CREATE_ACTION_ASO) && \
+    defined (HAVE_MLX5_DR_ACTION_ASO_CT)
+		/* HWS creates CT ASO SQs per the HWS configured queue number. */
+		if (sh->config.dv_flow_en != 2 &&
+		    hca_attr->ct_offload && priv->mtr_color_reg == REG_C_3) {
 			err = mlx5_flow_aso_ct_mng_init(sh);
 			if (err) {
 				err = -err;
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index cf7b7b7158..925e19bcd5 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -755,7 +755,8 @@ mlx5_flow_aso_ct_mng_init(struct mlx5_dev_ctx_shared *sh)
 
 	if (sh->ct_mng)
 		return 0;
-	sh->ct_mng = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*sh->ct_mng),
+	sh->ct_mng = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*sh->ct_mng) +
+				 sizeof(struct mlx5_aso_sq) * MLX5_ASO_CT_SQ_NUM,
 				 RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
 	if (!sh->ct_mng) {
 		DRV_LOG(ERR, "ASO CT management allocation failed.");
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 26f627ae1b..df962a1fc0 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -39,6 +39,8 @@
 
 #define MLX5_SH(dev) (((struct mlx5_priv *)(dev)->data->dev_private)->sh)
 
+#define MLX5_HW_INV_QUEUE UINT32_MAX
+
 /*
  * Number of modification commands.
  * The maximal actions amount in FW is some constant, and it is 16 in the
@@ -1159,7 +1161,12 @@ enum mlx5_aso_ct_state {
 
 /* Generic ASO connection tracking structure. */
 struct mlx5_aso_ct_action {
-	LIST_ENTRY(mlx5_aso_ct_action) next; /* Pointer to the next ASO CT. */
+	union {
+		LIST_ENTRY(mlx5_aso_ct_action) next;
+		/* Pointer to the next ASO CT. Used only in SWS. */
+		struct mlx5_aso_ct_pool *pool;
+		/* Pointer to action pool. Used only in HWS. */
+	};
 	void *dr_action_orig; /* General action object for original dir. */
 	void *dr_action_rply; /* General action object for reply dir. */
 	uint32_t refcnt; /* Action used count in device flows. */
@@ -1173,28 +1180,48 @@ struct mlx5_aso_ct_action {
 #define MLX5_ASO_CT_UPDATE_STATE(c, s) \
 	__atomic_store_n(&((c)->state), (s), __ATOMIC_RELAXED)
 
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+
 /* ASO connection tracking software pool definition. */
 struct mlx5_aso_ct_pool {
 	uint16_t index; /* Pool index in pools array. */
+	/* Free ASO CT index in the pool. Used by HWS. */
+	struct mlx5_indexed_pool *cts;
 	struct mlx5_devx_obj *devx_obj;
-	/* The first devx object in the bulk, used for freeing (not yet). */
-	struct mlx5_aso_ct_action actions[MLX5_ASO_CT_ACTIONS_PER_POOL];
+	union {
+		void *dummy_action;
+		/* Dummy action to increase the reference count in the driver. */
+		struct mlx5dr_action *dr_action;
+		/* HWS action. */
+	};
+	struct mlx5_aso_sq *sq; /* Async ASO SQ. */
+	struct mlx5_aso_sq *shared_sq; /* Shared ASO SQ. */
+	struct mlx5_aso_ct_action actions[0];
 	/* CT action structures bulk. */
 };
 
 LIST_HEAD(aso_ct_list, mlx5_aso_ct_action);
 
+#define MLX5_ASO_CT_SQ_NUM 16
+
 /* Pools management structure for ASO connection tracking pools. */
 struct mlx5_aso_ct_pools_mng {
 	struct mlx5_aso_ct_pool **pools;
 	uint16_t n; /* Total number of pools. */
 	uint16_t next; /* Number of pools in use, index of next free pool. */
+	uint32_t nb_sq; /* Number of ASO SQ. */
 	rte_spinlock_t ct_sl; /* The ASO CT free list lock. */
 	rte_rwlock_t resize_rwl; /* The ASO CT pool resize lock. */
 	struct aso_ct_list free_cts; /* Free ASO CT objects list. */
-	struct mlx5_aso_sq aso_sq; /* ASO queue objects. */
+	struct mlx5_aso_sq aso_sqs[0]; /* ASO queue objects. */
 };
 
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
 /* LAG attr. */
 struct mlx5_lag {
 	uint8_t tx_remap_affinity[16]; /* The PF port number of affinity */
@@ -1332,8 +1359,7 @@ struct mlx5_dev_ctx_shared {
 	rte_spinlock_t geneve_tlv_opt_sl; /* Lock for geneve tlv resource */
 	struct mlx5_flow_mtr_mng *mtrmng;
 	/* Meter management structure. */
-	struct mlx5_aso_ct_pools_mng *ct_mng;
-	/* Management data for ASO connection tracking. */
+	struct mlx5_aso_ct_pools_mng *ct_mng; /* Management data for ASO CT in HWS only. */
 	struct mlx5_lb_ctx self_lb; /* QP to enable self loopback for Devx. */
 	unsigned int flow_max_priority;
 	enum modify_reg flow_mreg_c[MLX5_MREG_C_NUM];
@@ -1647,6 +1673,9 @@ struct mlx5_priv {
 	/* HW steering create ongoing rte flow table list header. */
 	LIST_HEAD(flow_hw_tbl_ongo, rte_flow_template_table) flow_hw_tbl_ongo;
 	struct mlx5_indexed_pool *acts_ipool; /* Action data indexed pool. */
+	struct mlx5_aso_ct_pools_mng *ct_mng;
+	/* Management data for ASO connection tracking. */
+	struct mlx5_aso_ct_pool *hws_ctpool; /* HW steering's CT pool. */
 #endif
 };
 
@@ -2046,15 +2075,15 @@ int mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
 		struct mlx5_aso_mtr *mtr, struct mlx5_mtr_bulk *bulk);
 int mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh,
 		struct mlx5_aso_mtr *mtr);
-int mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
+int mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			      struct mlx5_aso_ct_action *ct,
 			      const struct rte_flow_action_conntrack *profile);
-int mlx5_aso_ct_wait_ready(struct mlx5_dev_ctx_shared *sh,
+int mlx5_aso_ct_wait_ready(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			   struct mlx5_aso_ct_action *ct);
-int mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh,
+int mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			     struct mlx5_aso_ct_action *ct,
 			     struct rte_flow_action_conntrack *profile);
-int mlx5_aso_ct_available(struct mlx5_dev_ctx_shared *sh,
+int mlx5_aso_ct_available(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			  struct mlx5_aso_ct_action *ct);
 uint32_t
 mlx5_get_supported_sw_parsing_offloads(const struct mlx5_hca_attr *attr);
@@ -2065,6 +2094,11 @@ int mlx5_aso_cnt_queue_init(struct mlx5_dev_ctx_shared *sh);
 void mlx5_aso_cnt_queue_uninit(struct mlx5_dev_ctx_shared *sh);
 int mlx5_aso_cnt_query(struct mlx5_dev_ctx_shared *sh,
 		struct mlx5_hws_cnt_pool *cpool);
+int mlx5_aso_ct_queue_init(struct mlx5_dev_ctx_shared *sh,
+			   struct mlx5_aso_ct_pools_mng *ct_mng,
+			   uint32_t nb_queues);
+int mlx5_aso_ct_queue_uninit(struct mlx5_dev_ctx_shared *sh,
+			     struct mlx5_aso_ct_pools_mng *ct_mng);
 
 /* mlx5_flow_flex.c */
 
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 658cc69750..cbf9c31984 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -49,6 +49,7 @@ struct flow_hw_port_info mlx5_flow_hw_port_infos[RTE_MAX_ETHPORTS];
  */
 uint32_t mlx5_flow_hw_avl_tags_init_cnt;
 enum modify_reg mlx5_flow_hw_avl_tags[MLX5_FLOW_HW_TAGS_MAX] = {REG_NON};
+enum modify_reg mlx5_flow_hw_aso_tag;
 
 struct tunnel_default_miss_ctx {
 	uint16_t *queue;
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index ae1417f10e..f75a56a57b 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -82,6 +82,10 @@ enum {
 #define MLX5_INDIRECT_ACT_CT_GET_IDX(index) \
 	((index) & ((1 << MLX5_INDIRECT_ACT_CT_OWNER_SHIFT) - 1))
 
+#define MLX5_ACTION_CTX_CT_GET_IDX  MLX5_INDIRECT_ACT_CT_GET_IDX
+#define MLX5_ACTION_CTX_CT_GET_OWNER MLX5_INDIRECT_ACT_CT_GET_OWNER
+#define MLX5_ACTION_CTX_CT_GEN_IDX MLX5_INDIRECT_ACT_CT_GEN_IDX
+
 /* Matches on selected register. */
 struct mlx5_rte_flow_item_tag {
 	enum modify_reg id;
@@ -1444,6 +1448,7 @@ extern struct flow_hw_port_info mlx5_flow_hw_port_infos[RTE_MAX_ETHPORTS];
 #define MLX5_FLOW_HW_TAGS_MAX 8
 extern uint32_t mlx5_flow_hw_avl_tags_init_cnt;
 extern enum modify_reg mlx5_flow_hw_avl_tags[];
+extern enum modify_reg mlx5_flow_hw_aso_tag;
 
 /*
  * Get metadata match tag and mask for given rte_eth_dev port.
@@ -1518,6 +1523,8 @@ flow_hw_get_reg_id(enum rte_flow_item_type type, uint32_t id)
 		 * REG_B case should be rejected on pattern template validation.
 		 */
 		return REG_A;
+	case RTE_FLOW_ITEM_TYPE_CONNTRACK:
+		return mlx5_flow_hw_aso_tag;
 	case RTE_FLOW_ITEM_TYPE_TAG:
 		MLX5_ASSERT(id < MLX5_FLOW_HW_TAGS_MAX);
 		return mlx5_flow_hw_avl_tags[id];
diff --git a/drivers/net/mlx5/mlx5_flow_aso.c b/drivers/net/mlx5/mlx5_flow_aso.c
index ed9272e583..c00c07b891 100644
--- a/drivers/net/mlx5/mlx5_flow_aso.c
+++ b/drivers/net/mlx5/mlx5_flow_aso.c
@@ -313,16 +313,8 @@ mlx5_aso_queue_init(struct mlx5_dev_ctx_shared *sh,
 		mlx5_aso_mtr_init_sq(&sh->mtrmng->pools_mng.sq);
 		break;
 	case ASO_OPC_MOD_CONNECTION_TRACKING:
-		/* 64B per object for query. */
-		if (mlx5_aso_reg_mr(cdev, 64 * sq_desc_n,
-				    &sh->ct_mng->aso_sq.mr))
+		if (mlx5_aso_ct_queue_init(sh, sh->ct_mng, MLX5_ASO_CT_SQ_NUM))
 			return -1;
-		if (mlx5_aso_sq_create(cdev, &sh->ct_mng->aso_sq,
-				       sh->tx_uar.obj, MLX5_ASO_QUEUE_LOG_DESC)) {
-			mlx5_aso_dereg_mr(cdev, &sh->ct_mng->aso_sq.mr);
-			return -1;
-		}
-		mlx5_aso_ct_init_sq(&sh->ct_mng->aso_sq);
 		break;
 	default:
 		DRV_LOG(ERR, "Unknown ASO operation mode");
@@ -343,7 +335,7 @@ void
 mlx5_aso_queue_uninit(struct mlx5_dev_ctx_shared *sh,
 		      enum mlx5_access_aso_opc_mod aso_opc_mod)
 {
-	struct mlx5_aso_sq *sq;
+	struct mlx5_aso_sq *sq = NULL;
 
 	switch (aso_opc_mod) {
 	case ASO_OPC_MOD_FLOW_HIT:
@@ -354,14 +346,14 @@ mlx5_aso_queue_uninit(struct mlx5_dev_ctx_shared *sh,
 		sq = &sh->mtrmng->pools_mng.sq;
 		break;
 	case ASO_OPC_MOD_CONNECTION_TRACKING:
-		mlx5_aso_dereg_mr(sh->cdev, &sh->ct_mng->aso_sq.mr);
-		sq = &sh->ct_mng->aso_sq;
+		mlx5_aso_ct_queue_uninit(sh, sh->ct_mng);
 		break;
 	default:
 		DRV_LOG(ERR, "Unknown ASO operation mode");
 		return;
 	}
-	mlx5_aso_destroy_sq(sq);
+	if (sq)
+		mlx5_aso_destroy_sq(sq);
 }
 
 /**
@@ -903,6 +895,89 @@ mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh,
 	return -1;
 }
 
+static inline struct mlx5_aso_sq*
+__mlx5_aso_ct_get_sq_in_hws(uint32_t queue,
+			    struct mlx5_aso_ct_pool *pool)
+{
+	return (queue == MLX5_HW_INV_QUEUE) ?
+		pool->shared_sq : &pool->sq[queue];
+}
+
+static inline struct mlx5_aso_sq*
+__mlx5_aso_ct_get_sq_in_sws(struct mlx5_dev_ctx_shared *sh,
+			    struct mlx5_aso_ct_action *ct)
+{
+	return &sh->ct_mng->aso_sqs[ct->offset & (MLX5_ASO_CT_SQ_NUM - 1)];
+}
+
+static inline struct mlx5_aso_ct_pool*
+__mlx5_aso_ct_get_pool(struct mlx5_dev_ctx_shared *sh,
+		       struct mlx5_aso_ct_action *ct)
+{
+	if (likely(sh->config.dv_flow_en == 2))
+		return ct->pool;
+	return container_of(ct, struct mlx5_aso_ct_pool, actions[ct->offset]);
+}
+
+int
+mlx5_aso_ct_queue_uninit(struct mlx5_dev_ctx_shared *sh,
+			 struct mlx5_aso_ct_pools_mng *ct_mng)
+{
+	uint32_t i;
+
+	/* 64B per object for query. */
+	for (i = 0; i < ct_mng->nb_sq; i++) {
+		if (ct_mng->aso_sqs[i].mr.addr)
+			mlx5_aso_dereg_mr(sh->cdev, &ct_mng->aso_sqs[i].mr);
+		mlx5_aso_destroy_sq(&ct_mng->aso_sqs[i]);
+	}
+	return 0;
+}
+
+/**
+ * API to create and initialize CT Send Queue used for ASO access.
+ *
+ * @param[in] sh
+ *   Pointer to shared device context.
+ * @param[in] ct_mng
+ *   Pointer to the CT management struct.
+ * @param[in] nb_queues
+ *   Number of queues to be allocated.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_aso_ct_queue_init(struct mlx5_dev_ctx_shared *sh,
+		       struct mlx5_aso_ct_pools_mng *ct_mng,
+		       uint32_t nb_queues)
+{
+	uint32_t i;
+
+	/* 64B per object for query. */
+	for (i = 0; i < nb_queues; i++) {
+		if (mlx5_aso_reg_mr(sh->cdev, 64 * (1 << MLX5_ASO_QUEUE_LOG_DESC),
+				    &ct_mng->aso_sqs[i].mr))
+			goto error;
+		if (mlx5_aso_sq_create(sh->cdev, &ct_mng->aso_sqs[i],
+				       sh->tx_uar.obj,
+				       MLX5_ASO_QUEUE_LOG_DESC))
+			goto error;
+		mlx5_aso_ct_init_sq(&ct_mng->aso_sqs[i]);
+	}
+	ct_mng->nb_sq = nb_queues;
+	return 0;
+error:
+	do {
+		if (ct_mng->aso_sqs[i].mr.addr)
+			mlx5_aso_dereg_mr(sh->cdev, &ct_mng->aso_sqs[i].mr);
+		if (&ct_mng->aso_sqs[i])
+			mlx5_aso_destroy_sq(&ct_mng->aso_sqs[i]);
+	} while (i--);
+	ct_mng->nb_sq = 0;
+	return -1;
+}
+
 /*
  * Post a WQE to the ASO CT SQ to modify the context.
  *
@@ -918,11 +993,12 @@ mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh,
  */
 static uint16_t
 mlx5_aso_ct_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
+			      struct mlx5_aso_sq *sq,
 			      struct mlx5_aso_ct_action *ct,
-			      const struct rte_flow_action_conntrack *profile)
+			      const struct rte_flow_action_conntrack *profile,
+			      bool need_lock)
 {
 	volatile struct mlx5_aso_wqe *wqe = NULL;
-	struct mlx5_aso_sq *sq = &sh->ct_mng->aso_sq;
 	uint16_t size = 1 << sq->log_desc_n;
 	uint16_t mask = size - 1;
 	uint16_t res;
@@ -931,11 +1007,13 @@ mlx5_aso_ct_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	void *orig_dir;
 	void *reply_dir;
 
-	rte_spinlock_lock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_lock(&sq->sqsl);
 	/* Prevent other threads to update the index. */
 	res = size - (uint16_t)(sq->head - sq->tail);
 	if (unlikely(!res)) {
-		rte_spinlock_unlock(&sq->sqsl);
+		if (need_lock)
+			rte_spinlock_unlock(&sq->sqsl);
 		DRV_LOG(ERR, "Fail: SQ is full and no free WQE to send");
 		return 0;
 	}
@@ -945,7 +1023,7 @@ mlx5_aso_ct_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	MLX5_ASO_CT_UPDATE_STATE(ct, ASO_CONNTRACK_WAIT);
 	sq->elts[sq->head & mask].ct = ct;
 	sq->elts[sq->head & mask].query_data = NULL;
-	pool = container_of(ct, struct mlx5_aso_ct_pool, actions[ct->offset]);
+	pool = __mlx5_aso_ct_get_pool(sh, ct);
 	/* Each WQE will have a single CT object. */
 	wqe->general_cseg.misc = rte_cpu_to_be_32(pool->devx_obj->id +
 						  ct->offset);
@@ -1028,7 +1106,8 @@ mlx5_aso_ct_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
 			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
 			   !sh->tx_uar.dbnc);
-	rte_spinlock_unlock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_unlock(&sq->sqsl);
 	return 1;
 }
 
@@ -1080,10 +1159,11 @@ mlx5_aso_ct_status_update(struct mlx5_aso_sq *sq, uint16_t num)
  */
 static int
 mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
-			    struct mlx5_aso_ct_action *ct, char *data)
+			    struct mlx5_aso_sq *sq,
+			    struct mlx5_aso_ct_action *ct, char *data,
+			    bool need_lock)
 {
 	volatile struct mlx5_aso_wqe *wqe = NULL;
-	struct mlx5_aso_sq *sq = &sh->ct_mng->aso_sq;
 	uint16_t size = 1 << sq->log_desc_n;
 	uint16_t mask = size - 1;
 	uint16_t res;
@@ -1098,10 +1178,12 @@ mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
 	} else if (state == ASO_CONNTRACK_WAIT) {
 		return 0;
 	}
-	rte_spinlock_lock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_lock(&sq->sqsl);
 	res = size - (uint16_t)(sq->head - sq->tail);
 	if (unlikely(!res)) {
-		rte_spinlock_unlock(&sq->sqsl);
+		if (need_lock)
+			rte_spinlock_unlock(&sq->sqsl);
 		DRV_LOG(ERR, "Fail: SQ is full and no free WQE to send");
 		return 0;
 	}
@@ -1113,7 +1195,7 @@ mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
 	wqe_idx = sq->head & mask;
 	sq->elts[wqe_idx].ct = ct;
 	sq->elts[wqe_idx].query_data = data;
-	pool = container_of(ct, struct mlx5_aso_ct_pool, actions[ct->offset]);
+	pool = __mlx5_aso_ct_get_pool(sh, ct);
 	/* Each WQE will have a single CT object. */
 	wqe->general_cseg.misc = rte_cpu_to_be_32(pool->devx_obj->id +
 						  ct->offset);
@@ -1141,7 +1223,8 @@ mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
 	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
 			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
 			   !sh->tx_uar.dbnc);
-	rte_spinlock_unlock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_unlock(&sq->sqsl);
 	return 1;
 }
 
@@ -1152,9 +1235,10 @@ mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
  *   Pointer to the CT pools management structure.
  */
 static void
-mlx5_aso_ct_completion_handle(struct mlx5_aso_ct_pools_mng *mng)
+mlx5_aso_ct_completion_handle(struct mlx5_dev_ctx_shared *sh __rte_unused,
+			      struct mlx5_aso_sq *sq,
+			      bool need_lock)
 {
-	struct mlx5_aso_sq *sq = &mng->aso_sq;
 	struct mlx5_aso_cq *cq = &sq->cq;
 	volatile struct mlx5_cqe *restrict cqe;
 	const uint32_t cq_size = 1 << cq->log_desc_n;
@@ -1165,10 +1249,12 @@ mlx5_aso_ct_completion_handle(struct mlx5_aso_ct_pools_mng *mng)
 	uint16_t n = 0;
 	int ret;
 
-	rte_spinlock_lock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_lock(&sq->sqsl);
 	max = (uint16_t)(sq->head - sq->tail);
 	if (unlikely(!max)) {
-		rte_spinlock_unlock(&sq->sqsl);
+		if (need_lock)
+			rte_spinlock_unlock(&sq->sqsl);
 		return;
 	}
 	next_idx = cq->cq_ci & mask;
@@ -1199,7 +1285,8 @@ mlx5_aso_ct_completion_handle(struct mlx5_aso_ct_pools_mng *mng)
 		rte_io_wmb();
 		cq->cq_obj.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci);
 	}
-	rte_spinlock_unlock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_unlock(&sq->sqsl);
 }
 
 /*
@@ -1207,6 +1294,8 @@ mlx5_aso_ct_completion_handle(struct mlx5_aso_ct_pools_mng *mng)
  *
  * @param[in] sh
  *   Pointer to mlx5_dev_ctx_shared object.
+ * @param[in] queue
+ *   The queue index.
  * @param[in] ct
  *   Pointer to connection tracking offload object.
  * @param[in] profile
@@ -1217,21 +1306,26 @@ mlx5_aso_ct_completion_handle(struct mlx5_aso_ct_pools_mng *mng)
  */
 int
 mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
+			  uint32_t queue,
 			  struct mlx5_aso_ct_action *ct,
 			  const struct rte_flow_action_conntrack *profile)
 {
 	uint32_t poll_wqe_times = MLX5_CT_POLL_WQE_CQE_TIMES;
-	struct mlx5_aso_ct_pool *pool;
+	struct mlx5_aso_ct_pool *pool = __mlx5_aso_ct_get_pool(sh, ct);
+	struct mlx5_aso_sq *sq;
+	bool need_lock = !!(queue == MLX5_HW_INV_QUEUE);
 
-	MLX5_ASSERT(ct);
+	if (sh->config.dv_flow_en == 2)
+		sq = __mlx5_aso_ct_get_sq_in_hws(queue, pool);
+	else
+		sq = __mlx5_aso_ct_get_sq_in_sws(sh, ct);
 	do {
-		mlx5_aso_ct_completion_handle(sh->ct_mng);
-		if (mlx5_aso_ct_sq_enqueue_single(sh, ct, profile))
+		mlx5_aso_ct_completion_handle(sh, sq, need_lock);
+		if (mlx5_aso_ct_sq_enqueue_single(sh, sq, ct, profile, need_lock))
 			return 0;
 		/* Waiting for wqe resource. */
 		rte_delay_us_sleep(10u);
 	} while (--poll_wqe_times);
-	pool = container_of(ct, struct mlx5_aso_ct_pool, actions[ct->offset]);
 	DRV_LOG(ERR, "Fail to send WQE for ASO CT %d in pool %d",
 		ct->offset, pool->index);
 	return -1;
@@ -1242,6 +1336,8 @@ mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
  *
  * @param[in] sh
  *   Pointer to mlx5_dev_ctx_shared object.
+ * @param[in] queue
+ *   The queue which the CT works on.
  * @param[in] ct
  *   Pointer to connection tracking offload object.
  *
@@ -1249,25 +1345,29 @@ mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
  *   0 on success, -1 on failure.
  */
 int
-mlx5_aso_ct_wait_ready(struct mlx5_dev_ctx_shared *sh,
+mlx5_aso_ct_wait_ready(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 		       struct mlx5_aso_ct_action *ct)
 {
-	struct mlx5_aso_ct_pools_mng *mng = sh->ct_mng;
 	uint32_t poll_cqe_times = MLX5_CT_POLL_WQE_CQE_TIMES;
-	struct mlx5_aso_ct_pool *pool;
+	struct mlx5_aso_ct_pool *pool = __mlx5_aso_ct_get_pool(sh, ct);
+	struct mlx5_aso_sq *sq;
+	bool need_lock = !!(queue == MLX5_HW_INV_QUEUE);
 
+	if (sh->config.dv_flow_en == 2)
+		sq = __mlx5_aso_ct_get_sq_in_hws(queue, pool);
+	else
+		sq = __mlx5_aso_ct_get_sq_in_sws(sh, ct);
 	if (__atomic_load_n(&ct->state, __ATOMIC_RELAXED) ==
 	    ASO_CONNTRACK_READY)
 		return 0;
 	do {
-		mlx5_aso_ct_completion_handle(mng);
+		mlx5_aso_ct_completion_handle(sh, sq, need_lock);
 		if (__atomic_load_n(&ct->state, __ATOMIC_RELAXED) ==
 		    ASO_CONNTRACK_READY)
 			return 0;
 		/* Waiting for CQE ready, consider should block or sleep. */
 		rte_delay_us_sleep(MLX5_ASO_WQE_CQE_RESPONSE_DELAY);
 	} while (--poll_cqe_times);
-	pool = container_of(ct, struct mlx5_aso_ct_pool, actions[ct->offset]);
 	DRV_LOG(ERR, "Fail to poll CQE for ASO CT %d in pool %d",
 		ct->offset, pool->index);
 	return -1;
@@ -1363,18 +1463,24 @@ mlx5_aso_ct_obj_analyze(struct rte_flow_action_conntrack *profile,
  */
 int
 mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh,
+			 uint32_t queue,
 			 struct mlx5_aso_ct_action *ct,
 			 struct rte_flow_action_conntrack *profile)
 {
 	uint32_t poll_wqe_times = MLX5_CT_POLL_WQE_CQE_TIMES;
-	struct mlx5_aso_ct_pool *pool;
+	struct mlx5_aso_ct_pool *pool = __mlx5_aso_ct_get_pool(sh, ct);
+	struct mlx5_aso_sq *sq;
+	bool need_lock = !!(queue == MLX5_HW_INV_QUEUE);
 	char out_data[64 * 2];
 	int ret;
 
-	MLX5_ASSERT(ct);
+	if (sh->config.dv_flow_en == 2)
+		sq = __mlx5_aso_ct_get_sq_in_hws(queue, pool);
+	else
+		sq = __mlx5_aso_ct_get_sq_in_sws(sh, ct);
 	do {
-		mlx5_aso_ct_completion_handle(sh->ct_mng);
-		ret = mlx5_aso_ct_sq_query_single(sh, ct, out_data);
+		mlx5_aso_ct_completion_handle(sh, sq, need_lock);
+		ret = mlx5_aso_ct_sq_query_single(sh, sq, ct, out_data, need_lock);
 		if (ret < 0)
 			return ret;
 		else if (ret > 0)
@@ -1383,12 +1489,11 @@ mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh,
 		else
 			rte_delay_us_sleep(10u);
 	} while (--poll_wqe_times);
-	pool = container_of(ct, struct mlx5_aso_ct_pool, actions[ct->offset]);
 	DRV_LOG(ERR, "Fail to send WQE for ASO CT %d in pool %d",
 		ct->offset, pool->index);
 	return -1;
 data_handle:
-	ret = mlx5_aso_ct_wait_ready(sh, ct);
+	ret = mlx5_aso_ct_wait_ready(sh, queue, ct);
 	if (!ret)
 		mlx5_aso_ct_obj_analyze(profile, out_data);
 	return ret;
@@ -1408,13 +1513,20 @@ mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh,
  */
 int
 mlx5_aso_ct_available(struct mlx5_dev_ctx_shared *sh,
+		      uint32_t queue,
 		      struct mlx5_aso_ct_action *ct)
 {
-	struct mlx5_aso_ct_pools_mng *mng = sh->ct_mng;
+	struct mlx5_aso_ct_pool *pool = __mlx5_aso_ct_get_pool(sh, ct);
+	struct mlx5_aso_sq *sq;
+	bool need_lock = !!(queue == MLX5_HW_INV_QUEUE);
 	uint32_t poll_cqe_times = MLX5_CT_POLL_WQE_CQE_TIMES;
 	enum mlx5_aso_ct_state state =
 				__atomic_load_n(&ct->state, __ATOMIC_RELAXED);
 
+	if (sh->config.dv_flow_en == 2)
+		sq = __mlx5_aso_ct_get_sq_in_hws(queue, pool);
+	else
+		sq = __mlx5_aso_ct_get_sq_in_sws(sh, ct);
 	if (state == ASO_CONNTRACK_FREE) {
 		rte_errno = ENXIO;
 		return -rte_errno;
@@ -1423,13 +1535,13 @@ mlx5_aso_ct_available(struct mlx5_dev_ctx_shared *sh,
 		return 0;
 	}
 	do {
-		mlx5_aso_ct_completion_handle(mng);
+		mlx5_aso_ct_completion_handle(sh, sq, need_lock);
 		state = __atomic_load_n(&ct->state, __ATOMIC_RELAXED);
 		if (state == ASO_CONNTRACK_READY ||
 		    state == ASO_CONNTRACK_QUERY)
 			return 0;
-		/* Waiting for CQE ready, consider should block or sleep. */
-		rte_delay_us_sleep(MLX5_ASO_WQE_CQE_RESPONSE_DELAY);
+		/* Waiting for CQE ready, consider should block or sleep.  */
+		rte_delay_us_block(MLX5_ASO_WQE_CQE_RESPONSE_DELAY);
 	} while (--poll_cqe_times);
 	rte_errno = EBUSY;
 	return -rte_errno;
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 90441fbd6e..9721c5c311 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -12801,6 +12801,7 @@ flow_dv_ct_pool_create(struct rte_eth_dev *dev,
 	struct mlx5_devx_obj *obj = NULL;
 	uint32_t i;
 	uint32_t log_obj_size = rte_log2_u32(MLX5_ASO_CT_ACTIONS_PER_POOL);
+	size_t mem_size;
 
 	obj = mlx5_devx_cmd_create_conn_track_offload_obj(priv->sh->cdev->ctx,
 							  priv->sh->cdev->pdn,
@@ -12810,7 +12811,10 @@ flow_dv_ct_pool_create(struct rte_eth_dev *dev,
 		DRV_LOG(ERR, "Failed to create conn_track_offload_obj using DevX.");
 		return NULL;
 	}
-	pool = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*pool), 0, SOCKET_ID_ANY);
+	mem_size = sizeof(struct mlx5_aso_ct_action) *
+		   MLX5_ASO_CT_ACTIONS_PER_POOL +
+		   sizeof(*pool);
+	pool = mlx5_malloc(MLX5_MEM_ZERO, mem_size, 0, SOCKET_ID_ANY);
 	if (!pool) {
 		rte_errno = ENOMEM;
 		claim_zero(mlx5_devx_cmd_destroy(obj));
@@ -12950,10 +12954,13 @@ flow_dv_translate_create_conntrack(struct rte_eth_dev *dev,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "Failed to allocate CT object");
 	ct = flow_aso_ct_get_by_dev_idx(dev, idx);
-	if (mlx5_aso_ct_update_by_wqe(sh, ct, pro))
-		return rte_flow_error_set(error, EBUSY,
-					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
-					  "Failed to update CT");
+	if (mlx5_aso_ct_update_by_wqe(sh, MLX5_HW_INV_QUEUE, ct, pro)) {
+		flow_dv_aso_ct_dev_release(dev, idx);
+		rte_flow_error_set(error, EBUSY,
+				   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+				   "Failed to update CT");
+		return 0;
+	}
 	ct->is_original = !!pro->is_original_dir;
 	ct->peer = pro->peer_port;
 	return idx;
@@ -14147,7 +14154,7 @@ flow_dv_translate(struct rte_eth_dev *dev,
 						RTE_FLOW_ERROR_TYPE_ACTION,
 						NULL,
 						"Failed to get CT object.");
-			if (mlx5_aso_ct_available(priv->sh, ct))
+			if (mlx5_aso_ct_available(priv->sh, MLX5_HW_INV_QUEUE, ct))
 				return rte_flow_error_set(error, rte_errno,
 						RTE_FLOW_ERROR_TYPE_ACTION,
 						NULL,
@@ -15755,14 +15762,15 @@ __flow_dv_action_ct_update(struct rte_eth_dev *dev, uint32_t idx,
 		ret = mlx5_validate_action_ct(dev, new_prf, error);
 		if (ret)
 			return ret;
-		ret = mlx5_aso_ct_update_by_wqe(priv->sh, ct, new_prf);
+		ret = mlx5_aso_ct_update_by_wqe(priv->sh, MLX5_HW_INV_QUEUE,
+						ct, new_prf);
 		if (ret)
 			return rte_flow_error_set(error, EIO,
 					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 					NULL,
 					"Failed to send CT context update WQE");
-		/* Block until ready or a failure. */
-		ret = mlx5_aso_ct_available(priv->sh, ct);
+		/* Block until ready or a failure, default is asynchronous. */
+		ret = mlx5_aso_ct_available(priv->sh, MLX5_HW_INV_QUEUE, ct);
 		if (ret)
 			rte_flow_error_set(error, rte_errno,
 					   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
@@ -16591,7 +16599,7 @@ flow_dv_action_query(struct rte_eth_dev *dev,
 							ct->peer;
 		((struct rte_flow_action_conntrack *)data)->is_original_dir =
 							ct->is_original;
-		if (mlx5_aso_ct_query_by_wqe(priv->sh, ct, data))
+		if (mlx5_aso_ct_query_by_wqe(priv->sh, MLX5_HW_INV_QUEUE, ct, data))
 			return rte_flow_error_set(error, EIO,
 					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 					NULL,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 2798fedd00..9f575786f7 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -15,6 +15,14 @@
 /* The maximum actions support in the flow. */
 #define MLX5_HW_MAX_ACTS 16
 
+/*
+ * The default ipool threshold value indicates which per_core_cache
+ * value to set.
+ */
+#define MLX5_HW_IPOOL_SIZE_THRESHOLD (1 << 19)
+/* The default min local cache size. */
+#define MLX5_HW_IPOOL_CACHE_MIN (1 << 9)
+
 /* Default push burst threshold. */
 #define BURST_THR 32u
 
@@ -324,6 +332,25 @@ flow_hw_tir_action_register(struct rte_eth_dev *dev,
 	return hrxq;
 }
 
+static __rte_always_inline int
+flow_hw_ct_compile(struct rte_eth_dev *dev,
+		   uint32_t queue, uint32_t idx,
+		   struct mlx5dr_rule_action *rule_act)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_ct_action *ct;
+
+	ct = mlx5_ipool_get(priv->hws_ctpool->cts, MLX5_ACTION_CTX_CT_GET_IDX(idx));
+	if (!ct || mlx5_aso_ct_available(priv->sh, queue, ct))
+		return -1;
+	rule_act->action = priv->hws_ctpool->dr_action;
+	rule_act->aso_ct.offset = ct->offset;
+	rule_act->aso_ct.direction = ct->is_original ?
+		MLX5DR_ACTION_ASO_CT_DIRECTION_INITIATOR :
+		MLX5DR_ACTION_ASO_CT_DIRECTION_RESPONDER;
+	return 0;
+}
+
 /**
  * Destroy DR actions created by action template.
  *
@@ -623,6 +650,11 @@ flow_hw_shared_action_translate(struct rte_eth_dev *dev,
 			action_src, action_dst, act_idx))
 			return -1;
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_CT:
+		if (flow_hw_ct_compile(dev, MLX5_HW_INV_QUEUE,
+				       idx, &acts->rule_acts[action_dst]))
+			return -1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type:%d", type);
 		break;
@@ -1066,6 +1098,7 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
 	bool reformat_used = false;
 	uint16_t action_pos;
 	uint16_t jump_pos;
+	uint32_t ct_idx;
 	int err;
 
 	flow_hw_modify_field_init(&mhdr, at);
@@ -1288,6 +1321,20 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
 				goto err;
 			}
 			break;
+		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+			action_pos = at->actions_off[actions - action_start];
+			if (masks->conf) {
+				ct_idx = MLX5_ACTION_CTX_CT_GET_IDX
+					 ((uint32_t)(uintptr_t)actions->conf);
+				if (flow_hw_ct_compile(dev, MLX5_HW_INV_QUEUE, ct_idx,
+						       &acts->rule_acts[action_pos]))
+					goto err;
+			} else if (__flow_hw_act_data_general_append
+					(priv, acts, actions->type,
+					 actions - action_start, action_pos)) {
+				goto err;
+			}
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -1462,6 +1509,8 @@ flow_hw_shared_action_get(struct rte_eth_dev *dev,
  *
  * @param[in] dev
  *   Pointer to the rte_eth_dev data structure.
+ * @param[in] queue
+ *   The flow creation queue index.
  * @param[in] action
  *   Pointer to the shared indirect rte_flow action.
  * @param[in] table
@@ -1475,7 +1524,7 @@ flow_hw_shared_action_get(struct rte_eth_dev *dev,
  *    0 on success, negative value otherwise and rte_errno is set.
  */
 static __rte_always_inline int
-flow_hw_shared_action_construct(struct rte_eth_dev *dev,
+flow_hw_shared_action_construct(struct rte_eth_dev *dev, uint32_t queue,
 				const struct rte_flow_action *action,
 				struct rte_flow_template_table *table,
 				const uint8_t it_idx,
@@ -1515,6 +1564,10 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev,
 				&rule_act->counter.offset))
 			return -1;
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_CT:
+		if (flow_hw_ct_compile(dev, queue, idx, rule_act))
+			return -1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type:%d", type);
 		break;
@@ -1710,6 +1763,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		uint64_t item_flags;
 		struct mlx5_hw_jump_action *jump;
 		struct mlx5_hrxq *hrxq;
+		uint32_t ct_idx;
 		cnt_id_t cnt_id;
 
 		action = &actions[act_data->action_src];
@@ -1718,7 +1772,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		switch (act_data->type) {
 		case RTE_FLOW_ACTION_TYPE_INDIRECT:
 			if (flow_hw_shared_action_construct
-					(dev, action, table, it_idx,
+					(dev, queue, action, table, it_idx,
 					 &rule_acts[act_data->action_dst]))
 				return -1;
 			break;
@@ -1843,6 +1897,13 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 				return ret;
 			job->flow->cnt_id = act_data->shared_counter.id;
 			break;
+		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+			ct_idx = MLX5_ACTION_CTX_CT_GET_IDX
+				 ((uint32_t)(uintptr_t)action->conf);
+			if (flow_hw_ct_compile(dev, queue, ct_idx,
+					       &rule_acts[act_data->action_dst]))
+				return -1;
+			break;
 		default:
 			break;
 		}
@@ -2372,6 +2433,8 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	if (nb_flows < cfg.trunk_size) {
 		cfg.per_core_cache = 0;
 		cfg.trunk_size = nb_flows;
+	} else if (nb_flows <= MLX5_HW_IPOOL_SIZE_THRESHOLD) {
+		cfg.per_core_cache = MLX5_HW_IPOOL_CACHE_MIN;
 	}
 	/* Check if we requires too many templates. */
 	if (nb_item_templates > max_tpl ||
@@ -2889,6 +2952,9 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_COUNT:
 			/* TODO: Validation logic */
 			break;
+		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+			/* TODO: Validation logic */
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -2915,6 +2981,7 @@ static enum mlx5dr_action_type mlx5_hw_dr_action_types[] = {
 	[RTE_FLOW_ACTION_TYPE_MODIFY_FIELD] = MLX5DR_ACTION_TYP_MODIFY_HDR,
 	[RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT] = MLX5DR_ACTION_TYP_VPORT,
 	[RTE_FLOW_ACTION_TYPE_COUNT] = MLX5DR_ACTION_TYP_CTR,
+	[RTE_FLOW_ACTION_TYPE_CONNTRACK] = MLX5DR_ACTION_TYP_ASO_CT,
 };
 
 static int
@@ -2943,6 +3010,11 @@ flow_hw_dr_actions_template_handle_shared(const struct rte_flow_action *mask,
 		action_types[*curr_off] = MLX5DR_ACTION_TYP_CTR;
 		*curr_off = *curr_off + 1;
 		break;
+	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+		at->actions_off[action_src] = *curr_off;
+		action_types[*curr_off] = MLX5DR_ACTION_TYP_ASO_CT;
+		*curr_off = *curr_off + 1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type: %d", type);
 		return -EINVAL;
@@ -3397,6 +3469,7 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ITEM_TYPE_GRE_OPTION:
 		case RTE_FLOW_ITEM_TYPE_ICMP:
 		case RTE_FLOW_ITEM_TYPE_ICMP6:
+		case RTE_FLOW_ITEM_TYPE_CONNTRACK:
 			break;
 		case RTE_FLOW_ITEM_TYPE_END:
 			items_end = true;
@@ -4592,6 +4665,97 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 	return -EINVAL;
 }
 
+static void
+flow_hw_ct_mng_destroy(struct rte_eth_dev *dev,
+		       struct mlx5_aso_ct_pools_mng *ct_mng)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	mlx5_aso_ct_queue_uninit(priv->sh, ct_mng);
+	mlx5_free(ct_mng);
+}
+
+static void
+flow_hw_ct_pool_destroy(struct rte_eth_dev *dev __rte_unused,
+			struct mlx5_aso_ct_pool *pool)
+{
+	if (pool->dr_action)
+		mlx5dr_action_destroy(pool->dr_action);
+	if (pool->devx_obj)
+		claim_zero(mlx5_devx_cmd_destroy(pool->devx_obj));
+	if (pool->cts)
+		mlx5_ipool_destroy(pool->cts);
+	mlx5_free(pool);
+}
+
+static struct mlx5_aso_ct_pool *
+flow_hw_ct_pool_create(struct rte_eth_dev *dev,
+		       const struct rte_flow_port_attr *port_attr)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_ct_pool *pool;
+	struct mlx5_devx_obj *obj;
+	uint32_t nb_cts = rte_align32pow2(port_attr->nb_conn_tracks);
+	uint32_t log_obj_size = rte_log2_u32(nb_cts);
+	struct mlx5_indexed_pool_config cfg = {
+		.size = sizeof(struct mlx5_aso_ct_action),
+		.trunk_size = 1 << 12,
+		.per_core_cache = 1 << 13,
+		.need_lock = 1,
+		.release_mem_en = !!priv->sh->config.reclaim_mode,
+		.malloc = mlx5_malloc,
+		.free = mlx5_free,
+		.type = "mlx5_hw_ct_action",
+	};
+	int reg_id;
+	uint32_t flags;
+
+	pool = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*pool), 0, SOCKET_ID_ANY);
+	if (!pool) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	obj = mlx5_devx_cmd_create_conn_track_offload_obj(priv->sh->cdev->ctx,
+							  priv->sh->cdev->pdn,
+							  log_obj_size);
+	if (!obj) {
+		rte_errno = ENODATA;
+		DRV_LOG(ERR, "Failed to create conn_track_offload_obj using DevX.");
+		goto err;
+	}
+	pool->devx_obj = obj;
+	reg_id = mlx5_flow_get_reg_id(dev, MLX5_ASO_CONNTRACK, 0, NULL);
+	flags = MLX5DR_ACTION_FLAG_HWS_RX | MLX5DR_ACTION_FLAG_HWS_TX;
+	if (priv->sh->config.dv_esw_en && priv->master)
+		flags |= MLX5DR_ACTION_FLAG_HWS_FDB;
+	pool->dr_action = mlx5dr_action_create_aso_ct(priv->dr_ctx,
+						      (struct mlx5dr_devx_obj *)obj,
+						      reg_id - REG_C_0, flags);
+	if (!pool->dr_action)
+		goto err;
+	/*
+	 * No need for a local cache if the CT number is small. The flow
+	 * insertion rate will be very limited in that case, so keep the
+	 * number below the default trunk size of 4K.
+	 */
+	if (nb_cts <= cfg.trunk_size) {
+		cfg.per_core_cache = 0;
+		cfg.trunk_size = nb_cts;
+	} else if (nb_cts <= MLX5_HW_IPOOL_SIZE_THRESHOLD) {
+		cfg.per_core_cache = MLX5_HW_IPOOL_CACHE_MIN;
+	}
+	pool->cts = mlx5_ipool_create(&cfg);
+	if (!pool->cts)
+		goto err;
+	pool->sq = priv->ct_mng->aso_sqs;
+	/* Assign the last extra ASO SQ as public SQ. */
+	pool->shared_sq = &priv->ct_mng->aso_sqs[priv->nb_queue - 1];
+	return pool;
+err:
+	flow_hw_ct_pool_destroy(dev, pool);
+	return NULL;
+}
+
 /**
  * Configure port HWS resources.
  *
@@ -4777,6 +4941,20 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	}
 	if (_queue_attr)
 		mlx5_free(_queue_attr);
+	if (port_attr->nb_conn_tracks) {
+		mem_size = sizeof(struct mlx5_aso_sq) * nb_q_updated +
+			   sizeof(*priv->ct_mng);
+		priv->ct_mng = mlx5_malloc(MLX5_MEM_ZERO, mem_size,
+					   RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
+		if (!priv->ct_mng)
+			goto err;
+		if (mlx5_aso_ct_queue_init(priv->sh, priv->ct_mng, nb_q_updated))
+			goto err;
+		priv->hws_ctpool = flow_hw_ct_pool_create(dev, port_attr);
+		if (!priv->hws_ctpool)
+			goto err;
+		priv->sh->ct_aso_en = 1;
+	}
 	if (port_attr->nb_counters) {
 		priv->hws_cpool = mlx5_hws_cnt_pool_create(dev, port_attr,
 				nb_queue);
@@ -4785,6 +4963,14 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	}
 	return 0;
 err:
+	if (priv->hws_ctpool) {
+		flow_hw_ct_pool_destroy(dev, priv->hws_ctpool);
+		priv->hws_ctpool = NULL;
+	}
+	if (priv->ct_mng) {
+		flow_hw_ct_mng_destroy(dev, priv->ct_mng);
+		priv->ct_mng = NULL;
+	}
 	flow_hw_free_vport_actions(priv);
 	for (i = 0; i < MLX5_HW_ACTION_FLAG_MAX; i++) {
 		if (priv->hw_drop[i])
@@ -4858,6 +5044,14 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 	}
 	if (priv->hws_cpool)
 		mlx5_hws_cnt_pool_destroy(priv->sh, priv->hws_cpool);
+	if (priv->hws_ctpool) {
+		flow_hw_ct_pool_destroy(dev, priv->hws_ctpool);
+		priv->hws_ctpool = NULL;
+	}
+	if (priv->ct_mng) {
+		flow_hw_ct_mng_destroy(dev, priv->ct_mng);
+		priv->ct_mng = NULL;
+	}
 	mlx5_free(priv->hw_q);
 	priv->hw_q = NULL;
 	claim_zero(mlx5dr_context_close(priv->dr_ctx));
@@ -4926,6 +5120,7 @@ void flow_hw_init_tags_set(struct rte_eth_dev *dev)
 		unset |= 1 << (REG_C_1 - REG_C_0);
 	masks &= ~unset;
 	if (mlx5_flow_hw_avl_tags_init_cnt) {
+		MLX5_ASSERT(mlx5_flow_hw_aso_tag == priv->mtr_color_reg);
 		for (i = 0; i < MLX5_FLOW_HW_TAGS_MAX; i++) {
 			if (mlx5_flow_hw_avl_tags[i] != REG_NON && !!((1 << i) & masks)) {
 				copy[mlx5_flow_hw_avl_tags[i] - REG_C_0] =
@@ -4948,6 +5143,7 @@ void flow_hw_init_tags_set(struct rte_eth_dev *dev)
 		}
 	}
 	priv->sh->hws_tags = 1;
+	mlx5_flow_hw_aso_tag = (enum modify_reg)priv->mtr_color_reg;
 	mlx5_flow_hw_avl_tags_init_cnt++;
 }
 
@@ -5018,6 +5214,170 @@ flow_hw_clear_flow_metadata_config(void)
 	mlx5_flow_hw_flow_metadata_xmeta_en = 0;
 }
 
+static int
+flow_hw_conntrack_destroy(struct rte_eth_dev *dev __rte_unused,
+			  uint32_t idx,
+			  struct rte_flow_error *error)
+{
+	uint16_t owner = (uint16_t)MLX5_ACTION_CTX_CT_GET_OWNER(idx);
+	uint32_t ct_idx = MLX5_ACTION_CTX_CT_GET_IDX(idx);
+	struct rte_eth_dev *owndev = &rte_eth_devices[owner];
+	struct mlx5_priv *priv = owndev->data->dev_private;
+	struct mlx5_aso_ct_pool *pool = priv->hws_ctpool;
+	struct mlx5_aso_ct_action *ct;
+
+	ct = mlx5_ipool_get(pool->cts, ct_idx);
+	if (!ct) {
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL,
+				"Invalid CT destruction index");
+	}
+	__atomic_store_n(&ct->state, ASO_CONNTRACK_FREE,
+				 __ATOMIC_RELAXED);
+	mlx5_ipool_free(pool->cts, ct_idx);
+	return 0;
+}
+
+static int
+flow_hw_conntrack_query(struct rte_eth_dev *dev, uint32_t idx,
+			struct rte_flow_action_conntrack *profile,
+			struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_ct_pool *pool = priv->hws_ctpool;
+	struct mlx5_aso_ct_action *ct;
+	uint16_t owner = (uint16_t)MLX5_ACTION_CTX_CT_GET_OWNER(idx);
+	uint32_t ct_idx;
+
+	if (owner != PORT_ID(priv))
+		return rte_flow_error_set(error, EACCES,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL,
+				"Can't query CT object owned by another port");
+	ct_idx = MLX5_ACTION_CTX_CT_GET_IDX(idx);
+	ct = mlx5_ipool_get(pool->cts, ct_idx);
+	if (!ct) {
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL,
+				"Invalid CT query index");
+	}
+	profile->peer_port = ct->peer;
+	profile->is_original_dir = ct->is_original;
+	if (mlx5_aso_ct_query_by_wqe(priv->sh, MLX5_HW_INV_QUEUE, ct, profile))
+		return rte_flow_error_set(error, EIO,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL,
+				"Failed to query CT context");
+	return 0;
+}
+
+
+static int
+flow_hw_conntrack_update(struct rte_eth_dev *dev, uint32_t queue,
+			 const struct rte_flow_modify_conntrack *action_conf,
+			 uint32_t idx, struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_ct_pool *pool = priv->hws_ctpool;
+	struct mlx5_aso_ct_action *ct;
+	const struct rte_flow_action_conntrack *new_prf;
+	uint16_t owner = (uint16_t)MLX5_ACTION_CTX_CT_GET_OWNER(idx);
+	uint32_t ct_idx;
+	int ret = 0;
+
+	if (PORT_ID(priv) != owner)
+		return rte_flow_error_set(error, EACCES,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL,
+					  "Can't update CT object owned by another port");
+	ct_idx = MLX5_ACTION_CTX_CT_GET_IDX(idx);
+	ct = mlx5_ipool_get(pool->cts, ct_idx);
+	if (!ct) {
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL,
+				"Invalid CT update index");
+	}
+	new_prf = &action_conf->new_ct;
+	if (action_conf->direction)
+		ct->is_original = !!new_prf->is_original_dir;
+	if (action_conf->state) {
+		/* Only validate the profile when it needs to be updated. */
+		ret = mlx5_validate_action_ct(dev, new_prf, error);
+		if (ret)
+			return ret;
+		ret = mlx5_aso_ct_update_by_wqe(priv->sh, queue, ct, new_prf);
+		if (ret)
+			return rte_flow_error_set(error, EIO,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL,
+					"Failed to send CT context update WQE");
+		if (queue != MLX5_HW_INV_QUEUE)
+			return 0;
+		/* Block until ready or a failure in synchronous mode. */
+		ret = mlx5_aso_ct_available(priv->sh, queue, ct);
+		if (ret)
+			rte_flow_error_set(error, rte_errno,
+					   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					   NULL,
+					   "Timeout to get the CT update");
+	}
+	return ret;
+}
+
+static struct rte_flow_action_handle *
+flow_hw_conntrack_create(struct rte_eth_dev *dev, uint32_t queue,
+			 const struct rte_flow_action_conntrack *pro,
+			 struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_ct_pool *pool = priv->hws_ctpool;
+	struct mlx5_aso_ct_action *ct;
+	uint32_t ct_idx = 0;
+	int ret;
+	bool async = !!(queue != MLX5_HW_INV_QUEUE);
+
+	if (!pool) {
+		rte_flow_error_set(error, EINVAL,
+				   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+				   "CT is not enabled");
+		return 0;
+	}
+	ct = mlx5_ipool_zmalloc(pool->cts, &ct_idx);
+	if (!ct) {
+		rte_flow_error_set(error, rte_errno,
+				   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+				   "Failed to allocate CT object");
+		return 0;
+	}
+	ct->offset = ct_idx - 1;
+	ct->is_original = !!pro->is_original_dir;
+	ct->peer = pro->peer_port;
+	ct->pool = pool;
+	if (mlx5_aso_ct_update_by_wqe(priv->sh, queue, ct, pro)) {
+		mlx5_ipool_free(pool->cts, ct_idx);
+		rte_flow_error_set(error, EBUSY,
+				   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+				   "Failed to update CT");
+		return 0;
+	}
+	if (!async) {
+		ret = mlx5_aso_ct_available(priv->sh, queue, ct);
+		if (ret) {
+			mlx5_ipool_free(pool->cts, ct_idx);
+			rte_flow_error_set(error, rte_errno,
+					   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					   NULL,
+					   "Timeout to get the CT update");
+			return 0;
+		}
+	}
+	return (struct rte_flow_action_handle *)(uintptr_t)
+		MLX5_ACTION_CTX_CT_GEN_IDX(PORT_ID(priv), ct_idx);
+}
+
 /**
  * Create shared action.
  *
@@ -5065,6 +5425,9 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 			handle = (struct rte_flow_action_handle *)
 				 (uintptr_t)cnt_id;
 		break;
+	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+		handle = flow_hw_conntrack_create(dev, queue, action->conf, error);
+		break;
 	default:
 		handle = flow_dv_action_create(dev, conf, action, error);
 	}
@@ -5100,10 +5463,18 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 			     void *user_data,
 			     struct rte_flow_error *error)
 {
+	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
+	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+
 	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
 	RTE_SET_USED(user_data);
-	return flow_dv_action_update(dev, handle, update, error);
+	switch (type) {
+	case MLX5_INDIRECT_ACTION_TYPE_CT:
+		return flow_hw_conntrack_update(dev, queue, update, act_idx, error);
+	default:
+		return flow_dv_action_update(dev, handle, update, error);
+	}
 }
 
 /**
@@ -5142,6 +5513,8 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 	switch (type) {
 	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
 		return mlx5_hws_cnt_shared_put(priv->hws_cpool, &act_idx);
+	case MLX5_INDIRECT_ACTION_TYPE_CT:
+		return flow_hw_conntrack_destroy(dev, act_idx, error);
 	default:
 		return flow_dv_action_destroy(dev, handle, error);
 	}
@@ -5295,6 +5668,8 @@ flow_hw_action_query(struct rte_eth_dev *dev,
 	switch (type) {
 	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
 		return flow_hw_query_counter(dev, act_idx, data, error);
+	case MLX5_INDIRECT_ACTION_TYPE_CT:
+		return flow_hw_conntrack_query(dev, act_idx, data, error);
 	default:
 		return flow_dv_action_query(dev, handle, data, error);
 	}
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v2 11/17] net/mlx5: add HW steering VLAN push, pop and VID modify flow actions
  2022-09-28  3:31 ` [PATCH v2 00/17] net/mlx5: HW steering PMD update Suanming Mou
                     ` (9 preceding siblings ...)
  2022-09-28  3:31   ` [PATCH v2 10/17] net/mlx5: add HW steering connection tracking support Suanming Mou
@ 2022-09-28  3:31   ` Suanming Mou
  2022-09-28  3:31   ` [PATCH v2 12/17] net/mlx5: implement METER MARK indirect action for HWS Suanming Mou
                     ` (5 subsequent siblings)
  16 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-28  3:31 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Gregory Etelson

From: Gregory Etelson <getelson@nvidia.com>

Add PMD implementation for HW steering VLAN push, pop and modify flow
actions.

The HWS VLAN push flow action is triggered by a sequence of mandatory
OF_PUSH_VLAN and OF_SET_VLAN_VID commands and an optional
OF_SET_VLAN_PCP command.
The commands must be arranged in the exact order:
OF_PUSH_VLAN / OF_SET_VLAN_VID [ / OF_SET_VLAN_PCP ].
In a masked HWS VLAN push flow action template *ALL* of the above flow
actions must be masked.
In a non-masked HWS VLAN push flow action template *ALL* of the above
flow actions must not be masked.

Example:

flow actions_template <port id> create \
actions_template_id <action id> \
template \
  of_push_vlan / \
  of_set_vlan_vid \
  [ / of_set_vlan_pcp  ] / end \
mask \
  of_push_vlan ethertype 0 / \
  of_set_vlan_vid vlan_vid 0 \
  [ / of_set_vlan_pcp vlan_pcp 0 ] / end\

flow actions_template <port id> create \
actions_template_id <action id> \
template \
  of_push_vlan ethertype <E>/ \
  of_set_vlan_vid vlan_vid <VID>\
  [ / of_set_vlan_pcp  <PCP>] / end \
mask \
  of_push_vlan ethertype <type != 0> / \
  of_set_vlan_vid vlan_vid <vid_mask != 0>\
  [ / of_set_vlan_pcp vlan_pcp <pcp_mask != 0> ] / end\

The HWS VLAN pop flow action is triggered by the OF_POP_VLAN
flow action command.
The HWS VLAN pop action template is always non-masked.

Example:

flow actions_template <port id> create \
actions_template_id <action id> \
template of_pop_vlan / end mask of_pop_vlan / end

The HWS VLAN VID modify flow action is triggered by a standalone
OF_SET_VLAN_VID flow action command.
The HWS VLAN VID modify action template can be either masked or
non-masked.

Example:

flow actions_template <port id> create \
actions_template_id <action id> \
template of_set_vlan_vid / end mask of_set_vlan_vid vlan_vid 0 / end

flow actions_template <port id> create \
actions_template_id <action id> \
template of_set_vlan_vid vlan_vid 0x101 / end \
mask of_set_vlan_vid vlan_vid 0xffff / end
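
As a side note, the 32-bit VLAN header word consumed by the push action
packs the ethertype (TPID) together with the tag control information
(PCP: 3 bits, DEI: 1 bit, VID: 12 bits). Below is a standalone sketch,
assuming plain host-order inputs; the PMD helper added by this patch
instead works directly on the big-endian rte_flow action fields.

#include <stdint.h>
#include <arpa/inet.h> /* htonl() */

/* Pack TPID + TCI (PCP | DEI | VID) into one big-endian 32-bit word. */
static uint32_t vlan_hdr_be32(uint16_t tpid, uint8_t pcp, uint8_t dei,
			      uint16_t vid)
{
	uint32_t host = ((uint32_t)tpid << 16) |
			((uint32_t)(pcp & 0x7) << 13) |
			((uint32_t)(dei & 0x1) << 12) |
			(uint32_t)(vid & 0xfff);

	return htonl(host); /* network (big-endian) byte order */
}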

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
---
 drivers/net/mlx5/mlx5.h         |   2 +
 drivers/net/mlx5/mlx5_flow.h    |   4 +
 drivers/net/mlx5/mlx5_flow_dv.c |   2 +-
 drivers/net/mlx5/mlx5_flow_hw.c | 492 +++++++++++++++++++++++++++++---
 4 files changed, 463 insertions(+), 37 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index df962a1fc0..16cd261942 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1665,6 +1665,8 @@ struct mlx5_priv {
 	struct mlx5_hw_q *hw_q;
 	/* HW steering rte flow table list header. */
 	LIST_HEAD(flow_hw_tbl, rte_flow_template_table) flow_hw_tbl;
+	struct mlx5dr_action *hw_push_vlan[MLX5DR_TABLE_TYPE_MAX];
+	struct mlx5dr_action *hw_pop_vlan[MLX5DR_TABLE_TYPE_MAX];
 	struct mlx5dr_action **hw_vport;
 	/* HW steering global drop action. */
 	struct mlx5dr_action *hw_drop[2];
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index f75a56a57b..6d928b477e 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -2435,4 +2435,8 @@ int mlx5_flow_pattern_validate(struct rte_eth_dev *dev,
 		struct rte_flow_error *error);
 int flow_hw_table_update(struct rte_eth_dev *dev,
 			 struct rte_flow_error *error);
+int mlx5_flow_item_field_width(struct rte_eth_dev *dev,
+			   enum rte_flow_field_id field, int inherit,
+			   const struct rte_flow_attr *attr,
+			   struct rte_flow_error *error);
 #endif /* RTE_PMD_MLX5_FLOW_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 9721c5c311..7e0829d1fd 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -1326,7 +1326,7 @@ flow_dv_convert_action_modify_ipv6_dscp
 					     MLX5_MODIFICATION_TYPE_SET, error);
 }
 
-static int
+int
 mlx5_flow_item_field_width(struct rte_eth_dev *dev,
 			   enum rte_flow_field_id field, int inherit,
 			   const struct rte_flow_attr *attr,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 9f575786f7..af06659052 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -44,12 +44,22 @@
 /* Lowest priority for HW non-root table. */
 #define MLX5_HW_LOWEST_PRIO_NON_ROOT (UINT32_MAX)
 
+#define MLX5_HW_VLAN_PUSH_TYPE_IDX 0
+#define MLX5_HW_VLAN_PUSH_VID_IDX 1
+#define MLX5_HW_VLAN_PUSH_PCP_IDX 2
+
 static int flow_hw_flush_all_ctrl_flows(struct rte_eth_dev *dev);
 static int flow_hw_translate_group(struct rte_eth_dev *dev,
 				   const struct mlx5_flow_template_table_cfg *cfg,
 				   uint32_t group,
 				   uint32_t *table_group,
 				   struct rte_flow_error *error);
+static __rte_always_inline int
+flow_hw_set_vlan_vid_construct(struct rte_eth_dev *dev,
+			       struct mlx5_hw_q_job *job,
+			       struct mlx5_action_construct_data *act_data,
+			       const struct mlx5_hw_actions *hw_acts,
+			       const struct rte_flow_action *action);
 
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops;
 
@@ -1048,6 +1058,52 @@ flow_hw_cnt_compile(struct rte_eth_dev *dev, uint32_t  start_pos,
 	return 0;
 }
 
+static __rte_always_inline bool
+is_of_vlan_pcp_present(const struct rte_flow_action *actions)
+{
+	/*
+	 * Order of RTE VLAN push actions is
+	 * OF_PUSH_VLAN / OF_SET_VLAN_VID [ / OF_SET_VLAN_PCP ]
+	 */
+	return actions[MLX5_HW_VLAN_PUSH_PCP_IDX].type ==
+		RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_PCP;
+}
+
+static __rte_always_inline bool
+is_template_masked_push_vlan(const struct rte_flow_action_of_push_vlan *mask)
+{
+	/*
+	 * In masked push VLAN template all RTE push actions are masked.
+	 */
+	return mask && mask->ethertype != 0;
+}
+
+static rte_be32_t vlan_hdr_to_be32(const struct rte_flow_action *actions)
+{
+/*
+ * OpenFlow Switch Specification defines 802.1Q VID as 12+1 bits.
+ */
+	rte_be32_t type, vid, pcp;
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+	rte_be32_t vid_lo, vid_hi;
+#endif
+
+	type = ((const struct rte_flow_action_of_push_vlan *)
+		actions[MLX5_HW_VLAN_PUSH_TYPE_IDX].conf)->ethertype;
+	vid = ((const struct rte_flow_action_of_set_vlan_vid *)
+		actions[MLX5_HW_VLAN_PUSH_VID_IDX].conf)->vlan_vid;
+	pcp = is_of_vlan_pcp_present(actions) ?
+	      ((const struct rte_flow_action_of_set_vlan_pcp *)
+		      actions[MLX5_HW_VLAN_PUSH_PCP_IDX].conf)->vlan_pcp : 0;
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+	vid_hi = vid & 0xff;
+	vid_lo = vid >> 8;
+	return (((vid_lo << 8) | (pcp << 5) | vid_hi) << 16) | type;
+#else
+	return (type << 16) | (pcp << 13) | vid;
+#endif
+}
+
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -1150,6 +1206,26 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
 				priv->hw_tag[!!attr->group];
 			flow_hw_rxq_flag_set(dev, true);
 			break;
+		case RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN:
+			action_pos = at->actions_off[actions - at->actions];
+			acts->rule_acts[action_pos].action =
+				priv->hw_push_vlan[type];
+			if (is_template_masked_push_vlan(masks->conf))
+				acts->rule_acts[action_pos].push_vlan.vlan_hdr =
+					vlan_hdr_to_be32(actions);
+			else if (__flow_hw_act_data_general_append
+					(priv, acts, actions->type,
+					 actions - action_start, action_pos))
+				goto err;
+			actions += is_of_vlan_pcp_present(actions) ?
+					MLX5_HW_VLAN_PUSH_PCP_IDX :
+					MLX5_HW_VLAN_PUSH_VID_IDX;
+			break;
+		case RTE_FLOW_ACTION_TYPE_OF_POP_VLAN:
+			action_pos = at->actions_off[actions - at->actions];
+			acts->rule_acts[action_pos].action =
+				priv->hw_pop_vlan[type];
+			break;
 		case RTE_FLOW_ACTION_TYPE_JUMP:
 			action_pos = at->actions_off[actions - action_start];
 			if (masks->conf &&
@@ -1767,8 +1843,17 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		cnt_id_t cnt_id;
 
 		action = &actions[act_data->action_src];
-		MLX5_ASSERT(action->type == RTE_FLOW_ACTION_TYPE_INDIRECT ||
-			    (int)action->type == act_data->type);
+		/*
+		 * action template construction replaces
+		 * OF_SET_VLAN_VID with MODIFY_FIELD
+		 */
+		if (action->type == RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID)
+			MLX5_ASSERT(act_data->type ==
+				    RTE_FLOW_ACTION_TYPE_MODIFY_FIELD);
+		else
+			MLX5_ASSERT(action->type ==
+				    RTE_FLOW_ACTION_TYPE_INDIRECT ||
+				    (int)action->type == act_data->type);
 		switch (act_data->type) {
 		case RTE_FLOW_ACTION_TYPE_INDIRECT:
 			if (flow_hw_shared_action_construct
@@ -1784,6 +1869,10 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			      (action->conf))->id);
 			rule_acts[act_data->action_dst].tag.value = tag;
 			break;
+		case RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN:
+			rule_acts[act_data->action_dst].push_vlan.vlan_hdr =
+				vlan_hdr_to_be32(action);
+			break;
 		case RTE_FLOW_ACTION_TYPE_JUMP:
 			jump_group = ((const struct rte_flow_action_jump *)
 						action->conf)->group;
@@ -1835,10 +1924,16 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 				    act_data->encap.len);
 			break;
 		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
-			ret = flow_hw_modify_field_construct(job,
-							     act_data,
-							     hw_acts,
-							     action);
+			if (action->type == RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID)
+				ret = flow_hw_set_vlan_vid_construct(dev, job,
+								     act_data,
+								     hw_acts,
+								     action);
+			else
+				ret = flow_hw_modify_field_construct(job,
+								     act_data,
+								     hw_acts,
+								     action);
 			if (ret)
 				return -1;
 			break;
@@ -2540,9 +2635,14 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 			mlx5_ipool_destroy(tbl->flow);
 		mlx5_free(tbl);
 	}
-	rte_flow_error_set(error, err,
-			  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
-			  "fail to create rte table");
+	if (error != NULL) {
+		rte_flow_error_set(error, err,
+				error->type == RTE_FLOW_ERROR_TYPE_NONE ?
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED : error->type,
+				NULL,
+				error->message == NULL ?
+				"fail to create rte table" : error->message);
+	}
 	return NULL;
 }
 
@@ -2827,28 +2927,76 @@ flow_hw_action_meta_copy_insert(const struct rte_flow_action actions[],
 				uint16_t *ins_pos)
 {
 	uint16_t idx, total = 0;
-	bool ins = false;
+	uint16_t end_idx = UINT16_MAX;
 	bool act_end = false;
+	bool modify_field = false;
+	bool rss_or_queue = false;
 
 	MLX5_ASSERT(actions && masks);
 	MLX5_ASSERT(new_actions && new_masks);
 	MLX5_ASSERT(ins_actions && ins_masks);
 	for (idx = 0; !act_end; idx++) {
-		if (idx >= MLX5_HW_MAX_ACTS)
-			return -1;
-		if (actions[idx].type == RTE_FLOW_ACTION_TYPE_RSS ||
-		    actions[idx].type == RTE_FLOW_ACTION_TYPE_QUEUE) {
-			ins = true;
-			*ins_pos = idx;
-		}
-		if (actions[idx].type == RTE_FLOW_ACTION_TYPE_END)
+		switch (actions[idx].type) {
+		case RTE_FLOW_ACTION_TYPE_RSS:
+		case RTE_FLOW_ACTION_TYPE_QUEUE:
+			/* The application is assumed to provide only a single RSS/QUEUE action. */
+			MLX5_ASSERT(!rss_or_queue);
+			rss_or_queue = true;
+			break;
+		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
+			modify_field = true;
+			break;
+		case RTE_FLOW_ACTION_TYPE_END:
+			end_idx = idx;
 			act_end = true;
+			break;
+		default:
+			break;
+		}
 	}
-	if (!ins)
+	if (!rss_or_queue)
 		return 0;
-	else if (idx == MLX5_HW_MAX_ACTS)
+	else if (idx >= MLX5_HW_MAX_ACTS)
 		return -1; /* No more space. */
 	total = idx;
+	/*
+	 * If actions template contains MODIFY_FIELD action, then meta copy action can be inserted
+	 * at the template's end. Position of MODIFY_HDR action is based on the position of the
+	 * first MODIFY_FIELD flow action.
+	 */
+	if (modify_field) {
+		*ins_pos = end_idx;
+		goto insert_meta_copy;
+	}
+	/*
+	 * If actions template does not contain MODIFY_FIELD action, then meta copy action must be
+	 * inserted at a place conforming with the action order defined in steering/mlx5dr_action.c.
+	 */
+	act_end = false;
+	for (idx = 0; !act_end; idx++) {
+		switch (actions[idx].type) {
+		case RTE_FLOW_ACTION_TYPE_COUNT:
+		case RTE_FLOW_ACTION_TYPE_METER:
+		case RTE_FLOW_ACTION_TYPE_METER_MARK:
+		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
+		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
+		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
+		case RTE_FLOW_ACTION_TYPE_RSS:
+		case RTE_FLOW_ACTION_TYPE_QUEUE:
+			*ins_pos = idx;
+			act_end = true;
+			break;
+		case RTE_FLOW_ACTION_TYPE_END:
+			act_end = true;
+			break;
+		default:
+			break;
+		}
+	}
+insert_meta_copy:
+	MLX5_ASSERT(*ins_pos != UINT16_MAX);
+	MLX5_ASSERT(*ins_pos < total);
 	/* Before the position, no change for the actions. */
 	for (idx = 0; idx < *ins_pos; idx++) {
 		new_actions[idx] = actions[idx];
@@ -2865,6 +3013,73 @@ flow_hw_action_meta_copy_insert(const struct rte_flow_action actions[],
 	return 0;
 }
 
+static int
+flow_hw_validate_action_push_vlan(struct rte_eth_dev *dev,
+				  const
+				  struct rte_flow_actions_template_attr *attr,
+				  const struct rte_flow_action *action,
+				  const struct rte_flow_action *mask,
+				  struct rte_flow_error *error)
+{
+#define X_FIELD(ptr, t, f) (((ptr)->conf) && ((t *)((ptr)->conf))->f)
+
+	const bool masked_push =
+		X_FIELD(mask + MLX5_HW_VLAN_PUSH_TYPE_IDX,
+			const struct rte_flow_action_of_push_vlan, ethertype);
+	bool masked_param;
+
+	/*
+	 * Mandatory actions order:
+	 * OF_PUSH_VLAN / OF_SET_VLAN_VID [ / OF_SET_VLAN_PCP ]
+	 */
+	RTE_SET_USED(dev);
+	RTE_SET_USED(attr);
+	/* Check that the mask type matches OF_PUSH_VLAN. */
+	if (mask[MLX5_HW_VLAN_PUSH_TYPE_IDX].type !=
+	    RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION,
+					  action, "OF_PUSH_VLAN: mask does not match");
+	/* Check that the second template and mask items are SET_VLAN_VID */
+	if (action[MLX5_HW_VLAN_PUSH_VID_IDX].type !=
+	    RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID ||
+	    mask[MLX5_HW_VLAN_PUSH_VID_IDX].type !=
+	    RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION,
+					  action, "OF_PUSH_VLAN: invalid actions order");
+	masked_param = X_FIELD(mask + MLX5_HW_VLAN_PUSH_VID_IDX,
+			       const struct rte_flow_action_of_set_vlan_vid,
+			       vlan_vid);
+	/*
+	 * PMD requires the OF_SET_VLAN_VID mask to match OF_PUSH_VLAN.
+	 */
+	if (masked_push ^ masked_param)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "OF_SET_VLAN_VID: mask does not match OF_PUSH_VLAN");
+	if (is_of_vlan_pcp_present(action)) {
+		if (mask[MLX5_HW_VLAN_PUSH_PCP_IDX].type !=
+		     RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_PCP)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  action, "OF_SET_VLAN_PCP: missing mask configuration");
+		masked_param = X_FIELD(mask + MLX5_HW_VLAN_PUSH_PCP_IDX,
+				       const struct
+				       rte_flow_action_of_set_vlan_pcp,
+				       vlan_pcp);
+		/*
+		 * PMD requires the OF_SET_VLAN_PCP mask to match OF_PUSH_VLAN.
+		 */
+		if (masked_push ^ masked_param)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ACTION, action,
+						  "OF_SET_VLAN_PCP: mask does not match OF_PUSH_VLAN");
+	}
+	return 0;
+#undef X_FIELD
+}
+
 static int
 flow_hw_actions_validate(struct rte_eth_dev *dev,
 			const struct rte_flow_actions_template_attr *attr,
@@ -2955,6 +3170,18 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
 			/* TODO: Validation logic */
 			break;
+		case RTE_FLOW_ACTION_TYPE_OF_POP_VLAN:
+		case RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID:
+			break;
+		case RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN:
+			ret = flow_hw_validate_action_push_vlan
+					(dev, attr, action, mask, error);
+			if (ret != 0)
+				return ret;
+			i += is_of_vlan_pcp_present(action) ?
+				MLX5_HW_VLAN_PUSH_PCP_IDX :
+				MLX5_HW_VLAN_PUSH_VID_IDX;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -2982,6 +3209,8 @@ static enum mlx5dr_action_type mlx5_hw_dr_action_types[] = {
 	[RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT] = MLX5DR_ACTION_TYP_VPORT,
 	[RTE_FLOW_ACTION_TYPE_COUNT] = MLX5DR_ACTION_TYP_CTR,
 	[RTE_FLOW_ACTION_TYPE_CONNTRACK] = MLX5DR_ACTION_TYP_ASO_CT,
+	[RTE_FLOW_ACTION_TYPE_OF_POP_VLAN] = MLX5DR_ACTION_TYP_POP_VLAN,
+	[RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN] = MLX5DR_ACTION_TYP_PUSH_VLAN,
 };
 
 static int
@@ -3098,6 +3327,14 @@ flow_hw_dr_actions_template_create(struct rte_flow_actions_template *at)
 				goto err_actions_num;
 			action_types[curr_off++] = MLX5DR_ACTION_TYP_FT;
 			break;
+		case RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN:
+			type = mlx5_hw_dr_action_types[at->actions[i].type];
+			at->actions_off[i] = curr_off;
+			action_types[curr_off++] = type;
+			i += is_of_vlan_pcp_present(at->actions + i) ?
+				MLX5_HW_VLAN_PUSH_PCP_IDX :
+				MLX5_HW_VLAN_PUSH_VID_IDX;
+			break;
 		default:
 			type = mlx5_hw_dr_action_types[at->actions[i].type];
 			at->actions_off[i] = curr_off;
@@ -3125,6 +3362,89 @@ flow_hw_dr_actions_template_create(struct rte_flow_actions_template *at)
 	return NULL;
 }
 
+static void
+flow_hw_set_vlan_vid(struct rte_eth_dev *dev,
+		     struct rte_flow_action *ra,
+		     struct rte_flow_action *rm,
+		     struct rte_flow_action_modify_field *spec,
+		     struct rte_flow_action_modify_field *mask,
+		     int set_vlan_vid_ix)
+{
+	struct rte_flow_error error;
+	const bool masked = rm[set_vlan_vid_ix].conf &&
+		(((const struct rte_flow_action_of_set_vlan_vid *)
+			rm[set_vlan_vid_ix].conf)->vlan_vid != 0);
+	const struct rte_flow_action_of_set_vlan_vid *conf =
+		ra[set_vlan_vid_ix].conf;
+	rte_be16_t vid = masked ? conf->vlan_vid : 0;
+	int width = mlx5_flow_item_field_width(dev, RTE_FLOW_FIELD_VLAN_ID, 0,
+					       NULL, &error);
+	*spec = (typeof(*spec)) {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = RTE_FLOW_FIELD_VLAN_ID,
+			.level = 0, .offset = 0,
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+			.level = vid,
+			.offset = 0,
+		},
+		.width = width,
+	};
+	*mask = (typeof(*mask)) {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = RTE_FLOW_FIELD_VLAN_ID,
+			.level = 0xffffffff, .offset = 0xffffffff,
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+			.level = masked ? (1U << width) - 1 : 0,
+			.offset = 0,
+		},
+		.width = 0xffffffff,
+	};
+	ra[set_vlan_vid_ix].type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD;
+	ra[set_vlan_vid_ix].conf = spec;
+	rm[set_vlan_vid_ix].type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD;
+	rm[set_vlan_vid_ix].conf = mask;
+}
+
+static __rte_always_inline int
+flow_hw_set_vlan_vid_construct(struct rte_eth_dev *dev,
+			       struct mlx5_hw_q_job *job,
+			       struct mlx5_action_construct_data *act_data,
+			       const struct mlx5_hw_actions *hw_acts,
+			       const struct rte_flow_action *action)
+{
+	struct rte_flow_error error;
+	rte_be16_t vid = ((const struct rte_flow_action_of_set_vlan_vid *)
+			   action->conf)->vlan_vid;
+	int width = mlx5_flow_item_field_width(dev, RTE_FLOW_FIELD_VLAN_ID, 0,
+					       NULL, &error);
+	struct rte_flow_action_modify_field conf = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = RTE_FLOW_FIELD_VLAN_ID,
+			.level = 0, .offset = 0,
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+			.level = vid,
+			.offset = 0,
+		},
+		.width = width,
+	};
+	struct rte_flow_action modify_action = {
+		.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+		.conf = &conf
+	};
+
+	return flow_hw_modify_field_construct(job, act_data, hw_acts,
+					      &modify_action);
+}
+
 /**
  * Create flow action template.
  *
@@ -3150,14 +3470,18 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 			struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	int len, act_num, act_len, mask_len;
+	int len, act_len, mask_len;
+	unsigned int act_num;
 	unsigned int i;
 	struct rte_flow_actions_template *at = NULL;
-	uint16_t pos = MLX5_HW_MAX_ACTS;
+	uint16_t pos = UINT16_MAX;
 	struct rte_flow_action tmp_action[MLX5_HW_MAX_ACTS];
 	struct rte_flow_action tmp_mask[MLX5_HW_MAX_ACTS];
-	const struct rte_flow_action *ra;
-	const struct rte_flow_action *rm;
+	struct rte_flow_action *ra = (void *)(uintptr_t)actions;
+	struct rte_flow_action *rm = (void *)(uintptr_t)masks;
+	int set_vlan_vid_ix = -1;
+	struct rte_flow_action_modify_field set_vlan_vid_spec = {0, };
+	struct rte_flow_action_modify_field set_vlan_vid_mask = {0, };
 	const struct rte_flow_action_modify_field rx_mreg = {
 		.operation = RTE_FLOW_MODIFY_SET,
 		.dst = {
@@ -3197,21 +3521,58 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 		return NULL;
 	if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
 	    priv->sh->config.dv_esw_en) {
+		/* The application should make sure only one Q/RSS exists in one rule. */
 		if (flow_hw_action_meta_copy_insert(actions, masks, &rx_cpy, &rx_cpy_mask,
 						    tmp_action, tmp_mask, &pos)) {
 			rte_flow_error_set(error, EINVAL,
 					   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					   "Failed to concatenate new action/mask");
 			return NULL;
+		} else if (pos != UINT16_MAX) {
+			ra = tmp_action;
+			rm = tmp_mask;
 		}
 	}
-	/* Application should make sure only one Q/RSS exist in one rule. */
-	if (pos == MLX5_HW_MAX_ACTS) {
-		ra = actions;
-		rm = masks;
-	} else {
-		ra = tmp_action;
-		rm = tmp_mask;
+	for (i = 0; ra[i].type != RTE_FLOW_ACTION_TYPE_END; ++i) {
+		switch (ra[i].type) {
+		/* OF_PUSH_VLAN *MUST* come before OF_SET_VLAN_VID */
+		case RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN:
+			i += is_of_vlan_pcp_present(ra + i) ?
+				MLX5_HW_VLAN_PUSH_PCP_IDX :
+				MLX5_HW_VLAN_PUSH_VID_IDX;
+			break;
+		case RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID:
+			set_vlan_vid_ix = i;
+			break;
+		default:
+			break;
+		}
+	}
+	/*
+	 * Count flow actions to allocate required space for storing DR offsets and to check
+	 * that the temporary buffer is not overrun.
+	 */
+	act_num = i + 1;
+	if (act_num >= MLX5_HW_MAX_ACTS) {
+		rte_flow_error_set(error, EINVAL,
+				   RTE_FLOW_ERROR_TYPE_ACTION, NULL, "Too many actions");
+		return NULL;
+	}
+	if (set_vlan_vid_ix != -1) {
+		/* If temporary action buffer was not used, copy template actions to it */
+		if (ra == actions && rm == masks) {
+			for (i = 0; i < act_num; ++i) {
+				tmp_action[i] = actions[i];
+				tmp_mask[i] = masks[i];
+				if (actions[i].type == RTE_FLOW_ACTION_TYPE_END)
+					break;
+			}
+			ra = tmp_action;
+			rm = tmp_mask;
+		}
+		flow_hw_set_vlan_vid(dev, ra, rm,
+				     &set_vlan_vid_spec, &set_vlan_vid_mask,
+				     set_vlan_vid_ix);
 	}
 	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, NULL, 0, ra, error);
 	if (act_len <= 0)
@@ -3221,10 +3582,6 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 	if (mask_len <= 0)
 		return NULL;
 	len += RTE_ALIGN(mask_len, 16);
-	/* Count flow actions to allocate required space for storing DR offsets. */
-	act_num = 0;
-	for (i = 0; ra[i].type != RTE_FLOW_ACTION_TYPE_END; ++i)
-		act_num++;
 	len += RTE_ALIGN(act_num * sizeof(*at->actions_off), 16);
 	at = mlx5_malloc(MLX5_MEM_ZERO, len + sizeof(*at),
 			 RTE_CACHE_LINE_SIZE, rte_socket_id());
@@ -4475,7 +4832,11 @@ flow_hw_create_tx_default_mreg_copy_table(struct rte_eth_dev *dev,
 		.attr = tx_tbl_attr,
 		.external = false,
 	};
-	struct rte_flow_error drop_err;
+	struct rte_flow_error drop_err = {
+		.type = RTE_FLOW_ERROR_TYPE_NONE,
+		.cause = NULL,
+		.message = NULL,
+	};
 
 	RTE_SET_USED(drop_err);
 	return flow_hw_table_create(dev, &tx_tbl_cfg, &pt, 1, &at, 1, &drop_err);
@@ -4756,6 +5117,60 @@ flow_hw_ct_pool_create(struct rte_eth_dev *dev,
 	return NULL;
 }
 
+static void
+flow_hw_destroy_vlan(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	enum mlx5dr_table_type i;
+
+	for (i = MLX5DR_TABLE_TYPE_NIC_RX; i < MLX5DR_TABLE_TYPE_MAX; i++) {
+		if (priv->hw_pop_vlan[i]) {
+			mlx5dr_action_destroy(priv->hw_pop_vlan[i]);
+			priv->hw_pop_vlan[i] = NULL;
+		}
+		if (priv->hw_push_vlan[i]) {
+			mlx5dr_action_destroy(priv->hw_push_vlan[i]);
+			priv->hw_push_vlan[i] = NULL;
+		}
+	}
+}
+
+static int
+flow_hw_create_vlan(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	enum mlx5dr_table_type i;
+	const enum mlx5dr_action_flags flags[MLX5DR_TABLE_TYPE_MAX] = {
+		MLX5DR_ACTION_FLAG_HWS_RX,
+		MLX5DR_ACTION_FLAG_HWS_TX,
+		MLX5DR_ACTION_FLAG_HWS_FDB
+	};
+
+	for (i = MLX5DR_TABLE_TYPE_NIC_RX; i <= MLX5DR_TABLE_TYPE_NIC_TX; i++) {
+		priv->hw_pop_vlan[i] =
+			mlx5dr_action_create_pop_vlan(priv->dr_ctx, flags[i]);
+		if (!priv->hw_pop_vlan[i])
+			return -ENOENT;
+		priv->hw_push_vlan[i] =
+			mlx5dr_action_create_push_vlan(priv->dr_ctx, flags[i]);
+		if (!priv->hw_push_vlan[i])
+			return -ENOENT;
+	}
+	if (priv->sh->config.dv_esw_en && priv->master) {
+		priv->hw_pop_vlan[MLX5DR_TABLE_TYPE_FDB] =
+			mlx5dr_action_create_pop_vlan
+				(priv->dr_ctx, MLX5DR_ACTION_FLAG_HWS_FDB);
+		if (!priv->hw_pop_vlan[MLX5DR_TABLE_TYPE_FDB])
+			return -ENOENT;
+		priv->hw_push_vlan[MLX5DR_TABLE_TYPE_FDB] =
+			mlx5dr_action_create_push_vlan
+				(priv->dr_ctx, MLX5DR_ACTION_FLAG_HWS_FDB);
+		if (!priv->hw_push_vlan[MLX5DR_TABLE_TYPE_FDB])
+			return -ENOENT;
+	}
+	return 0;
+}
+
 /**
  * Configure port HWS resources.
  *
@@ -4961,6 +5376,9 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		if (priv->hws_cpool == NULL)
 			goto err;
 	}
+	ret = flow_hw_create_vlan(dev);
+	if (ret)
+		goto err;
 	return 0;
 err:
 	if (priv->hws_ctpool) {
@@ -4978,6 +5396,7 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		if (priv->hw_tag[i])
 			mlx5dr_action_destroy(priv->hw_tag[i]);
 	}
+	flow_hw_destroy_vlan(dev);
 	if (dr_ctx)
 		claim_zero(mlx5dr_context_close(dr_ctx));
 	mlx5_free(priv->hw_q);
@@ -5037,6 +5456,7 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 		if (priv->hw_tag[i])
 			mlx5dr_action_destroy(priv->hw_tag[i]);
 	}
+	flow_hw_destroy_vlan(dev);
 	flow_hw_free_vport_actions(priv);
 	if (priv->acts_ipool) {
 		mlx5_ipool_destroy(priv->acts_ipool);
-- 
2.25.1
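
As a usage note for the template rules enforced by
flow_hw_validate_action_push_vlan() above: the sketch below is
illustrative only and not part of the patch; the port id, the egress
attribute and the VID/PCP values are assumptions. It shows an actions
template that follows the mandatory
OF_PUSH_VLAN / OF_SET_VLAN_VID [ / OF_SET_VLAN_PCP ] ordering with
non-zero (masked) values, so the PMD can precompute the VLAN header at
template creation time.

#include <rte_flow.h>
#include <rte_ether.h>
#include <rte_byteorder.h>

static struct rte_flow_actions_template *
push_vlan_actions_template(uint16_t port_id, struct rte_flow_error *err)
{
	const struct rte_flow_actions_template_attr attr = { .egress = 1 };
	const struct rte_flow_action_of_push_vlan push = {
		.ethertype = RTE_BE16(RTE_ETHER_TYPE_VLAN),
	};
	const struct rte_flow_action_of_set_vlan_vid vid = {
		.vlan_vid = RTE_BE16(100),
	};
	const struct rte_flow_action_of_set_vlan_pcp pcp = { .vlan_pcp = 3 };
	const struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN, .conf = &push },
		{ .type = RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID, .conf = &vid },
		{ .type = RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_PCP, .conf = &pcp },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	/*
	 * Masks with non-zero fields mark the values as fixed at template
	 * creation time, which is what is_template_masked_push_vlan() checks.
	 */
	const struct rte_flow_action masks[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN, .conf = &push },
		{ .type = RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID, .conf = &vid },
		{ .type = RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_PCP, .conf = &pcp },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};

	return rte_flow_actions_template_create(port_id, &attr, actions, masks,
						err);
}

Leaving the mask values zeroed would instead defer the VLAN header
construction to rule insertion time (see the OF_PUSH_VLAN case in
flow_hw_actions_construct()).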


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v2 12/17] net/mlx5: implement METER MARK indirect action for HWS
  2022-09-28  3:31 ` [PATCH v2 00/17] net/mlx5: HW steering PMD update Suanming Mou
                     ` (10 preceding siblings ...)
  2022-09-28  3:31   ` [PATCH v2 11/17] net/mlx5: add HW steering VLAN push, pop and VID modify flow actions Suanming Mou
@ 2022-09-28  3:31   ` Suanming Mou
  2022-09-28  3:31   ` [PATCH v2 13/17] net/mlx5: add HWS AGE action support Suanming Mou
                     ` (4 subsequent siblings)
  16 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-28  3:31 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Alexander Kozyrev

From: Alexander Kozyrev <akozyrev@nvidia.com>

Add the ability to create an indirect action handle for METER_MARK.
It allows one meter to be shared between several different actions.
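
A minimal sketch of the intended usage (illustrative only; the
port/queue ids, the ingress attribute and the pre-created profile
pointer are assumptions, and the meter_mark fields follow the ethdev
series this patch set depends on):

#include <rte_flow.h>
#include <rte_mtr.h>

static struct rte_flow_action_handle *
create_shared_meter(uint16_t port_id, uint32_t queue_id,
		    struct rte_flow_meter_profile *profile,
		    struct rte_flow_error *err)
{
	const struct rte_flow_op_attr op_attr = { .postpone = 0 };
	const struct rte_flow_indir_action_conf indir_conf = { .ingress = 1 };
	const struct rte_flow_action_meter_mark mm = {
		.profile = profile,		/* previously created meter profile */
		.color_mode = 1,		/* color-aware metering */
		.init_color = RTE_COLOR_GREEN,
		.state = 1,			/* meter enabled */
	};
	const struct rte_flow_action action = {
		.type = RTE_FLOW_ACTION_TYPE_METER_MARK,
		.conf = &mm,
	};

	/* One handle may then be referenced by many INDIRECT actions. */
	return rte_flow_async_action_handle_create(port_id, queue_id, &op_attr,
						   &indir_conf, &action,
						   NULL, err);
}

The returned handle can later be updated through
rte_flow_async_action_handle_update() with a struct
rte_flow_update_meter_mark, which is the path handled by
flow_hw_action_handle_update() below.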

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
---
 drivers/net/mlx5/mlx5.c            |   4 +-
 drivers/net/mlx5/mlx5.h            |  33 ++-
 drivers/net/mlx5/mlx5_flow.c       |   6 +
 drivers/net/mlx5/mlx5_flow.h       |  19 +-
 drivers/net/mlx5/mlx5_flow_aso.c   | 139 +++++++--
 drivers/net/mlx5/mlx5_flow_dv.c    | 145 +++++++++-
 drivers/net/mlx5/mlx5_flow_hw.c    | 438 +++++++++++++++++++++++++----
 drivers/net/mlx5/mlx5_flow_meter.c |  79 +++++-
 8 files changed, 764 insertions(+), 99 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 925e19bcd5..383a789dfa 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -442,7 +442,7 @@ mlx5_flow_aso_age_mng_init(struct mlx5_dev_ctx_shared *sh)
 		rte_errno = ENOMEM;
 		return -ENOMEM;
 	}
-	err = mlx5_aso_queue_init(sh, ASO_OPC_MOD_FLOW_HIT);
+	err = mlx5_aso_queue_init(sh, ASO_OPC_MOD_FLOW_HIT, 1);
 	if (err) {
 		mlx5_free(sh->aso_age_mng);
 		return -1;
@@ -763,7 +763,7 @@ mlx5_flow_aso_ct_mng_init(struct mlx5_dev_ctx_shared *sh)
 		rte_errno = ENOMEM;
 		return -rte_errno;
 	}
-	err = mlx5_aso_queue_init(sh, ASO_OPC_MOD_CONNECTION_TRACKING);
+	err = mlx5_aso_queue_init(sh, ASO_OPC_MOD_CONNECTION_TRACKING, MLX5_ASO_CT_SQ_NUM);
 	if (err) {
 		mlx5_free(sh->ct_mng);
 		/* rte_errno should be extracted from the failure. */
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 16cd261942..d85cb7adea 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -971,12 +971,16 @@ enum mlx5_aso_mtr_type {
 
 /* Generic aso_flow_meter information. */
 struct mlx5_aso_mtr {
-	LIST_ENTRY(mlx5_aso_mtr) next;
+	union {
+		LIST_ENTRY(mlx5_aso_mtr) next;
+		struct mlx5_aso_mtr_pool *pool;
+	};
 	enum mlx5_aso_mtr_type type;
 	struct mlx5_flow_meter_info fm;
 	/**< Pointer to the next aso flow meter structure. */
 	uint8_t state; /**< ASO flow meter state. */
 	uint32_t offset;
+	enum rte_color init_color;
 };
 
 /* Generic aso_flow_meter pool structure. */
@@ -985,7 +989,11 @@ struct mlx5_aso_mtr_pool {
 	/*Must be the first in pool*/
 	struct mlx5_devx_obj *devx_obj;
 	/* The devx object of the minimum aso flow meter ID. */
+	struct mlx5dr_action *action; /* HWS action. */
+	struct mlx5_indexed_pool *idx_pool; /* HWS index pool. */
 	uint32_t index; /* Pool index in management structure. */
+	uint32_t nb_sq; /* Number of ASO SQ. */
+	struct mlx5_aso_sq *sq; /* ASO SQs. */
 };
 
 LIST_HEAD(aso_meter_list, mlx5_aso_mtr);
@@ -1678,6 +1686,7 @@ struct mlx5_priv {
 	struct mlx5_aso_ct_pools_mng *ct_mng;
 	/* Management data for ASO connection tracking. */
 	struct mlx5_aso_ct_pool *hws_ctpool; /* HW steering's CT pool. */
+	struct mlx5_aso_mtr_pool *hws_mpool; /* HW steering's Meter pool. */
 #endif
 };
 
@@ -1998,7 +2007,8 @@ void mlx5_pmd_socket_uninit(void);
 int mlx5_flow_meter_init(struct rte_eth_dev *dev,
 			 uint32_t nb_meters,
 			 uint32_t nb_meter_profiles,
-			 uint32_t nb_meter_policies);
+			 uint32_t nb_meter_policies,
+			 uint32_t nb_queues);
 void mlx5_flow_meter_uninit(struct rte_eth_dev *dev);
 int mlx5_flow_meter_ops_get(struct rte_eth_dev *dev, void *arg);
 struct mlx5_flow_meter_info *mlx5_flow_meter_find(struct mlx5_priv *priv,
@@ -2067,15 +2077,24 @@ eth_tx_burst_t mlx5_select_tx_function(struct rte_eth_dev *dev);
 
 /* mlx5_flow_aso.c */
 
+int mlx5_aso_mtr_queue_init(struct mlx5_dev_ctx_shared *sh,
+			    struct mlx5_aso_mtr_pool *hws_pool,
+			    struct mlx5_aso_mtr_pools_mng *pool_mng,
+			    uint32_t nb_queues);
+void mlx5_aso_mtr_queue_uninit(struct mlx5_dev_ctx_shared *sh,
+			       struct mlx5_aso_mtr_pool *hws_pool,
+			       struct mlx5_aso_mtr_pools_mng *pool_mng);
 int mlx5_aso_queue_init(struct mlx5_dev_ctx_shared *sh,
-		enum mlx5_access_aso_opc_mod aso_opc_mod);
+			enum mlx5_access_aso_opc_mod aso_opc_mod,
+			uint32_t nb_queues);
 int mlx5_aso_flow_hit_queue_poll_start(struct mlx5_dev_ctx_shared *sh);
 int mlx5_aso_flow_hit_queue_poll_stop(struct mlx5_dev_ctx_shared *sh);
 void mlx5_aso_queue_uninit(struct mlx5_dev_ctx_shared *sh,
-		enum mlx5_access_aso_opc_mod aso_opc_mod);
-int mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
-		struct mlx5_aso_mtr *mtr, struct mlx5_mtr_bulk *bulk);
-int mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh,
+			   enum mlx5_access_aso_opc_mod aso_opc_mod);
+int mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
+				 struct mlx5_aso_mtr *mtr,
+				 struct mlx5_mtr_bulk *bulk);
+int mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 		struct mlx5_aso_mtr *mtr);
 int mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			      struct mlx5_aso_ct_action *ct,
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index cbf9c31984..9627ffc979 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -4221,6 +4221,12 @@ flow_action_handles_translate(struct rte_eth_dev *dev,
 						MLX5_RTE_FLOW_ACTION_TYPE_COUNT;
 			translated[handle->index].conf = (void *)(uintptr_t)idx;
 			break;
+		case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
+			translated[handle->index].type =
+						(enum rte_flow_action_type)
+						MLX5_RTE_FLOW_ACTION_TYPE_METER_MARK;
+			translated[handle->index].conf = (void *)(uintptr_t)idx;
+			break;
 		case MLX5_INDIRECT_ACTION_TYPE_AGE:
 			if (priv->sh->flow_hit_aso_en) {
 				translated[handle->index].type =
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 6d928b477e..ffa4f28255 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -46,6 +46,7 @@ enum mlx5_rte_flow_action_type {
 	MLX5_RTE_FLOW_ACTION_TYPE_COUNT,
 	MLX5_RTE_FLOW_ACTION_TYPE_JUMP,
 	MLX5_RTE_FLOW_ACTION_TYPE_RSS,
+	MLX5_RTE_FLOW_ACTION_TYPE_METER_MARK,
 };
 
 /* Private (internal) Field IDs for MODIFY_FIELD action. */
@@ -54,22 +55,23 @@ enum mlx5_rte_flow_field_id {
 			MLX5_RTE_FLOW_FIELD_META_REG,
 };
 
-#define MLX5_INDIRECT_ACTION_TYPE_OFFSET 30
+#define MLX5_INDIRECT_ACTION_TYPE_OFFSET 29
 
 enum {
 	MLX5_INDIRECT_ACTION_TYPE_RSS,
 	MLX5_INDIRECT_ACTION_TYPE_AGE,
 	MLX5_INDIRECT_ACTION_TYPE_COUNT,
 	MLX5_INDIRECT_ACTION_TYPE_CT,
+	MLX5_INDIRECT_ACTION_TYPE_METER_MARK,
 };
 
-/* Now, the maximal ports will be supported is 256, action number is 4M. */
-#define MLX5_INDIRECT_ACT_CT_MAX_PORT 0x100
+/* Now, the maximal ports will be supported is 16, action number is 32M. */
+#define MLX5_INDIRECT_ACT_CT_MAX_PORT 0x10
 
 #define MLX5_INDIRECT_ACT_CT_OWNER_SHIFT 22
 #define MLX5_INDIRECT_ACT_CT_OWNER_MASK (MLX5_INDIRECT_ACT_CT_MAX_PORT - 1)
 
-/* 30-31: type, 22-29: owner port, 0-21: index. */
+/* 29-31: type, 25-28: owner port, 0-24: index */
 #define MLX5_INDIRECT_ACT_CT_GEN_IDX(owner, index) \
 	((MLX5_INDIRECT_ACTION_TYPE_CT << MLX5_INDIRECT_ACTION_TYPE_OFFSET) | \
 	 (((owner) & MLX5_INDIRECT_ACT_CT_OWNER_MASK) << \
@@ -207,6 +209,9 @@ enum mlx5_feature_name {
 #define MLX5_FLOW_ITEM_PORT_REPRESENTOR (UINT64_C(1) << 41)
 #define MLX5_FLOW_ITEM_REPRESENTED_PORT (UINT64_C(1) << 42)
 
+/* Meter color item */
+#define MLX5_FLOW_ITEM_METER_COLOR (UINT64_C(1) << 44)
+
 /* Outer Masks. */
 #define MLX5_FLOW_LAYER_OUTER_L3 \
 	(MLX5_FLOW_LAYER_OUTER_L3_IPV4 | MLX5_FLOW_LAYER_OUTER_L3_IPV6)
@@ -1108,6 +1113,7 @@ struct rte_flow_hw {
 	struct rte_flow_template_table *table; /* The table flow allcated from. */
 	struct mlx5dr_rule rule; /* HWS layer data struct. */
 	uint32_t cnt_id;
+	uint32_t mtr_id;
 } __rte_packed;
 
 /* rte flow action translate to DR action struct. */
@@ -1154,6 +1160,9 @@ struct mlx5_action_construct_data {
 		struct {
 			uint32_t id;
 		} shared_counter;
+		struct {
+			uint32_t id;
+		} shared_meter;
 	};
 };
 
@@ -1237,6 +1246,7 @@ struct mlx5_hw_actions {
 	uint16_t encap_decap_pos; /* Encap/Decap action position. */
 	uint32_t mark:1; /* Indicate the mark action. */
 	uint32_t cnt_id; /* Counter id. */
+	uint32_t mtr_id; /* Meter id. */
 	/* Translated DR action array from action template. */
 	struct mlx5dr_rule_action rule_acts[MLX5_HW_MAX_ACTS];
 };
@@ -1524,6 +1534,7 @@ flow_hw_get_reg_id(enum rte_flow_item_type type, uint32_t id)
 		 */
 		return REG_A;
 	case RTE_FLOW_ITEM_TYPE_CONNTRACK:
+	case RTE_FLOW_ITEM_TYPE_METER_COLOR:
 		return mlx5_flow_hw_aso_tag;
 	case RTE_FLOW_ITEM_TYPE_TAG:
 		MLX5_ASSERT(id < MLX5_FLOW_HW_TAGS_MAX);
diff --git a/drivers/net/mlx5/mlx5_flow_aso.c b/drivers/net/mlx5/mlx5_flow_aso.c
index c00c07b891..f371fff2e2 100644
--- a/drivers/net/mlx5/mlx5_flow_aso.c
+++ b/drivers/net/mlx5/mlx5_flow_aso.c
@@ -275,6 +275,65 @@ mlx5_aso_sq_create(struct mlx5_common_device *cdev, struct mlx5_aso_sq *sq,
 	return -1;
 }
 
+void
+mlx5_aso_mtr_queue_uninit(struct mlx5_dev_ctx_shared *sh __rte_unused,
+			  struct mlx5_aso_mtr_pool *hws_pool,
+			  struct mlx5_aso_mtr_pools_mng *pool_mng)
+{
+	uint32_t i;
+
+	if (hws_pool) {
+		for (i = 0; i < hws_pool->nb_sq; i++)
+			mlx5_aso_destroy_sq(hws_pool->sq + i);
+		mlx5_free(hws_pool->sq);
+		return;
+	}
+	if (pool_mng)
+		mlx5_aso_destroy_sq(&pool_mng->sq);
+}
+
+int
+mlx5_aso_mtr_queue_init(struct mlx5_dev_ctx_shared *sh,
+				struct mlx5_aso_mtr_pool *hws_pool,
+				struct mlx5_aso_mtr_pools_mng *pool_mng,
+				uint32_t nb_queues)
+{
+	struct mlx5_common_device *cdev = sh->cdev;
+	struct mlx5_aso_sq *sq;
+	uint32_t i;
+
+	if (hws_pool) {
+		sq = mlx5_malloc(MLX5_MEM_ZERO,
+			sizeof(struct mlx5_aso_sq) * nb_queues,
+			RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
+		if (!sq)
+			return -1;
+		hws_pool->sq = sq;
+		for (i = 0; i < nb_queues; i++) {
+			if (mlx5_aso_sq_create(cdev, hws_pool->sq + i,
+					       sh->tx_uar.obj,
+					       MLX5_ASO_QUEUE_LOG_DESC))
+				goto error;
+			mlx5_aso_mtr_init_sq(hws_pool->sq + i);
+		}
+		hws_pool->nb_sq = nb_queues;
+	}
+	if (pool_mng) {
+		if (mlx5_aso_sq_create(cdev, &pool_mng->sq,
+				       sh->tx_uar.obj,
+				       MLX5_ASO_QUEUE_LOG_DESC))
+			return -1;
+		mlx5_aso_mtr_init_sq(&pool_mng->sq);
+	}
+	return 0;
+error:
+	/* Roll back the SQs initialized so far. */
+	do {
+		mlx5_aso_destroy_sq(hws_pool->sq + i);
+	} while (i--);
+	return -1;
+}
+
 /**
  * API to create and initialize Send Queue used for ASO access.
  *
@@ -282,13 +341,16 @@ mlx5_aso_sq_create(struct mlx5_common_device *cdev, struct mlx5_aso_sq *sq,
  *   Pointer to shared device context.
  * @param[in] aso_opc_mod
  *   Mode of ASO feature.
+ * @param[in] nb_queues
+ *   Number of Send Queues to create.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
 mlx5_aso_queue_init(struct mlx5_dev_ctx_shared *sh,
-		    enum mlx5_access_aso_opc_mod aso_opc_mod)
+		    enum mlx5_access_aso_opc_mod aso_opc_mod,
+		    uint32_t nb_queues)
 {
 	uint32_t sq_desc_n = 1 << MLX5_ASO_QUEUE_LOG_DESC;
 	struct mlx5_common_device *cdev = sh->cdev;
@@ -307,10 +369,9 @@ mlx5_aso_queue_init(struct mlx5_dev_ctx_shared *sh,
 		mlx5_aso_age_init_sq(&sh->aso_age_mng->aso_sq);
 		break;
 	case ASO_OPC_MOD_POLICER:
-		if (mlx5_aso_sq_create(cdev, &sh->mtrmng->pools_mng.sq,
-				       sh->tx_uar.obj, MLX5_ASO_QUEUE_LOG_DESC))
+		if (mlx5_aso_mtr_queue_init(sh, NULL,
+					    &sh->mtrmng->pools_mng, nb_queues))
 			return -1;
-		mlx5_aso_mtr_init_sq(&sh->mtrmng->pools_mng.sq);
 		break;
 	case ASO_OPC_MOD_CONNECTION_TRACKING:
 		if (mlx5_aso_ct_queue_init(sh, sh->ct_mng, MLX5_ASO_CT_SQ_NUM))
@@ -343,7 +404,7 @@ mlx5_aso_queue_uninit(struct mlx5_dev_ctx_shared *sh,
 		sq = &sh->aso_age_mng->aso_sq;
 		break;
 	case ASO_OPC_MOD_POLICER:
-		sq = &sh->mtrmng->pools_mng.sq;
+		mlx5_aso_mtr_queue_uninit(sh, NULL, &sh->mtrmng->pools_mng);
 		break;
 	case ASO_OPC_MOD_CONNECTION_TRACKING:
 		mlx5_aso_ct_queue_uninit(sh, sh->ct_mng);
@@ -666,7 +727,8 @@ static uint16_t
 mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 			       struct mlx5_aso_sq *sq,
 			       struct mlx5_aso_mtr *aso_mtr,
-			       struct mlx5_mtr_bulk *bulk)
+			       struct mlx5_mtr_bulk *bulk,
+				   bool need_lock)
 {
 	volatile struct mlx5_aso_wqe *wqe = NULL;
 	struct mlx5_flow_meter_info *fm = NULL;
@@ -679,11 +741,13 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	uint32_t param_le;
 	int id;
 
-	rte_spinlock_lock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_lock(&sq->sqsl);
 	res = size - (uint16_t)(sq->head - sq->tail);
 	if (unlikely(!res)) {
 		DRV_LOG(ERR, "Fail: SQ is full and no free WQE to send");
-		rte_spinlock_unlock(&sq->sqsl);
+		if (need_lock)
+			rte_spinlock_unlock(&sq->sqsl);
 		return 0;
 	}
 	wqe = &sq->sq_obj.aso_wqes[sq->head & mask];
@@ -692,8 +756,11 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	fm = &aso_mtr->fm;
 	sq->elts[sq->head & mask].mtr = aso_mtr;
 	if (aso_mtr->type == ASO_METER_INDIRECT) {
-		pool = container_of(aso_mtr, struct mlx5_aso_mtr_pool,
-				    mtrs[aso_mtr->offset]);
+		if (likely(sh->config.dv_flow_en == 2))
+			pool = aso_mtr->pool;
+		else
+			pool = container_of(aso_mtr, struct mlx5_aso_mtr_pool,
+					    mtrs[aso_mtr->offset]);
 		id = pool->devx_obj->id;
 	} else {
 		id = bulk->devx_obj->id;
@@ -756,7 +823,8 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
 			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
 			   !sh->tx_uar.dbnc);
-	rte_spinlock_unlock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_unlock(&sq->sqsl);
 	return 1;
 }
 
@@ -779,7 +847,7 @@ mlx5_aso_mtrs_status_update(struct mlx5_aso_sq *sq, uint16_t aso_mtrs_nums)
 }
 
 static void
-mlx5_aso_mtr_completion_handle(struct mlx5_aso_sq *sq)
+mlx5_aso_mtr_completion_handle(struct mlx5_aso_sq *sq, bool need_lock)
 {
 	struct mlx5_aso_cq *cq = &sq->cq;
 	volatile struct mlx5_cqe *restrict cqe;
@@ -791,7 +859,8 @@ mlx5_aso_mtr_completion_handle(struct mlx5_aso_sq *sq)
 	uint16_t n = 0;
 	int ret;
 
-	rte_spinlock_lock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_lock(&sq->sqsl);
 	max = (uint16_t)(sq->head - sq->tail);
 	if (unlikely(!max)) {
 		rte_spinlock_unlock(&sq->sqsl);
@@ -823,7 +892,8 @@ mlx5_aso_mtr_completion_handle(struct mlx5_aso_sq *sq)
 		rte_io_wmb();
 		cq->cq_obj.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci);
 	}
-	rte_spinlock_unlock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_unlock(&sq->sqsl);
 }
 
 /**
@@ -840,16 +910,30 @@ mlx5_aso_mtr_completion_handle(struct mlx5_aso_sq *sq)
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
+mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			struct mlx5_aso_mtr *mtr,
 			struct mlx5_mtr_bulk *bulk)
 {
-	struct mlx5_aso_sq *sq = &sh->mtrmng->pools_mng.sq;
+	struct mlx5_aso_sq *sq;
 	uint32_t poll_wqe_times = MLX5_MTR_POLL_WQE_CQE_TIMES;
+	bool need_lock;
 
+	if (likely(sh->config.dv_flow_en == 2)) {
+		if (queue == MLX5_HW_INV_QUEUE) {
+			sq = &mtr->pool->sq[mtr->pool->nb_sq - 1];
+			need_lock = true;
+		} else {
+			sq = &mtr->pool->sq[queue];
+			need_lock = false;
+		}
+	} else {
+		sq = &sh->mtrmng->pools_mng.sq;
+		need_lock = true;
+	}
 	do {
-		mlx5_aso_mtr_completion_handle(sq);
-		if (mlx5_aso_mtr_sq_enqueue_single(sh, sq, mtr, bulk))
+		mlx5_aso_mtr_completion_handle(sq, need_lock);
+		if (mlx5_aso_mtr_sq_enqueue_single(sh, sq, mtr,
+						   bulk, need_lock))
 			return 0;
 		/* Waiting for wqe resource. */
 		rte_delay_us_sleep(MLX5_ASO_WQE_CQE_RESPONSE_DELAY);
@@ -873,17 +957,30 @@ mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh,
+mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			struct mlx5_aso_mtr *mtr)
 {
-	struct mlx5_aso_sq *sq = &sh->mtrmng->pools_mng.sq;
+	struct mlx5_aso_sq *sq;
 	uint32_t poll_cqe_times = MLX5_MTR_POLL_WQE_CQE_TIMES;
+	bool need_lock;
 
+	if (likely(sh->config.dv_flow_en == 2)) {
+		if (queue == MLX5_HW_INV_QUEUE) {
+			sq = &mtr->pool->sq[mtr->pool->nb_sq - 1];
+			need_lock = true;
+		} else {
+			sq = &mtr->pool->sq[queue];
+			need_lock = false;
+		}
+	} else {
+		sq = &sh->mtrmng->pools_mng.sq;
+		need_lock = true;
+	}
 	if (__atomic_load_n(&mtr->state, __ATOMIC_RELAXED) ==
 					    ASO_METER_READY)
 		return 0;
 	do {
-		mlx5_aso_mtr_completion_handle(sq);
+		mlx5_aso_mtr_completion_handle(sq, need_lock);
 		if (__atomic_load_n(&mtr->state, __ATOMIC_RELAXED) ==
 					    ASO_METER_READY)
 			return 0;
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 7e0829d1fd..a50a600024 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -1387,6 +1387,7 @@ mlx5_flow_item_field_width(struct rte_eth_dev *dev,
 		return inherit < 0 ? 0 : inherit;
 	case RTE_FLOW_FIELD_IPV4_ECN:
 	case RTE_FLOW_FIELD_IPV6_ECN:
+	case RTE_FLOW_FIELD_METER_COLOR:
 		return 2;
 	default:
 		MLX5_ASSERT(false);
@@ -1856,6 +1857,31 @@ mlx5_flow_field_id_to_modify_info
 				info[idx].offset = data->offset;
 		}
 		break;
+	case RTE_FLOW_FIELD_METER_COLOR:
+		{
+			const uint32_t color_mask =
+				(UINT32_C(1) << MLX5_MTR_COLOR_BITS) - 1;
+			int reg;
+
+			if (priv->sh->config.dv_flow_en == 2)
+				reg = flow_hw_get_reg_id
+					(RTE_FLOW_ITEM_TYPE_METER_COLOR, 0);
+			else
+				reg = mlx5_flow_get_reg_id(dev, MLX5_MTR_COLOR,
+						       0, error);
+			if (reg < 0)
+				return;
+			MLX5_ASSERT(reg != REG_NON);
+			MLX5_ASSERT((unsigned int)reg < RTE_DIM(reg_to_field));
+			info[idx] = (struct field_modify_info){4, 0,
+						reg_to_field[reg]};
+			if (mask)
+				mask[idx] = flow_modify_info_mask_32_masked
+					(width, data->offset, color_mask);
+			else
+				info[idx].offset = data->offset;
+		}
+		break;
 	case RTE_FLOW_FIELD_POINTER:
 	case RTE_FLOW_FIELD_VALUE:
 	default:
@@ -1913,7 +1939,9 @@ flow_dv_convert_action_modify_field
 		item.spec = conf->src.field == RTE_FLOW_FIELD_POINTER ?
 					(void *)(uintptr_t)conf->src.pvalue :
 					(void *)(uintptr_t)&conf->src.value;
-		if (conf->dst.field == RTE_FLOW_FIELD_META) {
+		if (conf->dst.field == RTE_FLOW_FIELD_META ||
+		    conf->dst.field == RTE_FLOW_FIELD_TAG ||
+		    conf->dst.field == RTE_FLOW_FIELD_METER_COLOR) {
 			meta = *(const unaligned_uint32_t *)item.spec;
 			meta = rte_cpu_to_be_32(meta);
 			item.spec = &meta;
@@ -3687,6 +3715,69 @@ flow_dv_validate_action_aso_ct(struct rte_eth_dev *dev,
 	return 0;
 }
 
+/**
+ * Validate METER_COLOR item.
+ *
+ * @param[in] dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param[in] item
+ *   Item specification.
+ * @param[in] attr
+ *   Attributes of flow that includes this item.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_dv_validate_item_meter_color(struct rte_eth_dev *dev,
+			   const struct rte_flow_item *item,
+			   const struct rte_flow_attr *attr __rte_unused,
+			   struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_item_meter_color *spec = item->spec;
+	const struct rte_flow_item_meter_color *mask = item->mask;
+	struct rte_flow_item_meter_color nic_mask = {
+		.color = RTE_COLORS
+	};
+	int ret;
+
+	if (priv->mtr_color_reg == REG_NON)
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_ITEM, item,
+					  "meter color register"
+					  " isn't available");
+	ret = mlx5_flow_get_reg_id(dev, MLX5_MTR_COLOR, 0, error);
+	if (ret < 0)
+		return ret;
+	if (!spec)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ITEM_SPEC,
+					  item->spec,
+					  "data cannot be empty");
+	if (spec->color > RTE_COLORS)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION_CONF,
+					  &spec->color,
+					  "meter color is invalid");
+	if (!mask)
+		mask = &rte_flow_item_meter_color_mask;
+	if (!mask->color)
+		return rte_flow_error_set(error, EINVAL,
+					RTE_FLOW_ERROR_TYPE_ITEM_SPEC, NULL,
+					"mask cannot be zero");
+
+	ret = mlx5_flow_item_acceptable(item, (const uint8_t *)mask,
+				(const uint8_t *)&nic_mask,
+				sizeof(struct rte_flow_item_meter_color),
+				MLX5_ITEM_RANGE_NOT_ACCEPTED, error);
+	if (ret < 0)
+		return ret;
+	return 0;
+}
+
 int
 flow_dv_encap_decap_match_cb(void *tool_ctx __rte_unused,
 			     struct mlx5_list_entry *entry, void *cb_ctx)
@@ -6519,7 +6610,7 @@ flow_dv_mtr_container_resize(struct rte_eth_dev *dev)
 		return -ENOMEM;
 	}
 	if (!pools_mng->n)
-		if (mlx5_aso_queue_init(priv->sh, ASO_OPC_MOD_POLICER)) {
+		if (mlx5_aso_queue_init(priv->sh, ASO_OPC_MOD_POLICER, 1)) {
 			mlx5_free(pools);
 			return -ENOMEM;
 		}
@@ -7421,6 +7512,13 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			if (ret < 0)
 				return ret;
 			break;
+		case RTE_FLOW_ITEM_TYPE_METER_COLOR:
+			ret = flow_dv_validate_item_meter_color(dev, items,
+								attr, error);
+			if (ret < 0)
+				return ret;
+			last_item = MLX5_FLOW_ITEM_METER_COLOR;
+			break;
 		default:
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ITEM,
@@ -10496,6 +10594,45 @@ flow_dv_translate_item_flex(struct rte_eth_dev *dev, void *matcher, void *key,
 	mlx5_flex_flow_translate_item(dev, matcher, key, item, is_inner);
 }
 
+/**
+ * Add METER_COLOR item to matcher
+ *
+ * @param[in] dev
+ *   The device to configure through.
+ * @param[in, out] key
+ *   Flow matcher value.
+ * @param[in] item
+ *   Flow pattern to translate.
+ * @param[in] key_type
+ *   Set flow matcher mask or value.
+ */
+static void
+flow_dv_translate_item_meter_color(struct rte_eth_dev *dev, void *key,
+			    const struct rte_flow_item *item,
+			    uint32_t key_type)
+{
+	const struct rte_flow_item_meter_color *color_m = item->mask;
+	const struct rte_flow_item_meter_color *color_v = item->spec;
+	uint32_t value, mask;
+	int reg = REG_NON;
+
+	MLX5_ASSERT(color_v);
+	if (MLX5_ITEM_VALID(item, key_type))
+		return;
+	MLX5_ITEM_UPDATE(item, key_type, color_v, color_m,
+		&rte_flow_item_meter_color_mask);
+	value = rte_col_2_mlx5_col(color_v->color);
+	mask = color_m ?
+		color_m->color : (UINT32_C(1) << MLX5_MTR_COLOR_BITS) - 1;
+	if (!!(key_type & MLX5_SET_MATCHER_SW))
+		reg = mlx5_flow_get_reg_id(dev, MLX5_MTR_COLOR, 0, NULL);
+	else
+		reg = flow_hw_get_reg_id(RTE_FLOW_ITEM_TYPE_METER_COLOR, 0);
+	if (reg == REG_NON)
+		return;
+	flow_dv_match_meta_reg(key, (enum modify_reg)reg, value, mask);
+}
+
 static uint32_t matcher_zero[MLX5_ST_SZ_DW(fte_match_param)] = { 0 };
 
 #define HEADER_IS_ZERO(match_criteria, headers)				     \
@@ -13248,6 +13385,10 @@ flow_dv_translate_items(struct rte_eth_dev *dev,
 		/* No other protocol should follow eCPRI layer. */
 		last_item = MLX5_FLOW_LAYER_ECPRI;
 		break;
+	case RTE_FLOW_ITEM_TYPE_METER_COLOR:
+		flow_dv_translate_item_meter_color(dev, key, items, key_type);
+		last_item = MLX5_FLOW_ITEM_METER_COLOR;
+		break;
 	default:
 		break;
 	}
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index af06659052..d4ce2f185a 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -395,6 +395,10 @@ __flow_hw_action_template_destroy(struct rte_eth_dev *dev,
 		mlx5_hws_cnt_shared_put(priv->hws_cpool, &acts->cnt_id);
 		acts->cnt_id = 0;
 	}
+	if (acts->mtr_id) {
+		mlx5_ipool_free(priv->hws_mpool->idx_pool, acts->mtr_id);
+		acts->mtr_id = 0;
+	}
 }
 
 /**
@@ -611,6 +615,42 @@ __flow_hw_act_data_shared_cnt_append(struct mlx5_priv *priv,
 	return 0;
 }
 
+/**
+ * Append shared meter_mark action to the dynamic action list.
+ *
+ * @param[in] priv
+ *   Pointer to the port private data structure.
+ * @param[in] acts
+ *   Pointer to the template HW steering DR actions.
+ * @param[in] type
+ *   Action type.
+ * @param[in] action_src
+ *   Offset of source rte flow action.
+ * @param[in] action_dst
+ *   Offset of destination DR action.
+ * @param[in] mtr_id
+ *   Shared meter id.
+ *
+ * @return
+ *    0 on success, negative value otherwise and rte_errno is set.
+ */
+static __rte_always_inline int
+__flow_hw_act_data_shared_mtr_append(struct mlx5_priv *priv,
+				     struct mlx5_hw_actions *acts,
+				     enum rte_flow_action_type type,
+				     uint16_t action_src,
+				     uint16_t action_dst,
+				     cnt_id_t mtr_id)
+{
+	struct mlx5_action_construct_data *act_data;
+
+	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
+	if (!act_data)
+		return -1;
+	act_data->type = type;
+	act_data->shared_meter.id = mtr_id;
+	LIST_INSERT_HEAD(&acts->act_list, act_data, next);
+	return 0;
+}
 
 /**
  * Translate shared indirect action.
@@ -665,6 +705,13 @@ flow_hw_shared_action_translate(struct rte_eth_dev *dev,
 				       idx, &acts->rule_acts[action_dst]))
 			return -1;
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
+		if (__flow_hw_act_data_shared_mtr_append(priv, acts,
+			(enum rte_flow_action_type)
+			MLX5_RTE_FLOW_ACTION_TYPE_METER_MARK,
+			action_src, action_dst, idx))
+			return -1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type:%d", type);
 		break;
@@ -871,6 +918,7 @@ flow_hw_modify_field_compile(struct rte_eth_dev *dev,
 				(void *)(uintptr_t)&conf->src.value;
 		if (conf->dst.field == RTE_FLOW_FIELD_META ||
 		    conf->dst.field == RTE_FLOW_FIELD_TAG ||
+		    conf->dst.field == RTE_FLOW_FIELD_METER_COLOR ||
 		    conf->dst.field == (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG) {
 			value = *(const unaligned_uint32_t *)item.spec;
 			value = rte_cpu_to_be_32(value);
@@ -1030,7 +1078,7 @@ flow_hw_meter_compile(struct rte_eth_dev *dev,
 	acts->rule_acts[jump_pos].action = (!!group) ?
 				    acts->jump->hws_action :
 				    acts->jump->root_action;
-	if (mlx5_aso_mtr_wait(priv->sh, aso_mtr))
+	if (mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr))
 		return -ENOMEM;
 	return 0;
 }
@@ -1104,6 +1152,74 @@ static rte_be32_t vlan_hdr_to_be32(const struct rte_flow_action *actions)
 #endif
 }
 
+static __rte_always_inline struct mlx5_aso_mtr *
+flow_hw_meter_mark_alloc(struct rte_eth_dev *dev,
+			   const struct rte_flow_action *action,
+			   uint32_t queue)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
+	const struct rte_flow_action_meter_mark *meter_mark = action->conf;
+	struct mlx5_aso_mtr *aso_mtr;
+	struct mlx5_flow_meter_info *fm;
+	uint32_t mtr_id;
+
+	aso_mtr = mlx5_ipool_malloc(priv->hws_mpool->idx_pool, &mtr_id);
+	if (!aso_mtr)
+		return NULL;
+	/* Fill the flow meter parameters. */
+	aso_mtr->type = ASO_METER_INDIRECT;
+	fm = &aso_mtr->fm;
+	fm->meter_id = mtr_id;
+	fm->profile = (struct mlx5_flow_meter_profile *)(meter_mark->profile);
+	fm->is_enable = meter_mark->state;
+	fm->color_aware = meter_mark->color_mode;
+	aso_mtr->pool = pool;
+	aso_mtr->state = ASO_METER_WAIT;
+	aso_mtr->offset = mtr_id - 1;
+	aso_mtr->init_color = (meter_mark->color_mode) ?
+		meter_mark->init_color : RTE_COLOR_GREEN;
+	/* Update ASO flow meter by wqe. */
+	if (mlx5_aso_meter_update_by_wqe(priv->sh, queue, aso_mtr,
+					 &priv->mtr_bulk)) {
+		mlx5_ipool_free(pool->idx_pool, mtr_id);
+		return NULL;
+	}
+	/* Wait for ASO object completion. */
+	if (queue == MLX5_HW_INV_QUEUE &&
+	    mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr)) {
+		mlx5_ipool_free(pool->idx_pool, mtr_id);
+		return NULL;
+	}
+	return aso_mtr;
+}
+
+static __rte_always_inline int
+flow_hw_meter_mark_compile(struct rte_eth_dev *dev,
+			   uint16_t aso_mtr_pos,
+			   const struct rte_flow_action *action,
+			   struct mlx5dr_rule_action *acts,
+			   uint32_t *index,
+			   uint32_t queue)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
+	struct mlx5_aso_mtr *aso_mtr;
+
+	aso_mtr = flow_hw_meter_mark_alloc(dev, action, queue);
+	if (!aso_mtr)
+		return -1;
+
+	/* Compile METER_MARK action */
+	acts[aso_mtr_pos].action = pool->action;
+	acts[aso_mtr_pos].aso_meter.offset = aso_mtr->offset;
+	acts[aso_mtr_pos].aso_meter.init_color =
+		(enum mlx5dr_action_aso_meter_color)
+		rte_col_2_mlx5_col(aso_mtr->init_color);
+	*index = aso_mtr->fm.meter_id;
+	return 0;
+}
+
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -1411,6 +1527,24 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
 				goto err;
 			}
 			break;
+		case RTE_FLOW_ACTION_TYPE_METER_MARK:
+			action_pos = at->actions_off[actions - action_start];
+			if (actions->conf && masks->conf &&
+			    ((const struct rte_flow_action_meter_mark *)
+			     masks->conf)->profile) {
+				err = flow_hw_meter_mark_compile(dev,
+							action_pos, actions,
+							acts->rule_acts,
+							&acts->mtr_id,
+							MLX5_HW_INV_QUEUE);
+				if (err)
+					goto err;
+			} else if (__flow_hw_act_data_general_append(priv, acts,
+							actions->type,
+							actions - action_start,
+							action_pos))
+				goto err;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -1607,8 +1741,10 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev, uint32_t queue,
 				struct mlx5dr_rule_action *rule_act)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
 	struct mlx5_action_construct_data act_data;
 	struct mlx5_shared_action_rss *shared_rss;
+	struct mlx5_aso_mtr *aso_mtr;
 	uint32_t act_idx = (uint32_t)(uintptr_t)action->conf;
 	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
 	uint32_t idx = act_idx &
@@ -1644,6 +1780,17 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev, uint32_t queue,
 		if (flow_hw_ct_compile(dev, queue, idx, rule_act))
 			return -1;
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
+		/* Find ASO object. */
+		aso_mtr = mlx5_ipool_get(pool->idx_pool, idx);
+		if (!aso_mtr)
+			return -1;
+		rule_act->action = pool->action;
+		rule_act->aso_meter.offset = aso_mtr->offset;
+		rule_act->aso_meter.init_color =
+			(enum mlx5dr_action_aso_meter_color)
+			rte_col_2_mlx5_col(aso_mtr->init_color);
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type:%d", type);
 		break;
@@ -1713,6 +1860,7 @@ flow_hw_modify_field_construct(struct mlx5_hw_q_job *job,
 		rte_memcpy(values, mhdr_action->src.pvalue, sizeof(values));
 	if (mhdr_action->dst.field == RTE_FLOW_FIELD_META ||
 	    mhdr_action->dst.field == RTE_FLOW_FIELD_TAG ||
+	    mhdr_action->dst.field == RTE_FLOW_FIELD_METER_COLOR ||
 	    mhdr_action->dst.field == (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG) {
 		value_p = (unaligned_uint32_t *)values;
 		*value_p = rte_cpu_to_be_32(*value_p);
@@ -1790,6 +1938,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			  uint32_t queue)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
 	struct rte_flow_template_table *table = job->flow->table;
 	struct mlx5_action_construct_data *act_data;
 	const struct rte_flow_actions_template *at = hw_at->action_template;
@@ -1806,8 +1955,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	uint32_t ft_flag;
 	size_t encap_len = 0;
 	int ret;
-	struct mlx5_aso_mtr *mtr;
-	uint32_t mtr_id;
+	struct mlx5_aso_mtr *aso_mtr;
 
 	rte_memcpy(rule_acts, hw_acts->rule_acts, sizeof(*rule_acts) * at->dr_actions_num);
 	attr.group = table->grp->group_id;
@@ -1841,6 +1989,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		struct mlx5_hrxq *hrxq;
 		uint32_t ct_idx;
 		cnt_id_t cnt_id;
+		uint32_t mtr_id;
 
 		action = &actions[act_data->action_src];
 		/*
@@ -1947,13 +2096,13 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_METER:
 			meter = action->conf;
 			mtr_id = meter->mtr_id;
-			mtr = mlx5_aso_meter_by_idx(priv, mtr_id);
+			aso_mtr = mlx5_aso_meter_by_idx(priv, mtr_id);
 			rule_acts[act_data->action_dst].action =
 				priv->mtr_bulk.action;
 			rule_acts[act_data->action_dst].aso_meter.offset =
-								mtr->offset;
+								aso_mtr->offset;
 			jump = flow_hw_jump_action_register
-				(dev, &table->cfg, mtr->fm.group, NULL);
+				(dev, &table->cfg, aso_mtr->fm.group, NULL);
 			if (!jump)
 				return -1;
 			MLX5_ASSERT
@@ -1963,7 +2112,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 							 jump->root_action;
 			job->flow->jump = jump;
 			job->flow->fate_type = MLX5_FLOW_FATE_JUMP;
-			if (mlx5_aso_mtr_wait(priv->sh, mtr))
+			if (mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr))
 				return -1;
 			break;
 		case RTE_FLOW_ACTION_TYPE_COUNT:
@@ -1999,6 +2148,28 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 					       &rule_acts[act_data->action_dst]))
 				return -1;
 			break;
+		case MLX5_RTE_FLOW_ACTION_TYPE_METER_MARK:
+			mtr_id = act_data->shared_meter.id &
+				((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
+			/* Find ASO object. */
+			aso_mtr = mlx5_ipool_get(pool->idx_pool, mtr_id);
+			if (!aso_mtr)
+				return -1;
+			rule_acts[act_data->action_dst].action =
+							pool->action;
+			rule_acts[act_data->action_dst].aso_meter.offset =
+							aso_mtr->offset;
+			rule_acts[act_data->action_dst].aso_meter.init_color =
+				(enum mlx5dr_action_aso_meter_color)
+				rte_col_2_mlx5_col(aso_mtr->init_color);
+			break;
+		case RTE_FLOW_ACTION_TYPE_METER_MARK:
+			ret = flow_hw_meter_mark_compile(dev,
+				act_data->action_dst, action,
+				rule_acts, &job->flow->mtr_id, queue);
+			if (ret != 0)
+				return ret;
+			break;
 		default:
 			break;
 		}
@@ -2266,6 +2437,7 @@ flow_hw_pull(struct rte_eth_dev *dev,
 	     struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
 	struct mlx5_hw_q_job *job;
 	int ret, i;
 
@@ -2290,6 +2462,10 @@ flow_hw_pull(struct rte_eth_dev *dev,
 						&job->flow->cnt_id);
 				job->flow->cnt_id = 0;
 			}
+			if (job->flow->mtr_id) {
+				mlx5_ipool_free(pool->idx_pool,	job->flow->mtr_id);
+				job->flow->mtr_id = 0;
+			}
 			mlx5_ipool_free(job->flow->table->flow, job->flow->idx);
 		}
 		priv->hw_q[queue].job[priv->hw_q[queue].job_idx++] = job;
@@ -3151,6 +3327,9 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_METER:
 			/* TODO: Validation logic */
 			break;
+		case RTE_FLOW_ACTION_TYPE_METER_MARK:
+			/* TODO: Validation logic */
+			break;
 		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
 			ret = flow_hw_validate_action_modify_field(action,
 									mask,
@@ -3244,6 +3423,11 @@ flow_hw_dr_actions_template_handle_shared(const struct rte_flow_action *mask,
 		action_types[*curr_off] = MLX5DR_ACTION_TYP_ASO_CT;
 		*curr_off = *curr_off + 1;
 		break;
+	case RTE_FLOW_ACTION_TYPE_METER_MARK:
+		at->actions_off[action_src] = *curr_off;
+		action_types[*curr_off] = MLX5DR_ACTION_TYP_ASO_METER;
+		*curr_off = *curr_off + 1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type: %d", type);
 		return -EINVAL;
@@ -3335,6 +3519,12 @@ flow_hw_dr_actions_template_create(struct rte_flow_actions_template *at)
 				MLX5_HW_VLAN_PUSH_PCP_IDX :
 				MLX5_HW_VLAN_PUSH_VID_IDX;
 			break;
+		case RTE_FLOW_ACTION_TYPE_METER_MARK:
+			at->actions_off[i] = curr_off;
+			action_types[curr_off++] = MLX5DR_ACTION_TYP_ASO_METER;
+			if (curr_off >= MLX5_HW_MAX_ACTS)
+				goto err_actions_num;
+			break;
 		default:
 			type = mlx5_hw_dr_action_types[at->actions[i].type];
 			at->actions_off[i] = curr_off;
@@ -3810,6 +4000,16 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 								  " attribute");
 			}
 			break;
+		case RTE_FLOW_ITEM_TYPE_METER_COLOR:
+		{
+			int reg = flow_hw_get_reg_id(RTE_FLOW_ITEM_TYPE_METER_COLOR, 0);
+			if (reg == REG_NON)
+				return rte_flow_error_set(error, EINVAL,
+							  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+							  NULL,
+							  "Unsupported meter color register");
+			break;
+		}
 		case RTE_FLOW_ITEM_TYPE_VOID:
 		case RTE_FLOW_ITEM_TYPE_ETH:
 		case RTE_FLOW_ITEM_TYPE_VLAN:
@@ -5325,7 +5525,8 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		if (mlx5_flow_meter_init(dev,
 					port_attr->nb_meters,
 					port_attr->nb_meter_profiles,
-					port_attr->nb_meter_policies))
+					port_attr->nb_meter_policies,
+					nb_q_updated))
 			goto err;
 	/* Add global actions. */
 	for (i = 0; i < MLX5_HW_ACTION_FLAG_MAX; i++) {
@@ -5829,7 +6030,9 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 {
 	struct rte_flow_action_handle *handle = NULL;
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr *aso_mtr;
 	cnt_id_t cnt_id;
+	uint32_t mtr_id;
 
 	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
@@ -5848,6 +6051,14 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
 		handle = flow_hw_conntrack_create(dev, queue, action->conf, error);
 		break;
+	case RTE_FLOW_ACTION_TYPE_METER_MARK:
+		aso_mtr = flow_hw_meter_mark_alloc(dev, action, queue);
+		if (!aso_mtr)
+			break;
+		mtr_id = (MLX5_INDIRECT_ACTION_TYPE_METER_MARK <<
+			MLX5_INDIRECT_ACTION_TYPE_OFFSET) | (aso_mtr->fm.meter_id);
+		handle = (struct rte_flow_action_handle *)(uintptr_t)mtr_id;
+		break;
 	default:
 		handle = flow_dv_action_create(dev, conf, action, error);
 	}
@@ -5883,18 +6094,59 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 			     void *user_data,
 			     struct rte_flow_error *error)
 {
-	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
-	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
-
 	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
 	RTE_SET_USED(user_data);
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
+	const struct rte_flow_update_meter_mark *upd_meter_mark =
+		(const struct rte_flow_update_meter_mark *)update;
+	const struct rte_flow_action_meter_mark *meter_mark;
+	struct mlx5_aso_mtr *aso_mtr;
+	struct mlx5_flow_meter_info *fm;
+	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
+	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+	uint32_t idx = act_idx & ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
+
 	switch (type) {
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
 		return flow_hw_conntrack_update(dev, queue, update, act_idx, error);
+	case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
+		meter_mark = &upd_meter_mark->meter_mark;
+		/* Find ASO object. */
+		aso_mtr = mlx5_ipool_get(pool->idx_pool, idx);
+		if (!aso_mtr)
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL, "Invalid meter_mark update index");
+		fm = &aso_mtr->fm;
+		if (upd_meter_mark->profile_valid)
+			fm->profile = (struct mlx5_flow_meter_profile *)
+							(meter_mark->profile);
+		if (upd_meter_mark->color_mode_valid)
+			fm->color_aware = meter_mark->color_mode;
+		if (upd_meter_mark->init_color_valid)
+			aso_mtr->init_color = (meter_mark->color_mode) ?
+				meter_mark->init_color : RTE_COLOR_GREEN;
+		if (upd_meter_mark->state_valid)
+			fm->is_enable = meter_mark->state;
+		/* Update ASO flow meter by wqe. */
+		if (mlx5_aso_meter_update_by_wqe(priv->sh, queue,
+						 aso_mtr, &priv->mtr_bulk))
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL, "Unable to update ASO meter WQE");
+		/* Wait for ASO object completion. */
+		if (queue == MLX5_HW_INV_QUEUE &&
+		    mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr))
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL, "Unable to wait for ASO meter CQE");
+		return 0;
 	default:
-		return flow_dv_action_update(dev, handle, update, error);
+		break;
 	}
+	return flow_dv_action_update(dev, handle, update, error);
 }
 
 /**
@@ -5925,7 +6177,11 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 {
 	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
 	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+	uint32_t idx = act_idx & ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
+	struct mlx5_aso_mtr *aso_mtr;
+	struct mlx5_flow_meter_info *fm;
 
 	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
@@ -5935,6 +6191,28 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 		return mlx5_hws_cnt_shared_put(priv->hws_cpool, &act_idx);
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
 		return flow_hw_conntrack_destroy(dev, act_idx, error);
+	case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
+		aso_mtr = mlx5_ipool_get(pool->idx_pool, idx);
+		if (!aso_mtr)
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL, "Invalid meter_mark destroy index");
+		fm = &aso_mtr->fm;
+		fm->is_enable = 0;
+		/* Update ASO flow meter by wqe. */
+		if (mlx5_aso_meter_update_by_wqe(priv->sh, queue, aso_mtr,
+						 &priv->mtr_bulk))
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL, "Unable to update ASO meter WQE");
+		/* Wait for ASO object completion. */
+		if (queue == MLX5_HW_INV_QUEUE &&
+		    mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr))
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL, "Unable to wait for ASO meter CQE");
+		mlx5_ipool_free(pool->idx_pool, idx);
+		return 0;
 	default:
 		return flow_dv_action_destroy(dev, handle, error);
 	}
@@ -6018,8 +6296,8 @@ flow_hw_action_create(struct rte_eth_dev *dev,
 		       const struct rte_flow_action *action,
 		       struct rte_flow_error *err)
 {
-	return flow_hw_action_handle_create(dev, UINT32_MAX, NULL, conf, action,
-					    NULL, err);
+	return flow_hw_action_handle_create(dev, MLX5_HW_INV_QUEUE,
+					    NULL, conf, action, NULL, err);
 }
 
 /**
@@ -6044,8 +6322,8 @@ flow_hw_action_destroy(struct rte_eth_dev *dev,
 		       struct rte_flow_action_handle *handle,
 		       struct rte_flow_error *error)
 {
-	return flow_hw_action_handle_destroy(dev, UINT32_MAX, NULL, handle,
-			NULL, error);
+	return flow_hw_action_handle_destroy(dev, MLX5_HW_INV_QUEUE,
+			NULL, handle, NULL, error);
 }
 
 /**
@@ -6073,8 +6351,8 @@ flow_hw_action_update(struct rte_eth_dev *dev,
 		      const void *update,
 		      struct rte_flow_error *err)
 {
-	return flow_hw_action_handle_update(dev, UINT32_MAX, NULL, handle,
-			update, NULL, err);
+	return flow_hw_action_handle_update(dev, MLX5_HW_INV_QUEUE,
+			NULL, handle, update, NULL, err);
 }
 
 static int
@@ -6604,6 +6882,12 @@ mlx5_flow_meter_uninit(struct rte_eth_dev *dev)
 		mlx5_free(priv->mtr_profile_arr);
 		priv->mtr_profile_arr = NULL;
 	}
+	if (priv->hws_mpool) {
+		mlx5_aso_mtr_queue_uninit(priv->sh, priv->hws_mpool, NULL);
+		mlx5_ipool_destroy(priv->hws_mpool->idx_pool);
+		mlx5_free(priv->hws_mpool);
+		priv->hws_mpool = NULL;
+	}
 	if (priv->mtr_bulk.aso) {
 		mlx5_free(priv->mtr_bulk.aso);
 		priv->mtr_bulk.aso = NULL;
@@ -6624,7 +6908,8 @@ int
 mlx5_flow_meter_init(struct rte_eth_dev *dev,
 		     uint32_t nb_meters,
 		     uint32_t nb_meter_profiles,
-		     uint32_t nb_meter_policies)
+		     uint32_t nb_meter_policies,
+		     uint32_t nb_queues)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_devx_obj *dcs = NULL;
@@ -6634,29 +6919,35 @@ mlx5_flow_meter_init(struct rte_eth_dev *dev,
 	struct mlx5_aso_mtr *aso;
 	uint32_t i;
 	struct rte_flow_error error;
+	uint32_t flags;
+	uint32_t nb_mtrs = rte_align32pow2(nb_meters);
+	struct mlx5_indexed_pool_config cfg = {
+		.size = sizeof(struct mlx5_aso_mtr),
+		.trunk_size = 1 << 12,
+		.per_core_cache = 1 << 13,
+		.need_lock = 1,
+		.release_mem_en = !!priv->sh->config.reclaim_mode,
+		.malloc = mlx5_malloc,
+		.max_idx = nb_meters,
+		.free = mlx5_free,
+		.type = "mlx5_hw_mtr_mark_action",
+	};
 
 	if (!nb_meters || !nb_meter_profiles || !nb_meter_policies) {
 		ret = ENOTSUP;
 		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter configuration is invalid.");
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter configuration is invalid.");
 		goto err;
 	}
 	if (!priv->mtr_en || !priv->sh->meter_aso_en) {
 		ret = ENOTSUP;
 		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter ASO is not supported.");
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter ASO is not supported.");
 		goto err;
 	}
 	priv->mtr_config.nb_meters = nb_meters;
-	if (mlx5_aso_queue_init(priv->sh, ASO_OPC_MOD_POLICER)) {
-		ret = ENOMEM;
-		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter ASO queue allocation failed.");
-		goto err;
-	}
 	log_obj_size = rte_log2_u32(nb_meters >> 1);
 	dcs = mlx5_devx_cmd_create_flow_meter_aso_obj
 		(priv->sh->cdev->ctx, priv->sh->cdev->pdn,
@@ -6664,8 +6955,8 @@ mlx5_flow_meter_init(struct rte_eth_dev *dev,
 	if (!dcs) {
 		ret = ENOMEM;
 		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter ASO object allocation failed.");
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter ASO object allocation failed.");
 		goto err;
 	}
 	priv->mtr_bulk.devx_obj = dcs;
@@ -6673,31 +6964,33 @@ mlx5_flow_meter_init(struct rte_eth_dev *dev,
 	if (reg_id < 0) {
 		ret = ENOTSUP;
 		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter register is not available.");
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter register is not available.");
 		goto err;
 	}
+	flags = MLX5DR_ACTION_FLAG_HWS_RX | MLX5DR_ACTION_FLAG_HWS_TX;
+	if (priv->sh->config.dv_esw_en && priv->master)
+		flags |= MLX5DR_ACTION_FLAG_HWS_FDB;
 	priv->mtr_bulk.action = mlx5dr_action_create_aso_meter
 			(priv->dr_ctx, (struct mlx5dr_devx_obj *)dcs,
-				reg_id - REG_C_0, MLX5DR_ACTION_FLAG_HWS_RX |
-				MLX5DR_ACTION_FLAG_HWS_TX |
-				MLX5DR_ACTION_FLAG_HWS_FDB);
+				reg_id - REG_C_0, flags);
 	if (!priv->mtr_bulk.action) {
 		ret = ENOMEM;
 		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter action creation failed.");
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter action creation failed.");
 		goto err;
 	}
 	priv->mtr_bulk.aso = mlx5_malloc(MLX5_MEM_ZERO,
-						sizeof(struct mlx5_aso_mtr) * nb_meters,
-						RTE_CACHE_LINE_SIZE,
-						SOCKET_ID_ANY);
+					 sizeof(struct mlx5_aso_mtr) *
+					 nb_meters,
+					 RTE_CACHE_LINE_SIZE,
+					 SOCKET_ID_ANY);
 	if (!priv->mtr_bulk.aso) {
 		ret = ENOMEM;
 		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter bulk ASO allocation failed.");
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter bulk ASO allocation failed.");
 		goto err;
 	}
 	priv->mtr_bulk.size = nb_meters;
@@ -6708,32 +7001,65 @@ mlx5_flow_meter_init(struct rte_eth_dev *dev,
 		aso->offset = i;
 		aso++;
 	}
+	priv->hws_mpool = mlx5_malloc(MLX5_MEM_ZERO,
+				sizeof(struct mlx5_aso_mtr_pool),
+				RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
+	if (!priv->hws_mpool) {
+		ret = ENOMEM;
+		rte_flow_error_set(&error, ENOMEM,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter ipool allocation failed.");
+		goto err;
+	}
+	priv->hws_mpool->devx_obj = priv->mtr_bulk.devx_obj;
+	priv->hws_mpool->action = priv->mtr_bulk.action;
+	priv->hws_mpool->nb_sq = nb_queues;
+	if (mlx5_aso_mtr_queue_init(priv->sh, priv->hws_mpool,
+				    NULL, nb_queues)) {
+		ret = ENOMEM;
+		rte_flow_error_set(&error, ENOMEM,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter ASO queue allocation failed.");
+		goto err;
+	}
+	/*
+	 * No need for a local cache if the meter number is small, since the
+	 * flow insertion rate will be very limited in that case.
+	 * Keep the trunk size below the default of 4K in such a case.
+	 */
+	if (nb_mtrs <= cfg.trunk_size) {
+		cfg.per_core_cache = 0;
+		cfg.trunk_size = nb_mtrs;
+	} else if (nb_mtrs <= MLX5_HW_IPOOL_SIZE_THRESHOLD) {
+		cfg.per_core_cache = MLX5_HW_IPOOL_CACHE_MIN;
+	}
+	priv->hws_mpool->idx_pool = mlx5_ipool_create(&cfg);
 	priv->mtr_config.nb_meter_profiles = nb_meter_profiles;
 	priv->mtr_profile_arr =
 		mlx5_malloc(MLX5_MEM_ZERO,
-				sizeof(struct mlx5_flow_meter_profile) *
-				nb_meter_profiles,
-				RTE_CACHE_LINE_SIZE,
-				SOCKET_ID_ANY);
+			    sizeof(struct mlx5_flow_meter_profile) *
+			    nb_meter_profiles,
+			    RTE_CACHE_LINE_SIZE,
+			    SOCKET_ID_ANY);
 	if (!priv->mtr_profile_arr) {
 		ret = ENOMEM;
 		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter profile allocation failed.");
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter profile allocation failed.");
 		goto err;
 	}
 	priv->mtr_config.nb_meter_policies = nb_meter_policies;
 	priv->mtr_policy_arr =
 		mlx5_malloc(MLX5_MEM_ZERO,
-				sizeof(struct mlx5_flow_meter_policy) *
-				nb_meter_policies,
-				RTE_CACHE_LINE_SIZE,
-				SOCKET_ID_ANY);
+			    sizeof(struct mlx5_flow_meter_policy) *
+			    nb_meter_policies,
+			    RTE_CACHE_LINE_SIZE,
+			    SOCKET_ID_ANY);
 	if (!priv->mtr_policy_arr) {
 		ret = ENOMEM;
 		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter policy allocation failed.");
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter policy allocation failed.");
 		goto err;
 	}
 	return 0;
diff --git a/drivers/net/mlx5/mlx5_flow_meter.c b/drivers/net/mlx5/mlx5_flow_meter.c
index 792b945c98..fd1337ae73 100644
--- a/drivers/net/mlx5/mlx5_flow_meter.c
+++ b/drivers/net/mlx5/mlx5_flow_meter.c
@@ -588,6 +588,36 @@ mlx5_flow_meter_profile_delete(struct rte_eth_dev *dev,
 	return 0;
 }
 
+/**
+ * Callback to get MTR profile.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] meter_profile_id
+ *   Meter profile id.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   A valid handle in case of success, NULL otherwise.
+ */
+static struct rte_flow_meter_profile *
+mlx5_flow_meter_profile_get(struct rte_eth_dev *dev,
+			  uint32_t meter_profile_id,
+			  struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (!priv->mtr_en) {
+		rte_mtr_error_set(error, ENOTSUP,
+				  RTE_MTR_ERROR_TYPE_UNSPECIFIED, NULL,
+				  "Meter is not supported");
+		return NULL;
+	}
+	return (void *)(uintptr_t)mlx5_flow_meter_profile_find(priv,
+							meter_profile_id);
+}
+
 /**
  * Callback to add MTR profile with HWS.
  *
@@ -1150,6 +1180,37 @@ mlx5_flow_meter_policy_delete(struct rte_eth_dev *dev,
 	return 0;
 }
 
+/**
+ * Callback to get MTR policy.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] policy_id
+ *   Meter policy id.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   A valid handle in case of success, NULL otherwise.
+ */
+static struct rte_flow_meter_policy *
+mlx5_flow_meter_policy_get(struct rte_eth_dev *dev,
+			  uint32_t policy_id,
+			  struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint32_t policy_idx;
+
+	if (!priv->mtr_en) {
+		rte_mtr_error_set(error, ENOTSUP,
+				  RTE_MTR_ERROR_TYPE_UNSPECIFIED, NULL,
+				  "Meter is not supported");
+		return NULL;
+	}
+	return (void *)(uintptr_t)mlx5_flow_meter_policy_find(dev, policy_id,
+							      &policy_idx);
+}
+
 /**
  * Callback to delete MTR policy for HWS.
  *
@@ -1565,11 +1626,11 @@ mlx5_flow_meter_action_modify(struct mlx5_priv *priv,
 	if (priv->sh->meter_aso_en) {
 		fm->is_enable = !!is_enable;
 		aso_mtr = container_of(fm, struct mlx5_aso_mtr, fm);
-		ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr,
-						   &priv->mtr_bulk);
+		ret = mlx5_aso_meter_update_by_wqe(priv->sh, MLX5_HW_INV_QUEUE,
+						   aso_mtr, &priv->mtr_bulk);
 		if (ret)
 			return ret;
-		ret = mlx5_aso_mtr_wait(priv->sh, aso_mtr);
+		ret = mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr);
 		if (ret)
 			return ret;
 	} else {
@@ -1815,8 +1876,8 @@ mlx5_flow_meter_create(struct rte_eth_dev *dev, uint32_t meter_id,
 	/* If ASO meter supported, update ASO flow meter by wqe. */
 	if (priv->sh->meter_aso_en) {
 		aso_mtr = container_of(fm, struct mlx5_aso_mtr, fm);
-		ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr,
-						   &priv->mtr_bulk);
+		ret = mlx5_aso_meter_update_by_wqe(priv->sh, MLX5_HW_INV_QUEUE,
+						   aso_mtr, &priv->mtr_bulk);
 		if (ret)
 			goto error;
 		if (!priv->mtr_idx_tbl) {
@@ -1921,7 +1982,7 @@ mlx5_flow_meter_hws_create(struct rte_eth_dev *dev, uint32_t meter_id,
 	fm->shared = !!shared;
 	fm->initialized = 1;
 	/* Update ASO flow meter by wqe. */
-	ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr,
+	ret = mlx5_aso_meter_update_by_wqe(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr,
 					   &priv->mtr_bulk);
 	if (ret)
 		return -rte_mtr_error_set(error, ENOTSUP,
@@ -2401,9 +2462,11 @@ static const struct rte_mtr_ops mlx5_flow_mtr_ops = {
 	.capabilities_get = mlx5_flow_mtr_cap_get,
 	.meter_profile_add = mlx5_flow_meter_profile_add,
 	.meter_profile_delete = mlx5_flow_meter_profile_delete,
+	.meter_profile_get = mlx5_flow_meter_profile_get,
 	.meter_policy_validate = mlx5_flow_meter_policy_validate,
 	.meter_policy_add = mlx5_flow_meter_policy_add,
 	.meter_policy_delete = mlx5_flow_meter_policy_delete,
+	.meter_policy_get = mlx5_flow_meter_policy_get,
 	.create = mlx5_flow_meter_create,
 	.destroy = mlx5_flow_meter_destroy,
 	.meter_enable = mlx5_flow_meter_enable,
@@ -2418,9 +2481,11 @@ static const struct rte_mtr_ops mlx5_flow_mtr_hws_ops = {
 	.capabilities_get = mlx5_flow_mtr_cap_get,
 	.meter_profile_add = mlx5_flow_meter_profile_hws_add,
 	.meter_profile_delete = mlx5_flow_meter_profile_hws_delete,
+	.meter_profile_get = mlx5_flow_meter_profile_get,
 	.meter_policy_validate = mlx5_flow_meter_policy_hws_validate,
 	.meter_policy_add = mlx5_flow_meter_policy_hws_add,
 	.meter_policy_delete = mlx5_flow_meter_policy_hws_delete,
+	.meter_policy_get = mlx5_flow_meter_policy_get,
 	.create = mlx5_flow_meter_hws_create,
 	.destroy = mlx5_flow_meter_hws_destroy,
 	.meter_enable = mlx5_flow_meter_enable,
@@ -2566,7 +2631,7 @@ mlx5_flow_meter_attach(struct mlx5_priv *priv,
 		struct mlx5_aso_mtr *aso_mtr;
 
 		aso_mtr = container_of(fm, struct mlx5_aso_mtr, fm);
-		if (mlx5_aso_mtr_wait(priv->sh, aso_mtr)) {
+		if (mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr)) {
 			return rte_flow_error_set(error, ENOENT,
 					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 					NULL,
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v2 13/17] net/mlx5: add HWS AGE action support
  2022-09-28  3:31 ` [PATCH v2 00/17] net/mlx5: HW steering PMD update Suanming Mou
                     ` (11 preceding siblings ...)
  2022-09-28  3:31   ` [PATCH v2 12/17] net/mlx5: implement METER MARK indirect action for HWS Suanming Mou
@ 2022-09-28  3:31   ` Suanming Mou
  2022-09-28  3:31   ` [PATCH v2 14/17] net/mlx5: add async action push and pull support Suanming Mou
                     ` (3 subsequent siblings)
  16 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-28  3:31 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Michael Baum

From: Michael Baum <michaelba@nvidia.com>

Add support for AGE action for HW steering.
This patch includes:

 1. Add new structures to manage the aging.
 2. Initialize them all in the configure function.
 3. Implement a per-second aging check using the CNT background thread.
 4. Enable the AGE action in flow create/destroy operations.
 5. Implement a queue-based function to report aged flow rules (a usage
    sketch follows this list).
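
As a rough usage sketch (illustration only, not part of this patch), an
application can drain the aged-out reports of a single HWS queue through the
queue-based ethdev API rte_flow_get_q_aged_flows() from the related ethdev
series. The names drain_aged_flows() and handle_aged_rule(), and the array
size, are placeholders; the AGE actions are assumed to carry a non-NULL
application context pointer:

  #include <rte_common.h>
  #include <rte_flow.h>

  /* Placeholder for application logic (assumed, not a DPDK API). */
  static void
  handle_aged_rule(void *ctx)
  {
  	RTE_SET_USED(ctx);
  }

  /* Drain the aged-out flow contexts reported on one HWS queue. */
  static void
  drain_aged_flows(uint16_t port_id, uint32_t queue_id)
  {
  	void *contexts[64];
  	struct rte_flow_error error;
  	int n, i;

  	/* With nb_contexts == 0 only the number of aged-out rules is returned. */
  	n = rte_flow_get_q_aged_flows(port_id, queue_id, NULL, 0, &error);
  	if (n <= 0)
  		return;
  	n = rte_flow_get_q_aged_flows(port_id, queue_id, contexts,
  				      RTE_DIM(contexts), &error);
  	for (i = 0; i < n; i++) {
  		/* Each context is the pointer set in rte_flow_action_age. */
  		handle_aged_rule(contexts[i]);
  	}
  }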

Signed-off-by: Michael Baum <michaelba@nvidia.com>
---
 drivers/net/mlx5/mlx5.c            |   67 +-
 drivers/net/mlx5/mlx5.h            |   51 +-
 drivers/net/mlx5/mlx5_defs.h       |    3 +
 drivers/net/mlx5/mlx5_flow.c       |   89 ++-
 drivers/net/mlx5/mlx5_flow.h       |   33 +-
 drivers/net/mlx5/mlx5_flow_dv.c    |   30 +-
 drivers/net/mlx5/mlx5_flow_hw.c    | 1104 ++++++++++++++++++++++++----
 drivers/net/mlx5/mlx5_flow_verbs.c |    4 +-
 drivers/net/mlx5/mlx5_hws_cnt.c    |  704 +++++++++++++++++-
 drivers/net/mlx5/mlx5_hws_cnt.h    |  193 ++++-
 10 files changed, 2013 insertions(+), 265 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 383a789dfa..742607509b 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -497,6 +497,12 @@ mlx5_flow_aging_init(struct mlx5_dev_ctx_shared *sh)
 	uint32_t i;
 	struct mlx5_age_info *age_info;
 
+	/*
+	 * In HW steering, the aging information structure is initialized
+	 * later, during the configure function.
+	 */
+	if (sh->config.dv_flow_en == 2)
+		return;
 	for (i = 0; i < sh->max_port; i++) {
 		age_info = &sh->port[i].age_info;
 		age_info->flags = 0;
@@ -540,8 +546,8 @@ mlx5_flow_counter_mode_config(struct rte_eth_dev *dev __rte_unused)
 			hca_attr->flow_counter_bulk_alloc_bitmap);
 	/* Initialize fallback mode only on the port initializes sh. */
 	if (sh->refcnt == 1)
-		sh->cmng.counter_fallback = fallback;
-	else if (fallback != sh->cmng.counter_fallback)
+		sh->sws_cmng.counter_fallback = fallback;
+	else if (fallback != sh->sws_cmng.counter_fallback)
 		DRV_LOG(WARNING, "Port %d in sh has different fallback mode "
 			"with others:%d.", PORT_ID(priv), fallback);
 #endif
@@ -556,17 +562,38 @@ mlx5_flow_counter_mode_config(struct rte_eth_dev *dev __rte_unused)
 static void
 mlx5_flow_counters_mng_init(struct mlx5_dev_ctx_shared *sh)
 {
-	int i;
+	int i, j;
+
+	if (sh->config.dv_flow_en < 2) {
+		memset(&sh->sws_cmng, 0, sizeof(sh->sws_cmng));
+		TAILQ_INIT(&sh->sws_cmng.flow_counters);
+		sh->sws_cmng.min_id = MLX5_CNT_BATCH_OFFSET;
+		sh->sws_cmng.max_id = -1;
+		sh->sws_cmng.last_pool_idx = POOL_IDX_INVALID;
+		rte_spinlock_init(&sh->sws_cmng.pool_update_sl);
+		for (i = 0; i < MLX5_COUNTER_TYPE_MAX; i++) {
+			TAILQ_INIT(&sh->sws_cmng.counters[i]);
+			rte_spinlock_init(&sh->sws_cmng.csl[i]);
+		}
+	} else {
+		struct mlx5_hca_attr *attr = &sh->cdev->config.hca_attr;
+		uint32_t fw_max_nb_cnts = attr->max_flow_counter;
+		uint8_t log_dcs = log2above(fw_max_nb_cnts) - 1;
+		uint32_t max_nb_cnts = 0;
+
+		for (i = 0, j = 0; j < MLX5_HWS_CNT_DCS_NUM; ++i) {
+			int log_dcs_i = log_dcs - i;
 
-	memset(&sh->cmng, 0, sizeof(sh->cmng));
-	TAILQ_INIT(&sh->cmng.flow_counters);
-	sh->cmng.min_id = MLX5_CNT_BATCH_OFFSET;
-	sh->cmng.max_id = -1;
-	sh->cmng.last_pool_idx = POOL_IDX_INVALID;
-	rte_spinlock_init(&sh->cmng.pool_update_sl);
-	for (i = 0; i < MLX5_COUNTER_TYPE_MAX; i++) {
-		TAILQ_INIT(&sh->cmng.counters[i]);
-		rte_spinlock_init(&sh->cmng.csl[i]);
+			if (log_dcs_i < 0)
+				break;
+			if ((max_nb_cnts | RTE_BIT32(log_dcs_i)) >
+			    fw_max_nb_cnts)
+				continue;
+			max_nb_cnts |= RTE_BIT32(log_dcs_i);
+			j++;
+		}
+		sh->hws_max_log_bulk_sz = log_dcs;
+		sh->hws_max_nb_counters = max_nb_cnts;
 	}
 }
 
@@ -607,13 +634,13 @@ mlx5_flow_counters_mng_close(struct mlx5_dev_ctx_shared *sh)
 		rte_pause();
 	}
 
-	if (sh->cmng.pools) {
+	if (sh->sws_cmng.pools) {
 		struct mlx5_flow_counter_pool *pool;
-		uint16_t n_valid = sh->cmng.n_valid;
-		bool fallback = sh->cmng.counter_fallback;
+		uint16_t n_valid = sh->sws_cmng.n_valid;
+		bool fallback = sh->sws_cmng.counter_fallback;
 
 		for (i = 0; i < n_valid; ++i) {
-			pool = sh->cmng.pools[i];
+			pool = sh->sws_cmng.pools[i];
 			if (!fallback && pool->min_dcs)
 				claim_zero(mlx5_devx_cmd_destroy
 							       (pool->min_dcs));
@@ -632,14 +659,14 @@ mlx5_flow_counters_mng_close(struct mlx5_dev_ctx_shared *sh)
 			}
 			mlx5_free(pool);
 		}
-		mlx5_free(sh->cmng.pools);
+		mlx5_free(sh->sws_cmng.pools);
 	}
-	mng = LIST_FIRST(&sh->cmng.mem_mngs);
+	mng = LIST_FIRST(&sh->sws_cmng.mem_mngs);
 	while (mng) {
 		mlx5_flow_destroy_counter_stat_mem_mng(mng);
-		mng = LIST_FIRST(&sh->cmng.mem_mngs);
+		mng = LIST_FIRST(&sh->sws_cmng.mem_mngs);
 	}
-	memset(&sh->cmng, 0, sizeof(sh->cmng));
+	memset(&sh->sws_cmng, 0, sizeof(sh->sws_cmng));
 }
 
 /**
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index d85cb7adea..eca719f269 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -639,12 +639,45 @@ struct mlx5_geneve_tlv_option_resource {
 /* Current time in seconds. */
 #define MLX5_CURR_TIME_SEC	(rte_rdtsc() / rte_get_tsc_hz())
 
+/*
+ * HW steering queue oriented AGE info.
+ * It contains an array of rings, one for each HWS queue.
+ */
+struct mlx5_hws_q_age_info {
+	uint16_t nb_rings; /* Number of aged-out ring lists. */
+	struct rte_ring *aged_lists[]; /* Aged-out lists. */
+};
+
+/*
+ * HW steering AGE info.
+ * It has a ring list containing all aged out flow rules.
+ */
+struct mlx5_hws_age_info {
+	struct rte_ring *aged_list; /* Aged-out list. */
+};
+
 /* Aging information for per port. */
 struct mlx5_age_info {
 	uint8_t flags; /* Indicate if is new event or need to be triggered. */
-	struct mlx5_counters aged_counters; /* Aged counter list. */
-	struct aso_age_list aged_aso; /* Aged ASO actions list. */
-	rte_spinlock_t aged_sl; /* Aged flow list lock. */
+	union {
+		/* SW/FW steering AGE info. */
+		struct {
+			struct mlx5_counters aged_counters;
+			/* Aged counter list. */
+			struct aso_age_list aged_aso;
+			/* Aged ASO actions list. */
+			rte_spinlock_t aged_sl; /* Aged flow list lock. */
+		};
+		struct {
+			struct mlx5_indexed_pool *ages_ipool;
+			union {
+				struct mlx5_hws_age_info hw_age;
+				/* HW steering AGE info. */
+				struct mlx5_hws_q_age_info *hw_q_age;
+				/* HW steering queue oriented AGE info. */
+			};
+		};
+	};
 };
 
 /* Per port data of shared IB device. */
@@ -1302,6 +1335,9 @@ struct mlx5_dev_ctx_shared {
 	uint32_t hws_tags:1; /* Check if tags info for HWS initialized. */
 	uint32_t shared_mark_enabled:1;
 	/* If mark action is enabled on Rxqs (shared E-Switch domain). */
+	uint32_t hws_max_log_bulk_sz:5;
+	/* Log of minimal HWS counters created hard coded. */
+	uint32_t hws_max_nb_counters; /* Maximal number for HWS counters. */
 	uint32_t max_port; /* Maximal IB device port index. */
 	struct mlx5_bond_info bond; /* Bonding information. */
 	struct mlx5_common_device *cdev; /* Backend mlx5 device. */
@@ -1342,7 +1378,8 @@ struct mlx5_dev_ctx_shared {
 	struct mlx5_list *dest_array_list;
 	struct mlx5_list *flex_parsers_dv; /* Flex Item parsers. */
 	/* List of destination array actions. */
-	struct mlx5_flow_counter_mng cmng; /* Counters management structure. */
+	struct mlx5_flow_counter_mng sws_cmng;
+	/* SW steering counters management structure. */
 	void *default_miss_action; /* Default miss action. */
 	struct mlx5_indexed_pool *ipool[MLX5_IPOOL_MAX];
 	struct mlx5_indexed_pool *mdh_ipools[MLX5_MAX_MODIFY_NUM];
@@ -1670,6 +1707,9 @@ struct mlx5_priv {
 	LIST_HEAD(flow_hw_at, rte_flow_actions_template) flow_hw_at;
 	struct mlx5dr_context *dr_ctx; /**< HW steering DR context. */
 	/* HW steering queue polling mechanism job descriptor LIFO. */
+	uint32_t hws_strict_queue:1;
+	/**< Whether all operations strictly happen on the same HWS queue. */
+	uint32_t hws_age_req:1; /**< Whether this port has AGE indexed pool. */
 	struct mlx5_hw_q *hw_q;
 	/* HW steering rte flow table list header. */
 	LIST_HEAD(flow_hw_tbl, rte_flow_template_table) flow_hw_tbl;
@@ -1985,6 +2025,9 @@ int mlx5_validate_action_ct(struct rte_eth_dev *dev,
 			    const struct rte_flow_action_conntrack *conntrack,
 			    struct rte_flow_error *error);
 
+int mlx5_flow_get_q_aged_flows(struct rte_eth_dev *dev, uint32_t queue_id,
+			       void **contexts, uint32_t nb_contexts,
+			       struct rte_flow_error *error);
 
 /* mlx5_mp_os.c */
 
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index d064abfef3..2af8c731ef 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -43,6 +43,9 @@
 #define MLX5_PMD_SOFT_COUNTERS 1
 #endif
 
+/* Maximum number of DCS created per port. */
+#define MLX5_HWS_CNT_DCS_NUM 4
+
 /* Alarm timeout. */
 #define MLX5_ALARM_TIMEOUT_US 100000
 
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 9627ffc979..4bfa604578 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -987,6 +987,7 @@ static const struct rte_flow_ops mlx5_flow_ops = {
 	.isolate = mlx5_flow_isolate,
 	.query = mlx5_flow_query,
 	.dev_dump = mlx5_flow_dev_dump,
+	.get_q_aged_flows = mlx5_flow_get_q_aged_flows,
 	.get_aged_flows = mlx5_flow_get_aged_flows,
 	.action_handle_create = mlx5_action_handle_create,
 	.action_handle_destroy = mlx5_action_handle_destroy,
@@ -8942,11 +8943,11 @@ mlx5_flow_create_counter_stat_mem_mng(struct mlx5_dev_ctx_shared *sh)
 		mem_mng->raws[i].data = raw_data + i * MLX5_COUNTERS_PER_POOL;
 	}
 	for (i = 0; i < MLX5_MAX_PENDING_QUERIES; ++i)
-		LIST_INSERT_HEAD(&sh->cmng.free_stat_raws,
+		LIST_INSERT_HEAD(&sh->sws_cmng.free_stat_raws,
 				 mem_mng->raws + MLX5_CNT_CONTAINER_RESIZE + i,
 				 next);
-	LIST_INSERT_HEAD(&sh->cmng.mem_mngs, mem_mng, next);
-	sh->cmng.mem_mng = mem_mng;
+	LIST_INSERT_HEAD(&sh->sws_cmng.mem_mngs, mem_mng, next);
+	sh->sws_cmng.mem_mng = mem_mng;
 	return 0;
 }
 
@@ -8965,7 +8966,7 @@ static int
 mlx5_flow_set_counter_stat_mem(struct mlx5_dev_ctx_shared *sh,
 			       struct mlx5_flow_counter_pool *pool)
 {
-	struct mlx5_flow_counter_mng *cmng = &sh->cmng;
+	struct mlx5_flow_counter_mng *cmng = &sh->sws_cmng;
 	/* Resize statistic memory once used out. */
 	if (!(pool->index % MLX5_CNT_CONTAINER_RESIZE) &&
 	    mlx5_flow_create_counter_stat_mem_mng(sh)) {
@@ -8994,14 +8995,14 @@ mlx5_set_query_alarm(struct mlx5_dev_ctx_shared *sh)
 {
 	uint32_t pools_n, us;
 
-	pools_n = __atomic_load_n(&sh->cmng.n_valid, __ATOMIC_RELAXED);
+	pools_n = __atomic_load_n(&sh->sws_cmng.n_valid, __ATOMIC_RELAXED);
 	us = MLX5_POOL_QUERY_FREQ_US / pools_n;
 	DRV_LOG(DEBUG, "Set alarm for %u pools each %u us", pools_n, us);
 	if (rte_eal_alarm_set(us, mlx5_flow_query_alarm, sh)) {
-		sh->cmng.query_thread_on = 0;
+		sh->sws_cmng.query_thread_on = 0;
 		DRV_LOG(ERR, "Cannot reinitialize query alarm");
 	} else {
-		sh->cmng.query_thread_on = 1;
+		sh->sws_cmng.query_thread_on = 1;
 	}
 }
 
@@ -9017,12 +9018,12 @@ mlx5_flow_query_alarm(void *arg)
 {
 	struct mlx5_dev_ctx_shared *sh = arg;
 	int ret;
-	uint16_t pool_index = sh->cmng.pool_index;
-	struct mlx5_flow_counter_mng *cmng = &sh->cmng;
+	uint16_t pool_index = sh->sws_cmng.pool_index;
+	struct mlx5_flow_counter_mng *cmng = &sh->sws_cmng;
 	struct mlx5_flow_counter_pool *pool;
 	uint16_t n_valid;
 
-	if (sh->cmng.pending_queries >= MLX5_MAX_PENDING_QUERIES)
+	if (sh->sws_cmng.pending_queries >= MLX5_MAX_PENDING_QUERIES)
 		goto set_alarm;
 	rte_spinlock_lock(&cmng->pool_update_sl);
 	pool = cmng->pools[pool_index];
@@ -9035,7 +9036,7 @@ mlx5_flow_query_alarm(void *arg)
 		/* There is a pool query in progress. */
 		goto set_alarm;
 	pool->raw_hw =
-		LIST_FIRST(&sh->cmng.free_stat_raws);
+		LIST_FIRST(&sh->sws_cmng.free_stat_raws);
 	if (!pool->raw_hw)
 		/* No free counter statistics raw memory. */
 		goto set_alarm;
@@ -9061,12 +9062,12 @@ mlx5_flow_query_alarm(void *arg)
 		goto set_alarm;
 	}
 	LIST_REMOVE(pool->raw_hw, next);
-	sh->cmng.pending_queries++;
+	sh->sws_cmng.pending_queries++;
 	pool_index++;
 	if (pool_index >= n_valid)
 		pool_index = 0;
 set_alarm:
-	sh->cmng.pool_index = pool_index;
+	sh->sws_cmng.pool_index = pool_index;
 	mlx5_set_query_alarm(sh);
 }
 
@@ -9149,7 +9150,7 @@ mlx5_flow_async_pool_query_handle(struct mlx5_dev_ctx_shared *sh,
 		(struct mlx5_flow_counter_pool *)(uintptr_t)async_id;
 	struct mlx5_counter_stats_raw *raw_to_free;
 	uint8_t query_gen = pool->query_gen ^ 1;
-	struct mlx5_flow_counter_mng *cmng = &sh->cmng;
+	struct mlx5_flow_counter_mng *cmng = &sh->sws_cmng;
 	enum mlx5_counter_type cnt_type =
 		pool->is_aged ? MLX5_COUNTER_TYPE_AGE :
 				MLX5_COUNTER_TYPE_ORIGIN;
@@ -9172,9 +9173,9 @@ mlx5_flow_async_pool_query_handle(struct mlx5_dev_ctx_shared *sh,
 			rte_spinlock_unlock(&cmng->csl[cnt_type]);
 		}
 	}
-	LIST_INSERT_HEAD(&sh->cmng.free_stat_raws, raw_to_free, next);
+	LIST_INSERT_HEAD(&sh->sws_cmng.free_stat_raws, raw_to_free, next);
 	pool->raw_hw = NULL;
-	sh->cmng.pending_queries--;
+	sh->sws_cmng.pending_queries--;
 }
 
 static int
@@ -9534,7 +9535,7 @@ mlx5_flow_dev_dump_sh_all(struct rte_eth_dev *dev,
 	struct mlx5_list_inconst *l_inconst;
 	struct mlx5_list_entry *e;
 	int lcore_index;
-	struct mlx5_flow_counter_mng *cmng = &priv->sh->cmng;
+	struct mlx5_flow_counter_mng *cmng = &priv->sh->sws_cmng;
 	uint32_t max;
 	void *action;
 
@@ -9705,18 +9706,58 @@ mlx5_flow_get_aged_flows(struct rte_eth_dev *dev, void **contexts,
 {
 	const struct mlx5_flow_driver_ops *fops;
 	struct rte_flow_attr attr = { .transfer = 0 };
+	enum mlx5_flow_drv_type type = flow_get_drv_type(dev, &attr);
 
-	if (flow_get_drv_type(dev, &attr) == MLX5_FLOW_TYPE_DV) {
-		fops = flow_get_drv_ops(MLX5_FLOW_TYPE_DV);
-		return fops->get_aged_flows(dev, contexts, nb_contexts,
-						    error);
+	if (type == MLX5_FLOW_TYPE_DV || type == MLX5_FLOW_TYPE_HW) {
+		fops = flow_get_drv_ops(type);
+		return fops->get_aged_flows(dev, contexts, nb_contexts, error);
 	}
-	DRV_LOG(ERR,
-		"port %u get aged flows is not supported.",
-		 dev->data->port_id);
+	DRV_LOG(ERR, "port %u get aged flows is not supported.",
+		dev->data->port_id);
 	return -ENOTSUP;
 }
 
+/**
+ * Get aged-out flows per HWS queue.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] queue_id
+ *   Flow queue to query.
+ * @param[in] contexts
+ *   The address of an array of pointers to the aged-out flow contexts.
+ * @param[in] nb_contexts
+ *   The number of entries in the context array.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. Initialized in case of
+ *   error only.
+ *
+ * @return
+ *   The number of aged-out flow contexts returned on success, a negative
+ *   errno value otherwise. If nb_contexts is 0, return the total number
+ *   of aged-out contexts. Otherwise, return the number of aged flows
+ *   reported in the context array.
+ */
+int
+mlx5_flow_get_q_aged_flows(struct rte_eth_dev *dev, uint32_t queue_id,
+			   void **contexts, uint32_t nb_contexts,
+			   struct rte_flow_error *error)
+{
+	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = { 0 };
+
+	if (flow_get_drv_type(dev, &attr) == MLX5_FLOW_TYPE_HW) {
+		fops = flow_get_drv_ops(MLX5_FLOW_TYPE_HW);
+		return fops->get_q_aged_flows(dev, queue_id, contexts,
+					      nb_contexts, error);
+	}
+	DRV_LOG(ERR, "port %u queue %u get aged flows is not supported.",
+		dev->data->port_id, queue_id);
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				  "get Q aged flows with incorrect steering mode");
+}
+
 /* Wrapper for driver action_validate op callback */
 static int
 flow_drv_action_validate(struct rte_eth_dev *dev,
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index ffa4f28255..30a18ea35e 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -293,6 +293,8 @@ enum mlx5_feature_name {
 #define MLX5_FLOW_ACTION_MODIFY_FIELD (1ull << 39)
 #define MLX5_FLOW_ACTION_METER_WITH_TERMINATED_POLICY (1ull << 40)
 #define MLX5_FLOW_ACTION_CT (1ull << 41)
+#define MLX5_FLOW_ACTION_INDIRECT_COUNT (1ull << 42)
+#define MLX5_FLOW_ACTION_INDIRECT_AGE (1ull << 43)
 
 #define MLX5_FLOW_FATE_ACTIONS \
 	(MLX5_FLOW_ACTION_DROP | MLX5_FLOW_ACTION_QUEUE | \
@@ -1099,6 +1101,22 @@ struct rte_flow {
 	uint32_t geneve_tlv_option; /**< Holds Geneve TLV option id. > */
 } __rte_packed;
 
+/*
+ * HWS COUNTER ID's layout
+ *       3                   2                   1                   0
+ *     1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |  T  |     | D |                                               |
+ *    ~  Y  |     | C |                    IDX                        ~
+ *    |  P  |     | S |                                               |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *
+ *    Bit 31:29 = TYPE = MLX5_INDIRECT_ACTION_TYPE_COUNT = b'10
+ *    Bit 25:24 = DCS index
+ *    Bit 23:00 = IDX in this counter belonged DCS bulk.
+ */
+typedef uint32_t cnt_id_t;
+
 #if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
 
 /* HWS flow struct. */
@@ -1112,7 +1130,8 @@ struct rte_flow_hw {
 	};
 	struct rte_flow_template_table *table; /* The table flow allcated from. */
 	struct mlx5dr_rule rule; /* HWS layer data struct. */
-	uint32_t cnt_id;
+	uint32_t age_idx;
+	cnt_id_t cnt_id;
 	uint32_t mtr_id;
 } __rte_packed;
 
@@ -1158,7 +1177,7 @@ struct mlx5_action_construct_data {
 			uint32_t idx; /* Shared action index. */
 		} shared_rss;
 		struct {
-			uint32_t id;
+			cnt_id_t id;
 		} shared_counter;
 		struct {
 			uint32_t id;
@@ -1189,6 +1208,7 @@ struct rte_flow_actions_template {
 	struct rte_flow_action *actions; /* Cached flow actions. */
 	struct rte_flow_action *masks; /* Cached action masks.*/
 	struct mlx5dr_action_template *tmpl; /* mlx5dr action template. */
+	uint64_t action_flags; /* Bit-map of all valid actions in the template. */
 	uint16_t dr_actions_num; /* Amount of DR rules actions. */
 	uint16_t actions_num; /* Amount of flow actions */
 	uint16_t *actions_off; /* DR action offset for given rte action offset. */
@@ -1245,7 +1265,7 @@ struct mlx5_hw_actions {
 	struct mlx5_hw_encap_decap_action *encap_decap;
 	uint16_t encap_decap_pos; /* Encap/Decap action position. */
 	uint32_t mark:1; /* Indicate the mark action. */
-	uint32_t cnt_id; /* Counter id. */
+	cnt_id_t cnt_id; /* Counter id. */
 	uint32_t mtr_id; /* Meter id. */
 	/* Translated DR action array from action template. */
 	struct mlx5dr_rule_action rule_acts[MLX5_HW_MAX_ACTS];
@@ -1619,6 +1639,12 @@ typedef int (*mlx5_flow_get_aged_flows_t)
 					 void **context,
 					 uint32_t nb_contexts,
 					 struct rte_flow_error *error);
+typedef int (*mlx5_flow_get_q_aged_flows_t)
+					(struct rte_eth_dev *dev,
+					 uint32_t queue_id,
+					 void **context,
+					 uint32_t nb_contexts,
+					 struct rte_flow_error *error);
 typedef int (*mlx5_flow_action_validate_t)
 				(struct rte_eth_dev *dev,
 				 const struct rte_flow_indir_action_conf *conf,
@@ -1825,6 +1851,7 @@ struct mlx5_flow_driver_ops {
 	mlx5_flow_counter_free_t counter_free;
 	mlx5_flow_counter_query_t counter_query;
 	mlx5_flow_get_aged_flows_t get_aged_flows;
+	mlx5_flow_get_q_aged_flows_t get_q_aged_flows;
 	mlx5_flow_action_validate_t action_validate;
 	mlx5_flow_action_create_t action_create;
 	mlx5_flow_action_destroy_t action_destroy;
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index a50a600024..58a7e94ee0 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -5524,7 +5524,7 @@ flow_dv_validate_action_age(uint64_t action_flags,
 	const struct rte_flow_action_age *age = action->conf;
 
 	if (!priv->sh->cdev->config.devx ||
-	    (priv->sh->cmng.counter_fallback && !priv->sh->aso_age_mng))
+	    (priv->sh->sws_cmng.counter_fallback && !priv->sh->aso_age_mng))
 		return rte_flow_error_set(error, ENOTSUP,
 					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 					  NULL,
@@ -6085,7 +6085,7 @@ flow_dv_counter_get_by_idx(struct rte_eth_dev *dev,
 			   struct mlx5_flow_counter_pool **ppool)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_flow_counter_mng *cmng = &priv->sh->cmng;
+	struct mlx5_flow_counter_mng *cmng = &priv->sh->sws_cmng;
 	struct mlx5_flow_counter_pool *pool;
 
 	/* Decrease to original index and clear shared bit. */
@@ -6179,7 +6179,7 @@ static int
 flow_dv_container_resize(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_flow_counter_mng *cmng = &priv->sh->cmng;
+	struct mlx5_flow_counter_mng *cmng = &priv->sh->sws_cmng;
 	void *old_pools = cmng->pools;
 	uint32_t resize = cmng->n + MLX5_CNT_CONTAINER_RESIZE;
 	uint32_t mem_size = sizeof(struct mlx5_flow_counter_pool *) * resize;
@@ -6225,7 +6225,7 @@ _flow_dv_query_count(struct rte_eth_dev *dev, uint32_t counter, uint64_t *pkts,
 
 	cnt = flow_dv_counter_get_by_idx(dev, counter, &pool);
 	MLX5_ASSERT(pool);
-	if (priv->sh->cmng.counter_fallback)
+	if (priv->sh->sws_cmng.counter_fallback)
 		return mlx5_devx_cmd_flow_counter_query(cnt->dcs_when_active, 0,
 					0, pkts, bytes, 0, NULL, NULL, 0);
 	rte_spinlock_lock(&pool->sl);
@@ -6262,8 +6262,8 @@ flow_dv_pool_create(struct rte_eth_dev *dev, struct mlx5_devx_obj *dcs,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_flow_counter_pool *pool;
-	struct mlx5_flow_counter_mng *cmng = &priv->sh->cmng;
-	bool fallback = priv->sh->cmng.counter_fallback;
+	struct mlx5_flow_counter_mng *cmng = &priv->sh->sws_cmng;
+	bool fallback = priv->sh->sws_cmng.counter_fallback;
 	uint32_t size = sizeof(*pool);
 
 	size += MLX5_COUNTERS_PER_POOL * MLX5_CNT_SIZE;
@@ -6324,14 +6324,14 @@ flow_dv_counter_pool_prepare(struct rte_eth_dev *dev,
 			     uint32_t age)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_flow_counter_mng *cmng = &priv->sh->cmng;
+	struct mlx5_flow_counter_mng *cmng = &priv->sh->sws_cmng;
 	struct mlx5_flow_counter_pool *pool;
 	struct mlx5_counters tmp_tq;
 	struct mlx5_devx_obj *dcs = NULL;
 	struct mlx5_flow_counter *cnt;
 	enum mlx5_counter_type cnt_type =
 			age ? MLX5_COUNTER_TYPE_AGE : MLX5_COUNTER_TYPE_ORIGIN;
-	bool fallback = priv->sh->cmng.counter_fallback;
+	bool fallback = priv->sh->sws_cmng.counter_fallback;
 	uint32_t i;
 
 	if (fallback) {
@@ -6395,8 +6395,8 @@ flow_dv_counter_alloc(struct rte_eth_dev *dev, uint32_t age)
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_flow_counter_pool *pool = NULL;
 	struct mlx5_flow_counter *cnt_free = NULL;
-	bool fallback = priv->sh->cmng.counter_fallback;
-	struct mlx5_flow_counter_mng *cmng = &priv->sh->cmng;
+	bool fallback = priv->sh->sws_cmng.counter_fallback;
+	struct mlx5_flow_counter_mng *cmng = &priv->sh->sws_cmng;
 	enum mlx5_counter_type cnt_type =
 			age ? MLX5_COUNTER_TYPE_AGE : MLX5_COUNTER_TYPE_ORIGIN;
 	uint32_t cnt_idx;
@@ -6442,7 +6442,7 @@ flow_dv_counter_alloc(struct rte_eth_dev *dev, uint32_t age)
 	if (_flow_dv_query_count(dev, cnt_idx, &cnt_free->hits,
 				 &cnt_free->bytes))
 		goto err;
-	if (!fallback && !priv->sh->cmng.query_thread_on)
+	if (!fallback && !priv->sh->sws_cmng.query_thread_on)
 		/* Start the asynchronous batch query by the host thread. */
 		mlx5_set_query_alarm(priv->sh);
 	/*
@@ -6570,7 +6570,7 @@ flow_dv_counter_free(struct rte_eth_dev *dev, uint32_t counter)
 	 * this case, lock will not be needed as query callback and release
 	 * function both operate with the different list.
 	 */
-	if (!priv->sh->cmng.counter_fallback) {
+	if (!priv->sh->sws_cmng.counter_fallback) {
 		rte_spinlock_lock(&pool->csl);
 		TAILQ_INSERT_TAIL(&pool->counters[pool->query_gen], cnt, next);
 		rte_spinlock_unlock(&pool->csl);
@@ -6578,10 +6578,10 @@ flow_dv_counter_free(struct rte_eth_dev *dev, uint32_t counter)
 		cnt->dcs_when_free = cnt->dcs_when_active;
 		cnt_type = pool->is_aged ? MLX5_COUNTER_TYPE_AGE :
 					   MLX5_COUNTER_TYPE_ORIGIN;
-		rte_spinlock_lock(&priv->sh->cmng.csl[cnt_type]);
-		TAILQ_INSERT_TAIL(&priv->sh->cmng.counters[cnt_type],
+		rte_spinlock_lock(&priv->sh->sws_cmng.csl[cnt_type]);
+		TAILQ_INSERT_TAIL(&priv->sh->sws_cmng.counters[cnt_type],
 				  cnt, next);
-		rte_spinlock_unlock(&priv->sh->cmng.csl[cnt_type]);
+		rte_spinlock_unlock(&priv->sh->sws_cmng.csl[cnt_type]);
 	}
 }
 
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index d4ce2f185a..5c0981d385 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -460,7 +460,8 @@ __flow_hw_act_data_general_append(struct mlx5_priv *priv,
 				  enum rte_flow_action_type type,
 				  uint16_t action_src,
 				  uint16_t action_dst)
-{	struct mlx5_action_construct_data *act_data;
+{
+	struct mlx5_action_construct_data *act_data;
 
 	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
 	if (!act_data)
@@ -495,7 +496,8 @@ __flow_hw_act_data_encap_append(struct mlx5_priv *priv,
 				uint16_t action_src,
 				uint16_t action_dst,
 				uint16_t len)
-{	struct mlx5_action_construct_data *act_data;
+{
+	struct mlx5_action_construct_data *act_data;
 
 	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
 	if (!act_data)
@@ -565,7 +567,8 @@ __flow_hw_act_data_shared_rss_append(struct mlx5_priv *priv,
 				     uint16_t action_dst,
 				     uint32_t idx,
 				     struct mlx5_shared_action_rss *rss)
-{	struct mlx5_action_construct_data *act_data;
+{
+	struct mlx5_action_construct_data *act_data;
 
 	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
 	if (!act_data)
@@ -604,7 +607,8 @@ __flow_hw_act_data_shared_cnt_append(struct mlx5_priv *priv,
 				     uint16_t action_src,
 				     uint16_t action_dst,
 				     cnt_id_t cnt_id)
-{	struct mlx5_action_construct_data *act_data;
+{
+	struct mlx5_action_construct_data *act_data;
 
 	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
 	if (!act_data)
@@ -700,6 +704,10 @@ flow_hw_shared_action_translate(struct rte_eth_dev *dev,
 			action_src, action_dst, act_idx))
 			return -1;
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_AGE:
+		/* Not supported, prevent by validate function. */
+		MLX5_ASSERT(0);
+		break;
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
 		if (flow_hw_ct_compile(dev, MLX5_HW_INV_QUEUE,
 				       idx, &acts->rule_acts[action_dst]))
@@ -1092,7 +1100,7 @@ flow_hw_cnt_compile(struct rte_eth_dev *dev, uint32_t  start_pos,
 	cnt_id_t cnt_id;
 	int ret;
 
-	ret = mlx5_hws_cnt_shared_get(priv->hws_cpool, &cnt_id);
+	ret = mlx5_hws_cnt_shared_get(priv->hws_cpool, &cnt_id, 0);
 	if (ret != 0)
 		return ret;
 	ret = mlx5_hws_cnt_pool_get_action_offset
@@ -1233,8 +1241,6 @@ flow_hw_meter_mark_compile(struct rte_eth_dev *dev,
  *   Pointer to the rte_eth_dev structure.
  * @param[in] cfg
  *   Pointer to the table configuration.
- * @param[in] item_templates
- *   Item template array to be binded to the table.
  * @param[in/out] acts
  *   Pointer to the template HW steering DR actions.
  * @param[in] at
@@ -1243,7 +1249,7 @@ flow_hw_meter_mark_compile(struct rte_eth_dev *dev,
  *   Pointer to error structure.
  *
  * @return
- *    Table on success, NULL otherwise and rte_errno is set.
+ *   0 on success, a negative errno otherwise and rte_errno is set.
  */
 static int
 __flow_hw_actions_translate(struct rte_eth_dev *dev,
@@ -1272,6 +1278,7 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
 	uint16_t jump_pos;
 	uint32_t ct_idx;
 	int err;
+	uint32_t target_grp = 0;
 
 	flow_hw_modify_field_init(&mhdr, at);
 	if (attr->transfer)
@@ -1499,8 +1506,42 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
 							action_pos))
 				goto err;
 			break;
+		case RTE_FLOW_ACTION_TYPE_AGE:
+			flow_hw_translate_group(dev, cfg, attr->group,
+						&target_grp, error);
+			if (target_grp == 0) {
+				__flow_hw_action_template_destroy(dev, acts);
+				return rte_flow_error_set(error, ENOTSUP,
+						RTE_FLOW_ERROR_TYPE_ACTION,
+						NULL,
+						"Age action on root table is not supported in HW steering mode");
+			}
+			action_pos = at->actions_off[actions - at->actions];
+			if (__flow_hw_act_data_general_append(priv, acts,
+							 actions->type,
+							 actions - action_start,
+							 action_pos))
+				goto err;
+			break;
 		case RTE_FLOW_ACTION_TYPE_COUNT:
-			action_pos = at->actions_off[actions - action_start];
+			flow_hw_translate_group(dev, cfg, attr->group,
+						&target_grp, error);
+			if (target_grp == 0) {
+				__flow_hw_action_template_destroy(dev, acts);
+				return rte_flow_error_set(error, ENOTSUP,
+						RTE_FLOW_ERROR_TYPE_ACTION,
+						NULL,
+						"Counter action on root table is not supported in HW steering mode");
+			}
+			if ((at->action_flags & MLX5_FLOW_ACTION_AGE) ||
+			    (at->action_flags & MLX5_FLOW_ACTION_INDIRECT_AGE))
+				/*
+				 * When both COUNT and AGE are requested, it is
+				 * saved as an AGE action, which also creates
+				 * the counter.
+				 */
+				break;
+			action_pos = at->actions_off[actions - at->actions];
 			if (masks->conf &&
 			    ((const struct rte_flow_action_count *)
 			     masks->conf)->id) {
@@ -1727,6 +1768,10 @@ flow_hw_shared_action_get(struct rte_eth_dev *dev,
  *   Pointer to the flow table.
  * @param[in] it_idx
  *   Item template index the action template refer to.
+ * @param[in] action_flags
+ *   Actions bit-map detected in this template.
+ * @param[in, out] flow
+ *   Pointer to the flow containing the counter.
  * @param[in] rule_act
  *   Pointer to the shared action's destination rule DR action.
  *
@@ -1737,7 +1782,8 @@ static __rte_always_inline int
 flow_hw_shared_action_construct(struct rte_eth_dev *dev, uint32_t queue,
 				const struct rte_flow_action *action,
 				struct rte_flow_template_table *table,
-				const uint8_t it_idx,
+				const uint8_t it_idx, uint64_t action_flags,
+				struct rte_flow_hw *flow,
 				struct mlx5dr_rule_action *rule_act)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -1745,11 +1791,14 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev, uint32_t queue,
 	struct mlx5_action_construct_data act_data;
 	struct mlx5_shared_action_rss *shared_rss;
 	struct mlx5_aso_mtr *aso_mtr;
+	struct mlx5_age_info *age_info;
+	struct mlx5_hws_age_param *param;
 	uint32_t act_idx = (uint32_t)(uintptr_t)action->conf;
 	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
 	uint32_t idx = act_idx &
 		       ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
 	uint64_t item_flags;
+	cnt_id_t age_cnt;
 
 	memset(&act_data, 0, sizeof(act_data));
 	switch (type) {
@@ -1775,6 +1824,44 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev, uint32_t queue,
 				&rule_act->action,
 				&rule_act->counter.offset))
 			return -1;
+		flow->cnt_id = act_idx;
+		break;
+	case MLX5_INDIRECT_ACTION_TYPE_AGE:
+		/*
+		 * Save the index with the indirect type, to recognize
+		 * it in flow destroy.
+		 */
+		flow->age_idx = act_idx;
+		if (action_flags & MLX5_FLOW_ACTION_INDIRECT_COUNT)
+			/*
+			 * The mutual update for indirect AGE & COUNT will be
+			 * performed later after we have ID for both of them.
+			 */
+			break;
+		age_info = GET_PORT_AGE_INFO(priv);
+		param = mlx5_ipool_get(age_info->ages_ipool, idx);
+		if (param == NULL)
+			return -1;
+		if (action_flags & MLX5_FLOW_ACTION_COUNT) {
+			if (mlx5_hws_cnt_pool_get(priv->hws_cpool,
+						  &param->queue_id, &age_cnt,
+						  idx) < 0)
+				return -1;
+			flow->cnt_id = age_cnt;
+			param->nb_cnts++;
+		} else {
+			/*
+			 * Get the counter of this indirect AGE or create one
+			 * if one doesn't exist.
+			 */
+			age_cnt = mlx5_hws_age_cnt_get(priv, param, idx);
+			if (age_cnt == 0)
+				return -1;
+		}
+		if (mlx5_hws_cnt_pool_get_action_offset(priv->hws_cpool,
+						     age_cnt, &rule_act->action,
+						     &rule_act->counter.offset))
+			return -1;
 		break;
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
 		if (flow_hw_ct_compile(dev, queue, idx, rule_act))
@@ -1935,7 +2022,8 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			  const uint8_t it_idx,
 			  const struct rte_flow_action actions[],
 			  struct mlx5dr_rule_action *rule_acts,
-			  uint32_t queue)
+			  uint32_t queue,
+			  struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
@@ -1948,6 +2036,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	const struct rte_flow_item *enc_item = NULL;
 	const struct rte_flow_action_ethdev *port_action = NULL;
 	const struct rte_flow_action_meter *meter = NULL;
+	const struct rte_flow_action_age *age = NULL;
 	uint8_t *buf = job->encap_data;
 	struct rte_flow_attr attr = {
 			.ingress = 1,
@@ -1955,6 +2044,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	uint32_t ft_flag;
 	size_t encap_len = 0;
 	int ret;
+	uint32_t age_idx = 0;
 	struct mlx5_aso_mtr *aso_mtr;
 
 	rte_memcpy(rule_acts, hw_acts->rule_acts, sizeof(*rule_acts) * at->dr_actions_num);
@@ -2007,6 +2097,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_INDIRECT:
 			if (flow_hw_shared_action_construct
 					(dev, queue, action, table, it_idx,
+					 at->action_flags, job->flow,
 					 &rule_acts[act_data->action_dst]))
 				return -1;
 			break;
@@ -2115,9 +2206,32 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			if (mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr))
 				return -1;
 			break;
+		case RTE_FLOW_ACTION_TYPE_AGE:
+			age = action->conf;
+			/*
+			 * First, create the AGE parameter, then create its
+			 * counter later:
+			 * Regular counter - created in the next case below.
+			 * Indirect counter - updated after the loop.
+			 */
+			age_idx = mlx5_hws_age_action_create(priv, queue, 0,
+							     age,
+							     job->flow->idx,
+							     error);
+			if (age_idx == 0)
+				return -rte_errno;
+			job->flow->age_idx = age_idx;
+			if (at->action_flags & MLX5_FLOW_ACTION_INDIRECT_COUNT)
+				/*
+				 * When AGE uses an indirect counter, there is
+				 * no need to create one here, but it must be
+				 * updated with the AGE parameter after the loop.
+				 */
+				break;
+			/* Fall-through. */
 		case RTE_FLOW_ACTION_TYPE_COUNT:
 			ret = mlx5_hws_cnt_pool_get(priv->hws_cpool, &queue,
-					&cnt_id);
+						    &cnt_id, age_idx);
 			if (ret != 0)
 				return ret;
 			ret = mlx5_hws_cnt_pool_get_action_offset
@@ -2174,6 +2288,25 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			break;
 		}
 	}
+	if (at->action_flags & MLX5_FLOW_ACTION_INDIRECT_COUNT) {
+		if (at->action_flags & MLX5_FLOW_ACTION_INDIRECT_AGE) {
+			age_idx = job->flow->age_idx & MLX5_HWS_AGE_IDX_MASK;
+			if (mlx5_hws_cnt_age_get(priv->hws_cpool,
+						 job->flow->cnt_id) != age_idx)
+				/*
+				 * This is the first use of this indirect
+				 * counter for this indirect AGE; increase
+				 * the number of counters.
+				 */
+				mlx5_hws_age_nb_cnt_increase(priv, age_idx);
+		}
+		/*
+	 * Update this indirect counter with the indirect/direct AGE
+	 * parameter that is using it.
+		 */
+		mlx5_hws_cnt_age_set(priv->hws_cpool, job->flow->cnt_id,
+				     age_idx);
+	}
 	if (hw_acts->encap_decap && !hw_acts->encap_decap->shared) {
 		rule_acts[hw_acts->encap_decap_pos].reformat.offset =
 				job->flow->idx - 1;
@@ -2323,8 +2456,10 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	 * No need to copy and contrust a new "actions" list based on the
 	 * user's input, in order to save the cost.
 	 */
-	if (flow_hw_actions_construct(dev, job, &table->ats[action_template_index],
-				      pattern_template_index, actions, rule_acts, queue)) {
+	if (flow_hw_actions_construct(dev, job,
+				      &table->ats[action_template_index],
+				      pattern_template_index, actions,
+				      rule_acts, queue, error)) {
 		rte_errno = EINVAL;
 		goto free;
 	}
@@ -2409,6 +2544,49 @@ flow_hw_async_flow_destroy(struct rte_eth_dev *dev,
 			"fail to create rte flow");
 }
 
+/**
+ * Release the AGE and counter for given flow.
+ *
+ * @param[in] priv
+ *   Pointer to the port private data structure.
+ * @param[in] queue
+ *   The queue to release the counter.
+ * @param[in, out] flow
+ *   Pointer to the flow containing the counter.
+ * @param[out] error
+ *   Pointer to error structure.
+ */
+static void
+flow_hw_age_count_release(struct mlx5_priv *priv, uint32_t queue,
+			  struct rte_flow_hw *flow,
+			  struct rte_flow_error *error)
+{
+	if (mlx5_hws_cnt_is_shared(priv->hws_cpool, flow->cnt_id)) {
+		if (flow->age_idx && !mlx5_hws_age_is_indirect(flow->age_idx)) {
+			/* Remove this AGE parameter from indirect counter. */
+			mlx5_hws_cnt_age_set(priv->hws_cpool, flow->cnt_id, 0);
+			/* Release the AGE parameter. */
+			mlx5_hws_age_action_destroy(priv, flow->age_idx, error);
+			flow->age_idx = 0;
+		}
+		return;
+	}
+	/* Put the counter first to reduce the race risk in BG thread. */
+	mlx5_hws_cnt_pool_put(priv->hws_cpool, &queue, &flow->cnt_id);
+	flow->cnt_id = 0;
+	if (flow->age_idx) {
+		if (mlx5_hws_age_is_indirect(flow->age_idx)) {
+			uint32_t idx = flow->age_idx & MLX5_HWS_AGE_IDX_MASK;
+
+			mlx5_hws_age_nb_cnt_decrease(priv, idx);
+		} else {
+			/* Release the AGE parameter. */
+			mlx5_hws_age_action_destroy(priv, flow->age_idx, error);
+		}
+		flow->age_idx = 0;
+	}
+}
+
 /**
  * Pull the enqueued flows.
  *
@@ -2455,13 +2633,9 @@ flow_hw_pull(struct rte_eth_dev *dev,
 				flow_hw_jump_release(dev, job->flow->jump);
 			else if (job->flow->fate_type == MLX5_FLOW_FATE_QUEUE)
 				mlx5_hrxq_obj_release(dev, job->flow->hrxq);
-			if (mlx5_hws_cnt_id_valid(job->flow->cnt_id) &&
-			    mlx5_hws_cnt_is_shared
-				(priv->hws_cpool, job->flow->cnt_id) == false) {
-				mlx5_hws_cnt_pool_put(priv->hws_cpool, &queue,
-						&job->flow->cnt_id);
-				job->flow->cnt_id = 0;
-			}
+			if (mlx5_hws_cnt_id_valid(job->flow->cnt_id))
+				flow_hw_age_count_release(priv, queue,
+							  job->flow, error);
 			if (job->flow->mtr_id) {
 				mlx5_ipool_free(pool->idx_pool,	job->flow->mtr_id);
 				job->flow->mtr_id = 0;
@@ -3093,100 +3267,315 @@ flow_hw_validate_action_represented_port(struct rte_eth_dev *dev,
 	return 0;
 }
 
-static inline int
-flow_hw_action_meta_copy_insert(const struct rte_flow_action actions[],
-				const struct rte_flow_action masks[],
-				const struct rte_flow_action *ins_actions,
-				const struct rte_flow_action *ins_masks,
-				struct rte_flow_action *new_actions,
-				struct rte_flow_action *new_masks,
-				uint16_t *ins_pos)
+/**
+ * Validate AGE action.
+ *
+ * @param[in] dev
+ *   Pointer to rte_eth_dev structure.
+ * @param[in] action
+ *   Pointer to the indirect action.
+ * @param[in] action_flags
+ *   Holds the actions detected until now.
+ * @param[in] fixed_cnt
+ *   Indicator if this list has a fixed COUNT action.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_hw_validate_action_age(struct rte_eth_dev *dev,
+			    const struct rte_flow_action *action,
+			    uint64_t action_flags, bool fixed_cnt,
+			    struct rte_flow_error *error)
 {
-	uint16_t idx, total = 0;
-	uint16_t end_idx = UINT16_MAX;
-	bool act_end = false;
-	bool modify_field = false;
-	bool rss_or_queue = false;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
 
-	MLX5_ASSERT(actions && masks);
-	MLX5_ASSERT(new_actions && new_masks);
-	MLX5_ASSERT(ins_actions && ins_masks);
-	for (idx = 0; !act_end; idx++) {
-		switch (actions[idx].type) {
-		case RTE_FLOW_ACTION_TYPE_RSS:
-		case RTE_FLOW_ACTION_TYPE_QUEUE:
-			/* It is assumed that application provided only single RSS/QUEUE action. */
-			MLX5_ASSERT(!rss_or_queue);
-			rss_or_queue = true;
-			break;
-		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
-			modify_field = true;
-			break;
-		case RTE_FLOW_ACTION_TYPE_END:
-			end_idx = idx;
-			act_end = true;
-			break;
-		default:
-			break;
+	if (!priv->sh->cdev->config.devx)
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "AGE action not supported");
+	if (age_info->ages_ipool == NULL)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					  "aging pool not initialized");
+	if ((action_flags & MLX5_FLOW_ACTION_AGE) ||
+	    (action_flags & MLX5_FLOW_ACTION_INDIRECT_AGE))
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "duplicate AGE actions set");
+	if (fixed_cnt)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "AGE and fixed COUNT combination is not supported");
+	return 0;
+}
+
+/**
+ * Validate count action.
+ *
+ * @param[in] dev
+ *   Pointer to rte_eth_dev structure.
+ * @param[in] action
+ *   Pointer to the indirect action.
+ * @param[in] mask
+ *   Pointer to the indirect action mask.
+ * @param[in] action_flags
+ *   Holds the actions detected until now.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_hw_validate_action_count(struct rte_eth_dev *dev,
+			      const struct rte_flow_action *action,
+			      const struct rte_flow_action *mask,
+			      uint64_t action_flags,
+			      struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_action_count *count = mask->conf;
+
+	if (!priv->sh->cdev->config.devx)
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "count action not supported");
+	if (!priv->hws_cpool)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "counters pool not initialized");
+	if ((action_flags & MLX5_FLOW_ACTION_COUNT) ||
+	    (action_flags & MLX5_FLOW_ACTION_INDIRECT_COUNT))
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "duplicate count actions set");
+	if (count && count->id && (action_flags & MLX5_FLOW_ACTION_AGE))
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, mask,
+					  "AGE and COUNT action shared by mask combination is not supported");
+	return 0;
+}
+
+/**
+ * Validate meter_mark action.
+ *
+ * @param[in] dev
+ *   Pointer to rte_eth_dev structure.
+ * @param[in] action
+ *   Pointer to the indirect action.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_hw_validate_action_meter_mark(struct rte_eth_dev *dev,
+			      const struct rte_flow_action *action,
+			      struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	RTE_SET_USED(action);
+
+	if (!priv->sh->cdev->config.devx)
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "meter_mark action not supported");
+	if (!priv->hws_mpool)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "meter_mark pool not initialized");
+	return 0;
+}
+
+/**
+ * Validate indirect action.
+ *
+ * @param[in] dev
+ *   Pointer to rte_eth_dev structure.
+ * @param[in] action
+ *   Pointer to the indirect action.
+ * @param[in] mask
+ *   Pointer to the indirect action mask.
+ * @param[in, out] action_flags
+ *   Holds the actions detected until now.
+ * @param[in, out] fixed_cnt
+ *   Pointer to indicator if this list has a fixed COUNT action.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_hw_validate_action_indirect(struct rte_eth_dev *dev,
+				 const struct rte_flow_action *action,
+				 const struct rte_flow_action *mask,
+				 uint64_t *action_flags, bool *fixed_cnt,
+				 struct rte_flow_error *error)
+{
+	uint32_t type;
+	int ret;
+
+	if (!mask)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  "Unable to determine indirect action type without a mask specified");
+	type = mask->type;
+	switch (type) {
+	case RTE_FLOW_ACTION_TYPE_METER_MARK:
+		ret = flow_hw_validate_action_meter_mark(dev, mask, error);
+		if (ret < 0)
+			return ret;
+		*action_flags |= MLX5_FLOW_ACTION_METER;
+		break;
+	case RTE_FLOW_ACTION_TYPE_RSS:
+		/* TODO: Validation logic (same as flow_hw_actions_validate) */
+		*action_flags |= MLX5_FLOW_ACTION_RSS;
+		break;
+	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+		/* TODO: Validation logic (same as flow_hw_actions_validate) */
+		*action_flags |= MLX5_FLOW_ACTION_CT;
+		break;
+	case RTE_FLOW_ACTION_TYPE_COUNT:
+		if (action->conf && mask->conf) {
+			if ((*action_flags & MLX5_FLOW_ACTION_AGE) ||
+			    (*action_flags & MLX5_FLOW_ACTION_INDIRECT_AGE))
+				/*
+				 * AGE cannot use an indirect counter that is
+				 * shared with other flow rules.
+				 */
+				return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  NULL,
+						  "AGE and fixed COUNT combination is not supported");
+			*fixed_cnt = true;
 		}
+		ret = flow_hw_validate_action_count(dev, action, mask,
+						    *action_flags, error);
+		if (ret < 0)
+			return ret;
+		*action_flags |= MLX5_FLOW_ACTION_INDIRECT_COUNT;
+		break;
+	case RTE_FLOW_ACTION_TYPE_AGE:
+		ret = flow_hw_validate_action_age(dev, action, *action_flags,
+						  *fixed_cnt, error);
+		if (ret < 0)
+			return ret;
+		*action_flags |= MLX5_FLOW_ACTION_INDIRECT_AGE;
+		break;
+	default:
+		DRV_LOG(WARNING, "Unsupported shared action type: %d", type);
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, mask,
+					  "Unsupported indirect action type");
 	}
-	if (!rss_or_queue)
-		return 0;
-	else if (idx >= MLX5_HW_MAX_ACTS)
-		return -1; /* No more space. */
-	total = idx;
-	/*
-	 * If actions template contains MODIFY_FIELD action, then meta copy action can be inserted
-	 * at the template's end. Position of MODIFY_HDR action is based on the position of the
-	 * first MODIFY_FIELD flow action.
-	 */
-	if (modify_field) {
-		*ins_pos = end_idx;
-		goto insert_meta_copy;
-	}
-	/*
-	 * If actions template does not contain MODIFY_FIELD action, then meta copy action must be
-	 * inserted at aplace conforming with action order defined in steering/mlx5dr_action.c.
+	return 0;
+}
+
+/**
+ * Validate raw_encap action.
+ *
+ * @param[in] dev
+ *   Pointer to rte_eth_dev structure.
+ * @param[in] action
+ *   Pointer to the RAW_ENCAP action.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_hw_validate_action_raw_encap(struct rte_eth_dev *dev __rte_unused,
+				  const struct rte_flow_action *action,
+				  struct rte_flow_error *error)
+{
+	const struct rte_flow_action_raw_encap *raw_encap_data = action->conf;
+
+	if (!raw_encap_data || !raw_encap_data->size || !raw_encap_data->data)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "invalid raw_encap_data");
+	return 0;
+}
+
+static inline uint16_t
+flow_hw_template_expand_modify_field(const struct rte_flow_action actions[],
+				     const struct rte_flow_action masks[],
+				     const struct rte_flow_action *mf_action,
+				     const struct rte_flow_action *mf_mask,
+				     struct rte_flow_action *new_actions,
+				     struct rte_flow_action *new_masks,
+				     uint64_t flags, uint32_t act_num)
+{
+	uint32_t i, tail;
+
+	MLX5_ASSERT(actions && masks);
+	MLX5_ASSERT(new_actions && new_masks);
+	MLX5_ASSERT(mf_action && mf_mask);
+	if (flags & MLX5_FLOW_ACTION_MODIFY_FIELD) {
+		/*
+		 * The application's action template already has Modify Field.
+		 * Its location will be used in DR.
+		 * The expanded MF action can be added before the END.
+		 */
+		i = act_num - 1;
+		goto insert;
+	}
+	/**
+	 * Locate the first action positioned BEFORE the new MF.
+	 *
+	 * Search for a place to insert modify header
+	 * from the END action backwards:
+	 * 1. END is always present in actions array
+	 * 2. END location is always at action[act_num - 1]
+	 * 3. END always positioned AFTER modify field location
+	 *
+	 * Relative actions order is the same for RX, TX and FDB.
+	 *
+	 * Current actions order (draft-3)
+	 * @see action_order_arr[]
 	 */
-	act_end = false;
-	for (idx = 0; !act_end; idx++) {
-		switch (actions[idx].type) {
-		case RTE_FLOW_ACTION_TYPE_COUNT:
-		case RTE_FLOW_ACTION_TYPE_METER:
-		case RTE_FLOW_ACTION_TYPE_METER_MARK:
-		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+	for (i = act_num - 2; (int)i >= 0; i--) {
+		enum rte_flow_action_type type = actions[i].type;
+
+		if (type == RTE_FLOW_ACTION_TYPE_INDIRECT)
+			type = masks[i].type;
+		switch (type) {
 		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
 		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
 		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
-		case RTE_FLOW_ACTION_TYPE_RSS:
+		case RTE_FLOW_ACTION_TYPE_DROP:
+		case RTE_FLOW_ACTION_TYPE_JUMP:
 		case RTE_FLOW_ACTION_TYPE_QUEUE:
-			*ins_pos = idx;
-			act_end = true;
-			break;
+		case RTE_FLOW_ACTION_TYPE_RSS:
+		case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
+		case RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID:
+		case RTE_FLOW_ACTION_TYPE_VOID:
 		case RTE_FLOW_ACTION_TYPE_END:
-			act_end = true;
 			break;
 		default:
+			i++; /* new MF inserted AFTER actions[i] */
+			goto insert;
 			break;
 		}
 	}
-insert_meta_copy:
-	MLX5_ASSERT(*ins_pos != UINT16_MAX);
-	MLX5_ASSERT(*ins_pos < total);
-	/* Before the position, no change for the actions. */
-	for (idx = 0; idx < *ins_pos; idx++) {
-		new_actions[idx] = actions[idx];
-		new_masks[idx] = masks[idx];
-	}
-	/* Insert the new action and mask to the position. */
-	new_actions[idx] = *ins_actions;
-	new_masks[idx] = *ins_masks;
-	/* Remaining content is right shifted by one position. */
-	for (; idx < total; idx++) {
-		new_actions[idx + 1] = actions[idx];
-		new_masks[idx + 1] = masks[idx];
-	}
-	return 0;
+	i = 0;
+insert:
+	tail = act_num - i; /* num action to move */
+	memcpy(new_actions, actions, sizeof(actions[0]) * i);
+	new_actions[i] = *mf_action;
+	memcpy(new_actions + i + 1, actions + i, sizeof(actions[0]) * tail);
+	memcpy(new_masks, masks, sizeof(masks[0]) * i);
+	new_masks[i] = *mf_mask;
+	memcpy(new_masks + i + 1, masks + i, sizeof(masks[0]) * tail);
+	return i;
 }
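
For reference, a minimal stand-alone sketch of the same expansion scheme
(hypothetical names, plain ints instead of rte_flow actions, not part of the
patch): the new element lands at index i and the tail of act_num - i entries
is shifted right by one.

  #include <string.h>

  /*
   * Insert 'val' at position 'i' of 'arr' holding 'num' entries
   * (END included); 'out' must have room for num + 1 entries.
   */
  static unsigned int
  expand_insert(const int *arr, unsigned int num, unsigned int i,
                int val, int *out)
  {
          unsigned int tail = num - i; /* entries moved right by one */

          memcpy(out, arr, sizeof(arr[0]) * i);
          out[i] = val;
          memcpy(out + i + 1, arr + i, sizeof(arr[0]) * tail);
          return i;
  }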
 
 static int
@@ -3257,13 +3646,17 @@ flow_hw_validate_action_push_vlan(struct rte_eth_dev *dev,
 }
 
 static int
-flow_hw_actions_validate(struct rte_eth_dev *dev,
-			const struct rte_flow_actions_template_attr *attr,
-			const struct rte_flow_action actions[],
-			const struct rte_flow_action masks[],
-			struct rte_flow_error *error)
+mlx5_flow_hw_actions_validate(struct rte_eth_dev *dev,
+			      const struct rte_flow_actions_template_attr *attr,
+			      const struct rte_flow_action actions[],
+			      const struct rte_flow_action masks[],
+			      uint64_t *act_flags,
+			      struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_action_count *count_mask = NULL;
+	bool fixed_cnt = false;
+	uint64_t action_flags = 0;
 	uint16_t i;
 	bool actions_end = false;
 	int ret;
@@ -3289,46 +3682,70 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			break;
 		case RTE_FLOW_ACTION_TYPE_INDIRECT:
-			/* TODO: Validation logic */
+			ret = flow_hw_validate_action_indirect(dev, action,
+							       mask,
+							       &action_flags,
+							       &fixed_cnt,
+							       error);
+			if (ret < 0)
+				return ret;
 			break;
 		case RTE_FLOW_ACTION_TYPE_MARK:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_MARK;
 			break;
 		case RTE_FLOW_ACTION_TYPE_DROP:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_DROP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_JUMP:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_JUMP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_QUEUE:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_QUEUE;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RSS:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_RSS;
 			break;
 		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_ENCAP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_ENCAP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_VXLAN_DECAP:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_DECAP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_NVGRE_DECAP:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_DECAP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
-			/* TODO: Validation logic */
+			ret = flow_hw_validate_action_raw_encap(dev, action, error);
+			if (ret < 0)
+				return ret;
+			action_flags |= MLX5_FLOW_ACTION_ENCAP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RAW_DECAP:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_DECAP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_METER:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_METER;
 			break;
 		case RTE_FLOW_ACTION_TYPE_METER_MARK:
-			/* TODO: Validation logic */
+			ret = flow_hw_validate_action_meter_mark(dev, action,
+								 error);
+			if (ret < 0)
+				return ret;
+			action_flags |= MLX5_FLOW_ACTION_METER;
 			break;
 		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
 			ret = flow_hw_validate_action_modify_field(action,
@@ -3336,21 +3753,43 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 									error);
 			if (ret < 0)
 				return ret;
+			action_flags |= MLX5_FLOW_ACTION_MODIFY_FIELD;
 			break;
 		case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
 			ret = flow_hw_validate_action_represented_port
 					(dev, action, mask, error);
 			if (ret < 0)
 				return ret;
+			action_flags |= MLX5_FLOW_ACTION_PORT_ID;
+			break;
+		case RTE_FLOW_ACTION_TYPE_AGE:
+			if (count_mask && count_mask->id)
+				fixed_cnt = true;
+			ret = flow_hw_validate_action_age(dev, action,
+							  action_flags,
+							  fixed_cnt, error);
+			if (ret < 0)
+				return ret;
+			action_flags |= MLX5_FLOW_ACTION_AGE;
 			break;
 		case RTE_FLOW_ACTION_TYPE_COUNT:
-			/* TODO: Validation logic */
+			ret = flow_hw_validate_action_count(dev, action, mask,
+							    action_flags,
+							    error);
+			if (ret < 0)
+				return ret;
+			count_mask = mask->conf;
+			action_flags |= MLX5_FLOW_ACTION_COUNT;
 			break;
 		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_CT;
 			break;
 		case RTE_FLOW_ACTION_TYPE_OF_POP_VLAN:
+			action_flags |= MLX5_FLOW_ACTION_OF_POP_VLAN;
+			break;
 		case RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID:
+			action_flags |= MLX5_FLOW_ACTION_OF_SET_VLAN_VID;
 			break;
 		case RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN:
 			ret = flow_hw_validate_action_push_vlan
@@ -3360,6 +3799,7 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 			i += is_of_vlan_pcp_present(action) ?
 				MLX5_HW_VLAN_PUSH_PCP_IDX :
 				MLX5_HW_VLAN_PUSH_VID_IDX;
+			action_flags |= MLX5_FLOW_ACTION_OF_PUSH_VLAN;
 			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
@@ -3371,9 +3811,23 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 						  "action not supported in template API");
 		}
 	}
+	if (act_flags != NULL)
+		*act_flags = action_flags;
 	return 0;
 }
 
+static int
+flow_hw_actions_validate(struct rte_eth_dev *dev,
+			 const struct rte_flow_actions_template_attr *attr,
+			 const struct rte_flow_action actions[],
+			 const struct rte_flow_action masks[],
+			 struct rte_flow_error *error)
+{
+	return mlx5_flow_hw_actions_validate(dev, attr, actions, masks, NULL,
+					     error);
+}
+
 static enum mlx5dr_action_type mlx5_hw_dr_action_types[] = {
 	[RTE_FLOW_ACTION_TYPE_MARK] = MLX5DR_ACTION_TYP_TAG,
 	[RTE_FLOW_ACTION_TYPE_DROP] = MLX5DR_ACTION_TYP_DROP,
@@ -3386,7 +3840,6 @@ static enum mlx5dr_action_type mlx5_hw_dr_action_types[] = {
 	[RTE_FLOW_ACTION_TYPE_NVGRE_DECAP] = MLX5DR_ACTION_TYP_TNL_L2_TO_L2,
 	[RTE_FLOW_ACTION_TYPE_MODIFY_FIELD] = MLX5DR_ACTION_TYP_MODIFY_HDR,
 	[RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT] = MLX5DR_ACTION_TYP_VPORT,
-	[RTE_FLOW_ACTION_TYPE_COUNT] = MLX5DR_ACTION_TYP_CTR,
 	[RTE_FLOW_ACTION_TYPE_CONNTRACK] = MLX5DR_ACTION_TYP_ASO_CT,
 	[RTE_FLOW_ACTION_TYPE_OF_POP_VLAN] = MLX5DR_ACTION_TYP_POP_VLAN,
 	[RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN] = MLX5DR_ACTION_TYP_PUSH_VLAN,
@@ -3396,7 +3849,7 @@ static int
 flow_hw_dr_actions_template_handle_shared(const struct rte_flow_action *mask,
 					  unsigned int action_src,
 					  enum mlx5dr_action_type *action_types,
-					  uint16_t *curr_off,
+					  uint16_t *curr_off, uint16_t *cnt_off,
 					  struct rte_flow_actions_template *at)
 {
 	uint32_t type;
@@ -3413,10 +3866,18 @@ flow_hw_dr_actions_template_handle_shared(const struct rte_flow_action *mask,
 		action_types[*curr_off] = MLX5DR_ACTION_TYP_TIR;
 		*curr_off = *curr_off + 1;
 		break;
+	case RTE_FLOW_ACTION_TYPE_AGE:
 	case RTE_FLOW_ACTION_TYPE_COUNT:
-		at->actions_off[action_src] = *curr_off;
-		action_types[*curr_off] = MLX5DR_ACTION_TYP_CTR;
-		*curr_off = *curr_off + 1;
+		/*
+		 * Both AGE and COUNT actions need a counter: the first fills
+		 * the action_types array, the second only saves the offset.
+		 */
+		if (*cnt_off == UINT16_MAX) {
+			*cnt_off = *curr_off;
+			action_types[*cnt_off] = MLX5DR_ACTION_TYP_CTR;
+			*curr_off = *curr_off + 1;
+		}
+		at->actions_off[action_src] = *cnt_off;
 		break;
 	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
 		at->actions_off[action_src] = *curr_off;
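
Below is a minimal self-contained sketch of the slot-sharing rule applied
here (hypothetical names and constants, not part of the patch): whichever of
AGE/COUNT comes first allocates the single counter slot, any later one only
records the same offset.

  #include <stdint.h>

  #define ACTION_TYP_CTR 1        /* stands in for MLX5DR_ACTION_TYP_CTR */

  /* First of AGE/COUNT allocates the counter slot, later ones reuse it. */
  static void
  map_counter_slot(uint16_t action_src, uint16_t *curr_off, uint16_t *cnt_off,
                   int *types, uint16_t *offs)
  {
          if (*cnt_off == UINT16_MAX) {
                  *cnt_off = (*curr_off)++;  /* allocate the single CTR slot */
                  types[*cnt_off] = ACTION_TYP_CTR;
          }
          offs[action_src] = *cnt_off;       /* all users share the offset */
  }
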
@@ -3455,6 +3916,7 @@ flow_hw_dr_actions_template_create(struct rte_flow_actions_template *at)
 	enum mlx5dr_action_type reformat_act_type = MLX5DR_ACTION_TYP_TNL_L2_TO_L2;
 	uint16_t reformat_off = UINT16_MAX;
 	uint16_t mhdr_off = UINT16_MAX;
+	uint16_t cnt_off = UINT16_MAX;
 	int ret;
 	for (i = 0, curr_off = 0; at->actions[i].type != RTE_FLOW_ACTION_TYPE_END; ++i) {
 		const struct rte_flow_action_raw_encap *raw_encap_data;
@@ -3467,9 +3929,12 @@ flow_hw_dr_actions_template_create(struct rte_flow_actions_template *at)
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			break;
 		case RTE_FLOW_ACTION_TYPE_INDIRECT:
-			ret = flow_hw_dr_actions_template_handle_shared(&at->masks[i], i,
-									action_types,
-									&curr_off, at);
+			ret = flow_hw_dr_actions_template_handle_shared
+								 (&at->masks[i],
+								  i,
+								  action_types,
+								  &curr_off,
+								  &cnt_off, at);
 			if (ret)
 				return NULL;
 			break;
@@ -3525,6 +3990,19 @@ flow_hw_dr_actions_template_create(struct rte_flow_actions_template *at)
 			if (curr_off >= MLX5_HW_MAX_ACTS)
 				goto err_actions_num;
 			break;
+		case RTE_FLOW_ACTION_TYPE_AGE:
+		case RTE_FLOW_ACTION_TYPE_COUNT:
+			/*
+			 * Both AGE and COUNT actions need a counter: the
+			 * first fills the action_types array, the second
+			 * only saves the offset.
+			 */
+			if (cnt_off == UINT16_MAX) {
+				cnt_off = curr_off++;
+				action_types[cnt_off] = MLX5DR_ACTION_TYP_CTR;
+			}
+			at->actions_off[i] = cnt_off;
+			break;
 		default:
 			type = mlx5_hw_dr_action_types[at->actions[i].type];
 			at->actions_off[i] = curr_off;
@@ -3665,6 +4143,7 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 	unsigned int i;
 	struct rte_flow_actions_template *at = NULL;
 	uint16_t pos = UINT16_MAX;
+	uint64_t action_flags = 0;
 	struct rte_flow_action tmp_action[MLX5_HW_MAX_ACTS];
 	struct rte_flow_action tmp_mask[MLX5_HW_MAX_ACTS];
 	struct rte_flow_action *ra = (void *)(uintptr_t)actions;
@@ -3707,22 +4186,9 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 		.conf = &rx_mreg_mask,
 	};
 
-	if (flow_hw_actions_validate(dev, attr, actions, masks, error))
+	if (mlx5_flow_hw_actions_validate(dev, attr, actions, masks,
+					  &action_flags, error))
 		return NULL;
-	if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
-	    priv->sh->config.dv_esw_en) {
-		/* Application should make sure only one Q/RSS exist in one rule. */
-		if (flow_hw_action_meta_copy_insert(actions, masks, &rx_cpy, &rx_cpy_mask,
-						    tmp_action, tmp_mask, &pos)) {
-			rte_flow_error_set(error, EINVAL,
-					   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
-					   "Failed to concatenate new action/mask");
-			return NULL;
-		} else if (pos != UINT16_MAX) {
-			ra = tmp_action;
-			rm = tmp_mask;
-		}
-	}
 	for (i = 0; ra[i].type != RTE_FLOW_ACTION_TYPE_END; ++i) {
 		switch (ra[i].type) {
 		/* OF_PUSH_VLAN *MUST* come before OF_SET_VLAN_VID */
@@ -3748,6 +4214,29 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 				   RTE_FLOW_ERROR_TYPE_ACTION, NULL, "Too many actions");
 		return NULL;
 	}
+	if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
+	    priv->sh->config.dv_esw_en &&
+	    (action_flags &
+	     (MLX5_FLOW_ACTION_QUEUE | MLX5_FLOW_ACTION_RSS))) {
+		/* Insert META copy */
+		if (act_num + 1 > MLX5_HW_MAX_ACTS) {
+			rte_flow_error_set(error, E2BIG,
+					   RTE_FLOW_ERROR_TYPE_ACTION,
+					   NULL, "cannot expand: too many actions");
+			return NULL;
+		}
+		/* Application should make sure only one Q/RSS exists in one rule. */
+		pos = flow_hw_template_expand_modify_field(actions, masks,
+							   &rx_cpy,
+							   &rx_cpy_mask,
+							   tmp_action, tmp_mask,
+							   action_flags,
+							   act_num);
+		ra = tmp_action;
+		rm = tmp_mask;
+		act_num++;
+		action_flags |= MLX5_FLOW_ACTION_MODIFY_FIELD;
+	}
 	if (set_vlan_vid_ix != -1) {
 		/* If temporary action buffer was not used, copy template actions to it */
 		if (ra == actions && rm == masks) {
@@ -3818,6 +4307,7 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 	at->tmpl = flow_hw_dr_actions_template_create(at);
 	if (!at->tmpl)
 		goto error;
+	at->action_flags = action_flags;
 	__atomic_fetch_add(&at->refcnt, 1, __ATOMIC_RELAXED);
 	LIST_INSERT_HEAD(&priv->flow_hw_at, at, next);
 	return at;
@@ -4161,6 +4651,7 @@ flow_hw_info_get(struct rte_eth_dev *dev,
 		 struct rte_flow_queue_info *queue_info,
 		 struct rte_flow_error *error __rte_unused)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
 	uint16_t port_id = dev->data->port_id;
 	struct rte_mtr_capabilities mtr_cap;
 	int ret;
@@ -4177,6 +4668,8 @@ flow_hw_info_get(struct rte_eth_dev *dev,
 		port_info->max_nb_meter_profiles = UINT32_MAX;
 		port_info->max_nb_meter_policies = UINT32_MAX;
 	}
+	port_info->max_nb_counters = priv->sh->hws_max_nb_counters;
+	port_info->max_nb_aging_objects = port_info->max_nb_counters;
 	return 0;
 }
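
As a usage sketch from the application side (not part of the patch, arbitrary
names), the new capability fields can be read back through
rte_flow_info_get() before configuring the port:

  #include <stdio.h>
  #include <rte_flow.h>

  static int
  query_hw_limits(uint16_t port_id)
  {
          struct rte_flow_port_info port_info;
          struct rte_flow_queue_info queue_info;
          struct rte_flow_error error;

          if (rte_flow_info_get(port_id, &port_info, &queue_info, &error))
                  return -1;
          /* Upper bounds now reported by the PMD. */
          printf("counters: %u, aging objects: %u\n",
                 port_info.max_nb_counters, port_info.max_nb_aging_objects);
          return 0;
  }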
 
@@ -5555,8 +6048,6 @@ flow_hw_configure(struct rte_eth_dev *dev,
 			goto err;
 		}
 	}
-	if (_queue_attr)
-		mlx5_free(_queue_attr);
 	if (port_attr->nb_conn_tracks) {
 		mem_size = sizeof(struct mlx5_aso_sq) * nb_q_updated +
 			   sizeof(*priv->ct_mng);
@@ -5573,13 +6064,35 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	}
 	if (port_attr->nb_counters) {
 		priv->hws_cpool = mlx5_hws_cnt_pool_create(dev, port_attr,
-				nb_queue);
+							   nb_queue);
 		if (priv->hws_cpool == NULL)
 			goto err;
 	}
+	if (port_attr->nb_aging_objects) {
+		if (port_attr->nb_counters == 0) {
+			/*
+			 * Aging management uses counters. The number of
+			 * counters requested should account for one counter
+			 * per flow rule containing AGE without COUNT.
+			 */
+			DRV_LOG(ERR, "Port %u AGE objects are requested (%u) "
+				"but no counters are requested.",
+				dev->data->port_id,
+				port_attr->nb_aging_objects);
+			rte_errno = EINVAL;
+			goto err;
+		}
+		ret = mlx5_hws_age_pool_init(dev, port_attr, nb_queue);
+		if (ret < 0)
+			goto err;
+	}
 	ret = flow_hw_create_vlan(dev);
 	if (ret)
 		goto err;
+	if (_queue_attr)
+		mlx5_free(_queue_attr);
+	if (port_attr->flags & RTE_FLOW_PORT_FLAG_STRICT_QUEUE)
+		priv->hws_strict_queue = 1;
 	return 0;
 err:
 	if (priv->hws_ctpool) {
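
A usage sketch from the application side (not part of the patch, arbitrary
sizes), assuming a single flow queue: aging objects require a non-zero
nb_counters, and RTE_FLOW_PORT_FLAG_STRICT_QUEUE selects the strict-queue
aging mode checked above.

  #include <rte_flow.h>

  static int
  configure_hws_port(uint16_t port_id)
  {
          const struct rte_flow_port_attr port_attr = {
                  .nb_counters = 1 << 16,      /* also backs AGE objects */
                  .nb_aging_objects = 1 << 16,
                  .flags = RTE_FLOW_PORT_FLAG_STRICT_QUEUE,
          };
          const struct rte_flow_queue_attr queue_attr = { .size = 1024 };
          const struct rte_flow_queue_attr *attr_list[] = { &queue_attr };
          struct rte_flow_error error;

          return rte_flow_configure(port_id, &port_attr, 1, attr_list, &error);
  }
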
@@ -5590,6 +6103,10 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		flow_hw_ct_mng_destroy(dev, priv->ct_mng);
 		priv->ct_mng = NULL;
 	}
+	if (priv->hws_age_req)
+		mlx5_hws_age_pool_destroy(priv);
+	if (priv->hws_cpool)
+		mlx5_hws_cnt_pool_destroy(priv->sh, priv->hws_cpool);
 	flow_hw_free_vport_actions(priv);
 	for (i = 0; i < MLX5_HW_ACTION_FLAG_MAX; i++) {
 		if (priv->hw_drop[i])
@@ -5663,6 +6180,8 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 		mlx5_ipool_destroy(priv->acts_ipool);
 		priv->acts_ipool = NULL;
 	}
+	if (priv->hws_age_req)
+		mlx5_hws_age_pool_destroy(priv);
 	if (priv->hws_cpool)
 		mlx5_hws_cnt_pool_destroy(priv->sh, priv->hws_cpool);
 	if (priv->hws_ctpool) {
@@ -5999,13 +6518,53 @@ flow_hw_conntrack_create(struct rte_eth_dev *dev, uint32_t queue,
 		MLX5_ACTION_CTX_CT_GEN_IDX(PORT_ID(priv), ct_idx);
 }
 
+/**
+ * Validate shared action.
+ *
+ * @param[in] dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param[in] queue
+ *   Which queue to be used.
+ * @param[in] attr
+ *   Operation attribute.
+ * @param[in] conf
+ *   Indirect action configuration.
+ * @param[in] action
+ *   rte_flow action detail.
+ * @param[in] user_data
+ *   Pointer to the user_data.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, otherwise negative errno value.
+ */
+static int
+flow_hw_action_handle_validate(struct rte_eth_dev *dev, uint32_t queue,
+			       const struct rte_flow_op_attr *attr,
+			       const struct rte_flow_indir_action_conf *conf,
+			       const struct rte_flow_action *action,
+			       void *user_data,
+			       struct rte_flow_error *error)
+{
+	RTE_SET_USED(attr);
+	RTE_SET_USED(queue);
+	RTE_SET_USED(user_data);
+	switch (action->type) {
+	case RTE_FLOW_ACTION_TYPE_METER_MARK:
+		return flow_hw_validate_action_meter_mark(dev, action, error);
+	default:
+		return flow_dv_action_validate(dev, conf, action, error);
+	}
+}
+
 /**
  * Create shared action.
  *
  * @param[in] dev
  *   Pointer to the rte_eth_dev structure.
  * @param[in] queue
- *   Which queue to be used..
+ *   Which queue to be used.
  * @param[in] attr
  *   Operation attribute.
  * @param[in] conf
@@ -6030,16 +6589,32 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 {
 	struct rte_flow_action_handle *handle = NULL;
 	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_action_age *age;
 	struct mlx5_aso_mtr *aso_mtr;
 	cnt_id_t cnt_id;
 	uint32_t mtr_id;
+	uint32_t age_idx;
 
-	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
 	RTE_SET_USED(user_data);
 	switch (action->type) {
+	case RTE_FLOW_ACTION_TYPE_AGE:
+		age = action->conf;
+		age_idx = mlx5_hws_age_action_create(priv, queue, true, age,
+						     0, error);
+		if (age_idx == 0) {
+			rte_flow_error_set(error, ENODEV,
+					   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					   "AGE is not configured!");
+		} else {
+			age_idx = (MLX5_INDIRECT_ACTION_TYPE_AGE <<
+				   MLX5_INDIRECT_ACTION_TYPE_OFFSET) | age_idx;
+			handle =
+			    (struct rte_flow_action_handle *)(uintptr_t)age_idx;
+		}
+		break;
 	case RTE_FLOW_ACTION_TYPE_COUNT:
-		if (mlx5_hws_cnt_shared_get(priv->hws_cpool, &cnt_id))
+		if (mlx5_hws_cnt_shared_get(priv->hws_cpool, &cnt_id, 0))
 			rte_flow_error_set(error, ENODEV,
 					RTE_FLOW_ERROR_TYPE_ACTION,
 					NULL,
@@ -6059,8 +6634,13 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 			MLX5_INDIRECT_ACTION_TYPE_OFFSET) | (aso_mtr->fm.meter_id);
 		handle = (struct rte_flow_action_handle *)(uintptr_t)mtr_id;
 		break;
-	default:
+	case RTE_FLOW_ACTION_TYPE_RSS:
 		handle = flow_dv_action_create(dev, conf, action, error);
+		break;
+	default:
+		rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION,
+				   NULL, "action type not supported");
+		return NULL;
 	}
 	return handle;
 }
@@ -6071,7 +6651,7 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
  * @param[in] dev
  *   Pointer to the rte_eth_dev structure.
  * @param[in] queue
- *   Which queue to be used..
+ *   Which queue to be used.
  * @param[in] attr
  *   Operation attribute.
  * @param[in] handle
@@ -6094,7 +6674,6 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 			     void *user_data,
 			     struct rte_flow_error *error)
 {
-	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
 	RTE_SET_USED(user_data);
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -6109,6 +6688,8 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 	uint32_t idx = act_idx & ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
 
 	switch (type) {
+	case MLX5_INDIRECT_ACTION_TYPE_AGE:
+		return mlx5_hws_age_action_update(priv, idx, update, error);
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
 		return flow_hw_conntrack_update(dev, queue, update, act_idx, error);
 	case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
@@ -6142,11 +6723,15 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 			return rte_flow_error_set(error, EINVAL,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL, "Unable to wait for ASO meter CQE");
-		return 0;
-	default:
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_RSS:
+		return flow_dv_action_update(dev, handle, update, error);
+	default:
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					  "action type not supported");
 	}
-	return flow_dv_action_update(dev, handle, update, error);
+	return 0;
 }
 
 /**
@@ -6155,7 +6740,7 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
  * @param[in] dev
  *   Pointer to the rte_eth_dev structure.
  * @param[in] queue
- *   Which queue to be used..
+ *   Which queue to be used.
  * @param[in] attr
  *   Operation attribute.
  * @param[in] handle
@@ -6177,6 +6762,7 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 {
 	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
 	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+	uint32_t age_idx = act_idx & MLX5_HWS_AGE_IDX_MASK;
 	uint32_t idx = act_idx & ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
@@ -6187,7 +6773,16 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 	RTE_SET_USED(attr);
 	RTE_SET_USED(user_data);
 	switch (type) {
+	case MLX5_INDIRECT_ACTION_TYPE_AGE:
+		return mlx5_hws_age_action_destroy(priv, age_idx, error);
 	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
+		age_idx = mlx5_hws_cnt_age_get(priv->hws_cpool, act_idx);
+		if (age_idx != 0)
+			/*
+			 * If this counter belongs to an indirect AGE, this is
+			 * the time to update the AGE.
+			 */
+			mlx5_hws_age_nb_cnt_decrease(priv, age_idx);
 		return mlx5_hws_cnt_shared_put(priv->hws_cpool, &act_idx);
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
 		return flow_hw_conntrack_destroy(dev, act_idx, error);
@@ -6212,10 +6807,15 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL, "Unable to wait for ASO meter CQE");
 		mlx5_ipool_free(pool->idx_pool, idx);
-		return 0;
-	default:
+		break;
+	case MLX5_INDIRECT_ACTION_TYPE_RSS:
 		return flow_dv_action_destroy(dev, handle, error);
+	default:
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					  "action type not supported");
 	}
+	return 0;
 }
 
 static int
@@ -6225,13 +6825,14 @@ flow_hw_query_counter(const struct rte_eth_dev *dev, uint32_t counter,
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_hws_cnt *cnt;
 	struct rte_flow_query_count *qc = data;
-	uint32_t iidx = mlx5_hws_cnt_iidx(priv->hws_cpool, counter);
+	uint32_t iidx;
 	uint64_t pkts, bytes;
 
 	if (!mlx5_hws_cnt_id_valid(counter))
 		return rte_flow_error_set(error, EINVAL,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
 				"counter are not available");
+	iidx = mlx5_hws_cnt_iidx(priv->hws_cpool, counter);
 	cnt = &priv->hws_cpool->pool[iidx];
 	__hws_cnt_query_raw(priv->hws_cpool, counter, &pkts, &bytes);
 	qc->hits_set = 1;
@@ -6245,12 +6846,64 @@ flow_hw_query_counter(const struct rte_eth_dev *dev, uint32_t counter,
 	return 0;
 }
 
+/**
+ * Query a flow rule AGE action for aging information.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] age_idx
+ *   Index of AGE action parameter.
+ * @param[out] data
+ *   Data retrieved by the query.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_hw_query_age(const struct rte_eth_dev *dev, uint32_t age_idx, void *data,
+		  struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct mlx5_indexed_pool *ipool = age_info->ages_ipool;
+	struct mlx5_hws_age_param *param = mlx5_ipool_get(ipool, age_idx);
+	struct rte_flow_query_age *resp = data;
+
+	if (!param || !param->timeout)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "age data not available");
+	switch (__atomic_load_n(&param->state, __ATOMIC_RELAXED)) {
+	case HWS_AGE_AGED_OUT_REPORTED:
+	case HWS_AGE_AGED_OUT_NOT_REPORTED:
+		resp->aged = 1;
+		break;
+	case HWS_AGE_CANDIDATE:
+	case HWS_AGE_CANDIDATE_INSIDE_RING:
+		resp->aged = 0;
+		break;
+	case HWS_AGE_FREE:
+		/*
+		 * When state is FREE the flow itself should be invalid.
+		 * Fall-through.
+		 */
+	default:
+		MLX5_ASSERT(0);
+		break;
+	}
+	resp->sec_since_last_hit_valid = !resp->aged;
+	if (resp->sec_since_last_hit_valid)
+		resp->sec_since_last_hit = __atomic_load_n
+				 (&param->sec_since_last_hit, __ATOMIC_RELAXED);
+	return 0;
+}
+
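A usage sketch of this query path from the application side (not part of the
patch): the AGE action of a rule can be queried synchronously and returns the
aged flag plus the seconds since the last hit.

  #include <rte_flow.h>

  static int
  query_rule_age(uint16_t port_id, struct rte_flow *flow)
  {
          const struct rte_flow_action actions[] = {
                  { .type = RTE_FLOW_ACTION_TYPE_AGE },
                  { .type = RTE_FLOW_ACTION_TYPE_END },
          };
          struct rte_flow_query_age age = { 0 };
          struct rte_flow_error error;

          if (rte_flow_query(port_id, flow, actions, &age, &error))
                  return -1;
          return age.aged;    /* 1 when the rule timed out */
  }
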
 static int
-flow_hw_query(struct rte_eth_dev *dev,
-	      struct rte_flow *flow __rte_unused,
-	      const struct rte_flow_action *actions __rte_unused,
-	      void *data __rte_unused,
-	      struct rte_flow_error *error __rte_unused)
+flow_hw_query(struct rte_eth_dev *dev, struct rte_flow *flow,
+	      const struct rte_flow_action *actions, void *data,
+	      struct rte_flow_error *error)
 {
 	int ret = -EINVAL;
 	struct rte_flow_hw *hw_flow = (struct rte_flow_hw *)flow;
@@ -6261,7 +6914,11 @@ flow_hw_query(struct rte_eth_dev *dev,
 			break;
 		case RTE_FLOW_ACTION_TYPE_COUNT:
 			ret = flow_hw_query_counter(dev, hw_flow->cnt_id, data,
-						  error);
+						    error);
+			break;
+		case RTE_FLOW_ACTION_TYPE_AGE:
+			ret = flow_hw_query_age(dev, hw_flow->age_idx, data,
+						error);
 			break;
 		default:
 			return rte_flow_error_set(error, ENOTSUP,
@@ -6273,6 +6930,32 @@ flow_hw_query(struct rte_eth_dev *dev,
 	return ret;
 }
 
+/**
+ * Validate indirect action.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] conf
+ *   Shared action configuration.
+ * @param[in] action
+ *   Action specification used to create indirect action.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. Initialized in case of
+ *   error only.
+ *
+ * @return
+ *   0 on success, otherwise negative errno value.
+ */
+static int
+flow_hw_action_validate(struct rte_eth_dev *dev,
+			const struct rte_flow_indir_action_conf *conf,
+			const struct rte_flow_action *action,
+			struct rte_flow_error *err)
+{
+	return flow_hw_action_handle_validate(dev, MLX5_HW_INV_QUEUE, NULL,
+					      conf, action, NULL, err);
+}
+
 /**
  * Create indirect action.
  *
@@ -6296,6 +6979,12 @@ flow_hw_action_create(struct rte_eth_dev *dev,
 		       const struct rte_flow_action *action,
 		       struct rte_flow_error *err)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (priv->hws_strict_queue)
+		DRV_LOG(WARNING,
+			"port %u create indirect action called in strict queue mode.",
+			dev->data->port_id);
 	return flow_hw_action_handle_create(dev, MLX5_HW_INV_QUEUE,
 					    NULL, conf, action, NULL, err);
 }
@@ -6362,17 +7051,118 @@ flow_hw_action_query(struct rte_eth_dev *dev,
 {
 	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
 	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+	uint32_t age_idx = act_idx & MLX5_HWS_AGE_IDX_MASK;
 
 	switch (type) {
+	case MLX5_INDIRECT_ACTION_TYPE_AGE:
+		return flow_hw_query_age(dev, age_idx, data, error);
 	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
 		return flow_hw_query_counter(dev, act_idx, data, error);
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
 		return flow_hw_conntrack_query(dev, act_idx, data, error);
-	default:
+	case MLX5_INDIRECT_ACTION_TYPE_RSS:
 		return flow_dv_action_query(dev, handle, data, error);
+	default:
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					  "action type not supported");
 	}
 }
 
+/**
+ * Get aged-out flows of a given port on the given HWS flow queue.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] queue_id
+ *   Flow queue to query. Ignored when RTE_FLOW_PORT_FLAG_STRICT_QUEUE not set.
+ * @param[in, out] contexts
+ *   The address of an array of pointers to the aged-out flows contexts.
+ * @param[in] nb_contexts
+ *   The length of context array pointers.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. Initialized in case of
+ *   error only.
+ *
+ * @return
+ *   if nb_contexts is 0, return the number of all aged-out contexts.
+ *   if nb_contexts is not 0, return the number of aged-out flows reported
+ *   in the context array, otherwise a negative errno value.
+ */
+static int
+flow_hw_get_q_aged_flows(struct rte_eth_dev *dev, uint32_t queue_id,
+			 void **contexts, uint32_t nb_contexts,
+			 struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct rte_ring *r;
+	int nb_flows = 0;
+
+	if (nb_contexts && !contexts)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "empty context");
+	if (priv->hws_strict_queue) {
+		if (queue_id >= age_info->hw_q_age->nb_rings)
+			return rte_flow_error_set(error, EINVAL,
+						RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+						NULL, "invalid queue id");
+		r = age_info->hw_q_age->aged_lists[queue_id];
+	} else {
+		r = age_info->hw_age.aged_list;
+		MLX5_AGE_SET(age_info, MLX5_AGE_TRIGGER);
+	}
+	if (nb_contexts == 0)
+		return rte_ring_count(r);
+	while ((uint32_t)nb_flows < nb_contexts) {
+		uint32_t age_idx;
+
+		if (rte_ring_dequeue_elem(r, &age_idx, sizeof(uint32_t)) < 0)
+			break;
+		/* get the AGE context if the aged-out index is still valid. */
+		contexts[nb_flows] = mlx5_hws_age_context_get(priv, age_idx);
+		if (!contexts[nb_flows])
+			continue;
+		nb_flows++;
+	}
+	return nb_flows;
+}
+
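A polling sketch for strict-queue mode from the application side (not part of
the patch, arbitrary batch size), assuming the rte_flow_get_q_aged_flows()
helper from the dependency series:

  #include <rte_flow.h>

  #define AGED_BATCH 64

  static void
  drain_aged_flows(uint16_t port_id, uint32_t queue_id)
  {
          void *contexts[AGED_BATCH];
          struct rte_flow_error error;
          int n, i;

          do {
                  n = rte_flow_get_q_aged_flows(port_id, queue_id, contexts,
                                                AGED_BATCH, &error);
                  for (i = 0; i < n; i++) {
                          /*
                           * contexts[i] is the AGE context supplied at rule
                           * creation; typically used to find and destroy
                           * the aged-out rule.
                           */
                  }
          } while (n == AGED_BATCH);
  }
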
+/**
+ * Get aged-out flows.
+ *
+ * This function is relevant only if RTE_FLOW_PORT_FLAG_STRICT_QUEUE isn't set.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] contexts
+ *   The address of an array of pointers to the aged-out flows contexts.
+ * @param[in] nb_contexts
+ *   The length of context array pointers.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. Initialized in case of
+ *   error only.
+ *
+ * @return
+ *   the number of contexts retrieved on success, otherwise a negative errno
+ *   value.
+ *   if nb_contexts is 0, return the number of all aged-out contexts.
+ *   if nb_contexts is not 0, return the number of aged-out flows reported
+ *   in the context array.
+ */
+static int
+flow_hw_get_aged_flows(struct rte_eth_dev *dev, void **contexts,
+		       uint32_t nb_contexts, struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (priv->hws_strict_queue)
+		DRV_LOG(WARNING,
+			"port %u get aged flows called in strict queue mode.",
+			dev->data->port_id);
+	return flow_hw_get_q_aged_flows(dev, 0, contexts, nb_contexts, error);
+}
+
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.info_get = flow_hw_info_get,
 	.configure = flow_hw_configure,
@@ -6391,12 +7181,14 @@ const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.async_action_create = flow_hw_action_handle_create,
 	.async_action_destroy = flow_hw_action_handle_destroy,
 	.async_action_update = flow_hw_action_handle_update,
-	.action_validate = flow_dv_action_validate,
+	.action_validate = flow_hw_action_validate,
 	.action_create = flow_hw_action_create,
 	.action_destroy = flow_hw_action_destroy,
 	.action_update = flow_hw_action_update,
 	.action_query = flow_hw_action_query,
 	.query = flow_hw_query,
+	.get_aged_flows = flow_hw_get_aged_flows,
+	.get_q_aged_flows = flow_hw_get_q_aged_flows,
 };
 
 /**
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index 7ffaf4c227..81a33ddf09 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -122,7 +122,7 @@ flow_verbs_counter_get_by_idx(struct rte_eth_dev *dev,
 			      struct mlx5_flow_counter_pool **ppool)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_flow_counter_mng *cmng = &priv->sh->cmng;
+	struct mlx5_flow_counter_mng *cmng = &priv->sh->sws_cmng;
 	struct mlx5_flow_counter_pool *pool;
 
 	idx = (idx - 1) & (MLX5_CNT_SHARED_OFFSET - 1);
@@ -215,7 +215,7 @@ static uint32_t
 flow_verbs_counter_new(struct rte_eth_dev *dev, uint32_t id __rte_unused)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_flow_counter_mng *cmng = &priv->sh->cmng;
+	struct mlx5_flow_counter_mng *cmng = &priv->sh->sws_cmng;
 	struct mlx5_flow_counter_pool *pool = NULL;
 	struct mlx5_flow_counter *cnt = NULL;
 	uint32_t n_valid = cmng->n_valid;
diff --git a/drivers/net/mlx5/mlx5_hws_cnt.c b/drivers/net/mlx5/mlx5_hws_cnt.c
index e2408ef36d..cd606dc20f 100644
--- a/drivers/net/mlx5/mlx5_hws_cnt.c
+++ b/drivers/net/mlx5/mlx5_hws_cnt.c
@@ -8,6 +8,7 @@
 #include <rte_ring.h>
 #include <mlx5_devx_cmds.h>
 #include <rte_cycles.h>
+#include <rte_eal_paging.h>
 
 #if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
 
@@ -26,8 +27,8 @@ __hws_cnt_id_load(struct mlx5_hws_cnt_pool *cpool)
 	uint32_t preload;
 	uint32_t q_num = cpool->cache->q_num;
 	uint32_t cnt_num = mlx5_hws_cnt_pool_get_size(cpool);
-	cnt_id_t cnt_id, iidx = 0;
-	uint32_t qidx;
+	cnt_id_t cnt_id;
+	uint32_t qidx, iidx = 0;
 	struct rte_ring *qcache = NULL;
 
 	/*
@@ -86,6 +87,174 @@ __mlx5_hws_cnt_svc(struct mlx5_dev_ctx_shared *sh,
 	} while (reset_cnt_num > 0);
 }
 
+/**
+ * Release AGE parameter.
+ *
+ * @param priv
+ *   Pointer to the port private data structure.
+ * @param own_cnt_index
+ *   ID of the counter created only for this AGE, to be released.
+ *   Zero means there is no such counter.
+ * @param age_ipool
+ *   Pointer to AGE parameter indexed pool.
+ * @param idx
+ *   Index of AGE parameter in the indexed pool.
+ */
+static void
+mlx5_hws_age_param_free(struct mlx5_priv *priv, cnt_id_t own_cnt_index,
+			struct mlx5_indexed_pool *age_ipool, uint32_t idx)
+{
+	if (own_cnt_index) {
+		struct mlx5_hws_cnt_pool *cpool = priv->hws_cpool;
+
+		MLX5_ASSERT(mlx5_hws_cnt_is_shared(cpool, own_cnt_index));
+		mlx5_hws_cnt_shared_put(cpool, &own_cnt_index);
+	}
+	mlx5_ipool_free(age_ipool, idx);
+}
+
+/**
+ * Check and callback event for new aged flow in the HWS counter pool.
+ *
+ * @param[in] priv
+ *   Pointer to port private object.
+ * @param[in] cpool
+ *   Pointer to current counter pool.
+ */
+static void
+mlx5_hws_aging_check(struct mlx5_priv *priv, struct mlx5_hws_cnt_pool *cpool)
+{
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct flow_counter_stats *stats = cpool->raw_mng->raw;
+	struct mlx5_hws_age_param *param;
+	struct rte_ring *r;
+	const uint64_t curr_time = MLX5_CURR_TIME_SEC;
+	const uint32_t time_delta = curr_time - cpool->time_of_last_age_check;
+	uint32_t nb_alloc_cnts = mlx5_hws_cnt_pool_get_size(cpool);
+	uint16_t expected1 = HWS_AGE_CANDIDATE;
+	uint16_t expected2 = HWS_AGE_CANDIDATE_INSIDE_RING;
+	uint32_t i;
+
+	cpool->time_of_last_age_check = curr_time;
+	for (i = 0; i < nb_alloc_cnts; ++i) {
+		uint32_t age_idx = cpool->pool[i].age_idx;
+		uint64_t hits;
+
+		if (!cpool->pool[i].in_used || age_idx == 0)
+			continue;
+		param = mlx5_ipool_get(age_info->ages_ipool, age_idx);
+		if (unlikely(param == NULL)) {
+			/*
+			 * When an AGE uses an indirect counter, it is the
+			 * user's responsibility not to use this indirect
+			 * counter without the AGE.
+			 * If this counter is used after the AGE was freed, the
+			 * AGE index is invalid and using it here would cause a
+			 * segmentation fault.
+			 */
+			DRV_LOG(WARNING,
+				"Counter %u has lost its AGE, skipping it.", i);
+			continue;
+		}
+		if (param->timeout == 0)
+			continue;
+		switch (__atomic_load_n(&param->state, __ATOMIC_RELAXED)) {
+		case HWS_AGE_AGED_OUT_NOT_REPORTED:
+		case HWS_AGE_AGED_OUT_REPORTED:
+			/* Already aged-out, no action is needed. */
+			continue;
+		case HWS_AGE_CANDIDATE:
+		case HWS_AGE_CANDIDATE_INSIDE_RING:
+			/* This AGE is a candidate to age out; check it. */
+			break;
+		case HWS_AGE_FREE:
+			/*
+			 * An AGE parameter in "FREE" state cannot be pointed
+			 * to by any counter since the counter is destroyed
+			 * first.
+			 * Fall-through.
+			 */
+		default:
+			MLX5_ASSERT(0);
+			continue;
+		}
+		hits = rte_be_to_cpu_64(stats[i].hits);
+		if (param->nb_cnts == 1) {
+			if (stats[i].hits != param->accumulator_last_hits) {
+				__atomic_store_n(&param->sec_since_last_hit, 0,
+						 __ATOMIC_RELAXED);
+				param->accumulator_last_hits = hits;
+				continue;
+			}
+		} else {
+			param->accumulator_hits += hits;
+			param->accumulator_cnt++;
+			if (param->accumulator_cnt < param->nb_cnts)
+				continue;
+			param->accumulator_cnt = 0;
+			if (param->accumulator_last_hits !=
+						param->accumulator_hits) {
+				__atomic_store_n(&param->sec_since_last_hit,
+						 0, __ATOMIC_RELAXED);
+				param->accumulator_last_hits =
+							param->accumulator_hits;
+				param->accumulator_hits = 0;
+				continue;
+			}
+			param->accumulator_hits = 0;
+		}
+		if (__atomic_add_fetch(&param->sec_since_last_hit, time_delta,
+				       __ATOMIC_RELAXED) <=
+		   __atomic_load_n(&param->timeout, __ATOMIC_RELAXED))
+			continue;
+		/* Prepare the relevant ring for this AGE parameter */
+		if (priv->hws_strict_queue)
+			r = age_info->hw_q_age->aged_lists[param->queue_id];
+		else
+			r = age_info->hw_age.aged_list;
+		/* Changing the state atomically and insert it into the ring. */
+		if (__atomic_compare_exchange_n(&param->state, &expected1,
+						HWS_AGE_AGED_OUT_NOT_REPORTED,
+						false, __ATOMIC_RELAXED,
+						__ATOMIC_RELAXED)) {
+			int ret = rte_ring_enqueue_burst_elem(r, &age_idx,
+							      sizeof(uint32_t),
+							      1, NULL);
+
+			/*
+			 * The ring does not have enough room for this entry;
+			 * put the state back so it is retried next second.
+			 *
+			 * FIXME: if the flow gets traffic before the next
+			 *        second, this "aged out" event is lost; to be
+			 *        fixed later by filling the ring in bulks.
+			 */
+			expected2 = HWS_AGE_AGED_OUT_NOT_REPORTED;
+			if (ret < 0 &&
+			    !__atomic_compare_exchange_n(&param->state,
+							 &expected2, expected1,
+							 false,
+							 __ATOMIC_RELAXED,
+							 __ATOMIC_RELAXED) &&
+			    expected2 == HWS_AGE_FREE)
+				mlx5_hws_age_param_free(priv,
+							param->own_cnt_index,
+							age_info->ages_ipool,
+							age_idx);
+			/* The event is irrelevant in strict queue mode. */
+			if (!priv->hws_strict_queue)
+				MLX5_AGE_SET(age_info, MLX5_AGE_EVENT_NEW);
+		} else {
+			__atomic_compare_exchange_n(&param->state, &expected2,
+						  HWS_AGE_AGED_OUT_NOT_REPORTED,
+						  false, __ATOMIC_RELAXED,
+						  __ATOMIC_RELAXED);
+		}
+	}
+	/* The event is irrelevant in strict queue mode. */
+	if (!priv->hws_strict_queue)
+		mlx5_age_event_prepare(priv->sh);
+}
+
 static void
 mlx5_hws_cnt_raw_data_free(struct mlx5_dev_ctx_shared *sh,
 			   struct mlx5_hws_cnt_raw_data_mng *mng)
@@ -104,12 +273,14 @@ mlx5_hws_cnt_raw_data_alloc(struct mlx5_dev_ctx_shared *sh, uint32_t n)
 	struct mlx5_hws_cnt_raw_data_mng *mng = NULL;
 	int ret;
 	size_t sz = n * sizeof(struct flow_counter_stats);
+	size_t pgsz = rte_mem_page_size();
 
+	MLX5_ASSERT(pgsz > 0);
 	mng = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO, sizeof(*mng), 0,
 			SOCKET_ID_ANY);
 	if (mng == NULL)
 		goto error;
-	mng->raw = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO, sz, 0,
+	mng->raw = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO, sz, pgsz,
 			SOCKET_ID_ANY);
 	if (mng->raw == NULL)
 		goto error;
@@ -146,6 +317,9 @@ mlx5_hws_cnt_svc(void *opaque)
 			    opriv->sh == sh &&
 			    opriv->hws_cpool != NULL) {
 				__mlx5_hws_cnt_svc(sh, opriv->hws_cpool);
+				if (opriv->hws_age_req)
+					mlx5_hws_aging_check(opriv,
+							     opriv->hws_cpool);
 			}
 		}
 		query_cycle = rte_rdtsc() - start_cycle;
@@ -158,8 +332,9 @@ mlx5_hws_cnt_svc(void *opaque)
 }
 
 struct mlx5_hws_cnt_pool *
-mlx5_hws_cnt_pool_init(const struct mlx5_hws_cnt_pool_cfg *pcfg,
-		const struct mlx5_hws_cache_param *ccfg)
+mlx5_hws_cnt_pool_init(struct mlx5_dev_ctx_shared *sh,
+		       const struct mlx5_hws_cnt_pool_cfg *pcfg,
+		       const struct mlx5_hws_cache_param *ccfg)
 {
 	char mz_name[RTE_MEMZONE_NAMESIZE];
 	struct mlx5_hws_cnt_pool *cntp;
@@ -185,16 +360,26 @@ mlx5_hws_cnt_pool_init(const struct mlx5_hws_cnt_pool_cfg *pcfg,
 	cntp->cache->preload_sz = ccfg->preload_sz;
 	cntp->cache->threshold = ccfg->threshold;
 	cntp->cache->q_num = ccfg->q_num;
+	if (pcfg->request_num > sh->hws_max_nb_counters) {
+		DRV_LOG(ERR, "Counter number %u "
+			"is greater than the maximum supported (%u).",
+			pcfg->request_num, sh->hws_max_nb_counters);
+		goto error;
+	}
 	cnt_num = pcfg->request_num * (100 + pcfg->alloc_factor) / 100;
 	if (cnt_num > UINT32_MAX) {
 		DRV_LOG(ERR, "counter number %"PRIu64" is out of 32bit range",
 			cnt_num);
 		goto error;
 	}
+	/*
+	 * When the requested counter number is supported but the allocation
+	 * factor takes it above the maximum, the factor is effectively reduced.
+	 */
+	cnt_num = RTE_MIN((uint32_t)cnt_num, sh->hws_max_nb_counters);
 	cntp->pool = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO,
-			sizeof(struct mlx5_hws_cnt) *
-			pcfg->request_num * (100 + pcfg->alloc_factor) / 100,
-			0, SOCKET_ID_ANY);
+				 sizeof(struct mlx5_hws_cnt) * cnt_num,
+				 0, SOCKET_ID_ANY);
 	if (cntp->pool == NULL)
 		goto error;
 	snprintf(mz_name, sizeof(mz_name), "%s_F_RING", pcfg->name);
@@ -231,6 +416,8 @@ mlx5_hws_cnt_pool_init(const struct mlx5_hws_cnt_pool_cfg *pcfg,
 		if (cntp->cache->qcache[qidx] == NULL)
 			goto error;
 	}
+	/* Initialize the time for aging-out calculation. */
+	cntp->time_of_last_age_check = MLX5_CURR_TIME_SEC;
 	return cntp;
 error:
 	mlx5_hws_cnt_pool_deinit(cntp);
@@ -297,19 +484,17 @@ mlx5_hws_cnt_pool_dcs_alloc(struct mlx5_dev_ctx_shared *sh,
 			    struct mlx5_hws_cnt_pool *cpool)
 {
 	struct mlx5_hca_attr *hca_attr = &sh->cdev->config.hca_attr;
-	uint32_t max_log_bulk_sz = 0;
+	uint32_t max_log_bulk_sz = sh->hws_max_log_bulk_sz;
 	uint32_t log_bulk_sz;
-	uint32_t idx, alloced = 0;
+	uint32_t idx, alloc_candidate, alloced = 0;
 	unsigned int cnt_num = mlx5_hws_cnt_pool_get_size(cpool);
 	struct mlx5_devx_counter_attr attr = {0};
 	struct mlx5_devx_obj *dcs;
 
 	if (hca_attr->flow_counter_bulk_log_max_alloc == 0) {
-		DRV_LOG(ERR,
-			"Fw doesn't support bulk log max alloc");
+		DRV_LOG(ERR, "Fw doesn't support bulk log max alloc");
 		return -1;
 	}
-	max_log_bulk_sz = 23; /* hard code to 8M (1 << 23). */
 	cnt_num = RTE_ALIGN_CEIL(cnt_num, 4); /* minimal 4 counter in bulk. */
 	log_bulk_sz = RTE_MIN(max_log_bulk_sz, rte_log2_u32(cnt_num));
 	attr.pd = sh->cdev->pdn;
@@ -327,18 +512,23 @@ mlx5_hws_cnt_pool_dcs_alloc(struct mlx5_dev_ctx_shared *sh,
 	cpool->dcs_mng.dcs[0].iidx = 0;
 	alloced = cpool->dcs_mng.dcs[0].batch_sz;
 	if (cnt_num > cpool->dcs_mng.dcs[0].batch_sz) {
-		for (; idx < MLX5_HWS_CNT_DCS_NUM; idx++) {
+		while (idx < MLX5_HWS_CNT_DCS_NUM) {
 			attr.flow_counter_bulk_log_size = --max_log_bulk_sz;
+			alloc_candidate = RTE_BIT32(max_log_bulk_sz);
+			if (alloced + alloc_candidate > sh->hws_max_nb_counters)
+				continue;
 			dcs = mlx5_devx_cmd_flow_counter_alloc_general
 				(sh->cdev->ctx, &attr);
 			if (dcs == NULL)
 				goto error;
 			cpool->dcs_mng.dcs[idx].obj = dcs;
-			cpool->dcs_mng.dcs[idx].batch_sz =
-				(1 << max_log_bulk_sz);
+			cpool->dcs_mng.dcs[idx].batch_sz = alloc_candidate;
 			cpool->dcs_mng.dcs[idx].iidx = alloced;
 			alloced += cpool->dcs_mng.dcs[idx].batch_sz;
 			cpool->dcs_mng.batch_total++;
+			if (alloced >= cnt_num)
+				break;
+			idx++;
 		}
 	}
 	return 0;
@@ -445,7 +635,7 @@ mlx5_hws_cnt_pool_create(struct rte_eth_dev *dev,
 			dev->data->port_id);
 	pcfg.name = mp_name;
 	pcfg.request_num = pattr->nb_counters;
-	cpool = mlx5_hws_cnt_pool_init(&pcfg, &cparam);
+	cpool = mlx5_hws_cnt_pool_init(priv->sh, &pcfg, &cparam);
 	if (cpool == NULL)
 		goto error;
 	ret = mlx5_hws_cnt_pool_dcs_alloc(priv->sh, cpool);
@@ -525,4 +715,484 @@ mlx5_hws_cnt_svc_deinit(struct mlx5_dev_ctx_shared *sh)
 	sh->cnt_svc = NULL;
 }
 
+/**
+ * Destroy AGE action.
+ *
+ * @param priv
+ *   Pointer to the port private data structure.
+ * @param idx
+ *   Index of AGE parameter.
+ * @param error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_hws_age_action_destroy(struct mlx5_priv *priv, uint32_t idx,
+			    struct rte_flow_error *error)
+{
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct mlx5_indexed_pool *ipool = age_info->ages_ipool;
+	struct mlx5_hws_age_param *param = mlx5_ipool_get(ipool, idx);
+
+	if (param == NULL)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  "invalid AGE parameter index");
+	switch (__atomic_exchange_n(&param->state, HWS_AGE_FREE,
+				    __ATOMIC_RELAXED)) {
+	case HWS_AGE_CANDIDATE:
+	case HWS_AGE_AGED_OUT_REPORTED:
+		mlx5_hws_age_param_free(priv, param->own_cnt_index, ipool, idx);
+		break;
+	case HWS_AGE_AGED_OUT_NOT_REPORTED:
+	case HWS_AGE_CANDIDATE_INSIDE_RING:
+		/*
+		 * In both cases AGE is inside the ring. Change the state here
+		 * and destroy it later when it is taken out of ring.
+		 */
+		break;
+	case HWS_AGE_FREE:
+		/*
+		 * If index is valid and state is FREE, it says this AGE has
+		 * If the index is valid and the state is FREE, this AGE has
+		 * already been freed for the user but not for the PMD, since
+		 * it is still inside the ring.
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  "this AGE has already been released");
+	default:
+		MLX5_ASSERT(0);
+		break;
+	}
+	return 0;
+}
+
+/**
+ * Create AGE action parameter.
+ *
+ * @param[in] priv
+ *   Pointer to the port private data structure.
+ * @param[in] queue_id
+ *   Which HWS queue to be used.
+ * @param[in] shared
+ *   Whether it is an indirect AGE action.
+ * @param[in] flow_idx
+ *   Flow index from indexed pool.
+ *   For an indirect AGE action this parameter has no effect.
+ * @param[in] age
+ *   Pointer to the aging action configuration.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   Index to AGE action parameter on success, 0 otherwise.
+ */
+uint32_t
+mlx5_hws_age_action_create(struct mlx5_priv *priv, uint32_t queue_id,
+			   bool shared, const struct rte_flow_action_age *age,
+			   uint32_t flow_idx, struct rte_flow_error *error)
+{
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct mlx5_indexed_pool *ipool = age_info->ages_ipool;
+	struct mlx5_hws_age_param *param;
+	uint32_t age_idx;
+
+	param = mlx5_ipool_malloc(ipool, &age_idx);
+	if (param == NULL) {
+		rte_flow_error_set(error, ENOMEM,
+				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				   "cannot allocate AGE parameter");
+		return 0;
+	}
+	MLX5_ASSERT(__atomic_load_n(&param->state,
+				    __ATOMIC_RELAXED) == HWS_AGE_FREE);
+	if (shared) {
+		param->nb_cnts = 0;
+		param->accumulator_hits = 0;
+		param->accumulator_cnt = 0;
+		flow_idx = age_idx;
+	} else {
+		param->nb_cnts = 1;
+	}
+	param->context = age->context ? age->context :
+					(void *)(uintptr_t)flow_idx;
+	param->timeout = age->timeout;
+	param->queue_id = queue_id;
+	param->accumulator_last_hits = 0;
+	param->own_cnt_index = 0;
+	param->sec_since_last_hit = 0;
+	param->state = HWS_AGE_CANDIDATE;
+	return age_idx;
+}
+
+/**
+ * Update indirect AGE action parameter.
+ *
+ * @param[in] priv
+ *   Pointer to the port private data structure.
+ * @param[in] idx
+ *   Index of AGE parameter.
+ * @param[in] update
+ *   Update value.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_hws_age_action_update(struct mlx5_priv *priv, uint32_t idx,
+			   const void *update, struct rte_flow_error *error)
+{
+	const struct rte_flow_update_age *update_ade = update;
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct mlx5_indexed_pool *ipool = age_info->ages_ipool;
+	struct mlx5_hws_age_param *param = mlx5_ipool_get(ipool, idx);
+	bool sec_since_last_hit_reset = false;
+	bool state_update = false;
+
+	if (param == NULL)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  "invalid AGE parameter index");
+	if (update_ade->timeout_valid) {
+		uint32_t old_timeout = __atomic_exchange_n(&param->timeout,
+							   update_ade->timeout,
+							   __ATOMIC_RELAXED);
+
+		if (old_timeout == 0)
+			sec_since_last_hit_reset = true;
+		else if (old_timeout < update_ade->timeout ||
+			 update_ade->timeout == 0)
+			/*
+			 * When timeout is increased, aged-out flows might be
+			 * active again and state should be updated accordingly.
+			 * When the new timeout is 0, the state is updated so
+			 * that aged-out flows are not reported anymore.
+			 */
+			state_update = true;
+	}
+	if (update_ade->touch) {
+		sec_since_last_hit_reset = true;
+		state_update = true;
+	}
+	if (sec_since_last_hit_reset)
+		__atomic_store_n(&param->sec_since_last_hit, 0,
+				 __ATOMIC_RELAXED);
+	if (state_update) {
+		uint16_t expected = HWS_AGE_AGED_OUT_NOT_REPORTED;
+
+		/*
+		 * Change states of aged-out flows to active:
+		 *  - AGED_OUT_NOT_REPORTED -> CANDIDATE_INSIDE_RING
+		 *  - AGED_OUT_REPORTED -> CANDIDATE
+		 */
+		if (!__atomic_compare_exchange_n(&param->state, &expected,
+						 HWS_AGE_CANDIDATE_INSIDE_RING,
+						 false, __ATOMIC_RELAXED,
+						 __ATOMIC_RELAXED) &&
+		    expected == HWS_AGE_AGED_OUT_REPORTED)
+			__atomic_store_n(&param->state, HWS_AGE_CANDIDATE,
+					 __ATOMIC_RELAXED);
+	}
+	return 0;
+}
+
+/**
+ * Get the AGE context if the aged-out index is still valid.
+ *
+ * @param priv
+ *   Pointer to the port private data structure.
+ * @param idx
+ *   Index of AGE parameter.
+ *
+ * @return
+ *   AGE context if the index is still aged-out, NULL otherwise.
+ */
+void *
+mlx5_hws_age_context_get(struct mlx5_priv *priv, uint32_t idx)
+{
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct mlx5_indexed_pool *ipool = age_info->ages_ipool;
+	struct mlx5_hws_age_param *param = mlx5_ipool_get(ipool, idx);
+	uint16_t expected = HWS_AGE_AGED_OUT_NOT_REPORTED;
+
+	MLX5_ASSERT(param != NULL);
+	if (__atomic_compare_exchange_n(&param->state, &expected,
+					HWS_AGE_AGED_OUT_REPORTED, false,
+					__ATOMIC_RELAXED, __ATOMIC_RELAXED))
+		return param->context;
+	switch (expected) {
+	case HWS_AGE_FREE:
+		/*
+		 * This AGE could not be destroyed earlier since it was inside
+		 * the ring. Its state has now been updated, so it is actually
+		 * destroyed here.
+		 */
+		mlx5_hws_age_param_free(priv, param->own_cnt_index, ipool, idx);
+		break;
+	case HWS_AGE_CANDIDATE_INSIDE_RING:
+		__atomic_store_n(&param->state, HWS_AGE_CANDIDATE,
+				 __ATOMIC_RELAXED);
+		break;
+	case HWS_AGE_CANDIDATE:
+		/*
+		 * Only the BG thread pushes to the ring and it never pushes
+		 * this state. When an AGE inside the ring becomes a candidate,
+		 * it has a special state called HWS_AGE_CANDIDATE_INSIDE_RING.
+		 * Fall-through.
+		 */
+	case HWS_AGE_AGED_OUT_REPORTED:
+		/*
+		 * Only this thread (doing query) may write this state, and it
+		 * happens only after the query thread takes it out of the ring.
+		 * Fall-through.
+		 */
+	case HWS_AGE_AGED_OUT_NOT_REPORTED:
+		/*
+		 * In this case the compare returns true and the function
+		 * returns the context immediately.
+		 * Fall-through.
+		 */
+	default:
+		MLX5_ASSERT(0);
+		break;
+	}
+	return NULL;
+}
+
+#ifdef RTE_ARCH_64
+#define MLX5_HWS_AGED_OUT_RING_SIZE_MAX UINT32_MAX
+#else
+#define MLX5_HWS_AGED_OUT_RING_SIZE_MAX RTE_BIT32(8)
+#endif
+
+/**
+ * Get the size of aged out ring list for each queue.
+ *
+ * The size is one percent of nb_counters divided by nb_queues.
+ * The ring size must be a power of 2, so it is aligned up to a power of 2.
+ * On 32-bit systems, the size is limited to 256.
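+ * For example, with 1,000,000 counters and 8 queues the raw value is
+ * (1000000 / 100) / 8 = 1250, which is aligned up to 2048 entries.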
+ *
+ * This function is called when RTE_FLOW_PORT_FLAG_STRICT_QUEUE is on.
+ *
+ * @param nb_counters
+ *   Final number of allocated counters in the pool.
+ * @param nb_queues
+ *   Number of HWS queues in this port.
+ *
+ * @return
+ *   Size of aged out ring per queue.
+ */
+static __rte_always_inline uint32_t
+mlx5_hws_aged_out_q_ring_size_get(uint32_t nb_counters, uint32_t nb_queues)
+{
+	uint32_t size = rte_align32pow2((nb_counters / 100) / nb_queues);
+	uint32_t max_size = MLX5_HWS_AGED_OUT_RING_SIZE_MAX;
+
+	return RTE_MIN(size, max_size);
+}
+
+/**
+ * Get the size of the aged out ring list.
+ *
+ * The size is one percent of nb_counters.
+ * The ring size must be a power of 2, so it is aligned up to a power of 2.
+ * On 32-bit systems, the size is limited to 256.
+ *
+ * This function is called when RTE_FLOW_PORT_FLAG_STRICT_QUEUE is off.
+ *
+ * @param nb_counters
+ *   Final number of allocated counters in the pool.
+ *
+ * @return
+ *   Size of the aged out ring list.
+ */
+static __rte_always_inline uint32_t
+mlx5_hws_aged_out_ring_size_get(uint32_t nb_counters)
+{
+	uint32_t size = rte_align32pow2(nb_counters / 100);
+	uint32_t max_size = MLX5_HWS_AGED_OUT_RING_SIZE_MAX;
+
+	return RTE_MIN(size, max_size);
+}
+
+/**
+ * Initialize the shared aging list information per port.
+ *
+ * @param dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param nb_queues
+ *   Number of HWS queues.
+ * @param strict_queue
+ *   Indicator whether strict_queue mode is enabled.
+ * @param ring_size
+ *   Size of aged-out ring for creation.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_hws_age_info_init(struct rte_eth_dev *dev, uint16_t nb_queues,
+		       bool strict_queue, uint32_t ring_size)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	uint32_t flags = RING_F_SP_ENQ | RING_F_SC_DEQ | RING_F_EXACT_SZ;
+	char mz_name[RTE_MEMZONE_NAMESIZE];
+	struct rte_ring *r = NULL;
+	uint32_t qidx;
+
+	age_info->flags = 0;
+	if (strict_queue) {
+		size_t size = sizeof(*age_info->hw_q_age) +
+			      sizeof(struct rte_ring *) * nb_queues;
+
+		age_info->hw_q_age = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO,
+						 size, 0, SOCKET_ID_ANY);
+		if (age_info->hw_q_age == NULL)
+			return -ENOMEM;
+		for (qidx = 0; qidx < nb_queues; ++qidx) {
+			snprintf(mz_name, sizeof(mz_name),
+				 "port_%u_queue_%u_aged_out_ring",
+				 dev->data->port_id, qidx);
+			r = rte_ring_create(mz_name, ring_size, SOCKET_ID_ANY,
+					    flags);
+			if (r == NULL) {
+				DRV_LOG(ERR, "\"%s\" creation failed: %s",
+					mz_name, rte_strerror(rte_errno));
+				goto error;
+			}
+			age_info->hw_q_age->aged_lists[qidx] = r;
+			DRV_LOG(DEBUG,
+				"\"%s\" is successfully created (size=%u).",
+				mz_name, ring_size);
+		}
+		age_info->hw_q_age->nb_rings = nb_queues;
+	} else {
+		snprintf(mz_name, sizeof(mz_name), "port_%u_aged_out_ring",
+			 dev->data->port_id);
+		r = rte_ring_create(mz_name, ring_size, SOCKET_ID_ANY, flags);
+		if (r == NULL) {
+			DRV_LOG(ERR, "\"%s\" creation failed: %s", mz_name,
+				rte_strerror(rte_errno));
+			return -rte_errno;
+		}
+		age_info->hw_age.aged_list = r;
+		DRV_LOG(DEBUG, "\"%s\" is successfully created (size=%u).",
+			mz_name, ring_size);
+		/* In non "strict_queue" mode, initialize the event. */
+		MLX5_AGE_SET(age_info, MLX5_AGE_TRIGGER);
+	}
+	return 0;
+error:
+	MLX5_ASSERT(strict_queue);
+	while (qidx--)
+		rte_ring_free(age_info->hw_q_age->aged_lists[qidx]);
+	rte_free(age_info->hw_q_age);
+	return -1;
+}
+
+/**
+ * Destroy the shared aging list information per port.
+ *
+ * @param priv
+ *   Pointer to port private object.
+ */
+static void
+mlx5_hws_age_info_destroy(struct mlx5_priv *priv)
+{
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	uint16_t nb_queues = age_info->hw_q_age->nb_rings;
+
+	if (priv->hws_strict_queue) {
+		uint32_t qidx;
+
+		for (qidx = 0; qidx < nb_queues; ++qidx)
+			rte_ring_free(age_info->hw_q_age->aged_lists[qidx]);
+		rte_free(age_info->hw_q_age);
+	} else {
+		rte_ring_free(age_info->hw_age.aged_list);
+	}
+}
+
+/**
+ * Initialize the aging mechanism per port.
+ *
+ * @param dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param attr
+ *   Port configuration attributes.
+ * @param nb_queues
+ *   Number of HWS queues.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_hws_age_pool_init(struct rte_eth_dev *dev,
+		       const struct rte_flow_port_attr *attr,
+		       uint16_t nb_queues)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct mlx5_indexed_pool_config cfg = {
+		.size =
+		      RTE_CACHE_LINE_ROUNDUP(sizeof(struct mlx5_hws_age_param)),
+		.need_lock = 1,
+		.release_mem_en = !!priv->sh->config.reclaim_mode,
+		.malloc = mlx5_malloc,
+		.free = mlx5_free,
+		.type = "mlx5_hws_age_pool",
+	};
+	bool strict_queue = !!(attr->flags & RTE_FLOW_PORT_FLAG_STRICT_QUEUE);
+	uint32_t nb_alloc_cnts;
+	uint32_t rsize;
+	uint32_t nb_ages_updated;
+	int ret;
+
+	MLX5_ASSERT(priv->hws_cpool);
+	nb_alloc_cnts = mlx5_hws_cnt_pool_get_size(priv->hws_cpool);
+	if (strict_queue) {
+		rsize = mlx5_hws_aged_out_q_ring_size_get(nb_alloc_cnts,
+							  nb_queues);
+		nb_ages_updated = rsize * nb_queues + attr->nb_aging_objects;
+	} else {
+		rsize = mlx5_hws_aged_out_ring_size_get(nb_alloc_cnts);
+		nb_ages_updated = rsize + attr->nb_aging_objects;
+	}
+	ret = mlx5_hws_age_info_init(dev, nb_queues, strict_queue, rsize);
+	if (ret < 0)
+		return ret;
+	cfg.trunk_size = rte_align32pow2(nb_ages_updated);
+	age_info->ages_ipool = mlx5_ipool_create(&cfg);
+	if (age_info->ages_ipool == NULL) {
+		mlx5_hws_age_info_destroy(priv);
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	priv->hws_age_req = 1;
+	return 0;
+}
+
+/**
+ * Cleanup all aging resources per port.
+ *
+ * @param priv
+ *   Pointer to port private object.
+ */
+void
+mlx5_hws_age_pool_destroy(struct mlx5_priv *priv)
+{
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+
+	MLX5_ASSERT(priv->hws_age_req);
+	mlx5_ipool_destroy(age_info->ages_ipool);
+	age_info->ages_ipool = NULL;
+	mlx5_hws_age_info_destroy(priv);
+	priv->hws_age_req = 0;
+}
+
 #endif
diff --git a/drivers/net/mlx5/mlx5_hws_cnt.h b/drivers/net/mlx5/mlx5_hws_cnt.h
index 5fab4ba597..e311923f71 100644
--- a/drivers/net/mlx5/mlx5_hws_cnt.h
+++ b/drivers/net/mlx5/mlx5_hws_cnt.h
@@ -10,26 +10,26 @@
 #include "mlx5_flow.h"
 
 /*
- * COUNTER ID's layout
+ * HWS COUNTER ID's layout
  *       3                   2                   1                   0
  *     1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
  *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- *    | T |       | D |                                               |
- *    ~ Y |       | C |                    IDX                        ~
- *    | P |       | S |                                               |
+ *    |  T  |     | D |                                               |
+ *    ~  Y  |     | C |                    IDX                        ~
+ *    |  P  |     | S |                                               |
  *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  *
- *    Bit 31:30 = TYPE = MLX5_INDIRECT_ACTION_TYPE_COUNT = b'10
+ *    Bit 31:29 = TYPE = MLX5_INDIRECT_ACTION_TYPE_COUNT = b'10
  *    Bit 25:24 = DCS index
  *    Bit 23:00 = IDX in this counter belonged DCS bulk.
  */
-typedef uint32_t cnt_id_t;
 
-#define MLX5_HWS_CNT_DCS_NUM 4
 #define MLX5_HWS_CNT_DCS_IDX_OFFSET 24
 #define MLX5_HWS_CNT_DCS_IDX_MASK 0x3
 #define MLX5_HWS_CNT_IDX_MASK ((1UL << MLX5_HWS_CNT_DCS_IDX_OFFSET) - 1)
 
+#define MLX5_HWS_AGE_IDX_MASK (RTE_BIT32(MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1)
+
 struct mlx5_hws_cnt_dcs {
 	void *dr_action;
 	uint32_t batch_sz;
@@ -44,12 +44,22 @@ struct mlx5_hws_cnt_dcs_mng {
 
 struct mlx5_hws_cnt {
 	struct flow_counter_stats reset;
+	bool in_used; /* Indicator whether this counter is in use or in the pool. */
 	union {
-		uint32_t share: 1;
-		/*
-		 * share will be set to 1 when this counter is used as indirect
-		 * action. Only meaningful when user own this counter.
-		 */
+		struct {
+			uint32_t share:1;
+			/*
+			 * share will be set to 1 when this counter is used as
+			 * indirect action.
+			 */
+			uint32_t age_idx:24;
+			/*
+			 * When this counter is used for aging, it saves the
+			 * index of the AGE parameter. For a pure counter
+			 * (without aging) this index is zero.
+			 */
+		};
+		/* This struct is only meaningful when the user owns this counter. */
 		uint32_t query_gen_when_free;
 		/*
 		 * When PMD own this counter (user put back counter to PMD
@@ -96,8 +106,48 @@ struct mlx5_hws_cnt_pool {
 	struct rte_ring *free_list;
 	struct rte_ring *wait_reset_list;
 	struct mlx5_hws_cnt_pool_caches *cache;
+	uint64_t time_of_last_age_check;
 } __rte_cache_aligned;
 
+/* HWS AGE status. */
+enum {
+	HWS_AGE_FREE, /* Initialized state. */
+	HWS_AGE_CANDIDATE, /* AGE assigned to flows. */
+	HWS_AGE_CANDIDATE_INSIDE_RING,
+	/*
+	 * AGE assigned to flows but still inside the ring. It was aged-out,
+	 * but the timeout was changed, so it is in the ring but still a
+	 * candidate.
+	 */
+	HWS_AGE_AGED_OUT_REPORTED,
+	/*
+	 * Aged-out, reported by rte_flow_get_q_aged_flows and waiting for destroy.
+	 */
+	HWS_AGE_AGED_OUT_NOT_REPORTED,
+	/*
+	 * Aged-out, inside the aged-out ring.
+	 * Waiting for rte_flow_get_q_aged_flows and destroy.
+	 */
+};
+
+/* HWS counter age parameter. */
+struct mlx5_hws_age_param {
+	uint32_t timeout; /* Aging timeout in seconds (atomically accessed). */
+	uint32_t sec_since_last_hit;
+	/* Time in seconds since last hit (atomically accessed). */
+	uint16_t state; /* AGE state (atomically accessed). */
+	uint64_t accumulator_last_hits;
+	/* Last total value of hits for comparing. */
+	uint64_t accumulator_hits;
+	/* Accumulator for hits coming from several counters. */
+	uint32_t accumulator_cnt;
+	/* Number of counters which already updated the accumulator this second. */
+	uint32_t nb_cnts; /* Number of counters used by this AGE. */
+	uint32_t queue_id; /* Queue id of the counter. */
+	cnt_id_t own_cnt_index;
+	/* Counter action created specifically for this AGE action. */
+	void *context; /* Flow AGE context. */
+} __rte_packed __rte_cache_aligned;
+
 /**
  * Translate counter id into internal index (start from 0), which can be used
  * as index of raw/cnt pool.
@@ -107,7 +157,7 @@ struct mlx5_hws_cnt_pool {
  * @return
  *   Internal index
  */
-static __rte_always_inline cnt_id_t
+static __rte_always_inline uint32_t
 mlx5_hws_cnt_iidx(struct mlx5_hws_cnt_pool *cpool, cnt_id_t cnt_id)
 {
 	uint8_t dcs_idx = cnt_id >> MLX5_HWS_CNT_DCS_IDX_OFFSET;
@@ -139,7 +189,7 @@ mlx5_hws_cnt_id_valid(cnt_id_t cnt_id)
  *   Counter id
  */
 static __rte_always_inline cnt_id_t
-mlx5_hws_cnt_id_gen(struct mlx5_hws_cnt_pool *cpool, cnt_id_t iidx)
+mlx5_hws_cnt_id_gen(struct mlx5_hws_cnt_pool *cpool, uint32_t iidx)
 {
 	struct mlx5_hws_cnt_dcs_mng *dcs_mng = &cpool->dcs_mng;
 	uint32_t idx;
@@ -344,9 +394,10 @@ mlx5_hws_cnt_pool_put(struct mlx5_hws_cnt_pool *cpool,
 	struct rte_ring_zc_data zcdr = {0};
 	struct rte_ring *qcache = NULL;
 	unsigned int wb_num = 0; /* cache write-back number. */
-	cnt_id_t iidx;
+	uint32_t iidx;
 
 	iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
+	cpool->pool[iidx].in_used = false;
 	cpool->pool[iidx].query_gen_when_free =
 		__atomic_load_n(&cpool->query_gen, __ATOMIC_RELAXED);
 	if (likely(queue != NULL))
@@ -388,20 +439,23 @@ mlx5_hws_cnt_pool_put(struct mlx5_hws_cnt_pool *cpool,
  *   A pointer to HWS queue. If null, it means fetch from common pool.
  * @param cnt_id
  *   A pointer to a cnt_id_t * pointer (counter id) that will be filled.
+ * @param age_idx
+ *   Index of the AGE parameter using this counter; zero means there is no such AGE.
+ *
  * @return
  *   - 0: Success; objects taken.
  *   - -ENOENT: Not enough entries in the mempool; no object is retrieved.
  *   - -EAGAIN: counter is not ready; try again.
  */
 static __rte_always_inline int
-mlx5_hws_cnt_pool_get(struct mlx5_hws_cnt_pool *cpool,
-		uint32_t *queue, cnt_id_t *cnt_id)
+mlx5_hws_cnt_pool_get(struct mlx5_hws_cnt_pool *cpool, uint32_t *queue,
+		      cnt_id_t *cnt_id, uint32_t age_idx)
 {
 	unsigned int ret;
 	struct rte_ring_zc_data zcdc = {0};
 	struct rte_ring *qcache = NULL;
-	uint32_t query_gen = 0;
-	cnt_id_t iidx, tmp_cid = 0;
+	uint32_t iidx, query_gen = 0;
+	cnt_id_t tmp_cid = 0;
 
 	if (likely(queue != NULL))
 		qcache = cpool->cache->qcache[*queue];
@@ -422,6 +476,8 @@ mlx5_hws_cnt_pool_get(struct mlx5_hws_cnt_pool *cpool,
 		__hws_cnt_query_raw(cpool, *cnt_id,
 				    &cpool->pool[iidx].reset.hits,
 				    &cpool->pool[iidx].reset.bytes);
+		cpool->pool[iidx].in_used = true;
+		cpool->pool[iidx].age_idx = age_idx;
 		return 0;
 	}
 	ret = rte_ring_dequeue_zc_burst_elem_start(qcache, sizeof(cnt_id_t), 1,
@@ -455,6 +511,8 @@ mlx5_hws_cnt_pool_get(struct mlx5_hws_cnt_pool *cpool,
 			    &cpool->pool[iidx].reset.bytes);
 	rte_ring_dequeue_zc_elem_finish(qcache, 1);
 	cpool->pool[iidx].share = 0;
+	cpool->pool[iidx].in_used = true;
+	cpool->pool[iidx].age_idx = age_idx;
 	return 0;
 }
 
@@ -478,16 +536,16 @@ mlx5_hws_cnt_pool_get_action_offset(struct mlx5_hws_cnt_pool *cpool,
 }
 
 static __rte_always_inline int
-mlx5_hws_cnt_shared_get(struct mlx5_hws_cnt_pool *cpool, cnt_id_t *cnt_id)
+mlx5_hws_cnt_shared_get(struct mlx5_hws_cnt_pool *cpool, cnt_id_t *cnt_id,
+			uint32_t age_idx)
 {
 	int ret;
 	uint32_t iidx;
 
-	ret = mlx5_hws_cnt_pool_get(cpool, NULL, cnt_id);
+	ret = mlx5_hws_cnt_pool_get(cpool, NULL, cnt_id, age_idx);
 	if (ret != 0)
 		return ret;
 	iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
-	MLX5_ASSERT(cpool->pool[iidx].share == 0);
 	cpool->pool[iidx].share = 1;
 	return 0;
 }
@@ -513,10 +571,73 @@ mlx5_hws_cnt_is_shared(struct mlx5_hws_cnt_pool *cpool, cnt_id_t cnt_id)
 	return cpool->pool[iidx].share ? true : false;
 }
 
+static __rte_always_inline void
+mlx5_hws_cnt_age_set(struct mlx5_hws_cnt_pool *cpool, cnt_id_t cnt_id,
+		     uint32_t age_idx)
+{
+	uint32_t iidx = mlx5_hws_cnt_iidx(cpool, cnt_id);
+
+	MLX5_ASSERT(cpool->pool[iidx].share);
+	cpool->pool[iidx].age_idx = age_idx;
+}
+
+static __rte_always_inline uint32_t
+mlx5_hws_cnt_age_get(struct mlx5_hws_cnt_pool *cpool, cnt_id_t cnt_id)
+{
+	uint32_t iidx = mlx5_hws_cnt_iidx(cpool, cnt_id);
+
+	MLX5_ASSERT(cpool->pool[iidx].share);
+	return cpool->pool[iidx].age_idx;
+}
+
+static __rte_always_inline cnt_id_t
+mlx5_hws_age_cnt_get(struct mlx5_priv *priv, struct mlx5_hws_age_param *param,
+		     uint32_t age_idx)
+{
+	if (!param->own_cnt_index) {
+		/* Create indirect counter one for internal usage. */
+		if (mlx5_hws_cnt_shared_get(priv->hws_cpool,
+					    &param->own_cnt_index, age_idx) < 0)
+			return 0;
+		param->nb_cnts++;
+	}
+	return param->own_cnt_index;
+}
+
+static __rte_always_inline void
+mlx5_hws_age_nb_cnt_increase(struct mlx5_priv *priv, uint32_t age_idx)
+{
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct mlx5_indexed_pool *ipool = age_info->ages_ipool;
+	struct mlx5_hws_age_param *param = mlx5_ipool_get(ipool, age_idx);
+
+	MLX5_ASSERT(param != NULL);
+	param->nb_cnts++;
+}
+
+static __rte_always_inline void
+mlx5_hws_age_nb_cnt_decrease(struct mlx5_priv *priv, uint32_t age_idx)
+{
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct mlx5_indexed_pool *ipool = age_info->ages_ipool;
+	struct mlx5_hws_age_param *param = mlx5_ipool_get(ipool, age_idx);
+
+	if (param != NULL)
+		param->nb_cnts--;
+}
+
+static __rte_always_inline bool
+mlx5_hws_age_is_indirect(uint32_t age_idx)
+{
+	return (age_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET) ==
+		MLX5_INDIRECT_ACTION_TYPE_AGE ? true : false;
+}
+
 /* init HWS counter pool. */
 struct mlx5_hws_cnt_pool *
-mlx5_hws_cnt_pool_init(const struct mlx5_hws_cnt_pool_cfg *pcfg,
-		const struct mlx5_hws_cache_param *ccfg);
+mlx5_hws_cnt_pool_init(struct mlx5_dev_ctx_shared *sh,
+		       const struct mlx5_hws_cnt_pool_cfg *pcfg,
+		       const struct mlx5_hws_cache_param *ccfg);
 
 void
 mlx5_hws_cnt_pool_deinit(struct mlx5_hws_cnt_pool *cntp);
@@ -555,4 +676,28 @@ mlx5_hws_cnt_svc_init(struct mlx5_dev_ctx_shared *sh);
 void
 mlx5_hws_cnt_svc_deinit(struct mlx5_dev_ctx_shared *sh);
 
+int
+mlx5_hws_age_action_destroy(struct mlx5_priv *priv, uint32_t idx,
+			    struct rte_flow_error *error);
+
+uint32_t
+mlx5_hws_age_action_create(struct mlx5_priv *priv, uint32_t queue_id,
+			   bool shared, const struct rte_flow_action_age *age,
+			   uint32_t flow_idx, struct rte_flow_error *error);
+
+int
+mlx5_hws_age_action_update(struct mlx5_priv *priv, uint32_t idx,
+			   const void *update, struct rte_flow_error *error);
+
+void *
+mlx5_hws_age_context_get(struct mlx5_priv *priv, uint32_t idx);
+
+int
+mlx5_hws_age_pool_init(struct rte_eth_dev *dev,
+		       const struct rte_flow_port_attr *attr,
+		       uint16_t nb_queues);
+
+void
+mlx5_hws_age_pool_destroy(struct mlx5_priv *priv);
+
 #endif /* _MLX5_HWS_CNT_H_ */
-- 
2.25.1

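As a usage sketch for the AGE support added above (assuming a port
configured with RTE_FLOW_PORT_FLAG_STRICT_QUEUE; port_id, queue_id and
handle_aged_flow() are illustrative placeholders, error handling
trimmed), an application could drain the per-queue aged-out ring as
follows:

	void *contexts[64];
	struct rte_flow_error error;
	int n, i;

	/* Fetch contexts of flows that aged out on this HWS queue. */
	n = rte_flow_get_q_aged_flows(port_id, queue_id, contexts,
				      RTE_DIM(contexts), &error);
	for (i = 0; i < n; i++) {
		/*
		 * The context is rte_flow_action_age.context if it was set,
		 * otherwise the flow/AGE index assigned by the PMD.
		 */
		handle_aged_flow(contexts[i]);
	}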

^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v2 14/17] net/mlx5: add async action push and pull support
  2022-09-28  3:31 ` [PATCH v2 00/17] net/mlx5: HW steering PMD update Suanming Mou
                     ` (12 preceding siblings ...)
  2022-09-28  3:31   ` [PATCH v2 13/17] net/mlx5: add HWS AGE action support Suanming Mou
@ 2022-09-28  3:31   ` Suanming Mou
  2022-09-28  3:31   ` [PATCH v2 15/17] net/mlx5: support flow integrity in HWS group 0 Suanming Mou
                     ` (2 subsequent siblings)
  16 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-28  3:31 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika

The queue-based rte_flow_async_action_* functions work the same way as
the queue-based async flow functions: the operations can be pushed
asynchronously, and so can the pull.

This commit adds the missing async action push and pull support.
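
A minimal usage sketch of the queue-based flow of these operations
(port_id, handle and user_data are assumed to exist already; error
handling is trimmed):

	struct rte_flow_op_attr op_attr = { .postpone = 1 };
	struct rte_flow_op_result res[4];
	struct rte_flow_action_conntrack profile;
	struct rte_flow_error error;
	int n;

	/* Enqueue an indirect action query on queue 0, postponed. */
	rte_flow_async_action_handle_query(port_id, 0, &op_attr, handle,
					   &profile, user_data, &error);
	/* Push all postponed operations of queue 0 to the HW. */
	rte_flow_push(port_id, 0, &error);
	/* Poll completions; user_data identifies the finished query. */
	n = rte_flow_pull(port_id, 0, res, RTE_DIM(res), &error);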

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/net/mlx5/mlx5.h            |  62 ++++-
 drivers/net/mlx5/mlx5_flow.c       |  45 ++++
 drivers/net/mlx5/mlx5_flow.h       |  17 ++
 drivers/net/mlx5/mlx5_flow_aso.c   | 181 +++++++++++--
 drivers/net/mlx5/mlx5_flow_dv.c    |   7 +-
 drivers/net/mlx5/mlx5_flow_hw.c    | 412 +++++++++++++++++++++++++----
 drivers/net/mlx5/mlx5_flow_meter.c |   6 +-
 7 files changed, 626 insertions(+), 104 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index eca719f269..5d92df8965 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -341,6 +341,8 @@ struct mlx5_lb_ctx {
 enum {
 	MLX5_HW_Q_JOB_TYPE_CREATE, /* Flow create job type. */
 	MLX5_HW_Q_JOB_TYPE_DESTROY, /* Flow destroy job type. */
+	MLX5_HW_Q_JOB_TYPE_UPDATE,
+	MLX5_HW_Q_JOB_TYPE_QUERY,
 };
 
 #define MLX5_HW_MAX_ITEMS (16)
@@ -348,12 +350,23 @@ enum {
 /* HW steering flow management job descriptor. */
 struct mlx5_hw_q_job {
 	uint32_t type; /* Job type. */
-	struct rte_flow_hw *flow; /* Flow attached to the job. */
+	union {
+		struct rte_flow_hw *flow; /* Flow attached to the job. */
+		const void *action; /* Indirect action attached to the job. */
+	};
 	void *user_data; /* Job user data. */
 	uint8_t *encap_data; /* Encap data. */
 	struct mlx5_modification_cmd *mhdr_cmd;
 	struct rte_flow_item *items;
-	struct rte_flow_item_ethdev port_spec;
+	union {
+		struct {
+			/* Pointer to ct query user memory. */
+			struct rte_flow_action_conntrack *profile;
+			/* Pointer to ct ASO query out memory. */
+			void *out_data;
+		} __rte_packed;
+		struct rte_flow_item_ethdev port_spec;
+	} __rte_packed;
 };
 
 /* HW steering job descriptor LIFO pool. */
@@ -361,6 +374,8 @@ struct mlx5_hw_q {
 	uint32_t job_idx; /* Free job index. */
 	uint32_t size; /* LIFO size. */
 	struct mlx5_hw_q_job **job; /* LIFO header. */
+	struct rte_ring *indir_cq; /* Indirect action SW completion queue. */
+	struct rte_ring *indir_iq; /* Indirect action SW in progress queue. */
 } __rte_cache_aligned;
 
 
@@ -569,6 +584,7 @@ struct mlx5_aso_sq_elem {
 			struct mlx5_aso_ct_action *ct;
 			char *query_data;
 		};
+		void *user_data;
 	};
 };
 
@@ -578,7 +594,9 @@ struct mlx5_aso_sq {
 	struct mlx5_aso_cq cq;
 	struct mlx5_devx_sq sq_obj;
 	struct mlx5_pmd_mr mr;
+	volatile struct mlx5_aso_wqe *db;
 	uint16_t pi;
+	uint16_t db_pi;
 	uint32_t head;
 	uint32_t tail;
 	uint32_t sqn;
@@ -993,6 +1011,7 @@ struct mlx5_flow_meter_profile {
 enum mlx5_aso_mtr_state {
 	ASO_METER_FREE, /* In free list. */
 	ASO_METER_WAIT, /* ACCESS_ASO WQE in progress. */
+	ASO_METER_WAIT_ASYNC, /* CQE will be handled by async pull. */
 	ASO_METER_READY, /* CQE received. */
 };
 
@@ -1195,6 +1214,7 @@ struct mlx5_bond_info {
 enum mlx5_aso_ct_state {
 	ASO_CONNTRACK_FREE, /* Inactive, in the free list. */
 	ASO_CONNTRACK_WAIT, /* WQE sent in the SQ. */
+	ASO_CONNTRACK_WAIT_ASYNC, /* CQE will be handled by async pull. */
 	ASO_CONNTRACK_READY, /* CQE received w/o error. */
 	ASO_CONNTRACK_QUERY, /* WQE for query sent. */
 	ASO_CONNTRACK_MAX, /* Guard. */
@@ -1203,13 +1223,21 @@ enum mlx5_aso_ct_state {
 /* Generic ASO connection tracking structure. */
 struct mlx5_aso_ct_action {
 	union {
-		LIST_ENTRY(mlx5_aso_ct_action) next;
-		/* Pointer to the next ASO CT. Used only in SWS. */
-		struct mlx5_aso_ct_pool *pool;
-		/* Pointer to action pool. Used only in HWS. */
+		/* SWS mode struct. */
+		struct {
+			/* Pointer to the next ASO CT. Used only in SWS. */
+			LIST_ENTRY(mlx5_aso_ct_action) next;
+		};
+		/* HWS mode struct. */
+		struct {
+			/* Pointer to action pool. Used only in HWS. */
+			struct mlx5_aso_ct_pool *pool;
+		};
 	};
-	void *dr_action_orig; /* General action object for original dir. */
-	void *dr_action_rply; /* General action object for reply dir. */
+	/* General action object for original dir. */
+	void *dr_action_orig;
+	/* General action object for reply dir. */
+	void *dr_action_rply;
 	uint32_t refcnt; /* Action used count in device flows. */
 	uint16_t offset; /* Offset of ASO CT in DevX objects bulk. */
 	uint16_t peer; /* The only peer port index could also use this CT. */
@@ -2135,18 +2163,21 @@ int mlx5_aso_flow_hit_queue_poll_stop(struct mlx5_dev_ctx_shared *sh);
 void mlx5_aso_queue_uninit(struct mlx5_dev_ctx_shared *sh,
 			   enum mlx5_access_aso_opc_mod aso_opc_mod);
 int mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
-				 struct mlx5_aso_mtr *mtr,
-				 struct mlx5_mtr_bulk *bulk);
+		struct mlx5_aso_mtr *mtr, struct mlx5_mtr_bulk *bulk,
+		void *user_data, bool push);
 int mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 		struct mlx5_aso_mtr *mtr);
 int mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			      struct mlx5_aso_ct_action *ct,
-			      const struct rte_flow_action_conntrack *profile);
+			      const struct rte_flow_action_conntrack *profile,
+			      void *user_data,
+			      bool push);
 int mlx5_aso_ct_wait_ready(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			   struct mlx5_aso_ct_action *ct);
 int mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			     struct mlx5_aso_ct_action *ct,
-			     struct rte_flow_action_conntrack *profile);
+			     struct rte_flow_action_conntrack *profile,
+			     void *user_data, bool push);
 int mlx5_aso_ct_available(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			  struct mlx5_aso_ct_action *ct);
 uint32_t
@@ -2154,6 +2185,13 @@ mlx5_get_supported_sw_parsing_offloads(const struct mlx5_hca_attr *attr);
 uint32_t
 mlx5_get_supported_tunneling_offloads(const struct mlx5_hca_attr *attr);
 
+void mlx5_aso_ct_obj_analyze(struct rte_flow_action_conntrack *profile,
+			     char *wdata);
+void mlx5_aso_push_wqe(struct mlx5_dev_ctx_shared *sh,
+		       struct mlx5_aso_sq *sq);
+int mlx5_aso_pull_completion(struct mlx5_aso_sq *sq,
+			     struct rte_flow_op_result res[],
+			     uint16_t n_res);
 int mlx5_aso_cnt_queue_init(struct mlx5_dev_ctx_shared *sh);
 void mlx5_aso_cnt_queue_uninit(struct mlx5_dev_ctx_shared *sh);
 int mlx5_aso_cnt_query(struct mlx5_dev_ctx_shared *sh,
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 4bfa604578..bc2ccb4d3c 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -979,6 +979,14 @@ mlx5_flow_async_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 				  void *user_data,
 				  struct rte_flow_error *error);
 
+static int
+mlx5_flow_async_action_handle_query(struct rte_eth_dev *dev, uint32_t queue,
+				 const struct rte_flow_op_attr *attr,
+				 const struct rte_flow_action_handle *handle,
+				 void *data,
+				 void *user_data,
+				 struct rte_flow_error *error);
+
 static const struct rte_flow_ops mlx5_flow_ops = {
 	.validate = mlx5_flow_validate,
 	.create = mlx5_flow_create,
@@ -1015,6 +1023,7 @@ static const struct rte_flow_ops mlx5_flow_ops = {
 	.push = mlx5_flow_push,
 	.async_action_handle_create = mlx5_flow_async_action_handle_create,
 	.async_action_handle_update = mlx5_flow_async_action_handle_update,
+	.async_action_handle_query = mlx5_flow_async_action_handle_query,
 	.async_action_handle_destroy = mlx5_flow_async_action_handle_destroy,
 };
 
@@ -8858,6 +8867,42 @@ mlx5_flow_async_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 					 update, user_data, error);
 }
 
+/**
+ * Query shared action.
+ *
+ * @param[in] dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param[in] queue
+ *   Which queue to be used.
+ * @param[in] attr
+ *   Operation attribute.
+ * @param[in] handle
+ *   Action handle to be queried.
+ * @param[in] data
+ *   Pointer to the query result data.
+ * @param[in] user_data
+ *   Pointer to the user_data.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, negative value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_async_action_handle_query(struct rte_eth_dev *dev, uint32_t queue,
+				    const struct rte_flow_op_attr *attr,
+				    const struct rte_flow_action_handle *handle,
+				    void *data,
+				    void *user_data,
+				    struct rte_flow_error *error)
+{
+	const struct mlx5_flow_driver_ops *fops =
+			flow_get_drv_ops(MLX5_FLOW_TYPE_HW);
+
+	return fops->async_action_query(dev, queue, attr, handle,
+					data, user_data, error);
+}
+
 /**
  * Destroy shared action.
  *
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 30a18ea35e..e45869a890 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -57,6 +57,13 @@ enum mlx5_rte_flow_field_id {
 
 #define MLX5_INDIRECT_ACTION_TYPE_OFFSET 29
 
+#define MLX5_INDIRECT_ACTION_TYPE_GET(handle) \
+	(((uint32_t)(uintptr_t)(handle)) >> MLX5_INDIRECT_ACTION_TYPE_OFFSET)
+
+#define MLX5_INDIRECT_ACTION_IDX_GET(handle) \
+	(((uint32_t)(uintptr_t)(handle)) & \
+	 ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1))
+
 enum {
 	MLX5_INDIRECT_ACTION_TYPE_RSS,
 	MLX5_INDIRECT_ACTION_TYPE_AGE,
@@ -1816,6 +1823,15 @@ typedef int (*mlx5_flow_async_action_handle_update_t)
 			 void *user_data,
 			 struct rte_flow_error *error);
 
+typedef int (*mlx5_flow_async_action_handle_query_t)
+			(struct rte_eth_dev *dev,
+			 uint32_t queue,
+			 const struct rte_flow_op_attr *attr,
+			 const struct rte_flow_action_handle *handle,
+			 void *data,
+			 void *user_data,
+			 struct rte_flow_error *error);
+
 typedef int (*mlx5_flow_async_action_handle_destroy_t)
 			(struct rte_eth_dev *dev,
 			 uint32_t queue,
@@ -1878,6 +1894,7 @@ struct mlx5_flow_driver_ops {
 	mlx5_flow_push_t push;
 	mlx5_flow_async_action_handle_create_t async_action_create;
 	mlx5_flow_async_action_handle_update_t async_action_update;
+	mlx5_flow_async_action_handle_query_t async_action_query;
 	mlx5_flow_async_action_handle_destroy_t async_action_destroy;
 };
 
diff --git a/drivers/net/mlx5/mlx5_flow_aso.c b/drivers/net/mlx5/mlx5_flow_aso.c
index f371fff2e2..43ef893e9d 100644
--- a/drivers/net/mlx5/mlx5_flow_aso.c
+++ b/drivers/net/mlx5/mlx5_flow_aso.c
@@ -519,6 +519,70 @@ mlx5_aso_cqe_err_handle(struct mlx5_aso_sq *sq)
 			       (volatile uint32_t *)&sq->sq_obj.aso_wqes[idx]);
 }
 
+int
+mlx5_aso_pull_completion(struct mlx5_aso_sq *sq,
+			 struct rte_flow_op_result res[],
+			 uint16_t n_res)
+{
+	struct mlx5_aso_cq *cq = &sq->cq;
+	volatile struct mlx5_cqe *restrict cqe;
+	const uint32_t cq_size = 1 << cq->log_desc_n;
+	const uint32_t mask = cq_size - 1;
+	uint32_t idx;
+	uint32_t next_idx;
+	uint16_t max;
+	uint16_t n = 0;
+	int ret;
+
+	max = (uint16_t)(sq->head - sq->tail);
+	if (unlikely(!max || !n_res))
+		return 0;
+	next_idx = cq->cq_ci & mask;
+	do {
+		idx = next_idx;
+		next_idx = (cq->cq_ci + 1) & mask;
+		/* Need to confirm the position of the prefetch. */
+		rte_prefetch0(&cq->cq_obj.cqes[next_idx]);
+		cqe = &cq->cq_obj.cqes[idx];
+		ret = check_cqe(cqe, cq_size, cq->cq_ci);
+		/*
+		 * Be sure owner read is done before any other cookie field or
+		 * opaque field.
+		 */
+		rte_io_rmb();
+		if (ret == MLX5_CQE_STATUS_HW_OWN)
+			break;
+		res[n].user_data = sq->elts[(uint16_t)((sq->tail + n) & mask)].user_data;
+		if (unlikely(ret != MLX5_CQE_STATUS_SW_OWN)) {
+			mlx5_aso_cqe_err_handle(sq);
+			res[n].status = RTE_FLOW_OP_ERROR;
+		} else {
+			res[n].status = RTE_FLOW_OP_SUCCESS;
+		}
+		cq->cq_ci++;
+		if (++n == n_res)
+			break;
+	} while (1);
+	if (likely(n)) {
+		sq->tail += n;
+		rte_io_wmb();
+		cq->cq_obj.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci);
+	}
+	return n;
+}
+
+void
+mlx5_aso_push_wqe(struct mlx5_dev_ctx_shared *sh,
+		  struct mlx5_aso_sq *sq)
+{
+	if (sq->db_pi == sq->pi)
+		return;
+	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)sq->db,
+			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
+			   !sh->tx_uar.dbnc);
+	sq->db_pi = sq->pi;
+}
+
 /**
  * Update ASO objects upon completion.
  *
@@ -728,7 +792,9 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 			       struct mlx5_aso_sq *sq,
 			       struct mlx5_aso_mtr *aso_mtr,
 			       struct mlx5_mtr_bulk *bulk,
-				   bool need_lock)
+			       bool need_lock,
+			       void *user_data,
+			       bool push)
 {
 	volatile struct mlx5_aso_wqe *wqe = NULL;
 	struct mlx5_flow_meter_info *fm = NULL;
@@ -754,7 +820,7 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	rte_prefetch0(&sq->sq_obj.aso_wqes[(sq->head + 1) & mask]);
 	/* Fill next WQE. */
 	fm = &aso_mtr->fm;
-	sq->elts[sq->head & mask].mtr = aso_mtr;
+	sq->elts[sq->head & mask].mtr = user_data ? user_data : aso_mtr;
 	if (aso_mtr->type == ASO_METER_INDIRECT) {
 		if (likely(sh->config.dv_flow_en == 2))
 			pool = aso_mtr->pool;
@@ -820,9 +886,13 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	 */
 	sq->head++;
 	sq->pi += 2;/* Each WQE contains 2 WQEBB's. */
-	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
+	if (push) {
+		mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
 			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
 			   !sh->tx_uar.dbnc);
+		sq->db_pi = sq->pi;
+	}
+	sq->db = wqe;
 	if (need_lock)
 		rte_spinlock_unlock(&sq->sqsl);
 	return 1;
@@ -912,11 +982,14 @@ mlx5_aso_mtr_completion_handle(struct mlx5_aso_sq *sq, bool need_lock)
 int
 mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			struct mlx5_aso_mtr *mtr,
-			struct mlx5_mtr_bulk *bulk)
+			struct mlx5_mtr_bulk *bulk,
+			void *user_data,
+			bool push)
 {
 	struct mlx5_aso_sq *sq;
 	uint32_t poll_wqe_times = MLX5_MTR_POLL_WQE_CQE_TIMES;
 	bool need_lock;
+	int ret;
 
 	if (likely(sh->config.dv_flow_en == 2)) {
 		if (queue == MLX5_HW_INV_QUEUE) {
@@ -930,10 +1003,15 @@ mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 		sq = &sh->mtrmng->pools_mng.sq;
 		need_lock = true;
 	}
+	if (queue != MLX5_HW_INV_QUEUE) {
+		ret = mlx5_aso_mtr_sq_enqueue_single(sh, sq, mtr, bulk,
+						     need_lock, user_data, push);
+		return ret > 0 ? 0 : -1;
+	}
 	do {
 		mlx5_aso_mtr_completion_handle(sq, need_lock);
-		if (mlx5_aso_mtr_sq_enqueue_single(sh, sq, mtr,
-						   bulk, need_lock))
+		if (mlx5_aso_mtr_sq_enqueue_single(sh, sq, mtr, bulk,
+						   need_lock, NULL, true))
 			return 0;
 		/* Waiting for wqe resource. */
 		rte_delay_us_sleep(MLX5_ASO_WQE_CQE_RESPONSE_DELAY);
@@ -962,6 +1040,7 @@ mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 {
 	struct mlx5_aso_sq *sq;
 	uint32_t poll_cqe_times = MLX5_MTR_POLL_WQE_CQE_TIMES;
+	uint8_t state;
 	bool need_lock;
 
 	if (likely(sh->config.dv_flow_en == 2)) {
@@ -976,8 +1055,8 @@ mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 		sq = &sh->mtrmng->pools_mng.sq;
 		need_lock = true;
 	}
-	if (__atomic_load_n(&mtr->state, __ATOMIC_RELAXED) ==
-					    ASO_METER_READY)
+	state = __atomic_load_n(&mtr->state, __ATOMIC_RELAXED);
+	if (state == ASO_METER_READY || state == ASO_METER_WAIT_ASYNC)
 		return 0;
 	do {
 		mlx5_aso_mtr_completion_handle(sq, need_lock);
@@ -1093,7 +1172,9 @@ mlx5_aso_ct_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 			      struct mlx5_aso_sq *sq,
 			      struct mlx5_aso_ct_action *ct,
 			      const struct rte_flow_action_conntrack *profile,
-			      bool need_lock)
+			      bool need_lock,
+			      void *user_data,
+			      bool push)
 {
 	volatile struct mlx5_aso_wqe *wqe = NULL;
 	uint16_t size = 1 << sq->log_desc_n;
@@ -1117,10 +1198,16 @@ mlx5_aso_ct_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	wqe = &sq->sq_obj.aso_wqes[sq->head & mask];
 	rte_prefetch0(&sq->sq_obj.aso_wqes[(sq->head + 1) & mask]);
 	/* Fill next WQE. */
-	MLX5_ASO_CT_UPDATE_STATE(ct, ASO_CONNTRACK_WAIT);
-	sq->elts[sq->head & mask].ct = ct;
-	sq->elts[sq->head & mask].query_data = NULL;
+	MLX5_ASO_CT_UPDATE_STATE(ct,
+			user_data ? ASO_CONNTRACK_WAIT_ASYNC : ASO_CONNTRACK_WAIT);
+	if (user_data) {
+		sq->elts[sq->head & mask].user_data = user_data;
+	} else {
+		sq->elts[sq->head & mask].ct = ct;
+		sq->elts[sq->head & mask].query_data = NULL;
+	}
 	pool = __mlx5_aso_ct_get_pool(sh, ct);
+
 	/* Each WQE will have a single CT object. */
 	wqe->general_cseg.misc = rte_cpu_to_be_32(pool->devx_obj->id +
 						  ct->offset);
@@ -1200,9 +1287,13 @@ mlx5_aso_ct_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 		 profile->reply_dir.max_ack);
 	sq->head++;
 	sq->pi += 2; /* Each WQE contains 2 WQEBB's. */
-	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
-			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
-			   !sh->tx_uar.dbnc);
+	if (push) {
+		mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
+				   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
+				   !sh->tx_uar.dbnc);
+		sq->db_pi = sq->pi;
+	}
+	sq->db = wqe;
 	if (need_lock)
 		rte_spinlock_unlock(&sq->sqsl);
 	return 1;
@@ -1258,7 +1349,9 @@ static int
 mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
 			    struct mlx5_aso_sq *sq,
 			    struct mlx5_aso_ct_action *ct, char *data,
-			    bool need_lock)
+			    bool need_lock,
+			    void *user_data,
+			    bool push)
 {
 	volatile struct mlx5_aso_wqe *wqe = NULL;
 	uint16_t size = 1 << sq->log_desc_n;
@@ -1284,14 +1377,23 @@ mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
 		DRV_LOG(ERR, "Fail: SQ is full and no free WQE to send");
 		return 0;
 	}
-	MLX5_ASO_CT_UPDATE_STATE(ct, ASO_CONNTRACK_QUERY);
+	MLX5_ASO_CT_UPDATE_STATE(ct,
+			user_data ? ASO_CONNTRACK_WAIT_ASYNC : ASO_CONNTRACK_QUERY);
 	wqe = &sq->sq_obj.aso_wqes[sq->head & mask];
 	/* Confirm the location and address of the prefetch instruction. */
 	rte_prefetch0(&sq->sq_obj.aso_wqes[(sq->head + 1) & mask]);
 	/* Fill next WQE. */
 	wqe_idx = sq->head & mask;
-	sq->elts[wqe_idx].ct = ct;
-	sq->elts[wqe_idx].query_data = data;
+	/* Check if this is async mode. */
+	if (user_data) {
+		struct mlx5_hw_q_job *job = (struct mlx5_hw_q_job *)user_data;
+
+		sq->elts[wqe_idx].ct = user_data;
+		job->out_data = (char *)((uintptr_t)sq->mr.addr + wqe_idx * 64);
+	} else {
+		sq->elts[wqe_idx].query_data = data;
+		sq->elts[wqe_idx].ct = ct;
+	}
 	pool = __mlx5_aso_ct_get_pool(sh, ct);
 	/* Each WQE will have a single CT object. */
 	wqe->general_cseg.misc = rte_cpu_to_be_32(pool->devx_obj->id +
@@ -1317,9 +1419,13 @@ mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
 	 * data segment is not used in this case.
 	 */
 	sq->pi += 2;
-	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
-			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
-			   !sh->tx_uar.dbnc);
+	if (push) {
+		mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
+				   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
+				   !sh->tx_uar.dbnc);
+		sq->db_pi = sq->pi;
+	}
+	sq->db = wqe;
 	if (need_lock)
 		rte_spinlock_unlock(&sq->sqsl);
 	return 1;
@@ -1405,20 +1511,29 @@ int
 mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
 			  uint32_t queue,
 			  struct mlx5_aso_ct_action *ct,
-			  const struct rte_flow_action_conntrack *profile)
+			  const struct rte_flow_action_conntrack *profile,
+			  void *user_data,
+			  bool push)
 {
 	uint32_t poll_wqe_times = MLX5_CT_POLL_WQE_CQE_TIMES;
 	struct mlx5_aso_ct_pool *pool = __mlx5_aso_ct_get_pool(sh, ct);
 	struct mlx5_aso_sq *sq;
 	bool need_lock = !!(queue == MLX5_HW_INV_QUEUE);
+	int ret;
 
 	if (sh->config.dv_flow_en == 2)
 		sq = __mlx5_aso_ct_get_sq_in_hws(queue, pool);
 	else
 		sq = __mlx5_aso_ct_get_sq_in_sws(sh, ct);
+	if (queue != MLX5_HW_INV_QUEUE) {
+		ret = mlx5_aso_ct_sq_enqueue_single(sh, sq, ct, profile,
+						    need_lock, user_data, push);
+		return ret > 0 ? 0 : -1;
+	}
 	do {
-		mlx5_aso_ct_completion_handle(sh, sq, need_lock);
-		if (mlx5_aso_ct_sq_enqueue_single(sh, sq, ct, profile, need_lock))
+		mlx5_aso_ct_completion_handle(sh, sq,  need_lock);
+		if (mlx5_aso_ct_sq_enqueue_single(sh, sq, ct, profile,
+						  need_lock, NULL, true))
 			return 0;
 		/* Waiting for wqe resource. */
 		rte_delay_us_sleep(10u);
@@ -1478,7 +1593,7 @@ mlx5_aso_ct_wait_ready(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
  * @param[in] wdata
  *   Pointer to data fetched from hardware.
  */
-static inline void
+void
 mlx5_aso_ct_obj_analyze(struct rte_flow_action_conntrack *profile,
 			char *wdata)
 {
@@ -1562,7 +1677,8 @@ int
 mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh,
 			 uint32_t queue,
 			 struct mlx5_aso_ct_action *ct,
-			 struct rte_flow_action_conntrack *profile)
+			 struct rte_flow_action_conntrack *profile,
+			 void *user_data, bool push)
 {
 	uint32_t poll_wqe_times = MLX5_CT_POLL_WQE_CQE_TIMES;
 	struct mlx5_aso_ct_pool *pool = __mlx5_aso_ct_get_pool(sh, ct);
@@ -1575,9 +1691,15 @@ mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh,
 		sq = __mlx5_aso_ct_get_sq_in_hws(queue, pool);
 	else
 		sq = __mlx5_aso_ct_get_sq_in_sws(sh, ct);
+	if (queue != MLX5_HW_INV_QUEUE) {
+		ret = mlx5_aso_ct_sq_query_single(sh, sq, ct, out_data,
+						  need_lock, user_data, push);
+		return ret > 0 ? 0 : -1;
+	}
 	do {
 		mlx5_aso_ct_completion_handle(sh, sq, need_lock);
-		ret = mlx5_aso_ct_sq_query_single(sh, sq, ct, out_data, need_lock);
+		ret = mlx5_aso_ct_sq_query_single(sh, sq, ct, out_data,
+				need_lock, NULL, true);
 		if (ret < 0)
 			return ret;
 		else if (ret > 0)
@@ -1628,7 +1750,8 @@ mlx5_aso_ct_available(struct mlx5_dev_ctx_shared *sh,
 		rte_errno = ENXIO;
 		return -rte_errno;
 	} else if (state == ASO_CONNTRACK_READY ||
-		   state == ASO_CONNTRACK_QUERY) {
+		   state == ASO_CONNTRACK_QUERY ||
+		   state == ASO_CONNTRACK_WAIT_ASYNC) {
 		return 0;
 	}
 	do {
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 58a7e94ee0..085cb23c78 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -13091,7 +13091,7 @@ flow_dv_translate_create_conntrack(struct rte_eth_dev *dev,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "Failed to allocate CT object");
 	ct = flow_aso_ct_get_by_dev_idx(dev, idx);
-	if (mlx5_aso_ct_update_by_wqe(sh, MLX5_HW_INV_QUEUE, ct, pro)) {
+	if (mlx5_aso_ct_update_by_wqe(sh, MLX5_HW_INV_QUEUE, ct, pro, NULL, true)) {
 		flow_dv_aso_ct_dev_release(dev, idx);
 		rte_flow_error_set(error, EBUSY,
 				   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
@@ -15904,7 +15904,7 @@ __flow_dv_action_ct_update(struct rte_eth_dev *dev, uint32_t idx,
 		if (ret)
 			return ret;
 		ret = mlx5_aso_ct_update_by_wqe(priv->sh, MLX5_HW_INV_QUEUE,
-						ct, new_prf);
+						ct, new_prf, NULL, true);
 		if (ret)
 			return rte_flow_error_set(error, EIO,
 					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
@@ -16740,7 +16740,8 @@ flow_dv_action_query(struct rte_eth_dev *dev,
 							ct->peer;
 		((struct rte_flow_action_conntrack *)data)->is_original_dir =
 							ct->is_original;
-		if (mlx5_aso_ct_query_by_wqe(priv->sh, MLX5_HW_INV_QUEUE, ct, data))
+		if (mlx5_aso_ct_query_by_wqe(priv->sh, MLX5_HW_INV_QUEUE, ct,
+					data, NULL, true))
 			return rte_flow_error_set(error, EIO,
 					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 					NULL,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 5c0981d385..1879c8e9ca 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -1161,9 +1161,9 @@ static rte_be32_t vlan_hdr_to_be32(const struct rte_flow_action *actions)
 }
 
 static __rte_always_inline struct mlx5_aso_mtr *
-flow_hw_meter_mark_alloc(struct rte_eth_dev *dev,
-			   const struct rte_flow_action *action,
-			   uint32_t queue)
+flow_hw_meter_mark_alloc(struct rte_eth_dev *dev, uint32_t queue,
+			 const struct rte_flow_action *action,
+			 void *user_data, bool push)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
@@ -1183,13 +1183,14 @@ flow_hw_meter_mark_alloc(struct rte_eth_dev *dev,
 	fm->is_enable = meter_mark->state;
 	fm->color_aware = meter_mark->color_mode;
 	aso_mtr->pool = pool;
-	aso_mtr->state = ASO_METER_WAIT;
+	aso_mtr->state = (queue == MLX5_HW_INV_QUEUE) ?
+			  ASO_METER_WAIT : ASO_METER_WAIT_ASYNC;
 	aso_mtr->offset = mtr_id - 1;
 	aso_mtr->init_color = (meter_mark->color_mode) ?
 		meter_mark->init_color : RTE_COLOR_GREEN;
 	/* Update ASO flow meter by wqe. */
 	if (mlx5_aso_meter_update_by_wqe(priv->sh, queue, aso_mtr,
-					 &priv->mtr_bulk)) {
+					 &priv->mtr_bulk, user_data, push)) {
 		mlx5_ipool_free(pool->idx_pool, mtr_id);
 		return NULL;
 	}
@@ -1214,7 +1215,7 @@ flow_hw_meter_mark_compile(struct rte_eth_dev *dev,
 	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
 	struct mlx5_aso_mtr *aso_mtr;
 
-	aso_mtr = flow_hw_meter_mark_alloc(dev, action, queue);
+	aso_mtr = flow_hw_meter_mark_alloc(dev, queue, action, NULL, true);
 	if (!aso_mtr)
 		return -1;
 
@@ -2278,9 +2279,13 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 				rte_col_2_mlx5_col(aso_mtr->init_color);
 			break;
 		case RTE_FLOW_ACTION_TYPE_METER_MARK:
+			/*
+			 * Allocating the meter directly would slow down the
+			 * flow insertion rate.
+			 */
 			ret = flow_hw_meter_mark_compile(dev,
 				act_data->action_dst, action,
-				rule_acts, &job->flow->mtr_id, queue);
+				rule_acts, &job->flow->mtr_id, MLX5_HW_INV_QUEUE);
 			if (ret != 0)
 				return ret;
 			break;
@@ -2587,6 +2592,74 @@ flow_hw_age_count_release(struct mlx5_priv *priv, uint32_t queue,
 	}
 }
 
+static inline int
+__flow_hw_pull_indir_action_comp(struct rte_eth_dev *dev,
+				 uint32_t queue,
+				 struct rte_flow_op_result res[],
+				 uint16_t n_res)
+
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_ring *r = priv->hw_q[queue].indir_cq;
+	struct mlx5_hw_q_job *job;
+	void *user_data = NULL;
+	uint32_t type, idx;
+	struct mlx5_aso_mtr *aso_mtr;
+	struct mlx5_aso_ct_action *aso_ct;
+	int ret_comp, i;
+
+	ret_comp = (int)rte_ring_count(r);
+	if (ret_comp > n_res)
+		ret_comp = n_res;
+	for (i = 0; i < ret_comp; i++) {
+		rte_ring_dequeue(r, &user_data);
+		res[i].user_data = user_data;
+		res[i].status = RTE_FLOW_OP_SUCCESS;
+	}
+	if (ret_comp < n_res && priv->hws_mpool)
+		ret_comp += mlx5_aso_pull_completion(&priv->hws_mpool->sq[queue],
+				&res[ret_comp], n_res - ret_comp);
+	if (ret_comp < n_res && priv->hws_ctpool)
+		ret_comp += mlx5_aso_pull_completion(&priv->ct_mng->aso_sqs[queue],
+				&res[ret_comp], n_res - ret_comp);
+	for (i = 0; i <  ret_comp; i++) {
+		job = (struct mlx5_hw_q_job *)res[i].user_data;
+		/* Restore user data. */
+		res[i].user_data = job->user_data;
+		if (job->type == MLX5_HW_Q_JOB_TYPE_DESTROY) {
+			type = MLX5_INDIRECT_ACTION_TYPE_GET(job->action);
+			if (type == MLX5_INDIRECT_ACTION_TYPE_METER_MARK) {
+				idx = MLX5_INDIRECT_ACTION_IDX_GET(job->action);
+				mlx5_ipool_free(priv->hws_mpool->idx_pool, idx);
+			}
+		} else if (job->type == MLX5_HW_Q_JOB_TYPE_CREATE) {
+			type = MLX5_INDIRECT_ACTION_TYPE_GET(job->action);
+			if (type == MLX5_INDIRECT_ACTION_TYPE_METER_MARK) {
+				idx = MLX5_INDIRECT_ACTION_IDX_GET(job->action);
+				aso_mtr = mlx5_ipool_get(priv->hws_mpool->idx_pool, idx);
+				aso_mtr->state = ASO_METER_READY;
+			} else if (type == MLX5_INDIRECT_ACTION_TYPE_CT) {
+				idx = MLX5_ACTION_CTX_CT_GET_IDX
+					((uint32_t)(uintptr_t)job->action);
+				aso_ct = mlx5_ipool_get(priv->hws_ctpool->cts, idx);
+				aso_ct->state = ASO_CONNTRACK_READY;
+			}
+		} else if (job->type == MLX5_HW_Q_JOB_TYPE_QUERY) {
+			type = MLX5_INDIRECT_ACTION_TYPE_GET(job->action);
+			if (type == MLX5_INDIRECT_ACTION_TYPE_CT) {
+				idx = MLX5_ACTION_CTX_CT_GET_IDX
+					((uint32_t)(uintptr_t)job->action);
+				aso_ct = mlx5_ipool_get(priv->hws_ctpool->cts, idx);
+				mlx5_aso_ct_obj_analyze(job->profile,
+							job->out_data);
+				aso_ct->state = ASO_CONNTRACK_READY;
+			}
+		}
+		priv->hw_q[queue].job[priv->hw_q[queue].job_idx++] = job;
+	}
+	return ret_comp;
+}
+
 /**
  * Pull the enqueued flows.
  *
@@ -2619,6 +2692,7 @@ flow_hw_pull(struct rte_eth_dev *dev,
 	struct mlx5_hw_q_job *job;
 	int ret, i;
 
+	/* 1. Pull the flow completion. */
 	ret = mlx5dr_send_queue_poll(priv->dr_ctx, queue, res, n_res);
 	if (ret < 0)
 		return rte_flow_error_set(error, rte_errno,
@@ -2644,9 +2718,34 @@ flow_hw_pull(struct rte_eth_dev *dev,
 		}
 		priv->hw_q[queue].job[priv->hw_q[queue].job_idx++] = job;
 	}
+	/* 2. Pull indirect action comp. */
+	if (ret < n_res)
+		ret += __flow_hw_pull_indir_action_comp(dev, queue, &res[ret],
+							n_res - ret);
 	return ret;
 }
 
+static inline void
+__flow_hw_push_action(struct rte_eth_dev *dev,
+		    uint32_t queue)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_ring *iq = priv->hw_q[queue].indir_iq;
+	struct rte_ring *cq = priv->hw_q[queue].indir_cq;
+	void *job = NULL;
+	uint32_t ret, i;
+
+	ret = rte_ring_count(iq);
+	for (i = 0; i < ret; i++) {
+		rte_ring_dequeue(iq, &job);
+		rte_ring_enqueue(cq, job);
+	}
+	if (priv->hws_ctpool)
+		mlx5_aso_push_wqe(priv->sh, &priv->ct_mng->aso_sqs[queue]);
+	if (priv->hws_mpool)
+		mlx5_aso_push_wqe(priv->sh, &priv->hws_mpool->sq[queue]);
+}
+
 /**
  * Push the enqueued flows to HW.
  *
@@ -2670,6 +2769,7 @@ flow_hw_push(struct rte_eth_dev *dev,
 	struct mlx5_priv *priv = dev->data->dev_private;
 	int ret;
 
+	__flow_hw_push_action(dev, queue);
 	ret = mlx5dr_send_queue_action(priv->dr_ctx, queue,
 				       MLX5DR_SEND_QUEUE_ACTION_DRAIN);
 	if (ret) {
@@ -5906,7 +6006,7 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	/* Adds one queue to be used by PMD.
 	 * The last queue will be used by the PMD.
 	 */
-	uint16_t nb_q_updated;
+	uint16_t nb_q_updated = 0;
 	struct rte_flow_queue_attr **_queue_attr = NULL;
 	struct rte_flow_queue_attr ctrl_queue_attr = {0};
 	bool is_proxy = !!(priv->sh->config.dv_esw_en && priv->master);
@@ -5973,6 +6073,7 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		goto err;
 	}
 	for (i = 0; i < nb_q_updated; i++) {
+		char mz_name[RTE_MEMZONE_NAMESIZE];
 		uint8_t *encap = NULL;
 		struct mlx5_modification_cmd *mhdr_cmd = NULL;
 		struct rte_flow_item *items = NULL;
@@ -6000,6 +6101,22 @@ flow_hw_configure(struct rte_eth_dev *dev,
 			job[j].items = &items[j * MLX5_HW_MAX_ITEMS];
 			priv->hw_q[i].job[j] = &job[j];
 		}
+		snprintf(mz_name, sizeof(mz_name), "port_%u_indir_act_cq_%u",
+			 dev->data->port_id, i);
+		priv->hw_q[i].indir_cq = rte_ring_create(mz_name,
+				_queue_attr[i]->size, SOCKET_ID_ANY,
+				RING_F_SP_ENQ | RING_F_SC_DEQ |
+				RING_F_EXACT_SZ);
+		if (!priv->hw_q[i].indir_cq)
+			goto err;
+		snprintf(mz_name, sizeof(mz_name), "port_%u_indir_act_iq_%u",
+			 dev->data->port_id, i);
+		priv->hw_q[i].indir_iq = rte_ring_create(mz_name,
+				_queue_attr[i]->size, SOCKET_ID_ANY,
+				RING_F_SP_ENQ | RING_F_SC_DEQ |
+				RING_F_EXACT_SZ);
+		if (!priv->hw_q[i].indir_iq)
+			goto err;
 	}
 	dr_ctx_attr.pd = priv->sh->cdev->pd;
 	dr_ctx_attr.queues = nb_q_updated;
@@ -6117,6 +6234,12 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	flow_hw_destroy_vlan(dev);
 	if (dr_ctx)
 		claim_zero(mlx5dr_context_close(dr_ctx));
+	for (i = 0; i < nb_q_updated; i++) {
+		if (priv->hw_q[i].indir_iq)
+			rte_ring_free(priv->hw_q[i].indir_iq);
+		if (priv->hw_q[i].indir_cq)
+			rte_ring_free(priv->hw_q[i].indir_cq);
+	}
 	mlx5_free(priv->hw_q);
 	priv->hw_q = NULL;
 	if (priv->acts_ipool) {
@@ -6146,7 +6269,7 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 	struct rte_flow_template_table *tbl;
 	struct rte_flow_pattern_template *it;
 	struct rte_flow_actions_template *at;
-	int i;
+	uint32_t i;
 
 	if (!priv->dr_ctx)
 		return;
@@ -6192,6 +6315,10 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 		flow_hw_ct_mng_destroy(dev, priv->ct_mng);
 		priv->ct_mng = NULL;
 	}
+	for (i = 0; i < priv->nb_queue; i++) {
+		rte_ring_free(priv->hw_q[i].indir_iq);
+		rte_ring_free(priv->hw_q[i].indir_cq);
+	}
 	mlx5_free(priv->hw_q);
 	priv->hw_q = NULL;
 	claim_zero(mlx5dr_context_close(priv->dr_ctx));
@@ -6380,8 +6507,9 @@ flow_hw_conntrack_destroy(struct rte_eth_dev *dev __rte_unused,
 }
 
 static int
-flow_hw_conntrack_query(struct rte_eth_dev *dev, uint32_t idx,
+flow_hw_conntrack_query(struct rte_eth_dev *dev, uint32_t queue, uint32_t idx,
 			struct rte_flow_action_conntrack *profile,
+			void *user_data, bool push,
 			struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -6405,7 +6533,7 @@ flow_hw_conntrack_query(struct rte_eth_dev *dev, uint32_t idx,
 	}
 	profile->peer_port = ct->peer;
 	profile->is_original_dir = ct->is_original;
-	if (mlx5_aso_ct_query_by_wqe(priv->sh, MLX5_HW_INV_QUEUE, ct, profile))
+	if (mlx5_aso_ct_query_by_wqe(priv->sh, queue, ct, profile, user_data, push))
 		return rte_flow_error_set(error, EIO,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -6417,7 +6545,8 @@ flow_hw_conntrack_query(struct rte_eth_dev *dev, uint32_t idx,
 static int
 flow_hw_conntrack_update(struct rte_eth_dev *dev, uint32_t queue,
 			 const struct rte_flow_modify_conntrack *action_conf,
-			 uint32_t idx, struct rte_flow_error *error)
+			 uint32_t idx, void *user_data, bool push,
+			 struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_aso_ct_pool *pool = priv->hws_ctpool;
@@ -6448,7 +6577,8 @@ flow_hw_conntrack_update(struct rte_eth_dev *dev, uint32_t queue,
 		ret = mlx5_validate_action_ct(dev, new_prf, error);
 		if (ret)
 			return ret;
-		ret = mlx5_aso_ct_update_by_wqe(priv->sh, queue, ct, new_prf);
+		ret = mlx5_aso_ct_update_by_wqe(priv->sh, queue, ct, new_prf,
+						user_data, push);
 		if (ret)
 			return rte_flow_error_set(error, EIO,
 					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
@@ -6470,6 +6600,7 @@ flow_hw_conntrack_update(struct rte_eth_dev *dev, uint32_t queue,
 static struct rte_flow_action_handle *
 flow_hw_conntrack_create(struct rte_eth_dev *dev, uint32_t queue,
 			 const struct rte_flow_action_conntrack *pro,
+			 void *user_data, bool push,
 			 struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -6496,7 +6627,7 @@ flow_hw_conntrack_create(struct rte_eth_dev *dev, uint32_t queue,
 	ct->is_original = !!pro->is_original_dir;
 	ct->peer = pro->peer_port;
 	ct->pool = pool;
-	if (mlx5_aso_ct_update_by_wqe(priv->sh, queue, ct, pro)) {
+	if (mlx5_aso_ct_update_by_wqe(priv->sh, queue, ct, pro, user_data, push)) {
 		mlx5_ipool_free(pool->cts, ct_idx);
 		rte_flow_error_set(error, EBUSY,
 				   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
@@ -6588,15 +6719,29 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 			     struct rte_flow_error *error)
 {
 	struct rte_flow_action_handle *handle = NULL;
+	struct mlx5_hw_q_job *job = NULL;
 	struct mlx5_priv *priv = dev->data->dev_private;
 	const struct rte_flow_action_age *age;
 	struct mlx5_aso_mtr *aso_mtr;
 	cnt_id_t cnt_id;
 	uint32_t mtr_id;
 	uint32_t age_idx;
+	bool push = true;
+	bool aso = false;
 
-	RTE_SET_USED(attr);
-	RTE_SET_USED(user_data);
+	if (attr) {
+		MLX5_ASSERT(queue != MLX5_HW_INV_QUEUE);
+		if (unlikely(!priv->hw_q[queue].job_idx)) {
+			rte_flow_error_set(error, ENOMEM,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				"Flow queue full.");
+			return NULL;
+		}
+		job = priv->hw_q[queue].job[--priv->hw_q[queue].job_idx];
+		job->type = MLX5_HW_Q_JOB_TYPE_CREATE;
+		job->user_data = user_data;
+		push = !attr->postpone;
+	}
 	switch (action->type) {
 	case RTE_FLOW_ACTION_TYPE_AGE:
 		age = action->conf;
@@ -6624,10 +6769,13 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 				 (uintptr_t)cnt_id;
 		break;
 	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
-		handle = flow_hw_conntrack_create(dev, queue, action->conf, error);
+		aso = true;
+		handle = flow_hw_conntrack_create(dev, queue, action->conf, job,
+						  push, error);
 		break;
 	case RTE_FLOW_ACTION_TYPE_METER_MARK:
-		aso_mtr = flow_hw_meter_mark_alloc(dev, action, queue);
+		aso = true;
+		aso_mtr = flow_hw_meter_mark_alloc(dev, queue, action, job, push);
 		if (!aso_mtr)
 			break;
 		mtr_id = (MLX5_INDIRECT_ACTION_TYPE_METER_MARK <<
@@ -6640,7 +6788,20 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 	default:
 		rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION,
 				   NULL, "action type not supported");
-		return NULL;
+		break;
+	}
+	if (job) {
+		if (!handle) {
+			priv->hw_q[queue].job_idx++;
+			return NULL;
+		}
+		job->action = handle;
+		if (push)
+			__flow_hw_push_action(dev, queue);
+		if (aso)
+			return handle;
+		rte_ring_enqueue(push ? priv->hw_q[queue].indir_cq :
+				 priv->hw_q[queue].indir_iq, job);
 	}
 	return handle;
 }
@@ -6674,32 +6835,56 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 			     void *user_data,
 			     struct rte_flow_error *error)
 {
-	RTE_SET_USED(attr);
-	RTE_SET_USED(user_data);
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
+	const struct rte_flow_modify_conntrack *ct_conf =
+		(const struct rte_flow_modify_conntrack *)update;
 	const struct rte_flow_update_meter_mark *upd_meter_mark =
 		(const struct rte_flow_update_meter_mark *)update;
 	const struct rte_flow_action_meter_mark *meter_mark;
+	struct mlx5_hw_q_job *job = NULL;
 	struct mlx5_aso_mtr *aso_mtr;
 	struct mlx5_flow_meter_info *fm;
 	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
 	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
 	uint32_t idx = act_idx & ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
+	int ret = 0;
+	bool push = true;
+	bool aso = false;
 
+	if (attr) {
+		MLX5_ASSERT(queue != MLX5_HW_INV_QUEUE);
+		if (unlikely(!priv->hw_q[queue].job_idx))
+			return rte_flow_error_set(error, ENOMEM,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				"Action update failed due to queue full.");
+		job = priv->hw_q[queue].job[--priv->hw_q[queue].job_idx];
+		job->type = MLX5_HW_Q_JOB_TYPE_UPDATE;
+		job->user_data = user_data;
+		push = !attr->postpone;
+	}
 	switch (type) {
 	case MLX5_INDIRECT_ACTION_TYPE_AGE:
-		return mlx5_hws_age_action_update(priv, idx, update, error);
+		ret = mlx5_hws_age_action_update(priv, idx, update, error);
+		break;
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
-		return flow_hw_conntrack_update(dev, queue, update, act_idx, error);
+		if (ct_conf->state)
+			aso = true;
+		ret = flow_hw_conntrack_update(dev, queue, update, act_idx,
+					       job, push, error);
+		break;
 	case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
+		aso = true;
 		meter_mark = &upd_meter_mark->meter_mark;
 		/* Find ASO object. */
 		aso_mtr = mlx5_ipool_get(pool->idx_pool, idx);
-		if (!aso_mtr)
-			return rte_flow_error_set(error, EINVAL,
+		if (!aso_mtr) {
+			ret = -EINVAL;
+			rte_flow_error_set(error, EINVAL,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL, "Invalid meter_mark update index");
+			break;
+		}
 		fm = &aso_mtr->fm;
 		if (upd_meter_mark->profile_valid)
 			fm->profile = (struct mlx5_flow_meter_profile *)
@@ -6713,25 +6898,46 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 			fm->is_enable = meter_mark->state;
 		/* Update ASO flow meter by wqe. */
 		if (mlx5_aso_meter_update_by_wqe(priv->sh, queue,
-						 aso_mtr, &priv->mtr_bulk))
-			return rte_flow_error_set(error, EINVAL,
+						 aso_mtr, &priv->mtr_bulk, job, push)) {
+			ret = -EINVAL;
+			rte_flow_error_set(error, EINVAL,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL, "Unable to update ASO meter WQE");
+			break;
+		}
 		/* Wait for ASO object completion. */
 		if (queue == MLX5_HW_INV_QUEUE &&
-		    mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr))
-			return rte_flow_error_set(error, EINVAL,
+		    mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr)) {
+			ret = -EINVAL;
+			rte_flow_error_set(error, EINVAL,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL, "Unable to wait for ASO meter CQE");
+		}
 		break;
 	case MLX5_INDIRECT_ACTION_TYPE_RSS:
-		return flow_dv_action_update(dev, handle, update, error);
+		ret = flow_dv_action_update(dev, handle, update, error);
+		break;
 	default:
-		return rte_flow_error_set(error, ENOTSUP,
+		ret = -ENOTSUP;
+		rte_flow_error_set(error, ENOTSUP,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "action type not supported");
+		break;
 	}
-	return 0;
+	if (job) {
+		if (ret) {
+			priv->hw_q[queue].job_idx++;
+			return ret;
+		}
+		job->action = handle;
+		if (push)
+			__flow_hw_push_action(dev, queue);
+		if (aso)
+			return 0;
+		rte_ring_enqueue(push ? priv->hw_q[queue].indir_cq :
+				 priv->hw_q[queue].indir_iq, job);
+	}
+	return ret;
 }
 
 /**
@@ -6766,15 +6972,28 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 	uint32_t idx = act_idx & ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
+	struct mlx5_hw_q_job *job = NULL;
 	struct mlx5_aso_mtr *aso_mtr;
 	struct mlx5_flow_meter_info *fm;
+	bool push = true;
+	bool aso = false;
+	int ret = 0;
 
-	RTE_SET_USED(queue);
-	RTE_SET_USED(attr);
-	RTE_SET_USED(user_data);
+	if (attr) {
+		MLX5_ASSERT(queue != MLX5_HW_INV_QUEUE);
+		if (unlikely(!priv->hw_q[queue].job_idx))
+			return rte_flow_error_set(error, ENOMEM,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				"Action destroy failed due to queue full.");
+		job = priv->hw_q[queue].job[--priv->hw_q[queue].job_idx];
+		job->type = MLX5_HW_Q_JOB_TYPE_DESTROY;
+		job->user_data = user_data;
+		push = !attr->postpone;
+	}
 	switch (type) {
 	case MLX5_INDIRECT_ACTION_TYPE_AGE:
-		return mlx5_hws_age_action_destroy(priv, age_idx, error);
+		ret = mlx5_hws_age_action_destroy(priv, age_idx, error);
+		break;
 	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
 		age_idx = mlx5_hws_cnt_age_get(priv->hws_cpool, act_idx);
 		if (age_idx != 0)
@@ -6783,39 +7002,69 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 			 * time to update the AGE.
 			 */
 			mlx5_hws_age_nb_cnt_decrease(priv, age_idx);
-		return mlx5_hws_cnt_shared_put(priv->hws_cpool, &act_idx);
+		ret = mlx5_hws_cnt_shared_put(priv->hws_cpool, &act_idx);
+		break;
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
-		return flow_hw_conntrack_destroy(dev, act_idx, error);
+		ret = flow_hw_conntrack_destroy(dev, act_idx, error);
+		break;
 	case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
 		aso_mtr = mlx5_ipool_get(pool->idx_pool, idx);
-		if (!aso_mtr)
-			return rte_flow_error_set(error, EINVAL,
+		if (!aso_mtr) {
+			ret = -EINVAL;
+			rte_flow_error_set(error, EINVAL,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL, "Invalid meter_mark destroy index");
+			break;
+		}
 		fm = &aso_mtr->fm;
 		fm->is_enable = 0;
 		/* Update ASO flow meter by wqe. */
 		if (mlx5_aso_meter_update_by_wqe(priv->sh, queue, aso_mtr,
-						 &priv->mtr_bulk))
-			return rte_flow_error_set(error, EINVAL,
+						 &priv->mtr_bulk, job, push)) {
+			ret = -EINVAL;
+			rte_flow_error_set(error, EINVAL,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL, "Unable to update ASO meter WQE");
+			break;
+		}
 		/* Wait for ASO object completion. */
 		if (queue == MLX5_HW_INV_QUEUE &&
-		    mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr))
-			return rte_flow_error_set(error, EINVAL,
+		    mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr)) {
+			ret = -EINVAL;
+			rte_flow_error_set(error, EINVAL,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL, "Unable to wait for ASO meter CQE");
-		mlx5_ipool_free(pool->idx_pool, idx);
+			break;
+		}
+		if (!job)
+			mlx5_ipool_free(pool->idx_pool, idx);
+		else
+			aso = true;
 		break;
 	case MLX5_INDIRECT_ACTION_TYPE_RSS:
-		return flow_dv_action_destroy(dev, handle, error);
+		ret = flow_dv_action_destroy(dev, handle, error);
+		break;
 	default:
-		return rte_flow_error_set(error, ENOTSUP,
+		ret = -ENOTSUP;
+		rte_flow_error_set(error, ENOTSUP,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "action type not supported");
+		break;
 	}
-	return 0;
+	if (job) {
+		if (ret) {
+			priv->hw_q[queue].job_idx++;
+			return ret;
+		}
+		job->action = handle;
+		if (push)
+			__flow_hw_push_action(dev, queue);
+		if (aso)
+			return ret;
+		rte_ring_enqueue(push ? priv->hw_q[queue].indir_cq :
+				 priv->hw_q[queue].indir_iq, job);
+	}
+	return ret;
 }
 
 static int
@@ -7045,28 +7294,76 @@ flow_hw_action_update(struct rte_eth_dev *dev,
 }
 
 static int
-flow_hw_action_query(struct rte_eth_dev *dev,
-		     const struct rte_flow_action_handle *handle, void *data,
-		     struct rte_flow_error *error)
+flow_hw_action_handle_query(struct rte_eth_dev *dev, uint32_t queue,
+			    const struct rte_flow_op_attr *attr,
+			    const struct rte_flow_action_handle *handle,
+			    void *data, void *user_data,
+			    struct rte_flow_error *error)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hw_q_job *job = NULL;
 	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
 	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
 	uint32_t age_idx = act_idx & MLX5_HWS_AGE_IDX_MASK;
+	int ret;
+	bool push = true;
+	bool aso = false;
 
+	if (attr) {
+		MLX5_ASSERT(queue != MLX5_HW_INV_QUEUE);
+		if (unlikely(!priv->hw_q[queue].job_idx))
+			return rte_flow_error_set(error, ENOMEM,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				"Action query failed due to queue full.");
+		job = priv->hw_q[queue].job[--priv->hw_q[queue].job_idx];
+		job->type = MLX5_HW_Q_JOB_TYPE_QUERY;
+		job->user_data = user_data;
+		push = !attr->postpone;
+	}
 	switch (type) {
 	case MLX5_INDIRECT_ACTION_TYPE_AGE:
-		return flow_hw_query_age(dev, age_idx, data, error);
+		ret = flow_hw_query_age(dev, age_idx, data, error);
+		break;
 	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
-		return flow_hw_query_counter(dev, act_idx, data, error);
+		ret = flow_hw_query_counter(dev, act_idx, data, error);
+		break;
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
-		return flow_hw_conntrack_query(dev, act_idx, data, error);
-	case MLX5_INDIRECT_ACTION_TYPE_RSS:
-		return flow_dv_action_query(dev, handle, data, error);
+		aso = true;
+		if (job)
+			job->profile = (struct rte_flow_action_conntrack *)data;
+		ret = flow_hw_conntrack_query(dev, queue, act_idx, data,
+					      job, push, error);
+		break;
 	default:
-		return rte_flow_error_set(error, ENOTSUP,
+		ret = -ENOTSUP;
+		rte_flow_error_set(error, ENOTSUP,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "action type not supported");
+		break;
+	}
+	if (job) {
+		if (ret) {
+			priv->hw_q[queue].job_idx++;
+			return ret;
+		}
+		job->action = handle;
+		if (push)
+			__flow_hw_push_action(dev, queue);
+		if (aso)
+			return ret;
+		rte_ring_enqueue(push ? priv->hw_q[queue].indir_cq :
+				 priv->hw_q[queue].indir_iq, job);
 	}
+	return ret;
+}
+
+static int
+flow_hw_action_query(struct rte_eth_dev *dev,
+		     const struct rte_flow_action_handle *handle, void *data,
+		     struct rte_flow_error *error)
+{
+	return flow_hw_action_handle_query(dev, MLX5_HW_INV_QUEUE, NULL,
+			handle, data, NULL, error);
 }
 
 /**
@@ -7181,6 +7478,7 @@ const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.async_action_create = flow_hw_action_handle_create,
 	.async_action_destroy = flow_hw_action_handle_destroy,
 	.async_action_update = flow_hw_action_handle_update,
+	.async_action_query = flow_hw_action_handle_query,
 	.action_validate = flow_hw_action_validate,
 	.action_create = flow_hw_action_create,
 	.action_destroy = flow_hw_action_destroy,
diff --git a/drivers/net/mlx5/mlx5_flow_meter.c b/drivers/net/mlx5/mlx5_flow_meter.c
index fd1337ae73..480ac6c8ec 100644
--- a/drivers/net/mlx5/mlx5_flow_meter.c
+++ b/drivers/net/mlx5/mlx5_flow_meter.c
@@ -1627,7 +1627,7 @@ mlx5_flow_meter_action_modify(struct mlx5_priv *priv,
 		fm->is_enable = !!is_enable;
 		aso_mtr = container_of(fm, struct mlx5_aso_mtr, fm);
 		ret = mlx5_aso_meter_update_by_wqe(priv->sh, MLX5_HW_INV_QUEUE,
-						   aso_mtr, &priv->mtr_bulk);
+						   aso_mtr, &priv->mtr_bulk, NULL, true);
 		if (ret)
 			return ret;
 		ret = mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr);
@@ -1877,7 +1877,7 @@ mlx5_flow_meter_create(struct rte_eth_dev *dev, uint32_t meter_id,
 	if (priv->sh->meter_aso_en) {
 		aso_mtr = container_of(fm, struct mlx5_aso_mtr, fm);
 		ret = mlx5_aso_meter_update_by_wqe(priv->sh, MLX5_HW_INV_QUEUE,
-						   aso_mtr, &priv->mtr_bulk);
+						   aso_mtr, &priv->mtr_bulk, NULL, true);
 		if (ret)
 			goto error;
 		if (!priv->mtr_idx_tbl) {
@@ -1983,7 +1983,7 @@ mlx5_flow_meter_hws_create(struct rte_eth_dev *dev, uint32_t meter_id,
 	fm->initialized = 1;
 	/* Update ASO flow meter by wqe. */
 	ret = mlx5_aso_meter_update_by_wqe(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr,
-					   &priv->mtr_bulk);
+					   &priv->mtr_bulk, NULL, true);
 	if (ret)
 		return -rte_mtr_error_set(error, ENOTSUP,
 			RTE_MTR_ERROR_TYPE_UNSPECIFIED,
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v2 15/17] net/mlx5: support flow integrity in HWS group 0
  2022-09-28  3:31 ` [PATCH v2 00/17] net/mlx5: HW steering PMD update Suanming Mou
                     ` (13 preceding siblings ...)
  2022-09-28  3:31   ` [PATCH v2 14/17] net/mlx5: add async action push and pull support Suanming Mou
@ 2022-09-28  3:31   ` Suanming Mou
  2022-09-28  3:31   ` [PATCH v2 16/17] net/mlx5: support device control for E-Switch default rule Suanming Mou
  2022-09-28  3:31   ` [PATCH v2 17/17] net/mlx5: support device control of representor matching Suanming Mou
  16 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-28  3:31 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Gregory Etelson

From: Gregory Etelson <getelson@nvidia.com>

- Reformat flow integrity item translation for HWS code.
- Support flow integrity bits in HWS group 0.
- Update integrity item translation to match positive semantics only.
Positive flow semantics were described in patch [ae37c0f60c].
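
Below is a minimal sketch (not part of this patch) of how an application
could express positive-only integrity matching through the public rte_flow
API. The rte_flow_item_integrity field names come from the upstream flow
API; the concrete bit selection is an assumption for illustration only:

  #include <rte_flow.h>

  /* Positive semantics: match packets whose outer L3/L4 integrity bits are
   * good. Matching on "bad" integrity (l3_ok/l4_ok cleared in the spec) is
   * not supported by the PMD.
   */
  static const struct rte_flow_item_integrity integrity_mask = {
  	.level = 0,	/* Outer headers; HWS matches outer integrity only. */
  	.l3_ok = 1,
  	.ipv4_csum_ok = 1,
  	.l4_ok = 1,
  	.l4_csum_ok = 1,
  };

  static const struct rte_flow_item_integrity integrity_spec = {
  	.level = 0,
  	.l3_ok = 1,
  	.ipv4_csum_ok = 1,
  	.l4_ok = 1,
  	.l4_csum_ok = 1,
  };

  static const struct rte_flow_item integrity_item = {
  	.type = RTE_FLOW_ITEM_TYPE_INTEGRITY,
  	/* With HWS, the mask is taken from the pattern template and the
  	 * spec is provided with the flow rule; both are shown together
  	 * here for brevity.
  	 */
  	.spec = &integrity_spec,
  	.mask = &integrity_mask,
  };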

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
---
 drivers/net/mlx5/mlx5_flow.h    |   1 +
 drivers/net/mlx5/mlx5_flow_dv.c | 163 ++++++++++++++++----------------
 drivers/net/mlx5/mlx5_flow_hw.c |   8 ++
 3 files changed, 90 insertions(+), 82 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index e45869a890..3f4aa080bb 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1462,6 +1462,7 @@ struct mlx5_dv_matcher_workspace {
 	struct mlx5_flow_rss_desc *rss_desc; /* RSS descriptor. */
 	const struct rte_flow_item *tunnel_item; /* Flow tunnel item. */
 	const struct rte_flow_item *gre_item; /* Flow GRE item. */
+	const struct rte_flow_item *integrity_items[2];
 };
 
 struct mlx5_flow_split_info {
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 085cb23c78..758672568c 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -12636,132 +12636,121 @@ flow_dv_aso_age_params_init(struct rte_eth_dev *dev,
 
 static void
 flow_dv_translate_integrity_l4(const struct rte_flow_item_integrity *mask,
-			       const struct rte_flow_item_integrity *value,
-			       void *headers_m, void *headers_v)
+			       void *headers)
 {
+	/*
+	 * In HWS mode MLX5_ITEM_UPDATE() macro assigns the same pointer to
+	 * both mask and value, therefore either can be used.
+	 * In SWS SW_V mode mask points to item mask and value points to item
+	 * spec. Integrity item value is used only if matching mask is set.
+	 * Use mask reference here to keep SWS functionality.
+	 */
 	if (mask->l4_ok) {
 		/* RTE l4_ok filter aggregates hardware l4_ok and
 		 * l4_checksum_ok filters.
 		 * Positive RTE l4_ok match requires hardware match on both L4
 		 * hardware integrity bits.
-		 * For negative match, check hardware l4_checksum_ok bit only,
-		 * because hardware sets that bit to 0 for all packets
-		 * with bad L4.
+		 * PMD supports positive integrity item semantics only.
 		 */
-		if (value->l4_ok) {
-			MLX5_SET(fte_match_set_lyr_2_4, headers_m, l4_ok, 1);
-			MLX5_SET(fte_match_set_lyr_2_4, headers_v, l4_ok, 1);
-		}
-		MLX5_SET(fte_match_set_lyr_2_4, headers_m, l4_checksum_ok, 1);
-		MLX5_SET(fte_match_set_lyr_2_4, headers_v, l4_checksum_ok,
-			 !!value->l4_ok);
-	}
-	if (mask->l4_csum_ok) {
-		MLX5_SET(fte_match_set_lyr_2_4, headers_m, l4_checksum_ok, 1);
-		MLX5_SET(fte_match_set_lyr_2_4, headers_v, l4_checksum_ok,
-			 value->l4_csum_ok);
+		MLX5_SET(fte_match_set_lyr_2_4, headers, l4_ok, 1);
+		MLX5_SET(fte_match_set_lyr_2_4, headers, l4_checksum_ok, 1);
+	} else if (mask->l4_csum_ok) {
+		MLX5_SET(fte_match_set_lyr_2_4, headers, l4_checksum_ok, 1);
 	}
 }
 
 static void
 flow_dv_translate_integrity_l3(const struct rte_flow_item_integrity *mask,
-			       const struct rte_flow_item_integrity *value,
-			       void *headers_m, void *headers_v, bool is_ipv4)
+			       void *headers, bool is_ipv4)
 {
+	/*
+	 * In HWS mode MLX5_ITEM_UPDATE() macro assigns the same pointer to
+	 * both mask and value, therefore either can be used.
+	 * In SWS SW_V mode mask points to item mask and value points to item
+	 * spec. Integrity item value is used only if matching mask is set.
+	 * Use mask reference here to keep SWS functionality.
+	 */
 	if (mask->l3_ok) {
 		/* RTE l3_ok filter aggregates for IPv4 hardware l3_ok and
 		 * ipv4_csum_ok filters.
 		 * Positive RTE l3_ok match requires hardware match on both L3
 		 * hardware integrity bits.
-		 * For negative match, check hardware l3_csum_ok bit only,
-		 * because hardware sets that bit to 0 for all packets
-		 * with bad L3.
+		 * PMD supports positive integrity item semantics only.
 		 */
+		MLX5_SET(fte_match_set_lyr_2_4, headers, l3_ok, 1);
 		if (is_ipv4) {
-			if (value->l3_ok) {
-				MLX5_SET(fte_match_set_lyr_2_4, headers_m,
-					 l3_ok, 1);
-				MLX5_SET(fte_match_set_lyr_2_4, headers_v,
-					 l3_ok, 1);
-			}
-			MLX5_SET(fte_match_set_lyr_2_4, headers_m,
+			MLX5_SET(fte_match_set_lyr_2_4, headers,
 				 ipv4_checksum_ok, 1);
-			MLX5_SET(fte_match_set_lyr_2_4, headers_v,
-				 ipv4_checksum_ok, !!value->l3_ok);
-		} else {
-			MLX5_SET(fte_match_set_lyr_2_4, headers_m, l3_ok, 1);
-			MLX5_SET(fte_match_set_lyr_2_4, headers_v, l3_ok,
-				 value->l3_ok);
 		}
-	}
-	if (mask->ipv4_csum_ok) {
-		MLX5_SET(fte_match_set_lyr_2_4, headers_m, ipv4_checksum_ok, 1);
-		MLX5_SET(fte_match_set_lyr_2_4, headers_v, ipv4_checksum_ok,
-			 value->ipv4_csum_ok);
+	} else if (is_ipv4 && mask->ipv4_csum_ok) {
+		MLX5_SET(fte_match_set_lyr_2_4, headers, ipv4_checksum_ok, 1);
 	}
 }
 
 static void
-set_integrity_bits(void *headers_m, void *headers_v,
-		   const struct rte_flow_item *integrity_item, bool is_l3_ip4)
+set_integrity_bits(void *headers, const struct rte_flow_item *integrity_item,
+		   bool is_l3_ip4, uint32_t key_type)
 {
-	const struct rte_flow_item_integrity *spec = integrity_item->spec;
-	const struct rte_flow_item_integrity *mask = integrity_item->mask;
+	const struct rte_flow_item_integrity *spec;
+	const struct rte_flow_item_integrity *mask;
 
 	/* Integrity bits validation cleared spec pointer */
-	MLX5_ASSERT(spec != NULL);
-	if (!mask)
-		mask = &rte_flow_item_integrity_mask;
-	flow_dv_translate_integrity_l3(mask, spec, headers_m, headers_v,
-				       is_l3_ip4);
-	flow_dv_translate_integrity_l4(mask, spec, headers_m, headers_v);
+	if (MLX5_ITEM_VALID(integrity_item, key_type))
+		return;
+	MLX5_ITEM_UPDATE(integrity_item, key_type, spec, mask,
+			 &rte_flow_item_integrity_mask);
+	flow_dv_translate_integrity_l3(mask, headers, is_l3_ip4);
+	flow_dv_translate_integrity_l4(mask, headers);
 }
 
 static void
-flow_dv_translate_item_integrity_post(void *matcher, void *key,
+flow_dv_translate_item_integrity_post(void *key,
 				      const
 				      struct rte_flow_item *integrity_items[2],
-				      uint64_t pattern_flags)
+				      uint64_t pattern_flags, uint32_t key_type)
 {
-	void *headers_m, *headers_v;
+	void *headers;
 	bool is_l3_ip4;
 
 	if (pattern_flags & MLX5_FLOW_ITEM_INNER_INTEGRITY) {
-		headers_m = MLX5_ADDR_OF(fte_match_param, matcher,
-					 inner_headers);
-		headers_v = MLX5_ADDR_OF(fte_match_param, key, inner_headers);
+		headers = MLX5_ADDR_OF(fte_match_param, key, inner_headers);
 		is_l3_ip4 = (pattern_flags & MLX5_FLOW_LAYER_INNER_L3_IPV4) !=
 			    0;
-		set_integrity_bits(headers_m, headers_v,
-				   integrity_items[1], is_l3_ip4);
+		set_integrity_bits(headers, integrity_items[1], is_l3_ip4,
+				   key_type);
 	}
 	if (pattern_flags & MLX5_FLOW_ITEM_OUTER_INTEGRITY) {
-		headers_m = MLX5_ADDR_OF(fte_match_param, matcher,
-					 outer_headers);
-		headers_v = MLX5_ADDR_OF(fte_match_param, key, outer_headers);
+		headers = MLX5_ADDR_OF(fte_match_param, key, outer_headers);
 		is_l3_ip4 = (pattern_flags & MLX5_FLOW_LAYER_OUTER_L3_IPV4) !=
 			    0;
-		set_integrity_bits(headers_m, headers_v,
-				   integrity_items[0], is_l3_ip4);
+		set_integrity_bits(headers, integrity_items[0], is_l3_ip4,
+				   key_type);
 	}
 }
 
-static void
+static uint64_t
 flow_dv_translate_item_integrity(const struct rte_flow_item *item,
-				 const struct rte_flow_item *integrity_items[2],
-				 uint64_t *last_item)
+				 struct mlx5_dv_matcher_workspace *wks,
+				 uint64_t key_type)
 {
-	const struct rte_flow_item_integrity *spec = (typeof(spec))item->spec;
+	if ((key_type & MLX5_SET_MATCHER_SW) != 0) {
+		const struct rte_flow_item_integrity
+			*spec = (typeof(spec))item->spec;
 
-	/* integrity bits validation cleared spec pointer */
-	MLX5_ASSERT(spec != NULL);
-	if (spec->level > 1) {
-		integrity_items[1] = item;
-		*last_item |= MLX5_FLOW_ITEM_INNER_INTEGRITY;
+		/* SWS integrity bits validation cleared spec pointer */
+		if (spec->level > 1) {
+			wks->integrity_items[1] = item;
+			wks->last_item |= MLX5_FLOW_ITEM_INNER_INTEGRITY;
+		} else {
+			wks->integrity_items[0] = item;
+			wks->last_item |= MLX5_FLOW_ITEM_OUTER_INTEGRITY;
+		}
 	} else {
-		integrity_items[0] = item;
-		*last_item |= MLX5_FLOW_ITEM_OUTER_INTEGRITY;
+		/* HWS supports outer integrity only */
+		wks->integrity_items[0] = item;
+		wks->last_item |= MLX5_FLOW_ITEM_OUTER_INTEGRITY;
 	}
+	return wks->last_item;
 }
 
 /**
@@ -13389,6 +13378,10 @@ flow_dv_translate_items(struct rte_eth_dev *dev,
 		flow_dv_translate_item_meter_color(dev, key, items, key_type);
 		last_item = MLX5_FLOW_ITEM_METER_COLOR;
 		break;
+	case RTE_FLOW_ITEM_TYPE_INTEGRITY:
+		last_item = flow_dv_translate_item_integrity(items,
+							     wks, key_type);
+		break;
 	default:
 		break;
 	}
@@ -13452,6 +13445,12 @@ flow_dv_translate_items_hws(const struct rte_flow_item *items,
 		if (ret)
 			return ret;
 	}
+	if (wks.item_flags & MLX5_FLOW_ITEM_INTEGRITY) {
+		flow_dv_translate_item_integrity_post(key,
+						      wks.integrity_items,
+						      wks.item_flags,
+						      key_type);
+	}
 	if (wks.item_flags & MLX5_FLOW_LAYER_VXLAN_GPE) {
 		flow_dv_translate_item_vxlan_gpe(key,
 						 wks.tunnel_item,
@@ -13532,7 +13531,6 @@ flow_dv_translate_items_sws(struct rte_eth_dev *dev,
 			     mlx5_flow_get_thread_workspace())->rss_desc,
 	};
 	struct mlx5_dv_matcher_workspace wks_m = wks;
-	const struct rte_flow_item *integrity_items[2] = {NULL, NULL};
 	int ret = 0;
 	int tunnel;
 
@@ -13543,10 +13541,6 @@ flow_dv_translate_items_sws(struct rte_eth_dev *dev,
 						  NULL, "item not supported");
 		tunnel = !!(wks.item_flags & MLX5_FLOW_LAYER_TUNNEL);
 		switch (items->type) {
-		case RTE_FLOW_ITEM_TYPE_INTEGRITY:
-			flow_dv_translate_item_integrity(items, integrity_items,
-							 &wks.last_item);
-			break;
 		case RTE_FLOW_ITEM_TYPE_CONNTRACK:
 			flow_dv_translate_item_aso_ct(dev, match_mask,
 						      match_value, items);
@@ -13588,9 +13582,14 @@ flow_dv_translate_items_sws(struct rte_eth_dev *dev,
 			return -rte_errno;
 	}
 	if (wks.item_flags & MLX5_FLOW_ITEM_INTEGRITY) {
-		flow_dv_translate_item_integrity_post(match_mask, match_value,
-						      integrity_items,
-						      wks.item_flags);
+		flow_dv_translate_item_integrity_post(match_mask,
+						      wks_m.integrity_items,
+						      wks_m.item_flags,
+						      MLX5_SET_MATCHER_SW_M);
+		flow_dv_translate_item_integrity_post(match_value,
+						      wks.integrity_items,
+						      wks.item_flags,
+						      MLX5_SET_MATCHER_SW_V);
 	}
 	if (wks.item_flags & MLX5_FLOW_LAYER_VXLAN_GPE) {
 		flow_dv_translate_item_vxlan_gpe(match_mask,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 1879c8e9ca..31f98a2636 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -4618,6 +4618,14 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ITEM_TYPE_ICMP6:
 		case RTE_FLOW_ITEM_TYPE_CONNTRACK:
 			break;
+		case RTE_FLOW_ITEM_TYPE_INTEGRITY:
+			/*
+			 * Integrity flow item validation requires access to
+			 * both item mask and spec.
+			 * Current HWS model allows item mask in pattern
+			 * template and item spec in flow rule.
+			 */
+			break;
 		case RTE_FLOW_ITEM_TYPE_END:
 			items_end = true;
 			break;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v2 16/17] net/mlx5: support device control for E-Switch default rule
  2022-09-28  3:31 ` [PATCH v2 00/17] net/mlx5: HW steering PMD update Suanming Mou
                     ` (14 preceding siblings ...)
  2022-09-28  3:31   ` [PATCH v2 15/17] net/mlx5: support flow integrity in HWS group 0 Suanming Mou
@ 2022-09-28  3:31   ` Suanming Mou
  2022-09-28  3:31   ` [PATCH v2 17/17] net/mlx5: support device control of representor matching Suanming Mou
  16 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-28  3:31 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko, Ray Kinsella
  Cc: dev, rasland, orika, Dariusz Sosnowski, Xueming Li

From: Dariusz Sosnowski <dsosnowski@nvidia.com>

This patch adds support for the fdb_def_rule_en device argument to HW
Steering, which controls:

- creation of default FDB jump flow rule,
- ability of the user to create transfer flow rules in root table.

A new PMD API is also added to allow the user application to enable
traffic for a given port ID and SQ number, directing packets to the wire.
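
A minimal usage sketch of the new API (the SQ number here is a hypothetical
application-owned DevX SQ HW number; error handling is illustrative only):

  #include <stdint.h>
  #include <stdio.h>

  #include <rte_errno.h>
  #include <rte_pmd_mlx5.h>

  /* Install the default miss/forwarding rules for an external SQ so that
   * traffic sent on it is directed to the wire.
   */
  static int
  enable_external_sq(uint16_t port_id, uint32_t sq_num)
  {
  	int ret = rte_pmd_mlx5_external_sq_enable(port_id, sq_num);

  	if (ret < 0)
  		printf("port %u: enabling external SQ %u failed: %s\n",
  		       port_id, sq_num, rte_strerror(rte_errno));
  	return ret;
  }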

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_os.c |  14 ++
 drivers/net/mlx5/mlx5.h          |   4 +-
 drivers/net/mlx5/mlx5_flow.c     |  28 ++--
 drivers/net/mlx5/mlx5_flow.h     |  11 +-
 drivers/net/mlx5/mlx5_flow_dv.c  |  46 ++---
 drivers/net/mlx5/mlx5_flow_hw.c  | 279 +++++++++++++++----------------
 drivers/net/mlx5/mlx5_trigger.c  |  31 ++--
 drivers/net/mlx5/mlx5_tx.h       |   1 +
 drivers/net/mlx5/mlx5_txq.c      |  47 ++++++
 drivers/net/mlx5/rte_pmd_mlx5.h  |  17 ++
 drivers/net/mlx5/version.map     |   1 +
 11 files changed, 274 insertions(+), 205 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 60a1a391fb..de8c003d02 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1554,6 +1554,20 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	rte_rwlock_init(&priv->ind_tbls_lock);
 	if (priv->sh->config.dv_flow_en == 2) {
 #ifdef HAVE_IBV_FLOW_DV_SUPPORT
+		if (priv->sh->config.dv_esw_en) {
+			if (priv->sh->dv_regc0_mask == UINT32_MAX) {
+				DRV_LOG(ERR, "E-Switch port metadata is required when using HWS "
+					     "but it is disabled (configure it through devlink)");
+				err = ENOTSUP;
+				goto error;
+			}
+			if (priv->sh->dv_regc0_mask == 0) {
+				DRV_LOG(ERR, "E-Switch with HWS is not supported "
+					     "(no available bits in reg_c[0])");
+				err = ENOTSUP;
+				goto error;
+			}
+		}
 		if (priv->vport_meta_mask)
 			flow_hw_set_port_info(eth_dev);
 		if (priv->sh->config.dv_esw_en &&
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 5d92df8965..9299ffe9f9 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -2015,7 +2015,7 @@ int mlx5_flow_ops_get(struct rte_eth_dev *dev, const struct rte_flow_ops **ops);
 int mlx5_flow_start_default(struct rte_eth_dev *dev);
 void mlx5_flow_stop_default(struct rte_eth_dev *dev);
 int mlx5_flow_verify(struct rte_eth_dev *dev);
-int mlx5_ctrl_flow_source_queue(struct rte_eth_dev *dev, uint32_t queue);
+int mlx5_ctrl_flow_source_queue(struct rte_eth_dev *dev, uint32_t sq_num);
 int mlx5_ctrl_flow_vlan(struct rte_eth_dev *dev,
 			struct rte_flow_item_eth *eth_spec,
 			struct rte_flow_item_eth *eth_mask,
@@ -2027,7 +2027,7 @@ int mlx5_ctrl_flow(struct rte_eth_dev *dev,
 int mlx5_flow_lacp_miss(struct rte_eth_dev *dev);
 struct rte_flow *mlx5_flow_create_esw_table_zero_flow(struct rte_eth_dev *dev);
 uint32_t mlx5_flow_create_devx_sq_miss_flow(struct rte_eth_dev *dev,
-					    uint32_t txq);
+					    uint32_t sq_num);
 void mlx5_flow_async_pool_query_handle(struct mlx5_dev_ctx_shared *sh,
 				       uint64_t async_id, int status);
 void mlx5_set_query_alarm(struct mlx5_dev_ctx_shared *sh);
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index bc2ccb4d3c..2142cd828a 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -7155,14 +7155,14 @@ mlx5_flow_create_esw_table_zero_flow(struct rte_eth_dev *dev)
  *
  * @param dev
  *   Pointer to Ethernet device.
- * @param txq
- *   Txq index.
+ * @param sq_num
+ *   SQ number.
  *
  * @return
  *   Flow ID on success, 0 otherwise and rte_errno is set.
  */
 uint32_t
-mlx5_flow_create_devx_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
+mlx5_flow_create_devx_sq_miss_flow(struct rte_eth_dev *dev, uint32_t sq_num)
 {
 	struct rte_flow_attr attr = {
 		.group = 0,
@@ -7174,8 +7174,8 @@ mlx5_flow_create_devx_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
 	struct rte_flow_item_port_id port_spec = {
 		.id = MLX5_PORT_ESW_MGR,
 	};
-	struct mlx5_rte_flow_item_tx_queue txq_spec = {
-		.queue = txq,
+	struct mlx5_rte_flow_item_sq sq_spec = {
+		.queue = sq_num,
 	};
 	struct rte_flow_item pattern[] = {
 		{
@@ -7184,8 +7184,8 @@ mlx5_flow_create_devx_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
 		},
 		{
 			.type = (enum rte_flow_item_type)
-				MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
-			.spec = &txq_spec,
+				MLX5_RTE_FLOW_ITEM_TYPE_SQ,
+			.spec = &sq_spec,
 		},
 		{
 			.type = RTE_FLOW_ITEM_TYPE_END,
@@ -7556,30 +7556,30 @@ mlx5_flow_verify(struct rte_eth_dev *dev __rte_unused)
  *
  * @param dev
  *   Pointer to Ethernet device.
- * @param queue
- *   The queue index.
+ * @param sq_num
+ *   The SQ hw number.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
 mlx5_ctrl_flow_source_queue(struct rte_eth_dev *dev,
-			    uint32_t queue)
+			    uint32_t sq_num)
 {
 	const struct rte_flow_attr attr = {
 		.egress = 1,
 		.priority = 0,
 	};
-	struct mlx5_rte_flow_item_tx_queue queue_spec = {
-		.queue = queue,
+	struct mlx5_rte_flow_item_sq queue_spec = {
+		.queue = sq_num,
 	};
-	struct mlx5_rte_flow_item_tx_queue queue_mask = {
+	struct mlx5_rte_flow_item_sq queue_mask = {
 		.queue = UINT32_MAX,
 	};
 	struct rte_flow_item items[] = {
 		{
 			.type = (enum rte_flow_item_type)
-				MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
+				MLX5_RTE_FLOW_ITEM_TYPE_SQ,
 			.spec = &queue_spec,
 			.last = NULL,
 			.mask = &queue_mask,
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 3f4aa080bb..63f946473d 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -29,7 +29,7 @@
 enum mlx5_rte_flow_item_type {
 	MLX5_RTE_FLOW_ITEM_TYPE_END = INT_MIN,
 	MLX5_RTE_FLOW_ITEM_TYPE_TAG,
-	MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
+	MLX5_RTE_FLOW_ITEM_TYPE_SQ,
 	MLX5_RTE_FLOW_ITEM_TYPE_VLAN,
 	MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL,
 };
@@ -115,8 +115,8 @@ struct mlx5_flow_action_copy_mreg {
 };
 
 /* Matches on source queue. */
-struct mlx5_rte_flow_item_tx_queue {
-	uint32_t queue;
+struct mlx5_rte_flow_item_sq {
+	uint32_t queue; /* DevX SQ number */
 };
 
 /* Feature name to allocate metadata register. */
@@ -179,7 +179,7 @@ enum mlx5_feature_name {
 #define MLX5_FLOW_LAYER_GENEVE (1u << 26)
 
 /* Queue items. */
-#define MLX5_FLOW_ITEM_TX_QUEUE (1u << 27)
+#define MLX5_FLOW_ITEM_SQ (1u << 27)
 
 /* Pattern tunnel Layer bits (continued). */
 #define MLX5_FLOW_LAYER_GTP (1u << 28)
@@ -2475,9 +2475,8 @@ int mlx5_flow_pick_transfer_proxy(struct rte_eth_dev *dev,
 
 int mlx5_flow_hw_flush_ctrl_flows(struct rte_eth_dev *dev);
 
-int mlx5_flow_hw_esw_create_mgr_sq_miss_flow(struct rte_eth_dev *dev);
 int mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev,
-					 uint32_t txq);
+					 uint32_t sqn);
 int mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev);
 int mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev);
 int mlx5_flow_actions_validate(struct rte_eth_dev *dev,
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 758672568c..e5c09c27eb 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -7453,8 +7453,8 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 				return ret;
 			last_item = MLX5_FLOW_ITEM_TAG;
 			break;
-		case MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE:
-			last_item = MLX5_FLOW_ITEM_TX_QUEUE;
+		case MLX5_RTE_FLOW_ITEM_TYPE_SQ:
+			last_item = MLX5_FLOW_ITEM_SQ;
 			break;
 		case MLX5_RTE_FLOW_ITEM_TYPE_TAG:
 			break;
@@ -8343,7 +8343,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 	 * work due to metadata regC0 mismatch.
 	 */
 	if ((!attr->transfer && attr->egress) && priv->representor &&
-	    !(item_flags & MLX5_FLOW_ITEM_TX_QUEUE))
+	    !(item_flags & MLX5_FLOW_ITEM_SQ))
 		return rte_flow_error_set(error, EINVAL,
 					  RTE_FLOW_ERROR_TYPE_ITEM,
 					  NULL,
@@ -11390,10 +11390,10 @@ flow_dv_translate_create_counter(struct rte_eth_dev *dev,
 }
 
 /**
- * Add Tx queue matcher
+ * Add SQ matcher
  *
- * @param[in] dev
- *   Pointer to the dev struct.
+ * @param[in, out] matcher
+ *   Flow matcher.
  * @param[in, out] key
  *   Flow matcher value.
  * @param[in] item
@@ -11402,40 +11402,29 @@ flow_dv_translate_create_counter(struct rte_eth_dev *dev,
  *   Set flow matcher mask or value.
  */
 static void
-flow_dv_translate_item_tx_queue(struct rte_eth_dev *dev,
-				void *key,
-				const struct rte_flow_item *item,
-				uint32_t key_type)
+flow_dv_translate_item_sq(void *key,
+			  const struct rte_flow_item *item,
+			  uint32_t key_type)
 {
-	const struct mlx5_rte_flow_item_tx_queue *queue_m;
-	const struct mlx5_rte_flow_item_tx_queue *queue_v;
-	const struct mlx5_rte_flow_item_tx_queue queue_mask = {
+	const struct mlx5_rte_flow_item_sq *queue_m;
+	const struct mlx5_rte_flow_item_sq *queue_v;
+	const struct mlx5_rte_flow_item_sq queue_mask = {
 		.queue = UINT32_MAX,
 	};
-	void *misc_v =
-		MLX5_ADDR_OF(fte_match_param, key, misc_parameters);
-	struct mlx5_txq_ctrl *txq = NULL;
+	void *misc_v = MLX5_ADDR_OF(fte_match_param, key, misc_parameters);
 	uint32_t queue;
 
 	MLX5_ITEM_UPDATE(item, key_type, queue_v, queue_m, &queue_mask);
 	if (!queue_m || !queue_v)
 		return;
 	if (key_type & MLX5_SET_MATCHER_V) {
-		txq = mlx5_txq_get(dev, queue_v->queue);
-		if (!txq)
-			return;
-		if (txq->is_hairpin)
-			queue = txq->obj->sq->id;
-		else
-			queue = txq->obj->sq_obj.sq->id;
+		queue = queue_v->queue;
 		if (key_type == MLX5_SET_MATCHER_SW_V)
 			queue &= queue_m->queue;
 	} else {
 		queue = queue_m->queue;
 	}
 	MLX5_SET(fte_match_set_misc, misc_v, source_sqn, queue);
-	if (txq)
-		mlx5_txq_release(dev, queue_v->queue);
 }
 
 /**
@@ -13341,9 +13330,9 @@ flow_dv_translate_items(struct rte_eth_dev *dev,
 		flow_dv_translate_mlx5_item_tag(dev, key, items, key_type);
 		last_item = MLX5_FLOW_ITEM_TAG;
 		break;
-	case MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE:
-		flow_dv_translate_item_tx_queue(dev, key, items, key_type);
-		last_item = MLX5_FLOW_ITEM_TX_QUEUE;
+	case MLX5_RTE_FLOW_ITEM_TYPE_SQ:
+		flow_dv_translate_item_sq(key, items, key_type);
+		last_item = MLX5_FLOW_ITEM_SQ;
 		break;
 	case RTE_FLOW_ITEM_TYPE_GTP:
 		flow_dv_translate_item_gtp(key, items, tunnel, key_type);
@@ -13552,7 +13541,6 @@ flow_dv_translate_items_sws(struct rte_eth_dev *dev,
 			wks.last_item = tunnel ? MLX5_FLOW_ITEM_INNER_FLEX :
 						 MLX5_FLOW_ITEM_OUTER_FLEX;
 			break;
-
 		default:
 			ret = flow_dv_translate_items(dev, items, &wks_m,
 				match_mask, MLX5_SET_MATCHER_SW_M, error);
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 31f98a2636..6a7f1376f7 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -3154,7 +3154,10 @@ flow_hw_translate_group(struct rte_eth_dev *dev,
 	struct mlx5_priv *priv = dev->data->dev_private;
 	const struct rte_flow_attr *flow_attr = &cfg->attr.flow_attr;
 
-	if (priv->sh->config.dv_esw_en && cfg->external && flow_attr->transfer) {
+	if (priv->sh->config.dv_esw_en &&
+	    priv->fdb_def_rule &&
+	    cfg->external &&
+	    flow_attr->transfer) {
 		if (group > MLX5_HW_MAX_TRANSFER_GROUP)
 			return rte_flow_error_set(error, EINVAL,
 						  RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
@@ -4610,7 +4613,7 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ITEM_TYPE_GTP:
 		case RTE_FLOW_ITEM_TYPE_GTP_PSC:
 		case RTE_FLOW_ITEM_TYPE_VXLAN:
-		case MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE:
+		case MLX5_RTE_FLOW_ITEM_TYPE_SQ:
 		case RTE_FLOW_ITEM_TYPE_GRE:
 		case RTE_FLOW_ITEM_TYPE_GRE_KEY:
 		case RTE_FLOW_ITEM_TYPE_GRE_OPTION:
@@ -5103,14 +5106,23 @@ flow_hw_free_vport_actions(struct mlx5_priv *priv)
 }
 
 static uint32_t
-flow_hw_usable_lsb_vport_mask(struct mlx5_priv *priv)
+flow_hw_esw_mgr_regc_marker_mask(struct rte_eth_dev *dev)
 {
-	uint32_t usable_mask = ~priv->vport_meta_mask;
+	uint32_t mask = MLX5_SH(dev)->dv_regc0_mask;
 
-	if (usable_mask)
-		return (1 << rte_bsf32(usable_mask));
-	else
-		return 0;
+	/* Mask is verified during device initialization. */
+	MLX5_ASSERT(mask != 0);
+	return mask;
+}
+
+static uint32_t
+flow_hw_esw_mgr_regc_marker(struct rte_eth_dev *dev)
+{
+	uint32_t mask = MLX5_SH(dev)->dv_regc0_mask;
+
+	/* Mask is verified during device initialization. */
+	MLX5_ASSERT(mask != 0);
+	return RTE_BIT32(rte_bsf32(mask));
 }
 
 /**
@@ -5136,12 +5148,19 @@ flow_hw_create_ctrl_esw_mgr_pattern_template(struct rte_eth_dev *dev)
 	struct rte_flow_item_ethdev port_mask = {
 		.port_id = UINT16_MAX,
 	};
+	struct mlx5_rte_flow_item_sq sq_mask = {
+		.queue = UINT32_MAX,
+	};
 	struct rte_flow_item items[] = {
 		{
 			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
 			.spec = &port_spec,
 			.mask = &port_mask,
 		},
+		{
+			.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_SQ,
+			.mask = &sq_mask,
+		},
 		{
 			.type = RTE_FLOW_ITEM_TYPE_END,
 		},
@@ -5151,9 +5170,10 @@ flow_hw_create_ctrl_esw_mgr_pattern_template(struct rte_eth_dev *dev)
 }
 
 /**
- * Creates a flow pattern template used to match REG_C_0 and a TX queue.
- * Matching on REG_C_0 is set up to match on least significant bit usable
- * by user-space, which is set when packet was originated from E-Switch Manager.
+ * Creates a flow pattern template used to match REG_C_0 and a SQ.
+ * Matching on REG_C_0 is set up to match on all bits usable by user-space.
+ * If traffic was sent from E-Switch Manager, then all usable bits will be set to 0,
+ * except the least significant bit, which will be set to 1.
  *
  * This template is used to set up a table for SQ miss default flow.
  *
@@ -5166,8 +5186,6 @@ flow_hw_create_ctrl_esw_mgr_pattern_template(struct rte_eth_dev *dev)
 static struct rte_flow_pattern_template *
 flow_hw_create_ctrl_regc_sq_pattern_template(struct rte_eth_dev *dev)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
-	uint32_t marker_bit = flow_hw_usable_lsb_vport_mask(priv);
 	struct rte_flow_pattern_template_attr attr = {
 		.relaxed_matching = 0,
 		.transfer = 1,
@@ -5177,8 +5195,9 @@ flow_hw_create_ctrl_regc_sq_pattern_template(struct rte_eth_dev *dev)
 	};
 	struct rte_flow_item_tag reg_c0_mask = {
 		.index = 0xff,
+		.data = flow_hw_esw_mgr_regc_marker_mask(dev),
 	};
-	struct mlx5_rte_flow_item_tx_queue queue_mask = {
+	struct mlx5_rte_flow_item_sq queue_mask = {
 		.queue = UINT32_MAX,
 	};
 	struct rte_flow_item items[] = {
@@ -5190,7 +5209,7 @@ flow_hw_create_ctrl_regc_sq_pattern_template(struct rte_eth_dev *dev)
 		},
 		{
 			.type = (enum rte_flow_item_type)
-				MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
+				MLX5_RTE_FLOW_ITEM_TYPE_SQ,
 			.mask = &queue_mask,
 		},
 		{
@@ -5198,12 +5217,6 @@ flow_hw_create_ctrl_regc_sq_pattern_template(struct rte_eth_dev *dev)
 		},
 	};
 
-	if (!marker_bit) {
-		DRV_LOG(ERR, "Unable to set up pattern template for SQ miss table");
-		return NULL;
-	}
-	reg_c0_spec.data = marker_bit;
-	reg_c0_mask.data = marker_bit;
 	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
 }
 
@@ -5295,9 +5308,8 @@ flow_hw_create_tx_default_mreg_copy_pattern_template(struct rte_eth_dev *dev)
 static struct rte_flow_actions_template *
 flow_hw_create_ctrl_regc_jump_actions_template(struct rte_eth_dev *dev)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
-	uint32_t marker_bit = flow_hw_usable_lsb_vport_mask(priv);
-	uint32_t marker_bit_mask = UINT32_MAX;
+	uint32_t marker_mask = flow_hw_esw_mgr_regc_marker_mask(dev);
+	uint32_t marker_bits = flow_hw_esw_mgr_regc_marker(dev);
 	struct rte_flow_actions_template_attr attr = {
 		.transfer = 1,
 	};
@@ -5310,7 +5322,7 @@ flow_hw_create_ctrl_regc_jump_actions_template(struct rte_eth_dev *dev)
 		.src = {
 			.field = RTE_FLOW_FIELD_VALUE,
 		},
-		.width = 1,
+		.width = __builtin_popcount(marker_mask),
 	};
 	struct rte_flow_action_modify_field set_reg_m = {
 		.operation = RTE_FLOW_MODIFY_SET,
@@ -5357,13 +5369,9 @@ flow_hw_create_ctrl_regc_jump_actions_template(struct rte_eth_dev *dev)
 		}
 	};
 
-	if (!marker_bit) {
-		DRV_LOG(ERR, "Unable to set up actions template for SQ miss table");
-		return NULL;
-	}
-	set_reg_v.dst.offset = rte_bsf32(marker_bit);
-	rte_memcpy(set_reg_v.src.value, &marker_bit, sizeof(marker_bit));
-	rte_memcpy(set_reg_m.src.value, &marker_bit_mask, sizeof(marker_bit_mask));
+	set_reg_v.dst.offset = rte_bsf32(marker_mask);
+	rte_memcpy(set_reg_v.src.value, &marker_bits, sizeof(marker_bits));
+	rte_memcpy(set_reg_m.src.value, &marker_mask, sizeof(marker_mask));
 	return flow_hw_actions_template_create(dev, &attr, actions_v, actions_m, NULL);
 }
 
@@ -5550,7 +5558,7 @@ flow_hw_create_ctrl_sq_miss_root_table(struct rte_eth_dev *dev,
 	struct rte_flow_template_table_attr attr = {
 		.flow_attr = {
 			.group = 0,
-			.priority = 0,
+			.priority = MLX5_HW_LOWEST_PRIO_ROOT,
 			.ingress = 0,
 			.egress = 0,
 			.transfer = 1,
@@ -5665,7 +5673,7 @@ flow_hw_create_ctrl_jump_table(struct rte_eth_dev *dev,
 	struct rte_flow_template_table_attr attr = {
 		.flow_attr = {
 			.group = 0,
-			.priority = MLX5_HW_LOWEST_PRIO_ROOT,
+			.priority = 0,
 			.ingress = 0,
 			.egress = 0,
 			.transfer = 1,
@@ -7727,141 +7735,123 @@ flow_hw_flush_all_ctrl_flows(struct rte_eth_dev *dev)
 }
 
 int
-mlx5_flow_hw_esw_create_mgr_sq_miss_flow(struct rte_eth_dev *dev)
+mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev, uint32_t sqn)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
-	struct rte_flow_item_ethdev port_spec = {
+	uint16_t port_id = dev->data->port_id;
+	struct rte_flow_item_ethdev esw_mgr_spec = {
 		.port_id = MLX5_REPRESENTED_PORT_ESW_MGR,
 	};
-	struct rte_flow_item_ethdev port_mask = {
+	struct rte_flow_item_ethdev esw_mgr_mask = {
 		.port_id = MLX5_REPRESENTED_PORT_ESW_MGR,
 	};
-	struct rte_flow_item items[] = {
-		{
-			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
-			.spec = &port_spec,
-			.mask = &port_mask,
-		},
-		{
-			.type = RTE_FLOW_ITEM_TYPE_END,
-		},
-	};
-	struct rte_flow_action_modify_field modify_field = {
-		.operation = RTE_FLOW_MODIFY_SET,
-		.dst = {
-			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
-		},
-		.src = {
-			.field = RTE_FLOW_FIELD_VALUE,
-		},
-		.width = 1,
-	};
-	struct rte_flow_action_jump jump = {
-		.group = 1,
-	};
-	struct rte_flow_action actions[] = {
-		{
-			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
-			.conf = &modify_field,
-		},
-		{
-			.type = RTE_FLOW_ACTION_TYPE_JUMP,
-			.conf = &jump,
-		},
-		{
-			.type = RTE_FLOW_ACTION_TYPE_END,
-		},
-	};
-
-	MLX5_ASSERT(priv->master);
-	if (!priv->dr_ctx ||
-	    !priv->hw_esw_sq_miss_root_tbl)
-		return 0;
-	return flow_hw_create_ctrl_flow(dev, dev,
-					priv->hw_esw_sq_miss_root_tbl,
-					items, 0, actions, 0);
-}
-
-int
-mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
-{
-	uint16_t port_id = dev->data->port_id;
 	struct rte_flow_item_tag reg_c0_spec = {
 		.index = (uint8_t)REG_C_0,
+		.data = flow_hw_esw_mgr_regc_marker(dev),
 	};
 	struct rte_flow_item_tag reg_c0_mask = {
 		.index = 0xff,
+		.data = flow_hw_esw_mgr_regc_marker_mask(dev),
 	};
-	struct mlx5_rte_flow_item_tx_queue queue_spec = {
-		.queue = txq,
-	};
-	struct mlx5_rte_flow_item_tx_queue queue_mask = {
-		.queue = UINT32_MAX,
-	};
-	struct rte_flow_item items[] = {
-		{
-			.type = (enum rte_flow_item_type)
-				MLX5_RTE_FLOW_ITEM_TYPE_TAG,
-			.spec = &reg_c0_spec,
-			.mask = &reg_c0_mask,
-		},
-		{
-			.type = (enum rte_flow_item_type)
-				MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
-			.spec = &queue_spec,
-			.mask = &queue_mask,
-		},
-		{
-			.type = RTE_FLOW_ITEM_TYPE_END,
-		},
+	struct mlx5_rte_flow_item_sq sq_spec = {
+		.queue = sqn,
 	};
 	struct rte_flow_action_ethdev port = {
 		.port_id = port_id,
 	};
-	struct rte_flow_action actions[] = {
-		{
-			.type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT,
-			.conf = &port,
-		},
-		{
-			.type = RTE_FLOW_ACTION_TYPE_END,
-		},
-	};
+	struct rte_flow_item items[3] = { { 0 } };
+	struct rte_flow_action actions[3] = { { 0 } };
 	struct rte_eth_dev *proxy_dev;
 	struct mlx5_priv *proxy_priv;
 	uint16_t proxy_port_id = dev->data->port_id;
-	uint32_t marker_bit;
 	int ret;
 
-	RTE_SET_USED(txq);
 	ret = rte_flow_pick_transfer_proxy(port_id, &proxy_port_id, NULL);
 	if (ret) {
-		DRV_LOG(ERR, "Unable to pick proxy port for port %u", port_id);
+		DRV_LOG(ERR, "Unable to pick transfer proxy port for port %u. Transfer proxy "
+			     "port must be present to create default SQ miss flows.",
+			     port_id);
 		return ret;
 	}
 	proxy_dev = &rte_eth_devices[proxy_port_id];
 	proxy_priv = proxy_dev->data->dev_private;
-	if (!proxy_priv->dr_ctx)
+	if (!proxy_priv->dr_ctx) {
+		DRV_LOG(DEBUG, "Transfer proxy port (port %u) of port %u must be configured "
+			       "for HWS to create default SQ miss flows. Default flows will "
+			       "not be created.",
+			       proxy_port_id, port_id);
 		return 0;
+	}
 	if (!proxy_priv->hw_esw_sq_miss_root_tbl ||
 	    !proxy_priv->hw_esw_sq_miss_tbl) {
-		DRV_LOG(ERR, "port %u proxy port %u was configured but default"
-			" flow tables are not created",
-			port_id, proxy_port_id);
+		DRV_LOG(ERR, "Transfer proxy port (port %u) of port %u was configured, but "
+			     "default flow tables were not created.",
+			     proxy_port_id, port_id);
 		rte_errno = ENOMEM;
 		return -rte_errno;
 	}
-	marker_bit = flow_hw_usable_lsb_vport_mask(proxy_priv);
-	if (!marker_bit) {
-		DRV_LOG(ERR, "Unable to set up control flow in SQ miss table");
-		rte_errno = EINVAL;
-		return -rte_errno;
+	/*
+	 * Create a root SQ miss flow rule - match E-Switch Manager and SQ,
+	 * and jump to group 1.
+	 */
+	items[0] = (struct rte_flow_item){
+		.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+		.spec = &esw_mgr_spec,
+		.mask = &esw_mgr_mask,
+	};
+	items[1] = (struct rte_flow_item){
+		.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_SQ,
+		.spec = &sq_spec,
+	};
+	items[2] = (struct rte_flow_item){
+		.type = RTE_FLOW_ITEM_TYPE_END,
+	};
+	actions[0] = (struct rte_flow_action){
+		.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+	};
+	actions[1] = (struct rte_flow_action){
+		.type = RTE_FLOW_ACTION_TYPE_JUMP,
+	};
+	actions[2] = (struct rte_flow_action) {
+		.type = RTE_FLOW_ACTION_TYPE_END,
+	};
+	ret = flow_hw_create_ctrl_flow(dev, proxy_dev, proxy_priv->hw_esw_sq_miss_root_tbl,
+				       items, 0, actions, 0);
+	if (ret) {
+		DRV_LOG(ERR, "Port %u failed to create root SQ miss flow rule for SQ %u, ret %d",
+			port_id, sqn, ret);
+		return ret;
 	}
-	reg_c0_spec.data = marker_bit;
-	reg_c0_mask.data = marker_bit;
-	return flow_hw_create_ctrl_flow(dev, proxy_dev,
-					proxy_priv->hw_esw_sq_miss_tbl,
-					items, 0, actions, 0);
+	/*
+	 * Create a non-root SQ miss flow rule - match REG_C_0 marker and SQ,
+	 * and forward to port.
+	 */
+	items[0] = (struct rte_flow_item){
+		.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+		.spec = &reg_c0_spec,
+		.mask = &reg_c0_mask,
+	};
+	items[1] = (struct rte_flow_item){
+		.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_SQ,
+		.spec = &sq_spec,
+	};
+	items[2] = (struct rte_flow_item){
+		.type = RTE_FLOW_ITEM_TYPE_END,
+	};
+	actions[0] = (struct rte_flow_action){
+		.type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT,
+		.conf = &port,
+	};
+	actions[1] = (struct rte_flow_action){
+		.type = RTE_FLOW_ACTION_TYPE_END,
+	};
+	ret = flow_hw_create_ctrl_flow(dev, proxy_dev, proxy_priv->hw_esw_sq_miss_tbl,
+				       items, 0, actions, 0);
+	if (ret) {
+		DRV_LOG(ERR, "Port %u failed to create HWS SQ miss flow rule for SQ %u, ret %d",
+			port_id, sqn, ret);
+		return ret;
+	}
+	return 0;
 }
 
 int
@@ -7899,17 +7889,24 @@ mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev)
 
 	ret = rte_flow_pick_transfer_proxy(port_id, &proxy_port_id, NULL);
 	if (ret) {
-		DRV_LOG(ERR, "Unable to pick proxy port for port %u", port_id);
+		DRV_LOG(ERR, "Unable to pick transfer proxy port for port %u. Transfer proxy "
+			     "port must be present to create default FDB jump rule.",
+			     port_id);
 		return ret;
 	}
 	proxy_dev = &rte_eth_devices[proxy_port_id];
 	proxy_priv = proxy_dev->data->dev_private;
-	if (!proxy_priv->dr_ctx)
+	if (!proxy_priv->dr_ctx) {
+		DRV_LOG(DEBUG, "Transfer proxy port (port %u) of port %u must be configured "
+			       "for HWS to create default FDB jump rule. Default rule will "
+			       "not be created.",
+			       proxy_port_id, port_id);
 		return 0;
+	}
 	if (!proxy_priv->hw_esw_zero_tbl) {
-		DRV_LOG(ERR, "port %u proxy port %u was configured but default"
-			" flow tables are not created",
-			port_id, proxy_port_id);
+		DRV_LOG(ERR, "Transfer proxy port (port %u) of port %u was configured, but "
+			     "default flow tables were not created.",
+			     proxy_port_id, port_id);
 		rte_errno = EINVAL;
 		return -rte_errno;
 	}
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 2603196933..a973cbc5e3 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -426,7 +426,7 @@ mlx5_hairpin_queue_peer_update(struct rte_eth_dev *dev, uint16_t peer_queue,
 			mlx5_txq_release(dev, peer_queue);
 			return -rte_errno;
 		}
-		peer_info->qp_id = txq_ctrl->obj->sq->id;
+		peer_info->qp_id = mlx5_txq_get_sqn(txq_ctrl);
 		peer_info->vhca_id = priv->sh->cdev->config.hca_attr.vhca_id;
 		/* 1-to-1 mapping, only the first one is used. */
 		peer_info->peer_q = txq_ctrl->hairpin_conf.peers[0].queue;
@@ -818,7 +818,7 @@ mlx5_hairpin_bind_single_port(struct rte_eth_dev *dev, uint16_t rx_port)
 		}
 		/* Pass TxQ's information to peer RxQ and try binding. */
 		cur.peer_q = rx_queue;
-		cur.qp_id = txq_ctrl->obj->sq->id;
+		cur.qp_id = mlx5_txq_get_sqn(txq_ctrl);
 		cur.vhca_id = priv->sh->cdev->config.hca_attr.vhca_id;
 		cur.tx_explicit = txq_ctrl->hairpin_conf.tx_explicit;
 		cur.manual_bind = txq_ctrl->hairpin_conf.manual_bind;
@@ -1300,8 +1300,6 @@ mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
 	int ret;
 
 	if (priv->sh->config.dv_esw_en && priv->master) {
-		if (mlx5_flow_hw_esw_create_mgr_sq_miss_flow(dev))
-			goto error;
 		if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS)
 			if (mlx5_flow_hw_create_tx_default_mreg_copy_flow(dev))
 				goto error;
@@ -1312,10 +1310,7 @@ mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
 
 		if (!txq)
 			continue;
-		if (txq->is_hairpin)
-			queue = txq->obj->sq->id;
-		else
-			queue = txq->obj->sq_obj.sq->id;
+		queue = mlx5_txq_get_sqn(txq);
 		if ((priv->representor || priv->master) &&
 		    priv->sh->config.dv_esw_en) {
 			if (mlx5_flow_hw_esw_create_sq_miss_flow(dev, queue)) {
@@ -1325,9 +1320,15 @@ mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
 		}
 		mlx5_txq_release(dev, i);
 	}
-	if ((priv->master || priv->representor) && priv->sh->config.dv_esw_en) {
-		if (mlx5_flow_hw_esw_create_default_jump_flow(dev))
-			goto error;
+	if (priv->sh->config.fdb_def_rule) {
+		if ((priv->master || priv->representor) && priv->sh->config.dv_esw_en) {
+			if (!mlx5_flow_hw_esw_create_default_jump_flow(dev))
+				priv->fdb_def_rule = 1;
+			else
+				goto error;
+		}
+	} else {
+		DRV_LOG(INFO, "port %u FDB default rule is disabled", dev->data->port_id);
 	}
 	return 0;
 error:
@@ -1393,14 +1394,18 @@ mlx5_traffic_enable(struct rte_eth_dev *dev)
 		    txq_ctrl->hairpin_conf.tx_explicit == 0 &&
 		    txq_ctrl->hairpin_conf.peers[0].port ==
 		    priv->dev_data->port_id) {
-			ret = mlx5_ctrl_flow_source_queue(dev, i);
+			ret = mlx5_ctrl_flow_source_queue(dev,
+					mlx5_txq_get_sqn(txq_ctrl));
 			if (ret) {
 				mlx5_txq_release(dev, i);
 				goto error;
 			}
 		}
 		if (priv->sh->config.dv_esw_en) {
-			if (mlx5_flow_create_devx_sq_miss_flow(dev, i) == 0) {
+			uint32_t q = mlx5_txq_get_sqn(txq_ctrl);
+
+			if (mlx5_flow_create_devx_sq_miss_flow(dev, q) == 0) {
+				mlx5_txq_release(dev, i);
 				DRV_LOG(ERR,
 					"Port %u Tx queue %u SQ create representor devx default miss rule failed.",
 					dev->data->port_id, i);
diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h
index e0fc1872fe..6471ebf59f 100644
--- a/drivers/net/mlx5/mlx5_tx.h
+++ b/drivers/net/mlx5/mlx5_tx.h
@@ -213,6 +213,7 @@ struct mlx5_txq_ctrl *mlx5_txq_get(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_release(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_releasable(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_verify(struct rte_eth_dev *dev);
+int mlx5_txq_get_sqn(struct mlx5_txq_ctrl *txq);
 void txq_alloc_elts(struct mlx5_txq_ctrl *txq_ctrl);
 void txq_free_elts(struct mlx5_txq_ctrl *txq_ctrl);
 uint64_t mlx5_get_tx_port_offloads(struct rte_eth_dev *dev);
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 9150ced72d..7a0f1d61a5 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -27,6 +27,8 @@
 #include "mlx5_tx.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_autoconf.h"
+#include "rte_pmd_mlx5.h"
+#include "mlx5_flow.h"
 
 /**
  * Allocate TX queue elements.
@@ -1274,6 +1276,51 @@ mlx5_txq_verify(struct rte_eth_dev *dev)
 	return ret;
 }
 
+int
+mlx5_txq_get_sqn(struct mlx5_txq_ctrl *txq)
+{
+	return txq->is_hairpin ? txq->obj->sq->id : txq->obj->sq_obj.sq->id;
+}
+
+int
+rte_pmd_mlx5_external_sq_enable(uint16_t port_id, uint32_t sq_num)
+{
+	struct rte_eth_dev *dev;
+	struct mlx5_priv *priv;
+	uint32_t flow;
+
+	if (rte_eth_dev_is_valid_port(port_id) < 0) {
+		DRV_LOG(ERR, "There is no Ethernet device for port %u.",
+			port_id);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+	dev = &rte_eth_devices[port_id];
+	priv = dev->data->dev_private;
+	if ((!priv->representor && !priv->master) ||
+	    !priv->sh->config.dv_esw_en) {
+		DRV_LOG(ERR, "Port %u must be representor or master port in E-Switch mode.",
+			port_id);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	if (sq_num == 0) {
+		DRV_LOG(ERR, "Invalid SQ number.");
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+	if (priv->sh->config.dv_flow_en == 2)
+		return mlx5_flow_hw_esw_create_sq_miss_flow(dev, sq_num);
+#endif
+	flow = mlx5_flow_create_devx_sq_miss_flow(dev, sq_num);
+	if (flow > 0)
+		return 0;
+	DRV_LOG(ERR, "Port %u failed to create default miss flow for SQ %u.",
+		port_id, sq_num);
+	return -rte_errno;
+}
+
 /**
  * Set the Tx queue dynamic timestamp (mask and offset)
  *
diff --git a/drivers/net/mlx5/rte_pmd_mlx5.h b/drivers/net/mlx5/rte_pmd_mlx5.h
index fbfdd9737b..d4caea5b20 100644
--- a/drivers/net/mlx5/rte_pmd_mlx5.h
+++ b/drivers/net/mlx5/rte_pmd_mlx5.h
@@ -139,6 +139,23 @@ int rte_pmd_mlx5_external_rx_queue_id_unmap(uint16_t port_id,
 __rte_experimental
 int rte_pmd_mlx5_host_shaper_config(int port_id, uint8_t rate, uint32_t flags);
 
+/**
+ * Enable traffic for external SQ.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] sq_num
+ *   SQ HW number.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ *   Possible values for rte_errno:
+ *   - EINVAL - invalid sq_number or port type.
+ *   - ENODEV - there is no Ethernet device for this port id.
+ */
+__rte_experimental
+int rte_pmd_mlx5_external_sq_enable(uint16_t port_id, uint32_t sq_num);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/drivers/net/mlx5/version.map b/drivers/net/mlx5/version.map
index 9942de5079..848270da13 100644
--- a/drivers/net/mlx5/version.map
+++ b/drivers/net/mlx5/version.map
@@ -14,4 +14,5 @@ EXPERIMENTAL {
 	rte_pmd_mlx5_external_rx_queue_id_unmap;
 	# added in 22.07
 	rte_pmd_mlx5_host_shaper_config;
+	rte_pmd_mlx5_external_sq_enable;
 };
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread
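
For reference, a minimal usage sketch of the rte_pmd_mlx5_external_sq_enable()
API introduced above follows. The helper name, port id and SQ number are
illustrative assumptions, not taken from the patch:

    #include <stdint.h>
    #include <stdio.h>

    #include <rte_errno.h>
    #include <rte_pmd_mlx5.h>

    /*
     * Enable default miss steering for an SQ created outside the PMD
     * (e.g. via DevX). The values below are placeholders.
     */
    static int
    example_enable_external_sq(void)
    {
        uint16_t port_id = 0;     /* assumed E-Switch master or representor port */
        uint32_t sq_num = 0x1234; /* assumed HW number of the external SQ */
        int ret;

        ret = rte_pmd_mlx5_external_sq_enable(port_id, sq_num);
        if (ret != 0)
            printf("enabling SQ %u on port %u failed: %s\n",
                   sq_num, port_id, rte_strerror(rte_errno));
        return ret;
    }

On success, the PMD creates the same default SQ miss flow rule for the given
SQ number as it creates for its own Tx queues (see mlx5_traffic_enable_hws()
in the diff above).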

* [PATCH v2 17/17] net/mlx5: support device control of representor matching
  2022-09-28  3:31 ` [PATCH v2 00/17] net/mlx5: HW steering PMD update Suanming Mou
                     ` (15 preceding siblings ...)
  2022-09-28  3:31   ` [PATCH v2 16/17] net/mlx5: support device control for E-Switch default rule Suanming Mou
@ 2022-09-28  3:31   ` Suanming Mou
  16 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-28  3:31 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Dariusz Sosnowski

From: Dariusz Sosnowski <dsosnowski@nvidia.com>

In some E-Switch use cases, applications want to receive all traffic
on a single port. Since the flow API currently does not provide a way
to match traffic forwarded to any port representor, this patch adds
support for controlling representor matching on ingress flow rules.

Representor matching is controlled through the new device argument
repr_matching_en.

- If representor matching is enabled (the default setting), then each
  ingress pattern template has an implicit REPRESENTED_PORT item
  added. Flow rules based on this pattern template will match the
  vport associated with the port on which the rule is created.
- If representor matching is disabled, then no implicit item is added.
  As a result, ingress flow rules will match traffic coming to any
  port, not only the port on which the flow rule is created.

Representor matching is enabled by default to provide the expected
default behavior.
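
For illustration, disabling representor matching could look like the
following testpmd invocation (a sketch only; the PCI address and the
representor list are assumptions, not part of this patch):

    dpdk-testpmd -a 08:00.0,dv_flow_en=2,representor=vf[0-1],repr_matching_en=0 -- -i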

This patch enables egress flow rules on representors when E-Switch is
enabled in the following configurations:

- repr_matching_en=1 and dv_xmeta_en=4
- repr_matching_en=1 and dv_xmeta_en=0
- repr_matching_en=0 and dv_xmeta_en=0

When representor matching is enabled, the following logic is
implemented (an illustrative example follows the list):

1. Creating an egress template table in group 0 for each port. These
   tables will hold default flow rules defined as follows:

      pattern SQ
      actions MODIFY_FIELD (set available bits in REG_C_0 to
                            vport_meta_tag)
              MODIFY_FIELD (copy REG_A to REG_C_1, only when
                            dv_xmeta_en == 4)
              JUMP (group 1)

2. Egress pattern templates created by an application have an implicit
   MLX5_RTE_FLOW_ITEM_TYPE_TAG item prepended to pattern, which matches
   available bits of REG_C_0.

3. Egress flow rules created by an application have an implicit
   MLX5_RTE_FLOW_ITEM_TYPE_TAG item prepended to pattern, which matches
   vport_meta_tag placed in available bits of REG_C_0.

4. Egress template tables created by an application, which are in
   group n, are placed in group n + 1.

5. Items and actions related to META are operating on REG_A when
   dv_xmeta_en == 0 or REG_C_1 when dv_xmeta_en == 4.

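As an illustration of points 2-4 above (the pattern and group numbers
are examples only, not taken from this patch), an application egress
pattern template

      pattern eth / end

created for group 1 is handled as if it contained

      pattern MLX5_RTE_FLOW_ITEM_TYPE_TAG (available bits of REG_C_0
              == vport_meta_tag) / eth / end

and the corresponding template table is placed in group 2, relying on
the default group 0 rules above to set the tag before jumping to
non-root groups.
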
When representor matching is disabled and extended metadata is
disabled, no changes to the current logic are required.

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_os.c |  11 +
 drivers/net/mlx5/mlx5.c          |  13 +
 drivers/net/mlx5/mlx5.h          |   5 +
 drivers/net/mlx5/mlx5_flow.c     |   8 +-
 drivers/net/mlx5/mlx5_flow.h     |   7 +
 drivers/net/mlx5/mlx5_flow_hw.c  | 718 ++++++++++++++++++++++++-------
 drivers/net/mlx5/mlx5_trigger.c  | 167 ++++++-
 7 files changed, 771 insertions(+), 158 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index de8c003d02..50d34b152a 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1555,6 +1555,9 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	if (priv->sh->config.dv_flow_en == 2) {
 #ifdef HAVE_IBV_FLOW_DV_SUPPORT
 		if (priv->sh->config.dv_esw_en) {
+			uint32_t usable_bits;
+			uint32_t required_bits;
+
 			if (priv->sh->dv_regc0_mask == UINT32_MAX) {
 				DRV_LOG(ERR, "E-Switch port metadata is required when using HWS "
 					     "but it is disabled (configure it through devlink)");
@@ -1567,6 +1570,14 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 				err = ENOTSUP;
 				goto error;
 			}
+			usable_bits = __builtin_popcount(priv->sh->dv_regc0_mask);
+			required_bits = __builtin_popcount(priv->vport_meta_mask);
+			if (usable_bits < required_bits) {
+				DRV_LOG(ERR, "Not enough bits available in reg_c[0] to provide "
+					     "representor matching.");
+				err = ENOTSUP;
+				goto error;
+			}
 		}
 		if (priv->vport_meta_mask)
 			flow_hw_set_port_info(eth_dev);
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 742607509b..c249619a60 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -181,6 +181,9 @@
 /* HW steering counter's query interval. */
 #define MLX5_HWS_CNT_CYCLE_TIME "svc_cycle_time"
 
+/* Device parameter to control representor matching in ingress/egress flows with HWS. */
+#define MLX5_REPR_MATCHING_EN "repr_matching_en"
+
 /* Shared memory between primary and secondary processes. */
 struct mlx5_shared_data *mlx5_shared_data;
 
@@ -1283,6 +1286,8 @@ mlx5_dev_args_check_handler(const char *key, const char *val, void *opaque)
 		config->cnt_svc.service_core = tmp;
 	} else if (strcmp(MLX5_HWS_CNT_CYCLE_TIME, key) == 0) {
 		config->cnt_svc.cycle_time = tmp;
+	} else if (strcmp(MLX5_REPR_MATCHING_EN, key) == 0) {
+		config->repr_matching = !!tmp;
 	}
 	return 0;
 }
@@ -1321,6 +1326,7 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 		MLX5_FDB_DEFAULT_RULE_EN,
 		MLX5_HWS_CNT_SERVICE_CORE,
 		MLX5_HWS_CNT_CYCLE_TIME,
+		MLX5_REPR_MATCHING_EN,
 		NULL,
 	};
 	int ret = 0;
@@ -1335,6 +1341,7 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 	config->fdb_def_rule = 1;
 	config->cnt_svc.cycle_time = MLX5_CNT_SVC_CYCLE_TIME_DEFAULT;
 	config->cnt_svc.service_core = rte_get_main_lcore();
+	config->repr_matching = 1;
 	if (mkvlist != NULL) {
 		/* Process parameters. */
 		ret = mlx5_kvargs_process(mkvlist, params,
@@ -1368,6 +1375,11 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 			config->dv_xmeta_en);
 		config->dv_xmeta_en = MLX5_XMETA_MODE_LEGACY;
 	}
+	if (config->dv_flow_en != 2 && !config->repr_matching) {
+		DRV_LOG(DEBUG, "Disabling representor matching is valid only "
+			       "when HW Steering is enabled.");
+		config->repr_matching = 1;
+	}
 	if (config->tx_pp && !sh->dev_cap.txpp_en) {
 		DRV_LOG(ERR, "Packet pacing is not supported.");
 		rte_errno = ENODEV;
@@ -1411,6 +1423,7 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 	DRV_LOG(DEBUG, "\"allow_duplicate_pattern\" is %u.",
 		config->allow_duplicate_pattern);
 	DRV_LOG(DEBUG, "\"fdb_def_rule_en\" is %u.", config->fdb_def_rule);
+	DRV_LOG(DEBUG, "\"repr_matching_en\" is %u.", config->repr_matching);
 	return 0;
 }
 
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 9299ffe9f9..25719401c8 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -316,6 +316,7 @@ struct mlx5_sh_config {
 	} cnt_svc; /* configure for HW steering's counter's service. */
 	/* Allow/Prevent the duplicate rules pattern. */
 	uint32_t fdb_def_rule:1; /* Create FDB default jump rule */
+	uint32_t repr_matching:1; /* Enable implicit vport matching in HWS FDB. */
 };
 
 /* Structure for VF VLAN workaround. */
@@ -366,6 +367,7 @@ struct mlx5_hw_q_job {
 			void *out_data;
 		} __rte_packed;
 		struct rte_flow_item_ethdev port_spec;
+		struct rte_flow_item_tag tag_spec;
 	} __rte_packed;
 };
 
@@ -1673,6 +1675,9 @@ struct mlx5_priv {
 	struct rte_flow_template_table *hw_esw_sq_miss_tbl;
 	struct rte_flow_template_table *hw_esw_zero_tbl;
 	struct rte_flow_template_table *hw_tx_meta_cpy_tbl;
+	struct rte_flow_pattern_template *hw_tx_repr_tagging_pt;
+	struct rte_flow_actions_template *hw_tx_repr_tagging_at;
+	struct rte_flow_template_table *hw_tx_repr_tagging_tbl;
 	struct mlx5_indexed_pool *flows[MLX5_FLOW_TYPE_MAXI];
 	/* RTE Flow rules. */
 	uint32_t ctrl_flows; /* Control flow rules. */
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 2142cd828a..026d4eb9c0 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -1123,7 +1123,11 @@ mlx5_flow_get_reg_id(struct rte_eth_dev *dev,
 		}
 		break;
 	case MLX5_METADATA_TX:
-		return REG_A;
+		if (config->dv_flow_en == 2 && config->dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS) {
+			return REG_C_1;
+		} else {
+			return REG_A;
+		}
 	case MLX5_METADATA_FDB:
 		switch (config->dv_xmeta_en) {
 		case MLX5_XMETA_MODE_LEGACY:
@@ -11319,7 +11323,7 @@ mlx5_flow_pick_transfer_proxy(struct rte_eth_dev *dev,
 			return 0;
 		}
 	}
-	return rte_flow_error_set(error, EINVAL,
+	return rte_flow_error_set(error, ENODEV,
 				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				  NULL, "unable to find a proxy port");
 }
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 63f946473d..a497dac474 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1199,12 +1199,18 @@ struct rte_flow_pattern_template {
 	struct rte_flow_pattern_template_attr attr;
 	struct mlx5dr_match_template *mt; /* mlx5 match template. */
 	uint64_t item_flags; /* Item layer flags. */
+	uint64_t orig_item_nb; /* Number of pattern items provided by the user (with END item). */
 	uint32_t refcnt;  /* Reference counter. */
 	/*
 	 * If true, then rule pattern should be prepended with
 	 * represented_port pattern item.
 	 */
 	bool implicit_port;
+	/*
+	 * If true, then rule pattern should be prepended with
+	 * tag pattern item for representor matching.
+	 */
+	bool implicit_tag;
 };
 
 /* Flow action template struct. */
@@ -2479,6 +2485,7 @@ int mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev,
 					 uint32_t sqn);
 int mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev);
 int mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev);
+int mlx5_flow_hw_tx_repr_matching_flow(struct rte_eth_dev *dev, uint32_t sqn);
 int mlx5_flow_actions_validate(struct rte_eth_dev *dev,
 		const struct rte_flow_actions_template_attr *attr,
 		const struct rte_flow_action actions[],
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 6a7f1376f7..242bbaa4d4 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -32,12 +32,15 @@
 /* Maximum number of rules in control flow tables. */
 #define MLX5_HW_CTRL_FLOW_NB_RULES (4096)
 
-/* Lowest flow group usable by an application. */
+/* Lowest flow group usable by an application if group translation is done. */
 #define MLX5_HW_LOWEST_USABLE_GROUP (1)
 
 /* Maximum group index usable by user applications for transfer flows. */
 #define MLX5_HW_MAX_TRANSFER_GROUP (UINT32_MAX - 1)
 
+/* Maximum group index usable by user applications for egress flows. */
+#define MLX5_HW_MAX_EGRESS_GROUP (UINT32_MAX - 1)
+
 /* Lowest priority for HW root table. */
 #define MLX5_HW_LOWEST_PRIO_ROOT 15
 
@@ -61,6 +64,9 @@ flow_hw_set_vlan_vid_construct(struct rte_eth_dev *dev,
 			       const struct mlx5_hw_actions *hw_acts,
 			       const struct rte_flow_action *action);
 
+static __rte_always_inline uint32_t flow_hw_tx_tag_regc_mask(struct rte_eth_dev *dev);
+static __rte_always_inline uint32_t flow_hw_tx_tag_regc_value(struct rte_eth_dev *dev);
+
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops;
 
 /* DR action flags with different table. */
@@ -2329,21 +2335,18 @@ flow_hw_get_rule_items(struct rte_eth_dev *dev,
 		       uint8_t pattern_template_index,
 		       struct mlx5_hw_q_job *job)
 {
-	if (table->its[pattern_template_index]->implicit_port) {
-		const struct rte_flow_item *curr_item;
-		unsigned int nb_items;
-		bool found_end;
-		unsigned int i;
-
-		/* Count number of pattern items. */
-		nb_items = 0;
-		found_end = false;
-		for (curr_item = items; !found_end; ++curr_item) {
-			++nb_items;
-			if (curr_item->type == RTE_FLOW_ITEM_TYPE_END)
-				found_end = true;
+	struct rte_flow_pattern_template *pt = table->its[pattern_template_index];
+
+	/* Only one implicit item can be added to flow rule pattern. */
+	MLX5_ASSERT(!pt->implicit_port || !pt->implicit_tag);
+	/* At least one item was allocated in job descriptor for items. */
+	MLX5_ASSERT(MLX5_HW_MAX_ITEMS >= 1);
+	if (pt->implicit_port) {
+		if (pt->orig_item_nb + 1 > MLX5_HW_MAX_ITEMS) {
+			rte_errno = ENOMEM;
+			return NULL;
 		}
-		/* Prepend represented port item. */
+		/* Set up represented port item in job descriptor. */
 		job->port_spec = (struct rte_flow_item_ethdev){
 			.port_id = dev->data->port_id,
 		};
@@ -2351,21 +2354,26 @@ flow_hw_get_rule_items(struct rte_eth_dev *dev,
 			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
 			.spec = &job->port_spec,
 		};
-		found_end = false;
-		for (i = 1; i < MLX5_HW_MAX_ITEMS && i - 1 < nb_items; ++i) {
-			job->items[i] = items[i - 1];
-			if (items[i - 1].type == RTE_FLOW_ITEM_TYPE_END) {
-				found_end = true;
-				break;
-			}
-		}
-		if (i >= MLX5_HW_MAX_ITEMS && !found_end) {
+		rte_memcpy(&job->items[1], items, sizeof(*items) * pt->orig_item_nb);
+		return job->items;
+	} else if (pt->implicit_tag) {
+		if (pt->orig_item_nb + 1 > MLX5_HW_MAX_ITEMS) {
 			rte_errno = ENOMEM;
 			return NULL;
 		}
+		/* Set up tag item in job descriptor. */
+		job->tag_spec = (struct rte_flow_item_tag){
+			.data = flow_hw_tx_tag_regc_value(dev),
+		};
+		job->items[0] = (struct rte_flow_item){
+			.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+			.spec = &job->tag_spec,
+		};
+		rte_memcpy(&job->items[1], items, sizeof(*items) * pt->orig_item_nb);
 		return job->items;
+	} else {
+		return items;
 	}
-	return items;
 }
 
 /**
@@ -3152,9 +3160,10 @@ flow_hw_translate_group(struct rte_eth_dev *dev,
 			struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_sh_config *config = &priv->sh->config;
 	const struct rte_flow_attr *flow_attr = &cfg->attr.flow_attr;
 
-	if (priv->sh->config.dv_esw_en &&
+	if (config->dv_esw_en &&
 	    priv->fdb_def_rule &&
 	    cfg->external &&
 	    flow_attr->transfer) {
@@ -3164,6 +3173,22 @@ flow_hw_translate_group(struct rte_eth_dev *dev,
 						  NULL,
 						  "group index not supported");
 		*table_group = group + 1;
+	} else if (config->dv_esw_en &&
+		   !(config->repr_matching && config->dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS) &&
+		   cfg->external &&
+		   flow_attr->egress) {
+		/*
+		 * On E-Switch setups, egress group translation is not done if and only if
+		 * representor matching is disabled and legacy metadata mode is selected.
+		 * In all other cases, egress group 0 is reserved for representor tagging flows
+		 * and metadata copy flows.
+		 */
+		if (group > MLX5_HW_MAX_EGRESS_GROUP)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
+						  NULL,
+						  "group index not supported");
+		*table_group = group + 1;
 	} else {
 		*table_group = group;
 	}
@@ -3204,7 +3229,6 @@ flow_hw_template_table_create(struct rte_eth_dev *dev,
 			      uint8_t nb_action_templates,
 			      struct rte_flow_error *error)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_flow_template_table_cfg cfg = {
 		.attr = *attr,
 		.external = true,
@@ -3213,12 +3237,6 @@ flow_hw_template_table_create(struct rte_eth_dev *dev,
 
 	if (flow_hw_translate_group(dev, &cfg, group, &cfg.attr.flow_attr.group, error))
 		return NULL;
-	if (priv->sh->config.dv_esw_en && cfg.attr.flow_attr.egress) {
-		rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ATTR, NULL,
-				  "egress flows are not supported with HW Steering"
-				  " when E-Switch is enabled");
-		return NULL;
-	}
 	return flow_hw_table_create(dev, &cfg, item_templates, nb_item_templates,
 				    action_templates, nb_action_templates, error);
 }
@@ -4456,26 +4474,28 @@ flow_hw_actions_template_destroy(struct rte_eth_dev *dev __rte_unused,
 	return 0;
 }
 
-static struct rte_flow_item *
-flow_hw_copy_prepend_port_item(const struct rte_flow_item *items,
-			       struct rte_flow_error *error)
+static uint32_t
+flow_hw_count_items(const struct rte_flow_item *items)
 {
 	const struct rte_flow_item *curr_item;
-	struct rte_flow_item *copied_items;
-	bool found_end;
-	unsigned int nb_items;
-	unsigned int i;
-	size_t size;
+	uint32_t nb_items;
 
-	/* Count number of pattern items. */
 	nb_items = 0;
-	found_end = false;
-	for (curr_item = items; !found_end; ++curr_item) {
+	for (curr_item = items; curr_item->type != RTE_FLOW_ITEM_TYPE_END; ++curr_item)
 		++nb_items;
-		if (curr_item->type == RTE_FLOW_ITEM_TYPE_END)
-			found_end = true;
-	}
-	/* Allocate new array of items and prepend REPRESENTED_PORT item. */
+	return ++nb_items;
+}
+
+static struct rte_flow_item *
+flow_hw_prepend_item(const struct rte_flow_item *items,
+		     const uint32_t nb_items,
+		     const struct rte_flow_item *new_item,
+		     struct rte_flow_error *error)
+{
+	struct rte_flow_item *copied_items;
+	size_t size;
+
+	/* Allocate new array of items. */
 	size = sizeof(*copied_items) * (nb_items + 1);
 	copied_items = mlx5_malloc(MLX5_MEM_ZERO, size, 0, rte_socket_id());
 	if (!copied_items) {
@@ -4485,14 +4505,9 @@ flow_hw_copy_prepend_port_item(const struct rte_flow_item *items,
 				   "cannot allocate item template");
 		return NULL;
 	}
-	copied_items[0] = (struct rte_flow_item){
-		.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
-		.spec = NULL,
-		.last = NULL,
-		.mask = &rte_flow_item_ethdev_mask,
-	};
-	for (i = 1; i < nb_items + 1; ++i)
-		copied_items[i] = items[i - 1];
+	/* Put new item at the beginning and copy the rest. */
+	copied_items[0] = *new_item;
+	rte_memcpy(&copied_items[1], items, sizeof(*items) * nb_items);
 	return copied_items;
 }
 
@@ -4513,17 +4528,13 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 	if (priv->sh->config.dv_esw_en) {
 		MLX5_ASSERT(priv->master || priv->representor);
 		if (priv->master) {
-			/*
-			 * It is allowed to specify ingress, egress and transfer attributes
-			 * at the same time, in order to construct flows catching all missed
-			 * FDB traffic and forwarding it to the master port.
-			 */
-			if (!(attr->ingress ^ attr->egress ^ attr->transfer))
+			if ((attr->ingress && attr->egress) ||
+			    (attr->ingress && attr->transfer) ||
+			    (attr->egress && attr->transfer))
 				return rte_flow_error_set(error, EINVAL,
 							  RTE_FLOW_ERROR_TYPE_ATTR, NULL,
-							  "only one or all direction attributes"
-							  " at once can be used on transfer proxy"
-							  " port");
+							  "only one direction attribute at once"
+							  " can be used on transfer proxy port");
 		} else {
 			if (attr->transfer)
 				return rte_flow_error_set(error, EINVAL,
@@ -4576,11 +4587,16 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 			break;
 		}
 		case RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT:
-			if (attr->ingress || attr->egress)
+			if (attr->ingress && priv->sh->config.repr_matching)
+				return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ITEM, NULL,
+						  "represented port item cannot be used"
+						  " when ingress attribute is set");
+			if (attr->egress)
 				return rte_flow_error_set(error, EINVAL,
 						  RTE_FLOW_ERROR_TYPE_ITEM, NULL,
 						  "represented port item cannot be used"
-						  " when transfer attribute is set");
+						  " when egress attribute is set");
 			break;
 		case RTE_FLOW_ITEM_TYPE_META:
 			if (!priv->sh->config.dv_esw_en ||
@@ -4642,6 +4658,17 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static bool
+flow_hw_pattern_has_sq_match(const struct rte_flow_item *items)
+{
+	unsigned int i;
+
+	for (i = 0; items[i].type != RTE_FLOW_ITEM_TYPE_END; ++i)
+		if (items[i].type == (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_SQ)
+			return true;
+	return false;
+}
+
 /**
  * Create flow item template.
  *
@@ -4667,17 +4694,53 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 	struct rte_flow_pattern_template *it;
 	struct rte_flow_item *copied_items = NULL;
 	const struct rte_flow_item *tmpl_items;
+	uint64_t orig_item_nb;
+	struct rte_flow_item port = {
+		.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+		.mask = &rte_flow_item_ethdev_mask,
+	};
+	struct rte_flow_item_tag tag_v = {
+		.data = 0,
+		.index = REG_C_0,
+	};
+	struct rte_flow_item_tag tag_m = {
+		.data = flow_hw_tx_tag_regc_mask(dev),
+		.index = 0xff,
+	};
+	struct rte_flow_item tag = {
+		.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+		.spec = &tag_v,
+		.mask = &tag_m,
+		.last = NULL
+	};
 
 	if (flow_hw_pattern_validate(dev, attr, items, error))
 		return NULL;
-	if (priv->sh->config.dv_esw_en && attr->ingress && !attr->egress && !attr->transfer) {
-		copied_items = flow_hw_copy_prepend_port_item(items, error);
+	orig_item_nb = flow_hw_count_items(items);
+	if (priv->sh->config.dv_esw_en &&
+	    priv->sh->config.repr_matching &&
+	    attr->ingress && !attr->egress && !attr->transfer) {
+		copied_items = flow_hw_prepend_item(items, orig_item_nb, &port, error);
+		if (!copied_items)
+			return NULL;
+		tmpl_items = copied_items;
+	} else if (priv->sh->config.dv_esw_en &&
+		   priv->sh->config.repr_matching &&
+		   !attr->ingress && attr->egress && !attr->transfer) {
+		if (flow_hw_pattern_has_sq_match(items)) {
+			DRV_LOG(DEBUG, "Port %u omitting implicit REG_C_0 match for egress "
+				       "pattern template", dev->data->port_id);
+			tmpl_items = items;
+			goto setup_pattern_template;
+		}
+		copied_items = flow_hw_prepend_item(items, orig_item_nb, &tag, error);
 		if (!copied_items)
 			return NULL;
 		tmpl_items = copied_items;
 	} else {
 		tmpl_items = items;
 	}
+setup_pattern_template:
 	it = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*it), 0, rte_socket_id());
 	if (!it) {
 		if (copied_items)
@@ -4689,6 +4752,7 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 		return NULL;
 	}
 	it->attr = *attr;
+	it->orig_item_nb = orig_item_nb;
 	it->mt = mlx5dr_match_template_create(tmpl_items, attr->relaxed_matching);
 	if (!it->mt) {
 		if (copied_items)
@@ -4701,11 +4765,15 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 		return NULL;
 	}
 	it->item_flags = flow_hw_rss_item_flags_get(tmpl_items);
-	it->implicit_port = !!copied_items;
+	if (copied_items) {
+		if (attr->ingress)
+			it->implicit_port = true;
+		else if (attr->egress)
+			it->implicit_tag = true;
+		mlx5_free(copied_items);
+	}
 	__atomic_fetch_add(&it->refcnt, 1, __ATOMIC_RELAXED);
 	LIST_INSERT_HEAD(&priv->flow_hw_itt, it, next);
-	if (copied_items)
-		mlx5_free(copied_items);
 	return it;
 }
 
@@ -5105,6 +5173,254 @@ flow_hw_free_vport_actions(struct mlx5_priv *priv)
 	priv->hw_vport = NULL;
 }
 
+/**
+ * Create an egress pattern template matching on source SQ.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to pattern template on success. NULL otherwise, and rte_errno is set.
+ */
+static struct rte_flow_pattern_template *
+flow_hw_create_tx_repr_sq_pattern_tmpl(struct rte_eth_dev *dev)
+{
+	struct rte_flow_pattern_template_attr attr = {
+		.relaxed_matching = 0,
+		.egress = 1,
+	};
+	struct mlx5_rte_flow_item_sq sq_mask = {
+		.queue = UINT32_MAX,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_SQ,
+			.mask = &sq_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+
+	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
+}
+
+static __rte_always_inline uint32_t
+flow_hw_tx_tag_regc_mask(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint32_t mask = priv->sh->dv_regc0_mask;
+
+	/* Mask is verified during device initialization. Sanity checking here. */
+	MLX5_ASSERT(mask != 0);
+	/*
+	 * Availability of sufficient number of bits in REG_C_0 is verified on initialization.
+	 * Sanity checking here.
+	 */
+	MLX5_ASSERT(__builtin_popcount(mask) >= __builtin_popcount(priv->vport_meta_mask));
+	return mask;
+}
+
+static __rte_always_inline uint32_t
+flow_hw_tx_tag_regc_value(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint32_t tag;
+
+	/* Mask is verified during device initialization. Sanity checking here. */
+	MLX5_ASSERT(priv->vport_meta_mask != 0);
+	tag = priv->vport_meta_tag >> (rte_bsf32(priv->vport_meta_mask));
+	/*
+	 * Availability of sufficient number of bits in REG_C_0 is verified on initialization.
+	 * Sanity checking here.
+	 */
+	MLX5_ASSERT((tag & priv->sh->dv_regc0_mask) == tag);
+	return tag;
+}
+
+static void
+flow_hw_update_action_mask(struct rte_flow_action *action,
+			   struct rte_flow_action *mask,
+			   enum rte_flow_action_type type,
+			   void *conf_v,
+			   void *conf_m)
+{
+	action->type = type;
+	action->conf = conf_v;
+	mask->type = type;
+	mask->conf = conf_m;
+}
+
+/**
+ * Create an egress actions template with MODIFY_FIELD action for setting unused REG_C_0 bits
+ * to vport tag and JUMP action to group 1.
+ *
+ * If extended metadata mode is enabled, then MODIFY_FIELD action for copying software metadata
+ * to REG_C_1 is added as well.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to actions template on success. NULL otherwise, and rte_errno is set.
+ */
+static struct rte_flow_actions_template *
+flow_hw_create_tx_repr_tag_jump_acts_tmpl(struct rte_eth_dev *dev)
+{
+	uint32_t tag_mask = flow_hw_tx_tag_regc_mask(dev);
+	uint32_t tag_value = flow_hw_tx_tag_regc_value(dev);
+	struct rte_flow_actions_template_attr attr = {
+		.egress = 1,
+	};
+	struct rte_flow_action_modify_field set_tag_v = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_C_0,
+			.offset = rte_bsf32(tag_mask),
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+		},
+		.width = __builtin_popcount(tag_mask),
+	};
+	struct rte_flow_action_modify_field set_tag_m = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+		},
+		.width = UINT32_MAX,
+	};
+	struct rte_flow_action_modify_field copy_metadata_v = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_C_1,
+		},
+		.src = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_A,
+		},
+		.width = 32,
+	};
+	struct rte_flow_action_modify_field copy_metadata_m = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.src = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.width = UINT32_MAX,
+	};
+	struct rte_flow_action_jump jump_v = {
+		.group = MLX5_HW_LOWEST_USABLE_GROUP,
+	};
+	struct rte_flow_action_jump jump_m = {
+		.group = UINT32_MAX,
+	};
+	struct rte_flow_action actions_v[4] = { { 0 } };
+	struct rte_flow_action actions_m[4] = { { 0 } };
+	unsigned int idx = 0;
+
+	rte_memcpy(set_tag_v.src.value, &tag_value, sizeof(tag_value));
+	rte_memcpy(set_tag_m.src.value, &tag_mask, sizeof(tag_mask));
+	flow_hw_update_action_mask(&actions_v[idx], &actions_m[idx],
+				   RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+				   &set_tag_v, &set_tag_m);
+	idx++;
+	if (MLX5_SH(dev)->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS) {
+		flow_hw_update_action_mask(&actions_v[idx], &actions_m[idx],
+					   RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+					   &copy_metadata_v, &copy_metadata_m);
+		idx++;
+	}
+	flow_hw_update_action_mask(&actions_v[idx], &actions_m[idx], RTE_FLOW_ACTION_TYPE_JUMP,
+				   &jump_v, &jump_m);
+	idx++;
+	flow_hw_update_action_mask(&actions_v[idx], &actions_m[idx], RTE_FLOW_ACTION_TYPE_END,
+				   NULL, NULL);
+	idx++;
+	MLX5_ASSERT(idx <= RTE_DIM(actions_v));
+	return flow_hw_actions_template_create(dev, &attr, actions_v, actions_m, NULL);
+}
+
+static void
+flow_hw_cleanup_tx_repr_tagging(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (priv->hw_tx_repr_tagging_tbl) {
+		flow_hw_table_destroy(dev, priv->hw_tx_repr_tagging_tbl, NULL);
+		priv->hw_tx_repr_tagging_tbl = NULL;
+	}
+	if (priv->hw_tx_repr_tagging_at) {
+		flow_hw_actions_template_destroy(dev, priv->hw_tx_repr_tagging_at, NULL);
+		priv->hw_tx_repr_tagging_at = NULL;
+	}
+	if (priv->hw_tx_repr_tagging_pt) {
+		flow_hw_pattern_template_destroy(dev, priv->hw_tx_repr_tagging_pt, NULL);
+		priv->hw_tx_repr_tagging_pt = NULL;
+	}
+}
+
+/**
+ * Setup templates and table used to create default Tx flow rules. These default rules
+ * allow for matching Tx representor traffic using a vport tag placed in unused bits of
+ * REG_C_0 register.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise.
+ */
+static int
+flow_hw_setup_tx_repr_tagging(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_flow_template_table_attr attr = {
+		.flow_attr = {
+			.group = 0,
+			.priority = MLX5_HW_LOWEST_PRIO_ROOT,
+			.egress = 1,
+		},
+		.nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
+	};
+	struct mlx5_flow_template_table_cfg cfg = {
+		.attr = attr,
+		.external = false,
+	};
+
+	MLX5_ASSERT(priv->sh->config.dv_esw_en);
+	MLX5_ASSERT(priv->sh->config.repr_matching);
+	priv->hw_tx_repr_tagging_pt = flow_hw_create_tx_repr_sq_pattern_tmpl(dev);
+	if (!priv->hw_tx_repr_tagging_pt)
+		goto error;
+	priv->hw_tx_repr_tagging_at = flow_hw_create_tx_repr_tag_jump_acts_tmpl(dev);
+	if (!priv->hw_tx_repr_tagging_at)
+		goto error;
+	priv->hw_tx_repr_tagging_tbl = flow_hw_table_create(dev, &cfg,
+							    &priv->hw_tx_repr_tagging_pt, 1,
+							    &priv->hw_tx_repr_tagging_at, 1,
+							    NULL);
+	if (!priv->hw_tx_repr_tagging_tbl)
+		goto error;
+	return 0;
+error:
+	flow_hw_cleanup_tx_repr_tagging(dev);
+	return -rte_errno;
+}
+
 static uint32_t
 flow_hw_esw_mgr_regc_marker_mask(struct rte_eth_dev *dev)
 {
@@ -5511,29 +5827,43 @@ flow_hw_create_tx_default_mreg_copy_actions_template(struct rte_eth_dev *dev)
 		},
 		.width = UINT32_MAX,
 	};
-	const struct rte_flow_action copy_reg_action[] = {
+	const struct rte_flow_action_jump jump_action = {
+		.group = 1,
+	};
+	const struct rte_flow_action_jump jump_mask = {
+		.group = UINT32_MAX,
+	};
+	const struct rte_flow_action actions[] = {
 		[0] = {
 			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
 			.conf = &mreg_action,
 		},
 		[1] = {
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump_action,
+		},
+		[2] = {
 			.type = RTE_FLOW_ACTION_TYPE_END,
 		},
 	};
-	const struct rte_flow_action copy_reg_mask[] = {
+	const struct rte_flow_action masks[] = {
 		[0] = {
 			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
 			.conf = &mreg_mask,
 		},
 		[1] = {
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump_mask,
+		},
+		[2] = {
 			.type = RTE_FLOW_ACTION_TYPE_END,
 		},
 	};
 	struct rte_flow_error drop_err;
 
 	RTE_SET_USED(drop_err);
-	return flow_hw_actions_template_create(dev, &tx_act_attr, copy_reg_action,
-					       copy_reg_mask, &drop_err);
+	return flow_hw_actions_template_create(dev, &tx_act_attr, actions,
+					       masks, &drop_err);
 }
 
 /**
@@ -5711,63 +6041,21 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 	struct rte_flow_actions_template *jump_one_actions_tmpl = NULL;
 	struct rte_flow_actions_template *tx_meta_actions_tmpl = NULL;
 	uint32_t xmeta = priv->sh->config.dv_xmeta_en;
+	uint32_t repr_matching = priv->sh->config.repr_matching;
 
-	/* Item templates */
+	/* Create templates and table for default SQ miss flow rules - root table. */
 	esw_mgr_items_tmpl = flow_hw_create_ctrl_esw_mgr_pattern_template(dev);
 	if (!esw_mgr_items_tmpl) {
 		DRV_LOG(ERR, "port %u failed to create E-Switch Manager item"
 			" template for control flows", dev->data->port_id);
 		goto error;
 	}
-	regc_sq_items_tmpl = flow_hw_create_ctrl_regc_sq_pattern_template(dev);
-	if (!regc_sq_items_tmpl) {
-		DRV_LOG(ERR, "port %u failed to create SQ item template for"
-			" control flows", dev->data->port_id);
-		goto error;
-	}
-	port_items_tmpl = flow_hw_create_ctrl_port_pattern_template(dev);
-	if (!port_items_tmpl) {
-		DRV_LOG(ERR, "port %u failed to create SQ item template for"
-			" control flows", dev->data->port_id);
-		goto error;
-	}
-	if (xmeta == MLX5_XMETA_MODE_META32_HWS) {
-		tx_meta_items_tmpl = flow_hw_create_tx_default_mreg_copy_pattern_template(dev);
-		if (!tx_meta_items_tmpl) {
-			DRV_LOG(ERR, "port %u failed to Tx metadata copy pattern"
-				" template for control flows", dev->data->port_id);
-			goto error;
-		}
-	}
-	/* Action templates */
 	regc_jump_actions_tmpl = flow_hw_create_ctrl_regc_jump_actions_template(dev);
 	if (!regc_jump_actions_tmpl) {
 		DRV_LOG(ERR, "port %u failed to create REG_C set and jump action template"
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
-	port_actions_tmpl = flow_hw_create_ctrl_port_actions_template(dev);
-	if (!port_actions_tmpl) {
-		DRV_LOG(ERR, "port %u failed to create port action template"
-			" for control flows", dev->data->port_id);
-		goto error;
-	}
-	jump_one_actions_tmpl = flow_hw_create_ctrl_jump_actions_template
-			(dev, MLX5_HW_LOWEST_USABLE_GROUP);
-	if (!jump_one_actions_tmpl) {
-		DRV_LOG(ERR, "port %u failed to create jump action template"
-			" for control flows", dev->data->port_id);
-		goto error;
-	}
-	if (xmeta == MLX5_XMETA_MODE_META32_HWS) {
-		tx_meta_actions_tmpl = flow_hw_create_tx_default_mreg_copy_actions_template(dev);
-		if (!tx_meta_actions_tmpl) {
-			DRV_LOG(ERR, "port %u failed to Tx metadata copy actions"
-				" template for control flows", dev->data->port_id);
-			goto error;
-		}
-	}
-	/* Tables */
 	MLX5_ASSERT(priv->hw_esw_sq_miss_root_tbl == NULL);
 	priv->hw_esw_sq_miss_root_tbl = flow_hw_create_ctrl_sq_miss_root_table
 			(dev, esw_mgr_items_tmpl, regc_jump_actions_tmpl);
@@ -5776,6 +6064,19 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
+	/* Create templates and table for default SQ miss flow rules - non-root table. */
+	regc_sq_items_tmpl = flow_hw_create_ctrl_regc_sq_pattern_template(dev);
+	if (!regc_sq_items_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create SQ item template for"
+			" control flows", dev->data->port_id);
+		goto error;
+	}
+	port_actions_tmpl = flow_hw_create_ctrl_port_actions_template(dev);
+	if (!port_actions_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create port action template"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
 	MLX5_ASSERT(priv->hw_esw_sq_miss_tbl == NULL);
 	priv->hw_esw_sq_miss_tbl = flow_hw_create_ctrl_sq_miss_table(dev, regc_sq_items_tmpl,
 								     port_actions_tmpl);
@@ -5784,6 +6085,20 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
+	/* Create templates and table for default FDB jump flow rules. */
+	port_items_tmpl = flow_hw_create_ctrl_port_pattern_template(dev);
+	if (!port_items_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create port pattern template for"
+			" control flows", dev->data->port_id);
+		goto error;
+	}
+	jump_one_actions_tmpl = flow_hw_create_ctrl_jump_actions_template
+			(dev, MLX5_HW_LOWEST_USABLE_GROUP);
+	if (!jump_one_actions_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create jump action template"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
 	MLX5_ASSERT(priv->hw_esw_zero_tbl == NULL);
 	priv->hw_esw_zero_tbl = flow_hw_create_ctrl_jump_table(dev, port_items_tmpl,
 							       jump_one_actions_tmpl);
@@ -5792,7 +6107,20 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
-	if (xmeta == MLX5_XMETA_MODE_META32_HWS) {
+	/* Create templates and table for default Tx metadata copy flow rule. */
+	if (!repr_matching && xmeta == MLX5_XMETA_MODE_META32_HWS) {
+		tx_meta_items_tmpl = flow_hw_create_tx_default_mreg_copy_pattern_template(dev);
+		if (!tx_meta_items_tmpl) {
+			DRV_LOG(ERR, "port %u failed to create Tx metadata copy pattern"
+				" template for control flows", dev->data->port_id);
+			goto error;
+		}
+		tx_meta_actions_tmpl = flow_hw_create_tx_default_mreg_copy_actions_template(dev);
+		if (!tx_meta_actions_tmpl) {
+			DRV_LOG(ERR, "port %u failed to create Tx metadata copy actions"
+				" template for control flows", dev->data->port_id);
+			goto error;
+		}
 		MLX5_ASSERT(priv->hw_tx_meta_cpy_tbl == NULL);
 		priv->hw_tx_meta_cpy_tbl = flow_hw_create_tx_default_mreg_copy_table(dev,
 					tx_meta_items_tmpl, tx_meta_actions_tmpl);
@@ -5816,7 +6144,7 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 		flow_hw_table_destroy(dev, priv->hw_esw_sq_miss_root_tbl, NULL);
 		priv->hw_esw_sq_miss_root_tbl = NULL;
 	}
-	if (xmeta == MLX5_XMETA_MODE_META32_HWS && tx_meta_actions_tmpl)
+	if (tx_meta_actions_tmpl)
 		flow_hw_actions_template_destroy(dev, tx_meta_actions_tmpl, NULL);
 	if (jump_one_actions_tmpl)
 		flow_hw_actions_template_destroy(dev, jump_one_actions_tmpl, NULL);
@@ -5824,7 +6152,7 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 		flow_hw_actions_template_destroy(dev, port_actions_tmpl, NULL);
 	if (regc_jump_actions_tmpl)
 		flow_hw_actions_template_destroy(dev, regc_jump_actions_tmpl, NULL);
-	if (xmeta == MLX5_XMETA_MODE_META32_HWS && tx_meta_items_tmpl)
+	if (tx_meta_items_tmpl)
 		flow_hw_pattern_template_destroy(dev, tx_meta_items_tmpl, NULL);
 	if (port_items_tmpl)
 		flow_hw_pattern_template_destroy(dev, port_items_tmpl, NULL);
@@ -6169,6 +6497,13 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		if (!priv->hw_tag[i])
 			goto err;
 	}
+	if (priv->sh->config.dv_esw_en && priv->sh->config.repr_matching) {
+		ret = flow_hw_setup_tx_repr_tagging(dev);
+		if (ret) {
+			rte_errno = -ret;
+			goto err;
+		}
+	}
 	if (is_proxy) {
 		ret = flow_hw_create_vport_actions(priv);
 		if (ret) {
@@ -6291,6 +6626,7 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 		return;
 	flow_hw_rxq_flag_set(dev, false);
 	flow_hw_flush_all_ctrl_flows(dev);
+	flow_hw_cleanup_tx_repr_tagging(dev);
 	while (!LIST_EMPTY(&priv->flow_hw_tbl_ongo)) {
 		tbl = LIST_FIRST(&priv->flow_hw_tbl_ongo);
 		flow_hw_table_destroy(dev, tbl, NULL);
@@ -7650,45 +7986,30 @@ flow_hw_destroy_ctrl_flow(struct rte_eth_dev *dev, struct rte_flow *flow)
 }
 
 /**
- * Destroys control flows created on behalf of @p owner_dev device.
+ * Destroys control flows created on behalf of @p owner device on @p dev device.
  *
- * @param owner_dev
+ * @param dev
+ *   Pointer to Ethernet device on which control flows were created.
+ * @param owner
  *   Pointer to Ethernet device owning control flows.
  *
  * @return
  *   0 on success, otherwise negative error code is returned and
  *   rte_errno is set.
  */
-int
-mlx5_flow_hw_flush_ctrl_flows(struct rte_eth_dev *owner_dev)
+static int
+flow_hw_flush_ctrl_flows_owned_by(struct rte_eth_dev *dev, struct rte_eth_dev *owner)
 {
-	struct mlx5_priv *owner_priv = owner_dev->data->dev_private;
-	struct rte_eth_dev *proxy_dev;
-	struct mlx5_priv *proxy_priv;
+	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_hw_ctrl_flow *cf;
 	struct mlx5_hw_ctrl_flow *cf_next;
-	uint16_t owner_port_id = owner_dev->data->port_id;
-	uint16_t proxy_port_id = owner_dev->data->port_id;
 	int ret;
 
-	if (owner_priv->sh->config.dv_esw_en) {
-		if (rte_flow_pick_transfer_proxy(owner_port_id, &proxy_port_id, NULL)) {
-			DRV_LOG(ERR, "Unable to find proxy port for port %u",
-				owner_port_id);
-			rte_errno = EINVAL;
-			return -rte_errno;
-		}
-		proxy_dev = &rte_eth_devices[proxy_port_id];
-		proxy_priv = proxy_dev->data->dev_private;
-	} else {
-		proxy_dev = owner_dev;
-		proxy_priv = owner_priv;
-	}
-	cf = LIST_FIRST(&proxy_priv->hw_ctrl_flows);
+	cf = LIST_FIRST(&priv->hw_ctrl_flows);
 	while (cf != NULL) {
 		cf_next = LIST_NEXT(cf, next);
-		if (cf->owner_dev == owner_dev) {
-			ret = flow_hw_destroy_ctrl_flow(proxy_dev, cf->flow);
+		if (cf->owner_dev == owner) {
+			ret = flow_hw_destroy_ctrl_flow(dev, cf->flow);
 			if (ret) {
 				rte_errno = ret;
 				return -ret;
@@ -7701,6 +8022,50 @@ mlx5_flow_hw_flush_ctrl_flows(struct rte_eth_dev *owner_dev)
 	return 0;
 }
 
+/**
+ * Destroys control flows created for @p owner_dev device.
+ *
+ * @param owner_dev
+ *   Pointer to Ethernet device owning control flows.
+ *
+ * @return
+ *   0 on success, otherwise negative error code is returned and
+ *   rte_errno is set.
+ */
+int
+mlx5_flow_hw_flush_ctrl_flows(struct rte_eth_dev *owner_dev)
+{
+	struct mlx5_priv *owner_priv = owner_dev->data->dev_private;
+	struct rte_eth_dev *proxy_dev;
+	uint16_t owner_port_id = owner_dev->data->port_id;
+	uint16_t proxy_port_id = owner_dev->data->port_id;
+	int ret;
+
+	/* Flush all flows created by this port for itself. */
+	ret = flow_hw_flush_ctrl_flows_owned_by(owner_dev, owner_dev);
+	if (ret)
+		return ret;
+	/* Flush all flows created for this port on proxy port. */
+	if (owner_priv->sh->config.dv_esw_en) {
+		ret = rte_flow_pick_transfer_proxy(owner_port_id, &proxy_port_id, NULL);
+		if (ret == -ENODEV) {
+			DRV_LOG(DEBUG, "Unable to find transfer proxy port for port %u. It was "
+				       "probably closed. Control flows were cleared.",
+				       owner_port_id);
+			rte_errno = 0;
+			return 0;
+		} else if (ret) {
+			DRV_LOG(ERR, "Unable to find proxy port for port %u (ret = %d)",
+				owner_port_id, ret);
+			return ret;
+		}
+		proxy_dev = &rte_eth_devices[proxy_port_id];
+	} else {
+		proxy_dev = owner_dev;
+	}
+	return flow_hw_flush_ctrl_flows_owned_by(proxy_dev, owner_dev);
+}
+
 /**
  * Destroys all control flows created on @p dev device.
  *
@@ -7952,6 +8317,9 @@ mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev)
 			.conf = &mreg_action,
 		},
 		[1] = {
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+		},
+		[2] = {
 			.type = RTE_FLOW_ACTION_TYPE_END,
 		},
 	};
@@ -7964,6 +8332,60 @@ mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev)
 					eth_all, 0, copy_reg_action, 0);
 }
 
+int
+mlx5_flow_hw_tx_repr_matching_flow(struct rte_eth_dev *dev, uint32_t sqn)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rte_flow_item_sq sq_spec = {
+		.queue = sqn,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_SQ,
+			.spec = &sq_spec,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	/*
+	 * Allocate actions array suitable for all cases - extended metadata enabled or not.
+	 * With extended metadata there will be an additional MODIFY_FIELD action before JUMP.
+	 */
+	struct rte_flow_action actions[] = {
+		{ .type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD },
+		{ .type = RTE_FLOW_ACTION_TYPE_JUMP },
+		{ .type = RTE_FLOW_ACTION_TYPE_END },
+		{ .type = RTE_FLOW_ACTION_TYPE_END },
+	};
+
+	/* It is assumed that caller checked for representor matching. */
+	MLX5_ASSERT(priv->sh->config.repr_matching);
+	if (!priv->dr_ctx) {
+		DRV_LOG(DEBUG, "Port %u must be configured for HWS, before creating "
+			       "default egress flow rules. Omitting creation.",
+			       dev->data->port_id);
+		return 0;
+	}
+	if (!priv->hw_tx_repr_tagging_tbl) {
+		DRV_LOG(ERR, "Port %u is configured for HWS, but table for default "
+			     "egress flow rules does not exist.",
+			     dev->data->port_id);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	/*
+	 * If extended metadata mode is enabled, then an additional MODIFY_FIELD action must be
+	 * placed before terminating JUMP action.
+	 */
+	if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS) {
+		actions[1].type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD;
+		actions[2].type = RTE_FLOW_ACTION_TYPE_JUMP;
+	}
+	return flow_hw_create_ctrl_flow(dev, dev, priv->hw_tx_repr_tagging_tbl,
+					items, 0, actions, 0);
+}
+
 void
 mlx5_flow_meter_uninit(struct rte_eth_dev *dev)
 {
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index a973cbc5e3..dcb02f2a7f 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -1065,6 +1065,69 @@ mlx5_hairpin_get_peer_ports(struct rte_eth_dev *dev, uint16_t *peer_ports,
 	return ret;
 }
 
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+
+/**
+ * Check if starting representor port is allowed.
+ *
+ * If transfer proxy port is configured for HWS, then starting representor port
+ * is allowed if and only if transfer proxy port is started as well.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   If starting representor port is allowed, then 0 is returned.
+ *   Otherwise rte_errno is set, and negative errno value is returned.
+ */
+static int
+mlx5_hw_representor_port_allowed_start(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_eth_dev *proxy_dev;
+	struct mlx5_priv *proxy_priv;
+	uint16_t proxy_port_id = UINT16_MAX;
+	int ret;
+
+	MLX5_ASSERT(priv->sh->config.dv_flow_en == 2);
+	MLX5_ASSERT(priv->sh->config.dv_esw_en);
+	MLX5_ASSERT(priv->representor);
+	ret = rte_flow_pick_transfer_proxy(dev->data->port_id, &proxy_port_id, NULL);
+	if (ret) {
+		if (ret == -ENODEV)
+			DRV_LOG(ERR, "Starting representor port %u is not allowed. Transfer "
+				     "proxy port is not available.", dev->data->port_id);
+		else
+			DRV_LOG(ERR, "Failed to pick transfer proxy for port %u (ret = %d)",
+				dev->data->port_id, ret);
+		return ret;
+	}
+	proxy_dev = &rte_eth_devices[proxy_port_id];
+	proxy_priv = proxy_dev->data->dev_private;
+	if (proxy_priv->dr_ctx == NULL) {
+		DRV_LOG(DEBUG, "Starting representor port %u is allowed, but default traffic flows"
+			       " will not be created. Transfer proxy port must be configured"
+			       " for HWS and started.",
+			       dev->data->port_id);
+		return 0;
+	}
+	if (!proxy_dev->data->dev_started) {
+		DRV_LOG(ERR, "Failed to start port %u: transfer proxy (port %u) must be started",
+			     dev->data->port_id, proxy_port_id);
+		rte_errno = EAGAIN;
+		return -rte_errno;
+	}
+	if (priv->sh->config.repr_matching && !priv->dr_ctx) {
+		DRV_LOG(ERR, "Failed to start port %u: with representor matching enabled, port "
+			     "must be configured for HWS", dev->data->port_id);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	return 0;
+}
+
+#endif
+
 /**
  * DPDK callback to start the device.
  *
@@ -1084,6 +1147,19 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 	int fine_inline;
 
 	DRV_LOG(DEBUG, "port %u starting device", dev->data->port_id);
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+	if (priv->sh->config.dv_flow_en == 2) {
+		/* If there is no E-Switch, then there are no start/stop order limitations. */
+		if (!priv->sh->config.dv_esw_en)
+			goto continue_dev_start;
+		/* If master is being started, then it is always allowed. */
+		if (priv->master)
+			goto continue_dev_start;
+		if (mlx5_hw_representor_port_allowed_start(dev))
+			return -rte_errno;
+	}
+continue_dev_start:
+#endif
 	fine_inline = rte_mbuf_dynflag_lookup
 		(RTE_PMD_MLX5_FINE_GRANULARITY_INLINE, NULL);
 	if (fine_inline >= 0)
@@ -1248,6 +1324,53 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 	return -rte_errno;
 }
 
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+/**
+ * Check if stopping transfer proxy port is allowed.
+ *
+ * If transfer proxy port is configured for HWS, then it is allowed to stop it
+ * if and only if all other representor ports are stopped.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   If stopping transfer proxy port is allowed, then 0 is returned.
+ *   Otherwise rte_errno is set, and negative errno value is returned.
+ */
+static int
+mlx5_hw_proxy_port_allowed_stop(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	bool representor_started = false;
+	uint16_t port_id;
+
+	MLX5_ASSERT(priv->sh->config.dv_flow_en == 2);
+	MLX5_ASSERT(priv->sh->config.dv_esw_en);
+	MLX5_ASSERT(priv->master);
+	/* If transfer proxy port was not configured for HWS, then stopping it is allowed. */
+	if (!priv->dr_ctx)
+		return 0;
+	MLX5_ETH_FOREACH_DEV(port_id, dev->device) {
+		const struct rte_eth_dev *port_dev = &rte_eth_devices[port_id];
+		const struct mlx5_priv *port_priv = port_dev->data->dev_private;
+
+		if (port_id != dev->data->port_id &&
+		    port_priv->domain_id == priv->domain_id &&
+		    port_dev->data->dev_started)
+			representor_started = true;
+	}
+	if (representor_started) {
+		DRV_LOG(INFO, "Failed to stop port %u: attached representor ports"
+			      " must be stopped before stopping transfer proxy port",
+			      dev->data->port_id);
+		rte_errno = EBUSY;
+		return -rte_errno;
+	}
+	return 0;
+}
+#endif
+
 /**
  * DPDK callback to stop the device.
  *
@@ -1261,6 +1384,21 @@ mlx5_dev_stop(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+	if (priv->sh->config.dv_flow_en == 2) {
+		/* If there is no E-Switch, then there are no start/stop order limitations. */
+		if (!priv->sh->config.dv_esw_en)
+			goto continue_dev_stop;
+		/* If representor is being stopped, then it is always allowed. */
+		if (priv->representor)
+			goto continue_dev_stop;
+		if (mlx5_hw_proxy_port_allowed_stop(dev)) {
+			dev->data->dev_started = 1;
+			return -rte_errno;
+		}
+	}
+continue_dev_stop:
+#endif
 	dev->data->dev_started = 0;
 	/* Prevent crashes when queues are still in use. */
 	dev->rx_pkt_burst = rte_eth_pkt_burst_dummy;
@@ -1296,13 +1434,21 @@ static int
 mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_sh_config *config = &priv->sh->config;
 	unsigned int i;
 	int ret;
 
-	if (priv->sh->config.dv_esw_en && priv->master) {
-		if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS)
-			if (mlx5_flow_hw_create_tx_default_mreg_copy_flow(dev))
-				goto error;
+	/*
+	 * With extended metadata enabled, the Tx metadata copy is handled by default
+	 * Tx tagging flow rules, so default Tx flow rule is not needed. It is only
+	 * required when representor matching is disabled.
+	 */
+	if (config->dv_esw_en &&
+	    !config->repr_matching &&
+	    config->dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
+	    priv->master) {
+		if (mlx5_flow_hw_create_tx_default_mreg_copy_flow(dev))
+			goto error;
 	}
 	for (i = 0; i < priv->txqs_n; ++i) {
 		struct mlx5_txq_ctrl *txq = mlx5_txq_get(dev, i);
@@ -1311,17 +1457,22 @@ mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
 		if (!txq)
 			continue;
 		queue = mlx5_txq_get_sqn(txq);
-		if ((priv->representor || priv->master) &&
-		    priv->sh->config.dv_esw_en) {
+		if ((priv->representor || priv->master) && config->dv_esw_en) {
 			if (mlx5_flow_hw_esw_create_sq_miss_flow(dev, queue)) {
 				mlx5_txq_release(dev, i);
 				goto error;
 			}
 		}
+		if (config->dv_esw_en && config->repr_matching) {
+			if (mlx5_flow_hw_tx_repr_matching_flow(dev, queue)) {
+				mlx5_txq_release(dev, i);
+				goto error;
+			}
+		}
 		mlx5_txq_release(dev, i);
 	}
-	if (priv->sh->config.fdb_def_rule) {
-		if ((priv->master || priv->representor) && priv->sh->config.dv_esw_en) {
+	if (config->fdb_def_rule) {
+		if ((priv->master || priv->representor) && config->dv_esw_en) {
 			if (!mlx5_flow_hw_esw_create_default_jump_flow(dev))
 				priv->fdb_def_rule = 1;
 			else
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v3 00/17] net/mlx5: HW steering PMD update
  2022-09-23 14:43 [PATCH 00/27] net/mlx5: HW steering PMD update Suanming Mou
                   ` (27 preceding siblings ...)
  2022-09-28  3:31 ` [PATCH v2 00/17] net/mlx5: HW steering PMD update Suanming Mou
@ 2022-09-30 12:52 ` Suanming Mou
  2022-09-30 12:52   ` [PATCH v3 01/17] net/mlx5: fix invalid flow attributes Suanming Mou
                     ` (16 more replies)
  2022-10-19 16:25 ` [PATCH v4 00/18] net/mlx5: HW steering PMD update Suanming Mou
                   ` (2 subsequent siblings)
  31 siblings, 17 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-30 12:52 UTC (permalink / raw)
  Cc: dev, rasland, orika

The skeleton of mlx5 HW steering (HWS) was merged upstream quite a
long time ago, but has not been updated since due to the missing
low-level steering layer code. Luckily, better late than never, the
steering layer finally comes[1].

This series will add more features to the existing PMD code:
 - FDB and metadata copy.
 - Modify field.
 - Meter color.
 - Counter.
 - Aging.
 - Action template pre-parser optimization.
 - Connection tracking.

Some features such as meter/aging/CT touch the public API, and the
public API changes were sent to the mailing list much earlier in other
threads in order not to be swallowed by this big series.

The patches this series depends on are listed below:
 [1]https://inbox.dpdk.org/dev/20220922190345.394-1-valex@nvidia.com/
 [2]https://inbox.dpdk.org/dev/20220921145409.511328-1-michaelba@nvidia.com/

---

 v3:
  - Fixed flows not being aged out.
  - Fixed error not being filled properly when table creation failed.
  - Removed transfer_mode from flow attributes until the ethdev layer
    change is applied:
    https://patches.dpdk.org/project/dpdk/patch/20220928092425.68214-1-rongweil@nvidia.com/

 v2:
  - Removed the rte_flow patches as they will be integrated in another thread.
  - Fixed compilation issues.
  - Reorganized the patches for better readability.
   
---

Alexander Kozyrev (2):
  net/mlx5: add HW steering meter action
  net/mlx5: implement METER MARK indirect action for HWS

Bing Zhao (1):
  net/mlx5: add extended metadata mode for hardware steering

Dariusz Sosnowski (4):
  net/mlx5: add HW steering port action
  net/mlx5: support DR action template API
  net/mlx5: support device control for E-Switch default rule
  net/mlx5: support device control of representor matching

Gregory Etelson (2):
  net/mlx5: add HW steering VLAN push, pop and VID modify flow actions
  net/mlx5: support flow integrity in HWS group 0

Michael Baum (1):
  net/mlx5: add HWS AGE action support

Suanming Mou (6):
  net/mlx5: fix invalid flow attributes
  net/mlx5: fix IPv6 and TCP RSS hash fields
  net/mlx5: add shared header reformat support
  net/mlx5: add modify field hws support
  net/mlx5: add HW steering connection tracking support
  net/mlx5: add async action push and pull support

Xiaoyu Min (1):
  net/mlx5: add HW steering counter action

 doc/guides/nics/mlx5.rst             |    9 +
 drivers/common/mlx5/mlx5_devx_cmds.c |   50 +
 drivers/common/mlx5/mlx5_devx_cmds.h |   27 +
 drivers/common/mlx5/mlx5_prm.h       |   64 +-
 drivers/common/mlx5/version.map      |    1 +
 drivers/net/mlx5/linux/mlx5_os.c     |   76 +-
 drivers/net/mlx5/meson.build         |    1 +
 drivers/net/mlx5/mlx5.c              |  126 +-
 drivers/net/mlx5/mlx5.h              |  318 +-
 drivers/net/mlx5/mlx5_defs.h         |    5 +
 drivers/net/mlx5/mlx5_flow.c         |  415 +-
 drivers/net/mlx5/mlx5_flow.h         |  312 +-
 drivers/net/mlx5/mlx5_flow_aso.c     |  793 ++-
 drivers/net/mlx5/mlx5_flow_dv.c      | 1169 ++--
 drivers/net/mlx5/mlx5_flow_hw.c      | 7898 +++++++++++++++++++++++---
 drivers/net/mlx5/mlx5_flow_meter.c   |  771 ++-
 drivers/net/mlx5/mlx5_flow_verbs.c   |    8 +-
 drivers/net/mlx5/mlx5_hws_cnt.c      | 1198 ++++
 drivers/net/mlx5/mlx5_hws_cnt.h      |  703 +++
 drivers/net/mlx5/mlx5_trigger.c      |  254 +-
 drivers/net/mlx5/mlx5_tx.h           |    1 +
 drivers/net/mlx5/mlx5_txq.c          |   47 +
 drivers/net/mlx5/rte_pmd_mlx5.h      |   17 +
 drivers/net/mlx5/version.map         |    1 +
 24 files changed, 12619 insertions(+), 1645 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_hws_cnt.c
 create mode 100644 drivers/net/mlx5/mlx5_hws_cnt.h

-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v3 01/17] net/mlx5: fix invalid flow attributes
  2022-09-30 12:52 ` [PATCH v3 00/17] net/mlx5: HW steering PMD update Suanming Mou
@ 2022-09-30 12:52   ` Suanming Mou
  2022-09-30 12:53   ` [PATCH v3 02/17] net/mlx5: fix IPv6 and TCP RSS hash fields Suanming Mou
                     ` (15 subsequent siblings)
  16 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-30 12:52 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika

In the function flow_get_drv_type(), attr is dereferenced in non-HWS mode.
If the user calls an HWS API while SWS mode is active, a valid attr must be
supplied by the HWS entry points, otherwise the NULL pointer causes a crash.
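
Below is a minimal, self-contained sketch (not the driver code itself) of
the failure mode and of the fix pattern used by this patch: the HWS-only
entry points pass a zeroed attribute on the stack instead of NULL, and the
type selector guards against a NULL attribute before dereferencing it. All
names and bodies are illustrative only.

#include <stdbool.h>
#include <stddef.h>

struct flow_attr { unsigned int transfer:1; };

enum flow_type { FLOW_TYPE_MIN, FLOW_TYPE_DV, FLOW_TYPE_HW };

static enum flow_type
get_drv_type(bool hws_enabled, const struct flow_attr *attr)
{
	if (hws_enabled)
		return FLOW_TYPE_HW;     /* attr is never read on this path */
	if (attr == NULL)
		return FLOW_TYPE_MIN;    /* guard added by this kind of fix */
	return attr->transfer ? FLOW_TYPE_DV : FLOW_TYPE_MIN;
}

/* HWS-only entry points now pass a zeroed attribute instead of NULL. */
static bool
hws_api_entry(bool hws_enabled)
{
	struct flow_attr attr = {0};

	return get_drv_type(hws_enabled, &attr) == FLOW_TYPE_HW;
}

int
main(void)
{
	return hws_api_entry(true) ? 0 : 1;
}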

Fixes: 572801ab860f ("ethdev: backport upstream rte_flow_async codes")

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/net/mlx5/mlx5_flow.c | 38 ++++++++++++++++++++++++------------
 1 file changed, 26 insertions(+), 12 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 45109001ca..3abb39aa92 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -3740,6 +3740,8 @@ flow_get_drv_type(struct rte_eth_dev *dev, const struct rte_flow_attr *attr)
 	 */
 	if (priv->sh->config.dv_flow_en == 2)
 		return MLX5_FLOW_TYPE_HW;
+	if (!attr)
+		return MLX5_FLOW_TYPE_MIN;
 	/* If no OS specific type - continue with DV/VERBS selection */
 	if (attr->transfer && priv->sh->config.dv_esw_en)
 		type = MLX5_FLOW_TYPE_DV;
@@ -8252,8 +8254,9 @@ mlx5_flow_info_get(struct rte_eth_dev *dev,
 		   struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8287,8 +8290,9 @@ mlx5_flow_port_configure(struct rte_eth_dev *dev,
 			 struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8319,8 +8323,9 @@ mlx5_flow_pattern_template_create(struct rte_eth_dev *dev,
 		struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW) {
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW) {
 		rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8350,8 +8355,9 @@ mlx5_flow_pattern_template_destroy(struct rte_eth_dev *dev,
 				   struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8385,8 +8391,9 @@ mlx5_flow_actions_template_create(struct rte_eth_dev *dev,
 			struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW) {
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW) {
 		rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8416,8 +8423,9 @@ mlx5_flow_actions_template_destroy(struct rte_eth_dev *dev,
 				   struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8457,8 +8465,9 @@ mlx5_flow_table_create(struct rte_eth_dev *dev,
 		       struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW) {
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW) {
 		rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8494,8 +8503,9 @@ mlx5_flow_table_destroy(struct rte_eth_dev *dev,
 			struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8542,8 +8552,9 @@ mlx5_flow_async_flow_create(struct rte_eth_dev *dev,
 			    struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW) {
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW) {
 		rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8585,8 +8596,9 @@ mlx5_flow_async_flow_destroy(struct rte_eth_dev *dev,
 			     struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8621,8 +8633,9 @@ mlx5_flow_pull(struct rte_eth_dev *dev,
 	       struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8650,8 +8663,9 @@ mlx5_flow_push(struct rte_eth_dev *dev,
 	       struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v3 02/17] net/mlx5: fix IPv6 and TCP RSS hash fields
  2022-09-30 12:52 ` [PATCH v3 00/17] net/mlx5: HW steering PMD update Suanming Mou
  2022-09-30 12:52   ` [PATCH v3 01/17] net/mlx5: fix invalid flow attributes Suanming Mou
@ 2022-09-30 12:53   ` Suanming Mou
  2022-09-30 12:53   ` [PATCH v3 03/17] net/mlx5: add shared header reformat support Suanming Mou
                     ` (14 subsequent siblings)
  16 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-30 12:53 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika

In the flow_dv_hashfields_set() function, when item_flags was 0,
the code always took the first if branch, so the else branches
never had a chance to be evaluated. As a result, the IPv6 and TCP
hash fields handled in those else branches were never set.

This commit adds a dedicated HW steering hash field set function
to generate the RSS hash fields.
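
The control-flow issue can be reduced to the short-circuit sketch below
(simplified, not driver code): the "|| !items" term was attached only to
the first branch, so an empty pattern always selected the IPv4/UDP paths
and the IPv6/TCP paths were unreachable. The layer flags are placeholders.

#include <stdint.h>
#include <stdio.h>

#define LAYER_IPV4 (UINT64_C(1) << 0)
#define LAYER_IPV6 (UINT64_C(1) << 1)

/* Old selection logic: "|| !items" only on the first branch. */
static const char *
pick_l3_old(uint64_t items)
{
	if ((items & LAYER_IPV4) || !items)
		return "ipv4";
	else if ((items & LAYER_IPV6) || !items) /* dead when items == 0 */
		return "ipv6";
	return "none";
}

int
main(void)
{
	printf("%s\n", pick_l3_old(0));          /* prints "ipv4" */
	printf("%s\n", pick_l3_old(LAYER_IPV6)); /* prints "ipv6" */
	return 0;
}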

Fixes: 6540da0b93b5 ("net/mlx5: fix RSS scaling issue")
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/net/mlx5/mlx5_flow_dv.c | 12 +++----
 drivers/net/mlx5/mlx5_flow_hw.c | 59 ++++++++++++++++++++++++++++++++-
 2 files changed, 62 insertions(+), 9 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index fb08684114..cb034a01f9 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -11299,8 +11299,7 @@ flow_dv_hashfields_set(uint64_t item_flags,
 		rss_inner = 1;
 #endif
 	if ((rss_inner && (items & MLX5_FLOW_LAYER_INNER_L3_IPV4)) ||
-	    (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L3_IPV4)) ||
-	     !items) {
+	    (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L3_IPV4))) {
 		if (rss_types & MLX5_IPV4_LAYER_TYPES) {
 			if (rss_types & RTE_ETH_RSS_L3_SRC_ONLY)
 				fields |= IBV_RX_HASH_SRC_IPV4;
@@ -11310,8 +11309,7 @@ flow_dv_hashfields_set(uint64_t item_flags,
 				fields |= MLX5_IPV4_IBV_RX_HASH;
 		}
 	} else if ((rss_inner && (items & MLX5_FLOW_LAYER_INNER_L3_IPV6)) ||
-		   (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L3_IPV6)) ||
-		   !items) {
+		   (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L3_IPV6))) {
 		if (rss_types & MLX5_IPV6_LAYER_TYPES) {
 			if (rss_types & RTE_ETH_RSS_L3_SRC_ONLY)
 				fields |= IBV_RX_HASH_SRC_IPV6;
@@ -11334,8 +11332,7 @@ flow_dv_hashfields_set(uint64_t item_flags,
 		return;
 	}
 	if ((rss_inner && (items & MLX5_FLOW_LAYER_INNER_L4_UDP)) ||
-	    (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L4_UDP)) ||
-	    !items) {
+	    (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L4_UDP))) {
 		if (rss_types & RTE_ETH_RSS_UDP) {
 			if (rss_types & RTE_ETH_RSS_L4_SRC_ONLY)
 				fields |= IBV_RX_HASH_SRC_PORT_UDP;
@@ -11345,8 +11342,7 @@ flow_dv_hashfields_set(uint64_t item_flags,
 				fields |= MLX5_UDP_IBV_RX_HASH;
 		}
 	} else if ((rss_inner && (items & MLX5_FLOW_LAYER_INNER_L4_TCP)) ||
-		   (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L4_TCP)) ||
-		   !items) {
+		   (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L4_TCP))) {
 		if (rss_types & RTE_ETH_RSS_TCP) {
 			if (rss_types & RTE_ETH_RSS_L4_SRC_ONLY)
 				fields |= IBV_RX_HASH_SRC_PORT_TCP;
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 7343d59f1f..46c4169b4f 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -62,6 +62,63 @@ flow_hw_rxq_flag_set(struct rte_eth_dev *dev, bool enable)
 	priv->mark_enabled = enable;
 }
 
+/**
+ * Set the hash fields according to the @p rss_desc information.
+ *
+ * @param[in] rss_desc
+ *   Pointer to the mlx5_flow_rss_desc.
+ * @param[out] hash_fields
+ *   Pointer to the RSS hash fields.
+ */
+static void
+flow_hw_hashfields_set(struct mlx5_flow_rss_desc *rss_desc,
+		       uint64_t *hash_fields)
+{
+	uint64_t fields = 0;
+	int rss_inner = 0;
+	uint64_t rss_types = rte_eth_rss_hf_refine(rss_desc->types);
+
+#ifdef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
+	if (rss_desc->level >= 2)
+		rss_inner = 1;
+#endif
+	if (rss_types & MLX5_IPV4_LAYER_TYPES) {
+		if (rss_types & RTE_ETH_RSS_L3_SRC_ONLY)
+			fields |= IBV_RX_HASH_SRC_IPV4;
+		else if (rss_types & RTE_ETH_RSS_L3_DST_ONLY)
+			fields |= IBV_RX_HASH_DST_IPV4;
+		else
+			fields |= MLX5_IPV4_IBV_RX_HASH;
+	} else if (rss_types & MLX5_IPV6_LAYER_TYPES) {
+		if (rss_types & RTE_ETH_RSS_L3_SRC_ONLY)
+			fields |= IBV_RX_HASH_SRC_IPV6;
+		else if (rss_types & RTE_ETH_RSS_L3_DST_ONLY)
+			fields |= IBV_RX_HASH_DST_IPV6;
+		else
+			fields |= MLX5_IPV6_IBV_RX_HASH;
+	}
+	if (rss_types & RTE_ETH_RSS_UDP) {
+		if (rss_types & RTE_ETH_RSS_L4_SRC_ONLY)
+			fields |= IBV_RX_HASH_SRC_PORT_UDP;
+		else if (rss_types & RTE_ETH_RSS_L4_DST_ONLY)
+			fields |= IBV_RX_HASH_DST_PORT_UDP;
+		else
+			fields |= MLX5_UDP_IBV_RX_HASH;
+	} else if (rss_types & RTE_ETH_RSS_TCP) {
+		if (rss_types & RTE_ETH_RSS_L4_SRC_ONLY)
+			fields |= IBV_RX_HASH_SRC_PORT_TCP;
+		else if (rss_types & RTE_ETH_RSS_L4_DST_ONLY)
+			fields |= IBV_RX_HASH_DST_PORT_TCP;
+		else
+			fields |= MLX5_TCP_IBV_RX_HASH;
+	}
+	if (rss_types & RTE_ETH_RSS_ESP)
+		fields |= IBV_RX_HASH_IPSEC_SPI;
+	if (rss_inner)
+		fields |= IBV_RX_HASH_INNER;
+	*hash_fields = fields;
+}
+
 /**
  * Generate the pattern item flags.
  * Will be used for shared RSS action.
@@ -225,7 +282,7 @@ flow_hw_tir_action_register(struct rte_eth_dev *dev,
 		       MLX5_RSS_HASH_KEY_LEN);
 		rss_desc.key_len = MLX5_RSS_HASH_KEY_LEN;
 		rss_desc.types = !rss->types ? RTE_ETH_RSS_IP : rss->types;
-		flow_dv_hashfields_set(0, &rss_desc, &rss_desc.hash_fields);
+		flow_hw_hashfields_set(&rss_desc, &rss_desc.hash_fields);
 		flow_dv_action_rss_l34_hash_adjust(rss->types,
 						   &rss_desc.hash_fields);
 		if (rss->level > 1) {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v3 03/17] net/mlx5: add shared header reformat support
  2022-09-30 12:52 ` [PATCH v3 00/17] net/mlx5: HW steering PMD update Suanming Mou
  2022-09-30 12:52   ` [PATCH v3 01/17] net/mlx5: fix invalid flow attributes Suanming Mou
  2022-09-30 12:53   ` [PATCH v3 02/17] net/mlx5: fix IPv6 and TCP RSS hash fields Suanming Mou
@ 2022-09-30 12:53   ` Suanming Mou
  2022-09-30 12:53   ` [PATCH v3 04/17] net/mlx5: add modify field hws support Suanming Mou
                     ` (13 subsequent siblings)
  16 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-30 12:53 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika

As the rte_flow_async API defines, an action mask with a non-zero
field value means the action is shared by all the flows in the
table.

A header reformat action whose action mask field is non-zero is
therefore created as a constant shared action. For the encapsulation
header reformat action there are two kinds of encapsulation data,
raw_encap_data and rte_flow_item encap_data. For both kinds, the
action mask conf indicates whether the data is constant or not.

Examples:
1. VXLAN encap (encap_data: rte_flow_item)
	action conf (eth/ipv4/udp/vxlan_hdr)

	a. action mask conf (eth/ipv4/udp/vxlan_hdr)
	  - items are constant.
	b. action mask conf (NULL)
	  - items will change.

2. RAW encap (encap_data: raw)
	action conf (raw_data)

	a. action mask conf (not NULL)
	  - encap_data constant.
	b. action mask conf (NULL)
	  - encap_data will change.
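
For case 2.a above, an application-side template could look like the
hedged sketch below: a non-NULL mask conf marks the raw encap data as
constant, so the PMD can create it as a shared reformat action. The port
ID, template attribute and header contents are illustrative assumptions,
not taken from this patch.

#include <rte_flow.h>

static struct rte_flow_actions_template *
create_shared_raw_encap_template(uint16_t port_id, struct rte_flow_error *err)
{
	static uint8_t encap_hdr[32];              /* dummy encap header bytes */
	struct rte_flow_action_raw_encap conf = {
		.data = encap_hdr,
		.size = sizeof(encap_hdr),
	};
	const struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_RAW_ENCAP, .conf = &conf },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	/* Non-NULL mask conf => encap data is constant (case 2.a above). */
	const struct rte_flow_action masks[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_RAW_ENCAP, .conf = &conf },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	const struct rte_flow_actions_template_attr attr = { .ingress = 1 };

	return rte_flow_actions_template_create(port_id, &attr, actions,
						masks, err);
}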

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/net/mlx5/mlx5_flow.h    |   6 +-
 drivers/net/mlx5/mlx5_flow_hw.c | 124 ++++++++++----------------------
 2 files changed, 39 insertions(+), 91 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 4b53912b79..1c9f5fc1d5 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1064,10 +1064,6 @@ struct mlx5_action_construct_data {
 	uint16_t action_dst; /* mlx5dr_rule_action dst offset. */
 	union {
 		struct {
-			/* encap src(item) offset. */
-			uint16_t src;
-			/* encap dst data offset. */
-			uint16_t dst;
 			/* encap data len. */
 			uint16_t len;
 		} encap;
@@ -1110,6 +1106,8 @@ struct mlx5_hw_jump_action {
 /* Encap decap action struct. */
 struct mlx5_hw_encap_decap_action {
 	struct mlx5dr_action *action; /* Action object. */
+	/* Is header_reformat action shared across flows in table. */
+	bool shared;
 	size_t data_size; /* Action metadata size. */
 	uint8_t data[]; /* Action data. */
 };
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 46c4169b4f..b6978bd051 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -402,10 +402,6 @@ __flow_hw_act_data_general_append(struct mlx5_priv *priv,
  *   Offset of source rte flow action.
  * @param[in] action_dst
  *   Offset of destination DR action.
- * @param[in] encap_src
- *   Offset of source encap raw data.
- * @param[in] encap_dst
- *   Offset of destination encap raw data.
  * @param[in] len
  *   Length of the data to be updated.
  *
@@ -418,16 +414,12 @@ __flow_hw_act_data_encap_append(struct mlx5_priv *priv,
 				enum rte_flow_action_type type,
 				uint16_t action_src,
 				uint16_t action_dst,
-				uint16_t encap_src,
-				uint16_t encap_dst,
 				uint16_t len)
 {	struct mlx5_action_construct_data *act_data;
 
 	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
 	if (!act_data)
 		return -1;
-	act_data->encap.src = encap_src;
-	act_data->encap.dst = encap_dst;
 	act_data->encap.len = len;
 	LIST_INSERT_HEAD(&acts->act_list, act_data, next);
 	return 0;
@@ -523,53 +515,6 @@ flow_hw_shared_action_translate(struct rte_eth_dev *dev,
 	return 0;
 }
 
-/**
- * Translate encap items to encapsulation list.
- *
- * @param[in] dev
- *   Pointer to the rte_eth_dev data structure.
- * @param[in] acts
- *   Pointer to the template HW steering DR actions.
- * @param[in] type
- *   Action type.
- * @param[in] action_src
- *   Offset of source rte flow action.
- * @param[in] action_dst
- *   Offset of destination DR action.
- * @param[in] items
- *   Encap item pattern.
- * @param[in] items_m
- *   Encap item mask indicates which part are constant and dynamic.
- *
- * @return
- *    0 on success, negative value otherwise and rte_errno is set.
- */
-static __rte_always_inline int
-flow_hw_encap_item_translate(struct rte_eth_dev *dev,
-			     struct mlx5_hw_actions *acts,
-			     enum rte_flow_action_type type,
-			     uint16_t action_src,
-			     uint16_t action_dst,
-			     const struct rte_flow_item *items,
-			     const struct rte_flow_item *items_m)
-{
-	struct mlx5_priv *priv = dev->data->dev_private;
-	size_t len, total_len = 0;
-	uint32_t i = 0;
-
-	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++, items_m++, i++) {
-		len = flow_dv_get_item_hdr_len(items->type);
-		if ((!items_m->spec ||
-		    memcmp(items_m->spec, items->spec, len)) &&
-		    __flow_hw_act_data_encap_append(priv, acts, type,
-						    action_src, action_dst, i,
-						    total_len, len))
-			return -1;
-		total_len += len;
-	}
-	return 0;
-}
-
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -611,7 +556,7 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 	const struct rte_flow_action_raw_encap *raw_encap_data;
 	const struct rte_flow_item *enc_item = NULL, *enc_item_m = NULL;
 	uint16_t reformat_pos = MLX5_HW_MAX_ACTS, reformat_src = 0;
-	uint8_t *encap_data = NULL;
+	uint8_t *encap_data = NULL, *encap_data_m = NULL;
 	size_t data_size = 0;
 	bool actions_end = false;
 	uint32_t type, i;
@@ -718,9 +663,9 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			MLX5_ASSERT(reformat_pos == MLX5_HW_MAX_ACTS);
 			enc_item = ((const struct rte_flow_action_vxlan_encap *)
 				   actions->conf)->definition;
-			enc_item_m =
-				((const struct rte_flow_action_vxlan_encap *)
-				 masks->conf)->definition;
+			if (masks->conf)
+				enc_item_m = ((const struct rte_flow_action_vxlan_encap *)
+					     masks->conf)->definition;
 			reformat_pos = i++;
 			reformat_src = actions - action_start;
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_L2_TO_TNL_L2;
@@ -729,9 +674,9 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			MLX5_ASSERT(reformat_pos == MLX5_HW_MAX_ACTS);
 			enc_item = ((const struct rte_flow_action_nvgre_encap *)
 				   actions->conf)->definition;
-			enc_item_m =
-				((const struct rte_flow_action_nvgre_encap *)
-				actions->conf)->definition;
+			if (masks->conf)
+				enc_item_m = ((const struct rte_flow_action_nvgre_encap *)
+					     masks->conf)->definition;
 			reformat_pos = i++;
 			reformat_src = actions - action_start;
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_L2_TO_TNL_L2;
@@ -743,6 +688,11 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_TNL_L2_TO_L2;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
+			raw_encap_data =
+				(const struct rte_flow_action_raw_encap *)
+				 masks->conf;
+			if (raw_encap_data)
+				encap_data_m = raw_encap_data->data;
 			raw_encap_data =
 				(const struct rte_flow_action_raw_encap *)
 				 actions->conf;
@@ -773,22 +723,17 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 	}
 	if (reformat_pos != MLX5_HW_MAX_ACTS) {
 		uint8_t buf[MLX5_ENCAP_MAX_LEN];
+		bool shared_rfmt = true;
 
 		if (enc_item) {
 			MLX5_ASSERT(!encap_data);
-			if (flow_dv_convert_encap_data
-				(enc_item, buf, &data_size, error) ||
-			    flow_hw_encap_item_translate
-				(dev, acts, (action_start + reformat_src)->type,
-				 reformat_src, reformat_pos,
-				 enc_item, enc_item_m))
+			if (flow_dv_convert_encap_data(enc_item, buf, &data_size, error))
 				goto err;
 			encap_data = buf;
-		} else if (encap_data && __flow_hw_act_data_encap_append
-				(priv, acts,
-				 (action_start + reformat_src)->type,
-				 reformat_src, reformat_pos, 0, 0, data_size)) {
-			goto err;
+			if (!enc_item_m)
+				shared_rfmt = false;
+		} else if (encap_data && !encap_data_m) {
+			shared_rfmt = false;
 		}
 		acts->encap_decap = mlx5_malloc(MLX5_MEM_ZERO,
 				    sizeof(*acts->encap_decap) + data_size,
@@ -802,12 +747,22 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 		acts->encap_decap->action = mlx5dr_action_create_reformat
 				(priv->dr_ctx, refmt_type,
 				 data_size, encap_data,
-				 rte_log2_u32(table_attr->nb_flows),
-				 mlx5_hw_act_flag[!!attr->group][type]);
+				 shared_rfmt ? 0 : rte_log2_u32(table_attr->nb_flows),
+				 mlx5_hw_act_flag[!!attr->group][type] |
+				 (shared_rfmt ? MLX5DR_ACTION_FLAG_SHARED : 0));
 		if (!acts->encap_decap->action)
 			goto err;
 		acts->rule_acts[reformat_pos].action =
 						acts->encap_decap->action;
+		acts->rule_acts[reformat_pos].reformat.data =
+						acts->encap_decap->data;
+		if (shared_rfmt)
+			acts->rule_acts[reformat_pos].reformat.offset = 0;
+		else if (__flow_hw_act_data_encap_append(priv, acts,
+				 (action_start + reformat_src)->type,
+				 reformat_src, reformat_pos, data_size))
+			goto err;
+		acts->encap_decap->shared = shared_rfmt;
 		acts->encap_decap_pos = reformat_pos;
 	}
 	acts->acts_num = i;
@@ -972,6 +927,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			.ingress = 1,
 	};
 	uint32_t ft_flag;
+	size_t encap_len = 0;
 
 	memcpy(rule_acts, hw_acts->rule_acts,
 	       sizeof(*rule_acts) * hw_acts->acts_num);
@@ -989,9 +945,6 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	} else {
 		attr.ingress = 1;
 	}
-	if (hw_acts->encap_decap && hw_acts->encap_decap->data_size)
-		memcpy(buf, hw_acts->encap_decap->data,
-		       hw_acts->encap_decap->data_size);
 	LIST_FOREACH(act_data, &hw_acts->act_list, next) {
 		uint32_t jump_group;
 		uint32_t tag;
@@ -1050,23 +1003,20 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
 			enc_item = ((const struct rte_flow_action_vxlan_encap *)
 				   action->conf)->definition;
-			rte_memcpy((void *)&buf[act_data->encap.dst],
-				   enc_item[act_data->encap.src].spec,
-				   act_data->encap.len);
+			if (flow_dv_convert_encap_data(enc_item, buf, &encap_len, NULL))
+				return -1;
 			break;
 		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
 			enc_item = ((const struct rte_flow_action_nvgre_encap *)
 				   action->conf)->definition;
-			rte_memcpy((void *)&buf[act_data->encap.dst],
-				   enc_item[act_data->encap.src].spec,
-				   act_data->encap.len);
+			if (flow_dv_convert_encap_data(enc_item, buf, &encap_len, NULL))
+				return -1;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
 			raw_encap_data =
 				(const struct rte_flow_action_raw_encap *)
 				 action->conf;
-			rte_memcpy((void *)&buf[act_data->encap.dst],
-				   raw_encap_data->data, act_data->encap.len);
+			rte_memcpy((void *)buf, raw_encap_data->data, act_data->encap.len);
 			MLX5_ASSERT(raw_encap_data->size ==
 				    act_data->encap.len);
 			break;
@@ -1074,7 +1024,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			break;
 		}
 	}
-	if (hw_acts->encap_decap) {
+	if (hw_acts->encap_decap && !hw_acts->encap_decap->shared) {
 		rule_acts[hw_acts->encap_decap_pos].reformat.offset =
 				job->flow->idx - 1;
 		rule_acts[hw_acts->encap_decap_pos].reformat.data = buf;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v3 04/17] net/mlx5: add modify field hws support
  2022-09-30 12:52 ` [PATCH v3 00/17] net/mlx5: HW steering PMD update Suanming Mou
                     ` (2 preceding siblings ...)
  2022-09-30 12:53   ` [PATCH v3 03/17] net/mlx5: add shared header reformat support Suanming Mou
@ 2022-09-30 12:53   ` Suanming Mou
  2022-09-30 12:53   ` [PATCH v3 05/17] net/mlx5: add HW steering port action Suanming Mou
                     ` (12 subsequent siblings)
  16 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-30 12:53 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Dariusz Sosnowski

This patch introduces support for modify_field rte_flow actions in HWS
mode. Support includes:

- ingress and egress domains,
- SET and ADD operations,
- usage of arbitrary bit offsets and widths for packet and metadata
  fields.

Support is implemented in two phases:

1. On flow table creation the hardware commands are generated, based
   on rte_flow action templates, and stored alongside the action template.
2. On flow rule creation/queueing the hardware commands are updated with
   values provided by the user. Any masks over immediate values, provided
   in action templates, are applied to these values before enqueueing rules
   for creation.
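
As an application-side illustration of how masks over immediate values
select between the two phases, the hedged sketch below fills a
modify_field action/mask pair for an actions template: a fully masked
immediate means the value is constant and can be precompiled at table
creation, while a zero mask defers the value to rule enqueue time. The
field choice, immediate byte layout and widths are assumptions for
illustration only, not taken from this patch.

#include <stdint.h>
#include <string.h>
#include <rte_flow.h>

/* Fill a modify_field action and its template mask (SET IPv4 TTL). */
static void
build_set_ttl(struct rte_flow_action_modify_field *conf,
	      struct rte_flow_action_modify_field *mask)
{
	memset(conf, 0, sizeof(*conf));
	conf->operation = RTE_FLOW_MODIFY_SET;
	conf->dst.field = RTE_FLOW_FIELD_IPV4_TTL;
	conf->src.field = RTE_FLOW_FIELD_VALUE;
	conf->src.value[0] = 64;        /* immediate TTL value (layout assumed) */
	conf->width = 8;

	memset(mask, 0, sizeof(*mask));
	mask->operation = RTE_FLOW_MODIFY_SET;
	mask->dst.field = RTE_FLOW_FIELD_IPV4_TTL;
	mask->src.field = RTE_FLOW_FIELD_VALUE;
	mask->src.value[0] = 0xff;      /* non-zero mask: value is constant */
	mask->width = UINT32_MAX;
}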

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/common/mlx5/mlx5_prm.h   |   2 +
 drivers/net/mlx5/linux/mlx5_os.c |  18 +-
 drivers/net/mlx5/mlx5.h          |   1 +
 drivers/net/mlx5/mlx5_flow.h     |  96 +++++
 drivers/net/mlx5/mlx5_flow_dv.c  | 551 ++++++++++++++-------------
 drivers/net/mlx5/mlx5_flow_hw.c  | 614 ++++++++++++++++++++++++++++++-
 6 files changed, 1007 insertions(+), 275 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index f832bd77cb..c82ec94465 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -746,6 +746,8 @@ enum mlx5_modification_field {
 	MLX5_MODI_IN_TCP_ACK_NUM = 0x5C,
 	MLX5_MODI_GTP_TEID = 0x6E,
 	MLX5_MODI_OUT_IP_ECN = 0x73,
+	MLX5_MODI_TUNNEL_HDR_DW_1 = 0x75,
+	MLX5_MODI_GTPU_FIRST_EXT_DW_0 = 0x76,
 };
 
 /* Total number of metadata reg_c's. */
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index aed55e6a62..b7cc11a2ef 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1540,6 +1540,15 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 				       mlx5_hrxq_clone_free_cb);
 	if (!priv->hrxqs)
 		goto error;
+	mlx5_set_metadata_mask(eth_dev);
+	if (sh->config.dv_xmeta_en != MLX5_XMETA_MODE_LEGACY &&
+	    !priv->sh->dv_regc0_mask) {
+		DRV_LOG(ERR, "metadata mode %u is not supported "
+			     "(no metadata reg_c[0] is available)",
+			     sh->config.dv_xmeta_en);
+			err = ENOTSUP;
+			goto error;
+	}
 	rte_rwlock_init(&priv->ind_tbls_lock);
 	if (priv->sh->config.dv_flow_en == 2) {
 #ifdef HAVE_IBV_FLOW_DV_SUPPORT
@@ -1566,15 +1575,6 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 		err = -err;
 		goto error;
 	}
-	mlx5_set_metadata_mask(eth_dev);
-	if (sh->config.dv_xmeta_en != MLX5_XMETA_MODE_LEGACY &&
-	    !priv->sh->dv_regc0_mask) {
-		DRV_LOG(ERR, "metadata mode %u is not supported "
-			     "(no metadata reg_c[0] is available)",
-			     sh->config.dv_xmeta_en);
-			err = ENOTSUP;
-			goto error;
-	}
 	/* Query availability of metadata reg_c's. */
 	if (!priv->sh->metadata_regc_check_flag) {
 		err = mlx5_flow_discover_mreg_c(eth_dev);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index ea63c29bf9..d07f5b0d8a 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -343,6 +343,7 @@ struct mlx5_hw_q_job {
 	struct rte_flow_hw *flow; /* Flow attached to the job. */
 	void *user_data; /* Job user data. */
 	uint8_t *encap_data; /* Encap data. */
+	struct mlx5_modification_cmd *mhdr_cmd;
 };
 
 /* HW steering job descriptor LIFO pool. */
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 1c9f5fc1d5..0eab3a3797 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1007,6 +1007,51 @@ flow_items_to_tunnel(const struct rte_flow_item items[])
 	return items[0].spec;
 }
 
+/**
+ * Fetch 1, 2, 3 or 4 byte field from the byte array
+ * and return as unsigned integer in host-endian format.
+ *
+ * @param[in] data
+ *   Pointer to data array.
+ * @param[in] size
+ *   Size of field to extract.
+ *
+ * @return
+ *   converted field in host endian format.
+ */
+static inline uint32_t
+flow_dv_fetch_field(const uint8_t *data, uint32_t size)
+{
+	uint32_t ret;
+
+	switch (size) {
+	case 1:
+		ret = *data;
+		break;
+	case 2:
+		ret = rte_be_to_cpu_16(*(const unaligned_uint16_t *)data);
+		break;
+	case 3:
+		ret = rte_be_to_cpu_16(*(const unaligned_uint16_t *)data);
+		ret = (ret << 8) | *(data + sizeof(uint16_t));
+		break;
+	case 4:
+		ret = rte_be_to_cpu_32(*(const unaligned_uint32_t *)data);
+		break;
+	default:
+		MLX5_ASSERT(false);
+		ret = 0;
+		break;
+	}
+	return ret;
+}
+
+struct field_modify_info {
+	uint32_t size; /* Size of field in protocol header, in bytes. */
+	uint32_t offset; /* Offset of field in protocol header, in bytes. */
+	enum mlx5_modification_field id;
+};
+
 /* HW steering flow attributes. */
 struct mlx5_flow_attr {
 	uint32_t port_id; /* Port index. */
@@ -1067,6 +1112,29 @@ struct mlx5_action_construct_data {
 			/* encap data len. */
 			uint16_t len;
 		} encap;
+		struct {
+			/* Modify header action offset in pattern. */
+			uint16_t mhdr_cmds_off;
+			/* Offset in pattern after modify header actions. */
+			uint16_t mhdr_cmds_end;
+			/*
+			 * True if this action is masked and does not need to
+			 * be generated.
+			 */
+			bool shared;
+			/*
+			 * Modified field definitions in dst field (SET, ADD)
+			 * or src field (COPY).
+			 */
+			struct field_modify_info field[MLX5_ACT_MAX_MOD_FIELDS];
+			/* Modified field definitions in dst field (COPY). */
+			struct field_modify_info dcopy[MLX5_ACT_MAX_MOD_FIELDS];
+			/*
+			 * Masks applied to field values to generate
+			 * PRM actions.
+			 */
+			uint32_t mask[MLX5_ACT_MAX_MOD_FIELDS];
+		} modify_header;
 		struct {
 			uint64_t types; /* RSS hash types. */
 			uint32_t level; /* RSS level. */
@@ -1092,6 +1160,7 @@ struct rte_flow_actions_template {
 	struct rte_flow_actions_template_attr attr;
 	struct rte_flow_action *actions; /* Cached flow actions. */
 	struct rte_flow_action *masks; /* Cached action masks.*/
+	uint16_t mhdr_off; /* Offset of DR modify header action. */
 	uint32_t refcnt; /* Reference counter. */
 };
 
@@ -1112,6 +1181,22 @@ struct mlx5_hw_encap_decap_action {
 	uint8_t data[]; /* Action data. */
 };
 
+#define MLX5_MHDR_MAX_CMD ((MLX5_MAX_MODIFY_NUM) * 2 + 1)
+
+/* Modify field action struct. */
+struct mlx5_hw_modify_header_action {
+	/* Reference to DR action */
+	struct mlx5dr_action *action;
+	/* Modify header action position in action rule table. */
+	uint16_t pos;
+	/* Is MODIFY_HEADER action shared across flows in table. */
+	bool shared;
+	/* Amount of modification commands stored in the precompiled buffer. */
+	uint32_t mhdr_cmds_num;
+	/* Precompiled modification commands. */
+	struct mlx5_modification_cmd mhdr_cmds[MLX5_MHDR_MAX_CMD];
+};
+
 /* The maximum actions support in the flow. */
 #define MLX5_HW_MAX_ACTS 16
 
@@ -1121,6 +1206,7 @@ struct mlx5_hw_actions {
 	LIST_HEAD(act_list, mlx5_action_construct_data) act_list;
 	struct mlx5_hw_jump_action *jump; /* Jump action. */
 	struct mlx5_hrxq *tir; /* TIR action. */
+	struct mlx5_hw_modify_header_action *mhdr; /* Modify header action. */
 	/* Encap/Decap action. */
 	struct mlx5_hw_encap_decap_action *encap_decap;
 	uint16_t encap_decap_pos; /* Encap/Decap action position. */
@@ -2201,6 +2287,16 @@ int flow_dv_action_query(struct rte_eth_dev *dev,
 size_t flow_dv_get_item_hdr_len(const enum rte_flow_item_type item_type);
 int flow_dv_convert_encap_data(const struct rte_flow_item *items, uint8_t *buf,
 			   size_t *size, struct rte_flow_error *error);
+void mlx5_flow_field_id_to_modify_info
+		(const struct rte_flow_action_modify_data *data,
+		 struct field_modify_info *info, uint32_t *mask,
+		 uint32_t width, struct rte_eth_dev *dev,
+		 const struct rte_flow_attr *attr, struct rte_flow_error *error);
+int flow_dv_convert_modify_action(struct rte_flow_item *item,
+			      struct field_modify_info *field,
+			      struct field_modify_info *dcopy,
+			      struct mlx5_flow_dv_modify_hdr_resource *resource,
+			      uint32_t type, struct rte_flow_error *error);
 
 #define MLX5_PF_VPORT_ID 0
 #define MLX5_ECPF_VPORT_ID 0xFFFE
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index cb034a01f9..7dff2ab44f 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -241,12 +241,6 @@ rte_col_2_mlx5_col(enum rte_color rcol)
 	return MLX5_FLOW_COLOR_UNDEFINED;
 }
 
-struct field_modify_info {
-	uint32_t size; /* Size of field in protocol header, in bytes. */
-	uint32_t offset; /* Offset of field in protocol header, in bytes. */
-	enum mlx5_modification_field id;
-};
-
 struct field_modify_info modify_eth[] = {
 	{4,  0, MLX5_MODI_OUT_DMAC_47_16},
 	{2,  4, MLX5_MODI_OUT_DMAC_15_0},
@@ -379,45 +373,6 @@ mlx5_update_vlan_vid_pcp(const struct rte_flow_action *action,
 	}
 }
 
-/**
- * Fetch 1, 2, 3 or 4 byte field from the byte array
- * and return as unsigned integer in host-endian format.
- *
- * @param[in] data
- *   Pointer to data array.
- * @param[in] size
- *   Size of field to extract.
- *
- * @return
- *   converted field in host endian format.
- */
-static inline uint32_t
-flow_dv_fetch_field(const uint8_t *data, uint32_t size)
-{
-	uint32_t ret;
-
-	switch (size) {
-	case 1:
-		ret = *data;
-		break;
-	case 2:
-		ret = rte_be_to_cpu_16(*(const unaligned_uint16_t *)data);
-		break;
-	case 3:
-		ret = rte_be_to_cpu_16(*(const unaligned_uint16_t *)data);
-		ret = (ret << 8) | *(data + sizeof(uint16_t));
-		break;
-	case 4:
-		ret = rte_be_to_cpu_32(*(const unaligned_uint32_t *)data);
-		break;
-	default:
-		MLX5_ASSERT(false);
-		ret = 0;
-		break;
-	}
-	return ret;
-}
-
 /**
  * Convert modify-header action to DV specification.
  *
@@ -446,7 +401,7 @@ flow_dv_fetch_field(const uint8_t *data, uint32_t size)
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
-static int
+int
 flow_dv_convert_modify_action(struct rte_flow_item *item,
 			      struct field_modify_info *field,
 			      struct field_modify_info *dcopy,
@@ -1464,7 +1419,32 @@ mlx5_flow_item_field_width(struct rte_eth_dev *dev,
 	return 0;
 }
 
-static void
+static __rte_always_inline uint8_t
+flow_modify_info_mask_8(uint32_t length, uint32_t off)
+{
+	return (0xffu >> (8 - length)) << off;
+}
+
+static __rte_always_inline uint16_t
+flow_modify_info_mask_16(uint32_t length, uint32_t off)
+{
+	return rte_cpu_to_be_16((0xffffu >> (16 - length)) << off);
+}
+
+static __rte_always_inline uint32_t
+flow_modify_info_mask_32(uint32_t length, uint32_t off)
+{
+	return rte_cpu_to_be_32((0xffffffffu >> (32 - length)) << off);
+}
+
+static __rte_always_inline uint32_t
+flow_modify_info_mask_32_masked(uint32_t length, uint32_t off, uint32_t post_mask)
+{
+	uint32_t mask = (0xffffffffu >> (32 - length)) << off;
+	return rte_cpu_to_be_32(mask & post_mask);
+}
+
+void
 mlx5_flow_field_id_to_modify_info
 		(const struct rte_flow_action_modify_data *data,
 		 struct field_modify_info *info, uint32_t *mask,
@@ -1473,323 +1453,340 @@ mlx5_flow_field_id_to_modify_info
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	uint32_t idx = 0;
-	uint32_t off = 0;
-
-	switch (data->field) {
+	uint32_t off_be = 0;
+	uint32_t length = 0;
+	switch ((int)data->field) {
 	case RTE_FLOW_FIELD_START:
 		/* not supported yet */
 		MLX5_ASSERT(false);
 		break;
 	case RTE_FLOW_FIELD_MAC_DST:
-		off = data->offset > 16 ? data->offset - 16 : 0;
-		if (mask) {
-			if (data->offset < 16) {
-				info[idx] = (struct field_modify_info){2, 4,
-						MLX5_MODI_OUT_DMAC_15_0};
-				if (width < 16) {
-					mask[1] = rte_cpu_to_be_16(0xffff >>
-								 (16 - width));
-					width = 0;
-				} else {
-					mask[1] = RTE_BE16(0xffff);
-					width -= 16;
-				}
-				if (!width)
-					break;
-				++idx;
-			}
-			info[idx] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_DMAC_47_16};
-			mask[0] = rte_cpu_to_be_32((0xffffffff >>
-						    (32 - width)) << off);
+		MLX5_ASSERT(data->offset + width <= 48);
+		off_be = 48 - (data->offset + width);
+		if (off_be < 16) {
+			info[idx] = (struct field_modify_info){2, 4,
+					MLX5_MODI_OUT_DMAC_15_0};
+			length = off_be + width <= 16 ? width : 16 - off_be;
+			if (mask)
+				mask[1] = flow_modify_info_mask_16(length,
+								   off_be);
+			else
+				info[idx].offset = off_be;
+			width -= length;
+			if (!width)
+				break;
+			off_be = 0;
+			idx++;
 		} else {
-			if (data->offset < 16)
-				info[idx++] = (struct field_modify_info){2, 0,
-						MLX5_MODI_OUT_DMAC_15_0};
-			info[idx] = (struct field_modify_info){4, off,
-						MLX5_MODI_OUT_DMAC_47_16};
+			off_be -= 16;
 		}
+		info[idx] = (struct field_modify_info){4, 0,
+				MLX5_MODI_OUT_DMAC_47_16};
+		if (mask)
+			mask[0] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_MAC_SRC:
-		off = data->offset > 16 ? data->offset - 16 : 0;
-		if (mask) {
-			if (data->offset < 16) {
-				info[idx] = (struct field_modify_info){2, 4,
-						MLX5_MODI_OUT_SMAC_15_0};
-				if (width < 16) {
-					mask[1] = rte_cpu_to_be_16(0xffff >>
-								 (16 - width));
-					width = 0;
-				} else {
-					mask[1] = RTE_BE16(0xffff);
-					width -= 16;
-				}
-				if (!width)
-					break;
-				++idx;
-			}
-			info[idx] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_SMAC_47_16};
-			mask[0] = rte_cpu_to_be_32((0xffffffff >>
-						    (32 - width)) << off);
+		MLX5_ASSERT(data->offset + width <= 48);
+		off_be = 48 - (data->offset + width);
+		if (off_be < 16) {
+			info[idx] = (struct field_modify_info){2, 4,
+					MLX5_MODI_OUT_SMAC_15_0};
+			length = off_be + width <= 16 ? width : 16 - off_be;
+			if (mask)
+				mask[1] = flow_modify_info_mask_16(length,
+								   off_be);
+			else
+				info[idx].offset = off_be;
+			width -= length;
+			if (!width)
+				break;
+			off_be = 0;
+			idx++;
 		} else {
-			if (data->offset < 16)
-				info[idx++] = (struct field_modify_info){2, 0,
-						MLX5_MODI_OUT_SMAC_15_0};
-			info[idx] = (struct field_modify_info){4, off,
-						MLX5_MODI_OUT_SMAC_47_16};
+			off_be -= 16;
 		}
+		info[idx] = (struct field_modify_info){4, 0,
+				MLX5_MODI_OUT_SMAC_47_16};
+		if (mask)
+			mask[0] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_VLAN_TYPE:
 		/* not supported yet */
 		break;
 	case RTE_FLOW_FIELD_VLAN_ID:
+		MLX5_ASSERT(data->offset + width <= 12);
+		off_be = 12 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_FIRST_VID};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0x0fff >> (12 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_MAC_TYPE:
+		MLX5_ASSERT(data->offset + width <= 16);
+		off_be = 16 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_ETHERTYPE};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0xffff >> (16 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_IPV4_DSCP:
+		MLX5_ASSERT(data->offset + width <= 6);
+		off_be = 6 - (data->offset + width);
 		info[idx] = (struct field_modify_info){1, 0,
 					MLX5_MODI_OUT_IP_DSCP};
 		if (mask)
-			mask[idx] = 0x3f >> (6 - width);
+			mask[idx] = flow_modify_info_mask_8(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_IPV4_TTL:
+		MLX5_ASSERT(data->offset + width <= 8);
+		off_be = 8 - (data->offset + width);
 		info[idx] = (struct field_modify_info){1, 0,
 					MLX5_MODI_OUT_IPV4_TTL};
 		if (mask)
-			mask[idx] = 0xff >> (8 - width);
+			mask[idx] = flow_modify_info_mask_8(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_IPV4_SRC:
+		MLX5_ASSERT(data->offset + width <= 32);
+		off_be = 32 - (data->offset + width);
 		info[idx] = (struct field_modify_info){4, 0,
 					MLX5_MODI_OUT_SIPV4};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_32(0xffffffff >>
-						     (32 - width));
+			mask[idx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_IPV4_DST:
+		MLX5_ASSERT(data->offset + width <= 32);
+		off_be = 32 - (data->offset + width);
 		info[idx] = (struct field_modify_info){4, 0,
 					MLX5_MODI_OUT_DIPV4};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_32(0xffffffff >>
-						     (32 - width));
+			mask[idx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_IPV6_DSCP:
+		MLX5_ASSERT(data->offset + width <= 6);
+		off_be = 6 - (data->offset + width);
 		info[idx] = (struct field_modify_info){1, 0,
 					MLX5_MODI_OUT_IP_DSCP};
 		if (mask)
-			mask[idx] = 0x3f >> (6 - width);
+			mask[idx] = flow_modify_info_mask_8(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_IPV6_HOPLIMIT:
+		MLX5_ASSERT(data->offset + width <= 8);
+		off_be = 8 - (data->offset + width);
 		info[idx] = (struct field_modify_info){1, 0,
 					MLX5_MODI_OUT_IPV6_HOPLIMIT};
 		if (mask)
-			mask[idx] = 0xff >> (8 - width);
+			mask[idx] = flow_modify_info_mask_8(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
-	case RTE_FLOW_FIELD_IPV6_SRC:
-		if (mask) {
-			if (data->offset < 32) {
-				info[idx] = (struct field_modify_info){4, 12,
-						MLX5_MODI_OUT_SIPV6_31_0};
-				if (width < 32) {
-					mask[3] =
-						rte_cpu_to_be_32(0xffffffff >>
-								 (32 - width));
-					width = 0;
-				} else {
-					mask[3] = RTE_BE32(0xffffffff);
-					width -= 32;
-				}
-				if (!width)
-					break;
-				++idx;
-			}
-			if (data->offset < 64) {
-				info[idx] = (struct field_modify_info){4, 8,
-						MLX5_MODI_OUT_SIPV6_63_32};
-				if (width < 32) {
-					mask[2] =
-						rte_cpu_to_be_32(0xffffffff >>
-								 (32 - width));
-					width = 0;
-				} else {
-					mask[2] = RTE_BE32(0xffffffff);
-					width -= 32;
-				}
-				if (!width)
-					break;
-				++idx;
-			}
-			if (data->offset < 96) {
-				info[idx] = (struct field_modify_info){4, 4,
-						MLX5_MODI_OUT_SIPV6_95_64};
-				if (width < 32) {
-					mask[1] =
-						rte_cpu_to_be_32(0xffffffff >>
-								 (32 - width));
-					width = 0;
-				} else {
-					mask[1] = RTE_BE32(0xffffffff);
-					width -= 32;
-				}
-				if (!width)
-					break;
-				++idx;
+	case RTE_FLOW_FIELD_IPV6_SRC: {
+		/*
+		 * Fields corresponding to IPv6 source address bytes
+		 * arranged according to network byte ordering.
+		 */
+		struct field_modify_info fields[] = {
+			{ 4, 0, MLX5_MODI_OUT_SIPV6_127_96 },
+			{ 4, 4, MLX5_MODI_OUT_SIPV6_95_64 },
+			{ 4, 8, MLX5_MODI_OUT_SIPV6_63_32 },
+			{ 4, 12, MLX5_MODI_OUT_SIPV6_31_0 },
+		};
+		/* First mask to be modified is the mask of 4th address byte. */
+		uint32_t midx = 3;
+
+		MLX5_ASSERT(data->offset + width <= 128);
+		off_be = 128 - (data->offset + width);
+		while (width > 0 && midx > 0) {
+			if (off_be < 32) {
+				info[idx] = fields[midx];
+				length = off_be + width <= 32 ?
+					 width : 32 - off_be;
+				if (mask)
+					mask[midx] = flow_modify_info_mask_32
+						(length, off_be);
+				else
+					info[idx].offset = off_be;
+				width -= length;
+				off_be = 0;
+				idx++;
+			} else {
+				off_be -= 32;
 			}
-			info[idx] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_SIPV6_127_96};
-			mask[0] = rte_cpu_to_be_32(0xffffffff >> (32 - width));
-		} else {
-			if (data->offset < 32)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_SIPV6_31_0};
-			if (data->offset < 64)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_SIPV6_63_32};
-			if (data->offset < 96)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_SIPV6_95_64};
-			if (data->offset < 128)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_SIPV6_127_96};
+			midx--;
 		}
+		if (!width)
+			break;
+		info[idx] = fields[midx];
+		if (mask)
+			mask[midx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
-	case RTE_FLOW_FIELD_IPV6_DST:
-		if (mask) {
-			if (data->offset < 32) {
-				info[idx] = (struct field_modify_info){4, 12,
-						MLX5_MODI_OUT_DIPV6_31_0};
-				if (width < 32) {
-					mask[3] =
-						rte_cpu_to_be_32(0xffffffff >>
-								 (32 - width));
-					width = 0;
-				} else {
-					mask[3] = RTE_BE32(0xffffffff);
-					width -= 32;
-				}
-				if (!width)
-					break;
-				++idx;
-			}
-			if (data->offset < 64) {
-				info[idx] = (struct field_modify_info){4, 8,
-						MLX5_MODI_OUT_DIPV6_63_32};
-				if (width < 32) {
-					mask[2] =
-						rte_cpu_to_be_32(0xffffffff >>
-								 (32 - width));
-					width = 0;
-				} else {
-					mask[2] = RTE_BE32(0xffffffff);
-					width -= 32;
-				}
-				if (!width)
-					break;
-				++idx;
-			}
-			if (data->offset < 96) {
-				info[idx] = (struct field_modify_info){4, 4,
-						MLX5_MODI_OUT_DIPV6_95_64};
-				if (width < 32) {
-					mask[1] =
-						rte_cpu_to_be_32(0xffffffff >>
-								 (32 - width));
-					width = 0;
-				} else {
-					mask[1] = RTE_BE32(0xffffffff);
-					width -= 32;
-				}
-				if (!width)
-					break;
-				++idx;
+	}
+	case RTE_FLOW_FIELD_IPV6_DST: {
+		/*
+		 * Fields corresponding to IPv6 destination address bytes
+		 * arranged according to network byte ordering.
+		 */
+		struct field_modify_info fields[] = {
+			{ 4, 0, MLX5_MODI_OUT_DIPV6_127_96 },
+			{ 4, 4, MLX5_MODI_OUT_DIPV6_95_64 },
+			{ 4, 8, MLX5_MODI_OUT_DIPV6_63_32 },
+			{ 4, 12, MLX5_MODI_OUT_DIPV6_31_0 },
+		};
+		/* First mask to be modified is the mask of 4th address byte. */
+		uint32_t midx = 3;
+
+		MLX5_ASSERT(data->offset + width <= 128);
+		off_be = 128 - (data->offset + width);
+		while (width > 0 && midx > 0) {
+			if (off_be < 32) {
+				info[idx] = fields[midx];
+				length = off_be + width <= 32 ?
+					 width : 32 - off_be;
+				if (mask)
+					mask[midx] = flow_modify_info_mask_32
+						(length, off_be);
+				else
+					info[idx].offset = off_be;
+				width -= length;
+				off_be = 0;
+				idx++;
+			} else {
+				off_be -= 32;
 			}
-			info[idx] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_DIPV6_127_96};
-			mask[0] = rte_cpu_to_be_32(0xffffffff >> (32 - width));
-		} else {
-			if (data->offset < 32)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_DIPV6_31_0};
-			if (data->offset < 64)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_DIPV6_63_32};
-			if (data->offset < 96)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_DIPV6_95_64};
-			if (data->offset < 128)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_DIPV6_127_96};
+			midx--;
 		}
+		if (!width)
+			break;
+		info[idx] = fields[midx];
+		if (mask)
+			mask[midx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
+	}
 	case RTE_FLOW_FIELD_TCP_PORT_SRC:
+		MLX5_ASSERT(data->offset + width <= 16);
+		off_be = 16 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_TCP_SPORT};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0xffff >> (16 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_TCP_PORT_DST:
+		MLX5_ASSERT(data->offset + width <= 16);
+		off_be = 16 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_TCP_DPORT};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0xffff >> (16 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_TCP_SEQ_NUM:
+		MLX5_ASSERT(data->offset + width <= 32);
+		off_be = 32 - (data->offset + width);
 		info[idx] = (struct field_modify_info){4, 0,
 					MLX5_MODI_OUT_TCP_SEQ_NUM};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_32(0xffffffff >>
-						     (32 - width));
+			mask[idx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_TCP_ACK_NUM:
+		MLX5_ASSERT(data->offset + width <= 32);
+		off_be = 32 - (data->offset + width);
 		info[idx] = (struct field_modify_info){4, 0,
 					MLX5_MODI_OUT_TCP_ACK_NUM};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_32(0xffffffff >>
-						     (32 - width));
+			mask[idx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_TCP_FLAGS:
+		MLX5_ASSERT(data->offset + width <= 9);
+		off_be = 9 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_TCP_FLAGS};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0x1ff >> (9 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_UDP_PORT_SRC:
+		MLX5_ASSERT(data->offset + width <= 16);
+		off_be = 16 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_UDP_SPORT};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0xffff >> (16 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_UDP_PORT_DST:
+		MLX5_ASSERT(data->offset + width <= 16);
+		off_be = 16 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_UDP_DPORT};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0xffff >> (16 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_VXLAN_VNI:
-		/* not supported yet */
+		MLX5_ASSERT(data->offset + width <= 24);
+		/* VNI is on bits 31-8 of TUNNEL_HDR_DW_1. */
+		off_be = 24 - (data->offset + width) + 8;
+		info[idx] = (struct field_modify_info){4, 0,
+					MLX5_MODI_TUNNEL_HDR_DW_1};
+		if (mask)
+			mask[idx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_GENEVE_VNI:
 		/* not supported yet*/
 		break;
 	case RTE_FLOW_FIELD_GTP_TEID:
+		MLX5_ASSERT(data->offset + width <= 32);
+		off_be = 32 - (data->offset + width);
 		info[idx] = (struct field_modify_info){4, 0,
 					MLX5_MODI_GTP_TEID};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_32(0xffffffff >>
-						     (32 - width));
+			mask[idx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_TAG:
 		{
-			int reg = mlx5_flow_get_reg_id(dev, MLX5_APP_TAG,
-						   data->level, error);
+			MLX5_ASSERT(data->offset + width <= 32);
+			int reg;
+
+			if (priv->sh->config.dv_flow_en == 2)
+				reg = REG_C_1;
+			else
+				reg = mlx5_flow_get_reg_id(dev, MLX5_APP_TAG,
+							   data->level, error);
 			if (reg < 0)
 				return;
 			MLX5_ASSERT(reg != REG_NON);
@@ -1797,15 +1794,18 @@ mlx5_flow_field_id_to_modify_info
 			info[idx] = (struct field_modify_info){4, 0,
 						reg_to_field[reg]};
 			if (mask)
-				mask[idx] =
-					rte_cpu_to_be_32(0xffffffff >>
-							 (32 - width));
+				mask[idx] = flow_modify_info_mask_32
+					(width, data->offset);
+			else
+				info[idx].offset = data->offset;
 		}
 		break;
 	case RTE_FLOW_FIELD_MARK:
 		{
 			uint32_t mark_mask = priv->sh->dv_mark_mask;
 			uint32_t mark_count = __builtin_popcount(mark_mask);
+			RTE_SET_USED(mark_count);
+			MLX5_ASSERT(data->offset + width <= mark_count);
 			int reg = mlx5_flow_get_reg_id(dev, MLX5_FLOW_MARK,
 						       0, error);
 			if (reg < 0)
@@ -1815,14 +1815,18 @@ mlx5_flow_field_id_to_modify_info
 			info[idx] = (struct field_modify_info){4, 0,
 						reg_to_field[reg]};
 			if (mask)
-				mask[idx] = rte_cpu_to_be_32((mark_mask >>
-					 (mark_count - width)) & mark_mask);
+				mask[idx] = flow_modify_info_mask_32_masked
+					(width, data->offset, mark_mask);
+			else
+				info[idx].offset = data->offset;
 		}
 		break;
 	case RTE_FLOW_FIELD_META:
 		{
 			uint32_t meta_mask = priv->sh->dv_meta_mask;
 			uint32_t meta_count = __builtin_popcount(meta_mask);
+			RTE_SET_USED(meta_count);
+			MLX5_ASSERT(data->offset + width <= meta_count);
 			int reg = flow_dv_get_metadata_reg(dev, attr, error);
 			if (reg < 0)
 				return;
@@ -1831,16 +1835,32 @@ mlx5_flow_field_id_to_modify_info
 			info[idx] = (struct field_modify_info){4, 0,
 						reg_to_field[reg]};
 			if (mask)
-				mask[idx] = rte_cpu_to_be_32((meta_mask >>
-					(meta_count - width)) & meta_mask);
+				mask[idx] = flow_modify_info_mask_32_masked
+					(width, data->offset, meta_mask);
+			else
+				info[idx].offset = data->offset;
 		}
 		break;
 	case RTE_FLOW_FIELD_IPV4_ECN:
 	case RTE_FLOW_FIELD_IPV6_ECN:
+		MLX5_ASSERT(data->offset + width <= 2);
+		off_be = 2 - (data->offset + width);
 		info[idx] = (struct field_modify_info){1, 0,
 					MLX5_MODI_OUT_IP_ECN};
 		if (mask)
-			mask[idx] = 0x3 >> (2 - width);
+			mask[idx] = flow_modify_info_mask_8(width, off_be);
+		else
+			info[idx].offset = off_be;
+		break;
+	case RTE_FLOW_FIELD_GTP_PSC_QFI:
+		MLX5_ASSERT(data->offset + width <= 8);
+		off_be = data->offset + 8;
+		info[idx] = (struct field_modify_info){4, 0,
+					MLX5_MODI_GTPU_FIRST_EXT_DW_0};
+		if (mask)
+			mask[idx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_POINTER:
 	case RTE_FLOW_FIELD_VALUE:
@@ -1890,7 +1910,8 @@ flow_dv_convert_action_modify_field
 
 	if (conf->src.field == RTE_FLOW_FIELD_POINTER ||
 	    conf->src.field == RTE_FLOW_FIELD_VALUE) {
-		type = MLX5_MODIFICATION_TYPE_SET;
+		type = conf->operation == RTE_FLOW_MODIFY_SET ?
+			MLX5_MODIFICATION_TYPE_SET : MLX5_MODIFICATION_TYPE_ADD;
 		/** For SET fill the destination field (field) first. */
 		mlx5_flow_field_id_to_modify_info(&conf->dst, field, mask,
 						  conf->width, dev,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index b6978bd051..8e0135fdb0 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -319,6 +319,11 @@ __flow_hw_action_template_destroy(struct rte_eth_dev *dev,
 		mlx5_hlist_unregister(priv->sh->flow_tbls, &grp->entry);
 		acts->jump = NULL;
 	}
+	if (acts->mhdr) {
+		if (acts->mhdr->action)
+			mlx5dr_action_destroy(acts->mhdr->action);
+		mlx5_free(acts->mhdr);
+	}
 }
 
 /**
@@ -425,6 +430,37 @@ __flow_hw_act_data_encap_append(struct mlx5_priv *priv,
 	return 0;
 }
 
+static __rte_always_inline int
+__flow_hw_act_data_hdr_modify_append(struct mlx5_priv *priv,
+				     struct mlx5_hw_actions *acts,
+				     enum rte_flow_action_type type,
+				     uint16_t action_src,
+				     uint16_t action_dst,
+				     uint16_t mhdr_cmds_off,
+				     uint16_t mhdr_cmds_end,
+				     bool shared,
+				     struct field_modify_info *field,
+				     struct field_modify_info *dcopy,
+				     uint32_t *mask)
+{
+	struct mlx5_action_construct_data *act_data;
+
+	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
+	if (!act_data)
+		return -1;
+	act_data->modify_header.mhdr_cmds_off = mhdr_cmds_off;
+	act_data->modify_header.mhdr_cmds_end = mhdr_cmds_end;
+	act_data->modify_header.shared = shared;
+	rte_memcpy(act_data->modify_header.field, field,
+		   sizeof(*field) * MLX5_ACT_MAX_MOD_FIELDS);
+	rte_memcpy(act_data->modify_header.dcopy, dcopy,
+		   sizeof(*dcopy) * MLX5_ACT_MAX_MOD_FIELDS);
+	rte_memcpy(act_data->modify_header.mask, mask,
+		   sizeof(*mask) * MLX5_ACT_MAX_MOD_FIELDS);
+	LIST_INSERT_HEAD(&acts->act_list, act_data, next);
+	return 0;
+}
+
 /**
  * Append shared RSS action to the dynamic action list.
  *
@@ -515,6 +551,265 @@ flow_hw_shared_action_translate(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static __rte_always_inline bool
+flow_hw_action_modify_field_is_shared(const struct rte_flow_action *action,
+				      const struct rte_flow_action *mask)
+{
+	const struct rte_flow_action_modify_field *v = action->conf;
+	const struct rte_flow_action_modify_field *m = mask->conf;
+
+	if (v->src.field == RTE_FLOW_FIELD_VALUE) {
+		uint32_t j;
+
+		if (m == NULL)
+			return false;
+		for (j = 0; j < RTE_DIM(m->src.value); ++j) {
+			/*
+			 * Immediate value is considered to be masked
+			 * (and thus shared by all flow rules), if mask
+			 * is non-zero. Partial mask over immediate value
+			 * is not allowed.
+			 */
+			if (m->src.value[j])
+				return true;
+		}
+		return false;
+	}
+	if (v->src.field == RTE_FLOW_FIELD_POINTER)
+		return m->src.pvalue != NULL;
+	/*
+	 * Source field types other than VALUE and
+	 * POINTER are always shared.
+	 */
+	return true;
+}
+
+static __rte_always_inline bool
+flow_hw_should_insert_nop(const struct mlx5_hw_modify_header_action *mhdr,
+			  const struct mlx5_modification_cmd *cmd)
+{
+	struct mlx5_modification_cmd last_cmd = { { 0 } };
+	struct mlx5_modification_cmd new_cmd = { { 0 } };
+	const uint32_t cmds_num = mhdr->mhdr_cmds_num;
+	unsigned int last_type;
+	bool should_insert = false;
+
+	if (cmds_num == 0)
+		return false;
+	last_cmd = *(&mhdr->mhdr_cmds[cmds_num - 1]);
+	last_cmd.data0 = rte_be_to_cpu_32(last_cmd.data0);
+	last_cmd.data1 = rte_be_to_cpu_32(last_cmd.data1);
+	last_type = last_cmd.action_type;
+	new_cmd = *cmd;
+	new_cmd.data0 = rte_be_to_cpu_32(new_cmd.data0);
+	new_cmd.data1 = rte_be_to_cpu_32(new_cmd.data1);
+	switch (new_cmd.action_type) {
+	case MLX5_MODIFICATION_TYPE_SET:
+	case MLX5_MODIFICATION_TYPE_ADD:
+		if (last_type == MLX5_MODIFICATION_TYPE_SET ||
+		    last_type == MLX5_MODIFICATION_TYPE_ADD)
+			should_insert = new_cmd.field == last_cmd.field;
+		else if (last_type == MLX5_MODIFICATION_TYPE_COPY)
+			should_insert = new_cmd.field == last_cmd.dst_field;
+		else if (last_type == MLX5_MODIFICATION_TYPE_NOP)
+			should_insert = false;
+		else
+			MLX5_ASSERT(false); /* Other types are not supported. */
+		break;
+	case MLX5_MODIFICATION_TYPE_COPY:
+		if (last_type == MLX5_MODIFICATION_TYPE_SET ||
+		    last_type == MLX5_MODIFICATION_TYPE_ADD)
+			should_insert = (new_cmd.field == last_cmd.field ||
+					 new_cmd.dst_field == last_cmd.field);
+		else if (last_type == MLX5_MODIFICATION_TYPE_COPY)
+			should_insert = (new_cmd.field == last_cmd.dst_field ||
+					 new_cmd.dst_field == last_cmd.dst_field);
+		else if (last_type == MLX5_MODIFICATION_TYPE_NOP)
+			should_insert = false;
+		else
+			MLX5_ASSERT(false); /* Other types are not supported. */
+		break;
+	default:
+		/* Other action types should be rejected on AT validation. */
+		MLX5_ASSERT(false);
+		break;
+	}
+	return should_insert;
+}
+
+static __rte_always_inline int
+flow_hw_mhdr_cmd_nop_append(struct mlx5_hw_modify_header_action *mhdr)
+{
+	struct mlx5_modification_cmd *nop;
+	uint32_t num = mhdr->mhdr_cmds_num;
+
+	if (num + 1 >= MLX5_MHDR_MAX_CMD)
+		return -ENOMEM;
+	nop = mhdr->mhdr_cmds + num;
+	nop->data0 = 0;
+	nop->action_type = MLX5_MODIFICATION_TYPE_NOP;
+	nop->data0 = rte_cpu_to_be_32(nop->data0);
+	nop->data1 = 0;
+	mhdr->mhdr_cmds_num = num + 1;
+	return 0;
+}
+
+static __rte_always_inline int
+flow_hw_mhdr_cmd_append(struct mlx5_hw_modify_header_action *mhdr,
+			struct mlx5_modification_cmd *cmd)
+{
+	uint32_t num = mhdr->mhdr_cmds_num;
+
+	if (num + 1 >= MLX5_MHDR_MAX_CMD)
+		return -ENOMEM;
+	mhdr->mhdr_cmds[num] = *cmd;
+	mhdr->mhdr_cmds_num = num + 1;
+	return 0;
+}
+
+static __rte_always_inline int
+flow_hw_converted_mhdr_cmds_append(struct mlx5_hw_modify_header_action *mhdr,
+				   struct mlx5_flow_dv_modify_hdr_resource *resource)
+{
+	uint32_t idx;
+	int ret;
+
+	for (idx = 0; idx < resource->actions_num; ++idx) {
+		struct mlx5_modification_cmd *src = &resource->actions[idx];
+
+		if (flow_hw_should_insert_nop(mhdr, src)) {
+			ret = flow_hw_mhdr_cmd_nop_append(mhdr);
+			if (ret)
+				return ret;
+		}
+		ret = flow_hw_mhdr_cmd_append(mhdr, src);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+static __rte_always_inline void
+flow_hw_modify_field_init(struct mlx5_hw_modify_header_action *mhdr,
+			  struct rte_flow_actions_template *at)
+{
+	memset(mhdr, 0, sizeof(*mhdr));
+	/* Modify header action without any commands is shared by default. */
+	mhdr->shared = true;
+	mhdr->pos = at->mhdr_off;
+}
+
+static __rte_always_inline int
+flow_hw_modify_field_compile(struct rte_eth_dev *dev,
+			     const struct rte_flow_attr *attr,
+			     const struct rte_flow_action *action_start, /* Start of AT actions. */
+			     const struct rte_flow_action *action, /* Current action from AT. */
+			     const struct rte_flow_action *action_mask, /* Current mask from AT. */
+			     struct mlx5_hw_actions *acts,
+			     struct mlx5_hw_modify_header_action *mhdr,
+			     struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_action_modify_field *conf = action->conf;
+	union {
+		struct mlx5_flow_dv_modify_hdr_resource resource;
+		uint8_t data[sizeof(struct mlx5_flow_dv_modify_hdr_resource) +
+			     sizeof(struct mlx5_modification_cmd) * MLX5_MHDR_MAX_CMD];
+	} dummy;
+	struct mlx5_flow_dv_modify_hdr_resource *resource;
+	struct rte_flow_item item = {
+		.spec = NULL,
+		.mask = NULL
+	};
+	struct field_modify_info field[MLX5_ACT_MAX_MOD_FIELDS] = {
+						{0, 0, MLX5_MODI_OUT_NONE} };
+	struct field_modify_info dcopy[MLX5_ACT_MAX_MOD_FIELDS] = {
+						{0, 0, MLX5_MODI_OUT_NONE} };
+	uint32_t mask[MLX5_ACT_MAX_MOD_FIELDS] = { 0 };
+	uint32_t type, value = 0;
+	uint16_t cmds_start, cmds_end;
+	bool shared;
+	int ret;
+
+	/*
+	 * Modify header action is shared if previous modify_field actions
+	 * are shared and currently compiled action is shared.
+	 */
+	shared = flow_hw_action_modify_field_is_shared(action, action_mask);
+	mhdr->shared &= shared;
+	if (conf->src.field == RTE_FLOW_FIELD_POINTER ||
+	    conf->src.field == RTE_FLOW_FIELD_VALUE) {
+		type = conf->operation == RTE_FLOW_MODIFY_SET ? MLX5_MODIFICATION_TYPE_SET :
+								MLX5_MODIFICATION_TYPE_ADD;
+		/* For SET/ADD fill the destination field (field) first. */
+		mlx5_flow_field_id_to_modify_info(&conf->dst, field, mask,
+						  conf->width, dev,
+						  attr, error);
+		item.spec = conf->src.field == RTE_FLOW_FIELD_POINTER ?
+				(void *)(uintptr_t)conf->src.pvalue :
+				(void *)(uintptr_t)&conf->src.value;
+		if (conf->dst.field == RTE_FLOW_FIELD_META ||
+		    conf->dst.field == RTE_FLOW_FIELD_TAG) {
+			value = *(const unaligned_uint32_t *)item.spec;
+			value = rte_cpu_to_be_32(value);
+			item.spec = &value;
+		} else if (conf->dst.field == RTE_FLOW_FIELD_GTP_PSC_QFI) {
+			/*
+			 * QFI is passed as a uint8_t integer, but it is accessed through
+			 * the 2nd least significant byte of a 32-bit field in the modify header command.
+			 */
+			value = *(const uint8_t *)item.spec;
+			value = rte_cpu_to_be_32(value << 8);
+			item.spec = &value;
+		}
+	} else {
+		type = MLX5_MODIFICATION_TYPE_COPY;
+		/* For COPY fill the destination field (dcopy) without mask. */
+		mlx5_flow_field_id_to_modify_info(&conf->dst, dcopy, NULL,
+						  conf->width, dev,
+						  attr, error);
+		/* Then construct the source field (field) with mask. */
+		mlx5_flow_field_id_to_modify_info(&conf->src, field, mask,
+						  conf->width, dev,
+						  attr, error);
+	}
+	item.mask = &mask;
+	memset(&dummy, 0, sizeof(dummy));
+	resource = &dummy.resource;
+	ret = flow_dv_convert_modify_action(&item, field, dcopy, resource, type, error);
+	if (ret)
+		return ret;
+	MLX5_ASSERT(resource->actions_num > 0);
+	/*
+	 * If previous modify field action collide with this one, then insert NOP command.
+	 * This NOP command will not be a part of action's command range used to update commands
+	 * on rule creation.
+	 */
+	if (flow_hw_should_insert_nop(mhdr, &resource->actions[0])) {
+		ret = flow_hw_mhdr_cmd_nop_append(mhdr);
+		if (ret)
+			return rte_flow_error_set(error, ret, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+						  NULL, "too many modify field operations specified");
+	}
+	cmds_start = mhdr->mhdr_cmds_num;
+	ret = flow_hw_converted_mhdr_cmds_append(mhdr, resource);
+	if (ret)
+		return rte_flow_error_set(error, ret, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "too many modify field operations specified");
+
+	cmds_end = mhdr->mhdr_cmds_num;
+	if (shared)
+		return 0;
+	ret = __flow_hw_act_data_hdr_modify_append(priv, acts, RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+						   action - action_start, mhdr->pos,
+						   cmds_start, cmds_end, shared,
+						   field, dcopy, mask);
+	if (ret)
+		return rte_flow_error_set(error, ENOMEM, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "not enough memory to store modify field metadata");
+	return 0;
+}
+
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -558,10 +853,12 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 	uint16_t reformat_pos = MLX5_HW_MAX_ACTS, reformat_src = 0;
 	uint8_t *encap_data = NULL, *encap_data_m = NULL;
 	size_t data_size = 0;
+	struct mlx5_hw_modify_header_action mhdr = { 0 };
 	bool actions_end = false;
 	uint32_t type, i;
 	int err;
 
+	flow_hw_modify_field_init(&mhdr, at);
 	if (attr->transfer)
 		type = MLX5DR_TABLE_TYPE_FDB;
 	else if (attr->egress)
@@ -714,6 +1011,15 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			reformat_pos = i++;
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_TNL_L2_TO_L2;
 			break;
+		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
+			if (mhdr.pos == UINT16_MAX)
+				mhdr.pos = i++;
+			err = flow_hw_modify_field_compile(dev, attr, action_start,
+							   actions, masks, acts, &mhdr,
+							   error);
+			if (err)
+				goto err;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -721,6 +1027,31 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			break;
 		}
 	}
+	if (mhdr.pos != UINT16_MAX) {
+		uint32_t flags;
+		uint32_t bulk_size;
+		size_t mhdr_len;
+
+		acts->mhdr = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*acts->mhdr),
+					 0, SOCKET_ID_ANY);
+		if (!acts->mhdr)
+			goto err;
+		rte_memcpy(acts->mhdr, &mhdr, sizeof(*acts->mhdr));
+		mhdr_len = sizeof(struct mlx5_modification_cmd) * acts->mhdr->mhdr_cmds_num;
+		flags = mlx5_hw_act_flag[!!attr->group][type];
+		if (acts->mhdr->shared) {
+			flags |= MLX5DR_ACTION_FLAG_SHARED;
+			bulk_size = 0;
+		} else {
+			bulk_size = rte_log2_u32(table_attr->nb_flows);
+		}
+		acts->mhdr->action = mlx5dr_action_create_modify_header
+				(priv->dr_ctx, mhdr_len, (__be64 *)acts->mhdr->mhdr_cmds,
+				 bulk_size, flags);
+		if (!acts->mhdr->action)
+			goto err;
+		acts->rule_acts[acts->mhdr->pos].action = acts->mhdr->action;
+	}
 	if (reformat_pos != MLX5_HW_MAX_ACTS) {
 		uint8_t buf[MLX5_ENCAP_MAX_LEN];
 		bool shared_rfmt = true;
@@ -884,6 +1215,110 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static __rte_always_inline int
+flow_hw_mhdr_cmd_is_nop(const struct mlx5_modification_cmd *cmd)
+{
+	struct mlx5_modification_cmd cmd_he = {
+		.data0 = rte_be_to_cpu_32(cmd->data0),
+		.data1 = 0,
+	};
+
+	return cmd_he.action_type == MLX5_MODIFICATION_TYPE_NOP;
+}
+
+/**
+ * Construct the per-rule commands of a modify header action.
+ *
+ * If the modify header action created from the action template is
+ * not shared, the immediate values of its modification commands
+ * must be updated according to the MODIFY_FIELD action provided
+ * during flow creation. NOP commands inserted at template
+ * translation time are skipped.
+ *
+ * @param[in] job
+ *   Pointer to the job descriptor holding the per-rule copy of
+ *   the modify header commands.
+ * @param[in] act_data
+ *   Pointer to the action construct data describing the command
+ *   range, destination fields and value masks of this action.
+ * @param[in] hw_acts
+ *   Pointer to translated actions from template.
+ * @param[in] action
+ *   MODIFY_FIELD action from the rte_flow rule being created;
+ *   only VALUE and POINTER sources carry per-rule data.
+ *
+ * @return
+ *    0 on success, negative value otherwise.
+ */
+static __rte_always_inline int
+flow_hw_modify_field_construct(struct mlx5_hw_q_job *job,
+			       struct mlx5_action_construct_data *act_data,
+			       const struct mlx5_hw_actions *hw_acts,
+			       const struct rte_flow_action *action)
+{
+	const struct rte_flow_action_modify_field *mhdr_action = action->conf;
+	uint8_t values[16] = { 0 };
+	unaligned_uint32_t *value_p;
+	uint32_t i;
+	struct field_modify_info *field;
+
+	if (!hw_acts->mhdr)
+		return -1;
+	if (hw_acts->mhdr->shared || act_data->modify_header.shared)
+		return 0;
+	MLX5_ASSERT(mhdr_action->operation == RTE_FLOW_MODIFY_SET ||
+		    mhdr_action->operation == RTE_FLOW_MODIFY_ADD);
+	if (mhdr_action->src.field != RTE_FLOW_FIELD_VALUE &&
+	    mhdr_action->src.field != RTE_FLOW_FIELD_POINTER)
+		return 0;
+	if (mhdr_action->src.field == RTE_FLOW_FIELD_VALUE)
+		rte_memcpy(values, &mhdr_action->src.value, sizeof(values));
+	else
+		rte_memcpy(values, mhdr_action->src.pvalue, sizeof(values));
+	if (mhdr_action->dst.field == RTE_FLOW_FIELD_META ||
+	    mhdr_action->dst.field == RTE_FLOW_FIELD_TAG) {
+		value_p = (unaligned_uint32_t *)values;
+		*value_p = rte_cpu_to_be_32(*value_p);
+	} else if (mhdr_action->dst.field == RTE_FLOW_FIELD_GTP_PSC_QFI) {
+		uint32_t tmp;
+
+		/*
+		 * QFI is passed as a uint8_t integer, but it is accessed through
+		 * the 2nd least significant byte of a 32-bit field in the modify header command.
+		 */
+		tmp = values[0];
+		value_p = (unaligned_uint32_t *)values;
+		*value_p = rte_cpu_to_be_32(tmp << 8);
+	}
+	i = act_data->modify_header.mhdr_cmds_off;
+	field = act_data->modify_header.field;
+	do {
+		uint32_t off_b;
+		uint32_t mask;
+		uint32_t data;
+		const uint8_t *mask_src;
+
+		if (i >= act_data->modify_header.mhdr_cmds_end)
+			return -1;
+		if (flow_hw_mhdr_cmd_is_nop(&job->mhdr_cmd[i])) {
+			++i;
+			continue;
+		}
+		mask_src = (const uint8_t *)act_data->modify_header.mask;
+		mask = flow_dv_fetch_field(mask_src + field->offset, field->size);
+		if (!mask) {
+			++field;
+			continue;
+		}
+		off_b = rte_bsf32(mask);
+		data = flow_dv_fetch_field(values + field->offset, field->size);
+		data = (data & mask) >> off_b;
+		job->mhdr_cmd[i++].data1 = rte_cpu_to_be_32(data);
+		++field;
+	} while (field->size);
+	return 0;
+}
+
 /**
  * Construct flow action array.
  *
@@ -928,6 +1363,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	};
 	uint32_t ft_flag;
 	size_t encap_len = 0;
+	int ret;
 
 	memcpy(rule_acts, hw_acts->rule_acts,
 	       sizeof(*rule_acts) * hw_acts->acts_num);
@@ -945,6 +1381,18 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	} else {
 		attr.ingress = 1;
 	}
+	if (hw_acts->mhdr && hw_acts->mhdr->mhdr_cmds_num > 0) {
+		uint16_t pos = hw_acts->mhdr->pos;
+
+		if (!hw_acts->mhdr->shared) {
+			rule_acts[pos].modify_header.offset =
+						job->flow->idx - 1;
+			rule_acts[pos].modify_header.data =
+						(uint8_t *)job->mhdr_cmd;
+			rte_memcpy(job->mhdr_cmd, hw_acts->mhdr->mhdr_cmds,
+				   sizeof(*job->mhdr_cmd) * hw_acts->mhdr->mhdr_cmds_num);
+		}
+	}
 	LIST_FOREACH(act_data, &hw_acts->act_list, next) {
 		uint32_t jump_group;
 		uint32_t tag;
@@ -1020,6 +1468,14 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			MLX5_ASSERT(raw_encap_data->size ==
 				    act_data->encap.len);
 			break;
+		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
+			ret = flow_hw_modify_field_construct(job,
+							     act_data,
+							     hw_acts,
+							     action);
+			if (ret)
+				return -1;
+			break;
 		default:
 			break;
 		}
@@ -1609,6 +2065,155 @@ flow_hw_table_destroy(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static bool
+flow_hw_modify_field_is_used(const struct rte_flow_action_modify_field *action,
+			     enum rte_flow_field_id field)
+{
+	return action->src.field == field || action->dst.field == field;
+}
+
+static int
+flow_hw_validate_action_modify_field(const struct rte_flow_action *action,
+				     const struct rte_flow_action *mask,
+				     struct rte_flow_error *error)
+{
+	const struct rte_flow_action_modify_field *action_conf =
+		action->conf;
+	const struct rte_flow_action_modify_field *mask_conf =
+		mask->conf;
+
+	if (action_conf->operation != mask_conf->operation)
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"modify_field operation mask and template are not equal");
+	if (action_conf->dst.field != mask_conf->dst.field)
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"destination field mask and template are not equal");
+	if (action_conf->dst.field == RTE_FLOW_FIELD_POINTER ||
+	    action_conf->dst.field == RTE_FLOW_FIELD_VALUE)
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"immediate value and pointer cannot be used as destination");
+	if (mask_conf->dst.level != UINT32_MAX)
+		return rte_flow_error_set(error, EINVAL,
+			RTE_FLOW_ERROR_TYPE_ACTION, action,
+			"destination encapsulation level must be fully masked");
+	if (mask_conf->dst.offset != UINT32_MAX)
+		return rte_flow_error_set(error, EINVAL,
+			RTE_FLOW_ERROR_TYPE_ACTION, action,
+			"destination offset must be fully masked");
+	if (action_conf->src.field != mask_conf->src.field)
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"source field mask and template are not equal");
+	if (action_conf->src.field != RTE_FLOW_FIELD_POINTER &&
+	    action_conf->src.field != RTE_FLOW_FIELD_VALUE) {
+		if (mask_conf->src.level != UINT32_MAX)
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"source encapsulation level must be fully masked");
+		if (mask_conf->src.offset != UINT32_MAX)
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"source offset must be fully masked");
+	}
+	if (mask_conf->width != UINT32_MAX)
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"modify_field width field must be fully masked");
+	if (flow_hw_modify_field_is_used(action_conf, RTE_FLOW_FIELD_START))
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"modifying arbitrary place in a packet is not supported");
+	if (flow_hw_modify_field_is_used(action_conf, RTE_FLOW_FIELD_VLAN_TYPE))
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"modifying vlan_type is not supported");
+	if (flow_hw_modify_field_is_used(action_conf, RTE_FLOW_FIELD_GENEVE_VNI))
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"modifying Geneve VNI is not supported");
+	return 0;
+}
+
+static int
+flow_hw_action_validate(const struct rte_flow_action actions[],
+			const struct rte_flow_action masks[],
+			struct rte_flow_error *error)
+{
+	int i;
+	bool actions_end = false;
+	int ret;
+
+	for (i = 0; !actions_end; ++i) {
+		const struct rte_flow_action *action = &actions[i];
+		const struct rte_flow_action *mask = &masks[i];
+
+		if (action->type != mask->type)
+			return rte_flow_error_set(error, ENOTSUP,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  action,
+						  "mask type does not match action type");
+		switch (action->type) {
+		case RTE_FLOW_ACTION_TYPE_VOID:
+			break;
+		case RTE_FLOW_ACTION_TYPE_INDIRECT:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_MARK:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_DROP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_JUMP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_QUEUE:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_RSS:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_VXLAN_DECAP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_NVGRE_DECAP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_DECAP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
+			ret = flow_hw_validate_action_modify_field(action,
+									mask,
+									error);
+			if (ret < 0)
+				return ret;
+			break;
+		case RTE_FLOW_ACTION_TYPE_END:
+			actions_end = true;
+			break;
+		default:
+			return rte_flow_error_set(error, ENOTSUP,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  action,
+						  "action not supported in template API");
+		}
+	}
+	return 0;
+}
+
 /**
  * Create flow action template.
  *
@@ -1637,6 +2242,8 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 	int len, act_len, mask_len, i;
 	struct rte_flow_actions_template *at;
 
+	if (flow_hw_action_validate(actions, masks, error))
+		return NULL;
 	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS,
 				NULL, 0, actions, error);
 	if (act_len <= 0)
@@ -2093,6 +2700,8 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		}
 		mem_size += (sizeof(struct mlx5_hw_q_job *) +
 			    sizeof(uint8_t) * MLX5_ENCAP_MAX_LEN +
+			    sizeof(struct mlx5_modification_cmd) *
+			    MLX5_MHDR_MAX_CMD +
 			    sizeof(struct mlx5_hw_q_job)) *
 			    queue_attr[0]->size;
 	}
@@ -2104,6 +2713,7 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	}
 	for (i = 0; i < nb_queue; i++) {
 		uint8_t *encap = NULL;
+		struct mlx5_modification_cmd *mhdr_cmd = NULL;
 
 		priv->hw_q[i].job_idx = queue_attr[i]->size;
 		priv->hw_q[i].size = queue_attr[i]->size;
@@ -2115,8 +2725,10 @@ flow_hw_configure(struct rte_eth_dev *dev,
 					    &job[queue_attr[i - 1]->size];
 		job = (struct mlx5_hw_q_job *)
 		      &priv->hw_q[i].job[queue_attr[i]->size];
-		encap = (uint8_t *)&job[queue_attr[i]->size];
+		mhdr_cmd = (struct mlx5_modification_cmd *)&job[queue_attr[i]->size];
+		encap = (uint8_t *)&mhdr_cmd[queue_attr[i]->size * MLX5_MHDR_MAX_CMD];
 		for (j = 0; j < queue_attr[i]->size; j++) {
+			job[j].mhdr_cmd = &mhdr_cmd[j * MLX5_MHDR_MAX_CMD];
 			job[j].encap_data = &encap[j * MLX5_ENCAP_MAX_LEN];
 			priv->hw_q[i].job[j] = &job[j];
 		}
-- 
2.25.1
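
A minimal sketch of an actions template that satisfies the validation added
in flow_hw_validate_action_modify_field() above (illustrative only, not part
of the patch; the function name and variable names are placeholders):

#include <stdint.h>
#include <rte_flow.h>

/*
 * Build an actions template with a non-shared MODIFY_FIELD action: the
 * mask fully masks operation, fields, level, offset and width, while the
 * zeroed immediate-value mask keeps the 32-bit value per rule.
 */
static struct rte_flow_actions_template *
create_set_tag_template(uint16_t port_id, struct rte_flow_error *err)
{
	static const struct rte_flow_action_modify_field mf_conf = {
		.operation = RTE_FLOW_MODIFY_SET,
		.dst = { .field = RTE_FLOW_FIELD_TAG, .level = 0, .offset = 0 },
		.src = { .field = RTE_FLOW_FIELD_VALUE }, /* value given per rule */
		.width = 32,
	};
	static const struct rte_flow_action_modify_field mf_mask = {
		.operation = RTE_FLOW_MODIFY_SET,
		.dst = { .field = RTE_FLOW_FIELD_TAG,
			 .level = UINT32_MAX, .offset = UINT32_MAX },
		.src = { .field = RTE_FLOW_FIELD_VALUE }, /* zero mask: not shared */
		.width = UINT32_MAX,
	};
	const struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD, .conf = &mf_conf },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	const struct rte_flow_action masks[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD, .conf = &mf_mask },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	const struct rte_flow_actions_template_attr at_attr = { 0 };

	return rte_flow_actions_template_create(port_id, &at_attr,
						actions, masks, err);
}

At flow creation time, the per-rule 32-bit value is then taken from the
MODIFY_FIELD action passed to rte_flow_async_create() and written into the
per-queue command buffer by flow_hw_modify_field_construct().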


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v3 05/17] net/mlx5: add HW steering port action
  2022-09-30 12:52 ` [PATCH v3 00/17] net/mlx5: HW steering PMD update Suanming Mou
                     ` (3 preceding siblings ...)
  2022-09-30 12:53   ` [PATCH v3 04/17] net/mlx5: add modify field hws support Suanming Mou
@ 2022-09-30 12:53   ` Suanming Mou
  2022-09-30 12:53   ` [PATCH v3 06/17] net/mlx5: add extended metadata mode for hardware steering Suanming Mou
                     ` (11 subsequent siblings)
  16 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-30 12:53 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Dariusz Sosnowski

From: Dariusz Sosnowski <dsosnowski@nvidia.com>

This patch implements creating and caching of port actions for use with
HW Steering FDB flows.

Actions are created when the flow template API is configured, and only
on the port designated as master. Attaching or detaching a port in the
same switching domain updates the port actions cache by creating or
destroying the corresponding port action.
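
A minimal sketch of the action template side (illustrative only, not part
of the patch; the representor port_id and variable names are placeholders):

#include <stdint.h>
#include <rte_flow.h>

/*
 * Forward matched packets to a fixed representor port. A fully masked
 * port_id lets the translation use the vport action cached by this patch;
 * the template table using these arrays is assumed to be created on the
 * E-Switch proxy (master) port with a transfer attribute and group > 0.
 */
static const struct rte_flow_action_ethdev port_conf = { .port_id = 1 };
static const struct rte_flow_action_ethdev port_mask = { .port_id = UINT16_MAX };

static const struct rte_flow_action fwd_actions[] = {
	{ .type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT, .conf = &port_conf },
	{ .type = RTE_FLOW_ACTION_TYPE_END },
};
static const struct rte_flow_action fwd_masks[] = {
	{ .type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT, .conf = &port_mask },
	{ .type = RTE_FLOW_ACTION_TYPE_END },
};

The proxy port itself can be queried with rte_flow_pick_transfer_proxy(),
whose mlx5 callback is also wired up by this patch.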

A new devarg fdb_def_rule_en is added to control whether the PMD
implicitly creates the default dedicated E-Switch rule; the PMD sets
this value to 1 by default.
If set to 0, the default E-Switch rule is not created and the user can
create a specific E-Switch rule on the root table if needed.
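
A minimal sketch of enabling this mode from an application (illustrative,
not part of the patch; the PCI address is a placeholder and dv_flow_en=2
selects the HW steering engine):

#include <rte_dev.h>

/*
 * Probe a port with HW steering enabled (dv_flow_en=2) and without the
 * implicit FDB jump rule (fdb_def_rule_en=0), so rules can be inserted
 * on the E-Switch root table directly.
 */
static int
probe_without_fdb_def_rule(void)
{
	return rte_dev_probe("0000:08:00.0,dv_flow_en=2,fdb_def_rule_en=0");
}

The same devargs can equally be passed on the application command line.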

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 doc/guides/nics/mlx5.rst           |    9 +
 drivers/net/mlx5/linux/mlx5_os.c   |   14 +
 drivers/net/mlx5/mlx5.c            |   14 +
 drivers/net/mlx5/mlx5.h            |   26 +-
 drivers/net/mlx5/mlx5_flow.c       |   96 +-
 drivers/net/mlx5/mlx5_flow.h       |   22 +-
 drivers/net/mlx5/mlx5_flow_dv.c    |   93 +-
 drivers/net/mlx5/mlx5_flow_hw.c    | 1356 +++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_flow_verbs.c |    4 +-
 drivers/net/mlx5/mlx5_trigger.c    |   77 +-
 10 files changed, 1594 insertions(+), 117 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 631f0840eb..c42ac482d8 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -1118,6 +1118,15 @@ for an additional list of options shared with other mlx5 drivers.
 
   By default, the PMD will set this value to 1.
 
+- ``fdb_def_rule_en`` parameter [int]
+
+  A non-zero value lets the PMD create a dedicated rule on the E-Switch root
+  table which forwards all incoming packets into table 1. Other rules are then
+  created on the original E-Switch table level plus one, which improves the
+  flow insertion rate by skipping the root table managed by firmware.
+  If set to 0, all rules will be created on the original E-Switch table level.
+
+  By default, the PMD will set this value to 1.
 
 Supported NICs
 --------------
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index b7cc11a2ef..e0586a4d6f 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1556,6 +1556,13 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 			flow_hw_set_port_info(eth_dev);
 		/* Only HWS requires this information. */
 		flow_hw_init_tags_set(eth_dev);
+		if (priv->sh->config.dv_esw_en &&
+		    flow_hw_create_vport_action(eth_dev)) {
+			DRV_LOG(ERR, "port %u failed to create vport action",
+				eth_dev->data->port_id);
+			err = EINVAL;
+			goto error;
+		}
 		return eth_dev;
 #else
 		DRV_LOG(ERR, "DV support is missing for HWS.");
@@ -1620,6 +1627,13 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	return eth_dev;
 error:
 	if (priv) {
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+		if (eth_dev &&
+		    priv->sh &&
+		    priv->sh->config.dv_flow_en == 2 &&
+		    priv->sh->config.dv_esw_en)
+			flow_hw_destroy_vport_action(eth_dev);
+#endif
 		if (priv->mreg_cp_tbl)
 			mlx5_hlist_destroy(priv->mreg_cp_tbl);
 		if (priv->sh)
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index b39ef1ecbe..74adb677f4 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -172,6 +172,9 @@
 /* Device parameter to configure the delay drop when creating Rxqs. */
 #define MLX5_DELAY_DROP "delay_drop"
 
+/* Device parameter to create the fdb default rule in PMD */
+#define MLX5_FDB_DEFAULT_RULE_EN "fdb_def_rule_en"
+
 /* Shared memory between primary and secondary processes. */
 struct mlx5_shared_data *mlx5_shared_data;
 
@@ -1239,6 +1242,8 @@ mlx5_dev_args_check_handler(const char *key, const char *val, void *opaque)
 		config->decap_en = !!tmp;
 	} else if (strcmp(MLX5_ALLOW_DUPLICATE_PATTERN, key) == 0) {
 		config->allow_duplicate_pattern = !!tmp;
+	} else if (strcmp(MLX5_FDB_DEFAULT_RULE_EN, key) == 0) {
+		config->fdb_def_rule = !!tmp;
 	}
 	return 0;
 }
@@ -1274,6 +1279,7 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 		MLX5_RECLAIM_MEM,
 		MLX5_DECAP_EN,
 		MLX5_ALLOW_DUPLICATE_PATTERN,
+		MLX5_FDB_DEFAULT_RULE_EN,
 		NULL,
 	};
 	int ret = 0;
@@ -1285,6 +1291,7 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 	config->dv_flow_en = 1;
 	config->decap_en = 1;
 	config->allow_duplicate_pattern = 1;
+	config->fdb_def_rule = 1;
 	if (mkvlist != NULL) {
 		/* Process parameters. */
 		ret = mlx5_kvargs_process(mkvlist, params,
@@ -1360,6 +1367,7 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 	DRV_LOG(DEBUG, "\"decap_en\" is %u.", config->decap_en);
 	DRV_LOG(DEBUG, "\"allow_duplicate_pattern\" is %u.",
 		config->allow_duplicate_pattern);
+	DRV_LOG(DEBUG, "\"fdb_def_rule_en\" is %u.", config->fdb_def_rule);
 	return 0;
 }
 
@@ -1943,6 +1951,7 @@ mlx5_dev_close(struct rte_eth_dev *dev)
 	mlx5_flex_parser_ecpri_release(dev);
 	mlx5_flex_item_port_cleanup(dev);
 #if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
+	flow_hw_destroy_vport_action(dev);
 	flow_hw_resource_release(dev);
 	flow_hw_clear_port_info(dev);
 	if (priv->sh->config.dv_flow_en == 2)
@@ -2644,6 +2653,11 @@ mlx5_probe_again_args_validate(struct mlx5_common_device *cdev,
 			sh->ibdev_name);
 		goto error;
 	}
+	if (sh->config.fdb_def_rule ^ config->fdb_def_rule) {
+		DRV_LOG(ERR, "\"fdb_def_rule_en\" configuration mismatch for shared %s context.",
+			sh->ibdev_name);
+		goto error;
+	}
 	if (sh->config.l3_vxlan_en ^ config->l3_vxlan_en) {
 		DRV_LOG(ERR, "\"l3_vxlan_en\" "
 			"configuration mismatch for shared %s context.",
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index d07f5b0d8a..0bf21c1efe 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -309,6 +309,7 @@ struct mlx5_sh_config {
 	uint32_t allow_duplicate_pattern:1;
 	uint32_t lro_allowed:1; /* Whether LRO is allowed. */
 	/* Allow/Prevent the duplicate rules pattern. */
+	uint32_t fdb_def_rule:1; /* Create FDB default jump rule */
 };
 
 
@@ -337,6 +338,8 @@ enum {
 	MLX5_HW_Q_JOB_TYPE_DESTROY, /* Flow destroy job type. */
 };
 
+#define MLX5_HW_MAX_ITEMS (16)
+
 /* HW steering flow management job descriptor. */
 struct mlx5_hw_q_job {
 	uint32_t type; /* Job type. */
@@ -344,6 +347,8 @@ struct mlx5_hw_q_job {
 	void *user_data; /* Job user data. */
 	uint8_t *encap_data; /* Encap data. */
 	struct mlx5_modification_cmd *mhdr_cmd;
+	struct rte_flow_item *items;
+	struct rte_flow_item_ethdev port_spec;
 };
 
 /* HW steering job descriptor LIFO pool. */
@@ -1202,6 +1207,8 @@ struct mlx5_dev_ctx_shared {
 	uint32_t flow_priority_check_flag:1; /* Check Flag for flow priority. */
 	uint32_t metadata_regc_check_flag:1; /* Check Flag for metadata REGC. */
 	uint32_t hws_tags:1; /* Check if tags info for HWS initialized. */
+	uint32_t shared_mark_enabled:1;
+	/* If mark action is enabled on Rxqs (shared E-Switch domain). */
 	uint32_t max_port; /* Maximal IB device port index. */
 	struct mlx5_bond_info bond; /* Bonding information. */
 	struct mlx5_common_device *cdev; /* Backend mlx5 device. */
@@ -1450,6 +1457,12 @@ struct mlx5_obj_ops {
 
 #define MLX5_RSS_HASH_FIELDS_LEN RTE_DIM(mlx5_rss_hash_fields)
 
+struct mlx5_hw_ctrl_flow {
+	LIST_ENTRY(mlx5_hw_ctrl_flow) next;
+	struct rte_eth_dev *owner_dev;
+	struct rte_flow *flow;
+};
+
 struct mlx5_priv {
 	struct rte_eth_dev_data *dev_data;  /* Pointer to device data. */
 	struct mlx5_dev_ctx_shared *sh; /* Shared device context. */
@@ -1490,6 +1503,11 @@ struct mlx5_priv {
 	unsigned int reta_idx_n; /* RETA index size. */
 	struct mlx5_drop drop_queue; /* Flow drop queues. */
 	void *root_drop_action; /* Pointer to root drop action. */
+	rte_spinlock_t hw_ctrl_lock;
+	LIST_HEAD(hw_ctrl_flow, mlx5_hw_ctrl_flow) hw_ctrl_flows;
+	struct rte_flow_template_table *hw_esw_sq_miss_root_tbl;
+	struct rte_flow_template_table *hw_esw_sq_miss_tbl;
+	struct rte_flow_template_table *hw_esw_zero_tbl;
 	struct mlx5_indexed_pool *flows[MLX5_FLOW_TYPE_MAXI];
 	/* RTE Flow rules. */
 	uint32_t ctrl_flows; /* Control flow rules. */
@@ -1550,11 +1568,11 @@ struct mlx5_priv {
 	struct mlx5_hw_q *hw_q;
 	/* HW steering rte flow table list header. */
 	LIST_HEAD(flow_hw_tbl, rte_flow_template_table) flow_hw_tbl;
+	struct mlx5dr_action **hw_vport;
 	/* HW steering global drop action. */
-	struct mlx5dr_action *hw_drop[MLX5_HW_ACTION_FLAG_MAX]
-				     [MLX5DR_TABLE_TYPE_MAX];
-	/* HW steering global drop action. */
-	struct mlx5dr_action *hw_tag[MLX5_HW_ACTION_FLAG_MAX];
+	struct mlx5dr_action *hw_drop[2];
+	/* HW steering global tag action. */
+	struct mlx5dr_action *hw_tag[2];
 	struct mlx5_indexed_pool *acts_ipool; /* Action data indexed pool. */
 #endif
 };
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 3abb39aa92..9c44b2e99b 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -999,6 +999,7 @@ static const struct rte_flow_ops mlx5_flow_ops = {
 	.flex_item_create = mlx5_flow_flex_item_create,
 	.flex_item_release = mlx5_flow_flex_item_release,
 	.info_get = mlx5_flow_info_get,
+	.pick_transfer_proxy = mlx5_flow_pick_transfer_proxy,
 	.configure = mlx5_flow_port_configure,
 	.pattern_template_create = mlx5_flow_pattern_template_create,
 	.pattern_template_destroy = mlx5_flow_pattern_template_destroy,
@@ -1242,7 +1243,7 @@ mlx5_get_lowest_priority(struct rte_eth_dev *dev,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 
-	if (!attr->group && !attr->transfer)
+	if (!attr->group && !(attr->transfer && priv->fdb_def_rule))
 		return priv->sh->flow_max_priority - 2;
 	return MLX5_NON_ROOT_FLOW_MAX_PRIO - 1;
 }
@@ -1269,11 +1270,14 @@ mlx5_get_matcher_priority(struct rte_eth_dev *dev,
 	uint16_t priority = (uint16_t)attr->priority;
 	struct mlx5_priv *priv = dev->data->dev_private;
 
+	/* NIC root rules */
 	if (!attr->group && !attr->transfer) {
 		if (attr->priority == MLX5_FLOW_LOWEST_PRIO_INDICATOR)
 			priority = priv->sh->flow_max_priority - 1;
 		return mlx5_os_flow_adjust_priority(dev, priority, subpriority);
-	} else if (!external && attr->transfer && attr->group == 0 &&
+	/* FDB root rules */
+	} else if (attr->transfer && (!external || !priv->fdb_def_rule) &&
+		   attr->group == 0 &&
 		   attr->priority == MLX5_FLOW_LOWEST_PRIO_INDICATOR) {
 		return (priv->sh->flow_max_priority - 1) * 3;
 	}
@@ -1481,13 +1485,32 @@ flow_rxq_mark_flag_set(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_rxq_ctrl *rxq_ctrl;
+	uint16_t port_id;
 
-	if (priv->mark_enabled)
+	if (priv->sh->shared_mark_enabled)
 		return;
-	LIST_FOREACH(rxq_ctrl, &priv->rxqsctrl, next) {
-		rxq_ctrl->rxq.mark = 1;
+	if (priv->master || priv->representor) {
+		MLX5_ETH_FOREACH_DEV(port_id, dev->device) {
+			struct mlx5_priv *opriv =
+				rte_eth_devices[port_id].data->dev_private;
+
+			if (!opriv ||
+			    opriv->sh != priv->sh ||
+			    opriv->domain_id != priv->domain_id ||
+			    opriv->mark_enabled)
+				continue;
+			LIST_FOREACH(rxq_ctrl, &opriv->rxqsctrl, next) {
+				rxq_ctrl->rxq.mark = 1;
+			}
+			opriv->mark_enabled = 1;
+		}
+	} else {
+		LIST_FOREACH(rxq_ctrl, &priv->rxqsctrl, next) {
+			rxq_ctrl->rxq.mark = 1;
+		}
+		priv->mark_enabled = 1;
 	}
-	priv->mark_enabled = 1;
+	priv->sh->shared_mark_enabled = 1;
 }
 
 /**
@@ -1623,6 +1646,7 @@ flow_rxq_flags_clear(struct rte_eth_dev *dev)
 		rxq->ctrl->rxq.tunnel = 0;
 	}
 	priv->mark_enabled = 0;
+	priv->sh->shared_mark_enabled = 0;
 }
 
 /**
@@ -2808,8 +2832,8 @@ mlx5_flow_validate_item_tcp(const struct rte_flow_item *item,
  *   Item specification.
  * @param[in] item_flags
  *   Bit-fields that holds the items detected until now.
- * @param[in] attr
- *   Flow rule attributes.
+ * @param root
+ *   Whether action is on root table.
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -2821,7 +2845,7 @@ mlx5_flow_validate_item_vxlan(struct rte_eth_dev *dev,
 			      uint16_t udp_dport,
 			      const struct rte_flow_item *item,
 			      uint64_t item_flags,
-			      const struct rte_flow_attr *attr,
+			      bool root,
 			      struct rte_flow_error *error)
 {
 	const struct rte_flow_item_vxlan *spec = item->spec;
@@ -2858,12 +2882,11 @@ mlx5_flow_validate_item_vxlan(struct rte_eth_dev *dev,
 	if (priv->sh->steering_format_version !=
 	    MLX5_STEERING_LOGIC_FORMAT_CONNECTX_5 ||
 	    !udp_dport || udp_dport == MLX5_UDP_PORT_VXLAN) {
-		/* FDB domain & NIC domain non-zero group */
-		if ((attr->transfer || attr->group) && priv->sh->misc5_cap)
+		/* non-root table */
+		if (!root && priv->sh->misc5_cap)
 			valid_mask = &nic_mask;
 		/* Group zero in NIC domain */
-		if (!attr->group && !attr->transfer &&
-		    priv->sh->tunnel_header_0_1)
+		if (!root && priv->sh->tunnel_header_0_1)
 			valid_mask = &nic_mask;
 	}
 	ret = mlx5_flow_item_acceptable
@@ -3102,11 +3125,11 @@ mlx5_flow_validate_item_gre_option(struct rte_eth_dev *dev,
 	if (mask->checksum_rsvd.checksum || mask->sequence.sequence) {
 		if (priv->sh->steering_format_version ==
 		    MLX5_STEERING_LOGIC_FORMAT_CONNECTX_5 ||
-		    ((attr->group || attr->transfer) &&
+		    ((attr->group || (attr->transfer && priv->fdb_def_rule)) &&
 		     !priv->sh->misc5_cap) ||
 		    (!(priv->sh->tunnel_header_0_1 &&
 		       priv->sh->tunnel_header_2_3) &&
-		    !attr->group && !attr->transfer))
+		    !attr->group && (!attr->transfer || !priv->fdb_def_rule)))
 			return rte_flow_error_set(error, EINVAL,
 						  RTE_FLOW_ERROR_TYPE_ITEM,
 						  item,
@@ -6163,7 +6186,8 @@ flow_create_split_metadata(struct rte_eth_dev *dev,
 	}
 	if (qrss) {
 		/* Check if it is in meter suffix table. */
-		mtr_sfx = attr->group == (attr->transfer ?
+		mtr_sfx = attr->group ==
+			  ((attr->transfer && priv->fdb_def_rule) ?
 			  (MLX5_FLOW_TABLE_LEVEL_METER - 1) :
 			  MLX5_FLOW_TABLE_LEVEL_METER);
 		/*
@@ -11086,3 +11110,43 @@ int mlx5_flow_get_item_vport_id(struct rte_eth_dev *dev,
 
 	return 0;
 }
+
+int
+mlx5_flow_pick_transfer_proxy(struct rte_eth_dev *dev,
+			      uint16_t *proxy_port_id,
+			      struct rte_flow_error *error)
+{
+	const struct mlx5_priv *priv = dev->data->dev_private;
+	uint16_t port_id;
+
+	if (!priv->sh->config.dv_esw_en)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL,
+					  "unable to provide a proxy port"
+					  " without E-Switch configured");
+	if (!priv->master && !priv->representor)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL,
+					  "unable to provide a proxy port"
+					  " for port which is not a master"
+					  " or a representor port");
+	if (priv->master) {
+		*proxy_port_id = dev->data->port_id;
+		return 0;
+	}
+	MLX5_ETH_FOREACH_DEV(port_id, dev->device) {
+		const struct rte_eth_dev *port_dev = &rte_eth_devices[port_id];
+		const struct mlx5_priv *port_priv = port_dev->data->dev_private;
+
+		if (port_priv->master &&
+		    port_priv->domain_id == priv->domain_id) {
+			*proxy_port_id = port_id;
+			return 0;
+		}
+	}
+	return rte_flow_error_set(error, EINVAL,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "unable to find a proxy port");
+}
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 0eab3a3797..93f0e189d4 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1151,6 +1151,11 @@ struct rte_flow_pattern_template {
 	struct mlx5dr_match_template *mt; /* mlx5 match template. */
 	uint64_t item_flags; /* Item layer flags. */
 	uint32_t refcnt;  /* Reference counter. */
+	/*
+	 * If true, then rule pattern should be prepended with
+	 * represented_port pattern item.
+	 */
+	bool implicit_port;
 };
 
 /* Flow action template struct. */
@@ -1226,6 +1231,7 @@ struct mlx5_hw_action_template {
 /* mlx5 flow group struct. */
 struct mlx5_flow_group {
 	struct mlx5_list_entry entry;
+	struct rte_eth_dev *dev; /* Reference to corresponding device. */
 	struct mlx5dr_table *tbl; /* HWS table object. */
 	struct mlx5_hw_jump_action jump; /* Jump action. */
 	enum mlx5dr_table_type type; /* Table type. */
@@ -1484,6 +1490,9 @@ void flow_hw_clear_port_info(struct rte_eth_dev *dev);
 void flow_hw_init_tags_set(struct rte_eth_dev *dev);
 void flow_hw_clear_tags_set(struct rte_eth_dev *dev);
 
+int flow_hw_create_vport_action(struct rte_eth_dev *dev);
+void flow_hw_destroy_vport_action(struct rte_eth_dev *dev);
+
 typedef int (*mlx5_flow_validate_t)(struct rte_eth_dev *dev,
 				    const struct rte_flow_attr *attr,
 				    const struct rte_flow_item items[],
@@ -2056,7 +2065,7 @@ int mlx5_flow_validate_item_vxlan(struct rte_eth_dev *dev,
 				  uint16_t udp_dport,
 				  const struct rte_flow_item *item,
 				  uint64_t item_flags,
-				  const struct rte_flow_attr *attr,
+				  bool root,
 				  struct rte_flow_error *error);
 int mlx5_flow_validate_item_vxlan_gpe(const struct rte_flow_item *item,
 				      uint64_t item_flags,
@@ -2313,4 +2322,15 @@ int flow_dv_translate_items_hws(const struct rte_flow_item *items,
 				uint32_t key_type, uint64_t *item_flags,
 				uint8_t *match_criteria,
 				struct rte_flow_error *error);
+
+int mlx5_flow_pick_transfer_proxy(struct rte_eth_dev *dev,
+				  uint16_t *proxy_port_id,
+				  struct rte_flow_error *error);
+
+int mlx5_flow_hw_flush_ctrl_flows(struct rte_eth_dev *dev);
+
+int mlx5_flow_hw_esw_create_mgr_sq_miss_flow(struct rte_eth_dev *dev);
+int mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev,
+					 uint32_t txq);
+int mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev);
 #endif /* RTE_PMD_MLX5_FLOW_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 7dff2ab44f..3fc2453045 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -2471,8 +2471,8 @@ flow_dv_validate_item_gtp(struct rte_eth_dev *dev,
  *   Previous validated item in the pattern items.
  * @param[in] gtp_item
  *   Previous GTP item specification.
- * @param[in] attr
- *   Pointer to flow attributes.
+ * @param root
+ *   Whether action is on root table.
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -2483,7 +2483,7 @@ static int
 flow_dv_validate_item_gtp_psc(const struct rte_flow_item *item,
 			      uint64_t last_item,
 			      const struct rte_flow_item *gtp_item,
-			      const struct rte_flow_attr *attr,
+			      bool root,
 			      struct rte_flow_error *error)
 {
 	const struct rte_flow_item_gtp *gtp_spec;
@@ -2508,7 +2508,7 @@ flow_dv_validate_item_gtp_psc(const struct rte_flow_item *item,
 			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM, item,
 			 "GTP E flag must be 1 to match GTP PSC");
 	/* Check the flow is not created in group zero. */
-	if (!attr->transfer && !attr->group)
+	if (root)
 		return rte_flow_error_set
 			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
 			 "GTP PSC is not supported for group 0");
@@ -3373,20 +3373,19 @@ flow_dv_validate_action_set_tag(struct rte_eth_dev *dev,
 /**
  * Indicates whether ASO aging is supported.
  *
- * @param[in] sh
- *   Pointer to shared device context structure.
- * @param[in] attr
- *   Attributes of flow that includes AGE action.
+ * @param[in] priv
+ *   Pointer to device private context structure.
+ * @param[in] root
+ *   Whether action is on root table.
  *
  * @return
  *   True when ASO aging is supported, false otherwise.
  */
 static inline bool
-flow_hit_aso_supported(const struct mlx5_dev_ctx_shared *sh,
-		const struct rte_flow_attr *attr)
+flow_hit_aso_supported(const struct mlx5_priv *priv, bool root)
 {
-	MLX5_ASSERT(sh && attr);
-	return (sh->flow_hit_aso_en && (attr->transfer || attr->group));
+	MLX5_ASSERT(priv);
+	return (priv->sh->flow_hit_aso_en && !root);
 }
 
 /**
@@ -3398,8 +3397,8 @@ flow_hit_aso_supported(const struct mlx5_dev_ctx_shared *sh,
  *   Indicator if action is shared.
  * @param[in] action_flags
  *   Holds the actions detected until now.
- * @param[in] attr
- *   Attributes of flow that includes this action.
+ * @param[in] root
+ *   Whether action is on root table.
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -3409,7 +3408,7 @@ flow_hit_aso_supported(const struct mlx5_dev_ctx_shared *sh,
 static int
 flow_dv_validate_action_count(struct rte_eth_dev *dev, bool shared,
 			      uint64_t action_flags,
-			      const struct rte_flow_attr *attr,
+			      bool root,
 			      struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -3421,7 +3420,7 @@ flow_dv_validate_action_count(struct rte_eth_dev *dev, bool shared,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "duplicate count actions set");
 	if (shared && (action_flags & MLX5_FLOW_ACTION_AGE) &&
-	    !flow_hit_aso_supported(priv->sh, attr))
+	    !flow_hit_aso_supported(priv, root))
 		return rte_flow_error_set(error, EINVAL,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "old age and indirect count combination is not supported");
@@ -3652,8 +3651,8 @@ flow_dv_validate_action_raw_encap_decap
  *   Holds the actions detected until now.
  * @param[in] item_flags
  *   The items found in this flow rule.
- * @param[in] attr
- *   Pointer to flow attributes.
+ * @param root
+ *   Whether action is on root table.
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -3664,12 +3663,12 @@ static int
 flow_dv_validate_action_aso_ct(struct rte_eth_dev *dev,
 			       uint64_t action_flags,
 			       uint64_t item_flags,
-			       const struct rte_flow_attr *attr,
+			       bool root,
 			       struct rte_flow_error *error)
 {
 	RTE_SET_USED(dev);
 
-	if (attr->group == 0 && !attr->transfer)
+	if (root)
 		return rte_flow_error_set(error, ENOTSUP,
 					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 					  NULL,
@@ -4919,6 +4918,8 @@ flow_dv_validate_action_modify_ttl(const uint64_t action_flags,
  *   Pointer to the modify action.
  * @param[in] attr
  *   Pointer to the flow attributes.
+ * @param root
+ *   Whether action is on root table.
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -4931,6 +4932,7 @@ flow_dv_validate_action_modify_field(struct rte_eth_dev *dev,
 				   const uint64_t action_flags,
 				   const struct rte_flow_action *action,
 				   const struct rte_flow_attr *attr,
+				   bool root,
 				   struct rte_flow_error *error)
 {
 	int ret = 0;
@@ -4978,7 +4980,7 @@ flow_dv_validate_action_modify_field(struct rte_eth_dev *dev,
 	}
 	if (action_modify_field->src.field != RTE_FLOW_FIELD_VALUE &&
 	    action_modify_field->src.field != RTE_FLOW_FIELD_POINTER) {
-		if (!attr->transfer && !attr->group)
+		if (root)
 			return rte_flow_error_set(error, ENOTSUP,
 					RTE_FLOW_ERROR_TYPE_ACTION, action,
 					"modify field action is not"
@@ -5068,8 +5070,7 @@ flow_dv_validate_action_modify_field(struct rte_eth_dev *dev,
 	    action_modify_field->src.field == RTE_FLOW_FIELD_IPV4_ECN ||
 	    action_modify_field->dst.field == RTE_FLOW_FIELD_IPV6_ECN ||
 	    action_modify_field->src.field == RTE_FLOW_FIELD_IPV6_ECN)
-		if (!hca_attr->modify_outer_ip_ecn &&
-		    !attr->transfer && !attr->group)
+		if (!hca_attr->modify_outer_ip_ecn && root)
 			return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_ACTION, action,
 				"modifications of the ECN for current firmware is not supported");
@@ -5103,11 +5104,12 @@ flow_dv_validate_action_jump(struct rte_eth_dev *dev,
 			     bool external, struct rte_flow_error *error)
 {
 	uint32_t target_group, table = 0;
+	struct mlx5_priv *priv = dev->data->dev_private;
 	int ret = 0;
 	struct flow_grp_info grp_info = {
 		.external = !!external,
 		.transfer = !!attributes->transfer,
-		.fdb_def_rule = 1,
+		.fdb_def_rule = !!priv->fdb_def_rule,
 		.std_tbl_fix = 0
 	};
 	if (action_flags & (MLX5_FLOW_FATE_ACTIONS |
@@ -5687,6 +5689,8 @@ flow_dv_modify_clone_free_cb(void *tool_ctx, struct mlx5_list_entry *entry)
  *   Pointer to the COUNT action in sample action list.
  * @param[out] fdb_mirror_limit
  *   Pointer to the FDB mirror limitation flag.
+ * @param root
+ *   Whether action is on root table.
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -5703,6 +5707,7 @@ flow_dv_validate_action_sample(uint64_t *action_flags,
 			       const struct rte_flow_action_rss **sample_rss,
 			       const struct rte_flow_action_count **count,
 			       int *fdb_mirror_limit,
+			       bool root,
 			       struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -5804,7 +5809,7 @@ flow_dv_validate_action_sample(uint64_t *action_flags,
 		case RTE_FLOW_ACTION_TYPE_COUNT:
 			ret = flow_dv_validate_action_count
 				(dev, false, *action_flags | sub_action_flags,
-				 attr, error);
+				 root, error);
 			if (ret < 0)
 				return ret;
 			*count = act->conf;
@@ -7284,7 +7289,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 		case RTE_FLOW_ITEM_TYPE_VXLAN:
 			ret = mlx5_flow_validate_item_vxlan(dev, udp_dport,
 							    items, item_flags,
-							    attr, error);
+							    is_root, error);
 			if (ret < 0)
 				return ret;
 			last_item = MLX5_FLOW_LAYER_VXLAN;
@@ -7378,7 +7383,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			break;
 		case RTE_FLOW_ITEM_TYPE_GTP_PSC:
 			ret = flow_dv_validate_item_gtp_psc(items, last_item,
-							    gtp_item, attr,
+							    gtp_item, is_root,
 							    error);
 			if (ret < 0)
 				return ret;
@@ -7595,7 +7600,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 		case RTE_FLOW_ACTION_TYPE_COUNT:
 			ret = flow_dv_validate_action_count(dev, shared_count,
 							    action_flags,
-							    attr, error);
+							    is_root, error);
 			if (ret < 0)
 				return ret;
 			count = actions->conf;
@@ -7889,7 +7894,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			rw_act_num += MLX5_ACT_NUM_SET_TAG;
 			break;
 		case MLX5_RTE_FLOW_ACTION_TYPE_AGE:
-			if (!attr->transfer && !attr->group)
+			if (is_root)
 				return rte_flow_error_set(error, ENOTSUP,
 						RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 									   NULL,
@@ -7914,7 +7919,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			 * Validate the regular AGE action (using counter)
 			 * mutual exclusion with indirect counter actions.
 			 */
-			if (!flow_hit_aso_supported(priv->sh, attr)) {
+			if (!flow_hit_aso_supported(priv, is_root)) {
 				if (shared_count)
 					return rte_flow_error_set
 						(error, EINVAL,
@@ -7970,6 +7975,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 							     rss, &sample_rss,
 							     &sample_count,
 							     &fdb_mirror_limit,
+							     is_root,
 							     error);
 			if (ret < 0)
 				return ret;
@@ -7986,6 +7992,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 								   action_flags,
 								   actions,
 								   attr,
+								   is_root,
 								   error);
 			if (ret < 0)
 				return ret;
@@ -7999,8 +8006,8 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			break;
 		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
 			ret = flow_dv_validate_action_aso_ct(dev, action_flags,
-							     item_flags, attr,
-							     error);
+							     item_flags,
+							     is_root, error);
 			if (ret < 0)
 				return ret;
 			action_flags |= MLX5_FLOW_ACTION_CT;
@@ -9197,15 +9204,18 @@ flow_dv_translate_item_vxlan(struct rte_eth_dev *dev,
 	if (MLX5_ITEM_VALID(item, key_type))
 		return;
 	MLX5_ITEM_UPDATE(item, key_type, vxlan_v, vxlan_m, &nic_mask);
-	if (item->mask == &nic_mask &&
-	    ((!attr->group && !priv->sh->tunnel_header_0_1) ||
-	    (attr->group && !priv->sh->misc5_cap)))
+	if ((item->mask == &nic_mask) &&
+	    ((!attr->group && !(attr->transfer && priv->fdb_def_rule) &&
+	    !priv->sh->tunnel_header_0_1) ||
+	    ((attr->group || (attr->transfer && priv->fdb_def_rule)) &&
+	    !priv->sh->misc5_cap)))
 		vxlan_m = &rte_flow_item_vxlan_mask;
 	if ((priv->sh->steering_format_version ==
 	     MLX5_STEERING_LOGIC_FORMAT_CONNECTX_5 &&
 	     dport != MLX5_UDP_PORT_VXLAN) ||
-	    (!attr->group && !attr->transfer) ||
-	    ((attr->group || attr->transfer) && !priv->sh->misc5_cap)) {
+	    (!attr->group && !(attr->transfer && priv->fdb_def_rule)) ||
+	    ((attr->group || (attr->transfer && priv->fdb_def_rule)) &&
+	    !priv->sh->misc5_cap)) {
 		misc_v = MLX5_ADDR_OF(fte_match_param, key, misc_parameters);
 		size = sizeof(vxlan_m->vni);
 		vni_v = MLX5_ADDR_OF(fte_match_set_misc, misc_v, vxlan_vni);
@@ -14177,7 +14187,7 @@ flow_dv_translate(struct rte_eth_dev *dev,
 			 */
 			if (action_flags & MLX5_FLOW_ACTION_AGE) {
 				if ((non_shared_age && count) ||
-				    !flow_hit_aso_supported(priv->sh, attr)) {
+				    !flow_hit_aso_supported(priv, !dev_flow->dv.group)) {
 					/* Creates age by counters. */
 					cnt_act = flow_dv_prepare_counter
 								(dev, dev_flow,
@@ -18326,6 +18336,7 @@ flow_dv_action_validate(struct rte_eth_dev *dev,
 			struct rte_flow_error *err)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	/* called from RTE API */
 
 	RTE_SET_USED(conf);
 	switch (action->type) {
@@ -18353,7 +18364,7 @@ flow_dv_action_validate(struct rte_eth_dev *dev,
 						"Indirect age action not supported");
 		return flow_dv_validate_action_age(0, action, dev, err);
 	case RTE_FLOW_ACTION_TYPE_COUNT:
-		return flow_dv_validate_action_count(dev, true, 0, NULL, err);
+		return flow_dv_validate_action_count(dev, true, 0, false, err);
 	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
 		if (!priv->sh->ct_aso_en)
 			return rte_flow_error_set(err, ENOTSUP,
@@ -18530,6 +18541,8 @@ flow_dv_validate_mtr_policy_acts(struct rte_eth_dev *dev,
 	bool def_green = false;
 	bool def_yellow = false;
 	const struct rte_flow_action_rss *rss_color[RTE_COLORS] = {NULL};
+	/* Called from RTE API */
+	bool is_root = !(attr->group || (attr->transfer && priv->fdb_def_rule));
 
 	if (!dev_conf->dv_esw_en)
 		def_domain &= ~MLX5_MTR_DOMAIN_TRANSFER_BIT;
@@ -18731,7 +18744,7 @@ flow_dv_validate_mtr_policy_acts(struct rte_eth_dev *dev,
 				break;
 			case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
 				ret = flow_dv_validate_action_modify_field(dev,
-					action_flags[i], act, attr, &flow_err);
+					action_flags[i], act, attr, is_root, &flow_err);
 				if (ret < 0)
 					return -rte_mtr_error_set(error,
 					  ENOTSUP,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 8e0135fdb0..b3b37f36a2 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -20,6 +20,14 @@
 /* Default queue to flush the flows. */
 #define MLX5_DEFAULT_FLUSH_QUEUE 0
 
+/* Maximum number of rules in control flow tables */
+#define MLX5_HW_CTRL_FLOW_NB_RULES (4096)
+
+/* Flow group for SQ miss default flows. */
+#define MLX5_HW_SQ_MISS_GROUP (UINT32_MAX)
+
+static int flow_hw_flush_all_ctrl_flows(struct rte_eth_dev *dev);
+
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops;
 
 /* DR action flags with different table. */
@@ -57,6 +65,9 @@ flow_hw_rxq_flag_set(struct rte_eth_dev *dev, bool enable)
 	for (i = 0; i < priv->rxqs_n; ++i) {
 		struct mlx5_rxq_ctrl *rxq_ctrl = mlx5_rxq_ctrl_get(dev, i);
 
+		/* With RXQ start/stop feature, RXQ might be stopped. */
+		if (!rxq_ctrl)
+			continue;
 		rxq_ctrl->rxq.mark = enable;
 	}
 	priv->mark_enabled = enable;
@@ -810,6 +821,77 @@ flow_hw_modify_field_compile(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static int
+flow_hw_represented_port_compile(struct rte_eth_dev *dev,
+				 const struct rte_flow_attr *attr,
+				 const struct rte_flow_action *action_start,
+				 const struct rte_flow_action *action,
+				 const struct rte_flow_action *action_mask,
+				 struct mlx5_hw_actions *acts,
+				 uint16_t action_dst,
+				 struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_action_ethdev *v = action->conf;
+	const struct rte_flow_action_ethdev *m = action_mask->conf;
+	int ret;
+
+	if (!attr->group)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ATTR, NULL,
+					  "represented_port action cannot"
+					  " be used on group 0");
+	if (!attr->transfer)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ATTR_TRANSFER,
+					  NULL,
+					  "represented_port action requires"
+					  " transfer attribute");
+	if (attr->ingress || attr->egress)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ATTR, NULL,
+					  "represented_port action cannot"
+					  " be used with direction attributes");
+	if (!priv->master)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  "represented_port action must"
+					  " be used on proxy port");
+	if (m && !!m->port_id) {
+		struct mlx5_priv *port_priv;
+
+		port_priv = mlx5_port_to_eswitch_info(v->port_id, false);
+		if (port_priv == NULL)
+			return rte_flow_error_set
+					(error, EINVAL,
+					 RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					 "port does not exist or unable to"
+					 " obtain E-Switch info for port");
+		MLX5_ASSERT(priv->hw_vport != NULL);
+		if (priv->hw_vport[v->port_id]) {
+			acts->rule_acts[action_dst].action =
+					priv->hw_vport[v->port_id];
+		} else {
+			return rte_flow_error_set
+					(error, EINVAL,
+					 RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					 "cannot use represented_port action"
+					 " with this port");
+		}
+	} else {
+		ret = __flow_hw_act_data_general_append
+				(priv, acts, action->type,
+				 action - action_start, action_dst);
+		if (ret)
+			return rte_flow_error_set
+					(error, ENOMEM,
+					 RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					 "not enough memory to store"
+					 " vport action");
+	}
+	return 0;
+}
+
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -887,7 +969,7 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			break;
 		case RTE_FLOW_ACTION_TYPE_DROP:
 			acts->rule_acts[i++].action =
-				priv->hw_drop[!!attr->group][type];
+				priv->hw_drop[!!attr->group];
 			break;
 		case RTE_FLOW_ACTION_TYPE_MARK:
 			acts->mark = true;
@@ -1020,6 +1102,13 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			if (err)
 				goto err;
 			break;
+		case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
+			if (flow_hw_represented_port_compile
+					(dev, attr, action_start, actions,
+					 masks, acts, i, error))
+				goto err;
+			i++;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -1352,11 +1441,13 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			  struct mlx5dr_rule_action *rule_acts,
 			  uint32_t *acts_num)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow_template_table *table = job->flow->table;
 	struct mlx5_action_construct_data *act_data;
 	const struct rte_flow_action *action;
 	const struct rte_flow_action_raw_encap *raw_encap_data;
 	const struct rte_flow_item *enc_item = NULL;
+	const struct rte_flow_action_ethdev *port_action = NULL;
 	uint8_t *buf = job->encap_data;
 	struct rte_flow_attr attr = {
 			.ingress = 1,
@@ -1476,6 +1567,13 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			if (ret)
 				return -1;
 			break;
+		case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
+			port_action = action->conf;
+			if (!priv->hw_vport[port_action->port_id])
+				return -1;
+			rule_acts[act_data->action_dst].action =
+					priv->hw_vport[port_action->port_id];
+			break;
 		default:
 			break;
 		}
@@ -1488,6 +1586,52 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static const struct rte_flow_item *
+flow_hw_get_rule_items(struct rte_eth_dev *dev,
+		       struct rte_flow_template_table *table,
+		       const struct rte_flow_item items[],
+		       uint8_t pattern_template_index,
+		       struct mlx5_hw_q_job *job)
+{
+	if (table->its[pattern_template_index]->implicit_port) {
+		const struct rte_flow_item *curr_item;
+		unsigned int nb_items;
+		bool found_end;
+		unsigned int i;
+
+		/* Count number of pattern items. */
+		nb_items = 0;
+		found_end = false;
+		for (curr_item = items; !found_end; ++curr_item) {
+			++nb_items;
+			if (curr_item->type == RTE_FLOW_ITEM_TYPE_END)
+				found_end = true;
+		}
+		/* Prepend represented port item. */
+		job->port_spec = (struct rte_flow_item_ethdev){
+			.port_id = dev->data->port_id,
+		};
+		job->items[0] = (struct rte_flow_item){
+			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+			.spec = &job->port_spec,
+		};
+		found_end = false;
+		for (i = 1; i < MLX5_HW_MAX_ITEMS && i - 1 < nb_items; ++i) {
+			job->items[i] = items[i - 1];
+			if (items[i - 1].type == RTE_FLOW_ITEM_TYPE_END) {
+				found_end = true;
+				break;
+			}
+		}
+		if (i >= MLX5_HW_MAX_ITEMS && !found_end) {
+			rte_errno = ENOMEM;
+			return NULL;
+		}
+		return job->items;
+	}
+	return items;
+}
+
 /**
  * Enqueue HW steering flow creation.
  *
@@ -1539,6 +1683,7 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	struct mlx5_hw_actions *hw_acts;
 	struct rte_flow_hw *flow;
 	struct mlx5_hw_q_job *job;
+	const struct rte_flow_item *rule_items;
 	uint32_t acts_num, flow_idx;
 	int ret;
 
@@ -1565,15 +1710,23 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	job->user_data = user_data;
 	rule_attr.user_data = job;
 	hw_acts = &table->ats[action_template_index].acts;
-	/* Construct the flow action array based on the input actions.*/
-	flow_hw_actions_construct(dev, job, hw_acts, pattern_template_index,
-				  actions, rule_acts, &acts_num);
+	/* Construct the flow actions based on the input actions. */
+	if (flow_hw_actions_construct(dev, job, hw_acts, pattern_template_index,
+				  actions, rule_acts, &acts_num)) {
+		rte_errno = EINVAL;
+		goto free;
+	}
+	rule_items = flow_hw_get_rule_items(dev, table, items,
+					    pattern_template_index, job);
+	if (!rule_items)
+		goto free;
 	ret = mlx5dr_rule_create(table->matcher,
-				 pattern_template_index, items,
+				 pattern_template_index, rule_items,
 				 action_template_index, rule_acts,
 				 &rule_attr, &flow->rule);
 	if (likely(!ret))
 		return (struct rte_flow *)flow;
+free:
 	/* Flow created fail, return the descriptor and flow memory. */
 	mlx5_ipool_free(table->flow, flow_idx);
 	priv->hw_q[queue].job_idx++;
@@ -1754,7 +1907,9 @@ __flow_hw_pull_comp(struct rte_eth_dev *dev,
 	struct rte_flow_op_result comp[BURST_THR];
 	int ret, i, empty_loop = 0;
 
-	flow_hw_push(dev, queue, error);
+	ret = flow_hw_push(dev, queue, error);
+	if (ret < 0)
+		return ret;
 	while (pending_rules) {
 		ret = flow_hw_pull(dev, queue, comp, BURST_THR, error);
 		if (ret < 0)
@@ -2039,8 +2194,12 @@ flow_hw_table_destroy(struct rte_eth_dev *dev,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	int i;
+	uint32_t fidx = 1;
 
-	if (table->refcnt) {
+	/* Build ipool allocated object bitmap. */
+	mlx5_ipool_flush_cache(table->flow);
+	/* Check if ipool has allocated objects. */
+	if (table->refcnt || mlx5_ipool_get_next(table->flow, &fidx)) {
 		DRV_LOG(WARNING, "Table %p is still in using.", (void *)table);
 		return rte_flow_error_set(error, EBUSY,
 				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
@@ -2052,8 +2211,6 @@ flow_hw_table_destroy(struct rte_eth_dev *dev,
 		__atomic_sub_fetch(&table->its[i]->refcnt,
 				   1, __ATOMIC_RELAXED);
 	for (i = 0; i < table->nb_action_templates; i++) {
-		if (table->ats[i].acts.mark)
-			flow_hw_rxq_flag_set(dev, false);
 		__flow_hw_action_template_destroy(dev, &table->ats[i].acts);
 		__atomic_sub_fetch(&table->ats[i].action_template->refcnt,
 				   1, __ATOMIC_RELAXED);
@@ -2138,7 +2295,51 @@ flow_hw_validate_action_modify_field(const struct rte_flow_action *action,
 }
 
 static int
-flow_hw_action_validate(const struct rte_flow_action actions[],
+flow_hw_validate_action_represented_port(struct rte_eth_dev *dev,
+					 const struct rte_flow_action *action,
+					 const struct rte_flow_action *mask,
+					 struct rte_flow_error *error)
+{
+	const struct rte_flow_action_ethdev *action_conf = action->conf;
+	const struct rte_flow_action_ethdev *mask_conf = mask->conf;
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (!priv->sh->config.dv_esw_en)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  "cannot use represented_port actions"
+					  " without an E-Switch");
+	if (mask_conf->port_id) {
+		struct mlx5_priv *port_priv;
+		struct mlx5_priv *dev_priv;
+
+		port_priv = mlx5_port_to_eswitch_info(action_conf->port_id, false);
+		if (!port_priv)
+			return rte_flow_error_set(error, rte_errno,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  action,
+						  "failed to obtain E-Switch"
+						  " info for port");
+		dev_priv = mlx5_dev_to_eswitch_info(dev);
+		if (!dev_priv)
+			return rte_flow_error_set(error, rte_errno,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  action,
+						  "failed to obtain E-Switch"
+						  " info for transfer proxy");
+		if (port_priv->domain_id != dev_priv->domain_id)
+			return rte_flow_error_set(error, rte_errno,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  action,
+						  "cannot forward to port from"
+						  " a different E-Switch");
+	}
+	return 0;
+}
+
+static int
+flow_hw_action_validate(struct rte_eth_dev *dev,
+			const struct rte_flow_action actions[],
 			const struct rte_flow_action masks[],
 			struct rte_flow_error *error)
 {
@@ -2201,6 +2402,12 @@ flow_hw_action_validate(const struct rte_flow_action actions[],
 			if (ret < 0)
 				return ret;
 			break;
+		case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
+			ret = flow_hw_validate_action_represented_port
+					(dev, action, mask, error);
+			if (ret < 0)
+				return ret;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -2242,7 +2449,7 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 	int len, act_len, mask_len, i;
 	struct rte_flow_actions_template *at;
 
-	if (flow_hw_action_validate(actions, masks, error))
+	if (flow_hw_action_validate(dev, actions, masks, error))
 		return NULL;
 	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS,
 				NULL, 0, actions, error);
@@ -2325,6 +2532,46 @@ flow_hw_actions_template_destroy(struct rte_eth_dev *dev __rte_unused,
 	return 0;
 }
 
+static struct rte_flow_item *
+flow_hw_copy_prepend_port_item(const struct rte_flow_item *items,
+			       struct rte_flow_error *error)
+{
+	const struct rte_flow_item *curr_item;
+	struct rte_flow_item *copied_items;
+	bool found_end;
+	unsigned int nb_items;
+	unsigned int i;
+	size_t size;
+
+	/* Count number of pattern items. */
+	nb_items = 0;
+	found_end = false;
+	for (curr_item = items; !found_end; ++curr_item) {
+		++nb_items;
+		if (curr_item->type == RTE_FLOW_ITEM_TYPE_END)
+			found_end = true;
+	}
+	/* Allocate new array of items and prepend REPRESENTED_PORT item. */
+	size = sizeof(*copied_items) * (nb_items + 1);
+	copied_items = mlx5_malloc(MLX5_MEM_ZERO, size, 0, rte_socket_id());
+	if (!copied_items) {
+		rte_flow_error_set(error, ENOMEM,
+				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				   NULL,
+				   "cannot allocate item template");
+		return NULL;
+	}
+	copied_items[0] = (struct rte_flow_item){
+		.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+		.spec = NULL,
+		.last = NULL,
+		.mask = &rte_flow_item_ethdev_mask,
+	};
+	for (i = 1; i < nb_items + 1; ++i)
+		copied_items[i] = items[i - 1];
+	return copied_items;
+}
+
 /**
  * Create flow item template.
  *
@@ -2348,9 +2595,35 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow_pattern_template *it;
+	struct rte_flow_item *copied_items = NULL;
+	const struct rte_flow_item *tmpl_items;
 
+	if (priv->sh->config.dv_esw_en && attr->ingress) {
+		/*
+		 * Disallow pattern template with ingress and egress/transfer
+		 * attributes in order to forbid implicit port matching
+		 * on egress and transfer traffic.
+		 */
+		if (attr->egress || attr->transfer) {
+			rte_flow_error_set(error, EINVAL,
+					   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					   NULL,
+					   "item template for ingress traffic"
+					   " cannot be used for egress/transfer"
+					   " traffic when E-Switch is enabled");
+			return NULL;
+		}
+		copied_items = flow_hw_copy_prepend_port_item(items, error);
+		if (!copied_items)
+			return NULL;
+		tmpl_items = copied_items;
+	} else {
+		tmpl_items = items;
+	}
 	it = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*it), 0, rte_socket_id());
 	if (!it) {
+		if (copied_items)
+			mlx5_free(copied_items);
 		rte_flow_error_set(error, ENOMEM,
 				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				   NULL,
@@ -2358,8 +2631,10 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 		return NULL;
 	}
 	it->attr = *attr;
-	it->mt = mlx5dr_match_template_create(items, attr->relaxed_matching);
+	it->mt = mlx5dr_match_template_create(tmpl_items, attr->relaxed_matching);
 	if (!it->mt) {
+		if (copied_items)
+			mlx5_free(copied_items);
 		mlx5_free(it);
 		rte_flow_error_set(error, rte_errno,
 				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
@@ -2367,9 +2642,12 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 				   "cannot create match template");
 		return NULL;
 	}
-	it->item_flags = flow_hw_rss_item_flags_get(items);
+	it->item_flags = flow_hw_rss_item_flags_get(tmpl_items);
+	it->implicit_port = !!copied_items;
 	__atomic_fetch_add(&it->refcnt, 1, __ATOMIC_RELAXED);
 	LIST_INSERT_HEAD(&priv->flow_hw_itt, it, next);
+	if (copied_items)
+		mlx5_free(copied_items);
 	return it;
 }
 
@@ -2495,6 +2773,7 @@ flow_hw_grp_create_cb(void *tool_ctx, void *cb_ctx)
 			goto error;
 		grp_data->jump.root_action = jump;
 	}
+	grp_data->dev = dev;
 	grp_data->idx = idx;
 	grp_data->group_id = attr->group;
 	grp_data->type = dr_tbl_attr.type;
@@ -2563,7 +2842,8 @@ flow_hw_grp_match_cb(void *tool_ctx __rte_unused, struct mlx5_list_entry *entry,
 	struct rte_flow_attr *attr =
 			(struct rte_flow_attr *)ctx->data;
 
-	return (grp_data->group_id != attr->group) ||
+	return (grp_data->dev != ctx->dev) ||
+		(grp_data->group_id != attr->group) ||
 		((grp_data->type != MLX5DR_TABLE_TYPE_FDB) &&
 		attr->transfer) ||
 		((grp_data->type != MLX5DR_TABLE_TYPE_NIC_TX) &&
@@ -2626,6 +2906,545 @@ flow_hw_grp_clone_free_cb(void *tool_ctx, struct mlx5_list_entry *entry)
 	mlx5_ipool_free(sh->ipool[MLX5_IPOOL_HW_GRP], grp_data->idx);
 }
 
+/**
+ * Create and cache a vport action for the given @p dev port. The vport
+ * actions cache is used in HWS with FDB flows.
+ *
+ * This function does not create any action if the proxy port for @p dev port
+ * was not configured for HW Steering.
+ *
+ * This function assumes that E-Switch is enabled and PMD is running with
+ * HW Steering configured.
+ *
+ * @param dev
+ *   Pointer to Ethernet device which will be the action destination.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise.
+ */
+int
+flow_hw_create_vport_action(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_eth_dev *proxy_dev;
+	struct mlx5_priv *proxy_priv;
+	uint16_t port_id = dev->data->port_id;
+	uint16_t proxy_port_id = port_id;
+	int ret;
+
+	ret = mlx5_flow_pick_transfer_proxy(dev, &proxy_port_id, NULL);
+	if (ret)
+		return ret;
+	proxy_dev = &rte_eth_devices[proxy_port_id];
+	proxy_priv = proxy_dev->data->dev_private;
+	if (!proxy_priv->hw_vport)
+		return 0;
+	if (proxy_priv->hw_vport[port_id]) {
+		DRV_LOG(ERR, "port %u HWS vport action already created",
+			port_id);
+		return -EINVAL;
+	}
+	proxy_priv->hw_vport[port_id] = mlx5dr_action_create_dest_vport
+			(proxy_priv->dr_ctx, priv->dev_port,
+			 MLX5DR_ACTION_FLAG_HWS_FDB);
+	if (!proxy_priv->hw_vport[port_id]) {
+		DRV_LOG(ERR, "port %u unable to create HWS vport action",
+			port_id);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+/**
+ * Destroys the vport action associated with @p dev device
+ * from actions' cache.
+ *
+ * This function does not destroy any action if there is no action cached
+ * for @p dev or proxy port was not configured for HW Steering.
+ *
+ * This function assumes that E-Switch is enabled and PMD is running with
+ * HW Steering configured.
+ *
+ * @param dev
+ *   Pointer to Ethernet device which will be the action destination.
+ */
+void
+flow_hw_destroy_vport_action(struct rte_eth_dev *dev)
+{
+	struct rte_eth_dev *proxy_dev;
+	struct mlx5_priv *proxy_priv;
+	uint16_t port_id = dev->data->port_id;
+	uint16_t proxy_port_id = port_id;
+
+	if (mlx5_flow_pick_transfer_proxy(dev, &proxy_port_id, NULL))
+		return;
+	proxy_dev = &rte_eth_devices[proxy_port_id];
+	proxy_priv = proxy_dev->data->dev_private;
+	if (!proxy_priv->hw_vport || !proxy_priv->hw_vport[port_id])
+		return;
+	mlx5dr_action_destroy(proxy_priv->hw_vport[port_id]);
+	proxy_priv->hw_vport[port_id] = NULL;
+}
+
+static int
+flow_hw_create_vport_actions(struct mlx5_priv *priv)
+{
+	uint16_t port_id;
+
+	MLX5_ASSERT(!priv->hw_vport);
+	priv->hw_vport = mlx5_malloc(MLX5_MEM_ZERO,
+				     sizeof(*priv->hw_vport) * RTE_MAX_ETHPORTS,
+				     0, SOCKET_ID_ANY);
+	if (!priv->hw_vport)
+		return -ENOMEM;
+	DRV_LOG(DEBUG, "port %u :: creating vport actions", priv->dev_data->port_id);
+	DRV_LOG(DEBUG, "port %u ::    domain_id=%u", priv->dev_data->port_id, priv->domain_id);
+	MLX5_ETH_FOREACH_DEV(port_id, NULL) {
+		struct mlx5_priv *port_priv = rte_eth_devices[port_id].data->dev_private;
+
+		if (!port_priv ||
+		    port_priv->domain_id != priv->domain_id)
+			continue;
+		DRV_LOG(DEBUG, "port %u :: for port_id=%u, calling mlx5dr_action_create_dest_vport() with ibport=%u",
+			priv->dev_data->port_id, port_id, port_priv->dev_port);
+		priv->hw_vport[port_id] = mlx5dr_action_create_dest_vport
+				(priv->dr_ctx, port_priv->dev_port,
+				 MLX5DR_ACTION_FLAG_HWS_FDB);
+		DRV_LOG(DEBUG, "port %u :: priv->hw_vport[%u]=%p",
+			priv->dev_data->port_id, port_id, (void *)priv->hw_vport[port_id]);
+		if (!priv->hw_vport[port_id])
+			return -EINVAL;
+	}
+	return 0;
+}
+
+static void
+flow_hw_free_vport_actions(struct mlx5_priv *priv)
+{
+	uint16_t port_id;
+
+	if (!priv->hw_vport)
+		return;
+	for (port_id = 0; port_id < RTE_MAX_ETHPORTS; ++port_id)
+		if (priv->hw_vport[port_id])
+			mlx5dr_action_destroy(priv->hw_vport[port_id]);
+	mlx5_free(priv->hw_vport);
+	priv->hw_vport = NULL;
+}
+
+/**
+ * Creates a flow pattern template used to match on E-Switch Manager.
+ * This template is used to set up a table for SQ miss default flow.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow pattern template on success, NULL otherwise.
+ */
+static struct rte_flow_pattern_template *
+flow_hw_create_ctrl_esw_mgr_pattern_template(struct rte_eth_dev *dev)
+{
+	struct rte_flow_pattern_template_attr attr = {
+		.relaxed_matching = 0,
+		.transfer = 1,
+	};
+	struct rte_flow_item_ethdev port_spec = {
+		.port_id = MLX5_REPRESENTED_PORT_ESW_MGR,
+	};
+	struct rte_flow_item_ethdev port_mask = {
+		.port_id = UINT16_MAX,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+			.spec = &port_spec,
+			.mask = &port_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+
+	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
+}
+
+/**
+ * Creates a flow pattern template used to match on a TX queue.
+ * This template is used to set up a table for SQ miss default flow.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow pattern template on success, NULL otherwise.
+ */
+static struct rte_flow_pattern_template *
+flow_hw_create_ctrl_sq_pattern_template(struct rte_eth_dev *dev)
+{
+	struct rte_flow_pattern_template_attr attr = {
+		.relaxed_matching = 0,
+		.transfer = 1,
+	};
+	struct mlx5_rte_flow_item_tx_queue queue_mask = {
+		.queue = UINT32_MAX,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = (enum rte_flow_item_type)
+				MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
+			.mask = &queue_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+
+	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
+}
+
+/**
+ * Creates a flow pattern template with unmasked represented port matching.
+ * This template is used to set up a table for default transfer flows
+ * directing packets to group 1.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow pattern template on success, NULL otherwise.
+ */
+static struct rte_flow_pattern_template *
+flow_hw_create_ctrl_port_pattern_template(struct rte_eth_dev *dev)
+{
+	struct rte_flow_pattern_template_attr attr = {
+		.relaxed_matching = 0,
+		.transfer = 1,
+	};
+	struct rte_flow_item_ethdev port_mask = {
+		.port_id = UINT16_MAX,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+			.mask = &port_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+
+	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
+}
+
+/**
+ * Creates a flow actions template with a fully masked JUMP action. Flows
+ * based on this template will jump to the group fixed at template creation
+ * time. This template is used to set up tables for control flows.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param group
+ *   Destination group for this action template.
+ *
+ * @return
+ *   Pointer to flow actions template on success, NULL otherwise.
+ */
+static struct rte_flow_actions_template *
+flow_hw_create_ctrl_jump_actions_template(struct rte_eth_dev *dev,
+					  uint32_t group)
+{
+	struct rte_flow_actions_template_attr attr = {
+		.transfer = 1,
+	};
+	struct rte_flow_action_jump jump_v = {
+		.group = group,
+	};
+	struct rte_flow_action_jump jump_m = {
+		.group = UINT32_MAX,
+	};
+	struct rte_flow_action actions_v[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump_v,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+	struct rte_flow_action actions_m[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump_m,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+
+	return flow_hw_actions_template_create(dev, &attr, actions_v, actions_m,
+					       NULL);
+}
+
+/**
+ * Creates a flow actions template with an unmasked REPRESENTED_PORT action.
+ * It is used to create control flow tables.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow action template on success, NULL otherwise.
+ */
+static struct rte_flow_actions_template *
+flow_hw_create_ctrl_port_actions_template(struct rte_eth_dev *dev)
+{
+	struct rte_flow_actions_template_attr attr = {
+		.transfer = 1,
+	};
+	struct rte_flow_action_ethdev port_v = {
+		.port_id = 0,
+	};
+	struct rte_flow_action actions_v[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT,
+			.conf = &port_v,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+	struct rte_flow_action_ethdev port_m = {
+		.port_id = 0,
+	};
+	struct rte_flow_action actions_m[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT,
+			.conf = &port_m,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+
+	return flow_hw_actions_template_create(dev, &attr, actions_v, actions_m,
+					       NULL);
+}
+
+/**
+ * Creates a control flow table in group 0 used to redirect traffic
+ * originating from the E-Switch Manager (i.e. from any local TX queue)
+ * to the SQ miss group.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param it
+ *   Pointer to flow pattern template.
+ * @param at
+ *   Pointer to flow actions template.
+ *
+ * @return
+ *   Pointer to flow table on success, NULL otherwise.
+ */
+static struct rte_flow_template_table*
+flow_hw_create_ctrl_sq_miss_root_table(struct rte_eth_dev *dev,
+				       struct rte_flow_pattern_template *it,
+				       struct rte_flow_actions_template *at)
+{
+	struct rte_flow_template_table_attr attr = {
+		.flow_attr = {
+			.group = 0,
+			.priority = 0,
+			.ingress = 0,
+			.egress = 0,
+			.transfer = 1,
+		},
+		.nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
+	};
+
+	return flow_hw_table_create(dev, &attr, &it, 1, &at, 1, NULL);
+}
+
+/**
+ * Creates a control flow table in the SQ miss group used to forward traffic,
+ * matched on the originating TX queue, to the destination represented port.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param it
+ *   Pointer to flow pattern template.
+ * @param at
+ *   Pointer to flow actions template.
+ *
+ * @return
+ *   Pointer to flow table on success, NULL otherwise.
+ */
+static struct rte_flow_template_table*
+flow_hw_create_ctrl_sq_miss_table(struct rte_eth_dev *dev,
+				  struct rte_flow_pattern_template *it,
+				  struct rte_flow_actions_template *at)
+{
+	struct rte_flow_template_table_attr attr = {
+		.flow_attr = {
+			.group = MLX5_HW_SQ_MISS_GROUP,
+			.priority = 0,
+			.ingress = 0,
+			.egress = 0,
+			.transfer = 1,
+		},
+		.nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
+	};
+
+	return flow_hw_table_create(dev, &attr, &it, 1, &at, 1, NULL);
+}
+
+/**
+ * Creates a control flow table used to transfer traffic
+ * from group 0 to group 1.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param it
+ *   Pointer to flow pattern template.
+ * @param at
+ *   Pointer to flow actions template.
+ *
+ * @return
+ *   Pointer to flow table on success, NULL otherwise.
+ */
+static struct rte_flow_template_table *
+flow_hw_create_ctrl_jump_table(struct rte_eth_dev *dev,
+			       struct rte_flow_pattern_template *it,
+			       struct rte_flow_actions_template *at)
+{
+	struct rte_flow_template_table_attr attr = {
+		.flow_attr = {
+			.group = 0,
+			.priority = 15, /* TODO: Flow priority discovery. */
+			.ingress = 0,
+			.egress = 0,
+			.transfer = 1,
+		},
+		.nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
+	};
+
+	return flow_hw_table_create(dev, &attr, &it, 1, &at, 1, NULL);
+}
+
+/**
+ * Creates a set of flow tables used for control flows when the E-Switch
+ * is engaged.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, negative value (-EINVAL) otherwise.
+ */
+static __rte_unused int
+flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_flow_pattern_template *esw_mgr_items_tmpl = NULL;
+	struct rte_flow_pattern_template *sq_items_tmpl = NULL;
+	struct rte_flow_pattern_template *port_items_tmpl = NULL;
+	struct rte_flow_actions_template *jump_sq_actions_tmpl = NULL;
+	struct rte_flow_actions_template *port_actions_tmpl = NULL;
+	struct rte_flow_actions_template *jump_one_actions_tmpl = NULL;
+
+	/* Item templates */
+	esw_mgr_items_tmpl = flow_hw_create_ctrl_esw_mgr_pattern_template(dev);
+	if (!esw_mgr_items_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create E-Switch Manager item"
+			" template for control flows", dev->data->port_id);
+		goto error;
+	}
+	sq_items_tmpl = flow_hw_create_ctrl_sq_pattern_template(dev);
+	if (!sq_items_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create SQ item template for"
+			" control flows", dev->data->port_id);
+		goto error;
+	}
+	port_items_tmpl = flow_hw_create_ctrl_port_pattern_template(dev);
+	if (!port_items_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create port item template for"
+			" control flows", dev->data->port_id);
+		goto error;
+	}
+	/* Action templates */
+	jump_sq_actions_tmpl = flow_hw_create_ctrl_jump_actions_template(dev,
+									 MLX5_HW_SQ_MISS_GROUP);
+	if (!jump_sq_actions_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create jump action template"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
+	port_actions_tmpl = flow_hw_create_ctrl_port_actions_template(dev);
+	if (!port_actions_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create port action template"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
+	jump_one_actions_tmpl = flow_hw_create_ctrl_jump_actions_template(dev, 1);
+	if (!jump_one_actions_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create jump action template"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
+	/* Tables */
+	MLX5_ASSERT(priv->hw_esw_sq_miss_root_tbl == NULL);
+	priv->hw_esw_sq_miss_root_tbl = flow_hw_create_ctrl_sq_miss_root_table
+			(dev, esw_mgr_items_tmpl, jump_sq_actions_tmpl);
+	if (!priv->hw_esw_sq_miss_root_tbl) {
+		DRV_LOG(ERR, "port %u failed to create table for default sq miss (root table)"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
+	MLX5_ASSERT(priv->hw_esw_sq_miss_tbl == NULL);
+	priv->hw_esw_sq_miss_tbl = flow_hw_create_ctrl_sq_miss_table(dev, sq_items_tmpl,
+								     port_actions_tmpl);
+	if (!priv->hw_esw_sq_miss_tbl) {
+		DRV_LOG(ERR, "port %u failed to create table for default sq miss (non-root table)"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
+	MLX5_ASSERT(priv->hw_esw_zero_tbl == NULL);
+	priv->hw_esw_zero_tbl = flow_hw_create_ctrl_jump_table(dev, port_items_tmpl,
+							       jump_one_actions_tmpl);
+	if (!priv->hw_esw_zero_tbl) {
+		DRV_LOG(ERR, "port %u failed to create table for default jump to group 1"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
+	return 0;
+error:
+	if (priv->hw_esw_zero_tbl) {
+		flow_hw_table_destroy(dev, priv->hw_esw_zero_tbl, NULL);
+		priv->hw_esw_zero_tbl = NULL;
+	}
+	if (priv->hw_esw_sq_miss_tbl) {
+		flow_hw_table_destroy(dev, priv->hw_esw_sq_miss_tbl, NULL);
+		priv->hw_esw_sq_miss_tbl = NULL;
+	}
+	if (priv->hw_esw_sq_miss_root_tbl) {
+		flow_hw_table_destroy(dev, priv->hw_esw_sq_miss_root_tbl, NULL);
+		priv->hw_esw_sq_miss_root_tbl = NULL;
+	}
+	if (jump_one_actions_tmpl)
+		flow_hw_actions_template_destroy(dev, jump_one_actions_tmpl, NULL);
+	if (port_actions_tmpl)
+		flow_hw_actions_template_destroy(dev, port_actions_tmpl, NULL);
+	if (jump_sq_actions_tmpl)
+		flow_hw_actions_template_destroy(dev, jump_sq_actions_tmpl, NULL);
+	if (port_items_tmpl)
+		flow_hw_pattern_template_destroy(dev, port_items_tmpl, NULL);
+	if (sq_items_tmpl)
+		flow_hw_pattern_template_destroy(dev, sq_items_tmpl, NULL);
+	if (esw_mgr_items_tmpl)
+		flow_hw_pattern_template_destroy(dev, esw_mgr_items_tmpl, NULL);
+	return -EINVAL;
+}
+
 /**
  * Configure port HWS resources.
  *
@@ -2643,7 +3462,6 @@ flow_hw_grp_clone_free_cb(void *tool_ctx, struct mlx5_list_entry *entry)
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
-
 static int
 flow_hw_configure(struct rte_eth_dev *dev,
 		  const struct rte_flow_port_attr *port_attr,
@@ -2666,6 +3484,14 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		.free = mlx5_free,
 		.type = "mlx5_hw_action_construct_data",
 	};
+	/*
+	 * An additional queue is appended to the queues requested by the
+	 * application. The last queue is reserved for PMD internal
+	 * (control flow) operations.
+	 */
+	uint16_t nb_q_updated;
+	struct rte_flow_queue_attr **_queue_attr = NULL;
+	struct rte_flow_queue_attr ctrl_queue_attr = {0};
+	bool is_proxy = !!(priv->sh->config.dv_esw_en && priv->master);
+	int ret;
 
 	if (!port_attr || !nb_queue || !queue_attr) {
 		rte_errno = EINVAL;
@@ -2674,7 +3500,7 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	/* In case re-configuring, release existing context at first. */
 	if (priv->dr_ctx) {
 		/* */
-		for (i = 0; i < nb_queue; i++) {
+		for (i = 0; i < priv->nb_queue; i++) {
 			hw_q = &priv->hw_q[i];
 			/* Make sure all queues are empty. */
 			if (hw_q->size != hw_q->job_idx) {
@@ -2684,26 +3510,42 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		}
 		flow_hw_resource_release(dev);
 	}
+	ctrl_queue_attr.size = queue_attr[0]->size;
+	nb_q_updated = nb_queue + 1;
+	_queue_attr = mlx5_malloc(MLX5_MEM_ZERO,
+				  nb_q_updated *
+				  sizeof(struct rte_flow_queue_attr *),
+				  64, SOCKET_ID_ANY);
+	if (!_queue_attr) {
+		rte_errno = ENOMEM;
+		goto err;
+	}
+
+	memcpy(_queue_attr, queue_attr,
+	       sizeof(void *) * nb_queue);
+	_queue_attr[nb_queue] = &ctrl_queue_attr;
 	priv->acts_ipool = mlx5_ipool_create(&cfg);
 	if (!priv->acts_ipool)
 		goto err;
 	/* Allocate the queue job descriptor LIFO. */
-	mem_size = sizeof(priv->hw_q[0]) * nb_queue;
-	for (i = 0; i < nb_queue; i++) {
+	mem_size = sizeof(priv->hw_q[0]) * nb_q_updated;
+	for (i = 0; i < nb_q_updated; i++) {
 		/*
 		 * Check if the queues' size are all the same as the
 		 * limitation from HWS layer.
 		 */
-		if (queue_attr[i]->size != queue_attr[0]->size) {
+		if (_queue_attr[i]->size != _queue_attr[0]->size) {
 			rte_errno = EINVAL;
 			goto err;
 		}
 		mem_size += (sizeof(struct mlx5_hw_q_job *) +
+			    sizeof(struct mlx5_hw_q_job) +
 			    sizeof(uint8_t) * MLX5_ENCAP_MAX_LEN +
 			    sizeof(struct mlx5_modification_cmd) *
 			    MLX5_MHDR_MAX_CMD +
-			    sizeof(struct mlx5_hw_q_job)) *
-			    queue_attr[0]->size;
+			    sizeof(struct rte_flow_item) *
+			    MLX5_HW_MAX_ITEMS) *
+			    _queue_attr[i]->size;
 	}
 	priv->hw_q = mlx5_malloc(MLX5_MEM_ZERO, mem_size,
 				 64, SOCKET_ID_ANY);
@@ -2711,58 +3553,82 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		rte_errno = ENOMEM;
 		goto err;
 	}
-	for (i = 0; i < nb_queue; i++) {
+	for (i = 0; i < nb_q_updated; i++) {
 		uint8_t *encap = NULL;
 		struct mlx5_modification_cmd *mhdr_cmd = NULL;
+		struct rte_flow_item *items = NULL;
 
-		priv->hw_q[i].job_idx = queue_attr[i]->size;
-		priv->hw_q[i].size = queue_attr[i]->size;
+		priv->hw_q[i].job_idx = _queue_attr[i]->size;
+		priv->hw_q[i].size = _queue_attr[i]->size;
 		if (i == 0)
 			priv->hw_q[i].job = (struct mlx5_hw_q_job **)
-					    &priv->hw_q[nb_queue];
+					    &priv->hw_q[nb_q_updated];
 		else
 			priv->hw_q[i].job = (struct mlx5_hw_q_job **)
-					    &job[queue_attr[i - 1]->size];
+				&job[_queue_attr[i - 1]->size - 1].items
+				 [MLX5_HW_MAX_ITEMS];
 		job = (struct mlx5_hw_q_job *)
-		      &priv->hw_q[i].job[queue_attr[i]->size];
-		mhdr_cmd = (struct mlx5_modification_cmd *)&job[queue_attr[i]->size];
-		encap = (uint8_t *)&mhdr_cmd[queue_attr[i]->size * MLX5_MHDR_MAX_CMD];
-		for (j = 0; j < queue_attr[i]->size; j++) {
+		      &priv->hw_q[i].job[_queue_attr[i]->size];
+		mhdr_cmd = (struct mlx5_modification_cmd *)
+			   &job[_queue_attr[i]->size];
+		encap = (uint8_t *)
+			 &mhdr_cmd[_queue_attr[i]->size * MLX5_MHDR_MAX_CMD];
+		items = (struct rte_flow_item *)
+			 &encap[_queue_attr[i]->size * MLX5_ENCAP_MAX_LEN];
+		for (j = 0; j < _queue_attr[i]->size; j++) {
 			job[j].mhdr_cmd = &mhdr_cmd[j * MLX5_MHDR_MAX_CMD];
 			job[j].encap_data = &encap[j * MLX5_ENCAP_MAX_LEN];
+			job[j].items = &items[j * MLX5_HW_MAX_ITEMS];
 			priv->hw_q[i].job[j] = &job[j];
 		}
 	}
 	dr_ctx_attr.pd = priv->sh->cdev->pd;
-	dr_ctx_attr.queues = nb_queue;
+	dr_ctx_attr.queues = nb_q_updated;
 	/* Queue size should all be the same. Take the first one. */
-	dr_ctx_attr.queue_size = queue_attr[0]->size;
+	dr_ctx_attr.queue_size = _queue_attr[0]->size;
 	dr_ctx = mlx5dr_context_open(priv->sh->cdev->ctx, &dr_ctx_attr);
 	/* rte_errno has been updated by HWS layer. */
 	if (!dr_ctx)
 		goto err;
 	priv->dr_ctx = dr_ctx;
-	priv->nb_queue = nb_queue;
+	priv->nb_queue = nb_q_updated;
+	rte_spinlock_init(&priv->hw_ctrl_lock);
+	LIST_INIT(&priv->hw_ctrl_flows);
 	/* Add global actions. */
 	for (i = 0; i < MLX5_HW_ACTION_FLAG_MAX; i++) {
-		for (j = 0; j < MLX5DR_TABLE_TYPE_MAX; j++) {
-			priv->hw_drop[i][j] = mlx5dr_action_create_dest_drop
-				(priv->dr_ctx, mlx5_hw_act_flag[i][j]);
-			if (!priv->hw_drop[i][j])
-				goto err;
-		}
+		uint32_t act_flags = 0;
+
+		act_flags = mlx5_hw_act_flag[i][0] | mlx5_hw_act_flag[i][1];
+		if (is_proxy)
+			act_flags |= mlx5_hw_act_flag[i][2];
+		priv->hw_drop[i] = mlx5dr_action_create_dest_drop(priv->dr_ctx, act_flags);
+		if (!priv->hw_drop[i])
+			goto err;
 		priv->hw_tag[i] = mlx5dr_action_create_tag
 			(priv->dr_ctx, mlx5_hw_act_flag[i][0]);
 		if (!priv->hw_tag[i])
 			goto err;
 	}
+	if (is_proxy) {
+		ret = flow_hw_create_vport_actions(priv);
+		if (ret) {
+			rte_errno = -ret;
+			goto err;
+		}
+		ret = flow_hw_create_ctrl_tables(dev);
+		if (ret) {
+			rte_errno = -ret;
+			goto err;
+		}
+	}
+	if (_queue_attr)
+		mlx5_free(_queue_attr);
 	return 0;
 err:
+	flow_hw_free_vport_actions(priv);
 	for (i = 0; i < MLX5_HW_ACTION_FLAG_MAX; i++) {
-		for (j = 0; j < MLX5DR_TABLE_TYPE_MAX; j++) {
-			if (priv->hw_drop[i][j])
-				mlx5dr_action_destroy(priv->hw_drop[i][j]);
-		}
+		if (priv->hw_drop[i])
+			mlx5dr_action_destroy(priv->hw_drop[i]);
 		if (priv->hw_tag[i])
 			mlx5dr_action_destroy(priv->hw_tag[i]);
 	}
@@ -2774,6 +3640,8 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		mlx5_ipool_destroy(priv->acts_ipool);
 		priv->acts_ipool = NULL;
 	}
+	if (_queue_attr)
+		mlx5_free(_queue_attr);
 	return rte_flow_error_set(error, rte_errno,
 				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
 				  "fail to configure port");
@@ -2792,10 +3660,12 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 	struct rte_flow_template_table *tbl;
 	struct rte_flow_pattern_template *it;
 	struct rte_flow_actions_template *at;
-	int i, j;
+	int i;
 
 	if (!priv->dr_ctx)
 		return;
+	flow_hw_rxq_flag_set(dev, false);
+	flow_hw_flush_all_ctrl_flows(dev);
 	while (!LIST_EMPTY(&priv->flow_hw_tbl)) {
 		tbl = LIST_FIRST(&priv->flow_hw_tbl);
 		flow_hw_table_destroy(dev, tbl, NULL);
@@ -2809,13 +3679,12 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 		flow_hw_actions_template_destroy(dev, at, NULL);
 	}
 	for (i = 0; i < MLX5_HW_ACTION_FLAG_MAX; i++) {
-		for (j = 0; j < MLX5DR_TABLE_TYPE_MAX; j++) {
-			if (priv->hw_drop[i][j])
-				mlx5dr_action_destroy(priv->hw_drop[i][j]);
-		}
+		if (priv->hw_drop[i])
+			mlx5dr_action_destroy(priv->hw_drop[i]);
 		if (priv->hw_tag[i])
 			mlx5dr_action_destroy(priv->hw_tag[i]);
 	}
+	flow_hw_free_vport_actions(priv);
 	if (priv->acts_ipool) {
 		mlx5_ipool_destroy(priv->acts_ipool);
 		priv->acts_ipool = NULL;
@@ -3058,4 +3927,397 @@ const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.action_query = flow_dv_action_query,
 };
 
+static uint32_t
+flow_hw_get_ctrl_queue(struct mlx5_priv *priv)
+{
+	MLX5_ASSERT(priv->nb_queue > 0);
+	return priv->nb_queue - 1;
+}
+
+/**
+ * Creates a control flow using flow template API on @p proxy_dev device,
+ * on behalf of @p owner_dev device.
+ *
+ * This function uses locks internally to synchronize access to the
+ * flow queue.
+ *
+ * Created flow is stored in private list associated with @p proxy_dev device.
+ *
+ * @param owner_dev
+ *   Pointer to Ethernet device on behalf of which flow is created.
+ * @param proxy_dev
+ *   Pointer to Ethernet device on which flow is created.
+ * @param table
+ *   Pointer to flow table.
+ * @param items
+ *   Pointer to flow rule items.
+ * @param item_template_idx
+ *   Index of an item template associated with @p table.
+ * @param actions
+ *   Pointer to flow rule actions.
+ * @param action_template_idx
+ *   Index of an action template associated with @p table.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno set.
+ */
+static __rte_unused int
+flow_hw_create_ctrl_flow(struct rte_eth_dev *owner_dev,
+			 struct rte_eth_dev *proxy_dev,
+			 struct rte_flow_template_table *table,
+			 struct rte_flow_item items[],
+			 uint8_t item_template_idx,
+			 struct rte_flow_action actions[],
+			 uint8_t action_template_idx)
+{
+	struct mlx5_priv *priv = proxy_dev->data->dev_private;
+	uint32_t queue = flow_hw_get_ctrl_queue(priv);
+	struct rte_flow_op_attr op_attr = {
+		.postpone = 0,
+	};
+	struct rte_flow *flow = NULL;
+	struct mlx5_hw_ctrl_flow *entry = NULL;
+	int ret;
+
+	rte_spinlock_lock(&priv->hw_ctrl_lock);
+	entry = mlx5_malloc(MLX5_MEM_ZERO | MLX5_MEM_SYS, sizeof(*entry),
+			    0, SOCKET_ID_ANY);
+	if (!entry) {
+		DRV_LOG(ERR, "port %u not enough memory to create control flows",
+			proxy_dev->data->port_id);
+		rte_errno = ENOMEM;
+		ret = -rte_errno;
+		goto error;
+	}
+	flow = flow_hw_async_flow_create(proxy_dev, queue, &op_attr, table,
+					 items, item_template_idx,
+					 actions, action_template_idx,
+					 NULL, NULL);
+	if (!flow) {
+		DRV_LOG(ERR, "port %u failed to enqueue create control"
+			" flow operation", proxy_dev->data->port_id);
+		ret = -rte_errno;
+		goto error;
+	}
+	ret = flow_hw_push(proxy_dev, queue, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "port %u failed to drain control flow queue",
+			proxy_dev->data->port_id);
+		goto error;
+	}
+	ret = __flow_hw_pull_comp(proxy_dev, queue, 1, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "port %u failed to insert control flow",
+			proxy_dev->data->port_id);
+		rte_errno = EINVAL;
+		ret = -rte_errno;
+		goto error;
+	}
+	entry->owner_dev = owner_dev;
+	entry->flow = flow;
+	LIST_INSERT_HEAD(&priv->hw_ctrl_flows, entry, next);
+	rte_spinlock_unlock(&priv->hw_ctrl_lock);
+	return 0;
+error:
+	if (entry)
+		mlx5_free(entry);
+	rte_spinlock_unlock(&priv->hw_ctrl_lock);
+	return ret;
+}
+
+/**
+ * Destroys a control flow @p flow using flow template API on @p dev device.
+ *
+ * This function uses locks internally to synchronize access to the
+ * flow queue.
+ *
+ * If the @p flow is stored on any private list/pool, then caller must free up
+ * the relevant resources.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param flow
+ *   Pointer to flow rule.
+ *
+ * @return
+ *   0 on success, non-zero value otherwise.
+ */
+static int
+flow_hw_destroy_ctrl_flow(struct rte_eth_dev *dev, struct rte_flow *flow)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint32_t queue = flow_hw_get_ctrl_queue(priv);
+	struct rte_flow_op_attr op_attr = {
+		.postpone = 0,
+	};
+	int ret;
+
+	rte_spinlock_lock(&priv->hw_ctrl_lock);
+	ret = flow_hw_async_flow_destroy(dev, queue, &op_attr, flow, NULL, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "port %u failed to enqueue destroy control"
+			" flow operation", dev->data->port_id);
+		goto exit;
+	}
+	ret = flow_hw_push(dev, queue, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "port %u failed to drain control flow queue",
+			dev->data->port_id);
+		goto exit;
+	}
+	ret = __flow_hw_pull_comp(dev, queue, 1, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "port %u failed to destroy control flow",
+			dev->data->port_id);
+		rte_errno = EINVAL;
+		ret = -rte_errno;
+		goto exit;
+	}
+exit:
+	rte_spinlock_unlock(&priv->hw_ctrl_lock);
+	return ret;
+}
+
+/**
+ * Destroys control flows created on behalf of @p owner_dev device.
+ *
+ * @param owner_dev
+ *   Pointer to Ethernet device owning control flows.
+ *
+ * @return
+ *   0 on success, otherwise negative error code is returned and
+ *   rte_errno is set.
+ */
+int
+mlx5_flow_hw_flush_ctrl_flows(struct rte_eth_dev *owner_dev)
+{
+	struct mlx5_priv *owner_priv = owner_dev->data->dev_private;
+	struct rte_eth_dev *proxy_dev;
+	struct mlx5_priv *proxy_priv;
+	struct mlx5_hw_ctrl_flow *cf;
+	struct mlx5_hw_ctrl_flow *cf_next;
+	uint16_t owner_port_id = owner_dev->data->port_id;
+	uint16_t proxy_port_id = owner_dev->data->port_id;
+	int ret;
+
+	if (owner_priv->sh->config.dv_esw_en) {
+		if (rte_flow_pick_transfer_proxy(owner_port_id, &proxy_port_id, NULL)) {
+			DRV_LOG(ERR, "Unable to find proxy port for port %u",
+				owner_port_id);
+			rte_errno = EINVAL;
+			return -rte_errno;
+		}
+		proxy_dev = &rte_eth_devices[proxy_port_id];
+		proxy_priv = proxy_dev->data->dev_private;
+	} else {
+		proxy_dev = owner_dev;
+		proxy_priv = owner_priv;
+	}
+	cf = LIST_FIRST(&proxy_priv->hw_ctrl_flows);
+	while (cf != NULL) {
+		cf_next = LIST_NEXT(cf, next);
+		if (cf->owner_dev == owner_dev) {
+			ret = flow_hw_destroy_ctrl_flow(proxy_dev, cf->flow);
+			if (ret) {
+				rte_errno = -ret;
+				return ret;
+			}
+			LIST_REMOVE(cf, next);
+			mlx5_free(cf);
+		}
+		cf = cf_next;
+	}
+	return 0;
+}
+
+/**
+ * Destroys all control flows created on @p dev device.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, otherwise negative error code is returned and
+ *   rte_errno is set.
+ */
+static int
+flow_hw_flush_all_ctrl_flows(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hw_ctrl_flow *cf;
+	struct mlx5_hw_ctrl_flow *cf_next;
+	int ret;
+
+	cf = LIST_FIRST(&priv->hw_ctrl_flows);
+	while (cf != NULL) {
+		cf_next = LIST_NEXT(cf, next);
+		ret = flow_hw_destroy_ctrl_flow(dev, cf->flow);
+		if (ret) {
+			rte_errno = -ret;
+			return ret;
+		}
+		LIST_REMOVE(cf, next);
+		mlx5_free(cf);
+		cf = cf_next;
+	}
+	return 0;
+}
+
+int
+mlx5_flow_hw_esw_create_mgr_sq_miss_flow(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_flow_item_ethdev port_spec = {
+		.port_id = MLX5_REPRESENTED_PORT_ESW_MGR,
+	};
+	struct rte_flow_item_ethdev port_mask = {
+		.port_id = MLX5_REPRESENTED_PORT_ESW_MGR,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+			.spec = &port_spec,
+			.mask = &port_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	struct rte_flow_action_jump jump = {
+		.group = MLX5_HW_SQ_MISS_GROUP,
+	};
+	struct rte_flow_action actions[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		},
+	};
+
+	MLX5_ASSERT(priv->master);
+	if (!priv->dr_ctx ||
+	    !priv->hw_esw_sq_miss_root_tbl)
+		return 0;
+	return flow_hw_create_ctrl_flow(dev, dev,
+					priv->hw_esw_sq_miss_root_tbl,
+					items, 0, actions, 0);
+}
+
+int
+mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
+{
+	uint16_t port_id = dev->data->port_id;
+	struct mlx5_rte_flow_item_tx_queue queue_spec = {
+		.queue = txq,
+	};
+	struct mlx5_rte_flow_item_tx_queue queue_mask = {
+		.queue = UINT32_MAX,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = (enum rte_flow_item_type)
+				MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
+			.spec = &queue_spec,
+			.mask = &queue_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	struct rte_flow_action_ethdev port = {
+		.port_id = port_id,
+	};
+	struct rte_flow_action actions[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT,
+			.conf = &port,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		},
+	};
+	struct rte_eth_dev *proxy_dev;
+	struct mlx5_priv *proxy_priv;
+	uint16_t proxy_port_id = dev->data->port_id;
+	int ret;
+
+	ret = rte_flow_pick_transfer_proxy(port_id, &proxy_port_id, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "Unable to pick proxy port for port %u", port_id);
+		return ret;
+	}
+	proxy_dev = &rte_eth_devices[proxy_port_id];
+	proxy_priv = proxy_dev->data->dev_private;
+	if (!proxy_priv->dr_ctx)
+		return 0;
+	if (!proxy_priv->hw_esw_sq_miss_root_tbl ||
+	    !proxy_priv->hw_esw_sq_miss_tbl) {
+		DRV_LOG(ERR, "port %u proxy port %u was configured but default"
+			" flow tables are not created",
+			port_id, proxy_port_id);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	return flow_hw_create_ctrl_flow(dev, proxy_dev,
+					proxy_priv->hw_esw_sq_miss_tbl,
+					items, 0, actions, 0);
+}
+
+int
+mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev)
+{
+	uint16_t port_id = dev->data->port_id;
+	struct rte_flow_item_ethdev port_spec = {
+		.port_id = port_id,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+			.spec = &port_spec,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	struct rte_flow_action_jump jump = {
+		.group = 1,
+	};
+	struct rte_flow_action actions[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+	struct rte_eth_dev *proxy_dev;
+	struct mlx5_priv *proxy_priv;
+	uint16_t proxy_port_id = dev->data->port_id;
+	int ret;
+
+	ret = rte_flow_pick_transfer_proxy(port_id, &proxy_port_id, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "Unable to pick proxy port for port %u", port_id);
+		return ret;
+	}
+	proxy_dev = &rte_eth_devices[proxy_port_id];
+	proxy_priv = proxy_dev->data->dev_private;
+	if (!proxy_priv->dr_ctx)
+		return 0;
+	if (!proxy_priv->hw_esw_zero_tbl) {
+		DRV_LOG(ERR, "port %u proxy port %u was configured but default"
+			" flow tables are not created",
+			port_id, proxy_port_id);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	return flow_hw_create_ctrl_flow(dev, proxy_dev,
+					proxy_priv->hw_esw_zero_tbl,
+					items, 0, actions, 0);
+}
+
 #endif
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index fd902078f8..7ffaf4c227 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -1245,12 +1245,14 @@ flow_verbs_validate(struct rte_eth_dev *dev,
 	uint16_t ether_type = 0;
 	bool is_empty_vlan = false;
 	uint16_t udp_dport = 0;
+	bool is_root;
 
 	if (items == NULL)
 		return -1;
 	ret = mlx5_flow_validate_attributes(dev, attr, error);
 	if (ret < 0)
 		return ret;
+	is_root = ret;
 	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
 		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
 		int ret = 0;
@@ -1380,7 +1382,7 @@ flow_verbs_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ITEM_TYPE_VXLAN:
 			ret = mlx5_flow_validate_item_vxlan(dev, udp_dport,
 							    items, item_flags,
-							    attr, error);
+							    is_root, error);
 			if (ret < 0)
 				return ret;
 			last_item = MLX5_FLOW_LAYER_VXLAN;
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index c68b32cf14..6313602a66 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -1280,6 +1280,52 @@ mlx5_dev_stop(struct rte_eth_dev *dev)
 	return 0;
 }
 
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+
+static int
+mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	unsigned int i;
+	int ret;
+
+	if (priv->sh->config.dv_esw_en && priv->master) {
+		if (mlx5_flow_hw_esw_create_mgr_sq_miss_flow(dev))
+			goto error;
+	}
+	for (i = 0; i < priv->txqs_n; ++i) {
+		struct mlx5_txq_ctrl *txq = mlx5_txq_get(dev, i);
+		uint32_t queue;
+
+		if (!txq)
+			continue;
+		if (txq->is_hairpin)
+			queue = txq->obj->sq->id;
+		else
+			queue = txq->obj->sq_obj.sq->id;
+		if ((priv->representor || priv->master) &&
+		    priv->sh->config.dv_esw_en) {
+			if (mlx5_flow_hw_esw_create_sq_miss_flow(dev, queue)) {
+				mlx5_txq_release(dev, i);
+				goto error;
+			}
+		}
+		mlx5_txq_release(dev, i);
+	}
+	if ((priv->master || priv->representor) && priv->sh->config.dv_esw_en) {
+		if (mlx5_flow_hw_esw_create_default_jump_flow(dev))
+			goto error;
+	}
+	return 0;
+error:
+	ret = rte_errno;
+	mlx5_flow_hw_flush_ctrl_flows(dev);
+	rte_errno = ret;
+	return -rte_errno;
+}
+
+#endif
+
 /**
  * Enable traffic flows configured by control plane
  *
@@ -1316,6 +1362,10 @@ mlx5_traffic_enable(struct rte_eth_dev *dev)
 	unsigned int j;
 	int ret;
 
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+	if (priv->sh->config.dv_flow_en == 2)
+		return mlx5_traffic_enable_hws(dev);
+#endif
 	/*
 	 * Hairpin txq default flow should be created no matter if it is
 	 * isolation mode. Or else all the packets to be sent will be sent
@@ -1346,13 +1396,17 @@ mlx5_traffic_enable(struct rte_eth_dev *dev)
 		}
 		mlx5_txq_release(dev, i);
 	}
-	if (priv->sh->config.dv_esw_en) {
-		if (mlx5_flow_create_esw_table_zero_flow(dev))
-			priv->fdb_def_rule = 1;
-		else
-			DRV_LOG(INFO, "port %u FDB default rule cannot be"
-				" configured - only Eswitch group 0 flows are"
-				" supported.", dev->data->port_id);
+	if (priv->sh->config.fdb_def_rule) {
+		if (priv->sh->config.dv_esw_en) {
+			if (mlx5_flow_create_esw_table_zero_flow(dev))
+				priv->fdb_def_rule = 1;
+			else
+				DRV_LOG(INFO, "port %u FDB default rule cannot be configured - only Eswitch group 0 flows are supported.",
+					dev->data->port_id);
+		}
+	} else {
+		DRV_LOG(INFO, "port %u FDB default rule is disabled",
+			dev->data->port_id);
 	}
 	if (!priv->sh->config.lacp_by_user && priv->pf_bond >= 0) {
 		ret = mlx5_flow_lacp_miss(dev);
@@ -1470,7 +1524,14 @@ mlx5_traffic_enable(struct rte_eth_dev *dev)
 void
 mlx5_traffic_disable(struct rte_eth_dev *dev)
 {
-	mlx5_flow_list_flush(dev, MLX5_FLOW_TYPE_CTL, false);
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (priv->sh->config.dv_flow_en == 2)
+		mlx5_flow_hw_flush_ctrl_flows(dev);
+	else
+#endif
+		mlx5_flow_list_flush(dev, MLX5_FLOW_TYPE_CTL, false);
 }
 
 /**
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v3 06/17] net/mlx5: add extended metadata mode for hardware steering
  2022-09-30 12:52 ` [PATCH v3 00/17] net/mlx5: HW steering PMD update Suanming Mou
                     ` (4 preceding siblings ...)
  2022-09-30 12:53   ` [PATCH v3 05/17] net/mlx5: add HW steering port action Suanming Mou
@ 2022-09-30 12:53   ` Suanming Mou
  2022-09-30 12:53   ` [PATCH v3 07/17] net/mlx5: add HW steering meter action Suanming Mou
                     ` (10 subsequent siblings)
  16 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-30 12:53 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Bing Zhao

From: Bing Zhao <bingz@nvidia.com>

A new mode 4 of the devarg "dv_xmeta_en" is added for HWS only. In this
mode, copying the 32-bit wide Rx / Tx metadata between the FDB and NIC
domains is supported. The mark is supported only in the NIC domain and
is not copied.
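
For illustration only, a minimal application-side sketch of matching the
full 32-bit metadata that this mode carries between the FDB and NIC
domains. The function name, port id, queue index and the synchronous
rte_flow_create() call are assumptions made for brevity; it presumes the
device is probed with dv_flow_en=2,dv_xmeta_en=4, and with HWS rules
would normally be inserted through the template API.

#include <stdint.h>
#include <rte_flow.h>

/* Illustrative only: steer packets carrying a given 32-bit META value
 * to Rx queue 0. The full UINT32_MAX mask exercises the 32b width
 * available in this metadata mode.
 */
static struct rte_flow *
meta32_match_example(uint16_t port_id, uint32_t meta_value)
{
	struct rte_flow_attr attr = { .ingress = 1 };
	struct rte_flow_item_meta meta_spec = { .data = meta_value };
	struct rte_flow_item_meta meta_mask = { .data = UINT32_MAX };
	struct rte_flow_item pattern[] = {
		{
			.type = RTE_FLOW_ITEM_TYPE_META,
			.spec = &meta_spec,
			.mask = &meta_mask,
		},
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};
	struct rte_flow_action_queue queue = { .index = 0 };
	struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	struct rte_flow_error error;

	return rte_flow_create(port_id, &attr, pattern, actions, &error);
}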

Signed-off-by: Bing Zhao <bingz@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_os.c |  10 +-
 drivers/net/mlx5/mlx5.c          |   7 +-
 drivers/net/mlx5/mlx5.h          |   8 +-
 drivers/net/mlx5/mlx5_flow.c     |   8 +-
 drivers/net/mlx5/mlx5_flow.h     |  14 +
 drivers/net/mlx5/mlx5_flow_dv.c  |  43 +-
 drivers/net/mlx5/mlx5_flow_hw.c  | 864 ++++++++++++++++++++++++++++---
 drivers/net/mlx5/mlx5_trigger.c  |   3 +
 8 files changed, 872 insertions(+), 85 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index e0586a4d6f..061b825e7b 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1554,6 +1554,15 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 #ifdef HAVE_IBV_FLOW_DV_SUPPORT
 		if (priv->vport_meta_mask)
 			flow_hw_set_port_info(eth_dev);
+		if (priv->sh->config.dv_esw_en &&
+		    priv->sh->config.dv_xmeta_en != MLX5_XMETA_MODE_LEGACY &&
+		    priv->sh->config.dv_xmeta_en != MLX5_XMETA_MODE_META32_HWS) {
+			DRV_LOG(ERR,
+				"metadata mode %u is not supported in HWS eswitch mode",
+				priv->sh->config.dv_xmeta_en);
+			err = ENOTSUP;
+			goto error;
+		}
 		/* Only HWS requires this information. */
 		flow_hw_init_tags_set(eth_dev);
 		if (priv->sh->config.dv_esw_en &&
@@ -1569,7 +1578,6 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 		goto error;
 #endif
 	}
-	/* Port representor shares the same max priority with pf port. */
 	if (!priv->sh->flow_priority_check_flag) {
 		/* Supported Verbs flow priority number detection. */
 		err = mlx5_flow_discover_priorities(eth_dev);
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 74adb677f4..cf5146d677 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1218,7 +1218,8 @@ mlx5_dev_args_check_handler(const char *key, const char *val, void *opaque)
 		if (tmp != MLX5_XMETA_MODE_LEGACY &&
 		    tmp != MLX5_XMETA_MODE_META16 &&
 		    tmp != MLX5_XMETA_MODE_META32 &&
-		    tmp != MLX5_XMETA_MODE_MISS_INFO) {
+		    tmp != MLX5_XMETA_MODE_MISS_INFO &&
+		    tmp != MLX5_XMETA_MODE_META32_HWS) {
 			DRV_LOG(ERR, "Invalid extensive metadata parameter.");
 			rte_errno = EINVAL;
 			return -rte_errno;
@@ -2849,6 +2850,10 @@ mlx5_set_metadata_mask(struct rte_eth_dev *dev)
 		meta = UINT32_MAX;
 		mark = (reg_c0 >> rte_bsf32(reg_c0)) & MLX5_FLOW_MARK_MASK;
 		break;
+	case MLX5_XMETA_MODE_META32_HWS:
+		meta = UINT32_MAX;
+		mark = MLX5_FLOW_MARK_MASK;
+		break;
 	default:
 		meta = 0;
 		mark = 0;
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 0bf21c1efe..fc4bc4e6a3 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -298,8 +298,8 @@ struct mlx5_sh_config {
 	uint32_t reclaim_mode:2; /* Memory reclaim mode. */
 	uint32_t dv_esw_en:1; /* Enable E-Switch DV flow. */
 	/* Enable DV flow. 1 means SW steering, 2 means HW steering. */
-	unsigned int dv_flow_en:2;
-	uint32_t dv_xmeta_en:2; /* Enable extensive flow metadata. */
+	uint32_t dv_flow_en:2; /* Enable DV flow. */
+	uint32_t dv_xmeta_en:3; /* Enable extensive flow metadata. */
 	uint32_t dv_miss_info:1; /* Restore packet after partial hw miss. */
 	uint32_t l3_vxlan_en:1; /* Enable L3 VXLAN flow creation. */
 	uint32_t vf_nl_en:1; /* Enable Netlink requests in VF mode. */
@@ -312,7 +312,6 @@ struct mlx5_sh_config {
 	uint32_t fdb_def_rule:1; /* Create FDB default jump rule */
 };
 
-
 /* Structure for VF VLAN workaround. */
 struct mlx5_vf_vlan {
 	uint32_t tag:12;
@@ -1279,12 +1278,12 @@ struct mlx5_dev_ctx_shared {
 	struct mlx5_lb_ctx self_lb; /* QP to enable self loopback for Devx. */
 	unsigned int flow_max_priority;
 	enum modify_reg flow_mreg_c[MLX5_MREG_C_NUM];
+	/* Availability of mreg_c's. */
 	void *devx_channel_lwm;
 	struct rte_intr_handle *intr_handle_lwm;
 	pthread_mutex_t lwm_config_lock;
 	uint32_t host_shaper_rate:8;
 	uint32_t lwm_triggered:1;
-	/* Availability of mreg_c's. */
 	struct mlx5_dev_shared_port port[]; /* per device port data array. */
 };
 
@@ -1508,6 +1507,7 @@ struct mlx5_priv {
 	struct rte_flow_template_table *hw_esw_sq_miss_root_tbl;
 	struct rte_flow_template_table *hw_esw_sq_miss_tbl;
 	struct rte_flow_template_table *hw_esw_zero_tbl;
+	struct rte_flow_template_table *hw_tx_meta_cpy_tbl;
 	struct mlx5_indexed_pool *flows[MLX5_FLOW_TYPE_MAXI];
 	/* RTE Flow rules. */
 	uint32_t ctrl_flows; /* Control flow rules. */
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 9c44b2e99b..b570ed7f69 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -1107,6 +1107,8 @@ mlx5_flow_get_reg_id(struct rte_eth_dev *dev,
 			return REG_C_0;
 		case MLX5_XMETA_MODE_META32:
 			return REG_C_1;
+		case MLX5_XMETA_MODE_META32_HWS:
+			return REG_C_1;
 		}
 		break;
 	case MLX5_METADATA_TX:
@@ -1119,11 +1121,14 @@ mlx5_flow_get_reg_id(struct rte_eth_dev *dev,
 			return REG_C_0;
 		case MLX5_XMETA_MODE_META32:
 			return REG_C_1;
+		case MLX5_XMETA_MODE_META32_HWS:
+			return REG_C_1;
 		}
 		break;
 	case MLX5_FLOW_MARK:
 		switch (config->dv_xmeta_en) {
 		case MLX5_XMETA_MODE_LEGACY:
+		case MLX5_XMETA_MODE_META32_HWS:
 			return REG_NON;
 		case MLX5_XMETA_MODE_META16:
 			return REG_C_1;
@@ -4442,7 +4447,8 @@ static bool flow_check_modify_action_type(struct rte_eth_dev *dev,
 		return true;
 	case RTE_FLOW_ACTION_TYPE_FLAG:
 	case RTE_FLOW_ACTION_TYPE_MARK:
-		if (priv->sh->config.dv_xmeta_en != MLX5_XMETA_MODE_LEGACY)
+		if (priv->sh->config.dv_xmeta_en != MLX5_XMETA_MODE_LEGACY &&
+		    priv->sh->config.dv_xmeta_en != MLX5_XMETA_MODE_META32_HWS)
 			return true;
 		else
 			return false;
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 93f0e189d4..a8b27ea494 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -48,6 +48,12 @@ enum mlx5_rte_flow_action_type {
 	MLX5_RTE_FLOW_ACTION_TYPE_RSS,
 };
 
+/* Private (internal) Field IDs for MODIFY_FIELD action. */
+enum mlx5_rte_flow_field_id {
+	MLX5_RTE_FLOW_FIELD_END = INT_MIN,
+	MLX5_RTE_FLOW_FIELD_META_REG,
+};
+
 #define MLX5_INDIRECT_ACTION_TYPE_OFFSET 30
 
 enum {
@@ -1167,6 +1173,7 @@ struct rte_flow_actions_template {
 	struct rte_flow_action *masks; /* Cached action masks.*/
 	uint16_t mhdr_off; /* Offset of DR modify header action. */
 	uint32_t refcnt; /* Reference counter. */
+	uint16_t rx_cpy_pos; /* Action position of Rx metadata to be copied. */
 };
 
 /* Jump action struct. */
@@ -1243,6 +1250,11 @@ struct mlx5_flow_group {
 #define MLX5_HW_TBL_MAX_ITEM_TEMPLATE 2
 #define MLX5_HW_TBL_MAX_ACTION_TEMPLATE 32
 
+struct mlx5_flow_template_table_cfg {
+	struct rte_flow_template_table_attr attr; /* Table attributes passed through flow API. */
+	bool external; /* True if created by flow API, false if table is internal to PMD. */
+};
+
 struct rte_flow_template_table {
 	LIST_ENTRY(rte_flow_template_table) next;
 	struct mlx5_flow_group *grp; /* The group rte_flow_template_table uses. */
@@ -1252,6 +1264,7 @@ struct rte_flow_template_table {
 	/* Action templates bind to the table. */
 	struct mlx5_hw_action_template ats[MLX5_HW_TBL_MAX_ACTION_TEMPLATE];
 	struct mlx5_indexed_pool *flow; /* The table's flow ipool. */
+	struct mlx5_flow_template_table_cfg cfg;
 	uint32_t type; /* Flow table type RX/TX/FDB. */
 	uint8_t nb_item_templates; /* Item template number. */
 	uint8_t nb_action_templates; /* Action template number. */
@@ -2333,4 +2346,5 @@ int mlx5_flow_hw_esw_create_mgr_sq_miss_flow(struct rte_eth_dev *dev);
 int mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev,
 					 uint32_t txq);
 int mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev);
+int mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev);
 #endif /* RTE_PMD_MLX5_FLOW_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 3fc2453045..5b72cfaa61 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -1783,7 +1783,8 @@ mlx5_flow_field_id_to_modify_info
 			int reg;
 
 			if (priv->sh->config.dv_flow_en == 2)
-				reg = REG_C_1;
+				reg = flow_hw_get_reg_id(RTE_FLOW_ITEM_TYPE_TAG,
+							 data->level);
 			else
 				reg = mlx5_flow_get_reg_id(dev, MLX5_APP_TAG,
 							   data->level, error);
@@ -1862,6 +1863,24 @@ mlx5_flow_field_id_to_modify_info
 		else
 			info[idx].offset = off_be;
 		break;
+	case MLX5_RTE_FLOW_FIELD_META_REG:
+		{
+			uint32_t meta_mask = priv->sh->dv_meta_mask;
+			uint32_t meta_count = __builtin_popcount(meta_mask);
+			uint32_t reg = data->level;
+
+			RTE_SET_USED(meta_count);
+			MLX5_ASSERT(data->offset + width <= meta_count);
+			MLX5_ASSERT(reg != REG_NON);
+			MLX5_ASSERT(reg < RTE_DIM(reg_to_field));
+			info[idx] = (struct field_modify_info){4, 0, reg_to_field[reg]};
+			if (mask)
+				mask[idx] = flow_modify_info_mask_32_masked
+					(width, data->offset, meta_mask);
+			else
+				info[idx].offset = data->offset;
+		}
+		break;
 	case RTE_FLOW_FIELD_POINTER:
 	case RTE_FLOW_FIELD_VALUE:
 	default:
@@ -9819,7 +9838,19 @@ flow_dv_translate_item_meta(struct rte_eth_dev *dev,
 	mask = meta_m->data;
 	if (key_type == MLX5_SET_MATCHER_HS_M)
 		mask = value;
-	reg = flow_dv_get_metadata_reg(dev, attr, NULL);
+	/*
+	 * In the current implementation, REG_B cannot be used to match.
+	 * Force the use of REG_C_1 in the HWS root table, as in other tables.
+	 * This mapping may change.
+	 * NIC: modify - REG_B to be present in SW
+	 *      match - REG_C_1 when copied from FDB, different from SWS
+	 * FDB: modify - REG_C_1 in Xmeta mode, REG_NON in legacy mode
+	 *      match - REG_C_1 in FDB
+	 */
+	if (!!(key_type & MLX5_SET_MATCHER_SW))
+		reg = flow_dv_get_metadata_reg(dev, attr, NULL);
+	else
+		reg = flow_hw_get_reg_id(RTE_FLOW_ITEM_TYPE_META, 0);
 	if (reg < 0)
 		return;
 	MLX5_ASSERT(reg != REG_NON);
@@ -9919,7 +9950,10 @@ flow_dv_translate_item_tag(struct rte_eth_dev *dev, void *key,
 	/* When set mask, the index should be from spec. */
 	index = tag_vv ? tag_vv->index : tag_v->index;
 	/* Get the metadata register index for the tag. */
-	reg = mlx5_flow_get_reg_id(dev, MLX5_APP_TAG, index, NULL);
+	if (!!(key_type & MLX5_SET_MATCHER_SW))
+		reg = mlx5_flow_get_reg_id(dev, MLX5_APP_TAG, index, NULL);
+	else
+		reg = flow_hw_get_reg_id(RTE_FLOW_ITEM_TYPE_TAG, index);
 	MLX5_ASSERT(reg > 0);
 	flow_dv_match_meta_reg(key, reg, tag_v->data, tag_m->data);
 }
@@ -13437,7 +13471,8 @@ flow_dv_translate_items_sws(struct rte_eth_dev *dev,
 	 */
 	if (!(wks.item_flags & MLX5_FLOW_ITEM_PORT_ID) &&
 	    !(wks.item_flags & MLX5_FLOW_ITEM_REPRESENTED_PORT) && priv->sh->esw_mode &&
-	    !(attr->egress && !attr->transfer)) {
+	    !(attr->egress && !attr->transfer) &&
+	    attr->group != MLX5_FLOW_MREG_CP_TABLE_GROUP) {
 		if (flow_dv_translate_item_port_id_all(dev, match_mask,
 						   match_value, NULL, attr))
 			return -rte_errno;
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index b3b37f36a2..64d06d4fb4 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -20,13 +20,27 @@
 /* Default queue to flush the flows. */
 #define MLX5_DEFAULT_FLUSH_QUEUE 0
 
-/* Maximum number of rules in control flow tables */
+/* Maximum number of rules in control flow tables. */
 #define MLX5_HW_CTRL_FLOW_NB_RULES (4096)
 
-/* Flow group for SQ miss default flows/ */
-#define MLX5_HW_SQ_MISS_GROUP (UINT32_MAX)
+/* Lowest flow group usable by an application. */
+#define MLX5_HW_LOWEST_USABLE_GROUP (1)
+
+/* Maximum group index usable by user applications for transfer flows. */
+#define MLX5_HW_MAX_TRANSFER_GROUP (UINT32_MAX - 1)
+
+/* Lowest priority for HW root table. */
+#define MLX5_HW_LOWEST_PRIO_ROOT 15
+
+/* Lowest priority for HW non-root table. */
+#define MLX5_HW_LOWEST_PRIO_NON_ROOT (UINT32_MAX)
 
 static int flow_hw_flush_all_ctrl_flows(struct rte_eth_dev *dev);
+static int flow_hw_translate_group(struct rte_eth_dev *dev,
+				   const struct mlx5_flow_template_table_cfg *cfg,
+				   uint32_t group,
+				   uint32_t *table_group,
+				   struct rte_flow_error *error);
 
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops;
 
@@ -213,12 +227,12 @@ flow_hw_rss_item_flags_get(const struct rte_flow_item items[])
  */
 static struct mlx5_hw_jump_action *
 flow_hw_jump_action_register(struct rte_eth_dev *dev,
-			     const struct rte_flow_attr *attr,
+			     const struct mlx5_flow_template_table_cfg *cfg,
 			     uint32_t dest_group,
 			     struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct rte_flow_attr jattr = *attr;
+	struct rte_flow_attr jattr = cfg->attr.flow_attr;
 	struct mlx5_flow_group *grp;
 	struct mlx5_flow_cb_ctx ctx = {
 		.dev = dev,
@@ -226,9 +240,13 @@ flow_hw_jump_action_register(struct rte_eth_dev *dev,
 		.data = &jattr,
 	};
 	struct mlx5_list_entry *ge;
+	uint32_t target_group;
 
-	jattr.group = dest_group;
-	ge = mlx5_hlist_register(priv->sh->flow_tbls, dest_group, &ctx);
+	target_group = dest_group;
+	if (flow_hw_translate_group(dev, cfg, dest_group, &target_group, error))
+		return NULL;
+	jattr.group = target_group;
+	ge = mlx5_hlist_register(priv->sh->flow_tbls, target_group, &ctx);
 	if (!ge)
 		return NULL;
 	grp = container_of(ge, struct mlx5_flow_group, entry);
@@ -760,7 +778,8 @@ flow_hw_modify_field_compile(struct rte_eth_dev *dev,
 				(void *)(uintptr_t)conf->src.pvalue :
 				(void *)(uintptr_t)&conf->src.value;
 		if (conf->dst.field == RTE_FLOW_FIELD_META ||
-		    conf->dst.field == RTE_FLOW_FIELD_TAG) {
+		    conf->dst.field == RTE_FLOW_FIELD_TAG ||
+		    conf->dst.field == (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG) {
 			value = *(const unaligned_uint32_t *)item.spec;
 			value = rte_cpu_to_be_32(value);
 			item.spec = &value;
@@ -860,6 +879,9 @@ flow_hw_represented_port_compile(struct rte_eth_dev *dev,
 	if (m && !!m->port_id) {
 		struct mlx5_priv *port_priv;
 
+		if (!v)
+			return rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ACTION,
+						  action, "port index was not provided");
 		port_priv = mlx5_port_to_eswitch_info(v->port_id, false);
 		if (port_priv == NULL)
 			return rte_flow_error_set
@@ -903,8 +925,8 @@ flow_hw_represented_port_compile(struct rte_eth_dev *dev,
  *
  * @param[in] dev
  *   Pointer to the rte_eth_dev structure.
- * @param[in] table_attr
- *   Pointer to the table attributes.
+ * @param[in] cfg
+ *   Pointer to the table configuration.
  * @param[in] item_templates
  *   Item template array to be binded to the table.
  * @param[in/out] acts
@@ -919,12 +941,13 @@ flow_hw_represented_port_compile(struct rte_eth_dev *dev,
  */
 static int
 flow_hw_actions_translate(struct rte_eth_dev *dev,
-			  const struct rte_flow_template_table_attr *table_attr,
+			  const struct mlx5_flow_template_table_cfg *cfg,
 			  struct mlx5_hw_actions *acts,
 			  struct rte_flow_actions_template *at,
 			  struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_template_table_attr *table_attr = &cfg->attr;
 	const struct rte_flow_attr *attr = &table_attr->flow_attr;
 	struct rte_flow_action *actions = at->actions;
 	struct rte_flow_action *action_start = actions;
@@ -991,7 +1014,7 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 					((const struct rte_flow_action_jump *)
 					actions->conf)->group;
 				acts->jump = flow_hw_jump_action_register
-						(dev, attr, jump_group, error);
+						(dev, cfg, jump_group, error);
 				if (!acts->jump)
 					goto err;
 				acts->rule_acts[i].action = (!!attr->group) ?
@@ -1101,6 +1124,16 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 							   error);
 			if (err)
 				goto err;
+			/*
+			 * Adjust the action source position for the following case:
+			 * ... / MODIFY_FIELD: rx_cpy_pos / (QUEUE|RSS) / ...
+			 * The next action will be Q/RSS; there will be no further
+			 * adjustment, and the real source position of the
+			 * following actions is decreased by 1.
+			 * The total number of actions in the template is unchanged.
+			 */
+			if ((actions - action_start) == at->rx_cpy_pos)
+				action_start += 1;
 			break;
 		case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
 			if (flow_hw_represented_port_compile
@@ -1365,7 +1398,8 @@ flow_hw_modify_field_construct(struct mlx5_hw_q_job *job,
 	else
 		rte_memcpy(values, mhdr_action->src.pvalue, sizeof(values));
 	if (mhdr_action->dst.field == RTE_FLOW_FIELD_META ||
-	    mhdr_action->dst.field == RTE_FLOW_FIELD_TAG) {
+	    mhdr_action->dst.field == RTE_FLOW_FIELD_TAG ||
+	    mhdr_action->dst.field == (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG) {
 		value_p = (unaligned_uint32_t *)values;
 		*value_p = rte_cpu_to_be_32(*value_p);
 	} else if (mhdr_action->dst.field == RTE_FLOW_FIELD_GTP_PSC_QFI) {
@@ -1513,7 +1547,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			jump_group = ((const struct rte_flow_action_jump *)
 						action->conf)->group;
 			jump = flow_hw_jump_action_register
-				(dev, &attr, jump_group, NULL);
+				(dev, &table->cfg, jump_group, NULL);
 			if (!jump)
 				return -1;
 			rule_acts[act_data->action_dst].action =
@@ -1710,7 +1744,13 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	job->user_data = user_data;
 	rule_attr.user_data = job;
 	hw_acts = &table->ats[action_template_index].acts;
-	/* Construct the flow actions based on the input actions.*/
+	/*
+	 * Construct the flow actions based on the input actions.
+	 * The implicitly appended action is always fixed, like the metadata
+	 * copy action from FDB to NIC Rx.
+	 * There is no need to copy and construct a new "actions" list based
+	 * on the user's input, which saves the cost.
+	 */
 	if (flow_hw_actions_construct(dev, job, hw_acts, pattern_template_index,
 				  actions, rule_acts, &acts_num)) {
 		rte_errno = EINVAL;
@@ -1981,6 +2021,8 @@ flow_hw_q_flow_flush(struct rte_eth_dev *dev,
 	/* Flush flow per-table from MLX5_DEFAULT_FLUSH_QUEUE. */
 	hw_q = &priv->hw_q[MLX5_DEFAULT_FLUSH_QUEUE];
 	LIST_FOREACH(tbl, &priv->flow_hw_tbl, next) {
+		if (!tbl->cfg.external)
+			continue;
 		MLX5_IPOOL_FOREACH(tbl->flow, fidx, flow) {
 			if (flow_hw_async_flow_destroy(dev,
 						MLX5_DEFAULT_FLUSH_QUEUE,
@@ -2018,8 +2060,8 @@ flow_hw_q_flow_flush(struct rte_eth_dev *dev,
  *
  * @param[in] dev
  *   Pointer to the rte_eth_dev structure.
- * @param[in] attr
- *   Pointer to the table attributes.
+ * @param[in] table_cfg
+ *   Pointer to the table configuration.
  * @param[in] item_templates
  *   Item template array to be binded to the table.
  * @param[in] nb_item_templates
@@ -2036,7 +2078,7 @@ flow_hw_q_flow_flush(struct rte_eth_dev *dev,
  */
 static struct rte_flow_template_table *
 flow_hw_table_create(struct rte_eth_dev *dev,
-		     const struct rte_flow_template_table_attr *attr,
+		     const struct mlx5_flow_template_table_cfg *table_cfg,
 		     struct rte_flow_pattern_template *item_templates[],
 		     uint8_t nb_item_templates,
 		     struct rte_flow_actions_template *action_templates[],
@@ -2048,6 +2090,7 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	struct rte_flow_template_table *tbl = NULL;
 	struct mlx5_flow_group *grp;
 	struct mlx5dr_match_template *mt[MLX5_HW_TBL_MAX_ITEM_TEMPLATE];
+	const struct rte_flow_template_table_attr *attr = &table_cfg->attr;
 	struct rte_flow_attr flow_attr = attr->flow_attr;
 	struct mlx5_flow_cb_ctx ctx = {
 		.dev = dev,
@@ -2088,6 +2131,7 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	tbl = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*tbl), 0, rte_socket_id());
 	if (!tbl)
 		goto error;
+	tbl->cfg = *table_cfg;
 	/* Allocate flow indexed pool. */
 	tbl->flow = mlx5_ipool_create(&cfg);
 	if (!tbl->flow)
@@ -2131,7 +2175,7 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 			goto at_error;
 		}
 		LIST_INIT(&tbl->ats[i].acts.act_list);
-		err = flow_hw_actions_translate(dev, attr,
+		err = flow_hw_actions_translate(dev, &tbl->cfg,
 						&tbl->ats[i].acts,
 						action_templates[i], error);
 		if (err) {
@@ -2174,6 +2218,96 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	return NULL;
 }
 
+/**
+ * Translates group index specified by the user in @p attr to internal
+ * group index.
+ *
+ * Translation is done by incrementing group index, so group n becomes n + 1.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] cfg
+ *   Pointer to the template table configuration.
+ * @param[in] group
+ *   Currently used group index (table group or jump destination).
+ * @param[out] table_group
+ *   Pointer to output group index.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success. Otherwise, returns negative error code, rte_errno is set
+ *   and error structure is filled.
+ */
+static int
+flow_hw_translate_group(struct rte_eth_dev *dev,
+			const struct mlx5_flow_template_table_cfg *cfg,
+			uint32_t group,
+			uint32_t *table_group,
+			struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_attr *flow_attr = &cfg->attr.flow_attr;
+
+	if (priv->sh->config.dv_esw_en && cfg->external && flow_attr->transfer) {
+		if (group > MLX5_HW_MAX_TRANSFER_GROUP)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
+						  NULL,
+						  "group index not supported");
+		*table_group = group + 1;
+	} else {
+		*table_group = group;
+	}
+	return 0;
+}
+
+/**
+ * Create flow table.
+ *
+ * This function is a wrapper over @ref flow_hw_table_create(), which translates parameters
+ * provided by the user into proper internal values.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] attr
+ *   Pointer to the table attributes.
+ * @param[in] item_templates
+ *   Item template array to be bound to the table.
+ * @param[in] nb_item_templates
+ *   Number of item templates.
+ * @param[in] action_templates
+ *   Action template array to be bound to the table.
+ * @param[in] nb_action_templates
+ *   Number of action templates.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   Table pointer on success. Otherwise, NULL is returned, rte_errno is set
+ *   and the error structure is filled.
+ */
+static struct rte_flow_template_table *
+flow_hw_template_table_create(struct rte_eth_dev *dev,
+			      const struct rte_flow_template_table_attr *attr,
+			      struct rte_flow_pattern_template *item_templates[],
+			      uint8_t nb_item_templates,
+			      struct rte_flow_actions_template *action_templates[],
+			      uint8_t nb_action_templates,
+			      struct rte_flow_error *error)
+{
+	struct mlx5_flow_template_table_cfg cfg = {
+		.attr = *attr,
+		.external = true,
+	};
+	uint32_t group = attr->flow_attr.group;
+
+	if (flow_hw_translate_group(dev, &cfg, group, &cfg.attr.flow_attr.group, error))
+		return NULL;
+	return flow_hw_table_create(dev, &cfg, item_templates, nb_item_templates,
+				    action_templates, nb_action_templates, error);
+}
+
 /**
  * Destroy flow table.
  *
@@ -2309,10 +2443,13 @@ flow_hw_validate_action_represented_port(struct rte_eth_dev *dev,
 					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
 					  "cannot use represented_port actions"
 					  " without an E-Switch");
-	if (mask_conf->port_id) {
+	if (mask_conf && mask_conf->port_id) {
 		struct mlx5_priv *port_priv;
 		struct mlx5_priv *dev_priv;
 
+		if (!action_conf)
+			return rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ACTION,
+						  action, "port index was not provided");
 		port_priv = mlx5_port_to_eswitch_info(action_conf->port_id, false);
 		if (!port_priv)
 			return rte_flow_error_set(error, rte_errno,
@@ -2337,20 +2474,77 @@ flow_hw_validate_action_represented_port(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static inline int
+flow_hw_action_meta_copy_insert(const struct rte_flow_action actions[],
+				const struct rte_flow_action masks[],
+				const struct rte_flow_action *ins_actions,
+				const struct rte_flow_action *ins_masks,
+				struct rte_flow_action *new_actions,
+				struct rte_flow_action *new_masks,
+				uint16_t *ins_pos)
+{
+	uint16_t idx, total = 0;
+	bool ins = false;
+	bool act_end = false;
+
+	MLX5_ASSERT(actions && masks);
+	MLX5_ASSERT(new_actions && new_masks);
+	MLX5_ASSERT(ins_actions && ins_masks);
+	for (idx = 0; !act_end; idx++) {
+		if (idx >= MLX5_HW_MAX_ACTS)
+			return -1;
+		if (actions[idx].type == RTE_FLOW_ACTION_TYPE_RSS ||
+		    actions[idx].type == RTE_FLOW_ACTION_TYPE_QUEUE) {
+			ins = true;
+			*ins_pos = idx;
+		}
+		if (actions[idx].type == RTE_FLOW_ACTION_TYPE_END)
+			act_end = true;
+	}
+	if (!ins)
+		return 0;
+	else if (idx == MLX5_HW_MAX_ACTS)
+		return -1; /* No more space. */
+	total = idx;
+	/* Before the position, no change for the actions. */
+	for (idx = 0; idx < *ins_pos; idx++) {
+		new_actions[idx] = actions[idx];
+		new_masks[idx] = masks[idx];
+	}
+	/* Insert the new action and mask to the position. */
+	new_actions[idx] = *ins_actions;
+	new_masks[idx] = *ins_masks;
+	/* Remaining content is right shifted by one position. */
+	for (; idx < total; idx++) {
+		new_actions[idx + 1] = actions[idx];
+		new_masks[idx + 1] = masks[idx];
+	}
+	return 0;
+}
+
 static int
 flow_hw_action_validate(struct rte_eth_dev *dev,
+			const struct rte_flow_actions_template_attr *attr,
 			const struct rte_flow_action actions[],
 			const struct rte_flow_action masks[],
 			struct rte_flow_error *error)
 {
-	int i;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint16_t i;
 	bool actions_end = false;
 	int ret;
 
+	/* FDB actions are only valid on the proxy port. */
+	if (attr->transfer && (!priv->sh->config.dv_esw_en || !priv->master))
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL,
+					  "transfer actions are only valid to proxy port");
 	for (i = 0; !actions_end; ++i) {
 		const struct rte_flow_action *action = &actions[i];
 		const struct rte_flow_action *mask = &masks[i];
 
+		MLX5_ASSERT(i < MLX5_HW_MAX_ACTS);
 		if (action->type != mask->type)
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ACTION,
@@ -2447,21 +2641,77 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	int len, act_len, mask_len, i;
-	struct rte_flow_actions_template *at;
+	struct rte_flow_actions_template *at = NULL;
+	uint16_t pos = MLX5_HW_MAX_ACTS;
+	struct rte_flow_action tmp_action[MLX5_HW_MAX_ACTS];
+	struct rte_flow_action tmp_mask[MLX5_HW_MAX_ACTS];
+	const struct rte_flow_action *ra;
+	const struct rte_flow_action *rm;
+	const struct rte_flow_action_modify_field rx_mreg = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_B,
+		},
+		.src = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_C_1,
+		},
+		.width = 32,
+	};
+	const struct rte_flow_action_modify_field rx_mreg_mask = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.src = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.width = UINT32_MAX,
+	};
+	const struct rte_flow_action rx_cpy = {
+		.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+		.conf = &rx_mreg,
+	};
+	const struct rte_flow_action rx_cpy_mask = {
+		.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+		.conf = &rx_mreg_mask,
+	};
 
-	if (flow_hw_action_validate(dev, actions, masks, error))
+	if (flow_hw_action_validate(dev, attr, actions, masks, error))
 		return NULL;
-	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS,
-				NULL, 0, actions, error);
+	if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
+	    priv->sh->config.dv_esw_en) {
+		if (flow_hw_action_meta_copy_insert(actions, masks, &rx_cpy, &rx_cpy_mask,
+						    tmp_action, tmp_mask, &pos)) {
+			rte_flow_error_set(error, EINVAL,
+					   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					   "Failed to concatenate new action/mask");
+			return NULL;
+		}
+	}
+	/* The application should make sure only one Q/RSS exists in one rule. */
+	if (pos == MLX5_HW_MAX_ACTS) {
+		ra = actions;
+		rm = masks;
+	} else {
+		ra = tmp_action;
+		rm = tmp_mask;
+	}
+	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, NULL, 0, ra, error);
 	if (act_len <= 0)
 		return NULL;
 	len = RTE_ALIGN(act_len, 16);
-	mask_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS,
-				 NULL, 0, masks, error);
+	mask_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, NULL, 0, rm, error);
 	if (mask_len <= 0)
 		return NULL;
 	len += RTE_ALIGN(mask_len, 16);
-	at = mlx5_malloc(MLX5_MEM_ZERO, len + sizeof(*at), 64, rte_socket_id());
+	at = mlx5_malloc(MLX5_MEM_ZERO, len + sizeof(*at),
+			 RTE_CACHE_LINE_SIZE, rte_socket_id());
 	if (!at) {
 		rte_flow_error_set(error, ENOMEM,
 				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
@@ -2469,18 +2719,20 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 				   "cannot allocate action template");
 		return NULL;
 	}
+	/* Actions part is in the first half. */
 	at->attr = *attr;
 	at->actions = (struct rte_flow_action *)(at + 1);
-	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, at->actions, len,
-				actions, error);
+	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, at->actions,
+				len, ra, error);
 	if (act_len <= 0)
 		goto error;
-	at->masks = (struct rte_flow_action *)
-		    (((uint8_t *)at->actions) + act_len);
+	/* Masks part is in the second half. */
+	at->masks = (struct rte_flow_action *)(((uint8_t *)at->actions) + act_len);
 	mask_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, at->masks,
-				 len - act_len, masks, error);
+				 len - act_len, rm, error);
 	if (mask_len <= 0)
 		goto error;
+	at->rx_cpy_pos = pos;
 	/*
 	 * mlx5 PMD hacks indirect action index directly to the action conf.
 	 * The rte_flow_conv() function copies the content from conf pointer.
@@ -2497,7 +2749,8 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 	LIST_INSERT_HEAD(&priv->flow_hw_at, at, next);
 	return at;
 error:
-	mlx5_free(at);
+	if (at)
+		mlx5_free(at);
 	return NULL;
 }
 
@@ -2572,6 +2825,80 @@ flow_hw_copy_prepend_port_item(const struct rte_flow_item *items,
 	return copied_items;
 }
 
+static int
+flow_hw_pattern_validate(struct rte_eth_dev *dev,
+			 const struct rte_flow_pattern_template_attr *attr,
+			 const struct rte_flow_item items[],
+			 struct rte_flow_error *error)
+{
+	int i;
+	bool items_end = false;
+	RTE_SET_USED(dev);
+	RTE_SET_USED(attr);
+
+	for (i = 0; !items_end; i++) {
+		int type = items[i].type;
+
+		switch (type) {
+		case RTE_FLOW_ITEM_TYPE_TAG:
+		{
+			int reg;
+			const struct rte_flow_item_tag *tag =
+				(const struct rte_flow_item_tag *)items[i].spec;
+
+			reg = flow_hw_get_reg_id(RTE_FLOW_ITEM_TYPE_TAG, tag->index);
+			if (reg == REG_NON)
+				return rte_flow_error_set(error, EINVAL,
+							  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+							  NULL,
+							  "Unsupported tag index");
+			break;
+		}
+		case MLX5_RTE_FLOW_ITEM_TYPE_TAG:
+		{
+			const struct rte_flow_item_tag *tag =
+				(const struct rte_flow_item_tag *)items[i].spec;
+			struct mlx5_priv *priv = dev->data->dev_private;
+			uint8_t regcs = (uint8_t)priv->sh->cdev->config.hca_attr.set_reg_c;
+
+			if (!((1 << (tag->index - REG_C_0)) & regcs))
+				return rte_flow_error_set(error, EINVAL,
+							  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+							  NULL,
+							  "Unsupported internal tag index");
+		}
+		case RTE_FLOW_ITEM_TYPE_VOID:
+		case RTE_FLOW_ITEM_TYPE_ETH:
+		case RTE_FLOW_ITEM_TYPE_VLAN:
+		case RTE_FLOW_ITEM_TYPE_IPV4:
+		case RTE_FLOW_ITEM_TYPE_IPV6:
+		case RTE_FLOW_ITEM_TYPE_UDP:
+		case RTE_FLOW_ITEM_TYPE_TCP:
+		case RTE_FLOW_ITEM_TYPE_GTP:
+		case RTE_FLOW_ITEM_TYPE_GTP_PSC:
+		case RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT:
+		case RTE_FLOW_ITEM_TYPE_VXLAN:
+		case MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE:
+		case RTE_FLOW_ITEM_TYPE_META:
+		case RTE_FLOW_ITEM_TYPE_GRE:
+		case RTE_FLOW_ITEM_TYPE_GRE_KEY:
+		case RTE_FLOW_ITEM_TYPE_GRE_OPTION:
+		case RTE_FLOW_ITEM_TYPE_ICMP:
+		case RTE_FLOW_ITEM_TYPE_ICMP6:
+			break;
+		case RTE_FLOW_ITEM_TYPE_END:
+			items_end = true;
+			break;
+		default:
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+						  NULL,
+						  "Unsupported item type");
+		}
+	}
+	return 0;
+}
+
 /**
  * Create flow item template.
  *
@@ -2598,6 +2925,8 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 	struct rte_flow_item *copied_items = NULL;
 	const struct rte_flow_item *tmpl_items;
 
+	if (flow_hw_pattern_validate(dev, attr, items, error))
+		return NULL;
 	if (priv->sh->config.dv_esw_en && attr->ingress) {
 		/*
 		 * Disallow pattern template with ingress and egress/transfer
@@ -3032,6 +3361,17 @@ flow_hw_free_vport_actions(struct mlx5_priv *priv)
 	priv->hw_vport = NULL;
 }
 
+static uint32_t
+flow_hw_usable_lsb_vport_mask(struct mlx5_priv *priv)
+{
+	uint32_t usable_mask = ~priv->vport_meta_mask;
+
+	if (usable_mask)
+		return (1 << rte_bsf32(usable_mask));
+	else
+		return 0;
+}
+
 /**
  * Creates a flow pattern template used to match on E-Switch Manager.
  * This template is used to set up a table for SQ miss default flow.
@@ -3070,7 +3410,10 @@ flow_hw_create_ctrl_esw_mgr_pattern_template(struct rte_eth_dev *dev)
 }
 
 /**
- * Creates a flow pattern template used to match on a TX queue.
+ * Creates a flow pattern template used to match REG_C_0 and a TX queue.
+ * Matching on REG_C_0 is set up to match on the least significant bit usable
+ * by user space, which is set when the packet originated from the E-Switch Manager.
+ *
  * This template is used to set up a table for SQ miss default flow.
  *
  * @param dev
@@ -3080,16 +3423,30 @@ flow_hw_create_ctrl_esw_mgr_pattern_template(struct rte_eth_dev *dev)
  *   Pointer to flow pattern template on success, NULL otherwise.
  */
 static struct rte_flow_pattern_template *
-flow_hw_create_ctrl_sq_pattern_template(struct rte_eth_dev *dev)
+flow_hw_create_ctrl_regc_sq_pattern_template(struct rte_eth_dev *dev)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint32_t marker_bit = flow_hw_usable_lsb_vport_mask(priv);
 	struct rte_flow_pattern_template_attr attr = {
 		.relaxed_matching = 0,
 		.transfer = 1,
 	};
+	struct rte_flow_item_tag reg_c0_spec = {
+		.index = (uint8_t)REG_C_0,
+	};
+	struct rte_flow_item_tag reg_c0_mask = {
+		.index = 0xff,
+	};
 	struct mlx5_rte_flow_item_tx_queue queue_mask = {
 		.queue = UINT32_MAX,
 	};
 	struct rte_flow_item items[] = {
+		{
+			.type = (enum rte_flow_item_type)
+				MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+			.spec = &reg_c0_spec,
+			.mask = &reg_c0_mask,
+		},
 		{
 			.type = (enum rte_flow_item_type)
 				MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
@@ -3100,6 +3457,12 @@ flow_hw_create_ctrl_sq_pattern_template(struct rte_eth_dev *dev)
 		},
 	};
 
+	if (!marker_bit) {
+		DRV_LOG(ERR, "Unable to set up pattern template for SQ miss table");
+		return NULL;
+	}
+	reg_c0_spec.data = marker_bit;
+	reg_c0_mask.data = marker_bit;
 	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
 }
 
@@ -3137,6 +3500,132 @@ flow_hw_create_ctrl_port_pattern_template(struct rte_eth_dev *dev)
 	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
 }
 
+/*
+ * Creates a flow pattern template matching all Ethernet packets.
+ * This template is used to set up a table for the default Tx copy (Tx metadata
+ * to REG_C_1) flow rule.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow pattern template on success, NULL otherwise.
+ */
+static struct rte_flow_pattern_template *
+flow_hw_create_tx_default_mreg_copy_pattern_template(struct rte_eth_dev *dev)
+{
+	struct rte_flow_pattern_template_attr tx_pa_attr = {
+		.relaxed_matching = 0,
+		.egress = 1,
+	};
+	struct rte_flow_item_eth promisc = {
+		.dst.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+		.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+		.type = 0,
+	};
+	struct rte_flow_item eth_all[] = {
+		[0] = {
+			.type = RTE_FLOW_ITEM_TYPE_ETH,
+			.spec = &promisc,
+			.mask = &promisc,
+		},
+		[1] = {
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	struct rte_flow_error drop_err;
+
+	RTE_SET_USED(drop_err);
+	return flow_hw_pattern_template_create(dev, &tx_pa_attr, eth_all, &drop_err);
+}
+
+/**
+ * Creates a flow actions template with modify field action and masked jump action.
+ * The modify field action sets the least significant bit of REG_C_0 (usable by user space)
+ * to 1, meaning that the packet originated from the E-Switch Manager. The jump action
+ * transfers steering to group 1.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow actions template on success, NULL otherwise.
+ */
+static struct rte_flow_actions_template *
+flow_hw_create_ctrl_regc_jump_actions_template(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint32_t marker_bit = flow_hw_usable_lsb_vport_mask(priv);
+	uint32_t marker_bit_mask = UINT32_MAX;
+	struct rte_flow_actions_template_attr attr = {
+		.transfer = 1,
+	};
+	struct rte_flow_action_modify_field set_reg_v = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_C_0,
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+		},
+		.width = 1,
+	};
+	struct rte_flow_action_modify_field set_reg_m = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+		},
+		.width = UINT32_MAX,
+	};
+	struct rte_flow_action_jump jump_v = {
+		.group = MLX5_HW_LOWEST_USABLE_GROUP,
+	};
+	struct rte_flow_action_jump jump_m = {
+		.group = UINT32_MAX,
+	};
+	struct rte_flow_action actions_v[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+			.conf = &set_reg_v,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump_v,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+	struct rte_flow_action actions_m[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+			.conf = &set_reg_m,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump_m,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+
+	if (!marker_bit) {
+		DRV_LOG(ERR, "Unable to set up actions template for SQ miss table");
+		return NULL;
+	}
+	set_reg_v.dst.offset = rte_bsf32(marker_bit);
+	rte_memcpy(set_reg_v.src.value, &marker_bit, sizeof(marker_bit));
+	rte_memcpy(set_reg_m.src.value, &marker_bit_mask, sizeof(marker_bit_mask));
+	return flow_hw_actions_template_create(dev, &attr, actions_v, actions_m, NULL);
+}
+
 /**
  * Creates a flow actions template with an unmasked JUMP action. Flows
  * based on this template will perform a jump to some group. This template
@@ -3231,6 +3720,73 @@ flow_hw_create_ctrl_port_actions_template(struct rte_eth_dev *dev)
 					       NULL);
 }
 
+/*
+ * Creates an actions template that uses the header modify action for register
+ * copying. This template is used to set up a table for the Tx metadata copy flow.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow actions template on success, NULL otherwise.
+ */
+static struct rte_flow_actions_template *
+flow_hw_create_tx_default_mreg_copy_actions_template(struct rte_eth_dev *dev)
+{
+	struct rte_flow_actions_template_attr tx_act_attr = {
+		.egress = 1,
+	};
+	const struct rte_flow_action_modify_field mreg_action = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_C_1,
+		},
+		.src = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_A,
+		},
+		.width = 32,
+	};
+	const struct rte_flow_action_modify_field mreg_mask = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.src = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.width = UINT32_MAX,
+	};
+	const struct rte_flow_action copy_reg_action[] = {
+		[0] = {
+			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+			.conf = &mreg_action,
+		},
+		[1] = {
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		},
+	};
+	const struct rte_flow_action copy_reg_mask[] = {
+		[0] = {
+			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+			.conf = &mreg_mask,
+		},
+		[1] = {
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		},
+	};
+	struct rte_flow_error drop_err;
+
+	RTE_SET_USED(drop_err);
+	return flow_hw_actions_template_create(dev, &tx_act_attr, copy_reg_action,
+					       copy_reg_mask, &drop_err);
+}
+
 /**
  * Creates a control flow table used to transfer traffic from E-Switch Manager
  * and TX queues from group 0 to group 1.
@@ -3260,8 +3816,12 @@ flow_hw_create_ctrl_sq_miss_root_table(struct rte_eth_dev *dev,
 		},
 		.nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
 	};
+	struct mlx5_flow_template_table_cfg cfg = {
+		.attr = attr,
+		.external = false,
+	};
 
-	return flow_hw_table_create(dev, &attr, &it, 1, &at, 1, NULL);
+	return flow_hw_table_create(dev, &cfg, &it, 1, &at, 1, NULL);
 }
 
 
@@ -3286,16 +3846,56 @@ flow_hw_create_ctrl_sq_miss_table(struct rte_eth_dev *dev,
 {
 	struct rte_flow_template_table_attr attr = {
 		.flow_attr = {
-			.group = MLX5_HW_SQ_MISS_GROUP,
-			.priority = 0,
+			.group = 1,
+			.priority = MLX5_HW_LOWEST_PRIO_NON_ROOT,
 			.ingress = 0,
 			.egress = 0,
 			.transfer = 1,
 		},
 		.nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
 	};
+	struct mlx5_flow_template_table_cfg cfg = {
+		.attr = attr,
+		.external = false,
+	};
+
+	return flow_hw_table_create(dev, &cfg, &it, 1, &at, 1, NULL);
+}
+
+/*
+ * Creates the default Tx metadata copy table on NIC Tx group 0.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param pt
+ *   Pointer to flow pattern template.
+ * @param at
+ *   Pointer to flow actions template.
+ *
+ * @return
+ *   Pointer to flow table on success, NULL otherwise.
+ */
+static struct rte_flow_template_table*
+flow_hw_create_tx_default_mreg_copy_table(struct rte_eth_dev *dev,
+					  struct rte_flow_pattern_template *pt,
+					  struct rte_flow_actions_template *at)
+{
+	struct rte_flow_template_table_attr tx_tbl_attr = {
+		.flow_attr = {
+			.group = 0, /* Root */
+			.priority = MLX5_HW_LOWEST_PRIO_ROOT,
+			.egress = 1,
+		},
+		.nb_flows = 1, /* One default flow rule for all. */
+	};
+	struct mlx5_flow_template_table_cfg tx_tbl_cfg = {
+		.attr = tx_tbl_attr,
+		.external = false,
+	};
+	struct rte_flow_error drop_err;
 
-	return flow_hw_table_create(dev, &attr, &it, 1, &at, 1, NULL);
+	RTE_SET_USED(drop_err);
+	return flow_hw_table_create(dev, &tx_tbl_cfg, &pt, 1, &at, 1, &drop_err);
 }
 
 /**
@@ -3320,15 +3920,19 @@ flow_hw_create_ctrl_jump_table(struct rte_eth_dev *dev,
 	struct rte_flow_template_table_attr attr = {
 		.flow_attr = {
 			.group = 0,
-			.priority = 15, /* TODO: Flow priority discovery. */
+			.priority = MLX5_HW_LOWEST_PRIO_ROOT,
 			.ingress = 0,
 			.egress = 0,
 			.transfer = 1,
 		},
 		.nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
 	};
+	struct mlx5_flow_template_table_cfg cfg = {
+		.attr = attr,
+		.external = false,
+	};
 
-	return flow_hw_table_create(dev, &attr, &it, 1, &at, 1, NULL);
+	return flow_hw_table_create(dev, &cfg, &it, 1, &at, 1, NULL);
 }
 
 /**
@@ -3346,11 +3950,14 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow_pattern_template *esw_mgr_items_tmpl = NULL;
-	struct rte_flow_pattern_template *sq_items_tmpl = NULL;
+	struct rte_flow_pattern_template *regc_sq_items_tmpl = NULL;
 	struct rte_flow_pattern_template *port_items_tmpl = NULL;
-	struct rte_flow_actions_template *jump_sq_actions_tmpl = NULL;
+	struct rte_flow_pattern_template *tx_meta_items_tmpl = NULL;
+	struct rte_flow_actions_template *regc_jump_actions_tmpl = NULL;
 	struct rte_flow_actions_template *port_actions_tmpl = NULL;
 	struct rte_flow_actions_template *jump_one_actions_tmpl = NULL;
+	struct rte_flow_actions_template *tx_meta_actions_tmpl = NULL;
+	uint32_t xmeta = priv->sh->config.dv_xmeta_en;
 
 	/* Item templates */
 	esw_mgr_items_tmpl = flow_hw_create_ctrl_esw_mgr_pattern_template(dev);
@@ -3359,8 +3966,8 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 			" template for control flows", dev->data->port_id);
 		goto error;
 	}
-	sq_items_tmpl = flow_hw_create_ctrl_sq_pattern_template(dev);
-	if (!sq_items_tmpl) {
+	regc_sq_items_tmpl = flow_hw_create_ctrl_regc_sq_pattern_template(dev);
+	if (!regc_sq_items_tmpl) {
 		DRV_LOG(ERR, "port %u failed to create SQ item template for"
 			" control flows", dev->data->port_id);
 		goto error;
@@ -3371,11 +3978,18 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 			" control flows", dev->data->port_id);
 		goto error;
 	}
+	if (xmeta == MLX5_XMETA_MODE_META32_HWS) {
+		tx_meta_items_tmpl = flow_hw_create_tx_default_mreg_copy_pattern_template(dev);
+		if (!tx_meta_items_tmpl) {
+			DRV_LOG(ERR, "port %u failed to create Tx metadata copy pattern"
+				" template for control flows", dev->data->port_id);
+			goto error;
+		}
+	}
 	/* Action templates */
-	jump_sq_actions_tmpl = flow_hw_create_ctrl_jump_actions_template(dev,
-									 MLX5_HW_SQ_MISS_GROUP);
-	if (!jump_sq_actions_tmpl) {
-		DRV_LOG(ERR, "port %u failed to create jump action template"
+	regc_jump_actions_tmpl = flow_hw_create_ctrl_regc_jump_actions_template(dev);
+	if (!regc_jump_actions_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create REG_C set and jump action template"
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
@@ -3385,23 +3999,32 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
-	jump_one_actions_tmpl = flow_hw_create_ctrl_jump_actions_template(dev, 1);
+	jump_one_actions_tmpl = flow_hw_create_ctrl_jump_actions_template
+			(dev, MLX5_HW_LOWEST_USABLE_GROUP);
 	if (!jump_one_actions_tmpl) {
 		DRV_LOG(ERR, "port %u failed to create jump action template"
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
+	if (xmeta == MLX5_XMETA_MODE_META32_HWS) {
+		tx_meta_actions_tmpl = flow_hw_create_tx_default_mreg_copy_actions_template(dev);
+		if (!tx_meta_actions_tmpl) {
+			DRV_LOG(ERR, "port %u failed to create Tx metadata copy actions"
+				" template for control flows", dev->data->port_id);
+			goto error;
+		}
+	}
 	/* Tables */
 	MLX5_ASSERT(priv->hw_esw_sq_miss_root_tbl == NULL);
 	priv->hw_esw_sq_miss_root_tbl = flow_hw_create_ctrl_sq_miss_root_table
-			(dev, esw_mgr_items_tmpl, jump_sq_actions_tmpl);
+			(dev, esw_mgr_items_tmpl, regc_jump_actions_tmpl);
 	if (!priv->hw_esw_sq_miss_root_tbl) {
 		DRV_LOG(ERR, "port %u failed to create table for default sq miss (root table)"
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
 	MLX5_ASSERT(priv->hw_esw_sq_miss_tbl == NULL);
-	priv->hw_esw_sq_miss_tbl = flow_hw_create_ctrl_sq_miss_table(dev, sq_items_tmpl,
+	priv->hw_esw_sq_miss_tbl = flow_hw_create_ctrl_sq_miss_table(dev, regc_sq_items_tmpl,
 								     port_actions_tmpl);
 	if (!priv->hw_esw_sq_miss_tbl) {
 		DRV_LOG(ERR, "port %u failed to create table for default sq miss (non-root table)"
@@ -3416,6 +4039,16 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
+	if (xmeta == MLX5_XMETA_MODE_META32_HWS) {
+		MLX5_ASSERT(priv->hw_tx_meta_cpy_tbl == NULL);
+		priv->hw_tx_meta_cpy_tbl = flow_hw_create_tx_default_mreg_copy_table(dev,
+					tx_meta_items_tmpl, tx_meta_actions_tmpl);
+		if (!priv->hw_tx_meta_cpy_tbl) {
+			DRV_LOG(ERR, "port %u failed to create table for default"
+				" Tx metadata copy flow rule", dev->data->port_id);
+			goto error;
+		}
+	}
 	return 0;
 error:
 	if (priv->hw_esw_zero_tbl) {
@@ -3430,16 +4063,20 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 		flow_hw_table_destroy(dev, priv->hw_esw_sq_miss_root_tbl, NULL);
 		priv->hw_esw_sq_miss_root_tbl = NULL;
 	}
+	if (xmeta == MLX5_XMETA_MODE_META32_HWS && tx_meta_actions_tmpl)
+		flow_hw_actions_template_destroy(dev, tx_meta_actions_tmpl, NULL);
 	if (jump_one_actions_tmpl)
 		flow_hw_actions_template_destroy(dev, jump_one_actions_tmpl, NULL);
 	if (port_actions_tmpl)
 		flow_hw_actions_template_destroy(dev, port_actions_tmpl, NULL);
-	if (jump_sq_actions_tmpl)
-		flow_hw_actions_template_destroy(dev, jump_sq_actions_tmpl, NULL);
+	if (regc_jump_actions_tmpl)
+		flow_hw_actions_template_destroy(dev, regc_jump_actions_tmpl, NULL);
+	if (xmeta == MLX5_XMETA_MODE_META32_HWS && tx_meta_items_tmpl)
+		flow_hw_pattern_template_destroy(dev, tx_meta_items_tmpl, NULL);
 	if (port_items_tmpl)
 		flow_hw_pattern_template_destroy(dev, port_items_tmpl, NULL);
-	if (sq_items_tmpl)
-		flow_hw_pattern_template_destroy(dev, sq_items_tmpl, NULL);
+	if (regc_sq_items_tmpl)
+		flow_hw_pattern_template_destroy(dev, regc_sq_items_tmpl, NULL);
 	if (esw_mgr_items_tmpl)
 		flow_hw_pattern_template_destroy(dev, esw_mgr_items_tmpl, NULL);
 	return -EINVAL;
@@ -3491,7 +4128,7 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	struct rte_flow_queue_attr **_queue_attr = NULL;
 	struct rte_flow_queue_attr ctrl_queue_attr = {0};
 	bool is_proxy = !!(priv->sh->config.dv_esw_en && priv->master);
-	int ret;
+	int ret = 0;
 
 	if (!port_attr || !nb_queue || !queue_attr) {
 		rte_errno = EINVAL;
@@ -3642,6 +4279,9 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	}
 	if (_queue_attr)
 		mlx5_free(_queue_attr);
+	/* Do not overwrite the internal errno information. */
+	if (ret)
+		return ret;
 	return rte_flow_error_set(error, rte_errno,
 				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
 				  "fail to configure port");
@@ -3751,17 +4391,17 @@ void flow_hw_init_tags_set(struct rte_eth_dev *dev)
 		return;
 	unset |= 1 << (priv->mtr_color_reg - REG_C_0);
 	unset |= 1 << (REG_C_6 - REG_C_0);
-	if (meta_mode == MLX5_XMETA_MODE_META32_HWS) {
-		unset |= 1 << (REG_C_1 - REG_C_0);
+	if (priv->sh->config.dv_esw_en)
 		unset |= 1 << (REG_C_0 - REG_C_0);
-	}
+	if (meta_mode == MLX5_XMETA_MODE_META32_HWS)
+		unset |= 1 << (REG_C_1 - REG_C_0);
 	masks &= ~unset;
 	if (mlx5_flow_hw_avl_tags_init_cnt) {
 		for (i = 0; i < MLX5_FLOW_HW_TAGS_MAX; i++) {
 			if (mlx5_flow_hw_avl_tags[i] != REG_NON && !!((1 << i) & masks)) {
 				copy[mlx5_flow_hw_avl_tags[i] - REG_C_0] =
 						mlx5_flow_hw_avl_tags[i];
-				copy_masks |= (1 << i);
+				copy_masks |= (1 << (mlx5_flow_hw_avl_tags[i] - REG_C_0));
 			}
 		}
 		if (copy_masks != masks) {
@@ -3903,7 +4543,6 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 	return flow_dv_action_destroy(dev, handle, error);
 }
 
-
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.info_get = flow_hw_info_get,
 	.configure = flow_hw_configure,
@@ -3911,7 +4550,7 @@ const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.pattern_template_destroy = flow_hw_pattern_template_destroy,
 	.actions_template_create = flow_hw_actions_template_create,
 	.actions_template_destroy = flow_hw_actions_template_destroy,
-	.template_table_create = flow_hw_table_create,
+	.template_table_create = flow_hw_template_table_create,
 	.template_table_destroy = flow_hw_table_destroy,
 	.async_flow_create = flow_hw_async_flow_create,
 	.async_flow_destroy = flow_hw_async_flow_destroy,
@@ -3927,13 +4566,6 @@ const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.action_query = flow_dv_action_query,
 };
 
-static uint32_t
-flow_hw_get_ctrl_queue(struct mlx5_priv *priv)
-{
-	MLX5_ASSERT(priv->nb_queue > 0);
-	return priv->nb_queue - 1;
-}
-
 /**
  * Creates a control flow using flow template API on @p proxy_dev device,
  * on behalf of @p owner_dev device.
@@ -3971,7 +4603,7 @@ flow_hw_create_ctrl_flow(struct rte_eth_dev *owner_dev,
 			 uint8_t action_template_idx)
 {
 	struct mlx5_priv *priv = proxy_dev->data->dev_private;
-	uint32_t queue = flow_hw_get_ctrl_queue(priv);
+	uint32_t queue = priv->nb_queue - 1;
 	struct rte_flow_op_attr op_attr = {
 		.postpone = 0,
 	};
@@ -4046,7 +4678,7 @@ static int
 flow_hw_destroy_ctrl_flow(struct rte_eth_dev *dev, struct rte_flow *flow)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	uint32_t queue = flow_hw_get_ctrl_queue(priv);
+	uint32_t queue = priv->nb_queue - 1;
 	struct rte_flow_op_attr op_attr = {
 		.postpone = 0,
 	};
@@ -4183,10 +4815,24 @@ mlx5_flow_hw_esw_create_mgr_sq_miss_flow(struct rte_eth_dev *dev)
 			.type = RTE_FLOW_ITEM_TYPE_END,
 		},
 	};
+	struct rte_flow_action_modify_field modify_field = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+		},
+		.width = 1,
+	};
 	struct rte_flow_action_jump jump = {
-		.group = MLX5_HW_SQ_MISS_GROUP,
+		.group = 1,
 	};
 	struct rte_flow_action actions[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+			.conf = &modify_field,
+		},
 		{
 			.type = RTE_FLOW_ACTION_TYPE_JUMP,
 			.conf = &jump,
@@ -4209,6 +4855,12 @@ int
 mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
 {
 	uint16_t port_id = dev->data->port_id;
+	struct rte_flow_item_tag reg_c0_spec = {
+		.index = (uint8_t)REG_C_0,
+	};
+	struct rte_flow_item_tag reg_c0_mask = {
+		.index = 0xff,
+	};
 	struct mlx5_rte_flow_item_tx_queue queue_spec = {
 		.queue = txq,
 	};
@@ -4216,6 +4868,12 @@ mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
 		.queue = UINT32_MAX,
 	};
 	struct rte_flow_item items[] = {
+		{
+			.type = (enum rte_flow_item_type)
+				MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+			.spec = &reg_c0_spec,
+			.mask = &reg_c0_mask,
+		},
 		{
 			.type = (enum rte_flow_item_type)
 				MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
@@ -4241,6 +4899,7 @@ mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
 	struct rte_eth_dev *proxy_dev;
 	struct mlx5_priv *proxy_priv;
 	uint16_t proxy_port_id = dev->data->port_id;
+	uint32_t marker_bit;
 	int ret;
 
 	RTE_SET_USED(txq);
@@ -4261,6 +4920,14 @@ mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
 		rte_errno = ENOMEM;
 		return -rte_errno;
 	}
+	marker_bit = flow_hw_usable_lsb_vport_mask(proxy_priv);
+	if (!marker_bit) {
+		DRV_LOG(ERR, "Unable to set up control flow in SQ miss table");
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	reg_c0_spec.data = marker_bit;
+	reg_c0_mask.data = marker_bit;
 	return flow_hw_create_ctrl_flow(dev, proxy_dev,
 					proxy_priv->hw_esw_sq_miss_tbl,
 					items, 0, actions, 0);
@@ -4320,4 +4987,53 @@ mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev)
 					items, 0, actions, 0);
 }
 
+int
+mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_flow_item_eth promisc = {
+		.dst.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+		.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+		.type = 0,
+	};
+	struct rte_flow_item eth_all[] = {
+		[0] = {
+			.type = RTE_FLOW_ITEM_TYPE_ETH,
+			.spec = &promisc,
+			.mask = &promisc,
+		},
+		[1] = {
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	struct rte_flow_action_modify_field mreg_action = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_C_1,
+		},
+		.src = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_A,
+		},
+		.width = 32,
+	};
+	struct rte_flow_action copy_reg_action[] = {
+		[0] = {
+			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+			.conf = &mreg_action,
+		},
+		[1] = {
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		},
+	};
+
+	MLX5_ASSERT(priv->master);
+	if (!priv->dr_ctx || !priv->hw_tx_meta_cpy_tbl)
+		return 0;
+	return flow_hw_create_ctrl_flow(dev, dev,
+					priv->hw_tx_meta_cpy_tbl,
+					eth_all, 0, copy_reg_action, 0);
+}
+
 #endif
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 6313602a66..ccefebefc9 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -1292,6 +1292,9 @@ mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
 	if (priv->sh->config.dv_esw_en && priv->master) {
 		if (mlx5_flow_hw_esw_create_mgr_sq_miss_flow(dev))
 			goto error;
+		if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS)
+			if (mlx5_flow_hw_create_tx_default_mreg_copy_flow(dev))
+				goto error;
 	}
 	for (i = 0; i < priv->txqs_n; ++i) {
 		struct mlx5_txq_ctrl *txq = mlx5_txq_get(dev, i);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v3 07/17] net/mlx5: add HW steering meter action
  2022-09-30 12:52 ` [PATCH v3 00/17] net/mlx5: HW steering PMD update Suanming Mou
                     ` (5 preceding siblings ...)
  2022-09-30 12:53   ` [PATCH v3 06/17] net/mlx5: add extended metadata mode for hardware steering Suanming Mou
@ 2022-09-30 12:53   ` Suanming Mou
  2022-09-30 12:53   ` [PATCH v3 08/17] net/mlx5: add HW steering counter action Suanming Mou
                     ` (9 subsequent siblings)
  16 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-30 12:53 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Alexander Kozyrev

From: Alexander Kozyrev <akozyrev@nvidia.com>

This commit adds meter action for HWS steering.

HW steering meter is based on ASO. The number of meters that
will be used by flows should be specified in advance in the
flow configure API.
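
A minimal usage sketch (illustrative only, not part of this patch; the
helper name and the object counts below are arbitrary) of reserving the
meter objects through rte_flow_configure() before creating any templates:

  #include <rte_flow.h>

  static int
  setup_hws_meters(uint16_t port_id)
  {
  	struct rte_flow_port_attr port_attr = {
  		/* ASO meter objects pre-allocated by the PMD; example count. */
  		.nb_meters = 1024,
  	};
  	struct rte_flow_queue_attr queue_attr = { .size = 64 };
  	const struct rte_flow_queue_attr *queue_attrs[] = { &queue_attr };
  	struct rte_flow_error error;

  	/* One flow queue of depth 64; meters beyond nb_meters cannot be used. */
  	return rte_flow_configure(port_id, &port_attr, 1, queue_attrs, &error);
  }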

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
---
 drivers/net/mlx5/mlx5.h            |  61 ++-
 drivers/net/mlx5/mlx5_flow.c       |  71 +++
 drivers/net/mlx5/mlx5_flow.h       |  50 ++
 drivers/net/mlx5/mlx5_flow_aso.c   |  30 +-
 drivers/net/mlx5/mlx5_flow_dv.c    |  25 -
 drivers/net/mlx5/mlx5_flow_hw.c    | 264 ++++++++++-
 drivers/net/mlx5/mlx5_flow_meter.c | 702 ++++++++++++++++++++++++++++-
 7 files changed, 1142 insertions(+), 61 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index fc4bc4e6a3..686969719a 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -357,6 +357,9 @@ struct mlx5_hw_q {
 	struct mlx5_hw_q_job **job; /* LIFO header. */
 } __rte_cache_aligned;
 
+
+
+
 #define MLX5_COUNTERS_PER_POOL 512
 #define MLX5_MAX_PENDING_QUERIES 4
 #define MLX5_CNT_CONTAINER_RESIZE 64
@@ -782,15 +785,29 @@ struct mlx5_flow_meter_policy {
 	/* Is meter action in policy table. */
 	uint32_t hierarchy_drop_cnt:1;
 	/* Is any meter in hierarchy contains drop_cnt. */
+	uint32_t skip_r:1;
+	/* If red color policy is skipped. */
 	uint32_t skip_y:1;
 	/* If yellow color policy is skipped. */
 	uint32_t skip_g:1;
 	/* If green color policy is skipped. */
 	uint32_t mark:1;
 	/* If policy contains mark action. */
+	uint32_t initialized:1;
+	/* Initialized. */
+	uint16_t group;
+	/* The group. */
 	rte_spinlock_t sl;
 	uint32_t ref_cnt;
 	/* Use count. */
+	struct rte_flow_pattern_template *hws_item_templ;
+	/* Hardware steering item templates. */
+	struct rte_flow_actions_template *hws_act_templ[MLX5_MTR_DOMAIN_MAX];
+	/* Hardware steering action templates. */
+	struct rte_flow_template_table *hws_flow_table[MLX5_MTR_DOMAIN_MAX];
+	/* Hardware steering tables. */
+	struct rte_flow *hws_flow_rule[MLX5_MTR_DOMAIN_MAX][RTE_COLORS];
+	/* Hardware steering rules. */
 	struct mlx5_meter_policy_action_container act_cnt[MLX5_MTR_RTE_COLORS];
 	/* Policy actions container. */
 	void *dr_drop_action[MLX5_MTR_DOMAIN_MAX];
@@ -865,6 +882,7 @@ struct mlx5_flow_meter_info {
 	 */
 	uint32_t transfer:1;
 	uint32_t def_policy:1;
+	uint32_t initialized:1;
 	/* Meter points to default policy. */
 	uint32_t color_aware:1;
 	/* Meter is color aware mode. */
@@ -880,6 +898,10 @@ struct mlx5_flow_meter_info {
 	/**< Flow meter action. */
 	void *meter_action_y;
 	/**< Flow meter action for yellow init_color. */
+	uint32_t meter_offset;
+	/**< Flow meter offset. */
+	uint16_t group;
+	/**< Flow meter group. */
 };
 
 /* PPS(packets per second) map to BPS(Bytes per second).
@@ -914,6 +936,7 @@ struct mlx5_flow_meter_profile {
 	uint32_t ref_cnt; /**< Use count. */
 	uint32_t g_support:1; /**< If G color will be generated. */
 	uint32_t y_support:1; /**< If Y color will be generated. */
+	uint32_t initialized:1; /**< Initialized. */
 };
 
 /* 2 meters in each ASO cache line */
@@ -934,13 +957,20 @@ enum mlx5_aso_mtr_state {
 	ASO_METER_READY, /* CQE received. */
 };
 
+/* ASO flow meter type. */
+enum mlx5_aso_mtr_type {
+	ASO_METER_INDIRECT,
+	ASO_METER_DIRECT,
+};
+
 /* Generic aso_flow_meter information. */
 struct mlx5_aso_mtr {
 	LIST_ENTRY(mlx5_aso_mtr) next;
+	enum mlx5_aso_mtr_type type;
 	struct mlx5_flow_meter_info fm;
 	/**< Pointer to the next aso flow meter structure. */
 	uint8_t state; /**< ASO flow meter state. */
-	uint8_t offset;
+	uint32_t offset;
 };
 
 /* Generic aso_flow_meter pool structure. */
@@ -964,6 +994,14 @@ struct mlx5_aso_mtr_pools_mng {
 	struct mlx5_aso_mtr_pool **pools; /* ASO flow meter pool array. */
 };
 
+/* Bulk management structure for ASO flow meter. */
+struct mlx5_mtr_bulk {
+	uint32_t size; /* Number of ASO objects. */
+	struct mlx5dr_action *action; /* HWS action */
+	struct mlx5_devx_obj *devx_obj; /* DEVX object. */
+	struct mlx5_aso_mtr *aso; /* Array of ASO objects. */
+};
+
 /* Meter management structure for global flow meter resource. */
 struct mlx5_flow_mtr_mng {
 	struct mlx5_aso_mtr_pools_mng pools_mng;
@@ -1017,6 +1055,7 @@ struct mlx5_flow_tbl_resource {
 #define MLX5_FLOW_TABLE_LEVEL_METER (MLX5_MAX_TABLES - 3)
 #define MLX5_FLOW_TABLE_LEVEL_POLICY (MLX5_MAX_TABLES - 4)
 #define MLX5_MAX_TABLES_EXTERNAL MLX5_FLOW_TABLE_LEVEL_POLICY
+#define MLX5_FLOW_TABLE_HWS_POLICY (MLX5_MAX_TABLES - 10)
 #define MLX5_MAX_TABLES_FDB UINT16_MAX
 #define MLX5_FLOW_TABLE_FACTOR 10
 
@@ -1303,6 +1342,12 @@ TAILQ_HEAD(mlx5_mtr_profiles, mlx5_flow_meter_profile);
 /* MTR list. */
 TAILQ_HEAD(mlx5_legacy_flow_meters, mlx5_legacy_flow_meter);
 
+struct mlx5_mtr_config {
+	uint32_t nb_meters; /**< Number of configured meters */
+	uint32_t nb_meter_profiles; /**< Number of configured meter profiles */
+	uint32_t nb_meter_policies; /**< Number of configured meter policies */
+};
+
 /* RSS description. */
 struct mlx5_flow_rss_desc {
 	uint32_t level;
@@ -1538,12 +1583,16 @@ struct mlx5_priv {
 	struct mlx5_nl_vlan_vmwa_context *vmwa_context; /* VLAN WA context. */
 	struct mlx5_hlist *mreg_cp_tbl;
 	/* Hash table of Rx metadata register copy table. */
+	struct mlx5_mtr_config mtr_config; /* Meter configuration */
 	uint8_t mtr_sfx_reg; /* Meter prefix-suffix flow match REG_C. */
 	uint8_t mtr_color_reg; /* Meter color match REG_C. */
 	struct mlx5_legacy_flow_meters flow_meters; /* MTR list. */
 	struct mlx5_l3t_tbl *mtr_profile_tbl; /* Meter index lookup table. */
+	struct mlx5_flow_meter_profile *mtr_profile_arr; /* Profile array. */
 	struct mlx5_l3t_tbl *policy_idx_tbl; /* Policy index lookup table. */
+	struct mlx5_flow_meter_policy *mtr_policy_arr; /* Policy array. */
 	struct mlx5_l3t_tbl *mtr_idx_tbl; /* Meter index lookup table. */
+	struct mlx5_mtr_bulk mtr_bulk; /* Meter index mapping for HWS */
 	uint8_t skip_default_rss_reta; /* Skip configuration of default reta. */
 	uint8_t fdb_def_rule; /* Whether fdb jump to table 1 is configured. */
 	struct mlx5_mp_id mp_id; /* ID of a multi-process process */
@@ -1557,13 +1606,13 @@ struct mlx5_priv {
 	struct mlx5_flex_item flex_item[MLX5_PORT_FLEX_ITEM_NUM];
 	/* Flex items have been created on the port. */
 	uint32_t flex_item_map; /* Map of allocated flex item elements. */
+	uint32_t nb_queue; /* HW steering queue number. */
 #if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
 	/* Item template list. */
 	LIST_HEAD(flow_hw_itt, rte_flow_pattern_template) flow_hw_itt;
 	/* Action template list. */
 	LIST_HEAD(flow_hw_at, rte_flow_actions_template) flow_hw_at;
 	struct mlx5dr_context *dr_ctx; /**< HW steering DR context. */
-	uint32_t nb_queue; /* HW steering queue number. */
 	/* HW steering queue polling mechanism job descriptor LIFO. */
 	struct mlx5_hw_q *hw_q;
 	/* HW steering rte flow table list header. */
@@ -1579,6 +1628,7 @@ struct mlx5_priv {
 
 #define PORT_ID(priv) ((priv)->dev_data->port_id)
 #define ETH_DEV(priv) (&rte_eth_devices[PORT_ID(priv)])
+#define CTRL_QUEUE_ID(priv) ((priv)->nb_queue - 1)
 
 struct rte_hairpin_peer_info {
 	uint32_t qp_id;
@@ -1890,6 +1940,11 @@ void mlx5_pmd_socket_uninit(void);
 
 /* mlx5_flow_meter.c */
 
+int mlx5_flow_meter_init(struct rte_eth_dev *dev,
+			 uint32_t nb_meters,
+			 uint32_t nb_meter_profiles,
+			 uint32_t nb_meter_policies);
+void mlx5_flow_meter_uninit(struct rte_eth_dev *dev);
 int mlx5_flow_meter_ops_get(struct rte_eth_dev *dev, void *arg);
 struct mlx5_flow_meter_info *mlx5_flow_meter_find(struct mlx5_priv *priv,
 		uint32_t meter_id, uint32_t *mtr_idx);
@@ -1964,7 +2019,7 @@ int mlx5_aso_flow_hit_queue_poll_stop(struct mlx5_dev_ctx_shared *sh);
 void mlx5_aso_queue_uninit(struct mlx5_dev_ctx_shared *sh,
 		enum mlx5_access_aso_opc_mod aso_opc_mod);
 int mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
-		struct mlx5_aso_mtr *mtr);
+		struct mlx5_aso_mtr *mtr, struct mlx5_mtr_bulk *bulk);
 int mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh,
 		struct mlx5_aso_mtr *mtr);
 int mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index b570ed7f69..fb3be940e5 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -8331,6 +8331,40 @@ mlx5_flow_port_configure(struct rte_eth_dev *dev,
 	return fops->configure(dev, port_attr, nb_queue, queue_attr, error);
 }
 
+/**
+ * Validate item template.
+ *
+ * @param[in] dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param[in] attr
+ *   Pointer to the item template attributes.
+ * @param[in] items
+ *   The template item pattern.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_flow_pattern_validate(struct rte_eth_dev *dev,
+		const struct rte_flow_pattern_template_attr *attr,
+		const struct rte_flow_item items[],
+		struct rte_flow_error *error)
+{
+	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
+
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW) {
+		rte_flow_error_set(error, ENOTSUP,
+			RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+			"pattern validate with incorrect steering mode");
+		return -ENOTSUP;
+	}
+	fops = flow_get_drv_ops(MLX5_FLOW_TYPE_HW);
+	return fops->pattern_validate(dev, attr, items, error);
+}
+
 /**
  * Create flow item template.
  *
@@ -8396,6 +8430,43 @@ mlx5_flow_pattern_template_destroy(struct rte_eth_dev *dev,
 	return fops->pattern_template_destroy(dev, template, error);
 }
 
+/**
+ * Validate flow actions template.
+ *
+ * @param[in] dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param[in] attr
+ *   Pointer to the action template attributes.
+ * @param[in] actions
+ *   Associated actions (list terminated by the END action).
+ * @param[in] masks
+ *   List of actions that marks which of the action's member is constant.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_flow_actions_validate(struct rte_eth_dev *dev,
+			const struct rte_flow_actions_template_attr *attr,
+			const struct rte_flow_action actions[],
+			const struct rte_flow_action masks[],
+			struct rte_flow_error *error)
+{
+	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
+
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW) {
+		rte_flow_error_set(error, ENOTSUP,
+			RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+			"actions validate with incorrect steering mode");
+		return -ENOTSUP;
+	}
+	fops = flow_get_drv_ops(MLX5_FLOW_TYPE_HW);
+	return fops->actions_validate(dev, attr, actions, masks, error);
+}
+
 /**
  * Create flow item template.
  *
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index a8b27ea494..3bde95c927 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1654,6 +1654,11 @@ typedef int (*mlx5_flow_port_configure_t)
 			 uint16_t nb_queue,
 			 const struct rte_flow_queue_attr *queue_attr[],
 			 struct rte_flow_error *err);
+typedef int (*mlx5_flow_pattern_validate_t)
+			(struct rte_eth_dev *dev,
+			 const struct rte_flow_pattern_template_attr *attr,
+			 const struct rte_flow_item items[],
+			 struct rte_flow_error *error);
 typedef struct rte_flow_pattern_template *(*mlx5_flow_pattern_template_create_t)
 			(struct rte_eth_dev *dev,
 			 const struct rte_flow_pattern_template_attr *attr,
@@ -1663,6 +1668,12 @@ typedef int (*mlx5_flow_pattern_template_destroy_t)
 			(struct rte_eth_dev *dev,
 			 struct rte_flow_pattern_template *template,
 			 struct rte_flow_error *error);
+typedef int (*mlx5_flow_actions_validate_t)
+			(struct rte_eth_dev *dev,
+			 const struct rte_flow_actions_template_attr *attr,
+			 const struct rte_flow_action actions[],
+			 const struct rte_flow_action masks[],
+			 struct rte_flow_error *error);
 typedef struct rte_flow_actions_template *(*mlx5_flow_actions_template_create_t)
 			(struct rte_eth_dev *dev,
 			 const struct rte_flow_actions_template_attr *attr,
@@ -1779,8 +1790,10 @@ struct mlx5_flow_driver_ops {
 	mlx5_flow_item_update_t item_update;
 	mlx5_flow_info_get_t info_get;
 	mlx5_flow_port_configure_t configure;
+	mlx5_flow_pattern_validate_t pattern_validate;
 	mlx5_flow_pattern_template_create_t pattern_template_create;
 	mlx5_flow_pattern_template_destroy_t pattern_template_destroy;
+	mlx5_flow_actions_validate_t actions_validate;
 	mlx5_flow_actions_template_create_t actions_template_create;
 	mlx5_flow_actions_template_destroy_t actions_template_destroy;
 	mlx5_flow_table_create_t template_table_create;
@@ -1862,6 +1875,8 @@ mlx5_aso_meter_by_idx(struct mlx5_priv *priv, uint32_t idx)
 
 	/* Decrease to original index. */
 	idx--;
+	if (priv->mtr_bulk.aso)
+		return priv->mtr_bulk.aso + idx;
 	MLX5_ASSERT(idx / MLX5_ASO_MTRS_PER_POOL < pools_mng->n);
 	rte_rwlock_read_lock(&pools_mng->resize_mtrwl);
 	pool = pools_mng->pools[idx / MLX5_ASO_MTRS_PER_POOL];
@@ -1964,6 +1979,32 @@ mlx5_translate_tunnel_etypes(uint64_t pattern_flags)
 
 int flow_hw_q_flow_flush(struct rte_eth_dev *dev,
 			 struct rte_flow_error *error);
+
+/*
+ * Convert rte_mtr_color to mlx5 color.
+ *
+ * @param[in] rcol
+ *   rte_mtr_color.
+ *
+ * @return
+ *   mlx5 color.
+ */
+static inline int
+rte_col_2_mlx5_col(enum rte_color rcol)
+{
+	switch (rcol) {
+	case RTE_COLOR_GREEN:
+		return MLX5_FLOW_COLOR_GREEN;
+	case RTE_COLOR_YELLOW:
+		return MLX5_FLOW_COLOR_YELLOW;
+	case RTE_COLOR_RED:
+		return MLX5_FLOW_COLOR_RED;
+	default:
+		break;
+	}
+	return MLX5_FLOW_COLOR_UNDEFINED;
+}
+
 int mlx5_flow_group_to_table(struct rte_eth_dev *dev,
 			     const struct mlx5_flow_tunnel *tunnel,
 			     uint32_t group, uint32_t *table,
@@ -2347,4 +2388,13 @@ int mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev,
 					 uint32_t txq);
 int mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev);
 int mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev);
+int mlx5_flow_actions_validate(struct rte_eth_dev *dev,
+		const struct rte_flow_actions_template_attr *attr,
+		const struct rte_flow_action actions[],
+		const struct rte_flow_action masks[],
+		struct rte_flow_error *error);
+int mlx5_flow_pattern_validate(struct rte_eth_dev *dev,
+		const struct rte_flow_pattern_template_attr *attr,
+		const struct rte_flow_item items[],
+		struct rte_flow_error *error);
 #endif /* RTE_PMD_MLX5_FLOW_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow_aso.c b/drivers/net/mlx5/mlx5_flow_aso.c
index 4129e3a9e0..60d0280367 100644
--- a/drivers/net/mlx5/mlx5_flow_aso.c
+++ b/drivers/net/mlx5/mlx5_flow_aso.c
@@ -642,7 +642,8 @@ mlx5_aso_flow_hit_queue_poll_stop(struct mlx5_dev_ctx_shared *sh)
 static uint16_t
 mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 			       struct mlx5_aso_sq *sq,
-			       struct mlx5_aso_mtr *aso_mtr)
+			       struct mlx5_aso_mtr *aso_mtr,
+			       struct mlx5_mtr_bulk *bulk)
 {
 	volatile struct mlx5_aso_wqe *wqe = NULL;
 	struct mlx5_flow_meter_info *fm = NULL;
@@ -653,6 +654,7 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	uint32_t dseg_idx = 0;
 	struct mlx5_aso_mtr_pool *pool = NULL;
 	uint32_t param_le;
+	int id;
 
 	rte_spinlock_lock(&sq->sqsl);
 	res = size - (uint16_t)(sq->head - sq->tail);
@@ -666,14 +668,19 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	/* Fill next WQE. */
 	fm = &aso_mtr->fm;
 	sq->elts[sq->head & mask].mtr = aso_mtr;
-	pool = container_of(aso_mtr, struct mlx5_aso_mtr_pool,
-			mtrs[aso_mtr->offset]);
-	wqe->general_cseg.misc = rte_cpu_to_be_32(pool->devx_obj->id +
-			(aso_mtr->offset >> 1));
-	wqe->general_cseg.opcode = rte_cpu_to_be_32(MLX5_OPCODE_ACCESS_ASO |
-			(ASO_OPC_MOD_POLICER <<
-			WQE_CSEG_OPC_MOD_OFFSET) |
-			sq->pi << WQE_CSEG_WQE_INDEX_OFFSET);
+	if (aso_mtr->type == ASO_METER_INDIRECT) {
+		pool = container_of(aso_mtr, struct mlx5_aso_mtr_pool,
+				    mtrs[aso_mtr->offset]);
+		id = pool->devx_obj->id;
+	} else {
+		id = bulk->devx_obj->id;
+	}
+	wqe->general_cseg.misc = rte_cpu_to_be_32(id +
+						  (aso_mtr->offset >> 1));
+	wqe->general_cseg.opcode =
+		rte_cpu_to_be_32(MLX5_OPCODE_ACCESS_ASO |
+			(ASO_OPC_MOD_POLICER << WQE_CSEG_OPC_MOD_OFFSET) |
+			 sq->pi << WQE_CSEG_WQE_INDEX_OFFSET);
 	/* There are 2 meters in one ASO cache line. */
 	dseg_idx = aso_mtr->offset & 0x1;
 	wqe->aso_cseg.data_mask =
@@ -811,14 +818,15 @@ mlx5_aso_mtr_completion_handle(struct mlx5_aso_sq *sq)
  */
 int
 mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
-			struct mlx5_aso_mtr *mtr)
+			struct mlx5_aso_mtr *mtr,
+			struct mlx5_mtr_bulk *bulk)
 {
 	struct mlx5_aso_sq *sq = &sh->mtrmng->pools_mng.sq;
 	uint32_t poll_wqe_times = MLX5_MTR_POLL_WQE_CQE_TIMES;
 
 	do {
 		mlx5_aso_mtr_completion_handle(sq);
-		if (mlx5_aso_mtr_sq_enqueue_single(sh, sq, mtr))
+		if (mlx5_aso_mtr_sq_enqueue_single(sh, sq, mtr, bulk))
 			return 0;
 		/* Waiting for wqe resource. */
 		rte_delay_us_sleep(MLX5_ASO_WQE_CQE_RESPONSE_DELAY);
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 5b72cfaa61..1eb1ce659f 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -216,31 +216,6 @@ flow_dv_attr_init(const struct rte_flow_item *item, union flow_dv_attr *attr,
 	attr->valid = 1;
 }
 
-/*
- * Convert rte_mtr_color to mlx5 color.
- *
- * @param[in] rcol
- *   rte_mtr_color.
- *
- * @return
- *   mlx5 color.
- */
-static inline int
-rte_col_2_mlx5_col(enum rte_color rcol)
-{
-	switch (rcol) {
-	case RTE_COLOR_GREEN:
-		return MLX5_FLOW_COLOR_GREEN;
-	case RTE_COLOR_YELLOW:
-		return MLX5_FLOW_COLOR_YELLOW;
-	case RTE_COLOR_RED:
-		return MLX5_FLOW_COLOR_RED;
-	default:
-		break;
-	}
-	return MLX5_FLOW_COLOR_UNDEFINED;
-}
-
 struct field_modify_info modify_eth[] = {
 	{4,  0, MLX5_MODI_OUT_DMAC_47_16},
 	{2,  4, MLX5_MODI_OUT_DMAC_15_0},
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 64d06d4fb4..c2e16bc56d 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -914,6 +914,38 @@ flow_hw_represented_port_compile(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static __rte_always_inline int
+flow_hw_meter_compile(struct rte_eth_dev *dev,
+		      const struct mlx5_flow_template_table_cfg *cfg,
+		      uint32_t  start_pos, const struct rte_flow_action *action,
+		      struct mlx5_hw_actions *acts, uint32_t *end_pos,
+		      struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr *aso_mtr;
+	const struct rte_flow_action_meter *meter = action->conf;
+	uint32_t pos = start_pos;
+	uint32_t group = cfg->attr.flow_attr.group;
+
+	aso_mtr = mlx5_aso_meter_by_idx(priv, meter->mtr_id);
+	acts->rule_acts[pos].action = priv->mtr_bulk.action;
+	acts->rule_acts[pos].aso_meter.offset = aso_mtr->offset;
+	acts->jump = flow_hw_jump_action_register
+			(dev, cfg, aso_mtr->fm.group, error);
+	if (!acts->jump) {
+		*end_pos = start_pos;
+		return -ENOMEM;
+	}
+	acts->rule_acts[++pos].action = (!!group) ?
+				    acts->jump->hws_action :
+				    acts->jump->root_action;
+	*end_pos = pos;
+	if (mlx5_aso_mtr_wait(priv->sh, aso_mtr)) {
+		*end_pos = start_pos;
+		return -ENOMEM;
+	}
+	return 0;
+}
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -1142,6 +1174,21 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 				goto err;
 			i++;
 			break;
+		case RTE_FLOW_ACTION_TYPE_METER:
+			if (actions->conf && masks->conf &&
+			    ((const struct rte_flow_action_meter *)
+			     masks->conf)->mtr_id) {
+				err = flow_hw_meter_compile(dev, cfg,
+						i, actions, acts, &i, error);
+				if (err)
+					goto err;
+			} else if (__flow_hw_act_data_general_append(priv, acts,
+							actions->type,
+							actions - action_start,
+							i))
+				goto err;
+			i++;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -1482,6 +1529,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	const struct rte_flow_action_raw_encap *raw_encap_data;
 	const struct rte_flow_item *enc_item = NULL;
 	const struct rte_flow_action_ethdev *port_action = NULL;
+	const struct rte_flow_action_meter *meter = NULL;
 	uint8_t *buf = job->encap_data;
 	struct rte_flow_attr attr = {
 			.ingress = 1,
@@ -1489,6 +1537,8 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	uint32_t ft_flag;
 	size_t encap_len = 0;
 	int ret;
+	struct mlx5_aso_mtr *mtr;
+	uint32_t mtr_id;
 
 	memcpy(rule_acts, hw_acts->rule_acts,
 	       sizeof(*rule_acts) * hw_acts->acts_num);
@@ -1608,6 +1658,29 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			rule_acts[act_data->action_dst].action =
 					priv->hw_vport[port_action->port_id];
 			break;
+		case RTE_FLOW_ACTION_TYPE_METER:
+			meter = action->conf;
+			mtr_id = meter->mtr_id;
+			mtr = mlx5_aso_meter_by_idx(priv, mtr_id);
+			rule_acts[act_data->action_dst].action =
+				priv->mtr_bulk.action;
+			rule_acts[act_data->action_dst].aso_meter.offset =
+								mtr->offset;
+			jump = flow_hw_jump_action_register
+				(dev, &table->cfg, mtr->fm.group, NULL);
+			if (!jump)
+				return -1;
+			MLX5_ASSERT
+				(!rule_acts[act_data->action_dst + 1].action);
+			rule_acts[act_data->action_dst + 1].action =
+					(!!attr.group) ? jump->hws_action :
+							 jump->root_action;
+			job->flow->jump = jump;
+			job->flow->fate_type = MLX5_FLOW_FATE_JUMP;
+			(*acts_num)++;
+			if (mlx5_aso_mtr_wait(priv->sh, mtr))
+				return -1;
+			break;
 		default:
 			break;
 		}
@@ -2523,7 +2596,7 @@ flow_hw_action_meta_copy_insert(const struct rte_flow_action actions[],
 }
 
 static int
-flow_hw_action_validate(struct rte_eth_dev *dev,
+flow_hw_actions_validate(struct rte_eth_dev *dev,
 			const struct rte_flow_actions_template_attr *attr,
 			const struct rte_flow_action actions[],
 			const struct rte_flow_action masks[],
@@ -2589,6 +2662,9 @@ flow_hw_action_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_RAW_DECAP:
 			/* TODO: Validation logic */
 			break;
+		case RTE_FLOW_ACTION_TYPE_METER:
+			/* TODO: Validation logic */
+			break;
 		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
 			ret = flow_hw_validate_action_modify_field(action,
 									mask,
@@ -2682,7 +2758,7 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 		.conf = &rx_mreg_mask,
 	};
 
-	if (flow_hw_action_validate(dev, attr, actions, masks, error))
+	if (flow_hw_actions_validate(dev, attr, actions, masks, error))
 		return NULL;
 	if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
 	    priv->sh->config.dv_esw_en) {
@@ -3028,15 +3104,27 @@ flow_hw_pattern_template_destroy(struct rte_eth_dev *dev __rte_unused,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-flow_hw_info_get(struct rte_eth_dev *dev __rte_unused,
-		 struct rte_flow_port_info *port_info __rte_unused,
-		 struct rte_flow_queue_info *queue_info __rte_unused,
+flow_hw_info_get(struct rte_eth_dev *dev,
+		 struct rte_flow_port_info *port_info,
+		 struct rte_flow_queue_info *queue_info,
 		 struct rte_flow_error *error __rte_unused)
 {
-	/* Nothing to be updated currently. */
+	uint16_t port_id = dev->data->port_id;
+	struct rte_mtr_capabilities mtr_cap;
+	int ret;
+
 	memset(port_info, 0, sizeof(*port_info));
 	/* Queue size is unlimited from low-level. */
+	port_info->max_nb_queues = UINT32_MAX;
 	queue_info->max_size = UINT32_MAX;
+
+	memset(&mtr_cap, 0, sizeof(struct rte_mtr_capabilities));
+	ret = rte_mtr_capabilities_get(port_id, &mtr_cap, NULL);
+	if (!ret) {
+		port_info->max_nb_meters = mtr_cap.n_max;
+		port_info->max_nb_meter_profiles = UINT32_MAX;
+		port_info->max_nb_meter_policies = UINT32_MAX;
+	}
 	return 0;
 }
 
@@ -4231,6 +4319,13 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	priv->nb_queue = nb_q_updated;
 	rte_spinlock_init(&priv->hw_ctrl_lock);
 	LIST_INIT(&priv->hw_ctrl_flows);
+	/* Initialize meter library. */
+	if (port_attr->nb_meters)
+		if (mlx5_flow_meter_init(dev,
+					port_attr->nb_meters,
+					port_attr->nb_meter_profiles,
+					port_attr->nb_meter_policies))
+			goto err;
 	/* Add global actions. */
 	for (i = 0; i < MLX5_HW_ACTION_FLAG_MAX; i++) {
 		uint32_t act_flags = 0;
@@ -4546,8 +4641,10 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.info_get = flow_hw_info_get,
 	.configure = flow_hw_configure,
+	.pattern_validate = flow_hw_pattern_validate,
 	.pattern_template_create = flow_hw_pattern_template_create,
 	.pattern_template_destroy = flow_hw_pattern_template_destroy,
+	.actions_validate = flow_hw_actions_validate,
 	.actions_template_create = flow_hw_actions_template_create,
 	.actions_template_destroy = flow_hw_actions_template_destroy,
 	.template_table_create = flow_hw_template_table_create,
@@ -4603,7 +4700,7 @@ flow_hw_create_ctrl_flow(struct rte_eth_dev *owner_dev,
 			 uint8_t action_template_idx)
 {
 	struct mlx5_priv *priv = proxy_dev->data->dev_private;
-	uint32_t queue = priv->nb_queue - 1;
+	uint32_t queue = CTRL_QUEUE_ID(priv);
 	struct rte_flow_op_attr op_attr = {
 		.postpone = 0,
 	};
@@ -4678,7 +4775,7 @@ static int
 flow_hw_destroy_ctrl_flow(struct rte_eth_dev *dev, struct rte_flow *flow)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	uint32_t queue = priv->nb_queue - 1;
+	uint32_t queue = CTRL_QUEUE_ID(priv);
 	struct rte_flow_op_attr op_attr = {
 		.postpone = 0,
 	};
@@ -5036,4 +5133,155 @@ mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev)
 					eth_all, 0, copy_reg_action, 0);
 }
 
+void
+mlx5_flow_meter_uninit(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (priv->mtr_policy_arr) {
+		mlx5_free(priv->mtr_policy_arr);
+		priv->mtr_policy_arr = NULL;
+	}
+	if (priv->mtr_profile_arr) {
+		mlx5_free(priv->mtr_profile_arr);
+		priv->mtr_profile_arr = NULL;
+	}
+	if (priv->mtr_bulk.aso) {
+		mlx5_free(priv->mtr_bulk.aso);
+		priv->mtr_bulk.aso = NULL;
+		priv->mtr_bulk.size = 0;
+		mlx5_aso_queue_uninit(priv->sh, ASO_OPC_MOD_POLICER);
+	}
+	if (priv->mtr_bulk.action) {
+		mlx5dr_action_destroy(priv->mtr_bulk.action);
+		priv->mtr_bulk.action = NULL;
+	}
+	if (priv->mtr_bulk.devx_obj) {
+		claim_zero(mlx5_devx_cmd_destroy(priv->mtr_bulk.devx_obj));
+		priv->mtr_bulk.devx_obj = NULL;
+	}
+}
+
+int
+mlx5_flow_meter_init(struct rte_eth_dev *dev,
+		     uint32_t nb_meters,
+		     uint32_t nb_meter_profiles,
+		     uint32_t nb_meter_policies)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_devx_obj *dcs = NULL;
+	uint32_t log_obj_size;
+	int ret = 0;
+	int reg_id;
+	struct mlx5_aso_mtr *aso;
+	uint32_t i;
+	struct rte_flow_error error;
+
+	if (!nb_meters || !nb_meter_profiles || !nb_meter_policies) {
+		ret = ENOTSUP;
+		rte_flow_error_set(&error, ENOMEM,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter configuration is invalid.");
+		goto err;
+	}
+	if (!priv->mtr_en || !priv->sh->meter_aso_en) {
+		ret = ENOTSUP;
+		rte_flow_error_set(&error, ENOMEM,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter ASO is not supported.");
+		goto err;
+	}
+	priv->mtr_config.nb_meters = nb_meters;
+	if (mlx5_aso_queue_init(priv->sh, ASO_OPC_MOD_POLICER)) {
+		ret = ENOMEM;
+		rte_flow_error_set(&error, ENOMEM,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter ASO queue allocation failed.");
+		goto err;
+	}
+	log_obj_size = rte_log2_u32(nb_meters >> 1);
+	dcs = mlx5_devx_cmd_create_flow_meter_aso_obj
+		(priv->sh->cdev->ctx, priv->sh->cdev->pdn,
+			log_obj_size);
+	if (!dcs) {
+		ret = ENOMEM;
+		rte_flow_error_set(&error, ENOMEM,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter ASO object allocation failed.");
+		goto err;
+	}
+	priv->mtr_bulk.devx_obj = dcs;
+	reg_id = mlx5_flow_get_reg_id(dev, MLX5_MTR_COLOR, 0, NULL);
+	if (reg_id < 0) {
+		ret = ENOTSUP;
+		rte_flow_error_set(&error, ENOMEM,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter register is not available.");
+		goto err;
+	}
+	priv->mtr_bulk.action = mlx5dr_action_create_aso_meter
+			(priv->dr_ctx, (struct mlx5dr_devx_obj *)dcs,
+				reg_id - REG_C_0, MLX5DR_ACTION_FLAG_HWS_RX |
+				MLX5DR_ACTION_FLAG_HWS_TX |
+				MLX5DR_ACTION_FLAG_HWS_FDB);
+	if (!priv->mtr_bulk.action) {
+		ret = ENOMEM;
+		rte_flow_error_set(&error, ENOMEM,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter action creation failed.");
+		goto err;
+	}
+	priv->mtr_bulk.aso = mlx5_malloc(MLX5_MEM_ZERO,
+						sizeof(struct mlx5_aso_mtr) * nb_meters,
+						RTE_CACHE_LINE_SIZE,
+						SOCKET_ID_ANY);
+	if (!priv->mtr_bulk.aso) {
+		ret = ENOMEM;
+		rte_flow_error_set(&error, ENOMEM,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter bulk ASO allocation failed.");
+		goto err;
+	}
+	priv->mtr_bulk.size = nb_meters;
+	aso = priv->mtr_bulk.aso;
+	for (i = 0; i < priv->mtr_bulk.size; i++) {
+		aso->type = ASO_METER_DIRECT;
+		aso->state = ASO_METER_WAIT;
+		aso->offset = i;
+		aso++;
+	}
+	priv->mtr_config.nb_meter_profiles = nb_meter_profiles;
+	priv->mtr_profile_arr =
+		mlx5_malloc(MLX5_MEM_ZERO,
+				sizeof(struct mlx5_flow_meter_profile) *
+				nb_meter_profiles,
+				RTE_CACHE_LINE_SIZE,
+				SOCKET_ID_ANY);
+	if (!priv->mtr_profile_arr) {
+		ret = ENOMEM;
+		rte_flow_error_set(&error, ENOMEM,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter profile allocation failed.");
+		goto err;
+	}
+	priv->mtr_config.nb_meter_policies = nb_meter_policies;
+	priv->mtr_policy_arr =
+		mlx5_malloc(MLX5_MEM_ZERO,
+				sizeof(struct mlx5_flow_meter_policy) *
+				nb_meter_policies,
+				RTE_CACHE_LINE_SIZE,
+				SOCKET_ID_ANY);
+	if (!priv->mtr_policy_arr) {
+		ret = ENOMEM;
+		rte_flow_error_set(&error, ENOMEM,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter policy allocation failed.");
+		goto err;
+	}
+	return 0;
+err:
+	mlx5_flow_meter_uninit(dev);
+	return ret;
+}
+
 #endif
diff --git a/drivers/net/mlx5/mlx5_flow_meter.c b/drivers/net/mlx5/mlx5_flow_meter.c
index d4aafe4eea..792b945c98 100644
--- a/drivers/net/mlx5/mlx5_flow_meter.c
+++ b/drivers/net/mlx5/mlx5_flow_meter.c
@@ -98,6 +98,8 @@ mlx5_flow_meter_profile_find(struct mlx5_priv *priv, uint32_t meter_profile_id)
 	union mlx5_l3t_data data;
 	int32_t ret;
 
+	if (priv->mtr_profile_arr)
+		return &priv->mtr_profile_arr[meter_profile_id];
 	if (mlx5_l3t_get_entry(priv->mtr_profile_tbl,
 			       meter_profile_id, &data) || !data.ptr)
 		return NULL;
@@ -145,17 +147,29 @@ mlx5_flow_meter_profile_validate(struct rte_eth_dev *dev,
 					  RTE_MTR_ERROR_TYPE_METER_PROFILE,
 					  NULL, "Meter profile is null.");
 	/* Meter profile ID must be valid. */
-	if (meter_profile_id == UINT32_MAX)
-		return -rte_mtr_error_set(error, EINVAL,
-					  RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
-					  NULL, "Meter profile id not valid.");
-	/* Meter profile must not exist. */
-	fmp = mlx5_flow_meter_profile_find(priv, meter_profile_id);
-	if (fmp)
-		return -rte_mtr_error_set(error, EEXIST,
-					  RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
-					  NULL,
-					  "Meter profile already exists.");
+	if (priv->mtr_profile_arr) {
+		if (meter_profile_id >= priv->mtr_config.nb_meter_profiles)
+			return -rte_mtr_error_set(error, EINVAL,
+					RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					NULL, "Meter profile id not valid.");
+		fmp = mlx5_flow_meter_profile_find(priv, meter_profile_id);
+		/* Meter profile must not exist. */
+		if (fmp->initialized)
+			return -rte_mtr_error_set(error, EEXIST,
+					RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					NULL, "Meter profile already exists.");
+	} else {
+		if (meter_profile_id == UINT32_MAX)
+			return -rte_mtr_error_set(error, EINVAL,
+					RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					NULL, "Meter profile id not valid.");
+		fmp = mlx5_flow_meter_profile_find(priv, meter_profile_id);
+		/* Meter profile must not exist. */
+		if (fmp)
+			return -rte_mtr_error_set(error, EEXIST,
+					RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					NULL, "Meter profile already exists.");
+	}
 	if (!priv->sh->meter_aso_en) {
 		/* Old version is even not supported. */
 		if (!priv->sh->cdev->config.hca_attr.qos.flow_meter_old)
@@ -574,6 +588,96 @@ mlx5_flow_meter_profile_delete(struct rte_eth_dev *dev,
 	return 0;
 }
 
+/**
+ * Callback to add MTR profile with HWS.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] meter_profile_id
+ *   Meter profile id.
+ * @param[in] profile
+ *   Pointer to meter profile detail.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_profile_hws_add(struct rte_eth_dev *dev,
+			uint32_t meter_profile_id,
+			struct rte_mtr_meter_profile *profile,
+			struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_meter_profile *fmp;
+	int ret;
+
+	if (!priv->mtr_profile_arr)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "Meter profile array is not allocated");
+	/* Check input params. */
+	ret = mlx5_flow_meter_profile_validate(dev, meter_profile_id,
+					       profile, error);
+	if (ret)
+		return ret;
+	fmp = mlx5_flow_meter_profile_find(priv, meter_profile_id);
+	/* Fill profile info. */
+	fmp->id = meter_profile_id;
+	fmp->profile = *profile;
+	fmp->initialized = 1;
+	/* Fill the flow meter parameters for the PRM. */
+	return mlx5_flow_meter_param_fill(fmp, error);
+}
+
+/**
+ * Callback to delete MTR profile with HWS.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] meter_profile_id
+ *   Meter profile id.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_profile_hws_delete(struct rte_eth_dev *dev,
+			uint32_t meter_profile_id,
+			struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_meter_profile *fmp;
+
+	if (!priv->mtr_profile_arr)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "Meter profile array is not allocated");
+	/* Meter id must be valid. */
+	if (meter_profile_id >= priv->mtr_config.nb_meter_profiles)
+		return -rte_mtr_error_set(error, EINVAL,
+					  RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					  &meter_profile_id,
+					  "Meter profile id not valid.");
+	/* Meter profile must exist. */
+	fmp = mlx5_flow_meter_profile_find(priv, meter_profile_id);
+	if (!fmp->initialized)
+		return -rte_mtr_error_set(error, ENOENT,
+					  RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					  &meter_profile_id,
+					  "Meter profile id is invalid.");
+	/* Check profile is unused. */
+	if (fmp->ref_cnt)
+		return -rte_mtr_error_set(error, EBUSY,
+					  RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					  NULL, "Meter profile is in use.");
+	memset(fmp, 0, sizeof(struct mlx5_flow_meter_profile));
+	return 0;
+}
+
 /**
  * Find policy by id.
  *
@@ -594,6 +698,11 @@ mlx5_flow_meter_policy_find(struct rte_eth_dev *dev,
 	struct mlx5_flow_meter_sub_policy *sub_policy = NULL;
 	union mlx5_l3t_data data;
 
+	if (priv->mtr_policy_arr) {
+		if (policy_idx)
+			*policy_idx = policy_id;
+		return &priv->mtr_policy_arr[policy_id];
+	}
 	if (policy_id > MLX5_MAX_SUB_POLICY_TBL_NUM || !priv->policy_idx_tbl)
 		return NULL;
 	if (mlx5_l3t_get_entry(priv->policy_idx_tbl, policy_id, &data) ||
@@ -710,6 +819,43 @@ mlx5_flow_meter_policy_validate(struct rte_eth_dev *dev,
 	return 0;
 }
 
+/**
+ * Callback to check MTR policy action validate for HWS
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] actions
+ *   Pointer to meter policy action detail.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_policy_hws_validate(struct rte_eth_dev *dev,
+	struct rte_mtr_meter_policy_params *policy,
+	struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_actions_template_attr attr = {
+		.transfer = priv->sh->config.dv_esw_en ? 1 : 0 };
+	int ret;
+	int i;
+
+	if (!priv->mtr_en || !priv->sh->meter_aso_en)
+		return -rte_mtr_error_set(error, ENOTSUP,
+				RTE_MTR_ERROR_TYPE_METER_POLICY,
+				NULL, "meter policy unsupported.");
+	for (i = 0; i < RTE_COLORS; i++) {
+		ret = mlx5_flow_actions_validate(dev, &attr, policy->actions[i],
+						 policy->actions[i], NULL);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
 static int
 __mlx5_flow_meter_policy_delete(struct rte_eth_dev *dev,
 			uint32_t policy_id,
@@ -1004,6 +1150,338 @@ mlx5_flow_meter_policy_delete(struct rte_eth_dev *dev,
 	return 0;
 }
 
+/**
+ * Callback to delete MTR policy for HWS.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] policy_id
+ *   Meter policy id.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_policy_hws_delete(struct rte_eth_dev *dev,
+			  uint32_t policy_id,
+			  struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_meter_policy *mtr_policy;
+	uint32_t i, j;
+	uint32_t nb_flows = 0;
+	int ret;
+	struct rte_flow_op_attr op_attr = { .postpone = 1 };
+	struct rte_flow_op_result result[RTE_COLORS * MLX5_MTR_DOMAIN_MAX];
+
+	if (!priv->mtr_policy_arr)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "Meter policy array is not allocated");
+	/* Meter id must be valid. */
+	if (policy_id >= priv->mtr_config.nb_meter_policies)
+		return -rte_mtr_error_set(error, EINVAL,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+					  &policy_id,
+					  "Meter policy id not valid.");
+	/* Meter policy must exist. */
+	mtr_policy = mlx5_flow_meter_policy_find(dev, policy_id, NULL);
+	if (!mtr_policy->initialized)
+		return -rte_mtr_error_set(error, ENOENT,
+			RTE_MTR_ERROR_TYPE_METER_POLICY_ID, NULL,
+			"Meter policy does not exist.");
+	/* Check policy is unused. */
+	if (mtr_policy->ref_cnt)
+		return -rte_mtr_error_set(error, EBUSY,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+					  NULL, "Meter policy is in use.");
+	rte_spinlock_lock(&priv->hw_ctrl_lock);
+	for (i = 0; i < MLX5_MTR_DOMAIN_MAX; i++) {
+		for (j = 0; j < RTE_COLORS; j++) {
+			if (mtr_policy->hws_flow_rule[i][j]) {
+				ret = rte_flow_async_destroy(dev->data->port_id,
+					CTRL_QUEUE_ID(priv), &op_attr,
+					mtr_policy->hws_flow_rule[i][j],
+					NULL, NULL);
+				if (ret < 0)
+					continue;
+				nb_flows++;
+			}
+		}
+	}
+	ret = rte_flow_push(dev->data->port_id, CTRL_QUEUE_ID(priv), NULL);
+	while (nb_flows && (ret >= 0)) {
+		ret = rte_flow_pull(dev->data->port_id,
+					CTRL_QUEUE_ID(priv), result,
+					nb_flows, NULL);
+		nb_flows -= ret;
+	}
+	for (i = 0; i < MLX5_MTR_DOMAIN_MAX; i++) {
+		if (mtr_policy->hws_flow_table[i])
+			rte_flow_template_table_destroy(dev->data->port_id,
+				 mtr_policy->hws_flow_table[i], NULL);
+	}
+	for (i = 0; i < RTE_COLORS; i++) {
+		if (mtr_policy->hws_act_templ[i])
+			rte_flow_actions_template_destroy(dev->data->port_id,
+				 mtr_policy->hws_act_templ[i], NULL);
+	}
+	if (mtr_policy->hws_item_templ)
+		rte_flow_pattern_template_destroy(dev->data->port_id,
+				mtr_policy->hws_item_templ, NULL);
+	rte_spinlock_unlock(&priv->hw_ctrl_lock);
+	memset(mtr_policy, 0, sizeof(struct mlx5_flow_meter_policy));
+	return 0;
+}
+
+/**
+ * Callback to add MTR policy for HWS.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[out] policy_id
+ *   Pointer to policy id
+ * @param[in] actions
+ *   Pointer to meter policy action detail.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_policy_hws_add(struct rte_eth_dev *dev,
+			uint32_t policy_id,
+			struct rte_mtr_meter_policy_params *policy,
+			struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_meter_policy *mtr_policy = NULL;
+	const struct rte_flow_action *act;
+	const struct rte_flow_action_meter *mtr;
+	struct mlx5_flow_meter_info *fm;
+	struct mlx5_flow_meter_policy *plc;
+	uint8_t domain_color = MLX5_MTR_ALL_DOMAIN_BIT;
+	bool is_rss = false;
+	bool is_hierarchy = false;
+	int i, j;
+	uint32_t nb_colors = 0;
+	uint32_t nb_flows = 0;
+	int color;
+	int ret;
+	struct rte_flow_pattern_template_attr pta = {0};
+	struct rte_flow_actions_template_attr ata = {0};
+	struct rte_flow_template_table_attr ta = { {0}, 0 };
+	struct rte_flow_op_attr op_attr = { .postpone = 1 };
+	struct rte_flow_op_result result[RTE_COLORS * MLX5_MTR_DOMAIN_MAX];
+	const uint32_t color_mask = (UINT32_C(1) << MLX5_MTR_COLOR_BITS) - 1;
+	int color_reg_c_idx = mlx5_flow_get_reg_id(dev, MLX5_MTR_COLOR,
+						   0, NULL);
+	struct rte_flow_item_tag tag_spec = {
+		.data = 0,
+		.index = color_reg_c_idx
+	};
+	struct rte_flow_item_tag tag_mask = {
+		.data = color_mask,
+		.index = 0xff};
+	struct rte_flow_item pattern[] = {
+		[0] = {
+			.type = (enum rte_flow_item_type)
+				MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+			.spec = &tag_spec,
+			.mask = &tag_mask,
+		},
+		[1] = { .type = RTE_FLOW_ITEM_TYPE_END }
+	};
+
+	if (!priv->mtr_policy_arr)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY,
+					  NULL, "Meter policy array is not allocated.");
+	if (policy_id >= priv->mtr_config.nb_meter_policies)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+					  NULL, "Meter policy id not valid.");
+	mtr_policy = mlx5_flow_meter_policy_find(dev, policy_id, NULL);
+	if (mtr_policy->initialized)
+		return -rte_mtr_error_set(error, EEXIST,
+			RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+			NULL, "Meter policy already exists.");
+	if (!policy ||
+	    !policy->actions[RTE_COLOR_RED] ||
+	    !policy->actions[RTE_COLOR_YELLOW] ||
+	    !policy->actions[RTE_COLOR_GREEN])
+		return -rte_mtr_error_set(error, EINVAL,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY,
+					  NULL, "Meter policy actions are not valid.");
+	if (policy->actions[RTE_COLOR_RED] == RTE_FLOW_ACTION_TYPE_END)
+		mtr_policy->skip_r = 1;
+	if (policy->actions[RTE_COLOR_YELLOW] == RTE_FLOW_ACTION_TYPE_END)
+		mtr_policy->skip_y = 1;
+	if (policy->actions[RTE_COLOR_GREEN] == RTE_FLOW_ACTION_TYPE_END)
+		mtr_policy->skip_g = 1;
+	if (mtr_policy->skip_r && mtr_policy->skip_y && mtr_policy->skip_g)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+					  NULL, "Meter policy actions are empty.");
+	for (i = 0; i < RTE_COLORS; i++) {
+		act = policy->actions[i];
+		while (act && act->type != RTE_FLOW_ACTION_TYPE_END) {
+			switch (act->type) {
+			case RTE_FLOW_ACTION_TYPE_PORT_ID:
+				/* fall-through. */
+			case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
+				domain_color &= ~(MLX5_MTR_DOMAIN_INGRESS_BIT |
+						  MLX5_MTR_DOMAIN_EGRESS_BIT);
+				break;
+			case RTE_FLOW_ACTION_TYPE_RSS:
+				is_rss = true;
+				/* fall-through. */
+			case RTE_FLOW_ACTION_TYPE_QUEUE:
+				domain_color &= ~(MLX5_MTR_DOMAIN_EGRESS_BIT |
+						  MLX5_MTR_DOMAIN_TRANSFER_BIT);
+				break;
+			case RTE_FLOW_ACTION_TYPE_METER:
+				is_hierarchy = true;
+				mtr = act->conf;
+				fm = mlx5_flow_meter_find(priv,
+							  mtr->mtr_id, NULL);
+				if (!fm)
+					return -rte_mtr_error_set(error, EINVAL,
+						RTE_MTR_ERROR_TYPE_MTR_ID, NULL,
+						"Meter not found in meter hierarchy.");
+				plc = mlx5_flow_meter_policy_find(dev,
+								  fm->policy_id,
+								  NULL);
+				MLX5_ASSERT(plc);
+				domain_color &= MLX5_MTR_ALL_DOMAIN_BIT &
+					(plc->ingress <<
+					 MLX5_MTR_DOMAIN_INGRESS);
+				domain_color &= MLX5_MTR_ALL_DOMAIN_BIT &
+					(plc->egress <<
+					 MLX5_MTR_DOMAIN_EGRESS);
+				domain_color &= MLX5_MTR_ALL_DOMAIN_BIT &
+					(plc->transfer <<
+					 MLX5_MTR_DOMAIN_TRANSFER);
+				break;
+			default:
+				break;
+			}
+			act++;
+		}
+	}
+	if (!domain_color)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+					  NULL, "Meter policy domains are conflicting.");
+	mtr_policy->is_rss = is_rss;
+	mtr_policy->ingress = !!(domain_color & MLX5_MTR_DOMAIN_INGRESS_BIT);
+	pta.ingress = mtr_policy->ingress;
+	mtr_policy->egress = !!(domain_color & MLX5_MTR_DOMAIN_EGRESS_BIT);
+	pta.egress = mtr_policy->egress;
+	mtr_policy->transfer = !!(domain_color & MLX5_MTR_DOMAIN_TRANSFER_BIT);
+	pta.transfer = mtr_policy->transfer;
+	mtr_policy->group = MLX5_FLOW_TABLE_HWS_POLICY - policy_id;
+	mtr_policy->is_hierarchy = is_hierarchy;
+	mtr_policy->initialized = 1;
+	rte_spinlock_lock(&priv->hw_ctrl_lock);
+	mtr_policy->hws_item_templ =
+		rte_flow_pattern_template_create(dev->data->port_id,
+						 &pta, pattern, NULL);
+	if (!mtr_policy->hws_item_templ)
+		goto policy_add_err;
+	for (i = 0; i < RTE_COLORS; i++) {
+		if (mtr_policy->skip_g && i == RTE_COLOR_GREEN)
+			continue;
+		if (mtr_policy->skip_y && i == RTE_COLOR_YELLOW)
+			continue;
+		if (mtr_policy->skip_r && i == RTE_COLOR_RED)
+			continue;
+		mtr_policy->hws_act_templ[nb_colors] =
+			rte_flow_actions_template_create(dev->data->port_id,
+						&ata, policy->actions[i],
+						policy->actions[i], NULL);
+		if (!mtr_policy->hws_act_templ[nb_colors])
+			goto policy_add_err;
+		nb_colors++;
+	}
+	for (i = 0; i < MLX5_MTR_DOMAIN_MAX; i++) {
+		memset(&ta, 0, sizeof(ta));
+		ta.nb_flows = RTE_COLORS;
+		ta.flow_attr.group = mtr_policy->group;
+		if (i == MLX5_MTR_DOMAIN_INGRESS) {
+			if (!mtr_policy->ingress)
+				continue;
+			ta.flow_attr.ingress = 1;
+		} else if (i == MLX5_MTR_DOMAIN_EGRESS) {
+			if (!mtr_policy->egress)
+				continue;
+			ta.flow_attr.egress = 1;
+		} else if (i == MLX5_MTR_DOMAIN_TRANSFER) {
+			if (!mtr_policy->transfer)
+				continue;
+			ta.flow_attr.transfer = 1;
+		}
+		mtr_policy->hws_flow_table[i] =
+			rte_flow_template_table_create(dev->data->port_id,
+					&ta, &mtr_policy->hws_item_templ, 1,
+					mtr_policy->hws_act_templ, nb_colors,
+					NULL);
+		if (!mtr_policy->hws_flow_table[i])
+			goto policy_add_err;
+		nb_colors = 0;
+		for (j = 0; j < RTE_COLORS; j++) {
+			if (mtr_policy->skip_g && j == RTE_COLOR_GREEN)
+				continue;
+			if (mtr_policy->skip_y && j == RTE_COLOR_YELLOW)
+				continue;
+			if (mtr_policy->skip_r && j == RTE_COLOR_RED)
+				continue;
+			color = rte_col_2_mlx5_col((enum rte_color)j);
+			tag_spec.data = color;
+			mtr_policy->hws_flow_rule[i][j] =
+				rte_flow_async_create(dev->data->port_id,
+					CTRL_QUEUE_ID(priv), &op_attr,
+					mtr_policy->hws_flow_table[i],
+					pattern, 0, policy->actions[j],
+					nb_colors, NULL, NULL);
+			if (!mtr_policy->hws_flow_rule[i][j])
+				goto policy_add_err;
+			nb_colors++;
+			nb_flows++;
+		}
+		ret = rte_flow_push(dev->data->port_id,
+				    CTRL_QUEUE_ID(priv), NULL);
+		if (ret < 0)
+			goto policy_add_err;
+		while (nb_flows) {
+			ret = rte_flow_pull(dev->data->port_id,
+					    CTRL_QUEUE_ID(priv), result,
+					    nb_flows, NULL);
+			if (ret < 0)
+				goto policy_add_err;
+			for (j = 0; j < ret; j++) {
+				if (result[j].status == RTE_FLOW_OP_ERROR)
+					goto policy_add_err;
+			}
+			nb_flows -= ret;
+		}
+	}
+	rte_spinlock_unlock(&priv->hw_ctrl_lock);
+	return 0;
+policy_add_err:
+	rte_spinlock_unlock(&priv->hw_ctrl_lock);
+	ret = mlx5_flow_meter_policy_hws_delete(dev, policy_id, error);
+	memset(mtr_policy, 0, sizeof(struct mlx5_flow_meter_policy));
+	if (ret)
+		return ret;
+	return -rte_mtr_error_set(error, ENOTSUP,
+				  RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Failed to create meter policy.");
+}
+
 /**
  * Check meter validation.
  *
@@ -1087,7 +1565,8 @@ mlx5_flow_meter_action_modify(struct mlx5_priv *priv,
 	if (priv->sh->meter_aso_en) {
 		fm->is_enable = !!is_enable;
 		aso_mtr = container_of(fm, struct mlx5_aso_mtr, fm);
-		ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr);
+		ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr,
+						   &priv->mtr_bulk);
 		if (ret)
 			return ret;
 		ret = mlx5_aso_mtr_wait(priv->sh, aso_mtr);
@@ -1336,7 +1815,8 @@ mlx5_flow_meter_create(struct rte_eth_dev *dev, uint32_t meter_id,
 	/* If ASO meter supported, update ASO flow meter by wqe. */
 	if (priv->sh->meter_aso_en) {
 		aso_mtr = container_of(fm, struct mlx5_aso_mtr, fm);
-		ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr);
+		ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr,
+						   &priv->mtr_bulk);
 		if (ret)
 			goto error;
 		if (!priv->mtr_idx_tbl) {
@@ -1369,6 +1849,90 @@ mlx5_flow_meter_create(struct rte_eth_dev *dev, uint32_t meter_id,
 		NULL, "Failed to create devx meter.");
 }
 
+/**
+ * Create meter rules.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] meter_id
+ *   Meter id.
+ * @param[in] params
+ *   Pointer to rte meter parameters.
+ * @param[in] shared
+ *   Meter shared with other flow or not.
+ * @param[out] error
+ *   Pointer to rte meter error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_hws_create(struct rte_eth_dev *dev, uint32_t meter_id,
+		       struct rte_mtr_params *params, int shared,
+		       struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_meter_profile *profile;
+	struct mlx5_flow_meter_info *fm;
+	struct mlx5_flow_meter_policy *policy = NULL;
+	struct mlx5_aso_mtr *aso_mtr;
+	int ret;
+
+	if (!priv->mtr_profile_arr ||
+	    !priv->mtr_policy_arr ||
+	    !priv->mtr_bulk.aso)
+		return -rte_mtr_error_set(error, ENOTSUP,
+			RTE_MTR_ERROR_TYPE_UNSPECIFIED, NULL,
+			"Meter bulk array is not allocated.");
+	/* Meter profile must exist. */
+	profile = mlx5_flow_meter_profile_find(priv, params->meter_profile_id);
+	if (!profile->initialized)
+		return -rte_mtr_error_set(error, ENOENT,
+			RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+			NULL, "Meter profile id not valid.");
+	/* Meter policy must exist. */
+	policy = mlx5_flow_meter_policy_find(dev,
+			params->meter_policy_id, NULL);
+	if (!policy->initialized)
+		return -rte_mtr_error_set(error, ENOENT,
+			RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+			NULL, "Meter policy id not valid.");
+	/* Meter ID must be valid. */
+	if (meter_id >= priv->mtr_config.nb_meters)
+		return -rte_mtr_error_set(error, EINVAL,
+			RTE_MTR_ERROR_TYPE_MTR_ID,
+			NULL, "Meter id not valid.");
+	/* Find ASO object. */
+	aso_mtr = mlx5_aso_meter_by_idx(priv, meter_id);
+	fm = &aso_mtr->fm;
+	if (fm->initialized)
+		return -rte_mtr_error_set(error, ENOENT,
+					  RTE_MTR_ERROR_TYPE_MTR_ID,
+					  NULL, "Meter object already exists.");
+	/* Fill the flow meter parameters. */
+	fm->meter_id = meter_id;
+	fm->policy_id = params->meter_policy_id;
+	fm->profile = profile;
+	fm->meter_offset = meter_id;
+	fm->group = policy->group;
+	/* Add to the flow meter list. */
+	fm->active_state = 1; /* Config meter starts as active. */
+	fm->is_enable = params->meter_enable;
+	fm->shared = !!shared;
+	fm->initialized = 1;
+	/* Update ASO flow meter by wqe. */
+	ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr,
+					   &priv->mtr_bulk);
+	if (ret)
+		return -rte_mtr_error_set(error, ENOTSUP,
+			RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+			NULL, "Failed to create devx meter.");
+	fm->active_state = params->meter_enable;
+	__atomic_add_fetch(&fm->profile->ref_cnt, 1, __ATOMIC_RELAXED);
+	__atomic_add_fetch(&policy->ref_cnt, 1, __ATOMIC_RELAXED);
+	return 0;
+}
+
 static int
 mlx5_flow_meter_params_flush(struct rte_eth_dev *dev,
 			struct mlx5_flow_meter_info *fm,
@@ -1475,6 +2039,58 @@ mlx5_flow_meter_destroy(struct rte_eth_dev *dev, uint32_t meter_id,
 	return 0;
 }
 
+/**
+ * Destroy meter rules.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] meter_id
+ *   Meter id.
+ * @param[out] error
+ *   Pointer to rte meter error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_hws_destroy(struct rte_eth_dev *dev, uint32_t meter_id,
+			struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr *aso_mtr;
+	struct mlx5_flow_meter_info *fm;
+	struct mlx5_flow_meter_policy *policy;
+
+	if (!priv->mtr_profile_arr ||
+	    !priv->mtr_policy_arr ||
+	    !priv->mtr_bulk.aso)
+		return -rte_mtr_error_set(error, ENOTSUP,
+			RTE_MTR_ERROR_TYPE_METER_POLICY, NULL,
+			"Meter bulk array is not allocated.");
+	/* Find ASO object. */
+	aso_mtr = mlx5_aso_meter_by_idx(priv, meter_id);
+	fm = &aso_mtr->fm;
+	if (!fm->initialized)
+		return -rte_mtr_error_set(error, ENOENT,
+					  RTE_MTR_ERROR_TYPE_MTR_ID,
+					  NULL, "Meter object id not valid.");
+	/* Meter object must not have any owner. */
+	if (fm->ref_cnt > 0)
+		return -rte_mtr_error_set(error, EBUSY,
+					  RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "Meter object is being used.");
+	/* Destroy the meter profile. */
+	__atomic_sub_fetch(&fm->profile->ref_cnt,
+						1, __ATOMIC_RELAXED);
+	/* Destroy the meter policy. */
+	policy = mlx5_flow_meter_policy_find(dev,
+			fm->policy_id, NULL);
+	__atomic_sub_fetch(&policy->ref_cnt,
+						1, __ATOMIC_RELAXED);
+	memset(fm, 0, sizeof(struct mlx5_flow_meter_info));
+	return 0;
+}
+
 /**
  * Modify meter state.
  *
@@ -1798,6 +2414,23 @@ static const struct rte_mtr_ops mlx5_flow_mtr_ops = {
 	.stats_read = mlx5_flow_meter_stats_read,
 };
 
+static const struct rte_mtr_ops mlx5_flow_mtr_hws_ops = {
+	.capabilities_get = mlx5_flow_mtr_cap_get,
+	.meter_profile_add = mlx5_flow_meter_profile_hws_add,
+	.meter_profile_delete = mlx5_flow_meter_profile_hws_delete,
+	.meter_policy_validate = mlx5_flow_meter_policy_hws_validate,
+	.meter_policy_add = mlx5_flow_meter_policy_hws_add,
+	.meter_policy_delete = mlx5_flow_meter_policy_hws_delete,
+	.create = mlx5_flow_meter_hws_create,
+	.destroy = mlx5_flow_meter_hws_destroy,
+	.meter_enable = mlx5_flow_meter_enable,
+	.meter_disable = mlx5_flow_meter_disable,
+	.meter_profile_update = mlx5_flow_meter_profile_update,
+	.meter_dscp_table_update = NULL,
+	.stats_update = NULL,
+	.stats_read = NULL,
+};
+
 /**
  * Get meter operations.
  *
@@ -1812,7 +2445,12 @@ static const struct rte_mtr_ops mlx5_flow_mtr_ops = {
 int
 mlx5_flow_meter_ops_get(struct rte_eth_dev *dev __rte_unused, void *arg)
 {
-	*(const struct rte_mtr_ops **)arg = &mlx5_flow_mtr_ops;
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (priv->sh->config.dv_flow_en == 2)
+		*(const struct rte_mtr_ops **)arg = &mlx5_flow_mtr_hws_ops;
+	else
+		*(const struct rte_mtr_ops **)arg = &mlx5_flow_mtr_ops;
 	return 0;
 }
 
@@ -1841,6 +2479,12 @@ mlx5_flow_meter_find(struct mlx5_priv *priv, uint32_t meter_id,
 	union mlx5_l3t_data data;
 	uint16_t n_valid;
 
+	if (priv->mtr_bulk.aso) {
+		if (mtr_idx)
+			*mtr_idx = meter_id;
+		aso_mtr = priv->mtr_bulk.aso + meter_id;
+		return &aso_mtr->fm;
+	}
 	if (priv->sh->meter_aso_en) {
 		rte_rwlock_read_lock(&pools_mng->resize_mtrwl);
 		n_valid = pools_mng->n_valid;
@@ -2185,6 +2829,7 @@ mlx5_flow_meter_flush(struct rte_eth_dev *dev, struct rte_mtr_error *error)
 	struct mlx5_flow_meter_profile *fmp;
 	struct mlx5_legacy_flow_meter *legacy_fm;
 	struct mlx5_flow_meter_info *fm;
+	struct mlx5_flow_meter_policy *policy;
 	struct mlx5_flow_meter_sub_policy *sub_policy;
 	void *tmp;
 	uint32_t i, mtr_idx, policy_idx;
@@ -2219,6 +2864,14 @@ mlx5_flow_meter_flush(struct rte_eth_dev *dev, struct rte_mtr_error *error)
 				NULL, "MTR object meter profile invalid.");
 		}
 	}
+	if (priv->mtr_bulk.aso) {
+		for (i = 1; i <= priv->mtr_config.nb_meters; i++) {
+			aso_mtr = mlx5_aso_meter_by_idx(priv, i);
+			fm = &aso_mtr->fm;
+			if (fm->initialized)
+				mlx5_flow_meter_hws_destroy(dev, i, error);
+		}
+	}
 	if (priv->policy_idx_tbl) {
 		MLX5_L3T_FOREACH(priv->policy_idx_tbl, i, entry) {
 			policy_idx = *(uint32_t *)entry;
@@ -2244,6 +2897,15 @@ mlx5_flow_meter_flush(struct rte_eth_dev *dev, struct rte_mtr_error *error)
 		mlx5_l3t_destroy(priv->policy_idx_tbl);
 		priv->policy_idx_tbl = NULL;
 	}
+	if (priv->mtr_policy_arr) {
+		for (i = 0; i < priv->mtr_config.nb_meter_policies; i++) {
+			policy = mlx5_flow_meter_policy_find(dev, i,
+							     &policy_idx);
+			if (policy->initialized)
+				mlx5_flow_meter_policy_hws_delete(dev, i,
+								  error);
+		}
+	}
 	if (priv->mtr_profile_tbl) {
 		MLX5_L3T_FOREACH(priv->mtr_profile_tbl, i, entry) {
 			fmp = entry;
@@ -2257,9 +2919,21 @@ mlx5_flow_meter_flush(struct rte_eth_dev *dev, struct rte_mtr_error *error)
 		mlx5_l3t_destroy(priv->mtr_profile_tbl);
 		priv->mtr_profile_tbl = NULL;
 	}
+	if (priv->mtr_profile_arr) {
+		for (i = 0; i < priv->mtr_config.nb_meter_profiles; i++) {
+			fmp = mlx5_flow_meter_profile_find(priv, i);
+			if (fmp->initialized)
+				mlx5_flow_meter_profile_hws_delete(dev, i,
+								   error);
+		}
+	}
 	/* Delete default policy table. */
 	mlx5_flow_destroy_def_policy(dev);
 	if (priv->sh->refcnt == 1)
 		mlx5_flow_destroy_mtr_drop_tbls(dev);
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+	/* Destroy HWS configuration. */
+	mlx5_flow_meter_uninit(dev);
+#endif
 	return 0;
 }
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v3 08/17] net/mlx5: add HW steering counter action
  2022-09-30 12:52 ` [PATCH v3 00/17] net/mlx5: HW steering PMD update Suanming Mou
                     ` (6 preceding siblings ...)
  2022-09-30 12:53   ` [PATCH v3 07/17] net/mlx5: add HW steering meter action Suanming Mou
@ 2022-09-30 12:53   ` Suanming Mou
  2022-09-30 12:53   ` [PATCH v3 09/17] net/mlx5: support DR action template API Suanming Mou
                     ` (8 subsequent siblings)
  16 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-30 12:53 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko, Ray Kinsella
  Cc: dev, rasland, orika, Xiaoyu Min

From: Xiaoyu Min <jackmin@nvidia.com>

This commit adds HW steering counter action support.
A pool mechanism is the basic data structure for the HW steering
counters.

The HW steering counter pool is based on the zero-copy variant of
rte_ring.

There are two global rte_rings:
1. free_list:
     Stores the counter indexes that are ready for use.
2. wait_reset_list:
     Stores the counter indexes that have just been freed by the user;
     the hardware counter must be queried for its reset value before
     such a counter can be reused.

The counter pool also supports a cache per HW steering queue, which is
also based on the zero-copy variant of rte_ring.

The cache size, preload, threshold, and fetch size are all
configurable and exposed via device arguments.

The main operations of the counter pool are as follows:

 - Get one counter from the pool:
   1. The user calls the _get_* API.
   2. If the cache is enabled, dequeue one counter index from the local
      cache:
      2.A: If the dequeued entry is still in reset status (the counter's
           query_gen_when_free is equal to the pool's query gen):
           I. Flush all counters in the local cache back to the global
              wait_reset_list.
           II. Fetch _fetch_sz_ counters into the cache from the global
               free list.
           III. Fetch one counter from the cache.
   3. If the cache is empty, fetch _fetch_sz_ counters from the global
      free list into the cache and fetch one counter from the cache.
 - Free one counter into the pool:
   1. The user calls the _put_* API.
   2. Put the counter into the local cache.
   3. If the local cache is full:
      3.A: Write back all counters above _threshold_ into the global
           wait_reset_list.
      3.B: Also write back this counter into the global wait_reset_list.

When the local cache is disabled, _get_/_put_ operate directly on the
global lists.
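
A simplified sketch of the "get one counter" path described above,
using plain arrays in place of the zero-copy rte_ring objects. All
names below (idx_ring, cnt_pool, cnt_get, FETCH_SZ) are made up for
illustration and are not the actual mlx5_hws_cnt API:

#include <stdint.h>

#define FETCH_SZ 8	/* illustrative _fetch_sz_ */

/* Stand-in for a zero-copy ring holding counter indexes. */
struct idx_ring {
	uint32_t idx[256];
	unsigned int n;
};

struct cnt_pool {
	struct idx_ring free_list;	 /* Indexes ready for use. */
	struct idx_ring wait_reset_list; /* Freed, waiting for HW reset query. */
	struct idx_ring cache;		 /* Local (per-queue) cache. */
	uint32_t query_gen;		 /* Advanced after each HW reset query. */
	uint32_t query_gen_when_free[1024];
};

static unsigned int
ring_deq(struct idx_ring *r, uint32_t *out)
{
	if (r->n == 0)
		return 0;
	*out = r->idx[--r->n];
	return 1;
}

static void
ring_enq(struct idx_ring *r, uint32_t v)
{
	r->idx[r->n++] = v;
}

/* "Get one counter", following steps 1-3 above. */
static int
cnt_get(struct cnt_pool *p, uint32_t *cnt_idx)
{
	uint32_t tmp;
	unsigned int n;

	/* Step 2: try the local cache first. */
	if (ring_deq(&p->cache, cnt_idx)) {
		/* Usable only if the HW reset query already covered it. */
		if (p->query_gen_when_free[*cnt_idx] != p->query_gen)
			return 0;
		/* 2.A.I: flush the cache (and this entry) back to the
		 * global wait_reset_list.
		 */
		ring_enq(&p->wait_reset_list, *cnt_idx);
		while (ring_deq(&p->cache, &tmp))
			ring_enq(&p->wait_reset_list, tmp);
	}
	/* 2.A.II / step 3: refill the cache from the global free list. */
	for (n = 0; n < FETCH_SZ && ring_deq(&p->free_list, &tmp); n++)
		ring_enq(&p->cache, tmp);
	/* 2.A.III / step 3: take one counter from the cache. */
	return ring_deq(&p->cache, cnt_idx) ? 0 : -1;
}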

Signed-off-by: Xiaoyu Min <jackmin@nvidia.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c |  50 +++
 drivers/common/mlx5/mlx5_devx_cmds.h |  27 ++
 drivers/common/mlx5/mlx5_prm.h       |  62 ++-
 drivers/common/mlx5/version.map      |   1 +
 drivers/net/mlx5/meson.build         |   1 +
 drivers/net/mlx5/mlx5.c              |  14 +
 drivers/net/mlx5/mlx5.h              |  27 ++
 drivers/net/mlx5/mlx5_defs.h         |   2 +
 drivers/net/mlx5/mlx5_flow.c         |  27 +-
 drivers/net/mlx5/mlx5_flow.h         |   5 +
 drivers/net/mlx5/mlx5_flow_aso.c     | 261 ++++++++++++-
 drivers/net/mlx5/mlx5_flow_hw.c      | 340 +++++++++++++++-
 drivers/net/mlx5/mlx5_hws_cnt.c      | 528 +++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_hws_cnt.h      | 558 +++++++++++++++++++++++++++
 14 files changed, 1871 insertions(+), 32 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_hws_cnt.c
 create mode 100644 drivers/net/mlx5/mlx5_hws_cnt.h

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index ac6891145d..eef7a98248 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -176,6 +176,41 @@ mlx5_devx_cmd_register_write(void *ctx, uint16_t reg_id, uint32_t arg,
 	return 0;
 }
 
+struct mlx5_devx_obj *
+mlx5_devx_cmd_flow_counter_alloc_general(void *ctx,
+		struct mlx5_devx_counter_attr *attr)
+{
+	struct mlx5_devx_obj *dcs = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*dcs),
+						0, SOCKET_ID_ANY);
+	uint32_t in[MLX5_ST_SZ_DW(alloc_flow_counter_in)]   = {0};
+	uint32_t out[MLX5_ST_SZ_DW(alloc_flow_counter_out)] = {0};
+
+	if (!dcs) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(alloc_flow_counter_in, in, opcode,
+		 MLX5_CMD_OP_ALLOC_FLOW_COUNTER);
+	if (attr->bulk_log_max_alloc)
+		MLX5_SET(alloc_flow_counter_in, in, flow_counter_bulk_log_size,
+			 attr->flow_counter_bulk_log_size);
+	else
+		MLX5_SET(alloc_flow_counter_in, in, flow_counter_bulk,
+			 attr->bulk_n_128);
+	if (attr->pd_valid)
+		MLX5_SET(alloc_flow_counter_in, in, pd, attr->pd);
+	dcs->obj = mlx5_glue->devx_obj_create(ctx, in,
+					      sizeof(in), out, sizeof(out));
+	if (!dcs->obj) {
+		DRV_LOG(ERR, "Can't allocate counters - error %d", errno);
+		rte_errno = errno;
+		mlx5_free(dcs);
+		return NULL;
+	}
+	dcs->id = MLX5_GET(alloc_flow_counter_out, out, flow_counter_id);
+	return dcs;
+}
+
 /**
  * Allocate flow counters via devx interface.
  *
@@ -967,6 +1002,16 @@ mlx5_devx_cmd_query_hca_attr(void *ctx,
 					 general_obj_types) &
 			      MLX5_GENERAL_OBJ_TYPES_CAP_CONN_TRACK_OFFLOAD);
 	attr->rq_delay_drop = MLX5_GET(cmd_hca_cap, hcattr, rq_delay_drop);
+	attr->max_flow_counter_15_0 = MLX5_GET(cmd_hca_cap, hcattr,
+			max_flow_counter_15_0);
+	attr->max_flow_counter_31_16 = MLX5_GET(cmd_hca_cap, hcattr,
+			max_flow_counter_31_16);
+	attr->alloc_flow_counter_pd = MLX5_GET(cmd_hca_cap, hcattr,
+			alloc_flow_counter_pd);
+	attr->flow_counter_access_aso = MLX5_GET(cmd_hca_cap, hcattr,
+			flow_counter_access_aso);
+	attr->flow_access_aso_opc_mod = MLX5_GET(cmd_hca_cap, hcattr,
+			flow_access_aso_opc_mod);
 	if (attr->crypto) {
 		attr->aes_xts = MLX5_GET(cmd_hca_cap, hcattr, aes_xts);
 		hcattr = mlx5_devx_get_hca_cap(ctx, in, out, &rc,
@@ -989,6 +1034,11 @@ mlx5_devx_cmd_query_hca_attr(void *ctx,
 		}
 		attr->log_min_stride_wqe_sz = MLX5_GET(cmd_hca_cap_2, hcattr,
 						       log_min_stride_wqe_sz);
+		attr->flow_counter_bulk_log_max_alloc = MLX5_GET(cmd_hca_cap_2,
+				hcattr, flow_counter_bulk_log_max_alloc);
+		attr->flow_counter_bulk_log_granularity =
+			MLX5_GET(cmd_hca_cap_2, hcattr,
+				 flow_counter_bulk_log_granularity);
 	}
 	if (attr->log_min_stride_wqe_sz == 0)
 		attr->log_min_stride_wqe_sz = MLX5_MPRQ_LOG_MIN_STRIDE_WQE_SIZE;
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index d69dad613e..15b46f2acd 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -15,6 +15,16 @@
 #define MLX5_DEVX_MAX_KLM_ENTRIES ((UINT16_MAX - \
 		MLX5_ST_SZ_DW(create_mkey_in) * 4) / (MLX5_ST_SZ_DW(klm) * 4))
 
+struct mlx5_devx_counter_attr {
+	uint32_t pd_valid:1;
+	uint32_t pd:24;
+	uint32_t bulk_log_max_alloc:1;
+	union {
+		uint8_t flow_counter_bulk_log_size;
+		uint8_t bulk_n_128;
+	};
+};
+
 struct mlx5_devx_mkey_attr {
 	uint64_t addr;
 	uint64_t size;
@@ -263,6 +273,18 @@ struct mlx5_hca_attr {
 	uint32_t set_reg_c:8;
 	uint32_t nic_flow_table:1;
 	uint32_t modify_outer_ip_ecn:1;
+	union {
+		uint32_t max_flow_counter;
+		struct {
+			uint16_t max_flow_counter_15_0;
+			uint16_t max_flow_counter_31_16;
+		};
+	};
+	uint32_t flow_counter_bulk_log_max_alloc:5;
+	uint32_t flow_counter_bulk_log_granularity:5;
+	uint32_t alloc_flow_counter_pd:1;
+	uint32_t flow_counter_access_aso:1;
+	uint32_t flow_access_aso_opc_mod:8;
 };
 
 /* LAG Context. */
@@ -593,6 +615,11 @@ struct mlx5_devx_crypto_login_attr {
 
 /* mlx5_devx_cmds.c */
 
+__rte_internal
+struct mlx5_devx_obj *
+mlx5_devx_cmd_flow_counter_alloc_general(void *ctx,
+				struct mlx5_devx_counter_attr *attr);
+
 __rte_internal
 struct mlx5_devx_obj *mlx5_devx_cmd_flow_counter_alloc(void *ctx,
 						       uint32_t bulk_sz);
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index c82ec94465..8514ca8fc4 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -1161,8 +1161,10 @@ struct mlx5_ifc_alloc_flow_counter_in_bits {
 	u8 reserved_at_10[0x10];
 	u8 reserved_at_20[0x10];
 	u8 op_mod[0x10];
-	u8 flow_counter_id[0x20];
-	u8 reserved_at_40[0x18];
+	u8 reserved_at_40[0x8];
+	u8 pd[0x18];
+	u8 reserved_at_60[0x13];
+	u8 flow_counter_bulk_log_size[0x5];
 	u8 flow_counter_bulk[0x8];
 };
 
@@ -1382,7 +1384,13 @@ enum {
 #define MLX5_STEERING_LOGIC_FORMAT_CONNECTX_6DX 0x1
 
 struct mlx5_ifc_cmd_hca_cap_bits {
-	u8 reserved_at_0[0x20];
+	u8 access_other_hca_roce[0x1];
+	u8 alloc_flow_counter_pd[0x1];
+	u8 flow_counter_access_aso[0x1];
+	u8 reserved_at_3[0x5];
+	u8 flow_access_aso_opc_mod[0x8];
+	u8 reserved_at_10[0xf];
+	u8 vhca_resource_manager[0x1];
 	u8 hca_cap_2[0x1];
 	u8 reserved_at_21[0xf];
 	u8 vhca_id[0x10];
@@ -2058,8 +2066,52 @@ struct mlx5_ifc_cmd_hca_cap_2_bits {
 	u8 log_conn_track_max_alloc[0x5];
 	u8 reserved_at_d8[0x3];
 	u8 log_max_conn_track_offload[0x5];
-	u8 reserved_at_e0[0x20]; /* End of DW7. */
-	u8 reserved_at_100[0x700];
+	u8 reserved_at_e0[0xc0];
+	u8 reserved_at_1a0[0xb];
+	u8 format_select_dw_8_6_ext[0x1];
+	u8 reserved_at_1ac[0x14];
+	u8 general_obj_types_127_64[0x40];
+	u8 reserved_at_200[0x53];
+	u8 flow_counter_bulk_log_max_alloc[0x5];
+	u8 reserved_at_258[0x3];
+	u8 flow_counter_bulk_log_granularity[0x5];
+	u8 reserved_at_260[0x20];
+	u8 format_select_dw_gtpu_dw_0[0x8];
+	u8 format_select_dw_gtpu_dw_1[0x8];
+	u8 format_select_dw_gtpu_dw_2[0x8];
+	u8 format_select_dw_gtpu_first_ext_dw_0[0x8];
+	u8 reserved_at_2a0[0x560];
+};
+
+struct mlx5_ifc_wqe_based_flow_table_cap_bits {
+	u8 reserved_at_0[0x3];
+	u8 log_max_num_ste[0x5];
+	u8 reserved_at_8[0x3];
+	u8 log_max_num_stc[0x5];
+	u8 reserved_at_10[0x3];
+	u8 log_max_num_rtc[0x5];
+	u8 reserved_at_18[0x3];
+	u8 log_max_num_header_modify_pattern[0x5];
+	u8 reserved_at_20[0x3];
+	u8 stc_alloc_log_granularity[0x5];
+	u8 reserved_at_28[0x3];
+	u8 stc_alloc_log_max[0x5];
+	u8 reserved_at_30[0x3];
+	u8 ste_alloc_log_granularity[0x5];
+	u8 reserved_at_38[0x3];
+	u8 ste_alloc_log_max[0x5];
+	u8 reserved_at_40[0xb];
+	u8 rtc_reparse_mode[0x5];
+	u8 reserved_at_50[0x3];
+	u8 rtc_index_mode[0x5];
+	u8 reserved_at_58[0x3];
+	u8 rtc_log_depth_max[0x5];
+	u8 reserved_at_60[0x10];
+	u8 ste_format[0x10];
+	u8 stc_action_type[0x80];
+	u8 header_insert_type[0x10];
+	u8 header_remove_type[0x10];
+	u8 trivial_match_definer[0x20];
 };
 
 struct mlx5_ifc_esw_cap_bits {
diff --git a/drivers/common/mlx5/version.map b/drivers/common/mlx5/version.map
index 413dec14ab..4f72900519 100644
--- a/drivers/common/mlx5/version.map
+++ b/drivers/common/mlx5/version.map
@@ -40,6 +40,7 @@ INTERNAL {
 	mlx5_devx_cmd_create_virtq;
 	mlx5_devx_cmd_destroy;
 	mlx5_devx_cmd_flow_counter_alloc;
+	mlx5_devx_cmd_flow_counter_alloc_general;
 	mlx5_devx_cmd_flow_counter_query;
 	mlx5_devx_cmd_flow_dump;
 	mlx5_devx_cmd_flow_single_dump;
diff --git a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build
index 6a84d96380..f2d7bcaff6 100644
--- a/drivers/net/mlx5/meson.build
+++ b/drivers/net/mlx5/meson.build
@@ -38,6 +38,7 @@ sources = files(
         'mlx5_vlan.c',
         'mlx5_utils.c',
         'mlx5_devx.c',
+	'mlx5_hws_cnt.c',
 )
 
 if is_linux
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index cf5146d677..b6a66f12ee 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -175,6 +175,12 @@
 /* Device parameter to create the fdb default rule in PMD */
 #define MLX5_FDB_DEFAULT_RULE_EN "fdb_def_rule_en"
 
+/* HW steering counter configuration. */
+#define MLX5_HWS_CNT_SERVICE_CORE "service_core"
+
+/* HW steering counter's query interval. */
+#define MLX5_HWS_CNT_CYCLE_TIME "svc_cycle_time"
+
 /* Shared memory between primary and secondary processes. */
 struct mlx5_shared_data *mlx5_shared_data;
 
@@ -1245,6 +1251,10 @@ mlx5_dev_args_check_handler(const char *key, const char *val, void *opaque)
 		config->allow_duplicate_pattern = !!tmp;
 	} else if (strcmp(MLX5_FDB_DEFAULT_RULE_EN, key) == 0) {
 		config->fdb_def_rule = !!tmp;
+	} else if (strcmp(MLX5_HWS_CNT_SERVICE_CORE, key) == 0) {
+		config->cnt_svc.service_core = tmp;
+	} else if (strcmp(MLX5_HWS_CNT_CYCLE_TIME, key) == 0) {
+		config->cnt_svc.cycle_time = tmp;
 	}
 	return 0;
 }
@@ -1281,6 +1291,8 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 		MLX5_DECAP_EN,
 		MLX5_ALLOW_DUPLICATE_PATTERN,
 		MLX5_FDB_DEFAULT_RULE_EN,
+		MLX5_HWS_CNT_SERVICE_CORE,
+		MLX5_HWS_CNT_CYCLE_TIME,
 		NULL,
 	};
 	int ret = 0;
@@ -1293,6 +1305,8 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 	config->decap_en = 1;
 	config->allow_duplicate_pattern = 1;
 	config->fdb_def_rule = 1;
+	config->cnt_svc.cycle_time = MLX5_CNT_SVC_CYCLE_TIME_DEFAULT;
+	config->cnt_svc.service_core = rte_get_main_lcore();
 	if (mkvlist != NULL) {
 		/* Process parameters. */
 		ret = mlx5_kvargs_process(mkvlist, params,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 686969719a..4859f5a509 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -308,6 +308,10 @@ struct mlx5_sh_config {
 	uint32_t hw_fcs_strip:1; /* FCS stripping is supported. */
 	uint32_t allow_duplicate_pattern:1;
 	uint32_t lro_allowed:1; /* Whether LRO is allowed. */
+	struct {
+		uint16_t service_core;
+		uint32_t cycle_time; /* Query cycle time in milliseconds. */
+	} cnt_svc; /* Configuration of the HW steering counter service. */
 	/* Allow/Prevent the duplicate rules pattern. */
 	uint32_t fdb_def_rule:1; /* Create FDB default jump rule */
 };
@@ -1224,6 +1228,22 @@ struct mlx5_flex_item {
 	struct mlx5_flex_pattern_field map[MLX5_FLEX_ITEM_MAPPING_NUM];
 };
 
+#define HWS_CNT_ASO_SQ_NUM 4
+
+struct mlx5_hws_aso_mng {
+	uint16_t sq_num;
+	struct mlx5_aso_sq sqs[HWS_CNT_ASO_SQ_NUM];
+};
+
+struct mlx5_hws_cnt_svc_mng {
+	uint32_t refcnt;
+	uint32_t service_core;
+	uint32_t query_interval;
+	pthread_t service_thread;
+	uint8_t svc_running;
+	struct mlx5_hws_aso_mng aso_mng __rte_cache_aligned;
+};
+
 /*
  * Shared Infiniband device context for Master/Representors
  * which belong to same IB device with multiple IB ports.
@@ -1323,6 +1343,7 @@ struct mlx5_dev_ctx_shared {
 	pthread_mutex_t lwm_config_lock;
 	uint32_t host_shaper_rate:8;
 	uint32_t lwm_triggered:1;
+	struct mlx5_hws_cnt_svc_mng *cnt_svc;
 	struct mlx5_dev_shared_port port[]; /* per device port data array. */
 };
 
@@ -1607,6 +1628,7 @@ struct mlx5_priv {
 	/* Flex items have been created on the port. */
 	uint32_t flex_item_map; /* Map of allocated flex item elements. */
 	uint32_t nb_queue; /* HW steering queue number. */
+	struct mlx5_hws_cnt_pool *hws_cpool; /* HW steering's counter pool. */
 #if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
 	/* Item template list. */
 	LIST_HEAD(flow_hw_itt, rte_flow_pattern_template) flow_hw_itt;
@@ -2037,6 +2059,11 @@ mlx5_get_supported_sw_parsing_offloads(const struct mlx5_hca_attr *attr);
 uint32_t
 mlx5_get_supported_tunneling_offloads(const struct mlx5_hca_attr *attr);
 
+int mlx5_aso_cnt_queue_init(struct mlx5_dev_ctx_shared *sh);
+void mlx5_aso_cnt_queue_uninit(struct mlx5_dev_ctx_shared *sh);
+int mlx5_aso_cnt_query(struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_hws_cnt_pool *cpool);
+
 /* mlx5_flow_flex.c */
 
 struct rte_flow_item_flex_handle *
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index 585afb0a98..d064abfef3 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -188,4 +188,6 @@
 #define static_assert _Static_assert
 #endif
 
+#define MLX5_CNT_SVC_CYCLE_TIME_DEFAULT 500
+
 #endif /* RTE_PMD_MLX5_DEFS_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index fb3be940e5..658cc69750 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -7832,24 +7832,33 @@ mlx5_flow_isolate(struct rte_eth_dev *dev,
  */
 static int
 flow_drv_query(struct rte_eth_dev *dev,
-	       uint32_t flow_idx,
+	       struct rte_flow *eflow,
 	       const struct rte_flow_action *actions,
 	       void *data,
 	       struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	const struct mlx5_flow_driver_ops *fops;
-	struct rte_flow *flow = mlx5_ipool_get(priv->flows[MLX5_FLOW_TYPE_GEN],
-					       flow_idx);
-	enum mlx5_flow_drv_type ftype;
+	struct rte_flow *flow = NULL;
+	enum mlx5_flow_drv_type ftype = MLX5_FLOW_TYPE_MIN;
 
+	if (priv->sh->config.dv_flow_en == 2) {
+#if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
+		flow = eflow;
+		ftype = MLX5_FLOW_TYPE_HW;
+#endif
+	} else {
+		flow = (struct rte_flow *)mlx5_ipool_get(priv->flows[MLX5_FLOW_TYPE_GEN],
+				(uintptr_t)(void *)eflow);
+	}
 	if (!flow) {
 		return rte_flow_error_set(error, ENOENT,
 			  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 			  NULL,
 			  "invalid flow handle");
 	}
-	ftype = flow->drv_type;
+	if (ftype == MLX5_FLOW_TYPE_MIN)
+		ftype = flow->drv_type;
 	MLX5_ASSERT(ftype > MLX5_FLOW_TYPE_MIN && ftype < MLX5_FLOW_TYPE_MAX);
 	fops = flow_get_drv_ops(ftype);
 
@@ -7870,14 +7879,8 @@ mlx5_flow_query(struct rte_eth_dev *dev,
 		struct rte_flow_error *error)
 {
 	int ret;
-	struct mlx5_priv *priv = dev->data->dev_private;
 
-	if (priv->sh->config.dv_flow_en == 2)
-		return rte_flow_error_set(error, ENOTSUP,
-			  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-			  NULL,
-			  "Flow non-Q query not supported");
-	ret = flow_drv_query(dev, (uintptr_t)(void *)flow, actions, data,
+	ret = flow_drv_query(dev, flow, actions, data,
 			     error);
 	if (ret < 0)
 		return ret;
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 3bde95c927..8f1b66eaac 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1103,6 +1103,7 @@ struct rte_flow_hw {
 	};
 	struct rte_flow_template_table *table; /* The table flow allcated from. */
 	struct mlx5dr_rule rule; /* HWS layer data struct. */
+	uint32_t cnt_id;
 } __rte_packed;
 
 /* rte flow action translate to DR action struct. */
@@ -1146,6 +1147,9 @@ struct mlx5_action_construct_data {
 			uint32_t level; /* RSS level. */
 			uint32_t idx; /* Shared action index. */
 		} shared_rss;
+		struct {
+			uint32_t id;
+		} shared_counter;
 	};
 };
 
@@ -1224,6 +1228,7 @@ struct mlx5_hw_actions {
 	uint16_t encap_decap_pos; /* Encap/Decap action position. */
 	uint32_t acts_num:4; /* Total action number. */
 	uint32_t mark:1; /* Indicate the mark action. */
+	uint32_t cnt_id; /* Counter id. */
 	/* Translated DR action array from action template. */
 	struct mlx5dr_rule_action rule_acts[MLX5_HW_MAX_ACTS];
 };
diff --git a/drivers/net/mlx5/mlx5_flow_aso.c b/drivers/net/mlx5/mlx5_flow_aso.c
index 60d0280367..ed9272e583 100644
--- a/drivers/net/mlx5/mlx5_flow_aso.c
+++ b/drivers/net/mlx5/mlx5_flow_aso.c
@@ -12,6 +12,9 @@
 
 #include "mlx5.h"
 #include "mlx5_flow.h"
+#include "mlx5_hws_cnt.h"
+
+#define MLX5_ASO_CNT_QUEUE_LOG_DESC 14
 
 /**
  * Free MR resources.
@@ -79,6 +82,33 @@ mlx5_aso_destroy_sq(struct mlx5_aso_sq *sq)
 	memset(sq, 0, sizeof(*sq));
 }
 
+/**
+ * Initialize Send Queue used for ASO access counter.
+ *
+ * @param[in] sq
+ *   ASO SQ to initialize.
+ */
+static void
+mlx5_aso_cnt_init_sq(struct mlx5_aso_sq *sq)
+{
+	volatile struct mlx5_aso_wqe *restrict wqe;
+	int i;
+	int size = 1 << sq->log_desc_n;
+
+	/* All the next fields state should stay constant. */
+	for (i = 0, wqe = &sq->sq_obj.aso_wqes[0]; i < size; ++i, ++wqe) {
+		wqe->general_cseg.sq_ds = rte_cpu_to_be_32((sq->sqn << 8) |
+							  (sizeof(*wqe) >> 4));
+		wqe->aso_cseg.operand_masks = rte_cpu_to_be_32
+			(0u |
+			 (ASO_OPER_LOGICAL_OR << ASO_CSEG_COND_OPER_OFFSET) |
+			 (ASO_OP_ALWAYS_FALSE << ASO_CSEG_COND_1_OPER_OFFSET) |
+			 (ASO_OP_ALWAYS_FALSE << ASO_CSEG_COND_0_OPER_OFFSET) |
+			 (BYTEWISE_64BYTE << ASO_CSEG_DATA_MASK_MODE_OFFSET));
+		wqe->aso_cseg.data_mask = RTE_BE64(UINT64_MAX);
+	}
+}
+
 /**
  * Initialize Send Queue used for ASO access.
  *
@@ -191,7 +221,7 @@ mlx5_aso_ct_init_sq(struct mlx5_aso_sq *sq)
  */
 static int
 mlx5_aso_sq_create(struct mlx5_common_device *cdev, struct mlx5_aso_sq *sq,
-		   void *uar)
+		   void *uar, uint16_t log_desc_n)
 {
 	struct mlx5_devx_cq_attr cq_attr = {
 		.uar_page_id = mlx5_os_get_devx_uar_page_id(uar),
@@ -212,12 +242,12 @@ mlx5_aso_sq_create(struct mlx5_common_device *cdev, struct mlx5_aso_sq *sq,
 	int ret;
 
 	if (mlx5_devx_cq_create(cdev->ctx, &sq->cq.cq_obj,
-				MLX5_ASO_QUEUE_LOG_DESC, &cq_attr,
+				log_desc_n, &cq_attr,
 				SOCKET_ID_ANY))
 		goto error;
 	sq->cq.cq_ci = 0;
-	sq->cq.log_desc_n = MLX5_ASO_QUEUE_LOG_DESC;
-	sq->log_desc_n = MLX5_ASO_QUEUE_LOG_DESC;
+	sq->cq.log_desc_n = log_desc_n;
+	sq->log_desc_n = log_desc_n;
 	sq_attr.cqn = sq->cq.cq_obj.cq->id;
 	/* for mlx5_aso_wqe that is twice the size of mlx5_wqe */
 	log_wqbb_n = sq->log_desc_n + 1;
@@ -269,7 +299,8 @@ mlx5_aso_queue_init(struct mlx5_dev_ctx_shared *sh,
 				    sq_desc_n, &sh->aso_age_mng->aso_sq.mr))
 			return -1;
 		if (mlx5_aso_sq_create(cdev, &sh->aso_age_mng->aso_sq,
-				       sh->tx_uar.obj)) {
+				       sh->tx_uar.obj,
+				       MLX5_ASO_QUEUE_LOG_DESC)) {
 			mlx5_aso_dereg_mr(cdev, &sh->aso_age_mng->aso_sq.mr);
 			return -1;
 		}
@@ -277,7 +308,7 @@ mlx5_aso_queue_init(struct mlx5_dev_ctx_shared *sh,
 		break;
 	case ASO_OPC_MOD_POLICER:
 		if (mlx5_aso_sq_create(cdev, &sh->mtrmng->pools_mng.sq,
-				       sh->tx_uar.obj))
+				       sh->tx_uar.obj, MLX5_ASO_QUEUE_LOG_DESC))
 			return -1;
 		mlx5_aso_mtr_init_sq(&sh->mtrmng->pools_mng.sq);
 		break;
@@ -287,7 +318,7 @@ mlx5_aso_queue_init(struct mlx5_dev_ctx_shared *sh,
 				    &sh->ct_mng->aso_sq.mr))
 			return -1;
 		if (mlx5_aso_sq_create(cdev, &sh->ct_mng->aso_sq,
-				       sh->tx_uar.obj)) {
+				       sh->tx_uar.obj, MLX5_ASO_QUEUE_LOG_DESC)) {
 			mlx5_aso_dereg_mr(cdev, &sh->ct_mng->aso_sq.mr);
 			return -1;
 		}
@@ -1403,3 +1434,219 @@ mlx5_aso_ct_available(struct mlx5_dev_ctx_shared *sh,
 	rte_errno = EBUSY;
 	return -rte_errno;
 }
+
+int
+mlx5_aso_cnt_queue_init(struct mlx5_dev_ctx_shared *sh)
+{
+	struct mlx5_hws_aso_mng *aso_mng = NULL;
+	uint8_t idx;
+	struct mlx5_aso_sq *sq;
+
+	MLX5_ASSERT(sh);
+	MLX5_ASSERT(sh->cnt_svc);
+	aso_mng = &sh->cnt_svc->aso_mng;
+	aso_mng->sq_num = HWS_CNT_ASO_SQ_NUM;
+	for (idx = 0; idx < HWS_CNT_ASO_SQ_NUM; idx++) {
+		sq = &aso_mng->sqs[idx];
+		if (mlx5_aso_sq_create(sh->cdev, sq, sh->tx_uar.obj,
+					MLX5_ASO_CNT_QUEUE_LOG_DESC))
+			goto error;
+		mlx5_aso_cnt_init_sq(sq);
+	}
+	return 0;
+error:
+	mlx5_aso_cnt_queue_uninit(sh);
+	return -1;
+}
+
+void
+mlx5_aso_cnt_queue_uninit(struct mlx5_dev_ctx_shared *sh)
+{
+	uint16_t idx;
+
+	for (idx = 0; idx < sh->cnt_svc->aso_mng.sq_num; idx++)
+		mlx5_aso_destroy_sq(&sh->cnt_svc->aso_mng.sqs[idx]);
+	sh->cnt_svc->aso_mng.sq_num = 0;
+}
+
+static uint16_t
+mlx5_aso_cnt_sq_enqueue_burst(struct mlx5_hws_cnt_pool *cpool,
+		struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_aso_sq *sq, uint32_t n,
+		uint32_t offset, uint32_t dcs_id_base)
+{
+	volatile struct mlx5_aso_wqe *wqe;
+	uint16_t size = 1 << sq->log_desc_n;
+	uint16_t mask = size - 1;
+	uint16_t max;
+	uint32_t upper_offset = offset;
+	uint64_t addr;
+	uint32_t ctrl_gen_id = 0;
+	uint8_t opcmod = sh->cdev->config.hca_attr.flow_access_aso_opc_mod;
+	rte_be32_t lkey = rte_cpu_to_be_32(cpool->raw_mng->mr.lkey);
+	uint16_t aso_n = (uint16_t)(RTE_ALIGN_CEIL(n, 4) / 4);
+	uint32_t ccntid;
+
+	max = RTE_MIN(size - (uint16_t)(sq->head - sq->tail), aso_n);
+	if (unlikely(!max))
+		return 0;
+	upper_offset += (max * 4);
+	/* Because there is only one burst at a time, the same elt can be used. */
+	sq->elts[0].burst_size = max;
+	ctrl_gen_id = dcs_id_base;
+	ctrl_gen_id /= 4;
+	do {
+		ccntid = upper_offset - max * 4;
+		wqe = &sq->sq_obj.aso_wqes[sq->head & mask];
+		rte_prefetch0(&sq->sq_obj.aso_wqes[(sq->head + 1) & mask]);
+		wqe->general_cseg.misc = rte_cpu_to_be_32(ctrl_gen_id);
+		wqe->general_cseg.flags = RTE_BE32(MLX5_COMP_ONLY_FIRST_ERR <<
+							 MLX5_COMP_MODE_OFFSET);
+		wqe->general_cseg.opcode = rte_cpu_to_be_32
+						(MLX5_OPCODE_ACCESS_ASO |
+						 (opcmod <<
+						  WQE_CSEG_OPC_MOD_OFFSET) |
+						 (sq->pi <<
+						  WQE_CSEG_WQE_INDEX_OFFSET));
+		addr = (uint64_t)RTE_PTR_ADD(cpool->raw_mng->raw,
+				ccntid * sizeof(struct flow_counter_stats));
+		wqe->aso_cseg.va_h = rte_cpu_to_be_32((uint32_t)(addr >> 32));
+		wqe->aso_cseg.va_l_r = rte_cpu_to_be_32((uint32_t)addr | 1u);
+		wqe->aso_cseg.lkey = lkey;
+		sq->pi += 2; /* Each WQE contains 2 WQEBB's. */
+		sq->head++;
+		sq->next++;
+		ctrl_gen_id++;
+		max--;
+	} while (max);
+	wqe->general_cseg.flags = RTE_BE32(MLX5_COMP_ALWAYS <<
+							 MLX5_COMP_MODE_OFFSET);
+	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
+			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
+			   !sh->tx_uar.dbnc);
+	return sq->elts[0].burst_size;
+}
+
+static uint16_t
+mlx5_aso_cnt_completion_handle(struct mlx5_aso_sq *sq)
+{
+	struct mlx5_aso_cq *cq = &sq->cq;
+	volatile struct mlx5_cqe *restrict cqe;
+	const unsigned int cq_size = 1 << cq->log_desc_n;
+	const unsigned int mask = cq_size - 1;
+	uint32_t idx;
+	uint32_t next_idx = cq->cq_ci & mask;
+	const uint16_t max = (uint16_t)(sq->head - sq->tail);
+	uint16_t i = 0;
+	int ret;
+	if (unlikely(!max))
+		return 0;
+	idx = next_idx;
+	next_idx = (cq->cq_ci + 1) & mask;
+	rte_prefetch0(&cq->cq_obj.cqes[next_idx]);
+	cqe = &cq->cq_obj.cqes[idx];
+	ret = check_cqe(cqe, cq_size, cq->cq_ci);
+	/*
+	 * Be sure owner read is done before any other cookie field or
+	 * opaque field.
+	 */
+	rte_io_rmb();
+	if (unlikely(ret != MLX5_CQE_STATUS_SW_OWN)) {
+		if (likely(ret == MLX5_CQE_STATUS_HW_OWN))
+			return 0; /* return immediately. */
+		mlx5_aso_cqe_err_handle(sq);
+	}
+	i += sq->elts[0].burst_size;
+	sq->elts[0].burst_size = 0;
+	cq->cq_ci++;
+	if (likely(i)) {
+		sq->tail += i;
+		rte_io_wmb();
+		cq->cq_obj.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci);
+	}
+	return i;
+}
+
+static uint16_t
+mlx5_aso_cnt_query_one_dcs(struct mlx5_dev_ctx_shared *sh,
+			   struct mlx5_hws_cnt_pool *cpool,
+			   uint8_t dcs_idx, uint32_t num)
+{
+	uint32_t dcs_id = cpool->dcs_mng.dcs[dcs_idx].obj->id;
+	uint64_t cnt_num = cpool->dcs_mng.dcs[dcs_idx].batch_sz;
+	uint64_t left;
+	uint32_t iidx = cpool->dcs_mng.dcs[dcs_idx].iidx;
+	uint32_t offset;
+	uint16_t mask;
+	uint16_t sq_idx;
+	uint64_t burst_sz = (uint64_t)(1 << MLX5_ASO_CNT_QUEUE_LOG_DESC) * 4 *
+		sh->cnt_svc->aso_mng.sq_num;
+	uint64_t qburst_sz = burst_sz / sh->cnt_svc->aso_mng.sq_num;
+	uint64_t n;
+	struct mlx5_aso_sq *sq;
+
+	cnt_num = RTE_MIN(num, cnt_num);
+	left = cnt_num;
+	while (left) {
+		mask = 0;
+		for (sq_idx = 0; sq_idx < sh->cnt_svc->aso_mng.sq_num;
+				sq_idx++) {
+			if (left == 0) {
+				mask |= (1 << sq_idx);
+				continue;
+			}
+			n = RTE_MIN(left, qburst_sz);
+			offset = cnt_num - left;
+			offset += iidx;
+			mlx5_aso_cnt_sq_enqueue_burst(cpool, sh,
+					&sh->cnt_svc->aso_mng.sqs[sq_idx], n,
+					offset, dcs_id);
+			left -= n;
+		}
+		do {
+			for (sq_idx = 0; sq_idx < sh->cnt_svc->aso_mng.sq_num;
+					sq_idx++) {
+				sq = &sh->cnt_svc->aso_mng.sqs[sq_idx];
+				if (mlx5_aso_cnt_completion_handle(sq))
+					mask |= (1 << sq_idx);
+			}
+		} while (mask < ((1 << sh->cnt_svc->aso_mng.sq_num) - 1));
+	}
+	return cnt_num;
+}
+
+/*
+ * Query FW counter via ASO WQE.
+ *
+ * The ASO counter query works in _sync_ mode, which means:
+ * 1. each SQ issues one burst of several WQEs
+ * 2. a CQE is requested only on the last WQE
+ * 3. the CQ of each SQ is busy-polled
+ * 4. once all CQEs are received, go back to step 1 for the next burst
+ *
+ * @param[in] sh
+ *   Pointer to shared device.
+ * @param[in] cpool
+ *   Pointer to counter pool.
+ *
+ * @return
+ *   0 on success, -1 on failure.
+ */
+int
+mlx5_aso_cnt_query(struct mlx5_dev_ctx_shared *sh,
+		   struct mlx5_hws_cnt_pool *cpool)
+{
+	uint32_t idx;
+	uint32_t num;
+	uint32_t cnt_num = mlx5_hws_cnt_pool_get_size(cpool) -
+		rte_ring_count(cpool->free_list);
+
+	for (idx = 0; idx < cpool->dcs_mng.batch_total; idx++) {
+		num = RTE_MIN(cnt_num, cpool->dcs_mng.dcs[idx].batch_sz);
+		mlx5_aso_cnt_query_one_dcs(sh, cpool, idx, num);
+		cnt_num -= num;
+		if (cnt_num == 0)
+			break;
+	}
+	return 0;
+}
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index c2e16bc56d..507abb54e4 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -10,6 +10,7 @@
 #include "mlx5_rx.h"
 
 #if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
+#include "mlx5_hws_cnt.h"
 
 /* The maximum actions support in the flow. */
 #define MLX5_HW_MAX_ACTS 16
@@ -353,6 +354,10 @@ __flow_hw_action_template_destroy(struct rte_eth_dev *dev,
 			mlx5dr_action_destroy(acts->mhdr->action);
 		mlx5_free(acts->mhdr);
 	}
+	if (mlx5_hws_cnt_id_valid(acts->cnt_id)) {
+		mlx5_hws_cnt_shared_put(priv->hws_cpool, &acts->cnt_id);
+		acts->cnt_id = 0;
+	}
 }
 
 /**
@@ -532,6 +537,44 @@ __flow_hw_act_data_shared_rss_append(struct mlx5_priv *priv,
 	return 0;
 }
 
+/**
+ * Append shared counter action to the dynamic action list.
+ *
+ * @param[in] priv
+ *   Pointer to the port private data structure.
+ * @param[in] acts
+ *   Pointer to the template HW steering DR actions.
+ * @param[in] type
+ *   Action type.
+ * @param[in] action_src
+ *   Offset of source rte flow action.
+ * @param[in] action_dst
+ *   Offset of destination DR action.
+ * @param[in] cnt_id
+ *   Shared counter id.
+ *
+ * @return
+ *    0 on success, negative value otherwise and rte_errno is set.
+ */
+static __rte_always_inline int
+__flow_hw_act_data_shared_cnt_append(struct mlx5_priv *priv,
+				     struct mlx5_hw_actions *acts,
+				     enum rte_flow_action_type type,
+				     uint16_t action_src,
+				     uint16_t action_dst,
+				     cnt_id_t cnt_id)
+{
+	struct mlx5_action_construct_data *act_data;
+
+	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
+	if (!act_data)
+		return -1;
+	act_data->type = type;
+	act_data->shared_counter.id = cnt_id;
+	LIST_INSERT_HEAD(&acts->act_list, act_data, next);
+	return 0;
+}
+
+
 /**
  * Translate shared indirect action.
  *
@@ -573,6 +616,13 @@ flow_hw_shared_action_translate(struct rte_eth_dev *dev,
 		    action_src, action_dst, idx, shared_rss))
 			return -1;
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
+		if (__flow_hw_act_data_shared_cnt_append(priv, acts,
+			(enum rte_flow_action_type)
+			MLX5_RTE_FLOW_ACTION_TYPE_COUNT,
+			action_src, action_dst, act_idx))
+			return -1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type:%d", type);
 		break;
@@ -946,6 +996,30 @@ flow_hw_meter_compile(struct rte_eth_dev *dev,
 	}
 	return 0;
 }
+
+static __rte_always_inline int
+flow_hw_cnt_compile(struct rte_eth_dev *dev, uint32_t  start_pos,
+		      struct mlx5_hw_actions *acts)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint32_t pos = start_pos;
+	cnt_id_t cnt_id;
+	int ret;
+
+	ret = mlx5_hws_cnt_shared_get(priv->hws_cpool, &cnt_id);
+	if (ret != 0)
+		return ret;
+	ret = mlx5_hws_cnt_pool_get_action_offset
+				(priv->hws_cpool,
+				 cnt_id,
+				 &acts->rule_acts[pos].action,
+				 &acts->rule_acts[pos].counter.offset);
+	if (ret != 0)
+		return ret;
+	acts->cnt_id = cnt_id;
+	return 0;
+}
+
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -1189,6 +1263,20 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 				goto err;
 			i++;
 			break;
+		case RTE_FLOW_ACTION_TYPE_COUNT:
+			if (masks->conf &&
+			    ((const struct rte_flow_action_count *)
+			     masks->conf)->id) {
+				err = flow_hw_cnt_compile(dev, i, acts);
+				if (err)
+					goto err;
+			} else if (__flow_hw_act_data_general_append
+					(priv, acts, actions->type,
+					 actions - action_start, i)) {
+				goto err;
+			}
+			i++;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -1377,6 +1465,13 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev,
 				(dev, &act_data, item_flags, rule_act))
 			return -1;
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
+		if (mlx5_hws_cnt_pool_get_action_offset(priv->hws_cpool,
+				act_idx,
+				&rule_act->action,
+				&rule_act->counter.offset))
+			return -1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type:%d", type);
 		break;
@@ -1520,7 +1615,8 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			  const uint8_t it_idx,
 			  const struct rte_flow_action actions[],
 			  struct mlx5dr_rule_action *rule_acts,
-			  uint32_t *acts_num)
+			  uint32_t *acts_num,
+			  uint32_t queue)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow_template_table *table = job->flow->table;
@@ -1574,6 +1670,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		uint64_t item_flags;
 		struct mlx5_hw_jump_action *jump;
 		struct mlx5_hrxq *hrxq;
+		cnt_id_t cnt_id;
 
 		action = &actions[act_data->action_src];
 		MLX5_ASSERT(action->type == RTE_FLOW_ACTION_TYPE_INDIRECT ||
@@ -1681,6 +1778,32 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			if (mlx5_aso_mtr_wait(priv->sh, mtr))
 				return -1;
 			break;
+		case RTE_FLOW_ACTION_TYPE_COUNT:
+			ret = mlx5_hws_cnt_pool_get(priv->hws_cpool, &queue,
+					&cnt_id);
+			if (ret != 0)
+				return ret;
+			ret = mlx5_hws_cnt_pool_get_action_offset
+				(priv->hws_cpool,
+				 cnt_id,
+				 &rule_acts[act_data->action_dst].action,
+				 &rule_acts[act_data->action_dst].counter.offset
+				 );
+			if (ret != 0)
+				return ret;
+			job->flow->cnt_id = cnt_id;
+			break;
+		case MLX5_RTE_FLOW_ACTION_TYPE_COUNT:
+			ret = mlx5_hws_cnt_pool_get_action_offset
+				(priv->hws_cpool,
+				 act_data->shared_counter.id,
+				 &rule_acts[act_data->action_dst].action,
+				 &rule_acts[act_data->action_dst].counter.offset
+				 );
+			if (ret != 0)
+				return ret;
+			job->flow->cnt_id = act_data->shared_counter.id;
+			break;
 		default:
 			break;
 		}
@@ -1690,6 +1813,8 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 				job->flow->idx - 1;
 		rule_acts[hw_acts->encap_decap_pos].reformat.data = buf;
 	}
+	if (mlx5_hws_cnt_id_valid(hw_acts->cnt_id))
+		job->flow->cnt_id = hw_acts->cnt_id;
 	return 0;
 }
 
@@ -1825,7 +1950,7 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	 * user's input, in order to save the cost.
 	 */
 	if (flow_hw_actions_construct(dev, job, hw_acts, pattern_template_index,
-				  actions, rule_acts, &acts_num)) {
+				  actions, rule_acts, &acts_num, queue)) {
 		rte_errno = EINVAL;
 		goto free;
 	}
@@ -1955,6 +2080,13 @@ flow_hw_pull(struct rte_eth_dev *dev,
 				flow_hw_jump_release(dev, job->flow->jump);
 			else if (job->flow->fate_type == MLX5_FLOW_FATE_QUEUE)
 				mlx5_hrxq_obj_release(dev, job->flow->hrxq);
+			if (mlx5_hws_cnt_id_valid(job->flow->cnt_id) &&
+			    mlx5_hws_cnt_is_shared
+				(priv->hws_cpool, job->flow->cnt_id) == false) {
+				mlx5_hws_cnt_pool_put(priv->hws_cpool, &queue,
+						&job->flow->cnt_id);
+				job->flow->cnt_id = 0;
+			}
 			mlx5_ipool_free(job->flow->table->flow, job->flow->idx);
 		}
 		priv->hw_q[queue].job[priv->hw_q[queue].job_idx++] = job;
@@ -2678,6 +2810,9 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 			if (ret < 0)
 				return ret;
 			break;
+		case RTE_FLOW_ACTION_TYPE_COUNT:
+			/* TODO: Validation logic */
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -4355,6 +4490,12 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	}
 	if (_queue_attr)
 		mlx5_free(_queue_attr);
+	if (port_attr->nb_counters) {
+		priv->hws_cpool = mlx5_hws_cnt_pool_create(dev, port_attr,
+				nb_queue);
+		if (priv->hws_cpool == NULL)
+			goto err;
+	}
 	return 0;
 err:
 	flow_hw_free_vport_actions(priv);
@@ -4424,6 +4565,8 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 		mlx5_ipool_destroy(priv->acts_ipool);
 		priv->acts_ipool = NULL;
 	}
+	if (priv->hws_cpool)
+		mlx5_hws_cnt_pool_destroy(priv->sh, priv->hws_cpool);
 	mlx5_free(priv->hw_q);
 	priv->hw_q = NULL;
 	claim_zero(mlx5dr_context_close(priv->dr_ctx));
@@ -4565,10 +4708,28 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 			     void *user_data,
 			     struct rte_flow_error *error)
 {
+	struct rte_flow_action_handle *handle = NULL;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	cnt_id_t cnt_id;
+
 	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
 	RTE_SET_USED(user_data);
-	return flow_dv_action_create(dev, conf, action, error);
+	switch (action->type) {
+	case RTE_FLOW_ACTION_TYPE_COUNT:
+		if (mlx5_hws_cnt_shared_get(priv->hws_cpool, &cnt_id))
+			rte_flow_error_set(error, ENODEV,
+					RTE_FLOW_ERROR_TYPE_ACTION,
+					NULL,
+					"counters are not configured!");
+		else
+			handle = (struct rte_flow_action_handle *)
+				 (uintptr_t)cnt_id;
+		break;
+	default:
+		handle = flow_dv_action_create(dev, conf, action, error);
+	}
+	return handle;
 }
 
 /**
@@ -4632,10 +4793,172 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 			      void *user_data,
 			      struct rte_flow_error *error)
 {
+	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
+	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+	struct mlx5_priv *priv = dev->data->dev_private;
+
 	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
 	RTE_SET_USED(user_data);
-	return flow_dv_action_destroy(dev, handle, error);
+	switch (type) {
+	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
+		return mlx5_hws_cnt_shared_put(priv->hws_cpool, &act_idx);
+	default:
+		return flow_dv_action_destroy(dev, handle, error);
+	}
+}
+
+static int
+flow_hw_query_counter(const struct rte_eth_dev *dev, uint32_t counter,
+		      void *data, struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hws_cnt *cnt;
+	struct rte_flow_query_count *qc = data;
+	uint32_t iidx = mlx5_hws_cnt_iidx(priv->hws_cpool, counter);
+	uint64_t pkts, bytes;
+
+	if (!mlx5_hws_cnt_id_valid(counter))
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				"counters are not available");
+	cnt = &priv->hws_cpool->pool[iidx];
+	__hws_cnt_query_raw(priv->hws_cpool, counter, &pkts, &bytes);
+	qc->hits_set = 1;
+	qc->bytes_set = 1;
+	qc->hits = pkts - cnt->reset.hits;
+	qc->bytes = bytes - cnt->reset.bytes;
+	if (qc->reset) {
+		cnt->reset.bytes = bytes;
+		cnt->reset.hits = pkts;
+	}
+	return 0;
+}
+
+static int
+flow_hw_query(struct rte_eth_dev *dev,
+	      struct rte_flow *flow __rte_unused,
+	      const struct rte_flow_action *actions __rte_unused,
+	      void *data __rte_unused,
+	      struct rte_flow_error *error __rte_unused)
+{
+	int ret = -EINVAL;
+	struct rte_flow_hw *hw_flow = (struct rte_flow_hw *)flow;
+
+	for (; actions->type != RTE_FLOW_ACTION_TYPE_END; actions++) {
+		switch (actions->type) {
+		case RTE_FLOW_ACTION_TYPE_VOID:
+			break;
+		case RTE_FLOW_ACTION_TYPE_COUNT:
+			ret = flow_hw_query_counter(dev, hw_flow->cnt_id, data,
+						  error);
+			break;
+		default:
+			return rte_flow_error_set(error, ENOTSUP,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  actions,
+						  "action not supported");
+		}
+	}
+	return ret;
+}
+
+/**
+ * Create indirect action.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] conf
+ *   Shared action configuration.
+ * @param[in] action
+ *   Action specification used to create indirect action.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. Initialized in case of
+ *   error only.
+ *
+ * @return
+ *   A valid shared action handle in case of success, NULL otherwise and
+ *   rte_errno is set.
+ */
+static struct rte_flow_action_handle *
+flow_hw_action_create(struct rte_eth_dev *dev,
+		       const struct rte_flow_indir_action_conf *conf,
+		       const struct rte_flow_action *action,
+		       struct rte_flow_error *err)
+{
+	return flow_hw_action_handle_create(dev, UINT32_MAX, NULL, conf, action,
+					    NULL, err);
+}
+
+/**
+ * Destroy the indirect action.
+ * Release action related resources on the NIC and the memory.
+ * Lock free, (mutex should be acquired by caller).
+ * Dispatcher for action type specific call.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] handle
+ *   The indirect action object handle to be removed.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. Initialized in case of
+ *   error only.
+ *
+ * @return
+ *   0 on success, otherwise negative errno value.
+ */
+static int
+flow_hw_action_destroy(struct rte_eth_dev *dev,
+		       struct rte_flow_action_handle *handle,
+		       struct rte_flow_error *error)
+{
+	return flow_hw_action_handle_destroy(dev, UINT32_MAX, NULL, handle,
+			NULL, error);
+}
+
+/**
+ * Updates in place shared action configuration.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] handle
+ *   The indirect action object handle to be updated.
+ * @param[in] update
+ *   Action specification used to modify the action pointed by *handle*.
+ *   *update* could be of same type with the action pointed by the *handle*
+ *   handle argument, or some other structures like a wrapper, depending on
+ *   the indirect action type.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. Initialized in case of
+ *   error only.
+ *
+ * @return
+ *   0 on success, otherwise negative errno value.
+ */
+static int
+flow_hw_action_update(struct rte_eth_dev *dev,
+		      struct rte_flow_action_handle *handle,
+		      const void *update,
+		      struct rte_flow_error *err)
+{
+	return flow_hw_action_handle_update(dev, UINT32_MAX, NULL, handle,
+			update, NULL, err);
+}
+
+static int
+flow_hw_action_query(struct rte_eth_dev *dev,
+		     const struct rte_flow_action_handle *handle, void *data,
+		     struct rte_flow_error *error)
+{
+	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
+	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+
+	switch (type) {
+	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
+		return flow_hw_query_counter(dev, act_idx, data, error);
+	default:
+		return flow_dv_action_query(dev, handle, data, error);
+	}
 }
 
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
@@ -4657,10 +4980,11 @@ const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.async_action_destroy = flow_hw_action_handle_destroy,
 	.async_action_update = flow_hw_action_handle_update,
 	.action_validate = flow_dv_action_validate,
-	.action_create = flow_dv_action_create,
-	.action_destroy = flow_dv_action_destroy,
-	.action_update = flow_dv_action_update,
-	.action_query = flow_dv_action_query,
+	.action_create = flow_hw_action_create,
+	.action_destroy = flow_hw_action_destroy,
+	.action_update = flow_hw_action_update,
+	.action_query = flow_hw_action_query,
+	.query = flow_hw_query,
 };
 
 /**
diff --git a/drivers/net/mlx5/mlx5_hws_cnt.c b/drivers/net/mlx5/mlx5_hws_cnt.c
new file mode 100644
index 0000000000..e2408ef36d
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_hws_cnt.c
@@ -0,0 +1,528 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2020 Mellanox Technologies, Ltd
+ */
+
+#include <stdint.h>
+#include <rte_malloc.h>
+#include <mlx5_malloc.h>
+#include <rte_ring.h>
+#include <mlx5_devx_cmds.h>
+#include <rte_cycles.h>
+
+#if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
+
+#include "mlx5_utils.h"
+#include "mlx5_hws_cnt.h"
+
+#define HWS_CNT_CACHE_SZ_DEFAULT 511
+#define HWS_CNT_CACHE_PRELOAD_DEFAULT 254
+#define HWS_CNT_CACHE_FETCH_DEFAULT 254
+#define HWS_CNT_CACHE_THRESHOLD_DEFAULT 254
+#define HWS_CNT_ALLOC_FACTOR_DEFAULT 20
+
+static void
+__hws_cnt_id_load(struct mlx5_hws_cnt_pool *cpool)
+{
+	uint32_t preload;
+	uint32_t q_num = cpool->cache->q_num;
+	uint32_t cnt_num = mlx5_hws_cnt_pool_get_size(cpool);
+	cnt_id_t cnt_id, iidx = 0;
+	uint32_t qidx;
+	struct rte_ring *qcache = NULL;
+
+	/*
+	 * The counter ID order is important for tracking the max number of
+	 * counters in use for querying, which means the counter internal
+	 * index order must go from zero to the number the user configured,
+	 * i.e. 0 - 8000000.
+	 * The counter IDs need to be loaded in this order into the cache
+	 * first, and then into the global free list.
+	 * In the end, the user fetches counters from the minimum to the
+	 * maximum index.
+	 */
+	preload = RTE_MIN(cpool->cache->preload_sz, cnt_num / q_num);
+	for (qidx = 0; qidx < q_num; qidx++) {
+		for (; iidx < preload * (qidx + 1); iidx++) {
+			cnt_id = mlx5_hws_cnt_id_gen(cpool, iidx);
+			qcache = cpool->cache->qcache[qidx];
+			if (qcache)
+				rte_ring_enqueue_elem(qcache, &cnt_id,
+						sizeof(cnt_id));
+		}
+	}
+	for (; iidx < cnt_num; iidx++) {
+		cnt_id = mlx5_hws_cnt_id_gen(cpool, iidx);
+		rte_ring_enqueue_elem(cpool->free_list, &cnt_id,
+				sizeof(cnt_id));
+	}
+}
+
+static void
+__mlx5_hws_cnt_svc(struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_hws_cnt_pool *cpool)
+{
+	struct rte_ring *reset_list = cpool->wait_reset_list;
+	struct rte_ring *reuse_list = cpool->reuse_list;
+	uint32_t reset_cnt_num;
+	struct rte_ring_zc_data zcdr = {0};
+	struct rte_ring_zc_data zcdu = {0};
+
+	reset_cnt_num = rte_ring_count(reset_list);
+	do {
+		cpool->query_gen++;
+		mlx5_aso_cnt_query(sh, cpool);
+		zcdr.n1 = 0;
+		zcdu.n1 = 0;
+		rte_ring_enqueue_zc_burst_elem_start(reuse_list,
+				sizeof(cnt_id_t), reset_cnt_num, &zcdu,
+				NULL);
+		rte_ring_dequeue_zc_burst_elem_start(reset_list,
+				sizeof(cnt_id_t), reset_cnt_num, &zcdr,
+				NULL);
+		__hws_cnt_r2rcpy(&zcdu, &zcdr, reset_cnt_num);
+		rte_ring_dequeue_zc_elem_finish(reset_list,
+				reset_cnt_num);
+		rte_ring_enqueue_zc_elem_finish(reuse_list,
+				reset_cnt_num);
+		reset_cnt_num = rte_ring_count(reset_list);
+	} while (reset_cnt_num > 0);
+}
+
+static void
+mlx5_hws_cnt_raw_data_free(struct mlx5_dev_ctx_shared *sh,
+			   struct mlx5_hws_cnt_raw_data_mng *mng)
+{
+	if (mng == NULL)
+		return;
+	sh->cdev->mr_scache.dereg_mr_cb(&mng->mr);
+	mlx5_free(mng->raw);
+	mlx5_free(mng);
+}
+
+__rte_unused
+static struct mlx5_hws_cnt_raw_data_mng *
+mlx5_hws_cnt_raw_data_alloc(struct mlx5_dev_ctx_shared *sh, uint32_t n)
+{
+	struct mlx5_hws_cnt_raw_data_mng *mng = NULL;
+	int ret;
+	size_t sz = n * sizeof(struct flow_counter_stats);
+
+	mng = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO, sizeof(*mng), 0,
+			SOCKET_ID_ANY);
+	if (mng == NULL)
+		goto error;
+	mng->raw = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO, sz, 0,
+			SOCKET_ID_ANY);
+	if (mng->raw == NULL)
+		goto error;
+	ret = sh->cdev->mr_scache.reg_mr_cb(sh->cdev->pd, mng->raw, sz,
+					    &mng->mr);
+	if (ret) {
+		rte_errno = errno;
+		goto error;
+	}
+	return mng;
+error:
+	mlx5_hws_cnt_raw_data_free(sh, mng);
+	return NULL;
+}
+
+static void *
+mlx5_hws_cnt_svc(void *opaque)
+{
+	struct mlx5_dev_ctx_shared *sh =
+		(struct mlx5_dev_ctx_shared *)opaque;
+	uint64_t interval =
+		(uint64_t)sh->cnt_svc->query_interval * (US_PER_S / MS_PER_S);
+	uint16_t port_id;
+	uint64_t start_cycle, query_cycle = 0;
+	uint64_t query_us;
+	uint64_t sleep_us;
+
+	while (sh->cnt_svc->svc_running != 0) {
+		start_cycle = rte_rdtsc();
+		MLX5_ETH_FOREACH_DEV(port_id, sh->cdev->dev) {
+			struct mlx5_priv *opriv =
+				rte_eth_devices[port_id].data->dev_private;
+			if (opriv != NULL &&
+			    opriv->sh == sh &&
+			    opriv->hws_cpool != NULL) {
+				__mlx5_hws_cnt_svc(sh, opriv->hws_cpool);
+			}
+		}
+		query_cycle = rte_rdtsc() - start_cycle;
+		query_us = query_cycle / (rte_get_timer_hz() / US_PER_S);
+		sleep_us = interval - query_us;
+		if (interval > query_us)
+			rte_delay_us_sleep(sleep_us);
+	}
+	return NULL;
+}
+
+struct mlx5_hws_cnt_pool *
+mlx5_hws_cnt_pool_init(const struct mlx5_hws_cnt_pool_cfg *pcfg,
+		const struct mlx5_hws_cache_param *ccfg)
+{
+	char mz_name[RTE_MEMZONE_NAMESIZE];
+	struct mlx5_hws_cnt_pool *cntp;
+	uint64_t cnt_num = 0;
+	uint32_t qidx;
+
+	MLX5_ASSERT(pcfg);
+	MLX5_ASSERT(ccfg);
+	cntp = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO, sizeof(*cntp), 0,
+			   SOCKET_ID_ANY);
+	if (cntp == NULL)
+		return NULL;
+
+	cntp->cfg = *pcfg;
+	cntp->cache = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO,
+			sizeof(*cntp->cache) +
+			sizeof(((struct mlx5_hws_cnt_pool_caches *)0)->qcache[0])
+				* ccfg->q_num, 0, SOCKET_ID_ANY);
+	if (cntp->cache == NULL)
+		goto error;
+	/* Store the necessary cache parameters. */
+	cntp->cache->fetch_sz = ccfg->fetch_sz;
+	cntp->cache->preload_sz = ccfg->preload_sz;
+	cntp->cache->threshold = ccfg->threshold;
+	cntp->cache->q_num = ccfg->q_num;
+	cnt_num = pcfg->request_num * (100 + pcfg->alloc_factor) / 100;
+	if (cnt_num > UINT32_MAX) {
+		DRV_LOG(ERR, "counter number %"PRIu64" is out of 32bit range",
+			cnt_num);
+		goto error;
+	}
+	cntp->pool = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO,
+			sizeof(struct mlx5_hws_cnt) *
+			pcfg->request_num * (100 + pcfg->alloc_factor) / 100,
+			0, SOCKET_ID_ANY);
+	if (cntp->pool == NULL)
+		goto error;
+	snprintf(mz_name, sizeof(mz_name), "%s_F_RING", pcfg->name);
+	cntp->free_list = rte_ring_create_elem(mz_name, sizeof(cnt_id_t),
+			(uint32_t)cnt_num, SOCKET_ID_ANY,
+			RING_F_SP_ENQ | RING_F_MC_HTS_DEQ | RING_F_EXACT_SZ);
+	if (cntp->free_list == NULL) {
+		DRV_LOG(ERR, "failed to create free list ring");
+		goto error;
+	}
+	snprintf(mz_name, sizeof(mz_name), "%s_R_RING", pcfg->name);
+	cntp->wait_reset_list = rte_ring_create_elem(mz_name, sizeof(cnt_id_t),
+			(uint32_t)cnt_num, SOCKET_ID_ANY,
+			RING_F_MP_HTS_ENQ | RING_F_SC_DEQ | RING_F_EXACT_SZ);
+	if (cntp->wait_reset_list == NULL) {
+		DRV_LOG(ERR, "failed to create wait reset list ring");
+		goto error;
+	}
+	snprintf(mz_name, sizeof(mz_name), "%s_U_RING", pcfg->name);
+	cntp->reuse_list = rte_ring_create_elem(mz_name, sizeof(cnt_id_t),
+			(uint32_t)cnt_num, SOCKET_ID_ANY,
+			RING_F_SP_ENQ | RING_F_MC_HTS_DEQ | RING_F_EXACT_SZ);
+	if (cntp->reuse_list == NULL) {
+		DRV_LOG(ERR, "failed to create reuse list ring");
+		goto error;
+	}
+	for (qidx = 0; qidx < ccfg->q_num; qidx++) {
+		snprintf(mz_name, sizeof(mz_name), "%s_cache/%u", pcfg->name,
+				qidx);
+		cntp->cache->qcache[qidx] = rte_ring_create(mz_name, ccfg->size,
+				SOCKET_ID_ANY,
+				RING_F_SP_ENQ | RING_F_SC_DEQ |
+				RING_F_EXACT_SZ);
+		if (cntp->cache->qcache[qidx] == NULL)
+			goto error;
+	}
+	return cntp;
+error:
+	mlx5_hws_cnt_pool_deinit(cntp);
+	return NULL;
+}
+
+void
+mlx5_hws_cnt_pool_deinit(struct mlx5_hws_cnt_pool * const cntp)
+{
+	uint32_t qidx = 0;
+	if (cntp == NULL)
+		return;
+	rte_ring_free(cntp->free_list);
+	rte_ring_free(cntp->wait_reset_list);
+	rte_ring_free(cntp->reuse_list);
+	if (cntp->cache) {
+		for (qidx = 0; qidx < cntp->cache->q_num; qidx++)
+			rte_ring_free(cntp->cache->qcache[qidx]);
+	}
+	mlx5_free(cntp->cache);
+	mlx5_free(cntp->raw_mng);
+	mlx5_free(cntp->pool);
+	mlx5_free(cntp);
+}
+
+int
+mlx5_hws_cnt_service_thread_create(struct mlx5_dev_ctx_shared *sh)
+{
+#define CNT_THREAD_NAME_MAX 256
+	char name[CNT_THREAD_NAME_MAX];
+	rte_cpuset_t cpuset;
+	int ret;
+	uint32_t service_core = sh->cnt_svc->service_core;
+
+	CPU_ZERO(&cpuset);
+	sh->cnt_svc->svc_running = 1;
+	ret = pthread_create(&sh->cnt_svc->service_thread, NULL,
+			mlx5_hws_cnt_svc, sh);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Failed to create HW steering's counter service thread.");
+		return -ENOSYS;
+	}
+	snprintf(name, CNT_THREAD_NAME_MAX - 1, "%s/svc@%d",
+		 sh->ibdev_name, service_core);
+	rte_thread_setname(sh->cnt_svc->service_thread, name);
+	CPU_SET(service_core, &cpuset);
+	pthread_setaffinity_np(sh->cnt_svc->service_thread, sizeof(cpuset),
+				&cpuset);
+	return 0;
+}
+
+void
+mlx5_hws_cnt_service_thread_destroy(struct mlx5_dev_ctx_shared *sh)
+{
+	if (sh->cnt_svc->service_thread == 0)
+		return;
+	sh->cnt_svc->svc_running = 0;
+	pthread_join(sh->cnt_svc->service_thread, NULL);
+	sh->cnt_svc->service_thread = 0;
+}
+
+int
+mlx5_hws_cnt_pool_dcs_alloc(struct mlx5_dev_ctx_shared *sh,
+			    struct mlx5_hws_cnt_pool *cpool)
+{
+	struct mlx5_hca_attr *hca_attr = &sh->cdev->config.hca_attr;
+	uint32_t max_log_bulk_sz = 0;
+	uint32_t log_bulk_sz;
+	uint32_t idx, alloced = 0;
+	unsigned int cnt_num = mlx5_hws_cnt_pool_get_size(cpool);
+	struct mlx5_devx_counter_attr attr = {0};
+	struct mlx5_devx_obj *dcs;
+
+	if (hca_attr->flow_counter_bulk_log_max_alloc == 0) {
+		DRV_LOG(ERR,
+			"Fw doesn't support bulk log max alloc");
+		return -1;
+	}
+	max_log_bulk_sz = 23; /* Hard-coded to 8M (1 << 23). */
+	cnt_num = RTE_ALIGN_CEIL(cnt_num, 4); /* At least 4 counters per bulk. */
+	log_bulk_sz = RTE_MIN(max_log_bulk_sz, rte_log2_u32(cnt_num));
+	attr.pd = sh->cdev->pdn;
+	attr.pd_valid = 1;
+	attr.bulk_log_max_alloc = 1;
+	attr.flow_counter_bulk_log_size = log_bulk_sz;
+	idx = 0;
+	dcs = mlx5_devx_cmd_flow_counter_alloc_general(sh->cdev->ctx, &attr);
+	if (dcs == NULL)
+		goto error;
+	cpool->dcs_mng.dcs[idx].obj = dcs;
+	cpool->dcs_mng.dcs[idx].batch_sz = (1 << log_bulk_sz);
+	cpool->dcs_mng.batch_total++;
+	idx++;
+	cpool->dcs_mng.dcs[0].iidx = 0;
+	alloced = cpool->dcs_mng.dcs[0].batch_sz;
+	if (cnt_num > cpool->dcs_mng.dcs[0].batch_sz) {
+		for (; idx < MLX5_HWS_CNT_DCS_NUM; idx++) {
+			attr.flow_counter_bulk_log_size = --max_log_bulk_sz;
+			dcs = mlx5_devx_cmd_flow_counter_alloc_general
+				(sh->cdev->ctx, &attr);
+			if (dcs == NULL)
+				goto error;
+			cpool->dcs_mng.dcs[idx].obj = dcs;
+			cpool->dcs_mng.dcs[idx].batch_sz =
+				(1 << max_log_bulk_sz);
+			cpool->dcs_mng.dcs[idx].iidx = alloced;
+			alloced += cpool->dcs_mng.dcs[idx].batch_sz;
+			cpool->dcs_mng.batch_total++;
+		}
+	}
+	return 0;
+error:
+	DRV_LOG(DEBUG,
+		"Cannot alloc device counter, allocated[%" PRIu32 "] request[%" PRIu32 "]",
+		alloced, cnt_num);
+	for (idx = 0; idx < cpool->dcs_mng.batch_total; idx++) {
+		mlx5_devx_cmd_destroy(cpool->dcs_mng.dcs[idx].obj);
+		cpool->dcs_mng.dcs[idx].obj = NULL;
+		cpool->dcs_mng.dcs[idx].batch_sz = 0;
+		cpool->dcs_mng.dcs[idx].iidx = 0;
+	}
+	cpool->dcs_mng.batch_total = 0;
+	return -1;
+}
+
+void
+mlx5_hws_cnt_pool_dcs_free(struct mlx5_dev_ctx_shared *sh,
+			   struct mlx5_hws_cnt_pool *cpool)
+{
+	uint32_t idx;
+
+	if (cpool == NULL)
+		return;
+	for (idx = 0; idx < MLX5_HWS_CNT_DCS_NUM; idx++)
+		mlx5_devx_cmd_destroy(cpool->dcs_mng.dcs[idx].obj);
+	if (cpool->raw_mng) {
+		mlx5_hws_cnt_raw_data_free(sh, cpool->raw_mng);
+		cpool->raw_mng = NULL;
+	}
+}
+
+int
+mlx5_hws_cnt_pool_action_create(struct mlx5_priv *priv,
+		struct mlx5_hws_cnt_pool *cpool)
+{
+	uint32_t idx;
+	int ret = 0;
+	struct mlx5_hws_cnt_dcs *dcs;
+	uint32_t flags;
+
+	flags = MLX5DR_ACTION_FLAG_HWS_RX | MLX5DR_ACTION_FLAG_HWS_TX;
+	if (priv->sh->config.dv_esw_en && priv->master)
+		flags |= MLX5DR_ACTION_FLAG_HWS_FDB;
+	for (idx = 0; idx < cpool->dcs_mng.batch_total; idx++) {
+		dcs = &cpool->dcs_mng.dcs[idx];
+		dcs->dr_action = mlx5dr_action_create_counter(priv->dr_ctx,
+					(struct mlx5dr_devx_obj *)dcs->obj,
+					flags);
+		if (dcs->dr_action == NULL) {
+			mlx5_hws_cnt_pool_action_destroy(cpool);
+			ret = -ENOSYS;
+			break;
+		}
+	}
+	return ret;
+}
+
+void
+mlx5_hws_cnt_pool_action_destroy(struct mlx5_hws_cnt_pool *cpool)
+{
+	uint32_t idx;
+	struct mlx5_hws_cnt_dcs *dcs;
+
+	for (idx = 0; idx < cpool->dcs_mng.batch_total; idx++) {
+		dcs = &cpool->dcs_mng.dcs[idx];
+		if (dcs->dr_action != NULL) {
+			mlx5dr_action_destroy(dcs->dr_action);
+			dcs->dr_action = NULL;
+		}
+	}
+}
+
+struct mlx5_hws_cnt_pool *
+mlx5_hws_cnt_pool_create(struct rte_eth_dev *dev,
+		const struct rte_flow_port_attr *pattr, uint16_t nb_queue)
+{
+	struct mlx5_hws_cnt_pool *cpool = NULL;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hws_cache_param cparam = {0};
+	struct mlx5_hws_cnt_pool_cfg pcfg = {0};
+	char *mp_name;
+	int ret = 0;
+	size_t sz;
+
+	/* Initialize the counter service if not already done. */
+	if (priv->sh->cnt_svc == NULL) {
+		ret = mlx5_hws_cnt_svc_init(priv->sh);
+		if (ret != 0)
+			return NULL;
+	}
+	cparam.fetch_sz = HWS_CNT_CACHE_FETCH_DEFAULT;
+	cparam.preload_sz = HWS_CNT_CACHE_PRELOAD_DEFAULT;
+	cparam.q_num = nb_queue;
+	cparam.threshold = HWS_CNT_CACHE_THRESHOLD_DEFAULT;
+	cparam.size = HWS_CNT_CACHE_SZ_DEFAULT;
+	pcfg.alloc_factor = HWS_CNT_ALLOC_FACTOR_DEFAULT;
+	mp_name = mlx5_malloc(MLX5_MEM_ZERO, RTE_MEMZONE_NAMESIZE, 0,
+			SOCKET_ID_ANY);
+	if (mp_name == NULL)
+		goto error;
+	snprintf(mp_name, RTE_MEMZONE_NAMESIZE, "MLX5_HWS_CNT_POOL_%u",
+			dev->data->port_id);
+	pcfg.name = mp_name;
+	pcfg.request_num = pattr->nb_counters;
+	cpool = mlx5_hws_cnt_pool_init(&pcfg, &cparam);
+	if (cpool == NULL)
+		goto error;
+	ret = mlx5_hws_cnt_pool_dcs_alloc(priv->sh, cpool);
+	if (ret != 0)
+		goto error;
+	sz = RTE_ALIGN_CEIL(mlx5_hws_cnt_pool_get_size(cpool), 4);
+	cpool->raw_mng = mlx5_hws_cnt_raw_data_alloc(priv->sh, sz);
+	if (cpool->raw_mng == NULL)
+		goto error;
+	__hws_cnt_id_load(cpool);
+	/*
+	 * Bump the query generation right after pool creation so that the
+	 * pre-loaded counters can be used directly: they already hold their
+	 * initial values and do not need to wait for a query.
+	 */
+	cpool->query_gen = 1;
+	ret = mlx5_hws_cnt_pool_action_create(priv, cpool);
+	if (ret != 0)
+		goto error;
+	priv->sh->cnt_svc->refcnt++;
+	return cpool;
+error:
+	mlx5_hws_cnt_pool_destroy(priv->sh, cpool);
+	return NULL;
+}
+
+void
+mlx5_hws_cnt_pool_destroy(struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_hws_cnt_pool *cpool)
+{
+	if (cpool == NULL)
+		return;
+	if (--sh->cnt_svc->refcnt == 0)
+		mlx5_hws_cnt_svc_deinit(sh);
+	mlx5_hws_cnt_pool_action_destroy(cpool);
+	mlx5_hws_cnt_pool_dcs_free(sh, cpool);
+	mlx5_hws_cnt_raw_data_free(sh, cpool->raw_mng);
+	mlx5_free((void *)cpool->cfg.name);
+	mlx5_hws_cnt_pool_deinit(cpool);
+}
+
+int
+mlx5_hws_cnt_svc_init(struct mlx5_dev_ctx_shared *sh)
+{
+	int ret;
+
+	sh->cnt_svc = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO,
+			sizeof(*sh->cnt_svc), 0, SOCKET_ID_ANY);
+	if (sh->cnt_svc == NULL)
+		return -1;
+	sh->cnt_svc->query_interval = sh->config.cnt_svc.cycle_time;
+	sh->cnt_svc->service_core = sh->config.cnt_svc.service_core;
+	ret = mlx5_aso_cnt_queue_init(sh);
+	if (ret != 0) {
+		mlx5_free(sh->cnt_svc);
+		sh->cnt_svc = NULL;
+		return -1;
+	}
+	ret = mlx5_hws_cnt_service_thread_create(sh);
+	if (ret != 0) {
+		mlx5_aso_cnt_queue_uninit(sh);
+		mlx5_free(sh->cnt_svc);
+		sh->cnt_svc = NULL;
+	}
+	return 0;
+}
+
+void
+mlx5_hws_cnt_svc_deinit(struct mlx5_dev_ctx_shared *sh)
+{
+	if (sh->cnt_svc == NULL)
+		return;
+	mlx5_hws_cnt_service_thread_destroy(sh);
+	mlx5_aso_cnt_queue_uninit(sh);
+	mlx5_free(sh->cnt_svc);
+	sh->cnt_svc = NULL;
+}
+
+#endif
diff --git a/drivers/net/mlx5/mlx5_hws_cnt.h b/drivers/net/mlx5/mlx5_hws_cnt.h
new file mode 100644
index 0000000000..5fab4ba597
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_hws_cnt.h
@@ -0,0 +1,558 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2022 Mellanox Technologies, Ltd
+ */
+
+#ifndef _MLX5_HWS_CNT_H_
+#define _MLX5_HWS_CNT_H_
+
+#include <rte_ring.h>
+#include "mlx5_utils.h"
+#include "mlx5_flow.h"
+
+/*
+ * COUNTER ID's layout
+ *       3                   2                   1                   0
+ *     1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    | T |       | D |                                               |
+ *    ~ Y |       | C |                    IDX                        ~
+ *    | P |       | S |                                               |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *
+ *    Bit 31:30 = TYPE = MLX5_INDIRECT_ACTION_TYPE_COUNT = b'10
+ *    Bit 25:24 = DCS index
+ *    Bit 23:00 = IDX in this counter belonged DCS bulk.
+ */
+typedef uint32_t cnt_id_t;
+
+#define MLX5_HWS_CNT_DCS_NUM 4
+#define MLX5_HWS_CNT_DCS_IDX_OFFSET 24
+#define MLX5_HWS_CNT_DCS_IDX_MASK 0x3
+#define MLX5_HWS_CNT_IDX_MASK ((1UL << MLX5_HWS_CNT_DCS_IDX_OFFSET) - 1)
+
+struct mlx5_hws_cnt_dcs {
+	void *dr_action;
+	uint32_t batch_sz;
+	uint32_t iidx; /* internal index of first counter in this bulk. */
+	struct mlx5_devx_obj *obj;
+};
+
+struct mlx5_hws_cnt_dcs_mng {
+	uint32_t batch_total;
+	struct mlx5_hws_cnt_dcs dcs[MLX5_HWS_CNT_DCS_NUM];
+};
+
+struct mlx5_hws_cnt {
+	struct flow_counter_stats reset;
+	union {
+		uint32_t share: 1;
+		/*
+		 * Set to 1 when this counter is used as an indirect action.
+		 * Only meaningful while the user owns this counter.
+		 */
+		uint32_t query_gen_when_free;
+		/*
+		 * When the PMD owns this counter (i.e. the user has put it
+		 * back into the PMD counter pool), this field records the
+		 * pool's query generation at the time the user released it.
+		 */
+	};
+};
+
+struct mlx5_hws_cnt_raw_data_mng {
+	struct flow_counter_stats *raw;
+	struct mlx5_pmd_mr mr;
+};
+
+struct mlx5_hws_cache_param {
+	uint32_t size;
+	uint32_t q_num;
+	uint32_t fetch_sz;
+	uint32_t threshold;
+	uint32_t preload_sz;
+};
+
+struct mlx5_hws_cnt_pool_cfg {
+	char *name;
+	uint32_t request_num;
+	uint32_t alloc_factor;
+};
+
+struct mlx5_hws_cnt_pool_caches {
+	uint32_t fetch_sz;
+	uint32_t threshold;
+	uint32_t preload_sz;
+	uint32_t q_num;
+	struct rte_ring *qcache[];
+};
+
+struct mlx5_hws_cnt_pool {
+	struct mlx5_hws_cnt_pool_cfg cfg __rte_cache_aligned;
+	struct mlx5_hws_cnt_dcs_mng dcs_mng __rte_cache_aligned;
+	uint32_t query_gen __rte_cache_aligned;
+	struct mlx5_hws_cnt *pool;
+	struct mlx5_hws_cnt_raw_data_mng *raw_mng;
+	struct rte_ring *reuse_list;
+	struct rte_ring *free_list;
+	struct rte_ring *wait_reset_list;
+	struct mlx5_hws_cnt_pool_caches *cache;
+} __rte_cache_aligned;
+
+/**
+ * Translate a counter ID into an internal index (starting from 0), which can
+ * be used as an index into the raw/cnt pool.
+ *
+ * @param cnt_id
+ *   The external counter id
+ * @return
+ *   Internal index
+ */
+static __rte_always_inline cnt_id_t
+mlx5_hws_cnt_iidx(struct mlx5_hws_cnt_pool *cpool, cnt_id_t cnt_id)
+{
+	uint8_t dcs_idx = cnt_id >> MLX5_HWS_CNT_DCS_IDX_OFFSET;
+	uint32_t offset = cnt_id & MLX5_HWS_CNT_IDX_MASK;
+
+	dcs_idx &= MLX5_HWS_CNT_DCS_IDX_MASK;
+	return (cpool->dcs_mng.dcs[dcs_idx].iidx + offset);
+}
+
+/**
+ * Check whether the counter ID is valid.
+ */
+static __rte_always_inline bool
+mlx5_hws_cnt_id_valid(cnt_id_t cnt_id)
+{
+	return (cnt_id >> MLX5_INDIRECT_ACTION_TYPE_OFFSET) ==
+		MLX5_INDIRECT_ACTION_TYPE_COUNT ? true : false;
+}
+
+/**
+ * Generate a counter ID from an internal index.
+ *
+ * @param cpool
+ *   A pointer to the counter pool.
+ * @param iidx
+ *   The internal counter index.
+ *
+ * @return
+ *   Counter id
+ */
+static __rte_always_inline cnt_id_t
+mlx5_hws_cnt_id_gen(struct mlx5_hws_cnt_pool *cpool, cnt_id_t iidx)
+{
+	struct mlx5_hws_cnt_dcs_mng *dcs_mng = &cpool->dcs_mng;
+	uint32_t idx;
+	uint32_t offset;
+	cnt_id_t cnt_id;
+
+	for (idx = 0, offset = iidx; idx < dcs_mng->batch_total; idx++) {
+		if (dcs_mng->dcs[idx].batch_sz <= offset)
+			offset -= dcs_mng->dcs[idx].batch_sz;
+		else
+			break;
+	}
+	cnt_id = offset;
+	cnt_id |= (idx << MLX5_HWS_CNT_DCS_IDX_OFFSET);
+	return (MLX5_INDIRECT_ACTION_TYPE_COUNT <<
+			MLX5_INDIRECT_ACTION_TYPE_OFFSET) | cnt_id;
+}
+
+static __rte_always_inline void
+__hws_cnt_query_raw(struct mlx5_hws_cnt_pool *cpool, cnt_id_t cnt_id,
+		uint64_t *raw_pkts, uint64_t *raw_bytes)
+{
+	struct mlx5_hws_cnt_raw_data_mng *raw_mng = cpool->raw_mng;
+	struct flow_counter_stats s[2];
+	uint8_t i = 0x1;
+	size_t stat_sz = sizeof(s[0]);
+	uint32_t iidx = mlx5_hws_cnt_iidx(cpool, cnt_id);
+
+	memcpy(&s[0], &raw_mng->raw[iidx], stat_sz);
+	do {
+		memcpy(&s[i & 1], &raw_mng->raw[iidx], stat_sz);
+		if (memcmp(&s[0], &s[1], stat_sz) == 0) {
+			*raw_pkts = rte_be_to_cpu_64(s[0].hits);
+			*raw_bytes = rte_be_to_cpu_64(s[0].bytes);
+			break;
+		}
+		i = ~i;
+	} while (1);
+}
+
+/**
+ * Copy elements from one zero-copy ring to another zero-copy ring in place.
+ *
+ * The input is an rte ring zero-copy data structure, which holds two pointers;
+ * ptr2 is only meaningful when the ring area wraps around.
+ *
+ * This routine therefore has to handle the case where both the source and the
+ * destination addresses wrap:
+ * First, copy the number of elements that fit before the first wrap point,
+ * which may be in either the source or the destination.
+ * Second, copy the elements up to the second wrap point. If the first wrap
+ * point was in the source, this one must be in the destination, and
+ * vice versa.
+ * Third, copy all remaining elements.
+ *
+ * In the worst case, three pieces of contiguous memory are copied.
+ *
+ * @param zcdd
+ *   A pointer to zero-copy data of dest ring.
+ * @param zcds
+ *   A pointer to zero-copy data of source ring.
+ * @param n
+ *   Number of elems to copy.
+ */
+static __rte_always_inline void
+__hws_cnt_r2rcpy(struct rte_ring_zc_data *zcdd, struct rte_ring_zc_data *zcds,
+		unsigned int n)
+{
+	unsigned int n1, n2, n3;
+	void *s1, *s2, *s3;
+	void *d1, *d2, *d3;
+
+	s1 = zcds->ptr1;
+	d1 = zcdd->ptr1;
+	n1 = RTE_MIN(zcdd->n1, zcds->n1);
+	if (zcds->n1 > n1) {
+		n2 = zcds->n1 - n1;
+		s2 = RTE_PTR_ADD(zcds->ptr1, sizeof(cnt_id_t) * n1);
+		d2 = zcdd->ptr2;
+		n3 = n - n1 - n2;
+		s3 = zcds->ptr2;
+		d3 = RTE_PTR_ADD(zcdd->ptr2, sizeof(cnt_id_t) * n2);
+	} else {
+		n2 = zcdd->n1 - n1;
+		s2 = zcds->ptr2;
+		d2 = RTE_PTR_ADD(zcdd->ptr1, sizeof(cnt_id_t) * n1);
+		n3 = n - n1 - n2;
+		s3 = RTE_PTR_ADD(zcds->ptr2, sizeof(cnt_id_t) * n2);
+		d3 = zcdd->ptr2;
+	}
+	memcpy(d1, s1, n1 * sizeof(cnt_id_t));
+	if (n2 != 0) {
+		memcpy(d2, s2, n2 * sizeof(cnt_id_t));
+		if (n3 != 0)
+			memcpy(d3, s3, n3 * sizeof(cnt_id_t));
+	}
+}
+
+static __rte_always_inline int
+mlx5_hws_cnt_pool_cache_flush(struct mlx5_hws_cnt_pool *cpool,
+			      uint32_t queue_id)
+{
+	unsigned int ret;
+	struct rte_ring_zc_data zcdr = {0};
+	struct rte_ring_zc_data zcdc = {0};
+	struct rte_ring *reset_list = NULL;
+	struct rte_ring *qcache = cpool->cache->qcache[queue_id];
+
+	ret = rte_ring_dequeue_zc_burst_elem_start(qcache,
+			sizeof(cnt_id_t), rte_ring_count(qcache), &zcdc,
+			NULL);
+	MLX5_ASSERT(ret);
+	reset_list = cpool->wait_reset_list;
+	rte_ring_enqueue_zc_burst_elem_start(reset_list,
+			sizeof(cnt_id_t), ret, &zcdr, NULL);
+	__hws_cnt_r2rcpy(&zcdr, &zcdc, ret);
+	rte_ring_enqueue_zc_elem_finish(reset_list, ret);
+	rte_ring_dequeue_zc_elem_finish(qcache, ret);
+	return 0;
+}
+
+static __rte_always_inline int
+mlx5_hws_cnt_pool_cache_fetch(struct mlx5_hws_cnt_pool *cpool,
+			      uint32_t queue_id)
+{
+	struct rte_ring *qcache = cpool->cache->qcache[queue_id];
+	struct rte_ring *free_list = NULL;
+	struct rte_ring *reuse_list = NULL;
+	struct rte_ring *list = NULL;
+	struct rte_ring_zc_data zcdf = {0};
+	struct rte_ring_zc_data zcdc = {0};
+	struct rte_ring_zc_data zcdu = {0};
+	struct rte_ring_zc_data zcds = {0};
+	struct mlx5_hws_cnt_pool_caches *cache = cpool->cache;
+	unsigned int ret;
+
+	reuse_list = cpool->reuse_list;
+	ret = rte_ring_dequeue_zc_burst_elem_start(reuse_list,
+			sizeof(cnt_id_t), cache->fetch_sz, &zcdu, NULL);
+	zcds = zcdu;
+	list = reuse_list;
+	if (unlikely(ret == 0)) { /* no reuse counter. */
+		rte_ring_dequeue_zc_elem_finish(reuse_list, 0);
+		free_list = cpool->free_list;
+		ret = rte_ring_dequeue_zc_burst_elem_start(free_list,
+				sizeof(cnt_id_t), cache->fetch_sz, &zcdf, NULL);
+		zcds = zcdf;
+		list = free_list;
+		if (unlikely(ret == 0)) { /* no free counter. */
+			rte_ring_dequeue_zc_elem_finish(free_list, 0);
+			if (rte_ring_count(cpool->wait_reset_list))
+				return -EAGAIN;
+			return -ENOENT;
+		}
+	}
+	rte_ring_enqueue_zc_burst_elem_start(qcache, sizeof(cnt_id_t),
+			ret, &zcdc, NULL);
+	__hws_cnt_r2rcpy(&zcdc, &zcds, ret);
+	rte_ring_dequeue_zc_elem_finish(list, ret);
+	rte_ring_enqueue_zc_elem_finish(qcache, ret);
+	return 0;
+}
+
+static __rte_always_inline int
+__mlx5_hws_cnt_pool_enqueue_revert(struct rte_ring *r, unsigned int n,
+		struct rte_ring_zc_data *zcd)
+{
+	uint32_t current_head = 0;
+	uint32_t revert2head = 0;
+
+	MLX5_ASSERT(r->prod.sync_type == RTE_RING_SYNC_ST);
+	MLX5_ASSERT(r->cons.sync_type == RTE_RING_SYNC_ST);
+	current_head = __atomic_load_n(&r->prod.head, __ATOMIC_RELAXED);
+	MLX5_ASSERT(n <= r->capacity);
+	MLX5_ASSERT(n <= rte_ring_count(r));
+	revert2head = current_head - n;
+	r->prod.head = revert2head; /* This ring should be SP. */
+	__rte_ring_get_elem_addr(r, revert2head, sizeof(cnt_id_t), n,
+			&zcd->ptr1, &zcd->n1, &zcd->ptr2);
+	/* Update tail */
+	__atomic_store_n(&r->prod.tail, revert2head, __ATOMIC_RELEASE);
+	return n;
+}
+
+/**
+ * Put one counter back into the counter pool.
+ *
+ * @param cpool
+ *   A pointer to the counter pool structure.
+ * @param cnt_id
+ *   A counter id to be put back.
+ * @return
+ *   - 0: Success; the counter was put back.
+ *   - -ENOENT: not enough room in the pool.
+ */
+static __rte_always_inline int
+mlx5_hws_cnt_pool_put(struct mlx5_hws_cnt_pool *cpool,
+		uint32_t *queue, cnt_id_t *cnt_id)
+{
+	unsigned int ret = 0;
+	struct rte_ring_zc_data zcdc = {0};
+	struct rte_ring_zc_data zcdr = {0};
+	struct rte_ring *qcache = NULL;
+	unsigned int wb_num = 0; /* cache write-back number. */
+	cnt_id_t iidx;
+
+	iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
+	cpool->pool[iidx].query_gen_when_free =
+		__atomic_load_n(&cpool->query_gen, __ATOMIC_RELAXED);
+	if (likely(queue != NULL))
+		qcache = cpool->cache->qcache[*queue];
+	if (unlikely(qcache == NULL)) {
+		ret = rte_ring_enqueue_elem(cpool->wait_reset_list, cnt_id,
+				sizeof(cnt_id_t));
+		MLX5_ASSERT(ret == 0);
+		return ret;
+	}
+	ret = rte_ring_enqueue_burst_elem(qcache, cnt_id, sizeof(cnt_id_t), 1,
+					  NULL);
+	if (unlikely(ret == 0)) { /* cache is full. */
+		wb_num = rte_ring_count(qcache) - cpool->cache->threshold;
+		MLX5_ASSERT(wb_num < rte_ring_count(qcache));
+		__mlx5_hws_cnt_pool_enqueue_revert(qcache, wb_num, &zcdc);
+		rte_ring_enqueue_zc_burst_elem_start(cpool->wait_reset_list,
+				sizeof(cnt_id_t), wb_num, &zcdr, NULL);
+		__hws_cnt_r2rcpy(&zcdr, &zcdc, wb_num);
+		rte_ring_enqueue_zc_elem_finish(cpool->wait_reset_list, wb_num);
+		/* write-back THIS counter too */
+		ret = rte_ring_enqueue_burst_elem(cpool->wait_reset_list,
+				cnt_id, sizeof(cnt_id_t), 1, NULL);
+	}
+	return ret == 1 ? 0 : -ENOENT;
+}
+
+/**
+ * Get one counter from the pool.
+ *
+ * If @p queue is not null, counters are retrieved first from the queue's
+ * cache and then from the common pool. Note that it can return -ENOENT
+ * when both the local cache and the common pool are empty, even if the
+ * caches of other queues are full.
+ *
+ * @param cpool
+ *   A pointer to the counter pool structure.
+ * @param queue
+ *   A pointer to the HWS queue. If null, fetch from the common pool.
+ * @param cnt_id
+ *   A pointer to a cnt_id_t (counter ID) that will be filled in.
+ * @return
+ *   - 0: Success; a counter was taken.
+ *   - -ENOENT: Not enough entries in the pool; no counter was retrieved.
+ *   - -EAGAIN: counter is not ready; try again.
+ */
+static __rte_always_inline int
+mlx5_hws_cnt_pool_get(struct mlx5_hws_cnt_pool *cpool,
+		uint32_t *queue, cnt_id_t *cnt_id)
+{
+	unsigned int ret;
+	struct rte_ring_zc_data zcdc = {0};
+	struct rte_ring *qcache = NULL;
+	uint32_t query_gen = 0;
+	cnt_id_t iidx, tmp_cid = 0;
+
+	if (likely(queue != NULL))
+		qcache = cpool->cache->qcache[*queue];
+	if (unlikely(qcache == NULL)) {
+		ret = rte_ring_dequeue_elem(cpool->reuse_list, &tmp_cid,
+				sizeof(cnt_id_t));
+		if (unlikely(ret != 0)) {
+			ret = rte_ring_dequeue_elem(cpool->free_list, &tmp_cid,
+					sizeof(cnt_id_t));
+			if (unlikely(ret != 0)) {
+				if (rte_ring_count(cpool->wait_reset_list))
+					return -EAGAIN;
+				return -ENOENT;
+			}
+		}
+		*cnt_id = tmp_cid;
+		iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
+		__hws_cnt_query_raw(cpool, *cnt_id,
+				    &cpool->pool[iidx].reset.hits,
+				    &cpool->pool[iidx].reset.bytes);
+		return 0;
+	}
+	ret = rte_ring_dequeue_zc_burst_elem_start(qcache, sizeof(cnt_id_t), 1,
+			&zcdc, NULL);
+	if (unlikely(ret == 0)) { /* local cache is empty. */
+		rte_ring_dequeue_zc_elem_finish(qcache, 0);
+		/* let's fetch from global free list. */
+		ret = mlx5_hws_cnt_pool_cache_fetch(cpool, *queue);
+		if (unlikely(ret != 0))
+			return ret;
+		rte_ring_dequeue_zc_burst_elem_start(qcache, sizeof(cnt_id_t),
+				1, &zcdc, NULL);
+	}
+	/* get one from local cache. */
+	*cnt_id = (*(cnt_id_t *)zcdc.ptr1);
+	iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
+	query_gen = cpool->pool[iidx].query_gen_when_free;
+	if (cpool->query_gen == query_gen) { /* counter is waiting to reset. */
+		rte_ring_dequeue_zc_elem_finish(qcache, 0);
+		/* write-back counter to reset list. */
+		mlx5_hws_cnt_pool_cache_flush(cpool, *queue);
+		/* let's fetch from global free list. */
+		ret = mlx5_hws_cnt_pool_cache_fetch(cpool, *queue);
+		if (unlikely(ret != 0))
+			return ret;
+		rte_ring_dequeue_zc_burst_elem_start(qcache, sizeof(cnt_id_t),
+				1, &zcdc, NULL);
+		*cnt_id = *(cnt_id_t *)zcdc.ptr1;
+	}
+	__hws_cnt_query_raw(cpool, *cnt_id, &cpool->pool[iidx].reset.hits,
+			    &cpool->pool[iidx].reset.bytes);
+	rte_ring_dequeue_zc_elem_finish(qcache, 1);
+	cpool->pool[iidx].share = 0;
+	return 0;
+}
+
+static __rte_always_inline unsigned int
+mlx5_hws_cnt_pool_get_size(struct mlx5_hws_cnt_pool *cpool)
+{
+	return rte_ring_get_capacity(cpool->free_list);
+}
+
+static __rte_always_inline int
+mlx5_hws_cnt_pool_get_action_offset(struct mlx5_hws_cnt_pool *cpool,
+		cnt_id_t cnt_id, struct mlx5dr_action **action,
+		uint32_t *offset)
+{
+	uint8_t idx = cnt_id >> MLX5_HWS_CNT_DCS_IDX_OFFSET;
+
+	idx &= MLX5_HWS_CNT_DCS_IDX_MASK;
+	*action = cpool->dcs_mng.dcs[idx].dr_action;
+	*offset = cnt_id & MLX5_HWS_CNT_IDX_MASK;
+	return 0;
+}
+
+static __rte_always_inline int
+mlx5_hws_cnt_shared_get(struct mlx5_hws_cnt_pool *cpool, cnt_id_t *cnt_id)
+{
+	int ret;
+	uint32_t iidx;
+
+	ret = mlx5_hws_cnt_pool_get(cpool, NULL, cnt_id);
+	if (ret != 0)
+		return ret;
+	iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
+	MLX5_ASSERT(cpool->pool[iidx].share == 0);
+	cpool->pool[iidx].share = 1;
+	return 0;
+}
+
+static __rte_always_inline int
+mlx5_hws_cnt_shared_put(struct mlx5_hws_cnt_pool *cpool, cnt_id_t *cnt_id)
+{
+	int ret;
+	uint32_t iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
+
+	cpool->pool[iidx].share = 0;
+	ret = mlx5_hws_cnt_pool_put(cpool, NULL, cnt_id);
+	if (unlikely(ret != 0))
+		cpool->pool[iidx].share = 1; /* fail to release, restore. */
+	return ret;
+}
+
+static __rte_always_inline bool
+mlx5_hws_cnt_is_shared(struct mlx5_hws_cnt_pool *cpool, cnt_id_t cnt_id)
+{
+	uint32_t iidx = mlx5_hws_cnt_iidx(cpool, cnt_id);
+
+	return cpool->pool[iidx].share ? true : false;
+}
+
+/* init HWS counter pool. */
+struct mlx5_hws_cnt_pool *
+mlx5_hws_cnt_pool_init(const struct mlx5_hws_cnt_pool_cfg *pcfg,
+		const struct mlx5_hws_cache_param *ccfg);
+
+void
+mlx5_hws_cnt_pool_deinit(struct mlx5_hws_cnt_pool *cntp);
+
+int
+mlx5_hws_cnt_service_thread_create(struct mlx5_dev_ctx_shared *sh);
+
+void
+mlx5_hws_cnt_service_thread_destroy(struct mlx5_dev_ctx_shared *sh);
+
+int
+mlx5_hws_cnt_pool_dcs_alloc(struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_hws_cnt_pool *cpool);
+void
+mlx5_hws_cnt_pool_dcs_free(struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_hws_cnt_pool *cpool);
+
+int
+mlx5_hws_cnt_pool_action_create(struct mlx5_priv *priv,
+		struct mlx5_hws_cnt_pool *cpool);
+
+void
+mlx5_hws_cnt_pool_action_destroy(struct mlx5_hws_cnt_pool *cpool);
+
+struct mlx5_hws_cnt_pool *
+mlx5_hws_cnt_pool_create(struct rte_eth_dev *dev,
+		const struct rte_flow_port_attr *pattr, uint16_t nb_queue);
+
+void
+mlx5_hws_cnt_pool_destroy(struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_hws_cnt_pool *cpool);
+
+int
+mlx5_hws_cnt_svc_init(struct mlx5_dev_ctx_shared *sh);
+
+void
+mlx5_hws_cnt_svc_deinit(struct mlx5_dev_ctx_shared *sh);
+
+#endif /* _MLX5_HWS_CNT_H_ */
-- 
2.25.1
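
For reference, a minimal sketch of how a datapath caller might use the
queue-cached counter pool API added by this patch (the queue handle, the
error handling and the surrounding flow logic are assumptions made for
illustration only, not part of the patch):

#include "mlx5_hws_cnt.h"

/* Illustrative helper: take a counter for a new flow on a given HWS queue
 * and resolve the DR action/offset to embed into the rule. */
static int
example_counter_for_flow(struct mlx5_hws_cnt_pool *cpool, uint32_t queue,
			 struct mlx5dr_action **action, uint32_t *offset,
			 cnt_id_t *cnt_id)
{
	int ret;

	/* Try the per-queue cache first; falls back to the global lists. */
	ret = mlx5_hws_cnt_pool_get(cpool, &queue, cnt_id);
	if (ret == -EAGAIN)
		return ret; /* Counters still wait for reset; retry later. */
	if (ret != 0)
		return ret; /* Pool exhausted (-ENOENT). */
	/* Fixed DR action plus per-counter offset used at rule creation. */
	return mlx5_hws_cnt_pool_get_action_offset(cpool, *cnt_id,
						   action, offset);
}

/* On flow destruction the counter goes back through the same queue cache. */
static int
example_counter_release(struct mlx5_hws_cnt_pool *cpool, uint32_t queue,
			cnt_id_t *cnt_id)
{
	return mlx5_hws_cnt_pool_put(cpool, &queue, cnt_id);
}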


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v3 09/17] net/mlx5: support DR action template API
  2022-09-30 12:52 ` [PATCH v3 00/17] net/mlx5: HW steering PMD update Suanming Mou
                     ` (7 preceding siblings ...)
  2022-09-30 12:53   ` [PATCH v3 08/17] net/mlx5: add HW steering counter action Suanming Mou
@ 2022-09-30 12:53   ` Suanming Mou
  2022-09-30 12:53   ` [PATCH v3 10/17] net/mlx5: add HW steering connection tracking support Suanming Mou
                     ` (7 subsequent siblings)
  16 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-30 12:53 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Dariusz Sosnowski

From: Dariusz Sosnowski <dsosnowski@nvidia.com>

This patch adapts mlx5 PMD to changes in mlx5dr API regarding action
templates. It changes the following:

1. Actions template creation:

    - Flow actions types are translated to mlx5dr action types in order
      to create mlx5dr_action_template object.
    - An offset is assigned to each flow action. This offset is used to
      predetermine the action's location in the rule_acts array passed on
      rule creation (see the sketch below).

2. Template table creation:

    - Fixed actions are created and put in rule_acts cache using
      predetermined offsets.
    - mlx5dr matcher is parametrized by action templates bound to
      template table.
    - mlx5dr matcher is configured to optimize rule creation based on
      passed rule indices.

3. Flow rule creation:

    - mlx5dr rule is parametrized by the action template on which the
      rule's actions are based.
    - Rule index hint is provided to mlx5dr.
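
For illustration, a simplified sketch of the offset scheme described above
(the types and names below are stand-ins, not the actual PMD structures, and
the per-rule handling is reduced to the essential idea):

#include <string.h>
#include <stdint.h>

/* Illustrative stand-ins for the actions template and a DR rule action slot. */
struct example_slot { uint64_t data; };

struct example_template {
	uint16_t dr_actions_num;        /* Total number of DR action slots. */
	uint16_t actions_num;           /* Number of flow actions. */
	uint16_t actions_off[8];        /* Flow action index -> DR slot. */
	struct example_slot fixed[8];   /* Fixed actions filled at table creation. */
};

/* Rule construction: copy the cached fixed actions, then overwrite only the
 * dynamic slots at their precomputed offsets (in the PMD only masked-out
 * actions are dynamic; all actions are patched here for brevity). */
static void
example_rule_construct(const struct example_template *t,
		       const uint64_t dyn[], struct example_slot out[])
{
	uint16_t i;

	memcpy(out, t->fixed, sizeof(out[0]) * t->dr_actions_num);
	for (i = 0; i < t->actions_num; i++)
		out[t->actions_off[i]].data = dyn[i];
}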

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_os.c |   1 +
 drivers/net/mlx5/mlx5.c          |   4 +-
 drivers/net/mlx5/mlx5.h          |   2 +
 drivers/net/mlx5/mlx5_flow.h     |  30 +-
 drivers/net/mlx5/mlx5_flow_hw.c  | 617 +++++++++++++++++++++++++------
 drivers/net/mlx5/mlx5_trigger.c  |  10 +
 6 files changed, 541 insertions(+), 123 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 061b825e7b..65795da516 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1565,6 +1565,7 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 		}
 		/* Only HWS requires this information. */
 		flow_hw_init_tags_set(eth_dev);
+		flow_hw_init_flow_metadata_config(eth_dev);
 		if (priv->sh->config.dv_esw_en &&
 		    flow_hw_create_vport_action(eth_dev)) {
 			DRV_LOG(ERR, "port %u failed to create vport action",
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index b6a66f12ee..cf7b7b7158 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1969,8 +1969,10 @@ mlx5_dev_close(struct rte_eth_dev *dev)
 	flow_hw_destroy_vport_action(dev);
 	flow_hw_resource_release(dev);
 	flow_hw_clear_port_info(dev);
-	if (priv->sh->config.dv_flow_en == 2)
+	if (priv->sh->config.dv_flow_en == 2) {
+		flow_hw_clear_flow_metadata_config();
 		flow_hw_clear_tags_set(dev);
+	}
 #endif
 	if (priv->rxq_privs != NULL) {
 		/* XXX race condition if mlx5_rx_burst() is still running. */
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 4859f5a509..c0835e725f 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1644,6 +1644,8 @@ struct mlx5_priv {
 	struct mlx5dr_action *hw_drop[2];
 	/* HW steering global tag action. */
 	struct mlx5dr_action *hw_tag[2];
+	/* HW steering create ongoing rte flow table list header. */
+	LIST_HEAD(flow_hw_tbl_ongo, rte_flow_template_table) flow_hw_tbl_ongo;
 	struct mlx5_indexed_pool *acts_ipool; /* Action data indexed pool. */
 #endif
 };
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 8f1b66eaac..ae1417f10e 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1175,6 +1175,11 @@ struct rte_flow_actions_template {
 	struct rte_flow_actions_template_attr attr;
 	struct rte_flow_action *actions; /* Cached flow actions. */
 	struct rte_flow_action *masks; /* Cached action masks.*/
+	struct mlx5dr_action_template *tmpl; /* mlx5dr action template. */
+	uint16_t dr_actions_num; /* Number of DR rule actions. */
+	uint16_t actions_num; /* Number of flow actions. */
+	uint16_t *actions_off; /* DR action offset for a given rte action index. */
+	uint16_t reformat_off; /* Offset of DR reformat action. */
 	uint16_t mhdr_off; /* Offset of DR modify header action. */
 	uint32_t refcnt; /* Reference counter. */
 	uint16_t rx_cpy_pos; /* Action position of Rx metadata to be copied. */
@@ -1226,7 +1231,6 @@ struct mlx5_hw_actions {
 	/* Encap/Decap action. */
 	struct mlx5_hw_encap_decap_action *encap_decap;
 	uint16_t encap_decap_pos; /* Encap/Decap action position. */
-	uint32_t acts_num:4; /* Total action number. */
 	uint32_t mark:1; /* Indicate the mark action. */
 	uint32_t cnt_id; /* Counter id. */
 	/* Translated DR action array from action template. */
@@ -1482,6 +1486,13 @@ flow_hw_get_wire_port(struct ibv_context *ibctx)
 }
 #endif
 
+extern uint32_t mlx5_flow_hw_flow_metadata_config_refcnt;
+extern uint8_t mlx5_flow_hw_flow_metadata_esw_en;
+extern uint8_t mlx5_flow_hw_flow_metadata_xmeta_en;
+
+void flow_hw_init_flow_metadata_config(struct rte_eth_dev *dev);
+void flow_hw_clear_flow_metadata_config(void);
+
 /*
  * Convert metadata or tag to the actual register.
  * META: Can only be used to match in the FDB in this stage, fixed C_1.
@@ -1493,7 +1504,20 @@ flow_hw_get_reg_id(enum rte_flow_item_type type, uint32_t id)
 {
 	switch (type) {
 	case RTE_FLOW_ITEM_TYPE_META:
-		return REG_C_1;
+		if (mlx5_flow_hw_flow_metadata_esw_en &&
+		    mlx5_flow_hw_flow_metadata_xmeta_en == MLX5_XMETA_MODE_META32_HWS) {
+			return REG_C_1;
+		}
+		/*
+		 * On root table - PMD allows only egress META matching, thus
+		 * REG_A matching is sufficient.
+		 *
+		 * On non-root tables - REG_A corresponds to general_purpose_lookup_field,
+		 * which translates to REG_A in NIC TX and to REG_B in NIC RX.
+		 * However, current FW does not implement REG_B case right now, so
+		 * REG_B case should be rejected on pattern template validation.
+		 */
+		return REG_A;
 	case RTE_FLOW_ITEM_TYPE_TAG:
 		MLX5_ASSERT(id < MLX5_FLOW_HW_TAGS_MAX);
 		return mlx5_flow_hw_avl_tags[id];
@@ -2402,4 +2426,6 @@ int mlx5_flow_pattern_validate(struct rte_eth_dev *dev,
 		const struct rte_flow_pattern_template_attr *attr,
 		const struct rte_flow_item items[],
 		struct rte_flow_error *error);
+int flow_hw_table_update(struct rte_eth_dev *dev,
+			 struct rte_flow_error *error);
 #endif /* RTE_PMD_MLX5_FLOW_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 507abb54e4..91835cd024 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -340,6 +340,13 @@ __flow_hw_action_template_destroy(struct rte_eth_dev *dev,
 				 struct mlx5_hw_actions *acts)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_action_construct_data *data;
+
+	while (!LIST_EMPTY(&acts->act_list)) {
+		data = LIST_FIRST(&acts->act_list);
+		LIST_REMOVE(data, next);
+		mlx5_ipool_free(priv->acts_ipool, data->idx);
+	}
 
 	if (acts->jump) {
 		struct mlx5_flow_group *grp;
@@ -349,6 +356,16 @@ __flow_hw_action_template_destroy(struct rte_eth_dev *dev,
 		mlx5_hlist_unregister(priv->sh->flow_tbls, &grp->entry);
 		acts->jump = NULL;
 	}
+	if (acts->tir) {
+		mlx5_hrxq_release(dev, acts->tir->idx);
+		acts->tir = NULL;
+	}
+	if (acts->encap_decap) {
+		if (acts->encap_decap->action)
+			mlx5dr_action_destroy(acts->encap_decap->action);
+		mlx5_free(acts->encap_decap);
+		acts->encap_decap = NULL;
+	}
 	if (acts->mhdr) {
 		if (acts->mhdr->action)
 			mlx5dr_action_destroy(acts->mhdr->action);
@@ -967,33 +984,29 @@ flow_hw_represented_port_compile(struct rte_eth_dev *dev,
 static __rte_always_inline int
 flow_hw_meter_compile(struct rte_eth_dev *dev,
 		      const struct mlx5_flow_template_table_cfg *cfg,
-		      uint32_t  start_pos, const struct rte_flow_action *action,
-		      struct mlx5_hw_actions *acts, uint32_t *end_pos,
+		      uint16_t aso_mtr_pos,
+		      uint16_t jump_pos,
+		      const struct rte_flow_action *action,
+		      struct mlx5_hw_actions *acts,
 		      struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_aso_mtr *aso_mtr;
 	const struct rte_flow_action_meter *meter = action->conf;
-	uint32_t pos = start_pos;
 	uint32_t group = cfg->attr.flow_attr.group;
 
 	aso_mtr = mlx5_aso_meter_by_idx(priv, meter->mtr_id);
-	acts->rule_acts[pos].action = priv->mtr_bulk.action;
-	acts->rule_acts[pos].aso_meter.offset = aso_mtr->offset;
-		acts->jump = flow_hw_jump_action_register
+	acts->rule_acts[aso_mtr_pos].action = priv->mtr_bulk.action;
+	acts->rule_acts[aso_mtr_pos].aso_meter.offset = aso_mtr->offset;
+	acts->jump = flow_hw_jump_action_register
 		(dev, cfg, aso_mtr->fm.group, error);
-	if (!acts->jump) {
-		*end_pos = start_pos;
+	if (!acts->jump)
 		return -ENOMEM;
-	}
-	acts->rule_acts[++pos].action = (!!group) ?
+	acts->rule_acts[jump_pos].action = (!!group) ?
 				    acts->jump->hws_action :
 				    acts->jump->root_action;
-	*end_pos = pos;
-	if (mlx5_aso_mtr_wait(priv->sh, aso_mtr)) {
-		*end_pos = start_pos;
+	if (mlx5_aso_mtr_wait(priv->sh, aso_mtr))
 		return -ENOMEM;
-	}
 	return 0;
 }
 
@@ -1046,11 +1059,11 @@ flow_hw_cnt_compile(struct rte_eth_dev *dev, uint32_t  start_pos,
  *    Table on success, NULL otherwise and rte_errno is set.
  */
 static int
-flow_hw_actions_translate(struct rte_eth_dev *dev,
-			  const struct mlx5_flow_template_table_cfg *cfg,
-			  struct mlx5_hw_actions *acts,
-			  struct rte_flow_actions_template *at,
-			  struct rte_flow_error *error)
+__flow_hw_actions_translate(struct rte_eth_dev *dev,
+			    const struct mlx5_flow_template_table_cfg *cfg,
+			    struct mlx5_hw_actions *acts,
+			    struct rte_flow_actions_template *at,
+			    struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	const struct rte_flow_template_table_attr *table_attr = &cfg->attr;
@@ -1061,12 +1074,15 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 	enum mlx5dr_action_reformat_type refmt_type = 0;
 	const struct rte_flow_action_raw_encap *raw_encap_data;
 	const struct rte_flow_item *enc_item = NULL, *enc_item_m = NULL;
-	uint16_t reformat_pos = MLX5_HW_MAX_ACTS, reformat_src = 0;
+	uint16_t reformat_src = 0;
 	uint8_t *encap_data = NULL, *encap_data_m = NULL;
 	size_t data_size = 0;
 	struct mlx5_hw_modify_header_action mhdr = { 0 };
 	bool actions_end = false;
-	uint32_t type, i;
+	uint32_t type;
+	bool reformat_used = false;
+	uint16_t action_pos;
+	uint16_t jump_pos;
 	int err;
 
 	flow_hw_modify_field_init(&mhdr, at);
@@ -1076,46 +1092,53 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 		type = MLX5DR_TABLE_TYPE_NIC_TX;
 	else
 		type = MLX5DR_TABLE_TYPE_NIC_RX;
-	for (i = 0; !actions_end; actions++, masks++) {
+	for (; !actions_end; actions++, masks++) {
 		switch (actions->type) {
 		case RTE_FLOW_ACTION_TYPE_INDIRECT:
+			action_pos = at->actions_off[actions - at->actions];
 			if (!attr->group) {
 				DRV_LOG(ERR, "Indirect action is not supported in root table.");
 				goto err;
 			}
 			if (actions->conf && masks->conf) {
 				if (flow_hw_shared_action_translate
-				(dev, actions, acts, actions - action_start, i))
+				(dev, actions, acts, actions - action_start, action_pos))
 					goto err;
 			} else if (__flow_hw_act_data_general_append
 					(priv, acts, actions->type,
-					 actions - action_start, i)){
+					 actions - action_start, action_pos)){
 				goto err;
 			}
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			break;
 		case RTE_FLOW_ACTION_TYPE_DROP:
-			acts->rule_acts[i++].action =
+			action_pos = at->actions_off[actions - at->actions];
+			acts->rule_acts[action_pos].action =
 				priv->hw_drop[!!attr->group];
 			break;
 		case RTE_FLOW_ACTION_TYPE_MARK:
+			action_pos = at->actions_off[actions - at->actions];
 			acts->mark = true;
-			if (masks->conf)
-				acts->rule_acts[i].tag.value =
+			if (masks->conf &&
+			    ((const struct rte_flow_action_mark *)
+			     masks->conf)->id)
+				acts->rule_acts[action_pos].tag.value =
 					mlx5_flow_mark_set
 					(((const struct rte_flow_action_mark *)
-					(masks->conf))->id);
+					(actions->conf))->id);
 			else if (__flow_hw_act_data_general_append(priv, acts,
-				actions->type, actions - action_start, i))
+				actions->type, actions - action_start, action_pos))
 				goto err;
-			acts->rule_acts[i++].action =
+			acts->rule_acts[action_pos].action =
 				priv->hw_tag[!!attr->group];
 			flow_hw_rxq_flag_set(dev, true);
 			break;
 		case RTE_FLOW_ACTION_TYPE_JUMP:
-			if (masks->conf) {
+			action_pos = at->actions_off[actions - at->actions];
+			if (masks->conf &&
+			    ((const struct rte_flow_action_jump *)
+			     masks->conf)->group) {
 				uint32_t jump_group =
 					((const struct rte_flow_action_jump *)
 					actions->conf)->group;
@@ -1123,76 +1146,77 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 						(dev, cfg, jump_group, error);
 				if (!acts->jump)
 					goto err;
-				acts->rule_acts[i].action = (!!attr->group) ?
+				acts->rule_acts[action_pos].action = (!!attr->group) ?
 						acts->jump->hws_action :
 						acts->jump->root_action;
 			} else if (__flow_hw_act_data_general_append
 					(priv, acts, actions->type,
-					 actions - action_start, i)){
+					 actions - action_start, action_pos)){
 				goto err;
 			}
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_QUEUE:
-			if (masks->conf) {
+			action_pos = at->actions_off[actions - at->actions];
+			if (masks->conf &&
+			    ((const struct rte_flow_action_queue *)
+			     masks->conf)->index) {
 				acts->tir = flow_hw_tir_action_register
 				(dev,
 				 mlx5_hw_act_flag[!!attr->group][type],
 				 actions);
 				if (!acts->tir)
 					goto err;
-				acts->rule_acts[i].action =
+				acts->rule_acts[action_pos].action =
 					acts->tir->action;
 			} else if (__flow_hw_act_data_general_append
 					(priv, acts, actions->type,
-					 actions - action_start, i)) {
+					 actions - action_start, action_pos)) {
 				goto err;
 			}
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RSS:
-			if (masks->conf) {
+			action_pos = at->actions_off[actions - at->actions];
+			if (actions->conf && masks->conf) {
 				acts->tir = flow_hw_tir_action_register
 				(dev,
 				 mlx5_hw_act_flag[!!attr->group][type],
 				 actions);
 				if (!acts->tir)
 					goto err;
-				acts->rule_acts[i].action =
+				acts->rule_acts[action_pos].action =
 					acts->tir->action;
 			} else if (__flow_hw_act_data_general_append
 					(priv, acts, actions->type,
-					 actions - action_start, i)) {
+					 actions - action_start, action_pos)) {
 				goto err;
 			}
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
-			MLX5_ASSERT(reformat_pos == MLX5_HW_MAX_ACTS);
+			MLX5_ASSERT(!reformat_used);
 			enc_item = ((const struct rte_flow_action_vxlan_encap *)
 				   actions->conf)->definition;
 			if (masks->conf)
 				enc_item_m = ((const struct rte_flow_action_vxlan_encap *)
 					     masks->conf)->definition;
-			reformat_pos = i++;
+			reformat_used = true;
 			reformat_src = actions - action_start;
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_L2_TO_TNL_L2;
 			break;
 		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
-			MLX5_ASSERT(reformat_pos == MLX5_HW_MAX_ACTS);
+			MLX5_ASSERT(!reformat_used);
 			enc_item = ((const struct rte_flow_action_nvgre_encap *)
 				   actions->conf)->definition;
 			if (masks->conf)
 				enc_item_m = ((const struct rte_flow_action_nvgre_encap *)
 					     masks->conf)->definition;
-			reformat_pos = i++;
+			reformat_used = true;
 			reformat_src = actions - action_start;
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_L2_TO_TNL_L2;
 			break;
 		case RTE_FLOW_ACTION_TYPE_VXLAN_DECAP:
 		case RTE_FLOW_ACTION_TYPE_NVGRE_DECAP:
-			MLX5_ASSERT(reformat_pos == MLX5_HW_MAX_ACTS);
-			reformat_pos = i++;
+			MLX5_ASSERT(!reformat_used);
+			reformat_used = true;
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_TNL_L2_TO_L2;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
@@ -1206,25 +1230,23 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 				 actions->conf;
 			encap_data = raw_encap_data->data;
 			data_size = raw_encap_data->size;
-			if (reformat_pos != MLX5_HW_MAX_ACTS) {
+			if (reformat_used) {
 				refmt_type = data_size <
 				MLX5_ENCAPSULATION_DECISION_SIZE ?
 				MLX5DR_ACTION_REFORMAT_TYPE_TNL_L3_TO_L2 :
 				MLX5DR_ACTION_REFORMAT_TYPE_L2_TO_TNL_L3;
 			} else {
-				reformat_pos = i++;
+				reformat_used = true;
 				refmt_type =
 				MLX5DR_ACTION_REFORMAT_TYPE_L2_TO_TNL_L2;
 			}
 			reformat_src = actions - action_start;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RAW_DECAP:
-			reformat_pos = i++;
+			reformat_used = true;
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_TNL_L2_TO_L2;
 			break;
 		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
-			if (mhdr.pos == UINT16_MAX)
-				mhdr.pos = i++;
 			err = flow_hw_modify_field_compile(dev, attr, action_start,
 							   actions, masks, acts, &mhdr,
 							   error);
@@ -1242,40 +1264,46 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 				action_start += 1;
 			break;
 		case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
+			action_pos = at->actions_off[actions - at->actions];
 			if (flow_hw_represented_port_compile
 					(dev, attr, action_start, actions,
-					 masks, acts, i, error))
+					 masks, acts, action_pos, error))
 				goto err;
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_METER:
+			/*
+			 * METER action is compiled to 2 DR actions - ASO_METER and FT.
+			 * The calculated DR offset is stored only for ASO_METER;
+			 * FT is assumed to be the next action.
+			 */
+			action_pos = at->actions_off[actions - at->actions];
+			jump_pos = action_pos + 1;
 			if (actions->conf && masks->conf &&
 			    ((const struct rte_flow_action_meter *)
 			     masks->conf)->mtr_id) {
 				err = flow_hw_meter_compile(dev, cfg,
-						i, actions, acts, &i, error);
+						action_pos, jump_pos, actions, acts, error);
 				if (err)
 					goto err;
 			} else if (__flow_hw_act_data_general_append(priv, acts,
 							actions->type,
 							actions - action_start,
-							i))
+							action_pos))
 				goto err;
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_COUNT:
+			action_pos = at->actions_off[actions - action_start];
 			if (masks->conf &&
 			    ((const struct rte_flow_action_count *)
 			     masks->conf)->id) {
-				err = flow_hw_cnt_compile(dev, i, acts);
+				err = flow_hw_cnt_compile(dev, action_pos, acts);
 				if (err)
 					goto err;
 			} else if (__flow_hw_act_data_general_append
 					(priv, acts, actions->type,
-					 actions - action_start, i)) {
+					 actions - action_start, action_pos)) {
 				goto err;
 			}
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
@@ -1309,10 +1337,11 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			goto err;
 		acts->rule_acts[acts->mhdr->pos].action = acts->mhdr->action;
 	}
-	if (reformat_pos != MLX5_HW_MAX_ACTS) {
+	if (reformat_used) {
 		uint8_t buf[MLX5_ENCAP_MAX_LEN];
 		bool shared_rfmt = true;
 
+		MLX5_ASSERT(at->reformat_off != UINT16_MAX);
 		if (enc_item) {
 			MLX5_ASSERT(!encap_data);
 			if (flow_dv_convert_encap_data(enc_item, buf, &data_size, error))
@@ -1340,20 +1369,17 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 				 (shared_rfmt ? MLX5DR_ACTION_FLAG_SHARED : 0));
 		if (!acts->encap_decap->action)
 			goto err;
-		acts->rule_acts[reformat_pos].action =
-						acts->encap_decap->action;
-		acts->rule_acts[reformat_pos].reformat.data =
-						acts->encap_decap->data;
+		acts->rule_acts[at->reformat_off].action = acts->encap_decap->action;
+		acts->rule_acts[at->reformat_off].reformat.data = acts->encap_decap->data;
 		if (shared_rfmt)
-			acts->rule_acts[reformat_pos].reformat.offset = 0;
+			acts->rule_acts[at->reformat_off].reformat.offset = 0;
 		else if (__flow_hw_act_data_encap_append(priv, acts,
 				 (action_start + reformat_src)->type,
-				 reformat_src, reformat_pos, data_size))
+				 reformat_src, at->reformat_off, data_size))
 			goto err;
 		acts->encap_decap->shared = shared_rfmt;
-		acts->encap_decap_pos = reformat_pos;
+		acts->encap_decap_pos = at->reformat_off;
 	}
-	acts->acts_num = i;
 	return 0;
 err:
 	err = rte_errno;
@@ -1363,6 +1389,40 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 				  "fail to create rte table");
 }
 
+/**
+ * Translate rte_flow actions to DR action.
+ *
+ * @param[in] dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param[in] tbl
+ *   Pointer to the flow template table.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *    0 on success, negative value otherwise and rte_errno is set.
+ */
+static int
+flow_hw_actions_translate(struct rte_eth_dev *dev,
+			  struct rte_flow_template_table *tbl,
+			  struct rte_flow_error *error)
+{
+	uint32_t i;
+
+	for (i = 0; i < tbl->nb_action_templates; i++) {
+		if (__flow_hw_actions_translate(dev, &tbl->cfg,
+						&tbl->ats[i].acts,
+						tbl->ats[i].action_template,
+						error))
+			goto err;
+	}
+	return 0;
+err:
+	while (i--)
+		__flow_hw_action_template_destroy(dev, &tbl->ats[i].acts);
+	return -1;
+}
+
 /**
  * Get shared indirect action.
  *
@@ -1611,16 +1671,17 @@ flow_hw_modify_field_construct(struct mlx5_hw_q_job *job,
 static __rte_always_inline int
 flow_hw_actions_construct(struct rte_eth_dev *dev,
 			  struct mlx5_hw_q_job *job,
-			  const struct mlx5_hw_actions *hw_acts,
+			  const struct mlx5_hw_action_template *hw_at,
 			  const uint8_t it_idx,
 			  const struct rte_flow_action actions[],
 			  struct mlx5dr_rule_action *rule_acts,
-			  uint32_t *acts_num,
 			  uint32_t queue)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow_template_table *table = job->flow->table;
 	struct mlx5_action_construct_data *act_data;
+	const struct rte_flow_actions_template *at = hw_at->action_template;
+	const struct mlx5_hw_actions *hw_acts = &hw_at->acts;
 	const struct rte_flow_action *action;
 	const struct rte_flow_action_raw_encap *raw_encap_data;
 	const struct rte_flow_item *enc_item = NULL;
@@ -1636,11 +1697,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	struct mlx5_aso_mtr *mtr;
 	uint32_t mtr_id;
 
-	memcpy(rule_acts, hw_acts->rule_acts,
-	       sizeof(*rule_acts) * hw_acts->acts_num);
-	*acts_num = hw_acts->acts_num;
-	if (LIST_EMPTY(&hw_acts->act_list))
-		return 0;
+	rte_memcpy(rule_acts, hw_acts->rule_acts, sizeof(*rule_acts) * at->dr_actions_num);
 	attr.group = table->grp->group_id;
 	ft_flag = mlx5_hw_act_flag[!!table->grp->group_id][table->type];
 	if (table->type == MLX5DR_TABLE_TYPE_FDB) {
@@ -1774,7 +1831,6 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 							 jump->root_action;
 			job->flow->jump = jump;
 			job->flow->fate_type = MLX5_FLOW_FATE_JUMP;
-			(*acts_num)++;
 			if (mlx5_aso_mtr_wait(priv->sh, mtr))
 				return -1;
 			break;
@@ -1912,13 +1968,16 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 		.burst = attr->postpone,
 	};
 	struct mlx5dr_rule_action rule_acts[MLX5_HW_MAX_ACTS];
-	struct mlx5_hw_actions *hw_acts;
 	struct rte_flow_hw *flow;
 	struct mlx5_hw_q_job *job;
 	const struct rte_flow_item *rule_items;
-	uint32_t acts_num, flow_idx;
+	uint32_t flow_idx;
 	int ret;
 
+	if (unlikely((!dev->data->dev_started))) {
+		rte_errno = EINVAL;
+		goto error;
+	}
 	if (unlikely(!priv->hw_q[queue].job_idx)) {
 		rte_errno = ENOMEM;
 		goto error;
@@ -1941,7 +2000,12 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	job->flow = flow;
 	job->user_data = user_data;
 	rule_attr.user_data = job;
-	hw_acts = &table->ats[action_template_index].acts;
+	/*
+	 * Indexed pool returns 1-based indices, but mlx5dr expects 0-based indices for rule
+	 * insertion hints.
+	 */
+	MLX5_ASSERT(flow_idx > 0);
+	rule_attr.rule_idx = flow_idx - 1;
 	/*
 	 * Construct the flow actions based on the input actions.
 	 * The implicitly appended action is always fixed, like metadata
@@ -1949,8 +2013,8 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	 * No need to copy and construct a new "actions" list based on the
 	 * user's input, in order to save the cost.
 	 */
-	if (flow_hw_actions_construct(dev, job, hw_acts, pattern_template_index,
-				  actions, rule_acts, &acts_num, queue)) {
+	if (flow_hw_actions_construct(dev, job, &table->ats[action_template_index],
+				      pattern_template_index, actions, rule_acts, queue)) {
 		rte_errno = EINVAL;
 		goto free;
 	}
@@ -1959,7 +2023,7 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	if (!rule_items)
 		goto free;
 	ret = mlx5dr_rule_create(table->matcher,
-				 pattern_template_index, items,
+				 pattern_template_index, rule_items,
 				 action_template_index, rule_acts,
 				 &rule_attr, &flow->rule);
 	if (likely(!ret))
@@ -2295,6 +2359,7 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	struct rte_flow_template_table *tbl = NULL;
 	struct mlx5_flow_group *grp;
 	struct mlx5dr_match_template *mt[MLX5_HW_TBL_MAX_ITEM_TEMPLATE];
+	struct mlx5dr_action_template *at[MLX5_HW_TBL_MAX_ACTION_TEMPLATE];
 	const struct rte_flow_template_table_attr *attr = &table_cfg->attr;
 	struct rte_flow_attr flow_attr = attr->flow_attr;
 	struct mlx5_flow_cb_ctx ctx = {
@@ -2315,6 +2380,7 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	struct mlx5_list_entry *ge;
 	uint32_t i, max_tpl = MLX5_HW_TBL_MAX_ITEM_TEMPLATE;
 	uint32_t nb_flows = rte_align32pow2(attr->nb_flows);
+	bool port_started = !!dev->data->dev_started;
 	int err;
 
 	/* HWS layer accepts only 1 item template with root table. */
@@ -2349,12 +2415,20 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	tbl->grp = grp;
 	/* Prepare matcher information. */
 	matcher_attr.priority = attr->flow_attr.priority;
+	matcher_attr.optimize_using_rule_idx = true;
 	matcher_attr.mode = MLX5DR_MATCHER_RESOURCE_MODE_RULE;
 	matcher_attr.rule.num_log = rte_log2_u32(nb_flows);
 	/* Build the item template. */
 	for (i = 0; i < nb_item_templates; i++) {
 		uint32_t ret;
 
+		if ((flow_attr.ingress && !item_templates[i]->attr.ingress) ||
+		    (flow_attr.egress && !item_templates[i]->attr.egress) ||
+		    (flow_attr.transfer && !item_templates[i]->attr.transfer)) {
+			DRV_LOG(ERR, "pattern template and template table attribute mismatch");
+			rte_errno = EINVAL;
+			goto it_error;
+		}
 		ret = __atomic_add_fetch(&item_templates[i]->refcnt, 1,
 					 __ATOMIC_RELAXED);
 		if (ret <= 1) {
@@ -2364,10 +2438,6 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 		mt[i] = item_templates[i]->mt;
 		tbl->its[i] = item_templates[i];
 	}
-	tbl->matcher = mlx5dr_matcher_create
-		(tbl->grp->tbl, mt, nb_item_templates, NULL, 0, &matcher_attr);
-	if (!tbl->matcher)
-		goto it_error;
 	tbl->nb_item_templates = nb_item_templates;
 	/* Build the action template. */
 	for (i = 0; i < nb_action_templates; i++) {
@@ -2379,21 +2449,31 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 			rte_errno = EINVAL;
 			goto at_error;
 		}
+		at[i] = action_templates[i]->tmpl;
+		tbl->ats[i].action_template = action_templates[i];
 		LIST_INIT(&tbl->ats[i].acts.act_list);
-		err = flow_hw_actions_translate(dev, &tbl->cfg,
-						&tbl->ats[i].acts,
-						action_templates[i], error);
+		if (!port_started)
+			continue;
+		err = __flow_hw_actions_translate(dev, &tbl->cfg,
+						  &tbl->ats[i].acts,
+						  action_templates[i], error);
 		if (err) {
 			i++;
 			goto at_error;
 		}
-		tbl->ats[i].action_template = action_templates[i];
 	}
 	tbl->nb_action_templates = nb_action_templates;
+	tbl->matcher = mlx5dr_matcher_create
+		(tbl->grp->tbl, mt, nb_item_templates, at, nb_action_templates, &matcher_attr);
+	if (!tbl->matcher)
+		goto at_error;
 	tbl->type = attr->flow_attr.transfer ? MLX5DR_TABLE_TYPE_FDB :
 		    (attr->flow_attr.egress ? MLX5DR_TABLE_TYPE_NIC_TX :
 		    MLX5DR_TABLE_TYPE_NIC_RX);
-	LIST_INSERT_HEAD(&priv->flow_hw_tbl, tbl, next);
+	if (port_started)
+		LIST_INSERT_HEAD(&priv->flow_hw_tbl, tbl, next);
+	else
+		LIST_INSERT_HEAD(&priv->flow_hw_tbl_ongo, tbl, next);
 	return tbl;
 at_error:
 	while (i--) {
@@ -2406,7 +2486,6 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	while (i--)
 		__atomic_sub_fetch(&item_templates[i]->refcnt,
 				   1, __ATOMIC_RELAXED);
-	mlx5dr_matcher_destroy(tbl->matcher);
 error:
 	err = rte_errno;
 	if (tbl) {
@@ -2423,6 +2502,33 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	return NULL;
 }
 
+/**
+ * Update flow template table.
+ *
+ * @param[in] dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *    0 on success, negative value otherwise and rte_errno is set.
+ */
+int
+flow_hw_table_update(struct rte_eth_dev *dev,
+		     struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_flow_template_table *tbl;
+
+	while ((tbl = LIST_FIRST(&priv->flow_hw_tbl_ongo)) != NULL) {
+		if (flow_hw_actions_translate(dev, tbl, error))
+			return -1;
+		LIST_REMOVE(tbl, next);
+		LIST_INSERT_HEAD(&priv->flow_hw_tbl, tbl, next);
+	}
+	return 0;
+}
+
 /**
  * Translates group index specified by the user in @p attr to internal
  * group index.
@@ -2501,6 +2607,7 @@ flow_hw_template_table_create(struct rte_eth_dev *dev,
 			      uint8_t nb_action_templates,
 			      struct rte_flow_error *error)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_flow_template_table_cfg cfg = {
 		.attr = *attr,
 		.external = true,
@@ -2509,6 +2616,12 @@ flow_hw_template_table_create(struct rte_eth_dev *dev,
 
 	if (flow_hw_translate_group(dev, &cfg, group, &cfg.attr.flow_attr.group, error))
 		return NULL;
+	if (priv->sh->config.dv_esw_en && cfg.attr.flow_attr.egress) {
+		rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ATTR, NULL,
+				  "egress flows are not supported with HW Steering"
+				  " when E-Switch is enabled");
+		return NULL;
+	}
 	return flow_hw_table_create(dev, &cfg, item_templates, nb_item_templates,
 				    action_templates, nb_action_templates, error);
 }
@@ -2750,7 +2863,8 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 		const struct rte_flow_action *mask = &masks[i];
 
 		MLX5_ASSERT(i < MLX5_HW_MAX_ACTS);
-		if (action->type != mask->type)
+		if (action->type != RTE_FLOW_ACTION_TYPE_INDIRECT &&
+		    action->type != mask->type)
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ACTION,
 						  action,
@@ -2826,6 +2940,157 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static enum mlx5dr_action_type mlx5_hw_dr_action_types[] = {
+	[RTE_FLOW_ACTION_TYPE_MARK] = MLX5DR_ACTION_TYP_TAG,
+	[RTE_FLOW_ACTION_TYPE_DROP] = MLX5DR_ACTION_TYP_DROP,
+	[RTE_FLOW_ACTION_TYPE_JUMP] = MLX5DR_ACTION_TYP_FT,
+	[RTE_FLOW_ACTION_TYPE_QUEUE] = MLX5DR_ACTION_TYP_TIR,
+	[RTE_FLOW_ACTION_TYPE_RSS] = MLX5DR_ACTION_TYP_TIR,
+	[RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP] = MLX5DR_ACTION_TYP_L2_TO_TNL_L2,
+	[RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP] = MLX5DR_ACTION_TYP_L2_TO_TNL_L2,
+	[RTE_FLOW_ACTION_TYPE_VXLAN_DECAP] = MLX5DR_ACTION_TYP_TNL_L2_TO_L2,
+	[RTE_FLOW_ACTION_TYPE_NVGRE_DECAP] = MLX5DR_ACTION_TYP_TNL_L2_TO_L2,
+	[RTE_FLOW_ACTION_TYPE_MODIFY_FIELD] = MLX5DR_ACTION_TYP_MODIFY_HDR,
+	[RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT] = MLX5DR_ACTION_TYP_VPORT,
+	[RTE_FLOW_ACTION_TYPE_COUNT] = MLX5DR_ACTION_TYP_CTR,
+};
+
+static int
+flow_hw_dr_actions_template_handle_shared(const struct rte_flow_action *mask,
+					  unsigned int action_src,
+					  enum mlx5dr_action_type *action_types,
+					  uint16_t *curr_off,
+					  struct rte_flow_actions_template *at)
+{
+	uint32_t type;
+
+	if (!mask) {
+		DRV_LOG(WARNING, "Unable to determine indirect action type "
+			"without a mask specified");
+		return -EINVAL;
+	}
+	type = mask->type;
+	switch (type) {
+	case RTE_FLOW_ACTION_TYPE_RSS:
+		at->actions_off[action_src] = *curr_off;
+		action_types[*curr_off] = MLX5DR_ACTION_TYP_TIR;
+		*curr_off = *curr_off + 1;
+		break;
+	case RTE_FLOW_ACTION_TYPE_COUNT:
+		at->actions_off[action_src] = *curr_off;
+		action_types[*curr_off] = MLX5DR_ACTION_TYP_CTR;
+		*curr_off = *curr_off + 1;
+		break;
+	default:
+		DRV_LOG(WARNING, "Unsupported shared action type: %d", type);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+/**
+ * Create DR action template based on a provided sequence of flow actions.
+ *
+ * @param[in] at
+ *   Pointer to flow actions template to be updated.
+ *
+ * @return
+ *   DR action template pointer on success and action offsets in @p at are updated.
+ *   NULL otherwise.
+ */
+static struct mlx5dr_action_template *
+flow_hw_dr_actions_template_create(struct rte_flow_actions_template *at)
+{
+	struct mlx5dr_action_template *dr_template;
+	enum mlx5dr_action_type action_types[MLX5_HW_MAX_ACTS] = { MLX5DR_ACTION_TYP_LAST };
+	unsigned int i;
+	uint16_t curr_off;
+	enum mlx5dr_action_type reformat_act_type = MLX5DR_ACTION_TYP_TNL_L2_TO_L2;
+	uint16_t reformat_off = UINT16_MAX;
+	uint16_t mhdr_off = UINT16_MAX;
+	int ret;
+	for (i = 0, curr_off = 0; at->actions[i].type != RTE_FLOW_ACTION_TYPE_END; ++i) {
+		const struct rte_flow_action_raw_encap *raw_encap_data;
+		size_t data_size;
+		enum mlx5dr_action_type type;
+
+		if (curr_off >= MLX5_HW_MAX_ACTS)
+			goto err_actions_num;
+		switch (at->actions[i].type) {
+		case RTE_FLOW_ACTION_TYPE_VOID:
+			break;
+		case RTE_FLOW_ACTION_TYPE_INDIRECT:
+			ret = flow_hw_dr_actions_template_handle_shared(&at->masks[i], i,
+									action_types,
+									&curr_off, at);
+			if (ret)
+				return NULL;
+			break;
+		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
+		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
+		case RTE_FLOW_ACTION_TYPE_VXLAN_DECAP:
+		case RTE_FLOW_ACTION_TYPE_NVGRE_DECAP:
+			MLX5_ASSERT(reformat_off == UINT16_MAX);
+			reformat_off = curr_off++;
+			reformat_act_type = mlx5_hw_dr_action_types[at->actions[i].type];
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
+			raw_encap_data = at->actions[i].conf;
+			data_size = raw_encap_data->size;
+			if (reformat_off != UINT16_MAX) {
+				reformat_act_type = data_size < MLX5_ENCAPSULATION_DECISION_SIZE ?
+					MLX5DR_ACTION_TYP_TNL_L3_TO_L2 :
+					MLX5DR_ACTION_TYP_L2_TO_TNL_L3;
+			} else {
+				reformat_off = curr_off++;
+				reformat_act_type = MLX5DR_ACTION_TYP_L2_TO_TNL_L2;
+			}
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_DECAP:
+			reformat_off = curr_off++;
+			reformat_act_type = MLX5DR_ACTION_TYP_TNL_L2_TO_L2;
+			break;
+		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
+			if (mhdr_off == UINT16_MAX) {
+				mhdr_off = curr_off++;
+				type = mlx5_hw_dr_action_types[at->actions[i].type];
+				action_types[mhdr_off] = type;
+			}
+			break;
+		case RTE_FLOW_ACTION_TYPE_METER:
+			at->actions_off[i] = curr_off;
+			action_types[curr_off++] = MLX5DR_ACTION_TYP_ASO_METER;
+			if (curr_off >= MLX5_HW_MAX_ACTS)
+				goto err_actions_num;
+			action_types[curr_off++] = MLX5DR_ACTION_TYP_FT;
+			break;
+		default:
+			type = mlx5_hw_dr_action_types[at->actions[i].type];
+			at->actions_off[i] = curr_off;
+			action_types[curr_off++] = type;
+			break;
+		}
+	}
+	if (curr_off >= MLX5_HW_MAX_ACTS)
+		goto err_actions_num;
+	if (mhdr_off != UINT16_MAX)
+		at->mhdr_off = mhdr_off;
+	if (reformat_off != UINT16_MAX) {
+		at->reformat_off = reformat_off;
+		action_types[reformat_off] = reformat_act_type;
+	}
+	dr_template = mlx5dr_action_template_create(action_types);
+	if (dr_template)
+		at->dr_actions_num = curr_off;
+	else
+		DRV_LOG(ERR, "Failed to create DR action template: %d", rte_errno);
+	return dr_template;
+err_actions_num:
+	DRV_LOG(ERR, "Number of HW actions (%u) exceeded maximum (%u) allowed in template",
+		curr_off, MLX5_HW_MAX_ACTS);
+	return NULL;
+}
+
 /**
  * Create flow action template.
  *
@@ -2851,7 +3116,8 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 			struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	int len, act_len, mask_len, i;
+	int len, act_num, act_len, mask_len;
+	unsigned int i;
 	struct rte_flow_actions_template *at = NULL;
 	uint16_t pos = MLX5_HW_MAX_ACTS;
 	struct rte_flow_action tmp_action[MLX5_HW_MAX_ACTS];
@@ -2921,6 +3187,11 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 	if (mask_len <= 0)
 		return NULL;
 	len += RTE_ALIGN(mask_len, 16);
+	/* Count flow actions to allocate required space for storing DR offsets. */
+	act_num = 0;
+	for (i = 0; ra[i].type != RTE_FLOW_ACTION_TYPE_END; ++i)
+		act_num++;
+	len += RTE_ALIGN(act_num * sizeof(*at->actions_off), 16);
 	at = mlx5_malloc(MLX5_MEM_ZERO, len + sizeof(*at),
 			 RTE_CACHE_LINE_SIZE, rte_socket_id());
 	if (!at) {
@@ -2930,19 +3201,26 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 				   "cannot allocate action template");
 		return NULL;
 	}
-	/* Actions part is in the first half. */
+	/* Actions part is in the first part. */
 	at->attr = *attr;
 	at->actions = (struct rte_flow_action *)(at + 1);
 	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, at->actions,
 				len, ra, error);
 	if (act_len <= 0)
 		goto error;
-	/* Masks part is in the second half. */
+	/* Masks part is in the second part. */
 	at->masks = (struct rte_flow_action *)(((uint8_t *)at->actions) + act_len);
 	mask_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, at->masks,
 				 len - act_len, rm, error);
 	if (mask_len <= 0)
 		goto error;
+	/* DR actions offsets in the third part. */
+	at->actions_off = (uint16_t *)((uint8_t *)at->masks + mask_len);
+	at->actions_num = act_num;
+	for (i = 0; i < at->actions_num; ++i)
+		at->actions_off[i] = UINT16_MAX;
+	at->reformat_off = UINT16_MAX;
+	at->mhdr_off = UINT16_MAX;
 	at->rx_cpy_pos = pos;
 	/*
 	 * mlx5 PMD hacks indirect action index directly to the action conf.
@@ -2956,12 +3234,18 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 			at->masks[i].conf = masks->conf;
 		}
 	}
+	at->tmpl = flow_hw_dr_actions_template_create(at);
+	if (!at->tmpl)
+		goto error;
 	__atomic_fetch_add(&at->refcnt, 1, __ATOMIC_RELAXED);
 	LIST_INSERT_HEAD(&priv->flow_hw_at, at, next);
 	return at;
 error:
-	if (at)
+	if (at) {
+		if (at->tmpl)
+			mlx5dr_action_template_destroy(at->tmpl);
 		mlx5_free(at);
+	}
 	return NULL;
 }
 
@@ -2992,6 +3276,8 @@ flow_hw_actions_template_destroy(struct rte_eth_dev *dev __rte_unused,
 				   "action template in using");
 	}
 	LIST_REMOVE(template, next);
+	if (template->tmpl)
+		mlx5dr_action_template_destroy(template->tmpl);
 	mlx5_free(template);
 	return 0;
 }
@@ -3042,11 +3328,48 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 			 const struct rte_flow_item items[],
 			 struct rte_flow_error *error)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
 	int i;
 	bool items_end = false;
-	RTE_SET_USED(dev);
-	RTE_SET_USED(attr);
 
+	if (!attr->ingress && !attr->egress && !attr->transfer)
+		return rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ATTR, NULL,
+					  "at least one of the direction attributes"
+					  " must be specified");
+	if (priv->sh->config.dv_esw_en) {
+		MLX5_ASSERT(priv->master || priv->representor);
+		if (priv->master) {
+			/*
+			 * It is allowed to specify ingress, egress and transfer attributes
+			 * at the same time, in order to construct flows catching all missed
+			 * FDB traffic and forwarding it to the master port.
+			 */
+			if (!(attr->ingress ^ attr->egress ^ attr->transfer))
+				return rte_flow_error_set(error, EINVAL,
+							  RTE_FLOW_ERROR_TYPE_ATTR, NULL,
+							  "only one or all direction attributes"
+							  " at once can be used on transfer proxy"
+							  " port");
+		} else {
+			if (attr->transfer)
+				return rte_flow_error_set(error, EINVAL,
+							  RTE_FLOW_ERROR_TYPE_ATTR_TRANSFER, NULL,
+							  "transfer attribute cannot be used with"
+							  " port representors");
+			if (attr->ingress && attr->egress)
+				return rte_flow_error_set(error, EINVAL,
+							  RTE_FLOW_ERROR_TYPE_ATTR, NULL,
+							  "ingress and egress direction attributes"
+							  " cannot be used at the same time on"
+							  " port representors");
+		}
+	} else {
+		if (attr->transfer)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ATTR_TRANSFER, NULL,
+						  "transfer attribute cannot be used when"
+						  " E-Switch is disabled");
+	}
 	for (i = 0; !items_end; i++) {
 		int type = items[i].type;
 
@@ -3069,7 +3392,6 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 		{
 			const struct rte_flow_item_tag *tag =
 				(const struct rte_flow_item_tag *)items[i].spec;
-			struct mlx5_priv *priv = dev->data->dev_private;
 			uint8_t regcs = (uint8_t)priv->sh->cdev->config.hca_attr.set_reg_c;
 
 			if (!((1 << (tag->index - REG_C_0)) & regcs))
@@ -3077,7 +3399,26 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 							  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 							  NULL,
 							  "Unsupported internal tag index");
+			break;
 		}
+		case RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT:
+			if (attr->ingress || attr->egress)
+				return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ITEM, NULL,
+						  "represented port item cannot be used"
+						  " when transfer attribute is set");
+			break;
+		case RTE_FLOW_ITEM_TYPE_META:
+			if (!priv->sh->config.dv_esw_en ||
+			    priv->sh->config.dv_xmeta_en != MLX5_XMETA_MODE_META32_HWS) {
+				if (attr->ingress)
+					return rte_flow_error_set(error, EINVAL,
+								  RTE_FLOW_ERROR_TYPE_ITEM, NULL,
+								  "META item is not supported"
+								  " on current FW with ingress"
+								  " attribute");
+			}
+			break;
 		case RTE_FLOW_ITEM_TYPE_VOID:
 		case RTE_FLOW_ITEM_TYPE_ETH:
 		case RTE_FLOW_ITEM_TYPE_VLAN:
@@ -3087,10 +3428,8 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ITEM_TYPE_TCP:
 		case RTE_FLOW_ITEM_TYPE_GTP:
 		case RTE_FLOW_ITEM_TYPE_GTP_PSC:
-		case RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT:
 		case RTE_FLOW_ITEM_TYPE_VXLAN:
 		case MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE:
-		case RTE_FLOW_ITEM_TYPE_META:
 		case RTE_FLOW_ITEM_TYPE_GRE:
 		case RTE_FLOW_ITEM_TYPE_GRE_KEY:
 		case RTE_FLOW_ITEM_TYPE_GRE_OPTION:
@@ -3138,21 +3477,7 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 
 	if (flow_hw_pattern_validate(dev, attr, items, error))
 		return NULL;
-	if (priv->sh->config.dv_esw_en && attr->ingress) {
-		/*
-		 * Disallow pattern template with ingress and egress/transfer
-		 * attributes in order to forbid implicit port matching
-		 * on egress and transfer traffic.
-		 */
-		if (attr->egress || attr->transfer) {
-			rte_flow_error_set(error, EINVAL,
-					   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					   NULL,
-					   "item template for ingress traffic"
-					   " cannot be used for egress/transfer"
-					   " traffic when E-Switch is enabled");
-			return NULL;
-		}
+	if (priv->sh->config.dv_esw_en && attr->ingress && !attr->egress && !attr->transfer) {
 		copied_items = flow_hw_copy_prepend_port_item(items, error);
 		if (!copied_items)
 			return NULL;
@@ -4542,6 +4867,10 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 		return;
 	flow_hw_rxq_flag_set(dev, false);
 	flow_hw_flush_all_ctrl_flows(dev);
+	while (!LIST_EMPTY(&priv->flow_hw_tbl_ongo)) {
+		tbl = LIST_FIRST(&priv->flow_hw_tbl_ongo);
+		flow_hw_table_destroy(dev, tbl, NULL);
+	}
 	while (!LIST_EMPTY(&priv->flow_hw_tbl)) {
 		tbl = LIST_FIRST(&priv->flow_hw_tbl);
 		flow_hw_table_destroy(dev, tbl, NULL);
@@ -4679,6 +5008,54 @@ void flow_hw_clear_tags_set(struct rte_eth_dev *dev)
 		       sizeof(enum modify_reg) * MLX5_FLOW_HW_TAGS_MAX);
 }
 
+uint32_t mlx5_flow_hw_flow_metadata_config_refcnt;
+uint8_t mlx5_flow_hw_flow_metadata_esw_en;
+uint8_t mlx5_flow_hw_flow_metadata_xmeta_en;
+
+/**
+ * Initializes static configuration of META flow items.
+ *
+ * As a temporary workaround, META flow item is translated to a register,
+ * based on statically saved dv_esw_en and dv_xmeta_en device arguments.
+ * It is a workaround for flow_hw_get_reg_id() where port specific information
+ * is not available at runtime.
+ *
+ * Values of dv_esw_en and dv_xmeta_en device arguments are taken from the first opened port.
+ * This means that each mlx5 port will use the same configuration for translation
+ * of META flow items.
+ *
+ * @param[in] dev
+ *    Pointer to Ethernet device.
+ */
+void
+flow_hw_init_flow_metadata_config(struct rte_eth_dev *dev)
+{
+	uint32_t refcnt;
+
+	refcnt = __atomic_fetch_add(&mlx5_flow_hw_flow_metadata_config_refcnt, 1,
+				    __ATOMIC_RELAXED);
+	if (refcnt > 0)
+		return;
+	mlx5_flow_hw_flow_metadata_esw_en = MLX5_SH(dev)->config.dv_esw_en;
+	mlx5_flow_hw_flow_metadata_xmeta_en = MLX5_SH(dev)->config.dv_xmeta_en;
+}
+
+/**
+ * Clears statically stored configuration related to META flow items.
+ */
+void
+flow_hw_clear_flow_metadata_config(void)
+{
+	uint32_t refcnt;
+
+	refcnt = __atomic_sub_fetch(&mlx5_flow_hw_flow_metadata_config_refcnt, 1,
+				    __ATOMIC_RELAXED);
+	if (refcnt > 0)
+		return;
+	mlx5_flow_hw_flow_metadata_esw_en = 0;
+	mlx5_flow_hw_flow_metadata_xmeta_en = 0;
+}
+
 /**
  * Create shared action.
  *
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index ccefebefc9..2603196933 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -1170,6 +1170,16 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 			dev->data->port_id, rte_strerror(rte_errno));
 		goto error;
 	}
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+	if (priv->sh->config.dv_flow_en == 2) {
+		ret = flow_hw_table_update(dev, NULL);
+		if (ret) {
+			DRV_LOG(ERR, "port %u failed to update HWS tables",
+				dev->data->port_id);
+			goto error;
+		}
+	}
+#endif
 	ret = mlx5_traffic_enable(dev);
 	if (ret) {
 		DRV_LOG(ERR, "port %u failed to set defaults flows",
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v3 10/17] net/mlx5: add HW steering connection tracking support
  2022-09-30 12:52 ` [PATCH v3 00/17] net/mlx5: HW steering PMD update Suanming Mou
                     ` (8 preceding siblings ...)
  2022-09-30 12:53   ` [PATCH v3 09/17] net/mlx5: support DR action template API Suanming Mou
@ 2022-09-30 12:53   ` Suanming Mou
  2022-09-30 12:53   ` [PATCH v3 11/17] net/mlx5: add HW steering VLAN push, pop and VID modify flow actions Suanming Mou
                     ` (6 subsequent siblings)
  16 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-30 12:53 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika

This commit adds connection tracking support to HW steering, matching
what SW steering already provides.

Unlike the SW steering implementation, HW steering takes advantage of the
bulk action allocation support, so only a single CT pool is needed.

An indexed pool is introduced to record the actions allocated from the
bulk, the CT action state, and so on. Whenever a CT action is allocated
from the bulk, an indexed object is also allocated from the indexed pool,
and likewise on deallocation. This allows mlx5_aso_ct_action to be managed
by that indexed pool instead of being reserved from mlx5_aso_ct_pool. The
single CT pool is also saved directly in the mlx5_aso_ct_action struct.
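
For reference, a condensed sketch of the allocation and release paths
(simplified from flow_hw_conntrack_create() and flow_hw_conntrack_destroy()
added later in this patch; error handling and the ASO WQE update are
omitted):

  struct mlx5_aso_ct_action *ct;
  uint32_t ct_idx = 0;

  /* Allocate: one indexed object per CT action taken from the bulk. */
  ct = mlx5_ipool_zmalloc(pool->cts, &ct_idx);
  ct->offset = ct_idx - 1; /* Offset of the action inside the bulk object. */
  ct->pool = pool;         /* The single CT pool is saved directly. */

  /* Release: mark the CT object free and return the index to the pool. */
  __atomic_store_n(&ct->state, ASO_CONNTRACK_FREE, __ATOMIC_RELAXED);
  mlx5_ipool_free(pool->cts, ct_idx);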

The ASO operation functions are shared with the SW steering implementation.
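
They are now parametrized with the SQ to use; condensed from the
__mlx5_aso_ct_get_sq_in_hws() and __mlx5_aso_ct_get_sq_in_sws() helpers
below:

  /* HWS: one SQ per flow queue, plus a shared SQ for the synchronous API. */
  sq = (queue == MLX5_HW_INV_QUEUE) ? pool->shared_sq : &pool->sq[queue];

  /* SWS: spread CT objects over the fixed SQ array by their offset. */
  sq = &sh->ct_mng->aso_sqs[ct->offset & (MLX5_ASO_CT_SQ_NUM - 1)];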

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_os.c |   8 +-
 drivers/net/mlx5/mlx5.c          |   3 +-
 drivers/net/mlx5/mlx5.h          |  54 ++++-
 drivers/net/mlx5/mlx5_flow.c     |   1 +
 drivers/net/mlx5/mlx5_flow.h     |   7 +
 drivers/net/mlx5/mlx5_flow_aso.c | 212 +++++++++++++----
 drivers/net/mlx5/mlx5_flow_dv.c  |  28 ++-
 drivers/net/mlx5/mlx5_flow_hw.c  | 381 ++++++++++++++++++++++++++++++-
 8 files changed, 617 insertions(+), 77 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 65795da516..60a1a391fb 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1349,9 +1349,11 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 			DRV_LOG(DEBUG, "Flow Hit ASO is supported.");
 		}
 #endif /* HAVE_MLX5_DR_CREATE_ACTION_ASO */
-#if defined(HAVE_MLX5_DR_CREATE_ACTION_ASO) && \
-	defined(HAVE_MLX5_DR_ACTION_ASO_CT)
-		if (hca_attr->ct_offload && priv->mtr_color_reg == REG_C_3) {
+#if defined (HAVE_MLX5_DR_CREATE_ACTION_ASO) && \
+    defined (HAVE_MLX5_DR_ACTION_ASO_CT)
+		/* HWS create CT ASO SQ based on HWS configure queue number. */
+		if (sh->config.dv_flow_en != 2 &&
+		    hca_attr->ct_offload && priv->mtr_color_reg == REG_C_3) {
 			err = mlx5_flow_aso_ct_mng_init(sh);
 			if (err) {
 				err = -err;
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index cf7b7b7158..925e19bcd5 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -755,7 +755,8 @@ mlx5_flow_aso_ct_mng_init(struct mlx5_dev_ctx_shared *sh)
 
 	if (sh->ct_mng)
 		return 0;
-	sh->ct_mng = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*sh->ct_mng),
+	sh->ct_mng = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*sh->ct_mng) +
+				 sizeof(struct mlx5_aso_sq) * MLX5_ASO_CT_SQ_NUM,
 				 RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
 	if (!sh->ct_mng) {
 		DRV_LOG(ERR, "ASO CT management allocation failed.");
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index c0835e725f..0578a41675 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -39,6 +39,8 @@
 
 #define MLX5_SH(dev) (((struct mlx5_priv *)(dev)->data->dev_private)->sh)
 
+#define MLX5_HW_INV_QUEUE UINT32_MAX
+
 /*
  * Number of modification commands.
  * The maximal actions amount in FW is some constant, and it is 16 in the
@@ -1159,7 +1161,12 @@ enum mlx5_aso_ct_state {
 
 /* Generic ASO connection tracking structure. */
 struct mlx5_aso_ct_action {
-	LIST_ENTRY(mlx5_aso_ct_action) next; /* Pointer to the next ASO CT. */
+	union {
+		LIST_ENTRY(mlx5_aso_ct_action) next;
+		/* Pointer to the next ASO CT. Used only in SWS. */
+		struct mlx5_aso_ct_pool *pool;
+		/* Pointer to action pool. Used only in HWS. */
+	};
 	void *dr_action_orig; /* General action object for original dir. */
 	void *dr_action_rply; /* General action object for reply dir. */
 	uint32_t refcnt; /* Action used count in device flows. */
@@ -1173,28 +1180,48 @@ struct mlx5_aso_ct_action {
 #define MLX5_ASO_CT_UPDATE_STATE(c, s) \
 	__atomic_store_n(&((c)->state), (s), __ATOMIC_RELAXED)
 
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+
 /* ASO connection tracking software pool definition. */
 struct mlx5_aso_ct_pool {
 	uint16_t index; /* Pool index in pools array. */
+	/* Free ASO CT index in the pool. Used by HWS. */
+	struct mlx5_indexed_pool *cts;
 	struct mlx5_devx_obj *devx_obj;
-	/* The first devx object in the bulk, used for freeing (not yet). */
-	struct mlx5_aso_ct_action actions[MLX5_ASO_CT_ACTIONS_PER_POOL];
+	union {
+		void *dummy_action;
+		/* Dummy action to increase the reference count in the driver. */
+		struct mlx5dr_action *dr_action;
+		/* HWS action. */
+	};
+	struct mlx5_aso_sq *sq; /* Async ASO SQ. */
+	struct mlx5_aso_sq *shared_sq; /* Shared ASO SQ. */
+	struct mlx5_aso_ct_action actions[0];
 	/* CT action structures bulk. */
 };
 
 LIST_HEAD(aso_ct_list, mlx5_aso_ct_action);
 
+#define MLX5_ASO_CT_SQ_NUM 16
+
 /* Pools management structure for ASO connection tracking pools. */
 struct mlx5_aso_ct_pools_mng {
 	struct mlx5_aso_ct_pool **pools;
 	uint16_t n; /* Total number of pools. */
 	uint16_t next; /* Number of pools in use, index of next free pool. */
+	uint32_t nb_sq; /* Number of ASO SQ. */
 	rte_spinlock_t ct_sl; /* The ASO CT free list lock. */
 	rte_rwlock_t resize_rwl; /* The ASO CT pool resize lock. */
 	struct aso_ct_list free_cts; /* Free ASO CT objects list. */
-	struct mlx5_aso_sq aso_sq; /* ASO queue objects. */
+	struct mlx5_aso_sq aso_sqs[0]; /* ASO queue objects. */
 };
 
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
 /* LAG attr. */
 struct mlx5_lag {
 	uint8_t tx_remap_affinity[16]; /* The PF port number of affinity */
@@ -1332,8 +1359,7 @@ struct mlx5_dev_ctx_shared {
 	rte_spinlock_t geneve_tlv_opt_sl; /* Lock for geneve tlv resource */
 	struct mlx5_flow_mtr_mng *mtrmng;
 	/* Meter management structure. */
-	struct mlx5_aso_ct_pools_mng *ct_mng;
-	/* Management data for ASO connection tracking. */
+	struct mlx5_aso_ct_pools_mng *ct_mng; /* Management data for ASO CT in HWS only. */
 	struct mlx5_lb_ctx self_lb; /* QP to enable self loopback for Devx. */
 	unsigned int flow_max_priority;
 	enum modify_reg flow_mreg_c[MLX5_MREG_C_NUM];
@@ -1647,6 +1673,9 @@ struct mlx5_priv {
 	/* HW steering create ongoing rte flow table list header. */
 	LIST_HEAD(flow_hw_tbl_ongo, rte_flow_template_table) flow_hw_tbl_ongo;
 	struct mlx5_indexed_pool *acts_ipool; /* Action data indexed pool. */
+	struct mlx5_aso_ct_pools_mng *ct_mng;
+	/* Management data for ASO connection tracking. */
+	struct mlx5_aso_ct_pool *hws_ctpool; /* HW steering's CT pool. */
 #endif
 };
 
@@ -2046,15 +2075,15 @@ int mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
 		struct mlx5_aso_mtr *mtr, struct mlx5_mtr_bulk *bulk);
 int mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh,
 		struct mlx5_aso_mtr *mtr);
-int mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
+int mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			      struct mlx5_aso_ct_action *ct,
 			      const struct rte_flow_action_conntrack *profile);
-int mlx5_aso_ct_wait_ready(struct mlx5_dev_ctx_shared *sh,
+int mlx5_aso_ct_wait_ready(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			   struct mlx5_aso_ct_action *ct);
-int mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh,
+int mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			     struct mlx5_aso_ct_action *ct,
 			     struct rte_flow_action_conntrack *profile);
-int mlx5_aso_ct_available(struct mlx5_dev_ctx_shared *sh,
+int mlx5_aso_ct_available(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			  struct mlx5_aso_ct_action *ct);
 uint32_t
 mlx5_get_supported_sw_parsing_offloads(const struct mlx5_hca_attr *attr);
@@ -2065,6 +2094,11 @@ int mlx5_aso_cnt_queue_init(struct mlx5_dev_ctx_shared *sh);
 void mlx5_aso_cnt_queue_uninit(struct mlx5_dev_ctx_shared *sh);
 int mlx5_aso_cnt_query(struct mlx5_dev_ctx_shared *sh,
 		struct mlx5_hws_cnt_pool *cpool);
+int mlx5_aso_ct_queue_init(struct mlx5_dev_ctx_shared *sh,
+			   struct mlx5_aso_ct_pools_mng *ct_mng,
+			   uint32_t nb_queues);
+int mlx5_aso_ct_queue_uninit(struct mlx5_dev_ctx_shared *sh,
+			     struct mlx5_aso_ct_pools_mng *ct_mng);
 
 /* mlx5_flow_flex.c */
 
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 658cc69750..cbf9c31984 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -49,6 +49,7 @@ struct flow_hw_port_info mlx5_flow_hw_port_infos[RTE_MAX_ETHPORTS];
  */
 uint32_t mlx5_flow_hw_avl_tags_init_cnt;
 enum modify_reg mlx5_flow_hw_avl_tags[MLX5_FLOW_HW_TAGS_MAX] = {REG_NON};
+enum modify_reg mlx5_flow_hw_aso_tag;
 
 struct tunnel_default_miss_ctx {
 	uint16_t *queue;
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index ae1417f10e..f75a56a57b 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -82,6 +82,10 @@ enum {
 #define MLX5_INDIRECT_ACT_CT_GET_IDX(index) \
 	((index) & ((1 << MLX5_INDIRECT_ACT_CT_OWNER_SHIFT) - 1))
 
+#define MLX5_ACTION_CTX_CT_GET_IDX  MLX5_INDIRECT_ACT_CT_GET_IDX
+#define MLX5_ACTION_CTX_CT_GET_OWNER MLX5_INDIRECT_ACT_CT_GET_OWNER
+#define MLX5_ACTION_CTX_CT_GEN_IDX MLX5_INDIRECT_ACT_CT_GEN_IDX
+
 /* Matches on selected register. */
 struct mlx5_rte_flow_item_tag {
 	enum modify_reg id;
@@ -1444,6 +1448,7 @@ extern struct flow_hw_port_info mlx5_flow_hw_port_infos[RTE_MAX_ETHPORTS];
 #define MLX5_FLOW_HW_TAGS_MAX 8
 extern uint32_t mlx5_flow_hw_avl_tags_init_cnt;
 extern enum modify_reg mlx5_flow_hw_avl_tags[];
+extern enum modify_reg mlx5_flow_hw_aso_tag;
 
 /*
  * Get metadata match tag and mask for given rte_eth_dev port.
@@ -1518,6 +1523,8 @@ flow_hw_get_reg_id(enum rte_flow_item_type type, uint32_t id)
 		 * REG_B case should be rejected on pattern template validation.
 		 */
 		return REG_A;
+	case RTE_FLOW_ITEM_TYPE_CONNTRACK:
+		return mlx5_flow_hw_aso_tag;
 	case RTE_FLOW_ITEM_TYPE_TAG:
 		MLX5_ASSERT(id < MLX5_FLOW_HW_TAGS_MAX);
 		return mlx5_flow_hw_avl_tags[id];
diff --git a/drivers/net/mlx5/mlx5_flow_aso.c b/drivers/net/mlx5/mlx5_flow_aso.c
index ed9272e583..c00c07b891 100644
--- a/drivers/net/mlx5/mlx5_flow_aso.c
+++ b/drivers/net/mlx5/mlx5_flow_aso.c
@@ -313,16 +313,8 @@ mlx5_aso_queue_init(struct mlx5_dev_ctx_shared *sh,
 		mlx5_aso_mtr_init_sq(&sh->mtrmng->pools_mng.sq);
 		break;
 	case ASO_OPC_MOD_CONNECTION_TRACKING:
-		/* 64B per object for query. */
-		if (mlx5_aso_reg_mr(cdev, 64 * sq_desc_n,
-				    &sh->ct_mng->aso_sq.mr))
+		if (mlx5_aso_ct_queue_init(sh, sh->ct_mng, MLX5_ASO_CT_SQ_NUM))
 			return -1;
-		if (mlx5_aso_sq_create(cdev, &sh->ct_mng->aso_sq,
-				       sh->tx_uar.obj, MLX5_ASO_QUEUE_LOG_DESC)) {
-			mlx5_aso_dereg_mr(cdev, &sh->ct_mng->aso_sq.mr);
-			return -1;
-		}
-		mlx5_aso_ct_init_sq(&sh->ct_mng->aso_sq);
 		break;
 	default:
 		DRV_LOG(ERR, "Unknown ASO operation mode");
@@ -343,7 +335,7 @@ void
 mlx5_aso_queue_uninit(struct mlx5_dev_ctx_shared *sh,
 		      enum mlx5_access_aso_opc_mod aso_opc_mod)
 {
-	struct mlx5_aso_sq *sq;
+	struct mlx5_aso_sq *sq = NULL;
 
 	switch (aso_opc_mod) {
 	case ASO_OPC_MOD_FLOW_HIT:
@@ -354,14 +346,14 @@ mlx5_aso_queue_uninit(struct mlx5_dev_ctx_shared *sh,
 		sq = &sh->mtrmng->pools_mng.sq;
 		break;
 	case ASO_OPC_MOD_CONNECTION_TRACKING:
-		mlx5_aso_dereg_mr(sh->cdev, &sh->ct_mng->aso_sq.mr);
-		sq = &sh->ct_mng->aso_sq;
+		mlx5_aso_ct_queue_uninit(sh, sh->ct_mng);
 		break;
 	default:
 		DRV_LOG(ERR, "Unknown ASO operation mode");
 		return;
 	}
-	mlx5_aso_destroy_sq(sq);
+	if (sq)
+		mlx5_aso_destroy_sq(sq);
 }
 
 /**
@@ -903,6 +895,89 @@ mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh,
 	return -1;
 }
 
+static inline struct mlx5_aso_sq*
+__mlx5_aso_ct_get_sq_in_hws(uint32_t queue,
+			    struct mlx5_aso_ct_pool *pool)
+{
+	return (queue == MLX5_HW_INV_QUEUE) ?
+		pool->shared_sq : &pool->sq[queue];
+}
+
+static inline struct mlx5_aso_sq*
+__mlx5_aso_ct_get_sq_in_sws(struct mlx5_dev_ctx_shared *sh,
+			    struct mlx5_aso_ct_action *ct)
+{
+	return &sh->ct_mng->aso_sqs[ct->offset & (MLX5_ASO_CT_SQ_NUM - 1)];
+}
+
+static inline struct mlx5_aso_ct_pool*
+__mlx5_aso_ct_get_pool(struct mlx5_dev_ctx_shared *sh,
+		       struct mlx5_aso_ct_action *ct)
+{
+	if (likely(sh->config.dv_flow_en == 2))
+		return ct->pool;
+	return container_of(ct, struct mlx5_aso_ct_pool, actions[ct->offset]);
+}
+
+int
+mlx5_aso_ct_queue_uninit(struct mlx5_dev_ctx_shared *sh,
+			 struct mlx5_aso_ct_pools_mng *ct_mng)
+{
+	uint32_t i;
+
+	/* 64B per object for query. */
+	for (i = 0; i < ct_mng->nb_sq; i++) {
+		if (ct_mng->aso_sqs[i].mr.addr)
+			mlx5_aso_dereg_mr(sh->cdev, &ct_mng->aso_sqs[i].mr);
+		mlx5_aso_destroy_sq(&ct_mng->aso_sqs[i]);
+	}
+	return 0;
+}
+
+/**
+ * API to create and initialize CT Send Queue used for ASO access.
+ *
+ * @param[in] sh
+ *   Pointer to shared device context.
+ * @param[in] ct_mng
+ *   Pointer to the CT management struct.
+ * @param[in] nb_queues
+ *   Number of queues to be allocated.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_aso_ct_queue_init(struct mlx5_dev_ctx_shared *sh,
+		       struct mlx5_aso_ct_pools_mng *ct_mng,
+		       uint32_t nb_queues)
+{
+	uint32_t i;
+
+	/* 64B per object for query. */
+	for (i = 0; i < nb_queues; i++) {
+		if (mlx5_aso_reg_mr(sh->cdev, 64 * (1 << MLX5_ASO_QUEUE_LOG_DESC),
+				    &ct_mng->aso_sqs[i].mr))
+			goto error;
+		if (mlx5_aso_sq_create(sh->cdev, &ct_mng->aso_sqs[i],
+				       sh->tx_uar.obj,
+				       MLX5_ASO_QUEUE_LOG_DESC))
+			goto error;
+		mlx5_aso_ct_init_sq(&ct_mng->aso_sqs[i]);
+	}
+	ct_mng->nb_sq = nb_queues;
+	return 0;
+error:
+	do {
+		if (ct_mng->aso_sqs[i].mr.addr)
+			mlx5_aso_dereg_mr(sh->cdev, &ct_mng->aso_sqs[i].mr);
+		if (&ct_mng->aso_sqs[i])
+			mlx5_aso_destroy_sq(&ct_mng->aso_sqs[i]);
+	} while (i--);
+	ct_mng->nb_sq = 0;
+	return -1;
+}
+
 /*
  * Post a WQE to the ASO CT SQ to modify the context.
  *
@@ -918,11 +993,12 @@ mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh,
  */
 static uint16_t
 mlx5_aso_ct_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
+			      struct mlx5_aso_sq *sq,
 			      struct mlx5_aso_ct_action *ct,
-			      const struct rte_flow_action_conntrack *profile)
+			      const struct rte_flow_action_conntrack *profile,
+			      bool need_lock)
 {
 	volatile struct mlx5_aso_wqe *wqe = NULL;
-	struct mlx5_aso_sq *sq = &sh->ct_mng->aso_sq;
 	uint16_t size = 1 << sq->log_desc_n;
 	uint16_t mask = size - 1;
 	uint16_t res;
@@ -931,11 +1007,13 @@ mlx5_aso_ct_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	void *orig_dir;
 	void *reply_dir;
 
-	rte_spinlock_lock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_lock(&sq->sqsl);
 	/* Prevent other threads to update the index. */
 	res = size - (uint16_t)(sq->head - sq->tail);
 	if (unlikely(!res)) {
-		rte_spinlock_unlock(&sq->sqsl);
+		if (need_lock)
+			rte_spinlock_unlock(&sq->sqsl);
 		DRV_LOG(ERR, "Fail: SQ is full and no free WQE to send");
 		return 0;
 	}
@@ -945,7 +1023,7 @@ mlx5_aso_ct_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	MLX5_ASO_CT_UPDATE_STATE(ct, ASO_CONNTRACK_WAIT);
 	sq->elts[sq->head & mask].ct = ct;
 	sq->elts[sq->head & mask].query_data = NULL;
-	pool = container_of(ct, struct mlx5_aso_ct_pool, actions[ct->offset]);
+	pool = __mlx5_aso_ct_get_pool(sh, ct);
 	/* Each WQE will have a single CT object. */
 	wqe->general_cseg.misc = rte_cpu_to_be_32(pool->devx_obj->id +
 						  ct->offset);
@@ -1028,7 +1106,8 @@ mlx5_aso_ct_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
 			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
 			   !sh->tx_uar.dbnc);
-	rte_spinlock_unlock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_unlock(&sq->sqsl);
 	return 1;
 }
 
@@ -1080,10 +1159,11 @@ mlx5_aso_ct_status_update(struct mlx5_aso_sq *sq, uint16_t num)
  */
 static int
 mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
-			    struct mlx5_aso_ct_action *ct, char *data)
+			    struct mlx5_aso_sq *sq,
+			    struct mlx5_aso_ct_action *ct, char *data,
+			    bool need_lock)
 {
 	volatile struct mlx5_aso_wqe *wqe = NULL;
-	struct mlx5_aso_sq *sq = &sh->ct_mng->aso_sq;
 	uint16_t size = 1 << sq->log_desc_n;
 	uint16_t mask = size - 1;
 	uint16_t res;
@@ -1098,10 +1178,12 @@ mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
 	} else if (state == ASO_CONNTRACK_WAIT) {
 		return 0;
 	}
-	rte_spinlock_lock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_lock(&sq->sqsl);
 	res = size - (uint16_t)(sq->head - sq->tail);
 	if (unlikely(!res)) {
-		rte_spinlock_unlock(&sq->sqsl);
+		if (need_lock)
+			rte_spinlock_unlock(&sq->sqsl);
 		DRV_LOG(ERR, "Fail: SQ is full and no free WQE to send");
 		return 0;
 	}
@@ -1113,7 +1195,7 @@ mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
 	wqe_idx = sq->head & mask;
 	sq->elts[wqe_idx].ct = ct;
 	sq->elts[wqe_idx].query_data = data;
-	pool = container_of(ct, struct mlx5_aso_ct_pool, actions[ct->offset]);
+	pool = __mlx5_aso_ct_get_pool(sh, ct);
 	/* Each WQE will have a single CT object. */
 	wqe->general_cseg.misc = rte_cpu_to_be_32(pool->devx_obj->id +
 						  ct->offset);
@@ -1141,7 +1223,8 @@ mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
 	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
 			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
 			   !sh->tx_uar.dbnc);
-	rte_spinlock_unlock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_unlock(&sq->sqsl);
 	return 1;
 }
 
@@ -1152,9 +1235,10 @@ mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
  *   Pointer to the CT pools management structure.
  */
 static void
-mlx5_aso_ct_completion_handle(struct mlx5_aso_ct_pools_mng *mng)
+mlx5_aso_ct_completion_handle(struct mlx5_dev_ctx_shared *sh __rte_unused,
+			      struct mlx5_aso_sq *sq,
+			      bool need_lock)
 {
-	struct mlx5_aso_sq *sq = &mng->aso_sq;
 	struct mlx5_aso_cq *cq = &sq->cq;
 	volatile struct mlx5_cqe *restrict cqe;
 	const uint32_t cq_size = 1 << cq->log_desc_n;
@@ -1165,10 +1249,12 @@ mlx5_aso_ct_completion_handle(struct mlx5_aso_ct_pools_mng *mng)
 	uint16_t n = 0;
 	int ret;
 
-	rte_spinlock_lock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_lock(&sq->sqsl);
 	max = (uint16_t)(sq->head - sq->tail);
 	if (unlikely(!max)) {
-		rte_spinlock_unlock(&sq->sqsl);
+		if (need_lock)
+			rte_spinlock_unlock(&sq->sqsl);
 		return;
 	}
 	next_idx = cq->cq_ci & mask;
@@ -1199,7 +1285,8 @@ mlx5_aso_ct_completion_handle(struct mlx5_aso_ct_pools_mng *mng)
 		rte_io_wmb();
 		cq->cq_obj.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci);
 	}
-	rte_spinlock_unlock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_unlock(&sq->sqsl);
 }
 
 /*
@@ -1207,6 +1294,8 @@ mlx5_aso_ct_completion_handle(struct mlx5_aso_ct_pools_mng *mng)
  *
  * @param[in] sh
  *   Pointer to mlx5_dev_ctx_shared object.
+ * @param[in] queue
+ *   The queue index.
  * @param[in] ct
  *   Pointer to connection tracking offload object.
  * @param[in] profile
@@ -1217,21 +1306,26 @@ mlx5_aso_ct_completion_handle(struct mlx5_aso_ct_pools_mng *mng)
  */
 int
 mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
+			  uint32_t queue,
 			  struct mlx5_aso_ct_action *ct,
 			  const struct rte_flow_action_conntrack *profile)
 {
 	uint32_t poll_wqe_times = MLX5_CT_POLL_WQE_CQE_TIMES;
-	struct mlx5_aso_ct_pool *pool;
+	struct mlx5_aso_ct_pool *pool = __mlx5_aso_ct_get_pool(sh, ct);
+	struct mlx5_aso_sq *sq;
+	bool need_lock = !!(queue == MLX5_HW_INV_QUEUE);
 
-	MLX5_ASSERT(ct);
+	if (sh->config.dv_flow_en == 2)
+		sq = __mlx5_aso_ct_get_sq_in_hws(queue, pool);
+	else
+		sq = __mlx5_aso_ct_get_sq_in_sws(sh, ct);
 	do {
-		mlx5_aso_ct_completion_handle(sh->ct_mng);
-		if (mlx5_aso_ct_sq_enqueue_single(sh, ct, profile))
+		mlx5_aso_ct_completion_handle(sh, sq, need_lock);
+		if (mlx5_aso_ct_sq_enqueue_single(sh, sq, ct, profile, need_lock))
 			return 0;
 		/* Waiting for wqe resource. */
 		rte_delay_us_sleep(10u);
 	} while (--poll_wqe_times);
-	pool = container_of(ct, struct mlx5_aso_ct_pool, actions[ct->offset]);
 	DRV_LOG(ERR, "Fail to send WQE for ASO CT %d in pool %d",
 		ct->offset, pool->index);
 	return -1;
@@ -1242,6 +1336,8 @@ mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
  *
  * @param[in] sh
  *   Pointer to mlx5_dev_ctx_shared object.
+ * @param[in] queue
+ *   The queue which CT works on.
  * @param[in] ct
  *   Pointer to connection tracking offload object.
  *
@@ -1249,25 +1345,29 @@ mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
  *   0 on success, -1 on failure.
  */
 int
-mlx5_aso_ct_wait_ready(struct mlx5_dev_ctx_shared *sh,
+mlx5_aso_ct_wait_ready(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 		       struct mlx5_aso_ct_action *ct)
 {
-	struct mlx5_aso_ct_pools_mng *mng = sh->ct_mng;
 	uint32_t poll_cqe_times = MLX5_CT_POLL_WQE_CQE_TIMES;
-	struct mlx5_aso_ct_pool *pool;
+	struct mlx5_aso_ct_pool *pool = __mlx5_aso_ct_get_pool(sh, ct);
+	struct mlx5_aso_sq *sq;
+	bool need_lock = !!(queue == MLX5_HW_INV_QUEUE);
 
+	if (sh->config.dv_flow_en == 2)
+		sq = __mlx5_aso_ct_get_sq_in_hws(queue, pool);
+	else
+		sq = __mlx5_aso_ct_get_sq_in_sws(sh, ct);
 	if (__atomic_load_n(&ct->state, __ATOMIC_RELAXED) ==
 	    ASO_CONNTRACK_READY)
 		return 0;
 	do {
-		mlx5_aso_ct_completion_handle(mng);
+		mlx5_aso_ct_completion_handle(sh, sq, need_lock);
 		if (__atomic_load_n(&ct->state, __ATOMIC_RELAXED) ==
 		    ASO_CONNTRACK_READY)
 			return 0;
 		/* Waiting for CQE ready, consider should block or sleep. */
 		rte_delay_us_sleep(MLX5_ASO_WQE_CQE_RESPONSE_DELAY);
 	} while (--poll_cqe_times);
-	pool = container_of(ct, struct mlx5_aso_ct_pool, actions[ct->offset]);
 	DRV_LOG(ERR, "Fail to poll CQE for ASO CT %d in pool %d",
 		ct->offset, pool->index);
 	return -1;
@@ -1363,18 +1463,24 @@ mlx5_aso_ct_obj_analyze(struct rte_flow_action_conntrack *profile,
  */
 int
 mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh,
+			 uint32_t queue,
 			 struct mlx5_aso_ct_action *ct,
 			 struct rte_flow_action_conntrack *profile)
 {
 	uint32_t poll_wqe_times = MLX5_CT_POLL_WQE_CQE_TIMES;
-	struct mlx5_aso_ct_pool *pool;
+	struct mlx5_aso_ct_pool *pool = __mlx5_aso_ct_get_pool(sh, ct);
+	struct mlx5_aso_sq *sq;
+	bool need_lock = !!(queue == MLX5_HW_INV_QUEUE);
 	char out_data[64 * 2];
 	int ret;
 
-	MLX5_ASSERT(ct);
+	if (sh->config.dv_flow_en == 2)
+		sq = __mlx5_aso_ct_get_sq_in_hws(queue, pool);
+	else
+		sq = __mlx5_aso_ct_get_sq_in_sws(sh, ct);
 	do {
-		mlx5_aso_ct_completion_handle(sh->ct_mng);
-		ret = mlx5_aso_ct_sq_query_single(sh, ct, out_data);
+		mlx5_aso_ct_completion_handle(sh, sq, need_lock);
+		ret = mlx5_aso_ct_sq_query_single(sh, sq, ct, out_data, need_lock);
 		if (ret < 0)
 			return ret;
 		else if (ret > 0)
@@ -1383,12 +1489,11 @@ mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh,
 		else
 			rte_delay_us_sleep(10u);
 	} while (--poll_wqe_times);
-	pool = container_of(ct, struct mlx5_aso_ct_pool, actions[ct->offset]);
 	DRV_LOG(ERR, "Fail to send WQE for ASO CT %d in pool %d",
 		ct->offset, pool->index);
 	return -1;
 data_handle:
-	ret = mlx5_aso_ct_wait_ready(sh, ct);
+	ret = mlx5_aso_ct_wait_ready(sh, queue, ct);
 	if (!ret)
 		mlx5_aso_ct_obj_analyze(profile, out_data);
 	return ret;
@@ -1408,13 +1513,20 @@ mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh,
  */
 int
 mlx5_aso_ct_available(struct mlx5_dev_ctx_shared *sh,
+		      uint32_t queue,
 		      struct mlx5_aso_ct_action *ct)
 {
-	struct mlx5_aso_ct_pools_mng *mng = sh->ct_mng;
+	struct mlx5_aso_ct_pool *pool = __mlx5_aso_ct_get_pool(sh, ct);
+	struct mlx5_aso_sq *sq;
+	bool need_lock = !!(queue == MLX5_HW_INV_QUEUE);
 	uint32_t poll_cqe_times = MLX5_CT_POLL_WQE_CQE_TIMES;
 	enum mlx5_aso_ct_state state =
 				__atomic_load_n(&ct->state, __ATOMIC_RELAXED);
 
+	if (sh->config.dv_flow_en == 2)
+		sq = __mlx5_aso_ct_get_sq_in_hws(queue, pool);
+	else
+		sq = __mlx5_aso_ct_get_sq_in_sws(sh, ct);
 	if (state == ASO_CONNTRACK_FREE) {
 		rte_errno = ENXIO;
 		return -rte_errno;
@@ -1423,13 +1535,13 @@ mlx5_aso_ct_available(struct mlx5_dev_ctx_shared *sh,
 		return 0;
 	}
 	do {
-		mlx5_aso_ct_completion_handle(mng);
+		mlx5_aso_ct_completion_handle(sh, sq, need_lock);
 		state = __atomic_load_n(&ct->state, __ATOMIC_RELAXED);
 		if (state == ASO_CONNTRACK_READY ||
 		    state == ASO_CONNTRACK_QUERY)
 			return 0;
-		/* Waiting for CQE ready, consider should block or sleep. */
-		rte_delay_us_sleep(MLX5_ASO_WQE_CQE_RESPONSE_DELAY);
+		/* Waiting for CQE ready, consider should block or sleep.  */
+		rte_delay_us_block(MLX5_ASO_WQE_CQE_RESPONSE_DELAY);
 	} while (--poll_cqe_times);
 	rte_errno = EBUSY;
 	return -rte_errno;
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 1eb1ce659f..9bede7c04f 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -12813,6 +12813,7 @@ flow_dv_ct_pool_create(struct rte_eth_dev *dev,
 	struct mlx5_devx_obj *obj = NULL;
 	uint32_t i;
 	uint32_t log_obj_size = rte_log2_u32(MLX5_ASO_CT_ACTIONS_PER_POOL);
+	size_t mem_size;
 
 	obj = mlx5_devx_cmd_create_conn_track_offload_obj(priv->sh->cdev->ctx,
 							  priv->sh->cdev->pdn,
@@ -12822,7 +12823,10 @@ flow_dv_ct_pool_create(struct rte_eth_dev *dev,
 		DRV_LOG(ERR, "Failed to create conn_track_offload_obj using DevX.");
 		return NULL;
 	}
-	pool = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*pool), 0, SOCKET_ID_ANY);
+	mem_size = sizeof(struct mlx5_aso_ct_action) *
+		   MLX5_ASO_CT_ACTIONS_PER_POOL +
+		   sizeof(*pool);
+	pool = mlx5_malloc(MLX5_MEM_ZERO, mem_size, 0, SOCKET_ID_ANY);
 	if (!pool) {
 		rte_errno = ENOMEM;
 		claim_zero(mlx5_devx_cmd_destroy(obj));
@@ -12962,10 +12966,13 @@ flow_dv_translate_create_conntrack(struct rte_eth_dev *dev,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "Failed to allocate CT object");
 	ct = flow_aso_ct_get_by_dev_idx(dev, idx);
-	if (mlx5_aso_ct_update_by_wqe(sh, ct, pro))
-		return rte_flow_error_set(error, EBUSY,
-					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
-					  "Failed to update CT");
+	if (mlx5_aso_ct_update_by_wqe(sh, MLX5_HW_INV_QUEUE, ct, pro)) {
+		flow_dv_aso_ct_dev_release(dev, idx);
+		rte_flow_error_set(error, EBUSY,
+				   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+				   "Failed to update CT");
+		return 0;
+	}
 	ct->is_original = !!pro->is_original_dir;
 	ct->peer = pro->peer_port;
 	return idx;
@@ -14160,7 +14167,7 @@ flow_dv_translate(struct rte_eth_dev *dev,
 						RTE_FLOW_ERROR_TYPE_ACTION,
 						NULL,
 						"Failed to get CT object.");
-			if (mlx5_aso_ct_available(priv->sh, ct))
+			if (mlx5_aso_ct_available(priv->sh, MLX5_HW_INV_QUEUE, ct))
 				return rte_flow_error_set(error, rte_errno,
 						RTE_FLOW_ERROR_TYPE_ACTION,
 						NULL,
@@ -15768,14 +15775,15 @@ __flow_dv_action_ct_update(struct rte_eth_dev *dev, uint32_t idx,
 		ret = mlx5_validate_action_ct(dev, new_prf, error);
 		if (ret)
 			return ret;
-		ret = mlx5_aso_ct_update_by_wqe(priv->sh, ct, new_prf);
+		ret = mlx5_aso_ct_update_by_wqe(priv->sh, MLX5_HW_INV_QUEUE,
+						ct, new_prf);
 		if (ret)
 			return rte_flow_error_set(error, EIO,
 					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 					NULL,
 					"Failed to send CT context update WQE");
-		/* Block until ready or a failure. */
-		ret = mlx5_aso_ct_available(priv->sh, ct);
+		/* Block until ready or a failure, default is asynchronous. */
+		ret = mlx5_aso_ct_available(priv->sh, MLX5_HW_INV_QUEUE, ct);
 		if (ret)
 			rte_flow_error_set(error, rte_errno,
 					   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
@@ -16604,7 +16612,7 @@ flow_dv_action_query(struct rte_eth_dev *dev,
 							ct->peer;
 		((struct rte_flow_action_conntrack *)data)->is_original_dir =
 							ct->is_original;
-		if (mlx5_aso_ct_query_by_wqe(priv->sh, ct, data))
+		if (mlx5_aso_ct_query_by_wqe(priv->sh, MLX5_HW_INV_QUEUE, ct, data))
 			return rte_flow_error_set(error, EIO,
 					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 					NULL,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 91835cd024..f4340c475d 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -15,6 +15,14 @@
 /* The maximum actions support in the flow. */
 #define MLX5_HW_MAX_ACTS 16
 
+/*
+ * The default ipool threshold value indicates which per_core_cache
+ * value to set.
+ */
+#define MLX5_HW_IPOOL_SIZE_THRESHOLD (1 << 19)
+/* The default min local cache size. */
+#define MLX5_HW_IPOOL_CACHE_MIN (1 << 9)
+
 /* Default push burst threshold. */
 #define BURST_THR 32u
 
@@ -324,6 +332,25 @@ flow_hw_tir_action_register(struct rte_eth_dev *dev,
 	return hrxq;
 }
 
+static __rte_always_inline int
+flow_hw_ct_compile(struct rte_eth_dev *dev,
+		   uint32_t queue, uint32_t idx,
+		   struct mlx5dr_rule_action *rule_act)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_ct_action *ct;
+
+	ct = mlx5_ipool_get(priv->hws_ctpool->cts, MLX5_ACTION_CTX_CT_GET_IDX(idx));
+	if (!ct || mlx5_aso_ct_available(priv->sh, queue, ct))
+		return -1;
+	rule_act->action = priv->hws_ctpool->dr_action;
+	rule_act->aso_ct.offset = ct->offset;
+	rule_act->aso_ct.direction = ct->is_original ?
+		MLX5DR_ACTION_ASO_CT_DIRECTION_INITIATOR :
+		MLX5DR_ACTION_ASO_CT_DIRECTION_RESPONDER;
+	return 0;
+}
+
 /**
  * Destroy DR actions created by action template.
  *
@@ -640,6 +667,11 @@ flow_hw_shared_action_translate(struct rte_eth_dev *dev,
 			action_src, action_dst, act_idx))
 			return -1;
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_CT:
+		if (flow_hw_ct_compile(dev, MLX5_HW_INV_QUEUE,
+				       idx, &acts->rule_acts[action_dst]))
+			return -1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type:%d", type);
 		break;
@@ -1083,6 +1115,7 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
 	bool reformat_used = false;
 	uint16_t action_pos;
 	uint16_t jump_pos;
+	uint32_t ct_idx;
 	int err;
 
 	flow_hw_modify_field_init(&mhdr, at);
@@ -1305,6 +1338,20 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
 				goto err;
 			}
 			break;
+		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+			action_pos = at->actions_off[actions - at->actions];
+			if (masks->conf) {
+				ct_idx = MLX5_ACTION_CTX_CT_GET_IDX
+					 ((uint32_t)(uintptr_t)actions->conf);
+				if (flow_hw_ct_compile(dev, MLX5_HW_INV_QUEUE, ct_idx,
+						       &acts->rule_acts[action_pos]))
+					goto err;
+			} else if (__flow_hw_act_data_general_append
+					(priv, acts, actions->type,
+					 actions - action_start, action_pos)) {
+				goto err;
+			}
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -1479,6 +1526,8 @@ flow_hw_shared_action_get(struct rte_eth_dev *dev,
  *
  * @param[in] dev
  *   Pointer to the rte_eth_dev data structure.
+ * @param[in] queue
+ *   The flow creation queue index.
  * @param[in] action
  *   Pointer to the shared indirect rte_flow action.
  * @param[in] table
@@ -1492,7 +1541,7 @@ flow_hw_shared_action_get(struct rte_eth_dev *dev,
  *    0 on success, negative value otherwise and rte_errno is set.
  */
 static __rte_always_inline int
-flow_hw_shared_action_construct(struct rte_eth_dev *dev,
+flow_hw_shared_action_construct(struct rte_eth_dev *dev, uint32_t queue,
 				const struct rte_flow_action *action,
 				struct rte_flow_template_table *table,
 				const uint8_t it_idx,
@@ -1532,6 +1581,10 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev,
 				&rule_act->counter.offset))
 			return -1;
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_CT:
+		if (flow_hw_ct_compile(dev, queue, idx, rule_act))
+			return -1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type:%d", type);
 		break;
@@ -1727,6 +1780,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		uint64_t item_flags;
 		struct mlx5_hw_jump_action *jump;
 		struct mlx5_hrxq *hrxq;
+		uint32_t ct_idx;
 		cnt_id_t cnt_id;
 
 		action = &actions[act_data->action_src];
@@ -1735,7 +1789,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		switch (act_data->type) {
 		case RTE_FLOW_ACTION_TYPE_INDIRECT:
 			if (flow_hw_shared_action_construct
-					(dev, action, table, it_idx,
+					(dev, queue, action, table, it_idx,
 					 &rule_acts[act_data->action_dst]))
 				return -1;
 			break;
@@ -1860,6 +1914,13 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 				return ret;
 			job->flow->cnt_id = act_data->shared_counter.id;
 			break;
+		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+			ct_idx = MLX5_ACTION_CTX_CT_GET_IDX
+				 ((uint32_t)(uintptr_t)action->conf);
+			if (flow_hw_ct_compile(dev, queue, ct_idx,
+					       &rule_acts[act_data->action_dst]))
+				return -1;
+			break;
 		default:
 			break;
 		}
@@ -2391,6 +2452,8 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	if (nb_flows < cfg.trunk_size) {
 		cfg.per_core_cache = 0;
 		cfg.trunk_size = nb_flows;
+	} else if (nb_flows <= MLX5_HW_IPOOL_SIZE_THRESHOLD) {
+		cfg.per_core_cache = MLX5_HW_IPOOL_CACHE_MIN;
 	}
 	/* Check if we requires too many templates. */
 	if (nb_item_templates > max_tpl ||
@@ -2927,6 +2990,9 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_COUNT:
 			/* TODO: Validation logic */
 			break;
+		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+			/* TODO: Validation logic */
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -2953,6 +3019,7 @@ static enum mlx5dr_action_type mlx5_hw_dr_action_types[] = {
 	[RTE_FLOW_ACTION_TYPE_MODIFY_FIELD] = MLX5DR_ACTION_TYP_MODIFY_HDR,
 	[RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT] = MLX5DR_ACTION_TYP_VPORT,
 	[RTE_FLOW_ACTION_TYPE_COUNT] = MLX5DR_ACTION_TYP_CTR,
+	[RTE_FLOW_ACTION_TYPE_CONNTRACK] = MLX5DR_ACTION_TYP_ASO_CT,
 };
 
 static int
@@ -2981,6 +3048,11 @@ flow_hw_dr_actions_template_handle_shared(const struct rte_flow_action *mask,
 		action_types[*curr_off] = MLX5DR_ACTION_TYP_CTR;
 		*curr_off = *curr_off + 1;
 		break;
+	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+		at->actions_off[action_src] = *curr_off;
+		action_types[*curr_off] = MLX5DR_ACTION_TYP_ASO_CT;
+		*curr_off = *curr_off + 1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type: %d", type);
 		return -EINVAL;
@@ -3435,6 +3507,7 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ITEM_TYPE_GRE_OPTION:
 		case RTE_FLOW_ITEM_TYPE_ICMP:
 		case RTE_FLOW_ITEM_TYPE_ICMP6:
+		case RTE_FLOW_ITEM_TYPE_CONNTRACK:
 			break;
 		case RTE_FLOW_ITEM_TYPE_END:
 			items_end = true;
@@ -4630,6 +4703,97 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 	return -EINVAL;
 }
 
+static void
+flow_hw_ct_mng_destroy(struct rte_eth_dev *dev,
+		       struct mlx5_aso_ct_pools_mng *ct_mng)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	mlx5_aso_ct_queue_uninit(priv->sh, ct_mng);
+	mlx5_free(ct_mng);
+}
+
+static void
+flow_hw_ct_pool_destroy(struct rte_eth_dev *dev __rte_unused,
+			struct mlx5_aso_ct_pool *pool)
+{
+	if (pool->dr_action)
+		mlx5dr_action_destroy(pool->dr_action);
+	if (pool->devx_obj)
+		claim_zero(mlx5_devx_cmd_destroy(pool->devx_obj));
+	if (pool->cts)
+		mlx5_ipool_destroy(pool->cts);
+	mlx5_free(pool);
+}
+
+static struct mlx5_aso_ct_pool *
+flow_hw_ct_pool_create(struct rte_eth_dev *dev,
+		       const struct rte_flow_port_attr *port_attr)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_ct_pool *pool;
+	struct mlx5_devx_obj *obj;
+	uint32_t nb_cts = rte_align32pow2(port_attr->nb_conn_tracks);
+	uint32_t log_obj_size = rte_log2_u32(nb_cts);
+	struct mlx5_indexed_pool_config cfg = {
+		.size = sizeof(struct mlx5_aso_ct_action),
+		.trunk_size = 1 << 12,
+		.per_core_cache = 1 << 13,
+		.need_lock = 1,
+		.release_mem_en = !!priv->sh->config.reclaim_mode,
+		.malloc = mlx5_malloc,
+		.free = mlx5_free,
+		.type = "mlx5_hw_ct_action",
+	};
+	int reg_id;
+	uint32_t flags;
+
+	pool = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*pool), 0, SOCKET_ID_ANY);
+	if (!pool) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	obj = mlx5_devx_cmd_create_conn_track_offload_obj(priv->sh->cdev->ctx,
+							  priv->sh->cdev->pdn,
+							  log_obj_size);
+	if (!obj) {
+		rte_errno = ENODATA;
+		DRV_LOG(ERR, "Failed to create conn_track_offload_obj using DevX.");
+		goto err;
+	}
+	pool->devx_obj = obj;
+	reg_id = mlx5_flow_get_reg_id(dev, MLX5_ASO_CONNTRACK, 0, NULL);
+	flags = MLX5DR_ACTION_FLAG_HWS_RX | MLX5DR_ACTION_FLAG_HWS_TX;
+	if (priv->sh->config.dv_esw_en && priv->master)
+		flags |= MLX5DR_ACTION_FLAG_HWS_FDB;
+	pool->dr_action = mlx5dr_action_create_aso_ct(priv->dr_ctx,
+						      (struct mlx5dr_devx_obj *)obj,
+						      reg_id - REG_C_0, flags);
+	if (!pool->dr_action)
+		goto err;
+	/*
+	 * No need for local cache if CT number is a small number. Since
+	 * flow insertion rate will be very limited in that case. Here let's
+	 * set the number to less than default trunk size 4K.
+	 */
+	if (nb_cts <= cfg.trunk_size) {
+		cfg.per_core_cache = 0;
+		cfg.trunk_size = nb_cts;
+	} else if (nb_cts <= MLX5_HW_IPOOL_SIZE_THRESHOLD) {
+		cfg.per_core_cache = MLX5_HW_IPOOL_CACHE_MIN;
+	}
+	pool->cts = mlx5_ipool_create(&cfg);
+	if (!pool->cts)
+		goto err;
+	pool->sq = priv->ct_mng->aso_sqs;
+	/* Assign the last extra ASO SQ as public SQ. */
+	pool->shared_sq = &priv->ct_mng->aso_sqs[priv->nb_queue - 1];
+	return pool;
+err:
+	flow_hw_ct_pool_destroy(dev, pool);
+	return NULL;
+}
+
 /**
  * Configure port HWS resources.
  *
@@ -4815,6 +4979,20 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	}
 	if (_queue_attr)
 		mlx5_free(_queue_attr);
+	if (port_attr->nb_conn_tracks) {
+		mem_size = sizeof(struct mlx5_aso_sq) * nb_q_updated +
+			   sizeof(*priv->ct_mng);
+		priv->ct_mng = mlx5_malloc(MLX5_MEM_ZERO, mem_size,
+					   RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
+		if (!priv->ct_mng)
+			goto err;
+		if (mlx5_aso_ct_queue_init(priv->sh, priv->ct_mng, nb_q_updated))
+			goto err;
+		priv->hws_ctpool = flow_hw_ct_pool_create(dev, port_attr);
+		if (!priv->hws_ctpool)
+			goto err;
+		priv->sh->ct_aso_en = 1;
+	}
 	if (port_attr->nb_counters) {
 		priv->hws_cpool = mlx5_hws_cnt_pool_create(dev, port_attr,
 				nb_queue);
@@ -4823,6 +5001,14 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	}
 	return 0;
 err:
+	if (priv->hws_ctpool) {
+		flow_hw_ct_pool_destroy(dev, priv->hws_ctpool);
+		priv->hws_ctpool = NULL;
+	}
+	if (priv->ct_mng) {
+		flow_hw_ct_mng_destroy(dev, priv->ct_mng);
+		priv->ct_mng = NULL;
+	}
 	flow_hw_free_vport_actions(priv);
 	for (i = 0; i < MLX5_HW_ACTION_FLAG_MAX; i++) {
 		if (priv->hw_drop[i])
@@ -4896,6 +5082,14 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 	}
 	if (priv->hws_cpool)
 		mlx5_hws_cnt_pool_destroy(priv->sh, priv->hws_cpool);
+	if (priv->hws_ctpool) {
+		flow_hw_ct_pool_destroy(dev, priv->hws_ctpool);
+		priv->hws_ctpool = NULL;
+	}
+	if (priv->ct_mng) {
+		flow_hw_ct_mng_destroy(dev, priv->ct_mng);
+		priv->ct_mng = NULL;
+	}
 	mlx5_free(priv->hw_q);
 	priv->hw_q = NULL;
 	claim_zero(mlx5dr_context_close(priv->dr_ctx));
@@ -4964,6 +5158,7 @@ void flow_hw_init_tags_set(struct rte_eth_dev *dev)
 		unset |= 1 << (REG_C_1 - REG_C_0);
 	masks &= ~unset;
 	if (mlx5_flow_hw_avl_tags_init_cnt) {
+		MLX5_ASSERT(mlx5_flow_hw_aso_tag == priv->mtr_color_reg);
 		for (i = 0; i < MLX5_FLOW_HW_TAGS_MAX; i++) {
 			if (mlx5_flow_hw_avl_tags[i] != REG_NON && !!((1 << i) & masks)) {
 				copy[mlx5_flow_hw_avl_tags[i] - REG_C_0] =
@@ -4986,6 +5181,7 @@ void flow_hw_init_tags_set(struct rte_eth_dev *dev)
 		}
 	}
 	priv->sh->hws_tags = 1;
+	mlx5_flow_hw_aso_tag = (enum modify_reg)priv->mtr_color_reg;
 	mlx5_flow_hw_avl_tags_init_cnt++;
 }
 
@@ -5056,6 +5252,170 @@ flow_hw_clear_flow_metadata_config(void)
 	mlx5_flow_hw_flow_metadata_xmeta_en = 0;
 }
 
+static int
+flow_hw_conntrack_destroy(struct rte_eth_dev *dev __rte_unused,
+			  uint32_t idx,
+			  struct rte_flow_error *error)
+{
+	uint16_t owner = (uint16_t)MLX5_ACTION_CTX_CT_GET_OWNER(idx);
+	uint32_t ct_idx = MLX5_ACTION_CTX_CT_GET_IDX(idx);
+	struct rte_eth_dev *owndev = &rte_eth_devices[owner];
+	struct mlx5_priv *priv = owndev->data->dev_private;
+	struct mlx5_aso_ct_pool *pool = priv->hws_ctpool;
+	struct mlx5_aso_ct_action *ct;
+
+	ct = mlx5_ipool_get(pool->cts, ct_idx);
+	if (!ct) {
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL,
+				"Invalid CT destruction index");
+	}
+	__atomic_store_n(&ct->state, ASO_CONNTRACK_FREE,
+				 __ATOMIC_RELAXED);
+	mlx5_ipool_free(pool->cts, ct_idx);
+	return 0;
+}
+
+static int
+flow_hw_conntrack_query(struct rte_eth_dev *dev, uint32_t idx,
+			struct rte_flow_action_conntrack *profile,
+			struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_ct_pool *pool = priv->hws_ctpool;
+	struct mlx5_aso_ct_action *ct;
+	uint16_t owner = (uint16_t)MLX5_ACTION_CTX_CT_GET_OWNER(idx);
+	uint32_t ct_idx;
+
+	if (owner != PORT_ID(priv))
+		return rte_flow_error_set(error, EACCES,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL,
+				"Can't query CT object owned by another port");
+	ct_idx = MLX5_ACTION_CTX_CT_GET_IDX(idx);
+	ct = mlx5_ipool_get(pool->cts, ct_idx);
+	if (!ct) {
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL,
+				"Invalid CT query index");
+	}
+	profile->peer_port = ct->peer;
+	profile->is_original_dir = ct->is_original;
+	if (mlx5_aso_ct_query_by_wqe(priv->sh, MLX5_HW_INV_QUEUE, ct, profile))
+		return rte_flow_error_set(error, EIO,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL,
+				"Failed to query CT context");
+	return 0;
+}
+
+
+static int
+flow_hw_conntrack_update(struct rte_eth_dev *dev, uint32_t queue,
+			 const struct rte_flow_modify_conntrack *action_conf,
+			 uint32_t idx, struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_ct_pool *pool = priv->hws_ctpool;
+	struct mlx5_aso_ct_action *ct;
+	const struct rte_flow_action_conntrack *new_prf;
+	uint16_t owner = (uint16_t)MLX5_ACTION_CTX_CT_GET_OWNER(idx);
+	uint32_t ct_idx;
+	int ret = 0;
+
+	if (PORT_ID(priv) != owner)
+		return rte_flow_error_set(error, EACCES,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL,
+					  "Can't update CT object owned by another port");
+	ct_idx = MLX5_ACTION_CTX_CT_GET_IDX(idx);
+	ct = mlx5_ipool_get(pool->cts, ct_idx);
+	if (!ct) {
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL,
+				"Invalid CT update index");
+	}
+	new_prf = &action_conf->new_ct;
+	if (action_conf->direction)
+		ct->is_original = !!new_prf->is_original_dir;
+	if (action_conf->state) {
+		/* Only validate the profile when it needs to be updated. */
+		ret = mlx5_validate_action_ct(dev, new_prf, error);
+		if (ret)
+			return ret;
+		ret = mlx5_aso_ct_update_by_wqe(priv->sh, queue, ct, new_prf);
+		if (ret)
+			return rte_flow_error_set(error, EIO,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL,
+					"Failed to send CT context update WQE");
+		if (queue != MLX5_HW_INV_QUEUE)
+			return 0;
+		/* Block until ready or a failure in synchronous mode. */
+		ret = mlx5_aso_ct_available(priv->sh, queue, ct);
+		if (ret)
+			rte_flow_error_set(error, rte_errno,
+					   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					   NULL,
+					   "Timeout to get the CT update");
+	}
+	return ret;
+}
+
+static struct rte_flow_action_handle *
+flow_hw_conntrack_create(struct rte_eth_dev *dev, uint32_t queue,
+			 const struct rte_flow_action_conntrack *pro,
+			 struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_ct_pool *pool = priv->hws_ctpool;
+	struct mlx5_aso_ct_action *ct;
+	uint32_t ct_idx = 0;
+	int ret;
+	bool async = !!(queue != MLX5_HW_INV_QUEUE);
+
+	if (!pool) {
+		rte_flow_error_set(error, EINVAL,
+				   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+				   "CT is not enabled");
+		return 0;
+	}
+	ct = mlx5_ipool_zmalloc(pool->cts, &ct_idx);
+	if (!ct) {
+		rte_flow_error_set(error, rte_errno,
+				   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+				   "Failed to allocate CT object");
+		return 0;
+	}
+	ct->offset = ct_idx - 1;
+	ct->is_original = !!pro->is_original_dir;
+	ct->peer = pro->peer_port;
+	ct->pool = pool;
+	if (mlx5_aso_ct_update_by_wqe(priv->sh, queue, ct, pro)) {
+		mlx5_ipool_free(pool->cts, ct_idx);
+		rte_flow_error_set(error, EBUSY,
+				   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+				   "Failed to update CT");
+		return 0;
+	}
+	if (!async) {
+		ret = mlx5_aso_ct_available(priv->sh, queue, ct);
+		if (ret) {
+			mlx5_ipool_free(pool->cts, ct_idx);
+			rte_flow_error_set(error, rte_errno,
+					   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					   NULL,
+					   "Timeout to get the CT update");
+			return 0;
+		}
+	}
+	return (struct rte_flow_action_handle *)(uintptr_t)
+		MLX5_ACTION_CTX_CT_GEN_IDX(PORT_ID(priv), ct_idx);
+}
+
 /**
  * Create shared action.
  *
@@ -5103,6 +5463,9 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 			handle = (struct rte_flow_action_handle *)
 				 (uintptr_t)cnt_id;
 		break;
+	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+		handle = flow_hw_conntrack_create(dev, queue, action->conf, error);
+		break;
 	default:
 		handle = flow_dv_action_create(dev, conf, action, error);
 	}
@@ -5138,10 +5501,18 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 			     void *user_data,
 			     struct rte_flow_error *error)
 {
+	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
+	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+
 	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
 	RTE_SET_USED(user_data);
-	return flow_dv_action_update(dev, handle, update, error);
+	switch (type) {
+	case MLX5_INDIRECT_ACTION_TYPE_CT:
+		return flow_hw_conntrack_update(dev, queue, update, act_idx, error);
+	default:
+		return flow_dv_action_update(dev, handle, update, error);
+	}
 }
 
 /**
@@ -5180,6 +5551,8 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 	switch (type) {
 	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
 		return mlx5_hws_cnt_shared_put(priv->hws_cpool, &act_idx);
+	case MLX5_INDIRECT_ACTION_TYPE_CT:
+		return flow_hw_conntrack_destroy(dev, act_idx, error);
 	default:
 		return flow_dv_action_destroy(dev, handle, error);
 	}
@@ -5333,6 +5706,8 @@ flow_hw_action_query(struct rte_eth_dev *dev,
 	switch (type) {
 	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
 		return flow_hw_query_counter(dev, act_idx, data, error);
+	case MLX5_INDIRECT_ACTION_TYPE_CT:
+		return flow_hw_conntrack_query(dev, act_idx, data, error);
 	default:
 		return flow_dv_action_query(dev, handle, data, error);
 	}
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v3 11/17] net/mlx5: add HW steering VLAN push, pop and VID modify flow actions
  2022-09-30 12:52 ` [PATCH v3 00/17] net/mlx5: HW steering PMD update Suanming Mou
                     ` (9 preceding siblings ...)
  2022-09-30 12:53   ` [PATCH v3 10/17] net/mlx5: add HW steering connection tracking support Suanming Mou
@ 2022-09-30 12:53   ` Suanming Mou
  2022-09-30 12:53   ` [PATCH v3 12/17] net/mlx5: implement METER MARK indirect action for HWS Suanming Mou
                     ` (5 subsequent siblings)
  16 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-30 12:53 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Gregory Etelson

From: Gregory Etelson <getelson@nvidia.com>

Add PMD implementation for HW steering VLAN push, pop and modify flow
actions.

The HWS VLAN push flow action is triggered by a sequence of the mandatory
OF_PUSH_VLAN and OF_SET_VLAN_VID flow action commands, plus the optional
OF_SET_VLAN_PCP command.
The commands must be arranged in this exact order:
OF_PUSH_VLAN / OF_SET_VLAN_VID [ / OF_SET_VLAN_PCP ].
In a masked HWS VLAN push flow actions template *ALL* of the above flow
actions must be masked.
In a non-masked HWS VLAN push flow actions template *ALL* of the above
flow actions must be left unmasked.

Example:

flow actions_template <port id> create \
actions_template_id <action id> \
template \
  of_push_vlan / \
  of_set_vlan_vid \
  [ / of_set_vlan_pcp  ] / end \
mask \
  of_push_vlan ethertype 0 / \
  of_set_vlan_vid vlan_vid 0 \
  [ / of_set_vlan_pcp vlan_pcp 0 ] / end\

flow actions_template <port id> create \
actions_template_id <action id> \
template \
  of_push_vlan ethertype <E>/ \
  of_set_vlan_vid vlan_vid <VID>\
  [ / of_set_vlan_pcp  <PCP>] / end \
mask \
  of_push_vlan ethertype <type != 0> / \
  of_set_vlan_vid vlan_vid <vid_mask != 0>\
  [ / of_set_vlan_pcp vlan_pcp <pcp_mask != 0> ] / end\

The HWS VLAN pop flow action is triggered by the OF_POP_VLAN
flow action command.
The HWS VLAN pop action template is always non-masked.

Example:

flow actions_template <port id> create \
actions_template_id <action id> \
template of_pop_vlan / end mask of_pop_vlan / end

The HWS VLAN VID modify flow action is triggered by a standalone
OF_SET_VLAN_VID flow action command.
The HWS VLAN VID modify action template can be either masked or non-masked.

Example:

flow actions_template <port id> create \
actions_template_id <action id> \
template of_set_vlan_vid / end mask of_set_vlan_vid vlan_vid 0 / end

flow actions_template <port id> create \
actions_template_id <action id> \
template of_set_vlan_vid vlan_vid 0x101 / end \
mask of_set_vlan_vid vlan_vid 0xffff / end
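
For reference, a minimal C sketch of the masked VLAN push variant above,
using the generic rte_flow template API (needs <rte_flow.h> and
<rte_ether.h>; the port_id, VID, PCP and egress attribute values are
illustrative only):

  const struct rte_flow_action_of_push_vlan push = {
          .ethertype = RTE_BE16(RTE_ETHER_TYPE_VLAN),
  };
  const struct rte_flow_action_of_set_vlan_vid vid = {
          .vlan_vid = RTE_BE16(0x123),
  };
  const struct rte_flow_action_of_set_vlan_pcp pcp = { .vlan_pcp = 3 };
  const struct rte_flow_action actions[] = {
          { .type = RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN, .conf = &push },
          { .type = RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID, .conf = &vid },
          { .type = RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_PCP, .conf = &pcp },
          { .type = RTE_FLOW_ACTION_TYPE_END },
  };
  /* In the masked template all three VLAN actions carry non-zero masks. */
  const struct rte_flow_action masks[] = {
          { .type = RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN, .conf = &push },
          { .type = RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID, .conf = &vid },
          { .type = RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_PCP, .conf = &pcp },
          { .type = RTE_FLOW_ACTION_TYPE_END },
  };
  const struct rte_flow_actions_template_attr attr = { .egress = 1 };
  struct rte_flow_error err;
  struct rte_flow_actions_template *at =
          rte_flow_actions_template_create(port_id, &attr, actions, masks, &err);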

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
---
 drivers/net/mlx5/mlx5.h         |   2 +
 drivers/net/mlx5/mlx5_flow.h    |   4 +
 drivers/net/mlx5/mlx5_flow_dv.c |   2 +-
 drivers/net/mlx5/mlx5_flow_hw.c | 492 +++++++++++++++++++++++++++++---
 4 files changed, 463 insertions(+), 37 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 0578a41675..7ec5f6a352 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1665,6 +1665,8 @@ struct mlx5_priv {
 	struct mlx5_hw_q *hw_q;
 	/* HW steering rte flow table list header. */
 	LIST_HEAD(flow_hw_tbl, rte_flow_template_table) flow_hw_tbl;
+	struct mlx5dr_action *hw_push_vlan[MLX5DR_TABLE_TYPE_MAX];
+	struct mlx5dr_action *hw_pop_vlan[MLX5DR_TABLE_TYPE_MAX];
 	struct mlx5dr_action **hw_vport;
 	/* HW steering global drop action. */
 	struct mlx5dr_action *hw_drop[2];
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index f75a56a57b..6d928b477e 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -2435,4 +2435,8 @@ int mlx5_flow_pattern_validate(struct rte_eth_dev *dev,
 		struct rte_flow_error *error);
 int flow_hw_table_update(struct rte_eth_dev *dev,
 			 struct rte_flow_error *error);
+int mlx5_flow_item_field_width(struct rte_eth_dev *dev,
+			   enum rte_flow_field_id field, int inherit,
+			   const struct rte_flow_attr *attr,
+			   struct rte_flow_error *error);
 #endif /* RTE_PMD_MLX5_FLOW_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 9bede7c04f..7f81272150 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -1326,7 +1326,7 @@ flow_dv_convert_action_modify_ipv6_dscp
 					     MLX5_MODIFICATION_TYPE_SET, error);
 }
 
-static int
+int
 mlx5_flow_item_field_width(struct rte_eth_dev *dev,
 			   enum rte_flow_field_id field, int inherit,
 			   const struct rte_flow_attr *attr,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index f4340c475d..71a134f224 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -44,12 +44,22 @@
 /* Lowest priority for HW non-root table. */
 #define MLX5_HW_LOWEST_PRIO_NON_ROOT (UINT32_MAX)
 
+#define MLX5_HW_VLAN_PUSH_TYPE_IDX 0
+#define MLX5_HW_VLAN_PUSH_VID_IDX 1
+#define MLX5_HW_VLAN_PUSH_PCP_IDX 2
+
 static int flow_hw_flush_all_ctrl_flows(struct rte_eth_dev *dev);
 static int flow_hw_translate_group(struct rte_eth_dev *dev,
 				   const struct mlx5_flow_template_table_cfg *cfg,
 				   uint32_t group,
 				   uint32_t *table_group,
 				   struct rte_flow_error *error);
+static __rte_always_inline int
+flow_hw_set_vlan_vid_construct(struct rte_eth_dev *dev,
+			       struct mlx5_hw_q_job *job,
+			       struct mlx5_action_construct_data *act_data,
+			       const struct mlx5_hw_actions *hw_acts,
+			       const struct rte_flow_action *action);
 
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops;
 
@@ -1065,6 +1075,52 @@ flow_hw_cnt_compile(struct rte_eth_dev *dev, uint32_t  start_pos,
 	return 0;
 }
 
+static __rte_always_inline bool
+is_of_vlan_pcp_present(const struct rte_flow_action *actions)
+{
+	/*
+	 * Order of RTE VLAN push actions is
+	 * OF_PUSH_VLAN / OF_SET_VLAN_VID [ / OF_SET_VLAN_PCP ]
+	 */
+	return actions[MLX5_HW_VLAN_PUSH_PCP_IDX].type ==
+		RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_PCP;
+}
+
+static __rte_always_inline bool
+is_template_masked_push_vlan(const struct rte_flow_action_of_push_vlan *mask)
+{
+	/*
+	 * In masked push VLAN template all RTE push actions are masked.
+	 */
+	return mask && mask->ethertype != 0;
+}
+
+static rte_be32_t vlan_hdr_to_be32(const struct rte_flow_action *actions)
+{
+/*
+ * The OpenFlow Switch Specification defines the 802.1q VID as 12+1 bits.
+ */
+	rte_be32_t type, vid, pcp;
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+	rte_be32_t vid_lo, vid_hi;
+#endif
+
+	type = ((const struct rte_flow_action_of_push_vlan *)
+		actions[MLX5_HW_VLAN_PUSH_TYPE_IDX].conf)->ethertype;
+	vid = ((const struct rte_flow_action_of_set_vlan_vid *)
+		actions[MLX5_HW_VLAN_PUSH_VID_IDX].conf)->vlan_vid;
+	pcp = is_of_vlan_pcp_present(actions) ?
+	      ((const struct rte_flow_action_of_set_vlan_pcp *)
+		      actions[MLX5_HW_VLAN_PUSH_PCP_IDX].conf)->vlan_pcp : 0;
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+	vid_hi = vid & 0xff;
+	vid_lo = vid >> 8;
+	return (((vid_lo << 8) | (pcp << 5) | vid_hi) << 16) | type;
+#else
+	return (type << 16) | (pcp << 13) | vid;
+#endif
+}
+
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -1167,6 +1223,26 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
 				priv->hw_tag[!!attr->group];
 			flow_hw_rxq_flag_set(dev, true);
 			break;
+		case RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN:
+			action_pos = at->actions_off[actions - at->actions];
+			acts->rule_acts[action_pos].action =
+				priv->hw_push_vlan[type];
+			if (is_template_masked_push_vlan(masks->conf))
+				acts->rule_acts[action_pos].push_vlan.vlan_hdr =
+					vlan_hdr_to_be32(actions);
+			else if (__flow_hw_act_data_general_append
+					(priv, acts, actions->type,
+					 actions - action_start, action_pos))
+				goto err;
+			actions += is_of_vlan_pcp_present(actions) ?
+					MLX5_HW_VLAN_PUSH_PCP_IDX :
+					MLX5_HW_VLAN_PUSH_VID_IDX;
+			break;
+		case RTE_FLOW_ACTION_TYPE_OF_POP_VLAN:
+			action_pos = at->actions_off[actions - at->actions];
+			acts->rule_acts[action_pos].action =
+				priv->hw_pop_vlan[type];
+			break;
 		case RTE_FLOW_ACTION_TYPE_JUMP:
 			action_pos = at->actions_off[actions - at->actions];
 			if (masks->conf &&
@@ -1784,8 +1860,17 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		cnt_id_t cnt_id;
 
 		action = &actions[act_data->action_src];
-		MLX5_ASSERT(action->type == RTE_FLOW_ACTION_TYPE_INDIRECT ||
-			    (int)action->type == act_data->type);
+		/*
+		 * action template construction replaces
+		 * OF_SET_VLAN_VID with MODIFY_FIELD
+		 */
+		if (action->type == RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID)
+			MLX5_ASSERT(act_data->type ==
+				    RTE_FLOW_ACTION_TYPE_MODIFY_FIELD);
+		else
+			MLX5_ASSERT(action->type ==
+				    RTE_FLOW_ACTION_TYPE_INDIRECT ||
+				    (int)action->type == act_data->type);
 		switch (act_data->type) {
 		case RTE_FLOW_ACTION_TYPE_INDIRECT:
 			if (flow_hw_shared_action_construct
@@ -1801,6 +1886,10 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			      (action->conf))->id);
 			rule_acts[act_data->action_dst].tag.value = tag;
 			break;
+		case RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN:
+			rule_acts[act_data->action_dst].push_vlan.vlan_hdr =
+				vlan_hdr_to_be32(action);
+			break;
 		case RTE_FLOW_ACTION_TYPE_JUMP:
 			jump_group = ((const struct rte_flow_action_jump *)
 						action->conf)->group;
@@ -1852,10 +1941,16 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 				    act_data->encap.len);
 			break;
 		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
-			ret = flow_hw_modify_field_construct(job,
-							     act_data,
-							     hw_acts,
-							     action);
+			if (action->type == RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID)
+				ret = flow_hw_set_vlan_vid_construct(dev, job,
+								     act_data,
+								     hw_acts,
+								     action);
+			else
+				ret = flow_hw_modify_field_construct(job,
+								     act_data,
+								     hw_acts,
+								     action);
 			if (ret)
 				return -1;
 			break;
@@ -2559,9 +2654,14 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 			mlx5_ipool_destroy(tbl->flow);
 		mlx5_free(tbl);
 	}
-	rte_flow_error_set(error, err,
-			  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
-			  "fail to create rte table");
+	if (error != NULL) {
+		rte_flow_error_set(error, err,
+				error->type == RTE_FLOW_ERROR_TYPE_NONE ?
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED : error->type,
+				NULL,
+				error->message == NULL ?
+				"fail to create rte table" : error->message);
+	}
 	return NULL;
 }
 
@@ -2865,28 +2965,76 @@ flow_hw_action_meta_copy_insert(const struct rte_flow_action actions[],
 				uint16_t *ins_pos)
 {
 	uint16_t idx, total = 0;
-	bool ins = false;
+	uint16_t end_idx = UINT16_MAX;
 	bool act_end = false;
+	bool modify_field = false;
+	bool rss_or_queue = false;
 
 	MLX5_ASSERT(actions && masks);
 	MLX5_ASSERT(new_actions && new_masks);
 	MLX5_ASSERT(ins_actions && ins_masks);
 	for (idx = 0; !act_end; idx++) {
-		if (idx >= MLX5_HW_MAX_ACTS)
-			return -1;
-		if (actions[idx].type == RTE_FLOW_ACTION_TYPE_RSS ||
-		    actions[idx].type == RTE_FLOW_ACTION_TYPE_QUEUE) {
-			ins = true;
-			*ins_pos = idx;
-		}
-		if (actions[idx].type == RTE_FLOW_ACTION_TYPE_END)
+		switch (actions[idx].type) {
+		case RTE_FLOW_ACTION_TYPE_RSS:
+		case RTE_FLOW_ACTION_TYPE_QUEUE:
+			/* It is assumed that the application provided only a single RSS/QUEUE action. */
+			MLX5_ASSERT(!rss_or_queue);
+			rss_or_queue = true;
+			break;
+		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
+			modify_field = true;
+			break;
+		case RTE_FLOW_ACTION_TYPE_END:
+			end_idx = idx;
 			act_end = true;
+			break;
+		default:
+			break;
+		}
 	}
-	if (!ins)
+	if (!rss_or_queue)
 		return 0;
-	else if (idx == MLX5_HW_MAX_ACTS)
+	else if (idx >= MLX5_HW_MAX_ACTS)
 		return -1; /* No more space. */
 	total = idx;
+	/*
+	 * If actions template contains MODIFY_FIELD action, then meta copy action can be inserted
+	 * at the template's end. Position of MODIFY_HDR action is based on the position of the
+	 * first MODIFY_FIELD flow action.
+	 */
+	if (modify_field) {
+		*ins_pos = end_idx;
+		goto insert_meta_copy;
+	}
+	/*
+	 * If actions template does not contain MODIFY_FIELD action, then meta copy action must be
+	 * inserted at a place conforming with the action order defined in steering/mlx5dr_action.c.
+	 */
+	act_end = false;
+	for (idx = 0; !act_end; idx++) {
+		switch (actions[idx].type) {
+		case RTE_FLOW_ACTION_TYPE_COUNT:
+		case RTE_FLOW_ACTION_TYPE_METER:
+		case RTE_FLOW_ACTION_TYPE_METER_MARK:
+		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
+		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
+		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
+		case RTE_FLOW_ACTION_TYPE_RSS:
+		case RTE_FLOW_ACTION_TYPE_QUEUE:
+			*ins_pos = idx;
+			act_end = true;
+			break;
+		case RTE_FLOW_ACTION_TYPE_END:
+			act_end = true;
+			break;
+		default:
+			break;
+		}
+	}
+insert_meta_copy:
+	MLX5_ASSERT(*ins_pos != UINT16_MAX);
+	MLX5_ASSERT(*ins_pos < total);
 	/* Before the position, no change for the actions. */
 	for (idx = 0; idx < *ins_pos; idx++) {
 		new_actions[idx] = actions[idx];
@@ -2903,6 +3051,73 @@ flow_hw_action_meta_copy_insert(const struct rte_flow_action actions[],
 	return 0;
 }
 
+static int
+flow_hw_validate_action_push_vlan(struct rte_eth_dev *dev,
+				  const
+				  struct rte_flow_actions_template_attr *attr,
+				  const struct rte_flow_action *action,
+				  const struct rte_flow_action *mask,
+				  struct rte_flow_error *error)
+{
+#define X_FIELD(ptr, t, f) (((ptr)->conf) && ((t *)((ptr)->conf))->f)
+
+	const bool masked_push =
+		X_FIELD(mask + MLX5_HW_VLAN_PUSH_TYPE_IDX,
+			const struct rte_flow_action_of_push_vlan, ethertype);
+	bool masked_param;
+
+	/*
+	 * Mandatory actions order:
+	 * OF_PUSH_VLAN / OF_SET_VLAN_VID [ / OF_SET_VLAN_PCP ]
+	 */
+	RTE_SET_USED(dev);
+	RTE_SET_USED(attr);
+	/* Check that the mask type matches OF_PUSH_VLAN */
+	if (mask[MLX5_HW_VLAN_PUSH_TYPE_IDX].type !=
+	    RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION,
+					  action, "OF_PUSH_VLAN: mask does not match");
+	/* Check that the second template and mask items are SET_VLAN_VID */
+	if (action[MLX5_HW_VLAN_PUSH_VID_IDX].type !=
+	    RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID ||
+	    mask[MLX5_HW_VLAN_PUSH_VID_IDX].type !=
+	    RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION,
+					  action, "OF_PUSH_VLAN: invalid actions order");
+	masked_param = X_FIELD(mask + MLX5_HW_VLAN_PUSH_VID_IDX,
+			       const struct rte_flow_action_of_set_vlan_vid,
+			       vlan_vid);
+	/*
+	 * PMD requires the OF_SET_VLAN_VID mask to match the OF_PUSH_VLAN mask.
+	 */
+	if (masked_push ^ masked_param)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "OF_SET_VLAN_VID: mask does not match OF_PUSH_VLAN");
+	if (is_of_vlan_pcp_present(action)) {
+		if (mask[MLX5_HW_VLAN_PUSH_PCP_IDX].type !=
+		     RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_PCP)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  action, "OF_SET_VLAN_PCP: missing mask configuration");
+		masked_param = X_FIELD(mask + MLX5_HW_VLAN_PUSH_PCP_IDX,
+				       const struct
+				       rte_flow_action_of_set_vlan_pcp,
+				       vlan_pcp);
+		/*
+		 * PMD requires the OF_SET_VLAN_PCP mask to match the OF_PUSH_VLAN mask.
+		 */
+		if (masked_push ^ masked_param)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ACTION, action,
+						  "OF_SET_VLAN_PCP: mask does not match OF_PUSH_VLAN");
+	}
+	return 0;
+#undef X_FIELD
+}
+
 static int
 flow_hw_actions_validate(struct rte_eth_dev *dev,
 			const struct rte_flow_actions_template_attr *attr,
@@ -2993,6 +3208,18 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
 			/* TODO: Validation logic */
 			break;
+		case RTE_FLOW_ACTION_TYPE_OF_POP_VLAN:
+		case RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID:
+			break;
+		case RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN:
+			ret = flow_hw_validate_action_push_vlan
+					(dev, attr, action, mask, error);
+			if (ret != 0)
+				return ret;
+			i += is_of_vlan_pcp_present(action) ?
+				MLX5_HW_VLAN_PUSH_PCP_IDX :
+				MLX5_HW_VLAN_PUSH_VID_IDX;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -3020,6 +3247,8 @@ static enum mlx5dr_action_type mlx5_hw_dr_action_types[] = {
 	[RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT] = MLX5DR_ACTION_TYP_VPORT,
 	[RTE_FLOW_ACTION_TYPE_COUNT] = MLX5DR_ACTION_TYP_CTR,
 	[RTE_FLOW_ACTION_TYPE_CONNTRACK] = MLX5DR_ACTION_TYP_ASO_CT,
+	[RTE_FLOW_ACTION_TYPE_OF_POP_VLAN] = MLX5DR_ACTION_TYP_POP_VLAN,
+	[RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN] = MLX5DR_ACTION_TYP_PUSH_VLAN,
 };
 
 static int
@@ -3136,6 +3365,14 @@ flow_hw_dr_actions_template_create(struct rte_flow_actions_template *at)
 				goto err_actions_num;
 			action_types[curr_off++] = MLX5DR_ACTION_TYP_FT;
 			break;
+		case RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN:
+			type = mlx5_hw_dr_action_types[at->actions[i].type];
+			at->actions_off[i] = curr_off;
+			action_types[curr_off++] = type;
+			i += is_of_vlan_pcp_present(at->actions + i) ?
+				MLX5_HW_VLAN_PUSH_PCP_IDX :
+				MLX5_HW_VLAN_PUSH_VID_IDX;
+			break;
 		default:
 			type = mlx5_hw_dr_action_types[at->actions[i].type];
 			at->actions_off[i] = curr_off;
@@ -3163,6 +3400,89 @@ flow_hw_dr_actions_template_create(struct rte_flow_actions_template *at)
 	return NULL;
 }
 
+static void
+flow_hw_set_vlan_vid(struct rte_eth_dev *dev,
+		     struct rte_flow_action *ra,
+		     struct rte_flow_action *rm,
+		     struct rte_flow_action_modify_field *spec,
+		     struct rte_flow_action_modify_field *mask,
+		     int set_vlan_vid_ix)
+{
+	struct rte_flow_error error;
+	const bool masked = rm[set_vlan_vid_ix].conf &&
+		(((const struct rte_flow_action_of_set_vlan_vid *)
+			rm[set_vlan_vid_ix].conf)->vlan_vid != 0);
+	const struct rte_flow_action_of_set_vlan_vid *conf =
+		ra[set_vlan_vid_ix].conf;
+	rte_be16_t vid = masked ? conf->vlan_vid : 0;
+	int width = mlx5_flow_item_field_width(dev, RTE_FLOW_FIELD_VLAN_ID, 0,
+					       NULL, &error);
+	*spec = (typeof(*spec)) {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = RTE_FLOW_FIELD_VLAN_ID,
+			.level = 0, .offset = 0,
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+			.level = vid,
+			.offset = 0,
+		},
+		.width = width,
+	};
+	*mask = (typeof(*mask)) {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = RTE_FLOW_FIELD_VLAN_ID,
+			.level = 0xffffffff, .offset = 0xffffffff,
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+			.level = masked ? (1U << width) - 1 : 0,
+			.offset = 0,
+		},
+		.width = 0xffffffff,
+	};
+	ra[set_vlan_vid_ix].type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD;
+	ra[set_vlan_vid_ix].conf = spec;
+	rm[set_vlan_vid_ix].type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD;
+	rm[set_vlan_vid_ix].conf = mask;
+}
+
+static __rte_always_inline int
+flow_hw_set_vlan_vid_construct(struct rte_eth_dev *dev,
+			       struct mlx5_hw_q_job *job,
+			       struct mlx5_action_construct_data *act_data,
+			       const struct mlx5_hw_actions *hw_acts,
+			       const struct rte_flow_action *action)
+{
+	struct rte_flow_error error;
+	rte_be16_t vid = ((const struct rte_flow_action_of_set_vlan_vid *)
+			   action->conf)->vlan_vid;
+	int width = mlx5_flow_item_field_width(dev, RTE_FLOW_FIELD_VLAN_ID, 0,
+					       NULL, &error);
+	struct rte_flow_action_modify_field conf = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = RTE_FLOW_FIELD_VLAN_ID,
+			.level = 0, .offset = 0,
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+			.level = vid,
+			.offset = 0,
+		},
+		.width = width,
+	};
+	struct rte_flow_action modify_action = {
+		.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+		.conf = &conf
+	};
+
+	return flow_hw_modify_field_construct(job, act_data, hw_acts,
+					      &modify_action);
+}
+
 /**
  * Create flow action template.
  *
@@ -3188,14 +3508,18 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 			struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	int len, act_num, act_len, mask_len;
+	int len, act_len, mask_len;
+	unsigned int act_num;
 	unsigned int i;
 	struct rte_flow_actions_template *at = NULL;
-	uint16_t pos = MLX5_HW_MAX_ACTS;
+	uint16_t pos = UINT16_MAX;
 	struct rte_flow_action tmp_action[MLX5_HW_MAX_ACTS];
 	struct rte_flow_action tmp_mask[MLX5_HW_MAX_ACTS];
-	const struct rte_flow_action *ra;
-	const struct rte_flow_action *rm;
+	struct rte_flow_action *ra = (void *)(uintptr_t)actions;
+	struct rte_flow_action *rm = (void *)(uintptr_t)masks;
+	int set_vlan_vid_ix = -1;
+	struct rte_flow_action_modify_field set_vlan_vid_spec = {0, };
+	struct rte_flow_action_modify_field set_vlan_vid_mask = {0, };
 	const struct rte_flow_action_modify_field rx_mreg = {
 		.operation = RTE_FLOW_MODIFY_SET,
 		.dst = {
@@ -3235,21 +3559,58 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 		return NULL;
 	if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
 	    priv->sh->config.dv_esw_en) {
+		/* Application should make sure only one Q/RSS exists in one rule. */
 		if (flow_hw_action_meta_copy_insert(actions, masks, &rx_cpy, &rx_cpy_mask,
 						    tmp_action, tmp_mask, &pos)) {
 			rte_flow_error_set(error, EINVAL,
 					   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					   "Failed to concatenate new action/mask");
 			return NULL;
+		} else if (pos != UINT16_MAX) {
+			ra = tmp_action;
+			rm = tmp_mask;
 		}
 	}
-	/* Application should make sure only one Q/RSS exist in one rule. */
-	if (pos == MLX5_HW_MAX_ACTS) {
-		ra = actions;
-		rm = masks;
-	} else {
-		ra = tmp_action;
-		rm = tmp_mask;
+	for (i = 0; ra[i].type != RTE_FLOW_ACTION_TYPE_END; ++i) {
+		switch (ra[i].type) {
+		/* OF_PUSH_VLAN *MUST* come before OF_SET_VLAN_VID */
+		case RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN:
+			i += is_of_vlan_pcp_present(ra + i) ?
+				MLX5_HW_VLAN_PUSH_PCP_IDX :
+				MLX5_HW_VLAN_PUSH_VID_IDX;
+			break;
+		case RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID:
+			set_vlan_vid_ix = i;
+			break;
+		default:
+			break;
+		}
+	}
+	/*
+	 * Count flow actions to allocate required space for storing DR offsets and to check
+	 * if temporary buffer would not be overrun.
+	 */
+	act_num = i + 1;
+	if (act_num >= MLX5_HW_MAX_ACTS) {
+		rte_flow_error_set(error, EINVAL,
+				   RTE_FLOW_ERROR_TYPE_ACTION, NULL, "Too many actions");
+		return NULL;
+	}
+	if (set_vlan_vid_ix != -1) {
+		/* If temporary action buffer was not used, copy template actions to it */
+		if (ra == actions && rm == masks) {
+			for (i = 0; i < act_num; ++i) {
+				tmp_action[i] = actions[i];
+				tmp_mask[i] = masks[i];
+				if (actions[i].type == RTE_FLOW_ACTION_TYPE_END)
+					break;
+			}
+			ra = tmp_action;
+			rm = tmp_mask;
+		}
+		flow_hw_set_vlan_vid(dev, ra, rm,
+				     &set_vlan_vid_spec, &set_vlan_vid_mask,
+				     set_vlan_vid_ix);
 	}
 	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, NULL, 0, ra, error);
 	if (act_len <= 0)
@@ -3259,10 +3620,6 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 	if (mask_len <= 0)
 		return NULL;
 	len += RTE_ALIGN(mask_len, 16);
-	/* Count flow actions to allocate required space for storing DR offsets. */
-	act_num = 0;
-	for (i = 0; ra[i].type != RTE_FLOW_ACTION_TYPE_END; ++i)
-		act_num++;
 	len += RTE_ALIGN(act_num * sizeof(*at->actions_off), 16);
 	at = mlx5_malloc(MLX5_MEM_ZERO, len + sizeof(*at),
 			 RTE_CACHE_LINE_SIZE, rte_socket_id());
@@ -4513,7 +4870,11 @@ flow_hw_create_tx_default_mreg_copy_table(struct rte_eth_dev *dev,
 		.attr = tx_tbl_attr,
 		.external = false,
 	};
-	struct rte_flow_error drop_err;
+	struct rte_flow_error drop_err = {
+		.type = RTE_FLOW_ERROR_TYPE_NONE,
+		.cause = NULL,
+		.message = NULL,
+	};
 
 	RTE_SET_USED(drop_err);
 	return flow_hw_table_create(dev, &tx_tbl_cfg, &pt, 1, &at, 1, &drop_err);
@@ -4794,6 +5155,60 @@ flow_hw_ct_pool_create(struct rte_eth_dev *dev,
 	return NULL;
 }
 
+static void
+flow_hw_destroy_vlan(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	enum mlx5dr_table_type i;
+
+	for (i = MLX5DR_TABLE_TYPE_NIC_RX; i < MLX5DR_TABLE_TYPE_MAX; i++) {
+		if (priv->hw_pop_vlan[i]) {
+			mlx5dr_action_destroy(priv->hw_pop_vlan[i]);
+			priv->hw_pop_vlan[i] = NULL;
+		}
+		if (priv->hw_push_vlan[i]) {
+			mlx5dr_action_destroy(priv->hw_push_vlan[i]);
+			priv->hw_push_vlan[i] = NULL;
+		}
+	}
+}
+
+static int
+flow_hw_create_vlan(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	enum mlx5dr_table_type i;
+	const enum mlx5dr_action_flags flags[MLX5DR_TABLE_TYPE_MAX] = {
+		MLX5DR_ACTION_FLAG_HWS_RX,
+		MLX5DR_ACTION_FLAG_HWS_TX,
+		MLX5DR_ACTION_FLAG_HWS_FDB
+	};
+
+	for (i = MLX5DR_TABLE_TYPE_NIC_RX; i <= MLX5DR_TABLE_TYPE_NIC_TX; i++) {
+		priv->hw_pop_vlan[i] =
+			mlx5dr_action_create_pop_vlan(priv->dr_ctx, flags[i]);
+		if (!priv->hw_pop_vlan[i])
+			return -ENOENT;
+		priv->hw_push_vlan[i] =
+			mlx5dr_action_create_push_vlan(priv->dr_ctx, flags[i]);
+		if (!priv->hw_push_vlan[i])
+			return -ENOENT;
+	}
+	if (priv->sh->config.dv_esw_en && priv->master) {
+		priv->hw_pop_vlan[MLX5DR_TABLE_TYPE_FDB] =
+			mlx5dr_action_create_pop_vlan
+				(priv->dr_ctx, MLX5DR_ACTION_FLAG_HWS_FDB);
+		if (!priv->hw_pop_vlan[MLX5DR_TABLE_TYPE_FDB])
+			return -ENOENT;
+		priv->hw_push_vlan[MLX5DR_TABLE_TYPE_FDB] =
+			mlx5dr_action_create_push_vlan
+				(priv->dr_ctx, MLX5DR_ACTION_FLAG_HWS_FDB);
+		if (!priv->hw_push_vlan[MLX5DR_TABLE_TYPE_FDB])
+			return -ENOENT;
+	}
+	return 0;
+}
+
 /**
  * Configure port HWS resources.
  *
@@ -4999,6 +5414,9 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		if (priv->hws_cpool == NULL)
 			goto err;
 	}
+	ret = flow_hw_create_vlan(dev);
+	if (ret)
+		goto err;
 	return 0;
 err:
 	if (priv->hws_ctpool) {
@@ -5016,6 +5434,7 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		if (priv->hw_tag[i])
 			mlx5dr_action_destroy(priv->hw_tag[i]);
 	}
+	flow_hw_destroy_vlan(dev);
 	if (dr_ctx)
 		claim_zero(mlx5dr_context_close(dr_ctx));
 	mlx5_free(priv->hw_q);
@@ -5075,6 +5494,7 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 		if (priv->hw_tag[i])
 			mlx5dr_action_destroy(priv->hw_tag[i]);
 	}
+	flow_hw_destroy_vlan(dev);
 	flow_hw_free_vport_actions(priv);
 	if (priv->acts_ipool) {
 		mlx5_ipool_destroy(priv->acts_ipool);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v3 12/17] net/mlx5: implement METER MARK indirect action for HWS
  2022-09-30 12:52 ` [PATCH v3 00/17] net/mlx5: HW steering PMD update Suanming Mou
                     ` (10 preceding siblings ...)
  2022-09-30 12:53   ` [PATCH v3 11/17] net/mlx5: add HW steering VLAN push, pop and VID modify flow actions Suanming Mou
@ 2022-09-30 12:53   ` Suanming Mou
  2022-09-30 12:53   ` [PATCH v3 13/17] net/mlx5: add HWS AGE action support Suanming Mou
                     ` (4 subsequent siblings)
  16 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-30 12:53 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Alexander Kozyrev

From: Alexander Kozyrev <akozyrev@nvidia.com>

Add the ability to create an indirect action handle for METER_MARK.
It allows sharing one meter between several different actions.
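
A rough application-side sketch of creating such a handle through the
generic rte_flow API follows. The helper name and the literal values are
illustrative only; the meter profile is assumed to have been created via
the metering API beforehand:

	#include <rte_flow.h>
	#include <rte_mtr.h>

	/* Sketch: create a shared METER_MARK indirect action handle. */
	static struct rte_flow_action_handle *
	meter_mark_handle_create(uint16_t port_id,
				 struct rte_flow_meter_profile *profile)
	{
		struct rte_flow_error err;
		const struct rte_flow_indir_action_conf conf = {
			.ingress = 1,
		};
		const struct rte_flow_action_meter_mark mtr = {
			.profile = profile,
			.color_mode = 0,               /* color-blind */
			.init_color = RTE_COLOR_GREEN,
			.state = 1,                    /* meter enabled */
		};
		const struct rte_flow_action action = {
			.type = RTE_FLOW_ACTION_TYPE_METER_MARK,
			.conf = &mtr,
		};

		return rte_flow_action_handle_create(port_id, &conf,
						     &action, &err);
	}

The returned handle is then referenced from flow rules with the INDIRECT
action type, so a single ASO meter object backs all of them.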

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
---
 drivers/net/mlx5/mlx5.c            |   4 +-
 drivers/net/mlx5/mlx5.h            |  33 ++-
 drivers/net/mlx5/mlx5_flow.c       |   6 +
 drivers/net/mlx5/mlx5_flow.h       |  19 +-
 drivers/net/mlx5/mlx5_flow_aso.c   | 139 +++++++--
 drivers/net/mlx5/mlx5_flow_dv.c    | 145 +++++++++-
 drivers/net/mlx5/mlx5_flow_hw.c    | 438 +++++++++++++++++++++++++----
 drivers/net/mlx5/mlx5_flow_meter.c |  79 +++++-
 8 files changed, 764 insertions(+), 99 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 925e19bcd5..383a789dfa 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -442,7 +442,7 @@ mlx5_flow_aso_age_mng_init(struct mlx5_dev_ctx_shared *sh)
 		rte_errno = ENOMEM;
 		return -ENOMEM;
 	}
-	err = mlx5_aso_queue_init(sh, ASO_OPC_MOD_FLOW_HIT);
+	err = mlx5_aso_queue_init(sh, ASO_OPC_MOD_FLOW_HIT, 1);
 	if (err) {
 		mlx5_free(sh->aso_age_mng);
 		return -1;
@@ -763,7 +763,7 @@ mlx5_flow_aso_ct_mng_init(struct mlx5_dev_ctx_shared *sh)
 		rte_errno = ENOMEM;
 		return -rte_errno;
 	}
-	err = mlx5_aso_queue_init(sh, ASO_OPC_MOD_CONNECTION_TRACKING);
+	err = mlx5_aso_queue_init(sh, ASO_OPC_MOD_CONNECTION_TRACKING, MLX5_ASO_CT_SQ_NUM);
 	if (err) {
 		mlx5_free(sh->ct_mng);
 		/* rte_errno should be extracted from the failure. */
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 7ec5f6a352..89dc8441dc 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -971,12 +971,16 @@ enum mlx5_aso_mtr_type {
 
 /* Generic aso_flow_meter information. */
 struct mlx5_aso_mtr {
-	LIST_ENTRY(mlx5_aso_mtr) next;
+	union {
+		LIST_ENTRY(mlx5_aso_mtr) next;
+		struct mlx5_aso_mtr_pool *pool;
+	};
 	enum mlx5_aso_mtr_type type;
 	struct mlx5_flow_meter_info fm;
 	/**< Pointer to the next aso flow meter structure. */
 	uint8_t state; /**< ASO flow meter state. */
 	uint32_t offset;
+	enum rte_color init_color;
 };
 
 /* Generic aso_flow_meter pool structure. */
@@ -985,7 +989,11 @@ struct mlx5_aso_mtr_pool {
 	/*Must be the first in pool*/
 	struct mlx5_devx_obj *devx_obj;
 	/* The devx object of the minimum aso flow meter ID. */
+	struct mlx5dr_action *action; /* HWS action. */
+	struct mlx5_indexed_pool *idx_pool; /* HWS index pool. */
 	uint32_t index; /* Pool index in management structure. */
+	uint32_t nb_sq; /* Number of ASO SQ. */
+	struct mlx5_aso_sq *sq; /* ASO SQs. */
 };
 
 LIST_HEAD(aso_meter_list, mlx5_aso_mtr);
@@ -1678,6 +1686,7 @@ struct mlx5_priv {
 	struct mlx5_aso_ct_pools_mng *ct_mng;
 	/* Management data for ASO connection tracking. */
 	struct mlx5_aso_ct_pool *hws_ctpool; /* HW steering's CT pool. */
+	struct mlx5_aso_mtr_pool *hws_mpool; /* HW steering's Meter pool. */
 #endif
 };
 
@@ -1998,7 +2007,8 @@ void mlx5_pmd_socket_uninit(void);
 int mlx5_flow_meter_init(struct rte_eth_dev *dev,
 			 uint32_t nb_meters,
 			 uint32_t nb_meter_profiles,
-			 uint32_t nb_meter_policies);
+			 uint32_t nb_meter_policies,
+			 uint32_t nb_queues);
 void mlx5_flow_meter_uninit(struct rte_eth_dev *dev);
 int mlx5_flow_meter_ops_get(struct rte_eth_dev *dev, void *arg);
 struct mlx5_flow_meter_info *mlx5_flow_meter_find(struct mlx5_priv *priv,
@@ -2067,15 +2077,24 @@ eth_tx_burst_t mlx5_select_tx_function(struct rte_eth_dev *dev);
 
 /* mlx5_flow_aso.c */
 
+int mlx5_aso_mtr_queue_init(struct mlx5_dev_ctx_shared *sh,
+			    struct mlx5_aso_mtr_pool *hws_pool,
+			    struct mlx5_aso_mtr_pools_mng *pool_mng,
+			    uint32_t nb_queues);
+void mlx5_aso_mtr_queue_uninit(struct mlx5_dev_ctx_shared *sh,
+			       struct mlx5_aso_mtr_pool *hws_pool,
+			       struct mlx5_aso_mtr_pools_mng *pool_mng);
 int mlx5_aso_queue_init(struct mlx5_dev_ctx_shared *sh,
-		enum mlx5_access_aso_opc_mod aso_opc_mod);
+			enum mlx5_access_aso_opc_mod aso_opc_mode,
+			uint32_t nb_queues);
 int mlx5_aso_flow_hit_queue_poll_start(struct mlx5_dev_ctx_shared *sh);
 int mlx5_aso_flow_hit_queue_poll_stop(struct mlx5_dev_ctx_shared *sh);
 void mlx5_aso_queue_uninit(struct mlx5_dev_ctx_shared *sh,
-		enum mlx5_access_aso_opc_mod aso_opc_mod);
-int mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
-		struct mlx5_aso_mtr *mtr, struct mlx5_mtr_bulk *bulk);
-int mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh,
+			   enum mlx5_access_aso_opc_mod aso_opc_mod);
+int mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
+				 struct mlx5_aso_mtr *mtr,
+				 struct mlx5_mtr_bulk *bulk);
+int mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 		struct mlx5_aso_mtr *mtr);
 int mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			      struct mlx5_aso_ct_action *ct,
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index cbf9c31984..9627ffc979 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -4221,6 +4221,12 @@ flow_action_handles_translate(struct rte_eth_dev *dev,
 						MLX5_RTE_FLOW_ACTION_TYPE_COUNT;
 			translated[handle->index].conf = (void *)(uintptr_t)idx;
 			break;
+		case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
+			translated[handle->index].type =
+						(enum rte_flow_action_type)
+						MLX5_RTE_FLOW_ACTION_TYPE_METER_MARK;
+			translated[handle->index].conf = (void *)(uintptr_t)idx;
+			break;
 		case MLX5_INDIRECT_ACTION_TYPE_AGE:
 			if (priv->sh->flow_hit_aso_en) {
 				translated[handle->index].type =
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 6d928b477e..ffa4f28255 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -46,6 +46,7 @@ enum mlx5_rte_flow_action_type {
 	MLX5_RTE_FLOW_ACTION_TYPE_COUNT,
 	MLX5_RTE_FLOW_ACTION_TYPE_JUMP,
 	MLX5_RTE_FLOW_ACTION_TYPE_RSS,
+	MLX5_RTE_FLOW_ACTION_TYPE_METER_MARK,
 };
 
 /* Private (internal) Field IDs for MODIFY_FIELD action. */
@@ -54,22 +55,23 @@ enum mlx5_rte_flow_field_id {
 			MLX5_RTE_FLOW_FIELD_META_REG,
 };
 
-#define MLX5_INDIRECT_ACTION_TYPE_OFFSET 30
+#define MLX5_INDIRECT_ACTION_TYPE_OFFSET 29
 
 enum {
 	MLX5_INDIRECT_ACTION_TYPE_RSS,
 	MLX5_INDIRECT_ACTION_TYPE_AGE,
 	MLX5_INDIRECT_ACTION_TYPE_COUNT,
 	MLX5_INDIRECT_ACTION_TYPE_CT,
+	MLX5_INDIRECT_ACTION_TYPE_METER_MARK,
 };
 
-/* Now, the maximal ports will be supported is 256, action number is 4M. */
-#define MLX5_INDIRECT_ACT_CT_MAX_PORT 0x100
+/* Now, the maximal number of ports supported is 16, the action number is 32M. */
+#define MLX5_INDIRECT_ACT_CT_MAX_PORT 0x10
 
 #define MLX5_INDIRECT_ACT_CT_OWNER_SHIFT 22
 #define MLX5_INDIRECT_ACT_CT_OWNER_MASK (MLX5_INDIRECT_ACT_CT_MAX_PORT - 1)
 
-/* 30-31: type, 22-29: owner port, 0-21: index. */
+/* 29-31: type, 25-28: owner port, 0-24: index */
 #define MLX5_INDIRECT_ACT_CT_GEN_IDX(owner, index) \
 	((MLX5_INDIRECT_ACTION_TYPE_CT << MLX5_INDIRECT_ACTION_TYPE_OFFSET) | \
 	 (((owner) & MLX5_INDIRECT_ACT_CT_OWNER_MASK) << \
@@ -207,6 +209,9 @@ enum mlx5_feature_name {
 #define MLX5_FLOW_ITEM_PORT_REPRESENTOR (UINT64_C(1) << 41)
 #define MLX5_FLOW_ITEM_REPRESENTED_PORT (UINT64_C(1) << 42)
 
+/* Meter color item */
+#define MLX5_FLOW_ITEM_METER_COLOR (UINT64_C(1) << 44)
+
 /* Outer Masks. */
 #define MLX5_FLOW_LAYER_OUTER_L3 \
 	(MLX5_FLOW_LAYER_OUTER_L3_IPV4 | MLX5_FLOW_LAYER_OUTER_L3_IPV6)
@@ -1108,6 +1113,7 @@ struct rte_flow_hw {
 	struct rte_flow_template_table *table; /* The table flow allcated from. */
 	struct mlx5dr_rule rule; /* HWS layer data struct. */
 	uint32_t cnt_id;
+	uint32_t mtr_id;
 } __rte_packed;
 
 /* rte flow action translate to DR action struct. */
@@ -1154,6 +1160,9 @@ struct mlx5_action_construct_data {
 		struct {
 			uint32_t id;
 		} shared_counter;
+		struct {
+			uint32_t id;
+		} shared_meter;
 	};
 };
 
@@ -1237,6 +1246,7 @@ struct mlx5_hw_actions {
 	uint16_t encap_decap_pos; /* Encap/Decap action position. */
 	uint32_t mark:1; /* Indicate the mark action. */
 	uint32_t cnt_id; /* Counter id. */
+	uint32_t mtr_id; /* Meter id. */
 	/* Translated DR action array from action template. */
 	struct mlx5dr_rule_action rule_acts[MLX5_HW_MAX_ACTS];
 };
@@ -1524,6 +1534,7 @@ flow_hw_get_reg_id(enum rte_flow_item_type type, uint32_t id)
 		 */
 		return REG_A;
 	case RTE_FLOW_ITEM_TYPE_CONNTRACK:
+	case RTE_FLOW_ITEM_TYPE_METER_COLOR:
 		return mlx5_flow_hw_aso_tag;
 	case RTE_FLOW_ITEM_TYPE_TAG:
 		MLX5_ASSERT(id < MLX5_FLOW_HW_TAGS_MAX);
diff --git a/drivers/net/mlx5/mlx5_flow_aso.c b/drivers/net/mlx5/mlx5_flow_aso.c
index c00c07b891..f371fff2e2 100644
--- a/drivers/net/mlx5/mlx5_flow_aso.c
+++ b/drivers/net/mlx5/mlx5_flow_aso.c
@@ -275,6 +275,65 @@ mlx5_aso_sq_create(struct mlx5_common_device *cdev, struct mlx5_aso_sq *sq,
 	return -1;
 }
 
+void
+mlx5_aso_mtr_queue_uninit(struct mlx5_dev_ctx_shared *sh __rte_unused,
+			  struct mlx5_aso_mtr_pool *hws_pool,
+			  struct mlx5_aso_mtr_pools_mng *pool_mng)
+{
+	uint32_t i;
+
+	if (hws_pool) {
+		for (i = 0; i < hws_pool->nb_sq; i++)
+			mlx5_aso_destroy_sq(hws_pool->sq + i);
+		mlx5_free(hws_pool->sq);
+		return;
+	}
+	if (pool_mng)
+		mlx5_aso_destroy_sq(&pool_mng->sq);
+}
+
+int
+mlx5_aso_mtr_queue_init(struct mlx5_dev_ctx_shared *sh,
+				struct mlx5_aso_mtr_pool *hws_pool,
+				struct mlx5_aso_mtr_pools_mng *pool_mng,
+				uint32_t nb_queues)
+{
+	struct mlx5_common_device *cdev = sh->cdev;
+	struct mlx5_aso_sq *sq;
+	uint32_t i;
+
+	if (hws_pool) {
+		sq = mlx5_malloc(MLX5_MEM_ZERO,
+			sizeof(struct mlx5_aso_sq) * nb_queues,
+			RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
+		if (!sq)
+			return -1;
+		hws_pool->sq = sq;
+		for (i = 0; i < nb_queues; i++) {
+			if (mlx5_aso_sq_create(cdev, hws_pool->sq + i,
+					       sh->tx_uar.obj,
+					       MLX5_ASO_QUEUE_LOG_DESC))
+				goto error;
+			mlx5_aso_mtr_init_sq(hws_pool->sq + i);
+		}
+		hws_pool->nb_sq = nb_queues;
+	}
+	if (pool_mng) {
+		if (mlx5_aso_sq_create(cdev, &pool_mng->sq,
+				       sh->tx_uar.obj,
+				       MLX5_ASO_QUEUE_LOG_DESC))
+			return -1;
+		mlx5_aso_mtr_init_sq(&pool_mng->sq);
+	}
+	return 0;
+error:
+	do {
+		if (&hws_pool->sq[i])
+			mlx5_aso_destroy_sq(hws_pool->sq + i);
+	} while (i--);
+	return -1;
+}
+
 /**
  * API to create and initialize Send Queue used for ASO access.
  *
@@ -282,13 +341,16 @@ mlx5_aso_sq_create(struct mlx5_common_device *cdev, struct mlx5_aso_sq *sq,
  *   Pointer to shared device context.
  * @param[in] aso_opc_mod
  *   Mode of ASO feature.
+ * @param[in] nb_queues
+ *   Number of Send Queues to create.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
 mlx5_aso_queue_init(struct mlx5_dev_ctx_shared *sh,
-		    enum mlx5_access_aso_opc_mod aso_opc_mod)
+		    enum mlx5_access_aso_opc_mod aso_opc_mod,
+			uint32_t nb_queues)
 {
 	uint32_t sq_desc_n = 1 << MLX5_ASO_QUEUE_LOG_DESC;
 	struct mlx5_common_device *cdev = sh->cdev;
@@ -307,10 +369,9 @@ mlx5_aso_queue_init(struct mlx5_dev_ctx_shared *sh,
 		mlx5_aso_age_init_sq(&sh->aso_age_mng->aso_sq);
 		break;
 	case ASO_OPC_MOD_POLICER:
-		if (mlx5_aso_sq_create(cdev, &sh->mtrmng->pools_mng.sq,
-				       sh->tx_uar.obj, MLX5_ASO_QUEUE_LOG_DESC))
+		if (mlx5_aso_mtr_queue_init(sh, NULL,
+					    &sh->mtrmng->pools_mng, nb_queues))
 			return -1;
-		mlx5_aso_mtr_init_sq(&sh->mtrmng->pools_mng.sq);
 		break;
 	case ASO_OPC_MOD_CONNECTION_TRACKING:
 		if (mlx5_aso_ct_queue_init(sh, sh->ct_mng, MLX5_ASO_CT_SQ_NUM))
@@ -343,7 +404,7 @@ mlx5_aso_queue_uninit(struct mlx5_dev_ctx_shared *sh,
 		sq = &sh->aso_age_mng->aso_sq;
 		break;
 	case ASO_OPC_MOD_POLICER:
-		sq = &sh->mtrmng->pools_mng.sq;
+		mlx5_aso_mtr_queue_uninit(sh, NULL, &sh->mtrmng->pools_mng);
 		break;
 	case ASO_OPC_MOD_CONNECTION_TRACKING:
 		mlx5_aso_ct_queue_uninit(sh, sh->ct_mng);
@@ -666,7 +727,8 @@ static uint16_t
 mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 			       struct mlx5_aso_sq *sq,
 			       struct mlx5_aso_mtr *aso_mtr,
-			       struct mlx5_mtr_bulk *bulk)
+			       struct mlx5_mtr_bulk *bulk,
+				   bool need_lock)
 {
 	volatile struct mlx5_aso_wqe *wqe = NULL;
 	struct mlx5_flow_meter_info *fm = NULL;
@@ -679,11 +741,13 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	uint32_t param_le;
 	int id;
 
-	rte_spinlock_lock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_lock(&sq->sqsl);
 	res = size - (uint16_t)(sq->head - sq->tail);
 	if (unlikely(!res)) {
 		DRV_LOG(ERR, "Fail: SQ is full and no free WQE to send");
-		rte_spinlock_unlock(&sq->sqsl);
+		if (need_lock)
+			rte_spinlock_unlock(&sq->sqsl);
 		return 0;
 	}
 	wqe = &sq->sq_obj.aso_wqes[sq->head & mask];
@@ -692,8 +756,11 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	fm = &aso_mtr->fm;
 	sq->elts[sq->head & mask].mtr = aso_mtr;
 	if (aso_mtr->type == ASO_METER_INDIRECT) {
-		pool = container_of(aso_mtr, struct mlx5_aso_mtr_pool,
-				    mtrs[aso_mtr->offset]);
+		if (likely(sh->config.dv_flow_en == 2))
+			pool = aso_mtr->pool;
+		else
+			pool = container_of(aso_mtr, struct mlx5_aso_mtr_pool,
+					    mtrs[aso_mtr->offset]);
 		id = pool->devx_obj->id;
 	} else {
 		id = bulk->devx_obj->id;
@@ -756,7 +823,8 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
 			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
 			   !sh->tx_uar.dbnc);
-	rte_spinlock_unlock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_unlock(&sq->sqsl);
 	return 1;
 }
 
@@ -779,7 +847,7 @@ mlx5_aso_mtrs_status_update(struct mlx5_aso_sq *sq, uint16_t aso_mtrs_nums)
 }
 
 static void
-mlx5_aso_mtr_completion_handle(struct mlx5_aso_sq *sq)
+mlx5_aso_mtr_completion_handle(struct mlx5_aso_sq *sq, bool need_lock)
 {
 	struct mlx5_aso_cq *cq = &sq->cq;
 	volatile struct mlx5_cqe *restrict cqe;
@@ -791,7 +859,8 @@ mlx5_aso_mtr_completion_handle(struct mlx5_aso_sq *sq)
 	uint16_t n = 0;
 	int ret;
 
-	rte_spinlock_lock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_lock(&sq->sqsl);
 	max = (uint16_t)(sq->head - sq->tail);
 	if (unlikely(!max)) {
 		rte_spinlock_unlock(&sq->sqsl);
@@ -823,7 +892,8 @@ mlx5_aso_mtr_completion_handle(struct mlx5_aso_sq *sq)
 		rte_io_wmb();
 		cq->cq_obj.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci);
 	}
-	rte_spinlock_unlock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_unlock(&sq->sqsl);
 }
 
 /**
@@ -840,16 +910,30 @@ mlx5_aso_mtr_completion_handle(struct mlx5_aso_sq *sq)
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
+mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			struct mlx5_aso_mtr *mtr,
 			struct mlx5_mtr_bulk *bulk)
 {
-	struct mlx5_aso_sq *sq = &sh->mtrmng->pools_mng.sq;
+	struct mlx5_aso_sq *sq;
 	uint32_t poll_wqe_times = MLX5_MTR_POLL_WQE_CQE_TIMES;
+	bool need_lock;
 
+	if (likely(sh->config.dv_flow_en == 2)) {
+		if (queue == MLX5_HW_INV_QUEUE) {
+			sq = &mtr->pool->sq[mtr->pool->nb_sq - 1];
+			need_lock = true;
+		} else {
+			sq = &mtr->pool->sq[queue];
+			need_lock = false;
+		}
+	} else {
+		sq = &sh->mtrmng->pools_mng.sq;
+		need_lock = true;
+	}
 	do {
-		mlx5_aso_mtr_completion_handle(sq);
-		if (mlx5_aso_mtr_sq_enqueue_single(sh, sq, mtr, bulk))
+		mlx5_aso_mtr_completion_handle(sq, need_lock);
+		if (mlx5_aso_mtr_sq_enqueue_single(sh, sq, mtr,
+						   bulk, need_lock))
 			return 0;
 		/* Waiting for wqe resource. */
 		rte_delay_us_sleep(MLX5_ASO_WQE_CQE_RESPONSE_DELAY);
@@ -873,17 +957,30 @@ mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh,
+mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			struct mlx5_aso_mtr *mtr)
 {
-	struct mlx5_aso_sq *sq = &sh->mtrmng->pools_mng.sq;
+	struct mlx5_aso_sq *sq;
 	uint32_t poll_cqe_times = MLX5_MTR_POLL_WQE_CQE_TIMES;
+	bool need_lock;
 
+	if (likely(sh->config.dv_flow_en == 2)) {
+		if (queue == MLX5_HW_INV_QUEUE) {
+			sq = &mtr->pool->sq[mtr->pool->nb_sq - 1];
+			need_lock = true;
+		} else {
+			sq = &mtr->pool->sq[queue];
+			need_lock = false;
+		}
+	} else {
+		sq = &sh->mtrmng->pools_mng.sq;
+		need_lock = true;
+	}
 	if (__atomic_load_n(&mtr->state, __ATOMIC_RELAXED) ==
 					    ASO_METER_READY)
 		return 0;
 	do {
-		mlx5_aso_mtr_completion_handle(sq);
+		mlx5_aso_mtr_completion_handle(sq, need_lock);
 		if (__atomic_load_n(&mtr->state, __ATOMIC_RELAXED) ==
 					    ASO_METER_READY)
 			return 0;
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 7f81272150..a42eb99154 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -1387,6 +1387,7 @@ mlx5_flow_item_field_width(struct rte_eth_dev *dev,
 		return inherit < 0 ? 0 : inherit;
 	case RTE_FLOW_FIELD_IPV4_ECN:
 	case RTE_FLOW_FIELD_IPV6_ECN:
+	case RTE_FLOW_FIELD_METER_COLOR:
 		return 2;
 	default:
 		MLX5_ASSERT(false);
@@ -1856,6 +1857,31 @@ mlx5_flow_field_id_to_modify_info
 				info[idx].offset = data->offset;
 		}
 		break;
+	case RTE_FLOW_FIELD_METER_COLOR:
+		{
+			const uint32_t color_mask =
+				(UINT32_C(1) << MLX5_MTR_COLOR_BITS) - 1;
+			int reg;
+
+			if (priv->sh->config.dv_flow_en == 2)
+				reg = flow_hw_get_reg_id
+					(RTE_FLOW_ITEM_TYPE_METER_COLOR, 0);
+			else
+				reg = mlx5_flow_get_reg_id(dev, MLX5_MTR_COLOR,
+						       0, error);
+			if (reg < 0)
+				return;
+			MLX5_ASSERT(reg != REG_NON);
+			MLX5_ASSERT((unsigned int)reg < RTE_DIM(reg_to_field));
+			info[idx] = (struct field_modify_info){4, 0,
+						reg_to_field[reg]};
+			if (mask)
+				mask[idx] = flow_modify_info_mask_32_masked
+					(width, data->offset, color_mask);
+			else
+				info[idx].offset = data->offset;
+		}
+		break;
 	case RTE_FLOW_FIELD_POINTER:
 	case RTE_FLOW_FIELD_VALUE:
 	default:
@@ -1913,7 +1939,9 @@ flow_dv_convert_action_modify_field
 		item.spec = conf->src.field == RTE_FLOW_FIELD_POINTER ?
 					(void *)(uintptr_t)conf->src.pvalue :
 					(void *)(uintptr_t)&conf->src.value;
-		if (conf->dst.field == RTE_FLOW_FIELD_META) {
+		if (conf->dst.field == RTE_FLOW_FIELD_META ||
+		    conf->dst.field == RTE_FLOW_FIELD_TAG ||
+		    conf->dst.field == RTE_FLOW_FIELD_METER_COLOR) {
 			meta = *(const unaligned_uint32_t *)item.spec;
 			meta = rte_cpu_to_be_32(meta);
 			item.spec = &meta;
@@ -3687,6 +3715,69 @@ flow_dv_validate_action_aso_ct(struct rte_eth_dev *dev,
 	return 0;
 }
 
+/**
+ * Validate METER_COLOR item.
+ *
+ * @param[in] dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param[in] item
+ *   Item specification.
+ * @param[in] attr
+ *   Attributes of flow that includes this item.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_dv_validate_item_meter_color(struct rte_eth_dev *dev,
+			   const struct rte_flow_item *item,
+			   const struct rte_flow_attr *attr __rte_unused,
+			   struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_item_meter_color *spec = item->spec;
+	const struct rte_flow_item_meter_color *mask = item->mask;
+	struct rte_flow_item_meter_color nic_mask = {
+		.color = RTE_COLORS
+	};
+	int ret;
+
+	if (priv->mtr_color_reg == REG_NON)
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_ITEM, item,
+					  "meter color register"
+					  " isn't available");
+	ret = mlx5_flow_get_reg_id(dev, MLX5_MTR_COLOR, 0, error);
+	if (ret < 0)
+		return ret;
+	if (!spec)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ITEM_SPEC,
+					  item->spec,
+					  "data cannot be empty");
+	if (spec->color > RTE_COLORS)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION_CONF,
+					  &spec->color,
+					  "meter color is invalid");
+	if (!mask)
+		mask = &rte_flow_item_meter_color_mask;
+	if (!mask->color)
+		return rte_flow_error_set(error, EINVAL,
+					RTE_FLOW_ERROR_TYPE_ITEM_SPEC, NULL,
+					"mask cannot be zero");
+
+	ret = mlx5_flow_item_acceptable(item, (const uint8_t *)mask,
+				(const uint8_t *)&nic_mask,
+				sizeof(struct rte_flow_item_meter_color),
+				MLX5_ITEM_RANGE_NOT_ACCEPTED, error);
+	if (ret < 0)
+		return ret;
+	return 0;
+}
+
 int
 flow_dv_encap_decap_match_cb(void *tool_ctx __rte_unused,
 			     struct mlx5_list_entry *entry, void *cb_ctx)
@@ -6519,7 +6610,7 @@ flow_dv_mtr_container_resize(struct rte_eth_dev *dev)
 		return -ENOMEM;
 	}
 	if (!pools_mng->n)
-		if (mlx5_aso_queue_init(priv->sh, ASO_OPC_MOD_POLICER)) {
+		if (mlx5_aso_queue_init(priv->sh, ASO_OPC_MOD_POLICER, 1)) {
 			mlx5_free(pools);
 			return -ENOMEM;
 		}
@@ -7421,6 +7512,13 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			if (ret < 0)
 				return ret;
 			break;
+		case RTE_FLOW_ITEM_TYPE_METER_COLOR:
+			ret = flow_dv_validate_item_meter_color(dev, items,
+								attr, error);
+			if (ret < 0)
+				return ret;
+			last_item = MLX5_FLOW_ITEM_METER_COLOR;
+			break;
 		default:
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ITEM,
@@ -10508,6 +10606,45 @@ flow_dv_translate_item_flex(struct rte_eth_dev *dev, void *matcher, void *key,
 	mlx5_flex_flow_translate_item(dev, matcher, key, item, is_inner);
 }
 
+/**
+ * Add METER_COLOR item to matcher
+ *
+ * @param[in] dev
+ *   The device to configure through.
+ * @param[in, out] key
+ *   Flow matcher value.
+ * @param[in] item
+ *   Flow pattern to translate.
+ * @param[in] key_type
+ *   Set flow matcher mask or value.
+ */
+static void
+flow_dv_translate_item_meter_color(struct rte_eth_dev *dev, void *key,
+			    const struct rte_flow_item *item,
+			    uint32_t key_type)
+{
+	const struct rte_flow_item_meter_color *color_m = item->mask;
+	const struct rte_flow_item_meter_color *color_v = item->spec;
+	uint32_t value, mask;
+	int reg = REG_NON;
+
+	MLX5_ASSERT(color_v);
+	if (MLX5_ITEM_VALID(item, key_type))
+		return;
+	MLX5_ITEM_UPDATE(item, key_type, color_v, color_m,
+		&rte_flow_item_meter_color_mask);
+	value = rte_col_2_mlx5_col(color_v->color);
+	mask = color_m ?
+		color_m->color : (UINT32_C(1) << MLX5_MTR_COLOR_BITS) - 1;
+	if (!!(key_type & MLX5_SET_MATCHER_SW))
+		reg = mlx5_flow_get_reg_id(dev, MLX5_MTR_COLOR, 0, NULL);
+	else
+		reg = flow_hw_get_reg_id(RTE_FLOW_ITEM_TYPE_METER_COLOR, 0);
+	if (reg == REG_NON)
+		return;
+	flow_dv_match_meta_reg(key, (enum modify_reg)reg, value, mask);
+}
+
 static uint32_t matcher_zero[MLX5_ST_SZ_DW(fte_match_param)] = { 0 };
 
 #define HEADER_IS_ZERO(match_criteria, headers)				     \
@@ -13260,6 +13397,10 @@ flow_dv_translate_items(struct rte_eth_dev *dev,
 		/* No other protocol should follow eCPRI layer. */
 		last_item = MLX5_FLOW_LAYER_ECPRI;
 		break;
+	case RTE_FLOW_ITEM_TYPE_METER_COLOR:
+		flow_dv_translate_item_meter_color(dev, key, items, key_type);
+		last_item = MLX5_FLOW_ITEM_METER_COLOR;
+		break;
 	default:
 		break;
 	}
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 71a134f224..d498d203d5 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -412,6 +412,10 @@ __flow_hw_action_template_destroy(struct rte_eth_dev *dev,
 		mlx5_hws_cnt_shared_put(priv->hws_cpool, &acts->cnt_id);
 		acts->cnt_id = 0;
 	}
+	if (acts->mtr_id) {
+		mlx5_ipool_free(priv->hws_mpool->idx_pool, acts->mtr_id);
+		acts->mtr_id = 0;
+	}
 }
 
 /**
@@ -628,6 +632,42 @@ __flow_hw_act_data_shared_cnt_append(struct mlx5_priv *priv,
 	return 0;
 }
 
+/**
+ * Append shared meter_mark action to the dynamic action list.
+ *
+ * @param[in] priv
+ *   Pointer to the port private data structure.
+ * @param[in] acts
+ *   Pointer to the template HW steering DR actions.
+ * @param[in] type
+ *   Action type.
+ * @param[in] action_src
+ *   Offset of source rte flow action.
+ * @param[in] action_dst
+ *   Offset of destination DR action.
+ * @param[in] mtr_id
+ *   Shared meter id.
+ *
+ * @return
+ *    0 on success, negative value otherwise and rte_errno is set.
+ */
+static __rte_always_inline int
+__flow_hw_act_data_shared_mtr_append(struct mlx5_priv *priv,
+				     struct mlx5_hw_actions *acts,
+				     enum rte_flow_action_type type,
+				     uint16_t action_src,
+				     uint16_t action_dst,
+				     cnt_id_t mtr_id)
+{	struct mlx5_action_construct_data *act_data;
+
+	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
+	if (!act_data)
+		return -1;
+	act_data->type = type;
+	act_data->shared_meter.id = mtr_id;
+	LIST_INSERT_HEAD(&acts->act_list, act_data, next);
+	return 0;
+}
 
 /**
  * Translate shared indirect action.
@@ -682,6 +722,13 @@ flow_hw_shared_action_translate(struct rte_eth_dev *dev,
 				       idx, &acts->rule_acts[action_dst]))
 			return -1;
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
+		if (__flow_hw_act_data_shared_mtr_append(priv, acts,
+			(enum rte_flow_action_type)
+			MLX5_RTE_FLOW_ACTION_TYPE_METER_MARK,
+			action_src, action_dst, idx))
+			return -1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type:%d", type);
 		break;
@@ -888,6 +935,7 @@ flow_hw_modify_field_compile(struct rte_eth_dev *dev,
 				(void *)(uintptr_t)&conf->src.value;
 		if (conf->dst.field == RTE_FLOW_FIELD_META ||
 		    conf->dst.field == RTE_FLOW_FIELD_TAG ||
+		    conf->dst.field == RTE_FLOW_FIELD_METER_COLOR ||
 		    conf->dst.field == (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG) {
 			value = *(const unaligned_uint32_t *)item.spec;
 			value = rte_cpu_to_be_32(value);
@@ -1047,7 +1095,7 @@ flow_hw_meter_compile(struct rte_eth_dev *dev,
 	acts->rule_acts[jump_pos].action = (!!group) ?
 				    acts->jump->hws_action :
 				    acts->jump->root_action;
-	if (mlx5_aso_mtr_wait(priv->sh, aso_mtr))
+	if (mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr))
 		return -ENOMEM;
 	return 0;
 }
@@ -1121,6 +1169,74 @@ static rte_be32_t vlan_hdr_to_be32(const struct rte_flow_action *actions)
 #endif
 }
 
+static __rte_always_inline struct mlx5_aso_mtr *
+flow_hw_meter_mark_alloc(struct rte_eth_dev *dev,
+			   const struct rte_flow_action *action,
+			   uint32_t queue)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
+	const struct rte_flow_action_meter_mark *meter_mark = action->conf;
+	struct mlx5_aso_mtr *aso_mtr;
+	struct mlx5_flow_meter_info *fm;
+	uint32_t mtr_id;
+
+	aso_mtr = mlx5_ipool_malloc(priv->hws_mpool->idx_pool, &mtr_id);
+	if (!aso_mtr)
+		return NULL;
+	/* Fill the flow meter parameters. */
+	aso_mtr->type = ASO_METER_INDIRECT;
+	fm = &aso_mtr->fm;
+	fm->meter_id = mtr_id;
+	fm->profile = (struct mlx5_flow_meter_profile *)(meter_mark->profile);
+	fm->is_enable = meter_mark->state;
+	fm->color_aware = meter_mark->color_mode;
+	aso_mtr->pool = pool;
+	aso_mtr->state = ASO_METER_WAIT;
+	aso_mtr->offset = mtr_id - 1;
+	aso_mtr->init_color = (meter_mark->color_mode) ?
+		meter_mark->init_color : RTE_COLOR_GREEN;
+	/* Update ASO flow meter by wqe. */
+	if (mlx5_aso_meter_update_by_wqe(priv->sh, queue, aso_mtr,
+					 &priv->mtr_bulk)) {
+		mlx5_ipool_free(pool->idx_pool, mtr_id);
+		return NULL;
+	}
+	/* Wait for ASO object completion. */
+	if (queue == MLX5_HW_INV_QUEUE &&
+	    mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr)) {
+		mlx5_ipool_free(pool->idx_pool, mtr_id);
+		return NULL;
+	}
+	return aso_mtr;
+}
+
+static __rte_always_inline int
+flow_hw_meter_mark_compile(struct rte_eth_dev *dev,
+			   uint16_t aso_mtr_pos,
+			   const struct rte_flow_action *action,
+			   struct mlx5dr_rule_action *acts,
+			   uint32_t *index,
+			   uint32_t queue)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
+	struct mlx5_aso_mtr *aso_mtr;
+
+	aso_mtr = flow_hw_meter_mark_alloc(dev, action, queue);
+	if (!aso_mtr)
+		return -1;
+
+	/* Compile METER_MARK action */
+	acts[aso_mtr_pos].action = pool->action;
+	acts[aso_mtr_pos].aso_meter.offset = aso_mtr->offset;
+	acts[aso_mtr_pos].aso_meter.init_color =
+		(enum mlx5dr_action_aso_meter_color)
+		rte_col_2_mlx5_col(aso_mtr->init_color);
+	*index = aso_mtr->fm.meter_id;
+	return 0;
+}
+
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -1428,6 +1544,24 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
 				goto err;
 			}
 			break;
+		case RTE_FLOW_ACTION_TYPE_METER_MARK:
+			action_pos = at->actions_off[actions - at->actions];
+			if (actions->conf && masks->conf &&
+			    ((const struct rte_flow_action_meter_mark *)
+			     masks->conf)->profile) {
+				err = flow_hw_meter_mark_compile(dev,
+							action_pos, actions,
+							acts->rule_acts,
+							&acts->mtr_id,
+							MLX5_HW_INV_QUEUE);
+				if (err)
+					goto err;
+			} else if (__flow_hw_act_data_general_append(priv, acts,
+							actions->type,
+							actions - action_start,
+							action_pos))
+				goto err;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -1624,8 +1758,10 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev, uint32_t queue,
 				struct mlx5dr_rule_action *rule_act)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
 	struct mlx5_action_construct_data act_data;
 	struct mlx5_shared_action_rss *shared_rss;
+	struct mlx5_aso_mtr *aso_mtr;
 	uint32_t act_idx = (uint32_t)(uintptr_t)action->conf;
 	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
 	uint32_t idx = act_idx &
@@ -1661,6 +1797,17 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev, uint32_t queue,
 		if (flow_hw_ct_compile(dev, queue, idx, rule_act))
 			return -1;
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
+		/* Find ASO object. */
+		aso_mtr = mlx5_ipool_get(pool->idx_pool, idx);
+		if (!aso_mtr)
+			return -1;
+		rule_act->action = pool->action;
+		rule_act->aso_meter.offset = aso_mtr->offset;
+		rule_act->aso_meter.init_color =
+			(enum mlx5dr_action_aso_meter_color)
+			rte_col_2_mlx5_col(aso_mtr->init_color);
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type:%d", type);
 		break;
@@ -1730,6 +1877,7 @@ flow_hw_modify_field_construct(struct mlx5_hw_q_job *job,
 		rte_memcpy(values, mhdr_action->src.pvalue, sizeof(values));
 	if (mhdr_action->dst.field == RTE_FLOW_FIELD_META ||
 	    mhdr_action->dst.field == RTE_FLOW_FIELD_TAG ||
+	    mhdr_action->dst.field == RTE_FLOW_FIELD_METER_COLOR ||
 	    mhdr_action->dst.field == (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG) {
 		value_p = (unaligned_uint32_t *)values;
 		*value_p = rte_cpu_to_be_32(*value_p);
@@ -1807,6 +1955,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			  uint32_t queue)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
 	struct rte_flow_template_table *table = job->flow->table;
 	struct mlx5_action_construct_data *act_data;
 	const struct rte_flow_actions_template *at = hw_at->action_template;
@@ -1823,8 +1972,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	uint32_t ft_flag;
 	size_t encap_len = 0;
 	int ret;
-	struct mlx5_aso_mtr *mtr;
-	uint32_t mtr_id;
+	struct mlx5_aso_mtr *aso_mtr;
 
 	rte_memcpy(rule_acts, hw_acts->rule_acts, sizeof(*rule_acts) * at->dr_actions_num);
 	attr.group = table->grp->group_id;
@@ -1858,6 +2006,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		struct mlx5_hrxq *hrxq;
 		uint32_t ct_idx;
 		cnt_id_t cnt_id;
+		uint32_t mtr_id;
 
 		action = &actions[act_data->action_src];
 		/*
@@ -1964,13 +2113,13 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_METER:
 			meter = action->conf;
 			mtr_id = meter->mtr_id;
-			mtr = mlx5_aso_meter_by_idx(priv, mtr_id);
+			aso_mtr = mlx5_aso_meter_by_idx(priv, mtr_id);
 			rule_acts[act_data->action_dst].action =
 				priv->mtr_bulk.action;
 			rule_acts[act_data->action_dst].aso_meter.offset =
-								mtr->offset;
+								aso_mtr->offset;
 			jump = flow_hw_jump_action_register
-				(dev, &table->cfg, mtr->fm.group, NULL);
+				(dev, &table->cfg, aso_mtr->fm.group, NULL);
 			if (!jump)
 				return -1;
 			MLX5_ASSERT
@@ -1980,7 +2129,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 							 jump->root_action;
 			job->flow->jump = jump;
 			job->flow->fate_type = MLX5_FLOW_FATE_JUMP;
-			if (mlx5_aso_mtr_wait(priv->sh, mtr))
+			if (mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr))
 				return -1;
 			break;
 		case RTE_FLOW_ACTION_TYPE_COUNT:
@@ -2016,6 +2165,28 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 					       &rule_acts[act_data->action_dst]))
 				return -1;
 			break;
+		case MLX5_RTE_FLOW_ACTION_TYPE_METER_MARK:
+			mtr_id = act_data->shared_meter.id &
+				((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
+			/* Find ASO object. */
+			aso_mtr = mlx5_ipool_get(pool->idx_pool, mtr_id);
+			if (!aso_mtr)
+				return -1;
+			rule_acts[act_data->action_dst].action =
+							pool->action;
+			rule_acts[act_data->action_dst].aso_meter.offset =
+							aso_mtr->offset;
+			rule_acts[act_data->action_dst].aso_meter.init_color =
+				(enum mlx5dr_action_aso_meter_color)
+				rte_col_2_mlx5_col(aso_mtr->init_color);
+			break;
+		case RTE_FLOW_ACTION_TYPE_METER_MARK:
+			ret = flow_hw_meter_mark_compile(dev,
+				act_data->action_dst, action,
+				rule_acts, &job->flow->mtr_id, queue);
+			if (ret != 0)
+				return ret;
+			break;
 		default:
 			break;
 		}
@@ -2283,6 +2454,7 @@ flow_hw_pull(struct rte_eth_dev *dev,
 	     struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
 	struct mlx5_hw_q_job *job;
 	int ret, i;
 
@@ -2307,6 +2479,10 @@ flow_hw_pull(struct rte_eth_dev *dev,
 						&job->flow->cnt_id);
 				job->flow->cnt_id = 0;
 			}
+			if (job->flow->mtr_id) {
+				mlx5_ipool_free(pool->idx_pool, job->flow->mtr_id);
+				job->flow->mtr_id = 0;
+			}
 			mlx5_ipool_free(job->flow->table->flow, job->flow->idx);
 		}
 		priv->hw_q[queue].job[priv->hw_q[queue].job_idx++] = job;
@@ -3189,6 +3365,9 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_METER:
 			/* TODO: Validation logic */
 			break;
+		case RTE_FLOW_ACTION_TYPE_METER_MARK:
+			/* TODO: Validation logic */
+			break;
 		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
 			ret = flow_hw_validate_action_modify_field(action,
 									mask,
@@ -3282,6 +3461,11 @@ flow_hw_dr_actions_template_handle_shared(const struct rte_flow_action *mask,
 		action_types[*curr_off] = MLX5DR_ACTION_TYP_ASO_CT;
 		*curr_off = *curr_off + 1;
 		break;
+	case RTE_FLOW_ACTION_TYPE_METER_MARK:
+		at->actions_off[action_src] = *curr_off;
+		action_types[*curr_off] = MLX5DR_ACTION_TYP_ASO_METER;
+		*curr_off = *curr_off + 1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type: %d", type);
 		return -EINVAL;
@@ -3373,6 +3557,12 @@ flow_hw_dr_actions_template_create(struct rte_flow_actions_template *at)
 				MLX5_HW_VLAN_PUSH_PCP_IDX :
 				MLX5_HW_VLAN_PUSH_VID_IDX;
 			break;
+		case RTE_FLOW_ACTION_TYPE_METER_MARK:
+			at->actions_off[i] = curr_off;
+			action_types[curr_off++] = MLX5DR_ACTION_TYP_ASO_METER;
+			if (curr_off >= MLX5_HW_MAX_ACTS)
+				goto err_actions_num;
+			break;
 		default:
 			type = mlx5_hw_dr_action_types[at->actions[i].type];
 			at->actions_off[i] = curr_off;
@@ -3848,6 +4038,16 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 								  " attribute");
 			}
 			break;
+		case RTE_FLOW_ITEM_TYPE_METER_COLOR:
+		{
+			int reg = flow_hw_get_reg_id(RTE_FLOW_ITEM_TYPE_METER_COLOR, 0);
+			if (reg == REG_NON)
+				return rte_flow_error_set(error, EINVAL,
+							  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+							  NULL,
+							  "Unsupported meter color register");
+			break;
+		}
 		case RTE_FLOW_ITEM_TYPE_VOID:
 		case RTE_FLOW_ITEM_TYPE_ETH:
 		case RTE_FLOW_ITEM_TYPE_VLAN:
@@ -5363,7 +5563,8 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		if (mlx5_flow_meter_init(dev,
 					port_attr->nb_meters,
 					port_attr->nb_meter_profiles,
-					port_attr->nb_meter_policies))
+					port_attr->nb_meter_policies,
+					nb_q_updated))
 			goto err;
 	/* Add global actions. */
 	for (i = 0; i < MLX5_HW_ACTION_FLAG_MAX; i++) {
@@ -5867,7 +6068,9 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 {
 	struct rte_flow_action_handle *handle = NULL;
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr *aso_mtr;
 	cnt_id_t cnt_id;
+	uint32_t mtr_id;
 
 	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
@@ -5886,6 +6089,14 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
 		handle = flow_hw_conntrack_create(dev, queue, action->conf, error);
 		break;
+	case RTE_FLOW_ACTION_TYPE_METER_MARK:
+		aso_mtr = flow_hw_meter_mark_alloc(dev, action, queue);
+		if (!aso_mtr)
+			break;
+		mtr_id = (MLX5_INDIRECT_ACTION_TYPE_METER_MARK <<
+			MLX5_INDIRECT_ACTION_TYPE_OFFSET) | (aso_mtr->fm.meter_id);
+		handle = (struct rte_flow_action_handle *)(uintptr_t)mtr_id;
+		break;
 	default:
 		handle = flow_dv_action_create(dev, conf, action, error);
 	}
@@ -5921,18 +6132,59 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 			     void *user_data,
 			     struct rte_flow_error *error)
 {
-	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
-	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
-
 	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
 	RTE_SET_USED(user_data);
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
+	const struct rte_flow_update_meter_mark *upd_meter_mark =
+		(const struct rte_flow_update_meter_mark *)update;
+	const struct rte_flow_action_meter_mark *meter_mark;
+	struct mlx5_aso_mtr *aso_mtr;
+	struct mlx5_flow_meter_info *fm;
+	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
+	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+	uint32_t idx = act_idx & ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
+
 	switch (type) {
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
 		return flow_hw_conntrack_update(dev, queue, update, act_idx, error);
+	case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
+		meter_mark = &upd_meter_mark->meter_mark;
+		/* Find ASO object. */
+		aso_mtr = mlx5_ipool_get(pool->idx_pool, idx);
+		if (!aso_mtr)
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL, "Invalid meter_mark update index");
+		fm = &aso_mtr->fm;
+		if (upd_meter_mark->profile_valid)
+			fm->profile = (struct mlx5_flow_meter_profile *)
+							(meter_mark->profile);
+		if (upd_meter_mark->color_mode_valid)
+			fm->color_aware = meter_mark->color_mode;
+		if (upd_meter_mark->init_color_valid)
+			aso_mtr->init_color = (meter_mark->color_mode) ?
+				meter_mark->init_color : RTE_COLOR_GREEN;
+		if (upd_meter_mark->state_valid)
+			fm->is_enable = meter_mark->state;
+		/* Update ASO flow meter by wqe. */
+		if (mlx5_aso_meter_update_by_wqe(priv->sh, queue,
+						 aso_mtr, &priv->mtr_bulk))
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL, "Unable to update ASO meter WQE");
+		/* Wait for ASO object completion. */
+		if (queue == MLX5_HW_INV_QUEUE &&
+		    mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr))
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL, "Unable to wait for ASO meter CQE");
+		return 0;
 	default:
-		return flow_dv_action_update(dev, handle, update, error);
+		break;
 	}
+	return flow_dv_action_update(dev, handle, update, error);
 }
 
 /**
@@ -5963,7 +6215,11 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 {
 	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
 	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+	uint32_t idx = act_idx & ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
+	struct mlx5_aso_mtr *aso_mtr;
+	struct mlx5_flow_meter_info *fm;
 
 	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
@@ -5973,6 +6229,28 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 		return mlx5_hws_cnt_shared_put(priv->hws_cpool, &act_idx);
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
 		return flow_hw_conntrack_destroy(dev, act_idx, error);
+	case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
+		aso_mtr = mlx5_ipool_get(pool->idx_pool, idx);
+		if (!aso_mtr)
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL, "Invalid meter_mark destroy index");
+		fm = &aso_mtr->fm;
+		fm->is_enable = 0;
+		/* Update ASO flow meter by wqe. */
+		if (mlx5_aso_meter_update_by_wqe(priv->sh, queue, aso_mtr,
+						 &priv->mtr_bulk))
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL, "Unable to update ASO meter WQE");
+		/* Wait for ASO object completion. */
+		if (queue == MLX5_HW_INV_QUEUE &&
+		    mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr))
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL, "Unable to wait for ASO meter CQE");
+		mlx5_ipool_free(pool->idx_pool, idx);
+		return 0;
 	default:
 		return flow_dv_action_destroy(dev, handle, error);
 	}
@@ -6056,8 +6334,8 @@ flow_hw_action_create(struct rte_eth_dev *dev,
 		       const struct rte_flow_action *action,
 		       struct rte_flow_error *err)
 {
-	return flow_hw_action_handle_create(dev, UINT32_MAX, NULL, conf, action,
-					    NULL, err);
+	return flow_hw_action_handle_create(dev, MLX5_HW_INV_QUEUE,
+					    NULL, conf, action, NULL, err);
 }
 
 /**
@@ -6082,8 +6360,8 @@ flow_hw_action_destroy(struct rte_eth_dev *dev,
 		       struct rte_flow_action_handle *handle,
 		       struct rte_flow_error *error)
 {
-	return flow_hw_action_handle_destroy(dev, UINT32_MAX, NULL, handle,
-			NULL, error);
+	return flow_hw_action_handle_destroy(dev, MLX5_HW_INV_QUEUE,
+			NULL, handle, NULL, error);
 }
 
 /**
@@ -6111,8 +6389,8 @@ flow_hw_action_update(struct rte_eth_dev *dev,
 		      const void *update,
 		      struct rte_flow_error *err)
 {
-	return flow_hw_action_handle_update(dev, UINT32_MAX, NULL, handle,
-			update, NULL, err);
+	return flow_hw_action_handle_update(dev, MLX5_HW_INV_QUEUE,
+			NULL, handle, update, NULL, err);
 }
 
 static int
@@ -6642,6 +6920,12 @@ mlx5_flow_meter_uninit(struct rte_eth_dev *dev)
 		mlx5_free(priv->mtr_profile_arr);
 		priv->mtr_profile_arr = NULL;
 	}
+	if (priv->hws_mpool) {
+		mlx5_aso_mtr_queue_uninit(priv->sh, priv->hws_mpool, NULL);
+		mlx5_ipool_destroy(priv->hws_mpool->idx_pool);
+		mlx5_free(priv->hws_mpool);
+		priv->hws_mpool = NULL;
+	}
 	if (priv->mtr_bulk.aso) {
 		mlx5_free(priv->mtr_bulk.aso);
 		priv->mtr_bulk.aso = NULL;
@@ -6662,7 +6946,8 @@ int
 mlx5_flow_meter_init(struct rte_eth_dev *dev,
 		     uint32_t nb_meters,
 		     uint32_t nb_meter_profiles,
-		     uint32_t nb_meter_policies)
+		     uint32_t nb_meter_policies,
+		     uint32_t nb_queues)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_devx_obj *dcs = NULL;
@@ -6672,29 +6957,35 @@ mlx5_flow_meter_init(struct rte_eth_dev *dev,
 	struct mlx5_aso_mtr *aso;
 	uint32_t i;
 	struct rte_flow_error error;
+	uint32_t flags;
+	uint32_t nb_mtrs = rte_align32pow2(nb_meters);
+	struct mlx5_indexed_pool_config cfg = {
+		.size = sizeof(struct mlx5_aso_mtr),
+		.trunk_size = 1 << 12,
+		.per_core_cache = 1 << 13,
+		.need_lock = 1,
+		.release_mem_en = !!priv->sh->config.reclaim_mode,
+		.malloc = mlx5_malloc,
+		.max_idx = nb_meters,
+		.free = mlx5_free,
+		.type = "mlx5_hw_mtr_mark_action",
+	};
 
 	if (!nb_meters || !nb_meter_profiles || !nb_meter_policies) {
 		ret = ENOTSUP;
 		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter configuration is invalid.");
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter configuration is invalid.");
 		goto err;
 	}
 	if (!priv->mtr_en || !priv->sh->meter_aso_en) {
 		ret = ENOTSUP;
 		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter ASO is not supported.");
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter ASO is not supported.");
 		goto err;
 	}
 	priv->mtr_config.nb_meters = nb_meters;
-	if (mlx5_aso_queue_init(priv->sh, ASO_OPC_MOD_POLICER)) {
-		ret = ENOMEM;
-		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter ASO queue allocation failed.");
-		goto err;
-	}
 	log_obj_size = rte_log2_u32(nb_meters >> 1);
 	dcs = mlx5_devx_cmd_create_flow_meter_aso_obj
 		(priv->sh->cdev->ctx, priv->sh->cdev->pdn,
@@ -6702,8 +6993,8 @@ mlx5_flow_meter_init(struct rte_eth_dev *dev,
 	if (!dcs) {
 		ret = ENOMEM;
 		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter ASO object allocation failed.");
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter ASO object allocation failed.");
 		goto err;
 	}
 	priv->mtr_bulk.devx_obj = dcs;
@@ -6711,31 +7002,33 @@ mlx5_flow_meter_init(struct rte_eth_dev *dev,
 	if (reg_id < 0) {
 		ret = ENOTSUP;
 		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter register is not available.");
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter register is not available.");
 		goto err;
 	}
+	flags = MLX5DR_ACTION_FLAG_HWS_RX | MLX5DR_ACTION_FLAG_HWS_TX;
+	if (priv->sh->config.dv_esw_en && priv->master)
+		flags |= MLX5DR_ACTION_FLAG_HWS_FDB;
 	priv->mtr_bulk.action = mlx5dr_action_create_aso_meter
 			(priv->dr_ctx, (struct mlx5dr_devx_obj *)dcs,
-				reg_id - REG_C_0, MLX5DR_ACTION_FLAG_HWS_RX |
-				MLX5DR_ACTION_FLAG_HWS_TX |
-				MLX5DR_ACTION_FLAG_HWS_FDB);
+				reg_id - REG_C_0, flags);
 	if (!priv->mtr_bulk.action) {
 		ret = ENOMEM;
 		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter action creation failed.");
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter action creation failed.");
 		goto err;
 	}
 	priv->mtr_bulk.aso = mlx5_malloc(MLX5_MEM_ZERO,
-						sizeof(struct mlx5_aso_mtr) * nb_meters,
-						RTE_CACHE_LINE_SIZE,
-						SOCKET_ID_ANY);
+					 sizeof(struct mlx5_aso_mtr) *
+					 nb_meters,
+					 RTE_CACHE_LINE_SIZE,
+					 SOCKET_ID_ANY);
 	if (!priv->mtr_bulk.aso) {
 		ret = ENOMEM;
 		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter bulk ASO allocation failed.");
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter bulk ASO allocation failed.");
 		goto err;
 	}
 	priv->mtr_bulk.size = nb_meters;
@@ -6746,32 +7039,65 @@ mlx5_flow_meter_init(struct rte_eth_dev *dev,
 		aso->offset = i;
 		aso++;
 	}
+	priv->hws_mpool = mlx5_malloc(MLX5_MEM_ZERO,
+				sizeof(struct mlx5_aso_mtr_pool),
+				RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
+	if (!priv->hws_mpool) {
+		ret = ENOMEM;
+		rte_flow_error_set(&error, ENOMEM,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter ipool allocation failed.");
+		goto err;
+	}
+	priv->hws_mpool->devx_obj = priv->mtr_bulk.devx_obj;
+	priv->hws_mpool->action = priv->mtr_bulk.action;
+	priv->hws_mpool->nb_sq = nb_queues;
+	if (mlx5_aso_mtr_queue_init(priv->sh, priv->hws_mpool,
+				    NULL, nb_queues)) {
+		ret = ENOMEM;
+		rte_flow_error_set(&error, ENOMEM,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter ASO queue allocation failed.");
+		goto err;
+	}
+	/*
+	 * No need for a local cache if the meter number is small, since
+	 * the flow insertion rate will be very limited in that case.
+	 * Keep the number below the default trunk size of 4K.
+	 */
+	if (nb_mtrs <= cfg.trunk_size) {
+		cfg.per_core_cache = 0;
+		cfg.trunk_size = nb_mtrs;
+	} else if (nb_mtrs <= MLX5_HW_IPOOL_SIZE_THRESHOLD) {
+		cfg.per_core_cache = MLX5_HW_IPOOL_CACHE_MIN;
+	}
+	priv->hws_mpool->idx_pool = mlx5_ipool_create(&cfg);
 	priv->mtr_config.nb_meter_profiles = nb_meter_profiles;
 	priv->mtr_profile_arr =
 		mlx5_malloc(MLX5_MEM_ZERO,
-				sizeof(struct mlx5_flow_meter_profile) *
-				nb_meter_profiles,
-				RTE_CACHE_LINE_SIZE,
-				SOCKET_ID_ANY);
+			    sizeof(struct mlx5_flow_meter_profile) *
+			    nb_meter_profiles,
+			    RTE_CACHE_LINE_SIZE,
+			    SOCKET_ID_ANY);
 	if (!priv->mtr_profile_arr) {
 		ret = ENOMEM;
 		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter profile allocation failed.");
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter profile allocation failed.");
 		goto err;
 	}
 	priv->mtr_config.nb_meter_policies = nb_meter_policies;
 	priv->mtr_policy_arr =
 		mlx5_malloc(MLX5_MEM_ZERO,
-				sizeof(struct mlx5_flow_meter_policy) *
-				nb_meter_policies,
-				RTE_CACHE_LINE_SIZE,
-				SOCKET_ID_ANY);
+			    sizeof(struct mlx5_flow_meter_policy) *
+			    nb_meter_policies,
+			    RTE_CACHE_LINE_SIZE,
+			    SOCKET_ID_ANY);
 	if (!priv->mtr_policy_arr) {
 		ret = ENOMEM;
 		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter policy allocation failed.");
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter policy allocation failed.");
 		goto err;
 	}
 	return 0;
diff --git a/drivers/net/mlx5/mlx5_flow_meter.c b/drivers/net/mlx5/mlx5_flow_meter.c
index 792b945c98..fd1337ae73 100644
--- a/drivers/net/mlx5/mlx5_flow_meter.c
+++ b/drivers/net/mlx5/mlx5_flow_meter.c
@@ -588,6 +588,36 @@ mlx5_flow_meter_profile_delete(struct rte_eth_dev *dev,
 	return 0;
 }
 
+/**
+ * Callback to get MTR profile.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] meter_profile_id
+ *   Meter profile id.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   A valid handle in case of success, NULL otherwise.
+ */
+static struct rte_flow_meter_profile *
+mlx5_flow_meter_profile_get(struct rte_eth_dev *dev,
+			  uint32_t meter_profile_id,
+			  struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (!priv->mtr_en) {
+		rte_mtr_error_set(error, ENOTSUP,
+				  RTE_MTR_ERROR_TYPE_UNSPECIFIED, NULL,
+				  "Meter is not supported");
+		return NULL;
+	}
+	return (void *)(uintptr_t)mlx5_flow_meter_profile_find(priv,
+							meter_profile_id);
+}
+
 /**
  * Callback to add MTR profile with HWS.
  *
@@ -1150,6 +1180,37 @@ mlx5_flow_meter_policy_delete(struct rte_eth_dev *dev,
 	return 0;
 }
 
+/**
+ * Callback to get MTR policy.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] policy_id
+ *   Meter policy id.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   A valid handle in case of success, NULL otherwise.
+ */
+static struct rte_flow_meter_policy *
+mlx5_flow_meter_policy_get(struct rte_eth_dev *dev,
+			  uint32_t policy_id,
+			  struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint32_t policy_idx;
+
+	if (!priv->mtr_en) {
+		rte_mtr_error_set(error, ENOTSUP,
+				  RTE_MTR_ERROR_TYPE_UNSPECIFIED, NULL,
+				  "Meter is not supported");
+		return NULL;
+	}
+	return (void *)(uintptr_t)mlx5_flow_meter_policy_find(dev, policy_id,
+							      &policy_idx);
+}
+
 /**
  * Callback to delete MTR policy for HWS.
  *
@@ -1565,11 +1626,11 @@ mlx5_flow_meter_action_modify(struct mlx5_priv *priv,
 	if (priv->sh->meter_aso_en) {
 		fm->is_enable = !!is_enable;
 		aso_mtr = container_of(fm, struct mlx5_aso_mtr, fm);
-		ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr,
-						   &priv->mtr_bulk);
+		ret = mlx5_aso_meter_update_by_wqe(priv->sh, MLX5_HW_INV_QUEUE,
+						   aso_mtr, &priv->mtr_bulk);
 		if (ret)
 			return ret;
-		ret = mlx5_aso_mtr_wait(priv->sh, aso_mtr);
+		ret = mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr);
 		if (ret)
 			return ret;
 	} else {
@@ -1815,8 +1876,8 @@ mlx5_flow_meter_create(struct rte_eth_dev *dev, uint32_t meter_id,
 	/* If ASO meter supported, update ASO flow meter by wqe. */
 	if (priv->sh->meter_aso_en) {
 		aso_mtr = container_of(fm, struct mlx5_aso_mtr, fm);
-		ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr,
-						   &priv->mtr_bulk);
+		ret = mlx5_aso_meter_update_by_wqe(priv->sh, MLX5_HW_INV_QUEUE,
+						   aso_mtr, &priv->mtr_bulk);
 		if (ret)
 			goto error;
 		if (!priv->mtr_idx_tbl) {
@@ -1921,7 +1982,7 @@ mlx5_flow_meter_hws_create(struct rte_eth_dev *dev, uint32_t meter_id,
 	fm->shared = !!shared;
 	fm->initialized = 1;
 	/* Update ASO flow meter by wqe. */
-	ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr,
+	ret = mlx5_aso_meter_update_by_wqe(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr,
 					   &priv->mtr_bulk);
 	if (ret)
 		return -rte_mtr_error_set(error, ENOTSUP,
@@ -2401,9 +2462,11 @@ static const struct rte_mtr_ops mlx5_flow_mtr_ops = {
 	.capabilities_get = mlx5_flow_mtr_cap_get,
 	.meter_profile_add = mlx5_flow_meter_profile_add,
 	.meter_profile_delete = mlx5_flow_meter_profile_delete,
+	.meter_profile_get = mlx5_flow_meter_profile_get,
 	.meter_policy_validate = mlx5_flow_meter_policy_validate,
 	.meter_policy_add = mlx5_flow_meter_policy_add,
 	.meter_policy_delete = mlx5_flow_meter_policy_delete,
+	.meter_policy_get = mlx5_flow_meter_policy_get,
 	.create = mlx5_flow_meter_create,
 	.destroy = mlx5_flow_meter_destroy,
 	.meter_enable = mlx5_flow_meter_enable,
@@ -2418,9 +2481,11 @@ static const struct rte_mtr_ops mlx5_flow_mtr_hws_ops = {
 	.capabilities_get = mlx5_flow_mtr_cap_get,
 	.meter_profile_add = mlx5_flow_meter_profile_hws_add,
 	.meter_profile_delete = mlx5_flow_meter_profile_hws_delete,
+	.meter_profile_get = mlx5_flow_meter_profile_get,
 	.meter_policy_validate = mlx5_flow_meter_policy_hws_validate,
 	.meter_policy_add = mlx5_flow_meter_policy_hws_add,
 	.meter_policy_delete = mlx5_flow_meter_policy_hws_delete,
+	.meter_policy_get = mlx5_flow_meter_policy_get,
 	.create = mlx5_flow_meter_hws_create,
 	.destroy = mlx5_flow_meter_hws_destroy,
 	.meter_enable = mlx5_flow_meter_enable,
@@ -2566,7 +2631,7 @@ mlx5_flow_meter_attach(struct mlx5_priv *priv,
 		struct mlx5_aso_mtr *aso_mtr;
 
 		aso_mtr = container_of(fm, struct mlx5_aso_mtr, fm);
-		if (mlx5_aso_mtr_wait(priv->sh, aso_mtr)) {
+		if (mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr)) {
 			return rte_flow_error_set(error, ENOENT,
 					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 					NULL,
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v3 13/17] net/mlx5: add HWS AGE action support
  2022-09-30 12:52 ` [PATCH v3 00/17] net/mlx5: HW steering PMD update Suanming Mou
                     ` (11 preceding siblings ...)
  2022-09-30 12:53   ` [PATCH v3 12/17] net/mlx5: implement METER MARK indirect action for HWS Suanming Mou
@ 2022-09-30 12:53   ` Suanming Mou
  2022-09-30 12:53   ` [PATCH v3 14/17] net/mlx5: add async action push and pull support Suanming Mou
                     ` (3 subsequent siblings)
  16 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-30 12:53 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Michael Baum

From: Michael Baum <michaelba@nvidia.com>

Add support for AGE action for HW steering.
This patch includes:

 1. Add new structures to manage aging.
 2. Initialize all of them in the configure function.
 3. Implement a per-second aging check using the CNT background thread.
 4. Enable the AGE action in flow create/destroy operations.
 5. Implement a queue-based function to report aged flow rules (a usage
    sketch follows this list).
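
A minimal, hypothetical usage sketch for item 5 (not part of this patch): it
shows how an application could poll aged-out rules on a given HWS queue via
the public rte_flow API and destroy them asynchronously. It assumes the
application stored a pointer to its own record in rte_flow_action_age.context
at rule creation time; struct app_flow and app_reap_aged_flows() are
illustrative names only.

#include <rte_common.h>
#include <rte_flow.h>

/* Hypothetical application record referenced by the AGE context pointer. */
struct app_flow {
	struct rte_flow *handle; /* Handle from rte_flow_async_create(). */
};

static void
app_reap_aged_flows(uint16_t port_id, uint32_t queue_id)
{
	const struct rte_flow_op_attr op_attr = { .postpone = 1 };
	struct rte_flow_op_result res[64];
	struct rte_flow_error error;
	void *contexts[64];
	int n, i;

	/* Fetch up to 64 aged-out contexts reported on this HWS queue. */
	n = rte_flow_get_q_aged_flows(port_id, queue_id, contexts, 64, &error);
	if (n <= 0)
		return;
	for (i = 0; i < n; i++) {
		struct app_flow *af = contexts[i];

		/* Enqueue the rule destruction; completions are pulled below. */
		rte_flow_async_destroy(port_id, queue_id, &op_attr,
				       af->handle, af, &error);
	}
	/* Flush the postponed operations and drain their completions. */
	rte_flow_push(port_id, queue_id, &error);
	rte_flow_pull(port_id, queue_id, res, RTE_DIM(res), &error);
}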

Signed-off-by: Michael Baum <michaelba@nvidia.com>
---
 drivers/net/mlx5/mlx5.c            |   67 +-
 drivers/net/mlx5/mlx5.h            |   51 +-
 drivers/net/mlx5/mlx5_defs.h       |    3 +
 drivers/net/mlx5/mlx5_flow.c       |   89 ++-
 drivers/net/mlx5/mlx5_flow.h       |   33 +-
 drivers/net/mlx5/mlx5_flow_dv.c    |   30 +-
 drivers/net/mlx5/mlx5_flow_hw.c    | 1104 ++++++++++++++++++++++++----
 drivers/net/mlx5/mlx5_flow_verbs.c |    4 +-
 drivers/net/mlx5/mlx5_hws_cnt.c    |  704 +++++++++++++++++-
 drivers/net/mlx5/mlx5_hws_cnt.h    |  193 ++++-
 10 files changed, 2013 insertions(+), 265 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 383a789dfa..742607509b 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -497,6 +497,12 @@ mlx5_flow_aging_init(struct mlx5_dev_ctx_shared *sh)
 	uint32_t i;
 	struct mlx5_age_info *age_info;
 
+	/*
+	 * In HW steering, the aging information structure is initialized
+	 * later, in the configure function.
+	 */
+	if (sh->config.dv_flow_en == 2)
+		return;
 	for (i = 0; i < sh->max_port; i++) {
 		age_info = &sh->port[i].age_info;
 		age_info->flags = 0;
@@ -540,8 +546,8 @@ mlx5_flow_counter_mode_config(struct rte_eth_dev *dev __rte_unused)
 			hca_attr->flow_counter_bulk_alloc_bitmap);
 	/* Initialize fallback mode only on the port initializes sh. */
 	if (sh->refcnt == 1)
-		sh->cmng.counter_fallback = fallback;
-	else if (fallback != sh->cmng.counter_fallback)
+		sh->sws_cmng.counter_fallback = fallback;
+	else if (fallback != sh->sws_cmng.counter_fallback)
 		DRV_LOG(WARNING, "Port %d in sh has different fallback mode "
 			"with others:%d.", PORT_ID(priv), fallback);
 #endif
@@ -556,17 +562,38 @@ mlx5_flow_counter_mode_config(struct rte_eth_dev *dev __rte_unused)
 static void
 mlx5_flow_counters_mng_init(struct mlx5_dev_ctx_shared *sh)
 {
-	int i;
+	int i, j;
+
+	if (sh->config.dv_flow_en < 2) {
+		memset(&sh->sws_cmng, 0, sizeof(sh->sws_cmng));
+		TAILQ_INIT(&sh->sws_cmng.flow_counters);
+		sh->sws_cmng.min_id = MLX5_CNT_BATCH_OFFSET;
+		sh->sws_cmng.max_id = -1;
+		sh->sws_cmng.last_pool_idx = POOL_IDX_INVALID;
+		rte_spinlock_init(&sh->sws_cmng.pool_update_sl);
+		for (i = 0; i < MLX5_COUNTER_TYPE_MAX; i++) {
+			TAILQ_INIT(&sh->sws_cmng.counters[i]);
+			rte_spinlock_init(&sh->sws_cmng.csl[i]);
+		}
+	} else {
+		struct mlx5_hca_attr *attr = &sh->cdev->config.hca_attr;
+		uint32_t fw_max_nb_cnts = attr->max_flow_counter;
+		uint8_t log_dcs = log2above(fw_max_nb_cnts) - 1;
+		uint32_t max_nb_cnts = 0;
+
+		for (i = 0, j = 0; j < MLX5_HWS_CNT_DCS_NUM; ++i) {
+			int log_dcs_i = log_dcs - i;
 
-	memset(&sh->cmng, 0, sizeof(sh->cmng));
-	TAILQ_INIT(&sh->cmng.flow_counters);
-	sh->cmng.min_id = MLX5_CNT_BATCH_OFFSET;
-	sh->cmng.max_id = -1;
-	sh->cmng.last_pool_idx = POOL_IDX_INVALID;
-	rte_spinlock_init(&sh->cmng.pool_update_sl);
-	for (i = 0; i < MLX5_COUNTER_TYPE_MAX; i++) {
-		TAILQ_INIT(&sh->cmng.counters[i]);
-		rte_spinlock_init(&sh->cmng.csl[i]);
+			if (log_dcs_i < 0)
+				break;
+			if ((max_nb_cnts | RTE_BIT32(log_dcs_i)) >
+			    fw_max_nb_cnts)
+				continue;
+			max_nb_cnts |= RTE_BIT32(log_dcs_i);
+			j++;
+		}
+		sh->hws_max_log_bulk_sz = log_dcs;
+		sh->hws_max_nb_counters = max_nb_cnts;
 	}
 }
 
@@ -607,13 +634,13 @@ mlx5_flow_counters_mng_close(struct mlx5_dev_ctx_shared *sh)
 		rte_pause();
 	}
 
-	if (sh->cmng.pools) {
+	if (sh->sws_cmng.pools) {
 		struct mlx5_flow_counter_pool *pool;
-		uint16_t n_valid = sh->cmng.n_valid;
-		bool fallback = sh->cmng.counter_fallback;
+		uint16_t n_valid = sh->sws_cmng.n_valid;
+		bool fallback = sh->sws_cmng.counter_fallback;
 
 		for (i = 0; i < n_valid; ++i) {
-			pool = sh->cmng.pools[i];
+			pool = sh->sws_cmng.pools[i];
 			if (!fallback && pool->min_dcs)
 				claim_zero(mlx5_devx_cmd_destroy
 							       (pool->min_dcs));
@@ -632,14 +659,14 @@ mlx5_flow_counters_mng_close(struct mlx5_dev_ctx_shared *sh)
 			}
 			mlx5_free(pool);
 		}
-		mlx5_free(sh->cmng.pools);
+		mlx5_free(sh->sws_cmng.pools);
 	}
-	mng = LIST_FIRST(&sh->cmng.mem_mngs);
+	mng = LIST_FIRST(&sh->sws_cmng.mem_mngs);
 	while (mng) {
 		mlx5_flow_destroy_counter_stat_mem_mng(mng);
-		mng = LIST_FIRST(&sh->cmng.mem_mngs);
+		mng = LIST_FIRST(&sh->sws_cmng.mem_mngs);
 	}
-	memset(&sh->cmng, 0, sizeof(sh->cmng));
+	memset(&sh->sws_cmng, 0, sizeof(sh->sws_cmng));
 }
 
 /**
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 89dc8441dc..c83157d0da 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -639,12 +639,45 @@ struct mlx5_geneve_tlv_option_resource {
 /* Current time in seconds. */
 #define MLX5_CURR_TIME_SEC	(rte_rdtsc() / rte_get_tsc_hz())
 
+/*
+ * HW steering queue oriented AGE info.
+ * It contains an array of rings, one for each HWS queue.
+ */
+struct mlx5_hws_q_age_info {
+	uint16_t nb_rings; /* Number of aged-out ring lists. */
+	struct rte_ring *aged_lists[]; /* Aged-out lists. */
+};
+
+/*
+ * HW steering AGE info.
+ * It has a ring list containing all aged out flow rules.
+ */
+struct mlx5_hws_age_info {
+	struct rte_ring *aged_list; /* Aged out lists. */
+};
+
 /* Aging information for per port. */
 struct mlx5_age_info {
 	uint8_t flags; /* Indicate if is new event or need to be triggered. */
-	struct mlx5_counters aged_counters; /* Aged counter list. */
-	struct aso_age_list aged_aso; /* Aged ASO actions list. */
-	rte_spinlock_t aged_sl; /* Aged flow list lock. */
+	union {
+		/* SW/FW steering AGE info. */
+		struct {
+			struct mlx5_counters aged_counters;
+			/* Aged counter list. */
+			struct aso_age_list aged_aso;
+			/* Aged ASO actions list. */
+			rte_spinlock_t aged_sl; /* Aged flow list lock. */
+		};
+		struct {
+			struct mlx5_indexed_pool *ages_ipool;
+			union {
+				struct mlx5_hws_age_info hw_age;
+				/* HW steering AGE info. */
+				struct mlx5_hws_q_age_info *hw_q_age;
+				/* HW steering queue oriented AGE info. */
+			};
+		};
+	};
 };
 
 /* Per port data of shared IB device. */
@@ -1302,6 +1335,9 @@ struct mlx5_dev_ctx_shared {
 	uint32_t hws_tags:1; /* Check if tags info for HWS initialized. */
 	uint32_t shared_mark_enabled:1;
 	/* If mark action is enabled on Rxqs (shared E-Switch domain). */
+	uint32_t hws_max_log_bulk_sz:5;
+	/* Log of minimal HWS counters created hard coded. */
+	uint32_t hws_max_nb_counters; /* Maximal number for HWS counters. */
 	uint32_t max_port; /* Maximal IB device port index. */
 	struct mlx5_bond_info bond; /* Bonding information. */
 	struct mlx5_common_device *cdev; /* Backend mlx5 device. */
@@ -1342,7 +1378,8 @@ struct mlx5_dev_ctx_shared {
 	struct mlx5_list *dest_array_list;
 	struct mlx5_list *flex_parsers_dv; /* Flex Item parsers. */
 	/* List of destination array actions. */
-	struct mlx5_flow_counter_mng cmng; /* Counters management structure. */
+	struct mlx5_flow_counter_mng sws_cmng;
+	/* SW steering counters management structure. */
 	void *default_miss_action; /* Default miss action. */
 	struct mlx5_indexed_pool *ipool[MLX5_IPOOL_MAX];
 	struct mlx5_indexed_pool *mdh_ipools[MLX5_MAX_MODIFY_NUM];
@@ -1670,6 +1707,9 @@ struct mlx5_priv {
 	LIST_HEAD(flow_hw_at, rte_flow_actions_template) flow_hw_at;
 	struct mlx5dr_context *dr_ctx; /**< HW steering DR context. */
 	/* HW steering queue polling mechanism job descriptor LIFO. */
+	uint32_t hws_strict_queue:1;
+	/**< Whether all operations strictly happen on the same HWS queue. */
+	uint32_t hws_age_req:1; /**< Whether this port has AGE indexed pool. */
 	struct mlx5_hw_q *hw_q;
 	/* HW steering rte flow table list header. */
 	LIST_HEAD(flow_hw_tbl, rte_flow_template_table) flow_hw_tbl;
@@ -1985,6 +2025,9 @@ int mlx5_validate_action_ct(struct rte_eth_dev *dev,
 			    const struct rte_flow_action_conntrack *conntrack,
 			    struct rte_flow_error *error);
 
+int mlx5_flow_get_q_aged_flows(struct rte_eth_dev *dev, uint32_t queue_id,
+			       void **contexts, uint32_t nb_contexts,
+			       struct rte_flow_error *error);
 
 /* mlx5_mp_os.c */
 
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index d064abfef3..2af8c731ef 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -43,6 +43,9 @@
 #define MLX5_PMD_SOFT_COUNTERS 1
 #endif
 
+/* Maximum number of DCS created per port. */
+#define MLX5_HWS_CNT_DCS_NUM 4
+
 /* Alarm timeout. */
 #define MLX5_ALARM_TIMEOUT_US 100000
 
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 9627ffc979..4bfa604578 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -987,6 +987,7 @@ static const struct rte_flow_ops mlx5_flow_ops = {
 	.isolate = mlx5_flow_isolate,
 	.query = mlx5_flow_query,
 	.dev_dump = mlx5_flow_dev_dump,
+	.get_q_aged_flows = mlx5_flow_get_q_aged_flows,
 	.get_aged_flows = mlx5_flow_get_aged_flows,
 	.action_handle_create = mlx5_action_handle_create,
 	.action_handle_destroy = mlx5_action_handle_destroy,
@@ -8942,11 +8943,11 @@ mlx5_flow_create_counter_stat_mem_mng(struct mlx5_dev_ctx_shared *sh)
 		mem_mng->raws[i].data = raw_data + i * MLX5_COUNTERS_PER_POOL;
 	}
 	for (i = 0; i < MLX5_MAX_PENDING_QUERIES; ++i)
-		LIST_INSERT_HEAD(&sh->cmng.free_stat_raws,
+		LIST_INSERT_HEAD(&sh->sws_cmng.free_stat_raws,
 				 mem_mng->raws + MLX5_CNT_CONTAINER_RESIZE + i,
 				 next);
-	LIST_INSERT_HEAD(&sh->cmng.mem_mngs, mem_mng, next);
-	sh->cmng.mem_mng = mem_mng;
+	LIST_INSERT_HEAD(&sh->sws_cmng.mem_mngs, mem_mng, next);
+	sh->sws_cmng.mem_mng = mem_mng;
 	return 0;
 }
 
@@ -8965,7 +8966,7 @@ static int
 mlx5_flow_set_counter_stat_mem(struct mlx5_dev_ctx_shared *sh,
 			       struct mlx5_flow_counter_pool *pool)
 {
-	struct mlx5_flow_counter_mng *cmng = &sh->cmng;
+	struct mlx5_flow_counter_mng *cmng = &sh->sws_cmng;
 	/* Resize statistic memory once used out. */
 	if (!(pool->index % MLX5_CNT_CONTAINER_RESIZE) &&
 	    mlx5_flow_create_counter_stat_mem_mng(sh)) {
@@ -8994,14 +8995,14 @@ mlx5_set_query_alarm(struct mlx5_dev_ctx_shared *sh)
 {
 	uint32_t pools_n, us;
 
-	pools_n = __atomic_load_n(&sh->cmng.n_valid, __ATOMIC_RELAXED);
+	pools_n = __atomic_load_n(&sh->sws_cmng.n_valid, __ATOMIC_RELAXED);
 	us = MLX5_POOL_QUERY_FREQ_US / pools_n;
 	DRV_LOG(DEBUG, "Set alarm for %u pools each %u us", pools_n, us);
 	if (rte_eal_alarm_set(us, mlx5_flow_query_alarm, sh)) {
-		sh->cmng.query_thread_on = 0;
+		sh->sws_cmng.query_thread_on = 0;
 		DRV_LOG(ERR, "Cannot reinitialize query alarm");
 	} else {
-		sh->cmng.query_thread_on = 1;
+		sh->sws_cmng.query_thread_on = 1;
 	}
 }
 
@@ -9017,12 +9018,12 @@ mlx5_flow_query_alarm(void *arg)
 {
 	struct mlx5_dev_ctx_shared *sh = arg;
 	int ret;
-	uint16_t pool_index = sh->cmng.pool_index;
-	struct mlx5_flow_counter_mng *cmng = &sh->cmng;
+	uint16_t pool_index = sh->sws_cmng.pool_index;
+	struct mlx5_flow_counter_mng *cmng = &sh->sws_cmng;
 	struct mlx5_flow_counter_pool *pool;
 	uint16_t n_valid;
 
-	if (sh->cmng.pending_queries >= MLX5_MAX_PENDING_QUERIES)
+	if (sh->sws_cmng.pending_queries >= MLX5_MAX_PENDING_QUERIES)
 		goto set_alarm;
 	rte_spinlock_lock(&cmng->pool_update_sl);
 	pool = cmng->pools[pool_index];
@@ -9035,7 +9036,7 @@ mlx5_flow_query_alarm(void *arg)
 		/* There is a pool query in progress. */
 		goto set_alarm;
 	pool->raw_hw =
-		LIST_FIRST(&sh->cmng.free_stat_raws);
+		LIST_FIRST(&sh->sws_cmng.free_stat_raws);
 	if (!pool->raw_hw)
 		/* No free counter statistics raw memory. */
 		goto set_alarm;
@@ -9061,12 +9062,12 @@ mlx5_flow_query_alarm(void *arg)
 		goto set_alarm;
 	}
 	LIST_REMOVE(pool->raw_hw, next);
-	sh->cmng.pending_queries++;
+	sh->sws_cmng.pending_queries++;
 	pool_index++;
 	if (pool_index >= n_valid)
 		pool_index = 0;
 set_alarm:
-	sh->cmng.pool_index = pool_index;
+	sh->sws_cmng.pool_index = pool_index;
 	mlx5_set_query_alarm(sh);
 }
 
@@ -9149,7 +9150,7 @@ mlx5_flow_async_pool_query_handle(struct mlx5_dev_ctx_shared *sh,
 		(struct mlx5_flow_counter_pool *)(uintptr_t)async_id;
 	struct mlx5_counter_stats_raw *raw_to_free;
 	uint8_t query_gen = pool->query_gen ^ 1;
-	struct mlx5_flow_counter_mng *cmng = &sh->cmng;
+	struct mlx5_flow_counter_mng *cmng = &sh->sws_cmng;
 	enum mlx5_counter_type cnt_type =
 		pool->is_aged ? MLX5_COUNTER_TYPE_AGE :
 				MLX5_COUNTER_TYPE_ORIGIN;
@@ -9172,9 +9173,9 @@ mlx5_flow_async_pool_query_handle(struct mlx5_dev_ctx_shared *sh,
 			rte_spinlock_unlock(&cmng->csl[cnt_type]);
 		}
 	}
-	LIST_INSERT_HEAD(&sh->cmng.free_stat_raws, raw_to_free, next);
+	LIST_INSERT_HEAD(&sh->sws_cmng.free_stat_raws, raw_to_free, next);
 	pool->raw_hw = NULL;
-	sh->cmng.pending_queries--;
+	sh->sws_cmng.pending_queries--;
 }
 
 static int
@@ -9534,7 +9535,7 @@ mlx5_flow_dev_dump_sh_all(struct rte_eth_dev *dev,
 	struct mlx5_list_inconst *l_inconst;
 	struct mlx5_list_entry *e;
 	int lcore_index;
-	struct mlx5_flow_counter_mng *cmng = &priv->sh->cmng;
+	struct mlx5_flow_counter_mng *cmng = &priv->sh->sws_cmng;
 	uint32_t max;
 	void *action;
 
@@ -9705,18 +9706,58 @@ mlx5_flow_get_aged_flows(struct rte_eth_dev *dev, void **contexts,
 {
 	const struct mlx5_flow_driver_ops *fops;
 	struct rte_flow_attr attr = { .transfer = 0 };
+	enum mlx5_flow_drv_type type = flow_get_drv_type(dev, &attr);
 
-	if (flow_get_drv_type(dev, &attr) == MLX5_FLOW_TYPE_DV) {
-		fops = flow_get_drv_ops(MLX5_FLOW_TYPE_DV);
-		return fops->get_aged_flows(dev, contexts, nb_contexts,
-						    error);
+	if (type == MLX5_FLOW_TYPE_DV || type == MLX5_FLOW_TYPE_HW) {
+		fops = flow_get_drv_ops(type);
+		return fops->get_aged_flows(dev, contexts, nb_contexts, error);
 	}
-	DRV_LOG(ERR,
-		"port %u get aged flows is not supported.",
-		 dev->data->port_id);
+	DRV_LOG(ERR, "port %u get aged flows is not supported.",
+		dev->data->port_id);
 	return -ENOTSUP;
 }
 
+/**
+ * Get aged-out flows per HWS queue.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] queue_id
+ *   Flow queue to query.
+ * @param[in] contexts
+ *   The address of an array of pointers to the aged-out flow contexts.
+ * @param[in] nb_contexts
+ *   The length of the context array.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. Initialized in case of
+ *   error only.
+ *
+ * @return
+ *   The number of aged-out contexts reported on success, otherwise a
+ *   negative errno value. If nb_contexts is 0, return the total number
+ *   of aged-out contexts. Otherwise, return the number of aged-out
+ *   flows reported in the context array.
+ */
+int
+mlx5_flow_get_q_aged_flows(struct rte_eth_dev *dev, uint32_t queue_id,
+			   void **contexts, uint32_t nb_contexts,
+			   struct rte_flow_error *error)
+{
+	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = { 0 };
+
+	if (flow_get_drv_type(dev, &attr) == MLX5_FLOW_TYPE_HW) {
+		fops = flow_get_drv_ops(MLX5_FLOW_TYPE_HW);
+		return fops->get_q_aged_flows(dev, queue_id, contexts,
+					      nb_contexts, error);
+	}
+	DRV_LOG(ERR, "port %u queue %u get aged flows is not supported.",
+		dev->data->port_id, queue_id);
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				  "get Q aged flows with incorrect steering mode");
+}
+
 /* Wrapper for driver action_validate op callback */
 static int
 flow_drv_action_validate(struct rte_eth_dev *dev,
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index ffa4f28255..30a18ea35e 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -293,6 +293,8 @@ enum mlx5_feature_name {
 #define MLX5_FLOW_ACTION_MODIFY_FIELD (1ull << 39)
 #define MLX5_FLOW_ACTION_METER_WITH_TERMINATED_POLICY (1ull << 40)
 #define MLX5_FLOW_ACTION_CT (1ull << 41)
+#define MLX5_FLOW_ACTION_INDIRECT_COUNT (1ull << 42)
+#define MLX5_FLOW_ACTION_INDIRECT_AGE (1ull << 43)
 
 #define MLX5_FLOW_FATE_ACTIONS \
 	(MLX5_FLOW_ACTION_DROP | MLX5_FLOW_ACTION_QUEUE | \
@@ -1099,6 +1101,22 @@ struct rte_flow {
 	uint32_t geneve_tlv_option; /**< Holds Geneve TLV option id. > */
 } __rte_packed;
 
+/*
+ * HWS COUNTER ID's layout
+ *       3                   2                   1                   0
+ *     1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |  T  |     | D |                                               |
+ *    ~  Y  |     | C |                    IDX                        ~
+ *    |  P  |     | S |                                               |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *
+ *    Bit 31:29 = TYPE = MLX5_INDIRECT_ACTION_TYPE_COUNT = b'10
+ *    Bit 25:24 = DCS index
+ *    Bit 23:00 = IDX in the DCS bulk this counter belongs to.
+ */
+typedef uint32_t cnt_id_t;
+
 #if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
 
 /* HWS flow struct. */
@@ -1112,7 +1130,8 @@ struct rte_flow_hw {
 	};
 	struct rte_flow_template_table *table; /* The table flow allcated from. */
 	struct mlx5dr_rule rule; /* HWS layer data struct. */
-	uint32_t cnt_id;
+	uint32_t age_idx;
+	cnt_id_t cnt_id;
 	uint32_t mtr_id;
 } __rte_packed;
 
@@ -1158,7 +1177,7 @@ struct mlx5_action_construct_data {
 			uint32_t idx; /* Shared action index. */
 		} shared_rss;
 		struct {
-			uint32_t id;
+			cnt_id_t id;
 		} shared_counter;
 		struct {
 			uint32_t id;
@@ -1189,6 +1208,7 @@ struct rte_flow_actions_template {
 	struct rte_flow_action *actions; /* Cached flow actions. */
 	struct rte_flow_action *masks; /* Cached action masks.*/
 	struct mlx5dr_action_template *tmpl; /* mlx5dr action template. */
+	uint64_t action_flags; /* Bit-map of all valid action in template. */
 	uint16_t dr_actions_num; /* Amount of DR rules actions. */
 	uint16_t actions_num; /* Amount of flow actions */
 	uint16_t *actions_off; /* DR action offset for given rte action offset. */
@@ -1245,7 +1265,7 @@ struct mlx5_hw_actions {
 	struct mlx5_hw_encap_decap_action *encap_decap;
 	uint16_t encap_decap_pos; /* Encap/Decap action position. */
 	uint32_t mark:1; /* Indicate the mark action. */
-	uint32_t cnt_id; /* Counter id. */
+	cnt_id_t cnt_id; /* Counter id. */
 	uint32_t mtr_id; /* Meter id. */
 	/* Translated DR action array from action template. */
 	struct mlx5dr_rule_action rule_acts[MLX5_HW_MAX_ACTS];
@@ -1619,6 +1639,12 @@ typedef int (*mlx5_flow_get_aged_flows_t)
 					 void **context,
 					 uint32_t nb_contexts,
 					 struct rte_flow_error *error);
+typedef int (*mlx5_flow_get_q_aged_flows_t)
+					(struct rte_eth_dev *dev,
+					 uint32_t queue_id,
+					 void **context,
+					 uint32_t nb_contexts,
+					 struct rte_flow_error *error);
 typedef int (*mlx5_flow_action_validate_t)
 				(struct rte_eth_dev *dev,
 				 const struct rte_flow_indir_action_conf *conf,
@@ -1825,6 +1851,7 @@ struct mlx5_flow_driver_ops {
 	mlx5_flow_counter_free_t counter_free;
 	mlx5_flow_counter_query_t counter_query;
 	mlx5_flow_get_aged_flows_t get_aged_flows;
+	mlx5_flow_get_q_aged_flows_t get_q_aged_flows;
 	mlx5_flow_action_validate_t action_validate;
 	mlx5_flow_action_create_t action_create;
 	mlx5_flow_action_destroy_t action_destroy;
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index a42eb99154..1146e13cfa 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -5524,7 +5524,7 @@ flow_dv_validate_action_age(uint64_t action_flags,
 	const struct rte_flow_action_age *age = action->conf;
 
 	if (!priv->sh->cdev->config.devx ||
-	    (priv->sh->cmng.counter_fallback && !priv->sh->aso_age_mng))
+	    (priv->sh->sws_cmng.counter_fallback && !priv->sh->aso_age_mng))
 		return rte_flow_error_set(error, ENOTSUP,
 					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 					  NULL,
@@ -6085,7 +6085,7 @@ flow_dv_counter_get_by_idx(struct rte_eth_dev *dev,
 			   struct mlx5_flow_counter_pool **ppool)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_flow_counter_mng *cmng = &priv->sh->cmng;
+	struct mlx5_flow_counter_mng *cmng = &priv->sh->sws_cmng;
 	struct mlx5_flow_counter_pool *pool;
 
 	/* Decrease to original index and clear shared bit. */
@@ -6179,7 +6179,7 @@ static int
 flow_dv_container_resize(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_flow_counter_mng *cmng = &priv->sh->cmng;
+	struct mlx5_flow_counter_mng *cmng = &priv->sh->sws_cmng;
 	void *old_pools = cmng->pools;
 	uint32_t resize = cmng->n + MLX5_CNT_CONTAINER_RESIZE;
 	uint32_t mem_size = sizeof(struct mlx5_flow_counter_pool *) * resize;
@@ -6225,7 +6225,7 @@ _flow_dv_query_count(struct rte_eth_dev *dev, uint32_t counter, uint64_t *pkts,
 
 	cnt = flow_dv_counter_get_by_idx(dev, counter, &pool);
 	MLX5_ASSERT(pool);
-	if (priv->sh->cmng.counter_fallback)
+	if (priv->sh->sws_cmng.counter_fallback)
 		return mlx5_devx_cmd_flow_counter_query(cnt->dcs_when_active, 0,
 					0, pkts, bytes, 0, NULL, NULL, 0);
 	rte_spinlock_lock(&pool->sl);
@@ -6262,8 +6262,8 @@ flow_dv_pool_create(struct rte_eth_dev *dev, struct mlx5_devx_obj *dcs,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_flow_counter_pool *pool;
-	struct mlx5_flow_counter_mng *cmng = &priv->sh->cmng;
-	bool fallback = priv->sh->cmng.counter_fallback;
+	struct mlx5_flow_counter_mng *cmng = &priv->sh->sws_cmng;
+	bool fallback = priv->sh->sws_cmng.counter_fallback;
 	uint32_t size = sizeof(*pool);
 
 	size += MLX5_COUNTERS_PER_POOL * MLX5_CNT_SIZE;
@@ -6324,14 +6324,14 @@ flow_dv_counter_pool_prepare(struct rte_eth_dev *dev,
 			     uint32_t age)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_flow_counter_mng *cmng = &priv->sh->cmng;
+	struct mlx5_flow_counter_mng *cmng = &priv->sh->sws_cmng;
 	struct mlx5_flow_counter_pool *pool;
 	struct mlx5_counters tmp_tq;
 	struct mlx5_devx_obj *dcs = NULL;
 	struct mlx5_flow_counter *cnt;
 	enum mlx5_counter_type cnt_type =
 			age ? MLX5_COUNTER_TYPE_AGE : MLX5_COUNTER_TYPE_ORIGIN;
-	bool fallback = priv->sh->cmng.counter_fallback;
+	bool fallback = priv->sh->sws_cmng.counter_fallback;
 	uint32_t i;
 
 	if (fallback) {
@@ -6395,8 +6395,8 @@ flow_dv_counter_alloc(struct rte_eth_dev *dev, uint32_t age)
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_flow_counter_pool *pool = NULL;
 	struct mlx5_flow_counter *cnt_free = NULL;
-	bool fallback = priv->sh->cmng.counter_fallback;
-	struct mlx5_flow_counter_mng *cmng = &priv->sh->cmng;
+	bool fallback = priv->sh->sws_cmng.counter_fallback;
+	struct mlx5_flow_counter_mng *cmng = &priv->sh->sws_cmng;
 	enum mlx5_counter_type cnt_type =
 			age ? MLX5_COUNTER_TYPE_AGE : MLX5_COUNTER_TYPE_ORIGIN;
 	uint32_t cnt_idx;
@@ -6442,7 +6442,7 @@ flow_dv_counter_alloc(struct rte_eth_dev *dev, uint32_t age)
 	if (_flow_dv_query_count(dev, cnt_idx, &cnt_free->hits,
 				 &cnt_free->bytes))
 		goto err;
-	if (!fallback && !priv->sh->cmng.query_thread_on)
+	if (!fallback && !priv->sh->sws_cmng.query_thread_on)
 		/* Start the asynchronous batch query by the host thread. */
 		mlx5_set_query_alarm(priv->sh);
 	/*
@@ -6570,7 +6570,7 @@ flow_dv_counter_free(struct rte_eth_dev *dev, uint32_t counter)
 	 * this case, lock will not be needed as query callback and release
 	 * function both operate with the different list.
 	 */
-	if (!priv->sh->cmng.counter_fallback) {
+	if (!priv->sh->sws_cmng.counter_fallback) {
 		rte_spinlock_lock(&pool->csl);
 		TAILQ_INSERT_TAIL(&pool->counters[pool->query_gen], cnt, next);
 		rte_spinlock_unlock(&pool->csl);
@@ -6578,10 +6578,10 @@ flow_dv_counter_free(struct rte_eth_dev *dev, uint32_t counter)
 		cnt->dcs_when_free = cnt->dcs_when_active;
 		cnt_type = pool->is_aged ? MLX5_COUNTER_TYPE_AGE :
 					   MLX5_COUNTER_TYPE_ORIGIN;
-		rte_spinlock_lock(&priv->sh->cmng.csl[cnt_type]);
-		TAILQ_INSERT_TAIL(&priv->sh->cmng.counters[cnt_type],
+		rte_spinlock_lock(&priv->sh->sws_cmng.csl[cnt_type]);
+		TAILQ_INSERT_TAIL(&priv->sh->sws_cmng.counters[cnt_type],
 				  cnt, next);
-		rte_spinlock_unlock(&priv->sh->cmng.csl[cnt_type]);
+		rte_spinlock_unlock(&priv->sh->sws_cmng.csl[cnt_type]);
 	}
 }
 
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index d498d203d5..161b96cd87 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -477,7 +477,8 @@ __flow_hw_act_data_general_append(struct mlx5_priv *priv,
 				  enum rte_flow_action_type type,
 				  uint16_t action_src,
 				  uint16_t action_dst)
-{	struct mlx5_action_construct_data *act_data;
+{
+	struct mlx5_action_construct_data *act_data;
 
 	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
 	if (!act_data)
@@ -512,7 +513,8 @@ __flow_hw_act_data_encap_append(struct mlx5_priv *priv,
 				uint16_t action_src,
 				uint16_t action_dst,
 				uint16_t len)
-{	struct mlx5_action_construct_data *act_data;
+{
+	struct mlx5_action_construct_data *act_data;
 
 	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
 	if (!act_data)
@@ -582,7 +584,8 @@ __flow_hw_act_data_shared_rss_append(struct mlx5_priv *priv,
 				     uint16_t action_dst,
 				     uint32_t idx,
 				     struct mlx5_shared_action_rss *rss)
-{	struct mlx5_action_construct_data *act_data;
+{
+	struct mlx5_action_construct_data *act_data;
 
 	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
 	if (!act_data)
@@ -621,7 +624,8 @@ __flow_hw_act_data_shared_cnt_append(struct mlx5_priv *priv,
 				     uint16_t action_src,
 				     uint16_t action_dst,
 				     cnt_id_t cnt_id)
-{	struct mlx5_action_construct_data *act_data;
+{
+	struct mlx5_action_construct_data *act_data;
 
 	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
 	if (!act_data)
@@ -717,6 +721,10 @@ flow_hw_shared_action_translate(struct rte_eth_dev *dev,
 			action_src, action_dst, act_idx))
 			return -1;
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_AGE:
+		/* Not supported, prevent by validate function. */
+		MLX5_ASSERT(0);
+		break;
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
 		if (flow_hw_ct_compile(dev, MLX5_HW_INV_QUEUE,
 				       idx, &acts->rule_acts[action_dst]))
@@ -1109,7 +1117,7 @@ flow_hw_cnt_compile(struct rte_eth_dev *dev, uint32_t  start_pos,
 	cnt_id_t cnt_id;
 	int ret;
 
-	ret = mlx5_hws_cnt_shared_get(priv->hws_cpool, &cnt_id);
+	ret = mlx5_hws_cnt_shared_get(priv->hws_cpool, &cnt_id, 0);
 	if (ret != 0)
 		return ret;
 	ret = mlx5_hws_cnt_pool_get_action_offset
@@ -1250,8 +1258,6 @@ flow_hw_meter_mark_compile(struct rte_eth_dev *dev,
  *   Pointer to the rte_eth_dev structure.
  * @param[in] cfg
  *   Pointer to the table configuration.
- * @param[in] item_templates
- *   Item template array to be binded to the table.
  * @param[in/out] acts
  *   Pointer to the template HW steering DR actions.
  * @param[in] at
@@ -1260,7 +1266,7 @@ flow_hw_meter_mark_compile(struct rte_eth_dev *dev,
  *   Pointer to error structure.
  *
  * @return
- *    Table on success, NULL otherwise and rte_errno is set.
+ *   0 on success, a negative errno otherwise and rte_errno is set.
  */
 static int
 __flow_hw_actions_translate(struct rte_eth_dev *dev,
@@ -1289,6 +1295,7 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
 	uint16_t jump_pos;
 	uint32_t ct_idx;
 	int err;
+	uint32_t target_grp = 0;
 
 	flow_hw_modify_field_init(&mhdr, at);
 	if (attr->transfer)
@@ -1516,8 +1523,42 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
 							action_pos))
 				goto err;
 			break;
+		case RTE_FLOW_ACTION_TYPE_AGE:
+			flow_hw_translate_group(dev, cfg, attr->group,
+						&target_grp, error);
+			if (target_grp == 0) {
+				__flow_hw_action_template_destroy(dev, acts);
+				return rte_flow_error_set(error, ENOTSUP,
+						RTE_FLOW_ERROR_TYPE_ACTION,
+						NULL,
+						"Age action on root table is not supported in HW steering mode");
+			}
+			action_pos = at->actions_off[actions - at->actions];
+			if (__flow_hw_act_data_general_append(priv, acts,
+							 actions->type,
+							 actions - action_start,
+							 action_pos))
+				goto err;
+			break;
 		case RTE_FLOW_ACTION_TYPE_COUNT:
-			action_pos = at->actions_off[actions - action_start];
+			flow_hw_translate_group(dev, cfg, attr->group,
+						&target_grp, error);
+			if (target_grp == 0) {
+				__flow_hw_action_template_destroy(dev, acts);
+				return rte_flow_error_set(error, ENOTSUP,
+						RTE_FLOW_ERROR_TYPE_ACTION,
+						NULL,
+						"Counter action on root table is not supported in HW steering mode");
+			}
+			if ((at->action_flags & MLX5_FLOW_ACTION_AGE) ||
+			    (at->action_flags & MLX5_FLOW_ACTION_INDIRECT_AGE))
+				/*
+				 * When both COUNT and AGE are requested, they
+				 * are saved as an AGE action, which also
+				 * creates the counter.
+				 */
+				break;
+			action_pos = at->actions_off[actions - at->actions];
 			if (masks->conf &&
 			    ((const struct rte_flow_action_count *)
 			     masks->conf)->id) {
@@ -1744,6 +1785,10 @@ flow_hw_shared_action_get(struct rte_eth_dev *dev,
  *   Pointer to the flow table.
  * @param[in] it_idx
  *   Item template index the action template refer to.
+ * @param[in] action_flags
+ *   Actions bit-map detected in this template.
+ * @param[in, out] flow
+ *   Pointer to the flow containing the counter.
  * @param[in] rule_act
  *   Pointer to the shared action's destination rule DR action.
  *
@@ -1754,7 +1799,8 @@ static __rte_always_inline int
 flow_hw_shared_action_construct(struct rte_eth_dev *dev, uint32_t queue,
 				const struct rte_flow_action *action,
 				struct rte_flow_template_table *table,
-				const uint8_t it_idx,
+				const uint8_t it_idx, uint64_t action_flags,
+				struct rte_flow_hw *flow,
 				struct mlx5dr_rule_action *rule_act)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -1762,11 +1808,14 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev, uint32_t queue,
 	struct mlx5_action_construct_data act_data;
 	struct mlx5_shared_action_rss *shared_rss;
 	struct mlx5_aso_mtr *aso_mtr;
+	struct mlx5_age_info *age_info;
+	struct mlx5_hws_age_param *param;
 	uint32_t act_idx = (uint32_t)(uintptr_t)action->conf;
 	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
 	uint32_t idx = act_idx &
 		       ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
 	uint64_t item_flags;
+	cnt_id_t age_cnt;
 
 	memset(&act_data, 0, sizeof(act_data));
 	switch (type) {
@@ -1792,6 +1841,44 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev, uint32_t queue,
 				&rule_act->action,
 				&rule_act->counter.offset))
 			return -1;
+		flow->cnt_id = act_idx;
+		break;
+	case MLX5_INDIRECT_ACTION_TYPE_AGE:
+		/*
+		 * Save the index with the indirect type, to recognize
+		 * it in flow destroy.
+		 */
+		flow->age_idx = act_idx;
+		if (action_flags & MLX5_FLOW_ACTION_INDIRECT_COUNT)
+			/*
+			 * The mutual update for indirect AGE & COUNT will be
+			 * performed later, after we have IDs for both of them.
+			 */
+			break;
+		age_info = GET_PORT_AGE_INFO(priv);
+		param = mlx5_ipool_get(age_info->ages_ipool, idx);
+		if (param == NULL)
+			return -1;
+		if (action_flags & MLX5_FLOW_ACTION_COUNT) {
+			if (mlx5_hws_cnt_pool_get(priv->hws_cpool,
+						  &param->queue_id, &age_cnt,
+						  idx) < 0)
+				return -1;
+			flow->cnt_id = age_cnt;
+			param->nb_cnts++;
+		} else {
+			/*
+			 * Get the counter of this indirect AGE or create one
+			 * if it does not exist.
+			 */
+			age_cnt = mlx5_hws_age_cnt_get(priv, param, idx);
+			if (age_cnt == 0)
+				return -1;
+		}
+		if (mlx5_hws_cnt_pool_get_action_offset(priv->hws_cpool,
+						     age_cnt, &rule_act->action,
+						     &rule_act->counter.offset))
+			return -1;
 		break;
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
 		if (flow_hw_ct_compile(dev, queue, idx, rule_act))
@@ -1952,7 +2039,8 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			  const uint8_t it_idx,
 			  const struct rte_flow_action actions[],
 			  struct mlx5dr_rule_action *rule_acts,
-			  uint32_t queue)
+			  uint32_t queue,
+			  struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
@@ -1965,6 +2053,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	const struct rte_flow_item *enc_item = NULL;
 	const struct rte_flow_action_ethdev *port_action = NULL;
 	const struct rte_flow_action_meter *meter = NULL;
+	const struct rte_flow_action_age *age = NULL;
 	uint8_t *buf = job->encap_data;
 	struct rte_flow_attr attr = {
 			.ingress = 1,
@@ -1972,6 +2061,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	uint32_t ft_flag;
 	size_t encap_len = 0;
 	int ret;
+	uint32_t age_idx = 0;
 	struct mlx5_aso_mtr *aso_mtr;
 
 	rte_memcpy(rule_acts, hw_acts->rule_acts, sizeof(*rule_acts) * at->dr_actions_num);
@@ -2024,6 +2114,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_INDIRECT:
 			if (flow_hw_shared_action_construct
 					(dev, queue, action, table, it_idx,
+					 at->action_flags, job->flow,
 					 &rule_acts[act_data->action_dst]))
 				return -1;
 			break;
@@ -2132,9 +2223,32 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			if (mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr))
 				return -1;
 			break;
+		case RTE_FLOW_ACTION_TYPE_AGE:
+			age = action->conf;
+			/*
+			 * First, create the AGE parameter, then create its
+			 * counter later:
+			 * Regular counter - in next case.
+			 * Indirect counter - update it after the loop.
+			 */
+			age_idx = mlx5_hws_age_action_create(priv, queue, 0,
+							     age,
+							     job->flow->idx,
+							     error);
+			if (age_idx == 0)
+				return -rte_errno;
+			job->flow->age_idx = age_idx;
+			if (at->action_flags & MLX5_FLOW_ACTION_INDIRECT_COUNT)
+				/*
+				 * When AGE uses indirect counter, no need to
+				 * When AGE uses an indirect counter, there is
+				 * no need to create one; it is only updated
+				 * with the AGE parameter after the loop.
+				break;
+			/* Fall-through. */
 		case RTE_FLOW_ACTION_TYPE_COUNT:
 			ret = mlx5_hws_cnt_pool_get(priv->hws_cpool, &queue,
-					&cnt_id);
+						    &cnt_id, age_idx);
 			if (ret != 0)
 				return ret;
 			ret = mlx5_hws_cnt_pool_get_action_offset
@@ -2191,6 +2305,25 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			break;
 		}
 	}
+	if (at->action_flags & MLX5_FLOW_ACTION_INDIRECT_COUNT) {
+		if (at->action_flags & MLX5_FLOW_ACTION_INDIRECT_AGE) {
+			age_idx = job->flow->age_idx & MLX5_HWS_AGE_IDX_MASK;
+			if (mlx5_hws_cnt_age_get(priv->hws_cpool,
+						 job->flow->cnt_id) != age_idx)
+				/*
+				 * This is the first use of this indirect
+				 * counter for this indirect AGE; increase
+				 * the number of counters.
+				 */
+				mlx5_hws_age_nb_cnt_increase(priv, age_idx);
+		}
+		/*
+		 * Update this indirect counter with the indirect/direct AGE
+		 * that is using it.
+		 */
+		mlx5_hws_cnt_age_set(priv->hws_cpool, job->flow->cnt_id,
+				     age_idx);
+	}
 	if (hw_acts->encap_decap && !hw_acts->encap_decap->shared) {
 		rule_acts[hw_acts->encap_decap_pos].reformat.offset =
 				job->flow->idx - 1;
@@ -2340,8 +2473,10 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	 * No need to copy and contrust a new "actions" list based on the
 	 * user's input, in order to save the cost.
 	 */
-	if (flow_hw_actions_construct(dev, job, &table->ats[action_template_index],
-				      pattern_template_index, actions, rule_acts, queue)) {
+	if (flow_hw_actions_construct(dev, job,
+				      &table->ats[action_template_index],
+				      pattern_template_index, actions,
+				      rule_acts, queue, error)) {
 		rte_errno = EINVAL;
 		goto free;
 	}
@@ -2426,6 +2561,49 @@ flow_hw_async_flow_destroy(struct rte_eth_dev *dev,
 			"fail to create rte flow");
 }
 
+/**
+ * Release the AGE and counter for given flow.
+ *
+ * @param[in] priv
+ *   Pointer to the port private data structure.
+ * @param[in] queue
+ *   The queue to release the counter.
+ * @param[in, out] flow
+ *   Pointer to the flow containing the counter.
+ * @param[out] error
+ *   Pointer to error structure.
+ */
+static void
+flow_hw_age_count_release(struct mlx5_priv *priv, uint32_t queue,
+			  struct rte_flow_hw *flow,
+			  struct rte_flow_error *error)
+{
+	if (mlx5_hws_cnt_is_shared(priv->hws_cpool, flow->cnt_id)) {
+		if (flow->age_idx && !mlx5_hws_age_is_indirect(flow->age_idx)) {
+			/* Remove this AGE parameter from indirect counter. */
+			mlx5_hws_cnt_age_set(priv->hws_cpool, flow->cnt_id, 0);
+			/* Release the AGE parameter. */
+			mlx5_hws_age_action_destroy(priv, flow->age_idx, error);
+			flow->age_idx = 0;
+		}
+		return;
+	}
+	/* Put the counter back first to reduce races with the BG thread. */
+	mlx5_hws_cnt_pool_put(priv->hws_cpool, &queue, &flow->cnt_id);
+	flow->cnt_id = 0;
+	if (flow->age_idx) {
+		if (mlx5_hws_age_is_indirect(flow->age_idx)) {
+			uint32_t idx = flow->age_idx & MLX5_HWS_AGE_IDX_MASK;
+
+			mlx5_hws_age_nb_cnt_decrease(priv, idx);
+		} else {
+			/* Release the AGE parameter. */
+			mlx5_hws_age_action_destroy(priv, flow->age_idx, error);
+		}
+		flow->age_idx = 0;
+	}
+}
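
For readers outside the driver sources, here is a minimal standalone sketch of
the release ordering implemented above. The sim_* types and helpers are
hypothetical stand-ins, not the mlx5 structures: a shared counter stays alive
and only a non-indirect AGE parameter is destroyed, while an owned counter is
returned to the pool first and the AGE parameter is then either detached
(indirect) or destroyed.

/* Illustrative model only -- not part of this patch. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct sim_flow {
	uint32_t cnt_id;   /* 0 means no counter attached */
	uint32_t age_idx;  /* 0 means no AGE parameter attached */
	bool cnt_shared;   /* counter comes from an indirect (shared) action */
	bool age_indirect; /* AGE parameter is an indirect action */
};

/* Hypothetical helpers standing in for the pool/ipool operations. */
static void sim_cnt_put(uint32_t *cnt_id) { printf("put counter %u\n", (unsigned int)*cnt_id); *cnt_id = 0; }
static void sim_age_destroy(uint32_t age_idx) { printf("destroy AGE %u\n", (unsigned int)age_idx); }
static void sim_age_detach(uint32_t age_idx) { printf("detach counter from AGE %u\n", (unsigned int)age_idx); }

static void
sim_age_count_release(struct sim_flow *flow)
{
	if (flow->cnt_shared) {
		/* Shared counter stays alive; only a private AGE is released. */
		if (flow->age_idx && !flow->age_indirect) {
			sim_age_destroy(flow->age_idx);
			flow->age_idx = 0;
		}
		return;
	}
	/* Owned counter goes back to the pool before touching the AGE. */
	sim_cnt_put(&flow->cnt_id);
	if (flow->age_idx) {
		if (flow->age_indirect)
			sim_age_detach(flow->age_idx);
		else
			sim_age_destroy(flow->age_idx);
		flow->age_idx = 0;
	}
}

int main(void)
{
	struct sim_flow flow = { .cnt_id = 7, .age_idx = 3 };

	sim_age_count_release(&flow); /* prints: put counter 7, destroy AGE 3 */
	return 0;
}
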
+
 /**
  * Pull the enqueued flows.
  *
@@ -2472,13 +2650,9 @@ flow_hw_pull(struct rte_eth_dev *dev,
 				flow_hw_jump_release(dev, job->flow->jump);
 			else if (job->flow->fate_type == MLX5_FLOW_FATE_QUEUE)
 				mlx5_hrxq_obj_release(dev, job->flow->hrxq);
-			if (mlx5_hws_cnt_id_valid(job->flow->cnt_id) &&
-			    mlx5_hws_cnt_is_shared
-				(priv->hws_cpool, job->flow->cnt_id) == false) {
-				mlx5_hws_cnt_pool_put(priv->hws_cpool, &queue,
-						&job->flow->cnt_id);
-				job->flow->cnt_id = 0;
-			}
+			if (mlx5_hws_cnt_id_valid(job->flow->cnt_id))
+				flow_hw_age_count_release(priv, queue,
+							  job->flow, error);
 			if (job->flow->mtr_id) {
 				mlx5_ipool_free(pool->idx_pool,	job->flow->mtr_id);
 				job->flow->mtr_id = 0;
@@ -3131,100 +3305,315 @@ flow_hw_validate_action_represented_port(struct rte_eth_dev *dev,
 	return 0;
 }
 
-static inline int
-flow_hw_action_meta_copy_insert(const struct rte_flow_action actions[],
-				const struct rte_flow_action masks[],
-				const struct rte_flow_action *ins_actions,
-				const struct rte_flow_action *ins_masks,
-				struct rte_flow_action *new_actions,
-				struct rte_flow_action *new_masks,
-				uint16_t *ins_pos)
+/**
+ * Validate AGE action.
+ *
+ * @param[in] dev
+ *   Pointer to rte_eth_dev structure.
+ * @param[in] action
+ *   Pointer to the indirect action.
+ * @param[in] action_flags
+ *   Holds the actions detected until now.
+ * @param[in] fixed_cnt
+ *   Indicator if this list has a fixed COUNT action.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_hw_validate_action_age(struct rte_eth_dev *dev,
+			    const struct rte_flow_action *action,
+			    uint64_t action_flags, bool fixed_cnt,
+			    struct rte_flow_error *error)
 {
-	uint16_t idx, total = 0;
-	uint16_t end_idx = UINT16_MAX;
-	bool act_end = false;
-	bool modify_field = false;
-	bool rss_or_queue = false;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
 
-	MLX5_ASSERT(actions && masks);
-	MLX5_ASSERT(new_actions && new_masks);
-	MLX5_ASSERT(ins_actions && ins_masks);
-	for (idx = 0; !act_end; idx++) {
-		switch (actions[idx].type) {
-		case RTE_FLOW_ACTION_TYPE_RSS:
-		case RTE_FLOW_ACTION_TYPE_QUEUE:
-			/* It is assumed that application provided only single RSS/QUEUE action. */
-			MLX5_ASSERT(!rss_or_queue);
-			rss_or_queue = true;
-			break;
-		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
-			modify_field = true;
-			break;
-		case RTE_FLOW_ACTION_TYPE_END:
-			end_idx = idx;
-			act_end = true;
-			break;
-		default:
-			break;
+	if (!priv->sh->cdev->config.devx)
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "AGE action not supported");
+	if (age_info->ages_ipool == NULL)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					  "aging pool not initialized");
+	if ((action_flags & MLX5_FLOW_ACTION_AGE) ||
+	    (action_flags & MLX5_FLOW_ACTION_INDIRECT_AGE))
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "duplicate AGE actions set");
+	if (fixed_cnt)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "AGE and fixed COUNT combination is not supported");
+	return 0;
+}
+
+/**
+ * Validate count action.
+ *
+ * @param[in] dev
+ *   Pointer to rte_eth_dev structure.
+ * @param[in] action
+ *   Pointer to the indirect action.
+ * @param[in] mask
+ *   Pointer to the indirect action mask.
+ * @param[in] action_flags
+ *   Holds the actions detected until now.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_hw_validate_action_count(struct rte_eth_dev *dev,
+			      const struct rte_flow_action *action,
+			      const struct rte_flow_action *mask,
+			      uint64_t action_flags,
+			      struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_action_count *count = mask->conf;
+
+	if (!priv->sh->cdev->config.devx)
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "count action not supported");
+	if (!priv->hws_cpool)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "counters pool not initialized");
+	if ((action_flags & MLX5_FLOW_ACTION_COUNT) ||
+	    (action_flags & MLX5_FLOW_ACTION_INDIRECT_COUNT))
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "duplicate count actions set");
+	if (count && count->id && (action_flags & MLX5_FLOW_ACTION_AGE))
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, mask,
+					  "AGE and COUNT action shared by mask combination is not supported");
+	return 0;
+}
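
The AGE/COUNT restrictions above reduce to three rules: no duplicate AGE, no
duplicate COUNT, and no AGE combined with a fixed (id-masked) COUNT. Below is a
minimal sketch of that flag bookkeeping; the SIM_* flag values are hypothetical
stand-ins for the MLX5_FLOW_ACTION_* bits, not part of this patch.

/* Illustrative only. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SIM_ACT_AGE            (1ULL << 0)
#define SIM_ACT_INDIRECT_AGE   (1ULL << 1)
#define SIM_ACT_COUNT          (1ULL << 2)
#define SIM_ACT_INDIRECT_COUNT (1ULL << 3)

static int
sim_validate_age(uint64_t flags, bool fixed_cnt)
{
	if (flags & (SIM_ACT_AGE | SIM_ACT_INDIRECT_AGE))
		return -1; /* duplicate AGE actions */
	if (fixed_cnt)
		return -1; /* AGE cannot share a fixed (pre-set id) COUNT */
	return 0;
}

static int
sim_validate_count(uint64_t flags)
{
	if (flags & (SIM_ACT_COUNT | SIM_ACT_INDIRECT_COUNT))
		return -1; /* duplicate COUNT actions */
	return 0;
}

int main(void)
{
	uint64_t flags = 0;

	if (sim_validate_age(flags, false) == 0)
		flags |= SIM_ACT_AGE;
	/* A second AGE in the same template is rejected. */
	printf("second AGE valid? %s\n", sim_validate_age(flags, false) ? "no" : "yes");
	if (sim_validate_count(flags) == 0)
		flags |= SIM_ACT_COUNT;
	printf("second COUNT valid? %s\n", sim_validate_count(flags) ? "no" : "yes");
	return 0;
}
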
+
+/**
+ * Validate meter_mark action.
+ *
+ * @param[in] dev
+ *   Pointer to rte_eth_dev structure.
+ * @param[in] action
+ *   Pointer to the indirect action.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_hw_validate_action_meter_mark(struct rte_eth_dev *dev,
+			      const struct rte_flow_action *action,
+			      struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	RTE_SET_USED(action);
+
+	if (!priv->sh->cdev->config.devx)
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "meter_mark action not supported");
+	if (!priv->hws_mpool)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "meter_mark pool not initialized");
+	return 0;
+}
+
+/**
+ * Validate indirect action.
+ *
+ * @param[in] dev
+ *   Pointer to rte_eth_dev structure.
+ * @param[in] action
+ *   Pointer to the indirect action.
+ * @param[in] mask
+ *   Pointer to the indirect action mask.
+ * @param[in, out] action_flags
+ *   Holds the actions detected until now.
+ * @param[in, out] fixed_cnt
+ *   Pointer to indicator if this list has a fixed COUNT action.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_hw_validate_action_indirect(struct rte_eth_dev *dev,
+				 const struct rte_flow_action *action,
+				 const struct rte_flow_action *mask,
+				 uint64_t *action_flags, bool *fixed_cnt,
+				 struct rte_flow_error *error)
+{
+	uint32_t type;
+	int ret;
+
+	if (!mask)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  "Unable to determine indirect action type without a mask specified");
+	type = mask->type;
+	switch (type) {
+	case RTE_FLOW_ACTION_TYPE_METER_MARK:
+		ret = flow_hw_validate_action_meter_mark(dev, mask, error);
+		if (ret < 0)
+			return ret;
+		*action_flags |= MLX5_FLOW_ACTION_METER;
+		break;
+	case RTE_FLOW_ACTION_TYPE_RSS:
+		/* TODO: Validation logic (same as flow_hw_actions_validate) */
+		*action_flags |= MLX5_FLOW_ACTION_RSS;
+		break;
+	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+		/* TODO: Validation logic (same as flow_hw_actions_validate) */
+		*action_flags |= MLX5_FLOW_ACTION_CT;
+		break;
+	case RTE_FLOW_ACTION_TYPE_COUNT:
+		if (action->conf && mask->conf) {
+			if ((*action_flags & MLX5_FLOW_ACTION_AGE) ||
+			    (*action_flags & MLX5_FLOW_ACTION_INDIRECT_AGE))
+				/*
+				 * AGE cannot use an indirect counter which is
+				 * shared with other flow rules.
+				 */
+				return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  NULL,
+						  "AGE and fixed COUNT combination is not supported");
+			*fixed_cnt = true;
 		}
+		ret = flow_hw_validate_action_count(dev, action, mask,
+						    *action_flags, error);
+		if (ret < 0)
+			return ret;
+		*action_flags |= MLX5_FLOW_ACTION_INDIRECT_COUNT;
+		break;
+	case RTE_FLOW_ACTION_TYPE_AGE:
+		ret = flow_hw_validate_action_age(dev, action, *action_flags,
+						  *fixed_cnt, error);
+		if (ret < 0)
+			return ret;
+		*action_flags |= MLX5_FLOW_ACTION_INDIRECT_AGE;
+		break;
+	default:
+		DRV_LOG(WARNING, "Unsupported shared action type: %d", type);
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, mask,
+					  "Unsupported indirect action type");
 	}
-	if (!rss_or_queue)
-		return 0;
-	else if (idx >= MLX5_HW_MAX_ACTS)
-		return -1; /* No more space. */
-	total = idx;
-	/*
-	 * If actions template contains MODIFY_FIELD action, then meta copy action can be inserted
-	 * at the template's end. Position of MODIFY_HDR action is based on the position of the
-	 * first MODIFY_FIELD flow action.
-	 */
-	if (modify_field) {
-		*ins_pos = end_idx;
-		goto insert_meta_copy;
-	}
-	/*
-	 * If actions template does not contain MODIFY_FIELD action, then meta copy action must be
-	 * inserted at aplace conforming with action order defined in steering/mlx5dr_action.c.
+	return 0;
+}
+
+/**
+ * Validate raw_encap action.
+ *
+ * @param[in] dev
+ *   Pointer to rte_eth_dev structure.
+ * @param[in] action
+ *   Pointer to the raw_encap action.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_hw_validate_action_raw_encap(struct rte_eth_dev *dev __rte_unused,
+				  const struct rte_flow_action *action,
+				  struct rte_flow_error *error)
+{
+	const struct rte_flow_action_raw_encap *raw_encap_data = action->conf;
+
+	if (!raw_encap_data || !raw_encap_data->size || !raw_encap_data->data)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "invalid raw_encap_data");
+	return 0;
+}
+
+static inline uint16_t
+flow_hw_template_expand_modify_field(const struct rte_flow_action actions[],
+				     const struct rte_flow_action masks[],
+				     const struct rte_flow_action *mf_action,
+				     const struct rte_flow_action *mf_mask,
+				     struct rte_flow_action *new_actions,
+				     struct rte_flow_action *new_masks,
+				     uint64_t flags, uint32_t act_num)
+{
+	uint32_t i, tail;
+
+	MLX5_ASSERT(actions && masks);
+	MLX5_ASSERT(new_actions && new_masks);
+	MLX5_ASSERT(mf_action && mf_mask);
+	if (flags & MLX5_FLOW_ACTION_MODIFY_FIELD) {
+		/*
+		 * Application action template already has Modify Field.
+	 * Its location will be used in DR.
+		 * Expanded MF action can be added before the END.
+		 */
+		i = act_num - 1;
+		goto insert;
+	}
+	/**
+	 * Locate the first action positioned BEFORE the new MF.
+	 *
+	 * Search for a place to insert modify header
+	 * from the END action backwards:
+	 * 1. END is always present in actions array
+	 * 2. END location is always at action[act_num - 1]
+	 * 3. END is always positioned AFTER the modify field location
+	 *
+	 * Relative actions order is the same for RX, TX and FDB.
+	 *
+	 * Current actions order (draft-3)
+	 * @see action_order_arr[]
 	 */
-	act_end = false;
-	for (idx = 0; !act_end; idx++) {
-		switch (actions[idx].type) {
-		case RTE_FLOW_ACTION_TYPE_COUNT:
-		case RTE_FLOW_ACTION_TYPE_METER:
-		case RTE_FLOW_ACTION_TYPE_METER_MARK:
-		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+	for (i = act_num - 2; (int)i >= 0; i--) {
+		enum rte_flow_action_type type = actions[i].type;
+
+		if (type == RTE_FLOW_ACTION_TYPE_INDIRECT)
+			type = masks[i].type;
+		switch (type) {
 		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
 		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
 		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
-		case RTE_FLOW_ACTION_TYPE_RSS:
+		case RTE_FLOW_ACTION_TYPE_DROP:
+		case RTE_FLOW_ACTION_TYPE_JUMP:
 		case RTE_FLOW_ACTION_TYPE_QUEUE:
-			*ins_pos = idx;
-			act_end = true;
-			break;
+		case RTE_FLOW_ACTION_TYPE_RSS:
+		case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
+		case RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID:
+		case RTE_FLOW_ACTION_TYPE_VOID:
 		case RTE_FLOW_ACTION_TYPE_END:
-			act_end = true;
 			break;
 		default:
+			i++; /* new MF inserted AFTER actions[i] */
+			goto insert;
 			break;
 		}
 	}
-insert_meta_copy:
-	MLX5_ASSERT(*ins_pos != UINT16_MAX);
-	MLX5_ASSERT(*ins_pos < total);
-	/* Before the position, no change for the actions. */
-	for (idx = 0; idx < *ins_pos; idx++) {
-		new_actions[idx] = actions[idx];
-		new_masks[idx] = masks[idx];
-	}
-	/* Insert the new action and mask to the position. */
-	new_actions[idx] = *ins_actions;
-	new_masks[idx] = *ins_masks;
-	/* Remaining content is right shifted by one position. */
-	for (; idx < total; idx++) {
-		new_actions[idx + 1] = actions[idx];
-		new_masks[idx + 1] = masks[idx];
-	}
-	return 0;
+	i = 0;
+insert:
+	tail = act_num - i; /* num action to move */
+	memcpy(new_actions, actions, sizeof(actions[0]) * i);
+	new_actions[i] = *mf_action;
+	memcpy(new_actions + i + 1, actions + i, sizeof(actions[0]) * tail);
+	memcpy(new_masks, masks, sizeof(masks[0]) * i);
+	new_masks[i] = *mf_mask;
+	memcpy(new_masks + i + 1, masks + i, sizeof(masks[0]) * tail);
+	return i;
 }
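
The expansion above is just an array insertion done with three memcpy() calls
once the position i is found. Below is a minimal standalone sketch of the same
copy pattern on plain integers; the sim_* names are hypothetical and not part
of this patch.

/* Illustrative only. */
#include <stdio.h>
#include <string.h>

#define SIM_END (-1)

/* Insert 'value' right before index 'i' of 'src' (src has 'num' entries,
 * the last one being SIM_END); the result is written to 'dst'. */
static void
sim_insert_at(const int *src, int num, int i, int value, int *dst)
{
	int tail = num - i; /* number of entries shifted right, END included */

	memcpy(dst, src, sizeof(src[0]) * i);
	dst[i] = value;
	memcpy(dst + i + 1, src + i, sizeof(src[0]) * tail);
}

int main(void)
{
	/* 10, 20, 30 stand for actions; SIM_END terminates the template. */
	int actions[] = { 10, 20, 30, SIM_END };
	int expanded[5];
	int num = 4;

	sim_insert_at(actions, num, 1, 99, expanded); /* insert 99 after 10 */
	for (int k = 0; k < num + 1; k++)
		printf("%d ", expanded[k]);
	printf("\n"); /* prints: 10 99 20 30 -1 */
	return 0;
}
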
 
 static int
@@ -3295,13 +3684,17 @@ flow_hw_validate_action_push_vlan(struct rte_eth_dev *dev,
 }
 
 static int
-flow_hw_actions_validate(struct rte_eth_dev *dev,
-			const struct rte_flow_actions_template_attr *attr,
-			const struct rte_flow_action actions[],
-			const struct rte_flow_action masks[],
-			struct rte_flow_error *error)
+mlx5_flow_hw_actions_validate(struct rte_eth_dev *dev,
+			      const struct rte_flow_actions_template_attr *attr,
+			      const struct rte_flow_action actions[],
+			      const struct rte_flow_action masks[],
+			      uint64_t *act_flags,
+			      struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_action_count *count_mask = NULL;
+	bool fixed_cnt = false;
+	uint64_t action_flags = 0;
 	uint16_t i;
 	bool actions_end = false;
 	int ret;
@@ -3327,46 +3720,70 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			break;
 		case RTE_FLOW_ACTION_TYPE_INDIRECT:
-			/* TODO: Validation logic */
+			ret = flow_hw_validate_action_indirect(dev, action,
+							       mask,
+							       &action_flags,
+							       &fixed_cnt,
+							       error);
+			if (ret < 0)
+				return ret;
 			break;
 		case RTE_FLOW_ACTION_TYPE_MARK:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_MARK;
 			break;
 		case RTE_FLOW_ACTION_TYPE_DROP:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_DROP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_JUMP:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_JUMP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_QUEUE:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_QUEUE;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RSS:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_RSS;
 			break;
 		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_ENCAP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_ENCAP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_VXLAN_DECAP:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_DECAP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_NVGRE_DECAP:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_DECAP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
-			/* TODO: Validation logic */
+			ret = flow_hw_validate_action_raw_encap(dev, action, error);
+			if (ret < 0)
+				return ret;
+			action_flags |= MLX5_FLOW_ACTION_ENCAP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RAW_DECAP:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_DECAP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_METER:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_METER;
 			break;
 		case RTE_FLOW_ACTION_TYPE_METER_MARK:
-			/* TODO: Validation logic */
+			ret = flow_hw_validate_action_meter_mark(dev, action,
+								 error);
+			if (ret < 0)
+				return ret;
+			action_flags |= MLX5_FLOW_ACTION_METER;
 			break;
 		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
 			ret = flow_hw_validate_action_modify_field(action,
@@ -3374,21 +3791,43 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 									error);
 			if (ret < 0)
 				return ret;
+			action_flags |= MLX5_FLOW_ACTION_MODIFY_FIELD;
 			break;
 		case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
 			ret = flow_hw_validate_action_represented_port
 					(dev, action, mask, error);
 			if (ret < 0)
 				return ret;
+			action_flags |= MLX5_FLOW_ACTION_PORT_ID;
+			break;
+		case RTE_FLOW_ACTION_TYPE_AGE:
+			if (count_mask && count_mask->id)
+				fixed_cnt = true;
+			ret = flow_hw_validate_action_age(dev, action,
+							  action_flags,
+							  fixed_cnt, error);
+			if (ret < 0)
+				return ret;
+			action_flags |= MLX5_FLOW_ACTION_AGE;
 			break;
 		case RTE_FLOW_ACTION_TYPE_COUNT:
-			/* TODO: Validation logic */
+			ret = flow_hw_validate_action_count(dev, action, mask,
+							    action_flags,
+							    error);
+			if (ret < 0)
+				return ret;
+			count_mask = mask->conf;
+			action_flags |= MLX5_FLOW_ACTION_COUNT;
 			break;
 		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_CT;
 			break;
 		case RTE_FLOW_ACTION_TYPE_OF_POP_VLAN:
+			action_flags |= MLX5_FLOW_ACTION_OF_POP_VLAN;
+			break;
 		case RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID:
+			action_flags |= MLX5_FLOW_ACTION_OF_SET_VLAN_VID;
 			break;
 		case RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN:
 			ret = flow_hw_validate_action_push_vlan
@@ -3398,6 +3837,7 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 			i += is_of_vlan_pcp_present(action) ?
 				MLX5_HW_VLAN_PUSH_PCP_IDX :
 				MLX5_HW_VLAN_PUSH_VID_IDX;
+			action_flags |= MLX5_FLOW_ACTION_OF_PUSH_VLAN;
 			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
@@ -3409,9 +3849,23 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 						  "action not supported in template API");
 		}
 	}
+	if (act_flags != NULL)
+		*act_flags = action_flags;
 	return 0;
 }
 
+static int
+flow_hw_actions_validate(struct rte_eth_dev *dev,
+			 const struct rte_flow_actions_template_attr *attr,
+			 const struct rte_flow_action actions[],
+			 const struct rte_flow_action masks[],
+			 struct rte_flow_error *error)
+{
+	return mlx5_flow_hw_actions_validate(dev, attr, actions, masks, NULL,
+					     error);
+}
+
+
 static enum mlx5dr_action_type mlx5_hw_dr_action_types[] = {
 	[RTE_FLOW_ACTION_TYPE_MARK] = MLX5DR_ACTION_TYP_TAG,
 	[RTE_FLOW_ACTION_TYPE_DROP] = MLX5DR_ACTION_TYP_DROP,
@@ -3424,7 +3878,6 @@ static enum mlx5dr_action_type mlx5_hw_dr_action_types[] = {
 	[RTE_FLOW_ACTION_TYPE_NVGRE_DECAP] = MLX5DR_ACTION_TYP_TNL_L2_TO_L2,
 	[RTE_FLOW_ACTION_TYPE_MODIFY_FIELD] = MLX5DR_ACTION_TYP_MODIFY_HDR,
 	[RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT] = MLX5DR_ACTION_TYP_VPORT,
-	[RTE_FLOW_ACTION_TYPE_COUNT] = MLX5DR_ACTION_TYP_CTR,
 	[RTE_FLOW_ACTION_TYPE_CONNTRACK] = MLX5DR_ACTION_TYP_ASO_CT,
 	[RTE_FLOW_ACTION_TYPE_OF_POP_VLAN] = MLX5DR_ACTION_TYP_POP_VLAN,
 	[RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN] = MLX5DR_ACTION_TYP_PUSH_VLAN,
@@ -3434,7 +3887,7 @@ static int
 flow_hw_dr_actions_template_handle_shared(const struct rte_flow_action *mask,
 					  unsigned int action_src,
 					  enum mlx5dr_action_type *action_types,
-					  uint16_t *curr_off,
+					  uint16_t *curr_off, uint16_t *cnt_off,
 					  struct rte_flow_actions_template *at)
 {
 	uint32_t type;
@@ -3451,10 +3904,18 @@ flow_hw_dr_actions_template_handle_shared(const struct rte_flow_action *mask,
 		action_types[*curr_off] = MLX5DR_ACTION_TYP_TIR;
 		*curr_off = *curr_off + 1;
 		break;
+	case RTE_FLOW_ACTION_TYPE_AGE:
 	case RTE_FLOW_ACTION_TYPE_COUNT:
-		at->actions_off[action_src] = *curr_off;
-		action_types[*curr_off] = MLX5DR_ACTION_TYP_CTR;
-		*curr_off = *curr_off + 1;
+		/*
+		 * Both AGE and COUNT actions need a counter; the first one fills
+		 * the action_types array, and the second only saves the offset.
+		 */
+		if (*cnt_off == UINT16_MAX) {
+			*cnt_off = *curr_off;
+			action_types[*cnt_off] = MLX5DR_ACTION_TYP_CTR;
+			*curr_off = *curr_off + 1;
+		}
+		at->actions_off[action_src] = *cnt_off;
 		break;
 	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
 		at->actions_off[action_src] = *curr_off;
@@ -3493,6 +3954,7 @@ flow_hw_dr_actions_template_create(struct rte_flow_actions_template *at)
 	enum mlx5dr_action_type reformat_act_type = MLX5DR_ACTION_TYP_TNL_L2_TO_L2;
 	uint16_t reformat_off = UINT16_MAX;
 	uint16_t mhdr_off = UINT16_MAX;
+	uint16_t cnt_off = UINT16_MAX;
 	int ret;
 	for (i = 0, curr_off = 0; at->actions[i].type != RTE_FLOW_ACTION_TYPE_END; ++i) {
 		const struct rte_flow_action_raw_encap *raw_encap_data;
@@ -3505,9 +3967,12 @@ flow_hw_dr_actions_template_create(struct rte_flow_actions_template *at)
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			break;
 		case RTE_FLOW_ACTION_TYPE_INDIRECT:
-			ret = flow_hw_dr_actions_template_handle_shared(&at->masks[i], i,
-									action_types,
-									&curr_off, at);
+			ret = flow_hw_dr_actions_template_handle_shared
+								 (&at->masks[i],
+								  i,
+								  action_types,
+								  &curr_off,
+								  &cnt_off, at);
 			if (ret)
 				return NULL;
 			break;
@@ -3563,6 +4028,19 @@ flow_hw_dr_actions_template_create(struct rte_flow_actions_template *at)
 			if (curr_off >= MLX5_HW_MAX_ACTS)
 				goto err_actions_num;
 			break;
+		case RTE_FLOW_ACTION_TYPE_AGE:
+		case RTE_FLOW_ACTION_TYPE_COUNT:
+			/*
+			 * Both AGE and COUNT actions need a counter; the first
+			 * one fills the action_types array, and the second only
+			 * saves the offset.
+			 */
+			if (cnt_off == UINT16_MAX) {
+				cnt_off = curr_off++;
+				action_types[cnt_off] = MLX5DR_ACTION_TYP_CTR;
+			}
+			at->actions_off[i] = cnt_off;
+			break;
 		default:
 			type = mlx5_hw_dr_action_types[at->actions[i].type];
 			at->actions_off[i] = curr_off;
@@ -3703,6 +4181,7 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 	unsigned int i;
 	struct rte_flow_actions_template *at = NULL;
 	uint16_t pos = UINT16_MAX;
+	uint64_t action_flags = 0;
 	struct rte_flow_action tmp_action[MLX5_HW_MAX_ACTS];
 	struct rte_flow_action tmp_mask[MLX5_HW_MAX_ACTS];
 	struct rte_flow_action *ra = (void *)(uintptr_t)actions;
@@ -3745,22 +4224,9 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 		.conf = &rx_mreg_mask,
 	};
 
-	if (flow_hw_actions_validate(dev, attr, actions, masks, error))
+	if (mlx5_flow_hw_actions_validate(dev, attr, actions, masks,
+					  &action_flags, error))
 		return NULL;
-	if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
-	    priv->sh->config.dv_esw_en) {
-		/* Application should make sure only one Q/RSS exist in one rule. */
-		if (flow_hw_action_meta_copy_insert(actions, masks, &rx_cpy, &rx_cpy_mask,
-						    tmp_action, tmp_mask, &pos)) {
-			rte_flow_error_set(error, EINVAL,
-					   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
-					   "Failed to concatenate new action/mask");
-			return NULL;
-		} else if (pos != UINT16_MAX) {
-			ra = tmp_action;
-			rm = tmp_mask;
-		}
-	}
 	for (i = 0; ra[i].type != RTE_FLOW_ACTION_TYPE_END; ++i) {
 		switch (ra[i].type) {
 		/* OF_PUSH_VLAN *MUST* come before OF_SET_VLAN_VID */
@@ -3786,6 +4252,29 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 				   RTE_FLOW_ERROR_TYPE_ACTION, NULL, "Too many actions");
 		return NULL;
 	}
+	if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
+	    priv->sh->config.dv_esw_en &&
+	    (action_flags &
+	     (MLX5_FLOW_ACTION_QUEUE | MLX5_FLOW_ACTION_RSS))) {
+		/* Insert META copy */
+		if (act_num + 1 > MLX5_HW_MAX_ACTS) {
+			rte_flow_error_set(error, E2BIG,
+					   RTE_FLOW_ERROR_TYPE_ACTION,
+					   NULL, "cannot expand: too many actions");
+			return NULL;
+		}
+		/* Application should make sure only one Q/RSS exists in one rule. */
+		pos = flow_hw_template_expand_modify_field(actions, masks,
+							   &rx_cpy,
+							   &rx_cpy_mask,
+							   tmp_action, tmp_mask,
+							   action_flags,
+							   act_num);
+		ra = tmp_action;
+		rm = tmp_mask;
+		act_num++;
+		action_flags |= MLX5_FLOW_ACTION_MODIFY_FIELD;
+	}
 	if (set_vlan_vid_ix != -1) {
 		/* If temporary action buffer was not used, copy template actions to it */
 		if (ra == actions && rm == masks) {
@@ -3856,6 +4345,7 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 	at->tmpl = flow_hw_dr_actions_template_create(at);
 	if (!at->tmpl)
 		goto error;
+	at->action_flags = action_flags;
 	__atomic_fetch_add(&at->refcnt, 1, __ATOMIC_RELAXED);
 	LIST_INSERT_HEAD(&priv->flow_hw_at, at, next);
 	return at;
@@ -4199,6 +4689,7 @@ flow_hw_info_get(struct rte_eth_dev *dev,
 		 struct rte_flow_queue_info *queue_info,
 		 struct rte_flow_error *error __rte_unused)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
 	uint16_t port_id = dev->data->port_id;
 	struct rte_mtr_capabilities mtr_cap;
 	int ret;
@@ -4215,6 +4706,8 @@ flow_hw_info_get(struct rte_eth_dev *dev,
 		port_info->max_nb_meter_profiles = UINT32_MAX;
 		port_info->max_nb_meter_policies = UINT32_MAX;
 	}
+	port_info->max_nb_counters = priv->sh->hws_max_nb_counters;
+	port_info->max_nb_aging_objects = port_info->max_nb_counters;
 	return 0;
 }
 
@@ -5593,8 +6086,6 @@ flow_hw_configure(struct rte_eth_dev *dev,
 			goto err;
 		}
 	}
-	if (_queue_attr)
-		mlx5_free(_queue_attr);
 	if (port_attr->nb_conn_tracks) {
 		mem_size = sizeof(struct mlx5_aso_sq) * nb_q_updated +
 			   sizeof(*priv->ct_mng);
@@ -5611,13 +6102,35 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	}
 	if (port_attr->nb_counters) {
 		priv->hws_cpool = mlx5_hws_cnt_pool_create(dev, port_attr,
-				nb_queue);
+							   nb_queue);
 		if (priv->hws_cpool == NULL)
 			goto err;
 	}
+	if (port_attr->nb_aging_objects) {
+		if (port_attr->nb_counters == 0) {
+			/*
+			 * Aging management uses counters. The number of
+			 * requested counters should take into account one
+			 * counter for each flow rule containing AGE without
+			 * a counter.
+			 */
+			DRV_LOG(ERR, "Port %u AGE objects are requested (%u) "
+				"but no counters are requested.",
+				dev->data->port_id,
+				port_attr->nb_aging_objects);
+			rte_errno = EINVAL;
+			goto err;
+		}
+		ret = mlx5_hws_age_pool_init(dev, port_attr, nb_queue);
+		if (ret < 0)
+			goto err;
+	}
 	ret = flow_hw_create_vlan(dev);
 	if (ret)
 		goto err;
+	if (_queue_attr)
+		mlx5_free(_queue_attr);
+	if (port_attr->flags & RTE_FLOW_PORT_FLAG_STRICT_QUEUE)
+		priv->hws_strict_queue = 1;
 	return 0;
 err:
 	if (priv->hws_ctpool) {
@@ -5628,6 +6141,10 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		flow_hw_ct_mng_destroy(dev, priv->ct_mng);
 		priv->ct_mng = NULL;
 	}
+	if (priv->hws_age_req)
+		mlx5_hws_age_pool_destroy(priv);
+	if (priv->hws_cpool)
+		mlx5_hws_cnt_pool_destroy(priv->sh, priv->hws_cpool);
 	flow_hw_free_vport_actions(priv);
 	for (i = 0; i < MLX5_HW_ACTION_FLAG_MAX; i++) {
 		if (priv->hw_drop[i])
@@ -5701,6 +6218,8 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 		mlx5_ipool_destroy(priv->acts_ipool);
 		priv->acts_ipool = NULL;
 	}
+	if (priv->hws_age_req)
+		mlx5_hws_age_pool_destroy(priv);
 	if (priv->hws_cpool)
 		mlx5_hws_cnt_pool_destroy(priv->sh, priv->hws_cpool);
 	if (priv->hws_ctpool) {
@@ -6037,13 +6556,53 @@ flow_hw_conntrack_create(struct rte_eth_dev *dev, uint32_t queue,
 		MLX5_ACTION_CTX_CT_GEN_IDX(PORT_ID(priv), ct_idx);
 }
 
+/**
+ * Validate shared action.
+ *
+ * @param[in] dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param[in] queue
+ *   Which queue to be used.
+ * @param[in] attr
+ *   Operation attribute.
+ * @param[in] conf
+ *   Indirect action configuration.
+ * @param[in] action
+ *   rte_flow action detail.
+ * @param[in] user_data
+ *   Pointer to the user_data.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, otherwise negative errno value.
+ */
+static int
+flow_hw_action_handle_validate(struct rte_eth_dev *dev, uint32_t queue,
+			       const struct rte_flow_op_attr *attr,
+			       const struct rte_flow_indir_action_conf *conf,
+			       const struct rte_flow_action *action,
+			       void *user_data,
+			       struct rte_flow_error *error)
+{
+	RTE_SET_USED(attr);
+	RTE_SET_USED(queue);
+	RTE_SET_USED(user_data);
+	switch (action->type) {
+	case RTE_FLOW_ACTION_TYPE_METER_MARK:
+		return flow_hw_validate_action_meter_mark(dev, action, error);
+	default:
+		return flow_dv_action_validate(dev, conf, action, error);
+	}
+}
+
 /**
  * Create shared action.
  *
  * @param[in] dev
  *   Pointer to the rte_eth_dev structure.
  * @param[in] queue
- *   Which queue to be used..
+ *   Which queue to be used.
  * @param[in] attr
  *   Operation attribute.
  * @param[in] conf
@@ -6068,16 +6627,32 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 {
 	struct rte_flow_action_handle *handle = NULL;
 	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_action_age *age;
 	struct mlx5_aso_mtr *aso_mtr;
 	cnt_id_t cnt_id;
 	uint32_t mtr_id;
+	uint32_t age_idx;
 
-	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
 	RTE_SET_USED(user_data);
 	switch (action->type) {
+	case RTE_FLOW_ACTION_TYPE_AGE:
+		age = action->conf;
+		age_idx = mlx5_hws_age_action_create(priv, queue, true, age,
+						     0, error);
+		if (age_idx == 0) {
+			rte_flow_error_set(error, ENODEV,
+					   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					   "AGE is not configured!");
+		} else {
+			age_idx = (MLX5_INDIRECT_ACTION_TYPE_AGE <<
+				   MLX5_INDIRECT_ACTION_TYPE_OFFSET) | age_idx;
+			handle =
+			    (struct rte_flow_action_handle *)(uintptr_t)age_idx;
+		}
+		break;
 	case RTE_FLOW_ACTION_TYPE_COUNT:
-		if (mlx5_hws_cnt_shared_get(priv->hws_cpool, &cnt_id))
+		if (mlx5_hws_cnt_shared_get(priv->hws_cpool, &cnt_id, 0))
 			rte_flow_error_set(error, ENODEV,
 					RTE_FLOW_ERROR_TYPE_ACTION,
 					NULL,
@@ -6097,8 +6672,13 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 			MLX5_INDIRECT_ACTION_TYPE_OFFSET) | (aso_mtr->fm.meter_id);
 		handle = (struct rte_flow_action_handle *)(uintptr_t)mtr_id;
 		break;
-	default:
+	case RTE_FLOW_ACTION_TYPE_RSS:
 		handle = flow_dv_action_create(dev, conf, action, error);
+		break;
+	default:
+		rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION,
+				   NULL, "action type not supported");
+		return NULL;
 	}
 	return handle;
 }
@@ -6109,7 +6689,7 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
  * @param[in] dev
  *   Pointer to the rte_eth_dev structure.
  * @param[in] queue
- *   Which queue to be used..
+ *   Which queue to be used.
  * @param[in] attr
  *   Operation attribute.
  * @param[in] handle
@@ -6132,7 +6712,6 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 			     void *user_data,
 			     struct rte_flow_error *error)
 {
-	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
 	RTE_SET_USED(user_data);
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -6147,6 +6726,8 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 	uint32_t idx = act_idx & ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
 
 	switch (type) {
+	case MLX5_INDIRECT_ACTION_TYPE_AGE:
+		return mlx5_hws_age_action_update(priv, idx, update, error);
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
 		return flow_hw_conntrack_update(dev, queue, update, act_idx, error);
 	case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
@@ -6180,11 +6761,15 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 			return rte_flow_error_set(error, EINVAL,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL, "Unable to wait for ASO meter CQE");
-		return 0;
-	default:
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_RSS:
+		return flow_dv_action_update(dev, handle, update, error);
+	default:
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					  "action type not supported");
 	}
-	return flow_dv_action_update(dev, handle, update, error);
+	return 0;
 }
 
 /**
@@ -6193,7 +6778,7 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
  * @param[in] dev
  *   Pointer to the rte_eth_dev structure.
  * @param[in] queue
- *   Which queue to be used..
+ *   Which queue to be used.
  * @param[in] attr
  *   Operation attribute.
  * @param[in] handle
@@ -6215,6 +6800,7 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 {
 	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
 	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+	uint32_t age_idx = act_idx & MLX5_HWS_AGE_IDX_MASK;
 	uint32_t idx = act_idx & ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
@@ -6225,7 +6811,16 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 	RTE_SET_USED(attr);
 	RTE_SET_USED(user_data);
 	switch (type) {
+	case MLX5_INDIRECT_ACTION_TYPE_AGE:
+		return mlx5_hws_age_action_destroy(priv, age_idx, error);
 	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
+		age_idx = mlx5_hws_cnt_age_get(priv->hws_cpool, act_idx);
+		if (age_idx != 0)
+			/*
+			 * If this counter belongs to indirect AGE, here is the
+			 * time to update the AGE.
+			 */
+			mlx5_hws_age_nb_cnt_decrease(priv, age_idx);
 		return mlx5_hws_cnt_shared_put(priv->hws_cpool, &act_idx);
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
 		return flow_hw_conntrack_destroy(dev, act_idx, error);
@@ -6250,10 +6845,15 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL, "Unable to wait for ASO meter CQE");
 		mlx5_ipool_free(pool->idx_pool, idx);
-		return 0;
-	default:
+		break;
+	case MLX5_INDIRECT_ACTION_TYPE_RSS:
 		return flow_dv_action_destroy(dev, handle, error);
+	default:
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					  "action type not supported");
 	}
+	return 0;
 }
 
 static int
@@ -6263,13 +6863,14 @@ flow_hw_query_counter(const struct rte_eth_dev *dev, uint32_t counter,
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_hws_cnt *cnt;
 	struct rte_flow_query_count *qc = data;
-	uint32_t iidx = mlx5_hws_cnt_iidx(priv->hws_cpool, counter);
+	uint32_t iidx;
 	uint64_t pkts, bytes;
 
 	if (!mlx5_hws_cnt_id_valid(counter))
 		return rte_flow_error_set(error, EINVAL,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
 				"counter are not available");
+	iidx = mlx5_hws_cnt_iidx(priv->hws_cpool, counter);
 	cnt = &priv->hws_cpool->pool[iidx];
 	__hws_cnt_query_raw(priv->hws_cpool, counter, &pkts, &bytes);
 	qc->hits_set = 1;
@@ -6283,12 +6884,64 @@ flow_hw_query_counter(const struct rte_eth_dev *dev, uint32_t counter,
 	return 0;
 }
 
+/**
+ * Query a flow rule AGE action for aging information.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] age_idx
+ *   Index of AGE action parameter.
+ * @param[out] data
+ *   Data retrieved by the query.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
 static int
-flow_hw_query(struct rte_eth_dev *dev,
-	      struct rte_flow *flow __rte_unused,
-	      const struct rte_flow_action *actions __rte_unused,
-	      void *data __rte_unused,
-	      struct rte_flow_error *error __rte_unused)
+flow_hw_query_age(const struct rte_eth_dev *dev, uint32_t age_idx, void *data,
+		  struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct mlx5_indexed_pool *ipool = age_info->ages_ipool;
+	struct mlx5_hws_age_param *param = mlx5_ipool_get(ipool, age_idx);
+	struct rte_flow_query_age *resp = data;
+
+	if (!param || !param->timeout)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "age data not available");
+	switch (__atomic_load_n(&param->state, __ATOMIC_RELAXED)) {
+	case HWS_AGE_AGED_OUT_REPORTED:
+	case HWS_AGE_AGED_OUT_NOT_REPORTED:
+		resp->aged = 1;
+		break;
+	case HWS_AGE_CANDIDATE:
+	case HWS_AGE_CANDIDATE_INSIDE_RING:
+		resp->aged = 0;
+		break;
+	case HWS_AGE_FREE:
+		/*
+		 * When state is FREE the flow itself should be invalid.
+		 * Fall-through.
+		 */
+	default:
+		MLX5_ASSERT(0);
+		break;
+	}
+	resp->sec_since_last_hit_valid = !resp->aged;
+	if (resp->sec_since_last_hit_valid)
+		resp->sec_since_last_hit = __atomic_load_n
+				 (&param->sec_since_last_hit, __ATOMIC_RELAXED);
+	return 0;
+}
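
The mapping above from the AGE parameter state to the query response can be
summarized in a few lines. Here is a minimal sketch with simplified stand-in
types; the sim_* names are hypothetical, not the rte_flow structures.

/* Illustrative only. */
#include <stdint.h>
#include <stdio.h>

enum sim_age_state {
	SIM_AGE_FREE,
	SIM_AGE_CANDIDATE,
	SIM_AGE_CANDIDATE_INSIDE_RING,
	SIM_AGE_AGED_OUT_NOT_REPORTED,
	SIM_AGE_AGED_OUT_REPORTED,
};

struct sim_query_age {
	int aged;
	int sec_since_last_hit_valid;
	uint32_t sec_since_last_hit;
};

static void
sim_query_age(enum sim_age_state state, uint32_t sec_since_last_hit,
	      struct sim_query_age *resp)
{
	resp->aged = (state == SIM_AGE_AGED_OUT_NOT_REPORTED ||
		      state == SIM_AGE_AGED_OUT_REPORTED);
	/* Seconds since the last hit are only meaningful while not aged-out. */
	resp->sec_since_last_hit_valid = !resp->aged;
	resp->sec_since_last_hit = resp->sec_since_last_hit_valid ?
				   sec_since_last_hit : 0;
}

int main(void)
{
	struct sim_query_age resp;

	sim_query_age(SIM_AGE_CANDIDATE, 3, &resp);
	printf("aged=%d valid=%d sec=%u\n", resp.aged,
	       resp.sec_since_last_hit_valid,
	       (unsigned int)resp.sec_since_last_hit);
	return 0;
}
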
+
+static int
+flow_hw_query(struct rte_eth_dev *dev, struct rte_flow *flow,
+	      const struct rte_flow_action *actions, void *data,
+	      struct rte_flow_error *error)
 {
 	int ret = -EINVAL;
 	struct rte_flow_hw *hw_flow = (struct rte_flow_hw *)flow;
@@ -6299,7 +6952,11 @@ flow_hw_query(struct rte_eth_dev *dev,
 			break;
 		case RTE_FLOW_ACTION_TYPE_COUNT:
 			ret = flow_hw_query_counter(dev, hw_flow->cnt_id, data,
-						  error);
+						    error);
+			break;
+		case RTE_FLOW_ACTION_TYPE_AGE:
+			ret = flow_hw_query_age(dev, hw_flow->age_idx, data,
+						error);
 			break;
 		default:
 			return rte_flow_error_set(error, ENOTSUP,
@@ -6311,6 +6968,32 @@ flow_hw_query(struct rte_eth_dev *dev,
 	return ret;
 }
 
+/**
+ * Validate indirect action.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] conf
+ *   Shared action configuration.
+ * @param[in] action
+ *   Action specification used to create indirect action.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. Initialized in case of
+ *   error only.
+ *
+ * @return
+ *   0 on success, otherwise negative errno value.
+ */
+static int
+flow_hw_action_validate(struct rte_eth_dev *dev,
+			const struct rte_flow_indir_action_conf *conf,
+			const struct rte_flow_action *action,
+			struct rte_flow_error *err)
+{
+	return flow_hw_action_handle_validate(dev, MLX5_HW_INV_QUEUE, NULL,
+					      conf, action, NULL, err);
+}
+
 /**
  * Create indirect action.
  *
@@ -6334,6 +7017,12 @@ flow_hw_action_create(struct rte_eth_dev *dev,
 		       const struct rte_flow_action *action,
 		       struct rte_flow_error *err)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (priv->hws_strict_queue)
+		DRV_LOG(WARNING,
+			"port %u create indirect action called in strict queue mode.",
+			dev->data->port_id);
 	return flow_hw_action_handle_create(dev, MLX5_HW_INV_QUEUE,
 					    NULL, conf, action, NULL, err);
 }
@@ -6400,17 +7089,118 @@ flow_hw_action_query(struct rte_eth_dev *dev,
 {
 	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
 	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+	uint32_t age_idx = act_idx & MLX5_HWS_AGE_IDX_MASK;
 
 	switch (type) {
+	case MLX5_INDIRECT_ACTION_TYPE_AGE:
+		return flow_hw_query_age(dev, age_idx, data, error);
 	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
 		return flow_hw_query_counter(dev, act_idx, data, error);
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
 		return flow_hw_conntrack_query(dev, act_idx, data, error);
-	default:
+	case MLX5_INDIRECT_ACTION_TYPE_RSS:
 		return flow_dv_action_query(dev, handle, data, error);
+	default:
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					  "action type not supported");
 	}
 }
 
+/**
+ * Get aged-out flows of a given port on the given HWS flow queue.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] queue_id
+ *   Flow queue to query. Ignored when RTE_FLOW_PORT_FLAG_STRICT_QUEUE not set.
+ * @param[in, out] contexts
+ *   The address of an array of pointers to the aged-out flows contexts.
+ * @param[in] nb_contexts
+ *   The length of context array pointers.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. Initialized in case of
+ *   error only.
+ *
+ * @return
+ *   If nb_contexts is 0, return the number of all aged contexts.
+ *   If nb_contexts is not 0, return the number of aged flows reported
+ *   in the context array, otherwise a negative errno value.
+ */
+static int
+flow_hw_get_q_aged_flows(struct rte_eth_dev *dev, uint32_t queue_id,
+			 void **contexts, uint32_t nb_contexts,
+			 struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct rte_ring *r;
+	int nb_flows = 0;
+
+	if (nb_contexts && !contexts)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "empty context");
+	if (priv->hws_strict_queue) {
+		if (queue_id >= age_info->hw_q_age->nb_rings)
+			return rte_flow_error_set(error, EINVAL,
+						RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+						NULL, "invalid queue id");
+		r = age_info->hw_q_age->aged_lists[queue_id];
+	} else {
+		r = age_info->hw_age.aged_list;
+		MLX5_AGE_SET(age_info, MLX5_AGE_TRIGGER);
+	}
+	if (nb_contexts == 0)
+		return rte_ring_count(r);
+	while ((uint32_t)nb_flows < nb_contexts) {
+		uint32_t age_idx;
+
+		if (rte_ring_dequeue_elem(r, &age_idx, sizeof(uint32_t)) < 0)
+			break;
+		/* get the AGE context if the aged-out index is still valid. */
+		contexts[nb_flows] = mlx5_hws_age_context_get(priv, age_idx);
+		if (!contexts[nb_flows])
+			continue;
+		nb_flows++;
+	}
+	return nb_flows;
+}
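
The loop above drains aged-out indexes from a ring and silently skips entries
whose AGE context has been destroyed in the meantime. Below is a standalone
sketch of the same pattern using a trivial array-backed queue instead of
rte_ring; the sim_* names are hypothetical and not part of this patch.

/* Illustrative only. */
#include <stdint.h>
#include <stdio.h>

#define SIM_RING_SIZE 8

struct sim_ring {
	uint32_t elems[SIM_RING_SIZE];
	int head, tail;
};

static int
sim_ring_dequeue(struct sim_ring *r, uint32_t *out)
{
	if (r->head == r->tail)
		return -1;
	*out = r->elems[r->head++ % SIM_RING_SIZE];
	return 0;
}

/* Stand-in for the context lookup: odd indexes are "still valid". */
static void *
sim_context_get(uint32_t age_idx)
{
	return (age_idx & 1) ? (void *)(uintptr_t)age_idx : NULL;
}

static int
sim_get_aged_flows(struct sim_ring *r, void **contexts, uint32_t nb_contexts)
{
	uint32_t nb_flows = 0;

	while (nb_flows < nb_contexts) {
		uint32_t age_idx;

		if (sim_ring_dequeue(r, &age_idx) < 0)
			break;
		contexts[nb_flows] = sim_context_get(age_idx);
		if (contexts[nb_flows] == NULL)
			continue; /* aged-out entry was destroyed meanwhile */
		nb_flows++;
	}
	return (int)nb_flows;
}

int main(void)
{
	struct sim_ring r = { .elems = { 1, 2, 3, 5 }, .head = 0, .tail = 4 };
	void *ctx[4];

	printf("reported %d aged flows\n", sim_get_aged_flows(&r, ctx, 4));
	return 0;
}
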
+
+/**
+ * Get aged-out flows.
+ *
+ * This function is relevant only if RTE_FLOW_PORT_FLAG_STRICT_QUEUE isn't set.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] contexts
+ *   The address of an array of pointers to the aged-out flows contexts.
+ * @param[in] nb_contexts
+ *   The length of context array pointers.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. Initialized in case of
+ *   error only.
+ *
+ * @return
+ *   The number of aged contexts reported on success, otherwise a negative
+ *   errno value.
+ *   If nb_contexts is 0, return the number of all aged contexts.
+ *   If nb_contexts is not 0, return the number of aged flows reported
+ *   in the context array.
+ */
+static int
+flow_hw_get_aged_flows(struct rte_eth_dev *dev, void **contexts,
+		       uint32_t nb_contexts, struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (priv->hws_strict_queue)
+		DRV_LOG(WARNING,
+			"port %u get aged flows called in strict queue mode.",
+			dev->data->port_id);
+	return flow_hw_get_q_aged_flows(dev, 0, contexts, nb_contexts, error);
+}
+
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.info_get = flow_hw_info_get,
 	.configure = flow_hw_configure,
@@ -6429,12 +7219,14 @@ const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.async_action_create = flow_hw_action_handle_create,
 	.async_action_destroy = flow_hw_action_handle_destroy,
 	.async_action_update = flow_hw_action_handle_update,
-	.action_validate = flow_dv_action_validate,
+	.action_validate = flow_hw_action_validate,
 	.action_create = flow_hw_action_create,
 	.action_destroy = flow_hw_action_destroy,
 	.action_update = flow_hw_action_update,
 	.action_query = flow_hw_action_query,
 	.query = flow_hw_query,
+	.get_aged_flows = flow_hw_get_aged_flows,
+	.get_q_aged_flows = flow_hw_get_q_aged_flows,
 };
 
 /**
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index 7ffaf4c227..81a33ddf09 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -122,7 +122,7 @@ flow_verbs_counter_get_by_idx(struct rte_eth_dev *dev,
 			      struct mlx5_flow_counter_pool **ppool)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_flow_counter_mng *cmng = &priv->sh->cmng;
+	struct mlx5_flow_counter_mng *cmng = &priv->sh->sws_cmng;
 	struct mlx5_flow_counter_pool *pool;
 
 	idx = (idx - 1) & (MLX5_CNT_SHARED_OFFSET - 1);
@@ -215,7 +215,7 @@ static uint32_t
 flow_verbs_counter_new(struct rte_eth_dev *dev, uint32_t id __rte_unused)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_flow_counter_mng *cmng = &priv->sh->cmng;
+	struct mlx5_flow_counter_mng *cmng = &priv->sh->sws_cmng;
 	struct mlx5_flow_counter_pool *pool = NULL;
 	struct mlx5_flow_counter *cnt = NULL;
 	uint32_t n_valid = cmng->n_valid;
diff --git a/drivers/net/mlx5/mlx5_hws_cnt.c b/drivers/net/mlx5/mlx5_hws_cnt.c
index e2408ef36d..6eab58aa9c 100644
--- a/drivers/net/mlx5/mlx5_hws_cnt.c
+++ b/drivers/net/mlx5/mlx5_hws_cnt.c
@@ -8,6 +8,7 @@
 #include <rte_ring.h>
 #include <mlx5_devx_cmds.h>
 #include <rte_cycles.h>
+#include <rte_eal_paging.h>
 
 #if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
 
@@ -26,8 +27,8 @@ __hws_cnt_id_load(struct mlx5_hws_cnt_pool *cpool)
 	uint32_t preload;
 	uint32_t q_num = cpool->cache->q_num;
 	uint32_t cnt_num = mlx5_hws_cnt_pool_get_size(cpool);
-	cnt_id_t cnt_id, iidx = 0;
-	uint32_t qidx;
+	cnt_id_t cnt_id;
+	uint32_t qidx, iidx = 0;
 	struct rte_ring *qcache = NULL;
 
 	/*
@@ -86,6 +87,174 @@ __mlx5_hws_cnt_svc(struct mlx5_dev_ctx_shared *sh,
 	} while (reset_cnt_num > 0);
 }
 
+/**
+ * Release AGE parameter.
+ *
+ * @param priv
+ *   Pointer to the port private data structure.
+ * @param own_cnt_index
+ *   Counter ID created only for this AGE, to be released.
+ *   Zero means there is no such counter.
+ * @param age_ipool
+ *   Pointer to AGE parameter indexed pool.
+ * @param idx
+ *   Index of AGE parameter in the indexed pool.
+ */
+static void
+mlx5_hws_age_param_free(struct mlx5_priv *priv, cnt_id_t own_cnt_index,
+			struct mlx5_indexed_pool *age_ipool, uint32_t idx)
+{
+	if (own_cnt_index) {
+		struct mlx5_hws_cnt_pool *cpool = priv->hws_cpool;
+
+		MLX5_ASSERT(mlx5_hws_cnt_is_shared(cpool, own_cnt_index));
+		mlx5_hws_cnt_shared_put(cpool, &own_cnt_index);
+	}
+	mlx5_ipool_free(age_ipool, idx);
+}
+
+/**
+ * Check and callback event for new aged flow in the HWS counter pool.
+ *
+ * @param[in] priv
+ *   Pointer to port private object.
+ * @param[in] cpool
+ *   Pointer to current counter pool.
+ */
+static void
+mlx5_hws_aging_check(struct mlx5_priv *priv, struct mlx5_hws_cnt_pool *cpool)
+{
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct flow_counter_stats *stats = cpool->raw_mng->raw;
+	struct mlx5_hws_age_param *param;
+	struct rte_ring *r;
+	const uint64_t curr_time = MLX5_CURR_TIME_SEC;
+	const uint32_t time_delta = curr_time - cpool->time_of_last_age_check;
+	uint32_t nb_alloc_cnts = mlx5_hws_cnt_pool_get_size(cpool);
+	uint16_t expected1 = HWS_AGE_CANDIDATE;
+	uint16_t expected2 = HWS_AGE_CANDIDATE_INSIDE_RING;
+	uint32_t i;
+
+	cpool->time_of_last_age_check = curr_time;
+	for (i = 0; i < nb_alloc_cnts; ++i) {
+		uint32_t age_idx = cpool->pool[i].age_idx;
+		uint64_t hits;
+
+		if (!cpool->pool[i].in_used || age_idx == 0)
+			continue;
+		param = mlx5_ipool_get(age_info->ages_ipool, age_idx);
+		if (unlikely(param == NULL)) {
+			/*
+			 * When an AGE uses an indirect counter, it is the
+			 * user's responsibility not to use this indirect
+			 * counter without the AGE.
+			 * If this counter is used after the AGE was freed, the
+			 * AGE index is invalid and using it here will cause a
+			 * segmentation fault.
+			 */
+			DRV_LOG(WARNING,
+				"Counter %u has lost its AGE, it is unused.", i);
+			continue;
+		}
+		if (param->timeout == 0)
+			continue;
+		switch (__atomic_load_n(&param->state, __ATOMIC_RELAXED)) {
+		case HWS_AGE_AGED_OUT_NOT_REPORTED:
+		case HWS_AGE_AGED_OUT_REPORTED:
+			/* Already aged-out, no action is needed. */
+			continue;
+		case HWS_AGE_CANDIDATE:
+		case HWS_AGE_CANDIDATE_INSIDE_RING:
+			/* This AGE is a candidate to age out, check it. */
+			break;
+		case HWS_AGE_FREE:
+			/*
+			 * An AGE parameter in state "FREE" cannot be referenced
+			 * by any counter since the counter is destroyed first.
+			 * Fall-through.
+			 */
+		default:
+			MLX5_ASSERT(0);
+			continue;
+		}
+		hits = rte_be_to_cpu_64(stats[i].hits);
+		if (param->nb_cnts == 1) {
+			if (hits != param->accumulator_last_hits) {
+				__atomic_store_n(&param->sec_since_last_hit, 0,
+						 __ATOMIC_RELAXED);
+				param->accumulator_last_hits = hits;
+				continue;
+			}
+		} else {
+			param->accumulator_hits += hits;
+			param->accumulator_cnt++;
+			if (param->accumulator_cnt < param->nb_cnts)
+				continue;
+			param->accumulator_cnt = 0;
+			if (param->accumulator_last_hits !=
+						param->accumulator_hits) {
+				__atomic_store_n(&param->sec_since_last_hit,
+						 0, __ATOMIC_RELAXED);
+				param->accumulator_last_hits =
+							param->accumulator_hits;
+				param->accumulator_hits = 0;
+				continue;
+			}
+			param->accumulator_hits = 0;
+		}
+		if (__atomic_add_fetch(&param->sec_since_last_hit, time_delta,
+				       __ATOMIC_RELAXED) <=
+		   __atomic_load_n(&param->timeout, __ATOMIC_RELAXED))
+			continue;
+		/* Prepare the relevant ring for this AGE parameter */
+		if (priv->hws_strict_queue)
+			r = age_info->hw_q_age->aged_lists[param->queue_id];
+		else
+			r = age_info->hw_age.aged_list;
+		/* Changing the state atomically and insert it into the ring. */
+		if (__atomic_compare_exchange_n(&param->state, &expected1,
+						HWS_AGE_AGED_OUT_NOT_REPORTED,
+						false, __ATOMIC_RELAXED,
+						__ATOMIC_RELAXED)) {
+			int ret = rte_ring_enqueue_burst_elem(r, &age_idx,
+							      sizeof(uint32_t),
+							      1, NULL);
+
+			/*
+			 * The ring doesn't have enough room for this entry;
+			 * roll the state back and retry next second.
+			 *
+			 * FIXME: if the flow gets traffic before the next
+			 *        second, this "aged out" event is lost; to be
+			 *        fixed later when the ring is filled in bulks.
+			 */
+			expected2 = HWS_AGE_AGED_OUT_NOT_REPORTED;
+			if (ret < 0 &&
+			    !__atomic_compare_exchange_n(&param->state,
+							 &expected2, expected1,
+							 false,
+							 __ATOMIC_RELAXED,
+							 __ATOMIC_RELAXED) &&
+			    expected2 == HWS_AGE_FREE)
+				mlx5_hws_age_param_free(priv,
+							param->own_cnt_index,
+							age_info->ages_ipool,
+							age_idx);
+			/* The event is irrelevant in strict queue mode. */
+			if (!priv->hws_strict_queue)
+				MLX5_AGE_SET(age_info, MLX5_AGE_EVENT_NEW);
+		} else {
+			__atomic_compare_exchange_n(&param->state, &expected2,
+						  HWS_AGE_AGED_OUT_NOT_REPORTED,
+						  false, __ATOMIC_RELAXED,
+						  __ATOMIC_RELAXED);
+		}
+	}
+	/* The event is irrelevant in strict queue mode. */
+	if (!priv->hws_strict_queue)
+		mlx5_age_event_prepare(priv->sh);
+}
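
Stripped of the accumulator and ring handling, the per-counter aging decision
above is: reset the idle time whenever the hit count moved, otherwise grow it
by the polling interval and age the flow out once it exceeds the timeout. A
minimal sketch follows; the sim_* types are hypothetical, not part of this
patch.

/* Illustrative only. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct sim_age_param {
	uint64_t last_hits;
	uint32_t sec_since_last_hit;
	uint32_t timeout;
};

/* Returns true when the flow should be pushed to the aged-out ring. */
static bool
sim_aging_check(struct sim_age_param *p, uint64_t hits, uint32_t time_delta)
{
	if (hits != p->last_hits) {
		p->last_hits = hits;
		p->sec_since_last_hit = 0;
		return false;
	}
	p->sec_since_last_hit += time_delta;
	return p->sec_since_last_hit > p->timeout;
}

int main(void)
{
	struct sim_age_param p = { .last_hits = 100, .timeout = 10 };

	printf("%d\n", sim_aging_check(&p, 150, 5)); /* traffic seen: 0 */
	printf("%d\n", sim_aging_check(&p, 150, 5)); /* idle 5s: 0 */
	printf("%d\n", sim_aging_check(&p, 150, 7)); /* idle 12s > 10s: 1 */
	return 0;
}
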
+
 static void
 mlx5_hws_cnt_raw_data_free(struct mlx5_dev_ctx_shared *sh,
 			   struct mlx5_hws_cnt_raw_data_mng *mng)
@@ -104,12 +273,14 @@ mlx5_hws_cnt_raw_data_alloc(struct mlx5_dev_ctx_shared *sh, uint32_t n)
 	struct mlx5_hws_cnt_raw_data_mng *mng = NULL;
 	int ret;
 	size_t sz = n * sizeof(struct flow_counter_stats);
+	size_t pgsz = rte_mem_page_size();
 
+	MLX5_ASSERT(pgsz > 0);
 	mng = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO, sizeof(*mng), 0,
 			SOCKET_ID_ANY);
 	if (mng == NULL)
 		goto error;
-	mng->raw = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO, sz, 0,
+	mng->raw = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO, sz, pgsz,
 			SOCKET_ID_ANY);
 	if (mng->raw == NULL)
 		goto error;
@@ -146,6 +317,9 @@ mlx5_hws_cnt_svc(void *opaque)
 			    opriv->sh == sh &&
 			    opriv->hws_cpool != NULL) {
 				__mlx5_hws_cnt_svc(sh, opriv->hws_cpool);
+				if (opriv->hws_age_req)
+					mlx5_hws_aging_check(opriv,
+							     opriv->hws_cpool);
 			}
 		}
 		query_cycle = rte_rdtsc() - start_cycle;
@@ -158,8 +332,9 @@ mlx5_hws_cnt_svc(void *opaque)
 }
 
 struct mlx5_hws_cnt_pool *
-mlx5_hws_cnt_pool_init(const struct mlx5_hws_cnt_pool_cfg *pcfg,
-		const struct mlx5_hws_cache_param *ccfg)
+mlx5_hws_cnt_pool_init(struct mlx5_dev_ctx_shared *sh,
+		       const struct mlx5_hws_cnt_pool_cfg *pcfg,
+		       const struct mlx5_hws_cache_param *ccfg)
 {
 	char mz_name[RTE_MEMZONE_NAMESIZE];
 	struct mlx5_hws_cnt_pool *cntp;
@@ -185,16 +360,26 @@ mlx5_hws_cnt_pool_init(const struct mlx5_hws_cnt_pool_cfg *pcfg,
 	cntp->cache->preload_sz = ccfg->preload_sz;
 	cntp->cache->threshold = ccfg->threshold;
 	cntp->cache->q_num = ccfg->q_num;
+	if (pcfg->request_num > sh->hws_max_nb_counters) {
+		DRV_LOG(ERR, "Counter number %u "
+			"is greater than the maximum supported (%u).",
+			pcfg->request_num, sh->hws_max_nb_counters);
+		goto error;
+	}
 	cnt_num = pcfg->request_num * (100 + pcfg->alloc_factor) / 100;
 	if (cnt_num > UINT32_MAX) {
 		DRV_LOG(ERR, "counter number %"PRIu64" is out of 32bit range",
 			cnt_num);
 		goto error;
 	}
+	/*
+	 * When the requested counter number is supported but the allocation
+	 * factor pushes it above the maximum, reduce it back to the maximum.
+	 */
+	cnt_num = RTE_MIN((uint32_t)cnt_num, sh->hws_max_nb_counters);
 	cntp->pool = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO,
-			sizeof(struct mlx5_hws_cnt) *
-			pcfg->request_num * (100 + pcfg->alloc_factor) / 100,
-			0, SOCKET_ID_ANY);
+				 sizeof(struct mlx5_hws_cnt) * cnt_num,
+				 0, SOCKET_ID_ANY);
 	if (cntp->pool == NULL)
 		goto error;
 	snprintf(mz_name, sizeof(mz_name), "%s_F_RING", pcfg->name);
@@ -231,6 +416,8 @@ mlx5_hws_cnt_pool_init(const struct mlx5_hws_cnt_pool_cfg *pcfg,
 		if (cntp->cache->qcache[qidx] == NULL)
 			goto error;
 	}
+	/* Initialize the time for aging-out calculation. */
+	cntp->time_of_last_age_check = MLX5_CURR_TIME_SEC;
 	return cntp;
 error:
 	mlx5_hws_cnt_pool_deinit(cntp);
@@ -297,19 +484,17 @@ mlx5_hws_cnt_pool_dcs_alloc(struct mlx5_dev_ctx_shared *sh,
 			    struct mlx5_hws_cnt_pool *cpool)
 {
 	struct mlx5_hca_attr *hca_attr = &sh->cdev->config.hca_attr;
-	uint32_t max_log_bulk_sz = 0;
+	uint32_t max_log_bulk_sz = sh->hws_max_log_bulk_sz;
 	uint32_t log_bulk_sz;
-	uint32_t idx, alloced = 0;
+	uint32_t idx, alloc_candidate, alloced = 0;
 	unsigned int cnt_num = mlx5_hws_cnt_pool_get_size(cpool);
 	struct mlx5_devx_counter_attr attr = {0};
 	struct mlx5_devx_obj *dcs;
 
 	if (hca_attr->flow_counter_bulk_log_max_alloc == 0) {
-		DRV_LOG(ERR,
-			"Fw doesn't support bulk log max alloc");
+		DRV_LOG(ERR, "Fw doesn't support bulk log max alloc");
 		return -1;
 	}
-	max_log_bulk_sz = 23; /* hard code to 8M (1 << 23). */
 	cnt_num = RTE_ALIGN_CEIL(cnt_num, 4); /* minimal 4 counter in bulk. */
 	log_bulk_sz = RTE_MIN(max_log_bulk_sz, rte_log2_u32(cnt_num));
 	attr.pd = sh->cdev->pdn;
@@ -327,18 +512,23 @@ mlx5_hws_cnt_pool_dcs_alloc(struct mlx5_dev_ctx_shared *sh,
 	cpool->dcs_mng.dcs[0].iidx = 0;
 	alloced = cpool->dcs_mng.dcs[0].batch_sz;
 	if (cnt_num > cpool->dcs_mng.dcs[0].batch_sz) {
-		for (; idx < MLX5_HWS_CNT_DCS_NUM; idx++) {
+		while (idx < MLX5_HWS_CNT_DCS_NUM) {
 			attr.flow_counter_bulk_log_size = --max_log_bulk_sz;
+			alloc_candidate = RTE_BIT32(max_log_bulk_sz);
+			if (alloced + alloc_candidate > sh->hws_max_nb_counters)
+				continue;
 			dcs = mlx5_devx_cmd_flow_counter_alloc_general
 				(sh->cdev->ctx, &attr);
 			if (dcs == NULL)
 				goto error;
 			cpool->dcs_mng.dcs[idx].obj = dcs;
-			cpool->dcs_mng.dcs[idx].batch_sz =
-				(1 << max_log_bulk_sz);
+			cpool->dcs_mng.dcs[idx].batch_sz = alloc_candidate;
 			cpool->dcs_mng.dcs[idx].iidx = alloced;
 			alloced += cpool->dcs_mng.dcs[idx].batch_sz;
 			cpool->dcs_mng.batch_total++;
+			if (alloced >= cnt_num)
+				break;
+			idx++;
 		}
 	}
 	return 0;
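
The allocation loop above splits the requested counter amount into decreasing
power-of-two bulks, skipping bulk sizes that would exceed the device maximum
and stopping once enough counters are allocated. A standalone sketch of that
splitting logic follows; the sim_* names and values are hypothetical, not part
of this patch.

/* Illustrative only. */
#include <stdint.h>
#include <stdio.h>

#define SIM_DCS_NUM 4

static uint32_t
sim_bulk_alloc(uint32_t cnt_num, uint32_t max_total, uint32_t max_log_bulk)
{
	uint32_t alloced = 0;
	uint32_t idx = 0;

	while (idx < SIM_DCS_NUM && alloced < cnt_num && max_log_bulk > 0) {
		uint32_t candidate = 1u << --max_log_bulk;

		if (alloced + candidate > max_total)
			continue; /* try a smaller bulk next iteration */
		printf("bulk[%u] = %u counters\n", idx, candidate);
		alloced += candidate;
		idx++;
	}
	return alloced;
}

int main(void)
{
	/* Request 600 counters, device maximum 1024, first bulk 2^9 = 512. */
	printf("allocated %u counters\n", sim_bulk_alloc(600, 1024, 10));
	return 0;
}
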
@@ -445,7 +635,7 @@ mlx5_hws_cnt_pool_create(struct rte_eth_dev *dev,
 			dev->data->port_id);
 	pcfg.name = mp_name;
 	pcfg.request_num = pattr->nb_counters;
-	cpool = mlx5_hws_cnt_pool_init(&pcfg, &cparam);
+	cpool = mlx5_hws_cnt_pool_init(priv->sh, &pcfg, &cparam);
 	if (cpool == NULL)
 		goto error;
 	ret = mlx5_hws_cnt_pool_dcs_alloc(priv->sh, cpool);
@@ -525,4 +715,484 @@ mlx5_hws_cnt_svc_deinit(struct mlx5_dev_ctx_shared *sh)
 	sh->cnt_svc = NULL;
 }
 
+/**
+ * Destroy AGE action.
+ *
+ * @param priv
+ *   Pointer to the port private data structure.
+ * @param idx
+ *   Index of AGE parameter.
+ * @param error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_hws_age_action_destroy(struct mlx5_priv *priv, uint32_t idx,
+			    struct rte_flow_error *error)
+{
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct mlx5_indexed_pool *ipool = age_info->ages_ipool;
+	struct mlx5_hws_age_param *param = mlx5_ipool_get(ipool, idx);
+
+	if (param == NULL)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  "invalid AGE parameter index");
+	switch (__atomic_exchange_n(&param->state, HWS_AGE_FREE,
+				    __ATOMIC_RELAXED)) {
+	case HWS_AGE_CANDIDATE:
+	case HWS_AGE_AGED_OUT_REPORTED:
+		mlx5_hws_age_param_free(priv, param->own_cnt_index, ipool, idx);
+		break;
+	case HWS_AGE_AGED_OUT_NOT_REPORTED:
+	case HWS_AGE_CANDIDATE_INSIDE_RING:
+		/*
+		 * In both cases the AGE is inside the ring. The state was
+		 * changed above and it will actually be destroyed later,
+		 * when it is taken out of the ring.
+		 */
+		break;
+	case HWS_AGE_FREE:
+		/*
+		 * If the index is valid and the state is FREE, this AGE has
+		 * already been released by the user, but the PMD has not
+		 * freed it yet since it is still inside the ring.
+		 */
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  "this AGE has already been released");
+	default:
+		MLX5_ASSERT(0);
+		break;
+	}
+	return 0;
+}
+
+/**
+ * Create AGE action parameter.
+ *
+ * @param[in] priv
+ *   Pointer to the port private data structure.
+ * @param[in] queue_id
+ *   Which HWS queue to be used.
+ * @param[in] shared
+ *   Whether this is an indirect AGE action.
+ * @param[in] flow_idx
+ *   Flow index from indexed pool.
+ *   Ignored for indirect AGE actions.
+ * @param[in] age
+ *   Pointer to the aging action configuration.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   Index to AGE action parameter on success, 0 otherwise.
+ */
+uint32_t
+mlx5_hws_age_action_create(struct mlx5_priv *priv, uint32_t queue_id,
+			   bool shared, const struct rte_flow_action_age *age,
+			   uint32_t flow_idx, struct rte_flow_error *error)
+{
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct mlx5_indexed_pool *ipool = age_info->ages_ipool;
+	struct mlx5_hws_age_param *param;
+	uint32_t age_idx;
+
+	param = mlx5_ipool_malloc(ipool, &age_idx);
+	if (param == NULL) {
+		rte_flow_error_set(error, ENOMEM,
+				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				   "cannot allocate AGE parameter");
+		return 0;
+	}
+	MLX5_ASSERT(__atomic_load_n(&param->state,
+				    __ATOMIC_RELAXED) == HWS_AGE_FREE);
+	if (shared) {
+		param->nb_cnts = 0;
+		param->accumulator_hits = 0;
+		param->accumulator_cnt = 0;
+		flow_idx = age_idx;
+	} else {
+		param->nb_cnts = 1;
+	}
+	param->context = age->context ? age->context :
+					(void *)(uintptr_t)flow_idx;
+	param->timeout = age->timeout;
+	param->queue_id = queue_id;
+	param->accumulator_last_hits = 0;
+	param->own_cnt_index = 0;
+	param->sec_since_last_hit = 0;
+	param->state = HWS_AGE_CANDIDATE;
+	return age_idx;
+}
+
+/**
+ * Update indirect AGE action parameter.
+ *
+ * @param[in] priv
+ *   Pointer to the port private data structure.
+ * @param[in] idx
+ *   Index of AGE parameter.
+ * @param[in] update
+ *   Update value.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_hws_age_action_update(struct mlx5_priv *priv, uint32_t idx,
+			   const void *update, struct rte_flow_error *error)
+{
+	const struct rte_flow_update_age *update_ade = update;
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct mlx5_indexed_pool *ipool = age_info->ages_ipool;
+	struct mlx5_hws_age_param *param = mlx5_ipool_get(ipool, idx);
+	bool sec_since_last_hit_reset = false;
+	bool state_update = false;
+
+	if (param == NULL)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  "invalid AGE parameter index");
+	if (update_ade->timeout_valid) {
+		uint32_t old_timeout = __atomic_exchange_n(&param->timeout,
+							   update_ade->timeout,
+							   __ATOMIC_RELAXED);
+
+		if (old_timeout == 0)
+			sec_since_last_hit_reset = true;
+		else if (old_timeout < update_ade->timeout ||
+			 update_ade->timeout == 0)
+			/*
+			 * When the timeout is increased, aged-out flows might
+			 * become active again and the state should be updated
+			 * accordingly.
+			 * When the new timeout is 0, update the state so that
+			 * the flow is no longer reported as aged-out.
+			 */
+			state_update = true;
+	}
+	if (update_ade->touch) {
+		sec_since_last_hit_reset = true;
+		state_update = true;
+	}
+	if (sec_since_last_hit_reset)
+		__atomic_store_n(&param->sec_since_last_hit, 0,
+				 __ATOMIC_RELAXED);
+	if (state_update) {
+		uint16_t expected = HWS_AGE_AGED_OUT_NOT_REPORTED;
+
+		/*
+		 * Change states of aged-out flows to active:
+		 *  - AGED_OUT_NOT_REPORTED -> CANDIDATE_INSIDE_RING
+		 *  - AGED_OUT_REPORTED -> CANDIDATE
+		 */
+		if (!__atomic_compare_exchange_n(&param->state, &expected,
+						 HWS_AGE_CANDIDATE_INSIDE_RING,
+						 false, __ATOMIC_RELAXED,
+						 __ATOMIC_RELAXED) &&
+		    expected == HWS_AGE_AGED_OUT_REPORTED)
+			__atomic_store_n(&param->state, HWS_AGE_CANDIDATE,
+					 __ATOMIC_RELAXED);
+	}
+	return 0;
+}
+
+/**
+ * Get the AGE context if the aged-out index is still valid.
+ *
+ * @param priv
+ *   Pointer to the port private data structure.
+ * @param idx
+ *   Index of AGE parameter.
+ *
+ * @return
+ *   AGE context if the index is still aged-out, NULL otherwise.
+ */
+void *
+mlx5_hws_age_context_get(struct mlx5_priv *priv, uint32_t idx)
+{
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct mlx5_indexed_pool *ipool = age_info->ages_ipool;
+	struct mlx5_hws_age_param *param = mlx5_ipool_get(ipool, idx);
+	uint16_t expected = HWS_AGE_AGED_OUT_NOT_REPORTED;
+
+	MLX5_ASSERT(param != NULL);
+	if (__atomic_compare_exchange_n(&param->state, &expected,
+					HWS_AGE_AGED_OUT_REPORTED, false,
+					__ATOMIC_RELAXED, __ATOMIC_RELAXED))
+		return param->context;
+	switch (expected) {
+	case HWS_AGE_FREE:
+		/*
+		 * This AGE could not be destroyed while it was inside the
+		 * ring. Its state has already been updated to FREE, so
+		 * actually destroy it now.
+		 */
+		mlx5_hws_age_param_free(priv, param->own_cnt_index, ipool, idx);
+		break;
+	case HWS_AGE_CANDIDATE_INSIDE_RING:
+		__atomic_store_n(&param->state, HWS_AGE_CANDIDATE,
+				 __ATOMIC_RELAXED);
+		break;
+	case HWS_AGE_CANDIDATE:
+		/*
+		 * Only the background thread pushes to the ring and it never
+		 * pushes this state. When an AGE inside the ring becomes a
+		 * candidate, it gets the dedicated state
+		 * HWS_AGE_CANDIDATE_INSIDE_RING.
+		 * Fall-through.
+		 */
+	case HWS_AGE_AGED_OUT_REPORTED:
+		/*
+		 * Only this thread (doing query) may write this state, and it
+		 * happens only after the query thread takes it out of the ring.
+		 * Fall-through.
+		 */
+	case HWS_AGE_AGED_OUT_NOT_REPORTED:
+		/*
+		 * In this case the compare-exchange above succeeds and the
+		 * function returns the context immediately, so this state
+		 * cannot be reached here.
+		 * Fall-through.
+		 */
+	default:
+		MLX5_ASSERT(0);
+		break;
+	}
+	return NULL;
+}
+
+#ifdef RTE_ARCH_64
+#define MLX5_HWS_AGED_OUT_RING_SIZE_MAX UINT32_MAX
+#else
+#define MLX5_HWS_AGED_OUT_RING_SIZE_MAX RTE_BIT32(8)
+#endif
+
+/**
+ * Get the size of aged out ring list for each queue.
+ *
+ * The size is one percent of nb_counters divided by nb_queues.
+ * The ring size must be a power of 2, so it is aligned up to the next
+ * power of 2.
+ * On 32-bit systems the size is limited to 256.
+ *
+ * This function is called when RTE_FLOW_PORT_FLAG_STRICT_QUEUE is on.
+ *
+ * @param nb_counters
+ *   Final number of allocated counters in the pool.
+ * @param nb_queues
+ *   Number of HWS queues in this port.
+ *
+ * @return
+ *   Size of aged out ring per queue.
+ */
+static __rte_always_inline uint32_t
+mlx5_hws_aged_out_q_ring_size_get(uint32_t nb_counters, uint32_t nb_queues)
+{
+	uint32_t size = rte_align32pow2((nb_counters / 100) / nb_queues);
+	uint32_t max_size = MLX5_HWS_AGED_OUT_RING_SIZE_MAX;
+
+	return RTE_MIN(size, max_size);
+}
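+/*
+ * Illustrative example (numbers are assumptions, not from this patch):
+ * with nb_counters == 1000000 and nb_queues == 8, one percent per queue
+ * is (1000000 / 100) / 8 == 1250, which rte_align32pow2() rounds up to
+ * 2048 entries per queue ring (capped at 256 on 32-bit systems).
+ */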
+
+/**
+ * Get the size of the aged out ring list.
+ *
+ * The size is one percent of nb_counters.
+ * The ring size must be a power of 2, so it is aligned up to the next
+ * power of 2.
+ * On 32-bit systems the size is limited to 256.
+ *
+ * This function is called when RTE_FLOW_PORT_FLAG_STRICT_QUEUE is off.
+ *
+ * @param nb_counters
+ *   Final number of allocated counters in the pool.
+ *
+ * @return
+ *   Size of the aged out ring list.
+ */
+static __rte_always_inline uint32_t
+mlx5_hws_aged_out_ring_size_get(uint32_t nb_counters)
+{
+	uint32_t size = rte_align32pow2(nb_counters / 100);
+	uint32_t max_size = MLX5_HWS_AGED_OUT_RING_SIZE_MAX;
+
+	return RTE_MIN(size, max_size);
+}
+
+/**
+ * Initialize the shared aging list information per port.
+ *
+ * @param dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param nb_queues
+ *   Number of HWS queues.
+ * @param strict_queue
+ *   Indicator whether strict_queue mode is enabled.
+ * @param ring_size
+ *   Size of aged-out ring for creation.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_hws_age_info_init(struct rte_eth_dev *dev, uint16_t nb_queues,
+		       bool strict_queue, uint32_t ring_size)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	uint32_t flags = RING_F_SP_ENQ | RING_F_SC_DEQ | RING_F_EXACT_SZ;
+	char mz_name[RTE_MEMZONE_NAMESIZE];
+	struct rte_ring *r = NULL;
+	uint32_t qidx;
+
+	age_info->flags = 0;
+	if (strict_queue) {
+		size_t size = sizeof(*age_info->hw_q_age) +
+			      sizeof(struct rte_ring *) * nb_queues;
+
+		age_info->hw_q_age = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO,
+						 size, 0, SOCKET_ID_ANY);
+		if (age_info->hw_q_age == NULL)
+			return -ENOMEM;
+		for (qidx = 0; qidx < nb_queues; ++qidx) {
+			snprintf(mz_name, sizeof(mz_name),
+				 "port_%u_queue_%u_aged_out_ring",
+				 dev->data->port_id, qidx);
+			r = rte_ring_create(mz_name, ring_size, SOCKET_ID_ANY,
+					    flags);
+			if (r == NULL) {
+				DRV_LOG(ERR, "\"%s\" creation failed: %s",
+					mz_name, rte_strerror(rte_errno));
+				goto error;
+			}
+			age_info->hw_q_age->aged_lists[qidx] = r;
+			DRV_LOG(DEBUG,
+				"\"%s\" is successfully created (size=%u).",
+				mz_name, ring_size);
+		}
+		age_info->hw_q_age->nb_rings = nb_queues;
+	} else {
+		snprintf(mz_name, sizeof(mz_name), "port_%u_aged_out_ring",
+			 dev->data->port_id);
+		r = rte_ring_create(mz_name, ring_size, SOCKET_ID_ANY, flags);
+		if (r == NULL) {
+			DRV_LOG(ERR, "\"%s\" creation failed: %s", mz_name,
+				rte_strerror(rte_errno));
+			return -rte_errno;
+		}
+		age_info->hw_age.aged_list = r;
+		DRV_LOG(DEBUG, "\"%s\" is successfully created (size=%u).",
+			mz_name, ring_size);
+		/* In non "strict_queue" mode, initialize the event. */
+		MLX5_AGE_SET(age_info, MLX5_AGE_TRIGGER);
+	}
+	return 0;
+error:
+	MLX5_ASSERT(strict_queue);
+	while (qidx--)
+		rte_ring_free(age_info->hw_q_age->aged_lists[qidx]);
+	rte_free(age_info->hw_q_age);
+	return -1;
+}
+
+/**
+ * Destroy the shared aging list information per port.
+ *
+ * @param priv
+ *   Pointer to port private object.
+ */
+static void
+mlx5_hws_age_info_destroy(struct mlx5_priv *priv)
+{
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	uint16_t nb_queues = age_info->hw_q_age->nb_rings;
+
+	if (priv->hws_strict_queue) {
+		uint32_t qidx;
+
+		for (qidx = 0; qidx < nb_queues; ++qidx)
+			rte_ring_free(age_info->hw_q_age->aged_lists[qidx]);
+		rte_free(age_info->hw_q_age);
+	} else {
+		rte_ring_free(age_info->hw_age.aged_list);
+	}
+}
+
+/**
+ * Initialize the aging mechanism per port.
+ *
+ * @param dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param attr
+ *   Port configuration attributes.
+ * @param nb_queues
+ *   Number of HWS queues.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_hws_age_pool_init(struct rte_eth_dev *dev,
+		       const struct rte_flow_port_attr *attr,
+		       uint16_t nb_queues)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct mlx5_indexed_pool_config cfg = {
+		.size =
+		      RTE_CACHE_LINE_ROUNDUP(sizeof(struct mlx5_hws_age_param)),
+		.need_lock = 1,
+		.release_mem_en = !!priv->sh->config.reclaim_mode,
+		.malloc = mlx5_malloc,
+		.free = mlx5_free,
+		.type = "mlx5_hws_age_pool",
+	};
+	bool strict_queue = !!(attr->flags & RTE_FLOW_PORT_FLAG_STRICT_QUEUE);
+	uint32_t nb_alloc_cnts;
+	uint32_t rsize;
+	uint32_t nb_ages_updated;
+	int ret;
+
+	MLX5_ASSERT(priv->hws_cpool);
+	nb_alloc_cnts = mlx5_hws_cnt_pool_get_size(priv->hws_cpool);
+	if (strict_queue) {
+		rsize = mlx5_hws_aged_out_q_ring_size_get(nb_alloc_cnts,
+							  nb_queues);
+		nb_ages_updated = rsize * nb_queues + attr->nb_aging_objects;
+	} else {
+		rsize = mlx5_hws_aged_out_ring_size_get(nb_alloc_cnts);
+		nb_ages_updated = rsize + attr->nb_aging_objects;
+	}
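+	/*
+	 * Illustrative sizing example (numbers are assumptions, not from
+	 * this patch): with strict_queue off and nb_alloc_cnts == 100000,
+	 * rsize is rte_align32pow2(1000) == 1024, so nb_ages_updated is
+	 * 1024 + attr->nb_aging_objects and the ipool trunk size is the
+	 * next power of two above that.
+	 */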
+	ret = mlx5_hws_age_info_init(dev, nb_queues, strict_queue, rsize);
+	if (ret < 0)
+		return ret;
+	cfg.trunk_size = rte_align32pow2(nb_ages_updated);
+	age_info->ages_ipool = mlx5_ipool_create(&cfg);
+	if (age_info->ages_ipool == NULL) {
+		mlx5_hws_age_info_destroy(priv);
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	priv->hws_age_req = 1;
+	return 0;
+}
+
+/**
+ * Cleanup all aging resources per port.
+ *
+ * @param priv
+ *   Pointer to port private object.
+ */
+void
+mlx5_hws_age_pool_destroy(struct mlx5_priv *priv)
+{
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+
+	MLX5_ASSERT(priv->hws_age_req);
+	mlx5_ipool_destroy(age_info->ages_ipool);
+	age_info->ages_ipool = NULL;
+	mlx5_hws_age_info_destroy(priv);
+	priv->hws_age_req = 0;
+}
+
 #endif
diff --git a/drivers/net/mlx5/mlx5_hws_cnt.h b/drivers/net/mlx5/mlx5_hws_cnt.h
index 5fab4ba597..e311923f71 100644
--- a/drivers/net/mlx5/mlx5_hws_cnt.h
+++ b/drivers/net/mlx5/mlx5_hws_cnt.h
@@ -10,26 +10,26 @@
 #include "mlx5_flow.h"
 
 /*
- * COUNTER ID's layout
+ * HWS COUNTER ID's layout
  *       3                   2                   1                   0
  *     1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
  *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- *    | T |       | D |                                               |
- *    ~ Y |       | C |                    IDX                        ~
- *    | P |       | S |                                               |
+ *    |  T  |     | D |                                               |
+ *    ~  Y  |     | C |                    IDX                        ~
+ *    |  P  |     | S |                                               |
  *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  *
- *    Bit 31:30 = TYPE = MLX5_INDIRECT_ACTION_TYPE_COUNT = b'10
+ *    Bit 31:29 = TYPE = MLX5_INDIRECT_ACTION_TYPE_COUNT = b'10
  *    Bit 25:24 = DCS index
  *    Bit 23:00 = IDX in this counter belonged DCS bulk.
  */
-typedef uint32_t cnt_id_t;
 
-#define MLX5_HWS_CNT_DCS_NUM 4
 #define MLX5_HWS_CNT_DCS_IDX_OFFSET 24
 #define MLX5_HWS_CNT_DCS_IDX_MASK 0x3
 #define MLX5_HWS_CNT_IDX_MASK ((1UL << MLX5_HWS_CNT_DCS_IDX_OFFSET) - 1)
 
+#define MLX5_HWS_AGE_IDX_MASK (RTE_BIT32(MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1)
+
 struct mlx5_hws_cnt_dcs {
 	void *dr_action;
 	uint32_t batch_sz;
@@ -44,12 +44,22 @@ struct mlx5_hws_cnt_dcs_mng {
 
 struct mlx5_hws_cnt {
 	struct flow_counter_stats reset;
+	bool in_used; /* Whether this counter is in use or back in the pool. */
 	union {
-		uint32_t share: 1;
-		/*
-		 * share will be set to 1 when this counter is used as indirect
-		 * action. Only meaningful when user own this counter.
-		 */
+		struct {
+			uint32_t share:1;
+			/*
+			 * share will be set to 1 when this counter is used as
+			 * indirect action.
+			 */
+			uint32_t age_idx:24;
+			/*
+			 * When this counter is used for aging, it stores the
+			 * index of the AGE parameter. For a pure counter
+			 * (without aging) this index is zero.
+			 */
+		};
+		/* This struct is only meaningful when the user owns this counter. */
 		uint32_t query_gen_when_free;
 		/*
 		 * When PMD own this counter (user put back counter to PMD
@@ -96,8 +106,48 @@ struct mlx5_hws_cnt_pool {
 	struct rte_ring *free_list;
 	struct rte_ring *wait_reset_list;
 	struct mlx5_hws_cnt_pool_caches *cache;
+	uint64_t time_of_last_age_check;
 } __rte_cache_aligned;
 
+/* HWS AGE status. */
+enum {
+	HWS_AGE_FREE, /* Initialized state. */
+	HWS_AGE_CANDIDATE, /* AGE assigned to flows. */
+	HWS_AGE_CANDIDATE_INSIDE_RING,
+	/*
+	 * AGE assigned to flows but still inside the ring. It was aged-out,
+	 * but the timeout was changed, so it is a candidate again although
+	 * it is still inside the ring.
+	 */
+	HWS_AGE_AGED_OUT_REPORTED,
+	/*
+	 * Aged-out, reported by rte_flow_get_q_aged_flows, waiting for destroy.
+	 */
+	HWS_AGE_AGED_OUT_NOT_REPORTED,
+	/*
+	 * Aged-out, inside the aged-out ring.
+	 * Waiting for rte_flow_get_q_aged_flows and destroy.
+	 */
+};
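+/*
+ * Expected state transitions, as can be inferred from the handlers in
+ * mlx5_hws_cnt.c (informational sketch only):
+ *   FREE -> CANDIDATE                               action create
+ *   CANDIDATE -> AGED_OUT_NOT_REPORTED              aging check, pushed to ring
+ *   AGED_OUT_NOT_REPORTED -> AGED_OUT_REPORTED      rte_flow_get_q_aged_flows
+ *   AGED_OUT_NOT_REPORTED -> CANDIDATE_INSIDE_RING  timeout update / touch
+ *   AGED_OUT_REPORTED -> CANDIDATE                  timeout update / touch
+ *   any state -> FREE                               action destroy
+ */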
+
+/* HWS counter age parameter. */
+struct mlx5_hws_age_param {
+	uint32_t timeout; /* Aging timeout in seconds (atomically accessed). */
+	uint32_t sec_since_last_hit;
+	/* Time in seconds since last hit (atomically accessed). */
+	uint16_t state; /* AGE state (atomically accessed). */
+	uint64_t accumulator_last_hits;
+	/* Last total value of hits for comparing. */
+	uint64_t accumulator_hits;
+	/* Accumulator for hits coming from several counters. */
+	uint32_t accumulator_cnt;
+	/* Number of counters which already updated the accumulator this second. */
+	uint32_t nb_cnts; /* Number of counters used by this AGE. */
+	uint32_t queue_id; /* Queue id of the counter. */
+	cnt_id_t own_cnt_index;
+	/* Counter action created specifically for this AGE action. */
+	void *context; /* Flow AGE context. */
+} __rte_packed __rte_cache_aligned;
+
 /**
  * Translate counter id into internal index (start from 0), which can be used
  * as index of raw/cnt pool.
@@ -107,7 +157,7 @@ struct mlx5_hws_cnt_pool {
  * @return
  *   Internal index
  */
-static __rte_always_inline cnt_id_t
+static __rte_always_inline uint32_t
 mlx5_hws_cnt_iidx(struct mlx5_hws_cnt_pool *cpool, cnt_id_t cnt_id)
 {
 	uint8_t dcs_idx = cnt_id >> MLX5_HWS_CNT_DCS_IDX_OFFSET;
@@ -139,7 +189,7 @@ mlx5_hws_cnt_id_valid(cnt_id_t cnt_id)
  *   Counter id
  */
 static __rte_always_inline cnt_id_t
-mlx5_hws_cnt_id_gen(struct mlx5_hws_cnt_pool *cpool, cnt_id_t iidx)
+mlx5_hws_cnt_id_gen(struct mlx5_hws_cnt_pool *cpool, uint32_t iidx)
 {
 	struct mlx5_hws_cnt_dcs_mng *dcs_mng = &cpool->dcs_mng;
 	uint32_t idx;
@@ -344,9 +394,10 @@ mlx5_hws_cnt_pool_put(struct mlx5_hws_cnt_pool *cpool,
 	struct rte_ring_zc_data zcdr = {0};
 	struct rte_ring *qcache = NULL;
 	unsigned int wb_num = 0; /* cache write-back number. */
-	cnt_id_t iidx;
+	uint32_t iidx;
 
 	iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
+	cpool->pool[iidx].in_used = false;
 	cpool->pool[iidx].query_gen_when_free =
 		__atomic_load_n(&cpool->query_gen, __ATOMIC_RELAXED);
 	if (likely(queue != NULL))
@@ -388,20 +439,23 @@ mlx5_hws_cnt_pool_put(struct mlx5_hws_cnt_pool *cpool,
  *   A pointer to HWS queue. If null, it means fetch from common pool.
  * @param cnt_id
  *   A pointer to a cnt_id_t * pointer (counter id) that will be filled.
+ * @param age_idx
+ *   Index of AGE parameter using this counter, zero means there is no such AGE.
+ *
  * @return
  *   - 0: Success; objects taken.
  *   - -ENOENT: Not enough entries in the mempool; no object is retrieved.
  *   - -EAGAIN: counter is not ready; try again.
  */
 static __rte_always_inline int
-mlx5_hws_cnt_pool_get(struct mlx5_hws_cnt_pool *cpool,
-		uint32_t *queue, cnt_id_t *cnt_id)
+mlx5_hws_cnt_pool_get(struct mlx5_hws_cnt_pool *cpool, uint32_t *queue,
+		      cnt_id_t *cnt_id, uint32_t age_idx)
 {
 	unsigned int ret;
 	struct rte_ring_zc_data zcdc = {0};
 	struct rte_ring *qcache = NULL;
-	uint32_t query_gen = 0;
-	cnt_id_t iidx, tmp_cid = 0;
+	uint32_t iidx, query_gen = 0;
+	cnt_id_t tmp_cid = 0;
 
 	if (likely(queue != NULL))
 		qcache = cpool->cache->qcache[*queue];
@@ -422,6 +476,8 @@ mlx5_hws_cnt_pool_get(struct mlx5_hws_cnt_pool *cpool,
 		__hws_cnt_query_raw(cpool, *cnt_id,
 				    &cpool->pool[iidx].reset.hits,
 				    &cpool->pool[iidx].reset.bytes);
+		cpool->pool[iidx].in_used = true;
+		cpool->pool[iidx].age_idx = age_idx;
 		return 0;
 	}
 	ret = rte_ring_dequeue_zc_burst_elem_start(qcache, sizeof(cnt_id_t), 1,
@@ -455,6 +511,8 @@ mlx5_hws_cnt_pool_get(struct mlx5_hws_cnt_pool *cpool,
 			    &cpool->pool[iidx].reset.bytes);
 	rte_ring_dequeue_zc_elem_finish(qcache, 1);
 	cpool->pool[iidx].share = 0;
+	cpool->pool[iidx].in_used = true;
+	cpool->pool[iidx].age_idx = age_idx;
 	return 0;
 }
 
@@ -478,16 +536,16 @@ mlx5_hws_cnt_pool_get_action_offset(struct mlx5_hws_cnt_pool *cpool,
 }
 
 static __rte_always_inline int
-mlx5_hws_cnt_shared_get(struct mlx5_hws_cnt_pool *cpool, cnt_id_t *cnt_id)
+mlx5_hws_cnt_shared_get(struct mlx5_hws_cnt_pool *cpool, cnt_id_t *cnt_id,
+			uint32_t age_idx)
 {
 	int ret;
 	uint32_t iidx;
 
-	ret = mlx5_hws_cnt_pool_get(cpool, NULL, cnt_id);
+	ret = mlx5_hws_cnt_pool_get(cpool, NULL, cnt_id, age_idx);
 	if (ret != 0)
 		return ret;
 	iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
-	MLX5_ASSERT(cpool->pool[iidx].share == 0);
 	cpool->pool[iidx].share = 1;
 	return 0;
 }
@@ -513,10 +571,73 @@ mlx5_hws_cnt_is_shared(struct mlx5_hws_cnt_pool *cpool, cnt_id_t cnt_id)
 	return cpool->pool[iidx].share ? true : false;
 }
 
+static __rte_always_inline void
+mlx5_hws_cnt_age_set(struct mlx5_hws_cnt_pool *cpool, cnt_id_t cnt_id,
+		     uint32_t age_idx)
+{
+	uint32_t iidx = mlx5_hws_cnt_iidx(cpool, cnt_id);
+
+	MLX5_ASSERT(cpool->pool[iidx].share);
+	cpool->pool[iidx].age_idx = age_idx;
+}
+
+static __rte_always_inline uint32_t
+mlx5_hws_cnt_age_get(struct mlx5_hws_cnt_pool *cpool, cnt_id_t cnt_id)
+{
+	uint32_t iidx = mlx5_hws_cnt_iidx(cpool, cnt_id);
+
+	MLX5_ASSERT(cpool->pool[iidx].share);
+	return cpool->pool[iidx].age_idx;
+}
+
+static __rte_always_inline cnt_id_t
+mlx5_hws_age_cnt_get(struct mlx5_priv *priv, struct mlx5_hws_age_param *param,
+		     uint32_t age_idx)
+{
+	if (!param->own_cnt_index) {
+		/* Create indirect counter one for internal usage. */
+		if (mlx5_hws_cnt_shared_get(priv->hws_cpool,
+					    &param->own_cnt_index, age_idx) < 0)
+			return 0;
+		param->nb_cnts++;
+	}
+	return param->own_cnt_index;
+}
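+/*
+ * Note: the helper above lazily allocates a dedicated shared counter
+ * (own_cnt_index) for the AGE parameter the first time one is needed and
+ * accounts for it in nb_cnts.
+ */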
+
+static __rte_always_inline void
+mlx5_hws_age_nb_cnt_increase(struct mlx5_priv *priv, uint32_t age_idx)
+{
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct mlx5_indexed_pool *ipool = age_info->ages_ipool;
+	struct mlx5_hws_age_param *param = mlx5_ipool_get(ipool, age_idx);
+
+	MLX5_ASSERT(param != NULL);
+	param->nb_cnts++;
+}
+
+static __rte_always_inline void
+mlx5_hws_age_nb_cnt_decrease(struct mlx5_priv *priv, uint32_t age_idx)
+{
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct mlx5_indexed_pool *ipool = age_info->ages_ipool;
+	struct mlx5_hws_age_param *param = mlx5_ipool_get(ipool, age_idx);
+
+	if (param != NULL)
+		param->nb_cnts--;
+}
+
+static __rte_always_inline bool
+mlx5_hws_age_is_indirect(uint32_t age_idx)
+{
+	return (age_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET) ==
+		MLX5_INDIRECT_ACTION_TYPE_AGE ? true : false;
+}
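+/*
+ * Illustrative note (based on the MLX5_INDIRECT_ACTION_TYPE_* encoding):
+ * an indirect AGE handle is expected to carry the action type in its top
+ * bits, e.g. (MLX5_INDIRECT_ACTION_TYPE_AGE <<
+ * MLX5_INDIRECT_ACTION_TYPE_OFFSET) | age_idx, which is what the helper
+ * above checks and what MLX5_HWS_AGE_IDX_MASK strips off.
+ */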
+
 /* init HWS counter pool. */
 struct mlx5_hws_cnt_pool *
-mlx5_hws_cnt_pool_init(const struct mlx5_hws_cnt_pool_cfg *pcfg,
-		const struct mlx5_hws_cache_param *ccfg);
+mlx5_hws_cnt_pool_init(struct mlx5_dev_ctx_shared *sh,
+		       const struct mlx5_hws_cnt_pool_cfg *pcfg,
+		       const struct mlx5_hws_cache_param *ccfg);
 
 void
 mlx5_hws_cnt_pool_deinit(struct mlx5_hws_cnt_pool *cntp);
@@ -555,4 +676,28 @@ mlx5_hws_cnt_svc_init(struct mlx5_dev_ctx_shared *sh);
 void
 mlx5_hws_cnt_svc_deinit(struct mlx5_dev_ctx_shared *sh);
 
+int
+mlx5_hws_age_action_destroy(struct mlx5_priv *priv, uint32_t idx,
+			    struct rte_flow_error *error);
+
+uint32_t
+mlx5_hws_age_action_create(struct mlx5_priv *priv, uint32_t queue_id,
+			   bool shared, const struct rte_flow_action_age *age,
+			   uint32_t flow_idx, struct rte_flow_error *error);
+
+int
+mlx5_hws_age_action_update(struct mlx5_priv *priv, uint32_t idx,
+			   const void *update, struct rte_flow_error *error);
+
+void *
+mlx5_hws_age_context_get(struct mlx5_priv *priv, uint32_t idx);
+
+int
+mlx5_hws_age_pool_init(struct rte_eth_dev *dev,
+		       const struct rte_flow_port_attr *attr,
+		       uint16_t nb_queues);
+
+void
+mlx5_hws_age_pool_destroy(struct mlx5_priv *priv);
+
 #endif /* _MLX5_HWS_CNT_H_ */
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v3 14/17] net/mlx5: add async action push and pull support
  2022-09-30 12:52 ` [PATCH v3 00/17] net/mlx5: HW steering PMD update Suanming Mou
                     ` (12 preceding siblings ...)
  2022-09-30 12:53   ` [PATCH v3 13/17] net/mlx5: add HWS AGE action support Suanming Mou
@ 2022-09-30 12:53   ` Suanming Mou
  2022-09-30 12:53   ` [PATCH v3 15/17] net/mlx5: support flow integrity in HWS group 0 Suanming Mou
                     ` (2 subsequent siblings)
  16 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-30 12:53 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika

The queue-based rte_flow_async_action_* functions work the same way as
the queue-based async flow functions: the operations can be pushed
asynchronously, and so can the pull.

This commit adds the missing push and pull support for async actions.
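
As a reference, a minimal usage sketch of the queue-based calls follows
(the port/queue numbers, burst size and variable names below are
illustrative assumptions, not part of this patch):

	struct rte_flow_op_attr op_attr = { .postpone = 1 };
	struct rte_flow_op_result res[32];
	struct rte_flow_error error;
	int n;

	/* Enqueue an indirect action update on queue 0, postponing the doorbell. */
	rte_flow_async_action_handle_update(port_id, 0, &op_attr, handle,
					    &update_conf, user_data, &error);
	/* Push all postponed operations of queue 0 to the hardware. */
	rte_flow_push(port_id, 0, &error);
	/* Poll the queue; flow and indirect action completions share it. */
	n = rte_flow_pull(port_id, 0, res, RTE_DIM(res), &error);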

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/net/mlx5/mlx5.h            |  62 ++++-
 drivers/net/mlx5/mlx5_flow.c       |  45 ++++
 drivers/net/mlx5/mlx5_flow.h       |  17 ++
 drivers/net/mlx5/mlx5_flow_aso.c   | 181 +++++++++++--
 drivers/net/mlx5/mlx5_flow_dv.c    |   7 +-
 drivers/net/mlx5/mlx5_flow_hw.c    | 412 +++++++++++++++++++++++++----
 drivers/net/mlx5/mlx5_flow_meter.c |   6 +-
 7 files changed, 626 insertions(+), 104 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index c83157d0da..f6033710aa 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -341,6 +341,8 @@ struct mlx5_lb_ctx {
 enum {
 	MLX5_HW_Q_JOB_TYPE_CREATE, /* Flow create job type. */
 	MLX5_HW_Q_JOB_TYPE_DESTROY, /* Flow destroy job type. */
+	MLX5_HW_Q_JOB_TYPE_UPDATE, /* Update job type. */
+	MLX5_HW_Q_JOB_TYPE_QUERY, /* Query job type. */
 };
 
 #define MLX5_HW_MAX_ITEMS (16)
@@ -348,12 +350,23 @@ enum {
 /* HW steering flow management job descriptor. */
 struct mlx5_hw_q_job {
 	uint32_t type; /* Job type. */
-	struct rte_flow_hw *flow; /* Flow attached to the job. */
+	union {
+		struct rte_flow_hw *flow; /* Flow attached to the job. */
+		const void *action; /* Indirect action attached to the job. */
+	};
 	void *user_data; /* Job user data. */
 	uint8_t *encap_data; /* Encap data. */
 	struct mlx5_modification_cmd *mhdr_cmd;
 	struct rte_flow_item *items;
-	struct rte_flow_item_ethdev port_spec;
+	union {
+		struct {
+			/* Pointer to ct query user memory. */
+			struct rte_flow_action_conntrack *profile;
+			/* Pointer to ct ASO query out memory. */
+			void *out_data;
+		} __rte_packed;
+		struct rte_flow_item_ethdev port_spec;
+	} __rte_packed;
 };
 
 /* HW steering job descriptor LIFO pool. */
@@ -361,6 +374,8 @@ struct mlx5_hw_q {
 	uint32_t job_idx; /* Free job index. */
 	uint32_t size; /* LIFO size. */
 	struct mlx5_hw_q_job **job; /* LIFO header. */
+	struct rte_ring *indir_cq; /* Indirect action SW completion queue. */
+	struct rte_ring *indir_iq; /* Indirect action SW in progress queue. */
 } __rte_cache_aligned;
 
 
@@ -569,6 +584,7 @@ struct mlx5_aso_sq_elem {
 			struct mlx5_aso_ct_action *ct;
 			char *query_data;
 		};
+		void *user_data; /* Completion user data for async mode. */
 	};
 };
 
@@ -578,7 +594,9 @@ struct mlx5_aso_sq {
 	struct mlx5_aso_cq cq;
 	struct mlx5_devx_sq sq_obj;
 	struct mlx5_pmd_mr mr;
+	volatile struct mlx5_aso_wqe *db; /* Last WQE for the pending doorbell. */
 	uint16_t pi;
+	uint16_t db_pi; /* Producer index at the last doorbell ring. */
 	uint32_t head;
 	uint32_t tail;
 	uint32_t sqn;
@@ -993,6 +1011,7 @@ struct mlx5_flow_meter_profile {
 enum mlx5_aso_mtr_state {
 	ASO_METER_FREE, /* In free list. */
 	ASO_METER_WAIT, /* ACCESS_ASO WQE in progress. */
+	ASO_METER_WAIT_ASYNC, /* CQE will be handled by async pull. */
 	ASO_METER_READY, /* CQE received. */
 };
 
@@ -1195,6 +1214,7 @@ struct mlx5_bond_info {
 enum mlx5_aso_ct_state {
 	ASO_CONNTRACK_FREE, /* Inactive, in the free list. */
 	ASO_CONNTRACK_WAIT, /* WQE sent in the SQ. */
+	ASO_CONNTRACK_WAIT_ASYNC, /* CQE will be handled by async pull. */
 	ASO_CONNTRACK_READY, /* CQE received w/o error. */
 	ASO_CONNTRACK_QUERY, /* WQE for query sent. */
 	ASO_CONNTRACK_MAX, /* Guard. */
@@ -1203,13 +1223,21 @@ enum mlx5_aso_ct_state {
 /* Generic ASO connection tracking structure. */
 struct mlx5_aso_ct_action {
 	union {
-		LIST_ENTRY(mlx5_aso_ct_action) next;
-		/* Pointer to the next ASO CT. Used only in SWS. */
-		struct mlx5_aso_ct_pool *pool;
-		/* Pointer to action pool. Used only in HWS. */
+		/* SWS mode struct. */
+		struct {
+			/* Pointer to the next ASO CT. Used only in SWS. */
+			LIST_ENTRY(mlx5_aso_ct_action) next;
+		};
+		/* HWS mode struct. */
+		struct {
+			/* Pointer to action pool. Used only in HWS. */
+			struct mlx5_aso_ct_pool *pool;
+		};
 	};
-	void *dr_action_orig; /* General action object for original dir. */
-	void *dr_action_rply; /* General action object for reply dir. */
+	/* General action object for original dir. */
+	void *dr_action_orig;
+	/* General action object for reply dir. */
+	void *dr_action_rply;
 	uint32_t refcnt; /* Action used count in device flows. */
 	uint16_t offset; /* Offset of ASO CT in DevX objects bulk. */
 	uint16_t peer; /* The only peer port index could also use this CT. */
@@ -2135,18 +2163,21 @@ int mlx5_aso_flow_hit_queue_poll_stop(struct mlx5_dev_ctx_shared *sh);
 void mlx5_aso_queue_uninit(struct mlx5_dev_ctx_shared *sh,
 			   enum mlx5_access_aso_opc_mod aso_opc_mod);
 int mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
-				 struct mlx5_aso_mtr *mtr,
-				 struct mlx5_mtr_bulk *bulk);
+		struct mlx5_aso_mtr *mtr, struct mlx5_mtr_bulk *bulk,
+		void *user_data, bool push);
 int mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 		struct mlx5_aso_mtr *mtr);
 int mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			      struct mlx5_aso_ct_action *ct,
-			      const struct rte_flow_action_conntrack *profile);
+			      const struct rte_flow_action_conntrack *profile,
+			      void *user_data,
+			      bool push);
 int mlx5_aso_ct_wait_ready(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			   struct mlx5_aso_ct_action *ct);
 int mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			     struct mlx5_aso_ct_action *ct,
-			     struct rte_flow_action_conntrack *profile);
+			     struct rte_flow_action_conntrack *profile,
+			     void *user_data, bool push);
 int mlx5_aso_ct_available(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			  struct mlx5_aso_ct_action *ct);
 uint32_t
@@ -2154,6 +2185,13 @@ mlx5_get_supported_sw_parsing_offloads(const struct mlx5_hca_attr *attr);
 uint32_t
 mlx5_get_supported_tunneling_offloads(const struct mlx5_hca_attr *attr);
 
+void mlx5_aso_ct_obj_analyze(struct rte_flow_action_conntrack *profile,
+			     char *wdata);
+void mlx5_aso_push_wqe(struct mlx5_dev_ctx_shared *sh,
+		       struct mlx5_aso_sq *sq);
+int mlx5_aso_pull_completion(struct mlx5_aso_sq *sq,
+			     struct rte_flow_op_result res[],
+			     uint16_t n_res);
 int mlx5_aso_cnt_queue_init(struct mlx5_dev_ctx_shared *sh);
 void mlx5_aso_cnt_queue_uninit(struct mlx5_dev_ctx_shared *sh);
 int mlx5_aso_cnt_query(struct mlx5_dev_ctx_shared *sh,
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 4bfa604578..bc2ccb4d3c 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -979,6 +979,14 @@ mlx5_flow_async_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 				  void *user_data,
 				  struct rte_flow_error *error);
 
+static int
+mlx5_flow_async_action_handle_query(struct rte_eth_dev *dev, uint32_t queue,
+				 const struct rte_flow_op_attr *attr,
+				 const struct rte_flow_action_handle *handle,
+				 void *data,
+				 void *user_data,
+				 struct rte_flow_error *error);
+
 static const struct rte_flow_ops mlx5_flow_ops = {
 	.validate = mlx5_flow_validate,
 	.create = mlx5_flow_create,
@@ -1015,6 +1023,7 @@ static const struct rte_flow_ops mlx5_flow_ops = {
 	.push = mlx5_flow_push,
 	.async_action_handle_create = mlx5_flow_async_action_handle_create,
 	.async_action_handle_update = mlx5_flow_async_action_handle_update,
+	.async_action_handle_query = mlx5_flow_async_action_handle_query,
 	.async_action_handle_destroy = mlx5_flow_async_action_handle_destroy,
 };
 
@@ -8858,6 +8867,42 @@ mlx5_flow_async_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 					 update, user_data, error);
 }
 
+/**
+ * Query shared action.
+ *
+ * @param[in] dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param[in] queue
+ *   Which queue to be used.
+ * @param[in] attr
+ *   Operation attribute.
+ * @param[in] handle
+ *   Action handle to be queried.
+ * @param[in] data
+ *   Pointer to the query result data.
+ * @param[in] user_data
+ *   Pointer to the user_data.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, negative value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_async_action_handle_query(struct rte_eth_dev *dev, uint32_t queue,
+				    const struct rte_flow_op_attr *attr,
+				    const struct rte_flow_action_handle *handle,
+				    void *data,
+				    void *user_data,
+				    struct rte_flow_error *error)
+{
+	const struct mlx5_flow_driver_ops *fops =
+			flow_get_drv_ops(MLX5_FLOW_TYPE_HW);
+
+	return fops->async_action_query(dev, queue, attr, handle,
+					data, user_data, error);
+}
+
 /**
  * Destroy shared action.
  *
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 30a18ea35e..e45869a890 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -57,6 +57,13 @@ enum mlx5_rte_flow_field_id {
 
 #define MLX5_INDIRECT_ACTION_TYPE_OFFSET 29
 
+#define MLX5_INDIRECT_ACTION_TYPE_GET(handle) \
+	(((uint32_t)(uintptr_t)(handle)) >> MLX5_INDIRECT_ACTION_TYPE_OFFSET)
+
+#define MLX5_INDIRECT_ACTION_IDX_GET(handle) \
+	(((uint32_t)(uintptr_t)(handle)) & \
+	 ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1))
+
 enum {
 	MLX5_INDIRECT_ACTION_TYPE_RSS,
 	MLX5_INDIRECT_ACTION_TYPE_AGE,
@@ -1816,6 +1823,15 @@ typedef int (*mlx5_flow_async_action_handle_update_t)
 			 void *user_data,
 			 struct rte_flow_error *error);
 
+typedef int (*mlx5_flow_async_action_handle_query_t)
+			(struct rte_eth_dev *dev,
+			 uint32_t queue,
+			 const struct rte_flow_op_attr *attr,
+			 const struct rte_flow_action_handle *handle,
+			 void *data,
+			 void *user_data,
+			 struct rte_flow_error *error);
+
 typedef int (*mlx5_flow_async_action_handle_destroy_t)
 			(struct rte_eth_dev *dev,
 			 uint32_t queue,
@@ -1878,6 +1894,7 @@ struct mlx5_flow_driver_ops {
 	mlx5_flow_push_t push;
 	mlx5_flow_async_action_handle_create_t async_action_create;
 	mlx5_flow_async_action_handle_update_t async_action_update;
+	mlx5_flow_async_action_handle_query_t async_action_query;
 	mlx5_flow_async_action_handle_destroy_t async_action_destroy;
 };
 
diff --git a/drivers/net/mlx5/mlx5_flow_aso.c b/drivers/net/mlx5/mlx5_flow_aso.c
index f371fff2e2..43ef893e9d 100644
--- a/drivers/net/mlx5/mlx5_flow_aso.c
+++ b/drivers/net/mlx5/mlx5_flow_aso.c
@@ -519,6 +519,70 @@ mlx5_aso_cqe_err_handle(struct mlx5_aso_sq *sq)
 			       (volatile uint32_t *)&sq->sq_obj.aso_wqes[idx]);
 }
 
+int
+mlx5_aso_pull_completion(struct mlx5_aso_sq *sq,
+			 struct rte_flow_op_result res[],
+			 uint16_t n_res)
+{
+	struct mlx5_aso_cq *cq = &sq->cq;
+	volatile struct mlx5_cqe *restrict cqe;
+	const uint32_t cq_size = 1 << cq->log_desc_n;
+	const uint32_t mask = cq_size - 1;
+	uint32_t idx;
+	uint32_t next_idx;
+	uint16_t max;
+	uint16_t n = 0;
+	int ret;
+
+	max = (uint16_t)(sq->head - sq->tail);
+	if (unlikely(!max || !n_res))
+		return 0;
+	next_idx = cq->cq_ci & mask;
+	do {
+		idx = next_idx;
+		next_idx = (cq->cq_ci + 1) & mask;
+		/* Need to confirm the position of the prefetch. */
+		rte_prefetch0(&cq->cq_obj.cqes[next_idx]);
+		cqe = &cq->cq_obj.cqes[idx];
+		ret = check_cqe(cqe, cq_size, cq->cq_ci);
+		/*
+		 * Be sure owner read is done before any other cookie field or
+		 * opaque field.
+		 */
+		rte_io_rmb();
+		if (ret == MLX5_CQE_STATUS_HW_OWN)
+			break;
+		res[n].user_data = sq->elts[(uint16_t)((sq->tail + n) & mask)].user_data;
+		if (unlikely(ret != MLX5_CQE_STATUS_SW_OWN)) {
+			mlx5_aso_cqe_err_handle(sq);
+			res[n].status = RTE_FLOW_OP_ERROR;
+		} else {
+			res[n].status = RTE_FLOW_OP_SUCCESS;
+		}
+		cq->cq_ci++;
+		if (++n == n_res)
+			break;
+	} while (1);
+	if (likely(n)) {
+		sq->tail += n;
+		rte_io_wmb();
+		cq->cq_obj.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci);
+	}
+	return n;
+}
+
+void
+mlx5_aso_push_wqe(struct mlx5_dev_ctx_shared *sh,
+		  struct mlx5_aso_sq *sq)
+{
+	if (sq->db_pi == sq->pi)
+		return;
+	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)sq->db,
+			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
+			   !sh->tx_uar.dbnc);
+	sq->db_pi = sq->pi;
+}
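+/*
+ * Note: when the enqueue helpers are called with push == false they only
+ * record the last WQE in sq->db and leave sq->db_pi behind sq->pi;
+ * mlx5_aso_push_wqe() above then rings a single doorbell for the whole
+ * batch when the application calls the push operation.
+ */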
+
 /**
  * Update ASO objects upon completion.
  *
@@ -728,7 +792,9 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 			       struct mlx5_aso_sq *sq,
 			       struct mlx5_aso_mtr *aso_mtr,
 			       struct mlx5_mtr_bulk *bulk,
-				   bool need_lock)
+			       bool need_lock,
+			       void *user_data,
+			       bool push)
 {
 	volatile struct mlx5_aso_wqe *wqe = NULL;
 	struct mlx5_flow_meter_info *fm = NULL;
@@ -754,7 +820,7 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	rte_prefetch0(&sq->sq_obj.aso_wqes[(sq->head + 1) & mask]);
 	/* Fill next WQE. */
 	fm = &aso_mtr->fm;
-	sq->elts[sq->head & mask].mtr = aso_mtr;
+	sq->elts[sq->head & mask].mtr = user_data ? user_data : aso_mtr;
 	if (aso_mtr->type == ASO_METER_INDIRECT) {
 		if (likely(sh->config.dv_flow_en == 2))
 			pool = aso_mtr->pool;
@@ -820,9 +886,13 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	 */
 	sq->head++;
 	sq->pi += 2;/* Each WQE contains 2 WQEBB's. */
-	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
+	if (push) {
+		mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
 			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
 			   !sh->tx_uar.dbnc);
+		sq->db_pi = sq->pi;
+	}
+	sq->db = wqe;
 	if (need_lock)
 		rte_spinlock_unlock(&sq->sqsl);
 	return 1;
@@ -912,11 +982,14 @@ mlx5_aso_mtr_completion_handle(struct mlx5_aso_sq *sq, bool need_lock)
 int
 mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			struct mlx5_aso_mtr *mtr,
-			struct mlx5_mtr_bulk *bulk)
+			struct mlx5_mtr_bulk *bulk,
+			void *user_data,
+			bool push)
 {
 	struct mlx5_aso_sq *sq;
 	uint32_t poll_wqe_times = MLX5_MTR_POLL_WQE_CQE_TIMES;
 	bool need_lock;
+	int ret;
 
 	if (likely(sh->config.dv_flow_en == 2)) {
 		if (queue == MLX5_HW_INV_QUEUE) {
@@ -930,10 +1003,15 @@ mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 		sq = &sh->mtrmng->pools_mng.sq;
 		need_lock = true;
 	}
+	if (queue != MLX5_HW_INV_QUEUE) {
+		ret = mlx5_aso_mtr_sq_enqueue_single(sh, sq, mtr, bulk,
+						     need_lock, user_data, push);
+		return ret > 0 ? 0 : -1;
+	}
 	do {
 		mlx5_aso_mtr_completion_handle(sq, need_lock);
-		if (mlx5_aso_mtr_sq_enqueue_single(sh, sq, mtr,
-						   bulk, need_lock))
+		if (mlx5_aso_mtr_sq_enqueue_single(sh, sq, mtr, bulk,
+						   need_lock, NULL, true))
 			return 0;
 		/* Waiting for wqe resource. */
 		rte_delay_us_sleep(MLX5_ASO_WQE_CQE_RESPONSE_DELAY);
@@ -962,6 +1040,7 @@ mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 {
 	struct mlx5_aso_sq *sq;
 	uint32_t poll_cqe_times = MLX5_MTR_POLL_WQE_CQE_TIMES;
+	uint8_t state;
 	bool need_lock;
 
 	if (likely(sh->config.dv_flow_en == 2)) {
@@ -976,8 +1055,8 @@ mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 		sq = &sh->mtrmng->pools_mng.sq;
 		need_lock = true;
 	}
-	if (__atomic_load_n(&mtr->state, __ATOMIC_RELAXED) ==
-					    ASO_METER_READY)
+	state = __atomic_load_n(&mtr->state, __ATOMIC_RELAXED);
+	if (state == ASO_METER_READY || state == ASO_METER_WAIT_ASYNC)
 		return 0;
 	do {
 		mlx5_aso_mtr_completion_handle(sq, need_lock);
@@ -1093,7 +1172,9 @@ mlx5_aso_ct_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 			      struct mlx5_aso_sq *sq,
 			      struct mlx5_aso_ct_action *ct,
 			      const struct rte_flow_action_conntrack *profile,
-			      bool need_lock)
+			      bool need_lock,
+			      void *user_data,
+			      bool push)
 {
 	volatile struct mlx5_aso_wqe *wqe = NULL;
 	uint16_t size = 1 << sq->log_desc_n;
@@ -1117,10 +1198,16 @@ mlx5_aso_ct_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	wqe = &sq->sq_obj.aso_wqes[sq->head & mask];
 	rte_prefetch0(&sq->sq_obj.aso_wqes[(sq->head + 1) & mask]);
 	/* Fill next WQE. */
-	MLX5_ASO_CT_UPDATE_STATE(ct, ASO_CONNTRACK_WAIT);
-	sq->elts[sq->head & mask].ct = ct;
-	sq->elts[sq->head & mask].query_data = NULL;
+	MLX5_ASO_CT_UPDATE_STATE(ct,
+			user_data ? ASO_CONNTRACK_WAIT_ASYNC : ASO_CONNTRACK_WAIT);
+	if (user_data) {
+		sq->elts[sq->head & mask].user_data = user_data;
+	} else {
+		sq->elts[sq->head & mask].ct = ct;
+		sq->elts[sq->head & mask].query_data = NULL;
+	}
 	pool = __mlx5_aso_ct_get_pool(sh, ct);
+
 	/* Each WQE will have a single CT object. */
 	wqe->general_cseg.misc = rte_cpu_to_be_32(pool->devx_obj->id +
 						  ct->offset);
@@ -1200,9 +1287,13 @@ mlx5_aso_ct_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 		 profile->reply_dir.max_ack);
 	sq->head++;
 	sq->pi += 2; /* Each WQE contains 2 WQEBB's. */
-	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
-			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
-			   !sh->tx_uar.dbnc);
+	if (push) {
+		mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
+				   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
+				   !sh->tx_uar.dbnc);
+		sq->db_pi = sq->pi;
+	}
+	sq->db = wqe;
 	if (need_lock)
 		rte_spinlock_unlock(&sq->sqsl);
 	return 1;
@@ -1258,7 +1349,9 @@ static int
 mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
 			    struct mlx5_aso_sq *sq,
 			    struct mlx5_aso_ct_action *ct, char *data,
-			    bool need_lock)
+			    bool need_lock,
+			    void *user_data,
+			    bool push)
 {
 	volatile struct mlx5_aso_wqe *wqe = NULL;
 	uint16_t size = 1 << sq->log_desc_n;
@@ -1284,14 +1377,23 @@ mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
 		DRV_LOG(ERR, "Fail: SQ is full and no free WQE to send");
 		return 0;
 	}
-	MLX5_ASO_CT_UPDATE_STATE(ct, ASO_CONNTRACK_QUERY);
+	MLX5_ASO_CT_UPDATE_STATE(ct,
+			user_data ? ASO_CONNTRACK_WAIT_ASYNC : ASO_CONNTRACK_QUERY);
 	wqe = &sq->sq_obj.aso_wqes[sq->head & mask];
 	/* Confirm the location and address of the prefetch instruction. */
 	rte_prefetch0(&sq->sq_obj.aso_wqes[(sq->head + 1) & mask]);
 	/* Fill next WQE. */
 	wqe_idx = sq->head & mask;
-	sq->elts[wqe_idx].ct = ct;
-	sq->elts[wqe_idx].query_data = data;
+	/* Check if this is async mode. */
+	if (user_data) {
+		struct mlx5_hw_q_job *job = (struct mlx5_hw_q_job *)user_data;
+
+		sq->elts[wqe_idx].ct = user_data;
+		job->out_data = (char *)((uintptr_t)sq->mr.addr + wqe_idx * 64);
+	} else {
+		sq->elts[wqe_idx].query_data = data;
+		sq->elts[wqe_idx].ct = ct;
+	}
 	pool = __mlx5_aso_ct_get_pool(sh, ct);
 	/* Each WQE will have a single CT object. */
 	wqe->general_cseg.misc = rte_cpu_to_be_32(pool->devx_obj->id +
@@ -1317,9 +1419,13 @@ mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
 	 * data segment is not used in this case.
 	 */
 	sq->pi += 2;
-	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
-			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
-			   !sh->tx_uar.dbnc);
+	if (push) {
+		mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
+				   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
+				   !sh->tx_uar.dbnc);
+		sq->db_pi = sq->pi;
+	}
+	sq->db = wqe;
 	if (need_lock)
 		rte_spinlock_unlock(&sq->sqsl);
 	return 1;
@@ -1405,20 +1511,29 @@ int
 mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
 			  uint32_t queue,
 			  struct mlx5_aso_ct_action *ct,
-			  const struct rte_flow_action_conntrack *profile)
+			  const struct rte_flow_action_conntrack *profile,
+			  void *user_data,
+			  bool push)
 {
 	uint32_t poll_wqe_times = MLX5_CT_POLL_WQE_CQE_TIMES;
 	struct mlx5_aso_ct_pool *pool = __mlx5_aso_ct_get_pool(sh, ct);
 	struct mlx5_aso_sq *sq;
 	bool need_lock = !!(queue == MLX5_HW_INV_QUEUE);
+	int ret;
 
 	if (sh->config.dv_flow_en == 2)
 		sq = __mlx5_aso_ct_get_sq_in_hws(queue, pool);
 	else
 		sq = __mlx5_aso_ct_get_sq_in_sws(sh, ct);
+	if (queue != MLX5_HW_INV_QUEUE) {
+		ret = mlx5_aso_ct_sq_enqueue_single(sh, sq, ct, profile,
+						    need_lock, user_data, push);
+		return ret > 0 ? 0 : -1;
+	}
 	do {
-		mlx5_aso_ct_completion_handle(sh, sq, need_lock);
-		if (mlx5_aso_ct_sq_enqueue_single(sh, sq, ct, profile, need_lock))
+		mlx5_aso_ct_completion_handle(sh, sq, need_lock);
+		if (mlx5_aso_ct_sq_enqueue_single(sh, sq, ct, profile,
+						  need_lock, NULL, true))
 			return 0;
 		/* Waiting for wqe resource. */
 		rte_delay_us_sleep(10u);
@@ -1478,7 +1593,7 @@ mlx5_aso_ct_wait_ready(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
  * @param[in] wdata
  *   Pointer to data fetched from hardware.
  */
-static inline void
+void
 mlx5_aso_ct_obj_analyze(struct rte_flow_action_conntrack *profile,
 			char *wdata)
 {
@@ -1562,7 +1677,8 @@ int
 mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh,
 			 uint32_t queue,
 			 struct mlx5_aso_ct_action *ct,
-			 struct rte_flow_action_conntrack *profile)
+			 struct rte_flow_action_conntrack *profile,
+			 void *user_data, bool push)
 {
 	uint32_t poll_wqe_times = MLX5_CT_POLL_WQE_CQE_TIMES;
 	struct mlx5_aso_ct_pool *pool = __mlx5_aso_ct_get_pool(sh, ct);
@@ -1575,9 +1691,15 @@ mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh,
 		sq = __mlx5_aso_ct_get_sq_in_hws(queue, pool);
 	else
 		sq = __mlx5_aso_ct_get_sq_in_sws(sh, ct);
+	if (queue != MLX5_HW_INV_QUEUE) {
+		ret = mlx5_aso_ct_sq_query_single(sh, sq, ct, out_data,
+						  need_lock, user_data, push);
+		return ret > 0 ? 0 : -1;
+	}
 	do {
 		mlx5_aso_ct_completion_handle(sh, sq, need_lock);
-		ret = mlx5_aso_ct_sq_query_single(sh, sq, ct, out_data, need_lock);
+		ret = mlx5_aso_ct_sq_query_single(sh, sq, ct, out_data,
+				need_lock, NULL, true);
 		if (ret < 0)
 			return ret;
 		else if (ret > 0)
@@ -1628,7 +1750,8 @@ mlx5_aso_ct_available(struct mlx5_dev_ctx_shared *sh,
 		rte_errno = ENXIO;
 		return -rte_errno;
 	} else if (state == ASO_CONNTRACK_READY ||
-		   state == ASO_CONNTRACK_QUERY) {
+		   state == ASO_CONNTRACK_QUERY ||
+		   state == ASO_CONNTRACK_WAIT_ASYNC) {
 		return 0;
 	}
 	do {
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 1146e13cfa..d31838e26e 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -13103,7 +13103,7 @@ flow_dv_translate_create_conntrack(struct rte_eth_dev *dev,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "Failed to allocate CT object");
 	ct = flow_aso_ct_get_by_dev_idx(dev, idx);
-	if (mlx5_aso_ct_update_by_wqe(sh, MLX5_HW_INV_QUEUE, ct, pro)) {
+	if (mlx5_aso_ct_update_by_wqe(sh, MLX5_HW_INV_QUEUE, ct, pro, NULL, true)) {
 		flow_dv_aso_ct_dev_release(dev, idx);
 		rte_flow_error_set(error, EBUSY,
 				   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
@@ -15917,7 +15917,7 @@ __flow_dv_action_ct_update(struct rte_eth_dev *dev, uint32_t idx,
 		if (ret)
 			return ret;
 		ret = mlx5_aso_ct_update_by_wqe(priv->sh, MLX5_HW_INV_QUEUE,
-						ct, new_prf);
+						ct, new_prf, NULL, true);
 		if (ret)
 			return rte_flow_error_set(error, EIO,
 					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
@@ -16753,7 +16753,8 @@ flow_dv_action_query(struct rte_eth_dev *dev,
 							ct->peer;
 		((struct rte_flow_action_conntrack *)data)->is_original_dir =
 							ct->is_original;
-		if (mlx5_aso_ct_query_by_wqe(priv->sh, MLX5_HW_INV_QUEUE, ct, data))
+		if (mlx5_aso_ct_query_by_wqe(priv->sh, MLX5_HW_INV_QUEUE, ct,
+					data, NULL, true))
 			return rte_flow_error_set(error, EIO,
 					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 					NULL,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 161b96cd87..9f70637fcf 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -1178,9 +1178,9 @@ static rte_be32_t vlan_hdr_to_be32(const struct rte_flow_action *actions)
 }
 
 static __rte_always_inline struct mlx5_aso_mtr *
-flow_hw_meter_mark_alloc(struct rte_eth_dev *dev,
-			   const struct rte_flow_action *action,
-			   uint32_t queue)
+flow_hw_meter_mark_alloc(struct rte_eth_dev *dev, uint32_t queue,
+			 const struct rte_flow_action *action,
+			 void *user_data, bool push)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
@@ -1200,13 +1200,14 @@ flow_hw_meter_mark_alloc(struct rte_eth_dev *dev,
 	fm->is_enable = meter_mark->state;
 	fm->color_aware = meter_mark->color_mode;
 	aso_mtr->pool = pool;
-	aso_mtr->state = ASO_METER_WAIT;
+	aso_mtr->state = (queue == MLX5_HW_INV_QUEUE) ?
+			  ASO_METER_WAIT : ASO_METER_WAIT_ASYNC;
 	aso_mtr->offset = mtr_id - 1;
 	aso_mtr->init_color = (meter_mark->color_mode) ?
 		meter_mark->init_color : RTE_COLOR_GREEN;
 	/* Update ASO flow meter by wqe. */
 	if (mlx5_aso_meter_update_by_wqe(priv->sh, queue, aso_mtr,
-					 &priv->mtr_bulk)) {
+					 &priv->mtr_bulk, user_data, push)) {
 		mlx5_ipool_free(pool->idx_pool, mtr_id);
 		return NULL;
 	}
@@ -1231,7 +1232,7 @@ flow_hw_meter_mark_compile(struct rte_eth_dev *dev,
 	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
 	struct mlx5_aso_mtr *aso_mtr;
 
-	aso_mtr = flow_hw_meter_mark_alloc(dev, action, queue);
+	aso_mtr = flow_hw_meter_mark_alloc(dev, queue, action, NULL, true);
 	if (!aso_mtr)
 		return -1;
 
@@ -2295,9 +2296,13 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 				rte_col_2_mlx5_col(aso_mtr->init_color);
 			break;
 		case RTE_FLOW_ACTION_TYPE_METER_MARK:
+			/*
+			 * Allocating the meter directly here would slow down
+			 * the flow insertion rate.
+			 */
 			ret = flow_hw_meter_mark_compile(dev,
 				act_data->action_dst, action,
-				rule_acts, &job->flow->mtr_id, queue);
+				rule_acts, &job->flow->mtr_id, MLX5_HW_INV_QUEUE);
 			if (ret != 0)
 				return ret;
 			break;
@@ -2604,6 +2609,74 @@ flow_hw_age_count_release(struct mlx5_priv *priv, uint32_t queue,
 	}
 }
 
+static inline int
+__flow_hw_pull_indir_action_comp(struct rte_eth_dev *dev,
+				 uint32_t queue,
+				 struct rte_flow_op_result res[],
+				 uint16_t n_res)
+
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_ring *r = priv->hw_q[queue].indir_cq;
+	struct mlx5_hw_q_job *job;
+	void *user_data = NULL;
+	uint32_t type, idx;
+	struct mlx5_aso_mtr *aso_mtr;
+	struct mlx5_aso_ct_action *aso_ct;
+	int ret_comp, i;
+
+	ret_comp = (int)rte_ring_count(r);
+	if (ret_comp > n_res)
+		ret_comp = n_res;
+	for (i = 0; i < ret_comp; i++) {
+		rte_ring_dequeue(r, &user_data);
+		res[i].user_data = user_data;
+		res[i].status = RTE_FLOW_OP_SUCCESS;
+	}
+	if (ret_comp < n_res && priv->hws_mpool)
+		ret_comp += mlx5_aso_pull_completion(&priv->hws_mpool->sq[queue],
+				&res[ret_comp], n_res - ret_comp);
+	if (ret_comp < n_res && priv->hws_ctpool)
+		ret_comp += mlx5_aso_pull_completion(&priv->ct_mng->aso_sqs[queue],
+				&res[ret_comp], n_res - ret_comp);
+	for (i = 0; i < ret_comp; i++) {
+		job = (struct mlx5_hw_q_job *)res[i].user_data;
+		/* Restore user data. */
+		res[i].user_data = job->user_data;
+		if (job->type == MLX5_HW_Q_JOB_TYPE_DESTROY) {
+			type = MLX5_INDIRECT_ACTION_TYPE_GET(job->action);
+			if (type == MLX5_INDIRECT_ACTION_TYPE_METER_MARK) {
+				idx = MLX5_INDIRECT_ACTION_IDX_GET(job->action);
+				mlx5_ipool_free(priv->hws_mpool->idx_pool, idx);
+			}
+		} else if (job->type == MLX5_HW_Q_JOB_TYPE_CREATE) {
+			type = MLX5_INDIRECT_ACTION_TYPE_GET(job->action);
+			if (type == MLX5_INDIRECT_ACTION_TYPE_METER_MARK) {
+				idx = MLX5_INDIRECT_ACTION_IDX_GET(job->action);
+				aso_mtr = mlx5_ipool_get(priv->hws_mpool->idx_pool, idx);
+				aso_mtr->state = ASO_METER_READY;
+			} else if (type == MLX5_INDIRECT_ACTION_TYPE_CT) {
+				idx = MLX5_ACTION_CTX_CT_GET_IDX
+					((uint32_t)(uintptr_t)job->action);
+				aso_ct = mlx5_ipool_get(priv->hws_ctpool->cts, idx);
+				aso_ct->state = ASO_CONNTRACK_READY;
+			}
+		} else if (job->type == MLX5_HW_Q_JOB_TYPE_QUERY) {
+			type = MLX5_INDIRECT_ACTION_TYPE_GET(job->action);
+			if (type == MLX5_INDIRECT_ACTION_TYPE_CT) {
+				idx = MLX5_ACTION_CTX_CT_GET_IDX
+					((uint32_t)(uintptr_t)job->action);
+				aso_ct = mlx5_ipool_get(priv->hws_ctpool->cts, idx);
+				mlx5_aso_ct_obj_analyze(job->profile,
+							job->out_data);
+				aso_ct->state = ASO_CONNTRACK_READY;
+			}
+		}
+		priv->hw_q[queue].job[priv->hw_q[queue].job_idx++] = job;
+	}
+	return ret_comp;
+}
+
 /**
  * Pull the enqueued flows.
  *
@@ -2636,6 +2709,7 @@ flow_hw_pull(struct rte_eth_dev *dev,
 	struct mlx5_hw_q_job *job;
 	int ret, i;
 
+	/* 1. Pull the flow completion. */
 	ret = mlx5dr_send_queue_poll(priv->dr_ctx, queue, res, n_res);
 	if (ret < 0)
 		return rte_flow_error_set(error, rte_errno,
@@ -2661,9 +2735,34 @@ flow_hw_pull(struct rte_eth_dev *dev,
 		}
 		priv->hw_q[queue].job[priv->hw_q[queue].job_idx++] = job;
 	}
+	/* 2. Pull indirect action comp. */
+	if (ret < n_res)
+		ret += __flow_hw_pull_indir_action_comp(dev, queue, &res[ret],
+							n_res - ret);
 	return ret;
 }
 
+static inline void
+__flow_hw_push_action(struct rte_eth_dev *dev,
+		    uint32_t queue)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_ring *iq = priv->hw_q[queue].indir_iq;
+	struct rte_ring *cq = priv->hw_q[queue].indir_cq;
+	void *job = NULL;
+	uint32_t ret, i;
+
+	ret = rte_ring_count(iq);
+	for (i = 0; i < ret; i++) {
+		rte_ring_dequeue(iq, &job);
+		rte_ring_enqueue(cq, job);
+	}
+	if (priv->hws_ctpool)
+		mlx5_aso_push_wqe(priv->sh, &priv->ct_mng->aso_sqs[queue]);
+	if (priv->hws_mpool)
+		mlx5_aso_push_wqe(priv->sh, &priv->hws_mpool->sq[queue]);
+}
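+/*
+ * Design note (as understood from the rings above): jobs of indirect
+ * action operations that complete without a hardware completion are
+ * parked on indir_iq while postponed; the push operation moves them to
+ * indir_cq so that the next pull can report them together with the
+ * ASO-based completions.
+ */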
+
 /**
  * Push the enqueued flows to HW.
  *
@@ -2687,6 +2786,7 @@ flow_hw_push(struct rte_eth_dev *dev,
 	struct mlx5_priv *priv = dev->data->dev_private;
 	int ret;
 
+	__flow_hw_push_action(dev, queue);
 	ret = mlx5dr_send_queue_action(priv->dr_ctx, queue,
 				       MLX5DR_SEND_QUEUE_ACTION_DRAIN);
 	if (ret) {
@@ -5944,7 +6044,7 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	/* Adds one queue to be used by PMD.
 	 * The last queue will be used by the PMD.
 	 */
-	uint16_t nb_q_updated;
+	uint16_t nb_q_updated = 0;
 	struct rte_flow_queue_attr **_queue_attr = NULL;
 	struct rte_flow_queue_attr ctrl_queue_attr = {0};
 	bool is_proxy = !!(priv->sh->config.dv_esw_en && priv->master);
@@ -6011,6 +6111,7 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		goto err;
 	}
 	for (i = 0; i < nb_q_updated; i++) {
+		char mz_name[RTE_MEMZONE_NAMESIZE];
 		uint8_t *encap = NULL;
 		struct mlx5_modification_cmd *mhdr_cmd = NULL;
 		struct rte_flow_item *items = NULL;
@@ -6038,6 +6139,22 @@ flow_hw_configure(struct rte_eth_dev *dev,
 			job[j].items = &items[j * MLX5_HW_MAX_ITEMS];
 			priv->hw_q[i].job[j] = &job[j];
 		}
+		snprintf(mz_name, sizeof(mz_name), "port_%u_indir_act_cq_%u",
+			 dev->data->port_id, i);
+		priv->hw_q[i].indir_cq = rte_ring_create(mz_name,
+				_queue_attr[i]->size, SOCKET_ID_ANY,
+				RING_F_SP_ENQ | RING_F_SC_DEQ |
+				RING_F_EXACT_SZ);
+		if (!priv->hw_q[i].indir_cq)
+			goto err;
+		snprintf(mz_name, sizeof(mz_name), "port_%u_indir_act_iq_%u",
+			 dev->data->port_id, i);
+		priv->hw_q[i].indir_iq = rte_ring_create(mz_name,
+				_queue_attr[i]->size, SOCKET_ID_ANY,
+				RING_F_SP_ENQ | RING_F_SC_DEQ |
+				RING_F_EXACT_SZ);
+		if (!priv->hw_q[i].indir_iq)
+			goto err;
 	}
 	dr_ctx_attr.pd = priv->sh->cdev->pd;
 	dr_ctx_attr.queues = nb_q_updated;
@@ -6155,6 +6272,12 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	flow_hw_destroy_vlan(dev);
 	if (dr_ctx)
 		claim_zero(mlx5dr_context_close(dr_ctx));
+	for (i = 0; i < nb_q_updated; i++) {
+		if (priv->hw_q[i].indir_iq)
+			rte_ring_free(priv->hw_q[i].indir_iq);
+		if (priv->hw_q[i].indir_cq)
+			rte_ring_free(priv->hw_q[i].indir_cq);
+	}
 	mlx5_free(priv->hw_q);
 	priv->hw_q = NULL;
 	if (priv->acts_ipool) {
@@ -6184,7 +6307,7 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 	struct rte_flow_template_table *tbl;
 	struct rte_flow_pattern_template *it;
 	struct rte_flow_actions_template *at;
-	int i;
+	uint32_t i;
 
 	if (!priv->dr_ctx)
 		return;
@@ -6230,6 +6353,10 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 		flow_hw_ct_mng_destroy(dev, priv->ct_mng);
 		priv->ct_mng = NULL;
 	}
+	for (i = 0; i < priv->nb_queue; i++) {
+		rte_ring_free(priv->hw_q[i].indir_iq);
+		rte_ring_free(priv->hw_q[i].indir_cq);
+	}
 	mlx5_free(priv->hw_q);
 	priv->hw_q = NULL;
 	claim_zero(mlx5dr_context_close(priv->dr_ctx));
@@ -6418,8 +6545,9 @@ flow_hw_conntrack_destroy(struct rte_eth_dev *dev __rte_unused,
 }
 
 static int
-flow_hw_conntrack_query(struct rte_eth_dev *dev, uint32_t idx,
+flow_hw_conntrack_query(struct rte_eth_dev *dev, uint32_t queue, uint32_t idx,
 			struct rte_flow_action_conntrack *profile,
+			void *user_data, bool push,
 			struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -6443,7 +6571,7 @@ flow_hw_conntrack_query(struct rte_eth_dev *dev, uint32_t idx,
 	}
 	profile->peer_port = ct->peer;
 	profile->is_original_dir = ct->is_original;
-	if (mlx5_aso_ct_query_by_wqe(priv->sh, MLX5_HW_INV_QUEUE, ct, profile))
+	if (mlx5_aso_ct_query_by_wqe(priv->sh, queue, ct, profile, user_data, push))
 		return rte_flow_error_set(error, EIO,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -6455,7 +6583,8 @@ flow_hw_conntrack_query(struct rte_eth_dev *dev, uint32_t idx,
 static int
 flow_hw_conntrack_update(struct rte_eth_dev *dev, uint32_t queue,
 			 const struct rte_flow_modify_conntrack *action_conf,
-			 uint32_t idx, struct rte_flow_error *error)
+			 uint32_t idx, void *user_data, bool push,
+			 struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_aso_ct_pool *pool = priv->hws_ctpool;
@@ -6486,7 +6615,8 @@ flow_hw_conntrack_update(struct rte_eth_dev *dev, uint32_t queue,
 		ret = mlx5_validate_action_ct(dev, new_prf, error);
 		if (ret)
 			return ret;
-		ret = mlx5_aso_ct_update_by_wqe(priv->sh, queue, ct, new_prf);
+		ret = mlx5_aso_ct_update_by_wqe(priv->sh, queue, ct, new_prf,
+						user_data, push);
 		if (ret)
 			return rte_flow_error_set(error, EIO,
 					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
@@ -6508,6 +6638,7 @@ flow_hw_conntrack_update(struct rte_eth_dev *dev, uint32_t queue,
 static struct rte_flow_action_handle *
 flow_hw_conntrack_create(struct rte_eth_dev *dev, uint32_t queue,
 			 const struct rte_flow_action_conntrack *pro,
+			 void *user_data, bool push,
 			 struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -6534,7 +6665,7 @@ flow_hw_conntrack_create(struct rte_eth_dev *dev, uint32_t queue,
 	ct->is_original = !!pro->is_original_dir;
 	ct->peer = pro->peer_port;
 	ct->pool = pool;
-	if (mlx5_aso_ct_update_by_wqe(priv->sh, queue, ct, pro)) {
+	if (mlx5_aso_ct_update_by_wqe(priv->sh, queue, ct, pro, user_data, push)) {
 		mlx5_ipool_free(pool->cts, ct_idx);
 		rte_flow_error_set(error, EBUSY,
 				   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
@@ -6626,15 +6757,29 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 			     struct rte_flow_error *error)
 {
 	struct rte_flow_action_handle *handle = NULL;
+	struct mlx5_hw_q_job *job = NULL;
 	struct mlx5_priv *priv = dev->data->dev_private;
 	const struct rte_flow_action_age *age;
 	struct mlx5_aso_mtr *aso_mtr;
 	cnt_id_t cnt_id;
 	uint32_t mtr_id;
 	uint32_t age_idx;
+	bool push = true;
+	bool aso = false;
 
-	RTE_SET_USED(attr);
-	RTE_SET_USED(user_data);
+	if (attr) {
+		MLX5_ASSERT(queue != MLX5_HW_INV_QUEUE);
+		if (unlikely(!priv->hw_q[queue].job_idx)) {
+			rte_flow_error_set(error, ENOMEM,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				"Flow queue full.");
+			return NULL;
+		}
+		job = priv->hw_q[queue].job[--priv->hw_q[queue].job_idx];
+		job->type = MLX5_HW_Q_JOB_TYPE_CREATE;
+		job->user_data = user_data;
+		push = !attr->postpone;
+	}
 	switch (action->type) {
 	case RTE_FLOW_ACTION_TYPE_AGE:
 		age = action->conf;
@@ -6662,10 +6807,13 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 				 (uintptr_t)cnt_id;
 		break;
 	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
-		handle = flow_hw_conntrack_create(dev, queue, action->conf, error);
+		aso = true;
+		handle = flow_hw_conntrack_create(dev, queue, action->conf, job,
+						  push, error);
 		break;
 	case RTE_FLOW_ACTION_TYPE_METER_MARK:
-		aso_mtr = flow_hw_meter_mark_alloc(dev, action, queue);
+		aso = true;
+		aso_mtr = flow_hw_meter_mark_alloc(dev, queue, action, job, push);
 		if (!aso_mtr)
 			break;
 		mtr_id = (MLX5_INDIRECT_ACTION_TYPE_METER_MARK <<
@@ -6678,7 +6826,20 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 	default:
 		rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION,
 				   NULL, "action type not supported");
-		return NULL;
+		break;
+	}
+	if (job) {
+		if (!handle) {
+			priv->hw_q[queue].job_idx++;
+			return NULL;
+		}
+		job->action = handle;
+		if (push)
+			__flow_hw_push_action(dev, queue);
+		if (aso)
+			return handle;
+		rte_ring_enqueue(push ? priv->hw_q[queue].indir_cq :
+				 priv->hw_q[queue].indir_iq, job);
 	}
 	return handle;
 }
@@ -6712,32 +6873,56 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 			     void *user_data,
 			     struct rte_flow_error *error)
 {
-	RTE_SET_USED(attr);
-	RTE_SET_USED(user_data);
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
+	const struct rte_flow_modify_conntrack *ct_conf =
+		(const struct rte_flow_modify_conntrack *)update;
 	const struct rte_flow_update_meter_mark *upd_meter_mark =
 		(const struct rte_flow_update_meter_mark *)update;
 	const struct rte_flow_action_meter_mark *meter_mark;
+	struct mlx5_hw_q_job *job = NULL;
 	struct mlx5_aso_mtr *aso_mtr;
 	struct mlx5_flow_meter_info *fm;
 	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
 	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
 	uint32_t idx = act_idx & ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
+	int ret = 0;
+	bool push = true;
+	bool aso = false;
 
+	if (attr) {
+		MLX5_ASSERT(queue != MLX5_HW_INV_QUEUE);
+		if (unlikely(!priv->hw_q[queue].job_idx))
+			return rte_flow_error_set(error, ENOMEM,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				"Action update failed due to queue full.");
+		job = priv->hw_q[queue].job[--priv->hw_q[queue].job_idx];
+		job->type = MLX5_HW_Q_JOB_TYPE_UPDATE;
+		job->user_data = user_data;
+		push = !attr->postpone;
+	}
 	switch (type) {
 	case MLX5_INDIRECT_ACTION_TYPE_AGE:
-		return mlx5_hws_age_action_update(priv, idx, update, error);
+		ret = mlx5_hws_age_action_update(priv, idx, update, error);
+		break;
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
-		return flow_hw_conntrack_update(dev, queue, update, act_idx, error);
+		if (ct_conf->state)
+			aso = true;
+		ret = flow_hw_conntrack_update(dev, queue, update, act_idx,
+					       job, push, error);
+		break;
 	case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
+		aso = true;
 		meter_mark = &upd_meter_mark->meter_mark;
 		/* Find ASO object. */
 		aso_mtr = mlx5_ipool_get(pool->idx_pool, idx);
-		if (!aso_mtr)
-			return rte_flow_error_set(error, EINVAL,
+		if (!aso_mtr) {
+			ret = -EINVAL;
+			rte_flow_error_set(error, EINVAL,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL, "Invalid meter_mark update index");
+			break;
+		}
 		fm = &aso_mtr->fm;
 		if (upd_meter_mark->profile_valid)
 			fm->profile = (struct mlx5_flow_meter_profile *)
@@ -6751,25 +6936,46 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 			fm->is_enable = meter_mark->state;
 		/* Update ASO flow meter by wqe. */
 		if (mlx5_aso_meter_update_by_wqe(priv->sh, queue,
-						 aso_mtr, &priv->mtr_bulk))
-			return rte_flow_error_set(error, EINVAL,
+						 aso_mtr, &priv->mtr_bulk, job, push)) {
+			ret = -EINVAL;
+			rte_flow_error_set(error, EINVAL,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL, "Unable to update ASO meter WQE");
+			break;
+		}
 		/* Wait for ASO object completion. */
 		if (queue == MLX5_HW_INV_QUEUE &&
-		    mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr))
-			return rte_flow_error_set(error, EINVAL,
+		    mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr)) {
+			ret = -EINVAL;
+			rte_flow_error_set(error, EINVAL,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL, "Unable to wait for ASO meter CQE");
+		}
 		break;
 	case MLX5_INDIRECT_ACTION_TYPE_RSS:
-		return flow_dv_action_update(dev, handle, update, error);
+		ret = flow_dv_action_update(dev, handle, update, error);
+		break;
 	default:
-		return rte_flow_error_set(error, ENOTSUP,
+		ret = -ENOTSUP;
+		rte_flow_error_set(error, ENOTSUP,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "action type not supported");
+		break;
 	}
-	return 0;
+	if (job) {
+		if (ret) {
+			priv->hw_q[queue].job_idx++;
+			return ret;
+		}
+		job->action = handle;
+		if (push)
+			__flow_hw_push_action(dev, queue);
+		if (aso)
+			return 0;
+		rte_ring_enqueue(push ? priv->hw_q[queue].indir_cq :
+				 priv->hw_q[queue].indir_iq, job);
+	}
+	return ret;
 }
 
 /**
@@ -6804,15 +7010,28 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 	uint32_t idx = act_idx & ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
+	struct mlx5_hw_q_job *job = NULL;
 	struct mlx5_aso_mtr *aso_mtr;
 	struct mlx5_flow_meter_info *fm;
+	bool push = true;
+	bool aso = false;
+	int ret = 0;
 
-	RTE_SET_USED(queue);
-	RTE_SET_USED(attr);
-	RTE_SET_USED(user_data);
+	if (attr) {
+		MLX5_ASSERT(queue != MLX5_HW_INV_QUEUE);
+		if (unlikely(!priv->hw_q[queue].job_idx))
+			return rte_flow_error_set(error, ENOMEM,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				"Action destroy failed due to queue full.");
+		job = priv->hw_q[queue].job[--priv->hw_q[queue].job_idx];
+		job->type = MLX5_HW_Q_JOB_TYPE_DESTROY;
+		job->user_data = user_data;
+		push = !attr->postpone;
+	}
 	switch (type) {
 	case MLX5_INDIRECT_ACTION_TYPE_AGE:
-		return mlx5_hws_age_action_destroy(priv, age_idx, error);
+		ret = mlx5_hws_age_action_destroy(priv, age_idx, error);
+		break;
 	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
 		age_idx = mlx5_hws_cnt_age_get(priv->hws_cpool, act_idx);
 		if (age_idx != 0)
@@ -6821,39 +7040,69 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 			 * time to update the AGE.
 			 */
 			mlx5_hws_age_nb_cnt_decrease(priv, age_idx);
-		return mlx5_hws_cnt_shared_put(priv->hws_cpool, &act_idx);
+		ret = mlx5_hws_cnt_shared_put(priv->hws_cpool, &act_idx);
+		break;
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
-		return flow_hw_conntrack_destroy(dev, act_idx, error);
+		ret = flow_hw_conntrack_destroy(dev, act_idx, error);
+		break;
 	case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
 		aso_mtr = mlx5_ipool_get(pool->idx_pool, idx);
-		if (!aso_mtr)
-			return rte_flow_error_set(error, EINVAL,
+		if (!aso_mtr) {
+			ret = -EINVAL;
+			rte_flow_error_set(error, EINVAL,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL, "Invalid meter_mark destroy index");
+			break;
+		}
 		fm = &aso_mtr->fm;
 		fm->is_enable = 0;
 		/* Update ASO flow meter by wqe. */
 		if (mlx5_aso_meter_update_by_wqe(priv->sh, queue, aso_mtr,
-						 &priv->mtr_bulk))
-			return rte_flow_error_set(error, EINVAL,
+						 &priv->mtr_bulk, job, push)) {
+			ret = -EINVAL;
+			rte_flow_error_set(error, EINVAL,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL, "Unable to update ASO meter WQE");
+			break;
+		}
 		/* Wait for ASO object completion. */
 		if (queue == MLX5_HW_INV_QUEUE &&
-		    mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr))
-			return rte_flow_error_set(error, EINVAL,
+		    mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr)) {
+			ret = -EINVAL;
+			rte_flow_error_set(error, EINVAL,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL, "Unable to wait for ASO meter CQE");
-		mlx5_ipool_free(pool->idx_pool, idx);
+			break;
+		}
+		if (!job)
+			mlx5_ipool_free(pool->idx_pool, idx);
+		else
+			aso = true;
 		break;
 	case MLX5_INDIRECT_ACTION_TYPE_RSS:
-		return flow_dv_action_destroy(dev, handle, error);
+		ret = flow_dv_action_destroy(dev, handle, error);
+		break;
 	default:
-		return rte_flow_error_set(error, ENOTSUP,
+		ret = -ENOTSUP;
+		rte_flow_error_set(error, ENOTSUP,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "action type not supported");
+		break;
 	}
-	return 0;
+	if (job) {
+		if (ret) {
+			priv->hw_q[queue].job_idx++;
+			return ret;
+		}
+		job->action = handle;
+		if (push)
+			__flow_hw_push_action(dev, queue);
+		if (aso)
+			return ret;
+		rte_ring_enqueue(push ? priv->hw_q[queue].indir_cq :
+				 priv->hw_q[queue].indir_iq, job);
+	}
+	return ret;
 }
 
 static int
@@ -7083,28 +7332,76 @@ flow_hw_action_update(struct rte_eth_dev *dev,
 }
 
 static int
-flow_hw_action_query(struct rte_eth_dev *dev,
-		     const struct rte_flow_action_handle *handle, void *data,
-		     struct rte_flow_error *error)
+flow_hw_action_handle_query(struct rte_eth_dev *dev, uint32_t queue,
+			    const struct rte_flow_op_attr *attr,
+			    const struct rte_flow_action_handle *handle,
+			    void *data, void *user_data,
+			    struct rte_flow_error *error)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hw_q_job *job = NULL;
 	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
 	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
 	uint32_t age_idx = act_idx & MLX5_HWS_AGE_IDX_MASK;
+	int ret;
+	bool push = true;
+	bool aso = false;
 
+	if (attr) {
+		MLX5_ASSERT(queue != MLX5_HW_INV_QUEUE);
+		if (unlikely(!priv->hw_q[queue].job_idx))
+			return rte_flow_error_set(error, ENOMEM,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				"Action query failed due to queue full.");
+		job = priv->hw_q[queue].job[--priv->hw_q[queue].job_idx];
+		job->type = MLX5_HW_Q_JOB_TYPE_QUERY;
+		job->user_data = user_data;
+		push = !attr->postpone;
+	}
 	switch (type) {
 	case MLX5_INDIRECT_ACTION_TYPE_AGE:
-		return flow_hw_query_age(dev, age_idx, data, error);
+		ret = flow_hw_query_age(dev, age_idx, data, error);
+		break;
 	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
-		return flow_hw_query_counter(dev, act_idx, data, error);
+		ret = flow_hw_query_counter(dev, act_idx, data, error);
+		break;
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
-		return flow_hw_conntrack_query(dev, act_idx, data, error);
-	case MLX5_INDIRECT_ACTION_TYPE_RSS:
-		return flow_dv_action_query(dev, handle, data, error);
+		aso = true;
+		if (job)
+			job->profile = (struct rte_flow_action_conntrack *)data;
+		ret = flow_hw_conntrack_query(dev, queue, act_idx, data,
+					      job, push, error);
+		break;
 	default:
-		return rte_flow_error_set(error, ENOTSUP,
+		ret = -ENOTSUP;
+		rte_flow_error_set(error, ENOTSUP,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "action type not supported");
+		break;
+	}
+	if (job) {
+		if (ret) {
+			priv->hw_q[queue].job_idx++;
+			return ret;
+		}
+		job->action = handle;
+		if (push)
+			__flow_hw_push_action(dev, queue);
+		if (aso)
+			return ret;
+		rte_ring_enqueue(push ? priv->hw_q[queue].indir_cq :
+				 priv->hw_q[queue].indir_iq, job);
 	}
+	return 0;
+}
+
+static int
+flow_hw_action_query(struct rte_eth_dev *dev,
+		     const struct rte_flow_action_handle *handle, void *data,
+		     struct rte_flow_error *error)
+{
+	return flow_hw_action_handle_query(dev, MLX5_HW_INV_QUEUE, NULL,
+			handle, data, NULL, error);
 }
 
 /**
@@ -7219,6 +7516,7 @@ const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.async_action_create = flow_hw_action_handle_create,
 	.async_action_destroy = flow_hw_action_handle_destroy,
 	.async_action_update = flow_hw_action_handle_update,
+	.async_action_query = flow_hw_action_handle_query,
 	.action_validate = flow_hw_action_validate,
 	.action_create = flow_hw_action_create,
 	.action_destroy = flow_hw_action_destroy,
diff --git a/drivers/net/mlx5/mlx5_flow_meter.c b/drivers/net/mlx5/mlx5_flow_meter.c
index fd1337ae73..480ac6c8ec 100644
--- a/drivers/net/mlx5/mlx5_flow_meter.c
+++ b/drivers/net/mlx5/mlx5_flow_meter.c
@@ -1627,7 +1627,7 @@ mlx5_flow_meter_action_modify(struct mlx5_priv *priv,
 		fm->is_enable = !!is_enable;
 		aso_mtr = container_of(fm, struct mlx5_aso_mtr, fm);
 		ret = mlx5_aso_meter_update_by_wqe(priv->sh, MLX5_HW_INV_QUEUE,
-						   aso_mtr, &priv->mtr_bulk);
+						   aso_mtr, &priv->mtr_bulk, NULL, true);
 		if (ret)
 			return ret;
 		ret = mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr);
@@ -1877,7 +1877,7 @@ mlx5_flow_meter_create(struct rte_eth_dev *dev, uint32_t meter_id,
 	if (priv->sh->meter_aso_en) {
 		aso_mtr = container_of(fm, struct mlx5_aso_mtr, fm);
 		ret = mlx5_aso_meter_update_by_wqe(priv->sh, MLX5_HW_INV_QUEUE,
-						   aso_mtr, &priv->mtr_bulk);
+						   aso_mtr, &priv->mtr_bulk, NULL, true);
 		if (ret)
 			goto error;
 		if (!priv->mtr_idx_tbl) {
@@ -1983,7 +1983,7 @@ mlx5_flow_meter_hws_create(struct rte_eth_dev *dev, uint32_t meter_id,
 	fm->initialized = 1;
 	/* Update ASO flow meter by wqe. */
 	ret = mlx5_aso_meter_update_by_wqe(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr,
-					   &priv->mtr_bulk);
+					   &priv->mtr_bulk, NULL, true);
 	if (ret)
 		return -rte_mtr_error_set(error, ENOTSUP,
 			RTE_MTR_ERROR_TYPE_UNSPECIFIED,
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v3 15/17] net/mlx5: support flow integrity in HWS group 0
  2022-09-30 12:52 ` [PATCH v3 00/17] net/mlx5: HW steering PMD update Suanming Mou
                     ` (13 preceding siblings ...)
  2022-09-30 12:53   ` [PATCH v3 14/17] net/mlx5: add async action push and pull support Suanming Mou
@ 2022-09-30 12:53   ` Suanming Mou
  2022-09-30 12:53   ` [PATCH v3 16/17] net/mlx5: support device control for E-Switch default rule Suanming Mou
  2022-09-30 12:53   ` [PATCH v3 17/17] net/mlx5: support device control of representor matching Suanming Mou
  16 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-30 12:53 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Gregory Etelson

From: Gregory Etelson <getelson@nvidia.com>

- Reformat flow integrity item translation for HWS code.
- Support flow integrity bits in HWS group 0.
- Update integrity item translation to match positive semantics only.
Positive flow semantics were described in patch [ae37c0f60c].
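
Illustrative sketch (not part of this patch): a positive-semantics outer
integrity match expressed through the generic rte_flow API. The group,
queue index and surrounding setup are placeholders; with HWS templates
the mask part would live in the pattern template and the spec in the
enqueued rule, as noted in the mlx5_flow_hw.c hunk below.

#include <rte_flow.h>

/* Match packets whose outer L3 and L4 integrity bits are good and
 * steer them to a given Rx queue. Only positive matches (bit == 1 in
 * both spec and mask) are supported by the PMD.
 */
static struct rte_flow *
create_outer_integrity_rule(uint16_t port_id, uint16_t rx_queue)
{
	struct rte_flow_attr attr = { .group = 0, .ingress = 1 };
	struct rte_flow_item_integrity spec = {
		.level = 0, /* outer headers; HWS matches outer integrity only */
		.l3_ok = 1,
		.l4_ok = 1,
	};
	struct rte_flow_item_integrity mask = {
		.l3_ok = 1,
		.l4_ok = 1,
	};
	struct rte_flow_item pattern[] = {
		{ .type = RTE_FLOW_ITEM_TYPE_ETH },
		{ .type = RTE_FLOW_ITEM_TYPE_IPV4 },
		{ .type = RTE_FLOW_ITEM_TYPE_TCP },
		{ .type = RTE_FLOW_ITEM_TYPE_INTEGRITY,
		  .spec = &spec, .mask = &mask },
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};
	struct rte_flow_action_queue queue = { .index = rx_queue };
	struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	struct rte_flow_error error;

	return rte_flow_create(port_id, &attr, pattern, actions, &error);
}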

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
---
 drivers/net/mlx5/mlx5_flow.h    |   1 +
 drivers/net/mlx5/mlx5_flow_dv.c | 163 ++++++++++++++++----------------
 drivers/net/mlx5/mlx5_flow_hw.c |   8 ++
 3 files changed, 90 insertions(+), 82 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index e45869a890..3f4aa080bb 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1462,6 +1462,7 @@ struct mlx5_dv_matcher_workspace {
 	struct mlx5_flow_rss_desc *rss_desc; /* RSS descriptor. */
 	const struct rte_flow_item *tunnel_item; /* Flow tunnel item. */
 	const struct rte_flow_item *gre_item; /* Flow GRE item. */
+	const struct rte_flow_item *integrity_items[2];
 };
 
 struct mlx5_flow_split_info {
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index d31838e26e..e86a06eae6 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -12648,132 +12648,121 @@ flow_dv_aso_age_params_init(struct rte_eth_dev *dev,
 
 static void
 flow_dv_translate_integrity_l4(const struct rte_flow_item_integrity *mask,
-			       const struct rte_flow_item_integrity *value,
-			       void *headers_m, void *headers_v)
+			       void *headers)
 {
+	/*
+	 * In HWS mode MLX5_ITEM_UPDATE() macro assigns the same pointer to
+	 * both mask and value, therefore either can be used.
+	 * In SWS SW_V mode mask points to item mask and value points to item
+	 * spec. Integrity item value is used only if matching mask is set.
+	 * Use mask reference here to keep SWS functionality.
+	 */
 	if (mask->l4_ok) {
 		/* RTE l4_ok filter aggregates hardware l4_ok and
 		 * l4_checksum_ok filters.
 		 * Positive RTE l4_ok match requires hardware match on both L4
 		 * hardware integrity bits.
-		 * For negative match, check hardware l4_checksum_ok bit only,
-		 * because hardware sets that bit to 0 for all packets
-		 * with bad L4.
+		 * PMD supports positive integrity item semantics only.
 		 */
-		if (value->l4_ok) {
-			MLX5_SET(fte_match_set_lyr_2_4, headers_m, l4_ok, 1);
-			MLX5_SET(fte_match_set_lyr_2_4, headers_v, l4_ok, 1);
-		}
-		MLX5_SET(fte_match_set_lyr_2_4, headers_m, l4_checksum_ok, 1);
-		MLX5_SET(fte_match_set_lyr_2_4, headers_v, l4_checksum_ok,
-			 !!value->l4_ok);
-	}
-	if (mask->l4_csum_ok) {
-		MLX5_SET(fte_match_set_lyr_2_4, headers_m, l4_checksum_ok, 1);
-		MLX5_SET(fte_match_set_lyr_2_4, headers_v, l4_checksum_ok,
-			 value->l4_csum_ok);
+		MLX5_SET(fte_match_set_lyr_2_4, headers, l4_ok, 1);
+		MLX5_SET(fte_match_set_lyr_2_4, headers, l4_checksum_ok, 1);
+	} else if (mask->l4_csum_ok) {
+		MLX5_SET(fte_match_set_lyr_2_4, headers, l4_checksum_ok, 1);
 	}
 }
 
 static void
 flow_dv_translate_integrity_l3(const struct rte_flow_item_integrity *mask,
-			       const struct rte_flow_item_integrity *value,
-			       void *headers_m, void *headers_v, bool is_ipv4)
+			       void *headers, bool is_ipv4)
 {
+	/*
+	 * In HWS mode MLX5_ITEM_UPDATE() macro assigns the same pointer to
+	 * both mask and value, therefore either can be used.
+	 * In SWS SW_V mode mask points to item mask and value points to item
+	 * spec. Integrity item value is used only if matching mask is set.
+	 * Use mask reference here to keep SWS functionality.
+	 */
 	if (mask->l3_ok) {
 		/* RTE l3_ok filter aggregates for IPv4 hardware l3_ok and
 		 * ipv4_csum_ok filters.
 		 * Positive RTE l3_ok match requires hardware match on both L3
 		 * hardware integrity bits.
-		 * For negative match, check hardware l3_csum_ok bit only,
-		 * because hardware sets that bit to 0 for all packets
-		 * with bad L3.
+		 * PMD supports positive integrity item semantics only.
 		 */
+		MLX5_SET(fte_match_set_lyr_2_4, headers, l3_ok, 1);
 		if (is_ipv4) {
-			if (value->l3_ok) {
-				MLX5_SET(fte_match_set_lyr_2_4, headers_m,
-					 l3_ok, 1);
-				MLX5_SET(fte_match_set_lyr_2_4, headers_v,
-					 l3_ok, 1);
-			}
-			MLX5_SET(fte_match_set_lyr_2_4, headers_m,
+			MLX5_SET(fte_match_set_lyr_2_4, headers,
 				 ipv4_checksum_ok, 1);
-			MLX5_SET(fte_match_set_lyr_2_4, headers_v,
-				 ipv4_checksum_ok, !!value->l3_ok);
-		} else {
-			MLX5_SET(fte_match_set_lyr_2_4, headers_m, l3_ok, 1);
-			MLX5_SET(fte_match_set_lyr_2_4, headers_v, l3_ok,
-				 value->l3_ok);
 		}
-	}
-	if (mask->ipv4_csum_ok) {
-		MLX5_SET(fte_match_set_lyr_2_4, headers_m, ipv4_checksum_ok, 1);
-		MLX5_SET(fte_match_set_lyr_2_4, headers_v, ipv4_checksum_ok,
-			 value->ipv4_csum_ok);
+	} else if (is_ipv4 && mask->ipv4_csum_ok) {
+		MLX5_SET(fte_match_set_lyr_2_4, headers, ipv4_checksum_ok, 1);
 	}
 }
 
 static void
-set_integrity_bits(void *headers_m, void *headers_v,
-		   const struct rte_flow_item *integrity_item, bool is_l3_ip4)
+set_integrity_bits(void *headers, const struct rte_flow_item *integrity_item,
+		   bool is_l3_ip4, uint32_t key_type)
 {
-	const struct rte_flow_item_integrity *spec = integrity_item->spec;
-	const struct rte_flow_item_integrity *mask = integrity_item->mask;
+	const struct rte_flow_item_integrity *spec;
+	const struct rte_flow_item_integrity *mask;
 
 	/* Integrity bits validation cleared spec pointer */
-	MLX5_ASSERT(spec != NULL);
-	if (!mask)
-		mask = &rte_flow_item_integrity_mask;
-	flow_dv_translate_integrity_l3(mask, spec, headers_m, headers_v,
-				       is_l3_ip4);
-	flow_dv_translate_integrity_l4(mask, spec, headers_m, headers_v);
+	if (MLX5_ITEM_VALID(integrity_item, key_type))
+		return;
+	MLX5_ITEM_UPDATE(integrity_item, key_type, spec, mask,
+			 &rte_flow_item_integrity_mask);
+	flow_dv_translate_integrity_l3(mask, headers, is_l3_ip4);
+	flow_dv_translate_integrity_l4(mask, headers);
 }
 
 static void
-flow_dv_translate_item_integrity_post(void *matcher, void *key,
+flow_dv_translate_item_integrity_post(void *key,
 				      const
 				      struct rte_flow_item *integrity_items[2],
-				      uint64_t pattern_flags)
+				      uint64_t pattern_flags, uint32_t key_type)
 {
-	void *headers_m, *headers_v;
+	void *headers;
 	bool is_l3_ip4;
 
 	if (pattern_flags & MLX5_FLOW_ITEM_INNER_INTEGRITY) {
-		headers_m = MLX5_ADDR_OF(fte_match_param, matcher,
-					 inner_headers);
-		headers_v = MLX5_ADDR_OF(fte_match_param, key, inner_headers);
+		headers = MLX5_ADDR_OF(fte_match_param, key, inner_headers);
 		is_l3_ip4 = (pattern_flags & MLX5_FLOW_LAYER_INNER_L3_IPV4) !=
 			    0;
-		set_integrity_bits(headers_m, headers_v,
-				   integrity_items[1], is_l3_ip4);
+		set_integrity_bits(headers, integrity_items[1], is_l3_ip4,
+				   key_type);
 	}
 	if (pattern_flags & MLX5_FLOW_ITEM_OUTER_INTEGRITY) {
-		headers_m = MLX5_ADDR_OF(fte_match_param, matcher,
-					 outer_headers);
-		headers_v = MLX5_ADDR_OF(fte_match_param, key, outer_headers);
+		headers = MLX5_ADDR_OF(fte_match_param, key, outer_headers);
 		is_l3_ip4 = (pattern_flags & MLX5_FLOW_LAYER_OUTER_L3_IPV4) !=
 			    0;
-		set_integrity_bits(headers_m, headers_v,
-				   integrity_items[0], is_l3_ip4);
+		set_integrity_bits(headers, integrity_items[0], is_l3_ip4,
+				   key_type);
 	}
 }
 
-static void
+static uint64_t
 flow_dv_translate_item_integrity(const struct rte_flow_item *item,
-				 const struct rte_flow_item *integrity_items[2],
-				 uint64_t *last_item)
+				 struct mlx5_dv_matcher_workspace *wks,
+				 uint64_t key_type)
 {
-	const struct rte_flow_item_integrity *spec = (typeof(spec))item->spec;
+	if ((key_type & MLX5_SET_MATCHER_SW) != 0) {
+		const struct rte_flow_item_integrity
+			*spec = (typeof(spec))item->spec;
 
-	/* integrity bits validation cleared spec pointer */
-	MLX5_ASSERT(spec != NULL);
-	if (spec->level > 1) {
-		integrity_items[1] = item;
-		*last_item |= MLX5_FLOW_ITEM_INNER_INTEGRITY;
+		/* SWS integrity bits validation cleared spec pointer */
+		if (spec->level > 1) {
+			wks->integrity_items[1] = item;
+			wks->last_item |= MLX5_FLOW_ITEM_INNER_INTEGRITY;
+		} else {
+			wks->integrity_items[0] = item;
+			wks->last_item |= MLX5_FLOW_ITEM_OUTER_INTEGRITY;
+		}
 	} else {
-		integrity_items[0] = item;
-		*last_item |= MLX5_FLOW_ITEM_OUTER_INTEGRITY;
+		/* HWS supports outer integrity only */
+		wks->integrity_items[0] = item;
+		wks->last_item |= MLX5_FLOW_ITEM_OUTER_INTEGRITY;
 	}
+	return wks->last_item;
 }
 
 /**
@@ -13401,6 +13390,10 @@ flow_dv_translate_items(struct rte_eth_dev *dev,
 		flow_dv_translate_item_meter_color(dev, key, items, key_type);
 		last_item = MLX5_FLOW_ITEM_METER_COLOR;
 		break;
+	case RTE_FLOW_ITEM_TYPE_INTEGRITY:
+		last_item = flow_dv_translate_item_integrity(items,
+							     wks, key_type);
+		break;
 	default:
 		break;
 	}
@@ -13464,6 +13457,12 @@ flow_dv_translate_items_hws(const struct rte_flow_item *items,
 		if (ret)
 			return ret;
 	}
+	if (wks.item_flags & MLX5_FLOW_ITEM_INTEGRITY) {
+		flow_dv_translate_item_integrity_post(key,
+						      wks.integrity_items,
+						      wks.item_flags,
+						      key_type);
+	}
 	if (wks.item_flags & MLX5_FLOW_LAYER_VXLAN_GPE) {
 		flow_dv_translate_item_vxlan_gpe(key,
 						 wks.tunnel_item,
@@ -13544,7 +13543,6 @@ flow_dv_translate_items_sws(struct rte_eth_dev *dev,
 			     mlx5_flow_get_thread_workspace())->rss_desc,
 	};
 	struct mlx5_dv_matcher_workspace wks_m = wks;
-	const struct rte_flow_item *integrity_items[2] = {NULL, NULL};
 	int ret = 0;
 	int tunnel;
 
@@ -13555,10 +13553,6 @@ flow_dv_translate_items_sws(struct rte_eth_dev *dev,
 						  NULL, "item not supported");
 		tunnel = !!(wks.item_flags & MLX5_FLOW_LAYER_TUNNEL);
 		switch (items->type) {
-		case RTE_FLOW_ITEM_TYPE_INTEGRITY:
-			flow_dv_translate_item_integrity(items, integrity_items,
-							 &wks.last_item);
-			break;
 		case RTE_FLOW_ITEM_TYPE_CONNTRACK:
 			flow_dv_translate_item_aso_ct(dev, match_mask,
 						      match_value, items);
@@ -13601,9 +13595,14 @@ flow_dv_translate_items_sws(struct rte_eth_dev *dev,
 			return -rte_errno;
 	}
 	if (wks.item_flags & MLX5_FLOW_ITEM_INTEGRITY) {
-		flow_dv_translate_item_integrity_post(match_mask, match_value,
-						      integrity_items,
-						      wks.item_flags);
+		flow_dv_translate_item_integrity_post(match_mask,
+						      wks_m.integrity_items,
+						      wks_m.item_flags,
+						      MLX5_SET_MATCHER_SW_M);
+		flow_dv_translate_item_integrity_post(match_value,
+						      wks.integrity_items,
+						      wks.item_flags,
+						      MLX5_SET_MATCHER_SW_V);
 	}
 	if (wks.item_flags & MLX5_FLOW_LAYER_VXLAN_GPE) {
 		flow_dv_translate_item_vxlan_gpe(match_mask,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 9f70637fcf..2b5eab6659 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -4656,6 +4656,14 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ITEM_TYPE_ICMP6:
 		case RTE_FLOW_ITEM_TYPE_CONNTRACK:
 			break;
+		case RTE_FLOW_ITEM_TYPE_INTEGRITY:
+			/*
+			 * Integrity flow item validation requires access to
+			 * both item mask and spec.
+			 * Current HWS model allows item mask in pattern
+			 * template and item spec in flow rule.
+			 */
+			break;
 		case RTE_FLOW_ITEM_TYPE_END:
 			items_end = true;
 			break;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v3 16/17] net/mlx5: support device control for E-Switch default rule
  2022-09-30 12:52 ` [PATCH v3 00/17] net/mlx5: HW steering PMD update Suanming Mou
                     ` (14 preceding siblings ...)
  2022-09-30 12:53   ` [PATCH v3 15/17] net/mlx5: support flow integrity in HWS group 0 Suanming Mou
@ 2022-09-30 12:53   ` Suanming Mou
  2022-09-30 12:53   ` [PATCH v3 17/17] net/mlx5: support device control of representor matching Suanming Mou
  16 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-30 12:53 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko, Ray Kinsella
  Cc: dev, rasland, orika, Dariusz Sosnowski, Xueming Li

From: Dariusz Sosnowski <dsosnowski@nvidia.com>

This patch adds support for the fdb_def_rule_en device argument in HW
Steering, which controls:

- creation of default FDB jump flow rule,
- ability of the user to create transfer flow rules in root table.

A new PMD API is also added to allow the user application to enable
traffic for a given port ID and SQ number, directing packets to the wire.
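
A minimal usage sketch (illustrative only, not part of the diff below);
the devargs string and the rte_pmd_mlx5_external_sq_enable() helper are
assumptions based on the rte_pmd_mlx5.h additions in this series:

/*
 * Assumed devargs: start with HWS enabled and the default FDB jump rule
 * disabled, which also permits transfer rules in the root table:
 *   dpdk-testpmd -a <pci_bdf>,dv_flow_en=2,dv_esw_en=1,fdb_def_rule_en=0 -- -i
 */
#include <rte_pmd_mlx5.h>

/* Enable traffic (create the default SQ miss rules) for an externally
 * created SQ identified by its DevX SQ number.
 */
static int
enable_external_sq(uint16_t port_id, uint32_t sq_num)
{
	return rte_pmd_mlx5_external_sq_enable(port_id, sq_num);
}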

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_os.c |  14 ++
 drivers/net/mlx5/mlx5.h          |   4 +-
 drivers/net/mlx5/mlx5_flow.c     |  28 ++--
 drivers/net/mlx5/mlx5_flow.h     |  11 +-
 drivers/net/mlx5/mlx5_flow_dv.c  |  78 +++++----
 drivers/net/mlx5/mlx5_flow_hw.c  | 279 +++++++++++++++----------------
 drivers/net/mlx5/mlx5_trigger.c  |  31 ++--
 drivers/net/mlx5/mlx5_tx.h       |   1 +
 drivers/net/mlx5/mlx5_txq.c      |  47 ++++++
 drivers/net/mlx5/rte_pmd_mlx5.h  |  17 ++
 drivers/net/mlx5/version.map     |   1 +
 11 files changed, 305 insertions(+), 206 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 60a1a391fb..de8c003d02 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1554,6 +1554,20 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	rte_rwlock_init(&priv->ind_tbls_lock);
 	if (priv->sh->config.dv_flow_en == 2) {
 #ifdef HAVE_IBV_FLOW_DV_SUPPORT
+		if (priv->sh->config.dv_esw_en) {
+			if (priv->sh->dv_regc0_mask == UINT32_MAX) {
+				DRV_LOG(ERR, "E-Switch port metadata is required when using HWS "
+					     "but it is disabled (configure it through devlink)");
+				err = ENOTSUP;
+				goto error;
+			}
+			if (priv->sh->dv_regc0_mask == 0) {
+				DRV_LOG(ERR, "E-Switch with HWS is not supported "
+					     "(no available bits in reg_c[0])");
+				err = ENOTSUP;
+				goto error;
+			}
+		}
 		if (priv->vport_meta_mask)
 			flow_hw_set_port_info(eth_dev);
 		if (priv->sh->config.dv_esw_en &&
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index f6033710aa..419b5a18ca 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -2015,7 +2015,7 @@ int mlx5_flow_ops_get(struct rte_eth_dev *dev, const struct rte_flow_ops **ops);
 int mlx5_flow_start_default(struct rte_eth_dev *dev);
 void mlx5_flow_stop_default(struct rte_eth_dev *dev);
 int mlx5_flow_verify(struct rte_eth_dev *dev);
-int mlx5_ctrl_flow_source_queue(struct rte_eth_dev *dev, uint32_t queue);
+int mlx5_ctrl_flow_source_queue(struct rte_eth_dev *dev, uint32_t sq_num);
 int mlx5_ctrl_flow_vlan(struct rte_eth_dev *dev,
 			struct rte_flow_item_eth *eth_spec,
 			struct rte_flow_item_eth *eth_mask,
@@ -2027,7 +2027,7 @@ int mlx5_ctrl_flow(struct rte_eth_dev *dev,
 int mlx5_flow_lacp_miss(struct rte_eth_dev *dev);
 struct rte_flow *mlx5_flow_create_esw_table_zero_flow(struct rte_eth_dev *dev);
 uint32_t mlx5_flow_create_devx_sq_miss_flow(struct rte_eth_dev *dev,
-					    uint32_t txq);
+					    uint32_t sq_num);
 void mlx5_flow_async_pool_query_handle(struct mlx5_dev_ctx_shared *sh,
 				       uint64_t async_id, int status);
 void mlx5_set_query_alarm(struct mlx5_dev_ctx_shared *sh);
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index bc2ccb4d3c..2142cd828a 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -7155,14 +7155,14 @@ mlx5_flow_create_esw_table_zero_flow(struct rte_eth_dev *dev)
  *
  * @param dev
  *   Pointer to Ethernet device.
- * @param txq
- *   Txq index.
+ * @param sq_num
+ *   SQ number.
  *
  * @return
  *   Flow ID on success, 0 otherwise and rte_errno is set.
  */
 uint32_t
-mlx5_flow_create_devx_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
+mlx5_flow_create_devx_sq_miss_flow(struct rte_eth_dev *dev, uint32_t sq_num)
 {
 	struct rte_flow_attr attr = {
 		.group = 0,
@@ -7174,8 +7174,8 @@ mlx5_flow_create_devx_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
 	struct rte_flow_item_port_id port_spec = {
 		.id = MLX5_PORT_ESW_MGR,
 	};
-	struct mlx5_rte_flow_item_tx_queue txq_spec = {
-		.queue = txq,
+	struct mlx5_rte_flow_item_sq sq_spec = {
+		.queue = sq_num,
 	};
 	struct rte_flow_item pattern[] = {
 		{
@@ -7184,8 +7184,8 @@ mlx5_flow_create_devx_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
 		},
 		{
 			.type = (enum rte_flow_item_type)
-				MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
-			.spec = &txq_spec,
+				MLX5_RTE_FLOW_ITEM_TYPE_SQ,
+			.spec = &sq_spec,
 		},
 		{
 			.type = RTE_FLOW_ITEM_TYPE_END,
@@ -7556,30 +7556,30 @@ mlx5_flow_verify(struct rte_eth_dev *dev __rte_unused)
  *
  * @param dev
  *   Pointer to Ethernet device.
- * @param queue
- *   The queue index.
+ * @param sq_num
+ *   The SQ hw number.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
 mlx5_ctrl_flow_source_queue(struct rte_eth_dev *dev,
-			    uint32_t queue)
+			    uint32_t sq_num)
 {
 	const struct rte_flow_attr attr = {
 		.egress = 1,
 		.priority = 0,
 	};
-	struct mlx5_rte_flow_item_tx_queue queue_spec = {
-		.queue = queue,
+	struct mlx5_rte_flow_item_sq queue_spec = {
+		.queue = sq_num,
 	};
-	struct mlx5_rte_flow_item_tx_queue queue_mask = {
+	struct mlx5_rte_flow_item_sq queue_mask = {
 		.queue = UINT32_MAX,
 	};
 	struct rte_flow_item items[] = {
 		{
 			.type = (enum rte_flow_item_type)
-				MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
+				MLX5_RTE_FLOW_ITEM_TYPE_SQ,
 			.spec = &queue_spec,
 			.last = NULL,
 			.mask = &queue_mask,
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 3f4aa080bb..63f946473d 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -29,7 +29,7 @@
 enum mlx5_rte_flow_item_type {
 	MLX5_RTE_FLOW_ITEM_TYPE_END = INT_MIN,
 	MLX5_RTE_FLOW_ITEM_TYPE_TAG,
-	MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
+	MLX5_RTE_FLOW_ITEM_TYPE_SQ,
 	MLX5_RTE_FLOW_ITEM_TYPE_VLAN,
 	MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL,
 };
@@ -115,8 +115,8 @@ struct mlx5_flow_action_copy_mreg {
 };
 
 /* Matches on source queue. */
-struct mlx5_rte_flow_item_tx_queue {
-	uint32_t queue;
+struct mlx5_rte_flow_item_sq {
+	uint32_t queue; /* DevX SQ number */
 };
 
 /* Feature name to allocate metadata register. */
@@ -179,7 +179,7 @@ enum mlx5_feature_name {
 #define MLX5_FLOW_LAYER_GENEVE (1u << 26)
 
 /* Queue items. */
-#define MLX5_FLOW_ITEM_TX_QUEUE (1u << 27)
+#define MLX5_FLOW_ITEM_SQ (1u << 27)
 
 /* Pattern tunnel Layer bits (continued). */
 #define MLX5_FLOW_LAYER_GTP (1u << 28)
@@ -2475,9 +2475,8 @@ int mlx5_flow_pick_transfer_proxy(struct rte_eth_dev *dev,
 
 int mlx5_flow_hw_flush_ctrl_flows(struct rte_eth_dev *dev);
 
-int mlx5_flow_hw_esw_create_mgr_sq_miss_flow(struct rte_eth_dev *dev);
 int mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev,
-					 uint32_t txq);
+					 uint32_t sqn);
 int mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev);
 int mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev);
 int mlx5_flow_actions_validate(struct rte_eth_dev *dev,
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index e86a06eae6..0f6fd34a8b 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -7453,8 +7453,8 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 				return ret;
 			last_item = MLX5_FLOW_ITEM_TAG;
 			break;
-		case MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE:
-			last_item = MLX5_FLOW_ITEM_TX_QUEUE;
+		case MLX5_RTE_FLOW_ITEM_TYPE_SQ:
+			last_item = MLX5_FLOW_ITEM_SQ;
 			break;
 		case MLX5_RTE_FLOW_ITEM_TYPE_TAG:
 			break;
@@ -8343,7 +8343,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 	 * work due to metadata regC0 mismatch.
 	 */
 	if ((!attr->transfer && attr->egress) && priv->representor &&
-	    !(item_flags & MLX5_FLOW_ITEM_TX_QUEUE))
+	    !(item_flags & MLX5_FLOW_ITEM_SQ))
 		return rte_flow_error_set(error, EINVAL,
 					  RTE_FLOW_ERROR_TYPE_ITEM,
 					  NULL,
@@ -10123,6 +10123,29 @@ flow_dv_translate_item_port_id(struct rte_eth_dev *dev, void *key,
 	return 0;
 }
 
+/**
+ * Translate port representor item to eswitch match on port id.
+ *
+ * @param[in] dev
+ *   The device to configure through.
+ * @param[in, out] key
+ *   Flow matcher value.
+ * @param[in] key_type
+ *   Set flow matcher mask or value.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise.
+ */
+static int
+flow_dv_translate_item_port_representor(struct rte_eth_dev *dev, void *key,
+					uint32_t key_type)
+{
+	flow_dv_translate_item_source_vport(key,
+			key_type & MLX5_SET_MATCHER_V ?
+			mlx5_flow_get_esw_manager_vport_id(dev) : 0xffff);
+	return 0;
+}
+
 /**
  * Translate represented port item to eswitch match on port id.
  *
@@ -11402,10 +11425,10 @@ flow_dv_translate_create_counter(struct rte_eth_dev *dev,
 }
 
 /**
- * Add Tx queue matcher
+ * Add SQ matcher
  *
- * @param[in] dev
- *   Pointer to the dev struct.
+ * @param[in, out] matcher
+ *   Flow matcher.
  * @param[in, out] key
  *   Flow matcher value.
  * @param[in] item
@@ -11414,40 +11437,29 @@ flow_dv_translate_create_counter(struct rte_eth_dev *dev,
  *   Set flow matcher mask or value.
  */
 static void
-flow_dv_translate_item_tx_queue(struct rte_eth_dev *dev,
-				void *key,
-				const struct rte_flow_item *item,
-				uint32_t key_type)
+flow_dv_translate_item_sq(void *key,
+			  const struct rte_flow_item *item,
+			  uint32_t key_type)
 {
-	const struct mlx5_rte_flow_item_tx_queue *queue_m;
-	const struct mlx5_rte_flow_item_tx_queue *queue_v;
-	const struct mlx5_rte_flow_item_tx_queue queue_mask = {
+	const struct mlx5_rte_flow_item_sq *queue_m;
+	const struct mlx5_rte_flow_item_sq *queue_v;
+	const struct mlx5_rte_flow_item_sq queue_mask = {
 		.queue = UINT32_MAX,
 	};
-	void *misc_v =
-		MLX5_ADDR_OF(fte_match_param, key, misc_parameters);
-	struct mlx5_txq_ctrl *txq = NULL;
+	void *misc_v = MLX5_ADDR_OF(fte_match_param, key, misc_parameters);
 	uint32_t queue;
 
 	MLX5_ITEM_UPDATE(item, key_type, queue_v, queue_m, &queue_mask);
 	if (!queue_m || !queue_v)
 		return;
 	if (key_type & MLX5_SET_MATCHER_V) {
-		txq = mlx5_txq_get(dev, queue_v->queue);
-		if (!txq)
-			return;
-		if (txq->is_hairpin)
-			queue = txq->obj->sq->id;
-		else
-			queue = txq->obj->sq_obj.sq->id;
+		queue = queue_v->queue;
 		if (key_type == MLX5_SET_MATCHER_SW_V)
 			queue &= queue_m->queue;
 	} else {
 		queue = queue_m->queue;
 	}
 	MLX5_SET(fte_match_set_misc, misc_v, source_sqn, queue);
-	if (txq)
-		mlx5_txq_release(dev, queue_v->queue);
 }
 
 /**
@@ -13148,6 +13160,11 @@ flow_dv_translate_items(struct rte_eth_dev *dev,
 			(dev, key, items, wks->attr, key_type);
 		last_item = MLX5_FLOW_ITEM_PORT_ID;
 		break;
+	case RTE_FLOW_ITEM_TYPE_PORT_REPRESENTOR:
+		flow_dv_translate_item_port_representor
+			(dev, key, key_type);
+		last_item = MLX5_FLOW_ITEM_PORT_REPRESENTOR;
+		break;
 	case RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT:
 		flow_dv_translate_item_represented_port
 			(dev, key, items, wks->attr, key_type);
@@ -13353,9 +13370,9 @@ flow_dv_translate_items(struct rte_eth_dev *dev,
 		flow_dv_translate_mlx5_item_tag(dev, key, items, key_type);
 		last_item = MLX5_FLOW_ITEM_TAG;
 		break;
-	case MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE:
-		flow_dv_translate_item_tx_queue(dev, key, items, key_type);
-		last_item = MLX5_FLOW_ITEM_TX_QUEUE;
+	case MLX5_RTE_FLOW_ITEM_TYPE_SQ:
+		flow_dv_translate_item_sq(key, items, key_type);
+		last_item = MLX5_FLOW_ITEM_SQ;
 		break;
 	case RTE_FLOW_ITEM_TYPE_GTP:
 		flow_dv_translate_item_gtp(key, items, tunnel, key_type);
@@ -13564,7 +13581,6 @@ flow_dv_translate_items_sws(struct rte_eth_dev *dev,
 			wks.last_item = tunnel ? MLX5_FLOW_ITEM_INNER_FLEX :
 						 MLX5_FLOW_ITEM_OUTER_FLEX;
 			break;
-
 		default:
 			ret = flow_dv_translate_items(dev, items, &wks_m,
 				match_mask, MLX5_SET_MATCHER_SW_M, error);
@@ -13587,7 +13603,9 @@ flow_dv_translate_items_sws(struct rte_eth_dev *dev,
 	 * in use.
 	 */
 	if (!(wks.item_flags & MLX5_FLOW_ITEM_PORT_ID) &&
-	    !(wks.item_flags & MLX5_FLOW_ITEM_REPRESENTED_PORT) && priv->sh->esw_mode &&
+	    !(wks.item_flags & MLX5_FLOW_ITEM_REPRESENTED_PORT) &&
+	    !(wks.item_flags & MLX5_FLOW_ITEM_PORT_REPRESENTOR) &&
+	    priv->sh->esw_mode &&
 	    !(attr->egress && !attr->transfer) &&
 	    attr->group != MLX5_FLOW_MREG_CP_TABLE_GROUP) {
 		if (flow_dv_translate_item_port_id_all(dev, match_mask,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 2b5eab6659..b2824ad8fe 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -3173,7 +3173,10 @@ flow_hw_translate_group(struct rte_eth_dev *dev,
 	struct mlx5_priv *priv = dev->data->dev_private;
 	const struct rte_flow_attr *flow_attr = &cfg->attr.flow_attr;
 
-	if (priv->sh->config.dv_esw_en && cfg->external && flow_attr->transfer) {
+	if (priv->sh->config.dv_esw_en &&
+	    priv->fdb_def_rule &&
+	    cfg->external &&
+	    flow_attr->transfer) {
 		if (group > MLX5_HW_MAX_TRANSFER_GROUP)
 			return rte_flow_error_set(error, EINVAL,
 						  RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
@@ -4648,7 +4651,7 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ITEM_TYPE_GTP:
 		case RTE_FLOW_ITEM_TYPE_GTP_PSC:
 		case RTE_FLOW_ITEM_TYPE_VXLAN:
-		case MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE:
+		case MLX5_RTE_FLOW_ITEM_TYPE_SQ:
 		case RTE_FLOW_ITEM_TYPE_GRE:
 		case RTE_FLOW_ITEM_TYPE_GRE_KEY:
 		case RTE_FLOW_ITEM_TYPE_GRE_OPTION:
@@ -5141,14 +5144,23 @@ flow_hw_free_vport_actions(struct mlx5_priv *priv)
 }
 
 static uint32_t
-flow_hw_usable_lsb_vport_mask(struct mlx5_priv *priv)
+flow_hw_esw_mgr_regc_marker_mask(struct rte_eth_dev *dev)
 {
-	uint32_t usable_mask = ~priv->vport_meta_mask;
+	uint32_t mask = MLX5_SH(dev)->dv_regc0_mask;
 
-	if (usable_mask)
-		return (1 << rte_bsf32(usable_mask));
-	else
-		return 0;
+	/* Mask is verified during device initialization. */
+	MLX5_ASSERT(mask != 0);
+	return mask;
+}
+
+static uint32_t
+flow_hw_esw_mgr_regc_marker(struct rte_eth_dev *dev)
+{
+	uint32_t mask = MLX5_SH(dev)->dv_regc0_mask;
+
+	/* Mask is verified during device initialization. */
+	MLX5_ASSERT(mask != 0);
+	return RTE_BIT32(rte_bsf32(mask));
 }
 
 /**
@@ -5174,12 +5186,19 @@ flow_hw_create_ctrl_esw_mgr_pattern_template(struct rte_eth_dev *dev)
 	struct rte_flow_item_ethdev port_mask = {
 		.port_id = UINT16_MAX,
 	};
+	struct mlx5_rte_flow_item_sq sq_mask = {
+		.queue = UINT32_MAX,
+	};
 	struct rte_flow_item items[] = {
 		{
 			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
 			.spec = &port_spec,
 			.mask = &port_mask,
 		},
+		{
+			.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_SQ,
+			.mask = &sq_mask,
+		},
 		{
 			.type = RTE_FLOW_ITEM_TYPE_END,
 		},
@@ -5189,9 +5208,10 @@ flow_hw_create_ctrl_esw_mgr_pattern_template(struct rte_eth_dev *dev)
 }
 
 /**
- * Creates a flow pattern template used to match REG_C_0 and a TX queue.
- * Matching on REG_C_0 is set up to match on least significant bit usable
- * by user-space, which is set when packet was originated from E-Switch Manager.
+ * Creates a flow pattern template used to match REG_C_0 and an SQ.
+ * Matching on REG_C_0 is set up to match on all bits usable by user-space.
+ * If traffic was sent from E-Switch Manager, then all usable bits will be set to 0,
+ * except the least significant bit, which will be set to 1.
  *
  * This template is used to set up a table for SQ miss default flow.
  *
@@ -5204,8 +5224,6 @@ flow_hw_create_ctrl_esw_mgr_pattern_template(struct rte_eth_dev *dev)
 static struct rte_flow_pattern_template *
 flow_hw_create_ctrl_regc_sq_pattern_template(struct rte_eth_dev *dev)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
-	uint32_t marker_bit = flow_hw_usable_lsb_vport_mask(priv);
 	struct rte_flow_pattern_template_attr attr = {
 		.relaxed_matching = 0,
 		.transfer = 1,
@@ -5215,8 +5233,9 @@ flow_hw_create_ctrl_regc_sq_pattern_template(struct rte_eth_dev *dev)
 	};
 	struct rte_flow_item_tag reg_c0_mask = {
 		.index = 0xff,
+		.data = flow_hw_esw_mgr_regc_marker_mask(dev),
 	};
-	struct mlx5_rte_flow_item_tx_queue queue_mask = {
+	struct mlx5_rte_flow_item_sq queue_mask = {
 		.queue = UINT32_MAX,
 	};
 	struct rte_flow_item items[] = {
@@ -5228,7 +5247,7 @@ flow_hw_create_ctrl_regc_sq_pattern_template(struct rte_eth_dev *dev)
 		},
 		{
 			.type = (enum rte_flow_item_type)
-				MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
+				MLX5_RTE_FLOW_ITEM_TYPE_SQ,
 			.mask = &queue_mask,
 		},
 		{
@@ -5236,12 +5255,6 @@ flow_hw_create_ctrl_regc_sq_pattern_template(struct rte_eth_dev *dev)
 		},
 	};
 
-	if (!marker_bit) {
-		DRV_LOG(ERR, "Unable to set up pattern template for SQ miss table");
-		return NULL;
-	}
-	reg_c0_spec.data = marker_bit;
-	reg_c0_mask.data = marker_bit;
 	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
 }
 
@@ -5333,9 +5346,8 @@ flow_hw_create_tx_default_mreg_copy_pattern_template(struct rte_eth_dev *dev)
 static struct rte_flow_actions_template *
 flow_hw_create_ctrl_regc_jump_actions_template(struct rte_eth_dev *dev)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
-	uint32_t marker_bit = flow_hw_usable_lsb_vport_mask(priv);
-	uint32_t marker_bit_mask = UINT32_MAX;
+	uint32_t marker_mask = flow_hw_esw_mgr_regc_marker_mask(dev);
+	uint32_t marker_bits = flow_hw_esw_mgr_regc_marker(dev);
 	struct rte_flow_actions_template_attr attr = {
 		.transfer = 1,
 	};
@@ -5348,7 +5360,7 @@ flow_hw_create_ctrl_regc_jump_actions_template(struct rte_eth_dev *dev)
 		.src = {
 			.field = RTE_FLOW_FIELD_VALUE,
 		},
-		.width = 1,
+		.width = __builtin_popcount(marker_mask),
 	};
 	struct rte_flow_action_modify_field set_reg_m = {
 		.operation = RTE_FLOW_MODIFY_SET,
@@ -5395,13 +5407,9 @@ flow_hw_create_ctrl_regc_jump_actions_template(struct rte_eth_dev *dev)
 		}
 	};
 
-	if (!marker_bit) {
-		DRV_LOG(ERR, "Unable to set up actions template for SQ miss table");
-		return NULL;
-	}
-	set_reg_v.dst.offset = rte_bsf32(marker_bit);
-	rte_memcpy(set_reg_v.src.value, &marker_bit, sizeof(marker_bit));
-	rte_memcpy(set_reg_m.src.value, &marker_bit_mask, sizeof(marker_bit_mask));
+	set_reg_v.dst.offset = rte_bsf32(marker_mask);
+	rte_memcpy(set_reg_v.src.value, &marker_bits, sizeof(marker_bits));
+	rte_memcpy(set_reg_m.src.value, &marker_mask, sizeof(marker_mask));
 	return flow_hw_actions_template_create(dev, &attr, actions_v, actions_m, NULL);
 }
 
@@ -5588,7 +5596,7 @@ flow_hw_create_ctrl_sq_miss_root_table(struct rte_eth_dev *dev,
 	struct rte_flow_template_table_attr attr = {
 		.flow_attr = {
 			.group = 0,
-			.priority = 0,
+			.priority = MLX5_HW_LOWEST_PRIO_ROOT,
 			.ingress = 0,
 			.egress = 0,
 			.transfer = 1,
@@ -5703,7 +5711,7 @@ flow_hw_create_ctrl_jump_table(struct rte_eth_dev *dev,
 	struct rte_flow_template_table_attr attr = {
 		.flow_attr = {
 			.group = 0,
-			.priority = MLX5_HW_LOWEST_PRIO_ROOT,
+			.priority = 0,
 			.ingress = 0,
 			.egress = 0,
 			.transfer = 1,
@@ -7765,141 +7773,123 @@ flow_hw_flush_all_ctrl_flows(struct rte_eth_dev *dev)
 }
 
 int
-mlx5_flow_hw_esw_create_mgr_sq_miss_flow(struct rte_eth_dev *dev)
+mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev, uint32_t sqn)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
-	struct rte_flow_item_ethdev port_spec = {
+	uint16_t port_id = dev->data->port_id;
+	struct rte_flow_item_ethdev esw_mgr_spec = {
 		.port_id = MLX5_REPRESENTED_PORT_ESW_MGR,
 	};
-	struct rte_flow_item_ethdev port_mask = {
+	struct rte_flow_item_ethdev esw_mgr_mask = {
 		.port_id = MLX5_REPRESENTED_PORT_ESW_MGR,
 	};
-	struct rte_flow_item items[] = {
-		{
-			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
-			.spec = &port_spec,
-			.mask = &port_mask,
-		},
-		{
-			.type = RTE_FLOW_ITEM_TYPE_END,
-		},
-	};
-	struct rte_flow_action_modify_field modify_field = {
-		.operation = RTE_FLOW_MODIFY_SET,
-		.dst = {
-			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
-		},
-		.src = {
-			.field = RTE_FLOW_FIELD_VALUE,
-		},
-		.width = 1,
-	};
-	struct rte_flow_action_jump jump = {
-		.group = 1,
-	};
-	struct rte_flow_action actions[] = {
-		{
-			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
-			.conf = &modify_field,
-		},
-		{
-			.type = RTE_FLOW_ACTION_TYPE_JUMP,
-			.conf = &jump,
-		},
-		{
-			.type = RTE_FLOW_ACTION_TYPE_END,
-		},
-	};
-
-	MLX5_ASSERT(priv->master);
-	if (!priv->dr_ctx ||
-	    !priv->hw_esw_sq_miss_root_tbl)
-		return 0;
-	return flow_hw_create_ctrl_flow(dev, dev,
-					priv->hw_esw_sq_miss_root_tbl,
-					items, 0, actions, 0);
-}
-
-int
-mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
-{
-	uint16_t port_id = dev->data->port_id;
 	struct rte_flow_item_tag reg_c0_spec = {
 		.index = (uint8_t)REG_C_0,
+		.data = flow_hw_esw_mgr_regc_marker(dev),
 	};
 	struct rte_flow_item_tag reg_c0_mask = {
 		.index = 0xff,
+		.data = flow_hw_esw_mgr_regc_marker_mask(dev),
 	};
-	struct mlx5_rte_flow_item_tx_queue queue_spec = {
-		.queue = txq,
-	};
-	struct mlx5_rte_flow_item_tx_queue queue_mask = {
-		.queue = UINT32_MAX,
-	};
-	struct rte_flow_item items[] = {
-		{
-			.type = (enum rte_flow_item_type)
-				MLX5_RTE_FLOW_ITEM_TYPE_TAG,
-			.spec = &reg_c0_spec,
-			.mask = &reg_c0_mask,
-		},
-		{
-			.type = (enum rte_flow_item_type)
-				MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
-			.spec = &queue_spec,
-			.mask = &queue_mask,
-		},
-		{
-			.type = RTE_FLOW_ITEM_TYPE_END,
-		},
+	struct mlx5_rte_flow_item_sq sq_spec = {
+		.queue = sqn,
 	};
 	struct rte_flow_action_ethdev port = {
 		.port_id = port_id,
 	};
-	struct rte_flow_action actions[] = {
-		{
-			.type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT,
-			.conf = &port,
-		},
-		{
-			.type = RTE_FLOW_ACTION_TYPE_END,
-		},
-	};
+	struct rte_flow_item items[3] = { { 0 } };
+	struct rte_flow_action actions[3] = { { 0 } };
 	struct rte_eth_dev *proxy_dev;
 	struct mlx5_priv *proxy_priv;
 	uint16_t proxy_port_id = dev->data->port_id;
-	uint32_t marker_bit;
 	int ret;
 
-	RTE_SET_USED(txq);
 	ret = rte_flow_pick_transfer_proxy(port_id, &proxy_port_id, NULL);
 	if (ret) {
-		DRV_LOG(ERR, "Unable to pick proxy port for port %u", port_id);
+		DRV_LOG(ERR, "Unable to pick transfer proxy port for port %u. Transfer proxy "
+			     "port must be present to create default SQ miss flows.",
+			     port_id);
 		return ret;
 	}
 	proxy_dev = &rte_eth_devices[proxy_port_id];
 	proxy_priv = proxy_dev->data->dev_private;
-	if (!proxy_priv->dr_ctx)
+	if (!proxy_priv->dr_ctx) {
+		DRV_LOG(DEBUG, "Transfer proxy port (port %u) of port %u must be configured "
+			       "for HWS to create default SQ miss flows. Default flows will "
+			       "not be created.",
+			       proxy_port_id, port_id);
 		return 0;
+	}
 	if (!proxy_priv->hw_esw_sq_miss_root_tbl ||
 	    !proxy_priv->hw_esw_sq_miss_tbl) {
-		DRV_LOG(ERR, "port %u proxy port %u was configured but default"
-			" flow tables are not created",
-			port_id, proxy_port_id);
+		DRV_LOG(ERR, "Transfer proxy port (port %u) of port %u was configured, but "
+			     "default flow tables were not created.",
+			     proxy_port_id, port_id);
 		rte_errno = ENOMEM;
 		return -rte_errno;
 	}
-	marker_bit = flow_hw_usable_lsb_vport_mask(proxy_priv);
-	if (!marker_bit) {
-		DRV_LOG(ERR, "Unable to set up control flow in SQ miss table");
-		rte_errno = EINVAL;
-		return -rte_errno;
+	/*
+	 * Create a root SQ miss flow rule - match E-Switch Manager and SQ,
+	 * and jump to group 1.
+	 */
+	items[0] = (struct rte_flow_item){
+		.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+		.spec = &esw_mgr_spec,
+		.mask = &esw_mgr_mask,
+	};
+	items[1] = (struct rte_flow_item){
+		.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_SQ,
+		.spec = &sq_spec,
+	};
+	items[2] = (struct rte_flow_item){
+		.type = RTE_FLOW_ITEM_TYPE_END,
+	};
+	actions[0] = (struct rte_flow_action){
+		.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+	};
+	actions[1] = (struct rte_flow_action){
+		.type = RTE_FLOW_ACTION_TYPE_JUMP,
+	};
+	actions[2] = (struct rte_flow_action) {
+		.type = RTE_FLOW_ACTION_TYPE_END,
+	};
+	ret = flow_hw_create_ctrl_flow(dev, proxy_dev, proxy_priv->hw_esw_sq_miss_root_tbl,
+				       items, 0, actions, 0);
+	if (ret) {
+		DRV_LOG(ERR, "Port %u failed to create root SQ miss flow rule for SQ %u, ret %d",
+			port_id, sqn, ret);
+		return ret;
 	}
-	reg_c0_spec.data = marker_bit;
-	reg_c0_mask.data = marker_bit;
-	return flow_hw_create_ctrl_flow(dev, proxy_dev,
-					proxy_priv->hw_esw_sq_miss_tbl,
-					items, 0, actions, 0);
+	/*
+	 * Create a non-root SQ miss flow rule - match REG_C_0 marker and SQ,
+	 * and forward to port.
+	 */
+	items[0] = (struct rte_flow_item){
+		.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+		.spec = &reg_c0_spec,
+		.mask = &reg_c0_mask,
+	};
+	items[1] = (struct rte_flow_item){
+		.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_SQ,
+		.spec = &sq_spec,
+	};
+	items[2] = (struct rte_flow_item){
+		.type = RTE_FLOW_ITEM_TYPE_END,
+	};
+	actions[0] = (struct rte_flow_action){
+		.type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT,
+		.conf = &port,
+	};
+	actions[1] = (struct rte_flow_action){
+		.type = RTE_FLOW_ACTION_TYPE_END,
+	};
+	ret = flow_hw_create_ctrl_flow(dev, proxy_dev, proxy_priv->hw_esw_sq_miss_tbl,
+				       items, 0, actions, 0);
+	if (ret) {
+		DRV_LOG(ERR, "Port %u failed to create HWS SQ miss flow rule for SQ %u, ret %d",
+			port_id, sqn, ret);
+		return ret;
+	}
+	return 0;
 }
 
 int
@@ -7937,17 +7927,24 @@ mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev)
 
 	ret = rte_flow_pick_transfer_proxy(port_id, &proxy_port_id, NULL);
 	if (ret) {
-		DRV_LOG(ERR, "Unable to pick proxy port for port %u", port_id);
+		DRV_LOG(ERR, "Unable to pick transfer proxy port for port %u. Transfer proxy "
+			     "port must be present to create default FDB jump rule.",
+			     port_id);
 		return ret;
 	}
 	proxy_dev = &rte_eth_devices[proxy_port_id];
 	proxy_priv = proxy_dev->data->dev_private;
-	if (!proxy_priv->dr_ctx)
+	if (!proxy_priv->dr_ctx) {
+		DRV_LOG(DEBUG, "Transfer proxy port (port %u) of port %u must be configured "
+			       "for HWS to create default FDB jump rule. Default rule will "
+			       "not be created.",
+			       proxy_port_id, port_id);
 		return 0;
+	}
 	if (!proxy_priv->hw_esw_zero_tbl) {
-		DRV_LOG(ERR, "port %u proxy port %u was configured but default"
-			" flow tables are not created",
-			port_id, proxy_port_id);
+		DRV_LOG(ERR, "Transfer proxy port (port %u) of port %u was configured, but "
+			     "default flow tables were not created.",
+			     proxy_port_id, port_id);
 		rte_errno = EINVAL;
 		return -rte_errno;
 	}
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 2603196933..a973cbc5e3 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -426,7 +426,7 @@ mlx5_hairpin_queue_peer_update(struct rte_eth_dev *dev, uint16_t peer_queue,
 			mlx5_txq_release(dev, peer_queue);
 			return -rte_errno;
 		}
-		peer_info->qp_id = txq_ctrl->obj->sq->id;
+		peer_info->qp_id = mlx5_txq_get_sqn(txq_ctrl);
 		peer_info->vhca_id = priv->sh->cdev->config.hca_attr.vhca_id;
 		/* 1-to-1 mapping, only the first one is used. */
 		peer_info->peer_q = txq_ctrl->hairpin_conf.peers[0].queue;
@@ -818,7 +818,7 @@ mlx5_hairpin_bind_single_port(struct rte_eth_dev *dev, uint16_t rx_port)
 		}
 		/* Pass TxQ's information to peer RxQ and try binding. */
 		cur.peer_q = rx_queue;
-		cur.qp_id = txq_ctrl->obj->sq->id;
+		cur.qp_id = mlx5_txq_get_sqn(txq_ctrl);
 		cur.vhca_id = priv->sh->cdev->config.hca_attr.vhca_id;
 		cur.tx_explicit = txq_ctrl->hairpin_conf.tx_explicit;
 		cur.manual_bind = txq_ctrl->hairpin_conf.manual_bind;
@@ -1300,8 +1300,6 @@ mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
 	int ret;
 
 	if (priv->sh->config.dv_esw_en && priv->master) {
-		if (mlx5_flow_hw_esw_create_mgr_sq_miss_flow(dev))
-			goto error;
 		if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS)
 			if (mlx5_flow_hw_create_tx_default_mreg_copy_flow(dev))
 				goto error;
@@ -1312,10 +1310,7 @@ mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
 
 		if (!txq)
 			continue;
-		if (txq->is_hairpin)
-			queue = txq->obj->sq->id;
-		else
-			queue = txq->obj->sq_obj.sq->id;
+		queue = mlx5_txq_get_sqn(txq);
 		if ((priv->representor || priv->master) &&
 		    priv->sh->config.dv_esw_en) {
 			if (mlx5_flow_hw_esw_create_sq_miss_flow(dev, queue)) {
@@ -1325,9 +1320,15 @@ mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
 		}
 		mlx5_txq_release(dev, i);
 	}
-	if ((priv->master || priv->representor) && priv->sh->config.dv_esw_en) {
-		if (mlx5_flow_hw_esw_create_default_jump_flow(dev))
-			goto error;
+	if (priv->sh->config.fdb_def_rule) {
+		if ((priv->master || priv->representor) && priv->sh->config.dv_esw_en) {
+			if (!mlx5_flow_hw_esw_create_default_jump_flow(dev))
+				priv->fdb_def_rule = 1;
+			else
+				goto error;
+		}
+	} else {
+		DRV_LOG(INFO, "port %u FDB default rule is disabled", dev->data->port_id);
 	}
 	return 0;
 error:
@@ -1393,14 +1394,18 @@ mlx5_traffic_enable(struct rte_eth_dev *dev)
 		    txq_ctrl->hairpin_conf.tx_explicit == 0 &&
 		    txq_ctrl->hairpin_conf.peers[0].port ==
 		    priv->dev_data->port_id) {
-			ret = mlx5_ctrl_flow_source_queue(dev, i);
+			ret = mlx5_ctrl_flow_source_queue(dev,
+					mlx5_txq_get_sqn(txq_ctrl));
 			if (ret) {
 				mlx5_txq_release(dev, i);
 				goto error;
 			}
 		}
 		if (priv->sh->config.dv_esw_en) {
-			if (mlx5_flow_create_devx_sq_miss_flow(dev, i) == 0) {
+			uint32_t q = mlx5_txq_get_sqn(txq_ctrl);
+
+			if (mlx5_flow_create_devx_sq_miss_flow(dev, q) == 0) {
+				mlx5_txq_release(dev, i);
 				DRV_LOG(ERR,
 					"Port %u Tx queue %u SQ create representor devx default miss rule failed.",
 					dev->data->port_id, i);
diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h
index e0fc1872fe..6471ebf59f 100644
--- a/drivers/net/mlx5/mlx5_tx.h
+++ b/drivers/net/mlx5/mlx5_tx.h
@@ -213,6 +213,7 @@ struct mlx5_txq_ctrl *mlx5_txq_get(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_release(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_releasable(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_verify(struct rte_eth_dev *dev);
+int mlx5_txq_get_sqn(struct mlx5_txq_ctrl *txq);
 void txq_alloc_elts(struct mlx5_txq_ctrl *txq_ctrl);
 void txq_free_elts(struct mlx5_txq_ctrl *txq_ctrl);
 uint64_t mlx5_get_tx_port_offloads(struct rte_eth_dev *dev);
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 9150ced72d..7a0f1d61a5 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -27,6 +27,8 @@
 #include "mlx5_tx.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_autoconf.h"
+#include "rte_pmd_mlx5.h"
+#include "mlx5_flow.h"
 
 /**
  * Allocate TX queue elements.
@@ -1274,6 +1276,51 @@ mlx5_txq_verify(struct rte_eth_dev *dev)
 	return ret;
 }
 
+int
+mlx5_txq_get_sqn(struct mlx5_txq_ctrl *txq)
+{
+	return txq->is_hairpin ? txq->obj->sq->id : txq->obj->sq_obj.sq->id;
+}
+
+int
+rte_pmd_mlx5_external_sq_enable(uint16_t port_id, uint32_t sq_num)
+{
+	struct rte_eth_dev *dev;
+	struct mlx5_priv *priv;
+	uint32_t flow;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		DRV_LOG(ERR, "There is no Ethernet device for port %u.",
+			port_id);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+	dev = &rte_eth_devices[port_id];
+	priv = dev->data->dev_private;
+	if ((!priv->representor && !priv->master) ||
+	    !priv->sh->config.dv_esw_en) {
+		DRV_LOG(ERR, "Port %u must be represetnor or master port in E-Switch mode.",
+			port_id);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	if (sq_num == 0) {
+		DRV_LOG(ERR, "Invalid SQ number.");
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+	if (priv->sh->config.dv_flow_en == 2)
+		return mlx5_flow_hw_esw_create_sq_miss_flow(dev, sq_num);
+#endif
+	flow = mlx5_flow_create_devx_sq_miss_flow(dev, sq_num);
+	if (flow > 0)
+		return 0;
+	DRV_LOG(ERR, "Port %u failed to create default miss flow for SQ %u.",
+		port_id, sq_num);
+	return -rte_errno;
+}
+
 /**
  * Set the Tx queue dynamic timestamp (mask and offset)
  *
diff --git a/drivers/net/mlx5/rte_pmd_mlx5.h b/drivers/net/mlx5/rte_pmd_mlx5.h
index fbfdd9737b..d4caea5b20 100644
--- a/drivers/net/mlx5/rte_pmd_mlx5.h
+++ b/drivers/net/mlx5/rte_pmd_mlx5.h
@@ -139,6 +139,23 @@ int rte_pmd_mlx5_external_rx_queue_id_unmap(uint16_t port_id,
 __rte_experimental
 int rte_pmd_mlx5_host_shaper_config(int port_id, uint8_t rate, uint32_t flags);
 
+/**
+ * Enable traffic for external SQ.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] sq_num
+ *   SQ HW number.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ *   Possible values for rte_errno:
+ *   - EINVAL - invalid sq_num or port type.
+ *   - ENODEV - there is no Ethernet device for this port id.
+ */
+__rte_experimental
+int rte_pmd_mlx5_external_sq_enable(uint16_t port_id, uint32_t sq_num);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/drivers/net/mlx5/version.map b/drivers/net/mlx5/version.map
index 9942de5079..848270da13 100644
--- a/drivers/net/mlx5/version.map
+++ b/drivers/net/mlx5/version.map
@@ -14,4 +14,5 @@ EXPERIMENTAL {
 	rte_pmd_mlx5_external_rx_queue_id_unmap;
 	# added in 22.07
 	rte_pmd_mlx5_host_shaper_config;
+	rte_pmd_mlx5_external_sq_enable;
 };
-- 
2.25.1
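
For reference, a minimal usage sketch of the rte_pmd_mlx5_external_sq_enable()
API declared in rte_pmd_mlx5.h above is shown below. The port id and SQ number
are placeholders, and the call is assumed to be made for a representor or
master port with E-Switch enabled; error handling is reduced to a log message:

    #include <stdio.h>
    #include <rte_errno.h>
    #include <rte_pmd_mlx5.h>

    /* Enable default miss flows for an externally created SQ (sketch). */
    static int
    enable_external_sq(uint16_t port_id, uint32_t sq_num)
    {
    	int ret = rte_pmd_mlx5_external_sq_enable(port_id, sq_num);

    	if (ret < 0)
    		printf("port %u: cannot enable external SQ %u: %s\n",
    		       port_id, sq_num, rte_strerror(rte_errno));
    	return ret;
    }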


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v3 17/17] net/mlx5: support device control of representor matching
  2022-09-30 12:52 ` [PATCH v3 00/17] net/mlx5: HW steering PMD update Suanming Mou
                     ` (15 preceding siblings ...)
  2022-09-30 12:53   ` [PATCH v3 16/17] net/mlx5: support device control for E-Switch default rule Suanming Mou
@ 2022-09-30 12:53   ` Suanming Mou
  16 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-09-30 12:53 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Dariusz Sosnowski

From: Dariusz Sosnowski <dsosnowski@nvidia.com>

In some E-Switch use cases, applications want to receive all traffic
on a single port. Since the flow API currently does not provide a way to
match traffic forwarded to any port representor, this patch adds
support for controlling representor matching on ingress flow rules.

Representor matching is controlled through the new device argument
repr_matching_en.

- If representor matching is enabled (default setting),
  then each ingress pattern template has an implicit REPRESENTED_PORT
  item added. Flow rules based on this pattern template will match
  the vport associated with the port on which the rule is created.
- If representor matching is disabled, then no implicit item is
  added. As a result, ingress flow rules will match traffic
  coming to any port, not only the port on which the flow rule is created.

Representor matching is enabled by default to preserve the expected
default behavior.
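
As an illustration only (the PCI address and representor list below are
placeholders), representor matching can be disabled together with HW
steering through EAL device arguments, e.g.:

    -a 0000:08:00.0,representor=vf[0-1],dv_flow_en=2,repr_matching_en=0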

This patch enables egress flow rules on representors when E-Switch is
enabled in the following configurations:

- repr_matching_en=1 and dv_xmeta_en=4
- repr_matching_en=1 and dv_xmeta_en=0
- repr_matching_en=0 and dv_xmeta_en=0

When representor matching is enabled, the following logic is
implemented:

1. An egress template table is created in group 0 for each port. These
   tables will hold default flow rules defined as follows:

      pattern SQ
      actions MODIFY_FIELD (set available bits in REG_C_0 to
                            vport_meta_tag)
              MODIFY_FIELD (copy REG_A to REG_C_1, only when
                            dv_xmeta_en == 4)
              JUMP (group 1)

2. Egress pattern templates created by an application have an implicit
   MLX5_RTE_FLOW_ITEM_TYPE_TAG item prepended to pattern, which matches
   available bits of REG_C_0.

3. Egress flow rules created by an application have an implicit
   MLX5_RTE_FLOW_ITEM_TYPE_TAG item prepended to pattern, which matches
   vport_meta_tag placed in available bits of REG_C_0.

4. Egress template tables created by an application, which are in
   group n, are placed in group n + 1.

5. Items and actions related to META operate on REG_A when
   dv_xmeta_en == 0, or on REG_C_1 when dv_xmeta_en == 4.

When representor matching is disabled and extended metadata is disabled,
no changes to current logic are required.
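
As a sketch of items 2 and 3 above (the PMD adds this item internally;
it is not part of the application-visible pattern), the implicitly
prepended egress match is conceptually equivalent to the following,
where vport_meta_tag and regc0_mask are placeholders for the per-port
values computed by the PMD:

    /* Hypothetical per-port values; the PMD derives the real ones
     * from priv->vport_meta_tag and priv->sh->dv_regc0_mask.
     */
    uint32_t vport_meta_tag = 0x1;
    uint32_t regc0_mask = 0x0000ffff;

    struct rte_flow_item_tag tag_spec = {
    	.data = vport_meta_tag,
    	.index = REG_C_0,	/* internal register item index */
    };
    struct rte_flow_item_tag tag_mask = {
    	.data = regc0_mask,	/* bits of REG_C_0 usable by the PMD */
    	.index = 0xff,
    };
    struct rte_flow_item implicit_tag = {
    	.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_TAG,
    	.spec = &tag_spec,
    	.mask = &tag_mask,
    };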

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_os.c |  11 +
 drivers/net/mlx5/mlx5.c          |  13 +
 drivers/net/mlx5/mlx5.h          |   5 +
 drivers/net/mlx5/mlx5_flow.c     |   8 +-
 drivers/net/mlx5/mlx5_flow.h     |   7 +
 drivers/net/mlx5/mlx5_flow_hw.c  | 738 ++++++++++++++++++++++++-------
 drivers/net/mlx5/mlx5_trigger.c  | 167 ++++++-
 7 files changed, 783 insertions(+), 166 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index de8c003d02..50d34b152a 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1555,6 +1555,9 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	if (priv->sh->config.dv_flow_en == 2) {
 #ifdef HAVE_IBV_FLOW_DV_SUPPORT
 		if (priv->sh->config.dv_esw_en) {
+			uint32_t usable_bits;
+			uint32_t required_bits;
+
 			if (priv->sh->dv_regc0_mask == UINT32_MAX) {
 				DRV_LOG(ERR, "E-Switch port metadata is required when using HWS "
 					     "but it is disabled (configure it through devlink)");
@@ -1567,6 +1570,14 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 				err = ENOTSUP;
 				goto error;
 			}
+			usable_bits = __builtin_popcount(priv->sh->dv_regc0_mask);
+			required_bits = __builtin_popcount(priv->vport_meta_mask);
+			if (usable_bits < required_bits) {
+				DRV_LOG(ERR, "Not enough bits available in reg_c[0] to provide "
+					     "representor matching.");
+				err = ENOTSUP;
+				goto error;
+			}
 		}
 		if (priv->vport_meta_mask)
 			flow_hw_set_port_info(eth_dev);
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 742607509b..c249619a60 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -181,6 +181,9 @@
 /* HW steering counter's query interval. */
 #define MLX5_HWS_CNT_CYCLE_TIME "svc_cycle_time"
 
+/* Device parameter to control representor matching in ingress/egress flows with HWS. */
+#define MLX5_REPR_MATCHING_EN "repr_matching_en"
+
 /* Shared memory between primary and secondary processes. */
 struct mlx5_shared_data *mlx5_shared_data;
 
@@ -1283,6 +1286,8 @@ mlx5_dev_args_check_handler(const char *key, const char *val, void *opaque)
 		config->cnt_svc.service_core = tmp;
 	} else if (strcmp(MLX5_HWS_CNT_CYCLE_TIME, key) == 0) {
 		config->cnt_svc.cycle_time = tmp;
+	} else if (strcmp(MLX5_REPR_MATCHING_EN, key) == 0) {
+		config->repr_matching = !!tmp;
 	}
 	return 0;
 }
@@ -1321,6 +1326,7 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 		MLX5_FDB_DEFAULT_RULE_EN,
 		MLX5_HWS_CNT_SERVICE_CORE,
 		MLX5_HWS_CNT_CYCLE_TIME,
+		MLX5_REPR_MATCHING_EN,
 		NULL,
 	};
 	int ret = 0;
@@ -1335,6 +1341,7 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 	config->fdb_def_rule = 1;
 	config->cnt_svc.cycle_time = MLX5_CNT_SVC_CYCLE_TIME_DEFAULT;
 	config->cnt_svc.service_core = rte_get_main_lcore();
+	config->repr_matching = 1;
 	if (mkvlist != NULL) {
 		/* Process parameters. */
 		ret = mlx5_kvargs_process(mkvlist, params,
@@ -1368,6 +1375,11 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 			config->dv_xmeta_en);
 		config->dv_xmeta_en = MLX5_XMETA_MODE_LEGACY;
 	}
+	if (config->dv_flow_en != 2 && !config->repr_matching) {
+		DRV_LOG(DEBUG, "Disabling representor matching is valid only "
+			       "when HW Steering is enabled.");
+		config->repr_matching = 1;
+	}
 	if (config->tx_pp && !sh->dev_cap.txpp_en) {
 		DRV_LOG(ERR, "Packet pacing is not supported.");
 		rte_errno = ENODEV;
@@ -1411,6 +1423,7 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 	DRV_LOG(DEBUG, "\"allow_duplicate_pattern\" is %u.",
 		config->allow_duplicate_pattern);
 	DRV_LOG(DEBUG, "\"fdb_def_rule_en\" is %u.", config->fdb_def_rule);
+	DRV_LOG(DEBUG, "\"repr_matching_en\" is %u.", config->repr_matching);
 	return 0;
 }
 
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 419b5a18ca..a0fb6d04d8 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -316,6 +316,7 @@ struct mlx5_sh_config {
 	} cnt_svc; /* configure for HW steering's counter's service. */
 	/* Allow/Prevent the duplicate rules pattern. */
 	uint32_t fdb_def_rule:1; /* Create FDB default jump rule */
+	uint32_t repr_matching:1; /* Enable implicit vport matching in HWS FDB. */
 };
 
 /* Structure for VF VLAN workaround. */
@@ -366,6 +367,7 @@ struct mlx5_hw_q_job {
 			void *out_data;
 		} __rte_packed;
 		struct rte_flow_item_ethdev port_spec;
+		struct rte_flow_item_tag tag_spec;
 	} __rte_packed;
 };
 
@@ -1673,6 +1675,9 @@ struct mlx5_priv {
 	struct rte_flow_template_table *hw_esw_sq_miss_tbl;
 	struct rte_flow_template_table *hw_esw_zero_tbl;
 	struct rte_flow_template_table *hw_tx_meta_cpy_tbl;
+	struct rte_flow_pattern_template *hw_tx_repr_tagging_pt;
+	struct rte_flow_actions_template *hw_tx_repr_tagging_at;
+	struct rte_flow_template_table *hw_tx_repr_tagging_tbl;
 	struct mlx5_indexed_pool *flows[MLX5_FLOW_TYPE_MAXI];
 	/* RTE Flow rules. */
 	uint32_t ctrl_flows; /* Control flow rules. */
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 2142cd828a..026d4eb9c0 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -1123,7 +1123,11 @@ mlx5_flow_get_reg_id(struct rte_eth_dev *dev,
 		}
 		break;
 	case MLX5_METADATA_TX:
-		return REG_A;
+		if (config->dv_flow_en == 2 && config->dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS) {
+			return REG_C_1;
+		} else {
+			return REG_A;
+		}
 	case MLX5_METADATA_FDB:
 		switch (config->dv_xmeta_en) {
 		case MLX5_XMETA_MODE_LEGACY:
@@ -11319,7 +11323,7 @@ mlx5_flow_pick_transfer_proxy(struct rte_eth_dev *dev,
 			return 0;
 		}
 	}
-	return rte_flow_error_set(error, EINVAL,
+	return rte_flow_error_set(error, ENODEV,
 				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				  NULL, "unable to find a proxy port");
 }
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 63f946473d..a497dac474 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1199,12 +1199,18 @@ struct rte_flow_pattern_template {
 	struct rte_flow_pattern_template_attr attr;
 	struct mlx5dr_match_template *mt; /* mlx5 match template. */
 	uint64_t item_flags; /* Item layer flags. */
+	uint64_t orig_item_nb; /* Number of pattern items provided by the user (with END item). */
 	uint32_t refcnt;  /* Reference counter. */
 	/*
 	 * If true, then rule pattern should be prepended with
 	 * represented_port pattern item.
 	 */
 	bool implicit_port;
+	/*
+	 * If true, then rule pattern should be prepended with
+	 * tag pattern item for representor matching.
+	 */
+	bool implicit_tag;
 };
 
 /* Flow action template struct. */
@@ -2479,6 +2485,7 @@ int mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev,
 					 uint32_t sqn);
 int mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev);
 int mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev);
+int mlx5_flow_hw_tx_repr_matching_flow(struct rte_eth_dev *dev, uint32_t sqn);
 int mlx5_flow_actions_validate(struct rte_eth_dev *dev,
 		const struct rte_flow_actions_template_attr *attr,
 		const struct rte_flow_action actions[],
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index b2824ad8fe..461d344700 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -32,12 +32,15 @@
 /* Maximum number of rules in control flow tables. */
 #define MLX5_HW_CTRL_FLOW_NB_RULES (4096)
 
-/* Lowest flow group usable by an application. */
+/* Lowest flow group usable by an application if group translation is done. */
 #define MLX5_HW_LOWEST_USABLE_GROUP (1)
 
 /* Maximum group index usable by user applications for transfer flows. */
 #define MLX5_HW_MAX_TRANSFER_GROUP (UINT32_MAX - 1)
 
+/* Maximum group index usable by user applications for egress flows. */
+#define MLX5_HW_MAX_EGRESS_GROUP (UINT32_MAX - 1)
+
 /* Lowest priority for HW root table. */
 #define MLX5_HW_LOWEST_PRIO_ROOT 15
 
@@ -61,6 +64,9 @@ flow_hw_set_vlan_vid_construct(struct rte_eth_dev *dev,
 			       const struct mlx5_hw_actions *hw_acts,
 			       const struct rte_flow_action *action);
 
+static __rte_always_inline uint32_t flow_hw_tx_tag_regc_mask(struct rte_eth_dev *dev);
+static __rte_always_inline uint32_t flow_hw_tx_tag_regc_value(struct rte_eth_dev *dev);
+
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops;
 
 /* DR action flags with different table. */
@@ -2346,21 +2352,18 @@ flow_hw_get_rule_items(struct rte_eth_dev *dev,
 		       uint8_t pattern_template_index,
 		       struct mlx5_hw_q_job *job)
 {
-	if (table->its[pattern_template_index]->implicit_port) {
-		const struct rte_flow_item *curr_item;
-		unsigned int nb_items;
-		bool found_end;
-		unsigned int i;
-
-		/* Count number of pattern items. */
-		nb_items = 0;
-		found_end = false;
-		for (curr_item = items; !found_end; ++curr_item) {
-			++nb_items;
-			if (curr_item->type == RTE_FLOW_ITEM_TYPE_END)
-				found_end = true;
+	struct rte_flow_pattern_template *pt = table->its[pattern_template_index];
+
+	/* Only one implicit item can be added to flow rule pattern. */
+	MLX5_ASSERT(!pt->implicit_port || !pt->implicit_tag);
+	/* At least one item was allocated in job descriptor for items. */
+	MLX5_ASSERT(MLX5_HW_MAX_ITEMS >= 1);
+	if (pt->implicit_port) {
+		if (pt->orig_item_nb + 1 > MLX5_HW_MAX_ITEMS) {
+			rte_errno = ENOMEM;
+			return NULL;
 		}
-		/* Prepend represented port item. */
+		/* Set up represented port item in job descriptor. */
 		job->port_spec = (struct rte_flow_item_ethdev){
 			.port_id = dev->data->port_id,
 		};
@@ -2368,21 +2371,26 @@ flow_hw_get_rule_items(struct rte_eth_dev *dev,
 			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
 			.spec = &job->port_spec,
 		};
-		found_end = false;
-		for (i = 1; i < MLX5_HW_MAX_ITEMS && i - 1 < nb_items; ++i) {
-			job->items[i] = items[i - 1];
-			if (items[i - 1].type == RTE_FLOW_ITEM_TYPE_END) {
-				found_end = true;
-				break;
-			}
-		}
-		if (i >= MLX5_HW_MAX_ITEMS && !found_end) {
+		rte_memcpy(&job->items[1], items, sizeof(*items) * pt->orig_item_nb);
+		return job->items;
+	} else if (pt->implicit_tag) {
+		if (pt->orig_item_nb + 1 > MLX5_HW_MAX_ITEMS) {
 			rte_errno = ENOMEM;
 			return NULL;
 		}
+		/* Set up tag item in job descriptor. */
+		job->tag_spec = (struct rte_flow_item_tag){
+			.data = flow_hw_tx_tag_regc_value(dev),
+		};
+		job->items[0] = (struct rte_flow_item){
+			.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+			.spec = &job->tag_spec,
+		};
+		rte_memcpy(&job->items[1], items, sizeof(*items) * pt->orig_item_nb);
 		return job->items;
+	} else {
+		return items;
 	}
-	return items;
 }
 
 /**
@@ -2960,6 +2968,11 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 		     uint8_t nb_action_templates,
 		     struct rte_flow_error *error)
 {
+	struct rte_flow_error sub_error = {
+		.type = RTE_FLOW_ERROR_TYPE_NONE,
+		.cause = NULL,
+		.message = NULL,
+	};
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5dr_matcher_attr matcher_attr = {0};
 	struct rte_flow_template_table *tbl = NULL;
@@ -2970,7 +2983,7 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	struct rte_flow_attr flow_attr = attr->flow_attr;
 	struct mlx5_flow_cb_ctx ctx = {
 		.dev = dev,
-		.error = error,
+		.error = &sub_error,
 		.data = &flow_attr,
 	};
 	struct mlx5_indexed_pool_config cfg = {
@@ -3064,7 +3077,7 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 			continue;
 		err = __flow_hw_actions_translate(dev, &tbl->cfg,
 						  &tbl->ats[i].acts,
-						  action_templates[i], error);
+						  action_templates[i], &sub_error);
 		if (err) {
 			i++;
 			goto at_error;
@@ -3105,12 +3118,11 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 		mlx5_free(tbl);
 	}
 	if (error != NULL) {
-		rte_flow_error_set(error, err,
-				error->type == RTE_FLOW_ERROR_TYPE_NONE ?
-				RTE_FLOW_ERROR_TYPE_UNSPECIFIED : error->type,
-				NULL,
-				error->message == NULL ?
-				"fail to create rte table" : error->message);
+		if (sub_error.type == RTE_FLOW_ERROR_TYPE_NONE)
+			rte_flow_error_set(error, err, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					   "Failed to create template table");
+		else
+			rte_memcpy(error, &sub_error, sizeof(sub_error));
 	}
 	return NULL;
 }
@@ -3171,9 +3183,10 @@ flow_hw_translate_group(struct rte_eth_dev *dev,
 			struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_sh_config *config = &priv->sh->config;
 	const struct rte_flow_attr *flow_attr = &cfg->attr.flow_attr;
 
-	if (priv->sh->config.dv_esw_en &&
+	if (config->dv_esw_en &&
 	    priv->fdb_def_rule &&
 	    cfg->external &&
 	    flow_attr->transfer) {
@@ -3183,6 +3196,22 @@ flow_hw_translate_group(struct rte_eth_dev *dev,
 						  NULL,
 						  "group index not supported");
 		*table_group = group + 1;
+	} else if (config->dv_esw_en &&
+		   !(config->repr_matching && config->dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS) &&
+		   cfg->external &&
+		   flow_attr->egress) {
+		/*
+		 * On E-Switch setups, egress group translation is not done if and only if
+		 * representor matching is disabled and legacy metadata mode is selected.
+		 * In all other cases, egree group 0 is reserved for representor tagging flows
+		 * and metadata copy flows.
+		 */
+		if (group > MLX5_HW_MAX_EGRESS_GROUP)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
+						  NULL,
+						  "group index not supported");
+		*table_group = group + 1;
 	} else {
 		*table_group = group;
 	}
@@ -3223,7 +3252,6 @@ flow_hw_template_table_create(struct rte_eth_dev *dev,
 			      uint8_t nb_action_templates,
 			      struct rte_flow_error *error)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_flow_template_table_cfg cfg = {
 		.attr = *attr,
 		.external = true,
@@ -3232,12 +3260,6 @@ flow_hw_template_table_create(struct rte_eth_dev *dev,
 
 	if (flow_hw_translate_group(dev, &cfg, group, &cfg.attr.flow_attr.group, error))
 		return NULL;
-	if (priv->sh->config.dv_esw_en && cfg.attr.flow_attr.egress) {
-		rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ATTR, NULL,
-				  "egress flows are not supported with HW Steering"
-				  " when E-Switch is enabled");
-		return NULL;
-	}
 	return flow_hw_table_create(dev, &cfg, item_templates, nb_item_templates,
 				    action_templates, nb_action_templates, error);
 }
@@ -4494,26 +4516,28 @@ flow_hw_actions_template_destroy(struct rte_eth_dev *dev __rte_unused,
 	return 0;
 }
 
-static struct rte_flow_item *
-flow_hw_copy_prepend_port_item(const struct rte_flow_item *items,
-			       struct rte_flow_error *error)
+static uint32_t
+flow_hw_count_items(const struct rte_flow_item *items)
 {
 	const struct rte_flow_item *curr_item;
-	struct rte_flow_item *copied_items;
-	bool found_end;
-	unsigned int nb_items;
-	unsigned int i;
-	size_t size;
+	uint32_t nb_items;
 
-	/* Count number of pattern items. */
 	nb_items = 0;
-	found_end = false;
-	for (curr_item = items; !found_end; ++curr_item) {
+	for (curr_item = items; curr_item->type != RTE_FLOW_ITEM_TYPE_END; ++curr_item)
 		++nb_items;
-		if (curr_item->type == RTE_FLOW_ITEM_TYPE_END)
-			found_end = true;
-	}
-	/* Allocate new array of items and prepend REPRESENTED_PORT item. */
+	return ++nb_items;
+}
+
+static struct rte_flow_item *
+flow_hw_prepend_item(const struct rte_flow_item *items,
+		     const uint32_t nb_items,
+		     const struct rte_flow_item *new_item,
+		     struct rte_flow_error *error)
+{
+	struct rte_flow_item *copied_items;
+	size_t size;
+
+	/* Allocate new array of items. */
 	size = sizeof(*copied_items) * (nb_items + 1);
 	copied_items = mlx5_malloc(MLX5_MEM_ZERO, size, 0, rte_socket_id());
 	if (!copied_items) {
@@ -4523,14 +4547,9 @@ flow_hw_copy_prepend_port_item(const struct rte_flow_item *items,
 				   "cannot allocate item template");
 		return NULL;
 	}
-	copied_items[0] = (struct rte_flow_item){
-		.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
-		.spec = NULL,
-		.last = NULL,
-		.mask = &rte_flow_item_ethdev_mask,
-	};
-	for (i = 1; i < nb_items + 1; ++i)
-		copied_items[i] = items[i - 1];
+	/* Put new item at the beginning and copy the rest. */
+	copied_items[0] = *new_item;
+	rte_memcpy(&copied_items[1], items, sizeof(*items) * nb_items);
 	return copied_items;
 }
 
@@ -4551,17 +4570,13 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 	if (priv->sh->config.dv_esw_en) {
 		MLX5_ASSERT(priv->master || priv->representor);
 		if (priv->master) {
-			/*
-			 * It is allowed to specify ingress, egress and transfer attributes
-			 * at the same time, in order to construct flows catching all missed
-			 * FDB traffic and forwarding it to the master port.
-			 */
-			if (!(attr->ingress ^ attr->egress ^ attr->transfer))
+			if ((attr->ingress && attr->egress) ||
+			    (attr->ingress && attr->transfer) ||
+			    (attr->egress && attr->transfer))
 				return rte_flow_error_set(error, EINVAL,
 							  RTE_FLOW_ERROR_TYPE_ATTR, NULL,
-							  "only one or all direction attributes"
-							  " at once can be used on transfer proxy"
-							  " port");
+							  "only one direction attribute at once"
+							  " can be used on transfer proxy port");
 		} else {
 			if (attr->transfer)
 				return rte_flow_error_set(error, EINVAL,
@@ -4614,11 +4629,16 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 			break;
 		}
 		case RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT:
-			if (attr->ingress || attr->egress)
+			if (attr->ingress && priv->sh->config.repr_matching)
+				return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ITEM, NULL,
+						  "represented port item cannot be used"
+						  " when ingress attribute is set");
+			if (attr->egress)
 				return rte_flow_error_set(error, EINVAL,
 						  RTE_FLOW_ERROR_TYPE_ITEM, NULL,
 						  "represented port item cannot be used"
-						  " when transfer attribute is set");
+						  " when egress attribute is set");
 			break;
 		case RTE_FLOW_ITEM_TYPE_META:
 			if (!priv->sh->config.dv_esw_en ||
@@ -4680,6 +4700,17 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static bool
+flow_hw_pattern_has_sq_match(const struct rte_flow_item *items)
+{
+	unsigned int i;
+
+	for (i = 0; items[i].type != RTE_FLOW_ITEM_TYPE_END; ++i)
+		if (items[i].type == (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_SQ)
+			return true;
+	return false;
+}
+
 /**
  * Create flow item template.
  *
@@ -4705,17 +4736,53 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 	struct rte_flow_pattern_template *it;
 	struct rte_flow_item *copied_items = NULL;
 	const struct rte_flow_item *tmpl_items;
+	uint64_t orig_item_nb;
+	struct rte_flow_item port = {
+		.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+		.mask = &rte_flow_item_ethdev_mask,
+	};
+	struct rte_flow_item_tag tag_v = {
+		.data = 0,
+		.index = REG_C_0,
+	};
+	struct rte_flow_item_tag tag_m = {
+		.data = flow_hw_tx_tag_regc_mask(dev),
+		.index = 0xff,
+	};
+	struct rte_flow_item tag = {
+		.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+		.spec = &tag_v,
+		.mask = &tag_m,
+		.last = NULL
+	};
 
 	if (flow_hw_pattern_validate(dev, attr, items, error))
 		return NULL;
-	if (priv->sh->config.dv_esw_en && attr->ingress && !attr->egress && !attr->transfer) {
-		copied_items = flow_hw_copy_prepend_port_item(items, error);
+	orig_item_nb = flow_hw_count_items(items);
+	if (priv->sh->config.dv_esw_en &&
+	    priv->sh->config.repr_matching &&
+	    attr->ingress && !attr->egress && !attr->transfer) {
+		copied_items = flow_hw_prepend_item(items, orig_item_nb, &port, error);
+		if (!copied_items)
+			return NULL;
+		tmpl_items = copied_items;
+	} else if (priv->sh->config.dv_esw_en &&
+		   priv->sh->config.repr_matching &&
+		   !attr->ingress && attr->egress && !attr->transfer) {
+		if (flow_hw_pattern_has_sq_match(items)) {
+			DRV_LOG(DEBUG, "Port %u omitting implicit REG_C_0 match for egress "
+				       "pattern template", dev->data->port_id);
+			tmpl_items = items;
+			goto setup_pattern_template;
+		}
+		copied_items = flow_hw_prepend_item(items, orig_item_nb, &tag, error);
 		if (!copied_items)
 			return NULL;
 		tmpl_items = copied_items;
 	} else {
 		tmpl_items = items;
 	}
+setup_pattern_template:
 	it = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*it), 0, rte_socket_id());
 	if (!it) {
 		if (copied_items)
@@ -4727,6 +4794,7 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 		return NULL;
 	}
 	it->attr = *attr;
+	it->orig_item_nb = orig_item_nb;
 	it->mt = mlx5dr_match_template_create(tmpl_items, attr->relaxed_matching);
 	if (!it->mt) {
 		if (copied_items)
@@ -4739,11 +4807,15 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 		return NULL;
 	}
 	it->item_flags = flow_hw_rss_item_flags_get(tmpl_items);
-	it->implicit_port = !!copied_items;
+	if (copied_items) {
+		if (attr->ingress)
+			it->implicit_port = true;
+		else if (attr->egress)
+			it->implicit_tag = true;
+		mlx5_free(copied_items);
+	}
 	__atomic_fetch_add(&it->refcnt, 1, __ATOMIC_RELAXED);
 	LIST_INSERT_HEAD(&priv->flow_hw_itt, it, next);
-	if (copied_items)
-		mlx5_free(copied_items);
 	return it;
 }
 
@@ -5143,6 +5215,254 @@ flow_hw_free_vport_actions(struct mlx5_priv *priv)
 	priv->hw_vport = NULL;
 }
 
+/**
+ * Create an egress pattern template matching on source SQ.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to pattern template on success. NULL otherwise, and rte_errno is set.
+ */
+static struct rte_flow_pattern_template *
+flow_hw_create_tx_repr_sq_pattern_tmpl(struct rte_eth_dev *dev)
+{
+	struct rte_flow_pattern_template_attr attr = {
+		.relaxed_matching = 0,
+		.egress = 1,
+	};
+	struct mlx5_rte_flow_item_sq sq_mask = {
+		.queue = UINT32_MAX,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_SQ,
+			.mask = &sq_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+
+	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
+}
+
+static __rte_always_inline uint32_t
+flow_hw_tx_tag_regc_mask(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint32_t mask = priv->sh->dv_regc0_mask;
+
+	/* Mask is verified during device initialization. Sanity checking here. */
+	MLX5_ASSERT(mask != 0);
+	/*
+	 * Availability of sufficient number of bits in REG_C_0 is verified on initialization.
+	 * Sanity checking here.
+	 */
+	MLX5_ASSERT(__builtin_popcount(mask) >= __builtin_popcount(priv->vport_meta_mask));
+	return mask;
+}
+
+static __rte_always_inline uint32_t
+flow_hw_tx_tag_regc_value(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint32_t tag;
+
+	/* Mask is verified during device initialization. Sanity checking here. */
+	MLX5_ASSERT(priv->vport_meta_mask != 0);
+	tag = priv->vport_meta_tag >> (rte_bsf32(priv->vport_meta_mask));
+	/*
+	 * Availability of sufficient number of bits in REG_C_0 is verified on initialization.
+	 * Sanity checking here.
+	 */
+	MLX5_ASSERT((tag & priv->sh->dv_regc0_mask) == tag);
+	return tag;
+}
+
+static void
+flow_hw_update_action_mask(struct rte_flow_action *action,
+			   struct rte_flow_action *mask,
+			   enum rte_flow_action_type type,
+			   void *conf_v,
+			   void *conf_m)
+{
+	action->type = type;
+	action->conf = conf_v;
+	mask->type = type;
+	mask->conf = conf_m;
+}
+
+/**
+ * Create an egress actions template with MODIFY_FIELD action for setting unused REG_C_0 bits
+ * to vport tag and JUMP action to group 1.
+ *
+ * If extended metadata mode is enabled, then MODIFY_FIELD action for copying software metadata
+ * to REG_C_1 is added as well.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to actions template on success. NULL otherwise, and rte_errno is set.
+ */
+static struct rte_flow_actions_template *
+flow_hw_create_tx_repr_tag_jump_acts_tmpl(struct rte_eth_dev *dev)
+{
+	uint32_t tag_mask = flow_hw_tx_tag_regc_mask(dev);
+	uint32_t tag_value = flow_hw_tx_tag_regc_value(dev);
+	struct rte_flow_actions_template_attr attr = {
+		.egress = 1,
+	};
+	struct rte_flow_action_modify_field set_tag_v = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_C_0,
+			.offset = rte_bsf32(tag_mask),
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+		},
+		.width = __builtin_popcount(tag_mask),
+	};
+	struct rte_flow_action_modify_field set_tag_m = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+		},
+		.width = UINT32_MAX,
+	};
+	struct rte_flow_action_modify_field copy_metadata_v = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_C_1,
+		},
+		.src = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_A,
+		},
+		.width = 32,
+	};
+	struct rte_flow_action_modify_field copy_metadata_m = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.src = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.width = UINT32_MAX,
+	};
+	struct rte_flow_action_jump jump_v = {
+		.group = MLX5_HW_LOWEST_USABLE_GROUP,
+	};
+	struct rte_flow_action_jump jump_m = {
+		.group = UINT32_MAX,
+	};
+	struct rte_flow_action actions_v[4] = { { 0 } };
+	struct rte_flow_action actions_m[4] = { { 0 } };
+	unsigned int idx = 0;
+
+	rte_memcpy(set_tag_v.src.value, &tag_value, sizeof(tag_value));
+	rte_memcpy(set_tag_m.src.value, &tag_mask, sizeof(tag_mask));
+	flow_hw_update_action_mask(&actions_v[idx], &actions_m[idx],
+				   RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+				   &set_tag_v, &set_tag_m);
+	idx++;
+	if (MLX5_SH(dev)->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS) {
+		flow_hw_update_action_mask(&actions_v[idx], &actions_m[idx],
+					   RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+					   &copy_metadata_v, &copy_metadata_m);
+		idx++;
+	}
+	flow_hw_update_action_mask(&actions_v[idx], &actions_m[idx], RTE_FLOW_ACTION_TYPE_JUMP,
+				   &jump_v, &jump_m);
+	idx++;
+	flow_hw_update_action_mask(&actions_v[idx], &actions_m[idx], RTE_FLOW_ACTION_TYPE_END,
+				   NULL, NULL);
+	idx++;
+	MLX5_ASSERT(idx <= RTE_DIM(actions_v));
+	return flow_hw_actions_template_create(dev, &attr, actions_v, actions_m, NULL);
+}
+
+static void
+flow_hw_cleanup_tx_repr_tagging(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (priv->hw_tx_repr_tagging_tbl) {
+		flow_hw_table_destroy(dev, priv->hw_tx_repr_tagging_tbl, NULL);
+		priv->hw_tx_repr_tagging_tbl = NULL;
+	}
+	if (priv->hw_tx_repr_tagging_at) {
+		flow_hw_actions_template_destroy(dev, priv->hw_tx_repr_tagging_at, NULL);
+		priv->hw_tx_repr_tagging_at = NULL;
+	}
+	if (priv->hw_tx_repr_tagging_pt) {
+		flow_hw_pattern_template_destroy(dev, priv->hw_tx_repr_tagging_pt, NULL);
+		priv->hw_tx_repr_tagging_pt = NULL;
+	}
+}
+
+/**
+ * Setup templates and table used to create default Tx flow rules. These default rules
+ * allow for matching Tx representor traffic using a vport tag placed in unused bits of
+ * REG_C_0 register.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise.
+ */
+static int
+flow_hw_setup_tx_repr_tagging(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_flow_template_table_attr attr = {
+		.flow_attr = {
+			.group = 0,
+			.priority = MLX5_HW_LOWEST_PRIO_ROOT,
+			.egress = 1,
+		},
+		.nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
+	};
+	struct mlx5_flow_template_table_cfg cfg = {
+		.attr = attr,
+		.external = false,
+	};
+
+	MLX5_ASSERT(priv->sh->config.dv_esw_en);
+	MLX5_ASSERT(priv->sh->config.repr_matching);
+	priv->hw_tx_repr_tagging_pt = flow_hw_create_tx_repr_sq_pattern_tmpl(dev);
+	if (!priv->hw_tx_repr_tagging_pt)
+		goto error;
+	priv->hw_tx_repr_tagging_at = flow_hw_create_tx_repr_tag_jump_acts_tmpl(dev);
+	if (!priv->hw_tx_repr_tagging_at)
+		goto error;
+	priv->hw_tx_repr_tagging_tbl = flow_hw_table_create(dev, &cfg,
+							    &priv->hw_tx_repr_tagging_pt, 1,
+							    &priv->hw_tx_repr_tagging_at, 1,
+							    NULL);
+	if (!priv->hw_tx_repr_tagging_tbl)
+		goto error;
+	return 0;
+error:
+	flow_hw_cleanup_tx_repr_tagging(dev);
+	return -rte_errno;
+}
+
 static uint32_t
 flow_hw_esw_mgr_regc_marker_mask(struct rte_eth_dev *dev)
 {
@@ -5549,29 +5869,43 @@ flow_hw_create_tx_default_mreg_copy_actions_template(struct rte_eth_dev *dev)
 		},
 		.width = UINT32_MAX,
 	};
-	const struct rte_flow_action copy_reg_action[] = {
+	const struct rte_flow_action_jump jump_action = {
+		.group = 1,
+	};
+	const struct rte_flow_action_jump jump_mask = {
+		.group = UINT32_MAX,
+	};
+	const struct rte_flow_action actions[] = {
 		[0] = {
 			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
 			.conf = &mreg_action,
 		},
 		[1] = {
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump_action,
+		},
+		[2] = {
 			.type = RTE_FLOW_ACTION_TYPE_END,
 		},
 	};
-	const struct rte_flow_action copy_reg_mask[] = {
+	const struct rte_flow_action masks[] = {
 		[0] = {
 			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
 			.conf = &mreg_mask,
 		},
 		[1] = {
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump_mask,
+		},
+		[2] = {
 			.type = RTE_FLOW_ACTION_TYPE_END,
 		},
 	};
 	struct rte_flow_error drop_err;
 
 	RTE_SET_USED(drop_err);
-	return flow_hw_actions_template_create(dev, &tx_act_attr, copy_reg_action,
-					       copy_reg_mask, &drop_err);
+	return flow_hw_actions_template_create(dev, &tx_act_attr, actions,
+					       masks, &drop_err);
 }
 
 /**
@@ -5749,63 +6083,21 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 	struct rte_flow_actions_template *jump_one_actions_tmpl = NULL;
 	struct rte_flow_actions_template *tx_meta_actions_tmpl = NULL;
 	uint32_t xmeta = priv->sh->config.dv_xmeta_en;
+	uint32_t repr_matching = priv->sh->config.repr_matching;
 
-	/* Item templates */
+	/* Create templates and table for default SQ miss flow rules - root table. */
 	esw_mgr_items_tmpl = flow_hw_create_ctrl_esw_mgr_pattern_template(dev);
 	if (!esw_mgr_items_tmpl) {
 		DRV_LOG(ERR, "port %u failed to create E-Switch Manager item"
 			" template for control flows", dev->data->port_id);
 		goto error;
 	}
-	regc_sq_items_tmpl = flow_hw_create_ctrl_regc_sq_pattern_template(dev);
-	if (!regc_sq_items_tmpl) {
-		DRV_LOG(ERR, "port %u failed to create SQ item template for"
-			" control flows", dev->data->port_id);
-		goto error;
-	}
-	port_items_tmpl = flow_hw_create_ctrl_port_pattern_template(dev);
-	if (!port_items_tmpl) {
-		DRV_LOG(ERR, "port %u failed to create SQ item template for"
-			" control flows", dev->data->port_id);
-		goto error;
-	}
-	if (xmeta == MLX5_XMETA_MODE_META32_HWS) {
-		tx_meta_items_tmpl = flow_hw_create_tx_default_mreg_copy_pattern_template(dev);
-		if (!tx_meta_items_tmpl) {
-			DRV_LOG(ERR, "port %u failed to Tx metadata copy pattern"
-				" template for control flows", dev->data->port_id);
-			goto error;
-		}
-	}
-	/* Action templates */
 	regc_jump_actions_tmpl = flow_hw_create_ctrl_regc_jump_actions_template(dev);
 	if (!regc_jump_actions_tmpl) {
 		DRV_LOG(ERR, "port %u failed to create REG_C set and jump action template"
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
-	port_actions_tmpl = flow_hw_create_ctrl_port_actions_template(dev);
-	if (!port_actions_tmpl) {
-		DRV_LOG(ERR, "port %u failed to create port action template"
-			" for control flows", dev->data->port_id);
-		goto error;
-	}
-	jump_one_actions_tmpl = flow_hw_create_ctrl_jump_actions_template
-			(dev, MLX5_HW_LOWEST_USABLE_GROUP);
-	if (!jump_one_actions_tmpl) {
-		DRV_LOG(ERR, "port %u failed to create jump action template"
-			" for control flows", dev->data->port_id);
-		goto error;
-	}
-	if (xmeta == MLX5_XMETA_MODE_META32_HWS) {
-		tx_meta_actions_tmpl = flow_hw_create_tx_default_mreg_copy_actions_template(dev);
-		if (!tx_meta_actions_tmpl) {
-			DRV_LOG(ERR, "port %u failed to Tx metadata copy actions"
-				" template for control flows", dev->data->port_id);
-			goto error;
-		}
-	}
-	/* Tables */
 	MLX5_ASSERT(priv->hw_esw_sq_miss_root_tbl == NULL);
 	priv->hw_esw_sq_miss_root_tbl = flow_hw_create_ctrl_sq_miss_root_table
 			(dev, esw_mgr_items_tmpl, regc_jump_actions_tmpl);
@@ -5814,6 +6106,19 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
+	/* Create templates and table for default SQ miss flow rules - non-root table. */
+	regc_sq_items_tmpl = flow_hw_create_ctrl_regc_sq_pattern_template(dev);
+	if (!regc_sq_items_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create SQ item template for"
+			" control flows", dev->data->port_id);
+		goto error;
+	}
+	port_actions_tmpl = flow_hw_create_ctrl_port_actions_template(dev);
+	if (!port_actions_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create port action template"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
 	MLX5_ASSERT(priv->hw_esw_sq_miss_tbl == NULL);
 	priv->hw_esw_sq_miss_tbl = flow_hw_create_ctrl_sq_miss_table(dev, regc_sq_items_tmpl,
 								     port_actions_tmpl);
@@ -5822,6 +6127,20 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
+	/* Create templates and table for default FDB jump flow rules. */
+	port_items_tmpl = flow_hw_create_ctrl_port_pattern_template(dev);
+	if (!port_items_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create SQ item template for"
+			" control flows", dev->data->port_id);
+		goto error;
+	}
+	jump_one_actions_tmpl = flow_hw_create_ctrl_jump_actions_template
+			(dev, MLX5_HW_LOWEST_USABLE_GROUP);
+	if (!jump_one_actions_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create jump action template"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
 	MLX5_ASSERT(priv->hw_esw_zero_tbl == NULL);
 	priv->hw_esw_zero_tbl = flow_hw_create_ctrl_jump_table(dev, port_items_tmpl,
 							       jump_one_actions_tmpl);
@@ -5830,7 +6149,20 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
-	if (xmeta == MLX5_XMETA_MODE_META32_HWS) {
+	/* Create templates and table for default Tx metadata copy flow rule. */
+	if (!repr_matching && xmeta == MLX5_XMETA_MODE_META32_HWS) {
+		tx_meta_items_tmpl = flow_hw_create_tx_default_mreg_copy_pattern_template(dev);
+		if (!tx_meta_items_tmpl) {
+			DRV_LOG(ERR, "port %u failed to Tx metadata copy pattern"
+				" template for control flows", dev->data->port_id);
+			goto error;
+		}
+		tx_meta_actions_tmpl = flow_hw_create_tx_default_mreg_copy_actions_template(dev);
+		if (!tx_meta_actions_tmpl) {
+			DRV_LOG(ERR, "port %u failed to Tx metadata copy actions"
+				" template for control flows", dev->data->port_id);
+			goto error;
+		}
 		MLX5_ASSERT(priv->hw_tx_meta_cpy_tbl == NULL);
 		priv->hw_tx_meta_cpy_tbl = flow_hw_create_tx_default_mreg_copy_table(dev,
 					tx_meta_items_tmpl, tx_meta_actions_tmpl);
@@ -5854,7 +6186,7 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 		flow_hw_table_destroy(dev, priv->hw_esw_sq_miss_root_tbl, NULL);
 		priv->hw_esw_sq_miss_root_tbl = NULL;
 	}
-	if (xmeta == MLX5_XMETA_MODE_META32_HWS && tx_meta_actions_tmpl)
+	if (tx_meta_actions_tmpl)
 		flow_hw_actions_template_destroy(dev, tx_meta_actions_tmpl, NULL);
 	if (jump_one_actions_tmpl)
 		flow_hw_actions_template_destroy(dev, jump_one_actions_tmpl, NULL);
@@ -5862,7 +6194,7 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 		flow_hw_actions_template_destroy(dev, port_actions_tmpl, NULL);
 	if (regc_jump_actions_tmpl)
 		flow_hw_actions_template_destroy(dev, regc_jump_actions_tmpl, NULL);
-	if (xmeta == MLX5_XMETA_MODE_META32_HWS && tx_meta_items_tmpl)
+	if (tx_meta_items_tmpl)
 		flow_hw_pattern_template_destroy(dev, tx_meta_items_tmpl, NULL);
 	if (port_items_tmpl)
 		flow_hw_pattern_template_destroy(dev, port_items_tmpl, NULL);
@@ -6207,6 +6539,13 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		if (!priv->hw_tag[i])
 			goto err;
 	}
+	if (priv->sh->config.dv_esw_en && priv->sh->config.repr_matching) {
+		ret = flow_hw_setup_tx_repr_tagging(dev);
+		if (ret) {
+			rte_errno = -ret;
+			goto err;
+		}
+	}
 	if (is_proxy) {
 		ret = flow_hw_create_vport_actions(priv);
 		if (ret) {
@@ -6329,6 +6668,7 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 		return;
 	flow_hw_rxq_flag_set(dev, false);
 	flow_hw_flush_all_ctrl_flows(dev);
+	flow_hw_cleanup_tx_repr_tagging(dev);
 	while (!LIST_EMPTY(&priv->flow_hw_tbl_ongo)) {
 		tbl = LIST_FIRST(&priv->flow_hw_tbl_ongo);
 		flow_hw_table_destroy(dev, tbl, NULL);
@@ -7688,45 +8028,30 @@ flow_hw_destroy_ctrl_flow(struct rte_eth_dev *dev, struct rte_flow *flow)
 }
 
 /**
- * Destroys control flows created on behalf of @p owner_dev device.
+ * Destroys control flows created on behalf of @p owner device on @p dev device.
  *
- * @param owner_dev
+ * @param dev
+ *   Pointer to Ethernet device on which control flows were created.
+ * @param owner
  *   Pointer to Ethernet device owning control flows.
  *
  * @return
  *   0 on success, otherwise negative error code is returned and
  *   rte_errno is set.
  */
-int
-mlx5_flow_hw_flush_ctrl_flows(struct rte_eth_dev *owner_dev)
+static int
+flow_hw_flush_ctrl_flows_owned_by(struct rte_eth_dev *dev, struct rte_eth_dev *owner)
 {
-	struct mlx5_priv *owner_priv = owner_dev->data->dev_private;
-	struct rte_eth_dev *proxy_dev;
-	struct mlx5_priv *proxy_priv;
+	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_hw_ctrl_flow *cf;
 	struct mlx5_hw_ctrl_flow *cf_next;
-	uint16_t owner_port_id = owner_dev->data->port_id;
-	uint16_t proxy_port_id = owner_dev->data->port_id;
 	int ret;
 
-	if (owner_priv->sh->config.dv_esw_en) {
-		if (rte_flow_pick_transfer_proxy(owner_port_id, &proxy_port_id, NULL)) {
-			DRV_LOG(ERR, "Unable to find proxy port for port %u",
-				owner_port_id);
-			rte_errno = EINVAL;
-			return -rte_errno;
-		}
-		proxy_dev = &rte_eth_devices[proxy_port_id];
-		proxy_priv = proxy_dev->data->dev_private;
-	} else {
-		proxy_dev = owner_dev;
-		proxy_priv = owner_priv;
-	}
-	cf = LIST_FIRST(&proxy_priv->hw_ctrl_flows);
+	cf = LIST_FIRST(&priv->hw_ctrl_flows);
 	while (cf != NULL) {
 		cf_next = LIST_NEXT(cf, next);
-		if (cf->owner_dev == owner_dev) {
-			ret = flow_hw_destroy_ctrl_flow(proxy_dev, cf->flow);
+		if (cf->owner_dev == owner) {
+			ret = flow_hw_destroy_ctrl_flow(dev, cf->flow);
 			if (ret) {
 				rte_errno = ret;
 				return -ret;
@@ -7739,6 +8064,50 @@ mlx5_flow_hw_flush_ctrl_flows(struct rte_eth_dev *owner_dev)
 	return 0;
 }
 
+/**
+ * Destroys control flows created for @p owner_dev device.
+ *
+ * @param owner_dev
+ *   Pointer to Ethernet device owning control flows.
+ *
+ * @return
+ *   0 on success, otherwise negative error code is returned and
+ *   rte_errno is set.
+ */
+int
+mlx5_flow_hw_flush_ctrl_flows(struct rte_eth_dev *owner_dev)
+{
+	struct mlx5_priv *owner_priv = owner_dev->data->dev_private;
+	struct rte_eth_dev *proxy_dev;
+	uint16_t owner_port_id = owner_dev->data->port_id;
+	uint16_t proxy_port_id = owner_dev->data->port_id;
+	int ret;
+
+	/* Flush all flows created by this port for itself. */
+	ret = flow_hw_flush_ctrl_flows_owned_by(owner_dev, owner_dev);
+	if (ret)
+		return ret;
+	/* Flush all flows created for this port on proxy port. */
+	if (owner_priv->sh->config.dv_esw_en) {
+		ret = rte_flow_pick_transfer_proxy(owner_port_id, &proxy_port_id, NULL);
+		if (ret == -ENODEV) {
+			DRV_LOG(DEBUG, "Unable to find transfer proxy port for port %u. It was "
+				       "probably closed. Control flows were cleared.",
+				       owner_port_id);
+			rte_errno = 0;
+			return 0;
+		} else if (ret) {
+			DRV_LOG(ERR, "Unable to find proxy port for port %u (ret = %d)",
+				owner_port_id, ret);
+			return ret;
+		}
+		proxy_dev = &rte_eth_devices[proxy_port_id];
+	} else {
+		proxy_dev = owner_dev;
+	}
+	return flow_hw_flush_ctrl_flows_owned_by(proxy_dev, owner_dev);
+}
+
 /**
  * Destroys all control flows created on @p dev device.
  *
@@ -7990,6 +8359,9 @@ mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev)
 			.conf = &mreg_action,
 		},
 		[1] = {
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+		},
+		[2] = {
 			.type = RTE_FLOW_ACTION_TYPE_END,
 		},
 	};
@@ -8002,6 +8374,60 @@ mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev)
 					eth_all, 0, copy_reg_action, 0);
 }
 
+int
+mlx5_flow_hw_tx_repr_matching_flow(struct rte_eth_dev *dev, uint32_t sqn)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rte_flow_item_sq sq_spec = {
+		.queue = sqn,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_SQ,
+			.spec = &sq_spec,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	/*
+	 * Allocate actions array suitable for all cases - extended metadata enabled or not.
+	 * With extended metadata there will be an additional MODIFY_FIELD action before JUMP.
+	 */
+	struct rte_flow_action actions[] = {
+		{ .type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD },
+		{ .type = RTE_FLOW_ACTION_TYPE_JUMP },
+		{ .type = RTE_FLOW_ACTION_TYPE_END },
+		{ .type = RTE_FLOW_ACTION_TYPE_END },
+	};
+
+	/* It is assumed that caller checked for representor matching. */
+	MLX5_ASSERT(priv->sh->config.repr_matching);
+	if (!priv->dr_ctx) {
+		DRV_LOG(DEBUG, "Port %u must be configured for HWS, before creating "
+			       "default egress flow rules. Omitting creation.",
+			       dev->data->port_id);
+		return 0;
+	}
+	if (!priv->hw_tx_repr_tagging_tbl) {
+		DRV_LOG(ERR, "Port %u is configured for HWS, but table for default "
+			     "egress flow rules does not exist.",
+			     dev->data->port_id);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	/*
+	 * If extended metadata mode is enabled, then an additional MODIFY_FIELD action must be
+	 * placed before terminating JUMP action.
+	 */
+	if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS) {
+		actions[1].type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD;
+		actions[2].type = RTE_FLOW_ACTION_TYPE_JUMP;
+	}
+	return flow_hw_create_ctrl_flow(dev, dev, priv->hw_tx_repr_tagging_tbl,
+					items, 0, actions, 0);
+}
+
 void
 mlx5_flow_meter_uninit(struct rte_eth_dev *dev)
 {
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index a973cbc5e3..dcb02f2a7f 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -1065,6 +1065,69 @@ mlx5_hairpin_get_peer_ports(struct rte_eth_dev *dev, uint16_t *peer_ports,
 	return ret;
 }
 
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+
+/**
+ * Check if starting representor port is allowed.
+ *
+ * If transfer proxy port is configured for HWS, then starting representor port
+ * is allowed if and only if transfer proxy port is started as well.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   If starting representor port is allowed, then 0 is returned.
+ *   Otherwise rte_errno is set, and negative errno value is returned.
+ */
+static int
+mlx5_hw_representor_port_allowed_start(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_eth_dev *proxy_dev;
+	struct mlx5_priv *proxy_priv;
+	uint16_t proxy_port_id = UINT16_MAX;
+	int ret;
+
+	MLX5_ASSERT(priv->sh->config.dv_flow_en == 2);
+	MLX5_ASSERT(priv->sh->config.dv_esw_en);
+	MLX5_ASSERT(priv->representor);
+	ret = rte_flow_pick_transfer_proxy(dev->data->port_id, &proxy_port_id, NULL);
+	if (ret) {
+		if (ret == -ENODEV)
+			DRV_LOG(ERR, "Starting representor port %u is not allowed. Transfer "
+				     "proxy port is not available.", dev->data->port_id);
+		else
+			DRV_LOG(ERR, "Failed to pick transfer proxy for port %u (ret = %d)",
+				dev->data->port_id, ret);
+		return ret;
+	}
+	proxy_dev = &rte_eth_devices[proxy_port_id];
+	proxy_priv = proxy_dev->data->dev_private;
+	if (proxy_priv->dr_ctx == NULL) {
+		DRV_LOG(DEBUG, "Starting representor port %u is allowed, but default traffic flows"
+			       " will not be created. Transfer proxy port must be configured"
+			       " for HWS and started.",
+			       dev->data->port_id);
+		return 0;
+	}
+	if (!proxy_dev->data->dev_started) {
+		DRV_LOG(ERR, "Failed to start port %u: transfer proxy (port %u) must be started",
+			     dev->data->port_id, proxy_port_id);
+		rte_errno = EAGAIN;
+		return -rte_errno;
+	}
+	if (priv->sh->config.repr_matching && !priv->dr_ctx) {
+		DRV_LOG(ERR, "Failed to start port %u: with representor matching enabled, port "
+			     "must be configured for HWS", dev->data->port_id);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	return 0;
+}
+
+#endif
+
 /**
  * DPDK callback to start the device.
  *
@@ -1084,6 +1147,19 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 	int fine_inline;
 
 	DRV_LOG(DEBUG, "port %u starting device", dev->data->port_id);
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+	if (priv->sh->config.dv_flow_en == 2) {
+		/* If there is no E-Switch, then there are no start/stop order limitations. */
+		if (!priv->sh->config.dv_esw_en)
+			goto continue_dev_start;
+		/* If master is being started, then it is always allowed. */
+		if (priv->master)
+			goto continue_dev_start;
+		if (mlx5_hw_representor_port_allowed_start(dev))
+			return -rte_errno;
+	}
+continue_dev_start:
+#endif
 	fine_inline = rte_mbuf_dynflag_lookup
 		(RTE_PMD_MLX5_FINE_GRANULARITY_INLINE, NULL);
 	if (fine_inline >= 0)
@@ -1248,6 +1324,53 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 	return -rte_errno;
 }
 
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+/**
+ * Check if stopping transfer proxy port is allowed.
+ *
+ * If transfer proxy port is configured for HWS, then it is allowed to stop it
+ * if and only if all other representor ports are stopped.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   If stopping transfer proxy port is allowed, then 0 is returned.
+ *   Otherwise rte_errno is set, and negative errno value is returned.
+ */
+static int
+mlx5_hw_proxy_port_allowed_stop(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	bool representor_started = false;
+	uint16_t port_id;
+
+	MLX5_ASSERT(priv->sh->config.dv_flow_en == 2);
+	MLX5_ASSERT(priv->sh->config.dv_esw_en);
+	MLX5_ASSERT(priv->master);
+	/* If transfer proxy port was not configured for HWS, then stopping it is allowed. */
+	if (!priv->dr_ctx)
+		return 0;
+	MLX5_ETH_FOREACH_DEV(port_id, dev->device) {
+		const struct rte_eth_dev *port_dev = &rte_eth_devices[port_id];
+		const struct mlx5_priv *port_priv = port_dev->data->dev_private;
+
+		if (port_id != dev->data->port_id &&
+		    port_priv->domain_id == priv->domain_id &&
+		    port_dev->data->dev_started)
+			representor_started = true;
+	}
+	if (representor_started) {
+		DRV_LOG(INFO, "Failed to stop port %u: attached representor ports"
+			      " must be stopped before stopping transfer proxy port",
+			      dev->data->port_id);
+		rte_errno = EBUSY;
+		return -rte_errno;
+	}
+	return 0;
+}
+#endif
+
 /**
  * DPDK callback to stop the device.
  *
@@ -1261,6 +1384,21 @@ mlx5_dev_stop(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+	if (priv->sh->config.dv_flow_en == 2) {
+		/* If there is no E-Switch, then there are no start/stop order limitations. */
+		if (!priv->sh->config.dv_esw_en)
+			goto continue_dev_stop;
+		/* If representor is being stopped, then it is always allowed. */
+		if (priv->representor)
+			goto continue_dev_stop;
+		if (mlx5_hw_proxy_port_allowed_stop(dev)) {
+			dev->data->dev_started = 1;
+			return -rte_errno;
+		}
+	}
+continue_dev_stop:
+#endif
 	dev->data->dev_started = 0;
 	/* Prevent crashes when queues are still in use. */
 	dev->rx_pkt_burst = rte_eth_pkt_burst_dummy;
@@ -1296,13 +1434,21 @@ static int
 mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_sh_config *config = &priv->sh->config;
 	unsigned int i;
 	int ret;
 
-	if (priv->sh->config.dv_esw_en && priv->master) {
-		if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS)
-			if (mlx5_flow_hw_create_tx_default_mreg_copy_flow(dev))
-				goto error;
+	/*
+	 * With extended metadata enabled, the Tx metadata copy is handled by default
+	 * Tx tagging flow rules, so default Tx flow rule is not needed. It is only
+	 * required when representor matching is disabled.
+	 */
+	if (config->dv_esw_en &&
+	    !config->repr_matching &&
+	    config->dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
+	    priv->master) {
+		if (mlx5_flow_hw_create_tx_default_mreg_copy_flow(dev))
+			goto error;
 	}
 	for (i = 0; i < priv->txqs_n; ++i) {
 		struct mlx5_txq_ctrl *txq = mlx5_txq_get(dev, i);
@@ -1311,17 +1457,22 @@ mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
 		if (!txq)
 			continue;
 		queue = mlx5_txq_get_sqn(txq);
-		if ((priv->representor || priv->master) &&
-		    priv->sh->config.dv_esw_en) {
+		if ((priv->representor || priv->master) && config->dv_esw_en) {
 			if (mlx5_flow_hw_esw_create_sq_miss_flow(dev, queue)) {
 				mlx5_txq_release(dev, i);
 				goto error;
 			}
 		}
+		if (config->dv_esw_en && config->repr_matching) {
+			if (mlx5_flow_hw_tx_repr_matching_flow(dev, queue)) {
+				mlx5_txq_release(dev, i);
+				goto error;
+			}
+		}
 		mlx5_txq_release(dev, i);
 	}
-	if (priv->sh->config.fdb_def_rule) {
-		if ((priv->master || priv->representor) && priv->sh->config.dv_esw_en) {
+	if (config->fdb_def_rule) {
+		if ((priv->master || priv->representor) && config->dv_esw_en) {
 			if (!mlx5_flow_hw_esw_create_default_jump_flow(dev))
 				priv->fdb_def_rule = 1;
 			else
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v4 00/18] net/mlx5: HW steering PMD update
  2022-09-23 14:43 [PATCH 00/27] net/mlx5: HW steering PMD update Suanming Mou
                   ` (28 preceding siblings ...)
  2022-09-30 12:52 ` [PATCH v3 00/17] net/mlx5: HW steering PMD update Suanming Mou
@ 2022-10-19 16:25 ` Suanming Mou
  2022-10-19 16:25   ` [PATCH v4 01/18] net/mlx5: fix invalid flow attributes Suanming Mou
                     ` (17 more replies)
  2022-10-20  3:21 ` [PATCH v5 00/18] net/mlx5: HW steering PMD update Suanming Mou
  2022-10-20 15:41 ` [PATCH v6 00/18] net/mlx5: HW steering PMD update Suanming Mou
  31 siblings, 18 replies; 140+ messages in thread
From: Suanming Mou @ 2022-10-19 16:25 UTC (permalink / raw)
  Cc: dev, rasland, orika

The skeleton of mlx5 HW steering (HWS) has been in upstream for
quite a long time, but has not been updated since then due to the
missing low-level steering layer code. Luckily, better late than
never, the steering layer finally comes[1].

This series will add more features to the existing PMD code:
 - FDB and metadata copy.
 - Modify field.
 - Meter color.
 - Counter.
 - Aging.
 - Action template pre-parser optimization.
 - Connection tracking.

Some features such as meter/aging/CT touch the public API, and the
public API changes have been sent to the ML much earlier in other
threads in order not to be swallowed by this big series.

The depended-on patches are listed below:
 [1]https://inbox.dpdk.org/dev/20220922190345.394-1-valex@nvidia.com/

---

 v4:
  - Disable aging since the flow age API change is still in progress.
    https://patches.dpdk.org/project/dpdk/cover/20221019144904.2543586-1-michaelba@nvidia.com/
  - Add control flow rules for HWS.

 v3:
  - Fixed flows that could not be aged out.
  - Fixed the error not being filled properly when table creation failed.
  - Removed transfer_mode from flow attributes until the ethdev layer change is applied.
    https://patches.dpdk.org/project/dpdk/patch/20220928092425.68214-1-rongweil@nvidia.com/

 v2:
  - Remove the rte_flow patches as they will be integrated in another thread.
  - Fix compilation issues.
  - Make the patches better organized.



Alexander Kozyrev (2):
  net/mlx5: add HW steering meter action
  net/mlx5: implement METER MARK indirect action for HWS

Bing Zhao (1):
  net/mlx5: add extended metadata mode for hardware steering

Dariusz Sosnowski (5):
  net/mlx5: add HW steering port action
  net/mlx5: support DR action template API
  net/mlx5: support device control for E-Switch default rule
  net/mlx5: support device control of representor matching
  net/mlx5: create control flow rules with HWS

Gregory Etelson (2):
  net/mlx5: add HW steering VLAN push, pop and VID modify flow actions
  net/mlx5: support flow integrity in HWS group 0

Michael Baum (1):
  net/mlx5: add HWS AGE action support

Suanming Mou (6):
  net/mlx5: fix invalid flow attributes
  net/mlx5: fix IPv6 and TCP RSS hash fields
  net/mlx5: add shared header reformat support
  net/mlx5: add modify field hws support
  net/mlx5: add HW steering connection tracking support
  net/mlx5: add async action push and pull support

Xiaoyu Min (1):
  net/mlx5: add HW steering counter action

 doc/guides/nics/mlx5.rst             |   39 +
 drivers/common/mlx5/mlx5_devx_cmds.c |   50 +
 drivers/common/mlx5/mlx5_devx_cmds.h |   27 +
 drivers/common/mlx5/mlx5_prm.h       |   22 +-
 drivers/common/mlx5/version.map      |    1 +
 drivers/net/mlx5/linux/mlx5_os.c     |   78 +-
 drivers/net/mlx5/meson.build         |    1 +
 drivers/net/mlx5/mlx5.c              |  126 +-
 drivers/net/mlx5/mlx5.h              |  322 +-
 drivers/net/mlx5/mlx5_defs.h         |    5 +
 drivers/net/mlx5/mlx5_flow.c         |  409 +-
 drivers/net/mlx5/mlx5_flow.h         |  335 +-
 drivers/net/mlx5/mlx5_flow_aso.c     |  797 ++-
 drivers/net/mlx5/mlx5_flow_dv.c      | 1128 ++--
 drivers/net/mlx5/mlx5_flow_hw.c      | 8789 +++++++++++++++++++++++---
 drivers/net/mlx5/mlx5_flow_meter.c   |  776 ++-
 drivers/net/mlx5/mlx5_flow_verbs.c   |    8 +-
 drivers/net/mlx5/mlx5_hws_cnt.c      | 1247 ++++
 drivers/net/mlx5/mlx5_hws_cnt.h      |  703 ++
 drivers/net/mlx5/mlx5_rxq.c          |    3 +-
 drivers/net/mlx5/mlx5_trigger.c      |  272 +-
 drivers/net/mlx5/mlx5_tx.h           |    1 +
 drivers/net/mlx5/mlx5_txq.c          |   47 +
 drivers/net/mlx5/mlx5_utils.h        |   10 +-
 drivers/net/mlx5/rte_pmd_mlx5.h      |   17 +
 drivers/net/mlx5/version.map         |    1 +
 26 files changed, 13576 insertions(+), 1638 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_hws_cnt.c
 create mode 100644 drivers/net/mlx5/mlx5_hws_cnt.h

-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v4 01/18] net/mlx5: fix invalid flow attributes
  2022-10-19 16:25 ` [PATCH v4 00/18] net/mlx5: HW steering PMD update Suanming Mou
@ 2022-10-19 16:25   ` Suanming Mou
  2022-10-19 16:25   ` [PATCH v4 02/18] net/mlx5: fix IPv6 and TCP RSS hash fields Suanming Mou
                     ` (16 subsequent siblings)
  17 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-10-19 16:25 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika

In the function flow_get_drv_type(), attr will be read in non-HWS mode.
In case the user calls the HWS API in SWS mode, a valid attr should be
passed in the HWS functions, or the NULL pointer will cause a crash.
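
Below is a minimal, self-contained sketch of the crash scenario and the
defensive pattern this patch applies (the struct and helper here are
simplified stand-ins, not the real mlx5 types):

	#include <stdio.h>

	struct flow_attr { int transfer; };  /* stand-in for rte_flow_attr */

	/* Stand-in for flow_get_drv_type(): the non-HWS path reads attr. */
	static int get_drv_type(int hws_mode, const struct flow_attr *attr)
	{
		if (hws_mode)
			return 2;            /* HWS type, attr is not read */
		/* Passing NULL here is what used to crash in SWS mode. */
		return attr->transfer ? 1 : 0;
	}

	int main(void)
	{
		/*
		 * The fix: callers pass a zeroed attribute instead of NULL,
		 * so the dereference above is always safe.
		 */
		struct flow_attr attr = {0};

		printf("type=%d\n", get_drv_type(0, &attr));
		return 0;
	}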

Fixes: 572801ab860f ("ethdev: backport upstream rte_flow_async codes")

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/net/mlx5/mlx5_flow.c | 38 ++++++++++++++++++++++++------------
 1 file changed, 26 insertions(+), 12 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 2c6acd551c..f36e72fb89 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -3742,6 +3742,8 @@ flow_get_drv_type(struct rte_eth_dev *dev, const struct rte_flow_attr *attr)
 	 */
 	if (priv->sh->config.dv_flow_en == 2)
 		return MLX5_FLOW_TYPE_HW;
+	if (!attr)
+		return MLX5_FLOW_TYPE_MIN;
 	/* If no OS specific type - continue with DV/VERBS selection */
 	if (attr->transfer && priv->sh->config.dv_esw_en)
 		type = MLX5_FLOW_TYPE_DV;
@@ -8254,8 +8256,9 @@ mlx5_flow_info_get(struct rte_eth_dev *dev,
 		   struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8289,8 +8292,9 @@ mlx5_flow_port_configure(struct rte_eth_dev *dev,
 			 struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8321,8 +8325,9 @@ mlx5_flow_pattern_template_create(struct rte_eth_dev *dev,
 		struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW) {
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW) {
 		rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8352,8 +8357,9 @@ mlx5_flow_pattern_template_destroy(struct rte_eth_dev *dev,
 				   struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8387,8 +8393,9 @@ mlx5_flow_actions_template_create(struct rte_eth_dev *dev,
 			struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW) {
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW) {
 		rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8418,8 +8425,9 @@ mlx5_flow_actions_template_destroy(struct rte_eth_dev *dev,
 				   struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8459,8 +8467,9 @@ mlx5_flow_table_create(struct rte_eth_dev *dev,
 		       struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW) {
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW) {
 		rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8496,8 +8505,9 @@ mlx5_flow_table_destroy(struct rte_eth_dev *dev,
 			struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8544,8 +8554,9 @@ mlx5_flow_async_flow_create(struct rte_eth_dev *dev,
 			    struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW) {
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW) {
 		rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8587,8 +8598,9 @@ mlx5_flow_async_flow_destroy(struct rte_eth_dev *dev,
 			     struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8623,8 +8635,9 @@ mlx5_flow_pull(struct rte_eth_dev *dev,
 	       struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8652,8 +8665,9 @@ mlx5_flow_push(struct rte_eth_dev *dev,
 	       struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v4 02/18] net/mlx5: fix IPv6 and TCP RSS hash fields
  2022-10-19 16:25 ` [PATCH v4 00/18] net/mlx5: HW steering PMD update Suanming Mou
  2022-10-19 16:25   ` [PATCH v4 01/18] net/mlx5: fix invalid flow attributes Suanming Mou
@ 2022-10-19 16:25   ` Suanming Mou
  2022-10-19 16:25   ` [PATCH v4 03/18] net/mlx5: add shared header reformat support Suanming Mou
                     ` (15 subsequent siblings)
  17 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-10-19 16:25 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika

In the flow_dv_hashfields_set() function, when item_flags was 0,
the code went directly into the first if branch and the else cases
never had a chance to be checked. This caused the IPv6 and TCP hash
fields in the else cases to never be set.

This commit adds a dedicated HW steering hash field set function
to generate the RSS hash fields.
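
The dead-branch behaviour can be reproduced with a minimal, self-contained
sketch (the 0x1/0x2 bits below merely stand in for the real
MLX5_FLOW_LAYER_* flags):

	#include <stdio.h>

	int main(void)
	{
		unsigned int items = 0;   /* no pattern item flags set */
		int ipv4 = 0, ipv6 = 0;

		/*
		 * Old logic: with items == 0 the first condition is always
		 * true, so the IPv6 (and, likewise, TCP) else branch below
		 * can never be reached.
		 */
		if ((items & 0x1) || !items)
			ipv4 = 1;
		else if ((items & 0x2) || !items)
			ipv6 = 1;

		printf("ipv4=%d ipv6=%d\n", ipv4, ipv6); /* ipv4=1 ipv6=0 */
		return 0;
	}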

Fixes: 6540da0b93b5 ("net/mlx5: fix RSS scaling issue")
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/net/mlx5/mlx5_flow_dv.c | 12 +++----
 drivers/net/mlx5/mlx5_flow_hw.c | 59 ++++++++++++++++++++++++++++++++-
 2 files changed, 62 insertions(+), 9 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 0cf757898d..29d7bf7049 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -11274,8 +11274,7 @@ flow_dv_hashfields_set(uint64_t item_flags,
 		rss_inner = 1;
 #endif
 	if ((rss_inner && (items & MLX5_FLOW_LAYER_INNER_L3_IPV4)) ||
-	    (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L3_IPV4)) ||
-	     !items) {
+	    (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L3_IPV4))) {
 		if (rss_types & MLX5_IPV4_LAYER_TYPES) {
 			if (rss_types & RTE_ETH_RSS_L3_SRC_ONLY)
 				fields |= IBV_RX_HASH_SRC_IPV4;
@@ -11285,8 +11284,7 @@ flow_dv_hashfields_set(uint64_t item_flags,
 				fields |= MLX5_IPV4_IBV_RX_HASH;
 		}
 	} else if ((rss_inner && (items & MLX5_FLOW_LAYER_INNER_L3_IPV6)) ||
-		   (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L3_IPV6)) ||
-		   !items) {
+		   (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L3_IPV6))) {
 		if (rss_types & MLX5_IPV6_LAYER_TYPES) {
 			if (rss_types & RTE_ETH_RSS_L3_SRC_ONLY)
 				fields |= IBV_RX_HASH_SRC_IPV6;
@@ -11309,8 +11307,7 @@ flow_dv_hashfields_set(uint64_t item_flags,
 		return;
 	}
 	if ((rss_inner && (items & MLX5_FLOW_LAYER_INNER_L4_UDP)) ||
-	    (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L4_UDP)) ||
-	    !items) {
+	    (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L4_UDP))) {
 		if (rss_types & RTE_ETH_RSS_UDP) {
 			if (rss_types & RTE_ETH_RSS_L4_SRC_ONLY)
 				fields |= IBV_RX_HASH_SRC_PORT_UDP;
@@ -11320,8 +11317,7 @@ flow_dv_hashfields_set(uint64_t item_flags,
 				fields |= MLX5_UDP_IBV_RX_HASH;
 		}
 	} else if ((rss_inner && (items & MLX5_FLOW_LAYER_INNER_L4_TCP)) ||
-		   (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L4_TCP)) ||
-		   !items) {
+		   (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L4_TCP))) {
 		if (rss_types & RTE_ETH_RSS_TCP) {
 			if (rss_types & RTE_ETH_RSS_L4_SRC_ONLY)
 				fields |= IBV_RX_HASH_SRC_PORT_TCP;
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index fecf28c1ca..d46e4c6769 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -62,6 +62,63 @@ flow_hw_rxq_flag_set(struct rte_eth_dev *dev, bool enable)
 	priv->mark_enabled = enable;
 }
 
+/**
+ * Set the hash fields according to the @p rss_desc information.
+ *
+ * @param[in] rss_desc
+ *   Pointer to the mlx5_flow_rss_desc.
+ * @param[out] hash_fields
+ *   Pointer to the RSS hash fields.
+ */
+static void
+flow_hw_hashfields_set(struct mlx5_flow_rss_desc *rss_desc,
+		       uint64_t *hash_fields)
+{
+	uint64_t fields = 0;
+	int rss_inner = 0;
+	uint64_t rss_types = rte_eth_rss_hf_refine(rss_desc->types);
+
+#ifdef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
+	if (rss_desc->level >= 2)
+		rss_inner = 1;
+#endif
+	if (rss_types & MLX5_IPV4_LAYER_TYPES) {
+		if (rss_types & RTE_ETH_RSS_L3_SRC_ONLY)
+			fields |= IBV_RX_HASH_SRC_IPV4;
+		else if (rss_types & RTE_ETH_RSS_L3_DST_ONLY)
+			fields |= IBV_RX_HASH_DST_IPV4;
+		else
+			fields |= MLX5_IPV4_IBV_RX_HASH;
+	} else if (rss_types & MLX5_IPV6_LAYER_TYPES) {
+		if (rss_types & RTE_ETH_RSS_L3_SRC_ONLY)
+			fields |= IBV_RX_HASH_SRC_IPV6;
+		else if (rss_types & RTE_ETH_RSS_L3_DST_ONLY)
+			fields |= IBV_RX_HASH_DST_IPV6;
+		else
+			fields |= MLX5_IPV6_IBV_RX_HASH;
+	}
+	if (rss_types & RTE_ETH_RSS_UDP) {
+		if (rss_types & RTE_ETH_RSS_L4_SRC_ONLY)
+			fields |= IBV_RX_HASH_SRC_PORT_UDP;
+		else if (rss_types & RTE_ETH_RSS_L4_DST_ONLY)
+			fields |= IBV_RX_HASH_DST_PORT_UDP;
+		else
+			fields |= MLX5_UDP_IBV_RX_HASH;
+	} else if (rss_types & RTE_ETH_RSS_TCP) {
+		if (rss_types & RTE_ETH_RSS_L4_SRC_ONLY)
+			fields |= IBV_RX_HASH_SRC_PORT_TCP;
+		else if (rss_types & RTE_ETH_RSS_L4_DST_ONLY)
+			fields |= IBV_RX_HASH_DST_PORT_TCP;
+		else
+			fields |= MLX5_TCP_IBV_RX_HASH;
+	}
+	if (rss_types & RTE_ETH_RSS_ESP)
+		fields |= IBV_RX_HASH_IPSEC_SPI;
+	if (rss_inner)
+		fields |= IBV_RX_HASH_INNER;
+	*hash_fields = fields;
+}
+
 /**
  * Generate the pattern item flags.
  * Will be used for shared RSS action.
@@ -225,7 +282,7 @@ flow_hw_tir_action_register(struct rte_eth_dev *dev,
 		       MLX5_RSS_HASH_KEY_LEN);
 		rss_desc.key_len = MLX5_RSS_HASH_KEY_LEN;
 		rss_desc.types = !rss->types ? RTE_ETH_RSS_IP : rss->types;
-		flow_dv_hashfields_set(0, &rss_desc, &rss_desc.hash_fields);
+		flow_hw_hashfields_set(&rss_desc, &rss_desc.hash_fields);
 		flow_dv_action_rss_l34_hash_adjust(rss->types,
 						   &rss_desc.hash_fields);
 		if (rss->level > 1) {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v4 03/18] net/mlx5: add shared header reformat support
  2022-10-19 16:25 ` [PATCH v4 00/18] net/mlx5: HW steering PMD update Suanming Mou
  2022-10-19 16:25   ` [PATCH v4 01/18] net/mlx5: fix invalid flow attributes Suanming Mou
  2022-10-19 16:25   ` [PATCH v4 02/18] net/mlx5: fix IPv6 and TCP RSS hash fields Suanming Mou
@ 2022-10-19 16:25   ` Suanming Mou
  2022-10-19 16:25   ` [PATCH v4 04/18] net/mlx5: add modify field hws support Suanming Mou
                     ` (14 subsequent siblings)
  17 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-10-19 16:25 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika

As the rte_flow_async API defines, an action mask with a non-zero field
value means the action will be shared by all the flows in the table.

A header reformat action whose action mask field is non-zero will be
created as a constant shared action. For the encapsulation header
reformat action, there are two kinds of encapsulation data:
raw_encap_data and rte_flow_item encap_data. Both kinds of data can be
identified from the action mask conf as constant or not.

Examples:
1. VXLAN encap (encap_data: rte_flow_item)
	action conf (eth/ipv4/udp/vxlan_hdr)

	a. action mask conf (eth/ipv4/udp/vxlan_hdr)
	  - items are constant.
	b. action mask conf (NULL)
	  - items will change.

2. RAW encap (encap_data: raw)
	action conf (raw_data)

	a. action mask conf (not NULL)
	  - encap_data constant.
	b. action mask conf (NULL)
	  - encap_data will change.
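
The template-time decision boils down to whether the action mask conf is
provided. A minimal, self-contained sketch of that check (simplified; not
the real flow_hw_actions_translate() logic):

	#include <stdbool.h>
	#include <stdio.h>

	/*
	 * A non-NULL mask conf marks the encap data as constant, so one
	 * shared reformat action can serve every flow in the table; a NULL
	 * mask conf means the data changes per flow and must be filled in
	 * at rule-enqueue time.
	 */
	static bool reformat_is_shared(const void *mask_conf)
	{
		return mask_conf != NULL;
	}

	int main(void)
	{
		int raw_mask = 1;

		printf("mask given  -> shared=%d\n", reformat_is_shared(&raw_mask));
		printf("mask absent -> shared=%d\n", reformat_is_shared(NULL));
		return 0;
	}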

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/net/mlx5/mlx5_flow.h    |   6 +-
 drivers/net/mlx5/mlx5_flow_hw.c | 124 ++++++++++----------------------
 2 files changed, 39 insertions(+), 91 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index cde602d3a1..26660da0de 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1075,10 +1075,6 @@ struct mlx5_action_construct_data {
 	uint16_t action_dst; /* mlx5dr_rule_action dst offset. */
 	union {
 		struct {
-			/* encap src(item) offset. */
-			uint16_t src;
-			/* encap dst data offset. */
-			uint16_t dst;
 			/* encap data len. */
 			uint16_t len;
 		} encap;
@@ -1121,6 +1117,8 @@ struct mlx5_hw_jump_action {
 /* Encap decap action struct. */
 struct mlx5_hw_encap_decap_action {
 	struct mlx5dr_action *action; /* Action object. */
+	/* Is header_reformat action shared across flows in table. */
+	bool shared;
 	size_t data_size; /* Action metadata size. */
 	uint8_t data[]; /* Action data. */
 };
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index d46e4c6769..e62d25fda2 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -402,10 +402,6 @@ __flow_hw_act_data_general_append(struct mlx5_priv *priv,
  *   Offset of source rte flow action.
  * @param[in] action_dst
  *   Offset of destination DR action.
- * @param[in] encap_src
- *   Offset of source encap raw data.
- * @param[in] encap_dst
- *   Offset of destination encap raw data.
  * @param[in] len
  *   Length of the data to be updated.
  *
@@ -418,16 +414,12 @@ __flow_hw_act_data_encap_append(struct mlx5_priv *priv,
 				enum rte_flow_action_type type,
 				uint16_t action_src,
 				uint16_t action_dst,
-				uint16_t encap_src,
-				uint16_t encap_dst,
 				uint16_t len)
 {	struct mlx5_action_construct_data *act_data;
 
 	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
 	if (!act_data)
 		return -1;
-	act_data->encap.src = encap_src;
-	act_data->encap.dst = encap_dst;
 	act_data->encap.len = len;
 	LIST_INSERT_HEAD(&acts->act_list, act_data, next);
 	return 0;
@@ -523,53 +515,6 @@ flow_hw_shared_action_translate(struct rte_eth_dev *dev,
 	return 0;
 }
 
-/**
- * Translate encap items to encapsulation list.
- *
- * @param[in] dev
- *   Pointer to the rte_eth_dev data structure.
- * @param[in] acts
- *   Pointer to the template HW steering DR actions.
- * @param[in] type
- *   Action type.
- * @param[in] action_src
- *   Offset of source rte flow action.
- * @param[in] action_dst
- *   Offset of destination DR action.
- * @param[in] items
- *   Encap item pattern.
- * @param[in] items_m
- *   Encap item mask indicates which part are constant and dynamic.
- *
- * @return
- *    0 on success, negative value otherwise and rte_errno is set.
- */
-static __rte_always_inline int
-flow_hw_encap_item_translate(struct rte_eth_dev *dev,
-			     struct mlx5_hw_actions *acts,
-			     enum rte_flow_action_type type,
-			     uint16_t action_src,
-			     uint16_t action_dst,
-			     const struct rte_flow_item *items,
-			     const struct rte_flow_item *items_m)
-{
-	struct mlx5_priv *priv = dev->data->dev_private;
-	size_t len, total_len = 0;
-	uint32_t i = 0;
-
-	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++, items_m++, i++) {
-		len = flow_dv_get_item_hdr_len(items->type);
-		if ((!items_m->spec ||
-		    memcmp(items_m->spec, items->spec, len)) &&
-		    __flow_hw_act_data_encap_append(priv, acts, type,
-						    action_src, action_dst, i,
-						    total_len, len))
-			return -1;
-		total_len += len;
-	}
-	return 0;
-}
-
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -611,7 +556,7 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 	const struct rte_flow_action_raw_encap *raw_encap_data;
 	const struct rte_flow_item *enc_item = NULL, *enc_item_m = NULL;
 	uint16_t reformat_pos = MLX5_HW_MAX_ACTS, reformat_src = 0;
-	uint8_t *encap_data = NULL;
+	uint8_t *encap_data = NULL, *encap_data_m = NULL;
 	size_t data_size = 0;
 	bool actions_end = false;
 	uint32_t type, i;
@@ -718,9 +663,9 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			MLX5_ASSERT(reformat_pos == MLX5_HW_MAX_ACTS);
 			enc_item = ((const struct rte_flow_action_vxlan_encap *)
 				   actions->conf)->definition;
-			enc_item_m =
-				((const struct rte_flow_action_vxlan_encap *)
-				 masks->conf)->definition;
+			if (masks->conf)
+				enc_item_m = ((const struct rte_flow_action_vxlan_encap *)
+					     masks->conf)->definition;
 			reformat_pos = i++;
 			reformat_src = actions - action_start;
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_L2_TO_TNL_L2;
@@ -729,9 +674,9 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			MLX5_ASSERT(reformat_pos == MLX5_HW_MAX_ACTS);
 			enc_item = ((const struct rte_flow_action_nvgre_encap *)
 				   actions->conf)->definition;
-			enc_item_m =
-				((const struct rte_flow_action_nvgre_encap *)
-				actions->conf)->definition;
+			if (masks->conf)
+				enc_item_m = ((const struct rte_flow_action_nvgre_encap *)
+					     masks->conf)->definition;
 			reformat_pos = i++;
 			reformat_src = actions - action_start;
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_L2_TO_TNL_L2;
@@ -743,6 +688,11 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_TNL_L2_TO_L2;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
+			raw_encap_data =
+				(const struct rte_flow_action_raw_encap *)
+				 masks->conf;
+			if (raw_encap_data)
+				encap_data_m = raw_encap_data->data;
 			raw_encap_data =
 				(const struct rte_flow_action_raw_encap *)
 				 actions->conf;
@@ -773,22 +723,17 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 	}
 	if (reformat_pos != MLX5_HW_MAX_ACTS) {
 		uint8_t buf[MLX5_ENCAP_MAX_LEN];
+		bool shared_rfmt = true;
 
 		if (enc_item) {
 			MLX5_ASSERT(!encap_data);
-			if (flow_dv_convert_encap_data
-				(enc_item, buf, &data_size, error) ||
-			    flow_hw_encap_item_translate
-				(dev, acts, (action_start + reformat_src)->type,
-				 reformat_src, reformat_pos,
-				 enc_item, enc_item_m))
+			if (flow_dv_convert_encap_data(enc_item, buf, &data_size, error))
 				goto err;
 			encap_data = buf;
-		} else if (encap_data && __flow_hw_act_data_encap_append
-				(priv, acts,
-				 (action_start + reformat_src)->type,
-				 reformat_src, reformat_pos, 0, 0, data_size)) {
-			goto err;
+			if (!enc_item_m)
+				shared_rfmt = false;
+		} else if (encap_data && !encap_data_m) {
+			shared_rfmt = false;
 		}
 		acts->encap_decap = mlx5_malloc(MLX5_MEM_ZERO,
 				    sizeof(*acts->encap_decap) + data_size,
@@ -802,12 +747,22 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 		acts->encap_decap->action = mlx5dr_action_create_reformat
 				(priv->dr_ctx, refmt_type,
 				 data_size, encap_data,
-				 rte_log2_u32(table_attr->nb_flows),
-				 mlx5_hw_act_flag[!!attr->group][type]);
+				 shared_rfmt ? 0 : rte_log2_u32(table_attr->nb_flows),
+				 mlx5_hw_act_flag[!!attr->group][type] |
+				 (shared_rfmt ? MLX5DR_ACTION_FLAG_SHARED : 0));
 		if (!acts->encap_decap->action)
 			goto err;
 		acts->rule_acts[reformat_pos].action =
 						acts->encap_decap->action;
+		acts->rule_acts[reformat_pos].reformat.data =
+						acts->encap_decap->data;
+		if (shared_rfmt)
+			acts->rule_acts[reformat_pos].reformat.offset = 0;
+		else if (__flow_hw_act_data_encap_append(priv, acts,
+				 (action_start + reformat_src)->type,
+				 reformat_src, reformat_pos, data_size))
+			goto err;
+		acts->encap_decap->shared = shared_rfmt;
 		acts->encap_decap_pos = reformat_pos;
 	}
 	acts->acts_num = i;
@@ -972,6 +927,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			.ingress = 1,
 	};
 	uint32_t ft_flag;
+	size_t encap_len = 0;
 
 	memcpy(rule_acts, hw_acts->rule_acts,
 	       sizeof(*rule_acts) * hw_acts->acts_num);
@@ -989,9 +945,6 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	} else {
 		attr.ingress = 1;
 	}
-	if (hw_acts->encap_decap && hw_acts->encap_decap->data_size)
-		memcpy(buf, hw_acts->encap_decap->data,
-		       hw_acts->encap_decap->data_size);
 	LIST_FOREACH(act_data, &hw_acts->act_list, next) {
 		uint32_t jump_group;
 		uint32_t tag;
@@ -1050,23 +1003,20 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
 			enc_item = ((const struct rte_flow_action_vxlan_encap *)
 				   action->conf)->definition;
-			rte_memcpy((void *)&buf[act_data->encap.dst],
-				   enc_item[act_data->encap.src].spec,
-				   act_data->encap.len);
+			if (flow_dv_convert_encap_data(enc_item, buf, &encap_len, NULL))
+				return -1;
 			break;
 		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
 			enc_item = ((const struct rte_flow_action_nvgre_encap *)
 				   action->conf)->definition;
-			rte_memcpy((void *)&buf[act_data->encap.dst],
-				   enc_item[act_data->encap.src].spec,
-				   act_data->encap.len);
+			if (flow_dv_convert_encap_data(enc_item, buf, &encap_len, NULL))
+				return -1;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
 			raw_encap_data =
 				(const struct rte_flow_action_raw_encap *)
 				 action->conf;
-			rte_memcpy((void *)&buf[act_data->encap.dst],
-				   raw_encap_data->data, act_data->encap.len);
+			rte_memcpy((void *)buf, raw_encap_data->data, act_data->encap.len);
 			MLX5_ASSERT(raw_encap_data->size ==
 				    act_data->encap.len);
 			break;
@@ -1074,7 +1024,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			break;
 		}
 	}
-	if (hw_acts->encap_decap) {
+	if (hw_acts->encap_decap && !hw_acts->encap_decap->shared) {
 		rule_acts[hw_acts->encap_decap_pos].reformat.offset =
 				job->flow->idx - 1;
 		rule_acts[hw_acts->encap_decap_pos].reformat.data = buf;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v4 04/18] net/mlx5: add modify field hws support
  2022-10-19 16:25 ` [PATCH v4 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (2 preceding siblings ...)
  2022-10-19 16:25   ` [PATCH v4 03/18] net/mlx5: add shared header reformat support Suanming Mou
@ 2022-10-19 16:25   ` Suanming Mou
  2022-10-19 16:25   ` [PATCH v4 05/18] net/mlx5: add HW steering port action Suanming Mou
                     ` (13 subsequent siblings)
  17 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-10-19 16:25 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Dariusz Sosnowski

This patch introduces support for modify_field rte_flow actions in HWS
mode. Support includes:

- Ingress and egress domains,
- SET and ADD operations,
- usage of arbitrary bit offsets and widths for packet and metadata
  fields.

Support is implemented in two phases:

1. On flow table creation the hardware commands are generated, based
   on rte_flow action templates, and stored alongside action template.
2. On flow rule creation/queueing the hardware commands are updated with
   values provided by the user. Any masks over immediate values, provided
   in action templates, are applied to these values before enqueueing rules
   for creation.
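
The precompiled commands in phase 1 rely on small bit-mask helpers. A
minimal, self-contained sketch of the 32-bit variant (htonl() stands in
here for rte_cpu_to_be_32(), which the driver actually uses):

	#include <inttypes.h>
	#include <stdint.h>
	#include <stdio.h>
	#include <arpa/inet.h>

	/*
	 * Build a big-endian mask covering `length` bits starting at bit
	 * offset `off`, mirroring the flow_modify_info_mask_32() helper
	 * added by this patch.
	 */
	static uint32_t modify_info_mask_32(uint32_t length, uint32_t off)
	{
		return htonl((0xffffffffu >> (32 - length)) << off);
	}

	int main(void)
	{
		/* A 16-bit wide field placed 8 bits above the LSB. */
		printf("mask=0x%08" PRIx32 "\n",
		       ntohl(modify_info_mask_32(16, 8)));
		return 0;
	}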

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/common/mlx5/mlx5_prm.h   |   2 +
 drivers/net/mlx5/linux/mlx5_os.c |  18 +-
 drivers/net/mlx5/mlx5.h          |   1 +
 drivers/net/mlx5/mlx5_flow.h     |  96 +++++
 drivers/net/mlx5/mlx5_flow_dv.c  | 551 ++++++++++++++-------------
 drivers/net/mlx5/mlx5_flow_hw.c  | 614 ++++++++++++++++++++++++++++++-
 6 files changed, 1007 insertions(+), 275 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 371942ae50..fb3c43eed9 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -751,6 +751,8 @@ enum mlx5_modification_field {
 	MLX5_MODI_IN_TCP_ACK_NUM = 0x5C,
 	MLX5_MODI_GTP_TEID = 0x6E,
 	MLX5_MODI_OUT_IP_ECN = 0x73,
+	MLX5_MODI_TUNNEL_HDR_DW_1 = 0x75,
+	MLX5_MODI_GTPU_FIRST_EXT_DW_0 = 0x76,
 };
 
 /* Total number of metadata reg_c's. */
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index aed55e6a62..b7cc11a2ef 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1540,6 +1540,15 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 				       mlx5_hrxq_clone_free_cb);
 	if (!priv->hrxqs)
 		goto error;
+	mlx5_set_metadata_mask(eth_dev);
+	if (sh->config.dv_xmeta_en != MLX5_XMETA_MODE_LEGACY &&
+	    !priv->sh->dv_regc0_mask) {
+		DRV_LOG(ERR, "metadata mode %u is not supported "
+			     "(no metadata reg_c[0] is available)",
+			     sh->config.dv_xmeta_en);
+			err = ENOTSUP;
+			goto error;
+	}
 	rte_rwlock_init(&priv->ind_tbls_lock);
 	if (priv->sh->config.dv_flow_en == 2) {
 #ifdef HAVE_IBV_FLOW_DV_SUPPORT
@@ -1566,15 +1575,6 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 		err = -err;
 		goto error;
 	}
-	mlx5_set_metadata_mask(eth_dev);
-	if (sh->config.dv_xmeta_en != MLX5_XMETA_MODE_LEGACY &&
-	    !priv->sh->dv_regc0_mask) {
-		DRV_LOG(ERR, "metadata mode %u is not supported "
-			     "(no metadata reg_c[0] is available)",
-			     sh->config.dv_xmeta_en);
-			err = ENOTSUP;
-			goto error;
-	}
 	/* Query availability of metadata reg_c's. */
 	if (!priv->sh->metadata_regc_check_flag) {
 		err = mlx5_flow_discover_mreg_c(eth_dev);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 1d3c1ad93d..6f75a32488 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -348,6 +348,7 @@ struct mlx5_hw_q_job {
 	struct rte_flow_hw *flow; /* Flow attached to the job. */
 	void *user_data; /* Job user data. */
 	uint8_t *encap_data; /* Encap data. */
+	struct mlx5_modification_cmd *mhdr_cmd;
 };
 
 /* HW steering job descriptor LIFO pool. */
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 26660da0de..407b3d79bd 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1010,6 +1010,51 @@ flow_items_to_tunnel(const struct rte_flow_item items[])
 	return items[0].spec;
 }
 
+/**
+ * Fetch 1, 2, 3 or 4 byte field from the byte array
+ * and return as unsigned integer in host-endian format.
+ *
+ * @param[in] data
+ *   Pointer to data array.
+ * @param[in] size
+ *   Size of field to extract.
+ *
+ * @return
+ *   converted field in host endian format.
+ */
+static inline uint32_t
+flow_dv_fetch_field(const uint8_t *data, uint32_t size)
+{
+	uint32_t ret;
+
+	switch (size) {
+	case 1:
+		ret = *data;
+		break;
+	case 2:
+		ret = rte_be_to_cpu_16(*(const unaligned_uint16_t *)data);
+		break;
+	case 3:
+		ret = rte_be_to_cpu_16(*(const unaligned_uint16_t *)data);
+		ret = (ret << 8) | *(data + sizeof(uint16_t));
+		break;
+	case 4:
+		ret = rte_be_to_cpu_32(*(const unaligned_uint32_t *)data);
+		break;
+	default:
+		MLX5_ASSERT(false);
+		ret = 0;
+		break;
+	}
+	return ret;
+}
+
+struct field_modify_info {
+	uint32_t size; /* Size of field in protocol header, in bytes. */
+	uint32_t offset; /* Offset of field in protocol header, in bytes. */
+	enum mlx5_modification_field id;
+};
+
 /* HW steering flow attributes. */
 struct mlx5_flow_attr {
 	uint32_t port_id; /* Port index. */
@@ -1078,6 +1123,29 @@ struct mlx5_action_construct_data {
 			/* encap data len. */
 			uint16_t len;
 		} encap;
+		struct {
+			/* Modify header action offset in pattern. */
+			uint16_t mhdr_cmds_off;
+			/* Offset in pattern after modify header actions. */
+			uint16_t mhdr_cmds_end;
+			/*
+			 * True if this action is masked and does not need to
+			 * be generated.
+			 */
+			bool shared;
+			/*
+			 * Modified field definitions in dst field (SET, ADD)
+			 * or src field (COPY).
+			 */
+			struct field_modify_info field[MLX5_ACT_MAX_MOD_FIELDS];
+			/* Modified field definitions in dst field (COPY). */
+			struct field_modify_info dcopy[MLX5_ACT_MAX_MOD_FIELDS];
+			/*
+			 * Masks applied to field values to generate
+			 * PRM actions.
+			 */
+			uint32_t mask[MLX5_ACT_MAX_MOD_FIELDS];
+		} modify_header;
 		struct {
 			uint64_t types; /* RSS hash types. */
 			uint32_t level; /* RSS level. */
@@ -1103,6 +1171,7 @@ struct rte_flow_actions_template {
 	struct rte_flow_actions_template_attr attr;
 	struct rte_flow_action *actions; /* Cached flow actions. */
 	struct rte_flow_action *masks; /* Cached action masks.*/
+	uint16_t mhdr_off; /* Offset of DR modify header action. */
 	uint32_t refcnt; /* Reference counter. */
 };
 
@@ -1123,6 +1192,22 @@ struct mlx5_hw_encap_decap_action {
 	uint8_t data[]; /* Action data. */
 };
 
+#define MLX5_MHDR_MAX_CMD ((MLX5_MAX_MODIFY_NUM) * 2 + 1)
+
+/* Modify field action struct. */
+struct mlx5_hw_modify_header_action {
+	/* Reference to DR action */
+	struct mlx5dr_action *action;
+	/* Modify header action position in action rule table. */
+	uint16_t pos;
+	/* Is MODIFY_HEADER action shared across flows in table. */
+	bool shared;
+	/* Amount of modification commands stored in the precompiled buffer. */
+	uint32_t mhdr_cmds_num;
+	/* Precompiled modification commands. */
+	struct mlx5_modification_cmd mhdr_cmds[MLX5_MHDR_MAX_CMD];
+};
+
 /* The maximum actions support in the flow. */
 #define MLX5_HW_MAX_ACTS 16
 
@@ -1132,6 +1217,7 @@ struct mlx5_hw_actions {
 	LIST_HEAD(act_list, mlx5_action_construct_data) act_list;
 	struct mlx5_hw_jump_action *jump; /* Jump action. */
 	struct mlx5_hrxq *tir; /* TIR action. */
+	struct mlx5_hw_modify_header_action *mhdr; /* Modify header action. */
 	/* Encap/Decap action. */
 	struct mlx5_hw_encap_decap_action *encap_decap;
 	uint16_t encap_decap_pos; /* Encap/Decap action position. */
@@ -2238,6 +2324,16 @@ int flow_dv_action_query(struct rte_eth_dev *dev,
 size_t flow_dv_get_item_hdr_len(const enum rte_flow_item_type item_type);
 int flow_dv_convert_encap_data(const struct rte_flow_item *items, uint8_t *buf,
 			   size_t *size, struct rte_flow_error *error);
+void mlx5_flow_field_id_to_modify_info
+		(const struct rte_flow_action_modify_data *data,
+		 struct field_modify_info *info, uint32_t *mask,
+		 uint32_t width, struct rte_eth_dev *dev,
+		 const struct rte_flow_attr *attr, struct rte_flow_error *error);
+int flow_dv_convert_modify_action(struct rte_flow_item *item,
+			      struct field_modify_info *field,
+			      struct field_modify_info *dcopy,
+			      struct mlx5_flow_dv_modify_hdr_resource *resource,
+			      uint32_t type, struct rte_flow_error *error);
 
 #define MLX5_PF_VPORT_ID 0
 #define MLX5_ECPF_VPORT_ID 0xFFFE
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 29d7bf7049..9fbaa4bfe8 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -216,12 +216,6 @@ flow_dv_attr_init(const struct rte_flow_item *item, union flow_dv_attr *attr,
 	attr->valid = 1;
 }
 
-struct field_modify_info {
-	uint32_t size; /* Size of field in protocol header, in bytes. */
-	uint32_t offset; /* Offset of field in protocol header, in bytes. */
-	enum mlx5_modification_field id;
-};
-
 struct field_modify_info modify_eth[] = {
 	{4,  0, MLX5_MODI_OUT_DMAC_47_16},
 	{2,  4, MLX5_MODI_OUT_DMAC_15_0},
@@ -354,45 +348,6 @@ mlx5_update_vlan_vid_pcp(const struct rte_flow_action *action,
 	}
 }
 
-/**
- * Fetch 1, 2, 3 or 4 byte field from the byte array
- * and return as unsigned integer in host-endian format.
- *
- * @param[in] data
- *   Pointer to data array.
- * @param[in] size
- *   Size of field to extract.
- *
- * @return
- *   converted field in host endian format.
- */
-static inline uint32_t
-flow_dv_fetch_field(const uint8_t *data, uint32_t size)
-{
-	uint32_t ret;
-
-	switch (size) {
-	case 1:
-		ret = *data;
-		break;
-	case 2:
-		ret = rte_be_to_cpu_16(*(const unaligned_uint16_t *)data);
-		break;
-	case 3:
-		ret = rte_be_to_cpu_16(*(const unaligned_uint16_t *)data);
-		ret = (ret << 8) | *(data + sizeof(uint16_t));
-		break;
-	case 4:
-		ret = rte_be_to_cpu_32(*(const unaligned_uint32_t *)data);
-		break;
-	default:
-		MLX5_ASSERT(false);
-		ret = 0;
-		break;
-	}
-	return ret;
-}
-
 /**
  * Convert modify-header action to DV specification.
  *
@@ -421,7 +376,7 @@ flow_dv_fetch_field(const uint8_t *data, uint32_t size)
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
-static int
+int
 flow_dv_convert_modify_action(struct rte_flow_item *item,
 			      struct field_modify_info *field,
 			      struct field_modify_info *dcopy,
@@ -1439,7 +1394,32 @@ mlx5_flow_item_field_width(struct rte_eth_dev *dev,
 	return 0;
 }
 
-static void
+static __rte_always_inline uint8_t
+flow_modify_info_mask_8(uint32_t length, uint32_t off)
+{
+	return (0xffu >> (8 - length)) << off;
+}
+
+static __rte_always_inline uint16_t
+flow_modify_info_mask_16(uint32_t length, uint32_t off)
+{
+	return rte_cpu_to_be_16((0xffffu >> (16 - length)) << off);
+}
+
+static __rte_always_inline uint32_t
+flow_modify_info_mask_32(uint32_t length, uint32_t off)
+{
+	return rte_cpu_to_be_32((0xffffffffu >> (32 - length)) << off);
+}
+
+static __rte_always_inline uint32_t
+flow_modify_info_mask_32_masked(uint32_t length, uint32_t off, uint32_t post_mask)
+{
+	uint32_t mask = (0xffffffffu >> (32 - length)) << off;
+	return rte_cpu_to_be_32(mask & post_mask);
+}
+
+void
 mlx5_flow_field_id_to_modify_info
 		(const struct rte_flow_action_modify_data *data,
 		 struct field_modify_info *info, uint32_t *mask,
@@ -1448,323 +1428,340 @@ mlx5_flow_field_id_to_modify_info
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	uint32_t idx = 0;
-	uint32_t off = 0;
-
-	switch (data->field) {
+	uint32_t off_be = 0;
+	uint32_t length = 0;
+	switch ((int)data->field) {
 	case RTE_FLOW_FIELD_START:
 		/* not supported yet */
 		MLX5_ASSERT(false);
 		break;
 	case RTE_FLOW_FIELD_MAC_DST:
-		off = data->offset > 16 ? data->offset - 16 : 0;
-		if (mask) {
-			if (data->offset < 16) {
-				info[idx] = (struct field_modify_info){2, 4,
-						MLX5_MODI_OUT_DMAC_15_0};
-				if (width < 16) {
-					mask[1] = rte_cpu_to_be_16(0xffff >>
-								 (16 - width));
-					width = 0;
-				} else {
-					mask[1] = RTE_BE16(0xffff);
-					width -= 16;
-				}
-				if (!width)
-					break;
-				++idx;
-			}
-			info[idx] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_DMAC_47_16};
-			mask[0] = rte_cpu_to_be_32((0xffffffff >>
-						    (32 - width)) << off);
+		MLX5_ASSERT(data->offset + width <= 48);
+		off_be = 48 - (data->offset + width);
+		if (off_be < 16) {
+			info[idx] = (struct field_modify_info){2, 4,
+					MLX5_MODI_OUT_DMAC_15_0};
+			length = off_be + width <= 16 ? width : 16 - off_be;
+			if (mask)
+				mask[1] = flow_modify_info_mask_16(length,
+								   off_be);
+			else
+				info[idx].offset = off_be;
+			width -= length;
+			if (!width)
+				break;
+			off_be = 0;
+			idx++;
 		} else {
-			if (data->offset < 16)
-				info[idx++] = (struct field_modify_info){2, 0,
-						MLX5_MODI_OUT_DMAC_15_0};
-			info[idx] = (struct field_modify_info){4, off,
-						MLX5_MODI_OUT_DMAC_47_16};
+			off_be -= 16;
 		}
+		info[idx] = (struct field_modify_info){4, 0,
+				MLX5_MODI_OUT_DMAC_47_16};
+		if (mask)
+			mask[0] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_MAC_SRC:
-		off = data->offset > 16 ? data->offset - 16 : 0;
-		if (mask) {
-			if (data->offset < 16) {
-				info[idx] = (struct field_modify_info){2, 4,
-						MLX5_MODI_OUT_SMAC_15_0};
-				if (width < 16) {
-					mask[1] = rte_cpu_to_be_16(0xffff >>
-								 (16 - width));
-					width = 0;
-				} else {
-					mask[1] = RTE_BE16(0xffff);
-					width -= 16;
-				}
-				if (!width)
-					break;
-				++idx;
-			}
-			info[idx] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_SMAC_47_16};
-			mask[0] = rte_cpu_to_be_32((0xffffffff >>
-						    (32 - width)) << off);
+		MLX5_ASSERT(data->offset + width <= 48);
+		off_be = 48 - (data->offset + width);
+		if (off_be < 16) {
+			info[idx] = (struct field_modify_info){2, 4,
+					MLX5_MODI_OUT_SMAC_15_0};
+			length = off_be + width <= 16 ? width : 16 - off_be;
+			if (mask)
+				mask[1] = flow_modify_info_mask_16(length,
+								   off_be);
+			else
+				info[idx].offset = off_be;
+			width -= length;
+			if (!width)
+				break;
+			off_be = 0;
+			idx++;
 		} else {
-			if (data->offset < 16)
-				info[idx++] = (struct field_modify_info){2, 0,
-						MLX5_MODI_OUT_SMAC_15_0};
-			info[idx] = (struct field_modify_info){4, off,
-						MLX5_MODI_OUT_SMAC_47_16};
+			off_be -= 16;
 		}
+		info[idx] = (struct field_modify_info){4, 0,
+				MLX5_MODI_OUT_SMAC_47_16};
+		if (mask)
+			mask[0] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_VLAN_TYPE:
 		/* not supported yet */
 		break;
 	case RTE_FLOW_FIELD_VLAN_ID:
+		MLX5_ASSERT(data->offset + width <= 12);
+		off_be = 12 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_FIRST_VID};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0x0fff >> (12 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_MAC_TYPE:
+		MLX5_ASSERT(data->offset + width <= 16);
+		off_be = 16 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_ETHERTYPE};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0xffff >> (16 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_IPV4_DSCP:
+		MLX5_ASSERT(data->offset + width <= 6);
+		off_be = 6 - (data->offset + width);
 		info[idx] = (struct field_modify_info){1, 0,
 					MLX5_MODI_OUT_IP_DSCP};
 		if (mask)
-			mask[idx] = 0x3f >> (6 - width);
+			mask[idx] = flow_modify_info_mask_8(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_IPV4_TTL:
+		MLX5_ASSERT(data->offset + width <= 8);
+		off_be = 8 - (data->offset + width);
 		info[idx] = (struct field_modify_info){1, 0,
 					MLX5_MODI_OUT_IPV4_TTL};
 		if (mask)
-			mask[idx] = 0xff >> (8 - width);
+			mask[idx] = flow_modify_info_mask_8(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_IPV4_SRC:
+		MLX5_ASSERT(data->offset + width <= 32);
+		off_be = 32 - (data->offset + width);
 		info[idx] = (struct field_modify_info){4, 0,
 					MLX5_MODI_OUT_SIPV4};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_32(0xffffffff >>
-						     (32 - width));
+			mask[idx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_IPV4_DST:
+		MLX5_ASSERT(data->offset + width <= 32);
+		off_be = 32 - (data->offset + width);
 		info[idx] = (struct field_modify_info){4, 0,
 					MLX5_MODI_OUT_DIPV4};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_32(0xffffffff >>
-						     (32 - width));
+			mask[idx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_IPV6_DSCP:
+		MLX5_ASSERT(data->offset + width <= 6);
+		off_be = 6 - (data->offset + width);
 		info[idx] = (struct field_modify_info){1, 0,
 					MLX5_MODI_OUT_IP_DSCP};
 		if (mask)
-			mask[idx] = 0x3f >> (6 - width);
+			mask[idx] = flow_modify_info_mask_8(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_IPV6_HOPLIMIT:
+		MLX5_ASSERT(data->offset + width <= 8);
+		off_be = 8 - (data->offset + width);
 		info[idx] = (struct field_modify_info){1, 0,
 					MLX5_MODI_OUT_IPV6_HOPLIMIT};
 		if (mask)
-			mask[idx] = 0xff >> (8 - width);
+			mask[idx] = flow_modify_info_mask_8(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
-	case RTE_FLOW_FIELD_IPV6_SRC:
-		if (mask) {
-			if (data->offset < 32) {
-				info[idx] = (struct field_modify_info){4, 12,
-						MLX5_MODI_OUT_SIPV6_31_0};
-				if (width < 32) {
-					mask[3] =
-						rte_cpu_to_be_32(0xffffffff >>
-								 (32 - width));
-					width = 0;
-				} else {
-					mask[3] = RTE_BE32(0xffffffff);
-					width -= 32;
-				}
-				if (!width)
-					break;
-				++idx;
-			}
-			if (data->offset < 64) {
-				info[idx] = (struct field_modify_info){4, 8,
-						MLX5_MODI_OUT_SIPV6_63_32};
-				if (width < 32) {
-					mask[2] =
-						rte_cpu_to_be_32(0xffffffff >>
-								 (32 - width));
-					width = 0;
-				} else {
-					mask[2] = RTE_BE32(0xffffffff);
-					width -= 32;
-				}
-				if (!width)
-					break;
-				++idx;
-			}
-			if (data->offset < 96) {
-				info[idx] = (struct field_modify_info){4, 4,
-						MLX5_MODI_OUT_SIPV6_95_64};
-				if (width < 32) {
-					mask[1] =
-						rte_cpu_to_be_32(0xffffffff >>
-								 (32 - width));
-					width = 0;
-				} else {
-					mask[1] = RTE_BE32(0xffffffff);
-					width -= 32;
-				}
-				if (!width)
-					break;
-				++idx;
+	case RTE_FLOW_FIELD_IPV6_SRC: {
+		/*
+		 * Fields corresponding to IPv6 source address bytes
+		 * arranged according to network byte ordering.
+		 */
+		struct field_modify_info fields[] = {
+			{ 4, 0, MLX5_MODI_OUT_SIPV6_127_96 },
+			{ 4, 4, MLX5_MODI_OUT_SIPV6_95_64 },
+			{ 4, 8, MLX5_MODI_OUT_SIPV6_63_32 },
+			{ 4, 12, MLX5_MODI_OUT_SIPV6_31_0 },
+		};
+		/* First mask to be modified is the mask of 4th address byte. */
+		uint32_t midx = 3;
+
+		MLX5_ASSERT(data->offset + width <= 128);
+		off_be = 128 - (data->offset + width);
+		while (width > 0 && midx > 0) {
+			if (off_be < 32) {
+				info[idx] = fields[midx];
+				length = off_be + width <= 32 ?
+					 width : 32 - off_be;
+				if (mask)
+					mask[midx] = flow_modify_info_mask_32
+						(length, off_be);
+				else
+					info[idx].offset = off_be;
+				width -= length;
+				off_be = 0;
+				idx++;
+			} else {
+				off_be -= 32;
 			}
-			info[idx] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_SIPV6_127_96};
-			mask[0] = rte_cpu_to_be_32(0xffffffff >> (32 - width));
-		} else {
-			if (data->offset < 32)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_SIPV6_31_0};
-			if (data->offset < 64)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_SIPV6_63_32};
-			if (data->offset < 96)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_SIPV6_95_64};
-			if (data->offset < 128)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_SIPV6_127_96};
+			midx--;
 		}
+		if (!width)
+			break;
+		info[idx] = fields[midx];
+		if (mask)
+			mask[midx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
-	case RTE_FLOW_FIELD_IPV6_DST:
-		if (mask) {
-			if (data->offset < 32) {
-				info[idx] = (struct field_modify_info){4, 12,
-						MLX5_MODI_OUT_DIPV6_31_0};
-				if (width < 32) {
-					mask[3] =
-						rte_cpu_to_be_32(0xffffffff >>
-								 (32 - width));
-					width = 0;
-				} else {
-					mask[3] = RTE_BE32(0xffffffff);
-					width -= 32;
-				}
-				if (!width)
-					break;
-				++idx;
-			}
-			if (data->offset < 64) {
-				info[idx] = (struct field_modify_info){4, 8,
-						MLX5_MODI_OUT_DIPV6_63_32};
-				if (width < 32) {
-					mask[2] =
-						rte_cpu_to_be_32(0xffffffff >>
-								 (32 - width));
-					width = 0;
-				} else {
-					mask[2] = RTE_BE32(0xffffffff);
-					width -= 32;
-				}
-				if (!width)
-					break;
-				++idx;
-			}
-			if (data->offset < 96) {
-				info[idx] = (struct field_modify_info){4, 4,
-						MLX5_MODI_OUT_DIPV6_95_64};
-				if (width < 32) {
-					mask[1] =
-						rte_cpu_to_be_32(0xffffffff >>
-								 (32 - width));
-					width = 0;
-				} else {
-					mask[1] = RTE_BE32(0xffffffff);
-					width -= 32;
-				}
-				if (!width)
-					break;
-				++idx;
+	}
+	case RTE_FLOW_FIELD_IPV6_DST: {
+		/*
+		 * Fields corresponding to IPv6 destination address bytes
+		 * arranged according to network byte ordering.
+		 */
+		struct field_modify_info fields[] = {
+			{ 4, 0, MLX5_MODI_OUT_DIPV6_127_96 },
+			{ 4, 4, MLX5_MODI_OUT_DIPV6_95_64 },
+			{ 4, 8, MLX5_MODI_OUT_DIPV6_63_32 },
+			{ 4, 12, MLX5_MODI_OUT_DIPV6_31_0 },
+		};
+		/* First mask to be modified is the mask of 4th address byte. */
+		uint32_t midx = 3;
+
+		MLX5_ASSERT(data->offset + width <= 128);
+		off_be = 128 - (data->offset + width);
+		while (width > 0 && midx > 0) {
+			if (off_be < 32) {
+				info[idx] = fields[midx];
+				length = off_be + width <= 32 ?
+					 width : 32 - off_be;
+				if (mask)
+					mask[midx] = flow_modify_info_mask_32
+						(length, off_be);
+				else
+					info[idx].offset = off_be;
+				width -= length;
+				off_be = 0;
+				idx++;
+			} else {
+				off_be -= 32;
 			}
-			info[idx] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_DIPV6_127_96};
-			mask[0] = rte_cpu_to_be_32(0xffffffff >> (32 - width));
-		} else {
-			if (data->offset < 32)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_DIPV6_31_0};
-			if (data->offset < 64)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_DIPV6_63_32};
-			if (data->offset < 96)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_DIPV6_95_64};
-			if (data->offset < 128)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_DIPV6_127_96};
+			midx--;
 		}
+		if (!width)
+			break;
+		info[idx] = fields[midx];
+		if (mask)
+			mask[midx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
+	}
 	case RTE_FLOW_FIELD_TCP_PORT_SRC:
+		MLX5_ASSERT(data->offset + width <= 16);
+		off_be = 16 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_TCP_SPORT};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0xffff >> (16 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_TCP_PORT_DST:
+		MLX5_ASSERT(data->offset + width <= 16);
+		off_be = 16 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_TCP_DPORT};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0xffff >> (16 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_TCP_SEQ_NUM:
+		MLX5_ASSERT(data->offset + width <= 32);
+		off_be = 32 - (data->offset + width);
 		info[idx] = (struct field_modify_info){4, 0,
 					MLX5_MODI_OUT_TCP_SEQ_NUM};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_32(0xffffffff >>
-						     (32 - width));
+			mask[idx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_TCP_ACK_NUM:
+		MLX5_ASSERT(data->offset + width <= 32);
+		off_be = 32 - (data->offset + width);
 		info[idx] = (struct field_modify_info){4, 0,
 					MLX5_MODI_OUT_TCP_ACK_NUM};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_32(0xffffffff >>
-						     (32 - width));
+			mask[idx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_TCP_FLAGS:
+		MLX5_ASSERT(data->offset + width <= 9);
+		off_be = 9 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_TCP_FLAGS};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0x1ff >> (9 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_UDP_PORT_SRC:
+		MLX5_ASSERT(data->offset + width <= 16);
+		off_be = 16 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_UDP_SPORT};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0xffff >> (16 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_UDP_PORT_DST:
+		MLX5_ASSERT(data->offset + width <= 16);
+		off_be = 16 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_UDP_DPORT};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0xffff >> (16 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_VXLAN_VNI:
-		/* not supported yet */
+		MLX5_ASSERT(data->offset + width <= 24);
+		/* VNI is on bits 31-8 of TUNNEL_HDR_DW_1. */
+		off_be = 24 - (data->offset + width) + 8;
+		info[idx] = (struct field_modify_info){4, 0,
+					MLX5_MODI_TUNNEL_HDR_DW_1};
+		if (mask)
+			mask[idx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_GENEVE_VNI:
 		/* not supported yet*/
 		break;
 	case RTE_FLOW_FIELD_GTP_TEID:
+		MLX5_ASSERT(data->offset + width <= 32);
+		off_be = 32 - (data->offset + width);
 		info[idx] = (struct field_modify_info){4, 0,
 					MLX5_MODI_GTP_TEID};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_32(0xffffffff >>
-						     (32 - width));
+			mask[idx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_TAG:
 		{
-			int reg = mlx5_flow_get_reg_id(dev, MLX5_APP_TAG,
-						   data->level, error);
+			MLX5_ASSERT(data->offset + width <= 32);
+			int reg;
+
+			if (priv->sh->config.dv_flow_en == 2)
+				reg = REG_C_1;
+			else
+				reg = mlx5_flow_get_reg_id(dev, MLX5_APP_TAG,
+							   data->level, error);
 			if (reg < 0)
 				return;
 			MLX5_ASSERT(reg != REG_NON);
@@ -1772,15 +1769,18 @@ mlx5_flow_field_id_to_modify_info
 			info[idx] = (struct field_modify_info){4, 0,
 						reg_to_field[reg]};
 			if (mask)
-				mask[idx] =
-					rte_cpu_to_be_32(0xffffffff >>
-							 (32 - width));
+				mask[idx] = flow_modify_info_mask_32
+					(width, data->offset);
+			else
+				info[idx].offset = data->offset;
 		}
 		break;
 	case RTE_FLOW_FIELD_MARK:
 		{
 			uint32_t mark_mask = priv->sh->dv_mark_mask;
 			uint32_t mark_count = __builtin_popcount(mark_mask);
+			RTE_SET_USED(mark_count);
+			MLX5_ASSERT(data->offset + width <= mark_count);
 			int reg = mlx5_flow_get_reg_id(dev, MLX5_FLOW_MARK,
 						       0, error);
 			if (reg < 0)
@@ -1790,14 +1790,18 @@ mlx5_flow_field_id_to_modify_info
 			info[idx] = (struct field_modify_info){4, 0,
 						reg_to_field[reg]};
 			if (mask)
-				mask[idx] = rte_cpu_to_be_32((mark_mask >>
-					 (mark_count - width)) & mark_mask);
+				mask[idx] = flow_modify_info_mask_32_masked
+					(width, data->offset, mark_mask);
+			else
+				info[idx].offset = data->offset;
 		}
 		break;
 	case RTE_FLOW_FIELD_META:
 		{
 			uint32_t meta_mask = priv->sh->dv_meta_mask;
 			uint32_t meta_count = __builtin_popcount(meta_mask);
+			RTE_SET_USED(meta_count);
+			MLX5_ASSERT(data->offset + width <= meta_count);
 			int reg = flow_dv_get_metadata_reg(dev, attr, error);
 			if (reg < 0)
 				return;
@@ -1806,16 +1810,32 @@ mlx5_flow_field_id_to_modify_info
 			info[idx] = (struct field_modify_info){4, 0,
 						reg_to_field[reg]};
 			if (mask)
-				mask[idx] = rte_cpu_to_be_32((meta_mask >>
-					(meta_count - width)) & meta_mask);
+				mask[idx] = flow_modify_info_mask_32_masked
+					(width, data->offset, meta_mask);
+			else
+				info[idx].offset = data->offset;
 		}
 		break;
 	case RTE_FLOW_FIELD_IPV4_ECN:
 	case RTE_FLOW_FIELD_IPV6_ECN:
+		MLX5_ASSERT(data->offset + width <= 2);
+		off_be = 2 - (data->offset + width);
 		info[idx] = (struct field_modify_info){1, 0,
 					MLX5_MODI_OUT_IP_ECN};
 		if (mask)
-			mask[idx] = 0x3 >> (2 - width);
+			mask[idx] = flow_modify_info_mask_8(width, off_be);
+		else
+			info[idx].offset = off_be;
+		break;
+	case RTE_FLOW_FIELD_GTP_PSC_QFI:
+		MLX5_ASSERT(data->offset + width <= 8);
+		off_be = data->offset + 8;
+		info[idx] = (struct field_modify_info){4, 0,
+					MLX5_MODI_GTPU_FIRST_EXT_DW_0};
+		if (mask)
+			mask[idx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_POINTER:
 	case RTE_FLOW_FIELD_VALUE:
@@ -1865,7 +1885,8 @@ flow_dv_convert_action_modify_field
 
 	if (conf->src.field == RTE_FLOW_FIELD_POINTER ||
 	    conf->src.field == RTE_FLOW_FIELD_VALUE) {
-		type = MLX5_MODIFICATION_TYPE_SET;
+		type = conf->operation == RTE_FLOW_MODIFY_SET ?
+			MLX5_MODIFICATION_TYPE_SET : MLX5_MODIFICATION_TYPE_ADD;
 		/** For SET fill the destination field (field) first. */
 		mlx5_flow_field_id_to_modify_info(&conf->dst, field, mask,
 						  conf->width, dev,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index e62d25fda2..fa7bd37737 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -319,6 +319,11 @@ __flow_hw_action_template_destroy(struct rte_eth_dev *dev,
 		mlx5_hlist_unregister(priv->sh->flow_tbls, &grp->entry);
 		acts->jump = NULL;
 	}
+	if (acts->mhdr) {
+		if (acts->mhdr->action)
+			mlx5dr_action_destroy(acts->mhdr->action);
+		mlx5_free(acts->mhdr);
+	}
 }
 
 /**
@@ -425,6 +430,37 @@ __flow_hw_act_data_encap_append(struct mlx5_priv *priv,
 	return 0;
 }
 
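+/*
+ * Append a dynamic modify header action descriptor to the template action
+ * list, recording the command range and the field/dcopy/mask layout needed
+ * to patch the commands at flow creation time.
+ */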
+static __rte_always_inline int
+__flow_hw_act_data_hdr_modify_append(struct mlx5_priv *priv,
+				     struct mlx5_hw_actions *acts,
+				     enum rte_flow_action_type type,
+				     uint16_t action_src,
+				     uint16_t action_dst,
+				     uint16_t mhdr_cmds_off,
+				     uint16_t mhdr_cmds_end,
+				     bool shared,
+				     struct field_modify_info *field,
+				     struct field_modify_info *dcopy,
+				     uint32_t *mask)
+{
+	struct mlx5_action_construct_data *act_data;
+
+	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
+	if (!act_data)
+		return -1;
+	act_data->modify_header.mhdr_cmds_off = mhdr_cmds_off;
+	act_data->modify_header.mhdr_cmds_end = mhdr_cmds_end;
+	act_data->modify_header.shared = shared;
+	rte_memcpy(act_data->modify_header.field, field,
+		   sizeof(*field) * MLX5_ACT_MAX_MOD_FIELDS);
+	rte_memcpy(act_data->modify_header.dcopy, dcopy,
+		   sizeof(*dcopy) * MLX5_ACT_MAX_MOD_FIELDS);
+	rte_memcpy(act_data->modify_header.mask, mask,
+		   sizeof(*mask) * MLX5_ACT_MAX_MOD_FIELDS);
+	LIST_INSERT_HEAD(&acts->act_list, act_data, next);
+	return 0;
+}
+
 /**
  * Append shared RSS action to the dynamic action list.
  *
@@ -515,6 +551,265 @@ flow_hw_shared_action_translate(struct rte_eth_dev *dev,
 	return 0;
 }
 
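+/*
+ * Check whether a modify_field action is shared by all flow rules created
+ * from a template. Immediate (VALUE/POINTER) sources are shared only when
+ * masked in the action template; other source field types are always shared.
+ */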
+static __rte_always_inline bool
+flow_hw_action_modify_field_is_shared(const struct rte_flow_action *action,
+				      const struct rte_flow_action *mask)
+{
+	const struct rte_flow_action_modify_field *v = action->conf;
+	const struct rte_flow_action_modify_field *m = mask->conf;
+
+	if (v->src.field == RTE_FLOW_FIELD_VALUE) {
+		uint32_t j;
+
+		if (m == NULL)
+			return false;
+		for (j = 0; j < RTE_DIM(m->src.value); ++j) {
+			/*
+			 * Immediate value is considered to be masked
+			 * (and thus shared by all flow rules), if mask
+			 * is non-zero. Partial mask over immediate value
+			 * is not allowed.
+			 */
+			if (m->src.value[j])
+				return true;
+		}
+		return false;
+	}
+	if (v->src.field == RTE_FLOW_FIELD_POINTER)
+		return m->src.pvalue != NULL;
+	/*
+	 * Source field types other than VALUE and
+	 * POINTER are always shared.
+	 */
+	return true;
+}
+
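+/*
+ * Check whether a NOP must be inserted before the new modify header command,
+ * i.e. when its source or destination field overlaps the field written by
+ * the previously appended command.
+ */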
+static __rte_always_inline bool
+flow_hw_should_insert_nop(const struct mlx5_hw_modify_header_action *mhdr,
+			  const struct mlx5_modification_cmd *cmd)
+{
+	struct mlx5_modification_cmd last_cmd = { { 0 } };
+	struct mlx5_modification_cmd new_cmd = { { 0 } };
+	const uint32_t cmds_num = mhdr->mhdr_cmds_num;
+	unsigned int last_type;
+	bool should_insert = false;
+
+	if (cmds_num == 0)
+		return false;
+	last_cmd = *(&mhdr->mhdr_cmds[cmds_num - 1]);
+	last_cmd.data0 = rte_be_to_cpu_32(last_cmd.data0);
+	last_cmd.data1 = rte_be_to_cpu_32(last_cmd.data1);
+	last_type = last_cmd.action_type;
+	new_cmd = *cmd;
+	new_cmd.data0 = rte_be_to_cpu_32(new_cmd.data0);
+	new_cmd.data1 = rte_be_to_cpu_32(new_cmd.data1);
+	switch (new_cmd.action_type) {
+	case MLX5_MODIFICATION_TYPE_SET:
+	case MLX5_MODIFICATION_TYPE_ADD:
+		if (last_type == MLX5_MODIFICATION_TYPE_SET ||
+		    last_type == MLX5_MODIFICATION_TYPE_ADD)
+			should_insert = new_cmd.field == last_cmd.field;
+		else if (last_type == MLX5_MODIFICATION_TYPE_COPY)
+			should_insert = new_cmd.field == last_cmd.dst_field;
+		else if (last_type == MLX5_MODIFICATION_TYPE_NOP)
+			should_insert = false;
+		else
+			MLX5_ASSERT(false); /* Other types are not supported. */
+		break;
+	case MLX5_MODIFICATION_TYPE_COPY:
+		if (last_type == MLX5_MODIFICATION_TYPE_SET ||
+		    last_type == MLX5_MODIFICATION_TYPE_ADD)
+			should_insert = (new_cmd.field == last_cmd.field ||
+					 new_cmd.dst_field == last_cmd.field);
+		else if (last_type == MLX5_MODIFICATION_TYPE_COPY)
+			should_insert = (new_cmd.field == last_cmd.dst_field ||
+					 new_cmd.dst_field == last_cmd.dst_field);
+		else if (last_type == MLX5_MODIFICATION_TYPE_NOP)
+			should_insert = false;
+		else
+			MLX5_ASSERT(false); /* Other types are not supported. */
+		break;
+	default:
+		/* Other action types should be rejected on AT validation. */
+		MLX5_ASSERT(false);
+		break;
+	}
+	return should_insert;
+}
+
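+/* Append a NOP command to the modify header command array. */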
+static __rte_always_inline int
+flow_hw_mhdr_cmd_nop_append(struct mlx5_hw_modify_header_action *mhdr)
+{
+	struct mlx5_modification_cmd *nop;
+	uint32_t num = mhdr->mhdr_cmds_num;
+
+	if (num + 1 >= MLX5_MHDR_MAX_CMD)
+		return -ENOMEM;
+	nop = mhdr->mhdr_cmds + num;
+	nop->data0 = 0;
+	nop->action_type = MLX5_MODIFICATION_TYPE_NOP;
+	nop->data0 = rte_cpu_to_be_32(nop->data0);
+	nop->data1 = 0;
+	mhdr->mhdr_cmds_num = num + 1;
+	return 0;
+}
+
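+/* Append a modify header command, failing when the command array is full. */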
+static __rte_always_inline int
+flow_hw_mhdr_cmd_append(struct mlx5_hw_modify_header_action *mhdr,
+			struct mlx5_modification_cmd *cmd)
+{
+	uint32_t num = mhdr->mhdr_cmds_num;
+
+	if (num + 1 >= MLX5_MHDR_MAX_CMD)
+		return -ENOMEM;
+	mhdr->mhdr_cmds[num] = *cmd;
+	mhdr->mhdr_cmds_num = num + 1;
+	return 0;
+}
+
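+/*
+ * Append all commands from a converted modify header resource, inserting
+ * NOP commands between conflicting commands when required.
+ */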
+static __rte_always_inline int
+flow_hw_converted_mhdr_cmds_append(struct mlx5_hw_modify_header_action *mhdr,
+				   struct mlx5_flow_dv_modify_hdr_resource *resource)
+{
+	uint32_t idx;
+	int ret;
+
+	for (idx = 0; idx < resource->actions_num; ++idx) {
+		struct mlx5_modification_cmd *src = &resource->actions[idx];
+
+		if (flow_hw_should_insert_nop(mhdr, src)) {
+			ret = flow_hw_mhdr_cmd_nop_append(mhdr);
+			if (ret)
+				return ret;
+		}
+		ret = flow_hw_mhdr_cmd_append(mhdr, src);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
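+/* Reset the modify header context before translating an actions template. */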
+static __rte_always_inline void
+flow_hw_modify_field_init(struct mlx5_hw_modify_header_action *mhdr,
+			  struct rte_flow_actions_template *at)
+{
+	memset(mhdr, 0, sizeof(*mhdr));
+	/* Modify header action without any commands is shared by default. */
+	mhdr->shared = true;
+	mhdr->pos = at->mhdr_off;
+}
+
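+/*
+ * Translate a single modify_field action from an actions template into
+ * modify header commands and, for non-shared actions, record the command
+ * range to be patched with per-rule values at flow creation time.
+ */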
+static __rte_always_inline int
+flow_hw_modify_field_compile(struct rte_eth_dev *dev,
+			     const struct rte_flow_attr *attr,
+			     const struct rte_flow_action *action_start, /* Start of AT actions. */
+			     const struct rte_flow_action *action, /* Current action from AT. */
+			     const struct rte_flow_action *action_mask, /* Current mask from AT. */
+			     struct mlx5_hw_actions *acts,
+			     struct mlx5_hw_modify_header_action *mhdr,
+			     struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_action_modify_field *conf = action->conf;
+	union {
+		struct mlx5_flow_dv_modify_hdr_resource resource;
+		uint8_t data[sizeof(struct mlx5_flow_dv_modify_hdr_resource) +
+			     sizeof(struct mlx5_modification_cmd) * MLX5_MHDR_MAX_CMD];
+	} dummy;
+	struct mlx5_flow_dv_modify_hdr_resource *resource;
+	struct rte_flow_item item = {
+		.spec = NULL,
+		.mask = NULL
+	};
+	struct field_modify_info field[MLX5_ACT_MAX_MOD_FIELDS] = {
+						{0, 0, MLX5_MODI_OUT_NONE} };
+	struct field_modify_info dcopy[MLX5_ACT_MAX_MOD_FIELDS] = {
+						{0, 0, MLX5_MODI_OUT_NONE} };
+	uint32_t mask[MLX5_ACT_MAX_MOD_FIELDS] = { 0 };
+	uint32_t type, value = 0;
+	uint16_t cmds_start, cmds_end;
+	bool shared;
+	int ret;
+
+	/*
+	 * Modify header action is shared if previous modify_field actions
+	 * are shared and currently compiled action is shared.
+	 */
+	shared = flow_hw_action_modify_field_is_shared(action, action_mask);
+	mhdr->shared &= shared;
+	if (conf->src.field == RTE_FLOW_FIELD_POINTER ||
+	    conf->src.field == RTE_FLOW_FIELD_VALUE) {
+		type = conf->operation == RTE_FLOW_MODIFY_SET ? MLX5_MODIFICATION_TYPE_SET :
+								MLX5_MODIFICATION_TYPE_ADD;
+		/* For SET/ADD fill the destination field (field) first. */
+		mlx5_flow_field_id_to_modify_info(&conf->dst, field, mask,
+						  conf->width, dev,
+						  attr, error);
+		item.spec = conf->src.field == RTE_FLOW_FIELD_POINTER ?
+				(void *)(uintptr_t)conf->src.pvalue :
+				(void *)(uintptr_t)&conf->src.value;
+		if (conf->dst.field == RTE_FLOW_FIELD_META ||
+		    conf->dst.field == RTE_FLOW_FIELD_TAG) {
+			value = *(const unaligned_uint32_t *)item.spec;
+			value = rte_cpu_to_be_32(value);
+			item.spec = &value;
+		} else if (conf->dst.field == RTE_FLOW_FIELD_GTP_PSC_QFI) {
+			/*
+			 * QFI is passed as an uint8_t integer, but it is accessed through
+			 * QFI is passed as a uint8_t integer, but it is accessed through the
+			 * second least significant byte of a 32-bit modify header field.
+			value = *(const uint8_t *)item.spec;
+			value = rte_cpu_to_be_32(value << 8);
+			item.spec = &value;
+		}
+	} else {
+		type = MLX5_MODIFICATION_TYPE_COPY;
+		/* For COPY fill the destination field (dcopy) without mask. */
+		mlx5_flow_field_id_to_modify_info(&conf->dst, dcopy, NULL,
+						  conf->width, dev,
+						  attr, error);
+		/* Then construct the source field (field) with mask. */
+		mlx5_flow_field_id_to_modify_info(&conf->src, field, mask,
+						  conf->width, dev,
+						  attr, error);
+	}
+	item.mask = &mask;
+	memset(&dummy, 0, sizeof(dummy));
+	resource = &dummy.resource;
+	ret = flow_dv_convert_modify_action(&item, field, dcopy, resource, type, error);
+	if (ret)
+		return ret;
+	MLX5_ASSERT(resource->actions_num > 0);
+	/*
+	 * If previous modify field action collide with this one, then insert NOP command.
+	 * This NOP command will not be a part of action's command range used to update commands
+	 * on rule creation.
+	 */
+	if (flow_hw_should_insert_nop(mhdr, &resource->actions[0])) {
+		ret = flow_hw_mhdr_cmd_nop_append(mhdr);
+		if (ret)
+			return rte_flow_error_set(error, ret, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+						  NULL, "too many modify field operations specified");
+	}
+	cmds_start = mhdr->mhdr_cmds_num;
+	ret = flow_hw_converted_mhdr_cmds_append(mhdr, resource);
+	if (ret)
+		return rte_flow_error_set(error, ret, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "too many modify field operations specified");
+
+	cmds_end = mhdr->mhdr_cmds_num;
+	if (shared)
+		return 0;
+	ret = __flow_hw_act_data_hdr_modify_append(priv, acts, RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+						   action - action_start, mhdr->pos,
+						   cmds_start, cmds_end, shared,
+						   field, dcopy, mask);
+	if (ret)
+		return rte_flow_error_set(error, ENOMEM, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "not enough memory to store modify field metadata");
+	return 0;
+}
+
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -558,10 +853,12 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 	uint16_t reformat_pos = MLX5_HW_MAX_ACTS, reformat_src = 0;
 	uint8_t *encap_data = NULL, *encap_data_m = NULL;
 	size_t data_size = 0;
+	struct mlx5_hw_modify_header_action mhdr = { 0 };
 	bool actions_end = false;
 	uint32_t type, i;
 	int err;
 
+	flow_hw_modify_field_init(&mhdr, at);
 	if (attr->transfer)
 		type = MLX5DR_TABLE_TYPE_FDB;
 	else if (attr->egress)
@@ -714,6 +1011,15 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			reformat_pos = i++;
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_TNL_L2_TO_L2;
 			break;
+		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
+			if (mhdr.pos == UINT16_MAX)
+				mhdr.pos = i++;
+			err = flow_hw_modify_field_compile(dev, attr, action_start,
+							   actions, masks, acts, &mhdr,
+							   error);
+			if (err)
+				goto err;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -721,6 +1027,31 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			break;
 		}
 	}
+	if (mhdr.pos != UINT16_MAX) {
+		uint32_t flags;
+		uint32_t bulk_size;
+		size_t mhdr_len;
+
+		acts->mhdr = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*acts->mhdr),
+					 0, SOCKET_ID_ANY);
+		if (!acts->mhdr)
+			goto err;
+		rte_memcpy(acts->mhdr, &mhdr, sizeof(*acts->mhdr));
+		mhdr_len = sizeof(struct mlx5_modification_cmd) * acts->mhdr->mhdr_cmds_num;
+		flags = mlx5_hw_act_flag[!!attr->group][type];
+		if (acts->mhdr->shared) {
+			flags |= MLX5DR_ACTION_FLAG_SHARED;
+			bulk_size = 0;
+		} else {
+			bulk_size = rte_log2_u32(table_attr->nb_flows);
+		}
+		acts->mhdr->action = mlx5dr_action_create_modify_header
+				(priv->dr_ctx, mhdr_len, (__be64 *)acts->mhdr->mhdr_cmds,
+				 bulk_size, flags);
+		if (!acts->mhdr->action)
+			goto err;
+		acts->rule_acts[acts->mhdr->pos].action = acts->mhdr->action;
+	}
 	if (reformat_pos != MLX5_HW_MAX_ACTS) {
 		uint8_t buf[MLX5_ENCAP_MAX_LEN];
 		bool shared_rfmt = true;
@@ -884,6 +1215,110 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev,
 	return 0;
 }
 
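+/* Check whether a modify header command is a NOP. */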
+static __rte_always_inline int
+flow_hw_mhdr_cmd_is_nop(const struct mlx5_modification_cmd *cmd)
+{
+	struct mlx5_modification_cmd cmd_he = {
+		.data0 = rte_be_to_cpu_32(cmd->data0),
+		.data1 = 0,
+	};
+
+	return cmd_he.action_type == MLX5_MODIFICATION_TYPE_NOP;
+}
+
+/**
+ * Construct the modify header commands for a single flow rule.
+ *
+ * For a non-shared modify_field action, the immediate value supplied with
+ * the rte_flow action at flow creation time is written into the command
+ * range that was pre-compiled from the actions template and copied into
+ * the job descriptor.
+ *
+ * @param[in] job
+ *   Pointer to job descriptor.
+ * @param[in] act_data
+ *   Pointer to the action construct data of the modify_field action.
+ * @param[in] hw_acts
+ *   Pointer to translated actions from template.
+ * @param[in] action
+ *   Pointer to the rte_flow modify_field action to apply.
+ *
+ * @return
+ *    0 on success, negative value otherwise.
+ */
+static __rte_always_inline int
+flow_hw_modify_field_construct(struct mlx5_hw_q_job *job,
+			       struct mlx5_action_construct_data *act_data,
+			       const struct mlx5_hw_actions *hw_acts,
+			       const struct rte_flow_action *action)
+{
+	const struct rte_flow_action_modify_field *mhdr_action = action->conf;
+	uint8_t values[16] = { 0 };
+	unaligned_uint32_t *value_p;
+	uint32_t i;
+	struct field_modify_info *field;
+
+	if (!hw_acts->mhdr)
+		return -1;
+	if (hw_acts->mhdr->shared || act_data->modify_header.shared)
+		return 0;
+	MLX5_ASSERT(mhdr_action->operation == RTE_FLOW_MODIFY_SET ||
+		    mhdr_action->operation == RTE_FLOW_MODIFY_ADD);
+	if (mhdr_action->src.field != RTE_FLOW_FIELD_VALUE &&
+	    mhdr_action->src.field != RTE_FLOW_FIELD_POINTER)
+		return 0;
+	if (mhdr_action->src.field == RTE_FLOW_FIELD_VALUE)
+		rte_memcpy(values, &mhdr_action->src.value, sizeof(values));
+	else
+		rte_memcpy(values, mhdr_action->src.pvalue, sizeof(values));
+	if (mhdr_action->dst.field == RTE_FLOW_FIELD_META ||
+	    mhdr_action->dst.field == RTE_FLOW_FIELD_TAG) {
+		value_p = (unaligned_uint32_t *)values;
+		*value_p = rte_cpu_to_be_32(*value_p);
+	} else if (mhdr_action->dst.field == RTE_FLOW_FIELD_GTP_PSC_QFI) {
+		uint32_t tmp;
+
+		/*
+		 * QFI is passed as a uint8_t integer, but it is accessed through the
+		 * second least significant byte of a 32-bit modify header field.
+		 */
+		tmp = values[0];
+		value_p = (unaligned_uint32_t *)values;
+		*value_p = rte_cpu_to_be_32(tmp << 8);
+	}
+	i = act_data->modify_header.mhdr_cmds_off;
+	field = act_data->modify_header.field;
+	do {
+		uint32_t off_b;
+		uint32_t mask;
+		uint32_t data;
+		const uint8_t *mask_src;
+
+		if (i >= act_data->modify_header.mhdr_cmds_end)
+			return -1;
+		if (flow_hw_mhdr_cmd_is_nop(&job->mhdr_cmd[i])) {
+			++i;
+			continue;
+		}
+		mask_src = (const uint8_t *)act_data->modify_header.mask;
+		mask = flow_dv_fetch_field(mask_src + field->offset, field->size);
+		if (!mask) {
+			++field;
+			continue;
+		}
+		off_b = rte_bsf32(mask);
+		data = flow_dv_fetch_field(values + field->offset, field->size);
+		data = (data & mask) >> off_b;
+		job->mhdr_cmd[i++].data1 = rte_cpu_to_be_32(data);
+		++field;
+	} while (field->size);
+	return 0;
+}
+
 /**
  * Construct flow action array.
  *
@@ -928,6 +1363,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	};
 	uint32_t ft_flag;
 	size_t encap_len = 0;
+	int ret;
 
 	memcpy(rule_acts, hw_acts->rule_acts,
 	       sizeof(*rule_acts) * hw_acts->acts_num);
@@ -945,6 +1381,18 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	} else {
 		attr.ingress = 1;
 	}
+	if (hw_acts->mhdr && hw_acts->mhdr->mhdr_cmds_num > 0) {
+		uint16_t pos = hw_acts->mhdr->pos;
+
+		if (!hw_acts->mhdr->shared) {
+			rule_acts[pos].modify_header.offset =
+						job->flow->idx - 1;
+			rule_acts[pos].modify_header.data =
+						(uint8_t *)job->mhdr_cmd;
+			rte_memcpy(job->mhdr_cmd, hw_acts->mhdr->mhdr_cmds,
+				   sizeof(*job->mhdr_cmd) * hw_acts->mhdr->mhdr_cmds_num);
+		}
+	}
 	LIST_FOREACH(act_data, &hw_acts->act_list, next) {
 		uint32_t jump_group;
 		uint32_t tag;
@@ -1020,6 +1468,14 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			MLX5_ASSERT(raw_encap_data->size ==
 				    act_data->encap.len);
 			break;
+		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
+			ret = flow_hw_modify_field_construct(job,
+							     act_data,
+							     hw_acts,
+							     action);
+			if (ret)
+				return -1;
+			break;
 		default:
 			break;
 		}
@@ -1609,6 +2065,155 @@ flow_hw_table_destroy(struct rte_eth_dev *dev,
 	return 0;
 }
 
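+/* Check whether a modify_field action touches the given field ID. */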
+static bool
+flow_hw_modify_field_is_used(const struct rte_flow_action_modify_field *action,
+			     enum rte_flow_field_id field)
+{
+	return action->src.field == field || action->dst.field == field;
+}
+
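+/*
+ * Validate a modify_field action against its template mask: the operation,
+ * the fields and the width must be fully masked, and unsupported fields
+ * are rejected.
+ */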
+static int
+flow_hw_validate_action_modify_field(const struct rte_flow_action *action,
+				     const struct rte_flow_action *mask,
+				     struct rte_flow_error *error)
+{
+	const struct rte_flow_action_modify_field *action_conf =
+		action->conf;
+	const struct rte_flow_action_modify_field *mask_conf =
+		mask->conf;
+
+	if (action_conf->operation != mask_conf->operation)
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"modify_field operation mask and template are not equal");
+	if (action_conf->dst.field != mask_conf->dst.field)
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"destination field mask and template are not equal");
+	if (action_conf->dst.field == RTE_FLOW_FIELD_POINTER ||
+	    action_conf->dst.field == RTE_FLOW_FIELD_VALUE)
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"immediate value and pointer cannot be used as destination");
+	if (mask_conf->dst.level != UINT32_MAX)
+		return rte_flow_error_set(error, EINVAL,
+			RTE_FLOW_ERROR_TYPE_ACTION, action,
+			"destination encapsulation level must be fully masked");
+	if (mask_conf->dst.offset != UINT32_MAX)
+		return rte_flow_error_set(error, EINVAL,
+			RTE_FLOW_ERROR_TYPE_ACTION, action,
+			"destination offset must be fully masked");
+	if (action_conf->src.field != mask_conf->src.field)
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"source field mask and template are not equal");
+	if (action_conf->src.field != RTE_FLOW_FIELD_POINTER &&
+	    action_conf->src.field != RTE_FLOW_FIELD_VALUE) {
+		if (mask_conf->src.level != UINT32_MAX)
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"source encapsulation level must be fully masked");
+		if (mask_conf->src.offset != UINT32_MAX)
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"source offset must be fully masked");
+	}
+	if (mask_conf->width != UINT32_MAX)
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"modify_field width field must be fully masked");
+	if (flow_hw_modify_field_is_used(action_conf, RTE_FLOW_FIELD_START))
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"modifying arbitrary place in a packet is not supported");
+	if (flow_hw_modify_field_is_used(action_conf, RTE_FLOW_FIELD_VLAN_TYPE))
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"modifying vlan_type is not supported");
+	if (flow_hw_modify_field_is_used(action_conf, RTE_FLOW_FIELD_GENEVE_VNI))
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"modifying Geneve VNI is not supported");
+	return 0;
+}
+
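+/* Validate the actions of an actions template against their masks. */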
+static int
+flow_hw_action_validate(const struct rte_flow_action actions[],
+			const struct rte_flow_action masks[],
+			struct rte_flow_error *error)
+{
+	int i;
+	bool actions_end = false;
+	int ret;
+
+	for (i = 0; !actions_end; ++i) {
+		const struct rte_flow_action *action = &actions[i];
+		const struct rte_flow_action *mask = &masks[i];
+
+		if (action->type != mask->type)
+			return rte_flow_error_set(error, ENOTSUP,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  action,
+						  "mask type does not match action type");
+		switch (action->type) {
+		case RTE_FLOW_ACTION_TYPE_VOID:
+			break;
+		case RTE_FLOW_ACTION_TYPE_INDIRECT:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_MARK:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_DROP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_JUMP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_QUEUE:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_RSS:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_VXLAN_DECAP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_NVGRE_DECAP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_DECAP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
+			ret = flow_hw_validate_action_modify_field(action,
+									mask,
+									error);
+			if (ret < 0)
+				return ret;
+			break;
+		case RTE_FLOW_ACTION_TYPE_END:
+			actions_end = true;
+			break;
+		default:
+			return rte_flow_error_set(error, ENOTSUP,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  action,
+						  "action not supported in template API");
+		}
+	}
+	return 0;
+}
+
 /**
  * Create flow action template.
  *
@@ -1637,6 +2242,8 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 	int len, act_len, mask_len, i;
 	struct rte_flow_actions_template *at;
 
+	if (flow_hw_action_validate(actions, masks, error))
+		return NULL;
 	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS,
 				NULL, 0, actions, error);
 	if (act_len <= 0)
@@ -2093,6 +2700,8 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		}
 		mem_size += (sizeof(struct mlx5_hw_q_job *) +
 			    sizeof(uint8_t) * MLX5_ENCAP_MAX_LEN +
+			    sizeof(struct mlx5_modification_cmd) *
+			    MLX5_MHDR_MAX_CMD +
 			    sizeof(struct mlx5_hw_q_job)) *
 			    queue_attr[0]->size;
 	}
@@ -2104,6 +2713,7 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	}
 	for (i = 0; i < nb_queue; i++) {
 		uint8_t *encap = NULL;
+		struct mlx5_modification_cmd *mhdr_cmd = NULL;
 
 		priv->hw_q[i].job_idx = queue_attr[i]->size;
 		priv->hw_q[i].size = queue_attr[i]->size;
@@ -2115,8 +2725,10 @@ flow_hw_configure(struct rte_eth_dev *dev,
 					    &job[queue_attr[i - 1]->size];
 		job = (struct mlx5_hw_q_job *)
 		      &priv->hw_q[i].job[queue_attr[i]->size];
-		encap = (uint8_t *)&job[queue_attr[i]->size];
+		mhdr_cmd = (struct mlx5_modification_cmd *)&job[queue_attr[i]->size];
+		encap = (uint8_t *)&mhdr_cmd[queue_attr[i]->size * MLX5_MHDR_MAX_CMD];
 		for (j = 0; j < queue_attr[i]->size; j++) {
+			job[j].mhdr_cmd = &mhdr_cmd[j * MLX5_MHDR_MAX_CMD];
 			job[j].encap_data = &encap[j * MLX5_ENCAP_MAX_LEN];
 			priv->hw_q[i].job[j] = &job[j];
 		}
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v4 05/18] net/mlx5: add HW steering port action
  2022-10-19 16:25 ` [PATCH v4 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (3 preceding siblings ...)
  2022-10-19 16:25   ` [PATCH v4 04/18] net/mlx5: add modify field hws support Suanming Mou
@ 2022-10-19 16:25   ` Suanming Mou
  2022-10-19 16:25   ` [PATCH v4 06/18] net/mlx5: add extended metadata mode for hardware steering Suanming Mou
                     ` (12 subsequent siblings)
  17 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-10-19 16:25 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Dariusz Sosnowski

From: Dariusz Sosnowski <dsosnowski@nvidia.com>

This patch implements creating and caching of port actions for use with
HW Steering FDB flows.

Actions are created on flow template API configuration, and only on the
port designated as the master. Attaching or detaching a port in the same
switching domain updates the port actions cache by creating or
destroying the corresponding actions.
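
For illustration only (not part of this patch): a minimal sketch of an
actions template an application might create on the transfer proxy port,
forwarding packets through REPRESENTED_PORT backed by the cached per-port
actions added here. The helper name and the unmasked port_id convention
are assumptions, not code from this series.

#include <rte_flow.h>

/* Hypothetical helper; the actual port_id is supplied per flow rule. */
static struct rte_flow_actions_template *
fwd_to_port_template(uint16_t proxy_port_id, struct rte_flow_error *err)
{
	static const struct rte_flow_actions_template_attr attr = {
		.transfer = 1,
	};
	static const struct rte_flow_action_ethdev port = { .port_id = 0 };
	static const struct rte_flow_action_ethdev port_mask = { .port_id = 0 };
	const struct rte_flow_action actions[] = {
		{
			.type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT,
			.conf = &port,
		},
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	const struct rte_flow_action masks[] = {
		{
			.type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT,
			.conf = &port_mask,
		},
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};

	return rte_flow_actions_template_create(proxy_port_id, &attr,
						actions, masks, err);
}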

A new devarg fdb_def_rule_en is added to control whether the PMD
implicitly creates the default dedicated E-Switch rule; the PMD sets
this value to 1 by default.
If set to 0, the default E-Switch rule is not created and the user can
create a specific E-Switch rule on the root table if needed.
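
A hedged sketch (not part of this patch) of disabling the implicit FDB
default rule from an application via devargs; the PCI address and the
argument layout are assumptions for illustration only:

#include <rte_common.h>
#include <rte_eal.h>

int
main(int argc, char **argv)
{
	char *eal_args[] = {
		argv[0],
		"-a", "0000:08:00.0,dv_flow_en=2,fdb_def_rule_en=0",
	};

	(void)argc;
	if (rte_eal_init((int)RTE_DIM(eal_args), eal_args) < 0)
		return -1;
	/* E-Switch root-table rules can now be created explicitly. */
	return 0;
}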

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 doc/guides/nics/mlx5.rst           |    9 +
 drivers/net/mlx5/linux/mlx5_os.c   |   16 +-
 drivers/net/mlx5/mlx5.c            |   14 +
 drivers/net/mlx5/mlx5.h            |   26 +-
 drivers/net/mlx5/mlx5_flow.c       |   96 +-
 drivers/net/mlx5/mlx5_flow.h       |   22 +-
 drivers/net/mlx5/mlx5_flow_dv.c    |   93 +-
 drivers/net/mlx5/mlx5_flow_hw.c    | 1356 +++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_flow_verbs.c |    4 +-
 drivers/net/mlx5/mlx5_trigger.c    |   77 +-
 10 files changed, 1595 insertions(+), 118 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 303eb17714..7d2095f075 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -1132,6 +1132,15 @@ for an additional list of options shared with other mlx5 drivers.
 
   By default, the PMD will set this value to 1.
 
+- ``fdb_def_rule_en`` parameter [int]
+
+  A non-zero value allows the PMD to create a dedicated rule on the E-Switch
+  root table. This rule forwards all incoming packets into table 1, and other
+  rules are then created at the original E-Switch table level plus one, which
+  improves the flow insertion rate by skipping the firmware-managed root table.
+  If set to 0, all rules are created at the original E-Switch table level.
+
+  By default, the PMD will set this value to 1.
 
 Supported NICs
 --------------
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index b7cc11a2ef..d674b54624 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1551,11 +1551,18 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	}
 	rte_rwlock_init(&priv->ind_tbls_lock);
 	if (priv->sh->config.dv_flow_en == 2) {
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+#ifdef HAVE_MLX5_HWS_SUPPORT
 		if (priv->vport_meta_mask)
 			flow_hw_set_port_info(eth_dev);
 		/* Only HWS requires this information. */
 		flow_hw_init_tags_set(eth_dev);
+		if (priv->sh->config.dv_esw_en &&
+		    flow_hw_create_vport_action(eth_dev)) {
+			DRV_LOG(ERR, "port %u failed to create vport action",
+				eth_dev->data->port_id);
+			err = EINVAL;
+			goto error;
+		}
 		return eth_dev;
 #else
 		DRV_LOG(ERR, "DV support is missing for HWS.");
@@ -1620,6 +1627,13 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	return eth_dev;
 error:
 	if (priv) {
+#ifdef HAVE_MLX5_HWS_SUPPORT
+		if (eth_dev &&
+		    priv->sh &&
+		    priv->sh->config.dv_flow_en == 2 &&
+		    priv->sh->config.dv_esw_en)
+			flow_hw_destroy_vport_action(eth_dev);
+#endif
 		if (priv->mreg_cp_tbl)
 			mlx5_hlist_destroy(priv->mreg_cp_tbl);
 		if (priv->sh)
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index a34fbcf74d..470b9c2d0f 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -172,6 +172,9 @@
 /* Device parameter to configure the delay drop when creating Rxqs. */
 #define MLX5_DELAY_DROP "delay_drop"
 
+/* Device parameter to create the fdb default rule in PMD */
+#define MLX5_FDB_DEFAULT_RULE_EN "fdb_def_rule_en"
+
 /* Shared memory between primary and secondary processes. */
 struct mlx5_shared_data *mlx5_shared_data;
 
@@ -1239,6 +1242,8 @@ mlx5_dev_args_check_handler(const char *key, const char *val, void *opaque)
 		config->decap_en = !!tmp;
 	} else if (strcmp(MLX5_ALLOW_DUPLICATE_PATTERN, key) == 0) {
 		config->allow_duplicate_pattern = !!tmp;
+	} else if (strcmp(MLX5_FDB_DEFAULT_RULE_EN, key) == 0) {
+		config->fdb_def_rule = !!tmp;
 	}
 	return 0;
 }
@@ -1274,6 +1279,7 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 		MLX5_RECLAIM_MEM,
 		MLX5_DECAP_EN,
 		MLX5_ALLOW_DUPLICATE_PATTERN,
+		MLX5_FDB_DEFAULT_RULE_EN,
 		NULL,
 	};
 	int ret = 0;
@@ -1285,6 +1291,7 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 	config->dv_flow_en = 1;
 	config->decap_en = 1;
 	config->allow_duplicate_pattern = 1;
+	config->fdb_def_rule = 1;
 	if (mkvlist != NULL) {
 		/* Process parameters. */
 		ret = mlx5_kvargs_process(mkvlist, params,
@@ -1360,6 +1367,7 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 	DRV_LOG(DEBUG, "\"decap_en\" is %u.", config->decap_en);
 	DRV_LOG(DEBUG, "\"allow_duplicate_pattern\" is %u.",
 		config->allow_duplicate_pattern);
+	DRV_LOG(DEBUG, "\"fdb_def_rule_en\" is %u.", config->fdb_def_rule);
 	return 0;
 }
 
@@ -1943,6 +1951,7 @@ mlx5_dev_close(struct rte_eth_dev *dev)
 	mlx5_flex_parser_ecpri_release(dev);
 	mlx5_flex_item_port_cleanup(dev);
 #ifdef HAVE_MLX5_HWS_SUPPORT
+	flow_hw_destroy_vport_action(dev);
 	flow_hw_resource_release(dev);
 	flow_hw_clear_port_info(dev);
 	if (priv->sh->config.dv_flow_en == 2)
@@ -2644,6 +2653,11 @@ mlx5_probe_again_args_validate(struct mlx5_common_device *cdev,
 			sh->ibdev_name);
 		goto error;
 	}
+	if (sh->config.fdb_def_rule ^ config->fdb_def_rule) {
+		DRV_LOG(ERR, "\"fdb_def_rule_en\" configuration mismatch for shared %s context.",
+			sh->ibdev_name);
+		goto error;
+	}
 	if (sh->config.l3_vxlan_en ^ config->l3_vxlan_en) {
 		DRV_LOG(ERR, "\"l3_vxlan_en\" "
 			"configuration mismatch for shared %s context.",
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 6f75a32488..69a0a60030 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -314,6 +314,7 @@ struct mlx5_sh_config {
 	uint32_t allow_duplicate_pattern:1;
 	uint32_t lro_allowed:1; /* Whether LRO is allowed. */
 	/* Allow/Prevent the duplicate rules pattern. */
+	uint32_t fdb_def_rule:1; /* Create FDB default jump rule */
 };
 
 
@@ -342,6 +343,8 @@ enum {
 	MLX5_HW_Q_JOB_TYPE_DESTROY, /* Flow destroy job type. */
 };
 
+#define MLX5_HW_MAX_ITEMS (16)
+
 /* HW steering flow management job descriptor. */
 struct mlx5_hw_q_job {
 	uint32_t type; /* Job type. */
@@ -349,6 +352,8 @@ struct mlx5_hw_q_job {
 	void *user_data; /* Job user data. */
 	uint8_t *encap_data; /* Encap data. */
 	struct mlx5_modification_cmd *mhdr_cmd;
+	struct rte_flow_item *items;
+	struct rte_flow_item_ethdev port_spec;
 };
 
 /* HW steering job descriptor LIFO pool. */
@@ -1207,6 +1212,8 @@ struct mlx5_dev_ctx_shared {
 	uint32_t flow_priority_check_flag:1; /* Check Flag for flow priority. */
 	uint32_t metadata_regc_check_flag:1; /* Check Flag for metadata REGC. */
 	uint32_t hws_tags:1; /* Check if tags info for HWS initialized. */
+	uint32_t shared_mark_enabled:1;
+	/* If mark action is enabled on Rxqs (shared E-Switch domain). */
 	uint32_t max_port; /* Maximal IB device port index. */
 	struct mlx5_bond_info bond; /* Bonding information. */
 	struct mlx5_common_device *cdev; /* Backend mlx5 device. */
@@ -1457,6 +1464,12 @@ struct mlx5_obj_ops {
 
 #define MLX5_RSS_HASH_FIELDS_LEN RTE_DIM(mlx5_rss_hash_fields)
 
+struct mlx5_hw_ctrl_flow {
+	LIST_ENTRY(mlx5_hw_ctrl_flow) next;
+	struct rte_eth_dev *owner_dev;
+	struct rte_flow *flow;
+};
+
 struct mlx5_priv {
 	struct rte_eth_dev_data *dev_data;  /* Pointer to device data. */
 	struct mlx5_dev_ctx_shared *sh; /* Shared device context. */
@@ -1497,6 +1510,11 @@ struct mlx5_priv {
 	unsigned int reta_idx_n; /* RETA index size. */
 	struct mlx5_drop drop_queue; /* Flow drop queues. */
 	void *root_drop_action; /* Pointer to root drop action. */
+	rte_spinlock_t hw_ctrl_lock;
+	LIST_HEAD(hw_ctrl_flow, mlx5_hw_ctrl_flow) hw_ctrl_flows;
+	struct rte_flow_template_table *hw_esw_sq_miss_root_tbl;
+	struct rte_flow_template_table *hw_esw_sq_miss_tbl;
+	struct rte_flow_template_table *hw_esw_zero_tbl;
 	struct mlx5_indexed_pool *flows[MLX5_FLOW_TYPE_MAXI];
 	/* RTE Flow rules. */
 	uint32_t ctrl_flows; /* Control flow rules. */
@@ -1557,11 +1575,11 @@ struct mlx5_priv {
 	struct mlx5_hw_q *hw_q;
 	/* HW steering rte flow table list header. */
 	LIST_HEAD(flow_hw_tbl, rte_flow_template_table) flow_hw_tbl;
+	struct mlx5dr_action **hw_vport;
 	/* HW steering global drop action. */
-	struct mlx5dr_action *hw_drop[MLX5_HW_ACTION_FLAG_MAX]
-				     [MLX5DR_TABLE_TYPE_MAX];
-	/* HW steering global drop action. */
-	struct mlx5dr_action *hw_tag[MLX5_HW_ACTION_FLAG_MAX];
+	struct mlx5dr_action *hw_drop[2];
+	/* HW steering global tag action. */
+	struct mlx5dr_action *hw_tag[2];
 	struct mlx5_indexed_pool *acts_ipool; /* Action data indexed pool. */
 #endif
 };
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index f36e72fb89..60f76f5a43 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -1001,6 +1001,7 @@ static const struct rte_flow_ops mlx5_flow_ops = {
 	.flex_item_create = mlx5_flow_flex_item_create,
 	.flex_item_release = mlx5_flow_flex_item_release,
 	.info_get = mlx5_flow_info_get,
+	.pick_transfer_proxy = mlx5_flow_pick_transfer_proxy,
 	.configure = mlx5_flow_port_configure,
 	.pattern_template_create = mlx5_flow_pattern_template_create,
 	.pattern_template_destroy = mlx5_flow_pattern_template_destroy,
@@ -1244,7 +1245,7 @@ mlx5_get_lowest_priority(struct rte_eth_dev *dev,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 
-	if (!attr->group && !attr->transfer)
+	if (!attr->group && !(attr->transfer && priv->fdb_def_rule))
 		return priv->sh->flow_max_priority - 2;
 	return MLX5_NON_ROOT_FLOW_MAX_PRIO - 1;
 }
@@ -1271,11 +1272,14 @@ mlx5_get_matcher_priority(struct rte_eth_dev *dev,
 	uint16_t priority = (uint16_t)attr->priority;
 	struct mlx5_priv *priv = dev->data->dev_private;
 
+	/* NIC root rules */
 	if (!attr->group && !attr->transfer) {
 		if (attr->priority == MLX5_FLOW_LOWEST_PRIO_INDICATOR)
 			priority = priv->sh->flow_max_priority - 1;
 		return mlx5_os_flow_adjust_priority(dev, priority, subpriority);
-	} else if (!external && attr->transfer && attr->group == 0 &&
+	/* FDB root rules */
+	} else if (attr->transfer && (!external || !priv->fdb_def_rule) &&
+		   attr->group == 0 &&
 		   attr->priority == MLX5_FLOW_LOWEST_PRIO_INDICATOR) {
 		return (priv->sh->flow_max_priority - 1) * 3;
 	}
@@ -1483,13 +1487,32 @@ flow_rxq_mark_flag_set(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_rxq_ctrl *rxq_ctrl;
+	uint16_t port_id;
 
-	if (priv->mark_enabled)
+	if (priv->sh->shared_mark_enabled)
 		return;
-	LIST_FOREACH(rxq_ctrl, &priv->rxqsctrl, next) {
-		rxq_ctrl->rxq.mark = 1;
+	if (priv->master || priv->representor) {
+		MLX5_ETH_FOREACH_DEV(port_id, dev->device) {
+			struct mlx5_priv *opriv =
+				rte_eth_devices[port_id].data->dev_private;
+
+			if (!opriv ||
+			    opriv->sh != priv->sh ||
+			    opriv->domain_id != priv->domain_id ||
+			    opriv->mark_enabled)
+				continue;
+			LIST_FOREACH(rxq_ctrl, &opriv->rxqsctrl, next) {
+				rxq_ctrl->rxq.mark = 1;
+			}
+			opriv->mark_enabled = 1;
+		}
+	} else {
+		LIST_FOREACH(rxq_ctrl, &priv->rxqsctrl, next) {
+			rxq_ctrl->rxq.mark = 1;
+		}
+		priv->mark_enabled = 1;
 	}
-	priv->mark_enabled = 1;
+	priv->sh->shared_mark_enabled = 1;
 }
 
 /**
@@ -1625,6 +1648,7 @@ flow_rxq_flags_clear(struct rte_eth_dev *dev)
 		rxq->ctrl->rxq.tunnel = 0;
 	}
 	priv->mark_enabled = 0;
+	priv->sh->shared_mark_enabled = 0;
 }
 
 /**
@@ -2810,8 +2834,8 @@ mlx5_flow_validate_item_tcp(const struct rte_flow_item *item,
  *   Item specification.
  * @param[in] item_flags
  *   Bit-fields that holds the items detected until now.
- * @param[in] attr
- *   Flow rule attributes.
+ * @param root
+ *   Whether action is on root table.
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -2823,7 +2847,7 @@ mlx5_flow_validate_item_vxlan(struct rte_eth_dev *dev,
 			      uint16_t udp_dport,
 			      const struct rte_flow_item *item,
 			      uint64_t item_flags,
-			      const struct rte_flow_attr *attr,
+			      bool root,
 			      struct rte_flow_error *error)
 {
 	const struct rte_flow_item_vxlan *spec = item->spec;
@@ -2860,12 +2884,11 @@ mlx5_flow_validate_item_vxlan(struct rte_eth_dev *dev,
 	if (priv->sh->steering_format_version !=
 	    MLX5_STEERING_LOGIC_FORMAT_CONNECTX_5 ||
 	    !udp_dport || udp_dport == MLX5_UDP_PORT_VXLAN) {
-		/* FDB domain & NIC domain non-zero group */
-		if ((attr->transfer || attr->group) && priv->sh->misc5_cap)
+		/* non-root table */
+		if (!root && priv->sh->misc5_cap)
 			valid_mask = &nic_mask;
 		/* Group zero in NIC domain */
-		if (!attr->group && !attr->transfer &&
-		    priv->sh->tunnel_header_0_1)
+		if (!root && priv->sh->tunnel_header_0_1)
 			valid_mask = &nic_mask;
 	}
 	ret = mlx5_flow_item_acceptable
@@ -3104,11 +3127,11 @@ mlx5_flow_validate_item_gre_option(struct rte_eth_dev *dev,
 	if (mask->checksum_rsvd.checksum || mask->sequence.sequence) {
 		if (priv->sh->steering_format_version ==
 		    MLX5_STEERING_LOGIC_FORMAT_CONNECTX_5 ||
-		    ((attr->group || attr->transfer) &&
+		    ((attr->group || (attr->transfer && priv->fdb_def_rule)) &&
 		     !priv->sh->misc5_cap) ||
 		    (!(priv->sh->tunnel_header_0_1 &&
 		       priv->sh->tunnel_header_2_3) &&
-		    !attr->group && !attr->transfer))
+		    !attr->group && (!attr->transfer || !priv->fdb_def_rule)))
 			return rte_flow_error_set(error, EINVAL,
 						  RTE_FLOW_ERROR_TYPE_ITEM,
 						  item,
@@ -6165,7 +6188,8 @@ flow_create_split_metadata(struct rte_eth_dev *dev,
 	}
 	if (qrss) {
 		/* Check if it is in meter suffix table. */
-		mtr_sfx = attr->group == (attr->transfer ?
+		mtr_sfx = attr->group ==
+			  ((attr->transfer && priv->fdb_def_rule) ?
 			  (MLX5_FLOW_TABLE_LEVEL_METER - 1) :
 			  MLX5_FLOW_TABLE_LEVEL_METER);
 		/*
@@ -11088,3 +11112,43 @@ int mlx5_flow_get_item_vport_id(struct rte_eth_dev *dev,
 
 	return 0;
 }
+
+int
+mlx5_flow_pick_transfer_proxy(struct rte_eth_dev *dev,
+			      uint16_t *proxy_port_id,
+			      struct rte_flow_error *error)
+{
+	const struct mlx5_priv *priv = dev->data->dev_private;
+	uint16_t port_id;
+
+	if (!priv->sh->config.dv_esw_en)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL,
+					  "unable to provide a proxy port"
+					  " without E-Switch configured");
+	if (!priv->master && !priv->representor)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL,
+					  "unable to provide a proxy port"
+					  " for port which is not a master"
+					  " or a representor port");
+	if (priv->master) {
+		*proxy_port_id = dev->data->port_id;
+		return 0;
+	}
+	MLX5_ETH_FOREACH_DEV(port_id, dev->device) {
+		const struct rte_eth_dev *port_dev = &rte_eth_devices[port_id];
+		const struct mlx5_priv *port_priv = port_dev->data->dev_private;
+
+		if (port_priv->master &&
+		    port_priv->domain_id == priv->domain_id) {
+			*proxy_port_id = port_id;
+			return 0;
+		}
+	}
+	return rte_flow_error_set(error, EINVAL,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "unable to find a proxy port");
+}
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 407b3d79bd..25b44ccca2 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1162,6 +1162,11 @@ struct rte_flow_pattern_template {
 	struct mlx5dr_match_template *mt; /* mlx5 match template. */
 	uint64_t item_flags; /* Item layer flags. */
 	uint32_t refcnt;  /* Reference counter. */
+	/*
+	 * If true, then rule pattern should be prepended with
+	 * represented_port pattern item.
+	 */
+	bool implicit_port;
 };
 
 /* Flow action template struct. */
@@ -1237,6 +1242,7 @@ struct mlx5_hw_action_template {
 /* mlx5 flow group struct. */
 struct mlx5_flow_group {
 	struct mlx5_list_entry entry;
+	struct rte_eth_dev *dev; /* Reference to corresponding device. */
 	struct mlx5dr_table *tbl; /* HWS table object. */
 	struct mlx5_hw_jump_action jump; /* Jump action. */
 	enum mlx5dr_table_type type; /* Table type. */
@@ -1495,6 +1501,9 @@ void flow_hw_clear_port_info(struct rte_eth_dev *dev);
 void flow_hw_init_tags_set(struct rte_eth_dev *dev);
 void flow_hw_clear_tags_set(struct rte_eth_dev *dev);
 
+int flow_hw_create_vport_action(struct rte_eth_dev *dev);
+void flow_hw_destroy_vport_action(struct rte_eth_dev *dev);
+
 typedef int (*mlx5_flow_validate_t)(struct rte_eth_dev *dev,
 				    const struct rte_flow_attr *attr,
 				    const struct rte_flow_item items[],
@@ -2093,7 +2102,7 @@ int mlx5_flow_validate_item_vxlan(struct rte_eth_dev *dev,
 				  uint16_t udp_dport,
 				  const struct rte_flow_item *item,
 				  uint64_t item_flags,
-				  const struct rte_flow_attr *attr,
+				  bool root,
 				  struct rte_flow_error *error);
 int mlx5_flow_validate_item_vxlan_gpe(const struct rte_flow_item *item,
 				      uint64_t item_flags,
@@ -2350,4 +2359,15 @@ int flow_dv_translate_items_hws(const struct rte_flow_item *items,
 				uint32_t key_type, uint64_t *item_flags,
 				uint8_t *match_criteria,
 				struct rte_flow_error *error);
+
+int mlx5_flow_pick_transfer_proxy(struct rte_eth_dev *dev,
+				  uint16_t *proxy_port_id,
+				  struct rte_flow_error *error);
+
+int mlx5_flow_hw_flush_ctrl_flows(struct rte_eth_dev *dev);
+
+int mlx5_flow_hw_esw_create_mgr_sq_miss_flow(struct rte_eth_dev *dev);
+int mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev,
+					 uint32_t txq);
+int mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev);
 #endif /* RTE_PMD_MLX5_FLOW_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 9fbaa4bfe8..1ee26be975 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -2446,8 +2446,8 @@ flow_dv_validate_item_gtp(struct rte_eth_dev *dev,
  *   Previous validated item in the pattern items.
  * @param[in] gtp_item
  *   Previous GTP item specification.
- * @param[in] attr
- *   Pointer to flow attributes.
+ * @param root
+ *   Whether action is on root table.
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -2458,7 +2458,7 @@ static int
 flow_dv_validate_item_gtp_psc(const struct rte_flow_item *item,
 			      uint64_t last_item,
 			      const struct rte_flow_item *gtp_item,
-			      const struct rte_flow_attr *attr,
+			      bool root,
 			      struct rte_flow_error *error)
 {
 	const struct rte_flow_item_gtp *gtp_spec;
@@ -2483,7 +2483,7 @@ flow_dv_validate_item_gtp_psc(const struct rte_flow_item *item,
 			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM, item,
 			 "GTP E flag must be 1 to match GTP PSC");
 	/* Check the flow is not created in group zero. */
-	if (!attr->transfer && !attr->group)
+	if (root)
 		return rte_flow_error_set
 			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
 			 "GTP PSC is not supported for group 0");
@@ -3348,20 +3348,19 @@ flow_dv_validate_action_set_tag(struct rte_eth_dev *dev,
 /**
  * Indicates whether ASO aging is supported.
  *
- * @param[in] sh
- *   Pointer to shared device context structure.
- * @param[in] attr
- *   Attributes of flow that includes AGE action.
+ * @param[in] priv
+ *   Pointer to device private context structure.
+ * @param[in] root
+ *   Whether action is on root table.
  *
  * @return
  *   True when ASO aging is supported, false otherwise.
  */
 static inline bool
-flow_hit_aso_supported(const struct mlx5_dev_ctx_shared *sh,
-		const struct rte_flow_attr *attr)
+flow_hit_aso_supported(const struct mlx5_priv *priv, bool root)
 {
-	MLX5_ASSERT(sh && attr);
-	return (sh->flow_hit_aso_en && (attr->transfer || attr->group));
+	MLX5_ASSERT(priv);
+	return (priv->sh->flow_hit_aso_en && !root);
 }
 
 /**
@@ -3373,8 +3372,8 @@ flow_hit_aso_supported(const struct mlx5_dev_ctx_shared *sh,
  *   Indicator if action is shared.
  * @param[in] action_flags
  *   Holds the actions detected until now.
- * @param[in] attr
- *   Attributes of flow that includes this action.
+ * @param[in] root
+ *   Whether action is on root table.
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -3384,7 +3383,7 @@ flow_hit_aso_supported(const struct mlx5_dev_ctx_shared *sh,
 static int
 flow_dv_validate_action_count(struct rte_eth_dev *dev, bool shared,
 			      uint64_t action_flags,
-			      const struct rte_flow_attr *attr,
+			      bool root,
 			      struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -3396,7 +3395,7 @@ flow_dv_validate_action_count(struct rte_eth_dev *dev, bool shared,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "duplicate count actions set");
 	if (shared && (action_flags & MLX5_FLOW_ACTION_AGE) &&
-	    !flow_hit_aso_supported(priv->sh, attr))
+	    !flow_hit_aso_supported(priv, root))
 		return rte_flow_error_set(error, EINVAL,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "old age and indirect count combination is not supported");
@@ -3627,8 +3626,8 @@ flow_dv_validate_action_raw_encap_decap
  *   Holds the actions detected until now.
  * @param[in] item_flags
  *   The items found in this flow rule.
- * @param[in] attr
- *   Pointer to flow attributes.
+ * @param root
+ *   Whether action is on root table.
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -3639,12 +3638,12 @@ static int
 flow_dv_validate_action_aso_ct(struct rte_eth_dev *dev,
 			       uint64_t action_flags,
 			       uint64_t item_flags,
-			       const struct rte_flow_attr *attr,
+			       bool root,
 			       struct rte_flow_error *error)
 {
 	RTE_SET_USED(dev);
 
-	if (attr->group == 0 && !attr->transfer)
+	if (root)
 		return rte_flow_error_set(error, ENOTSUP,
 					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 					  NULL,
@@ -4894,6 +4893,8 @@ flow_dv_validate_action_modify_ttl(const uint64_t action_flags,
  *   Pointer to the modify action.
  * @param[in] attr
  *   Pointer to the flow attributes.
+ * @param root
+ *   Whether action is on root table.
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -4906,6 +4907,7 @@ flow_dv_validate_action_modify_field(struct rte_eth_dev *dev,
 				   const uint64_t action_flags,
 				   const struct rte_flow_action *action,
 				   const struct rte_flow_attr *attr,
+				   bool root,
 				   struct rte_flow_error *error)
 {
 	int ret = 0;
@@ -4953,7 +4955,7 @@ flow_dv_validate_action_modify_field(struct rte_eth_dev *dev,
 	}
 	if (action_modify_field->src.field != RTE_FLOW_FIELD_VALUE &&
 	    action_modify_field->src.field != RTE_FLOW_FIELD_POINTER) {
-		if (!attr->transfer && !attr->group)
+		if (root)
 			return rte_flow_error_set(error, ENOTSUP,
 					RTE_FLOW_ERROR_TYPE_ACTION, action,
 					"modify field action is not"
@@ -5043,8 +5045,7 @@ flow_dv_validate_action_modify_field(struct rte_eth_dev *dev,
 	    action_modify_field->src.field == RTE_FLOW_FIELD_IPV4_ECN ||
 	    action_modify_field->dst.field == RTE_FLOW_FIELD_IPV6_ECN ||
 	    action_modify_field->src.field == RTE_FLOW_FIELD_IPV6_ECN)
-		if (!hca_attr->modify_outer_ip_ecn &&
-		    !attr->transfer && !attr->group)
+		if (!hca_attr->modify_outer_ip_ecn && root)
 			return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_ACTION, action,
 				"modifications of the ECN for current firmware is not supported");
@@ -5078,11 +5079,12 @@ flow_dv_validate_action_jump(struct rte_eth_dev *dev,
 			     bool external, struct rte_flow_error *error)
 {
 	uint32_t target_group, table = 0;
+	struct mlx5_priv *priv = dev->data->dev_private;
 	int ret = 0;
 	struct flow_grp_info grp_info = {
 		.external = !!external,
 		.transfer = !!attributes->transfer,
-		.fdb_def_rule = 1,
+		.fdb_def_rule = !!priv->fdb_def_rule,
 		.std_tbl_fix = 0
 	};
 	if (action_flags & (MLX5_FLOW_FATE_ACTIONS |
@@ -5662,6 +5664,8 @@ flow_dv_modify_clone_free_cb(void *tool_ctx, struct mlx5_list_entry *entry)
  *   Pointer to the COUNT action in sample action list.
  * @param[out] fdb_mirror_limit
  *   Pointer to the FDB mirror limitation flag.
+ * @param root
+ *   Whether action is on root table.
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -5678,6 +5682,7 @@ flow_dv_validate_action_sample(uint64_t *action_flags,
 			       const struct rte_flow_action_rss **sample_rss,
 			       const struct rte_flow_action_count **count,
 			       int *fdb_mirror_limit,
+			       bool root,
 			       struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -5779,7 +5784,7 @@ flow_dv_validate_action_sample(uint64_t *action_flags,
 		case RTE_FLOW_ACTION_TYPE_COUNT:
 			ret = flow_dv_validate_action_count
 				(dev, false, *action_flags | sub_action_flags,
-				 attr, error);
+				 root, error);
 			if (ret < 0)
 				return ret;
 			*count = act->conf;
@@ -7259,7 +7264,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 		case RTE_FLOW_ITEM_TYPE_VXLAN:
 			ret = mlx5_flow_validate_item_vxlan(dev, udp_dport,
 							    items, item_flags,
-							    attr, error);
+							    is_root, error);
 			if (ret < 0)
 				return ret;
 			last_item = MLX5_FLOW_LAYER_VXLAN;
@@ -7353,7 +7358,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			break;
 		case RTE_FLOW_ITEM_TYPE_GTP_PSC:
 			ret = flow_dv_validate_item_gtp_psc(items, last_item,
-							    gtp_item, attr,
+							    gtp_item, is_root,
 							    error);
 			if (ret < 0)
 				return ret;
@@ -7570,7 +7575,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 		case RTE_FLOW_ACTION_TYPE_COUNT:
 			ret = flow_dv_validate_action_count(dev, shared_count,
 							    action_flags,
-							    attr, error);
+							    is_root, error);
 			if (ret < 0)
 				return ret;
 			count = actions->conf;
@@ -7864,7 +7869,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			rw_act_num += MLX5_ACT_NUM_SET_TAG;
 			break;
 		case MLX5_RTE_FLOW_ACTION_TYPE_AGE:
-			if (!attr->transfer && !attr->group)
+			if (is_root)
 				return rte_flow_error_set(error, ENOTSUP,
 						RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 									   NULL,
@@ -7889,7 +7894,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			 * Validate the regular AGE action (using counter)
 			 * mutual exclusion with indirect counter actions.
 			 */
-			if (!flow_hit_aso_supported(priv->sh, attr)) {
+			if (!flow_hit_aso_supported(priv, is_root)) {
 				if (shared_count)
 					return rte_flow_error_set
 						(error, EINVAL,
@@ -7945,6 +7950,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 							     rss, &sample_rss,
 							     &sample_count,
 							     &fdb_mirror_limit,
+							     is_root,
 							     error);
 			if (ret < 0)
 				return ret;
@@ -7961,6 +7967,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 								   action_flags,
 								   actions,
 								   attr,
+								   is_root,
 								   error);
 			if (ret < 0)
 				return ret;
@@ -7974,8 +7981,8 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			break;
 		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
 			ret = flow_dv_validate_action_aso_ct(dev, action_flags,
-							     item_flags, attr,
-							     error);
+							     item_flags,
+							     is_root, error);
 			if (ret < 0)
 				return ret;
 			action_flags |= MLX5_FLOW_ACTION_CT;
@@ -9172,15 +9179,18 @@ flow_dv_translate_item_vxlan(struct rte_eth_dev *dev,
 	if (MLX5_ITEM_VALID(item, key_type))
 		return;
 	MLX5_ITEM_UPDATE(item, key_type, vxlan_v, vxlan_m, &nic_mask);
-	if (item->mask == &nic_mask &&
-	    ((!attr->group && !priv->sh->tunnel_header_0_1) ||
-	    (attr->group && !priv->sh->misc5_cap)))
+	if ((item->mask == &nic_mask) &&
+	    ((!attr->group && !(attr->transfer && priv->fdb_def_rule) &&
+	    !priv->sh->tunnel_header_0_1) ||
+	    ((attr->group || (attr->transfer && priv->fdb_def_rule)) &&
+	    !priv->sh->misc5_cap)))
 		vxlan_m = &rte_flow_item_vxlan_mask;
 	if ((priv->sh->steering_format_version ==
 	     MLX5_STEERING_LOGIC_FORMAT_CONNECTX_5 &&
 	     dport != MLX5_UDP_PORT_VXLAN) ||
-	    (!attr->group && !attr->transfer) ||
-	    ((attr->group || attr->transfer) && !priv->sh->misc5_cap)) {
+	    (!attr->group && !(attr->transfer && priv->fdb_def_rule)) ||
+	    ((attr->group || (attr->transfer && priv->fdb_def_rule)) &&
+	    !priv->sh->misc5_cap)) {
 		misc_v = MLX5_ADDR_OF(fte_match_param, key, misc_parameters);
 		size = sizeof(vxlan_m->vni);
 		vni_v = MLX5_ADDR_OF(fte_match_set_misc, misc_v, vxlan_vni);
@@ -14152,7 +14162,7 @@ flow_dv_translate(struct rte_eth_dev *dev,
 			 */
 			if (action_flags & MLX5_FLOW_ACTION_AGE) {
 				if ((non_shared_age && count) ||
-				    !flow_hit_aso_supported(priv->sh, attr)) {
+				    !flow_hit_aso_supported(priv, !dev_flow->dv.group)) {
 					/* Creates age by counters. */
 					cnt_act = flow_dv_prepare_counter
 								(dev, dev_flow,
@@ -18301,6 +18311,7 @@ flow_dv_action_validate(struct rte_eth_dev *dev,
 			struct rte_flow_error *err)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	/* Called from RTE API. */
 
 	RTE_SET_USED(conf);
 	switch (action->type) {
@@ -18328,7 +18339,7 @@ flow_dv_action_validate(struct rte_eth_dev *dev,
 						"Indirect age action not supported");
 		return flow_dv_validate_action_age(0, action, dev, err);
 	case RTE_FLOW_ACTION_TYPE_COUNT:
-		return flow_dv_validate_action_count(dev, true, 0, NULL, err);
+		return flow_dv_validate_action_count(dev, true, 0, false, err);
 	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
 		if (!priv->sh->ct_aso_en)
 			return rte_flow_error_set(err, ENOTSUP,
@@ -18505,6 +18516,8 @@ flow_dv_validate_mtr_policy_acts(struct rte_eth_dev *dev,
 	bool def_green = false;
 	bool def_yellow = false;
 	const struct rte_flow_action_rss *rss_color[RTE_COLORS] = {NULL};
+	/* Called from RTE API */
+	bool is_root = !(attr->group || (attr->transfer && priv->fdb_def_rule));
 
 	if (!dev_conf->dv_esw_en)
 		def_domain &= ~MLX5_MTR_DOMAIN_TRANSFER_BIT;
@@ -18706,7 +18719,7 @@ flow_dv_validate_mtr_policy_acts(struct rte_eth_dev *dev,
 				break;
 			case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
 				ret = flow_dv_validate_action_modify_field(dev,
-					action_flags[i], act, attr, &flow_err);
+					action_flags[i], act, attr, is_root, &flow_err);
 				if (ret < 0)
 					return -rte_mtr_error_set(error,
 					  ENOTSUP,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index fa7bd37737..991e4c9b7b 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -20,6 +20,14 @@
 /* Default queue to flush the flows. */
 #define MLX5_DEFAULT_FLUSH_QUEUE 0
 
+/* Maximum number of rules in control flow tables. */
+#define MLX5_HW_CTRL_FLOW_NB_RULES (4096)
+
+/* Flow group for SQ miss default flows. */
+#define MLX5_HW_SQ_MISS_GROUP (UINT32_MAX)
+
+static int flow_hw_flush_all_ctrl_flows(struct rte_eth_dev *dev);
+
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops;
 
 /* DR action flags with different table. */
@@ -57,6 +65,9 @@ flow_hw_rxq_flag_set(struct rte_eth_dev *dev, bool enable)
 	for (i = 0; i < priv->rxqs_n; ++i) {
 		struct mlx5_rxq_ctrl *rxq_ctrl = mlx5_rxq_ctrl_get(dev, i);
 
+		/* With RXQ start/stop feature, RXQ might be stopped. */
+		if (!rxq_ctrl)
+			continue;
 		rxq_ctrl->rxq.mark = enable;
 	}
 	priv->mark_enabled = enable;
@@ -810,6 +821,77 @@ flow_hw_modify_field_compile(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static int
+flow_hw_represented_port_compile(struct rte_eth_dev *dev,
+				 const struct rte_flow_attr *attr,
+				 const struct rte_flow_action *action_start,
+				 const struct rte_flow_action *action,
+				 const struct rte_flow_action *action_mask,
+				 struct mlx5_hw_actions *acts,
+				 uint16_t action_dst,
+				 struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_action_ethdev *v = action->conf;
+	const struct rte_flow_action_ethdev *m = action_mask->conf;
+	int ret;
+
+	if (!attr->group)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ATTR, NULL,
+					  "represented_port action cannot"
+					  " be used on group 0");
+	if (!attr->transfer)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ATTR_TRANSFER,
+					  NULL,
+					  "represented_port action requires"
+					  " transfer attribute");
+	if (attr->ingress || attr->egress)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ATTR, NULL,
+					  "represented_port action cannot"
+					  " be used with direction attributes");
+	if (!priv->master)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  "represented_port action must"
+					  " be used on proxy port");
+	if (m && !!m->port_id) {
+		struct mlx5_priv *port_priv;
+
+		port_priv = mlx5_port_to_eswitch_info(v->port_id, false);
+		if (port_priv == NULL)
+			return rte_flow_error_set
+					(error, EINVAL,
+					 RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					 "port does not exist or unable to"
+					 " obtain E-Switch info for port");
+		MLX5_ASSERT(priv->hw_vport != NULL);
+		if (priv->hw_vport[v->port_id]) {
+			acts->rule_acts[action_dst].action =
+					priv->hw_vport[v->port_id];
+		} else {
+			return rte_flow_error_set
+					(error, EINVAL,
+					 RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					 "cannot use represented_port action"
+					 " with this port");
+		}
+	} else {
+		ret = __flow_hw_act_data_general_append
+				(priv, acts, action->type,
+				 action - action_start, action_dst);
+		if (ret)
+			return rte_flow_error_set
+					(error, ENOMEM,
+					 RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					 "not enough memory to store"
+					 " vport action");
+	}
+	return 0;
+}
+
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -887,7 +969,7 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			break;
 		case RTE_FLOW_ACTION_TYPE_DROP:
 			acts->rule_acts[i++].action =
-				priv->hw_drop[!!attr->group][type];
+				priv->hw_drop[!!attr->group];
 			break;
 		case RTE_FLOW_ACTION_TYPE_MARK:
 			acts->mark = true;
@@ -1020,6 +1102,13 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			if (err)
 				goto err;
 			break;
+		case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
+			if (flow_hw_represented_port_compile
+					(dev, attr, action_start, actions,
+					 masks, acts, i, error))
+				goto err;
+			i++;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -1352,11 +1441,13 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			  struct mlx5dr_rule_action *rule_acts,
 			  uint32_t *acts_num)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow_template_table *table = job->flow->table;
 	struct mlx5_action_construct_data *act_data;
 	const struct rte_flow_action *action;
 	const struct rte_flow_action_raw_encap *raw_encap_data;
 	const struct rte_flow_item *enc_item = NULL;
+	const struct rte_flow_action_ethdev *port_action = NULL;
 	uint8_t *buf = job->encap_data;
 	struct rte_flow_attr attr = {
 			.ingress = 1,
@@ -1476,6 +1567,13 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			if (ret)
 				return -1;
 			break;
+		case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
+			port_action = action->conf;
+			if (!priv->hw_vport[port_action->port_id])
+				return -1;
+			rule_acts[act_data->action_dst].action =
+					priv->hw_vport[port_action->port_id];
+			break;
 		default:
 			break;
 		}
@@ -1488,6 +1586,52 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static const struct rte_flow_item *
+flow_hw_get_rule_items(struct rte_eth_dev *dev,
+		       struct rte_flow_template_table *table,
+		       const struct rte_flow_item items[],
+		       uint8_t pattern_template_index,
+		       struct mlx5_hw_q_job *job)
+{
+	if (table->its[pattern_template_index]->implicit_port) {
+		const struct rte_flow_item *curr_item;
+		unsigned int nb_items;
+		bool found_end;
+		unsigned int i;
+
+		/* Count number of pattern items. */
+		nb_items = 0;
+		found_end = false;
+		for (curr_item = items; !found_end; ++curr_item) {
+			++nb_items;
+			if (curr_item->type == RTE_FLOW_ITEM_TYPE_END)
+				found_end = true;
+		}
+		/* Prepend represented port item. */
+		job->port_spec = (struct rte_flow_item_ethdev){
+			.port_id = dev->data->port_id,
+		};
+		job->items[0] = (struct rte_flow_item){
+			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+			.spec = &job->port_spec,
+		};
+		found_end = false;
+		for (i = 1; i < MLX5_HW_MAX_ITEMS && i - 1 < nb_items; ++i) {
+			job->items[i] = items[i - 1];
+			if (items[i - 1].type == RTE_FLOW_ITEM_TYPE_END) {
+				found_end = true;
+				break;
+			}
+		}
+		if (i >= MLX5_HW_MAX_ITEMS && !found_end) {
+			rte_errno = ENOMEM;
+			return NULL;
+		}
+		return job->items;
+	}
+	return items;
+}
+
 /**
  * Enqueue HW steering flow creation.
  *
@@ -1539,6 +1683,7 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	struct mlx5_hw_actions *hw_acts;
 	struct rte_flow_hw *flow;
 	struct mlx5_hw_q_job *job;
+	const struct rte_flow_item *rule_items;
 	uint32_t acts_num, flow_idx;
 	int ret;
 
@@ -1565,15 +1710,23 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	job->user_data = user_data;
 	rule_attr.user_data = job;
 	hw_acts = &table->ats[action_template_index].acts;
-	/* Construct the flow action array based on the input actions.*/
-	flow_hw_actions_construct(dev, job, hw_acts, pattern_template_index,
-				  actions, rule_acts, &acts_num);
+	/* Construct the flow actions based on the input actions. */
+	if (flow_hw_actions_construct(dev, job, hw_acts, pattern_template_index,
+				  actions, rule_acts, &acts_num)) {
+		rte_errno = EINVAL;
+		goto free;
+	}
+	rule_items = flow_hw_get_rule_items(dev, table, items,
+					    pattern_template_index, job);
+	if (!rule_items)
+		goto free;
 	ret = mlx5dr_rule_create(table->matcher,
-				 pattern_template_index, items,
+				 pattern_template_index, rule_items,
 				 action_template_index, rule_acts,
 				 &rule_attr, (struct mlx5dr_rule *)flow->rule);
 	if (likely(!ret))
 		return (struct rte_flow *)flow;
+free:
 	/* Flow created fail, return the descriptor and flow memory. */
 	mlx5_ipool_free(table->flow, flow_idx);
 	priv->hw_q[queue].job_idx++;
@@ -1754,7 +1907,9 @@ __flow_hw_pull_comp(struct rte_eth_dev *dev,
 	struct rte_flow_op_result comp[BURST_THR];
 	int ret, i, empty_loop = 0;
 
-	flow_hw_push(dev, queue, error);
+	ret = flow_hw_push(dev, queue, error);
+	if (ret < 0)
+		return ret;
 	while (pending_rules) {
 		ret = flow_hw_pull(dev, queue, comp, BURST_THR, error);
 		if (ret < 0)
@@ -2039,8 +2194,12 @@ flow_hw_table_destroy(struct rte_eth_dev *dev,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	int i;
+	uint32_t fidx = 1;
 
-	if (table->refcnt) {
+	/* Build ipool allocated object bitmap. */
+	mlx5_ipool_flush_cache(table->flow);
+	/* Check if ipool has allocated objects. */
+	if (table->refcnt || mlx5_ipool_get_next(table->flow, &fidx)) {
 		DRV_LOG(WARNING, "Table %p is still in using.", (void *)table);
 		return rte_flow_error_set(error, EBUSY,
 				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
@@ -2052,8 +2211,6 @@ flow_hw_table_destroy(struct rte_eth_dev *dev,
 		__atomic_sub_fetch(&table->its[i]->refcnt,
 				   1, __ATOMIC_RELAXED);
 	for (i = 0; i < table->nb_action_templates; i++) {
-		if (table->ats[i].acts.mark)
-			flow_hw_rxq_flag_set(dev, false);
 		__flow_hw_action_template_destroy(dev, &table->ats[i].acts);
 		__atomic_sub_fetch(&table->ats[i].action_template->refcnt,
 				   1, __ATOMIC_RELAXED);
@@ -2138,7 +2295,51 @@ flow_hw_validate_action_modify_field(const struct rte_flow_action *action,
 }
 
 static int
-flow_hw_action_validate(const struct rte_flow_action actions[],
+flow_hw_validate_action_represented_port(struct rte_eth_dev *dev,
+					 const struct rte_flow_action *action,
+					 const struct rte_flow_action *mask,
+					 struct rte_flow_error *error)
+{
+	const struct rte_flow_action_ethdev *action_conf = action->conf;
+	const struct rte_flow_action_ethdev *mask_conf = mask->conf;
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (!priv->sh->config.dv_esw_en)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  "cannot use represented_port actions"
+					  " without an E-Switch");
+	if (mask_conf->port_id) {
+		struct mlx5_priv *port_priv;
+		struct mlx5_priv *dev_priv;
+
+		port_priv = mlx5_port_to_eswitch_info(action_conf->port_id, false);
+		if (!port_priv)
+			return rte_flow_error_set(error, rte_errno,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  action,
+						  "failed to obtain E-Switch"
+						  " info for port");
+		dev_priv = mlx5_dev_to_eswitch_info(dev);
+		if (!dev_priv)
+			return rte_flow_error_set(error, rte_errno,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  action,
+						  "failed to obtain E-Switch"
+						  " info for transfer proxy");
+		if (port_priv->domain_id != dev_priv->domain_id)
+			return rte_flow_error_set(error, rte_errno,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  action,
+						  "cannot forward to port from"
+						  " a different E-Switch");
+	}
+	return 0;
+}
+
+static int
+flow_hw_action_validate(struct rte_eth_dev *dev,
+			const struct rte_flow_action actions[],
 			const struct rte_flow_action masks[],
 			struct rte_flow_error *error)
 {
@@ -2201,6 +2402,12 @@ flow_hw_action_validate(const struct rte_flow_action actions[],
 			if (ret < 0)
 				return ret;
 			break;
+		case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
+			ret = flow_hw_validate_action_represented_port
+					(dev, action, mask, error);
+			if (ret < 0)
+				return ret;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -2242,7 +2449,7 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 	int len, act_len, mask_len, i;
 	struct rte_flow_actions_template *at;
 
-	if (flow_hw_action_validate(actions, masks, error))
+	if (flow_hw_action_validate(dev, actions, masks, error))
 		return NULL;
 	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS,
 				NULL, 0, actions, error);
@@ -2325,6 +2532,46 @@ flow_hw_actions_template_destroy(struct rte_eth_dev *dev __rte_unused,
 	return 0;
 }
 
+static struct rte_flow_item *
+flow_hw_copy_prepend_port_item(const struct rte_flow_item *items,
+			       struct rte_flow_error *error)
+{
+	const struct rte_flow_item *curr_item;
+	struct rte_flow_item *copied_items;
+	bool found_end;
+	unsigned int nb_items;
+	unsigned int i;
+	size_t size;
+
+	/* Count number of pattern items. */
+	nb_items = 0;
+	found_end = false;
+	for (curr_item = items; !found_end; ++curr_item) {
+		++nb_items;
+		if (curr_item->type == RTE_FLOW_ITEM_TYPE_END)
+			found_end = true;
+	}
+	/* Allocate new array of items and prepend REPRESENTED_PORT item. */
+	size = sizeof(*copied_items) * (nb_items + 1);
+	copied_items = mlx5_malloc(MLX5_MEM_ZERO, size, 0, rte_socket_id());
+	if (!copied_items) {
+		rte_flow_error_set(error, ENOMEM,
+				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				   NULL,
+				   "cannot allocate item template");
+		return NULL;
+	}
+	copied_items[0] = (struct rte_flow_item){
+		.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+		.spec = NULL,
+		.last = NULL,
+		.mask = &rte_flow_item_ethdev_mask,
+	};
+	for (i = 1; i < nb_items + 1; ++i)
+		copied_items[i] = items[i - 1];
+	return copied_items;
+}
+
 /**
  * Create flow item template.
  *
@@ -2348,9 +2595,35 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow_pattern_template *it;
+	struct rte_flow_item *copied_items = NULL;
+	const struct rte_flow_item *tmpl_items;
 
+	if (priv->sh->config.dv_esw_en && attr->ingress) {
+		/*
+		 * Disallow pattern template with ingress and egress/transfer
+		 * attributes in order to forbid implicit port matching
+		 * on egress and transfer traffic.
+		 */
+		if (attr->egress || attr->transfer) {
+			rte_flow_error_set(error, EINVAL,
+					   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					   NULL,
+					   "item template for ingress traffic"
+					   " cannot be used for egress/transfer"
+					   " traffic when E-Switch is enabled");
+			return NULL;
+		}
+		copied_items = flow_hw_copy_prepend_port_item(items, error);
+		if (!copied_items)
+			return NULL;
+		tmpl_items = copied_items;
+	} else {
+		tmpl_items = items;
+	}
 	it = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*it), 0, rte_socket_id());
 	if (!it) {
+		if (copied_items)
+			mlx5_free(copied_items);
 		rte_flow_error_set(error, ENOMEM,
 				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				   NULL,
@@ -2358,8 +2631,10 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 		return NULL;
 	}
 	it->attr = *attr;
-	it->mt = mlx5dr_match_template_create(items, attr->relaxed_matching);
+	it->mt = mlx5dr_match_template_create(tmpl_items, attr->relaxed_matching);
 	if (!it->mt) {
+		if (copied_items)
+			mlx5_free(copied_items);
 		mlx5_free(it);
 		rte_flow_error_set(error, rte_errno,
 				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
@@ -2367,9 +2642,12 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 				   "cannot create match template");
 		return NULL;
 	}
-	it->item_flags = flow_hw_rss_item_flags_get(items);
+	it->item_flags = flow_hw_rss_item_flags_get(tmpl_items);
+	it->implicit_port = !!copied_items;
 	__atomic_fetch_add(&it->refcnt, 1, __ATOMIC_RELAXED);
 	LIST_INSERT_HEAD(&priv->flow_hw_itt, it, next);
+	if (copied_items)
+		mlx5_free(copied_items);
 	return it;
 }
 
@@ -2495,6 +2773,7 @@ flow_hw_grp_create_cb(void *tool_ctx, void *cb_ctx)
 			goto error;
 		grp_data->jump.root_action = jump;
 	}
+	grp_data->dev = dev;
 	grp_data->idx = idx;
 	grp_data->group_id = attr->group;
 	grp_data->type = dr_tbl_attr.type;
@@ -2563,7 +2842,8 @@ flow_hw_grp_match_cb(void *tool_ctx __rte_unused, struct mlx5_list_entry *entry,
 	struct rte_flow_attr *attr =
 			(struct rte_flow_attr *)ctx->data;
 
-	return (grp_data->group_id != attr->group) ||
+	return (grp_data->dev != ctx->dev) ||
+		(grp_data->group_id != attr->group) ||
 		((grp_data->type != MLX5DR_TABLE_TYPE_FDB) &&
 		attr->transfer) ||
 		((grp_data->type != MLX5DR_TABLE_TYPE_NIC_TX) &&
@@ -2626,6 +2906,545 @@ flow_hw_grp_clone_free_cb(void *tool_ctx, struct mlx5_list_entry *entry)
 	mlx5_ipool_free(sh->ipool[MLX5_IPOOL_HW_GRP], grp_data->idx);
 }
 
+/**
+ * Create and cache a vport action for given @p dev port. The vport actions
+ * cache is used in HWS with FDB flows.
+ *
+ * This function does not create any action if the proxy port for @p dev port
+ * was not configured for HW Steering.
+ *
+ * This function assumes that E-Switch is enabled and PMD is running with
+ * HW Steering configured.
+ *
+ * @param dev
+ *   Pointer to Ethernet device which will be the action destination.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise.
+ */
+int
+flow_hw_create_vport_action(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_eth_dev *proxy_dev;
+	struct mlx5_priv *proxy_priv;
+	uint16_t port_id = dev->data->port_id;
+	uint16_t proxy_port_id = port_id;
+	int ret;
+
+	ret = mlx5_flow_pick_transfer_proxy(dev, &proxy_port_id, NULL);
+	if (ret)
+		return ret;
+	proxy_dev = &rte_eth_devices[proxy_port_id];
+	proxy_priv = proxy_dev->data->dev_private;
+	if (!proxy_priv->hw_vport)
+		return 0;
+	if (proxy_priv->hw_vport[port_id]) {
+		DRV_LOG(ERR, "port %u HWS vport action already created",
+			port_id);
+		return -EINVAL;
+	}
+	proxy_priv->hw_vport[port_id] = mlx5dr_action_create_dest_vport
+			(proxy_priv->dr_ctx, priv->dev_port,
+			 MLX5DR_ACTION_FLAG_HWS_FDB);
+	if (!proxy_priv->hw_vport[port_id]) {
+		DRV_LOG(ERR, "port %u unable to create HWS vport action",
+			port_id);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+/**
+ * Destroys the vport action associated with @p dev device
+ * from actions' cache.
+ *
+ * This function does not destroy any action if there is no action cached
+ * for @p dev or proxy port was not configured for HW Steering.
+ *
+ * This function assumes that E-Switch is enabled and PMD is running with
+ * HW Steering configured.
+ *
+ * @param dev
+ *   Pointer to Ethernet device which will be the action destination.
+ */
+void
+flow_hw_destroy_vport_action(struct rte_eth_dev *dev)
+{
+	struct rte_eth_dev *proxy_dev;
+	struct mlx5_priv *proxy_priv;
+	uint16_t port_id = dev->data->port_id;
+	uint16_t proxy_port_id = port_id;
+
+	if (mlx5_flow_pick_transfer_proxy(dev, &proxy_port_id, NULL))
+		return;
+	proxy_dev = &rte_eth_devices[proxy_port_id];
+	proxy_priv = proxy_dev->data->dev_private;
+	if (!proxy_priv->hw_vport || !proxy_priv->hw_vport[port_id])
+		return;
+	mlx5dr_action_destroy(proxy_priv->hw_vport[port_id]);
+	proxy_priv->hw_vport[port_id] = NULL;
+}
+
+static int
+flow_hw_create_vport_actions(struct mlx5_priv *priv)
+{
+	uint16_t port_id;
+
+	MLX5_ASSERT(!priv->hw_vport);
+	priv->hw_vport = mlx5_malloc(MLX5_MEM_ZERO,
+				     sizeof(*priv->hw_vport) * RTE_MAX_ETHPORTS,
+				     0, SOCKET_ID_ANY);
+	if (!priv->hw_vport)
+		return -ENOMEM;
+	DRV_LOG(DEBUG, "port %u :: creating vport actions", priv->dev_data->port_id);
+	DRV_LOG(DEBUG, "port %u ::    domain_id=%u", priv->dev_data->port_id, priv->domain_id);
+	MLX5_ETH_FOREACH_DEV(port_id, NULL) {
+		struct mlx5_priv *port_priv = rte_eth_devices[port_id].data->dev_private;
+
+		if (!port_priv ||
+		    port_priv->domain_id != priv->domain_id)
+			continue;
+		DRV_LOG(DEBUG, "port %u :: for port_id=%u, calling mlx5dr_action_create_dest_vport() with ibport=%u",
+			priv->dev_data->port_id, port_id, port_priv->dev_port);
+		priv->hw_vport[port_id] = mlx5dr_action_create_dest_vport
+				(priv->dr_ctx, port_priv->dev_port,
+				 MLX5DR_ACTION_FLAG_HWS_FDB);
+		DRV_LOG(DEBUG, "port %u :: priv->hw_vport[%u]=%p",
+			priv->dev_data->port_id, port_id, (void *)priv->hw_vport[port_id]);
+		if (!priv->hw_vport[port_id])
+			return -EINVAL;
+	}
+	return 0;
+}
+
+static void
+flow_hw_free_vport_actions(struct mlx5_priv *priv)
+{
+	uint16_t port_id;
+
+	if (!priv->hw_vport)
+		return;
+	for (port_id = 0; port_id < RTE_MAX_ETHPORTS; ++port_id)
+		if (priv->hw_vport[port_id])
+			mlx5dr_action_destroy(priv->hw_vport[port_id]);
+	mlx5_free(priv->hw_vport);
+	priv->hw_vport = NULL;
+}
+
+/**
+ * Creates a flow pattern template used to match on E-Switch Manager.
+ * This template is used to set up a table for SQ miss default flow.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow pattern template on success, NULL otherwise.
+ */
+static struct rte_flow_pattern_template *
+flow_hw_create_ctrl_esw_mgr_pattern_template(struct rte_eth_dev *dev)
+{
+	struct rte_flow_pattern_template_attr attr = {
+		.relaxed_matching = 0,
+		.transfer = 1,
+	};
+	struct rte_flow_item_ethdev port_spec = {
+		.port_id = MLX5_REPRESENTED_PORT_ESW_MGR,
+	};
+	struct rte_flow_item_ethdev port_mask = {
+		.port_id = UINT16_MAX,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+			.spec = &port_spec,
+			.mask = &port_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+
+	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
+}
+
+/**
+ * Creates a flow pattern template used to match on a TX queue.
+ * This template is used to set up a table for SQ miss default flow.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow pattern template on success, NULL otherwise.
+ */
+static struct rte_flow_pattern_template *
+flow_hw_create_ctrl_sq_pattern_template(struct rte_eth_dev *dev)
+{
+	struct rte_flow_pattern_template_attr attr = {
+		.relaxed_matching = 0,
+		.transfer = 1,
+	};
+	struct mlx5_rte_flow_item_sq queue_mask = {
+		.queue = UINT32_MAX,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = (enum rte_flow_item_type)
+				MLX5_RTE_FLOW_ITEM_TYPE_SQ,
+			.mask = &queue_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+
+	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
+}
+
+/**
+ * Creates a flow pattern template with unmasked represented port matching.
+ * This template is used to set up a table for default transfer flows
+ * directing packets to group 1.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow pattern template on success, NULL otherwise.
+ */
+static struct rte_flow_pattern_template *
+flow_hw_create_ctrl_port_pattern_template(struct rte_eth_dev *dev)
+{
+	struct rte_flow_pattern_template_attr attr = {
+		.relaxed_matching = 0,
+		.transfer = 1,
+	};
+	struct rte_flow_item_ethdev port_mask = {
+		.port_id = UINT16_MAX,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+			.mask = &port_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+
+	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
+}
+
+/**
+ * Creates a flow actions template with an unmasked JUMP action. Flows
+ * based on this template will perform a jump to some group. This template
+ * is used to set up tables for control flows.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param group
+ *   Destination group for this action template.
+ *
+ * @return
+ *   Pointer to flow actions template on success, NULL otherwise.
+ */
+static struct rte_flow_actions_template *
+flow_hw_create_ctrl_jump_actions_template(struct rte_eth_dev *dev,
+					  uint32_t group)
+{
+	struct rte_flow_actions_template_attr attr = {
+		.transfer = 1,
+	};
+	struct rte_flow_action_jump jump_v = {
+		.group = group,
+	};
+	struct rte_flow_action_jump jump_m = {
+		.group = UINT32_MAX,
+	};
+	struct rte_flow_action actions_v[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump_v,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+	struct rte_flow_action actions_m[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump_m,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+
+	return flow_hw_actions_template_create(dev, &attr, actions_v, actions_m,
+					       NULL);
+}
+
+/**
+ * Creates a flow actions template with an unmasked REPRESENTED_PORT action.
+ * It is used to create control flow tables.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow actions template on success, NULL otherwise.
+ */
+static struct rte_flow_actions_template *
+flow_hw_create_ctrl_port_actions_template(struct rte_eth_dev *dev)
+{
+	struct rte_flow_actions_template_attr attr = {
+		.transfer = 1,
+	};
+	struct rte_flow_action_ethdev port_v = {
+		.port_id = 0,
+	};
+	struct rte_flow_action actions_v[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT,
+			.conf = &port_v,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+	struct rte_flow_action_ethdev port_m = {
+		.port_id = 0,
+	};
+	struct rte_flow_action actions_m[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT,
+			.conf = &port_m,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+
+	return flow_hw_actions_template_create(dev, &attr, actions_v, actions_m,
+					       NULL);
+}
+
+/**
+ * Creates a control flow table in the root group (0), used to redirect traffic
+ * originating from the E-Switch Manager to the SQ miss flow group.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param it
+ *   Pointer to flow pattern template.
+ * @param at
+ *   Pointer to flow actions template.
+ *
+ * @return
+ *   Pointer to flow table on success, NULL otherwise.
+ */
+static struct rte_flow_template_table*
+flow_hw_create_ctrl_sq_miss_root_table(struct rte_eth_dev *dev,
+				       struct rte_flow_pattern_template *it,
+				       struct rte_flow_actions_template *at)
+{
+	struct rte_flow_template_table_attr attr = {
+		.flow_attr = {
+			.group = 0,
+			.priority = 0,
+			.ingress = 0,
+			.egress = 0,
+			.transfer = 1,
+		},
+		.nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
+	};
+
+	return flow_hw_table_create(dev, &attr, &it, 1, &at, 1, NULL);
+}
+
+
+/**
+ * Creates a control flow table in the SQ miss flow group, used to forward
+ * traffic matched on a TX queue (SQ) to the destination represented port.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param it
+ *   Pointer to flow pattern template.
+ * @param at
+ *   Pointer to flow actions template.
+ *
+ * @return
+ *   Pointer to flow table on success, NULL otherwise.
+ */
+static struct rte_flow_template_table*
+flow_hw_create_ctrl_sq_miss_table(struct rte_eth_dev *dev,
+				  struct rte_flow_pattern_template *it,
+				  struct rte_flow_actions_template *at)
+{
+	struct rte_flow_template_table_attr attr = {
+		.flow_attr = {
+			.group = MLX5_HW_SQ_MISS_GROUP,
+			.priority = 0,
+			.ingress = 0,
+			.egress = 0,
+			.transfer = 1,
+		},
+		.nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
+	};
+
+	return flow_hw_table_create(dev, &attr, &it, 1, &at, 1, NULL);
+}
+
+/**
+ * Creates a control flow table used to transfer traffic
+ * from group 0 to group 1.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param it
+ *   Pointer to flow pattern template.
+ * @param at
+ *   Pointer to flow actions template.
+ *
+ * @return
+ *   Pointer to flow table on success, NULL otherwise.
+ */
+static struct rte_flow_template_table *
+flow_hw_create_ctrl_jump_table(struct rte_eth_dev *dev,
+			       struct rte_flow_pattern_template *it,
+			       struct rte_flow_actions_template *at)
+{
+	struct rte_flow_template_table_attr attr = {
+		.flow_attr = {
+			.group = 0,
+			.priority = 15, /* TODO: Flow priority discovery. */
+			.ingress = 0,
+			.egress = 0,
+			.transfer = 1,
+		},
+		.nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
+	};
+
+	return flow_hw_table_create(dev, &attr, &it, 1, &at, 1, NULL);
+}
+
+/**
+ * Creates the set of flow tables used for control flows when the E-Switch
+ * is engaged.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, a negative value (-EINVAL) otherwise.
+ */
+static __rte_unused int
+flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_flow_pattern_template *esw_mgr_items_tmpl = NULL;
+	struct rte_flow_pattern_template *sq_items_tmpl = NULL;
+	struct rte_flow_pattern_template *port_items_tmpl = NULL;
+	struct rte_flow_actions_template *jump_sq_actions_tmpl = NULL;
+	struct rte_flow_actions_template *port_actions_tmpl = NULL;
+	struct rte_flow_actions_template *jump_one_actions_tmpl = NULL;
+
+	/* Item templates */
+	esw_mgr_items_tmpl = flow_hw_create_ctrl_esw_mgr_pattern_template(dev);
+	if (!esw_mgr_items_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create E-Switch Manager item"
+			" template for control flows", dev->data->port_id);
+		goto error;
+	}
+	sq_items_tmpl = flow_hw_create_ctrl_sq_pattern_template(dev);
+	if (!sq_items_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create SQ item template for"
+			" control flows", dev->data->port_id);
+		goto error;
+	}
+	port_items_tmpl = flow_hw_create_ctrl_port_pattern_template(dev);
+	if (!port_items_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create port item template for"
+			" control flows", dev->data->port_id);
+		goto error;
+	}
+	/* Action templates */
+	jump_sq_actions_tmpl = flow_hw_create_ctrl_jump_actions_template(dev,
+									 MLX5_HW_SQ_MISS_GROUP);
+	if (!jump_sq_actions_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create jump action template"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
+	port_actions_tmpl = flow_hw_create_ctrl_port_actions_template(dev);
+	if (!port_actions_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create port action template"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
+	jump_one_actions_tmpl = flow_hw_create_ctrl_jump_actions_template(dev, 1);
+	if (!jump_one_actions_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create jump action template"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
+	/* Tables */
+	MLX5_ASSERT(priv->hw_esw_sq_miss_root_tbl == NULL);
+	priv->hw_esw_sq_miss_root_tbl = flow_hw_create_ctrl_sq_miss_root_table
+			(dev, esw_mgr_items_tmpl, jump_sq_actions_tmpl);
+	if (!priv->hw_esw_sq_miss_root_tbl) {
+		DRV_LOG(ERR, "port %u failed to create table for default sq miss (root table)"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
+	MLX5_ASSERT(priv->hw_esw_sq_miss_tbl == NULL);
+	priv->hw_esw_sq_miss_tbl = flow_hw_create_ctrl_sq_miss_table(dev, sq_items_tmpl,
+								     port_actions_tmpl);
+	if (!priv->hw_esw_sq_miss_tbl) {
+		DRV_LOG(ERR, "port %u failed to create table for default sq miss (non-root table)"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
+	MLX5_ASSERT(priv->hw_esw_zero_tbl == NULL);
+	priv->hw_esw_zero_tbl = flow_hw_create_ctrl_jump_table(dev, port_items_tmpl,
+							       jump_one_actions_tmpl);
+	if (!priv->hw_esw_zero_tbl) {
+		DRV_LOG(ERR, "port %u failed to create table for default jump to group 1"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
+	return 0;
+error:
+	if (priv->hw_esw_zero_tbl) {
+		flow_hw_table_destroy(dev, priv->hw_esw_zero_tbl, NULL);
+		priv->hw_esw_zero_tbl = NULL;
+	}
+	if (priv->hw_esw_sq_miss_tbl) {
+		flow_hw_table_destroy(dev, priv->hw_esw_sq_miss_tbl, NULL);
+		priv->hw_esw_sq_miss_tbl = NULL;
+	}
+	if (priv->hw_esw_sq_miss_root_tbl) {
+		flow_hw_table_destroy(dev, priv->hw_esw_sq_miss_root_tbl, NULL);
+		priv->hw_esw_sq_miss_root_tbl = NULL;
+	}
+	if (jump_one_actions_tmpl)
+		flow_hw_actions_template_destroy(dev, jump_one_actions_tmpl, NULL);
+	if (port_actions_tmpl)
+		flow_hw_actions_template_destroy(dev, port_actions_tmpl, NULL);
+	if (jump_sq_actions_tmpl)
+		flow_hw_actions_template_destroy(dev, jump_sq_actions_tmpl, NULL);
+	if (port_items_tmpl)
+		flow_hw_pattern_template_destroy(dev, port_items_tmpl, NULL);
+	if (sq_items_tmpl)
+		flow_hw_pattern_template_destroy(dev, sq_items_tmpl, NULL);
+	if (esw_mgr_items_tmpl)
+		flow_hw_pattern_template_destroy(dev, esw_mgr_items_tmpl, NULL);
+	return -EINVAL;
+}
+
 /**
  * Configure port HWS resources.
  *
@@ -2643,7 +3462,6 @@ flow_hw_grp_clone_free_cb(void *tool_ctx, struct mlx5_list_entry *entry)
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
-
 static int
 flow_hw_configure(struct rte_eth_dev *dev,
 		  const struct rte_flow_port_attr *port_attr,
@@ -2666,6 +3484,14 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		.free = mlx5_free,
 		.type = "mlx5_hw_action_construct_data",
 	};
+	/* One additional queue is added at the end for PMD internal usage.
+	 * The last queue is reserved for the control flows.
+	 */
+	uint16_t nb_q_updated;
+	struct rte_flow_queue_attr **_queue_attr = NULL;
+	struct rte_flow_queue_attr ctrl_queue_attr = {0};
+	bool is_proxy = !!(priv->sh->config.dv_esw_en && priv->master);
+	int ret;
 
 	if (!port_attr || !nb_queue || !queue_attr) {
 		rte_errno = EINVAL;
@@ -2674,7 +3500,7 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	/* In case re-configuring, release existing context at first. */
 	if (priv->dr_ctx) {
 		/* */
-		for (i = 0; i < nb_queue; i++) {
+		for (i = 0; i < priv->nb_queue; i++) {
 			hw_q = &priv->hw_q[i];
 			/* Make sure all queues are empty. */
 			if (hw_q->size != hw_q->job_idx) {
@@ -2684,26 +3510,42 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		}
 		flow_hw_resource_release(dev);
 	}
+	ctrl_queue_attr.size = queue_attr[0]->size;
+	nb_q_updated = nb_queue + 1;
+	_queue_attr = mlx5_malloc(MLX5_MEM_ZERO,
+				  nb_q_updated *
+				  sizeof(struct rte_flow_queue_attr *),
+				  64, SOCKET_ID_ANY);
+	if (!_queue_attr) {
+		rte_errno = ENOMEM;
+		goto err;
+	}
+
+	memcpy(_queue_attr, queue_attr,
+	       sizeof(void *) * nb_queue);
+	_queue_attr[nb_queue] = &ctrl_queue_attr;
 	priv->acts_ipool = mlx5_ipool_create(&cfg);
 	if (!priv->acts_ipool)
 		goto err;
 	/* Allocate the queue job descriptor LIFO. */
-	mem_size = sizeof(priv->hw_q[0]) * nb_queue;
-	for (i = 0; i < nb_queue; i++) {
+	mem_size = sizeof(priv->hw_q[0]) * nb_q_updated;
+	for (i = 0; i < nb_q_updated; i++) {
 		/*
 		 * Check if the queues' size are all the same as the
 		 * limitation from HWS layer.
 		 */
-		if (queue_attr[i]->size != queue_attr[0]->size) {
+		if (_queue_attr[i]->size != _queue_attr[0]->size) {
 			rte_errno = EINVAL;
 			goto err;
 		}
 		mem_size += (sizeof(struct mlx5_hw_q_job *) +
+			    sizeof(struct mlx5_hw_q_job) +
 			    sizeof(uint8_t) * MLX5_ENCAP_MAX_LEN +
 			    sizeof(struct mlx5_modification_cmd) *
 			    MLX5_MHDR_MAX_CMD +
-			    sizeof(struct mlx5_hw_q_job)) *
-			    queue_attr[0]->size;
+			    sizeof(struct rte_flow_item) *
+			    MLX5_HW_MAX_ITEMS) *
+			    _queue_attr[i]->size;
 	}
 	priv->hw_q = mlx5_malloc(MLX5_MEM_ZERO, mem_size,
 				 64, SOCKET_ID_ANY);
@@ -2711,58 +3553,82 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		rte_errno = ENOMEM;
 		goto err;
 	}
-	for (i = 0; i < nb_queue; i++) {
+	for (i = 0; i < nb_q_updated; i++) {
 		uint8_t *encap = NULL;
 		struct mlx5_modification_cmd *mhdr_cmd = NULL;
+		struct rte_flow_item *items = NULL;
 
-		priv->hw_q[i].job_idx = queue_attr[i]->size;
-		priv->hw_q[i].size = queue_attr[i]->size;
+		priv->hw_q[i].job_idx = _queue_attr[i]->size;
+		priv->hw_q[i].size = _queue_attr[i]->size;
 		if (i == 0)
 			priv->hw_q[i].job = (struct mlx5_hw_q_job **)
-					    &priv->hw_q[nb_queue];
+					    &priv->hw_q[nb_q_updated];
 		else
 			priv->hw_q[i].job = (struct mlx5_hw_q_job **)
-					    &job[queue_attr[i - 1]->size];
+				&job[_queue_attr[i - 1]->size - 1].items
+				 [MLX5_HW_MAX_ITEMS];
 		job = (struct mlx5_hw_q_job *)
-		      &priv->hw_q[i].job[queue_attr[i]->size];
-		mhdr_cmd = (struct mlx5_modification_cmd *)&job[queue_attr[i]->size];
-		encap = (uint8_t *)&mhdr_cmd[queue_attr[i]->size * MLX5_MHDR_MAX_CMD];
-		for (j = 0; j < queue_attr[i]->size; j++) {
+		      &priv->hw_q[i].job[_queue_attr[i]->size];
+		mhdr_cmd = (struct mlx5_modification_cmd *)
+			   &job[_queue_attr[i]->size];
+		encap = (uint8_t *)
+			 &mhdr_cmd[_queue_attr[i]->size * MLX5_MHDR_MAX_CMD];
+		items = (struct rte_flow_item *)
+			 &encap[_queue_attr[i]->size * MLX5_ENCAP_MAX_LEN];
+		for (j = 0; j < _queue_attr[i]->size; j++) {
 			job[j].mhdr_cmd = &mhdr_cmd[j * MLX5_MHDR_MAX_CMD];
 			job[j].encap_data = &encap[j * MLX5_ENCAP_MAX_LEN];
+			job[j].items = &items[j * MLX5_HW_MAX_ITEMS];
 			priv->hw_q[i].job[j] = &job[j];
 		}
 	}
 	dr_ctx_attr.pd = priv->sh->cdev->pd;
-	dr_ctx_attr.queues = nb_queue;
+	dr_ctx_attr.queues = nb_q_updated;
 	/* Queue size should all be the same. Take the first one. */
-	dr_ctx_attr.queue_size = queue_attr[0]->size;
+	dr_ctx_attr.queue_size = _queue_attr[0]->size;
 	dr_ctx = mlx5dr_context_open(priv->sh->cdev->ctx, &dr_ctx_attr);
 	/* rte_errno has been updated by HWS layer. */
 	if (!dr_ctx)
 		goto err;
 	priv->dr_ctx = dr_ctx;
-	priv->nb_queue = nb_queue;
+	priv->nb_queue = nb_q_updated;
+	rte_spinlock_init(&priv->hw_ctrl_lock);
+	LIST_INIT(&priv->hw_ctrl_flows);
 	/* Add global actions. */
 	for (i = 0; i < MLX5_HW_ACTION_FLAG_MAX; i++) {
-		for (j = 0; j < MLX5DR_TABLE_TYPE_MAX; j++) {
-			priv->hw_drop[i][j] = mlx5dr_action_create_dest_drop
-				(priv->dr_ctx, mlx5_hw_act_flag[i][j]);
-			if (!priv->hw_drop[i][j])
-				goto err;
-		}
+		uint32_t act_flags = 0;
+
+		act_flags = mlx5_hw_act_flag[i][0] | mlx5_hw_act_flag[i][1];
+		if (is_proxy)
+			act_flags |= mlx5_hw_act_flag[i][2];
+		priv->hw_drop[i] = mlx5dr_action_create_dest_drop(priv->dr_ctx, act_flags);
+		if (!priv->hw_drop[i])
+			goto err;
 		priv->hw_tag[i] = mlx5dr_action_create_tag
 			(priv->dr_ctx, mlx5_hw_act_flag[i][0]);
 		if (!priv->hw_tag[i])
 			goto err;
 	}
+	if (is_proxy) {
+		ret = flow_hw_create_vport_actions(priv);
+		if (ret) {
+			rte_errno = -ret;
+			goto err;
+		}
+		ret = flow_hw_create_ctrl_tables(dev);
+		if (ret) {
+			rte_errno = -ret;
+			goto err;
+		}
+	}
+	if (_queue_attr)
+		mlx5_free(_queue_attr);
 	return 0;
 err:
+	flow_hw_free_vport_actions(priv);
 	for (i = 0; i < MLX5_HW_ACTION_FLAG_MAX; i++) {
-		for (j = 0; j < MLX5DR_TABLE_TYPE_MAX; j++) {
-			if (priv->hw_drop[i][j])
-				mlx5dr_action_destroy(priv->hw_drop[i][j]);
-		}
+		if (priv->hw_drop[i])
+			mlx5dr_action_destroy(priv->hw_drop[i]);
 		if (priv->hw_tag[i])
 			mlx5dr_action_destroy(priv->hw_tag[i]);
 	}
@@ -2774,6 +3640,8 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		mlx5_ipool_destroy(priv->acts_ipool);
 		priv->acts_ipool = NULL;
 	}
+	if (_queue_attr)
+		mlx5_free(_queue_attr);
 	return rte_flow_error_set(error, rte_errno,
 				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
 				  "fail to configure port");
@@ -2792,10 +3660,12 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 	struct rte_flow_template_table *tbl;
 	struct rte_flow_pattern_template *it;
 	struct rte_flow_actions_template *at;
-	int i, j;
+	int i;
 
 	if (!priv->dr_ctx)
 		return;
+	flow_hw_rxq_flag_set(dev, false);
+	flow_hw_flush_all_ctrl_flows(dev);
 	while (!LIST_EMPTY(&priv->flow_hw_tbl)) {
 		tbl = LIST_FIRST(&priv->flow_hw_tbl);
 		flow_hw_table_destroy(dev, tbl, NULL);
@@ -2809,13 +3679,12 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 		flow_hw_actions_template_destroy(dev, at, NULL);
 	}
 	for (i = 0; i < MLX5_HW_ACTION_FLAG_MAX; i++) {
-		for (j = 0; j < MLX5DR_TABLE_TYPE_MAX; j++) {
-			if (priv->hw_drop[i][j])
-				mlx5dr_action_destroy(priv->hw_drop[i][j]);
-		}
+		if (priv->hw_drop[i])
+			mlx5dr_action_destroy(priv->hw_drop[i]);
 		if (priv->hw_tag[i])
 			mlx5dr_action_destroy(priv->hw_tag[i]);
 	}
+	flow_hw_free_vport_actions(priv);
 	if (priv->acts_ipool) {
 		mlx5_ipool_destroy(priv->acts_ipool);
 		priv->acts_ipool = NULL;
@@ -3058,4 +3927,397 @@ const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.action_query = flow_dv_action_query,
 };
 
+static uint32_t
+flow_hw_get_ctrl_queue(struct mlx5_priv *priv)
+{
+	MLX5_ASSERT(priv->nb_queue > 0);
+	return priv->nb_queue - 1;
+}
+
+/**
+ * Creates a control flow using flow template API on @p proxy_dev device,
+ * on behalf of @p owner_dev device.
+ *
+ * This function uses locks internally to synchronize access to the
+ * flow queue.
+ *
+ * Created flow is stored in private list associated with @p proxy_dev device.
+ *
+ * @param owner_dev
+ *   Pointer to Ethernet device on behalf of which flow is created.
+ * @param proxy_dev
+ *   Pointer to Ethernet device on which flow is created.
+ * @param table
+ *   Pointer to flow table.
+ * @param items
+ *   Pointer to flow rule items.
+ * @param item_template_idx
+ *   Index of an item template associated with @p table.
+ * @param actions
+ *   Pointer to flow rule actions.
+ * @param action_template_idx
+ *   Index of an action template associated with @p table.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno set.
+ */
+static __rte_unused int
+flow_hw_create_ctrl_flow(struct rte_eth_dev *owner_dev,
+			 struct rte_eth_dev *proxy_dev,
+			 struct rte_flow_template_table *table,
+			 struct rte_flow_item items[],
+			 uint8_t item_template_idx,
+			 struct rte_flow_action actions[],
+			 uint8_t action_template_idx)
+{
+	struct mlx5_priv *priv = proxy_dev->data->dev_private;
+	uint32_t queue = flow_hw_get_ctrl_queue(priv);
+	struct rte_flow_op_attr op_attr = {
+		.postpone = 0,
+	};
+	struct rte_flow *flow = NULL;
+	struct mlx5_hw_ctrl_flow *entry = NULL;
+	int ret;
+
+	rte_spinlock_lock(&priv->hw_ctrl_lock);
+	entry = mlx5_malloc(MLX5_MEM_ZERO | MLX5_MEM_SYS, sizeof(*entry),
+			    0, SOCKET_ID_ANY);
+	if (!entry) {
+		DRV_LOG(ERR, "port %u not enough memory to create control flows",
+			proxy_dev->data->port_id);
+		rte_errno = ENOMEM;
+		ret = -rte_errno;
+		goto error;
+	}
+	flow = flow_hw_async_flow_create(proxy_dev, queue, &op_attr, table,
+					 items, item_template_idx,
+					 actions, action_template_idx,
+					 NULL, NULL);
+	if (!flow) {
+		DRV_LOG(ERR, "port %u failed to enqueue create control"
+			" flow operation", proxy_dev->data->port_id);
+		ret = -rte_errno;
+		goto error;
+	}
+	ret = flow_hw_push(proxy_dev, queue, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "port %u failed to drain control flow queue",
+			proxy_dev->data->port_id);
+		goto error;
+	}
+	ret = __flow_hw_pull_comp(proxy_dev, queue, 1, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "port %u failed to insert control flow",
+			proxy_dev->data->port_id);
+		rte_errno = EINVAL;
+		ret = -rte_errno;
+		goto error;
+	}
+	entry->owner_dev = owner_dev;
+	entry->flow = flow;
+	LIST_INSERT_HEAD(&priv->hw_ctrl_flows, entry, next);
+	rte_spinlock_unlock(&priv->hw_ctrl_lock);
+	return 0;
+error:
+	if (entry)
+		mlx5_free(entry);
+	rte_spinlock_unlock(&priv->hw_ctrl_lock);
+	return ret;
+}
+
+/**
+ * Destroys a control flow @p flow using flow template API on @p dev device.
+ *
+ * This function uses locks internally to synchronize access to the
+ * flow queue.
+ *
+ * If the @p flow is stored on any private list/pool, then caller must free up
+ * the relevant resources.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param flow
+ *   Pointer to flow rule.
+ *
+ * @return
+ *   0 on success, non-zero value otherwise.
+ */
+static int
+flow_hw_destroy_ctrl_flow(struct rte_eth_dev *dev, struct rte_flow *flow)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint32_t queue = flow_hw_get_ctrl_queue(priv);
+	struct rte_flow_op_attr op_attr = {
+		.postpone = 0,
+	};
+	int ret;
+
+	rte_spinlock_lock(&priv->hw_ctrl_lock);
+	ret = flow_hw_async_flow_destroy(dev, queue, &op_attr, flow, NULL, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "port %u failed to enqueue destroy control"
+			" flow operation", dev->data->port_id);
+		goto exit;
+	}
+	ret = flow_hw_push(dev, queue, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "port %u failed to drain control flow queue",
+			dev->data->port_id);
+		goto exit;
+	}
+	ret = __flow_hw_pull_comp(dev, queue, 1, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "port %u failed to destroy control flow",
+			dev->data->port_id);
+		rte_errno = EINVAL;
+		ret = -rte_errno;
+		goto exit;
+	}
+exit:
+	rte_spinlock_unlock(&priv->hw_ctrl_lock);
+	return ret;
+}
+
+/**
+ * Destroys control flows created on behalf of @p owner_dev device.
+ *
+ * @param owner_dev
+ *   Pointer to Ethernet device owning control flows.
+ *
+ * @return
+ *   0 on success, otherwise negative error code is returned and
+ *   rte_errno is set.
+ */
+int
+mlx5_flow_hw_flush_ctrl_flows(struct rte_eth_dev *owner_dev)
+{
+	struct mlx5_priv *owner_priv = owner_dev->data->dev_private;
+	struct rte_eth_dev *proxy_dev;
+	struct mlx5_priv *proxy_priv;
+	struct mlx5_hw_ctrl_flow *cf;
+	struct mlx5_hw_ctrl_flow *cf_next;
+	uint16_t owner_port_id = owner_dev->data->port_id;
+	uint16_t proxy_port_id = owner_dev->data->port_id;
+	int ret;
+
+	if (owner_priv->sh->config.dv_esw_en) {
+		if (rte_flow_pick_transfer_proxy(owner_port_id, &proxy_port_id, NULL)) {
+			DRV_LOG(ERR, "Unable to find proxy port for port %u",
+				owner_port_id);
+			rte_errno = EINVAL;
+			return -rte_errno;
+		}
+		proxy_dev = &rte_eth_devices[proxy_port_id];
+		proxy_priv = proxy_dev->data->dev_private;
+	} else {
+		proxy_dev = owner_dev;
+		proxy_priv = owner_priv;
+	}
+	cf = LIST_FIRST(&proxy_priv->hw_ctrl_flows);
+	while (cf != NULL) {
+		cf_next = LIST_NEXT(cf, next);
+		if (cf->owner_dev == owner_dev) {
+			ret = flow_hw_destroy_ctrl_flow(proxy_dev, cf->flow);
+			if (ret) {
+				rte_errno = ret;
+				return -ret;
+			}
+			LIST_REMOVE(cf, next);
+			mlx5_free(cf);
+		}
+		cf = cf_next;
+	}
+	return 0;
+}
+
+/**
+ * Destroys all control flows created on @p dev device.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, otherwise negative error code is returned and
+ *   rte_errno is set.
+ */
+static int
+flow_hw_flush_all_ctrl_flows(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hw_ctrl_flow *cf;
+	struct mlx5_hw_ctrl_flow *cf_next;
+	int ret;
+
+	cf = LIST_FIRST(&priv->hw_ctrl_flows);
+	while (cf != NULL) {
+		cf_next = LIST_NEXT(cf, next);
+		ret = flow_hw_destroy_ctrl_flow(dev, cf->flow);
+		if (ret) {
+			rte_errno = ret;
+			return -ret;
+		}
+		LIST_REMOVE(cf, next);
+		mlx5_free(cf);
+		cf = cf_next;
+	}
+	return 0;
+}
+
+int
+mlx5_flow_hw_esw_create_mgr_sq_miss_flow(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_flow_item_ethdev port_spec = {
+		.port_id = MLX5_REPRESENTED_PORT_ESW_MGR,
+	};
+	struct rte_flow_item_ethdev port_mask = {
+		.port_id = MLX5_REPRESENTED_PORT_ESW_MGR,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+			.spec = &port_spec,
+			.mask = &port_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	struct rte_flow_action_jump jump = {
+		.group = MLX5_HW_SQ_MISS_GROUP,
+	};
+	struct rte_flow_action actions[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		},
+	};
+
+	MLX5_ASSERT(priv->master);
+	if (!priv->dr_ctx ||
+	    !priv->hw_esw_sq_miss_root_tbl)
+		return 0;
+	return flow_hw_create_ctrl_flow(dev, dev,
+					priv->hw_esw_sq_miss_root_tbl,
+					items, 0, actions, 0);
+}
+
+int
+mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
+{
+	uint16_t port_id = dev->data->port_id;
+	struct mlx5_rte_flow_item_sq queue_spec = {
+		.queue = txq,
+	};
+	struct mlx5_rte_flow_item_sq queue_mask = {
+		.queue = UINT32_MAX,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = (enum rte_flow_item_type)
+				MLX5_RTE_FLOW_ITEM_TYPE_SQ,
+			.spec = &queue_spec,
+			.mask = &queue_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	struct rte_flow_action_ethdev port = {
+		.port_id = port_id,
+	};
+	struct rte_flow_action actions[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT,
+			.conf = &port,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		},
+	};
+	struct rte_eth_dev *proxy_dev;
+	struct mlx5_priv *proxy_priv;
+	uint16_t proxy_port_id = dev->data->port_id;
+	int ret;
+
+	RTE_SET_USED(txq);
+	ret = rte_flow_pick_transfer_proxy(port_id, &proxy_port_id, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "Unable to pick proxy port for port %u", port_id);
+		return ret;
+	}
+	proxy_dev = &rte_eth_devices[proxy_port_id];
+	proxy_priv = proxy_dev->data->dev_private;
+	if (!proxy_priv->dr_ctx)
+		return 0;
+	if (!proxy_priv->hw_esw_sq_miss_root_tbl ||
+	    !proxy_priv->hw_esw_sq_miss_tbl) {
+		DRV_LOG(ERR, "port %u proxy port %u was configured but default"
+			" flow tables are not created",
+			port_id, proxy_port_id);
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	return flow_hw_create_ctrl_flow(dev, proxy_dev,
+					proxy_priv->hw_esw_sq_miss_tbl,
+					items, 0, actions, 0);
+}
+
+int
+mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev)
+{
+	uint16_t port_id = dev->data->port_id;
+	struct rte_flow_item_ethdev port_spec = {
+		.port_id = port_id,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+			.spec = &port_spec,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	struct rte_flow_action_jump jump = {
+		.group = 1,
+	};
+	struct rte_flow_action actions[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+	struct rte_eth_dev *proxy_dev;
+	struct mlx5_priv *proxy_priv;
+	uint16_t proxy_port_id = dev->data->port_id;
+	int ret;
+
+	ret = rte_flow_pick_transfer_proxy(port_id, &proxy_port_id, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "Unable to pick proxy port for port %u", port_id);
+		return ret;
+	}
+	proxy_dev = &rte_eth_devices[proxy_port_id];
+	proxy_priv = proxy_dev->data->dev_private;
+	if (!proxy_priv->dr_ctx)
+		return 0;
+	if (!proxy_priv->hw_esw_zero_tbl) {
+		DRV_LOG(ERR, "port %u proxy port %u was configured but default"
+			" flow tables are not created",
+			port_id, proxy_port_id);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	return flow_hw_create_ctrl_flow(dev, proxy_dev,
+					proxy_priv->hw_esw_zero_tbl,
+					items, 0, actions, 0);
+}
+
 #endif
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index fd902078f8..7ffaf4c227 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -1245,12 +1245,14 @@ flow_verbs_validate(struct rte_eth_dev *dev,
 	uint16_t ether_type = 0;
 	bool is_empty_vlan = false;
 	uint16_t udp_dport = 0;
+	bool is_root;
 
 	if (items == NULL)
 		return -1;
 	ret = mlx5_flow_validate_attributes(dev, attr, error);
 	if (ret < 0)
 		return ret;
+	is_root = ret;
 	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
 		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
 		int ret = 0;
@@ -1380,7 +1382,7 @@ flow_verbs_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ITEM_TYPE_VXLAN:
 			ret = mlx5_flow_validate_item_vxlan(dev, udp_dport,
 							    items, item_flags,
-							    attr, error);
+							    is_root, error);
 			if (ret < 0)
 				return ret;
 			last_item = MLX5_FLOW_LAYER_VXLAN;
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index c68b32cf14..f59d314ff4 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -1280,6 +1280,52 @@ mlx5_dev_stop(struct rte_eth_dev *dev)
 	return 0;
 }
 
+#ifdef HAVE_MLX5_HWS_SUPPORT
+
+static int
+mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	unsigned int i;
+	int ret;
+
+	if (priv->sh->config.dv_esw_en && priv->master) {
+		if (mlx5_flow_hw_esw_create_mgr_sq_miss_flow(dev))
+			goto error;
+	}
+	for (i = 0; i < priv->txqs_n; ++i) {
+		struct mlx5_txq_ctrl *txq = mlx5_txq_get(dev, i);
+		uint32_t queue;
+
+		if (!txq)
+			continue;
+		if (txq->is_hairpin)
+			queue = txq->obj->sq->id;
+		else
+			queue = txq->obj->sq_obj.sq->id;
+		if ((priv->representor || priv->master) &&
+		    priv->sh->config.dv_esw_en) {
+			if (mlx5_flow_hw_esw_create_sq_miss_flow(dev, queue)) {
+				mlx5_txq_release(dev, i);
+				goto error;
+			}
+		}
+		mlx5_txq_release(dev, i);
+	}
+	if ((priv->master || priv->representor) && priv->sh->config.dv_esw_en) {
+		if (mlx5_flow_hw_esw_create_default_jump_flow(dev))
+			goto error;
+	}
+	return 0;
+error:
+	ret = rte_errno;
+	mlx5_flow_hw_flush_ctrl_flows(dev);
+	rte_errno = ret;
+	return -rte_errno;
+}
+
+#endif
+
 /**
  * Enable traffic flows configured by control plane
  *
@@ -1316,6 +1362,10 @@ mlx5_traffic_enable(struct rte_eth_dev *dev)
 	unsigned int j;
 	int ret;
 
+#ifdef HAVE_MLX5_HWS_SUPPORT
+	if (priv->sh->config.dv_flow_en == 2)
+		return mlx5_traffic_enable_hws(dev);
+#endif
 	/*
 	 * Hairpin txq default flow should be created no matter if it is
 	 * isolation mode. Or else all the packets to be sent will be sent
@@ -1346,13 +1396,17 @@ mlx5_traffic_enable(struct rte_eth_dev *dev)
 		}
 		mlx5_txq_release(dev, i);
 	}
-	if (priv->sh->config.dv_esw_en) {
-		if (mlx5_flow_create_esw_table_zero_flow(dev))
-			priv->fdb_def_rule = 1;
-		else
-			DRV_LOG(INFO, "port %u FDB default rule cannot be"
-				" configured - only Eswitch group 0 flows are"
-				" supported.", dev->data->port_id);
+	if (priv->sh->config.fdb_def_rule) {
+		if (priv->sh->config.dv_esw_en) {
+			if (mlx5_flow_create_esw_table_zero_flow(dev))
+				priv->fdb_def_rule = 1;
+			else
+				DRV_LOG(INFO, "port %u FDB default rule cannot be configured - only Eswitch group 0 flows are supported.",
+					dev->data->port_id);
+		}
+	} else {
+		DRV_LOG(INFO, "port %u FDB default rule is disabled",
+			dev->data->port_id);
 	}
 	if (!priv->sh->config.lacp_by_user && priv->pf_bond >= 0) {
 		ret = mlx5_flow_lacp_miss(dev);
@@ -1470,7 +1524,14 @@ mlx5_traffic_enable(struct rte_eth_dev *dev)
 void
 mlx5_traffic_disable(struct rte_eth_dev *dev)
 {
-	mlx5_flow_list_flush(dev, MLX5_FLOW_TYPE_CTL, false);
+#ifdef HAVE_MLX5_HWS_SUPPORT
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (priv->sh->config.dv_flow_en == 2)
+		mlx5_flow_hw_flush_ctrl_flows(dev);
+	else
+#endif
+		mlx5_flow_list_flush(dev, MLX5_FLOW_TYPE_CTL, false);
 }
 
 /**
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v4 06/18] net/mlx5: add extended metadata mode for hardware steering
  2022-10-19 16:25 ` [PATCH v4 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (4 preceding siblings ...)
  2022-10-19 16:25   ` [PATCH v4 05/18] net/mlx5: add HW steering port action Suanming Mou
@ 2022-10-19 16:25   ` Suanming Mou
  2022-10-19 16:25   ` [PATCH v4 07/18] net/mlx5: add HW steering meter action Suanming Mou
                     ` (11 subsequent siblings)
  17 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-10-19 16:25 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Bing Zhao

From: Bing Zhao <bingz@nvidia.com>

The new mode 4 of the devarg "dv_xmeta_en" is added for HWS only. In this
mode, copying the 32-bit Rx / Tx metadata between the FDB and NIC domains
is supported. The mark is supported only in the NIC domain and is not
copied.
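
The mode is selected at device probe time together with dv_flow_en=2.
A minimal sketch of probing a port with this mode enabled is shown below
(the PCI address in the devargs string is an assumption for illustration
only):

    #include <rte_dev.h>

    /* Probe a port with HW steering and extended metadata mode 4. */
    static int
    probe_with_hws_xmeta(void)
    {
            return rte_dev_probe("0000:08:00.0,dv_flow_en=2,dv_xmeta_en=4");
    }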

Signed-off-by: Bing Zhao <bingz@nvidia.com>
---
 doc/guides/nics/mlx5.rst         |   4 +
 drivers/net/mlx5/linux/mlx5_os.c |  10 +-
 drivers/net/mlx5/mlx5.c          |   7 +-
 drivers/net/mlx5/mlx5.h          |   8 +-
 drivers/net/mlx5/mlx5_flow.c     |   8 +-
 drivers/net/mlx5/mlx5_flow.h     |  14 +
 drivers/net/mlx5/mlx5_flow_dv.c  |  43 +-
 drivers/net/mlx5/mlx5_flow_hw.c  | 864 ++++++++++++++++++++++++++++---
 drivers/net/mlx5/mlx5_trigger.c  |   3 +
 9 files changed, 876 insertions(+), 85 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 7d2095f075..0c7bd042a4 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -980,6 +980,10 @@ for an additional list of options shared with other mlx5 drivers.
   - 3, this engages tunnel offload mode. In E-Switch configuration, that
     mode implicitly activates ``dv_xmeta_en=1``.
 
+  - 4, this mode is only supported in HWS (``dv_flow_en=2``). Copying the
+    32-bit Rx / Tx metadata between FDB and NIC domains is supported. The
+    mark is supported only in the NIC domain and is not copied.
+
   +------+-----------+-----------+-------------+-------------+
   | Mode | ``MARK``  | ``META``  | ``META`` Tx | FDB/Through |
   +======+===========+===========+=============+=============+
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index d674b54624..c70cd84b8d 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1554,6 +1554,15 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 #ifdef HAVE_MLX5_HWS_SUPPORT
 		if (priv->vport_meta_mask)
 			flow_hw_set_port_info(eth_dev);
+		if (priv->sh->config.dv_esw_en &&
+		    priv->sh->config.dv_xmeta_en != MLX5_XMETA_MODE_LEGACY &&
+		    priv->sh->config.dv_xmeta_en != MLX5_XMETA_MODE_META32_HWS) {
+			DRV_LOG(ERR,
+				"metadata mode %u is not supported in HWS eswitch mode",
+				priv->sh->config.dv_xmeta_en);
+			err = ENOTSUP;
+			goto error;
+		}
 		/* Only HWS requires this information. */
 		flow_hw_init_tags_set(eth_dev);
 		if (priv->sh->config.dv_esw_en &&
@@ -1569,7 +1578,6 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 		goto error;
 #endif
 	}
-	/* Port representor shares the same max priority with pf port. */
 	if (!priv->sh->flow_priority_check_flag) {
 		/* Supported Verbs flow priority number detection. */
 		err = mlx5_flow_discover_priorities(eth_dev);
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 470b9c2d0f..9cd4892858 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1218,7 +1218,8 @@ mlx5_dev_args_check_handler(const char *key, const char *val, void *opaque)
 		if (tmp != MLX5_XMETA_MODE_LEGACY &&
 		    tmp != MLX5_XMETA_MODE_META16 &&
 		    tmp != MLX5_XMETA_MODE_META32 &&
-		    tmp != MLX5_XMETA_MODE_MISS_INFO) {
+		    tmp != MLX5_XMETA_MODE_MISS_INFO &&
+		    tmp != MLX5_XMETA_MODE_META32_HWS) {
 			DRV_LOG(ERR, "Invalid extensive metadata parameter.");
 			rte_errno = EINVAL;
 			return -rte_errno;
@@ -2849,6 +2850,10 @@ mlx5_set_metadata_mask(struct rte_eth_dev *dev)
 		meta = UINT32_MAX;
 		mark = (reg_c0 >> rte_bsf32(reg_c0)) & MLX5_FLOW_MARK_MASK;
 		break;
+	case MLX5_XMETA_MODE_META32_HWS:
+		meta = UINT32_MAX;
+		mark = MLX5_FLOW_MARK_MASK;
+		break;
 	default:
 		meta = 0;
 		mark = 0;
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 69a0a60030..6e7216efab 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -303,8 +303,8 @@ struct mlx5_sh_config {
 	uint32_t reclaim_mode:2; /* Memory reclaim mode. */
 	uint32_t dv_esw_en:1; /* Enable E-Switch DV flow. */
 	/* Enable DV flow. 1 means SW steering, 2 means HW steering. */
-	unsigned int dv_flow_en:2;
-	uint32_t dv_xmeta_en:2; /* Enable extensive flow metadata. */
+	uint32_t dv_flow_en:2; /* Enable DV flow. */
+	uint32_t dv_xmeta_en:3; /* Enable extensive flow metadata. */
 	uint32_t dv_miss_info:1; /* Restore packet after partial hw miss. */
 	uint32_t l3_vxlan_en:1; /* Enable L3 VXLAN flow creation. */
 	uint32_t vf_nl_en:1; /* Enable Netlink requests in VF mode. */
@@ -317,7 +317,6 @@ struct mlx5_sh_config {
 	uint32_t fdb_def_rule:1; /* Create FDB default jump rule */
 };
 
-
 /* Structure for VF VLAN workaround. */
 struct mlx5_vf_vlan {
 	uint32_t tag:12;
@@ -1284,12 +1283,12 @@ struct mlx5_dev_ctx_shared {
 	struct mlx5_lb_ctx self_lb; /* QP to enable self loopback for Devx. */
 	unsigned int flow_max_priority;
 	enum modify_reg flow_mreg_c[MLX5_MREG_C_NUM];
+	/* Availability of mreg_c's. */
 	void *devx_channel_lwm;
 	struct rte_intr_handle *intr_handle_lwm;
 	pthread_mutex_t lwm_config_lock;
 	uint32_t host_shaper_rate:8;
 	uint32_t lwm_triggered:1;
-	/* Availability of mreg_c's. */
 	struct mlx5_dev_shared_port port[]; /* per device port data array. */
 };
 
@@ -1515,6 +1514,7 @@ struct mlx5_priv {
 	struct rte_flow_template_table *hw_esw_sq_miss_root_tbl;
 	struct rte_flow_template_table *hw_esw_sq_miss_tbl;
 	struct rte_flow_template_table *hw_esw_zero_tbl;
+	struct rte_flow_template_table *hw_tx_meta_cpy_tbl;
 	struct mlx5_indexed_pool *flows[MLX5_FLOW_TYPE_MAXI];
 	/* RTE Flow rules. */
 	uint32_t ctrl_flows; /* Control flow rules. */
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 60f76f5a43..3b8e97ccd0 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -1109,6 +1109,8 @@ mlx5_flow_get_reg_id(struct rte_eth_dev *dev,
 			return REG_C_0;
 		case MLX5_XMETA_MODE_META32:
 			return REG_C_1;
+		case MLX5_XMETA_MODE_META32_HWS:
+			return REG_C_1;
 		}
 		break;
 	case MLX5_METADATA_TX:
@@ -1121,11 +1123,14 @@ mlx5_flow_get_reg_id(struct rte_eth_dev *dev,
 			return REG_C_0;
 		case MLX5_XMETA_MODE_META32:
 			return REG_C_1;
+		case MLX5_XMETA_MODE_META32_HWS:
+			return REG_C_1;
 		}
 		break;
 	case MLX5_FLOW_MARK:
 		switch (config->dv_xmeta_en) {
 		case MLX5_XMETA_MODE_LEGACY:
+		case MLX5_XMETA_MODE_META32_HWS:
 			return REG_NON;
 		case MLX5_XMETA_MODE_META16:
 			return REG_C_1;
@@ -4444,7 +4449,8 @@ static bool flow_check_modify_action_type(struct rte_eth_dev *dev,
 		return true;
 	case RTE_FLOW_ACTION_TYPE_FLAG:
 	case RTE_FLOW_ACTION_TYPE_MARK:
-		if (priv->sh->config.dv_xmeta_en != MLX5_XMETA_MODE_LEGACY)
+		if (priv->sh->config.dv_xmeta_en != MLX5_XMETA_MODE_LEGACY &&
+		    priv->sh->config.dv_xmeta_en != MLX5_XMETA_MODE_META32_HWS)
 			return true;
 		else
 			return false;
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 25b44ccca2..b0af13886a 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -48,6 +48,12 @@ enum mlx5_rte_flow_action_type {
 	MLX5_RTE_FLOW_ACTION_TYPE_RSS,
 };
 
+/* Private (internal) Field IDs for MODIFY_FIELD action. */
+enum mlx5_rte_flow_field_id {
+	MLX5_RTE_FLOW_FIELD_END = INT_MIN,
+	MLX5_RTE_FLOW_FIELD_META_REG,
+};
+
 #define MLX5_INDIRECT_ACTION_TYPE_OFFSET 30
 
 enum {
@@ -1178,6 +1184,7 @@ struct rte_flow_actions_template {
 	struct rte_flow_action *masks; /* Cached action masks.*/
 	uint16_t mhdr_off; /* Offset of DR modify header action. */
 	uint32_t refcnt; /* Reference counter. */
+	uint16_t rx_cpy_pos; /* Action position of Rx metadata to be copied. */
 };
 
 /* Jump action struct. */
@@ -1254,6 +1261,11 @@ struct mlx5_flow_group {
 #define MLX5_HW_TBL_MAX_ITEM_TEMPLATE 2
 #define MLX5_HW_TBL_MAX_ACTION_TEMPLATE 32
 
+struct mlx5_flow_template_table_cfg {
+	struct rte_flow_template_table_attr attr; /* Table attributes passed through flow API. */
+	bool external; /* True if created by flow API, false if table is internal to PMD. */
+};
+
 struct rte_flow_template_table {
 	LIST_ENTRY(rte_flow_template_table) next;
 	struct mlx5_flow_group *grp; /* The group rte_flow_template_table uses. */
@@ -1263,6 +1275,7 @@ struct rte_flow_template_table {
 	/* Action templates bind to the table. */
 	struct mlx5_hw_action_template ats[MLX5_HW_TBL_MAX_ACTION_TEMPLATE];
 	struct mlx5_indexed_pool *flow; /* The table's flow ipool. */
+	struct mlx5_flow_template_table_cfg cfg;
 	uint32_t type; /* Flow table type RX/TX/FDB. */
 	uint8_t nb_item_templates; /* Item template number. */
 	uint8_t nb_action_templates; /* Action template number. */
@@ -2370,4 +2383,5 @@ int mlx5_flow_hw_esw_create_mgr_sq_miss_flow(struct rte_eth_dev *dev);
 int mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev,
 					 uint32_t txq);
 int mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev);
+int mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev);
 #endif /* RTE_PMD_MLX5_FLOW_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 1ee26be975..a0bcaa5c53 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -1758,7 +1758,8 @@ mlx5_flow_field_id_to_modify_info
 			int reg;
 
 			if (priv->sh->config.dv_flow_en == 2)
-				reg = REG_C_1;
+				reg = flow_hw_get_reg_id(RTE_FLOW_ITEM_TYPE_TAG,
+							 data->level);
 			else
 				reg = mlx5_flow_get_reg_id(dev, MLX5_APP_TAG,
 							   data->level, error);
@@ -1837,6 +1838,24 @@ mlx5_flow_field_id_to_modify_info
 		else
 			info[idx].offset = off_be;
 		break;
+	case MLX5_RTE_FLOW_FIELD_META_REG:
+		{
+			uint32_t meta_mask = priv->sh->dv_meta_mask;
+			uint32_t meta_count = __builtin_popcount(meta_mask);
+			uint32_t reg = data->level;
+
+			RTE_SET_USED(meta_count);
+			MLX5_ASSERT(data->offset + width <= meta_count);
+			MLX5_ASSERT(reg != REG_NON);
+			MLX5_ASSERT(reg < RTE_DIM(reg_to_field));
+			info[idx] = (struct field_modify_info){4, 0, reg_to_field[reg]};
+			if (mask)
+				mask[idx] = flow_modify_info_mask_32_masked
+					(width, data->offset, meta_mask);
+			else
+				info[idx].offset = data->offset;
+		}
+		break;
 	case RTE_FLOW_FIELD_POINTER:
 	case RTE_FLOW_FIELD_VALUE:
 	default:
@@ -9794,7 +9813,19 @@ flow_dv_translate_item_meta(struct rte_eth_dev *dev,
 	mask = meta_m->data;
 	if (key_type == MLX5_SET_MATCHER_HS_M)
 		mask = value;
-	reg = flow_dv_get_metadata_reg(dev, attr, NULL);
+	/*
+	 * In the current implementation, REG_B cannot be used to match.
+	 * Force the use of REG_C_1 in the HWS root table, as in other tables.
+	 * This mapping may change.
+	 * NIC: modify - REG_B to be present in SW
+	 *      match - REG_C_1 when copied from FDB, different from SWS
+	 * FDB: modify - REG_C_1 in Xmeta mode, REG_NON in legacy mode
+	 *      match - REG_C_1 in FDB
+	 */
+	if (!!(key_type & MLX5_SET_MATCHER_SW))
+		reg = flow_dv_get_metadata_reg(dev, attr, NULL);
+	else
+		reg = flow_hw_get_reg_id(RTE_FLOW_ITEM_TYPE_META, 0);
 	if (reg < 0)
 		return;
 	MLX5_ASSERT(reg != REG_NON);
@@ -9894,7 +9925,10 @@ flow_dv_translate_item_tag(struct rte_eth_dev *dev, void *key,
 	/* When set mask, the index should be from spec. */
 	index = tag_vv ? tag_vv->index : tag_v->index;
 	/* Get the metadata register index for the tag. */
-	reg = mlx5_flow_get_reg_id(dev, MLX5_APP_TAG, index, NULL);
+	if (!!(key_type & MLX5_SET_MATCHER_SW))
+		reg = mlx5_flow_get_reg_id(dev, MLX5_APP_TAG, index, NULL);
+	else
+		reg = flow_hw_get_reg_id(RTE_FLOW_ITEM_TYPE_TAG, index);
 	MLX5_ASSERT(reg > 0);
 	flow_dv_match_meta_reg(key, reg, tag_v->data, tag_m->data);
 }
@@ -13412,7 +13446,8 @@ flow_dv_translate_items_sws(struct rte_eth_dev *dev,
 	 */
 	if (!(wks.item_flags & MLX5_FLOW_ITEM_PORT_ID) &&
 	    !(wks.item_flags & MLX5_FLOW_ITEM_REPRESENTED_PORT) && priv->sh->esw_mode &&
-	    !(attr->egress && !attr->transfer)) {
+	    !(attr->egress && !attr->transfer) &&
+	    attr->group != MLX5_FLOW_MREG_CP_TABLE_GROUP) {
 		if (flow_dv_translate_item_port_id_all(dev, match_mask,
 						   match_value, NULL, attr))
 			return -rte_errno;
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 991e4c9b7b..319c8d1a89 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -20,13 +20,27 @@
 /* Default queue to flush the flows. */
 #define MLX5_DEFAULT_FLUSH_QUEUE 0
 
-/* Maximum number of rules in control flow tables */
+/* Maximum number of rules in control flow tables. */
 #define MLX5_HW_CTRL_FLOW_NB_RULES (4096)
 
-/* Flow group for SQ miss default flows/ */
-#define MLX5_HW_SQ_MISS_GROUP (UINT32_MAX)
+/* Lowest flow group usable by an application. */
+#define MLX5_HW_LOWEST_USABLE_GROUP (1)
+
+/* Maximum group index usable by user applications for transfer flows. */
+#define MLX5_HW_MAX_TRANSFER_GROUP (UINT32_MAX - 1)
+
+/* Lowest priority for HW root table. */
+#define MLX5_HW_LOWEST_PRIO_ROOT 15
+
+/* Lowest priority for HW non-root table. */
+#define MLX5_HW_LOWEST_PRIO_NON_ROOT (UINT32_MAX)
 
 static int flow_hw_flush_all_ctrl_flows(struct rte_eth_dev *dev);
+static int flow_hw_translate_group(struct rte_eth_dev *dev,
+				   const struct mlx5_flow_template_table_cfg *cfg,
+				   uint32_t group,
+				   uint32_t *table_group,
+				   struct rte_flow_error *error);
 
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops;
 
@@ -213,12 +227,12 @@ flow_hw_rss_item_flags_get(const struct rte_flow_item items[])
  */
 static struct mlx5_hw_jump_action *
 flow_hw_jump_action_register(struct rte_eth_dev *dev,
-			     const struct rte_flow_attr *attr,
+			     const struct mlx5_flow_template_table_cfg *cfg,
 			     uint32_t dest_group,
 			     struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct rte_flow_attr jattr = *attr;
+	struct rte_flow_attr jattr = cfg->attr.flow_attr;
 	struct mlx5_flow_group *grp;
 	struct mlx5_flow_cb_ctx ctx = {
 		.dev = dev,
@@ -226,9 +240,13 @@ flow_hw_jump_action_register(struct rte_eth_dev *dev,
 		.data = &jattr,
 	};
 	struct mlx5_list_entry *ge;
+	uint32_t target_group;
 
-	jattr.group = dest_group;
-	ge = mlx5_hlist_register(priv->sh->flow_tbls, dest_group, &ctx);
+	target_group = dest_group;
+	if (flow_hw_translate_group(dev, cfg, dest_group, &target_group, error))
+		return NULL;
+	jattr.group = target_group;
+	ge = mlx5_hlist_register(priv->sh->flow_tbls, target_group, &ctx);
 	if (!ge)
 		return NULL;
 	grp = container_of(ge, struct mlx5_flow_group, entry);
@@ -760,7 +778,8 @@ flow_hw_modify_field_compile(struct rte_eth_dev *dev,
 				(void *)(uintptr_t)conf->src.pvalue :
 				(void *)(uintptr_t)&conf->src.value;
 		if (conf->dst.field == RTE_FLOW_FIELD_META ||
-		    conf->dst.field == RTE_FLOW_FIELD_TAG) {
+		    conf->dst.field == RTE_FLOW_FIELD_TAG ||
+		    conf->dst.field == (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG) {
 			value = *(const unaligned_uint32_t *)item.spec;
 			value = rte_cpu_to_be_32(value);
 			item.spec = &value;
@@ -860,6 +879,9 @@ flow_hw_represented_port_compile(struct rte_eth_dev *dev,
 	if (m && !!m->port_id) {
 		struct mlx5_priv *port_priv;
 
+		if (!v)
+			return rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ACTION,
+						  action, "port index was not provided");
 		port_priv = mlx5_port_to_eswitch_info(v->port_id, false);
 		if (port_priv == NULL)
 			return rte_flow_error_set
@@ -903,8 +925,8 @@ flow_hw_represented_port_compile(struct rte_eth_dev *dev,
  *
  * @param[in] dev
  *   Pointer to the rte_eth_dev structure.
- * @param[in] table_attr
- *   Pointer to the table attributes.
+ * @param[in] cfg
+ *   Pointer to the table configuration.
  * @param[in] item_templates
  *   Item template array to be binded to the table.
  * @param[in/out] acts
@@ -919,12 +941,13 @@ flow_hw_represented_port_compile(struct rte_eth_dev *dev,
  */
 static int
 flow_hw_actions_translate(struct rte_eth_dev *dev,
-			  const struct rte_flow_template_table_attr *table_attr,
+			  const struct mlx5_flow_template_table_cfg *cfg,
 			  struct mlx5_hw_actions *acts,
 			  struct rte_flow_actions_template *at,
 			  struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_template_table_attr *table_attr = &cfg->attr;
 	const struct rte_flow_attr *attr = &table_attr->flow_attr;
 	struct rte_flow_action *actions = at->actions;
 	struct rte_flow_action *action_start = actions;
@@ -991,7 +1014,7 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 					((const struct rte_flow_action_jump *)
 					actions->conf)->group;
 				acts->jump = flow_hw_jump_action_register
-						(dev, attr, jump_group, error);
+						(dev, cfg, jump_group, error);
 				if (!acts->jump)
 					goto err;
 				acts->rule_acts[i].action = (!!attr->group) ?
@@ -1101,6 +1124,16 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 							   error);
 			if (err)
 				goto err;
+			/*
+			 * Adjust the action source position for the following:
+			 * ... / MODIFY_FIELD: rx_cpy_pos / (QUEUE|RSS) / ...
+			 * The next action will be Q/RSS; there will not be
+			 * another adjustment, and the real source position of
+			 * the following actions is decreased by 1. The total
+			 * number of actions in the new template is unchanged.
+			 */
+			if ((actions - action_start) == at->rx_cpy_pos)
+				action_start += 1;
 			break;
 		case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
 			if (flow_hw_represented_port_compile
@@ -1365,7 +1398,8 @@ flow_hw_modify_field_construct(struct mlx5_hw_q_job *job,
 	else
 		rte_memcpy(values, mhdr_action->src.pvalue, sizeof(values));
 	if (mhdr_action->dst.field == RTE_FLOW_FIELD_META ||
-	    mhdr_action->dst.field == RTE_FLOW_FIELD_TAG) {
+	    mhdr_action->dst.field == RTE_FLOW_FIELD_TAG ||
+	    mhdr_action->dst.field == (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG) {
 		value_p = (unaligned_uint32_t *)values;
 		*value_p = rte_cpu_to_be_32(*value_p);
 	} else if (mhdr_action->dst.field == RTE_FLOW_FIELD_GTP_PSC_QFI) {
@@ -1513,7 +1547,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			jump_group = ((const struct rte_flow_action_jump *)
 						action->conf)->group;
 			jump = flow_hw_jump_action_register
-				(dev, &attr, jump_group, NULL);
+				(dev, &table->cfg, jump_group, NULL);
 			if (!jump)
 				return -1;
 			rule_acts[act_data->action_dst].action =
@@ -1710,7 +1744,13 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	job->user_data = user_data;
 	rule_attr.user_data = job;
 	hw_acts = &table->ats[action_template_index].acts;
-	/* Construct the flow actions based on the input actions.*/
+	/*
+	 * Construct the flow actions based on the input actions.
+	 * The implicitly appended action is always fixed, like the metadata
+	 * copy action from FDB to NIC Rx.
+	 * There is no need to copy and construct a new "actions" list based
+	 * on the user's input, which saves the cost.
+	 */
 	if (flow_hw_actions_construct(dev, job, hw_acts, pattern_template_index,
 				  actions, rule_acts, &acts_num)) {
 		rte_errno = EINVAL;
@@ -1981,6 +2021,8 @@ flow_hw_q_flow_flush(struct rte_eth_dev *dev,
 	/* Flush flow per-table from MLX5_DEFAULT_FLUSH_QUEUE. */
 	hw_q = &priv->hw_q[MLX5_DEFAULT_FLUSH_QUEUE];
 	LIST_FOREACH(tbl, &priv->flow_hw_tbl, next) {
+		if (!tbl->cfg.external)
+			continue;
 		MLX5_IPOOL_FOREACH(tbl->flow, fidx, flow) {
 			if (flow_hw_async_flow_destroy(dev,
 						MLX5_DEFAULT_FLUSH_QUEUE,
@@ -2018,8 +2060,8 @@ flow_hw_q_flow_flush(struct rte_eth_dev *dev,
  *
  * @param[in] dev
  *   Pointer to the rte_eth_dev structure.
- * @param[in] attr
- *   Pointer to the table attributes.
+ * @param[in] table_cfg
+ *   Pointer to the table configuration.
  * @param[in] item_templates
  *   Item template array to be binded to the table.
  * @param[in] nb_item_templates
@@ -2036,7 +2078,7 @@ flow_hw_q_flow_flush(struct rte_eth_dev *dev,
  */
 static struct rte_flow_template_table *
 flow_hw_table_create(struct rte_eth_dev *dev,
-		     const struct rte_flow_template_table_attr *attr,
+		     const struct mlx5_flow_template_table_cfg *table_cfg,
 		     struct rte_flow_pattern_template *item_templates[],
 		     uint8_t nb_item_templates,
 		     struct rte_flow_actions_template *action_templates[],
@@ -2048,6 +2090,7 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	struct rte_flow_template_table *tbl = NULL;
 	struct mlx5_flow_group *grp;
 	struct mlx5dr_match_template *mt[MLX5_HW_TBL_MAX_ITEM_TEMPLATE];
+	const struct rte_flow_template_table_attr *attr = &table_cfg->attr;
 	struct rte_flow_attr flow_attr = attr->flow_attr;
 	struct mlx5_flow_cb_ctx ctx = {
 		.dev = dev,
@@ -2088,6 +2131,7 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	tbl = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*tbl), 0, rte_socket_id());
 	if (!tbl)
 		goto error;
+	tbl->cfg = *table_cfg;
 	/* Allocate flow indexed pool. */
 	tbl->flow = mlx5_ipool_create(&cfg);
 	if (!tbl->flow)
@@ -2131,7 +2175,7 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 			goto at_error;
 		}
 		LIST_INIT(&tbl->ats[i].acts.act_list);
-		err = flow_hw_actions_translate(dev, attr,
+		err = flow_hw_actions_translate(dev, &tbl->cfg,
 						&tbl->ats[i].acts,
 						action_templates[i], error);
 		if (err) {
@@ -2174,6 +2218,96 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	return NULL;
 }
 
+/**
+ * Translates the group index specified by the user in @p group to the
+ * internal group index.
+ *
+ * Translation is done by incrementing group index, so group n becomes n + 1.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] cfg
+ *   Pointer to the template table configuration.
+ * @param[in] group
+ *   Currently used group index (table group or jump destination).
+ * @param[out] table_group
+ *   Pointer to output group index.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success. Otherwise, returns negative error code, rte_errno is set
+ *   and error structure is filled.
+ */
+static int
+flow_hw_translate_group(struct rte_eth_dev *dev,
+			const struct mlx5_flow_template_table_cfg *cfg,
+			uint32_t group,
+			uint32_t *table_group,
+			struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_attr *flow_attr = &cfg->attr.flow_attr;
+
+	if (priv->sh->config.dv_esw_en && cfg->external && flow_attr->transfer) {
+		if (group > MLX5_HW_MAX_TRANSFER_GROUP)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
+						  NULL,
+						  "group index not supported");
+		*table_group = group + 1;
+	} else {
+		*table_group = group;
+	}
+	return 0;
+}
+
+/**
+ * Create flow table.
+ *
+ * This function is a wrapper around @ref flow_hw_table_create(). It translates
+ * the parameters provided by the user into proper internal values.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] attr
+ *   Pointer to the table attributes.
+ * @param[in] item_templates
+ *   Item template array to be bound to the table.
+ * @param[in] nb_item_templates
+ *   Number of item templates.
+ * @param[in] action_templates
+ *   Action template array to be bound to the table.
+ * @param[in] nb_action_templates
+ *   Number of action templates.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   Table pointer on success. Otherwise, NULL is returned, rte_errno is set
+ *   and the error structure is filled.
+ */
+static struct rte_flow_template_table *
+flow_hw_template_table_create(struct rte_eth_dev *dev,
+			      const struct rte_flow_template_table_attr *attr,
+			      struct rte_flow_pattern_template *item_templates[],
+			      uint8_t nb_item_templates,
+			      struct rte_flow_actions_template *action_templates[],
+			      uint8_t nb_action_templates,
+			      struct rte_flow_error *error)
+{
+	struct mlx5_flow_template_table_cfg cfg = {
+		.attr = *attr,
+		.external = true,
+	};
+	uint32_t group = attr->flow_attr.group;
+
+	if (flow_hw_translate_group(dev, &cfg, group, &cfg.attr.flow_attr.group, error))
+		return NULL;
+	return flow_hw_table_create(dev, &cfg, item_templates, nb_item_templates,
+				    action_templates, nb_action_templates, error);
+}
+
 /**
  * Destroy flow table.
  *
@@ -2309,10 +2443,13 @@ flow_hw_validate_action_represented_port(struct rte_eth_dev *dev,
 					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
 					  "cannot use represented_port actions"
 					  " without an E-Switch");
-	if (mask_conf->port_id) {
+	if (mask_conf && mask_conf->port_id) {
 		struct mlx5_priv *port_priv;
 		struct mlx5_priv *dev_priv;
 
+		if (!action_conf)
+			return rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ACTION,
+						  action, "port index was not provided");
 		port_priv = mlx5_port_to_eswitch_info(action_conf->port_id, false);
 		if (!port_priv)
 			return rte_flow_error_set(error, rte_errno,
@@ -2337,20 +2474,77 @@ flow_hw_validate_action_represented_port(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static inline int
+flow_hw_action_meta_copy_insert(const struct rte_flow_action actions[],
+				const struct rte_flow_action masks[],
+				const struct rte_flow_action *ins_actions,
+				const struct rte_flow_action *ins_masks,
+				struct rte_flow_action *new_actions,
+				struct rte_flow_action *new_masks,
+				uint16_t *ins_pos)
+{
+	uint16_t idx, total = 0;
+	bool ins = false;
+	bool act_end = false;
+
+	MLX5_ASSERT(actions && masks);
+	MLX5_ASSERT(new_actions && new_masks);
+	MLX5_ASSERT(ins_actions && ins_masks);
+	for (idx = 0; !act_end; idx++) {
+		if (idx >= MLX5_HW_MAX_ACTS)
+			return -1;
+		if (actions[idx].type == RTE_FLOW_ACTION_TYPE_RSS ||
+		    actions[idx].type == RTE_FLOW_ACTION_TYPE_QUEUE) {
+			ins = true;
+			*ins_pos = idx;
+		}
+		if (actions[idx].type == RTE_FLOW_ACTION_TYPE_END)
+			act_end = true;
+	}
+	if (!ins)
+		return 0;
+	else if (idx == MLX5_HW_MAX_ACTS)
+		return -1; /* No more space. */
+	total = idx;
+	/* Before the position, no change for the actions. */
+	for (idx = 0; idx < *ins_pos; idx++) {
+		new_actions[idx] = actions[idx];
+		new_masks[idx] = masks[idx];
+	}
+	/* Insert the new action and mask to the position. */
+	new_actions[idx] = *ins_actions;
+	new_masks[idx] = *ins_masks;
+	/* Remaining content is right shifted by one position. */
+	for (; idx < total; idx++) {
+		new_actions[idx + 1] = actions[idx];
+		new_masks[idx + 1] = masks[idx];
+	}
+	return 0;
+}
+
 static int
 flow_hw_action_validate(struct rte_eth_dev *dev,
+			const struct rte_flow_actions_template_attr *attr,
 			const struct rte_flow_action actions[],
 			const struct rte_flow_action masks[],
 			struct rte_flow_error *error)
 {
-	int i;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint16_t i;
 	bool actions_end = false;
 	int ret;
 
+	/* FDB actions are only valid on the proxy port. */
+	if (attr->transfer && (!priv->sh->config.dv_esw_en || !priv->master))
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL,
+					  "transfer actions are only valid on the proxy port");
 	for (i = 0; !actions_end; ++i) {
 		const struct rte_flow_action *action = &actions[i];
 		const struct rte_flow_action *mask = &masks[i];
 
+		MLX5_ASSERT(i < MLX5_HW_MAX_ACTS);
 		if (action->type != mask->type)
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ACTION,
@@ -2447,21 +2641,77 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	int len, act_len, mask_len, i;
-	struct rte_flow_actions_template *at;
+	struct rte_flow_actions_template *at = NULL;
+	uint16_t pos = MLX5_HW_MAX_ACTS;
+	struct rte_flow_action tmp_action[MLX5_HW_MAX_ACTS];
+	struct rte_flow_action tmp_mask[MLX5_HW_MAX_ACTS];
+	const struct rte_flow_action *ra;
+	const struct rte_flow_action *rm;
+	const struct rte_flow_action_modify_field rx_mreg = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_B,
+		},
+		.src = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_C_1,
+		},
+		.width = 32,
+	};
+	const struct rte_flow_action_modify_field rx_mreg_mask = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.src = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.width = UINT32_MAX,
+	};
+	const struct rte_flow_action rx_cpy = {
+		.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+		.conf = &rx_mreg,
+	};
+	const struct rte_flow_action rx_cpy_mask = {
+		.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+		.conf = &rx_mreg_mask,
+	};
 
-	if (flow_hw_action_validate(dev, actions, masks, error))
+	if (flow_hw_action_validate(dev, attr, actions, masks, error))
 		return NULL;
-	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS,
-				NULL, 0, actions, error);
+	if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
+	    priv->sh->config.dv_esw_en) {
+		if (flow_hw_action_meta_copy_insert(actions, masks, &rx_cpy, &rx_cpy_mask,
+						    tmp_action, tmp_mask, &pos)) {
+			rte_flow_error_set(error, EINVAL,
+					   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					   "Failed to concatenate new action/mask");
+			return NULL;
+		}
+	}
+	/* Application should make sure only one Q/RSS exists in one rule. */
+	if (pos == MLX5_HW_MAX_ACTS) {
+		ra = actions;
+		rm = masks;
+	} else {
+		ra = tmp_action;
+		rm = tmp_mask;
+	}
+	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, NULL, 0, ra, error);
 	if (act_len <= 0)
 		return NULL;
 	len = RTE_ALIGN(act_len, 16);
-	mask_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS,
-				 NULL, 0, masks, error);
+	mask_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, NULL, 0, rm, error);
 	if (mask_len <= 0)
 		return NULL;
 	len += RTE_ALIGN(mask_len, 16);
-	at = mlx5_malloc(MLX5_MEM_ZERO, len + sizeof(*at), 64, rte_socket_id());
+	at = mlx5_malloc(MLX5_MEM_ZERO, len + sizeof(*at),
+			 RTE_CACHE_LINE_SIZE, rte_socket_id());
 	if (!at) {
 		rte_flow_error_set(error, ENOMEM,
 				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
@@ -2469,18 +2719,20 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 				   "cannot allocate action template");
 		return NULL;
 	}
+	/* Actions part is in the first half. */
 	at->attr = *attr;
 	at->actions = (struct rte_flow_action *)(at + 1);
-	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, at->actions, len,
-				actions, error);
+	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, at->actions,
+				len, ra, error);
 	if (act_len <= 0)
 		goto error;
-	at->masks = (struct rte_flow_action *)
-		    (((uint8_t *)at->actions) + act_len);
+	/* Masks part is in the second half. */
+	at->masks = (struct rte_flow_action *)(((uint8_t *)at->actions) + act_len);
 	mask_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, at->masks,
-				 len - act_len, masks, error);
+				 len - act_len, rm, error);
 	if (mask_len <= 0)
 		goto error;
+	at->rx_cpy_pos = pos;
 	/*
 	 * mlx5 PMD hacks indirect action index directly to the action conf.
 	 * The rte_flow_conv() function copies the content from conf pointer.
@@ -2497,7 +2749,8 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 	LIST_INSERT_HEAD(&priv->flow_hw_at, at, next);
 	return at;
 error:
-	mlx5_free(at);
+	if (at)
+		mlx5_free(at);
 	return NULL;
 }
 
@@ -2572,6 +2825,80 @@ flow_hw_copy_prepend_port_item(const struct rte_flow_item *items,
 	return copied_items;
 }
 
+static int
+flow_hw_pattern_validate(struct rte_eth_dev *dev,
+			 const struct rte_flow_pattern_template_attr *attr,
+			 const struct rte_flow_item items[],
+			 struct rte_flow_error *error)
+{
+	int i;
+	bool items_end = false;
+	RTE_SET_USED(dev);
+	RTE_SET_USED(attr);
+
+	for (i = 0; !items_end; i++) {
+		int type = items[i].type;
+
+		switch (type) {
+		case RTE_FLOW_ITEM_TYPE_TAG:
+		{
+			int reg;
+			const struct rte_flow_item_tag *tag =
+				(const struct rte_flow_item_tag *)items[i].spec;
+
+			reg = flow_hw_get_reg_id(RTE_FLOW_ITEM_TYPE_TAG, tag->index);
+			if (reg == REG_NON)
+				return rte_flow_error_set(error, EINVAL,
+							  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+							  NULL,
+							  "Unsupported tag index");
+			break;
+		}
+		case MLX5_RTE_FLOW_ITEM_TYPE_TAG:
+		{
+			const struct rte_flow_item_tag *tag =
+				(const struct rte_flow_item_tag *)items[i].spec;
+			struct mlx5_priv *priv = dev->data->dev_private;
+			uint8_t regcs = (uint8_t)priv->sh->cdev->config.hca_attr.set_reg_c;
+
+			if (!((1 << (tag->index - REG_C_0)) & regcs))
+				return rte_flow_error_set(error, EINVAL,
+							  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+							  NULL,
+							  "Unsupported internal tag index");
+		}
+		case RTE_FLOW_ITEM_TYPE_VOID:
+		case RTE_FLOW_ITEM_TYPE_ETH:
+		case RTE_FLOW_ITEM_TYPE_VLAN:
+		case RTE_FLOW_ITEM_TYPE_IPV4:
+		case RTE_FLOW_ITEM_TYPE_IPV6:
+		case RTE_FLOW_ITEM_TYPE_UDP:
+		case RTE_FLOW_ITEM_TYPE_TCP:
+		case RTE_FLOW_ITEM_TYPE_GTP:
+		case RTE_FLOW_ITEM_TYPE_GTP_PSC:
+		case RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT:
+		case RTE_FLOW_ITEM_TYPE_VXLAN:
+		case MLX5_RTE_FLOW_ITEM_TYPE_SQ:
+		case RTE_FLOW_ITEM_TYPE_META:
+		case RTE_FLOW_ITEM_TYPE_GRE:
+		case RTE_FLOW_ITEM_TYPE_GRE_KEY:
+		case RTE_FLOW_ITEM_TYPE_GRE_OPTION:
+		case RTE_FLOW_ITEM_TYPE_ICMP:
+		case RTE_FLOW_ITEM_TYPE_ICMP6:
+			break;
+		case RTE_FLOW_ITEM_TYPE_END:
+			items_end = true;
+			break;
+		default:
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+						  NULL,
+						  "Unsupported item type");
+		}
+	}
+	return 0;
+}
+
 /**
  * Create flow item template.
  *
@@ -2598,6 +2925,8 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 	struct rte_flow_item *copied_items = NULL;
 	const struct rte_flow_item *tmpl_items;
 
+	if (flow_hw_pattern_validate(dev, attr, items, error))
+		return NULL;
 	if (priv->sh->config.dv_esw_en && attr->ingress) {
 		/*
 		 * Disallow pattern template with ingress and egress/transfer
@@ -3032,6 +3361,17 @@ flow_hw_free_vport_actions(struct mlx5_priv *priv)
 	priv->hw_vport = NULL;
 }
 
+static uint32_t
+flow_hw_usable_lsb_vport_mask(struct mlx5_priv *priv)
+{
+	uint32_t usable_mask = ~priv->vport_meta_mask;
+
+	if (usable_mask)
+		return (1 << rte_bsf32(usable_mask));
+	else
+		return 0;
+}
+
 /**
  * Creates a flow pattern template used to match on E-Switch Manager.
  * This template is used to set up a table for SQ miss default flow.
@@ -3070,7 +3410,10 @@ flow_hw_create_ctrl_esw_mgr_pattern_template(struct rte_eth_dev *dev)
 }
 
 /**
- * Creates a flow pattern template used to match on a TX queue.
+ * Creates a flow pattern template used to match REG_C_0 and a TX queue.
+ * Matching on REG_C_0 is set up to match on the least significant bit usable
+ * by user-space, which is set when the packet originated from the E-Switch Manager.
+ *
  * This template is used to set up a table for SQ miss default flow.
  *
  * @param dev
@@ -3080,16 +3423,30 @@ flow_hw_create_ctrl_esw_mgr_pattern_template(struct rte_eth_dev *dev)
  *   Pointer to flow pattern template on success, NULL otherwise.
  */
 static struct rte_flow_pattern_template *
-flow_hw_create_ctrl_sq_pattern_template(struct rte_eth_dev *dev)
+flow_hw_create_ctrl_regc_sq_pattern_template(struct rte_eth_dev *dev)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint32_t marker_bit = flow_hw_usable_lsb_vport_mask(priv);
 	struct rte_flow_pattern_template_attr attr = {
 		.relaxed_matching = 0,
 		.transfer = 1,
 	};
+	struct rte_flow_item_tag reg_c0_spec = {
+		.index = (uint8_t)REG_C_0,
+	};
+	struct rte_flow_item_tag reg_c0_mask = {
+		.index = 0xff,
+	};
 	struct mlx5_rte_flow_item_sq queue_mask = {
 		.queue = UINT32_MAX,
 	};
 	struct rte_flow_item items[] = {
+		{
+			.type = (enum rte_flow_item_type)
+				MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+			.spec = &reg_c0_spec,
+			.mask = &reg_c0_mask,
+		},
 		{
 			.type = (enum rte_flow_item_type)
 				MLX5_RTE_FLOW_ITEM_TYPE_SQ,
@@ -3100,6 +3457,12 @@ flow_hw_create_ctrl_sq_pattern_template(struct rte_eth_dev *dev)
 		},
 	};
 
+	if (!marker_bit) {
+		DRV_LOG(ERR, "Unable to set up pattern template for SQ miss table");
+		return NULL;
+	}
+	reg_c0_spec.data = marker_bit;
+	reg_c0_mask.data = marker_bit;
 	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
 }
 
@@ -3137,6 +3500,132 @@ flow_hw_create_ctrl_port_pattern_template(struct rte_eth_dev *dev)
 	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
 }
 
+/*
+ * Creates a flow pattern template matching all Ethernet packets. This
+ * template is used to set up a table for the default Tx copy (Tx metadata
+ * to REG_C_1) flow rule.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow pattern template on success, NULL otherwise.
+ */
+static struct rte_flow_pattern_template *
+flow_hw_create_tx_default_mreg_copy_pattern_template(struct rte_eth_dev *dev)
+{
+	struct rte_flow_pattern_template_attr tx_pa_attr = {
+		.relaxed_matching = 0,
+		.egress = 1,
+	};
+	struct rte_flow_item_eth promisc = {
+		.dst.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+		.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+		.type = 0,
+	};
+	struct rte_flow_item eth_all[] = {
+		[0] = {
+			.type = RTE_FLOW_ITEM_TYPE_ETH,
+			.spec = &promisc,
+			.mask = &promisc,
+		},
+		[1] = {
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	struct rte_flow_error drop_err;
+
+	RTE_SET_USED(drop_err);
+	return flow_hw_pattern_template_create(dev, &tx_pa_attr, eth_all, &drop_err);
+}
+
+/**
+ * Creates a flow actions template with modify field action and masked jump action.
+ * Modify field action sets the least significant bit of REG_C_0 (usable by user-space)
+ * to 1, meaning that packet was originated from E-Switch Manager. Jump action
+ * transfers steering to group 1.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow actions template on success, NULL otherwise.
+ */
+static struct rte_flow_actions_template *
+flow_hw_create_ctrl_regc_jump_actions_template(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint32_t marker_bit = flow_hw_usable_lsb_vport_mask(priv);
+	uint32_t marker_bit_mask = UINT32_MAX;
+	struct rte_flow_actions_template_attr attr = {
+		.transfer = 1,
+	};
+	struct rte_flow_action_modify_field set_reg_v = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_C_0,
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+		},
+		.width = 1,
+	};
+	struct rte_flow_action_modify_field set_reg_m = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+		},
+		.width = UINT32_MAX,
+	};
+	struct rte_flow_action_jump jump_v = {
+		.group = MLX5_HW_LOWEST_USABLE_GROUP,
+	};
+	struct rte_flow_action_jump jump_m = {
+		.group = UINT32_MAX,
+	};
+	struct rte_flow_action actions_v[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+			.conf = &set_reg_v,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump_v,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+	struct rte_flow_action actions_m[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+			.conf = &set_reg_m,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump_m,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+
+	if (!marker_bit) {
+		DRV_LOG(ERR, "Unable to set up actions template for SQ miss table");
+		return NULL;
+	}
+	set_reg_v.dst.offset = rte_bsf32(marker_bit);
+	rte_memcpy(set_reg_v.src.value, &marker_bit, sizeof(marker_bit));
+	rte_memcpy(set_reg_m.src.value, &marker_bit_mask, sizeof(marker_bit_mask));
+	return flow_hw_actions_template_create(dev, &attr, actions_v, actions_m, NULL);
+}
+
 /**
  * Creates a flow actions template with an unmasked JUMP action. Flows
  * based on this template will perform a jump to some group. This template
@@ -3231,6 +3720,73 @@ flow_hw_create_ctrl_port_actions_template(struct rte_eth_dev *dev)
 					       NULL);
 }
 
+/*
+ * Creates an actions template that uses the modify header action for register
+ * copying. This template is used to set up a table for the default Tx copy flow.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow actions template on success, NULL otherwise.
+ */
+static struct rte_flow_actions_template *
+flow_hw_create_tx_default_mreg_copy_actions_template(struct rte_eth_dev *dev)
+{
+	struct rte_flow_actions_template_attr tx_act_attr = {
+		.egress = 1,
+	};
+	const struct rte_flow_action_modify_field mreg_action = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_C_1,
+		},
+		.src = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_A,
+		},
+		.width = 32,
+	};
+	const struct rte_flow_action_modify_field mreg_mask = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.src = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.width = UINT32_MAX,
+	};
+	const struct rte_flow_action copy_reg_action[] = {
+		[0] = {
+			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+			.conf = &mreg_action,
+		},
+		[1] = {
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		},
+	};
+	const struct rte_flow_action copy_reg_mask[] = {
+		[0] = {
+			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+			.conf = &mreg_mask,
+		},
+		[1] = {
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		},
+	};
+	struct rte_flow_error drop_err;
+
+	RTE_SET_USED(drop_err);
+	return flow_hw_actions_template_create(dev, &tx_act_attr, copy_reg_action,
+					       copy_reg_mask, &drop_err);
+}
+
 /**
  * Creates a control flow table used to transfer traffic from E-Switch Manager
  * and TX queues from group 0 to group 1.
@@ -3260,8 +3816,12 @@ flow_hw_create_ctrl_sq_miss_root_table(struct rte_eth_dev *dev,
 		},
 		.nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
 	};
+	struct mlx5_flow_template_table_cfg cfg = {
+		.attr = attr,
+		.external = false,
+	};
 
-	return flow_hw_table_create(dev, &attr, &it, 1, &at, 1, NULL);
+	return flow_hw_table_create(dev, &cfg, &it, 1, &at, 1, NULL);
 }
 
 
@@ -3286,16 +3846,56 @@ flow_hw_create_ctrl_sq_miss_table(struct rte_eth_dev *dev,
 {
 	struct rte_flow_template_table_attr attr = {
 		.flow_attr = {
-			.group = MLX5_HW_SQ_MISS_GROUP,
-			.priority = 0,
+			.group = 1,
+			.priority = MLX5_HW_LOWEST_PRIO_NON_ROOT,
 			.ingress = 0,
 			.egress = 0,
 			.transfer = 1,
 		},
 		.nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
 	};
+	struct mlx5_flow_template_table_cfg cfg = {
+		.attr = attr,
+		.external = false,
+	};
+
+	return flow_hw_table_create(dev, &cfg, &it, 1, &at, 1, NULL);
+}
+
+/*
+ * Creates the default Tx metadata copy table on NIC Tx group 0.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param pt
+ *   Pointer to flow pattern template.
+ * @param at
+ *   Pointer to flow actions template.
+ *
+ * @return
+ *   Pointer to flow table on success, NULL otherwise.
+ */
+static struct rte_flow_template_table*
+flow_hw_create_tx_default_mreg_copy_table(struct rte_eth_dev *dev,
+					  struct rte_flow_pattern_template *pt,
+					  struct rte_flow_actions_template *at)
+{
+	struct rte_flow_template_table_attr tx_tbl_attr = {
+		.flow_attr = {
+			.group = 0, /* Root */
+			.priority = MLX5_HW_LOWEST_PRIO_ROOT,
+			.egress = 1,
+		},
+		.nb_flows = 1, /* One default flow rule for all. */
+	};
+	struct mlx5_flow_template_table_cfg tx_tbl_cfg = {
+		.attr = tx_tbl_attr,
+		.external = false,
+	};
+	struct rte_flow_error drop_err;
 
-	return flow_hw_table_create(dev, &attr, &it, 1, &at, 1, NULL);
+	RTE_SET_USED(drop_err);
+	return flow_hw_table_create(dev, &tx_tbl_cfg, &pt, 1, &at, 1, &drop_err);
 }
 
 /**
@@ -3320,15 +3920,19 @@ flow_hw_create_ctrl_jump_table(struct rte_eth_dev *dev,
 	struct rte_flow_template_table_attr attr = {
 		.flow_attr = {
 			.group = 0,
-			.priority = 15, /* TODO: Flow priority discovery. */
+			.priority = MLX5_HW_LOWEST_PRIO_ROOT,
 			.ingress = 0,
 			.egress = 0,
 			.transfer = 1,
 		},
 		.nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
 	};
+	struct mlx5_flow_template_table_cfg cfg = {
+		.attr = attr,
+		.external = false,
+	};
 
-	return flow_hw_table_create(dev, &attr, &it, 1, &at, 1, NULL);
+	return flow_hw_table_create(dev, &cfg, &it, 1, &at, 1, NULL);
 }
 
 /**
@@ -3346,11 +3950,14 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow_pattern_template *esw_mgr_items_tmpl = NULL;
-	struct rte_flow_pattern_template *sq_items_tmpl = NULL;
+	struct rte_flow_pattern_template *regc_sq_items_tmpl = NULL;
 	struct rte_flow_pattern_template *port_items_tmpl = NULL;
-	struct rte_flow_actions_template *jump_sq_actions_tmpl = NULL;
+	struct rte_flow_pattern_template *tx_meta_items_tmpl = NULL;
+	struct rte_flow_actions_template *regc_jump_actions_tmpl = NULL;
 	struct rte_flow_actions_template *port_actions_tmpl = NULL;
 	struct rte_flow_actions_template *jump_one_actions_tmpl = NULL;
+	struct rte_flow_actions_template *tx_meta_actions_tmpl = NULL;
+	uint32_t xmeta = priv->sh->config.dv_xmeta_en;
 
 	/* Item templates */
 	esw_mgr_items_tmpl = flow_hw_create_ctrl_esw_mgr_pattern_template(dev);
@@ -3359,8 +3966,8 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 			" template for control flows", dev->data->port_id);
 		goto error;
 	}
-	sq_items_tmpl = flow_hw_create_ctrl_sq_pattern_template(dev);
-	if (!sq_items_tmpl) {
+	regc_sq_items_tmpl = flow_hw_create_ctrl_regc_sq_pattern_template(dev);
+	if (!regc_sq_items_tmpl) {
 		DRV_LOG(ERR, "port %u failed to create SQ item template for"
 			" control flows", dev->data->port_id);
 		goto error;
@@ -3371,11 +3978,18 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 			" control flows", dev->data->port_id);
 		goto error;
 	}
+	if (xmeta == MLX5_XMETA_MODE_META32_HWS) {
+		tx_meta_items_tmpl = flow_hw_create_tx_default_mreg_copy_pattern_template(dev);
+		if (!tx_meta_items_tmpl) {
+			DRV_LOG(ERR, "port %u failed to create Tx metadata copy pattern"
+				" template for control flows", dev->data->port_id);
+			goto error;
+		}
+	}
 	/* Action templates */
-	jump_sq_actions_tmpl = flow_hw_create_ctrl_jump_actions_template(dev,
-									 MLX5_HW_SQ_MISS_GROUP);
-	if (!jump_sq_actions_tmpl) {
-		DRV_LOG(ERR, "port %u failed to create jump action template"
+	regc_jump_actions_tmpl = flow_hw_create_ctrl_regc_jump_actions_template(dev);
+	if (!regc_jump_actions_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create REG_C set and jump action template"
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
@@ -3385,23 +3999,32 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
-	jump_one_actions_tmpl = flow_hw_create_ctrl_jump_actions_template(dev, 1);
+	jump_one_actions_tmpl = flow_hw_create_ctrl_jump_actions_template
+			(dev, MLX5_HW_LOWEST_USABLE_GROUP);
 	if (!jump_one_actions_tmpl) {
 		DRV_LOG(ERR, "port %u failed to create jump action template"
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
+	if (xmeta == MLX5_XMETA_MODE_META32_HWS) {
+		tx_meta_actions_tmpl = flow_hw_create_tx_default_mreg_copy_actions_template(dev);
+		if (!tx_meta_actions_tmpl) {
+			DRV_LOG(ERR, "port %u failed to create Tx metadata copy actions"
+				" template for control flows", dev->data->port_id);
+			goto error;
+		}
+	}
 	/* Tables */
 	MLX5_ASSERT(priv->hw_esw_sq_miss_root_tbl == NULL);
 	priv->hw_esw_sq_miss_root_tbl = flow_hw_create_ctrl_sq_miss_root_table
-			(dev, esw_mgr_items_tmpl, jump_sq_actions_tmpl);
+			(dev, esw_mgr_items_tmpl, regc_jump_actions_tmpl);
 	if (!priv->hw_esw_sq_miss_root_tbl) {
 		DRV_LOG(ERR, "port %u failed to create table for default sq miss (root table)"
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
 	MLX5_ASSERT(priv->hw_esw_sq_miss_tbl == NULL);
-	priv->hw_esw_sq_miss_tbl = flow_hw_create_ctrl_sq_miss_table(dev, sq_items_tmpl,
+	priv->hw_esw_sq_miss_tbl = flow_hw_create_ctrl_sq_miss_table(dev, regc_sq_items_tmpl,
 								     port_actions_tmpl);
 	if (!priv->hw_esw_sq_miss_tbl) {
 		DRV_LOG(ERR, "port %u failed to create table for default sq miss (non-root table)"
@@ -3416,6 +4039,16 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
+	if (xmeta == MLX5_XMETA_MODE_META32_HWS) {
+		MLX5_ASSERT(priv->hw_tx_meta_cpy_tbl == NULL);
+		priv->hw_tx_meta_cpy_tbl = flow_hw_create_tx_default_mreg_copy_table(dev,
+					tx_meta_items_tmpl, tx_meta_actions_tmpl);
+		if (!priv->hw_tx_meta_cpy_tbl) {
+			DRV_LOG(ERR, "port %u failed to create table for default"
+				" Tx metadata copy flow rule", dev->data->port_id);
+			goto error;
+		}
+	}
 	return 0;
 error:
 	if (priv->hw_esw_zero_tbl) {
@@ -3430,16 +4063,20 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 		flow_hw_table_destroy(dev, priv->hw_esw_sq_miss_root_tbl, NULL);
 		priv->hw_esw_sq_miss_root_tbl = NULL;
 	}
+	if (xmeta == MLX5_XMETA_MODE_META32_HWS && tx_meta_actions_tmpl)
+		flow_hw_actions_template_destroy(dev, tx_meta_actions_tmpl, NULL);
 	if (jump_one_actions_tmpl)
 		flow_hw_actions_template_destroy(dev, jump_one_actions_tmpl, NULL);
 	if (port_actions_tmpl)
 		flow_hw_actions_template_destroy(dev, port_actions_tmpl, NULL);
-	if (jump_sq_actions_tmpl)
-		flow_hw_actions_template_destroy(dev, jump_sq_actions_tmpl, NULL);
+	if (regc_jump_actions_tmpl)
+		flow_hw_actions_template_destroy(dev, regc_jump_actions_tmpl, NULL);
+	if (xmeta == MLX5_XMETA_MODE_META32_HWS && tx_meta_items_tmpl)
+		flow_hw_pattern_template_destroy(dev, tx_meta_items_tmpl, NULL);
 	if (port_items_tmpl)
 		flow_hw_pattern_template_destroy(dev, port_items_tmpl, NULL);
-	if (sq_items_tmpl)
-		flow_hw_pattern_template_destroy(dev, sq_items_tmpl, NULL);
+	if (regc_sq_items_tmpl)
+		flow_hw_pattern_template_destroy(dev, regc_sq_items_tmpl, NULL);
 	if (esw_mgr_items_tmpl)
 		flow_hw_pattern_template_destroy(dev, esw_mgr_items_tmpl, NULL);
 	return -EINVAL;
@@ -3491,7 +4128,7 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	struct rte_flow_queue_attr **_queue_attr = NULL;
 	struct rte_flow_queue_attr ctrl_queue_attr = {0};
 	bool is_proxy = !!(priv->sh->config.dv_esw_en && priv->master);
-	int ret;
+	int ret = 0;
 
 	if (!port_attr || !nb_queue || !queue_attr) {
 		rte_errno = EINVAL;
@@ -3642,6 +4279,9 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	}
 	if (_queue_attr)
 		mlx5_free(_queue_attr);
+	/* Do not overwrite the internal errno information. */
+	if (ret)
+		return ret;
 	return rte_flow_error_set(error, rte_errno,
 				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
 				  "fail to configure port");
@@ -3751,17 +4391,17 @@ void flow_hw_init_tags_set(struct rte_eth_dev *dev)
 		return;
 	unset |= 1 << (priv->mtr_color_reg - REG_C_0);
 	unset |= 1 << (REG_C_6 - REG_C_0);
-	if (meta_mode == MLX5_XMETA_MODE_META32_HWS) {
-		unset |= 1 << (REG_C_1 - REG_C_0);
+	if (priv->sh->config.dv_esw_en)
 		unset |= 1 << (REG_C_0 - REG_C_0);
-	}
+	if (meta_mode == MLX5_XMETA_MODE_META32_HWS)
+		unset |= 1 << (REG_C_1 - REG_C_0);
 	masks &= ~unset;
 	if (mlx5_flow_hw_avl_tags_init_cnt) {
 		for (i = 0; i < MLX5_FLOW_HW_TAGS_MAX; i++) {
 			if (mlx5_flow_hw_avl_tags[i] != REG_NON && !!((1 << i) & masks)) {
 				copy[mlx5_flow_hw_avl_tags[i] - REG_C_0] =
 						mlx5_flow_hw_avl_tags[i];
-				copy_masks |= (1 << i);
+				copy_masks |= (1 << (mlx5_flow_hw_avl_tags[i] - REG_C_0));
 			}
 		}
 		if (copy_masks != masks) {
@@ -3903,7 +4543,6 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 	return flow_dv_action_destroy(dev, handle, error);
 }
 
-
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.info_get = flow_hw_info_get,
 	.configure = flow_hw_configure,
@@ -3911,7 +4550,7 @@ const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.pattern_template_destroy = flow_hw_pattern_template_destroy,
 	.actions_template_create = flow_hw_actions_template_create,
 	.actions_template_destroy = flow_hw_actions_template_destroy,
-	.template_table_create = flow_hw_table_create,
+	.template_table_create = flow_hw_template_table_create,
 	.template_table_destroy = flow_hw_table_destroy,
 	.async_flow_create = flow_hw_async_flow_create,
 	.async_flow_destroy = flow_hw_async_flow_destroy,
@@ -3927,13 +4566,6 @@ const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.action_query = flow_dv_action_query,
 };
 
-static uint32_t
-flow_hw_get_ctrl_queue(struct mlx5_priv *priv)
-{
-	MLX5_ASSERT(priv->nb_queue > 0);
-	return priv->nb_queue - 1;
-}
-
 /**
  * Creates a control flow using flow template API on @p proxy_dev device,
  * on behalf of @p owner_dev device.
@@ -3971,7 +4603,7 @@ flow_hw_create_ctrl_flow(struct rte_eth_dev *owner_dev,
 			 uint8_t action_template_idx)
 {
 	struct mlx5_priv *priv = proxy_dev->data->dev_private;
-	uint32_t queue = flow_hw_get_ctrl_queue(priv);
+	uint32_t queue = priv->nb_queue - 1;
 	struct rte_flow_op_attr op_attr = {
 		.postpone = 0,
 	};
@@ -4046,7 +4678,7 @@ static int
 flow_hw_destroy_ctrl_flow(struct rte_eth_dev *dev, struct rte_flow *flow)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	uint32_t queue = flow_hw_get_ctrl_queue(priv);
+	uint32_t queue = priv->nb_queue - 1;
 	struct rte_flow_op_attr op_attr = {
 		.postpone = 0,
 	};
@@ -4183,10 +4815,24 @@ mlx5_flow_hw_esw_create_mgr_sq_miss_flow(struct rte_eth_dev *dev)
 			.type = RTE_FLOW_ITEM_TYPE_END,
 		},
 	};
+	struct rte_flow_action_modify_field modify_field = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+		},
+		.width = 1,
+	};
 	struct rte_flow_action_jump jump = {
-		.group = MLX5_HW_SQ_MISS_GROUP,
+		.group = 1,
 	};
 	struct rte_flow_action actions[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+			.conf = &modify_field,
+		},
 		{
 			.type = RTE_FLOW_ACTION_TYPE_JUMP,
 			.conf = &jump,
@@ -4209,6 +4855,12 @@ int
 mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
 {
 	uint16_t port_id = dev->data->port_id;
+	struct rte_flow_item_tag reg_c0_spec = {
+		.index = (uint8_t)REG_C_0,
+	};
+	struct rte_flow_item_tag reg_c0_mask = {
+		.index = 0xff,
+	};
 	struct mlx5_rte_flow_item_sq queue_spec = {
 		.queue = txq,
 	};
@@ -4216,6 +4868,12 @@ mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
 		.queue = UINT32_MAX,
 	};
 	struct rte_flow_item items[] = {
+		{
+			.type = (enum rte_flow_item_type)
+				MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+			.spec = &reg_c0_spec,
+			.mask = &reg_c0_mask,
+		},
 		{
 			.type = (enum rte_flow_item_type)
 				MLX5_RTE_FLOW_ITEM_TYPE_SQ,
@@ -4241,6 +4899,7 @@ mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
 	struct rte_eth_dev *proxy_dev;
 	struct mlx5_priv *proxy_priv;
 	uint16_t proxy_port_id = dev->data->port_id;
+	uint32_t marker_bit;
 	int ret;
 
 	RTE_SET_USED(txq);
@@ -4261,6 +4920,14 @@ mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
 		rte_errno = ENOMEM;
 		return -rte_errno;
 	}
+	marker_bit = flow_hw_usable_lsb_vport_mask(proxy_priv);
+	if (!marker_bit) {
+		DRV_LOG(ERR, "Unable to set up control flow in SQ miss table");
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	reg_c0_spec.data = marker_bit;
+	reg_c0_mask.data = marker_bit;
 	return flow_hw_create_ctrl_flow(dev, proxy_dev,
 					proxy_priv->hw_esw_sq_miss_tbl,
 					items, 0, actions, 0);
@@ -4320,4 +4987,53 @@ mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev)
 					items, 0, actions, 0);
 }
 
+int
+mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_flow_item_eth promisc = {
+		.dst.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+		.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+		.type = 0,
+	};
+	struct rte_flow_item eth_all[] = {
+		[0] = {
+			.type = RTE_FLOW_ITEM_TYPE_ETH,
+			.spec = &promisc,
+			.mask = &promisc,
+		},
+		[1] = {
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	struct rte_flow_action_modify_field mreg_action = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_C_1,
+		},
+		.src = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_A,
+		},
+		.width = 32,
+	};
+	struct rte_flow_action copy_reg_action[] = {
+		[0] = {
+			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+			.conf = &mreg_action,
+		},
+		[1] = {
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		},
+	};
+
+	MLX5_ASSERT(priv->master);
+	if (!priv->dr_ctx || !priv->hw_tx_meta_cpy_tbl)
+		return 0;
+	return flow_hw_create_ctrl_flow(dev, dev,
+					priv->hw_tx_meta_cpy_tbl,
+					eth_all, 0, copy_reg_action, 0);
+}
+
 #endif
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index f59d314ff4..cccec08d70 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -1292,6 +1292,9 @@ mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
 	if (priv->sh->config.dv_esw_en && priv->master) {
 		if (mlx5_flow_hw_esw_create_mgr_sq_miss_flow(dev))
 			goto error;
+		if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS)
+			if (mlx5_flow_hw_create_tx_default_mreg_copy_flow(dev))
+				goto error;
 	}
 	for (i = 0; i < priv->txqs_n; ++i) {
 		struct mlx5_txq_ctrl *txq = mlx5_txq_get(dev, i);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v4 07/18] net/mlx5: add HW steering meter action
  2022-10-19 16:25 ` [PATCH v4 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (5 preceding siblings ...)
  2022-10-19 16:25   ` [PATCH v4 06/18] net/mlx5: add extended metadata mode for hardware steering Suanming Mou
@ 2022-10-19 16:25   ` Suanming Mou
  2022-10-19 16:25   ` [PATCH v4 08/18] net/mlx5: add HW steering counter action Suanming Mou
                     ` (10 subsequent siblings)
  17 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-10-19 16:25 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Alexander Kozyrev

From: Alexander Kozyrev <akozyrev@nvidia.com>

This commit adds the meter action for HW steering (HWS).

The HW steering meter is based on ASO. The number of meters that will
be used by flows must be specified in advance via the flow
configure API.
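
For reference, a minimal application-side sketch of how the meter count
can be passed through the flow configure API (the single queue, the
sizes and the lack of teardown here are illustrative assumptions, not
part of this patch):

#include <rte_flow.h>

static int
configure_port_meters(uint16_t port_id)
{
	/* Pre-allocate ASO meter objects for HWS flow rules. */
	struct rte_flow_port_attr port_attr = {
		.nb_meters = 1024,
	};
	/* One flow operation queue with an assumed depth. */
	struct rte_flow_queue_attr queue_attr = {
		.size = 64,
	};
	const struct rte_flow_queue_attr *queue_attr_list[] = {
		&queue_attr,
	};
	struct rte_flow_error error;

	return rte_flow_configure(port_id, &port_attr, 1,
				  queue_attr_list, &error);
}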

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
---
 drivers/net/mlx5/mlx5.h            |  61 ++-
 drivers/net/mlx5/mlx5_flow.c       |  71 +++
 drivers/net/mlx5/mlx5_flow.h       |  24 +
 drivers/net/mlx5/mlx5_flow_aso.c   |  30 +-
 drivers/net/mlx5/mlx5_flow_hw.c    | 258 ++++++++++-
 drivers/net/mlx5/mlx5_flow_meter.c | 702 ++++++++++++++++++++++++++++-
 6 files changed, 1110 insertions(+), 36 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 6e7216efab..325f0b31c5 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -362,6 +362,9 @@ struct mlx5_hw_q {
 	struct mlx5_hw_q_job **job; /* LIFO header. */
 } __rte_cache_aligned;
 
+
+
+
 #define MLX5_COUNTERS_PER_POOL 512
 #define MLX5_MAX_PENDING_QUERIES 4
 #define MLX5_CNT_CONTAINER_RESIZE 64
@@ -787,15 +790,29 @@ struct mlx5_flow_meter_policy {
 	/* Is meter action in policy table. */
 	uint32_t hierarchy_drop_cnt:1;
 	/* Is any meter in hierarchy contains drop_cnt. */
+	uint32_t skip_r:1;
+	/* If red color policy is skipped. */
 	uint32_t skip_y:1;
 	/* If yellow color policy is skipped. */
 	uint32_t skip_g:1;
 	/* If green color policy is skipped. */
 	uint32_t mark:1;
 	/* If policy contains mark action. */
+	uint32_t initialized:1;
+	/* Initialized. */
+	uint16_t group;
+	/* The group. */
 	rte_spinlock_t sl;
 	uint32_t ref_cnt;
 	/* Use count. */
+	struct rte_flow_pattern_template *hws_item_templ;
+	/* Hardware steering item templates. */
+	struct rte_flow_actions_template *hws_act_templ[MLX5_MTR_DOMAIN_MAX];
+	/* Hardware steering action templates. */
+	struct rte_flow_template_table *hws_flow_table[MLX5_MTR_DOMAIN_MAX];
+	/* Hardware steering tables. */
+	struct rte_flow *hws_flow_rule[MLX5_MTR_DOMAIN_MAX][RTE_COLORS];
+	/* Hardware steering rules. */
 	struct mlx5_meter_policy_action_container act_cnt[MLX5_MTR_RTE_COLORS];
 	/* Policy actions container. */
 	void *dr_drop_action[MLX5_MTR_DOMAIN_MAX];
@@ -870,6 +887,7 @@ struct mlx5_flow_meter_info {
 	 */
 	uint32_t transfer:1;
 	uint32_t def_policy:1;
+	uint32_t initialized:1;
 	/* Meter points to default policy. */
 	uint32_t color_aware:1;
 	/* Meter is color aware mode. */
@@ -885,6 +903,10 @@ struct mlx5_flow_meter_info {
 	/**< Flow meter action. */
 	void *meter_action_y;
 	/**< Flow meter action for yellow init_color. */
+	uint32_t meter_offset;
+	/**< Flow meter offset. */
+	uint16_t group;
+	/**< Flow meter group. */
 };
 
 /* PPS(packets per second) map to BPS(Bytes per second).
@@ -919,6 +941,7 @@ struct mlx5_flow_meter_profile {
 	uint32_t ref_cnt; /**< Use count. */
 	uint32_t g_support:1; /**< If G color will be generated. */
 	uint32_t y_support:1; /**< If Y color will be generated. */
+	uint32_t initialized:1; /**< Initialized. */
 };
 
 /* 2 meters in each ASO cache line */
@@ -939,13 +962,20 @@ enum mlx5_aso_mtr_state {
 	ASO_METER_READY, /* CQE received. */
 };
 
+/* ASO flow meter type. */
+enum mlx5_aso_mtr_type {
+	ASO_METER_INDIRECT,
+	ASO_METER_DIRECT,
+};
+
 /* Generic aso_flow_meter information. */
 struct mlx5_aso_mtr {
 	LIST_ENTRY(mlx5_aso_mtr) next;
+	enum mlx5_aso_mtr_type type;
 	struct mlx5_flow_meter_info fm;
 	/**< Pointer to the next aso flow meter structure. */
 	uint8_t state; /**< ASO flow meter state. */
-	uint8_t offset;
+	uint32_t offset;
 };
 
 /* Generic aso_flow_meter pool structure. */
@@ -969,6 +999,14 @@ struct mlx5_aso_mtr_pools_mng {
 	struct mlx5_aso_mtr_pool **pools; /* ASO flow meter pool array. */
 };
 
+/* Bulk management structure for ASO flow meter. */
+struct mlx5_mtr_bulk {
+	uint32_t size; /* Number of ASO objects. */
+	struct mlx5dr_action *action; /* HWS action */
+	struct mlx5_devx_obj *devx_obj; /* DEVX object. */
+	struct mlx5_aso_mtr *aso; /* Array of ASO objects. */
+};
+
 /* Meter management structure for global flow meter resource. */
 struct mlx5_flow_mtr_mng {
 	struct mlx5_aso_mtr_pools_mng pools_mng;
@@ -1022,6 +1060,7 @@ struct mlx5_flow_tbl_resource {
 #define MLX5_FLOW_TABLE_LEVEL_METER (MLX5_MAX_TABLES - 3)
 #define MLX5_FLOW_TABLE_LEVEL_POLICY (MLX5_MAX_TABLES - 4)
 #define MLX5_MAX_TABLES_EXTERNAL MLX5_FLOW_TABLE_LEVEL_POLICY
+#define MLX5_FLOW_TABLE_HWS_POLICY (MLX5_MAX_TABLES - 10)
 #define MLX5_MAX_TABLES_FDB UINT16_MAX
 #define MLX5_FLOW_TABLE_FACTOR 10
 
@@ -1308,6 +1347,12 @@ TAILQ_HEAD(mlx5_mtr_profiles, mlx5_flow_meter_profile);
 /* MTR list. */
 TAILQ_HEAD(mlx5_legacy_flow_meters, mlx5_legacy_flow_meter);
 
+struct mlx5_mtr_config {
+	uint32_t nb_meters; /**< Number of configured meters */
+	uint32_t nb_meter_profiles; /**< Number of configured meter profiles */
+	uint32_t nb_meter_policies; /**< Number of configured meter policies */
+};
+
 /* RSS description. */
 struct mlx5_flow_rss_desc {
 	uint32_t level;
@@ -1545,12 +1590,16 @@ struct mlx5_priv {
 	struct mlx5_nl_vlan_vmwa_context *vmwa_context; /* VLAN WA context. */
 	struct mlx5_hlist *mreg_cp_tbl;
 	/* Hash table of Rx metadata register copy table. */
+	struct mlx5_mtr_config mtr_config; /* Meter configuration */
 	uint8_t mtr_sfx_reg; /* Meter prefix-suffix flow match REG_C. */
 	uint8_t mtr_color_reg; /* Meter color match REG_C. */
 	struct mlx5_legacy_flow_meters flow_meters; /* MTR list. */
 	struct mlx5_l3t_tbl *mtr_profile_tbl; /* Meter index lookup table. */
+	struct mlx5_flow_meter_profile *mtr_profile_arr; /* Profile array. */
 	struct mlx5_l3t_tbl *policy_idx_tbl; /* Policy index lookup table. */
+	struct mlx5_flow_meter_policy *mtr_policy_arr; /* Policy array. */
 	struct mlx5_l3t_tbl *mtr_idx_tbl; /* Meter index lookup table. */
+	struct mlx5_mtr_bulk mtr_bulk; /* Meter index mapping for HWS */
 	uint8_t skip_default_rss_reta; /* Skip configuration of default reta. */
 	uint8_t fdb_def_rule; /* Whether fdb jump to table 1 is configured. */
 	struct mlx5_mp_id mp_id; /* ID of a multi-process process */
@@ -1564,13 +1613,13 @@ struct mlx5_priv {
 	struct mlx5_flex_item flex_item[MLX5_PORT_FLEX_ITEM_NUM];
 	/* Flex items have been created on the port. */
 	uint32_t flex_item_map; /* Map of allocated flex item elements. */
+	uint32_t nb_queue; /* HW steering queue number. */
 #if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
 	/* Item template list. */
 	LIST_HEAD(flow_hw_itt, rte_flow_pattern_template) flow_hw_itt;
 	/* Action template list. */
 	LIST_HEAD(flow_hw_at, rte_flow_actions_template) flow_hw_at;
 	struct mlx5dr_context *dr_ctx; /**< HW steering DR context. */
-	uint32_t nb_queue; /* HW steering queue number. */
 	/* HW steering queue polling mechanism job descriptor LIFO. */
 	struct mlx5_hw_q *hw_q;
 	/* HW steering rte flow table list header. */
@@ -1586,6 +1635,7 @@ struct mlx5_priv {
 
 #define PORT_ID(priv) ((priv)->dev_data->port_id)
 #define ETH_DEV(priv) (&rte_eth_devices[PORT_ID(priv)])
+#define CTRL_QUEUE_ID(priv) ((priv)->nb_queue - 1)
 
 struct rte_hairpin_peer_info {
 	uint32_t qp_id;
@@ -1897,6 +1947,11 @@ void mlx5_pmd_socket_uninit(void);
 
 /* mlx5_flow_meter.c */
 
+int mlx5_flow_meter_init(struct rte_eth_dev *dev,
+			 uint32_t nb_meters,
+			 uint32_t nb_meter_profiles,
+			 uint32_t nb_meter_policies);
+void mlx5_flow_meter_uninit(struct rte_eth_dev *dev);
 int mlx5_flow_meter_ops_get(struct rte_eth_dev *dev, void *arg);
 struct mlx5_flow_meter_info *mlx5_flow_meter_find(struct mlx5_priv *priv,
 		uint32_t meter_id, uint32_t *mtr_idx);
@@ -1971,7 +2026,7 @@ int mlx5_aso_flow_hit_queue_poll_stop(struct mlx5_dev_ctx_shared *sh);
 void mlx5_aso_queue_uninit(struct mlx5_dev_ctx_shared *sh,
 		enum mlx5_access_aso_opc_mod aso_opc_mod);
 int mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
-		struct mlx5_aso_mtr *mtr);
+		struct mlx5_aso_mtr *mtr, struct mlx5_mtr_bulk *bulk);
 int mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh,
 		struct mlx5_aso_mtr *mtr);
 int mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 3b8e97ccd0..892c42a10b 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -8333,6 +8333,40 @@ mlx5_flow_port_configure(struct rte_eth_dev *dev,
 	return fops->configure(dev, port_attr, nb_queue, queue_attr, error);
 }
 
+/**
+ * Validate item template.
+ *
+ * @param[in] dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param[in] attr
+ *   Pointer to the item template attributes.
+ * @param[in] items
+ *   The template item pattern.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_flow_pattern_validate(struct rte_eth_dev *dev,
+		const struct rte_flow_pattern_template_attr *attr,
+		const struct rte_flow_item items[],
+		struct rte_flow_error *error)
+{
+	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
+
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW) {
+		rte_flow_error_set(error, ENOTSUP,
+			RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+			"pattern validate with incorrect steering mode");
+		return -ENOTSUP;
+	}
+	fops = flow_get_drv_ops(MLX5_FLOW_TYPE_HW);
+	return fops->pattern_validate(dev, attr, items, error);
+}
+
 /**
  * Create flow item template.
  *
@@ -8398,6 +8432,43 @@ mlx5_flow_pattern_template_destroy(struct rte_eth_dev *dev,
 	return fops->pattern_template_destroy(dev, template, error);
 }
 
+/**
+ * Validate flow actions template.
+ *
+ * @param[in] dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param[in] attr
+ *   Pointer to the action template attributes.
+ * @param[in] actions
+ *   Associated actions (list terminated by the END action).
+ * @param[in] masks
+ *   List of actions that marks which of the action's member is constant.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_flow_actions_validate(struct rte_eth_dev *dev,
+			const struct rte_flow_actions_template_attr *attr,
+			const struct rte_flow_action actions[],
+			const struct rte_flow_action masks[],
+			struct rte_flow_error *error)
+{
+	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
+
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW) {
+		rte_flow_error_set(error, ENOTSUP,
+			RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+			"actions validate with incorrect steering mode");
+		return -ENOTSUP;
+	}
+	fops = flow_get_drv_ops(MLX5_FLOW_TYPE_HW);
+	return fops->actions_validate(dev, attr, actions, masks, error);
+}
+
 /**
  * Create flow item template.
  *
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index b0af13886a..5f89afbe29 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1665,6 +1665,11 @@ typedef int (*mlx5_flow_port_configure_t)
 			 uint16_t nb_queue,
 			 const struct rte_flow_queue_attr *queue_attr[],
 			 struct rte_flow_error *err);
+typedef int (*mlx5_flow_pattern_validate_t)
+			(struct rte_eth_dev *dev,
+			 const struct rte_flow_pattern_template_attr *attr,
+			 const struct rte_flow_item items[],
+			 struct rte_flow_error *error);
 typedef struct rte_flow_pattern_template *(*mlx5_flow_pattern_template_create_t)
 			(struct rte_eth_dev *dev,
 			 const struct rte_flow_pattern_template_attr *attr,
@@ -1674,6 +1679,12 @@ typedef int (*mlx5_flow_pattern_template_destroy_t)
 			(struct rte_eth_dev *dev,
 			 struct rte_flow_pattern_template *template,
 			 struct rte_flow_error *error);
+typedef int (*mlx5_flow_actions_validate_t)
+			(struct rte_eth_dev *dev,
+			 const struct rte_flow_actions_template_attr *attr,
+			 const struct rte_flow_action actions[],
+			 const struct rte_flow_action masks[],
+			 struct rte_flow_error *error);
 typedef struct rte_flow_actions_template *(*mlx5_flow_actions_template_create_t)
 			(struct rte_eth_dev *dev,
 			 const struct rte_flow_actions_template_attr *attr,
@@ -1790,8 +1801,10 @@ struct mlx5_flow_driver_ops {
 	mlx5_flow_item_update_t item_update;
 	mlx5_flow_info_get_t info_get;
 	mlx5_flow_port_configure_t configure;
+	mlx5_flow_pattern_validate_t pattern_validate;
 	mlx5_flow_pattern_template_create_t pattern_template_create;
 	mlx5_flow_pattern_template_destroy_t pattern_template_destroy;
+	mlx5_flow_actions_validate_t actions_validate;
 	mlx5_flow_actions_template_create_t actions_template_create;
 	mlx5_flow_actions_template_destroy_t actions_template_destroy;
 	mlx5_flow_table_create_t template_table_create;
@@ -1873,6 +1886,8 @@ mlx5_aso_meter_by_idx(struct mlx5_priv *priv, uint32_t idx)
 
 	/* Decrease to original index. */
 	idx--;
+	if (priv->mtr_bulk.aso)
+		return priv->mtr_bulk.aso + idx;
 	MLX5_ASSERT(idx / MLX5_ASO_MTRS_PER_POOL < pools_mng->n);
 	rte_rwlock_read_lock(&pools_mng->resize_mtrwl);
 	pool = pools_mng->pools[idx / MLX5_ASO_MTRS_PER_POOL];
@@ -2384,4 +2399,13 @@ int mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev,
 					 uint32_t txq);
 int mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev);
 int mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev);
+int mlx5_flow_actions_validate(struct rte_eth_dev *dev,
+		const struct rte_flow_actions_template_attr *attr,
+		const struct rte_flow_action actions[],
+		const struct rte_flow_action masks[],
+		struct rte_flow_error *error);
+int mlx5_flow_pattern_validate(struct rte_eth_dev *dev,
+		const struct rte_flow_pattern_template_attr *attr,
+		const struct rte_flow_item items[],
+		struct rte_flow_error *error);
 #endif /* RTE_PMD_MLX5_FLOW_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow_aso.c b/drivers/net/mlx5/mlx5_flow_aso.c
index 4129e3a9e0..60d0280367 100644
--- a/drivers/net/mlx5/mlx5_flow_aso.c
+++ b/drivers/net/mlx5/mlx5_flow_aso.c
@@ -642,7 +642,8 @@ mlx5_aso_flow_hit_queue_poll_stop(struct mlx5_dev_ctx_shared *sh)
 static uint16_t
 mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 			       struct mlx5_aso_sq *sq,
-			       struct mlx5_aso_mtr *aso_mtr)
+			       struct mlx5_aso_mtr *aso_mtr,
+			       struct mlx5_mtr_bulk *bulk)
 {
 	volatile struct mlx5_aso_wqe *wqe = NULL;
 	struct mlx5_flow_meter_info *fm = NULL;
@@ -653,6 +654,7 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	uint32_t dseg_idx = 0;
 	struct mlx5_aso_mtr_pool *pool = NULL;
 	uint32_t param_le;
+	int id;
 
 	rte_spinlock_lock(&sq->sqsl);
 	res = size - (uint16_t)(sq->head - sq->tail);
@@ -666,14 +668,19 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	/* Fill next WQE. */
 	fm = &aso_mtr->fm;
 	sq->elts[sq->head & mask].mtr = aso_mtr;
-	pool = container_of(aso_mtr, struct mlx5_aso_mtr_pool,
-			mtrs[aso_mtr->offset]);
-	wqe->general_cseg.misc = rte_cpu_to_be_32(pool->devx_obj->id +
-			(aso_mtr->offset >> 1));
-	wqe->general_cseg.opcode = rte_cpu_to_be_32(MLX5_OPCODE_ACCESS_ASO |
-			(ASO_OPC_MOD_POLICER <<
-			WQE_CSEG_OPC_MOD_OFFSET) |
-			sq->pi << WQE_CSEG_WQE_INDEX_OFFSET);
+	if (aso_mtr->type == ASO_METER_INDIRECT) {
+		pool = container_of(aso_mtr, struct mlx5_aso_mtr_pool,
+				    mtrs[aso_mtr->offset]);
+		id = pool->devx_obj->id;
+	} else {
+		id = bulk->devx_obj->id;
+	}
+	wqe->general_cseg.misc = rte_cpu_to_be_32(id +
+						  (aso_mtr->offset >> 1));
+	wqe->general_cseg.opcode =
+		rte_cpu_to_be_32(MLX5_OPCODE_ACCESS_ASO |
+			(ASO_OPC_MOD_POLICER << WQE_CSEG_OPC_MOD_OFFSET) |
+			 sq->pi << WQE_CSEG_WQE_INDEX_OFFSET);
 	/* There are 2 meters in one ASO cache line. */
 	dseg_idx = aso_mtr->offset & 0x1;
 	wqe->aso_cseg.data_mask =
@@ -811,14 +818,15 @@ mlx5_aso_mtr_completion_handle(struct mlx5_aso_sq *sq)
  */
 int
 mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
-			struct mlx5_aso_mtr *mtr)
+			struct mlx5_aso_mtr *mtr,
+			struct mlx5_mtr_bulk *bulk)
 {
 	struct mlx5_aso_sq *sq = &sh->mtrmng->pools_mng.sq;
 	uint32_t poll_wqe_times = MLX5_MTR_POLL_WQE_CQE_TIMES;
 
 	do {
 		mlx5_aso_mtr_completion_handle(sq);
-		if (mlx5_aso_mtr_sq_enqueue_single(sh, sq, mtr))
+		if (mlx5_aso_mtr_sq_enqueue_single(sh, sq, mtr, bulk))
 			return 0;
 		/* Waiting for wqe resource. */
 		rte_delay_us_sleep(MLX5_ASO_WQE_CQE_RESPONSE_DELAY);
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 319c8d1a89..5051741a5a 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -914,6 +914,38 @@ flow_hw_represented_port_compile(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static __rte_always_inline int
+flow_hw_meter_compile(struct rte_eth_dev *dev,
+		      const struct mlx5_flow_template_table_cfg *cfg,
+		      uint32_t  start_pos, const struct rte_flow_action *action,
+		      struct mlx5_hw_actions *acts, uint32_t *end_pos,
+		      struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr *aso_mtr;
+	const struct rte_flow_action_meter *meter = action->conf;
+	uint32_t pos = start_pos;
+	uint32_t group = cfg->attr.flow_attr.group;
+
+	aso_mtr = mlx5_aso_meter_by_idx(priv, meter->mtr_id);
+	acts->rule_acts[pos].action = priv->mtr_bulk.action;
+	acts->rule_acts[pos].aso_meter.offset = aso_mtr->offset;
+		acts->jump = flow_hw_jump_action_register
+		(dev, cfg, aso_mtr->fm.group, error);
+	if (!acts->jump) {
+		*end_pos = start_pos;
+		return -ENOMEM;
+	}
+	acts->rule_acts[++pos].action = (!!group) ?
+				    acts->jump->hws_action :
+				    acts->jump->root_action;
+	*end_pos = pos;
+	if (mlx5_aso_mtr_wait(priv->sh, aso_mtr)) {
+		*end_pos = start_pos;
+		return -ENOMEM;
+	}
+	return 0;
+}
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -1142,6 +1174,21 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 				goto err;
 			i++;
 			break;
+		case RTE_FLOW_ACTION_TYPE_METER:
+			if (actions->conf && masks->conf &&
+			    ((const struct rte_flow_action_meter *)
+			     masks->conf)->mtr_id) {
+				err = flow_hw_meter_compile(dev, cfg,
+						i, actions, acts, &i, error);
+				if (err)
+					goto err;
+			} else if (__flow_hw_act_data_general_append(priv, acts,
+							actions->type,
+							actions - action_start,
+							i))
+				goto err;
+			i++;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -1482,6 +1529,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	const struct rte_flow_action_raw_encap *raw_encap_data;
 	const struct rte_flow_item *enc_item = NULL;
 	const struct rte_flow_action_ethdev *port_action = NULL;
+	const struct rte_flow_action_meter *meter = NULL;
 	uint8_t *buf = job->encap_data;
 	struct rte_flow_attr attr = {
 			.ingress = 1,
@@ -1489,6 +1537,8 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	uint32_t ft_flag;
 	size_t encap_len = 0;
 	int ret;
+	struct mlx5_aso_mtr *mtr;
+	uint32_t mtr_id;
 
 	memcpy(rule_acts, hw_acts->rule_acts,
 	       sizeof(*rule_acts) * hw_acts->acts_num);
@@ -1608,6 +1658,29 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			rule_acts[act_data->action_dst].action =
 					priv->hw_vport[port_action->port_id];
 			break;
+		case RTE_FLOW_ACTION_TYPE_METER:
+			meter = action->conf;
+			mtr_id = meter->mtr_id;
+			mtr = mlx5_aso_meter_by_idx(priv, mtr_id);
+			rule_acts[act_data->action_dst].action =
+				priv->mtr_bulk.action;
+			rule_acts[act_data->action_dst].aso_meter.offset =
+								mtr->offset;
+			jump = flow_hw_jump_action_register
+				(dev, &table->cfg, mtr->fm.group, NULL);
+			if (!jump)
+				return -1;
+			MLX5_ASSERT
+				(!rule_acts[act_data->action_dst + 1].action);
+			rule_acts[act_data->action_dst + 1].action =
+					(!!attr.group) ? jump->hws_action :
+							 jump->root_action;
+			job->flow->jump = jump;
+			job->flow->fate_type = MLX5_FLOW_FATE_JUMP;
+			(*acts_num)++;
+			if (mlx5_aso_mtr_wait(priv->sh, mtr))
+				return -1;
+			break;
 		default:
 			break;
 		}
@@ -2523,7 +2596,7 @@ flow_hw_action_meta_copy_insert(const struct rte_flow_action actions[],
 }
 
 static int
-flow_hw_action_validate(struct rte_eth_dev *dev,
+flow_hw_actions_validate(struct rte_eth_dev *dev,
 			const struct rte_flow_actions_template_attr *attr,
 			const struct rte_flow_action actions[],
 			const struct rte_flow_action masks[],
@@ -2589,6 +2662,9 @@ flow_hw_action_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_RAW_DECAP:
 			/* TODO: Validation logic */
 			break;
+		case RTE_FLOW_ACTION_TYPE_METER:
+			/* TODO: Validation logic */
+			break;
 		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
 			ret = flow_hw_validate_action_modify_field(action,
 									mask,
@@ -2682,7 +2758,7 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 		.conf = &rx_mreg_mask,
 	};
 
-	if (flow_hw_action_validate(dev, attr, actions, masks, error))
+	if (flow_hw_actions_validate(dev, attr, actions, masks, error))
 		return NULL;
 	if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
 	    priv->sh->config.dv_esw_en) {
@@ -3028,15 +3104,24 @@ flow_hw_pattern_template_destroy(struct rte_eth_dev *dev __rte_unused,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-flow_hw_info_get(struct rte_eth_dev *dev __rte_unused,
-		 struct rte_flow_port_info *port_info __rte_unused,
-		 struct rte_flow_queue_info *queue_info __rte_unused,
+flow_hw_info_get(struct rte_eth_dev *dev,
+		 struct rte_flow_port_info *port_info,
+		 struct rte_flow_queue_info *queue_info,
 		 struct rte_flow_error *error __rte_unused)
 {
-	/* Nothing to be updated currently. */
+	uint16_t port_id = dev->data->port_id;
+	struct rte_mtr_capabilities mtr_cap;
+	int ret;
+
 	memset(port_info, 0, sizeof(*port_info));
 	/* Queue size is unlimited from low-level. */
+	port_info->max_nb_queues = UINT32_MAX;
 	queue_info->max_size = UINT32_MAX;
+
+	memset(&mtr_cap, 0, sizeof(struct rte_mtr_capabilities));
+	ret = rte_mtr_capabilities_get(port_id, &mtr_cap, NULL);
+	if (!ret)
+		port_info->max_nb_meters = mtr_cap.n_max;
 	return 0;
 }
 
@@ -4231,6 +4316,10 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	priv->nb_queue = nb_q_updated;
 	rte_spinlock_init(&priv->hw_ctrl_lock);
 	LIST_INIT(&priv->hw_ctrl_flows);
+	/* Initialize meter library. */
+	if (port_attr->nb_meters)
+		if (mlx5_flow_meter_init(dev, port_attr->nb_meters, 1, 1))
+			goto err;
 	/* Add global actions. */
 	for (i = 0; i < MLX5_HW_ACTION_FLAG_MAX; i++) {
 		uint32_t act_flags = 0;
@@ -4546,8 +4635,10 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.info_get = flow_hw_info_get,
 	.configure = flow_hw_configure,
+	.pattern_validate = flow_hw_pattern_validate,
 	.pattern_template_create = flow_hw_pattern_template_create,
 	.pattern_template_destroy = flow_hw_pattern_template_destroy,
+	.actions_validate = flow_hw_actions_validate,
 	.actions_template_create = flow_hw_actions_template_create,
 	.actions_template_destroy = flow_hw_actions_template_destroy,
 	.template_table_create = flow_hw_template_table_create,
@@ -4603,7 +4694,7 @@ flow_hw_create_ctrl_flow(struct rte_eth_dev *owner_dev,
 			 uint8_t action_template_idx)
 {
 	struct mlx5_priv *priv = proxy_dev->data->dev_private;
-	uint32_t queue = priv->nb_queue - 1;
+	uint32_t queue = CTRL_QUEUE_ID(priv);
 	struct rte_flow_op_attr op_attr = {
 		.postpone = 0,
 	};
@@ -4678,7 +4769,7 @@ static int
 flow_hw_destroy_ctrl_flow(struct rte_eth_dev *dev, struct rte_flow *flow)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	uint32_t queue = priv->nb_queue - 1;
+	uint32_t queue = CTRL_QUEUE_ID(priv);
 	struct rte_flow_op_attr op_attr = {
 		.postpone = 0,
 	};
@@ -5036,4 +5127,155 @@ mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev)
 					eth_all, 0, copy_reg_action, 0);
 }
 
+void
+mlx5_flow_meter_uninit(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (priv->mtr_policy_arr) {
+		mlx5_free(priv->mtr_policy_arr);
+		priv->mtr_policy_arr = NULL;
+	}
+	if (priv->mtr_profile_arr) {
+		mlx5_free(priv->mtr_profile_arr);
+		priv->mtr_profile_arr = NULL;
+	}
+	if (priv->mtr_bulk.aso) {
+		mlx5_free(priv->mtr_bulk.aso);
+		priv->mtr_bulk.aso = NULL;
+		priv->mtr_bulk.size = 0;
+		mlx5_aso_queue_uninit(priv->sh, ASO_OPC_MOD_POLICER);
+	}
+	if (priv->mtr_bulk.action) {
+		mlx5dr_action_destroy(priv->mtr_bulk.action);
+		priv->mtr_bulk.action = NULL;
+	}
+	if (priv->mtr_bulk.devx_obj) {
+		claim_zero(mlx5_devx_cmd_destroy(priv->mtr_bulk.devx_obj));
+		priv->mtr_bulk.devx_obj = NULL;
+	}
+}
+
+int
+mlx5_flow_meter_init(struct rte_eth_dev *dev,
+		     uint32_t nb_meters,
+		     uint32_t nb_meter_profiles,
+		     uint32_t nb_meter_policies)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_devx_obj *dcs = NULL;
+	uint32_t log_obj_size;
+	int ret = 0;
+	int reg_id;
+	struct mlx5_aso_mtr *aso;
+	uint32_t i;
+	struct rte_flow_error error;
+
+	if (!nb_meters || !nb_meter_profiles || !nb_meter_policies) {
+		ret = ENOTSUP;
+		rte_flow_error_set(&error, ENOMEM,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter configuration is invalid.");
+		goto err;
+	}
+	if (!priv->mtr_en || !priv->sh->meter_aso_en) {
+		ret = ENOTSUP;
+		rte_flow_error_set(&error, ENOMEM,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter ASO is not supported.");
+		goto err;
+	}
+	priv->mtr_config.nb_meters = nb_meters;
+	if (mlx5_aso_queue_init(priv->sh, ASO_OPC_MOD_POLICER)) {
+		ret = ENOMEM;
+		rte_flow_error_set(&error, ENOMEM,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter ASO queue allocation failed.");
+		goto err;
+	}
+	log_obj_size = rte_log2_u32(nb_meters >> 1);
+	dcs = mlx5_devx_cmd_create_flow_meter_aso_obj
+		(priv->sh->cdev->ctx, priv->sh->cdev->pdn,
+			log_obj_size);
+	if (!dcs) {
+		ret = ENOMEM;
+		rte_flow_error_set(&error, ENOMEM,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter ASO object allocation failed.");
+		goto err;
+	}
+	priv->mtr_bulk.devx_obj = dcs;
+	reg_id = mlx5_flow_get_reg_id(dev, MLX5_MTR_COLOR, 0, NULL);
+	if (reg_id < 0) {
+		ret = ENOTSUP;
+		rte_flow_error_set(&error, ENOMEM,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter register is not available.");
+		goto err;
+	}
+	priv->mtr_bulk.action = mlx5dr_action_create_aso_meter
+			(priv->dr_ctx, (struct mlx5dr_devx_obj *)dcs,
+				reg_id - REG_C_0, MLX5DR_ACTION_FLAG_HWS_RX |
+				MLX5DR_ACTION_FLAG_HWS_TX |
+				MLX5DR_ACTION_FLAG_HWS_FDB);
+	if (!priv->mtr_bulk.action) {
+		ret = ENOMEM;
+		rte_flow_error_set(&error, ENOMEM,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter action creation failed.");
+		goto err;
+	}
+	priv->mtr_bulk.aso = mlx5_malloc(MLX5_MEM_ZERO,
+						sizeof(struct mlx5_aso_mtr) * nb_meters,
+						RTE_CACHE_LINE_SIZE,
+						SOCKET_ID_ANY);
+	if (!priv->mtr_bulk.aso) {
+		ret = ENOMEM;
+		rte_flow_error_set(&error, ENOMEM,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter bulk ASO allocation failed.");
+		goto err;
+	}
+	priv->mtr_bulk.size = nb_meters;
+	aso = priv->mtr_bulk.aso;
+	for (i = 0; i < priv->mtr_bulk.size; i++) {
+		aso->type = ASO_METER_DIRECT;
+		aso->state = ASO_METER_WAIT;
+		aso->offset = i;
+		aso++;
+	}
+	priv->mtr_config.nb_meter_profiles = nb_meter_profiles;
+	priv->mtr_profile_arr =
+		mlx5_malloc(MLX5_MEM_ZERO,
+				sizeof(struct mlx5_flow_meter_profile) *
+				nb_meter_profiles,
+				RTE_CACHE_LINE_SIZE,
+				SOCKET_ID_ANY);
+	if (!priv->mtr_profile_arr) {
+		ret = ENOMEM;
+		rte_flow_error_set(&error, ENOMEM,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter profile allocation failed.");
+		goto err;
+	}
+	priv->mtr_config.nb_meter_policies = nb_meter_policies;
+	priv->mtr_policy_arr =
+		mlx5_malloc(MLX5_MEM_ZERO,
+				sizeof(struct mlx5_flow_meter_policy) *
+				nb_meter_policies,
+				RTE_CACHE_LINE_SIZE,
+				SOCKET_ID_ANY);
+	if (!priv->mtr_policy_arr) {
+		ret = ENOMEM;
+		rte_flow_error_set(&error, ENOMEM,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter policy allocation failed.");
+		goto err;
+	}
+	return 0;
+err:
+	mlx5_flow_meter_uninit(dev);
+	return ret;
+}
+
 #endif
diff --git a/drivers/net/mlx5/mlx5_flow_meter.c b/drivers/net/mlx5/mlx5_flow_meter.c
index d4aafe4eea..8cf24d1f7a 100644
--- a/drivers/net/mlx5/mlx5_flow_meter.c
+++ b/drivers/net/mlx5/mlx5_flow_meter.c
@@ -98,6 +98,8 @@ mlx5_flow_meter_profile_find(struct mlx5_priv *priv, uint32_t meter_profile_id)
 	union mlx5_l3t_data data;
 	int32_t ret;
 
+	if (priv->mtr_profile_arr)
+		return &priv->mtr_profile_arr[meter_profile_id];
 	if (mlx5_l3t_get_entry(priv->mtr_profile_tbl,
 			       meter_profile_id, &data) || !data.ptr)
 		return NULL;
@@ -145,17 +147,29 @@ mlx5_flow_meter_profile_validate(struct rte_eth_dev *dev,
 					  RTE_MTR_ERROR_TYPE_METER_PROFILE,
 					  NULL, "Meter profile is null.");
 	/* Meter profile ID must be valid. */
-	if (meter_profile_id == UINT32_MAX)
-		return -rte_mtr_error_set(error, EINVAL,
-					  RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
-					  NULL, "Meter profile id not valid.");
-	/* Meter profile must not exist. */
-	fmp = mlx5_flow_meter_profile_find(priv, meter_profile_id);
-	if (fmp)
-		return -rte_mtr_error_set(error, EEXIST,
-					  RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
-					  NULL,
-					  "Meter profile already exists.");
+	if (priv->mtr_profile_arr) {
+		if (meter_profile_id >= priv->mtr_config.nb_meter_profiles)
+			return -rte_mtr_error_set(error, EINVAL,
+					RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					NULL, "Meter profile id not valid.");
+		fmp = mlx5_flow_meter_profile_find(priv, meter_profile_id);
+		/* Meter profile must not exist. */
+		if (fmp->initialized)
+			return -rte_mtr_error_set(error, EEXIST,
+					RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					NULL, "Meter profile already exists.");
+	} else {
+		if (meter_profile_id == UINT32_MAX)
+			return -rte_mtr_error_set(error, EINVAL,
+					RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					NULL, "Meter profile id not valid.");
+		fmp = mlx5_flow_meter_profile_find(priv, meter_profile_id);
+		/* Meter profile must not exist. */
+		if (fmp)
+			return -rte_mtr_error_set(error, EEXIST,
+					RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					NULL, "Meter profile already exists.");
+	}
 	if (!priv->sh->meter_aso_en) {
 		/* Old version is even not supported. */
 		if (!priv->sh->cdev->config.hca_attr.qos.flow_meter_old)
@@ -574,6 +588,96 @@ mlx5_flow_meter_profile_delete(struct rte_eth_dev *dev,
 	return 0;
 }
 
+/**
+ * Callback to add MTR profile with HWS.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] meter_profile_id
+ *   Meter profile id.
+ * @param[in] profile
+ *   Pointer to meter profile detail.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_profile_hws_add(struct rte_eth_dev *dev,
+			uint32_t meter_profile_id,
+			struct rte_mtr_meter_profile *profile,
+			struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_meter_profile *fmp;
+	int ret;
+
+	if (!priv->mtr_profile_arr)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "Meter profile array is not allocated");
+	/* Check input params. */
+	ret = mlx5_flow_meter_profile_validate(dev, meter_profile_id,
+					       profile, error);
+	if (ret)
+		return ret;
+	fmp = mlx5_flow_meter_profile_find(priv, meter_profile_id);
+	/* Fill profile info. */
+	fmp->id = meter_profile_id;
+	fmp->profile = *profile;
+	fmp->initialized = 1;
+	/* Fill the flow meter parameters for the PRM. */
+	return mlx5_flow_meter_param_fill(fmp, error);
+}
+
+/**
+ * Callback to delete MTR profile with HWS.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] meter_profile_id
+ *   Meter profile id.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_profile_hws_delete(struct rte_eth_dev *dev,
+			uint32_t meter_profile_id,
+			struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_meter_profile *fmp;
+
+	if (!priv->mtr_profile_arr)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "Meter profile array is not allocated");
+	/* Meter id must be valid. */
+	if (meter_profile_id >= priv->mtr_config.nb_meter_profiles)
+		return -rte_mtr_error_set(error, EINVAL,
+					  RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					  &meter_profile_id,
+					  "Meter profile id not valid.");
+	/* Meter profile must exist. */
+	fmp = mlx5_flow_meter_profile_find(priv, meter_profile_id);
+	if (!fmp->initialized)
+		return -rte_mtr_error_set(error, ENOENT,
+					  RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					  &meter_profile_id,
+					  "Meter profile id is invalid.");
+	/* Check profile is unused. */
+	if (fmp->ref_cnt)
+		return -rte_mtr_error_set(error, EBUSY,
+					  RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					  NULL, "Meter profile is in use.");
+	memset(fmp, 0, sizeof(struct mlx5_flow_meter_profile));
+	return 0;
+}
+
 /**
  * Find policy by id.
  *
@@ -594,6 +698,11 @@ mlx5_flow_meter_policy_find(struct rte_eth_dev *dev,
 	struct mlx5_flow_meter_sub_policy *sub_policy = NULL;
 	union mlx5_l3t_data data;
 
+	if (priv->mtr_policy_arr) {
+		if (policy_idx)
+			*policy_idx = policy_id;
+		return &priv->mtr_policy_arr[policy_id];
+	}
 	if (policy_id > MLX5_MAX_SUB_POLICY_TBL_NUM || !priv->policy_idx_tbl)
 		return NULL;
 	if (mlx5_l3t_get_entry(priv->policy_idx_tbl, policy_id, &data) ||
@@ -710,6 +819,43 @@ mlx5_flow_meter_policy_validate(struct rte_eth_dev *dev,
 	return 0;
 }
 
+/**
+ * Callback to check MTR policy action validate for HWS
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] actions
+ *   Pointer to meter policy action detail.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_policy_hws_validate(struct rte_eth_dev *dev,
+	struct rte_mtr_meter_policy_params *policy,
+	struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_actions_template_attr attr = {
+		.transfer = priv->sh->config.dv_esw_en ? 1 : 0 };
+	int ret;
+	int i;
+
+	if (!priv->mtr_en || !priv->sh->meter_aso_en)
+		return -rte_mtr_error_set(error, ENOTSUP,
+				RTE_MTR_ERROR_TYPE_METER_POLICY,
+				NULL, "meter policy unsupported.");
+	for (i = 0; i < RTE_COLORS; i++) {
+		ret = mlx5_flow_actions_validate(dev, &attr, policy->actions[i],
+						 policy->actions[i], NULL);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
 static int
 __mlx5_flow_meter_policy_delete(struct rte_eth_dev *dev,
 			uint32_t policy_id,
@@ -1004,6 +1150,338 @@ mlx5_flow_meter_policy_delete(struct rte_eth_dev *dev,
 	return 0;
 }
 
+/**
+ * Callback to delete MTR policy for HWS.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] policy_id
+ *   Meter policy id.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_policy_hws_delete(struct rte_eth_dev *dev,
+			  uint32_t policy_id,
+			  struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_meter_policy *mtr_policy;
+	uint32_t i, j;
+	uint32_t nb_flows = 0;
+	int ret;
+	struct rte_flow_op_attr op_attr = { .postpone = 1 };
+	struct rte_flow_op_result result[RTE_COLORS * MLX5_MTR_DOMAIN_MAX];
+
+	if (!priv->mtr_policy_arr)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "Meter policy array is not allocated");
+	/* Meter id must be valid. */
+	if (policy_id >= priv->mtr_config.nb_meter_policies)
+		return -rte_mtr_error_set(error, EINVAL,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+					  &policy_id,
+					  "Meter policy id not valid.");
+	/* Meter policy must exist. */
+	mtr_policy = mlx5_flow_meter_policy_find(dev, policy_id, NULL);
+	if (!mtr_policy->initialized)
+		return -rte_mtr_error_set(error, ENOENT,
+			RTE_MTR_ERROR_TYPE_METER_POLICY_ID, NULL,
+			"Meter policy does not exists.");
+	/* Check policy is unused. */
+	if (mtr_policy->ref_cnt)
+		return -rte_mtr_error_set(error, EBUSY,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+					  NULL, "Meter policy is in use.");
+	rte_spinlock_lock(&priv->hw_ctrl_lock);
+	for (i = 0; i < MLX5_MTR_DOMAIN_MAX; i++) {
+		for (j = 0; j < RTE_COLORS; j++) {
+			if (mtr_policy->hws_flow_rule[i][j]) {
+				ret = rte_flow_async_destroy(dev->data->port_id,
+					CTRL_QUEUE_ID(priv), &op_attr,
+					mtr_policy->hws_flow_rule[i][j],
+					NULL, NULL);
+				if (ret < 0)
+					continue;
+				nb_flows++;
+			}
+		}
+	}
+	ret = rte_flow_push(dev->data->port_id, CTRL_QUEUE_ID(priv), NULL);
+	while (nb_flows && (ret >= 0)) {
+		ret = rte_flow_pull(dev->data->port_id,
+					CTRL_QUEUE_ID(priv), result,
+					nb_flows, NULL);
+		nb_flows -= ret;
+	}
+	for (i = 0; i < MLX5_MTR_DOMAIN_MAX; i++) {
+		if (mtr_policy->hws_flow_table[i])
+			rte_flow_template_table_destroy(dev->data->port_id,
+				 mtr_policy->hws_flow_table[i], NULL);
+	}
+	for (i = 0; i < RTE_COLORS; i++) {
+		if (mtr_policy->hws_act_templ[i])
+			rte_flow_actions_template_destroy(dev->data->port_id,
+				 mtr_policy->hws_act_templ[i], NULL);
+	}
+	if (mtr_policy->hws_item_templ)
+		rte_flow_pattern_template_destroy(dev->data->port_id,
+				mtr_policy->hws_item_templ, NULL);
+	rte_spinlock_unlock(&priv->hw_ctrl_lock);
+	memset(mtr_policy, 0, sizeof(struct mlx5_flow_meter_policy));
+	return 0;
+}
+
+/**
+ * Callback to add MTR policy for HWS.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[out] policy_id
+ *   Pointer to policy id
+ * @param[in] actions
+ *   Pointer to meter policy action detail.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_policy_hws_add(struct rte_eth_dev *dev,
+			uint32_t policy_id,
+			struct rte_mtr_meter_policy_params *policy,
+			struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_meter_policy *mtr_policy = NULL;
+	const struct rte_flow_action *act;
+	const struct rte_flow_action_meter *mtr;
+	struct mlx5_flow_meter_info *fm;
+	struct mlx5_flow_meter_policy *plc;
+	uint8_t domain_color = MLX5_MTR_ALL_DOMAIN_BIT;
+	bool is_rss = false;
+	bool is_hierarchy = false;
+	int i, j;
+	uint32_t nb_colors = 0;
+	uint32_t nb_flows = 0;
+	int color;
+	int ret;
+	struct rte_flow_pattern_template_attr pta = {0};
+	struct rte_flow_actions_template_attr ata = {0};
+	struct rte_flow_template_table_attr ta = { {0}, 0 };
+	struct rte_flow_op_attr op_attr = { .postpone = 1 };
+	struct rte_flow_op_result result[RTE_COLORS * MLX5_MTR_DOMAIN_MAX];
+	const uint32_t color_mask = (UINT32_C(1) << MLX5_MTR_COLOR_BITS) - 1;
+	int color_reg_c_idx = mlx5_flow_get_reg_id(dev, MLX5_MTR_COLOR,
+						   0, NULL);
+	struct rte_flow_item_tag tag_spec = {
+		.data = 0,
+		.index = color_reg_c_idx
+	};
+	struct rte_flow_item_tag tag_mask = {
+		.data = color_mask,
+		.index = 0xff};
+	struct rte_flow_item pattern[] = {
+		[0] = {
+			.type = (enum rte_flow_item_type)
+				MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+			.spec = &tag_spec,
+			.mask = &tag_mask,
+		},
+		[1] = { .type = RTE_FLOW_ITEM_TYPE_END }
+	};
+
+	if (!priv->mtr_policy_arr)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY,
+					  NULL, "Meter policy array is not allocated.");
+	if (policy_id >= priv->mtr_config.nb_meter_policies)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+					  NULL, "Meter policy id not valid.");
+	mtr_policy = mlx5_flow_meter_policy_find(dev, policy_id, NULL);
+	if (mtr_policy->initialized)
+		return -rte_mtr_error_set(error, EEXIST,
+			RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+			NULL, "Meter policy already exists.");
+	if (!policy ||
+	    !policy->actions[RTE_COLOR_RED] ||
+	    !policy->actions[RTE_COLOR_YELLOW] ||
+	    !policy->actions[RTE_COLOR_GREEN])
+		return -rte_mtr_error_set(error, EINVAL,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY,
+					  NULL, "Meter policy actions are not valid.");
+	if (policy->actions[RTE_COLOR_RED] == RTE_FLOW_ACTION_TYPE_END)
+		mtr_policy->skip_r = 1;
+	if (policy->actions[RTE_COLOR_YELLOW] == RTE_FLOW_ACTION_TYPE_END)
+		mtr_policy->skip_y = 1;
+	if (policy->actions[RTE_COLOR_GREEN] == RTE_FLOW_ACTION_TYPE_END)
+		mtr_policy->skip_g = 1;
+	if (mtr_policy->skip_r && mtr_policy->skip_y && mtr_policy->skip_g)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+					  NULL, "Meter policy actions are empty.");
+	for (i = 0; i < RTE_COLORS; i++) {
+		act = policy->actions[i];
+		while (act && act->type != RTE_FLOW_ACTION_TYPE_END) {
+			switch (act->type) {
+			case RTE_FLOW_ACTION_TYPE_PORT_ID:
+				/* fall-through. */
+			case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
+				domain_color &= ~(MLX5_MTR_DOMAIN_INGRESS_BIT |
+						  MLX5_MTR_DOMAIN_EGRESS_BIT);
+				break;
+			case RTE_FLOW_ACTION_TYPE_RSS:
+				is_rss = true;
+				/* fall-through. */
+			case RTE_FLOW_ACTION_TYPE_QUEUE:
+				domain_color &= ~(MLX5_MTR_DOMAIN_EGRESS_BIT |
+						  MLX5_MTR_DOMAIN_TRANSFER_BIT);
+				break;
+			case RTE_FLOW_ACTION_TYPE_METER:
+				is_hierarchy = true;
+				mtr = act->conf;
+				fm = mlx5_flow_meter_find(priv,
+							  mtr->mtr_id, NULL);
+				if (!fm)
+					return -rte_mtr_error_set(error, EINVAL,
+						RTE_MTR_ERROR_TYPE_MTR_ID, NULL,
+						"Meter not found in meter hierarchy.");
+				plc = mlx5_flow_meter_policy_find(dev,
+								  fm->policy_id,
+								  NULL);
+				MLX5_ASSERT(plc);
+				domain_color &= MLX5_MTR_ALL_DOMAIN_BIT &
+					(plc->ingress <<
+					 MLX5_MTR_DOMAIN_INGRESS);
+				domain_color &= MLX5_MTR_ALL_DOMAIN_BIT &
+					(plc->egress <<
+					 MLX5_MTR_DOMAIN_EGRESS);
+				domain_color &= MLX5_MTR_ALL_DOMAIN_BIT &
+					(plc->transfer <<
+					 MLX5_MTR_DOMAIN_TRANSFER);
+				break;
+			default:
+				break;
+			}
+			act++;
+		}
+	}
+	if (!domain_color)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+					  NULL, "Meter policy domains are conflicting.");
+	mtr_policy->is_rss = is_rss;
+	mtr_policy->ingress = !!(domain_color & MLX5_MTR_DOMAIN_INGRESS_BIT);
+	pta.ingress = mtr_policy->ingress;
+	mtr_policy->egress = !!(domain_color & MLX5_MTR_DOMAIN_EGRESS_BIT);
+	pta.egress = mtr_policy->egress;
+	mtr_policy->transfer = !!(domain_color & MLX5_MTR_DOMAIN_TRANSFER_BIT);
+	pta.transfer = mtr_policy->transfer;
+	mtr_policy->group = MLX5_FLOW_TABLE_HWS_POLICY - policy_id;
+	mtr_policy->is_hierarchy = is_hierarchy;
+	mtr_policy->initialized = 1;
+	rte_spinlock_lock(&priv->hw_ctrl_lock);
+	mtr_policy->hws_item_templ =
+		rte_flow_pattern_template_create(dev->data->port_id,
+						 &pta, pattern, NULL);
+	if (!mtr_policy->hws_item_templ)
+		goto policy_add_err;
+	for (i = 0; i < RTE_COLORS; i++) {
+		if (mtr_policy->skip_g && i == RTE_COLOR_GREEN)
+			continue;
+		if (mtr_policy->skip_y && i == RTE_COLOR_YELLOW)
+			continue;
+		if (mtr_policy->skip_r && i == RTE_COLOR_RED)
+			continue;
+		mtr_policy->hws_act_templ[nb_colors] =
+			rte_flow_actions_template_create(dev->data->port_id,
+						&ata, policy->actions[i],
+						policy->actions[i], NULL);
+		if (!mtr_policy->hws_act_templ[nb_colors])
+			goto policy_add_err;
+		nb_colors++;
+	}
+	for (i = 0; i < MLX5_MTR_DOMAIN_MAX; i++) {
+		memset(&ta, 0, sizeof(ta));
+		ta.nb_flows = RTE_COLORS;
+		ta.flow_attr.group = mtr_policy->group;
+		if (i == MLX5_MTR_DOMAIN_INGRESS) {
+			if (!mtr_policy->ingress)
+				continue;
+			ta.flow_attr.ingress = 1;
+		} else if (i == MLX5_MTR_DOMAIN_EGRESS) {
+			if (!mtr_policy->egress)
+				continue;
+			ta.flow_attr.egress = 1;
+		} else if (i == MLX5_MTR_DOMAIN_TRANSFER) {
+			if (!mtr_policy->transfer)
+				continue;
+			ta.flow_attr.transfer = 1;
+		}
+		mtr_policy->hws_flow_table[i] =
+			rte_flow_template_table_create(dev->data->port_id,
+					&ta, &mtr_policy->hws_item_templ, 1,
+					mtr_policy->hws_act_templ, nb_colors,
+					NULL);
+		if (!mtr_policy->hws_flow_table[i])
+			goto policy_add_err;
+		nb_colors = 0;
+		for (j = 0; j < RTE_COLORS; j++) {
+			if (mtr_policy->skip_g && j == RTE_COLOR_GREEN)
+				continue;
+			if (mtr_policy->skip_y && j == RTE_COLOR_YELLOW)
+				continue;
+			if (mtr_policy->skip_r && j == RTE_COLOR_RED)
+				continue;
+			color = rte_col_2_mlx5_col((enum rte_color)j);
+			tag_spec.data = color;
+			mtr_policy->hws_flow_rule[i][j] =
+				rte_flow_async_create(dev->data->port_id,
+					CTRL_QUEUE_ID(priv), &op_attr,
+					mtr_policy->hws_flow_table[i],
+					pattern, 0, policy->actions[j],
+					nb_colors, NULL, NULL);
+			if (!mtr_policy->hws_flow_rule[i][j])
+				goto policy_add_err;
+			nb_colors++;
+			nb_flows++;
+		}
+		ret = rte_flow_push(dev->data->port_id,
+				    CTRL_QUEUE_ID(priv), NULL);
+		if (ret < 0)
+			goto policy_add_err;
+		while (nb_flows) {
+			ret = rte_flow_pull(dev->data->port_id,
+					    CTRL_QUEUE_ID(priv), result,
+					    nb_flows, NULL);
+			if (ret < 0)
+				goto policy_add_err;
+			for (j = 0; j < ret; j++) {
+				if (result[j].status == RTE_FLOW_OP_ERROR)
+					goto policy_add_err;
+			}
+			nb_flows -= ret;
+		}
+	}
+	rte_spinlock_unlock(&priv->hw_ctrl_lock);
+	return 0;
+policy_add_err:
+	rte_spinlock_unlock(&priv->hw_ctrl_lock);
+	ret = mlx5_flow_meter_policy_hws_delete(dev, policy_id, error);
+	memset(mtr_policy, 0, sizeof(struct mlx5_flow_meter_policy));
+	if (ret)
+		return ret;
+	return -rte_mtr_error_set(error, ENOTSUP,
+				  RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Failed to create meter policy.");
+}
+
 /**
  * Check meter validation.
  *
@@ -1087,7 +1565,8 @@ mlx5_flow_meter_action_modify(struct mlx5_priv *priv,
 	if (priv->sh->meter_aso_en) {
 		fm->is_enable = !!is_enable;
 		aso_mtr = container_of(fm, struct mlx5_aso_mtr, fm);
-		ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr);
+		ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr,
+						   &priv->mtr_bulk);
 		if (ret)
 			return ret;
 		ret = mlx5_aso_mtr_wait(priv->sh, aso_mtr);
@@ -1336,7 +1815,8 @@ mlx5_flow_meter_create(struct rte_eth_dev *dev, uint32_t meter_id,
 	/* If ASO meter supported, update ASO flow meter by wqe. */
 	if (priv->sh->meter_aso_en) {
 		aso_mtr = container_of(fm, struct mlx5_aso_mtr, fm);
-		ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr);
+		ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr,
+						   &priv->mtr_bulk);
 		if (ret)
 			goto error;
 		if (!priv->mtr_idx_tbl) {
@@ -1369,6 +1849,90 @@ mlx5_flow_meter_create(struct rte_eth_dev *dev, uint32_t meter_id,
 		NULL, "Failed to create devx meter.");
 }
 
+/**
+ * Create meter rules.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] meter_id
+ *   Meter id.
+ * @param[in] params
+ *   Pointer to rte meter parameters.
+ * @param[in] shared
+ *   Meter shared with other flow or not.
+ * @param[out] error
+ *   Pointer to rte meter error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_hws_create(struct rte_eth_dev *dev, uint32_t meter_id,
+		       struct rte_mtr_params *params, int shared,
+		       struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_meter_profile *profile;
+	struct mlx5_flow_meter_info *fm;
+	struct mlx5_flow_meter_policy *policy = NULL;
+	struct mlx5_aso_mtr *aso_mtr;
+	int ret;
+
+	if (!priv->mtr_profile_arr ||
+	    !priv->mtr_policy_arr ||
+	    !priv->mtr_bulk.aso)
+		return -rte_mtr_error_set(error, ENOTSUP,
+			RTE_MTR_ERROR_TYPE_UNSPECIFIED, NULL,
+			"Meter bulk array is not allocated.");
+	/* Meter profile must exist. */
+	profile = mlx5_flow_meter_profile_find(priv, params->meter_profile_id);
+	if (!profile->initialized)
+		return -rte_mtr_error_set(error, ENOENT,
+			RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+			NULL, "Meter profile id not valid.");
+	/* Meter policy must exist. */
+	policy = mlx5_flow_meter_policy_find(dev,
+			params->meter_policy_id, NULL);
+	if (!policy->initialized)
+		return -rte_mtr_error_set(error, ENOENT,
+			RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+			NULL, "Meter policy id not valid.");
+	/* Meter ID must be valid. */
+	if (meter_id >= priv->mtr_config.nb_meters)
+		return -rte_mtr_error_set(error, EINVAL,
+			RTE_MTR_ERROR_TYPE_MTR_ID,
+			NULL, "Meter id not valid.");
+	/* Find ASO object. */
+	aso_mtr = mlx5_aso_meter_by_idx(priv, meter_id);
+	fm = &aso_mtr->fm;
+	if (fm->initialized)
+		return -rte_mtr_error_set(error, ENOENT,
+					  RTE_MTR_ERROR_TYPE_MTR_ID,
+					  NULL, "Meter object already exists.");
+	/* Fill the flow meter parameters. */
+	fm->meter_id = meter_id;
+	fm->policy_id = params->meter_policy_id;
+	fm->profile = profile;
+	fm->meter_offset = meter_id;
+	fm->group = policy->group;
+	/* Add to the flow meter list. */
+	fm->active_state = 1; /* Config meter starts as active. */
+	fm->is_enable = params->meter_enable;
+	fm->shared = !!shared;
+	fm->initialized = 1;
+	/* Update ASO flow meter by wqe. */
+	ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr,
+					   &priv->mtr_bulk);
+	if (ret)
+		return -rte_mtr_error_set(error, ENOTSUP,
+			RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+			NULL, "Failed to create devx meter.");
+	fm->active_state = params->meter_enable;
+	__atomic_add_fetch(&fm->profile->ref_cnt, 1, __ATOMIC_RELAXED);
+	__atomic_add_fetch(&policy->ref_cnt, 1, __ATOMIC_RELAXED);
+	return 0;
+}
+
 static int
 mlx5_flow_meter_params_flush(struct rte_eth_dev *dev,
 			struct mlx5_flow_meter_info *fm,
@@ -1475,6 +2039,58 @@ mlx5_flow_meter_destroy(struct rte_eth_dev *dev, uint32_t meter_id,
 	return 0;
 }
 
+/**
+ * Destroy meter rules.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] meter_id
+ *   Meter id.
+ * @param[out] error
+ *   Pointer to rte meter error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_hws_destroy(struct rte_eth_dev *dev, uint32_t meter_id,
+			struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr *aso_mtr;
+	struct mlx5_flow_meter_info *fm;
+	struct mlx5_flow_meter_policy *policy;
+
+	if (!priv->mtr_profile_arr ||
+	    !priv->mtr_policy_arr ||
+	    !priv->mtr_bulk.aso)
+		return -rte_mtr_error_set(error, ENOTSUP,
+			RTE_MTR_ERROR_TYPE_METER_POLICY, NULL,
+			"Meter bulk array is not allocated.");
+	/* Find ASO object. */
+	aso_mtr = mlx5_aso_meter_by_idx(priv, meter_id);
+	fm = &aso_mtr->fm;
+	if (!fm->initialized)
+		return -rte_mtr_error_set(error, ENOENT,
+					  RTE_MTR_ERROR_TYPE_MTR_ID,
+					  NULL, "Meter object id not valid.");
+	/* Meter object must not have any owner. */
+	if (fm->ref_cnt > 0)
+		return -rte_mtr_error_set(error, EBUSY,
+					  RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "Meter object is being used.");
+	/* Destroy the meter profile. */
+	__atomic_sub_fetch(&fm->profile->ref_cnt,
+						1, __ATOMIC_RELAXED);
+	/* Destroy the meter policy. */
+	policy = mlx5_flow_meter_policy_find(dev,
+			fm->policy_id, NULL);
+	__atomic_sub_fetch(&policy->ref_cnt,
+						1, __ATOMIC_RELAXED);
+	memset(fm, 0, sizeof(struct mlx5_flow_meter_info));
+	return 0;
+}
+
 /**
  * Modify meter state.
  *
@@ -1798,6 +2414,23 @@ static const struct rte_mtr_ops mlx5_flow_mtr_ops = {
 	.stats_read = mlx5_flow_meter_stats_read,
 };
 
+static const struct rte_mtr_ops mlx5_flow_mtr_hws_ops = {
+	.capabilities_get = mlx5_flow_mtr_cap_get,
+	.meter_profile_add = mlx5_flow_meter_profile_hws_add,
+	.meter_profile_delete = mlx5_flow_meter_profile_hws_delete,
+	.meter_policy_validate = mlx5_flow_meter_policy_hws_validate,
+	.meter_policy_add = mlx5_flow_meter_policy_hws_add,
+	.meter_policy_delete = mlx5_flow_meter_policy_hws_delete,
+	.create = mlx5_flow_meter_hws_create,
+	.destroy = mlx5_flow_meter_hws_destroy,
+	.meter_enable = mlx5_flow_meter_enable,
+	.meter_disable = mlx5_flow_meter_disable,
+	.meter_profile_update = mlx5_flow_meter_profile_update,
+	.meter_dscp_table_update = NULL,
+	.stats_update = NULL,
+	.stats_read = NULL,
+};
+
 /**
  * Get meter operations.
  *
@@ -1812,7 +2445,12 @@ static const struct rte_mtr_ops mlx5_flow_mtr_ops = {
 int
 mlx5_flow_meter_ops_get(struct rte_eth_dev *dev __rte_unused, void *arg)
 {
-	*(const struct rte_mtr_ops **)arg = &mlx5_flow_mtr_ops;
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (priv->sh->config.dv_flow_en == 2)
+		*(const struct rte_mtr_ops **)arg = &mlx5_flow_mtr_hws_ops;
+	else
+		*(const struct rte_mtr_ops **)arg = &mlx5_flow_mtr_ops;
 	return 0;
 }
 
@@ -1841,6 +2479,12 @@ mlx5_flow_meter_find(struct mlx5_priv *priv, uint32_t meter_id,
 	union mlx5_l3t_data data;
 	uint16_t n_valid;
 
+	if (priv->mtr_bulk.aso) {
+		if (mtr_idx)
+			*mtr_idx = meter_id;
+		aso_mtr = priv->mtr_bulk.aso + meter_id;
+		return &aso_mtr->fm;
+	}
 	if (priv->sh->meter_aso_en) {
 		rte_rwlock_read_lock(&pools_mng->resize_mtrwl);
 		n_valid = pools_mng->n_valid;
@@ -2185,6 +2829,7 @@ mlx5_flow_meter_flush(struct rte_eth_dev *dev, struct rte_mtr_error *error)
 	struct mlx5_flow_meter_profile *fmp;
 	struct mlx5_legacy_flow_meter *legacy_fm;
 	struct mlx5_flow_meter_info *fm;
+	struct mlx5_flow_meter_policy *policy;
 	struct mlx5_flow_meter_sub_policy *sub_policy;
 	void *tmp;
 	uint32_t i, mtr_idx, policy_idx;
@@ -2219,6 +2864,14 @@ mlx5_flow_meter_flush(struct rte_eth_dev *dev, struct rte_mtr_error *error)
 				NULL, "MTR object meter profile invalid.");
 		}
 	}
+	if (priv->mtr_bulk.aso) {
+		for (i = 1; i <= priv->mtr_config.nb_meter_profiles; i++) {
+			aso_mtr = mlx5_aso_meter_by_idx(priv, i);
+			fm = &aso_mtr->fm;
+			if (fm->initialized)
+				mlx5_flow_meter_hws_destroy(dev, i, error);
+		}
+	}
 	if (priv->policy_idx_tbl) {
 		MLX5_L3T_FOREACH(priv->policy_idx_tbl, i, entry) {
 			policy_idx = *(uint32_t *)entry;
@@ -2244,6 +2897,15 @@ mlx5_flow_meter_flush(struct rte_eth_dev *dev, struct rte_mtr_error *error)
 		mlx5_l3t_destroy(priv->policy_idx_tbl);
 		priv->policy_idx_tbl = NULL;
 	}
+	if (priv->mtr_policy_arr) {
+		for (i = 0; i < priv->mtr_config.nb_meter_policies; i++) {
+			policy = mlx5_flow_meter_policy_find(dev, i,
+							     &policy_idx);
+			if (policy->initialized)
+				mlx5_flow_meter_policy_hws_delete(dev, i,
+								  error);
+		}
+	}
 	if (priv->mtr_profile_tbl) {
 		MLX5_L3T_FOREACH(priv->mtr_profile_tbl, i, entry) {
 			fmp = entry;
@@ -2257,9 +2919,21 @@ mlx5_flow_meter_flush(struct rte_eth_dev *dev, struct rte_mtr_error *error)
 		mlx5_l3t_destroy(priv->mtr_profile_tbl);
 		priv->mtr_profile_tbl = NULL;
 	}
+	if (priv->mtr_profile_arr) {
+		for (i = 0; i < priv->mtr_config.nb_meter_profiles; i++) {
+			fmp = mlx5_flow_meter_profile_find(priv, i);
+			if (fmp->initialized)
+				mlx5_flow_meter_profile_hws_delete(dev, i,
+								   error);
+		}
+	}
 	/* Delete default policy table. */
 	mlx5_flow_destroy_def_policy(dev);
 	if (priv->sh->refcnt == 1)
 		mlx5_flow_destroy_mtr_drop_tbls(dev);
+#ifdef HAVE_MLX5_HWS_SUPPORT
+	/* Destroy HWS configuration. */
+	mlx5_flow_meter_uninit(dev);
+#endif
 	return 0;
 }
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v4 08/18] net/mlx5: add HW steering counter action
  2022-10-19 16:25 ` [PATCH v4 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (6 preceding siblings ...)
  2022-10-19 16:25   ` [PATCH v4 07/18] net/mlx5: add HW steering meter action Suanming Mou
@ 2022-10-19 16:25   ` Suanming Mou
  2022-10-19 16:25   ` [PATCH v4 09/18] net/mlx5: support DR action template API Suanming Mou
                     ` (9 subsequent siblings)
  17 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-10-19 16:25 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko, Ray Kinsella
  Cc: dev, rasland, orika, Xiaoyu Min

From: Xiaoyu Min <jackmin@nvidia.com>

This commit adds HW steering counter action support.
A pool mechanism is the basic data structure for the HW steering counters.

The HW steering counter pool is based on the zero-copy variation of
rte_ring.

There are two global rte_rings:
1. free_list:
     Stores the counter indexes which are ready for use.
2. wait_reset_list:
     Stores the counter indexes which have just been freed by the user
     and need a hardware counter query to retrieve the reset value
     before they can be reused.

The counter pool also supports a cache per HW steering queue, which is
also based on the zero-copy variation of rte_ring.

The cache can be configured in size, preload, threshold, and fetch size;
all of them are exposed via device args.

The main operations of the counter pool are as follows:

 - Get one counter from the pool:
   1. The user calls the _get_* API.
   2. If the cache is enabled, dequeue one counter index from the local
      cache:
      2.A: If the dequeued counter is still in reset status (the
           counter's query_gen_when_free is equal to the pool's query
           gen):
           I. Flush all counters in the local cache back to the global
              wait_reset_list.
           II. Fetch _fetch_sz_ counters into the cache from the global
               free list.
           III. Fetch one counter from the cache.
   3. If the cache is empty, fetch _fetch_sz_ counters from the global
      free list into the cache and fetch one counter from the cache.
 - Free one counter into the pool:
   1. The user calls the _put_* API.
   2. Put the counter into the local cache.
   3. If the local cache is full:
      3.A: Write back all counters above _threshold_ into the global
           wait_reset_list.
      3.B: Also write back this counter into the global wait_reset_list.

When the local cache is disabled, _get_/_put_ operate directly on the
global lists; a simplified sketch of this flow is shown below.
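
Below is a minimal, self-contained sketch of this get/put flow using the
plain rte_ring element API. All names and sizes here (cnt_pool_sketch,
sketch_cnt_get, sketch_cnt_put, the 4-queue cache) are illustrative only:
the real pool uses the zero-copy ring API and the mlx5_hws_cnt_pool
layout added by this patch, and it additionally performs the reset-status
check of step 2.A above.

#include <errno.h>
#include <stdint.h>
#include <rte_common.h>
#include <rte_ring.h>
#include <rte_ring_elem.h>

typedef uint32_t cnt_id_t;

struct cnt_pool_sketch {
	struct rte_ring *free_list;       /* counter ids ready for use */
	struct rte_ring *wait_reset_list; /* freed ids waiting for a FW query */
	struct rte_ring *qcache[4];       /* per-queue local caches */
	unsigned int fetch_sz;            /* ids pulled from the free list at once */
	unsigned int threshold;           /* cache write-back watermark */
};

/* Get one counter id, preferring the queue-local cache. */
static int
sketch_cnt_get(struct cnt_pool_sketch *p, uint32_t queue, cnt_id_t *id)
{
	struct rte_ring *cache = p->qcache[queue];
	cnt_id_t ids[64];
	unsigned int n, i;

	if (cache == NULL) /* cache disabled: use the global free list */
		return rte_ring_dequeue_elem(p->free_list, id, sizeof(*id));
	if (rte_ring_dequeue_elem(cache, id, sizeof(*id)) == 0)
		return 0; /* the real pool also checks the reset status here */
	/* Cache empty: refill up to fetch_sz ids from the global free list. */
	n = rte_ring_dequeue_burst_elem(p->free_list, ids, sizeof(ids[0]),
					RTE_MIN(p->fetch_sz,
						(unsigned int)RTE_DIM(ids)),
					NULL);
	if (n == 0)
		return -ENOENT;
	*id = ids[0];
	for (i = 1; i < n; i++)
		rte_ring_enqueue_elem(cache, &ids[i], sizeof(ids[i]));
	return 0;
}

/* Put one counter id back; spilled ids go to the wait-reset list. */
static void
sketch_cnt_put(struct cnt_pool_sketch *p, uint32_t queue, cnt_id_t id)
{
	struct rte_ring *cache = p->qcache[queue];
	cnt_id_t spill;

	if (cache == NULL || rte_ring_enqueue_elem(cache, &id, sizeof(id)) != 0) {
		/* Cache disabled or full: the id must wait for a reset query. */
		rte_ring_enqueue_elem(p->wait_reset_list, &id, sizeof(id));
		return;
	}
	/* Drain everything above the threshold back to the wait-reset list. */
	while (rte_ring_count(cache) > p->threshold &&
	       rte_ring_dequeue_elem(cache, &spill, sizeof(spill)) == 0)
		rte_ring_enqueue_elem(p->wait_reset_list, &spill, sizeof(spill));
}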

Signed-off-by: Xiaoyu Min <jackmin@nvidia.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c |  50 +++
 drivers/common/mlx5/mlx5_devx_cmds.h |  27 ++
 drivers/common/mlx5/mlx5_prm.h       |  20 +-
 drivers/common/mlx5/version.map      |   1 +
 drivers/net/mlx5/meson.build         |   1 +
 drivers/net/mlx5/mlx5.c              |  14 +
 drivers/net/mlx5/mlx5.h              |  27 ++
 drivers/net/mlx5/mlx5_defs.h         |   2 +
 drivers/net/mlx5/mlx5_flow.c         |  27 +-
 drivers/net/mlx5/mlx5_flow.h         |   5 +
 drivers/net/mlx5/mlx5_flow_aso.c     | 261 ++++++++++++-
 drivers/net/mlx5/mlx5_flow_hw.c      | 340 +++++++++++++++-
 drivers/net/mlx5/mlx5_hws_cnt.c      | 528 +++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_hws_cnt.h      | 558 +++++++++++++++++++++++++++
 14 files changed, 1830 insertions(+), 31 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_hws_cnt.c
 create mode 100644 drivers/net/mlx5/mlx5_hws_cnt.h
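
For reference: the counter service added here is controlled by two new
device arguments handled in mlx5.c below, "service_core" (lcore running
the background query thread, default: the main lcore) and
"svc_cycle_time" (query interval in milliseconds, default
MLX5_CNT_SVC_CYCLE_TIME_DEFAULT, i.e. 500 ms). Counters are only
allocated when the application requests them through the nb_counters
field of struct rte_flow_port_attr at rte_flow_configure() time (see
flow_hw_configure() below). A purely illustrative invocation, with a
placeholder PCI address and HW steering enabled via dv_flow_en=2:

  dpdk-testpmd -a 0000:08:00.0,dv_flow_en=2,service_core=2,svc_cycle_time=500 -- -i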

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 9c185366d0..05b9429c7f 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -176,6 +176,41 @@ mlx5_devx_cmd_register_write(void *ctx, uint16_t reg_id, uint32_t arg,
 	return 0;
 }
 
+struct mlx5_devx_obj *
+mlx5_devx_cmd_flow_counter_alloc_general(void *ctx,
+		struct mlx5_devx_counter_attr *attr)
+{
+	struct mlx5_devx_obj *dcs = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*dcs),
+						0, SOCKET_ID_ANY);
+	uint32_t in[MLX5_ST_SZ_DW(alloc_flow_counter_in)]   = {0};
+	uint32_t out[MLX5_ST_SZ_DW(alloc_flow_counter_out)] = {0};
+
+	if (!dcs) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(alloc_flow_counter_in, in, opcode,
+		 MLX5_CMD_OP_ALLOC_FLOW_COUNTER);
+	if (attr->bulk_log_max_alloc)
+		MLX5_SET(alloc_flow_counter_in, in, flow_counter_bulk_log_size,
+			 attr->flow_counter_bulk_log_size);
+	else
+		MLX5_SET(alloc_flow_counter_in, in, flow_counter_bulk,
+			 attr->bulk_n_128);
+	if (attr->pd_valid)
+		MLX5_SET(alloc_flow_counter_in, in, pd, attr->pd);
+	dcs->obj = mlx5_glue->devx_obj_create(ctx, in,
+					      sizeof(in), out, sizeof(out));
+	if (!dcs->obj) {
+		DRV_LOG(ERR, "Can't allocate counters - error %d", errno);
+		rte_errno = errno;
+		mlx5_free(dcs);
+		return NULL;
+	}
+	dcs->id = MLX5_GET(alloc_flow_counter_out, out, flow_counter_id);
+	return dcs;
+}
+
 /**
  * Allocate flow counters via devx interface.
  *
@@ -967,6 +1002,16 @@ mlx5_devx_cmd_query_hca_attr(void *ctx,
 					 general_obj_types) &
 			      MLX5_GENERAL_OBJ_TYPES_CAP_CONN_TRACK_OFFLOAD);
 	attr->rq_delay_drop = MLX5_GET(cmd_hca_cap, hcattr, rq_delay_drop);
+	attr->max_flow_counter_15_0 = MLX5_GET(cmd_hca_cap, hcattr,
+			max_flow_counter_15_0);
+	attr->max_flow_counter_31_16 = MLX5_GET(cmd_hca_cap, hcattr,
+			max_flow_counter_31_16);
+	attr->alloc_flow_counter_pd = MLX5_GET(cmd_hca_cap, hcattr,
+			alloc_flow_counter_pd);
+	attr->flow_counter_access_aso = MLX5_GET(cmd_hca_cap, hcattr,
+			flow_counter_access_aso);
+	attr->flow_access_aso_opc_mod = MLX5_GET(cmd_hca_cap, hcattr,
+			flow_access_aso_opc_mod);
 	if (attr->crypto) {
 		attr->aes_xts = MLX5_GET(cmd_hca_cap, hcattr, aes_xts);
 		hcattr = mlx5_devx_get_hca_cap(ctx, in, out, &rc,
@@ -995,6 +1040,11 @@ mlx5_devx_cmd_query_hca_attr(void *ctx,
 							   hairpin_sq_wq_in_host_mem);
 		attr->hairpin_data_buffer_locked = MLX5_GET(cmd_hca_cap_2, hcattr,
 							    hairpin_data_buffer_locked);
+		attr->flow_counter_bulk_log_max_alloc = MLX5_GET(cmd_hca_cap_2,
+				hcattr, flow_counter_bulk_log_max_alloc);
+		attr->flow_counter_bulk_log_granularity =
+			MLX5_GET(cmd_hca_cap_2, hcattr,
+				 flow_counter_bulk_log_granularity);
 	}
 	if (attr->log_min_stride_wqe_sz == 0)
 		attr->log_min_stride_wqe_sz = MLX5_MPRQ_LOG_MIN_STRIDE_WQE_SIZE;
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index a10aa3331b..c94b9eac06 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -15,6 +15,16 @@
 #define MLX5_DEVX_MAX_KLM_ENTRIES ((UINT16_MAX - \
 		MLX5_ST_SZ_DW(create_mkey_in) * 4) / (MLX5_ST_SZ_DW(klm) * 4))
 
+struct mlx5_devx_counter_attr {
+	uint32_t pd_valid:1;
+	uint32_t pd:24;
+	uint32_t bulk_log_max_alloc:1;
+	union {
+		uint8_t flow_counter_bulk_log_size;
+		uint8_t bulk_n_128;
+	};
+};
+
 struct mlx5_devx_mkey_attr {
 	uint64_t addr;
 	uint64_t size;
@@ -266,6 +276,18 @@ struct mlx5_hca_attr {
 	uint32_t set_reg_c:8;
 	uint32_t nic_flow_table:1;
 	uint32_t modify_outer_ip_ecn:1;
+	union {
+		uint32_t max_flow_counter;
+		struct {
+			uint16_t max_flow_counter_15_0;
+			uint16_t max_flow_counter_31_16;
+		};
+	};
+	uint32_t flow_counter_bulk_log_max_alloc:5;
+	uint32_t flow_counter_bulk_log_granularity:5;
+	uint32_t alloc_flow_counter_pd:1;
+	uint32_t flow_counter_access_aso:1;
+	uint32_t flow_access_aso_opc_mod:8;
 };
 
 /* LAG Context. */
@@ -598,6 +620,11 @@ struct mlx5_devx_crypto_login_attr {
 
 /* mlx5_devx_cmds.c */
 
+__rte_internal
+struct mlx5_devx_obj *
+mlx5_devx_cmd_flow_counter_alloc_general(void *ctx,
+				struct mlx5_devx_counter_attr *attr);
+
 __rte_internal
 struct mlx5_devx_obj *mlx5_devx_cmd_flow_counter_alloc(void *ctx,
 						       uint32_t bulk_sz);
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index fb3c43eed9..2b5c43ee6e 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -1170,8 +1170,10 @@ struct mlx5_ifc_alloc_flow_counter_in_bits {
 	u8 reserved_at_10[0x10];
 	u8 reserved_at_20[0x10];
 	u8 op_mod[0x10];
-	u8 flow_counter_id[0x20];
-	u8 reserved_at_40[0x18];
+	u8 reserved_at_40[0x8];
+	u8 pd[0x18];
+	u8 reserved_at_60[0x13];
+	u8 flow_counter_bulk_log_size[0x5];
 	u8 flow_counter_bulk[0x8];
 };
 
@@ -1405,7 +1407,13 @@ enum {
 #define MLX5_STEERING_LOGIC_FORMAT_CONNECTX_6DX 0x1
 
 struct mlx5_ifc_cmd_hca_cap_bits {
-	u8 reserved_at_0[0x20];
+	u8 access_other_hca_roce[0x1];
+	u8 alloc_flow_counter_pd[0x1];
+	u8 flow_counter_access_aso[0x1];
+	u8 reserved_at_3[0x5];
+	u8 flow_access_aso_opc_mod[0x8];
+	u8 reserved_at_10[0xf];
+	u8 vhca_resource_manager[0x1];
 	u8 hca_cap_2[0x1];
 	u8 reserved_at_21[0xf];
 	u8 vhca_id[0x10];
@@ -2118,7 +2126,11 @@ struct mlx5_ifc_cmd_hca_cap_2_bits {
 	u8 format_select_dw_8_6_ext[0x1];
 	u8 reserved_at_1ac[0x14];
 	u8 general_obj_types_127_64[0x40];
-	u8 reserved_at_200[0x80];
+	u8 reserved_at_200[0x53];
+	u8 flow_counter_bulk_log_max_alloc[0x5];
+	u8 reserved_at_258[0x3];
+	u8 flow_counter_bulk_log_granularity[0x5];
+	u8 reserved_at_260[0x20];
 	u8 format_select_dw_gtpu_dw_0[0x8];
 	u8 format_select_dw_gtpu_dw_1[0x8];
 	u8 format_select_dw_gtpu_dw_2[0x8];
diff --git a/drivers/common/mlx5/version.map b/drivers/common/mlx5/version.map
index 413dec14ab..4f72900519 100644
--- a/drivers/common/mlx5/version.map
+++ b/drivers/common/mlx5/version.map
@@ -40,6 +40,7 @@ INTERNAL {
 	mlx5_devx_cmd_create_virtq;
 	mlx5_devx_cmd_destroy;
 	mlx5_devx_cmd_flow_counter_alloc;
+	mlx5_devx_cmd_flow_counter_alloc_general;
 	mlx5_devx_cmd_flow_counter_query;
 	mlx5_devx_cmd_flow_dump;
 	mlx5_devx_cmd_flow_single_dump;
diff --git a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build
index c3b8fa16d3..0b506e52b4 100644
--- a/drivers/net/mlx5/meson.build
+++ b/drivers/net/mlx5/meson.build
@@ -41,6 +41,7 @@ sources = files(
 if is_linux
     sources += files(
             'mlx5_flow_hw.c',
+	    'mlx5_hws_cnt.c',
             'mlx5_flow_verbs.c',
     )
     if (dpdk_conf.has('RTE_ARCH_X86_64')
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 9cd4892858..4d87da8e29 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -175,6 +175,12 @@
 /* Device parameter to create the fdb default rule in PMD */
 #define MLX5_FDB_DEFAULT_RULE_EN "fdb_def_rule_en"
 
+/* HW steering counter configuration. */
+#define MLX5_HWS_CNT_SERVICE_CORE "service_core"
+
+/* HW steering counter's query interval. */
+#define MLX5_HWS_CNT_CYCLE_TIME "svc_cycle_time"
+
 /* Shared memory between primary and secondary processes. */
 struct mlx5_shared_data *mlx5_shared_data;
 
@@ -1245,6 +1251,10 @@ mlx5_dev_args_check_handler(const char *key, const char *val, void *opaque)
 		config->allow_duplicate_pattern = !!tmp;
 	} else if (strcmp(MLX5_FDB_DEFAULT_RULE_EN, key) == 0) {
 		config->fdb_def_rule = !!tmp;
+	} else if (strcmp(MLX5_HWS_CNT_SERVICE_CORE, key) == 0) {
+		config->cnt_svc.service_core = tmp;
+	} else if (strcmp(MLX5_HWS_CNT_CYCLE_TIME, key) == 0) {
+		config->cnt_svc.cycle_time = tmp;
 	}
 	return 0;
 }
@@ -1281,6 +1291,8 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 		MLX5_DECAP_EN,
 		MLX5_ALLOW_DUPLICATE_PATTERN,
 		MLX5_FDB_DEFAULT_RULE_EN,
+		MLX5_HWS_CNT_SERVICE_CORE,
+		MLX5_HWS_CNT_CYCLE_TIME,
 		NULL,
 	};
 	int ret = 0;
@@ -1293,6 +1305,8 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 	config->decap_en = 1;
 	config->allow_duplicate_pattern = 1;
 	config->fdb_def_rule = 1;
+	config->cnt_svc.cycle_time = MLX5_CNT_SVC_CYCLE_TIME_DEFAULT;
+	config->cnt_svc.service_core = rte_get_main_lcore();
 	if (mkvlist != NULL) {
 		/* Process parameters. */
 		ret = mlx5_kvargs_process(mkvlist, params,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 325f0b31c5..c71db131a1 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -313,6 +313,10 @@ struct mlx5_sh_config {
 	uint32_t hw_fcs_strip:1; /* FCS stripping is supported. */
 	uint32_t allow_duplicate_pattern:1;
 	uint32_t lro_allowed:1; /* Whether LRO is allowed. */
+	struct {
+		uint16_t service_core;
+		uint32_t cycle_time; /* query cycle time in milli-second. */
+	} cnt_svc; /* configure for HW steering's counter's service. */
 	/* Allow/Prevent the duplicate rules pattern. */
 	uint32_t fdb_def_rule:1; /* Create FDB default jump rule */
 };
@@ -1229,6 +1233,22 @@ struct mlx5_flex_item {
 	struct mlx5_flex_pattern_field map[MLX5_FLEX_ITEM_MAPPING_NUM];
 };
 
+#define HWS_CNT_ASO_SQ_NUM 4
+
+struct mlx5_hws_aso_mng {
+	uint16_t sq_num;
+	struct mlx5_aso_sq sqs[HWS_CNT_ASO_SQ_NUM];
+};
+
+struct mlx5_hws_cnt_svc_mng {
+	uint32_t refcnt;
+	uint32_t service_core;
+	uint32_t query_interval;
+	pthread_t service_thread;
+	uint8_t svc_running;
+	struct mlx5_hws_aso_mng aso_mng __rte_cache_aligned;
+};
+
 /*
  * Shared Infiniband device context for Master/Representors
  * which belong to same IB device with multiple IB ports.
@@ -1328,6 +1348,7 @@ struct mlx5_dev_ctx_shared {
 	pthread_mutex_t lwm_config_lock;
 	uint32_t host_shaper_rate:8;
 	uint32_t lwm_triggered:1;
+	struct mlx5_hws_cnt_svc_mng *cnt_svc;
 	struct mlx5_dev_shared_port port[]; /* per device port data array. */
 };
 
@@ -1614,6 +1635,7 @@ struct mlx5_priv {
 	/* Flex items have been created on the port. */
 	uint32_t flex_item_map; /* Map of allocated flex item elements. */
 	uint32_t nb_queue; /* HW steering queue number. */
+	struct mlx5_hws_cnt_pool *hws_cpool; /* HW steering's counter pool. */
 #if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
 	/* Item template list. */
 	LIST_HEAD(flow_hw_itt, rte_flow_pattern_template) flow_hw_itt;
@@ -2044,6 +2066,11 @@ mlx5_get_supported_sw_parsing_offloads(const struct mlx5_hca_attr *attr);
 uint32_t
 mlx5_get_supported_tunneling_offloads(const struct mlx5_hca_attr *attr);
 
+int mlx5_aso_cnt_queue_init(struct mlx5_dev_ctx_shared *sh);
+void mlx5_aso_cnt_queue_uninit(struct mlx5_dev_ctx_shared *sh);
+int mlx5_aso_cnt_query(struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_hws_cnt_pool *cpool);
+
 /* mlx5_flow_flex.c */
 
 struct rte_flow_item_flex_handle *
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index 585afb0a98..d064abfef3 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -188,4 +188,6 @@
 #define static_assert _Static_assert
 #endif
 
+#define MLX5_CNT_SVC_CYCLE_TIME_DEFAULT 500
+
 #endif /* RTE_PMD_MLX5_DEFS_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 892c42a10b..38932fe9d7 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -7834,24 +7834,33 @@ mlx5_flow_isolate(struct rte_eth_dev *dev,
  */
 static int
 flow_drv_query(struct rte_eth_dev *dev,
-	       uint32_t flow_idx,
+	       struct rte_flow *eflow,
 	       const struct rte_flow_action *actions,
 	       void *data,
 	       struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	const struct mlx5_flow_driver_ops *fops;
-	struct rte_flow *flow = mlx5_ipool_get(priv->flows[MLX5_FLOW_TYPE_GEN],
-					       flow_idx);
-	enum mlx5_flow_drv_type ftype;
+	struct rte_flow *flow = NULL;
+	enum mlx5_flow_drv_type ftype = MLX5_FLOW_TYPE_MIN;
 
+	if (priv->sh->config.dv_flow_en == 2) {
+#ifdef HAVE_MLX5_HWS_SUPPORT
+		flow = eflow;
+		ftype = MLX5_FLOW_TYPE_HW;
+#endif
+	} else {
+		flow = (struct rte_flow *)mlx5_ipool_get(priv->flows[MLX5_FLOW_TYPE_GEN],
+				(uintptr_t)(void *)eflow);
+	}
 	if (!flow) {
 		return rte_flow_error_set(error, ENOENT,
 			  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 			  NULL,
 			  "invalid flow handle");
 	}
-	ftype = flow->drv_type;
+	if (ftype == MLX5_FLOW_TYPE_MIN)
+		ftype = flow->drv_type;
 	MLX5_ASSERT(ftype > MLX5_FLOW_TYPE_MIN && ftype < MLX5_FLOW_TYPE_MAX);
 	fops = flow_get_drv_ops(ftype);
 
@@ -7872,14 +7881,8 @@ mlx5_flow_query(struct rte_eth_dev *dev,
 		struct rte_flow_error *error)
 {
 	int ret;
-	struct mlx5_priv *priv = dev->data->dev_private;
 
-	if (priv->sh->config.dv_flow_en == 2)
-		return rte_flow_error_set(error, ENOTSUP,
-			  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-			  NULL,
-			  "Flow non-Q query not supported");
-	ret = flow_drv_query(dev, (uintptr_t)(void *)flow, actions, data,
+	ret = flow_drv_query(dev, flow, actions, data,
 			     error);
 	if (ret < 0)
 		return ret;
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 5f89afbe29..1948de5dd8 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1109,6 +1109,7 @@ struct rte_flow_hw {
 		struct mlx5_hrxq *hrxq; /* TIR action. */
 	};
 	struct rte_flow_template_table *table; /* The table flow allcated from. */
+	uint32_t cnt_id;
 	uint8_t rule[0]; /* HWS layer data struct. */
 } __rte_packed;
 
@@ -1157,6 +1158,9 @@ struct mlx5_action_construct_data {
 			uint32_t level; /* RSS level. */
 			uint32_t idx; /* Shared action index. */
 		} shared_rss;
+		struct {
+			uint32_t id;
+		} shared_counter;
 	};
 };
 
@@ -1235,6 +1239,7 @@ struct mlx5_hw_actions {
 	uint16_t encap_decap_pos; /* Encap/Decap action position. */
 	uint32_t acts_num:4; /* Total action number. */
 	uint32_t mark:1; /* Indicate the mark action. */
+	uint32_t cnt_id; /* Counter id. */
 	/* Translated DR action array from action template. */
 	struct mlx5dr_rule_action rule_acts[MLX5_HW_MAX_ACTS];
 };
diff --git a/drivers/net/mlx5/mlx5_flow_aso.c b/drivers/net/mlx5/mlx5_flow_aso.c
index 60d0280367..ed9272e583 100644
--- a/drivers/net/mlx5/mlx5_flow_aso.c
+++ b/drivers/net/mlx5/mlx5_flow_aso.c
@@ -12,6 +12,9 @@
 
 #include "mlx5.h"
 #include "mlx5_flow.h"
+#include "mlx5_hws_cnt.h"
+
+#define MLX5_ASO_CNT_QUEUE_LOG_DESC 14
 
 /**
  * Free MR resources.
@@ -79,6 +82,33 @@ mlx5_aso_destroy_sq(struct mlx5_aso_sq *sq)
 	memset(sq, 0, sizeof(*sq));
 }
 
+/**
+ * Initialize Send Queue used for ASO access counter.
+ *
+ * @param[in] sq
+ *   ASO SQ to initialize.
+ */
+static void
+mlx5_aso_cnt_init_sq(struct mlx5_aso_sq *sq)
+{
+	volatile struct mlx5_aso_wqe *restrict wqe;
+	int i;
+	int size = 1 << sq->log_desc_n;
+
+	/* All the next fields state should stay constant. */
+	for (i = 0, wqe = &sq->sq_obj.aso_wqes[0]; i < size; ++i, ++wqe) {
+		wqe->general_cseg.sq_ds = rte_cpu_to_be_32((sq->sqn << 8) |
+							  (sizeof(*wqe) >> 4));
+		wqe->aso_cseg.operand_masks = rte_cpu_to_be_32
+			(0u |
+			 (ASO_OPER_LOGICAL_OR << ASO_CSEG_COND_OPER_OFFSET) |
+			 (ASO_OP_ALWAYS_FALSE << ASO_CSEG_COND_1_OPER_OFFSET) |
+			 (ASO_OP_ALWAYS_FALSE << ASO_CSEG_COND_0_OPER_OFFSET) |
+			 (BYTEWISE_64BYTE << ASO_CSEG_DATA_MASK_MODE_OFFSET));
+		wqe->aso_cseg.data_mask = RTE_BE64(UINT64_MAX);
+	}
+}
+
 /**
  * Initialize Send Queue used for ASO access.
  *
@@ -191,7 +221,7 @@ mlx5_aso_ct_init_sq(struct mlx5_aso_sq *sq)
  */
 static int
 mlx5_aso_sq_create(struct mlx5_common_device *cdev, struct mlx5_aso_sq *sq,
-		   void *uar)
+		   void *uar, uint16_t log_desc_n)
 {
 	struct mlx5_devx_cq_attr cq_attr = {
 		.uar_page_id = mlx5_os_get_devx_uar_page_id(uar),
@@ -212,12 +242,12 @@ mlx5_aso_sq_create(struct mlx5_common_device *cdev, struct mlx5_aso_sq *sq,
 	int ret;
 
 	if (mlx5_devx_cq_create(cdev->ctx, &sq->cq.cq_obj,
-				MLX5_ASO_QUEUE_LOG_DESC, &cq_attr,
+				log_desc_n, &cq_attr,
 				SOCKET_ID_ANY))
 		goto error;
 	sq->cq.cq_ci = 0;
-	sq->cq.log_desc_n = MLX5_ASO_QUEUE_LOG_DESC;
-	sq->log_desc_n = MLX5_ASO_QUEUE_LOG_DESC;
+	sq->cq.log_desc_n = log_desc_n;
+	sq->log_desc_n = log_desc_n;
 	sq_attr.cqn = sq->cq.cq_obj.cq->id;
 	/* for mlx5_aso_wqe that is twice the size of mlx5_wqe */
 	log_wqbb_n = sq->log_desc_n + 1;
@@ -269,7 +299,8 @@ mlx5_aso_queue_init(struct mlx5_dev_ctx_shared *sh,
 				    sq_desc_n, &sh->aso_age_mng->aso_sq.mr))
 			return -1;
 		if (mlx5_aso_sq_create(cdev, &sh->aso_age_mng->aso_sq,
-				       sh->tx_uar.obj)) {
+				       sh->tx_uar.obj,
+				       MLX5_ASO_QUEUE_LOG_DESC)) {
 			mlx5_aso_dereg_mr(cdev, &sh->aso_age_mng->aso_sq.mr);
 			return -1;
 		}
@@ -277,7 +308,7 @@ mlx5_aso_queue_init(struct mlx5_dev_ctx_shared *sh,
 		break;
 	case ASO_OPC_MOD_POLICER:
 		if (mlx5_aso_sq_create(cdev, &sh->mtrmng->pools_mng.sq,
-				       sh->tx_uar.obj))
+				       sh->tx_uar.obj, MLX5_ASO_QUEUE_LOG_DESC))
 			return -1;
 		mlx5_aso_mtr_init_sq(&sh->mtrmng->pools_mng.sq);
 		break;
@@ -287,7 +318,7 @@ mlx5_aso_queue_init(struct mlx5_dev_ctx_shared *sh,
 				    &sh->ct_mng->aso_sq.mr))
 			return -1;
 		if (mlx5_aso_sq_create(cdev, &sh->ct_mng->aso_sq,
-				       sh->tx_uar.obj)) {
+				       sh->tx_uar.obj, MLX5_ASO_QUEUE_LOG_DESC)) {
 			mlx5_aso_dereg_mr(cdev, &sh->ct_mng->aso_sq.mr);
 			return -1;
 		}
@@ -1403,3 +1434,219 @@ mlx5_aso_ct_available(struct mlx5_dev_ctx_shared *sh,
 	rte_errno = EBUSY;
 	return -rte_errno;
 }
+
+int
+mlx5_aso_cnt_queue_init(struct mlx5_dev_ctx_shared *sh)
+{
+	struct mlx5_hws_aso_mng *aso_mng = NULL;
+	uint8_t idx;
+	struct mlx5_aso_sq *sq;
+
+	MLX5_ASSERT(sh);
+	MLX5_ASSERT(sh->cnt_svc);
+	aso_mng = &sh->cnt_svc->aso_mng;
+	aso_mng->sq_num = HWS_CNT_ASO_SQ_NUM;
+	for (idx = 0; idx < HWS_CNT_ASO_SQ_NUM; idx++) {
+		sq = &aso_mng->sqs[idx];
+		if (mlx5_aso_sq_create(sh->cdev, sq, sh->tx_uar.obj,
+					MLX5_ASO_CNT_QUEUE_LOG_DESC))
+			goto error;
+		mlx5_aso_cnt_init_sq(sq);
+	}
+	return 0;
+error:
+	mlx5_aso_cnt_queue_uninit(sh);
+	return -1;
+}
+
+void
+mlx5_aso_cnt_queue_uninit(struct mlx5_dev_ctx_shared *sh)
+{
+	uint16_t idx;
+
+	for (idx = 0; idx < sh->cnt_svc->aso_mng.sq_num; idx++)
+		mlx5_aso_destroy_sq(&sh->cnt_svc->aso_mng.sqs[idx]);
+	sh->cnt_svc->aso_mng.sq_num = 0;
+}
+
+static uint16_t
+mlx5_aso_cnt_sq_enqueue_burst(struct mlx5_hws_cnt_pool *cpool,
+		struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_aso_sq *sq, uint32_t n,
+		uint32_t offset, uint32_t dcs_id_base)
+{
+	volatile struct mlx5_aso_wqe *wqe;
+	uint16_t size = 1 << sq->log_desc_n;
+	uint16_t mask = size - 1;
+	uint16_t max;
+	uint32_t upper_offset = offset;
+	uint64_t addr;
+	uint32_t ctrl_gen_id = 0;
+	uint8_t opcmod = sh->cdev->config.hca_attr.flow_access_aso_opc_mod;
+	rte_be32_t lkey = rte_cpu_to_be_32(cpool->raw_mng->mr.lkey);
+	uint16_t aso_n = (uint16_t)(RTE_ALIGN_CEIL(n, 4) / 4);
+	uint32_t ccntid;
+
+	max = RTE_MIN(size - (uint16_t)(sq->head - sq->tail), aso_n);
+	if (unlikely(!max))
+		return 0;
+	upper_offset += (max * 4);
+	/* Only one burst is in flight at a time, so the same elt can be reused. */
+	sq->elts[0].burst_size = max;
+	ctrl_gen_id = dcs_id_base;
+	ctrl_gen_id /= 4;
+	do {
+		ccntid = upper_offset - max * 4;
+		wqe = &sq->sq_obj.aso_wqes[sq->head & mask];
+		rte_prefetch0(&sq->sq_obj.aso_wqes[(sq->head + 1) & mask]);
+		wqe->general_cseg.misc = rte_cpu_to_be_32(ctrl_gen_id);
+		wqe->general_cseg.flags = RTE_BE32(MLX5_COMP_ONLY_FIRST_ERR <<
+							 MLX5_COMP_MODE_OFFSET);
+		wqe->general_cseg.opcode = rte_cpu_to_be_32
+						(MLX5_OPCODE_ACCESS_ASO |
+						 (opcmod <<
+						  WQE_CSEG_OPC_MOD_OFFSET) |
+						 (sq->pi <<
+						  WQE_CSEG_WQE_INDEX_OFFSET));
+		addr = (uint64_t)RTE_PTR_ADD(cpool->raw_mng->raw,
+				ccntid * sizeof(struct flow_counter_stats));
+		wqe->aso_cseg.va_h = rte_cpu_to_be_32((uint32_t)(addr >> 32));
+		wqe->aso_cseg.va_l_r = rte_cpu_to_be_32((uint32_t)addr | 1u);
+		wqe->aso_cseg.lkey = lkey;
+		sq->pi += 2; /* Each WQE contains 2 WQEBB's. */
+		sq->head++;
+		sq->next++;
+		ctrl_gen_id++;
+		max--;
+	} while (max);
+	wqe->general_cseg.flags = RTE_BE32(MLX5_COMP_ALWAYS <<
+							 MLX5_COMP_MODE_OFFSET);
+	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
+			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
+			   !sh->tx_uar.dbnc);
+	return sq->elts[0].burst_size;
+}
+
+static uint16_t
+mlx5_aso_cnt_completion_handle(struct mlx5_aso_sq *sq)
+{
+	struct mlx5_aso_cq *cq = &sq->cq;
+	volatile struct mlx5_cqe *restrict cqe;
+	const unsigned int cq_size = 1 << cq->log_desc_n;
+	const unsigned int mask = cq_size - 1;
+	uint32_t idx;
+	uint32_t next_idx = cq->cq_ci & mask;
+	const uint16_t max = (uint16_t)(sq->head - sq->tail);
+	uint16_t i = 0;
+	int ret;
+	if (unlikely(!max))
+		return 0;
+	idx = next_idx;
+	next_idx = (cq->cq_ci + 1) & mask;
+	rte_prefetch0(&cq->cq_obj.cqes[next_idx]);
+	cqe = &cq->cq_obj.cqes[idx];
+	ret = check_cqe(cqe, cq_size, cq->cq_ci);
+	/*
+	 * Be sure owner read is done before any other cookie field or
+	 * opaque field.
+	 */
+	rte_io_rmb();
+	if (unlikely(ret != MLX5_CQE_STATUS_SW_OWN)) {
+		if (likely(ret == MLX5_CQE_STATUS_HW_OWN))
+			return 0; /* return immediately. */
+		mlx5_aso_cqe_err_handle(sq);
+	}
+	i += sq->elts[0].burst_size;
+	sq->elts[0].burst_size = 0;
+	cq->cq_ci++;
+	if (likely(i)) {
+		sq->tail += i;
+		rte_io_wmb();
+		cq->cq_obj.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci);
+	}
+	return i;
+}
+
+static uint16_t
+mlx5_aso_cnt_query_one_dcs(struct mlx5_dev_ctx_shared *sh,
+			   struct mlx5_hws_cnt_pool *cpool,
+			   uint8_t dcs_idx, uint32_t num)
+{
+	uint32_t dcs_id = cpool->dcs_mng.dcs[dcs_idx].obj->id;
+	uint64_t cnt_num = cpool->dcs_mng.dcs[dcs_idx].batch_sz;
+	uint64_t left;
+	uint32_t iidx = cpool->dcs_mng.dcs[dcs_idx].iidx;
+	uint32_t offset;
+	uint16_t mask;
+	uint16_t sq_idx;
+	uint64_t burst_sz = (uint64_t)(1 << MLX5_ASO_CNT_QUEUE_LOG_DESC) * 4 *
+		sh->cnt_svc->aso_mng.sq_num;
+	uint64_t qburst_sz = burst_sz / sh->cnt_svc->aso_mng.sq_num;
+	uint64_t n;
+	struct mlx5_aso_sq *sq;
+
+	cnt_num = RTE_MIN(num, cnt_num);
+	left = cnt_num;
+	while (left) {
+		mask = 0;
+		for (sq_idx = 0; sq_idx < sh->cnt_svc->aso_mng.sq_num;
+				sq_idx++) {
+			if (left == 0) {
+				mask |= (1 << sq_idx);
+				continue;
+			}
+			n = RTE_MIN(left, qburst_sz);
+			offset = cnt_num - left;
+			offset += iidx;
+			mlx5_aso_cnt_sq_enqueue_burst(cpool, sh,
+					&sh->cnt_svc->aso_mng.sqs[sq_idx], n,
+					offset, dcs_id);
+			left -= n;
+		}
+		do {
+			for (sq_idx = 0; sq_idx < sh->cnt_svc->aso_mng.sq_num;
+					sq_idx++) {
+				sq = &sh->cnt_svc->aso_mng.sqs[sq_idx];
+				if (mlx5_aso_cnt_completion_handle(sq))
+					mask |= (1 << sq_idx);
+			}
+		} while (mask < ((1 << sh->cnt_svc->aso_mng.sq_num) - 1));
+	}
+	return cnt_num;
+}
+
+/*
+ * Query FW counter via ASO WQE.
+ *
+ * The ASO counter query uses _sync_ mode, which means:
+ * 1. each SQ issues one burst with several WQEs
+ * 2. a CQE is requested only for the last WQE
+ * 3. the CQ of each SQ is busy-polled
+ * 4. once all SQs' CQEs are received, go to step 1 and issue the next burst
+ *
+ * @param[in] sh
+ *   Pointer to shared device.
+ * @param[in] cpool
+ *   Pointer to counter pool.
+ *
+ * @return
+ *   0 on success, -1 on failure.
+ */
+int
+mlx5_aso_cnt_query(struct mlx5_dev_ctx_shared *sh,
+		   struct mlx5_hws_cnt_pool *cpool)
+{
+	uint32_t idx;
+	uint32_t num;
+	uint32_t cnt_num = mlx5_hws_cnt_pool_get_size(cpool) -
+		rte_ring_count(cpool->free_list);
+
+	for (idx = 0; idx < cpool->dcs_mng.batch_total; idx++) {
+		num = RTE_MIN(cnt_num, cpool->dcs_mng.dcs[idx].batch_sz);
+		mlx5_aso_cnt_query_one_dcs(sh, cpool, idx, num);
+		cnt_num -= num;
+		if (cnt_num == 0)
+			break;
+	}
+	return 0;
+}
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 5051741a5a..1e441c9c0d 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -10,6 +10,7 @@
 #include "mlx5_rx.h"
 
 #if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
+#include "mlx5_hws_cnt.h"
 
 /* The maximum actions support in the flow. */
 #define MLX5_HW_MAX_ACTS 16
@@ -353,6 +354,10 @@ __flow_hw_action_template_destroy(struct rte_eth_dev *dev,
 			mlx5dr_action_destroy(acts->mhdr->action);
 		mlx5_free(acts->mhdr);
 	}
+	if (mlx5_hws_cnt_id_valid(acts->cnt_id)) {
+		mlx5_hws_cnt_shared_put(priv->hws_cpool, &acts->cnt_id);
+		acts->cnt_id = 0;
+	}
 }
 
 /**
@@ -532,6 +537,44 @@ __flow_hw_act_data_shared_rss_append(struct mlx5_priv *priv,
 	return 0;
 }
 
+/**
+ * Append shared counter action to the dynamic action list.
+ *
+ * @param[in] priv
+ *   Pointer to the port private data structure.
+ * @param[in] acts
+ *   Pointer to the template HW steering DR actions.
+ * @param[in] type
+ *   Action type.
+ * @param[in] action_src
+ *   Offset of source rte flow action.
+ * @param[in] action_dst
+ *   Offset of destination DR action.
+ * @param[in] cnt_id
+ *   Shared counter id.
+ *
+ * @return
+ *    0 on success, negative value otherwise and rte_errno is set.
+ */
+static __rte_always_inline int
+__flow_hw_act_data_shared_cnt_append(struct mlx5_priv *priv,
+				     struct mlx5_hw_actions *acts,
+				     enum rte_flow_action_type type,
+				     uint16_t action_src,
+				     uint16_t action_dst,
+				     cnt_id_t cnt_id)
+{	struct mlx5_action_construct_data *act_data;
+
+	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
+	if (!act_data)
+		return -1;
+	act_data->type = type;
+	act_data->shared_counter.id = cnt_id;
+	LIST_INSERT_HEAD(&acts->act_list, act_data, next);
+	return 0;
+}
+
+
 /**
  * Translate shared indirect action.
  *
@@ -573,6 +616,13 @@ flow_hw_shared_action_translate(struct rte_eth_dev *dev,
 		    action_src, action_dst, idx, shared_rss))
 			return -1;
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
+		if (__flow_hw_act_data_shared_cnt_append(priv, acts,
+			(enum rte_flow_action_type)
+			MLX5_RTE_FLOW_ACTION_TYPE_COUNT,
+			action_src, action_dst, act_idx))
+			return -1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type:%d", type);
 		break;
@@ -946,6 +996,30 @@ flow_hw_meter_compile(struct rte_eth_dev *dev,
 	}
 	return 0;
 }
+
+static __rte_always_inline int
+flow_hw_cnt_compile(struct rte_eth_dev *dev, uint32_t  start_pos,
+		      struct mlx5_hw_actions *acts)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint32_t pos = start_pos;
+	cnt_id_t cnt_id;
+	int ret;
+
+	ret = mlx5_hws_cnt_shared_get(priv->hws_cpool, &cnt_id);
+	if (ret != 0)
+		return ret;
+	ret = mlx5_hws_cnt_pool_get_action_offset
+				(priv->hws_cpool,
+				 cnt_id,
+				 &acts->rule_acts[pos].action,
+				 &acts->rule_acts[pos].counter.offset);
+	if (ret != 0)
+		return ret;
+	acts->cnt_id = cnt_id;
+	return 0;
+}
+
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -1189,6 +1263,20 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 				goto err;
 			i++;
 			break;
+		case RTE_FLOW_ACTION_TYPE_COUNT:
+			if (masks->conf &&
+			    ((const struct rte_flow_action_count *)
+			     masks->conf)->id) {
+				err = flow_hw_cnt_compile(dev, i, acts);
+				if (err)
+					goto err;
+			} else if (__flow_hw_act_data_general_append
+					(priv, acts, actions->type,
+					 actions - action_start, i)) {
+				goto err;
+			}
+			i++;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -1377,6 +1465,13 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev,
 				(dev, &act_data, item_flags, rule_act))
 			return -1;
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
+		if (mlx5_hws_cnt_pool_get_action_offset(priv->hws_cpool,
+				act_idx,
+				&rule_act->action,
+				&rule_act->counter.offset))
+			return -1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type:%d", type);
 		break;
@@ -1520,7 +1615,8 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			  const uint8_t it_idx,
 			  const struct rte_flow_action actions[],
 			  struct mlx5dr_rule_action *rule_acts,
-			  uint32_t *acts_num)
+			  uint32_t *acts_num,
+			  uint32_t queue)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow_template_table *table = job->flow->table;
@@ -1574,6 +1670,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		uint64_t item_flags;
 		struct mlx5_hw_jump_action *jump;
 		struct mlx5_hrxq *hrxq;
+		cnt_id_t cnt_id;
 
 		action = &actions[act_data->action_src];
 		MLX5_ASSERT(action->type == RTE_FLOW_ACTION_TYPE_INDIRECT ||
@@ -1681,6 +1778,32 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			if (mlx5_aso_mtr_wait(priv->sh, mtr))
 				return -1;
 			break;
+		case RTE_FLOW_ACTION_TYPE_COUNT:
+			ret = mlx5_hws_cnt_pool_get(priv->hws_cpool, &queue,
+					&cnt_id);
+			if (ret != 0)
+				return ret;
+			ret = mlx5_hws_cnt_pool_get_action_offset
+				(priv->hws_cpool,
+				 cnt_id,
+				 &rule_acts[act_data->action_dst].action,
+				 &rule_acts[act_data->action_dst].counter.offset
+				 );
+			if (ret != 0)
+				return ret;
+			job->flow->cnt_id = cnt_id;
+			break;
+		case MLX5_RTE_FLOW_ACTION_TYPE_COUNT:
+			ret = mlx5_hws_cnt_pool_get_action_offset
+				(priv->hws_cpool,
+				 act_data->shared_counter.id,
+				 &rule_acts[act_data->action_dst].action,
+				 &rule_acts[act_data->action_dst].counter.offset
+				 );
+			if (ret != 0)
+				return ret;
+			job->flow->cnt_id = act_data->shared_counter.id;
+			break;
 		default:
 			break;
 		}
@@ -1690,6 +1813,8 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 				job->flow->idx - 1;
 		rule_acts[hw_acts->encap_decap_pos].reformat.data = buf;
 	}
+	if (mlx5_hws_cnt_id_valid(hw_acts->cnt_id))
+		job->flow->cnt_id = hw_acts->cnt_id;
 	return 0;
 }
 
@@ -1825,7 +1950,7 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	 * user's input, in order to save the cost.
 	 */
 	if (flow_hw_actions_construct(dev, job, hw_acts, pattern_template_index,
-				  actions, rule_acts, &acts_num)) {
+				  actions, rule_acts, &acts_num, queue)) {
 		rte_errno = EINVAL;
 		goto free;
 	}
@@ -1955,6 +2080,13 @@ flow_hw_pull(struct rte_eth_dev *dev,
 				flow_hw_jump_release(dev, job->flow->jump);
 			else if (job->flow->fate_type == MLX5_FLOW_FATE_QUEUE)
 				mlx5_hrxq_obj_release(dev, job->flow->hrxq);
+			if (mlx5_hws_cnt_id_valid(job->flow->cnt_id) &&
+			    mlx5_hws_cnt_is_shared
+				(priv->hws_cpool, job->flow->cnt_id) == false) {
+				mlx5_hws_cnt_pool_put(priv->hws_cpool, &queue,
+						&job->flow->cnt_id);
+				job->flow->cnt_id = 0;
+			}
 			mlx5_ipool_free(job->flow->table->flow, job->flow->idx);
 		}
 		priv->hw_q[queue].job[priv->hw_q[queue].job_idx++] = job;
@@ -2678,6 +2810,9 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 			if (ret < 0)
 				return ret;
 			break;
+		case RTE_FLOW_ACTION_TYPE_COUNT:
+			/* TODO: Validation logic */
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -4349,6 +4484,12 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	}
 	if (_queue_attr)
 		mlx5_free(_queue_attr);
+	if (port_attr->nb_counters) {
+		priv->hws_cpool = mlx5_hws_cnt_pool_create(dev, port_attr,
+				nb_queue);
+		if (priv->hws_cpool == NULL)
+			goto err;
+	}
 	return 0;
 err:
 	flow_hw_free_vport_actions(priv);
@@ -4418,6 +4559,8 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 		mlx5_ipool_destroy(priv->acts_ipool);
 		priv->acts_ipool = NULL;
 	}
+	if (priv->hws_cpool)
+		mlx5_hws_cnt_pool_destroy(priv->sh, priv->hws_cpool);
 	mlx5_free(priv->hw_q);
 	priv->hw_q = NULL;
 	claim_zero(mlx5dr_context_close(priv->dr_ctx));
@@ -4559,10 +4702,28 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 			     void *user_data,
 			     struct rte_flow_error *error)
 {
+	struct rte_flow_action_handle *handle = NULL;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	cnt_id_t cnt_id;
+
 	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
 	RTE_SET_USED(user_data);
-	return flow_dv_action_create(dev, conf, action, error);
+	switch (action->type) {
+	case RTE_FLOW_ACTION_TYPE_COUNT:
+		if (mlx5_hws_cnt_shared_get(priv->hws_cpool, &cnt_id))
+			rte_flow_error_set(error, ENODEV,
+					RTE_FLOW_ERROR_TYPE_ACTION,
+					NULL,
+					"counter are not configured!");
+		else
+			handle = (struct rte_flow_action_handle *)
+				 (uintptr_t)cnt_id;
+		break;
+	default:
+		handle = flow_dv_action_create(dev, conf, action, error);
+	}
+	return handle;
 }
 
 /**
@@ -4626,10 +4787,172 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 			      void *user_data,
 			      struct rte_flow_error *error)
 {
+	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
+	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+	struct mlx5_priv *priv = dev->data->dev_private;
+
 	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
 	RTE_SET_USED(user_data);
-	return flow_dv_action_destroy(dev, handle, error);
+	switch (type) {
+	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
+		return mlx5_hws_cnt_shared_put(priv->hws_cpool, &act_idx);
+	default:
+		return flow_dv_action_destroy(dev, handle, error);
+	}
+}
+
+static int
+flow_hw_query_counter(const struct rte_eth_dev *dev, uint32_t counter,
+		      void *data, struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hws_cnt *cnt;
+	struct rte_flow_query_count *qc = data;
+	uint32_t iidx = mlx5_hws_cnt_iidx(priv->hws_cpool, counter);
+	uint64_t pkts, bytes;
+
+	if (!mlx5_hws_cnt_id_valid(counter))
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				"counter are not available");
+	cnt = &priv->hws_cpool->pool[iidx];
+	__hws_cnt_query_raw(priv->hws_cpool, counter, &pkts, &bytes);
+	qc->hits_set = 1;
+	qc->bytes_set = 1;
+	qc->hits = pkts - cnt->reset.hits;
+	qc->bytes = bytes - cnt->reset.bytes;
+	if (qc->reset) {
+		cnt->reset.bytes = bytes;
+		cnt->reset.hits = pkts;
+	}
+	return 0;
+}
+
+static int
+flow_hw_query(struct rte_eth_dev *dev,
+	      struct rte_flow *flow __rte_unused,
+	      const struct rte_flow_action *actions __rte_unused,
+	      void *data __rte_unused,
+	      struct rte_flow_error *error __rte_unused)
+{
+	int ret = -EINVAL;
+	struct rte_flow_hw *hw_flow = (struct rte_flow_hw *)flow;
+
+	for (; actions->type != RTE_FLOW_ACTION_TYPE_END; actions++) {
+		switch (actions->type) {
+		case RTE_FLOW_ACTION_TYPE_VOID:
+			break;
+		case RTE_FLOW_ACTION_TYPE_COUNT:
+			ret = flow_hw_query_counter(dev, hw_flow->cnt_id, data,
+						  error);
+			break;
+		default:
+			return rte_flow_error_set(error, ENOTSUP,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  actions,
+						  "action not supported");
+		}
+	}
+	return ret;
+}
+
+/**
+ * Create indirect action.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] conf
+ *   Shared action configuration.
+ * @param[in] action
+ *   Action specification used to create indirect action.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. Initialized in case of
+ *   error only.
+ *
+ * @return
+ *   A valid shared action handle in case of success, NULL otherwise and
+ *   rte_errno is set.
+ */
+static struct rte_flow_action_handle *
+flow_hw_action_create(struct rte_eth_dev *dev,
+		       const struct rte_flow_indir_action_conf *conf,
+		       const struct rte_flow_action *action,
+		       struct rte_flow_error *err)
+{
+	return flow_hw_action_handle_create(dev, UINT32_MAX, NULL, conf, action,
+					    NULL, err);
+}
+
+/**
+ * Destroy the indirect action.
+ * Release action related resources on the NIC and the memory.
+ * Lock free, (mutex should be acquired by caller).
+ * Dispatcher for action type specific call.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] handle
+ *   The indirect action object handle to be removed.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. Initialized in case of
+ *   error only.
+ *
+ * @return
+ *   0 on success, otherwise negative errno value.
+ */
+static int
+flow_hw_action_destroy(struct rte_eth_dev *dev,
+		       struct rte_flow_action_handle *handle,
+		       struct rte_flow_error *error)
+{
+	return flow_hw_action_handle_destroy(dev, UINT32_MAX, NULL, handle,
+			NULL, error);
+}
+
+/**
+ * Updates in place shared action configuration.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] handle
+ *   The indirect action object handle to be updated.
+ * @param[in] update
+ *   Action specification used to modify the action pointed by *handle*.
+ *   *update* could be of same type with the action pointed by the *handle*
+ *   handle argument, or some other structures like a wrapper, depending on
+ *   the indirect action type.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. Initialized in case of
+ *   error only.
+ *
+ * @return
+ *   0 on success, otherwise negative errno value.
+ */
+static int
+flow_hw_action_update(struct rte_eth_dev *dev,
+		      struct rte_flow_action_handle *handle,
+		      const void *update,
+		      struct rte_flow_error *err)
+{
+	return flow_hw_action_handle_update(dev, UINT32_MAX, NULL, handle,
+			update, NULL, err);
+}
+
+static int
+flow_hw_action_query(struct rte_eth_dev *dev,
+		     const struct rte_flow_action_handle *handle, void *data,
+		     struct rte_flow_error *error)
+{
+	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
+	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+
+	switch (type) {
+	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
+		return flow_hw_query_counter(dev, act_idx, data, error);
+	default:
+		return flow_dv_action_query(dev, handle, data, error);
+	}
 }
 
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
@@ -4651,10 +4974,11 @@ const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.async_action_destroy = flow_hw_action_handle_destroy,
 	.async_action_update = flow_hw_action_handle_update,
 	.action_validate = flow_dv_action_validate,
-	.action_create = flow_dv_action_create,
-	.action_destroy = flow_dv_action_destroy,
-	.action_update = flow_dv_action_update,
-	.action_query = flow_dv_action_query,
+	.action_create = flow_hw_action_create,
+	.action_destroy = flow_hw_action_destroy,
+	.action_update = flow_hw_action_update,
+	.action_query = flow_hw_action_query,
+	.query = flow_hw_query,
 };
 
 /**
diff --git a/drivers/net/mlx5/mlx5_hws_cnt.c b/drivers/net/mlx5/mlx5_hws_cnt.c
new file mode 100644
index 0000000000..e2408ef36d
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_hws_cnt.c
@@ -0,0 +1,528 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2020 Mellanox Technologies, Ltd
+ */
+
+#include <stdint.h>
+#include <rte_malloc.h>
+#include <mlx5_malloc.h>
+#include <rte_ring.h>
+#include <mlx5_devx_cmds.h>
+#include <rte_cycles.h>
+
+#if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
+
+#include "mlx5_utils.h"
+#include "mlx5_hws_cnt.h"
+
+#define HWS_CNT_CACHE_SZ_DEFAULT 511
+#define HWS_CNT_CACHE_PRELOAD_DEFAULT 254
+#define HWS_CNT_CACHE_FETCH_DEFAULT 254
+#define HWS_CNT_CACHE_THRESHOLD_DEFAULT 254
+#define HWS_CNT_ALLOC_FACTOR_DEFAULT 20
+
+static void
+__hws_cnt_id_load(struct mlx5_hws_cnt_pool *cpool)
+{
+	uint32_t preload;
+	uint32_t q_num = cpool->cache->q_num;
+	uint32_t cnt_num = mlx5_hws_cnt_pool_get_size(cpool);
+	cnt_id_t cnt_id, iidx = 0;
+	uint32_t qidx;
+	struct rte_ring *qcache = NULL;
+
+	/*
+	 * Counter ID order is important for tracking the maximum number of
+	 * in-use counters for querying, which means the counter internal
+	 * index order must go from zero to the user-configured number,
+	 * i.e. 0 - 8000000. Counter IDs need to be loaded in this order
+	 * into the cache first and then into the global free list, so that
+	 * the user fetches counters from the minimal to the maximal index.
+	 */
+	preload = RTE_MIN(cpool->cache->preload_sz, cnt_num / q_num);
+	for (qidx = 0; qidx < q_num; qidx++) {
+		for (; iidx < preload * (qidx + 1); iidx++) {
+			cnt_id = mlx5_hws_cnt_id_gen(cpool, iidx);
+			qcache = cpool->cache->qcache[qidx];
+			if (qcache)
+				rte_ring_enqueue_elem(qcache, &cnt_id,
+						sizeof(cnt_id));
+		}
+	}
+	for (; iidx < cnt_num; iidx++) {
+		cnt_id = mlx5_hws_cnt_id_gen(cpool, iidx);
+		rte_ring_enqueue_elem(cpool->free_list, &cnt_id,
+				sizeof(cnt_id));
+	}
+}
+
+static void
+__mlx5_hws_cnt_svc(struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_hws_cnt_pool *cpool)
+{
+	struct rte_ring *reset_list = cpool->wait_reset_list;
+	struct rte_ring *reuse_list = cpool->reuse_list;
+	uint32_t reset_cnt_num;
+	struct rte_ring_zc_data zcdr = {0};
+	struct rte_ring_zc_data zcdu = {0};
+
+	reset_cnt_num = rte_ring_count(reset_list);
+	do {
+		cpool->query_gen++;
+		mlx5_aso_cnt_query(sh, cpool);
+		zcdr.n1 = 0;
+		zcdu.n1 = 0;
+		rte_ring_enqueue_zc_burst_elem_start(reuse_list,
+				sizeof(cnt_id_t), reset_cnt_num, &zcdu,
+				NULL);
+		rte_ring_dequeue_zc_burst_elem_start(reset_list,
+				sizeof(cnt_id_t), reset_cnt_num, &zcdr,
+				NULL);
+		__hws_cnt_r2rcpy(&zcdu, &zcdr, reset_cnt_num);
+		rte_ring_dequeue_zc_elem_finish(reset_list,
+				reset_cnt_num);
+		rte_ring_enqueue_zc_elem_finish(reuse_list,
+				reset_cnt_num);
+		reset_cnt_num = rte_ring_count(reset_list);
+	} while (reset_cnt_num > 0);
+}
+
+static void
+mlx5_hws_cnt_raw_data_free(struct mlx5_dev_ctx_shared *sh,
+			   struct mlx5_hws_cnt_raw_data_mng *mng)
+{
+	if (mng == NULL)
+		return;
+	sh->cdev->mr_scache.dereg_mr_cb(&mng->mr);
+	mlx5_free(mng->raw);
+	mlx5_free(mng);
+}
+
+__rte_unused
+static struct mlx5_hws_cnt_raw_data_mng *
+mlx5_hws_cnt_raw_data_alloc(struct mlx5_dev_ctx_shared *sh, uint32_t n)
+{
+	struct mlx5_hws_cnt_raw_data_mng *mng = NULL;
+	int ret;
+	size_t sz = n * sizeof(struct flow_counter_stats);
+
+	mng = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO, sizeof(*mng), 0,
+			SOCKET_ID_ANY);
+	if (mng == NULL)
+		goto error;
+	mng->raw = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO, sz, 0,
+			SOCKET_ID_ANY);
+	if (mng->raw == NULL)
+		goto error;
+	ret = sh->cdev->mr_scache.reg_mr_cb(sh->cdev->pd, mng->raw, sz,
+					    &mng->mr);
+	if (ret) {
+		rte_errno = errno;
+		goto error;
+	}
+	return mng;
+error:
+	mlx5_hws_cnt_raw_data_free(sh, mng);
+	return NULL;
+}
+
+static void *
+mlx5_hws_cnt_svc(void *opaque)
+{
+	struct mlx5_dev_ctx_shared *sh =
+		(struct mlx5_dev_ctx_shared *)opaque;
+	uint64_t interval =
+		(uint64_t)sh->cnt_svc->query_interval * (US_PER_S / MS_PER_S);
+	uint16_t port_id;
+	uint64_t start_cycle, query_cycle = 0;
+	uint64_t query_us;
+	uint64_t sleep_us;
+
+	while (sh->cnt_svc->svc_running != 0) {
+		start_cycle = rte_rdtsc();
+		MLX5_ETH_FOREACH_DEV(port_id, sh->cdev->dev) {
+			struct mlx5_priv *opriv =
+				rte_eth_devices[port_id].data->dev_private;
+			if (opriv != NULL &&
+			    opriv->sh == sh &&
+			    opriv->hws_cpool != NULL) {
+				__mlx5_hws_cnt_svc(sh, opriv->hws_cpool);
+			}
+		}
+		query_cycle = rte_rdtsc() - start_cycle;
+		query_us = query_cycle / (rte_get_timer_hz() / US_PER_S);
+		sleep_us = interval - query_us;
+		if (interval > query_us)
+			rte_delay_us_sleep(sleep_us);
+	}
+	return NULL;
+}
+
+struct mlx5_hws_cnt_pool *
+mlx5_hws_cnt_pool_init(const struct mlx5_hws_cnt_pool_cfg *pcfg,
+		const struct mlx5_hws_cache_param *ccfg)
+{
+	char mz_name[RTE_MEMZONE_NAMESIZE];
+	struct mlx5_hws_cnt_pool *cntp;
+	uint64_t cnt_num = 0;
+	uint32_t qidx;
+
+	MLX5_ASSERT(pcfg);
+	MLX5_ASSERT(ccfg);
+	cntp = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO, sizeof(*cntp), 0,
+			   SOCKET_ID_ANY);
+	if (cntp == NULL)
+		return NULL;
+
+	cntp->cfg = *pcfg;
+	cntp->cache = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO,
+			sizeof(*cntp->cache) +
+			sizeof(((struct mlx5_hws_cnt_pool_caches *)0)->qcache[0])
+				* ccfg->q_num, 0, SOCKET_ID_ANY);
+	if (cntp->cache == NULL)
+		goto error;
+	 /* store the necessary cache parameters. */
+	cntp->cache->fetch_sz = ccfg->fetch_sz;
+	cntp->cache->preload_sz = ccfg->preload_sz;
+	cntp->cache->threshold = ccfg->threshold;
+	cntp->cache->q_num = ccfg->q_num;
+	cnt_num = pcfg->request_num * (100 + pcfg->alloc_factor) / 100;
+	if (cnt_num > UINT32_MAX) {
+		DRV_LOG(ERR, "counter number %"PRIu64" is out of 32bit range",
+			cnt_num);
+		goto error;
+	}
+	cntp->pool = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO,
+			sizeof(struct mlx5_hws_cnt) *
+			pcfg->request_num * (100 + pcfg->alloc_factor) / 100,
+			0, SOCKET_ID_ANY);
+	if (cntp->pool == NULL)
+		goto error;
+	snprintf(mz_name, sizeof(mz_name), "%s_F_RING", pcfg->name);
+	cntp->free_list = rte_ring_create_elem(mz_name, sizeof(cnt_id_t),
+			(uint32_t)cnt_num, SOCKET_ID_ANY,
+			RING_F_SP_ENQ | RING_F_MC_HTS_DEQ | RING_F_EXACT_SZ);
+	if (cntp->free_list == NULL) {
+		DRV_LOG(ERR, "failed to create free list ring");
+		goto error;
+	}
+	snprintf(mz_name, sizeof(mz_name), "%s_R_RING", pcfg->name);
+	cntp->wait_reset_list = rte_ring_create_elem(mz_name, sizeof(cnt_id_t),
+			(uint32_t)cnt_num, SOCKET_ID_ANY,
+			RING_F_MP_HTS_ENQ | RING_F_SC_DEQ | RING_F_EXACT_SZ);
+	if (cntp->wait_reset_list == NULL) {
+		DRV_LOG(ERR, "failed to create wait reset list ring");
+		goto error;
+	}
+	snprintf(mz_name, sizeof(mz_name), "%s_U_RING", pcfg->name);
+	cntp->reuse_list = rte_ring_create_elem(mz_name, sizeof(cnt_id_t),
+			(uint32_t)cnt_num, SOCKET_ID_ANY,
+			RING_F_SP_ENQ | RING_F_MC_HTS_DEQ | RING_F_EXACT_SZ);
+	if (cntp->reuse_list == NULL) {
+		DRV_LOG(ERR, "failed to create reuse list ring");
+		goto error;
+	}
+	for (qidx = 0; qidx < ccfg->q_num; qidx++) {
+		snprintf(mz_name, sizeof(mz_name), "%s_cache/%u", pcfg->name,
+				qidx);
+		cntp->cache->qcache[qidx] = rte_ring_create(mz_name, ccfg->size,
+				SOCKET_ID_ANY,
+				RING_F_SP_ENQ | RING_F_SC_DEQ |
+				RING_F_EXACT_SZ);
+		if (cntp->cache->qcache[qidx] == NULL)
+			goto error;
+	}
+	return cntp;
+error:
+	mlx5_hws_cnt_pool_deinit(cntp);
+	return NULL;
+}
+
+void
+mlx5_hws_cnt_pool_deinit(struct mlx5_hws_cnt_pool * const cntp)
+{
+	uint32_t qidx = 0;
+	if (cntp == NULL)
+		return;
+	rte_ring_free(cntp->free_list);
+	rte_ring_free(cntp->wait_reset_list);
+	rte_ring_free(cntp->reuse_list);
+	if (cntp->cache) {
+		for (qidx = 0; qidx < cntp->cache->q_num; qidx++)
+			rte_ring_free(cntp->cache->qcache[qidx]);
+	}
+	mlx5_free(cntp->cache);
+	mlx5_free(cntp->raw_mng);
+	mlx5_free(cntp->pool);
+	mlx5_free(cntp);
+}
+
+int
+mlx5_hws_cnt_service_thread_create(struct mlx5_dev_ctx_shared *sh)
+{
+#define CNT_THREAD_NAME_MAX 256
+	char name[CNT_THREAD_NAME_MAX];
+	rte_cpuset_t cpuset;
+	int ret;
+	uint32_t service_core = sh->cnt_svc->service_core;
+
+	CPU_ZERO(&cpuset);
+	sh->cnt_svc->svc_running = 1;
+	ret = pthread_create(&sh->cnt_svc->service_thread, NULL,
+			mlx5_hws_cnt_svc, sh);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Failed to create HW steering's counter service thread.");
+		return -ENOSYS;
+	}
+	snprintf(name, CNT_THREAD_NAME_MAX - 1, "%s/svc@%d",
+		 sh->ibdev_name, service_core);
+	rte_thread_setname(sh->cnt_svc->service_thread, name);
+	CPU_SET(service_core, &cpuset);
+	pthread_setaffinity_np(sh->cnt_svc->service_thread, sizeof(cpuset),
+				&cpuset);
+	return 0;
+}
+
+void
+mlx5_hws_cnt_service_thread_destroy(struct mlx5_dev_ctx_shared *sh)
+{
+	if (sh->cnt_svc->service_thread == 0)
+		return;
+	sh->cnt_svc->svc_running = 0;
+	pthread_join(sh->cnt_svc->service_thread, NULL);
+	sh->cnt_svc->service_thread = 0;
+}
+
+int
+mlx5_hws_cnt_pool_dcs_alloc(struct mlx5_dev_ctx_shared *sh,
+			    struct mlx5_hws_cnt_pool *cpool)
+{
+	struct mlx5_hca_attr *hca_attr = &sh->cdev->config.hca_attr;
+	uint32_t max_log_bulk_sz = 0;
+	uint32_t log_bulk_sz;
+	uint32_t idx, alloced = 0;
+	unsigned int cnt_num = mlx5_hws_cnt_pool_get_size(cpool);
+	struct mlx5_devx_counter_attr attr = {0};
+	struct mlx5_devx_obj *dcs;
+
+	if (hca_attr->flow_counter_bulk_log_max_alloc == 0) {
+		DRV_LOG(ERR,
+			"Fw doesn't support bulk log max alloc");
+		return -1;
+	}
+	max_log_bulk_sz = 23; /* hard code to 8M (1 << 23). */
+	cnt_num = RTE_ALIGN_CEIL(cnt_num, 4); /* minimal 4 counter in bulk. */
+	log_bulk_sz = RTE_MIN(max_log_bulk_sz, rte_log2_u32(cnt_num));
+	attr.pd = sh->cdev->pdn;
+	attr.pd_valid = 1;
+	attr.bulk_log_max_alloc = 1;
+	attr.flow_counter_bulk_log_size = log_bulk_sz;
+	idx = 0;
+	dcs = mlx5_devx_cmd_flow_counter_alloc_general(sh->cdev->ctx, &attr);
+	if (dcs == NULL)
+		goto error;
+	cpool->dcs_mng.dcs[idx].obj = dcs;
+	cpool->dcs_mng.dcs[idx].batch_sz = (1 << log_bulk_sz);
+	cpool->dcs_mng.batch_total++;
+	idx++;
+	cpool->dcs_mng.dcs[0].iidx = 0;
+	alloced = cpool->dcs_mng.dcs[0].batch_sz;
+	if (cnt_num > cpool->dcs_mng.dcs[0].batch_sz) {
+		for (; idx < MLX5_HWS_CNT_DCS_NUM; idx++) {
+			attr.flow_counter_bulk_log_size = --max_log_bulk_sz;
+			dcs = mlx5_devx_cmd_flow_counter_alloc_general
+				(sh->cdev->ctx, &attr);
+			if (dcs == NULL)
+				goto error;
+			cpool->dcs_mng.dcs[idx].obj = dcs;
+			cpool->dcs_mng.dcs[idx].batch_sz =
+				(1 << max_log_bulk_sz);
+			cpool->dcs_mng.dcs[idx].iidx = alloced;
+			alloced += cpool->dcs_mng.dcs[idx].batch_sz;
+			cpool->dcs_mng.batch_total++;
+		}
+	}
+	return 0;
+error:
+	DRV_LOG(DEBUG,
+		"Cannot alloc device counter, allocated[%" PRIu32 "] request[%" PRIu32 "]",
+		alloced, cnt_num);
+	for (idx = 0; idx < cpool->dcs_mng.batch_total; idx++) {
+		mlx5_devx_cmd_destroy(cpool->dcs_mng.dcs[idx].obj);
+		cpool->dcs_mng.dcs[idx].obj = NULL;
+		cpool->dcs_mng.dcs[idx].batch_sz = 0;
+		cpool->dcs_mng.dcs[idx].iidx = 0;
+	}
+	cpool->dcs_mng.batch_total = 0;
+	return -1;
+}
+
+void
+mlx5_hws_cnt_pool_dcs_free(struct mlx5_dev_ctx_shared *sh,
+			   struct mlx5_hws_cnt_pool *cpool)
+{
+	uint32_t idx;
+
+	if (cpool == NULL)
+		return;
+	for (idx = 0; idx < MLX5_HWS_CNT_DCS_NUM; idx++)
+		mlx5_devx_cmd_destroy(cpool->dcs_mng.dcs[idx].obj);
+	if (cpool->raw_mng) {
+		mlx5_hws_cnt_raw_data_free(sh, cpool->raw_mng);
+		cpool->raw_mng = NULL;
+	}
+}
+
+int
+mlx5_hws_cnt_pool_action_create(struct mlx5_priv *priv,
+		struct mlx5_hws_cnt_pool *cpool)
+{
+	uint32_t idx;
+	int ret = 0;
+	struct mlx5_hws_cnt_dcs *dcs;
+	uint32_t flags;
+
+	flags = MLX5DR_ACTION_FLAG_HWS_RX | MLX5DR_ACTION_FLAG_HWS_TX;
+	if (priv->sh->config.dv_esw_en && priv->master)
+		flags |= MLX5DR_ACTION_FLAG_HWS_FDB;
+	for (idx = 0; idx < cpool->dcs_mng.batch_total; idx++) {
+		dcs = &cpool->dcs_mng.dcs[idx];
+		dcs->dr_action = mlx5dr_action_create_counter(priv->dr_ctx,
+					(struct mlx5dr_devx_obj *)dcs->obj,
+					flags);
+		if (dcs->dr_action == NULL) {
+			mlx5_hws_cnt_pool_action_destroy(cpool);
+			ret = -ENOSYS;
+			break;
+		}
+	}
+	return ret;
+}
+
+void
+mlx5_hws_cnt_pool_action_destroy(struct mlx5_hws_cnt_pool *cpool)
+{
+	uint32_t idx;
+	struct mlx5_hws_cnt_dcs *dcs;
+
+	for (idx = 0; idx < cpool->dcs_mng.batch_total; idx++) {
+		dcs = &cpool->dcs_mng.dcs[idx];
+		if (dcs->dr_action != NULL) {
+			mlx5dr_action_destroy(dcs->dr_action);
+			dcs->dr_action = NULL;
+		}
+	}
+}
+
+struct mlx5_hws_cnt_pool *
+mlx5_hws_cnt_pool_create(struct rte_eth_dev *dev,
+		const struct rte_flow_port_attr *pattr, uint16_t nb_queue)
+{
+	struct mlx5_hws_cnt_pool *cpool = NULL;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hws_cache_param cparam = {0};
+	struct mlx5_hws_cnt_pool_cfg pcfg = {0};
+	char *mp_name;
+	int ret = 0;
+	size_t sz;
+
+	/* init cnt service if not. */
+	if (priv->sh->cnt_svc == NULL) {
+		ret = mlx5_hws_cnt_svc_init(priv->sh);
+		if (ret != 0)
+			return NULL;
+	}
+	cparam.fetch_sz = HWS_CNT_CACHE_FETCH_DEFAULT;
+	cparam.preload_sz = HWS_CNT_CACHE_PRELOAD_DEFAULT;
+	cparam.q_num = nb_queue;
+	cparam.threshold = HWS_CNT_CACHE_THRESHOLD_DEFAULT;
+	cparam.size = HWS_CNT_CACHE_SZ_DEFAULT;
+	pcfg.alloc_factor = HWS_CNT_ALLOC_FACTOR_DEFAULT;
+	mp_name = mlx5_malloc(MLX5_MEM_ZERO, RTE_MEMZONE_NAMESIZE, 0,
+			SOCKET_ID_ANY);
+	if (mp_name == NULL)
+		goto error;
+	snprintf(mp_name, RTE_MEMZONE_NAMESIZE, "MLX5_HWS_CNT_POOL_%u",
+			dev->data->port_id);
+	pcfg.name = mp_name;
+	pcfg.request_num = pattr->nb_counters;
+	cpool = mlx5_hws_cnt_pool_init(&pcfg, &cparam);
+	if (cpool == NULL)
+		goto error;
+	ret = mlx5_hws_cnt_pool_dcs_alloc(priv->sh, cpool);
+	if (ret != 0)
+		goto error;
+	sz = RTE_ALIGN_CEIL(mlx5_hws_cnt_pool_get_size(cpool), 4);
+	cpool->raw_mng = mlx5_hws_cnt_raw_data_alloc(priv->sh, sz);
+	if (cpool->raw_mng == NULL)
+		goto error;
+	__hws_cnt_id_load(cpool);
+	/*
+	 * Bump the query gen right after pool creation so that the
+	 * pre-loaded counters can be used directly: they already
+	 * have their initial values, so there is no need to wait
+	 * for the first query.
+	 */
+	cpool->query_gen = 1;
+	ret = mlx5_hws_cnt_pool_action_create(priv, cpool);
+	if (ret != 0)
+		goto error;
+	priv->sh->cnt_svc->refcnt++;
+	return cpool;
+error:
+	mlx5_hws_cnt_pool_destroy(priv->sh, cpool);
+	return NULL;
+}
+
+void
+mlx5_hws_cnt_pool_destroy(struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_hws_cnt_pool *cpool)
+{
+	if (cpool == NULL)
+		return;
+	if (--sh->cnt_svc->refcnt == 0)
+		mlx5_hws_cnt_svc_deinit(sh);
+	mlx5_hws_cnt_pool_action_destroy(cpool);
+	mlx5_hws_cnt_pool_dcs_free(sh, cpool);
+	mlx5_hws_cnt_raw_data_free(sh, cpool->raw_mng);
+	mlx5_free((void *)cpool->cfg.name);
+	mlx5_hws_cnt_pool_deinit(cpool);
+}
+
+int
+mlx5_hws_cnt_svc_init(struct mlx5_dev_ctx_shared *sh)
+{
+	int ret;
+
+	sh->cnt_svc = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO,
+			sizeof(*sh->cnt_svc), 0, SOCKET_ID_ANY);
+	if (sh->cnt_svc == NULL)
+		return -1;
+	sh->cnt_svc->query_interval = sh->config.cnt_svc.cycle_time;
+	sh->cnt_svc->service_core = sh->config.cnt_svc.service_core;
+	ret = mlx5_aso_cnt_queue_init(sh);
+	if (ret != 0) {
+		mlx5_free(sh->cnt_svc);
+		sh->cnt_svc = NULL;
+		return -1;
+	}
+	ret = mlx5_hws_cnt_service_thread_create(sh);
+	if (ret != 0) {
+		mlx5_aso_cnt_queue_uninit(sh);
+		mlx5_free(sh->cnt_svc);
+		sh->cnt_svc = NULL;
+	}
+	return ret;
+}
+
+void
+mlx5_hws_cnt_svc_deinit(struct mlx5_dev_ctx_shared *sh)
+{
+	if (sh->cnt_svc == NULL)
+		return;
+	mlx5_hws_cnt_service_thread_destroy(sh);
+	mlx5_aso_cnt_queue_uninit(sh);
+	mlx5_free(sh->cnt_svc);
+	sh->cnt_svc = NULL;
+}
+
+#endif
diff --git a/drivers/net/mlx5/mlx5_hws_cnt.h b/drivers/net/mlx5/mlx5_hws_cnt.h
new file mode 100644
index 0000000000..5fab4ba597
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_hws_cnt.h
@@ -0,0 +1,558 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2022 Mellanox Technologies, Ltd
+ */
+
+#ifndef _MLX5_HWS_CNT_H_
+#define _MLX5_HWS_CNT_H_
+
+#include <rte_ring.h>
+#include "mlx5_utils.h"
+#include "mlx5_flow.h"
+
+/*
+ * COUNTER ID's layout
+ *       3                   2                   1                   0
+ *     1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    | T |       | D |                                               |
+ *    ~ Y |       | C |                    IDX                        ~
+ *    | P |       | S |                                               |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *
+ *    Bit 31:30 = TYPE = MLX5_INDIRECT_ACTION_TYPE_COUNT = b'10
+ *    Bit 25:24 = DCS index
+ *    Bit 23:00 = IDX, the counter index within the DCS bulk it belongs to.
+ */
+typedef uint32_t cnt_id_t;
+
+#define MLX5_HWS_CNT_DCS_NUM 4
+#define MLX5_HWS_CNT_DCS_IDX_OFFSET 24
+#define MLX5_HWS_CNT_DCS_IDX_MASK 0x3
+#define MLX5_HWS_CNT_IDX_MASK ((1UL << MLX5_HWS_CNT_DCS_IDX_OFFSET) - 1)
+
+struct mlx5_hws_cnt_dcs {
+	void *dr_action;
+	uint32_t batch_sz;
+	uint32_t iidx; /* internal index of first counter in this bulk. */
+	struct mlx5_devx_obj *obj;
+};
+
+struct mlx5_hws_cnt_dcs_mng {
+	uint32_t batch_total;
+	struct mlx5_hws_cnt_dcs dcs[MLX5_HWS_CNT_DCS_NUM];
+};
+
+struct mlx5_hws_cnt {
+	struct flow_counter_stats reset;
+	union {
+		uint32_t share: 1;
+		/*
+		 * Set to 1 when this counter is used as an indirect action.
+		 * Only meaningful while the user owns the counter.
+		 */
+		uint32_t query_gen_when_free;
+		/*
+		 * While the PMD owns the counter (i.e. the user has put it
+		 * back into the PMD counter pool), this field records the
+		 * pool query generation at the time the user released the
+		 * counter.
+		 */
+	};
+};
+
+struct mlx5_hws_cnt_raw_data_mng {
+	struct flow_counter_stats *raw;
+	struct mlx5_pmd_mr mr;
+};
+
+struct mlx5_hws_cache_param {
+	uint32_t size;
+	uint32_t q_num;
+	uint32_t fetch_sz;
+	uint32_t threshold;
+	uint32_t preload_sz;
+};
+
+struct mlx5_hws_cnt_pool_cfg {
+	char *name;
+	uint32_t request_num;
+	uint32_t alloc_factor;
+};
+
+struct mlx5_hws_cnt_pool_caches {
+	uint32_t fetch_sz;
+	uint32_t threshold;
+	uint32_t preload_sz;
+	uint32_t q_num;
+	struct rte_ring *qcache[];
+};
+
+struct mlx5_hws_cnt_pool {
+	struct mlx5_hws_cnt_pool_cfg cfg __rte_cache_aligned;
+	struct mlx5_hws_cnt_dcs_mng dcs_mng __rte_cache_aligned;
+	uint32_t query_gen __rte_cache_aligned;
+	struct mlx5_hws_cnt *pool;
+	struct mlx5_hws_cnt_raw_data_mng *raw_mng;
+	struct rte_ring *reuse_list;
+	struct rte_ring *free_list;
+	struct rte_ring *wait_reset_list;
+	struct mlx5_hws_cnt_pool_caches *cache;
+} __rte_cache_aligned;
+
+/**
+ * Translate a counter id into the internal index (starting from 0), which can
+ * be used as an index into the raw data and counter pools.
+ *
+ * @param cpool
+ *   The pointer to the counter pool.
+ * @param cnt_id
+ *   The external counter id.
+ * @return
+ *   Internal index.
+ */
+static __rte_always_inline cnt_id_t
+mlx5_hws_cnt_iidx(struct mlx5_hws_cnt_pool *cpool, cnt_id_t cnt_id)
+{
+	uint8_t dcs_idx = cnt_id >> MLX5_HWS_CNT_DCS_IDX_OFFSET;
+	uint32_t offset = cnt_id & MLX5_HWS_CNT_IDX_MASK;
+
+	dcs_idx &= MLX5_HWS_CNT_DCS_IDX_MASK;
+	return (cpool->dcs_mng.dcs[dcs_idx].iidx + offset);
+}
+
+/**
+ * Check whether a counter id is valid.
+ */
+static __rte_always_inline bool
+mlx5_hws_cnt_id_valid(cnt_id_t cnt_id)
+{
+	return (cnt_id >> MLX5_INDIRECT_ACTION_TYPE_OFFSET) ==
+		MLX5_INDIRECT_ACTION_TYPE_COUNT ? true : false;
+}
+
+/**
+ * Generate an external counter id from the internal index.
+ *
+ * @param cpool
+ *   The pointer to the counter pool.
+ * @param iidx
+ *   The internal counter index.
+ *
+ * @return
+ *   Counter id.
+ */
+static __rte_always_inline cnt_id_t
+mlx5_hws_cnt_id_gen(struct mlx5_hws_cnt_pool *cpool, cnt_id_t iidx)
+{
+	struct mlx5_hws_cnt_dcs_mng *dcs_mng = &cpool->dcs_mng;
+	uint32_t idx;
+	uint32_t offset;
+	cnt_id_t cnt_id;
+
+	for (idx = 0, offset = iidx; idx < dcs_mng->batch_total; idx++) {
+		if (dcs_mng->dcs[idx].batch_sz <= offset)
+			offset -= dcs_mng->dcs[idx].batch_sz;
+		else
+			break;
+	}
+	cnt_id = offset;
+	cnt_id |= (idx << MLX5_HWS_CNT_DCS_IDX_OFFSET);
+	return (MLX5_INDIRECT_ACTION_TYPE_COUNT <<
+			MLX5_INDIRECT_ACTION_TYPE_OFFSET) | cnt_id;
+}
+
+static __rte_always_inline void
+__hws_cnt_query_raw(struct mlx5_hws_cnt_pool *cpool, cnt_id_t cnt_id,
+		uint64_t *raw_pkts, uint64_t *raw_bytes)
+{
+	struct mlx5_hws_cnt_raw_data_mng *raw_mng = cpool->raw_mng;
+	struct flow_counter_stats s[2];
+	uint8_t i = 0x1;
+	size_t stat_sz = sizeof(s[0]);
+	uint32_t iidx = mlx5_hws_cnt_iidx(cpool, cnt_id);
+
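+	/*
+	 * The raw statistics are updated asynchronously by the background
+	 * counter query. Keep re-reading the entry until two consecutive
+	 * snapshots match, so that a torn hits/bytes pair is never returned.
+	 */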
+	memcpy(&s[0], &raw_mng->raw[iidx], stat_sz);
+	do {
+		memcpy(&s[i & 1], &raw_mng->raw[iidx], stat_sz);
+		if (memcmp(&s[0], &s[1], stat_sz) == 0) {
+			*raw_pkts = rte_be_to_cpu_64(s[0].hits);
+			*raw_bytes = rte_be_to_cpu_64(s[0].bytes);
+			break;
+		}
+		i = ~i;
+	} while (1);
+}
+
+/**
+ * Copy elements from one zero-copy ring to another zero-copy ring in place.
+ *
+ * The input is an rte_ring zero-copy data structure, which holds two
+ * pointers; ptr2 is only meaningful when the ring area wraps around.
+ *
+ * This routine therefore has to handle the case where both the source and
+ * the destination addresses wrap.
+ * First, copy the number of elements that fits before the first wrap point,
+ * which may be in either the source or the destination.
+ * Second, copy the elements up to the second wrap point. If the first wrap
+ * point was in the source, this one must be in the destination, and vice
+ * versa.
+ * Third, copy the remaining elements.
+ *
+ * In the worst case, three pieces of contiguous memory are copied.
+ *
+ * @param zcdd
+ *   A pointer to the zero-copy data of the destination ring.
+ * @param zcds
+ *   A pointer to the zero-copy data of the source ring.
+ * @param n
+ *   Number of elements to copy.
+ */
+static __rte_always_inline void
+__hws_cnt_r2rcpy(struct rte_ring_zc_data *zcdd, struct rte_ring_zc_data *zcds,
+		unsigned int n)
+{
+	unsigned int n1, n2, n3;
+	void *s1, *s2, *s3;
+	void *d1, *d2, *d3;
+
+	s1 = zcds->ptr1;
+	d1 = zcdd->ptr1;
+	n1 = RTE_MIN(zcdd->n1, zcds->n1);
+	if (zcds->n1 > n1) {
+		n2 = zcds->n1 - n1;
+		s2 = RTE_PTR_ADD(zcds->ptr1, sizeof(cnt_id_t) * n1);
+		d2 = zcdd->ptr2;
+		n3 = n - n1 - n2;
+		s3 = zcds->ptr2;
+		d3 = RTE_PTR_ADD(zcdd->ptr2, sizeof(cnt_id_t) * n2);
+	} else {
+		n2 = zcdd->n1 - n1;
+		s2 = zcds->ptr2;
+		d2 = RTE_PTR_ADD(zcdd->ptr1, sizeof(cnt_id_t) * n1);
+		n3 = n - n1 - n2;
+		s3 = RTE_PTR_ADD(zcds->ptr2, sizeof(cnt_id_t) * n2);
+		d3 = zcdd->ptr2;
+	}
+	memcpy(d1, s1, n1 * sizeof(cnt_id_t));
+	if (n2 != 0) {
+		memcpy(d2, s2, n2 * sizeof(cnt_id_t));
+		if (n3 != 0)
+			memcpy(d3, s3, n3 * sizeof(cnt_id_t));
+	}
+}
+
+static __rte_always_inline int
+mlx5_hws_cnt_pool_cache_flush(struct mlx5_hws_cnt_pool *cpool,
+			      uint32_t queue_id)
+{
+	unsigned int ret;
+	struct rte_ring_zc_data zcdr = {0};
+	struct rte_ring_zc_data zcdc = {0};
+	struct rte_ring *reset_list = NULL;
+	struct rte_ring *qcache = cpool->cache->qcache[queue_id];
+
+	ret = rte_ring_dequeue_zc_burst_elem_start(qcache,
+			sizeof(cnt_id_t), rte_ring_count(qcache), &zcdc,
+			NULL);
+	MLX5_ASSERT(ret);
+	reset_list = cpool->wait_reset_list;
+	rte_ring_enqueue_zc_burst_elem_start(reset_list,
+			sizeof(cnt_id_t), ret, &zcdr, NULL);
+	__hws_cnt_r2rcpy(&zcdr, &zcdc, ret);
+	rte_ring_enqueue_zc_elem_finish(reset_list, ret);
+	rte_ring_dequeue_zc_elem_finish(qcache, ret);
+	return 0;
+}
+
+static __rte_always_inline int
+mlx5_hws_cnt_pool_cache_fetch(struct mlx5_hws_cnt_pool *cpool,
+			      uint32_t queue_id)
+{
+	struct rte_ring *qcache = cpool->cache->qcache[queue_id];
+	struct rte_ring *free_list = NULL;
+	struct rte_ring *reuse_list = NULL;
+	struct rte_ring *list = NULL;
+	struct rte_ring_zc_data zcdf = {0};
+	struct rte_ring_zc_data zcdc = {0};
+	struct rte_ring_zc_data zcdu = {0};
+	struct rte_ring_zc_data zcds = {0};
+	struct mlx5_hws_cnt_pool_caches *cache = cpool->cache;
+	unsigned int ret;
+
+	reuse_list = cpool->reuse_list;
+	ret = rte_ring_dequeue_zc_burst_elem_start(reuse_list,
+			sizeof(cnt_id_t), cache->fetch_sz, &zcdu, NULL);
+	zcds = zcdu;
+	list = reuse_list;
+	if (unlikely(ret == 0)) { /* no reuse counter. */
+		rte_ring_dequeue_zc_elem_finish(reuse_list, 0);
+		free_list = cpool->free_list;
+		ret = rte_ring_dequeue_zc_burst_elem_start(free_list,
+				sizeof(cnt_id_t), cache->fetch_sz, &zcdf, NULL);
+		zcds = zcdf;
+		list = free_list;
+		if (unlikely(ret == 0)) { /* no free counter. */
+			rte_ring_dequeue_zc_elem_finish(free_list, 0);
+			if (rte_ring_count(cpool->wait_reset_list))
+				return -EAGAIN;
+			return -ENOENT;
+		}
+	}
+	rte_ring_enqueue_zc_burst_elem_start(qcache, sizeof(cnt_id_t),
+			ret, &zcdc, NULL);
+	__hws_cnt_r2rcpy(&zcdc, &zcds, ret);
+	rte_ring_dequeue_zc_elem_finish(list, ret);
+	rte_ring_enqueue_zc_elem_finish(qcache, ret);
+	return 0;
+}
+
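+/**
+ * Revert the last @p n enqueued elements of a single-producer ring and
+ * return their addresses through zero-copy data, so the caller can move
+ * them to another ring without an intermediate copy.
+ *
+ * @param r
+ *   A pointer to the ring (must be single-producer/single-consumer).
+ * @param n
+ *   Number of elements to revert.
+ * @param zcd
+ *   A pointer to the zero-copy data filled with the reverted elements.
+ * @return
+ *   Number of reverted elements.
+ */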
+static __rte_always_inline int
+__mlx5_hws_cnt_pool_enqueue_revert(struct rte_ring *r, unsigned int n,
+		struct rte_ring_zc_data *zcd)
+{
+	uint32_t current_head = 0;
+	uint32_t revert2head = 0;
+
+	MLX5_ASSERT(r->prod.sync_type == RTE_RING_SYNC_ST);
+	MLX5_ASSERT(r->cons.sync_type == RTE_RING_SYNC_ST);
+	current_head = __atomic_load_n(&r->prod.head, __ATOMIC_RELAXED);
+	MLX5_ASSERT(n <= r->capacity);
+	MLX5_ASSERT(n <= rte_ring_count(r));
+	revert2head = current_head - n;
+	r->prod.head = revert2head; /* This ring should be SP. */
+	__rte_ring_get_elem_addr(r, revert2head, sizeof(cnt_id_t), n,
+			&zcd->ptr1, &zcd->n1, &zcd->ptr2);
+	/* Update tail */
+	__atomic_store_n(&r->prod.tail, revert2head, __ATOMIC_RELEASE);
+	return n;
+}
+
+/**
+ * Put one counter back into the counter pool.
+ *
+ * @param cpool
+ *   A pointer to the counter pool structure.
+ * @param queue
+ *   A pointer to the HWS queue. If NULL, the per-queue cache is bypassed.
+ * @param cnt_id
+ *   A counter id to be put back.
+ * @return
+ *   - 0: Success; the counter was put back.
+ *   - -ENOENT: The counter could not be enqueued to the reset list.
+ */
+static __rte_always_inline int
+mlx5_hws_cnt_pool_put(struct mlx5_hws_cnt_pool *cpool,
+		uint32_t *queue, cnt_id_t *cnt_id)
+{
+	unsigned int ret = 0;
+	struct rte_ring_zc_data zcdc = {0};
+	struct rte_ring_zc_data zcdr = {0};
+	struct rte_ring *qcache = NULL;
+	unsigned int wb_num = 0; /* cache write-back number. */
+	cnt_id_t iidx;
+
+	iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
+	cpool->pool[iidx].query_gen_when_free =
+		__atomic_load_n(&cpool->query_gen, __ATOMIC_RELAXED);
+	if (likely(queue != NULL))
+		qcache = cpool->cache->qcache[*queue];
+	if (unlikely(qcache == NULL)) {
+		ret = rte_ring_enqueue_elem(cpool->wait_reset_list, cnt_id,
+				sizeof(cnt_id_t));
+		MLX5_ASSERT(ret == 0);
+		return ret;
+	}
+	ret = rte_ring_enqueue_burst_elem(qcache, cnt_id, sizeof(cnt_id_t), 1,
+					  NULL);
+	if (unlikely(ret == 0)) { /* cache is full. */
+		wb_num = rte_ring_count(qcache) - cpool->cache->threshold;
+		MLX5_ASSERT(wb_num < rte_ring_count(qcache));
+		__mlx5_hws_cnt_pool_enqueue_revert(qcache, wb_num, &zcdc);
+		rte_ring_enqueue_zc_burst_elem_start(cpool->wait_reset_list,
+				sizeof(cnt_id_t), wb_num, &zcdr, NULL);
+		__hws_cnt_r2rcpy(&zcdr, &zcdc, wb_num);
+		rte_ring_enqueue_zc_elem_finish(cpool->wait_reset_list, wb_num);
+		/* write-back THIS counter too */
+		ret = rte_ring_enqueue_burst_elem(cpool->wait_reset_list,
+				cnt_id, sizeof(cnt_id_t), 1, NULL);
+	}
+	return ret == 1 ? 0 : -ENOENT;
+}
+
+/**
+ * Get one counter from the pool.
+ *
+ * If @p queue is not NULL, the counter is retrieved first from the queue's
+ * cache and then from the common pool. Note that -ENOENT can be returned
+ * when the local cache and the common pool are empty, even if the caches of
+ * other queues are full.
+ *
+ * @param cpool
+ *   A pointer to the counter pool structure.
+ * @param queue
+ *   A pointer to the HWS queue. If NULL, fetch from the common pool.
+ * @param cnt_id
+ *   A pointer to a cnt_id_t (counter id) that will be filled in.
+ * @return
+ *   - 0: Success; a counter was taken.
+ *   - -ENOENT: Not enough entries in the pool; no counter was retrieved.
+ *   - -EAGAIN: The counter is not ready yet; try again.
+ */
+static __rte_always_inline int
+mlx5_hws_cnt_pool_get(struct mlx5_hws_cnt_pool *cpool,
+		uint32_t *queue, cnt_id_t *cnt_id)
+{
+	unsigned int ret;
+	struct rte_ring_zc_data zcdc = {0};
+	struct rte_ring *qcache = NULL;
+	uint32_t query_gen = 0;
+	cnt_id_t iidx, tmp_cid = 0;
+
+	if (likely(queue != NULL))
+		qcache = cpool->cache->qcache[*queue];
+	if (unlikely(qcache == NULL)) {
+		ret = rte_ring_dequeue_elem(cpool->reuse_list, &tmp_cid,
+				sizeof(cnt_id_t));
+		if (unlikely(ret != 0)) {
+			ret = rte_ring_dequeue_elem(cpool->free_list, &tmp_cid,
+					sizeof(cnt_id_t));
+			if (unlikely(ret != 0)) {
+				if (rte_ring_count(cpool->wait_reset_list))
+					return -EAGAIN;
+				return -ENOENT;
+			}
+		}
+		*cnt_id = tmp_cid;
+		iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
+		__hws_cnt_query_raw(cpool, *cnt_id,
+				    &cpool->pool[iidx].reset.hits,
+				    &cpool->pool[iidx].reset.bytes);
+		return 0;
+	}
+	ret = rte_ring_dequeue_zc_burst_elem_start(qcache, sizeof(cnt_id_t), 1,
+			&zcdc, NULL);
+	if (unlikely(ret == 0)) { /* local cache is empty. */
+		rte_ring_dequeue_zc_elem_finish(qcache, 0);
+		/* let's fetch from global free list. */
+		ret = mlx5_hws_cnt_pool_cache_fetch(cpool, *queue);
+		if (unlikely(ret != 0))
+			return ret;
+		rte_ring_dequeue_zc_burst_elem_start(qcache, sizeof(cnt_id_t),
+				1, &zcdc, NULL);
+	}
+	/* get one from local cache. */
+	*cnt_id = (*(cnt_id_t *)zcdc.ptr1);
+	iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
+	query_gen = cpool->pool[iidx].query_gen_when_free;
+	if (cpool->query_gen == query_gen) { /* counter is waiting to reset. */
+		rte_ring_dequeue_zc_elem_finish(qcache, 0);
+		/* write-back counter to reset list. */
+		mlx5_hws_cnt_pool_cache_flush(cpool, *queue);
+		/* let's fetch from global free list. */
+		ret = mlx5_hws_cnt_pool_cache_fetch(cpool, *queue);
+		if (unlikely(ret != 0))
+			return ret;
+		rte_ring_dequeue_zc_burst_elem_start(qcache, sizeof(cnt_id_t),
+				1, &zcdc, NULL);
+		*cnt_id = *(cnt_id_t *)zcdc.ptr1;
+		/* Refresh the internal index for the re-fetched counter. */
+		iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
+	}
+	__hws_cnt_query_raw(cpool, *cnt_id, &cpool->pool[iidx].reset.hits,
+			    &cpool->pool[iidx].reset.bytes);
+	rte_ring_dequeue_zc_elem_finish(qcache, 1);
+	cpool->pool[iidx].share = 0;
+	return 0;
+}
+
+static __rte_always_inline unsigned int
+mlx5_hws_cnt_pool_get_size(struct mlx5_hws_cnt_pool *cpool)
+{
+	return rte_ring_get_capacity(cpool->free_list);
+}
+
+static __rte_always_inline int
+mlx5_hws_cnt_pool_get_action_offset(struct mlx5_hws_cnt_pool *cpool,
+		cnt_id_t cnt_id, struct mlx5dr_action **action,
+		uint32_t *offset)
+{
+	uint8_t idx = cnt_id >> MLX5_HWS_CNT_DCS_IDX_OFFSET;
+
+	idx &= MLX5_HWS_CNT_DCS_IDX_MASK;
+	*action = cpool->dcs_mng.dcs[idx].dr_action;
+	*offset = cnt_id & MLX5_HWS_CNT_IDX_MASK;
+	return 0;
+}
+
+static __rte_always_inline int
+mlx5_hws_cnt_shared_get(struct mlx5_hws_cnt_pool *cpool, cnt_id_t *cnt_id)
+{
+	int ret;
+	uint32_t iidx;
+
+	ret = mlx5_hws_cnt_pool_get(cpool, NULL, cnt_id);
+	if (ret != 0)
+		return ret;
+	iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
+	MLX5_ASSERT(cpool->pool[iidx].share == 0);
+	cpool->pool[iidx].share = 1;
+	return 0;
+}
+
+static __rte_always_inline int
+mlx5_hws_cnt_shared_put(struct mlx5_hws_cnt_pool *cpool, cnt_id_t *cnt_id)
+{
+	int ret;
+	uint32_t iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
+
+	cpool->pool[iidx].share = 0;
+	ret = mlx5_hws_cnt_pool_put(cpool, NULL, cnt_id);
+	if (unlikely(ret != 0))
+		cpool->pool[iidx].share = 1; /* fail to release, restore. */
+	return ret;
+}
+
+static __rte_always_inline bool
+mlx5_hws_cnt_is_shared(struct mlx5_hws_cnt_pool *cpool, cnt_id_t cnt_id)
+{
+	uint32_t iidx = mlx5_hws_cnt_iidx(cpool, cnt_id);
+
+	return cpool->pool[iidx].share ? true : false;
+}
+
+/* Initialize the HWS counter pool. */
+struct mlx5_hws_cnt_pool *
+mlx5_hws_cnt_pool_init(const struct mlx5_hws_cnt_pool_cfg *pcfg,
+		const struct mlx5_hws_cache_param *ccfg);
+
+void
+mlx5_hws_cnt_pool_deinit(struct mlx5_hws_cnt_pool *cntp);
+
+int
+mlx5_hws_cnt_service_thread_create(struct mlx5_dev_ctx_shared *sh);
+
+void
+mlx5_hws_cnt_service_thread_destroy(struct mlx5_dev_ctx_shared *sh);
+
+int
+mlx5_hws_cnt_pool_dcs_alloc(struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_hws_cnt_pool *cpool);
+void
+mlx5_hws_cnt_pool_dcs_free(struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_hws_cnt_pool *cpool);
+
+int
+mlx5_hws_cnt_pool_action_create(struct mlx5_priv *priv,
+		struct mlx5_hws_cnt_pool *cpool);
+
+void
+mlx5_hws_cnt_pool_action_destroy(struct mlx5_hws_cnt_pool *cpool);
+
+struct mlx5_hws_cnt_pool *
+mlx5_hws_cnt_pool_create(struct rte_eth_dev *dev,
+		const struct rte_flow_port_attr *pattr, uint16_t nb_queue);
+
+void
+mlx5_hws_cnt_pool_destroy(struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_hws_cnt_pool *cpool);
+
+int
+mlx5_hws_cnt_svc_init(struct mlx5_dev_ctx_shared *sh);
+
+void
+mlx5_hws_cnt_svc_deinit(struct mlx5_dev_ctx_shared *sh);
+
+#endif /* _MLX5_HWS_CNT_H_ */
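
As a reading aid, below is a minimal caller sketch for the counter pool API
declared above. example_use_counter() is a hypothetical helper, not part of
the patch; it only strings together the documented calls and return codes.

static int
example_use_counter(struct mlx5_hws_cnt_pool *cpool, uint32_t queue)
{
	struct mlx5dr_action *action = NULL;
	uint32_t offset = 0;
	cnt_id_t cnt_id;
	int ret;

	/* Take a counter, preferring the per-queue cache. */
	ret = mlx5_hws_cnt_pool_get(cpool, &queue, &cnt_id);
	if (ret == -EAGAIN)
		return ret; /* Counters are waiting for reset, retry later. */
	if (ret != 0)
		return ret; /* Pool exhausted. */
	/* Resolve the DR action handle and in-bulk offset for rule creation. */
	mlx5_hws_cnt_pool_get_action_offset(cpool, cnt_id, &action, &offset);
	/* ... create the flow rule using (action, offset) ... */
	/* Return the counter to the per-queue cache. */
	return mlx5_hws_cnt_pool_put(cpool, &queue, &cnt_id);
}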
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v4 09/18] net/mlx5: support DR action template API
  2022-10-19 16:25 ` [PATCH v4 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (7 preceding siblings ...)
  2022-10-19 16:25   ` [PATCH v4 08/18] net/mlx5: add HW steering counter action Suanming Mou
@ 2022-10-19 16:25   ` Suanming Mou
  2022-10-19 16:25   ` [PATCH v4 10/18] net/mlx5: add HW steering connection tracking support Suanming Mou
                     ` (8 subsequent siblings)
  17 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-10-19 16:25 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Dariusz Sosnowski

From: Dariusz Sosnowski <dsosnowski@nvidia.com>

This patch adapts the mlx5 PMD to changes in the mlx5dr API regarding
action templates. It changes the following:

1. Actions template creation:

    - Flow action types are translated to mlx5dr action types in order
      to create the mlx5dr_action_template object.
    - An offset is assigned to each flow action. This offset is used to
      predetermine the action's location in the rule_acts array passed
      on rule creation (see the sketch after this list).

2. Template table creation:

    - Fixed actions are created and put in the rule_acts cache using
      the predetermined offsets.
    - The mlx5dr matcher is parametrized by the action templates bound
      to the template table.
    - The mlx5dr matcher is configured to optimize rule creation based
      on the passed rule indices.

3. Flow rule creation:

    - The mlx5dr rule is parametrized by the action template on which
      the rule's actions are based.
    - A rule index hint is provided to mlx5dr.
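
A minimal sketch of the per-action offset assignment described in (1) above
(illustrative only; sketch_assign_offsets() is a made-up helper, and the
reformat, modify-header and meter special cases handled by
flow_hw_dr_actions_template_create() in this patch are left out):

static void
sketch_assign_offsets(const struct rte_flow_action *actions,
		      uint16_t *actions_off,
		      enum mlx5dr_action_type *types)
{
	uint16_t curr_off = 0;
	unsigned int i;

	for (i = 0; actions[i].type != RTE_FLOW_ACTION_TYPE_END; i++) {
		if (actions[i].type == RTE_FLOW_ACTION_TYPE_VOID)
			continue; /* Consumes no DR action slot. */
		/* Remember which DR slot this flow action will use... */
		actions_off[i] = curr_off;
		types[curr_off++] = mlx5_hw_dr_action_types[actions[i].type];
	}
	types[curr_off] = MLX5DR_ACTION_TYP_LAST;
	/*
	 * ...so that rule construction can write straight into
	 * rule_acts[actions_off[i]] without re-parsing the action list.
	 */
}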

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_os.c |   1 +
 drivers/net/mlx5/mlx5.c          |   4 +-
 drivers/net/mlx5/mlx5.h          |   2 +
 drivers/net/mlx5/mlx5_flow.h     |  32 +-
 drivers/net/mlx5/mlx5_flow_hw.c  | 617 +++++++++++++++++++++++++------
 drivers/net/mlx5/mlx5_trigger.c  |  10 +
 6 files changed, 543 insertions(+), 123 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index c70cd84b8d..78cc44fae8 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1565,6 +1565,7 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 		}
 		/* Only HWS requires this information. */
 		flow_hw_init_tags_set(eth_dev);
+		flow_hw_init_flow_metadata_config(eth_dev);
 		if (priv->sh->config.dv_esw_en &&
 		    flow_hw_create_vport_action(eth_dev)) {
 			DRV_LOG(ERR, "port %u failed to create vport action",
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 4d87da8e29..e7a4aac354 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1969,8 +1969,10 @@ mlx5_dev_close(struct rte_eth_dev *dev)
 	flow_hw_destroy_vport_action(dev);
 	flow_hw_resource_release(dev);
 	flow_hw_clear_port_info(dev);
-	if (priv->sh->config.dv_flow_en == 2)
+	if (priv->sh->config.dv_flow_en == 2) {
+		flow_hw_clear_flow_metadata_config();
 		flow_hw_clear_tags_set(dev);
+	}
 #endif
 	if (priv->rxq_privs != NULL) {
 		/* XXX race condition if mlx5_rx_burst() is still running. */
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index c71db131a1..b8663e0322 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1651,6 +1651,8 @@ struct mlx5_priv {
 	struct mlx5dr_action *hw_drop[2];
 	/* HW steering global tag action. */
 	struct mlx5dr_action *hw_tag[2];
+	/* HW steering rte flow tables created before port start (pending translation). */
+	LIST_HEAD(flow_hw_tbl_ongo, rte_flow_template_table) flow_hw_tbl_ongo;
 	struct mlx5_indexed_pool *acts_ipool; /* Action data indexed pool. */
 #endif
 };
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 1948de5dd8..210cc9ae3e 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1186,6 +1186,11 @@ struct rte_flow_actions_template {
 	struct rte_flow_actions_template_attr attr;
 	struct rte_flow_action *actions; /* Cached flow actions. */
 	struct rte_flow_action *masks; /* Cached action masks.*/
+	struct mlx5dr_action_template *tmpl; /* mlx5dr action template. */
+	uint16_t dr_actions_num; /* Number of DR rule actions. */
+	uint16_t actions_num; /* Number of flow actions. */
+	uint16_t *actions_off; /* DR action offset for a given rte_flow action index. */
+	uint16_t reformat_off; /* Offset of DR reformat action. */
 	uint16_t mhdr_off; /* Offset of DR modify header action. */
 	uint32_t refcnt; /* Reference counter. */
 	uint16_t rx_cpy_pos; /* Action position of Rx metadata to be copied. */
@@ -1237,7 +1242,6 @@ struct mlx5_hw_actions {
 	/* Encap/Decap action. */
 	struct mlx5_hw_encap_decap_action *encap_decap;
 	uint16_t encap_decap_pos; /* Encap/Decap action position. */
-	uint32_t acts_num:4; /* Total action number. */
 	uint32_t mark:1; /* Indicate the mark action. */
 	uint32_t cnt_id; /* Counter id. */
 	/* Translated DR action array from action template. */
@@ -1493,6 +1497,13 @@ flow_hw_get_wire_port(struct ibv_context *ibctx)
 }
 #endif
 
+extern uint32_t mlx5_flow_hw_flow_metadata_config_refcnt;
+extern uint8_t mlx5_flow_hw_flow_metadata_esw_en;
+extern uint8_t mlx5_flow_hw_flow_metadata_xmeta_en;
+
+void flow_hw_init_flow_metadata_config(struct rte_eth_dev *dev);
+void flow_hw_clear_flow_metadata_config(void);
+
 /*
  * Convert metadata or tag to the actual register.
  * META: Can only be used to match in the FDB in this stage, fixed C_1.
@@ -1504,7 +1515,22 @@ flow_hw_get_reg_id(enum rte_flow_item_type type, uint32_t id)
 {
 	switch (type) {
 	case RTE_FLOW_ITEM_TYPE_META:
-		return REG_C_1;
+#ifdef HAVE_MLX5_HWS_SUPPORT
+		if (mlx5_flow_hw_flow_metadata_esw_en &&
+		    mlx5_flow_hw_flow_metadata_xmeta_en == MLX5_XMETA_MODE_META32_HWS) {
+			return REG_C_1;
+		}
+#endif
+		/*
+		 * On root table - PMD allows only egress META matching, thus
+		 * REG_A matching is sufficient.
+		 *
+		 * On non-root tables - REG_A corresponds to general_purpose_lookup_field,
+		 * which translates to REG_A in NIC TX and to REG_B in NIC RX.
+		 * However, current FW does not implement the REG_B case, so it
+		 * should be rejected during pattern template validation.
+		 */
+		return REG_A;
 	case RTE_FLOW_ITEM_TYPE_TAG:
 		MLX5_ASSERT(id < MLX5_FLOW_HW_TAGS_MAX);
 		return mlx5_flow_hw_avl_tags[id];
@@ -2413,4 +2439,6 @@ int mlx5_flow_pattern_validate(struct rte_eth_dev *dev,
 		const struct rte_flow_pattern_template_attr *attr,
 		const struct rte_flow_item items[],
 		struct rte_flow_error *error);
+int flow_hw_table_update(struct rte_eth_dev *dev,
+			 struct rte_flow_error *error);
 #endif /* RTE_PMD_MLX5_FLOW_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 1e441c9c0d..5b7ef1be68 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -340,6 +340,13 @@ __flow_hw_action_template_destroy(struct rte_eth_dev *dev,
 				 struct mlx5_hw_actions *acts)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_action_construct_data *data;
+
+	while (!LIST_EMPTY(&acts->act_list)) {
+		data = LIST_FIRST(&acts->act_list);
+		LIST_REMOVE(data, next);
+		mlx5_ipool_free(priv->acts_ipool, data->idx);
+	}
 
 	if (acts->jump) {
 		struct mlx5_flow_group *grp;
@@ -349,6 +356,16 @@ __flow_hw_action_template_destroy(struct rte_eth_dev *dev,
 		mlx5_hlist_unregister(priv->sh->flow_tbls, &grp->entry);
 		acts->jump = NULL;
 	}
+	if (acts->tir) {
+		mlx5_hrxq_release(dev, acts->tir->idx);
+		acts->tir = NULL;
+	}
+	if (acts->encap_decap) {
+		if (acts->encap_decap->action)
+			mlx5dr_action_destroy(acts->encap_decap->action);
+		mlx5_free(acts->encap_decap);
+		acts->encap_decap = NULL;
+	}
 	if (acts->mhdr) {
 		if (acts->mhdr->action)
 			mlx5dr_action_destroy(acts->mhdr->action);
@@ -967,33 +984,29 @@ flow_hw_represented_port_compile(struct rte_eth_dev *dev,
 static __rte_always_inline int
 flow_hw_meter_compile(struct rte_eth_dev *dev,
 		      const struct mlx5_flow_template_table_cfg *cfg,
-		      uint32_t  start_pos, const struct rte_flow_action *action,
-		      struct mlx5_hw_actions *acts, uint32_t *end_pos,
+		      uint16_t aso_mtr_pos,
+		      uint16_t jump_pos,
+		      const struct rte_flow_action *action,
+		      struct mlx5_hw_actions *acts,
 		      struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_aso_mtr *aso_mtr;
 	const struct rte_flow_action_meter *meter = action->conf;
-	uint32_t pos = start_pos;
 	uint32_t group = cfg->attr.flow_attr.group;
 
 	aso_mtr = mlx5_aso_meter_by_idx(priv, meter->mtr_id);
-	acts->rule_acts[pos].action = priv->mtr_bulk.action;
-	acts->rule_acts[pos].aso_meter.offset = aso_mtr->offset;
-		acts->jump = flow_hw_jump_action_register
+	acts->rule_acts[aso_mtr_pos].action = priv->mtr_bulk.action;
+	acts->rule_acts[aso_mtr_pos].aso_meter.offset = aso_mtr->offset;
+	acts->jump = flow_hw_jump_action_register
 		(dev, cfg, aso_mtr->fm.group, error);
-	if (!acts->jump) {
-		*end_pos = start_pos;
+	if (!acts->jump)
 		return -ENOMEM;
-	}
-	acts->rule_acts[++pos].action = (!!group) ?
+	acts->rule_acts[jump_pos].action = (!!group) ?
 				    acts->jump->hws_action :
 				    acts->jump->root_action;
-	*end_pos = pos;
-	if (mlx5_aso_mtr_wait(priv->sh, aso_mtr)) {
-		*end_pos = start_pos;
+	if (mlx5_aso_mtr_wait(priv->sh, aso_mtr))
 		return -ENOMEM;
-	}
 	return 0;
 }
 
@@ -1046,11 +1059,11 @@ flow_hw_cnt_compile(struct rte_eth_dev *dev, uint32_t  start_pos,
  *    Table on success, NULL otherwise and rte_errno is set.
  */
 static int
-flow_hw_actions_translate(struct rte_eth_dev *dev,
-			  const struct mlx5_flow_template_table_cfg *cfg,
-			  struct mlx5_hw_actions *acts,
-			  struct rte_flow_actions_template *at,
-			  struct rte_flow_error *error)
+__flow_hw_actions_translate(struct rte_eth_dev *dev,
+			    const struct mlx5_flow_template_table_cfg *cfg,
+			    struct mlx5_hw_actions *acts,
+			    struct rte_flow_actions_template *at,
+			    struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	const struct rte_flow_template_table_attr *table_attr = &cfg->attr;
@@ -1061,12 +1074,15 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 	enum mlx5dr_action_reformat_type refmt_type = 0;
 	const struct rte_flow_action_raw_encap *raw_encap_data;
 	const struct rte_flow_item *enc_item = NULL, *enc_item_m = NULL;
-	uint16_t reformat_pos = MLX5_HW_MAX_ACTS, reformat_src = 0;
+	uint16_t reformat_src = 0;
 	uint8_t *encap_data = NULL, *encap_data_m = NULL;
 	size_t data_size = 0;
 	struct mlx5_hw_modify_header_action mhdr = { 0 };
 	bool actions_end = false;
-	uint32_t type, i;
+	uint32_t type;
+	bool reformat_used = false;
+	uint16_t action_pos;
+	uint16_t jump_pos;
 	int err;
 
 	flow_hw_modify_field_init(&mhdr, at);
@@ -1076,46 +1092,53 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 		type = MLX5DR_TABLE_TYPE_NIC_TX;
 	else
 		type = MLX5DR_TABLE_TYPE_NIC_RX;
-	for (i = 0; !actions_end; actions++, masks++) {
+	for (; !actions_end; actions++, masks++) {
 		switch (actions->type) {
 		case RTE_FLOW_ACTION_TYPE_INDIRECT:
+			action_pos = at->actions_off[actions - at->actions];
 			if (!attr->group) {
 				DRV_LOG(ERR, "Indirect action is not supported in root table.");
 				goto err;
 			}
 			if (actions->conf && masks->conf) {
 				if (flow_hw_shared_action_translate
-				(dev, actions, acts, actions - action_start, i))
+				(dev, actions, acts, actions - action_start, action_pos))
 					goto err;
 			} else if (__flow_hw_act_data_general_append
 					(priv, acts, actions->type,
-					 actions - action_start, i)){
+					 actions - action_start, action_pos)){
 				goto err;
 			}
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			break;
 		case RTE_FLOW_ACTION_TYPE_DROP:
-			acts->rule_acts[i++].action =
+			action_pos = at->actions_off[actions - at->actions];
+			acts->rule_acts[action_pos].action =
 				priv->hw_drop[!!attr->group];
 			break;
 		case RTE_FLOW_ACTION_TYPE_MARK:
+			action_pos = at->actions_off[actions - at->actions];
 			acts->mark = true;
-			if (masks->conf)
-				acts->rule_acts[i].tag.value =
+			if (masks->conf &&
+			    ((const struct rte_flow_action_mark *)
+			     masks->conf)->id)
+				acts->rule_acts[action_pos].tag.value =
 					mlx5_flow_mark_set
 					(((const struct rte_flow_action_mark *)
-					(masks->conf))->id);
+					(actions->conf))->id);
 			else if (__flow_hw_act_data_general_append(priv, acts,
-				actions->type, actions - action_start, i))
+				actions->type, actions - action_start, action_pos))
 				goto err;
-			acts->rule_acts[i++].action =
+			acts->rule_acts[action_pos].action =
 				priv->hw_tag[!!attr->group];
 			flow_hw_rxq_flag_set(dev, true);
 			break;
 		case RTE_FLOW_ACTION_TYPE_JUMP:
-			if (masks->conf) {
+			action_pos = at->actions_off[actions - at->actions];
+			if (masks->conf &&
+			    ((const struct rte_flow_action_jump *)
+			     masks->conf)->group) {
 				uint32_t jump_group =
 					((const struct rte_flow_action_jump *)
 					actions->conf)->group;
@@ -1123,76 +1146,77 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 						(dev, cfg, jump_group, error);
 				if (!acts->jump)
 					goto err;
-				acts->rule_acts[i].action = (!!attr->group) ?
+				acts->rule_acts[action_pos].action = (!!attr->group) ?
 						acts->jump->hws_action :
 						acts->jump->root_action;
 			} else if (__flow_hw_act_data_general_append
 					(priv, acts, actions->type,
-					 actions - action_start, i)){
+					 actions - action_start, action_pos)){
 				goto err;
 			}
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_QUEUE:
-			if (masks->conf) {
+			action_pos = at->actions_off[actions - at->actions];
+			if (masks->conf &&
+			    ((const struct rte_flow_action_queue *)
+			     masks->conf)->index) {
 				acts->tir = flow_hw_tir_action_register
 				(dev,
 				 mlx5_hw_act_flag[!!attr->group][type],
 				 actions);
 				if (!acts->tir)
 					goto err;
-				acts->rule_acts[i].action =
+				acts->rule_acts[action_pos].action =
 					acts->tir->action;
 			} else if (__flow_hw_act_data_general_append
 					(priv, acts, actions->type,
-					 actions - action_start, i)) {
+					 actions - action_start, action_pos)) {
 				goto err;
 			}
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RSS:
-			if (masks->conf) {
+			action_pos = at->actions_off[actions - at->actions];
+			if (actions->conf && masks->conf) {
 				acts->tir = flow_hw_tir_action_register
 				(dev,
 				 mlx5_hw_act_flag[!!attr->group][type],
 				 actions);
 				if (!acts->tir)
 					goto err;
-				acts->rule_acts[i].action =
+				acts->rule_acts[action_pos].action =
 					acts->tir->action;
 			} else if (__flow_hw_act_data_general_append
 					(priv, acts, actions->type,
-					 actions - action_start, i)) {
+					 actions - action_start, action_pos)) {
 				goto err;
 			}
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
-			MLX5_ASSERT(reformat_pos == MLX5_HW_MAX_ACTS);
+			MLX5_ASSERT(!reformat_used);
 			enc_item = ((const struct rte_flow_action_vxlan_encap *)
 				   actions->conf)->definition;
 			if (masks->conf)
 				enc_item_m = ((const struct rte_flow_action_vxlan_encap *)
 					     masks->conf)->definition;
-			reformat_pos = i++;
+			reformat_used = true;
 			reformat_src = actions - action_start;
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_L2_TO_TNL_L2;
 			break;
 		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
-			MLX5_ASSERT(reformat_pos == MLX5_HW_MAX_ACTS);
+			MLX5_ASSERT(!reformat_used);
 			enc_item = ((const struct rte_flow_action_nvgre_encap *)
 				   actions->conf)->definition;
 			if (masks->conf)
 				enc_item_m = ((const struct rte_flow_action_nvgre_encap *)
 					     masks->conf)->definition;
-			reformat_pos = i++;
+			reformat_used = true;
 			reformat_src = actions - action_start;
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_L2_TO_TNL_L2;
 			break;
 		case RTE_FLOW_ACTION_TYPE_VXLAN_DECAP:
 		case RTE_FLOW_ACTION_TYPE_NVGRE_DECAP:
-			MLX5_ASSERT(reformat_pos == MLX5_HW_MAX_ACTS);
-			reformat_pos = i++;
+			MLX5_ASSERT(!reformat_used);
+			reformat_used = true;
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_TNL_L2_TO_L2;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
@@ -1206,25 +1230,23 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 				 actions->conf;
 			encap_data = raw_encap_data->data;
 			data_size = raw_encap_data->size;
-			if (reformat_pos != MLX5_HW_MAX_ACTS) {
+			if (reformat_used) {
 				refmt_type = data_size <
 				MLX5_ENCAPSULATION_DECISION_SIZE ?
 				MLX5DR_ACTION_REFORMAT_TYPE_TNL_L3_TO_L2 :
 				MLX5DR_ACTION_REFORMAT_TYPE_L2_TO_TNL_L3;
 			} else {
-				reformat_pos = i++;
+				reformat_used = true;
 				refmt_type =
 				MLX5DR_ACTION_REFORMAT_TYPE_L2_TO_TNL_L2;
 			}
 			reformat_src = actions - action_start;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RAW_DECAP:
-			reformat_pos = i++;
+			reformat_used = true;
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_TNL_L2_TO_L2;
 			break;
 		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
-			if (mhdr.pos == UINT16_MAX)
-				mhdr.pos = i++;
 			err = flow_hw_modify_field_compile(dev, attr, action_start,
 							   actions, masks, acts, &mhdr,
 							   error);
@@ -1242,40 +1264,46 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 				action_start += 1;
 			break;
 		case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
+			action_pos = at->actions_off[actions - at->actions];
 			if (flow_hw_represented_port_compile
 					(dev, attr, action_start, actions,
-					 masks, acts, i, error))
+					 masks, acts, action_pos, error))
 				goto err;
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_METER:
+			/*
+			 * METER action is compiled to 2 DR actions - ASO_METER and FT.
+			 * The calculated DR offset is stored only for ASO_METER;
+			 * FT is assumed to be the next action.
+			 */
+			action_pos = at->actions_off[actions - at->actions];
+			jump_pos = action_pos + 1;
 			if (actions->conf && masks->conf &&
 			    ((const struct rte_flow_action_meter *)
 			     masks->conf)->mtr_id) {
 				err = flow_hw_meter_compile(dev, cfg,
-						i, actions, acts, &i, error);
+						action_pos, jump_pos, actions, acts, error);
 				if (err)
 					goto err;
 			} else if (__flow_hw_act_data_general_append(priv, acts,
 							actions->type,
 							actions - action_start,
-							i))
+							action_pos))
 				goto err;
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_COUNT:
+			action_pos = at->actions_off[actions - action_start];
 			if (masks->conf &&
 			    ((const struct rte_flow_action_count *)
 			     masks->conf)->id) {
-				err = flow_hw_cnt_compile(dev, i, acts);
+				err = flow_hw_cnt_compile(dev, action_pos, acts);
 				if (err)
 					goto err;
 			} else if (__flow_hw_act_data_general_append
 					(priv, acts, actions->type,
-					 actions - action_start, i)) {
+					 actions - action_start, action_pos)) {
 				goto err;
 			}
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
@@ -1309,10 +1337,11 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			goto err;
 		acts->rule_acts[acts->mhdr->pos].action = acts->mhdr->action;
 	}
-	if (reformat_pos != MLX5_HW_MAX_ACTS) {
+	if (reformat_used) {
 		uint8_t buf[MLX5_ENCAP_MAX_LEN];
 		bool shared_rfmt = true;
 
+		MLX5_ASSERT(at->reformat_off != UINT16_MAX);
 		if (enc_item) {
 			MLX5_ASSERT(!encap_data);
 			if (flow_dv_convert_encap_data(enc_item, buf, &data_size, error))
@@ -1340,20 +1369,17 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 				 (shared_rfmt ? MLX5DR_ACTION_FLAG_SHARED : 0));
 		if (!acts->encap_decap->action)
 			goto err;
-		acts->rule_acts[reformat_pos].action =
-						acts->encap_decap->action;
-		acts->rule_acts[reformat_pos].reformat.data =
-						acts->encap_decap->data;
+		acts->rule_acts[at->reformat_off].action = acts->encap_decap->action;
+		acts->rule_acts[at->reformat_off].reformat.data = acts->encap_decap->data;
 		if (shared_rfmt)
-			acts->rule_acts[reformat_pos].reformat.offset = 0;
+			acts->rule_acts[at->reformat_off].reformat.offset = 0;
 		else if (__flow_hw_act_data_encap_append(priv, acts,
 				 (action_start + reformat_src)->type,
-				 reformat_src, reformat_pos, data_size))
+				 reformat_src, at->reformat_off, data_size))
 			goto err;
 		acts->encap_decap->shared = shared_rfmt;
-		acts->encap_decap_pos = reformat_pos;
+		acts->encap_decap_pos = at->reformat_off;
 	}
-	acts->acts_num = i;
 	return 0;
 err:
 	err = rte_errno;
@@ -1363,6 +1389,40 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 				  "fail to create rte table");
 }
 
+/**
+ * Translate rte_flow actions to DR actions.
+ *
+ * @param[in] dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param[in] tbl
+ *   Pointer to the flow template table.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *    0 on success, negative value otherwise and rte_errno is set.
+ */
+static int
+flow_hw_actions_translate(struct rte_eth_dev *dev,
+			  struct rte_flow_template_table *tbl,
+			  struct rte_flow_error *error)
+{
+	uint32_t i;
+
+	for (i = 0; i < tbl->nb_action_templates; i++) {
+		if (__flow_hw_actions_translate(dev, &tbl->cfg,
+						&tbl->ats[i].acts,
+						tbl->ats[i].action_template,
+						error))
+			goto err;
+	}
+	return 0;
+err:
+	while (i--)
+		__flow_hw_action_template_destroy(dev, &tbl->ats[i].acts);
+	return -1;
+}
+
 /**
  * Get shared indirect action.
  *
@@ -1611,16 +1671,17 @@ flow_hw_modify_field_construct(struct mlx5_hw_q_job *job,
 static __rte_always_inline int
 flow_hw_actions_construct(struct rte_eth_dev *dev,
 			  struct mlx5_hw_q_job *job,
-			  const struct mlx5_hw_actions *hw_acts,
+			  const struct mlx5_hw_action_template *hw_at,
 			  const uint8_t it_idx,
 			  const struct rte_flow_action actions[],
 			  struct mlx5dr_rule_action *rule_acts,
-			  uint32_t *acts_num,
 			  uint32_t queue)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow_template_table *table = job->flow->table;
 	struct mlx5_action_construct_data *act_data;
+	const struct rte_flow_actions_template *at = hw_at->action_template;
+	const struct mlx5_hw_actions *hw_acts = &hw_at->acts;
 	const struct rte_flow_action *action;
 	const struct rte_flow_action_raw_encap *raw_encap_data;
 	const struct rte_flow_item *enc_item = NULL;
@@ -1636,11 +1697,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	struct mlx5_aso_mtr *mtr;
 	uint32_t mtr_id;
 
-	memcpy(rule_acts, hw_acts->rule_acts,
-	       sizeof(*rule_acts) * hw_acts->acts_num);
-	*acts_num = hw_acts->acts_num;
-	if (LIST_EMPTY(&hw_acts->act_list))
-		return 0;
+	rte_memcpy(rule_acts, hw_acts->rule_acts, sizeof(*rule_acts) * at->dr_actions_num);
 	attr.group = table->grp->group_id;
 	ft_flag = mlx5_hw_act_flag[!!table->grp->group_id][table->type];
 	if (table->type == MLX5DR_TABLE_TYPE_FDB) {
@@ -1774,7 +1831,6 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 							 jump->root_action;
 			job->flow->jump = jump;
 			job->flow->fate_type = MLX5_FLOW_FATE_JUMP;
-			(*acts_num)++;
 			if (mlx5_aso_mtr_wait(priv->sh, mtr))
 				return -1;
 			break;
@@ -1912,13 +1968,16 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 		.burst = attr->postpone,
 	};
 	struct mlx5dr_rule_action rule_acts[MLX5_HW_MAX_ACTS];
-	struct mlx5_hw_actions *hw_acts;
 	struct rte_flow_hw *flow;
 	struct mlx5_hw_q_job *job;
 	const struct rte_flow_item *rule_items;
-	uint32_t acts_num, flow_idx;
+	uint32_t flow_idx;
 	int ret;
 
+	if (unlikely((!dev->data->dev_started))) {
+		rte_errno = EINVAL;
+		goto error;
+	}
 	if (unlikely(!priv->hw_q[queue].job_idx)) {
 		rte_errno = ENOMEM;
 		goto error;
@@ -1941,7 +2000,12 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	job->flow = flow;
 	job->user_data = user_data;
 	rule_attr.user_data = job;
-	hw_acts = &table->ats[action_template_index].acts;
+	/*
+	 * Indexed pool returns 1-based indices, but mlx5dr expects 0-based indices for rule
+	 * insertion hints.
+	 */
+	MLX5_ASSERT(flow_idx > 0);
+	rule_attr.rule_idx = flow_idx - 1;
 	/*
 	 * Construct the flow actions based on the input actions.
 	 * The implicitly appended action is always fixed, like metadata
@@ -1949,8 +2013,8 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	 * No need to copy and construct a new "actions" list based on the
 	 * user's input, in order to save the cost.
 	 */
-	if (flow_hw_actions_construct(dev, job, hw_acts, pattern_template_index,
-				  actions, rule_acts, &acts_num, queue)) {
+	if (flow_hw_actions_construct(dev, job, &table->ats[action_template_index],
+				      pattern_template_index, actions, rule_acts, queue)) {
 		rte_errno = EINVAL;
 		goto free;
 	}
@@ -1959,7 +2023,7 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	if (!rule_items)
 		goto free;
 	ret = mlx5dr_rule_create(table->matcher,
-				 pattern_template_index, items,
+				 pattern_template_index, rule_items,
 				 action_template_index, rule_acts,
 				 &rule_attr, (struct mlx5dr_rule *)flow->rule);
 	if (likely(!ret))
@@ -2295,6 +2359,7 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	struct rte_flow_template_table *tbl = NULL;
 	struct mlx5_flow_group *grp;
 	struct mlx5dr_match_template *mt[MLX5_HW_TBL_MAX_ITEM_TEMPLATE];
+	struct mlx5dr_action_template *at[MLX5_HW_TBL_MAX_ACTION_TEMPLATE];
 	const struct rte_flow_template_table_attr *attr = &table_cfg->attr;
 	struct rte_flow_attr flow_attr = attr->flow_attr;
 	struct mlx5_flow_cb_ctx ctx = {
@@ -2315,6 +2380,7 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	struct mlx5_list_entry *ge;
 	uint32_t i, max_tpl = MLX5_HW_TBL_MAX_ITEM_TEMPLATE;
 	uint32_t nb_flows = rte_align32pow2(attr->nb_flows);
+	bool port_started = !!dev->data->dev_started;
 	int err;
 
 	/* HWS layer accepts only 1 item template with root table. */
@@ -2349,12 +2415,20 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	tbl->grp = grp;
 	/* Prepare matcher information. */
 	matcher_attr.priority = attr->flow_attr.priority;
+	matcher_attr.optimize_using_rule_idx = true;
 	matcher_attr.mode = MLX5DR_MATCHER_RESOURCE_MODE_RULE;
 	matcher_attr.rule.num_log = rte_log2_u32(nb_flows);
 	/* Build the item template. */
 	for (i = 0; i < nb_item_templates; i++) {
 		uint32_t ret;
 
+		if ((flow_attr.ingress && !item_templates[i]->attr.ingress) ||
+		    (flow_attr.egress && !item_templates[i]->attr.egress) ||
+		    (flow_attr.transfer && !item_templates[i]->attr.transfer)) {
+			DRV_LOG(ERR, "pattern template and template table attribute mismatch");
+			rte_errno = EINVAL;
+			goto it_error;
+		}
 		ret = __atomic_add_fetch(&item_templates[i]->refcnt, 1,
 					 __ATOMIC_RELAXED);
 		if (ret <= 1) {
@@ -2364,10 +2438,6 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 		mt[i] = item_templates[i]->mt;
 		tbl->its[i] = item_templates[i];
 	}
-	tbl->matcher = mlx5dr_matcher_create
-		(tbl->grp->tbl, mt, nb_item_templates, NULL, 0, &matcher_attr);
-	if (!tbl->matcher)
-		goto it_error;
 	tbl->nb_item_templates = nb_item_templates;
 	/* Build the action template. */
 	for (i = 0; i < nb_action_templates; i++) {
@@ -2379,21 +2449,31 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 			rte_errno = EINVAL;
 			goto at_error;
 		}
+		at[i] = action_templates[i]->tmpl;
+		tbl->ats[i].action_template = action_templates[i];
 		LIST_INIT(&tbl->ats[i].acts.act_list);
-		err = flow_hw_actions_translate(dev, &tbl->cfg,
-						&tbl->ats[i].acts,
-						action_templates[i], error);
+		if (!port_started)
+			continue;
+		err = __flow_hw_actions_translate(dev, &tbl->cfg,
+						  &tbl->ats[i].acts,
+						  action_templates[i], error);
 		if (err) {
 			i++;
 			goto at_error;
 		}
-		tbl->ats[i].action_template = action_templates[i];
 	}
 	tbl->nb_action_templates = nb_action_templates;
+	tbl->matcher = mlx5dr_matcher_create
+		(tbl->grp->tbl, mt, nb_item_templates, at, nb_action_templates, &matcher_attr);
+	if (!tbl->matcher)
+		goto at_error;
 	tbl->type = attr->flow_attr.transfer ? MLX5DR_TABLE_TYPE_FDB :
 		    (attr->flow_attr.egress ? MLX5DR_TABLE_TYPE_NIC_TX :
 		    MLX5DR_TABLE_TYPE_NIC_RX);
-	LIST_INSERT_HEAD(&priv->flow_hw_tbl, tbl, next);
+	if (port_started)
+		LIST_INSERT_HEAD(&priv->flow_hw_tbl, tbl, next);
+	else
+		LIST_INSERT_HEAD(&priv->flow_hw_tbl_ongo, tbl, next);
 	return tbl;
 at_error:
 	while (i--) {
@@ -2406,7 +2486,6 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	while (i--)
 		__atomic_sub_fetch(&item_templates[i]->refcnt,
 				   1, __ATOMIC_RELAXED);
-	mlx5dr_matcher_destroy(tbl->matcher);
 error:
 	err = rte_errno;
 	if (tbl) {
@@ -2423,6 +2502,33 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	return NULL;
 }
 
+/**
+ * Update flow template table.
+ *
+ * @param[in] dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *    0 on success, negative value otherwise and rte_errno is set.
+ */
+int
+flow_hw_table_update(struct rte_eth_dev *dev,
+		     struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_flow_template_table *tbl;
+
+	while ((tbl = LIST_FIRST(&priv->flow_hw_tbl_ongo)) != NULL) {
+		if (flow_hw_actions_translate(dev, tbl, error))
+			return -1;
+		LIST_REMOVE(tbl, next);
+		LIST_INSERT_HEAD(&priv->flow_hw_tbl, tbl, next);
+	}
+	return 0;
+}
+
 /**
  * Translates group index specified by the user in @p attr to internal
  * group index.
@@ -2501,6 +2607,7 @@ flow_hw_template_table_create(struct rte_eth_dev *dev,
 			      uint8_t nb_action_templates,
 			      struct rte_flow_error *error)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_flow_template_table_cfg cfg = {
 		.attr = *attr,
 		.external = true,
@@ -2509,6 +2616,12 @@ flow_hw_template_table_create(struct rte_eth_dev *dev,
 
 	if (flow_hw_translate_group(dev, &cfg, group, &cfg.attr.flow_attr.group, error))
 		return NULL;
+	if (priv->sh->config.dv_esw_en && cfg.attr.flow_attr.egress) {
+		rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ATTR, NULL,
+				  "egress flows are not supported with HW Steering"
+				  " when E-Switch is enabled");
+		return NULL;
+	}
 	return flow_hw_table_create(dev, &cfg, item_templates, nb_item_templates,
 				    action_templates, nb_action_templates, error);
 }
@@ -2750,7 +2863,8 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 		const struct rte_flow_action *mask = &masks[i];
 
 		MLX5_ASSERT(i < MLX5_HW_MAX_ACTS);
-		if (action->type != mask->type)
+		if (action->type != RTE_FLOW_ACTION_TYPE_INDIRECT &&
+		    action->type != mask->type)
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ACTION,
 						  action,
@@ -2826,6 +2940,157 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static enum mlx5dr_action_type mlx5_hw_dr_action_types[] = {
+	[RTE_FLOW_ACTION_TYPE_MARK] = MLX5DR_ACTION_TYP_TAG,
+	[RTE_FLOW_ACTION_TYPE_DROP] = MLX5DR_ACTION_TYP_DROP,
+	[RTE_FLOW_ACTION_TYPE_JUMP] = MLX5DR_ACTION_TYP_FT,
+	[RTE_FLOW_ACTION_TYPE_QUEUE] = MLX5DR_ACTION_TYP_TIR,
+	[RTE_FLOW_ACTION_TYPE_RSS] = MLX5DR_ACTION_TYP_TIR,
+	[RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP] = MLX5DR_ACTION_TYP_L2_TO_TNL_L2,
+	[RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP] = MLX5DR_ACTION_TYP_L2_TO_TNL_L2,
+	[RTE_FLOW_ACTION_TYPE_VXLAN_DECAP] = MLX5DR_ACTION_TYP_TNL_L2_TO_L2,
+	[RTE_FLOW_ACTION_TYPE_NVGRE_DECAP] = MLX5DR_ACTION_TYP_TNL_L2_TO_L2,
+	[RTE_FLOW_ACTION_TYPE_MODIFY_FIELD] = MLX5DR_ACTION_TYP_MODIFY_HDR,
+	[RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT] = MLX5DR_ACTION_TYP_VPORT,
+	[RTE_FLOW_ACTION_TYPE_COUNT] = MLX5DR_ACTION_TYP_CTR,
+};
+
+static int
+flow_hw_dr_actions_template_handle_shared(const struct rte_flow_action *mask,
+					  unsigned int action_src,
+					  enum mlx5dr_action_type *action_types,
+					  uint16_t *curr_off,
+					  struct rte_flow_actions_template *at)
+{
+	uint32_t type;
+
+	if (!mask) {
+		DRV_LOG(WARNING, "Unable to determine indirect action type "
+			"without a mask specified");
+		return -EINVAL;
+	}
+	type = mask->type;
+	switch (type) {
+	case RTE_FLOW_ACTION_TYPE_RSS:
+		at->actions_off[action_src] = *curr_off;
+		action_types[*curr_off] = MLX5DR_ACTION_TYP_TIR;
+		*curr_off = *curr_off + 1;
+		break;
+	case RTE_FLOW_ACTION_TYPE_COUNT:
+		at->actions_off[action_src] = *curr_off;
+		action_types[*curr_off] = MLX5DR_ACTION_TYP_CTR;
+		*curr_off = *curr_off + 1;
+		break;
+	default:
+		DRV_LOG(WARNING, "Unsupported shared action type: %d", type);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+/**
+ * Create DR action template based on a provided sequence of flow actions.
+ *
+ * @param[in] at
+ *   Pointer to flow actions template to be updated.
+ *
+ * @return
+ *   DR action template pointer on success and action offsets in @p at are updated.
+ *   NULL otherwise.
+ */
+static struct mlx5dr_action_template *
+flow_hw_dr_actions_template_create(struct rte_flow_actions_template *at)
+{
+	struct mlx5dr_action_template *dr_template;
+	enum mlx5dr_action_type action_types[MLX5_HW_MAX_ACTS] = { MLX5DR_ACTION_TYP_LAST };
+	unsigned int i;
+	uint16_t curr_off;
+	enum mlx5dr_action_type reformat_act_type = MLX5DR_ACTION_TYP_TNL_L2_TO_L2;
+	uint16_t reformat_off = UINT16_MAX;
+	uint16_t mhdr_off = UINT16_MAX;
+	int ret;
+	for (i = 0, curr_off = 0; at->actions[i].type != RTE_FLOW_ACTION_TYPE_END; ++i) {
+		const struct rte_flow_action_raw_encap *raw_encap_data;
+		size_t data_size;
+		enum mlx5dr_action_type type;
+
+		if (curr_off >= MLX5_HW_MAX_ACTS)
+			goto err_actions_num;
+		switch (at->actions[i].type) {
+		case RTE_FLOW_ACTION_TYPE_VOID:
+			break;
+		case RTE_FLOW_ACTION_TYPE_INDIRECT:
+			ret = flow_hw_dr_actions_template_handle_shared(&at->masks[i], i,
+									action_types,
+									&curr_off, at);
+			if (ret)
+				return NULL;
+			break;
+		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
+		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
+		case RTE_FLOW_ACTION_TYPE_VXLAN_DECAP:
+		case RTE_FLOW_ACTION_TYPE_NVGRE_DECAP:
+			MLX5_ASSERT(reformat_off == UINT16_MAX);
+			reformat_off = curr_off++;
+			reformat_act_type = mlx5_hw_dr_action_types[at->actions[i].type];
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
+			raw_encap_data = at->actions[i].conf;
+			data_size = raw_encap_data->size;
+			if (reformat_off != UINT16_MAX) {
+				reformat_act_type = data_size < MLX5_ENCAPSULATION_DECISION_SIZE ?
+					MLX5DR_ACTION_TYP_TNL_L3_TO_L2 :
+					MLX5DR_ACTION_TYP_L2_TO_TNL_L3;
+			} else {
+				reformat_off = curr_off++;
+				reformat_act_type = MLX5DR_ACTION_TYP_L2_TO_TNL_L2;
+			}
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_DECAP:
+			reformat_off = curr_off++;
+			reformat_act_type = MLX5DR_ACTION_TYP_TNL_L2_TO_L2;
+			break;
+		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
+			if (mhdr_off == UINT16_MAX) {
+				mhdr_off = curr_off++;
+				type = mlx5_hw_dr_action_types[at->actions[i].type];
+				action_types[mhdr_off] = type;
+			}
+			break;
+		case RTE_FLOW_ACTION_TYPE_METER:
+			at->actions_off[i] = curr_off;
+			action_types[curr_off++] = MLX5DR_ACTION_TYP_ASO_METER;
+			if (curr_off >= MLX5_HW_MAX_ACTS)
+				goto err_actions_num;
+			action_types[curr_off++] = MLX5DR_ACTION_TYP_FT;
+			break;
+		default:
+			type = mlx5_hw_dr_action_types[at->actions[i].type];
+			at->actions_off[i] = curr_off;
+			action_types[curr_off++] = type;
+			break;
+		}
+	}
+	if (curr_off >= MLX5_HW_MAX_ACTS)
+		goto err_actions_num;
+	if (mhdr_off != UINT16_MAX)
+		at->mhdr_off = mhdr_off;
+	if (reformat_off != UINT16_MAX) {
+		at->reformat_off = reformat_off;
+		action_types[reformat_off] = reformat_act_type;
+	}
+	dr_template = mlx5dr_action_template_create(action_types);
+	if (dr_template)
+		at->dr_actions_num = curr_off;
+	else
+		DRV_LOG(ERR, "Failed to create DR action template: %d", rte_errno);
+	return dr_template;
+err_actions_num:
+	DRV_LOG(ERR, "Number of HW actions (%u) exceeded maximum (%u) allowed in template",
+		curr_off, MLX5_HW_MAX_ACTS);
+	return NULL;
+}
+
 /**
  * Create flow action template.
  *
@@ -2851,7 +3116,8 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 			struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	int len, act_len, mask_len, i;
+	int len, act_num, act_len, mask_len;
+	unsigned int i;
 	struct rte_flow_actions_template *at = NULL;
 	uint16_t pos = MLX5_HW_MAX_ACTS;
 	struct rte_flow_action tmp_action[MLX5_HW_MAX_ACTS];
@@ -2921,6 +3187,11 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 	if (mask_len <= 0)
 		return NULL;
 	len += RTE_ALIGN(mask_len, 16);
+	/* Count flow actions to allocate required space for storing DR offsets. */
+	act_num = 0;
+	for (i = 0; ra[i].type != RTE_FLOW_ACTION_TYPE_END; ++i)
+		act_num++;
+	len += RTE_ALIGN(act_num * sizeof(*at->actions_off), 16);
 	at = mlx5_malloc(MLX5_MEM_ZERO, len + sizeof(*at),
 			 RTE_CACHE_LINE_SIZE, rte_socket_id());
 	if (!at) {
@@ -2930,19 +3201,26 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 				   "cannot allocate action template");
 		return NULL;
 	}
-	/* Actions part is in the first half. */
+	/* Actions part is in the first part. */
 	at->attr = *attr;
 	at->actions = (struct rte_flow_action *)(at + 1);
 	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, at->actions,
 				len, ra, error);
 	if (act_len <= 0)
 		goto error;
-	/* Masks part is in the second half. */
+	/* Masks part is in the second part. */
 	at->masks = (struct rte_flow_action *)(((uint8_t *)at->actions) + act_len);
 	mask_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, at->masks,
 				 len - act_len, rm, error);
 	if (mask_len <= 0)
 		goto error;
+	/* DR actions offsets in the third part. */
+	at->actions_off = (uint16_t *)((uint8_t *)at->masks + mask_len);
+	at->actions_num = act_num;
+	for (i = 0; i < at->actions_num; ++i)
+		at->actions_off[i] = UINT16_MAX;
+	at->reformat_off = UINT16_MAX;
+	at->mhdr_off = UINT16_MAX;
 	at->rx_cpy_pos = pos;
 	/*
 	 * mlx5 PMD hacks indirect action index directly to the action conf.
@@ -2956,12 +3234,18 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 			at->masks[i].conf = masks->conf;
 		}
 	}
+	at->tmpl = flow_hw_dr_actions_template_create(at);
+	if (!at->tmpl)
+		goto error;
 	__atomic_fetch_add(&at->refcnt, 1, __ATOMIC_RELAXED);
 	LIST_INSERT_HEAD(&priv->flow_hw_at, at, next);
 	return at;
 error:
-	if (at)
+	if (at) {
+		if (at->tmpl)
+			mlx5dr_action_template_destroy(at->tmpl);
 		mlx5_free(at);
+	}
 	return NULL;
 }
 
@@ -2992,6 +3276,8 @@ flow_hw_actions_template_destroy(struct rte_eth_dev *dev __rte_unused,
 				   "action template in using");
 	}
 	LIST_REMOVE(template, next);
+	if (template->tmpl)
+		mlx5dr_action_template_destroy(template->tmpl);
 	mlx5_free(template);
 	return 0;
 }
@@ -3042,11 +3328,48 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 			 const struct rte_flow_item items[],
 			 struct rte_flow_error *error)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
 	int i;
 	bool items_end = false;
-	RTE_SET_USED(dev);
-	RTE_SET_USED(attr);
 
+	if (!attr->ingress && !attr->egress && !attr->transfer)
+		return rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ATTR, NULL,
+					  "at least one of the direction attributes"
+					  " must be specified");
+	if (priv->sh->config.dv_esw_en) {
+		MLX5_ASSERT(priv->master || priv->representor);
+		if (priv->master) {
+			/*
+			 * It is allowed to specify ingress, egress and transfer attributes
+			 * at the same time, in order to construct flows catching all missed
+			 * FDB traffic and forwarding it to the master port.
+			 */
+			if (!(attr->ingress ^ attr->egress ^ attr->transfer))
+				return rte_flow_error_set(error, EINVAL,
+							  RTE_FLOW_ERROR_TYPE_ATTR, NULL,
+							  "only one or all direction attributes"
+							  " at once can be used on transfer proxy"
+							  " port");
+		} else {
+			if (attr->transfer)
+				return rte_flow_error_set(error, EINVAL,
+							  RTE_FLOW_ERROR_TYPE_ATTR_TRANSFER, NULL,
+							  "transfer attribute cannot be used with"
+							  " port representors");
+			if (attr->ingress && attr->egress)
+				return rte_flow_error_set(error, EINVAL,
+							  RTE_FLOW_ERROR_TYPE_ATTR, NULL,
+							  "ingress and egress direction attributes"
+							  " cannot be used at the same time on"
+							  " port representors");
+		}
+	} else {
+		if (attr->transfer)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ATTR_TRANSFER, NULL,
+						  "transfer attribute cannot be used when"
+						  " E-Switch is disabled");
+	}
 	for (i = 0; !items_end; i++) {
 		int type = items[i].type;
 
@@ -3069,7 +3392,6 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 		{
 			const struct rte_flow_item_tag *tag =
 				(const struct rte_flow_item_tag *)items[i].spec;
-			struct mlx5_priv *priv = dev->data->dev_private;
 			uint8_t regcs = (uint8_t)priv->sh->cdev->config.hca_attr.set_reg_c;
 
 			if (!((1 << (tag->index - REG_C_0)) & regcs))
@@ -3077,7 +3399,26 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 							  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 							  NULL,
 							  "Unsupported internal tag index");
+			break;
 		}
+		case RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT:
+			if (attr->ingress || attr->egress)
+				return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ITEM, NULL,
+						  "represented port item cannot be used"
+						  " when ingress or egress attribute is set");
+			break;
+		case RTE_FLOW_ITEM_TYPE_META:
+			if (!priv->sh->config.dv_esw_en ||
+			    priv->sh->config.dv_xmeta_en != MLX5_XMETA_MODE_META32_HWS) {
+				if (attr->ingress)
+					return rte_flow_error_set(error, EINVAL,
+								  RTE_FLOW_ERROR_TYPE_ITEM, NULL,
+								  "META item is not supported"
+								  " on current FW with ingress"
+								  " attribute");
+			}
+			break;
 		case RTE_FLOW_ITEM_TYPE_VOID:
 		case RTE_FLOW_ITEM_TYPE_ETH:
 		case RTE_FLOW_ITEM_TYPE_VLAN:
@@ -3087,10 +3428,8 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ITEM_TYPE_TCP:
 		case RTE_FLOW_ITEM_TYPE_GTP:
 		case RTE_FLOW_ITEM_TYPE_GTP_PSC:
-		case RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT:
 		case RTE_FLOW_ITEM_TYPE_VXLAN:
 		case MLX5_RTE_FLOW_ITEM_TYPE_SQ:
-		case RTE_FLOW_ITEM_TYPE_META:
 		case RTE_FLOW_ITEM_TYPE_GRE:
 		case RTE_FLOW_ITEM_TYPE_GRE_KEY:
 		case RTE_FLOW_ITEM_TYPE_GRE_OPTION:
@@ -3138,21 +3477,7 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 
 	if (flow_hw_pattern_validate(dev, attr, items, error))
 		return NULL;
-	if (priv->sh->config.dv_esw_en && attr->ingress) {
-		/*
-		 * Disallow pattern template with ingress and egress/transfer
-		 * attributes in order to forbid implicit port matching
-		 * on egress and transfer traffic.
-		 */
-		if (attr->egress || attr->transfer) {
-			rte_flow_error_set(error, EINVAL,
-					   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					   NULL,
-					   "item template for ingress traffic"
-					   " cannot be used for egress/transfer"
-					   " traffic when E-Switch is enabled");
-			return NULL;
-		}
+	if (priv->sh->config.dv_esw_en && attr->ingress && !attr->egress && !attr->transfer) {
 		copied_items = flow_hw_copy_prepend_port_item(items, error);
 		if (!copied_items)
 			return NULL;
@@ -4536,6 +4861,10 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 		return;
 	flow_hw_rxq_flag_set(dev, false);
 	flow_hw_flush_all_ctrl_flows(dev);
+	while (!LIST_EMPTY(&priv->flow_hw_tbl_ongo)) {
+		tbl = LIST_FIRST(&priv->flow_hw_tbl_ongo);
+		flow_hw_table_destroy(dev, tbl, NULL);
+	}
 	while (!LIST_EMPTY(&priv->flow_hw_tbl)) {
 		tbl = LIST_FIRST(&priv->flow_hw_tbl);
 		flow_hw_table_destroy(dev, tbl, NULL);
@@ -4673,6 +5002,54 @@ void flow_hw_clear_tags_set(struct rte_eth_dev *dev)
 		       sizeof(enum modify_reg) * MLX5_FLOW_HW_TAGS_MAX);
 }
 
+uint32_t mlx5_flow_hw_flow_metadata_config_refcnt;
+uint8_t mlx5_flow_hw_flow_metadata_esw_en;
+uint8_t mlx5_flow_hw_flow_metadata_xmeta_en;
+
+/**
+ * Initializes static configuration of META flow items.
+ *
+ * As a temporary workaround, META flow item is translated to a register,
+ * based on statically saved dv_esw_en and dv_xmeta_en device arguments.
+ * It is a workaround for flow_hw_get_reg_id() where port specific information
+ * is not available at runtime.
+ *
+ * Values of dv_esw_en and dv_xmeta_en device arguments are taken from the first opened port.
+ * This means that each mlx5 port will use the same configuration for translation
+ * of META flow items.
+ *
+ * @param[in] dev
+ *    Pointer to Ethernet device.
+ */
+void
+flow_hw_init_flow_metadata_config(struct rte_eth_dev *dev)
+{
+	uint32_t refcnt;
+
+	refcnt = __atomic_fetch_add(&mlx5_flow_hw_flow_metadata_config_refcnt, 1,
+				    __ATOMIC_RELAXED);
+	if (refcnt > 0)
+		return;
+	mlx5_flow_hw_flow_metadata_esw_en = MLX5_SH(dev)->config.dv_esw_en;
+	mlx5_flow_hw_flow_metadata_xmeta_en = MLX5_SH(dev)->config.dv_xmeta_en;
+}
+
+/**
+ * Clears statically stored configuration related to META flow items.
+ */
+void
+flow_hw_clear_flow_metadata_config(void)
+{
+	uint32_t refcnt;
+
+	refcnt = __atomic_sub_fetch(&mlx5_flow_hw_flow_metadata_config_refcnt, 1,
+				    __ATOMIC_RELAXED);
+	if (refcnt > 0)
+		return;
+	mlx5_flow_hw_flow_metadata_esw_en = 0;
+	mlx5_flow_hw_flow_metadata_xmeta_en = 0;
+}
+
 /**
  * Create shared action.
  *
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index cccec08d70..c260c81e57 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -1170,6 +1170,16 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 			dev->data->port_id, rte_strerror(rte_errno));
 		goto error;
 	}
+#ifdef HAVE_MLX5_HWS_SUPPORT
+	if (priv->sh->config.dv_flow_en == 2) {
+		ret = flow_hw_table_update(dev, NULL);
+		if (ret) {
+			DRV_LOG(ERR, "port %u failed to update HWS tables",
+				dev->data->port_id);
+			goto error;
+		}
+	}
+#endif
 	ret = mlx5_traffic_enable(dev);
 	if (ret) {
 		DRV_LOG(ERR, "port %u failed to set defaults flows",
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v4 10/18] net/mlx5: add HW steering connection tracking support
  2022-10-19 16:25 ` [PATCH v4 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (8 preceding siblings ...)
  2022-10-19 16:25   ` [PATCH v4 09/18] net/mlx5: support DR action template API Suanming Mou
@ 2022-10-19 16:25   ` Suanming Mou
  2022-10-19 16:25   ` [PATCH v4 11/18] net/mlx5: add HW steering VLAN push, pop and VID modify flow actions Suanming Mou
                     ` (7 subsequent siblings)
  17 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-10-19 16:25 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika

This commit adds connection tracking support to HW steering, matching
what SW steering already provides.

Unlike the SW steering implementation, HW steering takes advantage of
bulk action allocation, so only one single CT pool is needed.

An indexed pool is introduced to record the actions allocated from the
bulk and the CT action state, etc. Whenever a CT action is allocated
from the bulk, an indexed object is allocated from the indexed pool as
well, and the same applies to deallocation. This allows the
mlx5_aso_ct_action objects to be managed by the indexed pool, so they
no longer need to be reserved from mlx5_aso_ct_pool. The single CT pool
is also saved directly in the mlx5_aso_ct_action struct.

The ASO operation functions are shared with the SW steering
implementation.
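
On the HWS data path, each flow queue gets its own ASO SQ, while
synchronous (non-queue) operations fall back to a single shared,
lock-protected SQ. A minimal sketch of that selection, reusing the
names introduced by this patch (the helper itself is illustrative
only, not the exact driver code):

static struct mlx5_aso_sq *
ct_pick_sq(struct mlx5_aso_ct_pool *pool, uint32_t queue)
{
	/* MLX5_HW_INV_QUEUE marks a synchronous call: use the shared,
	 * spinlock-protected SQ. Any other queue index owns a dedicated
	 * SQ and can post CT WQEs without locking.
	 */
	if (queue == MLX5_HW_INV_QUEUE)
		return pool->shared_sq;
	return &pool->sq[queue];
}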

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_os.c |   8 +-
 drivers/net/mlx5/mlx5.c          |   3 +-
 drivers/net/mlx5/mlx5.h          |  54 ++++-
 drivers/net/mlx5/mlx5_flow.c     |   1 +
 drivers/net/mlx5/mlx5_flow.h     |   7 +
 drivers/net/mlx5/mlx5_flow_aso.c | 212 +++++++++++++----
 drivers/net/mlx5/mlx5_flow_dv.c  |  28 ++-
 drivers/net/mlx5/mlx5_flow_hw.c  | 381 ++++++++++++++++++++++++++++++-
 8 files changed, 617 insertions(+), 77 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 78cc44fae8..55801682cc 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1349,9 +1349,11 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 			DRV_LOG(DEBUG, "Flow Hit ASO is supported.");
 		}
 #endif /* HAVE_MLX5_DR_CREATE_ACTION_ASO */
-#if defined(HAVE_MLX5_DR_CREATE_ACTION_ASO) && \
-	defined(HAVE_MLX5_DR_ACTION_ASO_CT)
-		if (hca_attr->ct_offload && priv->mtr_color_reg == REG_C_3) {
+#if defined (HAVE_MLX5_DR_CREATE_ACTION_ASO) && \
+    defined (HAVE_MLX5_DR_ACTION_ASO_CT)
+		/* HWS creates the CT ASO SQ based on the HWS configured queue number. */
+		if (sh->config.dv_flow_en != 2 &&
+		    hca_attr->ct_offload && priv->mtr_color_reg == REG_C_3) {
 			err = mlx5_flow_aso_ct_mng_init(sh);
 			if (err) {
 				err = -err;
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index e7a4aac354..6490ac636c 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -755,7 +755,8 @@ mlx5_flow_aso_ct_mng_init(struct mlx5_dev_ctx_shared *sh)
 
 	if (sh->ct_mng)
 		return 0;
-	sh->ct_mng = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*sh->ct_mng),
+	sh->ct_mng = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*sh->ct_mng) +
+				 sizeof(struct mlx5_aso_sq) * MLX5_ASO_CT_SQ_NUM,
 				 RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
 	if (!sh->ct_mng) {
 		DRV_LOG(ERR, "ASO CT management allocation failed.");
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index b8663e0322..9c080e5eac 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -44,6 +44,8 @@
 
 #define MLX5_SH(dev) (((struct mlx5_priv *)(dev)->data->dev_private)->sh)
 
+#define MLX5_HW_INV_QUEUE UINT32_MAX
+
 /*
  * Number of modification commands.
  * The maximal actions amount in FW is some constant, and it is 16 in the
@@ -1164,7 +1166,12 @@ enum mlx5_aso_ct_state {
 
 /* Generic ASO connection tracking structure. */
 struct mlx5_aso_ct_action {
-	LIST_ENTRY(mlx5_aso_ct_action) next; /* Pointer to the next ASO CT. */
+	union {
+		LIST_ENTRY(mlx5_aso_ct_action) next;
+		/* Pointer to the next ASO CT. Used only in SWS. */
+		struct mlx5_aso_ct_pool *pool;
+		/* Pointer to action pool. Used only in HWS. */
+	};
 	void *dr_action_orig; /* General action object for original dir. */
 	void *dr_action_rply; /* General action object for reply dir. */
 	uint32_t refcnt; /* Action used count in device flows. */
@@ -1178,28 +1185,48 @@ struct mlx5_aso_ct_action {
 #define MLX5_ASO_CT_UPDATE_STATE(c, s) \
 	__atomic_store_n(&((c)->state), (s), __ATOMIC_RELAXED)
 
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+
 /* ASO connection tracking software pool definition. */
 struct mlx5_aso_ct_pool {
 	uint16_t index; /* Pool index in pools array. */
+	/* Free ASO CT index in the pool. Used by HWS. */
+	struct mlx5_indexed_pool *cts;
 	struct mlx5_devx_obj *devx_obj;
-	/* The first devx object in the bulk, used for freeing (not yet). */
-	struct mlx5_aso_ct_action actions[MLX5_ASO_CT_ACTIONS_PER_POOL];
+	union {
+		void *dummy_action;
+		/* Dummy action to increase the reference count in the driver. */
+		struct mlx5dr_action *dr_action;
+		/* HWS action. */
+	};
+	struct mlx5_aso_sq *sq; /* Async ASO SQ. */
+	struct mlx5_aso_sq *shared_sq; /* Shared ASO SQ. */
+	struct mlx5_aso_ct_action actions[0];
 	/* CT action structures bulk. */
 };
 
 LIST_HEAD(aso_ct_list, mlx5_aso_ct_action);
 
+#define MLX5_ASO_CT_SQ_NUM 16
+
 /* Pools management structure for ASO connection tracking pools. */
 struct mlx5_aso_ct_pools_mng {
 	struct mlx5_aso_ct_pool **pools;
 	uint16_t n; /* Total number of pools. */
 	uint16_t next; /* Number of pools in use, index of next free pool. */
+	uint32_t nb_sq; /* Number of ASO SQ. */
 	rte_spinlock_t ct_sl; /* The ASO CT free list lock. */
 	rte_rwlock_t resize_rwl; /* The ASO CT pool resize lock. */
 	struct aso_ct_list free_cts; /* Free ASO CT objects list. */
-	struct mlx5_aso_sq aso_sq; /* ASO queue objects. */
+	struct mlx5_aso_sq aso_sqs[0]; /* ASO queue objects. */
 };
 
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
 /* LAG attr. */
 struct mlx5_lag {
 	uint8_t tx_remap_affinity[16]; /* The PF port number of affinity */
@@ -1337,8 +1364,7 @@ struct mlx5_dev_ctx_shared {
 	rte_spinlock_t geneve_tlv_opt_sl; /* Lock for geneve tlv resource */
 	struct mlx5_flow_mtr_mng *mtrmng;
 	/* Meter management structure. */
-	struct mlx5_aso_ct_pools_mng *ct_mng;
-	/* Management data for ASO connection tracking. */
+	struct mlx5_aso_ct_pools_mng *ct_mng; /* Management data for ASO CT in HWS only. */
 	struct mlx5_lb_ctx self_lb; /* QP to enable self loopback for Devx. */
 	unsigned int flow_max_priority;
 	enum modify_reg flow_mreg_c[MLX5_MREG_C_NUM];
@@ -1654,6 +1680,9 @@ struct mlx5_priv {
 	/* HW steering create ongoing rte flow table list header. */
 	LIST_HEAD(flow_hw_tbl_ongo, rte_flow_template_table) flow_hw_tbl_ongo;
 	struct mlx5_indexed_pool *acts_ipool; /* Action data indexed pool. */
+	struct mlx5_aso_ct_pools_mng *ct_mng;
+	/* Management data for ASO connection tracking. */
+	struct mlx5_aso_ct_pool *hws_ctpool; /* HW steering's CT pool. */
 #endif
 };
 
@@ -2053,15 +2082,15 @@ int mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
 		struct mlx5_aso_mtr *mtr, struct mlx5_mtr_bulk *bulk);
 int mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh,
 		struct mlx5_aso_mtr *mtr);
-int mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
+int mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			      struct mlx5_aso_ct_action *ct,
 			      const struct rte_flow_action_conntrack *profile);
-int mlx5_aso_ct_wait_ready(struct mlx5_dev_ctx_shared *sh,
+int mlx5_aso_ct_wait_ready(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			   struct mlx5_aso_ct_action *ct);
-int mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh,
+int mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			     struct mlx5_aso_ct_action *ct,
 			     struct rte_flow_action_conntrack *profile);
-int mlx5_aso_ct_available(struct mlx5_dev_ctx_shared *sh,
+int mlx5_aso_ct_available(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			  struct mlx5_aso_ct_action *ct);
 uint32_t
 mlx5_get_supported_sw_parsing_offloads(const struct mlx5_hca_attr *attr);
@@ -2072,6 +2101,11 @@ int mlx5_aso_cnt_queue_init(struct mlx5_dev_ctx_shared *sh);
 void mlx5_aso_cnt_queue_uninit(struct mlx5_dev_ctx_shared *sh);
 int mlx5_aso_cnt_query(struct mlx5_dev_ctx_shared *sh,
 		struct mlx5_hws_cnt_pool *cpool);
+int mlx5_aso_ct_queue_init(struct mlx5_dev_ctx_shared *sh,
+			   struct mlx5_aso_ct_pools_mng *ct_mng,
+			   uint32_t nb_queues);
+int mlx5_aso_ct_queue_uninit(struct mlx5_dev_ctx_shared *sh,
+			     struct mlx5_aso_ct_pools_mng *ct_mng);
 
 /* mlx5_flow_flex.c */
 
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 38932fe9d7..7c3295609d 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -49,6 +49,7 @@ struct flow_hw_port_info mlx5_flow_hw_port_infos[RTE_MAX_ETHPORTS];
  */
 uint32_t mlx5_flow_hw_avl_tags_init_cnt;
 enum modify_reg mlx5_flow_hw_avl_tags[MLX5_FLOW_HW_TAGS_MAX] = {REG_NON};
+enum modify_reg mlx5_flow_hw_aso_tag;
 
 struct tunnel_default_miss_ctx {
 	uint16_t *queue;
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 210cc9ae3e..7e90eac2d0 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -82,6 +82,10 @@ enum {
 #define MLX5_INDIRECT_ACT_CT_GET_IDX(index) \
 	((index) & ((1 << MLX5_INDIRECT_ACT_CT_OWNER_SHIFT) - 1))
 
+#define MLX5_ACTION_CTX_CT_GET_IDX  MLX5_INDIRECT_ACT_CT_GET_IDX
+#define MLX5_ACTION_CTX_CT_GET_OWNER MLX5_INDIRECT_ACT_CT_GET_OWNER
+#define MLX5_ACTION_CTX_CT_GEN_IDX MLX5_INDIRECT_ACT_CT_GEN_IDX
+
 /* Matches on selected register. */
 struct mlx5_rte_flow_item_tag {
 	enum modify_reg id;
@@ -1455,6 +1459,7 @@ extern struct flow_hw_port_info mlx5_flow_hw_port_infos[RTE_MAX_ETHPORTS];
 #define MLX5_FLOW_HW_TAGS_MAX 8
 extern uint32_t mlx5_flow_hw_avl_tags_init_cnt;
 extern enum modify_reg mlx5_flow_hw_avl_tags[];
+extern enum modify_reg mlx5_flow_hw_aso_tag;
 
 /*
  * Get metadata match tag and mask for given rte_eth_dev port.
@@ -1531,6 +1536,8 @@ flow_hw_get_reg_id(enum rte_flow_item_type type, uint32_t id)
 		 * REG_B case should be rejected on pattern template validation.
 		 */
 		return REG_A;
+	case RTE_FLOW_ITEM_TYPE_CONNTRACK:
+		return mlx5_flow_hw_aso_tag;
 	case RTE_FLOW_ITEM_TYPE_TAG:
 		MLX5_ASSERT(id < MLX5_FLOW_HW_TAGS_MAX);
 		return mlx5_flow_hw_avl_tags[id];
diff --git a/drivers/net/mlx5/mlx5_flow_aso.c b/drivers/net/mlx5/mlx5_flow_aso.c
index ed9272e583..c00c07b891 100644
--- a/drivers/net/mlx5/mlx5_flow_aso.c
+++ b/drivers/net/mlx5/mlx5_flow_aso.c
@@ -313,16 +313,8 @@ mlx5_aso_queue_init(struct mlx5_dev_ctx_shared *sh,
 		mlx5_aso_mtr_init_sq(&sh->mtrmng->pools_mng.sq);
 		break;
 	case ASO_OPC_MOD_CONNECTION_TRACKING:
-		/* 64B per object for query. */
-		if (mlx5_aso_reg_mr(cdev, 64 * sq_desc_n,
-				    &sh->ct_mng->aso_sq.mr))
+		if (mlx5_aso_ct_queue_init(sh, sh->ct_mng, MLX5_ASO_CT_SQ_NUM))
 			return -1;
-		if (mlx5_aso_sq_create(cdev, &sh->ct_mng->aso_sq,
-				       sh->tx_uar.obj, MLX5_ASO_QUEUE_LOG_DESC)) {
-			mlx5_aso_dereg_mr(cdev, &sh->ct_mng->aso_sq.mr);
-			return -1;
-		}
-		mlx5_aso_ct_init_sq(&sh->ct_mng->aso_sq);
 		break;
 	default:
 		DRV_LOG(ERR, "Unknown ASO operation mode");
@@ -343,7 +335,7 @@ void
 mlx5_aso_queue_uninit(struct mlx5_dev_ctx_shared *sh,
 		      enum mlx5_access_aso_opc_mod aso_opc_mod)
 {
-	struct mlx5_aso_sq *sq;
+	struct mlx5_aso_sq *sq = NULL;
 
 	switch (aso_opc_mod) {
 	case ASO_OPC_MOD_FLOW_HIT:
@@ -354,14 +346,14 @@ mlx5_aso_queue_uninit(struct mlx5_dev_ctx_shared *sh,
 		sq = &sh->mtrmng->pools_mng.sq;
 		break;
 	case ASO_OPC_MOD_CONNECTION_TRACKING:
-		mlx5_aso_dereg_mr(sh->cdev, &sh->ct_mng->aso_sq.mr);
-		sq = &sh->ct_mng->aso_sq;
+		mlx5_aso_ct_queue_uninit(sh, sh->ct_mng);
 		break;
 	default:
 		DRV_LOG(ERR, "Unknown ASO operation mode");
 		return;
 	}
-	mlx5_aso_destroy_sq(sq);
+	if (sq)
+		mlx5_aso_destroy_sq(sq);
 }
 
 /**
@@ -903,6 +895,89 @@ mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh,
 	return -1;
 }
 
+static inline struct mlx5_aso_sq*
+__mlx5_aso_ct_get_sq_in_hws(uint32_t queue,
+			    struct mlx5_aso_ct_pool *pool)
+{
+	return (queue == MLX5_HW_INV_QUEUE) ?
+		pool->shared_sq : &pool->sq[queue];
+}
+
+static inline struct mlx5_aso_sq*
+__mlx5_aso_ct_get_sq_in_sws(struct mlx5_dev_ctx_shared *sh,
+			    struct mlx5_aso_ct_action *ct)
+{
+	return &sh->ct_mng->aso_sqs[ct->offset & (MLX5_ASO_CT_SQ_NUM - 1)];
+}
+
+static inline struct mlx5_aso_ct_pool*
+__mlx5_aso_ct_get_pool(struct mlx5_dev_ctx_shared *sh,
+		       struct mlx5_aso_ct_action *ct)
+{
+	if (likely(sh->config.dv_flow_en == 2))
+		return ct->pool;
+	return container_of(ct, struct mlx5_aso_ct_pool, actions[ct->offset]);
+}
+
+int
+mlx5_aso_ct_queue_uninit(struct mlx5_dev_ctx_shared *sh,
+			 struct mlx5_aso_ct_pools_mng *ct_mng)
+{
+	uint32_t i;
+
+	/* 64B per object for query. */
+	for (i = 0; i < ct_mng->nb_sq; i++) {
+		if (ct_mng->aso_sqs[i].mr.addr)
+			mlx5_aso_dereg_mr(sh->cdev, &ct_mng->aso_sqs[i].mr);
+		mlx5_aso_destroy_sq(&ct_mng->aso_sqs[i]);
+	}
+	return 0;
+}
+
+/**
+ * API to create and initialize CT Send Queue used for ASO access.
+ *
+ * @param[in] sh
+ *   Pointer to shared device context.
+ * @param[in] ct_mng
+ *   Pointer to the CT management struct.
+ * @param[in] nb_queues
+ *   Number of queues to be allocated.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_aso_ct_queue_init(struct mlx5_dev_ctx_shared *sh,
+		       struct mlx5_aso_ct_pools_mng *ct_mng,
+		       uint32_t nb_queues)
+{
+	uint32_t i;
+
+	/* 64B per object for query. */
+	for (i = 0; i < nb_queues; i++) {
+		if (mlx5_aso_reg_mr(sh->cdev, 64 * (1 << MLX5_ASO_QUEUE_LOG_DESC),
+				    &ct_mng->aso_sqs[i].mr))
+			goto error;
+		if (mlx5_aso_sq_create(sh->cdev, &ct_mng->aso_sqs[i],
+				       sh->tx_uar.obj,
+				       MLX5_ASO_QUEUE_LOG_DESC))
+			goto error;
+		mlx5_aso_ct_init_sq(&ct_mng->aso_sqs[i]);
+	}
+	ct_mng->nb_sq = nb_queues;
+	return 0;
+error:
+	do {
+		if (ct_mng->aso_sqs[i].mr.addr)
+			mlx5_aso_dereg_mr(sh->cdev, &ct_mng->aso_sqs[i].mr);
+		if (&ct_mng->aso_sqs[i])
+			mlx5_aso_destroy_sq(&ct_mng->aso_sqs[i]);
+	} while (i--);
+	ct_mng->nb_sq = 0;
+	return -1;
+}
+
 /*
  * Post a WQE to the ASO CT SQ to modify the context.
  *
@@ -918,11 +993,12 @@ mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh,
  */
 static uint16_t
 mlx5_aso_ct_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
+			      struct mlx5_aso_sq *sq,
 			      struct mlx5_aso_ct_action *ct,
-			      const struct rte_flow_action_conntrack *profile)
+			      const struct rte_flow_action_conntrack *profile,
+			      bool need_lock)
 {
 	volatile struct mlx5_aso_wqe *wqe = NULL;
-	struct mlx5_aso_sq *sq = &sh->ct_mng->aso_sq;
 	uint16_t size = 1 << sq->log_desc_n;
 	uint16_t mask = size - 1;
 	uint16_t res;
@@ -931,11 +1007,13 @@ mlx5_aso_ct_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	void *orig_dir;
 	void *reply_dir;
 
-	rte_spinlock_lock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_lock(&sq->sqsl);
 	/* Prevent other threads to update the index. */
 	res = size - (uint16_t)(sq->head - sq->tail);
 	if (unlikely(!res)) {
-		rte_spinlock_unlock(&sq->sqsl);
+		if (need_lock)
+			rte_spinlock_unlock(&sq->sqsl);
 		DRV_LOG(ERR, "Fail: SQ is full and no free WQE to send");
 		return 0;
 	}
@@ -945,7 +1023,7 @@ mlx5_aso_ct_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	MLX5_ASO_CT_UPDATE_STATE(ct, ASO_CONNTRACK_WAIT);
 	sq->elts[sq->head & mask].ct = ct;
 	sq->elts[sq->head & mask].query_data = NULL;
-	pool = container_of(ct, struct mlx5_aso_ct_pool, actions[ct->offset]);
+	pool = __mlx5_aso_ct_get_pool(sh, ct);
 	/* Each WQE will have a single CT object. */
 	wqe->general_cseg.misc = rte_cpu_to_be_32(pool->devx_obj->id +
 						  ct->offset);
@@ -1028,7 +1106,8 @@ mlx5_aso_ct_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
 			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
 			   !sh->tx_uar.dbnc);
-	rte_spinlock_unlock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_unlock(&sq->sqsl);
 	return 1;
 }
 
@@ -1080,10 +1159,11 @@ mlx5_aso_ct_status_update(struct mlx5_aso_sq *sq, uint16_t num)
  */
 static int
 mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
-			    struct mlx5_aso_ct_action *ct, char *data)
+			    struct mlx5_aso_sq *sq,
+			    struct mlx5_aso_ct_action *ct, char *data,
+			    bool need_lock)
 {
 	volatile struct mlx5_aso_wqe *wqe = NULL;
-	struct mlx5_aso_sq *sq = &sh->ct_mng->aso_sq;
 	uint16_t size = 1 << sq->log_desc_n;
 	uint16_t mask = size - 1;
 	uint16_t res;
@@ -1098,10 +1178,12 @@ mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
 	} else if (state == ASO_CONNTRACK_WAIT) {
 		return 0;
 	}
-	rte_spinlock_lock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_lock(&sq->sqsl);
 	res = size - (uint16_t)(sq->head - sq->tail);
 	if (unlikely(!res)) {
-		rte_spinlock_unlock(&sq->sqsl);
+		if (need_lock)
+			rte_spinlock_unlock(&sq->sqsl);
 		DRV_LOG(ERR, "Fail: SQ is full and no free WQE to send");
 		return 0;
 	}
@@ -1113,7 +1195,7 @@ mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
 	wqe_idx = sq->head & mask;
 	sq->elts[wqe_idx].ct = ct;
 	sq->elts[wqe_idx].query_data = data;
-	pool = container_of(ct, struct mlx5_aso_ct_pool, actions[ct->offset]);
+	pool = __mlx5_aso_ct_get_pool(sh, ct);
 	/* Each WQE will have a single CT object. */
 	wqe->general_cseg.misc = rte_cpu_to_be_32(pool->devx_obj->id +
 						  ct->offset);
@@ -1141,7 +1223,8 @@ mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
 	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
 			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
 			   !sh->tx_uar.dbnc);
-	rte_spinlock_unlock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_unlock(&sq->sqsl);
 	return 1;
 }
 
@@ -1152,9 +1235,10 @@ mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
  *   Pointer to the CT pools management structure.
  */
 static void
-mlx5_aso_ct_completion_handle(struct mlx5_aso_ct_pools_mng *mng)
+mlx5_aso_ct_completion_handle(struct mlx5_dev_ctx_shared *sh __rte_unused,
+			      struct mlx5_aso_sq *sq,
+			      bool need_lock)
 {
-	struct mlx5_aso_sq *sq = &mng->aso_sq;
 	struct mlx5_aso_cq *cq = &sq->cq;
 	volatile struct mlx5_cqe *restrict cqe;
 	const uint32_t cq_size = 1 << cq->log_desc_n;
@@ -1165,10 +1249,12 @@ mlx5_aso_ct_completion_handle(struct mlx5_aso_ct_pools_mng *mng)
 	uint16_t n = 0;
 	int ret;
 
-	rte_spinlock_lock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_lock(&sq->sqsl);
 	max = (uint16_t)(sq->head - sq->tail);
 	if (unlikely(!max)) {
-		rte_spinlock_unlock(&sq->sqsl);
+		if (need_lock)
+			rte_spinlock_unlock(&sq->sqsl);
 		return;
 	}
 	next_idx = cq->cq_ci & mask;
@@ -1199,7 +1285,8 @@ mlx5_aso_ct_completion_handle(struct mlx5_aso_ct_pools_mng *mng)
 		rte_io_wmb();
 		cq->cq_obj.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci);
 	}
-	rte_spinlock_unlock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_unlock(&sq->sqsl);
 }
 
 /*
@@ -1207,6 +1294,8 @@ mlx5_aso_ct_completion_handle(struct mlx5_aso_ct_pools_mng *mng)
  *
  * @param[in] sh
  *   Pointer to mlx5_dev_ctx_shared object.
+ * @param[in] queue
+ *   The queue index.
  * @param[in] ct
  *   Pointer to connection tracking offload object.
  * @param[in] profile
@@ -1217,21 +1306,26 @@ mlx5_aso_ct_completion_handle(struct mlx5_aso_ct_pools_mng *mng)
  */
 int
 mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
+			  uint32_t queue,
 			  struct mlx5_aso_ct_action *ct,
 			  const struct rte_flow_action_conntrack *profile)
 {
 	uint32_t poll_wqe_times = MLX5_CT_POLL_WQE_CQE_TIMES;
-	struct mlx5_aso_ct_pool *pool;
+	struct mlx5_aso_ct_pool *pool = __mlx5_aso_ct_get_pool(sh, ct);
+	struct mlx5_aso_sq *sq;
+	bool need_lock = !!(queue == MLX5_HW_INV_QUEUE);
 
-	MLX5_ASSERT(ct);
+	if (sh->config.dv_flow_en == 2)
+		sq = __mlx5_aso_ct_get_sq_in_hws(queue, pool);
+	else
+		sq = __mlx5_aso_ct_get_sq_in_sws(sh, ct);
 	do {
-		mlx5_aso_ct_completion_handle(sh->ct_mng);
-		if (mlx5_aso_ct_sq_enqueue_single(sh, ct, profile))
+		mlx5_aso_ct_completion_handle(sh, sq, need_lock);
+		if (mlx5_aso_ct_sq_enqueue_single(sh, sq, ct, profile, need_lock))
 			return 0;
 		/* Waiting for wqe resource. */
 		rte_delay_us_sleep(10u);
 	} while (--poll_wqe_times);
-	pool = container_of(ct, struct mlx5_aso_ct_pool, actions[ct->offset]);
 	DRV_LOG(ERR, "Fail to send WQE for ASO CT %d in pool %d",
 		ct->offset, pool->index);
 	return -1;
@@ -1242,6 +1336,8 @@ mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
  *
  * @param[in] sh
  *   Pointer to mlx5_dev_ctx_shared object.
+ * @param[in] queue
+ *   The queue which the CT works on.
  * @param[in] ct
  *   Pointer to connection tracking offload object.
  *
@@ -1249,25 +1345,29 @@ mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
  *   0 on success, -1 on failure.
  */
 int
-mlx5_aso_ct_wait_ready(struct mlx5_dev_ctx_shared *sh,
+mlx5_aso_ct_wait_ready(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 		       struct mlx5_aso_ct_action *ct)
 {
-	struct mlx5_aso_ct_pools_mng *mng = sh->ct_mng;
 	uint32_t poll_cqe_times = MLX5_CT_POLL_WQE_CQE_TIMES;
-	struct mlx5_aso_ct_pool *pool;
+	struct mlx5_aso_ct_pool *pool = __mlx5_aso_ct_get_pool(sh, ct);
+	struct mlx5_aso_sq *sq;
+	bool need_lock = !!(queue == MLX5_HW_INV_QUEUE);
 
+	if (sh->config.dv_flow_en == 2)
+		sq = __mlx5_aso_ct_get_sq_in_hws(queue, pool);
+	else
+		sq = __mlx5_aso_ct_get_sq_in_sws(sh, ct);
 	if (__atomic_load_n(&ct->state, __ATOMIC_RELAXED) ==
 	    ASO_CONNTRACK_READY)
 		return 0;
 	do {
-		mlx5_aso_ct_completion_handle(mng);
+		mlx5_aso_ct_completion_handle(sh, sq, need_lock);
 		if (__atomic_load_n(&ct->state, __ATOMIC_RELAXED) ==
 		    ASO_CONNTRACK_READY)
 			return 0;
 		/* Waiting for CQE ready, consider should block or sleep. */
 		rte_delay_us_sleep(MLX5_ASO_WQE_CQE_RESPONSE_DELAY);
 	} while (--poll_cqe_times);
-	pool = container_of(ct, struct mlx5_aso_ct_pool, actions[ct->offset]);
 	DRV_LOG(ERR, "Fail to poll CQE for ASO CT %d in pool %d",
 		ct->offset, pool->index);
 	return -1;
@@ -1363,18 +1463,24 @@ mlx5_aso_ct_obj_analyze(struct rte_flow_action_conntrack *profile,
  */
 int
 mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh,
+			 uint32_t queue,
 			 struct mlx5_aso_ct_action *ct,
 			 struct rte_flow_action_conntrack *profile)
 {
 	uint32_t poll_wqe_times = MLX5_CT_POLL_WQE_CQE_TIMES;
-	struct mlx5_aso_ct_pool *pool;
+	struct mlx5_aso_ct_pool *pool = __mlx5_aso_ct_get_pool(sh, ct);
+	struct mlx5_aso_sq *sq;
+	bool need_lock = !!(queue == MLX5_HW_INV_QUEUE);
 	char out_data[64 * 2];
 	int ret;
 
-	MLX5_ASSERT(ct);
+	if (sh->config.dv_flow_en == 2)
+		sq = __mlx5_aso_ct_get_sq_in_hws(queue, pool);
+	else
+		sq = __mlx5_aso_ct_get_sq_in_sws(sh, ct);
 	do {
-		mlx5_aso_ct_completion_handle(sh->ct_mng);
-		ret = mlx5_aso_ct_sq_query_single(sh, ct, out_data);
+		mlx5_aso_ct_completion_handle(sh, sq, need_lock);
+		ret = mlx5_aso_ct_sq_query_single(sh, sq, ct, out_data, need_lock);
 		if (ret < 0)
 			return ret;
 		else if (ret > 0)
@@ -1383,12 +1489,11 @@ mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh,
 		else
 			rte_delay_us_sleep(10u);
 	} while (--poll_wqe_times);
-	pool = container_of(ct, struct mlx5_aso_ct_pool, actions[ct->offset]);
 	DRV_LOG(ERR, "Fail to send WQE for ASO CT %d in pool %d",
 		ct->offset, pool->index);
 	return -1;
 data_handle:
-	ret = mlx5_aso_ct_wait_ready(sh, ct);
+	ret = mlx5_aso_ct_wait_ready(sh, queue, ct);
 	if (!ret)
 		mlx5_aso_ct_obj_analyze(profile, out_data);
 	return ret;
@@ -1408,13 +1513,20 @@ mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh,
  */
 int
 mlx5_aso_ct_available(struct mlx5_dev_ctx_shared *sh,
+		      uint32_t queue,
 		      struct mlx5_aso_ct_action *ct)
 {
-	struct mlx5_aso_ct_pools_mng *mng = sh->ct_mng;
+	struct mlx5_aso_ct_pool *pool = __mlx5_aso_ct_get_pool(sh, ct);
+	struct mlx5_aso_sq *sq;
+	bool need_lock = !!(queue == MLX5_HW_INV_QUEUE);
 	uint32_t poll_cqe_times = MLX5_CT_POLL_WQE_CQE_TIMES;
 	enum mlx5_aso_ct_state state =
 				__atomic_load_n(&ct->state, __ATOMIC_RELAXED);
 
+	if (sh->config.dv_flow_en == 2)
+		sq = __mlx5_aso_ct_get_sq_in_hws(queue, pool);
+	else
+		sq = __mlx5_aso_ct_get_sq_in_sws(sh, ct);
 	if (state == ASO_CONNTRACK_FREE) {
 		rte_errno = ENXIO;
 		return -rte_errno;
@@ -1423,13 +1535,13 @@ mlx5_aso_ct_available(struct mlx5_dev_ctx_shared *sh,
 		return 0;
 	}
 	do {
-		mlx5_aso_ct_completion_handle(mng);
+		mlx5_aso_ct_completion_handle(sh, sq, need_lock);
 		state = __atomic_load_n(&ct->state, __ATOMIC_RELAXED);
 		if (state == ASO_CONNTRACK_READY ||
 		    state == ASO_CONNTRACK_QUERY)
 			return 0;
-		/* Waiting for CQE ready, consider should block or sleep. */
-		rte_delay_us_sleep(MLX5_ASO_WQE_CQE_RESPONSE_DELAY);
+		/* Waiting for CQE ready, consider should block or sleep. */
+		rte_delay_us_block(MLX5_ASO_WQE_CQE_RESPONSE_DELAY);
 	} while (--poll_cqe_times);
 	rte_errno = EBUSY;
 	return -rte_errno;
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index a0bcaa5c53..ea13345baf 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -12813,6 +12813,7 @@ flow_dv_ct_pool_create(struct rte_eth_dev *dev,
 	struct mlx5_devx_obj *obj = NULL;
 	uint32_t i;
 	uint32_t log_obj_size = rte_log2_u32(MLX5_ASO_CT_ACTIONS_PER_POOL);
+	size_t mem_size;
 
 	obj = mlx5_devx_cmd_create_conn_track_offload_obj(priv->sh->cdev->ctx,
 							  priv->sh->cdev->pdn,
@@ -12822,7 +12823,10 @@ flow_dv_ct_pool_create(struct rte_eth_dev *dev,
 		DRV_LOG(ERR, "Failed to create conn_track_offload_obj using DevX.");
 		return NULL;
 	}
-	pool = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*pool), 0, SOCKET_ID_ANY);
+	mem_size = sizeof(struct mlx5_aso_ct_action) *
+		   MLX5_ASO_CT_ACTIONS_PER_POOL +
+		   sizeof(*pool);
+	pool = mlx5_malloc(MLX5_MEM_ZERO, mem_size, 0, SOCKET_ID_ANY);
 	if (!pool) {
 		rte_errno = ENOMEM;
 		claim_zero(mlx5_devx_cmd_destroy(obj));
@@ -12962,10 +12966,13 @@ flow_dv_translate_create_conntrack(struct rte_eth_dev *dev,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "Failed to allocate CT object");
 	ct = flow_aso_ct_get_by_dev_idx(dev, idx);
-	if (mlx5_aso_ct_update_by_wqe(sh, ct, pro))
-		return rte_flow_error_set(error, EBUSY,
-					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
-					  "Failed to update CT");
+	if (mlx5_aso_ct_update_by_wqe(sh, MLX5_HW_INV_QUEUE, ct, pro)) {
+		flow_dv_aso_ct_dev_release(dev, idx);
+		rte_flow_error_set(error, EBUSY,
+				   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+				   "Failed to update CT");
+		return 0;
+	}
 	ct->is_original = !!pro->is_original_dir;
 	ct->peer = pro->peer_port;
 	return idx;
@@ -14160,7 +14167,7 @@ flow_dv_translate(struct rte_eth_dev *dev,
 						RTE_FLOW_ERROR_TYPE_ACTION,
 						NULL,
 						"Failed to get CT object.");
-			if (mlx5_aso_ct_available(priv->sh, ct))
+			if (mlx5_aso_ct_available(priv->sh, MLX5_HW_INV_QUEUE, ct))
 				return rte_flow_error_set(error, rte_errno,
 						RTE_FLOW_ERROR_TYPE_ACTION,
 						NULL,
@@ -15768,14 +15775,15 @@ __flow_dv_action_ct_update(struct rte_eth_dev *dev, uint32_t idx,
 		ret = mlx5_validate_action_ct(dev, new_prf, error);
 		if (ret)
 			return ret;
-		ret = mlx5_aso_ct_update_by_wqe(priv->sh, ct, new_prf);
+		ret = mlx5_aso_ct_update_by_wqe(priv->sh, MLX5_HW_INV_QUEUE,
+						ct, new_prf);
 		if (ret)
 			return rte_flow_error_set(error, EIO,
 					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 					NULL,
 					"Failed to send CT context update WQE");
-		/* Block until ready or a failure. */
-		ret = mlx5_aso_ct_available(priv->sh, ct);
+		/* Block until ready or a failure, default is asynchronous. */
+		ret = mlx5_aso_ct_available(priv->sh, MLX5_HW_INV_QUEUE, ct);
 		if (ret)
 			rte_flow_error_set(error, rte_errno,
 					   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
@@ -16604,7 +16612,7 @@ flow_dv_action_query(struct rte_eth_dev *dev,
 							ct->peer;
 		((struct rte_flow_action_conntrack *)data)->is_original_dir =
 							ct->is_original;
-		if (mlx5_aso_ct_query_by_wqe(priv->sh, ct, data))
+		if (mlx5_aso_ct_query_by_wqe(priv->sh, MLX5_HW_INV_QUEUE, ct, data))
 			return rte_flow_error_set(error, EIO,
 					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 					NULL,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 5b7ef1be68..535df6ba5d 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -15,6 +15,14 @@
 /* The maximum actions support in the flow. */
 #define MLX5_HW_MAX_ACTS 16
 
+/*
+ * The default ipool threshold value indicates which per_core_cache
+ * value to set.
+ */
+#define MLX5_HW_IPOOL_SIZE_THRESHOLD (1 << 19)
+/* The default min local cache size. */
+#define MLX5_HW_IPOOL_CACHE_MIN (1 << 9)
+
 /* Default push burst threshold. */
 #define BURST_THR 32u
 
@@ -324,6 +332,25 @@ flow_hw_tir_action_register(struct rte_eth_dev *dev,
 	return hrxq;
 }
 
+static __rte_always_inline int
+flow_hw_ct_compile(struct rte_eth_dev *dev,
+		   uint32_t queue, uint32_t idx,
+		   struct mlx5dr_rule_action *rule_act)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_ct_action *ct;
+
+	ct = mlx5_ipool_get(priv->hws_ctpool->cts, MLX5_ACTION_CTX_CT_GET_IDX(idx));
+	if (!ct || mlx5_aso_ct_available(priv->sh, queue, ct))
+		return -1;
+	rule_act->action = priv->hws_ctpool->dr_action;
+	rule_act->aso_ct.offset = ct->offset;
+	rule_act->aso_ct.direction = ct->is_original ?
+		MLX5DR_ACTION_ASO_CT_DIRECTION_INITIATOR :
+		MLX5DR_ACTION_ASO_CT_DIRECTION_RESPONDER;
+	return 0;
+}
+
 /**
  * Destroy DR actions created by action template.
  *
@@ -640,6 +667,11 @@ flow_hw_shared_action_translate(struct rte_eth_dev *dev,
 			action_src, action_dst, act_idx))
 			return -1;
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_CT:
+		if (flow_hw_ct_compile(dev, MLX5_HW_INV_QUEUE,
+				       idx, &acts->rule_acts[action_dst]))
+			return -1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type:%d", type);
 		break;
@@ -1083,6 +1115,7 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
 	bool reformat_used = false;
 	uint16_t action_pos;
 	uint16_t jump_pos;
+	uint32_t ct_idx;
 	int err;
 
 	flow_hw_modify_field_init(&mhdr, at);
@@ -1305,6 +1338,20 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
 				goto err;
 			}
 			break;
+		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+			action_pos = at->actions_off[actions - at->actions];
+			if (masks->conf) {
+				ct_idx = MLX5_ACTION_CTX_CT_GET_IDX
+					 ((uint32_t)(uintptr_t)actions->conf);
+				if (flow_hw_ct_compile(dev, MLX5_HW_INV_QUEUE, ct_idx,
+						       &acts->rule_acts[action_pos]))
+					goto err;
+			} else if (__flow_hw_act_data_general_append
+					(priv, acts, actions->type,
+					 actions - action_start, action_pos)) {
+				goto err;
+			}
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -1479,6 +1526,8 @@ flow_hw_shared_action_get(struct rte_eth_dev *dev,
  *
  * @param[in] dev
  *   Pointer to the rte_eth_dev data structure.
+ * @param[in] queue
+ *   The flow creation queue index.
  * @param[in] action
  *   Pointer to the shared indirect rte_flow action.
  * @param[in] table
@@ -1492,7 +1541,7 @@ flow_hw_shared_action_get(struct rte_eth_dev *dev,
  *    0 on success, negative value otherwise and rte_errno is set.
  */
 static __rte_always_inline int
-flow_hw_shared_action_construct(struct rte_eth_dev *dev,
+flow_hw_shared_action_construct(struct rte_eth_dev *dev, uint32_t queue,
 				const struct rte_flow_action *action,
 				struct rte_flow_template_table *table,
 				const uint8_t it_idx,
@@ -1532,6 +1581,10 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev,
 				&rule_act->counter.offset))
 			return -1;
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_CT:
+		if (flow_hw_ct_compile(dev, queue, idx, rule_act))
+			return -1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type:%d", type);
 		break;
@@ -1727,6 +1780,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		uint64_t item_flags;
 		struct mlx5_hw_jump_action *jump;
 		struct mlx5_hrxq *hrxq;
+		uint32_t ct_idx;
 		cnt_id_t cnt_id;
 
 		action = &actions[act_data->action_src];
@@ -1735,7 +1789,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		switch (act_data->type) {
 		case RTE_FLOW_ACTION_TYPE_INDIRECT:
 			if (flow_hw_shared_action_construct
-					(dev, action, table, it_idx,
+					(dev, queue, action, table, it_idx,
 					 &rule_acts[act_data->action_dst]))
 				return -1;
 			break;
@@ -1860,6 +1914,13 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 				return ret;
 			job->flow->cnt_id = act_data->shared_counter.id;
 			break;
+		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+			ct_idx = MLX5_ACTION_CTX_CT_GET_IDX
+				 ((uint32_t)(uintptr_t)action->conf);
+			if (flow_hw_ct_compile(dev, queue, ct_idx,
+					       &rule_acts[act_data->action_dst]))
+				return -1;
+			break;
 		default:
 			break;
 		}
@@ -2391,6 +2452,8 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	if (nb_flows < cfg.trunk_size) {
 		cfg.per_core_cache = 0;
 		cfg.trunk_size = nb_flows;
+	} else if (nb_flows <= MLX5_HW_IPOOL_SIZE_THRESHOLD) {
+		cfg.per_core_cache = MLX5_HW_IPOOL_CACHE_MIN;
 	}
 	/* Check if we requires too many templates. */
 	if (nb_item_templates > max_tpl ||
@@ -2927,6 +2990,9 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_COUNT:
 			/* TODO: Validation logic */
 			break;
+		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+			/* TODO: Validation logic */
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -2953,6 +3019,7 @@ static enum mlx5dr_action_type mlx5_hw_dr_action_types[] = {
 	[RTE_FLOW_ACTION_TYPE_MODIFY_FIELD] = MLX5DR_ACTION_TYP_MODIFY_HDR,
 	[RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT] = MLX5DR_ACTION_TYP_VPORT,
 	[RTE_FLOW_ACTION_TYPE_COUNT] = MLX5DR_ACTION_TYP_CTR,
+	[RTE_FLOW_ACTION_TYPE_CONNTRACK] = MLX5DR_ACTION_TYP_ASO_CT,
 };
 
 static int
@@ -2981,6 +3048,11 @@ flow_hw_dr_actions_template_handle_shared(const struct rte_flow_action *mask,
 		action_types[*curr_off] = MLX5DR_ACTION_TYP_CTR;
 		*curr_off = *curr_off + 1;
 		break;
+	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+		at->actions_off[action_src] = *curr_off;
+		action_types[*curr_off] = MLX5DR_ACTION_TYP_ASO_CT;
+		*curr_off = *curr_off + 1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type: %d", type);
 		return -EINVAL;
@@ -3435,6 +3507,7 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ITEM_TYPE_GRE_OPTION:
 		case RTE_FLOW_ITEM_TYPE_ICMP:
 		case RTE_FLOW_ITEM_TYPE_ICMP6:
+		case RTE_FLOW_ITEM_TYPE_CONNTRACK:
 			break;
 		case RTE_FLOW_ITEM_TYPE_END:
 			items_end = true;
@@ -4627,6 +4700,97 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 	return -EINVAL;
 }
 
+static void
+flow_hw_ct_mng_destroy(struct rte_eth_dev *dev,
+		       struct mlx5_aso_ct_pools_mng *ct_mng)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	mlx5_aso_ct_queue_uninit(priv->sh, ct_mng);
+	mlx5_free(ct_mng);
+}
+
+static void
+flow_hw_ct_pool_destroy(struct rte_eth_dev *dev __rte_unused,
+			struct mlx5_aso_ct_pool *pool)
+{
+	if (pool->dr_action)
+		mlx5dr_action_destroy(pool->dr_action);
+	if (pool->devx_obj)
+		claim_zero(mlx5_devx_cmd_destroy(pool->devx_obj));
+	if (pool->cts)
+		mlx5_ipool_destroy(pool->cts);
+	mlx5_free(pool);
+}
+
+static struct mlx5_aso_ct_pool *
+flow_hw_ct_pool_create(struct rte_eth_dev *dev,
+		       const struct rte_flow_port_attr *port_attr)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_ct_pool *pool;
+	struct mlx5_devx_obj *obj;
+	uint32_t nb_cts = rte_align32pow2(port_attr->nb_conn_tracks);
+	uint32_t log_obj_size = rte_log2_u32(nb_cts);
+	struct mlx5_indexed_pool_config cfg = {
+		.size = sizeof(struct mlx5_aso_ct_action),
+		.trunk_size = 1 << 12,
+		.per_core_cache = 1 << 13,
+		.need_lock = 1,
+		.release_mem_en = !!priv->sh->config.reclaim_mode,
+		.malloc = mlx5_malloc,
+		.free = mlx5_free,
+		.type = "mlx5_hw_ct_action",
+	};
+	int reg_id;
+	uint32_t flags;
+
+	pool = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*pool), 0, SOCKET_ID_ANY);
+	if (!pool) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	obj = mlx5_devx_cmd_create_conn_track_offload_obj(priv->sh->cdev->ctx,
+							  priv->sh->cdev->pdn,
+							  log_obj_size);
+	if (!obj) {
+		rte_errno = ENODATA;
+		DRV_LOG(ERR, "Failed to create conn_track_offload_obj using DevX.");
+		goto err;
+	}
+	pool->devx_obj = obj;
+	reg_id = mlx5_flow_get_reg_id(dev, MLX5_ASO_CONNTRACK, 0, NULL);
+	flags = MLX5DR_ACTION_FLAG_HWS_RX | MLX5DR_ACTION_FLAG_HWS_TX;
+	if (priv->sh->config.dv_esw_en && priv->master)
+		flags |= MLX5DR_ACTION_FLAG_HWS_FDB;
+	pool->dr_action = mlx5dr_action_create_aso_ct(priv->dr_ctx,
+						      (struct mlx5dr_devx_obj *)obj,
+						      reg_id - REG_C_0, flags);
+	if (!pool->dr_action)
+		goto err;
+	/*
+	 * No need for a local cache if the CT number is small, since the
+	 * flow insertion rate will be very limited in that case. Keep the
+	 * trunk size below the default of 4K.
+	 */
+	if (nb_cts <= cfg.trunk_size) {
+		cfg.per_core_cache = 0;
+		cfg.trunk_size = nb_cts;
+	} else if (nb_cts <= MLX5_HW_IPOOL_SIZE_THRESHOLD) {
+		cfg.per_core_cache = MLX5_HW_IPOOL_CACHE_MIN;
+	}
+	pool->cts = mlx5_ipool_create(&cfg);
+	if (!pool->cts)
+		goto err;
+	pool->sq = priv->ct_mng->aso_sqs;
+	/* Assign the last extra ASO SQ as public SQ. */
+	pool->shared_sq = &priv->ct_mng->aso_sqs[priv->nb_queue - 1];
+	return pool;
+err:
+	flow_hw_ct_pool_destroy(dev, pool);
+	return NULL;
+}
+
 /**
  * Configure port HWS resources.
  *
@@ -4809,6 +4973,20 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	}
 	if (_queue_attr)
 		mlx5_free(_queue_attr);
+	if (port_attr->nb_conn_tracks) {
+		mem_size = sizeof(struct mlx5_aso_sq) * nb_q_updated +
+			   sizeof(*priv->ct_mng);
+		priv->ct_mng = mlx5_malloc(MLX5_MEM_ZERO, mem_size,
+					   RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
+		if (!priv->ct_mng)
+			goto err;
+		if (mlx5_aso_ct_queue_init(priv->sh, priv->ct_mng, nb_q_updated))
+			goto err;
+		priv->hws_ctpool = flow_hw_ct_pool_create(dev, port_attr);
+		if (!priv->hws_ctpool)
+			goto err;
+		priv->sh->ct_aso_en = 1;
+	}
 	if (port_attr->nb_counters) {
 		priv->hws_cpool = mlx5_hws_cnt_pool_create(dev, port_attr,
 				nb_queue);
@@ -4817,6 +4995,14 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	}
 	return 0;
 err:
+	if (priv->hws_ctpool) {
+		flow_hw_ct_pool_destroy(dev, priv->hws_ctpool);
+		priv->hws_ctpool = NULL;
+	}
+	if (priv->ct_mng) {
+		flow_hw_ct_mng_destroy(dev, priv->ct_mng);
+		priv->ct_mng = NULL;
+	}
 	flow_hw_free_vport_actions(priv);
 	for (i = 0; i < MLX5_HW_ACTION_FLAG_MAX; i++) {
 		if (priv->hw_drop[i])
@@ -4890,6 +5076,14 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 	}
 	if (priv->hws_cpool)
 		mlx5_hws_cnt_pool_destroy(priv->sh, priv->hws_cpool);
+	if (priv->hws_ctpool) {
+		flow_hw_ct_pool_destroy(dev, priv->hws_ctpool);
+		priv->hws_ctpool = NULL;
+	}
+	if (priv->ct_mng) {
+		flow_hw_ct_mng_destroy(dev, priv->ct_mng);
+		priv->ct_mng = NULL;
+	}
 	mlx5_free(priv->hw_q);
 	priv->hw_q = NULL;
 	claim_zero(mlx5dr_context_close(priv->dr_ctx));
@@ -4958,6 +5152,7 @@ void flow_hw_init_tags_set(struct rte_eth_dev *dev)
 		unset |= 1 << (REG_C_1 - REG_C_0);
 	masks &= ~unset;
 	if (mlx5_flow_hw_avl_tags_init_cnt) {
+		MLX5_ASSERT(mlx5_flow_hw_aso_tag == priv->mtr_color_reg);
 		for (i = 0; i < MLX5_FLOW_HW_TAGS_MAX; i++) {
 			if (mlx5_flow_hw_avl_tags[i] != REG_NON && !!((1 << i) & masks)) {
 				copy[mlx5_flow_hw_avl_tags[i] - REG_C_0] =
@@ -4980,6 +5175,7 @@ void flow_hw_init_tags_set(struct rte_eth_dev *dev)
 		}
 	}
 	priv->sh->hws_tags = 1;
+	mlx5_flow_hw_aso_tag = (enum modify_reg)priv->mtr_color_reg;
 	mlx5_flow_hw_avl_tags_init_cnt++;
 }
 
@@ -5050,6 +5246,170 @@ flow_hw_clear_flow_metadata_config(void)
 	mlx5_flow_hw_flow_metadata_xmeta_en = 0;
 }
 
+static int
+flow_hw_conntrack_destroy(struct rte_eth_dev *dev __rte_unused,
+			  uint32_t idx,
+			  struct rte_flow_error *error)
+{
+	uint16_t owner = (uint16_t)MLX5_ACTION_CTX_CT_GET_OWNER(idx);
+	uint32_t ct_idx = MLX5_ACTION_CTX_CT_GET_IDX(idx);
+	struct rte_eth_dev *owndev = &rte_eth_devices[owner];
+	struct mlx5_priv *priv = owndev->data->dev_private;
+	struct mlx5_aso_ct_pool *pool = priv->hws_ctpool;
+	struct mlx5_aso_ct_action *ct;
+
+	ct = mlx5_ipool_get(pool->cts, ct_idx);
+	if (!ct) {
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL,
+				"Invalid CT destruction index");
+	}
+	__atomic_store_n(&ct->state, ASO_CONNTRACK_FREE,
+				 __ATOMIC_RELAXED);
+	mlx5_ipool_free(pool->cts, ct_idx);
+	return 0;
+}
+
+static int
+flow_hw_conntrack_query(struct rte_eth_dev *dev, uint32_t idx,
+			struct rte_flow_action_conntrack *profile,
+			struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_ct_pool *pool = priv->hws_ctpool;
+	struct mlx5_aso_ct_action *ct;
+	uint16_t owner = (uint16_t)MLX5_ACTION_CTX_CT_GET_OWNER(idx);
+	uint32_t ct_idx;
+
+	if (owner != PORT_ID(priv))
+		return rte_flow_error_set(error, EACCES,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL,
+				"Can't query CT object owned by another port");
+	ct_idx = MLX5_ACTION_CTX_CT_GET_IDX(idx);
+	ct = mlx5_ipool_get(pool->cts, ct_idx);
+	if (!ct) {
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL,
+				"Invalid CT query index");
+	}
+	profile->peer_port = ct->peer;
+	profile->is_original_dir = ct->is_original;
+	if (mlx5_aso_ct_query_by_wqe(priv->sh, MLX5_HW_INV_QUEUE, ct, profile))
+		return rte_flow_error_set(error, EIO,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL,
+				"Failed to query CT context");
+	return 0;
+}
+
+
+static int
+flow_hw_conntrack_update(struct rte_eth_dev *dev, uint32_t queue,
+			 const struct rte_flow_modify_conntrack *action_conf,
+			 uint32_t idx, struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_ct_pool *pool = priv->hws_ctpool;
+	struct mlx5_aso_ct_action *ct;
+	const struct rte_flow_action_conntrack *new_prf;
+	uint16_t owner = (uint16_t)MLX5_ACTION_CTX_CT_GET_OWNER(idx);
+	uint32_t ct_idx;
+	int ret = 0;
+
+	if (PORT_ID(priv) != owner)
+		return rte_flow_error_set(error, EACCES,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL,
+					  "Can't update CT object owned by another port");
+	ct_idx = MLX5_ACTION_CTX_CT_GET_IDX(idx);
+	ct = mlx5_ipool_get(pool->cts, ct_idx);
+	if (!ct) {
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL,
+				"Invalid CT update index");
+	}
+	new_prf = &action_conf->new_ct;
+	if (action_conf->direction)
+		ct->is_original = !!new_prf->is_original_dir;
+	if (action_conf->state) {
+		/* Only validate the profile when it needs to be updated. */
+		ret = mlx5_validate_action_ct(dev, new_prf, error);
+		if (ret)
+			return ret;
+		ret = mlx5_aso_ct_update_by_wqe(priv->sh, queue, ct, new_prf);
+		if (ret)
+			return rte_flow_error_set(error, EIO,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL,
+					"Failed to send CT context update WQE");
+		if (queue != MLX5_HW_INV_QUEUE)
+			return 0;
+		/* Block until ready or a failure in synchronous mode. */
+		ret = mlx5_aso_ct_available(priv->sh, queue, ct);
+		if (ret)
+			rte_flow_error_set(error, rte_errno,
+					   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					   NULL,
+					   "Timeout to get the CT update");
+	}
+	return ret;
+}
+
+static struct rte_flow_action_handle *
+flow_hw_conntrack_create(struct rte_eth_dev *dev, uint32_t queue,
+			 const struct rte_flow_action_conntrack *pro,
+			 struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_ct_pool *pool = priv->hws_ctpool;
+	struct mlx5_aso_ct_action *ct;
+	uint32_t ct_idx = 0;
+	int ret;
+	bool async = !!(queue != MLX5_HW_INV_QUEUE);
+
+	if (!pool) {
+		rte_flow_error_set(error, EINVAL,
+				   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+				   "CT is not enabled");
+		return 0;
+	}
+	ct = mlx5_ipool_zmalloc(pool->cts, &ct_idx);
+	if (!ct) {
+		rte_flow_error_set(error, rte_errno,
+				   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+				   "Failed to allocate CT object");
+		return 0;
+	}
+	ct->offset = ct_idx - 1;
+	ct->is_original = !!pro->is_original_dir;
+	ct->peer = pro->peer_port;
+	ct->pool = pool;
+	if (mlx5_aso_ct_update_by_wqe(priv->sh, queue, ct, pro)) {
+		mlx5_ipool_free(pool->cts, ct_idx);
+		rte_flow_error_set(error, EBUSY,
+				   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+				   "Failed to update CT");
+		return 0;
+	}
+	if (!async) {
+		ret = mlx5_aso_ct_available(priv->sh, queue, ct);
+		if (ret) {
+			mlx5_ipool_free(pool->cts, ct_idx);
+			rte_flow_error_set(error, rte_errno,
+					   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					   NULL,
+					   "Timeout to get the CT update");
+			return 0;
+		}
+	}
+	return (struct rte_flow_action_handle *)(uintptr_t)
+		MLX5_ACTION_CTX_CT_GEN_IDX(PORT_ID(priv), ct_idx);
+}
+
 /**
  * Create shared action.
  *
@@ -5097,6 +5457,9 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 			handle = (struct rte_flow_action_handle *)
 				 (uintptr_t)cnt_id;
 		break;
+	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+		handle = flow_hw_conntrack_create(dev, queue, action->conf, error);
+		break;
 	default:
 		handle = flow_dv_action_create(dev, conf, action, error);
 	}
@@ -5132,10 +5495,18 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 			     void *user_data,
 			     struct rte_flow_error *error)
 {
+	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
+	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+
 	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
 	RTE_SET_USED(user_data);
-	return flow_dv_action_update(dev, handle, update, error);
+	switch (type) {
+	case MLX5_INDIRECT_ACTION_TYPE_CT:
+		return flow_hw_conntrack_update(dev, queue, update, act_idx, error);
+	default:
+		return flow_dv_action_update(dev, handle, update, error);
+	}
 }
 
 /**
@@ -5174,6 +5545,8 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 	switch (type) {
 	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
 		return mlx5_hws_cnt_shared_put(priv->hws_cpool, &act_idx);
+	case MLX5_INDIRECT_ACTION_TYPE_CT:
+		return flow_hw_conntrack_destroy(dev, act_idx, error);
 	default:
 		return flow_dv_action_destroy(dev, handle, error);
 	}
@@ -5327,6 +5700,8 @@ flow_hw_action_query(struct rte_eth_dev *dev,
 	switch (type) {
 	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
 		return flow_hw_query_counter(dev, act_idx, data, error);
+	case MLX5_INDIRECT_ACTION_TYPE_CT:
+		return flow_hw_conntrack_query(dev, act_idx, data, error);
 	default:
 		return flow_dv_action_query(dev, handle, data, error);
 	}
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v4 11/18] net/mlx5: add HW steering VLAN push, pop and VID modify flow actions
  2022-10-19 16:25 ` [PATCH v4 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (9 preceding siblings ...)
  2022-10-19 16:25   ` [PATCH v4 10/18] net/mlx5: add HW steering connection tracking support Suanming Mou
@ 2022-10-19 16:25   ` Suanming Mou
  2022-10-19 16:25   ` [PATCH v4 12/18] net/mlx5: implement METER MARK indirect action for HWS Suanming Mou
                     ` (6 subsequent siblings)
  17 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-10-19 16:25 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Gregory Etelson

From: Gregory Etelson <getelson@nvidia.com>

Add PMD implementation for HW steering VLAN push, pop and modify flow
actions.

The HWS VLAN push flow action is triggered by a sequence of the
mandatory OF_PUSH_VLAN and OF_SET_VLAN_VID flow action commands,
optionally followed by OF_SET_VLAN_PCP.
The commands must be arranged in this exact order:
OF_PUSH_VLAN / OF_SET_VLAN_VID [ / OF_SET_VLAN_PCP ].
In a masked HWS VLAN push flow action template, *ALL* of the above
flow actions must be masked.
In a non-masked HWS VLAN push flow action template, *ALL* of the above
flow actions must not be masked.

Example:

flow actions_template <port id> create \
actions_template_id <action id> \
template \
  of_push_vlan / \
  of_set_vlan_vid \
  [ / of_set_vlan_pcp  ] / end \
mask \
  of_push_vlan ethertype 0 / \
  of_set_vlan_vid vlan_vid 0 \
  [ / of_set_vlan_pcp vlan_pcp 0 ] / end\

flow actions_template <port id> create \
actions_template_id <action id> \
template \
  of_push_vlan ethertype <E>/ \
  of_set_vlan_vid vlan_vid <VID>\
  [ / of_set_vlan_pcp  <PCP>] / end \
mask \
  of_push_vlan ethertype <type != 0> / \
  of_set_vlan_vid vlan_vid <vid_mask != 0>\
  [ / of_set_vlan_pcp vlan_pcp <pcp_mask != 0> ] / end\
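
For reference, a minimal C sketch of building the same kind of masked
VLAN push template through the rte_flow asynchronous API. The VID,
ethertype and egress direction below are illustrative assumptions, not
part of this patch, and the sketch assumes the actions template
attributes carry the usual ingress/egress/transfer direction bits:

#include <rte_ether.h>
#include <rte_flow.h>

static struct rte_flow_actions_template *
vlan_push_template(uint16_t port_id, struct rte_flow_error *err)
{
	/* Fully specified action values (masked template). */
	static const struct rte_flow_action_of_push_vlan push = {
		.ethertype = RTE_BE16(RTE_ETHER_TYPE_VLAN),
	};
	static const struct rte_flow_action_of_set_vlan_vid vid = {
		.vlan_vid = RTE_BE16(0x123),
	};
	/* All VLAN push sub-actions are masked, as required above. */
	static const struct rte_flow_action_of_push_vlan push_mask = {
		.ethertype = RTE_BE16(0xffff),
	};
	static const struct rte_flow_action_of_set_vlan_vid vid_mask = {
		.vlan_vid = RTE_BE16(0xffff),
	};
	const struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN, .conf = &push },
		{ .type = RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID, .conf = &vid },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	const struct rte_flow_action masks[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN, .conf = &push_mask },
		{ .type = RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID, .conf = &vid_mask },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	const struct rte_flow_actions_template_attr attr = { .egress = 1 };

	return rte_flow_actions_template_create(port_id, &attr,
						actions, masks, err);
}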

The HWS VLAN pop flow action is triggered by the OF_POP_VLAN
flow action command.
The HWS VLAN pop action template is always non-masked.

Example:

flow actions_template <port id> create \
actions_template_id <action id> \
template of_pop_vlan / end mask of_pop_vlan / end

The HWS VLAN VID modify flow action is triggered by a standalone
OF_SET_VLAN_VID flow action command.
The HWS VLAN VID modify actions template can be either masked or
non-masked.

Example:

flow actions_template <port id> create \
actions_template_id <action id> \
template of_set_vlan_vid / end mask of_set_vlan_vid vlan_vid 0 / end

flow actions_template <port id> create \
actions_template_id <action id> \
template of_set_vlan_vid vlan_vid 0x101 / end \
mask of_set_vlan_vid vlan_vid 0xffff / end
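
For completeness, below is a minimal C sketch of how an application could
build the masked VLAN push actions template through the rte_flow template
API. It assumes the port was already set up with rte_flow_configure();
the helper name, port id and VID/PCP values are illustrative and not part
of this patch.

#include <rte_ether.h>
#include <rte_byteorder.h>
#include <rte_flow.h>

/*
 * Minimal sketch of a masked HWS VLAN push actions template created via the
 * rte_flow template API. All three VLAN actions carry non-zero masks, so the
 * pushed VLAN header is fully defined at template creation time.
 */
static struct rte_flow_actions_template *
create_masked_vlan_push_template(uint16_t port_id, struct rte_flow_error *err)
{
	const struct rte_flow_actions_template_attr attr = { .ingress = 1 };
	const struct rte_flow_action_of_push_vlan push = {
		.ethertype = RTE_BE16(RTE_ETHER_TYPE_VLAN),
	};
	const struct rte_flow_action_of_set_vlan_vid vid = {
		.vlan_vid = RTE_BE16(0x123),
	};
	const struct rte_flow_action_of_set_vlan_pcp pcp = { .vlan_pcp = 3 };
	const struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN, .conf = &push },
		{ .type = RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID, .conf = &vid },
		{ .type = RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_PCP, .conf = &pcp },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	/* Non-zero masks select the "masked" flavor of the template. */
	const struct rte_flow_action_of_push_vlan push_m = {
		.ethertype = RTE_BE16(0xffff),
	};
	const struct rte_flow_action_of_set_vlan_vid vid_m = {
		.vlan_vid = RTE_BE16(0x0fff),
	};
	const struct rte_flow_action_of_set_vlan_pcp pcp_m = { .vlan_pcp = 0x7 };
	const struct rte_flow_action masks[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN, .conf = &push_m },
		{ .type = RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID, .conf = &vid_m },
		{ .type = RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_PCP, .conf = &pcp_m },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};

	return rte_flow_actions_template_create(port_id, &attr,
						actions, masks, err);
}

A non-masked template would simply leave the mask fields zeroed in all
three VLAN actions, matching the first testpmd example above.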

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
---
 drivers/net/mlx5/mlx5.h         |   2 +
 drivers/net/mlx5/mlx5_flow.h    |   4 +
 drivers/net/mlx5/mlx5_flow_dv.c |   2 +-
 drivers/net/mlx5/mlx5_flow_hw.c | 492 +++++++++++++++++++++++++++++---
 4 files changed, 463 insertions(+), 37 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 9c080e5eac..e78ed958c8 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1672,6 +1672,8 @@ struct mlx5_priv {
 	struct mlx5_hw_q *hw_q;
 	/* HW steering rte flow table list header. */
 	LIST_HEAD(flow_hw_tbl, rte_flow_template_table) flow_hw_tbl;
+	struct mlx5dr_action *hw_push_vlan[MLX5DR_TABLE_TYPE_MAX];
+	struct mlx5dr_action *hw_pop_vlan[MLX5DR_TABLE_TYPE_MAX];
 	struct mlx5dr_action **hw_vport;
 	/* HW steering global drop action. */
 	struct mlx5dr_action *hw_drop[2];
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 7e90eac2d0..b8124f6f79 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -2448,4 +2448,8 @@ int mlx5_flow_pattern_validate(struct rte_eth_dev *dev,
 		struct rte_flow_error *error);
 int flow_hw_table_update(struct rte_eth_dev *dev,
 			 struct rte_flow_error *error);
+int mlx5_flow_item_field_width(struct rte_eth_dev *dev,
+			   enum rte_flow_field_id field, int inherit,
+			   const struct rte_flow_attr *attr,
+			   struct rte_flow_error *error);
 #endif /* RTE_PMD_MLX5_FLOW_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index ea13345baf..7efc936ddd 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -1326,7 +1326,7 @@ flow_dv_convert_action_modify_ipv6_dscp
 					     MLX5_MODIFICATION_TYPE_SET, error);
 }
 
-static int
+int
 mlx5_flow_item_field_width(struct rte_eth_dev *dev,
 			   enum rte_flow_field_id field, int inherit,
 			   const struct rte_flow_attr *attr,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 535df6ba5d..0c110819e6 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -44,12 +44,22 @@
 /* Lowest priority for HW non-root table. */
 #define MLX5_HW_LOWEST_PRIO_NON_ROOT (UINT32_MAX)
 
+#define MLX5_HW_VLAN_PUSH_TYPE_IDX 0
+#define MLX5_HW_VLAN_PUSH_VID_IDX 1
+#define MLX5_HW_VLAN_PUSH_PCP_IDX 2
+
 static int flow_hw_flush_all_ctrl_flows(struct rte_eth_dev *dev);
 static int flow_hw_translate_group(struct rte_eth_dev *dev,
 				   const struct mlx5_flow_template_table_cfg *cfg,
 				   uint32_t group,
 				   uint32_t *table_group,
 				   struct rte_flow_error *error);
+static __rte_always_inline int
+flow_hw_set_vlan_vid_construct(struct rte_eth_dev *dev,
+			       struct mlx5_hw_q_job *job,
+			       struct mlx5_action_construct_data *act_data,
+			       const struct mlx5_hw_actions *hw_acts,
+			       const struct rte_flow_action *action);
 
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops;
 
@@ -1065,6 +1075,52 @@ flow_hw_cnt_compile(struct rte_eth_dev *dev, uint32_t  start_pos,
 	return 0;
 }
 
+static __rte_always_inline bool
+is_of_vlan_pcp_present(const struct rte_flow_action *actions)
+{
+	/*
+	 * Order of RTE VLAN push actions is
+	 * OF_PUSH_VLAN / OF_SET_VLAN_VID [ / OF_SET_VLAN_PCP ]
+	 */
+	return actions[MLX5_HW_VLAN_PUSH_PCP_IDX].type ==
+		RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_PCP;
+}
+
+static __rte_always_inline bool
+is_template_masked_push_vlan(const struct rte_flow_action_of_push_vlan *mask)
+{
+	/*
+	 * In masked push VLAN template all RTE push actions are masked.
+	 */
+	return mask && mask->ethertype != 0;
+}
+
+static rte_be32_t vlan_hdr_to_be32(const struct rte_flow_action *actions)
+{
+/*
+ * OpenFlow Switch Specification defines 802.1q VID as 12+1 bits.
+ */
+	rte_be32_t type, vid, pcp;
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+	rte_be32_t vid_lo, vid_hi;
+#endif
+
+	type = ((const struct rte_flow_action_of_push_vlan *)
+		actions[MLX5_HW_VLAN_PUSH_TYPE_IDX].conf)->ethertype;
+	vid = ((const struct rte_flow_action_of_set_vlan_vid *)
+		actions[MLX5_HW_VLAN_PUSH_VID_IDX].conf)->vlan_vid;
+	pcp = is_of_vlan_pcp_present(actions) ?
+	      ((const struct rte_flow_action_of_set_vlan_pcp *)
+		      actions[MLX5_HW_VLAN_PUSH_PCP_IDX].conf)->vlan_pcp : 0;
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+	vid_hi = vid & 0xff;
+	vid_lo = vid >> 8;
+	return (((vid_lo << 8) | (pcp << 5) | vid_hi) << 16) | type;
+#else
+	return (type << 16) | (pcp << 13) | vid;
+#endif
+}
+
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -1167,6 +1223,26 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
 				priv->hw_tag[!!attr->group];
 			flow_hw_rxq_flag_set(dev, true);
 			break;
+		case RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN:
+			action_pos = at->actions_off[actions - at->actions];
+			acts->rule_acts[action_pos].action =
+				priv->hw_push_vlan[type];
+			if (is_template_masked_push_vlan(masks->conf))
+				acts->rule_acts[action_pos].push_vlan.vlan_hdr =
+					vlan_hdr_to_be32(actions);
+			else if (__flow_hw_act_data_general_append
+					(priv, acts, actions->type,
+					 actions - action_start, action_pos))
+				goto err;
+			actions += is_of_vlan_pcp_present(actions) ?
+					MLX5_HW_VLAN_PUSH_PCP_IDX :
+					MLX5_HW_VLAN_PUSH_VID_IDX;
+			break;
+		case RTE_FLOW_ACTION_TYPE_OF_POP_VLAN:
+			action_pos = at->actions_off[actions - at->actions];
+			acts->rule_acts[action_pos].action =
+				priv->hw_pop_vlan[type];
+			break;
 		case RTE_FLOW_ACTION_TYPE_JUMP:
 			action_pos = at->actions_off[actions - at->actions];
 			if (masks->conf &&
@@ -1784,8 +1860,17 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		cnt_id_t cnt_id;
 
 		action = &actions[act_data->action_src];
-		MLX5_ASSERT(action->type == RTE_FLOW_ACTION_TYPE_INDIRECT ||
-			    (int)action->type == act_data->type);
+		/*
+		 * action template construction replaces
+		 * OF_SET_VLAN_VID with MODIFY_FIELD
+		 */
+		if (action->type == RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID)
+			MLX5_ASSERT(act_data->type ==
+				    RTE_FLOW_ACTION_TYPE_MODIFY_FIELD);
+		else
+			MLX5_ASSERT(action->type ==
+				    RTE_FLOW_ACTION_TYPE_INDIRECT ||
+				    (int)action->type == act_data->type);
 		switch (act_data->type) {
 		case RTE_FLOW_ACTION_TYPE_INDIRECT:
 			if (flow_hw_shared_action_construct
@@ -1801,6 +1886,10 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			      (action->conf))->id);
 			rule_acts[act_data->action_dst].tag.value = tag;
 			break;
+		case RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN:
+			rule_acts[act_data->action_dst].push_vlan.vlan_hdr =
+				vlan_hdr_to_be32(action);
+			break;
 		case RTE_FLOW_ACTION_TYPE_JUMP:
 			jump_group = ((const struct rte_flow_action_jump *)
 						action->conf)->group;
@@ -1852,10 +1941,16 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 				    act_data->encap.len);
 			break;
 		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
-			ret = flow_hw_modify_field_construct(job,
-							     act_data,
-							     hw_acts,
-							     action);
+			if (action->type == RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID)
+				ret = flow_hw_set_vlan_vid_construct(dev, job,
+								     act_data,
+								     hw_acts,
+								     action);
+			else
+				ret = flow_hw_modify_field_construct(job,
+								     act_data,
+								     hw_acts,
+								     action);
 			if (ret)
 				return -1;
 			break;
@@ -2559,9 +2654,14 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 			mlx5_ipool_destroy(tbl->flow);
 		mlx5_free(tbl);
 	}
-	rte_flow_error_set(error, err,
-			  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
-			  "fail to create rte table");
+	if (error != NULL) {
+		rte_flow_error_set(error, err,
+				error->type == RTE_FLOW_ERROR_TYPE_NONE ?
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED : error->type,
+				NULL,
+				error->message == NULL ?
+				"fail to create rte table" : error->message);
+	}
 	return NULL;
 }
 
@@ -2865,28 +2965,76 @@ flow_hw_action_meta_copy_insert(const struct rte_flow_action actions[],
 				uint16_t *ins_pos)
 {
 	uint16_t idx, total = 0;
-	bool ins = false;
+	uint16_t end_idx = UINT16_MAX;
 	bool act_end = false;
+	bool modify_field = false;
+	bool rss_or_queue = false;
 
 	MLX5_ASSERT(actions && masks);
 	MLX5_ASSERT(new_actions && new_masks);
 	MLX5_ASSERT(ins_actions && ins_masks);
 	for (idx = 0; !act_end; idx++) {
-		if (idx >= MLX5_HW_MAX_ACTS)
-			return -1;
-		if (actions[idx].type == RTE_FLOW_ACTION_TYPE_RSS ||
-		    actions[idx].type == RTE_FLOW_ACTION_TYPE_QUEUE) {
-			ins = true;
-			*ins_pos = idx;
-		}
-		if (actions[idx].type == RTE_FLOW_ACTION_TYPE_END)
+		switch (actions[idx].type) {
+		case RTE_FLOW_ACTION_TYPE_RSS:
+		case RTE_FLOW_ACTION_TYPE_QUEUE:
+			/* It is assumed that the application provided only a single RSS/QUEUE action. */
+			MLX5_ASSERT(!rss_or_queue);
+			rss_or_queue = true;
+			break;
+		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
+			modify_field = true;
+			break;
+		case RTE_FLOW_ACTION_TYPE_END:
+			end_idx = idx;
 			act_end = true;
+			break;
+		default:
+			break;
+		}
 	}
-	if (!ins)
+	if (!rss_or_queue)
 		return 0;
-	else if (idx == MLX5_HW_MAX_ACTS)
+	else if (idx >= MLX5_HW_MAX_ACTS)
 		return -1; /* No more space. */
 	total = idx;
+	/*
+	 * If actions template contains MODIFY_FIELD action, then meta copy action can be inserted
+	 * at the template's end. Position of MODIFY_HDR action is based on the position of the
+	 * first MODIFY_FIELD flow action.
+	 */
+	if (modify_field) {
+		*ins_pos = end_idx;
+		goto insert_meta_copy;
+	}
+	/*
+	 * If actions template does not contain MODIFY_FIELD action, then meta copy action must be
+	 * inserted at a place conforming with the action order defined in steering/mlx5dr_action.c.
+	 */
+	act_end = false;
+	for (idx = 0; !act_end; idx++) {
+		switch (actions[idx].type) {
+		case RTE_FLOW_ACTION_TYPE_COUNT:
+		case RTE_FLOW_ACTION_TYPE_METER:
+		case RTE_FLOW_ACTION_TYPE_METER_MARK:
+		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
+		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
+		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
+		case RTE_FLOW_ACTION_TYPE_RSS:
+		case RTE_FLOW_ACTION_TYPE_QUEUE:
+			*ins_pos = idx;
+			act_end = true;
+			break;
+		case RTE_FLOW_ACTION_TYPE_END:
+			act_end = true;
+			break;
+		default:
+			break;
+		}
+	}
+insert_meta_copy:
+	MLX5_ASSERT(*ins_pos != UINT16_MAX);
+	MLX5_ASSERT(*ins_pos < total);
 	/* Before the position, no change for the actions. */
 	for (idx = 0; idx < *ins_pos; idx++) {
 		new_actions[idx] = actions[idx];
@@ -2903,6 +3051,73 @@ flow_hw_action_meta_copy_insert(const struct rte_flow_action actions[],
 	return 0;
 }
 
+static int
+flow_hw_validate_action_push_vlan(struct rte_eth_dev *dev,
+				  const
+				  struct rte_flow_actions_template_attr *attr,
+				  const struct rte_flow_action *action,
+				  const struct rte_flow_action *mask,
+				  struct rte_flow_error *error)
+{
+#define X_FIELD(ptr, t, f) (((ptr)->conf) && ((t *)((ptr)->conf))->f)
+
+	const bool masked_push =
+		X_FIELD(mask + MLX5_HW_VLAN_PUSH_TYPE_IDX,
+			const struct rte_flow_action_of_push_vlan, ethertype);
+	bool masked_param;
+
+	/*
+	 * Mandatory actions order:
+	 * OF_PUSH_VLAN / OF_SET_VLAN_VID [ / OF_SET_VLAN_PCP ]
+	 */
+	RTE_SET_USED(dev);
+	RTE_SET_USED(attr);
+	/* Check that the mask matches OF_PUSH_VLAN */
+	if (mask[MLX5_HW_VLAN_PUSH_TYPE_IDX].type !=
+	    RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION,
+					  action, "OF_PUSH_VLAN: mask does not match");
+	/* Check that the second template and mask items are SET_VLAN_VID */
+	if (action[MLX5_HW_VLAN_PUSH_VID_IDX].type !=
+	    RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID ||
+	    mask[MLX5_HW_VLAN_PUSH_VID_IDX].type !=
+	    RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION,
+					  action, "OF_PUSH_VLAN: invalid actions order");
+	masked_param = X_FIELD(mask + MLX5_HW_VLAN_PUSH_VID_IDX,
+			       const struct rte_flow_action_of_set_vlan_vid,
+			       vlan_vid);
+	/*
+	 * PMD requires the OF_SET_VLAN_VID mask to match OF_PUSH_VLAN
+	 */
+	if (masked_push ^ masked_param)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "OF_SET_VLAN_VID: mask does not match OF_PUSH_VLAN");
+	if (is_of_vlan_pcp_present(action)) {
+		if (mask[MLX5_HW_VLAN_PUSH_PCP_IDX].type !=
+		     RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_PCP)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  action, "OF_SET_VLAN_PCP: missing mask configuration");
+		masked_param = X_FIELD(mask + MLX5_HW_VLAN_PUSH_PCP_IDX,
+				       const struct
+				       rte_flow_action_of_set_vlan_pcp,
+				       vlan_pcp);
+		/*
+		 * PMD requires the OF_SET_VLAN_PCP mask to match OF_PUSH_VLAN
+		 */
+		if (masked_push ^ masked_param)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ACTION, action,
+						  "OF_SET_VLAN_PCP: mask does not match OF_PUSH_VLAN");
+	}
+	return 0;
+#undef X_FIELD
+}
+
 static int
 flow_hw_actions_validate(struct rte_eth_dev *dev,
 			const struct rte_flow_actions_template_attr *attr,
@@ -2993,6 +3208,18 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
 			/* TODO: Validation logic */
 			break;
+		case RTE_FLOW_ACTION_TYPE_OF_POP_VLAN:
+		case RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID:
+			break;
+		case RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN:
+			ret = flow_hw_validate_action_push_vlan
+					(dev, attr, action, mask, error);
+			if (ret != 0)
+				return ret;
+			i += is_of_vlan_pcp_present(action) ?
+				MLX5_HW_VLAN_PUSH_PCP_IDX :
+				MLX5_HW_VLAN_PUSH_VID_IDX;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -3020,6 +3247,8 @@ static enum mlx5dr_action_type mlx5_hw_dr_action_types[] = {
 	[RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT] = MLX5DR_ACTION_TYP_VPORT,
 	[RTE_FLOW_ACTION_TYPE_COUNT] = MLX5DR_ACTION_TYP_CTR,
 	[RTE_FLOW_ACTION_TYPE_CONNTRACK] = MLX5DR_ACTION_TYP_ASO_CT,
+	[RTE_FLOW_ACTION_TYPE_OF_POP_VLAN] = MLX5DR_ACTION_TYP_POP_VLAN,
+	[RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN] = MLX5DR_ACTION_TYP_PUSH_VLAN,
 };
 
 static int
@@ -3136,6 +3365,14 @@ flow_hw_dr_actions_template_create(struct rte_flow_actions_template *at)
 				goto err_actions_num;
 			action_types[curr_off++] = MLX5DR_ACTION_TYP_FT;
 			break;
+		case RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN:
+			type = mlx5_hw_dr_action_types[at->actions[i].type];
+			at->actions_off[i] = curr_off;
+			action_types[curr_off++] = type;
+			i += is_of_vlan_pcp_present(at->actions + i) ?
+				MLX5_HW_VLAN_PUSH_PCP_IDX :
+				MLX5_HW_VLAN_PUSH_VID_IDX;
+			break;
 		default:
 			type = mlx5_hw_dr_action_types[at->actions[i].type];
 			at->actions_off[i] = curr_off;
@@ -3163,6 +3400,89 @@ flow_hw_dr_actions_template_create(struct rte_flow_actions_template *at)
 	return NULL;
 }
 
+static void
+flow_hw_set_vlan_vid(struct rte_eth_dev *dev,
+		     struct rte_flow_action *ra,
+		     struct rte_flow_action *rm,
+		     struct rte_flow_action_modify_field *spec,
+		     struct rte_flow_action_modify_field *mask,
+		     int set_vlan_vid_ix)
+{
+	struct rte_flow_error error;
+	const bool masked = rm[set_vlan_vid_ix].conf &&
+		(((const struct rte_flow_action_of_set_vlan_vid *)
+			rm[set_vlan_vid_ix].conf)->vlan_vid != 0);
+	const struct rte_flow_action_of_set_vlan_vid *conf =
+		ra[set_vlan_vid_ix].conf;
+	rte_be16_t vid = masked ? conf->vlan_vid : 0;
+	int width = mlx5_flow_item_field_width(dev, RTE_FLOW_FIELD_VLAN_ID, 0,
+					       NULL, &error);
+	*spec = (typeof(*spec)) {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = RTE_FLOW_FIELD_VLAN_ID,
+			.level = 0, .offset = 0,
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+			.level = vid,
+			.offset = 0,
+		},
+		.width = width,
+	};
+	*mask = (typeof(*mask)) {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = RTE_FLOW_FIELD_VLAN_ID,
+			.level = 0xffffffff, .offset = 0xffffffff,
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+			.level = masked ? (1U << width) - 1 : 0,
+			.offset = 0,
+		},
+		.width = 0xffffffff,
+	};
+	ra[set_vlan_vid_ix].type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD;
+	ra[set_vlan_vid_ix].conf = spec;
+	rm[set_vlan_vid_ix].type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD;
+	rm[set_vlan_vid_ix].conf = mask;
+}
+
+static __rte_always_inline int
+flow_hw_set_vlan_vid_construct(struct rte_eth_dev *dev,
+			       struct mlx5_hw_q_job *job,
+			       struct mlx5_action_construct_data *act_data,
+			       const struct mlx5_hw_actions *hw_acts,
+			       const struct rte_flow_action *action)
+{
+	struct rte_flow_error error;
+	rte_be16_t vid = ((const struct rte_flow_action_of_set_vlan_vid *)
+			   action->conf)->vlan_vid;
+	int width = mlx5_flow_item_field_width(dev, RTE_FLOW_FIELD_VLAN_ID, 0,
+					       NULL, &error);
+	struct rte_flow_action_modify_field conf = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = RTE_FLOW_FIELD_VLAN_ID,
+			.level = 0, .offset = 0,
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+			.level = vid,
+			.offset = 0,
+		},
+		.width = width,
+	};
+	struct rte_flow_action modify_action = {
+		.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+		.conf = &conf
+	};
+
+	return flow_hw_modify_field_construct(job, act_data, hw_acts,
+					      &modify_action);
+}
+
 /**
  * Create flow action template.
  *
@@ -3188,14 +3508,18 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 			struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	int len, act_num, act_len, mask_len;
+	int len, act_len, mask_len;
+	unsigned int act_num;
 	unsigned int i;
 	struct rte_flow_actions_template *at = NULL;
-	uint16_t pos = MLX5_HW_MAX_ACTS;
+	uint16_t pos = UINT16_MAX;
 	struct rte_flow_action tmp_action[MLX5_HW_MAX_ACTS];
 	struct rte_flow_action tmp_mask[MLX5_HW_MAX_ACTS];
-	const struct rte_flow_action *ra;
-	const struct rte_flow_action *rm;
+	struct rte_flow_action *ra = (void *)(uintptr_t)actions;
+	struct rte_flow_action *rm = (void *)(uintptr_t)masks;
+	int set_vlan_vid_ix = -1;
+	struct rte_flow_action_modify_field set_vlan_vid_spec = {0, };
+	struct rte_flow_action_modify_field set_vlan_vid_mask = {0, };
 	const struct rte_flow_action_modify_field rx_mreg = {
 		.operation = RTE_FLOW_MODIFY_SET,
 		.dst = {
@@ -3235,21 +3559,58 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 		return NULL;
 	if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
 	    priv->sh->config.dv_esw_en) {
+		/* Application should make sure only one Q/RSS action exists in one rule. */
 		if (flow_hw_action_meta_copy_insert(actions, masks, &rx_cpy, &rx_cpy_mask,
 						    tmp_action, tmp_mask, &pos)) {
 			rte_flow_error_set(error, EINVAL,
 					   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					   "Failed to concatenate new action/mask");
 			return NULL;
+		} else if (pos != UINT16_MAX) {
+			ra = tmp_action;
+			rm = tmp_mask;
 		}
 	}
-	/* Application should make sure only one Q/RSS exist in one rule. */
-	if (pos == MLX5_HW_MAX_ACTS) {
-		ra = actions;
-		rm = masks;
-	} else {
-		ra = tmp_action;
-		rm = tmp_mask;
+	for (i = 0; ra[i].type != RTE_FLOW_ACTION_TYPE_END; ++i) {
+		switch (ra[i].type) {
+		/* OF_PUSH_VLAN *MUST* come before OF_SET_VLAN_VID */
+		case RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN:
+			i += is_of_vlan_pcp_present(ra + i) ?
+				MLX5_HW_VLAN_PUSH_PCP_IDX :
+				MLX5_HW_VLAN_PUSH_VID_IDX;
+			break;
+		case RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID:
+			set_vlan_vid_ix = i;
+			break;
+		default:
+			break;
+		}
+	}
+	/*
+	 * Count flow actions to allocate required space for storing DR offsets and to check
+	 * if temporary buffer would not be overrun.
+	 */
+	act_num = i + 1;
+	if (act_num >= MLX5_HW_MAX_ACTS) {
+		rte_flow_error_set(error, EINVAL,
+				   RTE_FLOW_ERROR_TYPE_ACTION, NULL, "Too many actions");
+		return NULL;
+	}
+	if (set_vlan_vid_ix != -1) {
+		/* If temporary action buffer was not used, copy template actions to it */
+		if (ra == actions && rm == masks) {
+			for (i = 0; i < act_num; ++i) {
+				tmp_action[i] = actions[i];
+				tmp_mask[i] = masks[i];
+				if (actions[i].type == RTE_FLOW_ACTION_TYPE_END)
+					break;
+			}
+			ra = tmp_action;
+			rm = tmp_mask;
+		}
+		flow_hw_set_vlan_vid(dev, ra, rm,
+				     &set_vlan_vid_spec, &set_vlan_vid_mask,
+				     set_vlan_vid_ix);
 	}
 	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, NULL, 0, ra, error);
 	if (act_len <= 0)
@@ -3259,10 +3620,6 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 	if (mask_len <= 0)
 		return NULL;
 	len += RTE_ALIGN(mask_len, 16);
-	/* Count flow actions to allocate required space for storing DR offsets. */
-	act_num = 0;
-	for (i = 0; ra[i].type != RTE_FLOW_ACTION_TYPE_END; ++i)
-		act_num++;
 	len += RTE_ALIGN(act_num * sizeof(*at->actions_off), 16);
 	at = mlx5_malloc(MLX5_MEM_ZERO, len + sizeof(*at),
 			 RTE_CACHE_LINE_SIZE, rte_socket_id());
@@ -4510,7 +4867,11 @@ flow_hw_create_tx_default_mreg_copy_table(struct rte_eth_dev *dev,
 		.attr = tx_tbl_attr,
 		.external = false,
 	};
-	struct rte_flow_error drop_err;
+	struct rte_flow_error drop_err = {
+		.type = RTE_FLOW_ERROR_TYPE_NONE,
+		.cause = NULL,
+		.message = NULL,
+	};
 
 	RTE_SET_USED(drop_err);
 	return flow_hw_table_create(dev, &tx_tbl_cfg, &pt, 1, &at, 1, &drop_err);
@@ -4791,6 +5152,60 @@ flow_hw_ct_pool_create(struct rte_eth_dev *dev,
 	return NULL;
 }
 
+static void
+flow_hw_destroy_vlan(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	enum mlx5dr_table_type i;
+
+	for (i = MLX5DR_TABLE_TYPE_NIC_RX; i < MLX5DR_TABLE_TYPE_MAX; i++) {
+		if (priv->hw_pop_vlan[i]) {
+			mlx5dr_action_destroy(priv->hw_pop_vlan[i]);
+			priv->hw_pop_vlan[i] = NULL;
+		}
+		if (priv->hw_push_vlan[i]) {
+			mlx5dr_action_destroy(priv->hw_push_vlan[i]);
+			priv->hw_push_vlan[i] = NULL;
+		}
+	}
+}
+
+static int
+flow_hw_create_vlan(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	enum mlx5dr_table_type i;
+	const enum mlx5dr_action_flags flags[MLX5DR_TABLE_TYPE_MAX] = {
+		MLX5DR_ACTION_FLAG_HWS_RX,
+		MLX5DR_ACTION_FLAG_HWS_TX,
+		MLX5DR_ACTION_FLAG_HWS_FDB
+	};
+
+	for (i = MLX5DR_TABLE_TYPE_NIC_RX; i <= MLX5DR_TABLE_TYPE_NIC_TX; i++) {
+		priv->hw_pop_vlan[i] =
+			mlx5dr_action_create_pop_vlan(priv->dr_ctx, flags[i]);
+		if (!priv->hw_pop_vlan[i])
+			return -ENOENT;
+		priv->hw_push_vlan[i] =
+			mlx5dr_action_create_push_vlan(priv->dr_ctx, flags[i]);
+		if (!priv->hw_push_vlan[i])
+			return -ENOENT;
+	}
+	if (priv->sh->config.dv_esw_en && priv->master) {
+		priv->hw_pop_vlan[MLX5DR_TABLE_TYPE_FDB] =
+			mlx5dr_action_create_pop_vlan
+				(priv->dr_ctx, MLX5DR_ACTION_FLAG_HWS_FDB);
+		if (!priv->hw_pop_vlan[MLX5DR_TABLE_TYPE_FDB])
+			return -ENOENT;
+		priv->hw_push_vlan[MLX5DR_TABLE_TYPE_FDB] =
+			mlx5dr_action_create_push_vlan
+				(priv->dr_ctx, MLX5DR_ACTION_FLAG_HWS_FDB);
+		if (!priv->hw_push_vlan[MLX5DR_TABLE_TYPE_FDB])
+			return -ENOENT;
+	}
+	return 0;
+}
+
 /**
  * Configure port HWS resources.
  *
@@ -4993,6 +5408,9 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		if (priv->hws_cpool == NULL)
 			goto err;
 	}
+	ret = flow_hw_create_vlan(dev);
+	if (ret)
+		goto err;
 	return 0;
 err:
 	if (priv->hws_ctpool) {
@@ -5010,6 +5428,7 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		if (priv->hw_tag[i])
 			mlx5dr_action_destroy(priv->hw_tag[i]);
 	}
+	flow_hw_destroy_vlan(dev);
 	if (dr_ctx)
 		claim_zero(mlx5dr_context_close(dr_ctx));
 	mlx5_free(priv->hw_q);
@@ -5069,6 +5488,7 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 		if (priv->hw_tag[i])
 			mlx5dr_action_destroy(priv->hw_tag[i]);
 	}
+	flow_hw_destroy_vlan(dev);
 	flow_hw_free_vport_actions(priv);
 	if (priv->acts_ipool) {
 		mlx5_ipool_destroy(priv->acts_ipool);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v4 12/18] net/mlx5: implement METER MARK indirect action for HWS
  2022-10-19 16:25 ` [PATCH v4 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (10 preceding siblings ...)
  2022-10-19 16:25   ` [PATCH v4 11/18] net/mlx5: add HW steering VLAN push, pop and VID modify flow actions Suanming Mou
@ 2022-10-19 16:25   ` Suanming Mou
  2022-10-19 16:25   ` [PATCH v4 13/18] net/mlx5: add HWS AGE action support Suanming Mou
                     ` (5 subsequent siblings)
  17 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-10-19 16:25 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Alexander Kozyrev

From: Alexander Kozyrev <akozyrev@nvidia.com>

Add the ability to create an indirect action handle for METER_MARK.
It allows a single meter to be shared between several different actions.
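
As a usage illustration, a minimal C sketch of creating such an indirect
METER_MARK handle on a flow queue is shown below. It assumes the meter
profile pointer was obtained beforehand through the rte_mtr API and that
the port was configured with rte_flow_configure(); the helper name and
values are illustrative and not part of this patch.

#include <rte_meter.h>
#include <rte_flow.h>

/*
 * Minimal sketch: enqueue creation of a shared (indirect) METER_MARK handle
 * on a flow queue. The profile pointer is assumed to come from the rte_mtr
 * API; color-blind mode with an initial green color is used as an example.
 */
static struct rte_flow_action_handle *
create_shared_meter_mark(uint16_t port_id, uint32_t queue_id,
			 struct rte_flow_meter_profile *profile,
			 struct rte_flow_error *err)
{
	const struct rte_flow_op_attr op_attr = { .postpone = 0 };
	const struct rte_flow_indir_action_conf indir_conf = { .ingress = 1 };
	const struct rte_flow_action_meter_mark mtr = {
		.profile = profile,
		.color_mode = 0,		/* color-blind */
		.init_color = RTE_COLOR_GREEN,
		.state = 1,			/* meter enabled */
	};
	const struct rte_flow_action action = {
		.type = RTE_FLOW_ACTION_TYPE_METER_MARK,
		.conf = &mtr,
	};

	return rte_flow_async_action_handle_create(port_id, queue_id, &op_attr,
						   &indir_conf, &action,
						   NULL /* user_data */, err);
}

The enqueued operation completes after rte_flow_push() and rte_flow_pull()
on the same queue; the synchronous rte_flow_action_handle_create() can be
used instead when no queue context is available.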

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
---
 doc/guides/nics/mlx5.rst           |   1 +
 drivers/net/mlx5/mlx5.c            |   4 +-
 drivers/net/mlx5/mlx5.h            |  33 ++-
 drivers/net/mlx5/mlx5_flow.c       |   6 +
 drivers/net/mlx5/mlx5_flow.h       |  20 +-
 drivers/net/mlx5/mlx5_flow_aso.c   | 141 ++++++++--
 drivers/net/mlx5/mlx5_flow_dv.c    | 145 +++++++++-
 drivers/net/mlx5/mlx5_flow_hw.c    | 437 +++++++++++++++++++++++++----
 drivers/net/mlx5/mlx5_flow_meter.c |  92 +++++-
 9 files changed, 774 insertions(+), 105 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 0c7bd042a4..fc823111c6 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -105,6 +105,7 @@ Features
 - Sub-Function representors.
 - Sub-Function.
 - Matching on represented port.
+- Meter color.
 
 
 Limitations
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 6490ac636c..64a0e6f31d 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -442,7 +442,7 @@ mlx5_flow_aso_age_mng_init(struct mlx5_dev_ctx_shared *sh)
 		rte_errno = ENOMEM;
 		return -ENOMEM;
 	}
-	err = mlx5_aso_queue_init(sh, ASO_OPC_MOD_FLOW_HIT);
+	err = mlx5_aso_queue_init(sh, ASO_OPC_MOD_FLOW_HIT, 1);
 	if (err) {
 		mlx5_free(sh->aso_age_mng);
 		return -1;
@@ -763,7 +763,7 @@ mlx5_flow_aso_ct_mng_init(struct mlx5_dev_ctx_shared *sh)
 		rte_errno = ENOMEM;
 		return -rte_errno;
 	}
-	err = mlx5_aso_queue_init(sh, ASO_OPC_MOD_CONNECTION_TRACKING);
+	err = mlx5_aso_queue_init(sh, ASO_OPC_MOD_CONNECTION_TRACKING, MLX5_ASO_CT_SQ_NUM);
 	if (err) {
 		mlx5_free(sh->ct_mng);
 		/* rte_errno should be extracted from the failure. */
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index e78ed958c8..d3267fafda 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -976,12 +976,16 @@ enum mlx5_aso_mtr_type {
 
 /* Generic aso_flow_meter information. */
 struct mlx5_aso_mtr {
-	LIST_ENTRY(mlx5_aso_mtr) next;
+	union {
+		LIST_ENTRY(mlx5_aso_mtr) next;
+		struct mlx5_aso_mtr_pool *pool;
+	};
 	enum mlx5_aso_mtr_type type;
 	struct mlx5_flow_meter_info fm;
 	/**< Pointer to the next aso flow meter structure. */
 	uint8_t state; /**< ASO flow meter state. */
 	uint32_t offset;
+	enum rte_color init_color;
 };
 
 /* Generic aso_flow_meter pool structure. */
@@ -990,7 +994,11 @@ struct mlx5_aso_mtr_pool {
 	/*Must be the first in pool*/
 	struct mlx5_devx_obj *devx_obj;
 	/* The devx object of the minimum aso flow meter ID. */
+	struct mlx5dr_action *action; /* HWS action. */
+	struct mlx5_indexed_pool *idx_pool; /* HWS index pool. */
 	uint32_t index; /* Pool index in management structure. */
+	uint32_t nb_sq; /* Number of ASO SQ. */
+	struct mlx5_aso_sq *sq; /* ASO SQs. */
 };
 
 LIST_HEAD(aso_meter_list, mlx5_aso_mtr);
@@ -1685,6 +1693,7 @@ struct mlx5_priv {
 	struct mlx5_aso_ct_pools_mng *ct_mng;
 	/* Management data for ASO connection tracking. */
 	struct mlx5_aso_ct_pool *hws_ctpool; /* HW steering's CT pool. */
+	struct mlx5_aso_mtr_pool *hws_mpool; /* HW steering's Meter pool. */
 #endif
 };
 
@@ -2005,7 +2014,8 @@ void mlx5_pmd_socket_uninit(void);
 int mlx5_flow_meter_init(struct rte_eth_dev *dev,
 			 uint32_t nb_meters,
 			 uint32_t nb_meter_profiles,
-			 uint32_t nb_meter_policies);
+			 uint32_t nb_meter_policies,
+			 uint32_t nb_queues);
 void mlx5_flow_meter_uninit(struct rte_eth_dev *dev);
 int mlx5_flow_meter_ops_get(struct rte_eth_dev *dev, void *arg);
 struct mlx5_flow_meter_info *mlx5_flow_meter_find(struct mlx5_priv *priv,
@@ -2074,15 +2084,24 @@ eth_tx_burst_t mlx5_select_tx_function(struct rte_eth_dev *dev);
 
 /* mlx5_flow_aso.c */
 
+int mlx5_aso_mtr_queue_init(struct mlx5_dev_ctx_shared *sh,
+			    struct mlx5_aso_mtr_pool *hws_pool,
+			    struct mlx5_aso_mtr_pools_mng *pool_mng,
+			    uint32_t nb_queues);
+void mlx5_aso_mtr_queue_uninit(struct mlx5_dev_ctx_shared *sh,
+			       struct mlx5_aso_mtr_pool *hws_pool,
+			       struct mlx5_aso_mtr_pools_mng *pool_mng);
 int mlx5_aso_queue_init(struct mlx5_dev_ctx_shared *sh,
-		enum mlx5_access_aso_opc_mod aso_opc_mod);
+			enum mlx5_access_aso_opc_mod aso_opc_mode,
+			uint32_t nb_queues);
 int mlx5_aso_flow_hit_queue_poll_start(struct mlx5_dev_ctx_shared *sh);
 int mlx5_aso_flow_hit_queue_poll_stop(struct mlx5_dev_ctx_shared *sh);
 void mlx5_aso_queue_uninit(struct mlx5_dev_ctx_shared *sh,
-		enum mlx5_access_aso_opc_mod aso_opc_mod);
-int mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
-		struct mlx5_aso_mtr *mtr, struct mlx5_mtr_bulk *bulk);
-int mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh,
+			   enum mlx5_access_aso_opc_mod aso_opc_mod);
+int mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
+				 struct mlx5_aso_mtr *mtr,
+				 struct mlx5_mtr_bulk *bulk);
+int mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 		struct mlx5_aso_mtr *mtr);
 int mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			      struct mlx5_aso_ct_action *ct,
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 7c3295609d..e3485352db 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -4223,6 +4223,12 @@ flow_action_handles_translate(struct rte_eth_dev *dev,
 						MLX5_RTE_FLOW_ACTION_TYPE_COUNT;
 			translated[handle->index].conf = (void *)(uintptr_t)idx;
 			break;
+		case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
+			translated[handle->index].type =
+						(enum rte_flow_action_type)
+						MLX5_RTE_FLOW_ACTION_TYPE_METER_MARK;
+			translated[handle->index].conf = (void *)(uintptr_t)idx;
+			break;
 		case MLX5_INDIRECT_ACTION_TYPE_AGE:
 			if (priv->sh->flow_hit_aso_en) {
 				translated[handle->index].type =
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index b8124f6f79..96198d7d17 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -46,6 +46,7 @@ enum mlx5_rte_flow_action_type {
 	MLX5_RTE_FLOW_ACTION_TYPE_COUNT,
 	MLX5_RTE_FLOW_ACTION_TYPE_JUMP,
 	MLX5_RTE_FLOW_ACTION_TYPE_RSS,
+	MLX5_RTE_FLOW_ACTION_TYPE_METER_MARK,
 };
 
 /* Private (internal) Field IDs for MODIFY_FIELD action. */
@@ -54,22 +55,23 @@ enum mlx5_rte_flow_field_id {
 			MLX5_RTE_FLOW_FIELD_META_REG,
 };
 
-#define MLX5_INDIRECT_ACTION_TYPE_OFFSET 30
+#define MLX5_INDIRECT_ACTION_TYPE_OFFSET 29
 
 enum {
 	MLX5_INDIRECT_ACTION_TYPE_RSS,
 	MLX5_INDIRECT_ACTION_TYPE_AGE,
 	MLX5_INDIRECT_ACTION_TYPE_COUNT,
 	MLX5_INDIRECT_ACTION_TYPE_CT,
+	MLX5_INDIRECT_ACTION_TYPE_METER_MARK,
 };
 
-/* Now, the maximal ports will be supported is 256, action number is 4M. */
-#define MLX5_INDIRECT_ACT_CT_MAX_PORT 0x100
+/* Now, the maximum number of supported ports is 16, action number is 32M. */
+#define MLX5_INDIRECT_ACT_CT_MAX_PORT 0x10
 
 #define MLX5_INDIRECT_ACT_CT_OWNER_SHIFT 22
 #define MLX5_INDIRECT_ACT_CT_OWNER_MASK (MLX5_INDIRECT_ACT_CT_MAX_PORT - 1)
 
-/* 30-31: type, 22-29: owner port, 0-21: index. */
+/* 29-31: type, 25-28: owner port, 0-24: index */
 #define MLX5_INDIRECT_ACT_CT_GEN_IDX(owner, index) \
 	((MLX5_INDIRECT_ACTION_TYPE_CT << MLX5_INDIRECT_ACTION_TYPE_OFFSET) | \
 	 (((owner) & MLX5_INDIRECT_ACT_CT_OWNER_MASK) << \
@@ -1114,6 +1116,7 @@ struct rte_flow_hw {
 	};
 	struct rte_flow_template_table *table; /* The table flow allcated from. */
 	uint32_t cnt_id;
+	uint32_t mtr_id;
 	uint8_t rule[0]; /* HWS layer data struct. */
 } __rte_packed;
 
@@ -1165,6 +1168,9 @@ struct mlx5_action_construct_data {
 		struct {
 			uint32_t id;
 		} shared_counter;
+		struct {
+			uint32_t id;
+		} shared_meter;
 	};
 };
 
@@ -1248,6 +1254,7 @@ struct mlx5_hw_actions {
 	uint16_t encap_decap_pos; /* Encap/Decap action position. */
 	uint32_t mark:1; /* Indicate the mark action. */
 	uint32_t cnt_id; /* Counter id. */
+	uint32_t mtr_id; /* Meter id. */
 	/* Translated DR action array from action template. */
 	struct mlx5dr_rule_action rule_acts[MLX5_HW_MAX_ACTS];
 };
@@ -1537,6 +1544,7 @@ flow_hw_get_reg_id(enum rte_flow_item_type type, uint32_t id)
 		 */
 		return REG_A;
 	case RTE_FLOW_ITEM_TYPE_CONNTRACK:
+	case RTE_FLOW_ITEM_TYPE_METER_COLOR:
 		return mlx5_flow_hw_aso_tag;
 	case RTE_FLOW_ITEM_TYPE_TAG:
 		MLX5_ASSERT(id < MLX5_FLOW_HW_TAGS_MAX);
@@ -1922,10 +1930,10 @@ mlx5_aso_meter_by_idx(struct mlx5_priv *priv, uint32_t idx)
 	struct mlx5_aso_mtr_pools_mng *pools_mng =
 				&priv->sh->mtrmng->pools_mng;
 
-	/* Decrease to original index. */
-	idx--;
 	if (priv->mtr_bulk.aso)
 		return priv->mtr_bulk.aso + idx;
+	/* Decrease to original index. */
+	idx--;
 	MLX5_ASSERT(idx / MLX5_ASO_MTRS_PER_POOL < pools_mng->n);
 	rte_rwlock_read_lock(&pools_mng->resize_mtrwl);
 	pool = pools_mng->pools[idx / MLX5_ASO_MTRS_PER_POOL];
diff --git a/drivers/net/mlx5/mlx5_flow_aso.c b/drivers/net/mlx5/mlx5_flow_aso.c
index c00c07b891..a5f58301eb 100644
--- a/drivers/net/mlx5/mlx5_flow_aso.c
+++ b/drivers/net/mlx5/mlx5_flow_aso.c
@@ -275,6 +275,65 @@ mlx5_aso_sq_create(struct mlx5_common_device *cdev, struct mlx5_aso_sq *sq,
 	return -1;
 }
 
+void
+mlx5_aso_mtr_queue_uninit(struct mlx5_dev_ctx_shared *sh __rte_unused,
+			  struct mlx5_aso_mtr_pool *hws_pool,
+			  struct mlx5_aso_mtr_pools_mng *pool_mng)
+{
+	uint32_t i;
+
+	if (hws_pool) {
+		for (i = 0; i < hws_pool->nb_sq; i++)
+			mlx5_aso_destroy_sq(hws_pool->sq + i);
+		mlx5_free(hws_pool->sq);
+		return;
+	}
+	if (pool_mng)
+		mlx5_aso_destroy_sq(&pool_mng->sq);
+}
+
+int
+mlx5_aso_mtr_queue_init(struct mlx5_dev_ctx_shared *sh,
+				struct mlx5_aso_mtr_pool *hws_pool,
+				struct mlx5_aso_mtr_pools_mng *pool_mng,
+				uint32_t nb_queues)
+{
+	struct mlx5_common_device *cdev = sh->cdev;
+	struct mlx5_aso_sq *sq;
+	uint32_t i;
+
+	if (hws_pool) {
+		sq = mlx5_malloc(MLX5_MEM_ZERO,
+			sizeof(struct mlx5_aso_sq) * nb_queues,
+			RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
+		if (!sq)
+			return -1;
+		hws_pool->sq = sq;
+		for (i = 0; i < nb_queues; i++) {
+			if (mlx5_aso_sq_create(cdev, hws_pool->sq + i,
+					       sh->tx_uar.obj,
+					       MLX5_ASO_QUEUE_LOG_DESC))
+				goto error;
+			mlx5_aso_mtr_init_sq(hws_pool->sq + i);
+		}
+		hws_pool->nb_sq = nb_queues;
+	}
+	if (pool_mng) {
+		if (mlx5_aso_sq_create(cdev, &pool_mng->sq,
+				       sh->tx_uar.obj,
+				       MLX5_ASO_QUEUE_LOG_DESC))
+			return -1;
+		mlx5_aso_mtr_init_sq(&pool_mng->sq);
+	}
+	return 0;
+error:
+	do {
+		if (&hws_pool->sq[i])
+			mlx5_aso_destroy_sq(hws_pool->sq + i);
+	} while (i--);
+	return -1;
+}
+
 /**
  * API to create and initialize Send Queue used for ASO access.
  *
@@ -282,13 +341,16 @@ mlx5_aso_sq_create(struct mlx5_common_device *cdev, struct mlx5_aso_sq *sq,
  *   Pointer to shared device context.
  * @param[in] aso_opc_mod
  *   Mode of ASO feature.
+ * @param[in] nb_queues
+ *   Number of Send Queues to create.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
 mlx5_aso_queue_init(struct mlx5_dev_ctx_shared *sh,
-		    enum mlx5_access_aso_opc_mod aso_opc_mod)
+		    enum mlx5_access_aso_opc_mod aso_opc_mod,
+			uint32_t nb_queues)
 {
 	uint32_t sq_desc_n = 1 << MLX5_ASO_QUEUE_LOG_DESC;
 	struct mlx5_common_device *cdev = sh->cdev;
@@ -307,10 +369,9 @@ mlx5_aso_queue_init(struct mlx5_dev_ctx_shared *sh,
 		mlx5_aso_age_init_sq(&sh->aso_age_mng->aso_sq);
 		break;
 	case ASO_OPC_MOD_POLICER:
-		if (mlx5_aso_sq_create(cdev, &sh->mtrmng->pools_mng.sq,
-				       sh->tx_uar.obj, MLX5_ASO_QUEUE_LOG_DESC))
+		if (mlx5_aso_mtr_queue_init(sh, NULL,
+					    &sh->mtrmng->pools_mng, nb_queues))
 			return -1;
-		mlx5_aso_mtr_init_sq(&sh->mtrmng->pools_mng.sq);
 		break;
 	case ASO_OPC_MOD_CONNECTION_TRACKING:
 		if (mlx5_aso_ct_queue_init(sh, sh->ct_mng, MLX5_ASO_CT_SQ_NUM))
@@ -343,7 +404,7 @@ mlx5_aso_queue_uninit(struct mlx5_dev_ctx_shared *sh,
 		sq = &sh->aso_age_mng->aso_sq;
 		break;
 	case ASO_OPC_MOD_POLICER:
-		sq = &sh->mtrmng->pools_mng.sq;
+		mlx5_aso_mtr_queue_uninit(sh, NULL, &sh->mtrmng->pools_mng);
 		break;
 	case ASO_OPC_MOD_CONNECTION_TRACKING:
 		mlx5_aso_ct_queue_uninit(sh, sh->ct_mng);
@@ -666,7 +727,8 @@ static uint16_t
 mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 			       struct mlx5_aso_sq *sq,
 			       struct mlx5_aso_mtr *aso_mtr,
-			       struct mlx5_mtr_bulk *bulk)
+			       struct mlx5_mtr_bulk *bulk,
+				   bool need_lock)
 {
 	volatile struct mlx5_aso_wqe *wqe = NULL;
 	struct mlx5_flow_meter_info *fm = NULL;
@@ -679,11 +741,13 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	uint32_t param_le;
 	int id;
 
-	rte_spinlock_lock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_lock(&sq->sqsl);
 	res = size - (uint16_t)(sq->head - sq->tail);
 	if (unlikely(!res)) {
 		DRV_LOG(ERR, "Fail: SQ is full and no free WQE to send");
-		rte_spinlock_unlock(&sq->sqsl);
+		if (need_lock)
+			rte_spinlock_unlock(&sq->sqsl);
 		return 0;
 	}
 	wqe = &sq->sq_obj.aso_wqes[sq->head & mask];
@@ -692,8 +756,11 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	fm = &aso_mtr->fm;
 	sq->elts[sq->head & mask].mtr = aso_mtr;
 	if (aso_mtr->type == ASO_METER_INDIRECT) {
-		pool = container_of(aso_mtr, struct mlx5_aso_mtr_pool,
-				    mtrs[aso_mtr->offset]);
+		if (likely(sh->config.dv_flow_en == 2))
+			pool = aso_mtr->pool;
+		else
+			pool = container_of(aso_mtr, struct mlx5_aso_mtr_pool,
+					    mtrs[aso_mtr->offset]);
 		id = pool->devx_obj->id;
 	} else {
 		id = bulk->devx_obj->id;
@@ -756,7 +823,8 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
 			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
 			   !sh->tx_uar.dbnc);
-	rte_spinlock_unlock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_unlock(&sq->sqsl);
 	return 1;
 }
 
@@ -779,7 +847,7 @@ mlx5_aso_mtrs_status_update(struct mlx5_aso_sq *sq, uint16_t aso_mtrs_nums)
 }
 
 static void
-mlx5_aso_mtr_completion_handle(struct mlx5_aso_sq *sq)
+mlx5_aso_mtr_completion_handle(struct mlx5_aso_sq *sq, bool need_lock)
 {
 	struct mlx5_aso_cq *cq = &sq->cq;
 	volatile struct mlx5_cqe *restrict cqe;
@@ -791,7 +859,8 @@ mlx5_aso_mtr_completion_handle(struct mlx5_aso_sq *sq)
 	uint16_t n = 0;
 	int ret;
 
-	rte_spinlock_lock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_lock(&sq->sqsl);
 	max = (uint16_t)(sq->head - sq->tail);
 	if (unlikely(!max)) {
 		rte_spinlock_unlock(&sq->sqsl);
@@ -823,7 +892,8 @@ mlx5_aso_mtr_completion_handle(struct mlx5_aso_sq *sq)
 		rte_io_wmb();
 		cq->cq_obj.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci);
 	}
-	rte_spinlock_unlock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_unlock(&sq->sqsl);
 }
 
 /**
@@ -840,16 +910,31 @@ mlx5_aso_mtr_completion_handle(struct mlx5_aso_sq *sq)
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
+mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			struct mlx5_aso_mtr *mtr,
 			struct mlx5_mtr_bulk *bulk)
 {
-	struct mlx5_aso_sq *sq = &sh->mtrmng->pools_mng.sq;
+	struct mlx5_aso_sq *sq;
 	uint32_t poll_wqe_times = MLX5_MTR_POLL_WQE_CQE_TIMES;
+	bool need_lock;
 
+	if (likely(sh->config.dv_flow_en == 2) &&
+	    mtr->type == ASO_METER_INDIRECT) {
+		if (queue == MLX5_HW_INV_QUEUE) {
+			sq = &mtr->pool->sq[mtr->pool->nb_sq - 1];
+			need_lock = true;
+		} else {
+			sq = &mtr->pool->sq[queue];
+			need_lock = false;
+		}
+	} else {
+		sq = &sh->mtrmng->pools_mng.sq;
+		need_lock = true;
+	}
 	do {
-		mlx5_aso_mtr_completion_handle(sq);
-		if (mlx5_aso_mtr_sq_enqueue_single(sh, sq, mtr, bulk))
+		mlx5_aso_mtr_completion_handle(sq, need_lock);
+		if (mlx5_aso_mtr_sq_enqueue_single(sh, sq, mtr,
+						   bulk, need_lock))
 			return 0;
 		/* Waiting for wqe resource. */
 		rte_delay_us_sleep(MLX5_ASO_WQE_CQE_RESPONSE_DELAY);
@@ -873,17 +958,31 @@ mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh,
+mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			struct mlx5_aso_mtr *mtr)
 {
-	struct mlx5_aso_sq *sq = &sh->mtrmng->pools_mng.sq;
+	struct mlx5_aso_sq *sq;
 	uint32_t poll_cqe_times = MLX5_MTR_POLL_WQE_CQE_TIMES;
+	bool need_lock;
 
+	if (likely(sh->config.dv_flow_en == 2) &&
+	    mtr->type == ASO_METER_INDIRECT) {
+		if (queue == MLX5_HW_INV_QUEUE) {
+			sq = &mtr->pool->sq[mtr->pool->nb_sq - 1];
+			need_lock = true;
+		} else {
+			sq = &mtr->pool->sq[queue];
+			need_lock = false;
+		}
+	} else {
+		sq = &sh->mtrmng->pools_mng.sq;
+		need_lock = true;
+	}
 	if (__atomic_load_n(&mtr->state, __ATOMIC_RELAXED) ==
 					    ASO_METER_READY)
 		return 0;
 	do {
-		mlx5_aso_mtr_completion_handle(sq);
+		mlx5_aso_mtr_completion_handle(sq, need_lock);
 		if (__atomic_load_n(&mtr->state, __ATOMIC_RELAXED) ==
 					    ASO_METER_READY)
 			return 0;
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 7efc936ddd..868fa6e1a5 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -1387,6 +1387,7 @@ mlx5_flow_item_field_width(struct rte_eth_dev *dev,
 		return inherit < 0 ? 0 : inherit;
 	case RTE_FLOW_FIELD_IPV4_ECN:
 	case RTE_FLOW_FIELD_IPV6_ECN:
+	case RTE_FLOW_FIELD_METER_COLOR:
 		return 2;
 	default:
 		MLX5_ASSERT(false);
@@ -1856,6 +1857,31 @@ mlx5_flow_field_id_to_modify_info
 				info[idx].offset = data->offset;
 		}
 		break;
+	case RTE_FLOW_FIELD_METER_COLOR:
+		{
+			const uint32_t color_mask =
+				(UINT32_C(1) << MLX5_MTR_COLOR_BITS) - 1;
+			int reg;
+
+			if (priv->sh->config.dv_flow_en == 2)
+				reg = flow_hw_get_reg_id
+					(RTE_FLOW_ITEM_TYPE_METER_COLOR, 0);
+			else
+				reg = mlx5_flow_get_reg_id(dev, MLX5_MTR_COLOR,
+						       0, error);
+			if (reg < 0)
+				return;
+			MLX5_ASSERT(reg != REG_NON);
+			MLX5_ASSERT((unsigned int)reg < RTE_DIM(reg_to_field));
+			info[idx] = (struct field_modify_info){4, 0,
+						reg_to_field[reg]};
+			if (mask)
+				mask[idx] = flow_modify_info_mask_32_masked
+					(width, data->offset, color_mask);
+			else
+				info[idx].offset = data->offset;
+		}
+		break;
 	case RTE_FLOW_FIELD_POINTER:
 	case RTE_FLOW_FIELD_VALUE:
 	default:
@@ -1913,7 +1939,9 @@ flow_dv_convert_action_modify_field
 		item.spec = conf->src.field == RTE_FLOW_FIELD_POINTER ?
 					(void *)(uintptr_t)conf->src.pvalue :
 					(void *)(uintptr_t)&conf->src.value;
-		if (conf->dst.field == RTE_FLOW_FIELD_META) {
+		if (conf->dst.field == RTE_FLOW_FIELD_META ||
+		    conf->dst.field == RTE_FLOW_FIELD_TAG ||
+		    conf->dst.field == RTE_FLOW_FIELD_METER_COLOR) {
 			meta = *(const unaligned_uint32_t *)item.spec;
 			meta = rte_cpu_to_be_32(meta);
 			item.spec = &meta;
@@ -3687,6 +3715,69 @@ flow_dv_validate_action_aso_ct(struct rte_eth_dev *dev,
 	return 0;
 }
 
+/**
+ * Validate METER_COLOR item.
+ *
+ * @param[in] dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param[in] item
+ *   Item specification.
+ * @param[in] attr
+ *   Attributes of flow that includes this item.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_dv_validate_item_meter_color(struct rte_eth_dev *dev,
+			   const struct rte_flow_item *item,
+			   const struct rte_flow_attr *attr __rte_unused,
+			   struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_item_meter_color *spec = item->spec;
+	const struct rte_flow_item_meter_color *mask = item->mask;
+	struct rte_flow_item_meter_color nic_mask = {
+		.color = RTE_COLORS
+	};
+	int ret;
+
+	if (priv->mtr_color_reg == REG_NON)
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_ITEM, item,
+					  "meter color register"
+					  " isn't available");
+	ret = mlx5_flow_get_reg_id(dev, MLX5_MTR_COLOR, 0, error);
+	if (ret < 0)
+		return ret;
+	if (!spec)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ITEM_SPEC,
+					  item->spec,
+					  "data cannot be empty");
+	if (spec->color > RTE_COLORS)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION_CONF,
+					  &spec->color,
+					  "meter color is invalid");
+	if (!mask)
+		mask = &rte_flow_item_meter_color_mask;
+	if (!mask->color)
+		return rte_flow_error_set(error, EINVAL,
+					RTE_FLOW_ERROR_TYPE_ITEM_SPEC, NULL,
+					"mask cannot be zero");
+
+	ret = mlx5_flow_item_acceptable(item, (const uint8_t *)mask,
+				(const uint8_t *)&nic_mask,
+				sizeof(struct rte_flow_item_meter_color),
+				MLX5_ITEM_RANGE_NOT_ACCEPTED, error);
+	if (ret < 0)
+		return ret;
+	return 0;
+}
+
 int
 flow_dv_encap_decap_match_cb(void *tool_ctx __rte_unused,
 			     struct mlx5_list_entry *entry, void *cb_ctx)
@@ -6519,7 +6610,7 @@ flow_dv_mtr_container_resize(struct rte_eth_dev *dev)
 		return -ENOMEM;
 	}
 	if (!pools_mng->n)
-		if (mlx5_aso_queue_init(priv->sh, ASO_OPC_MOD_POLICER)) {
+		if (mlx5_aso_queue_init(priv->sh, ASO_OPC_MOD_POLICER, 1)) {
 			mlx5_free(pools);
 			return -ENOMEM;
 		}
@@ -7421,6 +7512,13 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			if (ret < 0)
 				return ret;
 			break;
+		case RTE_FLOW_ITEM_TYPE_METER_COLOR:
+			ret = flow_dv_validate_item_meter_color(dev, items,
+								attr, error);
+			if (ret < 0)
+				return ret;
+			last_item = MLX5_FLOW_ITEM_METER_COLOR;
+			break;
 		default:
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ITEM,
@@ -10508,6 +10606,45 @@ flow_dv_translate_item_flex(struct rte_eth_dev *dev, void *matcher, void *key,
 	mlx5_flex_flow_translate_item(dev, matcher, key, item, is_inner);
 }
 
+/**
+ * Add METER_COLOR item to matcher
+ *
+ * @param[in] dev
+ *   The device to configure through.
+ * @param[in, out] key
+ *   Flow matcher value.
+ * @param[in] item
+ *   Flow pattern to translate.
+ * @param[in] key_type
+ *   Set flow matcher mask or value.
+ */
+static void
+flow_dv_translate_item_meter_color(struct rte_eth_dev *dev, void *key,
+			    const struct rte_flow_item *item,
+			    uint32_t key_type)
+{
+	const struct rte_flow_item_meter_color *color_m = item->mask;
+	const struct rte_flow_item_meter_color *color_v = item->spec;
+	uint32_t value, mask;
+	int reg = REG_NON;
+
+	MLX5_ASSERT(color_v);
+	if (MLX5_ITEM_VALID(item, key_type))
+		return;
+	MLX5_ITEM_UPDATE(item, key_type, color_v, color_m,
+		&rte_flow_item_meter_color_mask);
+	value = rte_col_2_mlx5_col(color_v->color);
+	mask = color_m ?
+		color_m->color : (UINT32_C(1) << MLX5_MTR_COLOR_BITS) - 1;
+	if (!!(key_type & MLX5_SET_MATCHER_SW))
+		reg = mlx5_flow_get_reg_id(dev, MLX5_MTR_COLOR, 0, NULL);
+	else
+		reg = flow_hw_get_reg_id(RTE_FLOW_ITEM_TYPE_METER_COLOR, 0);
+	if (reg == REG_NON)
+		return;
+	flow_dv_match_meta_reg(key, (enum modify_reg)reg, value, mask);
+}
+
 static uint32_t matcher_zero[MLX5_ST_SZ_DW(fte_match_param)] = { 0 };
 
 #define HEADER_IS_ZERO(match_criteria, headers)				     \
@@ -13260,6 +13397,10 @@ flow_dv_translate_items(struct rte_eth_dev *dev,
 		/* No other protocol should follow eCPRI layer. */
 		last_item = MLX5_FLOW_LAYER_ECPRI;
 		break;
+	case RTE_FLOW_ITEM_TYPE_METER_COLOR:
+		flow_dv_translate_item_meter_color(dev, key, items, key_type);
+		last_item = MLX5_FLOW_ITEM_METER_COLOR;
+		break;
 	default:
 		break;
 	}
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 0c110819e6..52125c861e 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -412,6 +412,10 @@ __flow_hw_action_template_destroy(struct rte_eth_dev *dev,
 		mlx5_hws_cnt_shared_put(priv->hws_cpool, &acts->cnt_id);
 		acts->cnt_id = 0;
 	}
+	if (acts->mtr_id) {
+		mlx5_ipool_free(priv->hws_mpool->idx_pool, acts->mtr_id);
+		acts->mtr_id = 0;
+	}
 }
 
 /**
@@ -628,6 +632,42 @@ __flow_hw_act_data_shared_cnt_append(struct mlx5_priv *priv,
 	return 0;
 }
 
+/**
+ * Append shared meter_mark action to the dynamic action list.
+ *
+ * @param[in] priv
+ *   Pointer to the port private data structure.
+ * @param[in] acts
+ *   Pointer to the template HW steering DR actions.
+ * @param[in] type
+ *   Action type.
+ * @param[in] action_src
+ *   Offset of source rte flow action.
+ * @param[in] action_dst
+ *   Offset of destination DR action.
+ * @param[in] mtr_id
+ *   Shared meter id.
+ *
+ * @return
+ *    0 on success, negative value otherwise and rte_errno is set.
+ */
+static __rte_always_inline int
+__flow_hw_act_data_shared_mtr_append(struct mlx5_priv *priv,
+				     struct mlx5_hw_actions *acts,
+				     enum rte_flow_action_type type,
+				     uint16_t action_src,
+				     uint16_t action_dst,
+				     cnt_id_t mtr_id)
+{	struct mlx5_action_construct_data *act_data;
+
+	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
+	if (!act_data)
+		return -1;
+	act_data->type = type;
+	act_data->shared_meter.id = mtr_id;
+	LIST_INSERT_HEAD(&acts->act_list, act_data, next);
+	return 0;
+}
 
 /**
  * Translate shared indirect action.
@@ -682,6 +722,13 @@ flow_hw_shared_action_translate(struct rte_eth_dev *dev,
 				       idx, &acts->rule_acts[action_dst]))
 			return -1;
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
+		if (__flow_hw_act_data_shared_mtr_append(priv, acts,
+			(enum rte_flow_action_type)
+			MLX5_RTE_FLOW_ACTION_TYPE_METER_MARK,
+			action_src, action_dst, idx))
+			return -1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type:%d", type);
 		break;
@@ -888,6 +935,7 @@ flow_hw_modify_field_compile(struct rte_eth_dev *dev,
 				(void *)(uintptr_t)&conf->src.value;
 		if (conf->dst.field == RTE_FLOW_FIELD_META ||
 		    conf->dst.field == RTE_FLOW_FIELD_TAG ||
+		    conf->dst.field == RTE_FLOW_FIELD_METER_COLOR ||
 		    conf->dst.field == (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG) {
 			value = *(const unaligned_uint32_t *)item.spec;
 			value = rte_cpu_to_be_32(value);
@@ -1047,7 +1095,7 @@ flow_hw_meter_compile(struct rte_eth_dev *dev,
 	acts->rule_acts[jump_pos].action = (!!group) ?
 				    acts->jump->hws_action :
 				    acts->jump->root_action;
-	if (mlx5_aso_mtr_wait(priv->sh, aso_mtr))
+	if (mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr))
 		return -ENOMEM;
 	return 0;
 }
@@ -1121,6 +1169,74 @@ static rte_be32_t vlan_hdr_to_be32(const struct rte_flow_action *actions)
 #endif
 }
 
+static __rte_always_inline struct mlx5_aso_mtr *
+flow_hw_meter_mark_alloc(struct rte_eth_dev *dev,
+			   const struct rte_flow_action *action,
+			   uint32_t queue)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
+	const struct rte_flow_action_meter_mark *meter_mark = action->conf;
+	struct mlx5_aso_mtr *aso_mtr;
+	struct mlx5_flow_meter_info *fm;
+	uint32_t mtr_id;
+
+	aso_mtr = mlx5_ipool_malloc(priv->hws_mpool->idx_pool, &mtr_id);
+	if (!aso_mtr)
+		return NULL;
+	/* Fill the flow meter parameters. */
+	aso_mtr->type = ASO_METER_INDIRECT;
+	fm = &aso_mtr->fm;
+	fm->meter_id = mtr_id;
+	fm->profile = (struct mlx5_flow_meter_profile *)(meter_mark->profile);
+	fm->is_enable = meter_mark->state;
+	fm->color_aware = meter_mark->color_mode;
+	aso_mtr->pool = pool;
+	aso_mtr->state = ASO_METER_WAIT;
+	aso_mtr->offset = mtr_id - 1;
+	aso_mtr->init_color = (meter_mark->color_mode) ?
+		meter_mark->init_color : RTE_COLOR_GREEN;
+	/* Update ASO flow meter by wqe. */
+	if (mlx5_aso_meter_update_by_wqe(priv->sh, queue, aso_mtr,
+					 &priv->mtr_bulk)) {
+		mlx5_ipool_free(pool->idx_pool, mtr_id);
+		return NULL;
+	}
+	/* Wait for ASO object completion. */
+	if (queue == MLX5_HW_INV_QUEUE &&
+	    mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr)) {
+		mlx5_ipool_free(pool->idx_pool, mtr_id);
+		return NULL;
+	}
+	return aso_mtr;
+}
+
+static __rte_always_inline int
+flow_hw_meter_mark_compile(struct rte_eth_dev *dev,
+			   uint16_t aso_mtr_pos,
+			   const struct rte_flow_action *action,
+			   struct mlx5dr_rule_action *acts,
+			   uint32_t *index,
+			   uint32_t queue)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
+	struct mlx5_aso_mtr *aso_mtr;
+
+	aso_mtr = flow_hw_meter_mark_alloc(dev, action, queue);
+	if (!aso_mtr)
+		return -1;
+
+	/* Compile METER_MARK action */
+	acts[aso_mtr_pos].action = pool->action;
+	acts[aso_mtr_pos].aso_meter.offset = aso_mtr->offset;
+	acts[aso_mtr_pos].aso_meter.init_color =
+		(enum mlx5dr_action_aso_meter_color)
+		rte_col_2_mlx5_col(aso_mtr->init_color);
+	*index = aso_mtr->fm.meter_id;
+	return 0;
+}
+
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -1428,6 +1544,24 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
 				goto err;
 			}
 			break;
+		case RTE_FLOW_ACTION_TYPE_METER_MARK:
+			action_pos = at->actions_off[actions - at->actions];
+			if (actions->conf && masks->conf &&
+			    ((const struct rte_flow_action_meter_mark *)
+			     masks->conf)->profile) {
+				err = flow_hw_meter_mark_compile(dev,
+							action_pos, actions,
+							acts->rule_acts,
+							&acts->mtr_id,
+							MLX5_HW_INV_QUEUE);
+				if (err)
+					goto err;
+			} else if (__flow_hw_act_data_general_append(priv, acts,
+							actions->type,
+							actions - action_start,
+							action_pos))
+				goto err;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -1624,8 +1758,10 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev, uint32_t queue,
 				struct mlx5dr_rule_action *rule_act)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
 	struct mlx5_action_construct_data act_data;
 	struct mlx5_shared_action_rss *shared_rss;
+	struct mlx5_aso_mtr *aso_mtr;
 	uint32_t act_idx = (uint32_t)(uintptr_t)action->conf;
 	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
 	uint32_t idx = act_idx &
@@ -1661,6 +1797,17 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev, uint32_t queue,
 		if (flow_hw_ct_compile(dev, queue, idx, rule_act))
 			return -1;
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
+		/* Find ASO object. */
+		aso_mtr = mlx5_ipool_get(pool->idx_pool, idx);
+		if (!aso_mtr)
+			return -1;
+		rule_act->action = pool->action;
+		rule_act->aso_meter.offset = aso_mtr->offset;
+		rule_act->aso_meter.init_color =
+			(enum mlx5dr_action_aso_meter_color)
+			rte_col_2_mlx5_col(aso_mtr->init_color);
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type:%d", type);
 		break;
@@ -1730,6 +1877,7 @@ flow_hw_modify_field_construct(struct mlx5_hw_q_job *job,
 		rte_memcpy(values, mhdr_action->src.pvalue, sizeof(values));
 	if (mhdr_action->dst.field == RTE_FLOW_FIELD_META ||
 	    mhdr_action->dst.field == RTE_FLOW_FIELD_TAG ||
+	    mhdr_action->dst.field == RTE_FLOW_FIELD_METER_COLOR ||
 	    mhdr_action->dst.field == (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG) {
 		value_p = (unaligned_uint32_t *)values;
 		*value_p = rte_cpu_to_be_32(*value_p);
@@ -1807,6 +1955,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			  uint32_t queue)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
 	struct rte_flow_template_table *table = job->flow->table;
 	struct mlx5_action_construct_data *act_data;
 	const struct rte_flow_actions_template *at = hw_at->action_template;
@@ -1823,8 +1972,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	uint32_t ft_flag;
 	size_t encap_len = 0;
 	int ret;
-	struct mlx5_aso_mtr *mtr;
-	uint32_t mtr_id;
+	struct mlx5_aso_mtr *aso_mtr;
 
 	rte_memcpy(rule_acts, hw_acts->rule_acts, sizeof(*rule_acts) * at->dr_actions_num);
 	attr.group = table->grp->group_id;
@@ -1858,6 +2006,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		struct mlx5_hrxq *hrxq;
 		uint32_t ct_idx;
 		cnt_id_t cnt_id;
+		uint32_t mtr_id;
 
 		action = &actions[act_data->action_src];
 		/*
@@ -1964,13 +2113,13 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_METER:
 			meter = action->conf;
 			mtr_id = meter->mtr_id;
-			mtr = mlx5_aso_meter_by_idx(priv, mtr_id);
+			aso_mtr = mlx5_aso_meter_by_idx(priv, mtr_id);
 			rule_acts[act_data->action_dst].action =
 				priv->mtr_bulk.action;
 			rule_acts[act_data->action_dst].aso_meter.offset =
-								mtr->offset;
+								aso_mtr->offset;
 			jump = flow_hw_jump_action_register
-				(dev, &table->cfg, mtr->fm.group, NULL);
+				(dev, &table->cfg, aso_mtr->fm.group, NULL);
 			if (!jump)
 				return -1;
 			MLX5_ASSERT
@@ -1980,7 +2129,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 							 jump->root_action;
 			job->flow->jump = jump;
 			job->flow->fate_type = MLX5_FLOW_FATE_JUMP;
-			if (mlx5_aso_mtr_wait(priv->sh, mtr))
+			if (mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr))
 				return -1;
 			break;
 		case RTE_FLOW_ACTION_TYPE_COUNT:
@@ -2016,6 +2165,28 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 					       &rule_acts[act_data->action_dst]))
 				return -1;
 			break;
+		case MLX5_RTE_FLOW_ACTION_TYPE_METER_MARK:
+			mtr_id = act_data->shared_meter.id &
+				((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
+			/* Find ASO object. */
+			aso_mtr = mlx5_ipool_get(pool->idx_pool, mtr_id);
+			if (!aso_mtr)
+				return -1;
+			rule_acts[act_data->action_dst].action =
+							pool->action;
+			rule_acts[act_data->action_dst].aso_meter.offset =
+							aso_mtr->offset;
+			rule_acts[act_data->action_dst].aso_meter.init_color =
+				(enum mlx5dr_action_aso_meter_color)
+				rte_col_2_mlx5_col(aso_mtr->init_color);
+			break;
+		case RTE_FLOW_ACTION_TYPE_METER_MARK:
+			ret = flow_hw_meter_mark_compile(dev,
+				act_data->action_dst, action,
+				rule_acts, &job->flow->mtr_id, queue);
+			if (ret != 0)
+				return ret;
+			break;
 		default:
 			break;
 		}
@@ -2283,6 +2454,7 @@ flow_hw_pull(struct rte_eth_dev *dev,
 	     struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
 	struct mlx5_hw_q_job *job;
 	int ret, i;
 
@@ -2307,6 +2479,10 @@ flow_hw_pull(struct rte_eth_dev *dev,
 						&job->flow->cnt_id);
 				job->flow->cnt_id = 0;
 			}
+			if (job->flow->mtr_id) {
+				mlx5_ipool_free(pool->idx_pool, job->flow->mtr_id);
+				job->flow->mtr_id = 0;
+			}
 			mlx5_ipool_free(job->flow->table->flow, job->flow->idx);
 		}
 		priv->hw_q[queue].job[priv->hw_q[queue].job_idx++] = job;
@@ -3189,6 +3365,9 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_METER:
 			/* TODO: Validation logic */
 			break;
+		case RTE_FLOW_ACTION_TYPE_METER_MARK:
+			/* TODO: Validation logic */
+			break;
 		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
 			ret = flow_hw_validate_action_modify_field(action,
 									mask,
@@ -3282,6 +3461,11 @@ flow_hw_dr_actions_template_handle_shared(const struct rte_flow_action *mask,
 		action_types[*curr_off] = MLX5DR_ACTION_TYP_ASO_CT;
 		*curr_off = *curr_off + 1;
 		break;
+	case RTE_FLOW_ACTION_TYPE_METER_MARK:
+		at->actions_off[action_src] = *curr_off;
+		action_types[*curr_off] = MLX5DR_ACTION_TYP_ASO_METER;
+		*curr_off = *curr_off + 1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type: %d", type);
 		return -EINVAL;
@@ -3373,6 +3557,12 @@ flow_hw_dr_actions_template_create(struct rte_flow_actions_template *at)
 				MLX5_HW_VLAN_PUSH_PCP_IDX :
 				MLX5_HW_VLAN_PUSH_VID_IDX;
 			break;
+		case RTE_FLOW_ACTION_TYPE_METER_MARK:
+			at->actions_off[i] = curr_off;
+			action_types[curr_off++] = MLX5DR_ACTION_TYP_ASO_METER;
+			if (curr_off >= MLX5_HW_MAX_ACTS)
+				goto err_actions_num;
+			break;
 		default:
 			type = mlx5_hw_dr_action_types[at->actions[i].type];
 			at->actions_off[i] = curr_off;
@@ -3848,6 +4038,16 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 								  " attribute");
 			}
 			break;
+		case RTE_FLOW_ITEM_TYPE_METER_COLOR:
+		{
+			int reg = flow_hw_get_reg_id(RTE_FLOW_ITEM_TYPE_METER_COLOR, 0);
+			if (reg == REG_NON)
+				return rte_flow_error_set(error, EINVAL,
+							  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+							  NULL,
+							  "Unsupported meter color register");
+			break;
+		}
 		case RTE_FLOW_ITEM_TYPE_VOID:
 		case RTE_FLOW_ITEM_TYPE_ETH:
 		case RTE_FLOW_ITEM_TYPE_VLAN:
@@ -5357,7 +5557,7 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	LIST_INIT(&priv->hw_ctrl_flows);
 	/* Initialize meter library*/
 	if (port_attr->nb_meters)
-		if (mlx5_flow_meter_init(dev, port_attr->nb_meters, 1, 1))
+		if (mlx5_flow_meter_init(dev, port_attr->nb_meters, 1, 1, nb_q_updated))
 			goto err;
 	/* Add global actions. */
 	for (i = 0; i < MLX5_HW_ACTION_FLAG_MAX; i++) {
@@ -5861,7 +6061,9 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 {
 	struct rte_flow_action_handle *handle = NULL;
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr *aso_mtr;
 	cnt_id_t cnt_id;
+	uint32_t mtr_id;
 
 	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
@@ -5880,6 +6082,14 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
 		handle = flow_hw_conntrack_create(dev, queue, action->conf, error);
 		break;
+	case RTE_FLOW_ACTION_TYPE_METER_MARK:
+		aso_mtr = flow_hw_meter_mark_alloc(dev, action, queue);
+		if (!aso_mtr)
+			break;
+		mtr_id = (MLX5_INDIRECT_ACTION_TYPE_METER_MARK <<
+			MLX5_INDIRECT_ACTION_TYPE_OFFSET) | (aso_mtr->fm.meter_id);
+		handle = (struct rte_flow_action_handle *)(uintptr_t)mtr_id;
+		break;
 	default:
 		handle = flow_dv_action_create(dev, conf, action, error);
 	}
@@ -5915,18 +6125,59 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 			     void *user_data,
 			     struct rte_flow_error *error)
 {
-	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
-	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
-
 	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
 	RTE_SET_USED(user_data);
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
+	const struct rte_flow_update_meter_mark *upd_meter_mark =
+		(const struct rte_flow_update_meter_mark *)update;
+	const struct rte_flow_action_meter_mark *meter_mark;
+	struct mlx5_aso_mtr *aso_mtr;
+	struct mlx5_flow_meter_info *fm;
+	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
+	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+	uint32_t idx = act_idx & ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
+
 	switch (type) {
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
 		return flow_hw_conntrack_update(dev, queue, update, act_idx, error);
+	case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
+		meter_mark = &upd_meter_mark->meter_mark;
+		/* Find ASO object. */
+		aso_mtr = mlx5_ipool_get(pool->idx_pool, idx);
+		if (!aso_mtr)
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL, "Invalid meter_mark update index");
+		fm = &aso_mtr->fm;
+		if (upd_meter_mark->profile_valid)
+			fm->profile = (struct mlx5_flow_meter_profile *)
+							(meter_mark->profile);
+		if (upd_meter_mark->color_mode_valid)
+			fm->color_aware = meter_mark->color_mode;
+		if (upd_meter_mark->init_color_valid)
+			aso_mtr->init_color = (meter_mark->color_mode) ?
+				meter_mark->init_color : RTE_COLOR_GREEN;
+		if (upd_meter_mark->state_valid)
+			fm->is_enable = meter_mark->state;
+		/* Update ASO flow meter by wqe. */
+		if (mlx5_aso_meter_update_by_wqe(priv->sh, queue,
+						 aso_mtr, &priv->mtr_bulk))
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL, "Unable to update ASO meter WQE");
+		/* Wait for ASO object completion. */
+		if (queue == MLX5_HW_INV_QUEUE &&
+		    mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr))
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL, "Unable to wait for ASO meter CQE");
+		return 0;
 	default:
-		return flow_dv_action_update(dev, handle, update, error);
+		break;
 	}
+	return flow_dv_action_update(dev, handle, update, error);
 }
 
 /**
@@ -5957,7 +6208,11 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 {
 	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
 	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+	uint32_t idx = act_idx & ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
+	struct mlx5_aso_mtr *aso_mtr;
+	struct mlx5_flow_meter_info *fm;
 
 	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
@@ -5967,6 +6222,28 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 		return mlx5_hws_cnt_shared_put(priv->hws_cpool, &act_idx);
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
 		return flow_hw_conntrack_destroy(dev, act_idx, error);
+	case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
+		aso_mtr = mlx5_ipool_get(pool->idx_pool, idx);
+		if (!aso_mtr)
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL, "Invalid meter_mark destroy index");
+		fm = &aso_mtr->fm;
+		fm->is_enable = 0;
+		/* Update ASO flow meter by wqe. */
+		if (mlx5_aso_meter_update_by_wqe(priv->sh, queue, aso_mtr,
+						 &priv->mtr_bulk))
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL, "Unable to update ASO meter WQE");
+		/* Wait for ASO object completion. */
+		if (queue == MLX5_HW_INV_QUEUE &&
+		    mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr))
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL, "Unable to wait for ASO meter CQE");
+		mlx5_ipool_free(pool->idx_pool, idx);
+		return 0;
 	default:
 		return flow_dv_action_destroy(dev, handle, error);
 	}
@@ -6050,8 +6327,8 @@ flow_hw_action_create(struct rte_eth_dev *dev,
 		       const struct rte_flow_action *action,
 		       struct rte_flow_error *err)
 {
-	return flow_hw_action_handle_create(dev, UINT32_MAX, NULL, conf, action,
-					    NULL, err);
+	return flow_hw_action_handle_create(dev, MLX5_HW_INV_QUEUE,
+					    NULL, conf, action, NULL, err);
 }
 
 /**
@@ -6076,8 +6353,8 @@ flow_hw_action_destroy(struct rte_eth_dev *dev,
 		       struct rte_flow_action_handle *handle,
 		       struct rte_flow_error *error)
 {
-	return flow_hw_action_handle_destroy(dev, UINT32_MAX, NULL, handle,
-			NULL, error);
+	return flow_hw_action_handle_destroy(dev, MLX5_HW_INV_QUEUE,
+			NULL, handle, NULL, error);
 }
 
 /**
@@ -6105,8 +6382,8 @@ flow_hw_action_update(struct rte_eth_dev *dev,
 		      const void *update,
 		      struct rte_flow_error *err)
 {
-	return flow_hw_action_handle_update(dev, UINT32_MAX, NULL, handle,
-			update, NULL, err);
+	return flow_hw_action_handle_update(dev, MLX5_HW_INV_QUEUE,
+			NULL, handle, update, NULL, err);
 }
 
 static int
@@ -6636,6 +6913,12 @@ mlx5_flow_meter_uninit(struct rte_eth_dev *dev)
 		mlx5_free(priv->mtr_profile_arr);
 		priv->mtr_profile_arr = NULL;
 	}
+	if (priv->hws_mpool) {
+		mlx5_aso_mtr_queue_uninit(priv->sh, priv->hws_mpool, NULL);
+		mlx5_ipool_destroy(priv->hws_mpool->idx_pool);
+		mlx5_free(priv->hws_mpool);
+		priv->hws_mpool = NULL;
+	}
 	if (priv->mtr_bulk.aso) {
 		mlx5_free(priv->mtr_bulk.aso);
 		priv->mtr_bulk.aso = NULL;
@@ -6656,7 +6939,8 @@ int
 mlx5_flow_meter_init(struct rte_eth_dev *dev,
 		     uint32_t nb_meters,
 		     uint32_t nb_meter_profiles,
-		     uint32_t nb_meter_policies)
+		     uint32_t nb_meter_policies,
+		     uint32_t nb_queues)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_devx_obj *dcs = NULL;
@@ -6666,29 +6950,35 @@ mlx5_flow_meter_init(struct rte_eth_dev *dev,
 	struct mlx5_aso_mtr *aso;
 	uint32_t i;
 	struct rte_flow_error error;
+	uint32_t flags;
+	uint32_t nb_mtrs = rte_align32pow2(nb_meters);
+	struct mlx5_indexed_pool_config cfg = {
+		.size = sizeof(struct mlx5_aso_mtr),
+		.trunk_size = 1 << 12,
+		.per_core_cache = 1 << 13,
+		.need_lock = 1,
+		.release_mem_en = !!priv->sh->config.reclaim_mode,
+		.malloc = mlx5_malloc,
+		.max_idx = nb_meters,
+		.free = mlx5_free,
+		.type = "mlx5_hw_mtr_mark_action",
+	};
 
 	if (!nb_meters || !nb_meter_profiles || !nb_meter_policies) {
 		ret = ENOTSUP;
 		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter configuration is invalid.");
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter configuration is invalid.");
 		goto err;
 	}
 	if (!priv->mtr_en || !priv->sh->meter_aso_en) {
 		ret = ENOTSUP;
 		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter ASO is not supported.");
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter ASO is not supported.");
 		goto err;
 	}
 	priv->mtr_config.nb_meters = nb_meters;
-	if (mlx5_aso_queue_init(priv->sh, ASO_OPC_MOD_POLICER)) {
-		ret = ENOMEM;
-		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter ASO queue allocation failed.");
-		goto err;
-	}
 	log_obj_size = rte_log2_u32(nb_meters >> 1);
 	dcs = mlx5_devx_cmd_create_flow_meter_aso_obj
 		(priv->sh->cdev->ctx, priv->sh->cdev->pdn,
@@ -6696,8 +6986,8 @@ mlx5_flow_meter_init(struct rte_eth_dev *dev,
 	if (!dcs) {
 		ret = ENOMEM;
 		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter ASO object allocation failed.");
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter ASO object allocation failed.");
 		goto err;
 	}
 	priv->mtr_bulk.devx_obj = dcs;
@@ -6705,31 +6995,33 @@ mlx5_flow_meter_init(struct rte_eth_dev *dev,
 	if (reg_id < 0) {
 		ret = ENOTSUP;
 		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter register is not available.");
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter register is not available.");
 		goto err;
 	}
+	flags = MLX5DR_ACTION_FLAG_HWS_RX | MLX5DR_ACTION_FLAG_HWS_TX;
+	if (priv->sh->config.dv_esw_en && priv->master)
+		flags |= MLX5DR_ACTION_FLAG_HWS_FDB;
 	priv->mtr_bulk.action = mlx5dr_action_create_aso_meter
 			(priv->dr_ctx, (struct mlx5dr_devx_obj *)dcs,
-				reg_id - REG_C_0, MLX5DR_ACTION_FLAG_HWS_RX |
-				MLX5DR_ACTION_FLAG_HWS_TX |
-				MLX5DR_ACTION_FLAG_HWS_FDB);
+				reg_id - REG_C_0, flags);
 	if (!priv->mtr_bulk.action) {
 		ret = ENOMEM;
 		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter action creation failed.");
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter action creation failed.");
 		goto err;
 	}
 	priv->mtr_bulk.aso = mlx5_malloc(MLX5_MEM_ZERO,
-						sizeof(struct mlx5_aso_mtr) * nb_meters,
-						RTE_CACHE_LINE_SIZE,
-						SOCKET_ID_ANY);
+					 sizeof(struct mlx5_aso_mtr) *
+					 nb_meters,
+					 RTE_CACHE_LINE_SIZE,
+					 SOCKET_ID_ANY);
 	if (!priv->mtr_bulk.aso) {
 		ret = ENOMEM;
 		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter bulk ASO allocation failed.");
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter bulk ASO allocation failed.");
 		goto err;
 	}
 	priv->mtr_bulk.size = nb_meters;
@@ -6740,32 +7032,65 @@ mlx5_flow_meter_init(struct rte_eth_dev *dev,
 		aso->offset = i;
 		aso++;
 	}
+	priv->hws_mpool = mlx5_malloc(MLX5_MEM_ZERO,
+				sizeof(struct mlx5_aso_mtr_pool),
+				RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
+	if (!priv->hws_mpool) {
+		ret = ENOMEM;
+		rte_flow_error_set(&error, ENOMEM,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter ipool allocation failed.");
+		goto err;
+	}
+	priv->hws_mpool->devx_obj = priv->mtr_bulk.devx_obj;
+	priv->hws_mpool->action = priv->mtr_bulk.action;
+	priv->hws_mpool->nb_sq = nb_queues;
+	if (mlx5_aso_mtr_queue_init(priv->sh, priv->hws_mpool,
+				    &priv->sh->mtrmng->pools_mng, nb_queues)) {
+		ret = ENOMEM;
+		rte_flow_error_set(&error, ENOMEM,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter ASO queue allocation failed.");
+		goto err;
+	}
+	/*
+	 * No need for a local cache if the number of meters is small, since
+	 * the flow insertion rate will be very limited in that case. Shrink
+	 * the trunk size to that number (below the default 4K) instead.
+	 */
+	if (nb_mtrs <= cfg.trunk_size) {
+		cfg.per_core_cache = 0;
+		cfg.trunk_size = nb_mtrs;
+	} else if (nb_mtrs <= MLX5_HW_IPOOL_SIZE_THRESHOLD) {
+		cfg.per_core_cache = MLX5_HW_IPOOL_CACHE_MIN;
+	}
+	priv->hws_mpool->idx_pool = mlx5_ipool_create(&cfg);
 	priv->mtr_config.nb_meter_profiles = nb_meter_profiles;
 	priv->mtr_profile_arr =
 		mlx5_malloc(MLX5_MEM_ZERO,
-				sizeof(struct mlx5_flow_meter_profile) *
-				nb_meter_profiles,
-				RTE_CACHE_LINE_SIZE,
-				SOCKET_ID_ANY);
+			    sizeof(struct mlx5_flow_meter_profile) *
+			    nb_meter_profiles,
+			    RTE_CACHE_LINE_SIZE,
+			    SOCKET_ID_ANY);
 	if (!priv->mtr_profile_arr) {
 		ret = ENOMEM;
 		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter profile allocation failed.");
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter profile allocation failed.");
 		goto err;
 	}
 	priv->mtr_config.nb_meter_policies = nb_meter_policies;
 	priv->mtr_policy_arr =
 		mlx5_malloc(MLX5_MEM_ZERO,
-				sizeof(struct mlx5_flow_meter_policy) *
-				nb_meter_policies,
-				RTE_CACHE_LINE_SIZE,
-				SOCKET_ID_ANY);
+			    sizeof(struct mlx5_flow_meter_policy) *
+			    nb_meter_policies,
+			    RTE_CACHE_LINE_SIZE,
+			    SOCKET_ID_ANY);
 	if (!priv->mtr_policy_arr) {
 		ret = ENOMEM;
 		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter policy allocation failed.");
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter policy allocation failed.");
 		goto err;
 	}
 	return 0;
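
For illustration of the trunk/cache selection above (hypothetical numbers): with
port_attr->nb_meters = 3000, nb_mtrs = rte_align32pow2(3000) = 4096, which is not
larger than the default 4K trunk, so the per-core cache is disabled and the trunk
shrinks to 4096; only when the aligned meter count exceeds 4K is a per-core cache
kept, reduced to MLX5_HW_IPOOL_CACHE_MIN while the count stays at or below
MLX5_HW_IPOOL_SIZE_THRESHOLD.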
diff --git a/drivers/net/mlx5/mlx5_flow_meter.c b/drivers/net/mlx5/mlx5_flow_meter.c
index 8cf24d1f7a..ed2306283d 100644
--- a/drivers/net/mlx5/mlx5_flow_meter.c
+++ b/drivers/net/mlx5/mlx5_flow_meter.c
@@ -588,6 +588,36 @@ mlx5_flow_meter_profile_delete(struct rte_eth_dev *dev,
 	return 0;
 }
 
+/**
+ * Callback to get MTR profile.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] meter_profile_id
+ *   Meter profile id.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   A valid handle in case of success, NULL otherwise.
+ */
+static struct rte_flow_meter_profile *
+mlx5_flow_meter_profile_get(struct rte_eth_dev *dev,
+			  uint32_t meter_profile_id,
+			  struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (!priv->mtr_en) {
+		rte_mtr_error_set(error, ENOTSUP,
+				  RTE_MTR_ERROR_TYPE_UNSPECIFIED, NULL,
+				  "Meter is not supported");
+		return NULL;
+	}
+	return (void *)(uintptr_t)mlx5_flow_meter_profile_find(priv,
+							meter_profile_id);
+}
+
 /**
  * Callback to add MTR profile with HWS.
  *
@@ -1150,6 +1180,37 @@ mlx5_flow_meter_policy_delete(struct rte_eth_dev *dev,
 	return 0;
 }
 
+/**
+ * Callback to get MTR policy.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] policy_id
+ *   Meter policy id.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   A valid handle in case of success, NULL otherwise.
+ */
+static struct rte_flow_meter_policy *
+mlx5_flow_meter_policy_get(struct rte_eth_dev *dev,
+			  uint32_t policy_id,
+			  struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint32_t policy_idx;
+
+	if (!priv->mtr_en) {
+		rte_mtr_error_set(error, ENOTSUP,
+				  RTE_MTR_ERROR_TYPE_UNSPECIFIED, NULL,
+				  "Meter is not supported");
+		return NULL;
+	}
+	return (void *)(uintptr_t)mlx5_flow_meter_policy_find(dev, policy_id,
+							      &policy_idx);
+}
+
 /**
  * Callback to delete MTR policy for HWS.
  *
@@ -1310,9 +1371,9 @@ mlx5_flow_meter_policy_hws_add(struct rte_eth_dev *dev,
 			RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
 			NULL, "Meter policy already exists.");
 	if (!policy ||
-	    !policy->actions[RTE_COLOR_RED] ||
-	    !policy->actions[RTE_COLOR_YELLOW] ||
-	    !policy->actions[RTE_COLOR_GREEN])
+	    (!policy->actions[RTE_COLOR_RED] &&
+	    !policy->actions[RTE_COLOR_YELLOW] &&
+	    !policy->actions[RTE_COLOR_GREEN]))
 		return -rte_mtr_error_set(error, EINVAL,
 					  RTE_MTR_ERROR_TYPE_METER_POLICY,
 					  NULL, "Meter policy actions are not valid.");
@@ -1372,6 +1433,11 @@ mlx5_flow_meter_policy_hws_add(struct rte_eth_dev *dev,
 			act++;
 		}
 	}
+	if (priv->sh->config.dv_esw_en)
+		domain_color &= ~(MLX5_MTR_DOMAIN_EGRESS_BIT |
+				  MLX5_MTR_DOMAIN_TRANSFER_BIT);
+	else
+		domain_color &= ~MLX5_MTR_DOMAIN_TRANSFER_BIT;
 	if (!domain_color)
 		return -rte_mtr_error_set(error, ENOTSUP,
 					  RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
@@ -1565,11 +1631,11 @@ mlx5_flow_meter_action_modify(struct mlx5_priv *priv,
 	if (priv->sh->meter_aso_en) {
 		fm->is_enable = !!is_enable;
 		aso_mtr = container_of(fm, struct mlx5_aso_mtr, fm);
-		ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr,
-						   &priv->mtr_bulk);
+		ret = mlx5_aso_meter_update_by_wqe(priv->sh, MLX5_HW_INV_QUEUE,
+						   aso_mtr, &priv->mtr_bulk);
 		if (ret)
 			return ret;
-		ret = mlx5_aso_mtr_wait(priv->sh, aso_mtr);
+		ret = mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr);
 		if (ret)
 			return ret;
 	} else {
@@ -1815,8 +1881,8 @@ mlx5_flow_meter_create(struct rte_eth_dev *dev, uint32_t meter_id,
 	/* If ASO meter supported, update ASO flow meter by wqe. */
 	if (priv->sh->meter_aso_en) {
 		aso_mtr = container_of(fm, struct mlx5_aso_mtr, fm);
-		ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr,
-						   &priv->mtr_bulk);
+		ret = mlx5_aso_meter_update_by_wqe(priv->sh, MLX5_HW_INV_QUEUE,
+						   aso_mtr, &priv->mtr_bulk);
 		if (ret)
 			goto error;
 		if (!priv->mtr_idx_tbl) {
@@ -1921,7 +1987,7 @@ mlx5_flow_meter_hws_create(struct rte_eth_dev *dev, uint32_t meter_id,
 	fm->shared = !!shared;
 	fm->initialized = 1;
 	/* Update ASO flow meter by wqe. */
-	ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr,
+	ret = mlx5_aso_meter_update_by_wqe(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr,
 					   &priv->mtr_bulk);
 	if (ret)
 		return -rte_mtr_error_set(error, ENOTSUP,
@@ -2401,9 +2467,11 @@ static const struct rte_mtr_ops mlx5_flow_mtr_ops = {
 	.capabilities_get = mlx5_flow_mtr_cap_get,
 	.meter_profile_add = mlx5_flow_meter_profile_add,
 	.meter_profile_delete = mlx5_flow_meter_profile_delete,
+	.meter_profile_get = mlx5_flow_meter_profile_get,
 	.meter_policy_validate = mlx5_flow_meter_policy_validate,
 	.meter_policy_add = mlx5_flow_meter_policy_add,
 	.meter_policy_delete = mlx5_flow_meter_policy_delete,
+	.meter_policy_get = mlx5_flow_meter_policy_get,
 	.create = mlx5_flow_meter_create,
 	.destroy = mlx5_flow_meter_destroy,
 	.meter_enable = mlx5_flow_meter_enable,
@@ -2418,9 +2486,11 @@ static const struct rte_mtr_ops mlx5_flow_mtr_hws_ops = {
 	.capabilities_get = mlx5_flow_mtr_cap_get,
 	.meter_profile_add = mlx5_flow_meter_profile_hws_add,
 	.meter_profile_delete = mlx5_flow_meter_profile_hws_delete,
+	.meter_profile_get = mlx5_flow_meter_profile_get,
 	.meter_policy_validate = mlx5_flow_meter_policy_hws_validate,
 	.meter_policy_add = mlx5_flow_meter_policy_hws_add,
 	.meter_policy_delete = mlx5_flow_meter_policy_hws_delete,
+	.meter_policy_get = mlx5_flow_meter_policy_get,
 	.create = mlx5_flow_meter_hws_create,
 	.destroy = mlx5_flow_meter_hws_destroy,
 	.meter_enable = mlx5_flow_meter_enable,
@@ -2566,7 +2636,7 @@ mlx5_flow_meter_attach(struct mlx5_priv *priv,
 		struct mlx5_aso_mtr *aso_mtr;
 
 		aso_mtr = container_of(fm, struct mlx5_aso_mtr, fm);
-		if (mlx5_aso_mtr_wait(priv->sh, aso_mtr)) {
+		if (mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr)) {
 			return rte_flow_error_set(error, ENOENT,
 					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 					NULL,
@@ -2865,7 +2935,7 @@ mlx5_flow_meter_flush(struct rte_eth_dev *dev, struct rte_mtr_error *error)
 		}
 	}
 	if (priv->mtr_bulk.aso) {
-		for (i = 1; i <= priv->mtr_config.nb_meter_profiles; i++) {
+		for (i = 0; i < priv->mtr_config.nb_meters; i++) {
 			aso_mtr = mlx5_aso_meter_by_idx(priv, i);
 			fm = &aso_mtr->fm;
 			if (fm->initialized)
-- 
2.25.1
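
For context, a minimal sketch of how an application could exercise the indirect
METER_MARK path implemented above through the public async flow API. This is
illustrative only and not part of the patch: port_id, queue_id and profile_id are
placeholders, the structure layouts are the ones this series builds on, and the
profile handle is assumed to come from rte_mtr_meter_profile_get() as exposed by
the profile/policy get patch.

/* Illustrative only -- error handling and completion polling trimmed. */
struct rte_mtr_error mtr_err;
struct rte_flow_error err;
struct rte_flow_op_attr op_attr = { .postpone = 0 };
struct rte_flow_indir_action_conf conf = { .ingress = 1 };
struct rte_flow_action_meter_mark mm = {
	.profile = rte_mtr_meter_profile_get(port_id, profile_id, &mtr_err),
	.color_mode = 0,               /* color-blind */
	.init_color = RTE_COLOR_GREEN,
	.state = 1,                    /* enabled */
};
struct rte_flow_action action = {
	.type = RTE_FLOW_ACTION_TYPE_METER_MARK,
	.conf = &mm,
};
struct rte_flow_update_meter_mark upd = {
	.meter_mark = { .state = 0 },  /* disable the meter later on */
	.state_valid = 1,
};
struct rte_flow_action_handle *handle;

handle = rte_flow_async_action_handle_create(port_id, queue_id, &op_attr,
					     &conf, &action, NULL, &err);
/* ... reference the handle from flow rules, then update it on the same queue. */
rte_flow_async_action_handle_update(port_id, queue_id, &op_attr, handle,
				    &upd, NULL, &err);

As with other enqueued operations, rte_flow_push() and rte_flow_pull() on the
same queue are needed to flush and complete the requests; the synchronous
rte_flow_action_handle_*() wrappers are routed to MLX5_HW_INV_QUEUE internally,
as shown in the diff above.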


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v4 13/18] net/mlx5: add HWS AGE action support
  2022-10-19 16:25 ` [PATCH v4 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (11 preceding siblings ...)
  2022-10-19 16:25   ` [PATCH v4 12/18] net/mlx5: implement METER MARK indirect action for HWS Suanming Mou
@ 2022-10-19 16:25   ` Suanming Mou
  2022-10-19 16:25   ` [PATCH v4 14/18] net/mlx5: add async action push and pull support Suanming Mou
                     ` (4 subsequent siblings)
  17 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-10-19 16:25 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Michael Baum

From: Michael Baum <michaelba@nvidia.com>

Add support for the AGE action in HW steering.
This patch includes:

 1. Add new structures to manage aging.
 2. Initialize all of them in the configure function.
 3. Implement a per-second aging check using the CNT background thread.
 4. Enable the AGE action in flow create/destroy operations.
 5. Implement a queue-based function to report aged flow rules (see the
    usage sketch right after this list).
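
A minimal usage sketch for item 5, assuming the queue-based aged-flow query API
this series plugs into; port_id, queue_id and handle_aged_rule() are placeholders
and error handling is omitted:

/* Illustrative only: drain aged-out contexts reported on one HWS queue. */
void *contexts[64];
struct rte_flow_error error;
int n, i;

n = rte_flow_get_q_aged_flows(port_id, queue_id, NULL, 0, &error);
if (n > 0) {
	n = rte_flow_get_q_aged_flows(port_id, queue_id, contexts,
				      RTE_MIN(n, 64), &error);
	for (i = 0; i < n; i++)
		handle_aged_rule(contexts[i]); /* application callback */
}

The first call with nb_contexts == 0 only returns how many contexts are pending,
as documented later in mlx5_flow_get_q_aged_flows().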

Signed-off-by: Michael Baum <michaelba@nvidia.com>
---
 doc/guides/nics/mlx5.rst           |   14 +
 drivers/net/mlx5/mlx5.c            |   67 +-
 drivers/net/mlx5/mlx5.h            |   51 +-
 drivers/net/mlx5/mlx5_defs.h       |    3 +
 drivers/net/mlx5/mlx5_flow.c       |   91 ++-
 drivers/net/mlx5/mlx5_flow.h       |   33 +-
 drivers/net/mlx5/mlx5_flow_dv.c    |   30 +-
 drivers/net/mlx5/mlx5_flow_hw.c    | 1145 ++++++++++++++++++++++++----
 drivers/net/mlx5/mlx5_flow_verbs.c |    4 +-
 drivers/net/mlx5/mlx5_hws_cnt.c    |  753 +++++++++++++++++-
 drivers/net/mlx5/mlx5_hws_cnt.h    |  193 ++++-
 drivers/net/mlx5/mlx5_utils.h      |   10 +-
 12 files changed, 2127 insertions(+), 267 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index fc823111c6..75620c286b 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -558,6 +558,20 @@ Limitations
 - The NIC egress flow rules on representor port are not supported.
 
 
+- HWS AGE action in mlx5:
+
+  - Using the same indirect COUNT action combined with multiple AGE actions in
+    different flows may cause a wrong AGE state for those AGE actions.
+  - Creating/destroying flow rules with an indirect AGE action while it is
+    active (timeout != 0) may cause a wrong AGE state for that indirect AGE action.
+  - The mlx5 driver reuses counters for the AGE action, so for optimal sizing
+    the values in the ``rte_flow_port_attr`` structure should describe
+    (see the sizing sketch just below):
+
+    - ``nb_counters`` is the number of flow rules using a counter (with or without
+      AGE), plus the number of flow rules using only AGE (without a COUNT action).
+    - ``nb_aging_objects`` is the number of flow rules containing an AGE action.
+
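
An illustrative sizing sketch following the guideline above; the numbers are
placeholders and the field names are those of rte_flow_port_attr and
rte_flow_queue_attr as consumed by rte_flow_configure():

/* Illustrative only: size counters/AGE/meters before template table creation. */
struct rte_flow_port_attr port_attr = {
	.nb_counters = 1 << 20,      /* rules with COUNT, plus AGE-only rules */
	.nb_aging_objects = 1 << 16, /* rules carrying an AGE action */
	.nb_meters = 1 << 12,
};
struct rte_flow_queue_attr queue_attr = { .size = 1024 };
const struct rte_flow_queue_attr *qattr[] = { &queue_attr };
struct rte_flow_error error;

rte_flow_configure(port_id, &port_attr, RTE_DIM(qattr), qattr, &error);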
+
 Statistics
 ----------
 
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 64a0e6f31d..4e532f0807 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -497,6 +497,12 @@ mlx5_flow_aging_init(struct mlx5_dev_ctx_shared *sh)
 	uint32_t i;
 	struct mlx5_age_info *age_info;
 
+	/*
+	 * In HW steering, the aging information structure is initialized
+	 * later, in the configure function.
+	 */
+	if (sh->config.dv_flow_en == 2)
+		return;
 	for (i = 0; i < sh->max_port; i++) {
 		age_info = &sh->port[i].age_info;
 		age_info->flags = 0;
@@ -540,8 +546,8 @@ mlx5_flow_counter_mode_config(struct rte_eth_dev *dev __rte_unused)
 			hca_attr->flow_counter_bulk_alloc_bitmap);
 	/* Initialize fallback mode only on the port initializes sh. */
 	if (sh->refcnt == 1)
-		sh->cmng.counter_fallback = fallback;
-	else if (fallback != sh->cmng.counter_fallback)
+		sh->sws_cmng.counter_fallback = fallback;
+	else if (fallback != sh->sws_cmng.counter_fallback)
 		DRV_LOG(WARNING, "Port %d in sh has different fallback mode "
 			"with others:%d.", PORT_ID(priv), fallback);
 #endif
@@ -556,17 +562,38 @@ mlx5_flow_counter_mode_config(struct rte_eth_dev *dev __rte_unused)
 static void
 mlx5_flow_counters_mng_init(struct mlx5_dev_ctx_shared *sh)
 {
-	int i;
+	int i, j;
+
+	if (sh->config.dv_flow_en < 2) {
+		memset(&sh->sws_cmng, 0, sizeof(sh->sws_cmng));
+		TAILQ_INIT(&sh->sws_cmng.flow_counters);
+		sh->sws_cmng.min_id = MLX5_CNT_BATCH_OFFSET;
+		sh->sws_cmng.max_id = -1;
+		sh->sws_cmng.last_pool_idx = POOL_IDX_INVALID;
+		rte_spinlock_init(&sh->sws_cmng.pool_update_sl);
+		for (i = 0; i < MLX5_COUNTER_TYPE_MAX; i++) {
+			TAILQ_INIT(&sh->sws_cmng.counters[i]);
+			rte_spinlock_init(&sh->sws_cmng.csl[i]);
+		}
+	} else {
+		struct mlx5_hca_attr *attr = &sh->cdev->config.hca_attr;
+		uint32_t fw_max_nb_cnts = attr->max_flow_counter;
+		uint8_t log_dcs = log2above(fw_max_nb_cnts) - 1;
+		uint32_t max_nb_cnts = 0;
+
+		for (i = 0, j = 0; j < MLX5_HWS_CNT_DCS_NUM; ++i) {
+			int log_dcs_i = log_dcs - i;
 
-	memset(&sh->cmng, 0, sizeof(sh->cmng));
-	TAILQ_INIT(&sh->cmng.flow_counters);
-	sh->cmng.min_id = MLX5_CNT_BATCH_OFFSET;
-	sh->cmng.max_id = -1;
-	sh->cmng.last_pool_idx = POOL_IDX_INVALID;
-	rte_spinlock_init(&sh->cmng.pool_update_sl);
-	for (i = 0; i < MLX5_COUNTER_TYPE_MAX; i++) {
-		TAILQ_INIT(&sh->cmng.counters[i]);
-		rte_spinlock_init(&sh->cmng.csl[i]);
+			if (log_dcs_i < 0)
+				break;
+			if ((max_nb_cnts | RTE_BIT32(log_dcs_i)) >
+			    fw_max_nb_cnts)
+				continue;
+			max_nb_cnts |= RTE_BIT32(log_dcs_i);
+			j++;
+		}
+		sh->hws_max_log_bulk_sz = log_dcs;
+		sh->hws_max_nb_counters = max_nb_cnts;
 	}
 }
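
A worked example for the bulk sizing above (firmware value is hypothetical): if
max_flow_counter reports 2^24, then log_dcs = log2above(2^24) - 1 = 23 and the
loop accepts MLX5_HWS_CNT_DCS_NUM = 4 bulk sizes, 2^23 + 2^22 + 2^21 + 2^20, so
hws_max_nb_counters = 15728640 (15/16 of the firmware limit) and
hws_max_log_bulk_sz = 23.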
 
@@ -607,13 +634,13 @@ mlx5_flow_counters_mng_close(struct mlx5_dev_ctx_shared *sh)
 		rte_pause();
 	}
 
-	if (sh->cmng.pools) {
+	if (sh->sws_cmng.pools) {
 		struct mlx5_flow_counter_pool *pool;
-		uint16_t n_valid = sh->cmng.n_valid;
-		bool fallback = sh->cmng.counter_fallback;
+		uint16_t n_valid = sh->sws_cmng.n_valid;
+		bool fallback = sh->sws_cmng.counter_fallback;
 
 		for (i = 0; i < n_valid; ++i) {
-			pool = sh->cmng.pools[i];
+			pool = sh->sws_cmng.pools[i];
 			if (!fallback && pool->min_dcs)
 				claim_zero(mlx5_devx_cmd_destroy
 							       (pool->min_dcs));
@@ -632,14 +659,14 @@ mlx5_flow_counters_mng_close(struct mlx5_dev_ctx_shared *sh)
 			}
 			mlx5_free(pool);
 		}
-		mlx5_free(sh->cmng.pools);
+		mlx5_free(sh->sws_cmng.pools);
 	}
-	mng = LIST_FIRST(&sh->cmng.mem_mngs);
+	mng = LIST_FIRST(&sh->sws_cmng.mem_mngs);
 	while (mng) {
 		mlx5_flow_destroy_counter_stat_mem_mng(mng);
-		mng = LIST_FIRST(&sh->cmng.mem_mngs);
+		mng = LIST_FIRST(&sh->sws_cmng.mem_mngs);
 	}
-	memset(&sh->cmng, 0, sizeof(sh->cmng));
+	memset(&sh->sws_cmng, 0, sizeof(sh->sws_cmng));
 }
 
 /**
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index d3267fafda..09ab7a080a 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -644,12 +644,45 @@ struct mlx5_geneve_tlv_option_resource {
 /* Current time in seconds. */
 #define MLX5_CURR_TIME_SEC	(rte_rdtsc() / rte_get_tsc_hz())
 
+/*
+ * HW steering queue oriented AGE info.
+ * It contains an array of rings, one for each HWS queue.
+ */
+struct mlx5_hws_q_age_info {
+	uint16_t nb_rings; /* Number of aged-out ring lists. */
+	struct rte_ring *aged_lists[]; /* Aged-out lists. */
+};
+
+/*
+ * HW steering AGE info.
+ * It has a ring list containing all aged out flow rules.
+ */
+struct mlx5_hws_age_info {
+	struct rte_ring *aged_list; /* Aged out lists. */
+};
+
 /* Aging information for per port. */
 struct mlx5_age_info {
 	uint8_t flags; /* Indicate if is new event or need to be triggered. */
-	struct mlx5_counters aged_counters; /* Aged counter list. */
-	struct aso_age_list aged_aso; /* Aged ASO actions list. */
-	rte_spinlock_t aged_sl; /* Aged flow list lock. */
+	union {
+		/* SW/FW steering AGE info. */
+		struct {
+			struct mlx5_counters aged_counters;
+			/* Aged counter list. */
+			struct aso_age_list aged_aso;
+			/* Aged ASO actions list. */
+			rte_spinlock_t aged_sl; /* Aged flow list lock. */
+		};
+		struct {
+			struct mlx5_indexed_pool *ages_ipool;
+			union {
+				struct mlx5_hws_age_info hw_age;
+				/* HW steering AGE info. */
+				struct mlx5_hws_q_age_info *hw_q_age;
+				/* HW steering queue oriented AGE info. */
+			};
+		};
+	};
 };
 
 /* Per port data of shared IB device. */
@@ -1307,6 +1340,9 @@ struct mlx5_dev_ctx_shared {
 	uint32_t hws_tags:1; /* Check if tags info for HWS initialized. */
 	uint32_t shared_mark_enabled:1;
 	/* If mark action is enabled on Rxqs (shared E-Switch domain). */
+	uint32_t hws_max_log_bulk_sz:5;
+	/* Log of minimal HWS counters created hard coded. */
+	uint32_t hws_max_nb_counters; /* Maximal number for HWS counters. */
 	uint32_t max_port; /* Maximal IB device port index. */
 	struct mlx5_bond_info bond; /* Bonding information. */
 	struct mlx5_common_device *cdev; /* Backend mlx5 device. */
@@ -1347,7 +1383,8 @@ struct mlx5_dev_ctx_shared {
 	struct mlx5_list *dest_array_list;
 	struct mlx5_list *flex_parsers_dv; /* Flex Item parsers. */
 	/* List of destination array actions. */
-	struct mlx5_flow_counter_mng cmng; /* Counters management structure. */
+	struct mlx5_flow_counter_mng sws_cmng;
+	/* SW steering counters management structure. */
 	void *default_miss_action; /* Default miss action. */
 	struct mlx5_indexed_pool *ipool[MLX5_IPOOL_MAX];
 	struct mlx5_indexed_pool *mdh_ipools[MLX5_MAX_MODIFY_NUM];
@@ -1677,6 +1714,9 @@ struct mlx5_priv {
 	LIST_HEAD(flow_hw_at, rte_flow_actions_template) flow_hw_at;
 	struct mlx5dr_context *dr_ctx; /**< HW steering DR context. */
 	/* HW steering queue polling mechanism job descriptor LIFO. */
+	uint32_t hws_strict_queue:1;
+	/**< Whether all operations strictly happen on the same HWS queue. */
+	uint32_t hws_age_req:1; /**< Whether this port has AGE indexed pool. */
 	struct mlx5_hw_q *hw_q;
 	/* HW steering rte flow table list header. */
 	LIST_HEAD(flow_hw_tbl, rte_flow_template_table) flow_hw_tbl;
@@ -1992,6 +2032,9 @@ int mlx5_validate_action_ct(struct rte_eth_dev *dev,
 			    const struct rte_flow_action_conntrack *conntrack,
 			    struct rte_flow_error *error);
 
+int mlx5_flow_get_q_aged_flows(struct rte_eth_dev *dev, uint32_t queue_id,
+			       void **contexts, uint32_t nb_contexts,
+			       struct rte_flow_error *error);
 
 /* mlx5_mp_os.c */
 
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index d064abfef3..2af8c731ef 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -43,6 +43,9 @@
 #define MLX5_PMD_SOFT_COUNTERS 1
 #endif
 
+/* Maximum number of DCS created per port. */
+#define MLX5_HWS_CNT_DCS_NUM 4
+
 /* Alarm timeout. */
 #define MLX5_ALARM_TIMEOUT_US 100000
 
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index e3485352db..c32255a3f9 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -989,6 +989,9 @@ static const struct rte_flow_ops mlx5_flow_ops = {
 	.isolate = mlx5_flow_isolate,
 	.query = mlx5_flow_query,
 	.dev_dump = mlx5_flow_dev_dump,
+#ifdef MLX5_HAVE_RTE_FLOW_Q_AGE
+	.get_q_aged_flows = mlx5_flow_get_q_aged_flows,
+#endif
 	.get_aged_flows = mlx5_flow_get_aged_flows,
 	.action_handle_create = mlx5_action_handle_create,
 	.action_handle_destroy = mlx5_action_handle_destroy,
@@ -8944,11 +8947,11 @@ mlx5_flow_create_counter_stat_mem_mng(struct mlx5_dev_ctx_shared *sh)
 		mem_mng->raws[i].data = raw_data + i * MLX5_COUNTERS_PER_POOL;
 	}
 	for (i = 0; i < MLX5_MAX_PENDING_QUERIES; ++i)
-		LIST_INSERT_HEAD(&sh->cmng.free_stat_raws,
+		LIST_INSERT_HEAD(&sh->sws_cmng.free_stat_raws,
 				 mem_mng->raws + MLX5_CNT_CONTAINER_RESIZE + i,
 				 next);
-	LIST_INSERT_HEAD(&sh->cmng.mem_mngs, mem_mng, next);
-	sh->cmng.mem_mng = mem_mng;
+	LIST_INSERT_HEAD(&sh->sws_cmng.mem_mngs, mem_mng, next);
+	sh->sws_cmng.mem_mng = mem_mng;
 	return 0;
 }
 
@@ -8967,7 +8970,7 @@ static int
 mlx5_flow_set_counter_stat_mem(struct mlx5_dev_ctx_shared *sh,
 			       struct mlx5_flow_counter_pool *pool)
 {
-	struct mlx5_flow_counter_mng *cmng = &sh->cmng;
+	struct mlx5_flow_counter_mng *cmng = &sh->sws_cmng;
 	/* Resize statistic memory once used out. */
 	if (!(pool->index % MLX5_CNT_CONTAINER_RESIZE) &&
 	    mlx5_flow_create_counter_stat_mem_mng(sh)) {
@@ -8996,14 +8999,14 @@ mlx5_set_query_alarm(struct mlx5_dev_ctx_shared *sh)
 {
 	uint32_t pools_n, us;
 
-	pools_n = __atomic_load_n(&sh->cmng.n_valid, __ATOMIC_RELAXED);
+	pools_n = __atomic_load_n(&sh->sws_cmng.n_valid, __ATOMIC_RELAXED);
 	us = MLX5_POOL_QUERY_FREQ_US / pools_n;
 	DRV_LOG(DEBUG, "Set alarm for %u pools each %u us", pools_n, us);
 	if (rte_eal_alarm_set(us, mlx5_flow_query_alarm, sh)) {
-		sh->cmng.query_thread_on = 0;
+		sh->sws_cmng.query_thread_on = 0;
 		DRV_LOG(ERR, "Cannot reinitialize query alarm");
 	} else {
-		sh->cmng.query_thread_on = 1;
+		sh->sws_cmng.query_thread_on = 1;
 	}
 }
 
@@ -9019,12 +9022,12 @@ mlx5_flow_query_alarm(void *arg)
 {
 	struct mlx5_dev_ctx_shared *sh = arg;
 	int ret;
-	uint16_t pool_index = sh->cmng.pool_index;
-	struct mlx5_flow_counter_mng *cmng = &sh->cmng;
+	uint16_t pool_index = sh->sws_cmng.pool_index;
+	struct mlx5_flow_counter_mng *cmng = &sh->sws_cmng;
 	struct mlx5_flow_counter_pool *pool;
 	uint16_t n_valid;
 
-	if (sh->cmng.pending_queries >= MLX5_MAX_PENDING_QUERIES)
+	if (sh->sws_cmng.pending_queries >= MLX5_MAX_PENDING_QUERIES)
 		goto set_alarm;
 	rte_spinlock_lock(&cmng->pool_update_sl);
 	pool = cmng->pools[pool_index];
@@ -9037,7 +9040,7 @@ mlx5_flow_query_alarm(void *arg)
 		/* There is a pool query in progress. */
 		goto set_alarm;
 	pool->raw_hw =
-		LIST_FIRST(&sh->cmng.free_stat_raws);
+		LIST_FIRST(&sh->sws_cmng.free_stat_raws);
 	if (!pool->raw_hw)
 		/* No free counter statistics raw memory. */
 		goto set_alarm;
@@ -9063,12 +9066,12 @@ mlx5_flow_query_alarm(void *arg)
 		goto set_alarm;
 	}
 	LIST_REMOVE(pool->raw_hw, next);
-	sh->cmng.pending_queries++;
+	sh->sws_cmng.pending_queries++;
 	pool_index++;
 	if (pool_index >= n_valid)
 		pool_index = 0;
 set_alarm:
-	sh->cmng.pool_index = pool_index;
+	sh->sws_cmng.pool_index = pool_index;
 	mlx5_set_query_alarm(sh);
 }
 
@@ -9151,7 +9154,7 @@ mlx5_flow_async_pool_query_handle(struct mlx5_dev_ctx_shared *sh,
 		(struct mlx5_flow_counter_pool *)(uintptr_t)async_id;
 	struct mlx5_counter_stats_raw *raw_to_free;
 	uint8_t query_gen = pool->query_gen ^ 1;
-	struct mlx5_flow_counter_mng *cmng = &sh->cmng;
+	struct mlx5_flow_counter_mng *cmng = &sh->sws_cmng;
 	enum mlx5_counter_type cnt_type =
 		pool->is_aged ? MLX5_COUNTER_TYPE_AGE :
 				MLX5_COUNTER_TYPE_ORIGIN;
@@ -9174,9 +9177,9 @@ mlx5_flow_async_pool_query_handle(struct mlx5_dev_ctx_shared *sh,
 			rte_spinlock_unlock(&cmng->csl[cnt_type]);
 		}
 	}
-	LIST_INSERT_HEAD(&sh->cmng.free_stat_raws, raw_to_free, next);
+	LIST_INSERT_HEAD(&sh->sws_cmng.free_stat_raws, raw_to_free, next);
 	pool->raw_hw = NULL;
-	sh->cmng.pending_queries--;
+	sh->sws_cmng.pending_queries--;
 }
 
 static int
@@ -9536,7 +9539,7 @@ mlx5_flow_dev_dump_sh_all(struct rte_eth_dev *dev,
 	struct mlx5_list_inconst *l_inconst;
 	struct mlx5_list_entry *e;
 	int lcore_index;
-	struct mlx5_flow_counter_mng *cmng = &priv->sh->cmng;
+	struct mlx5_flow_counter_mng *cmng = &priv->sh->sws_cmng;
 	uint32_t max;
 	void *action;
 
@@ -9707,18 +9710,58 @@ mlx5_flow_get_aged_flows(struct rte_eth_dev *dev, void **contexts,
 {
 	const struct mlx5_flow_driver_ops *fops;
 	struct rte_flow_attr attr = { .transfer = 0 };
+	enum mlx5_flow_drv_type type = flow_get_drv_type(dev, &attr);
 
-	if (flow_get_drv_type(dev, &attr) == MLX5_FLOW_TYPE_DV) {
-		fops = flow_get_drv_ops(MLX5_FLOW_TYPE_DV);
-		return fops->get_aged_flows(dev, contexts, nb_contexts,
-						    error);
+	if (type == MLX5_FLOW_TYPE_DV || type == MLX5_FLOW_TYPE_HW) {
+		fops = flow_get_drv_ops(type);
+		return fops->get_aged_flows(dev, contexts, nb_contexts, error);
 	}
-	DRV_LOG(ERR,
-		"port %u get aged flows is not supported.",
-		 dev->data->port_id);
+	DRV_LOG(ERR, "port %u get aged flows is not supported.",
+		dev->data->port_id);
 	return -ENOTSUP;
 }
 
+/**
+ * Get aged-out flows per HWS queue.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] queue_id
+ *   Flow queue to query.
+ * @param[in] contexts
+ *   The address of an array of pointers to the aged-out flow contexts.
+ * @param[in] nb_contexts
+ *   The length of the context array.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. Initialized in case of
+ *   error only.
+ *
+ * @return
+ *   The number of aged-out contexts on success, a negative errno value
+ *   otherwise. If nb_contexts is 0, return the total number of aged-out
+ *   contexts; otherwise, return the number of aged-out flows reported in
+ *   the context array.
+ */
+int
+mlx5_flow_get_q_aged_flows(struct rte_eth_dev *dev, uint32_t queue_id,
+			   void **contexts, uint32_t nb_contexts,
+			   struct rte_flow_error *error)
+{
+	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = { 0 };
+
+	if (flow_get_drv_type(dev, &attr) == MLX5_FLOW_TYPE_HW) {
+		fops = flow_get_drv_ops(MLX5_FLOW_TYPE_HW);
+		return fops->get_q_aged_flows(dev, queue_id, contexts,
+					      nb_contexts, error);
+	}
+	DRV_LOG(ERR, "port %u queue %u get aged flows is not supported.",
+		dev->data->port_id, queue_id);
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				  "get Q aged flows with incorrect steering mode");
+}
+
 /* Wrapper for driver action_validate op callback */
 static int
 flow_drv_action_validate(struct rte_eth_dev *dev,
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 96198d7d17..5c57f51706 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -293,6 +293,8 @@ enum mlx5_feature_name {
 #define MLX5_FLOW_ACTION_MODIFY_FIELD (1ull << 39)
 #define MLX5_FLOW_ACTION_METER_WITH_TERMINATED_POLICY (1ull << 40)
 #define MLX5_FLOW_ACTION_CT (1ull << 41)
+#define MLX5_FLOW_ACTION_INDIRECT_COUNT (1ull << 42)
+#define MLX5_FLOW_ACTION_INDIRECT_AGE (1ull << 43)
 
 #define MLX5_FLOW_FATE_ACTIONS \
 	(MLX5_FLOW_ACTION_DROP | MLX5_FLOW_ACTION_QUEUE | \
@@ -1099,6 +1101,22 @@ struct rte_flow {
 	uint32_t geneve_tlv_option; /**< Holds Geneve TLV option id. > */
 } __rte_packed;
 
+/*
+ * HWS COUNTER ID's layout
+ *       3                   2                   1                   0
+ *     1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |  T  |     | D |                                               |
+ *    ~  Y  |     | C |                    IDX                        ~
+ *    |  P  |     | S |                                               |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *
+ *    Bit 31:29 = TYPE = MLX5_INDIRECT_ACTION_TYPE_COUNT = b'10
+ *    Bit 25:24 = DCS index
+ *    Bit 23:00 = IDX of the counter within the DCS bulk it belongs to.
+ */
+typedef uint32_t cnt_id_t;
+
 #if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
 
 #ifdef PEDANTIC
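
For reference, an illustrative decoding of the cnt_id_t layout documented above;
the helper names are hypothetical and not part of the patch:

static inline uint32_t hws_cnt_type(cnt_id_t id)  { return id >> 29; }         /* bits 31:29 */
static inline uint32_t hws_cnt_dcs(cnt_id_t id)   { return (id >> 24) & 0x3; } /* bits 25:24 */
static inline uint32_t hws_cnt_index(cnt_id_t id) { return id & 0xffffff; }    /* bits 23:0  */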
@@ -1115,7 +1133,8 @@ struct rte_flow_hw {
 		struct mlx5_hrxq *hrxq; /* TIR action. */
 	};
 	struct rte_flow_template_table *table; /* The table flow allcated from. */
-	uint32_t cnt_id;
+	uint32_t age_idx;
+	cnt_id_t cnt_id;
 	uint32_t mtr_id;
 	uint8_t rule[0]; /* HWS layer data struct. */
 } __rte_packed;
@@ -1166,7 +1185,7 @@ struct mlx5_action_construct_data {
 			uint32_t idx; /* Shared action index. */
 		} shared_rss;
 		struct {
-			uint32_t id;
+			cnt_id_t id;
 		} shared_counter;
 		struct {
 			uint32_t id;
@@ -1197,6 +1216,7 @@ struct rte_flow_actions_template {
 	struct rte_flow_action *actions; /* Cached flow actions. */
 	struct rte_flow_action *masks; /* Cached action masks.*/
 	struct mlx5dr_action_template *tmpl; /* mlx5dr action template. */
+	uint64_t action_flags; /* Bit-map of all valid actions in the template. */
 	uint16_t dr_actions_num; /* Amount of DR rules actions. */
 	uint16_t actions_num; /* Amount of flow actions */
 	uint16_t *actions_off; /* DR action offset for given rte action offset. */
@@ -1253,7 +1273,7 @@ struct mlx5_hw_actions {
 	struct mlx5_hw_encap_decap_action *encap_decap;
 	uint16_t encap_decap_pos; /* Encap/Decap action position. */
 	uint32_t mark:1; /* Indicate the mark action. */
-	uint32_t cnt_id; /* Counter id. */
+	cnt_id_t cnt_id; /* Counter id. */
 	uint32_t mtr_id; /* Meter id. */
 	/* Translated DR action array from action template. */
 	struct mlx5dr_rule_action rule_acts[MLX5_HW_MAX_ACTS];
@@ -1629,6 +1649,12 @@ typedef int (*mlx5_flow_get_aged_flows_t)
 					 void **context,
 					 uint32_t nb_contexts,
 					 struct rte_flow_error *error);
+typedef int (*mlx5_flow_get_q_aged_flows_t)
+					(struct rte_eth_dev *dev,
+					 uint32_t queue_id,
+					 void **context,
+					 uint32_t nb_contexts,
+					 struct rte_flow_error *error);
 typedef int (*mlx5_flow_action_validate_t)
 				(struct rte_eth_dev *dev,
 				 const struct rte_flow_indir_action_conf *conf,
@@ -1835,6 +1861,7 @@ struct mlx5_flow_driver_ops {
 	mlx5_flow_counter_free_t counter_free;
 	mlx5_flow_counter_query_t counter_query;
 	mlx5_flow_get_aged_flows_t get_aged_flows;
+	mlx5_flow_get_q_aged_flows_t get_q_aged_flows;
 	mlx5_flow_action_validate_t action_validate;
 	mlx5_flow_action_create_t action_create;
 	mlx5_flow_action_destroy_t action_destroy;
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 868fa6e1a5..250f61d46f 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -5524,7 +5524,7 @@ flow_dv_validate_action_age(uint64_t action_flags,
 	const struct rte_flow_action_age *age = action->conf;
 
 	if (!priv->sh->cdev->config.devx ||
-	    (priv->sh->cmng.counter_fallback && !priv->sh->aso_age_mng))
+	    (priv->sh->sws_cmng.counter_fallback && !priv->sh->aso_age_mng))
 		return rte_flow_error_set(error, ENOTSUP,
 					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 					  NULL,
@@ -6085,7 +6085,7 @@ flow_dv_counter_get_by_idx(struct rte_eth_dev *dev,
 			   struct mlx5_flow_counter_pool **ppool)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_flow_counter_mng *cmng = &priv->sh->cmng;
+	struct mlx5_flow_counter_mng *cmng = &priv->sh->sws_cmng;
 	struct mlx5_flow_counter_pool *pool;
 
 	/* Decrease to original index and clear shared bit. */
@@ -6179,7 +6179,7 @@ static int
 flow_dv_container_resize(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_flow_counter_mng *cmng = &priv->sh->cmng;
+	struct mlx5_flow_counter_mng *cmng = &priv->sh->sws_cmng;
 	void *old_pools = cmng->pools;
 	uint32_t resize = cmng->n + MLX5_CNT_CONTAINER_RESIZE;
 	uint32_t mem_size = sizeof(struct mlx5_flow_counter_pool *) * resize;
@@ -6225,7 +6225,7 @@ _flow_dv_query_count(struct rte_eth_dev *dev, uint32_t counter, uint64_t *pkts,
 
 	cnt = flow_dv_counter_get_by_idx(dev, counter, &pool);
 	MLX5_ASSERT(pool);
-	if (priv->sh->cmng.counter_fallback)
+	if (priv->sh->sws_cmng.counter_fallback)
 		return mlx5_devx_cmd_flow_counter_query(cnt->dcs_when_active, 0,
 					0, pkts, bytes, 0, NULL, NULL, 0);
 	rte_spinlock_lock(&pool->sl);
@@ -6262,8 +6262,8 @@ flow_dv_pool_create(struct rte_eth_dev *dev, struct mlx5_devx_obj *dcs,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_flow_counter_pool *pool;
-	struct mlx5_flow_counter_mng *cmng = &priv->sh->cmng;
-	bool fallback = priv->sh->cmng.counter_fallback;
+	struct mlx5_flow_counter_mng *cmng = &priv->sh->sws_cmng;
+	bool fallback = priv->sh->sws_cmng.counter_fallback;
 	uint32_t size = sizeof(*pool);
 
 	size += MLX5_COUNTERS_PER_POOL * MLX5_CNT_SIZE;
@@ -6324,14 +6324,14 @@ flow_dv_counter_pool_prepare(struct rte_eth_dev *dev,
 			     uint32_t age)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_flow_counter_mng *cmng = &priv->sh->cmng;
+	struct mlx5_flow_counter_mng *cmng = &priv->sh->sws_cmng;
 	struct mlx5_flow_counter_pool *pool;
 	struct mlx5_counters tmp_tq;
 	struct mlx5_devx_obj *dcs = NULL;
 	struct mlx5_flow_counter *cnt;
 	enum mlx5_counter_type cnt_type =
 			age ? MLX5_COUNTER_TYPE_AGE : MLX5_COUNTER_TYPE_ORIGIN;
-	bool fallback = priv->sh->cmng.counter_fallback;
+	bool fallback = priv->sh->sws_cmng.counter_fallback;
 	uint32_t i;
 
 	if (fallback) {
@@ -6395,8 +6395,8 @@ flow_dv_counter_alloc(struct rte_eth_dev *dev, uint32_t age)
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_flow_counter_pool *pool = NULL;
 	struct mlx5_flow_counter *cnt_free = NULL;
-	bool fallback = priv->sh->cmng.counter_fallback;
-	struct mlx5_flow_counter_mng *cmng = &priv->sh->cmng;
+	bool fallback = priv->sh->sws_cmng.counter_fallback;
+	struct mlx5_flow_counter_mng *cmng = &priv->sh->sws_cmng;
 	enum mlx5_counter_type cnt_type =
 			age ? MLX5_COUNTER_TYPE_AGE : MLX5_COUNTER_TYPE_ORIGIN;
 	uint32_t cnt_idx;
@@ -6442,7 +6442,7 @@ flow_dv_counter_alloc(struct rte_eth_dev *dev, uint32_t age)
 	if (_flow_dv_query_count(dev, cnt_idx, &cnt_free->hits,
 				 &cnt_free->bytes))
 		goto err;
-	if (!fallback && !priv->sh->cmng.query_thread_on)
+	if (!fallback && !priv->sh->sws_cmng.query_thread_on)
 		/* Start the asynchronous batch query by the host thread. */
 		mlx5_set_query_alarm(priv->sh);
 	/*
@@ -6570,7 +6570,7 @@ flow_dv_counter_free(struct rte_eth_dev *dev, uint32_t counter)
 	 * this case, lock will not be needed as query callback and release
 	 * function both operate with the different list.
 	 */
-	if (!priv->sh->cmng.counter_fallback) {
+	if (!priv->sh->sws_cmng.counter_fallback) {
 		rte_spinlock_lock(&pool->csl);
 		TAILQ_INSERT_TAIL(&pool->counters[pool->query_gen], cnt, next);
 		rte_spinlock_unlock(&pool->csl);
@@ -6578,10 +6578,10 @@ flow_dv_counter_free(struct rte_eth_dev *dev, uint32_t counter)
 		cnt->dcs_when_free = cnt->dcs_when_active;
 		cnt_type = pool->is_aged ? MLX5_COUNTER_TYPE_AGE :
 					   MLX5_COUNTER_TYPE_ORIGIN;
-		rte_spinlock_lock(&priv->sh->cmng.csl[cnt_type]);
-		TAILQ_INSERT_TAIL(&priv->sh->cmng.counters[cnt_type],
+		rte_spinlock_lock(&priv->sh->sws_cmng.csl[cnt_type]);
+		TAILQ_INSERT_TAIL(&priv->sh->sws_cmng.counters[cnt_type],
 				  cnt, next);
-		rte_spinlock_unlock(&priv->sh->cmng.csl[cnt_type]);
+		rte_spinlock_unlock(&priv->sh->sws_cmng.csl[cnt_type]);
 	}
 }
 
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 52125c861e..59d9db04d3 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -477,7 +477,8 @@ __flow_hw_act_data_general_append(struct mlx5_priv *priv,
 				  enum rte_flow_action_type type,
 				  uint16_t action_src,
 				  uint16_t action_dst)
-{	struct mlx5_action_construct_data *act_data;
+{
+	struct mlx5_action_construct_data *act_data;
 
 	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
 	if (!act_data)
@@ -512,7 +513,8 @@ __flow_hw_act_data_encap_append(struct mlx5_priv *priv,
 				uint16_t action_src,
 				uint16_t action_dst,
 				uint16_t len)
-{	struct mlx5_action_construct_data *act_data;
+{
+	struct mlx5_action_construct_data *act_data;
 
 	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
 	if (!act_data)
@@ -582,7 +584,8 @@ __flow_hw_act_data_shared_rss_append(struct mlx5_priv *priv,
 				     uint16_t action_dst,
 				     uint32_t idx,
 				     struct mlx5_shared_action_rss *rss)
-{	struct mlx5_action_construct_data *act_data;
+{
+	struct mlx5_action_construct_data *act_data;
 
 	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
 	if (!act_data)
@@ -621,7 +624,8 @@ __flow_hw_act_data_shared_cnt_append(struct mlx5_priv *priv,
 				     uint16_t action_src,
 				     uint16_t action_dst,
 				     cnt_id_t cnt_id)
-{	struct mlx5_action_construct_data *act_data;
+{
+	struct mlx5_action_construct_data *act_data;
 
 	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
 	if (!act_data)
@@ -717,6 +721,10 @@ flow_hw_shared_action_translate(struct rte_eth_dev *dev,
 			action_src, action_dst, act_idx))
 			return -1;
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_AGE:
+		/* Not supported, prevent by validate function. */
+		MLX5_ASSERT(0);
+		break;
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
 		if (flow_hw_ct_compile(dev, MLX5_HW_INV_QUEUE,
 				       idx, &acts->rule_acts[action_dst]))
@@ -1109,7 +1117,7 @@ flow_hw_cnt_compile(struct rte_eth_dev *dev, uint32_t  start_pos,
 	cnt_id_t cnt_id;
 	int ret;
 
-	ret = mlx5_hws_cnt_shared_get(priv->hws_cpool, &cnt_id);
+	ret = mlx5_hws_cnt_shared_get(priv->hws_cpool, &cnt_id, 0);
 	if (ret != 0)
 		return ret;
 	ret = mlx5_hws_cnt_pool_get_action_offset
@@ -1250,8 +1258,6 @@ flow_hw_meter_mark_compile(struct rte_eth_dev *dev,
  *   Pointer to the rte_eth_dev structure.
  * @param[in] cfg
  *   Pointer to the table configuration.
- * @param[in] item_templates
- *   Item template array to be binded to the table.
  * @param[in/out] acts
  *   Pointer to the template HW steering DR actions.
  * @param[in] at
@@ -1260,7 +1266,7 @@ flow_hw_meter_mark_compile(struct rte_eth_dev *dev,
  *   Pointer to error structure.
  *
  * @return
- *    Table on success, NULL otherwise and rte_errno is set.
+ *   0 on success, a negative errno otherwise and rte_errno is set.
  */
 static int
 __flow_hw_actions_translate(struct rte_eth_dev *dev,
@@ -1289,6 +1295,7 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
 	uint16_t jump_pos;
 	uint32_t ct_idx;
 	int err;
+	uint32_t target_grp = 0;
 
 	flow_hw_modify_field_init(&mhdr, at);
 	if (attr->transfer)
@@ -1516,8 +1523,42 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
 							action_pos))
 				goto err;
 			break;
+		case RTE_FLOW_ACTION_TYPE_AGE:
+			flow_hw_translate_group(dev, cfg, attr->group,
+						&target_grp, error);
+			if (target_grp == 0) {
+				__flow_hw_action_template_destroy(dev, acts);
+				return rte_flow_error_set(error, ENOTSUP,
+						RTE_FLOW_ERROR_TYPE_ACTION,
+						NULL,
+						"Age action on root table is not supported in HW steering mode");
+			}
+			action_pos = at->actions_off[actions - at->actions];
+			if (__flow_hw_act_data_general_append(priv, acts,
+							 actions->type,
+							 actions - action_start,
+							 action_pos))
+				goto err;
+			break;
 		case RTE_FLOW_ACTION_TYPE_COUNT:
-			action_pos = at->actions_off[actions - action_start];
+			flow_hw_translate_group(dev, cfg, attr->group,
+						&target_grp, error);
+			if (target_grp == 0) {
+				__flow_hw_action_template_destroy(dev, acts);
+				return rte_flow_error_set(error, ENOTSUP,
+						RTE_FLOW_ERROR_TYPE_ACTION,
+						NULL,
+						"Counter action on root table is not supported in HW steering mode");
+			}
+			if ((at->action_flags & MLX5_FLOW_ACTION_AGE) ||
+			    (at->action_flags & MLX5_FLOW_ACTION_INDIRECT_AGE))
+				/*
+				 * When both COUNT and AGE are requested, they
+				 * are saved as an AGE action, which also
+				 * creates the counter.
+				 */
+				break;
+			action_pos = at->actions_off[actions - at->actions];
 			if (masks->conf &&
 			    ((const struct rte_flow_action_count *)
 			     masks->conf)->id) {
@@ -1744,6 +1785,10 @@ flow_hw_shared_action_get(struct rte_eth_dev *dev,
  *   Pointer to the flow table.
  * @param[in] it_idx
  *   Item template index the action template refer to.
+ * @param[in] action_flags
+ *   Actions bit-map detected in this template.
+ * @param[in, out] flow
+ *   Pointer to the flow containing the counter.
  * @param[in] rule_act
  *   Pointer to the shared action's destination rule DR action.
  *
@@ -1754,7 +1799,8 @@ static __rte_always_inline int
 flow_hw_shared_action_construct(struct rte_eth_dev *dev, uint32_t queue,
 				const struct rte_flow_action *action,
 				struct rte_flow_template_table *table,
-				const uint8_t it_idx,
+				const uint8_t it_idx, uint64_t action_flags,
+				struct rte_flow_hw *flow,
 				struct mlx5dr_rule_action *rule_act)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -1762,11 +1808,14 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev, uint32_t queue,
 	struct mlx5_action_construct_data act_data;
 	struct mlx5_shared_action_rss *shared_rss;
 	struct mlx5_aso_mtr *aso_mtr;
+	struct mlx5_age_info *age_info;
+	struct mlx5_hws_age_param *param;
 	uint32_t act_idx = (uint32_t)(uintptr_t)action->conf;
 	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
 	uint32_t idx = act_idx &
 		       ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
 	uint64_t item_flags;
+	cnt_id_t age_cnt;
 
 	memset(&act_data, 0, sizeof(act_data));
 	switch (type) {
@@ -1792,6 +1841,44 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev, uint32_t queue,
 				&rule_act->action,
 				&rule_act->counter.offset))
 			return -1;
+		flow->cnt_id = act_idx;
+		break;
+	case MLX5_INDIRECT_ACTION_TYPE_AGE:
+		/*
+		 * Save the index with the indirect type, to recognize
+		 * it in flow destroy.
+		 */
+		flow->age_idx = act_idx;
+		if (action_flags & MLX5_FLOW_ACTION_INDIRECT_COUNT)
+			/*
+			 * The mutual update for indirect AGE & COUNT will be
+			 * performed later, after we have the IDs of both.
+			 */
+			break;
+		age_info = GET_PORT_AGE_INFO(priv);
+		param = mlx5_ipool_get(age_info->ages_ipool, idx);
+		if (param == NULL)
+			return -1;
+		if (action_flags & MLX5_FLOW_ACTION_COUNT) {
+			if (mlx5_hws_cnt_pool_get(priv->hws_cpool,
+						  &param->queue_id, &age_cnt,
+						  idx) < 0)
+				return -1;
+			flow->cnt_id = age_cnt;
+			param->nb_cnts++;
+		} else {
+			/*
+			 * Get the counter of this indirect AGE or create one
+			 * if it doesn't exist.
+			 */
+			age_cnt = mlx5_hws_age_cnt_get(priv, param, idx);
+			if (age_cnt == 0)
+				return -1;
+		}
+		if (mlx5_hws_cnt_pool_get_action_offset(priv->hws_cpool,
+						     age_cnt, &rule_act->action,
+						     &rule_act->counter.offset))
+			return -1;
 		break;
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
 		if (flow_hw_ct_compile(dev, queue, idx, rule_act))
@@ -1952,7 +2039,8 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			  const uint8_t it_idx,
 			  const struct rte_flow_action actions[],
 			  struct mlx5dr_rule_action *rule_acts,
-			  uint32_t queue)
+			  uint32_t queue,
+			  struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
@@ -1965,6 +2053,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	const struct rte_flow_item *enc_item = NULL;
 	const struct rte_flow_action_ethdev *port_action = NULL;
 	const struct rte_flow_action_meter *meter = NULL;
+	const struct rte_flow_action_age *age = NULL;
 	uint8_t *buf = job->encap_data;
 	struct rte_flow_attr attr = {
 			.ingress = 1,
@@ -1972,6 +2061,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	uint32_t ft_flag;
 	size_t encap_len = 0;
 	int ret;
+	uint32_t age_idx = 0;
 	struct mlx5_aso_mtr *aso_mtr;
 
 	rte_memcpy(rule_acts, hw_acts->rule_acts, sizeof(*rule_acts) * at->dr_actions_num);
@@ -2024,6 +2114,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_INDIRECT:
 			if (flow_hw_shared_action_construct
 					(dev, queue, action, table, it_idx,
+					 at->action_flags, job->flow,
 					 &rule_acts[act_data->action_dst]))
 				return -1;
 			break;
@@ -2132,9 +2223,32 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			if (mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr))
 				return -1;
 			break;
+		case RTE_FLOW_ACTION_TYPE_AGE:
+			age = action->conf;
+			/*
+			 * First, create the AGE parameter, then create its
+			 * counter later:
+			 * Regular counter - in the next case.
+			 * Indirect counter - update it after the loop.
+			 */
+			age_idx = mlx5_hws_age_action_create(priv, queue, 0,
+							     age,
+							     job->flow->idx,
+							     error);
+			if (age_idx == 0)
+				return -rte_errno;
+			job->flow->age_idx = age_idx;
+			if (at->action_flags & MLX5_FLOW_ACTION_INDIRECT_COUNT)
+				/*
+				 * When AGE uses an indirect counter, there is
+				 * no need to create a counter here; it will be
+				 * updated with the AGE parameter after the loop.
+				 */
+				break;
+			/* Fall-through. */
 		case RTE_FLOW_ACTION_TYPE_COUNT:
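+			/*
+			 * A non-zero age_idx links the allocated counter to
+			 * the AGE parameter created above.
+			 */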
 			ret = mlx5_hws_cnt_pool_get(priv->hws_cpool, &queue,
-					&cnt_id);
+						    &cnt_id, age_idx);
 			if (ret != 0)
 				return ret;
 			ret = mlx5_hws_cnt_pool_get_action_offset
@@ -2191,6 +2305,25 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			break;
 		}
 	}
+	if (at->action_flags & MLX5_FLOW_ACTION_INDIRECT_COUNT) {
+		if (at->action_flags & MLX5_FLOW_ACTION_INDIRECT_AGE) {
+			age_idx = job->flow->age_idx & MLX5_HWS_AGE_IDX_MASK;
+			if (mlx5_hws_cnt_age_get(priv->hws_cpool,
+						 job->flow->cnt_id) != age_idx)
+				/*
+				 * This is the first use of this indirect
+				 * counter for this indirect AGE, so increase
+				 * the number of counters.
+				 */
+				mlx5_hws_age_nb_cnt_increase(priv, age_idx);
+		}
+		/*
+		 * Update this indirect counter with the indirect/direct AGE
+		 * that uses it.
+		 */
+		mlx5_hws_cnt_age_set(priv->hws_cpool, job->flow->cnt_id,
+				     age_idx);
+	}
 	if (hw_acts->encap_decap && !hw_acts->encap_decap->shared) {
 		rule_acts[hw_acts->encap_decap_pos].reformat.offset =
 				job->flow->idx - 1;
@@ -2340,8 +2473,10 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	 * No need to copy and construct a new "actions" list based on the
 	 * user's input, in order to save the cost.
 	 */
-	if (flow_hw_actions_construct(dev, job, &table->ats[action_template_index],
-				      pattern_template_index, actions, rule_acts, queue)) {
+	if (flow_hw_actions_construct(dev, job,
+				      &table->ats[action_template_index],
+				      pattern_template_index, actions,
+				      rule_acts, queue, error)) {
 		rte_errno = EINVAL;
 		goto free;
 	}
@@ -2426,6 +2561,49 @@ flow_hw_async_flow_destroy(struct rte_eth_dev *dev,
 			"fail to create rte flow");
 }
 
+/**
+ * Release the AGE and counter for the given flow.
+ *
+ * @param[in] priv
+ *   Pointer to the port private data structure.
+ * @param[in] queue
+ *   The queue to release the counter.
+ * @param[in, out] flow
+ *   Pointer to the flow containing the counter.
+ * @param[out] error
+ *   Pointer to error structure.
+ */
+static void
+flow_hw_age_count_release(struct mlx5_priv *priv, uint32_t queue,
+			  struct rte_flow_hw *flow,
+			  struct rte_flow_error *error)
+{
+	if (mlx5_hws_cnt_is_shared(priv->hws_cpool, flow->cnt_id)) {
+		if (flow->age_idx && !mlx5_hws_age_is_indirect(flow->age_idx)) {
+			/* Remove this AGE parameter from indirect counter. */
+			mlx5_hws_cnt_age_set(priv->hws_cpool, flow->cnt_id, 0);
+			/* Release the AGE parameter. */
+			mlx5_hws_age_action_destroy(priv, flow->age_idx, error);
+			flow->age_idx = 0;
+		}
+		return;
+	}
+	/*
+	 * Put the counter first to reduce the race risk with the
+	 * background thread.
+	 */
+	mlx5_hws_cnt_pool_put(priv->hws_cpool, &queue, &flow->cnt_id);
+	flow->cnt_id = 0;
+	if (flow->age_idx) {
+		if (mlx5_hws_age_is_indirect(flow->age_idx)) {
+			uint32_t idx = flow->age_idx & MLX5_HWS_AGE_IDX_MASK;
+
+			mlx5_hws_age_nb_cnt_decrease(priv, idx);
+		} else {
+			/* Release the AGE parameter. */
+			mlx5_hws_age_action_destroy(priv, flow->age_idx, error);
+		}
+		flow->age_idx = 0;
+	}
+}
+
 /**
  * Pull the enqueued flows.
  *
@@ -2472,13 +2650,9 @@ flow_hw_pull(struct rte_eth_dev *dev,
 				flow_hw_jump_release(dev, job->flow->jump);
 			else if (job->flow->fate_type == MLX5_FLOW_FATE_QUEUE)
 				mlx5_hrxq_obj_release(dev, job->flow->hrxq);
-			if (mlx5_hws_cnt_id_valid(job->flow->cnt_id) &&
-			    mlx5_hws_cnt_is_shared
-				(priv->hws_cpool, job->flow->cnt_id) == false) {
-				mlx5_hws_cnt_pool_put(priv->hws_cpool, &queue,
-						&job->flow->cnt_id);
-				job->flow->cnt_id = 0;
-			}
+			if (mlx5_hws_cnt_id_valid(job->flow->cnt_id))
+				flow_hw_age_count_release(priv, queue,
+							  job->flow, error);
 			if (job->flow->mtr_id) {
 				mlx5_ipool_free(pool->idx_pool,	job->flow->mtr_id);
 				job->flow->mtr_id = 0;
@@ -3131,100 +3305,315 @@ flow_hw_validate_action_represented_port(struct rte_eth_dev *dev,
 	return 0;
 }
 
-static inline int
-flow_hw_action_meta_copy_insert(const struct rte_flow_action actions[],
-				const struct rte_flow_action masks[],
-				const struct rte_flow_action *ins_actions,
-				const struct rte_flow_action *ins_masks,
-				struct rte_flow_action *new_actions,
-				struct rte_flow_action *new_masks,
-				uint16_t *ins_pos)
+/**
+ * Validate AGE action.
+ *
+ * @param[in] dev
+ *   Pointer to rte_eth_dev structure.
+ * @param[in] action
+ *   Pointer to the indirect action.
+ * @param[in] action_flags
+ *   Holds the actions detected until now.
+ * @param[in] fixed_cnt
+ *   Indicator if this list has a fixed COUNT action.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_hw_validate_action_age(struct rte_eth_dev *dev,
+			    const struct rte_flow_action *action,
+			    uint64_t action_flags, bool fixed_cnt,
+			    struct rte_flow_error *error)
 {
-	uint16_t idx, total = 0;
-	uint16_t end_idx = UINT16_MAX;
-	bool act_end = false;
-	bool modify_field = false;
-	bool rss_or_queue = false;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
 
-	MLX5_ASSERT(actions && masks);
-	MLX5_ASSERT(new_actions && new_masks);
-	MLX5_ASSERT(ins_actions && ins_masks);
-	for (idx = 0; !act_end; idx++) {
-		switch (actions[idx].type) {
-		case RTE_FLOW_ACTION_TYPE_RSS:
-		case RTE_FLOW_ACTION_TYPE_QUEUE:
-			/* It is assumed that application provided only single RSS/QUEUE action. */
-			MLX5_ASSERT(!rss_or_queue);
-			rss_or_queue = true;
-			break;
-		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
-			modify_field = true;
-			break;
-		case RTE_FLOW_ACTION_TYPE_END:
-			end_idx = idx;
-			act_end = true;
-			break;
-		default:
-			break;
+	if (!priv->sh->cdev->config.devx)
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "AGE action not supported");
+	if (age_info->ages_ipool == NULL)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					  "aging pool not initialized");
+	if ((action_flags & MLX5_FLOW_ACTION_AGE) ||
+	    (action_flags & MLX5_FLOW_ACTION_INDIRECT_AGE))
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "duplicate AGE actions set");
+	if (fixed_cnt)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "AGE and fixed COUNT combination is not supported");
+	return 0;
+}
+
+/**
+ * Validate count action.
+ *
+ * @param[in] dev
+ *   Pointer to rte_eth_dev structure.
+ * @param[in] action
+ *   Pointer to the indirect action.
+ * @param[in] mask
+ *   Pointer to the indirect action mask.
+ * @param[in] action_flags
+ *   Holds the actions detected until now.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_hw_validate_action_count(struct rte_eth_dev *dev,
+			      const struct rte_flow_action *action,
+			      const struct rte_flow_action *mask,
+			      uint64_t action_flags,
+			      struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_action_count *count = mask->conf;
+
+	if (!priv->sh->cdev->config.devx)
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "count action not supported");
+	if (!priv->hws_cpool)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "counters pool not initialized");
+	if ((action_flags & MLX5_FLOW_ACTION_COUNT) ||
+	    (action_flags & MLX5_FLOW_ACTION_INDIRECT_COUNT))
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "duplicate count actions set");
+	if (count && count->id && (action_flags & MLX5_FLOW_ACTION_AGE))
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, mask,
+					  "AGE and COUNT action shared by mask combination is not supported");
+	return 0;
+}
+
+/**
+ * Validate meter_mark action.
+ *
+ * @param[in] dev
+ *   Pointer to rte_eth_dev structure.
+ * @param[in] action
+ *   Pointer to the indirect action.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_hw_validate_action_meter_mark(struct rte_eth_dev *dev,
+			      const struct rte_flow_action *action,
+			      struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	RTE_SET_USED(action);
+
+	if (!priv->sh->cdev->config.devx)
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "meter_mark action not supported");
+	if (!priv->hws_mpool)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "meter_mark pool not initialized");
+	return 0;
+}
+
+/**
+ * Validate indirect action.
+ *
+ * @param[in] dev
+ *   Pointer to rte_eth_dev structure.
+ * @param[in] action
+ *   Pointer to the indirect action.
+ * @param[in] mask
+ *   Pointer to the indirect action mask.
+ * @param[in, out] action_flags
+ *   Holds the actions detected until now.
+ * @param[in, out] fixed_cnt
+ *   Pointer to indicator if this list has a fixed COUNT action.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_hw_validate_action_indirect(struct rte_eth_dev *dev,
+				 const struct rte_flow_action *action,
+				 const struct rte_flow_action *mask,
+				 uint64_t *action_flags, bool *fixed_cnt,
+				 struct rte_flow_error *error)
+{
+	uint32_t type;
+	int ret;
+
+	if (!mask)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  "Unable to determine indirect action type without a mask specified");
+	type = mask->type;
+	switch (type) {
+	case RTE_FLOW_ACTION_TYPE_METER_MARK:
+		ret = flow_hw_validate_action_meter_mark(dev, mask, error);
+		if (ret < 0)
+			return ret;
+		*action_flags |= MLX5_FLOW_ACTION_METER;
+		break;
+	case RTE_FLOW_ACTION_TYPE_RSS:
+		/* TODO: Validation logic (same as flow_hw_actions_validate) */
+		*action_flags |= MLX5_FLOW_ACTION_RSS;
+		break;
+	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+		/* TODO: Validation logic (same as flow_hw_actions_validate) */
+		*action_flags |= MLX5_FLOW_ACTION_CT;
+		break;
+	case RTE_FLOW_ACTION_TYPE_COUNT:
+		if (action->conf && mask->conf) {
+			if ((*action_flags & MLX5_FLOW_ACTION_AGE) ||
+			    (*action_flags & MLX5_FLOW_ACTION_INDIRECT_AGE))
+				/*
+				 * AGE cannot use an indirect counter which is
+				 * shared with other flow rules.
+				 */
+				return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  NULL,
+						  "AGE and fixed COUNT combination is not supported");
+			*fixed_cnt = true;
 		}
+		ret = flow_hw_validate_action_count(dev, action, mask,
+						    *action_flags, error);
+		if (ret < 0)
+			return ret;
+		*action_flags |= MLX5_FLOW_ACTION_INDIRECT_COUNT;
+		break;
+	case RTE_FLOW_ACTION_TYPE_AGE:
+		ret = flow_hw_validate_action_age(dev, action, *action_flags,
+						  *fixed_cnt, error);
+		if (ret < 0)
+			return ret;
+		*action_flags |= MLX5_FLOW_ACTION_INDIRECT_AGE;
+		break;
+	default:
+		DRV_LOG(WARNING, "Unsupported shared action type: %d", type);
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, mask,
+					  "Unsupported indirect action type");
 	}
-	if (!rss_or_queue)
-		return 0;
-	else if (idx >= MLX5_HW_MAX_ACTS)
-		return -1; /* No more space. */
-	total = idx;
-	/*
-	 * If actions template contains MODIFY_FIELD action, then meta copy action can be inserted
-	 * at the template's end. Position of MODIFY_HDR action is based on the position of the
-	 * first MODIFY_FIELD flow action.
-	 */
-	if (modify_field) {
-		*ins_pos = end_idx;
-		goto insert_meta_copy;
-	}
-	/*
-	 * If actions template does not contain MODIFY_FIELD action, then meta copy action must be
-	 * inserted at aplace conforming with action order defined in steering/mlx5dr_action.c.
+	return 0;
+}
+
+/**
+ * Validate raw_encap action.
+ *
+ * @param[in] dev
+ *   Pointer to rte_eth_dev structure.
+ * @param[in] action
+ *   Pointer to the indirect action.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_hw_validate_action_raw_encap(struct rte_eth_dev *dev __rte_unused,
+				  const struct rte_flow_action *action,
+				  struct rte_flow_error *error)
+{
+	const struct rte_flow_action_raw_encap *raw_encap_data = action->conf;
+
+	if (!raw_encap_data || !raw_encap_data->size || !raw_encap_data->data)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "invalid raw_encap_data");
+	return 0;
+}
+
+static inline uint16_t
+flow_hw_template_expand_modify_field(const struct rte_flow_action actions[],
+				     const struct rte_flow_action masks[],
+				     const struct rte_flow_action *mf_action,
+				     const struct rte_flow_action *mf_mask,
+				     struct rte_flow_action *new_actions,
+				     struct rte_flow_action *new_masks,
+				     uint64_t flags, uint32_t act_num)
+{
+	uint32_t i, tail;
+
+	MLX5_ASSERT(actions && masks);
+	MLX5_ASSERT(new_actions && new_masks);
+	MLX5_ASSERT(mf_action && mf_mask);
+	if (flags & MLX5_FLOW_ACTION_MODIFY_FIELD) {
+		/*
+		 * Application action template already has Modify Field.
+		 * Its location will be used in DR.
+		 * Expanded MF action can be added before the END.
+		 */
+		i = act_num - 1;
+		goto insert;
+	}
+	/**
+	 * Locate the first action positioned BEFORE the new MF.
+	 *
+	 * Search for a place to insert modify header
+	 * from the END action backwards:
+	 * 1. END is always present in actions array
+	 * 2. END location is always at action[act_num - 1]
+	 * 3. END always positioned AFTER modify field location
+	 *
+	 * Relative actions order is the same for RX, TX and FDB.
+	 *
+	 * Current actions order (draft-3)
+	 * @see action_order_arr[]
 	 */
-	act_end = false;
-	for (idx = 0; !act_end; idx++) {
-		switch (actions[idx].type) {
-		case RTE_FLOW_ACTION_TYPE_COUNT:
-		case RTE_FLOW_ACTION_TYPE_METER:
-		case RTE_FLOW_ACTION_TYPE_METER_MARK:
-		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+	for (i = act_num - 2; (int)i >= 0; i--) {
+		enum rte_flow_action_type type = actions[i].type;
+
+		if (type == RTE_FLOW_ACTION_TYPE_INDIRECT)
+			type = masks[i].type;
+		switch (type) {
 		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
 		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
 		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
-		case RTE_FLOW_ACTION_TYPE_RSS:
+		case RTE_FLOW_ACTION_TYPE_DROP:
+		case RTE_FLOW_ACTION_TYPE_JUMP:
 		case RTE_FLOW_ACTION_TYPE_QUEUE:
-			*ins_pos = idx;
-			act_end = true;
-			break;
+		case RTE_FLOW_ACTION_TYPE_RSS:
+		case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
+		case RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID:
+		case RTE_FLOW_ACTION_TYPE_VOID:
 		case RTE_FLOW_ACTION_TYPE_END:
-			act_end = true;
 			break;
 		default:
+			i++; /* new MF inserted AFTER actions[i] */
+			goto insert;
 			break;
 		}
 	}
-insert_meta_copy:
-	MLX5_ASSERT(*ins_pos != UINT16_MAX);
-	MLX5_ASSERT(*ins_pos < total);
-	/* Before the position, no change for the actions. */
-	for (idx = 0; idx < *ins_pos; idx++) {
-		new_actions[idx] = actions[idx];
-		new_masks[idx] = masks[idx];
-	}
-	/* Insert the new action and mask to the position. */
-	new_actions[idx] = *ins_actions;
-	new_masks[idx] = *ins_masks;
-	/* Remaining content is right shifted by one position. */
-	for (; idx < total; idx++) {
-		new_actions[idx + 1] = actions[idx];
-		new_masks[idx + 1] = masks[idx];
-	}
-	return 0;
+	i = 0;
+insert:
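+	/*
+	 * Copy the head unchanged, place the new MF action/mask at slot i,
+	 * then shift the remaining actions (including END) by one.
+	 */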
+	tail = act_num - i; /* num action to move */
+	memcpy(new_actions, actions, sizeof(actions[0]) * i);
+	new_actions[i] = *mf_action;
+	memcpy(new_actions + i + 1, actions + i, sizeof(actions[0]) * tail);
+	memcpy(new_masks, masks, sizeof(masks[0]) * i);
+	new_masks[i] = *mf_mask;
+	memcpy(new_masks + i + 1, masks + i, sizeof(masks[0]) * tail);
+	return i;
 }
 
 static int
@@ -3295,13 +3684,17 @@ flow_hw_validate_action_push_vlan(struct rte_eth_dev *dev,
 }
 
 static int
-flow_hw_actions_validate(struct rte_eth_dev *dev,
-			const struct rte_flow_actions_template_attr *attr,
-			const struct rte_flow_action actions[],
-			const struct rte_flow_action masks[],
-			struct rte_flow_error *error)
+mlx5_flow_hw_actions_validate(struct rte_eth_dev *dev,
+			      const struct rte_flow_actions_template_attr *attr,
+			      const struct rte_flow_action actions[],
+			      const struct rte_flow_action masks[],
+			      uint64_t *act_flags,
+			      struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_action_count *count_mask = NULL;
+	bool fixed_cnt = false;
+	uint64_t action_flags = 0;
 	uint16_t i;
 	bool actions_end = false;
 	int ret;
@@ -3327,46 +3720,70 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			break;
 		case RTE_FLOW_ACTION_TYPE_INDIRECT:
-			/* TODO: Validation logic */
+			ret = flow_hw_validate_action_indirect(dev, action,
+							       mask,
+							       &action_flags,
+							       &fixed_cnt,
+							       error);
+			if (ret < 0)
+				return ret;
 			break;
 		case RTE_FLOW_ACTION_TYPE_MARK:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_MARK;
 			break;
 		case RTE_FLOW_ACTION_TYPE_DROP:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_DROP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_JUMP:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_JUMP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_QUEUE:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_QUEUE;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RSS:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_RSS;
 			break;
 		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_ENCAP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_ENCAP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_VXLAN_DECAP:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_DECAP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_NVGRE_DECAP:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_DECAP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
-			/* TODO: Validation logic */
+			ret = flow_hw_validate_action_raw_encap(dev, action, error);
+			if (ret < 0)
+				return ret;
+			action_flags |= MLX5_FLOW_ACTION_ENCAP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RAW_DECAP:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_DECAP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_METER:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_METER;
 			break;
 		case RTE_FLOW_ACTION_TYPE_METER_MARK:
-			/* TODO: Validation logic */
+			ret = flow_hw_validate_action_meter_mark(dev, action,
+								 error);
+			if (ret < 0)
+				return ret;
+			action_flags |= MLX5_FLOW_ACTION_METER;
 			break;
 		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
 			ret = flow_hw_validate_action_modify_field(action,
@@ -3374,21 +3791,43 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 									error);
 			if (ret < 0)
 				return ret;
+			action_flags |= MLX5_FLOW_ACTION_MODIFY_FIELD;
 			break;
 		case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
 			ret = flow_hw_validate_action_represented_port
 					(dev, action, mask, error);
 			if (ret < 0)
 				return ret;
+			action_flags |= MLX5_FLOW_ACTION_PORT_ID;
+			break;
+		case RTE_FLOW_ACTION_TYPE_AGE:
+			if (count_mask && count_mask->id)
+				fixed_cnt = true;
+			ret = flow_hw_validate_action_age(dev, action,
+							  action_flags,
+							  fixed_cnt, error);
+			if (ret < 0)
+				return ret;
+			action_flags |= MLX5_FLOW_ACTION_AGE;
 			break;
 		case RTE_FLOW_ACTION_TYPE_COUNT:
-			/* TODO: Validation logic */
+			ret = flow_hw_validate_action_count(dev, action, mask,
+							    action_flags,
+							    error);
+			if (ret < 0)
+				return ret;
+			count_mask = mask->conf;
+			action_flags |= MLX5_FLOW_ACTION_COUNT;
 			break;
 		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_CT;
 			break;
 		case RTE_FLOW_ACTION_TYPE_OF_POP_VLAN:
+			action_flags |= MLX5_FLOW_ACTION_OF_POP_VLAN;
+			break;
 		case RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID:
+			action_flags |= MLX5_FLOW_ACTION_OF_SET_VLAN_VID;
 			break;
 		case RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN:
 			ret = flow_hw_validate_action_push_vlan
@@ -3398,6 +3837,7 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 			i += is_of_vlan_pcp_present(action) ?
 				MLX5_HW_VLAN_PUSH_PCP_IDX :
 				MLX5_HW_VLAN_PUSH_VID_IDX;
+			action_flags |= MLX5_FLOW_ACTION_OF_PUSH_VLAN;
 			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
@@ -3409,9 +3849,23 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 						  "action not supported in template API");
 		}
 	}
+	if (act_flags != NULL)
+		*act_flags = action_flags;
 	return 0;
 }
 
+static int
+flow_hw_actions_validate(struct rte_eth_dev *dev,
+			 const struct rte_flow_actions_template_attr *attr,
+			 const struct rte_flow_action actions[],
+			 const struct rte_flow_action masks[],
+			 struct rte_flow_error *error)
+{
+	return mlx5_flow_hw_actions_validate(dev, attr, actions, masks, NULL,
+					     error);
+}
+
+
 static enum mlx5dr_action_type mlx5_hw_dr_action_types[] = {
 	[RTE_FLOW_ACTION_TYPE_MARK] = MLX5DR_ACTION_TYP_TAG,
 	[RTE_FLOW_ACTION_TYPE_DROP] = MLX5DR_ACTION_TYP_DROP,
@@ -3424,7 +3878,6 @@ static enum mlx5dr_action_type mlx5_hw_dr_action_types[] = {
 	[RTE_FLOW_ACTION_TYPE_NVGRE_DECAP] = MLX5DR_ACTION_TYP_TNL_L2_TO_L2,
 	[RTE_FLOW_ACTION_TYPE_MODIFY_FIELD] = MLX5DR_ACTION_TYP_MODIFY_HDR,
 	[RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT] = MLX5DR_ACTION_TYP_VPORT,
-	[RTE_FLOW_ACTION_TYPE_COUNT] = MLX5DR_ACTION_TYP_CTR,
 	[RTE_FLOW_ACTION_TYPE_CONNTRACK] = MLX5DR_ACTION_TYP_ASO_CT,
 	[RTE_FLOW_ACTION_TYPE_OF_POP_VLAN] = MLX5DR_ACTION_TYP_POP_VLAN,
 	[RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN] = MLX5DR_ACTION_TYP_PUSH_VLAN,
@@ -3434,7 +3887,7 @@ static int
 flow_hw_dr_actions_template_handle_shared(const struct rte_flow_action *mask,
 					  unsigned int action_src,
 					  enum mlx5dr_action_type *action_types,
-					  uint16_t *curr_off,
+					  uint16_t *curr_off, uint16_t *cnt_off,
 					  struct rte_flow_actions_template *at)
 {
 	uint32_t type;
@@ -3451,10 +3904,18 @@ flow_hw_dr_actions_template_handle_shared(const struct rte_flow_action *mask,
 		action_types[*curr_off] = MLX5DR_ACTION_TYP_TIR;
 		*curr_off = *curr_off + 1;
 		break;
+	case RTE_FLOW_ACTION_TYPE_AGE:
 	case RTE_FLOW_ACTION_TYPE_COUNT:
-		at->actions_off[action_src] = *curr_off;
-		action_types[*curr_off] = MLX5DR_ACTION_TYP_CTR;
-		*curr_off = *curr_off + 1;
+		/*
+	 * Both AGE and COUNT actions need a counter; the first one fills
+		 * the action_types array, and the second only saves the offset.
+		 */
+		if (*cnt_off == UINT16_MAX) {
+			*cnt_off = *curr_off;
+			action_types[*cnt_off] = MLX5DR_ACTION_TYP_CTR;
+			*curr_off = *curr_off + 1;
+		}
+		at->actions_off[action_src] = *cnt_off;
 		break;
 	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
 		at->actions_off[action_src] = *curr_off;
@@ -3493,6 +3954,7 @@ flow_hw_dr_actions_template_create(struct rte_flow_actions_template *at)
 	enum mlx5dr_action_type reformat_act_type = MLX5DR_ACTION_TYP_TNL_L2_TO_L2;
 	uint16_t reformat_off = UINT16_MAX;
 	uint16_t mhdr_off = UINT16_MAX;
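+	/* Shared DR counter slot for AGE/COUNT; UINT16_MAX means unset. */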
+	uint16_t cnt_off = UINT16_MAX;
 	int ret;
 	for (i = 0, curr_off = 0; at->actions[i].type != RTE_FLOW_ACTION_TYPE_END; ++i) {
 		const struct rte_flow_action_raw_encap *raw_encap_data;
@@ -3505,9 +3967,12 @@ flow_hw_dr_actions_template_create(struct rte_flow_actions_template *at)
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			break;
 		case RTE_FLOW_ACTION_TYPE_INDIRECT:
-			ret = flow_hw_dr_actions_template_handle_shared(&at->masks[i], i,
-									action_types,
-									&curr_off, at);
+			ret = flow_hw_dr_actions_template_handle_shared
+								 (&at->masks[i],
+								  i,
+								  action_types,
+								  &curr_off,
+								  &cnt_off, at);
 			if (ret)
 				return NULL;
 			break;
@@ -3563,6 +4028,19 @@ flow_hw_dr_actions_template_create(struct rte_flow_actions_template *at)
 			if (curr_off >= MLX5_HW_MAX_ACTS)
 				goto err_actions_num;
 			break;
+		case RTE_FLOW_ACTION_TYPE_AGE:
+		case RTE_FLOW_ACTION_TYPE_COUNT:
+			/*
+			 * Both AGE and COUNT actions need a counter; the first
+			 * one fills the action_types array, and the second only
+			 * saves the offset.
+			 */
+			if (cnt_off == UINT16_MAX) {
+				cnt_off = curr_off++;
+				action_types[cnt_off] = MLX5DR_ACTION_TYP_CTR;
+			}
+			at->actions_off[i] = cnt_off;
+			break;
 		default:
 			type = mlx5_hw_dr_action_types[at->actions[i].type];
 			at->actions_off[i] = curr_off;
@@ -3703,6 +4181,7 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 	unsigned int i;
 	struct rte_flow_actions_template *at = NULL;
 	uint16_t pos = UINT16_MAX;
+	uint64_t action_flags = 0;
 	struct rte_flow_action tmp_action[MLX5_HW_MAX_ACTS];
 	struct rte_flow_action tmp_mask[MLX5_HW_MAX_ACTS];
 	struct rte_flow_action *ra = (void *)(uintptr_t)actions;
@@ -3745,22 +4224,9 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 		.conf = &rx_mreg_mask,
 	};
 
-	if (flow_hw_actions_validate(dev, attr, actions, masks, error))
+	if (mlx5_flow_hw_actions_validate(dev, attr, actions, masks,
+					  &action_flags, error))
 		return NULL;
-	if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
-	    priv->sh->config.dv_esw_en) {
-		/* Application should make sure only one Q/RSS exist in one rule. */
-		if (flow_hw_action_meta_copy_insert(actions, masks, &rx_cpy, &rx_cpy_mask,
-						    tmp_action, tmp_mask, &pos)) {
-			rte_flow_error_set(error, EINVAL,
-					   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
-					   "Failed to concatenate new action/mask");
-			return NULL;
-		} else if (pos != UINT16_MAX) {
-			ra = tmp_action;
-			rm = tmp_mask;
-		}
-	}
 	for (i = 0; ra[i].type != RTE_FLOW_ACTION_TYPE_END; ++i) {
 		switch (ra[i].type) {
 		/* OF_PUSH_VLAN *MUST* come before OF_SET_VLAN_VID */
@@ -3786,6 +4252,28 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 				   RTE_FLOW_ERROR_TYPE_ACTION, NULL, "Too many actions");
 		return NULL;
 	}
+	if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
+	    priv->sh->config.dv_esw_en &&
+	    (action_flags & (MLX5_FLOW_ACTION_QUEUE | MLX5_FLOW_ACTION_RSS))) {
+		/* Insert META copy */
+		if (act_num + 1 > MLX5_HW_MAX_ACTS) {
+			rte_flow_error_set(error, E2BIG,
+					   RTE_FLOW_ERROR_TYPE_ACTION,
+					   NULL, "cannot expand: too many actions");
+			return NULL;
+		}
+		/* Application should make sure only one Q/RSS exists in one rule. */
+		pos = flow_hw_template_expand_modify_field(actions, masks,
+							   &rx_cpy,
+							   &rx_cpy_mask,
+							   tmp_action, tmp_mask,
+							   action_flags,
+							   act_num);
+		ra = tmp_action;
+		rm = tmp_mask;
+		act_num++;
+		action_flags |= MLX5_FLOW_ACTION_MODIFY_FIELD;
+	}
 	if (set_vlan_vid_ix != -1) {
 		/* If temporary action buffer was not used, copy template actions to it */
 		if (ra == actions && rm == masks) {
@@ -3856,6 +4344,7 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 	at->tmpl = flow_hw_dr_actions_template_create(at);
 	if (!at->tmpl)
 		goto error;
+	at->action_flags = action_flags;
 	__atomic_fetch_add(&at->refcnt, 1, __ATOMIC_RELAXED);
 	LIST_INSERT_HEAD(&priv->flow_hw_at, at, next);
 	return at;
@@ -4199,6 +4688,7 @@ flow_hw_info_get(struct rte_eth_dev *dev,
 		 struct rte_flow_queue_info *queue_info,
 		 struct rte_flow_error *error __rte_unused)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
 	uint16_t port_id = dev->data->port_id;
 	struct rte_mtr_capabilities mtr_cap;
 	int ret;
@@ -4212,6 +4702,8 @@ flow_hw_info_get(struct rte_eth_dev *dev,
 	ret = rte_mtr_capabilities_get(port_id, &mtr_cap, NULL);
 	if (!ret)
 		port_info->max_nb_meters = mtr_cap.n_max;
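+	/* Every aging object is backed by a counter, so both limits match. */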
+	port_info->max_nb_counters = priv->sh->hws_max_nb_counters;
+	port_info->max_nb_aging_objects = port_info->max_nb_counters;
 	return 0;
 }
 
@@ -5586,8 +6078,6 @@ flow_hw_configure(struct rte_eth_dev *dev,
 			goto err;
 		}
 	}
-	if (_queue_attr)
-		mlx5_free(_queue_attr);
 	if (port_attr->nb_conn_tracks) {
 		mem_size = sizeof(struct mlx5_aso_sq) * nb_q_updated +
 			   sizeof(*priv->ct_mng);
@@ -5604,13 +6094,37 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	}
 	if (port_attr->nb_counters) {
 		priv->hws_cpool = mlx5_hws_cnt_pool_create(dev, port_attr,
-				nb_queue);
+							   nb_queue);
 		if (priv->hws_cpool == NULL)
 			goto err;
 	}
+	if (port_attr->nb_aging_objects) {
+		if (port_attr->nb_counters == 0) {
+			/*
+			 * Aging management uses counters. The number of
+			 * requested counters should include a counter for
+			 * each flow rule containing AGE without a counter.
+			 */
+			DRV_LOG(ERR, "Port %u AGE objects are requested (%u) "
+				"but no counters are requested.",
+				dev->data->port_id,
+				port_attr->nb_aging_objects);
+			rte_errno = EINVAL;
+			goto err;
+		}
+		ret = mlx5_hws_age_pool_init(dev, port_attr, nb_queue);
+		if (ret < 0)
+			goto err;
+	}
 	ret = flow_hw_create_vlan(dev);
 	if (ret)
 		goto err;
+	if (_queue_attr)
+		mlx5_free(_queue_attr);
+#ifdef MLX5_HAVE_RTE_FLOW_Q_AGE
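+	/* In strict queue mode, aged-out flows are reported per flow queue. */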
+	if (port_attr->flags & RTE_FLOW_PORT_FLAG_STRICT_QUEUE)
+		priv->hws_strict_queue = 1;
+#endif
 	return 0;
 err:
 	if (priv->hws_ctpool) {
@@ -5621,6 +6135,12 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		flow_hw_ct_mng_destroy(dev, priv->ct_mng);
 		priv->ct_mng = NULL;
 	}
+	if (priv->hws_age_req)
+		mlx5_hws_age_pool_destroy(priv);
+	if (priv->hws_cpool) {
+		mlx5_hws_cnt_pool_destroy(priv->sh, priv->hws_cpool);
+		priv->hws_cpool = NULL;
+	}
 	flow_hw_free_vport_actions(priv);
 	for (i = 0; i < MLX5_HW_ACTION_FLAG_MAX; i++) {
 		if (priv->hw_drop[i])
@@ -5694,8 +6214,12 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 		mlx5_ipool_destroy(priv->acts_ipool);
 		priv->acts_ipool = NULL;
 	}
-	if (priv->hws_cpool)
+	if (priv->hws_age_req)
+		mlx5_hws_age_pool_destroy(priv);
+	if (priv->hws_cpool) {
 		mlx5_hws_cnt_pool_destroy(priv->sh, priv->hws_cpool);
+		priv->hws_cpool = NULL;
+	}
 	if (priv->hws_ctpool) {
 		flow_hw_ct_pool_destroy(dev, priv->hws_ctpool);
 		priv->hws_ctpool = NULL;
@@ -6030,13 +6554,81 @@ flow_hw_conntrack_create(struct rte_eth_dev *dev, uint32_t queue,
 		MLX5_ACTION_CTX_CT_GEN_IDX(PORT_ID(priv), ct_idx);
 }
 
+/**
+ * Validate shared action.
+ *
+ * @param[in] dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param[in] queue
+ *   Which queue to be used.
+ * @param[in] attr
+ *   Operation attribute.
+ * @param[in] conf
+ *   Indirect action configuration.
+ * @param[in] action
+ *   rte_flow action detail.
+ * @param[in] user_data
+ *   Pointer to the user_data.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, otherwise negative errno value.
+ */
+static int
+flow_hw_action_handle_validate(struct rte_eth_dev *dev, uint32_t queue,
+			       const struct rte_flow_op_attr *attr,
+			       const struct rte_flow_indir_action_conf *conf,
+			       const struct rte_flow_action *action,
+			       void *user_data,
+			       struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	RTE_SET_USED(attr);
+	RTE_SET_USED(queue);
+	RTE_SET_USED(user_data);
+	switch (action->type) {
+	case RTE_FLOW_ACTION_TYPE_AGE:
+		if (!priv->hws_age_req)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  NULL,
+						  "aging pool not initialized");
+		break;
+	case RTE_FLOW_ACTION_TYPE_COUNT:
+		if (!priv->hws_cpool)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  NULL,
+						  "counters pool not initialized");
+		break;
+	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+		if (priv->hws_ctpool == NULL)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  NULL,
+						  "CT pool not initialized");
+		return mlx5_validate_action_ct(dev, action->conf, error);
+	case RTE_FLOW_ACTION_TYPE_METER_MARK:
+		return flow_hw_validate_action_meter_mark(dev, action, error);
+	case RTE_FLOW_ACTION_TYPE_RSS:
+		return flow_dv_action_validate(dev, conf, action, error);
+	default:
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					  "action type not supported");
+	}
+	return 0;
+}
+
 /**
  * Create shared action.
  *
  * @param[in] dev
  *   Pointer to the rte_eth_dev structure.
  * @param[in] queue
- *   Which queue to be used..
+ *   Which queue to be used.
  * @param[in] attr
  *   Operation attribute.
  * @param[in] conf
@@ -6061,16 +6653,44 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 {
 	struct rte_flow_action_handle *handle = NULL;
 	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_action_age *age;
 	struct mlx5_aso_mtr *aso_mtr;
 	cnt_id_t cnt_id;
 	uint32_t mtr_id;
+	uint32_t age_idx;
 
-	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
 	RTE_SET_USED(user_data);
 	switch (action->type) {
+	case RTE_FLOW_ACTION_TYPE_AGE:
+		if (priv->hws_strict_queue) {
+			struct mlx5_age_info *info = GET_PORT_AGE_INFO(priv);
+
+			if (queue >= info->hw_q_age->nb_rings) {
+				rte_flow_error_set(error, EINVAL,
+						   RTE_FLOW_ERROR_TYPE_ACTION,
+						   NULL,
+						   "Invalid queue ID for indirect AGE.");
+				rte_errno = EINVAL;
+				return NULL;
+			}
+		}
+		age = action->conf;
+		age_idx = mlx5_hws_age_action_create(priv, queue, true, age,
+						     0, error);
+		if (age_idx == 0) {
+			rte_flow_error_set(error, ENODEV,
+					   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					   "AGE is not configured!");
+		} else {
+			age_idx = (MLX5_INDIRECT_ACTION_TYPE_AGE <<
+				   MLX5_INDIRECT_ACTION_TYPE_OFFSET) | age_idx;
+			handle =
+			    (struct rte_flow_action_handle *)(uintptr_t)age_idx;
+		}
+		break;
 	case RTE_FLOW_ACTION_TYPE_COUNT:
-		if (mlx5_hws_cnt_shared_get(priv->hws_cpool, &cnt_id))
+		if (mlx5_hws_cnt_shared_get(priv->hws_cpool, &cnt_id, 0))
 			rte_flow_error_set(error, ENODEV,
 					RTE_FLOW_ERROR_TYPE_ACTION,
 					NULL,
@@ -6090,8 +6710,13 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 			MLX5_INDIRECT_ACTION_TYPE_OFFSET) | (aso_mtr->fm.meter_id);
 		handle = (struct rte_flow_action_handle *)(uintptr_t)mtr_id;
 		break;
-	default:
+	case RTE_FLOW_ACTION_TYPE_RSS:
 		handle = flow_dv_action_create(dev, conf, action, error);
+		break;
+	default:
+		rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION,
+				   NULL, "action type not supported");
+		return NULL;
 	}
 	return handle;
 }
@@ -6102,7 +6727,7 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
  * @param[in] dev
  *   Pointer to the rte_eth_dev structure.
  * @param[in] queue
- *   Which queue to be used..
+ *   Which queue to be used.
  * @param[in] attr
  *   Operation attribute.
  * @param[in] handle
@@ -6125,7 +6750,6 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 			     void *user_data,
 			     struct rte_flow_error *error)
 {
-	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
 	RTE_SET_USED(user_data);
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -6140,6 +6764,8 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 	uint32_t idx = act_idx & ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
 
 	switch (type) {
+	case MLX5_INDIRECT_ACTION_TYPE_AGE:
+		return mlx5_hws_age_action_update(priv, idx, update, error);
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
 		return flow_hw_conntrack_update(dev, queue, update, act_idx, error);
 	case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
@@ -6173,11 +6799,15 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 			return rte_flow_error_set(error, EINVAL,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL, "Unable to wait for ASO meter CQE");
-		return 0;
-	default:
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_RSS:
+		return flow_dv_action_update(dev, handle, update, error);
+	default:
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					  "action type not supported");
 	}
-	return flow_dv_action_update(dev, handle, update, error);
+	return 0;
 }
 
 /**
@@ -6186,7 +6816,7 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
  * @param[in] dev
  *   Pointer to the rte_eth_dev structure.
  * @param[in] queue
- *   Which queue to be used..
+ *   Which queue to be used.
  * @param[in] attr
  *   Operation attribute.
  * @param[in] handle
@@ -6208,6 +6838,7 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 {
 	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
 	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+	uint32_t age_idx = act_idx & MLX5_HWS_AGE_IDX_MASK;
 	uint32_t idx = act_idx & ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
@@ -6218,7 +6849,16 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 	RTE_SET_USED(attr);
 	RTE_SET_USED(user_data);
 	switch (type) {
+	case MLX5_INDIRECT_ACTION_TYPE_AGE:
+		return mlx5_hws_age_action_destroy(priv, age_idx, error);
 	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
+		age_idx = mlx5_hws_cnt_age_get(priv->hws_cpool, act_idx);
+		if (age_idx != 0)
+			/*
+			 * If this counter belongs to indirect AGE, here is the
+			 * time to update the AGE.
+			 */
+			mlx5_hws_age_nb_cnt_decrease(priv, age_idx);
 		return mlx5_hws_cnt_shared_put(priv->hws_cpool, &act_idx);
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
 		return flow_hw_conntrack_destroy(dev, act_idx, error);
@@ -6243,10 +6883,15 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL, "Unable to wait for ASO meter CQE");
 		mlx5_ipool_free(pool->idx_pool, idx);
-		return 0;
-	default:
+		break;
+	case MLX5_INDIRECT_ACTION_TYPE_RSS:
 		return flow_dv_action_destroy(dev, handle, error);
+	default:
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					  "action type not supported");
 	}
+	return 0;
 }
 
 static int
@@ -6256,13 +6901,14 @@ flow_hw_query_counter(const struct rte_eth_dev *dev, uint32_t counter,
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_hws_cnt *cnt;
 	struct rte_flow_query_count *qc = data;
-	uint32_t iidx = mlx5_hws_cnt_iidx(priv->hws_cpool, counter);
+	uint32_t iidx;
 	uint64_t pkts, bytes;
 
 	if (!mlx5_hws_cnt_id_valid(counter))
 		return rte_flow_error_set(error, EINVAL,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
 				"counter are not available");
+	iidx = mlx5_hws_cnt_iidx(priv->hws_cpool, counter);
 	cnt = &priv->hws_cpool->pool[iidx];
 	__hws_cnt_query_raw(priv->hws_cpool, counter, &pkts, &bytes);
 	qc->hits_set = 1;
@@ -6276,12 +6922,64 @@ flow_hw_query_counter(const struct rte_eth_dev *dev, uint32_t counter,
 	return 0;
 }
 
+/**
+ * Query a flow rule AGE action for aging information.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] age_idx
+ *   Index of AGE action parameter.
+ * @param[out] data
+ *   Data retrieved by the query.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_hw_query_age(const struct rte_eth_dev *dev, uint32_t age_idx, void *data,
+		  struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct mlx5_indexed_pool *ipool = age_info->ages_ipool;
+	struct mlx5_hws_age_param *param = mlx5_ipool_get(ipool, age_idx);
+	struct rte_flow_query_age *resp = data;
+
+	if (!param || !param->timeout)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "age data not available");
+	switch (__atomic_load_n(&param->state, __ATOMIC_RELAXED)) {
+	case HWS_AGE_AGED_OUT_REPORTED:
+	case HWS_AGE_AGED_OUT_NOT_REPORTED:
+		resp->aged = 1;
+		break;
+	case HWS_AGE_CANDIDATE:
+	case HWS_AGE_CANDIDATE_INSIDE_RING:
+		resp->aged = 0;
+		break;
+	case HWS_AGE_FREE:
+		/*
+		 * When state is FREE the flow itself should be invalid.
+		 * Fall-through.
+		 */
+	default:
+		MLX5_ASSERT(0);
+		break;
+	}
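+	/* Report seconds since last hit only while the flow is not aged-out. */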
+	resp->sec_since_last_hit_valid = !resp->aged;
+	if (resp->sec_since_last_hit_valid)
+		resp->sec_since_last_hit = __atomic_load_n
+				 (&param->sec_since_last_hit, __ATOMIC_RELAXED);
+	return 0;
+}
+
 static int
-flow_hw_query(struct rte_eth_dev *dev,
-	      struct rte_flow *flow __rte_unused,
-	      const struct rte_flow_action *actions __rte_unused,
-	      void *data __rte_unused,
-	      struct rte_flow_error *error __rte_unused)
+flow_hw_query(struct rte_eth_dev *dev, struct rte_flow *flow,
+	      const struct rte_flow_action *actions, void *data,
+	      struct rte_flow_error *error)
 {
 	int ret = -EINVAL;
 	struct rte_flow_hw *hw_flow = (struct rte_flow_hw *)flow;
@@ -6292,7 +6990,11 @@ flow_hw_query(struct rte_eth_dev *dev,
 			break;
 		case RTE_FLOW_ACTION_TYPE_COUNT:
 			ret = flow_hw_query_counter(dev, hw_flow->cnt_id, data,
-						  error);
+						    error);
+			break;
+		case RTE_FLOW_ACTION_TYPE_AGE:
+			ret = flow_hw_query_age(dev, hw_flow->age_idx, data,
+						error);
 			break;
 		default:
 			return rte_flow_error_set(error, ENOTSUP,
@@ -6304,6 +7006,32 @@ flow_hw_query(struct rte_eth_dev *dev,
 	return ret;
 }
 
+/**
+ * Validate indirect action.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] conf
+ *   Shared action configuration.
+ * @param[in] action
+ *   Action specification used to create indirect action.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. Initialized in case of
+ *   error only.
+ *
+ * @return
+ *   0 on success, otherwise negative errno value.
+ */
+static int
+flow_hw_action_validate(struct rte_eth_dev *dev,
+			const struct rte_flow_indir_action_conf *conf,
+			const struct rte_flow_action *action,
+			struct rte_flow_error *err)
+{
+	return flow_hw_action_handle_validate(dev, MLX5_HW_INV_QUEUE, NULL,
+					      conf, action, NULL, err);
+}
+
 /**
  * Create indirect action.
  *
@@ -6393,17 +7121,118 @@ flow_hw_action_query(struct rte_eth_dev *dev,
 {
 	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
 	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+	uint32_t age_idx = act_idx & MLX5_HWS_AGE_IDX_MASK;
 
 	switch (type) {
+	case MLX5_INDIRECT_ACTION_TYPE_AGE:
+		return flow_hw_query_age(dev, age_idx, data, error);
 	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
 		return flow_hw_query_counter(dev, act_idx, data, error);
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
 		return flow_hw_conntrack_query(dev, act_idx, data, error);
-	default:
+	case MLX5_INDIRECT_ACTION_TYPE_RSS:
 		return flow_dv_action_query(dev, handle, data, error);
+	default:
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					  "action type not supported");
 	}
 }
 
+/**
+ * Get aged-out flows of a given port on the given HWS flow queue.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] queue_id
+ *   Flow queue to query. Ignored when RTE_FLOW_PORT_FLAG_STRICT_QUEUE is not set.
+ * @param[in, out] contexts
+ *   The address of an array of pointers to the aged-out flows contexts.
+ * @param[in] nb_contexts
+ *   The length of context array pointers.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. Initialized in case of
+ *   error only.
+ *
+ * @return
+ *   if nb_contexts is 0, return the amount of all aged contexts.
+ *   if nb_contexts is not 0, return the amount of aged flows reported
+ *   in the context array, otherwise a negative errno value.
+ */
+static int
+flow_hw_get_q_aged_flows(struct rte_eth_dev *dev, uint32_t queue_id,
+			 void **contexts, uint32_t nb_contexts,
+			 struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct rte_ring *r;
+	int nb_flows = 0;
+
+	if (nb_contexts && !contexts)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "empty context");
+	if (priv->hws_strict_queue) {
+		if (queue_id >= age_info->hw_q_age->nb_rings)
+			return rte_flow_error_set(error, EINVAL,
+						RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+						NULL, "invalid queue id");
+		r = age_info->hw_q_age->aged_lists[queue_id];
+	} else {
+		r = age_info->hw_age.aged_list;
+		MLX5_AGE_SET(age_info, MLX5_AGE_TRIGGER);
+	}
+	if (nb_contexts == 0)
+		return rte_ring_count(r);
+	while ((uint32_t)nb_flows < nb_contexts) {
+		uint32_t age_idx;
+
+		if (rte_ring_dequeue_elem(r, &age_idx, sizeof(uint32_t)) < 0)
+			break;
+		/* Get the AGE context if the aged-out index is still valid. */
+		contexts[nb_flows] = mlx5_hws_age_context_get(priv, age_idx);
+		if (!contexts[nb_flows])
+			continue;
+		nb_flows++;
+	}
+	return nb_flows;
+}
+
+/**
+ * Get aged-out flows.
+ *
+ * This function is relevant only if RTE_FLOW_PORT_FLAG_STRICT_QUEUE isn't set.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] contexts
+ *   The address of an array of pointers to the aged-out flows contexts.
+ * @param[in] nb_contexts
+ *   The length of context array pointers.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. Initialized in case of
+ *   error only.
+ *
+ * @return
+ *   if nb_contexts is 0, return the amount of all aged contexts.
+ *   if nb_contexts is not 0, return the amount of aged flows reported
+ *   in the context array, otherwise a negative errno value.
+ */
+static int
+flow_hw_get_aged_flows(struct rte_eth_dev *dev, void **contexts,
+		       uint32_t nb_contexts, struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (priv->hws_strict_queue)
+		DRV_LOG(WARNING,
+			"port %u get aged flows called in strict queue mode.",
+			dev->data->port_id);
+	return flow_hw_get_q_aged_flows(dev, 0, contexts, nb_contexts, error);
+}
+
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.info_get = flow_hw_info_get,
 	.configure = flow_hw_configure,
@@ -6422,12 +7251,14 @@ const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.async_action_create = flow_hw_action_handle_create,
 	.async_action_destroy = flow_hw_action_handle_destroy,
 	.async_action_update = flow_hw_action_handle_update,
-	.action_validate = flow_dv_action_validate,
+	.action_validate = flow_hw_action_validate,
 	.action_create = flow_hw_action_create,
 	.action_destroy = flow_hw_action_destroy,
 	.action_update = flow_hw_action_update,
 	.action_query = flow_hw_action_query,
 	.query = flow_hw_query,
+	.get_aged_flows = flow_hw_get_aged_flows,
+	.get_q_aged_flows = flow_hw_get_q_aged_flows,
 };
 
 /**
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index 7ffaf4c227..81a33ddf09 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -122,7 +122,7 @@ flow_verbs_counter_get_by_idx(struct rte_eth_dev *dev,
 			      struct mlx5_flow_counter_pool **ppool)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_flow_counter_mng *cmng = &priv->sh->cmng;
+	struct mlx5_flow_counter_mng *cmng = &priv->sh->sws_cmng;
 	struct mlx5_flow_counter_pool *pool;
 
 	idx = (idx - 1) & (MLX5_CNT_SHARED_OFFSET - 1);
@@ -215,7 +215,7 @@ static uint32_t
 flow_verbs_counter_new(struct rte_eth_dev *dev, uint32_t id __rte_unused)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_flow_counter_mng *cmng = &priv->sh->cmng;
+	struct mlx5_flow_counter_mng *cmng = &priv->sh->sws_cmng;
 	struct mlx5_flow_counter_pool *pool = NULL;
 	struct mlx5_flow_counter *cnt = NULL;
 	uint32_t n_valid = cmng->n_valid;
diff --git a/drivers/net/mlx5/mlx5_hws_cnt.c b/drivers/net/mlx5/mlx5_hws_cnt.c
index e2408ef36d..93038ce68b 100644
--- a/drivers/net/mlx5/mlx5_hws_cnt.c
+++ b/drivers/net/mlx5/mlx5_hws_cnt.c
@@ -8,6 +8,7 @@
 #include <rte_ring.h>
 #include <mlx5_devx_cmds.h>
 #include <rte_cycles.h>
+#include <rte_eal_paging.h>
 
 #if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
 
@@ -26,8 +27,8 @@ __hws_cnt_id_load(struct mlx5_hws_cnt_pool *cpool)
 	uint32_t preload;
 	uint32_t q_num = cpool->cache->q_num;
 	uint32_t cnt_num = mlx5_hws_cnt_pool_get_size(cpool);
-	cnt_id_t cnt_id, iidx = 0;
-	uint32_t qidx;
+	cnt_id_t cnt_id;
+	uint32_t qidx, iidx = 0;
 	struct rte_ring *qcache = NULL;
 
 	/*
@@ -86,6 +87,174 @@ __mlx5_hws_cnt_svc(struct mlx5_dev_ctx_shared *sh,
 	} while (reset_cnt_num > 0);
 }
 
+/**
+ * Release AGE parameter.
+ *
+ * @param priv
+ *   Pointer to the port private data structure.
+ * @param own_cnt_index
+ *   Counter ID created only for this AGE, to be released.
+ *   Zero means there is no such counter.
+ * @param age_ipool
+ *   Pointer to AGE parameter indexed pool.
+ * @param idx
+ *   Index of AGE parameter in the indexed pool.
+ */
+static void
+mlx5_hws_age_param_free(struct mlx5_priv *priv, cnt_id_t own_cnt_index,
+			struct mlx5_indexed_pool *age_ipool, uint32_t idx)
+{
+	if (own_cnt_index) {
+		struct mlx5_hws_cnt_pool *cpool = priv->hws_cpool;
+
+		MLX5_ASSERT(mlx5_hws_cnt_is_shared(cpool, own_cnt_index));
+		mlx5_hws_cnt_shared_put(cpool, &own_cnt_index);
+	}
+	mlx5_ipool_free(age_ipool, idx);
+}
+
+/**
+ * Check and callback event for new aged flow in the HWS counter pool.
+ *
+ * @param[in] priv
+ *   Pointer to port private object.
+ * @param[in] cpool
+ *   Pointer to current counter pool.
+ */
+static void
+mlx5_hws_aging_check(struct mlx5_priv *priv, struct mlx5_hws_cnt_pool *cpool)
+{
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct flow_counter_stats *stats = cpool->raw_mng->raw;
+	struct mlx5_hws_age_param *param;
+	struct rte_ring *r;
+	const uint64_t curr_time = MLX5_CURR_TIME_SEC;
+	const uint32_t time_delta = curr_time - cpool->time_of_last_age_check;
+	uint32_t nb_alloc_cnts = mlx5_hws_cnt_pool_get_size(cpool);
+	uint16_t expected1 = HWS_AGE_CANDIDATE;
+	uint16_t expected2 = HWS_AGE_CANDIDATE_INSIDE_RING;
+	uint32_t i;
+
+	cpool->time_of_last_age_check = curr_time;
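+	/*
+	 * Scan all allocated counters and add the elapsed time to each
+	 * attached AGE parameter; age-out those exceeding their timeout.
+	 */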
+	for (i = 0; i < nb_alloc_cnts; ++i) {
+		uint32_t age_idx = cpool->pool[i].age_idx;
+		uint64_t hits;
+
+		if (!cpool->pool[i].in_used || age_idx == 0)
+			continue;
+		param = mlx5_ipool_get(age_info->ages_ipool, age_idx);
+		if (unlikely(param == NULL)) {
+			/*
+			 * When an AGE uses an indirect counter, it is the
+			 * user's responsibility not to use this indirect
+			 * counter without the AGE.
+			 * If this counter is used after the AGE was freed, the
+			 * AGE index is invalid and using it here will cause a
+			 * segmentation fault.
+			 */
+			DRV_LOG(WARNING,
+				"Counter %u has lost its AGE, skipping it.", i);
+			continue;
+		}
+		if (param->timeout == 0)
+			continue;
+		switch (__atomic_load_n(&param->state, __ATOMIC_RELAXED)) {
+		case HWS_AGE_AGED_OUT_NOT_REPORTED:
+		case HWS_AGE_AGED_OUT_REPORTED:
+			/* Already aged-out, no action is needed. */
+			continue;
+		case HWS_AGE_CANDIDATE:
+		case HWS_AGE_CANDIDATE_INSIDE_RING:
+			/* This AGE candidate to be aged-out, go to checking. */
+			break;
+		case HWS_AGE_FREE:
+			/*
+			 * An AGE parameter in "FREE" state cannot be referenced
+			 * by any counter since the counter is destroyed first.
+			 * Fall-through.
+			 */
+		default:
+			MLX5_ASSERT(0);
+			continue;
+		}
+		hits = rte_be_to_cpu_64(stats[i].hits);
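+		/*
+		 * With a single counter compare the hits directly; with
+		 * several counters accumulate hits from all of them first.
+		 */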
+		if (param->nb_cnts == 1) {
+			if (hits != param->accumulator_last_hits) {
+				__atomic_store_n(&param->sec_since_last_hit, 0,
+						 __ATOMIC_RELAXED);
+				param->accumulator_last_hits = hits;
+				continue;
+			}
+		} else {
+			param->accumulator_hits += hits;
+			param->accumulator_cnt++;
+			if (param->accumulator_cnt < param->nb_cnts)
+				continue;
+			param->accumulator_cnt = 0;
+			if (param->accumulator_last_hits !=
+						param->accumulator_hits) {
+				__atomic_store_n(&param->sec_since_last_hit,
+						 0, __ATOMIC_RELAXED);
+				param->accumulator_last_hits =
+							param->accumulator_hits;
+				param->accumulator_hits = 0;
+				continue;
+			}
+			param->accumulator_hits = 0;
+		}
+		if (__atomic_add_fetch(&param->sec_since_last_hit, time_delta,
+				       __ATOMIC_RELAXED) <=
+		   __atomic_load_n(&param->timeout, __ATOMIC_RELAXED))
+			continue;
+		/* Prepare the relevant ring for this AGE parameter */
+		if (priv->hws_strict_queue)
+			r = age_info->hw_q_age->aged_lists[param->queue_id];
+		else
+			r = age_info->hw_age.aged_list;
+		/* Change the state atomically and insert it into the ring. */
+		if (__atomic_compare_exchange_n(&param->state, &expected1,
+						HWS_AGE_AGED_OUT_NOT_REPORTED,
+						false, __ATOMIC_RELAXED,
+						__ATOMIC_RELAXED)) {
+			int ret = rte_ring_enqueue_burst_elem(r, &age_idx,
+							      sizeof(uint32_t),
+							      1, NULL);
+
+			/*
+			 * If the ring doesn't have enough room for this entry,
+			 * restore the previous state for the next second.
+			 *
+			 * FIXME: if the flow gets traffic before the next
+			 *        second, this "aged out" event is lost; fix it
+			 *        later by filling the ring in bulks.
+			 */
+			expected2 = HWS_AGE_AGED_OUT_NOT_REPORTED;
+			if (ret == 0 &&
+			    !__atomic_compare_exchange_n(&param->state,
+							 &expected2, expected1,
+							 false,
+							 __ATOMIC_RELAXED,
+							 __ATOMIC_RELAXED) &&
+			    expected2 == HWS_AGE_FREE)
+				mlx5_hws_age_param_free(priv,
+							param->own_cnt_index,
+							age_info->ages_ipool,
+							age_idx);
+			/* The event is irrelevant in strict queue mode. */
+			if (!priv->hws_strict_queue)
+				MLX5_AGE_SET(age_info, MLX5_AGE_EVENT_NEW);
+		} else {
+			__atomic_compare_exchange_n(&param->state, &expected2,
+						  HWS_AGE_AGED_OUT_NOT_REPORTED,
+						  false, __ATOMIC_RELAXED,
+						  __ATOMIC_RELAXED);
+		}
+	}
+	/* The event is irrelevant in strict queue mode. */
+	if (!priv->hws_strict_queue)
+		mlx5_age_event_prepare(priv->sh);
+}
+
 static void
 mlx5_hws_cnt_raw_data_free(struct mlx5_dev_ctx_shared *sh,
 			   struct mlx5_hws_cnt_raw_data_mng *mng)
@@ -104,12 +273,14 @@ mlx5_hws_cnt_raw_data_alloc(struct mlx5_dev_ctx_shared *sh, uint32_t n)
 	struct mlx5_hws_cnt_raw_data_mng *mng = NULL;
 	int ret;
 	size_t sz = n * sizeof(struct flow_counter_stats);
+	size_t pgsz = rte_mem_page_size();
 
+	MLX5_ASSERT(pgsz > 0);
 	mng = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO, sizeof(*mng), 0,
 			SOCKET_ID_ANY);
 	if (mng == NULL)
 		goto error;
-	mng->raw = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO, sz, 0,
+	mng->raw = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO, sz, pgsz,
 			SOCKET_ID_ANY);
 	if (mng->raw == NULL)
 		goto error;
@@ -146,6 +317,9 @@ mlx5_hws_cnt_svc(void *opaque)
 			    opriv->sh == sh &&
 			    opriv->hws_cpool != NULL) {
 				__mlx5_hws_cnt_svc(sh, opriv->hws_cpool);
+				if (opriv->hws_age_req)
+					mlx5_hws_aging_check(opriv,
+							     opriv->hws_cpool);
 			}
 		}
 		query_cycle = rte_rdtsc() - start_cycle;
@@ -158,8 +332,9 @@ mlx5_hws_cnt_svc(void *opaque)
 }
 
 struct mlx5_hws_cnt_pool *
-mlx5_hws_cnt_pool_init(const struct mlx5_hws_cnt_pool_cfg *pcfg,
-		const struct mlx5_hws_cache_param *ccfg)
+mlx5_hws_cnt_pool_init(struct mlx5_dev_ctx_shared *sh,
+		       const struct mlx5_hws_cnt_pool_cfg *pcfg,
+		       const struct mlx5_hws_cache_param *ccfg)
 {
 	char mz_name[RTE_MEMZONE_NAMESIZE];
 	struct mlx5_hws_cnt_pool *cntp;
@@ -185,16 +360,26 @@ mlx5_hws_cnt_pool_init(const struct mlx5_hws_cnt_pool_cfg *pcfg,
 	cntp->cache->preload_sz = ccfg->preload_sz;
 	cntp->cache->threshold = ccfg->threshold;
 	cntp->cache->q_num = ccfg->q_num;
+	if (pcfg->request_num > sh->hws_max_nb_counters) {
+		DRV_LOG(ERR, "Counter number %u "
+			"is greater than the maximum supported (%u).",
+			pcfg->request_num, sh->hws_max_nb_counters);
+		goto error;
+	}
 	cnt_num = pcfg->request_num * (100 + pcfg->alloc_factor) / 100;
 	if (cnt_num > UINT32_MAX) {
 		DRV_LOG(ERR, "counter number %"PRIu64" is out of 32bit range",
 			cnt_num);
 		goto error;
 	}
+	/*
+	 * The requested counter number is supported, but applying the
+	 * allocation factor may exceed the maximum, so cap it if needed.
+	 */
+	cnt_num = RTE_MIN((uint32_t)cnt_num, sh->hws_max_nb_counters);
 	cntp->pool = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO,
-			sizeof(struct mlx5_hws_cnt) *
-			pcfg->request_num * (100 + pcfg->alloc_factor) / 100,
-			0, SOCKET_ID_ANY);
+				 sizeof(struct mlx5_hws_cnt) * cnt_num,
+				 0, SOCKET_ID_ANY);
 	if (cntp->pool == NULL)
 		goto error;
 	snprintf(mz_name, sizeof(mz_name), "%s_F_RING", pcfg->name);
@@ -231,6 +416,8 @@ mlx5_hws_cnt_pool_init(const struct mlx5_hws_cnt_pool_cfg *pcfg,
 		if (cntp->cache->qcache[qidx] == NULL)
 			goto error;
 	}
+	/* Initialize the time for aging-out calculation. */
+	cntp->time_of_last_age_check = MLX5_CURR_TIME_SEC;
 	return cntp;
 error:
 	mlx5_hws_cnt_pool_deinit(cntp);
@@ -297,19 +484,17 @@ mlx5_hws_cnt_pool_dcs_alloc(struct mlx5_dev_ctx_shared *sh,
 			    struct mlx5_hws_cnt_pool *cpool)
 {
 	struct mlx5_hca_attr *hca_attr = &sh->cdev->config.hca_attr;
-	uint32_t max_log_bulk_sz = 0;
+	uint32_t max_log_bulk_sz = sh->hws_max_log_bulk_sz;
 	uint32_t log_bulk_sz;
-	uint32_t idx, alloced = 0;
+	uint32_t idx, alloc_candidate, alloced = 0;
 	unsigned int cnt_num = mlx5_hws_cnt_pool_get_size(cpool);
 	struct mlx5_devx_counter_attr attr = {0};
 	struct mlx5_devx_obj *dcs;
 
 	if (hca_attr->flow_counter_bulk_log_max_alloc == 0) {
-		DRV_LOG(ERR,
-			"Fw doesn't support bulk log max alloc");
+		DRV_LOG(ERR, "Fw doesn't support bulk log max alloc");
 		return -1;
 	}
-	max_log_bulk_sz = 23; /* hard code to 8M (1 << 23). */
 	cnt_num = RTE_ALIGN_CEIL(cnt_num, 4); /* minimal 4 counter in bulk. */
 	log_bulk_sz = RTE_MIN(max_log_bulk_sz, rte_log2_u32(cnt_num));
 	attr.pd = sh->cdev->pdn;
@@ -327,18 +512,23 @@ mlx5_hws_cnt_pool_dcs_alloc(struct mlx5_dev_ctx_shared *sh,
 	cpool->dcs_mng.dcs[0].iidx = 0;
 	alloced = cpool->dcs_mng.dcs[0].batch_sz;
 	if (cnt_num > cpool->dcs_mng.dcs[0].batch_sz) {
-		for (; idx < MLX5_HWS_CNT_DCS_NUM; idx++) {
+		while (idx < MLX5_HWS_CNT_DCS_NUM) {
 			attr.flow_counter_bulk_log_size = --max_log_bulk_sz;
+			alloc_candidate = RTE_BIT32(max_log_bulk_sz);
+			if (alloced + alloc_candidate > sh->hws_max_nb_counters)
+				continue;
 			dcs = mlx5_devx_cmd_flow_counter_alloc_general
 				(sh->cdev->ctx, &attr);
 			if (dcs == NULL)
 				goto error;
 			cpool->dcs_mng.dcs[idx].obj = dcs;
-			cpool->dcs_mng.dcs[idx].batch_sz =
-				(1 << max_log_bulk_sz);
+			cpool->dcs_mng.dcs[idx].batch_sz = alloc_candidate;
 			cpool->dcs_mng.dcs[idx].iidx = alloced;
 			alloced += cpool->dcs_mng.dcs[idx].batch_sz;
 			cpool->dcs_mng.batch_total++;
+			if (alloced >= cnt_num)
+				break;
+			idx++;
 		}
 	}
 	return 0;
@@ -445,7 +635,7 @@ mlx5_hws_cnt_pool_create(struct rte_eth_dev *dev,
 			dev->data->port_id);
 	pcfg.name = mp_name;
 	pcfg.request_num = pattr->nb_counters;
-	cpool = mlx5_hws_cnt_pool_init(&pcfg, &cparam);
+	cpool = mlx5_hws_cnt_pool_init(priv->sh, &pcfg, &cparam);
 	if (cpool == NULL)
 		goto error;
 	ret = mlx5_hws_cnt_pool_dcs_alloc(priv->sh, cpool);
@@ -525,4 +715,533 @@ mlx5_hws_cnt_svc_deinit(struct mlx5_dev_ctx_shared *sh)
 	sh->cnt_svc = NULL;
 }
 
+/**
+ * Destroy AGE action.
+ *
+ * @param priv
+ *   Pointer to the port private data structure.
+ * @param idx
+ *   Index of AGE parameter.
+ * @param error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_hws_age_action_destroy(struct mlx5_priv *priv, uint32_t idx,
+			    struct rte_flow_error *error)
+{
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct mlx5_indexed_pool *ipool = age_info->ages_ipool;
+	struct mlx5_hws_age_param *param = mlx5_ipool_get(ipool, idx);
+
+	if (param == NULL)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  "invalid AGE parameter index");
+	switch (__atomic_exchange_n(&param->state, HWS_AGE_FREE,
+				    __ATOMIC_RELAXED)) {
+	case HWS_AGE_CANDIDATE:
+	case HWS_AGE_AGED_OUT_REPORTED:
+		mlx5_hws_age_param_free(priv, param->own_cnt_index, ipool, idx);
+		break;
+	case HWS_AGE_AGED_OUT_NOT_REPORTED:
+	case HWS_AGE_CANDIDATE_INSIDE_RING:
+		/*
+		 * In both cases the AGE is inside the ring. Change the state
+		 * here and destroy it later when it is taken out of the ring.
+		 */
+		break;
+	case HWS_AGE_FREE:
+		/*
+		 * If the index is valid but the state is FREE, this AGE has
+		 * already been freed by the user but not yet by the PMD since
+		 * it is still inside the ring.
+		 */
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  "this AGE has already been released");
+	default:
+		MLX5_ASSERT(0);
+		break;
+	}
+	return 0;
+}
+
+/**
+ * Create AGE action parameter.
+ *
+ * @param[in] priv
+ *   Pointer to the port private data structure.
+ * @param[in] queue_id
+ *   HWS queue to be used.
+ * @param[in] shared
+ *   Whether it is an indirect AGE action.
+ * @param[in] age
+ *   Pointer to the aging action configuration.
+ * @param[in] flow_idx
+ *   Flow index from the indexed pool.
+ *   Ignored for indirect AGE actions.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   Index to AGE action parameter on success, 0 otherwise.
+ */
+uint32_t
+mlx5_hws_age_action_create(struct mlx5_priv *priv, uint32_t queue_id,
+			   bool shared, const struct rte_flow_action_age *age,
+			   uint32_t flow_idx, struct rte_flow_error *error)
+{
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct mlx5_indexed_pool *ipool = age_info->ages_ipool;
+	struct mlx5_hws_age_param *param;
+	uint32_t age_idx;
+
+	param = mlx5_ipool_malloc(ipool, &age_idx);
+	if (param == NULL) {
+		rte_flow_error_set(error, ENOMEM,
+				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				   "cannot allocate AGE parameter");
+		return 0;
+	}
+	MLX5_ASSERT(__atomic_load_n(&param->state,
+				    __ATOMIC_RELAXED) == HWS_AGE_FREE);
+	if (shared) {
+		param->nb_cnts = 0;
+		param->accumulator_hits = 0;
+		param->accumulator_cnt = 0;
+		flow_idx = age_idx;
+	} else {
+		param->nb_cnts = 1;
+	}
+	param->context = age->context ? age->context :
+					(void *)(uintptr_t)flow_idx;
+	param->timeout = age->timeout;
+	param->queue_id = queue_id;
+	param->accumulator_last_hits = 0;
+	param->own_cnt_index = 0;
+	param->sec_since_last_hit = 0;
+	param->state = HWS_AGE_CANDIDATE;
+	return age_idx;
+}
+
+/**
+ * Update indirect AGE action parameter.
+ *
+ * @param[in] priv
+ *   Pointer to the port private data structure.
+ * @param[in] idx
+ *   Index of AGE parameter.
+ * @param[in] update
+ *   Update value.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_hws_age_action_update(struct mlx5_priv *priv, uint32_t idx,
+			   const void *update, struct rte_flow_error *error)
+{
+#ifdef MLX5_HAVE_RTE_FLOW_Q_AGE
+	const struct rte_flow_update_age *update_ade = update;
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct mlx5_indexed_pool *ipool = age_info->ages_ipool;
+	struct mlx5_hws_age_param *param = mlx5_ipool_get(ipool, idx);
+	bool sec_since_last_hit_reset = false;
+	bool state_update = false;
+
+	if (param == NULL)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  "invalid AGE parameter index");
+	if (update_ade->timeout_valid) {
+		uint32_t old_timeout = __atomic_exchange_n(&param->timeout,
+							   update_ade->timeout,
+							   __ATOMIC_RELAXED);
+
+		if (old_timeout == 0)
+			sec_since_last_hit_reset = true;
+		else if (old_timeout < update_ade->timeout ||
+			 update_ade->timeout == 0)
+			/*
+			 * When timeout is increased, aged-out flows might be
+			 * active again and the state should be updated.
+			 * When the new timeout is 0, the state is updated so
+			 * that the flow stops being reported as aged-out.
+			 */
+			state_update = true;
+	}
+	if (update_ade->touch) {
+		sec_since_last_hit_reset = true;
+		state_update = true;
+	}
+	if (sec_since_last_hit_reset)
+		__atomic_store_n(&param->sec_since_last_hit, 0,
+				 __ATOMIC_RELAXED);
+	if (state_update) {
+		uint16_t expected = HWS_AGE_AGED_OUT_NOT_REPORTED;
+
+		/*
+		 * Change states of aged-out flows to active:
+		 *  - AGED_OUT_NOT_REPORTED -> CANDIDATE_INSIDE_RING
+		 *  - AGED_OUT_REPORTED -> CANDIDATE
+		 */
+		if (!__atomic_compare_exchange_n(&param->state, &expected,
+						 HWS_AGE_CANDIDATE_INSIDE_RING,
+						 false, __ATOMIC_RELAXED,
+						 __ATOMIC_RELAXED) &&
+		    expected == HWS_AGE_AGED_OUT_REPORTED)
+			__atomic_store_n(&param->state, HWS_AGE_CANDIDATE,
+					 __ATOMIC_RELAXED);
+	}
+	return 0;
+#else
+	RTE_SET_USED(priv);
+	RTE_SET_USED(idx);
+	RTE_SET_USED(update);
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				  "update age action not supported");
+#endif
+}
+
+/**
+ * Get the AGE context if the aged-out index is still valid.
+ *
+ * @param priv
+ *   Pointer to the port private data structure.
+ * @param idx
+ *   Index of AGE parameter.
+ *
+ * @return
+ *   AGE context if the index is still aged-out, NULL otherwise.
+ */
+void *
+mlx5_hws_age_context_get(struct mlx5_priv *priv, uint32_t idx)
+{
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct mlx5_indexed_pool *ipool = age_info->ages_ipool;
+	struct mlx5_hws_age_param *param = mlx5_ipool_get(ipool, idx);
+	uint16_t expected = HWS_AGE_AGED_OUT_NOT_REPORTED;
+
+	MLX5_ASSERT(param != NULL);
+	if (__atomic_compare_exchange_n(&param->state, &expected,
+					HWS_AGE_AGED_OUT_REPORTED, false,
+					__ATOMIC_RELAXED, __ATOMIC_RELAXED))
+		return param->context;
+	switch (expected) {
+	case HWS_AGE_FREE:
+		/*
+		 * This AGE could not be destroyed earlier since it was inside
+		 * the ring. Its state has been updated and now it is actually
+		 * destroyed.
+		 */
+		mlx5_hws_age_param_free(priv, param->own_cnt_index, ipool, idx);
+		break;
+	case HWS_AGE_CANDIDATE_INSIDE_RING:
+		__atomic_store_n(&param->state, HWS_AGE_CANDIDATE,
+				 __ATOMIC_RELAXED);
+		break;
+	case HWS_AGE_CANDIDATE:
+		/*
+		 * Only the background thread pushes to the ring, and it never
+		 * pushes this state. An AGE that becomes a candidate while in
+		 * the ring gets the HWS_AGE_CANDIDATE_INSIDE_RING state.
+		 * Fall-through.
+		 */
+	case HWS_AGE_AGED_OUT_REPORTED:
+		/*
+		 * Only this thread (doing query) may write this state, and it
+		 * happens only after the query thread takes it out of the ring.
+		 * Fall-through.
+		 */
+	case HWS_AGE_AGED_OUT_NOT_REPORTED:
+		/*
+		 * In this case the compare-exchange succeeds and the function
+		 * returns the context immediately above.
+		 * Fall-through.
+		 */
+	default:
+		MLX5_ASSERT(0);
+		break;
+	}
+	return NULL;
+}
+
+#ifdef RTE_ARCH_64
+#define MLX5_HWS_AGED_OUT_RING_SIZE_MAX UINT32_MAX
+#else
+#define MLX5_HWS_AGED_OUT_RING_SIZE_MAX RTE_BIT32(8)
+#endif
+
+/**
+ * Get the size of the aged-out ring list for each queue.
+ *
+ * The size is one percent of nb_counters divided by nb_queues.
+ * The ring size must be a power of 2, so it is aligned up to a power of 2.
+ * On 32-bit systems, the size is limited to 256.
+ *
+ * This function is called when RTE_FLOW_PORT_FLAG_STRICT_QUEUE is on.
+ *
+ * @param nb_counters
+ *   Final number of allocated counters in the pool.
+ * @param nb_queues
+ *   Number of HWS queues in this port.
+ *
+ * @return
+ *   Size of aged out ring per queue.
+ */
+static __rte_always_inline uint32_t
+mlx5_hws_aged_out_q_ring_size_get(uint32_t nb_counters, uint32_t nb_queues)
+{
+	uint32_t size = rte_align32pow2((nb_counters / 100) / nb_queues);
+	uint32_t max_size = MLX5_HWS_AGED_OUT_RING_SIZE_MAX;
+
+	return RTE_MIN(size, max_size);
+}
+
+/**
+ * Get the size of the aged-out ring list.
+ *
+ * The size is one percent of nb_counters.
+ * The ring size must be a power of 2, so it is aligned up to a power of 2.
+ * On 32-bit systems, the size is limited to 256.
+ *
+ * This function is called when RTE_FLOW_PORT_FLAG_STRICT_QUEUE is off.
+ *
+ * @param nb_counters
+ *   Final number of allocated counters in the pool.
+ *
+ * @return
+ *   Size of the aged out ring list.
+ */
+static __rte_always_inline uint32_t
+mlx5_hws_aged_out_ring_size_get(uint32_t nb_counters)
+{
+	uint32_t size = rte_align32pow2(nb_counters / 100);
+	uint32_t max_size = MLX5_HWS_AGED_OUT_RING_SIZE_MAX;
+
+	return RTE_MIN(size, max_size);
+}
+
+/**
+ * Initialize the shared aging list information per port.
+ *
+ * @param dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param nb_queues
+ *   Number of HWS queues.
+ * @param strict_queue
+ *   Indicator whether strict_queue mode is enabled.
+ * @param ring_size
+ *   Size of the aged-out ring to create.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_hws_age_info_init(struct rte_eth_dev *dev, uint16_t nb_queues,
+		       bool strict_queue, uint32_t ring_size)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	uint32_t flags = RING_F_SP_ENQ | RING_F_SC_DEQ | RING_F_EXACT_SZ;
+	char mz_name[RTE_MEMZONE_NAMESIZE];
+	struct rte_ring *r = NULL;
+	uint32_t qidx;
+
+	age_info->flags = 0;
+	if (strict_queue) {
+		size_t size = sizeof(*age_info->hw_q_age) +
+			      sizeof(struct rte_ring *) * nb_queues;
+
+		age_info->hw_q_age = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO,
+						 size, 0, SOCKET_ID_ANY);
+		if (age_info->hw_q_age == NULL)
+			return -ENOMEM;
+		for (qidx = 0; qidx < nb_queues; ++qidx) {
+			snprintf(mz_name, sizeof(mz_name),
+				 "port_%u_queue_%u_aged_out_ring",
+				 dev->data->port_id, qidx);
+			r = rte_ring_create(mz_name, ring_size, SOCKET_ID_ANY,
+					    flags);
+			if (r == NULL) {
+				DRV_LOG(ERR, "\"%s\" creation failed: %s",
+					mz_name, rte_strerror(rte_errno));
+				goto error;
+			}
+			age_info->hw_q_age->aged_lists[qidx] = r;
+			DRV_LOG(DEBUG,
+				"\"%s\" is successfully created (size=%u).",
+				mz_name, ring_size);
+		}
+		age_info->hw_q_age->nb_rings = nb_queues;
+	} else {
+		snprintf(mz_name, sizeof(mz_name), "port_%u_aged_out_ring",
+			 dev->data->port_id);
+		r = rte_ring_create(mz_name, ring_size, SOCKET_ID_ANY, flags);
+		if (r == NULL) {
+			DRV_LOG(ERR, "\"%s\" creation failed: %s", mz_name,
+				rte_strerror(rte_errno));
+			return -rte_errno;
+		}
+		age_info->hw_age.aged_list = r;
+		DRV_LOG(DEBUG, "\"%s\" is successfully created (size=%u).",
+			mz_name, ring_size);
+		/* In non "strict_queue" mode, initialize the event. */
+		MLX5_AGE_SET(age_info, MLX5_AGE_TRIGGER);
+	}
+	return 0;
+error:
+	MLX5_ASSERT(strict_queue);
+	while (qidx--)
+		rte_ring_free(age_info->hw_q_age->aged_lists[qidx]);
+	mlx5_free(age_info->hw_q_age);
+	return -1;
+}
+
+/**
+ * Cleanup aged-out ring before destroying.
+ *
+ * @param priv
+ *   Pointer to port private object.
+ * @param r
+ *   Pointer to aged-out ring object.
+ */
+static void
+mlx5_hws_aged_out_ring_cleanup(struct mlx5_priv *priv, struct rte_ring *r)
+{
+	int ring_size = rte_ring_count(r);
+
+	while (ring_size > 0) {
+		uint32_t age_idx = 0;
+
+		if (rte_ring_dequeue_elem(r, &age_idx, sizeof(uint32_t)) < 0)
+			break;
+		/* Get the AGE context if the aged-out index is still valid. */
+		mlx5_hws_age_context_get(priv, age_idx);
+		ring_size--;
+	}
+	rte_ring_free(r);
+}
+
+/**
+ * Destroy the shared aging list information per port.
+ *
+ * @param priv
+ *   Pointer to port private object.
+ */
+static void
+mlx5_hws_age_info_destroy(struct mlx5_priv *priv)
+{
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	uint16_t nb_queues = age_info->hw_q_age->nb_rings;
+	struct rte_ring *r;
+
+	if (priv->hws_strict_queue) {
+		uint32_t qidx;
+
+		for (qidx = 0; qidx < nb_queues; ++qidx) {
+			r = age_info->hw_q_age->aged_lists[qidx];
+			mlx5_hws_aged_out_ring_cleanup(priv, r);
+		}
+		mlx5_free(age_info->hw_q_age);
+	} else {
+		r = age_info->hw_age.aged_list;
+		mlx5_hws_aged_out_ring_cleanup(priv, r);
+	}
+}
+
+/**
+ * Initialize the aging mechanism per port.
+ *
+ * @param dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param attr
+ *   Port configuration attributes.
+ * @param nb_queues
+ *   Number of HWS queues.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_hws_age_pool_init(struct rte_eth_dev *dev,
+		       const struct rte_flow_port_attr *attr,
+		       uint16_t nb_queues)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct mlx5_indexed_pool_config cfg = {
+		.size =
+		      RTE_CACHE_LINE_ROUNDUP(sizeof(struct mlx5_hws_age_param)),
+		.trunk_size = 1 << 12,
+		.per_core_cache = 1 << 13,
+		.need_lock = 1,
+		.release_mem_en = !!priv->sh->config.reclaim_mode,
+		.malloc = mlx5_malloc,
+		.free = mlx5_free,
+		.type = "mlx5_hws_age_pool",
+	};
+	bool strict_queue = false;
+	uint32_t nb_alloc_cnts;
+	uint32_t rsize;
+	uint32_t nb_ages_updated;
+	int ret;
+
+#ifdef MLX5_HAVE_RTE_FLOW_Q_AGE
+	strict_queue = !!(attr->flags & RTE_FLOW_PORT_FLAG_STRICT_QUEUE);
+#endif
+	MLX5_ASSERT(priv->hws_cpool);
+	nb_alloc_cnts = mlx5_hws_cnt_pool_get_size(priv->hws_cpool);
+	if (strict_queue) {
+		rsize = mlx5_hws_aged_out_q_ring_size_get(nb_alloc_cnts,
+							  nb_queues);
+		nb_ages_updated = rsize * nb_queues + attr->nb_aging_objects;
+	} else {
+		rsize = mlx5_hws_aged_out_ring_size_get(nb_alloc_cnts);
+		nb_ages_updated = rsize + attr->nb_aging_objects;
+	}
+	ret = mlx5_hws_age_info_init(dev, nb_queues, strict_queue, rsize);
+	if (ret < 0)
+		return ret;
+	cfg.max_idx = rte_align32pow2(nb_ages_updated);
+	if (cfg.max_idx <= cfg.trunk_size) {
+		cfg.per_core_cache = 0;
+		cfg.trunk_size = cfg.max_idx;
+	} else if (cfg.max_idx <= MLX5_HW_IPOOL_SIZE_THRESHOLD) {
+		cfg.per_core_cache = MLX5_HW_IPOOL_CACHE_MIN;
+	}
+	age_info->ages_ipool = mlx5_ipool_create(&cfg);
+	if (age_info->ages_ipool == NULL) {
+		mlx5_hws_age_info_destroy(priv);
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	priv->hws_age_req = 1;
+	return 0;
+}
+
+/**
+ * Cleanup all aging resources per port.
+ *
+ * @param priv
+ *   Pointer to port private object.
+ */
+void
+mlx5_hws_age_pool_destroy(struct mlx5_priv *priv)
+{
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+
+	MLX5_ASSERT(priv->hws_age_req);
+	mlx5_hws_age_info_destroy(priv);
+	mlx5_ipool_destroy(age_info->ages_ipool);
+	age_info->ages_ipool = NULL;
+	priv->hws_age_req = 0;
+}
+
 #endif
diff --git a/drivers/net/mlx5/mlx5_hws_cnt.h b/drivers/net/mlx5/mlx5_hws_cnt.h
index 5fab4ba597..e311923f71 100644
--- a/drivers/net/mlx5/mlx5_hws_cnt.h
+++ b/drivers/net/mlx5/mlx5_hws_cnt.h
@@ -10,26 +10,26 @@
 #include "mlx5_flow.h"
 
 /*
- * COUNTER ID's layout
+ * HWS COUNTER ID's layout
  *       3                   2                   1                   0
  *     1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
  *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- *    | T |       | D |                                               |
- *    ~ Y |       | C |                    IDX                        ~
- *    | P |       | S |                                               |
+ *    |  T  |     | D |                                               |
+ *    ~  Y  |     | C |                    IDX                        ~
+ *    |  P  |     | S |                                               |
  *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  *
- *    Bit 31:30 = TYPE = MLX5_INDIRECT_ACTION_TYPE_COUNT = b'10
+ *    Bit 31:29 = TYPE = MLX5_INDIRECT_ACTION_TYPE_COUNT = b'10
  *    Bit 25:24 = DCS index
  *    Bit 23:00 = IDX in this counter belonged DCS bulk.
  */
-typedef uint32_t cnt_id_t;
 
-#define MLX5_HWS_CNT_DCS_NUM 4
 #define MLX5_HWS_CNT_DCS_IDX_OFFSET 24
 #define MLX5_HWS_CNT_DCS_IDX_MASK 0x3
 #define MLX5_HWS_CNT_IDX_MASK ((1UL << MLX5_HWS_CNT_DCS_IDX_OFFSET) - 1)
 
+#define MLX5_HWS_AGE_IDX_MASK (RTE_BIT32(MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1)
+
 struct mlx5_hws_cnt_dcs {
 	void *dr_action;
 	uint32_t batch_sz;
@@ -44,12 +44,22 @@ struct mlx5_hws_cnt_dcs_mng {
 
 struct mlx5_hws_cnt {
 	struct flow_counter_stats reset;
+	bool in_used; /* Indicates whether this counter is in use or free. */
 	union {
-		uint32_t share: 1;
-		/*
-		 * share will be set to 1 when this counter is used as indirect
-		 * action. Only meaningful when user own this counter.
-		 */
+		struct {
+			uint32_t share:1;
+			/*
+			 * share will be set to 1 when this counter is used as
+			 * an indirect action.
+			 */
+			uint32_t age_idx:24;
+			/*
+			 * When this counter is used for aging, it saves the
+			 * index of the AGE parameter. For a pure counter
+			 * (without aging) this index is zero.
+			 */
+		};
+		/* Only meaningful when the user owns this counter. */
 		uint32_t query_gen_when_free;
 		/*
 		 * When PMD own this counter (user put back counter to PMD
@@ -96,8 +106,48 @@ struct mlx5_hws_cnt_pool {
 	struct rte_ring *free_list;
 	struct rte_ring *wait_reset_list;
 	struct mlx5_hws_cnt_pool_caches *cache;
+	uint64_t time_of_last_age_check;
 } __rte_cache_aligned;
 
+/* HWS AGE status. */
+enum {
+	HWS_AGE_FREE, /* Initialized state. */
+	HWS_AGE_CANDIDATE, /* AGE assigned to flows. */
+	HWS_AGE_CANDIDATE_INSIDE_RING,
+	/*
+	 * AGE assigned to flows but still inside the ring. It was aged-out but
+	 * the timeout was changed, so it is in the ring yet still a candidate.
+	 */
+	HWS_AGE_AGED_OUT_REPORTED,
+	/*
+	 * Aged-out, reported by rte_flow_get_q_aged_flows, waiting for destroy.
+	 */
+	HWS_AGE_AGED_OUT_NOT_REPORTED,
+	/*
+	 * Aged-out, inside the aged-out ring, waiting for
+	 * rte_flow_get_q_aged_flows and destroy.
+	 */
+};
+
+/* HWS counter age parameter. */
+struct mlx5_hws_age_param {
+	uint32_t timeout; /* Aging timeout in seconds (atomically accessed). */
+	uint32_t sec_since_last_hit;
+	/* Time in seconds since last hit (atomically accessed). */
+	uint16_t state; /* AGE state (atomically accessed). */
+	uint64_t accumulator_last_hits;
+	/* Last total value of hits, used for comparison. */
+	uint64_t accumulator_hits;
+	/* Accumulator for hits coming from several counters. */
+	uint32_t accumulator_cnt;
+	/* Number of counters that already updated the accumulator this sec. */
+	uint32_t nb_cnts; /* Number of counters used by this AGE. */
+	uint32_t queue_id; /* Queue id of the counter. */
+	cnt_id_t own_cnt_index;
+	/* Counter action created specifically for this AGE action. */
+	void *context; /* Flow AGE context. */
+} __rte_packed __rte_cache_aligned;
+
 /**
  * Translate counter id into internal index (start from 0), which can be used
  * as index of raw/cnt pool.
@@ -107,7 +157,7 @@ struct mlx5_hws_cnt_pool {
  * @return
  *   Internal index
  */
-static __rte_always_inline cnt_id_t
+static __rte_always_inline uint32_t
 mlx5_hws_cnt_iidx(struct mlx5_hws_cnt_pool *cpool, cnt_id_t cnt_id)
 {
 	uint8_t dcs_idx = cnt_id >> MLX5_HWS_CNT_DCS_IDX_OFFSET;
@@ -139,7 +189,7 @@ mlx5_hws_cnt_id_valid(cnt_id_t cnt_id)
  *   Counter id
  */
 static __rte_always_inline cnt_id_t
-mlx5_hws_cnt_id_gen(struct mlx5_hws_cnt_pool *cpool, cnt_id_t iidx)
+mlx5_hws_cnt_id_gen(struct mlx5_hws_cnt_pool *cpool, uint32_t iidx)
 {
 	struct mlx5_hws_cnt_dcs_mng *dcs_mng = &cpool->dcs_mng;
 	uint32_t idx;
@@ -344,9 +394,10 @@ mlx5_hws_cnt_pool_put(struct mlx5_hws_cnt_pool *cpool,
 	struct rte_ring_zc_data zcdr = {0};
 	struct rte_ring *qcache = NULL;
 	unsigned int wb_num = 0; /* cache write-back number. */
-	cnt_id_t iidx;
+	uint32_t iidx;
 
 	iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
+	cpool->pool[iidx].in_used = false;
 	cpool->pool[iidx].query_gen_when_free =
 		__atomic_load_n(&cpool->query_gen, __ATOMIC_RELAXED);
 	if (likely(queue != NULL))
@@ -388,20 +439,23 @@ mlx5_hws_cnt_pool_put(struct mlx5_hws_cnt_pool *cpool,
  *   A pointer to HWS queue. If null, it means fetch from common pool.
  * @param cnt_id
  *   A pointer to a cnt_id_t * pointer (counter id) that will be filled.
+ * @param age_idx
+ *   Index of AGE parameter using this counter, zero means there is no such AGE.
+ *
  * @return
  *   - 0: Success; objects taken.
  *   - -ENOENT: Not enough entries in the mempool; no object is retrieved.
  *   - -EAGAIN: counter is not ready; try again.
  */
 static __rte_always_inline int
-mlx5_hws_cnt_pool_get(struct mlx5_hws_cnt_pool *cpool,
-		uint32_t *queue, cnt_id_t *cnt_id)
+mlx5_hws_cnt_pool_get(struct mlx5_hws_cnt_pool *cpool, uint32_t *queue,
+		      cnt_id_t *cnt_id, uint32_t age_idx)
 {
 	unsigned int ret;
 	struct rte_ring_zc_data zcdc = {0};
 	struct rte_ring *qcache = NULL;
-	uint32_t query_gen = 0;
-	cnt_id_t iidx, tmp_cid = 0;
+	uint32_t iidx, query_gen = 0;
+	cnt_id_t tmp_cid = 0;
 
 	if (likely(queue != NULL))
 		qcache = cpool->cache->qcache[*queue];
@@ -422,6 +476,8 @@ mlx5_hws_cnt_pool_get(struct mlx5_hws_cnt_pool *cpool,
 		__hws_cnt_query_raw(cpool, *cnt_id,
 				    &cpool->pool[iidx].reset.hits,
 				    &cpool->pool[iidx].reset.bytes);
+		cpool->pool[iidx].in_used = true;
+		cpool->pool[iidx].age_idx = age_idx;
 		return 0;
 	}
 	ret = rte_ring_dequeue_zc_burst_elem_start(qcache, sizeof(cnt_id_t), 1,
@@ -455,6 +511,8 @@ mlx5_hws_cnt_pool_get(struct mlx5_hws_cnt_pool *cpool,
 			    &cpool->pool[iidx].reset.bytes);
 	rte_ring_dequeue_zc_elem_finish(qcache, 1);
 	cpool->pool[iidx].share = 0;
+	cpool->pool[iidx].in_used = true;
+	cpool->pool[iidx].age_idx = age_idx;
 	return 0;
 }
 
@@ -478,16 +536,16 @@ mlx5_hws_cnt_pool_get_action_offset(struct mlx5_hws_cnt_pool *cpool,
 }
 
 static __rte_always_inline int
-mlx5_hws_cnt_shared_get(struct mlx5_hws_cnt_pool *cpool, cnt_id_t *cnt_id)
+mlx5_hws_cnt_shared_get(struct mlx5_hws_cnt_pool *cpool, cnt_id_t *cnt_id,
+			uint32_t age_idx)
 {
 	int ret;
 	uint32_t iidx;
 
-	ret = mlx5_hws_cnt_pool_get(cpool, NULL, cnt_id);
+	ret = mlx5_hws_cnt_pool_get(cpool, NULL, cnt_id, age_idx);
 	if (ret != 0)
 		return ret;
 	iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
-	MLX5_ASSERT(cpool->pool[iidx].share == 0);
 	cpool->pool[iidx].share = 1;
 	return 0;
 }
@@ -513,10 +571,73 @@ mlx5_hws_cnt_is_shared(struct mlx5_hws_cnt_pool *cpool, cnt_id_t cnt_id)
 	return cpool->pool[iidx].share ? true : false;
 }
 
+static __rte_always_inline void
+mlx5_hws_cnt_age_set(struct mlx5_hws_cnt_pool *cpool, cnt_id_t cnt_id,
+		     uint32_t age_idx)
+{
+	uint32_t iidx = mlx5_hws_cnt_iidx(cpool, cnt_id);
+
+	MLX5_ASSERT(cpool->pool[iidx].share);
+	cpool->pool[iidx].age_idx = age_idx;
+}
+
+static __rte_always_inline uint32_t
+mlx5_hws_cnt_age_get(struct mlx5_hws_cnt_pool *cpool, cnt_id_t cnt_id)
+{
+	uint32_t iidx = mlx5_hws_cnt_iidx(cpool, cnt_id);
+
+	MLX5_ASSERT(cpool->pool[iidx].share);
+	return cpool->pool[iidx].age_idx;
+}
+
+static __rte_always_inline cnt_id_t
+mlx5_hws_age_cnt_get(struct mlx5_priv *priv, struct mlx5_hws_age_param *param,
+		     uint32_t age_idx)
+{
+	if (!param->own_cnt_index) {
+		/* Create indirect counter one for internal usage. */
+		if (mlx5_hws_cnt_shared_get(priv->hws_cpool,
+					    &param->own_cnt_index, age_idx) < 0)
+			return 0;
+		param->nb_cnts++;
+	}
+	return param->own_cnt_index;
+}
+
+static __rte_always_inline void
+mlx5_hws_age_nb_cnt_increase(struct mlx5_priv *priv, uint32_t age_idx)
+{
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct mlx5_indexed_pool *ipool = age_info->ages_ipool;
+	struct mlx5_hws_age_param *param = mlx5_ipool_get(ipool, age_idx);
+
+	MLX5_ASSERT(param != NULL);
+	param->nb_cnts++;
+}
+
+static __rte_always_inline void
+mlx5_hws_age_nb_cnt_decrease(struct mlx5_priv *priv, uint32_t age_idx)
+{
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct mlx5_indexed_pool *ipool = age_info->ages_ipool;
+	struct mlx5_hws_age_param *param = mlx5_ipool_get(ipool, age_idx);
+
+	if (param != NULL)
+		param->nb_cnts--;
+}
+
+static __rte_always_inline bool
+mlx5_hws_age_is_indirect(uint32_t age_idx)
+{
+	return (age_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET) ==
+		MLX5_INDIRECT_ACTION_TYPE_AGE ? true : false;
+}
+
 /* init HWS counter pool. */
 struct mlx5_hws_cnt_pool *
-mlx5_hws_cnt_pool_init(const struct mlx5_hws_cnt_pool_cfg *pcfg,
-		const struct mlx5_hws_cache_param *ccfg);
+mlx5_hws_cnt_pool_init(struct mlx5_dev_ctx_shared *sh,
+		       const struct mlx5_hws_cnt_pool_cfg *pcfg,
+		       const struct mlx5_hws_cache_param *ccfg);
 
 void
 mlx5_hws_cnt_pool_deinit(struct mlx5_hws_cnt_pool *cntp);
@@ -555,4 +676,28 @@ mlx5_hws_cnt_svc_init(struct mlx5_dev_ctx_shared *sh);
 void
 mlx5_hws_cnt_svc_deinit(struct mlx5_dev_ctx_shared *sh);
 
+int
+mlx5_hws_age_action_destroy(struct mlx5_priv *priv, uint32_t idx,
+			    struct rte_flow_error *error);
+
+uint32_t
+mlx5_hws_age_action_create(struct mlx5_priv *priv, uint32_t queue_id,
+			   bool shared, const struct rte_flow_action_age *age,
+			   uint32_t flow_idx, struct rte_flow_error *error);
+
+int
+mlx5_hws_age_action_update(struct mlx5_priv *priv, uint32_t idx,
+			   const void *update, struct rte_flow_error *error);
+
+void *
+mlx5_hws_age_context_get(struct mlx5_priv *priv, uint32_t idx);
+
+int
+mlx5_hws_age_pool_init(struct rte_eth_dev *dev,
+		       const struct rte_flow_port_attr *attr,
+		       uint16_t nb_queues);
+
+void
+mlx5_hws_age_pool_destroy(struct mlx5_priv *priv);
+
 #endif /* _MLX5_HWS_CNT_H_ */
diff --git a/drivers/net/mlx5/mlx5_utils.h b/drivers/net/mlx5/mlx5_utils.h
index 254c879d1a..82e8298781 100644
--- a/drivers/net/mlx5/mlx5_utils.h
+++ b/drivers/net/mlx5/mlx5_utils.h
@@ -170,6 +170,14 @@ struct mlx5_l3t_tbl {
 typedef int32_t (*mlx5_l3t_alloc_callback_fn)(void *ctx,
 					   union mlx5_l3t_data *data);
 
+/*
+ * The default ipool threshold value indicates which per_core_cache
+ * value to set.
+ */
+#define MLX5_HW_IPOOL_SIZE_THRESHOLD (1 << 19)
+/* The default min local cache size. */
+#define MLX5_HW_IPOOL_CACHE_MIN (1 << 9)
+
 /*
  * The indexed memory entry index is made up of trunk index and offset of
  * the entry in the trunk. Since the entry index is 32 bits, in case user
@@ -207,7 +215,7 @@ struct mlx5_indexed_pool_config {
 	 */
 	uint32_t need_lock:1;
 	/* Lock is needed for multiple thread usage. */
-	uint32_t release_mem_en:1; /* Rlease trunk when it is free. */
+	uint32_t release_mem_en:1; /* Release trunk when it is free. */
 	uint32_t max_idx; /* The maximum index can be allocated. */
 	uint32_t per_core_cache;
 	/*
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v4 14/18] net/mlx5: add async action push and pull support
  2022-10-19 16:25 ` [PATCH v4 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (12 preceding siblings ...)
  2022-10-19 16:25   ` [PATCH v4 13/18] net/mlx5: add HWS AGE action support Suanming Mou
@ 2022-10-19 16:25   ` Suanming Mou
  2022-10-19 16:25   ` [PATCH v4 15/18] net/mlx5: support flow integrity in HWS group 0 Suanming Mou
                     ` (3 subsequent siblings)
  17 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-10-19 16:25 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika

The queue-based rte_flow_async_action_* functions work the same way as
the queue-based async flow functions: the operations can be pushed
asynchronously, and so can the pull.

This commit adds the missing push and pull support for async actions.
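
For illustration, below is a minimal sketch of how an application is
expected to use the queue-based indirect action API together with
push/pull. The port/queue numbers and the COUNT action are only
illustrative assumptions, not part of this patch, and error handling is
omitted:

#include <rte_common.h>
#include <rte_flow.h>

static void
async_indirect_count_example(uint16_t port_id, uint32_t queue_id)
{
	/* Postpone the doorbell; the operation is flushed by rte_flow_push(). */
	struct rte_flow_op_attr op_attr = { .postpone = 1 };
	struct rte_flow_indir_action_conf conf = { .ingress = 1 };
	struct rte_flow_action action = { .type = RTE_FLOW_ACTION_TYPE_COUNT };
	struct rte_flow_action_handle *handle;
	struct rte_flow_op_result res[4];
	struct rte_flow_error error;
	int n;

	/* Enqueue the indirect action creation on the given queue. */
	handle = rte_flow_async_action_handle_create(port_id, queue_id,
						     &op_attr, &conf, &action,
						     NULL, &error);
	if (handle == NULL)
		return;
	/* Push the postponed operations to the hardware... */
	rte_flow_push(port_id, queue_id, &error);
	/* ...and pull the completions of the enqueued operations. */
	do {
		n = rte_flow_pull(port_id, queue_id, res, RTE_DIM(res), &error);
	} while (n == 0);
}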

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/net/mlx5/mlx5.h            |  62 ++++-
 drivers/net/mlx5/mlx5_flow.c       |  45 ++++
 drivers/net/mlx5/mlx5_flow.h       |  17 ++
 drivers/net/mlx5/mlx5_flow_aso.c   | 181 +++++++++++--
 drivers/net/mlx5/mlx5_flow_dv.c    |   7 +-
 drivers/net/mlx5/mlx5_flow_hw.c    | 412 +++++++++++++++++++++++++----
 drivers/net/mlx5/mlx5_flow_meter.c |   6 +-
 7 files changed, 626 insertions(+), 104 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 09ab7a080a..5195529267 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -346,6 +346,8 @@ struct mlx5_lb_ctx {
 enum {
 	MLX5_HW_Q_JOB_TYPE_CREATE, /* Flow create job type. */
 	MLX5_HW_Q_JOB_TYPE_DESTROY, /* Flow destroy job type. */
+	MLX5_HW_Q_JOB_TYPE_UPDATE, /* Indirect action update job type. */
+	MLX5_HW_Q_JOB_TYPE_QUERY, /* Indirect action query job type. */
 };
 
 #define MLX5_HW_MAX_ITEMS (16)
@@ -353,12 +355,23 @@ enum {
 /* HW steering flow management job descriptor. */
 struct mlx5_hw_q_job {
 	uint32_t type; /* Job type. */
-	struct rte_flow_hw *flow; /* Flow attached to the job. */
+	union {
+		struct rte_flow_hw *flow; /* Flow attached to the job. */
+		const void *action; /* Indirect action attached to the job. */
+	};
 	void *user_data; /* Job user data. */
 	uint8_t *encap_data; /* Encap data. */
 	struct mlx5_modification_cmd *mhdr_cmd;
 	struct rte_flow_item *items;
-	struct rte_flow_item_ethdev port_spec;
+	union {
+		struct {
+			/* Pointer to ct query user memory. */
+			struct rte_flow_action_conntrack *profile;
+			/* Pointer to ct ASO query out memory. */
+			void *out_data;
+		} __rte_packed;
+		struct rte_flow_item_ethdev port_spec;
+	} __rte_packed;
 };
 
 /* HW steering job descriptor LIFO pool. */
@@ -366,6 +379,8 @@ struct mlx5_hw_q {
 	uint32_t job_idx; /* Free job index. */
 	uint32_t size; /* LIFO size. */
 	struct mlx5_hw_q_job **job; /* LIFO header. */
+	struct rte_ring *indir_cq; /* Indirect action SW completion queue. */
+	struct rte_ring *indir_iq; /* Indirect action SW in progress queue. */
 } __rte_cache_aligned;
 
 
@@ -574,6 +589,7 @@ struct mlx5_aso_sq_elem {
 			struct mlx5_aso_ct_action *ct;
 			char *query_data;
 		};
+		void *user_data;
 	};
 };
 
@@ -583,7 +599,9 @@ struct mlx5_aso_sq {
 	struct mlx5_aso_cq cq;
 	struct mlx5_devx_sq sq_obj;
 	struct mlx5_pmd_mr mr;
+	volatile struct mlx5_aso_wqe *db;
 	uint16_t pi;
+	uint16_t db_pi;
 	uint32_t head;
 	uint32_t tail;
 	uint32_t sqn;
@@ -998,6 +1016,7 @@ struct mlx5_flow_meter_profile {
 enum mlx5_aso_mtr_state {
 	ASO_METER_FREE, /* In free list. */
 	ASO_METER_WAIT, /* ACCESS_ASO WQE in progress. */
+	ASO_METER_WAIT_ASYNC, /* CQE will be handled by async pull. */
 	ASO_METER_READY, /* CQE received. */
 };
 
@@ -1200,6 +1219,7 @@ struct mlx5_bond_info {
 enum mlx5_aso_ct_state {
 	ASO_CONNTRACK_FREE, /* Inactive, in the free list. */
 	ASO_CONNTRACK_WAIT, /* WQE sent in the SQ. */
+	ASO_CONNTRACK_WAIT_ASYNC, /* CQE will be handled by async pull. */
 	ASO_CONNTRACK_READY, /* CQE received w/o error. */
 	ASO_CONNTRACK_QUERY, /* WQE for query sent. */
 	ASO_CONNTRACK_MAX, /* Guard. */
@@ -1208,13 +1228,21 @@ enum mlx5_aso_ct_state {
 /* Generic ASO connection tracking structure. */
 struct mlx5_aso_ct_action {
 	union {
-		LIST_ENTRY(mlx5_aso_ct_action) next;
-		/* Pointer to the next ASO CT. Used only in SWS. */
-		struct mlx5_aso_ct_pool *pool;
-		/* Pointer to action pool. Used only in HWS. */
+		/* SWS mode struct. */
+		struct {
+			/* Pointer to the next ASO CT. Used only in SWS. */
+			LIST_ENTRY(mlx5_aso_ct_action) next;
+		};
+		/* HWS mode struct. */
+		struct {
+			/* Pointer to action pool. Used only in HWS. */
+			struct mlx5_aso_ct_pool *pool;
+		};
 	};
-	void *dr_action_orig; /* General action object for original dir. */
-	void *dr_action_rply; /* General action object for reply dir. */
+	/* General action object for original dir. */
+	void *dr_action_orig;
+	/* General action object for reply dir. */
+	void *dr_action_rply;
 	uint32_t refcnt; /* Action used count in device flows. */
 	uint16_t offset; /* Offset of ASO CT in DevX objects bulk. */
 	uint16_t peer; /* The only peer port index could also use this CT. */
@@ -2142,18 +2170,21 @@ int mlx5_aso_flow_hit_queue_poll_stop(struct mlx5_dev_ctx_shared *sh);
 void mlx5_aso_queue_uninit(struct mlx5_dev_ctx_shared *sh,
 			   enum mlx5_access_aso_opc_mod aso_opc_mod);
 int mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
-				 struct mlx5_aso_mtr *mtr,
-				 struct mlx5_mtr_bulk *bulk);
+		struct mlx5_aso_mtr *mtr, struct mlx5_mtr_bulk *bulk,
+		void *user_data, bool push);
 int mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 		struct mlx5_aso_mtr *mtr);
 int mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			      struct mlx5_aso_ct_action *ct,
-			      const struct rte_flow_action_conntrack *profile);
+			      const struct rte_flow_action_conntrack *profile,
+			      void *user_data,
+			      bool push);
 int mlx5_aso_ct_wait_ready(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			   struct mlx5_aso_ct_action *ct);
 int mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			     struct mlx5_aso_ct_action *ct,
-			     struct rte_flow_action_conntrack *profile);
+			     struct rte_flow_action_conntrack *profile,
+			     void *user_data, bool push);
 int mlx5_aso_ct_available(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			  struct mlx5_aso_ct_action *ct);
 uint32_t
@@ -2161,6 +2192,13 @@ mlx5_get_supported_sw_parsing_offloads(const struct mlx5_hca_attr *attr);
 uint32_t
 mlx5_get_supported_tunneling_offloads(const struct mlx5_hca_attr *attr);
 
+void mlx5_aso_ct_obj_analyze(struct rte_flow_action_conntrack *profile,
+			     char *wdata);
+void mlx5_aso_push_wqe(struct mlx5_dev_ctx_shared *sh,
+		       struct mlx5_aso_sq *sq);
+int mlx5_aso_pull_completion(struct mlx5_aso_sq *sq,
+			     struct rte_flow_op_result res[],
+			     uint16_t n_res);
 int mlx5_aso_cnt_queue_init(struct mlx5_dev_ctx_shared *sh);
 void mlx5_aso_cnt_queue_uninit(struct mlx5_dev_ctx_shared *sh);
 int mlx5_aso_cnt_query(struct mlx5_dev_ctx_shared *sh,
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index c32255a3f9..b11957f8ee 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -981,6 +981,14 @@ mlx5_flow_async_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 				  void *user_data,
 				  struct rte_flow_error *error);
 
+static int
+mlx5_flow_async_action_handle_query(struct rte_eth_dev *dev, uint32_t queue,
+				 const struct rte_flow_op_attr *attr,
+				 const struct rte_flow_action_handle *handle,
+				 void *data,
+				 void *user_data,
+				 struct rte_flow_error *error);
+
 static const struct rte_flow_ops mlx5_flow_ops = {
 	.validate = mlx5_flow_validate,
 	.create = mlx5_flow_create,
@@ -1019,6 +1027,7 @@ static const struct rte_flow_ops mlx5_flow_ops = {
 	.push = mlx5_flow_push,
 	.async_action_handle_create = mlx5_flow_async_action_handle_create,
 	.async_action_handle_update = mlx5_flow_async_action_handle_update,
+	.async_action_handle_query = mlx5_flow_async_action_handle_query,
 	.async_action_handle_destroy = mlx5_flow_async_action_handle_destroy,
 };
 
@@ -8862,6 +8871,42 @@ mlx5_flow_async_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 					 update, user_data, error);
 }
 
+/**
+ * Query shared action.
+ *
+ * @param[in] dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param[in] queue
+ *   Which queue to be used.
+ * @param[in] attr
+ *   Operation attribute.
+ * @param[in] handle
+ *   Action handle to be queried.
+ * @param[in] data
+ *   Pointer to query result data.
+ * @param[in] user_data
+ *   Pointer to the user_data.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, negative value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_async_action_handle_query(struct rte_eth_dev *dev, uint32_t queue,
+				    const struct rte_flow_op_attr *attr,
+				    const struct rte_flow_action_handle *handle,
+				    void *data,
+				    void *user_data,
+				    struct rte_flow_error *error)
+{
+	const struct mlx5_flow_driver_ops *fops =
+			flow_get_drv_ops(MLX5_FLOW_TYPE_HW);
+
+	return fops->async_action_query(dev, queue, attr, handle,
+					data, user_data, error);
+}
+
 /**
  * Destroy shared action.
  *
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 5c57f51706..57cebb5ce6 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -57,6 +57,13 @@ enum mlx5_rte_flow_field_id {
 
 #define MLX5_INDIRECT_ACTION_TYPE_OFFSET 29
 
+#define MLX5_INDIRECT_ACTION_TYPE_GET(handle) \
+	(((uint32_t)(uintptr_t)(handle)) >> MLX5_INDIRECT_ACTION_TYPE_OFFSET)
+
+#define MLX5_INDIRECT_ACTION_IDX_GET(handle) \
+	(((uint32_t)(uintptr_t)(handle)) & \
+	 ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1))
+
 enum {
 	MLX5_INDIRECT_ACTION_TYPE_RSS,
 	MLX5_INDIRECT_ACTION_TYPE_AGE,
@@ -1826,6 +1833,15 @@ typedef int (*mlx5_flow_async_action_handle_update_t)
 			 void *user_data,
 			 struct rte_flow_error *error);
 
+typedef int (*mlx5_flow_async_action_handle_query_t)
+			(struct rte_eth_dev *dev,
+			 uint32_t queue,
+			 const struct rte_flow_op_attr *attr,
+			 const struct rte_flow_action_handle *handle,
+			 void *data,
+			 void *user_data,
+			 struct rte_flow_error *error);
+
 typedef int (*mlx5_flow_async_action_handle_destroy_t)
 			(struct rte_eth_dev *dev,
 			 uint32_t queue,
@@ -1888,6 +1904,7 @@ struct mlx5_flow_driver_ops {
 	mlx5_flow_push_t push;
 	mlx5_flow_async_action_handle_create_t async_action_create;
 	mlx5_flow_async_action_handle_update_t async_action_update;
+	mlx5_flow_async_action_handle_query_t async_action_query;
 	mlx5_flow_async_action_handle_destroy_t async_action_destroy;
 };
 
diff --git a/drivers/net/mlx5/mlx5_flow_aso.c b/drivers/net/mlx5/mlx5_flow_aso.c
index a5f58301eb..1ddf71e44e 100644
--- a/drivers/net/mlx5/mlx5_flow_aso.c
+++ b/drivers/net/mlx5/mlx5_flow_aso.c
@@ -519,6 +519,70 @@ mlx5_aso_cqe_err_handle(struct mlx5_aso_sq *sq)
 			       (volatile uint32_t *)&sq->sq_obj.aso_wqes[idx]);
 }
 
+int
+mlx5_aso_pull_completion(struct mlx5_aso_sq *sq,
+			 struct rte_flow_op_result res[],
+			 uint16_t n_res)
+{
+	struct mlx5_aso_cq *cq = &sq->cq;
+	volatile struct mlx5_cqe *restrict cqe;
+	const uint32_t cq_size = 1 << cq->log_desc_n;
+	const uint32_t mask = cq_size - 1;
+	uint32_t idx;
+	uint32_t next_idx;
+	uint16_t max;
+	uint16_t n = 0;
+	int ret;
+
+	max = (uint16_t)(sq->head - sq->tail);
+	if (unlikely(!max || !n_res))
+		return 0;
+	next_idx = cq->cq_ci & mask;
+	do {
+		idx = next_idx;
+		next_idx = (cq->cq_ci + 1) & mask;
+		/* Need to confirm the position of the prefetch. */
+		rte_prefetch0(&cq->cq_obj.cqes[next_idx]);
+		cqe = &cq->cq_obj.cqes[idx];
+		ret = check_cqe(cqe, cq_size, cq->cq_ci);
+		/*
+		 * Be sure owner read is done before any other cookie field or
+		 * opaque field.
+		 */
+		rte_io_rmb();
+		if (ret == MLX5_CQE_STATUS_HW_OWN)
+			break;
+		res[n].user_data = sq->elts[(uint16_t)((sq->tail + n) & mask)].user_data;
+		if (unlikely(ret != MLX5_CQE_STATUS_SW_OWN)) {
+			mlx5_aso_cqe_err_handle(sq);
+			res[n].status = RTE_FLOW_OP_ERROR;
+		} else {
+			res[n].status = RTE_FLOW_OP_SUCCESS;
+		}
+		cq->cq_ci++;
+		if (++n == n_res)
+			break;
+	} while (1);
+	if (likely(n)) {
+		sq->tail += n;
+		rte_io_wmb();
+		cq->cq_obj.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci);
+	}
+	return n;
+}
+
+void
+mlx5_aso_push_wqe(struct mlx5_dev_ctx_shared *sh,
+		  struct mlx5_aso_sq *sq)
+{
+	if (sq->db_pi == sq->pi)
+		return;
+	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)sq->db,
+			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
+			   !sh->tx_uar.dbnc);
+	sq->db_pi = sq->pi;
+}
+
 /**
  * Update ASO objects upon completion.
  *
@@ -728,7 +792,9 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 			       struct mlx5_aso_sq *sq,
 			       struct mlx5_aso_mtr *aso_mtr,
 			       struct mlx5_mtr_bulk *bulk,
-				   bool need_lock)
+			       bool need_lock,
+			       void *user_data,
+			       bool push)
 {
 	volatile struct mlx5_aso_wqe *wqe = NULL;
 	struct mlx5_flow_meter_info *fm = NULL;
@@ -754,7 +820,7 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	rte_prefetch0(&sq->sq_obj.aso_wqes[(sq->head + 1) & mask]);
 	/* Fill next WQE. */
 	fm = &aso_mtr->fm;
-	sq->elts[sq->head & mask].mtr = aso_mtr;
+	sq->elts[sq->head & mask].mtr = user_data ? user_data : aso_mtr;
 	if (aso_mtr->type == ASO_METER_INDIRECT) {
 		if (likely(sh->config.dv_flow_en == 2))
 			pool = aso_mtr->pool;
@@ -820,9 +886,13 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	 */
 	sq->head++;
 	sq->pi += 2;/* Each WQE contains 2 WQEBB's. */
-	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
+	if (push) {
+		mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
 			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
 			   !sh->tx_uar.dbnc);
+		sq->db_pi = sq->pi;
+	}
+	sq->db = wqe;
 	if (need_lock)
 		rte_spinlock_unlock(&sq->sqsl);
 	return 1;
@@ -912,11 +982,14 @@ mlx5_aso_mtr_completion_handle(struct mlx5_aso_sq *sq, bool need_lock)
 int
 mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			struct mlx5_aso_mtr *mtr,
-			struct mlx5_mtr_bulk *bulk)
+			struct mlx5_mtr_bulk *bulk,
+			void *user_data,
+			bool push)
 {
 	struct mlx5_aso_sq *sq;
 	uint32_t poll_wqe_times = MLX5_MTR_POLL_WQE_CQE_TIMES;
 	bool need_lock;
+	int ret;
 
 	if (likely(sh->config.dv_flow_en == 2) &&
 	    mtr->type == ASO_METER_INDIRECT) {
@@ -931,10 +1004,15 @@ mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 		sq = &sh->mtrmng->pools_mng.sq;
 		need_lock = true;
 	}
+	if (queue != MLX5_HW_INV_QUEUE) {
+		ret = mlx5_aso_mtr_sq_enqueue_single(sh, sq, mtr, bulk,
+						     need_lock, user_data, push);
+		return ret > 0 ? 0 : -1;
+	}
 	do {
 		mlx5_aso_mtr_completion_handle(sq, need_lock);
-		if (mlx5_aso_mtr_sq_enqueue_single(sh, sq, mtr,
-						   bulk, need_lock))
+		if (mlx5_aso_mtr_sq_enqueue_single(sh, sq, mtr, bulk,
+						   need_lock, NULL, true))
 			return 0;
 		/* Waiting for wqe resource. */
 		rte_delay_us_sleep(MLX5_ASO_WQE_CQE_RESPONSE_DELAY);
@@ -963,6 +1041,7 @@ mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 {
 	struct mlx5_aso_sq *sq;
 	uint32_t poll_cqe_times = MLX5_MTR_POLL_WQE_CQE_TIMES;
+	uint8_t state;
 	bool need_lock;
 
 	if (likely(sh->config.dv_flow_en == 2) &&
@@ -978,8 +1057,8 @@ mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 		sq = &sh->mtrmng->pools_mng.sq;
 		need_lock = true;
 	}
-	if (__atomic_load_n(&mtr->state, __ATOMIC_RELAXED) ==
-					    ASO_METER_READY)
+	state = __atomic_load_n(&mtr->state, __ATOMIC_RELAXED);
+	if (state == ASO_METER_READY || state == ASO_METER_WAIT_ASYNC)
 		return 0;
 	do {
 		mlx5_aso_mtr_completion_handle(sq, need_lock);
@@ -1095,7 +1174,9 @@ mlx5_aso_ct_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 			      struct mlx5_aso_sq *sq,
 			      struct mlx5_aso_ct_action *ct,
 			      const struct rte_flow_action_conntrack *profile,
-			      bool need_lock)
+			      bool need_lock,
+			      void *user_data,
+			      bool push)
 {
 	volatile struct mlx5_aso_wqe *wqe = NULL;
 	uint16_t size = 1 << sq->log_desc_n;
@@ -1119,10 +1200,16 @@ mlx5_aso_ct_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	wqe = &sq->sq_obj.aso_wqes[sq->head & mask];
 	rte_prefetch0(&sq->sq_obj.aso_wqes[(sq->head + 1) & mask]);
 	/* Fill next WQE. */
-	MLX5_ASO_CT_UPDATE_STATE(ct, ASO_CONNTRACK_WAIT);
-	sq->elts[sq->head & mask].ct = ct;
-	sq->elts[sq->head & mask].query_data = NULL;
+	MLX5_ASO_CT_UPDATE_STATE(ct,
+			user_data ? ASO_CONNTRACK_WAIT_ASYNC : ASO_CONNTRACK_WAIT);
+	if (user_data) {
+		sq->elts[sq->head & mask].user_data = user_data;
+	} else {
+		sq->elts[sq->head & mask].ct = ct;
+		sq->elts[sq->head & mask].query_data = NULL;
+	}
 	pool = __mlx5_aso_ct_get_pool(sh, ct);
+
 	/* Each WQE will have a single CT object. */
 	wqe->general_cseg.misc = rte_cpu_to_be_32(pool->devx_obj->id +
 						  ct->offset);
@@ -1202,9 +1289,13 @@ mlx5_aso_ct_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 		 profile->reply_dir.max_ack);
 	sq->head++;
 	sq->pi += 2; /* Each WQE contains 2 WQEBB's. */
-	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
-			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
-			   !sh->tx_uar.dbnc);
+	if (push) {
+		mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
+				   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
+				   !sh->tx_uar.dbnc);
+		sq->db_pi = sq->pi;
+	}
+	sq->db = wqe;
 	if (need_lock)
 		rte_spinlock_unlock(&sq->sqsl);
 	return 1;
@@ -1260,7 +1351,9 @@ static int
 mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
 			    struct mlx5_aso_sq *sq,
 			    struct mlx5_aso_ct_action *ct, char *data,
-			    bool need_lock)
+			    bool need_lock,
+			    void *user_data,
+			    bool push)
 {
 	volatile struct mlx5_aso_wqe *wqe = NULL;
 	uint16_t size = 1 << sq->log_desc_n;
@@ -1286,14 +1379,23 @@ mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
 		DRV_LOG(ERR, "Fail: SQ is full and no free WQE to send");
 		return 0;
 	}
-	MLX5_ASO_CT_UPDATE_STATE(ct, ASO_CONNTRACK_QUERY);
+	MLX5_ASO_CT_UPDATE_STATE(ct,
+			user_data ? ASO_CONNTRACK_WAIT_ASYNC : ASO_CONNTRACK_QUERY);
 	wqe = &sq->sq_obj.aso_wqes[sq->head & mask];
 	/* Confirm the location and address of the prefetch instruction. */
 	rte_prefetch0(&sq->sq_obj.aso_wqes[(sq->head + 1) & mask]);
 	/* Fill next WQE. */
 	wqe_idx = sq->head & mask;
-	sq->elts[wqe_idx].ct = ct;
-	sq->elts[wqe_idx].query_data = data;
+	/* Check if this is async mode. */
+	if (user_data) {
+		struct mlx5_hw_q_job *job = (struct mlx5_hw_q_job *)user_data;
+
+		sq->elts[wqe_idx].ct = user_data;
+		job->out_data = (char *)((uintptr_t)sq->mr.addr + wqe_idx * 64);
+	} else {
+		sq->elts[wqe_idx].query_data = data;
+		sq->elts[wqe_idx].ct = ct;
+	}
 	pool = __mlx5_aso_ct_get_pool(sh, ct);
 	/* Each WQE will have a single CT object. */
 	wqe->general_cseg.misc = rte_cpu_to_be_32(pool->devx_obj->id +
@@ -1319,9 +1421,13 @@ mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
 	 * data segment is not used in this case.
 	 */
 	sq->pi += 2;
-	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
-			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
-			   !sh->tx_uar.dbnc);
+	if (push) {
+		mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
+				   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
+				   !sh->tx_uar.dbnc);
+		sq->db_pi = sq->pi;
+	}
+	sq->db = wqe;
 	if (need_lock)
 		rte_spinlock_unlock(&sq->sqsl);
 	return 1;
@@ -1407,20 +1513,29 @@ int
 mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
 			  uint32_t queue,
 			  struct mlx5_aso_ct_action *ct,
-			  const struct rte_flow_action_conntrack *profile)
+			  const struct rte_flow_action_conntrack *profile,
+			  void *user_data,
+			  bool push)
 {
 	uint32_t poll_wqe_times = MLX5_CT_POLL_WQE_CQE_TIMES;
 	struct mlx5_aso_ct_pool *pool = __mlx5_aso_ct_get_pool(sh, ct);
 	struct mlx5_aso_sq *sq;
 	bool need_lock = !!(queue == MLX5_HW_INV_QUEUE);
+	int ret;
 
 	if (sh->config.dv_flow_en == 2)
 		sq = __mlx5_aso_ct_get_sq_in_hws(queue, pool);
 	else
 		sq = __mlx5_aso_ct_get_sq_in_sws(sh, ct);
+	if (queue != MLX5_HW_INV_QUEUE) {
+		ret = mlx5_aso_ct_sq_enqueue_single(sh, sq, ct, profile,
+						    need_lock, user_data, push);
+		return ret > 0 ? 0 : -1;
+	}
 	do {
-		mlx5_aso_ct_completion_handle(sh, sq, need_lock);
-		if (mlx5_aso_ct_sq_enqueue_single(sh, sq, ct, profile, need_lock))
+		mlx5_aso_ct_completion_handle(sh, sq,  need_lock);
+		if (mlx5_aso_ct_sq_enqueue_single(sh, sq, ct, profile,
+						  need_lock, NULL, true))
 			return 0;
 		/* Waiting for wqe resource. */
 		rte_delay_us_sleep(10u);
@@ -1480,7 +1595,7 @@ mlx5_aso_ct_wait_ready(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
  * @param[in] wdata
  *   Pointer to data fetched from hardware.
  */
-static inline void
+void
 mlx5_aso_ct_obj_analyze(struct rte_flow_action_conntrack *profile,
 			char *wdata)
 {
@@ -1564,7 +1679,8 @@ int
 mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh,
 			 uint32_t queue,
 			 struct mlx5_aso_ct_action *ct,
-			 struct rte_flow_action_conntrack *profile)
+			 struct rte_flow_action_conntrack *profile,
+			 void *user_data, bool push)
 {
 	uint32_t poll_wqe_times = MLX5_CT_POLL_WQE_CQE_TIMES;
 	struct mlx5_aso_ct_pool *pool = __mlx5_aso_ct_get_pool(sh, ct);
@@ -1577,9 +1693,15 @@ mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh,
 		sq = __mlx5_aso_ct_get_sq_in_hws(queue, pool);
 	else
 		sq = __mlx5_aso_ct_get_sq_in_sws(sh, ct);
+	if (queue != MLX5_HW_INV_QUEUE) {
+		ret = mlx5_aso_ct_sq_query_single(sh, sq, ct, out_data,
+						  need_lock, user_data, push);
+		return ret > 0 ? 0 : -1;
+	}
 	do {
 		mlx5_aso_ct_completion_handle(sh, sq, need_lock);
-		ret = mlx5_aso_ct_sq_query_single(sh, sq, ct, out_data, need_lock);
+		ret = mlx5_aso_ct_sq_query_single(sh, sq, ct, out_data,
+				need_lock, NULL, true);
 		if (ret < 0)
 			return ret;
 		else if (ret > 0)
@@ -1630,7 +1752,8 @@ mlx5_aso_ct_available(struct mlx5_dev_ctx_shared *sh,
 		rte_errno = ENXIO;
 		return -rte_errno;
 	} else if (state == ASO_CONNTRACK_READY ||
-		   state == ASO_CONNTRACK_QUERY) {
+		   state == ASO_CONNTRACK_QUERY ||
+		   state == ASO_CONNTRACK_WAIT_ASYNC) {
 		return 0;
 	}
 	do {
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 250f61d46f..3cc4b9bcd4 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -13103,7 +13103,7 @@ flow_dv_translate_create_conntrack(struct rte_eth_dev *dev,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "Failed to allocate CT object");
 	ct = flow_aso_ct_get_by_dev_idx(dev, idx);
-	if (mlx5_aso_ct_update_by_wqe(sh, MLX5_HW_INV_QUEUE, ct, pro)) {
+	if (mlx5_aso_ct_update_by_wqe(sh, MLX5_HW_INV_QUEUE, ct, pro, NULL, true)) {
 		flow_dv_aso_ct_dev_release(dev, idx);
 		rte_flow_error_set(error, EBUSY,
 				   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
@@ -15917,7 +15917,7 @@ __flow_dv_action_ct_update(struct rte_eth_dev *dev, uint32_t idx,
 		if (ret)
 			return ret;
 		ret = mlx5_aso_ct_update_by_wqe(priv->sh, MLX5_HW_INV_QUEUE,
-						ct, new_prf);
+						ct, new_prf, NULL, true);
 		if (ret)
 			return rte_flow_error_set(error, EIO,
 					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
@@ -16753,7 +16753,8 @@ flow_dv_action_query(struct rte_eth_dev *dev,
 							ct->peer;
 		((struct rte_flow_action_conntrack *)data)->is_original_dir =
 							ct->is_original;
-		if (mlx5_aso_ct_query_by_wqe(priv->sh, MLX5_HW_INV_QUEUE, ct, data))
+		if (mlx5_aso_ct_query_by_wqe(priv->sh, MLX5_HW_INV_QUEUE, ct,
+					data, NULL, true))
 			return rte_flow_error_set(error, EIO,
 					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 					NULL,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 59d9db04d3..2792a0fc39 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -1178,9 +1178,9 @@ static rte_be32_t vlan_hdr_to_be32(const struct rte_flow_action *actions)
 }
 
 static __rte_always_inline struct mlx5_aso_mtr *
-flow_hw_meter_mark_alloc(struct rte_eth_dev *dev,
-			   const struct rte_flow_action *action,
-			   uint32_t queue)
+flow_hw_meter_mark_alloc(struct rte_eth_dev *dev, uint32_t queue,
+			 const struct rte_flow_action *action,
+			 void *user_data, bool push)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
@@ -1200,13 +1200,14 @@ flow_hw_meter_mark_alloc(struct rte_eth_dev *dev,
 	fm->is_enable = meter_mark->state;
 	fm->color_aware = meter_mark->color_mode;
 	aso_mtr->pool = pool;
-	aso_mtr->state = ASO_METER_WAIT;
+	aso_mtr->state = (queue == MLX5_HW_INV_QUEUE) ?
+			  ASO_METER_WAIT : ASO_METER_WAIT_ASYNC;
 	aso_mtr->offset = mtr_id - 1;
 	aso_mtr->init_color = (meter_mark->color_mode) ?
 		meter_mark->init_color : RTE_COLOR_GREEN;
 	/* Update ASO flow meter by wqe. */
 	if (mlx5_aso_meter_update_by_wqe(priv->sh, queue, aso_mtr,
-					 &priv->mtr_bulk)) {
+					 &priv->mtr_bulk, user_data, push)) {
 		mlx5_ipool_free(pool->idx_pool, mtr_id);
 		return NULL;
 	}
@@ -1231,7 +1232,7 @@ flow_hw_meter_mark_compile(struct rte_eth_dev *dev,
 	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
 	struct mlx5_aso_mtr *aso_mtr;
 
-	aso_mtr = flow_hw_meter_mark_alloc(dev, action, queue);
+	aso_mtr = flow_hw_meter_mark_alloc(dev, queue, action, NULL, true);
 	if (!aso_mtr)
 		return -1;
 
@@ -2295,9 +2296,13 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 				rte_col_2_mlx5_col(aso_mtr->init_color);
 			break;
 		case RTE_FLOW_ACTION_TYPE_METER_MARK:
+			/*
+			 * Allocating the meter directly would slow down the
+			 * flow insertion rate.
+			 */
 			ret = flow_hw_meter_mark_compile(dev,
 				act_data->action_dst, action,
-				rule_acts, &job->flow->mtr_id, queue);
+				rule_acts, &job->flow->mtr_id, MLX5_HW_INV_QUEUE);
 			if (ret != 0)
 				return ret;
 			break;
@@ -2604,6 +2609,74 @@ flow_hw_age_count_release(struct mlx5_priv *priv, uint32_t queue,
 	}
 }
 
+static inline int
+__flow_hw_pull_indir_action_comp(struct rte_eth_dev *dev,
+				 uint32_t queue,
+				 struct rte_flow_op_result res[],
+				 uint16_t n_res)
+
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_ring *r = priv->hw_q[queue].indir_cq;
+	struct mlx5_hw_q_job *job;
+	void *user_data = NULL;
+	uint32_t type, idx;
+	struct mlx5_aso_mtr *aso_mtr;
+	struct mlx5_aso_ct_action *aso_ct;
+	int ret_comp, i;
+
+	ret_comp = (int)rte_ring_count(r);
+	if (ret_comp > n_res)
+		ret_comp = n_res;
+	for (i = 0; i < ret_comp; i++) {
+		rte_ring_dequeue(r, &user_data);
+		res[i].user_data = user_data;
+		res[i].status = RTE_FLOW_OP_SUCCESS;
+	}
+	if (ret_comp < n_res && priv->hws_mpool)
+		ret_comp += mlx5_aso_pull_completion(&priv->hws_mpool->sq[queue],
+				&res[ret_comp], n_res - ret_comp);
+	if (ret_comp < n_res && priv->hws_ctpool)
+		ret_comp += mlx5_aso_pull_completion(&priv->ct_mng->aso_sqs[queue],
+				&res[ret_comp], n_res - ret_comp);
+	for (i = 0; i < ret_comp; i++) {
+		job = (struct mlx5_hw_q_job *)res[i].user_data;
+		/* Restore user data. */
+		res[i].user_data = job->user_data;
+		if (job->type == MLX5_HW_Q_JOB_TYPE_DESTROY) {
+			type = MLX5_INDIRECT_ACTION_TYPE_GET(job->action);
+			if (type == MLX5_INDIRECT_ACTION_TYPE_METER_MARK) {
+				idx = MLX5_INDIRECT_ACTION_IDX_GET(job->action);
+				mlx5_ipool_free(priv->hws_mpool->idx_pool, idx);
+			}
+		} else if (job->type == MLX5_HW_Q_JOB_TYPE_CREATE) {
+			type = MLX5_INDIRECT_ACTION_TYPE_GET(job->action);
+			if (type == MLX5_INDIRECT_ACTION_TYPE_METER_MARK) {
+				idx = MLX5_INDIRECT_ACTION_IDX_GET(job->action);
+				aso_mtr = mlx5_ipool_get(priv->hws_mpool->idx_pool, idx);
+				aso_mtr->state = ASO_METER_READY;
+			} else if (type == MLX5_INDIRECT_ACTION_TYPE_CT) {
+				idx = MLX5_ACTION_CTX_CT_GET_IDX
+					((uint32_t)(uintptr_t)job->action);
+				aso_ct = mlx5_ipool_get(priv->hws_ctpool->cts, idx);
+				aso_ct->state = ASO_CONNTRACK_READY;
+			}
+		} else if (job->type == MLX5_HW_Q_JOB_TYPE_QUERY) {
+			type = MLX5_INDIRECT_ACTION_TYPE_GET(job->action);
+			if (type == MLX5_INDIRECT_ACTION_TYPE_CT) {
+				idx = MLX5_ACTION_CTX_CT_GET_IDX
+					((uint32_t)(uintptr_t)job->action);
+				aso_ct = mlx5_ipool_get(priv->hws_ctpool->cts, idx);
+				mlx5_aso_ct_obj_analyze(job->profile,
+							job->out_data);
+				aso_ct->state = ASO_CONNTRACK_READY;
+			}
+		}
+		priv->hw_q[queue].job[priv->hw_q[queue].job_idx++] = job;
+	}
+	return ret_comp;
+}
+
 /**
  * Pull the enqueued flows.
  *
@@ -2636,6 +2709,7 @@ flow_hw_pull(struct rte_eth_dev *dev,
 	struct mlx5_hw_q_job *job;
 	int ret, i;
 
+	/* 1. Pull the flow completion. */
 	ret = mlx5dr_send_queue_poll(priv->dr_ctx, queue, res, n_res);
 	if (ret < 0)
 		return rte_flow_error_set(error, rte_errno,
@@ -2661,9 +2735,34 @@ flow_hw_pull(struct rte_eth_dev *dev,
 		}
 		priv->hw_q[queue].job[priv->hw_q[queue].job_idx++] = job;
 	}
+	/* 2. Pull indirect action comp. */
+	if (ret < n_res)
+		ret += __flow_hw_pull_indir_action_comp(dev, queue, &res[ret],
+							n_res - ret);
 	return ret;
 }
 
+static inline void
+__flow_hw_push_action(struct rte_eth_dev *dev,
+		    uint32_t queue)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_ring *iq = priv->hw_q[queue].indir_iq;
+	struct rte_ring *cq = priv->hw_q[queue].indir_cq;
+	void *job = NULL;
+	uint32_t ret, i;
+
+	ret = rte_ring_count(iq);
+	for (i = 0; i < ret; i++) {
+		rte_ring_dequeue(iq, &job);
+		rte_ring_enqueue(cq, job);
+	}
+	if (priv->hws_ctpool)
+		mlx5_aso_push_wqe(priv->sh, &priv->ct_mng->aso_sqs[queue]);
+	if (priv->hws_mpool)
+		mlx5_aso_push_wqe(priv->sh, &priv->hws_mpool->sq[queue]);
+}
+
 /**
  * Push the enqueued flows to HW.
  *
@@ -2687,6 +2786,7 @@ flow_hw_push(struct rte_eth_dev *dev,
 	struct mlx5_priv *priv = dev->data->dev_private;
 	int ret;
 
+	__flow_hw_push_action(dev, queue);
 	ret = mlx5dr_send_queue_action(priv->dr_ctx, queue,
 				       MLX5DR_SEND_QUEUE_ACTION_DRAIN);
 	if (ret) {
@@ -5940,7 +6040,7 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	/* Adds one queue to be used by PMD.
 	 * The last queue will be used by the PMD.
 	 */
-	uint16_t nb_q_updated;
+	uint16_t nb_q_updated = 0;
 	struct rte_flow_queue_attr **_queue_attr = NULL;
 	struct rte_flow_queue_attr ctrl_queue_attr = {0};
 	bool is_proxy = !!(priv->sh->config.dv_esw_en && priv->master);
@@ -6007,6 +6107,7 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		goto err;
 	}
 	for (i = 0; i < nb_q_updated; i++) {
+		char mz_name[RTE_MEMZONE_NAMESIZE];
 		uint8_t *encap = NULL;
 		struct mlx5_modification_cmd *mhdr_cmd = NULL;
 		struct rte_flow_item *items = NULL;
@@ -6034,6 +6135,22 @@ flow_hw_configure(struct rte_eth_dev *dev,
 			job[j].items = &items[j * MLX5_HW_MAX_ITEMS];
 			priv->hw_q[i].job[j] = &job[j];
 		}
+		snprintf(mz_name, sizeof(mz_name), "port_%u_indir_act_cq_%u",
+			 dev->data->port_id, i);
+		priv->hw_q[i].indir_cq = rte_ring_create(mz_name,
+				_queue_attr[i]->size, SOCKET_ID_ANY,
+				RING_F_SP_ENQ | RING_F_SC_DEQ |
+				RING_F_EXACT_SZ);
+		if (!priv->hw_q[i].indir_cq)
+			goto err;
+		snprintf(mz_name, sizeof(mz_name), "port_%u_indir_act_iq_%u",
+			 dev->data->port_id, i);
+		priv->hw_q[i].indir_iq = rte_ring_create(mz_name,
+				_queue_attr[i]->size, SOCKET_ID_ANY,
+				RING_F_SP_ENQ | RING_F_SC_DEQ |
+				RING_F_EXACT_SZ);
+		if (!priv->hw_q[i].indir_iq)
+			goto err;
 	}
 	dr_ctx_attr.pd = priv->sh->cdev->pd;
 	dr_ctx_attr.queues = nb_q_updated;
@@ -6151,6 +6268,12 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	flow_hw_destroy_vlan(dev);
 	if (dr_ctx)
 		claim_zero(mlx5dr_context_close(dr_ctx));
+	for (i = 0; i < nb_q_updated; i++) {
+		if (priv->hw_q[i].indir_iq)
+			rte_ring_free(priv->hw_q[i].indir_iq);
+		if (priv->hw_q[i].indir_cq)
+			rte_ring_free(priv->hw_q[i].indir_cq);
+	}
 	mlx5_free(priv->hw_q);
 	priv->hw_q = NULL;
 	if (priv->acts_ipool) {
@@ -6180,7 +6303,7 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 	struct rte_flow_template_table *tbl;
 	struct rte_flow_pattern_template *it;
 	struct rte_flow_actions_template *at;
-	int i;
+	uint32_t i;
 
 	if (!priv->dr_ctx)
 		return;
@@ -6228,6 +6351,10 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 		flow_hw_ct_mng_destroy(dev, priv->ct_mng);
 		priv->ct_mng = NULL;
 	}
+	for (i = 0; i < priv->nb_queue; i++) {
+		rte_ring_free(priv->hw_q[i].indir_iq);
+		rte_ring_free(priv->hw_q[i].indir_cq);
+	}
 	mlx5_free(priv->hw_q);
 	priv->hw_q = NULL;
 	claim_zero(mlx5dr_context_close(priv->dr_ctx));
@@ -6416,8 +6543,9 @@ flow_hw_conntrack_destroy(struct rte_eth_dev *dev __rte_unused,
 }
 
 static int
-flow_hw_conntrack_query(struct rte_eth_dev *dev, uint32_t idx,
+flow_hw_conntrack_query(struct rte_eth_dev *dev, uint32_t queue, uint32_t idx,
 			struct rte_flow_action_conntrack *profile,
+			void *user_data, bool push,
 			struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -6441,7 +6569,7 @@ flow_hw_conntrack_query(struct rte_eth_dev *dev, uint32_t idx,
 	}
 	profile->peer_port = ct->peer;
 	profile->is_original_dir = ct->is_original;
-	if (mlx5_aso_ct_query_by_wqe(priv->sh, MLX5_HW_INV_QUEUE, ct, profile))
+	if (mlx5_aso_ct_query_by_wqe(priv->sh, queue, ct, profile, user_data, push))
 		return rte_flow_error_set(error, EIO,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -6453,7 +6581,8 @@ flow_hw_conntrack_query(struct rte_eth_dev *dev, uint32_t idx,
 static int
 flow_hw_conntrack_update(struct rte_eth_dev *dev, uint32_t queue,
 			 const struct rte_flow_modify_conntrack *action_conf,
-			 uint32_t idx, struct rte_flow_error *error)
+			 uint32_t idx, void *user_data, bool push,
+			 struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_aso_ct_pool *pool = priv->hws_ctpool;
@@ -6484,7 +6613,8 @@ flow_hw_conntrack_update(struct rte_eth_dev *dev, uint32_t queue,
 		ret = mlx5_validate_action_ct(dev, new_prf, error);
 		if (ret)
 			return ret;
-		ret = mlx5_aso_ct_update_by_wqe(priv->sh, queue, ct, new_prf);
+		ret = mlx5_aso_ct_update_by_wqe(priv->sh, queue, ct, new_prf,
+						user_data, push);
 		if (ret)
 			return rte_flow_error_set(error, EIO,
 					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
@@ -6506,6 +6636,7 @@ flow_hw_conntrack_update(struct rte_eth_dev *dev, uint32_t queue,
 static struct rte_flow_action_handle *
 flow_hw_conntrack_create(struct rte_eth_dev *dev, uint32_t queue,
 			 const struct rte_flow_action_conntrack *pro,
+			 void *user_data, bool push,
 			 struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -6532,7 +6663,7 @@ flow_hw_conntrack_create(struct rte_eth_dev *dev, uint32_t queue,
 	ct->is_original = !!pro->is_original_dir;
 	ct->peer = pro->peer_port;
 	ct->pool = pool;
-	if (mlx5_aso_ct_update_by_wqe(priv->sh, queue, ct, pro)) {
+	if (mlx5_aso_ct_update_by_wqe(priv->sh, queue, ct, pro, user_data, push)) {
 		mlx5_ipool_free(pool->cts, ct_idx);
 		rte_flow_error_set(error, EBUSY,
 				   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
@@ -6652,15 +6783,29 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 			     struct rte_flow_error *error)
 {
 	struct rte_flow_action_handle *handle = NULL;
+	struct mlx5_hw_q_job *job = NULL;
 	struct mlx5_priv *priv = dev->data->dev_private;
 	const struct rte_flow_action_age *age;
 	struct mlx5_aso_mtr *aso_mtr;
 	cnt_id_t cnt_id;
 	uint32_t mtr_id;
 	uint32_t age_idx;
+	bool push = true;
+	bool aso = false;
 
-	RTE_SET_USED(attr);
-	RTE_SET_USED(user_data);
+	if (attr) {
+		MLX5_ASSERT(queue != MLX5_HW_INV_QUEUE);
+		if (unlikely(!priv->hw_q[queue].job_idx)) {
+			rte_flow_error_set(error, ENOMEM,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				"Flow queue full.");
+			return NULL;
+		}
+		job = priv->hw_q[queue].job[--priv->hw_q[queue].job_idx];
+		job->type = MLX5_HW_Q_JOB_TYPE_CREATE;
+		job->user_data = user_data;
+		push = !attr->postpone;
+	}
 	switch (action->type) {
 	case RTE_FLOW_ACTION_TYPE_AGE:
 		if (priv->hws_strict_queue) {
@@ -6700,10 +6845,13 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 				 (uintptr_t)cnt_id;
 		break;
 	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
-		handle = flow_hw_conntrack_create(dev, queue, action->conf, error);
+		aso = true;
+		handle = flow_hw_conntrack_create(dev, queue, action->conf, job,
+						  push, error);
 		break;
 	case RTE_FLOW_ACTION_TYPE_METER_MARK:
-		aso_mtr = flow_hw_meter_mark_alloc(dev, action, queue);
+		aso = true;
+		aso_mtr = flow_hw_meter_mark_alloc(dev, queue, action, job, push);
 		if (!aso_mtr)
 			break;
 		mtr_id = (MLX5_INDIRECT_ACTION_TYPE_METER_MARK <<
@@ -6716,7 +6864,20 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 	default:
 		rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION,
 				   NULL, "action type not supported");
-		return NULL;
+		break;
+	}
+	if (job) {
+		if (!handle) {
+			priv->hw_q[queue].job_idx++;
+			return NULL;
+		}
+		job->action = handle;
+		if (push)
+			__flow_hw_push_action(dev, queue);
+		if (aso)
+			return handle;
+		rte_ring_enqueue(push ? priv->hw_q[queue].indir_cq :
+				 priv->hw_q[queue].indir_iq, job);
 	}
 	return handle;
 }
@@ -6750,32 +6911,56 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 			     void *user_data,
 			     struct rte_flow_error *error)
 {
-	RTE_SET_USED(attr);
-	RTE_SET_USED(user_data);
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
+	const struct rte_flow_modify_conntrack *ct_conf =
+		(const struct rte_flow_modify_conntrack *)update;
 	const struct rte_flow_update_meter_mark *upd_meter_mark =
 		(const struct rte_flow_update_meter_mark *)update;
 	const struct rte_flow_action_meter_mark *meter_mark;
+	struct mlx5_hw_q_job *job = NULL;
 	struct mlx5_aso_mtr *aso_mtr;
 	struct mlx5_flow_meter_info *fm;
 	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
 	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
 	uint32_t idx = act_idx & ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
+	int ret = 0;
+	bool push = true;
+	bool aso = false;
 
+	if (attr) {
+		MLX5_ASSERT(queue != MLX5_HW_INV_QUEUE);
+		if (unlikely(!priv->hw_q[queue].job_idx))
+			return rte_flow_error_set(error, ENOMEM,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				"Action update failed due to queue full.");
+		job = priv->hw_q[queue].job[--priv->hw_q[queue].job_idx];
+		job->type = MLX5_HW_Q_JOB_TYPE_UPDATE;
+		job->user_data = user_data;
+		push = !attr->postpone;
+	}
 	switch (type) {
 	case MLX5_INDIRECT_ACTION_TYPE_AGE:
-		return mlx5_hws_age_action_update(priv, idx, update, error);
+		ret = mlx5_hws_age_action_update(priv, idx, update, error);
+		break;
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
-		return flow_hw_conntrack_update(dev, queue, update, act_idx, error);
+		if (ct_conf->state)
+			aso = true;
+		ret = flow_hw_conntrack_update(dev, queue, update, act_idx,
+					       job, push, error);
+		break;
 	case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
+		aso = true;
 		meter_mark = &upd_meter_mark->meter_mark;
 		/* Find ASO object. */
 		aso_mtr = mlx5_ipool_get(pool->idx_pool, idx);
-		if (!aso_mtr)
-			return rte_flow_error_set(error, EINVAL,
+		if (!aso_mtr) {
+			ret = -EINVAL;
+			rte_flow_error_set(error, EINVAL,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL, "Invalid meter_mark update index");
+			break;
+		}
 		fm = &aso_mtr->fm;
 		if (upd_meter_mark->profile_valid)
 			fm->profile = (struct mlx5_flow_meter_profile *)
@@ -6789,25 +6974,46 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 			fm->is_enable = meter_mark->state;
 		/* Update ASO flow meter by wqe. */
 		if (mlx5_aso_meter_update_by_wqe(priv->sh, queue,
-						 aso_mtr, &priv->mtr_bulk))
-			return rte_flow_error_set(error, EINVAL,
+						 aso_mtr, &priv->mtr_bulk, job, push)) {
+			ret = -EINVAL;
+			rte_flow_error_set(error, EINVAL,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL, "Unable to update ASO meter WQE");
+			break;
+		}
 		/* Wait for ASO object completion. */
 		if (queue == MLX5_HW_INV_QUEUE &&
-		    mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr))
-			return rte_flow_error_set(error, EINVAL,
+		    mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr)) {
+			ret = -EINVAL;
+			rte_flow_error_set(error, EINVAL,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL, "Unable to wait for ASO meter CQE");
+		}
 		break;
 	case MLX5_INDIRECT_ACTION_TYPE_RSS:
-		return flow_dv_action_update(dev, handle, update, error);
+		ret = flow_dv_action_update(dev, handle, update, error);
+		break;
 	default:
-		return rte_flow_error_set(error, ENOTSUP,
+		ret = -ENOTSUP;
+		rte_flow_error_set(error, ENOTSUP,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "action type not supported");
+		break;
 	}
-	return 0;
+	if (job) {
+		if (ret) {
+			priv->hw_q[queue].job_idx++;
+			return ret;
+		}
+		job->action = handle;
+		if (push)
+			__flow_hw_push_action(dev, queue);
+		if (aso)
+			return 0;
+		rte_ring_enqueue(push ? priv->hw_q[queue].indir_cq :
+				 priv->hw_q[queue].indir_iq, job);
+	}
+	return ret;
 }
 
 /**
@@ -6842,15 +7048,28 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 	uint32_t idx = act_idx & ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
+	struct mlx5_hw_q_job *job = NULL;
 	struct mlx5_aso_mtr *aso_mtr;
 	struct mlx5_flow_meter_info *fm;
+	bool push = true;
+	bool aso = false;
+	int ret = 0;
 
-	RTE_SET_USED(queue);
-	RTE_SET_USED(attr);
-	RTE_SET_USED(user_data);
+	if (attr) {
+		MLX5_ASSERT(queue != MLX5_HW_INV_QUEUE);
+		if (unlikely(!priv->hw_q[queue].job_idx))
+			return rte_flow_error_set(error, ENOMEM,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				"Action destroy failed due to queue full.");
+		job = priv->hw_q[queue].job[--priv->hw_q[queue].job_idx];
+		job->type = MLX5_HW_Q_JOB_TYPE_DESTROY;
+		job->user_data = user_data;
+		push = !attr->postpone;
+	}
 	switch (type) {
 	case MLX5_INDIRECT_ACTION_TYPE_AGE:
-		return mlx5_hws_age_action_destroy(priv, age_idx, error);
+		ret = mlx5_hws_age_action_destroy(priv, age_idx, error);
+		break;
 	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
 		age_idx = mlx5_hws_cnt_age_get(priv->hws_cpool, act_idx);
 		if (age_idx != 0)
@@ -6859,39 +7078,69 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 			 * time to update the AGE.
 			 */
 			mlx5_hws_age_nb_cnt_decrease(priv, age_idx);
-		return mlx5_hws_cnt_shared_put(priv->hws_cpool, &act_idx);
+		ret = mlx5_hws_cnt_shared_put(priv->hws_cpool, &act_idx);
+		break;
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
-		return flow_hw_conntrack_destroy(dev, act_idx, error);
+		ret = flow_hw_conntrack_destroy(dev, act_idx, error);
+		break;
 	case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
 		aso_mtr = mlx5_ipool_get(pool->idx_pool, idx);
-		if (!aso_mtr)
-			return rte_flow_error_set(error, EINVAL,
+		if (!aso_mtr) {
+			ret = -EINVAL;
+			rte_flow_error_set(error, EINVAL,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL, "Invalid meter_mark destroy index");
+			break;
+		}
 		fm = &aso_mtr->fm;
 		fm->is_enable = 0;
 		/* Update ASO flow meter by wqe. */
 		if (mlx5_aso_meter_update_by_wqe(priv->sh, queue, aso_mtr,
-						 &priv->mtr_bulk))
-			return rte_flow_error_set(error, EINVAL,
+						 &priv->mtr_bulk, job, push)) {
+			ret = -EINVAL;
+			rte_flow_error_set(error, EINVAL,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL, "Unable to update ASO meter WQE");
+			break;
+		}
 		/* Wait for ASO object completion. */
 		if (queue == MLX5_HW_INV_QUEUE &&
-		    mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr))
-			return rte_flow_error_set(error, EINVAL,
+		    mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr)) {
+			ret = -EINVAL;
+			rte_flow_error_set(error, EINVAL,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL, "Unable to wait for ASO meter CQE");
-		mlx5_ipool_free(pool->idx_pool, idx);
+			break;
+		}
+		if (!job)
+			mlx5_ipool_free(pool->idx_pool, idx);
+		else
+			aso = true;
 		break;
 	case MLX5_INDIRECT_ACTION_TYPE_RSS:
-		return flow_dv_action_destroy(dev, handle, error);
+		ret = flow_dv_action_destroy(dev, handle, error);
+		break;
 	default:
-		return rte_flow_error_set(error, ENOTSUP,
+		ret = -ENOTSUP;
+		rte_flow_error_set(error, ENOTSUP,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "action type not supported");
+		break;
 	}
-	return 0;
+	if (job) {
+		if (ret) {
+			priv->hw_q[queue].job_idx++;
+			return ret;
+		}
+		job->action = handle;
+		if (push)
+			__flow_hw_push_action(dev, queue);
+		if (aso)
+			return ret;
+		rte_ring_enqueue(push ? priv->hw_q[queue].indir_cq :
+				 priv->hw_q[queue].indir_iq, job);
+	}
+	return ret;
 }
 
 static int
@@ -7115,28 +7364,76 @@ flow_hw_action_update(struct rte_eth_dev *dev,
 }
 
 static int
-flow_hw_action_query(struct rte_eth_dev *dev,
-		     const struct rte_flow_action_handle *handle, void *data,
-		     struct rte_flow_error *error)
+flow_hw_action_handle_query(struct rte_eth_dev *dev, uint32_t queue,
+			    const struct rte_flow_op_attr *attr,
+			    const struct rte_flow_action_handle *handle,
+			    void *data, void *user_data,
+			    struct rte_flow_error *error)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hw_q_job *job = NULL;
 	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
 	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
 	uint32_t age_idx = act_idx & MLX5_HWS_AGE_IDX_MASK;
+	int ret;
+	bool push = true;
+	bool aso = false;
 
+	if (attr) {
+		MLX5_ASSERT(queue != MLX5_HW_INV_QUEUE);
+		if (unlikely(!priv->hw_q[queue].job_idx))
+			return rte_flow_error_set(error, ENOMEM,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				"Action query failed due to queue full.");
+		job = priv->hw_q[queue].job[--priv->hw_q[queue].job_idx];
+		job->type = MLX5_HW_Q_JOB_TYPE_QUERY;
+		job->user_data = user_data;
+		push = !attr->postpone;
+	}
 	switch (type) {
 	case MLX5_INDIRECT_ACTION_TYPE_AGE:
-		return flow_hw_query_age(dev, age_idx, data, error);
+		ret = flow_hw_query_age(dev, age_idx, data, error);
+		break;
 	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
-		return flow_hw_query_counter(dev, act_idx, data, error);
+		ret = flow_hw_query_counter(dev, act_idx, data, error);
+		break;
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
-		return flow_hw_conntrack_query(dev, act_idx, data, error);
-	case MLX5_INDIRECT_ACTION_TYPE_RSS:
-		return flow_dv_action_query(dev, handle, data, error);
+		aso = true;
+		if (job)
+			job->profile = (struct rte_flow_action_conntrack *)data;
+		ret = flow_hw_conntrack_query(dev, queue, act_idx, data,
+					      job, push, error);
+		break;
 	default:
-		return rte_flow_error_set(error, ENOTSUP,
+		ret = -ENOTSUP;
+		rte_flow_error_set(error, ENOTSUP,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "action type not supported");
+		break;
+	}
+	if (job) {
+		if (ret) {
+			priv->hw_q[queue].job_idx++;
+			return ret;
+		}
+		job->action = handle;
+		if (push)
+			__flow_hw_push_action(dev, queue);
+		if (aso)
+			return ret;
+		rte_ring_enqueue(push ? priv->hw_q[queue].indir_cq :
+				 priv->hw_q[queue].indir_iq, job);
 	}
+	return ret;
+}
+
+static int
+flow_hw_action_query(struct rte_eth_dev *dev,
+		     const struct rte_flow_action_handle *handle, void *data,
+		     struct rte_flow_error *error)
+{
+	return flow_hw_action_handle_query(dev, MLX5_HW_INV_QUEUE, NULL,
+			handle, data, NULL, error);
 }
 
 /**
@@ -7251,6 +7548,7 @@ const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.async_action_create = flow_hw_action_handle_create,
 	.async_action_destroy = flow_hw_action_handle_destroy,
 	.async_action_update = flow_hw_action_handle_update,
+	.async_action_query = flow_hw_action_handle_query,
 	.action_validate = flow_hw_action_validate,
 	.action_create = flow_hw_action_create,
 	.action_destroy = flow_hw_action_destroy,
diff --git a/drivers/net/mlx5/mlx5_flow_meter.c b/drivers/net/mlx5/mlx5_flow_meter.c
index ed2306283d..08f8aad70a 100644
--- a/drivers/net/mlx5/mlx5_flow_meter.c
+++ b/drivers/net/mlx5/mlx5_flow_meter.c
@@ -1632,7 +1632,7 @@ mlx5_flow_meter_action_modify(struct mlx5_priv *priv,
 		fm->is_enable = !!is_enable;
 		aso_mtr = container_of(fm, struct mlx5_aso_mtr, fm);
 		ret = mlx5_aso_meter_update_by_wqe(priv->sh, MLX5_HW_INV_QUEUE,
-						   aso_mtr, &priv->mtr_bulk);
+						   aso_mtr, &priv->mtr_bulk, NULL, true);
 		if (ret)
 			return ret;
 		ret = mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr);
@@ -1882,7 +1882,7 @@ mlx5_flow_meter_create(struct rte_eth_dev *dev, uint32_t meter_id,
 	if (priv->sh->meter_aso_en) {
 		aso_mtr = container_of(fm, struct mlx5_aso_mtr, fm);
 		ret = mlx5_aso_meter_update_by_wqe(priv->sh, MLX5_HW_INV_QUEUE,
-						   aso_mtr, &priv->mtr_bulk);
+						   aso_mtr, &priv->mtr_bulk, NULL, true);
 		if (ret)
 			goto error;
 		if (!priv->mtr_idx_tbl) {
@@ -1988,7 +1988,7 @@ mlx5_flow_meter_hws_create(struct rte_eth_dev *dev, uint32_t meter_id,
 	fm->initialized = 1;
 	/* Update ASO flow meter by wqe. */
 	ret = mlx5_aso_meter_update_by_wqe(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr,
-					   &priv->mtr_bulk);
+					   &priv->mtr_bulk, NULL, true);
 	if (ret)
 		return -rte_mtr_error_set(error, ENOTSUP,
 			RTE_MTR_ERROR_TYPE_UNSPECIFIED,
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v4 15/18] net/mlx5: support flow integrity in HWS group 0
  2022-10-19 16:25 ` [PATCH v4 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (13 preceding siblings ...)
  2022-10-19 16:25   ` [PATCH v4 14/18] net/mlx5: add async action push and pull support Suanming Mou
@ 2022-10-19 16:25   ` Suanming Mou
  2022-10-19 16:25   ` [PATCH v4 16/18] net/mlx5: support device control for E-Switch default rule Suanming Mou
                     ` (2 subsequent siblings)
  17 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-10-19 16:25 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Gregory Etelson

From: Gregory Etelson <getelson@nvidia.com>

- Reformat flow integrity item translation for HWS code.
- Support flow integrity bits in HWS group 0.
- Update integrity item translation to match positive semantics only.
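
Illustrative only, not part of this patch: a minimal sketch of the
positive-semantics usage this change targets, written against the
public rte_flow API. The mask is given in the pattern template and
the spec at flow rule insertion time (the HWS model); template/table
creation and error handling are omitted and the array names are made
up. Negative matches (e.g. l3_ok = 0 in the spec) are not supported
by the PMD after this change.

  #include <rte_flow.h>

  /* Positive semantics only: require valid outer L3 and L4. */
  static const struct rte_flow_item_integrity integ_mask = {
          .level = 0, /* outer headers; HWS supports outer integrity only */
          .l3_ok = 1,
          .l4_ok = 1,
  };
  static const struct rte_flow_item_integrity integ_spec = {
          .level = 0,
          .l3_ok = 1,
          .l4_ok = 1,
  };

  /* Pattern template side: only the mask is provided. */
  static const struct rte_flow_item tmpl_pattern[] = {
          { .type = RTE_FLOW_ITEM_TYPE_ETH },
          { .type = RTE_FLOW_ITEM_TYPE_INTEGRITY, .mask = &integ_mask },
          { .type = RTE_FLOW_ITEM_TYPE_END },
  };

  /* Flow rule side: only the spec is provided. */
  static const struct rte_flow_item rule_pattern[] = {
          { .type = RTE_FLOW_ITEM_TYPE_ETH },
          { .type = RTE_FLOW_ITEM_TYPE_INTEGRITY, .spec = &integ_spec },
          { .type = RTE_FLOW_ITEM_TYPE_END },
  };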

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
---
 drivers/net/mlx5/mlx5_flow.h    |   1 +
 drivers/net/mlx5/mlx5_flow_dv.c | 163 ++++++++++++++++----------------
 drivers/net/mlx5/mlx5_flow_hw.c |   8 ++
 3 files changed, 90 insertions(+), 82 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 57cebb5ce6..ddc23aaf9c 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1470,6 +1470,7 @@ struct mlx5_dv_matcher_workspace {
 	struct mlx5_flow_rss_desc *rss_desc; /* RSS descriptor. */
 	const struct rte_flow_item *tunnel_item; /* Flow tunnel item. */
 	const struct rte_flow_item *gre_item; /* Flow GRE item. */
+	const struct rte_flow_item *integrity_items[2];
 };
 
 struct mlx5_flow_split_info {
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 3cc4b9bcd4..1497423891 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -12648,132 +12648,121 @@ flow_dv_aso_age_params_init(struct rte_eth_dev *dev,
 
 static void
 flow_dv_translate_integrity_l4(const struct rte_flow_item_integrity *mask,
-			       const struct rte_flow_item_integrity *value,
-			       void *headers_m, void *headers_v)
+			       void *headers)
 {
+	/*
+	 * In HWS mode MLX5_ITEM_UPDATE() macro assigns the same pointer to
+	 * both mask and value, therefore ether can be used.
+	 * both mask and value, therefore either can be used.
+	 * spec. Integrity item value is used only if matching mask is set.
+	 * Use mask reference here to keep SWS functionality.
+	 */
 	if (mask->l4_ok) {
 		/* RTE l4_ok filter aggregates hardware l4_ok and
 		 * l4_checksum_ok filters.
 		 * Positive RTE l4_ok match requires hardware match on both L4
 		 * hardware integrity bits.
-		 * For negative match, check hardware l4_checksum_ok bit only,
-		 * because hardware sets that bit to 0 for all packets
-		 * with bad L4.
+		 * PMD supports positive integrity item semantics only.
 		 */
-		if (value->l4_ok) {
-			MLX5_SET(fte_match_set_lyr_2_4, headers_m, l4_ok, 1);
-			MLX5_SET(fte_match_set_lyr_2_4, headers_v, l4_ok, 1);
-		}
-		MLX5_SET(fte_match_set_lyr_2_4, headers_m, l4_checksum_ok, 1);
-		MLX5_SET(fte_match_set_lyr_2_4, headers_v, l4_checksum_ok,
-			 !!value->l4_ok);
-	}
-	if (mask->l4_csum_ok) {
-		MLX5_SET(fte_match_set_lyr_2_4, headers_m, l4_checksum_ok, 1);
-		MLX5_SET(fte_match_set_lyr_2_4, headers_v, l4_checksum_ok,
-			 value->l4_csum_ok);
+		MLX5_SET(fte_match_set_lyr_2_4, headers, l4_ok, 1);
+		MLX5_SET(fte_match_set_lyr_2_4, headers, l4_checksum_ok, 1);
+	} else if (mask->l4_csum_ok) {
+		MLX5_SET(fte_match_set_lyr_2_4, headers, l4_checksum_ok, 1);
 	}
 }
 
 static void
 flow_dv_translate_integrity_l3(const struct rte_flow_item_integrity *mask,
-			       const struct rte_flow_item_integrity *value,
-			       void *headers_m, void *headers_v, bool is_ipv4)
+			       void *headers, bool is_ipv4)
 {
+	/*
+	 * In HWS mode MLX5_ITEM_UPDATE() macro assigns the same pointer to
+	 * both mask and value, therefore either can be used.
+	 * In SWS SW_V mode mask points to item mask and value points to item
+	 * spec. Integrity item value is used only if matching mask is set.
+	 * Use mask reference here to keep SWS functionality.
+	 */
 	if (mask->l3_ok) {
 		/* RTE l3_ok filter aggregates for IPv4 hardware l3_ok and
 		 * ipv4_csum_ok filters.
 		 * Positive RTE l3_ok match requires hardware match on both L3
 		 * hardware integrity bits.
-		 * For negative match, check hardware l3_csum_ok bit only,
-		 * because hardware sets that bit to 0 for all packets
-		 * with bad L3.
+		 * PMD supports positive integrity item semantics only.
 		 */
+		MLX5_SET(fte_match_set_lyr_2_4, headers, l3_ok, 1);
 		if (is_ipv4) {
-			if (value->l3_ok) {
-				MLX5_SET(fte_match_set_lyr_2_4, headers_m,
-					 l3_ok, 1);
-				MLX5_SET(fte_match_set_lyr_2_4, headers_v,
-					 l3_ok, 1);
-			}
-			MLX5_SET(fte_match_set_lyr_2_4, headers_m,
+			MLX5_SET(fte_match_set_lyr_2_4, headers,
 				 ipv4_checksum_ok, 1);
-			MLX5_SET(fte_match_set_lyr_2_4, headers_v,
-				 ipv4_checksum_ok, !!value->l3_ok);
-		} else {
-			MLX5_SET(fte_match_set_lyr_2_4, headers_m, l3_ok, 1);
-			MLX5_SET(fte_match_set_lyr_2_4, headers_v, l3_ok,
-				 value->l3_ok);
 		}
-	}
-	if (mask->ipv4_csum_ok) {
-		MLX5_SET(fte_match_set_lyr_2_4, headers_m, ipv4_checksum_ok, 1);
-		MLX5_SET(fte_match_set_lyr_2_4, headers_v, ipv4_checksum_ok,
-			 value->ipv4_csum_ok);
+	} else if (is_ipv4 && mask->ipv4_csum_ok) {
+		MLX5_SET(fte_match_set_lyr_2_4, headers, ipv4_checksum_ok, 1);
 	}
 }
 
 static void
-set_integrity_bits(void *headers_m, void *headers_v,
-		   const struct rte_flow_item *integrity_item, bool is_l3_ip4)
+set_integrity_bits(void *headers, const struct rte_flow_item *integrity_item,
+		   bool is_l3_ip4, uint32_t key_type)
 {
-	const struct rte_flow_item_integrity *spec = integrity_item->spec;
-	const struct rte_flow_item_integrity *mask = integrity_item->mask;
+	const struct rte_flow_item_integrity *spec;
+	const struct rte_flow_item_integrity *mask;
 
 	/* Integrity bits validation cleared spec pointer */
-	MLX5_ASSERT(spec != NULL);
-	if (!mask)
-		mask = &rte_flow_item_integrity_mask;
-	flow_dv_translate_integrity_l3(mask, spec, headers_m, headers_v,
-				       is_l3_ip4);
-	flow_dv_translate_integrity_l4(mask, spec, headers_m, headers_v);
+	if (MLX5_ITEM_VALID(integrity_item, key_type))
+		return;
+	MLX5_ITEM_UPDATE(integrity_item, key_type, spec, mask,
+			 &rte_flow_item_integrity_mask);
+	flow_dv_translate_integrity_l3(mask, headers, is_l3_ip4);
+	flow_dv_translate_integrity_l4(mask, headers);
 }
 
 static void
-flow_dv_translate_item_integrity_post(void *matcher, void *key,
+flow_dv_translate_item_integrity_post(void *key,
 				      const
 				      struct rte_flow_item *integrity_items[2],
-				      uint64_t pattern_flags)
+				      uint64_t pattern_flags, uint32_t key_type)
 {
-	void *headers_m, *headers_v;
+	void *headers;
 	bool is_l3_ip4;
 
 	if (pattern_flags & MLX5_FLOW_ITEM_INNER_INTEGRITY) {
-		headers_m = MLX5_ADDR_OF(fte_match_param, matcher,
-					 inner_headers);
-		headers_v = MLX5_ADDR_OF(fte_match_param, key, inner_headers);
+		headers = MLX5_ADDR_OF(fte_match_param, key, inner_headers);
 		is_l3_ip4 = (pattern_flags & MLX5_FLOW_LAYER_INNER_L3_IPV4) !=
 			    0;
-		set_integrity_bits(headers_m, headers_v,
-				   integrity_items[1], is_l3_ip4);
+		set_integrity_bits(headers, integrity_items[1], is_l3_ip4,
+				   key_type);
 	}
 	if (pattern_flags & MLX5_FLOW_ITEM_OUTER_INTEGRITY) {
-		headers_m = MLX5_ADDR_OF(fte_match_param, matcher,
-					 outer_headers);
-		headers_v = MLX5_ADDR_OF(fte_match_param, key, outer_headers);
+		headers = MLX5_ADDR_OF(fte_match_param, key, outer_headers);
 		is_l3_ip4 = (pattern_flags & MLX5_FLOW_LAYER_OUTER_L3_IPV4) !=
 			    0;
-		set_integrity_bits(headers_m, headers_v,
-				   integrity_items[0], is_l3_ip4);
+		set_integrity_bits(headers, integrity_items[0], is_l3_ip4,
+				   key_type);
 	}
 }
 
-static void
+static uint64_t
 flow_dv_translate_item_integrity(const struct rte_flow_item *item,
-				 const struct rte_flow_item *integrity_items[2],
-				 uint64_t *last_item)
+				 struct mlx5_dv_matcher_workspace *wks,
+				 uint64_t key_type)
 {
-	const struct rte_flow_item_integrity *spec = (typeof(spec))item->spec;
+	if ((key_type & MLX5_SET_MATCHER_SW) != 0) {
+		const struct rte_flow_item_integrity
+			*spec = (typeof(spec))item->spec;
 
-	/* integrity bits validation cleared spec pointer */
-	MLX5_ASSERT(spec != NULL);
-	if (spec->level > 1) {
-		integrity_items[1] = item;
-		*last_item |= MLX5_FLOW_ITEM_INNER_INTEGRITY;
+		/* SWS integrity bits validation cleared spec pointer */
+		if (spec->level > 1) {
+			wks->integrity_items[1] = item;
+			wks->last_item |= MLX5_FLOW_ITEM_INNER_INTEGRITY;
+		} else {
+			wks->integrity_items[0] = item;
+			wks->last_item |= MLX5_FLOW_ITEM_OUTER_INTEGRITY;
+		}
 	} else {
-		integrity_items[0] = item;
-		*last_item |= MLX5_FLOW_ITEM_OUTER_INTEGRITY;
+		/* HWS supports outer integrity only */
+		wks->integrity_items[0] = item;
+		wks->last_item |= MLX5_FLOW_ITEM_OUTER_INTEGRITY;
 	}
+	return wks->last_item;
 }
 
 /**
@@ -13401,6 +13390,10 @@ flow_dv_translate_items(struct rte_eth_dev *dev,
 		flow_dv_translate_item_meter_color(dev, key, items, key_type);
 		last_item = MLX5_FLOW_ITEM_METER_COLOR;
 		break;
+	case RTE_FLOW_ITEM_TYPE_INTEGRITY:
+		last_item = flow_dv_translate_item_integrity(items,
+							     wks, key_type);
+		break;
 	default:
 		break;
 	}
@@ -13464,6 +13457,12 @@ flow_dv_translate_items_hws(const struct rte_flow_item *items,
 		if (ret)
 			return ret;
 	}
+	if (wks.item_flags & MLX5_FLOW_ITEM_INTEGRITY) {
+		flow_dv_translate_item_integrity_post(key,
+						      wks.integrity_items,
+						      wks.item_flags,
+						      key_type);
+	}
 	if (wks.item_flags & MLX5_FLOW_LAYER_VXLAN_GPE) {
 		flow_dv_translate_item_vxlan_gpe(key,
 						 wks.tunnel_item,
@@ -13544,7 +13543,6 @@ flow_dv_translate_items_sws(struct rte_eth_dev *dev,
 			     mlx5_flow_get_thread_workspace())->rss_desc,
 	};
 	struct mlx5_dv_matcher_workspace wks_m = wks;
-	const struct rte_flow_item *integrity_items[2] = {NULL, NULL};
 	int ret = 0;
 	int tunnel;
 
@@ -13555,10 +13553,6 @@ flow_dv_translate_items_sws(struct rte_eth_dev *dev,
 						  NULL, "item not supported");
 		tunnel = !!(wks.item_flags & MLX5_FLOW_LAYER_TUNNEL);
 		switch (items->type) {
-		case RTE_FLOW_ITEM_TYPE_INTEGRITY:
-			flow_dv_translate_item_integrity(items, integrity_items,
-							 &wks.last_item);
-			break;
 		case RTE_FLOW_ITEM_TYPE_CONNTRACK:
 			flow_dv_translate_item_aso_ct(dev, match_mask,
 						      match_value, items);
@@ -13601,9 +13595,14 @@ flow_dv_translate_items_sws(struct rte_eth_dev *dev,
 			return -rte_errno;
 	}
 	if (wks.item_flags & MLX5_FLOW_ITEM_INTEGRITY) {
-		flow_dv_translate_item_integrity_post(match_mask, match_value,
-						      integrity_items,
-						      wks.item_flags);
+		flow_dv_translate_item_integrity_post(match_mask,
+						      wks_m.integrity_items,
+						      wks_m.item_flags,
+						      MLX5_SET_MATCHER_SW_M);
+		flow_dv_translate_item_integrity_post(match_value,
+						      wks.integrity_items,
+						      wks.item_flags,
+						      MLX5_SET_MATCHER_SW_V);
 	}
 	if (wks.item_flags & MLX5_FLOW_LAYER_VXLAN_GPE) {
 		flow_dv_translate_item_vxlan_gpe(match_mask,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 2792a0fc39..3cbe0305e9 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -4655,6 +4655,14 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ITEM_TYPE_ICMP6:
 		case RTE_FLOW_ITEM_TYPE_CONNTRACK:
 			break;
+		case RTE_FLOW_ITEM_TYPE_INTEGRITY:
+			/*
+			 * Integrity flow item validation requires access to
+			 * both item mask and spec.
+			 * Current HWS model allows item mask in pattern
+			 * template and item spec in flow rule.
+			 */
+			break;
 		case RTE_FLOW_ITEM_TYPE_END:
 			items_end = true;
 			break;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v4 16/18] net/mlx5: support device control for E-Switch default rule
  2022-10-19 16:25 ` [PATCH v4 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (14 preceding siblings ...)
  2022-10-19 16:25   ` [PATCH v4 15/18] net/mlx5: support flow integrity in HWS group 0 Suanming Mou
@ 2022-10-19 16:25   ` Suanming Mou
  2022-10-19 16:25   ` [PATCH v4 17/18] net/mlx5: support device control of representor matching Suanming Mou
  2022-10-19 16:25   ` [PATCH v4 18/18] net/mlx5: create control flow rules with HWS Suanming Mou
  17 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-10-19 16:25 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko, Ray Kinsella
  Cc: dev, rasland, orika, Dariusz Sosnowski, Xueming Li

From: Dariusz Sosnowski <dsosnowski@nvidia.com>

This patch adds support for the fdb_def_rule_en device argument to HW
Steering, which controls:

- creation of default FDB jump flow rule,
- ability of the user to create transfer flow rules in root table.
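
Illustrative usage, not part of this patch (the PCI address is a
placeholder and the testpmd invocation is only an example of the
standard mlx5 devargs syntax):

  dpdk-testpmd -a <PCI_BDF>,dv_flow_en=2,fdb_def_rule_en=0 -- -i

With fdb_def_rule_en=0 the PMD does not install the default FDB jump
flow rule and the application may create transfer flow rules directly
in the root table; fdb_def_rule_en=1 preserves the default jump rule
behavior.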

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_os.c |  14 ++
 drivers/net/mlx5/mlx5.h          |   4 +-
 drivers/net/mlx5/mlx5_flow.c     |  20 +--
 drivers/net/mlx5/mlx5_flow.h     |   5 +-
 drivers/net/mlx5/mlx5_flow_dv.c  |  62 ++++---
 drivers/net/mlx5/mlx5_flow_hw.c  | 273 +++++++++++++++----------------
 drivers/net/mlx5/mlx5_trigger.c  |  31 ++--
 drivers/net/mlx5/mlx5_tx.h       |   1 +
 drivers/net/mlx5/mlx5_txq.c      |  47 ++++++
 drivers/net/mlx5/rte_pmd_mlx5.h  |  17 ++
 drivers/net/mlx5/version.map     |   1 +
 11 files changed, 287 insertions(+), 188 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 55801682cc..c23fe6daf1 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1554,6 +1554,20 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	rte_rwlock_init(&priv->ind_tbls_lock);
 	if (priv->sh->config.dv_flow_en == 2) {
 #ifdef HAVE_MLX5_HWS_SUPPORT
+		if (priv->sh->config.dv_esw_en) {
+			if (priv->sh->dv_regc0_mask == UINT32_MAX) {
+				DRV_LOG(ERR, "E-Switch port metadata is required when using HWS "
+					     "but it is disabled (configure it through devlink)");
+				err = ENOTSUP;
+				goto error;
+			}
+			if (priv->sh->dv_regc0_mask == 0) {
+				DRV_LOG(ERR, "E-Switch with HWS is not supported "
+					     "(no available bits in reg_c[0])");
+				err = ENOTSUP;
+				goto error;
+			}
+		}
 		if (priv->vport_meta_mask)
 			flow_hw_set_port_info(eth_dev);
 		if (priv->sh->config.dv_esw_en &&
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 5195529267..9a1718e2f2 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -2022,7 +2022,7 @@ int mlx5_flow_ops_get(struct rte_eth_dev *dev, const struct rte_flow_ops **ops);
 int mlx5_flow_start_default(struct rte_eth_dev *dev);
 void mlx5_flow_stop_default(struct rte_eth_dev *dev);
 int mlx5_flow_verify(struct rte_eth_dev *dev);
-int mlx5_ctrl_flow_source_queue(struct rte_eth_dev *dev, uint32_t queue);
+int mlx5_ctrl_flow_source_queue(struct rte_eth_dev *dev, uint32_t sq_num);
 int mlx5_ctrl_flow_vlan(struct rte_eth_dev *dev,
 			struct rte_flow_item_eth *eth_spec,
 			struct rte_flow_item_eth *eth_mask,
@@ -2034,7 +2034,7 @@ int mlx5_ctrl_flow(struct rte_eth_dev *dev,
 int mlx5_flow_lacp_miss(struct rte_eth_dev *dev);
 struct rte_flow *mlx5_flow_create_esw_table_zero_flow(struct rte_eth_dev *dev);
 uint32_t mlx5_flow_create_devx_sq_miss_flow(struct rte_eth_dev *dev,
-					    uint32_t txq);
+					    uint32_t sq_num);
 void mlx5_flow_async_pool_query_handle(struct mlx5_dev_ctx_shared *sh,
 				       uint64_t async_id, int status);
 void mlx5_set_query_alarm(struct mlx5_dev_ctx_shared *sh);
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index b11957f8ee..76187d76ea 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -7159,14 +7159,14 @@ mlx5_flow_create_esw_table_zero_flow(struct rte_eth_dev *dev)
  *
  * @param dev
  *   Pointer to Ethernet device.
- * @param txq
- *   Txq index.
+ * @param sq_num
+ *   SQ number.
  *
  * @return
  *   Flow ID on success, 0 otherwise and rte_errno is set.
  */
 uint32_t
-mlx5_flow_create_devx_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
+mlx5_flow_create_devx_sq_miss_flow(struct rte_eth_dev *dev, uint32_t sq_num)
 {
 	struct rte_flow_attr attr = {
 		.group = 0,
@@ -7178,8 +7178,8 @@ mlx5_flow_create_devx_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
 	struct rte_flow_item_port_id port_spec = {
 		.id = MLX5_PORT_ESW_MGR,
 	};
-	struct mlx5_rte_flow_item_sq txq_spec = {
-		.queue = txq,
+	struct mlx5_rte_flow_item_sq sq_spec = {
+		.queue = sq_num,
 	};
 	struct rte_flow_item pattern[] = {
 		{
@@ -7189,7 +7189,7 @@ mlx5_flow_create_devx_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
 		{
 			.type = (enum rte_flow_item_type)
 				MLX5_RTE_FLOW_ITEM_TYPE_SQ,
-			.spec = &txq_spec,
+			.spec = &sq_spec,
 		},
 		{
 			.type = RTE_FLOW_ITEM_TYPE_END,
@@ -7560,22 +7560,22 @@ mlx5_flow_verify(struct rte_eth_dev *dev __rte_unused)
  *
  * @param dev
  *   Pointer to Ethernet device.
- * @param queue
- *   The queue index.
+ * @param sq_num
+ *   The SQ hw number.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
 mlx5_ctrl_flow_source_queue(struct rte_eth_dev *dev,
-			    uint32_t queue)
+			    uint32_t sq_num)
 {
 	const struct rte_flow_attr attr = {
 		.egress = 1,
 		.priority = 0,
 	};
 	struct mlx5_rte_flow_item_sq queue_spec = {
-		.queue = queue,
+		.queue = sq_num,
 	};
 	struct mlx5_rte_flow_item_sq queue_mask = {
 		.queue = UINT32_MAX,
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index ddc23aaf9c..88d92b18c7 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -116,7 +116,7 @@ struct mlx5_flow_action_copy_mreg {
 
 /* Matches on source queue. */
 struct mlx5_rte_flow_item_sq {
-	uint32_t queue;
+	uint32_t queue; /* DevX SQ number */
 };
 
 /* Feature name to allocate metadata register. */
@@ -2485,9 +2485,8 @@ int mlx5_flow_pick_transfer_proxy(struct rte_eth_dev *dev,
 
 int mlx5_flow_hw_flush_ctrl_flows(struct rte_eth_dev *dev);
 
-int mlx5_flow_hw_esw_create_mgr_sq_miss_flow(struct rte_eth_dev *dev);
 int mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev,
-					 uint32_t txq);
+					 uint32_t sqn);
 int mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev);
 int mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev);
 int mlx5_flow_actions_validate(struct rte_eth_dev *dev,
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 1497423891..0f6fd34a8b 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -10123,6 +10123,29 @@ flow_dv_translate_item_port_id(struct rte_eth_dev *dev, void *key,
 	return 0;
 }
 
+/**
+ * Translate port representor item to eswitch match on port id.
+ *
+ * @param[in] dev
+ *   The device to configure through.
+ * @param[in, out] key
+ *   Flow matcher value.
+ * @param[in] key_type
+ *   Set flow matcher mask or value.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise.
+ */
+static int
+flow_dv_translate_item_port_representor(struct rte_eth_dev *dev, void *key,
+					uint32_t key_type)
+{
+	flow_dv_translate_item_source_vport(key,
+			key_type & MLX5_SET_MATCHER_V ?
+			mlx5_flow_get_esw_manager_vport_id(dev) : 0xffff);
+	return 0;
+}
+
 /**
  * Translate represented port item to eswitch match on port id.
  *
@@ -11402,10 +11425,10 @@ flow_dv_translate_create_counter(struct rte_eth_dev *dev,
 }
 
 /**
- * Add Tx queue matcher
+ * Add SQ matcher
  *
- * @param[in] dev
- *   Pointer to the dev struct.
+ * @param[in, out] matcher
+ *   Flow matcher.
  * @param[in, out] key
  *   Flow matcher value.
  * @param[in] item
@@ -11414,40 +11437,29 @@ flow_dv_translate_create_counter(struct rte_eth_dev *dev,
  *   Set flow matcher mask or value.
  */
 static void
-flow_dv_translate_item_tx_queue(struct rte_eth_dev *dev,
-				void *key,
-				const struct rte_flow_item *item,
-				uint32_t key_type)
+flow_dv_translate_item_sq(void *key,
+			  const struct rte_flow_item *item,
+			  uint32_t key_type)
 {
 	const struct mlx5_rte_flow_item_sq *queue_m;
 	const struct mlx5_rte_flow_item_sq *queue_v;
 	const struct mlx5_rte_flow_item_sq queue_mask = {
 		.queue = UINT32_MAX,
 	};
-	void *misc_v =
-		MLX5_ADDR_OF(fte_match_param, key, misc_parameters);
-	struct mlx5_txq_ctrl *txq = NULL;
+	void *misc_v = MLX5_ADDR_OF(fte_match_param, key, misc_parameters);
 	uint32_t queue;
 
 	MLX5_ITEM_UPDATE(item, key_type, queue_v, queue_m, &queue_mask);
 	if (!queue_m || !queue_v)
 		return;
 	if (key_type & MLX5_SET_MATCHER_V) {
-		txq = mlx5_txq_get(dev, queue_v->queue);
-		if (!txq)
-			return;
-		if (txq->is_hairpin)
-			queue = txq->obj->sq->id;
-		else
-			queue = txq->obj->sq_obj.sq->id;
+		queue = queue_v->queue;
 		if (key_type == MLX5_SET_MATCHER_SW_V)
 			queue &= queue_m->queue;
 	} else {
 		queue = queue_m->queue;
 	}
 	MLX5_SET(fte_match_set_misc, misc_v, source_sqn, queue);
-	if (txq)
-		mlx5_txq_release(dev, queue_v->queue);
 }
 
 /**
@@ -13148,6 +13160,11 @@ flow_dv_translate_items(struct rte_eth_dev *dev,
 			(dev, key, items, wks->attr, key_type);
 		last_item = MLX5_FLOW_ITEM_PORT_ID;
 		break;
+	case RTE_FLOW_ITEM_TYPE_PORT_REPRESENTOR:
+		flow_dv_translate_item_port_representor
+			(dev, key, key_type);
+		last_item = MLX5_FLOW_ITEM_PORT_REPRESENTOR;
+		break;
 	case RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT:
 		flow_dv_translate_item_represented_port
 			(dev, key, items, wks->attr, key_type);
@@ -13354,7 +13371,7 @@ flow_dv_translate_items(struct rte_eth_dev *dev,
 		last_item = MLX5_FLOW_ITEM_TAG;
 		break;
 	case MLX5_RTE_FLOW_ITEM_TYPE_SQ:
-		flow_dv_translate_item_tx_queue(dev, key, items, key_type);
+		flow_dv_translate_item_sq(key, items, key_type);
 		last_item = MLX5_FLOW_ITEM_SQ;
 		break;
 	case RTE_FLOW_ITEM_TYPE_GTP:
@@ -13564,7 +13581,6 @@ flow_dv_translate_items_sws(struct rte_eth_dev *dev,
 			wks.last_item = tunnel ? MLX5_FLOW_ITEM_INNER_FLEX :
 						 MLX5_FLOW_ITEM_OUTER_FLEX;
 			break;
-
 		default:
 			ret = flow_dv_translate_items(dev, items, &wks_m,
 				match_mask, MLX5_SET_MATCHER_SW_M, error);
@@ -13587,7 +13603,9 @@ flow_dv_translate_items_sws(struct rte_eth_dev *dev,
 	 * in use.
 	 */
 	if (!(wks.item_flags & MLX5_FLOW_ITEM_PORT_ID) &&
-	    !(wks.item_flags & MLX5_FLOW_ITEM_REPRESENTED_PORT) && priv->sh->esw_mode &&
+	    !(wks.item_flags & MLX5_FLOW_ITEM_REPRESENTED_PORT) &&
+	    !(wks.item_flags & MLX5_FLOW_ITEM_PORT_REPRESENTOR) &&
+	    priv->sh->esw_mode &&
 	    !(attr->egress && !attr->transfer) &&
 	    attr->group != MLX5_FLOW_MREG_CP_TABLE_GROUP) {
 		if (flow_dv_translate_item_port_id_all(dev, match_mask,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 3cbe0305e9..9294866628 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -3173,7 +3173,10 @@ flow_hw_translate_group(struct rte_eth_dev *dev,
 	struct mlx5_priv *priv = dev->data->dev_private;
 	const struct rte_flow_attr *flow_attr = &cfg->attr.flow_attr;
 
-	if (priv->sh->config.dv_esw_en && cfg->external && flow_attr->transfer) {
+	if (priv->sh->config.dv_esw_en &&
+	    priv->fdb_def_rule &&
+	    cfg->external &&
+	    flow_attr->transfer) {
 		if (group > MLX5_HW_MAX_TRANSFER_GROUP)
 			return rte_flow_error_set(error, EINVAL,
 						  RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
@@ -5137,14 +5140,23 @@ flow_hw_free_vport_actions(struct mlx5_priv *priv)
 }
 
 static uint32_t
-flow_hw_usable_lsb_vport_mask(struct mlx5_priv *priv)
+flow_hw_esw_mgr_regc_marker_mask(struct rte_eth_dev *dev)
 {
-	uint32_t usable_mask = ~priv->vport_meta_mask;
+	uint32_t mask = MLX5_SH(dev)->dv_regc0_mask;
 
-	if (usable_mask)
-		return (1 << rte_bsf32(usable_mask));
-	else
-		return 0;
+	/* Mask is verified during device initialization. */
+	MLX5_ASSERT(mask != 0);
+	return mask;
+}
+
+static uint32_t
+flow_hw_esw_mgr_regc_marker(struct rte_eth_dev *dev)
+{
+	uint32_t mask = MLX5_SH(dev)->dv_regc0_mask;
+
+	/* Mask is verified during device initialization. */
+	MLX5_ASSERT(mask != 0);
+	return RTE_BIT32(rte_bsf32(mask));
 }
 
 /**
@@ -5170,12 +5182,19 @@ flow_hw_create_ctrl_esw_mgr_pattern_template(struct rte_eth_dev *dev)
 	struct rte_flow_item_ethdev port_mask = {
 		.port_id = UINT16_MAX,
 	};
+	struct mlx5_rte_flow_item_sq sq_mask = {
+		.queue = UINT32_MAX,
+	};
 	struct rte_flow_item items[] = {
 		{
 			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
 			.spec = &port_spec,
 			.mask = &port_mask,
 		},
+		{
+			.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_SQ,
+			.mask = &sq_mask,
+		},
 		{
 			.type = RTE_FLOW_ITEM_TYPE_END,
 		},
@@ -5185,9 +5204,10 @@ flow_hw_create_ctrl_esw_mgr_pattern_template(struct rte_eth_dev *dev)
 }
 
 /**
- * Creates a flow pattern template used to match REG_C_0 and a TX queue.
- * Matching on REG_C_0 is set up to match on least significant bit usable
- * by user-space, which is set when packet was originated from E-Switch Manager.
+ * Creates a flow pattern template used to match REG_C_0 and a SQ.
+ * Matching on REG_C_0 is set up to match on all bits usable by user-space.
+ * If traffic was sent from E-Switch Manager, then all usable bits will be set to 0,
+ * except the least significant bit, which will be set to 1.
  *
  * This template is used to set up a table for SQ miss default flow.
  *
@@ -5200,8 +5220,6 @@ flow_hw_create_ctrl_esw_mgr_pattern_template(struct rte_eth_dev *dev)
 static struct rte_flow_pattern_template *
 flow_hw_create_ctrl_regc_sq_pattern_template(struct rte_eth_dev *dev)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
-	uint32_t marker_bit = flow_hw_usable_lsb_vport_mask(priv);
 	struct rte_flow_pattern_template_attr attr = {
 		.relaxed_matching = 0,
 		.transfer = 1,
@@ -5211,6 +5229,7 @@ flow_hw_create_ctrl_regc_sq_pattern_template(struct rte_eth_dev *dev)
 	};
 	struct rte_flow_item_tag reg_c0_mask = {
 		.index = 0xff,
+		.data = flow_hw_esw_mgr_regc_marker_mask(dev),
 	};
 	struct mlx5_rte_flow_item_sq queue_mask = {
 		.queue = UINT32_MAX,
@@ -5232,12 +5251,6 @@ flow_hw_create_ctrl_regc_sq_pattern_template(struct rte_eth_dev *dev)
 		},
 	};
 
-	if (!marker_bit) {
-		DRV_LOG(ERR, "Unable to set up pattern template for SQ miss table");
-		return NULL;
-	}
-	reg_c0_spec.data = marker_bit;
-	reg_c0_mask.data = marker_bit;
 	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
 }
 
@@ -5329,9 +5342,8 @@ flow_hw_create_tx_default_mreg_copy_pattern_template(struct rte_eth_dev *dev)
 static struct rte_flow_actions_template *
 flow_hw_create_ctrl_regc_jump_actions_template(struct rte_eth_dev *dev)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
-	uint32_t marker_bit = flow_hw_usable_lsb_vport_mask(priv);
-	uint32_t marker_bit_mask = UINT32_MAX;
+	uint32_t marker_mask = flow_hw_esw_mgr_regc_marker_mask(dev);
+	uint32_t marker_bits = flow_hw_esw_mgr_regc_marker(dev);
 	struct rte_flow_actions_template_attr attr = {
 		.transfer = 1,
 	};
@@ -5344,7 +5356,7 @@ flow_hw_create_ctrl_regc_jump_actions_template(struct rte_eth_dev *dev)
 		.src = {
 			.field = RTE_FLOW_FIELD_VALUE,
 		},
-		.width = 1,
+		.width = __builtin_popcount(marker_mask),
 	};
 	struct rte_flow_action_modify_field set_reg_m = {
 		.operation = RTE_FLOW_MODIFY_SET,
@@ -5391,13 +5403,9 @@ flow_hw_create_ctrl_regc_jump_actions_template(struct rte_eth_dev *dev)
 		}
 	};
 
-	if (!marker_bit) {
-		DRV_LOG(ERR, "Unable to set up actions template for SQ miss table");
-		return NULL;
-	}
-	set_reg_v.dst.offset = rte_bsf32(marker_bit);
-	rte_memcpy(set_reg_v.src.value, &marker_bit, sizeof(marker_bit));
-	rte_memcpy(set_reg_m.src.value, &marker_bit_mask, sizeof(marker_bit_mask));
+	set_reg_v.dst.offset = rte_bsf32(marker_mask);
+	rte_memcpy(set_reg_v.src.value, &marker_bits, sizeof(marker_bits));
+	rte_memcpy(set_reg_m.src.value, &marker_mask, sizeof(marker_mask));
 	return flow_hw_actions_template_create(dev, &attr, actions_v, actions_m, NULL);
 }
 
@@ -5584,7 +5592,7 @@ flow_hw_create_ctrl_sq_miss_root_table(struct rte_eth_dev *dev,
 	struct rte_flow_template_table_attr attr = {
 		.flow_attr = {
 			.group = 0,
-			.priority = 0,
+			.priority = MLX5_HW_LOWEST_PRIO_ROOT,
 			.ingress = 0,
 			.egress = 0,
 			.transfer = 1,
@@ -5699,7 +5707,7 @@ flow_hw_create_ctrl_jump_table(struct rte_eth_dev *dev,
 	struct rte_flow_template_table_attr attr = {
 		.flow_attr = {
 			.group = 0,
-			.priority = MLX5_HW_LOWEST_PRIO_ROOT,
+			.priority = 0,
 			.ingress = 0,
 			.egress = 0,
 			.transfer = 1,
@@ -7797,141 +7805,123 @@ flow_hw_flush_all_ctrl_flows(struct rte_eth_dev *dev)
 }
 
 int
-mlx5_flow_hw_esw_create_mgr_sq_miss_flow(struct rte_eth_dev *dev)
+mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev, uint32_t sqn)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
-	struct rte_flow_item_ethdev port_spec = {
+	uint16_t port_id = dev->data->port_id;
+	struct rte_flow_item_ethdev esw_mgr_spec = {
 		.port_id = MLX5_REPRESENTED_PORT_ESW_MGR,
 	};
-	struct rte_flow_item_ethdev port_mask = {
+	struct rte_flow_item_ethdev esw_mgr_mask = {
 		.port_id = MLX5_REPRESENTED_PORT_ESW_MGR,
 	};
-	struct rte_flow_item items[] = {
-		{
-			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
-			.spec = &port_spec,
-			.mask = &port_mask,
-		},
-		{
-			.type = RTE_FLOW_ITEM_TYPE_END,
-		},
-	};
-	struct rte_flow_action_modify_field modify_field = {
-		.operation = RTE_FLOW_MODIFY_SET,
-		.dst = {
-			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
-		},
-		.src = {
-			.field = RTE_FLOW_FIELD_VALUE,
-		},
-		.width = 1,
-	};
-	struct rte_flow_action_jump jump = {
-		.group = 1,
-	};
-	struct rte_flow_action actions[] = {
-		{
-			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
-			.conf = &modify_field,
-		},
-		{
-			.type = RTE_FLOW_ACTION_TYPE_JUMP,
-			.conf = &jump,
-		},
-		{
-			.type = RTE_FLOW_ACTION_TYPE_END,
-		},
-	};
-
-	MLX5_ASSERT(priv->master);
-	if (!priv->dr_ctx ||
-	    !priv->hw_esw_sq_miss_root_tbl)
-		return 0;
-	return flow_hw_create_ctrl_flow(dev, dev,
-					priv->hw_esw_sq_miss_root_tbl,
-					items, 0, actions, 0);
-}
-
-int
-mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
-{
-	uint16_t port_id = dev->data->port_id;
 	struct rte_flow_item_tag reg_c0_spec = {
 		.index = (uint8_t)REG_C_0,
+		.data = flow_hw_esw_mgr_regc_marker(dev),
 	};
 	struct rte_flow_item_tag reg_c0_mask = {
 		.index = 0xff,
+		.data = flow_hw_esw_mgr_regc_marker_mask(dev),
 	};
-	struct mlx5_rte_flow_item_sq queue_spec = {
-		.queue = txq,
-	};
-	struct mlx5_rte_flow_item_sq queue_mask = {
-		.queue = UINT32_MAX,
-	};
-	struct rte_flow_item items[] = {
-		{
-			.type = (enum rte_flow_item_type)
-				MLX5_RTE_FLOW_ITEM_TYPE_TAG,
-			.spec = &reg_c0_spec,
-			.mask = &reg_c0_mask,
-		},
-		{
-			.type = (enum rte_flow_item_type)
-				MLX5_RTE_FLOW_ITEM_TYPE_SQ,
-			.spec = &queue_spec,
-			.mask = &queue_mask,
-		},
-		{
-			.type = RTE_FLOW_ITEM_TYPE_END,
-		},
+	struct mlx5_rte_flow_item_sq sq_spec = {
+		.queue = sqn,
 	};
 	struct rte_flow_action_ethdev port = {
 		.port_id = port_id,
 	};
-	struct rte_flow_action actions[] = {
-		{
-			.type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT,
-			.conf = &port,
-		},
-		{
-			.type = RTE_FLOW_ACTION_TYPE_END,
-		},
-	};
+	struct rte_flow_item items[3] = { { 0 } };
+	struct rte_flow_action actions[3] = { { 0 } };
 	struct rte_eth_dev *proxy_dev;
 	struct mlx5_priv *proxy_priv;
 	uint16_t proxy_port_id = dev->data->port_id;
-	uint32_t marker_bit;
 	int ret;
 
-	RTE_SET_USED(txq);
 	ret = rte_flow_pick_transfer_proxy(port_id, &proxy_port_id, NULL);
 	if (ret) {
-		DRV_LOG(ERR, "Unable to pick proxy port for port %u", port_id);
+		DRV_LOG(ERR, "Unable to pick transfer proxy port for port %u. Transfer proxy "
+			     "port must be present to create default SQ miss flows.",
+			     port_id);
 		return ret;
 	}
 	proxy_dev = &rte_eth_devices[proxy_port_id];
 	proxy_priv = proxy_dev->data->dev_private;
-	if (!proxy_priv->dr_ctx)
+	if (!proxy_priv->dr_ctx) {
+		DRV_LOG(DEBUG, "Transfer proxy port (port %u) of port %u must be configured "
+			       "for HWS to create default SQ miss flows. Default flows will "
+			       "not be created.",
+			       proxy_port_id, port_id);
 		return 0;
+	}
 	if (!proxy_priv->hw_esw_sq_miss_root_tbl ||
 	    !proxy_priv->hw_esw_sq_miss_tbl) {
-		DRV_LOG(ERR, "port %u proxy port %u was configured but default"
-			" flow tables are not created",
-			port_id, proxy_port_id);
+		DRV_LOG(ERR, "Transfer proxy port (port %u) of port %u was configured, but "
+			     "default flow tables were not created.",
+			     proxy_port_id, port_id);
 		rte_errno = ENOMEM;
 		return -rte_errno;
 	}
-	marker_bit = flow_hw_usable_lsb_vport_mask(proxy_priv);
-	if (!marker_bit) {
-		DRV_LOG(ERR, "Unable to set up control flow in SQ miss table");
-		rte_errno = EINVAL;
-		return -rte_errno;
+	/*
+	 * Create a root SQ miss flow rule - match E-Switch Manager and SQ,
+	 * and jump to group 1.
+	 */
+	items[0] = (struct rte_flow_item){
+		.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+		.spec = &esw_mgr_spec,
+		.mask = &esw_mgr_mask,
+	};
+	items[1] = (struct rte_flow_item){
+		.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_SQ,
+		.spec = &sq_spec,
+	};
+	items[2] = (struct rte_flow_item){
+		.type = RTE_FLOW_ITEM_TYPE_END,
+	};
+	actions[0] = (struct rte_flow_action){
+		.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+	};
+	actions[1] = (struct rte_flow_action){
+		.type = RTE_FLOW_ACTION_TYPE_JUMP,
+	};
+	actions[2] = (struct rte_flow_action) {
+		.type = RTE_FLOW_ACTION_TYPE_END,
+	};
+	ret = flow_hw_create_ctrl_flow(dev, proxy_dev, proxy_priv->hw_esw_sq_miss_root_tbl,
+				       items, 0, actions, 0);
+	if (ret) {
+		DRV_LOG(ERR, "Port %u failed to create root SQ miss flow rule for SQ %u, ret %d",
+			port_id, sqn, ret);
+		return ret;
 	}
-	reg_c0_spec.data = marker_bit;
-	reg_c0_mask.data = marker_bit;
-	return flow_hw_create_ctrl_flow(dev, proxy_dev,
-					proxy_priv->hw_esw_sq_miss_tbl,
-					items, 0, actions, 0);
+	/*
+	 * Create a non-root SQ miss flow rule - match REG_C_0 marker and SQ,
+	 * and forward to port.
+	 */
+	items[0] = (struct rte_flow_item){
+		.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+		.spec = &reg_c0_spec,
+		.mask = &reg_c0_mask,
+	};
+	items[1] = (struct rte_flow_item){
+		.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_SQ,
+		.spec = &sq_spec,
+	};
+	items[2] = (struct rte_flow_item){
+		.type = RTE_FLOW_ITEM_TYPE_END,
+	};
+	actions[0] = (struct rte_flow_action){
+		.type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT,
+		.conf = &port,
+	};
+	actions[1] = (struct rte_flow_action){
+		.type = RTE_FLOW_ACTION_TYPE_END,
+	};
+	ret = flow_hw_create_ctrl_flow(dev, proxy_dev, proxy_priv->hw_esw_sq_miss_tbl,
+				       items, 0, actions, 0);
+	if (ret) {
+		DRV_LOG(ERR, "Port %u failed to create HWS SQ miss flow rule for SQ %u, ret %d",
+			port_id, sqn, ret);
+		return ret;
+	}
+	return 0;
 }
 
 int
@@ -7969,17 +7959,24 @@ mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev)
 
 	ret = rte_flow_pick_transfer_proxy(port_id, &proxy_port_id, NULL);
 	if (ret) {
-		DRV_LOG(ERR, "Unable to pick proxy port for port %u", port_id);
+		DRV_LOG(ERR, "Unable to pick transfer proxy port for port %u. Transfer proxy "
+			     "port must be present to create default FDB jump rule.",
+			     port_id);
 		return ret;
 	}
 	proxy_dev = &rte_eth_devices[proxy_port_id];
 	proxy_priv = proxy_dev->data->dev_private;
-	if (!proxy_priv->dr_ctx)
+	if (!proxy_priv->dr_ctx) {
+		DRV_LOG(DEBUG, "Transfer proxy port (port %u) of port %u must be configured "
+			       "for HWS to create default FDB jump rule. Default rule will "
+			       "not be created.",
+			       proxy_port_id, port_id);
 		return 0;
+	}
 	if (!proxy_priv->hw_esw_zero_tbl) {
-		DRV_LOG(ERR, "port %u proxy port %u was configured but default"
-			" flow tables are not created",
-			port_id, proxy_port_id);
+		DRV_LOG(ERR, "Transfer proxy port (port %u) of port %u was configured, but "
+			     "default flow tables were not created.",
+			     proxy_port_id, port_id);
 		rte_errno = EINVAL;
 		return -rte_errno;
 	}
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index c260c81e57..715f2891cf 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -426,7 +426,7 @@ mlx5_hairpin_queue_peer_update(struct rte_eth_dev *dev, uint16_t peer_queue,
 			mlx5_txq_release(dev, peer_queue);
 			return -rte_errno;
 		}
-		peer_info->qp_id = txq_ctrl->obj->sq->id;
+		peer_info->qp_id = mlx5_txq_get_sqn(txq_ctrl);
 		peer_info->vhca_id = priv->sh->cdev->config.hca_attr.vhca_id;
 		/* 1-to-1 mapping, only the first one is used. */
 		peer_info->peer_q = txq_ctrl->hairpin_conf.peers[0].queue;
@@ -818,7 +818,7 @@ mlx5_hairpin_bind_single_port(struct rte_eth_dev *dev, uint16_t rx_port)
 		}
 		/* Pass TxQ's information to peer RxQ and try binding. */
 		cur.peer_q = rx_queue;
-		cur.qp_id = txq_ctrl->obj->sq->id;
+		cur.qp_id = mlx5_txq_get_sqn(txq_ctrl);
 		cur.vhca_id = priv->sh->cdev->config.hca_attr.vhca_id;
 		cur.tx_explicit = txq_ctrl->hairpin_conf.tx_explicit;
 		cur.manual_bind = txq_ctrl->hairpin_conf.manual_bind;
@@ -1300,8 +1300,6 @@ mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
 	int ret;
 
 	if (priv->sh->config.dv_esw_en && priv->master) {
-		if (mlx5_flow_hw_esw_create_mgr_sq_miss_flow(dev))
-			goto error;
 		if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS)
 			if (mlx5_flow_hw_create_tx_default_mreg_copy_flow(dev))
 				goto error;
@@ -1312,10 +1310,7 @@ mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
 
 		if (!txq)
 			continue;
-		if (txq->is_hairpin)
-			queue = txq->obj->sq->id;
-		else
-			queue = txq->obj->sq_obj.sq->id;
+		queue = mlx5_txq_get_sqn(txq);
 		if ((priv->representor || priv->master) &&
 		    priv->sh->config.dv_esw_en) {
 			if (mlx5_flow_hw_esw_create_sq_miss_flow(dev, queue)) {
@@ -1325,9 +1320,15 @@ mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
 		}
 		mlx5_txq_release(dev, i);
 	}
-	if ((priv->master || priv->representor) && priv->sh->config.dv_esw_en) {
-		if (mlx5_flow_hw_esw_create_default_jump_flow(dev))
-			goto error;
+	if (priv->sh->config.fdb_def_rule) {
+		if ((priv->master || priv->representor) && priv->sh->config.dv_esw_en) {
+			if (!mlx5_flow_hw_esw_create_default_jump_flow(dev))
+				priv->fdb_def_rule = 1;
+			else
+				goto error;
+		}
+	} else {
+		DRV_LOG(INFO, "port %u FDB default rule is disabled", dev->data->port_id);
 	}
 	return 0;
 error:
@@ -1393,14 +1394,18 @@ mlx5_traffic_enable(struct rte_eth_dev *dev)
 		    txq_ctrl->hairpin_conf.tx_explicit == 0 &&
 		    txq_ctrl->hairpin_conf.peers[0].port ==
 		    priv->dev_data->port_id) {
-			ret = mlx5_ctrl_flow_source_queue(dev, i);
+			ret = mlx5_ctrl_flow_source_queue(dev,
+					mlx5_txq_get_sqn(txq_ctrl));
 			if (ret) {
 				mlx5_txq_release(dev, i);
 				goto error;
 			}
 		}
 		if (priv->sh->config.dv_esw_en) {
-			if (mlx5_flow_create_devx_sq_miss_flow(dev, i) == 0) {
+			uint32_t q = mlx5_txq_get_sqn(txq_ctrl);
+
+			if (mlx5_flow_create_devx_sq_miss_flow(dev, q) == 0) {
+				mlx5_txq_release(dev, i);
 				DRV_LOG(ERR,
 					"Port %u Tx queue %u SQ create representor devx default miss rule failed.",
 					dev->data->port_id, i);
diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h
index e0fc1872fe..6471ebf59f 100644
--- a/drivers/net/mlx5/mlx5_tx.h
+++ b/drivers/net/mlx5/mlx5_tx.h
@@ -213,6 +213,7 @@ struct mlx5_txq_ctrl *mlx5_txq_get(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_release(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_releasable(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_verify(struct rte_eth_dev *dev);
+int mlx5_txq_get_sqn(struct mlx5_txq_ctrl *txq);
 void txq_alloc_elts(struct mlx5_txq_ctrl *txq_ctrl);
 void txq_free_elts(struct mlx5_txq_ctrl *txq_ctrl);
 uint64_t mlx5_get_tx_port_offloads(struct rte_eth_dev *dev);
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 9150ced72d..5543f2c570 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -27,6 +27,8 @@
 #include "mlx5_tx.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_autoconf.h"
+#include "rte_pmd_mlx5.h"
+#include "mlx5_flow.h"
 
 /**
  * Allocate TX queue elements.
@@ -1274,6 +1276,51 @@ mlx5_txq_verify(struct rte_eth_dev *dev)
 	return ret;
 }
 
+int
+mlx5_txq_get_sqn(struct mlx5_txq_ctrl *txq)
+{
+	return txq->is_hairpin ? txq->obj->sq->id : txq->obj->sq_obj.sq->id;
+}
+
+int
+rte_pmd_mlx5_external_sq_enable(uint16_t port_id, uint32_t sq_num)
+{
+	struct rte_eth_dev *dev;
+	struct mlx5_priv *priv;
+	uint32_t flow;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		DRV_LOG(ERR, "There is no Ethernet device for port %u.",
+			port_id);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+	dev = &rte_eth_devices[port_id];
+	priv = dev->data->dev_private;
+	if ((!priv->representor && !priv->master) ||
+	    !priv->sh->config.dv_esw_en) {
+		DRV_LOG(ERR, "Port %u must be represetnor or master port in E-Switch mode.",
+			port_id);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	if (sq_num == 0) {
+		DRV_LOG(ERR, "Invalid SQ number.");
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+#ifdef HAVE_MLX5_HWS_SUPPORT
+	if (priv->sh->config.dv_flow_en == 2)
+		return mlx5_flow_hw_esw_create_sq_miss_flow(dev, sq_num);
+#endif
+	flow = mlx5_flow_create_devx_sq_miss_flow(dev, sq_num);
+	if (flow > 0)
+		return 0;
+	DRV_LOG(ERR, "Port %u failed to create default miss flow for SQ %u.",
+		port_id, sq_num);
+	return -rte_errno;
+}
+
 /**
  * Set the Tx queue dynamic timestamp (mask and offset)
  *
diff --git a/drivers/net/mlx5/rte_pmd_mlx5.h b/drivers/net/mlx5/rte_pmd_mlx5.h
index fbfdd9737b..d4caea5b20 100644
--- a/drivers/net/mlx5/rte_pmd_mlx5.h
+++ b/drivers/net/mlx5/rte_pmd_mlx5.h
@@ -139,6 +139,23 @@ int rte_pmd_mlx5_external_rx_queue_id_unmap(uint16_t port_id,
 __rte_experimental
 int rte_pmd_mlx5_host_shaper_config(int port_id, uint8_t rate, uint32_t flags);
 
+/**
+ * Enable traffic for external SQ.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] sq_num
+ *   SQ HW number.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ *   Possible values for rte_errno:
+ *   - EINVAL - invalid sq_num or port type.
+ *   - ENODEV - there is no Ethernet device for this port id.
+ */
+__rte_experimental
+int rte_pmd_mlx5_external_sq_enable(uint16_t port_id, uint32_t sq_num);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/drivers/net/mlx5/version.map b/drivers/net/mlx5/version.map
index 9942de5079..848270da13 100644
--- a/drivers/net/mlx5/version.map
+++ b/drivers/net/mlx5/version.map
@@ -14,4 +14,5 @@ EXPERIMENTAL {
 	rte_pmd_mlx5_external_rx_queue_id_unmap;
 	# added in 22.07
 	rte_pmd_mlx5_host_shaper_config;
+	rte_pmd_mlx5_external_sq_enable;
 };
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v4 17/18] net/mlx5: support device control of representor matching
  2022-10-19 16:25 ` [PATCH v4 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (15 preceding siblings ...)
  2022-10-19 16:25   ` [PATCH v4 16/18] net/mlx5: support device control for E-Switch default rule Suanming Mou
@ 2022-10-19 16:25   ` Suanming Mou
  2022-10-19 16:25   ` [PATCH v4 18/18] net/mlx5: create control flow rules with HWS Suanming Mou
  17 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-10-19 16:25 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Dariusz Sosnowski

From: Dariusz Sosnowski <dsosnowski@nvidia.com>

In some E-Switch use cases, applications want to receive all traffic
on a single port. Since the flow API currently does not provide a way
to match traffic forwarded to any port representor, this patch adds
support for controlling representor matching on ingress flow rules.

Representor matching is controlled through new device argument
repr_matching_en.

- If representor matching is enabled (default setting),
  then each ingress pattern template has an implicit REPRESENTED_PORT
  item added. Flow rules based on this pattern template will match
  the vport associated with the port on which the rule is created.
- If representor matching is disabled, then no implicit item is added.
  As a result, ingress flow rules will match traffic coming to any port,
  not only the port on which the flow rule is created.

Representor matching is enabled by default to provide the expected
default behavior.
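
For illustration only (this invocation is not part of the patch), the new
devarg can be combined with the existing HW steering devargs when probing
the device; the PCI address and application name below are placeholders:

  /*
   * Hypothetical sketch: start EAL with HW steering enabled and representor
   * matching disabled. The PCI address is a placeholder; dv_flow_en,
   * dv_xmeta_en and repr_matching_en are the devargs referenced above.
   */
  #include <rte_eal.h>

  int
  main(void)
  {
  	char *eal_argv[] = {
  		"sample-app",
  		"-a", "0000:08:00.0,dv_flow_en=2,dv_xmeta_en=0,repr_matching_en=0",
  	};

  	/* Probe only the allowlisted device with the devargs shown above. */
  	return rte_eal_init(3, eal_argv) < 0 ? 1 : 0;
  }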

This patch enables egress flow rules on representors when E-Switch is
enabled in the following configurations:

- repr_matching_en=1 and dv_xmeta_en=4
- repr_matching_en=1 and dv_xmeta_en=0
- repr_matching_en=0 and dv_xmeta_en=0

When representor matching is enabled, the following logic is
implemented:

1. An egress template table is created in group 0 for each port. These
   tables hold default flow rules defined as follows:

      pattern SQ
      actions MODIFY_FIELD (set available bits in REG_C_0 to
                            vport_meta_tag)
              MODIFY_FIELD (copy REG_A to REG_C_1, only when
                            dv_xmeta_en == 4)
              JUMP (group 1)

2. Egress pattern templates created by an application have an implicit
   MLX5_RTE_FLOW_ITEM_TYPE_TAG item prepended to the pattern, which
   matches the available bits of REG_C_0.

3. Egress flow rules created by an application have an implicit
   MLX5_RTE_FLOW_ITEM_TYPE_TAG item prepended to the pattern, which
   matches the vport_meta_tag placed in the available bits of REG_C_0.

4. Egress template tables created by an application, which are in
   group n, are placed in group n + 1.

5. Items and actions related to META operate on REG_A when
   dv_xmeta_en == 0 or on REG_C_1 when dv_xmeta_en == 4.

When representor matching is disabled and extended metadata is disabled,
no changes to the current logic are required.
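
As a hedged, application-side sketch (not taken from this patch), an egress
pattern template is created exactly as before; with repr_matching_en=1 the
PMD transparently prepends the REG_C_0 tag match and places the table in
group N + 1. The helper name and the plain-Ethernet pattern are illustrative
assumptions only:

  #include <rte_flow.h>

  /* Illustrative helper, not part of this series. */
  static struct rte_flow_pattern_template *
  create_egress_pattern_template(uint16_t port_id, struct rte_flow_error *error)
  {
  	const struct rte_flow_pattern_template_attr attr = {
  		.egress = 1,
  	};
  	const struct rte_flow_item pattern[] = {
  		{ .type = RTE_FLOW_ITEM_TYPE_ETH },
  		{ .type = RTE_FLOW_ITEM_TYPE_END },
  	};

  	/*
  	 * The application matches plain Ethernet here; with representor
  	 * matching enabled, the PMD internally extends this pattern with a
  	 * REG_C_0 tag item carrying the vport tag.
  	 */
  	return rte_flow_pattern_template_create(port_id, &attr, pattern, error);
  }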

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 doc/guides/nics/mlx5.rst         |  11 +
 drivers/net/mlx5/linux/mlx5_os.c |  11 +
 drivers/net/mlx5/mlx5.c          |  13 +
 drivers/net/mlx5/mlx5.h          |   5 +
 drivers/net/mlx5/mlx5_flow.c     |   8 +-
 drivers/net/mlx5/mlx5_flow.h     |   7 +
 drivers/net/mlx5/mlx5_flow_hw.c  | 738 ++++++++++++++++++++++++-------
 drivers/net/mlx5/mlx5_trigger.c  | 167 ++++++-
 8 files changed, 794 insertions(+), 166 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 75620c286b..1158920486 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -1161,6 +1161,17 @@ for an additional list of options shared with other mlx5 drivers.
 
   By default, the PMD will set this value to 1.
 
+- ``repr_matching_en`` parameter [int]
+
+  - 0. If representor matching is disabled, then no implicit item is
+    added. As a result, ingress flow rules will match traffic coming
+    to any port, not only the port on which the flow rule is created.
+
+  - 1. If representor matching is enabled (default setting),
+    then each ingress pattern template has an implicit REPRESENTED_PORT
+    item added. Flow rules based on this pattern template will match
+    the vport associated with the port on which the rule is created.
+
 Supported NICs
 --------------
 
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index c23fe6daf1..8efc7dbb3f 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1555,6 +1555,9 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	if (priv->sh->config.dv_flow_en == 2) {
 #ifdef HAVE_MLX5_HWS_SUPPORT
 		if (priv->sh->config.dv_esw_en) {
+			uint32_t usable_bits;
+			uint32_t required_bits;
+
 			if (priv->sh->dv_regc0_mask == UINT32_MAX) {
 				DRV_LOG(ERR, "E-Switch port metadata is required when using HWS "
 					     "but it is disabled (configure it through devlink)");
@@ -1567,6 +1570,14 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 				err = ENOTSUP;
 				goto error;
 			}
+			usable_bits = __builtin_popcount(priv->sh->dv_regc0_mask);
+			required_bits = __builtin_popcount(priv->vport_meta_mask);
+			if (usable_bits < required_bits) {
+				DRV_LOG(ERR, "Not enough bits available in reg_c[0] to provide "
+					     "representor matching.");
+				err = ENOTSUP;
+				goto error;
+			}
 		}
 		if (priv->vport_meta_mask)
 			flow_hw_set_port_info(eth_dev);
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 4e532f0807..78234b116c 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -181,6 +181,9 @@
 /* HW steering counter's query interval. */
 #define MLX5_HWS_CNT_CYCLE_TIME "svc_cycle_time"
 
+/* Device parameter to control representor matching in ingress/egress flows with HWS. */
+#define MLX5_REPR_MATCHING_EN "repr_matching_en"
+
 /* Shared memory between primary and secondary processes. */
 struct mlx5_shared_data *mlx5_shared_data;
 
@@ -1283,6 +1286,8 @@ mlx5_dev_args_check_handler(const char *key, const char *val, void *opaque)
 		config->cnt_svc.service_core = tmp;
 	} else if (strcmp(MLX5_HWS_CNT_CYCLE_TIME, key) == 0) {
 		config->cnt_svc.cycle_time = tmp;
+	} else if (strcmp(MLX5_REPR_MATCHING_EN, key) == 0) {
+		config->repr_matching = !!tmp;
 	}
 	return 0;
 }
@@ -1321,6 +1326,7 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 		MLX5_FDB_DEFAULT_RULE_EN,
 		MLX5_HWS_CNT_SERVICE_CORE,
 		MLX5_HWS_CNT_CYCLE_TIME,
+		MLX5_REPR_MATCHING_EN,
 		NULL,
 	};
 	int ret = 0;
@@ -1335,6 +1341,7 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 	config->fdb_def_rule = 1;
 	config->cnt_svc.cycle_time = MLX5_CNT_SVC_CYCLE_TIME_DEFAULT;
 	config->cnt_svc.service_core = rte_get_main_lcore();
+	config->repr_matching = 1;
 	if (mkvlist != NULL) {
 		/* Process parameters. */
 		ret = mlx5_kvargs_process(mkvlist, params,
@@ -1368,6 +1375,11 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 			config->dv_xmeta_en);
 		config->dv_xmeta_en = MLX5_XMETA_MODE_LEGACY;
 	}
+	if (config->dv_flow_en != 2 && !config->repr_matching) {
+		DRV_LOG(DEBUG, "Disabling representor matching is valid only "
+			       "when HW Steering is enabled.");
+		config->repr_matching = 1;
+	}
 	if (config->tx_pp && !sh->dev_cap.txpp_en) {
 		DRV_LOG(ERR, "Packet pacing is not supported.");
 		rte_errno = ENODEV;
@@ -1411,6 +1423,7 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 	DRV_LOG(DEBUG, "\"allow_duplicate_pattern\" is %u.",
 		config->allow_duplicate_pattern);
 	DRV_LOG(DEBUG, "\"fdb_def_rule_en\" is %u.", config->fdb_def_rule);
+	DRV_LOG(DEBUG, "\"repr_matching_en\" is %u.", config->repr_matching);
 	return 0;
 }
 
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 9a1718e2f2..87c90d58d7 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -321,6 +321,7 @@ struct mlx5_sh_config {
 	} cnt_svc; /* configure for HW steering's counter's service. */
 	/* Allow/Prevent the duplicate rules pattern. */
 	uint32_t fdb_def_rule:1; /* Create FDB default jump rule */
+	uint32_t repr_matching:1; /* Enable implicit vport matching in HWS FDB. */
 };
 
 /* Structure for VF VLAN workaround. */
@@ -371,6 +372,7 @@ struct mlx5_hw_q_job {
 			void *out_data;
 		} __rte_packed;
 		struct rte_flow_item_ethdev port_spec;
+		struct rte_flow_item_tag tag_spec;
 	} __rte_packed;
 };
 
@@ -1680,6 +1682,9 @@ struct mlx5_priv {
 	struct rte_flow_template_table *hw_esw_sq_miss_tbl;
 	struct rte_flow_template_table *hw_esw_zero_tbl;
 	struct rte_flow_template_table *hw_tx_meta_cpy_tbl;
+	struct rte_flow_pattern_template *hw_tx_repr_tagging_pt;
+	struct rte_flow_actions_template *hw_tx_repr_tagging_at;
+	struct rte_flow_template_table *hw_tx_repr_tagging_tbl;
 	struct mlx5_indexed_pool *flows[MLX5_FLOW_TYPE_MAXI];
 	/* RTE Flow rules. */
 	uint32_t ctrl_flows; /* Control flow rules. */
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 76187d76ea..60af09dbeb 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -1127,7 +1127,11 @@ mlx5_flow_get_reg_id(struct rte_eth_dev *dev,
 		}
 		break;
 	case MLX5_METADATA_TX:
-		return REG_A;
+		if (config->dv_flow_en == 2 && config->dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS) {
+			return REG_C_1;
+		} else {
+			return REG_A;
+		}
 	case MLX5_METADATA_FDB:
 		switch (config->dv_xmeta_en) {
 		case MLX5_XMETA_MODE_LEGACY:
@@ -11323,7 +11327,7 @@ mlx5_flow_pick_transfer_proxy(struct rte_eth_dev *dev,
 			return 0;
 		}
 	}
-	return rte_flow_error_set(error, EINVAL,
+	return rte_flow_error_set(error, ENODEV,
 				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				  NULL, "unable to find a proxy port");
 }
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 88d92b18c7..edf45b814d 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1207,12 +1207,18 @@ struct rte_flow_pattern_template {
 	struct rte_flow_pattern_template_attr attr;
 	struct mlx5dr_match_template *mt; /* mlx5 match template. */
 	uint64_t item_flags; /* Item layer flags. */
+	uint64_t orig_item_nb; /* Number of pattern items provided by the user (with END item). */
 	uint32_t refcnt;  /* Reference counter. */
 	/*
 	 * If true, then rule pattern should be prepended with
 	 * represented_port pattern item.
 	 */
 	bool implicit_port;
+	/*
+	 * If true, then rule pattern should be prepended with
+	 * tag pattern item for representor matching.
+	 */
+	bool implicit_tag;
 };
 
 /* Flow action template struct. */
@@ -2489,6 +2495,7 @@ int mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev,
 					 uint32_t sqn);
 int mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev);
 int mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev);
+int mlx5_flow_hw_tx_repr_matching_flow(struct rte_eth_dev *dev, uint32_t sqn);
 int mlx5_flow_actions_validate(struct rte_eth_dev *dev,
 		const struct rte_flow_actions_template_attr *attr,
 		const struct rte_flow_action actions[],
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 9294866628..49186c4339 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -32,12 +32,15 @@
 /* Maximum number of rules in control flow tables. */
 #define MLX5_HW_CTRL_FLOW_NB_RULES (4096)
 
-/* Lowest flow group usable by an application. */
+/* Lowest flow group usable by an application if group translation is done. */
 #define MLX5_HW_LOWEST_USABLE_GROUP (1)
 
 /* Maximum group index usable by user applications for transfer flows. */
 #define MLX5_HW_MAX_TRANSFER_GROUP (UINT32_MAX - 1)
 
+/* Maximum group index usable by user applications for egress flows. */
+#define MLX5_HW_MAX_EGRESS_GROUP (UINT32_MAX - 1)
+
 /* Lowest priority for HW root table. */
 #define MLX5_HW_LOWEST_PRIO_ROOT 15
 
@@ -61,6 +64,9 @@ flow_hw_set_vlan_vid_construct(struct rte_eth_dev *dev,
 			       const struct mlx5_hw_actions *hw_acts,
 			       const struct rte_flow_action *action);
 
+static __rte_always_inline uint32_t flow_hw_tx_tag_regc_mask(struct rte_eth_dev *dev);
+static __rte_always_inline uint32_t flow_hw_tx_tag_regc_value(struct rte_eth_dev *dev);
+
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops;
 
 /* DR action flags with different table. */
@@ -2346,21 +2352,18 @@ flow_hw_get_rule_items(struct rte_eth_dev *dev,
 		       uint8_t pattern_template_index,
 		       struct mlx5_hw_q_job *job)
 {
-	if (table->its[pattern_template_index]->implicit_port) {
-		const struct rte_flow_item *curr_item;
-		unsigned int nb_items;
-		bool found_end;
-		unsigned int i;
-
-		/* Count number of pattern items. */
-		nb_items = 0;
-		found_end = false;
-		for (curr_item = items; !found_end; ++curr_item) {
-			++nb_items;
-			if (curr_item->type == RTE_FLOW_ITEM_TYPE_END)
-				found_end = true;
+	struct rte_flow_pattern_template *pt = table->its[pattern_template_index];
+
+	/* Only one implicit item can be added to flow rule pattern. */
+	MLX5_ASSERT(!pt->implicit_port || !pt->implicit_tag);
+	/* At least one item was allocated in job descriptor for items. */
+	MLX5_ASSERT(MLX5_HW_MAX_ITEMS >= 1);
+	if (pt->implicit_port) {
+		if (pt->orig_item_nb + 1 > MLX5_HW_MAX_ITEMS) {
+			rte_errno = ENOMEM;
+			return NULL;
 		}
-		/* Prepend represented port item. */
+		/* Set up represented port item in job descriptor. */
 		job->port_spec = (struct rte_flow_item_ethdev){
 			.port_id = dev->data->port_id,
 		};
@@ -2368,21 +2371,26 @@ flow_hw_get_rule_items(struct rte_eth_dev *dev,
 			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
 			.spec = &job->port_spec,
 		};
-		found_end = false;
-		for (i = 1; i < MLX5_HW_MAX_ITEMS && i - 1 < nb_items; ++i) {
-			job->items[i] = items[i - 1];
-			if (items[i - 1].type == RTE_FLOW_ITEM_TYPE_END) {
-				found_end = true;
-				break;
-			}
-		}
-		if (i >= MLX5_HW_MAX_ITEMS && !found_end) {
+		rte_memcpy(&job->items[1], items, sizeof(*items) * pt->orig_item_nb);
+		return job->items;
+	} else if (pt->implicit_tag) {
+		if (pt->orig_item_nb + 1 > MLX5_HW_MAX_ITEMS) {
 			rte_errno = ENOMEM;
 			return NULL;
 		}
+		/* Set up tag item in job descriptor. */
+		job->tag_spec = (struct rte_flow_item_tag){
+			.data = flow_hw_tx_tag_regc_value(dev),
+		};
+		job->items[0] = (struct rte_flow_item){
+			.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+			.spec = &job->tag_spec,
+		};
+		rte_memcpy(&job->items[1], items, sizeof(*items) * pt->orig_item_nb);
 		return job->items;
+	} else {
+		return items;
 	}
-	return items;
 }
 
 /**
@@ -2960,6 +2968,11 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 		     uint8_t nb_action_templates,
 		     struct rte_flow_error *error)
 {
+	struct rte_flow_error sub_error = {
+		.type = RTE_FLOW_ERROR_TYPE_NONE,
+		.cause = NULL,
+		.message = NULL,
+	};
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5dr_matcher_attr matcher_attr = {0};
 	struct rte_flow_template_table *tbl = NULL;
@@ -2970,7 +2983,7 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	struct rte_flow_attr flow_attr = attr->flow_attr;
 	struct mlx5_flow_cb_ctx ctx = {
 		.dev = dev,
-		.error = error,
+		.error = &sub_error,
 		.data = &flow_attr,
 	};
 	struct mlx5_indexed_pool_config cfg = {
@@ -3064,7 +3077,7 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 			continue;
 		err = __flow_hw_actions_translate(dev, &tbl->cfg,
 						  &tbl->ats[i].acts,
-						  action_templates[i], error);
+						  action_templates[i], &sub_error);
 		if (err) {
 			i++;
 			goto at_error;
@@ -3105,12 +3118,11 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 		mlx5_free(tbl);
 	}
 	if (error != NULL) {
-		rte_flow_error_set(error, err,
-				error->type == RTE_FLOW_ERROR_TYPE_NONE ?
-				RTE_FLOW_ERROR_TYPE_UNSPECIFIED : error->type,
-				NULL,
-				error->message == NULL ?
-				"fail to create rte table" : error->message);
+		if (sub_error.type == RTE_FLOW_ERROR_TYPE_NONE)
+			rte_flow_error_set(error, err, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					   "Failed to create template table");
+		else
+			rte_memcpy(error, &sub_error, sizeof(sub_error));
 	}
 	return NULL;
 }
@@ -3171,9 +3183,10 @@ flow_hw_translate_group(struct rte_eth_dev *dev,
 			struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_sh_config *config = &priv->sh->config;
 	const struct rte_flow_attr *flow_attr = &cfg->attr.flow_attr;
 
-	if (priv->sh->config.dv_esw_en &&
+	if (config->dv_esw_en &&
 	    priv->fdb_def_rule &&
 	    cfg->external &&
 	    flow_attr->transfer) {
@@ -3183,6 +3196,22 @@ flow_hw_translate_group(struct rte_eth_dev *dev,
 						  NULL,
 						  "group index not supported");
 		*table_group = group + 1;
+	} else if (config->dv_esw_en &&
+		   !(config->repr_matching && config->dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS) &&
+		   cfg->external &&
+		   flow_attr->egress) {
+		/*
+		 * On E-Switch setups, egress group translation is not done if and only if
+		 * representor matching is disabled and legacy metadata mode is selected.
+		 * In all other cases, egress group 0 is reserved for representor tagging flows
+		 * and metadata copy flows.
+		 */
+		if (group > MLX5_HW_MAX_EGRESS_GROUP)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
+						  NULL,
+						  "group index not supported");
+		*table_group = group + 1;
 	} else {
 		*table_group = group;
 	}
@@ -3223,7 +3252,6 @@ flow_hw_template_table_create(struct rte_eth_dev *dev,
 			      uint8_t nb_action_templates,
 			      struct rte_flow_error *error)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_flow_template_table_cfg cfg = {
 		.attr = *attr,
 		.external = true,
@@ -3232,12 +3260,6 @@ flow_hw_template_table_create(struct rte_eth_dev *dev,
 
 	if (flow_hw_translate_group(dev, &cfg, group, &cfg.attr.flow_attr.group, error))
 		return NULL;
-	if (priv->sh->config.dv_esw_en && cfg.attr.flow_attr.egress) {
-		rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ATTR, NULL,
-				  "egress flows are not supported with HW Steering"
-				  " when E-Switch is enabled");
-		return NULL;
-	}
 	return flow_hw_table_create(dev, &cfg, item_templates, nb_item_templates,
 				    action_templates, nb_action_templates, error);
 }
@@ -4493,26 +4515,28 @@ flow_hw_actions_template_destroy(struct rte_eth_dev *dev __rte_unused,
 	return 0;
 }
 
-static struct rte_flow_item *
-flow_hw_copy_prepend_port_item(const struct rte_flow_item *items,
-			       struct rte_flow_error *error)
+static uint32_t
+flow_hw_count_items(const struct rte_flow_item *items)
 {
 	const struct rte_flow_item *curr_item;
-	struct rte_flow_item *copied_items;
-	bool found_end;
-	unsigned int nb_items;
-	unsigned int i;
-	size_t size;
+	uint32_t nb_items;
 
-	/* Count number of pattern items. */
 	nb_items = 0;
-	found_end = false;
-	for (curr_item = items; !found_end; ++curr_item) {
+	for (curr_item = items; curr_item->type != RTE_FLOW_ITEM_TYPE_END; ++curr_item)
 		++nb_items;
-		if (curr_item->type == RTE_FLOW_ITEM_TYPE_END)
-			found_end = true;
-	}
-	/* Allocate new array of items and prepend REPRESENTED_PORT item. */
+	return ++nb_items;
+}
+
+static struct rte_flow_item *
+flow_hw_prepend_item(const struct rte_flow_item *items,
+		     const uint32_t nb_items,
+		     const struct rte_flow_item *new_item,
+		     struct rte_flow_error *error)
+{
+	struct rte_flow_item *copied_items;
+	size_t size;
+
+	/* Allocate new array of items. */
 	size = sizeof(*copied_items) * (nb_items + 1);
 	copied_items = mlx5_malloc(MLX5_MEM_ZERO, size, 0, rte_socket_id());
 	if (!copied_items) {
@@ -4522,14 +4546,9 @@ flow_hw_copy_prepend_port_item(const struct rte_flow_item *items,
 				   "cannot allocate item template");
 		return NULL;
 	}
-	copied_items[0] = (struct rte_flow_item){
-		.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
-		.spec = NULL,
-		.last = NULL,
-		.mask = &rte_flow_item_ethdev_mask,
-	};
-	for (i = 1; i < nb_items + 1; ++i)
-		copied_items[i] = items[i - 1];
+	/* Put new item at the beginning and copy the rest. */
+	copied_items[0] = *new_item;
+	rte_memcpy(&copied_items[1], items, sizeof(*items) * nb_items);
 	return copied_items;
 }
 
@@ -4550,17 +4569,13 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 	if (priv->sh->config.dv_esw_en) {
 		MLX5_ASSERT(priv->master || priv->representor);
 		if (priv->master) {
-			/*
-			 * It is allowed to specify ingress, egress and transfer attributes
-			 * at the same time, in order to construct flows catching all missed
-			 * FDB traffic and forwarding it to the master port.
-			 */
-			if (!(attr->ingress ^ attr->egress ^ attr->transfer))
+			if ((attr->ingress && attr->egress) ||
+			    (attr->ingress && attr->transfer) ||
+			    (attr->egress && attr->transfer))
 				return rte_flow_error_set(error, EINVAL,
 							  RTE_FLOW_ERROR_TYPE_ATTR, NULL,
-							  "only one or all direction attributes"
-							  " at once can be used on transfer proxy"
-							  " port");
+							  "only one direction attribute at once"
+							  " can be used on transfer proxy port");
 		} else {
 			if (attr->transfer)
 				return rte_flow_error_set(error, EINVAL,
@@ -4613,11 +4628,16 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 			break;
 		}
 		case RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT:
-			if (attr->ingress || attr->egress)
+			if (attr->ingress && priv->sh->config.repr_matching)
+				return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ITEM, NULL,
+						  "represented port item cannot be used"
+						  " when ingress attribute is set");
+			if (attr->egress)
 				return rte_flow_error_set(error, EINVAL,
 						  RTE_FLOW_ERROR_TYPE_ITEM, NULL,
 						  "represented port item cannot be used"
-						  " when transfer attribute is set");
+						  " when egress attribute is set");
 			break;
 		case RTE_FLOW_ITEM_TYPE_META:
 			if (!priv->sh->config.dv_esw_en ||
@@ -4679,6 +4699,17 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static bool
+flow_hw_pattern_has_sq_match(const struct rte_flow_item *items)
+{
+	unsigned int i;
+
+	for (i = 0; items[i].type != RTE_FLOW_ITEM_TYPE_END; ++i)
+		if (items[i].type == (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_SQ)
+			return true;
+	return false;
+}
+
 /**
  * Create flow item template.
  *
@@ -4704,17 +4735,53 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 	struct rte_flow_pattern_template *it;
 	struct rte_flow_item *copied_items = NULL;
 	const struct rte_flow_item *tmpl_items;
+	uint64_t orig_item_nb;
+	struct rte_flow_item port = {
+		.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+		.mask = &rte_flow_item_ethdev_mask,
+	};
+	struct rte_flow_item_tag tag_v = {
+		.data = 0,
+		.index = REG_C_0,
+	};
+	struct rte_flow_item_tag tag_m = {
+		.data = flow_hw_tx_tag_regc_mask(dev),
+		.index = 0xff,
+	};
+	struct rte_flow_item tag = {
+		.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+		.spec = &tag_v,
+		.mask = &tag_m,
+		.last = NULL
+	};
 
 	if (flow_hw_pattern_validate(dev, attr, items, error))
 		return NULL;
-	if (priv->sh->config.dv_esw_en && attr->ingress && !attr->egress && !attr->transfer) {
-		copied_items = flow_hw_copy_prepend_port_item(items, error);
+	orig_item_nb = flow_hw_count_items(items);
+	if (priv->sh->config.dv_esw_en &&
+	    priv->sh->config.repr_matching &&
+	    attr->ingress && !attr->egress && !attr->transfer) {
+		copied_items = flow_hw_prepend_item(items, orig_item_nb, &port, error);
+		if (!copied_items)
+			return NULL;
+		tmpl_items = copied_items;
+	} else if (priv->sh->config.dv_esw_en &&
+		   priv->sh->config.repr_matching &&
+		   !attr->ingress && attr->egress && !attr->transfer) {
+		if (flow_hw_pattern_has_sq_match(items)) {
+			DRV_LOG(DEBUG, "Port %u omitting implicit REG_C_0 match for egress "
+				       "pattern template", dev->data->port_id);
+			tmpl_items = items;
+			goto setup_pattern_template;
+		}
+		copied_items = flow_hw_prepend_item(items, orig_item_nb, &tag, error);
 		if (!copied_items)
 			return NULL;
 		tmpl_items = copied_items;
 	} else {
 		tmpl_items = items;
 	}
+setup_pattern_template:
 	it = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*it), 0, rte_socket_id());
 	if (!it) {
 		if (copied_items)
@@ -4726,6 +4793,7 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 		return NULL;
 	}
 	it->attr = *attr;
+	it->orig_item_nb = orig_item_nb;
 	it->mt = mlx5dr_match_template_create(tmpl_items, attr->relaxed_matching);
 	if (!it->mt) {
 		if (copied_items)
@@ -4738,11 +4806,15 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 		return NULL;
 	}
 	it->item_flags = flow_hw_rss_item_flags_get(tmpl_items);
-	it->implicit_port = !!copied_items;
+	if (copied_items) {
+		if (attr->ingress)
+			it->implicit_port = true;
+		else if (attr->egress)
+			it->implicit_tag = true;
+		mlx5_free(copied_items);
+	}
 	__atomic_fetch_add(&it->refcnt, 1, __ATOMIC_RELAXED);
 	LIST_INSERT_HEAD(&priv->flow_hw_itt, it, next);
-	if (copied_items)
-		mlx5_free(copied_items);
 	return it;
 }
 
@@ -5139,6 +5211,254 @@ flow_hw_free_vport_actions(struct mlx5_priv *priv)
 	priv->hw_vport = NULL;
 }
 
+/**
+ * Create an egress pattern template matching on source SQ.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to pattern template on success. NULL otherwise, and rte_errno is set.
+ */
+static struct rte_flow_pattern_template *
+flow_hw_create_tx_repr_sq_pattern_tmpl(struct rte_eth_dev *dev)
+{
+	struct rte_flow_pattern_template_attr attr = {
+		.relaxed_matching = 0,
+		.egress = 1,
+	};
+	struct mlx5_rte_flow_item_sq sq_mask = {
+		.queue = UINT32_MAX,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_SQ,
+			.mask = &sq_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+
+	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
+}
+
+static __rte_always_inline uint32_t
+flow_hw_tx_tag_regc_mask(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint32_t mask = priv->sh->dv_regc0_mask;
+
+	/* Mask is verified during device initialization. Sanity checking here. */
+	MLX5_ASSERT(mask != 0);
+	/*
+	 * Availability of sufficient number of bits in REG_C_0 is verified on initialization.
+	 * Sanity checking here.
+	 */
+	MLX5_ASSERT(__builtin_popcount(mask) >= __builtin_popcount(priv->vport_meta_mask));
+	return mask;
+}
+
+static __rte_always_inline uint32_t
+flow_hw_tx_tag_regc_value(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint32_t tag;
+
+	/* Mask is verified during device initialization. Sanity checking here. */
+	MLX5_ASSERT(priv->vport_meta_mask != 0);
+	tag = priv->vport_meta_tag >> (rte_bsf32(priv->vport_meta_mask));
+	/*
+	 * Availability of sufficient number of bits in REG_C_0 is verified on initialization.
+	 * Sanity checking here.
+	 */
+	MLX5_ASSERT((tag & priv->sh->dv_regc0_mask) == tag);
+	return tag;
+}
+
+static void
+flow_hw_update_action_mask(struct rte_flow_action *action,
+			   struct rte_flow_action *mask,
+			   enum rte_flow_action_type type,
+			   void *conf_v,
+			   void *conf_m)
+{
+	action->type = type;
+	action->conf = conf_v;
+	mask->type = type;
+	mask->conf = conf_m;
+}
+
+/**
+ * Create an egress actions template with MODIFY_FIELD action for setting unused REG_C_0 bits
+ * to vport tag and JUMP action to group 1.
+ *
+ * If extended metadata mode is enabled, then MODIFY_FIELD action for copying software metadata
+ * to REG_C_1 is added as well.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to actions template on success. NULL otherwise, and rte_errno is set.
+ */
+static struct rte_flow_actions_template *
+flow_hw_create_tx_repr_tag_jump_acts_tmpl(struct rte_eth_dev *dev)
+{
+	uint32_t tag_mask = flow_hw_tx_tag_regc_mask(dev);
+	uint32_t tag_value = flow_hw_tx_tag_regc_value(dev);
+	struct rte_flow_actions_template_attr attr = {
+		.egress = 1,
+	};
+	struct rte_flow_action_modify_field set_tag_v = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_C_0,
+			.offset = rte_bsf32(tag_mask),
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+		},
+		.width = __builtin_popcount(tag_mask),
+	};
+	struct rte_flow_action_modify_field set_tag_m = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+		},
+		.width = UINT32_MAX,
+	};
+	struct rte_flow_action_modify_field copy_metadata_v = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_C_1,
+		},
+		.src = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_A,
+		},
+		.width = 32,
+	};
+	struct rte_flow_action_modify_field copy_metadata_m = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.src = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.width = UINT32_MAX,
+	};
+	struct rte_flow_action_jump jump_v = {
+		.group = MLX5_HW_LOWEST_USABLE_GROUP,
+	};
+	struct rte_flow_action_jump jump_m = {
+		.group = UINT32_MAX,
+	};
+	struct rte_flow_action actions_v[4] = { { 0 } };
+	struct rte_flow_action actions_m[4] = { { 0 } };
+	unsigned int idx = 0;
+
+	rte_memcpy(set_tag_v.src.value, &tag_value, sizeof(tag_value));
+	rte_memcpy(set_tag_m.src.value, &tag_mask, sizeof(tag_mask));
+	flow_hw_update_action_mask(&actions_v[idx], &actions_m[idx],
+				   RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+				   &set_tag_v, &set_tag_m);
+	idx++;
+	if (MLX5_SH(dev)->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS) {
+		flow_hw_update_action_mask(&actions_v[idx], &actions_m[idx],
+					   RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+					   &copy_metadata_v, &copy_metadata_m);
+		idx++;
+	}
+	flow_hw_update_action_mask(&actions_v[idx], &actions_m[idx], RTE_FLOW_ACTION_TYPE_JUMP,
+				   &jump_v, &jump_m);
+	idx++;
+	flow_hw_update_action_mask(&actions_v[idx], &actions_m[idx], RTE_FLOW_ACTION_TYPE_END,
+				   NULL, NULL);
+	idx++;
+	MLX5_ASSERT(idx <= RTE_DIM(actions_v));
+	return flow_hw_actions_template_create(dev, &attr, actions_v, actions_m, NULL);
+}
+
+static void
+flow_hw_cleanup_tx_repr_tagging(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (priv->hw_tx_repr_tagging_tbl) {
+		flow_hw_table_destroy(dev, priv->hw_tx_repr_tagging_tbl, NULL);
+		priv->hw_tx_repr_tagging_tbl = NULL;
+	}
+	if (priv->hw_tx_repr_tagging_at) {
+		flow_hw_actions_template_destroy(dev, priv->hw_tx_repr_tagging_at, NULL);
+		priv->hw_tx_repr_tagging_at = NULL;
+	}
+	if (priv->hw_tx_repr_tagging_pt) {
+		flow_hw_pattern_template_destroy(dev, priv->hw_tx_repr_tagging_pt, NULL);
+		priv->hw_tx_repr_tagging_pt = NULL;
+	}
+}
+
+/**
+ * Setup templates and table used to create default Tx flow rules. These default rules
+ * allow for matching Tx representor traffic using a vport tag placed in unused bits of
+ * REG_C_0 register.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise.
+ */
+static int
+flow_hw_setup_tx_repr_tagging(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_flow_template_table_attr attr = {
+		.flow_attr = {
+			.group = 0,
+			.priority = MLX5_HW_LOWEST_PRIO_ROOT,
+			.egress = 1,
+		},
+		.nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
+	};
+	struct mlx5_flow_template_table_cfg cfg = {
+		.attr = attr,
+		.external = false,
+	};
+
+	MLX5_ASSERT(priv->sh->config.dv_esw_en);
+	MLX5_ASSERT(priv->sh->config.repr_matching);
+	priv->hw_tx_repr_tagging_pt = flow_hw_create_tx_repr_sq_pattern_tmpl(dev);
+	if (!priv->hw_tx_repr_tagging_pt)
+		goto error;
+	priv->hw_tx_repr_tagging_at = flow_hw_create_tx_repr_tag_jump_acts_tmpl(dev);
+	if (!priv->hw_tx_repr_tagging_at)
+		goto error;
+	priv->hw_tx_repr_tagging_tbl = flow_hw_table_create(dev, &cfg,
+							    &priv->hw_tx_repr_tagging_pt, 1,
+							    &priv->hw_tx_repr_tagging_at, 1,
+							    NULL);
+	if (!priv->hw_tx_repr_tagging_tbl)
+		goto error;
+	return 0;
+error:
+	flow_hw_cleanup_tx_repr_tagging(dev);
+	return -rte_errno;
+}
+
 static uint32_t
 flow_hw_esw_mgr_regc_marker_mask(struct rte_eth_dev *dev)
 {
@@ -5545,29 +5865,43 @@ flow_hw_create_tx_default_mreg_copy_actions_template(struct rte_eth_dev *dev)
 		},
 		.width = UINT32_MAX,
 	};
-	const struct rte_flow_action copy_reg_action[] = {
+	const struct rte_flow_action_jump jump_action = {
+		.group = 1,
+	};
+	const struct rte_flow_action_jump jump_mask = {
+		.group = UINT32_MAX,
+	};
+	const struct rte_flow_action actions[] = {
 		[0] = {
 			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
 			.conf = &mreg_action,
 		},
 		[1] = {
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump_action,
+		},
+		[2] = {
 			.type = RTE_FLOW_ACTION_TYPE_END,
 		},
 	};
-	const struct rte_flow_action copy_reg_mask[] = {
+	const struct rte_flow_action masks[] = {
 		[0] = {
 			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
 			.conf = &mreg_mask,
 		},
 		[1] = {
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump_mask,
+		},
+		[2] = {
 			.type = RTE_FLOW_ACTION_TYPE_END,
 		},
 	};
 	struct rte_flow_error drop_err;
 
 	RTE_SET_USED(drop_err);
-	return flow_hw_actions_template_create(dev, &tx_act_attr, copy_reg_action,
-					       copy_reg_mask, &drop_err);
+	return flow_hw_actions_template_create(dev, &tx_act_attr, actions,
+					       masks, &drop_err);
 }
 
 /**
@@ -5745,63 +6079,21 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 	struct rte_flow_actions_template *jump_one_actions_tmpl = NULL;
 	struct rte_flow_actions_template *tx_meta_actions_tmpl = NULL;
 	uint32_t xmeta = priv->sh->config.dv_xmeta_en;
+	uint32_t repr_matching = priv->sh->config.repr_matching;
 
-	/* Item templates */
+	/* Create templates and table for default SQ miss flow rules - root table. */
 	esw_mgr_items_tmpl = flow_hw_create_ctrl_esw_mgr_pattern_template(dev);
 	if (!esw_mgr_items_tmpl) {
 		DRV_LOG(ERR, "port %u failed to create E-Switch Manager item"
 			" template for control flows", dev->data->port_id);
 		goto error;
 	}
-	regc_sq_items_tmpl = flow_hw_create_ctrl_regc_sq_pattern_template(dev);
-	if (!regc_sq_items_tmpl) {
-		DRV_LOG(ERR, "port %u failed to create SQ item template for"
-			" control flows", dev->data->port_id);
-		goto error;
-	}
-	port_items_tmpl = flow_hw_create_ctrl_port_pattern_template(dev);
-	if (!port_items_tmpl) {
-		DRV_LOG(ERR, "port %u failed to create SQ item template for"
-			" control flows", dev->data->port_id);
-		goto error;
-	}
-	if (xmeta == MLX5_XMETA_MODE_META32_HWS) {
-		tx_meta_items_tmpl = flow_hw_create_tx_default_mreg_copy_pattern_template(dev);
-		if (!tx_meta_items_tmpl) {
-			DRV_LOG(ERR, "port %u failed to Tx metadata copy pattern"
-				" template for control flows", dev->data->port_id);
-			goto error;
-		}
-	}
-	/* Action templates */
 	regc_jump_actions_tmpl = flow_hw_create_ctrl_regc_jump_actions_template(dev);
 	if (!regc_jump_actions_tmpl) {
 		DRV_LOG(ERR, "port %u failed to create REG_C set and jump action template"
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
-	port_actions_tmpl = flow_hw_create_ctrl_port_actions_template(dev);
-	if (!port_actions_tmpl) {
-		DRV_LOG(ERR, "port %u failed to create port action template"
-			" for control flows", dev->data->port_id);
-		goto error;
-	}
-	jump_one_actions_tmpl = flow_hw_create_ctrl_jump_actions_template
-			(dev, MLX5_HW_LOWEST_USABLE_GROUP);
-	if (!jump_one_actions_tmpl) {
-		DRV_LOG(ERR, "port %u failed to create jump action template"
-			" for control flows", dev->data->port_id);
-		goto error;
-	}
-	if (xmeta == MLX5_XMETA_MODE_META32_HWS) {
-		tx_meta_actions_tmpl = flow_hw_create_tx_default_mreg_copy_actions_template(dev);
-		if (!tx_meta_actions_tmpl) {
-			DRV_LOG(ERR, "port %u failed to Tx metadata copy actions"
-				" template for control flows", dev->data->port_id);
-			goto error;
-		}
-	}
-	/* Tables */
 	MLX5_ASSERT(priv->hw_esw_sq_miss_root_tbl == NULL);
 	priv->hw_esw_sq_miss_root_tbl = flow_hw_create_ctrl_sq_miss_root_table
 			(dev, esw_mgr_items_tmpl, regc_jump_actions_tmpl);
@@ -5810,6 +6102,19 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
+	/* Create templates and table for default SQ miss flow rules - non-root table. */
+	regc_sq_items_tmpl = flow_hw_create_ctrl_regc_sq_pattern_template(dev);
+	if (!regc_sq_items_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create SQ item template for"
+			" control flows", dev->data->port_id);
+		goto error;
+	}
+	port_actions_tmpl = flow_hw_create_ctrl_port_actions_template(dev);
+	if (!port_actions_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create port action template"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
 	MLX5_ASSERT(priv->hw_esw_sq_miss_tbl == NULL);
 	priv->hw_esw_sq_miss_tbl = flow_hw_create_ctrl_sq_miss_table(dev, regc_sq_items_tmpl,
 								     port_actions_tmpl);
@@ -5818,6 +6123,20 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
+	/* Create templates and table for default FDB jump flow rules. */
+	port_items_tmpl = flow_hw_create_ctrl_port_pattern_template(dev);
+	if (!port_items_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create SQ item template for"
+			" control flows", dev->data->port_id);
+		goto error;
+	}
+	jump_one_actions_tmpl = flow_hw_create_ctrl_jump_actions_template
+			(dev, MLX5_HW_LOWEST_USABLE_GROUP);
+	if (!jump_one_actions_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create jump action template"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
 	MLX5_ASSERT(priv->hw_esw_zero_tbl == NULL);
 	priv->hw_esw_zero_tbl = flow_hw_create_ctrl_jump_table(dev, port_items_tmpl,
 							       jump_one_actions_tmpl);
@@ -5826,7 +6145,20 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
-	if (xmeta == MLX5_XMETA_MODE_META32_HWS) {
+	/* Create templates and table for default Tx metadata copy flow rule. */
+	if (!repr_matching && xmeta == MLX5_XMETA_MODE_META32_HWS) {
+		tx_meta_items_tmpl = flow_hw_create_tx_default_mreg_copy_pattern_template(dev);
+		if (!tx_meta_items_tmpl) {
+			DRV_LOG(ERR, "port %u failed to Tx metadata copy pattern"
+				" template for control flows", dev->data->port_id);
+			goto error;
+		}
+		tx_meta_actions_tmpl = flow_hw_create_tx_default_mreg_copy_actions_template(dev);
+		if (!tx_meta_actions_tmpl) {
+			DRV_LOG(ERR, "port %u failed to Tx metadata copy actions"
+				" template for control flows", dev->data->port_id);
+			goto error;
+		}
 		MLX5_ASSERT(priv->hw_tx_meta_cpy_tbl == NULL);
 		priv->hw_tx_meta_cpy_tbl = flow_hw_create_tx_default_mreg_copy_table(dev,
 					tx_meta_items_tmpl, tx_meta_actions_tmpl);
@@ -5850,7 +6182,7 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 		flow_hw_table_destroy(dev, priv->hw_esw_sq_miss_root_tbl, NULL);
 		priv->hw_esw_sq_miss_root_tbl = NULL;
 	}
-	if (xmeta == MLX5_XMETA_MODE_META32_HWS && tx_meta_actions_tmpl)
+	if (tx_meta_actions_tmpl)
 		flow_hw_actions_template_destroy(dev, tx_meta_actions_tmpl, NULL);
 	if (jump_one_actions_tmpl)
 		flow_hw_actions_template_destroy(dev, jump_one_actions_tmpl, NULL);
@@ -5858,7 +6190,7 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 		flow_hw_actions_template_destroy(dev, port_actions_tmpl, NULL);
 	if (regc_jump_actions_tmpl)
 		flow_hw_actions_template_destroy(dev, regc_jump_actions_tmpl, NULL);
-	if (xmeta == MLX5_XMETA_MODE_META32_HWS && tx_meta_items_tmpl)
+	if (tx_meta_items_tmpl)
 		flow_hw_pattern_template_destroy(dev, tx_meta_items_tmpl, NULL);
 	if (port_items_tmpl)
 		flow_hw_pattern_template_destroy(dev, port_items_tmpl, NULL);
@@ -6199,6 +6531,13 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		if (!priv->hw_tag[i])
 			goto err;
 	}
+	if (priv->sh->config.dv_esw_en && priv->sh->config.repr_matching) {
+		ret = flow_hw_setup_tx_repr_tagging(dev);
+		if (ret) {
+			rte_errno = -ret;
+			goto err;
+		}
+	}
 	if (is_proxy) {
 		ret = flow_hw_create_vport_actions(priv);
 		if (ret) {
@@ -6325,6 +6664,7 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 		return;
 	flow_hw_rxq_flag_set(dev, false);
 	flow_hw_flush_all_ctrl_flows(dev);
+	flow_hw_cleanup_tx_repr_tagging(dev);
 	while (!LIST_EMPTY(&priv->flow_hw_tbl_ongo)) {
 		tbl = LIST_FIRST(&priv->flow_hw_tbl_ongo);
 		flow_hw_table_destroy(dev, tbl, NULL);
@@ -7720,45 +8060,30 @@ flow_hw_destroy_ctrl_flow(struct rte_eth_dev *dev, struct rte_flow *flow)
 }
 
 /**
- * Destroys control flows created on behalf of @p owner_dev device.
+ * Destroys control flows created on behalf of @p owner device on @p dev device.
  *
- * @param owner_dev
+ * @param dev
+ *   Pointer to Ethernet device on which control flows were created.
+ * @param owner
  *   Pointer to Ethernet device owning control flows.
  *
  * @return
  *   0 on success, otherwise negative error code is returned and
  *   rte_errno is set.
  */
-int
-mlx5_flow_hw_flush_ctrl_flows(struct rte_eth_dev *owner_dev)
+static int
+flow_hw_flush_ctrl_flows_owned_by(struct rte_eth_dev *dev, struct rte_eth_dev *owner)
 {
-	struct mlx5_priv *owner_priv = owner_dev->data->dev_private;
-	struct rte_eth_dev *proxy_dev;
-	struct mlx5_priv *proxy_priv;
+	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_hw_ctrl_flow *cf;
 	struct mlx5_hw_ctrl_flow *cf_next;
-	uint16_t owner_port_id = owner_dev->data->port_id;
-	uint16_t proxy_port_id = owner_dev->data->port_id;
 	int ret;
 
-	if (owner_priv->sh->config.dv_esw_en) {
-		if (rte_flow_pick_transfer_proxy(owner_port_id, &proxy_port_id, NULL)) {
-			DRV_LOG(ERR, "Unable to find proxy port for port %u",
-				owner_port_id);
-			rte_errno = EINVAL;
-			return -rte_errno;
-		}
-		proxy_dev = &rte_eth_devices[proxy_port_id];
-		proxy_priv = proxy_dev->data->dev_private;
-	} else {
-		proxy_dev = owner_dev;
-		proxy_priv = owner_priv;
-	}
-	cf = LIST_FIRST(&proxy_priv->hw_ctrl_flows);
+	cf = LIST_FIRST(&priv->hw_ctrl_flows);
 	while (cf != NULL) {
 		cf_next = LIST_NEXT(cf, next);
-		if (cf->owner_dev == owner_dev) {
-			ret = flow_hw_destroy_ctrl_flow(proxy_dev, cf->flow);
+		if (cf->owner_dev == owner) {
+			ret = flow_hw_destroy_ctrl_flow(dev, cf->flow);
 			if (ret) {
 				rte_errno = ret;
 				return -ret;
@@ -7771,6 +8096,50 @@ mlx5_flow_hw_flush_ctrl_flows(struct rte_eth_dev *owner_dev)
 	return 0;
 }
 
+/**
+ * Destroys control flows created for @p owner_dev device.
+ *
+ * @param owner_dev
+ *   Pointer to Ethernet device owning control flows.
+ *
+ * @return
+ *   0 on success, otherwise negative error code is returned and
+ *   rte_errno is set.
+ */
+int
+mlx5_flow_hw_flush_ctrl_flows(struct rte_eth_dev *owner_dev)
+{
+	struct mlx5_priv *owner_priv = owner_dev->data->dev_private;
+	struct rte_eth_dev *proxy_dev;
+	uint16_t owner_port_id = owner_dev->data->port_id;
+	uint16_t proxy_port_id = owner_dev->data->port_id;
+	int ret;
+
+	/* Flush all flows created by this port for itself. */
+	ret = flow_hw_flush_ctrl_flows_owned_by(owner_dev, owner_dev);
+	if (ret)
+		return ret;
+	/* Flush all flows created for this port on proxy port. */
+	if (owner_priv->sh->config.dv_esw_en) {
+		ret = rte_flow_pick_transfer_proxy(owner_port_id, &proxy_port_id, NULL);
+		if (ret == -ENODEV) {
+			DRV_LOG(DEBUG, "Unable to find transfer proxy port for port %u. It was "
+				       "probably closed. Control flows were cleared.",
+				       owner_port_id);
+			rte_errno = 0;
+			return 0;
+		} else if (ret) {
+			DRV_LOG(ERR, "Unable to find proxy port for port %u (ret = %d)",
+				owner_port_id, ret);
+			return ret;
+		}
+		proxy_dev = &rte_eth_devices[proxy_port_id];
+	} else {
+		proxy_dev = owner_dev;
+	}
+	return flow_hw_flush_ctrl_flows_owned_by(proxy_dev, owner_dev);
+}
+
 /**
  * Destroys all control flows created on @p dev device.
  *
@@ -8022,6 +8391,9 @@ mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev)
 			.conf = &mreg_action,
 		},
 		[1] = {
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+		},
+		[2] = {
 			.type = RTE_FLOW_ACTION_TYPE_END,
 		},
 	};
@@ -8034,6 +8406,60 @@ mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev)
 					eth_all, 0, copy_reg_action, 0);
 }
 
+int
+mlx5_flow_hw_tx_repr_matching_flow(struct rte_eth_dev *dev, uint32_t sqn)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rte_flow_item_sq sq_spec = {
+		.queue = sqn,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_SQ,
+			.spec = &sq_spec,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	/*
+	 * Allocate actions array suitable for all cases - extended metadata enabled or not.
+	 * With extended metadata there will be an additional MODIFY_FIELD action before JUMP.
+	 */
+	struct rte_flow_action actions[] = {
+		{ .type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD },
+		{ .type = RTE_FLOW_ACTION_TYPE_JUMP },
+		{ .type = RTE_FLOW_ACTION_TYPE_END },
+		{ .type = RTE_FLOW_ACTION_TYPE_END },
+	};
+
+	/* It is assumed that caller checked for representor matching. */
+	MLX5_ASSERT(priv->sh->config.repr_matching);
+	if (!priv->dr_ctx) {
+		DRV_LOG(DEBUG, "Port %u must be configured for HWS, before creating "
+			       "default egress flow rules. Omitting creation.",
+			       dev->data->port_id);
+		return 0;
+	}
+	if (!priv->hw_tx_repr_tagging_tbl) {
+		DRV_LOG(ERR, "Port %u is configured for HWS, but table for default "
+			     "egress flow rules does not exist.",
+			     dev->data->port_id);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	/*
+	 * If extended metadata mode is enabled, then an additional MODIFY_FIELD action must be
+	 * placed before terminating JUMP action.
+	 */
+	if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS) {
+		actions[1].type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD;
+		actions[2].type = RTE_FLOW_ACTION_TYPE_JUMP;
+	}
+	return flow_hw_create_ctrl_flow(dev, dev, priv->hw_tx_repr_tagging_tbl,
+					items, 0, actions, 0);
+}
+
 void
 mlx5_flow_meter_uninit(struct rte_eth_dev *dev)
 {
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 715f2891cf..8c9d5c1b13 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -1065,6 +1065,69 @@ mlx5_hairpin_get_peer_ports(struct rte_eth_dev *dev, uint16_t *peer_ports,
 	return ret;
 }
 
+#ifdef HAVE_MLX5_HWS_SUPPORT
+
+/**
+ * Check if starting representor port is allowed.
+ *
+ * If transfer proxy port is configured for HWS, then starting representor port
+ * is allowed if and only if transfer proxy port is started as well.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   If starting representor port is allowed, then 0 is returned.
+ *   Otherwise rte_errno is set, and negative errno value is returned.
+ */
+static int
+mlx5_hw_representor_port_allowed_start(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_eth_dev *proxy_dev;
+	struct mlx5_priv *proxy_priv;
+	uint16_t proxy_port_id = UINT16_MAX;
+	int ret;
+
+	MLX5_ASSERT(priv->sh->config.dv_flow_en == 2);
+	MLX5_ASSERT(priv->sh->config.dv_esw_en);
+	MLX5_ASSERT(priv->representor);
+	ret = rte_flow_pick_transfer_proxy(dev->data->port_id, &proxy_port_id, NULL);
+	if (ret) {
+		if (ret == -ENODEV)
+			DRV_LOG(ERR, "Starting representor port %u is not allowed. Transfer "
+				     "proxy port is not available.", dev->data->port_id);
+		else
+			DRV_LOG(ERR, "Failed to pick transfer proxy for port %u (ret = %d)",
+				dev->data->port_id, ret);
+		return ret;
+	}
+	proxy_dev = &rte_eth_devices[proxy_port_id];
+	proxy_priv = proxy_dev->data->dev_private;
+	if (proxy_priv->dr_ctx == NULL) {
+		DRV_LOG(DEBUG, "Starting representor port %u is allowed, but default traffic flows"
+			       " will not be created. Transfer proxy port must be configured"
+			       " for HWS and started.",
+			       dev->data->port_id);
+		return 0;
+	}
+	if (!proxy_dev->data->dev_started) {
+		DRV_LOG(ERR, "Failed to start port %u: transfer proxy (port %u) must be started",
+			     dev->data->port_id, proxy_port_id);
+		rte_errno = EAGAIN;
+		return -rte_errno;
+	}
+	if (priv->sh->config.repr_matching && !priv->dr_ctx) {
+		DRV_LOG(ERR, "Failed to start port %u: with representor matching enabled, port "
+			     "must be configured for HWS", dev->data->port_id);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	return 0;
+}
+
+#endif
+
 /**
  * DPDK callback to start the device.
  *
@@ -1084,6 +1147,19 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 	int fine_inline;
 
 	DRV_LOG(DEBUG, "port %u starting device", dev->data->port_id);
+#ifdef HAVE_MLX5_HWS_SUPPORT
+	if (priv->sh->config.dv_flow_en == 2) {
+		/* If there is no E-Switch, then there are no start/stop order limitations. */
+		if (!priv->sh->config.dv_esw_en)
+			goto continue_dev_start;
+		/* If master is being started, then it is always allowed. */
+		if (priv->master)
+			goto continue_dev_start;
+		if (mlx5_hw_representor_port_allowed_start(dev))
+			return -rte_errno;
+	}
+continue_dev_start:
+#endif
 	fine_inline = rte_mbuf_dynflag_lookup
 		(RTE_PMD_MLX5_FINE_GRANULARITY_INLINE, NULL);
 	if (fine_inline >= 0)
@@ -1248,6 +1324,53 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 	return -rte_errno;
 }
 
+#ifdef HAVE_MLX5_HWS_SUPPORT
+/**
+ * Check if stopping transfer proxy port is allowed.
+ *
+ * If transfer proxy port is configured for HWS, then it is allowed to stop it
+ * if and only if all other representor ports are stopped.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   If stopping transfer proxy port is allowed, then 0 is returned.
+ *   Otherwise rte_errno is set, and negative errno value is returned.
+ */
+static int
+mlx5_hw_proxy_port_allowed_stop(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	bool representor_started = false;
+	uint16_t port_id;
+
+	MLX5_ASSERT(priv->sh->config.dv_flow_en == 2);
+	MLX5_ASSERT(priv->sh->config.dv_esw_en);
+	MLX5_ASSERT(priv->master);
+	/* If transfer proxy port was not configured for HWS, then stopping it is allowed. */
+	if (!priv->dr_ctx)
+		return 0;
+	MLX5_ETH_FOREACH_DEV(port_id, dev->device) {
+		const struct rte_eth_dev *port_dev = &rte_eth_devices[port_id];
+		const struct mlx5_priv *port_priv = port_dev->data->dev_private;
+
+		if (port_id != dev->data->port_id &&
+		    port_priv->domain_id == priv->domain_id &&
+		    port_dev->data->dev_started)
+			representor_started = true;
+	}
+	if (representor_started) {
+		DRV_LOG(INFO, "Failed to stop port %u: attached representor ports"
+			      " must be stopped before stopping transfer proxy port",
+			      dev->data->port_id);
+		rte_errno = EBUSY;
+		return -rte_errno;
+	}
+	return 0;
+}
+#endif
+
 /**
  * DPDK callback to stop the device.
  *
@@ -1261,6 +1384,21 @@ mlx5_dev_stop(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 
+#ifdef HAVE_MLX5_HWS_SUPPORT
+	if (priv->sh->config.dv_flow_en == 2) {
+		/* If there is no E-Switch, then there are no start/stop order limitations. */
+		if (!priv->sh->config.dv_esw_en)
+			goto continue_dev_stop;
+		/* If representor is being stopped, then it is always allowed. */
+		if (priv->representor)
+			goto continue_dev_stop;
+		if (mlx5_hw_proxy_port_allowed_stop(dev)) {
+			dev->data->dev_started = 1;
+			return -rte_errno;
+		}
+	}
+continue_dev_stop:
+#endif
 	dev->data->dev_started = 0;
 	/* Prevent crashes when queues are still in use. */
 	dev->rx_pkt_burst = rte_eth_pkt_burst_dummy;
@@ -1296,13 +1434,21 @@ static int
 mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_sh_config *config = &priv->sh->config;
 	unsigned int i;
 	int ret;
 
-	if (priv->sh->config.dv_esw_en && priv->master) {
-		if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS)
-			if (mlx5_flow_hw_create_tx_default_mreg_copy_flow(dev))
-				goto error;
+	/*
+	 * With extended metadata enabled, the Tx metadata copy is handled by default
+	 * Tx tagging flow rules, so default Tx flow rule is not needed. It is only
+	 * required when representor matching is disabled.
+	 */
+	if (config->dv_esw_en &&
+	    !config->repr_matching &&
+	    config->dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
+	    priv->master) {
+		if (mlx5_flow_hw_create_tx_default_mreg_copy_flow(dev))
+			goto error;
 	}
 	for (i = 0; i < priv->txqs_n; ++i) {
 		struct mlx5_txq_ctrl *txq = mlx5_txq_get(dev, i);
@@ -1311,17 +1457,22 @@ mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
 		if (!txq)
 			continue;
 		queue = mlx5_txq_get_sqn(txq);
-		if ((priv->representor || priv->master) &&
-		    priv->sh->config.dv_esw_en) {
+		if ((priv->representor || priv->master) && config->dv_esw_en) {
 			if (mlx5_flow_hw_esw_create_sq_miss_flow(dev, queue)) {
 				mlx5_txq_release(dev, i);
 				goto error;
 			}
 		}
+		if (config->dv_esw_en && config->repr_matching) {
+			if (mlx5_flow_hw_tx_repr_matching_flow(dev, queue)) {
+				mlx5_txq_release(dev, i);
+				goto error;
+			}
+		}
 		mlx5_txq_release(dev, i);
 	}
-	if (priv->sh->config.fdb_def_rule) {
-		if ((priv->master || priv->representor) && priv->sh->config.dv_esw_en) {
+	if (config->fdb_def_rule) {
+		if ((priv->master || priv->representor) && config->dv_esw_en) {
 			if (!mlx5_flow_hw_esw_create_default_jump_flow(dev))
 				priv->fdb_def_rule = 1;
 			else
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v4 18/18] net/mlx5: create control flow rules with HWS
  2022-10-19 16:25 ` [PATCH v4 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (16 preceding siblings ...)
  2022-10-19 16:25   ` [PATCH v4 17/18] net/mlx5: support device control of representor matching Suanming Mou
@ 2022-10-19 16:25   ` Suanming Mou
  17 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-10-19 16:25 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Dariusz Sosnowski

From: Dariusz Sosnowski <dsosnowski@nvidia.com>

This patch adds creation of control flow rules required to receive
default traffic (based on port configuration) with HWS.

Control flow rules are created on port start and destroyed on port stop.
Handling of destroying these rules was already implemented before this
patch.

Control flow rules are created if and only if flow isolation mode is
disabled, and the creation process goes as follows (a condensed sketch
of the main loop is included after the list):

- Port configuration is collected into a set of flags. Each flag
  corresponds to a certain Ethernet pattern type, defined by
  mlx5_flow_ctrl_rx_eth_pattern_type enumeration. There is a separate
  flag for VLAN filtering.
- For each possible Ethernet pattern type:
  - For each possible RSS action configuration:
    - If the configuration flags do not match this combination, it is
      skipped.
    - A template table is created using this combination of pattern
      and actions templates (templates are fetched from the hw_ctrl_rx
      struct stored in the port's private data).
    - Flow rules are created in this table.
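
A condensed sketch of the main loop (taken from the
mlx5_flow_hw_ctrl_flows() implementation added by this patch; cfg is
built per iteration from the stored table attributes, and error
handling and logging are trimmed):

	for (i = 0; i < MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_MAX; ++i) {
		/* Skip Ethernet patterns not requested by the configuration flags. */
		if (!eth_pattern_type_is_requested(i, flags))
			continue;
		for (j = 0; j < MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_MAX; ++j) {
			/* Lazily create the RSS actions template for this RSS type. */
			if (!hw_ctrl_rx->rss[j])
				hw_ctrl_rx->rss[j] =
					flow_hw_create_ctrl_rx_rss_template(dev, j);
			if (!rss_type_is_requested(priv, j))
				continue;
			/* Lazily create the template table for this (pattern, RSS) pair. */
			if (!hw_ctrl_rx->tables[i][j].tbl)
				hw_ctrl_rx->tables[i][j].tbl =
					flow_hw_table_create(dev, &cfg,
							     &hw_ctrl_rx->tables[i][j].pt, 1,
							     &hw_ctrl_rx->rss[j], 1, NULL);
			/* Insert the actual control flow rules into the table. */
			__flow_hw_ctrl_flows(dev, hw_ctrl_rx->tables[i][j].tbl, i, j);
		}
	}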

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 drivers/net/mlx5/mlx5.h         |   4 +
 drivers/net/mlx5/mlx5_flow.h    |  56 +++
 drivers/net/mlx5/mlx5_flow_hw.c | 799 ++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_rxq.c     |   3 +-
 drivers/net/mlx5/mlx5_trigger.c |  20 +-
 5 files changed, 880 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 87c90d58d7..911bb43344 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1636,6 +1636,8 @@ struct mlx5_hw_ctrl_flow {
 	struct rte_flow *flow;
 };
 
+struct mlx5_flow_hw_ctrl_rx;
+
 struct mlx5_priv {
 	struct rte_eth_dev_data *dev_data;  /* Pointer to device data. */
 	struct mlx5_dev_ctx_shared *sh; /* Shared device context. */
@@ -1767,6 +1769,8 @@ struct mlx5_priv {
 	/* Management data for ASO connection tracking. */
 	struct mlx5_aso_ct_pool *hws_ctpool; /* HW steering's CT pool. */
 	struct mlx5_aso_mtr_pool *hws_mpool; /* HW steering's Meter pool. */
+	struct mlx5_flow_hw_ctrl_rx *hw_ctrl_rx;
+	/**< HW steering templates used to create control flow rules. */
 #endif
 };
 
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index edf45b814d..e9e4537700 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -2113,6 +2113,62 @@ rte_col_2_mlx5_col(enum rte_color rcol)
 	return MLX5_FLOW_COLOR_UNDEFINED;
 }
 
+/* All types of Ethernet patterns used in control flow rules. */
+enum mlx5_flow_ctrl_rx_eth_pattern_type {
+	MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_ALL = 0,
+	MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_ALL_MCAST,
+	MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST,
+	MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST_VLAN,
+	MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST,
+	MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST_VLAN,
+	MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST,
+	MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST_VLAN,
+	MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_DMAC,
+	MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_DMAC_VLAN,
+	MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_MAX,
+};
+
+/* All types of RSS actions used in control flow rules. */
+enum mlx5_flow_ctrl_rx_expanded_rss_type {
+	MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_NON_IP = 0,
+	MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4,
+	MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4_UDP,
+	MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4_TCP,
+	MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6,
+	MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6_UDP,
+	MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6_TCP,
+	MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_MAX,
+};
+
+/**
+ * Contains pattern template, template table and its attributes for a single
+ * combination of Ethernet pattern and RSS action. Used to create control flow rules
+ * with HWS.
+ */
+struct mlx5_flow_hw_ctrl_rx_table {
+	struct rte_flow_template_table_attr attr;
+	struct rte_flow_pattern_template *pt;
+	struct rte_flow_template_table *tbl;
+};
+
+/* Contains all templates required to create control flow rules with HWS. */
+struct mlx5_flow_hw_ctrl_rx {
+	struct rte_flow_actions_template *rss[MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_MAX];
+	struct mlx5_flow_hw_ctrl_rx_table tables[MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_MAX]
+						[MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_MAX];
+};
+
+#define MLX5_CTRL_PROMISCUOUS    (RTE_BIT32(0))
+#define MLX5_CTRL_ALL_MULTICAST  (RTE_BIT32(1))
+#define MLX5_CTRL_BROADCAST      (RTE_BIT32(2))
+#define MLX5_CTRL_IPV4_MULTICAST (RTE_BIT32(3))
+#define MLX5_CTRL_IPV6_MULTICAST (RTE_BIT32(4))
+#define MLX5_CTRL_DMAC           (RTE_BIT32(5))
+#define MLX5_CTRL_VLAN_FILTER    (RTE_BIT32(6))
+
+int mlx5_flow_hw_ctrl_flows(struct rte_eth_dev *dev, uint32_t flags);
+void mlx5_flow_hw_cleanup_ctrl_rx_templates(struct rte_eth_dev *dev);
+
 int mlx5_flow_group_to_table(struct rte_eth_dev *dev,
 			     const struct mlx5_flow_tunnel *tunnel,
 			     uint32_t group, uint32_t *table,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 49186c4339..84c891cab6 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -47,6 +47,11 @@
 /* Lowest priority for HW non-root table. */
 #define MLX5_HW_LOWEST_PRIO_NON_ROOT (UINT32_MAX)
 
+/* Priorities for Rx control flow rules. */
+#define MLX5_HW_CTRL_RX_PRIO_L2 (MLX5_HW_LOWEST_PRIO_ROOT)
+#define MLX5_HW_CTRL_RX_PRIO_L3 (MLX5_HW_LOWEST_PRIO_ROOT - 1)
+#define MLX5_HW_CTRL_RX_PRIO_L4 (MLX5_HW_LOWEST_PRIO_ROOT - 2)
+
 #define MLX5_HW_VLAN_PUSH_TYPE_IDX 0
 #define MLX5_HW_VLAN_PUSH_VID_IDX 1
 #define MLX5_HW_VLAN_PUSH_PCP_IDX 2
@@ -84,6 +89,72 @@ static uint32_t mlx5_hw_act_flag[MLX5_HW_ACTION_FLAG_MAX]
 	},
 };
 
+/* Ethernet item spec for promiscuous mode. */
+static const struct rte_flow_item_eth ctrl_rx_eth_promisc_spec = {
+	.dst.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	.type = 0,
+};
+/* Ethernet item mask for promiscuous mode. */
+static const struct rte_flow_item_eth ctrl_rx_eth_promisc_mask = {
+	.dst.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	.type = 0,
+};
+
+/* Ethernet item spec for all multicast mode. */
+static const struct rte_flow_item_eth ctrl_rx_eth_mcast_spec = {
+	.dst.addr_bytes = "\x01\x00\x00\x00\x00\x00",
+	.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	.type = 0,
+};
+/* Ethernet item mask for all multicast mode. */
+static const struct rte_flow_item_eth ctrl_rx_eth_mcast_mask = {
+	.dst.addr_bytes = "\x01\x00\x00\x00\x00\x00",
+	.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	.type = 0,
+};
+
+/* Ethernet item spec for IPv4 multicast traffic. */
+static const struct rte_flow_item_eth ctrl_rx_eth_ipv4_mcast_spec = {
+	.dst.addr_bytes = "\x01\x00\x5e\x00\x00\x00",
+	.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	.type = 0,
+};
+/* Ethernet item mask for IPv4 multicast traffic. */
+static const struct rte_flow_item_eth ctrl_rx_eth_ipv4_mcast_mask = {
+	.dst.addr_bytes = "\xff\xff\xff\x00\x00\x00",
+	.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	.type = 0,
+};
+
+/* Ethernet item spec for IPv6 multicast traffic. */
+static const struct rte_flow_item_eth ctrl_rx_eth_ipv6_mcast_spec = {
+	.dst.addr_bytes = "\x33\x33\x00\x00\x00\x00",
+	.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	.type = 0,
+};
+/* Ethernet item mask for IPv6 multicast traffic. */
+static const struct rte_flow_item_eth ctrl_rx_eth_ipv6_mcast_mask = {
+	.dst.addr_bytes = "\xff\xff\x00\x00\x00\x00",
+	.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	.type = 0,
+};
+
+/* Ethernet item mask for unicast traffic. */
+static const struct rte_flow_item_eth ctrl_rx_eth_dmac_mask = {
+	.dst.addr_bytes = "\xff\xff\xff\xff\xff\xff",
+	.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	.type = 0,
+};
+
+/* Ethernet item spec for broadcast. */
+static const struct rte_flow_item_eth ctrl_rx_eth_bcast_spec = {
+	.dst.addr_bytes = "\xff\xff\xff\xff\xff\xff",
+	.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	.type = 0,
+};
+
 /**
  * Set rxq flag.
  *
@@ -6346,6 +6417,365 @@ flow_hw_create_vlan(struct rte_eth_dev *dev)
 	return 0;
 }
 
+static void
+flow_hw_cleanup_ctrl_rx_tables(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	unsigned int i;
+	unsigned int j;
+
+	if (!priv->hw_ctrl_rx)
+		return;
+	for (i = 0; i < MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_MAX; ++i) {
+		for (j = 0; j < MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_MAX; ++j) {
+			struct rte_flow_template_table *tbl = priv->hw_ctrl_rx->tables[i][j].tbl;
+			struct rte_flow_pattern_template *pt = priv->hw_ctrl_rx->tables[i][j].pt;
+
+			if (tbl)
+				claim_zero(flow_hw_table_destroy(dev, tbl, NULL));
+			if (pt)
+				claim_zero(flow_hw_pattern_template_destroy(dev, pt, NULL));
+		}
+	}
+	for (i = 0; i < MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_MAX; ++i) {
+		struct rte_flow_actions_template *at = priv->hw_ctrl_rx->rss[i];
+
+		if (at)
+			claim_zero(flow_hw_actions_template_destroy(dev, at, NULL));
+	}
+	mlx5_free(priv->hw_ctrl_rx);
+	priv->hw_ctrl_rx = NULL;
+}
+
+static uint64_t
+flow_hw_ctrl_rx_rss_type_hash_types(const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type)
+{
+	switch (rss_type) {
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_NON_IP:
+		return 0;
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4:
+		return RTE_ETH_RSS_IPV4 | RTE_ETH_RSS_FRAG_IPV4 | RTE_ETH_RSS_NONFRAG_IPV4_OTHER;
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4_UDP:
+		return RTE_ETH_RSS_NONFRAG_IPV4_UDP;
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4_TCP:
+		return RTE_ETH_RSS_NONFRAG_IPV4_TCP;
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6:
+		return RTE_ETH_RSS_IPV6 | RTE_ETH_RSS_FRAG_IPV6 | RTE_ETH_RSS_NONFRAG_IPV6_OTHER;
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6_UDP:
+		return RTE_ETH_RSS_NONFRAG_IPV6_UDP | RTE_ETH_RSS_IPV6_UDP_EX;
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6_TCP:
+		return RTE_ETH_RSS_NONFRAG_IPV6_TCP | RTE_ETH_RSS_IPV6_TCP_EX;
+	default:
+		/* Should not reach here. */
+		MLX5_ASSERT(false);
+		return 0;
+	}
+}
+
+static struct rte_flow_actions_template *
+flow_hw_create_ctrl_rx_rss_template(struct rte_eth_dev *dev,
+				    const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_flow_actions_template_attr attr = {
+		.ingress = 1,
+	};
+	uint16_t queue[RTE_MAX_QUEUES_PER_PORT];
+	struct rte_flow_action_rss rss_conf = {
+		.func = RTE_ETH_HASH_FUNCTION_DEFAULT,
+		.level = 0,
+		.types = 0,
+		.key_len = priv->rss_conf.rss_key_len,
+		.key = priv->rss_conf.rss_key,
+		.queue_num = priv->reta_idx_n,
+		.queue = queue,
+	};
+	struct rte_flow_action actions[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_RSS,
+			.conf = &rss_conf,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+	struct rte_flow_action masks[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_RSS,
+			.conf = &rss_conf,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+	struct rte_flow_actions_template *at;
+	struct rte_flow_error error;
+	unsigned int i;
+
+	MLX5_ASSERT(priv->reta_idx_n > 0 && priv->reta_idx);
+	/* Select proper RSS hash types and based on that configure the actions template. */
+	rss_conf.types = flow_hw_ctrl_rx_rss_type_hash_types(rss_type);
+	if (rss_conf.types) {
+		for (i = 0; i < priv->reta_idx_n; ++i)
+			queue[i] = (*priv->reta_idx)[i];
+	} else {
+		rss_conf.queue_num = 1;
+		queue[0] = (*priv->reta_idx)[0];
+	}
+	at = flow_hw_actions_template_create(dev, &attr, actions, masks, &error);
+	if (!at)
+		DRV_LOG(ERR,
+			"Failed to create ctrl flow actions template: rte_errno(%d), type(%d): %s",
+			rte_errno, error.type,
+			error.message ? error.message : "(no stated reason)");
+	return at;
+}
+
+static uint32_t ctrl_rx_rss_priority_map[MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_MAX] = {
+	[MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_NON_IP] = MLX5_HW_CTRL_RX_PRIO_L2,
+	[MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4] = MLX5_HW_CTRL_RX_PRIO_L3,
+	[MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4_UDP] = MLX5_HW_CTRL_RX_PRIO_L4,
+	[MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4_TCP] = MLX5_HW_CTRL_RX_PRIO_L4,
+	[MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6] = MLX5_HW_CTRL_RX_PRIO_L3,
+	[MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6_UDP] = MLX5_HW_CTRL_RX_PRIO_L4,
+	[MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6_TCP] = MLX5_HW_CTRL_RX_PRIO_L4,
+};
+
+static uint32_t ctrl_rx_nb_flows_map[MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_MAX] = {
+	[MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_ALL] = 1,
+	[MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_ALL_MCAST] = 1,
+	[MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST] = 1,
+	[MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST_VLAN] = MLX5_MAX_VLAN_IDS,
+	[MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST] = 1,
+	[MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST_VLAN] = MLX5_MAX_VLAN_IDS,
+	[MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST] = 1,
+	[MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST_VLAN] = MLX5_MAX_VLAN_IDS,
+	[MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_DMAC] = MLX5_MAX_UC_MAC_ADDRESSES,
+	[MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_DMAC_VLAN] =
+			MLX5_MAX_UC_MAC_ADDRESSES * MLX5_MAX_VLAN_IDS,
+};
+
+static struct rte_flow_template_table_attr
+flow_hw_get_ctrl_rx_table_attr(enum mlx5_flow_ctrl_rx_eth_pattern_type eth_pattern_type,
+			       const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type)
+{
+	return (struct rte_flow_template_table_attr){
+		.flow_attr = {
+			.group = 0,
+			.priority = ctrl_rx_rss_priority_map[rss_type],
+			.ingress = 1,
+		},
+		.nb_flows = ctrl_rx_nb_flows_map[eth_pattern_type],
+	};
+}
+
+static struct rte_flow_item
+flow_hw_get_ctrl_rx_eth_item(const enum mlx5_flow_ctrl_rx_eth_pattern_type eth_pattern_type)
+{
+	struct rte_flow_item item = {
+		.type = RTE_FLOW_ITEM_TYPE_ETH,
+		.mask = NULL,
+	};
+
+	switch (eth_pattern_type) {
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_ALL:
+		item.mask = &ctrl_rx_eth_promisc_mask;
+		break;
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_ALL_MCAST:
+		item.mask = &ctrl_rx_eth_mcast_mask;
+		break;
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST_VLAN:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_DMAC:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_DMAC_VLAN:
+		item.mask = &ctrl_rx_eth_dmac_mask;
+		break;
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST_VLAN:
+		item.mask = &ctrl_rx_eth_ipv4_mcast_mask;
+		break;
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST_VLAN:
+		item.mask = &ctrl_rx_eth_ipv6_mcast_mask;
+		break;
+	default:
+		/* Should not reach here - ETH mask must be present. */
+		item.type = RTE_FLOW_ITEM_TYPE_END;
+		MLX5_ASSERT(false);
+		break;
+	}
+	return item;
+}
+
+static struct rte_flow_item
+flow_hw_get_ctrl_rx_vlan_item(const enum mlx5_flow_ctrl_rx_eth_pattern_type eth_pattern_type)
+{
+	struct rte_flow_item item = {
+		.type = RTE_FLOW_ITEM_TYPE_VOID,
+		.mask = NULL,
+	};
+
+	switch (eth_pattern_type) {
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST_VLAN:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST_VLAN:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST_VLAN:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_DMAC_VLAN:
+		item.type = RTE_FLOW_ITEM_TYPE_VLAN;
+		item.mask = &rte_flow_item_vlan_mask;
+		break;
+	default:
+		/* Nothing to update. */
+		break;
+	}
+	return item;
+}
+
+static struct rte_flow_item
+flow_hw_get_ctrl_rx_l3_item(const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type)
+{
+	struct rte_flow_item item = {
+		.type = RTE_FLOW_ITEM_TYPE_VOID,
+		.mask = NULL,
+	};
+
+	switch (rss_type) {
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4:
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4_UDP:
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4_TCP:
+		item.type = RTE_FLOW_ITEM_TYPE_IPV4;
+		break;
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6:
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6_UDP:
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6_TCP:
+		item.type = RTE_FLOW_ITEM_TYPE_IPV6;
+		break;
+	default:
+		/* Nothing to update. */
+		break;
+	}
+	return item;
+}
+
+static struct rte_flow_item
+flow_hw_get_ctrl_rx_l4_item(const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type)
+{
+	struct rte_flow_item item = {
+		.type = RTE_FLOW_ITEM_TYPE_VOID,
+		.mask = NULL,
+	};
+
+	switch (rss_type) {
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4_UDP:
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6_UDP:
+		item.type = RTE_FLOW_ITEM_TYPE_UDP;
+		break;
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4_TCP:
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6_TCP:
+		item.type = RTE_FLOW_ITEM_TYPE_TCP;
+		break;
+	default:
+		/* Nothing to update. */
+		break;
+	}
+	return item;
+}
+
+static struct rte_flow_pattern_template *
+flow_hw_create_ctrl_rx_pattern_template
+		(struct rte_eth_dev *dev,
+		 const enum mlx5_flow_ctrl_rx_eth_pattern_type eth_pattern_type,
+		 const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type)
+{
+	const struct rte_flow_pattern_template_attr attr = {
+		.relaxed_matching = 0,
+		.ingress = 1,
+	};
+	struct rte_flow_item items[] = {
+		/* Matching patterns */
+		flow_hw_get_ctrl_rx_eth_item(eth_pattern_type),
+		flow_hw_get_ctrl_rx_vlan_item(eth_pattern_type),
+		flow_hw_get_ctrl_rx_l3_item(rss_type),
+		flow_hw_get_ctrl_rx_l4_item(rss_type),
+		/* Terminate pattern */
+		{ .type = RTE_FLOW_ITEM_TYPE_END }
+	};
+
+	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
+}
+
+static int
+flow_hw_create_ctrl_rx_tables(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	unsigned int i;
+	unsigned int j;
+	int ret;
+
+	MLX5_ASSERT(!priv->hw_ctrl_rx);
+	priv->hw_ctrl_rx = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*priv->hw_ctrl_rx),
+				       RTE_CACHE_LINE_SIZE, rte_socket_id());
+	if (!priv->hw_ctrl_rx) {
+		DRV_LOG(ERR, "Failed to allocate memory for Rx control flow tables");
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	/* Create all pattern template variants. */
+	for (i = 0; i < MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_MAX; ++i) {
+		enum mlx5_flow_ctrl_rx_eth_pattern_type eth_pattern_type = i;
+
+		for (j = 0; j < MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_MAX; ++j) {
+			const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type = j;
+			struct rte_flow_template_table_attr attr;
+			struct rte_flow_pattern_template *pt;
+
+			attr = flow_hw_get_ctrl_rx_table_attr(eth_pattern_type, rss_type);
+			pt = flow_hw_create_ctrl_rx_pattern_template(dev, eth_pattern_type,
+								     rss_type);
+			if (!pt)
+				goto err;
+			priv->hw_ctrl_rx->tables[i][j].attr = attr;
+			priv->hw_ctrl_rx->tables[i][j].pt = pt;
+		}
+	}
+	return 0;
+err:
+	ret = rte_errno;
+	flow_hw_cleanup_ctrl_rx_tables(dev);
+	rte_errno = ret;
+	return -ret;
+}
+
+void
+mlx5_flow_hw_cleanup_ctrl_rx_templates(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_hw_ctrl_rx *hw_ctrl_rx;
+	unsigned int i;
+	unsigned int j;
+
+	if (!priv->dr_ctx)
+		return;
+	if (!priv->hw_ctrl_rx)
+		return;
+	hw_ctrl_rx = priv->hw_ctrl_rx;
+	for (i = 0; i < MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_MAX; ++i) {
+		for (j = 0; j < MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_MAX; ++j) {
+			struct mlx5_flow_hw_ctrl_rx_table *tmpls = &hw_ctrl_rx->tables[i][j];
+
+			if (tmpls->tbl) {
+				claim_zero(flow_hw_table_destroy(dev, tmpls->tbl, NULL));
+				tmpls->tbl = NULL;
+			}
+		}
+	}
+	for (j = 0; j < MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_MAX; ++j) {
+		if (hw_ctrl_rx->rss[j]) {
+			claim_zero(flow_hw_actions_template_destroy(dev, hw_ctrl_rx->rss[j], NULL));
+			hw_ctrl_rx->rss[j] = NULL;
+		}
+	}
+}
+
 /**
  * Configure port HWS resources.
  *
@@ -6512,6 +6942,12 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	priv->nb_queue = nb_q_updated;
 	rte_spinlock_init(&priv->hw_ctrl_lock);
 	LIST_INIT(&priv->hw_ctrl_flows);
+	ret = flow_hw_create_ctrl_rx_tables(dev);
+	if (ret) {
+		rte_flow_error_set(error, -ret, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				   "Failed to set up Rx control flow templates");
+		goto err;
+	}
 	/* Initialize meter library*/
 	if (port_attr->nb_meters)
 		if (mlx5_flow_meter_init(dev, port_attr->nb_meters, 1, 1, nb_q_updated))
@@ -6665,6 +7101,7 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 	flow_hw_rxq_flag_set(dev, false);
 	flow_hw_flush_all_ctrl_flows(dev);
 	flow_hw_cleanup_tx_repr_tagging(dev);
+	flow_hw_cleanup_ctrl_rx_tables(dev);
 	while (!LIST_EMPTY(&priv->flow_hw_tbl_ongo)) {
 		tbl = LIST_FIRST(&priv->flow_hw_tbl_ongo);
 		flow_hw_table_destroy(dev, tbl, NULL);
@@ -8460,6 +8897,368 @@ mlx5_flow_hw_tx_repr_matching_flow(struct rte_eth_dev *dev, uint32_t sqn)
 					items, 0, actions, 0);
 }
 
+static uint32_t
+__calc_pattern_flags(const enum mlx5_flow_ctrl_rx_eth_pattern_type eth_pattern_type)
+{
+	switch (eth_pattern_type) {
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_ALL:
+		return MLX5_CTRL_PROMISCUOUS;
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_ALL_MCAST:
+		return MLX5_CTRL_ALL_MULTICAST;
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST_VLAN:
+		return MLX5_CTRL_BROADCAST;
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST_VLAN:
+		return MLX5_CTRL_IPV4_MULTICAST;
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST_VLAN:
+		return MLX5_CTRL_IPV6_MULTICAST;
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_DMAC:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_DMAC_VLAN:
+		return MLX5_CTRL_DMAC;
+	default:
+		/* Should not reach here. */
+		MLX5_ASSERT(false);
+		return 0;
+	}
+}
+
+static uint32_t
+__calc_vlan_flags(const enum mlx5_flow_ctrl_rx_eth_pattern_type eth_pattern_type)
+{
+	switch (eth_pattern_type) {
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST_VLAN:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST_VLAN:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST_VLAN:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_DMAC_VLAN:
+		return MLX5_CTRL_VLAN_FILTER;
+	default:
+		return 0;
+	}
+}
+
+static bool
+eth_pattern_type_is_requested(const enum mlx5_flow_ctrl_rx_eth_pattern_type eth_pattern_type,
+			      uint32_t flags)
+{
+	uint32_t pattern_flags = __calc_pattern_flags(eth_pattern_type);
+	uint32_t vlan_flags = __calc_vlan_flags(eth_pattern_type);
+	bool pattern_requested = !!(pattern_flags & flags);
+	bool consider_vlan = vlan_flags || (MLX5_CTRL_VLAN_FILTER & flags);
+	bool vlan_requested = !!(vlan_flags & flags);
+
+	if (consider_vlan)
+		return pattern_requested && vlan_requested;
+	else
+		return pattern_requested;
+}
+
+static bool
+rss_type_is_requested(struct mlx5_priv *priv,
+		      const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type)
+{
+	struct rte_flow_actions_template *at = priv->hw_ctrl_rx->rss[rss_type];
+	unsigned int i;
+
+	for (i = 0; at->actions[i].type != RTE_FLOW_ACTION_TYPE_END; ++i) {
+		if (at->actions[i].type == RTE_FLOW_ACTION_TYPE_RSS) {
+			const struct rte_flow_action_rss *rss = at->actions[i].conf;
+			uint64_t rss_types = rss->types;
+
+			if ((rss_types & priv->rss_conf.rss_hf) != rss_types)
+				return false;
+		}
+	}
+	return true;
+}
+
+static const struct rte_flow_item_eth *
+__get_eth_spec(const enum mlx5_flow_ctrl_rx_eth_pattern_type pattern)
+{
+	switch (pattern) {
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_ALL:
+		return &ctrl_rx_eth_promisc_spec;
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_ALL_MCAST:
+		return &ctrl_rx_eth_mcast_spec;
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST_VLAN:
+		return &ctrl_rx_eth_bcast_spec;
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST_VLAN:
+		return &ctrl_rx_eth_ipv4_mcast_spec;
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST_VLAN:
+		return &ctrl_rx_eth_ipv6_mcast_spec;
+	default:
+		/* This case should not be reached. */
+		MLX5_ASSERT(false);
+		return NULL;
+	}
+}
+
+static int
+__flow_hw_ctrl_flows_single(struct rte_eth_dev *dev,
+			    struct rte_flow_template_table *tbl,
+			    const enum mlx5_flow_ctrl_rx_eth_pattern_type pattern_type,
+			    const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type)
+{
+	const struct rte_flow_item_eth *eth_spec = __get_eth_spec(pattern_type);
+	struct rte_flow_item items[5];
+	struct rte_flow_action actions[] = {
+		{ .type = RTE_FLOW_ACTION_TYPE_RSS },
+		{ .type = RTE_FLOW_ACTION_TYPE_END },
+	};
+
+	if (!eth_spec)
+		return -EINVAL;
+	memset(items, 0, sizeof(items));
+	items[0] = (struct rte_flow_item){
+		.type = RTE_FLOW_ITEM_TYPE_ETH,
+		.spec = eth_spec,
+	};
+	items[1] = (struct rte_flow_item){ .type = RTE_FLOW_ITEM_TYPE_VOID };
+	items[2] = flow_hw_get_ctrl_rx_l3_item(rss_type);
+	items[3] = flow_hw_get_ctrl_rx_l4_item(rss_type);
+	items[4] = (struct rte_flow_item){ .type = RTE_FLOW_ITEM_TYPE_END };
+	/* Without VLAN filtering, only a single flow rule must be created. */
+	return flow_hw_create_ctrl_flow(dev, dev, tbl, items, 0, actions, 0);
+}
+
+static int
+__flow_hw_ctrl_flows_single_vlan(struct rte_eth_dev *dev,
+				 struct rte_flow_template_table *tbl,
+				 const enum mlx5_flow_ctrl_rx_eth_pattern_type pattern_type,
+				 const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_item_eth *eth_spec = __get_eth_spec(pattern_type);
+	struct rte_flow_item items[5];
+	struct rte_flow_action actions[] = {
+		{ .type = RTE_FLOW_ACTION_TYPE_RSS },
+		{ .type = RTE_FLOW_ACTION_TYPE_END },
+	};
+	unsigned int i;
+
+	if (!eth_spec)
+		return -EINVAL;
+	memset(items, 0, sizeof(items));
+	items[0] = (struct rte_flow_item){
+		.type = RTE_FLOW_ITEM_TYPE_ETH,
+		.spec = eth_spec,
+	};
+	/* Optional VLAN for now will be VOID - will be filled later. */
+	items[1] = (struct rte_flow_item){ .type = RTE_FLOW_ITEM_TYPE_VLAN };
+	items[2] = flow_hw_get_ctrl_rx_l3_item(rss_type);
+	items[3] = flow_hw_get_ctrl_rx_l4_item(rss_type);
+	items[4] = (struct rte_flow_item){ .type = RTE_FLOW_ITEM_TYPE_END };
+	/* Since VLAN filtering is done, create a single flow rule for each registered vid. */
+	for (i = 0; i < priv->vlan_filter_n; ++i) {
+		uint16_t vlan = priv->vlan_filter[i];
+		struct rte_flow_item_vlan vlan_spec = {
+			.tci = rte_cpu_to_be_16(vlan),
+		};
+
+		items[1].spec = &vlan_spec;
+		if (flow_hw_create_ctrl_flow(dev, dev, tbl, items, 0, actions, 0))
+			return -rte_errno;
+	}
+	return 0;
+}
+
+static int
+__flow_hw_ctrl_flows_unicast(struct rte_eth_dev *dev,
+			     struct rte_flow_template_table *tbl,
+			     const enum mlx5_flow_ctrl_rx_eth_pattern_type pattern_type,
+			     const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type)
+{
+	struct rte_flow_item_eth eth_spec;
+	struct rte_flow_item items[5];
+	struct rte_flow_action actions[] = {
+		{ .type = RTE_FLOW_ACTION_TYPE_RSS },
+		{ .type = RTE_FLOW_ACTION_TYPE_END },
+	};
+	const struct rte_ether_addr cmp = {
+		.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	};
+	unsigned int i;
+
+	RTE_SET_USED(pattern_type);
+
+	memset(&eth_spec, 0, sizeof(eth_spec));
+	memset(items, 0, sizeof(items));
+	items[0] = (struct rte_flow_item){
+		.type = RTE_FLOW_ITEM_TYPE_ETH,
+		.spec = &eth_spec,
+	};
+	items[1] = (struct rte_flow_item){ .type = RTE_FLOW_ITEM_TYPE_VOID };
+	items[2] = flow_hw_get_ctrl_rx_l3_item(rss_type);
+	items[3] = flow_hw_get_ctrl_rx_l4_item(rss_type);
+	items[4] = (struct rte_flow_item){ .type = RTE_FLOW_ITEM_TYPE_END };
+	for (i = 0; i < MLX5_MAX_MAC_ADDRESSES; ++i) {
+		struct rte_ether_addr *mac = &dev->data->mac_addrs[i];
+
+		if (!memcmp(mac, &cmp, sizeof(*mac)))
+			continue;
+		memcpy(&eth_spec.dst.addr_bytes, mac->addr_bytes, RTE_ETHER_ADDR_LEN);
+		if (flow_hw_create_ctrl_flow(dev, dev, tbl, items, 0, actions, 0))
+			return -rte_errno;
+	}
+	return 0;
+}
+
+static int
+__flow_hw_ctrl_flows_unicast_vlan(struct rte_eth_dev *dev,
+				  struct rte_flow_template_table *tbl,
+				  const enum mlx5_flow_ctrl_rx_eth_pattern_type pattern_type,
+				  const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_flow_item_eth eth_spec;
+	struct rte_flow_item items[5];
+	struct rte_flow_action actions[] = {
+		{ .type = RTE_FLOW_ACTION_TYPE_RSS },
+		{ .type = RTE_FLOW_ACTION_TYPE_END },
+	};
+	const struct rte_ether_addr cmp = {
+		.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	};
+	unsigned int i;
+	unsigned int j;
+
+	RTE_SET_USED(pattern_type);
+
+	memset(&eth_spec, 0, sizeof(eth_spec));
+	memset(items, 0, sizeof(items));
+	items[0] = (struct rte_flow_item){
+		.type = RTE_FLOW_ITEM_TYPE_ETH,
+		.spec = &eth_spec,
+	};
+	items[1] = (struct rte_flow_item){ .type = RTE_FLOW_ITEM_TYPE_VLAN };
+	items[2] = flow_hw_get_ctrl_rx_l3_item(rss_type);
+	items[3] = flow_hw_get_ctrl_rx_l4_item(rss_type);
+	items[4] = (struct rte_flow_item){ .type = RTE_FLOW_ITEM_TYPE_END };
+	for (i = 0; i < MLX5_MAX_MAC_ADDRESSES; ++i) {
+		struct rte_ether_addr *mac = &dev->data->mac_addrs[i];
+
+		if (!memcmp(mac, &cmp, sizeof(*mac)))
+			continue;
+		memcpy(&eth_spec.dst.addr_bytes, mac->addr_bytes, RTE_ETHER_ADDR_LEN);
+		for (j = 0; j < priv->vlan_filter_n; ++j) {
+			uint16_t vlan = priv->vlan_filter[j];
+			struct rte_flow_item_vlan vlan_spec = {
+				.tci = rte_cpu_to_be_16(vlan),
+			};
+
+			items[1].spec = &vlan_spec;
+			if (flow_hw_create_ctrl_flow(dev, dev, tbl, items, 0, actions, 0))
+				return -rte_errno;
+		}
+	}
+	return 0;
+}
+
+static int
+__flow_hw_ctrl_flows(struct rte_eth_dev *dev,
+		     struct rte_flow_template_table *tbl,
+		     const enum mlx5_flow_ctrl_rx_eth_pattern_type pattern_type,
+		     const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type)
+{
+	switch (pattern_type) {
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_ALL:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_ALL_MCAST:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST:
+		return __flow_hw_ctrl_flows_single(dev, tbl, pattern_type, rss_type);
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST_VLAN:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST_VLAN:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST_VLAN:
+		return __flow_hw_ctrl_flows_single_vlan(dev, tbl, pattern_type, rss_type);
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_DMAC:
+		return __flow_hw_ctrl_flows_unicast(dev, tbl, pattern_type, rss_type);
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_DMAC_VLAN:
+		return __flow_hw_ctrl_flows_unicast_vlan(dev, tbl, pattern_type, rss_type);
+	default:
+		/* Should not reach here. */
+		MLX5_ASSERT(false);
+		rte_errno = EINVAL;
+		return -EINVAL;
+	}
+}
+
+
+int
+mlx5_flow_hw_ctrl_flows(struct rte_eth_dev *dev, uint32_t flags)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_hw_ctrl_rx *hw_ctrl_rx;
+	unsigned int i;
+	unsigned int j;
+	int ret = 0;
+
+	RTE_SET_USED(priv);
+	RTE_SET_USED(flags);
+	if (!priv->dr_ctx) {
+		DRV_LOG(DEBUG, "port %u Control flow rules will not be created. "
+			       "HWS needs to be configured beforehand.",
+			       dev->data->port_id);
+		return 0;
+	}
+	if (!priv->hw_ctrl_rx) {
+		DRV_LOG(ERR, "port %u Control flow rules templates were not created.",
+			dev->data->port_id);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	hw_ctrl_rx = priv->hw_ctrl_rx;
+	for (i = 0; i < MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_MAX; ++i) {
+		const enum mlx5_flow_ctrl_rx_eth_pattern_type eth_pattern_type = i;
+
+		if (!eth_pattern_type_is_requested(eth_pattern_type, flags))
+			continue;
+		for (j = 0; j < MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_MAX; ++j) {
+			const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type = j;
+			struct rte_flow_actions_template *at;
+			struct mlx5_flow_hw_ctrl_rx_table *tmpls = &hw_ctrl_rx->tables[i][j];
+			const struct mlx5_flow_template_table_cfg cfg = {
+				.attr = tmpls->attr,
+				.external = 0,
+			};
+
+			if (!hw_ctrl_rx->rss[rss_type]) {
+				at = flow_hw_create_ctrl_rx_rss_template(dev, rss_type);
+				if (!at)
+					return -rte_errno;
+				hw_ctrl_rx->rss[rss_type] = at;
+			} else {
+				at = hw_ctrl_rx->rss[rss_type];
+			}
+			if (!rss_type_is_requested(priv, rss_type))
+				continue;
+			if (!tmpls->tbl) {
+				tmpls->tbl = flow_hw_table_create(dev, &cfg,
+								  &tmpls->pt, 1, &at, 1, NULL);
+				if (!tmpls->tbl) {
+					DRV_LOG(ERR, "port %u Failed to create template table "
+						     "for control flow rules. Unable to create "
+						     "control flow rules.",
+						     dev->data->port_id);
+					return -rte_errno;
+				}
+			}
+
+			ret = __flow_hw_ctrl_flows(dev, tmpls->tbl, eth_pattern_type, rss_type);
+			if (ret) {
+				DRV_LOG(ERR, "port %u Failed to create control flow rule.",
+					dev->data->port_id);
+				return ret;
+			}
+		}
+	}
+	return 0;
+}
+
 void
 mlx5_flow_meter_uninit(struct rte_eth_dev *dev)
 {
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index b1543b480e..b7818f9598 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -2568,13 +2568,14 @@ mlx5_ind_table_obj_new(struct rte_eth_dev *dev, const uint16_t *queues,
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_ind_table_obj *ind_tbl;
 	int ret;
+	uint32_t max_queues_n = priv->rxqs_n > queues_n ? priv->rxqs_n : queues_n;
 
 	/*
 	 * Allocate maximum queues for shared action as queue number
 	 * maybe modified later.
 	 */
 	ind_tbl = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*ind_tbl) +
-			      (standalone ? priv->rxqs_n : queues_n) *
+			      (standalone ? max_queues_n : queues_n) *
 			      sizeof(uint16_t), 0, SOCKET_ID_ANY);
 	if (!ind_tbl) {
 		rte_errno = ENOMEM;
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 8c9d5c1b13..4b821a1076 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -1415,6 +1415,9 @@ mlx5_dev_stop(struct rte_eth_dev *dev)
 	mlx5_flow_list_flush(dev, MLX5_FLOW_TYPE_GEN, true);
 	mlx5_flow_meter_rxq_flush(dev);
 	mlx5_action_handle_detach(dev);
+#ifdef HAVE_MLX5_HWS_SUPPORT
+	mlx5_flow_hw_cleanup_ctrl_rx_templates(dev);
+#endif
 	mlx5_rx_intr_vec_disable(dev);
 	priv->sh->port[priv->dev_port - 1].ih_port_id = RTE_MAX_ETHPORTS;
 	priv->sh->port[priv->dev_port - 1].devx_ih_port_id = RTE_MAX_ETHPORTS;
@@ -1435,6 +1438,7 @@ mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_sh_config *config = &priv->sh->config;
+	uint64_t flags = 0;
 	unsigned int i;
 	int ret;
 
@@ -1481,7 +1485,18 @@ mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
 	} else {
 		DRV_LOG(INFO, "port %u FDB default rule is disabled", dev->data->port_id);
 	}
-	return 0;
+	if (priv->isolated)
+		return 0;
+	if (dev->data->promiscuous)
+		flags |= MLX5_CTRL_PROMISCUOUS;
+	if (dev->data->all_multicast)
+		flags |= MLX5_CTRL_ALL_MULTICAST;
+	else
+		flags |= MLX5_CTRL_BROADCAST | MLX5_CTRL_IPV4_MULTICAST | MLX5_CTRL_IPV6_MULTICAST;
+	flags |= MLX5_CTRL_DMAC;
+	if (priv->vlan_filter_n)
+		flags |= MLX5_CTRL_VLAN_FILTER;
+	return mlx5_flow_hw_ctrl_flows(dev, flags);
 error:
 	ret = rte_errno;
 	mlx5_flow_hw_flush_ctrl_flows(dev);
@@ -1717,6 +1732,9 @@ mlx5_traffic_restart(struct rte_eth_dev *dev)
 {
 	if (dev->data->dev_started) {
 		mlx5_traffic_disable(dev);
+#ifdef HAVE_MLX5_HWS_SUPPORT
+		mlx5_flow_hw_cleanup_ctrl_rx_templates(dev);
+#endif
 		return mlx5_traffic_enable(dev);
 	}
 	return 0;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v5 00/18] net/mlx5: HW steering PMD update
  2022-09-23 14:43 [PATCH 00/27] net/mlx5: HW steering PMD update Suanming Mou
                   ` (29 preceding siblings ...)
  2022-10-19 16:25 ` [PATCH v4 00/18] net/mlx5: HW steering PMD update Suanming Mou
@ 2022-10-20  3:21 ` Suanming Mou
  2022-10-20  3:21   ` [PATCH v5 01/18] net/mlx5: fix invalid flow attributes Suanming Mou
                     ` (17 more replies)
  2022-10-20 15:41 ` [PATCH v6 00/18] net/mlx5: HW steering PMD update Suanming Mou
  31 siblings, 18 replies; 140+ messages in thread
From: Suanming Mou @ 2022-10-20  3:21 UTC (permalink / raw)
  Cc: dev, rasland, orika

The skeleton of mlx5 HW steering (HWS) has been upstream for
quite a long time, but has not been updated further due to the
missing low-level steering layer code. Luckily, better late than
never, the steering layer finally comes[1].

This series will add more features to the existing PMD code:
 - FDB and metadata copy.
 - Modify field.
 - Meter color.
 - Counter.
 - Aging.
 - Action template pre-parser optimization.
 - Connection tracking.
 - Control flow.

Some features such as meter/aging/CT touch the public API,
and the public API changes have been sent to the ML much earlier
in other threads in order not to be swallowed by this big series.

The dependent patches are listed below:
 [1]https://inbox.dpdk.org/dev/20220922190345.394-1-valex@nvidia.com/

---

 v5:
  - Rebase to the latest version.

 v4:
  - Disable aging due to the flow age API change still in progress.
    https://patches.dpdk.org/project/dpdk/cover/20221019144904.2543586-1-michaelba@nvidia.com/
  - Add control flow for HWS.

 v3:
  - Fixed flows that could not be aged out.
  - Fix error not being filled properly when table creation failed.
  - Remove transfer_mode in flow attributes before the ethdev layer
    change is applied.
    https://patches.dpdk.org/project/dpdk/patch/20220928092425.68214-1-rongweil@nvidia.com/

 v2:
  - Remove the rte_flow patches as they will be integrated in another thread.
  - Fix compilation issues.
  - Make the patches better organized.

Alexander Kozyrev (2):
  net/mlx5: add HW steering meter action
  net/mlx5: implement METER MARK indirect action for HWS

Bing Zhao (1):
  net/mlx5: add extended metadata mode for hardware steering

Dariusz Sosnowski (5):
  net/mlx5: add HW steering port action
  net/mlx5: support DR action template API
  net/mlx5: support device control for E-Switch default rule
  net/mlx5: support device control of representor matching
  net/mlx5: create control flow rules with HWS

Gregory Etelson (2):
  net/mlx5: add HW steering VLAN push, pop and VID modify flow actions
  net/mlx5: support flow integrity in HWS group 0

Michael Baum (1):
  net/mlx5: add HWS AGE action support

Suanming Mou (6):
  net/mlx5: fix invalid flow attributes
  net/mlx5: fix IPv6 and TCP RSS hash fields
  net/mlx5: add shared header reformat support
  net/mlx5: add modify field hws support
  net/mlx5: add HW steering connection tracking support
  net/mlx5: add async action push and pull support

Xiaoyu Min (1):
  net/mlx5: add HW steering counter action

 doc/guides/nics/mlx5.rst               |   43 +-
 doc/guides/rel_notes/release_22_11.rst |    8 +-
 drivers/common/mlx5/mlx5_devx_cmds.c   |   50 +
 drivers/common/mlx5/mlx5_devx_cmds.h   |   27 +
 drivers/common/mlx5/mlx5_prm.h         |   22 +-
 drivers/common/mlx5/version.map        |    1 +
 drivers/net/mlx5/linux/mlx5_os.c       |   78 +-
 drivers/net/mlx5/meson.build           |    1 +
 drivers/net/mlx5/mlx5.c                |  126 +-
 drivers/net/mlx5/mlx5.h                |  322 +-
 drivers/net/mlx5/mlx5_defs.h           |    5 +
 drivers/net/mlx5/mlx5_flow.c           |  409 +-
 drivers/net/mlx5/mlx5_flow.h           |  335 +-
 drivers/net/mlx5/mlx5_flow_aso.c       |  797 ++-
 drivers/net/mlx5/mlx5_flow_dv.c        | 1128 +--
 drivers/net/mlx5/mlx5_flow_hw.c        | 8789 +++++++++++++++++++++---
 drivers/net/mlx5/mlx5_flow_meter.c     |  776 ++-
 drivers/net/mlx5/mlx5_flow_verbs.c     |    8 +-
 drivers/net/mlx5/mlx5_hws_cnt.c        | 1247 ++++
 drivers/net/mlx5/mlx5_hws_cnt.h        |  703 ++
 drivers/net/mlx5/mlx5_rxq.c            |    3 +-
 drivers/net/mlx5/mlx5_trigger.c        |  272 +-
 drivers/net/mlx5/mlx5_tx.h             |    1 +
 drivers/net/mlx5/mlx5_txq.c            |   47 +
 drivers/net/mlx5/mlx5_utils.h          |   10 +-
 drivers/net/mlx5/rte_pmd_mlx5.h        |   17 +
 drivers/net/mlx5/version.map           |    1 +
 27 files changed, 13586 insertions(+), 1640 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_hws_cnt.c
 create mode 100644 drivers/net/mlx5/mlx5_hws_cnt.h

-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v5 01/18] net/mlx5: fix invalid flow attributes
  2022-10-20  3:21 ` [PATCH v5 00/18] net/mlx5: HW steering PMD update Suanming Mou
@ 2022-10-20  3:21   ` Suanming Mou
  2022-10-20  3:21   ` [PATCH v5 02/18] net/mlx5: fix IPv6 and TCP RSS hash fields Suanming Mou
                     ` (16 subsequent siblings)
  17 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-10-20  3:21 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika

In the function flow_get_drv_type(), attr will be read in non-HWS mode.
In case the user calls the HWS API in SWS mode, a dummy attr should be
placed in the HWS functions, or the NULL attr will cause a crash.
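
The fix applies the same small pattern to every HWS-only entry point;
a condensed sketch (taken from the hunks below, with each wrapper's own
error reporting reduced to a comment):

	struct rte_flow_attr attr = {0};	/* dummy, zero-initialized attribute */

	/*
	 * Previously NULL was passed here; in SWS mode flow_get_drv_type()
	 * reads the attribute fields, so the NULL pointer caused a crash.
	 */
	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW) {
		/* report ENOTSUP via rte_flow_error_set() as each wrapper does */
	}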

Fixes: c40c061a022e ("net/mlx5: add basic flow queue operation")

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/net/mlx5/mlx5_flow.c | 38 ++++++++++++++++++++++++------------
 1 file changed, 26 insertions(+), 12 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 2c6acd551c..f36e72fb89 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -3742,6 +3742,8 @@ flow_get_drv_type(struct rte_eth_dev *dev, const struct rte_flow_attr *attr)
 	 */
 	if (priv->sh->config.dv_flow_en == 2)
 		return MLX5_FLOW_TYPE_HW;
+	if (!attr)
+		return MLX5_FLOW_TYPE_MIN;
 	/* If no OS specific type - continue with DV/VERBS selection */
 	if (attr->transfer && priv->sh->config.dv_esw_en)
 		type = MLX5_FLOW_TYPE_DV;
@@ -8254,8 +8256,9 @@ mlx5_flow_info_get(struct rte_eth_dev *dev,
 		   struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8289,8 +8292,9 @@ mlx5_flow_port_configure(struct rte_eth_dev *dev,
 			 struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8321,8 +8325,9 @@ mlx5_flow_pattern_template_create(struct rte_eth_dev *dev,
 		struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW) {
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW) {
 		rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8352,8 +8357,9 @@ mlx5_flow_pattern_template_destroy(struct rte_eth_dev *dev,
 				   struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8387,8 +8393,9 @@ mlx5_flow_actions_template_create(struct rte_eth_dev *dev,
 			struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW) {
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW) {
 		rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8418,8 +8425,9 @@ mlx5_flow_actions_template_destroy(struct rte_eth_dev *dev,
 				   struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8459,8 +8467,9 @@ mlx5_flow_table_create(struct rte_eth_dev *dev,
 		       struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW) {
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW) {
 		rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8496,8 +8505,9 @@ mlx5_flow_table_destroy(struct rte_eth_dev *dev,
 			struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8544,8 +8554,9 @@ mlx5_flow_async_flow_create(struct rte_eth_dev *dev,
 			    struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW) {
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW) {
 		rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8587,8 +8598,9 @@ mlx5_flow_async_flow_destroy(struct rte_eth_dev *dev,
 			     struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8623,8 +8635,9 @@ mlx5_flow_pull(struct rte_eth_dev *dev,
 	       struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8652,8 +8665,9 @@ mlx5_flow_push(struct rte_eth_dev *dev,
 	       struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v5 02/18] net/mlx5: fix IPv6 and TCP RSS hash fields
  2022-10-20  3:21 ` [PATCH v5 00/18] net/mlx5: HW steering PMD update Suanming Mou
  2022-10-20  3:21   ` [PATCH v5 01/18] net/mlx5: fix invalid flow attributes Suanming Mou
@ 2022-10-20  3:21   ` Suanming Mou
  2022-10-20  3:21   ` [PATCH v5 03/18] net/mlx5: add shared header reformat support Suanming Mou
                     ` (15 subsequent siblings)
  17 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-10-20  3:21 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika

In the flow_dv_hashfields_set() function, when item_flags was 0,
the code went directly into the first if branch and the else branches
never had a chance to be checked. As a result, the IPv6 and TCP hash
fields handled in those else branches were never set.

This commit adds a dedicated HW steering hash field set function
to generate the RSS hash fields.
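
For illustration only, a stand-alone sketch of the problematic ordering
(the flag and hash macros below are placeholders, not the actual
mlx5/ethdev definitions):

#include <stdint.h>

/* Placeholder bits standing in for MLX5_FLOW_LAYER_* / RTE_ETH_RSS_* flags. */
#define LAYER_L3_IPV4  (1ULL << 0)
#define LAYER_L3_IPV6  (1ULL << 1)
#define RSS_IPV4       (1ULL << 2)
#define RSS_IPV6       (1ULL << 3)
#define HASH_IPV4      (1ULL << 4)
#define HASH_IPV6      (1ULL << 5)

/*
 * Old ordering: the trailing "|| !items" makes the IPv4 branch match
 * whenever items == 0, so the IPv6 else branch below is unreachable for
 * empty item flags, even if only IPv6 RSS types were requested.
 */
static uint64_t
hashfields_old(uint64_t items, uint64_t rss_types)
{
	uint64_t fields = 0;

	if ((items & LAYER_L3_IPV4) || !items) {
		if (rss_types & RSS_IPV4)
			fields |= HASH_IPV4;
	} else if ((items & LAYER_L3_IPV6) || !items) {
		if (rss_types & RSS_IPV6)
			fields |= HASH_IPV6; /* never reached when items == 0 */
	}
	return fields;
}

For example, hashfields_old(0, RSS_IPV6) returns 0. The dedicated HWS
helper added below derives the fields from rss_desc->types alone, so the
IPv6 and TCP cases are handled again.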

Fixes: 3a2f674b6aa8 ("net/mlx5: add queue and RSS HW steering action")
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/net/mlx5/mlx5_flow_dv.c | 12 +++----
 drivers/net/mlx5/mlx5_flow_hw.c | 59 ++++++++++++++++++++++++++++++++-
 2 files changed, 62 insertions(+), 9 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 0cf757898d..29d7bf7049 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -11274,8 +11274,7 @@ flow_dv_hashfields_set(uint64_t item_flags,
 		rss_inner = 1;
 #endif
 	if ((rss_inner && (items & MLX5_FLOW_LAYER_INNER_L3_IPV4)) ||
-	    (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L3_IPV4)) ||
-	     !items) {
+	    (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L3_IPV4))) {
 		if (rss_types & MLX5_IPV4_LAYER_TYPES) {
 			if (rss_types & RTE_ETH_RSS_L3_SRC_ONLY)
 				fields |= IBV_RX_HASH_SRC_IPV4;
@@ -11285,8 +11284,7 @@ flow_dv_hashfields_set(uint64_t item_flags,
 				fields |= MLX5_IPV4_IBV_RX_HASH;
 		}
 	} else if ((rss_inner && (items & MLX5_FLOW_LAYER_INNER_L3_IPV6)) ||
-		   (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L3_IPV6)) ||
-		   !items) {
+		   (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L3_IPV6))) {
 		if (rss_types & MLX5_IPV6_LAYER_TYPES) {
 			if (rss_types & RTE_ETH_RSS_L3_SRC_ONLY)
 				fields |= IBV_RX_HASH_SRC_IPV6;
@@ -11309,8 +11307,7 @@ flow_dv_hashfields_set(uint64_t item_flags,
 		return;
 	}
 	if ((rss_inner && (items & MLX5_FLOW_LAYER_INNER_L4_UDP)) ||
-	    (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L4_UDP)) ||
-	    !items) {
+	    (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L4_UDP))) {
 		if (rss_types & RTE_ETH_RSS_UDP) {
 			if (rss_types & RTE_ETH_RSS_L4_SRC_ONLY)
 				fields |= IBV_RX_HASH_SRC_PORT_UDP;
@@ -11320,8 +11317,7 @@ flow_dv_hashfields_set(uint64_t item_flags,
 				fields |= MLX5_UDP_IBV_RX_HASH;
 		}
 	} else if ((rss_inner && (items & MLX5_FLOW_LAYER_INNER_L4_TCP)) ||
-		   (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L4_TCP)) ||
-		   !items) {
+		   (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L4_TCP))) {
 		if (rss_types & RTE_ETH_RSS_TCP) {
 			if (rss_types & RTE_ETH_RSS_L4_SRC_ONLY)
 				fields |= IBV_RX_HASH_SRC_PORT_TCP;
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index fecf28c1ca..d46e4c6769 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -62,6 +62,63 @@ flow_hw_rxq_flag_set(struct rte_eth_dev *dev, bool enable)
 	priv->mark_enabled = enable;
 }
 
+/**
+ * Set the hash fields according to the @p rss_desc information.
+ *
+ * @param[in] rss_desc
+ *   Pointer to the mlx5_flow_rss_desc.
+ * @param[out] hash_fields
+ *   Pointer to the RSS hash fields.
+ */
+static void
+flow_hw_hashfields_set(struct mlx5_flow_rss_desc *rss_desc,
+		       uint64_t *hash_fields)
+{
+	uint64_t fields = 0;
+	int rss_inner = 0;
+	uint64_t rss_types = rte_eth_rss_hf_refine(rss_desc->types);
+
+#ifdef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
+	if (rss_desc->level >= 2)
+		rss_inner = 1;
+#endif
+	if (rss_types & MLX5_IPV4_LAYER_TYPES) {
+		if (rss_types & RTE_ETH_RSS_L3_SRC_ONLY)
+			fields |= IBV_RX_HASH_SRC_IPV4;
+		else if (rss_types & RTE_ETH_RSS_L3_DST_ONLY)
+			fields |= IBV_RX_HASH_DST_IPV4;
+		else
+			fields |= MLX5_IPV4_IBV_RX_HASH;
+	} else if (rss_types & MLX5_IPV6_LAYER_TYPES) {
+		if (rss_types & RTE_ETH_RSS_L3_SRC_ONLY)
+			fields |= IBV_RX_HASH_SRC_IPV6;
+		else if (rss_types & RTE_ETH_RSS_L3_DST_ONLY)
+			fields |= IBV_RX_HASH_DST_IPV6;
+		else
+			fields |= MLX5_IPV6_IBV_RX_HASH;
+	}
+	if (rss_types & RTE_ETH_RSS_UDP) {
+		if (rss_types & RTE_ETH_RSS_L4_SRC_ONLY)
+			fields |= IBV_RX_HASH_SRC_PORT_UDP;
+		else if (rss_types & RTE_ETH_RSS_L4_DST_ONLY)
+			fields |= IBV_RX_HASH_DST_PORT_UDP;
+		else
+			fields |= MLX5_UDP_IBV_RX_HASH;
+	} else if (rss_types & RTE_ETH_RSS_TCP) {
+		if (rss_types & RTE_ETH_RSS_L4_SRC_ONLY)
+			fields |= IBV_RX_HASH_SRC_PORT_TCP;
+		else if (rss_types & RTE_ETH_RSS_L4_DST_ONLY)
+			fields |= IBV_RX_HASH_DST_PORT_TCP;
+		else
+			fields |= MLX5_TCP_IBV_RX_HASH;
+	}
+	if (rss_types & RTE_ETH_RSS_ESP)
+		fields |= IBV_RX_HASH_IPSEC_SPI;
+	if (rss_inner)
+		fields |= IBV_RX_HASH_INNER;
+	*hash_fields = fields;
+}
+
 /**
  * Generate the pattern item flags.
  * Will be used for shared RSS action.
@@ -225,7 +282,7 @@ flow_hw_tir_action_register(struct rte_eth_dev *dev,
 		       MLX5_RSS_HASH_KEY_LEN);
 		rss_desc.key_len = MLX5_RSS_HASH_KEY_LEN;
 		rss_desc.types = !rss->types ? RTE_ETH_RSS_IP : rss->types;
-		flow_dv_hashfields_set(0, &rss_desc, &rss_desc.hash_fields);
+		flow_hw_hashfields_set(&rss_desc, &rss_desc.hash_fields);
 		flow_dv_action_rss_l34_hash_adjust(rss->types,
 						   &rss_desc.hash_fields);
 		if (rss->level > 1) {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v5 03/18] net/mlx5: add shared header reformat support
  2022-10-20  3:21 ` [PATCH v5 00/18] net/mlx5: HW steering PMD update Suanming Mou
  2022-10-20  3:21   ` [PATCH v5 01/18] net/mlx5: fix invalid flow attributes Suanming Mou
  2022-10-20  3:21   ` [PATCH v5 02/18] net/mlx5: fix IPv6 and TCP RSS hash fields Suanming Mou
@ 2022-10-20  3:21   ` Suanming Mou
  2022-10-20  3:21   ` [PATCH v5 04/18] net/mlx5: add modify field hws support Suanming Mou
                     ` (14 subsequent siblings)
  17 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-10-20  3:21 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika

As the rte_flow async API defines, an action mask with a non-zero field
value means the action is shared by all the flows in the table.

A header reformat action with a non-zero action mask field will be
created as a constant shared action. For encapsulation header reformat
actions, there are two kinds of encapsulation data: raw_encap_data and
rte_flow_item encap_data. Both kinds of data can be identified from the
action mask conf as constant or not.

Examples:
1. VXLAN encap (encap_data: rte_flow_item)
	action conf (eth/ipv4/udp/vxlan_hdr)

	a. action mask conf (eth/ipv4/udp/vxlan_hdr)
	  - items are constant.
	b. action mask conf (NULL)
	  - items will change.

2. RAW encap (encap_data: raw)
	action conf (raw_data)

	a. action mask conf (not NULL)
	  - encap_data constant.
	b. action mask conf (NULL)
	  - encap_data will change.
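
To illustrate cases 2a/2b above, a minimal usage sketch of the async
template API (not part of this patch; the header buffer, attributes and
error handling are placeholder assumptions):

#include <stdbool.h>
#include <stdint.h>
#include <rte_flow.h>

/* Hypothetical pre-built encapsulation header. */
static uint8_t encap_hdr[64];

static struct rte_flow_actions_template *
create_raw_encap_template(uint16_t port_id, bool constant_encap,
			  struct rte_flow_error *error)
{
	const struct rte_flow_actions_template_attr tmpl_attr = {
		.ingress = 1,
	};
	const struct rte_flow_action_raw_encap encap_conf = {
		.data = encap_hdr,
		.size = sizeof(encap_hdr),
	};
	const struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_RAW_ENCAP, .conf = &encap_conf },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	/*
	 * Case 2a: non-NULL mask conf -> encap data is constant, the PMD can
	 * create one shared reformat action for the whole table.
	 * Case 2b: NULL mask conf -> encap data is provided per flow rule.
	 */
	const struct rte_flow_action masks[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_RAW_ENCAP,
		  .conf = constant_encap ? &encap_conf : NULL },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};

	return rte_flow_actions_template_create(port_id, &tmpl_attr,
						actions, masks, error);
}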

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/net/mlx5/mlx5_flow.h    |   6 +-
 drivers/net/mlx5/mlx5_flow_hw.c | 124 ++++++++++----------------------
 2 files changed, 39 insertions(+), 91 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index cde602d3a1..26660da0de 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1075,10 +1075,6 @@ struct mlx5_action_construct_data {
 	uint16_t action_dst; /* mlx5dr_rule_action dst offset. */
 	union {
 		struct {
-			/* encap src(item) offset. */
-			uint16_t src;
-			/* encap dst data offset. */
-			uint16_t dst;
 			/* encap data len. */
 			uint16_t len;
 		} encap;
@@ -1121,6 +1117,8 @@ struct mlx5_hw_jump_action {
 /* Encap decap action struct. */
 struct mlx5_hw_encap_decap_action {
 	struct mlx5dr_action *action; /* Action object. */
+	/* Is header_reformat action shared across flows in table. */
+	bool shared;
 	size_t data_size; /* Action metadata size. */
 	uint8_t data[]; /* Action data. */
 };
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index d46e4c6769..e62d25fda2 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -402,10 +402,6 @@ __flow_hw_act_data_general_append(struct mlx5_priv *priv,
  *   Offset of source rte flow action.
  * @param[in] action_dst
  *   Offset of destination DR action.
- * @param[in] encap_src
- *   Offset of source encap raw data.
- * @param[in] encap_dst
- *   Offset of destination encap raw data.
  * @param[in] len
  *   Length of the data to be updated.
  *
@@ -418,16 +414,12 @@ __flow_hw_act_data_encap_append(struct mlx5_priv *priv,
 				enum rte_flow_action_type type,
 				uint16_t action_src,
 				uint16_t action_dst,
-				uint16_t encap_src,
-				uint16_t encap_dst,
 				uint16_t len)
 {	struct mlx5_action_construct_data *act_data;
 
 	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
 	if (!act_data)
 		return -1;
-	act_data->encap.src = encap_src;
-	act_data->encap.dst = encap_dst;
 	act_data->encap.len = len;
 	LIST_INSERT_HEAD(&acts->act_list, act_data, next);
 	return 0;
@@ -523,53 +515,6 @@ flow_hw_shared_action_translate(struct rte_eth_dev *dev,
 	return 0;
 }
 
-/**
- * Translate encap items to encapsulation list.
- *
- * @param[in] dev
- *   Pointer to the rte_eth_dev data structure.
- * @param[in] acts
- *   Pointer to the template HW steering DR actions.
- * @param[in] type
- *   Action type.
- * @param[in] action_src
- *   Offset of source rte flow action.
- * @param[in] action_dst
- *   Offset of destination DR action.
- * @param[in] items
- *   Encap item pattern.
- * @param[in] items_m
- *   Encap item mask indicates which part are constant and dynamic.
- *
- * @return
- *    0 on success, negative value otherwise and rte_errno is set.
- */
-static __rte_always_inline int
-flow_hw_encap_item_translate(struct rte_eth_dev *dev,
-			     struct mlx5_hw_actions *acts,
-			     enum rte_flow_action_type type,
-			     uint16_t action_src,
-			     uint16_t action_dst,
-			     const struct rte_flow_item *items,
-			     const struct rte_flow_item *items_m)
-{
-	struct mlx5_priv *priv = dev->data->dev_private;
-	size_t len, total_len = 0;
-	uint32_t i = 0;
-
-	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++, items_m++, i++) {
-		len = flow_dv_get_item_hdr_len(items->type);
-		if ((!items_m->spec ||
-		    memcmp(items_m->spec, items->spec, len)) &&
-		    __flow_hw_act_data_encap_append(priv, acts, type,
-						    action_src, action_dst, i,
-						    total_len, len))
-			return -1;
-		total_len += len;
-	}
-	return 0;
-}
-
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -611,7 +556,7 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 	const struct rte_flow_action_raw_encap *raw_encap_data;
 	const struct rte_flow_item *enc_item = NULL, *enc_item_m = NULL;
 	uint16_t reformat_pos = MLX5_HW_MAX_ACTS, reformat_src = 0;
-	uint8_t *encap_data = NULL;
+	uint8_t *encap_data = NULL, *encap_data_m = NULL;
 	size_t data_size = 0;
 	bool actions_end = false;
 	uint32_t type, i;
@@ -718,9 +663,9 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			MLX5_ASSERT(reformat_pos == MLX5_HW_MAX_ACTS);
 			enc_item = ((const struct rte_flow_action_vxlan_encap *)
 				   actions->conf)->definition;
-			enc_item_m =
-				((const struct rte_flow_action_vxlan_encap *)
-				 masks->conf)->definition;
+			if (masks->conf)
+				enc_item_m = ((const struct rte_flow_action_vxlan_encap *)
+					     masks->conf)->definition;
 			reformat_pos = i++;
 			reformat_src = actions - action_start;
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_L2_TO_TNL_L2;
@@ -729,9 +674,9 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			MLX5_ASSERT(reformat_pos == MLX5_HW_MAX_ACTS);
 			enc_item = ((const struct rte_flow_action_nvgre_encap *)
 				   actions->conf)->definition;
-			enc_item_m =
-				((const struct rte_flow_action_nvgre_encap *)
-				actions->conf)->definition;
+			if (masks->conf)
+				enc_item_m = ((const struct rte_flow_action_nvgre_encap *)
+					     masks->conf)->definition;
 			reformat_pos = i++;
 			reformat_src = actions - action_start;
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_L2_TO_TNL_L2;
@@ -743,6 +688,11 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_TNL_L2_TO_L2;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
+			raw_encap_data =
+				(const struct rte_flow_action_raw_encap *)
+				 masks->conf;
+			if (raw_encap_data)
+				encap_data_m = raw_encap_data->data;
 			raw_encap_data =
 				(const struct rte_flow_action_raw_encap *)
 				 actions->conf;
@@ -773,22 +723,17 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 	}
 	if (reformat_pos != MLX5_HW_MAX_ACTS) {
 		uint8_t buf[MLX5_ENCAP_MAX_LEN];
+		bool shared_rfmt = true;
 
 		if (enc_item) {
 			MLX5_ASSERT(!encap_data);
-			if (flow_dv_convert_encap_data
-				(enc_item, buf, &data_size, error) ||
-			    flow_hw_encap_item_translate
-				(dev, acts, (action_start + reformat_src)->type,
-				 reformat_src, reformat_pos,
-				 enc_item, enc_item_m))
+			if (flow_dv_convert_encap_data(enc_item, buf, &data_size, error))
 				goto err;
 			encap_data = buf;
-		} else if (encap_data && __flow_hw_act_data_encap_append
-				(priv, acts,
-				 (action_start + reformat_src)->type,
-				 reformat_src, reformat_pos, 0, 0, data_size)) {
-			goto err;
+			if (!enc_item_m)
+				shared_rfmt = false;
+		} else if (encap_data && !encap_data_m) {
+			shared_rfmt = false;
 		}
 		acts->encap_decap = mlx5_malloc(MLX5_MEM_ZERO,
 				    sizeof(*acts->encap_decap) + data_size,
@@ -802,12 +747,22 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 		acts->encap_decap->action = mlx5dr_action_create_reformat
 				(priv->dr_ctx, refmt_type,
 				 data_size, encap_data,
-				 rte_log2_u32(table_attr->nb_flows),
-				 mlx5_hw_act_flag[!!attr->group][type]);
+				 shared_rfmt ? 0 : rte_log2_u32(table_attr->nb_flows),
+				 mlx5_hw_act_flag[!!attr->group][type] |
+				 (shared_rfmt ? MLX5DR_ACTION_FLAG_SHARED : 0));
 		if (!acts->encap_decap->action)
 			goto err;
 		acts->rule_acts[reformat_pos].action =
 						acts->encap_decap->action;
+		acts->rule_acts[reformat_pos].reformat.data =
+						acts->encap_decap->data;
+		if (shared_rfmt)
+			acts->rule_acts[reformat_pos].reformat.offset = 0;
+		else if (__flow_hw_act_data_encap_append(priv, acts,
+				 (action_start + reformat_src)->type,
+				 reformat_src, reformat_pos, data_size))
+			goto err;
+		acts->encap_decap->shared = shared_rfmt;
 		acts->encap_decap_pos = reformat_pos;
 	}
 	acts->acts_num = i;
@@ -972,6 +927,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			.ingress = 1,
 	};
 	uint32_t ft_flag;
+	size_t encap_len = 0;
 
 	memcpy(rule_acts, hw_acts->rule_acts,
 	       sizeof(*rule_acts) * hw_acts->acts_num);
@@ -989,9 +945,6 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	} else {
 		attr.ingress = 1;
 	}
-	if (hw_acts->encap_decap && hw_acts->encap_decap->data_size)
-		memcpy(buf, hw_acts->encap_decap->data,
-		       hw_acts->encap_decap->data_size);
 	LIST_FOREACH(act_data, &hw_acts->act_list, next) {
 		uint32_t jump_group;
 		uint32_t tag;
@@ -1050,23 +1003,20 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
 			enc_item = ((const struct rte_flow_action_vxlan_encap *)
 				   action->conf)->definition;
-			rte_memcpy((void *)&buf[act_data->encap.dst],
-				   enc_item[act_data->encap.src].spec,
-				   act_data->encap.len);
+			if (flow_dv_convert_encap_data(enc_item, buf, &encap_len, NULL))
+				return -1;
 			break;
 		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
 			enc_item = ((const struct rte_flow_action_nvgre_encap *)
 				   action->conf)->definition;
-			rte_memcpy((void *)&buf[act_data->encap.dst],
-				   enc_item[act_data->encap.src].spec,
-				   act_data->encap.len);
+			if (flow_dv_convert_encap_data(enc_item, buf, &encap_len, NULL))
+				return -1;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
 			raw_encap_data =
 				(const struct rte_flow_action_raw_encap *)
 				 action->conf;
-			rte_memcpy((void *)&buf[act_data->encap.dst],
-				   raw_encap_data->data, act_data->encap.len);
+			rte_memcpy((void *)buf, raw_encap_data->data, act_data->encap.len);
 			MLX5_ASSERT(raw_encap_data->size ==
 				    act_data->encap.len);
 			break;
@@ -1074,7 +1024,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			break;
 		}
 	}
-	if (hw_acts->encap_decap) {
+	if (hw_acts->encap_decap && !hw_acts->encap_decap->shared) {
 		rule_acts[hw_acts->encap_decap_pos].reformat.offset =
 				job->flow->idx - 1;
 		rule_acts[hw_acts->encap_decap_pos].reformat.data = buf;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v5 04/18] net/mlx5: add modify field hws support
  2022-10-20  3:21 ` [PATCH v5 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (2 preceding siblings ...)
  2022-10-20  3:21   ` [PATCH v5 03/18] net/mlx5: add shared header reformat support Suanming Mou
@ 2022-10-20  3:21   ` Suanming Mou
  2022-10-20  3:21   ` [PATCH v5 05/18] net/mlx5: add HW steering port action Suanming Mou
                     ` (13 subsequent siblings)
  17 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-10-20  3:21 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Dariusz Sosnowski

This patch introduces support for modify_field rte_flow actions in HWS
mode. Support includes:

- Ingress and egress domains,
- SET and ADD operations,
- usage of arbitrary bit offsets and widths for packet and metadata
  fields.

Support is implemented in two phases:

1. On flow table creation the hardware commands are generated, based
   on rte_flow action templates, and stored alongside the action template.
2. On flow rule creation/queueing the hardware commands are updated with
   values provided by the user. Any masks over immediate values, provided
   in action templates, are applied to these values before enqueueing rules
   for creation (see the sketch below).
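
As an illustration of the masked-value semantics, a minimal sketch using
the public template API (not part of this patch; the TTL value,
attributes and error handling are placeholder assumptions):

#include <string.h>
#include <stdbool.h>
#include <stdint.h>
#include <rte_flow.h>

static struct rte_flow_actions_template *
create_set_ttl_template(uint16_t port_id, bool constant_ttl,
			struct rte_flow_error *error)
{
	const struct rte_flow_actions_template_attr tmpl_attr = { .ingress = 1 };
	struct rte_flow_action_modify_field conf = {
		.operation = RTE_FLOW_MODIFY_SET,
		.dst = { .field = RTE_FLOW_FIELD_IPV4_TTL },
		.src = { .field = RTE_FLOW_FIELD_VALUE, .value = { 64 } },
		.width = 8,
	};
	struct rte_flow_action_modify_field mask = {
		.operation = RTE_FLOW_MODIFY_SET,
		.dst = { .field = RTE_FLOW_FIELD_IPV4_TTL },
		.src = { .field = RTE_FLOW_FIELD_VALUE },
		.width = 8,
	};
	const struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD, .conf = &conf },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	const struct rte_flow_action masks[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD, .conf = &mask },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};

	/*
	 * A fully masked immediate marks the TTL value as constant, so the
	 * PMD can precompile the modify-header command at table creation
	 * (phase 1). A zero mask keeps the value per rule; it is patched in
	 * at flow creation/queueing time (phase 2).
	 */
	if (constant_ttl)
		memset(mask.src.value, 0xff, sizeof(mask.src.value));
	return rte_flow_actions_template_create(port_id, &tmpl_attr,
						actions, masks, error);
}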

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 doc/guides/rel_notes/release_22_11.rst |   3 +-
 drivers/common/mlx5/mlx5_prm.h         |   2 +
 drivers/net/mlx5/linux/mlx5_os.c       |  18 +-
 drivers/net/mlx5/mlx5.h                |   1 +
 drivers/net/mlx5/mlx5_flow.h           |  96 ++++
 drivers/net/mlx5/mlx5_flow_dv.c        | 551 +++++++++++-----------
 drivers/net/mlx5/mlx5_flow_hw.c        | 614 ++++++++++++++++++++++++-
 7 files changed, 1009 insertions(+), 276 deletions(-)

diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index eed7acc838..bac805dc0e 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -239,7 +239,8 @@ New Features
 
 * **Updated Nvidia mlx5 driver.**
 
-  * Added fully support for queue based async HW steering to the PMD.
+  * Added fully support for queue based async HW steering to the PMD:
+    - Support of modify fields.
 
 * **Rewritten pmdinfo script.**
 
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 371942ae50..fb3c43eed9 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -751,6 +751,8 @@ enum mlx5_modification_field {
 	MLX5_MODI_IN_TCP_ACK_NUM = 0x5C,
 	MLX5_MODI_GTP_TEID = 0x6E,
 	MLX5_MODI_OUT_IP_ECN = 0x73,
+	MLX5_MODI_TUNNEL_HDR_DW_1 = 0x75,
+	MLX5_MODI_GTPU_FIRST_EXT_DW_0 = 0x76,
 };
 
 /* Total number of metadata reg_c's. */
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index aed55e6a62..b7cc11a2ef 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1540,6 +1540,15 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 				       mlx5_hrxq_clone_free_cb);
 	if (!priv->hrxqs)
 		goto error;
+	mlx5_set_metadata_mask(eth_dev);
+	if (sh->config.dv_xmeta_en != MLX5_XMETA_MODE_LEGACY &&
+	    !priv->sh->dv_regc0_mask) {
+		DRV_LOG(ERR, "metadata mode %u is not supported "
+			     "(no metadata reg_c[0] is available)",
+			     sh->config.dv_xmeta_en);
+			err = ENOTSUP;
+			goto error;
+	}
 	rte_rwlock_init(&priv->ind_tbls_lock);
 	if (priv->sh->config.dv_flow_en == 2) {
 #ifdef HAVE_IBV_FLOW_DV_SUPPORT
@@ -1566,15 +1575,6 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 		err = -err;
 		goto error;
 	}
-	mlx5_set_metadata_mask(eth_dev);
-	if (sh->config.dv_xmeta_en != MLX5_XMETA_MODE_LEGACY &&
-	    !priv->sh->dv_regc0_mask) {
-		DRV_LOG(ERR, "metadata mode %u is not supported "
-			     "(no metadata reg_c[0] is available)",
-			     sh->config.dv_xmeta_en);
-			err = ENOTSUP;
-			goto error;
-	}
 	/* Query availability of metadata reg_c's. */
 	if (!priv->sh->metadata_regc_check_flag) {
 		err = mlx5_flow_discover_mreg_c(eth_dev);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 1d3c1ad93d..6f75a32488 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -348,6 +348,7 @@ struct mlx5_hw_q_job {
 	struct rte_flow_hw *flow; /* Flow attached to the job. */
 	void *user_data; /* Job user data. */
 	uint8_t *encap_data; /* Encap data. */
+	struct mlx5_modification_cmd *mhdr_cmd;
 };
 
 /* HW steering job descriptor LIFO pool. */
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 26660da0de..407b3d79bd 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1010,6 +1010,51 @@ flow_items_to_tunnel(const struct rte_flow_item items[])
 	return items[0].spec;
 }
 
+/**
+ * Fetch 1, 2, 3 or 4 byte field from the byte array
+ * and return as unsigned integer in host-endian format.
+ *
+ * @param[in] data
+ *   Pointer to data array.
+ * @param[in] size
+ *   Size of field to extract.
+ *
+ * @return
+ *   converted field in host endian format.
+ */
+static inline uint32_t
+flow_dv_fetch_field(const uint8_t *data, uint32_t size)
+{
+	uint32_t ret;
+
+	switch (size) {
+	case 1:
+		ret = *data;
+		break;
+	case 2:
+		ret = rte_be_to_cpu_16(*(const unaligned_uint16_t *)data);
+		break;
+	case 3:
+		ret = rte_be_to_cpu_16(*(const unaligned_uint16_t *)data);
+		ret = (ret << 8) | *(data + sizeof(uint16_t));
+		break;
+	case 4:
+		ret = rte_be_to_cpu_32(*(const unaligned_uint32_t *)data);
+		break;
+	default:
+		MLX5_ASSERT(false);
+		ret = 0;
+		break;
+	}
+	return ret;
+}
+
+struct field_modify_info {
+	uint32_t size; /* Size of field in protocol header, in bytes. */
+	uint32_t offset; /* Offset of field in protocol header, in bytes. */
+	enum mlx5_modification_field id;
+};
+
 /* HW steering flow attributes. */
 struct mlx5_flow_attr {
 	uint32_t port_id; /* Port index. */
@@ -1078,6 +1123,29 @@ struct mlx5_action_construct_data {
 			/* encap data len. */
 			uint16_t len;
 		} encap;
+		struct {
+			/* Modify header action offset in pattern. */
+			uint16_t mhdr_cmds_off;
+			/* Offset in pattern after modify header actions. */
+			uint16_t mhdr_cmds_end;
+			/*
+			 * True if this action is masked and does not need to
+			 * be generated.
+			 */
+			bool shared;
+			/*
+			 * Modified field definitions in dst field (SET, ADD)
+			 * or src field (COPY).
+			 */
+			struct field_modify_info field[MLX5_ACT_MAX_MOD_FIELDS];
+			/* Modified field definitions in dst field (COPY). */
+			struct field_modify_info dcopy[MLX5_ACT_MAX_MOD_FIELDS];
+			/*
+			 * Masks applied to field values to generate
+			 * PRM actions.
+			 */
+			uint32_t mask[MLX5_ACT_MAX_MOD_FIELDS];
+		} modify_header;
 		struct {
 			uint64_t types; /* RSS hash types. */
 			uint32_t level; /* RSS level. */
@@ -1103,6 +1171,7 @@ struct rte_flow_actions_template {
 	struct rte_flow_actions_template_attr attr;
 	struct rte_flow_action *actions; /* Cached flow actions. */
 	struct rte_flow_action *masks; /* Cached action masks.*/
+	uint16_t mhdr_off; /* Offset of DR modify header action. */
 	uint32_t refcnt; /* Reference counter. */
 };
 
@@ -1123,6 +1192,22 @@ struct mlx5_hw_encap_decap_action {
 	uint8_t data[]; /* Action data. */
 };
 
+#define MLX5_MHDR_MAX_CMD ((MLX5_MAX_MODIFY_NUM) * 2 + 1)
+
+/* Modify field action struct. */
+struct mlx5_hw_modify_header_action {
+	/* Reference to DR action */
+	struct mlx5dr_action *action;
+	/* Modify header action position in action rule table. */
+	uint16_t pos;
+	/* Is MODIFY_HEADER action shared across flows in table. */
+	bool shared;
+	/* Amount of modification commands stored in the precompiled buffer. */
+	uint32_t mhdr_cmds_num;
+	/* Precompiled modification commands. */
+	struct mlx5_modification_cmd mhdr_cmds[MLX5_MHDR_MAX_CMD];
+};
+
 /* The maximum actions support in the flow. */
 #define MLX5_HW_MAX_ACTS 16
 
@@ -1132,6 +1217,7 @@ struct mlx5_hw_actions {
 	LIST_HEAD(act_list, mlx5_action_construct_data) act_list;
 	struct mlx5_hw_jump_action *jump; /* Jump action. */
 	struct mlx5_hrxq *tir; /* TIR action. */
+	struct mlx5_hw_modify_header_action *mhdr; /* Modify header action. */
 	/* Encap/Decap action. */
 	struct mlx5_hw_encap_decap_action *encap_decap;
 	uint16_t encap_decap_pos; /* Encap/Decap action position. */
@@ -2238,6 +2324,16 @@ int flow_dv_action_query(struct rte_eth_dev *dev,
 size_t flow_dv_get_item_hdr_len(const enum rte_flow_item_type item_type);
 int flow_dv_convert_encap_data(const struct rte_flow_item *items, uint8_t *buf,
 			   size_t *size, struct rte_flow_error *error);
+void mlx5_flow_field_id_to_modify_info
+		(const struct rte_flow_action_modify_data *data,
+		 struct field_modify_info *info, uint32_t *mask,
+		 uint32_t width, struct rte_eth_dev *dev,
+		 const struct rte_flow_attr *attr, struct rte_flow_error *error);
+int flow_dv_convert_modify_action(struct rte_flow_item *item,
+			      struct field_modify_info *field,
+			      struct field_modify_info *dcopy,
+			      struct mlx5_flow_dv_modify_hdr_resource *resource,
+			      uint32_t type, struct rte_flow_error *error);
 
 #define MLX5_PF_VPORT_ID 0
 #define MLX5_ECPF_VPORT_ID 0xFFFE
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 29d7bf7049..9fbaa4bfe8 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -216,12 +216,6 @@ flow_dv_attr_init(const struct rte_flow_item *item, union flow_dv_attr *attr,
 	attr->valid = 1;
 }
 
-struct field_modify_info {
-	uint32_t size; /* Size of field in protocol header, in bytes. */
-	uint32_t offset; /* Offset of field in protocol header, in bytes. */
-	enum mlx5_modification_field id;
-};
-
 struct field_modify_info modify_eth[] = {
 	{4,  0, MLX5_MODI_OUT_DMAC_47_16},
 	{2,  4, MLX5_MODI_OUT_DMAC_15_0},
@@ -354,45 +348,6 @@ mlx5_update_vlan_vid_pcp(const struct rte_flow_action *action,
 	}
 }
 
-/**
- * Fetch 1, 2, 3 or 4 byte field from the byte array
- * and return as unsigned integer in host-endian format.
- *
- * @param[in] data
- *   Pointer to data array.
- * @param[in] size
- *   Size of field to extract.
- *
- * @return
- *   converted field in host endian format.
- */
-static inline uint32_t
-flow_dv_fetch_field(const uint8_t *data, uint32_t size)
-{
-	uint32_t ret;
-
-	switch (size) {
-	case 1:
-		ret = *data;
-		break;
-	case 2:
-		ret = rte_be_to_cpu_16(*(const unaligned_uint16_t *)data);
-		break;
-	case 3:
-		ret = rte_be_to_cpu_16(*(const unaligned_uint16_t *)data);
-		ret = (ret << 8) | *(data + sizeof(uint16_t));
-		break;
-	case 4:
-		ret = rte_be_to_cpu_32(*(const unaligned_uint32_t *)data);
-		break;
-	default:
-		MLX5_ASSERT(false);
-		ret = 0;
-		break;
-	}
-	return ret;
-}
-
 /**
  * Convert modify-header action to DV specification.
  *
@@ -421,7 +376,7 @@ flow_dv_fetch_field(const uint8_t *data, uint32_t size)
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
-static int
+int
 flow_dv_convert_modify_action(struct rte_flow_item *item,
 			      struct field_modify_info *field,
 			      struct field_modify_info *dcopy,
@@ -1439,7 +1394,32 @@ mlx5_flow_item_field_width(struct rte_eth_dev *dev,
 	return 0;
 }
 
-static void
+static __rte_always_inline uint8_t
+flow_modify_info_mask_8(uint32_t length, uint32_t off)
+{
+	return (0xffu >> (8 - length)) << off;
+}
+
+static __rte_always_inline uint16_t
+flow_modify_info_mask_16(uint32_t length, uint32_t off)
+{
+	return rte_cpu_to_be_16((0xffffu >> (16 - length)) << off);
+}
+
+static __rte_always_inline uint32_t
+flow_modify_info_mask_32(uint32_t length, uint32_t off)
+{
+	return rte_cpu_to_be_32((0xffffffffu >> (32 - length)) << off);
+}
+
+static __rte_always_inline uint32_t
+flow_modify_info_mask_32_masked(uint32_t length, uint32_t off, uint32_t post_mask)
+{
+	uint32_t mask = (0xffffffffu >> (32 - length)) << off;
+	return rte_cpu_to_be_32(mask & post_mask);
+}
+
+void
 mlx5_flow_field_id_to_modify_info
 		(const struct rte_flow_action_modify_data *data,
 		 struct field_modify_info *info, uint32_t *mask,
@@ -1448,323 +1428,340 @@ mlx5_flow_field_id_to_modify_info
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	uint32_t idx = 0;
-	uint32_t off = 0;
-
-	switch (data->field) {
+	uint32_t off_be = 0;
+	uint32_t length = 0;
+	switch ((int)data->field) {
 	case RTE_FLOW_FIELD_START:
 		/* not supported yet */
 		MLX5_ASSERT(false);
 		break;
 	case RTE_FLOW_FIELD_MAC_DST:
-		off = data->offset > 16 ? data->offset - 16 : 0;
-		if (mask) {
-			if (data->offset < 16) {
-				info[idx] = (struct field_modify_info){2, 4,
-						MLX5_MODI_OUT_DMAC_15_0};
-				if (width < 16) {
-					mask[1] = rte_cpu_to_be_16(0xffff >>
-								 (16 - width));
-					width = 0;
-				} else {
-					mask[1] = RTE_BE16(0xffff);
-					width -= 16;
-				}
-				if (!width)
-					break;
-				++idx;
-			}
-			info[idx] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_DMAC_47_16};
-			mask[0] = rte_cpu_to_be_32((0xffffffff >>
-						    (32 - width)) << off);
+		MLX5_ASSERT(data->offset + width <= 48);
+		off_be = 48 - (data->offset + width);
+		if (off_be < 16) {
+			info[idx] = (struct field_modify_info){2, 4,
+					MLX5_MODI_OUT_DMAC_15_0};
+			length = off_be + width <= 16 ? width : 16 - off_be;
+			if (mask)
+				mask[1] = flow_modify_info_mask_16(length,
+								   off_be);
+			else
+				info[idx].offset = off_be;
+			width -= length;
+			if (!width)
+				break;
+			off_be = 0;
+			idx++;
 		} else {
-			if (data->offset < 16)
-				info[idx++] = (struct field_modify_info){2, 0,
-						MLX5_MODI_OUT_DMAC_15_0};
-			info[idx] = (struct field_modify_info){4, off,
-						MLX5_MODI_OUT_DMAC_47_16};
+			off_be -= 16;
 		}
+		info[idx] = (struct field_modify_info){4, 0,
+				MLX5_MODI_OUT_DMAC_47_16};
+		if (mask)
+			mask[0] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_MAC_SRC:
-		off = data->offset > 16 ? data->offset - 16 : 0;
-		if (mask) {
-			if (data->offset < 16) {
-				info[idx] = (struct field_modify_info){2, 4,
-						MLX5_MODI_OUT_SMAC_15_0};
-				if (width < 16) {
-					mask[1] = rte_cpu_to_be_16(0xffff >>
-								 (16 - width));
-					width = 0;
-				} else {
-					mask[1] = RTE_BE16(0xffff);
-					width -= 16;
-				}
-				if (!width)
-					break;
-				++idx;
-			}
-			info[idx] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_SMAC_47_16};
-			mask[0] = rte_cpu_to_be_32((0xffffffff >>
-						    (32 - width)) << off);
+		MLX5_ASSERT(data->offset + width <= 48);
+		off_be = 48 - (data->offset + width);
+		if (off_be < 16) {
+			info[idx] = (struct field_modify_info){2, 4,
+					MLX5_MODI_OUT_SMAC_15_0};
+			length = off_be + width <= 16 ? width : 16 - off_be;
+			if (mask)
+				mask[1] = flow_modify_info_mask_16(length,
+								   off_be);
+			else
+				info[idx].offset = off_be;
+			width -= length;
+			if (!width)
+				break;
+			off_be = 0;
+			idx++;
 		} else {
-			if (data->offset < 16)
-				info[idx++] = (struct field_modify_info){2, 0,
-						MLX5_MODI_OUT_SMAC_15_0};
-			info[idx] = (struct field_modify_info){4, off,
-						MLX5_MODI_OUT_SMAC_47_16};
+			off_be -= 16;
 		}
+		info[idx] = (struct field_modify_info){4, 0,
+				MLX5_MODI_OUT_SMAC_47_16};
+		if (mask)
+			mask[0] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_VLAN_TYPE:
 		/* not supported yet */
 		break;
 	case RTE_FLOW_FIELD_VLAN_ID:
+		MLX5_ASSERT(data->offset + width <= 12);
+		off_be = 12 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_FIRST_VID};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0x0fff >> (12 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_MAC_TYPE:
+		MLX5_ASSERT(data->offset + width <= 16);
+		off_be = 16 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_ETHERTYPE};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0xffff >> (16 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_IPV4_DSCP:
+		MLX5_ASSERT(data->offset + width <= 6);
+		off_be = 6 - (data->offset + width);
 		info[idx] = (struct field_modify_info){1, 0,
 					MLX5_MODI_OUT_IP_DSCP};
 		if (mask)
-			mask[idx] = 0x3f >> (6 - width);
+			mask[idx] = flow_modify_info_mask_8(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_IPV4_TTL:
+		MLX5_ASSERT(data->offset + width <= 8);
+		off_be = 8 - (data->offset + width);
 		info[idx] = (struct field_modify_info){1, 0,
 					MLX5_MODI_OUT_IPV4_TTL};
 		if (mask)
-			mask[idx] = 0xff >> (8 - width);
+			mask[idx] = flow_modify_info_mask_8(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_IPV4_SRC:
+		MLX5_ASSERT(data->offset + width <= 32);
+		off_be = 32 - (data->offset + width);
 		info[idx] = (struct field_modify_info){4, 0,
 					MLX5_MODI_OUT_SIPV4};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_32(0xffffffff >>
-						     (32 - width));
+			mask[idx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_IPV4_DST:
+		MLX5_ASSERT(data->offset + width <= 32);
+		off_be = 32 - (data->offset + width);
 		info[idx] = (struct field_modify_info){4, 0,
 					MLX5_MODI_OUT_DIPV4};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_32(0xffffffff >>
-						     (32 - width));
+			mask[idx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_IPV6_DSCP:
+		MLX5_ASSERT(data->offset + width <= 6);
+		off_be = 6 - (data->offset + width);
 		info[idx] = (struct field_modify_info){1, 0,
 					MLX5_MODI_OUT_IP_DSCP};
 		if (mask)
-			mask[idx] = 0x3f >> (6 - width);
+			mask[idx] = flow_modify_info_mask_8(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_IPV6_HOPLIMIT:
+		MLX5_ASSERT(data->offset + width <= 8);
+		off_be = 8 - (data->offset + width);
 		info[idx] = (struct field_modify_info){1, 0,
 					MLX5_MODI_OUT_IPV6_HOPLIMIT};
 		if (mask)
-			mask[idx] = 0xff >> (8 - width);
+			mask[idx] = flow_modify_info_mask_8(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
-	case RTE_FLOW_FIELD_IPV6_SRC:
-		if (mask) {
-			if (data->offset < 32) {
-				info[idx] = (struct field_modify_info){4, 12,
-						MLX5_MODI_OUT_SIPV6_31_0};
-				if (width < 32) {
-					mask[3] =
-						rte_cpu_to_be_32(0xffffffff >>
-								 (32 - width));
-					width = 0;
-				} else {
-					mask[3] = RTE_BE32(0xffffffff);
-					width -= 32;
-				}
-				if (!width)
-					break;
-				++idx;
-			}
-			if (data->offset < 64) {
-				info[idx] = (struct field_modify_info){4, 8,
-						MLX5_MODI_OUT_SIPV6_63_32};
-				if (width < 32) {
-					mask[2] =
-						rte_cpu_to_be_32(0xffffffff >>
-								 (32 - width));
-					width = 0;
-				} else {
-					mask[2] = RTE_BE32(0xffffffff);
-					width -= 32;
-				}
-				if (!width)
-					break;
-				++idx;
-			}
-			if (data->offset < 96) {
-				info[idx] = (struct field_modify_info){4, 4,
-						MLX5_MODI_OUT_SIPV6_95_64};
-				if (width < 32) {
-					mask[1] =
-						rte_cpu_to_be_32(0xffffffff >>
-								 (32 - width));
-					width = 0;
-				} else {
-					mask[1] = RTE_BE32(0xffffffff);
-					width -= 32;
-				}
-				if (!width)
-					break;
-				++idx;
+	case RTE_FLOW_FIELD_IPV6_SRC: {
+		/*
+		 * Fields corresponding to IPv6 source address bytes
+		 * arranged according to network byte ordering.
+		 */
+		struct field_modify_info fields[] = {
+			{ 4, 0, MLX5_MODI_OUT_SIPV6_127_96 },
+			{ 4, 4, MLX5_MODI_OUT_SIPV6_95_64 },
+			{ 4, 8, MLX5_MODI_OUT_SIPV6_63_32 },
+			{ 4, 12, MLX5_MODI_OUT_SIPV6_31_0 },
+		};
+		/* First mask to be modified is the mask of 4th address byte. */
+		uint32_t midx = 3;
+
+		MLX5_ASSERT(data->offset + width <= 128);
+		off_be = 128 - (data->offset + width);
+		while (width > 0 && midx > 0) {
+			if (off_be < 32) {
+				info[idx] = fields[midx];
+				length = off_be + width <= 32 ?
+					 width : 32 - off_be;
+				if (mask)
+					mask[midx] = flow_modify_info_mask_32
+						(length, off_be);
+				else
+					info[idx].offset = off_be;
+				width -= length;
+				off_be = 0;
+				idx++;
+			} else {
+				off_be -= 32;
 			}
-			info[idx] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_SIPV6_127_96};
-			mask[0] = rte_cpu_to_be_32(0xffffffff >> (32 - width));
-		} else {
-			if (data->offset < 32)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_SIPV6_31_0};
-			if (data->offset < 64)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_SIPV6_63_32};
-			if (data->offset < 96)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_SIPV6_95_64};
-			if (data->offset < 128)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_SIPV6_127_96};
+			midx--;
 		}
+		if (!width)
+			break;
+		info[idx] = fields[midx];
+		if (mask)
+			mask[midx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
-	case RTE_FLOW_FIELD_IPV6_DST:
-		if (mask) {
-			if (data->offset < 32) {
-				info[idx] = (struct field_modify_info){4, 12,
-						MLX5_MODI_OUT_DIPV6_31_0};
-				if (width < 32) {
-					mask[3] =
-						rte_cpu_to_be_32(0xffffffff >>
-								 (32 - width));
-					width = 0;
-				} else {
-					mask[3] = RTE_BE32(0xffffffff);
-					width -= 32;
-				}
-				if (!width)
-					break;
-				++idx;
-			}
-			if (data->offset < 64) {
-				info[idx] = (struct field_modify_info){4, 8,
-						MLX5_MODI_OUT_DIPV6_63_32};
-				if (width < 32) {
-					mask[2] =
-						rte_cpu_to_be_32(0xffffffff >>
-								 (32 - width));
-					width = 0;
-				} else {
-					mask[2] = RTE_BE32(0xffffffff);
-					width -= 32;
-				}
-				if (!width)
-					break;
-				++idx;
-			}
-			if (data->offset < 96) {
-				info[idx] = (struct field_modify_info){4, 4,
-						MLX5_MODI_OUT_DIPV6_95_64};
-				if (width < 32) {
-					mask[1] =
-						rte_cpu_to_be_32(0xffffffff >>
-								 (32 - width));
-					width = 0;
-				} else {
-					mask[1] = RTE_BE32(0xffffffff);
-					width -= 32;
-				}
-				if (!width)
-					break;
-				++idx;
+	}
+	case RTE_FLOW_FIELD_IPV6_DST: {
+		/*
+		 * Fields corresponding to IPv6 destination address bytes
+		 * arranged according to network byte ordering.
+		 */
+		struct field_modify_info fields[] = {
+			{ 4, 0, MLX5_MODI_OUT_DIPV6_127_96 },
+			{ 4, 4, MLX5_MODI_OUT_DIPV6_95_64 },
+			{ 4, 8, MLX5_MODI_OUT_DIPV6_63_32 },
+			{ 4, 12, MLX5_MODI_OUT_DIPV6_31_0 },
+		};
+		/* First mask to be modified is the mask of 4th address byte. */
+		uint32_t midx = 3;
+
+		MLX5_ASSERT(data->offset + width <= 128);
+		off_be = 128 - (data->offset + width);
+		while (width > 0 && midx > 0) {
+			if (off_be < 32) {
+				info[idx] = fields[midx];
+				length = off_be + width <= 32 ?
+					 width : 32 - off_be;
+				if (mask)
+					mask[midx] = flow_modify_info_mask_32
+						(length, off_be);
+				else
+					info[idx].offset = off_be;
+				width -= length;
+				off_be = 0;
+				idx++;
+			} else {
+				off_be -= 32;
 			}
-			info[idx] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_DIPV6_127_96};
-			mask[0] = rte_cpu_to_be_32(0xffffffff >> (32 - width));
-		} else {
-			if (data->offset < 32)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_DIPV6_31_0};
-			if (data->offset < 64)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_DIPV6_63_32};
-			if (data->offset < 96)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_DIPV6_95_64};
-			if (data->offset < 128)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_DIPV6_127_96};
+			midx--;
 		}
+		if (!width)
+			break;
+		info[idx] = fields[midx];
+		if (mask)
+			mask[midx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
+	}
 	case RTE_FLOW_FIELD_TCP_PORT_SRC:
+		MLX5_ASSERT(data->offset + width <= 16);
+		off_be = 16 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_TCP_SPORT};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0xffff >> (16 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_TCP_PORT_DST:
+		MLX5_ASSERT(data->offset + width <= 16);
+		off_be = 16 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_TCP_DPORT};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0xffff >> (16 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_TCP_SEQ_NUM:
+		MLX5_ASSERT(data->offset + width <= 32);
+		off_be = 32 - (data->offset + width);
 		info[idx] = (struct field_modify_info){4, 0,
 					MLX5_MODI_OUT_TCP_SEQ_NUM};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_32(0xffffffff >>
-						     (32 - width));
+			mask[idx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_TCP_ACK_NUM:
+		MLX5_ASSERT(data->offset + width <= 32);
+		off_be = 32 - (data->offset + width);
 		info[idx] = (struct field_modify_info){4, 0,
 					MLX5_MODI_OUT_TCP_ACK_NUM};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_32(0xffffffff >>
-						     (32 - width));
+			mask[idx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_TCP_FLAGS:
+		MLX5_ASSERT(data->offset + width <= 9);
+		off_be = 9 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_TCP_FLAGS};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0x1ff >> (9 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_UDP_PORT_SRC:
+		MLX5_ASSERT(data->offset + width <= 16);
+		off_be = 16 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_UDP_SPORT};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0xffff >> (16 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_UDP_PORT_DST:
+		MLX5_ASSERT(data->offset + width <= 16);
+		off_be = 16 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_UDP_DPORT};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0xffff >> (16 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_VXLAN_VNI:
-		/* not supported yet */
+		MLX5_ASSERT(data->offset + width <= 24);
+		/* VNI is on bits 31-8 of TUNNEL_HDR_DW_1. */
+		off_be = 24 - (data->offset + width) + 8;
+		info[idx] = (struct field_modify_info){4, 0,
+					MLX5_MODI_TUNNEL_HDR_DW_1};
+		if (mask)
+			mask[idx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_GENEVE_VNI:
 		/* not supported yet*/
 		break;
 	case RTE_FLOW_FIELD_GTP_TEID:
+		MLX5_ASSERT(data->offset + width <= 32);
+		off_be = 32 - (data->offset + width);
 		info[idx] = (struct field_modify_info){4, 0,
 					MLX5_MODI_GTP_TEID};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_32(0xffffffff >>
-						     (32 - width));
+			mask[idx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_TAG:
 		{
-			int reg = mlx5_flow_get_reg_id(dev, MLX5_APP_TAG,
-						   data->level, error);
+			MLX5_ASSERT(data->offset + width <= 32);
+			int reg;
+
+			if (priv->sh->config.dv_flow_en == 2)
+				reg = REG_C_1;
+			else
+				reg = mlx5_flow_get_reg_id(dev, MLX5_APP_TAG,
+							   data->level, error);
 			if (reg < 0)
 				return;
 			MLX5_ASSERT(reg != REG_NON);
@@ -1772,15 +1769,18 @@ mlx5_flow_field_id_to_modify_info
 			info[idx] = (struct field_modify_info){4, 0,
 						reg_to_field[reg]};
 			if (mask)
-				mask[idx] =
-					rte_cpu_to_be_32(0xffffffff >>
-							 (32 - width));
+				mask[idx] = flow_modify_info_mask_32
+					(width, data->offset);
+			else
+				info[idx].offset = data->offset;
 		}
 		break;
 	case RTE_FLOW_FIELD_MARK:
 		{
 			uint32_t mark_mask = priv->sh->dv_mark_mask;
 			uint32_t mark_count = __builtin_popcount(mark_mask);
+			RTE_SET_USED(mark_count);
+			MLX5_ASSERT(data->offset + width <= mark_count);
 			int reg = mlx5_flow_get_reg_id(dev, MLX5_FLOW_MARK,
 						       0, error);
 			if (reg < 0)
@@ -1790,14 +1790,18 @@ mlx5_flow_field_id_to_modify_info
 			info[idx] = (struct field_modify_info){4, 0,
 						reg_to_field[reg]};
 			if (mask)
-				mask[idx] = rte_cpu_to_be_32((mark_mask >>
-					 (mark_count - width)) & mark_mask);
+				mask[idx] = flow_modify_info_mask_32_masked
+					(width, data->offset, mark_mask);
+			else
+				info[idx].offset = data->offset;
 		}
 		break;
 	case RTE_FLOW_FIELD_META:
 		{
 			uint32_t meta_mask = priv->sh->dv_meta_mask;
 			uint32_t meta_count = __builtin_popcount(meta_mask);
+			RTE_SET_USED(meta_count);
+			MLX5_ASSERT(data->offset + width <= meta_count);
 			int reg = flow_dv_get_metadata_reg(dev, attr, error);
 			if (reg < 0)
 				return;
@@ -1806,16 +1810,32 @@ mlx5_flow_field_id_to_modify_info
 			info[idx] = (struct field_modify_info){4, 0,
 						reg_to_field[reg]};
 			if (mask)
-				mask[idx] = rte_cpu_to_be_32((meta_mask >>
-					(meta_count - width)) & meta_mask);
+				mask[idx] = flow_modify_info_mask_32_masked
+					(width, data->offset, meta_mask);
+			else
+				info[idx].offset = data->offset;
 		}
 		break;
 	case RTE_FLOW_FIELD_IPV4_ECN:
 	case RTE_FLOW_FIELD_IPV6_ECN:
+		MLX5_ASSERT(data->offset + width <= 2);
+		off_be = 2 - (data->offset + width);
 		info[idx] = (struct field_modify_info){1, 0,
 					MLX5_MODI_OUT_IP_ECN};
 		if (mask)
-			mask[idx] = 0x3 >> (2 - width);
+			mask[idx] = flow_modify_info_mask_8(width, off_be);
+		else
+			info[idx].offset = off_be;
+		break;
+	case RTE_FLOW_FIELD_GTP_PSC_QFI:
+		MLX5_ASSERT(data->offset + width <= 8);
+		off_be = data->offset + 8;
+		info[idx] = (struct field_modify_info){4, 0,
+					MLX5_MODI_GTPU_FIRST_EXT_DW_0};
+		if (mask)
+			mask[idx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_POINTER:
 	case RTE_FLOW_FIELD_VALUE:
@@ -1865,7 +1885,8 @@ flow_dv_convert_action_modify_field
 
 	if (conf->src.field == RTE_FLOW_FIELD_POINTER ||
 	    conf->src.field == RTE_FLOW_FIELD_VALUE) {
-		type = MLX5_MODIFICATION_TYPE_SET;
+		type = conf->operation == RTE_FLOW_MODIFY_SET ?
+			MLX5_MODIFICATION_TYPE_SET : MLX5_MODIFICATION_TYPE_ADD;
 		/** For SET fill the destination field (field) first. */
 		mlx5_flow_field_id_to_modify_info(&conf->dst, field, mask,
 						  conf->width, dev,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index e62d25fda2..fa7bd37737 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -319,6 +319,11 @@ __flow_hw_action_template_destroy(struct rte_eth_dev *dev,
 		mlx5_hlist_unregister(priv->sh->flow_tbls, &grp->entry);
 		acts->jump = NULL;
 	}
+	if (acts->mhdr) {
+		if (acts->mhdr->action)
+			mlx5dr_action_destroy(acts->mhdr->action);
+		mlx5_free(acts->mhdr);
+	}
 }
 
 /**
@@ -425,6 +430,37 @@ __flow_hw_act_data_encap_append(struct mlx5_priv *priv,
 	return 0;
 }
 
+static __rte_always_inline int
+__flow_hw_act_data_hdr_modify_append(struct mlx5_priv *priv,
+				     struct mlx5_hw_actions *acts,
+				     enum rte_flow_action_type type,
+				     uint16_t action_src,
+				     uint16_t action_dst,
+				     uint16_t mhdr_cmds_off,
+				     uint16_t mhdr_cmds_end,
+				     bool shared,
+				     struct field_modify_info *field,
+				     struct field_modify_info *dcopy,
+				     uint32_t *mask)
+{
+	struct mlx5_action_construct_data *act_data;
+
+	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
+	if (!act_data)
+		return -1;
+	act_data->modify_header.mhdr_cmds_off = mhdr_cmds_off;
+	act_data->modify_header.mhdr_cmds_end = mhdr_cmds_end;
+	act_data->modify_header.shared = shared;
+	rte_memcpy(act_data->modify_header.field, field,
+		   sizeof(*field) * MLX5_ACT_MAX_MOD_FIELDS);
+	rte_memcpy(act_data->modify_header.dcopy, dcopy,
+		   sizeof(*dcopy) * MLX5_ACT_MAX_MOD_FIELDS);
+	rte_memcpy(act_data->modify_header.mask, mask,
+		   sizeof(*mask) * MLX5_ACT_MAX_MOD_FIELDS);
+	LIST_INSERT_HEAD(&acts->act_list, act_data, next);
+	return 0;
+}
+
 /**
  * Append shared RSS action to the dynamic action list.
  *
@@ -515,6 +551,265 @@ flow_hw_shared_action_translate(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static __rte_always_inline bool
+flow_hw_action_modify_field_is_shared(const struct rte_flow_action *action,
+				      const struct rte_flow_action *mask)
+{
+	const struct rte_flow_action_modify_field *v = action->conf;
+	const struct rte_flow_action_modify_field *m = mask->conf;
+
+	if (v->src.field == RTE_FLOW_FIELD_VALUE) {
+		uint32_t j;
+
+		if (m == NULL)
+			return false;
+		for (j = 0; j < RTE_DIM(m->src.value); ++j) {
+			/*
+			 * Immediate value is considered to be masked
+			 * (and thus shared by all flow rules), if mask
+			 * is non-zero. Partial mask over immediate value
+			 * is not allowed.
+			 */
+			if (m->src.value[j])
+				return true;
+		}
+		return false;
+	}
+	if (v->src.field == RTE_FLOW_FIELD_POINTER)
+		return m->src.pvalue != NULL;
+	/*
+	 * Source field types other than VALUE and
+	 * POINTER are always shared.
+	 */
+	return true;
+}
+
+static __rte_always_inline bool
+flow_hw_should_insert_nop(const struct mlx5_hw_modify_header_action *mhdr,
+			  const struct mlx5_modification_cmd *cmd)
+{
+	struct mlx5_modification_cmd last_cmd = { { 0 } };
+	struct mlx5_modification_cmd new_cmd = { { 0 } };
+	const uint32_t cmds_num = mhdr->mhdr_cmds_num;
+	unsigned int last_type;
+	bool should_insert = false;
+
+	if (cmds_num == 0)
+		return false;
+	last_cmd = *(&mhdr->mhdr_cmds[cmds_num - 1]);
+	last_cmd.data0 = rte_be_to_cpu_32(last_cmd.data0);
+	last_cmd.data1 = rte_be_to_cpu_32(last_cmd.data1);
+	last_type = last_cmd.action_type;
+	new_cmd = *cmd;
+	new_cmd.data0 = rte_be_to_cpu_32(new_cmd.data0);
+	new_cmd.data1 = rte_be_to_cpu_32(new_cmd.data1);
+	switch (new_cmd.action_type) {
+	case MLX5_MODIFICATION_TYPE_SET:
+	case MLX5_MODIFICATION_TYPE_ADD:
+		if (last_type == MLX5_MODIFICATION_TYPE_SET ||
+		    last_type == MLX5_MODIFICATION_TYPE_ADD)
+			should_insert = new_cmd.field == last_cmd.field;
+		else if (last_type == MLX5_MODIFICATION_TYPE_COPY)
+			should_insert = new_cmd.field == last_cmd.dst_field;
+		else if (last_type == MLX5_MODIFICATION_TYPE_NOP)
+			should_insert = false;
+		else
+			MLX5_ASSERT(false); /* Other types are not supported. */
+		break;
+	case MLX5_MODIFICATION_TYPE_COPY:
+		if (last_type == MLX5_MODIFICATION_TYPE_SET ||
+		    last_type == MLX5_MODIFICATION_TYPE_ADD)
+			should_insert = (new_cmd.field == last_cmd.field ||
+					 new_cmd.dst_field == last_cmd.field);
+		else if (last_type == MLX5_MODIFICATION_TYPE_COPY)
+			should_insert = (new_cmd.field == last_cmd.dst_field ||
+					 new_cmd.dst_field == last_cmd.dst_field);
+		else if (last_type == MLX5_MODIFICATION_TYPE_NOP)
+			should_insert = false;
+		else
+			MLX5_ASSERT(false); /* Other types are not supported. */
+		break;
+	default:
+		/* Other action types should be rejected on AT validation. */
+		MLX5_ASSERT(false);
+		break;
+	}
+	return should_insert;
+}
+
+static __rte_always_inline int
+flow_hw_mhdr_cmd_nop_append(struct mlx5_hw_modify_header_action *mhdr)
+{
+	struct mlx5_modification_cmd *nop;
+	uint32_t num = mhdr->mhdr_cmds_num;
+
+	if (num + 1 >= MLX5_MHDR_MAX_CMD)
+		return -ENOMEM;
+	nop = mhdr->mhdr_cmds + num;
+	nop->data0 = 0;
+	nop->action_type = MLX5_MODIFICATION_TYPE_NOP;
+	nop->data0 = rte_cpu_to_be_32(nop->data0);
+	nop->data1 = 0;
+	mhdr->mhdr_cmds_num = num + 1;
+	return 0;
+}
+
+static __rte_always_inline int
+flow_hw_mhdr_cmd_append(struct mlx5_hw_modify_header_action *mhdr,
+			struct mlx5_modification_cmd *cmd)
+{
+	uint32_t num = mhdr->mhdr_cmds_num;
+
+	if (num + 1 >= MLX5_MHDR_MAX_CMD)
+		return -ENOMEM;
+	mhdr->mhdr_cmds[num] = *cmd;
+	mhdr->mhdr_cmds_num = num + 1;
+	return 0;
+}
+
+static __rte_always_inline int
+flow_hw_converted_mhdr_cmds_append(struct mlx5_hw_modify_header_action *mhdr,
+				   struct mlx5_flow_dv_modify_hdr_resource *resource)
+{
+	uint32_t idx;
+	int ret;
+
+	for (idx = 0; idx < resource->actions_num; ++idx) {
+		struct mlx5_modification_cmd *src = &resource->actions[idx];
+
+		if (flow_hw_should_insert_nop(mhdr, src)) {
+			ret = flow_hw_mhdr_cmd_nop_append(mhdr);
+			if (ret)
+				return ret;
+		}
+		ret = flow_hw_mhdr_cmd_append(mhdr, src);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+static __rte_always_inline void
+flow_hw_modify_field_init(struct mlx5_hw_modify_header_action *mhdr,
+			  struct rte_flow_actions_template *at)
+{
+	memset(mhdr, 0, sizeof(*mhdr));
+	/* Modify header action without any commands is shared by default. */
+	mhdr->shared = true;
+	mhdr->pos = at->mhdr_off;
+}
+
+static __rte_always_inline int
+flow_hw_modify_field_compile(struct rte_eth_dev *dev,
+			     const struct rte_flow_attr *attr,
+			     const struct rte_flow_action *action_start, /* Start of AT actions. */
+			     const struct rte_flow_action *action, /* Current action from AT. */
+			     const struct rte_flow_action *action_mask, /* Current mask from AT. */
+			     struct mlx5_hw_actions *acts,
+			     struct mlx5_hw_modify_header_action *mhdr,
+			     struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_action_modify_field *conf = action->conf;
+	union {
+		struct mlx5_flow_dv_modify_hdr_resource resource;
+		uint8_t data[sizeof(struct mlx5_flow_dv_modify_hdr_resource) +
+			     sizeof(struct mlx5_modification_cmd) * MLX5_MHDR_MAX_CMD];
+	} dummy;
+	struct mlx5_flow_dv_modify_hdr_resource *resource;
+	struct rte_flow_item item = {
+		.spec = NULL,
+		.mask = NULL
+	};
+	struct field_modify_info field[MLX5_ACT_MAX_MOD_FIELDS] = {
+						{0, 0, MLX5_MODI_OUT_NONE} };
+	struct field_modify_info dcopy[MLX5_ACT_MAX_MOD_FIELDS] = {
+						{0, 0, MLX5_MODI_OUT_NONE} };
+	uint32_t mask[MLX5_ACT_MAX_MOD_FIELDS] = { 0 };
+	uint32_t type, value = 0;
+	uint16_t cmds_start, cmds_end;
+	bool shared;
+	int ret;
+
+	/*
+	 * Modify header action is shared if previous modify_field actions
+	 * are shared and currently compiled action is shared.
+	 */
+	shared = flow_hw_action_modify_field_is_shared(action, action_mask);
+	mhdr->shared &= shared;
+	if (conf->src.field == RTE_FLOW_FIELD_POINTER ||
+	    conf->src.field == RTE_FLOW_FIELD_VALUE) {
+		type = conf->operation == RTE_FLOW_MODIFY_SET ? MLX5_MODIFICATION_TYPE_SET :
+								MLX5_MODIFICATION_TYPE_ADD;
+		/* For SET/ADD fill the destination field (field) first. */
+		mlx5_flow_field_id_to_modify_info(&conf->dst, field, mask,
+						  conf->width, dev,
+						  attr, error);
+		item.spec = conf->src.field == RTE_FLOW_FIELD_POINTER ?
+				(void *)(uintptr_t)conf->src.pvalue :
+				(void *)(uintptr_t)&conf->src.value;
+		if (conf->dst.field == RTE_FLOW_FIELD_META ||
+		    conf->dst.field == RTE_FLOW_FIELD_TAG) {
+			value = *(const unaligned_uint32_t *)item.spec;
+			value = rte_cpu_to_be_32(value);
+			item.spec = &value;
+		} else if (conf->dst.field == RTE_FLOW_FIELD_GTP_PSC_QFI) {
+			/*
+			 * QFI is passed as a uint8_t integer, but it is accessed through
+			 * the 2nd least significant byte of a 32-bit field in the modify header command.
+			 */
+			value = *(const uint8_t *)item.spec;
+			value = rte_cpu_to_be_32(value << 8);
+			item.spec = &value;
+		}
+	} else {
+		type = MLX5_MODIFICATION_TYPE_COPY;
+		/* For COPY fill the destination field (dcopy) without mask. */
+		mlx5_flow_field_id_to_modify_info(&conf->dst, dcopy, NULL,
+						  conf->width, dev,
+						  attr, error);
+		/* Then construct the source field (field) with mask. */
+		mlx5_flow_field_id_to_modify_info(&conf->src, field, mask,
+						  conf->width, dev,
+						  attr, error);
+	}
+	item.mask = &mask;
+	memset(&dummy, 0, sizeof(dummy));
+	resource = &dummy.resource;
+	ret = flow_dv_convert_modify_action(&item, field, dcopy, resource, type, error);
+	if (ret)
+		return ret;
+	MLX5_ASSERT(resource->actions_num > 0);
+	/*
+	 * If the previous modify field action collides with this one, then insert a NOP command.
+	 * This NOP command will not be part of the action's command range used to update commands
+	 * on rule creation.
+	 */
+	if (flow_hw_should_insert_nop(mhdr, &resource->actions[0])) {
+		ret = flow_hw_mhdr_cmd_nop_append(mhdr);
+		if (ret)
+			return rte_flow_error_set(error, ret, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+						  NULL, "too many modify field operations specified");
+	}
+	cmds_start = mhdr->mhdr_cmds_num;
+	ret = flow_hw_converted_mhdr_cmds_append(mhdr, resource);
+	if (ret)
+		return rte_flow_error_set(error, ret, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "too many modify field operations specified");
+
+	cmds_end = mhdr->mhdr_cmds_num;
+	if (shared)
+		return 0;
+	ret = __flow_hw_act_data_hdr_modify_append(priv, acts, RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+						   action - action_start, mhdr->pos,
+						   cmds_start, cmds_end, shared,
+						   field, dcopy, mask);
+	if (ret)
+		return rte_flow_error_set(error, ENOMEM, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "not enough memory to store modify field metadata");
+	return 0;
+}
+
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -558,10 +853,12 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 	uint16_t reformat_pos = MLX5_HW_MAX_ACTS, reformat_src = 0;
 	uint8_t *encap_data = NULL, *encap_data_m = NULL;
 	size_t data_size = 0;
+	struct mlx5_hw_modify_header_action mhdr = { 0 };
 	bool actions_end = false;
 	uint32_t type, i;
 	int err;
 
+	flow_hw_modify_field_init(&mhdr, at);
 	if (attr->transfer)
 		type = MLX5DR_TABLE_TYPE_FDB;
 	else if (attr->egress)
@@ -714,6 +1011,15 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			reformat_pos = i++;
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_TNL_L2_TO_L2;
 			break;
+		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
+			if (mhdr.pos == UINT16_MAX)
+				mhdr.pos = i++;
+			err = flow_hw_modify_field_compile(dev, attr, action_start,
+							   actions, masks, acts, &mhdr,
+							   error);
+			if (err)
+				goto err;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -721,6 +1027,31 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			break;
 		}
 	}
+	if (mhdr.pos != UINT16_MAX) {
+		uint32_t flags;
+		uint32_t bulk_size;
+		size_t mhdr_len;
+
+		acts->mhdr = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*acts->mhdr),
+					 0, SOCKET_ID_ANY);
+		if (!acts->mhdr)
+			goto err;
+		rte_memcpy(acts->mhdr, &mhdr, sizeof(*acts->mhdr));
+		mhdr_len = sizeof(struct mlx5_modification_cmd) * acts->mhdr->mhdr_cmds_num;
+		flags = mlx5_hw_act_flag[!!attr->group][type];
+		if (acts->mhdr->shared) {
+			flags |= MLX5DR_ACTION_FLAG_SHARED;
+			bulk_size = 0;
+		} else {
+			bulk_size = rte_log2_u32(table_attr->nb_flows);
+		}
+		acts->mhdr->action = mlx5dr_action_create_modify_header
+				(priv->dr_ctx, mhdr_len, (__be64 *)acts->mhdr->mhdr_cmds,
+				 bulk_size, flags);
+		if (!acts->mhdr->action)
+			goto err;
+		acts->rule_acts[acts->mhdr->pos].action = acts->mhdr->action;
+	}
 	if (reformat_pos != MLX5_HW_MAX_ACTS) {
 		uint8_t buf[MLX5_ENCAP_MAX_LEN];
 		bool shared_rfmt = true;
@@ -884,6 +1215,110 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static __rte_always_inline int
+flow_hw_mhdr_cmd_is_nop(const struct mlx5_modification_cmd *cmd)
+{
+	struct mlx5_modification_cmd cmd_he = {
+		.data0 = rte_be_to_cpu_32(cmd->data0),
+		.data1 = 0,
+	};
+
+	return cmd_he.action_type == MLX5_MODIFICATION_TYPE_NOP;
+}
+
+/**
+ * Construct the modify header commands for a flow rule.
+ *
+ * For a non-shared modify_field action, the modify header commands copied
+ * into the job descriptor are updated with the values provided by the
+ * rte_flow action given at flow creation time.
+ *
+ * @param[in] job
+ *   Pointer to job descriptor.
+ * @param[in] act_data
+ *   Pointer to the action construct data.
+ * @param[in] hw_acts
+ *   Pointer to translated actions from template.
+ * @param[in] action
+ *   Pointer to the modify_field action of the flow rule.
+ *
+ * @return
+ *    0 on success, negative value otherwise.
+ */
+static __rte_always_inline int
+flow_hw_modify_field_construct(struct mlx5_hw_q_job *job,
+			       struct mlx5_action_construct_data *act_data,
+			       const struct mlx5_hw_actions *hw_acts,
+			       const struct rte_flow_action *action)
+{
+	const struct rte_flow_action_modify_field *mhdr_action = action->conf;
+	uint8_t values[16] = { 0 };
+	unaligned_uint32_t *value_p;
+	uint32_t i;
+	struct field_modify_info *field;
+
+	if (!hw_acts->mhdr)
+		return -1;
+	if (hw_acts->mhdr->shared || act_data->modify_header.shared)
+		return 0;
+	MLX5_ASSERT(mhdr_action->operation == RTE_FLOW_MODIFY_SET ||
+		    mhdr_action->operation == RTE_FLOW_MODIFY_ADD);
+	if (mhdr_action->src.field != RTE_FLOW_FIELD_VALUE &&
+	    mhdr_action->src.field != RTE_FLOW_FIELD_POINTER)
+		return 0;
+	if (mhdr_action->src.field == RTE_FLOW_FIELD_VALUE)
+		rte_memcpy(values, &mhdr_action->src.value, sizeof(values));
+	else
+		rte_memcpy(values, mhdr_action->src.pvalue, sizeof(values));
+	if (mhdr_action->dst.field == RTE_FLOW_FIELD_META ||
+	    mhdr_action->dst.field == RTE_FLOW_FIELD_TAG) {
+		value_p = (unaligned_uint32_t *)values;
+		*value_p = rte_cpu_to_be_32(*value_p);
+	} else if (mhdr_action->dst.field == RTE_FLOW_FIELD_GTP_PSC_QFI) {
+		uint32_t tmp;
+
+		/*
+		 * QFI is passed as a uint8_t integer, but it is accessed through
+		 * the 2nd least significant byte of a 32-bit field in the modify header command.
+		 */
+		tmp = values[0];
+		value_p = (unaligned_uint32_t *)values;
+		*value_p = rte_cpu_to_be_32(tmp << 8);
+	}
+	i = act_data->modify_header.mhdr_cmds_off;
+	field = act_data->modify_header.field;
+	do {
+		uint32_t off_b;
+		uint32_t mask;
+		uint32_t data;
+		const uint8_t *mask_src;
+
+		if (i >= act_data->modify_header.mhdr_cmds_end)
+			return -1;
+		if (flow_hw_mhdr_cmd_is_nop(&job->mhdr_cmd[i])) {
+			++i;
+			continue;
+		}
+		mask_src = (const uint8_t *)act_data->modify_header.mask;
+		mask = flow_dv_fetch_field(mask_src + field->offset, field->size);
+		if (!mask) {
+			++field;
+			continue;
+		}
+		off_b = rte_bsf32(mask);
+		data = flow_dv_fetch_field(values + field->offset, field->size);
+		data = (data & mask) >> off_b;
+		job->mhdr_cmd[i++].data1 = rte_cpu_to_be_32(data);
+		++field;
+	} while (field->size);
+	return 0;
+}
+
 /**
  * Construct flow action array.
  *
@@ -928,6 +1363,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	};
 	uint32_t ft_flag;
 	size_t encap_len = 0;
+	int ret;
 
 	memcpy(rule_acts, hw_acts->rule_acts,
 	       sizeof(*rule_acts) * hw_acts->acts_num);
@@ -945,6 +1381,18 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	} else {
 		attr.ingress = 1;
 	}
+	if (hw_acts->mhdr && hw_acts->mhdr->mhdr_cmds_num > 0) {
+		uint16_t pos = hw_acts->mhdr->pos;
+
+		if (!hw_acts->mhdr->shared) {
+			rule_acts[pos].modify_header.offset =
+						job->flow->idx - 1;
+			rule_acts[pos].modify_header.data =
+						(uint8_t *)job->mhdr_cmd;
+			rte_memcpy(job->mhdr_cmd, hw_acts->mhdr->mhdr_cmds,
+				   sizeof(*job->mhdr_cmd) * hw_acts->mhdr->mhdr_cmds_num);
+		}
+	}
 	LIST_FOREACH(act_data, &hw_acts->act_list, next) {
 		uint32_t jump_group;
 		uint32_t tag;
@@ -1020,6 +1468,14 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			MLX5_ASSERT(raw_encap_data->size ==
 				    act_data->encap.len);
 			break;
+		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
+			ret = flow_hw_modify_field_construct(job,
+							     act_data,
+							     hw_acts,
+							     action);
+			if (ret)
+				return -1;
+			break;
 		default:
 			break;
 		}
@@ -1609,6 +2065,155 @@ flow_hw_table_destroy(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static bool
+flow_hw_modify_field_is_used(const struct rte_flow_action_modify_field *action,
+			     enum rte_flow_field_id field)
+{
+	return action->src.field == field || action->dst.field == field;
+}
+
+static int
+flow_hw_validate_action_modify_field(const struct rte_flow_action *action,
+				     const struct rte_flow_action *mask,
+				     struct rte_flow_error *error)
+{
+	const struct rte_flow_action_modify_field *action_conf =
+		action->conf;
+	const struct rte_flow_action_modify_field *mask_conf =
+		mask->conf;
+
+	if (action_conf->operation != mask_conf->operation)
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"modify_field operation mask and template are not equal");
+	if (action_conf->dst.field != mask_conf->dst.field)
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"destination field mask and template are not equal");
+	if (action_conf->dst.field == RTE_FLOW_FIELD_POINTER ||
+	    action_conf->dst.field == RTE_FLOW_FIELD_VALUE)
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"immediate value and pointer cannot be used as destination");
+	if (mask_conf->dst.level != UINT32_MAX)
+		return rte_flow_error_set(error, EINVAL,
+			RTE_FLOW_ERROR_TYPE_ACTION, action,
+			"destination encapsulation level must be fully masked");
+	if (mask_conf->dst.offset != UINT32_MAX)
+		return rte_flow_error_set(error, EINVAL,
+			RTE_FLOW_ERROR_TYPE_ACTION, action,
+			"destination offset level must be fully masked");
+	if (action_conf->src.field != mask_conf->src.field)
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"destination field mask and template are not equal");
+	if (action_conf->src.field != RTE_FLOW_FIELD_POINTER &&
+	    action_conf->src.field != RTE_FLOW_FIELD_VALUE) {
+		if (mask_conf->src.level != UINT32_MAX)
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"source encapsulation level must be fully masked");
+		if (mask_conf->src.offset != UINT32_MAX)
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"source offset level must be fully masked");
+	}
+	if (mask_conf->width != UINT32_MAX)
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"modify_field width field must be fully masked");
+	if (flow_hw_modify_field_is_used(action_conf, RTE_FLOW_FIELD_START))
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"modifying arbitrary place in a packet is not supported");
+	if (flow_hw_modify_field_is_used(action_conf, RTE_FLOW_FIELD_VLAN_TYPE))
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"modifying vlan_type is not supported");
+	if (flow_hw_modify_field_is_used(action_conf, RTE_FLOW_FIELD_GENEVE_VNI))
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"modifying Geneve VNI is not supported");
+	return 0;
+}
+
+static int
+flow_hw_action_validate(const struct rte_flow_action actions[],
+			const struct rte_flow_action masks[],
+			struct rte_flow_error *error)
+{
+	int i;
+	bool actions_end = false;
+	int ret;
+
+	for (i = 0; !actions_end; ++i) {
+		const struct rte_flow_action *action = &actions[i];
+		const struct rte_flow_action *mask = &masks[i];
+
+		if (action->type != mask->type)
+			return rte_flow_error_set(error, ENOTSUP,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  action,
+						  "mask type does not match action type");
+		switch (action->type) {
+		case RTE_FLOW_ACTION_TYPE_VOID:
+			break;
+		case RTE_FLOW_ACTION_TYPE_INDIRECT:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_MARK:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_DROP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_JUMP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_QUEUE:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_RSS:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_VXLAN_DECAP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_NVGRE_DECAP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_DECAP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
+			ret = flow_hw_validate_action_modify_field(action,
+									mask,
+									error);
+			if (ret < 0)
+				return ret;
+			break;
+		case RTE_FLOW_ACTION_TYPE_END:
+			actions_end = true;
+			break;
+		default:
+			return rte_flow_error_set(error, ENOTSUP,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  action,
+						  "action not supported in template API");
+		}
+	}
+	return 0;
+}
+
 /**
  * Create flow action template.
  *
@@ -1637,6 +2242,8 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 	int len, act_len, mask_len, i;
 	struct rte_flow_actions_template *at;
 
+	if (flow_hw_action_validate(actions, masks, error))
+		return NULL;
 	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS,
 				NULL, 0, actions, error);
 	if (act_len <= 0)
@@ -2093,6 +2700,8 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		}
 		mem_size += (sizeof(struct mlx5_hw_q_job *) +
 			    sizeof(uint8_t) * MLX5_ENCAP_MAX_LEN +
+			    sizeof(struct mlx5_modification_cmd) *
+			    MLX5_MHDR_MAX_CMD +
 			    sizeof(struct mlx5_hw_q_job)) *
 			    queue_attr[0]->size;
 	}
@@ -2104,6 +2713,7 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	}
 	for (i = 0; i < nb_queue; i++) {
 		uint8_t *encap = NULL;
+		struct mlx5_modification_cmd *mhdr_cmd = NULL;
 
 		priv->hw_q[i].job_idx = queue_attr[i]->size;
 		priv->hw_q[i].size = queue_attr[i]->size;
@@ -2115,8 +2725,10 @@ flow_hw_configure(struct rte_eth_dev *dev,
 					    &job[queue_attr[i - 1]->size];
 		job = (struct mlx5_hw_q_job *)
 		      &priv->hw_q[i].job[queue_attr[i]->size];
-		encap = (uint8_t *)&job[queue_attr[i]->size];
+		mhdr_cmd = (struct mlx5_modification_cmd *)&job[queue_attr[i]->size];
+		encap = (uint8_t *)&mhdr_cmd[queue_attr[i]->size * MLX5_MHDR_MAX_CMD];
 		for (j = 0; j < queue_attr[i]->size; j++) {
+			job[j].mhdr_cmd = &mhdr_cmd[j * MLX5_MHDR_MAX_CMD];
 			job[j].encap_data = &encap[j * MLX5_ENCAP_MAX_LEN];
 			priv->hw_q[i].job[j] = &job[j];
 		}
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v5 05/18] net/mlx5: add HW steering port action
  2022-10-20  3:21 ` [PATCH v5 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (3 preceding siblings ...)
  2022-10-20  3:21   ` [PATCH v5 04/18] net/mlx5: add modify field hws support Suanming Mou
@ 2022-10-20  3:21   ` Suanming Mou
  2022-10-20  3:21   ` [PATCH v5 06/18] net/mlx5: add extended metadata mode for hardware steering Suanming Mou
                     ` (12 subsequent siblings)
  17 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-10-20  3:21 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Dariusz Sosnowski

From: Dariusz Sosnowski <dsosnowski@nvidia.com>

This patch implements creating and caching of port actions for use with
HW Steering FDB flows.

Actions are created on flow template API configuration, and only on
the port designated as master. Attaching and detaching of ports
in the same switching domain causes an update to the port actions cache
by, respectively, creating and destroying actions.
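
For illustration only (not part of this patch), the snippet below sketches
how an application could target a representor through the port actions added
here, using the rte_flow template API. The representor port number and the
array names are assumptions made for this example.

#include <stdint.h>
#include <rte_flow.h>

/* Hypothetical representor port targeted by the action. */
static const struct rte_flow_action_ethdev fwd_port_v = { .port_id = 1 };
/* Fully masked port_id: the target port is fixed in the template. */
static const struct rte_flow_action_ethdev fwd_port_m = {
	.port_id = UINT16_MAX,
};

/* Actions and masks to be used when creating an actions template. */
static const struct rte_flow_action fwd_actions[] = {
	{
		.type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT,
		.conf = &fwd_port_v,
	},
	{ .type = RTE_FLOW_ACTION_TYPE_END },
};

static const struct rte_flow_action fwd_masks[] = {
	{
		.type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT,
		.conf = &fwd_port_m,
	},
	{ .type = RTE_FLOW_ACTION_TYPE_END },
};

These arrays would be passed as the "actions" and "masks" arguments of
rte_flow_actions_template_create() on the transfer proxy port. A fully
masked port_id lets the PMD resolve the vport action once at template
translation time; an empty mask defers the choice of port to each flow
rule instead.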

A new devarg fdb_def_rule_en is added to control whether the default
dedicated E-Switch rule is created implicitly by the PMD; the PMD sets
this value to 1 by default.
If set to 0, the default E-Switch rule will not be created and the user
can create the specific E-Switch rule on the root table if needed.
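
As an illustrative example (hypothetical PCI address), HW steering with the
implicit FDB rule disabled could then be requested with a devargs string
such as:

  dpdk-testpmd -a 0000:08:00.0,dv_flow_en=2,fdb_def_rule_en=0 -- -i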

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 doc/guides/nics/mlx5.rst           |    9 +
 drivers/net/mlx5/linux/mlx5_os.c   |   16 +-
 drivers/net/mlx5/mlx5.c            |   14 +
 drivers/net/mlx5/mlx5.h            |   26 +-
 drivers/net/mlx5/mlx5_flow.c       |   96 +-
 drivers/net/mlx5/mlx5_flow.h       |   22 +-
 drivers/net/mlx5/mlx5_flow_dv.c    |   93 +-
 drivers/net/mlx5/mlx5_flow_hw.c    | 1356 +++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_flow_verbs.c |    4 +-
 drivers/net/mlx5/mlx5_trigger.c    |   77 +-
 10 files changed, 1595 insertions(+), 118 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 303eb17714..7d2095f075 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -1132,6 +1132,15 @@ for an additional list of options shared with other mlx5 drivers.
 
   By default, the PMD will set this value to 1.
 
+- ``fdb_def_rule_en`` parameter [int]
+
+  A non-zero value enables the PMD to create a dedicated rule on the E-Switch
+  root table, which forwards all incoming packets into table 1. Other rules
+  will then be created at the original E-Switch table level plus one, which
+  improves the flow insertion rate by skipping the firmware-managed root table.
+  If set to 0, all rules will be created on the original E-Switch table level.
+
+  By default, the PMD will set this value to 1.
 
 Supported NICs
 --------------
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index b7cc11a2ef..d674b54624 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1551,11 +1551,18 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	}
 	rte_rwlock_init(&priv->ind_tbls_lock);
 	if (priv->sh->config.dv_flow_en == 2) {
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+#ifdef HAVE_MLX5_HWS_SUPPORT
 		if (priv->vport_meta_mask)
 			flow_hw_set_port_info(eth_dev);
 		/* Only HWS requires this information. */
 		flow_hw_init_tags_set(eth_dev);
+		if (priv->sh->config.dv_esw_en &&
+		    flow_hw_create_vport_action(eth_dev)) {
+			DRV_LOG(ERR, "port %u failed to create vport action",
+				eth_dev->data->port_id);
+			err = EINVAL;
+			goto error;
+		}
 		return eth_dev;
 #else
 		DRV_LOG(ERR, "DV support is missing for HWS.");
@@ -1620,6 +1627,13 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	return eth_dev;
 error:
 	if (priv) {
+#ifdef HAVE_MLX5_HWS_SUPPORT
+		if (eth_dev &&
+		    priv->sh &&
+		    priv->sh->config.dv_flow_en == 2 &&
+		    priv->sh->config.dv_esw_en)
+			flow_hw_destroy_vport_action(eth_dev);
+#endif
 		if (priv->mreg_cp_tbl)
 			mlx5_hlist_destroy(priv->mreg_cp_tbl);
 		if (priv->sh)
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index a34fbcf74d..470b9c2d0f 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -172,6 +172,9 @@
 /* Device parameter to configure the delay drop when creating Rxqs. */
 #define MLX5_DELAY_DROP "delay_drop"
 
+/* Device parameter to create the fdb default rule in PMD */
+#define MLX5_FDB_DEFAULT_RULE_EN "fdb_def_rule_en"
+
 /* Shared memory between primary and secondary processes. */
 struct mlx5_shared_data *mlx5_shared_data;
 
@@ -1239,6 +1242,8 @@ mlx5_dev_args_check_handler(const char *key, const char *val, void *opaque)
 		config->decap_en = !!tmp;
 	} else if (strcmp(MLX5_ALLOW_DUPLICATE_PATTERN, key) == 0) {
 		config->allow_duplicate_pattern = !!tmp;
+	} else if (strcmp(MLX5_FDB_DEFAULT_RULE_EN, key) == 0) {
+		config->fdb_def_rule = !!tmp;
 	}
 	return 0;
 }
@@ -1274,6 +1279,7 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 		MLX5_RECLAIM_MEM,
 		MLX5_DECAP_EN,
 		MLX5_ALLOW_DUPLICATE_PATTERN,
+		MLX5_FDB_DEFAULT_RULE_EN,
 		NULL,
 	};
 	int ret = 0;
@@ -1285,6 +1291,7 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 	config->dv_flow_en = 1;
 	config->decap_en = 1;
 	config->allow_duplicate_pattern = 1;
+	config->fdb_def_rule = 1;
 	if (mkvlist != NULL) {
 		/* Process parameters. */
 		ret = mlx5_kvargs_process(mkvlist, params,
@@ -1360,6 +1367,7 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 	DRV_LOG(DEBUG, "\"decap_en\" is %u.", config->decap_en);
 	DRV_LOG(DEBUG, "\"allow_duplicate_pattern\" is %u.",
 		config->allow_duplicate_pattern);
+	DRV_LOG(DEBUG, "\"fdb_def_rule_en\" is %u.", config->fdb_def_rule);
 	return 0;
 }
 
@@ -1943,6 +1951,7 @@ mlx5_dev_close(struct rte_eth_dev *dev)
 	mlx5_flex_parser_ecpri_release(dev);
 	mlx5_flex_item_port_cleanup(dev);
 #ifdef HAVE_MLX5_HWS_SUPPORT
+	flow_hw_destroy_vport_action(dev);
 	flow_hw_resource_release(dev);
 	flow_hw_clear_port_info(dev);
 	if (priv->sh->config.dv_flow_en == 2)
@@ -2644,6 +2653,11 @@ mlx5_probe_again_args_validate(struct mlx5_common_device *cdev,
 			sh->ibdev_name);
 		goto error;
 	}
+	if (sh->config.fdb_def_rule ^ config->fdb_def_rule) {
+		DRV_LOG(ERR, "\"fdb_def_rule_en\" configuration mismatch for shared %s context.",
+			sh->ibdev_name);
+		goto error;
+	}
 	if (sh->config.l3_vxlan_en ^ config->l3_vxlan_en) {
 		DRV_LOG(ERR, "\"l3_vxlan_en\" "
 			"configuration mismatch for shared %s context.",
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 6f75a32488..69a0a60030 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -314,6 +314,7 @@ struct mlx5_sh_config {
 	uint32_t allow_duplicate_pattern:1;
 	uint32_t lro_allowed:1; /* Whether LRO is allowed. */
 	/* Allow/Prevent the duplicate rules pattern. */
+	uint32_t fdb_def_rule:1; /* Create FDB default jump rule */
 };
 
 
@@ -342,6 +343,8 @@ enum {
 	MLX5_HW_Q_JOB_TYPE_DESTROY, /* Flow destroy job type. */
 };
 
+#define MLX5_HW_MAX_ITEMS (16)
+
 /* HW steering flow management job descriptor. */
 struct mlx5_hw_q_job {
 	uint32_t type; /* Job type. */
@@ -349,6 +352,8 @@ struct mlx5_hw_q_job {
 	void *user_data; /* Job user data. */
 	uint8_t *encap_data; /* Encap data. */
 	struct mlx5_modification_cmd *mhdr_cmd;
+	struct rte_flow_item *items;
+	struct rte_flow_item_ethdev port_spec;
 };
 
 /* HW steering job descriptor LIFO pool. */
@@ -1207,6 +1212,8 @@ struct mlx5_dev_ctx_shared {
 	uint32_t flow_priority_check_flag:1; /* Check Flag for flow priority. */
 	uint32_t metadata_regc_check_flag:1; /* Check Flag for metadata REGC. */
 	uint32_t hws_tags:1; /* Check if tags info for HWS initialized. */
+	uint32_t shared_mark_enabled:1;
+	/* If mark action is enabled on Rxqs (shared E-Switch domain). */
 	uint32_t max_port; /* Maximal IB device port index. */
 	struct mlx5_bond_info bond; /* Bonding information. */
 	struct mlx5_common_device *cdev; /* Backend mlx5 device. */
@@ -1457,6 +1464,12 @@ struct mlx5_obj_ops {
 
 #define MLX5_RSS_HASH_FIELDS_LEN RTE_DIM(mlx5_rss_hash_fields)
 
+struct mlx5_hw_ctrl_flow {
+	LIST_ENTRY(mlx5_hw_ctrl_flow) next;
+	struct rte_eth_dev *owner_dev;
+	struct rte_flow *flow;
+};
+
 struct mlx5_priv {
 	struct rte_eth_dev_data *dev_data;  /* Pointer to device data. */
 	struct mlx5_dev_ctx_shared *sh; /* Shared device context. */
@@ -1497,6 +1510,11 @@ struct mlx5_priv {
 	unsigned int reta_idx_n; /* RETA index size. */
 	struct mlx5_drop drop_queue; /* Flow drop queues. */
 	void *root_drop_action; /* Pointer to root drop action. */
+	rte_spinlock_t hw_ctrl_lock;
+	LIST_HEAD(hw_ctrl_flow, mlx5_hw_ctrl_flow) hw_ctrl_flows;
+	struct rte_flow_template_table *hw_esw_sq_miss_root_tbl;
+	struct rte_flow_template_table *hw_esw_sq_miss_tbl;
+	struct rte_flow_template_table *hw_esw_zero_tbl;
 	struct mlx5_indexed_pool *flows[MLX5_FLOW_TYPE_MAXI];
 	/* RTE Flow rules. */
 	uint32_t ctrl_flows; /* Control flow rules. */
@@ -1557,11 +1575,11 @@ struct mlx5_priv {
 	struct mlx5_hw_q *hw_q;
 	/* HW steering rte flow table list header. */
 	LIST_HEAD(flow_hw_tbl, rte_flow_template_table) flow_hw_tbl;
+	struct mlx5dr_action **hw_vport;
 	/* HW steering global drop action. */
-	struct mlx5dr_action *hw_drop[MLX5_HW_ACTION_FLAG_MAX]
-				     [MLX5DR_TABLE_TYPE_MAX];
-	/* HW steering global drop action. */
-	struct mlx5dr_action *hw_tag[MLX5_HW_ACTION_FLAG_MAX];
+	struct mlx5dr_action *hw_drop[2];
+	/* HW steering global tag action. */
+	struct mlx5dr_action *hw_tag[2];
 	struct mlx5_indexed_pool *acts_ipool; /* Action data indexed pool. */
 #endif
 };
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index f36e72fb89..60f76f5a43 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -1001,6 +1001,7 @@ static const struct rte_flow_ops mlx5_flow_ops = {
 	.flex_item_create = mlx5_flow_flex_item_create,
 	.flex_item_release = mlx5_flow_flex_item_release,
 	.info_get = mlx5_flow_info_get,
+	.pick_transfer_proxy = mlx5_flow_pick_transfer_proxy,
 	.configure = mlx5_flow_port_configure,
 	.pattern_template_create = mlx5_flow_pattern_template_create,
 	.pattern_template_destroy = mlx5_flow_pattern_template_destroy,
@@ -1244,7 +1245,7 @@ mlx5_get_lowest_priority(struct rte_eth_dev *dev,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 
-	if (!attr->group && !attr->transfer)
+	if (!attr->group && !(attr->transfer && priv->fdb_def_rule))
 		return priv->sh->flow_max_priority - 2;
 	return MLX5_NON_ROOT_FLOW_MAX_PRIO - 1;
 }
@@ -1271,11 +1272,14 @@ mlx5_get_matcher_priority(struct rte_eth_dev *dev,
 	uint16_t priority = (uint16_t)attr->priority;
 	struct mlx5_priv *priv = dev->data->dev_private;
 
+	/* NIC root rules */
 	if (!attr->group && !attr->transfer) {
 		if (attr->priority == MLX5_FLOW_LOWEST_PRIO_INDICATOR)
 			priority = priv->sh->flow_max_priority - 1;
 		return mlx5_os_flow_adjust_priority(dev, priority, subpriority);
-	} else if (!external && attr->transfer && attr->group == 0 &&
+	/* FDB root rules */
+	} else if (attr->transfer && (!external || !priv->fdb_def_rule) &&
+		   attr->group == 0 &&
 		   attr->priority == MLX5_FLOW_LOWEST_PRIO_INDICATOR) {
 		return (priv->sh->flow_max_priority - 1) * 3;
 	}
@@ -1483,13 +1487,32 @@ flow_rxq_mark_flag_set(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_rxq_ctrl *rxq_ctrl;
+	uint16_t port_id;
 
-	if (priv->mark_enabled)
+	if (priv->sh->shared_mark_enabled)
 		return;
-	LIST_FOREACH(rxq_ctrl, &priv->rxqsctrl, next) {
-		rxq_ctrl->rxq.mark = 1;
+	if (priv->master || priv->representor) {
+		MLX5_ETH_FOREACH_DEV(port_id, dev->device) {
+			struct mlx5_priv *opriv =
+				rte_eth_devices[port_id].data->dev_private;
+
+			if (!opriv ||
+			    opriv->sh != priv->sh ||
+			    opriv->domain_id != priv->domain_id ||
+			    opriv->mark_enabled)
+				continue;
+			LIST_FOREACH(rxq_ctrl, &opriv->rxqsctrl, next) {
+				rxq_ctrl->rxq.mark = 1;
+			}
+			opriv->mark_enabled = 1;
+		}
+	} else {
+		LIST_FOREACH(rxq_ctrl, &priv->rxqsctrl, next) {
+			rxq_ctrl->rxq.mark = 1;
+		}
+		priv->mark_enabled = 1;
 	}
-	priv->mark_enabled = 1;
+	priv->sh->shared_mark_enabled = 1;
 }
 
 /**
@@ -1625,6 +1648,7 @@ flow_rxq_flags_clear(struct rte_eth_dev *dev)
 		rxq->ctrl->rxq.tunnel = 0;
 	}
 	priv->mark_enabled = 0;
+	priv->sh->shared_mark_enabled = 0;
 }
 
 /**
@@ -2810,8 +2834,8 @@ mlx5_flow_validate_item_tcp(const struct rte_flow_item *item,
  *   Item specification.
  * @param[in] item_flags
  *   Bit-fields that holds the items detected until now.
- * @param[in] attr
- *   Flow rule attributes.
+ * @param root
+ *   Whether action is on root table.
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -2823,7 +2847,7 @@ mlx5_flow_validate_item_vxlan(struct rte_eth_dev *dev,
 			      uint16_t udp_dport,
 			      const struct rte_flow_item *item,
 			      uint64_t item_flags,
-			      const struct rte_flow_attr *attr,
+			      bool root,
 			      struct rte_flow_error *error)
 {
 	const struct rte_flow_item_vxlan *spec = item->spec;
@@ -2860,12 +2884,11 @@ mlx5_flow_validate_item_vxlan(struct rte_eth_dev *dev,
 	if (priv->sh->steering_format_version !=
 	    MLX5_STEERING_LOGIC_FORMAT_CONNECTX_5 ||
 	    !udp_dport || udp_dport == MLX5_UDP_PORT_VXLAN) {
-		/* FDB domain & NIC domain non-zero group */
-		if ((attr->transfer || attr->group) && priv->sh->misc5_cap)
+		/* non-root table */
+		if (!root && priv->sh->misc5_cap)
 			valid_mask = &nic_mask;
 		/* Group zero in NIC domain */
-		if (!attr->group && !attr->transfer &&
-		    priv->sh->tunnel_header_0_1)
+		if (!root && priv->sh->tunnel_header_0_1)
 			valid_mask = &nic_mask;
 	}
 	ret = mlx5_flow_item_acceptable
@@ -3104,11 +3127,11 @@ mlx5_flow_validate_item_gre_option(struct rte_eth_dev *dev,
 	if (mask->checksum_rsvd.checksum || mask->sequence.sequence) {
 		if (priv->sh->steering_format_version ==
 		    MLX5_STEERING_LOGIC_FORMAT_CONNECTX_5 ||
-		    ((attr->group || attr->transfer) &&
+		    ((attr->group || (attr->transfer && priv->fdb_def_rule)) &&
 		     !priv->sh->misc5_cap) ||
 		    (!(priv->sh->tunnel_header_0_1 &&
 		       priv->sh->tunnel_header_2_3) &&
-		    !attr->group && !attr->transfer))
+		    !attr->group && (!attr->transfer || !priv->fdb_def_rule)))
 			return rte_flow_error_set(error, EINVAL,
 						  RTE_FLOW_ERROR_TYPE_ITEM,
 						  item,
@@ -6165,7 +6188,8 @@ flow_create_split_metadata(struct rte_eth_dev *dev,
 	}
 	if (qrss) {
 		/* Check if it is in meter suffix table. */
-		mtr_sfx = attr->group == (attr->transfer ?
+		mtr_sfx = attr->group ==
+			  ((attr->transfer && priv->fdb_def_rule) ?
 			  (MLX5_FLOW_TABLE_LEVEL_METER - 1) :
 			  MLX5_FLOW_TABLE_LEVEL_METER);
 		/*
@@ -11088,3 +11112,43 @@ int mlx5_flow_get_item_vport_id(struct rte_eth_dev *dev,
 
 	return 0;
 }
+
+int
+mlx5_flow_pick_transfer_proxy(struct rte_eth_dev *dev,
+			      uint16_t *proxy_port_id,
+			      struct rte_flow_error *error)
+{
+	const struct mlx5_priv *priv = dev->data->dev_private;
+	uint16_t port_id;
+
+	if (!priv->sh->config.dv_esw_en)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL,
+					  "unable to provide a proxy port"
+					  " without E-Switch configured");
+	if (!priv->master && !priv->representor)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL,
+					  "unable to provide a proxy port"
+					  " for port which is not a master"
+					  " or a representor port");
+	if (priv->master) {
+		*proxy_port_id = dev->data->port_id;
+		return 0;
+	}
+	MLX5_ETH_FOREACH_DEV(port_id, dev->device) {
+		const struct rte_eth_dev *port_dev = &rte_eth_devices[port_id];
+		const struct mlx5_priv *port_priv = port_dev->data->dev_private;
+
+		if (port_priv->master &&
+		    port_priv->domain_id == priv->domain_id) {
+			*proxy_port_id = port_id;
+			return 0;
+		}
+	}
+	return rte_flow_error_set(error, EINVAL,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "unable to find a proxy port");
+}
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 407b3d79bd..25b44ccca2 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1162,6 +1162,11 @@ struct rte_flow_pattern_template {
 	struct mlx5dr_match_template *mt; /* mlx5 match template. */
 	uint64_t item_flags; /* Item layer flags. */
 	uint32_t refcnt;  /* Reference counter. */
+	/*
+	 * If true, then rule pattern should be prepended with
+	 * represented_port pattern item.
+	 */
+	bool implicit_port;
 };
 
 /* Flow action template struct. */
@@ -1237,6 +1242,7 @@ struct mlx5_hw_action_template {
 /* mlx5 flow group struct. */
 struct mlx5_flow_group {
 	struct mlx5_list_entry entry;
+	struct rte_eth_dev *dev; /* Reference to corresponding device. */
 	struct mlx5dr_table *tbl; /* HWS table object. */
 	struct mlx5_hw_jump_action jump; /* Jump action. */
 	enum mlx5dr_table_type type; /* Table type. */
@@ -1495,6 +1501,9 @@ void flow_hw_clear_port_info(struct rte_eth_dev *dev);
 void flow_hw_init_tags_set(struct rte_eth_dev *dev);
 void flow_hw_clear_tags_set(struct rte_eth_dev *dev);
 
+int flow_hw_create_vport_action(struct rte_eth_dev *dev);
+void flow_hw_destroy_vport_action(struct rte_eth_dev *dev);
+
 typedef int (*mlx5_flow_validate_t)(struct rte_eth_dev *dev,
 				    const struct rte_flow_attr *attr,
 				    const struct rte_flow_item items[],
@@ -2093,7 +2102,7 @@ int mlx5_flow_validate_item_vxlan(struct rte_eth_dev *dev,
 				  uint16_t udp_dport,
 				  const struct rte_flow_item *item,
 				  uint64_t item_flags,
-				  const struct rte_flow_attr *attr,
+				  bool root,
 				  struct rte_flow_error *error);
 int mlx5_flow_validate_item_vxlan_gpe(const struct rte_flow_item *item,
 				      uint64_t item_flags,
@@ -2350,4 +2359,15 @@ int flow_dv_translate_items_hws(const struct rte_flow_item *items,
 				uint32_t key_type, uint64_t *item_flags,
 				uint8_t *match_criteria,
 				struct rte_flow_error *error);
+
+int mlx5_flow_pick_transfer_proxy(struct rte_eth_dev *dev,
+				  uint16_t *proxy_port_id,
+				  struct rte_flow_error *error);
+
+int mlx5_flow_hw_flush_ctrl_flows(struct rte_eth_dev *dev);
+
+int mlx5_flow_hw_esw_create_mgr_sq_miss_flow(struct rte_eth_dev *dev);
+int mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev,
+					 uint32_t txq);
+int mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev);
 #endif /* RTE_PMD_MLX5_FLOW_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 9fbaa4bfe8..1ee26be975 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -2446,8 +2446,8 @@ flow_dv_validate_item_gtp(struct rte_eth_dev *dev,
  *   Previous validated item in the pattern items.
  * @param[in] gtp_item
  *   Previous GTP item specification.
- * @param[in] attr
- *   Pointer to flow attributes.
+ * @param root
+ *   Whether action is on root table.
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -2458,7 +2458,7 @@ static int
 flow_dv_validate_item_gtp_psc(const struct rte_flow_item *item,
 			      uint64_t last_item,
 			      const struct rte_flow_item *gtp_item,
-			      const struct rte_flow_attr *attr,
+			      bool root,
 			      struct rte_flow_error *error)
 {
 	const struct rte_flow_item_gtp *gtp_spec;
@@ -2483,7 +2483,7 @@ flow_dv_validate_item_gtp_psc(const struct rte_flow_item *item,
 			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM, item,
 			 "GTP E flag must be 1 to match GTP PSC");
 	/* Check the flow is not created in group zero. */
-	if (!attr->transfer && !attr->group)
+	if (root)
 		return rte_flow_error_set
 			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
 			 "GTP PSC is not supported for group 0");
@@ -3348,20 +3348,19 @@ flow_dv_validate_action_set_tag(struct rte_eth_dev *dev,
 /**
  * Indicates whether ASO aging is supported.
  *
- * @param[in] sh
- *   Pointer to shared device context structure.
- * @param[in] attr
- *   Attributes of flow that includes AGE action.
+ * @param[in] priv
+ *   Pointer to device private context structure.
+ * @param[in] root
+ *   Whether action is on root table.
  *
  * @return
  *   True when ASO aging is supported, false otherwise.
  */
 static inline bool
-flow_hit_aso_supported(const struct mlx5_dev_ctx_shared *sh,
-		const struct rte_flow_attr *attr)
+flow_hit_aso_supported(const struct mlx5_priv *priv, bool root)
 {
-	MLX5_ASSERT(sh && attr);
-	return (sh->flow_hit_aso_en && (attr->transfer || attr->group));
+	MLX5_ASSERT(priv);
+	return (priv->sh->flow_hit_aso_en && !root);
 }
 
 /**
@@ -3373,8 +3372,8 @@ flow_hit_aso_supported(const struct mlx5_dev_ctx_shared *sh,
  *   Indicator if action is shared.
  * @param[in] action_flags
  *   Holds the actions detected until now.
- * @param[in] attr
- *   Attributes of flow that includes this action.
+ * @param[in] root
+ *   Whether action is on root table.
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -3384,7 +3383,7 @@ flow_hit_aso_supported(const struct mlx5_dev_ctx_shared *sh,
 static int
 flow_dv_validate_action_count(struct rte_eth_dev *dev, bool shared,
 			      uint64_t action_flags,
-			      const struct rte_flow_attr *attr,
+			      bool root,
 			      struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -3396,7 +3395,7 @@ flow_dv_validate_action_count(struct rte_eth_dev *dev, bool shared,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "duplicate count actions set");
 	if (shared && (action_flags & MLX5_FLOW_ACTION_AGE) &&
-	    !flow_hit_aso_supported(priv->sh, attr))
+	    !flow_hit_aso_supported(priv, root))
 		return rte_flow_error_set(error, EINVAL,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "old age and indirect count combination is not supported");
@@ -3627,8 +3626,8 @@ flow_dv_validate_action_raw_encap_decap
  *   Holds the actions detected until now.
  * @param[in] item_flags
  *   The items found in this flow rule.
- * @param[in] attr
- *   Pointer to flow attributes.
+ * @param root
+ *   Whether action is on root table.
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -3639,12 +3638,12 @@ static int
 flow_dv_validate_action_aso_ct(struct rte_eth_dev *dev,
 			       uint64_t action_flags,
 			       uint64_t item_flags,
-			       const struct rte_flow_attr *attr,
+			       bool root,
 			       struct rte_flow_error *error)
 {
 	RTE_SET_USED(dev);
 
-	if (attr->group == 0 && !attr->transfer)
+	if (root)
 		return rte_flow_error_set(error, ENOTSUP,
 					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 					  NULL,
@@ -4894,6 +4893,8 @@ flow_dv_validate_action_modify_ttl(const uint64_t action_flags,
  *   Pointer to the modify action.
  * @param[in] attr
  *   Pointer to the flow attributes.
+ * @param root
+ *   Whether action is on root table.
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -4906,6 +4907,7 @@ flow_dv_validate_action_modify_field(struct rte_eth_dev *dev,
 				   const uint64_t action_flags,
 				   const struct rte_flow_action *action,
 				   const struct rte_flow_attr *attr,
+				   bool root,
 				   struct rte_flow_error *error)
 {
 	int ret = 0;
@@ -4953,7 +4955,7 @@ flow_dv_validate_action_modify_field(struct rte_eth_dev *dev,
 	}
 	if (action_modify_field->src.field != RTE_FLOW_FIELD_VALUE &&
 	    action_modify_field->src.field != RTE_FLOW_FIELD_POINTER) {
-		if (!attr->transfer && !attr->group)
+		if (root)
 			return rte_flow_error_set(error, ENOTSUP,
 					RTE_FLOW_ERROR_TYPE_ACTION, action,
 					"modify field action is not"
@@ -5043,8 +5045,7 @@ flow_dv_validate_action_modify_field(struct rte_eth_dev *dev,
 	    action_modify_field->src.field == RTE_FLOW_FIELD_IPV4_ECN ||
 	    action_modify_field->dst.field == RTE_FLOW_FIELD_IPV6_ECN ||
 	    action_modify_field->src.field == RTE_FLOW_FIELD_IPV6_ECN)
-		if (!hca_attr->modify_outer_ip_ecn &&
-		    !attr->transfer && !attr->group)
+		if (!hca_attr->modify_outer_ip_ecn && root)
 			return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_ACTION, action,
 				"modifications of the ECN for current firmware is not supported");
@@ -5078,11 +5079,12 @@ flow_dv_validate_action_jump(struct rte_eth_dev *dev,
 			     bool external, struct rte_flow_error *error)
 {
 	uint32_t target_group, table = 0;
+	struct mlx5_priv *priv = dev->data->dev_private;
 	int ret = 0;
 	struct flow_grp_info grp_info = {
 		.external = !!external,
 		.transfer = !!attributes->transfer,
-		.fdb_def_rule = 1,
+		.fdb_def_rule = !!priv->fdb_def_rule,
 		.std_tbl_fix = 0
 	};
 	if (action_flags & (MLX5_FLOW_FATE_ACTIONS |
@@ -5662,6 +5664,8 @@ flow_dv_modify_clone_free_cb(void *tool_ctx, struct mlx5_list_entry *entry)
  *   Pointer to the COUNT action in sample action list.
  * @param[out] fdb_mirror_limit
  *   Pointer to the FDB mirror limitation flag.
+ * @param root
+ *   Whether action is on root table.
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -5678,6 +5682,7 @@ flow_dv_validate_action_sample(uint64_t *action_flags,
 			       const struct rte_flow_action_rss **sample_rss,
 			       const struct rte_flow_action_count **count,
 			       int *fdb_mirror_limit,
+			       bool root,
 			       struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -5779,7 +5784,7 @@ flow_dv_validate_action_sample(uint64_t *action_flags,
 		case RTE_FLOW_ACTION_TYPE_COUNT:
 			ret = flow_dv_validate_action_count
 				(dev, false, *action_flags | sub_action_flags,
-				 attr, error);
+				 root, error);
 			if (ret < 0)
 				return ret;
 			*count = act->conf;
@@ -7259,7 +7264,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 		case RTE_FLOW_ITEM_TYPE_VXLAN:
 			ret = mlx5_flow_validate_item_vxlan(dev, udp_dport,
 							    items, item_flags,
-							    attr, error);
+							    is_root, error);
 			if (ret < 0)
 				return ret;
 			last_item = MLX5_FLOW_LAYER_VXLAN;
@@ -7353,7 +7358,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			break;
 		case RTE_FLOW_ITEM_TYPE_GTP_PSC:
 			ret = flow_dv_validate_item_gtp_psc(items, last_item,
-							    gtp_item, attr,
+							    gtp_item, is_root,
 							    error);
 			if (ret < 0)
 				return ret;
@@ -7570,7 +7575,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 		case RTE_FLOW_ACTION_TYPE_COUNT:
 			ret = flow_dv_validate_action_count(dev, shared_count,
 							    action_flags,
-							    attr, error);
+							    is_root, error);
 			if (ret < 0)
 				return ret;
 			count = actions->conf;
@@ -7864,7 +7869,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			rw_act_num += MLX5_ACT_NUM_SET_TAG;
 			break;
 		case MLX5_RTE_FLOW_ACTION_TYPE_AGE:
-			if (!attr->transfer && !attr->group)
+			if (is_root)
 				return rte_flow_error_set(error, ENOTSUP,
 						RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 									   NULL,
@@ -7889,7 +7894,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			 * Validate the regular AGE action (using counter)
 			 * mutual exclusion with indirect counter actions.
 			 */
-			if (!flow_hit_aso_supported(priv->sh, attr)) {
+			if (!flow_hit_aso_supported(priv, is_root)) {
 				if (shared_count)
 					return rte_flow_error_set
 						(error, EINVAL,
@@ -7945,6 +7950,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 							     rss, &sample_rss,
 							     &sample_count,
 							     &fdb_mirror_limit,
+							     is_root,
 							     error);
 			if (ret < 0)
 				return ret;
@@ -7961,6 +7967,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 								   action_flags,
 								   actions,
 								   attr,
+								   is_root,
 								   error);
 			if (ret < 0)
 				return ret;
@@ -7974,8 +7981,8 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			break;
 		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
 			ret = flow_dv_validate_action_aso_ct(dev, action_flags,
-							     item_flags, attr,
-							     error);
+							     item_flags,
+							     is_root, error);
 			if (ret < 0)
 				return ret;
 			action_flags |= MLX5_FLOW_ACTION_CT;
@@ -9172,15 +9179,18 @@ flow_dv_translate_item_vxlan(struct rte_eth_dev *dev,
 	if (MLX5_ITEM_VALID(item, key_type))
 		return;
 	MLX5_ITEM_UPDATE(item, key_type, vxlan_v, vxlan_m, &nic_mask);
-	if (item->mask == &nic_mask &&
-	    ((!attr->group && !priv->sh->tunnel_header_0_1) ||
-	    (attr->group && !priv->sh->misc5_cap)))
+	if ((item->mask == &nic_mask) &&
+	    ((!attr->group && !(attr->transfer && priv->fdb_def_rule) &&
+	    !priv->sh->tunnel_header_0_1) ||
+	    ((attr->group || (attr->transfer && priv->fdb_def_rule)) &&
+	    !priv->sh->misc5_cap)))
 		vxlan_m = &rte_flow_item_vxlan_mask;
 	if ((priv->sh->steering_format_version ==
 	     MLX5_STEERING_LOGIC_FORMAT_CONNECTX_5 &&
 	     dport != MLX5_UDP_PORT_VXLAN) ||
-	    (!attr->group && !attr->transfer) ||
-	    ((attr->group || attr->transfer) && !priv->sh->misc5_cap)) {
+	    (!attr->group && !(attr->transfer && priv->fdb_def_rule)) ||
+	    ((attr->group || (attr->transfer && priv->fdb_def_rule)) &&
+	    !priv->sh->misc5_cap)) {
 		misc_v = MLX5_ADDR_OF(fte_match_param, key, misc_parameters);
 		size = sizeof(vxlan_m->vni);
 		vni_v = MLX5_ADDR_OF(fte_match_set_misc, misc_v, vxlan_vni);
@@ -14152,7 +14162,7 @@ flow_dv_translate(struct rte_eth_dev *dev,
 			 */
 			if (action_flags & MLX5_FLOW_ACTION_AGE) {
 				if ((non_shared_age && count) ||
-				    !flow_hit_aso_supported(priv->sh, attr)) {
+				    !flow_hit_aso_supported(priv, !dev_flow->dv.group)) {
 					/* Creates age by counters. */
 					cnt_act = flow_dv_prepare_counter
 								(dev, dev_flow,
@@ -18301,6 +18311,7 @@ flow_dv_action_validate(struct rte_eth_dev *dev,
 			struct rte_flow_error *err)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	/* called from RTE API */
 
 	RTE_SET_USED(conf);
 	switch (action->type) {
@@ -18328,7 +18339,7 @@ flow_dv_action_validate(struct rte_eth_dev *dev,
 						"Indirect age action not supported");
 		return flow_dv_validate_action_age(0, action, dev, err);
 	case RTE_FLOW_ACTION_TYPE_COUNT:
-		return flow_dv_validate_action_count(dev, true, 0, NULL, err);
+		return flow_dv_validate_action_count(dev, true, 0, false, err);
 	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
 		if (!priv->sh->ct_aso_en)
 			return rte_flow_error_set(err, ENOTSUP,
@@ -18505,6 +18516,8 @@ flow_dv_validate_mtr_policy_acts(struct rte_eth_dev *dev,
 	bool def_green = false;
 	bool def_yellow = false;
 	const struct rte_flow_action_rss *rss_color[RTE_COLORS] = {NULL};
+	/* Called from RTE API */
+	bool is_root = !(attr->group || (attr->transfer && priv->fdb_def_rule));
 
 	if (!dev_conf->dv_esw_en)
 		def_domain &= ~MLX5_MTR_DOMAIN_TRANSFER_BIT;
@@ -18706,7 +18719,7 @@ flow_dv_validate_mtr_policy_acts(struct rte_eth_dev *dev,
 				break;
 			case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
 				ret = flow_dv_validate_action_modify_field(dev,
-					action_flags[i], act, attr, &flow_err);
+					action_flags[i], act, attr, is_root, &flow_err);
 				if (ret < 0)
 					return -rte_mtr_error_set(error,
 					  ENOTSUP,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index fa7bd37737..991e4c9b7b 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -20,6 +20,14 @@
 /* Default queue to flush the flows. */
 #define MLX5_DEFAULT_FLUSH_QUEUE 0
 
+/* Maximum number of rules in control flow tables */
+#define MLX5_HW_CTRL_FLOW_NB_RULES (4096)
+
+/* Flow group for SQ miss default flows. */
+#define MLX5_HW_SQ_MISS_GROUP (UINT32_MAX)
+
+static int flow_hw_flush_all_ctrl_flows(struct rte_eth_dev *dev);
+
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops;
 
 /* DR action flags with different table. */
@@ -57,6 +65,9 @@ flow_hw_rxq_flag_set(struct rte_eth_dev *dev, bool enable)
 	for (i = 0; i < priv->rxqs_n; ++i) {
 		struct mlx5_rxq_ctrl *rxq_ctrl = mlx5_rxq_ctrl_get(dev, i);
 
+		/* With RXQ start/stop feature, RXQ might be stopped. */
+		if (!rxq_ctrl)
+			continue;
 		rxq_ctrl->rxq.mark = enable;
 	}
 	priv->mark_enabled = enable;
@@ -810,6 +821,77 @@ flow_hw_modify_field_compile(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static int
+flow_hw_represented_port_compile(struct rte_eth_dev *dev,
+				 const struct rte_flow_attr *attr,
+				 const struct rte_flow_action *action_start,
+				 const struct rte_flow_action *action,
+				 const struct rte_flow_action *action_mask,
+				 struct mlx5_hw_actions *acts,
+				 uint16_t action_dst,
+				 struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_action_ethdev *v = action->conf;
+	const struct rte_flow_action_ethdev *m = action_mask->conf;
+	int ret;
+
+	if (!attr->group)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ATTR, NULL,
+					  "represented_port action cannot"
+					  " be used on group 0");
+	if (!attr->transfer)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ATTR_TRANSFER,
+					  NULL,
+					  "represented_port action requires"
+					  " transfer attribute");
+	if (attr->ingress || attr->egress)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ATTR, NULL,
+					  "represented_port action cannot"
+					  " be used with direction attributes");
+	if (!priv->master)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  "represented_port acton must"
+					  " be used on proxy port");
+	if (m && !!m->port_id) {
+		struct mlx5_priv *port_priv;
+
+		port_priv = mlx5_port_to_eswitch_info(v->port_id, false);
+		if (port_priv == NULL)
+			return rte_flow_error_set
+					(error, EINVAL,
+					 RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					 "port does not exist or unable to"
+					 " obtain E-Switch info for port");
+		MLX5_ASSERT(priv->hw_vport != NULL);
+		if (priv->hw_vport[v->port_id]) {
+			acts->rule_acts[action_dst].action =
+					priv->hw_vport[v->port_id];
+		} else {
+			return rte_flow_error_set
+					(error, EINVAL,
+					 RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					 "cannot use represented_port action"
+					 " with this port");
+		}
+	} else {
+		ret = __flow_hw_act_data_general_append
+				(priv, acts, action->type,
+				 action - action_start, action_dst);
+		if (ret)
+			return rte_flow_error_set
+					(error, ENOMEM,
+					 RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					 "not enough memory to store"
+					 " vport action");
+	}
+	return 0;
+}
+
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -887,7 +969,7 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			break;
 		case RTE_FLOW_ACTION_TYPE_DROP:
 			acts->rule_acts[i++].action =
-				priv->hw_drop[!!attr->group][type];
+				priv->hw_drop[!!attr->group];
 			break;
 		case RTE_FLOW_ACTION_TYPE_MARK:
 			acts->mark = true;
@@ -1020,6 +1102,13 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			if (err)
 				goto err;
 			break;
+		case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
+			if (flow_hw_represented_port_compile
+					(dev, attr, action_start, actions,
+					 masks, acts, i, error))
+				goto err;
+			i++;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -1352,11 +1441,13 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			  struct mlx5dr_rule_action *rule_acts,
 			  uint32_t *acts_num)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow_template_table *table = job->flow->table;
 	struct mlx5_action_construct_data *act_data;
 	const struct rte_flow_action *action;
 	const struct rte_flow_action_raw_encap *raw_encap_data;
 	const struct rte_flow_item *enc_item = NULL;
+	const struct rte_flow_action_ethdev *port_action = NULL;
 	uint8_t *buf = job->encap_data;
 	struct rte_flow_attr attr = {
 			.ingress = 1,
@@ -1476,6 +1567,13 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			if (ret)
 				return -1;
 			break;
+		case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
+			port_action = action->conf;
+			if (!priv->hw_vport[port_action->port_id])
+				return -1;
+			rule_acts[act_data->action_dst].action =
+					priv->hw_vport[port_action->port_id];
+			break;
 		default:
 			break;
 		}
@@ -1488,6 +1586,52 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static const struct rte_flow_item *
+flow_hw_get_rule_items(struct rte_eth_dev *dev,
+		       struct rte_flow_template_table *table,
+		       const struct rte_flow_item items[],
+		       uint8_t pattern_template_index,
+		       struct mlx5_hw_q_job *job)
+{
+	if (table->its[pattern_template_index]->implicit_port) {
+		const struct rte_flow_item *curr_item;
+		unsigned int nb_items;
+		bool found_end;
+		unsigned int i;
+
+		/* Count number of pattern items. */
+		nb_items = 0;
+		found_end = false;
+		for (curr_item = items; !found_end; ++curr_item) {
+			++nb_items;
+			if (curr_item->type == RTE_FLOW_ITEM_TYPE_END)
+				found_end = true;
+		}
+		/* Prepend represented port item. */
+		job->port_spec = (struct rte_flow_item_ethdev){
+			.port_id = dev->data->port_id,
+		};
+		job->items[0] = (struct rte_flow_item){
+			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+			.spec = &job->port_spec,
+		};
+		found_end = false;
+		for (i = 1; i < MLX5_HW_MAX_ITEMS && i - 1 < nb_items; ++i) {
+			job->items[i] = items[i - 1];
+			if (items[i - 1].type == RTE_FLOW_ITEM_TYPE_END) {
+				found_end = true;
+				break;
+			}
+		}
+		if (i >= MLX5_HW_MAX_ITEMS && !found_end) {
+			rte_errno = ENOMEM;
+			return NULL;
+		}
+		return job->items;
+	}
+	return items;
+}
+
 /**
  * Enqueue HW steering flow creation.
  *
@@ -1539,6 +1683,7 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	struct mlx5_hw_actions *hw_acts;
 	struct rte_flow_hw *flow;
 	struct mlx5_hw_q_job *job;
+	const struct rte_flow_item *rule_items;
 	uint32_t acts_num, flow_idx;
 	int ret;
 
@@ -1565,15 +1710,23 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	job->user_data = user_data;
 	rule_attr.user_data = job;
 	hw_acts = &table->ats[action_template_index].acts;
-	/* Construct the flow action array based on the input actions.*/
-	flow_hw_actions_construct(dev, job, hw_acts, pattern_template_index,
-				  actions, rule_acts, &acts_num);
+	/* Construct the flow actions based on the input actions. */
+	if (flow_hw_actions_construct(dev, job, hw_acts, pattern_template_index,
+				  actions, rule_acts, &acts_num)) {
+		rte_errno = EINVAL;
+		goto free;
+	}
+	rule_items = flow_hw_get_rule_items(dev, table, items,
+					    pattern_template_index, job);
+	if (!rule_items)
+		goto free;
 	ret = mlx5dr_rule_create(table->matcher,
-				 pattern_template_index, items,
+				 pattern_template_index, rule_items,
 				 action_template_index, rule_acts,
 				 &rule_attr, (struct mlx5dr_rule *)flow->rule);
 	if (likely(!ret))
 		return (struct rte_flow *)flow;
+free:
 	/* Flow created fail, return the descriptor and flow memory. */
 	mlx5_ipool_free(table->flow, flow_idx);
 	priv->hw_q[queue].job_idx++;
@@ -1754,7 +1907,9 @@ __flow_hw_pull_comp(struct rte_eth_dev *dev,
 	struct rte_flow_op_result comp[BURST_THR];
 	int ret, i, empty_loop = 0;
 
-	flow_hw_push(dev, queue, error);
+	ret = flow_hw_push(dev, queue, error);
+	if (ret < 0)
+		return ret;
 	while (pending_rules) {
 		ret = flow_hw_pull(dev, queue, comp, BURST_THR, error);
 		if (ret < 0)
@@ -2039,8 +2194,12 @@ flow_hw_table_destroy(struct rte_eth_dev *dev,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	int i;
+	uint32_t fidx = 1;
 
-	if (table->refcnt) {
+	/* Build ipool allocated object bitmap. */
+	mlx5_ipool_flush_cache(table->flow);
+	/* Check if ipool has allocated objects. */
+	if (table->refcnt || mlx5_ipool_get_next(table->flow, &fidx)) {
 		DRV_LOG(WARNING, "Table %p is still in using.", (void *)table);
 		return rte_flow_error_set(error, EBUSY,
 				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
@@ -2052,8 +2211,6 @@ flow_hw_table_destroy(struct rte_eth_dev *dev,
 		__atomic_sub_fetch(&table->its[i]->refcnt,
 				   1, __ATOMIC_RELAXED);
 	for (i = 0; i < table->nb_action_templates; i++) {
-		if (table->ats[i].acts.mark)
-			flow_hw_rxq_flag_set(dev, false);
 		__flow_hw_action_template_destroy(dev, &table->ats[i].acts);
 		__atomic_sub_fetch(&table->ats[i].action_template->refcnt,
 				   1, __ATOMIC_RELAXED);
@@ -2138,7 +2295,51 @@ flow_hw_validate_action_modify_field(const struct rte_flow_action *action,
 }
 
 static int
-flow_hw_action_validate(const struct rte_flow_action actions[],
+flow_hw_validate_action_represented_port(struct rte_eth_dev *dev,
+					 const struct rte_flow_action *action,
+					 const struct rte_flow_action *mask,
+					 struct rte_flow_error *error)
+{
+	const struct rte_flow_action_ethdev *action_conf = action->conf;
+	const struct rte_flow_action_ethdev *mask_conf = mask->conf;
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (!priv->sh->config.dv_esw_en)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  "cannot use represented_port actions"
+					  " without an E-Switch");
+	if (mask_conf->port_id) {
+		struct mlx5_priv *port_priv;
+		struct mlx5_priv *dev_priv;
+
+		port_priv = mlx5_port_to_eswitch_info(action_conf->port_id, false);
+		if (!port_priv)
+			return rte_flow_error_set(error, rte_errno,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  action,
+						  "failed to obtain E-Switch"
+						  " info for port");
+		dev_priv = mlx5_dev_to_eswitch_info(dev);
+		if (!dev_priv)
+			return rte_flow_error_set(error, rte_errno,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  action,
+						  "failed to obtain E-Switch"
+						  " info for transfer proxy");
+		if (port_priv->domain_id != dev_priv->domain_id)
+			return rte_flow_error_set(error, rte_errno,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  action,
+						  "cannot forward to port from"
+						  " a different E-Switch");
+	}
+	return 0;
+}
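
For reference, a minimal sketch (illustrative only, not part of the patch) of a value/mask pair that would take the masked branch above when an actions template is created; the port number is a placeholder:

#include <stdint.h>
#include <rte_flow.h>

/* A fully masked port_id fixes the destination port at template creation
 * time, so the E-Switch domain check above runs on action_conf->port_id. */
static const struct rte_flow_action_ethdev port_v = { .port_id = 1 };
static const struct rte_flow_action_ethdev port_m = { .port_id = UINT16_MAX };
static const struct rte_flow_action actions_v[] = {
	{ .type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT, .conf = &port_v },
	{ .type = RTE_FLOW_ACTION_TYPE_END },
};
static const struct rte_flow_action actions_m[] = {
	{ .type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT, .conf = &port_m },
	{ .type = RTE_FLOW_ACTION_TYPE_END },
};
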
+
+static int
+flow_hw_action_validate(struct rte_eth_dev *dev,
+			const struct rte_flow_action actions[],
 			const struct rte_flow_action masks[],
 			struct rte_flow_error *error)
 {
@@ -2201,6 +2402,12 @@ flow_hw_action_validate(const struct rte_flow_action actions[],
 			if (ret < 0)
 				return ret;
 			break;
+		case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
+			ret = flow_hw_validate_action_represented_port
+					(dev, action, mask, error);
+			if (ret < 0)
+				return ret;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -2242,7 +2449,7 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 	int len, act_len, mask_len, i;
 	struct rte_flow_actions_template *at;
 
-	if (flow_hw_action_validate(actions, masks, error))
+	if (flow_hw_action_validate(dev, actions, masks, error))
 		return NULL;
 	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS,
 				NULL, 0, actions, error);
@@ -2325,6 +2532,46 @@ flow_hw_actions_template_destroy(struct rte_eth_dev *dev __rte_unused,
 	return 0;
 }
 
+static struct rte_flow_item *
+flow_hw_copy_prepend_port_item(const struct rte_flow_item *items,
+			       struct rte_flow_error *error)
+{
+	const struct rte_flow_item *curr_item;
+	struct rte_flow_item *copied_items;
+	bool found_end;
+	unsigned int nb_items;
+	unsigned int i;
+	size_t size;
+
+	/* Count number of pattern items. */
+	nb_items = 0;
+	found_end = false;
+	for (curr_item = items; !found_end; ++curr_item) {
+		++nb_items;
+		if (curr_item->type == RTE_FLOW_ITEM_TYPE_END)
+			found_end = true;
+	}
+	/* Allocate new array of items and prepend REPRESENTED_PORT item. */
+	size = sizeof(*copied_items) * (nb_items + 1);
+	copied_items = mlx5_malloc(MLX5_MEM_ZERO, size, 0, rte_socket_id());
+	if (!copied_items) {
+		rte_flow_error_set(error, ENOMEM,
+				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				   NULL,
+				   "cannot allocate item template");
+		return NULL;
+	}
+	copied_items[0] = (struct rte_flow_item){
+		.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+		.spec = NULL,
+		.last = NULL,
+		.mask = &rte_flow_item_ethdev_mask,
+	};
+	for (i = 1; i < nb_items + 1; ++i)
+		copied_items[i] = items[i - 1];
+	return copied_items;
+}
+
 /**
  * Create flow item template.
  *
@@ -2348,9 +2595,35 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow_pattern_template *it;
+	struct rte_flow_item *copied_items = NULL;
+	const struct rte_flow_item *tmpl_items;
 
+	if (priv->sh->config.dv_esw_en && attr->ingress) {
+		/*
+		 * Disallow pattern template with ingress and egress/transfer
+		 * attributes in order to forbid implicit port matching
+		 * on egress and transfer traffic.
+		 */
+		if (attr->egress || attr->transfer) {
+			rte_flow_error_set(error, EINVAL,
+					   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					   NULL,
+					   "item template for ingress traffic"
+					   " cannot be used for egress/transfer"
+					   " traffic when E-Switch is enabled");
+			return NULL;
+		}
+		copied_items = flow_hw_copy_prepend_port_item(items, error);
+		if (!copied_items)
+			return NULL;
+		tmpl_items = copied_items;
+	} else {
+		tmpl_items = items;
+	}
 	it = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*it), 0, rte_socket_id());
 	if (!it) {
+		if (copied_items)
+			mlx5_free(copied_items);
 		rte_flow_error_set(error, ENOMEM,
 				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				   NULL,
@@ -2358,8 +2631,10 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 		return NULL;
 	}
 	it->attr = *attr;
-	it->mt = mlx5dr_match_template_create(items, attr->relaxed_matching);
+	it->mt = mlx5dr_match_template_create(tmpl_items, attr->relaxed_matching);
 	if (!it->mt) {
+		if (copied_items)
+			mlx5_free(copied_items);
 		mlx5_free(it);
 		rte_flow_error_set(error, rte_errno,
 				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
@@ -2367,9 +2642,12 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 				   "cannot create match template");
 		return NULL;
 	}
-	it->item_flags = flow_hw_rss_item_flags_get(items);
+	it->item_flags = flow_hw_rss_item_flags_get(tmpl_items);
+	it->implicit_port = !!copied_items;
 	__atomic_fetch_add(&it->refcnt, 1, __ATOMIC_RELAXED);
 	LIST_INSERT_HEAD(&priv->flow_hw_itt, it, next);
+	if (copied_items)
+		mlx5_free(copied_items);
 	return it;
 }
 
@@ -2495,6 +2773,7 @@ flow_hw_grp_create_cb(void *tool_ctx, void *cb_ctx)
 			goto error;
 		grp_data->jump.root_action = jump;
 	}
+	grp_data->dev = dev;
 	grp_data->idx = idx;
 	grp_data->group_id = attr->group;
 	grp_data->type = dr_tbl_attr.type;
@@ -2563,7 +2842,8 @@ flow_hw_grp_match_cb(void *tool_ctx __rte_unused, struct mlx5_list_entry *entry,
 	struct rte_flow_attr *attr =
 			(struct rte_flow_attr *)ctx->data;
 
-	return (grp_data->group_id != attr->group) ||
+	return (grp_data->dev != ctx->dev) ||
+		(grp_data->group_id != attr->group) ||
 		((grp_data->type != MLX5DR_TABLE_TYPE_FDB) &&
 		attr->transfer) ||
 		((grp_data->type != MLX5DR_TABLE_TYPE_NIC_TX) &&
@@ -2626,6 +2906,545 @@ flow_hw_grp_clone_free_cb(void *tool_ctx, struct mlx5_list_entry *entry)
 	mlx5_ipool_free(sh->ipool[MLX5_IPOOL_HW_GRP], grp_data->idx);
 }
 
+/**
+ * Create and cache a vport action for the given @p dev port. The vport
+ * action cache is used in HWS with FDB flows.
+ *
+ * This function does not create any action if the proxy port for @p dev port
+ * was not configured for HW Steering.
+ *
+ * This function assumes that E-Switch is enabled and PMD is running with
+ * HW Steering configured.
+ *
+ * @param dev
+ *   Pointer to Ethernet device which will be the action destination.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise.
+ */
+int
+flow_hw_create_vport_action(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_eth_dev *proxy_dev;
+	struct mlx5_priv *proxy_priv;
+	uint16_t port_id = dev->data->port_id;
+	uint16_t proxy_port_id = port_id;
+	int ret;
+
+	ret = mlx5_flow_pick_transfer_proxy(dev, &proxy_port_id, NULL);
+	if (ret)
+		return ret;
+	proxy_dev = &rte_eth_devices[proxy_port_id];
+	proxy_priv = proxy_dev->data->dev_private;
+	if (!proxy_priv->hw_vport)
+		return 0;
+	if (proxy_priv->hw_vport[port_id]) {
+		DRV_LOG(ERR, "port %u HWS vport action already created",
+			port_id);
+		return -EINVAL;
+	}
+	proxy_priv->hw_vport[port_id] = mlx5dr_action_create_dest_vport
+			(proxy_priv->dr_ctx, priv->dev_port,
+			 MLX5DR_ACTION_FLAG_HWS_FDB);
+	if (!proxy_priv->hw_vport[port_id]) {
+		DRV_LOG(ERR, "port %u unable to create HWS vport action",
+			port_id);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+/**
+ * Destroys the vport action associated with @p dev device and removes it
+ * from the vport action cache.
+ *
+ * This function does not destroy any action if there is no action cached
+ * for @p dev or proxy port was not configured for HW Steering.
+ *
+ * This function assumes that E-Switch is enabled and PMD is running with
+ * HW Steering configured.
+ *
+ * @param dev
+ *   Pointer to Ethernet device which will be the action destination.
+ */
+void
+flow_hw_destroy_vport_action(struct rte_eth_dev *dev)
+{
+	struct rte_eth_dev *proxy_dev;
+	struct mlx5_priv *proxy_priv;
+	uint16_t port_id = dev->data->port_id;
+	uint16_t proxy_port_id = port_id;
+
+	if (mlx5_flow_pick_transfer_proxy(dev, &proxy_port_id, NULL))
+		return;
+	proxy_dev = &rte_eth_devices[proxy_port_id];
+	proxy_priv = proxy_dev->data->dev_private;
+	if (!proxy_priv->hw_vport || !proxy_priv->hw_vport[port_id])
+		return;
+	mlx5dr_action_destroy(proxy_priv->hw_vport[port_id]);
+	proxy_priv->hw_vport[port_id] = NULL;
+}
+
+static int
+flow_hw_create_vport_actions(struct mlx5_priv *priv)
+{
+	uint16_t port_id;
+
+	MLX5_ASSERT(!priv->hw_vport);
+	priv->hw_vport = mlx5_malloc(MLX5_MEM_ZERO,
+				     sizeof(*priv->hw_vport) * RTE_MAX_ETHPORTS,
+				     0, SOCKET_ID_ANY);
+	if (!priv->hw_vport)
+		return -ENOMEM;
+	DRV_LOG(DEBUG, "port %u :: creating vport actions", priv->dev_data->port_id);
+	DRV_LOG(DEBUG, "port %u ::    domain_id=%u", priv->dev_data->port_id, priv->domain_id);
+	MLX5_ETH_FOREACH_DEV(port_id, NULL) {
+		struct mlx5_priv *port_priv = rte_eth_devices[port_id].data->dev_private;
+
+		if (!port_priv ||
+		    port_priv->domain_id != priv->domain_id)
+			continue;
+		DRV_LOG(DEBUG, "port %u :: for port_id=%u, calling mlx5dr_action_create_dest_vport() with ibport=%u",
+			priv->dev_data->port_id, port_id, port_priv->dev_port);
+		priv->hw_vport[port_id] = mlx5dr_action_create_dest_vport
+				(priv->dr_ctx, port_priv->dev_port,
+				 MLX5DR_ACTION_FLAG_HWS_FDB);
+		DRV_LOG(DEBUG, "port %u :: priv->hw_vport[%u]=%p",
+			priv->dev_data->port_id, port_id, (void *)priv->hw_vport[port_id]);
+		if (!priv->hw_vport[port_id])
+			return -EINVAL;
+	}
+	return 0;
+}
+
+static void
+flow_hw_free_vport_actions(struct mlx5_priv *priv)
+{
+	uint16_t port_id;
+
+	if (!priv->hw_vport)
+		return;
+	for (port_id = 0; port_id < RTE_MAX_ETHPORTS; ++port_id)
+		if (priv->hw_vport[port_id])
+			mlx5dr_action_destroy(priv->hw_vport[port_id]);
+	mlx5_free(priv->hw_vport);
+	priv->hw_vport = NULL;
+}
+
+/**
+ * Creates a flow pattern template used to match on E-Switch Manager.
+ * This template is used to set up a table for SQ miss default flow.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow pattern template on success, NULL otherwise.
+ */
+static struct rte_flow_pattern_template *
+flow_hw_create_ctrl_esw_mgr_pattern_template(struct rte_eth_dev *dev)
+{
+	struct rte_flow_pattern_template_attr attr = {
+		.relaxed_matching = 0,
+		.transfer = 1,
+	};
+	struct rte_flow_item_ethdev port_spec = {
+		.port_id = MLX5_REPRESENTED_PORT_ESW_MGR,
+	};
+	struct rte_flow_item_ethdev port_mask = {
+		.port_id = UINT16_MAX,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+			.spec = &port_spec,
+			.mask = &port_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+
+	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
+}
+
+/**
+ * Creates a flow pattern template used to match on a TX queue.
+ * This template is used to set up a table for SQ miss default flow.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow pattern template on success, NULL otherwise.
+ */
+static struct rte_flow_pattern_template *
+flow_hw_create_ctrl_sq_pattern_template(struct rte_eth_dev *dev)
+{
+	struct rte_flow_pattern_template_attr attr = {
+		.relaxed_matching = 0,
+		.transfer = 1,
+	};
+	struct mlx5_rte_flow_item_sq queue_mask = {
+		.queue = UINT32_MAX,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = (enum rte_flow_item_type)
+				MLX5_RTE_FLOW_ITEM_TYPE_SQ,
+			.mask = &queue_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+
+	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
+}
+
+/**
+ * Creates a flow pattern template with unmasked represented port matching.
+ * This template is used to set up a table for default transfer flows
+ * directing packets to group 1.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow pattern template on success, NULL otherwise.
+ */
+static struct rte_flow_pattern_template *
+flow_hw_create_ctrl_port_pattern_template(struct rte_eth_dev *dev)
+{
+	struct rte_flow_pattern_template_attr attr = {
+		.relaxed_matching = 0,
+		.transfer = 1,
+	};
+	struct rte_flow_item_ethdev port_mask = {
+		.port_id = UINT16_MAX,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+			.mask = &port_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+
+	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
+}
+
+/**
+ * Creates a flow actions template with an unmasked JUMP action. Flows
+ * based on this template will perform a jump to some group. This template
+ * is used to set up tables for control flows.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param group
+ *   Destination group for this action template.
+ *
+ * @return
+ *   Pointer to flow actions template on success, NULL otherwise.
+ */
+static struct rte_flow_actions_template *
+flow_hw_create_ctrl_jump_actions_template(struct rte_eth_dev *dev,
+					  uint32_t group)
+{
+	struct rte_flow_actions_template_attr attr = {
+		.transfer = 1,
+	};
+	struct rte_flow_action_jump jump_v = {
+		.group = group,
+	};
+	struct rte_flow_action_jump jump_m = {
+		.group = UINT32_MAX,
+	};
+	struct rte_flow_action actions_v[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump_v,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+	struct rte_flow_action actions_m[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump_m,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+
+	return flow_hw_actions_template_create(dev, &attr, actions_v, actions_m,
+					       NULL);
+}
+
+/**
+ * Creates a flow actions template with an unmasked REPRESENTED_PORT action.
+ * It is used to create control flow tables.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow action template on success, NULL otherwise.
+ */
+static struct rte_flow_actions_template *
+flow_hw_create_ctrl_port_actions_template(struct rte_eth_dev *dev)
+{
+	struct rte_flow_actions_template_attr attr = {
+		.transfer = 1,
+	};
+	struct rte_flow_action_ethdev port_v = {
+		.port_id = 0,
+	};
+	struct rte_flow_action actions_v[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT,
+			.conf = &port_v,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+	struct rte_flow_action_ethdev port_m = {
+		.port_id = 0,
+	};
+	struct rte_flow_action actions_m[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT,
+			.conf = &port_m,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+
+	return flow_hw_actions_template_create(dev, &attr, actions_v, actions_m,
+					       NULL);
+}
+
+/**
+ * Creates a control flow table used to transfer traffic from the E-Switch
+ * Manager and the TX queues from the root group (0) to the SQ miss group.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param it
+ *   Pointer to flow pattern template.
+ * @param at
+ *   Pointer to flow actions template.
+ *
+ * @return
+ *   Pointer to flow table on success, NULL otherwise.
+ */
+static struct rte_flow_template_table*
+flow_hw_create_ctrl_sq_miss_root_table(struct rte_eth_dev *dev,
+				       struct rte_flow_pattern_template *it,
+				       struct rte_flow_actions_template *at)
+{
+	struct rte_flow_template_table_attr attr = {
+		.flow_attr = {
+			.group = 0,
+			.priority = 0,
+			.ingress = 0,
+			.egress = 0,
+			.transfer = 1,
+		},
+		.nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
+	};
+
+	return flow_hw_table_create(dev, &attr, &it, 1, &at, 1, NULL);
+}
+
+/**
+ * Creates a control flow table in the SQ miss group, used to forward traffic
+ * coming from a given TX queue (SQ) to the port owning that queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param it
+ *   Pointer to flow pattern template.
+ * @param at
+ *   Pointer to flow actions template.
+ *
+ * @return
+ *   Pointer to flow table on success, NULL otherwise.
+ */
+static struct rte_flow_template_table*
+flow_hw_create_ctrl_sq_miss_table(struct rte_eth_dev *dev,
+				  struct rte_flow_pattern_template *it,
+				  struct rte_flow_actions_template *at)
+{
+	struct rte_flow_template_table_attr attr = {
+		.flow_attr = {
+			.group = MLX5_HW_SQ_MISS_GROUP,
+			.priority = 0,
+			.ingress = 0,
+			.egress = 0,
+			.transfer = 1,
+		},
+		.nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
+	};
+
+	return flow_hw_table_create(dev, &attr, &it, 1, &at, 1, NULL);
+}
+
+/**
+ * Creates a control flow table used to transfer traffic
+ * from group 0 to group 1.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param it
+ *   Pointer to flow pattern template.
+ * @param at
+ *   Pointer to flow actions template.
+ *
+ * @return
+ *   Pointer to flow table on success, NULL otherwise.
+ */
+static struct rte_flow_template_table *
+flow_hw_create_ctrl_jump_table(struct rte_eth_dev *dev,
+			       struct rte_flow_pattern_template *it,
+			       struct rte_flow_actions_template *at)
+{
+	struct rte_flow_template_table_attr attr = {
+		.flow_attr = {
+			.group = 0,
+			.priority = 15, /* TODO: Flow priority discovery. */
+			.ingress = 0,
+			.egress = 0,
+			.transfer = 1,
+		},
+		.nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
+	};
+
+	return flow_hw_table_create(dev, &attr, &it, 1, &at, 1, NULL);
+}
+
+/**
+ * Creates a set of flow tables used to create control flows used
+ * when E-Switch is engaged.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, a negative errno value (-EINVAL) otherwise.
+ */
+static __rte_unused int
+flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_flow_pattern_template *esw_mgr_items_tmpl = NULL;
+	struct rte_flow_pattern_template *sq_items_tmpl = NULL;
+	struct rte_flow_pattern_template *port_items_tmpl = NULL;
+	struct rte_flow_actions_template *jump_sq_actions_tmpl = NULL;
+	struct rte_flow_actions_template *port_actions_tmpl = NULL;
+	struct rte_flow_actions_template *jump_one_actions_tmpl = NULL;
+
+	/* Item templates */
+	esw_mgr_items_tmpl = flow_hw_create_ctrl_esw_mgr_pattern_template(dev);
+	if (!esw_mgr_items_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create E-Switch Manager item"
+			" template for control flows", dev->data->port_id);
+		goto error;
+	}
+	sq_items_tmpl = flow_hw_create_ctrl_sq_pattern_template(dev);
+	if (!sq_items_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create SQ item template for"
+			" control flows", dev->data->port_id);
+		goto error;
+	}
+	port_items_tmpl = flow_hw_create_ctrl_port_pattern_template(dev);
+	if (!port_items_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create port item template for"
+			" control flows", dev->data->port_id);
+		goto error;
+	}
+	/* Action templates */
+	jump_sq_actions_tmpl = flow_hw_create_ctrl_jump_actions_template(dev,
+									 MLX5_HW_SQ_MISS_GROUP);
+	if (!jump_sq_actions_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create jump action template"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
+	port_actions_tmpl = flow_hw_create_ctrl_port_actions_template(dev);
+	if (!port_actions_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create port action template"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
+	jump_one_actions_tmpl = flow_hw_create_ctrl_jump_actions_template(dev, 1);
+	if (!jump_one_actions_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create jump action template"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
+	/* Tables */
+	MLX5_ASSERT(priv->hw_esw_sq_miss_root_tbl == NULL);
+	priv->hw_esw_sq_miss_root_tbl = flow_hw_create_ctrl_sq_miss_root_table
+			(dev, esw_mgr_items_tmpl, jump_sq_actions_tmpl);
+	if (!priv->hw_esw_sq_miss_root_tbl) {
+		DRV_LOG(ERR, "port %u failed to create table for default sq miss (root table)"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
+	MLX5_ASSERT(priv->hw_esw_sq_miss_tbl == NULL);
+	priv->hw_esw_sq_miss_tbl = flow_hw_create_ctrl_sq_miss_table(dev, sq_items_tmpl,
+								     port_actions_tmpl);
+	if (!priv->hw_esw_sq_miss_tbl) {
+		DRV_LOG(ERR, "port %u failed to create table for default sq miss (non-root table)"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
+	MLX5_ASSERT(priv->hw_esw_zero_tbl == NULL);
+	priv->hw_esw_zero_tbl = flow_hw_create_ctrl_jump_table(dev, port_items_tmpl,
+							       jump_one_actions_tmpl);
+	if (!priv->hw_esw_zero_tbl) {
+		DRV_LOG(ERR, "port %u failed to create table for default jump to group 1"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
+	return 0;
+error:
+	if (priv->hw_esw_zero_tbl) {
+		flow_hw_table_destroy(dev, priv->hw_esw_zero_tbl, NULL);
+		priv->hw_esw_zero_tbl = NULL;
+	}
+	if (priv->hw_esw_sq_miss_tbl) {
+		flow_hw_table_destroy(dev, priv->hw_esw_sq_miss_tbl, NULL);
+		priv->hw_esw_sq_miss_tbl = NULL;
+	}
+	if (priv->hw_esw_sq_miss_root_tbl) {
+		flow_hw_table_destroy(dev, priv->hw_esw_sq_miss_root_tbl, NULL);
+		priv->hw_esw_sq_miss_root_tbl = NULL;
+	}
+	if (jump_one_actions_tmpl)
+		flow_hw_actions_template_destroy(dev, jump_one_actions_tmpl, NULL);
+	if (port_actions_tmpl)
+		flow_hw_actions_template_destroy(dev, port_actions_tmpl, NULL);
+	if (jump_sq_actions_tmpl)
+		flow_hw_actions_template_destroy(dev, jump_sq_actions_tmpl, NULL);
+	if (port_items_tmpl)
+		flow_hw_pattern_template_destroy(dev, port_items_tmpl, NULL);
+	if (sq_items_tmpl)
+		flow_hw_pattern_template_destroy(dev, sq_items_tmpl, NULL);
+	if (esw_mgr_items_tmpl)
+		flow_hw_pattern_template_destroy(dev, esw_mgr_items_tmpl, NULL);
+	return -EINVAL;
+}
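
As an editorial summary under the assumptions visible in this patch (not part of the code), the three tables created above form the following default FDB pipeline:

/*
 * hw_esw_sq_miss_root_tbl (group 0, root):
 *     match REPRESENTED_PORT == E-Switch Manager -> JUMP to MLX5_HW_SQ_MISS_GROUP
 * hw_esw_sq_miss_tbl (group MLX5_HW_SQ_MISS_GROUP):
 *     match SQ number                            -> REPRESENTED_PORT (owning port)
 * hw_esw_zero_tbl (group 0, root, lowest priority):
 *     match REPRESENTED_PORT (per-rule spec)     -> JUMP to group 1
 */
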
+
 /**
  * Configure port HWS resources.
  *
@@ -2643,7 +3462,6 @@ flow_hw_grp_clone_free_cb(void *tool_ctx, struct mlx5_list_entry *entry)
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
-
 static int
 flow_hw_configure(struct rte_eth_dev *dev,
 		  const struct rte_flow_port_attr *port_attr,
@@ -2666,6 +3484,14 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		.free = mlx5_free,
 		.type = "mlx5_hw_action_construct_data",
 	};
+	/*
+	 * One additional queue is reserved for PMD internal use;
+	 * it is always the last queue.
+	 */
+	uint16_t nb_q_updated;
+	struct rte_flow_queue_attr **_queue_attr = NULL;
+	struct rte_flow_queue_attr ctrl_queue_attr = {0};
+	bool is_proxy = !!(priv->sh->config.dv_esw_en && priv->master);
+	int ret;
 
 	if (!port_attr || !nb_queue || !queue_attr) {
 		rte_errno = EINVAL;
@@ -2674,7 +3500,7 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	/* In case re-configuring, release existing context at first. */
 	if (priv->dr_ctx) {
 		/* */
-		for (i = 0; i < nb_queue; i++) {
+		for (i = 0; i < priv->nb_queue; i++) {
 			hw_q = &priv->hw_q[i];
 			/* Make sure all queues are empty. */
 			if (hw_q->size != hw_q->job_idx) {
@@ -2684,26 +3510,42 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		}
 		flow_hw_resource_release(dev);
 	}
+	ctrl_queue_attr.size = queue_attr[0]->size;
+	nb_q_updated = nb_queue + 1;
+	_queue_attr = mlx5_malloc(MLX5_MEM_ZERO,
+				  nb_q_updated *
+				  sizeof(struct rte_flow_queue_attr *),
+				  64, SOCKET_ID_ANY);
+	if (!_queue_attr) {
+		rte_errno = ENOMEM;
+		goto err;
+	}
+
+	memcpy(_queue_attr, queue_attr,
+	       sizeof(void *) * nb_queue);
+	_queue_attr[nb_queue] = &ctrl_queue_attr;
 	priv->acts_ipool = mlx5_ipool_create(&cfg);
 	if (!priv->acts_ipool)
 		goto err;
 	/* Allocate the queue job descriptor LIFO. */
-	mem_size = sizeof(priv->hw_q[0]) * nb_queue;
-	for (i = 0; i < nb_queue; i++) {
+	mem_size = sizeof(priv->hw_q[0]) * nb_q_updated;
+	for (i = 0; i < nb_q_updated; i++) {
 		/*
 		 * Check if the queues' size are all the same as the
 		 * limitation from HWS layer.
 		 */
-		if (queue_attr[i]->size != queue_attr[0]->size) {
+		if (_queue_attr[i]->size != _queue_attr[0]->size) {
 			rte_errno = EINVAL;
 			goto err;
 		}
 		mem_size += (sizeof(struct mlx5_hw_q_job *) +
+			    sizeof(struct mlx5_hw_q_job) +
 			    sizeof(uint8_t) * MLX5_ENCAP_MAX_LEN +
 			    sizeof(struct mlx5_modification_cmd) *
 			    MLX5_MHDR_MAX_CMD +
-			    sizeof(struct mlx5_hw_q_job)) *
-			    queue_attr[0]->size;
+			    sizeof(struct rte_flow_item) *
+			    MLX5_HW_MAX_ITEMS) *
+			    _queue_attr[i]->size;
 	}
 	priv->hw_q = mlx5_malloc(MLX5_MEM_ZERO, mem_size,
 				 64, SOCKET_ID_ANY);
@@ -2711,58 +3553,82 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		rte_errno = ENOMEM;
 		goto err;
 	}
-	for (i = 0; i < nb_queue; i++) {
+	for (i = 0; i < nb_q_updated; i++) {
 		uint8_t *encap = NULL;
 		struct mlx5_modification_cmd *mhdr_cmd = NULL;
+		struct rte_flow_item *items = NULL;
 
-		priv->hw_q[i].job_idx = queue_attr[i]->size;
-		priv->hw_q[i].size = queue_attr[i]->size;
+		priv->hw_q[i].job_idx = _queue_attr[i]->size;
+		priv->hw_q[i].size = _queue_attr[i]->size;
 		if (i == 0)
 			priv->hw_q[i].job = (struct mlx5_hw_q_job **)
-					    &priv->hw_q[nb_queue];
+					    &priv->hw_q[nb_q_updated];
 		else
 			priv->hw_q[i].job = (struct mlx5_hw_q_job **)
-					    &job[queue_attr[i - 1]->size];
+				&job[_queue_attr[i - 1]->size - 1].items
+				 [MLX5_HW_MAX_ITEMS];
 		job = (struct mlx5_hw_q_job *)
-		      &priv->hw_q[i].job[queue_attr[i]->size];
-		mhdr_cmd = (struct mlx5_modification_cmd *)&job[queue_attr[i]->size];
-		encap = (uint8_t *)&mhdr_cmd[queue_attr[i]->size * MLX5_MHDR_MAX_CMD];
-		for (j = 0; j < queue_attr[i]->size; j++) {
+		      &priv->hw_q[i].job[_queue_attr[i]->size];
+		mhdr_cmd = (struct mlx5_modification_cmd *)
+			   &job[_queue_attr[i]->size];
+		encap = (uint8_t *)
+			 &mhdr_cmd[_queue_attr[i]->size * MLX5_MHDR_MAX_CMD];
+		items = (struct rte_flow_item *)
+			 &encap[_queue_attr[i]->size * MLX5_ENCAP_MAX_LEN];
+		for (j = 0; j < _queue_attr[i]->size; j++) {
 			job[j].mhdr_cmd = &mhdr_cmd[j * MLX5_MHDR_MAX_CMD];
 			job[j].encap_data = &encap[j * MLX5_ENCAP_MAX_LEN];
+			job[j].items = &items[j * MLX5_HW_MAX_ITEMS];
 			priv->hw_q[i].job[j] = &job[j];
 		}
 	}
 	dr_ctx_attr.pd = priv->sh->cdev->pd;
-	dr_ctx_attr.queues = nb_queue;
+	dr_ctx_attr.queues = nb_q_updated;
 	/* Queue size should all be the same. Take the first one. */
-	dr_ctx_attr.queue_size = queue_attr[0]->size;
+	dr_ctx_attr.queue_size = _queue_attr[0]->size;
 	dr_ctx = mlx5dr_context_open(priv->sh->cdev->ctx, &dr_ctx_attr);
 	/* rte_errno has been updated by HWS layer. */
 	if (!dr_ctx)
 		goto err;
 	priv->dr_ctx = dr_ctx;
-	priv->nb_queue = nb_queue;
+	priv->nb_queue = nb_q_updated;
+	rte_spinlock_init(&priv->hw_ctrl_lock);
+	LIST_INIT(&priv->hw_ctrl_flows);
 	/* Add global actions. */
 	for (i = 0; i < MLX5_HW_ACTION_FLAG_MAX; i++) {
-		for (j = 0; j < MLX5DR_TABLE_TYPE_MAX; j++) {
-			priv->hw_drop[i][j] = mlx5dr_action_create_dest_drop
-				(priv->dr_ctx, mlx5_hw_act_flag[i][j]);
-			if (!priv->hw_drop[i][j])
-				goto err;
-		}
+		uint32_t act_flags = 0;
+
+		act_flags = mlx5_hw_act_flag[i][0] | mlx5_hw_act_flag[i][1];
+		if (is_proxy)
+			act_flags |= mlx5_hw_act_flag[i][2];
+		priv->hw_drop[i] = mlx5dr_action_create_dest_drop(priv->dr_ctx, act_flags);
+		if (!priv->hw_drop[i])
+			goto err;
 		priv->hw_tag[i] = mlx5dr_action_create_tag
 			(priv->dr_ctx, mlx5_hw_act_flag[i][0]);
 		if (!priv->hw_tag[i])
 			goto err;
 	}
+	if (is_proxy) {
+		ret = flow_hw_create_vport_actions(priv);
+		if (ret) {
+			rte_errno = -ret;
+			goto err;
+		}
+		ret = flow_hw_create_ctrl_tables(dev);
+		if (ret) {
+			rte_errno = -ret;
+			goto err;
+		}
+	}
+	if (_queue_attr)
+		mlx5_free(_queue_attr);
 	return 0;
 err:
+	flow_hw_free_vport_actions(priv);
 	for (i = 0; i < MLX5_HW_ACTION_FLAG_MAX; i++) {
-		for (j = 0; j < MLX5DR_TABLE_TYPE_MAX; j++) {
-			if (priv->hw_drop[i][j])
-				mlx5dr_action_destroy(priv->hw_drop[i][j]);
-		}
+		if (priv->hw_drop[i])
+			mlx5dr_action_destroy(priv->hw_drop[i]);
 		if (priv->hw_tag[i])
 			mlx5dr_action_destroy(priv->hw_tag[i]);
 	}
@@ -2774,6 +3640,8 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		mlx5_ipool_destroy(priv->acts_ipool);
 		priv->acts_ipool = NULL;
 	}
+	if (_queue_attr)
+		mlx5_free(_queue_attr);
 	return rte_flow_error_set(error, rte_errno,
 				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
 				  "fail to configure port");
@@ -2792,10 +3660,12 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 	struct rte_flow_template_table *tbl;
 	struct rte_flow_pattern_template *it;
 	struct rte_flow_actions_template *at;
-	int i, j;
+	int i;
 
 	if (!priv->dr_ctx)
 		return;
+	flow_hw_rxq_flag_set(dev, false);
+	flow_hw_flush_all_ctrl_flows(dev);
 	while (!LIST_EMPTY(&priv->flow_hw_tbl)) {
 		tbl = LIST_FIRST(&priv->flow_hw_tbl);
 		flow_hw_table_destroy(dev, tbl, NULL);
@@ -2809,13 +3679,12 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 		flow_hw_actions_template_destroy(dev, at, NULL);
 	}
 	for (i = 0; i < MLX5_HW_ACTION_FLAG_MAX; i++) {
-		for (j = 0; j < MLX5DR_TABLE_TYPE_MAX; j++) {
-			if (priv->hw_drop[i][j])
-				mlx5dr_action_destroy(priv->hw_drop[i][j]);
-		}
+		if (priv->hw_drop[i])
+			mlx5dr_action_destroy(priv->hw_drop[i]);
 		if (priv->hw_tag[i])
 			mlx5dr_action_destroy(priv->hw_tag[i]);
 	}
+	flow_hw_free_vport_actions(priv);
 	if (priv->acts_ipool) {
 		mlx5_ipool_destroy(priv->acts_ipool);
 		priv->acts_ipool = NULL;
@@ -3058,4 +3927,397 @@ const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.action_query = flow_dv_action_query,
 };
 
+static uint32_t
+flow_hw_get_ctrl_queue(struct mlx5_priv *priv)
+{
+	MLX5_ASSERT(priv->nb_queue > 0);
+	return priv->nb_queue - 1;
+}
+
+/**
+ * Creates a control flow using flow template API on @p proxy_dev device,
+ * on behalf of @p owner_dev device.
+ *
+ * This function uses locks internally to synchronize access to the
+ * flow queue.
+ *
+ * Created flow is stored in private list associated with @p proxy_dev device.
+ *
+ * @param owner_dev
+ *   Pointer to Ethernet device on behalf of which flow is created.
+ * @param proxy_dev
+ *   Pointer to Ethernet device on which flow is created.
+ * @param table
+ *   Pointer to flow table.
+ * @param items
+ *   Pointer to flow rule items.
+ * @param item_template_idx
+ *   Index of an item template associated with @p table.
+ * @param actions
+ *   Pointer to flow rule actions.
+ * @param action_template_idx
+ *   Index of an action template associated with @p table.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno set.
+ */
+static __rte_unused int
+flow_hw_create_ctrl_flow(struct rte_eth_dev *owner_dev,
+			 struct rte_eth_dev *proxy_dev,
+			 struct rte_flow_template_table *table,
+			 struct rte_flow_item items[],
+			 uint8_t item_template_idx,
+			 struct rte_flow_action actions[],
+			 uint8_t action_template_idx)
+{
+	struct mlx5_priv *priv = proxy_dev->data->dev_private;
+	uint32_t queue = flow_hw_get_ctrl_queue(priv);
+	struct rte_flow_op_attr op_attr = {
+		.postpone = 0,
+	};
+	struct rte_flow *flow = NULL;
+	struct mlx5_hw_ctrl_flow *entry = NULL;
+	int ret;
+
+	rte_spinlock_lock(&priv->hw_ctrl_lock);
+	entry = mlx5_malloc(MLX5_MEM_ZERO | MLX5_MEM_SYS, sizeof(*entry),
+			    0, SOCKET_ID_ANY);
+	if (!entry) {
+		DRV_LOG(ERR, "port %u not enough memory to create control flows",
+			proxy_dev->data->port_id);
+		rte_errno = ENOMEM;
+		ret = -rte_errno;
+		goto error;
+	}
+	flow = flow_hw_async_flow_create(proxy_dev, queue, &op_attr, table,
+					 items, item_template_idx,
+					 actions, action_template_idx,
+					 NULL, NULL);
+	if (!flow) {
+		DRV_LOG(ERR, "port %u failed to enqueue create control"
+			" flow operation", proxy_dev->data->port_id);
+		ret = -rte_errno;
+		goto error;
+	}
+	ret = flow_hw_push(proxy_dev, queue, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "port %u failed to drain control flow queue",
+			proxy_dev->data->port_id);
+		goto error;
+	}
+	ret = __flow_hw_pull_comp(proxy_dev, queue, 1, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "port %u failed to insert control flow",
+			proxy_dev->data->port_id);
+		rte_errno = EINVAL;
+		ret = -rte_errno;
+		goto error;
+	}
+	entry->owner_dev = owner_dev;
+	entry->flow = flow;
+	LIST_INSERT_HEAD(&priv->hw_ctrl_flows, entry, next);
+	rte_spinlock_unlock(&priv->hw_ctrl_lock);
+	return 0;
+error:
+	if (entry)
+		mlx5_free(entry);
+	rte_spinlock_unlock(&priv->hw_ctrl_lock);
+	return ret;
+}
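
The enqueue/push/pull sequence above mirrors what an application would do with the public template API when a synchronous insertion is needed. A minimal sketch, assuming placeholder port/queue IDs, template index 0 and a simple busy-wait policy:

#include <errno.h>
#include <rte_errno.h>
#include <rte_flow.h>

/* Enqueue a single rule, push it to HW and busy-wait for its completion. */
static int
create_flow_sync(uint16_t port_id, uint32_t queue_id,
		 struct rte_flow_template_table *table,
		 const struct rte_flow_item *items,
		 const struct rte_flow_action *actions)
{
	const struct rte_flow_op_attr op_attr = { .postpone = 0 };
	struct rte_flow_op_result comp;
	struct rte_flow *flow;
	int ret;

	flow = rte_flow_async_create(port_id, queue_id, &op_attr, table,
				     items, 0, actions, 0, NULL, NULL);
	if (flow == NULL)
		return -rte_errno;
	ret = rte_flow_push(port_id, queue_id, NULL);
	if (ret < 0)
		return ret;
	/* Poll until the single enqueued operation completes. */
	do {
		ret = rte_flow_pull(port_id, queue_id, &comp, 1, NULL);
	} while (ret == 0);
	if (ret < 0)
		return ret;
	return comp.status == RTE_FLOW_OP_SUCCESS ? 0 : -EINVAL;
}
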
+
+/**
+ * Destroys a control flow @p flow using flow template API on @p dev device.
+ *
+ * This function uses locks internally to synchronize access to the
+ * flow queue.
+ *
+ * If the @p flow is stored on any private list/pool, then the caller must
+ * free up the relevant resources.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param flow
+ *   Pointer to flow rule.
+ *
+ * @return
+ *   0 on success, non-zero value otherwise.
+ */
+static int
+flow_hw_destroy_ctrl_flow(struct rte_eth_dev *dev, struct rte_flow *flow)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint32_t queue = flow_hw_get_ctrl_queue(priv);
+	struct rte_flow_op_attr op_attr = {
+		.postpone = 0,
+	};
+	int ret;
+
+	rte_spinlock_lock(&priv->hw_ctrl_lock);
+	ret = flow_hw_async_flow_destroy(dev, queue, &op_attr, flow, NULL, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "port %u failed to enqueue destroy control"
+			" flow operation", dev->data->port_id);
+		goto exit;
+	}
+	ret = flow_hw_push(dev, queue, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "port %u failed to drain control flow queue",
+			dev->data->port_id);
+		goto exit;
+	}
+	ret = __flow_hw_pull_comp(dev, queue, 1, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "port %u failed to destroy control flow",
+			dev->data->port_id);
+		rte_errno = EINVAL;
+		ret = -rte_errno;
+		goto exit;
+	}
+exit:
+	rte_spinlock_unlock(&priv->hw_ctrl_lock);
+	return ret;
+}
+
+/**
+ * Destroys control flows created on behalf of @p owner_dev device.
+ *
+ * @param owner_dev
+ *   Pointer to Ethernet device owning control flows.
+ *
+ * @return
+ *   0 on success, otherwise negative error code is returned and
+ *   rte_errno is set.
+ */
+int
+mlx5_flow_hw_flush_ctrl_flows(struct rte_eth_dev *owner_dev)
+{
+	struct mlx5_priv *owner_priv = owner_dev->data->dev_private;
+	struct rte_eth_dev *proxy_dev;
+	struct mlx5_priv *proxy_priv;
+	struct mlx5_hw_ctrl_flow *cf;
+	struct mlx5_hw_ctrl_flow *cf_next;
+	uint16_t owner_port_id = owner_dev->data->port_id;
+	uint16_t proxy_port_id = owner_dev->data->port_id;
+	int ret;
+
+	if (owner_priv->sh->config.dv_esw_en) {
+		if (rte_flow_pick_transfer_proxy(owner_port_id, &proxy_port_id, NULL)) {
+			DRV_LOG(ERR, "Unable to find proxy port for port %u",
+				owner_port_id);
+			rte_errno = EINVAL;
+			return -rte_errno;
+		}
+		proxy_dev = &rte_eth_devices[proxy_port_id];
+		proxy_priv = proxy_dev->data->dev_private;
+	} else {
+		proxy_dev = owner_dev;
+		proxy_priv = owner_priv;
+	}
+	cf = LIST_FIRST(&proxy_priv->hw_ctrl_flows);
+	while (cf != NULL) {
+		cf_next = LIST_NEXT(cf, next);
+		if (cf->owner_dev == owner_dev) {
+			ret = flow_hw_destroy_ctrl_flow(proxy_dev, cf->flow);
+			if (ret) {
+				rte_errno = ret;
+				return -ret;
+			}
+			LIST_REMOVE(cf, next);
+			mlx5_free(cf);
+		}
+		cf = cf_next;
+	}
+	return 0;
+}
+
+/**
+ * Destroys all control flows created on @p dev device.
+ *
+ * @param owner_dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, otherwise negative error code is returned and
+ *   rte_errno is set.
+ */
+static int
+flow_hw_flush_all_ctrl_flows(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hw_ctrl_flow *cf;
+	struct mlx5_hw_ctrl_flow *cf_next;
+	int ret;
+
+	cf = LIST_FIRST(&priv->hw_ctrl_flows);
+	while (cf != NULL) {
+		cf_next = LIST_NEXT(cf, next);
+		ret = flow_hw_destroy_ctrl_flow(dev, cf->flow);
+		if (ret) {
+			rte_errno = ret;
+			return -ret;
+		}
+		LIST_REMOVE(cf, next);
+		mlx5_free(cf);
+		cf = cf_next;
+	}
+	return 0;
+}
+
+int
+mlx5_flow_hw_esw_create_mgr_sq_miss_flow(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_flow_item_ethdev port_spec = {
+		.port_id = MLX5_REPRESENTED_PORT_ESW_MGR,
+	};
+	struct rte_flow_item_ethdev port_mask = {
+		.port_id = MLX5_REPRESENTED_PORT_ESW_MGR,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+			.spec = &port_spec,
+			.mask = &port_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	struct rte_flow_action_jump jump = {
+		.group = MLX5_HW_SQ_MISS_GROUP,
+	};
+	struct rte_flow_action actions[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		},
+	};
+
+	MLX5_ASSERT(priv->master);
+	if (!priv->dr_ctx ||
+	    !priv->hw_esw_sq_miss_root_tbl)
+		return 0;
+	return flow_hw_create_ctrl_flow(dev, dev,
+					priv->hw_esw_sq_miss_root_tbl,
+					items, 0, actions, 0);
+}
+
+int
+mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
+{
+	uint16_t port_id = dev->data->port_id;
+	struct mlx5_rte_flow_item_sq queue_spec = {
+		.queue = txq,
+	};
+	struct mlx5_rte_flow_item_sq queue_mask = {
+		.queue = UINT32_MAX,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = (enum rte_flow_item_type)
+				MLX5_RTE_FLOW_ITEM_TYPE_SQ,
+			.spec = &queue_spec,
+			.mask = &queue_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	struct rte_flow_action_ethdev port = {
+		.port_id = port_id,
+	};
+	struct rte_flow_action actions[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT,
+			.conf = &port,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		},
+	};
+	struct rte_eth_dev *proxy_dev;
+	struct mlx5_priv *proxy_priv;
+	uint16_t proxy_port_id = dev->data->port_id;
+	int ret;
+
+	RTE_SET_USED(txq);
+	ret = rte_flow_pick_transfer_proxy(port_id, &proxy_port_id, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "Unable to pick proxy port for port %u", port_id);
+		return ret;
+	}
+	proxy_dev = &rte_eth_devices[proxy_port_id];
+	proxy_priv = proxy_dev->data->dev_private;
+	if (!proxy_priv->dr_ctx)
+		return 0;
+	if (!proxy_priv->hw_esw_sq_miss_root_tbl ||
+	    !proxy_priv->hw_esw_sq_miss_tbl) {
+		DRV_LOG(ERR, "port %u proxy port %u was configured but default"
+			" flow tables are not created",
+			port_id, proxy_port_id);
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	return flow_hw_create_ctrl_flow(dev, proxy_dev,
+					proxy_priv->hw_esw_sq_miss_tbl,
+					items, 0, actions, 0);
+}
+
+int
+mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev)
+{
+	uint16_t port_id = dev->data->port_id;
+	struct rte_flow_item_ethdev port_spec = {
+		.port_id = port_id,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+			.spec = &port_spec,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	struct rte_flow_action_jump jump = {
+		.group = 1,
+	};
+	struct rte_flow_action actions[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+	struct rte_eth_dev *proxy_dev;
+	struct mlx5_priv *proxy_priv;
+	uint16_t proxy_port_id = dev->data->port_id;
+	int ret;
+
+	ret = rte_flow_pick_transfer_proxy(port_id, &proxy_port_id, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "Unable to pick proxy port for port %u", port_id);
+		return ret;
+	}
+	proxy_dev = &rte_eth_devices[proxy_port_id];
+	proxy_priv = proxy_dev->data->dev_private;
+	if (!proxy_priv->dr_ctx)
+		return 0;
+	if (!proxy_priv->hw_esw_zero_tbl) {
+		DRV_LOG(ERR, "port %u proxy port %u was configured but default"
+			" flow tables are not created",
+			port_id, proxy_port_id);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	return flow_hw_create_ctrl_flow(dev, proxy_dev,
+					proxy_priv->hw_esw_zero_tbl,
+					items, 0, actions, 0);
+}
+
 #endif
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index fd902078f8..7ffaf4c227 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -1245,12 +1245,14 @@ flow_verbs_validate(struct rte_eth_dev *dev,
 	uint16_t ether_type = 0;
 	bool is_empty_vlan = false;
 	uint16_t udp_dport = 0;
+	bool is_root;
 
 	if (items == NULL)
 		return -1;
 	ret = mlx5_flow_validate_attributes(dev, attr, error);
 	if (ret < 0)
 		return ret;
+	is_root = ret;
 	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
 		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
 		int ret = 0;
@@ -1380,7 +1382,7 @@ flow_verbs_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ITEM_TYPE_VXLAN:
 			ret = mlx5_flow_validate_item_vxlan(dev, udp_dport,
 							    items, item_flags,
-							    attr, error);
+							    is_root, error);
 			if (ret < 0)
 				return ret;
 			last_item = MLX5_FLOW_LAYER_VXLAN;
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index c68b32cf14..f59d314ff4 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -1280,6 +1280,52 @@ mlx5_dev_stop(struct rte_eth_dev *dev)
 	return 0;
 }
 
+#ifdef HAVE_MLX5_HWS_SUPPORT
+
+static int
+mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	unsigned int i;
+	int ret;
+
+	if (priv->sh->config.dv_esw_en && priv->master) {
+		if (mlx5_flow_hw_esw_create_mgr_sq_miss_flow(dev))
+			goto error;
+	}
+	for (i = 0; i < priv->txqs_n; ++i) {
+		struct mlx5_txq_ctrl *txq = mlx5_txq_get(dev, i);
+		uint32_t queue;
+
+		if (!txq)
+			continue;
+		if (txq->is_hairpin)
+			queue = txq->obj->sq->id;
+		else
+			queue = txq->obj->sq_obj.sq->id;
+		if ((priv->representor || priv->master) &&
+		    priv->sh->config.dv_esw_en) {
+			if (mlx5_flow_hw_esw_create_sq_miss_flow(dev, queue)) {
+				mlx5_txq_release(dev, i);
+				goto error;
+			}
+		}
+		mlx5_txq_release(dev, i);
+	}
+	if ((priv->master || priv->representor) && priv->sh->config.dv_esw_en) {
+		if (mlx5_flow_hw_esw_create_default_jump_flow(dev))
+			goto error;
+	}
+	return 0;
+error:
+	ret = rte_errno;
+	mlx5_flow_hw_flush_ctrl_flows(dev);
+	rte_errno = ret;
+	return -rte_errno;
+}
+
+#endif
+
 /**
  * Enable traffic flows configured by control plane
  *
@@ -1316,6 +1362,10 @@ mlx5_traffic_enable(struct rte_eth_dev *dev)
 	unsigned int j;
 	int ret;
 
+#ifdef HAVE_MLX5_HWS_SUPPORT
+	if (priv->sh->config.dv_flow_en == 2)
+		return mlx5_traffic_enable_hws(dev);
+#endif
 	/*
 	 * Hairpin txq default flow should be created no matter if it is
 	 * isolation mode. Or else all the packets to be sent will be sent
@@ -1346,13 +1396,17 @@ mlx5_traffic_enable(struct rte_eth_dev *dev)
 		}
 		mlx5_txq_release(dev, i);
 	}
-	if (priv->sh->config.dv_esw_en) {
-		if (mlx5_flow_create_esw_table_zero_flow(dev))
-			priv->fdb_def_rule = 1;
-		else
-			DRV_LOG(INFO, "port %u FDB default rule cannot be"
-				" configured - only Eswitch group 0 flows are"
-				" supported.", dev->data->port_id);
+	if (priv->sh->config.fdb_def_rule) {
+		if (priv->sh->config.dv_esw_en) {
+			if (mlx5_flow_create_esw_table_zero_flow(dev))
+				priv->fdb_def_rule = 1;
+			else
+				DRV_LOG(INFO, "port %u FDB default rule cannot be configured - only Eswitch group 0 flows are supported.",
+					dev->data->port_id);
+		}
+	} else {
+		DRV_LOG(INFO, "port %u FDB default rule is disabled",
+			dev->data->port_id);
 	}
 	if (!priv->sh->config.lacp_by_user && priv->pf_bond >= 0) {
 		ret = mlx5_flow_lacp_miss(dev);
@@ -1470,7 +1524,14 @@ mlx5_traffic_enable(struct rte_eth_dev *dev)
 void
 mlx5_traffic_disable(struct rte_eth_dev *dev)
 {
-	mlx5_flow_list_flush(dev, MLX5_FLOW_TYPE_CTL, false);
+#ifdef HAVE_MLX5_HWS_SUPPORT
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (priv->sh->config.dv_flow_en == 2)
+		mlx5_flow_hw_flush_ctrl_flows(dev);
+	else
+#endif
+		mlx5_flow_list_flush(dev, MLX5_FLOW_TYPE_CTL, false);
 }
 
 /**
-- 
2.25.1



* [PATCH v5 06/18] net/mlx5: add extended metadata mode for hardware steering
  2022-10-20  3:21 ` [PATCH v5 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (4 preceding siblings ...)
  2022-10-20  3:21   ` [PATCH v5 05/18] net/mlx5: add HW steering port action Suanming Mou
@ 2022-10-20  3:21   ` Suanming Mou
  2022-10-20  3:21   ` [PATCH v5 07/18] net/mlx5: add HW steering meter action Suanming Mou
                     ` (11 subsequent siblings)
  17 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-10-20  3:21 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Bing Zhao

From: Bing Zhao <bingz@nvidia.com>

A new mode 4 of the devarg "dv_xmeta_en" is added for HWS only. In this
mode, copying of the 32b-wide Rx / Tx metadata between the FDB and NIC
domains is supported. The MARK is supported only in the NIC domain and
is not copied.
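
For example (illustrative only; the PCI address and representor list are
placeholders, not part of this patch), the mode can be requested together
with HW steering via devargs:

    dpdk-testpmd -a <PCI_BDF>,dv_flow_en=2,dv_xmeta_en=4,representor=vf[0-1] -- -i
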

Signed-off-by: Bing Zhao <bingz@nvidia.com>
---
 doc/guides/nics/mlx5.rst               |   4 +
 doc/guides/rel_notes/release_22_11.rst |   1 +
 drivers/net/mlx5/linux/mlx5_os.c       |  10 +-
 drivers/net/mlx5/mlx5.c                |   7 +-
 drivers/net/mlx5/mlx5.h                |   8 +-
 drivers/net/mlx5/mlx5_flow.c           |   8 +-
 drivers/net/mlx5/mlx5_flow.h           |  14 +
 drivers/net/mlx5/mlx5_flow_dv.c        |  43 +-
 drivers/net/mlx5/mlx5_flow_hw.c        | 864 ++++++++++++++++++++++---
 drivers/net/mlx5/mlx5_trigger.c        |   3 +
 10 files changed, 877 insertions(+), 85 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 7d2095f075..0c7bd042a4 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -980,6 +980,10 @@ for an additional list of options shared with other mlx5 drivers.
   - 3, this engages tunnel offload mode. In E-Switch configuration, that
     mode implicitly activates ``dv_xmeta_en=1``.
 
+  - 4, this mode is only supported in HWS (``dv_flow_en=2``). Copying of
+    the 32b-wide Rx / Tx metadata between FDB and NIC is supported. The
+    mark is supported only in NIC and is not copied.
+
   +------+-----------+-----------+-------------+-------------+
   | Mode | ``MARK``  | ``META``  | ``META`` Tx | FDB/Through |
   +======+===========+===========+=============+=============+
diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index bac805dc0e..324076be50 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -241,6 +241,7 @@ New Features
 
   * Added fully support for queue based async HW steering to the PMD:
     - Support of modify fields.
+    - Support of FDB.
 
 * **Rewritten pmdinfo script.**
 
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index d674b54624..c70cd84b8d 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1554,6 +1554,15 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 #ifdef HAVE_MLX5_HWS_SUPPORT
 		if (priv->vport_meta_mask)
 			flow_hw_set_port_info(eth_dev);
+		if (priv->sh->config.dv_esw_en &&
+		    priv->sh->config.dv_xmeta_en != MLX5_XMETA_MODE_LEGACY &&
+		    priv->sh->config.dv_xmeta_en != MLX5_XMETA_MODE_META32_HWS) {
+			DRV_LOG(ERR,
+				"metadata mode %u is not supported in HWS eswitch mode",
+				priv->sh->config.dv_xmeta_en);
+				err = ENOTSUP;
+				goto error;
+		}
 		/* Only HWS requires this information. */
 		flow_hw_init_tags_set(eth_dev);
 		if (priv->sh->config.dv_esw_en &&
@@ -1569,7 +1578,6 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 		goto error;
 #endif
 	}
-	/* Port representor shares the same max priority with pf port. */
 	if (!priv->sh->flow_priority_check_flag) {
 		/* Supported Verbs flow priority number detection. */
 		err = mlx5_flow_discover_priorities(eth_dev);
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 470b9c2d0f..9cd4892858 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1218,7 +1218,8 @@ mlx5_dev_args_check_handler(const char *key, const char *val, void *opaque)
 		if (tmp != MLX5_XMETA_MODE_LEGACY &&
 		    tmp != MLX5_XMETA_MODE_META16 &&
 		    tmp != MLX5_XMETA_MODE_META32 &&
-		    tmp != MLX5_XMETA_MODE_MISS_INFO) {
+		    tmp != MLX5_XMETA_MODE_MISS_INFO &&
+		    tmp != MLX5_XMETA_MODE_META32_HWS) {
 			DRV_LOG(ERR, "Invalid extensive metadata parameter.");
 			rte_errno = EINVAL;
 			return -rte_errno;
@@ -2849,6 +2850,10 @@ mlx5_set_metadata_mask(struct rte_eth_dev *dev)
 		meta = UINT32_MAX;
 		mark = (reg_c0 >> rte_bsf32(reg_c0)) & MLX5_FLOW_MARK_MASK;
 		break;
+	case MLX5_XMETA_MODE_META32_HWS:
+		meta = UINT32_MAX;
+		mark = MLX5_FLOW_MARK_MASK;
+		break;
 	default:
 		meta = 0;
 		mark = 0;
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 69a0a60030..6e7216efab 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -303,8 +303,8 @@ struct mlx5_sh_config {
 	uint32_t reclaim_mode:2; /* Memory reclaim mode. */
 	uint32_t dv_esw_en:1; /* Enable E-Switch DV flow. */
 	/* Enable DV flow. 1 means SW steering, 2 means HW steering. */
-	unsigned int dv_flow_en:2;
-	uint32_t dv_xmeta_en:2; /* Enable extensive flow metadata. */
+	uint32_t dv_flow_en:2; /* Enable DV flow. */
+	uint32_t dv_xmeta_en:3; /* Enable extensive flow metadata. */
 	uint32_t dv_miss_info:1; /* Restore packet after partial hw miss. */
 	uint32_t l3_vxlan_en:1; /* Enable L3 VXLAN flow creation. */
 	uint32_t vf_nl_en:1; /* Enable Netlink requests in VF mode. */
@@ -317,7 +317,6 @@ struct mlx5_sh_config {
 	uint32_t fdb_def_rule:1; /* Create FDB default jump rule */
 };
 
-
 /* Structure for VF VLAN workaround. */
 struct mlx5_vf_vlan {
 	uint32_t tag:12;
@@ -1284,12 +1283,12 @@ struct mlx5_dev_ctx_shared {
 	struct mlx5_lb_ctx self_lb; /* QP to enable self loopback for Devx. */
 	unsigned int flow_max_priority;
 	enum modify_reg flow_mreg_c[MLX5_MREG_C_NUM];
+	/* Availability of mreg_c's. */
 	void *devx_channel_lwm;
 	struct rte_intr_handle *intr_handle_lwm;
 	pthread_mutex_t lwm_config_lock;
 	uint32_t host_shaper_rate:8;
 	uint32_t lwm_triggered:1;
-	/* Availability of mreg_c's. */
 	struct mlx5_dev_shared_port port[]; /* per device port data array. */
 };
 
@@ -1515,6 +1514,7 @@ struct mlx5_priv {
 	struct rte_flow_template_table *hw_esw_sq_miss_root_tbl;
 	struct rte_flow_template_table *hw_esw_sq_miss_tbl;
 	struct rte_flow_template_table *hw_esw_zero_tbl;
+	struct rte_flow_template_table *hw_tx_meta_cpy_tbl;
 	struct mlx5_indexed_pool *flows[MLX5_FLOW_TYPE_MAXI];
 	/* RTE Flow rules. */
 	uint32_t ctrl_flows; /* Control flow rules. */
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 60f76f5a43..3b8e97ccd0 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -1109,6 +1109,8 @@ mlx5_flow_get_reg_id(struct rte_eth_dev *dev,
 			return REG_C_0;
 		case MLX5_XMETA_MODE_META32:
 			return REG_C_1;
+		case MLX5_XMETA_MODE_META32_HWS:
+			return REG_C_1;
 		}
 		break;
 	case MLX5_METADATA_TX:
@@ -1121,11 +1123,14 @@ mlx5_flow_get_reg_id(struct rte_eth_dev *dev,
 			return REG_C_0;
 		case MLX5_XMETA_MODE_META32:
 			return REG_C_1;
+		case MLX5_XMETA_MODE_META32_HWS:
+			return REG_C_1;
 		}
 		break;
 	case MLX5_FLOW_MARK:
 		switch (config->dv_xmeta_en) {
 		case MLX5_XMETA_MODE_LEGACY:
+		case MLX5_XMETA_MODE_META32_HWS:
 			return REG_NON;
 		case MLX5_XMETA_MODE_META16:
 			return REG_C_1;
@@ -4444,7 +4449,8 @@ static bool flow_check_modify_action_type(struct rte_eth_dev *dev,
 		return true;
 	case RTE_FLOW_ACTION_TYPE_FLAG:
 	case RTE_FLOW_ACTION_TYPE_MARK:
-		if (priv->sh->config.dv_xmeta_en != MLX5_XMETA_MODE_LEGACY)
+		if (priv->sh->config.dv_xmeta_en != MLX5_XMETA_MODE_LEGACY &&
+		    priv->sh->config.dv_xmeta_en != MLX5_XMETA_MODE_META32_HWS)
 			return true;
 		else
 			return false;
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 25b44ccca2..b0af13886a 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -48,6 +48,12 @@ enum mlx5_rte_flow_action_type {
 	MLX5_RTE_FLOW_ACTION_TYPE_RSS,
 };
 
+/* Private (internal) Field IDs for MODIFY_FIELD action. */
+enum mlx5_rte_flow_field_id {
+		MLX5_RTE_FLOW_FIELD_END = INT_MIN,
+			MLX5_RTE_FLOW_FIELD_META_REG,
+};
+
 #define MLX5_INDIRECT_ACTION_TYPE_OFFSET 30
 
 enum {
@@ -1178,6 +1184,7 @@ struct rte_flow_actions_template {
 	struct rte_flow_action *masks; /* Cached action masks.*/
 	uint16_t mhdr_off; /* Offset of DR modify header action. */
 	uint32_t refcnt; /* Reference counter. */
+	uint16_t rx_cpy_pos; /* Action position of Rx metadata to be copied. */
 };
 
 /* Jump action struct. */
@@ -1254,6 +1261,11 @@ struct mlx5_flow_group {
 #define MLX5_HW_TBL_MAX_ITEM_TEMPLATE 2
 #define MLX5_HW_TBL_MAX_ACTION_TEMPLATE 32
 
+struct mlx5_flow_template_table_cfg {
+	struct rte_flow_template_table_attr attr; /* Table attributes passed through flow API. */
+	bool external; /* True if created by flow API, false if table is internal to PMD. */
+};
+
 struct rte_flow_template_table {
 	LIST_ENTRY(rte_flow_template_table) next;
 	struct mlx5_flow_group *grp; /* The group rte_flow_template_table uses. */
@@ -1263,6 +1275,7 @@ struct rte_flow_template_table {
 	/* Action templates bind to the table. */
 	struct mlx5_hw_action_template ats[MLX5_HW_TBL_MAX_ACTION_TEMPLATE];
 	struct mlx5_indexed_pool *flow; /* The table's flow ipool. */
+	struct mlx5_flow_template_table_cfg cfg;
 	uint32_t type; /* Flow table type RX/TX/FDB. */
 	uint8_t nb_item_templates; /* Item template number. */
 	uint8_t nb_action_templates; /* Action template number. */
@@ -2370,4 +2383,5 @@ int mlx5_flow_hw_esw_create_mgr_sq_miss_flow(struct rte_eth_dev *dev);
 int mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev,
 					 uint32_t txq);
 int mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev);
+int mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev);
 #endif /* RTE_PMD_MLX5_FLOW_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 1ee26be975..a0bcaa5c53 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -1758,7 +1758,8 @@ mlx5_flow_field_id_to_modify_info
 			int reg;
 
 			if (priv->sh->config.dv_flow_en == 2)
-				reg = REG_C_1;
+				reg = flow_hw_get_reg_id(RTE_FLOW_ITEM_TYPE_TAG,
+							 data->level);
 			else
 				reg = mlx5_flow_get_reg_id(dev, MLX5_APP_TAG,
 							   data->level, error);
@@ -1837,6 +1838,24 @@ mlx5_flow_field_id_to_modify_info
 		else
 			info[idx].offset = off_be;
 		break;
+	case MLX5_RTE_FLOW_FIELD_META_REG:
+		{
+			uint32_t meta_mask = priv->sh->dv_meta_mask;
+			uint32_t meta_count = __builtin_popcount(meta_mask);
+			uint32_t reg = data->level;
+
+			RTE_SET_USED(meta_count);
+			MLX5_ASSERT(data->offset + width <= meta_count);
+			MLX5_ASSERT(reg != REG_NON);
+			MLX5_ASSERT(reg < RTE_DIM(reg_to_field));
+			info[idx] = (struct field_modify_info){4, 0, reg_to_field[reg]};
+			if (mask)
+				mask[idx] = flow_modify_info_mask_32_masked
+					(width, data->offset, meta_mask);
+			else
+				info[idx].offset = data->offset;
+		}
+		break;
 	case RTE_FLOW_FIELD_POINTER:
 	case RTE_FLOW_FIELD_VALUE:
 	default:
@@ -9794,7 +9813,19 @@ flow_dv_translate_item_meta(struct rte_eth_dev *dev,
 	mask = meta_m->data;
 	if (key_type == MLX5_SET_MATCHER_HS_M)
 		mask = value;
-	reg = flow_dv_get_metadata_reg(dev, attr, NULL);
+	/*
+	 * In the current implementation, REG_B cannot be used to match.
+	 * Force to use REG_C_1 in HWS root table as other tables.
+	 * This map may change.
+	 * NIC: modify - REG_B to be present in SW
+	 *      match - REG_C_1 when copied from FDB, different from SWS
+	 * FDB: modify - REG_C_1 in Xmeta mode, REG_NON in legacy mode
+	 *      match - REG_C_1 in FDB
+	 */
+	if (!!(key_type & MLX5_SET_MATCHER_SW))
+		reg = flow_dv_get_metadata_reg(dev, attr, NULL);
+	else
+		reg = flow_hw_get_reg_id(RTE_FLOW_ITEM_TYPE_META, 0);
 	if (reg < 0)
 		return;
 	MLX5_ASSERT(reg != REG_NON);
@@ -9894,7 +9925,10 @@ flow_dv_translate_item_tag(struct rte_eth_dev *dev, void *key,
 	/* When set mask, the index should be from spec. */
 	index = tag_vv ? tag_vv->index : tag_v->index;
 	/* Get the metadata register index for the tag. */
-	reg = mlx5_flow_get_reg_id(dev, MLX5_APP_TAG, index, NULL);
+	if (!!(key_type & MLX5_SET_MATCHER_SW))
+		reg = mlx5_flow_get_reg_id(dev, MLX5_APP_TAG, index, NULL);
+	else
+		reg = flow_hw_get_reg_id(RTE_FLOW_ITEM_TYPE_TAG, index);
 	MLX5_ASSERT(reg > 0);
 	flow_dv_match_meta_reg(key, reg, tag_v->data, tag_m->data);
 }
@@ -13412,7 +13446,8 @@ flow_dv_translate_items_sws(struct rte_eth_dev *dev,
 	 */
 	if (!(wks.item_flags & MLX5_FLOW_ITEM_PORT_ID) &&
 	    !(wks.item_flags & MLX5_FLOW_ITEM_REPRESENTED_PORT) && priv->sh->esw_mode &&
-	    !(attr->egress && !attr->transfer)) {
+	    !(attr->egress && !attr->transfer) &&
+	    attr->group != MLX5_FLOW_MREG_CP_TABLE_GROUP) {
 		if (flow_dv_translate_item_port_id_all(dev, match_mask,
 						   match_value, NULL, attr))
 			return -rte_errno;
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 991e4c9b7b..319c8d1a89 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -20,13 +20,27 @@
 /* Default queue to flush the flows. */
 #define MLX5_DEFAULT_FLUSH_QUEUE 0
 
-/* Maximum number of rules in control flow tables */
+/* Maximum number of rules in control flow tables. */
 #define MLX5_HW_CTRL_FLOW_NB_RULES (4096)
 
-/* Flow group for SQ miss default flows/ */
-#define MLX5_HW_SQ_MISS_GROUP (UINT32_MAX)
+/* Lowest flow group usable by an application. */
+#define MLX5_HW_LOWEST_USABLE_GROUP (1)
+
+/* Maximum group index usable by user applications for transfer flows. */
+#define MLX5_HW_MAX_TRANSFER_GROUP (UINT32_MAX - 1)
+
+/* Lowest priority for HW root table. */
+#define MLX5_HW_LOWEST_PRIO_ROOT 15
+
+/* Lowest priority for HW non-root table. */
+#define MLX5_HW_LOWEST_PRIO_NON_ROOT (UINT32_MAX)
 
 static int flow_hw_flush_all_ctrl_flows(struct rte_eth_dev *dev);
+static int flow_hw_translate_group(struct rte_eth_dev *dev,
+				   const struct mlx5_flow_template_table_cfg *cfg,
+				   uint32_t group,
+				   uint32_t *table_group,
+				   struct rte_flow_error *error);
 
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops;
 
@@ -213,12 +227,12 @@ flow_hw_rss_item_flags_get(const struct rte_flow_item items[])
  */
 static struct mlx5_hw_jump_action *
 flow_hw_jump_action_register(struct rte_eth_dev *dev,
-			     const struct rte_flow_attr *attr,
+			     const struct mlx5_flow_template_table_cfg *cfg,
 			     uint32_t dest_group,
 			     struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct rte_flow_attr jattr = *attr;
+	struct rte_flow_attr jattr = cfg->attr.flow_attr;
 	struct mlx5_flow_group *grp;
 	struct mlx5_flow_cb_ctx ctx = {
 		.dev = dev,
@@ -226,9 +240,13 @@ flow_hw_jump_action_register(struct rte_eth_dev *dev,
 		.data = &jattr,
 	};
 	struct mlx5_list_entry *ge;
+	uint32_t target_group;
 
-	jattr.group = dest_group;
-	ge = mlx5_hlist_register(priv->sh->flow_tbls, dest_group, &ctx);
+	target_group = dest_group;
+	if (flow_hw_translate_group(dev, cfg, dest_group, &target_group, error))
+		return NULL;
+	jattr.group = target_group;
+	ge = mlx5_hlist_register(priv->sh->flow_tbls, target_group, &ctx);
 	if (!ge)
 		return NULL;
 	grp = container_of(ge, struct mlx5_flow_group, entry);
@@ -760,7 +778,8 @@ flow_hw_modify_field_compile(struct rte_eth_dev *dev,
 				(void *)(uintptr_t)conf->src.pvalue :
 				(void *)(uintptr_t)&conf->src.value;
 		if (conf->dst.field == RTE_FLOW_FIELD_META ||
-		    conf->dst.field == RTE_FLOW_FIELD_TAG) {
+		    conf->dst.field == RTE_FLOW_FIELD_TAG ||
+		    conf->dst.field == (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG) {
 			value = *(const unaligned_uint32_t *)item.spec;
 			value = rte_cpu_to_be_32(value);
 			item.spec = &value;
@@ -860,6 +879,9 @@ flow_hw_represented_port_compile(struct rte_eth_dev *dev,
 	if (m && !!m->port_id) {
 		struct mlx5_priv *port_priv;
 
+		if (!v)
+			return rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ACTION,
+						  action, "port index was not provided");
 		port_priv = mlx5_port_to_eswitch_info(v->port_id, false);
 		if (port_priv == NULL)
 			return rte_flow_error_set
@@ -903,8 +925,8 @@ flow_hw_represented_port_compile(struct rte_eth_dev *dev,
  *
  * @param[in] dev
  *   Pointer to the rte_eth_dev structure.
- * @param[in] table_attr
- *   Pointer to the table attributes.
+ * @param[in] cfg
+ *   Pointer to the table configuration.
  * @param[in] item_templates
  *   Item template array to be binded to the table.
  * @param[in/out] acts
@@ -919,12 +941,13 @@ flow_hw_represented_port_compile(struct rte_eth_dev *dev,
  */
 static int
 flow_hw_actions_translate(struct rte_eth_dev *dev,
-			  const struct rte_flow_template_table_attr *table_attr,
+			  const struct mlx5_flow_template_table_cfg *cfg,
 			  struct mlx5_hw_actions *acts,
 			  struct rte_flow_actions_template *at,
 			  struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_template_table_attr *table_attr = &cfg->attr;
 	const struct rte_flow_attr *attr = &table_attr->flow_attr;
 	struct rte_flow_action *actions = at->actions;
 	struct rte_flow_action *action_start = actions;
@@ -991,7 +1014,7 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 					((const struct rte_flow_action_jump *)
 					actions->conf)->group;
 				acts->jump = flow_hw_jump_action_register
-						(dev, attr, jump_group, error);
+						(dev, cfg, jump_group, error);
 				if (!acts->jump)
 					goto err;
 				acts->rule_acts[i].action = (!!attr->group) ?
@@ -1101,6 +1124,16 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 							   error);
 			if (err)
 				goto err;
+			/*
+			 * Adjust the action source position for the following.
+			 * ... / MODIFY_FIELD: rx_cpy_pos / (QUEUE|RSS) / ...
+			 * The next action will be Q/RSS; there will not be
+			 * another adjustment, and the real source position of
+			 * the following actions will be decreased by 1.
+			 * The total number of actions in the new template is unchanged.
+			 */
+			if ((actions - action_start) == at->rx_cpy_pos)
+				action_start += 1;
 			break;
 		case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
 			if (flow_hw_represented_port_compile
@@ -1365,7 +1398,8 @@ flow_hw_modify_field_construct(struct mlx5_hw_q_job *job,
 	else
 		rte_memcpy(values, mhdr_action->src.pvalue, sizeof(values));
 	if (mhdr_action->dst.field == RTE_FLOW_FIELD_META ||
-	    mhdr_action->dst.field == RTE_FLOW_FIELD_TAG) {
+	    mhdr_action->dst.field == RTE_FLOW_FIELD_TAG ||
+	    mhdr_action->dst.field == (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG) {
 		value_p = (unaligned_uint32_t *)values;
 		*value_p = rte_cpu_to_be_32(*value_p);
 	} else if (mhdr_action->dst.field == RTE_FLOW_FIELD_GTP_PSC_QFI) {
@@ -1513,7 +1547,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			jump_group = ((const struct rte_flow_action_jump *)
 						action->conf)->group;
 			jump = flow_hw_jump_action_register
-				(dev, &attr, jump_group, NULL);
+				(dev, &table->cfg, jump_group, NULL);
 			if (!jump)
 				return -1;
 			rule_acts[act_data->action_dst].action =
@@ -1710,7 +1744,13 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	job->user_data = user_data;
 	rule_attr.user_data = job;
 	hw_acts = &table->ats[action_template_index].acts;
-	/* Construct the flow actions based on the input actions.*/
+	/*
+	 * Construct the flow actions based on the input actions.
+	 * The implicitly appended action is always fixed, like metadata
+	 * copy action from FDB to NIC Rx.
+	 * No need to copy and construct a new "actions" list based on the
+	 * user's input, in order to save the cost.
+	 */
 	if (flow_hw_actions_construct(dev, job, hw_acts, pattern_template_index,
 				  actions, rule_acts, &acts_num)) {
 		rte_errno = EINVAL;
@@ -1981,6 +2021,8 @@ flow_hw_q_flow_flush(struct rte_eth_dev *dev,
 	/* Flush flow per-table from MLX5_DEFAULT_FLUSH_QUEUE. */
 	hw_q = &priv->hw_q[MLX5_DEFAULT_FLUSH_QUEUE];
 	LIST_FOREACH(tbl, &priv->flow_hw_tbl, next) {
+		if (!tbl->cfg.external)
+			continue;
 		MLX5_IPOOL_FOREACH(tbl->flow, fidx, flow) {
 			if (flow_hw_async_flow_destroy(dev,
 						MLX5_DEFAULT_FLUSH_QUEUE,
@@ -2018,8 +2060,8 @@ flow_hw_q_flow_flush(struct rte_eth_dev *dev,
  *
  * @param[in] dev
  *   Pointer to the rte_eth_dev structure.
- * @param[in] attr
- *   Pointer to the table attributes.
+ * @param[in] table_cfg
+ *   Pointer to the table configuration.
  * @param[in] item_templates
  *   Item template array to be binded to the table.
  * @param[in] nb_item_templates
@@ -2036,7 +2078,7 @@ flow_hw_q_flow_flush(struct rte_eth_dev *dev,
  */
 static struct rte_flow_template_table *
 flow_hw_table_create(struct rte_eth_dev *dev,
-		     const struct rte_flow_template_table_attr *attr,
+		     const struct mlx5_flow_template_table_cfg *table_cfg,
 		     struct rte_flow_pattern_template *item_templates[],
 		     uint8_t nb_item_templates,
 		     struct rte_flow_actions_template *action_templates[],
@@ -2048,6 +2090,7 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	struct rte_flow_template_table *tbl = NULL;
 	struct mlx5_flow_group *grp;
 	struct mlx5dr_match_template *mt[MLX5_HW_TBL_MAX_ITEM_TEMPLATE];
+	const struct rte_flow_template_table_attr *attr = &table_cfg->attr;
 	struct rte_flow_attr flow_attr = attr->flow_attr;
 	struct mlx5_flow_cb_ctx ctx = {
 		.dev = dev,
@@ -2088,6 +2131,7 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	tbl = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*tbl), 0, rte_socket_id());
 	if (!tbl)
 		goto error;
+	tbl->cfg = *table_cfg;
 	/* Allocate flow indexed pool. */
 	tbl->flow = mlx5_ipool_create(&cfg);
 	if (!tbl->flow)
@@ -2131,7 +2175,7 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 			goto at_error;
 		}
 		LIST_INIT(&tbl->ats[i].acts.act_list);
-		err = flow_hw_actions_translate(dev, attr,
+		err = flow_hw_actions_translate(dev, &tbl->cfg,
 						&tbl->ats[i].acts,
 						action_templates[i], error);
 		if (err) {
@@ -2174,6 +2218,96 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	return NULL;
 }
 
+/**
+ * Translates the group index specified by the user to an internal
+ * group index.
+ *
+ * Translation is done by incrementing group index, so group n becomes n + 1.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] cfg
+ *   Pointer to the template table configuration.
+ * @param[in] group
+ *   Currently used group index (table group or jump destination).
+ * @param[out] table_group
+ *   Pointer to output group index.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success. Otherwise, returns negative error code, rte_errno is set
+ *   and error structure is filled.
+ */
+static int
+flow_hw_translate_group(struct rte_eth_dev *dev,
+			const struct mlx5_flow_template_table_cfg *cfg,
+			uint32_t group,
+			uint32_t *table_group,
+			struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_attr *flow_attr = &cfg->attr.flow_attr;
+
+	if (priv->sh->config.dv_esw_en && cfg->external && flow_attr->transfer) {
+		if (group > MLX5_HW_MAX_TRANSFER_GROUP)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
+						  NULL,
+						  "group index not supported");
+		*table_group = group + 1;
+	} else {
+		*table_group = group;
+	}
+	return 0;
+}
+
+/**
+ * Create flow table.
+ *
+ * This function is a wrapper over @ref flow_hw_table_create(), which translates parameters
+ * provided by the user into proper internal values.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] attr
+ *   Pointer to the table attributes.
+ * @param[in] item_templates
+ *   Item template array to be bound to the table.
+ * @param[in] nb_item_templates
+ *   Number of item templates.
+ * @param[in] action_templates
+ *   Action template array to be bound to the table.
+ * @param[in] nb_action_templates
+ *   Number of action templates.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   Table pointer on success. Otherwise, NULL is returned, rte_errno is set
+ *   and error structure is filled.
+ */
+static struct rte_flow_template_table *
+flow_hw_template_table_create(struct rte_eth_dev *dev,
+			      const struct rte_flow_template_table_attr *attr,
+			      struct rte_flow_pattern_template *item_templates[],
+			      uint8_t nb_item_templates,
+			      struct rte_flow_actions_template *action_templates[],
+			      uint8_t nb_action_templates,
+			      struct rte_flow_error *error)
+{
+	struct mlx5_flow_template_table_cfg cfg = {
+		.attr = *attr,
+		.external = true,
+	};
+	uint32_t group = attr->flow_attr.group;
+
+	if (flow_hw_translate_group(dev, &cfg, group, &cfg.attr.flow_attr.group, error))
+		return NULL;
+	return flow_hw_table_create(dev, &cfg, item_templates, nb_item_templates,
+				    action_templates, nb_action_templates, error);
+}
+
 /**
  * Destroy flow table.
  *
@@ -2309,10 +2443,13 @@ flow_hw_validate_action_represented_port(struct rte_eth_dev *dev,
 					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
 					  "cannot use represented_port actions"
 					  " without an E-Switch");
-	if (mask_conf->port_id) {
+	if (mask_conf && mask_conf->port_id) {
 		struct mlx5_priv *port_priv;
 		struct mlx5_priv *dev_priv;
 
+		if (!action_conf)
+			return rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ACTION,
+						  action, "port index was not provided");
 		port_priv = mlx5_port_to_eswitch_info(action_conf->port_id, false);
 		if (!port_priv)
 			return rte_flow_error_set(error, rte_errno,
@@ -2337,20 +2474,77 @@ flow_hw_validate_action_represented_port(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static inline int
+flow_hw_action_meta_copy_insert(const struct rte_flow_action actions[],
+				const struct rte_flow_action masks[],
+				const struct rte_flow_action *ins_actions,
+				const struct rte_flow_action *ins_masks,
+				struct rte_flow_action *new_actions,
+				struct rte_flow_action *new_masks,
+				uint16_t *ins_pos)
+{
+	uint16_t idx, total = 0;
+	bool ins = false;
+	bool act_end = false;
+
+	MLX5_ASSERT(actions && masks);
+	MLX5_ASSERT(new_actions && new_masks);
+	MLX5_ASSERT(ins_actions && ins_masks);
+	for (idx = 0; !act_end; idx++) {
+		if (idx >= MLX5_HW_MAX_ACTS)
+			return -1;
+		if (actions[idx].type == RTE_FLOW_ACTION_TYPE_RSS ||
+		    actions[idx].type == RTE_FLOW_ACTION_TYPE_QUEUE) {
+			ins = true;
+			*ins_pos = idx;
+		}
+		if (actions[idx].type == RTE_FLOW_ACTION_TYPE_END)
+			act_end = true;
+	}
+	if (!ins)
+		return 0;
+	else if (idx == MLX5_HW_MAX_ACTS)
+		return -1; /* No more space. */
+	total = idx;
+	/* Before the position, no change for the actions. */
+	for (idx = 0; idx < *ins_pos; idx++) {
+		new_actions[idx] = actions[idx];
+		new_masks[idx] = masks[idx];
+	}
+	/* Insert the new action and mask to the position. */
+	new_actions[idx] = *ins_actions;
+	new_masks[idx] = *ins_masks;
+	/* Remaining content is right shifted by one position. */
+	for (; idx < total; idx++) {
+		new_actions[idx + 1] = actions[idx];
+		new_masks[idx + 1] = masks[idx];
+	}
+	return 0;
+}
+
 static int
 flow_hw_action_validate(struct rte_eth_dev *dev,
+			const struct rte_flow_actions_template_attr *attr,
 			const struct rte_flow_action actions[],
 			const struct rte_flow_action masks[],
 			struct rte_flow_error *error)
 {
-	int i;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint16_t i;
 	bool actions_end = false;
 	int ret;
 
+	/* FDB actions are only valid to proxy port. */
+	if (attr->transfer && (!priv->sh->config.dv_esw_en || !priv->master))
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL,
+					  "transfer actions are only valid to proxy port");
 	for (i = 0; !actions_end; ++i) {
 		const struct rte_flow_action *action = &actions[i];
 		const struct rte_flow_action *mask = &masks[i];
 
+		MLX5_ASSERT(i < MLX5_HW_MAX_ACTS);
 		if (action->type != mask->type)
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ACTION,
@@ -2447,21 +2641,77 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	int len, act_len, mask_len, i;
-	struct rte_flow_actions_template *at;
+	struct rte_flow_actions_template *at = NULL;
+	uint16_t pos = MLX5_HW_MAX_ACTS;
+	struct rte_flow_action tmp_action[MLX5_HW_MAX_ACTS];
+	struct rte_flow_action tmp_mask[MLX5_HW_MAX_ACTS];
+	const struct rte_flow_action *ra;
+	const struct rte_flow_action *rm;
+	const struct rte_flow_action_modify_field rx_mreg = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_B,
+		},
+		.src = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_C_1,
+		},
+		.width = 32,
+	};
+	const struct rte_flow_action_modify_field rx_mreg_mask = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.src = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.width = UINT32_MAX,
+	};
+	const struct rte_flow_action rx_cpy = {
+		.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+		.conf = &rx_mreg,
+	};
+	const struct rte_flow_action rx_cpy_mask = {
+		.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+		.conf = &rx_mreg_mask,
+	};
 
-	if (flow_hw_action_validate(dev, actions, masks, error))
+	if (flow_hw_action_validate(dev, attr, actions, masks, error))
 		return NULL;
-	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS,
-				NULL, 0, actions, error);
+	if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
+	    priv->sh->config.dv_esw_en) {
+		if (flow_hw_action_meta_copy_insert(actions, masks, &rx_cpy, &rx_cpy_mask,
+						    tmp_action, tmp_mask, &pos)) {
+			rte_flow_error_set(error, EINVAL,
+					   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					   "Failed to concatenate new action/mask");
+			return NULL;
+		}
+	}
+	/* Application should make sure only one Q/RSS exists in one rule. */
+	if (pos == MLX5_HW_MAX_ACTS) {
+		ra = actions;
+		rm = masks;
+	} else {
+		ra = tmp_action;
+		rm = tmp_mask;
+	}
+	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, NULL, 0, ra, error);
 	if (act_len <= 0)
 		return NULL;
 	len = RTE_ALIGN(act_len, 16);
-	mask_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS,
-				 NULL, 0, masks, error);
+	mask_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, NULL, 0, rm, error);
 	if (mask_len <= 0)
 		return NULL;
 	len += RTE_ALIGN(mask_len, 16);
-	at = mlx5_malloc(MLX5_MEM_ZERO, len + sizeof(*at), 64, rte_socket_id());
+	at = mlx5_malloc(MLX5_MEM_ZERO, len + sizeof(*at),
+			 RTE_CACHE_LINE_SIZE, rte_socket_id());
 	if (!at) {
 		rte_flow_error_set(error, ENOMEM,
 				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
@@ -2469,18 +2719,20 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 				   "cannot allocate action template");
 		return NULL;
 	}
+	/* Actions part is in the first half. */
 	at->attr = *attr;
 	at->actions = (struct rte_flow_action *)(at + 1);
-	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, at->actions, len,
-				actions, error);
+	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, at->actions,
+				len, ra, error);
 	if (act_len <= 0)
 		goto error;
-	at->masks = (struct rte_flow_action *)
-		    (((uint8_t *)at->actions) + act_len);
+	/* Masks part is in the second half. */
+	at->masks = (struct rte_flow_action *)(((uint8_t *)at->actions) + act_len);
 	mask_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, at->masks,
-				 len - act_len, masks, error);
+				 len - act_len, rm, error);
 	if (mask_len <= 0)
 		goto error;
+	at->rx_cpy_pos = pos;
 	/*
 	 * mlx5 PMD hacks indirect action index directly to the action conf.
 	 * The rte_flow_conv() function copies the content from conf pointer.
@@ -2497,7 +2749,8 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 	LIST_INSERT_HEAD(&priv->flow_hw_at, at, next);
 	return at;
 error:
-	mlx5_free(at);
+	if (at)
+		mlx5_free(at);
 	return NULL;
 }
 
@@ -2572,6 +2825,80 @@ flow_hw_copy_prepend_port_item(const struct rte_flow_item *items,
 	return copied_items;
 }
 
+static int
+flow_hw_pattern_validate(struct rte_eth_dev *dev,
+			 const struct rte_flow_pattern_template_attr *attr,
+			 const struct rte_flow_item items[],
+			 struct rte_flow_error *error)
+{
+	int i;
+	bool items_end = false;
+	RTE_SET_USED(dev);
+	RTE_SET_USED(attr);
+
+	for (i = 0; !items_end; i++) {
+		int type = items[i].type;
+
+		switch (type) {
+		case RTE_FLOW_ITEM_TYPE_TAG:
+		{
+			int reg;
+			const struct rte_flow_item_tag *tag =
+				(const struct rte_flow_item_tag *)items[i].spec;
+
+			reg = flow_hw_get_reg_id(RTE_FLOW_ITEM_TYPE_TAG, tag->index);
+			if (reg == REG_NON)
+				return rte_flow_error_set(error, EINVAL,
+							  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+							  NULL,
+							  "Unsupported tag index");
+			break;
+		}
+		case MLX5_RTE_FLOW_ITEM_TYPE_TAG:
+		{
+			const struct rte_flow_item_tag *tag =
+				(const struct rte_flow_item_tag *)items[i].spec;
+			struct mlx5_priv *priv = dev->data->dev_private;
+			uint8_t regcs = (uint8_t)priv->sh->cdev->config.hca_attr.set_reg_c;
+
+			if (!((1 << (tag->index - REG_C_0)) & regcs))
+				return rte_flow_error_set(error, EINVAL,
+							  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+							  NULL,
+							  "Unsupported internal tag index");
+		}
+		case RTE_FLOW_ITEM_TYPE_VOID:
+		case RTE_FLOW_ITEM_TYPE_ETH:
+		case RTE_FLOW_ITEM_TYPE_VLAN:
+		case RTE_FLOW_ITEM_TYPE_IPV4:
+		case RTE_FLOW_ITEM_TYPE_IPV6:
+		case RTE_FLOW_ITEM_TYPE_UDP:
+		case RTE_FLOW_ITEM_TYPE_TCP:
+		case RTE_FLOW_ITEM_TYPE_GTP:
+		case RTE_FLOW_ITEM_TYPE_GTP_PSC:
+		case RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT:
+		case RTE_FLOW_ITEM_TYPE_VXLAN:
+		case MLX5_RTE_FLOW_ITEM_TYPE_SQ:
+		case RTE_FLOW_ITEM_TYPE_META:
+		case RTE_FLOW_ITEM_TYPE_GRE:
+		case RTE_FLOW_ITEM_TYPE_GRE_KEY:
+		case RTE_FLOW_ITEM_TYPE_GRE_OPTION:
+		case RTE_FLOW_ITEM_TYPE_ICMP:
+		case RTE_FLOW_ITEM_TYPE_ICMP6:
+			break;
+		case RTE_FLOW_ITEM_TYPE_END:
+			items_end = true;
+			break;
+		default:
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+						  NULL,
+						  "Unsupported item type");
+		}
+	}
+	return 0;
+}
+
 /**
  * Create flow item template.
  *
@@ -2598,6 +2925,8 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 	struct rte_flow_item *copied_items = NULL;
 	const struct rte_flow_item *tmpl_items;
 
+	if (flow_hw_pattern_validate(dev, attr, items, error))
+		return NULL;
 	if (priv->sh->config.dv_esw_en && attr->ingress) {
 		/*
 		 * Disallow pattern template with ingress and egress/transfer
@@ -3032,6 +3361,17 @@ flow_hw_free_vport_actions(struct mlx5_priv *priv)
 	priv->hw_vport = NULL;
 }
 
+static uint32_t
+flow_hw_usable_lsb_vport_mask(struct mlx5_priv *priv)
+{
+	uint32_t usable_mask = ~priv->vport_meta_mask;
+
+	if (usable_mask)
+		return (1 << rte_bsf32(usable_mask));
+	else
+		return 0;
+}
+
 /**
  * Creates a flow pattern template used to match on E-Switch Manager.
  * This template is used to set up a table for SQ miss default flow.
@@ -3070,7 +3410,10 @@ flow_hw_create_ctrl_esw_mgr_pattern_template(struct rte_eth_dev *dev)
 }
 
 /**
- * Creates a flow pattern template used to match on a TX queue.
+ * Creates a flow pattern template used to match REG_C_0 and a TX queue.
+ * Matching on REG_C_0 is set up to match on the least significant bit usable
+ * by user-space, which is set when a packet originates from the E-Switch Manager.
+ *
  * This template is used to set up a table for SQ miss default flow.
  *
  * @param dev
@@ -3080,16 +3423,30 @@ flow_hw_create_ctrl_esw_mgr_pattern_template(struct rte_eth_dev *dev)
  *   Pointer to flow pattern template on success, NULL otherwise.
  */
 static struct rte_flow_pattern_template *
-flow_hw_create_ctrl_sq_pattern_template(struct rte_eth_dev *dev)
+flow_hw_create_ctrl_regc_sq_pattern_template(struct rte_eth_dev *dev)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint32_t marker_bit = flow_hw_usable_lsb_vport_mask(priv);
 	struct rte_flow_pattern_template_attr attr = {
 		.relaxed_matching = 0,
 		.transfer = 1,
 	};
+	struct rte_flow_item_tag reg_c0_spec = {
+		.index = (uint8_t)REG_C_0,
+	};
+	struct rte_flow_item_tag reg_c0_mask = {
+		.index = 0xff,
+	};
 	struct mlx5_rte_flow_item_sq queue_mask = {
 		.queue = UINT32_MAX,
 	};
 	struct rte_flow_item items[] = {
+		{
+			.type = (enum rte_flow_item_type)
+				MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+			.spec = &reg_c0_spec,
+			.mask = &reg_c0_mask,
+		},
 		{
 			.type = (enum rte_flow_item_type)
 				MLX5_RTE_FLOW_ITEM_TYPE_SQ,
@@ -3100,6 +3457,12 @@ flow_hw_create_ctrl_sq_pattern_template(struct rte_eth_dev *dev)
 		},
 	};
 
+	if (!marker_bit) {
+		DRV_LOG(ERR, "Unable to set up pattern template for SQ miss table");
+		return NULL;
+	}
+	reg_c0_spec.data = marker_bit;
+	reg_c0_mask.data = marker_bit;
 	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
 }
 
@@ -3137,6 +3500,132 @@ flow_hw_create_ctrl_port_pattern_template(struct rte_eth_dev *dev)
 	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
 }
 
+/**
+ * Creates a flow pattern template matching all Ethernet packets.
+ * This template is used to set up a table for default Tx copy (Tx metadata
+ * to REG_C_1) flow rule usage.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow pattern template on success, NULL otherwise.
+ */
+static struct rte_flow_pattern_template *
+flow_hw_create_tx_default_mreg_copy_pattern_template(struct rte_eth_dev *dev)
+{
+	struct rte_flow_pattern_template_attr tx_pa_attr = {
+		.relaxed_matching = 0,
+		.egress = 1,
+	};
+	struct rte_flow_item_eth promisc = {
+		.dst.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+		.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+		.type = 0,
+	};
+	struct rte_flow_item eth_all[] = {
+		[0] = {
+			.type = RTE_FLOW_ITEM_TYPE_ETH,
+			.spec = &promisc,
+			.mask = &promisc,
+		},
+		[1] = {
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	struct rte_flow_error drop_err;
+
+	RTE_SET_USED(drop_err);
+	return flow_hw_pattern_template_create(dev, &tx_pa_attr, eth_all, &drop_err);
+}
+
+/**
+ * Creates a flow actions template with modify field action and masked jump action.
+ * Modify field action sets the least significant bit of REG_C_0 (usable by user-space)
+ * to 1, meaning that the packet originated from the E-Switch Manager. Jump action
+ * transfers steering to group 1.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow actions template on success, NULL otherwise.
+ */
+static struct rte_flow_actions_template *
+flow_hw_create_ctrl_regc_jump_actions_template(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint32_t marker_bit = flow_hw_usable_lsb_vport_mask(priv);
+	uint32_t marker_bit_mask = UINT32_MAX;
+	struct rte_flow_actions_template_attr attr = {
+		.transfer = 1,
+	};
+	struct rte_flow_action_modify_field set_reg_v = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_C_0,
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+		},
+		.width = 1,
+	};
+	struct rte_flow_action_modify_field set_reg_m = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+		},
+		.width = UINT32_MAX,
+	};
+	struct rte_flow_action_jump jump_v = {
+		.group = MLX5_HW_LOWEST_USABLE_GROUP,
+	};
+	struct rte_flow_action_jump jump_m = {
+		.group = UINT32_MAX,
+	};
+	struct rte_flow_action actions_v[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+			.conf = &set_reg_v,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump_v,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+	struct rte_flow_action actions_m[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+			.conf = &set_reg_m,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump_m,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+
+	if (!marker_bit) {
+		DRV_LOG(ERR, "Unable to set up actions template for SQ miss table");
+		return NULL;
+	}
+	set_reg_v.dst.offset = rte_bsf32(marker_bit);
+	rte_memcpy(set_reg_v.src.value, &marker_bit, sizeof(marker_bit));
+	rte_memcpy(set_reg_m.src.value, &marker_bit_mask, sizeof(marker_bit_mask));
+	return flow_hw_actions_template_create(dev, &attr, actions_v, actions_m, NULL);
+}
+
 /**
  * Creates a flow actions template with an unmasked JUMP action. Flows
  * based on this template will perform a jump to some group. This template
@@ -3231,6 +3720,73 @@ flow_hw_create_ctrl_port_actions_template(struct rte_eth_dev *dev)
 					       NULL);
 }
 
+/**
+ * Creates an actions template that uses the header modify action for register
+ * copying. This template is used to set up a table for the copy flow.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow actions template on success, NULL otherwise.
+ */
+static struct rte_flow_actions_template *
+flow_hw_create_tx_default_mreg_copy_actions_template(struct rte_eth_dev *dev)
+{
+	struct rte_flow_actions_template_attr tx_act_attr = {
+		.egress = 1,
+	};
+	const struct rte_flow_action_modify_field mreg_action = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_C_1,
+		},
+		.src = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_A,
+		},
+		.width = 32,
+	};
+	const struct rte_flow_action_modify_field mreg_mask = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.src = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.width = UINT32_MAX,
+	};
+	const struct rte_flow_action copy_reg_action[] = {
+		[0] = {
+			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+			.conf = &mreg_action,
+		},
+		[1] = {
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		},
+	};
+	const struct rte_flow_action copy_reg_mask[] = {
+		[0] = {
+			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+			.conf = &mreg_mask,
+		},
+		[1] = {
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		},
+	};
+	struct rte_flow_error drop_err;
+
+	RTE_SET_USED(drop_err);
+	return flow_hw_actions_template_create(dev, &tx_act_attr, copy_reg_action,
+					       copy_reg_mask, &drop_err);
+}
+
 /**
  * Creates a control flow table used to transfer traffic from E-Switch Manager
  * and TX queues from group 0 to group 1.
@@ -3260,8 +3816,12 @@ flow_hw_create_ctrl_sq_miss_root_table(struct rte_eth_dev *dev,
 		},
 		.nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
 	};
+	struct mlx5_flow_template_table_cfg cfg = {
+		.attr = attr,
+		.external = false,
+	};
 
-	return flow_hw_table_create(dev, &attr, &it, 1, &at, 1, NULL);
+	return flow_hw_table_create(dev, &cfg, &it, 1, &at, 1, NULL);
 }
 
 
@@ -3286,16 +3846,56 @@ flow_hw_create_ctrl_sq_miss_table(struct rte_eth_dev *dev,
 {
 	struct rte_flow_template_table_attr attr = {
 		.flow_attr = {
-			.group = MLX5_HW_SQ_MISS_GROUP,
-			.priority = 0,
+			.group = 1,
+			.priority = MLX5_HW_LOWEST_PRIO_NON_ROOT,
 			.ingress = 0,
 			.egress = 0,
 			.transfer = 1,
 		},
 		.nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
 	};
+	struct mlx5_flow_template_table_cfg cfg = {
+		.attr = attr,
+		.external = false,
+	};
+
+	return flow_hw_table_create(dev, &cfg, &it, 1, &at, 1, NULL);
+}
+
+/**
+ * Creates the default Tx metadata copy table on NIC Tx group 0.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param pt
+ *   Pointer to flow pattern template.
+ * @param at
+ *   Pointer to flow actions template.
+ *
+ * @return
+ *   Pointer to flow table on success, NULL otherwise.
+ */
+static struct rte_flow_template_table*
+flow_hw_create_tx_default_mreg_copy_table(struct rte_eth_dev *dev,
+					  struct rte_flow_pattern_template *pt,
+					  struct rte_flow_actions_template *at)
+{
+	struct rte_flow_template_table_attr tx_tbl_attr = {
+		.flow_attr = {
+			.group = 0, /* Root */
+			.priority = MLX5_HW_LOWEST_PRIO_ROOT,
+			.egress = 1,
+		},
+		.nb_flows = 1, /* One default flow rule for all. */
+	};
+	struct mlx5_flow_template_table_cfg tx_tbl_cfg = {
+		.attr = tx_tbl_attr,
+		.external = false,
+	};
+	struct rte_flow_error drop_err;
 
-	return flow_hw_table_create(dev, &attr, &it, 1, &at, 1, NULL);
+	RTE_SET_USED(drop_err);
+	return flow_hw_table_create(dev, &tx_tbl_cfg, &pt, 1, &at, 1, &drop_err);
 }
 
 /**
@@ -3320,15 +3920,19 @@ flow_hw_create_ctrl_jump_table(struct rte_eth_dev *dev,
 	struct rte_flow_template_table_attr attr = {
 		.flow_attr = {
 			.group = 0,
-			.priority = 15, /* TODO: Flow priority discovery. */
+			.priority = MLX5_HW_LOWEST_PRIO_ROOT,
 			.ingress = 0,
 			.egress = 0,
 			.transfer = 1,
 		},
 		.nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
 	};
+	struct mlx5_flow_template_table_cfg cfg = {
+		.attr = attr,
+		.external = false,
+	};
 
-	return flow_hw_table_create(dev, &attr, &it, 1, &at, 1, NULL);
+	return flow_hw_table_create(dev, &cfg, &it, 1, &at, 1, NULL);
 }
 
 /**
@@ -3346,11 +3950,14 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow_pattern_template *esw_mgr_items_tmpl = NULL;
-	struct rte_flow_pattern_template *sq_items_tmpl = NULL;
+	struct rte_flow_pattern_template *regc_sq_items_tmpl = NULL;
 	struct rte_flow_pattern_template *port_items_tmpl = NULL;
-	struct rte_flow_actions_template *jump_sq_actions_tmpl = NULL;
+	struct rte_flow_pattern_template *tx_meta_items_tmpl = NULL;
+	struct rte_flow_actions_template *regc_jump_actions_tmpl = NULL;
 	struct rte_flow_actions_template *port_actions_tmpl = NULL;
 	struct rte_flow_actions_template *jump_one_actions_tmpl = NULL;
+	struct rte_flow_actions_template *tx_meta_actions_tmpl = NULL;
+	uint32_t xmeta = priv->sh->config.dv_xmeta_en;
 
 	/* Item templates */
 	esw_mgr_items_tmpl = flow_hw_create_ctrl_esw_mgr_pattern_template(dev);
@@ -3359,8 +3966,8 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 			" template for control flows", dev->data->port_id);
 		goto error;
 	}
-	sq_items_tmpl = flow_hw_create_ctrl_sq_pattern_template(dev);
-	if (!sq_items_tmpl) {
+	regc_sq_items_tmpl = flow_hw_create_ctrl_regc_sq_pattern_template(dev);
+	if (!regc_sq_items_tmpl) {
 		DRV_LOG(ERR, "port %u failed to create SQ item template for"
 			" control flows", dev->data->port_id);
 		goto error;
@@ -3371,11 +3978,18 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 			" control flows", dev->data->port_id);
 		goto error;
 	}
+	if (xmeta == MLX5_XMETA_MODE_META32_HWS) {
+		tx_meta_items_tmpl = flow_hw_create_tx_default_mreg_copy_pattern_template(dev);
+		if (!tx_meta_items_tmpl) {
+			DRV_LOG(ERR, "port %u failed to Tx metadata copy pattern"
+				" template for control flows", dev->data->port_id);
+			goto error;
+		}
+	}
 	/* Action templates */
-	jump_sq_actions_tmpl = flow_hw_create_ctrl_jump_actions_template(dev,
-									 MLX5_HW_SQ_MISS_GROUP);
-	if (!jump_sq_actions_tmpl) {
-		DRV_LOG(ERR, "port %u failed to create jump action template"
+	regc_jump_actions_tmpl = flow_hw_create_ctrl_regc_jump_actions_template(dev);
+	if (!regc_jump_actions_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create REG_C set and jump action template"
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
@@ -3385,23 +3999,32 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
-	jump_one_actions_tmpl = flow_hw_create_ctrl_jump_actions_template(dev, 1);
+	jump_one_actions_tmpl = flow_hw_create_ctrl_jump_actions_template
+			(dev, MLX5_HW_LOWEST_USABLE_GROUP);
 	if (!jump_one_actions_tmpl) {
 		DRV_LOG(ERR, "port %u failed to create jump action template"
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
+	if (xmeta == MLX5_XMETA_MODE_META32_HWS) {
+		tx_meta_actions_tmpl = flow_hw_create_tx_default_mreg_copy_actions_template(dev);
+		if (!tx_meta_actions_tmpl) {
+			DRV_LOG(ERR, "port %u failed to Tx metadata copy actions"
+				" template for control flows", dev->data->port_id);
+			goto error;
+		}
+	}
 	/* Tables */
 	MLX5_ASSERT(priv->hw_esw_sq_miss_root_tbl == NULL);
 	priv->hw_esw_sq_miss_root_tbl = flow_hw_create_ctrl_sq_miss_root_table
-			(dev, esw_mgr_items_tmpl, jump_sq_actions_tmpl);
+			(dev, esw_mgr_items_tmpl, regc_jump_actions_tmpl);
 	if (!priv->hw_esw_sq_miss_root_tbl) {
 		DRV_LOG(ERR, "port %u failed to create table for default sq miss (root table)"
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
 	MLX5_ASSERT(priv->hw_esw_sq_miss_tbl == NULL);
-	priv->hw_esw_sq_miss_tbl = flow_hw_create_ctrl_sq_miss_table(dev, sq_items_tmpl,
+	priv->hw_esw_sq_miss_tbl = flow_hw_create_ctrl_sq_miss_table(dev, regc_sq_items_tmpl,
 								     port_actions_tmpl);
 	if (!priv->hw_esw_sq_miss_tbl) {
 		DRV_LOG(ERR, "port %u failed to create table for default sq miss (non-root table)"
@@ -3416,6 +4039,16 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
+	if (xmeta == MLX5_XMETA_MODE_META32_HWS) {
+		MLX5_ASSERT(priv->hw_tx_meta_cpy_tbl == NULL);
+		priv->hw_tx_meta_cpy_tbl = flow_hw_create_tx_default_mreg_copy_table(dev,
+					tx_meta_items_tmpl, tx_meta_actions_tmpl);
+		if (!priv->hw_tx_meta_cpy_tbl) {
+			DRV_LOG(ERR, "port %u failed to create table for default"
+				" Tx metadata copy flow rule", dev->data->port_id);
+			goto error;
+		}
+	}
 	return 0;
 error:
 	if (priv->hw_esw_zero_tbl) {
@@ -3430,16 +4063,20 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 		flow_hw_table_destroy(dev, priv->hw_esw_sq_miss_root_tbl, NULL);
 		priv->hw_esw_sq_miss_root_tbl = NULL;
 	}
+	if (xmeta == MLX5_XMETA_MODE_META32_HWS && tx_meta_actions_tmpl)
+		flow_hw_actions_template_destroy(dev, tx_meta_actions_tmpl, NULL);
 	if (jump_one_actions_tmpl)
 		flow_hw_actions_template_destroy(dev, jump_one_actions_tmpl, NULL);
 	if (port_actions_tmpl)
 		flow_hw_actions_template_destroy(dev, port_actions_tmpl, NULL);
-	if (jump_sq_actions_tmpl)
-		flow_hw_actions_template_destroy(dev, jump_sq_actions_tmpl, NULL);
+	if (regc_jump_actions_tmpl)
+		flow_hw_actions_template_destroy(dev, regc_jump_actions_tmpl, NULL);
+	if (xmeta == MLX5_XMETA_MODE_META32_HWS && tx_meta_items_tmpl)
+		flow_hw_pattern_template_destroy(dev, tx_meta_items_tmpl, NULL);
 	if (port_items_tmpl)
 		flow_hw_pattern_template_destroy(dev, port_items_tmpl, NULL);
-	if (sq_items_tmpl)
-		flow_hw_pattern_template_destroy(dev, sq_items_tmpl, NULL);
+	if (regc_sq_items_tmpl)
+		flow_hw_pattern_template_destroy(dev, regc_sq_items_tmpl, NULL);
 	if (esw_mgr_items_tmpl)
 		flow_hw_pattern_template_destroy(dev, esw_mgr_items_tmpl, NULL);
 	return -EINVAL;
@@ -3491,7 +4128,7 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	struct rte_flow_queue_attr **_queue_attr = NULL;
 	struct rte_flow_queue_attr ctrl_queue_attr = {0};
 	bool is_proxy = !!(priv->sh->config.dv_esw_en && priv->master);
-	int ret;
+	int ret = 0;
 
 	if (!port_attr || !nb_queue || !queue_attr) {
 		rte_errno = EINVAL;
@@ -3642,6 +4279,9 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	}
 	if (_queue_attr)
 		mlx5_free(_queue_attr);
+	/* Do not overwrite the internal errno information. */
+	if (ret)
+		return ret;
 	return rte_flow_error_set(error, rte_errno,
 				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
 				  "fail to configure port");
@@ -3751,17 +4391,17 @@ void flow_hw_init_tags_set(struct rte_eth_dev *dev)
 		return;
 	unset |= 1 << (priv->mtr_color_reg - REG_C_0);
 	unset |= 1 << (REG_C_6 - REG_C_0);
-	if (meta_mode == MLX5_XMETA_MODE_META32_HWS) {
-		unset |= 1 << (REG_C_1 - REG_C_0);
+	if (priv->sh->config.dv_esw_en)
 		unset |= 1 << (REG_C_0 - REG_C_0);
-	}
+	if (meta_mode == MLX5_XMETA_MODE_META32_HWS)
+		unset |= 1 << (REG_C_1 - REG_C_0);
 	masks &= ~unset;
 	if (mlx5_flow_hw_avl_tags_init_cnt) {
 		for (i = 0; i < MLX5_FLOW_HW_TAGS_MAX; i++) {
 			if (mlx5_flow_hw_avl_tags[i] != REG_NON && !!((1 << i) & masks)) {
 				copy[mlx5_flow_hw_avl_tags[i] - REG_C_0] =
 						mlx5_flow_hw_avl_tags[i];
-				copy_masks |= (1 << i);
+				copy_masks |= (1 << (mlx5_flow_hw_avl_tags[i] - REG_C_0));
 			}
 		}
 		if (copy_masks != masks) {
@@ -3903,7 +4543,6 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 	return flow_dv_action_destroy(dev, handle, error);
 }
 
-
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.info_get = flow_hw_info_get,
 	.configure = flow_hw_configure,
@@ -3911,7 +4550,7 @@ const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.pattern_template_destroy = flow_hw_pattern_template_destroy,
 	.actions_template_create = flow_hw_actions_template_create,
 	.actions_template_destroy = flow_hw_actions_template_destroy,
-	.template_table_create = flow_hw_table_create,
+	.template_table_create = flow_hw_template_table_create,
 	.template_table_destroy = flow_hw_table_destroy,
 	.async_flow_create = flow_hw_async_flow_create,
 	.async_flow_destroy = flow_hw_async_flow_destroy,
@@ -3927,13 +4566,6 @@ const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.action_query = flow_dv_action_query,
 };
 
-static uint32_t
-flow_hw_get_ctrl_queue(struct mlx5_priv *priv)
-{
-	MLX5_ASSERT(priv->nb_queue > 0);
-	return priv->nb_queue - 1;
-}
-
 /**
  * Creates a control flow using flow template API on @p proxy_dev device,
  * on behalf of @p owner_dev device.
@@ -3971,7 +4603,7 @@ flow_hw_create_ctrl_flow(struct rte_eth_dev *owner_dev,
 			 uint8_t action_template_idx)
 {
 	struct mlx5_priv *priv = proxy_dev->data->dev_private;
-	uint32_t queue = flow_hw_get_ctrl_queue(priv);
+	uint32_t queue = priv->nb_queue - 1;
 	struct rte_flow_op_attr op_attr = {
 		.postpone = 0,
 	};
@@ -4046,7 +4678,7 @@ static int
 flow_hw_destroy_ctrl_flow(struct rte_eth_dev *dev, struct rte_flow *flow)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	uint32_t queue = flow_hw_get_ctrl_queue(priv);
+	uint32_t queue = priv->nb_queue - 1;
 	struct rte_flow_op_attr op_attr = {
 		.postpone = 0,
 	};
@@ -4183,10 +4815,24 @@ mlx5_flow_hw_esw_create_mgr_sq_miss_flow(struct rte_eth_dev *dev)
 			.type = RTE_FLOW_ITEM_TYPE_END,
 		},
 	};
+	struct rte_flow_action_modify_field modify_field = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+		},
+		.width = 1,
+	};
 	struct rte_flow_action_jump jump = {
-		.group = MLX5_HW_SQ_MISS_GROUP,
+		.group = 1,
 	};
 	struct rte_flow_action actions[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+			.conf = &modify_field,
+		},
 		{
 			.type = RTE_FLOW_ACTION_TYPE_JUMP,
 			.conf = &jump,
@@ -4209,6 +4855,12 @@ int
 mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
 {
 	uint16_t port_id = dev->data->port_id;
+	struct rte_flow_item_tag reg_c0_spec = {
+		.index = (uint8_t)REG_C_0,
+	};
+	struct rte_flow_item_tag reg_c0_mask = {
+		.index = 0xff,
+	};
 	struct mlx5_rte_flow_item_sq queue_spec = {
 		.queue = txq,
 	};
@@ -4216,6 +4868,12 @@ mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
 		.queue = UINT32_MAX,
 	};
 	struct rte_flow_item items[] = {
+		{
+			.type = (enum rte_flow_item_type)
+				MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+			.spec = &reg_c0_spec,
+			.mask = &reg_c0_mask,
+		},
 		{
 			.type = (enum rte_flow_item_type)
 				MLX5_RTE_FLOW_ITEM_TYPE_SQ,
@@ -4241,6 +4899,7 @@ mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
 	struct rte_eth_dev *proxy_dev;
 	struct mlx5_priv *proxy_priv;
 	uint16_t proxy_port_id = dev->data->port_id;
+	uint32_t marker_bit;
 	int ret;
 
 	RTE_SET_USED(txq);
@@ -4261,6 +4920,14 @@ mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
 		rte_errno = ENOMEM;
 		return -rte_errno;
 	}
+	marker_bit = flow_hw_usable_lsb_vport_mask(proxy_priv);
+	if (!marker_bit) {
+		DRV_LOG(ERR, "Unable to set up control flow in SQ miss table");
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	reg_c0_spec.data = marker_bit;
+	reg_c0_mask.data = marker_bit;
 	return flow_hw_create_ctrl_flow(dev, proxy_dev,
 					proxy_priv->hw_esw_sq_miss_tbl,
 					items, 0, actions, 0);
@@ -4320,4 +4987,53 @@ mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev)
 					items, 0, actions, 0);
 }
 
+int
+mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_flow_item_eth promisc = {
+		.dst.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+		.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+		.type = 0,
+	};
+	struct rte_flow_item eth_all[] = {
+		[0] = {
+			.type = RTE_FLOW_ITEM_TYPE_ETH,
+			.spec = &promisc,
+			.mask = &promisc,
+		},
+		[1] = {
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	struct rte_flow_action_modify_field mreg_action = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_C_1,
+		},
+		.src = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_A,
+		},
+		.width = 32,
+	};
+	struct rte_flow_action copy_reg_action[] = {
+		[0] = {
+			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+			.conf = &mreg_action,
+		},
+		[1] = {
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		},
+	};
+
+	MLX5_ASSERT(priv->master);
+	if (!priv->dr_ctx || !priv->hw_tx_meta_cpy_tbl)
+		return 0;
+	return flow_hw_create_ctrl_flow(dev, dev,
+					priv->hw_tx_meta_cpy_tbl,
+					eth_all, 0, copy_reg_action, 0);
+}
+
 #endif
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index f59d314ff4..cccec08d70 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -1292,6 +1292,9 @@ mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
 	if (priv->sh->config.dv_esw_en && priv->master) {
 		if (mlx5_flow_hw_esw_create_mgr_sq_miss_flow(dev))
 			goto error;
+		if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS)
+			if (mlx5_flow_hw_create_tx_default_mreg_copy_flow(dev))
+				goto error;
 	}
 	for (i = 0; i < priv->txqs_n; ++i) {
 		struct mlx5_txq_ctrl *txq = mlx5_txq_get(dev, i);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v5 07/18] net/mlx5: add HW steering meter action
  2022-10-20  3:21 ` [PATCH v5 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (5 preceding siblings ...)
  2022-10-20  3:21   ` [PATCH v5 06/18] net/mlx5: add extended metadata mode for hardware steering Suanming Mou
@ 2022-10-20  3:21   ` Suanming Mou
  2022-10-20  3:22   ` [PATCH v5 08/18] net/mlx5: add HW steering counter action Suanming Mou
                     ` (10 subsequent siblings)
  17 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-10-20  3:21 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Alexander Kozyrev

From: Alexander Kozyrev <akozyrev@nvidia.com>

This commit adds meter action for HWS steering.

HW steering meter is based on ASO. The number of meters that will
be used by flows should be specified in advance in the flow
configure API.
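
For reference, a minimal sketch of the intended usage, assuming the
nb_meter_profiles/nb_meter_policies fields added by the ethdev meter
configuration patch of this series (the function name and the sizes
below are only illustrative, not part of this patch):

#include <rte_flow.h>

static int
configure_hws_meters(uint16_t port_id)
{
	const struct rte_flow_port_attr port_attr = {
		.nb_meters = 1 << 10,   /* ASO meter objects reserved up front. */
		.nb_meter_profiles = 8, /* Assumed field from the ethdev patch. */
		.nb_meter_policies = 8, /* Assumed field from the ethdev patch. */
	};
	const struct rte_flow_queue_attr q_attr = { .size = 64 };
	const struct rte_flow_queue_attr *q_attrs[] = { &q_attr };
	struct rte_flow_error err;

	/* One application flow queue; meters are allocated from the bulk
	 * reserved here, so no further device reconfiguration is needed
	 * when meter rules are inserted later.
	 */
	return rte_flow_configure(port_id, &port_attr, 1, q_attrs, &err);
}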

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
---
 doc/guides/rel_notes/release_22_11.rst |   1 +
 drivers/net/mlx5/mlx5.h                |  61 ++-
 drivers/net/mlx5/mlx5_flow.c           |  71 +++
 drivers/net/mlx5/mlx5_flow.h           |  24 +
 drivers/net/mlx5/mlx5_flow_aso.c       |  30 +-
 drivers/net/mlx5/mlx5_flow_hw.c        | 258 ++++++++-
 drivers/net/mlx5/mlx5_flow_meter.c     | 702 ++++++++++++++++++++++++-
 7 files changed, 1111 insertions(+), 36 deletions(-)

diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 324076be50..192bb84211 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -242,6 +242,7 @@ New Features
   * Added fully support for queue based async HW steering to the PMD:
     - Support of modify fields.
     - Support of FDB.
+    - Support of meter.
 
 * **Rewritten pmdinfo script.**
 
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 6e7216efab..325f0b31c5 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -362,6 +362,9 @@ struct mlx5_hw_q {
 	struct mlx5_hw_q_job **job; /* LIFO header. */
 } __rte_cache_aligned;
 
+
+
+
 #define MLX5_COUNTERS_PER_POOL 512
 #define MLX5_MAX_PENDING_QUERIES 4
 #define MLX5_CNT_CONTAINER_RESIZE 64
@@ -787,15 +790,29 @@ struct mlx5_flow_meter_policy {
 	/* Is meter action in policy table. */
 	uint32_t hierarchy_drop_cnt:1;
 	/* Is any meter in hierarchy contains drop_cnt. */
+	uint32_t skip_r:1;
+	/* If red color policy is skipped. */
 	uint32_t skip_y:1;
 	/* If yellow color policy is skipped. */
 	uint32_t skip_g:1;
 	/* If green color policy is skipped. */
 	uint32_t mark:1;
 	/* If policy contains mark action. */
+	uint32_t initialized:1;
+	/* Initialized. */
+	uint16_t group;
+	/* The group. */
 	rte_spinlock_t sl;
 	uint32_t ref_cnt;
 	/* Use count. */
+	struct rte_flow_pattern_template *hws_item_templ;
+	/* Hardware steering item templates. */
+	struct rte_flow_actions_template *hws_act_templ[MLX5_MTR_DOMAIN_MAX];
+	/* Hardware steering action templates. */
+	struct rte_flow_template_table *hws_flow_table[MLX5_MTR_DOMAIN_MAX];
+	/* Hardware steering tables. */
+	struct rte_flow *hws_flow_rule[MLX5_MTR_DOMAIN_MAX][RTE_COLORS];
+	/* Hardware steering rules. */
 	struct mlx5_meter_policy_action_container act_cnt[MLX5_MTR_RTE_COLORS];
 	/* Policy actions container. */
 	void *dr_drop_action[MLX5_MTR_DOMAIN_MAX];
@@ -870,6 +887,7 @@ struct mlx5_flow_meter_info {
 	 */
 	uint32_t transfer:1;
 	uint32_t def_policy:1;
+	uint32_t initialized:1;
 	/* Meter points to default policy. */
 	uint32_t color_aware:1;
 	/* Meter is color aware mode. */
@@ -885,6 +903,10 @@ struct mlx5_flow_meter_info {
 	/**< Flow meter action. */
 	void *meter_action_y;
 	/**< Flow meter action for yellow init_color. */
+	uint32_t meter_offset;
+	/**< Flow meter offset. */
+	uint16_t group;
+	/**< Flow meter group. */
 };
 
 /* PPS(packets per second) map to BPS(Bytes per second).
@@ -919,6 +941,7 @@ struct mlx5_flow_meter_profile {
 	uint32_t ref_cnt; /**< Use count. */
 	uint32_t g_support:1; /**< If G color will be generated. */
 	uint32_t y_support:1; /**< If Y color will be generated. */
+	uint32_t initialized:1; /**< Initialized. */
 };
 
 /* 2 meters in each ASO cache line */
@@ -939,13 +962,20 @@ enum mlx5_aso_mtr_state {
 	ASO_METER_READY, /* CQE received. */
 };
 
+/* ASO flow meter type. */
+enum mlx5_aso_mtr_type {
+	ASO_METER_INDIRECT,
+	ASO_METER_DIRECT,
+};
+
 /* Generic aso_flow_meter information. */
 struct mlx5_aso_mtr {
 	LIST_ENTRY(mlx5_aso_mtr) next;
+	enum mlx5_aso_mtr_type type;
 	struct mlx5_flow_meter_info fm;
 	/**< Pointer to the next aso flow meter structure. */
 	uint8_t state; /**< ASO flow meter state. */
-	uint8_t offset;
+	uint32_t offset;
 };
 
 /* Generic aso_flow_meter pool structure. */
@@ -969,6 +999,14 @@ struct mlx5_aso_mtr_pools_mng {
 	struct mlx5_aso_mtr_pool **pools; /* ASO flow meter pool array. */
 };
 
+/* Bulk management structure for ASO flow meter. */
+struct mlx5_mtr_bulk {
+	uint32_t size; /* Number of ASO objects. */
+	struct mlx5dr_action *action; /* HWS action */
+	struct mlx5_devx_obj *devx_obj; /* DEVX object. */
+	struct mlx5_aso_mtr *aso; /* Array of ASO objects. */
+};
+
 /* Meter management structure for global flow meter resource. */
 struct mlx5_flow_mtr_mng {
 	struct mlx5_aso_mtr_pools_mng pools_mng;
@@ -1022,6 +1060,7 @@ struct mlx5_flow_tbl_resource {
 #define MLX5_FLOW_TABLE_LEVEL_METER (MLX5_MAX_TABLES - 3)
 #define MLX5_FLOW_TABLE_LEVEL_POLICY (MLX5_MAX_TABLES - 4)
 #define MLX5_MAX_TABLES_EXTERNAL MLX5_FLOW_TABLE_LEVEL_POLICY
+#define MLX5_FLOW_TABLE_HWS_POLICY (MLX5_MAX_TABLES - 10)
 #define MLX5_MAX_TABLES_FDB UINT16_MAX
 #define MLX5_FLOW_TABLE_FACTOR 10
 
@@ -1308,6 +1347,12 @@ TAILQ_HEAD(mlx5_mtr_profiles, mlx5_flow_meter_profile);
 /* MTR list. */
 TAILQ_HEAD(mlx5_legacy_flow_meters, mlx5_legacy_flow_meter);
 
+struct mlx5_mtr_config {
+	uint32_t nb_meters; /**< Number of configured meters */
+	uint32_t nb_meter_profiles; /**< Number of configured meter profiles */
+	uint32_t nb_meter_policies; /**< Number of configured meter policies */
+};
+
 /* RSS description. */
 struct mlx5_flow_rss_desc {
 	uint32_t level;
@@ -1545,12 +1590,16 @@ struct mlx5_priv {
 	struct mlx5_nl_vlan_vmwa_context *vmwa_context; /* VLAN WA context. */
 	struct mlx5_hlist *mreg_cp_tbl;
 	/* Hash table of Rx metadata register copy table. */
+	struct mlx5_mtr_config mtr_config; /* Meter configuration */
 	uint8_t mtr_sfx_reg; /* Meter prefix-suffix flow match REG_C. */
 	uint8_t mtr_color_reg; /* Meter color match REG_C. */
 	struct mlx5_legacy_flow_meters flow_meters; /* MTR list. */
 	struct mlx5_l3t_tbl *mtr_profile_tbl; /* Meter index lookup table. */
+	struct mlx5_flow_meter_profile *mtr_profile_arr; /* Profile array. */
 	struct mlx5_l3t_tbl *policy_idx_tbl; /* Policy index lookup table. */
+	struct mlx5_flow_meter_policy *mtr_policy_arr; /* Policy array. */
 	struct mlx5_l3t_tbl *mtr_idx_tbl; /* Meter index lookup table. */
+	struct mlx5_mtr_bulk mtr_bulk; /* Meter index mapping for HWS */
 	uint8_t skip_default_rss_reta; /* Skip configuration of default reta. */
 	uint8_t fdb_def_rule; /* Whether fdb jump to table 1 is configured. */
 	struct mlx5_mp_id mp_id; /* ID of a multi-process process */
@@ -1564,13 +1613,13 @@ struct mlx5_priv {
 	struct mlx5_flex_item flex_item[MLX5_PORT_FLEX_ITEM_NUM];
 	/* Flex items have been created on the port. */
 	uint32_t flex_item_map; /* Map of allocated flex item elements. */
+	uint32_t nb_queue; /* HW steering queue number. */
 #if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
 	/* Item template list. */
 	LIST_HEAD(flow_hw_itt, rte_flow_pattern_template) flow_hw_itt;
 	/* Action template list. */
 	LIST_HEAD(flow_hw_at, rte_flow_actions_template) flow_hw_at;
 	struct mlx5dr_context *dr_ctx; /**< HW steering DR context. */
-	uint32_t nb_queue; /* HW steering queue number. */
 	/* HW steering queue polling mechanism job descriptor LIFO. */
 	struct mlx5_hw_q *hw_q;
 	/* HW steering rte flow table list header. */
@@ -1586,6 +1635,7 @@ struct mlx5_priv {
 
 #define PORT_ID(priv) ((priv)->dev_data->port_id)
 #define ETH_DEV(priv) (&rte_eth_devices[PORT_ID(priv)])
+#define CTRL_QUEUE_ID(priv) ((priv)->nb_queue - 1)
 
 struct rte_hairpin_peer_info {
 	uint32_t qp_id;
@@ -1897,6 +1947,11 @@ void mlx5_pmd_socket_uninit(void);
 
 /* mlx5_flow_meter.c */
 
+int mlx5_flow_meter_init(struct rte_eth_dev *dev,
+			 uint32_t nb_meters,
+			 uint32_t nb_meter_profiles,
+			 uint32_t nb_meter_policies);
+void mlx5_flow_meter_uninit(struct rte_eth_dev *dev);
 int mlx5_flow_meter_ops_get(struct rte_eth_dev *dev, void *arg);
 struct mlx5_flow_meter_info *mlx5_flow_meter_find(struct mlx5_priv *priv,
 		uint32_t meter_id, uint32_t *mtr_idx);
@@ -1971,7 +2026,7 @@ int mlx5_aso_flow_hit_queue_poll_stop(struct mlx5_dev_ctx_shared *sh);
 void mlx5_aso_queue_uninit(struct mlx5_dev_ctx_shared *sh,
 		enum mlx5_access_aso_opc_mod aso_opc_mod);
 int mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
-		struct mlx5_aso_mtr *mtr);
+		struct mlx5_aso_mtr *mtr, struct mlx5_mtr_bulk *bulk);
 int mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh,
 		struct mlx5_aso_mtr *mtr);
 int mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 3b8e97ccd0..892c42a10b 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -8333,6 +8333,40 @@ mlx5_flow_port_configure(struct rte_eth_dev *dev,
 	return fops->configure(dev, port_attr, nb_queue, queue_attr, error);
 }
 
+/**
+ * Validate item template.
+ *
+ * @param[in] dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param[in] attr
+ *   Pointer to the item template attributes.
+ * @param[in] items
+ *   The template item pattern.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_flow_pattern_validate(struct rte_eth_dev *dev,
+		const struct rte_flow_pattern_template_attr *attr,
+		const struct rte_flow_item items[],
+		struct rte_flow_error *error)
+{
+	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
+
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW) {
+		rte_flow_error_set(error, ENOTSUP,
+			RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+			"pattern validate with incorrect steering mode");
+		return -ENOTSUP;
+	}
+	fops = flow_get_drv_ops(MLX5_FLOW_TYPE_HW);
+	return fops->pattern_validate(dev, attr, items, error);
+}
+
 /**
  * Create flow item template.
  *
@@ -8398,6 +8432,43 @@ mlx5_flow_pattern_template_destroy(struct rte_eth_dev *dev,
 	return fops->pattern_template_destroy(dev, template, error);
 }
 
+/**
+ * Validate flow actions template.
+ *
+ * @param[in] dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param[in] attr
+ *   Pointer to the action template attributes.
+ * @param[in] actions
+ *   Associated actions (list terminated by the END action).
+ * @param[in] masks
+ *   List of actions that mark which of the action's members are constant.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_flow_actions_validate(struct rte_eth_dev *dev,
+			const struct rte_flow_actions_template_attr *attr,
+			const struct rte_flow_action actions[],
+			const struct rte_flow_action masks[],
+			struct rte_flow_error *error)
+{
+	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
+
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW) {
+		rte_flow_error_set(error, ENOTSUP,
+			RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+			"actions validate with incorrect steering mode");
+		return -ENOTSUP;
+	}
+	fops = flow_get_drv_ops(MLX5_FLOW_TYPE_HW);
+	return fops->actions_validate(dev, attr, actions, masks, error);
+}
+
 /**
  * Create flow item template.
  *
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index b0af13886a..5f89afbe29 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1665,6 +1665,11 @@ typedef int (*mlx5_flow_port_configure_t)
 			 uint16_t nb_queue,
 			 const struct rte_flow_queue_attr *queue_attr[],
 			 struct rte_flow_error *err);
+typedef int (*mlx5_flow_pattern_validate_t)
+			(struct rte_eth_dev *dev,
+			 const struct rte_flow_pattern_template_attr *attr,
+			 const struct rte_flow_item items[],
+			 struct rte_flow_error *error);
 typedef struct rte_flow_pattern_template *(*mlx5_flow_pattern_template_create_t)
 			(struct rte_eth_dev *dev,
 			 const struct rte_flow_pattern_template_attr *attr,
@@ -1674,6 +1679,12 @@ typedef int (*mlx5_flow_pattern_template_destroy_t)
 			(struct rte_eth_dev *dev,
 			 struct rte_flow_pattern_template *template,
 			 struct rte_flow_error *error);
+typedef int (*mlx5_flow_actions_validate_t)
+			(struct rte_eth_dev *dev,
+			 const struct rte_flow_actions_template_attr *attr,
+			 const struct rte_flow_action actions[],
+			 const struct rte_flow_action masks[],
+			 struct rte_flow_error *error);
 typedef struct rte_flow_actions_template *(*mlx5_flow_actions_template_create_t)
 			(struct rte_eth_dev *dev,
 			 const struct rte_flow_actions_template_attr *attr,
@@ -1790,8 +1801,10 @@ struct mlx5_flow_driver_ops {
 	mlx5_flow_item_update_t item_update;
 	mlx5_flow_info_get_t info_get;
 	mlx5_flow_port_configure_t configure;
+	mlx5_flow_pattern_validate_t pattern_validate;
 	mlx5_flow_pattern_template_create_t pattern_template_create;
 	mlx5_flow_pattern_template_destroy_t pattern_template_destroy;
+	mlx5_flow_actions_validate_t actions_validate;
 	mlx5_flow_actions_template_create_t actions_template_create;
 	mlx5_flow_actions_template_destroy_t actions_template_destroy;
 	mlx5_flow_table_create_t template_table_create;
@@ -1873,6 +1886,8 @@ mlx5_aso_meter_by_idx(struct mlx5_priv *priv, uint32_t idx)
 
 	/* Decrease to original index. */
 	idx--;
+	if (priv->mtr_bulk.aso)
+		return priv->mtr_bulk.aso + idx;
 	MLX5_ASSERT(idx / MLX5_ASO_MTRS_PER_POOL < pools_mng->n);
 	rte_rwlock_read_lock(&pools_mng->resize_mtrwl);
 	pool = pools_mng->pools[idx / MLX5_ASO_MTRS_PER_POOL];
@@ -2384,4 +2399,13 @@ int mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev,
 					 uint32_t txq);
 int mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev);
 int mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev);
+int mlx5_flow_actions_validate(struct rte_eth_dev *dev,
+		const struct rte_flow_actions_template_attr *attr,
+		const struct rte_flow_action actions[],
+		const struct rte_flow_action masks[],
+		struct rte_flow_error *error);
+int mlx5_flow_pattern_validate(struct rte_eth_dev *dev,
+		const struct rte_flow_pattern_template_attr *attr,
+		const struct rte_flow_item items[],
+		struct rte_flow_error *error);
 #endif /* RTE_PMD_MLX5_FLOW_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow_aso.c b/drivers/net/mlx5/mlx5_flow_aso.c
index 4129e3a9e0..60d0280367 100644
--- a/drivers/net/mlx5/mlx5_flow_aso.c
+++ b/drivers/net/mlx5/mlx5_flow_aso.c
@@ -642,7 +642,8 @@ mlx5_aso_flow_hit_queue_poll_stop(struct mlx5_dev_ctx_shared *sh)
 static uint16_t
 mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 			       struct mlx5_aso_sq *sq,
-			       struct mlx5_aso_mtr *aso_mtr)
+			       struct mlx5_aso_mtr *aso_mtr,
+			       struct mlx5_mtr_bulk *bulk)
 {
 	volatile struct mlx5_aso_wqe *wqe = NULL;
 	struct mlx5_flow_meter_info *fm = NULL;
@@ -653,6 +654,7 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	uint32_t dseg_idx = 0;
 	struct mlx5_aso_mtr_pool *pool = NULL;
 	uint32_t param_le;
+	int id;
 
 	rte_spinlock_lock(&sq->sqsl);
 	res = size - (uint16_t)(sq->head - sq->tail);
@@ -666,14 +668,19 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	/* Fill next WQE. */
 	fm = &aso_mtr->fm;
 	sq->elts[sq->head & mask].mtr = aso_mtr;
-	pool = container_of(aso_mtr, struct mlx5_aso_mtr_pool,
-			mtrs[aso_mtr->offset]);
-	wqe->general_cseg.misc = rte_cpu_to_be_32(pool->devx_obj->id +
-			(aso_mtr->offset >> 1));
-	wqe->general_cseg.opcode = rte_cpu_to_be_32(MLX5_OPCODE_ACCESS_ASO |
-			(ASO_OPC_MOD_POLICER <<
-			WQE_CSEG_OPC_MOD_OFFSET) |
-			sq->pi << WQE_CSEG_WQE_INDEX_OFFSET);
+	if (aso_mtr->type == ASO_METER_INDIRECT) {
+		pool = container_of(aso_mtr, struct mlx5_aso_mtr_pool,
+				    mtrs[aso_mtr->offset]);
+		id = pool->devx_obj->id;
+	} else {
+		id = bulk->devx_obj->id;
+	}
+	wqe->general_cseg.misc = rte_cpu_to_be_32(id +
+						  (aso_mtr->offset >> 1));
+	wqe->general_cseg.opcode =
+		rte_cpu_to_be_32(MLX5_OPCODE_ACCESS_ASO |
+			(ASO_OPC_MOD_POLICER << WQE_CSEG_OPC_MOD_OFFSET) |
+			 sq->pi << WQE_CSEG_WQE_INDEX_OFFSET);
 	/* There are 2 meters in one ASO cache line. */
 	dseg_idx = aso_mtr->offset & 0x1;
 	wqe->aso_cseg.data_mask =
@@ -811,14 +818,15 @@ mlx5_aso_mtr_completion_handle(struct mlx5_aso_sq *sq)
  */
 int
 mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
-			struct mlx5_aso_mtr *mtr)
+			struct mlx5_aso_mtr *mtr,
+			struct mlx5_mtr_bulk *bulk)
 {
 	struct mlx5_aso_sq *sq = &sh->mtrmng->pools_mng.sq;
 	uint32_t poll_wqe_times = MLX5_MTR_POLL_WQE_CQE_TIMES;
 
 	do {
 		mlx5_aso_mtr_completion_handle(sq);
-		if (mlx5_aso_mtr_sq_enqueue_single(sh, sq, mtr))
+		if (mlx5_aso_mtr_sq_enqueue_single(sh, sq, mtr, bulk))
 			return 0;
 		/* Waiting for wqe resource. */
 		rte_delay_us_sleep(MLX5_ASO_WQE_CQE_RESPONSE_DELAY);
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 319c8d1a89..5051741a5a 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -914,6 +914,38 @@ flow_hw_represented_port_compile(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static __rte_always_inline int
+flow_hw_meter_compile(struct rte_eth_dev *dev,
+		      const struct mlx5_flow_template_table_cfg *cfg,
+		      uint32_t  start_pos, const struct rte_flow_action *action,
+		      struct mlx5_hw_actions *acts, uint32_t *end_pos,
+		      struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr *aso_mtr;
+	const struct rte_flow_action_meter *meter = action->conf;
+	uint32_t pos = start_pos;
+	uint32_t group = cfg->attr.flow_attr.group;
+
+	aso_mtr = mlx5_aso_meter_by_idx(priv, meter->mtr_id);
+	acts->rule_acts[pos].action = priv->mtr_bulk.action;
+	acts->rule_acts[pos].aso_meter.offset = aso_mtr->offset;
+	acts->jump = flow_hw_jump_action_register
+			(dev, cfg, aso_mtr->fm.group, error);
+	if (!acts->jump) {
+		*end_pos = start_pos;
+		return -ENOMEM;
+	}
+	acts->rule_acts[++pos].action = (!!group) ?
+				    acts->jump->hws_action :
+				    acts->jump->root_action;
+	*end_pos = pos;
+	if (mlx5_aso_mtr_wait(priv->sh, aso_mtr)) {
+		*end_pos = start_pos;
+		return -ENOMEM;
+	}
+	return 0;
+}
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -1142,6 +1174,21 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 				goto err;
 			i++;
 			break;
+		case RTE_FLOW_ACTION_TYPE_METER:
+			if (actions->conf && masks->conf &&
+			    ((const struct rte_flow_action_meter *)
+			     masks->conf)->mtr_id) {
+				err = flow_hw_meter_compile(dev, cfg,
+						i, actions, acts, &i, error);
+				if (err)
+					goto err;
+			} else if (__flow_hw_act_data_general_append(priv, acts,
+							actions->type,
+							actions - action_start,
+							i))
+				goto err;
+			i++;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -1482,6 +1529,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	const struct rte_flow_action_raw_encap *raw_encap_data;
 	const struct rte_flow_item *enc_item = NULL;
 	const struct rte_flow_action_ethdev *port_action = NULL;
+	const struct rte_flow_action_meter *meter = NULL;
 	uint8_t *buf = job->encap_data;
 	struct rte_flow_attr attr = {
 			.ingress = 1,
@@ -1489,6 +1537,8 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	uint32_t ft_flag;
 	size_t encap_len = 0;
 	int ret;
+	struct mlx5_aso_mtr *mtr;
+	uint32_t mtr_id;
 
 	memcpy(rule_acts, hw_acts->rule_acts,
 	       sizeof(*rule_acts) * hw_acts->acts_num);
@@ -1608,6 +1658,29 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			rule_acts[act_data->action_dst].action =
 					priv->hw_vport[port_action->port_id];
 			break;
+		case RTE_FLOW_ACTION_TYPE_METER:
+			meter = action->conf;
+			mtr_id = meter->mtr_id;
+			mtr = mlx5_aso_meter_by_idx(priv, mtr_id);
+			rule_acts[act_data->action_dst].action =
+				priv->mtr_bulk.action;
+			rule_acts[act_data->action_dst].aso_meter.offset =
+								mtr->offset;
+			jump = flow_hw_jump_action_register
+				(dev, &table->cfg, mtr->fm.group, NULL);
+			if (!jump)
+				return -1;
+			MLX5_ASSERT
+				(!rule_acts[act_data->action_dst + 1].action);
+			rule_acts[act_data->action_dst + 1].action =
+					(!!attr.group) ? jump->hws_action :
+							 jump->root_action;
+			job->flow->jump = jump;
+			job->flow->fate_type = MLX5_FLOW_FATE_JUMP;
+			(*acts_num)++;
+			if (mlx5_aso_mtr_wait(priv->sh, mtr))
+				return -1;
+			break;
 		default:
 			break;
 		}
@@ -2523,7 +2596,7 @@ flow_hw_action_meta_copy_insert(const struct rte_flow_action actions[],
 }
 
 static int
-flow_hw_action_validate(struct rte_eth_dev *dev,
+flow_hw_actions_validate(struct rte_eth_dev *dev,
 			const struct rte_flow_actions_template_attr *attr,
 			const struct rte_flow_action actions[],
 			const struct rte_flow_action masks[],
@@ -2589,6 +2662,9 @@ flow_hw_action_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_RAW_DECAP:
 			/* TODO: Validation logic */
 			break;
+		case RTE_FLOW_ACTION_TYPE_METER:
+			/* TODO: Validation logic */
+			break;
 		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
 			ret = flow_hw_validate_action_modify_field(action,
 									mask,
@@ -2682,7 +2758,7 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 		.conf = &rx_mreg_mask,
 	};
 
-	if (flow_hw_action_validate(dev, attr, actions, masks, error))
+	if (flow_hw_actions_validate(dev, attr, actions, masks, error))
 		return NULL;
 	if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
 	    priv->sh->config.dv_esw_en) {
@@ -3028,15 +3104,24 @@ flow_hw_pattern_template_destroy(struct rte_eth_dev *dev __rte_unused,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-flow_hw_info_get(struct rte_eth_dev *dev __rte_unused,
-		 struct rte_flow_port_info *port_info __rte_unused,
-		 struct rte_flow_queue_info *queue_info __rte_unused,
+flow_hw_info_get(struct rte_eth_dev *dev,
+		 struct rte_flow_port_info *port_info,
+		 struct rte_flow_queue_info *queue_info,
 		 struct rte_flow_error *error __rte_unused)
 {
-	/* Nothing to be updated currently. */
+	uint16_t port_id = dev->data->port_id;
+	struct rte_mtr_capabilities mtr_cap;
+	int ret;
+
 	memset(port_info, 0, sizeof(*port_info));
 	/* Queue size is unlimited from low-level. */
+	port_info->max_nb_queues = UINT32_MAX;
 	queue_info->max_size = UINT32_MAX;
+
+	memset(&mtr_cap, 0, sizeof(struct rte_mtr_capabilities));
+	ret = rte_mtr_capabilities_get(port_id, &mtr_cap, NULL);
+	if (!ret)
+		port_info->max_nb_meters = mtr_cap.n_max;
 	return 0;
 }
 
@@ -4231,6 +4316,10 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	priv->nb_queue = nb_q_updated;
 	rte_spinlock_init(&priv->hw_ctrl_lock);
 	LIST_INIT(&priv->hw_ctrl_flows);
+	/* Initialize meter library. */
+	if (port_attr->nb_meters)
+		if (mlx5_flow_meter_init(dev, port_attr->nb_meters, 1, 1))
+			goto err;
 	/* Add global actions. */
 	for (i = 0; i < MLX5_HW_ACTION_FLAG_MAX; i++) {
 		uint32_t act_flags = 0;
@@ -4546,8 +4635,10 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.info_get = flow_hw_info_get,
 	.configure = flow_hw_configure,
+	.pattern_validate = flow_hw_pattern_validate,
 	.pattern_template_create = flow_hw_pattern_template_create,
 	.pattern_template_destroy = flow_hw_pattern_template_destroy,
+	.actions_validate = flow_hw_actions_validate,
 	.actions_template_create = flow_hw_actions_template_create,
 	.actions_template_destroy = flow_hw_actions_template_destroy,
 	.template_table_create = flow_hw_template_table_create,
@@ -4603,7 +4694,7 @@ flow_hw_create_ctrl_flow(struct rte_eth_dev *owner_dev,
 			 uint8_t action_template_idx)
 {
 	struct mlx5_priv *priv = proxy_dev->data->dev_private;
-	uint32_t queue = priv->nb_queue - 1;
+	uint32_t queue = CTRL_QUEUE_ID(priv);
 	struct rte_flow_op_attr op_attr = {
 		.postpone = 0,
 	};
@@ -4678,7 +4769,7 @@ static int
 flow_hw_destroy_ctrl_flow(struct rte_eth_dev *dev, struct rte_flow *flow)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	uint32_t queue = priv->nb_queue - 1;
+	uint32_t queue = CTRL_QUEUE_ID(priv);
 	struct rte_flow_op_attr op_attr = {
 		.postpone = 0,
 	};
@@ -5036,4 +5127,155 @@ mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev)
 					eth_all, 0, copy_reg_action, 0);
 }
 
+void
+mlx5_flow_meter_uninit(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (priv->mtr_policy_arr) {
+		mlx5_free(priv->mtr_policy_arr);
+		priv->mtr_policy_arr = NULL;
+	}
+	if (priv->mtr_profile_arr) {
+		mlx5_free(priv->mtr_profile_arr);
+		priv->mtr_profile_arr = NULL;
+	}
+	if (priv->mtr_bulk.aso) {
+		mlx5_free(priv->mtr_bulk.aso);
+		priv->mtr_bulk.aso = NULL;
+		priv->mtr_bulk.size = 0;
+		mlx5_aso_queue_uninit(priv->sh, ASO_OPC_MOD_POLICER);
+	}
+	if (priv->mtr_bulk.action) {
+		mlx5dr_action_destroy(priv->mtr_bulk.action);
+		priv->mtr_bulk.action = NULL;
+	}
+	if (priv->mtr_bulk.devx_obj) {
+		claim_zero(mlx5_devx_cmd_destroy(priv->mtr_bulk.devx_obj));
+		priv->mtr_bulk.devx_obj = NULL;
+	}
+}
+
+int
+mlx5_flow_meter_init(struct rte_eth_dev *dev,
+		     uint32_t nb_meters,
+		     uint32_t nb_meter_profiles,
+		     uint32_t nb_meter_policies)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_devx_obj *dcs = NULL;
+	uint32_t log_obj_size;
+	int ret = 0;
+	int reg_id;
+	struct mlx5_aso_mtr *aso;
+	uint32_t i;
+	struct rte_flow_error error;
+
+	if (!nb_meters || !nb_meter_profiles || !nb_meter_policies) {
+		ret = ENOTSUP;
+		rte_flow_error_set(&error, ENOMEM,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter configuration is invalid.");
+		goto err;
+	}
+	if (!priv->mtr_en || !priv->sh->meter_aso_en) {
+		ret = ENOTSUP;
+		rte_flow_error_set(&error, ENOMEM,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter ASO is not supported.");
+		goto err;
+	}
+	priv->mtr_config.nb_meters = nb_meters;
+	if (mlx5_aso_queue_init(priv->sh, ASO_OPC_MOD_POLICER)) {
+		ret = ENOMEM;
+		rte_flow_error_set(&error, ENOMEM,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter ASO queue allocation failed.");
+		goto err;
+	}
+	log_obj_size = rte_log2_u32(nb_meters >> 1);
+	dcs = mlx5_devx_cmd_create_flow_meter_aso_obj
+		(priv->sh->cdev->ctx, priv->sh->cdev->pdn,
+			log_obj_size);
+	if (!dcs) {
+		ret = ENOMEM;
+		rte_flow_error_set(&error, ENOMEM,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter ASO object allocation failed.");
+		goto err;
+	}
+	priv->mtr_bulk.devx_obj = dcs;
+	reg_id = mlx5_flow_get_reg_id(dev, MLX5_MTR_COLOR, 0, NULL);
+	if (reg_id < 0) {
+		ret = ENOTSUP;
+		rte_flow_error_set(&error, ENOMEM,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter register is not available.");
+		goto err;
+	}
+	priv->mtr_bulk.action = mlx5dr_action_create_aso_meter
+			(priv->dr_ctx, (struct mlx5dr_devx_obj *)dcs,
+				reg_id - REG_C_0, MLX5DR_ACTION_FLAG_HWS_RX |
+				MLX5DR_ACTION_FLAG_HWS_TX |
+				MLX5DR_ACTION_FLAG_HWS_FDB);
+	if (!priv->mtr_bulk.action) {
+		ret = ENOMEM;
+		rte_flow_error_set(&error, ENOMEM,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter action creation failed.");
+		goto err;
+	}
+	priv->mtr_bulk.aso = mlx5_malloc(MLX5_MEM_ZERO,
+						sizeof(struct mlx5_aso_mtr) * nb_meters,
+						RTE_CACHE_LINE_SIZE,
+						SOCKET_ID_ANY);
+	if (!priv->mtr_bulk.aso) {
+		ret = ENOMEM;
+		rte_flow_error_set(&error, ENOMEM,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter bulk ASO allocation failed.");
+		goto err;
+	}
+	priv->mtr_bulk.size = nb_meters;
+	aso = priv->mtr_bulk.aso;
+	for (i = 0; i < priv->mtr_bulk.size; i++) {
+		aso->type = ASO_METER_DIRECT;
+		aso->state = ASO_METER_WAIT;
+		aso->offset = i;
+		aso++;
+	}
+	priv->mtr_config.nb_meter_profiles = nb_meter_profiles;
+	priv->mtr_profile_arr =
+		mlx5_malloc(MLX5_MEM_ZERO,
+				sizeof(struct mlx5_flow_meter_profile) *
+				nb_meter_profiles,
+				RTE_CACHE_LINE_SIZE,
+				SOCKET_ID_ANY);
+	if (!priv->mtr_profile_arr) {
+		ret = ENOMEM;
+		rte_flow_error_set(&error, ENOMEM,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter profile allocation failed.");
+		goto err;
+	}
+	priv->mtr_config.nb_meter_policies = nb_meter_policies;
+	priv->mtr_policy_arr =
+		mlx5_malloc(MLX5_MEM_ZERO,
+				sizeof(struct mlx5_flow_meter_policy) *
+				nb_meter_policies,
+				RTE_CACHE_LINE_SIZE,
+				SOCKET_ID_ANY);
+	if (!priv->mtr_policy_arr) {
+		ret = ENOMEM;
+		rte_flow_error_set(&error, ENOMEM,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter policy allocation failed.");
+		goto err;
+	}
+	return 0;
+err:
+	mlx5_flow_meter_uninit(dev);
+	return ret;
+}
+
 #endif
diff --git a/drivers/net/mlx5/mlx5_flow_meter.c b/drivers/net/mlx5/mlx5_flow_meter.c
index d4aafe4eea..8cf24d1f7a 100644
--- a/drivers/net/mlx5/mlx5_flow_meter.c
+++ b/drivers/net/mlx5/mlx5_flow_meter.c
@@ -98,6 +98,8 @@ mlx5_flow_meter_profile_find(struct mlx5_priv *priv, uint32_t meter_profile_id)
 	union mlx5_l3t_data data;
 	int32_t ret;
 
+	if (priv->mtr_profile_arr)
+		return &priv->mtr_profile_arr[meter_profile_id];
 	if (mlx5_l3t_get_entry(priv->mtr_profile_tbl,
 			       meter_profile_id, &data) || !data.ptr)
 		return NULL;
@@ -145,17 +147,29 @@ mlx5_flow_meter_profile_validate(struct rte_eth_dev *dev,
 					  RTE_MTR_ERROR_TYPE_METER_PROFILE,
 					  NULL, "Meter profile is null.");
 	/* Meter profile ID must be valid. */
-	if (meter_profile_id == UINT32_MAX)
-		return -rte_mtr_error_set(error, EINVAL,
-					  RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
-					  NULL, "Meter profile id not valid.");
-	/* Meter profile must not exist. */
-	fmp = mlx5_flow_meter_profile_find(priv, meter_profile_id);
-	if (fmp)
-		return -rte_mtr_error_set(error, EEXIST,
-					  RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
-					  NULL,
-					  "Meter profile already exists.");
+	if (priv->mtr_profile_arr) {
+		if (meter_profile_id >= priv->mtr_config.nb_meter_profiles)
+			return -rte_mtr_error_set(error, EINVAL,
+					RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					NULL, "Meter profile id not valid.");
+		fmp = mlx5_flow_meter_profile_find(priv, meter_profile_id);
+		/* Meter profile must not exist. */
+		if (fmp->initialized)
+			return -rte_mtr_error_set(error, EEXIST,
+					RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					NULL, "Meter profile already exists.");
+	} else {
+		if (meter_profile_id == UINT32_MAX)
+			return -rte_mtr_error_set(error, EINVAL,
+					RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					NULL, "Meter profile id not valid.");
+		fmp = mlx5_flow_meter_profile_find(priv, meter_profile_id);
+		/* Meter profile must not exist. */
+		if (fmp)
+			return -rte_mtr_error_set(error, EEXIST,
+					RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					NULL, "Meter profile already exists.");
+	}
 	if (!priv->sh->meter_aso_en) {
 		/* Old version is even not supported. */
 		if (!priv->sh->cdev->config.hca_attr.qos.flow_meter_old)
@@ -574,6 +588,96 @@ mlx5_flow_meter_profile_delete(struct rte_eth_dev *dev,
 	return 0;
 }
 
+/**
+ * Callback to add MTR profile with HWS.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] meter_profile_id
+ *   Meter profile id.
+ * @param[in] profile
+ *   Pointer to meter profile detail.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_profile_hws_add(struct rte_eth_dev *dev,
+			uint32_t meter_profile_id,
+			struct rte_mtr_meter_profile *profile,
+			struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_meter_profile *fmp;
+	int ret;
+
+	if (!priv->mtr_profile_arr)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "Meter profile array is not allocated");
+	/* Check input params. */
+	ret = mlx5_flow_meter_profile_validate(dev, meter_profile_id,
+					       profile, error);
+	if (ret)
+		return ret;
+	fmp = mlx5_flow_meter_profile_find(priv, meter_profile_id);
+	/* Fill profile info. */
+	fmp->id = meter_profile_id;
+	fmp->profile = *profile;
+	fmp->initialized = 1;
+	/* Fill the flow meter parameters for the PRM. */
+	return mlx5_flow_meter_param_fill(fmp, error);
+}
+
+/**
+ * Callback to delete MTR profile with HWS.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] meter_profile_id
+ *   Meter profile id.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_profile_hws_delete(struct rte_eth_dev *dev,
+			uint32_t meter_profile_id,
+			struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_meter_profile *fmp;
+
+	if (!priv->mtr_profile_arr)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "Meter profile array is not allocated");
+	/* Meter id must be valid. */
+	if (meter_profile_id >= priv->mtr_config.nb_meter_profiles)
+		return -rte_mtr_error_set(error, EINVAL,
+					  RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					  &meter_profile_id,
+					  "Meter profile id not valid.");
+	/* Meter profile must exist. */
+	fmp = mlx5_flow_meter_profile_find(priv, meter_profile_id);
+	if (!fmp->initialized)
+		return -rte_mtr_error_set(error, ENOENT,
+					  RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					  &meter_profile_id,
+					  "Meter profile id is invalid.");
+	/* Check profile is unused. */
+	if (fmp->ref_cnt)
+		return -rte_mtr_error_set(error, EBUSY,
+					  RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					  NULL, "Meter profile is in use.");
+	memset(fmp, 0, sizeof(struct mlx5_flow_meter_profile));
+	return 0;
+}
+
 /**
  * Find policy by id.
  *
@@ -594,6 +698,11 @@ mlx5_flow_meter_policy_find(struct rte_eth_dev *dev,
 	struct mlx5_flow_meter_sub_policy *sub_policy = NULL;
 	union mlx5_l3t_data data;
 
+	if (priv->mtr_policy_arr) {
+		if (policy_idx)
+			*policy_idx = policy_id;
+		return &priv->mtr_policy_arr[policy_id];
+	}
 	if (policy_id > MLX5_MAX_SUB_POLICY_TBL_NUM || !priv->policy_idx_tbl)
 		return NULL;
 	if (mlx5_l3t_get_entry(priv->policy_idx_tbl, policy_id, &data) ||
@@ -710,6 +819,43 @@ mlx5_flow_meter_policy_validate(struct rte_eth_dev *dev,
 	return 0;
 }
 
+/**
+ * Callback to validate MTR policy actions for HWS.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] policy
+ *   Pointer to meter policy detail.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_policy_hws_validate(struct rte_eth_dev *dev,
+	struct rte_mtr_meter_policy_params *policy,
+	struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_actions_template_attr attr = {
+		.transfer = priv->sh->config.dv_esw_en ? 1 : 0 };
+	int ret;
+	int i;
+
+	if (!priv->mtr_en || !priv->sh->meter_aso_en)
+		return -rte_mtr_error_set(error, ENOTSUP,
+				RTE_MTR_ERROR_TYPE_METER_POLICY,
+				NULL, "meter policy unsupported.");
+	for (i = 0; i < RTE_COLORS; i++) {
+		ret = mlx5_flow_actions_validate(dev, &attr, policy->actions[i],
+						 policy->actions[i], NULL);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
 static int
 __mlx5_flow_meter_policy_delete(struct rte_eth_dev *dev,
 			uint32_t policy_id,
@@ -1004,6 +1150,338 @@ mlx5_flow_meter_policy_delete(struct rte_eth_dev *dev,
 	return 0;
 }
 
+/**
+ * Callback to delete MTR policy for HWS.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] policy_id
+ *   Meter policy id.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_policy_hws_delete(struct rte_eth_dev *dev,
+			  uint32_t policy_id,
+			  struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_meter_policy *mtr_policy;
+	uint32_t i, j;
+	uint32_t nb_flows = 0;
+	int ret;
+	struct rte_flow_op_attr op_attr = { .postpone = 1 };
+	struct rte_flow_op_result result[RTE_COLORS * MLX5_MTR_DOMAIN_MAX];
+
+	if (!priv->mtr_policy_arr)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "Meter policy array is not allocated");
+	/* Meter id must be valid. */
+	if (policy_id >= priv->mtr_config.nb_meter_policies)
+		return -rte_mtr_error_set(error, EINVAL,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+					  &policy_id,
+					  "Meter policy id not valid.");
+	/* Meter policy must exist. */
+	mtr_policy = mlx5_flow_meter_policy_find(dev, policy_id, NULL);
+	if (!mtr_policy->initialized)
+		return -rte_mtr_error_set(error, ENOENT,
+			RTE_MTR_ERROR_TYPE_METER_POLICY_ID, NULL,
+			"Meter policy does not exist.");
+	/* Check policy is unused. */
+	if (mtr_policy->ref_cnt)
+		return -rte_mtr_error_set(error, EBUSY,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+					  NULL, "Meter policy is in use.");
+	rte_spinlock_lock(&priv->hw_ctrl_lock);
+	for (i = 0; i < MLX5_MTR_DOMAIN_MAX; i++) {
+		for (j = 0; j < RTE_COLORS; j++) {
+			if (mtr_policy->hws_flow_rule[i][j]) {
+				ret = rte_flow_async_destroy(dev->data->port_id,
+					CTRL_QUEUE_ID(priv), &op_attr,
+					mtr_policy->hws_flow_rule[i][j],
+					NULL, NULL);
+				if (ret < 0)
+					continue;
+				nb_flows++;
+			}
+		}
+	}
+	ret = rte_flow_push(dev->data->port_id, CTRL_QUEUE_ID(priv), NULL);
+	while (nb_flows && (ret >= 0)) {
+		ret = rte_flow_pull(dev->data->port_id,
+					CTRL_QUEUE_ID(priv), result,
+					nb_flows, NULL);
+		nb_flows -= ret;
+	}
+	for (i = 0; i < MLX5_MTR_DOMAIN_MAX; i++) {
+		if (mtr_policy->hws_flow_table[i])
+			rte_flow_template_table_destroy(dev->data->port_id,
+				 mtr_policy->hws_flow_table[i], NULL);
+	}
+	for (i = 0; i < RTE_COLORS; i++) {
+		if (mtr_policy->hws_act_templ[i])
+			rte_flow_actions_template_destroy(dev->data->port_id,
+				 mtr_policy->hws_act_templ[i], NULL);
+	}
+	if (mtr_policy->hws_item_templ)
+		rte_flow_pattern_template_destroy(dev->data->port_id,
+				mtr_policy->hws_item_templ, NULL);
+	rte_spinlock_unlock(&priv->hw_ctrl_lock);
+	memset(mtr_policy, 0, sizeof(struct mlx5_flow_meter_policy));
+	return 0;
+}
+
+/**
+ * Callback to add MTR policy for HWS.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] policy_id
+ *   Meter policy id.
+ * @param[in] policy
+ *   Pointer to meter policy detail.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_policy_hws_add(struct rte_eth_dev *dev,
+			uint32_t policy_id,
+			struct rte_mtr_meter_policy_params *policy,
+			struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_meter_policy *mtr_policy = NULL;
+	const struct rte_flow_action *act;
+	const struct rte_flow_action_meter *mtr;
+	struct mlx5_flow_meter_info *fm;
+	struct mlx5_flow_meter_policy *plc;
+	uint8_t domain_color = MLX5_MTR_ALL_DOMAIN_BIT;
+	bool is_rss = false;
+	bool is_hierarchy = false;
+	int i, j;
+	uint32_t nb_colors = 0;
+	uint32_t nb_flows = 0;
+	int color;
+	int ret;
+	struct rte_flow_pattern_template_attr pta = {0};
+	struct rte_flow_actions_template_attr ata = {0};
+	struct rte_flow_template_table_attr ta = { {0}, 0 };
+	struct rte_flow_op_attr op_attr = { .postpone = 1 };
+	struct rte_flow_op_result result[RTE_COLORS * MLX5_MTR_DOMAIN_MAX];
+	const uint32_t color_mask = (UINT32_C(1) << MLX5_MTR_COLOR_BITS) - 1;
+	int color_reg_c_idx = mlx5_flow_get_reg_id(dev, MLX5_MTR_COLOR,
+						   0, NULL);
+	struct rte_flow_item_tag tag_spec = {
+		.data = 0,
+		.index = color_reg_c_idx
+	};
+	struct rte_flow_item_tag tag_mask = {
+		.data = color_mask,
+		.index = 0xff};
+	struct rte_flow_item pattern[] = {
+		[0] = {
+			.type = (enum rte_flow_item_type)
+				MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+			.spec = &tag_spec,
+			.mask = &tag_mask,
+		},
+		[1] = { .type = RTE_FLOW_ITEM_TYPE_END }
+	};
+
+	if (!priv->mtr_policy_arr)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY,
+					  NULL, "Meter policy array is not allocated.");
+	if (policy_id >= priv->mtr_config.nb_meter_policies)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+					  NULL, "Meter policy id not valid.");
+	mtr_policy = mlx5_flow_meter_policy_find(dev, policy_id, NULL);
+	if (mtr_policy->initialized)
+		return -rte_mtr_error_set(error, EEXIST,
+			RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+			NULL, "Meter policy already exists.");
+	if (!policy ||
+	    !policy->actions[RTE_COLOR_RED] ||
+	    !policy->actions[RTE_COLOR_YELLOW] ||
+	    !policy->actions[RTE_COLOR_GREEN])
+		return -rte_mtr_error_set(error, EINVAL,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY,
+					  NULL, "Meter policy actions are not valid.");
+	if (policy->actions[RTE_COLOR_RED] == RTE_FLOW_ACTION_TYPE_END)
+		mtr_policy->skip_r = 1;
+	if (policy->actions[RTE_COLOR_YELLOW] == RTE_FLOW_ACTION_TYPE_END)
+		mtr_policy->skip_y = 1;
+	if (policy->actions[RTE_COLOR_GREEN] == RTE_FLOW_ACTION_TYPE_END)
+		mtr_policy->skip_g = 1;
+	if (mtr_policy->skip_r && mtr_policy->skip_y && mtr_policy->skip_g)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+					  NULL, "Meter policy actions are empty.");
+	for (i = 0; i < RTE_COLORS; i++) {
+		act = policy->actions[i];
+		while (act && act->type != RTE_FLOW_ACTION_TYPE_END) {
+			switch (act->type) {
+			case RTE_FLOW_ACTION_TYPE_PORT_ID:
+				/* fall-through. */
+			case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
+				domain_color &= ~(MLX5_MTR_DOMAIN_INGRESS_BIT |
+						  MLX5_MTR_DOMAIN_EGRESS_BIT);
+				break;
+			case RTE_FLOW_ACTION_TYPE_RSS:
+				is_rss = true;
+				/* fall-through. */
+			case RTE_FLOW_ACTION_TYPE_QUEUE:
+				domain_color &= ~(MLX5_MTR_DOMAIN_EGRESS_BIT |
+						  MLX5_MTR_DOMAIN_TRANSFER_BIT);
+				break;
+			case RTE_FLOW_ACTION_TYPE_METER:
+				is_hierarchy = true;
+				mtr = act->conf;
+				fm = mlx5_flow_meter_find(priv,
+							  mtr->mtr_id, NULL);
+				if (!fm)
+					return -rte_mtr_error_set(error, EINVAL,
+						RTE_MTR_ERROR_TYPE_MTR_ID, NULL,
+						"Meter not found in meter hierarchy.");
+				plc = mlx5_flow_meter_policy_find(dev,
+								  fm->policy_id,
+								  NULL);
+				MLX5_ASSERT(plc);
+				domain_color &= MLX5_MTR_ALL_DOMAIN_BIT &
+					(plc->ingress <<
+					 MLX5_MTR_DOMAIN_INGRESS);
+				domain_color &= MLX5_MTR_ALL_DOMAIN_BIT &
+					(plc->egress <<
+					 MLX5_MTR_DOMAIN_EGRESS);
+				domain_color &= MLX5_MTR_ALL_DOMAIN_BIT &
+					(plc->transfer <<
+					 MLX5_MTR_DOMAIN_TRANSFER);
+				break;
+			default:
+				break;
+			}
+			act++;
+		}
+	}
+	if (!domain_color)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+					  NULL, "Meter policy domains are conflicting.");
+	mtr_policy->is_rss = is_rss;
+	mtr_policy->ingress = !!(domain_color & MLX5_MTR_DOMAIN_INGRESS_BIT);
+	pta.ingress = mtr_policy->ingress;
+	mtr_policy->egress = !!(domain_color & MLX5_MTR_DOMAIN_EGRESS_BIT);
+	pta.egress = mtr_policy->egress;
+	mtr_policy->transfer = !!(domain_color & MLX5_MTR_DOMAIN_TRANSFER_BIT);
+	pta.transfer = mtr_policy->transfer;
+	mtr_policy->group = MLX5_FLOW_TABLE_HWS_POLICY - policy_id;
+	mtr_policy->is_hierarchy = is_hierarchy;
+	mtr_policy->initialized = 1;
+	rte_spinlock_lock(&priv->hw_ctrl_lock);
+	mtr_policy->hws_item_templ =
+		rte_flow_pattern_template_create(dev->data->port_id,
+						 &pta, pattern, NULL);
+	if (!mtr_policy->hws_item_templ)
+		goto policy_add_err;
+	for (i = 0; i < RTE_COLORS; i++) {
+		if (mtr_policy->skip_g && i == RTE_COLOR_GREEN)
+			continue;
+		if (mtr_policy->skip_y && i == RTE_COLOR_YELLOW)
+			continue;
+		if (mtr_policy->skip_r && i == RTE_COLOR_RED)
+			continue;
+		mtr_policy->hws_act_templ[nb_colors] =
+			rte_flow_actions_template_create(dev->data->port_id,
+						&ata, policy->actions[i],
+						policy->actions[i], NULL);
+		if (!mtr_policy->hws_act_templ[nb_colors])
+			goto policy_add_err;
+		nb_colors++;
+	}
+	for (i = 0; i < MLX5_MTR_DOMAIN_MAX; i++) {
+		memset(&ta, 0, sizeof(ta));
+		ta.nb_flows = RTE_COLORS;
+		ta.flow_attr.group = mtr_policy->group;
+		if (i == MLX5_MTR_DOMAIN_INGRESS) {
+			if (!mtr_policy->ingress)
+				continue;
+			ta.flow_attr.ingress = 1;
+		} else if (i == MLX5_MTR_DOMAIN_EGRESS) {
+			if (!mtr_policy->egress)
+				continue;
+			ta.flow_attr.egress = 1;
+		} else if (i == MLX5_MTR_DOMAIN_TRANSFER) {
+			if (!mtr_policy->transfer)
+				continue;
+			ta.flow_attr.transfer = 1;
+		}
+		mtr_policy->hws_flow_table[i] =
+			rte_flow_template_table_create(dev->data->port_id,
+					&ta, &mtr_policy->hws_item_templ, 1,
+					mtr_policy->hws_act_templ, nb_colors,
+					NULL);
+		if (!mtr_policy->hws_flow_table[i])
+			goto policy_add_err;
+		nb_colors = 0;
+		for (j = 0; j < RTE_COLORS; j++) {
+			if (mtr_policy->skip_g && j == RTE_COLOR_GREEN)
+				continue;
+			if (mtr_policy->skip_y && j == RTE_COLOR_YELLOW)
+				continue;
+			if (mtr_policy->skip_r && j == RTE_COLOR_RED)
+				continue;
+			color = rte_col_2_mlx5_col((enum rte_color)j);
+			tag_spec.data = color;
+			mtr_policy->hws_flow_rule[i][j] =
+				rte_flow_async_create(dev->data->port_id,
+					CTRL_QUEUE_ID(priv), &op_attr,
+					mtr_policy->hws_flow_table[i],
+					pattern, 0, policy->actions[j],
+					nb_colors, NULL, NULL);
+			if (!mtr_policy->hws_flow_rule[i][j])
+				goto policy_add_err;
+			nb_colors++;
+			nb_flows++;
+		}
+		ret = rte_flow_push(dev->data->port_id,
+				    CTRL_QUEUE_ID(priv), NULL);
+		if (ret < 0)
+			goto policy_add_err;
+		while (nb_flows) {
+			ret = rte_flow_pull(dev->data->port_id,
+					    CTRL_QUEUE_ID(priv), result,
+					    nb_flows, NULL);
+			if (ret < 0)
+				goto policy_add_err;
+			for (j = 0; j < ret; j++) {
+				if (result[j].status == RTE_FLOW_OP_ERROR)
+					goto policy_add_err;
+			}
+			nb_flows -= ret;
+		}
+	}
+	rte_spinlock_unlock(&priv->hw_ctrl_lock);
+	return 0;
+policy_add_err:
+	rte_spinlock_unlock(&priv->hw_ctrl_lock);
+	ret = mlx5_flow_meter_policy_hws_delete(dev, policy_id, error);
+	memset(mtr_policy, 0, sizeof(struct mlx5_flow_meter_policy));
+	if (ret)
+		return ret;
+	return -rte_mtr_error_set(error, ENOTSUP,
+				  RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Failed to create meter policy.");
+}
+
 /**
  * Check meter validation.
  *
@@ -1087,7 +1565,8 @@ mlx5_flow_meter_action_modify(struct mlx5_priv *priv,
 	if (priv->sh->meter_aso_en) {
 		fm->is_enable = !!is_enable;
 		aso_mtr = container_of(fm, struct mlx5_aso_mtr, fm);
-		ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr);
+		ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr,
+						   &priv->mtr_bulk);
 		if (ret)
 			return ret;
 		ret = mlx5_aso_mtr_wait(priv->sh, aso_mtr);
@@ -1336,7 +1815,8 @@ mlx5_flow_meter_create(struct rte_eth_dev *dev, uint32_t meter_id,
 	/* If ASO meter supported, update ASO flow meter by wqe. */
 	if (priv->sh->meter_aso_en) {
 		aso_mtr = container_of(fm, struct mlx5_aso_mtr, fm);
-		ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr);
+		ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr,
+						   &priv->mtr_bulk);
 		if (ret)
 			goto error;
 		if (!priv->mtr_idx_tbl) {
@@ -1369,6 +1849,90 @@ mlx5_flow_meter_create(struct rte_eth_dev *dev, uint32_t meter_id,
 		NULL, "Failed to create devx meter.");
 }
 
+/**
+ * Create meter rules.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] meter_id
+ *   Meter id.
+ * @param[in] params
+ *   Pointer to rte meter parameters.
+ * @param[in] shared
+ *   Meter shared with other flow or not.
+ * @param[out] error
+ *   Pointer to rte meter error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_hws_create(struct rte_eth_dev *dev, uint32_t meter_id,
+		       struct rte_mtr_params *params, int shared,
+		       struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_meter_profile *profile;
+	struct mlx5_flow_meter_info *fm;
+	struct mlx5_flow_meter_policy *policy = NULL;
+	struct mlx5_aso_mtr *aso_mtr;
+	int ret;
+
+	if (!priv->mtr_profile_arr ||
+	    !priv->mtr_policy_arr ||
+	    !priv->mtr_bulk.aso)
+		return -rte_mtr_error_set(error, ENOTSUP,
+			RTE_MTR_ERROR_TYPE_UNSPECIFIED, NULL,
+			"Meter bulk array is not allocated.");
+	/* Meter profile must exist. */
+	profile = mlx5_flow_meter_profile_find(priv, params->meter_profile_id);
+	if (!profile->initialized)
+		return -rte_mtr_error_set(error, ENOENT,
+			RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+			NULL, "Meter profile id not valid.");
+	/* Meter policy must exist. */
+	policy = mlx5_flow_meter_policy_find(dev,
+			params->meter_policy_id, NULL);
+	if (!policy->initialized)
+		return -rte_mtr_error_set(error, ENOENT,
+			RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+			NULL, "Meter policy id not valid.");
+	/* Meter ID must be valid. */
+	if (meter_id >= priv->mtr_config.nb_meters)
+		return -rte_mtr_error_set(error, EINVAL,
+			RTE_MTR_ERROR_TYPE_MTR_ID,
+			NULL, "Meter id not valid.");
+	/* Find ASO object. */
+	aso_mtr = mlx5_aso_meter_by_idx(priv, meter_id);
+	fm = &aso_mtr->fm;
+	if (fm->initialized)
+		return -rte_mtr_error_set(error, EEXIST,
+					  RTE_MTR_ERROR_TYPE_MTR_ID,
+					  NULL, "Meter object already exists.");
+	/* Fill the flow meter parameters. */
+	fm->meter_id = meter_id;
+	fm->policy_id = params->meter_policy_id;
+	fm->profile = profile;
+	fm->meter_offset = meter_id;
+	fm->group = policy->group;
+	/* Add to the flow meter list. */
+	fm->active_state = 1; /* Config meter starts as active. */
+	fm->is_enable = params->meter_enable;
+	fm->shared = !!shared;
+	fm->initialized = 1;
+	/* Update ASO flow meter by wqe. */
+	ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr,
+					   &priv->mtr_bulk);
+	if (ret)
+		return -rte_mtr_error_set(error, ENOTSUP,
+			RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+			NULL, "Failed to create devx meter.");
+	fm->active_state = params->meter_enable;
+	__atomic_add_fetch(&fm->profile->ref_cnt, 1, __ATOMIC_RELAXED);
+	__atomic_add_fetch(&policy->ref_cnt, 1, __ATOMIC_RELAXED);
+	return 0;
+}
+
 static int
 mlx5_flow_meter_params_flush(struct rte_eth_dev *dev,
 			struct mlx5_flow_meter_info *fm,
@@ -1475,6 +2039,58 @@ mlx5_flow_meter_destroy(struct rte_eth_dev *dev, uint32_t meter_id,
 	return 0;
 }
 
+/**
+ * Destroy meter rules.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] meter_id
+ *   Meter id.
+ * @param[out] error
+ *   Pointer to rte meter error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_hws_destroy(struct rte_eth_dev *dev, uint32_t meter_id,
+			struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr *aso_mtr;
+	struct mlx5_flow_meter_info *fm;
+	struct mlx5_flow_meter_policy *policy;
+
+	if (!priv->mtr_profile_arr ||
+	    !priv->mtr_policy_arr ||
+	    !priv->mtr_bulk.aso)
+		return -rte_mtr_error_set(error, ENOTSUP,
+			RTE_MTR_ERROR_TYPE_METER_POLICY, NULL,
+			"Meter bulk array is not allocated.");
+	/* Find ASO object. */
+	aso_mtr = mlx5_aso_meter_by_idx(priv, meter_id);
+	fm = &aso_mtr->fm;
+	if (!fm->initialized)
+		return -rte_mtr_error_set(error, ENOENT,
+					  RTE_MTR_ERROR_TYPE_MTR_ID,
+					  NULL, "Meter object id not valid.");
+	/* Meter object must not have any owner. */
+	if (fm->ref_cnt > 0)
+		return -rte_mtr_error_set(error, EBUSY,
+					  RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "Meter object is being used.");
+	/* Destroy the meter profile. */
+	__atomic_sub_fetch(&fm->profile->ref_cnt,
+						1, __ATOMIC_RELAXED);
+	/* Destroy the meter policy. */
+	policy = mlx5_flow_meter_policy_find(dev,
+			fm->policy_id, NULL);
+	__atomic_sub_fetch(&policy->ref_cnt,
+						1, __ATOMIC_RELAXED);
+	memset(fm, 0, sizeof(struct mlx5_flow_meter_info));
+	return 0;
+}
+
 /**
  * Modify meter state.
  *
@@ -1798,6 +2414,23 @@ static const struct rte_mtr_ops mlx5_flow_mtr_ops = {
 	.stats_read = mlx5_flow_meter_stats_read,
 };
 
+static const struct rte_mtr_ops mlx5_flow_mtr_hws_ops = {
+	.capabilities_get = mlx5_flow_mtr_cap_get,
+	.meter_profile_add = mlx5_flow_meter_profile_hws_add,
+	.meter_profile_delete = mlx5_flow_meter_profile_hws_delete,
+	.meter_policy_validate = mlx5_flow_meter_policy_hws_validate,
+	.meter_policy_add = mlx5_flow_meter_policy_hws_add,
+	.meter_policy_delete = mlx5_flow_meter_policy_hws_delete,
+	.create = mlx5_flow_meter_hws_create,
+	.destroy = mlx5_flow_meter_hws_destroy,
+	.meter_enable = mlx5_flow_meter_enable,
+	.meter_disable = mlx5_flow_meter_disable,
+	.meter_profile_update = mlx5_flow_meter_profile_update,
+	.meter_dscp_table_update = NULL,
+	.stats_update = NULL,
+	.stats_read = NULL,
+};
+
 /**
  * Get meter operations.
  *
@@ -1812,7 +2445,12 @@ static const struct rte_mtr_ops mlx5_flow_mtr_ops = {
 int
 mlx5_flow_meter_ops_get(struct rte_eth_dev *dev __rte_unused, void *arg)
 {
-	*(const struct rte_mtr_ops **)arg = &mlx5_flow_mtr_ops;
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (priv->sh->config.dv_flow_en == 2)
+		*(const struct rte_mtr_ops **)arg = &mlx5_flow_mtr_hws_ops;
+	else
+		*(const struct rte_mtr_ops **)arg = &mlx5_flow_mtr_ops;
 	return 0;
 }
 
@@ -1841,6 +2479,12 @@ mlx5_flow_meter_find(struct mlx5_priv *priv, uint32_t meter_id,
 	union mlx5_l3t_data data;
 	uint16_t n_valid;
 
+	if (priv->mtr_bulk.aso) {
+		if (mtr_idx)
+			*mtr_idx = meter_id;
+		aso_mtr = priv->mtr_bulk.aso + meter_id;
+		return &aso_mtr->fm;
+	}
 	if (priv->sh->meter_aso_en) {
 		rte_rwlock_read_lock(&pools_mng->resize_mtrwl);
 		n_valid = pools_mng->n_valid;
@@ -2185,6 +2829,7 @@ mlx5_flow_meter_flush(struct rte_eth_dev *dev, struct rte_mtr_error *error)
 	struct mlx5_flow_meter_profile *fmp;
 	struct mlx5_legacy_flow_meter *legacy_fm;
 	struct mlx5_flow_meter_info *fm;
+	struct mlx5_flow_meter_policy *policy;
 	struct mlx5_flow_meter_sub_policy *sub_policy;
 	void *tmp;
 	uint32_t i, mtr_idx, policy_idx;
@@ -2219,6 +2864,14 @@ mlx5_flow_meter_flush(struct rte_eth_dev *dev, struct rte_mtr_error *error)
 				NULL, "MTR object meter profile invalid.");
 		}
 	}
+	if (priv->mtr_bulk.aso) {
+		for (i = 1; i <= priv->mtr_config.nb_meter_profiles; i++) {
+			aso_mtr = mlx5_aso_meter_by_idx(priv, i);
+			fm = &aso_mtr->fm;
+			if (fm->initialized)
+				mlx5_flow_meter_hws_destroy(dev, i, error);
+		}
+	}
 	if (priv->policy_idx_tbl) {
 		MLX5_L3T_FOREACH(priv->policy_idx_tbl, i, entry) {
 			policy_idx = *(uint32_t *)entry;
@@ -2244,6 +2897,15 @@ mlx5_flow_meter_flush(struct rte_eth_dev *dev, struct rte_mtr_error *error)
 		mlx5_l3t_destroy(priv->policy_idx_tbl);
 		priv->policy_idx_tbl = NULL;
 	}
+	if (priv->mtr_policy_arr) {
+		for (i = 0; i < priv->mtr_config.nb_meter_policies; i++) {
+			policy = mlx5_flow_meter_policy_find(dev, i,
+							     &policy_idx);
+			if (policy->initialized)
+				mlx5_flow_meter_policy_hws_delete(dev, i,
+								  error);
+		}
+	}
 	if (priv->mtr_profile_tbl) {
 		MLX5_L3T_FOREACH(priv->mtr_profile_tbl, i, entry) {
 			fmp = entry;
@@ -2257,9 +2919,21 @@ mlx5_flow_meter_flush(struct rte_eth_dev *dev, struct rte_mtr_error *error)
 		mlx5_l3t_destroy(priv->mtr_profile_tbl);
 		priv->mtr_profile_tbl = NULL;
 	}
+	if (priv->mtr_profile_arr) {
+		for (i = 0; i < priv->mtr_config.nb_meter_profiles; i++) {
+			fmp = mlx5_flow_meter_profile_find(priv, i);
+			if (fmp->initialized)
+				mlx5_flow_meter_profile_hws_delete(dev, i,
+								   error);
+		}
+	}
 	/* Delete default policy table. */
 	mlx5_flow_destroy_def_policy(dev);
 	if (priv->sh->refcnt == 1)
 		mlx5_flow_destroy_mtr_drop_tbls(dev);
+#ifdef HAVE_MLX5_HWS_SUPPORT
+	/* Destroy HWS configuration. */
+	mlx5_flow_meter_uninit(dev);
+#endif
 	return 0;
 }
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v5 08/18] net/mlx5: add HW steering counter action
  2022-10-20  3:21 ` [PATCH v5 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (6 preceding siblings ...)
  2022-10-20  3:21   ` [PATCH v5 07/18] net/mlx5: add HW steering meter action Suanming Mou
@ 2022-10-20  3:22   ` Suanming Mou
  2022-10-20  3:22   ` [PATCH v5 09/18] net/mlx5: support DR action template API Suanming Mou
                     ` (9 subsequent siblings)
  17 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-10-20  3:22 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko, Ray Kinsella
  Cc: dev, rasland, orika, Xiaoyu Min

From: Xiaoyu Min <jackmin@nvidia.com>

This commit adds HW steering counter action support.
A pool mechanism is the basic data structure for HW steering counters.

The HW steering counter pool is based on the zero-copy variant of
rte_ring.

There are two global rte_rings:
1. free_list:
     Stores the counter indexes that are ready for use.
2. wait_reset_list:
     Stores the counter indexes that have just been freed by the user
     and whose hardware counters must be queried for the reset value
     before the counter can be reused.

The counter pool also supports a cache per HW steering queue, which is
also based on the zero-copy variant of rte_ring.

The cache size, preload, threshold, and fetch size are all configurable
and exposed via device arguments.
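
As a rough illustration of this layout (the struct and field names
below are hypothetical and do not match the actual driver definitions):

  /* Illustrative sketch only; names are not the driver's. */
  struct hws_cnt_queue_cache {            /* one per HW steering queue */
          struct rte_ring *ring;          /* zero-copy ring of counter indexes */
          uint32_t size;                  /* cache capacity */
          uint32_t preload;               /* indexes preloaded at start-up */
          uint32_t threshold;             /* write-back watermark on put */
          uint32_t fetch_sz;              /* indexes fetched per refill */
  };

  struct hws_cnt_pool_sketch {
          struct rte_ring *free_list;       /* counter indexes ready for use */
          struct rte_ring *wait_reset_list; /* freed, waiting for HW query */
          uint32_t query_gen;               /* advanced after each query cycle */
          struct hws_cnt_queue_cache cache[]; /* per-queue caches */
  };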

The main operations of the counter pool are as follows:

 - Get one counter from the pool:
   1. The user calls a _get_* API.
   2. If the cache is enabled, dequeue one counter index from the local
      cache:
      2.A: If the dequeued index is still in reset status (its
           query_gen_when_free equals the pool's query gen):
           I. Flush all counters in the local cache back to the global
              wait_reset_list.
           II. Fetch _fetch_sz_ counters into the cache from the global
               free list.
           III. Fetch one counter from the cache.
   3. If the cache is empty, fetch _fetch_sz_ counters from the global
      free list into the cache and fetch one counter from the cache.
 - Free one counter into the pool:
   1. The user calls a _put_* API.
   2. Put the counter into the local cache.
   3. If the local cache is full:
      3.A: Write back all counters above _threshold_ into the global
           wait_reset_list.
      3.B: Also write back this counter into the global wait_reset_list.

When the local cache is disabled, _get_/_put_ operate directly on the
global lists, as sketched below.
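
A minimal C sketch of this get/put flow, building on the layout sketch
above and assuming hypothetical helpers (query_gen_when_free(),
flush_cache_to(), refill_cache_from(), write_back_above_threshold())
rather than the actual mlx5_hws_cnt_* API; the real code uses zero-copy
burst ring operations instead of the single-element calls shown here:

  static int
  cnt_get(struct hws_cnt_pool_sketch *p, struct hws_cnt_queue_cache *c,
          uint32_t *idx)
  {
          if (c == NULL) /* Cache disabled: take from the global list. */
                  return rte_ring_dequeue_elem(p->free_list, idx,
                                               sizeof(*idx));
          if (rte_ring_dequeue_elem(c->ring, idx, sizeof(*idx)) == 0) {
                  if (query_gen_when_free(p, *idx) != p->query_gen)
                          return 0; /* Already reset, ready for reuse. */
                  /* Still in reset: return it and the cached indexes
                   * to the global wait_reset_list. */
                  rte_ring_enqueue_elem(p->wait_reset_list, idx,
                                        sizeof(*idx));
                  flush_cache_to(c, p->wait_reset_list);
          }
          /* Cache empty or just flushed: refill, then take one. */
          refill_cache_from(c, p->free_list, c->fetch_sz);
          return rte_ring_dequeue_elem(c->ring, idx, sizeof(*idx));
  }

  static void
  cnt_put(struct hws_cnt_pool_sketch *p, struct hws_cnt_queue_cache *c,
          uint32_t idx)
  {
          if (c == NULL ||
              rte_ring_enqueue_elem(c->ring, &idx, sizeof(idx)) != 0) {
                  /* Cache disabled or full: write back counters above
                   * the threshold, then this one, to the wait-reset
                   * list. */
                  if (c != NULL)
                          write_back_above_threshold(c, p->wait_reset_list);
                  rte_ring_enqueue_elem(p->wait_reset_list, &idx,
                                        sizeof(idx));
          }
  }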

Signed-off-by: Xiaoyu Min <jackmin@nvidia.com>
---
 doc/guides/rel_notes/release_22_11.rst |   1 +
 drivers/common/mlx5/mlx5_devx_cmds.c   |  50 +++
 drivers/common/mlx5/mlx5_devx_cmds.h   |  27 ++
 drivers/common/mlx5/mlx5_prm.h         |  20 +-
 drivers/common/mlx5/version.map        |   1 +
 drivers/net/mlx5/meson.build           |   1 +
 drivers/net/mlx5/mlx5.c                |  14 +
 drivers/net/mlx5/mlx5.h                |  27 ++
 drivers/net/mlx5/mlx5_defs.h           |   2 +
 drivers/net/mlx5/mlx5_flow.c           |  27 +-
 drivers/net/mlx5/mlx5_flow.h           |   5 +
 drivers/net/mlx5/mlx5_flow_aso.c       | 261 +++++++++++-
 drivers/net/mlx5/mlx5_flow_hw.c        | 340 ++++++++++++++-
 drivers/net/mlx5/mlx5_hws_cnt.c        | 528 +++++++++++++++++++++++
 drivers/net/mlx5/mlx5_hws_cnt.h        | 558 +++++++++++++++++++++++++
 15 files changed, 1831 insertions(+), 31 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_hws_cnt.c
 create mode 100644 drivers/net/mlx5/mlx5_hws_cnt.h

diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 192bb84211..64fbecd7b3 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -243,6 +243,7 @@ New Features
     - Support of modify fields.
     - Support of FDB.
     - Support of meter.
+    - Support of counter.
 
 * **Rewritten pmdinfo script.**
 
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 9c185366d0..05b9429c7f 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -176,6 +176,41 @@ mlx5_devx_cmd_register_write(void *ctx, uint16_t reg_id, uint32_t arg,
 	return 0;
 }
 
+struct mlx5_devx_obj *
+mlx5_devx_cmd_flow_counter_alloc_general(void *ctx,
+		struct mlx5_devx_counter_attr *attr)
+{
+	struct mlx5_devx_obj *dcs = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*dcs),
+						0, SOCKET_ID_ANY);
+	uint32_t in[MLX5_ST_SZ_DW(alloc_flow_counter_in)]   = {0};
+	uint32_t out[MLX5_ST_SZ_DW(alloc_flow_counter_out)] = {0};
+
+	if (!dcs) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(alloc_flow_counter_in, in, opcode,
+		 MLX5_CMD_OP_ALLOC_FLOW_COUNTER);
+	if (attr->bulk_log_max_alloc)
+		MLX5_SET(alloc_flow_counter_in, in, flow_counter_bulk_log_size,
+			 attr->flow_counter_bulk_log_size);
+	else
+		MLX5_SET(alloc_flow_counter_in, in, flow_counter_bulk,
+			 attr->bulk_n_128);
+	if (attr->pd_valid)
+		MLX5_SET(alloc_flow_counter_in, in, pd, attr->pd);
+	dcs->obj = mlx5_glue->devx_obj_create(ctx, in,
+					      sizeof(in), out, sizeof(out));
+	if (!dcs->obj) {
+		DRV_LOG(ERR, "Can't allocate counters - error %d", errno);
+		rte_errno = errno;
+		mlx5_free(dcs);
+		return NULL;
+	}
+	dcs->id = MLX5_GET(alloc_flow_counter_out, out, flow_counter_id);
+	return dcs;
+}
+
 /**
  * Allocate flow counters via devx interface.
  *
@@ -967,6 +1002,16 @@ mlx5_devx_cmd_query_hca_attr(void *ctx,
 					 general_obj_types) &
 			      MLX5_GENERAL_OBJ_TYPES_CAP_CONN_TRACK_OFFLOAD);
 	attr->rq_delay_drop = MLX5_GET(cmd_hca_cap, hcattr, rq_delay_drop);
+	attr->max_flow_counter_15_0 = MLX5_GET(cmd_hca_cap, hcattr,
+			max_flow_counter_15_0);
+	attr->max_flow_counter_31_16 = MLX5_GET(cmd_hca_cap, hcattr,
+			max_flow_counter_31_16);
+	attr->alloc_flow_counter_pd = MLX5_GET(cmd_hca_cap, hcattr,
+			alloc_flow_counter_pd);
+	attr->flow_counter_access_aso = MLX5_GET(cmd_hca_cap, hcattr,
+			flow_counter_access_aso);
+	attr->flow_access_aso_opc_mod = MLX5_GET(cmd_hca_cap, hcattr,
+			flow_access_aso_opc_mod);
 	if (attr->crypto) {
 		attr->aes_xts = MLX5_GET(cmd_hca_cap, hcattr, aes_xts);
 		hcattr = mlx5_devx_get_hca_cap(ctx, in, out, &rc,
@@ -995,6 +1040,11 @@ mlx5_devx_cmd_query_hca_attr(void *ctx,
 							   hairpin_sq_wq_in_host_mem);
 		attr->hairpin_data_buffer_locked = MLX5_GET(cmd_hca_cap_2, hcattr,
 							    hairpin_data_buffer_locked);
+		attr->flow_counter_bulk_log_max_alloc = MLX5_GET(cmd_hca_cap_2,
+				hcattr, flow_counter_bulk_log_max_alloc);
+		attr->flow_counter_bulk_log_granularity =
+			MLX5_GET(cmd_hca_cap_2, hcattr,
+				 flow_counter_bulk_log_granularity);
 	}
 	if (attr->log_min_stride_wqe_sz == 0)
 		attr->log_min_stride_wqe_sz = MLX5_MPRQ_LOG_MIN_STRIDE_WQE_SIZE;
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index a10aa3331b..c94b9eac06 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -15,6 +15,16 @@
 #define MLX5_DEVX_MAX_KLM_ENTRIES ((UINT16_MAX - \
 		MLX5_ST_SZ_DW(create_mkey_in) * 4) / (MLX5_ST_SZ_DW(klm) * 4))
 
+struct mlx5_devx_counter_attr {
+	uint32_t pd_valid:1;
+	uint32_t pd:24;
+	uint32_t bulk_log_max_alloc:1;
+	union {
+		uint8_t flow_counter_bulk_log_size;
+		uint8_t bulk_n_128;
+	};
+};
+
 struct mlx5_devx_mkey_attr {
 	uint64_t addr;
 	uint64_t size;
@@ -266,6 +276,18 @@ struct mlx5_hca_attr {
 	uint32_t set_reg_c:8;
 	uint32_t nic_flow_table:1;
 	uint32_t modify_outer_ip_ecn:1;
+	union {
+		uint32_t max_flow_counter;
+		struct {
+			uint16_t max_flow_counter_15_0;
+			uint16_t max_flow_counter_31_16;
+		};
+	};
+	uint32_t flow_counter_bulk_log_max_alloc:5;
+	uint32_t flow_counter_bulk_log_granularity:5;
+	uint32_t alloc_flow_counter_pd:1;
+	uint32_t flow_counter_access_aso:1;
+	uint32_t flow_access_aso_opc_mod:8;
 };
 
 /* LAG Context. */
@@ -598,6 +620,11 @@ struct mlx5_devx_crypto_login_attr {
 
 /* mlx5_devx_cmds.c */
 
+__rte_internal
+struct mlx5_devx_obj *
+mlx5_devx_cmd_flow_counter_alloc_general(void *ctx,
+				struct mlx5_devx_counter_attr *attr);
+
 __rte_internal
 struct mlx5_devx_obj *mlx5_devx_cmd_flow_counter_alloc(void *ctx,
 						       uint32_t bulk_sz);
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index fb3c43eed9..2b5c43ee6e 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -1170,8 +1170,10 @@ struct mlx5_ifc_alloc_flow_counter_in_bits {
 	u8 reserved_at_10[0x10];
 	u8 reserved_at_20[0x10];
 	u8 op_mod[0x10];
-	u8 flow_counter_id[0x20];
-	u8 reserved_at_40[0x18];
+	u8 reserved_at_40[0x8];
+	u8 pd[0x18];
+	u8 reserved_at_60[0x13];
+	u8 flow_counter_bulk_log_size[0x5];
 	u8 flow_counter_bulk[0x8];
 };
 
@@ -1405,7 +1407,13 @@ enum {
 #define MLX5_STEERING_LOGIC_FORMAT_CONNECTX_6DX 0x1
 
 struct mlx5_ifc_cmd_hca_cap_bits {
-	u8 reserved_at_0[0x20];
+	u8 access_other_hca_roce[0x1];
+	u8 alloc_flow_counter_pd[0x1];
+	u8 flow_counter_access_aso[0x1];
+	u8 reserved_at_3[0x5];
+	u8 flow_access_aso_opc_mod[0x8];
+	u8 reserved_at_10[0xf];
+	u8 vhca_resource_manager[0x1];
 	u8 hca_cap_2[0x1];
 	u8 reserved_at_21[0xf];
 	u8 vhca_id[0x10];
@@ -2118,7 +2126,11 @@ struct mlx5_ifc_cmd_hca_cap_2_bits {
 	u8 format_select_dw_8_6_ext[0x1];
 	u8 reserved_at_1ac[0x14];
 	u8 general_obj_types_127_64[0x40];
-	u8 reserved_at_200[0x80];
+	u8 reserved_at_200[0x53];
+	u8 flow_counter_bulk_log_max_alloc[0x5];
+	u8 reserved_at_258[0x3];
+	u8 flow_counter_bulk_log_granularity[0x5];
+	u8 reserved_at_260[0x20];
 	u8 format_select_dw_gtpu_dw_0[0x8];
 	u8 format_select_dw_gtpu_dw_1[0x8];
 	u8 format_select_dw_gtpu_dw_2[0x8];
diff --git a/drivers/common/mlx5/version.map b/drivers/common/mlx5/version.map
index 413dec14ab..4f72900519 100644
--- a/drivers/common/mlx5/version.map
+++ b/drivers/common/mlx5/version.map
@@ -40,6 +40,7 @@ INTERNAL {
 	mlx5_devx_cmd_create_virtq;
 	mlx5_devx_cmd_destroy;
 	mlx5_devx_cmd_flow_counter_alloc;
+	mlx5_devx_cmd_flow_counter_alloc_general;
 	mlx5_devx_cmd_flow_counter_query;
 	mlx5_devx_cmd_flow_dump;
 	mlx5_devx_cmd_flow_single_dump;
diff --git a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build
index c3b8fa16d3..0b506e52b4 100644
--- a/drivers/net/mlx5/meson.build
+++ b/drivers/net/mlx5/meson.build
@@ -41,6 +41,7 @@ sources = files(
 if is_linux
     sources += files(
             'mlx5_flow_hw.c',
+            'mlx5_hws_cnt.c',
             'mlx5_flow_verbs.c',
     )
     if (dpdk_conf.has('RTE_ARCH_X86_64')
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 9cd4892858..4d87da8e29 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -175,6 +175,12 @@
 /* Device parameter to create the fdb default rule in PMD */
 #define MLX5_FDB_DEFAULT_RULE_EN "fdb_def_rule_en"
 
+/* HW steering counter configuration. */
+#define MLX5_HWS_CNT_SERVICE_CORE "service_core"
+
+/* HW steering counter's query interval. */
+#define MLX5_HWS_CNT_CYCLE_TIME "svc_cycle_time"
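+/*
+ * Illustrative usage (values are examples only): both keys are passed as
+ * mlx5 devargs, e.g.:
+ *   -a <PCI_BDF>,dv_flow_en=2,service_core=3,svc_cycle_time=500
+ */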
+
 /* Shared memory between primary and secondary processes. */
 struct mlx5_shared_data *mlx5_shared_data;
 
@@ -1245,6 +1251,10 @@ mlx5_dev_args_check_handler(const char *key, const char *val, void *opaque)
 		config->allow_duplicate_pattern = !!tmp;
 	} else if (strcmp(MLX5_FDB_DEFAULT_RULE_EN, key) == 0) {
 		config->fdb_def_rule = !!tmp;
+	} else if (strcmp(MLX5_HWS_CNT_SERVICE_CORE, key) == 0) {
+		config->cnt_svc.service_core = tmp;
+	} else if (strcmp(MLX5_HWS_CNT_CYCLE_TIME, key) == 0) {
+		config->cnt_svc.cycle_time = tmp;
 	}
 	return 0;
 }
@@ -1281,6 +1291,8 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 		MLX5_DECAP_EN,
 		MLX5_ALLOW_DUPLICATE_PATTERN,
 		MLX5_FDB_DEFAULT_RULE_EN,
+		MLX5_HWS_CNT_SERVICE_CORE,
+		MLX5_HWS_CNT_CYCLE_TIME,
 		NULL,
 	};
 	int ret = 0;
@@ -1293,6 +1305,8 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 	config->decap_en = 1;
 	config->allow_duplicate_pattern = 1;
 	config->fdb_def_rule = 1;
+	config->cnt_svc.cycle_time = MLX5_CNT_SVC_CYCLE_TIME_DEFAULT;
+	config->cnt_svc.service_core = rte_get_main_lcore();
 	if (mkvlist != NULL) {
 		/* Process parameters. */
 		ret = mlx5_kvargs_process(mkvlist, params,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 325f0b31c5..c71db131a1 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -313,6 +313,10 @@ struct mlx5_sh_config {
 	uint32_t hw_fcs_strip:1; /* FCS stripping is supported. */
 	uint32_t allow_duplicate_pattern:1;
 	uint32_t lro_allowed:1; /* Whether LRO is allowed. */
+	struct {
+		uint16_t service_core;
+		uint32_t cycle_time; /* Query cycle time in milliseconds. */
+	} cnt_svc; /* Configuration of the HW steering counter service. */
 	/* Allow/Prevent the duplicate rules pattern. */
 	uint32_t fdb_def_rule:1; /* Create FDB default jump rule */
 };
@@ -1229,6 +1233,22 @@ struct mlx5_flex_item {
 	struct mlx5_flex_pattern_field map[MLX5_FLEX_ITEM_MAPPING_NUM];
 };
 
+#define HWS_CNT_ASO_SQ_NUM 4
+
+struct mlx5_hws_aso_mng {
+	uint16_t sq_num;
+	struct mlx5_aso_sq sqs[HWS_CNT_ASO_SQ_NUM];
+};
+
+struct mlx5_hws_cnt_svc_mng {
+	uint32_t refcnt;
+	uint32_t service_core;
+	uint32_t query_interval;
+	pthread_t service_thread;
+	uint8_t svc_running;
+	struct mlx5_hws_aso_mng aso_mng __rte_cache_aligned;
+};
+
 /*
  * Shared Infiniband device context for Master/Representors
  * which belong to same IB device with multiple IB ports.
@@ -1328,6 +1348,7 @@ struct mlx5_dev_ctx_shared {
 	pthread_mutex_t lwm_config_lock;
 	uint32_t host_shaper_rate:8;
 	uint32_t lwm_triggered:1;
+	struct mlx5_hws_cnt_svc_mng *cnt_svc;
 	struct mlx5_dev_shared_port port[]; /* per device port data array. */
 };
 
@@ -1614,6 +1635,7 @@ struct mlx5_priv {
 	/* Flex items have been created on the port. */
 	uint32_t flex_item_map; /* Map of allocated flex item elements. */
 	uint32_t nb_queue; /* HW steering queue number. */
+	struct mlx5_hws_cnt_pool *hws_cpool; /* HW steering's counter pool. */
 #if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
 	/* Item template list. */
 	LIST_HEAD(flow_hw_itt, rte_flow_pattern_template) flow_hw_itt;
@@ -2044,6 +2066,11 @@ mlx5_get_supported_sw_parsing_offloads(const struct mlx5_hca_attr *attr);
 uint32_t
 mlx5_get_supported_tunneling_offloads(const struct mlx5_hca_attr *attr);
 
+int mlx5_aso_cnt_queue_init(struct mlx5_dev_ctx_shared *sh);
+void mlx5_aso_cnt_queue_uninit(struct mlx5_dev_ctx_shared *sh);
+int mlx5_aso_cnt_query(struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_hws_cnt_pool *cpool);
+
 /* mlx5_flow_flex.c */
 
 struct rte_flow_item_flex_handle *
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index 585afb0a98..d064abfef3 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -188,4 +188,6 @@
 #define static_assert _Static_assert
 #endif
 
+#define MLX5_CNT_SVC_CYCLE_TIME_DEFAULT 500
+
 #endif /* RTE_PMD_MLX5_DEFS_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 892c42a10b..38932fe9d7 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -7834,24 +7834,33 @@ mlx5_flow_isolate(struct rte_eth_dev *dev,
  */
 static int
 flow_drv_query(struct rte_eth_dev *dev,
-	       uint32_t flow_idx,
+	       struct rte_flow *eflow,
 	       const struct rte_flow_action *actions,
 	       void *data,
 	       struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	const struct mlx5_flow_driver_ops *fops;
-	struct rte_flow *flow = mlx5_ipool_get(priv->flows[MLX5_FLOW_TYPE_GEN],
-					       flow_idx);
-	enum mlx5_flow_drv_type ftype;
+	struct rte_flow *flow = NULL;
+	enum mlx5_flow_drv_type ftype = MLX5_FLOW_TYPE_MIN;
 
+	if (priv->sh->config.dv_flow_en == 2) {
+#ifdef HAVE_MLX5_HWS_SUPPORT
+		flow = eflow;
+		ftype = MLX5_FLOW_TYPE_HW;
+#endif
+	} else {
+		flow = (struct rte_flow *)mlx5_ipool_get(priv->flows[MLX5_FLOW_TYPE_GEN],
+				(uintptr_t)(void *)eflow);
+	}
 	if (!flow) {
 		return rte_flow_error_set(error, ENOENT,
 			  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 			  NULL,
 			  "invalid flow handle");
 	}
-	ftype = flow->drv_type;
+	if (ftype == MLX5_FLOW_TYPE_MIN)
+		ftype = flow->drv_type;
 	MLX5_ASSERT(ftype > MLX5_FLOW_TYPE_MIN && ftype < MLX5_FLOW_TYPE_MAX);
 	fops = flow_get_drv_ops(ftype);
 
@@ -7872,14 +7881,8 @@ mlx5_flow_query(struct rte_eth_dev *dev,
 		struct rte_flow_error *error)
 {
 	int ret;
-	struct mlx5_priv *priv = dev->data->dev_private;
 
-	if (priv->sh->config.dv_flow_en == 2)
-		return rte_flow_error_set(error, ENOTSUP,
-			  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-			  NULL,
-			  "Flow non-Q query not supported");
-	ret = flow_drv_query(dev, (uintptr_t)(void *)flow, actions, data,
+	ret = flow_drv_query(dev, flow, actions, data,
 			     error);
 	if (ret < 0)
 		return ret;
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 5f89afbe29..1948de5dd8 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1109,6 +1109,7 @@ struct rte_flow_hw {
 		struct mlx5_hrxq *hrxq; /* TIR action. */
 	};
 	struct rte_flow_template_table *table; /* The table flow allcated from. */
+	uint32_t cnt_id;
 	uint8_t rule[0]; /* HWS layer data struct. */
 } __rte_packed;
 
@@ -1157,6 +1158,9 @@ struct mlx5_action_construct_data {
 			uint32_t level; /* RSS level. */
 			uint32_t idx; /* Shared action index. */
 		} shared_rss;
+		struct {
+			uint32_t id;
+		} shared_counter;
 	};
 };
 
@@ -1235,6 +1239,7 @@ struct mlx5_hw_actions {
 	uint16_t encap_decap_pos; /* Encap/Decap action position. */
 	uint32_t acts_num:4; /* Total action number. */
 	uint32_t mark:1; /* Indicate the mark action. */
+	uint32_t cnt_id; /* Counter id. */
 	/* Translated DR action array from action template. */
 	struct mlx5dr_rule_action rule_acts[MLX5_HW_MAX_ACTS];
 };
diff --git a/drivers/net/mlx5/mlx5_flow_aso.c b/drivers/net/mlx5/mlx5_flow_aso.c
index 60d0280367..ed9272e583 100644
--- a/drivers/net/mlx5/mlx5_flow_aso.c
+++ b/drivers/net/mlx5/mlx5_flow_aso.c
@@ -12,6 +12,9 @@
 
 #include "mlx5.h"
 #include "mlx5_flow.h"
+#include "mlx5_hws_cnt.h"
+
+#define MLX5_ASO_CNT_QUEUE_LOG_DESC 14
 
 /**
  * Free MR resources.
@@ -79,6 +82,33 @@ mlx5_aso_destroy_sq(struct mlx5_aso_sq *sq)
 	memset(sq, 0, sizeof(*sq));
 }
 
+/**
+ * Initialize Send Queue used for ASO counter access.
+ *
+ * @param[in] sq
+ *   ASO SQ to initialize.
+ */
+static void
+mlx5_aso_cnt_init_sq(struct mlx5_aso_sq *sq)
+{
+	volatile struct mlx5_aso_wqe *restrict wqe;
+	int i;
+	int size = 1 << sq->log_desc_n;
+
+	/* All the following fields should stay constant. */
+	for (i = 0, wqe = &sq->sq_obj.aso_wqes[0]; i < size; ++i, ++wqe) {
+		wqe->general_cseg.sq_ds = rte_cpu_to_be_32((sq->sqn << 8) |
+							  (sizeof(*wqe) >> 4));
+		wqe->aso_cseg.operand_masks = rte_cpu_to_be_32
+			(0u |
+			 (ASO_OPER_LOGICAL_OR << ASO_CSEG_COND_OPER_OFFSET) |
+			 (ASO_OP_ALWAYS_FALSE << ASO_CSEG_COND_1_OPER_OFFSET) |
+			 (ASO_OP_ALWAYS_FALSE << ASO_CSEG_COND_0_OPER_OFFSET) |
+			 (BYTEWISE_64BYTE << ASO_CSEG_DATA_MASK_MODE_OFFSET));
+		wqe->aso_cseg.data_mask = RTE_BE64(UINT64_MAX);
+	}
+}
+
 /**
  * Initialize Send Queue used for ASO access.
  *
@@ -191,7 +221,7 @@ mlx5_aso_ct_init_sq(struct mlx5_aso_sq *sq)
  */
 static int
 mlx5_aso_sq_create(struct mlx5_common_device *cdev, struct mlx5_aso_sq *sq,
-		   void *uar)
+		   void *uar, uint16_t log_desc_n)
 {
 	struct mlx5_devx_cq_attr cq_attr = {
 		.uar_page_id = mlx5_os_get_devx_uar_page_id(uar),
@@ -212,12 +242,12 @@ mlx5_aso_sq_create(struct mlx5_common_device *cdev, struct mlx5_aso_sq *sq,
 	int ret;
 
 	if (mlx5_devx_cq_create(cdev->ctx, &sq->cq.cq_obj,
-				MLX5_ASO_QUEUE_LOG_DESC, &cq_attr,
+				log_desc_n, &cq_attr,
 				SOCKET_ID_ANY))
 		goto error;
 	sq->cq.cq_ci = 0;
-	sq->cq.log_desc_n = MLX5_ASO_QUEUE_LOG_DESC;
-	sq->log_desc_n = MLX5_ASO_QUEUE_LOG_DESC;
+	sq->cq.log_desc_n = log_desc_n;
+	sq->log_desc_n = log_desc_n;
 	sq_attr.cqn = sq->cq.cq_obj.cq->id;
 	/* for mlx5_aso_wqe that is twice the size of mlx5_wqe */
 	log_wqbb_n = sq->log_desc_n + 1;
@@ -269,7 +299,8 @@ mlx5_aso_queue_init(struct mlx5_dev_ctx_shared *sh,
 				    sq_desc_n, &sh->aso_age_mng->aso_sq.mr))
 			return -1;
 		if (mlx5_aso_sq_create(cdev, &sh->aso_age_mng->aso_sq,
-				       sh->tx_uar.obj)) {
+				       sh->tx_uar.obj,
+				       MLX5_ASO_QUEUE_LOG_DESC)) {
 			mlx5_aso_dereg_mr(cdev, &sh->aso_age_mng->aso_sq.mr);
 			return -1;
 		}
@@ -277,7 +308,7 @@ mlx5_aso_queue_init(struct mlx5_dev_ctx_shared *sh,
 		break;
 	case ASO_OPC_MOD_POLICER:
 		if (mlx5_aso_sq_create(cdev, &sh->mtrmng->pools_mng.sq,
-				       sh->tx_uar.obj))
+				       sh->tx_uar.obj, MLX5_ASO_QUEUE_LOG_DESC))
 			return -1;
 		mlx5_aso_mtr_init_sq(&sh->mtrmng->pools_mng.sq);
 		break;
@@ -287,7 +318,7 @@ mlx5_aso_queue_init(struct mlx5_dev_ctx_shared *sh,
 				    &sh->ct_mng->aso_sq.mr))
 			return -1;
 		if (mlx5_aso_sq_create(cdev, &sh->ct_mng->aso_sq,
-				       sh->tx_uar.obj)) {
+				       sh->tx_uar.obj, MLX5_ASO_QUEUE_LOG_DESC)) {
 			mlx5_aso_dereg_mr(cdev, &sh->ct_mng->aso_sq.mr);
 			return -1;
 		}
@@ -1403,3 +1434,219 @@ mlx5_aso_ct_available(struct mlx5_dev_ctx_shared *sh,
 	rte_errno = EBUSY;
 	return -rte_errno;
 }
+
+int
+mlx5_aso_cnt_queue_init(struct mlx5_dev_ctx_shared *sh)
+{
+	struct mlx5_hws_aso_mng *aso_mng = NULL;
+	uint8_t idx;
+	struct mlx5_aso_sq *sq;
+
+	MLX5_ASSERT(sh);
+	MLX5_ASSERT(sh->cnt_svc);
+	aso_mng = &sh->cnt_svc->aso_mng;
+	aso_mng->sq_num = HWS_CNT_ASO_SQ_NUM;
+	for (idx = 0; idx < HWS_CNT_ASO_SQ_NUM; idx++) {
+		sq = &aso_mng->sqs[idx];
+		if (mlx5_aso_sq_create(sh->cdev, sq, sh->tx_uar.obj,
+					MLX5_ASO_CNT_QUEUE_LOG_DESC))
+			goto error;
+		mlx5_aso_cnt_init_sq(sq);
+	}
+	return 0;
+error:
+	mlx5_aso_cnt_queue_uninit(sh);
+	return -1;
+}
+
+void
+mlx5_aso_cnt_queue_uninit(struct mlx5_dev_ctx_shared *sh)
+{
+	uint16_t idx;
+
+	for (idx = 0; idx < sh->cnt_svc->aso_mng.sq_num; idx++)
+		mlx5_aso_destroy_sq(&sh->cnt_svc->aso_mng.sqs[idx]);
+	sh->cnt_svc->aso_mng.sq_num = 0;
+}
+
+static uint16_t
+mlx5_aso_cnt_sq_enqueue_burst(struct mlx5_hws_cnt_pool *cpool,
+		struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_aso_sq *sq, uint32_t n,
+		uint32_t offset, uint32_t dcs_id_base)
+{
+	volatile struct mlx5_aso_wqe *wqe;
+	uint16_t size = 1 << sq->log_desc_n;
+	uint16_t mask = size - 1;
+	uint16_t max;
+	uint32_t upper_offset = offset;
+	uint64_t addr;
+	uint32_t ctrl_gen_id = 0;
+	uint8_t opcmod = sh->cdev->config.hca_attr.flow_access_aso_opc_mod;
+	rte_be32_t lkey = rte_cpu_to_be_32(cpool->raw_mng->mr.lkey);
+	uint16_t aso_n = (uint16_t)(RTE_ALIGN_CEIL(n, 4) / 4);
+	uint32_t ccntid;
+
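+	/*
+	 * Each ASO WQE reads one 64-byte block, i.e. 4 counters, so the
+	 * burst is expressed in WQEs and clamped to the free SQ slots.
+	 */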
+	max = RTE_MIN(size - (uint16_t)(sq->head - sq->tail), aso_n);
+	if (unlikely(!max))
+		return 0;
+	upper_offset += (max * 4);
+	/* Only one burst is in flight at a time, so the same elt can be reused. */
+	sq->elts[0].burst_size = max;
+	ctrl_gen_id = dcs_id_base;
+	ctrl_gen_id /= 4;
+	do {
+		ccntid = upper_offset - max * 4;
+		wqe = &sq->sq_obj.aso_wqes[sq->head & mask];
+		rte_prefetch0(&sq->sq_obj.aso_wqes[(sq->head + 1) & mask]);
+		wqe->general_cseg.misc = rte_cpu_to_be_32(ctrl_gen_id);
+		wqe->general_cseg.flags = RTE_BE32(MLX5_COMP_ONLY_FIRST_ERR <<
+							 MLX5_COMP_MODE_OFFSET);
+		wqe->general_cseg.opcode = rte_cpu_to_be_32
+						(MLX5_OPCODE_ACCESS_ASO |
+						 (opcmod <<
+						  WQE_CSEG_OPC_MOD_OFFSET) |
+						 (sq->pi <<
+						  WQE_CSEG_WQE_INDEX_OFFSET));
+		addr = (uint64_t)RTE_PTR_ADD(cpool->raw_mng->raw,
+				ccntid * sizeof(struct flow_counter_stats));
+		wqe->aso_cseg.va_h = rte_cpu_to_be_32((uint32_t)(addr >> 32));
+		wqe->aso_cseg.va_l_r = rte_cpu_to_be_32((uint32_t)addr | 1u);
+		wqe->aso_cseg.lkey = lkey;
+		sq->pi += 2; /* Each WQE contains 2 WQEBB's. */
+		sq->head++;
+		sq->next++;
+		ctrl_gen_id++;
+		max--;
+	} while (max);
+	wqe->general_cseg.flags = RTE_BE32(MLX5_COMP_ALWAYS <<
+							 MLX5_COMP_MODE_OFFSET);
+	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
+			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
+			   !sh->tx_uar.dbnc);
+	return sq->elts[0].burst_size;
+}
+
+static uint16_t
+mlx5_aso_cnt_completion_handle(struct mlx5_aso_sq *sq)
+{
+	struct mlx5_aso_cq *cq = &sq->cq;
+	volatile struct mlx5_cqe *restrict cqe;
+	const unsigned int cq_size = 1 << cq->log_desc_n;
+	const unsigned int mask = cq_size - 1;
+	uint32_t idx;
+	uint32_t next_idx = cq->cq_ci & mask;
+	const uint16_t max = (uint16_t)(sq->head - sq->tail);
+	uint16_t i = 0;
+	int ret;
+	if (unlikely(!max))
+		return 0;
+	idx = next_idx;
+	next_idx = (cq->cq_ci + 1) & mask;
+	rte_prefetch0(&cq->cq_obj.cqes[next_idx]);
+	cqe = &cq->cq_obj.cqes[idx];
+	ret = check_cqe(cqe, cq_size, cq->cq_ci);
+	/*
+	 * Be sure owner read is done before any other cookie field or
+	 * opaque field.
+	 */
+	rte_io_rmb();
+	if (unlikely(ret != MLX5_CQE_STATUS_SW_OWN)) {
+		if (likely(ret == MLX5_CQE_STATUS_HW_OWN))
+			return 0; /* return immediately. */
+		mlx5_aso_cqe_err_handle(sq);
+	}
+	i += sq->elts[0].burst_size;
+	sq->elts[0].burst_size = 0;
+	cq->cq_ci++;
+	if (likely(i)) {
+		sq->tail += i;
+		rte_io_wmb();
+		cq->cq_obj.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci);
+	}
+	return i;
+}
+
+static uint16_t
+mlx5_aso_cnt_query_one_dcs(struct mlx5_dev_ctx_shared *sh,
+			   struct mlx5_hws_cnt_pool *cpool,
+			   uint8_t dcs_idx, uint32_t num)
+{
+	uint32_t dcs_id = cpool->dcs_mng.dcs[dcs_idx].obj->id;
+	uint64_t cnt_num = cpool->dcs_mng.dcs[dcs_idx].batch_sz;
+	uint64_t left;
+	uint32_t iidx = cpool->dcs_mng.dcs[dcs_idx].iidx;
+	uint32_t offset;
+	uint16_t mask;
+	uint16_t sq_idx;
+	uint64_t burst_sz = (uint64_t)(1 << MLX5_ASO_CNT_QUEUE_LOG_DESC) * 4 *
+		sh->cnt_svc->aso_mng.sq_num;
+	uint64_t qburst_sz = burst_sz / sh->cnt_svc->aso_mng.sq_num;
+	uint64_t n;
+	struct mlx5_aso_sq *sq;
+
+	cnt_num = RTE_MIN(num, cnt_num);
+	left = cnt_num;
+	while (left) {
+		mask = 0;
+		for (sq_idx = 0; sq_idx < sh->cnt_svc->aso_mng.sq_num;
+				sq_idx++) {
+			if (left == 0) {
+				mask |= (1 << sq_idx);
+				continue;
+			}
+			n = RTE_MIN(left, qburst_sz);
+			offset = cnt_num - left;
+			offset += iidx;
+			mlx5_aso_cnt_sq_enqueue_burst(cpool, sh,
+					&sh->cnt_svc->aso_mng.sqs[sq_idx], n,
+					offset, dcs_id);
+			left -= n;
+		}
+		do {
+			for (sq_idx = 0; sq_idx < sh->cnt_svc->aso_mng.sq_num;
+					sq_idx++) {
+				sq = &sh->cnt_svc->aso_mng.sqs[sq_idx];
+				if (mlx5_aso_cnt_completion_handle(sq))
+					mask |= (1 << sq_idx);
+			}
+		} while (mask < ((1 << sh->cnt_svc->aso_mng.sq_num) - 1));
+	}
+	return cnt_num;
+}
+
+/*
+ * Query FW counters via ASO WQEs.
+ *
+ * The ASO counter query works in _sync_ mode:
+ * 1. each SQ issues one burst with several WQEs
+ * 2. a CQE is requested on the last WQE of the burst
+ * 3. the CQ of each SQ is busy-polled
+ * 4. once all SQs' CQEs are received, go back to step 1 and issue the next burst
+ *
+ * @param[in] sh
+ *   Pointer to shared device.
+ * @param[in] cpool
+ *   Pointer to counter pool.
+ *
+ * @return
+ *   0 on success, -1 on failure.
+ */
+int
+mlx5_aso_cnt_query(struct mlx5_dev_ctx_shared *sh,
+		   struct mlx5_hws_cnt_pool *cpool)
+{
+	uint32_t idx;
+	uint32_t num;
+	uint32_t cnt_num = mlx5_hws_cnt_pool_get_size(cpool) -
+		rte_ring_count(cpool->free_list);
+
+	for (idx = 0; idx < cpool->dcs_mng.batch_total; idx++) {
+		num = RTE_MIN(cnt_num, cpool->dcs_mng.dcs[idx].batch_sz);
+		mlx5_aso_cnt_query_one_dcs(sh, cpool, idx, num);
+		cnt_num -= num;
+		if (cnt_num == 0)
+			break;
+	}
+	return 0;
+}
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 5051741a5a..1e441c9c0d 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -10,6 +10,7 @@
 #include "mlx5_rx.h"
 
 #if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
+#include "mlx5_hws_cnt.h"
 
 /* The maximum actions support in the flow. */
 #define MLX5_HW_MAX_ACTS 16
@@ -353,6 +354,10 @@ __flow_hw_action_template_destroy(struct rte_eth_dev *dev,
 			mlx5dr_action_destroy(acts->mhdr->action);
 		mlx5_free(acts->mhdr);
 	}
+	if (mlx5_hws_cnt_id_valid(acts->cnt_id)) {
+		mlx5_hws_cnt_shared_put(priv->hws_cpool, &acts->cnt_id);
+		acts->cnt_id = 0;
+	}
 }
 
 /**
@@ -532,6 +537,44 @@ __flow_hw_act_data_shared_rss_append(struct mlx5_priv *priv,
 	return 0;
 }
 
+/**
+ * Append shared counter action to the dynamic action list.
+ *
+ * @param[in] priv
+ *   Pointer to the port private data structure.
+ * @param[in] acts
+ *   Pointer to the template HW steering DR actions.
+ * @param[in] type
+ *   Action type.
+ * @param[in] action_src
+ *   Offset of source rte flow action.
+ * @param[in] action_dst
+ *   Offset of destination DR action.
+ * @param[in] cnt_id
+ *   Shared counter id.
+ *
+ * @return
+ *    0 on success, negative value otherwise and rte_errno is set.
+ */
+static __rte_always_inline int
+__flow_hw_act_data_shared_cnt_append(struct mlx5_priv *priv,
+				     struct mlx5_hw_actions *acts,
+				     enum rte_flow_action_type type,
+				     uint16_t action_src,
+				     uint16_t action_dst,
+				     cnt_id_t cnt_id)
+{
+	struct mlx5_action_construct_data *act_data;
+
+	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
+	if (!act_data)
+		return -1;
+	act_data->type = type;
+	act_data->shared_counter.id = cnt_id;
+	LIST_INSERT_HEAD(&acts->act_list, act_data, next);
+	return 0;
+}
+
 /**
  * Translate shared indirect action.
  *
@@ -573,6 +616,13 @@ flow_hw_shared_action_translate(struct rte_eth_dev *dev,
 		    action_src, action_dst, idx, shared_rss))
 			return -1;
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
+		if (__flow_hw_act_data_shared_cnt_append(priv, acts,
+			(enum rte_flow_action_type)
+			MLX5_RTE_FLOW_ACTION_TYPE_COUNT,
+			action_src, action_dst, act_idx))
+			return -1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type:%d", type);
 		break;
@@ -946,6 +996,30 @@ flow_hw_meter_compile(struct rte_eth_dev *dev,
 	}
 	return 0;
 }
+
+static __rte_always_inline int
+flow_hw_cnt_compile(struct rte_eth_dev *dev, uint32_t start_pos,
+		      struct mlx5_hw_actions *acts)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint32_t pos = start_pos;
+	cnt_id_t cnt_id;
+	int ret;
+
+	ret = mlx5_hws_cnt_shared_get(priv->hws_cpool, &cnt_id);
+	if (ret != 0)
+		return ret;
+	ret = mlx5_hws_cnt_pool_get_action_offset
+				(priv->hws_cpool,
+				 cnt_id,
+				 &acts->rule_acts[pos].action,
+				 &acts->rule_acts[pos].counter.offset);
+	if (ret != 0)
+		return ret;
+	acts->cnt_id = cnt_id;
+	return 0;
+}
+
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -1189,6 +1263,20 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 				goto err;
 			i++;
 			break;
+		case RTE_FLOW_ACTION_TYPE_COUNT:
+			if (masks->conf &&
+			    ((const struct rte_flow_action_count *)
+			     masks->conf)->id) {
+				err = flow_hw_cnt_compile(dev, i, acts);
+				if (err)
+					goto err;
+			} else if (__flow_hw_act_data_general_append
+					(priv, acts, actions->type,
+					 actions - action_start, i)) {
+				goto err;
+			}
+			i++;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -1377,6 +1465,13 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev,
 				(dev, &act_data, item_flags, rule_act))
 			return -1;
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
+		if (mlx5_hws_cnt_pool_get_action_offset(priv->hws_cpool,
+				act_idx,
+				&rule_act->action,
+				&rule_act->counter.offset))
+			return -1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type:%d", type);
 		break;
@@ -1520,7 +1615,8 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			  const uint8_t it_idx,
 			  const struct rte_flow_action actions[],
 			  struct mlx5dr_rule_action *rule_acts,
-			  uint32_t *acts_num)
+			  uint32_t *acts_num,
+			  uint32_t queue)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow_template_table *table = job->flow->table;
@@ -1574,6 +1670,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		uint64_t item_flags;
 		struct mlx5_hw_jump_action *jump;
 		struct mlx5_hrxq *hrxq;
+		cnt_id_t cnt_id;
 
 		action = &actions[act_data->action_src];
 		MLX5_ASSERT(action->type == RTE_FLOW_ACTION_TYPE_INDIRECT ||
@@ -1681,6 +1778,32 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			if (mlx5_aso_mtr_wait(priv->sh, mtr))
 				return -1;
 			break;
+		case RTE_FLOW_ACTION_TYPE_COUNT:
+			ret = mlx5_hws_cnt_pool_get(priv->hws_cpool, &queue,
+					&cnt_id);
+			if (ret != 0)
+				return ret;
+			ret = mlx5_hws_cnt_pool_get_action_offset
+				(priv->hws_cpool,
+				 cnt_id,
+				 &rule_acts[act_data->action_dst].action,
+				 &rule_acts[act_data->action_dst].counter.offset
+				 );
+			if (ret != 0)
+				return ret;
+			job->flow->cnt_id = cnt_id;
+			break;
+		case MLX5_RTE_FLOW_ACTION_TYPE_COUNT:
+			ret = mlx5_hws_cnt_pool_get_action_offset
+				(priv->hws_cpool,
+				 act_data->shared_counter.id,
+				 &rule_acts[act_data->action_dst].action,
+				 &rule_acts[act_data->action_dst].counter.offset
+				 );
+			if (ret != 0)
+				return ret;
+			job->flow->cnt_id = act_data->shared_counter.id;
+			break;
 		default:
 			break;
 		}
@@ -1690,6 +1813,8 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 				job->flow->idx - 1;
 		rule_acts[hw_acts->encap_decap_pos].reformat.data = buf;
 	}
+	if (mlx5_hws_cnt_id_valid(hw_acts->cnt_id))
+		job->flow->cnt_id = hw_acts->cnt_id;
 	return 0;
 }
 
@@ -1825,7 +1950,7 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	 * user's input, in order to save the cost.
 	 */
 	if (flow_hw_actions_construct(dev, job, hw_acts, pattern_template_index,
-				  actions, rule_acts, &acts_num)) {
+				  actions, rule_acts, &acts_num, queue)) {
 		rte_errno = EINVAL;
 		goto free;
 	}
@@ -1955,6 +2080,13 @@ flow_hw_pull(struct rte_eth_dev *dev,
 				flow_hw_jump_release(dev, job->flow->jump);
 			else if (job->flow->fate_type == MLX5_FLOW_FATE_QUEUE)
 				mlx5_hrxq_obj_release(dev, job->flow->hrxq);
+			if (mlx5_hws_cnt_id_valid(job->flow->cnt_id) &&
+			    mlx5_hws_cnt_is_shared
+				(priv->hws_cpool, job->flow->cnt_id) == false) {
+				mlx5_hws_cnt_pool_put(priv->hws_cpool, &queue,
+						&job->flow->cnt_id);
+				job->flow->cnt_id = 0;
+			}
 			mlx5_ipool_free(job->flow->table->flow, job->flow->idx);
 		}
 		priv->hw_q[queue].job[priv->hw_q[queue].job_idx++] = job;
@@ -2678,6 +2810,9 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 			if (ret < 0)
 				return ret;
 			break;
+		case RTE_FLOW_ACTION_TYPE_COUNT:
+			/* TODO: Validation logic */
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -4349,6 +4484,12 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	}
 	if (_queue_attr)
 		mlx5_free(_queue_attr);
+	if (port_attr->nb_counters) {
+		priv->hws_cpool = mlx5_hws_cnt_pool_create(dev, port_attr,
+				nb_queue);
+		if (priv->hws_cpool == NULL)
+			goto err;
+	}
 	return 0;
 err:
 	flow_hw_free_vport_actions(priv);
@@ -4418,6 +4559,8 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 		mlx5_ipool_destroy(priv->acts_ipool);
 		priv->acts_ipool = NULL;
 	}
+	if (priv->hws_cpool)
+		mlx5_hws_cnt_pool_destroy(priv->sh, priv->hws_cpool);
 	mlx5_free(priv->hw_q);
 	priv->hw_q = NULL;
 	claim_zero(mlx5dr_context_close(priv->dr_ctx));
@@ -4559,10 +4702,28 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 			     void *user_data,
 			     struct rte_flow_error *error)
 {
+	struct rte_flow_action_handle *handle = NULL;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	cnt_id_t cnt_id;
+
 	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
 	RTE_SET_USED(user_data);
-	return flow_dv_action_create(dev, conf, action, error);
+	switch (action->type) {
+	case RTE_FLOW_ACTION_TYPE_COUNT:
+		if (mlx5_hws_cnt_shared_get(priv->hws_cpool, &cnt_id))
+			rte_flow_error_set(error, ENODEV,
+					RTE_FLOW_ERROR_TYPE_ACTION,
+					NULL,
+					"counter are not configured!");
+		else
+			handle = (struct rte_flow_action_handle *)
+				 (uintptr_t)cnt_id;
+		break;
+	default:
+		handle = flow_dv_action_create(dev, conf, action, error);
+	}
+	return handle;
 }
 
 /**
@@ -4626,10 +4787,172 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 			      void *user_data,
 			      struct rte_flow_error *error)
 {
+	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
+	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+	struct mlx5_priv *priv = dev->data->dev_private;
+
 	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
 	RTE_SET_USED(user_data);
-	return flow_dv_action_destroy(dev, handle, error);
+	switch (type) {
+	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
+		return mlx5_hws_cnt_shared_put(priv->hws_cpool, &act_idx);
+	default:
+		return flow_dv_action_destroy(dev, handle, error);
+	}
+}
+
+static int
+flow_hw_query_counter(const struct rte_eth_dev *dev, uint32_t counter,
+		      void *data, struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hws_cnt *cnt;
+	struct rte_flow_query_count *qc = data;
+	uint32_t iidx = mlx5_hws_cnt_iidx(priv->hws_cpool, counter);
+	uint64_t pkts, bytes;
+
+	if (!mlx5_hws_cnt_id_valid(counter))
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				"counter are not available");
+	cnt = &priv->hws_cpool->pool[iidx];
+	__hws_cnt_query_raw(priv->hws_cpool, counter, &pkts, &bytes);
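+	/*
+	 * Report the delta since the last reset snapshot, and refresh the
+	 * snapshot when a reset is requested.
+	 */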
+	qc->hits_set = 1;
+	qc->bytes_set = 1;
+	qc->hits = pkts - cnt->reset.hits;
+	qc->bytes = bytes - cnt->reset.bytes;
+	if (qc->reset) {
+		cnt->reset.bytes = bytes;
+		cnt->reset.hits = pkts;
+	}
+	return 0;
+}
+
+static int
+flow_hw_query(struct rte_eth_dev *dev,
+	      struct rte_flow *flow __rte_unused,
+	      const struct rte_flow_action *actions __rte_unused,
+	      void *data __rte_unused,
+	      struct rte_flow_error *error __rte_unused)
+{
+	int ret = -EINVAL;
+	struct rte_flow_hw *hw_flow = (struct rte_flow_hw *)flow;
+
+	for (; actions->type != RTE_FLOW_ACTION_TYPE_END; actions++) {
+		switch (actions->type) {
+		case RTE_FLOW_ACTION_TYPE_VOID:
+			break;
+		case RTE_FLOW_ACTION_TYPE_COUNT:
+			ret = flow_hw_query_counter(dev, hw_flow->cnt_id, data,
+						  error);
+			break;
+		default:
+			return rte_flow_error_set(error, ENOTSUP,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  actions,
+						  "action not supported");
+		}
+	}
+	return ret;
+}
+
+/**
+ * Create indirect action.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] conf
+ *   Shared action configuration.
+ * @param[in] action
+ *   Action specification used to create indirect action.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. Initialized in case of
+ *   error only.
+ *
+ * @return
+ *   A valid shared action handle in case of success, NULL otherwise and
+ *   rte_errno is set.
+ */
+static struct rte_flow_action_handle *
+flow_hw_action_create(struct rte_eth_dev *dev,
+		       const struct rte_flow_indir_action_conf *conf,
+		       const struct rte_flow_action *action,
+		       struct rte_flow_error *err)
+{
+	return flow_hw_action_handle_create(dev, UINT32_MAX, NULL, conf, action,
+					    NULL, err);
+}
+
+/**
+ * Destroy the indirect action.
+ * Release action related resources on the NIC and the memory.
+ * Lock free, (mutex should be acquired by caller).
+ * Dispatcher for action type specific call.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] handle
+ *   The indirect action object handle to be removed.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. Initialized in case of
+ *   error only.
+ *
+ * @return
+ *   0 on success, otherwise negative errno value.
+ */
+static int
+flow_hw_action_destroy(struct rte_eth_dev *dev,
+		       struct rte_flow_action_handle *handle,
+		       struct rte_flow_error *error)
+{
+	return flow_hw_action_handle_destroy(dev, UINT32_MAX, NULL, handle,
+			NULL, error);
+}
+
+/**
+ * Updates in place shared action configuration.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] handle
+ *   The indirect action object handle to be updated.
+ * @param[in] update
+ *   Action specification used to modify the action pointed by *handle*.
+ *   *update* could be of same type with the action pointed by the *handle*
+ *   handle argument, or some other structures like a wrapper, depending on
+ *   the indirect action type.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. Initialized in case of
+ *   error only.
+ *
+ * @return
+ *   0 on success, otherwise negative errno value.
+ */
+static int
+flow_hw_action_update(struct rte_eth_dev *dev,
+		      struct rte_flow_action_handle *handle,
+		      const void *update,
+		      struct rte_flow_error *err)
+{
+	return flow_hw_action_handle_update(dev, UINT32_MAX, NULL, handle,
+			update, NULL, err);
+}
+
+static int
+flow_hw_action_query(struct rte_eth_dev *dev,
+		     const struct rte_flow_action_handle *handle, void *data,
+		     struct rte_flow_error *error)
+{
+	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
+	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+
+	switch (type) {
+	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
+		return flow_hw_query_counter(dev, act_idx, data, error);
+	default:
+		return flow_dv_action_query(dev, handle, data, error);
+	}
 }
 
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
@@ -4651,10 +4974,11 @@ const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.async_action_destroy = flow_hw_action_handle_destroy,
 	.async_action_update = flow_hw_action_handle_update,
 	.action_validate = flow_dv_action_validate,
-	.action_create = flow_dv_action_create,
-	.action_destroy = flow_dv_action_destroy,
-	.action_update = flow_dv_action_update,
-	.action_query = flow_dv_action_query,
+	.action_create = flow_hw_action_create,
+	.action_destroy = flow_hw_action_destroy,
+	.action_update = flow_hw_action_update,
+	.action_query = flow_hw_action_query,
+	.query = flow_hw_query,
 };
 
 /**
diff --git a/drivers/net/mlx5/mlx5_hws_cnt.c b/drivers/net/mlx5/mlx5_hws_cnt.c
new file mode 100644
index 0000000000..d826ebaa25
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_hws_cnt.c
@@ -0,0 +1,528 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2020 Mellanox Technologies, Ltd
+ */
+
+#include <stdint.h>
+#include <rte_malloc.h>
+#include <mlx5_malloc.h>
+#include <rte_ring.h>
+#include <mlx5_devx_cmds.h>
+#include <rte_cycles.h>
+
+#if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
+
+#include "mlx5_utils.h"
+#include "mlx5_hws_cnt.h"
+
+#define HWS_CNT_CACHE_SZ_DEFAULT 511
+#define HWS_CNT_CACHE_PRELOAD_DEFAULT 254
+#define HWS_CNT_CACHE_FETCH_DEFAULT 254
+#define HWS_CNT_CACHE_THRESHOLD_DEFAULT 254
+#define HWS_CNT_ALLOC_FACTOR_DEFAULT 20
+
+static void
+__hws_cnt_id_load(struct mlx5_hws_cnt_pool *cpool)
+{
+	uint32_t preload;
+	uint32_t q_num = cpool->cache->q_num;
+	uint32_t cnt_num = mlx5_hws_cnt_pool_get_size(cpool);
+	cnt_id_t cnt_id, iidx = 0;
+	uint32_t qidx;
+	struct rte_ring *qcache = NULL;
+
+	/*
+	 * Counter ID order is important for tracking the maximum number of
+	 * in-use counters to query, which means the counter internal index
+	 * order must run from zero up to the number configured by the user,
+	 * e.g. 0 - 8000000.
+	 * Counter IDs are loaded in this order into the per-queue caches
+	 * first, and then into the global free list.
+	 * As a result, the user fetches counters from the minimum to the
+	 * maximum index.
+	 */
+	preload = RTE_MIN(cpool->cache->preload_sz, cnt_num / q_num);
+	for (qidx = 0; qidx < q_num; qidx++) {
+		for (; iidx < preload * (qidx + 1); iidx++) {
+			cnt_id = mlx5_hws_cnt_id_gen(cpool, iidx);
+			qcache = cpool->cache->qcache[qidx];
+			if (qcache)
+				rte_ring_enqueue_elem(qcache, &cnt_id,
+						sizeof(cnt_id));
+		}
+	}
+	for (; iidx < cnt_num; iidx++) {
+		cnt_id = mlx5_hws_cnt_id_gen(cpool, iidx);
+		rte_ring_enqueue_elem(cpool->free_list, &cnt_id,
+				sizeof(cnt_id));
+	}
+}
+
+static void
+__mlx5_hws_cnt_svc(struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_hws_cnt_pool *cpool)
+{
+	struct rte_ring *reset_list = cpool->wait_reset_list;
+	struct rte_ring *reuse_list = cpool->reuse_list;
+	uint32_t reset_cnt_num;
+	struct rte_ring_zc_data zcdr = {0};
+	struct rte_ring_zc_data zcdu = {0};
+
+	reset_cnt_num = rte_ring_count(reset_list);
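+	/*
+	 * Query the hardware, then move the counters waiting for reset to the
+	 * reuse list so they can be handed out again with fresh base values.
+	 */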
+	do {
+		cpool->query_gen++;
+		mlx5_aso_cnt_query(sh, cpool);
+		zcdr.n1 = 0;
+		zcdu.n1 = 0;
+		rte_ring_enqueue_zc_burst_elem_start(reuse_list,
+				sizeof(cnt_id_t), reset_cnt_num, &zcdu,
+				NULL);
+		rte_ring_dequeue_zc_burst_elem_start(reset_list,
+				sizeof(cnt_id_t), reset_cnt_num, &zcdr,
+				NULL);
+		__hws_cnt_r2rcpy(&zcdu, &zcdr, reset_cnt_num);
+		rte_ring_dequeue_zc_elem_finish(reset_list,
+				reset_cnt_num);
+		rte_ring_enqueue_zc_elem_finish(reuse_list,
+				reset_cnt_num);
+		reset_cnt_num = rte_ring_count(reset_list);
+	} while (reset_cnt_num > 0);
+}
+
+static void
+mlx5_hws_cnt_raw_data_free(struct mlx5_dev_ctx_shared *sh,
+			   struct mlx5_hws_cnt_raw_data_mng *mng)
+{
+	if (mng == NULL)
+		return;
+	sh->cdev->mr_scache.dereg_mr_cb(&mng->mr);
+	mlx5_free(mng->raw);
+	mlx5_free(mng);
+}
+
+__rte_unused
+static struct mlx5_hws_cnt_raw_data_mng *
+mlx5_hws_cnt_raw_data_alloc(struct mlx5_dev_ctx_shared *sh, uint32_t n)
+{
+	struct mlx5_hws_cnt_raw_data_mng *mng = NULL;
+	int ret;
+	size_t sz = n * sizeof(struct flow_counter_stats);
+
+	mng = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO, sizeof(*mng), 0,
+			SOCKET_ID_ANY);
+	if (mng == NULL)
+		goto error;
+	mng->raw = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO, sz, 0,
+			SOCKET_ID_ANY);
+	if (mng->raw == NULL)
+		goto error;
+	ret = sh->cdev->mr_scache.reg_mr_cb(sh->cdev->pd, mng->raw, sz,
+					    &mng->mr);
+	if (ret) {
+		rte_errno = errno;
+		goto error;
+	}
+	return mng;
+error:
+	mlx5_hws_cnt_raw_data_free(sh, mng);
+	return NULL;
+}
+
+static void *
+mlx5_hws_cnt_svc(void *opaque)
+{
+	struct mlx5_dev_ctx_shared *sh =
+		(struct mlx5_dev_ctx_shared *)opaque;
+	uint64_t interval =
+		(uint64_t)sh->cnt_svc->query_interval * (US_PER_S / MS_PER_S);
+	uint16_t port_id;
+	uint64_t start_cycle, query_cycle = 0;
+	uint64_t query_us;
+	uint64_t sleep_us;
+
+	while (sh->cnt_svc->svc_running != 0) {
+		start_cycle = rte_rdtsc();
+		MLX5_ETH_FOREACH_DEV(port_id, sh->cdev->dev) {
+			struct mlx5_priv *opriv =
+				rte_eth_devices[port_id].data->dev_private;
+			if (opriv != NULL &&
+			    opriv->sh == sh &&
+			    opriv->hws_cpool != NULL) {
+				__mlx5_hws_cnt_svc(sh, opriv->hws_cpool);
+			}
+		}
+		query_cycle = rte_rdtsc() - start_cycle;
+		query_us = query_cycle / (rte_get_timer_hz() / US_PER_S);
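+		/* Sleep only for the remainder of the configured cycle time. */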
+		sleep_us = interval - query_us;
+		if (interval > query_us)
+			rte_delay_us_sleep(sleep_us);
+	}
+	return NULL;
+}
+
+struct mlx5_hws_cnt_pool *
+mlx5_hws_cnt_pool_init(const struct mlx5_hws_cnt_pool_cfg *pcfg,
+		const struct mlx5_hws_cache_param *ccfg)
+{
+	char mz_name[RTE_MEMZONE_NAMESIZE];
+	struct mlx5_hws_cnt_pool *cntp;
+	uint64_t cnt_num = 0;
+	uint32_t qidx;
+
+	MLX5_ASSERT(pcfg);
+	MLX5_ASSERT(ccfg);
+	cntp = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO, sizeof(*cntp), 0,
+			   SOCKET_ID_ANY);
+	if (cntp == NULL)
+		return NULL;
+
+	cntp->cfg = *pcfg;
+	cntp->cache = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO,
+			sizeof(*cntp->cache) +
+			sizeof(((struct mlx5_hws_cnt_pool_caches *)0)->qcache[0])
+				* ccfg->q_num, 0, SOCKET_ID_ANY);
+	if (cntp->cache == NULL)
+		goto error;
+	/* Store the necessary cache parameters. */
+	cntp->cache->fetch_sz = ccfg->fetch_sz;
+	cntp->cache->preload_sz = ccfg->preload_sz;
+	cntp->cache->threshold = ccfg->threshold;
+	cntp->cache->q_num = ccfg->q_num;
+	cnt_num = pcfg->request_num * (100 + pcfg->alloc_factor) / 100;
+	if (cnt_num > UINT32_MAX) {
+		DRV_LOG(ERR, "counter number %"PRIu64" is out of 32bit range",
+			cnt_num);
+		goto error;
+	}
+	cntp->pool = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO,
+			sizeof(struct mlx5_hws_cnt) *
+			pcfg->request_num * (100 + pcfg->alloc_factor) / 100,
+			0, SOCKET_ID_ANY);
+	if (cntp->pool == NULL)
+		goto error;
+	snprintf(mz_name, sizeof(mz_name), "%s_F_RING", pcfg->name);
+	cntp->free_list = rte_ring_create_elem(mz_name, sizeof(cnt_id_t),
+			(uint32_t)cnt_num, SOCKET_ID_ANY,
+			RING_F_SP_ENQ | RING_F_MC_HTS_DEQ | RING_F_EXACT_SZ);
+	if (cntp->free_list == NULL) {
+		DRV_LOG(ERR, "failed to create free list ring");
+		goto error;
+	}
+	snprintf(mz_name, sizeof(mz_name), "%s_R_RING", pcfg->name);
+	cntp->wait_reset_list = rte_ring_create_elem(mz_name, sizeof(cnt_id_t),
+			(uint32_t)cnt_num, SOCKET_ID_ANY,
+			RING_F_MP_HTS_ENQ | RING_F_SC_DEQ | RING_F_EXACT_SZ);
+	if (cntp->wait_reset_list == NULL) {
+		DRV_LOG(ERR, "failed to create free list ring");
+		goto error;
+	}
+	snprintf(mz_name, sizeof(mz_name), "%s_U_RING", pcfg->name);
+	cntp->reuse_list = rte_ring_create_elem(mz_name, sizeof(cnt_id_t),
+			(uint32_t)cnt_num, SOCKET_ID_ANY,
+			RING_F_SP_ENQ | RING_F_MC_HTS_DEQ | RING_F_EXACT_SZ);
+	if (cntp->reuse_list == NULL) {
+		DRV_LOG(ERR, "failed to create reuse list ring");
+		goto error;
+	}
+	for (qidx = 0; qidx < ccfg->q_num; qidx++) {
+		snprintf(mz_name, sizeof(mz_name), "%s_cache/%u", pcfg->name,
+				qidx);
+		cntp->cache->qcache[qidx] = rte_ring_create(mz_name, ccfg->size,
+				SOCKET_ID_ANY,
+				RING_F_SP_ENQ | RING_F_SC_DEQ |
+				RING_F_EXACT_SZ);
+		if (cntp->cache->qcache[qidx] == NULL)
+			goto error;
+	}
+	return cntp;
+error:
+	mlx5_hws_cnt_pool_deinit(cntp);
+	return NULL;
+}
+
+void
+mlx5_hws_cnt_pool_deinit(struct mlx5_hws_cnt_pool * const cntp)
+{
+	uint32_t qidx = 0;
+	if (cntp == NULL)
+		return;
+	rte_ring_free(cntp->free_list);
+	rte_ring_free(cntp->wait_reset_list);
+	rte_ring_free(cntp->reuse_list);
+	if (cntp->cache) {
+		for (qidx = 0; qidx < cntp->cache->q_num; qidx++)
+			rte_ring_free(cntp->cache->qcache[qidx]);
+	}
+	mlx5_free(cntp->cache);
+	mlx5_free(cntp->raw_mng);
+	mlx5_free(cntp->pool);
+	mlx5_free(cntp);
+}
+
+int
+mlx5_hws_cnt_service_thread_create(struct mlx5_dev_ctx_shared *sh)
+{
+#define CNT_THREAD_NAME_MAX 256
+	char name[CNT_THREAD_NAME_MAX];
+	rte_cpuset_t cpuset;
+	int ret;
+	uint32_t service_core = sh->cnt_svc->service_core;
+
+	CPU_ZERO(&cpuset);
+	sh->cnt_svc->svc_running = 1;
+	ret = pthread_create(&sh->cnt_svc->service_thread, NULL,
+			mlx5_hws_cnt_svc, sh);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Failed to create HW steering's counter service thread.");
+		return -ENOSYS;
+	}
+	snprintf(name, CNT_THREAD_NAME_MAX - 1, "%s/svc@%d",
+		 sh->ibdev_name, service_core);
+	rte_thread_setname(sh->cnt_svc->service_thread, name);
+	CPU_SET(service_core, &cpuset);
+	pthread_setaffinity_np(sh->cnt_svc->service_thread, sizeof(cpuset),
+				&cpuset);
+	return 0;
+}
+
+void
+mlx5_hws_cnt_service_thread_destroy(struct mlx5_dev_ctx_shared *sh)
+{
+	if (sh->cnt_svc->service_thread == 0)
+		return;
+	sh->cnt_svc->svc_running = 0;
+	pthread_join(sh->cnt_svc->service_thread, NULL);
+	sh->cnt_svc->service_thread = 0;
+}
+
+int
+mlx5_hws_cnt_pool_dcs_alloc(struct mlx5_dev_ctx_shared *sh,
+			    struct mlx5_hws_cnt_pool *cpool)
+{
+	struct mlx5_hca_attr *hca_attr = &sh->cdev->config.hca_attr;
+	uint32_t max_log_bulk_sz = 0;
+	uint32_t log_bulk_sz;
+	uint32_t idx, alloced = 0;
+	unsigned int cnt_num = mlx5_hws_cnt_pool_get_size(cpool);
+	struct mlx5_devx_counter_attr attr = {0};
+	struct mlx5_devx_obj *dcs;
+
+	if (hca_attr->flow_counter_bulk_log_max_alloc == 0) {
+		DRV_LOG(ERR,
+			"Fw doesn't support bulk log max alloc");
+		return -1;
+	}
+	max_log_bulk_sz = 23; /* Hard-coded to 8M (1 << 23) counters. */
+	cnt_num = RTE_ALIGN_CEIL(cnt_num, 4); /* Minimum of 4 counters per bulk. */
+	log_bulk_sz = RTE_MIN(max_log_bulk_sz, rte_log2_u32(cnt_num));
+	attr.pd = sh->cdev->pdn;
+	attr.pd_valid = 1;
+	attr.bulk_log_max_alloc = 1;
+	attr.flow_counter_bulk_log_size = log_bulk_sz;
+	idx = 0;
+	dcs = mlx5_devx_cmd_flow_counter_alloc_general(sh->cdev->ctx, &attr);
+	if (dcs == NULL)
+		goto error;
+	cpool->dcs_mng.dcs[idx].obj = dcs;
+	cpool->dcs_mng.dcs[idx].batch_sz = (1 << log_bulk_sz);
+	cpool->dcs_mng.batch_total++;
+	idx++;
+	cpool->dcs_mng.dcs[0].iidx = 0;
+	alloced = cpool->dcs_mng.dcs[0].batch_sz;
+	if (cnt_num > cpool->dcs_mng.dcs[0].batch_sz) {
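+		/*
+		 * The first bulk may not cover the request; allocate the
+		 * remaining DCS objects with progressively halved bulk sizes.
+		 */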
+		for (; idx < MLX5_HWS_CNT_DCS_NUM; idx++) {
+			attr.flow_counter_bulk_log_size = --max_log_bulk_sz;
+			dcs = mlx5_devx_cmd_flow_counter_alloc_general
+				(sh->cdev->ctx, &attr);
+			if (dcs == NULL)
+				goto error;
+			cpool->dcs_mng.dcs[idx].obj = dcs;
+			cpool->dcs_mng.dcs[idx].batch_sz =
+				(1 << max_log_bulk_sz);
+			cpool->dcs_mng.dcs[idx].iidx = alloced;
+			alloced += cpool->dcs_mng.dcs[idx].batch_sz;
+			cpool->dcs_mng.batch_total++;
+		}
+	}
+	return 0;
+error:
+	DRV_LOG(DEBUG,
+		"Cannot alloc device counter, allocated[%" PRIu32 "] request[%" PRIu32 "]",
+		alloced, cnt_num);
+	for (idx = 0; idx < cpool->dcs_mng.batch_total; idx++) {
+		mlx5_devx_cmd_destroy(cpool->dcs_mng.dcs[idx].obj);
+		cpool->dcs_mng.dcs[idx].obj = NULL;
+		cpool->dcs_mng.dcs[idx].batch_sz = 0;
+		cpool->dcs_mng.dcs[idx].iidx = 0;
+	}
+	cpool->dcs_mng.batch_total = 0;
+	return -1;
+}
+
+void
+mlx5_hws_cnt_pool_dcs_free(struct mlx5_dev_ctx_shared *sh,
+			   struct mlx5_hws_cnt_pool *cpool)
+{
+	uint32_t idx;
+
+	if (cpool == NULL)
+		return;
+	for (idx = 0; idx < MLX5_HWS_CNT_DCS_NUM; idx++)
+		mlx5_devx_cmd_destroy(cpool->dcs_mng.dcs[idx].obj);
+	if (cpool->raw_mng) {
+		mlx5_hws_cnt_raw_data_free(sh, cpool->raw_mng);
+		cpool->raw_mng = NULL;
+	}
+}
+
+int
+mlx5_hws_cnt_pool_action_create(struct mlx5_priv *priv,
+		struct mlx5_hws_cnt_pool *cpool)
+{
+	uint32_t idx;
+	int ret = 0;
+	struct mlx5_hws_cnt_dcs *dcs;
+	uint32_t flags;
+
+	flags = MLX5DR_ACTION_FLAG_HWS_RX | MLX5DR_ACTION_FLAG_HWS_TX;
+	if (priv->sh->config.dv_esw_en && priv->master)
+		flags |= MLX5DR_ACTION_FLAG_HWS_FDB;
+	for (idx = 0; idx < cpool->dcs_mng.batch_total; idx++) {
+		dcs = &cpool->dcs_mng.dcs[idx];
+		dcs->dr_action = mlx5dr_action_create_counter(priv->dr_ctx,
+					(struct mlx5dr_devx_obj *)dcs->obj,
+					flags);
+		if (dcs->dr_action == NULL) {
+			mlx5_hws_cnt_pool_action_destroy(cpool);
+			ret = -ENOSYS;
+			break;
+		}
+	}
+	return ret;
+}
+
+void
+mlx5_hws_cnt_pool_action_destroy(struct mlx5_hws_cnt_pool *cpool)
+{
+	uint32_t idx;
+	struct mlx5_hws_cnt_dcs *dcs;
+
+	for (idx = 0; idx < cpool->dcs_mng.batch_total; idx++) {
+		dcs = &cpool->dcs_mng.dcs[idx];
+		if (dcs->dr_action != NULL) {
+			mlx5dr_action_destroy(dcs->dr_action);
+			dcs->dr_action = NULL;
+		}
+	}
+}
+
+struct mlx5_hws_cnt_pool *
+mlx5_hws_cnt_pool_create(struct rte_eth_dev *dev,
+		const struct rte_flow_port_attr *pattr, uint16_t nb_queue)
+{
+	struct mlx5_hws_cnt_pool *cpool = NULL;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hws_cache_param cparam = {0};
+	struct mlx5_hws_cnt_pool_cfg pcfg = {0};
+	char *mp_name;
+	int ret = 0;
+	size_t sz;
+
+	/* Initialize the counter service if it is not running yet. */
+	if (priv->sh->cnt_svc == NULL) {
+		ret = mlx5_hws_cnt_svc_init(priv->sh);
+		if (ret != 0)
+			return NULL;
+	}
+	cparam.fetch_sz = HWS_CNT_CACHE_FETCH_DEFAULT;
+	cparam.preload_sz = HWS_CNT_CACHE_PRELOAD_DEFAULT;
+	cparam.q_num = nb_queue;
+	cparam.threshold = HWS_CNT_CACHE_THRESHOLD_DEFAULT;
+	cparam.size = HWS_CNT_CACHE_SZ_DEFAULT;
+	pcfg.alloc_factor = HWS_CNT_ALLOC_FACTOR_DEFAULT;
+	mp_name = mlx5_malloc(MLX5_MEM_ZERO, RTE_MEMZONE_NAMESIZE, 0,
+			SOCKET_ID_ANY);
+	if (mp_name == NULL)
+		goto error;
+	snprintf(mp_name, RTE_MEMZONE_NAMESIZE, "MLX5_HWS_CNT_POOL_%u",
+			dev->data->port_id);
+	pcfg.name = mp_name;
+	pcfg.request_num = pattr->nb_counters;
+	cpool = mlx5_hws_cnt_pool_init(&pcfg, &cparam);
+	if (cpool == NULL)
+		goto error;
+	ret = mlx5_hws_cnt_pool_dcs_alloc(priv->sh, cpool);
+	if (ret != 0)
+		goto error;
+	sz = RTE_ALIGN_CEIL(mlx5_hws_cnt_pool_get_size(cpool), 4);
+	cpool->raw_mng = mlx5_hws_cnt_raw_data_alloc(priv->sh, sz);
+	if (cpool->raw_mng == NULL)
+		goto error;
+	__hws_cnt_id_load(cpool);
+	/*
+	 * Bump the query generation right after pool creation so the
+	 * pre-loaded counters can be used directly: they already hold
+	 * their initial values and there is no need to wait for a query.
+	 */
+	cpool->query_gen = 1;
+	ret = mlx5_hws_cnt_pool_action_create(priv, cpool);
+	if (ret != 0)
+		goto error;
+	priv->sh->cnt_svc->refcnt++;
+	return cpool;
+error:
+	mlx5_hws_cnt_pool_destroy(priv->sh, cpool);
+	return NULL;
+}
+
+void
+mlx5_hws_cnt_pool_destroy(struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_hws_cnt_pool *cpool)
+{
+	if (cpool == NULL)
+		return;
+	if (--sh->cnt_svc->refcnt == 0)
+		mlx5_hws_cnt_svc_deinit(sh);
+	mlx5_hws_cnt_pool_action_destroy(cpool);
+	mlx5_hws_cnt_pool_dcs_free(sh, cpool);
+	mlx5_hws_cnt_raw_data_free(sh, cpool->raw_mng);
+	mlx5_free((void *)cpool->cfg.name);
+	mlx5_hws_cnt_pool_deinit(cpool);
+}
+
+int
+mlx5_hws_cnt_svc_init(struct mlx5_dev_ctx_shared *sh)
+{
+	int ret;
+
+	sh->cnt_svc = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO,
+			sizeof(*sh->cnt_svc), 0, SOCKET_ID_ANY);
+	if (sh->cnt_svc == NULL)
+		return -1;
+	sh->cnt_svc->query_interval = sh->config.cnt_svc.cycle_time;
+	sh->cnt_svc->service_core = sh->config.cnt_svc.service_core;
+	ret = mlx5_aso_cnt_queue_init(sh);
+	if (ret != 0) {
+		mlx5_free(sh->cnt_svc);
+		sh->cnt_svc = NULL;
+		return -1;
+	}
+	ret = mlx5_hws_cnt_service_thread_create(sh);
+	if (ret != 0) {
+		mlx5_aso_cnt_queue_uninit(sh);
+		mlx5_free(sh->cnt_svc);
+		sh->cnt_svc = NULL;
+	}
+	return 0;
+}
+
+void
+mlx5_hws_cnt_svc_deinit(struct mlx5_dev_ctx_shared *sh)
+{
+	if (sh->cnt_svc == NULL)
+		return;
+	mlx5_hws_cnt_service_thread_destroy(sh);
+	mlx5_aso_cnt_queue_uninit(sh);
+	mlx5_free(sh->cnt_svc);
+	sh->cnt_svc = NULL;
+}
+
+#endif
diff --git a/drivers/net/mlx5/mlx5_hws_cnt.h b/drivers/net/mlx5/mlx5_hws_cnt.h
new file mode 100644
index 0000000000..5fab4ba597
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_hws_cnt.h
@@ -0,0 +1,558 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2022 Mellanox Technologies, Ltd
+ */
+
+#ifndef _MLX5_HWS_CNT_H_
+#define _MLX5_HWS_CNT_H_
+
+#include <rte_ring.h>
+#include "mlx5_utils.h"
+#include "mlx5_flow.h"
+
+/*
+ * COUNTER ID's layout
+ *       3                   2                   1                   0
+ *     1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    | T |       | D |                                               |
+ *    ~ Y |       | C |                    IDX                        ~
+ *    | P |       | S |                                               |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *
+ *    Bit 31:30 = TYPE = MLX5_INDIRECT_ACTION_TYPE_COUNT = b'10
+ *    Bit 25:24 = DCS index
+ *    Bit 23:00 = IDX within the DCS bulk this counter belongs to.
+ */
+typedef uint32_t cnt_id_t;
+
+#define MLX5_HWS_CNT_DCS_NUM 4
+#define MLX5_HWS_CNT_DCS_IDX_OFFSET 24
+#define MLX5_HWS_CNT_DCS_IDX_MASK 0x3
+#define MLX5_HWS_CNT_IDX_MASK ((1UL << MLX5_HWS_CNT_DCS_IDX_OFFSET) - 1)
+
+struct mlx5_hws_cnt_dcs {
+	void *dr_action;
+	uint32_t batch_sz;
+	uint32_t iidx; /* internal index of first counter in this bulk. */
+	struct mlx5_devx_obj *obj;
+};
+
+struct mlx5_hws_cnt_dcs_mng {
+	uint32_t batch_total;
+	struct mlx5_hws_cnt_dcs dcs[MLX5_HWS_CNT_DCS_NUM];
+};
+
+struct mlx5_hws_cnt {
+	struct flow_counter_stats reset;
+	union {
+		uint32_t share: 1;
+		/*
+		 * share is set to 1 when this counter is used as an indirect
+		 * action. Only meaningful while the user owns this counter.
+		 */
+		uint32_t query_gen_when_free;
+		/*
+		 * When the PMD owns this counter (i.e. the user has put it
+		 * back into the PMD counter pool), this field records the
+		 * pool's query generation at the time the user released it.
+		 */
+	};
+};
+
+struct mlx5_hws_cnt_raw_data_mng {
+	struct flow_counter_stats *raw;
+	struct mlx5_pmd_mr mr;
+};
+
+struct mlx5_hws_cache_param {
+	uint32_t size;
+	uint32_t q_num;
+	uint32_t fetch_sz;
+	uint32_t threshold;
+	uint32_t preload_sz;
+};
+
+struct mlx5_hws_cnt_pool_cfg {
+	char *name;
+	uint32_t request_num;
+	uint32_t alloc_factor;
+};
+
+struct mlx5_hws_cnt_pool_caches {
+	uint32_t fetch_sz;
+	uint32_t threshold;
+	uint32_t preload_sz;
+	uint32_t q_num;
+	struct rte_ring *qcache[];
+};
+
+struct mlx5_hws_cnt_pool {
+	struct mlx5_hws_cnt_pool_cfg cfg __rte_cache_aligned;
+	struct mlx5_hws_cnt_dcs_mng dcs_mng __rte_cache_aligned;
+	uint32_t query_gen __rte_cache_aligned;
+	struct mlx5_hws_cnt *pool;
+	struct mlx5_hws_cnt_raw_data_mng *raw_mng;
+	struct rte_ring *reuse_list;
+	struct rte_ring *free_list;
+	struct rte_ring *wait_reset_list;
+	struct mlx5_hws_cnt_pool_caches *cache;
+} __rte_cache_aligned;
+
+/**
+ * Translate a counter id into an internal index (starting from 0), which can
+ * be used as an index into the raw data / counter pool.
+ *
+ * @param cnt_id
+ *   The external counter id
+ * @return
+ *   Internal index
+ */
+static __rte_always_inline cnt_id_t
+mlx5_hws_cnt_iidx(struct mlx5_hws_cnt_pool *cpool, cnt_id_t cnt_id)
+{
+	uint8_t dcs_idx = cnt_id >> MLX5_HWS_CNT_DCS_IDX_OFFSET;
+	uint32_t offset = cnt_id & MLX5_HWS_CNT_IDX_MASK;
+
+	dcs_idx &= MLX5_HWS_CNT_DCS_IDX_MASK;
+	return (cpool->dcs_mng.dcs[dcs_idx].iidx + offset);
+}
+
+/**
+ * Check whether the given counter id is valid.
+ */
+static __rte_always_inline bool
+mlx5_hws_cnt_id_valid(cnt_id_t cnt_id)
+{
+	return (cnt_id >> MLX5_INDIRECT_ACTION_TYPE_OFFSET) ==
+		MLX5_INDIRECT_ACTION_TYPE_COUNT ? true : false;
+}
+
+/**
+ * Generate Counter id from internal index.
+ *
+ * @param cpool
+ *   The pointer to counter pool
+ * @param index
+ *   The internal counter index.
+ *
+ * @return
+ *   Counter id
+ */
+static __rte_always_inline cnt_id_t
+mlx5_hws_cnt_id_gen(struct mlx5_hws_cnt_pool *cpool, cnt_id_t iidx)
+{
+	struct mlx5_hws_cnt_dcs_mng *dcs_mng = &cpool->dcs_mng;
+	uint32_t idx;
+	uint32_t offset;
+	cnt_id_t cnt_id;
+
+	for (idx = 0, offset = iidx; idx < dcs_mng->batch_total; idx++) {
+		if (dcs_mng->dcs[idx].batch_sz <= offset)
+			offset -= dcs_mng->dcs[idx].batch_sz;
+		else
+			break;
+	}
+	cnt_id = offset;
+	cnt_id |= (idx << MLX5_HWS_CNT_DCS_IDX_OFFSET);
+	return (MLX5_INDIRECT_ACTION_TYPE_COUNT <<
+			MLX5_INDIRECT_ACTION_TYPE_OFFSET) | cnt_id;
+}
+
+static __rte_always_inline void
+__hws_cnt_query_raw(struct mlx5_hws_cnt_pool *cpool, cnt_id_t cnt_id,
+		uint64_t *raw_pkts, uint64_t *raw_bytes)
+{
+	struct mlx5_hws_cnt_raw_data_mng *raw_mng = cpool->raw_mng;
+	struct flow_counter_stats s[2];
+	uint8_t i = 0x1;
+	size_t stat_sz = sizeof(s[0]);
+	uint32_t iidx = mlx5_hws_cnt_iidx(cpool, cnt_id);
+
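+	/*
+	 * Read the statistics twice until both reads match, so a consistent
+	 * snapshot is taken while the service thread may update the raw data
+	 * concurrently.
+	 */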
+	memcpy(&s[0], &raw_mng->raw[iidx], stat_sz);
+	do {
+		memcpy(&s[i & 1], &raw_mng->raw[iidx], stat_sz);
+		if (memcmp(&s[0], &s[1], stat_sz) == 0) {
+			*raw_pkts = rte_be_to_cpu_64(s[0].hits);
+			*raw_bytes = rte_be_to_cpu_64(s[0].bytes);
+			break;
+		}
+		i = ~i;
+	} while (1);
+}
+
+/**
+ * Copy elements from one zero-copy ring into another zero-copy ring in place.
+ *
+ * The input is an rte_ring zero-copy data structure holding two pointers;
+ * when the ring wraps around, ptr2 becomes meaningful.
+ *
+ * This routine therefore has to handle the case where both the source and
+ * the destination regions wrap.
+ * First, copy the number of elements up to the first wrap address, which may
+ * be in either the source or the destination.
+ * Second, copy the remaining elements up to the second wrap address; if the
+ * first wrap was in the source, this one must be in the destination, and
+ * vice versa.
+ * Third, copy whatever elements are left.
+ *
+ * In the worst case, three contiguous memory chunks are copied.
+ *
+ * @param zcdd
+ *   A pointer to zero-copy data of dest ring.
+ * @param zcds
+ *   A pointer to zero-copy data of source ring.
+ * @param n
+ *   Number of elems to copy.
+ */
+static __rte_always_inline void
+__hws_cnt_r2rcpy(struct rte_ring_zc_data *zcdd, struct rte_ring_zc_data *zcds,
+		unsigned int n)
+{
+	unsigned int n1, n2, n3;
+	void *s1, *s2, *s3;
+	void *d1, *d2, *d3;
+
+	s1 = zcds->ptr1;
+	d1 = zcdd->ptr1;
+	n1 = RTE_MIN(zcdd->n1, zcds->n1);
+	if (zcds->n1 > n1) {
+		n2 = zcds->n1 - n1;
+		s2 = RTE_PTR_ADD(zcds->ptr1, sizeof(cnt_id_t) * n1);
+		d2 = zcdd->ptr2;
+		n3 = n - n1 - n2;
+		s3 = zcds->ptr2;
+		d3 = RTE_PTR_ADD(zcdd->ptr2, sizeof(cnt_id_t) * n2);
+	} else {
+		n2 = zcdd->n1 - n1;
+		s2 = zcds->ptr2;
+		d2 = RTE_PTR_ADD(zcdd->ptr1, sizeof(cnt_id_t) * n1);
+		n3 = n - n1 - n2;
+		s3 = RTE_PTR_ADD(zcds->ptr2, sizeof(cnt_id_t) * n2);
+		d3 = zcdd->ptr2;
+	}
+	memcpy(d1, s1, n1 * sizeof(cnt_id_t));
+	if (n2 != 0) {
+		memcpy(d2, s2, n2 * sizeof(cnt_id_t));
+		if (n3 != 0)
+			memcpy(d3, s3, n3 * sizeof(cnt_id_t));
+	}
+}
+
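+/**
+ * Flush all counters cached on one queue to the wait-reset list.
+ */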
+static __rte_always_inline int
+mlx5_hws_cnt_pool_cache_flush(struct mlx5_hws_cnt_pool *cpool,
+			      uint32_t queue_id)
+{
+	unsigned int ret;
+	struct rte_ring_zc_data zcdr = {0};
+	struct rte_ring_zc_data zcdc = {0};
+	struct rte_ring *reset_list = NULL;
+	struct rte_ring *qcache = cpool->cache->qcache[queue_id];
+
+	ret = rte_ring_dequeue_zc_burst_elem_start(qcache,
+			sizeof(cnt_id_t), rte_ring_count(qcache), &zcdc,
+			NULL);
+	MLX5_ASSERT(ret);
+	reset_list = cpool->wait_reset_list;
+	rte_ring_enqueue_zc_burst_elem_start(reset_list,
+			sizeof(cnt_id_t), ret, &zcdr, NULL);
+	__hws_cnt_r2rcpy(&zcdr, &zcdc, ret);
+	rte_ring_enqueue_zc_elem_finish(reset_list, ret);
+	rte_ring_dequeue_zc_elem_finish(qcache, ret);
+	return 0;
+}
+
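+/**
+ * Refill one queue's cache from the reuse list, falling back to the global
+ * free list when no reusable counters are available.
+ */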
+static __rte_always_inline int
+mlx5_hws_cnt_pool_cache_fetch(struct mlx5_hws_cnt_pool *cpool,
+			      uint32_t queue_id)
+{
+	struct rte_ring *qcache = cpool->cache->qcache[queue_id];
+	struct rte_ring *free_list = NULL;
+	struct rte_ring *reuse_list = NULL;
+	struct rte_ring *list = NULL;
+	struct rte_ring_zc_data zcdf = {0};
+	struct rte_ring_zc_data zcdc = {0};
+	struct rte_ring_zc_data zcdu = {0};
+	struct rte_ring_zc_data zcds = {0};
+	struct mlx5_hws_cnt_pool_caches *cache = cpool->cache;
+	unsigned int ret;
+
+	reuse_list = cpool->reuse_list;
+	ret = rte_ring_dequeue_zc_burst_elem_start(reuse_list,
+			sizeof(cnt_id_t), cache->fetch_sz, &zcdu, NULL);
+	zcds = zcdu;
+	list = reuse_list;
+	if (unlikely(ret == 0)) { /* no reuse counter. */
+		rte_ring_dequeue_zc_elem_finish(reuse_list, 0);
+		free_list = cpool->free_list;
+		ret = rte_ring_dequeue_zc_burst_elem_start(free_list,
+				sizeof(cnt_id_t), cache->fetch_sz, &zcdf, NULL);
+		zcds = zcdf;
+		list = free_list;
+		if (unlikely(ret == 0)) { /* no free counter. */
+			rte_ring_dequeue_zc_elem_finish(free_list, 0);
+			if (rte_ring_count(cpool->wait_reset_list))
+				return -EAGAIN;
+			return -ENOENT;
+		}
+	}
+	rte_ring_enqueue_zc_burst_elem_start(qcache, sizeof(cnt_id_t),
+			ret, &zcdc, NULL);
+	__hws_cnt_r2rcpy(&zcdc, &zcds, ret);
+	rte_ring_dequeue_zc_elem_finish(list, ret);
+	rte_ring_enqueue_zc_elem_finish(qcache, ret);
+	return 0;
+}
+
+static __rte_always_inline int
+__mlx5_hws_cnt_pool_enqueue_revert(struct rte_ring *r, unsigned int n,
+		struct rte_ring_zc_data *zcd)
+{
+	uint32_t current_head = 0;
+	uint32_t revert2head = 0;
+
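+	/*
+	 * Roll back the last n enqueued elements of this single-producer ring
+	 * and expose them as a zero-copy region, so they can be moved to the
+	 * wait-reset list without an extra copy.
+	 */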
+	MLX5_ASSERT(r->prod.sync_type == RTE_RING_SYNC_ST);
+	MLX5_ASSERT(r->cons.sync_type == RTE_RING_SYNC_ST);
+	current_head = __atomic_load_n(&r->prod.head, __ATOMIC_RELAXED);
+	MLX5_ASSERT(n <= r->capacity);
+	MLX5_ASSERT(n <= rte_ring_count(r));
+	revert2head = current_head - n;
+	r->prod.head = revert2head; /* This ring should be SP. */
+	__rte_ring_get_elem_addr(r, revert2head, sizeof(cnt_id_t), n,
+			&zcd->ptr1, &zcd->n1, &zcd->ptr2);
+	/* Update tail */
+	__atomic_store_n(&r->prod.tail, revert2head, __ATOMIC_RELEASE);
+	return n;
+}
+
+/**
+ * Put one counter back into the counter pool.
+ *
+ * @param cpool
+ *   A pointer to the counter pool structure.
+ * @param cnt_id
+ *   A counter id to be added.
+ * @return
+ *   - 0: Success; object taken
+ *   - -ENOENT: not enough entry in pool
+ */
+static __rte_always_inline int
+mlx5_hws_cnt_pool_put(struct mlx5_hws_cnt_pool *cpool,
+		uint32_t *queue, cnt_id_t *cnt_id)
+{
+	unsigned int ret = 0;
+	struct rte_ring_zc_data zcdc = {0};
+	struct rte_ring_zc_data zcdr = {0};
+	struct rte_ring *qcache = NULL;
+	unsigned int wb_num = 0; /* cache write-back number. */
+	cnt_id_t iidx;
+
+	iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
+	cpool->pool[iidx].query_gen_when_free =
+		__atomic_load_n(&cpool->query_gen, __ATOMIC_RELAXED);
+	if (likely(queue != NULL))
+		qcache = cpool->cache->qcache[*queue];
+	if (unlikely(qcache == NULL)) {
+		ret = rte_ring_enqueue_elem(cpool->wait_reset_list, cnt_id,
+				sizeof(cnt_id_t));
+		MLX5_ASSERT(ret == 0);
+		return ret;
+	}
+	ret = rte_ring_enqueue_burst_elem(qcache, cnt_id, sizeof(cnt_id_t), 1,
+					  NULL);
+	if (unlikely(ret == 0)) { /* cache is full. */
+		wb_num = rte_ring_count(qcache) - cpool->cache->threshold;
+		MLX5_ASSERT(wb_num < rte_ring_count(qcache));
+		__mlx5_hws_cnt_pool_enqueue_revert(qcache, wb_num, &zcdc);
+		rte_ring_enqueue_zc_burst_elem_start(cpool->wait_reset_list,
+				sizeof(cnt_id_t), wb_num, &zcdr, NULL);
+		__hws_cnt_r2rcpy(&zcdr, &zcdc, wb_num);
+		rte_ring_enqueue_zc_elem_finish(cpool->wait_reset_list, wb_num);
+		/* write-back THIS counter too */
+		ret = rte_ring_enqueue_burst_elem(cpool->wait_reset_list,
+				cnt_id, sizeof(cnt_id_t), 1, NULL);
+	}
+	return ret == 1 ? 0 : -ENOENT;
+}
+
+/**
+ * Get one counter from the pool.
+ *
+ * If @p queue is not null, the counter is retrieved first from the queue's
+ * cache and subsequently from the common pool. Note that it can return
+ * -ENOENT when the local cache and common pool are empty, even if the
+ * caches of other queues are full.
+ *
+ * @param cpool
+ *   A pointer to the counter pool structure.
+ * @param queue
+ *   A pointer to the HWS queue index. If null, fetch from the common pool.
+ * @param cnt_id
+ *   A pointer to a cnt_id_t (counter id) that will be filled.
+ * @return
+ *   - 0: Success; counter taken.
+ *   - -ENOENT: Not enough entries in the pool; no counter is retrieved.
+ *   - -EAGAIN: The counter is not ready; try again.
+ */
+static __rte_always_inline int
+mlx5_hws_cnt_pool_get(struct mlx5_hws_cnt_pool *cpool,
+		uint32_t *queue, cnt_id_t *cnt_id)
+{
+	unsigned int ret;
+	struct rte_ring_zc_data zcdc = {0};
+	struct rte_ring *qcache = NULL;
+	uint32_t query_gen = 0;
+	cnt_id_t iidx, tmp_cid = 0;
+
+	if (likely(queue != NULL))
+		qcache = cpool->cache->qcache[*queue];
+	if (unlikely(qcache == NULL)) {
+		ret = rte_ring_dequeue_elem(cpool->reuse_list, &tmp_cid,
+				sizeof(cnt_id_t));
+		if (unlikely(ret != 0)) {
+			ret = rte_ring_dequeue_elem(cpool->free_list, &tmp_cid,
+					sizeof(cnt_id_t));
+			if (unlikely(ret != 0)) {
+				if (rte_ring_count(cpool->wait_reset_list))
+					return -EAGAIN;
+				return -ENOENT;
+			}
+		}
+		*cnt_id = tmp_cid;
+		iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
+		__hws_cnt_query_raw(cpool, *cnt_id,
+				    &cpool->pool[iidx].reset.hits,
+				    &cpool->pool[iidx].reset.bytes);
+		return 0;
+	}
+	ret = rte_ring_dequeue_zc_burst_elem_start(qcache, sizeof(cnt_id_t), 1,
+			&zcdc, NULL);
+	if (unlikely(ret == 0)) { /* local cache is empty. */
+		rte_ring_dequeue_zc_elem_finish(qcache, 0);
+		/* let's fetch from global free list. */
+		ret = mlx5_hws_cnt_pool_cache_fetch(cpool, *queue);
+		if (unlikely(ret != 0))
+			return ret;
+		rte_ring_dequeue_zc_burst_elem_start(qcache, sizeof(cnt_id_t),
+				1, &zcdc, NULL);
+	}
+	/* get one from local cache. */
+	*cnt_id = (*(cnt_id_t *)zcdc.ptr1);
+	iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
+	query_gen = cpool->pool[iidx].query_gen_when_free;
+	if (cpool->query_gen == query_gen) { /* counter is waiting to reset. */
+		rte_ring_dequeue_zc_elem_finish(qcache, 0);
+		/* write-back counter to reset list. */
+		mlx5_hws_cnt_pool_cache_flush(cpool, *queue);
+		/* let's fetch from global free list. */
+		ret = mlx5_hws_cnt_pool_cache_fetch(cpool, *queue);
+		if (unlikely(ret != 0))
+			return ret;
+		rte_ring_dequeue_zc_burst_elem_start(qcache, sizeof(cnt_id_t),
+				1, &zcdc, NULL);
+		*cnt_id = *(cnt_id_t *)zcdc.ptr1;
+	}
+	__hws_cnt_query_raw(cpool, *cnt_id, &cpool->pool[iidx].reset.hits,
+			    &cpool->pool[iidx].reset.bytes);
+	rte_ring_dequeue_zc_elem_finish(qcache, 1);
+	cpool->pool[iidx].share = 0;
+	return 0;
+}
+
+static __rte_always_inline unsigned int
+mlx5_hws_cnt_pool_get_size(struct mlx5_hws_cnt_pool *cpool)
+{
+	return rte_ring_get_capacity(cpool->free_list);
+}
+
+static __rte_always_inline int
+mlx5_hws_cnt_pool_get_action_offset(struct mlx5_hws_cnt_pool *cpool,
+		cnt_id_t cnt_id, struct mlx5dr_action **action,
+		uint32_t *offset)
+{
+	uint8_t idx = cnt_id >> MLX5_HWS_CNT_DCS_IDX_OFFSET;
+
+	idx &= MLX5_HWS_CNT_DCS_IDX_MASK;
+	*action = cpool->dcs_mng.dcs[idx].dr_action;
+	*offset = cnt_id & MLX5_HWS_CNT_IDX_MASK;
+	return 0;
+}
+
+static __rte_always_inline int
+mlx5_hws_cnt_shared_get(struct mlx5_hws_cnt_pool *cpool, cnt_id_t *cnt_id)
+{
+	int ret;
+	uint32_t iidx;
+
+	ret = mlx5_hws_cnt_pool_get(cpool, NULL, cnt_id);
+	if (ret != 0)
+		return ret;
+	iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
+	MLX5_ASSERT(cpool->pool[iidx].share == 0);
+	cpool->pool[iidx].share = 1;
+	return 0;
+}
+
+static __rte_always_inline int
+mlx5_hws_cnt_shared_put(struct mlx5_hws_cnt_pool *cpool, cnt_id_t *cnt_id)
+{
+	int ret;
+	uint32_t iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
+
+	cpool->pool[iidx].share = 0;
+	ret = mlx5_hws_cnt_pool_put(cpool, NULL, cnt_id);
+	if (unlikely(ret != 0))
+		cpool->pool[iidx].share = 1; /* fail to release, restore. */
+	return ret;
+}
+
+static __rte_always_inline bool
+mlx5_hws_cnt_is_shared(struct mlx5_hws_cnt_pool *cpool, cnt_id_t cnt_id)
+{
+	uint32_t iidx = mlx5_hws_cnt_iidx(cpool, cnt_id);
+
+	return cpool->pool[iidx].share ? true : false;
+}
+
+/* init HWS counter pool. */
+struct mlx5_hws_cnt_pool *
+mlx5_hws_cnt_pool_init(const struct mlx5_hws_cnt_pool_cfg *pcfg,
+		const struct mlx5_hws_cache_param *ccfg);
+
+void
+mlx5_hws_cnt_pool_deinit(struct mlx5_hws_cnt_pool *cntp);
+
+int
+mlx5_hws_cnt_service_thread_create(struct mlx5_dev_ctx_shared *sh);
+
+void
+mlx5_hws_cnt_service_thread_destroy(struct mlx5_dev_ctx_shared *sh);
+
+int
+mlx5_hws_cnt_pool_dcs_alloc(struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_hws_cnt_pool *cpool);
+void
+mlx5_hws_cnt_pool_dcs_free(struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_hws_cnt_pool *cpool);
+
+int
+mlx5_hws_cnt_pool_action_create(struct mlx5_priv *priv,
+		struct mlx5_hws_cnt_pool *cpool);
+
+void
+mlx5_hws_cnt_pool_action_destroy(struct mlx5_hws_cnt_pool *cpool);
+
+struct mlx5_hws_cnt_pool *
+mlx5_hws_cnt_pool_create(struct rte_eth_dev *dev,
+		const struct rte_flow_port_attr *pattr, uint16_t nb_queue);
+
+void
+mlx5_hws_cnt_pool_destroy(struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_hws_cnt_pool *cpool);
+
+int
+mlx5_hws_cnt_svc_init(struct mlx5_dev_ctx_shared *sh);
+
+void
+mlx5_hws_cnt_svc_deinit(struct mlx5_dev_ctx_shared *sh);
+
+#endif /* _MLX5_HWS_CNT_H_ */
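
For reviewers, a minimal usage sketch of the get/put helpers above
(illustrative only, not part of the patch; the example_* function is
hypothetical and assumes a translation unit that includes mlx5_hws_cnt.h):

static int
example_flow_counter_cycle(struct mlx5_hws_cnt_pool *cpool, uint32_t queue)
{
	cnt_id_t cnt_id;
	int ret;

	ret = mlx5_hws_cnt_pool_get(cpool, &queue, &cnt_id);
	if (ret == -EAGAIN)
		return ret; /* Counters pending reset; retry on a later poll. */
	if (ret != 0)
		return ret; /* -ENOENT: local cache and common pool exhausted. */
	/*
	 * ... use mlx5_hws_cnt_pool_get_action_offset() here to attach the
	 * counter to a rule, query it, etc. ...
	 */
	return mlx5_hws_cnt_pool_put(cpool, &queue, &cnt_id);
}
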
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v5 09/18] net/mlx5: support DR action template API
  2022-10-20  3:21 ` [PATCH v5 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (7 preceding siblings ...)
  2022-10-20  3:22   ` [PATCH v5 08/18] net/mlx5: add HW steering counter action Suanming Mou
@ 2022-10-20  3:22   ` Suanming Mou
  2022-10-20  3:22   ` [PATCH v5 10/18] net/mlx5: add HW steering connection tracking support Suanming Mou
                     ` (8 subsequent siblings)
  17 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-10-20  3:22 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Dariusz Sosnowski

From: Dariusz Sosnowski <dsosnowski@nvidia.com>

This patch adapts the mlx5 PMD to changes in the mlx5dr API regarding
action templates. It changes the following:

1. Actions template creation:

    - Flow action types are translated to mlx5dr action types in order
      to create the mlx5dr_action_template object.
    - An offset is assigned to each flow action. This offset is used to
      predetermine the action's location in the rule_acts array passed
      on rule creation (see the illustrative sketch after this list).

2. Template table creation:

    - Fixed actions are created and put in the rule_acts cache using
      the predetermined offsets.
    - The mlx5dr matcher is parametrized by the action templates bound
      to the template table.
    - The mlx5dr matcher is configured to optimize rule creation based
      on the passed rule indices.

3. Flow rule creation:

    - The mlx5dr rule is parametrized by the action template on which
      the rule's actions are based.
    - A rule index hint is provided to mlx5dr.

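For illustration only (not part of the patch), the offset assignment in
point 1 boils down to the simplified sketch below; the
example_assign_dr_offsets name is hypothetical, the fields are the ones
added to struct rte_flow_actions_template in this patch, and special cases
(METER expanding to two DR actions, shared reformat/modify-header slots)
are omitted:

static void
example_assign_dr_offsets(struct rte_flow_actions_template *at)
{
	uint16_t curr_off = 0;
	unsigned int i;

	for (i = 0; at->actions[i].type != RTE_FLOW_ACTION_TYPE_END; ++i) {
		if (at->actions[i].type == RTE_FLOW_ACTION_TYPE_VOID)
			continue; /* VOID consumes no DR action slot. */
		/* Remember where this flow action lands in rule_acts[]. */
		at->actions_off[i] = curr_off++;
	}
	at->dr_actions_num = curr_off;
}
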
Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_os.c |   1 +
 drivers/net/mlx5/mlx5.c          |   4 +-
 drivers/net/mlx5/mlx5.h          |   2 +
 drivers/net/mlx5/mlx5_flow.h     |  32 +-
 drivers/net/mlx5/mlx5_flow_hw.c  | 617 +++++++++++++++++++++++++------
 drivers/net/mlx5/mlx5_trigger.c  |  10 +
 6 files changed, 543 insertions(+), 123 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index c70cd84b8d..78cc44fae8 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1565,6 +1565,7 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 		}
 		/* Only HWS requires this information. */
 		flow_hw_init_tags_set(eth_dev);
+		flow_hw_init_flow_metadata_config(eth_dev);
 		if (priv->sh->config.dv_esw_en &&
 		    flow_hw_create_vport_action(eth_dev)) {
 			DRV_LOG(ERR, "port %u failed to create vport action",
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 4d87da8e29..e7a4aac354 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1969,8 +1969,10 @@ mlx5_dev_close(struct rte_eth_dev *dev)
 	flow_hw_destroy_vport_action(dev);
 	flow_hw_resource_release(dev);
 	flow_hw_clear_port_info(dev);
-	if (priv->sh->config.dv_flow_en == 2)
+	if (priv->sh->config.dv_flow_en == 2) {
+		flow_hw_clear_flow_metadata_config();
 		flow_hw_clear_tags_set(dev);
+	}
 #endif
 	if (priv->rxq_privs != NULL) {
 		/* XXX race condition if mlx5_rx_burst() is still running. */
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index c71db131a1..b8663e0322 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1651,6 +1651,8 @@ struct mlx5_priv {
 	struct mlx5dr_action *hw_drop[2];
 	/* HW steering global tag action. */
 	struct mlx5dr_action *hw_tag[2];
+	/* HW steering create ongoing rte flow table list header. */
+	LIST_HEAD(flow_hw_tbl_ongo, rte_flow_template_table) flow_hw_tbl_ongo;
 	struct mlx5_indexed_pool *acts_ipool; /* Action data indexed pool. */
 #endif
 };
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 1948de5dd8..210cc9ae3e 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1186,6 +1186,11 @@ struct rte_flow_actions_template {
 	struct rte_flow_actions_template_attr attr;
 	struct rte_flow_action *actions; /* Cached flow actions. */
 	struct rte_flow_action *masks; /* Cached action masks.*/
+	struct mlx5dr_action_template *tmpl; /* mlx5dr action template. */
+	uint16_t dr_actions_num; /* Number of DR rule actions. */
+	uint16_t actions_num; /* Number of flow actions. */
+	uint16_t *actions_off; /* DR action offset for given rte action offset. */
+	uint16_t reformat_off; /* Offset of DR reformat action. */
 	uint16_t mhdr_off; /* Offset of DR modify header action. */
 	uint32_t refcnt; /* Reference counter. */
 	uint16_t rx_cpy_pos; /* Action position of Rx metadata to be copied. */
@@ -1237,7 +1242,6 @@ struct mlx5_hw_actions {
 	/* Encap/Decap action. */
 	struct mlx5_hw_encap_decap_action *encap_decap;
 	uint16_t encap_decap_pos; /* Encap/Decap action position. */
-	uint32_t acts_num:4; /* Total action number. */
 	uint32_t mark:1; /* Indicate the mark action. */
 	uint32_t cnt_id; /* Counter id. */
 	/* Translated DR action array from action template. */
@@ -1493,6 +1497,13 @@ flow_hw_get_wire_port(struct ibv_context *ibctx)
 }
 #endif
 
+extern uint32_t mlx5_flow_hw_flow_metadata_config_refcnt;
+extern uint8_t mlx5_flow_hw_flow_metadata_esw_en;
+extern uint8_t mlx5_flow_hw_flow_metadata_xmeta_en;
+
+void flow_hw_init_flow_metadata_config(struct rte_eth_dev *dev);
+void flow_hw_clear_flow_metadata_config(void);
+
 /*
  * Convert metadata or tag to the actual register.
  * META: Can only be used to match in the FDB in this stage, fixed C_1.
@@ -1504,7 +1515,22 @@ flow_hw_get_reg_id(enum rte_flow_item_type type, uint32_t id)
 {
 	switch (type) {
 	case RTE_FLOW_ITEM_TYPE_META:
-		return REG_C_1;
+#ifdef HAVE_MLX5_HWS_SUPPORT
+		if (mlx5_flow_hw_flow_metadata_esw_en &&
+		    mlx5_flow_hw_flow_metadata_xmeta_en == MLX5_XMETA_MODE_META32_HWS) {
+			return REG_C_1;
+		}
+#endif
+		/*
+		 * On root table - PMD allows only egress META matching, thus
+		 * REG_A matching is sufficient.
+		 *
+		 * On non-root tables - REG_A corresponds to general_purpose_lookup_field,
+		 * which translates to REG_A in NIC TX and to REG_B in NIC RX.
+		 * However, current FW does not implement REG_B case right now, so
+		 * REG_B case should be rejected on pattern template validation.
+		 */
+		return REG_A;
 	case RTE_FLOW_ITEM_TYPE_TAG:
 		MLX5_ASSERT(id < MLX5_FLOW_HW_TAGS_MAX);
 		return mlx5_flow_hw_avl_tags[id];
@@ -2413,4 +2439,6 @@ int mlx5_flow_pattern_validate(struct rte_eth_dev *dev,
 		const struct rte_flow_pattern_template_attr *attr,
 		const struct rte_flow_item items[],
 		struct rte_flow_error *error);
+int flow_hw_table_update(struct rte_eth_dev *dev,
+			 struct rte_flow_error *error);
 #endif /* RTE_PMD_MLX5_FLOW_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 1e441c9c0d..5b7ef1be68 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -340,6 +340,13 @@ __flow_hw_action_template_destroy(struct rte_eth_dev *dev,
 				 struct mlx5_hw_actions *acts)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_action_construct_data *data;
+
+	while (!LIST_EMPTY(&acts->act_list)) {
+		data = LIST_FIRST(&acts->act_list);
+		LIST_REMOVE(data, next);
+		mlx5_ipool_free(priv->acts_ipool, data->idx);
+	}
 
 	if (acts->jump) {
 		struct mlx5_flow_group *grp;
@@ -349,6 +356,16 @@ __flow_hw_action_template_destroy(struct rte_eth_dev *dev,
 		mlx5_hlist_unregister(priv->sh->flow_tbls, &grp->entry);
 		acts->jump = NULL;
 	}
+	if (acts->tir) {
+		mlx5_hrxq_release(dev, acts->tir->idx);
+		acts->tir = NULL;
+	}
+	if (acts->encap_decap) {
+		if (acts->encap_decap->action)
+			mlx5dr_action_destroy(acts->encap_decap->action);
+		mlx5_free(acts->encap_decap);
+		acts->encap_decap = NULL;
+	}
 	if (acts->mhdr) {
 		if (acts->mhdr->action)
 			mlx5dr_action_destroy(acts->mhdr->action);
@@ -967,33 +984,29 @@ flow_hw_represented_port_compile(struct rte_eth_dev *dev,
 static __rte_always_inline int
 flow_hw_meter_compile(struct rte_eth_dev *dev,
 		      const struct mlx5_flow_template_table_cfg *cfg,
-		      uint32_t  start_pos, const struct rte_flow_action *action,
-		      struct mlx5_hw_actions *acts, uint32_t *end_pos,
+		      uint16_t aso_mtr_pos,
+		      uint16_t jump_pos,
+		      const struct rte_flow_action *action,
+		      struct mlx5_hw_actions *acts,
 		      struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_aso_mtr *aso_mtr;
 	const struct rte_flow_action_meter *meter = action->conf;
-	uint32_t pos = start_pos;
 	uint32_t group = cfg->attr.flow_attr.group;
 
 	aso_mtr = mlx5_aso_meter_by_idx(priv, meter->mtr_id);
-	acts->rule_acts[pos].action = priv->mtr_bulk.action;
-	acts->rule_acts[pos].aso_meter.offset = aso_mtr->offset;
-		acts->jump = flow_hw_jump_action_register
+	acts->rule_acts[aso_mtr_pos].action = priv->mtr_bulk.action;
+	acts->rule_acts[aso_mtr_pos].aso_meter.offset = aso_mtr->offset;
+	acts->jump = flow_hw_jump_action_register
 		(dev, cfg, aso_mtr->fm.group, error);
-	if (!acts->jump) {
-		*end_pos = start_pos;
+	if (!acts->jump)
 		return -ENOMEM;
-	}
-	acts->rule_acts[++pos].action = (!!group) ?
+	acts->rule_acts[jump_pos].action = (!!group) ?
 				    acts->jump->hws_action :
 				    acts->jump->root_action;
-	*end_pos = pos;
-	if (mlx5_aso_mtr_wait(priv->sh, aso_mtr)) {
-		*end_pos = start_pos;
+	if (mlx5_aso_mtr_wait(priv->sh, aso_mtr))
 		return -ENOMEM;
-	}
 	return 0;
 }
 
@@ -1046,11 +1059,11 @@ flow_hw_cnt_compile(struct rte_eth_dev *dev, uint32_t  start_pos,
  *    Table on success, NULL otherwise and rte_errno is set.
  */
 static int
-flow_hw_actions_translate(struct rte_eth_dev *dev,
-			  const struct mlx5_flow_template_table_cfg *cfg,
-			  struct mlx5_hw_actions *acts,
-			  struct rte_flow_actions_template *at,
-			  struct rte_flow_error *error)
+__flow_hw_actions_translate(struct rte_eth_dev *dev,
+			    const struct mlx5_flow_template_table_cfg *cfg,
+			    struct mlx5_hw_actions *acts,
+			    struct rte_flow_actions_template *at,
+			    struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	const struct rte_flow_template_table_attr *table_attr = &cfg->attr;
@@ -1061,12 +1074,15 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 	enum mlx5dr_action_reformat_type refmt_type = 0;
 	const struct rte_flow_action_raw_encap *raw_encap_data;
 	const struct rte_flow_item *enc_item = NULL, *enc_item_m = NULL;
-	uint16_t reformat_pos = MLX5_HW_MAX_ACTS, reformat_src = 0;
+	uint16_t reformat_src = 0;
 	uint8_t *encap_data = NULL, *encap_data_m = NULL;
 	size_t data_size = 0;
 	struct mlx5_hw_modify_header_action mhdr = { 0 };
 	bool actions_end = false;
-	uint32_t type, i;
+	uint32_t type;
+	bool reformat_used = false;
+	uint16_t action_pos;
+	uint16_t jump_pos;
 	int err;
 
 	flow_hw_modify_field_init(&mhdr, at);
@@ -1076,46 +1092,53 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 		type = MLX5DR_TABLE_TYPE_NIC_TX;
 	else
 		type = MLX5DR_TABLE_TYPE_NIC_RX;
-	for (i = 0; !actions_end; actions++, masks++) {
+	for (; !actions_end; actions++, masks++) {
 		switch (actions->type) {
 		case RTE_FLOW_ACTION_TYPE_INDIRECT:
+			action_pos = at->actions_off[actions - at->actions];
 			if (!attr->group) {
 				DRV_LOG(ERR, "Indirect action is not supported in root table.");
 				goto err;
 			}
 			if (actions->conf && masks->conf) {
 				if (flow_hw_shared_action_translate
-				(dev, actions, acts, actions - action_start, i))
+				(dev, actions, acts, actions - action_start, action_pos))
 					goto err;
 			} else if (__flow_hw_act_data_general_append
 					(priv, acts, actions->type,
-					 actions - action_start, i)){
+					 actions - action_start, action_pos)){
 				goto err;
 			}
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			break;
 		case RTE_FLOW_ACTION_TYPE_DROP:
-			acts->rule_acts[i++].action =
+			action_pos = at->actions_off[actions - at->actions];
+			acts->rule_acts[action_pos].action =
 				priv->hw_drop[!!attr->group];
 			break;
 		case RTE_FLOW_ACTION_TYPE_MARK:
+			action_pos = at->actions_off[actions - at->actions];
 			acts->mark = true;
-			if (masks->conf)
-				acts->rule_acts[i].tag.value =
+			if (masks->conf &&
+			    ((const struct rte_flow_action_mark *)
+			     masks->conf)->id)
+				acts->rule_acts[action_pos].tag.value =
 					mlx5_flow_mark_set
 					(((const struct rte_flow_action_mark *)
-					(masks->conf))->id);
+					(actions->conf))->id);
 			else if (__flow_hw_act_data_general_append(priv, acts,
-				actions->type, actions - action_start, i))
+				actions->type, actions - action_start, action_pos))
 				goto err;
-			acts->rule_acts[i++].action =
+			acts->rule_acts[action_pos].action =
 				priv->hw_tag[!!attr->group];
 			flow_hw_rxq_flag_set(dev, true);
 			break;
 		case RTE_FLOW_ACTION_TYPE_JUMP:
-			if (masks->conf) {
+			action_pos = at->actions_off[actions - at->actions];
+			if (masks->conf &&
+			    ((const struct rte_flow_action_jump *)
+			     masks->conf)->group) {
 				uint32_t jump_group =
 					((const struct rte_flow_action_jump *)
 					actions->conf)->group;
@@ -1123,76 +1146,77 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 						(dev, cfg, jump_group, error);
 				if (!acts->jump)
 					goto err;
-				acts->rule_acts[i].action = (!!attr->group) ?
+				acts->rule_acts[action_pos].action = (!!attr->group) ?
 						acts->jump->hws_action :
 						acts->jump->root_action;
 			} else if (__flow_hw_act_data_general_append
 					(priv, acts, actions->type,
-					 actions - action_start, i)){
+					 actions - action_start, action_pos)){
 				goto err;
 			}
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_QUEUE:
-			if (masks->conf) {
+			action_pos = at->actions_off[actions - at->actions];
+			if (masks->conf &&
+			    ((const struct rte_flow_action_queue *)
+			     masks->conf)->index) {
 				acts->tir = flow_hw_tir_action_register
 				(dev,
 				 mlx5_hw_act_flag[!!attr->group][type],
 				 actions);
 				if (!acts->tir)
 					goto err;
-				acts->rule_acts[i].action =
+				acts->rule_acts[action_pos].action =
 					acts->tir->action;
 			} else if (__flow_hw_act_data_general_append
 					(priv, acts, actions->type,
-					 actions - action_start, i)) {
+					 actions - action_start, action_pos)) {
 				goto err;
 			}
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RSS:
-			if (masks->conf) {
+			action_pos = at->actions_off[actions - at->actions];
+			if (actions->conf && masks->conf) {
 				acts->tir = flow_hw_tir_action_register
 				(dev,
 				 mlx5_hw_act_flag[!!attr->group][type],
 				 actions);
 				if (!acts->tir)
 					goto err;
-				acts->rule_acts[i].action =
+				acts->rule_acts[action_pos].action =
 					acts->tir->action;
 			} else if (__flow_hw_act_data_general_append
 					(priv, acts, actions->type,
-					 actions - action_start, i)) {
+					 actions - action_start, action_pos)) {
 				goto err;
 			}
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
-			MLX5_ASSERT(reformat_pos == MLX5_HW_MAX_ACTS);
+			MLX5_ASSERT(!reformat_used);
 			enc_item = ((const struct rte_flow_action_vxlan_encap *)
 				   actions->conf)->definition;
 			if (masks->conf)
 				enc_item_m = ((const struct rte_flow_action_vxlan_encap *)
 					     masks->conf)->definition;
-			reformat_pos = i++;
+			reformat_used = true;
 			reformat_src = actions - action_start;
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_L2_TO_TNL_L2;
 			break;
 		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
-			MLX5_ASSERT(reformat_pos == MLX5_HW_MAX_ACTS);
+			MLX5_ASSERT(!reformat_used);
 			enc_item = ((const struct rte_flow_action_nvgre_encap *)
 				   actions->conf)->definition;
 			if (masks->conf)
 				enc_item_m = ((const struct rte_flow_action_nvgre_encap *)
 					     masks->conf)->definition;
-			reformat_pos = i++;
+			reformat_used = true;
 			reformat_src = actions - action_start;
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_L2_TO_TNL_L2;
 			break;
 		case RTE_FLOW_ACTION_TYPE_VXLAN_DECAP:
 		case RTE_FLOW_ACTION_TYPE_NVGRE_DECAP:
-			MLX5_ASSERT(reformat_pos == MLX5_HW_MAX_ACTS);
-			reformat_pos = i++;
+			MLX5_ASSERT(!reformat_used);
+			reformat_used = true;
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_TNL_L2_TO_L2;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
@@ -1206,25 +1230,23 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 				 actions->conf;
 			encap_data = raw_encap_data->data;
 			data_size = raw_encap_data->size;
-			if (reformat_pos != MLX5_HW_MAX_ACTS) {
+			if (reformat_used) {
 				refmt_type = data_size <
 				MLX5_ENCAPSULATION_DECISION_SIZE ?
 				MLX5DR_ACTION_REFORMAT_TYPE_TNL_L3_TO_L2 :
 				MLX5DR_ACTION_REFORMAT_TYPE_L2_TO_TNL_L3;
 			} else {
-				reformat_pos = i++;
+				reformat_used = true;
 				refmt_type =
 				MLX5DR_ACTION_REFORMAT_TYPE_L2_TO_TNL_L2;
 			}
 			reformat_src = actions - action_start;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RAW_DECAP:
-			reformat_pos = i++;
+			reformat_used = true;
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_TNL_L2_TO_L2;
 			break;
 		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
-			if (mhdr.pos == UINT16_MAX)
-				mhdr.pos = i++;
 			err = flow_hw_modify_field_compile(dev, attr, action_start,
 							   actions, masks, acts, &mhdr,
 							   error);
@@ -1242,40 +1264,46 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 				action_start += 1;
 			break;
 		case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
+			action_pos = at->actions_off[actions - at->actions];
 			if (flow_hw_represented_port_compile
 					(dev, attr, action_start, actions,
-					 masks, acts, i, error))
+					 masks, acts, action_pos, error))
 				goto err;
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_METER:
+			/*
+			 * METER action is compiled to 2 DR actions - ASO_METER and FT.
+			 * Calculated DR offset is stored only for ASO_METER and FT
+			 * is assumed to be the next action.
+			 */
+			action_pos = at->actions_off[actions - at->actions];
+			jump_pos = action_pos + 1;
 			if (actions->conf && masks->conf &&
 			    ((const struct rte_flow_action_meter *)
 			     masks->conf)->mtr_id) {
 				err = flow_hw_meter_compile(dev, cfg,
-						i, actions, acts, &i, error);
+						action_pos, jump_pos, actions, acts, error);
 				if (err)
 					goto err;
 			} else if (__flow_hw_act_data_general_append(priv, acts,
 							actions->type,
 							actions - action_start,
-							i))
+							action_pos))
 				goto err;
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_COUNT:
+			action_pos = at->actions_off[actions - action_start];
 			if (masks->conf &&
 			    ((const struct rte_flow_action_count *)
 			     masks->conf)->id) {
-				err = flow_hw_cnt_compile(dev, i, acts);
+				err = flow_hw_cnt_compile(dev, action_pos, acts);
 				if (err)
 					goto err;
 			} else if (__flow_hw_act_data_general_append
 					(priv, acts, actions->type,
-					 actions - action_start, i)) {
+					 actions - action_start, action_pos)) {
 				goto err;
 			}
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
@@ -1309,10 +1337,11 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			goto err;
 		acts->rule_acts[acts->mhdr->pos].action = acts->mhdr->action;
 	}
-	if (reformat_pos != MLX5_HW_MAX_ACTS) {
+	if (reformat_used) {
 		uint8_t buf[MLX5_ENCAP_MAX_LEN];
 		bool shared_rfmt = true;
 
+		MLX5_ASSERT(at->reformat_off != UINT16_MAX);
 		if (enc_item) {
 			MLX5_ASSERT(!encap_data);
 			if (flow_dv_convert_encap_data(enc_item, buf, &data_size, error))
@@ -1340,20 +1369,17 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 				 (shared_rfmt ? MLX5DR_ACTION_FLAG_SHARED : 0));
 		if (!acts->encap_decap->action)
 			goto err;
-		acts->rule_acts[reformat_pos].action =
-						acts->encap_decap->action;
-		acts->rule_acts[reformat_pos].reformat.data =
-						acts->encap_decap->data;
+		acts->rule_acts[at->reformat_off].action = acts->encap_decap->action;
+		acts->rule_acts[at->reformat_off].reformat.data = acts->encap_decap->data;
 		if (shared_rfmt)
-			acts->rule_acts[reformat_pos].reformat.offset = 0;
+			acts->rule_acts[at->reformat_off].reformat.offset = 0;
 		else if (__flow_hw_act_data_encap_append(priv, acts,
 				 (action_start + reformat_src)->type,
-				 reformat_src, reformat_pos, data_size))
+				 reformat_src, at->reformat_off, data_size))
 			goto err;
 		acts->encap_decap->shared = shared_rfmt;
-		acts->encap_decap_pos = reformat_pos;
+		acts->encap_decap_pos = at->reformat_off;
 	}
-	acts->acts_num = i;
 	return 0;
 err:
 	err = rte_errno;
@@ -1363,6 +1389,40 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 				  "fail to create rte table");
 }
 
+/**
+ * Translate rte_flow actions to DR action.
+ *
+ * @param[in] dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param[in] tbl
+ *   Pointer to the flow template table.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *    0 on success, negative value otherwise and rte_errno is set.
+ */
+static int
+flow_hw_actions_translate(struct rte_eth_dev *dev,
+			  struct rte_flow_template_table *tbl,
+			  struct rte_flow_error *error)
+{
+	uint32_t i;
+
+	for (i = 0; i < tbl->nb_action_templates; i++) {
+		if (__flow_hw_actions_translate(dev, &tbl->cfg,
+						&tbl->ats[i].acts,
+						tbl->ats[i].action_template,
+						error))
+			goto err;
+	}
+	return 0;
+err:
+	while (i--)
+		__flow_hw_action_template_destroy(dev, &tbl->ats[i].acts);
+	return -1;
+}
+
 /**
  * Get shared indirect action.
  *
@@ -1611,16 +1671,17 @@ flow_hw_modify_field_construct(struct mlx5_hw_q_job *job,
 static __rte_always_inline int
 flow_hw_actions_construct(struct rte_eth_dev *dev,
 			  struct mlx5_hw_q_job *job,
-			  const struct mlx5_hw_actions *hw_acts,
+			  const struct mlx5_hw_action_template *hw_at,
 			  const uint8_t it_idx,
 			  const struct rte_flow_action actions[],
 			  struct mlx5dr_rule_action *rule_acts,
-			  uint32_t *acts_num,
 			  uint32_t queue)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow_template_table *table = job->flow->table;
 	struct mlx5_action_construct_data *act_data;
+	const struct rte_flow_actions_template *at = hw_at->action_template;
+	const struct mlx5_hw_actions *hw_acts = &hw_at->acts;
 	const struct rte_flow_action *action;
 	const struct rte_flow_action_raw_encap *raw_encap_data;
 	const struct rte_flow_item *enc_item = NULL;
@@ -1636,11 +1697,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	struct mlx5_aso_mtr *mtr;
 	uint32_t mtr_id;
 
-	memcpy(rule_acts, hw_acts->rule_acts,
-	       sizeof(*rule_acts) * hw_acts->acts_num);
-	*acts_num = hw_acts->acts_num;
-	if (LIST_EMPTY(&hw_acts->act_list))
-		return 0;
+	rte_memcpy(rule_acts, hw_acts->rule_acts, sizeof(*rule_acts) * at->dr_actions_num);
 	attr.group = table->grp->group_id;
 	ft_flag = mlx5_hw_act_flag[!!table->grp->group_id][table->type];
 	if (table->type == MLX5DR_TABLE_TYPE_FDB) {
@@ -1774,7 +1831,6 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 							 jump->root_action;
 			job->flow->jump = jump;
 			job->flow->fate_type = MLX5_FLOW_FATE_JUMP;
-			(*acts_num)++;
 			if (mlx5_aso_mtr_wait(priv->sh, mtr))
 				return -1;
 			break;
@@ -1912,13 +1968,16 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 		.burst = attr->postpone,
 	};
 	struct mlx5dr_rule_action rule_acts[MLX5_HW_MAX_ACTS];
-	struct mlx5_hw_actions *hw_acts;
 	struct rte_flow_hw *flow;
 	struct mlx5_hw_q_job *job;
 	const struct rte_flow_item *rule_items;
-	uint32_t acts_num, flow_idx;
+	uint32_t flow_idx;
 	int ret;
 
+	if (unlikely((!dev->data->dev_started))) {
+		rte_errno = EINVAL;
+		goto error;
+	}
 	if (unlikely(!priv->hw_q[queue].job_idx)) {
 		rte_errno = ENOMEM;
 		goto error;
@@ -1941,7 +2000,12 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	job->flow = flow;
 	job->user_data = user_data;
 	rule_attr.user_data = job;
-	hw_acts = &table->ats[action_template_index].acts;
+	/*
+	 * Indexed pool returns 1-based indices, but mlx5dr expects 0-based indices for rule
+	 * insertion hints.
+	 */
+	MLX5_ASSERT(flow_idx > 0);
+	rule_attr.rule_idx = flow_idx - 1;
 	/*
 	 * Construct the flow actions based on the input actions.
 	 * The implicitly appended action is always fixed, like metadata
@@ -1949,8 +2013,8 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	 * No need to copy and construct a new "actions" list based on the
 	 * user's input, in order to save the cost.
 	 */
-	if (flow_hw_actions_construct(dev, job, hw_acts, pattern_template_index,
-				  actions, rule_acts, &acts_num, queue)) {
+	if (flow_hw_actions_construct(dev, job, &table->ats[action_template_index],
+				      pattern_template_index, actions, rule_acts, queue)) {
 		rte_errno = EINVAL;
 		goto free;
 	}
@@ -1959,7 +2023,7 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	if (!rule_items)
 		goto free;
 	ret = mlx5dr_rule_create(table->matcher,
-				 pattern_template_index, items,
+				 pattern_template_index, rule_items,
 				 action_template_index, rule_acts,
 				 &rule_attr, (struct mlx5dr_rule *)flow->rule);
 	if (likely(!ret))
@@ -2295,6 +2359,7 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	struct rte_flow_template_table *tbl = NULL;
 	struct mlx5_flow_group *grp;
 	struct mlx5dr_match_template *mt[MLX5_HW_TBL_MAX_ITEM_TEMPLATE];
+	struct mlx5dr_action_template *at[MLX5_HW_TBL_MAX_ACTION_TEMPLATE];
 	const struct rte_flow_template_table_attr *attr = &table_cfg->attr;
 	struct rte_flow_attr flow_attr = attr->flow_attr;
 	struct mlx5_flow_cb_ctx ctx = {
@@ -2315,6 +2380,7 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	struct mlx5_list_entry *ge;
 	uint32_t i, max_tpl = MLX5_HW_TBL_MAX_ITEM_TEMPLATE;
 	uint32_t nb_flows = rte_align32pow2(attr->nb_flows);
+	bool port_started = !!dev->data->dev_started;
 	int err;
 
 	/* HWS layer accepts only 1 item template with root table. */
@@ -2349,12 +2415,20 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	tbl->grp = grp;
 	/* Prepare matcher information. */
 	matcher_attr.priority = attr->flow_attr.priority;
+	matcher_attr.optimize_using_rule_idx = true;
 	matcher_attr.mode = MLX5DR_MATCHER_RESOURCE_MODE_RULE;
 	matcher_attr.rule.num_log = rte_log2_u32(nb_flows);
 	/* Build the item template. */
 	for (i = 0; i < nb_item_templates; i++) {
 		uint32_t ret;
 
+		if ((flow_attr.ingress && !item_templates[i]->attr.ingress) ||
+		    (flow_attr.egress && !item_templates[i]->attr.egress) ||
+		    (flow_attr.transfer && !item_templates[i]->attr.transfer)) {
+			DRV_LOG(ERR, "pattern template and template table attribute mismatch");
+			rte_errno = EINVAL;
+			goto it_error;
+		}
 		ret = __atomic_add_fetch(&item_templates[i]->refcnt, 1,
 					 __ATOMIC_RELAXED);
 		if (ret <= 1) {
@@ -2364,10 +2438,6 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 		mt[i] = item_templates[i]->mt;
 		tbl->its[i] = item_templates[i];
 	}
-	tbl->matcher = mlx5dr_matcher_create
-		(tbl->grp->tbl, mt, nb_item_templates, NULL, 0, &matcher_attr);
-	if (!tbl->matcher)
-		goto it_error;
 	tbl->nb_item_templates = nb_item_templates;
 	/* Build the action template. */
 	for (i = 0; i < nb_action_templates; i++) {
@@ -2379,21 +2449,31 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 			rte_errno = EINVAL;
 			goto at_error;
 		}
+		at[i] = action_templates[i]->tmpl;
+		tbl->ats[i].action_template = action_templates[i];
 		LIST_INIT(&tbl->ats[i].acts.act_list);
-		err = flow_hw_actions_translate(dev, &tbl->cfg,
-						&tbl->ats[i].acts,
-						action_templates[i], error);
+		if (!port_started)
+			continue;
+		err = __flow_hw_actions_translate(dev, &tbl->cfg,
+						  &tbl->ats[i].acts,
+						  action_templates[i], error);
 		if (err) {
 			i++;
 			goto at_error;
 		}
-		tbl->ats[i].action_template = action_templates[i];
 	}
 	tbl->nb_action_templates = nb_action_templates;
+	tbl->matcher = mlx5dr_matcher_create
+		(tbl->grp->tbl, mt, nb_item_templates, at, nb_action_templates, &matcher_attr);
+	if (!tbl->matcher)
+		goto at_error;
 	tbl->type = attr->flow_attr.transfer ? MLX5DR_TABLE_TYPE_FDB :
 		    (attr->flow_attr.egress ? MLX5DR_TABLE_TYPE_NIC_TX :
 		    MLX5DR_TABLE_TYPE_NIC_RX);
-	LIST_INSERT_HEAD(&priv->flow_hw_tbl, tbl, next);
+	if (port_started)
+		LIST_INSERT_HEAD(&priv->flow_hw_tbl, tbl, next);
+	else
+		LIST_INSERT_HEAD(&priv->flow_hw_tbl_ongo, tbl, next);
 	return tbl;
 at_error:
 	while (i--) {
@@ -2406,7 +2486,6 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	while (i--)
 		__atomic_sub_fetch(&item_templates[i]->refcnt,
 				   1, __ATOMIC_RELAXED);
-	mlx5dr_matcher_destroy(tbl->matcher);
 error:
 	err = rte_errno;
 	if (tbl) {
@@ -2423,6 +2502,33 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	return NULL;
 }
 
+/**
+ * Update flow template table.
+ *
+ * @param[in] dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *    0 on success, negative value otherwise and rte_errno is set.
+ */
+int
+flow_hw_table_update(struct rte_eth_dev *dev,
+		     struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_flow_template_table *tbl;
+
+	while ((tbl = LIST_FIRST(&priv->flow_hw_tbl_ongo)) != NULL) {
+		if (flow_hw_actions_translate(dev, tbl, error))
+			return -1;
+		LIST_REMOVE(tbl, next);
+		LIST_INSERT_HEAD(&priv->flow_hw_tbl, tbl, next);
+	}
+	return 0;
+}
+
 /**
  * Translates group index specified by the user in @p attr to internal
  * group index.
@@ -2501,6 +2607,7 @@ flow_hw_template_table_create(struct rte_eth_dev *dev,
 			      uint8_t nb_action_templates,
 			      struct rte_flow_error *error)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_flow_template_table_cfg cfg = {
 		.attr = *attr,
 		.external = true,
@@ -2509,6 +2616,12 @@ flow_hw_template_table_create(struct rte_eth_dev *dev,
 
 	if (flow_hw_translate_group(dev, &cfg, group, &cfg.attr.flow_attr.group, error))
 		return NULL;
+	if (priv->sh->config.dv_esw_en && cfg.attr.flow_attr.egress) {
+		rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ATTR, NULL,
+				  "egress flows are not supported with HW Steering"
+				  " when E-Switch is enabled");
+		return NULL;
+	}
 	return flow_hw_table_create(dev, &cfg, item_templates, nb_item_templates,
 				    action_templates, nb_action_templates, error);
 }
@@ -2750,7 +2863,8 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 		const struct rte_flow_action *mask = &masks[i];
 
 		MLX5_ASSERT(i < MLX5_HW_MAX_ACTS);
-		if (action->type != mask->type)
+		if (action->type != RTE_FLOW_ACTION_TYPE_INDIRECT &&
+		    action->type != mask->type)
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ACTION,
 						  action,
@@ -2826,6 +2940,157 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static enum mlx5dr_action_type mlx5_hw_dr_action_types[] = {
+	[RTE_FLOW_ACTION_TYPE_MARK] = MLX5DR_ACTION_TYP_TAG,
+	[RTE_FLOW_ACTION_TYPE_DROP] = MLX5DR_ACTION_TYP_DROP,
+	[RTE_FLOW_ACTION_TYPE_JUMP] = MLX5DR_ACTION_TYP_FT,
+	[RTE_FLOW_ACTION_TYPE_QUEUE] = MLX5DR_ACTION_TYP_TIR,
+	[RTE_FLOW_ACTION_TYPE_RSS] = MLX5DR_ACTION_TYP_TIR,
+	[RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP] = MLX5DR_ACTION_TYP_L2_TO_TNL_L2,
+	[RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP] = MLX5DR_ACTION_TYP_L2_TO_TNL_L2,
+	[RTE_FLOW_ACTION_TYPE_VXLAN_DECAP] = MLX5DR_ACTION_TYP_TNL_L2_TO_L2,
+	[RTE_FLOW_ACTION_TYPE_NVGRE_DECAP] = MLX5DR_ACTION_TYP_TNL_L2_TO_L2,
+	[RTE_FLOW_ACTION_TYPE_MODIFY_FIELD] = MLX5DR_ACTION_TYP_MODIFY_HDR,
+	[RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT] = MLX5DR_ACTION_TYP_VPORT,
+	[RTE_FLOW_ACTION_TYPE_COUNT] = MLX5DR_ACTION_TYP_CTR,
+};
+
+static int
+flow_hw_dr_actions_template_handle_shared(const struct rte_flow_action *mask,
+					  unsigned int action_src,
+					  enum mlx5dr_action_type *action_types,
+					  uint16_t *curr_off,
+					  struct rte_flow_actions_template *at)
+{
+	uint32_t type;
+
+	if (!mask) {
+		DRV_LOG(WARNING, "Unable to determine indirect action type "
+			"without a mask specified");
+		return -EINVAL;
+	}
+	type = mask->type;
+	switch (type) {
+	case RTE_FLOW_ACTION_TYPE_RSS:
+		at->actions_off[action_src] = *curr_off;
+		action_types[*curr_off] = MLX5DR_ACTION_TYP_TIR;
+		*curr_off = *curr_off + 1;
+		break;
+	case RTE_FLOW_ACTION_TYPE_COUNT:
+		at->actions_off[action_src] = *curr_off;
+		action_types[*curr_off] = MLX5DR_ACTION_TYP_CTR;
+		*curr_off = *curr_off + 1;
+		break;
+	default:
+		DRV_LOG(WARNING, "Unsupported shared action type: %d", type);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+/**
+ * Create DR action template based on a provided sequence of flow actions.
+ *
+ * @param[in] at
+ *   Pointer to flow actions template to be updated.
+ *
+ * @return
+ *   DR action template pointer on success and action offsets in @p at are updated.
+ *   NULL otherwise.
+ */
+static struct mlx5dr_action_template *
+flow_hw_dr_actions_template_create(struct rte_flow_actions_template *at)
+{
+	struct mlx5dr_action_template *dr_template;
+	enum mlx5dr_action_type action_types[MLX5_HW_MAX_ACTS] = { MLX5DR_ACTION_TYP_LAST };
+	unsigned int i;
+	uint16_t curr_off;
+	enum mlx5dr_action_type reformat_act_type = MLX5DR_ACTION_TYP_TNL_L2_TO_L2;
+	uint16_t reformat_off = UINT16_MAX;
+	uint16_t mhdr_off = UINT16_MAX;
+	int ret;
+	for (i = 0, curr_off = 0; at->actions[i].type != RTE_FLOW_ACTION_TYPE_END; ++i) {
+		const struct rte_flow_action_raw_encap *raw_encap_data;
+		size_t data_size;
+		enum mlx5dr_action_type type;
+
+		if (curr_off >= MLX5_HW_MAX_ACTS)
+			goto err_actions_num;
+		switch (at->actions[i].type) {
+		case RTE_FLOW_ACTION_TYPE_VOID:
+			break;
+		case RTE_FLOW_ACTION_TYPE_INDIRECT:
+			ret = flow_hw_dr_actions_template_handle_shared(&at->masks[i], i,
+									action_types,
+									&curr_off, at);
+			if (ret)
+				return NULL;
+			break;
+		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
+		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
+		case RTE_FLOW_ACTION_TYPE_VXLAN_DECAP:
+		case RTE_FLOW_ACTION_TYPE_NVGRE_DECAP:
+			MLX5_ASSERT(reformat_off == UINT16_MAX);
+			reformat_off = curr_off++;
+			reformat_act_type = mlx5_hw_dr_action_types[at->actions[i].type];
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
+			raw_encap_data = at->actions[i].conf;
+			data_size = raw_encap_data->size;
+			if (reformat_off != UINT16_MAX) {
+				reformat_act_type = data_size < MLX5_ENCAPSULATION_DECISION_SIZE ?
+					MLX5DR_ACTION_TYP_TNL_L3_TO_L2 :
+					MLX5DR_ACTION_TYP_L2_TO_TNL_L3;
+			} else {
+				reformat_off = curr_off++;
+				reformat_act_type = MLX5DR_ACTION_TYP_L2_TO_TNL_L2;
+			}
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_DECAP:
+			reformat_off = curr_off++;
+			reformat_act_type = MLX5DR_ACTION_TYP_TNL_L2_TO_L2;
+			break;
+		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
+			if (mhdr_off == UINT16_MAX) {
+				mhdr_off = curr_off++;
+				type = mlx5_hw_dr_action_types[at->actions[i].type];
+				action_types[mhdr_off] = type;
+			}
+			break;
+		case RTE_FLOW_ACTION_TYPE_METER:
+			at->actions_off[i] = curr_off;
+			action_types[curr_off++] = MLX5DR_ACTION_TYP_ASO_METER;
+			if (curr_off >= MLX5_HW_MAX_ACTS)
+				goto err_actions_num;
+			action_types[curr_off++] = MLX5DR_ACTION_TYP_FT;
+			break;
+		default:
+			type = mlx5_hw_dr_action_types[at->actions[i].type];
+			at->actions_off[i] = curr_off;
+			action_types[curr_off++] = type;
+			break;
+		}
+	}
+	if (curr_off >= MLX5_HW_MAX_ACTS)
+		goto err_actions_num;
+	if (mhdr_off != UINT16_MAX)
+		at->mhdr_off = mhdr_off;
+	if (reformat_off != UINT16_MAX) {
+		at->reformat_off = reformat_off;
+		action_types[reformat_off] = reformat_act_type;
+	}
+	dr_template = mlx5dr_action_template_create(action_types);
+	if (dr_template)
+		at->dr_actions_num = curr_off;
+	else
+		DRV_LOG(ERR, "Failed to create DR action template: %d", rte_errno);
+	return dr_template;
+err_actions_num:
+	DRV_LOG(ERR, "Number of HW actions (%u) exceeded maximum (%u) allowed in template",
+		curr_off, MLX5_HW_MAX_ACTS);
+	return NULL;
+}
+
 /**
  * Create flow action template.
  *
@@ -2851,7 +3116,8 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 			struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	int len, act_len, mask_len, i;
+	int len, act_num, act_len, mask_len;
+	unsigned int i;
 	struct rte_flow_actions_template *at = NULL;
 	uint16_t pos = MLX5_HW_MAX_ACTS;
 	struct rte_flow_action tmp_action[MLX5_HW_MAX_ACTS];
@@ -2921,6 +3187,11 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 	if (mask_len <= 0)
 		return NULL;
 	len += RTE_ALIGN(mask_len, 16);
+	/* Count flow actions to allocate required space for storing DR offsets. */
+	act_num = 0;
+	for (i = 0; ra[i].type != RTE_FLOW_ACTION_TYPE_END; ++i)
+		act_num++;
+	len += RTE_ALIGN(act_num * sizeof(*at->actions_off), 16);
 	at = mlx5_malloc(MLX5_MEM_ZERO, len + sizeof(*at),
 			 RTE_CACHE_LINE_SIZE, rte_socket_id());
 	if (!at) {
@@ -2930,19 +3201,26 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 				   "cannot allocate action template");
 		return NULL;
 	}
-	/* Actions part is in the first half. */
+	/* Actions are stored in the first part. */
 	at->attr = *attr;
 	at->actions = (struct rte_flow_action *)(at + 1);
 	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, at->actions,
 				len, ra, error);
 	if (act_len <= 0)
 		goto error;
-	/* Masks part is in the second half. */
+	/* Masks are stored in the second part. */
 	at->masks = (struct rte_flow_action *)(((uint8_t *)at->actions) + act_len);
 	mask_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, at->masks,
 				 len - act_len, rm, error);
 	if (mask_len <= 0)
 		goto error;
+	/* DR action offsets are stored in the third part. */
+	at->actions_off = (uint16_t *)((uint8_t *)at->masks + mask_len);
+	at->actions_num = act_num;
+	for (i = 0; i < at->actions_num; ++i)
+		at->actions_off[i] = UINT16_MAX;
+	at->reformat_off = UINT16_MAX;
+	at->mhdr_off = UINT16_MAX;
 	at->rx_cpy_pos = pos;
 	/*
 	 * mlx5 PMD hacks indirect action index directly to the action conf.
@@ -2956,12 +3234,18 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 			at->masks[i].conf = masks->conf;
 		}
 	}
+	at->tmpl = flow_hw_dr_actions_template_create(at);
+	if (!at->tmpl)
+		goto error;
 	__atomic_fetch_add(&at->refcnt, 1, __ATOMIC_RELAXED);
 	LIST_INSERT_HEAD(&priv->flow_hw_at, at, next);
 	return at;
 error:
-	if (at)
+	if (at) {
+		if (at->tmpl)
+			mlx5dr_action_template_destroy(at->tmpl);
 		mlx5_free(at);
+	}
 	return NULL;
 }
 
@@ -2992,6 +3276,8 @@ flow_hw_actions_template_destroy(struct rte_eth_dev *dev __rte_unused,
 				   "action template in using");
 	}
 	LIST_REMOVE(template, next);
+	if (template->tmpl)
+		mlx5dr_action_template_destroy(template->tmpl);
 	mlx5_free(template);
 	return 0;
 }
@@ -3042,11 +3328,48 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 			 const struct rte_flow_item items[],
 			 struct rte_flow_error *error)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
 	int i;
 	bool items_end = false;
-	RTE_SET_USED(dev);
-	RTE_SET_USED(attr);
 
+	if (!attr->ingress && !attr->egress && !attr->transfer)
+		return rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ATTR, NULL,
+					  "at least one of the direction attributes"
+					  " must be specified");
+	if (priv->sh->config.dv_esw_en) {
+		MLX5_ASSERT(priv->master || priv->representor);
+		if (priv->master) {
+			/*
+			 * It is allowed to specify ingress, egress and transfer attributes
+			 * at the same time, in order to construct flows catching all missed
+			 * FDB traffic and forwarding it to the master port.
+			 */
+			if (!(attr->ingress ^ attr->egress ^ attr->transfer))
+				return rte_flow_error_set(error, EINVAL,
+							  RTE_FLOW_ERROR_TYPE_ATTR, NULL,
+							  "only one or all direction attributes"
+							  " at once can be used on transfer proxy"
+							  " port");
+		} else {
+			if (attr->transfer)
+				return rte_flow_error_set(error, EINVAL,
+							  RTE_FLOW_ERROR_TYPE_ATTR_TRANSFER, NULL,
+							  "transfer attribute cannot be used with"
+							  " port representors");
+			if (attr->ingress && attr->egress)
+				return rte_flow_error_set(error, EINVAL,
+							  RTE_FLOW_ERROR_TYPE_ATTR, NULL,
+							  "ingress and egress direction attributes"
+							  " cannot be used at the same time on"
+							  " port representors");
+		}
+	} else {
+		if (attr->transfer)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ATTR_TRANSFER, NULL,
+						  "transfer attribute cannot be used when"
+						  " E-Switch is disabled");
+	}
 	for (i = 0; !items_end; i++) {
 		int type = items[i].type;
 
@@ -3069,7 +3392,6 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 		{
 			const struct rte_flow_item_tag *tag =
 				(const struct rte_flow_item_tag *)items[i].spec;
-			struct mlx5_priv *priv = dev->data->dev_private;
 			uint8_t regcs = (uint8_t)priv->sh->cdev->config.hca_attr.set_reg_c;
 
 			if (!((1 << (tag->index - REG_C_0)) & regcs))
@@ -3077,7 +3399,26 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 							  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 							  NULL,
 							  "Unsupported internal tag index");
+			break;
 		}
+		case RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT:
+			if (attr->ingress || attr->egress)
+				return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ITEM, NULL,
+						  "represented port item cannot be used"
+						  " when transfer attribute is set");
+			break;
+		case RTE_FLOW_ITEM_TYPE_META:
+			if (!priv->sh->config.dv_esw_en ||
+			    priv->sh->config.dv_xmeta_en != MLX5_XMETA_MODE_META32_HWS) {
+				if (attr->ingress)
+					return rte_flow_error_set(error, EINVAL,
+								  RTE_FLOW_ERROR_TYPE_ITEM, NULL,
+								  "META item is not supported"
+								  " on current FW with ingress"
+								  " attribute");
+			}
+			break;
 		case RTE_FLOW_ITEM_TYPE_VOID:
 		case RTE_FLOW_ITEM_TYPE_ETH:
 		case RTE_FLOW_ITEM_TYPE_VLAN:
@@ -3087,10 +3428,8 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ITEM_TYPE_TCP:
 		case RTE_FLOW_ITEM_TYPE_GTP:
 		case RTE_FLOW_ITEM_TYPE_GTP_PSC:
-		case RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT:
 		case RTE_FLOW_ITEM_TYPE_VXLAN:
 		case MLX5_RTE_FLOW_ITEM_TYPE_SQ:
-		case RTE_FLOW_ITEM_TYPE_META:
 		case RTE_FLOW_ITEM_TYPE_GRE:
 		case RTE_FLOW_ITEM_TYPE_GRE_KEY:
 		case RTE_FLOW_ITEM_TYPE_GRE_OPTION:
@@ -3138,21 +3477,7 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 
 	if (flow_hw_pattern_validate(dev, attr, items, error))
 		return NULL;
-	if (priv->sh->config.dv_esw_en && attr->ingress) {
-		/*
-		 * Disallow pattern template with ingress and egress/transfer
-		 * attributes in order to forbid implicit port matching
-		 * on egress and transfer traffic.
-		 */
-		if (attr->egress || attr->transfer) {
-			rte_flow_error_set(error, EINVAL,
-					   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					   NULL,
-					   "item template for ingress traffic"
-					   " cannot be used for egress/transfer"
-					   " traffic when E-Switch is enabled");
-			return NULL;
-		}
+	if (priv->sh->config.dv_esw_en && attr->ingress && !attr->egress && !attr->transfer) {
 		copied_items = flow_hw_copy_prepend_port_item(items, error);
 		if (!copied_items)
 			return NULL;
@@ -4536,6 +4861,10 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 		return;
 	flow_hw_rxq_flag_set(dev, false);
 	flow_hw_flush_all_ctrl_flows(dev);
+	while (!LIST_EMPTY(&priv->flow_hw_tbl_ongo)) {
+		tbl = LIST_FIRST(&priv->flow_hw_tbl_ongo);
+		flow_hw_table_destroy(dev, tbl, NULL);
+	}
 	while (!LIST_EMPTY(&priv->flow_hw_tbl)) {
 		tbl = LIST_FIRST(&priv->flow_hw_tbl);
 		flow_hw_table_destroy(dev, tbl, NULL);
@@ -4673,6 +5002,54 @@ void flow_hw_clear_tags_set(struct rte_eth_dev *dev)
 		       sizeof(enum modify_reg) * MLX5_FLOW_HW_TAGS_MAX);
 }
 
+uint32_t mlx5_flow_hw_flow_metadata_config_refcnt;
+uint8_t mlx5_flow_hw_flow_metadata_esw_en;
+uint8_t mlx5_flow_hw_flow_metadata_xmeta_en;
+
+/**
+ * Initializes static configuration of META flow items.
+ *
+ * As a temporary workaround, META flow item is translated to a register,
+ * based on statically saved dv_esw_en and dv_xmeta_en device arguments.
+ * It is a workaround for flow_hw_get_reg_id() where port specific information
+ * is not available at runtime.
+ *
+ * Values of dv_esw_en and dv_xmeta_en device arguments are taken from the first opened port.
+ * This means that each mlx5 port will use the same configuration for translation
+ * of META flow items.
+ *
+ * @param[in] dev
+ *    Pointer to Ethernet device.
+ */
+void
+flow_hw_init_flow_metadata_config(struct rte_eth_dev *dev)
+{
+	uint32_t refcnt;
+
+	refcnt = __atomic_fetch_add(&mlx5_flow_hw_flow_metadata_config_refcnt, 1,
+				    __ATOMIC_RELAXED);
+	if (refcnt > 0)
+		return;
+	mlx5_flow_hw_flow_metadata_esw_en = MLX5_SH(dev)->config.dv_esw_en;
+	mlx5_flow_hw_flow_metadata_xmeta_en = MLX5_SH(dev)->config.dv_xmeta_en;
+}
+
+/**
+ * Clears statically stored configuration related to META flow items.
+ */
+void
+flow_hw_clear_flow_metadata_config(void)
+{
+	uint32_t refcnt;
+
+	refcnt = __atomic_sub_fetch(&mlx5_flow_hw_flow_metadata_config_refcnt, 1,
+				    __ATOMIC_RELAXED);
+	if (refcnt > 0)
+		return;
+	mlx5_flow_hw_flow_metadata_esw_en = 0;
+	mlx5_flow_hw_flow_metadata_xmeta_en = 0;
+}
+
 /**
  * Create shared action.
  *
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index cccec08d70..c260c81e57 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -1170,6 +1170,16 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 			dev->data->port_id, rte_strerror(rte_errno));
 		goto error;
 	}
+#ifdef HAVE_MLX5_HWS_SUPPORT
+	if (priv->sh->config.dv_flow_en == 2) {
+		ret = flow_hw_table_update(dev, NULL);
+		if (ret) {
+			DRV_LOG(ERR, "port %u failed to update HWS tables",
+				dev->data->port_id);
+			goto error;
+		}
+	}
+#endif
 	ret = mlx5_traffic_enable(dev);
 	if (ret) {
 		DRV_LOG(ERR, "port %u failed to set defaults flows",
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v5 10/18] net/mlx5: add HW steering connection tracking support
  2022-10-20  3:21 ` [PATCH v5 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (8 preceding siblings ...)
  2022-10-20  3:22   ` [PATCH v5 09/18] net/mlx5: support DR action template API Suanming Mou
@ 2022-10-20  3:22   ` Suanming Mou
  2022-10-20  3:22   ` [PATCH v5 11/18] net/mlx5: add HW steering VLAN push, pop and VID modify flow actions Suanming Mou
                     ` (7 subsequent siblings)
  17 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-10-20  3:22 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika

This commit adds connection tracking support to HW steering, matching
what SW steering provided before.

Unlike the SW steering implementation, HW steering takes advantage of
bulk action allocation, so only a single CT pool is needed.

An indexed pool is introduced to record the actions allocated from the
bulk as well as the CT action state, etc. Once a CT action is allocated
from the bulk, an indexed object is also allocated from the indexed
pool, and similarly for deallocation. This way mlx5_aso_ct_action can
also be managed by that indexed pool and does not need to be reserved
from mlx5_aso_ct_pool. The single CT pool is also saved directly in the
mlx5_aso_ct_action struct, as sketched below.

The ASO operation functions are shared with the SW steering
implementation.

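For illustration only (not part of the patch), the single-pool allocation
scheme described above can be pictured as follows; the example_* helpers
are hypothetical, while mlx5_ipool_zmalloc()/mlx5_ipool_free() and the
pool/action structures come from the driver:

static struct mlx5_aso_ct_action *
example_ct_alloc(struct mlx5_aso_ct_pool *pool, uint32_t *idx)
{
	/* One indexed object per CT action allocated from the single bulk. */
	struct mlx5_aso_ct_action *ct = mlx5_ipool_zmalloc(pool->cts, idx);

	if (ct == NULL)
		return NULL;
	ct->pool = pool;	/* The single CT pool is saved in the action. */
	ct->offset = *idx - 1;	/* Indexed pool indices are 1-based. */
	return ct;
}

static void
example_ct_free(struct mlx5_aso_ct_action *ct, uint32_t idx)
{
	mlx5_ipool_free(ct->pool->cts, idx); /* Releases the indexed object. */
}
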
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 doc/guides/nics/mlx5.rst               |   2 +-
 doc/guides/rel_notes/release_22_11.rst |   1 +
 drivers/net/mlx5/linux/mlx5_os.c       |   8 +-
 drivers/net/mlx5/mlx5.c                |   3 +-
 drivers/net/mlx5/mlx5.h                |  54 +++-
 drivers/net/mlx5/mlx5_flow.c           |   1 +
 drivers/net/mlx5/mlx5_flow.h           |   7 +
 drivers/net/mlx5/mlx5_flow_aso.c       | 212 ++++++++++----
 drivers/net/mlx5/mlx5_flow_dv.c        |  28 +-
 drivers/net/mlx5/mlx5_flow_hw.c        | 381 ++++++++++++++++++++++++-
 10 files changed, 619 insertions(+), 78 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 0c7bd042a4..e499c38dcf 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -506,7 +506,7 @@ Limitations
   - Cannot co-exist with ASO meter, ASO age action in a single flow rule.
   - Flow rules insertion rate and memory consumption need more optimization.
   - 256 ports maximum.
-  - 4M connections maximum.
+  - 4M connections maximum with ``dv_flow_en`` 1 mode. 16M with ``dv_flow_en`` 2.
 
 - Multi-thread flow insertion:
 
diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 64fbecd7b3..1ec218a5d1 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -244,6 +244,7 @@ New Features
     - Support of FDB.
     - Support of meter.
     - Support of counter.
+    - Support of CT.
 
 * **Rewritten pmdinfo script.**
 
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 78cc44fae8..55801682cc 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1349,9 +1349,11 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 			DRV_LOG(DEBUG, "Flow Hit ASO is supported.");
 		}
 #endif /* HAVE_MLX5_DR_CREATE_ACTION_ASO */
-#if defined(HAVE_MLX5_DR_CREATE_ACTION_ASO) && \
-	defined(HAVE_MLX5_DR_ACTION_ASO_CT)
-		if (hca_attr->ct_offload && priv->mtr_color_reg == REG_C_3) {
+#if defined (HAVE_MLX5_DR_CREATE_ACTION_ASO) && \
+    defined (HAVE_MLX5_DR_ACTION_ASO_CT)
+		/* HWS create CT ASO SQ based on HWS configure queue number. */
+		if (sh->config.dv_flow_en != 2 &&
+		    hca_attr->ct_offload && priv->mtr_color_reg == REG_C_3) {
 			err = mlx5_flow_aso_ct_mng_init(sh);
 			if (err) {
 				err = -err;
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index e7a4aac354..6490ac636c 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -755,7 +755,8 @@ mlx5_flow_aso_ct_mng_init(struct mlx5_dev_ctx_shared *sh)
 
 	if (sh->ct_mng)
 		return 0;
-	sh->ct_mng = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*sh->ct_mng),
+	sh->ct_mng = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*sh->ct_mng) +
+				 sizeof(struct mlx5_aso_sq) * MLX5_ASO_CT_SQ_NUM,
 				 RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
 	if (!sh->ct_mng) {
 		DRV_LOG(ERR, "ASO CT management allocation failed.");
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index b8663e0322..9c080e5eac 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -44,6 +44,8 @@
 
 #define MLX5_SH(dev) (((struct mlx5_priv *)(dev)->data->dev_private)->sh)
 
+#define MLX5_HW_INV_QUEUE UINT32_MAX
+
 /*
  * Number of modification commands.
  * The maximal actions amount in FW is some constant, and it is 16 in the
@@ -1164,7 +1166,12 @@ enum mlx5_aso_ct_state {
 
 /* Generic ASO connection tracking structure. */
 struct mlx5_aso_ct_action {
-	LIST_ENTRY(mlx5_aso_ct_action) next; /* Pointer to the next ASO CT. */
+	union {
+		LIST_ENTRY(mlx5_aso_ct_action) next;
+		/* Pointer to the next ASO CT. Used only in SWS. */
+		struct mlx5_aso_ct_pool *pool;
+		/* Pointer to action pool. Used only in HWS. */
+	};
 	void *dr_action_orig; /* General action object for original dir. */
 	void *dr_action_rply; /* General action object for reply dir. */
 	uint32_t refcnt; /* Action used count in device flows. */
@@ -1178,28 +1185,48 @@ struct mlx5_aso_ct_action {
 #define MLX5_ASO_CT_UPDATE_STATE(c, s) \
 	__atomic_store_n(&((c)->state), (s), __ATOMIC_RELAXED)
 
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+
 /* ASO connection tracking software pool definition. */
 struct mlx5_aso_ct_pool {
 	uint16_t index; /* Pool index in pools array. */
+	/* Free ASO CT index in the pool. Used by HWS. */
+	struct mlx5_indexed_pool *cts;
 	struct mlx5_devx_obj *devx_obj;
-	/* The first devx object in the bulk, used for freeing (not yet). */
-	struct mlx5_aso_ct_action actions[MLX5_ASO_CT_ACTIONS_PER_POOL];
+	union {
+		void *dummy_action;
+		/* Dummy action to increase the reference count in the driver. */
+		struct mlx5dr_action *dr_action;
+		/* HWS action. */
+	};
+	struct mlx5_aso_sq *sq; /* Async ASO SQ. */
+	struct mlx5_aso_sq *shared_sq; /* Shared ASO SQ. */
+	struct mlx5_aso_ct_action actions[0];
 	/* CT action structures bulk. */
 };
 
 LIST_HEAD(aso_ct_list, mlx5_aso_ct_action);
 
+#define MLX5_ASO_CT_SQ_NUM 16
+
 /* Pools management structure for ASO connection tracking pools. */
 struct mlx5_aso_ct_pools_mng {
 	struct mlx5_aso_ct_pool **pools;
 	uint16_t n; /* Total number of pools. */
 	uint16_t next; /* Number of pools in use, index of next free pool. */
+	uint32_t nb_sq; /* Number of ASO SQ. */
 	rte_spinlock_t ct_sl; /* The ASO CT free list lock. */
 	rte_rwlock_t resize_rwl; /* The ASO CT pool resize lock. */
 	struct aso_ct_list free_cts; /* Free ASO CT objects list. */
-	struct mlx5_aso_sq aso_sq; /* ASO queue objects. */
+	struct mlx5_aso_sq aso_sqs[0]; /* ASO queue objects. */
 };
 
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
 /* LAG attr. */
 struct mlx5_lag {
 	uint8_t tx_remap_affinity[16]; /* The PF port number of affinity */
@@ -1337,8 +1364,7 @@ struct mlx5_dev_ctx_shared {
 	rte_spinlock_t geneve_tlv_opt_sl; /* Lock for geneve tlv resource */
 	struct mlx5_flow_mtr_mng *mtrmng;
 	/* Meter management structure. */
-	struct mlx5_aso_ct_pools_mng *ct_mng;
-	/* Management data for ASO connection tracking. */
+	struct mlx5_aso_ct_pools_mng *ct_mng; /* Management data for ASO CT in HWS only. */
 	struct mlx5_lb_ctx self_lb; /* QP to enable self loopback for Devx. */
 	unsigned int flow_max_priority;
 	enum modify_reg flow_mreg_c[MLX5_MREG_C_NUM];
@@ -1654,6 +1680,9 @@ struct mlx5_priv {
 	/* HW steering create ongoing rte flow table list header. */
 	LIST_HEAD(flow_hw_tbl_ongo, rte_flow_template_table) flow_hw_tbl_ongo;
 	struct mlx5_indexed_pool *acts_ipool; /* Action data indexed pool. */
+	struct mlx5_aso_ct_pools_mng *ct_mng;
+	/* Management data for ASO connection tracking. */
+	struct mlx5_aso_ct_pool *hws_ctpool; /* HW steering's CT pool. */
 #endif
 };
 
@@ -2053,15 +2082,15 @@ int mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
 		struct mlx5_aso_mtr *mtr, struct mlx5_mtr_bulk *bulk);
 int mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh,
 		struct mlx5_aso_mtr *mtr);
-int mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
+int mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			      struct mlx5_aso_ct_action *ct,
 			      const struct rte_flow_action_conntrack *profile);
-int mlx5_aso_ct_wait_ready(struct mlx5_dev_ctx_shared *sh,
+int mlx5_aso_ct_wait_ready(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			   struct mlx5_aso_ct_action *ct);
-int mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh,
+int mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			     struct mlx5_aso_ct_action *ct,
 			     struct rte_flow_action_conntrack *profile);
-int mlx5_aso_ct_available(struct mlx5_dev_ctx_shared *sh,
+int mlx5_aso_ct_available(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			  struct mlx5_aso_ct_action *ct);
 uint32_t
 mlx5_get_supported_sw_parsing_offloads(const struct mlx5_hca_attr *attr);
@@ -2072,6 +2101,11 @@ int mlx5_aso_cnt_queue_init(struct mlx5_dev_ctx_shared *sh);
 void mlx5_aso_cnt_queue_uninit(struct mlx5_dev_ctx_shared *sh);
 int mlx5_aso_cnt_query(struct mlx5_dev_ctx_shared *sh,
 		struct mlx5_hws_cnt_pool *cpool);
+int mlx5_aso_ct_queue_init(struct mlx5_dev_ctx_shared *sh,
+			   struct mlx5_aso_ct_pools_mng *ct_mng,
+			   uint32_t nb_queues);
+int mlx5_aso_ct_queue_uninit(struct mlx5_dev_ctx_shared *sh,
+			     struct mlx5_aso_ct_pools_mng *ct_mng);
 
 /* mlx5_flow_flex.c */
 
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 38932fe9d7..7c3295609d 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -49,6 +49,7 @@ struct flow_hw_port_info mlx5_flow_hw_port_infos[RTE_MAX_ETHPORTS];
  */
 uint32_t mlx5_flow_hw_avl_tags_init_cnt;
 enum modify_reg mlx5_flow_hw_avl_tags[MLX5_FLOW_HW_TAGS_MAX] = {REG_NON};
+enum modify_reg mlx5_flow_hw_aso_tag;
 
 struct tunnel_default_miss_ctx {
 	uint16_t *queue;
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 210cc9ae3e..7e90eac2d0 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -82,6 +82,10 @@ enum {
 #define MLX5_INDIRECT_ACT_CT_GET_IDX(index) \
 	((index) & ((1 << MLX5_INDIRECT_ACT_CT_OWNER_SHIFT) - 1))
 
+#define MLX5_ACTION_CTX_CT_GET_IDX  MLX5_INDIRECT_ACT_CT_GET_IDX
+#define MLX5_ACTION_CTX_CT_GET_OWNER MLX5_INDIRECT_ACT_CT_GET_OWNER
+#define MLX5_ACTION_CTX_CT_GEN_IDX MLX5_INDIRECT_ACT_CT_GEN_IDX
+
 /* Matches on selected register. */
 struct mlx5_rte_flow_item_tag {
 	enum modify_reg id;
@@ -1455,6 +1459,7 @@ extern struct flow_hw_port_info mlx5_flow_hw_port_infos[RTE_MAX_ETHPORTS];
 #define MLX5_FLOW_HW_TAGS_MAX 8
 extern uint32_t mlx5_flow_hw_avl_tags_init_cnt;
 extern enum modify_reg mlx5_flow_hw_avl_tags[];
+extern enum modify_reg mlx5_flow_hw_aso_tag;
 
 /*
  * Get metadata match tag and mask for given rte_eth_dev port.
@@ -1531,6 +1536,8 @@ flow_hw_get_reg_id(enum rte_flow_item_type type, uint32_t id)
 		 * REG_B case should be rejected on pattern template validation.
 		 */
 		return REG_A;
+	case RTE_FLOW_ITEM_TYPE_CONNTRACK:
+		return mlx5_flow_hw_aso_tag;
 	case RTE_FLOW_ITEM_TYPE_TAG:
 		MLX5_ASSERT(id < MLX5_FLOW_HW_TAGS_MAX);
 		return mlx5_flow_hw_avl_tags[id];
diff --git a/drivers/net/mlx5/mlx5_flow_aso.c b/drivers/net/mlx5/mlx5_flow_aso.c
index ed9272e583..c00c07b891 100644
--- a/drivers/net/mlx5/mlx5_flow_aso.c
+++ b/drivers/net/mlx5/mlx5_flow_aso.c
@@ -313,16 +313,8 @@ mlx5_aso_queue_init(struct mlx5_dev_ctx_shared *sh,
 		mlx5_aso_mtr_init_sq(&sh->mtrmng->pools_mng.sq);
 		break;
 	case ASO_OPC_MOD_CONNECTION_TRACKING:
-		/* 64B per object for query. */
-		if (mlx5_aso_reg_mr(cdev, 64 * sq_desc_n,
-				    &sh->ct_mng->aso_sq.mr))
+		if (mlx5_aso_ct_queue_init(sh, sh->ct_mng, MLX5_ASO_CT_SQ_NUM))
 			return -1;
-		if (mlx5_aso_sq_create(cdev, &sh->ct_mng->aso_sq,
-				       sh->tx_uar.obj, MLX5_ASO_QUEUE_LOG_DESC)) {
-			mlx5_aso_dereg_mr(cdev, &sh->ct_mng->aso_sq.mr);
-			return -1;
-		}
-		mlx5_aso_ct_init_sq(&sh->ct_mng->aso_sq);
 		break;
 	default:
 		DRV_LOG(ERR, "Unknown ASO operation mode");
@@ -343,7 +335,7 @@ void
 mlx5_aso_queue_uninit(struct mlx5_dev_ctx_shared *sh,
 		      enum mlx5_access_aso_opc_mod aso_opc_mod)
 {
-	struct mlx5_aso_sq *sq;
+	struct mlx5_aso_sq *sq = NULL;
 
 	switch (aso_opc_mod) {
 	case ASO_OPC_MOD_FLOW_HIT:
@@ -354,14 +346,14 @@ mlx5_aso_queue_uninit(struct mlx5_dev_ctx_shared *sh,
 		sq = &sh->mtrmng->pools_mng.sq;
 		break;
 	case ASO_OPC_MOD_CONNECTION_TRACKING:
-		mlx5_aso_dereg_mr(sh->cdev, &sh->ct_mng->aso_sq.mr);
-		sq = &sh->ct_mng->aso_sq;
+		mlx5_aso_ct_queue_uninit(sh, sh->ct_mng);
 		break;
 	default:
 		DRV_LOG(ERR, "Unknown ASO operation mode");
 		return;
 	}
-	mlx5_aso_destroy_sq(sq);
+	if (sq)
+		mlx5_aso_destroy_sq(sq);
 }
 
 /**
@@ -903,6 +895,89 @@ mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh,
 	return -1;
 }
 
+static inline struct mlx5_aso_sq*
+__mlx5_aso_ct_get_sq_in_hws(uint32_t queue,
+			    struct mlx5_aso_ct_pool *pool)
+{
+	return (queue == MLX5_HW_INV_QUEUE) ?
+		pool->shared_sq : &pool->sq[queue];
+}
+
+static inline struct mlx5_aso_sq*
+__mlx5_aso_ct_get_sq_in_sws(struct mlx5_dev_ctx_shared *sh,
+			    struct mlx5_aso_ct_action *ct)
+{
+	return &sh->ct_mng->aso_sqs[ct->offset & (MLX5_ASO_CT_SQ_NUM - 1)];
+}
+
+static inline struct mlx5_aso_ct_pool*
+__mlx5_aso_ct_get_pool(struct mlx5_dev_ctx_shared *sh,
+		       struct mlx5_aso_ct_action *ct)
+{
+	if (likely(sh->config.dv_flow_en == 2))
+		return ct->pool;
+	return container_of(ct, struct mlx5_aso_ct_pool, actions[ct->offset]);
+}
+
+int
+mlx5_aso_ct_queue_uninit(struct mlx5_dev_ctx_shared *sh,
+			 struct mlx5_aso_ct_pools_mng *ct_mng)
+{
+	uint32_t i;
+
+	/* Deregister the query MR and destroy the SQ of each CT queue. */
+	for (i = 0; i < ct_mng->nb_sq; i++) {
+		if (ct_mng->aso_sqs[i].mr.addr)
+			mlx5_aso_dereg_mr(sh->cdev, &ct_mng->aso_sqs[i].mr);
+		mlx5_aso_destroy_sq(&ct_mng->aso_sqs[i]);
+	}
+	return 0;
+}
+
+/**
+ * API to create and initialize CT Send Queue used for ASO access.
+ *
+ * @param[in] sh
+ *   Pointer to shared device context.
+ * @param[in] ct_mng
+ *   Pointer to the CT management struct.
+ * @param[in] nb_queues
+ *   Number of queues to be allocated.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_aso_ct_queue_init(struct mlx5_dev_ctx_shared *sh,
+		       struct mlx5_aso_ct_pools_mng *ct_mng,
+		       uint32_t nb_queues)
+{
+	uint32_t i;
+
+	/* 64B per object for query. */
+	for (i = 0; i < nb_queues; i++) {
+		if (mlx5_aso_reg_mr(sh->cdev, 64 * (1 << MLX5_ASO_QUEUE_LOG_DESC),
+				    &ct_mng->aso_sqs[i].mr))
+			goto error;
+		if (mlx5_aso_sq_create(sh->cdev, &ct_mng->aso_sqs[i],
+				       sh->tx_uar.obj,
+				       MLX5_ASO_QUEUE_LOG_DESC))
+			goto error;
+		mlx5_aso_ct_init_sq(&ct_mng->aso_sqs[i]);
+	}
+	ct_mng->nb_sq = nb_queues;
+	return 0;
+error:
+	do {
+		if (ct_mng->aso_sqs[i].mr.addr)
+			mlx5_aso_dereg_mr(sh->cdev, &ct_mng->aso_sqs[i].mr);
+		/* Destroy the SQ as done in mlx5_aso_ct_queue_uninit(). */
+		mlx5_aso_destroy_sq(&ct_mng->aso_sqs[i]);
+	} while (i--);
+	ct_mng->nb_sq = 0;
+	return -1;
+}
+
 /*
  * Post a WQE to the ASO CT SQ to modify the context.
  *
@@ -918,11 +993,12 @@ mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh,
  */
 static uint16_t
 mlx5_aso_ct_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
+			      struct mlx5_aso_sq *sq,
 			      struct mlx5_aso_ct_action *ct,
-			      const struct rte_flow_action_conntrack *profile)
+			      const struct rte_flow_action_conntrack *profile,
+			      bool need_lock)
 {
 	volatile struct mlx5_aso_wqe *wqe = NULL;
-	struct mlx5_aso_sq *sq = &sh->ct_mng->aso_sq;
 	uint16_t size = 1 << sq->log_desc_n;
 	uint16_t mask = size - 1;
 	uint16_t res;
@@ -931,11 +1007,13 @@ mlx5_aso_ct_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	void *orig_dir;
 	void *reply_dir;
 
-	rte_spinlock_lock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_lock(&sq->sqsl);
 	/* Prevent other threads to update the index. */
 	res = size - (uint16_t)(sq->head - sq->tail);
 	if (unlikely(!res)) {
-		rte_spinlock_unlock(&sq->sqsl);
+		if (need_lock)
+			rte_spinlock_unlock(&sq->sqsl);
 		DRV_LOG(ERR, "Fail: SQ is full and no free WQE to send");
 		return 0;
 	}
@@ -945,7 +1023,7 @@ mlx5_aso_ct_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	MLX5_ASO_CT_UPDATE_STATE(ct, ASO_CONNTRACK_WAIT);
 	sq->elts[sq->head & mask].ct = ct;
 	sq->elts[sq->head & mask].query_data = NULL;
-	pool = container_of(ct, struct mlx5_aso_ct_pool, actions[ct->offset]);
+	pool = __mlx5_aso_ct_get_pool(sh, ct);
 	/* Each WQE will have a single CT object. */
 	wqe->general_cseg.misc = rte_cpu_to_be_32(pool->devx_obj->id +
 						  ct->offset);
@@ -1028,7 +1106,8 @@ mlx5_aso_ct_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
 			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
 			   !sh->tx_uar.dbnc);
-	rte_spinlock_unlock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_unlock(&sq->sqsl);
 	return 1;
 }
 
@@ -1080,10 +1159,11 @@ mlx5_aso_ct_status_update(struct mlx5_aso_sq *sq, uint16_t num)
  */
 static int
 mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
-			    struct mlx5_aso_ct_action *ct, char *data)
+			    struct mlx5_aso_sq *sq,
+			    struct mlx5_aso_ct_action *ct, char *data,
+			    bool need_lock)
 {
 	volatile struct mlx5_aso_wqe *wqe = NULL;
-	struct mlx5_aso_sq *sq = &sh->ct_mng->aso_sq;
 	uint16_t size = 1 << sq->log_desc_n;
 	uint16_t mask = size - 1;
 	uint16_t res;
@@ -1098,10 +1178,12 @@ mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
 	} else if (state == ASO_CONNTRACK_WAIT) {
 		return 0;
 	}
-	rte_spinlock_lock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_lock(&sq->sqsl);
 	res = size - (uint16_t)(sq->head - sq->tail);
 	if (unlikely(!res)) {
-		rte_spinlock_unlock(&sq->sqsl);
+		if (need_lock)
+			rte_spinlock_unlock(&sq->sqsl);
 		DRV_LOG(ERR, "Fail: SQ is full and no free WQE to send");
 		return 0;
 	}
@@ -1113,7 +1195,7 @@ mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
 	wqe_idx = sq->head & mask;
 	sq->elts[wqe_idx].ct = ct;
 	sq->elts[wqe_idx].query_data = data;
-	pool = container_of(ct, struct mlx5_aso_ct_pool, actions[ct->offset]);
+	pool = __mlx5_aso_ct_get_pool(sh, ct);
 	/* Each WQE will have a single CT object. */
 	wqe->general_cseg.misc = rte_cpu_to_be_32(pool->devx_obj->id +
 						  ct->offset);
@@ -1141,7 +1223,8 @@ mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
 	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
 			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
 			   !sh->tx_uar.dbnc);
-	rte_spinlock_unlock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_unlock(&sq->sqsl);
 	return 1;
 }
 
@@ -1152,9 +1235,10 @@ mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
  *   Pointer to the CT pools management structure.
  */
 static void
-mlx5_aso_ct_completion_handle(struct mlx5_aso_ct_pools_mng *mng)
+mlx5_aso_ct_completion_handle(struct mlx5_dev_ctx_shared *sh __rte_unused,
+			      struct mlx5_aso_sq *sq,
+			      bool need_lock)
 {
-	struct mlx5_aso_sq *sq = &mng->aso_sq;
 	struct mlx5_aso_cq *cq = &sq->cq;
 	volatile struct mlx5_cqe *restrict cqe;
 	const uint32_t cq_size = 1 << cq->log_desc_n;
@@ -1165,10 +1249,12 @@ mlx5_aso_ct_completion_handle(struct mlx5_aso_ct_pools_mng *mng)
 	uint16_t n = 0;
 	int ret;
 
-	rte_spinlock_lock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_lock(&sq->sqsl);
 	max = (uint16_t)(sq->head - sq->tail);
 	if (unlikely(!max)) {
-		rte_spinlock_unlock(&sq->sqsl);
+		if (need_lock)
+			rte_spinlock_unlock(&sq->sqsl);
 		return;
 	}
 	next_idx = cq->cq_ci & mask;
@@ -1199,7 +1285,8 @@ mlx5_aso_ct_completion_handle(struct mlx5_aso_ct_pools_mng *mng)
 		rte_io_wmb();
 		cq->cq_obj.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci);
 	}
-	rte_spinlock_unlock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_unlock(&sq->sqsl);
 }
 
 /*
@@ -1207,6 +1294,8 @@ mlx5_aso_ct_completion_handle(struct mlx5_aso_ct_pools_mng *mng)
  *
  * @param[in] sh
  *   Pointer to mlx5_dev_ctx_shared object.
+ * @param[in] queue
+ *   The queue index.
  * @param[in] ct
  *   Pointer to connection tracking offload object.
  * @param[in] profile
@@ -1217,21 +1306,26 @@ mlx5_aso_ct_completion_handle(struct mlx5_aso_ct_pools_mng *mng)
  */
 int
 mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
+			  uint32_t queue,
 			  struct mlx5_aso_ct_action *ct,
 			  const struct rte_flow_action_conntrack *profile)
 {
 	uint32_t poll_wqe_times = MLX5_CT_POLL_WQE_CQE_TIMES;
-	struct mlx5_aso_ct_pool *pool;
+	struct mlx5_aso_ct_pool *pool = __mlx5_aso_ct_get_pool(sh, ct);
+	struct mlx5_aso_sq *sq;
+	bool need_lock = !!(queue == MLX5_HW_INV_QUEUE);
 
-	MLX5_ASSERT(ct);
+	if (sh->config.dv_flow_en == 2)
+		sq = __mlx5_aso_ct_get_sq_in_hws(queue, pool);
+	else
+		sq = __mlx5_aso_ct_get_sq_in_sws(sh, ct);
 	do {
-		mlx5_aso_ct_completion_handle(sh->ct_mng);
-		if (mlx5_aso_ct_sq_enqueue_single(sh, ct, profile))
+		mlx5_aso_ct_completion_handle(sh, sq, need_lock);
+		if (mlx5_aso_ct_sq_enqueue_single(sh, sq, ct, profile, need_lock))
 			return 0;
 		/* Waiting for wqe resource. */
 		rte_delay_us_sleep(10u);
 	} while (--poll_wqe_times);
-	pool = container_of(ct, struct mlx5_aso_ct_pool, actions[ct->offset]);
 	DRV_LOG(ERR, "Fail to send WQE for ASO CT %d in pool %d",
 		ct->offset, pool->index);
 	return -1;
@@ -1242,6 +1336,8 @@ mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
  *
  * @param[in] sh
  *   Pointer to mlx5_dev_ctx_shared object.
+ * @param[in] queue
+ *   The queue which the CT works on.
  * @param[in] ct
  *   Pointer to connection tracking offload object.
  *
@@ -1249,25 +1345,29 @@ mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
  *   0 on success, -1 on failure.
  */
 int
-mlx5_aso_ct_wait_ready(struct mlx5_dev_ctx_shared *sh,
+mlx5_aso_ct_wait_ready(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 		       struct mlx5_aso_ct_action *ct)
 {
-	struct mlx5_aso_ct_pools_mng *mng = sh->ct_mng;
 	uint32_t poll_cqe_times = MLX5_CT_POLL_WQE_CQE_TIMES;
-	struct mlx5_aso_ct_pool *pool;
+	struct mlx5_aso_ct_pool *pool = __mlx5_aso_ct_get_pool(sh, ct);
+	struct mlx5_aso_sq *sq;
+	bool need_lock = !!(queue == MLX5_HW_INV_QUEUE);
 
+	if (sh->config.dv_flow_en == 2)
+		sq = __mlx5_aso_ct_get_sq_in_hws(queue, pool);
+	else
+		sq = __mlx5_aso_ct_get_sq_in_sws(sh, ct);
 	if (__atomic_load_n(&ct->state, __ATOMIC_RELAXED) ==
 	    ASO_CONNTRACK_READY)
 		return 0;
 	do {
-		mlx5_aso_ct_completion_handle(mng);
+		mlx5_aso_ct_completion_handle(sh, sq, need_lock);
 		if (__atomic_load_n(&ct->state, __ATOMIC_RELAXED) ==
 		    ASO_CONNTRACK_READY)
 			return 0;
 		/* Waiting for CQE ready, consider should block or sleep. */
 		rte_delay_us_sleep(MLX5_ASO_WQE_CQE_RESPONSE_DELAY);
 	} while (--poll_cqe_times);
-	pool = container_of(ct, struct mlx5_aso_ct_pool, actions[ct->offset]);
 	DRV_LOG(ERR, "Fail to poll CQE for ASO CT %d in pool %d",
 		ct->offset, pool->index);
 	return -1;
@@ -1363,18 +1463,24 @@ mlx5_aso_ct_obj_analyze(struct rte_flow_action_conntrack *profile,
  */
 int
 mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh,
+			 uint32_t queue,
 			 struct mlx5_aso_ct_action *ct,
 			 struct rte_flow_action_conntrack *profile)
 {
 	uint32_t poll_wqe_times = MLX5_CT_POLL_WQE_CQE_TIMES;
-	struct mlx5_aso_ct_pool *pool;
+	struct mlx5_aso_ct_pool *pool = __mlx5_aso_ct_get_pool(sh, ct);
+	struct mlx5_aso_sq *sq;
+	bool need_lock = !!(queue == MLX5_HW_INV_QUEUE);
 	char out_data[64 * 2];
 	int ret;
 
-	MLX5_ASSERT(ct);
+	if (sh->config.dv_flow_en == 2)
+		sq = __mlx5_aso_ct_get_sq_in_hws(queue, pool);
+	else
+		sq = __mlx5_aso_ct_get_sq_in_sws(sh, ct);
 	do {
-		mlx5_aso_ct_completion_handle(sh->ct_mng);
-		ret = mlx5_aso_ct_sq_query_single(sh, ct, out_data);
+		mlx5_aso_ct_completion_handle(sh, sq, need_lock);
+		ret = mlx5_aso_ct_sq_query_single(sh, sq, ct, out_data, need_lock);
 		if (ret < 0)
 			return ret;
 		else if (ret > 0)
@@ -1383,12 +1489,11 @@ mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh,
 		else
 			rte_delay_us_sleep(10u);
 	} while (--poll_wqe_times);
-	pool = container_of(ct, struct mlx5_aso_ct_pool, actions[ct->offset]);
 	DRV_LOG(ERR, "Fail to send WQE for ASO CT %d in pool %d",
 		ct->offset, pool->index);
 	return -1;
 data_handle:
-	ret = mlx5_aso_ct_wait_ready(sh, ct);
+	ret = mlx5_aso_ct_wait_ready(sh, queue, ct);
 	if (!ret)
 		mlx5_aso_ct_obj_analyze(profile, out_data);
 	return ret;
@@ -1408,13 +1513,20 @@ mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh,
  */
 int
 mlx5_aso_ct_available(struct mlx5_dev_ctx_shared *sh,
+		      uint32_t queue,
 		      struct mlx5_aso_ct_action *ct)
 {
-	struct mlx5_aso_ct_pools_mng *mng = sh->ct_mng;
+	struct mlx5_aso_ct_pool *pool = __mlx5_aso_ct_get_pool(sh, ct);
+	struct mlx5_aso_sq *sq;
+	bool need_lock = !!(queue == MLX5_HW_INV_QUEUE);
 	uint32_t poll_cqe_times = MLX5_CT_POLL_WQE_CQE_TIMES;
 	enum mlx5_aso_ct_state state =
 				__atomic_load_n(&ct->state, __ATOMIC_RELAXED);
 
+	if (sh->config.dv_flow_en == 2)
+		sq = __mlx5_aso_ct_get_sq_in_hws(queue, pool);
+	else
+		sq = __mlx5_aso_ct_get_sq_in_sws(sh, ct);
 	if (state == ASO_CONNTRACK_FREE) {
 		rte_errno = ENXIO;
 		return -rte_errno;
@@ -1423,13 +1535,13 @@ mlx5_aso_ct_available(struct mlx5_dev_ctx_shared *sh,
 		return 0;
 	}
 	do {
-		mlx5_aso_ct_completion_handle(mng);
+		mlx5_aso_ct_completion_handle(sh, sq, need_lock);
 		state = __atomic_load_n(&ct->state, __ATOMIC_RELAXED);
 		if (state == ASO_CONNTRACK_READY ||
 		    state == ASO_CONNTRACK_QUERY)
 			return 0;
-		/* Waiting for CQE ready, consider should block or sleep. */
-		rte_delay_us_sleep(MLX5_ASO_WQE_CQE_RESPONSE_DELAY);
+		/* Waiting for CQE ready, consider whether to block or sleep. */
+		rte_delay_us_block(MLX5_ASO_WQE_CQE_RESPONSE_DELAY);
 	} while (--poll_cqe_times);
 	rte_errno = EBUSY;
 	return -rte_errno;
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index a0bcaa5c53..ea13345baf 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -12813,6 +12813,7 @@ flow_dv_ct_pool_create(struct rte_eth_dev *dev,
 	struct mlx5_devx_obj *obj = NULL;
 	uint32_t i;
 	uint32_t log_obj_size = rte_log2_u32(MLX5_ASO_CT_ACTIONS_PER_POOL);
+	size_t mem_size;
 
 	obj = mlx5_devx_cmd_create_conn_track_offload_obj(priv->sh->cdev->ctx,
 							  priv->sh->cdev->pdn,
@@ -12822,7 +12823,10 @@ flow_dv_ct_pool_create(struct rte_eth_dev *dev,
 		DRV_LOG(ERR, "Failed to create conn_track_offload_obj using DevX.");
 		return NULL;
 	}
-	pool = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*pool), 0, SOCKET_ID_ANY);
+	mem_size = sizeof(struct mlx5_aso_ct_action) *
+		   MLX5_ASO_CT_ACTIONS_PER_POOL +
+		   sizeof(*pool);
+	pool = mlx5_malloc(MLX5_MEM_ZERO, mem_size, 0, SOCKET_ID_ANY);
 	if (!pool) {
 		rte_errno = ENOMEM;
 		claim_zero(mlx5_devx_cmd_destroy(obj));
@@ -12962,10 +12966,13 @@ flow_dv_translate_create_conntrack(struct rte_eth_dev *dev,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "Failed to allocate CT object");
 	ct = flow_aso_ct_get_by_dev_idx(dev, idx);
-	if (mlx5_aso_ct_update_by_wqe(sh, ct, pro))
-		return rte_flow_error_set(error, EBUSY,
-					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
-					  "Failed to update CT");
+	if (mlx5_aso_ct_update_by_wqe(sh, MLX5_HW_INV_QUEUE, ct, pro)) {
+		flow_dv_aso_ct_dev_release(dev, idx);
+		rte_flow_error_set(error, EBUSY,
+				   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+				   "Failed to update CT");
+		return 0;
+	}
 	ct->is_original = !!pro->is_original_dir;
 	ct->peer = pro->peer_port;
 	return idx;
@@ -14160,7 +14167,7 @@ flow_dv_translate(struct rte_eth_dev *dev,
 						RTE_FLOW_ERROR_TYPE_ACTION,
 						NULL,
 						"Failed to get CT object.");
-			if (mlx5_aso_ct_available(priv->sh, ct))
+			if (mlx5_aso_ct_available(priv->sh, MLX5_HW_INV_QUEUE, ct))
 				return rte_flow_error_set(error, rte_errno,
 						RTE_FLOW_ERROR_TYPE_ACTION,
 						NULL,
@@ -15768,14 +15775,15 @@ __flow_dv_action_ct_update(struct rte_eth_dev *dev, uint32_t idx,
 		ret = mlx5_validate_action_ct(dev, new_prf, error);
 		if (ret)
 			return ret;
-		ret = mlx5_aso_ct_update_by_wqe(priv->sh, ct, new_prf);
+		ret = mlx5_aso_ct_update_by_wqe(priv->sh, MLX5_HW_INV_QUEUE,
+						ct, new_prf);
 		if (ret)
 			return rte_flow_error_set(error, EIO,
 					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 					NULL,
 					"Failed to send CT context update WQE");
-		/* Block until ready or a failure. */
-		ret = mlx5_aso_ct_available(priv->sh, ct);
+		/* Block until ready or a failure, default is asynchronous. */
+		ret = mlx5_aso_ct_available(priv->sh, MLX5_HW_INV_QUEUE, ct);
 		if (ret)
 			rte_flow_error_set(error, rte_errno,
 					   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
@@ -16604,7 +16612,7 @@ flow_dv_action_query(struct rte_eth_dev *dev,
 							ct->peer;
 		((struct rte_flow_action_conntrack *)data)->is_original_dir =
 							ct->is_original;
-		if (mlx5_aso_ct_query_by_wqe(priv->sh, ct, data))
+		if (mlx5_aso_ct_query_by_wqe(priv->sh, MLX5_HW_INV_QUEUE, ct, data))
 			return rte_flow_error_set(error, EIO,
 					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 					NULL,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 5b7ef1be68..535df6ba5d 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -15,6 +15,14 @@
 /* The maximum actions support in the flow. */
 #define MLX5_HW_MAX_ACTS 16
 
+/*
+ * The default ipool threshold value indicates which per_core_cache
+ * value to set.
+ */
+#define MLX5_HW_IPOOL_SIZE_THRESHOLD (1 << 19)
+/* The default min local cache size. */
+#define MLX5_HW_IPOOL_CACHE_MIN (1 << 9)
+
 /* Default push burst threshold. */
 #define BURST_THR 32u
 
@@ -324,6 +332,25 @@ flow_hw_tir_action_register(struct rte_eth_dev *dev,
 	return hrxq;
 }
 
+static __rte_always_inline int
+flow_hw_ct_compile(struct rte_eth_dev *dev,
+		   uint32_t queue, uint32_t idx,
+		   struct mlx5dr_rule_action *rule_act)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_ct_action *ct;
+
+	ct = mlx5_ipool_get(priv->hws_ctpool->cts, MLX5_ACTION_CTX_CT_GET_IDX(idx));
+	if (!ct || mlx5_aso_ct_available(priv->sh, queue, ct))
+		return -1;
+	rule_act->action = priv->hws_ctpool->dr_action;
+	rule_act->aso_ct.offset = ct->offset;
+	rule_act->aso_ct.direction = ct->is_original ?
+		MLX5DR_ACTION_ASO_CT_DIRECTION_INITIATOR :
+		MLX5DR_ACTION_ASO_CT_DIRECTION_RESPONDER;
+	return 0;
+}
+
 /**
  * Destroy DR actions created by action template.
  *
@@ -640,6 +667,11 @@ flow_hw_shared_action_translate(struct rte_eth_dev *dev,
 			action_src, action_dst, act_idx))
 			return -1;
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_CT:
+		if (flow_hw_ct_compile(dev, MLX5_HW_INV_QUEUE,
+				       idx, &acts->rule_acts[action_dst]))
+			return -1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type:%d", type);
 		break;
@@ -1083,6 +1115,7 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
 	bool reformat_used = false;
 	uint16_t action_pos;
 	uint16_t jump_pos;
+	uint32_t ct_idx;
 	int err;
 
 	flow_hw_modify_field_init(&mhdr, at);
@@ -1305,6 +1338,20 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
 				goto err;
 			}
 			break;
+		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+			action_pos = at->actions_off[actions - at->actions];
+			if (masks->conf) {
+				ct_idx = MLX5_ACTION_CTX_CT_GET_IDX
+					 ((uint32_t)(uintptr_t)actions->conf);
+				if (flow_hw_ct_compile(dev, MLX5_HW_INV_QUEUE, ct_idx,
+						       &acts->rule_acts[action_pos]))
+					goto err;
+			} else if (__flow_hw_act_data_general_append
+					(priv, acts, actions->type,
+					 actions - action_start, action_pos)) {
+				goto err;
+			}
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -1479,6 +1526,8 @@ flow_hw_shared_action_get(struct rte_eth_dev *dev,
  *
  * @param[in] dev
  *   Pointer to the rte_eth_dev data structure.
+ * @param[in] queue
+ *   The flow creation queue index.
  * @param[in] action
  *   Pointer to the shared indirect rte_flow action.
  * @param[in] table
@@ -1492,7 +1541,7 @@ flow_hw_shared_action_get(struct rte_eth_dev *dev,
  *    0 on success, negative value otherwise and rte_errno is set.
  */
 static __rte_always_inline int
-flow_hw_shared_action_construct(struct rte_eth_dev *dev,
+flow_hw_shared_action_construct(struct rte_eth_dev *dev, uint32_t queue,
 				const struct rte_flow_action *action,
 				struct rte_flow_template_table *table,
 				const uint8_t it_idx,
@@ -1532,6 +1581,10 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev,
 				&rule_act->counter.offset))
 			return -1;
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_CT:
+		if (flow_hw_ct_compile(dev, queue, idx, rule_act))
+			return -1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type:%d", type);
 		break;
@@ -1727,6 +1780,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		uint64_t item_flags;
 		struct mlx5_hw_jump_action *jump;
 		struct mlx5_hrxq *hrxq;
+		uint32_t ct_idx;
 		cnt_id_t cnt_id;
 
 		action = &actions[act_data->action_src];
@@ -1735,7 +1789,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		switch (act_data->type) {
 		case RTE_FLOW_ACTION_TYPE_INDIRECT:
 			if (flow_hw_shared_action_construct
-					(dev, action, table, it_idx,
+					(dev, queue, action, table, it_idx,
 					 &rule_acts[act_data->action_dst]))
 				return -1;
 			break;
@@ -1860,6 +1914,13 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 				return ret;
 			job->flow->cnt_id = act_data->shared_counter.id;
 			break;
+		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+			ct_idx = MLX5_ACTION_CTX_CT_GET_IDX
+				 ((uint32_t)(uintptr_t)action->conf);
+			if (flow_hw_ct_compile(dev, queue, ct_idx,
+					       &rule_acts[act_data->action_dst]))
+				return -1;
+			break;
 		default:
 			break;
 		}
@@ -2391,6 +2452,8 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	if (nb_flows < cfg.trunk_size) {
 		cfg.per_core_cache = 0;
 		cfg.trunk_size = nb_flows;
+	} else if (nb_flows <= MLX5_HW_IPOOL_SIZE_THRESHOLD) {
+		cfg.per_core_cache = MLX5_HW_IPOOL_CACHE_MIN;
 	}
 	/* Check if we requires too many templates. */
 	if (nb_item_templates > max_tpl ||
@@ -2927,6 +2990,9 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_COUNT:
 			/* TODO: Validation logic */
 			break;
+		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+			/* TODO: Validation logic */
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -2953,6 +3019,7 @@ static enum mlx5dr_action_type mlx5_hw_dr_action_types[] = {
 	[RTE_FLOW_ACTION_TYPE_MODIFY_FIELD] = MLX5DR_ACTION_TYP_MODIFY_HDR,
 	[RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT] = MLX5DR_ACTION_TYP_VPORT,
 	[RTE_FLOW_ACTION_TYPE_COUNT] = MLX5DR_ACTION_TYP_CTR,
+	[RTE_FLOW_ACTION_TYPE_CONNTRACK] = MLX5DR_ACTION_TYP_ASO_CT,
 };
 
 static int
@@ -2981,6 +3048,11 @@ flow_hw_dr_actions_template_handle_shared(const struct rte_flow_action *mask,
 		action_types[*curr_off] = MLX5DR_ACTION_TYP_CTR;
 		*curr_off = *curr_off + 1;
 		break;
+	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+		at->actions_off[action_src] = *curr_off;
+		action_types[*curr_off] = MLX5DR_ACTION_TYP_ASO_CT;
+		*curr_off = *curr_off + 1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type: %d", type);
 		return -EINVAL;
@@ -3435,6 +3507,7 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ITEM_TYPE_GRE_OPTION:
 		case RTE_FLOW_ITEM_TYPE_ICMP:
 		case RTE_FLOW_ITEM_TYPE_ICMP6:
+		case RTE_FLOW_ITEM_TYPE_CONNTRACK:
 			break;
 		case RTE_FLOW_ITEM_TYPE_END:
 			items_end = true;
@@ -4627,6 +4700,97 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 	return -EINVAL;
 }
 
+static void
+flow_hw_ct_mng_destroy(struct rte_eth_dev *dev,
+		       struct mlx5_aso_ct_pools_mng *ct_mng)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	mlx5_aso_ct_queue_uninit(priv->sh, ct_mng);
+	mlx5_free(ct_mng);
+}
+
+static void
+flow_hw_ct_pool_destroy(struct rte_eth_dev *dev __rte_unused,
+			struct mlx5_aso_ct_pool *pool)
+{
+	if (pool->dr_action)
+		mlx5dr_action_destroy(pool->dr_action);
+	if (pool->devx_obj)
+		claim_zero(mlx5_devx_cmd_destroy(pool->devx_obj));
+	if (pool->cts)
+		mlx5_ipool_destroy(pool->cts);
+	mlx5_free(pool);
+}
+
+static struct mlx5_aso_ct_pool *
+flow_hw_ct_pool_create(struct rte_eth_dev *dev,
+		       const struct rte_flow_port_attr *port_attr)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_ct_pool *pool;
+	struct mlx5_devx_obj *obj;
+	uint32_t nb_cts = rte_align32pow2(port_attr->nb_conn_tracks);
+	uint32_t log_obj_size = rte_log2_u32(nb_cts);
+	struct mlx5_indexed_pool_config cfg = {
+		.size = sizeof(struct mlx5_aso_ct_action),
+		.trunk_size = 1 << 12,
+		.per_core_cache = 1 << 13,
+		.need_lock = 1,
+		.release_mem_en = !!priv->sh->config.reclaim_mode,
+		.malloc = mlx5_malloc,
+		.free = mlx5_free,
+		.type = "mlx5_hw_ct_action",
+	};
+	int reg_id;
+	uint32_t flags;
+
+	pool = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*pool), 0, SOCKET_ID_ANY);
+	if (!pool) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	obj = mlx5_devx_cmd_create_conn_track_offload_obj(priv->sh->cdev->ctx,
+							  priv->sh->cdev->pdn,
+							  log_obj_size);
+	if (!obj) {
+		rte_errno = ENODATA;
+		DRV_LOG(ERR, "Failed to create conn_track_offload_obj using DevX.");
+		goto err;
+	}
+	pool->devx_obj = obj;
+	reg_id = mlx5_flow_get_reg_id(dev, MLX5_ASO_CONNTRACK, 0, NULL);
+	flags = MLX5DR_ACTION_FLAG_HWS_RX | MLX5DR_ACTION_FLAG_HWS_TX;
+	if (priv->sh->config.dv_esw_en && priv->master)
+		flags |= MLX5DR_ACTION_FLAG_HWS_FDB;
+	pool->dr_action = mlx5dr_action_create_aso_ct(priv->dr_ctx,
+						      (struct mlx5dr_devx_obj *)obj,
+						      reg_id - REG_C_0, flags);
+	if (!pool->dr_action)
+		goto err;
+	/*
+	 * No need for local cache if the CT number is small, since the flow
+	 * insertion rate will be very limited in that case. Here the trunk
+	 * size is reduced to the CT number when it is below the default 4K.
+	 */
+	if (nb_cts <= cfg.trunk_size) {
+		cfg.per_core_cache = 0;
+		cfg.trunk_size = nb_cts;
+	} else if (nb_cts <= MLX5_HW_IPOOL_SIZE_THRESHOLD) {
+		cfg.per_core_cache = MLX5_HW_IPOOL_CACHE_MIN;
+	}
+	pool->cts = mlx5_ipool_create(&cfg);
+	if (!pool->cts)
+		goto err;
+	pool->sq = priv->ct_mng->aso_sqs;
+	/* Assign the last extra ASO SQ as public SQ. */
+	pool->shared_sq = &priv->ct_mng->aso_sqs[priv->nb_queue - 1];
+	return pool;
+err:
+	flow_hw_ct_pool_destroy(dev, pool);
+	return NULL;
+}
+
 /**
  * Configure port HWS resources.
  *
@@ -4809,6 +4973,20 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	}
 	if (_queue_attr)
 		mlx5_free(_queue_attr);
+	if (port_attr->nb_conn_tracks) {
+		mem_size = sizeof(struct mlx5_aso_sq) * nb_q_updated +
+			   sizeof(*priv->ct_mng);
+		priv->ct_mng = mlx5_malloc(MLX5_MEM_ZERO, mem_size,
+					   RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
+		if (!priv->ct_mng)
+			goto err;
+		if (mlx5_aso_ct_queue_init(priv->sh, priv->ct_mng, nb_q_updated))
+			goto err;
+		priv->hws_ctpool = flow_hw_ct_pool_create(dev, port_attr);
+		if (!priv->hws_ctpool)
+			goto err;
+		priv->sh->ct_aso_en = 1;
+	}
 	if (port_attr->nb_counters) {
 		priv->hws_cpool = mlx5_hws_cnt_pool_create(dev, port_attr,
 				nb_queue);
@@ -4817,6 +4995,14 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	}
 	return 0;
 err:
+	if (priv->hws_ctpool) {
+		flow_hw_ct_pool_destroy(dev, priv->hws_ctpool);
+		priv->hws_ctpool = NULL;
+	}
+	if (priv->ct_mng) {
+		flow_hw_ct_mng_destroy(dev, priv->ct_mng);
+		priv->ct_mng = NULL;
+	}
 	flow_hw_free_vport_actions(priv);
 	for (i = 0; i < MLX5_HW_ACTION_FLAG_MAX; i++) {
 		if (priv->hw_drop[i])
@@ -4890,6 +5076,14 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 	}
 	if (priv->hws_cpool)
 		mlx5_hws_cnt_pool_destroy(priv->sh, priv->hws_cpool);
+	if (priv->hws_ctpool) {
+		flow_hw_ct_pool_destroy(dev, priv->hws_ctpool);
+		priv->hws_ctpool = NULL;
+	}
+	if (priv->ct_mng) {
+		flow_hw_ct_mng_destroy(dev, priv->ct_mng);
+		priv->ct_mng = NULL;
+	}
 	mlx5_free(priv->hw_q);
 	priv->hw_q = NULL;
 	claim_zero(mlx5dr_context_close(priv->dr_ctx));
@@ -4958,6 +5152,7 @@ void flow_hw_init_tags_set(struct rte_eth_dev *dev)
 		unset |= 1 << (REG_C_1 - REG_C_0);
 	masks &= ~unset;
 	if (mlx5_flow_hw_avl_tags_init_cnt) {
+		MLX5_ASSERT(mlx5_flow_hw_aso_tag == priv->mtr_color_reg);
 		for (i = 0; i < MLX5_FLOW_HW_TAGS_MAX; i++) {
 			if (mlx5_flow_hw_avl_tags[i] != REG_NON && !!((1 << i) & masks)) {
 				copy[mlx5_flow_hw_avl_tags[i] - REG_C_0] =
@@ -4980,6 +5175,7 @@ void flow_hw_init_tags_set(struct rte_eth_dev *dev)
 		}
 	}
 	priv->sh->hws_tags = 1;
+	mlx5_flow_hw_aso_tag = (enum modify_reg)priv->mtr_color_reg;
 	mlx5_flow_hw_avl_tags_init_cnt++;
 }
 
@@ -5050,6 +5246,170 @@ flow_hw_clear_flow_metadata_config(void)
 	mlx5_flow_hw_flow_metadata_xmeta_en = 0;
 }
 
+static int
+flow_hw_conntrack_destroy(struct rte_eth_dev *dev __rte_unused,
+			  uint32_t idx,
+			  struct rte_flow_error *error)
+{
+	uint16_t owner = (uint16_t)MLX5_ACTION_CTX_CT_GET_OWNER(idx);
+	uint32_t ct_idx = MLX5_ACTION_CTX_CT_GET_IDX(idx);
+	struct rte_eth_dev *owndev = &rte_eth_devices[owner];
+	struct mlx5_priv *priv = owndev->data->dev_private;
+	struct mlx5_aso_ct_pool *pool = priv->hws_ctpool;
+	struct mlx5_aso_ct_action *ct;
+
+	ct = mlx5_ipool_get(pool->cts, ct_idx);
+	if (!ct) {
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL,
+				"Invalid CT destruction index");
+	}
+	__atomic_store_n(&ct->state, ASO_CONNTRACK_FREE,
+				 __ATOMIC_RELAXED);
+	mlx5_ipool_free(pool->cts, ct_idx);
+	return 0;
+}
+
+static int
+flow_hw_conntrack_query(struct rte_eth_dev *dev, uint32_t idx,
+			struct rte_flow_action_conntrack *profile,
+			struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_ct_pool *pool = priv->hws_ctpool;
+	struct mlx5_aso_ct_action *ct;
+	uint16_t owner = (uint16_t)MLX5_ACTION_CTX_CT_GET_OWNER(idx);
+	uint32_t ct_idx;
+
+	if (owner != PORT_ID(priv))
+		return rte_flow_error_set(error, EACCES,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL,
+				"Can't query CT object owned by another port");
+	ct_idx = MLX5_ACTION_CTX_CT_GET_IDX(idx);
+	ct = mlx5_ipool_get(pool->cts, ct_idx);
+	if (!ct) {
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL,
+				"Invalid CT query index");
+	}
+	profile->peer_port = ct->peer;
+	profile->is_original_dir = ct->is_original;
+	if (mlx5_aso_ct_query_by_wqe(priv->sh, MLX5_HW_INV_QUEUE, ct, profile))
+		return rte_flow_error_set(error, EIO,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL,
+				"Failed to query CT context");
+	return 0;
+}
+
+
+static int
+flow_hw_conntrack_update(struct rte_eth_dev *dev, uint32_t queue,
+			 const struct rte_flow_modify_conntrack *action_conf,
+			 uint32_t idx, struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_ct_pool *pool = priv->hws_ctpool;
+	struct mlx5_aso_ct_action *ct;
+	const struct rte_flow_action_conntrack *new_prf;
+	uint16_t owner = (uint16_t)MLX5_ACTION_CTX_CT_GET_OWNER(idx);
+	uint32_t ct_idx;
+	int ret = 0;
+
+	if (PORT_ID(priv) != owner)
+		return rte_flow_error_set(error, EACCES,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL,
+					  "Can't update CT object owned by another port");
+	ct_idx = MLX5_ACTION_CTX_CT_GET_IDX(idx);
+	ct = mlx5_ipool_get(pool->cts, ct_idx);
+	if (!ct) {
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL,
+				"Invalid CT update index");
+	}
+	new_prf = &action_conf->new_ct;
+	if (action_conf->direction)
+		ct->is_original = !!new_prf->is_original_dir;
+	if (action_conf->state) {
+		/* Only validate the profile when it needs to be updated. */
+		ret = mlx5_validate_action_ct(dev, new_prf, error);
+		if (ret)
+			return ret;
+		ret = mlx5_aso_ct_update_by_wqe(priv->sh, queue, ct, new_prf);
+		if (ret)
+			return rte_flow_error_set(error, EIO,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL,
+					"Failed to send CT context update WQE");
+		if (queue != MLX5_HW_INV_QUEUE)
+			return 0;
+		/* Block until ready or a failure in synchronous mode. */
+		ret = mlx5_aso_ct_available(priv->sh, queue, ct);
+		if (ret)
+			rte_flow_error_set(error, rte_errno,
+					   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					   NULL,
+					   "Timeout to get the CT update");
+	}
+	return ret;
+}
+
+static struct rte_flow_action_handle *
+flow_hw_conntrack_create(struct rte_eth_dev *dev, uint32_t queue,
+			 const struct rte_flow_action_conntrack *pro,
+			 struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_ct_pool *pool = priv->hws_ctpool;
+	struct mlx5_aso_ct_action *ct;
+	uint32_t ct_idx = 0;
+	int ret;
+	bool async = !!(queue != MLX5_HW_INV_QUEUE);
+
+	if (!pool) {
+		rte_flow_error_set(error, EINVAL,
+				   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+				   "CT is not enabled");
+		return 0;
+	}
+	ct = mlx5_ipool_zmalloc(pool->cts, &ct_idx);
+	if (!ct) {
+		rte_flow_error_set(error, rte_errno,
+				   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+				   "Failed to allocate CT object");
+		return 0;
+	}
+	ct->offset = ct_idx - 1;
+	ct->is_original = !!pro->is_original_dir;
+	ct->peer = pro->peer_port;
+	ct->pool = pool;
+	if (mlx5_aso_ct_update_by_wqe(priv->sh, queue, ct, pro)) {
+		mlx5_ipool_free(pool->cts, ct_idx);
+		rte_flow_error_set(error, EBUSY,
+				   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+				   "Failed to update CT");
+		return 0;
+	}
+	if (!async) {
+		ret = mlx5_aso_ct_available(priv->sh, queue, ct);
+		if (ret) {
+			mlx5_ipool_free(pool->cts, ct_idx);
+			rte_flow_error_set(error, rte_errno,
+					   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					   NULL,
+					   "Timeout to get the CT update");
+			return 0;
+		}
+	}
+	return (struct rte_flow_action_handle *)(uintptr_t)
+		MLX5_ACTION_CTX_CT_GEN_IDX(PORT_ID(priv), ct_idx);
+}
+
 /**
  * Create shared action.
  *
@@ -5097,6 +5457,9 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 			handle = (struct rte_flow_action_handle *)
 				 (uintptr_t)cnt_id;
 		break;
+	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+		handle = flow_hw_conntrack_create(dev, queue, action->conf, error);
+		break;
 	default:
 		handle = flow_dv_action_create(dev, conf, action, error);
 	}
@@ -5132,10 +5495,18 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 			     void *user_data,
 			     struct rte_flow_error *error)
 {
+	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
+	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+
 	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
 	RTE_SET_USED(user_data);
-	return flow_dv_action_update(dev, handle, update, error);
+	switch (type) {
+	case MLX5_INDIRECT_ACTION_TYPE_CT:
+		return flow_hw_conntrack_update(dev, queue, update, act_idx, error);
+	default:
+		return flow_dv_action_update(dev, handle, update, error);
+	}
 }
 
 /**
@@ -5174,6 +5545,8 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 	switch (type) {
 	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
 		return mlx5_hws_cnt_shared_put(priv->hws_cpool, &act_idx);
+	case MLX5_INDIRECT_ACTION_TYPE_CT:
+		return flow_hw_conntrack_destroy(dev, act_idx, error);
 	default:
 		return flow_dv_action_destroy(dev, handle, error);
 	}
@@ -5327,6 +5700,8 @@ flow_hw_action_query(struct rte_eth_dev *dev,
 	switch (type) {
 	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
 		return flow_hw_query_counter(dev, act_idx, data, error);
+	case MLX5_INDIRECT_ACTION_TYPE_CT:
+		return flow_hw_conntrack_query(dev, act_idx, data, error);
 	default:
 		return flow_dv_action_query(dev, handle, data, error);
 	}
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v5 11/18] net/mlx5: add HW steering VLAN push, pop and VID modify flow actions
  2022-10-20  3:21 ` [PATCH v5 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (9 preceding siblings ...)
  2022-10-20  3:22   ` [PATCH v5 10/18] net/mlx5: add HW steering connection tracking support Suanming Mou
@ 2022-10-20  3:22   ` Suanming Mou
  2022-10-20  3:22   ` [PATCH v5 12/18] net/mlx5: implement METER MARK indirect action for HWS Suanming Mou
                     ` (6 subsequent siblings)
  17 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-10-20  3:22 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Gregory Etelson

From: Gregory Etelson <getelson@nvidia.com>

Add PMD implementation for HW steering VLAN push, pop and modify flow
actions.

HWS VLAN push flow action is triggered by a sequence of the mandatory
OF_PUSH_VLAN and OF_SET_VLAN_VID commands and an optional OF_SET_VLAN_PCP
flow action command.
The commands must be arranged in the exact order:
OF_PUSH_VLAN / OF_SET_VLAN_VID [ / OF_SET_VLAN_PCP ].
In masked HWS VLAN push flow action template *ALL* the above flow
actions must be masked.
In non-masked HWS VLAN push flow action template *ALL* the above flow
actions must not be masked.

Example:

flow actions_template <port id> create \
actions_template_id <action id> \
template \
  of_push_vlan / \
  of_set_vlan_vid \
  [ / of_set_vlan_pcp  ] / end \
mask \
  of_push_vlan ethertype 0 / \
  of_set_vlan_vid vlan_vid 0 \
  [ / of_set_vlan_pcp vlan_pcp 0 ] / end\

flow actions_template <port id> create \
actions_template_id <action id> \
template \
  of_push_vlan ethertype <E>/ \
  of_set_vlan_vid vlan_vid <VID>\
  [ / of_set_vlan_pcp  <PCP>] / end \
mask \
  of_push_vlan ethertype <type != 0> / \
  of_set_vlan_vid vlan_vid <vid_mask != 0>\
  [ / of_set_vlan_pcp vlan_pcp <pcp_mask != 0> ] / end\

HWS VLAN pop flow action is triggered by the OF_POP_VLAN
flow action command.
HWS VLAN pop action template is always non-masked.

Example:

flow actions_template <port id> create \
actions_template_id <action id> \
template of_pop_vlan / end mask of_pop_vlan / end

HWS VLAN VID modify flow action is triggered by a standalone
OF_SET_VLAN_VID flow action command.
HWS VLAN VID modify action template can be either masked or non-masked.

Example:

flow actions_template <port id> create \
actions_template_id <action id> \
template of_set_vlan_vid / end mask of_set_vlan_vid vlan_vid 0 / end

flow actions_template <port id> create \
actions_template_id <action id> \
template of_set_vlan_vid vlan_vid 0x101 / end \
mask of_set_vlan_vid vlan_vid 0xffff / end
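
Internally the PMD folds the OF_PUSH_VLAN / OF_SET_VLAN_VID /
OF_SET_VLAN_PCP values into a single 32-bit VLAN header word passed to
the push rule action (see vlan_hdr_to_be32() in the diff). In network
byte order the packing is roughly:

  vlan_hdr = (ethertype << 16) | (vlan_pcp << 13) | vlan_vid;

with the little-endian build additionally swapping the bytes into place.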

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
---
 drivers/net/mlx5/mlx5.h         |   2 +
 drivers/net/mlx5/mlx5_flow.h    |   4 +
 drivers/net/mlx5/mlx5_flow_dv.c |   2 +-
 drivers/net/mlx5/mlx5_flow_hw.c | 492 +++++++++++++++++++++++++++++---
 4 files changed, 463 insertions(+), 37 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 9c080e5eac..e78ed958c8 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1672,6 +1672,8 @@ struct mlx5_priv {
 	struct mlx5_hw_q *hw_q;
 	/* HW steering rte flow table list header. */
 	LIST_HEAD(flow_hw_tbl, rte_flow_template_table) flow_hw_tbl;
+	struct mlx5dr_action *hw_push_vlan[MLX5DR_TABLE_TYPE_MAX];
+	struct mlx5dr_action *hw_pop_vlan[MLX5DR_TABLE_TYPE_MAX];
 	struct mlx5dr_action **hw_vport;
 	/* HW steering global drop action. */
 	struct mlx5dr_action *hw_drop[2];
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 7e90eac2d0..b8124f6f79 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -2448,4 +2448,8 @@ int mlx5_flow_pattern_validate(struct rte_eth_dev *dev,
 		struct rte_flow_error *error);
 int flow_hw_table_update(struct rte_eth_dev *dev,
 			 struct rte_flow_error *error);
+int mlx5_flow_item_field_width(struct rte_eth_dev *dev,
+			   enum rte_flow_field_id field, int inherit,
+			   const struct rte_flow_attr *attr,
+			   struct rte_flow_error *error);
 #endif /* RTE_PMD_MLX5_FLOW_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index ea13345baf..7efc936ddd 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -1326,7 +1326,7 @@ flow_dv_convert_action_modify_ipv6_dscp
 					     MLX5_MODIFICATION_TYPE_SET, error);
 }
 
-static int
+int
 mlx5_flow_item_field_width(struct rte_eth_dev *dev,
 			   enum rte_flow_field_id field, int inherit,
 			   const struct rte_flow_attr *attr,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 535df6ba5d..0c110819e6 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -44,12 +44,22 @@
 /* Lowest priority for HW non-root table. */
 #define MLX5_HW_LOWEST_PRIO_NON_ROOT (UINT32_MAX)
 
+#define MLX5_HW_VLAN_PUSH_TYPE_IDX 0
+#define MLX5_HW_VLAN_PUSH_VID_IDX 1
+#define MLX5_HW_VLAN_PUSH_PCP_IDX 2
+
 static int flow_hw_flush_all_ctrl_flows(struct rte_eth_dev *dev);
 static int flow_hw_translate_group(struct rte_eth_dev *dev,
 				   const struct mlx5_flow_template_table_cfg *cfg,
 				   uint32_t group,
 				   uint32_t *table_group,
 				   struct rte_flow_error *error);
+static __rte_always_inline int
+flow_hw_set_vlan_vid_construct(struct rte_eth_dev *dev,
+			       struct mlx5_hw_q_job *job,
+			       struct mlx5_action_construct_data *act_data,
+			       const struct mlx5_hw_actions *hw_acts,
+			       const struct rte_flow_action *action);
 
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops;
 
@@ -1065,6 +1075,52 @@ flow_hw_cnt_compile(struct rte_eth_dev *dev, uint32_t  start_pos,
 	return 0;
 }
 
+static __rte_always_inline bool
+is_of_vlan_pcp_present(const struct rte_flow_action *actions)
+{
+	/*
+	 * Order of RTE VLAN push actions is
+	 * OF_PUSH_VLAN / OF_SET_VLAN_VID [ / OF_SET_VLAN_PCP ]
+	 */
+	return actions[MLX5_HW_VLAN_PUSH_PCP_IDX].type ==
+		RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_PCP;
+}
+
+static __rte_always_inline bool
+is_template_masked_push_vlan(const struct rte_flow_action_of_push_vlan *mask)
+{
+	/*
+	 * In masked push VLAN template all RTE push actions are masked.
+	 */
+	return mask && mask->ethertype != 0;
+}
+
+static rte_be32_t vlan_hdr_to_be32(const struct rte_flow_action *actions)
+{
+/*
+ * OpenFlow Switch Specification defines 802.1q VID as 12+1 bits.
+ */
+	rte_be32_t type, vid, pcp;
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+	rte_be32_t vid_lo, vid_hi;
+#endif
+
+	type = ((const struct rte_flow_action_of_push_vlan *)
+		actions[MLX5_HW_VLAN_PUSH_TYPE_IDX].conf)->ethertype;
+	vid = ((const struct rte_flow_action_of_set_vlan_vid *)
+		actions[MLX5_HW_VLAN_PUSH_VID_IDX].conf)->vlan_vid;
+	pcp = is_of_vlan_pcp_present(actions) ?
+	      ((const struct rte_flow_action_of_set_vlan_pcp *)
+		      actions[MLX5_HW_VLAN_PUSH_PCP_IDX].conf)->vlan_pcp : 0;
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+	vid_hi = vid & 0xff;
+	vid_lo = vid >> 8;
+	return (((vid_lo << 8) | (pcp << 5) | vid_hi) << 16) | type;
+#else
+	return (type << 16) | (pcp << 13) | vid;
+#endif
+}
+
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -1167,6 +1223,26 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
 				priv->hw_tag[!!attr->group];
 			flow_hw_rxq_flag_set(dev, true);
 			break;
+		case RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN:
+			action_pos = at->actions_off[actions - at->actions];
+			acts->rule_acts[action_pos].action =
+				priv->hw_push_vlan[type];
+			if (is_template_masked_push_vlan(masks->conf))
+				acts->rule_acts[action_pos].push_vlan.vlan_hdr =
+					vlan_hdr_to_be32(actions);
+			else if (__flow_hw_act_data_general_append
+					(priv, acts, actions->type,
+					 actions - action_start, action_pos))
+				goto err;
+			actions += is_of_vlan_pcp_present(actions) ?
+					MLX5_HW_VLAN_PUSH_PCP_IDX :
+					MLX5_HW_VLAN_PUSH_VID_IDX;
+			break;
+		case RTE_FLOW_ACTION_TYPE_OF_POP_VLAN:
+			action_pos = at->actions_off[actions - at->actions];
+			acts->rule_acts[action_pos].action =
+				priv->hw_pop_vlan[type];
+			break;
 		case RTE_FLOW_ACTION_TYPE_JUMP:
 			action_pos = at->actions_off[actions - at->actions];
 			if (masks->conf &&
@@ -1784,8 +1860,17 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		cnt_id_t cnt_id;
 
 		action = &actions[act_data->action_src];
-		MLX5_ASSERT(action->type == RTE_FLOW_ACTION_TYPE_INDIRECT ||
-			    (int)action->type == act_data->type);
+		/*
+		 * action template construction replaces
+		 * OF_SET_VLAN_VID with MODIFY_FIELD
+		 */
+		if (action->type == RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID)
+			MLX5_ASSERT(act_data->type ==
+				    RTE_FLOW_ACTION_TYPE_MODIFY_FIELD);
+		else
+			MLX5_ASSERT(action->type ==
+				    RTE_FLOW_ACTION_TYPE_INDIRECT ||
+				    (int)action->type == act_data->type);
 		switch (act_data->type) {
 		case RTE_FLOW_ACTION_TYPE_INDIRECT:
 			if (flow_hw_shared_action_construct
@@ -1801,6 +1886,10 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			      (action->conf))->id);
 			rule_acts[act_data->action_dst].tag.value = tag;
 			break;
+		case RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN:
+			rule_acts[act_data->action_dst].push_vlan.vlan_hdr =
+				vlan_hdr_to_be32(action);
+			break;
 		case RTE_FLOW_ACTION_TYPE_JUMP:
 			jump_group = ((const struct rte_flow_action_jump *)
 						action->conf)->group;
@@ -1852,10 +1941,16 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 				    act_data->encap.len);
 			break;
 		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
-			ret = flow_hw_modify_field_construct(job,
-							     act_data,
-							     hw_acts,
-							     action);
+			if (action->type == RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID)
+				ret = flow_hw_set_vlan_vid_construct(dev, job,
+								     act_data,
+								     hw_acts,
+								     action);
+			else
+				ret = flow_hw_modify_field_construct(job,
+								     act_data,
+								     hw_acts,
+								     action);
 			if (ret)
 				return -1;
 			break;
@@ -2559,9 +2654,14 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 			mlx5_ipool_destroy(tbl->flow);
 		mlx5_free(tbl);
 	}
-	rte_flow_error_set(error, err,
-			  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
-			  "fail to create rte table");
+	if (error != NULL) {
+		rte_flow_error_set(error, err,
+				error->type == RTE_FLOW_ERROR_TYPE_NONE ?
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED : error->type,
+				NULL,
+				error->message == NULL ?
+				"fail to create rte table" : error->message);
+	}
 	return NULL;
 }
 
@@ -2865,28 +2965,76 @@ flow_hw_action_meta_copy_insert(const struct rte_flow_action actions[],
 				uint16_t *ins_pos)
 {
 	uint16_t idx, total = 0;
-	bool ins = false;
+	uint16_t end_idx = UINT16_MAX;
 	bool act_end = false;
+	bool modify_field = false;
+	bool rss_or_queue = false;
 
 	MLX5_ASSERT(actions && masks);
 	MLX5_ASSERT(new_actions && new_masks);
 	MLX5_ASSERT(ins_actions && ins_masks);
 	for (idx = 0; !act_end; idx++) {
-		if (idx >= MLX5_HW_MAX_ACTS)
-			return -1;
-		if (actions[idx].type == RTE_FLOW_ACTION_TYPE_RSS ||
-		    actions[idx].type == RTE_FLOW_ACTION_TYPE_QUEUE) {
-			ins = true;
-			*ins_pos = idx;
-		}
-		if (actions[idx].type == RTE_FLOW_ACTION_TYPE_END)
+		switch (actions[idx].type) {
+		case RTE_FLOW_ACTION_TYPE_RSS:
+		case RTE_FLOW_ACTION_TYPE_QUEUE:
+			/* It is assumed that application provided only single RSS/QUEUE action. */
+			MLX5_ASSERT(!rss_or_queue);
+			rss_or_queue = true;
+			break;
+		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
+			modify_field = true;
+			break;
+		case RTE_FLOW_ACTION_TYPE_END:
+			end_idx = idx;
 			act_end = true;
+			break;
+		default:
+			break;
+		}
 	}
-	if (!ins)
+	if (!rss_or_queue)
 		return 0;
-	else if (idx == MLX5_HW_MAX_ACTS)
+	else if (idx >= MLX5_HW_MAX_ACTS)
 		return -1; /* No more space. */
 	total = idx;
+	/*
+	 * If actions template contains MODIFY_FIELD action, then meta copy action can be inserted
+	 * at the template's end. Position of MODIFY_HDR action is based on the position of the
+	 * first MODIFY_FIELD flow action.
+	 */
+	if (modify_field) {
+		*ins_pos = end_idx;
+		goto insert_meta_copy;
+	}
+	/*
+	 * If actions template does not contain MODIFY_FIELD action, then meta copy action must be
+	 * inserted at a place conforming with the action order defined in steering/mlx5dr_action.c.
+	 */
+	act_end = false;
+	for (idx = 0; !act_end; idx++) {
+		switch (actions[idx].type) {
+		case RTE_FLOW_ACTION_TYPE_COUNT:
+		case RTE_FLOW_ACTION_TYPE_METER:
+		case RTE_FLOW_ACTION_TYPE_METER_MARK:
+		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
+		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
+		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
+		case RTE_FLOW_ACTION_TYPE_RSS:
+		case RTE_FLOW_ACTION_TYPE_QUEUE:
+			*ins_pos = idx;
+			act_end = true;
+			break;
+		case RTE_FLOW_ACTION_TYPE_END:
+			act_end = true;
+			break;
+		default:
+			break;
+		}
+	}
+insert_meta_copy:
+	MLX5_ASSERT(*ins_pos != UINT16_MAX);
+	MLX5_ASSERT(*ins_pos < total);
 	/* Before the position, no change for the actions. */
 	for (idx = 0; idx < *ins_pos; idx++) {
 		new_actions[idx] = actions[idx];
@@ -2903,6 +3051,73 @@ flow_hw_action_meta_copy_insert(const struct rte_flow_action actions[],
 	return 0;
 }
 
+static int
+flow_hw_validate_action_push_vlan(struct rte_eth_dev *dev,
+				  const
+				  struct rte_flow_actions_template_attr *attr,
+				  const struct rte_flow_action *action,
+				  const struct rte_flow_action *mask,
+				  struct rte_flow_error *error)
+{
+#define X_FIELD(ptr, t, f) (((ptr)->conf) && ((t *)((ptr)->conf))->f)
+
+	const bool masked_push =
+		X_FIELD(mask + MLX5_HW_VLAN_PUSH_TYPE_IDX,
+			const struct rte_flow_action_of_push_vlan, ethertype);
+	bool masked_param;
+
+	/*
+	 * Mandatory actions order:
+	 * OF_PUSH_VLAN / OF_SET_VLAN_VID [ / OF_SET_VLAN_PCP ]
+	 */
+	RTE_SET_USED(dev);
+	RTE_SET_USED(attr);
+	/* Check that the mask matches OF_PUSH_VLAN. */
+	if (mask[MLX5_HW_VLAN_PUSH_TYPE_IDX].type !=
+	    RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION,
+					  action, "OF_PUSH_VLAN: mask does not match");
+	/* Check that the second template and mask items are SET_VLAN_VID */
+	if (action[MLX5_HW_VLAN_PUSH_VID_IDX].type !=
+	    RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID ||
+	    mask[MLX5_HW_VLAN_PUSH_VID_IDX].type !=
+	    RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION,
+					  action, "OF_PUSH_VLAN: invalid actions order");
+	masked_param = X_FIELD(mask + MLX5_HW_VLAN_PUSH_VID_IDX,
+			       const struct rte_flow_action_of_set_vlan_vid,
+			       vlan_vid);
+	/*
+	 * PMD requires the OF_SET_VLAN_VID mask to match the OF_PUSH_VLAN mask.
+	 */
+	if (masked_push ^ masked_param)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "OF_SET_VLAN_VID: mask does not match OF_PUSH_VLAN");
+	if (is_of_vlan_pcp_present(action)) {
+		if (mask[MLX5_HW_VLAN_PUSH_PCP_IDX].type !=
+		     RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_PCP)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  action, "OF_SET_VLAN_PCP: missing mask configuration");
+		masked_param = X_FIELD(mask + MLX5_HW_VLAN_PUSH_PCP_IDX,
+				       const struct
+				       rte_flow_action_of_set_vlan_pcp,
+				       vlan_pcp);
+		/*
+		 * PMD requires the OF_SET_VLAN_PCP mask to match the OF_PUSH_VLAN mask.
+		 */
+		if (masked_push ^ masked_param)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ACTION, action,
+						  "OF_SET_VLAN_PCP: mask does not match OF_PUSH_VLAN");
+	}
+	return 0;
+#undef X_FIELD
+}
+
 static int
 flow_hw_actions_validate(struct rte_eth_dev *dev,
 			const struct rte_flow_actions_template_attr *attr,
@@ -2993,6 +3208,18 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
 			/* TODO: Validation logic */
 			break;
+		case RTE_FLOW_ACTION_TYPE_OF_POP_VLAN:
+		case RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID:
+			break;
+		case RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN:
+			ret = flow_hw_validate_action_push_vlan
+					(dev, attr, action, mask, error);
+			if (ret != 0)
+				return ret;
+			i += is_of_vlan_pcp_present(action) ?
+				MLX5_HW_VLAN_PUSH_PCP_IDX :
+				MLX5_HW_VLAN_PUSH_VID_IDX;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -3020,6 +3247,8 @@ static enum mlx5dr_action_type mlx5_hw_dr_action_types[] = {
 	[RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT] = MLX5DR_ACTION_TYP_VPORT,
 	[RTE_FLOW_ACTION_TYPE_COUNT] = MLX5DR_ACTION_TYP_CTR,
 	[RTE_FLOW_ACTION_TYPE_CONNTRACK] = MLX5DR_ACTION_TYP_ASO_CT,
+	[RTE_FLOW_ACTION_TYPE_OF_POP_VLAN] = MLX5DR_ACTION_TYP_POP_VLAN,
+	[RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN] = MLX5DR_ACTION_TYP_PUSH_VLAN,
 };
 
 static int
@@ -3136,6 +3365,14 @@ flow_hw_dr_actions_template_create(struct rte_flow_actions_template *at)
 				goto err_actions_num;
 			action_types[curr_off++] = MLX5DR_ACTION_TYP_FT;
 			break;
+		case RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN:
+			type = mlx5_hw_dr_action_types[at->actions[i].type];
+			at->actions_off[i] = curr_off;
+			action_types[curr_off++] = type;
+			i += is_of_vlan_pcp_present(at->actions + i) ?
+				MLX5_HW_VLAN_PUSH_PCP_IDX :
+				MLX5_HW_VLAN_PUSH_VID_IDX;
+			break;
 		default:
 			type = mlx5_hw_dr_action_types[at->actions[i].type];
 			at->actions_off[i] = curr_off;
@@ -3163,6 +3400,89 @@ flow_hw_dr_actions_template_create(struct rte_flow_actions_template *at)
 	return NULL;
 }
 
+static void
+flow_hw_set_vlan_vid(struct rte_eth_dev *dev,
+		     struct rte_flow_action *ra,
+		     struct rte_flow_action *rm,
+		     struct rte_flow_action_modify_field *spec,
+		     struct rte_flow_action_modify_field *mask,
+		     int set_vlan_vid_ix)
+{
+	struct rte_flow_error error;
+	const bool masked = rm[set_vlan_vid_ix].conf &&
+		(((const struct rte_flow_action_of_set_vlan_vid *)
+			rm[set_vlan_vid_ix].conf)->vlan_vid != 0);
+	const struct rte_flow_action_of_set_vlan_vid *conf =
+		ra[set_vlan_vid_ix].conf;
+	rte_be16_t vid = masked ? conf->vlan_vid : 0;
+	int width = mlx5_flow_item_field_width(dev, RTE_FLOW_FIELD_VLAN_ID, 0,
+					       NULL, &error);
+	*spec = (typeof(*spec)) {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = RTE_FLOW_FIELD_VLAN_ID,
+			.level = 0, .offset = 0,
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+			.level = vid,
+			.offset = 0,
+		},
+		.width = width,
+	};
+	*mask = (typeof(*mask)) {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = RTE_FLOW_FIELD_VLAN_ID,
+			.level = 0xffffffff, .offset = 0xffffffff,
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+			.level = masked ? (1U << width) - 1 : 0,
+			.offset = 0,
+		},
+		.width = 0xffffffff,
+	};
+	ra[set_vlan_vid_ix].type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD;
+	ra[set_vlan_vid_ix].conf = spec;
+	rm[set_vlan_vid_ix].type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD;
+	rm[set_vlan_vid_ix].conf = mask;
+}
+
+static __rte_always_inline int
+flow_hw_set_vlan_vid_construct(struct rte_eth_dev *dev,
+			       struct mlx5_hw_q_job *job,
+			       struct mlx5_action_construct_data *act_data,
+			       const struct mlx5_hw_actions *hw_acts,
+			       const struct rte_flow_action *action)
+{
+	struct rte_flow_error error;
+	rte_be16_t vid = ((const struct rte_flow_action_of_set_vlan_vid *)
+			   action->conf)->vlan_vid;
+	int width = mlx5_flow_item_field_width(dev, RTE_FLOW_FIELD_VLAN_ID, 0,
+					       NULL, &error);
+	struct rte_flow_action_modify_field conf = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = RTE_FLOW_FIELD_VLAN_ID,
+			.level = 0, .offset = 0,
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+			.level = vid,
+			.offset = 0,
+		},
+		.width = width,
+	};
+	struct rte_flow_action modify_action = {
+		.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+		.conf = &conf
+	};
+
+	return flow_hw_modify_field_construct(job, act_data, hw_acts,
+					      &modify_action);
+}
+
 /**
  * Create flow action template.
  *
@@ -3188,14 +3508,18 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 			struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	int len, act_num, act_len, mask_len;
+	int len, act_len, mask_len;
+	unsigned int act_num;
 	unsigned int i;
 	struct rte_flow_actions_template *at = NULL;
-	uint16_t pos = MLX5_HW_MAX_ACTS;
+	uint16_t pos = UINT16_MAX;
 	struct rte_flow_action tmp_action[MLX5_HW_MAX_ACTS];
 	struct rte_flow_action tmp_mask[MLX5_HW_MAX_ACTS];
-	const struct rte_flow_action *ra;
-	const struct rte_flow_action *rm;
+	struct rte_flow_action *ra = (void *)(uintptr_t)actions;
+	struct rte_flow_action *rm = (void *)(uintptr_t)masks;
+	int set_vlan_vid_ix = -1;
+	struct rte_flow_action_modify_field set_vlan_vid_spec = {0, };
+	struct rte_flow_action_modify_field set_vlan_vid_mask = {0, };
 	const struct rte_flow_action_modify_field rx_mreg = {
 		.operation = RTE_FLOW_MODIFY_SET,
 		.dst = {
@@ -3235,21 +3559,58 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 		return NULL;
 	if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
 	    priv->sh->config.dv_esw_en) {
+		/* The application should make sure only one Q/RSS action exists in one rule. */
 		if (flow_hw_action_meta_copy_insert(actions, masks, &rx_cpy, &rx_cpy_mask,
 						    tmp_action, tmp_mask, &pos)) {
 			rte_flow_error_set(error, EINVAL,
 					   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					   "Failed to concatenate new action/mask");
 			return NULL;
+		} else if (pos != UINT16_MAX) {
+			ra = tmp_action;
+			rm = tmp_mask;
 		}
 	}
-	/* Application should make sure only one Q/RSS exist in one rule. */
-	if (pos == MLX5_HW_MAX_ACTS) {
-		ra = actions;
-		rm = masks;
-	} else {
-		ra = tmp_action;
-		rm = tmp_mask;
+	for (i = 0; ra[i].type != RTE_FLOW_ACTION_TYPE_END; ++i) {
+		switch (ra[i].type) {
+		/* OF_PUSH_VLAN *MUST* come before OF_SET_VLAN_VID */
+		case RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN:
+			i += is_of_vlan_pcp_present(ra + i) ?
+				MLX5_HW_VLAN_PUSH_PCP_IDX :
+				MLX5_HW_VLAN_PUSH_VID_IDX;
+			break;
+		case RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID:
+			set_vlan_vid_ix = i;
+			break;
+		default:
+			break;
+		}
+	}
+	/*
+	 * Count flow actions to allocate the required space for storing DR offsets and to check
+	 * that the temporary buffer is not overrun.
+	 */
+	act_num = i + 1;
+	if (act_num >= MLX5_HW_MAX_ACTS) {
+		rte_flow_error_set(error, EINVAL,
+				   RTE_FLOW_ERROR_TYPE_ACTION, NULL, "Too many actions");
+		return NULL;
+	}
+	if (set_vlan_vid_ix != -1) {
+		/* If the temporary action buffer was not used, copy the template actions into it. */
+		if (ra == actions && rm == masks) {
+			for (i = 0; i < act_num; ++i) {
+				tmp_action[i] = actions[i];
+				tmp_mask[i] = masks[i];
+				if (actions[i].type == RTE_FLOW_ACTION_TYPE_END)
+					break;
+			}
+			ra = tmp_action;
+			rm = tmp_mask;
+		}
+		flow_hw_set_vlan_vid(dev, ra, rm,
+				     &set_vlan_vid_spec, &set_vlan_vid_mask,
+				     set_vlan_vid_ix);
 	}
 	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, NULL, 0, ra, error);
 	if (act_len <= 0)
@@ -3259,10 +3620,6 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 	if (mask_len <= 0)
 		return NULL;
 	len += RTE_ALIGN(mask_len, 16);
-	/* Count flow actions to allocate required space for storing DR offsets. */
-	act_num = 0;
-	for (i = 0; ra[i].type != RTE_FLOW_ACTION_TYPE_END; ++i)
-		act_num++;
 	len += RTE_ALIGN(act_num * sizeof(*at->actions_off), 16);
 	at = mlx5_malloc(MLX5_MEM_ZERO, len + sizeof(*at),
 			 RTE_CACHE_LINE_SIZE, rte_socket_id());
@@ -4510,7 +4867,11 @@ flow_hw_create_tx_default_mreg_copy_table(struct rte_eth_dev *dev,
 		.attr = tx_tbl_attr,
 		.external = false,
 	};
-	struct rte_flow_error drop_err;
+	struct rte_flow_error drop_err = {
+		.type = RTE_FLOW_ERROR_TYPE_NONE,
+		.cause = NULL,
+		.message = NULL,
+	};
 
 	RTE_SET_USED(drop_err);
 	return flow_hw_table_create(dev, &tx_tbl_cfg, &pt, 1, &at, 1, &drop_err);
@@ -4791,6 +5152,60 @@ flow_hw_ct_pool_create(struct rte_eth_dev *dev,
 	return NULL;
 }
 
+static void
+flow_hw_destroy_vlan(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	enum mlx5dr_table_type i;
+
+	for (i = MLX5DR_TABLE_TYPE_NIC_RX; i < MLX5DR_TABLE_TYPE_MAX; i++) {
+		if (priv->hw_pop_vlan[i]) {
+			mlx5dr_action_destroy(priv->hw_pop_vlan[i]);
+			priv->hw_pop_vlan[i] = NULL;
+		}
+		if (priv->hw_push_vlan[i]) {
+			mlx5dr_action_destroy(priv->hw_push_vlan[i]);
+			priv->hw_push_vlan[i] = NULL;
+		}
+	}
+}
+
+static int
+flow_hw_create_vlan(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	enum mlx5dr_table_type i;
+	const enum mlx5dr_action_flags flags[MLX5DR_TABLE_TYPE_MAX] = {
+		MLX5DR_ACTION_FLAG_HWS_RX,
+		MLX5DR_ACTION_FLAG_HWS_TX,
+		MLX5DR_ACTION_FLAG_HWS_FDB
+	};
+
+	for (i = MLX5DR_TABLE_TYPE_NIC_RX; i <= MLX5DR_TABLE_TYPE_NIC_TX; i++) {
+		priv->hw_pop_vlan[i] =
+			mlx5dr_action_create_pop_vlan(priv->dr_ctx, flags[i]);
+		if (!priv->hw_pop_vlan[i])
+			return -ENOENT;
+		priv->hw_push_vlan[i] =
+			mlx5dr_action_create_push_vlan(priv->dr_ctx, flags[i]);
+		if (!priv->hw_push_vlan[i])
+			return -ENOENT;
+	}
+	if (priv->sh->config.dv_esw_en && priv->master) {
+		priv->hw_pop_vlan[MLX5DR_TABLE_TYPE_FDB] =
+			mlx5dr_action_create_pop_vlan
+				(priv->dr_ctx, MLX5DR_ACTION_FLAG_HWS_FDB);
+		if (!priv->hw_pop_vlan[MLX5DR_TABLE_TYPE_FDB])
+			return -ENOENT;
+		priv->hw_push_vlan[MLX5DR_TABLE_TYPE_FDB] =
+			mlx5dr_action_create_push_vlan
+				(priv->dr_ctx, MLX5DR_ACTION_FLAG_HWS_FDB);
+		if (!priv->hw_push_vlan[MLX5DR_TABLE_TYPE_FDB])
+			return -ENOENT;
+	}
+	return 0;
+}
+
 /**
  * Configure port HWS resources.
  *
@@ -4993,6 +5408,9 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		if (priv->hws_cpool == NULL)
 			goto err;
 	}
+	ret = flow_hw_create_vlan(dev);
+	if (ret)
+		goto err;
 	return 0;
 err:
 	if (priv->hws_ctpool) {
@@ -5010,6 +5428,7 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		if (priv->hw_tag[i])
 			mlx5dr_action_destroy(priv->hw_tag[i]);
 	}
+	flow_hw_destroy_vlan(dev);
 	if (dr_ctx)
 		claim_zero(mlx5dr_context_close(dr_ctx));
 	mlx5_free(priv->hw_q);
@@ -5069,6 +5488,7 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 		if (priv->hw_tag[i])
 			mlx5dr_action_destroy(priv->hw_tag[i]);
 	}
+	flow_hw_destroy_vlan(dev);
 	flow_hw_free_vport_actions(priv);
 	if (priv->acts_ipool) {
 		mlx5_ipool_destroy(priv->acts_ipool);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread
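
For reference, a minimal application-side sketch of an actions template
that satisfies the OF_PUSH_VLAN / OF_SET_VLAN_VID [ / OF_SET_VLAN_PCP ]
ordering enforced by flow_hw_validate_action_push_vlan() above; the VID,
PCP and egress attribute are arbitrary example values, not taken from the
patch:

#include <rte_flow.h>
#include <rte_ether.h>
#include <rte_byteorder.h>

static struct rte_flow_actions_template *
make_push_vlan_template(uint16_t port_id, struct rte_flow_error *err)
{
	/* Template-constant VLAN header fields. */
	static const struct rte_flow_action_of_push_vlan push = {
		.ethertype = RTE_BE16(RTE_ETHER_TYPE_VLAN),
	};
	static const struct rte_flow_action_of_set_vlan_vid vid = {
		.vlan_vid = RTE_BE16(100),	/* example VID */
	};
	static const struct rte_flow_action_of_set_vlan_pcp pcp = {
		.vlan_pcp = 3,			/* example PCP */
	};
	const struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN, .conf = &push },
		{ .type = RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID, .conf = &vid },
		{ .type = RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_PCP, .conf = &pcp },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	/* Non-zero masks mark every VLAN field as fixed in the template,
	 * so the OF_SET_VLAN_* masks agree with the OF_PUSH_VLAN mask.
	 */
	const struct rte_flow_action masks[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN, .conf = &push },
		{ .type = RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID, .conf = &vid },
		{ .type = RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_PCP, .conf = &pcp },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	const struct rte_flow_actions_template_attr attr = { .egress = 1 };

	return rte_flow_actions_template_create(port_id, &attr,
						actions, masks, err);
}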

* [PATCH v5 12/18] net/mlx5: implement METER MARK indirect action for HWS
  2022-10-20  3:21 ` [PATCH v5 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (10 preceding siblings ...)
  2022-10-20  3:22   ` [PATCH v5 11/18] net/mlx5: add HW steering VLAN push, pop and VID modify flow actions Suanming Mou
@ 2022-10-20  3:22   ` Suanming Mou
  2022-10-20  3:22   ` [PATCH v5 13/18] net/mlx5: add HWS AGE action support Suanming Mou
                     ` (5 subsequent siblings)
  17 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-10-20  3:22 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Alexander Kozyrev

From: Alexander Kozyrev <akozyrev@nvidia.com>

Add the ability to create an indirect action handle for METER_MARK.
It allows one meter to be shared between several different actions.
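
A minimal usage sketch (the helper name and queue choice are hypothetical;
the profile pointer is assumed to be obtained beforehand, e.g. with
rte_mtr_meter_profile_get()):

#include <rte_flow.h>

static struct rte_flow_action_handle *
create_shared_meter_mark(uint16_t port_id,
			 struct rte_flow_meter_profile *profile,
			 struct rte_flow_error *error)
{
	const struct rte_flow_action_meter_mark mm = {
		.profile = profile,
		.color_mode = 0,	/* color-blind metering */
		.state = 1,		/* meter enabled */
	};
	const struct rte_flow_action action = {
		.type = RTE_FLOW_ACTION_TYPE_METER_MARK,
		.conf = &mm,
	};
	const struct rte_flow_indir_action_conf conf = { .ingress = 1 };
	const struct rte_flow_op_attr op_attr = { .postpone = 0 };

	/* Completion must still be collected with rte_flow_pull(). */
	return rte_flow_async_action_handle_create(port_id, 0 /* queue */,
						   &op_attr, &conf, &action,
						   NULL /* user_data */,
						   error);
}

The returned handle can then be referenced by any number of flow rules
and updated or destroyed through the corresponding async handle calls.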

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
---
 doc/guides/nics/mlx5.rst           |   3 +
 drivers/net/mlx5/mlx5.c            |   4 +-
 drivers/net/mlx5/mlx5.h            |  33 ++-
 drivers/net/mlx5/mlx5_flow.c       |   6 +
 drivers/net/mlx5/mlx5_flow.h       |  20 +-
 drivers/net/mlx5/mlx5_flow_aso.c   | 141 ++++++++--
 drivers/net/mlx5/mlx5_flow_dv.c    | 145 +++++++++-
 drivers/net/mlx5/mlx5_flow_hw.c    | 437 +++++++++++++++++++++++++----
 drivers/net/mlx5/mlx5_flow_meter.c |  92 +++++-
 9 files changed, 776 insertions(+), 105 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index e499c38dcf..12646550b0 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -105,6 +105,7 @@ Features
 - Sub-Function representors.
 - Sub-Function.
 - Matching on represented port.
+- Meter mark.
 
 
 Limitations
@@ -485,6 +486,8 @@ Limitations
     if meter has drop count
     or meter hierarchy contains any meter that uses drop count,
     it cannot be used by flow rule matching all ports.
+  - When using the HWS flow engine (``dv_flow_en`` = 2),
+    only the meter mark action is supported.
 
 - Integrity:
 
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 6490ac636c..64a0e6f31d 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -442,7 +442,7 @@ mlx5_flow_aso_age_mng_init(struct mlx5_dev_ctx_shared *sh)
 		rte_errno = ENOMEM;
 		return -ENOMEM;
 	}
-	err = mlx5_aso_queue_init(sh, ASO_OPC_MOD_FLOW_HIT);
+	err = mlx5_aso_queue_init(sh, ASO_OPC_MOD_FLOW_HIT, 1);
 	if (err) {
 		mlx5_free(sh->aso_age_mng);
 		return -1;
@@ -763,7 +763,7 @@ mlx5_flow_aso_ct_mng_init(struct mlx5_dev_ctx_shared *sh)
 		rte_errno = ENOMEM;
 		return -rte_errno;
 	}
-	err = mlx5_aso_queue_init(sh, ASO_OPC_MOD_CONNECTION_TRACKING);
+	err = mlx5_aso_queue_init(sh, ASO_OPC_MOD_CONNECTION_TRACKING, MLX5_ASO_CT_SQ_NUM);
 	if (err) {
 		mlx5_free(sh->ct_mng);
 		/* rte_errno should be extracted from the failure. */
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index e78ed958c8..d3267fafda 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -976,12 +976,16 @@ enum mlx5_aso_mtr_type {
 
 /* Generic aso_flow_meter information. */
 struct mlx5_aso_mtr {
-	LIST_ENTRY(mlx5_aso_mtr) next;
+	union {
+		LIST_ENTRY(mlx5_aso_mtr) next;
+		struct mlx5_aso_mtr_pool *pool;
+	};
 	enum mlx5_aso_mtr_type type;
 	struct mlx5_flow_meter_info fm;
 	/**< Pointer to the next aso flow meter structure. */
 	uint8_t state; /**< ASO flow meter state. */
 	uint32_t offset;
+	enum rte_color init_color;
 };
 
 /* Generic aso_flow_meter pool structure. */
@@ -990,7 +994,11 @@ struct mlx5_aso_mtr_pool {
 	/*Must be the first in pool*/
 	struct mlx5_devx_obj *devx_obj;
 	/* The devx object of the minimum aso flow meter ID. */
+	struct mlx5dr_action *action; /* HWS action. */
+	struct mlx5_indexed_pool *idx_pool; /* HWS index pool. */
 	uint32_t index; /* Pool index in management structure. */
+	uint32_t nb_sq; /* Number of ASO SQ. */
+	struct mlx5_aso_sq *sq; /* ASO SQs. */
 };
 
 LIST_HEAD(aso_meter_list, mlx5_aso_mtr);
@@ -1685,6 +1693,7 @@ struct mlx5_priv {
 	struct mlx5_aso_ct_pools_mng *ct_mng;
 	/* Management data for ASO connection tracking. */
 	struct mlx5_aso_ct_pool *hws_ctpool; /* HW steering's CT pool. */
+	struct mlx5_aso_mtr_pool *hws_mpool; /* HW steering's Meter pool. */
 #endif
 };
 
@@ -2005,7 +2014,8 @@ void mlx5_pmd_socket_uninit(void);
 int mlx5_flow_meter_init(struct rte_eth_dev *dev,
 			 uint32_t nb_meters,
 			 uint32_t nb_meter_profiles,
-			 uint32_t nb_meter_policies);
+			 uint32_t nb_meter_policies,
+			 uint32_t nb_queues);
 void mlx5_flow_meter_uninit(struct rte_eth_dev *dev);
 int mlx5_flow_meter_ops_get(struct rte_eth_dev *dev, void *arg);
 struct mlx5_flow_meter_info *mlx5_flow_meter_find(struct mlx5_priv *priv,
@@ -2074,15 +2084,24 @@ eth_tx_burst_t mlx5_select_tx_function(struct rte_eth_dev *dev);
 
 /* mlx5_flow_aso.c */
 
+int mlx5_aso_mtr_queue_init(struct mlx5_dev_ctx_shared *sh,
+			    struct mlx5_aso_mtr_pool *hws_pool,
+			    struct mlx5_aso_mtr_pools_mng *pool_mng,
+			    uint32_t nb_queues);
+void mlx5_aso_mtr_queue_uninit(struct mlx5_dev_ctx_shared *sh,
+			       struct mlx5_aso_mtr_pool *hws_pool,
+			       struct mlx5_aso_mtr_pools_mng *pool_mng);
 int mlx5_aso_queue_init(struct mlx5_dev_ctx_shared *sh,
-		enum mlx5_access_aso_opc_mod aso_opc_mod);
+			enum mlx5_access_aso_opc_mod aso_opc_mode,
+			uint32_t nb_queues);
 int mlx5_aso_flow_hit_queue_poll_start(struct mlx5_dev_ctx_shared *sh);
 int mlx5_aso_flow_hit_queue_poll_stop(struct mlx5_dev_ctx_shared *sh);
 void mlx5_aso_queue_uninit(struct mlx5_dev_ctx_shared *sh,
-		enum mlx5_access_aso_opc_mod aso_opc_mod);
-int mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
-		struct mlx5_aso_mtr *mtr, struct mlx5_mtr_bulk *bulk);
-int mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh,
+			   enum mlx5_access_aso_opc_mod aso_opc_mod);
+int mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
+				 struct mlx5_aso_mtr *mtr,
+				 struct mlx5_mtr_bulk *bulk);
+int mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 		struct mlx5_aso_mtr *mtr);
 int mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			      struct mlx5_aso_ct_action *ct,
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 7c3295609d..e3485352db 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -4223,6 +4223,12 @@ flow_action_handles_translate(struct rte_eth_dev *dev,
 						MLX5_RTE_FLOW_ACTION_TYPE_COUNT;
 			translated[handle->index].conf = (void *)(uintptr_t)idx;
 			break;
+		case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
+			translated[handle->index].type =
+						(enum rte_flow_action_type)
+						MLX5_RTE_FLOW_ACTION_TYPE_METER_MARK;
+			translated[handle->index].conf = (void *)(uintptr_t)idx;
+			break;
 		case MLX5_INDIRECT_ACTION_TYPE_AGE:
 			if (priv->sh->flow_hit_aso_en) {
 				translated[handle->index].type =
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index b8124f6f79..96198d7d17 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -46,6 +46,7 @@ enum mlx5_rte_flow_action_type {
 	MLX5_RTE_FLOW_ACTION_TYPE_COUNT,
 	MLX5_RTE_FLOW_ACTION_TYPE_JUMP,
 	MLX5_RTE_FLOW_ACTION_TYPE_RSS,
+	MLX5_RTE_FLOW_ACTION_TYPE_METER_MARK,
 };
 
 /* Private (internal) Field IDs for MODIFY_FIELD action. */
@@ -54,22 +55,23 @@ enum mlx5_rte_flow_field_id {
 			MLX5_RTE_FLOW_FIELD_META_REG,
 };
 
-#define MLX5_INDIRECT_ACTION_TYPE_OFFSET 30
+#define MLX5_INDIRECT_ACTION_TYPE_OFFSET 29
 
 enum {
 	MLX5_INDIRECT_ACTION_TYPE_RSS,
 	MLX5_INDIRECT_ACTION_TYPE_AGE,
 	MLX5_INDIRECT_ACTION_TYPE_COUNT,
 	MLX5_INDIRECT_ACTION_TYPE_CT,
+	MLX5_INDIRECT_ACTION_TYPE_METER_MARK,
 };
 
-/* Now, the maximal ports will be supported is 256, action number is 4M. */
-#define MLX5_INDIRECT_ACT_CT_MAX_PORT 0x100
+/* Now, at most 16 ports can be supported and the action number is 32M. */
+#define MLX5_INDIRECT_ACT_CT_MAX_PORT 0x10
 
 #define MLX5_INDIRECT_ACT_CT_OWNER_SHIFT 22
 #define MLX5_INDIRECT_ACT_CT_OWNER_MASK (MLX5_INDIRECT_ACT_CT_MAX_PORT - 1)
 
-/* 30-31: type, 22-29: owner port, 0-21: index. */
+/* 29-31: type, 25-28: owner port, 0-24: index */
 #define MLX5_INDIRECT_ACT_CT_GEN_IDX(owner, index) \
 	((MLX5_INDIRECT_ACTION_TYPE_CT << MLX5_INDIRECT_ACTION_TYPE_OFFSET) | \
 	 (((owner) & MLX5_INDIRECT_ACT_CT_OWNER_MASK) << \
@@ -1114,6 +1116,7 @@ struct rte_flow_hw {
 	};
 	struct rte_flow_template_table *table; /* The table flow allcated from. */
 	uint32_t cnt_id;
+	uint32_t mtr_id;
 	uint8_t rule[0]; /* HWS layer data struct. */
 } __rte_packed;
 
@@ -1165,6 +1168,9 @@ struct mlx5_action_construct_data {
 		struct {
 			uint32_t id;
 		} shared_counter;
+		struct {
+			uint32_t id;
+		} shared_meter;
 	};
 };
 
@@ -1248,6 +1254,7 @@ struct mlx5_hw_actions {
 	uint16_t encap_decap_pos; /* Encap/Decap action position. */
 	uint32_t mark:1; /* Indicate the mark action. */
 	uint32_t cnt_id; /* Counter id. */
+	uint32_t mtr_id; /* Meter id. */
 	/* Translated DR action array from action template. */
 	struct mlx5dr_rule_action rule_acts[MLX5_HW_MAX_ACTS];
 };
@@ -1537,6 +1544,7 @@ flow_hw_get_reg_id(enum rte_flow_item_type type, uint32_t id)
 		 */
 		return REG_A;
 	case RTE_FLOW_ITEM_TYPE_CONNTRACK:
+	case RTE_FLOW_ITEM_TYPE_METER_COLOR:
 		return mlx5_flow_hw_aso_tag;
 	case RTE_FLOW_ITEM_TYPE_TAG:
 		MLX5_ASSERT(id < MLX5_FLOW_HW_TAGS_MAX);
@@ -1922,10 +1930,10 @@ mlx5_aso_meter_by_idx(struct mlx5_priv *priv, uint32_t idx)
 	struct mlx5_aso_mtr_pools_mng *pools_mng =
 				&priv->sh->mtrmng->pools_mng;
 
-	/* Decrease to original index. */
-	idx--;
 	if (priv->mtr_bulk.aso)
 		return priv->mtr_bulk.aso + idx;
+	/* Decrease to original index. */
+	idx--;
 	MLX5_ASSERT(idx / MLX5_ASO_MTRS_PER_POOL < pools_mng->n);
 	rte_rwlock_read_lock(&pools_mng->resize_mtrwl);
 	pool = pools_mng->pools[idx / MLX5_ASO_MTRS_PER_POOL];
diff --git a/drivers/net/mlx5/mlx5_flow_aso.c b/drivers/net/mlx5/mlx5_flow_aso.c
index c00c07b891..a5f58301eb 100644
--- a/drivers/net/mlx5/mlx5_flow_aso.c
+++ b/drivers/net/mlx5/mlx5_flow_aso.c
@@ -275,6 +275,65 @@ mlx5_aso_sq_create(struct mlx5_common_device *cdev, struct mlx5_aso_sq *sq,
 	return -1;
 }
 
+void
+mlx5_aso_mtr_queue_uninit(struct mlx5_dev_ctx_shared *sh __rte_unused,
+			  struct mlx5_aso_mtr_pool *hws_pool,
+			  struct mlx5_aso_mtr_pools_mng *pool_mng)
+{
+	uint32_t i;
+
+	if (hws_pool) {
+		for (i = 0; i < hws_pool->nb_sq; i++)
+			mlx5_aso_destroy_sq(hws_pool->sq + i);
+		mlx5_free(hws_pool->sq);
+		return;
+	}
+	if (pool_mng)
+		mlx5_aso_destroy_sq(&pool_mng->sq);
+}
+
+int
+mlx5_aso_mtr_queue_init(struct mlx5_dev_ctx_shared *sh,
+				struct mlx5_aso_mtr_pool *hws_pool,
+				struct mlx5_aso_mtr_pools_mng *pool_mng,
+				uint32_t nb_queues)
+{
+	struct mlx5_common_device *cdev = sh->cdev;
+	struct mlx5_aso_sq *sq;
+	uint32_t i;
+
+	if (hws_pool) {
+		sq = mlx5_malloc(MLX5_MEM_ZERO,
+			sizeof(struct mlx5_aso_sq) * nb_queues,
+			RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
+		if (!sq)
+			return -1;
+		hws_pool->sq = sq;
+		for (i = 0; i < nb_queues; i++) {
+			if (mlx5_aso_sq_create(cdev, hws_pool->sq + i,
+					       sh->tx_uar.obj,
+					       MLX5_ASO_QUEUE_LOG_DESC))
+				goto error;
+			mlx5_aso_mtr_init_sq(hws_pool->sq + i);
+		}
+		hws_pool->nb_sq = nb_queues;
+	}
+	if (pool_mng) {
+		if (mlx5_aso_sq_create(cdev, &pool_mng->sq,
+				       sh->tx_uar.obj,
+				       MLX5_ASO_QUEUE_LOG_DESC))
+			return -1;
+		mlx5_aso_mtr_init_sq(&pool_mng->sq);
+	}
+	return 0;
+error:
+	do {
+		if (&hws_pool->sq[i])
+			mlx5_aso_destroy_sq(hws_pool->sq + i);
+	} while (i--);
+	return -1;
+}
+
 /**
  * API to create and initialize Send Queue used for ASO access.
  *
@@ -282,13 +341,16 @@ mlx5_aso_sq_create(struct mlx5_common_device *cdev, struct mlx5_aso_sq *sq,
  *   Pointer to shared device context.
  * @param[in] aso_opc_mod
  *   Mode of ASO feature.
+ * @param[in] nb_queues
+ *   Number of Send Queues to create.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
 mlx5_aso_queue_init(struct mlx5_dev_ctx_shared *sh,
-		    enum mlx5_access_aso_opc_mod aso_opc_mod)
+		    enum mlx5_access_aso_opc_mod aso_opc_mod,
+			uint32_t nb_queues)
 {
 	uint32_t sq_desc_n = 1 << MLX5_ASO_QUEUE_LOG_DESC;
 	struct mlx5_common_device *cdev = sh->cdev;
@@ -307,10 +369,9 @@ mlx5_aso_queue_init(struct mlx5_dev_ctx_shared *sh,
 		mlx5_aso_age_init_sq(&sh->aso_age_mng->aso_sq);
 		break;
 	case ASO_OPC_MOD_POLICER:
-		if (mlx5_aso_sq_create(cdev, &sh->mtrmng->pools_mng.sq,
-				       sh->tx_uar.obj, MLX5_ASO_QUEUE_LOG_DESC))
+		if (mlx5_aso_mtr_queue_init(sh, NULL,
+					    &sh->mtrmng->pools_mng, nb_queues))
 			return -1;
-		mlx5_aso_mtr_init_sq(&sh->mtrmng->pools_mng.sq);
 		break;
 	case ASO_OPC_MOD_CONNECTION_TRACKING:
 		if (mlx5_aso_ct_queue_init(sh, sh->ct_mng, MLX5_ASO_CT_SQ_NUM))
@@ -343,7 +404,7 @@ mlx5_aso_queue_uninit(struct mlx5_dev_ctx_shared *sh,
 		sq = &sh->aso_age_mng->aso_sq;
 		break;
 	case ASO_OPC_MOD_POLICER:
-		sq = &sh->mtrmng->pools_mng.sq;
+		mlx5_aso_mtr_queue_uninit(sh, NULL, &sh->mtrmng->pools_mng);
 		break;
 	case ASO_OPC_MOD_CONNECTION_TRACKING:
 		mlx5_aso_ct_queue_uninit(sh, sh->ct_mng);
@@ -666,7 +727,8 @@ static uint16_t
 mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 			       struct mlx5_aso_sq *sq,
 			       struct mlx5_aso_mtr *aso_mtr,
-			       struct mlx5_mtr_bulk *bulk)
+			       struct mlx5_mtr_bulk *bulk,
+				   bool need_lock)
 {
 	volatile struct mlx5_aso_wqe *wqe = NULL;
 	struct mlx5_flow_meter_info *fm = NULL;
@@ -679,11 +741,13 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	uint32_t param_le;
 	int id;
 
-	rte_spinlock_lock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_lock(&sq->sqsl);
 	res = size - (uint16_t)(sq->head - sq->tail);
 	if (unlikely(!res)) {
 		DRV_LOG(ERR, "Fail: SQ is full and no free WQE to send");
-		rte_spinlock_unlock(&sq->sqsl);
+		if (need_lock)
+			rte_spinlock_unlock(&sq->sqsl);
 		return 0;
 	}
 	wqe = &sq->sq_obj.aso_wqes[sq->head & mask];
@@ -692,8 +756,11 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	fm = &aso_mtr->fm;
 	sq->elts[sq->head & mask].mtr = aso_mtr;
 	if (aso_mtr->type == ASO_METER_INDIRECT) {
-		pool = container_of(aso_mtr, struct mlx5_aso_mtr_pool,
-				    mtrs[aso_mtr->offset]);
+		if (likely(sh->config.dv_flow_en == 2))
+			pool = aso_mtr->pool;
+		else
+			pool = container_of(aso_mtr, struct mlx5_aso_mtr_pool,
+					    mtrs[aso_mtr->offset]);
 		id = pool->devx_obj->id;
 	} else {
 		id = bulk->devx_obj->id;
@@ -756,7 +823,8 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
 			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
 			   !sh->tx_uar.dbnc);
-	rte_spinlock_unlock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_unlock(&sq->sqsl);
 	return 1;
 }
 
@@ -779,7 +847,7 @@ mlx5_aso_mtrs_status_update(struct mlx5_aso_sq *sq, uint16_t aso_mtrs_nums)
 }
 
 static void
-mlx5_aso_mtr_completion_handle(struct mlx5_aso_sq *sq)
+mlx5_aso_mtr_completion_handle(struct mlx5_aso_sq *sq, bool need_lock)
 {
 	struct mlx5_aso_cq *cq = &sq->cq;
 	volatile struct mlx5_cqe *restrict cqe;
@@ -791,7 +859,8 @@ mlx5_aso_mtr_completion_handle(struct mlx5_aso_sq *sq)
 	uint16_t n = 0;
 	int ret;
 
-	rte_spinlock_lock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_lock(&sq->sqsl);
 	max = (uint16_t)(sq->head - sq->tail);
 	if (unlikely(!max)) {
 		rte_spinlock_unlock(&sq->sqsl);
@@ -823,7 +892,8 @@ mlx5_aso_mtr_completion_handle(struct mlx5_aso_sq *sq)
 		rte_io_wmb();
 		cq->cq_obj.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci);
 	}
-	rte_spinlock_unlock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_unlock(&sq->sqsl);
 }
 
 /**
@@ -840,16 +910,31 @@ mlx5_aso_mtr_completion_handle(struct mlx5_aso_sq *sq)
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
+mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			struct mlx5_aso_mtr *mtr,
 			struct mlx5_mtr_bulk *bulk)
 {
-	struct mlx5_aso_sq *sq = &sh->mtrmng->pools_mng.sq;
+	struct mlx5_aso_sq *sq;
 	uint32_t poll_wqe_times = MLX5_MTR_POLL_WQE_CQE_TIMES;
+	bool need_lock;
 
+	if (likely(sh->config.dv_flow_en == 2) &&
+	    mtr->type == ASO_METER_INDIRECT) {
+		if (queue == MLX5_HW_INV_QUEUE) {
+			sq = &mtr->pool->sq[mtr->pool->nb_sq - 1];
+			need_lock = true;
+		} else {
+			sq = &mtr->pool->sq[queue];
+			need_lock = false;
+		}
+	} else {
+		sq = &sh->mtrmng->pools_mng.sq;
+		need_lock = true;
+	}
 	do {
-		mlx5_aso_mtr_completion_handle(sq);
-		if (mlx5_aso_mtr_sq_enqueue_single(sh, sq, mtr, bulk))
+		mlx5_aso_mtr_completion_handle(sq, need_lock);
+		if (mlx5_aso_mtr_sq_enqueue_single(sh, sq, mtr,
+						   bulk, need_lock))
 			return 0;
 		/* Waiting for wqe resource. */
 		rte_delay_us_sleep(MLX5_ASO_WQE_CQE_RESPONSE_DELAY);
@@ -873,17 +958,31 @@ mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh,
+mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			struct mlx5_aso_mtr *mtr)
 {
-	struct mlx5_aso_sq *sq = &sh->mtrmng->pools_mng.sq;
+	struct mlx5_aso_sq *sq;
 	uint32_t poll_cqe_times = MLX5_MTR_POLL_WQE_CQE_TIMES;
+	bool need_lock;
 
+	if (likely(sh->config.dv_flow_en == 2) &&
+	    mtr->type == ASO_METER_INDIRECT) {
+		if (queue == MLX5_HW_INV_QUEUE) {
+			sq = &mtr->pool->sq[mtr->pool->nb_sq - 1];
+			need_lock = true;
+		} else {
+			sq = &mtr->pool->sq[queue];
+			need_lock = false;
+		}
+	} else {
+		sq = &sh->mtrmng->pools_mng.sq;
+		need_lock = true;
+	}
 	if (__atomic_load_n(&mtr->state, __ATOMIC_RELAXED) ==
 					    ASO_METER_READY)
 		return 0;
 	do {
-		mlx5_aso_mtr_completion_handle(sq);
+		mlx5_aso_mtr_completion_handle(sq, need_lock);
 		if (__atomic_load_n(&mtr->state, __ATOMIC_RELAXED) ==
 					    ASO_METER_READY)
 			return 0;
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 7efc936ddd..868fa6e1a5 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -1387,6 +1387,7 @@ mlx5_flow_item_field_width(struct rte_eth_dev *dev,
 		return inherit < 0 ? 0 : inherit;
 	case RTE_FLOW_FIELD_IPV4_ECN:
 	case RTE_FLOW_FIELD_IPV6_ECN:
+	case RTE_FLOW_FIELD_METER_COLOR:
 		return 2;
 	default:
 		MLX5_ASSERT(false);
@@ -1856,6 +1857,31 @@ mlx5_flow_field_id_to_modify_info
 				info[idx].offset = data->offset;
 		}
 		break;
+	case RTE_FLOW_FIELD_METER_COLOR:
+		{
+			const uint32_t color_mask =
+				(UINT32_C(1) << MLX5_MTR_COLOR_BITS) - 1;
+			int reg;
+
+			if (priv->sh->config.dv_flow_en == 2)
+				reg = flow_hw_get_reg_id
+					(RTE_FLOW_ITEM_TYPE_METER_COLOR, 0);
+			else
+				reg = mlx5_flow_get_reg_id(dev, MLX5_MTR_COLOR,
+						       0, error);
+			if (reg < 0)
+				return;
+			MLX5_ASSERT(reg != REG_NON);
+			MLX5_ASSERT((unsigned int)reg < RTE_DIM(reg_to_field));
+			info[idx] = (struct field_modify_info){4, 0,
+						reg_to_field[reg]};
+			if (mask)
+				mask[idx] = flow_modify_info_mask_32_masked
+					(width, data->offset, color_mask);
+			else
+				info[idx].offset = data->offset;
+		}
+		break;
 	case RTE_FLOW_FIELD_POINTER:
 	case RTE_FLOW_FIELD_VALUE:
 	default:
@@ -1913,7 +1939,9 @@ flow_dv_convert_action_modify_field
 		item.spec = conf->src.field == RTE_FLOW_FIELD_POINTER ?
 					(void *)(uintptr_t)conf->src.pvalue :
 					(void *)(uintptr_t)&conf->src.value;
-		if (conf->dst.field == RTE_FLOW_FIELD_META) {
+		if (conf->dst.field == RTE_FLOW_FIELD_META ||
+		    conf->dst.field == RTE_FLOW_FIELD_TAG ||
+		    conf->dst.field == RTE_FLOW_FIELD_METER_COLOR) {
 			meta = *(const unaligned_uint32_t *)item.spec;
 			meta = rte_cpu_to_be_32(meta);
 			item.spec = &meta;
@@ -3687,6 +3715,69 @@ flow_dv_validate_action_aso_ct(struct rte_eth_dev *dev,
 	return 0;
 }
 
+/**
+ * Validate METER_COLOR item.
+ *
+ * @param[in] dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param[in] item
+ *   Item specification.
+ * @param[in] attr
+ *   Attributes of flow that includes this item.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_dv_validate_item_meter_color(struct rte_eth_dev *dev,
+			   const struct rte_flow_item *item,
+			   const struct rte_flow_attr *attr __rte_unused,
+			   struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_item_meter_color *spec = item->spec;
+	const struct rte_flow_item_meter_color *mask = item->mask;
+	struct rte_flow_item_meter_color nic_mask = {
+		.color = RTE_COLORS
+	};
+	int ret;
+
+	if (priv->mtr_color_reg == REG_NON)
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_ITEM, item,
+					  "meter color register"
+					  " isn't available");
+	ret = mlx5_flow_get_reg_id(dev, MLX5_MTR_COLOR, 0, error);
+	if (ret < 0)
+		return ret;
+	if (!spec)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ITEM_SPEC,
+					  item->spec,
+					  "data cannot be empty");
+	if (spec->color > RTE_COLORS)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION_CONF,
+					  &spec->color,
+					  "meter color is invalid");
+	if (!mask)
+		mask = &rte_flow_item_meter_color_mask;
+	if (!mask->color)
+		return rte_flow_error_set(error, EINVAL,
+					RTE_FLOW_ERROR_TYPE_ITEM_SPEC, NULL,
+					"mask cannot be zero");
+
+	ret = mlx5_flow_item_acceptable(item, (const uint8_t *)mask,
+				(const uint8_t *)&nic_mask,
+				sizeof(struct rte_flow_item_meter_color),
+				MLX5_ITEM_RANGE_NOT_ACCEPTED, error);
+	if (ret < 0)
+		return ret;
+	return 0;
+}
+
 int
 flow_dv_encap_decap_match_cb(void *tool_ctx __rte_unused,
 			     struct mlx5_list_entry *entry, void *cb_ctx)
@@ -6519,7 +6610,7 @@ flow_dv_mtr_container_resize(struct rte_eth_dev *dev)
 		return -ENOMEM;
 	}
 	if (!pools_mng->n)
-		if (mlx5_aso_queue_init(priv->sh, ASO_OPC_MOD_POLICER)) {
+		if (mlx5_aso_queue_init(priv->sh, ASO_OPC_MOD_POLICER, 1)) {
 			mlx5_free(pools);
 			return -ENOMEM;
 		}
@@ -7421,6 +7512,13 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			if (ret < 0)
 				return ret;
 			break;
+		case RTE_FLOW_ITEM_TYPE_METER_COLOR:
+			ret = flow_dv_validate_item_meter_color(dev, items,
+								attr, error);
+			if (ret < 0)
+				return ret;
+			last_item = MLX5_FLOW_ITEM_METER_COLOR;
+			break;
 		default:
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ITEM,
@@ -10508,6 +10606,45 @@ flow_dv_translate_item_flex(struct rte_eth_dev *dev, void *matcher, void *key,
 	mlx5_flex_flow_translate_item(dev, matcher, key, item, is_inner);
 }
 
+/**
+ * Add METER_COLOR item to matcher
+ *
+ * @param[in] dev
+ *   The device to configure through.
+ * @param[in, out] key
+ *   Flow matcher value.
+ * @param[in] item
+ *   Flow pattern to translate.
+ * @param[in] key_type
+ *   Set flow matcher mask or value.
+ */
+static void
+flow_dv_translate_item_meter_color(struct rte_eth_dev *dev, void *key,
+			    const struct rte_flow_item *item,
+			    uint32_t key_type)
+{
+	const struct rte_flow_item_meter_color *color_m = item->mask;
+	const struct rte_flow_item_meter_color *color_v = item->spec;
+	uint32_t value, mask;
+	int reg = REG_NON;
+
+	MLX5_ASSERT(color_v);
+	if (MLX5_ITEM_VALID(item, key_type))
+		return;
+	MLX5_ITEM_UPDATE(item, key_type, color_v, color_m,
+		&rte_flow_item_meter_color_mask);
+	value = rte_col_2_mlx5_col(color_v->color);
+	mask = color_m ?
+		color_m->color : (UINT32_C(1) << MLX5_MTR_COLOR_BITS) - 1;
+	if (!!(key_type & MLX5_SET_MATCHER_SW))
+		reg = mlx5_flow_get_reg_id(dev, MLX5_MTR_COLOR, 0, NULL);
+	else
+		reg = flow_hw_get_reg_id(RTE_FLOW_ITEM_TYPE_METER_COLOR, 0);
+	if (reg == REG_NON)
+		return;
+	flow_dv_match_meta_reg(key, (enum modify_reg)reg, value, mask);
+}
+
 static uint32_t matcher_zero[MLX5_ST_SZ_DW(fte_match_param)] = { 0 };
 
 #define HEADER_IS_ZERO(match_criteria, headers)				     \
@@ -13260,6 +13397,10 @@ flow_dv_translate_items(struct rte_eth_dev *dev,
 		/* No other protocol should follow eCPRI layer. */
 		last_item = MLX5_FLOW_LAYER_ECPRI;
 		break;
+	case RTE_FLOW_ITEM_TYPE_METER_COLOR:
+		flow_dv_translate_item_meter_color(dev, key, items, key_type);
+		last_item = MLX5_FLOW_ITEM_METER_COLOR;
+		break;
 	default:
 		break;
 	}
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 0c110819e6..52125c861e 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -412,6 +412,10 @@ __flow_hw_action_template_destroy(struct rte_eth_dev *dev,
 		mlx5_hws_cnt_shared_put(priv->hws_cpool, &acts->cnt_id);
 		acts->cnt_id = 0;
 	}
+	if (acts->mtr_id) {
+		mlx5_ipool_free(priv->hws_mpool->idx_pool, acts->mtr_id);
+		acts->mtr_id = 0;
+	}
 }
 
 /**
@@ -628,6 +632,42 @@ __flow_hw_act_data_shared_cnt_append(struct mlx5_priv *priv,
 	return 0;
 }
 
+/**
+ * Append shared meter_mark action to the dynamic action list.
+ *
+ * @param[in] priv
+ *   Pointer to the port private data structure.
+ * @param[in] acts
+ *   Pointer to the template HW steering DR actions.
+ * @param[in] type
+ *   Action type.
+ * @param[in] action_src
+ *   Offset of source rte flow action.
+ * @param[in] action_dst
+ *   Offset of destination DR action.
+ * @param[in] mtr_id
+ *   Shared meter id.
+ *
+ * @return
+ *    0 on success, negative value otherwise and rte_errno is set.
+ */
+static __rte_always_inline int
+__flow_hw_act_data_shared_mtr_append(struct mlx5_priv *priv,
+				     struct mlx5_hw_actions *acts,
+				     enum rte_flow_action_type type,
+				     uint16_t action_src,
+				     uint16_t action_dst,
+				     cnt_id_t mtr_id)
+{	struct mlx5_action_construct_data *act_data;
+
+	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
+	if (!act_data)
+		return -1;
+	act_data->type = type;
+	act_data->shared_meter.id = mtr_id;
+	LIST_INSERT_HEAD(&acts->act_list, act_data, next);
+	return 0;
+}
 
 /**
  * Translate shared indirect action.
@@ -682,6 +722,13 @@ flow_hw_shared_action_translate(struct rte_eth_dev *dev,
 				       idx, &acts->rule_acts[action_dst]))
 			return -1;
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
+		if (__flow_hw_act_data_shared_mtr_append(priv, acts,
+			(enum rte_flow_action_type)
+			MLX5_RTE_FLOW_ACTION_TYPE_METER_MARK,
+			action_src, action_dst, idx))
+			return -1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type:%d", type);
 		break;
@@ -888,6 +935,7 @@ flow_hw_modify_field_compile(struct rte_eth_dev *dev,
 				(void *)(uintptr_t)&conf->src.value;
 		if (conf->dst.field == RTE_FLOW_FIELD_META ||
 		    conf->dst.field == RTE_FLOW_FIELD_TAG ||
+		    conf->dst.field == RTE_FLOW_FIELD_METER_COLOR ||
 		    conf->dst.field == (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG) {
 			value = *(const unaligned_uint32_t *)item.spec;
 			value = rte_cpu_to_be_32(value);
@@ -1047,7 +1095,7 @@ flow_hw_meter_compile(struct rte_eth_dev *dev,
 	acts->rule_acts[jump_pos].action = (!!group) ?
 				    acts->jump->hws_action :
 				    acts->jump->root_action;
-	if (mlx5_aso_mtr_wait(priv->sh, aso_mtr))
+	if (mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr))
 		return -ENOMEM;
 	return 0;
 }
@@ -1121,6 +1169,74 @@ static rte_be32_t vlan_hdr_to_be32(const struct rte_flow_action *actions)
 #endif
 }
 
+static __rte_always_inline struct mlx5_aso_mtr *
+flow_hw_meter_mark_alloc(struct rte_eth_dev *dev,
+			   const struct rte_flow_action *action,
+			   uint32_t queue)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
+	const struct rte_flow_action_meter_mark *meter_mark = action->conf;
+	struct mlx5_aso_mtr *aso_mtr;
+	struct mlx5_flow_meter_info *fm;
+	uint32_t mtr_id;
+
+	aso_mtr = mlx5_ipool_malloc(priv->hws_mpool->idx_pool, &mtr_id);
+	if (!aso_mtr)
+		return NULL;
+	/* Fill the flow meter parameters. */
+	aso_mtr->type = ASO_METER_INDIRECT;
+	fm = &aso_mtr->fm;
+	fm->meter_id = mtr_id;
+	fm->profile = (struct mlx5_flow_meter_profile *)(meter_mark->profile);
+	fm->is_enable = meter_mark->state;
+	fm->color_aware = meter_mark->color_mode;
+	aso_mtr->pool = pool;
+	aso_mtr->state = ASO_METER_WAIT;
+	aso_mtr->offset = mtr_id - 1;
+	aso_mtr->init_color = (meter_mark->color_mode) ?
+		meter_mark->init_color : RTE_COLOR_GREEN;
+	/* Update ASO flow meter by wqe. */
+	if (mlx5_aso_meter_update_by_wqe(priv->sh, queue, aso_mtr,
+					 &priv->mtr_bulk)) {
+		mlx5_ipool_free(pool->idx_pool, mtr_id);
+		return NULL;
+	}
+	/* Wait for ASO object completion. */
+	if (queue == MLX5_HW_INV_QUEUE &&
+	    mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr)) {
+		mlx5_ipool_free(pool->idx_pool, mtr_id);
+		return NULL;
+	}
+	return aso_mtr;
+}
+
+static __rte_always_inline int
+flow_hw_meter_mark_compile(struct rte_eth_dev *dev,
+			   uint16_t aso_mtr_pos,
+			   const struct rte_flow_action *action,
+			   struct mlx5dr_rule_action *acts,
+			   uint32_t *index,
+			   uint32_t queue)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
+	struct mlx5_aso_mtr *aso_mtr;
+
+	aso_mtr = flow_hw_meter_mark_alloc(dev, action, queue);
+	if (!aso_mtr)
+		return -1;
+
+	/* Compile METER_MARK action */
+	acts[aso_mtr_pos].action = pool->action;
+	acts[aso_mtr_pos].aso_meter.offset = aso_mtr->offset;
+	acts[aso_mtr_pos].aso_meter.init_color =
+		(enum mlx5dr_action_aso_meter_color)
+		rte_col_2_mlx5_col(aso_mtr->init_color);
+	*index = aso_mtr->fm.meter_id;
+	return 0;
+}
+
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -1428,6 +1544,24 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
 				goto err;
 			}
 			break;
+		case RTE_FLOW_ACTION_TYPE_METER_MARK:
+			action_pos = at->actions_off[actions - at->actions];
+			if (actions->conf && masks->conf &&
+			    ((const struct rte_flow_action_meter_mark *)
+			     masks->conf)->profile) {
+				err = flow_hw_meter_mark_compile(dev,
+							action_pos, actions,
+							acts->rule_acts,
+							&acts->mtr_id,
+							MLX5_HW_INV_QUEUE);
+				if (err)
+					goto err;
+			} else if (__flow_hw_act_data_general_append(priv, acts,
+							actions->type,
+							actions - action_start,
+							action_pos))
+				goto err;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -1624,8 +1758,10 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev, uint32_t queue,
 				struct mlx5dr_rule_action *rule_act)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
 	struct mlx5_action_construct_data act_data;
 	struct mlx5_shared_action_rss *shared_rss;
+	struct mlx5_aso_mtr *aso_mtr;
 	uint32_t act_idx = (uint32_t)(uintptr_t)action->conf;
 	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
 	uint32_t idx = act_idx &
@@ -1661,6 +1797,17 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev, uint32_t queue,
 		if (flow_hw_ct_compile(dev, queue, idx, rule_act))
 			return -1;
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
+		/* Find ASO object. */
+		aso_mtr = mlx5_ipool_get(pool->idx_pool, idx);
+		if (!aso_mtr)
+			return -1;
+		rule_act->action = pool->action;
+		rule_act->aso_meter.offset = aso_mtr->offset;
+		rule_act->aso_meter.init_color =
+			(enum mlx5dr_action_aso_meter_color)
+			rte_col_2_mlx5_col(aso_mtr->init_color);
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type:%d", type);
 		break;
@@ -1730,6 +1877,7 @@ flow_hw_modify_field_construct(struct mlx5_hw_q_job *job,
 		rte_memcpy(values, mhdr_action->src.pvalue, sizeof(values));
 	if (mhdr_action->dst.field == RTE_FLOW_FIELD_META ||
 	    mhdr_action->dst.field == RTE_FLOW_FIELD_TAG ||
+	    mhdr_action->dst.field == RTE_FLOW_FIELD_METER_COLOR ||
 	    mhdr_action->dst.field == (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG) {
 		value_p = (unaligned_uint32_t *)values;
 		*value_p = rte_cpu_to_be_32(*value_p);
@@ -1807,6 +1955,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			  uint32_t queue)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
 	struct rte_flow_template_table *table = job->flow->table;
 	struct mlx5_action_construct_data *act_data;
 	const struct rte_flow_actions_template *at = hw_at->action_template;
@@ -1823,8 +1972,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	uint32_t ft_flag;
 	size_t encap_len = 0;
 	int ret;
-	struct mlx5_aso_mtr *mtr;
-	uint32_t mtr_id;
+	struct mlx5_aso_mtr *aso_mtr;
 
 	rte_memcpy(rule_acts, hw_acts->rule_acts, sizeof(*rule_acts) * at->dr_actions_num);
 	attr.group = table->grp->group_id;
@@ -1858,6 +2006,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		struct mlx5_hrxq *hrxq;
 		uint32_t ct_idx;
 		cnt_id_t cnt_id;
+		uint32_t mtr_id;
 
 		action = &actions[act_data->action_src];
 		/*
@@ -1964,13 +2113,13 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_METER:
 			meter = action->conf;
 			mtr_id = meter->mtr_id;
-			mtr = mlx5_aso_meter_by_idx(priv, mtr_id);
+			aso_mtr = mlx5_aso_meter_by_idx(priv, mtr_id);
 			rule_acts[act_data->action_dst].action =
 				priv->mtr_bulk.action;
 			rule_acts[act_data->action_dst].aso_meter.offset =
-								mtr->offset;
+								aso_mtr->offset;
 			jump = flow_hw_jump_action_register
-				(dev, &table->cfg, mtr->fm.group, NULL);
+				(dev, &table->cfg, aso_mtr->fm.group, NULL);
 			if (!jump)
 				return -1;
 			MLX5_ASSERT
@@ -1980,7 +2129,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 							 jump->root_action;
 			job->flow->jump = jump;
 			job->flow->fate_type = MLX5_FLOW_FATE_JUMP;
-			if (mlx5_aso_mtr_wait(priv->sh, mtr))
+			if (mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr))
 				return -1;
 			break;
 		case RTE_FLOW_ACTION_TYPE_COUNT:
@@ -2016,6 +2165,28 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 					       &rule_acts[act_data->action_dst]))
 				return -1;
 			break;
+		case MLX5_RTE_FLOW_ACTION_TYPE_METER_MARK:
+			mtr_id = act_data->shared_meter.id &
+				((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
+			/* Find ASO object. */
+			aso_mtr = mlx5_ipool_get(pool->idx_pool, mtr_id);
+			if (!aso_mtr)
+				return -1;
+			rule_acts[act_data->action_dst].action =
+							pool->action;
+			rule_acts[act_data->action_dst].aso_meter.offset =
+							aso_mtr->offset;
+			rule_acts[act_data->action_dst].aso_meter.init_color =
+				(enum mlx5dr_action_aso_meter_color)
+				rte_col_2_mlx5_col(aso_mtr->init_color);
+			break;
+		case RTE_FLOW_ACTION_TYPE_METER_MARK:
+			ret = flow_hw_meter_mark_compile(dev,
+				act_data->action_dst, action,
+				rule_acts, &job->flow->mtr_id, queue);
+			if (ret != 0)
+				return ret;
+			break;
 		default:
 			break;
 		}
@@ -2283,6 +2454,7 @@ flow_hw_pull(struct rte_eth_dev *dev,
 	     struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
 	struct mlx5_hw_q_job *job;
 	int ret, i;
 
@@ -2307,6 +2479,10 @@ flow_hw_pull(struct rte_eth_dev *dev,
 						&job->flow->cnt_id);
 				job->flow->cnt_id = 0;
 			}
+			if (job->flow->mtr_id) {
+				mlx5_ipool_free(pool->idx_pool,	job->flow->mtr_id);
+				job->flow->mtr_id = 0;
+			}
 			mlx5_ipool_free(job->flow->table->flow, job->flow->idx);
 		}
 		priv->hw_q[queue].job[priv->hw_q[queue].job_idx++] = job;
@@ -3189,6 +3365,9 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_METER:
 			/* TODO: Validation logic */
 			break;
+		case RTE_FLOW_ACTION_TYPE_METER_MARK:
+			/* TODO: Validation logic */
+			break;
 		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
 			ret = flow_hw_validate_action_modify_field(action,
 									mask,
@@ -3282,6 +3461,11 @@ flow_hw_dr_actions_template_handle_shared(const struct rte_flow_action *mask,
 		action_types[*curr_off] = MLX5DR_ACTION_TYP_ASO_CT;
 		*curr_off = *curr_off + 1;
 		break;
+	case RTE_FLOW_ACTION_TYPE_METER_MARK:
+		at->actions_off[action_src] = *curr_off;
+		action_types[*curr_off] = MLX5DR_ACTION_TYP_ASO_METER;
+		*curr_off = *curr_off + 1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type: %d", type);
 		return -EINVAL;
@@ -3373,6 +3557,12 @@ flow_hw_dr_actions_template_create(struct rte_flow_actions_template *at)
 				MLX5_HW_VLAN_PUSH_PCP_IDX :
 				MLX5_HW_VLAN_PUSH_VID_IDX;
 			break;
+		case RTE_FLOW_ACTION_TYPE_METER_MARK:
+			at->actions_off[i] = curr_off;
+			action_types[curr_off++] = MLX5DR_ACTION_TYP_ASO_METER;
+			if (curr_off >= MLX5_HW_MAX_ACTS)
+				goto err_actions_num;
+			break;
 		default:
 			type = mlx5_hw_dr_action_types[at->actions[i].type];
 			at->actions_off[i] = curr_off;
@@ -3848,6 +4038,16 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 								  " attribute");
 			}
 			break;
+		case RTE_FLOW_ITEM_TYPE_METER_COLOR:
+		{
+			int reg = flow_hw_get_reg_id(RTE_FLOW_ITEM_TYPE_METER_COLOR, 0);
+			if (reg == REG_NON)
+				return rte_flow_error_set(error, EINVAL,
+							  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+							  NULL,
+							  "Unsupported meter color register");
+			break;
+		}
 		case RTE_FLOW_ITEM_TYPE_VOID:
 		case RTE_FLOW_ITEM_TYPE_ETH:
 		case RTE_FLOW_ITEM_TYPE_VLAN:
@@ -5357,7 +5557,7 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	LIST_INIT(&priv->hw_ctrl_flows);
 	/* Initialize meter library*/
 	if (port_attr->nb_meters)
-		if (mlx5_flow_meter_init(dev, port_attr->nb_meters, 1, 1))
+		if (mlx5_flow_meter_init(dev, port_attr->nb_meters, 1, 1, nb_q_updated))
 			goto err;
 	/* Add global actions. */
 	for (i = 0; i < MLX5_HW_ACTION_FLAG_MAX; i++) {
@@ -5861,7 +6061,9 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 {
 	struct rte_flow_action_handle *handle = NULL;
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr *aso_mtr;
 	cnt_id_t cnt_id;
+	uint32_t mtr_id;
 
 	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
@@ -5880,6 +6082,14 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
 		handle = flow_hw_conntrack_create(dev, queue, action->conf, error);
 		break;
+	case RTE_FLOW_ACTION_TYPE_METER_MARK:
+		aso_mtr = flow_hw_meter_mark_alloc(dev, action, queue);
+		if (!aso_mtr)
+			break;
+		mtr_id = (MLX5_INDIRECT_ACTION_TYPE_METER_MARK <<
+			MLX5_INDIRECT_ACTION_TYPE_OFFSET) | (aso_mtr->fm.meter_id);
+		handle = (struct rte_flow_action_handle *)(uintptr_t)mtr_id;
+		break;
 	default:
 		handle = flow_dv_action_create(dev, conf, action, error);
 	}
@@ -5915,18 +6125,59 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 			     void *user_data,
 			     struct rte_flow_error *error)
 {
-	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
-	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
-
 	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
 	RTE_SET_USED(user_data);
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
+	const struct rte_flow_update_meter_mark *upd_meter_mark =
+		(const struct rte_flow_update_meter_mark *)update;
+	const struct rte_flow_action_meter_mark *meter_mark;
+	struct mlx5_aso_mtr *aso_mtr;
+	struct mlx5_flow_meter_info *fm;
+	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
+	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+	uint32_t idx = act_idx & ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
+
 	switch (type) {
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
 		return flow_hw_conntrack_update(dev, queue, update, act_idx, error);
+	case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
+		meter_mark = &upd_meter_mark->meter_mark;
+		/* Find ASO object. */
+		aso_mtr = mlx5_ipool_get(pool->idx_pool, idx);
+		if (!aso_mtr)
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL, "Invalid meter_mark update index");
+		fm = &aso_mtr->fm;
+		if (upd_meter_mark->profile_valid)
+			fm->profile = (struct mlx5_flow_meter_profile *)
+							(meter_mark->profile);
+		if (upd_meter_mark->color_mode_valid)
+			fm->color_aware = meter_mark->color_mode;
+		if (upd_meter_mark->init_color_valid)
+			aso_mtr->init_color = (meter_mark->color_mode) ?
+				meter_mark->init_color : RTE_COLOR_GREEN;
+		if (upd_meter_mark->state_valid)
+			fm->is_enable = meter_mark->state;
+		/* Update ASO flow meter by wqe. */
+		if (mlx5_aso_meter_update_by_wqe(priv->sh, queue,
+						 aso_mtr, &priv->mtr_bulk))
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL, "Unable to update ASO meter WQE");
+		/* Wait for ASO object completion. */
+		if (queue == MLX5_HW_INV_QUEUE &&
+		    mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr))
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL, "Unable to wait for ASO meter CQE");
+		return 0;
 	default:
-		return flow_dv_action_update(dev, handle, update, error);
+		break;
 	}
+	return flow_dv_action_update(dev, handle, update, error);
 }
 
 /**
@@ -5957,7 +6208,11 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 {
 	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
 	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+	uint32_t idx = act_idx & ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
+	struct mlx5_aso_mtr *aso_mtr;
+	struct mlx5_flow_meter_info *fm;
 
 	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
@@ -5967,6 +6222,28 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 		return mlx5_hws_cnt_shared_put(priv->hws_cpool, &act_idx);
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
 		return flow_hw_conntrack_destroy(dev, act_idx, error);
+	case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
+		aso_mtr = mlx5_ipool_get(pool->idx_pool, idx);
+		if (!aso_mtr)
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL, "Invalid meter_mark destroy index");
+		fm = &aso_mtr->fm;
+		fm->is_enable = 0;
+		/* Update ASO flow meter by wqe. */
+		if (mlx5_aso_meter_update_by_wqe(priv->sh, queue, aso_mtr,
+						 &priv->mtr_bulk))
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL, "Unable to update ASO meter WQE");
+		/* Wait for ASO object completion. */
+		if (queue == MLX5_HW_INV_QUEUE &&
+		    mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr))
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL, "Unable to wait for ASO meter CQE");
+		mlx5_ipool_free(pool->idx_pool, idx);
+		return 0;
 	default:
 		return flow_dv_action_destroy(dev, handle, error);
 	}
@@ -6050,8 +6327,8 @@ flow_hw_action_create(struct rte_eth_dev *dev,
 		       const struct rte_flow_action *action,
 		       struct rte_flow_error *err)
 {
-	return flow_hw_action_handle_create(dev, UINT32_MAX, NULL, conf, action,
-					    NULL, err);
+	return flow_hw_action_handle_create(dev, MLX5_HW_INV_QUEUE,
+					    NULL, conf, action, NULL, err);
 }
 
 /**
@@ -6076,8 +6353,8 @@ flow_hw_action_destroy(struct rte_eth_dev *dev,
 		       struct rte_flow_action_handle *handle,
 		       struct rte_flow_error *error)
 {
-	return flow_hw_action_handle_destroy(dev, UINT32_MAX, NULL, handle,
-			NULL, error);
+	return flow_hw_action_handle_destroy(dev, MLX5_HW_INV_QUEUE,
+			NULL, handle, NULL, error);
 }
 
 /**
@@ -6105,8 +6382,8 @@ flow_hw_action_update(struct rte_eth_dev *dev,
 		      const void *update,
 		      struct rte_flow_error *err)
 {
-	return flow_hw_action_handle_update(dev, UINT32_MAX, NULL, handle,
-			update, NULL, err);
+	return flow_hw_action_handle_update(dev, MLX5_HW_INV_QUEUE,
+			NULL, handle, update, NULL, err);
 }
 
 static int
@@ -6636,6 +6913,12 @@ mlx5_flow_meter_uninit(struct rte_eth_dev *dev)
 		mlx5_free(priv->mtr_profile_arr);
 		priv->mtr_profile_arr = NULL;
 	}
+	if (priv->hws_mpool) {
+		mlx5_aso_mtr_queue_uninit(priv->sh, priv->hws_mpool, NULL);
+		mlx5_ipool_destroy(priv->hws_mpool->idx_pool);
+		mlx5_free(priv->hws_mpool);
+		priv->hws_mpool = NULL;
+	}
 	if (priv->mtr_bulk.aso) {
 		mlx5_free(priv->mtr_bulk.aso);
 		priv->mtr_bulk.aso = NULL;
@@ -6656,7 +6939,8 @@ int
 mlx5_flow_meter_init(struct rte_eth_dev *dev,
 		     uint32_t nb_meters,
 		     uint32_t nb_meter_profiles,
-		     uint32_t nb_meter_policies)
+		     uint32_t nb_meter_policies,
+		     uint32_t nb_queues)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_devx_obj *dcs = NULL;
@@ -6666,29 +6950,35 @@ mlx5_flow_meter_init(struct rte_eth_dev *dev,
 	struct mlx5_aso_mtr *aso;
 	uint32_t i;
 	struct rte_flow_error error;
+	uint32_t flags;
+	uint32_t nb_mtrs = rte_align32pow2(nb_meters);
+	struct mlx5_indexed_pool_config cfg = {
+		.size = sizeof(struct mlx5_aso_mtr),
+		.trunk_size = 1 << 12,
+		.per_core_cache = 1 << 13,
+		.need_lock = 1,
+		.release_mem_en = !!priv->sh->config.reclaim_mode,
+		.malloc = mlx5_malloc,
+		.max_idx = nb_meters,
+		.free = mlx5_free,
+		.type = "mlx5_hw_mtr_mark_action",
+	};
 
 	if (!nb_meters || !nb_meter_profiles || !nb_meter_policies) {
 		ret = ENOTSUP;
 		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter configuration is invalid.");
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter configuration is invalid.");
 		goto err;
 	}
 	if (!priv->mtr_en || !priv->sh->meter_aso_en) {
 		ret = ENOTSUP;
 		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter ASO is not supported.");
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter ASO is not supported.");
 		goto err;
 	}
 	priv->mtr_config.nb_meters = nb_meters;
-	if (mlx5_aso_queue_init(priv->sh, ASO_OPC_MOD_POLICER)) {
-		ret = ENOMEM;
-		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter ASO queue allocation failed.");
-		goto err;
-	}
 	log_obj_size = rte_log2_u32(nb_meters >> 1);
 	dcs = mlx5_devx_cmd_create_flow_meter_aso_obj
 		(priv->sh->cdev->ctx, priv->sh->cdev->pdn,
@@ -6696,8 +6986,8 @@ mlx5_flow_meter_init(struct rte_eth_dev *dev,
 	if (!dcs) {
 		ret = ENOMEM;
 		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter ASO object allocation failed.");
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter ASO object allocation failed.");
 		goto err;
 	}
 	priv->mtr_bulk.devx_obj = dcs;
@@ -6705,31 +6995,33 @@ mlx5_flow_meter_init(struct rte_eth_dev *dev,
 	if (reg_id < 0) {
 		ret = ENOTSUP;
 		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter register is not available.");
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter register is not available.");
 		goto err;
 	}
+	flags = MLX5DR_ACTION_FLAG_HWS_RX | MLX5DR_ACTION_FLAG_HWS_TX;
+	if (priv->sh->config.dv_esw_en && priv->master)
+		flags |= MLX5DR_ACTION_FLAG_HWS_FDB;
 	priv->mtr_bulk.action = mlx5dr_action_create_aso_meter
 			(priv->dr_ctx, (struct mlx5dr_devx_obj *)dcs,
-				reg_id - REG_C_0, MLX5DR_ACTION_FLAG_HWS_RX |
-				MLX5DR_ACTION_FLAG_HWS_TX |
-				MLX5DR_ACTION_FLAG_HWS_FDB);
+				reg_id - REG_C_0, flags);
 	if (!priv->mtr_bulk.action) {
 		ret = ENOMEM;
 		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter action creation failed.");
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter action creation failed.");
 		goto err;
 	}
 	priv->mtr_bulk.aso = mlx5_malloc(MLX5_MEM_ZERO,
-						sizeof(struct mlx5_aso_mtr) * nb_meters,
-						RTE_CACHE_LINE_SIZE,
-						SOCKET_ID_ANY);
+					 sizeof(struct mlx5_aso_mtr) *
+					 nb_meters,
+					 RTE_CACHE_LINE_SIZE,
+					 SOCKET_ID_ANY);
 	if (!priv->mtr_bulk.aso) {
 		ret = ENOMEM;
 		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter bulk ASO allocation failed.");
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter bulk ASO allocation failed.");
 		goto err;
 	}
 	priv->mtr_bulk.size = nb_meters;
@@ -6740,32 +7032,65 @@ mlx5_flow_meter_init(struct rte_eth_dev *dev,
 		aso->offset = i;
 		aso++;
 	}
+	priv->hws_mpool = mlx5_malloc(MLX5_MEM_ZERO,
+				sizeof(struct mlx5_aso_mtr_pool),
+				RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
+	if (!priv->hws_mpool) {
+		ret = ENOMEM;
+		rte_flow_error_set(&error, ENOMEM,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter ipool allocation failed.");
+		goto err;
+	}
+	priv->hws_mpool->devx_obj = priv->mtr_bulk.devx_obj;
+	priv->hws_mpool->action = priv->mtr_bulk.action;
+	priv->hws_mpool->nb_sq = nb_queues;
+	if (mlx5_aso_mtr_queue_init(priv->sh, priv->hws_mpool,
+				    &priv->sh->mtrmng->pools_mng, nb_queues)) {
+		ret = ENOMEM;
+		rte_flow_error_set(&error, ENOMEM,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter ASO queue allocation failed.");
+		goto err;
+	}
+	/*
+	 * No need for a local cache if the number of meters is small,
+	 * since the flow insertion rate will be very limited in that case.
+	 * Here, set the number to less than the default trunk size of 4K.
+	 */
+	if (nb_mtrs <= cfg.trunk_size) {
+		cfg.per_core_cache = 0;
+		cfg.trunk_size = nb_mtrs;
+	} else if (nb_mtrs <= MLX5_HW_IPOOL_SIZE_THRESHOLD) {
+		cfg.per_core_cache = MLX5_HW_IPOOL_CACHE_MIN;
+	}
+	priv->hws_mpool->idx_pool = mlx5_ipool_create(&cfg);
 	priv->mtr_config.nb_meter_profiles = nb_meter_profiles;
 	priv->mtr_profile_arr =
 		mlx5_malloc(MLX5_MEM_ZERO,
-				sizeof(struct mlx5_flow_meter_profile) *
-				nb_meter_profiles,
-				RTE_CACHE_LINE_SIZE,
-				SOCKET_ID_ANY);
+			    sizeof(struct mlx5_flow_meter_profile) *
+			    nb_meter_profiles,
+			    RTE_CACHE_LINE_SIZE,
+			    SOCKET_ID_ANY);
 	if (!priv->mtr_profile_arr) {
 		ret = ENOMEM;
 		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter profile allocation failed.");
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter profile allocation failed.");
 		goto err;
 	}
 	priv->mtr_config.nb_meter_policies = nb_meter_policies;
 	priv->mtr_policy_arr =
 		mlx5_malloc(MLX5_MEM_ZERO,
-				sizeof(struct mlx5_flow_meter_policy) *
-				nb_meter_policies,
-				RTE_CACHE_LINE_SIZE,
-				SOCKET_ID_ANY);
+			    sizeof(struct mlx5_flow_meter_policy) *
+			    nb_meter_policies,
+			    RTE_CACHE_LINE_SIZE,
+			    SOCKET_ID_ANY);
 	if (!priv->mtr_policy_arr) {
 		ret = ENOMEM;
 		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter policy allocation failed.");
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter policy allocation failed.");
 		goto err;
 	}
 	return 0;
diff --git a/drivers/net/mlx5/mlx5_flow_meter.c b/drivers/net/mlx5/mlx5_flow_meter.c
index 8cf24d1f7a..ed2306283d 100644
--- a/drivers/net/mlx5/mlx5_flow_meter.c
+++ b/drivers/net/mlx5/mlx5_flow_meter.c
@@ -588,6 +588,36 @@ mlx5_flow_meter_profile_delete(struct rte_eth_dev *dev,
 	return 0;
 }
 
+/**
+ * Callback to get MTR profile.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] meter_profile_id
+ *   Meter profile id.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   A valid handle in case of success, NULL otherwise.
+ */
+static struct rte_flow_meter_profile *
+mlx5_flow_meter_profile_get(struct rte_eth_dev *dev,
+			  uint32_t meter_profile_id,
+			  struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (!priv->mtr_en) {
+		rte_mtr_error_set(error, ENOTSUP,
+				  RTE_MTR_ERROR_TYPE_UNSPECIFIED, NULL,
+				  "Meter is not supported");
+		return NULL;
+	}
+	return (void *)(uintptr_t)mlx5_flow_meter_profile_find(priv,
+							meter_profile_id);
+}
+
 /**
  * Callback to add MTR profile with HWS.
  *
@@ -1150,6 +1180,37 @@ mlx5_flow_meter_policy_delete(struct rte_eth_dev *dev,
 	return 0;
 }
 
+/**
+ * Callback to get MTR policy.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] policy_id
+ *   Meter policy id.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   A valid handle in case of success, NULL otherwise.
+ */
+static struct rte_flow_meter_policy *
+mlx5_flow_meter_policy_get(struct rte_eth_dev *dev,
+			  uint32_t policy_id,
+			  struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint32_t policy_idx;
+
+	if (!priv->mtr_en) {
+		rte_mtr_error_set(error, ENOTSUP,
+				  RTE_MTR_ERROR_TYPE_UNSPECIFIED, NULL,
+				  "Meter is not supported");
+		return NULL;
+	}
+	return (void *)(uintptr_t)mlx5_flow_meter_policy_find(dev, policy_id,
+							      &policy_idx);
+}
+
 /**
  * Callback to delete MTR policy for HWS.
  *
@@ -1310,9 +1371,9 @@ mlx5_flow_meter_policy_hws_add(struct rte_eth_dev *dev,
 			RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
 			NULL, "Meter policy already exists.");
 	if (!policy ||
-	    !policy->actions[RTE_COLOR_RED] ||
-	    !policy->actions[RTE_COLOR_YELLOW] ||
-	    !policy->actions[RTE_COLOR_GREEN])
+	    (!policy->actions[RTE_COLOR_RED] &&
+	    !policy->actions[RTE_COLOR_YELLOW] &&
+	    !policy->actions[RTE_COLOR_GREEN]))
 		return -rte_mtr_error_set(error, EINVAL,
 					  RTE_MTR_ERROR_TYPE_METER_POLICY,
 					  NULL, "Meter policy actions are not valid.");
@@ -1372,6 +1433,11 @@ mlx5_flow_meter_policy_hws_add(struct rte_eth_dev *dev,
 			act++;
 		}
 	}
+	if (priv->sh->config.dv_esw_en)
+		domain_color &= ~(MLX5_MTR_DOMAIN_EGRESS_BIT |
+				  MLX5_MTR_DOMAIN_TRANSFER_BIT);
+	else
+		domain_color &= ~MLX5_MTR_DOMAIN_TRANSFER_BIT;
 	if (!domain_color)
 		return -rte_mtr_error_set(error, ENOTSUP,
 					  RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
@@ -1565,11 +1631,11 @@ mlx5_flow_meter_action_modify(struct mlx5_priv *priv,
 	if (priv->sh->meter_aso_en) {
 		fm->is_enable = !!is_enable;
 		aso_mtr = container_of(fm, struct mlx5_aso_mtr, fm);
-		ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr,
-						   &priv->mtr_bulk);
+		ret = mlx5_aso_meter_update_by_wqe(priv->sh, MLX5_HW_INV_QUEUE,
+						   aso_mtr, &priv->mtr_bulk);
 		if (ret)
 			return ret;
-		ret = mlx5_aso_mtr_wait(priv->sh, aso_mtr);
+		ret = mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr);
 		if (ret)
 			return ret;
 	} else {
@@ -1815,8 +1881,8 @@ mlx5_flow_meter_create(struct rte_eth_dev *dev, uint32_t meter_id,
 	/* If ASO meter supported, update ASO flow meter by wqe. */
 	if (priv->sh->meter_aso_en) {
 		aso_mtr = container_of(fm, struct mlx5_aso_mtr, fm);
-		ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr,
-						   &priv->mtr_bulk);
+		ret = mlx5_aso_meter_update_by_wqe(priv->sh, MLX5_HW_INV_QUEUE,
+						   aso_mtr, &priv->mtr_bulk);
 		if (ret)
 			goto error;
 		if (!priv->mtr_idx_tbl) {
@@ -1921,7 +1987,7 @@ mlx5_flow_meter_hws_create(struct rte_eth_dev *dev, uint32_t meter_id,
 	fm->shared = !!shared;
 	fm->initialized = 1;
 	/* Update ASO flow meter by wqe. */
-	ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr,
+	ret = mlx5_aso_meter_update_by_wqe(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr,
 					   &priv->mtr_bulk);
 	if (ret)
 		return -rte_mtr_error_set(error, ENOTSUP,
@@ -2401,9 +2467,11 @@ static const struct rte_mtr_ops mlx5_flow_mtr_ops = {
 	.capabilities_get = mlx5_flow_mtr_cap_get,
 	.meter_profile_add = mlx5_flow_meter_profile_add,
 	.meter_profile_delete = mlx5_flow_meter_profile_delete,
+	.meter_profile_get = mlx5_flow_meter_profile_get,
 	.meter_policy_validate = mlx5_flow_meter_policy_validate,
 	.meter_policy_add = mlx5_flow_meter_policy_add,
 	.meter_policy_delete = mlx5_flow_meter_policy_delete,
+	.meter_policy_get = mlx5_flow_meter_policy_get,
 	.create = mlx5_flow_meter_create,
 	.destroy = mlx5_flow_meter_destroy,
 	.meter_enable = mlx5_flow_meter_enable,
@@ -2418,9 +2486,11 @@ static const struct rte_mtr_ops mlx5_flow_mtr_hws_ops = {
 	.capabilities_get = mlx5_flow_mtr_cap_get,
 	.meter_profile_add = mlx5_flow_meter_profile_hws_add,
 	.meter_profile_delete = mlx5_flow_meter_profile_hws_delete,
+	.meter_profile_get = mlx5_flow_meter_profile_get,
 	.meter_policy_validate = mlx5_flow_meter_policy_hws_validate,
 	.meter_policy_add = mlx5_flow_meter_policy_hws_add,
 	.meter_policy_delete = mlx5_flow_meter_policy_hws_delete,
+	.meter_policy_get = mlx5_flow_meter_policy_get,
 	.create = mlx5_flow_meter_hws_create,
 	.destroy = mlx5_flow_meter_hws_destroy,
 	.meter_enable = mlx5_flow_meter_enable,
@@ -2566,7 +2636,7 @@ mlx5_flow_meter_attach(struct mlx5_priv *priv,
 		struct mlx5_aso_mtr *aso_mtr;
 
 		aso_mtr = container_of(fm, struct mlx5_aso_mtr, fm);
-		if (mlx5_aso_mtr_wait(priv->sh, aso_mtr)) {
+		if (mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr)) {
 			return rte_flow_error_set(error, ENOENT,
 					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 					NULL,
@@ -2865,7 +2935,7 @@ mlx5_flow_meter_flush(struct rte_eth_dev *dev, struct rte_mtr_error *error)
 		}
 	}
 	if (priv->mtr_bulk.aso) {
-		for (i = 1; i <= priv->mtr_config.nb_meter_profiles; i++) {
+		for (i = 0; i < priv->mtr_config.nb_meters; i++) {
 			aso_mtr = mlx5_aso_meter_by_idx(priv, i);
 			fm = &aso_mtr->fm;
 			if (fm->initialized)
-- 
2.25.1
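
For reference, a possible application-side use of the METER_MARK indirect
action update handled above could look like the minimal sketch below. It is
only a sketch assuming the asynchronous rte_flow API
(rte_flow_async_action_handle_update()); port_id, queue_id and handle are
application-defined, and only the meter state is touched here.

#include <rte_flow.h>

/* Minimal sketch: disable metering on an existing METER_MARK indirect
 * action from the application side. */
static int
pause_meter_mark(uint16_t port_id, uint32_t queue_id,
		 struct rte_flow_action_handle *handle)
{
	struct rte_flow_update_meter_mark upd = {
		.meter_mark = { .state = 0 }, /* Stop metering traffic. */
		.state_valid = 1,             /* Update only the state field. */
	};
	struct rte_flow_op_attr op_attr = { .postpone = 0 };
	struct rte_flow_error error;

	/* The PMD posts an ASO WQE on the given queue; the synchronous
	 * rte_flow_action_handle_update() wrapper uses an invalid queue
	 * internally and also waits for the CQE, as in the code above. */
	return rte_flow_async_action_handle_update(port_id, queue_id, &op_attr,
						   handle, &upd, NULL, &error);
}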


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v5 13/18] net/mlx5: add HWS AGE action support
  2022-10-20  3:21 ` [PATCH v5 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (11 preceding siblings ...)
  2022-10-20  3:22   ` [PATCH v5 12/18] net/mlx5: implement METER MARK indirect action for HWS Suanming Mou
@ 2022-10-20  3:22   ` Suanming Mou
  2022-10-20  3:22   ` [PATCH v5 14/18] net/mlx5: add async action push and pull support Suanming Mou
                     ` (4 subsequent siblings)
  17 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-10-20  3:22 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Michael Baum

From: Michael Baum <michaelba@nvidia.com>

Add support for the AGE action in HW steering.
This patch includes:

 1. Add new structures to manage the aging.
 2. Initialize all of them in the configure function.
 3. Implement a per-second aging check using the CNT background thread.
 4. Enable the AGE action in flow create/destroy operations.
 5. Implement a queue-based function to report aged flow rules (see the
    usage sketch after the diff stat below).

Signed-off-by: Michael Baum <michaelba@nvidia.com>
---
 doc/guides/nics/mlx5.rst           |   14 +
 drivers/net/mlx5/mlx5.c            |   67 +-
 drivers/net/mlx5/mlx5.h            |   51 +-
 drivers/net/mlx5/mlx5_defs.h       |    3 +
 drivers/net/mlx5/mlx5_flow.c       |   91 ++-
 drivers/net/mlx5/mlx5_flow.h       |   33 +-
 drivers/net/mlx5/mlx5_flow_dv.c    |   30 +-
 drivers/net/mlx5/mlx5_flow_hw.c    | 1145 ++++++++++++++++++++++++----
 drivers/net/mlx5/mlx5_flow_verbs.c |    4 +-
 drivers/net/mlx5/mlx5_hws_cnt.c    |  753 +++++++++++++++++-
 drivers/net/mlx5/mlx5_hws_cnt.h    |  193 ++++-
 drivers/net/mlx5/mlx5_utils.h      |   10 +-
 12 files changed, 2127 insertions(+), 267 deletions(-)
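
As a rough illustration of item 5 above, an application could drain the
aged-out rules of one HWS queue with the queue-based query added by this
patch. This is only a sketch assuming the rte_flow_get_q_aged_flows() API
documented further below; MAX_AGED, port_id and queue_id are
application-defined.

#include <rte_flow.h>

#define MAX_AGED 64 /* Application-chosen batch size. */

/* Minimal sketch: report the aged-out flow contexts of one HWS queue. */
static int
drain_aged_flows(uint16_t port_id, uint32_t queue_id)
{
	void *contexts[MAX_AGED];
	struct rte_flow_error error;
	int nb;

	/* With nb_contexts == 0 only the number of pending entries is returned. */
	nb = rte_flow_get_q_aged_flows(port_id, queue_id, NULL, 0, &error);
	if (nb <= 0)
		return nb;
	if (nb > MAX_AGED)
		nb = MAX_AGED;
	/* Fill the array with the aged-out flow contexts (user data). */
	return rte_flow_get_q_aged_flows(port_id, queue_id, contexts, nb, &error);
}

The counter and aging resources used by this path are sized at
rte_flow_configure() time through the nb_counters and nb_aging_objects
fields of struct rte_flow_port_attr, as noted in the documentation update
below.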

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 12646550b0..ae4d406ca1 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -560,6 +560,20 @@ Limitations
 - The NIC egress flow rules on representor port are not supported.
 
 
+- HWS AGE action in mlx5:
+
+  - Using the same indirect COUNT action combined with multiple AGE actions in
+    different flows may cause a wrong AGE state for the AGE actions.
+  - Creating/destroying flow rules with an indirect AGE action when it is
+    active (timeout != 0) may cause a wrong AGE state for that AGE action.
+  - The mlx5 driver reuses counters for the AGE action, so for optimization
+    the values in the ``rte_flow_port_attr`` structure should describe:
+
+    - ``nb_counters`` is the number of flow rules using a COUNT action
+      (with or without AGE) plus the number of rules using only AGE (without COUNT).
+    - ``nb_aging_objects`` is the number of flow rules containing an AGE action.
+
+
 Statistics
 ----------
 
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 64a0e6f31d..4e532f0807 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -497,6 +497,12 @@ mlx5_flow_aging_init(struct mlx5_dev_ctx_shared *sh)
 	uint32_t i;
 	struct mlx5_age_info *age_info;
 
+	/*
+	 * In HW steering, the aging information structure is initialized
+	 * later, during the configure function.
+	 */
+	if (sh->config.dv_flow_en == 2)
+		return;
 	for (i = 0; i < sh->max_port; i++) {
 		age_info = &sh->port[i].age_info;
 		age_info->flags = 0;
@@ -540,8 +546,8 @@ mlx5_flow_counter_mode_config(struct rte_eth_dev *dev __rte_unused)
 			hca_attr->flow_counter_bulk_alloc_bitmap);
 	/* Initialize fallback mode only on the port initializes sh. */
 	if (sh->refcnt == 1)
-		sh->cmng.counter_fallback = fallback;
-	else if (fallback != sh->cmng.counter_fallback)
+		sh->sws_cmng.counter_fallback = fallback;
+	else if (fallback != sh->sws_cmng.counter_fallback)
 		DRV_LOG(WARNING, "Port %d in sh has different fallback mode "
 			"with others:%d.", PORT_ID(priv), fallback);
 #endif
@@ -556,17 +562,38 @@ mlx5_flow_counter_mode_config(struct rte_eth_dev *dev __rte_unused)
 static void
 mlx5_flow_counters_mng_init(struct mlx5_dev_ctx_shared *sh)
 {
-	int i;
+	int i, j;
+
+	if (sh->config.dv_flow_en < 2) {
+		memset(&sh->sws_cmng, 0, sizeof(sh->sws_cmng));
+		TAILQ_INIT(&sh->sws_cmng.flow_counters);
+		sh->sws_cmng.min_id = MLX5_CNT_BATCH_OFFSET;
+		sh->sws_cmng.max_id = -1;
+		sh->sws_cmng.last_pool_idx = POOL_IDX_INVALID;
+		rte_spinlock_init(&sh->sws_cmng.pool_update_sl);
+		for (i = 0; i < MLX5_COUNTER_TYPE_MAX; i++) {
+			TAILQ_INIT(&sh->sws_cmng.counters[i]);
+			rte_spinlock_init(&sh->sws_cmng.csl[i]);
+		}
+	} else {
+		struct mlx5_hca_attr *attr = &sh->cdev->config.hca_attr;
+		uint32_t fw_max_nb_cnts = attr->max_flow_counter;
+		uint8_t log_dcs = log2above(fw_max_nb_cnts) - 1;
+		uint32_t max_nb_cnts = 0;
+
+		for (i = 0, j = 0; j < MLX5_HWS_CNT_DCS_NUM; ++i) {
+			int log_dcs_i = log_dcs - i;
 
-	memset(&sh->cmng, 0, sizeof(sh->cmng));
-	TAILQ_INIT(&sh->cmng.flow_counters);
-	sh->cmng.min_id = MLX5_CNT_BATCH_OFFSET;
-	sh->cmng.max_id = -1;
-	sh->cmng.last_pool_idx = POOL_IDX_INVALID;
-	rte_spinlock_init(&sh->cmng.pool_update_sl);
-	for (i = 0; i < MLX5_COUNTER_TYPE_MAX; i++) {
-		TAILQ_INIT(&sh->cmng.counters[i]);
-		rte_spinlock_init(&sh->cmng.csl[i]);
+			if (log_dcs_i < 0)
+				break;
+			if ((max_nb_cnts | RTE_BIT32(log_dcs_i)) >
+			    fw_max_nb_cnts)
+				continue;
+			max_nb_cnts |= RTE_BIT32(log_dcs_i);
+			j++;
+		}
+		sh->hws_max_log_bulk_sz = log_dcs;
+		sh->hws_max_nb_counters = max_nb_cnts;
 	}
 }
 
@@ -607,13 +634,13 @@ mlx5_flow_counters_mng_close(struct mlx5_dev_ctx_shared *sh)
 		rte_pause();
 	}
 
-	if (sh->cmng.pools) {
+	if (sh->sws_cmng.pools) {
 		struct mlx5_flow_counter_pool *pool;
-		uint16_t n_valid = sh->cmng.n_valid;
-		bool fallback = sh->cmng.counter_fallback;
+		uint16_t n_valid = sh->sws_cmng.n_valid;
+		bool fallback = sh->sws_cmng.counter_fallback;
 
 		for (i = 0; i < n_valid; ++i) {
-			pool = sh->cmng.pools[i];
+			pool = sh->sws_cmng.pools[i];
 			if (!fallback && pool->min_dcs)
 				claim_zero(mlx5_devx_cmd_destroy
 							       (pool->min_dcs));
@@ -632,14 +659,14 @@ mlx5_flow_counters_mng_close(struct mlx5_dev_ctx_shared *sh)
 			}
 			mlx5_free(pool);
 		}
-		mlx5_free(sh->cmng.pools);
+		mlx5_free(sh->sws_cmng.pools);
 	}
-	mng = LIST_FIRST(&sh->cmng.mem_mngs);
+	mng = LIST_FIRST(&sh->sws_cmng.mem_mngs);
 	while (mng) {
 		mlx5_flow_destroy_counter_stat_mem_mng(mng);
-		mng = LIST_FIRST(&sh->cmng.mem_mngs);
+		mng = LIST_FIRST(&sh->sws_cmng.mem_mngs);
 	}
-	memset(&sh->cmng, 0, sizeof(sh->cmng));
+	memset(&sh->sws_cmng, 0, sizeof(sh->sws_cmng));
 }
 
 /**
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index d3267fafda..09ab7a080a 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -644,12 +644,45 @@ struct mlx5_geneve_tlv_option_resource {
 /* Current time in seconds. */
 #define MLX5_CURR_TIME_SEC	(rte_rdtsc() / rte_get_tsc_hz())
 
+/*
+ * HW steering queue oriented AGE info.
+ * It contains an array of rings, one for each HWS queue.
+ */
+struct mlx5_hws_q_age_info {
+	uint16_t nb_rings; /* Number of aged-out ring lists. */
+	struct rte_ring *aged_lists[]; /* Aged-out lists. */
+};
+
+/*
+ * HW steering AGE info.
+ * It has a ring list containing all aged out flow rules.
+ */
+struct mlx5_hws_age_info {
+	struct rte_ring *aged_list; /* Aged-out list. */
+};
+
 /* Aging information for per port. */
 struct mlx5_age_info {
 	uint8_t flags; /* Indicate if is new event or need to be triggered. */
-	struct mlx5_counters aged_counters; /* Aged counter list. */
-	struct aso_age_list aged_aso; /* Aged ASO actions list. */
-	rte_spinlock_t aged_sl; /* Aged flow list lock. */
+	union {
+		/* SW/FW steering AGE info. */
+		struct {
+			struct mlx5_counters aged_counters;
+			/* Aged counter list. */
+			struct aso_age_list aged_aso;
+			/* Aged ASO actions list. */
+			rte_spinlock_t aged_sl; /* Aged flow list lock. */
+		};
+		struct {
+			struct mlx5_indexed_pool *ages_ipool;
+			union {
+				struct mlx5_hws_age_info hw_age;
+				/* HW steering AGE info. */
+				struct mlx5_hws_q_age_info *hw_q_age;
+				/* HW steering queue oriented AGE info. */
+			};
+		};
+	};
 };
 
 /* Per port data of shared IB device. */
@@ -1307,6 +1340,9 @@ struct mlx5_dev_ctx_shared {
 	uint32_t hws_tags:1; /* Check if tags info for HWS initialized. */
 	uint32_t shared_mark_enabled:1;
 	/* If mark action is enabled on Rxqs (shared E-Switch domain). */
+	uint32_t hws_max_log_bulk_sz:5;
+	/* Log of the maximal bulk size for HWS counters (hard-coded). */
+	uint32_t hws_max_nb_counters; /* Maximal number for HWS counters. */
 	uint32_t max_port; /* Maximal IB device port index. */
 	struct mlx5_bond_info bond; /* Bonding information. */
 	struct mlx5_common_device *cdev; /* Backend mlx5 device. */
@@ -1347,7 +1383,8 @@ struct mlx5_dev_ctx_shared {
 	struct mlx5_list *dest_array_list;
 	struct mlx5_list *flex_parsers_dv; /* Flex Item parsers. */
 	/* List of destination array actions. */
-	struct mlx5_flow_counter_mng cmng; /* Counters management structure. */
+	struct mlx5_flow_counter_mng sws_cmng;
+	/* SW steering counters management structure. */
 	void *default_miss_action; /* Default miss action. */
 	struct mlx5_indexed_pool *ipool[MLX5_IPOOL_MAX];
 	struct mlx5_indexed_pool *mdh_ipools[MLX5_MAX_MODIFY_NUM];
@@ -1677,6 +1714,9 @@ struct mlx5_priv {
 	LIST_HEAD(flow_hw_at, rte_flow_actions_template) flow_hw_at;
 	struct mlx5dr_context *dr_ctx; /**< HW steering DR context. */
 	/* HW steering queue polling mechanism job descriptor LIFO. */
+	uint32_t hws_strict_queue:1;
+	/**< Whether all operations strictly happen on the same HWS queue. */
+	uint32_t hws_age_req:1; /**< Whether this port has AGE indexed pool. */
 	struct mlx5_hw_q *hw_q;
 	/* HW steering rte flow table list header. */
 	LIST_HEAD(flow_hw_tbl, rte_flow_template_table) flow_hw_tbl;
@@ -1992,6 +2032,9 @@ int mlx5_validate_action_ct(struct rte_eth_dev *dev,
 			    const struct rte_flow_action_conntrack *conntrack,
 			    struct rte_flow_error *error);
 
+int mlx5_flow_get_q_aged_flows(struct rte_eth_dev *dev, uint32_t queue_id,
+			       void **contexts, uint32_t nb_contexts,
+			       struct rte_flow_error *error);
 
 /* mlx5_mp_os.c */
 
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index d064abfef3..2af8c731ef 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -43,6 +43,9 @@
 #define MLX5_PMD_SOFT_COUNTERS 1
 #endif
 
+/* Maximum number of DCS created per port. */
+#define MLX5_HWS_CNT_DCS_NUM 4
+
 /* Alarm timeout. */
 #define MLX5_ALARM_TIMEOUT_US 100000
 
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index e3485352db..c32255a3f9 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -989,6 +989,9 @@ static const struct rte_flow_ops mlx5_flow_ops = {
 	.isolate = mlx5_flow_isolate,
 	.query = mlx5_flow_query,
 	.dev_dump = mlx5_flow_dev_dump,
+#ifdef MLX5_HAVE_RTE_FLOW_Q_AGE
+	.get_q_aged_flows = mlx5_flow_get_q_aged_flows,
+#endif
 	.get_aged_flows = mlx5_flow_get_aged_flows,
 	.action_handle_create = mlx5_action_handle_create,
 	.action_handle_destroy = mlx5_action_handle_destroy,
@@ -8944,11 +8947,11 @@ mlx5_flow_create_counter_stat_mem_mng(struct mlx5_dev_ctx_shared *sh)
 		mem_mng->raws[i].data = raw_data + i * MLX5_COUNTERS_PER_POOL;
 	}
 	for (i = 0; i < MLX5_MAX_PENDING_QUERIES; ++i)
-		LIST_INSERT_HEAD(&sh->cmng.free_stat_raws,
+		LIST_INSERT_HEAD(&sh->sws_cmng.free_stat_raws,
 				 mem_mng->raws + MLX5_CNT_CONTAINER_RESIZE + i,
 				 next);
-	LIST_INSERT_HEAD(&sh->cmng.mem_mngs, mem_mng, next);
-	sh->cmng.mem_mng = mem_mng;
+	LIST_INSERT_HEAD(&sh->sws_cmng.mem_mngs, mem_mng, next);
+	sh->sws_cmng.mem_mng = mem_mng;
 	return 0;
 }
 
@@ -8967,7 +8970,7 @@ static int
 mlx5_flow_set_counter_stat_mem(struct mlx5_dev_ctx_shared *sh,
 			       struct mlx5_flow_counter_pool *pool)
 {
-	struct mlx5_flow_counter_mng *cmng = &sh->cmng;
+	struct mlx5_flow_counter_mng *cmng = &sh->sws_cmng;
 	/* Resize statistic memory once used out. */
 	if (!(pool->index % MLX5_CNT_CONTAINER_RESIZE) &&
 	    mlx5_flow_create_counter_stat_mem_mng(sh)) {
@@ -8996,14 +8999,14 @@ mlx5_set_query_alarm(struct mlx5_dev_ctx_shared *sh)
 {
 	uint32_t pools_n, us;
 
-	pools_n = __atomic_load_n(&sh->cmng.n_valid, __ATOMIC_RELAXED);
+	pools_n = __atomic_load_n(&sh->sws_cmng.n_valid, __ATOMIC_RELAXED);
 	us = MLX5_POOL_QUERY_FREQ_US / pools_n;
 	DRV_LOG(DEBUG, "Set alarm for %u pools each %u us", pools_n, us);
 	if (rte_eal_alarm_set(us, mlx5_flow_query_alarm, sh)) {
-		sh->cmng.query_thread_on = 0;
+		sh->sws_cmng.query_thread_on = 0;
 		DRV_LOG(ERR, "Cannot reinitialize query alarm");
 	} else {
-		sh->cmng.query_thread_on = 1;
+		sh->sws_cmng.query_thread_on = 1;
 	}
 }
 
@@ -9019,12 +9022,12 @@ mlx5_flow_query_alarm(void *arg)
 {
 	struct mlx5_dev_ctx_shared *sh = arg;
 	int ret;
-	uint16_t pool_index = sh->cmng.pool_index;
-	struct mlx5_flow_counter_mng *cmng = &sh->cmng;
+	uint16_t pool_index = sh->sws_cmng.pool_index;
+	struct mlx5_flow_counter_mng *cmng = &sh->sws_cmng;
 	struct mlx5_flow_counter_pool *pool;
 	uint16_t n_valid;
 
-	if (sh->cmng.pending_queries >= MLX5_MAX_PENDING_QUERIES)
+	if (sh->sws_cmng.pending_queries >= MLX5_MAX_PENDING_QUERIES)
 		goto set_alarm;
 	rte_spinlock_lock(&cmng->pool_update_sl);
 	pool = cmng->pools[pool_index];
@@ -9037,7 +9040,7 @@ mlx5_flow_query_alarm(void *arg)
 		/* There is a pool query in progress. */
 		goto set_alarm;
 	pool->raw_hw =
-		LIST_FIRST(&sh->cmng.free_stat_raws);
+		LIST_FIRST(&sh->sws_cmng.free_stat_raws);
 	if (!pool->raw_hw)
 		/* No free counter statistics raw memory. */
 		goto set_alarm;
@@ -9063,12 +9066,12 @@ mlx5_flow_query_alarm(void *arg)
 		goto set_alarm;
 	}
 	LIST_REMOVE(pool->raw_hw, next);
-	sh->cmng.pending_queries++;
+	sh->sws_cmng.pending_queries++;
 	pool_index++;
 	if (pool_index >= n_valid)
 		pool_index = 0;
 set_alarm:
-	sh->cmng.pool_index = pool_index;
+	sh->sws_cmng.pool_index = pool_index;
 	mlx5_set_query_alarm(sh);
 }
 
@@ -9151,7 +9154,7 @@ mlx5_flow_async_pool_query_handle(struct mlx5_dev_ctx_shared *sh,
 		(struct mlx5_flow_counter_pool *)(uintptr_t)async_id;
 	struct mlx5_counter_stats_raw *raw_to_free;
 	uint8_t query_gen = pool->query_gen ^ 1;
-	struct mlx5_flow_counter_mng *cmng = &sh->cmng;
+	struct mlx5_flow_counter_mng *cmng = &sh->sws_cmng;
 	enum mlx5_counter_type cnt_type =
 		pool->is_aged ? MLX5_COUNTER_TYPE_AGE :
 				MLX5_COUNTER_TYPE_ORIGIN;
@@ -9174,9 +9177,9 @@ mlx5_flow_async_pool_query_handle(struct mlx5_dev_ctx_shared *sh,
 			rte_spinlock_unlock(&cmng->csl[cnt_type]);
 		}
 	}
-	LIST_INSERT_HEAD(&sh->cmng.free_stat_raws, raw_to_free, next);
+	LIST_INSERT_HEAD(&sh->sws_cmng.free_stat_raws, raw_to_free, next);
 	pool->raw_hw = NULL;
-	sh->cmng.pending_queries--;
+	sh->sws_cmng.pending_queries--;
 }
 
 static int
@@ -9536,7 +9539,7 @@ mlx5_flow_dev_dump_sh_all(struct rte_eth_dev *dev,
 	struct mlx5_list_inconst *l_inconst;
 	struct mlx5_list_entry *e;
 	int lcore_index;
-	struct mlx5_flow_counter_mng *cmng = &priv->sh->cmng;
+	struct mlx5_flow_counter_mng *cmng = &priv->sh->sws_cmng;
 	uint32_t max;
 	void *action;
 
@@ -9707,18 +9710,58 @@ mlx5_flow_get_aged_flows(struct rte_eth_dev *dev, void **contexts,
 {
 	const struct mlx5_flow_driver_ops *fops;
 	struct rte_flow_attr attr = { .transfer = 0 };
+	enum mlx5_flow_drv_type type = flow_get_drv_type(dev, &attr);
 
-	if (flow_get_drv_type(dev, &attr) == MLX5_FLOW_TYPE_DV) {
-		fops = flow_get_drv_ops(MLX5_FLOW_TYPE_DV);
-		return fops->get_aged_flows(dev, contexts, nb_contexts,
-						    error);
+	if (type == MLX5_FLOW_TYPE_DV || type == MLX5_FLOW_TYPE_HW) {
+		fops = flow_get_drv_ops(type);
+		return fops->get_aged_flows(dev, contexts, nb_contexts, error);
 	}
-	DRV_LOG(ERR,
-		"port %u get aged flows is not supported.",
-		 dev->data->port_id);
+	DRV_LOG(ERR, "port %u get aged flows is not supported.",
+		dev->data->port_id);
 	return -ENOTSUP;
 }
 
+/**
+ * Get aged-out flows per HWS queue.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] queue_id
+ *   Flow queue to query.
+ * @param[in] contexts
+ *   The address of an array of pointers to the aged-out flow contexts.
+ * @param[in] nb_contexts
+ *   The length of the context array.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. Initialized in case of
+ *   error only.
+ *
+ * @return
+ *   The number of contexts reported on success, otherwise a negative errno
+ *   value. If nb_contexts is 0, return the total number of aged-out
+ *   contexts. If nb_contexts is not 0, return the number of aged-out flows
+ *   reported in the context array.
+ */
+int
+mlx5_flow_get_q_aged_flows(struct rte_eth_dev *dev, uint32_t queue_id,
+			   void **contexts, uint32_t nb_contexts,
+			   struct rte_flow_error *error)
+{
+	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = { 0 };
+
+	if (flow_get_drv_type(dev, &attr) == MLX5_FLOW_TYPE_HW) {
+		fops = flow_get_drv_ops(MLX5_FLOW_TYPE_HW);
+		return fops->get_q_aged_flows(dev, queue_id, contexts,
+					      nb_contexts, error);
+	}
+	DRV_LOG(ERR, "port %u queue %u get aged flows is not supported.",
+		dev->data->port_id, queue_id);
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				  "get Q aged flows with incorrect steering mode");
+}
+
 /* Wrapper for driver action_validate op callback */
 static int
 flow_drv_action_validate(struct rte_eth_dev *dev,
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 96198d7d17..5c57f51706 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -293,6 +293,8 @@ enum mlx5_feature_name {
 #define MLX5_FLOW_ACTION_MODIFY_FIELD (1ull << 39)
 #define MLX5_FLOW_ACTION_METER_WITH_TERMINATED_POLICY (1ull << 40)
 #define MLX5_FLOW_ACTION_CT (1ull << 41)
+#define MLX5_FLOW_ACTION_INDIRECT_COUNT (1ull << 42)
+#define MLX5_FLOW_ACTION_INDIRECT_AGE (1ull << 43)
 
 #define MLX5_FLOW_FATE_ACTIONS \
 	(MLX5_FLOW_ACTION_DROP | MLX5_FLOW_ACTION_QUEUE | \
@@ -1099,6 +1101,22 @@ struct rte_flow {
 	uint32_t geneve_tlv_option; /**< Holds Geneve TLV option id. > */
 } __rte_packed;
 
+/*
+ * HWS COUNTER ID's layout
+ *       3                   2                   1                   0
+ *     1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |  T  |     | D |                                               |
+ *    ~  Y  |     | C |                    IDX                        ~
+ *    |  P  |     | S |                                               |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *
+ *    Bit 31:29 = TYPE = MLX5_INDIRECT_ACTION_TYPE_COUNT = b'10
+ *    Bit 25:24 = DCS index
+ *    Bit 23:00 = IDX in this counter belonged DCS bulk.
+ */
+typedef uint32_t cnt_id_t;
+
 #if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
 
 #ifdef PEDANTIC
@@ -1115,7 +1133,8 @@ struct rte_flow_hw {
 		struct mlx5_hrxq *hrxq; /* TIR action. */
 	};
 	struct rte_flow_template_table *table; /* The table flow allcated from. */
-	uint32_t cnt_id;
+	uint32_t age_idx;
+	cnt_id_t cnt_id;
 	uint32_t mtr_id;
 	uint8_t rule[0]; /* HWS layer data struct. */
 } __rte_packed;
@@ -1166,7 +1185,7 @@ struct mlx5_action_construct_data {
 			uint32_t idx; /* Shared action index. */
 		} shared_rss;
 		struct {
-			uint32_t id;
+			cnt_id_t id;
 		} shared_counter;
 		struct {
 			uint32_t id;
@@ -1197,6 +1216,7 @@ struct rte_flow_actions_template {
 	struct rte_flow_action *actions; /* Cached flow actions. */
 	struct rte_flow_action *masks; /* Cached action masks.*/
 	struct mlx5dr_action_template *tmpl; /* mlx5dr action template. */
+	uint64_t action_flags; /* Bit-map of all valid actions in the template. */
 	uint16_t dr_actions_num; /* Amount of DR rules actions. */
 	uint16_t actions_num; /* Amount of flow actions */
 	uint16_t *actions_off; /* DR action offset for given rte action offset. */
@@ -1253,7 +1273,7 @@ struct mlx5_hw_actions {
 	struct mlx5_hw_encap_decap_action *encap_decap;
 	uint16_t encap_decap_pos; /* Encap/Decap action position. */
 	uint32_t mark:1; /* Indicate the mark action. */
-	uint32_t cnt_id; /* Counter id. */
+	cnt_id_t cnt_id; /* Counter id. */
 	uint32_t mtr_id; /* Meter id. */
 	/* Translated DR action array from action template. */
 	struct mlx5dr_rule_action rule_acts[MLX5_HW_MAX_ACTS];
@@ -1629,6 +1649,12 @@ typedef int (*mlx5_flow_get_aged_flows_t)
 					 void **context,
 					 uint32_t nb_contexts,
 					 struct rte_flow_error *error);
+typedef int (*mlx5_flow_get_q_aged_flows_t)
+					(struct rte_eth_dev *dev,
+					 uint32_t queue_id,
+					 void **context,
+					 uint32_t nb_contexts,
+					 struct rte_flow_error *error);
 typedef int (*mlx5_flow_action_validate_t)
 				(struct rte_eth_dev *dev,
 				 const struct rte_flow_indir_action_conf *conf,
@@ -1835,6 +1861,7 @@ struct mlx5_flow_driver_ops {
 	mlx5_flow_counter_free_t counter_free;
 	mlx5_flow_counter_query_t counter_query;
 	mlx5_flow_get_aged_flows_t get_aged_flows;
+	mlx5_flow_get_q_aged_flows_t get_q_aged_flows;
 	mlx5_flow_action_validate_t action_validate;
 	mlx5_flow_action_create_t action_create;
 	mlx5_flow_action_destroy_t action_destroy;
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 868fa6e1a5..250f61d46f 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -5524,7 +5524,7 @@ flow_dv_validate_action_age(uint64_t action_flags,
 	const struct rte_flow_action_age *age = action->conf;
 
 	if (!priv->sh->cdev->config.devx ||
-	    (priv->sh->cmng.counter_fallback && !priv->sh->aso_age_mng))
+	    (priv->sh->sws_cmng.counter_fallback && !priv->sh->aso_age_mng))
 		return rte_flow_error_set(error, ENOTSUP,
 					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 					  NULL,
@@ -6085,7 +6085,7 @@ flow_dv_counter_get_by_idx(struct rte_eth_dev *dev,
 			   struct mlx5_flow_counter_pool **ppool)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_flow_counter_mng *cmng = &priv->sh->cmng;
+	struct mlx5_flow_counter_mng *cmng = &priv->sh->sws_cmng;
 	struct mlx5_flow_counter_pool *pool;
 
 	/* Decrease to original index and clear shared bit. */
@@ -6179,7 +6179,7 @@ static int
 flow_dv_container_resize(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_flow_counter_mng *cmng = &priv->sh->cmng;
+	struct mlx5_flow_counter_mng *cmng = &priv->sh->sws_cmng;
 	void *old_pools = cmng->pools;
 	uint32_t resize = cmng->n + MLX5_CNT_CONTAINER_RESIZE;
 	uint32_t mem_size = sizeof(struct mlx5_flow_counter_pool *) * resize;
@@ -6225,7 +6225,7 @@ _flow_dv_query_count(struct rte_eth_dev *dev, uint32_t counter, uint64_t *pkts,
 
 	cnt = flow_dv_counter_get_by_idx(dev, counter, &pool);
 	MLX5_ASSERT(pool);
-	if (priv->sh->cmng.counter_fallback)
+	if (priv->sh->sws_cmng.counter_fallback)
 		return mlx5_devx_cmd_flow_counter_query(cnt->dcs_when_active, 0,
 					0, pkts, bytes, 0, NULL, NULL, 0);
 	rte_spinlock_lock(&pool->sl);
@@ -6262,8 +6262,8 @@ flow_dv_pool_create(struct rte_eth_dev *dev, struct mlx5_devx_obj *dcs,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_flow_counter_pool *pool;
-	struct mlx5_flow_counter_mng *cmng = &priv->sh->cmng;
-	bool fallback = priv->sh->cmng.counter_fallback;
+	struct mlx5_flow_counter_mng *cmng = &priv->sh->sws_cmng;
+	bool fallback = priv->sh->sws_cmng.counter_fallback;
 	uint32_t size = sizeof(*pool);
 
 	size += MLX5_COUNTERS_PER_POOL * MLX5_CNT_SIZE;
@@ -6324,14 +6324,14 @@ flow_dv_counter_pool_prepare(struct rte_eth_dev *dev,
 			     uint32_t age)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_flow_counter_mng *cmng = &priv->sh->cmng;
+	struct mlx5_flow_counter_mng *cmng = &priv->sh->sws_cmng;
 	struct mlx5_flow_counter_pool *pool;
 	struct mlx5_counters tmp_tq;
 	struct mlx5_devx_obj *dcs = NULL;
 	struct mlx5_flow_counter *cnt;
 	enum mlx5_counter_type cnt_type =
 			age ? MLX5_COUNTER_TYPE_AGE : MLX5_COUNTER_TYPE_ORIGIN;
-	bool fallback = priv->sh->cmng.counter_fallback;
+	bool fallback = priv->sh->sws_cmng.counter_fallback;
 	uint32_t i;
 
 	if (fallback) {
@@ -6395,8 +6395,8 @@ flow_dv_counter_alloc(struct rte_eth_dev *dev, uint32_t age)
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_flow_counter_pool *pool = NULL;
 	struct mlx5_flow_counter *cnt_free = NULL;
-	bool fallback = priv->sh->cmng.counter_fallback;
-	struct mlx5_flow_counter_mng *cmng = &priv->sh->cmng;
+	bool fallback = priv->sh->sws_cmng.counter_fallback;
+	struct mlx5_flow_counter_mng *cmng = &priv->sh->sws_cmng;
 	enum mlx5_counter_type cnt_type =
 			age ? MLX5_COUNTER_TYPE_AGE : MLX5_COUNTER_TYPE_ORIGIN;
 	uint32_t cnt_idx;
@@ -6442,7 +6442,7 @@ flow_dv_counter_alloc(struct rte_eth_dev *dev, uint32_t age)
 	if (_flow_dv_query_count(dev, cnt_idx, &cnt_free->hits,
 				 &cnt_free->bytes))
 		goto err;
-	if (!fallback && !priv->sh->cmng.query_thread_on)
+	if (!fallback && !priv->sh->sws_cmng.query_thread_on)
 		/* Start the asynchronous batch query by the host thread. */
 		mlx5_set_query_alarm(priv->sh);
 	/*
@@ -6570,7 +6570,7 @@ flow_dv_counter_free(struct rte_eth_dev *dev, uint32_t counter)
 	 * this case, lock will not be needed as query callback and release
 	 * function both operate with the different list.
 	 */
-	if (!priv->sh->cmng.counter_fallback) {
+	if (!priv->sh->sws_cmng.counter_fallback) {
 		rte_spinlock_lock(&pool->csl);
 		TAILQ_INSERT_TAIL(&pool->counters[pool->query_gen], cnt, next);
 		rte_spinlock_unlock(&pool->csl);
@@ -6578,10 +6578,10 @@ flow_dv_counter_free(struct rte_eth_dev *dev, uint32_t counter)
 		cnt->dcs_when_free = cnt->dcs_when_active;
 		cnt_type = pool->is_aged ? MLX5_COUNTER_TYPE_AGE :
 					   MLX5_COUNTER_TYPE_ORIGIN;
-		rte_spinlock_lock(&priv->sh->cmng.csl[cnt_type]);
-		TAILQ_INSERT_TAIL(&priv->sh->cmng.counters[cnt_type],
+		rte_spinlock_lock(&priv->sh->sws_cmng.csl[cnt_type]);
+		TAILQ_INSERT_TAIL(&priv->sh->sws_cmng.counters[cnt_type],
 				  cnt, next);
-		rte_spinlock_unlock(&priv->sh->cmng.csl[cnt_type]);
+		rte_spinlock_unlock(&priv->sh->sws_cmng.csl[cnt_type]);
 	}
 }
 
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 52125c861e..59d9db04d3 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -477,7 +477,8 @@ __flow_hw_act_data_general_append(struct mlx5_priv *priv,
 				  enum rte_flow_action_type type,
 				  uint16_t action_src,
 				  uint16_t action_dst)
-{	struct mlx5_action_construct_data *act_data;
+{
+	struct mlx5_action_construct_data *act_data;
 
 	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
 	if (!act_data)
@@ -512,7 +513,8 @@ __flow_hw_act_data_encap_append(struct mlx5_priv *priv,
 				uint16_t action_src,
 				uint16_t action_dst,
 				uint16_t len)
-{	struct mlx5_action_construct_data *act_data;
+{
+	struct mlx5_action_construct_data *act_data;
 
 	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
 	if (!act_data)
@@ -582,7 +584,8 @@ __flow_hw_act_data_shared_rss_append(struct mlx5_priv *priv,
 				     uint16_t action_dst,
 				     uint32_t idx,
 				     struct mlx5_shared_action_rss *rss)
-{	struct mlx5_action_construct_data *act_data;
+{
+	struct mlx5_action_construct_data *act_data;
 
 	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
 	if (!act_data)
@@ -621,7 +624,8 @@ __flow_hw_act_data_shared_cnt_append(struct mlx5_priv *priv,
 				     uint16_t action_src,
 				     uint16_t action_dst,
 				     cnt_id_t cnt_id)
-{	struct mlx5_action_construct_data *act_data;
+{
+	struct mlx5_action_construct_data *act_data;
 
 	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
 	if (!act_data)
@@ -717,6 +721,10 @@ flow_hw_shared_action_translate(struct rte_eth_dev *dev,
 			action_src, action_dst, act_idx))
 			return -1;
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_AGE:
+		/* Not supported, prevent by validate function. */
+		MLX5_ASSERT(0);
+		break;
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
 		if (flow_hw_ct_compile(dev, MLX5_HW_INV_QUEUE,
 				       idx, &acts->rule_acts[action_dst]))
@@ -1109,7 +1117,7 @@ flow_hw_cnt_compile(struct rte_eth_dev *dev, uint32_t  start_pos,
 	cnt_id_t cnt_id;
 	int ret;
 
-	ret = mlx5_hws_cnt_shared_get(priv->hws_cpool, &cnt_id);
+	ret = mlx5_hws_cnt_shared_get(priv->hws_cpool, &cnt_id, 0);
 	if (ret != 0)
 		return ret;
 	ret = mlx5_hws_cnt_pool_get_action_offset
@@ -1250,8 +1258,6 @@ flow_hw_meter_mark_compile(struct rte_eth_dev *dev,
  *   Pointer to the rte_eth_dev structure.
  * @param[in] cfg
  *   Pointer to the table configuration.
- * @param[in] item_templates
- *   Item template array to be binded to the table.
  * @param[in/out] acts
  *   Pointer to the template HW steering DR actions.
  * @param[in] at
@@ -1260,7 +1266,7 @@ flow_hw_meter_mark_compile(struct rte_eth_dev *dev,
  *   Pointer to error structure.
  *
  * @return
- *    Table on success, NULL otherwise and rte_errno is set.
+ *   0 on success, a negative errno otherwise and rte_errno is set.
  */
 static int
 __flow_hw_actions_translate(struct rte_eth_dev *dev,
@@ -1289,6 +1295,7 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
 	uint16_t jump_pos;
 	uint32_t ct_idx;
 	int err;
+	uint32_t target_grp = 0;
 
 	flow_hw_modify_field_init(&mhdr, at);
 	if (attr->transfer)
@@ -1516,8 +1523,42 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
 							action_pos))
 				goto err;
 			break;
+		case RTE_FLOW_ACTION_TYPE_AGE:
+			flow_hw_translate_group(dev, cfg, attr->group,
+						&target_grp, error);
+			if (target_grp == 0) {
+				__flow_hw_action_template_destroy(dev, acts);
+				return rte_flow_error_set(error, ENOTSUP,
+						RTE_FLOW_ERROR_TYPE_ACTION,
+						NULL,
+						"Age action on root table is not supported in HW steering mode");
+			}
+			action_pos = at->actions_off[actions - at->actions];
+			if (__flow_hw_act_data_general_append(priv, acts,
+							 actions->type,
+							 actions - action_start,
+							 action_pos))
+				goto err;
+			break;
 		case RTE_FLOW_ACTION_TYPE_COUNT:
-			action_pos = at->actions_off[actions - action_start];
+			flow_hw_translate_group(dev, cfg, attr->group,
+						&target_grp, error);
+			if (target_grp == 0) {
+				__flow_hw_action_template_destroy(dev, acts);
+				return rte_flow_error_set(error, ENOTSUP,
+						RTE_FLOW_ERROR_TYPE_ACTION,
+						NULL,
+						"Counter action on root table is not supported in HW steering mode");
+			}
+			if ((at->action_flags & MLX5_FLOW_ACTION_AGE) ||
+			    (at->action_flags & MLX5_FLOW_ACTION_INDIRECT_AGE))
+				/*
+				 * When both COUNT and AGE are requested, it is
+				 * saved as an AGE action, which also creates
+				 * the counter.
+				 */
+				break;
+			action_pos = at->actions_off[actions - at->actions];
 			if (masks->conf &&
 			    ((const struct rte_flow_action_count *)
 			     masks->conf)->id) {
@@ -1744,6 +1785,10 @@ flow_hw_shared_action_get(struct rte_eth_dev *dev,
  *   Pointer to the flow table.
  * @param[in] it_idx
  *   Item template index the action template refer to.
+ * @param[in] action_flags
+ *   Actions bit-map detected in this template.
+ * @param[in, out] flow
+ *   Pointer to the flow containing the counter.
  * @param[in] rule_act
  *   Pointer to the shared action's destination rule DR action.
  *
@@ -1754,7 +1799,8 @@ static __rte_always_inline int
 flow_hw_shared_action_construct(struct rte_eth_dev *dev, uint32_t queue,
 				const struct rte_flow_action *action,
 				struct rte_flow_template_table *table,
-				const uint8_t it_idx,
+				const uint8_t it_idx, uint64_t action_flags,
+				struct rte_flow_hw *flow,
 				struct mlx5dr_rule_action *rule_act)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -1762,11 +1808,14 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev, uint32_t queue,
 	struct mlx5_action_construct_data act_data;
 	struct mlx5_shared_action_rss *shared_rss;
 	struct mlx5_aso_mtr *aso_mtr;
+	struct mlx5_age_info *age_info;
+	struct mlx5_hws_age_param *param;
 	uint32_t act_idx = (uint32_t)(uintptr_t)action->conf;
 	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
 	uint32_t idx = act_idx &
 		       ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
 	uint64_t item_flags;
+	cnt_id_t age_cnt;
 
 	memset(&act_data, 0, sizeof(act_data));
 	switch (type) {
@@ -1792,6 +1841,44 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev, uint32_t queue,
 				&rule_act->action,
 				&rule_act->counter.offset))
 			return -1;
+		flow->cnt_id = act_idx;
+		break;
+	case MLX5_INDIRECT_ACTION_TYPE_AGE:
+		/*
+		 * Save the index with the indirect type, to recognize
+		 * it in flow destroy.
+		 */
+		flow->age_idx = act_idx;
+		if (action_flags & MLX5_FLOW_ACTION_INDIRECT_COUNT)
+			/*
+			 * The mutual update for indirect AGE & COUNT will be
+			 * performed later, once we have IDs for both of them.
+			 */
+			break;
+		age_info = GET_PORT_AGE_INFO(priv);
+		param = mlx5_ipool_get(age_info->ages_ipool, idx);
+		if (param == NULL)
+			return -1;
+		if (action_flags & MLX5_FLOW_ACTION_COUNT) {
+			if (mlx5_hws_cnt_pool_get(priv->hws_cpool,
+						  &param->queue_id, &age_cnt,
+						  idx) < 0)
+				return -1;
+			flow->cnt_id = age_cnt;
+			param->nb_cnts++;
+		} else {
+			/*
+			 * Get the counter of this indirect AGE or create one
+			 * if it does not exist.
+			 */
+			age_cnt = mlx5_hws_age_cnt_get(priv, param, idx);
+			if (age_cnt == 0)
+				return -1;
+		}
+		if (mlx5_hws_cnt_pool_get_action_offset(priv->hws_cpool,
+						     age_cnt, &rule_act->action,
+						     &rule_act->counter.offset))
+			return -1;
 		break;
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
 		if (flow_hw_ct_compile(dev, queue, idx, rule_act))
@@ -1952,7 +2039,8 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			  const uint8_t it_idx,
 			  const struct rte_flow_action actions[],
 			  struct mlx5dr_rule_action *rule_acts,
-			  uint32_t queue)
+			  uint32_t queue,
+			  struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
@@ -1965,6 +2053,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	const struct rte_flow_item *enc_item = NULL;
 	const struct rte_flow_action_ethdev *port_action = NULL;
 	const struct rte_flow_action_meter *meter = NULL;
+	const struct rte_flow_action_age *age = NULL;
 	uint8_t *buf = job->encap_data;
 	struct rte_flow_attr attr = {
 			.ingress = 1,
@@ -1972,6 +2061,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	uint32_t ft_flag;
 	size_t encap_len = 0;
 	int ret;
+	uint32_t age_idx = 0;
 	struct mlx5_aso_mtr *aso_mtr;
 
 	rte_memcpy(rule_acts, hw_acts->rule_acts, sizeof(*rule_acts) * at->dr_actions_num);
@@ -2024,6 +2114,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_INDIRECT:
 			if (flow_hw_shared_action_construct
 					(dev, queue, action, table, it_idx,
+					 at->action_flags, job->flow,
 					 &rule_acts[act_data->action_dst]))
 				return -1;
 			break;
@@ -2132,9 +2223,32 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			if (mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr))
 				return -1;
 			break;
+		case RTE_FLOW_ACTION_TYPE_AGE:
+			age = action->conf;
+			/*
+			 * First, create the AGE parameter, then create its
+			 * counter later:
+			 * Regular counter - created in the next case.
+			 * Indirect counter - updated after the loop.
+			 */
+			age_idx = mlx5_hws_age_action_create(priv, queue, 0,
+							     age,
+							     job->flow->idx,
+							     error);
+			if (age_idx == 0)
+				return -rte_errno;
+			job->flow->age_idx = age_idx;
+			if (at->action_flags & MLX5_FLOW_ACTION_INDIRECT_COUNT)
+				/*
+				 * When AGE uses an indirect counter, there is
+				 * no need to create a counter; just update it
+				 * with the AGE parameter after the loop.
+				 */
+				break;
+			/* Fall-through. */
 		case RTE_FLOW_ACTION_TYPE_COUNT:
 			ret = mlx5_hws_cnt_pool_get(priv->hws_cpool, &queue,
-					&cnt_id);
+						    &cnt_id, age_idx);
 			if (ret != 0)
 				return ret;
 			ret = mlx5_hws_cnt_pool_get_action_offset
@@ -2191,6 +2305,25 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			break;
 		}
 	}
+	if (at->action_flags & MLX5_FLOW_ACTION_INDIRECT_COUNT) {
+		if (at->action_flags & MLX5_FLOW_ACTION_INDIRECT_AGE) {
+			age_idx = job->flow->age_idx & MLX5_HWS_AGE_IDX_MASK;
+			if (mlx5_hws_cnt_age_get(priv->hws_cpool,
+						 job->flow->cnt_id) != age_idx)
+				/*
+				 * This is the first use of this indirect
+				 * counter for this indirect AGE; increase the
+				 * number of counters.
+				 */
+				mlx5_hws_age_nb_cnt_increase(priv, age_idx);
+		}
+		/*
+		 * Update this indirect counter with the indirect/direct AGE
+		 * that is using it.
+		 */
+		mlx5_hws_cnt_age_set(priv->hws_cpool, job->flow->cnt_id,
+				     age_idx);
+	}
 	if (hw_acts->encap_decap && !hw_acts->encap_decap->shared) {
 		rule_acts[hw_acts->encap_decap_pos].reformat.offset =
 				job->flow->idx - 1;
@@ -2340,8 +2473,10 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	 * No need to copy and contrust a new "actions" list based on the
 	 * user's input, in order to save the cost.
 	 */
-	if (flow_hw_actions_construct(dev, job, &table->ats[action_template_index],
-				      pattern_template_index, actions, rule_acts, queue)) {
+	if (flow_hw_actions_construct(dev, job,
+				      &table->ats[action_template_index],
+				      pattern_template_index, actions,
+				      rule_acts, queue, error)) {
 		rte_errno = EINVAL;
 		goto free;
 	}
@@ -2426,6 +2561,49 @@ flow_hw_async_flow_destroy(struct rte_eth_dev *dev,
 			"fail to create rte flow");
 }
 
+/**
+ * Release the AGE and counter for given flow.
+ *
+ * @param[in] priv
+ *   Pointer to the port private data structure.
+ * @param[in] queue
+ *   The queue to release the counter.
+ * @param[in, out] flow
+ *   Pointer to the flow containing the counter.
+ * @param[out] error
+ *   Pointer to error structure.
+ */
+static void
+flow_hw_age_count_release(struct mlx5_priv *priv, uint32_t queue,
+			  struct rte_flow_hw *flow,
+			  struct rte_flow_error *error)
+{
+	if (mlx5_hws_cnt_is_shared(priv->hws_cpool, flow->cnt_id)) {
+		if (flow->age_idx && !mlx5_hws_age_is_indirect(flow->age_idx)) {
+			/* Remove this AGE parameter from indirect counter. */
+			mlx5_hws_cnt_age_set(priv->hws_cpool, flow->cnt_id, 0);
+			/* Release the AGE parameter. */
+			mlx5_hws_age_action_destroy(priv, flow->age_idx, error);
+			flow->age_idx = 0;
+		}
+		return;
+	}
+	/* Put the counter first to reduce the race risk with the BG thread. */
+	mlx5_hws_cnt_pool_put(priv->hws_cpool, &queue, &flow->cnt_id);
+	flow->cnt_id = 0;
+	if (flow->age_idx) {
+		if (mlx5_hws_age_is_indirect(flow->age_idx)) {
+			uint32_t idx = flow->age_idx & MLX5_HWS_AGE_IDX_MASK;
+
+			mlx5_hws_age_nb_cnt_decrease(priv, idx);
+		} else {
+			/* Release the AGE parameter. */
+			mlx5_hws_age_action_destroy(priv, flow->age_idx, error);
+		}
+		flow->age_idx = 0;
+	}
+}
+
 /**
  * Pull the enqueued flows.
  *
@@ -2472,13 +2650,9 @@ flow_hw_pull(struct rte_eth_dev *dev,
 				flow_hw_jump_release(dev, job->flow->jump);
 			else if (job->flow->fate_type == MLX5_FLOW_FATE_QUEUE)
 				mlx5_hrxq_obj_release(dev, job->flow->hrxq);
-			if (mlx5_hws_cnt_id_valid(job->flow->cnt_id) &&
-			    mlx5_hws_cnt_is_shared
-				(priv->hws_cpool, job->flow->cnt_id) == false) {
-				mlx5_hws_cnt_pool_put(priv->hws_cpool, &queue,
-						&job->flow->cnt_id);
-				job->flow->cnt_id = 0;
-			}
+			if (mlx5_hws_cnt_id_valid(job->flow->cnt_id))
+				flow_hw_age_count_release(priv, queue,
+							  job->flow, error);
 			if (job->flow->mtr_id) {
 				mlx5_ipool_free(pool->idx_pool,	job->flow->mtr_id);
 				job->flow->mtr_id = 0;
@@ -3131,100 +3305,315 @@ flow_hw_validate_action_represented_port(struct rte_eth_dev *dev,
 	return 0;
 }
 
-static inline int
-flow_hw_action_meta_copy_insert(const struct rte_flow_action actions[],
-				const struct rte_flow_action masks[],
-				const struct rte_flow_action *ins_actions,
-				const struct rte_flow_action *ins_masks,
-				struct rte_flow_action *new_actions,
-				struct rte_flow_action *new_masks,
-				uint16_t *ins_pos)
+/**
+ * Validate AGE action.
+ *
+ * @param[in] dev
+ *   Pointer to rte_eth_dev structure.
+ * @param[in] action
+ *   Pointer to the indirect action.
+ * @param[in] action_flags
+ *   Holds the actions detected until now.
+ * @param[in] fixed_cnt
+ *   Indicator if this list has a fixed COUNT action.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_hw_validate_action_age(struct rte_eth_dev *dev,
+			    const struct rte_flow_action *action,
+			    uint64_t action_flags, bool fixed_cnt,
+			    struct rte_flow_error *error)
 {
-	uint16_t idx, total = 0;
-	uint16_t end_idx = UINT16_MAX;
-	bool act_end = false;
-	bool modify_field = false;
-	bool rss_or_queue = false;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
 
-	MLX5_ASSERT(actions && masks);
-	MLX5_ASSERT(new_actions && new_masks);
-	MLX5_ASSERT(ins_actions && ins_masks);
-	for (idx = 0; !act_end; idx++) {
-		switch (actions[idx].type) {
-		case RTE_FLOW_ACTION_TYPE_RSS:
-		case RTE_FLOW_ACTION_TYPE_QUEUE:
-			/* It is assumed that application provided only single RSS/QUEUE action. */
-			MLX5_ASSERT(!rss_or_queue);
-			rss_or_queue = true;
-			break;
-		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
-			modify_field = true;
-			break;
-		case RTE_FLOW_ACTION_TYPE_END:
-			end_idx = idx;
-			act_end = true;
-			break;
-		default:
-			break;
+	if (!priv->sh->cdev->config.devx)
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "AGE action not supported");
+	if (age_info->ages_ipool == NULL)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					  "aging pool not initialized");
+	if ((action_flags & MLX5_FLOW_ACTION_AGE) ||
+	    (action_flags & MLX5_FLOW_ACTION_INDIRECT_AGE))
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "duplicate AGE actions set");
+	if (fixed_cnt)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "AGE and fixed COUNT combination is not supported");
+	return 0;
+}
+
+/**
+ * Validate count action.
+ *
+ * @param[in] dev
+ *   Pointer to rte_eth_dev structure.
+ * @param[in] action
+ *   Pointer to the indirect action.
+ * @param[in] mask
+ *   Pointer to the indirect action mask.
+ * @param[in] action_flags
+ *   Holds the actions detected until now.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_hw_validate_action_count(struct rte_eth_dev *dev,
+			      const struct rte_flow_action *action,
+			      const struct rte_flow_action *mask,
+			      uint64_t action_flags,
+			      struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_action_count *count = mask->conf;
+
+	if (!priv->sh->cdev->config.devx)
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "count action not supported");
+	if (!priv->hws_cpool)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "counters pool not initialized");
+	if ((action_flags & MLX5_FLOW_ACTION_COUNT) ||
+	    (action_flags & MLX5_FLOW_ACTION_INDIRECT_COUNT))
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "duplicate count actions set");
+	if (count && count->id && (action_flags & MLX5_FLOW_ACTION_AGE))
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, mask,
+					  "AGE and COUNT action shared by mask combination is not supported");
+	return 0;
+}
+
+/**
+ * Validate meter_mark action.
+ *
+ * @param[in] dev
+ *   Pointer to rte_eth_dev structure.
+ * @param[in] action
+ *   Pointer to the indirect action.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_hw_validate_action_meter_mark(struct rte_eth_dev *dev,
+			      const struct rte_flow_action *action,
+			      struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	RTE_SET_USED(action);
+
+	if (!priv->sh->cdev->config.devx)
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "meter_mark action not supported");
+	if (!priv->hws_mpool)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "meter_mark pool not initialized");
+	return 0;
+}
+
+/**
+ * Validate indirect action.
+ *
+ * @param[in] dev
+ *   Pointer to rte_eth_dev structure.
+ * @param[in] action
+ *   Pointer to the indirect action.
+ * @param[in] mask
+ *   Pointer to the indirect action mask.
+ * @param[in, out] action_flags
+ *   Holds the actions detected until now.
+ * @param[in, out] fixed_cnt
+ *   Pointer to indicator if this list has a fixed COUNT action.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_hw_validate_action_indirect(struct rte_eth_dev *dev,
+				 const struct rte_flow_action *action,
+				 const struct rte_flow_action *mask,
+				 uint64_t *action_flags, bool *fixed_cnt,
+				 struct rte_flow_error *error)
+{
+	uint32_t type;
+	int ret;
+
+	if (!mask)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  "Unable to determine indirect action type without a mask specified");
+	type = mask->type;
+	switch (type) {
+	case RTE_FLOW_ACTION_TYPE_METER_MARK:
+		ret = flow_hw_validate_action_meter_mark(dev, mask, error);
+		if (ret < 0)
+			return ret;
+		*action_flags |= MLX5_FLOW_ACTION_METER;
+		break;
+	case RTE_FLOW_ACTION_TYPE_RSS:
+		/* TODO: Validation logic (same as flow_hw_actions_validate) */
+		*action_flags |= MLX5_FLOW_ACTION_RSS;
+		break;
+	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+		/* TODO: Validation logic (same as flow_hw_actions_validate) */
+		*action_flags |= MLX5_FLOW_ACTION_CT;
+		break;
+	case RTE_FLOW_ACTION_TYPE_COUNT:
+		if (action->conf && mask->conf) {
+			if ((*action_flags & MLX5_FLOW_ACTION_AGE) ||
+			    (*action_flags & MLX5_FLOW_ACTION_INDIRECT_AGE))
+				/*
+				 * AGE cannot use an indirect counter which is
+				 * shared with other flow rules.
+				 */
+				return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  NULL,
+						  "AGE and fixed COUNT combination is not supported");
+			*fixed_cnt = true;
 		}
+		ret = flow_hw_validate_action_count(dev, action, mask,
+						    *action_flags, error);
+		if (ret < 0)
+			return ret;
+		*action_flags |= MLX5_FLOW_ACTION_INDIRECT_COUNT;
+		break;
+	case RTE_FLOW_ACTION_TYPE_AGE:
+		ret = flow_hw_validate_action_age(dev, action, *action_flags,
+						  *fixed_cnt, error);
+		if (ret < 0)
+			return ret;
+		*action_flags |= MLX5_FLOW_ACTION_INDIRECT_AGE;
+		break;
+	default:
+		DRV_LOG(WARNING, "Unsupported shared action type: %d", type);
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, mask,
+					  "Unsupported indirect action type");
 	}
-	if (!rss_or_queue)
-		return 0;
-	else if (idx >= MLX5_HW_MAX_ACTS)
-		return -1; /* No more space. */
-	total = idx;
-	/*
-	 * If actions template contains MODIFY_FIELD action, then meta copy action can be inserted
-	 * at the template's end. Position of MODIFY_HDR action is based on the position of the
-	 * first MODIFY_FIELD flow action.
-	 */
-	if (modify_field) {
-		*ins_pos = end_idx;
-		goto insert_meta_copy;
-	}
-	/*
-	 * If actions template does not contain MODIFY_FIELD action, then meta copy action must be
-	 * inserted at aplace conforming with action order defined in steering/mlx5dr_action.c.
+	return 0;
+}
+
+/**
+ * Validate raw_encap action.
+ *
+ * @param[in] dev
+ *   Pointer to rte_eth_dev structure.
+ * @param[in] action
+ *   Pointer to the indirect action.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_hw_validate_action_raw_encap(struct rte_eth_dev *dev __rte_unused,
+				  const struct rte_flow_action *action,
+				  struct rte_flow_error *error)
+{
+	const struct rte_flow_action_raw_encap *raw_encap_data = action->conf;
+
+	if (!raw_encap_data || !raw_encap_data->size || !raw_encap_data->data)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "invalid raw_encap_data");
+	return 0;
+}
+
+static inline uint16_t
+flow_hw_template_expand_modify_field(const struct rte_flow_action actions[],
+				     const struct rte_flow_action masks[],
+				     const struct rte_flow_action *mf_action,
+				     const struct rte_flow_action *mf_mask,
+				     struct rte_flow_action *new_actions,
+				     struct rte_flow_action *new_masks,
+				     uint64_t flags, uint32_t act_num)
+{
+	uint32_t i, tail;
+
+	MLX5_ASSERT(actions && masks);
+	MLX5_ASSERT(new_actions && new_masks);
+	MLX5_ASSERT(mf_action && mf_mask);
+	if (flags & MLX5_FLOW_ACTION_MODIFY_FIELD) {
+		/*
+		 * Application action template already has Modify Field.
+		 * Its location will be used in DR.
+		 * Expanded MF action can be added before the END.
+		 */
+		i = act_num - 1;
+		goto insert;
+	}
+	/**
+	 * Locate the first action positioned BEFORE the new MF.
+	 *
+	 * Search for a place to insert modify header
+	 * from the END action backwards:
+	 * 1. END is always present in actions array
+	 * 2. END location is always at action[act_num - 1]
+	 * 3. END always positioned AFTER modify field location
+	 *
+	 * Relative actions order is the same for RX, TX and FDB.
+	 *
+	 * Current actions order (draft-3)
+	 * @see action_order_arr[]
 	 */
-	act_end = false;
-	for (idx = 0; !act_end; idx++) {
-		switch (actions[idx].type) {
-		case RTE_FLOW_ACTION_TYPE_COUNT:
-		case RTE_FLOW_ACTION_TYPE_METER:
-		case RTE_FLOW_ACTION_TYPE_METER_MARK:
-		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+	for (i = act_num - 2; (int)i >= 0; i--) {
+		enum rte_flow_action_type type = actions[i].type;
+
+		if (type == RTE_FLOW_ACTION_TYPE_INDIRECT)
+			type = masks[i].type;
+		switch (type) {
 		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
 		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
 		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
-		case RTE_FLOW_ACTION_TYPE_RSS:
+		case RTE_FLOW_ACTION_TYPE_DROP:
+		case RTE_FLOW_ACTION_TYPE_JUMP:
 		case RTE_FLOW_ACTION_TYPE_QUEUE:
-			*ins_pos = idx;
-			act_end = true;
-			break;
+		case RTE_FLOW_ACTION_TYPE_RSS:
+		case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
+		case RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID:
+		case RTE_FLOW_ACTION_TYPE_VOID:
 		case RTE_FLOW_ACTION_TYPE_END:
-			act_end = true;
 			break;
 		default:
+			i++; /* new MF inserted AFTER actions[i] */
+			goto insert;
 			break;
 		}
 	}
-insert_meta_copy:
-	MLX5_ASSERT(*ins_pos != UINT16_MAX);
-	MLX5_ASSERT(*ins_pos < total);
-	/* Before the position, no change for the actions. */
-	for (idx = 0; idx < *ins_pos; idx++) {
-		new_actions[idx] = actions[idx];
-		new_masks[idx] = masks[idx];
-	}
-	/* Insert the new action and mask to the position. */
-	new_actions[idx] = *ins_actions;
-	new_masks[idx] = *ins_masks;
-	/* Remaining content is right shifted by one position. */
-	for (; idx < total; idx++) {
-		new_actions[idx + 1] = actions[idx];
-		new_masks[idx + 1] = masks[idx];
-	}
-	return 0;
+	i = 0;
+insert:
+	tail = act_num - i; /* num action to move */
+	memcpy(new_actions, actions, sizeof(actions[0]) * i);
+	new_actions[i] = *mf_action;
+	memcpy(new_actions + i + 1, actions + i, sizeof(actions[0]) * tail);
+	memcpy(new_masks, masks, sizeof(masks[0]) * i);
+	new_masks[i] = *mf_mask;
+	memcpy(new_masks + i + 1, masks + i, sizeof(masks[0]) * tail);
+	return i;
 }
 
 static int
@@ -3295,13 +3684,17 @@ flow_hw_validate_action_push_vlan(struct rte_eth_dev *dev,
 }
 
 static int
-flow_hw_actions_validate(struct rte_eth_dev *dev,
-			const struct rte_flow_actions_template_attr *attr,
-			const struct rte_flow_action actions[],
-			const struct rte_flow_action masks[],
-			struct rte_flow_error *error)
+mlx5_flow_hw_actions_validate(struct rte_eth_dev *dev,
+			      const struct rte_flow_actions_template_attr *attr,
+			      const struct rte_flow_action actions[],
+			      const struct rte_flow_action masks[],
+			      uint64_t *act_flags,
+			      struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_action_count *count_mask = NULL;
+	bool fixed_cnt = false;
+	uint64_t action_flags = 0;
 	uint16_t i;
 	bool actions_end = false;
 	int ret;
@@ -3327,46 +3720,70 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			break;
 		case RTE_FLOW_ACTION_TYPE_INDIRECT:
-			/* TODO: Validation logic */
+			ret = flow_hw_validate_action_indirect(dev, action,
+							       mask,
+							       &action_flags,
+							       &fixed_cnt,
+							       error);
+			if (ret < 0)
+				return ret;
 			break;
 		case RTE_FLOW_ACTION_TYPE_MARK:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_MARK;
 			break;
 		case RTE_FLOW_ACTION_TYPE_DROP:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_DROP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_JUMP:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_JUMP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_QUEUE:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_QUEUE;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RSS:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_RSS;
 			break;
 		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_ENCAP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_ENCAP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_VXLAN_DECAP:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_DECAP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_NVGRE_DECAP:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_DECAP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
-			/* TODO: Validation logic */
+			ret = flow_hw_validate_action_raw_encap(dev, action, error);
+			if (ret < 0)
+				return ret;
+			action_flags |= MLX5_FLOW_ACTION_ENCAP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RAW_DECAP:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_DECAP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_METER:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_METER;
 			break;
 		case RTE_FLOW_ACTION_TYPE_METER_MARK:
-			/* TODO: Validation logic */
+			ret = flow_hw_validate_action_meter_mark(dev, action,
+								 error);
+			if (ret < 0)
+				return ret;
+			action_flags |= MLX5_FLOW_ACTION_METER;
 			break;
 		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
 			ret = flow_hw_validate_action_modify_field(action,
@@ -3374,21 +3791,43 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 									error);
 			if (ret < 0)
 				return ret;
+			action_flags |= MLX5_FLOW_ACTION_MODIFY_FIELD;
 			break;
 		case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
 			ret = flow_hw_validate_action_represented_port
 					(dev, action, mask, error);
 			if (ret < 0)
 				return ret;
+			action_flags |= MLX5_FLOW_ACTION_PORT_ID;
+			break;
+		case RTE_FLOW_ACTION_TYPE_AGE:
+			if (count_mask && count_mask->id)
+				fixed_cnt = true;
+			ret = flow_hw_validate_action_age(dev, action,
+							  action_flags,
+							  fixed_cnt, error);
+			if (ret < 0)
+				return ret;
+			action_flags |= MLX5_FLOW_ACTION_AGE;
 			break;
 		case RTE_FLOW_ACTION_TYPE_COUNT:
-			/* TODO: Validation logic */
+			ret = flow_hw_validate_action_count(dev, action, mask,
+							    action_flags,
+							    error);
+			if (ret < 0)
+				return ret;
+			count_mask = mask->conf;
+			action_flags |= MLX5_FLOW_ACTION_COUNT;
 			break;
 		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_CT;
 			break;
 		case RTE_FLOW_ACTION_TYPE_OF_POP_VLAN:
+			action_flags |= MLX5_FLOW_ACTION_OF_POP_VLAN;
+			break;
 		case RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID:
+			action_flags |= MLX5_FLOW_ACTION_OF_SET_VLAN_VID;
 			break;
 		case RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN:
 			ret = flow_hw_validate_action_push_vlan
@@ -3398,6 +3837,7 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 			i += is_of_vlan_pcp_present(action) ?
 				MLX5_HW_VLAN_PUSH_PCP_IDX :
 				MLX5_HW_VLAN_PUSH_VID_IDX;
+			action_flags |= MLX5_FLOW_ACTION_OF_PUSH_VLAN;
 			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
@@ -3409,9 +3849,23 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 						  "action not supported in template API");
 		}
 	}
+	if (act_flags != NULL)
+		*act_flags = action_flags;
 	return 0;
 }
 
+static int
+flow_hw_actions_validate(struct rte_eth_dev *dev,
+			 const struct rte_flow_actions_template_attr *attr,
+			 const struct rte_flow_action actions[],
+			 const struct rte_flow_action masks[],
+			 struct rte_flow_error *error)
+{
+	return mlx5_flow_hw_actions_validate(dev, attr, actions, masks, NULL,
+					     error);
+}
+
 static enum mlx5dr_action_type mlx5_hw_dr_action_types[] = {
 	[RTE_FLOW_ACTION_TYPE_MARK] = MLX5DR_ACTION_TYP_TAG,
 	[RTE_FLOW_ACTION_TYPE_DROP] = MLX5DR_ACTION_TYP_DROP,
@@ -3424,7 +3878,6 @@ static enum mlx5dr_action_type mlx5_hw_dr_action_types[] = {
 	[RTE_FLOW_ACTION_TYPE_NVGRE_DECAP] = MLX5DR_ACTION_TYP_TNL_L2_TO_L2,
 	[RTE_FLOW_ACTION_TYPE_MODIFY_FIELD] = MLX5DR_ACTION_TYP_MODIFY_HDR,
 	[RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT] = MLX5DR_ACTION_TYP_VPORT,
-	[RTE_FLOW_ACTION_TYPE_COUNT] = MLX5DR_ACTION_TYP_CTR,
 	[RTE_FLOW_ACTION_TYPE_CONNTRACK] = MLX5DR_ACTION_TYP_ASO_CT,
 	[RTE_FLOW_ACTION_TYPE_OF_POP_VLAN] = MLX5DR_ACTION_TYP_POP_VLAN,
 	[RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN] = MLX5DR_ACTION_TYP_PUSH_VLAN,
@@ -3434,7 +3887,7 @@ static int
 flow_hw_dr_actions_template_handle_shared(const struct rte_flow_action *mask,
 					  unsigned int action_src,
 					  enum mlx5dr_action_type *action_types,
-					  uint16_t *curr_off,
+					  uint16_t *curr_off, uint16_t *cnt_off,
 					  struct rte_flow_actions_template *at)
 {
 	uint32_t type;
@@ -3451,10 +3904,18 @@ flow_hw_dr_actions_template_handle_shared(const struct rte_flow_action *mask,
 		action_types[*curr_off] = MLX5DR_ACTION_TYP_TIR;
 		*curr_off = *curr_off + 1;
 		break;
+	case RTE_FLOW_ACTION_TYPE_AGE:
 	case RTE_FLOW_ACTION_TYPE_COUNT:
-		at->actions_off[action_src] = *curr_off;
-		action_types[*curr_off] = MLX5DR_ACTION_TYP_CTR;
-		*curr_off = *curr_off + 1;
+		/*
+		 * Both AGE and COUNT actions need a counter; the first one fills
+		 * the action_types array, and the second only saves the offset.
+		 */
+		if (*cnt_off == UINT16_MAX) {
+			*cnt_off = *curr_off;
+			action_types[*cnt_off] = MLX5DR_ACTION_TYP_CTR;
+			*curr_off = *curr_off + 1;
+		}
+		at->actions_off[action_src] = *cnt_off;
 		break;
 	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
 		at->actions_off[action_src] = *curr_off;
@@ -3493,6 +3954,7 @@ flow_hw_dr_actions_template_create(struct rte_flow_actions_template *at)
 	enum mlx5dr_action_type reformat_act_type = MLX5DR_ACTION_TYP_TNL_L2_TO_L2;
 	uint16_t reformat_off = UINT16_MAX;
 	uint16_t mhdr_off = UINT16_MAX;
+	uint16_t cnt_off = UINT16_MAX;
 	int ret;
 	for (i = 0, curr_off = 0; at->actions[i].type != RTE_FLOW_ACTION_TYPE_END; ++i) {
 		const struct rte_flow_action_raw_encap *raw_encap_data;
@@ -3505,9 +3967,12 @@ flow_hw_dr_actions_template_create(struct rte_flow_actions_template *at)
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			break;
 		case RTE_FLOW_ACTION_TYPE_INDIRECT:
-			ret = flow_hw_dr_actions_template_handle_shared(&at->masks[i], i,
-									action_types,
-									&curr_off, at);
+			ret = flow_hw_dr_actions_template_handle_shared
+								 (&at->masks[i],
+								  i,
+								  action_types,
+								  &curr_off,
+								  &cnt_off, at);
 			if (ret)
 				return NULL;
 			break;
@@ -3563,6 +4028,19 @@ flow_hw_dr_actions_template_create(struct rte_flow_actions_template *at)
 			if (curr_off >= MLX5_HW_MAX_ACTS)
 				goto err_actions_num;
 			break;
+		case RTE_FLOW_ACTION_TYPE_AGE:
+		case RTE_FLOW_ACTION_TYPE_COUNT:
+			/*
+			 * Both AGE and COUNT actions need a counter; the first
+			 * one fills the action_types array, and the second only
+			 * saves the offset.
+			 */
+			if (cnt_off == UINT16_MAX) {
+				cnt_off = curr_off++;
+				action_types[cnt_off] = MLX5DR_ACTION_TYP_CTR;
+			}
+			at->actions_off[i] = cnt_off;
+			break;
 		default:
 			type = mlx5_hw_dr_action_types[at->actions[i].type];
 			at->actions_off[i] = curr_off;
@@ -3703,6 +4181,7 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 	unsigned int i;
 	struct rte_flow_actions_template *at = NULL;
 	uint16_t pos = UINT16_MAX;
+	uint64_t action_flags = 0;
 	struct rte_flow_action tmp_action[MLX5_HW_MAX_ACTS];
 	struct rte_flow_action tmp_mask[MLX5_HW_MAX_ACTS];
 	struct rte_flow_action *ra = (void *)(uintptr_t)actions;
@@ -3745,22 +4224,9 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 		.conf = &rx_mreg_mask,
 	};
 
-	if (flow_hw_actions_validate(dev, attr, actions, masks, error))
+	if (mlx5_flow_hw_actions_validate(dev, attr, actions, masks,
+					  &action_flags, error))
 		return NULL;
-	if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
-	    priv->sh->config.dv_esw_en) {
-		/* Application should make sure only one Q/RSS exist in one rule. */
-		if (flow_hw_action_meta_copy_insert(actions, masks, &rx_cpy, &rx_cpy_mask,
-						    tmp_action, tmp_mask, &pos)) {
-			rte_flow_error_set(error, EINVAL,
-					   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
-					   "Failed to concatenate new action/mask");
-			return NULL;
-		} else if (pos != UINT16_MAX) {
-			ra = tmp_action;
-			rm = tmp_mask;
-		}
-	}
 	for (i = 0; ra[i].type != RTE_FLOW_ACTION_TYPE_END; ++i) {
 		switch (ra[i].type) {
 		/* OF_PUSH_VLAN *MUST* come before OF_SET_VLAN_VID */
@@ -3786,6 +4252,28 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 				   RTE_FLOW_ERROR_TYPE_ACTION, NULL, "Too many actions");
 		return NULL;
 	}
+	if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
+	    priv->sh->config.dv_esw_en &&
+	    (action_flags & (MLX5_FLOW_ACTION_QUEUE | MLX5_FLOW_ACTION_RSS))) {
+		/* Insert META copy */
+		if (act_num + 1 > MLX5_HW_MAX_ACTS) {
+			rte_flow_error_set(error, E2BIG,
+					   RTE_FLOW_ERROR_TYPE_ACTION,
+					   NULL, "cannot expand: too many actions");
+			return NULL;
+		}
+		/* Application should make sure only one Q/RSS exists in one rule. */
+		pos = flow_hw_template_expand_modify_field(actions, masks,
+							   &rx_cpy,
+							   &rx_cpy_mask,
+							   tmp_action, tmp_mask,
+							   action_flags,
+							   act_num);
+		ra = tmp_action;
+		rm = tmp_mask;
+		act_num++;
+		action_flags |= MLX5_FLOW_ACTION_MODIFY_FIELD;
+	}
 	if (set_vlan_vid_ix != -1) {
 		/* If temporary action buffer was not used, copy template actions to it */
 		if (ra == actions && rm == masks) {
@@ -3856,6 +4344,7 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 	at->tmpl = flow_hw_dr_actions_template_create(at);
 	if (!at->tmpl)
 		goto error;
+	at->action_flags = action_flags;
 	__atomic_fetch_add(&at->refcnt, 1, __ATOMIC_RELAXED);
 	LIST_INSERT_HEAD(&priv->flow_hw_at, at, next);
 	return at;
@@ -4199,6 +4688,7 @@ flow_hw_info_get(struct rte_eth_dev *dev,
 		 struct rte_flow_queue_info *queue_info,
 		 struct rte_flow_error *error __rte_unused)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
 	uint16_t port_id = dev->data->port_id;
 	struct rte_mtr_capabilities mtr_cap;
 	int ret;
@@ -4212,6 +4702,8 @@ flow_hw_info_get(struct rte_eth_dev *dev,
 	ret = rte_mtr_capabilities_get(port_id, &mtr_cap, NULL);
 	if (!ret)
 		port_info->max_nb_meters = mtr_cap.n_max;
+	port_info->max_nb_counters = priv->sh->hws_max_nb_counters;
+	port_info->max_nb_aging_objects = port_info->max_nb_counters;
 	return 0;
 }
 
@@ -5586,8 +6078,6 @@ flow_hw_configure(struct rte_eth_dev *dev,
 			goto err;
 		}
 	}
-	if (_queue_attr)
-		mlx5_free(_queue_attr);
 	if (port_attr->nb_conn_tracks) {
 		mem_size = sizeof(struct mlx5_aso_sq) * nb_q_updated +
 			   sizeof(*priv->ct_mng);
@@ -5604,13 +6094,37 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	}
 	if (port_attr->nb_counters) {
 		priv->hws_cpool = mlx5_hws_cnt_pool_create(dev, port_attr,
-				nb_queue);
+							   nb_queue);
 		if (priv->hws_cpool == NULL)
 			goto err;
 	}
+	if (port_attr->nb_aging_objects) {
+		if (port_attr->nb_counters == 0) {
+			/*
+			 * Aging management uses counters. The number of
+			 * counters requested should account for a counter per
+			 * flow rule containing AGE without its own counter.
+			 */
+			DRV_LOG(ERR, "Port %u AGE objects are requested (%u) "
+				"but no counters are requested.",
+				dev->data->port_id,
+				port_attr->nb_aging_objects);
+			rte_errno = EINVAL;
+			goto err;
+		}
+		ret = mlx5_hws_age_pool_init(dev, port_attr, nb_queue);
+		if (ret < 0)
+			goto err;
+	}
 	ret = flow_hw_create_vlan(dev);
 	if (ret)
 		goto err;
+	if (_queue_attr)
+		mlx5_free(_queue_attr);
+#ifdef MLX5_HAVE_RTE_FLOW_Q_AGE
+	if (port_attr->flags & RTE_FLOW_PORT_FLAG_STRICT_QUEUE)
+		priv->hws_strict_queue = 1;
+#endif
 	return 0;
 err:
 	if (priv->hws_ctpool) {
@@ -5621,6 +6135,12 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		flow_hw_ct_mng_destroy(dev, priv->ct_mng);
 		priv->ct_mng = NULL;
 	}
+	if (priv->hws_age_req)
+		mlx5_hws_age_pool_destroy(priv);
+	if (priv->hws_cpool) {
+		mlx5_hws_cnt_pool_destroy(priv->sh, priv->hws_cpool);
+		priv->hws_cpool = NULL;
+	}
 	flow_hw_free_vport_actions(priv);
 	for (i = 0; i < MLX5_HW_ACTION_FLAG_MAX; i++) {
 		if (priv->hw_drop[i])
@@ -5694,8 +6214,12 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 		mlx5_ipool_destroy(priv->acts_ipool);
 		priv->acts_ipool = NULL;
 	}
-	if (priv->hws_cpool)
+	if (priv->hws_age_req)
+		mlx5_hws_age_pool_destroy(priv);
+	if (priv->hws_cpool) {
 		mlx5_hws_cnt_pool_destroy(priv->sh, priv->hws_cpool);
+		priv->hws_cpool = NULL;
+	}
 	if (priv->hws_ctpool) {
 		flow_hw_ct_pool_destroy(dev, priv->hws_ctpool);
 		priv->hws_ctpool = NULL;
@@ -6030,13 +6554,81 @@ flow_hw_conntrack_create(struct rte_eth_dev *dev, uint32_t queue,
 		MLX5_ACTION_CTX_CT_GEN_IDX(PORT_ID(priv), ct_idx);
 }
 
+/**
+ * Validate shared action.
+ *
+ * @param[in] dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param[in] queue
+ *   Which queue to be used.
+ * @param[in] attr
+ *   Operation attribute.
+ * @param[in] conf
+ *   Indirect action configuration.
+ * @param[in] action
+ *   rte_flow action detail.
+ * @param[in] user_data
+ *   Pointer to the user_data.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, otherwise negative errno value.
+ */
+static int
+flow_hw_action_handle_validate(struct rte_eth_dev *dev, uint32_t queue,
+			       const struct rte_flow_op_attr *attr,
+			       const struct rte_flow_indir_action_conf *conf,
+			       const struct rte_flow_action *action,
+			       void *user_data,
+			       struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	RTE_SET_USED(attr);
+	RTE_SET_USED(queue);
+	RTE_SET_USED(user_data);
+	switch (action->type) {
+	case RTE_FLOW_ACTION_TYPE_AGE:
+		if (!priv->hws_age_req)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  NULL,
+						  "aging pool not initialized");
+		break;
+	case RTE_FLOW_ACTION_TYPE_COUNT:
+		if (!priv->hws_cpool)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  NULL,
+						  "counters pool not initialized");
+		break;
+	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+		if (priv->hws_ctpool == NULL)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  NULL,
+						  "CT pool not initialized");
+		return mlx5_validate_action_ct(dev, action->conf, error);
+	case RTE_FLOW_ACTION_TYPE_METER_MARK:
+		return flow_hw_validate_action_meter_mark(dev, action, error);
+	case RTE_FLOW_ACTION_TYPE_RSS:
+		return flow_dv_action_validate(dev, conf, action, error);
+	default:
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					  "action type not supported");
+	}
+	return 0;
+}
+
 /**
  * Create shared action.
  *
  * @param[in] dev
  *   Pointer to the rte_eth_dev structure.
  * @param[in] queue
- *   Which queue to be used..
+ *   Which queue to be used.
  * @param[in] attr
  *   Operation attribute.
  * @param[in] conf
@@ -6061,16 +6653,44 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 {
 	struct rte_flow_action_handle *handle = NULL;
 	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_action_age *age;
 	struct mlx5_aso_mtr *aso_mtr;
 	cnt_id_t cnt_id;
 	uint32_t mtr_id;
+	uint32_t age_idx;
 
-	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
 	RTE_SET_USED(user_data);
 	switch (action->type) {
+	case RTE_FLOW_ACTION_TYPE_AGE:
+		if (priv->hws_strict_queue) {
+			struct mlx5_age_info *info = GET_PORT_AGE_INFO(priv);
+
+			if (queue >= info->hw_q_age->nb_rings) {
+				rte_flow_error_set(error, EINVAL,
+						   RTE_FLOW_ERROR_TYPE_ACTION,
+						   NULL,
+						   "Invalid queue ID for indirect AGE.");
+				rte_errno = EINVAL;
+				return NULL;
+			}
+		}
+		age = action->conf;
+		age_idx = mlx5_hws_age_action_create(priv, queue, true, age,
+						     0, error);
+		if (age_idx == 0) {
+			rte_flow_error_set(error, ENODEV,
+					   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					   "AGE are not configured!");
+		} else {
+			age_idx = (MLX5_INDIRECT_ACTION_TYPE_AGE <<
+				   MLX5_INDIRECT_ACTION_TYPE_OFFSET) | age_idx;
+			handle =
+			    (struct rte_flow_action_handle *)(uintptr_t)age_idx;
+		}
+		break;
 	case RTE_FLOW_ACTION_TYPE_COUNT:
-		if (mlx5_hws_cnt_shared_get(priv->hws_cpool, &cnt_id))
+		if (mlx5_hws_cnt_shared_get(priv->hws_cpool, &cnt_id, 0))
 			rte_flow_error_set(error, ENODEV,
 					RTE_FLOW_ERROR_TYPE_ACTION,
 					NULL,
@@ -6090,8 +6710,13 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 			MLX5_INDIRECT_ACTION_TYPE_OFFSET) | (aso_mtr->fm.meter_id);
 		handle = (struct rte_flow_action_handle *)(uintptr_t)mtr_id;
 		break;
-	default:
+	case RTE_FLOW_ACTION_TYPE_RSS:
 		handle = flow_dv_action_create(dev, conf, action, error);
+		break;
+	default:
+		rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION,
+				   NULL, "action type not supported");
+		return NULL;
 	}
 	return handle;
 }
@@ -6102,7 +6727,7 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
  * @param[in] dev
  *   Pointer to the rte_eth_dev structure.
  * @param[in] queue
- *   Which queue to be used..
+ *   Which queue to be used.
  * @param[in] attr
  *   Operation attribute.
  * @param[in] handle
@@ -6125,7 +6750,6 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 			     void *user_data,
 			     struct rte_flow_error *error)
 {
-	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
 	RTE_SET_USED(user_data);
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -6140,6 +6764,8 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 	uint32_t idx = act_idx & ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
 
 	switch (type) {
+	case MLX5_INDIRECT_ACTION_TYPE_AGE:
+		return mlx5_hws_age_action_update(priv, idx, update, error);
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
 		return flow_hw_conntrack_update(dev, queue, update, act_idx, error);
 	case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
@@ -6173,11 +6799,15 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 			return rte_flow_error_set(error, EINVAL,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL, "Unable to wait for ASO meter CQE");
-		return 0;
-	default:
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_RSS:
+		return flow_dv_action_update(dev, handle, update, error);
+	default:
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					  "action type not supported");
 	}
-	return flow_dv_action_update(dev, handle, update, error);
+	return 0;
 }
 
 /**
@@ -6186,7 +6816,7 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
  * @param[in] dev
  *   Pointer to the rte_eth_dev structure.
  * @param[in] queue
- *   Which queue to be used..
+ *   Which queue to be used.
  * @param[in] attr
  *   Operation attribute.
  * @param[in] handle
@@ -6208,6 +6838,7 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 {
 	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
 	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+	uint32_t age_idx = act_idx & MLX5_HWS_AGE_IDX_MASK;
 	uint32_t idx = act_idx & ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
@@ -6218,7 +6849,16 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 	RTE_SET_USED(attr);
 	RTE_SET_USED(user_data);
 	switch (type) {
+	case MLX5_INDIRECT_ACTION_TYPE_AGE:
+		return mlx5_hws_age_action_destroy(priv, age_idx, error);
 	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
+		age_idx = mlx5_hws_cnt_age_get(priv->hws_cpool, act_idx);
+		if (age_idx != 0)
+			/*
+			 * If this counter belongs to an indirect AGE action,
+			 * this is the time to update the AGE.
+			 */
+			mlx5_hws_age_nb_cnt_decrease(priv, age_idx);
 		return mlx5_hws_cnt_shared_put(priv->hws_cpool, &act_idx);
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
 		return flow_hw_conntrack_destroy(dev, act_idx, error);
@@ -6243,10 +6883,15 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL, "Unable to wait for ASO meter CQE");
 		mlx5_ipool_free(pool->idx_pool, idx);
-		return 0;
-	default:
+		break;
+	case MLX5_INDIRECT_ACTION_TYPE_RSS:
 		return flow_dv_action_destroy(dev, handle, error);
+	default:
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					  "action type not supported");
 	}
+	return 0;
 }
 
 static int
@@ -6256,13 +6901,14 @@ flow_hw_query_counter(const struct rte_eth_dev *dev, uint32_t counter,
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_hws_cnt *cnt;
 	struct rte_flow_query_count *qc = data;
-	uint32_t iidx = mlx5_hws_cnt_iidx(priv->hws_cpool, counter);
+	uint32_t iidx;
 	uint64_t pkts, bytes;
 
 	if (!mlx5_hws_cnt_id_valid(counter))
 		return rte_flow_error_set(error, EINVAL,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
 				"counter are not available");
+	iidx = mlx5_hws_cnt_iidx(priv->hws_cpool, counter);
 	cnt = &priv->hws_cpool->pool[iidx];
 	__hws_cnt_query_raw(priv->hws_cpool, counter, &pkts, &bytes);
 	qc->hits_set = 1;
@@ -6276,12 +6922,64 @@ flow_hw_query_counter(const struct rte_eth_dev *dev, uint32_t counter,
 	return 0;
 }
 
+/**
+ * Query a flow rule AGE action for aging information.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] age_idx
+ *   Index of AGE action parameter.
+ * @param[out] data
+ *   Data retrieved by the query.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_hw_query_age(const struct rte_eth_dev *dev, uint32_t age_idx, void *data,
+		  struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct mlx5_indexed_pool *ipool = age_info->ages_ipool;
+	struct mlx5_hws_age_param *param = mlx5_ipool_get(ipool, age_idx);
+	struct rte_flow_query_age *resp = data;
+
+	if (!param || !param->timeout)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "age data not available");
+	switch (__atomic_load_n(&param->state, __ATOMIC_RELAXED)) {
+	case HWS_AGE_AGED_OUT_REPORTED:
+	case HWS_AGE_AGED_OUT_NOT_REPORTED:
+		resp->aged = 1;
+		break;
+	case HWS_AGE_CANDIDATE:
+	case HWS_AGE_CANDIDATE_INSIDE_RING:
+		resp->aged = 0;
+		break;
+	case HWS_AGE_FREE:
+		/*
+		 * When state is FREE the flow itself should be invalid.
+		 * Fall-through.
+		 */
+	default:
+		MLX5_ASSERT(0);
+		break;
+	}
+	resp->sec_since_last_hit_valid = !resp->aged;
+	if (resp->sec_since_last_hit_valid)
+		resp->sec_since_last_hit = __atomic_load_n
+				 (&param->sec_since_last_hit, __ATOMIC_RELAXED);
+	return 0;
+}
+
 static int
-flow_hw_query(struct rte_eth_dev *dev,
-	      struct rte_flow *flow __rte_unused,
-	      const struct rte_flow_action *actions __rte_unused,
-	      void *data __rte_unused,
-	      struct rte_flow_error *error __rte_unused)
+flow_hw_query(struct rte_eth_dev *dev, struct rte_flow *flow,
+	      const struct rte_flow_action *actions, void *data,
+	      struct rte_flow_error *error)
 {
 	int ret = -EINVAL;
 	struct rte_flow_hw *hw_flow = (struct rte_flow_hw *)flow;
@@ -6292,7 +6990,11 @@ flow_hw_query(struct rte_eth_dev *dev,
 			break;
 		case RTE_FLOW_ACTION_TYPE_COUNT:
 			ret = flow_hw_query_counter(dev, hw_flow->cnt_id, data,
-						  error);
+						    error);
+			break;
+		case RTE_FLOW_ACTION_TYPE_AGE:
+			ret = flow_hw_query_age(dev, hw_flow->age_idx, data,
+						error);
 			break;
 		default:
 			return rte_flow_error_set(error, ENOTSUP,
@@ -6304,6 +7006,32 @@ flow_hw_query(struct rte_eth_dev *dev,
 	return ret;
 }
 
+/**
+ * Validate indirect action.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] conf
+ *   Shared action configuration.
+ * @param[in] action
+ *   Action specification used to create indirect action.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. Initialized in case of
+ *   error only.
+ *
+ * @return
+ *   0 on success, otherwise negative errno value.
+ */
+static int
+flow_hw_action_validate(struct rte_eth_dev *dev,
+			const struct rte_flow_indir_action_conf *conf,
+			const struct rte_flow_action *action,
+			struct rte_flow_error *err)
+{
+	return flow_hw_action_handle_validate(dev, MLX5_HW_INV_QUEUE, NULL,
+					      conf, action, NULL, err);
+}
+
 /**
  * Create indirect action.
  *
@@ -6393,17 +7121,118 @@ flow_hw_action_query(struct rte_eth_dev *dev,
 {
 	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
 	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+	uint32_t age_idx = act_idx & MLX5_HWS_AGE_IDX_MASK;
 
 	switch (type) {
+	case MLX5_INDIRECT_ACTION_TYPE_AGE:
+		return flow_hw_query_age(dev, age_idx, data, error);
 	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
 		return flow_hw_query_counter(dev, act_idx, data, error);
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
 		return flow_hw_conntrack_query(dev, act_idx, data, error);
-	default:
+	case MLX5_INDIRECT_ACTION_TYPE_RSS:
 		return flow_dv_action_query(dev, handle, data, error);
+	default:
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					  "action type not supported");
 	}
 }
 
+/**
+ * Get aged-out flows of a given port on the given HWS flow queue.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] queue_id
+ *   Flow queue to query. Ignored when RTE_FLOW_PORT_FLAG_STRICT_QUEUE not set.
+ * @param[in, out] contexts
+ *   The address of an array of pointers to the aged-out flows contexts.
+ * @param[in] nb_contexts
+ *   The length of context array pointers.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. Initialized in case of
+ *   error only.
+ *
+ * @return
+ *   If nb_contexts is 0, the number of all aged contexts.
+ *   If nb_contexts is not 0, the number of aged flows reported
+ *   in the context array, otherwise a negative errno value.
+ */
+static int
+flow_hw_get_q_aged_flows(struct rte_eth_dev *dev, uint32_t queue_id,
+			 void **contexts, uint32_t nb_contexts,
+			 struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct rte_ring *r;
+	int nb_flows = 0;
+
+	if (nb_contexts && !contexts)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "empty context");
+	if (priv->hws_strict_queue) {
+		if (queue_id >= age_info->hw_q_age->nb_rings)
+			return rte_flow_error_set(error, EINVAL,
+						RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+						NULL, "invalid queue id");
+		r = age_info->hw_q_age->aged_lists[queue_id];
+	} else {
+		r = age_info->hw_age.aged_list;
+		MLX5_AGE_SET(age_info, MLX5_AGE_TRIGGER);
+	}
+	if (nb_contexts == 0)
+		return rte_ring_count(r);
+	while ((uint32_t)nb_flows < nb_contexts) {
+		uint32_t age_idx;
+
+		if (rte_ring_dequeue_elem(r, &age_idx, sizeof(uint32_t)) < 0)
+			break;
+		/* get the AGE context if the aged-out index is still valid. */
+		contexts[nb_flows] = mlx5_hws_age_context_get(priv, age_idx);
+		if (!contexts[nb_flows])
+			continue;
+		nb_flows++;
+	}
+	return nb_flows;
+}
+
+/**
+ * Get aged-out flows.
+ *
+ * This function is relevant only if RTE_FLOW_PORT_FLAG_STRICT_QUEUE isn't set.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] contexts
+ *   The address of an array of pointers to the aged-out flows contexts.
+ * @param[in] nb_contexts
+ *   The length of context array pointers.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. Initialized in case of
+ *   error only.
+ *
+ * @return
+ *   The number of contexts retrieved on success, otherwise a negative
+ *   errno value.
+ *   If nb_contexts is 0, the number of all aged contexts.
+ *   If nb_contexts is not 0, the number of aged flows reported
+ *   in the context array.
+ */
+static int
+flow_hw_get_aged_flows(struct rte_eth_dev *dev, void **contexts,
+		       uint32_t nb_contexts, struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (priv->hws_strict_queue)
+		DRV_LOG(WARNING,
+			"port %u get aged flows called in strict queue mode.",
+			dev->data->port_id);
+	return flow_hw_get_q_aged_flows(dev, 0, contexts, nb_contexts, error);
+}
+
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.info_get = flow_hw_info_get,
 	.configure = flow_hw_configure,
@@ -6422,12 +7251,14 @@ const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.async_action_create = flow_hw_action_handle_create,
 	.async_action_destroy = flow_hw_action_handle_destroy,
 	.async_action_update = flow_hw_action_handle_update,
-	.action_validate = flow_dv_action_validate,
+	.action_validate = flow_hw_action_validate,
 	.action_create = flow_hw_action_create,
 	.action_destroy = flow_hw_action_destroy,
 	.action_update = flow_hw_action_update,
 	.action_query = flow_hw_action_query,
 	.query = flow_hw_query,
+	.get_aged_flows = flow_hw_get_aged_flows,
+	.get_q_aged_flows = flow_hw_get_q_aged_flows,
 };
 
 /**
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index 7ffaf4c227..81a33ddf09 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -122,7 +122,7 @@ flow_verbs_counter_get_by_idx(struct rte_eth_dev *dev,
 			      struct mlx5_flow_counter_pool **ppool)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_flow_counter_mng *cmng = &priv->sh->cmng;
+	struct mlx5_flow_counter_mng *cmng = &priv->sh->sws_cmng;
 	struct mlx5_flow_counter_pool *pool;
 
 	idx = (idx - 1) & (MLX5_CNT_SHARED_OFFSET - 1);
@@ -215,7 +215,7 @@ static uint32_t
 flow_verbs_counter_new(struct rte_eth_dev *dev, uint32_t id __rte_unused)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_flow_counter_mng *cmng = &priv->sh->cmng;
+	struct mlx5_flow_counter_mng *cmng = &priv->sh->sws_cmng;
 	struct mlx5_flow_counter_pool *pool = NULL;
 	struct mlx5_flow_counter *cnt = NULL;
 	uint32_t n_valid = cmng->n_valid;
diff --git a/drivers/net/mlx5/mlx5_hws_cnt.c b/drivers/net/mlx5/mlx5_hws_cnt.c
index d826ebaa25..9c37700f94 100644
--- a/drivers/net/mlx5/mlx5_hws_cnt.c
+++ b/drivers/net/mlx5/mlx5_hws_cnt.c
@@ -8,6 +8,7 @@
 #include <rte_ring.h>
 #include <mlx5_devx_cmds.h>
 #include <rte_cycles.h>
+#include <rte_eal_paging.h>
 
 #if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
 
@@ -26,8 +27,8 @@ __hws_cnt_id_load(struct mlx5_hws_cnt_pool *cpool)
 	uint32_t preload;
 	uint32_t q_num = cpool->cache->q_num;
 	uint32_t cnt_num = mlx5_hws_cnt_pool_get_size(cpool);
-	cnt_id_t cnt_id, iidx = 0;
-	uint32_t qidx;
+	cnt_id_t cnt_id;
+	uint32_t qidx, iidx = 0;
 	struct rte_ring *qcache = NULL;
 
 	/*
@@ -86,6 +87,174 @@ __mlx5_hws_cnt_svc(struct mlx5_dev_ctx_shared *sh,
 	} while (reset_cnt_num > 0);
 }
 
+/**
+ * Release AGE parameter.
+ *
+ * @param priv
+ *   Pointer to the port private data structure.
+ * @param own_cnt_index
+ *   Counter ID created only for this AGE, to be released.
+ *   Zero means there is no such counter.
+ * @param age_ipool
+ *   Pointer to AGE parameter indexed pool.
+ * @param idx
+ *   Index of AGE parameter in the indexed pool.
+ */
+static void
+mlx5_hws_age_param_free(struct mlx5_priv *priv, cnt_id_t own_cnt_index,
+			struct mlx5_indexed_pool *age_ipool, uint32_t idx)
+{
+	if (own_cnt_index) {
+		struct mlx5_hws_cnt_pool *cpool = priv->hws_cpool;
+
+		MLX5_ASSERT(mlx5_hws_cnt_is_shared(cpool, own_cnt_index));
+		mlx5_hws_cnt_shared_put(cpool, &own_cnt_index);
+	}
+	mlx5_ipool_free(age_ipool, idx);
+}
+
+/**
+ * Check and callback event for new aged flow in the HWS counter pool.
+ *
+ * @param[in] priv
+ *   Pointer to port private object.
+ * @param[in] cpool
+ *   Pointer to current counter pool.
+ */
+static void
+mlx5_hws_aging_check(struct mlx5_priv *priv, struct mlx5_hws_cnt_pool *cpool)
+{
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct flow_counter_stats *stats = cpool->raw_mng->raw;
+	struct mlx5_hws_age_param *param;
+	struct rte_ring *r;
+	const uint64_t curr_time = MLX5_CURR_TIME_SEC;
+	const uint32_t time_delta = curr_time - cpool->time_of_last_age_check;
+	uint32_t nb_alloc_cnts = mlx5_hws_cnt_pool_get_size(cpool);
+	uint16_t expected1 = HWS_AGE_CANDIDATE;
+	uint16_t expected2 = HWS_AGE_CANDIDATE_INSIDE_RING;
+	uint32_t i;
+
+	cpool->time_of_last_age_check = curr_time;
+	for (i = 0; i < nb_alloc_cnts; ++i) {
+		uint32_t age_idx = cpool->pool[i].age_idx;
+		uint64_t hits;
+
+		if (!cpool->pool[i].in_used || age_idx == 0)
+			continue;
+		param = mlx5_ipool_get(age_info->ages_ipool, age_idx);
+		if (unlikely(param == NULL)) {
+			/*
+			 * When an AGE action uses an indirect counter, it is
+			 * the user's responsibility not to use this indirect
+			 * counter without the AGE action.
+			 * If this counter is used after the AGE was freed, the
+			 * AGE index is invalid and using it here will cause a
+			 * segmentation fault.
+			 */
+			DRV_LOG(WARNING,
+				"Counter %u is lost his AGE, it is unused.", i);
+			continue;
+		}
+		if (param->timeout == 0)
+			continue;
+		switch (__atomic_load_n(&param->state, __ATOMIC_RELAXED)) {
+		case HWS_AGE_AGED_OUT_NOT_REPORTED:
+		case HWS_AGE_AGED_OUT_REPORTED:
+			/* Already aged-out, no action is needed. */
+			continue;
+		case HWS_AGE_CANDIDATE:
+		case HWS_AGE_CANDIDATE_INSIDE_RING:
+			/* This AGE is a candidate to be aged-out, go on checking. */
+			break;
+		case HWS_AGE_FREE:
+			/*
+			 * An AGE parameter with state "FREE" cannot be pointed
+			 * to by any counter since the counter is destroyed first.
+			 * Fall-through.
+			 */
+		default:
+			MLX5_ASSERT(0);
+			continue;
+		}
+		hits = rte_be_to_cpu_64(stats[i].hits);
+		if (param->nb_cnts == 1) {
+			if (hits != param->accumulator_last_hits) {
+				__atomic_store_n(&param->sec_since_last_hit, 0,
+						 __ATOMIC_RELAXED);
+				param->accumulator_last_hits = hits;
+				continue;
+			}
+		} else {
+			param->accumulator_hits += hits;
+			param->accumulator_cnt++;
+			if (param->accumulator_cnt < param->nb_cnts)
+				continue;
+			param->accumulator_cnt = 0;
+			if (param->accumulator_last_hits !=
+						param->accumulator_hits) {
+				__atomic_store_n(&param->sec_since_last_hit,
+						 0, __ATOMIC_RELAXED);
+				param->accumulator_last_hits =
+							param->accumulator_hits;
+				param->accumulator_hits = 0;
+				continue;
+			}
+			param->accumulator_hits = 0;
+		}
+		if (__atomic_add_fetch(&param->sec_since_last_hit, time_delta,
+				       __ATOMIC_RELAXED) <=
+		   __atomic_load_n(&param->timeout, __ATOMIC_RELAXED))
+			continue;
+		/* Prepare the relevant ring for this AGE parameter */
+		if (priv->hws_strict_queue)
+			r = age_info->hw_q_age->aged_lists[param->queue_id];
+		else
+			r = age_info->hw_age.aged_list;
+		/* Change the state atomically and insert it into the ring. */
+		if (__atomic_compare_exchange_n(&param->state, &expected1,
+						HWS_AGE_AGED_OUT_NOT_REPORTED,
+						false, __ATOMIC_RELAXED,
+						__ATOMIC_RELAXED)) {
+			int ret = rte_ring_enqueue_burst_elem(r, &age_idx,
+							      sizeof(uint32_t),
+							      1, NULL);
+
+			/*
+			 * The ring doesn't have enough room for this entry,
+			 * so the state is put back for the next second.
+			 *
+			 * FIXME: if the flow gets traffic before the next
+			 *        second, this "aged out" event is lost; to be
+			 *        fixed later when the ring is filled in bulks.
+			 */
+			expected2 = HWS_AGE_AGED_OUT_NOT_REPORTED;
+			if (ret == 0 &&
+			    !__atomic_compare_exchange_n(&param->state,
+							 &expected2, expected1,
+							 false,
+							 __ATOMIC_RELAXED,
+							 __ATOMIC_RELAXED) &&
+			    expected2 == HWS_AGE_FREE)
+				mlx5_hws_age_param_free(priv,
+							param->own_cnt_index,
+							age_info->ages_ipool,
+							age_idx);
+			/* The event is irrelevant in strict queue mode. */
+			if (!priv->hws_strict_queue)
+				MLX5_AGE_SET(age_info, MLX5_AGE_EVENT_NEW);
+		} else {
+			__atomic_compare_exchange_n(&param->state, &expected2,
+						  HWS_AGE_AGED_OUT_NOT_REPORTED,
+						  false, __ATOMIC_RELAXED,
+						  __ATOMIC_RELAXED);
+		}
+	}
+	/* The event is irrelevant in strict queue mode. */
+	if (!priv->hws_strict_queue)
+		mlx5_age_event_prepare(priv->sh);
+}
+
 static void
 mlx5_hws_cnt_raw_data_free(struct mlx5_dev_ctx_shared *sh,
 			   struct mlx5_hws_cnt_raw_data_mng *mng)
@@ -104,12 +273,14 @@ mlx5_hws_cnt_raw_data_alloc(struct mlx5_dev_ctx_shared *sh, uint32_t n)
 	struct mlx5_hws_cnt_raw_data_mng *mng = NULL;
 	int ret;
 	size_t sz = n * sizeof(struct flow_counter_stats);
+	size_t pgsz = rte_mem_page_size();
 
+	MLX5_ASSERT(pgsz > 0);
 	mng = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO, sizeof(*mng), 0,
 			SOCKET_ID_ANY);
 	if (mng == NULL)
 		goto error;
-	mng->raw = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO, sz, 0,
+	mng->raw = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO, sz, pgsz,
 			SOCKET_ID_ANY);
 	if (mng->raw == NULL)
 		goto error;
@@ -146,6 +317,9 @@ mlx5_hws_cnt_svc(void *opaque)
 			    opriv->sh == sh &&
 			    opriv->hws_cpool != NULL) {
 				__mlx5_hws_cnt_svc(sh, opriv->hws_cpool);
+				if (opriv->hws_age_req)
+					mlx5_hws_aging_check(opriv,
+							     opriv->hws_cpool);
 			}
 		}
 		query_cycle = rte_rdtsc() - start_cycle;
@@ -158,8 +332,9 @@ mlx5_hws_cnt_svc(void *opaque)
 }
 
 struct mlx5_hws_cnt_pool *
-mlx5_hws_cnt_pool_init(const struct mlx5_hws_cnt_pool_cfg *pcfg,
-		const struct mlx5_hws_cache_param *ccfg)
+mlx5_hws_cnt_pool_init(struct mlx5_dev_ctx_shared *sh,
+		       const struct mlx5_hws_cnt_pool_cfg *pcfg,
+		       const struct mlx5_hws_cache_param *ccfg)
 {
 	char mz_name[RTE_MEMZONE_NAMESIZE];
 	struct mlx5_hws_cnt_pool *cntp;
@@ -185,16 +360,26 @@ mlx5_hws_cnt_pool_init(const struct mlx5_hws_cnt_pool_cfg *pcfg,
 	cntp->cache->preload_sz = ccfg->preload_sz;
 	cntp->cache->threshold = ccfg->threshold;
 	cntp->cache->q_num = ccfg->q_num;
+	if (pcfg->request_num > sh->hws_max_nb_counters) {
+		DRV_LOG(ERR, "Counter number %u "
+			"is greater than the maximum supported (%u).",
+			pcfg->request_num, sh->hws_max_nb_counters);
+		goto error;
+	}
 	cnt_num = pcfg->request_num * (100 + pcfg->alloc_factor) / 100;
 	if (cnt_num > UINT32_MAX) {
 		DRV_LOG(ERR, "counter number %"PRIu64" is out of 32bit range",
 			cnt_num);
 		goto error;
 	}
+	/*
+	 * When the requested counter number is supported but the allocation
+	 * factor takes it over the maximum size, the factor is reduced.
+	 */
+	cnt_num = RTE_MIN((uint32_t)cnt_num, sh->hws_max_nb_counters);
 	cntp->pool = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO,
-			sizeof(struct mlx5_hws_cnt) *
-			pcfg->request_num * (100 + pcfg->alloc_factor) / 100,
-			0, SOCKET_ID_ANY);
+				 sizeof(struct mlx5_hws_cnt) * cnt_num,
+				 0, SOCKET_ID_ANY);
 	if (cntp->pool == NULL)
 		goto error;
 	snprintf(mz_name, sizeof(mz_name), "%s_F_RING", pcfg->name);
@@ -231,6 +416,8 @@ mlx5_hws_cnt_pool_init(const struct mlx5_hws_cnt_pool_cfg *pcfg,
 		if (cntp->cache->qcache[qidx] == NULL)
 			goto error;
 	}
+	/* Initialize the time for aging-out calculation. */
+	cntp->time_of_last_age_check = MLX5_CURR_TIME_SEC;
 	return cntp;
 error:
 	mlx5_hws_cnt_pool_deinit(cntp);
@@ -297,19 +484,17 @@ mlx5_hws_cnt_pool_dcs_alloc(struct mlx5_dev_ctx_shared *sh,
 			    struct mlx5_hws_cnt_pool *cpool)
 {
 	struct mlx5_hca_attr *hca_attr = &sh->cdev->config.hca_attr;
-	uint32_t max_log_bulk_sz = 0;
+	uint32_t max_log_bulk_sz = sh->hws_max_log_bulk_sz;
 	uint32_t log_bulk_sz;
-	uint32_t idx, alloced = 0;
+	uint32_t idx, alloc_candidate, alloced = 0;
 	unsigned int cnt_num = mlx5_hws_cnt_pool_get_size(cpool);
 	struct mlx5_devx_counter_attr attr = {0};
 	struct mlx5_devx_obj *dcs;
 
 	if (hca_attr->flow_counter_bulk_log_max_alloc == 0) {
-		DRV_LOG(ERR,
-			"Fw doesn't support bulk log max alloc");
+		DRV_LOG(ERR, "Fw doesn't support bulk log max alloc");
 		return -1;
 	}
-	max_log_bulk_sz = 23; /* hard code to 8M (1 << 23). */
 	cnt_num = RTE_ALIGN_CEIL(cnt_num, 4); /* minimal 4 counter in bulk. */
 	log_bulk_sz = RTE_MIN(max_log_bulk_sz, rte_log2_u32(cnt_num));
 	attr.pd = sh->cdev->pdn;
@@ -327,18 +512,23 @@ mlx5_hws_cnt_pool_dcs_alloc(struct mlx5_dev_ctx_shared *sh,
 	cpool->dcs_mng.dcs[0].iidx = 0;
 	alloced = cpool->dcs_mng.dcs[0].batch_sz;
 	if (cnt_num > cpool->dcs_mng.dcs[0].batch_sz) {
-		for (; idx < MLX5_HWS_CNT_DCS_NUM; idx++) {
+		while (idx < MLX5_HWS_CNT_DCS_NUM) {
 			attr.flow_counter_bulk_log_size = --max_log_bulk_sz;
+			alloc_candidate = RTE_BIT32(max_log_bulk_sz);
+			if (alloced + alloc_candidate > sh->hws_max_nb_counters)
+				continue;
 			dcs = mlx5_devx_cmd_flow_counter_alloc_general
 				(sh->cdev->ctx, &attr);
 			if (dcs == NULL)
 				goto error;
 			cpool->dcs_mng.dcs[idx].obj = dcs;
-			cpool->dcs_mng.dcs[idx].batch_sz =
-				(1 << max_log_bulk_sz);
+			cpool->dcs_mng.dcs[idx].batch_sz = alloc_candidate;
 			cpool->dcs_mng.dcs[idx].iidx = alloced;
 			alloced += cpool->dcs_mng.dcs[idx].batch_sz;
 			cpool->dcs_mng.batch_total++;
+			if (alloced >= cnt_num)
+				break;
+			idx++;
 		}
 	}
 	return 0;
@@ -445,7 +635,7 @@ mlx5_hws_cnt_pool_create(struct rte_eth_dev *dev,
 			dev->data->port_id);
 	pcfg.name = mp_name;
 	pcfg.request_num = pattr->nb_counters;
-	cpool = mlx5_hws_cnt_pool_init(&pcfg, &cparam);
+	cpool = mlx5_hws_cnt_pool_init(priv->sh, &pcfg, &cparam);
 	if (cpool == NULL)
 		goto error;
 	ret = mlx5_hws_cnt_pool_dcs_alloc(priv->sh, cpool);
@@ -525,4 +715,533 @@ mlx5_hws_cnt_svc_deinit(struct mlx5_dev_ctx_shared *sh)
 	sh->cnt_svc = NULL;
 }
 
+/**
+ * Destroy AGE action.
+ *
+ * @param priv
+ *   Pointer to the port private data structure.
+ * @param idx
+ *   Index of AGE parameter.
+ * @param error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_hws_age_action_destroy(struct mlx5_priv *priv, uint32_t idx,
+			    struct rte_flow_error *error)
+{
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct mlx5_indexed_pool *ipool = age_info->ages_ipool;
+	struct mlx5_hws_age_param *param = mlx5_ipool_get(ipool, idx);
+
+	if (param == NULL)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  "invalid AGE parameter index");
+	switch (__atomic_exchange_n(&param->state, HWS_AGE_FREE,
+				    __ATOMIC_RELAXED)) {
+	case HWS_AGE_CANDIDATE:
+	case HWS_AGE_AGED_OUT_REPORTED:
+		mlx5_hws_age_param_free(priv, param->own_cnt_index, ipool, idx);
+		break;
+	case HWS_AGE_AGED_OUT_NOT_REPORTED:
+	case HWS_AGE_CANDIDATE_INSIDE_RING:
+		/*
+		 * In both cases AGE is inside the ring. Change the state here
+		 * and destroy it later when it is taken out of the ring.
+		 */
+		break;
+	case HWS_AGE_FREE:
+		/*
+		 * If the index is valid and the state is FREE, it means this
+		 * AGE has been freed from the user's perspective but not by
+		 * the PMD, since it is still inside the ring.
+		 */
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  "this AGE has already been released");
+	default:
+		MLX5_ASSERT(0);
+		break;
+	}
+	return 0;
+}
+
+/**
+ * Create AGE action parameter.
+ *
+ * @param[in] priv
+ *   Pointer to the port private data structure.
+ * @param[in] queue_id
+ *   Which HWS queue to be used.
+ * @param[in] shared
+ *   Whether it is an indirect AGE action.
+ * @param[in] flow_idx
+ *   Flow index from indexed pool.
+ *   For an indirect AGE action it has no effect.
+ * @param[in] age
+ *   Pointer to the aging action configuration.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   Index to AGE action parameter on success, 0 otherwise.
+ */
+uint32_t
+mlx5_hws_age_action_create(struct mlx5_priv *priv, uint32_t queue_id,
+			   bool shared, const struct rte_flow_action_age *age,
+			   uint32_t flow_idx, struct rte_flow_error *error)
+{
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct mlx5_indexed_pool *ipool = age_info->ages_ipool;
+	struct mlx5_hws_age_param *param;
+	uint32_t age_idx;
+
+	param = mlx5_ipool_malloc(ipool, &age_idx);
+	if (param == NULL) {
+		rte_flow_error_set(error, ENOMEM,
+				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				   "cannot allocate AGE parameter");
+		return 0;
+	}
+	MLX5_ASSERT(__atomic_load_n(&param->state,
+				    __ATOMIC_RELAXED) == HWS_AGE_FREE);
+	if (shared) {
+		param->nb_cnts = 0;
+		param->accumulator_hits = 0;
+		param->accumulator_cnt = 0;
+		flow_idx = age_idx;
+	} else {
+		param->nb_cnts = 1;
+	}
+	param->context = age->context ? age->context :
+					(void *)(uintptr_t)flow_idx;
+	param->timeout = age->timeout;
+	param->queue_id = queue_id;
+	param->accumulator_last_hits = 0;
+	param->own_cnt_index = 0;
+	param->sec_since_last_hit = 0;
+	param->state = HWS_AGE_CANDIDATE;
+	return age_idx;
+}
+
+/**
+ * Update indirect AGE action parameter.
+ *
+ * @param[in] priv
+ *   Pointer to the port private data structure.
+ * @param[in] idx
+ *   Index of AGE parameter.
+ * @param[in] update
+ *   Update value.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_hws_age_action_update(struct mlx5_priv *priv, uint32_t idx,
+			   const void *update, struct rte_flow_error *error)
+{
+#ifdef MLX5_HAVE_RTE_FLOW_Q_AGE
+	const struct rte_flow_update_age *update_ade = update;
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct mlx5_indexed_pool *ipool = age_info->ages_ipool;
+	struct mlx5_hws_age_param *param = mlx5_ipool_get(ipool, idx);
+	bool sec_since_last_hit_reset = false;
+	bool state_update = false;
+
+	if (param == NULL)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  "invalid AGE parameter index");
+	if (update_ade->timeout_valid) {
+		uint32_t old_timeout = __atomic_exchange_n(&param->timeout,
+							   update_ade->timeout,
+							   __ATOMIC_RELAXED);
+
+		if (old_timeout == 0)
+			sec_since_last_hit_reset = true;
+		else if (old_timeout < update_ade->timeout ||
+			 update_ade->timeout == 0)
+			/*
+			 * When timeout is increased, aged-out flows might be
+			 * active again and state should be updated accordingly.
+			 * When the new timeout is 0, the state is updated so
+			 * that the flow is no longer reported as aged-out.
+			 */
+			state_update = true;
+	}
+	if (update_ade->touch) {
+		sec_since_last_hit_reset = true;
+		state_update = true;
+	}
+	if (sec_since_last_hit_reset)
+		__atomic_store_n(&param->sec_since_last_hit, 0,
+				 __ATOMIC_RELAXED);
+	if (state_update) {
+		uint16_t expected = HWS_AGE_AGED_OUT_NOT_REPORTED;
+
+		/*
+		 * Change states of aged-out flows to active:
+		 *  - AGED_OUT_NOT_REPORTED -> CANDIDATE_INSIDE_RING
+		 *  - AGED_OUT_REPORTED -> CANDIDATE
+		 */
+		if (!__atomic_compare_exchange_n(&param->state, &expected,
+						 HWS_AGE_CANDIDATE_INSIDE_RING,
+						 false, __ATOMIC_RELAXED,
+						 __ATOMIC_RELAXED) &&
+		    expected == HWS_AGE_AGED_OUT_REPORTED)
+			__atomic_store_n(&param->state, HWS_AGE_CANDIDATE,
+					 __ATOMIC_RELAXED);
+	}
+	return 0;
+#else
+	RTE_SET_USED(priv);
+	RTE_SET_USED(idx);
+	RTE_SET_USED(update);
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				  "update age action not supported");
+#endif
+}
+
+/**
+ * Get the AGE context if the aged-out index is still valid.
+ *
+ * @param priv
+ *   Pointer to the port private data structure.
+ * @param idx
+ *   Index of AGE parameter.
+ *
+ * @return
+ *   AGE context if the index is still aged-out, NULL otherwise.
+ */
+void *
+mlx5_hws_age_context_get(struct mlx5_priv *priv, uint32_t idx)
+{
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct mlx5_indexed_pool *ipool = age_info->ages_ipool;
+	struct mlx5_hws_age_param *param = mlx5_ipool_get(ipool, idx);
+	uint16_t expected = HWS_AGE_AGED_OUT_NOT_REPORTED;
+
+	MLX5_ASSERT(param != NULL);
+	if (__atomic_compare_exchange_n(&param->state, &expected,
+					HWS_AGE_AGED_OUT_REPORTED, false,
+					__ATOMIC_RELAXED, __ATOMIC_RELAXED))
+		return param->context;
+	switch (expected) {
+	case HWS_AGE_FREE:
+		/*
+		 * This AGE could not have been destroyed since it was inside
+		 * the ring. Its state has been updated, and now it is
+		 * actually destroyed.
+		 */
+		mlx5_hws_age_param_free(priv, param->own_cnt_index, ipool, idx);
+		break;
+	case HWS_AGE_CANDIDATE_INSIDE_RING:
+		__atomic_store_n(&param->state, HWS_AGE_CANDIDATE,
+				 __ATOMIC_RELAXED);
+		break;
+	case HWS_AGE_CANDIDATE:
+		/*
+		 * Only the background thread pushes to the ring and it never
+		 * pushes this state. When an AGE inside the ring becomes a
+		 * candidate, it gets the special state
+		 * HWS_AGE_CANDIDATE_INSIDE_RING.
+		 * Fall-through.
+		 */
+	case HWS_AGE_AGED_OUT_REPORTED:
+		/*
+		 * Only this thread (doing query) may write this state, and it
+		 * happens only after the query thread takes it out of the ring.
+		 * Fall-through.
+		 */
+	case HWS_AGE_AGED_OUT_NOT_REPORTED:
+		/*
+		 * In this case the compare returns true and the function
+		 * returns the context immediately.
+		 * Fall-through.
+		 */
+	default:
+		MLX5_ASSERT(0);
+		break;
+	}
+	return NULL;
+}
+
+#ifdef RTE_ARCH_64
+#define MLX5_HWS_AGED_OUT_RING_SIZE_MAX UINT32_MAX
+#else
+#define MLX5_HWS_AGED_OUT_RING_SIZE_MAX RTE_BIT32(8)
+#endif
+
+/**
+ * Get the size of aged out ring list for each queue.
+ *
+ * The size is one percent of nb_counters divided by nb_queues.
+ * The ring size must be a power of 2, so it is aligned up to the next
+ * power of 2. On 32-bit systems, the size is limited to 256.
+ *
+ * This function is called when RTE_FLOW_PORT_FLAG_STRICT_QUEUE is on.
+ *
+ * @param nb_counters
+ *   Final number of allocated counter in the pool.
+ * @param nb_queues
+ *   Number of HWS queues in this port.
+ *
+ * @return
+ *   Size of aged out ring per queue.
+ */
+static __rte_always_inline uint32_t
+mlx5_hws_aged_out_q_ring_size_get(uint32_t nb_counters, uint32_t nb_queues)
+{
+	uint32_t size = rte_align32pow2((nb_counters / 100) / nb_queues);
+	uint32_t max_size = MLX5_HWS_AGED_OUT_RING_SIZE_MAX;
+
+	return RTE_MIN(size, max_size);
+}
+
+/**
+ * Get the size of the aged out ring list.
+ *
+ * The size is one percent of nb_counters.
+ * The ring size must be a power of 2, so it is aligned up to the next
+ * power of 2. On 32-bit systems, the size is limited to 256.
+ *
+ * This function is called when RTE_FLOW_PORT_FLAG_STRICT_QUEUE is off.
+ *
+ * @param nb_counters
+ *   Final number of allocated counter in the pool.
+ *
+ * @return
+ *   Size of the aged out ring list.
+ */
+static __rte_always_inline uint32_t
+mlx5_hws_aged_out_ring_size_get(uint32_t nb_counters)
+{
+	uint32_t size = rte_align32pow2(nb_counters / 100);
+	uint32_t max_size = MLX5_HWS_AGED_OUT_RING_SIZE_MAX;
+
+	return RTE_MIN(size, max_size);
+}
+
+/**
+ * Initialize the shared aging list information per port.
+ *
+ * @param dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param nb_queues
+ *   Number of HWS queues.
+ * @param strict_queue
+ *   Indicator whether strict_queue mode is enabled.
+ * @param ring_size
+ *   Size of aged-out ring for creation.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_hws_age_info_init(struct rte_eth_dev *dev, uint16_t nb_queues,
+		       bool strict_queue, uint32_t ring_size)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	uint32_t flags = RING_F_SP_ENQ | RING_F_SC_DEQ | RING_F_EXACT_SZ;
+	char mz_name[RTE_MEMZONE_NAMESIZE];
+	struct rte_ring *r = NULL;
+	uint32_t qidx;
+
+	age_info->flags = 0;
+	if (strict_queue) {
+		size_t size = sizeof(*age_info->hw_q_age) +
+			      sizeof(struct rte_ring *) * nb_queues;
+
+		age_info->hw_q_age = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO,
+						 size, 0, SOCKET_ID_ANY);
+		if (age_info->hw_q_age == NULL)
+			return -ENOMEM;
+		for (qidx = 0; qidx < nb_queues; ++qidx) {
+			snprintf(mz_name, sizeof(mz_name),
+				 "port_%u_queue_%u_aged_out_ring",
+				 dev->data->port_id, qidx);
+			r = rte_ring_create(mz_name, ring_size, SOCKET_ID_ANY,
+					    flags);
+			if (r == NULL) {
+				DRV_LOG(ERR, "\"%s\" creation failed: %s",
+					mz_name, rte_strerror(rte_errno));
+				goto error;
+			}
+			age_info->hw_q_age->aged_lists[qidx] = r;
+			DRV_LOG(DEBUG,
+				"\"%s\" is successfully created (size=%u).",
+				mz_name, ring_size);
+		}
+		age_info->hw_q_age->nb_rings = nb_queues;
+	} else {
+		snprintf(mz_name, sizeof(mz_name), "port_%u_aged_out_ring",
+			 dev->data->port_id);
+		r = rte_ring_create(mz_name, ring_size, SOCKET_ID_ANY, flags);
+		if (r == NULL) {
+			DRV_LOG(ERR, "\"%s\" creation failed: %s", mz_name,
+				rte_strerror(rte_errno));
+			return -rte_errno;
+		}
+		age_info->hw_age.aged_list = r;
+		DRV_LOG(DEBUG, "\"%s\" is successfully created (size=%u).",
+			mz_name, ring_size);
+		/* In non "strict_queue" mode, initialize the event. */
+		MLX5_AGE_SET(age_info, MLX5_AGE_TRIGGER);
+	}
+	return 0;
+error:
+	MLX5_ASSERT(strict_queue);
+	while (qidx--)
+		rte_ring_free(age_info->hw_q_age->aged_lists[qidx]);
+	rte_free(age_info->hw_q_age);
+	return -1;
+}
+
+/**
+ * Cleanup aged-out ring before destroying.
+ *
+ * @param priv
+ *   Pointer to port private object.
+ * @param r
+ *   Pointer to aged-out ring object.
+ */
+static void
+mlx5_hws_aged_out_ring_cleanup(struct mlx5_priv *priv, struct rte_ring *r)
+{
+	int ring_size = rte_ring_count(r);
+
+	while (ring_size > 0) {
+		uint32_t age_idx = 0;
+
+		if (rte_ring_dequeue_elem(r, &age_idx, sizeof(uint32_t)) < 0)
+			break;
+		/* Get the AGE context if the aged-out index is still valid. */
+		mlx5_hws_age_context_get(priv, age_idx);
+		ring_size--;
+	}
+	rte_ring_free(r);
+}
+
+/**
+ * Destroy the shared aging list information per port.
+ *
+ * @param priv
+ *   Pointer to port private object.
+ */
+static void
+mlx5_hws_age_info_destroy(struct mlx5_priv *priv)
+{
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	uint16_t nb_queues = age_info->hw_q_age->nb_rings;
+	struct rte_ring *r;
+
+	if (priv->hws_strict_queue) {
+		uint32_t qidx;
+
+		for (qidx = 0; qidx < nb_queues; ++qidx) {
+			r = age_info->hw_q_age->aged_lists[qidx];
+			mlx5_hws_aged_out_ring_cleanup(priv, r);
+		}
+		mlx5_free(age_info->hw_q_age);
+	} else {
+		r = age_info->hw_age.aged_list;
+		mlx5_hws_aged_out_ring_cleanup(priv, r);
+	}
+}
+
+/**
+ * Initialize the aging mechanism per port.
+ *
+ * @param dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param attr
+ *   Port configuration attributes.
+ * @param nb_queues
+ *   Number of HWS queues.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_hws_age_pool_init(struct rte_eth_dev *dev,
+		       const struct rte_flow_port_attr *attr,
+		       uint16_t nb_queues)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct mlx5_indexed_pool_config cfg = {
+		.size =
+		      RTE_CACHE_LINE_ROUNDUP(sizeof(struct mlx5_hws_age_param)),
+		.trunk_size = 1 << 12,
+		.per_core_cache = 1 << 13,
+		.need_lock = 1,
+		.release_mem_en = !!priv->sh->config.reclaim_mode,
+		.malloc = mlx5_malloc,
+		.free = mlx5_free,
+		.type = "mlx5_hws_age_pool",
+	};
+	bool strict_queue = false;
+	uint32_t nb_alloc_cnts;
+	uint32_t rsize;
+	uint32_t nb_ages_updated;
+	int ret;
+
+#ifdef MLX5_HAVE_RTE_FLOW_Q_AGE
+	strict_queue = !!(attr->flags & RTE_FLOW_PORT_FLAG_STRICT_QUEUE);
+#endif
+	MLX5_ASSERT(priv->hws_cpool);
+	nb_alloc_cnts = mlx5_hws_cnt_pool_get_size(priv->hws_cpool);
+	if (strict_queue) {
+		rsize = mlx5_hws_aged_out_q_ring_size_get(nb_alloc_cnts,
+							  nb_queues);
+		nb_ages_updated = rsize * nb_queues + attr->nb_aging_objects;
+	} else {
+		rsize = mlx5_hws_aged_out_ring_size_get(nb_alloc_cnts);
+		nb_ages_updated = rsize + attr->nb_aging_objects;
+	}
+	ret = mlx5_hws_age_info_init(dev, nb_queues, strict_queue, rsize);
+	if (ret < 0)
+		return ret;
+	cfg.max_idx = rte_align32pow2(nb_ages_updated);
+	if (cfg.max_idx <= cfg.trunk_size) {
+		cfg.per_core_cache = 0;
+		cfg.trunk_size = cfg.max_idx;
+	} else if (cfg.max_idx <= MLX5_HW_IPOOL_SIZE_THRESHOLD) {
+		cfg.per_core_cache = MLX5_HW_IPOOL_CACHE_MIN;
+	}
+	age_info->ages_ipool = mlx5_ipool_create(&cfg);
+	if (age_info->ages_ipool == NULL) {
+		mlx5_hws_age_info_destroy(priv);
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	priv->hws_age_req = 1;
+	return 0;
+}
+
+/**
+ * Cleanup all aging resources per port.
+ *
+ * @param priv
+ *   Pointer to port private object.
+ */
+void
+mlx5_hws_age_pool_destroy(struct mlx5_priv *priv)
+{
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+
+	MLX5_ASSERT(priv->hws_age_req);
+	mlx5_hws_age_info_destroy(priv);
+	mlx5_ipool_destroy(age_info->ages_ipool);
+	age_info->ages_ipool = NULL;
+	priv->hws_age_req = 0;
+}
+
 #endif
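
For reference, the ring sizing above boils down to: take one percent of the
counter pool (split per queue in strict-queue mode), align it up to a power of
two and cap it (UINT32_MAX on 64-bit, 256 on 32-bit). A standalone sketch of
that arithmetic, illustrative only and not part of this patch (it reuses
rte_align32pow2() and RTE_MIN() from rte_common.h):

#include <stdint.h>
#include <stdio.h>
#include <rte_common.h>	/* rte_align32pow2(), RTE_MIN() */

/* One percent of the counters (per queue), aligned up to a power of 2. */
static uint32_t
aged_out_ring_size(uint32_t nb_counters, uint32_t nb_queues, uint32_t cap)
{
	uint32_t size = rte_align32pow2((nb_counters / 100) / nb_queues);

	return RTE_MIN(size, cap);
}

int
main(void)
{
	/* 1M counters over 4 queues: 1% per queue is 2621, aligned to 4096. */
	printf("%u\n", aged_out_ring_size(1 << 20, 4, UINT32_MAX));
	return 0;
}
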
diff --git a/drivers/net/mlx5/mlx5_hws_cnt.h b/drivers/net/mlx5/mlx5_hws_cnt.h
index 5fab4ba597..e311923f71 100644
--- a/drivers/net/mlx5/mlx5_hws_cnt.h
+++ b/drivers/net/mlx5/mlx5_hws_cnt.h
@@ -10,26 +10,26 @@
 #include "mlx5_flow.h"
 
 /*
- * COUNTER ID's layout
+ * HWS COUNTER ID's layout
  *       3                   2                   1                   0
  *     1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
  *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- *    | T |       | D |                                               |
- *    ~ Y |       | C |                    IDX                        ~
- *    | P |       | S |                                               |
+ *    |  T  |     | D |                                               |
+ *    ~  Y  |     | C |                    IDX                        ~
+ *    |  P  |     | S |                                               |
  *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  *
- *    Bit 31:30 = TYPE = MLX5_INDIRECT_ACTION_TYPE_COUNT = b'10
+ *    Bit 31:29 = TYPE = MLX5_INDIRECT_ACTION_TYPE_COUNT = b'10
  *    Bit 25:24 = DCS index
  *    Bit 23:00 = IDX in this counter belonged DCS bulk.
  */
-typedef uint32_t cnt_id_t;
 
-#define MLX5_HWS_CNT_DCS_NUM 4
 #define MLX5_HWS_CNT_DCS_IDX_OFFSET 24
 #define MLX5_HWS_CNT_DCS_IDX_MASK 0x3
 #define MLX5_HWS_CNT_IDX_MASK ((1UL << MLX5_HWS_CNT_DCS_IDX_OFFSET) - 1)
 
+#define MLX5_HWS_AGE_IDX_MASK (RTE_BIT32(MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1)
+
 struct mlx5_hws_cnt_dcs {
 	void *dr_action;
 	uint32_t batch_sz;
@@ -44,12 +44,22 @@ struct mlx5_hws_cnt_dcs_mng {
 
 struct mlx5_hws_cnt {
 	struct flow_counter_stats reset;
+	bool in_used; /* Whether this counter is in use or in the pool. */
 	union {
-		uint32_t share: 1;
-		/*
-		 * share will be set to 1 when this counter is used as indirect
-		 * action. Only meaningful when user own this counter.
-		 */
+		struct {
+			uint32_t share:1;
+			/*
+			 * share will be set to 1 when this counter is used as
+			 * indirect action.
+			 */
+			uint32_t age_idx:24;
+			/*
+			 * When this counter is used for aging, it saves the
+			 * index of the AGE parameter. For a pure counter
+			 * (without aging) this index is zero.
+			 */
+		};
+		/* Only meaningful when the user owns this counter. */
 		uint32_t query_gen_when_free;
 		/*
 		 * When PMD own this counter (user put back counter to PMD
@@ -96,8 +106,48 @@ struct mlx5_hws_cnt_pool {
 	struct rte_ring *free_list;
 	struct rte_ring *wait_reset_list;
 	struct mlx5_hws_cnt_pool_caches *cache;
+	uint64_t time_of_last_age_check;
 } __rte_cache_aligned;
 
+/* HWS AGE status. */
+enum {
+	HWS_AGE_FREE, /* Initialized state. */
+	HWS_AGE_CANDIDATE, /* AGE assigned to flows. */
+	HWS_AGE_CANDIDATE_INSIDE_RING,
+	/*
+	 * AGE assigned to flows but still inside the ring. It was aged-out
+	 * but the timeout was changed, so it is in the ring but still a
+	 * candidate.
+	 */
+	HWS_AGE_AGED_OUT_REPORTED,
+	/*
+	 * Aged-out, reported by rte_flow_get_q_aged_flows, waiting for destroy.
+	 */
+	HWS_AGE_AGED_OUT_NOT_REPORTED,
+	/*
+	 * Aged-out, inside the aged-out ring.
+	 * Waiting for rte_flow_get_q_aged_flows and destroy.
+	 */
+};
+
+/* HWS counter age parameter. */
+struct mlx5_hws_age_param {
+	uint32_t timeout; /* Aging timeout in seconds (atomically accessed). */
+	uint32_t sec_since_last_hit;
+	/* Time in seconds since last hit (atomically accessed). */
+	uint16_t state; /* AGE state (atomically accessed). */
+	uint64_t accumulator_last_hits;
+	/* Last total value of hits for comparing. */
+	uint64_t accumulator_hits;
+	/* Accumulator for hits coming from several counters. */
+	uint32_t accumulator_cnt;
+	/* Number of counters that updated the accumulator this second. */
+	uint32_t nb_cnts; /* Number of counters used by this AGE. */
+	uint32_t queue_id; /* Queue id of the counter. */
+	cnt_id_t own_cnt_index;
+	/* Counter action created specifically for this AGE action. */
+	void *context; /* Flow AGE context. */
+} __rte_packed __rte_cache_aligned;
+
 /**
  * Translate counter id into internal index (start from 0), which can be used
  * as index of raw/cnt pool.
@@ -107,7 +157,7 @@ struct mlx5_hws_cnt_pool {
  * @return
  *   Internal index
  */
-static __rte_always_inline cnt_id_t
+static __rte_always_inline uint32_t
 mlx5_hws_cnt_iidx(struct mlx5_hws_cnt_pool *cpool, cnt_id_t cnt_id)
 {
 	uint8_t dcs_idx = cnt_id >> MLX5_HWS_CNT_DCS_IDX_OFFSET;
@@ -139,7 +189,7 @@ mlx5_hws_cnt_id_valid(cnt_id_t cnt_id)
  *   Counter id
  */
 static __rte_always_inline cnt_id_t
-mlx5_hws_cnt_id_gen(struct mlx5_hws_cnt_pool *cpool, cnt_id_t iidx)
+mlx5_hws_cnt_id_gen(struct mlx5_hws_cnt_pool *cpool, uint32_t iidx)
 {
 	struct mlx5_hws_cnt_dcs_mng *dcs_mng = &cpool->dcs_mng;
 	uint32_t idx;
@@ -344,9 +394,10 @@ mlx5_hws_cnt_pool_put(struct mlx5_hws_cnt_pool *cpool,
 	struct rte_ring_zc_data zcdr = {0};
 	struct rte_ring *qcache = NULL;
 	unsigned int wb_num = 0; /* cache write-back number. */
-	cnt_id_t iidx;
+	uint32_t iidx;
 
 	iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
+	cpool->pool[iidx].in_used = false;
 	cpool->pool[iidx].query_gen_when_free =
 		__atomic_load_n(&cpool->query_gen, __ATOMIC_RELAXED);
 	if (likely(queue != NULL))
@@ -388,20 +439,23 @@ mlx5_hws_cnt_pool_put(struct mlx5_hws_cnt_pool *cpool,
  *   A pointer to HWS queue. If null, it means fetch from common pool.
  * @param cnt_id
  *   A pointer to a cnt_id_t * pointer (counter id) that will be filled.
+ * @param age_idx
+ *   Index of AGE parameter using this counter, zero means there is no such AGE.
+ *
  * @return
  *   - 0: Success; objects taken.
  *   - -ENOENT: Not enough entries in the mempool; no object is retrieved.
  *   - -EAGAIN: counter is not ready; try again.
  */
 static __rte_always_inline int
-mlx5_hws_cnt_pool_get(struct mlx5_hws_cnt_pool *cpool,
-		uint32_t *queue, cnt_id_t *cnt_id)
+mlx5_hws_cnt_pool_get(struct mlx5_hws_cnt_pool *cpool, uint32_t *queue,
+		      cnt_id_t *cnt_id, uint32_t age_idx)
 {
 	unsigned int ret;
 	struct rte_ring_zc_data zcdc = {0};
 	struct rte_ring *qcache = NULL;
-	uint32_t query_gen = 0;
-	cnt_id_t iidx, tmp_cid = 0;
+	uint32_t iidx, query_gen = 0;
+	cnt_id_t tmp_cid = 0;
 
 	if (likely(queue != NULL))
 		qcache = cpool->cache->qcache[*queue];
@@ -422,6 +476,8 @@ mlx5_hws_cnt_pool_get(struct mlx5_hws_cnt_pool *cpool,
 		__hws_cnt_query_raw(cpool, *cnt_id,
 				    &cpool->pool[iidx].reset.hits,
 				    &cpool->pool[iidx].reset.bytes);
+		cpool->pool[iidx].in_used = true;
+		cpool->pool[iidx].age_idx = age_idx;
 		return 0;
 	}
 	ret = rte_ring_dequeue_zc_burst_elem_start(qcache, sizeof(cnt_id_t), 1,
@@ -455,6 +511,8 @@ mlx5_hws_cnt_pool_get(struct mlx5_hws_cnt_pool *cpool,
 			    &cpool->pool[iidx].reset.bytes);
 	rte_ring_dequeue_zc_elem_finish(qcache, 1);
 	cpool->pool[iidx].share = 0;
+	cpool->pool[iidx].in_used = true;
+	cpool->pool[iidx].age_idx = age_idx;
 	return 0;
 }
 
@@ -478,16 +536,16 @@ mlx5_hws_cnt_pool_get_action_offset(struct mlx5_hws_cnt_pool *cpool,
 }
 
 static __rte_always_inline int
-mlx5_hws_cnt_shared_get(struct mlx5_hws_cnt_pool *cpool, cnt_id_t *cnt_id)
+mlx5_hws_cnt_shared_get(struct mlx5_hws_cnt_pool *cpool, cnt_id_t *cnt_id,
+			uint32_t age_idx)
 {
 	int ret;
 	uint32_t iidx;
 
-	ret = mlx5_hws_cnt_pool_get(cpool, NULL, cnt_id);
+	ret = mlx5_hws_cnt_pool_get(cpool, NULL, cnt_id, age_idx);
 	if (ret != 0)
 		return ret;
 	iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
-	MLX5_ASSERT(cpool->pool[iidx].share == 0);
 	cpool->pool[iidx].share = 1;
 	return 0;
 }
@@ -513,10 +571,73 @@ mlx5_hws_cnt_is_shared(struct mlx5_hws_cnt_pool *cpool, cnt_id_t cnt_id)
 	return cpool->pool[iidx].share ? true : false;
 }
 
+static __rte_always_inline void
+mlx5_hws_cnt_age_set(struct mlx5_hws_cnt_pool *cpool, cnt_id_t cnt_id,
+		     uint32_t age_idx)
+{
+	uint32_t iidx = mlx5_hws_cnt_iidx(cpool, cnt_id);
+
+	MLX5_ASSERT(cpool->pool[iidx].share);
+	cpool->pool[iidx].age_idx = age_idx;
+}
+
+static __rte_always_inline uint32_t
+mlx5_hws_cnt_age_get(struct mlx5_hws_cnt_pool *cpool, cnt_id_t cnt_id)
+{
+	uint32_t iidx = mlx5_hws_cnt_iidx(cpool, cnt_id);
+
+	MLX5_ASSERT(cpool->pool[iidx].share);
+	return cpool->pool[iidx].age_idx;
+}
+
+static __rte_always_inline cnt_id_t
+mlx5_hws_age_cnt_get(struct mlx5_priv *priv, struct mlx5_hws_age_param *param,
+		     uint32_t age_idx)
+{
+	if (!param->own_cnt_index) {
+		/* Create indirect counter one for internal usage. */
+		if (mlx5_hws_cnt_shared_get(priv->hws_cpool,
+					    &param->own_cnt_index, age_idx) < 0)
+			return 0;
+		param->nb_cnts++;
+	}
+	return param->own_cnt_index;
+}
+
+static __rte_always_inline void
+mlx5_hws_age_nb_cnt_increase(struct mlx5_priv *priv, uint32_t age_idx)
+{
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct mlx5_indexed_pool *ipool = age_info->ages_ipool;
+	struct mlx5_hws_age_param *param = mlx5_ipool_get(ipool, age_idx);
+
+	MLX5_ASSERT(param != NULL);
+	param->nb_cnts++;
+}
+
+static __rte_always_inline void
+mlx5_hws_age_nb_cnt_decrease(struct mlx5_priv *priv, uint32_t age_idx)
+{
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct mlx5_indexed_pool *ipool = age_info->ages_ipool;
+	struct mlx5_hws_age_param *param = mlx5_ipool_get(ipool, age_idx);
+
+	if (param != NULL)
+		param->nb_cnts--;
+}
+
+static __rte_always_inline bool
+mlx5_hws_age_is_indirect(uint32_t age_idx)
+{
+	return (age_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET) ==
+		MLX5_INDIRECT_ACTION_TYPE_AGE ? true : false;
+}
+
 /* init HWS counter pool. */
 struct mlx5_hws_cnt_pool *
-mlx5_hws_cnt_pool_init(const struct mlx5_hws_cnt_pool_cfg *pcfg,
-		const struct mlx5_hws_cache_param *ccfg);
+mlx5_hws_cnt_pool_init(struct mlx5_dev_ctx_shared *sh,
+		       const struct mlx5_hws_cnt_pool_cfg *pcfg,
+		       const struct mlx5_hws_cache_param *ccfg);
 
 void
 mlx5_hws_cnt_pool_deinit(struct mlx5_hws_cnt_pool *cntp);
@@ -555,4 +676,28 @@ mlx5_hws_cnt_svc_init(struct mlx5_dev_ctx_shared *sh);
 void
 mlx5_hws_cnt_svc_deinit(struct mlx5_dev_ctx_shared *sh);
 
+int
+mlx5_hws_age_action_destroy(struct mlx5_priv *priv, uint32_t idx,
+			    struct rte_flow_error *error);
+
+uint32_t
+mlx5_hws_age_action_create(struct mlx5_priv *priv, uint32_t queue_id,
+			   bool shared, const struct rte_flow_action_age *age,
+			   uint32_t flow_idx, struct rte_flow_error *error);
+
+int
+mlx5_hws_age_action_update(struct mlx5_priv *priv, uint32_t idx,
+			   const void *update, struct rte_flow_error *error);
+
+void *
+mlx5_hws_age_context_get(struct mlx5_priv *priv, uint32_t idx);
+
+int
+mlx5_hws_age_pool_init(struct rte_eth_dev *dev,
+		       const struct rte_flow_port_attr *attr,
+		       uint16_t nb_queues);
+
+void
+mlx5_hws_age_pool_destroy(struct mlx5_priv *priv);
+
 #endif /* _MLX5_HWS_CNT_H_ */
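
As a side note, the counter id layout documented at the top of this header can
be unpacked with plain shifts and masks. A small illustrative sketch, not part
of the patch, restating the macro values above (and assuming
MLX5_INDIRECT_ACTION_TYPE_OFFSET == 29 as defined in mlx5_flow.h):

#include <stdint.h>
#include <stdio.h>

/* Restated from mlx5_hws_cnt.h / mlx5_flow.h above. */
#define DCS_IDX_OFFSET 24
#define DCS_IDX_MASK 0x3
#define IDX_MASK ((1UL << DCS_IDX_OFFSET) - 1)
#define TYPE_OFFSET 29	/* MLX5_INDIRECT_ACTION_TYPE_OFFSET */

int
main(void)
{
	/* TYPE = b'10 (COUNT), DCS index 1, IDX 42. */
	uint32_t cnt_id = (2u << TYPE_OFFSET) | (1u << DCS_IDX_OFFSET) | 42u;

	printf("type=%u dcs=%u idx=%u\n",
	       cnt_id >> TYPE_OFFSET,
	       (cnt_id >> DCS_IDX_OFFSET) & DCS_IDX_MASK,
	       (uint32_t)(cnt_id & IDX_MASK));
	return 0;
}
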
diff --git a/drivers/net/mlx5/mlx5_utils.h b/drivers/net/mlx5/mlx5_utils.h
index 254c879d1a..82e8298781 100644
--- a/drivers/net/mlx5/mlx5_utils.h
+++ b/drivers/net/mlx5/mlx5_utils.h
@@ -170,6 +170,14 @@ struct mlx5_l3t_tbl {
 typedef int32_t (*mlx5_l3t_alloc_callback_fn)(void *ctx,
 					   union mlx5_l3t_data *data);
 
+/*
+ * The default ipool threshold value indicates which per_core_cache
+ * value to set.
+ */
+#define MLX5_HW_IPOOL_SIZE_THRESHOLD (1 << 19)
+/* The default min local cache size. */
+#define MLX5_HW_IPOOL_CACHE_MIN (1 << 9)
+
 /*
  * The indexed memory entry index is made up of trunk index and offset of
  * the entry in the trunk. Since the entry index is 32 bits, in case user
@@ -207,7 +215,7 @@ struct mlx5_indexed_pool_config {
 	 */
 	uint32_t need_lock:1;
 	/* Lock is needed for multiple thread usage. */
-	uint32_t release_mem_en:1; /* Rlease trunk when it is free. */
+	uint32_t release_mem_en:1; /* Release trunk when it is free. */
 	uint32_t max_idx; /* The maximum index can be allocated. */
 	uint32_t per_core_cache;
 	/*
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v5 14/18] net/mlx5: add async action push and pull support
  2022-10-20  3:21 ` [PATCH v5 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (12 preceding siblings ...)
  2022-10-20  3:22   ` [PATCH v5 13/18] net/mlx5: add HWS AGE action support Suanming Mou
@ 2022-10-20  3:22   ` Suanming Mou
  2022-10-20  3:22   ` [PATCH v5 15/18] net/mlx5: support flow integrity in HWS group 0 Suanming Mou
                     ` (3 subsequent siblings)
  17 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-10-20  3:22 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika

The queue-based rte_flow_async_action_* functions work the same way as
the queue-based async flow functions: the operations can be pushed
asynchronously, and so can the pull.

This commit adds the missing async action push and pull support.
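
From the application point of view this maps onto the generic rte_flow
async API. A minimal usage sketch, illustrative only (it assumes the port
was already configured with rte_flow_configure() for queue-based operation,
and trims error handling):

#include <rte_common.h>
#include <rte_flow.h>

static int
indirect_action_async(uint16_t port_id, uint32_t queue,
		      const struct rte_flow_action *action)
{
	struct rte_flow_indir_action_conf conf = { .ingress = 1 };
	struct rte_flow_op_attr op_attr = { .postpone = 1 }; /* batch it */
	struct rte_flow_op_result res[8];
	struct rte_flow_action_handle *handle;
	struct rte_flow_error error;
	int n;

	/* Enqueue the creation on the given queue, completion comes later. */
	handle = rte_flow_async_action_handle_create(port_id, queue, &op_attr,
						     &conf, action,
						     NULL /* user_data */,
						     &error);
	if (handle == NULL)
		return -1;
	/* Push (doorbell) all postponed operations on this queue ... */
	rte_flow_push(port_id, queue, &error);
	/* ... and pull the completions back. */
	do {
		n = rte_flow_pull(port_id, queue, res, RTE_DIM(res), &error);
	} while (n == 0);
	return n < 0 ? -1 : 0;
}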

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/net/mlx5/mlx5.h            |  62 ++++-
 drivers/net/mlx5/mlx5_flow.c       |  45 ++++
 drivers/net/mlx5/mlx5_flow.h       |  17 ++
 drivers/net/mlx5/mlx5_flow_aso.c   | 181 +++++++++++--
 drivers/net/mlx5/mlx5_flow_dv.c    |   7 +-
 drivers/net/mlx5/mlx5_flow_hw.c    | 412 +++++++++++++++++++++++++----
 drivers/net/mlx5/mlx5_flow_meter.c |   6 +-
 7 files changed, 626 insertions(+), 104 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 09ab7a080a..5195529267 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -346,6 +346,8 @@ struct mlx5_lb_ctx {
 enum {
 	MLX5_HW_Q_JOB_TYPE_CREATE, /* Flow create job type. */
 	MLX5_HW_Q_JOB_TYPE_DESTROY, /* Flow destroy job type. */
+	MLX5_HW_Q_JOB_TYPE_UPDATE,
+	MLX5_HW_Q_JOB_TYPE_QUERY,
 };
 
 #define MLX5_HW_MAX_ITEMS (16)
@@ -353,12 +355,23 @@ enum {
 /* HW steering flow management job descriptor. */
 struct mlx5_hw_q_job {
 	uint32_t type; /* Job type. */
-	struct rte_flow_hw *flow; /* Flow attached to the job. */
+	union {
+		struct rte_flow_hw *flow; /* Flow attached to the job. */
+		const void *action; /* Indirect action attached to the job. */
+	};
 	void *user_data; /* Job user data. */
 	uint8_t *encap_data; /* Encap data. */
 	struct mlx5_modification_cmd *mhdr_cmd;
 	struct rte_flow_item *items;
-	struct rte_flow_item_ethdev port_spec;
+	union {
+		struct {
+			/* Pointer to ct query user memory. */
+			struct rte_flow_action_conntrack *profile;
+			/* Pointer to ct ASO query out memory. */
+			void *out_data;
+		} __rte_packed;
+		struct rte_flow_item_ethdev port_spec;
+	} __rte_packed;
 };
 
 /* HW steering job descriptor LIFO pool. */
@@ -366,6 +379,8 @@ struct mlx5_hw_q {
 	uint32_t job_idx; /* Free job index. */
 	uint32_t size; /* LIFO size. */
 	struct mlx5_hw_q_job **job; /* LIFO header. */
+	struct rte_ring *indir_cq; /* Indirect action SW completion queue. */
+	struct rte_ring *indir_iq; /* Indirect action SW in progress queue. */
 } __rte_cache_aligned;
 
 
@@ -574,6 +589,7 @@ struct mlx5_aso_sq_elem {
 			struct mlx5_aso_ct_action *ct;
 			char *query_data;
 		};
+		void *user_data;
 	};
 };
 
@@ -583,7 +599,9 @@ struct mlx5_aso_sq {
 	struct mlx5_aso_cq cq;
 	struct mlx5_devx_sq sq_obj;
 	struct mlx5_pmd_mr mr;
+	volatile struct mlx5_aso_wqe *db;
 	uint16_t pi;
+	uint16_t db_pi;
 	uint32_t head;
 	uint32_t tail;
 	uint32_t sqn;
@@ -998,6 +1016,7 @@ struct mlx5_flow_meter_profile {
 enum mlx5_aso_mtr_state {
 	ASO_METER_FREE, /* In free list. */
 	ASO_METER_WAIT, /* ACCESS_ASO WQE in progress. */
+	ASO_METER_WAIT_ASYNC, /* CQE will be handled by async pull. */
 	ASO_METER_READY, /* CQE received. */
 };
 
@@ -1200,6 +1219,7 @@ struct mlx5_bond_info {
 enum mlx5_aso_ct_state {
 	ASO_CONNTRACK_FREE, /* Inactive, in the free list. */
 	ASO_CONNTRACK_WAIT, /* WQE sent in the SQ. */
+	ASO_CONNTRACK_WAIT_ASYNC, /* CQE will be handled by async pull. */
 	ASO_CONNTRACK_READY, /* CQE received w/o error. */
 	ASO_CONNTRACK_QUERY, /* WQE for query sent. */
 	ASO_CONNTRACK_MAX, /* Guard. */
@@ -1208,13 +1228,21 @@ enum mlx5_aso_ct_state {
 /* Generic ASO connection tracking structure. */
 struct mlx5_aso_ct_action {
 	union {
-		LIST_ENTRY(mlx5_aso_ct_action) next;
-		/* Pointer to the next ASO CT. Used only in SWS. */
-		struct mlx5_aso_ct_pool *pool;
-		/* Pointer to action pool. Used only in HWS. */
+		/* SWS mode struct. */
+		struct {
+			/* Pointer to the next ASO CT. Used only in SWS. */
+			LIST_ENTRY(mlx5_aso_ct_action) next;
+		};
+		/* HWS mode struct. */
+		struct {
+			/* Pointer to action pool. Used only in HWS. */
+			struct mlx5_aso_ct_pool *pool;
+		};
 	};
-	void *dr_action_orig; /* General action object for original dir. */
-	void *dr_action_rply; /* General action object for reply dir. */
+	/* General action object for original dir. */
+	void *dr_action_orig;
+	/* General action object for reply dir. */
+	void *dr_action_rply;
 	uint32_t refcnt; /* Action used count in device flows. */
 	uint16_t offset; /* Offset of ASO CT in DevX objects bulk. */
 	uint16_t peer; /* The only peer port index could also use this CT. */
@@ -2142,18 +2170,21 @@ int mlx5_aso_flow_hit_queue_poll_stop(struct mlx5_dev_ctx_shared *sh);
 void mlx5_aso_queue_uninit(struct mlx5_dev_ctx_shared *sh,
 			   enum mlx5_access_aso_opc_mod aso_opc_mod);
 int mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
-				 struct mlx5_aso_mtr *mtr,
-				 struct mlx5_mtr_bulk *bulk);
+		struct mlx5_aso_mtr *mtr, struct mlx5_mtr_bulk *bulk,
+		void *user_data, bool push);
 int mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 		struct mlx5_aso_mtr *mtr);
 int mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			      struct mlx5_aso_ct_action *ct,
-			      const struct rte_flow_action_conntrack *profile);
+			      const struct rte_flow_action_conntrack *profile,
+			      void *user_data,
+			      bool push);
 int mlx5_aso_ct_wait_ready(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			   struct mlx5_aso_ct_action *ct);
 int mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			     struct mlx5_aso_ct_action *ct,
-			     struct rte_flow_action_conntrack *profile);
+			     struct rte_flow_action_conntrack *profile,
+			     void *user_data, bool push);
 int mlx5_aso_ct_available(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			  struct mlx5_aso_ct_action *ct);
 uint32_t
@@ -2161,6 +2192,13 @@ mlx5_get_supported_sw_parsing_offloads(const struct mlx5_hca_attr *attr);
 uint32_t
 mlx5_get_supported_tunneling_offloads(const struct mlx5_hca_attr *attr);
 
+void mlx5_aso_ct_obj_analyze(struct rte_flow_action_conntrack *profile,
+			     char *wdata);
+void mlx5_aso_push_wqe(struct mlx5_dev_ctx_shared *sh,
+		       struct mlx5_aso_sq *sq);
+int mlx5_aso_pull_completion(struct mlx5_aso_sq *sq,
+			     struct rte_flow_op_result res[],
+			     uint16_t n_res);
 int mlx5_aso_cnt_queue_init(struct mlx5_dev_ctx_shared *sh);
 void mlx5_aso_cnt_queue_uninit(struct mlx5_dev_ctx_shared *sh);
 int mlx5_aso_cnt_query(struct mlx5_dev_ctx_shared *sh,
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index c32255a3f9..b11957f8ee 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -981,6 +981,14 @@ mlx5_flow_async_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 				  void *user_data,
 				  struct rte_flow_error *error);
 
+static int
+mlx5_flow_async_action_handle_query(struct rte_eth_dev *dev, uint32_t queue,
+				 const struct rte_flow_op_attr *attr,
+				 const struct rte_flow_action_handle *handle,
+				 void *data,
+				 void *user_data,
+				 struct rte_flow_error *error);
+
 static const struct rte_flow_ops mlx5_flow_ops = {
 	.validate = mlx5_flow_validate,
 	.create = mlx5_flow_create,
@@ -1019,6 +1027,7 @@ static const struct rte_flow_ops mlx5_flow_ops = {
 	.push = mlx5_flow_push,
 	.async_action_handle_create = mlx5_flow_async_action_handle_create,
 	.async_action_handle_update = mlx5_flow_async_action_handle_update,
+	.async_action_handle_query = mlx5_flow_async_action_handle_query,
 	.async_action_handle_destroy = mlx5_flow_async_action_handle_destroy,
 };
 
@@ -8862,6 +8871,42 @@ mlx5_flow_async_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 					 update, user_data, error);
 }
 
+/**
+ * Query shared action.
+ *
+ * @param[in] dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param[in] queue
+ *   Which queue to be used.
+ * @param[in] attr
+ *   Operation attribute.
+ * @param[in] handle
+ *   Action handle to be queried.
+ * @param[in] data
+ *   Pointer to the query result data.
+ * @param[in] user_data
+ *   Pointer to the user_data.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, negative value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_async_action_handle_query(struct rte_eth_dev *dev, uint32_t queue,
+				    const struct rte_flow_op_attr *attr,
+				    const struct rte_flow_action_handle *handle,
+				    void *data,
+				    void *user_data,
+				    struct rte_flow_error *error)
+{
+	const struct mlx5_flow_driver_ops *fops =
+			flow_get_drv_ops(MLX5_FLOW_TYPE_HW);
+
+	return fops->async_action_query(dev, queue, attr, handle,
+					data, user_data, error);
+}
+
 /**
  * Destroy shared action.
  *
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 5c57f51706..57cebb5ce6 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -57,6 +57,13 @@ enum mlx5_rte_flow_field_id {
 
 #define MLX5_INDIRECT_ACTION_TYPE_OFFSET 29
 
+#define MLX5_INDIRECT_ACTION_TYPE_GET(handle) \
+	(((uint32_t)(uintptr_t)(handle)) >> MLX5_INDIRECT_ACTION_TYPE_OFFSET)
+
+#define MLX5_INDIRECT_ACTION_IDX_GET(handle) \
+	(((uint32_t)(uintptr_t)(handle)) & \
+	 ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1))
+
 enum {
 	MLX5_INDIRECT_ACTION_TYPE_RSS,
 	MLX5_INDIRECT_ACTION_TYPE_AGE,
@@ -1826,6 +1833,15 @@ typedef int (*mlx5_flow_async_action_handle_update_t)
 			 void *user_data,
 			 struct rte_flow_error *error);
 
+typedef int (*mlx5_flow_async_action_handle_query_t)
+			(struct rte_eth_dev *dev,
+			 uint32_t queue,
+			 const struct rte_flow_op_attr *attr,
+			 const struct rte_flow_action_handle *handle,
+			 void *data,
+			 void *user_data,
+			 struct rte_flow_error *error);
+
 typedef int (*mlx5_flow_async_action_handle_destroy_t)
 			(struct rte_eth_dev *dev,
 			 uint32_t queue,
@@ -1888,6 +1904,7 @@ struct mlx5_flow_driver_ops {
 	mlx5_flow_push_t push;
 	mlx5_flow_async_action_handle_create_t async_action_create;
 	mlx5_flow_async_action_handle_update_t async_action_update;
+	mlx5_flow_async_action_handle_query_t async_action_query;
 	mlx5_flow_async_action_handle_destroy_t async_action_destroy;
 };
 
diff --git a/drivers/net/mlx5/mlx5_flow_aso.c b/drivers/net/mlx5/mlx5_flow_aso.c
index a5f58301eb..1ddf71e44e 100644
--- a/drivers/net/mlx5/mlx5_flow_aso.c
+++ b/drivers/net/mlx5/mlx5_flow_aso.c
@@ -519,6 +519,70 @@ mlx5_aso_cqe_err_handle(struct mlx5_aso_sq *sq)
 			       (volatile uint32_t *)&sq->sq_obj.aso_wqes[idx]);
 }
 
+int
+mlx5_aso_pull_completion(struct mlx5_aso_sq *sq,
+			 struct rte_flow_op_result res[],
+			 uint16_t n_res)
+{
+	struct mlx5_aso_cq *cq = &sq->cq;
+	volatile struct mlx5_cqe *restrict cqe;
+	const uint32_t cq_size = 1 << cq->log_desc_n;
+	const uint32_t mask = cq_size - 1;
+	uint32_t idx;
+	uint32_t next_idx;
+	uint16_t max;
+	uint16_t n = 0;
+	int ret;
+
+	max = (uint16_t)(sq->head - sq->tail);
+	if (unlikely(!max || !n_res))
+		return 0;
+	next_idx = cq->cq_ci & mask;
+	do {
+		idx = next_idx;
+		next_idx = (cq->cq_ci + 1) & mask;
+		/* Need to confirm the position of the prefetch. */
+		rte_prefetch0(&cq->cq_obj.cqes[next_idx]);
+		cqe = &cq->cq_obj.cqes[idx];
+		ret = check_cqe(cqe, cq_size, cq->cq_ci);
+		/*
+		 * Be sure owner read is done before any other cookie field or
+		 * opaque field.
+		 */
+		rte_io_rmb();
+		if (ret == MLX5_CQE_STATUS_HW_OWN)
+			break;
+		res[n].user_data = sq->elts[(uint16_t)((sq->tail + n) & mask)].user_data;
+		if (unlikely(ret != MLX5_CQE_STATUS_SW_OWN)) {
+			mlx5_aso_cqe_err_handle(sq);
+			res[n].status = RTE_FLOW_OP_ERROR;
+		} else {
+			res[n].status = RTE_FLOW_OP_SUCCESS;
+		}
+		cq->cq_ci++;
+		if (++n == n_res)
+			break;
+	} while (1);
+	if (likely(n)) {
+		sq->tail += n;
+		rte_io_wmb();
+		cq->cq_obj.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci);
+	}
+	return n;
+}
+
+void
+mlx5_aso_push_wqe(struct mlx5_dev_ctx_shared *sh,
+		  struct mlx5_aso_sq *sq)
+{
+	if (sq->db_pi == sq->pi)
+		return;
+	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)sq->db,
+			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
+			   !sh->tx_uar.dbnc);
+	sq->db_pi = sq->pi;
+}
+
 /**
  * Update ASO objects upon completion.
  *
@@ -728,7 +792,9 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 			       struct mlx5_aso_sq *sq,
 			       struct mlx5_aso_mtr *aso_mtr,
 			       struct mlx5_mtr_bulk *bulk,
-				   bool need_lock)
+			       bool need_lock,
+			       void *user_data,
+			       bool push)
 {
 	volatile struct mlx5_aso_wqe *wqe = NULL;
 	struct mlx5_flow_meter_info *fm = NULL;
@@ -754,7 +820,7 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	rte_prefetch0(&sq->sq_obj.aso_wqes[(sq->head + 1) & mask]);
 	/* Fill next WQE. */
 	fm = &aso_mtr->fm;
-	sq->elts[sq->head & mask].mtr = aso_mtr;
+	sq->elts[sq->head & mask].mtr = user_data ? user_data : aso_mtr;
 	if (aso_mtr->type == ASO_METER_INDIRECT) {
 		if (likely(sh->config.dv_flow_en == 2))
 			pool = aso_mtr->pool;
@@ -820,9 +886,13 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	 */
 	sq->head++;
 	sq->pi += 2;/* Each WQE contains 2 WQEBB's. */
-	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
+	if (push) {
+		mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
 			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
 			   !sh->tx_uar.dbnc);
+		sq->db_pi = sq->pi;
+	}
+	sq->db = wqe;
 	if (need_lock)
 		rte_spinlock_unlock(&sq->sqsl);
 	return 1;
@@ -912,11 +982,14 @@ mlx5_aso_mtr_completion_handle(struct mlx5_aso_sq *sq, bool need_lock)
 int
 mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			struct mlx5_aso_mtr *mtr,
-			struct mlx5_mtr_bulk *bulk)
+			struct mlx5_mtr_bulk *bulk,
+			void *user_data,
+			bool push)
 {
 	struct mlx5_aso_sq *sq;
 	uint32_t poll_wqe_times = MLX5_MTR_POLL_WQE_CQE_TIMES;
 	bool need_lock;
+	int ret;
 
 	if (likely(sh->config.dv_flow_en == 2) &&
 	    mtr->type == ASO_METER_INDIRECT) {
@@ -931,10 +1004,15 @@ mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 		sq = &sh->mtrmng->pools_mng.sq;
 		need_lock = true;
 	}
+	if (queue != MLX5_HW_INV_QUEUE) {
+		ret = mlx5_aso_mtr_sq_enqueue_single(sh, sq, mtr, bulk,
+						     need_lock, user_data, push);
+		return ret > 0 ? 0 : -1;
+	}
 	do {
 		mlx5_aso_mtr_completion_handle(sq, need_lock);
-		if (mlx5_aso_mtr_sq_enqueue_single(sh, sq, mtr,
-						   bulk, need_lock))
+		if (mlx5_aso_mtr_sq_enqueue_single(sh, sq, mtr, bulk,
+						   need_lock, NULL, true))
 			return 0;
 		/* Waiting for wqe resource. */
 		rte_delay_us_sleep(MLX5_ASO_WQE_CQE_RESPONSE_DELAY);
@@ -963,6 +1041,7 @@ mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 {
 	struct mlx5_aso_sq *sq;
 	uint32_t poll_cqe_times = MLX5_MTR_POLL_WQE_CQE_TIMES;
+	uint8_t state;
 	bool need_lock;
 
 	if (likely(sh->config.dv_flow_en == 2) &&
@@ -978,8 +1057,8 @@ mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 		sq = &sh->mtrmng->pools_mng.sq;
 		need_lock = true;
 	}
-	if (__atomic_load_n(&mtr->state, __ATOMIC_RELAXED) ==
-					    ASO_METER_READY)
+	state = __atomic_load_n(&mtr->state, __ATOMIC_RELAXED);
+	if (state == ASO_METER_READY || state == ASO_METER_WAIT_ASYNC)
 		return 0;
 	do {
 		mlx5_aso_mtr_completion_handle(sq, need_lock);
@@ -1095,7 +1174,9 @@ mlx5_aso_ct_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 			      struct mlx5_aso_sq *sq,
 			      struct mlx5_aso_ct_action *ct,
 			      const struct rte_flow_action_conntrack *profile,
-			      bool need_lock)
+			      bool need_lock,
+			      void *user_data,
+			      bool push)
 {
 	volatile struct mlx5_aso_wqe *wqe = NULL;
 	uint16_t size = 1 << sq->log_desc_n;
@@ -1119,10 +1200,16 @@ mlx5_aso_ct_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	wqe = &sq->sq_obj.aso_wqes[sq->head & mask];
 	rte_prefetch0(&sq->sq_obj.aso_wqes[(sq->head + 1) & mask]);
 	/* Fill next WQE. */
-	MLX5_ASO_CT_UPDATE_STATE(ct, ASO_CONNTRACK_WAIT);
-	sq->elts[sq->head & mask].ct = ct;
-	sq->elts[sq->head & mask].query_data = NULL;
+	MLX5_ASO_CT_UPDATE_STATE(ct,
+			user_data ? ASO_CONNTRACK_WAIT_ASYNC : ASO_CONNTRACK_WAIT);
+	if (user_data) {
+		sq->elts[sq->head & mask].user_data = user_data;
+	} else {
+		sq->elts[sq->head & mask].ct = ct;
+		sq->elts[sq->head & mask].query_data = NULL;
+	}
 	pool = __mlx5_aso_ct_get_pool(sh, ct);
+
 	/* Each WQE will have a single CT object. */
 	wqe->general_cseg.misc = rte_cpu_to_be_32(pool->devx_obj->id +
 						  ct->offset);
@@ -1202,9 +1289,13 @@ mlx5_aso_ct_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 		 profile->reply_dir.max_ack);
 	sq->head++;
 	sq->pi += 2; /* Each WQE contains 2 WQEBB's. */
-	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
-			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
-			   !sh->tx_uar.dbnc);
+	if (push) {
+		mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
+				   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
+				   !sh->tx_uar.dbnc);
+		sq->db_pi = sq->pi;
+	}
+	sq->db = wqe;
 	if (need_lock)
 		rte_spinlock_unlock(&sq->sqsl);
 	return 1;
@@ -1260,7 +1351,9 @@ static int
 mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
 			    struct mlx5_aso_sq *sq,
 			    struct mlx5_aso_ct_action *ct, char *data,
-			    bool need_lock)
+			    bool need_lock,
+			    void *user_data,
+			    bool push)
 {
 	volatile struct mlx5_aso_wqe *wqe = NULL;
 	uint16_t size = 1 << sq->log_desc_n;
@@ -1286,14 +1379,23 @@ mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
 		DRV_LOG(ERR, "Fail: SQ is full and no free WQE to send");
 		return 0;
 	}
-	MLX5_ASO_CT_UPDATE_STATE(ct, ASO_CONNTRACK_QUERY);
+	MLX5_ASO_CT_UPDATE_STATE(ct,
+			user_data ? ASO_CONNTRACK_WAIT_ASYNC : ASO_CONNTRACK_QUERY);
 	wqe = &sq->sq_obj.aso_wqes[sq->head & mask];
 	/* Confirm the location and address of the prefetch instruction. */
 	rte_prefetch0(&sq->sq_obj.aso_wqes[(sq->head + 1) & mask]);
 	/* Fill next WQE. */
 	wqe_idx = sq->head & mask;
-	sq->elts[wqe_idx].ct = ct;
-	sq->elts[wqe_idx].query_data = data;
+	/* Check if this is async mode. */
+	if (user_data) {
+		struct mlx5_hw_q_job *job = (struct mlx5_hw_q_job *)user_data;
+
+		sq->elts[wqe_idx].ct = user_data;
+		job->out_data = (char *)((uintptr_t)sq->mr.addr + wqe_idx * 64);
+	} else {
+		sq->elts[wqe_idx].query_data = data;
+		sq->elts[wqe_idx].ct = ct;
+	}
 	pool = __mlx5_aso_ct_get_pool(sh, ct);
 	/* Each WQE will have a single CT object. */
 	wqe->general_cseg.misc = rte_cpu_to_be_32(pool->devx_obj->id +
@@ -1319,9 +1421,13 @@ mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
 	 * data segment is not used in this case.
 	 */
 	sq->pi += 2;
-	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
-			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
-			   !sh->tx_uar.dbnc);
+	if (push) {
+		mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
+				   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
+				   !sh->tx_uar.dbnc);
+		sq->db_pi = sq->pi;
+	}
+	sq->db = wqe;
 	if (need_lock)
 		rte_spinlock_unlock(&sq->sqsl);
 	return 1;
@@ -1407,20 +1513,29 @@ int
 mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
 			  uint32_t queue,
 			  struct mlx5_aso_ct_action *ct,
-			  const struct rte_flow_action_conntrack *profile)
+			  const struct rte_flow_action_conntrack *profile,
+			  void *user_data,
+			  bool push)
 {
 	uint32_t poll_wqe_times = MLX5_CT_POLL_WQE_CQE_TIMES;
 	struct mlx5_aso_ct_pool *pool = __mlx5_aso_ct_get_pool(sh, ct);
 	struct mlx5_aso_sq *sq;
 	bool need_lock = !!(queue == MLX5_HW_INV_QUEUE);
+	int ret;
 
 	if (sh->config.dv_flow_en == 2)
 		sq = __mlx5_aso_ct_get_sq_in_hws(queue, pool);
 	else
 		sq = __mlx5_aso_ct_get_sq_in_sws(sh, ct);
+	if (queue != MLX5_HW_INV_QUEUE) {
+		ret = mlx5_aso_ct_sq_enqueue_single(sh, sq, ct, profile,
+						    need_lock, user_data, push);
+		return ret > 0 ? 0 : -1;
+	}
 	do {
-		mlx5_aso_ct_completion_handle(sh, sq, need_lock);
-		if (mlx5_aso_ct_sq_enqueue_single(sh, sq, ct, profile, need_lock))
+		mlx5_aso_ct_completion_handle(sh, sq, need_lock);
+		if (mlx5_aso_ct_sq_enqueue_single(sh, sq, ct, profile,
+						  need_lock, NULL, true))
 			return 0;
 		/* Waiting for wqe resource. */
 		rte_delay_us_sleep(10u);
@@ -1480,7 +1595,7 @@ mlx5_aso_ct_wait_ready(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
  * @param[in] wdata
  *   Pointer to data fetched from hardware.
  */
-static inline void
+void
 mlx5_aso_ct_obj_analyze(struct rte_flow_action_conntrack *profile,
 			char *wdata)
 {
@@ -1564,7 +1679,8 @@ int
 mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh,
 			 uint32_t queue,
 			 struct mlx5_aso_ct_action *ct,
-			 struct rte_flow_action_conntrack *profile)
+			 struct rte_flow_action_conntrack *profile,
+			 void *user_data, bool push)
 {
 	uint32_t poll_wqe_times = MLX5_CT_POLL_WQE_CQE_TIMES;
 	struct mlx5_aso_ct_pool *pool = __mlx5_aso_ct_get_pool(sh, ct);
@@ -1577,9 +1693,15 @@ mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh,
 		sq = __mlx5_aso_ct_get_sq_in_hws(queue, pool);
 	else
 		sq = __mlx5_aso_ct_get_sq_in_sws(sh, ct);
+	if (queue != MLX5_HW_INV_QUEUE) {
+		ret = mlx5_aso_ct_sq_query_single(sh, sq, ct, out_data,
+						  need_lock, user_data, push);
+		return ret > 0 ? 0 : -1;
+	}
 	do {
 		mlx5_aso_ct_completion_handle(sh, sq, need_lock);
-		ret = mlx5_aso_ct_sq_query_single(sh, sq, ct, out_data, need_lock);
+		ret = mlx5_aso_ct_sq_query_single(sh, sq, ct, out_data,
+				need_lock, NULL, true);
 		if (ret < 0)
 			return ret;
 		else if (ret > 0)
@@ -1630,7 +1752,8 @@ mlx5_aso_ct_available(struct mlx5_dev_ctx_shared *sh,
 		rte_errno = ENXIO;
 		return -rte_errno;
 	} else if (state == ASO_CONNTRACK_READY ||
-		   state == ASO_CONNTRACK_QUERY) {
+		   state == ASO_CONNTRACK_QUERY ||
+		   state == ASO_CONNTRACK_WAIT_ASYNC) {
 		return 0;
 	}
 	do {
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 250f61d46f..3cc4b9bcd4 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -13103,7 +13103,7 @@ flow_dv_translate_create_conntrack(struct rte_eth_dev *dev,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "Failed to allocate CT object");
 	ct = flow_aso_ct_get_by_dev_idx(dev, idx);
-	if (mlx5_aso_ct_update_by_wqe(sh, MLX5_HW_INV_QUEUE, ct, pro)) {
+	if (mlx5_aso_ct_update_by_wqe(sh, MLX5_HW_INV_QUEUE, ct, pro, NULL, true)) {
 		flow_dv_aso_ct_dev_release(dev, idx);
 		rte_flow_error_set(error, EBUSY,
 				   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
@@ -15917,7 +15917,7 @@ __flow_dv_action_ct_update(struct rte_eth_dev *dev, uint32_t idx,
 		if (ret)
 			return ret;
 		ret = mlx5_aso_ct_update_by_wqe(priv->sh, MLX5_HW_INV_QUEUE,
-						ct, new_prf);
+						ct, new_prf, NULL, true);
 		if (ret)
 			return rte_flow_error_set(error, EIO,
 					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
@@ -16753,7 +16753,8 @@ flow_dv_action_query(struct rte_eth_dev *dev,
 							ct->peer;
 		((struct rte_flow_action_conntrack *)data)->is_original_dir =
 							ct->is_original;
-		if (mlx5_aso_ct_query_by_wqe(priv->sh, MLX5_HW_INV_QUEUE, ct, data))
+		if (mlx5_aso_ct_query_by_wqe(priv->sh, MLX5_HW_INV_QUEUE, ct,
+					data, NULL, true))
 			return rte_flow_error_set(error, EIO,
 					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 					NULL,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 59d9db04d3..2792a0fc39 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -1178,9 +1178,9 @@ static rte_be32_t vlan_hdr_to_be32(const struct rte_flow_action *actions)
 }
 
 static __rte_always_inline struct mlx5_aso_mtr *
-flow_hw_meter_mark_alloc(struct rte_eth_dev *dev,
-			   const struct rte_flow_action *action,
-			   uint32_t queue)
+flow_hw_meter_mark_alloc(struct rte_eth_dev *dev, uint32_t queue,
+			 const struct rte_flow_action *action,
+			 void *user_data, bool push)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
@@ -1200,13 +1200,14 @@ flow_hw_meter_mark_alloc(struct rte_eth_dev *dev,
 	fm->is_enable = meter_mark->state;
 	fm->color_aware = meter_mark->color_mode;
 	aso_mtr->pool = pool;
-	aso_mtr->state = ASO_METER_WAIT;
+	aso_mtr->state = (queue == MLX5_HW_INV_QUEUE) ?
+			  ASO_METER_WAIT : ASO_METER_WAIT_ASYNC;
 	aso_mtr->offset = mtr_id - 1;
 	aso_mtr->init_color = (meter_mark->color_mode) ?
 		meter_mark->init_color : RTE_COLOR_GREEN;
 	/* Update ASO flow meter by wqe. */
 	if (mlx5_aso_meter_update_by_wqe(priv->sh, queue, aso_mtr,
-					 &priv->mtr_bulk)) {
+					 &priv->mtr_bulk, user_data, push)) {
 		mlx5_ipool_free(pool->idx_pool, mtr_id);
 		return NULL;
 	}
@@ -1231,7 +1232,7 @@ flow_hw_meter_mark_compile(struct rte_eth_dev *dev,
 	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
 	struct mlx5_aso_mtr *aso_mtr;
 
-	aso_mtr = flow_hw_meter_mark_alloc(dev, action, queue);
+	aso_mtr = flow_hw_meter_mark_alloc(dev, queue, action, NULL, true);
 	if (!aso_mtr)
 		return -1;
 
@@ -2295,9 +2296,13 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 				rte_col_2_mlx5_col(aso_mtr->init_color);
 			break;
 		case RTE_FLOW_ACTION_TYPE_METER_MARK:
+			/*
+			 * Allocating the meter directly would slow down the
+			 * flow insertion rate.
+			 */
 			ret = flow_hw_meter_mark_compile(dev,
 				act_data->action_dst, action,
-				rule_acts, &job->flow->mtr_id, queue);
+				rule_acts, &job->flow->mtr_id, MLX5_HW_INV_QUEUE);
 			if (ret != 0)
 				return ret;
 			break;
@@ -2604,6 +2609,74 @@ flow_hw_age_count_release(struct mlx5_priv *priv, uint32_t queue,
 	}
 }
 
+static inline int
+__flow_hw_pull_indir_action_comp(struct rte_eth_dev *dev,
+				 uint32_t queue,
+				 struct rte_flow_op_result res[],
+				 uint16_t n_res)
+
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_ring *r = priv->hw_q[queue].indir_cq;
+	struct mlx5_hw_q_job *job;
+	void *user_data = NULL;
+	uint32_t type, idx;
+	struct mlx5_aso_mtr *aso_mtr;
+	struct mlx5_aso_ct_action *aso_ct;
+	int ret_comp, i;
+
+	ret_comp = (int)rte_ring_count(r);
+	if (ret_comp > n_res)
+		ret_comp = n_res;
+	for (i = 0; i < ret_comp; i++) {
+		rte_ring_dequeue(r, &user_data);
+		res[i].user_data = user_data;
+		res[i].status = RTE_FLOW_OP_SUCCESS;
+	}
+	if (ret_comp < n_res && priv->hws_mpool)
+		ret_comp += mlx5_aso_pull_completion(&priv->hws_mpool->sq[queue],
+				&res[ret_comp], n_res - ret_comp);
+	if (ret_comp < n_res && priv->hws_ctpool)
+		ret_comp += mlx5_aso_pull_completion(&priv->ct_mng->aso_sqs[queue],
+				&res[ret_comp], n_res - ret_comp);
+	for (i = 0; i <  ret_comp; i++) {
+		job = (struct mlx5_hw_q_job *)res[i].user_data;
+		/* Restore user data. */
+		res[i].user_data = job->user_data;
+		if (job->type == MLX5_HW_Q_JOB_TYPE_DESTROY) {
+			type = MLX5_INDIRECT_ACTION_TYPE_GET(job->action);
+			if (type == MLX5_INDIRECT_ACTION_TYPE_METER_MARK) {
+				idx = MLX5_INDIRECT_ACTION_IDX_GET(job->action);
+				mlx5_ipool_free(priv->hws_mpool->idx_pool, idx);
+			}
+		} else if (job->type == MLX5_HW_Q_JOB_TYPE_CREATE) {
+			type = MLX5_INDIRECT_ACTION_TYPE_GET(job->action);
+			if (type == MLX5_INDIRECT_ACTION_TYPE_METER_MARK) {
+				idx = MLX5_INDIRECT_ACTION_IDX_GET(job->action);
+				aso_mtr = mlx5_ipool_get(priv->hws_mpool->idx_pool, idx);
+				aso_mtr->state = ASO_METER_READY;
+			} else if (type == MLX5_INDIRECT_ACTION_TYPE_CT) {
+				idx = MLX5_ACTION_CTX_CT_GET_IDX
+					((uint32_t)(uintptr_t)job->action);
+				aso_ct = mlx5_ipool_get(priv->hws_ctpool->cts, idx);
+				aso_ct->state = ASO_CONNTRACK_READY;
+			}
+		} else if (job->type == MLX5_HW_Q_JOB_TYPE_QUERY) {
+			type = MLX5_INDIRECT_ACTION_TYPE_GET(job->action);
+			if (type == MLX5_INDIRECT_ACTION_TYPE_CT) {
+				idx = MLX5_ACTION_CTX_CT_GET_IDX
+					((uint32_t)(uintptr_t)job->action);
+				aso_ct = mlx5_ipool_get(priv->hws_ctpool->cts, idx);
+				mlx5_aso_ct_obj_analyze(job->profile,
+							job->out_data);
+				aso_ct->state = ASO_CONNTRACK_READY;
+			}
+		}
+		priv->hw_q[queue].job[priv->hw_q[queue].job_idx++] = job;
+	}
+	return ret_comp;
+}
+
 /**
  * Pull the enqueued flows.
  *
@@ -2636,6 +2709,7 @@ flow_hw_pull(struct rte_eth_dev *dev,
 	struct mlx5_hw_q_job *job;
 	int ret, i;
 
+	/* 1. Pull the flow completion. */
 	ret = mlx5dr_send_queue_poll(priv->dr_ctx, queue, res, n_res);
 	if (ret < 0)
 		return rte_flow_error_set(error, rte_errno,
@@ -2661,9 +2735,34 @@ flow_hw_pull(struct rte_eth_dev *dev,
 		}
 		priv->hw_q[queue].job[priv->hw_q[queue].job_idx++] = job;
 	}
+	/* 2. Pull indirect action comp. */
+	if (ret < n_res)
+		ret += __flow_hw_pull_indir_action_comp(dev, queue, &res[ret],
+							n_res - ret);
 	return ret;
 }
 
+static inline void
+__flow_hw_push_action(struct rte_eth_dev *dev,
+		    uint32_t queue)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_ring *iq = priv->hw_q[queue].indir_iq;
+	struct rte_ring *cq = priv->hw_q[queue].indir_cq;
+	void *job = NULL;
+	uint32_t ret, i;
+
+	ret = rte_ring_count(iq);
+	for (i = 0; i < ret; i++) {
+		rte_ring_dequeue(iq, &job);
+		rte_ring_enqueue(cq, job);
+	}
+	if (priv->hws_ctpool)
+		mlx5_aso_push_wqe(priv->sh, &priv->ct_mng->aso_sqs[queue]);
+	if (priv->hws_mpool)
+		mlx5_aso_push_wqe(priv->sh, &priv->hws_mpool->sq[queue]);
+}
+
 /**
  * Push the enqueued flows to HW.
  *
@@ -2687,6 +2786,7 @@ flow_hw_push(struct rte_eth_dev *dev,
 	struct mlx5_priv *priv = dev->data->dev_private;
 	int ret;
 
+	__flow_hw_push_action(dev, queue);
 	ret = mlx5dr_send_queue_action(priv->dr_ctx, queue,
 				       MLX5DR_SEND_QUEUE_ACTION_DRAIN);
 	if (ret) {
@@ -5940,7 +6040,7 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	/* Adds one queue to be used by PMD.
 	 * The last queue will be used by the PMD.
 	 */
-	uint16_t nb_q_updated;
+	uint16_t nb_q_updated = 0;
 	struct rte_flow_queue_attr **_queue_attr = NULL;
 	struct rte_flow_queue_attr ctrl_queue_attr = {0};
 	bool is_proxy = !!(priv->sh->config.dv_esw_en && priv->master);
@@ -6007,6 +6107,7 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		goto err;
 	}
 	for (i = 0; i < nb_q_updated; i++) {
+		char mz_name[RTE_MEMZONE_NAMESIZE];
 		uint8_t *encap = NULL;
 		struct mlx5_modification_cmd *mhdr_cmd = NULL;
 		struct rte_flow_item *items = NULL;
@@ -6034,6 +6135,22 @@ flow_hw_configure(struct rte_eth_dev *dev,
 			job[j].items = &items[j * MLX5_HW_MAX_ITEMS];
 			priv->hw_q[i].job[j] = &job[j];
 		}
+		snprintf(mz_name, sizeof(mz_name), "port_%u_indir_act_cq_%u",
+			 dev->data->port_id, i);
+		priv->hw_q[i].indir_cq = rte_ring_create(mz_name,
+				_queue_attr[i]->size, SOCKET_ID_ANY,
+				RING_F_SP_ENQ | RING_F_SC_DEQ |
+				RING_F_EXACT_SZ);
+		if (!priv->hw_q[i].indir_cq)
+			goto err;
+		snprintf(mz_name, sizeof(mz_name), "port_%u_indir_act_iq_%u",
+			 dev->data->port_id, i);
+		priv->hw_q[i].indir_iq = rte_ring_create(mz_name,
+				_queue_attr[i]->size, SOCKET_ID_ANY,
+				RING_F_SP_ENQ | RING_F_SC_DEQ |
+				RING_F_EXACT_SZ);
+		if (!priv->hw_q[i].indir_iq)
+			goto err;
 	}
 	dr_ctx_attr.pd = priv->sh->cdev->pd;
 	dr_ctx_attr.queues = nb_q_updated;
@@ -6151,6 +6268,12 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	flow_hw_destroy_vlan(dev);
 	if (dr_ctx)
 		claim_zero(mlx5dr_context_close(dr_ctx));
+	for (i = 0; i < nb_q_updated; i++) {
+		if (priv->hw_q[i].indir_iq)
+			rte_ring_free(priv->hw_q[i].indir_iq);
+		if (priv->hw_q[i].indir_cq)
+			rte_ring_free(priv->hw_q[i].indir_cq);
+	}
 	mlx5_free(priv->hw_q);
 	priv->hw_q = NULL;
 	if (priv->acts_ipool) {
@@ -6180,7 +6303,7 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 	struct rte_flow_template_table *tbl;
 	struct rte_flow_pattern_template *it;
 	struct rte_flow_actions_template *at;
-	int i;
+	uint32_t i;
 
 	if (!priv->dr_ctx)
 		return;
@@ -6228,6 +6351,10 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 		flow_hw_ct_mng_destroy(dev, priv->ct_mng);
 		priv->ct_mng = NULL;
 	}
+	for (i = 0; i < priv->nb_queue; i++) {
+		rte_ring_free(priv->hw_q[i].indir_iq);
+		rte_ring_free(priv->hw_q[i].indir_cq);
+	}
 	mlx5_free(priv->hw_q);
 	priv->hw_q = NULL;
 	claim_zero(mlx5dr_context_close(priv->dr_ctx));
@@ -6416,8 +6543,9 @@ flow_hw_conntrack_destroy(struct rte_eth_dev *dev __rte_unused,
 }
 
 static int
-flow_hw_conntrack_query(struct rte_eth_dev *dev, uint32_t idx,
+flow_hw_conntrack_query(struct rte_eth_dev *dev, uint32_t queue, uint32_t idx,
 			struct rte_flow_action_conntrack *profile,
+			void *user_data, bool push,
 			struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -6441,7 +6569,7 @@ flow_hw_conntrack_query(struct rte_eth_dev *dev, uint32_t idx,
 	}
 	profile->peer_port = ct->peer;
 	profile->is_original_dir = ct->is_original;
-	if (mlx5_aso_ct_query_by_wqe(priv->sh, MLX5_HW_INV_QUEUE, ct, profile))
+	if (mlx5_aso_ct_query_by_wqe(priv->sh, queue, ct, profile, user_data, push))
 		return rte_flow_error_set(error, EIO,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -6453,7 +6581,8 @@ flow_hw_conntrack_query(struct rte_eth_dev *dev, uint32_t idx,
 static int
 flow_hw_conntrack_update(struct rte_eth_dev *dev, uint32_t queue,
 			 const struct rte_flow_modify_conntrack *action_conf,
-			 uint32_t idx, struct rte_flow_error *error)
+			 uint32_t idx, void *user_data, bool push,
+			 struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_aso_ct_pool *pool = priv->hws_ctpool;
@@ -6484,7 +6613,8 @@ flow_hw_conntrack_update(struct rte_eth_dev *dev, uint32_t queue,
 		ret = mlx5_validate_action_ct(dev, new_prf, error);
 		if (ret)
 			return ret;
-		ret = mlx5_aso_ct_update_by_wqe(priv->sh, queue, ct, new_prf);
+		ret = mlx5_aso_ct_update_by_wqe(priv->sh, queue, ct, new_prf,
+						user_data, push);
 		if (ret)
 			return rte_flow_error_set(error, EIO,
 					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
@@ -6506,6 +6636,7 @@ flow_hw_conntrack_update(struct rte_eth_dev *dev, uint32_t queue,
 static struct rte_flow_action_handle *
 flow_hw_conntrack_create(struct rte_eth_dev *dev, uint32_t queue,
 			 const struct rte_flow_action_conntrack *pro,
+			 void *user_data, bool push,
 			 struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -6532,7 +6663,7 @@ flow_hw_conntrack_create(struct rte_eth_dev *dev, uint32_t queue,
 	ct->is_original = !!pro->is_original_dir;
 	ct->peer = pro->peer_port;
 	ct->pool = pool;
-	if (mlx5_aso_ct_update_by_wqe(priv->sh, queue, ct, pro)) {
+	if (mlx5_aso_ct_update_by_wqe(priv->sh, queue, ct, pro, user_data, push)) {
 		mlx5_ipool_free(pool->cts, ct_idx);
 		rte_flow_error_set(error, EBUSY,
 				   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
@@ -6652,15 +6783,29 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 			     struct rte_flow_error *error)
 {
 	struct rte_flow_action_handle *handle = NULL;
+	struct mlx5_hw_q_job *job = NULL;
 	struct mlx5_priv *priv = dev->data->dev_private;
 	const struct rte_flow_action_age *age;
 	struct mlx5_aso_mtr *aso_mtr;
 	cnt_id_t cnt_id;
 	uint32_t mtr_id;
 	uint32_t age_idx;
+	bool push = true;
+	bool aso = false;
 
-	RTE_SET_USED(attr);
-	RTE_SET_USED(user_data);
+	if (attr) {
+		MLX5_ASSERT(queue != MLX5_HW_INV_QUEUE);
+		if (unlikely(!priv->hw_q[queue].job_idx)) {
+			rte_flow_error_set(error, ENOMEM,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				"Flow queue full.");
+			return NULL;
+		}
+		job = priv->hw_q[queue].job[--priv->hw_q[queue].job_idx];
+		job->type = MLX5_HW_Q_JOB_TYPE_CREATE;
+		job->user_data = user_data;
+		push = !attr->postpone;
+	}
 	switch (action->type) {
 	case RTE_FLOW_ACTION_TYPE_AGE:
 		if (priv->hws_strict_queue) {
@@ -6700,10 +6845,13 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 				 (uintptr_t)cnt_id;
 		break;
 	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
-		handle = flow_hw_conntrack_create(dev, queue, action->conf, error);
+		aso = true;
+		handle = flow_hw_conntrack_create(dev, queue, action->conf, job,
+						  push, error);
 		break;
 	case RTE_FLOW_ACTION_TYPE_METER_MARK:
-		aso_mtr = flow_hw_meter_mark_alloc(dev, action, queue);
+		aso = true;
+		aso_mtr = flow_hw_meter_mark_alloc(dev, queue, action, job, push);
 		if (!aso_mtr)
 			break;
 		mtr_id = (MLX5_INDIRECT_ACTION_TYPE_METER_MARK <<
@@ -6716,7 +6864,20 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 	default:
 		rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION,
 				   NULL, "action type not supported");
-		return NULL;
+		break;
+	}
+	if (job) {
+		if (!handle) {
+			priv->hw_q[queue].job_idx++;
+			return NULL;
+		}
+		job->action = handle;
+		if (push)
+			__flow_hw_push_action(dev, queue);
+		if (aso)
+			return handle;
+		rte_ring_enqueue(push ? priv->hw_q[queue].indir_cq :
+				 priv->hw_q[queue].indir_iq, job);
 	}
 	return handle;
 }
@@ -6750,32 +6911,56 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 			     void *user_data,
 			     struct rte_flow_error *error)
 {
-	RTE_SET_USED(attr);
-	RTE_SET_USED(user_data);
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
+	const struct rte_flow_modify_conntrack *ct_conf =
+		(const struct rte_flow_modify_conntrack *)update;
 	const struct rte_flow_update_meter_mark *upd_meter_mark =
 		(const struct rte_flow_update_meter_mark *)update;
 	const struct rte_flow_action_meter_mark *meter_mark;
+	struct mlx5_hw_q_job *job = NULL;
 	struct mlx5_aso_mtr *aso_mtr;
 	struct mlx5_flow_meter_info *fm;
 	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
 	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
 	uint32_t idx = act_idx & ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
+	int ret = 0;
+	bool push = true;
+	bool aso = false;
 
+	if (attr) {
+		MLX5_ASSERT(queue != MLX5_HW_INV_QUEUE);
+		if (unlikely(!priv->hw_q[queue].job_idx))
+			return rte_flow_error_set(error, ENOMEM,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				"Action update failed due to queue full.");
+		job = priv->hw_q[queue].job[--priv->hw_q[queue].job_idx];
+		job->type = MLX5_HW_Q_JOB_TYPE_UPDATE;
+		job->user_data = user_data;
+		push = !attr->postpone;
+	}
 	switch (type) {
 	case MLX5_INDIRECT_ACTION_TYPE_AGE:
-		return mlx5_hws_age_action_update(priv, idx, update, error);
+		ret = mlx5_hws_age_action_update(priv, idx, update, error);
+		break;
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
-		return flow_hw_conntrack_update(dev, queue, update, act_idx, error);
+		if (ct_conf->state)
+			aso = true;
+		ret = flow_hw_conntrack_update(dev, queue, update, act_idx,
+					       job, push, error);
+		break;
 	case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
+		aso = true;
 		meter_mark = &upd_meter_mark->meter_mark;
 		/* Find ASO object. */
 		aso_mtr = mlx5_ipool_get(pool->idx_pool, idx);
-		if (!aso_mtr)
-			return rte_flow_error_set(error, EINVAL,
+		if (!aso_mtr) {
+			ret = -EINVAL;
+			rte_flow_error_set(error, EINVAL,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL, "Invalid meter_mark update index");
+			break;
+		}
 		fm = &aso_mtr->fm;
 		if (upd_meter_mark->profile_valid)
 			fm->profile = (struct mlx5_flow_meter_profile *)
@@ -6789,25 +6974,46 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 			fm->is_enable = meter_mark->state;
 		/* Update ASO flow meter by wqe. */
 		if (mlx5_aso_meter_update_by_wqe(priv->sh, queue,
-						 aso_mtr, &priv->mtr_bulk))
-			return rte_flow_error_set(error, EINVAL,
+						 aso_mtr, &priv->mtr_bulk, job, push)) {
+			ret = -EINVAL;
+			rte_flow_error_set(error, EINVAL,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL, "Unable to update ASO meter WQE");
+			break;
+		}
 		/* Wait for ASO object completion. */
 		if (queue == MLX5_HW_INV_QUEUE &&
-		    mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr))
-			return rte_flow_error_set(error, EINVAL,
+		    mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr)) {
+			ret = -EINVAL;
+			rte_flow_error_set(error, EINVAL,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL, "Unable to wait for ASO meter CQE");
+		}
 		break;
 	case MLX5_INDIRECT_ACTION_TYPE_RSS:
-		return flow_dv_action_update(dev, handle, update, error);
+		ret = flow_dv_action_update(dev, handle, update, error);
+		break;
 	default:
-		return rte_flow_error_set(error, ENOTSUP,
+		ret = -ENOTSUP;
+		rte_flow_error_set(error, ENOTSUP,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "action type not supported");
+		break;
 	}
-	return 0;
+	if (job) {
+		if (ret) {
+			priv->hw_q[queue].job_idx++;
+			return ret;
+		}
+		job->action = handle;
+		if (push)
+			__flow_hw_push_action(dev, queue);
+		if (aso)
+			return 0;
+		rte_ring_enqueue(push ? priv->hw_q[queue].indir_cq :
+				 priv->hw_q[queue].indir_iq, job);
+	}
+	return ret;
 }
 
 /**
@@ -6842,15 +7048,28 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 	uint32_t idx = act_idx & ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
+	struct mlx5_hw_q_job *job = NULL;
 	struct mlx5_aso_mtr *aso_mtr;
 	struct mlx5_flow_meter_info *fm;
+	bool push = true;
+	bool aso = false;
+	int ret = 0;
 
-	RTE_SET_USED(queue);
-	RTE_SET_USED(attr);
-	RTE_SET_USED(user_data);
+	if (attr) {
+		MLX5_ASSERT(queue != MLX5_HW_INV_QUEUE);
+		if (unlikely(!priv->hw_q[queue].job_idx))
+			return rte_flow_error_set(error, ENOMEM,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				"Action destroy failed due to queue full.");
+		job = priv->hw_q[queue].job[--priv->hw_q[queue].job_idx];
+		job->type = MLX5_HW_Q_JOB_TYPE_DESTROY;
+		job->user_data = user_data;
+		push = !attr->postpone;
+	}
 	switch (type) {
 	case MLX5_INDIRECT_ACTION_TYPE_AGE:
-		return mlx5_hws_age_action_destroy(priv, age_idx, error);
+		ret = mlx5_hws_age_action_destroy(priv, age_idx, error);
+		break;
 	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
 		age_idx = mlx5_hws_cnt_age_get(priv->hws_cpool, act_idx);
 		if (age_idx != 0)
@@ -6859,39 +7078,69 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 			 * time to update the AGE.
 			 */
 			mlx5_hws_age_nb_cnt_decrease(priv, age_idx);
-		return mlx5_hws_cnt_shared_put(priv->hws_cpool, &act_idx);
+		ret = mlx5_hws_cnt_shared_put(priv->hws_cpool, &act_idx);
+		break;
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
-		return flow_hw_conntrack_destroy(dev, act_idx, error);
+		ret = flow_hw_conntrack_destroy(dev, act_idx, error);
+		break;
 	case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
 		aso_mtr = mlx5_ipool_get(pool->idx_pool, idx);
-		if (!aso_mtr)
-			return rte_flow_error_set(error, EINVAL,
+		if (!aso_mtr) {
+			ret = -EINVAL;
+			rte_flow_error_set(error, EINVAL,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL, "Invalid meter_mark destroy index");
+			break;
+		}
 		fm = &aso_mtr->fm;
 		fm->is_enable = 0;
 		/* Update ASO flow meter by wqe. */
 		if (mlx5_aso_meter_update_by_wqe(priv->sh, queue, aso_mtr,
-						 &priv->mtr_bulk))
-			return rte_flow_error_set(error, EINVAL,
+						 &priv->mtr_bulk, job, push)) {
+			ret = -EINVAL;
+			rte_flow_error_set(error, EINVAL,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL, "Unable to update ASO meter WQE");
+			break;
+		}
 		/* Wait for ASO object completion. */
 		if (queue == MLX5_HW_INV_QUEUE &&
-		    mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr))
-			return rte_flow_error_set(error, EINVAL,
+		    mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr)) {
+			ret = -EINVAL;
+			rte_flow_error_set(error, EINVAL,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL, "Unable to wait for ASO meter CQE");
-		mlx5_ipool_free(pool->idx_pool, idx);
+			break;
+		}
+		if (!job)
+			mlx5_ipool_free(pool->idx_pool, idx);
+		else
+			aso = true;
 		break;
 	case MLX5_INDIRECT_ACTION_TYPE_RSS:
-		return flow_dv_action_destroy(dev, handle, error);
+		ret = flow_dv_action_destroy(dev, handle, error);
+		break;
 	default:
-		return rte_flow_error_set(error, ENOTSUP,
+		ret = -ENOTSUP;
+		rte_flow_error_set(error, ENOTSUP,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "action type not supported");
+		break;
 	}
-	return 0;
+	if (job) {
+		if (ret) {
+			priv->hw_q[queue].job_idx++;
+			return ret;
+		}
+		job->action = handle;
+		if (push)
+			__flow_hw_push_action(dev, queue);
+		if (aso)
+			return ret;
+		rte_ring_enqueue(push ? priv->hw_q[queue].indir_cq :
+				 priv->hw_q[queue].indir_iq, job);
+	}
+	return ret;
 }
 
 static int
@@ -7115,28 +7364,76 @@ flow_hw_action_update(struct rte_eth_dev *dev,
 }
 
 static int
-flow_hw_action_query(struct rte_eth_dev *dev,
-		     const struct rte_flow_action_handle *handle, void *data,
-		     struct rte_flow_error *error)
+flow_hw_action_handle_query(struct rte_eth_dev *dev, uint32_t queue,
+			    const struct rte_flow_op_attr *attr,
+			    const struct rte_flow_action_handle *handle,
+			    void *data, void *user_data,
+			    struct rte_flow_error *error)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hw_q_job *job = NULL;
 	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
 	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
 	uint32_t age_idx = act_idx & MLX5_HWS_AGE_IDX_MASK;
+	int ret;
+	bool push = true;
+	bool aso = false;
 
+	if (attr) {
+		MLX5_ASSERT(queue != MLX5_HW_INV_QUEUE);
+		if (unlikely(!priv->hw_q[queue].job_idx))
+			return rte_flow_error_set(error, ENOMEM,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				"Action query failed due to queue full.");
+		job = priv->hw_q[queue].job[--priv->hw_q[queue].job_idx];
+		job->type = MLX5_HW_Q_JOB_TYPE_QUERY;
+		job->user_data = user_data;
+		push = !attr->postpone;
+	}
 	switch (type) {
 	case MLX5_INDIRECT_ACTION_TYPE_AGE:
-		return flow_hw_query_age(dev, age_idx, data, error);
+		ret = flow_hw_query_age(dev, age_idx, data, error);
+		break;
 	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
-		return flow_hw_query_counter(dev, act_idx, data, error);
+		ret = flow_hw_query_counter(dev, act_idx, data, error);
+		break;
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
-		return flow_hw_conntrack_query(dev, act_idx, data, error);
-	case MLX5_INDIRECT_ACTION_TYPE_RSS:
-		return flow_dv_action_query(dev, handle, data, error);
+		aso = true;
+		if (job)
+			job->profile = (struct rte_flow_action_conntrack *)data;
+		ret = flow_hw_conntrack_query(dev, queue, act_idx, data,
+					      job, push, error);
+		break;
 	default:
-		return rte_flow_error_set(error, ENOTSUP,
+		ret = -ENOTSUP;
+		rte_flow_error_set(error, ENOTSUP,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "action type not supported");
+		break;
+	}
+	if (job) {
+		if (ret) {
+			priv->hw_q[queue].job_idx++;
+			return ret;
+		}
+		job->action = handle;
+		if (push)
+			__flow_hw_push_action(dev, queue);
+		if (aso)
+			return ret;
+		rte_ring_enqueue(push ? priv->hw_q[queue].indir_cq :
+				 priv->hw_q[queue].indir_iq, job);
 	}
+	return ret;
+}
+
+static int
+flow_hw_action_query(struct rte_eth_dev *dev,
+		     const struct rte_flow_action_handle *handle, void *data,
+		     struct rte_flow_error *error)
+{
+	return flow_hw_action_handle_query(dev, MLX5_HW_INV_QUEUE, NULL,
+			handle, data, NULL, error);
 }
 
 /**
@@ -7251,6 +7548,7 @@ const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.async_action_create = flow_hw_action_handle_create,
 	.async_action_destroy = flow_hw_action_handle_destroy,
 	.async_action_update = flow_hw_action_handle_update,
+	.async_action_query = flow_hw_action_handle_query,
 	.action_validate = flow_hw_action_validate,
 	.action_create = flow_hw_action_create,
 	.action_destroy = flow_hw_action_destroy,
diff --git a/drivers/net/mlx5/mlx5_flow_meter.c b/drivers/net/mlx5/mlx5_flow_meter.c
index ed2306283d..08f8aad70a 100644
--- a/drivers/net/mlx5/mlx5_flow_meter.c
+++ b/drivers/net/mlx5/mlx5_flow_meter.c
@@ -1632,7 +1632,7 @@ mlx5_flow_meter_action_modify(struct mlx5_priv *priv,
 		fm->is_enable = !!is_enable;
 		aso_mtr = container_of(fm, struct mlx5_aso_mtr, fm);
 		ret = mlx5_aso_meter_update_by_wqe(priv->sh, MLX5_HW_INV_QUEUE,
-						   aso_mtr, &priv->mtr_bulk);
+						   aso_mtr, &priv->mtr_bulk, NULL, true);
 		if (ret)
 			return ret;
 		ret = mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr);
@@ -1882,7 +1882,7 @@ mlx5_flow_meter_create(struct rte_eth_dev *dev, uint32_t meter_id,
 	if (priv->sh->meter_aso_en) {
 		aso_mtr = container_of(fm, struct mlx5_aso_mtr, fm);
 		ret = mlx5_aso_meter_update_by_wqe(priv->sh, MLX5_HW_INV_QUEUE,
-						   aso_mtr, &priv->mtr_bulk);
+						   aso_mtr, &priv->mtr_bulk, NULL, true);
 		if (ret)
 			goto error;
 		if (!priv->mtr_idx_tbl) {
@@ -1988,7 +1988,7 @@ mlx5_flow_meter_hws_create(struct rte_eth_dev *dev, uint32_t meter_id,
 	fm->initialized = 1;
 	/* Update ASO flow meter by wqe. */
 	ret = mlx5_aso_meter_update_by_wqe(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr,
-					   &priv->mtr_bulk);
+					   &priv->mtr_bulk, NULL, true);
 	if (ret)
 		return -rte_mtr_error_set(error, ENOTSUP,
 			RTE_MTR_ERROR_TYPE_UNSPECIFIED,
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v5 15/18] net/mlx5: support flow integrity in HWS group 0
  2022-10-20  3:21 ` [PATCH v5 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (13 preceding siblings ...)
  2022-10-20  3:22   ` [PATCH v5 14/18] net/mlx5: add async action push and pull support Suanming Mou
@ 2022-10-20  3:22   ` Suanming Mou
  2022-10-20  3:22   ` [PATCH v5 16/18] net/mlx5: support device control for E-Switch default rule Suanming Mou
                     ` (2 subsequent siblings)
  17 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-10-20  3:22 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Gregory Etelson

From: Gregory Etelson <getelson@nvidia.com>

- Reformat flow integrity item translation for HWS code.
- Support flow integrity bits in HWS group 0.
- Update integrity item translation to match positive semantics only.
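
For illustration only (not part of this patch), positive-only semantics
means an application requests the "check passed" bits and matches on the
same bits. A minimal sketch using the public rte_flow integrity item;
level 0 for the outer part and the surrounding ETH item are assumptions
of the sketch, and with the template API the mask would go into the
pattern template while the spec goes into the flow rule:

	#include <rte_flow.h>

	/* Hypothetical pattern: packets whose outer L3 and L4 integrity
	 * checks passed. Negative matching (l3_ok/l4_ok == 0) is not
	 * supported by the PMD after this change.
	 */
	struct rte_flow_item_integrity integ = {
		.level = 0,	/* outer headers; HWS matches outer only */
		.l3_ok = 1,
		.l4_ok = 1,
	};
	struct rte_flow_item pattern[] = {
		{ .type = RTE_FLOW_ITEM_TYPE_ETH },
		{
			.type = RTE_FLOW_ITEM_TYPE_INTEGRITY,
			.spec = &integ,
			.mask = &integ, /* same bits requested and matched */
		},
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};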

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
---
 drivers/net/mlx5/mlx5_flow.h    |   1 +
 drivers/net/mlx5/mlx5_flow_dv.c | 163 ++++++++++++++++----------------
 drivers/net/mlx5/mlx5_flow_hw.c |   8 ++
 3 files changed, 90 insertions(+), 82 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 57cebb5ce6..ddc23aaf9c 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1470,6 +1470,7 @@ struct mlx5_dv_matcher_workspace {
 	struct mlx5_flow_rss_desc *rss_desc; /* RSS descriptor. */
 	const struct rte_flow_item *tunnel_item; /* Flow tunnel item. */
 	const struct rte_flow_item *gre_item; /* Flow GRE item. */
+	const struct rte_flow_item *integrity_items[2];
 };
 
 struct mlx5_flow_split_info {
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 3cc4b9bcd4..1497423891 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -12648,132 +12648,121 @@ flow_dv_aso_age_params_init(struct rte_eth_dev *dev,
 
 static void
 flow_dv_translate_integrity_l4(const struct rte_flow_item_integrity *mask,
-			       const struct rte_flow_item_integrity *value,
-			       void *headers_m, void *headers_v)
+			       void *headers)
 {
+	/*
+	 * In HWS mode MLX5_ITEM_UPDATE() macro assigns the same pointer to
+	 * both mask and value, therefore either can be used.
+	 * In SWS SW_V mode mask points to item mask and value points to item
+	 * spec. Integrity item value is used only if matching mask is set.
+	 * Use mask reference here to keep SWS functionality.
+	 */
 	if (mask->l4_ok) {
 		/* RTE l4_ok filter aggregates hardware l4_ok and
 		 * l4_checksum_ok filters.
 		 * Positive RTE l4_ok match requires hardware match on both L4
 		 * hardware integrity bits.
-		 * For negative match, check hardware l4_checksum_ok bit only,
-		 * because hardware sets that bit to 0 for all packets
-		 * with bad L4.
+		 * PMD supports positive integrity item semantics only.
 		 */
-		if (value->l4_ok) {
-			MLX5_SET(fte_match_set_lyr_2_4, headers_m, l4_ok, 1);
-			MLX5_SET(fte_match_set_lyr_2_4, headers_v, l4_ok, 1);
-		}
-		MLX5_SET(fte_match_set_lyr_2_4, headers_m, l4_checksum_ok, 1);
-		MLX5_SET(fte_match_set_lyr_2_4, headers_v, l4_checksum_ok,
-			 !!value->l4_ok);
-	}
-	if (mask->l4_csum_ok) {
-		MLX5_SET(fte_match_set_lyr_2_4, headers_m, l4_checksum_ok, 1);
-		MLX5_SET(fte_match_set_lyr_2_4, headers_v, l4_checksum_ok,
-			 value->l4_csum_ok);
+		MLX5_SET(fte_match_set_lyr_2_4, headers, l4_ok, 1);
+		MLX5_SET(fte_match_set_lyr_2_4, headers, l4_checksum_ok, 1);
+	} else if (mask->l4_csum_ok) {
+		MLX5_SET(fte_match_set_lyr_2_4, headers, l4_checksum_ok, 1);
 	}
 }
 
 static void
 flow_dv_translate_integrity_l3(const struct rte_flow_item_integrity *mask,
-			       const struct rte_flow_item_integrity *value,
-			       void *headers_m, void *headers_v, bool is_ipv4)
+			       void *headers, bool is_ipv4)
 {
+	/*
+	 * In HWS mode MLX5_ITEM_UPDATE() macro assigns the same pointer to
+	 * both mask and value, therefore either can be used.
+	 * In SWS SW_V mode mask points to item mask and value points to item
+	 * spec. Integrity item value is used only if matching mask is set.
+	 * Use mask reference here to keep SWS functionality.
+	 */
 	if (mask->l3_ok) {
 		/* RTE l3_ok filter aggregates for IPv4 hardware l3_ok and
 		 * ipv4_csum_ok filters.
 		 * Positive RTE l3_ok match requires hardware match on both L3
 		 * hardware integrity bits.
-		 * For negative match, check hardware l3_csum_ok bit only,
-		 * because hardware sets that bit to 0 for all packets
-		 * with bad L3.
+		 * PMD supports positive integrity item semantics only.
 		 */
+		MLX5_SET(fte_match_set_lyr_2_4, headers, l3_ok, 1);
 		if (is_ipv4) {
-			if (value->l3_ok) {
-				MLX5_SET(fte_match_set_lyr_2_4, headers_m,
-					 l3_ok, 1);
-				MLX5_SET(fte_match_set_lyr_2_4, headers_v,
-					 l3_ok, 1);
-			}
-			MLX5_SET(fte_match_set_lyr_2_4, headers_m,
+			MLX5_SET(fte_match_set_lyr_2_4, headers,
 				 ipv4_checksum_ok, 1);
-			MLX5_SET(fte_match_set_lyr_2_4, headers_v,
-				 ipv4_checksum_ok, !!value->l3_ok);
-		} else {
-			MLX5_SET(fte_match_set_lyr_2_4, headers_m, l3_ok, 1);
-			MLX5_SET(fte_match_set_lyr_2_4, headers_v, l3_ok,
-				 value->l3_ok);
 		}
-	}
-	if (mask->ipv4_csum_ok) {
-		MLX5_SET(fte_match_set_lyr_2_4, headers_m, ipv4_checksum_ok, 1);
-		MLX5_SET(fte_match_set_lyr_2_4, headers_v, ipv4_checksum_ok,
-			 value->ipv4_csum_ok);
+	} else if (is_ipv4 && mask->ipv4_csum_ok) {
+		MLX5_SET(fte_match_set_lyr_2_4, headers, ipv4_checksum_ok, 1);
 	}
 }
 
 static void
-set_integrity_bits(void *headers_m, void *headers_v,
-		   const struct rte_flow_item *integrity_item, bool is_l3_ip4)
+set_integrity_bits(void *headers, const struct rte_flow_item *integrity_item,
+		   bool is_l3_ip4, uint32_t key_type)
 {
-	const struct rte_flow_item_integrity *spec = integrity_item->spec;
-	const struct rte_flow_item_integrity *mask = integrity_item->mask;
+	const struct rte_flow_item_integrity *spec;
+	const struct rte_flow_item_integrity *mask;
 
 	/* Integrity bits validation cleared spec pointer */
-	MLX5_ASSERT(spec != NULL);
-	if (!mask)
-		mask = &rte_flow_item_integrity_mask;
-	flow_dv_translate_integrity_l3(mask, spec, headers_m, headers_v,
-				       is_l3_ip4);
-	flow_dv_translate_integrity_l4(mask, spec, headers_m, headers_v);
+	if (MLX5_ITEM_VALID(integrity_item, key_type))
+		return;
+	MLX5_ITEM_UPDATE(integrity_item, key_type, spec, mask,
+			 &rte_flow_item_integrity_mask);
+	flow_dv_translate_integrity_l3(mask, headers, is_l3_ip4);
+	flow_dv_translate_integrity_l4(mask, headers);
 }
 
 static void
-flow_dv_translate_item_integrity_post(void *matcher, void *key,
+flow_dv_translate_item_integrity_post(void *key,
 				      const
 				      struct rte_flow_item *integrity_items[2],
-				      uint64_t pattern_flags)
+				      uint64_t pattern_flags, uint32_t key_type)
 {
-	void *headers_m, *headers_v;
+	void *headers;
 	bool is_l3_ip4;
 
 	if (pattern_flags & MLX5_FLOW_ITEM_INNER_INTEGRITY) {
-		headers_m = MLX5_ADDR_OF(fte_match_param, matcher,
-					 inner_headers);
-		headers_v = MLX5_ADDR_OF(fte_match_param, key, inner_headers);
+		headers = MLX5_ADDR_OF(fte_match_param, key, inner_headers);
 		is_l3_ip4 = (pattern_flags & MLX5_FLOW_LAYER_INNER_L3_IPV4) !=
 			    0;
-		set_integrity_bits(headers_m, headers_v,
-				   integrity_items[1], is_l3_ip4);
+		set_integrity_bits(headers, integrity_items[1], is_l3_ip4,
+				   key_type);
 	}
 	if (pattern_flags & MLX5_FLOW_ITEM_OUTER_INTEGRITY) {
-		headers_m = MLX5_ADDR_OF(fte_match_param, matcher,
-					 outer_headers);
-		headers_v = MLX5_ADDR_OF(fte_match_param, key, outer_headers);
+		headers = MLX5_ADDR_OF(fte_match_param, key, outer_headers);
 		is_l3_ip4 = (pattern_flags & MLX5_FLOW_LAYER_OUTER_L3_IPV4) !=
 			    0;
-		set_integrity_bits(headers_m, headers_v,
-				   integrity_items[0], is_l3_ip4);
+		set_integrity_bits(headers, integrity_items[0], is_l3_ip4,
+				   key_type);
 	}
 }
 
-static void
+static uint64_t
 flow_dv_translate_item_integrity(const struct rte_flow_item *item,
-				 const struct rte_flow_item *integrity_items[2],
-				 uint64_t *last_item)
+				 struct mlx5_dv_matcher_workspace *wks,
+				 uint64_t key_type)
 {
-	const struct rte_flow_item_integrity *spec = (typeof(spec))item->spec;
+	if ((key_type & MLX5_SET_MATCHER_SW) != 0) {
+		const struct rte_flow_item_integrity
+			*spec = (typeof(spec))item->spec;
 
-	/* integrity bits validation cleared spec pointer */
-	MLX5_ASSERT(spec != NULL);
-	if (spec->level > 1) {
-		integrity_items[1] = item;
-		*last_item |= MLX5_FLOW_ITEM_INNER_INTEGRITY;
+		/* SWS integrity bits validation cleared spec pointer */
+		if (spec->level > 1) {
+			wks->integrity_items[1] = item;
+			wks->last_item |= MLX5_FLOW_ITEM_INNER_INTEGRITY;
+		} else {
+			wks->integrity_items[0] = item;
+			wks->last_item |= MLX5_FLOW_ITEM_OUTER_INTEGRITY;
+		}
 	} else {
-		integrity_items[0] = item;
-		*last_item |= MLX5_FLOW_ITEM_OUTER_INTEGRITY;
+		/* HWS supports outer integrity only */
+		wks->integrity_items[0] = item;
+		wks->last_item |= MLX5_FLOW_ITEM_OUTER_INTEGRITY;
 	}
+	return wks->last_item;
 }
 
 /**
@@ -13401,6 +13390,10 @@ flow_dv_translate_items(struct rte_eth_dev *dev,
 		flow_dv_translate_item_meter_color(dev, key, items, key_type);
 		last_item = MLX5_FLOW_ITEM_METER_COLOR;
 		break;
+	case RTE_FLOW_ITEM_TYPE_INTEGRITY:
+		last_item = flow_dv_translate_item_integrity(items,
+							     wks, key_type);
+		break;
 	default:
 		break;
 	}
@@ -13464,6 +13457,12 @@ flow_dv_translate_items_hws(const struct rte_flow_item *items,
 		if (ret)
 			return ret;
 	}
+	if (wks.item_flags & MLX5_FLOW_ITEM_INTEGRITY) {
+		flow_dv_translate_item_integrity_post(key,
+						      wks.integrity_items,
+						      wks.item_flags,
+						      key_type);
+	}
 	if (wks.item_flags & MLX5_FLOW_LAYER_VXLAN_GPE) {
 		flow_dv_translate_item_vxlan_gpe(key,
 						 wks.tunnel_item,
@@ -13544,7 +13543,6 @@ flow_dv_translate_items_sws(struct rte_eth_dev *dev,
 			     mlx5_flow_get_thread_workspace())->rss_desc,
 	};
 	struct mlx5_dv_matcher_workspace wks_m = wks;
-	const struct rte_flow_item *integrity_items[2] = {NULL, NULL};
 	int ret = 0;
 	int tunnel;
 
@@ -13555,10 +13553,6 @@ flow_dv_translate_items_sws(struct rte_eth_dev *dev,
 						  NULL, "item not supported");
 		tunnel = !!(wks.item_flags & MLX5_FLOW_LAYER_TUNNEL);
 		switch (items->type) {
-		case RTE_FLOW_ITEM_TYPE_INTEGRITY:
-			flow_dv_translate_item_integrity(items, integrity_items,
-							 &wks.last_item);
-			break;
 		case RTE_FLOW_ITEM_TYPE_CONNTRACK:
 			flow_dv_translate_item_aso_ct(dev, match_mask,
 						      match_value, items);
@@ -13601,9 +13595,14 @@ flow_dv_translate_items_sws(struct rte_eth_dev *dev,
 			return -rte_errno;
 	}
 	if (wks.item_flags & MLX5_FLOW_ITEM_INTEGRITY) {
-		flow_dv_translate_item_integrity_post(match_mask, match_value,
-						      integrity_items,
-						      wks.item_flags);
+		flow_dv_translate_item_integrity_post(match_mask,
+						      wks_m.integrity_items,
+						      wks_m.item_flags,
+						      MLX5_SET_MATCHER_SW_M);
+		flow_dv_translate_item_integrity_post(match_value,
+						      wks.integrity_items,
+						      wks.item_flags,
+						      MLX5_SET_MATCHER_SW_V);
 	}
 	if (wks.item_flags & MLX5_FLOW_LAYER_VXLAN_GPE) {
 		flow_dv_translate_item_vxlan_gpe(match_mask,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 2792a0fc39..3cbe0305e9 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -4655,6 +4655,14 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ITEM_TYPE_ICMP6:
 		case RTE_FLOW_ITEM_TYPE_CONNTRACK:
 			break;
+		case RTE_FLOW_ITEM_TYPE_INTEGRITY:
+			/*
+			 * Integrity flow item validation requires access to
+			 * both item mask and spec.
+			 * Current HWS model allows item mask in pattern
+			 * template and item spec in flow rule.
+			 */
+			break;
 		case RTE_FLOW_ITEM_TYPE_END:
 			items_end = true;
 			break;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v5 16/18] net/mlx5: support device control for E-Switch default rule
  2022-10-20  3:21 ` [PATCH v5 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (14 preceding siblings ...)
  2022-10-20  3:22   ` [PATCH v5 15/18] net/mlx5: support flow integrity in HWS group 0 Suanming Mou
@ 2022-10-20  3:22   ` Suanming Mou
  2022-10-20  3:22   ` [PATCH v5 17/18] net/mlx5: support device control of representor matching Suanming Mou
  2022-10-20  3:22   ` [PATCH v5 18/18] net/mlx5: create control flow rules with HWS Suanming Mou
  17 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-10-20  3:22 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko, Ray Kinsella
  Cc: dev, rasland, orika, Dariusz Sosnowski, Xueming Li

From: Dariusz Sosnowski <dsosnowski@nvidia.com>

This patch adds support for the fdb_def_rule_en device argument to HW
Steering, which controls:

- creation of the default FDB jump flow rule,
- the ability of the user to create transfer flow rules in the root table.
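
This patch also adds rte_pmd_mlx5_external_sq_enable() (see
rte_pmd_mlx5.h below) so an application can attach the default SQ miss
handling to a queue it created on its own. A minimal usage sketch
follows; the port id and SQ number are placeholders and error handling
is trimmed:

	#include <stdio.h>
	#include <rte_errno.h>
	#include <rte_pmd_mlx5.h>

	/* Enable default miss traffic for an externally created SQ.
	 * sq_num is the hardware SQ number obtained from the application's
	 * own queue creation (e.g. via DevX).
	 */
	static int
	enable_external_sq(uint16_t port_id, uint32_t sq_num)
	{
		int ret = rte_pmd_mlx5_external_sq_enable(port_id, sq_num);

		if (ret != 0)
			printf("external SQ %u enable failed: %s\n",
			       sq_num, rte_strerror(rte_errno));
		return ret;
	}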

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_os.c |  14 ++
 drivers/net/mlx5/mlx5.h          |   4 +-
 drivers/net/mlx5/mlx5_flow.c     |  20 +--
 drivers/net/mlx5/mlx5_flow.h     |   5 +-
 drivers/net/mlx5/mlx5_flow_dv.c  |  62 ++++---
 drivers/net/mlx5/mlx5_flow_hw.c  | 273 +++++++++++++++----------------
 drivers/net/mlx5/mlx5_trigger.c  |  31 ++--
 drivers/net/mlx5/mlx5_tx.h       |   1 +
 drivers/net/mlx5/mlx5_txq.c      |  47 ++++++
 drivers/net/mlx5/rte_pmd_mlx5.h  |  17 ++
 drivers/net/mlx5/version.map     |   1 +
 11 files changed, 287 insertions(+), 188 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 55801682cc..c23fe6daf1 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1554,6 +1554,20 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	rte_rwlock_init(&priv->ind_tbls_lock);
 	if (priv->sh->config.dv_flow_en == 2) {
 #ifdef HAVE_MLX5_HWS_SUPPORT
+		if (priv->sh->config.dv_esw_en) {
+			if (priv->sh->dv_regc0_mask == UINT32_MAX) {
+				DRV_LOG(ERR, "E-Switch port metadata is required when using HWS "
+					     "but it is disabled (configure it through devlink)");
+				err = ENOTSUP;
+				goto error;
+			}
+			if (priv->sh->dv_regc0_mask == 0) {
+				DRV_LOG(ERR, "E-Switch with HWS is not supported "
+					     "(no available bits in reg_c[0])");
+				err = ENOTSUP;
+				goto error;
+			}
+		}
 		if (priv->vport_meta_mask)
 			flow_hw_set_port_info(eth_dev);
 		if (priv->sh->config.dv_esw_en &&
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 5195529267..9a1718e2f2 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -2022,7 +2022,7 @@ int mlx5_flow_ops_get(struct rte_eth_dev *dev, const struct rte_flow_ops **ops);
 int mlx5_flow_start_default(struct rte_eth_dev *dev);
 void mlx5_flow_stop_default(struct rte_eth_dev *dev);
 int mlx5_flow_verify(struct rte_eth_dev *dev);
-int mlx5_ctrl_flow_source_queue(struct rte_eth_dev *dev, uint32_t queue);
+int mlx5_ctrl_flow_source_queue(struct rte_eth_dev *dev, uint32_t sq_num);
 int mlx5_ctrl_flow_vlan(struct rte_eth_dev *dev,
 			struct rte_flow_item_eth *eth_spec,
 			struct rte_flow_item_eth *eth_mask,
@@ -2034,7 +2034,7 @@ int mlx5_ctrl_flow(struct rte_eth_dev *dev,
 int mlx5_flow_lacp_miss(struct rte_eth_dev *dev);
 struct rte_flow *mlx5_flow_create_esw_table_zero_flow(struct rte_eth_dev *dev);
 uint32_t mlx5_flow_create_devx_sq_miss_flow(struct rte_eth_dev *dev,
-					    uint32_t txq);
+					    uint32_t sq_num);
 void mlx5_flow_async_pool_query_handle(struct mlx5_dev_ctx_shared *sh,
 				       uint64_t async_id, int status);
 void mlx5_set_query_alarm(struct mlx5_dev_ctx_shared *sh);
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index b11957f8ee..76187d76ea 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -7159,14 +7159,14 @@ mlx5_flow_create_esw_table_zero_flow(struct rte_eth_dev *dev)
  *
  * @param dev
  *   Pointer to Ethernet device.
- * @param txq
- *   Txq index.
+ * @param sq_num
+ *   SQ number.
  *
  * @return
  *   Flow ID on success, 0 otherwise and rte_errno is set.
  */
 uint32_t
-mlx5_flow_create_devx_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
+mlx5_flow_create_devx_sq_miss_flow(struct rte_eth_dev *dev, uint32_t sq_num)
 {
 	struct rte_flow_attr attr = {
 		.group = 0,
@@ -7178,8 +7178,8 @@ mlx5_flow_create_devx_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
 	struct rte_flow_item_port_id port_spec = {
 		.id = MLX5_PORT_ESW_MGR,
 	};
-	struct mlx5_rte_flow_item_sq txq_spec = {
-		.queue = txq,
+	struct mlx5_rte_flow_item_sq sq_spec = {
+		.queue = sq_num,
 	};
 	struct rte_flow_item pattern[] = {
 		{
@@ -7189,7 +7189,7 @@ mlx5_flow_create_devx_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
 		{
 			.type = (enum rte_flow_item_type)
 				MLX5_RTE_FLOW_ITEM_TYPE_SQ,
-			.spec = &txq_spec,
+			.spec = &sq_spec,
 		},
 		{
 			.type = RTE_FLOW_ITEM_TYPE_END,
@@ -7560,22 +7560,22 @@ mlx5_flow_verify(struct rte_eth_dev *dev __rte_unused)
  *
  * @param dev
  *   Pointer to Ethernet device.
- * @param queue
- *   The queue index.
+ * @param sq_num
+ *   The SQ hw number.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
 mlx5_ctrl_flow_source_queue(struct rte_eth_dev *dev,
-			    uint32_t queue)
+			    uint32_t sq_num)
 {
 	const struct rte_flow_attr attr = {
 		.egress = 1,
 		.priority = 0,
 	};
 	struct mlx5_rte_flow_item_sq queue_spec = {
-		.queue = queue,
+		.queue = sq_num,
 	};
 	struct mlx5_rte_flow_item_sq queue_mask = {
 		.queue = UINT32_MAX,
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index ddc23aaf9c..88d92b18c7 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -116,7 +116,7 @@ struct mlx5_flow_action_copy_mreg {
 
 /* Matches on source queue. */
 struct mlx5_rte_flow_item_sq {
-	uint32_t queue;
+	uint32_t queue; /* DevX SQ number */
 };
 
 /* Feature name to allocate metadata register. */
@@ -2485,9 +2485,8 @@ int mlx5_flow_pick_transfer_proxy(struct rte_eth_dev *dev,
 
 int mlx5_flow_hw_flush_ctrl_flows(struct rte_eth_dev *dev);
 
-int mlx5_flow_hw_esw_create_mgr_sq_miss_flow(struct rte_eth_dev *dev);
 int mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev,
-					 uint32_t txq);
+					 uint32_t sqn);
 int mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev);
 int mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev);
 int mlx5_flow_actions_validate(struct rte_eth_dev *dev,
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 1497423891..0f6fd34a8b 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -10123,6 +10123,29 @@ flow_dv_translate_item_port_id(struct rte_eth_dev *dev, void *key,
 	return 0;
 }
 
+/**
+ * Translate port representor item to eswitch match on port id.
+ *
+ * @param[in] dev
+ *   The devich to configure through.
+ *   The device to configure through.
+ *   Flow matcher value.
+ * @param[in] key_type
+ *   Set flow matcher mask or value.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise.
+ */
+static int
+flow_dv_translate_item_port_representor(struct rte_eth_dev *dev, void *key,
+					uint32_t key_type)
+{
+	flow_dv_translate_item_source_vport(key,
+			key_type & MLX5_SET_MATCHER_V ?
+			mlx5_flow_get_esw_manager_vport_id(dev) : 0xffff);
+	return 0;
+}
+
 /**
  * Translate represented port item to eswitch match on port id.
  *
@@ -11402,10 +11425,10 @@ flow_dv_translate_create_counter(struct rte_eth_dev *dev,
 }
 
 /**
- * Add Tx queue matcher
+ * Add SQ matcher
  *
- * @param[in] dev
- *   Pointer to the dev struct.
+ * @param[in, out] matcher
+ *   Flow matcher.
  * @param[in, out] key
  *   Flow matcher value.
  * @param[in] item
@@ -11414,40 +11437,29 @@ flow_dv_translate_create_counter(struct rte_eth_dev *dev,
  *   Set flow matcher mask or value.
  */
 static void
-flow_dv_translate_item_tx_queue(struct rte_eth_dev *dev,
-				void *key,
-				const struct rte_flow_item *item,
-				uint32_t key_type)
+flow_dv_translate_item_sq(void *key,
+			  const struct rte_flow_item *item,
+			  uint32_t key_type)
 {
 	const struct mlx5_rte_flow_item_sq *queue_m;
 	const struct mlx5_rte_flow_item_sq *queue_v;
 	const struct mlx5_rte_flow_item_sq queue_mask = {
 		.queue = UINT32_MAX,
 	};
-	void *misc_v =
-		MLX5_ADDR_OF(fte_match_param, key, misc_parameters);
-	struct mlx5_txq_ctrl *txq = NULL;
+	void *misc_v = MLX5_ADDR_OF(fte_match_param, key, misc_parameters);
 	uint32_t queue;
 
 	MLX5_ITEM_UPDATE(item, key_type, queue_v, queue_m, &queue_mask);
 	if (!queue_m || !queue_v)
 		return;
 	if (key_type & MLX5_SET_MATCHER_V) {
-		txq = mlx5_txq_get(dev, queue_v->queue);
-		if (!txq)
-			return;
-		if (txq->is_hairpin)
-			queue = txq->obj->sq->id;
-		else
-			queue = txq->obj->sq_obj.sq->id;
+		queue = queue_v->queue;
 		if (key_type == MLX5_SET_MATCHER_SW_V)
 			queue &= queue_m->queue;
 	} else {
 		queue = queue_m->queue;
 	}
 	MLX5_SET(fte_match_set_misc, misc_v, source_sqn, queue);
-	if (txq)
-		mlx5_txq_release(dev, queue_v->queue);
 }
 
 /**
@@ -13148,6 +13160,11 @@ flow_dv_translate_items(struct rte_eth_dev *dev,
 			(dev, key, items, wks->attr, key_type);
 		last_item = MLX5_FLOW_ITEM_PORT_ID;
 		break;
+	case RTE_FLOW_ITEM_TYPE_PORT_REPRESENTOR:
+		flow_dv_translate_item_port_representor
+			(dev, key, key_type);
+		last_item = MLX5_FLOW_ITEM_PORT_REPRESENTOR;
+		break;
 	case RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT:
 		flow_dv_translate_item_represented_port
 			(dev, key, items, wks->attr, key_type);
@@ -13354,7 +13371,7 @@ flow_dv_translate_items(struct rte_eth_dev *dev,
 		last_item = MLX5_FLOW_ITEM_TAG;
 		break;
 	case MLX5_RTE_FLOW_ITEM_TYPE_SQ:
-		flow_dv_translate_item_tx_queue(dev, key, items, key_type);
+		flow_dv_translate_item_sq(key, items, key_type);
 		last_item = MLX5_FLOW_ITEM_SQ;
 		break;
 	case RTE_FLOW_ITEM_TYPE_GTP:
@@ -13564,7 +13581,6 @@ flow_dv_translate_items_sws(struct rte_eth_dev *dev,
 			wks.last_item = tunnel ? MLX5_FLOW_ITEM_INNER_FLEX :
 						 MLX5_FLOW_ITEM_OUTER_FLEX;
 			break;
-
 		default:
 			ret = flow_dv_translate_items(dev, items, &wks_m,
 				match_mask, MLX5_SET_MATCHER_SW_M, error);
@@ -13587,7 +13603,9 @@ flow_dv_translate_items_sws(struct rte_eth_dev *dev,
 	 * in use.
 	 */
 	if (!(wks.item_flags & MLX5_FLOW_ITEM_PORT_ID) &&
-	    !(wks.item_flags & MLX5_FLOW_ITEM_REPRESENTED_PORT) && priv->sh->esw_mode &&
+	    !(wks.item_flags & MLX5_FLOW_ITEM_REPRESENTED_PORT) &&
+	    !(wks.item_flags & MLX5_FLOW_ITEM_PORT_REPRESENTOR) &&
+	    priv->sh->esw_mode &&
 	    !(attr->egress && !attr->transfer) &&
 	    attr->group != MLX5_FLOW_MREG_CP_TABLE_GROUP) {
 		if (flow_dv_translate_item_port_id_all(dev, match_mask,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 3cbe0305e9..9294866628 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -3173,7 +3173,10 @@ flow_hw_translate_group(struct rte_eth_dev *dev,
 	struct mlx5_priv *priv = dev->data->dev_private;
 	const struct rte_flow_attr *flow_attr = &cfg->attr.flow_attr;
 
-	if (priv->sh->config.dv_esw_en && cfg->external && flow_attr->transfer) {
+	if (priv->sh->config.dv_esw_en &&
+	    priv->fdb_def_rule &&
+	    cfg->external &&
+	    flow_attr->transfer) {
 		if (group > MLX5_HW_MAX_TRANSFER_GROUP)
 			return rte_flow_error_set(error, EINVAL,
 						  RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
@@ -5137,14 +5140,23 @@ flow_hw_free_vport_actions(struct mlx5_priv *priv)
 }
 
 static uint32_t
-flow_hw_usable_lsb_vport_mask(struct mlx5_priv *priv)
+flow_hw_esw_mgr_regc_marker_mask(struct rte_eth_dev *dev)
 {
-	uint32_t usable_mask = ~priv->vport_meta_mask;
+	uint32_t mask = MLX5_SH(dev)->dv_regc0_mask;
 
-	if (usable_mask)
-		return (1 << rte_bsf32(usable_mask));
-	else
-		return 0;
+	/* Mask is verified during device initialization. */
+	MLX5_ASSERT(mask != 0);
+	return mask;
+}
+
+static uint32_t
+flow_hw_esw_mgr_regc_marker(struct rte_eth_dev *dev)
+{
+	uint32_t mask = MLX5_SH(dev)->dv_regc0_mask;
+
+	/* Mask is verified during device initialization. */
+	MLX5_ASSERT(mask != 0);
+	return RTE_BIT32(rte_bsf32(mask));
 }
 
 /**
@@ -5170,12 +5182,19 @@ flow_hw_create_ctrl_esw_mgr_pattern_template(struct rte_eth_dev *dev)
 	struct rte_flow_item_ethdev port_mask = {
 		.port_id = UINT16_MAX,
 	};
+	struct mlx5_rte_flow_item_sq sq_mask = {
+		.queue = UINT32_MAX,
+	};
 	struct rte_flow_item items[] = {
 		{
 			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
 			.spec = &port_spec,
 			.mask = &port_mask,
 		},
+		{
+			.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_SQ,
+			.mask = &sq_mask,
+		},
 		{
 			.type = RTE_FLOW_ITEM_TYPE_END,
 		},
@@ -5185,9 +5204,10 @@ flow_hw_create_ctrl_esw_mgr_pattern_template(struct rte_eth_dev *dev)
 }
 
 /**
- * Creates a flow pattern template used to match REG_C_0 and a TX queue.
- * Matching on REG_C_0 is set up to match on least significant bit usable
- * by user-space, which is set when packet was originated from E-Switch Manager.
+ * Creates a flow pattern template used to match REG_C_0 and an SQ.
+ * Matching on REG_C_0 is set up to match on all bits usable by user-space.
+ * If traffic was sent from E-Switch Manager, then all usable bits will be set to 0,
+ * except the least significant bit, which will be set to 1.
  *
  * This template is used to set up a table for SQ miss default flow.
  *
@@ -5200,8 +5220,6 @@ flow_hw_create_ctrl_esw_mgr_pattern_template(struct rte_eth_dev *dev)
 static struct rte_flow_pattern_template *
 flow_hw_create_ctrl_regc_sq_pattern_template(struct rte_eth_dev *dev)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
-	uint32_t marker_bit = flow_hw_usable_lsb_vport_mask(priv);
 	struct rte_flow_pattern_template_attr attr = {
 		.relaxed_matching = 0,
 		.transfer = 1,
@@ -5211,6 +5229,7 @@ flow_hw_create_ctrl_regc_sq_pattern_template(struct rte_eth_dev *dev)
 	};
 	struct rte_flow_item_tag reg_c0_mask = {
 		.index = 0xff,
+		.data = flow_hw_esw_mgr_regc_marker_mask(dev),
 	};
 	struct mlx5_rte_flow_item_sq queue_mask = {
 		.queue = UINT32_MAX,
@@ -5232,12 +5251,6 @@ flow_hw_create_ctrl_regc_sq_pattern_template(struct rte_eth_dev *dev)
 		},
 	};
 
-	if (!marker_bit) {
-		DRV_LOG(ERR, "Unable to set up pattern template for SQ miss table");
-		return NULL;
-	}
-	reg_c0_spec.data = marker_bit;
-	reg_c0_mask.data = marker_bit;
 	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
 }
 
@@ -5329,9 +5342,8 @@ flow_hw_create_tx_default_mreg_copy_pattern_template(struct rte_eth_dev *dev)
 static struct rte_flow_actions_template *
 flow_hw_create_ctrl_regc_jump_actions_template(struct rte_eth_dev *dev)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
-	uint32_t marker_bit = flow_hw_usable_lsb_vport_mask(priv);
-	uint32_t marker_bit_mask = UINT32_MAX;
+	uint32_t marker_mask = flow_hw_esw_mgr_regc_marker_mask(dev);
+	uint32_t marker_bits = flow_hw_esw_mgr_regc_marker(dev);
 	struct rte_flow_actions_template_attr attr = {
 		.transfer = 1,
 	};
@@ -5344,7 +5356,7 @@ flow_hw_create_ctrl_regc_jump_actions_template(struct rte_eth_dev *dev)
 		.src = {
 			.field = RTE_FLOW_FIELD_VALUE,
 		},
-		.width = 1,
+		.width = __builtin_popcount(marker_mask),
 	};
 	struct rte_flow_action_modify_field set_reg_m = {
 		.operation = RTE_FLOW_MODIFY_SET,
@@ -5391,13 +5403,9 @@ flow_hw_create_ctrl_regc_jump_actions_template(struct rte_eth_dev *dev)
 		}
 	};
 
-	if (!marker_bit) {
-		DRV_LOG(ERR, "Unable to set up actions template for SQ miss table");
-		return NULL;
-	}
-	set_reg_v.dst.offset = rte_bsf32(marker_bit);
-	rte_memcpy(set_reg_v.src.value, &marker_bit, sizeof(marker_bit));
-	rte_memcpy(set_reg_m.src.value, &marker_bit_mask, sizeof(marker_bit_mask));
+	set_reg_v.dst.offset = rte_bsf32(marker_mask);
+	rte_memcpy(set_reg_v.src.value, &marker_bits, sizeof(marker_bits));
+	rte_memcpy(set_reg_m.src.value, &marker_mask, sizeof(marker_mask));
 	return flow_hw_actions_template_create(dev, &attr, actions_v, actions_m, NULL);
 }
 
@@ -5584,7 +5592,7 @@ flow_hw_create_ctrl_sq_miss_root_table(struct rte_eth_dev *dev,
 	struct rte_flow_template_table_attr attr = {
 		.flow_attr = {
 			.group = 0,
-			.priority = 0,
+			.priority = MLX5_HW_LOWEST_PRIO_ROOT,
 			.ingress = 0,
 			.egress = 0,
 			.transfer = 1,
@@ -5699,7 +5707,7 @@ flow_hw_create_ctrl_jump_table(struct rte_eth_dev *dev,
 	struct rte_flow_template_table_attr attr = {
 		.flow_attr = {
 			.group = 0,
-			.priority = MLX5_HW_LOWEST_PRIO_ROOT,
+			.priority = 0,
 			.ingress = 0,
 			.egress = 0,
 			.transfer = 1,
@@ -7797,141 +7805,123 @@ flow_hw_flush_all_ctrl_flows(struct rte_eth_dev *dev)
 }
 
 int
-mlx5_flow_hw_esw_create_mgr_sq_miss_flow(struct rte_eth_dev *dev)
+mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev, uint32_t sqn)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
-	struct rte_flow_item_ethdev port_spec = {
+	uint16_t port_id = dev->data->port_id;
+	struct rte_flow_item_ethdev esw_mgr_spec = {
 		.port_id = MLX5_REPRESENTED_PORT_ESW_MGR,
 	};
-	struct rte_flow_item_ethdev port_mask = {
+	struct rte_flow_item_ethdev esw_mgr_mask = {
 		.port_id = MLX5_REPRESENTED_PORT_ESW_MGR,
 	};
-	struct rte_flow_item items[] = {
-		{
-			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
-			.spec = &port_spec,
-			.mask = &port_mask,
-		},
-		{
-			.type = RTE_FLOW_ITEM_TYPE_END,
-		},
-	};
-	struct rte_flow_action_modify_field modify_field = {
-		.operation = RTE_FLOW_MODIFY_SET,
-		.dst = {
-			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
-		},
-		.src = {
-			.field = RTE_FLOW_FIELD_VALUE,
-		},
-		.width = 1,
-	};
-	struct rte_flow_action_jump jump = {
-		.group = 1,
-	};
-	struct rte_flow_action actions[] = {
-		{
-			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
-			.conf = &modify_field,
-		},
-		{
-			.type = RTE_FLOW_ACTION_TYPE_JUMP,
-			.conf = &jump,
-		},
-		{
-			.type = RTE_FLOW_ACTION_TYPE_END,
-		},
-	};
-
-	MLX5_ASSERT(priv->master);
-	if (!priv->dr_ctx ||
-	    !priv->hw_esw_sq_miss_root_tbl)
-		return 0;
-	return flow_hw_create_ctrl_flow(dev, dev,
-					priv->hw_esw_sq_miss_root_tbl,
-					items, 0, actions, 0);
-}
-
-int
-mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
-{
-	uint16_t port_id = dev->data->port_id;
 	struct rte_flow_item_tag reg_c0_spec = {
 		.index = (uint8_t)REG_C_0,
+		.data = flow_hw_esw_mgr_regc_marker(dev),
 	};
 	struct rte_flow_item_tag reg_c0_mask = {
 		.index = 0xff,
+		.data = flow_hw_esw_mgr_regc_marker_mask(dev),
 	};
-	struct mlx5_rte_flow_item_sq queue_spec = {
-		.queue = txq,
-	};
-	struct mlx5_rte_flow_item_sq queue_mask = {
-		.queue = UINT32_MAX,
-	};
-	struct rte_flow_item items[] = {
-		{
-			.type = (enum rte_flow_item_type)
-				MLX5_RTE_FLOW_ITEM_TYPE_TAG,
-			.spec = &reg_c0_spec,
-			.mask = &reg_c0_mask,
-		},
-		{
-			.type = (enum rte_flow_item_type)
-				MLX5_RTE_FLOW_ITEM_TYPE_SQ,
-			.spec = &queue_spec,
-			.mask = &queue_mask,
-		},
-		{
-			.type = RTE_FLOW_ITEM_TYPE_END,
-		},
+	struct mlx5_rte_flow_item_sq sq_spec = {
+		.queue = sqn,
 	};
 	struct rte_flow_action_ethdev port = {
 		.port_id = port_id,
 	};
-	struct rte_flow_action actions[] = {
-		{
-			.type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT,
-			.conf = &port,
-		},
-		{
-			.type = RTE_FLOW_ACTION_TYPE_END,
-		},
-	};
+	struct rte_flow_item items[3] = { { 0 } };
+	struct rte_flow_action actions[3] = { { 0 } };
 	struct rte_eth_dev *proxy_dev;
 	struct mlx5_priv *proxy_priv;
 	uint16_t proxy_port_id = dev->data->port_id;
-	uint32_t marker_bit;
 	int ret;
 
-	RTE_SET_USED(txq);
 	ret = rte_flow_pick_transfer_proxy(port_id, &proxy_port_id, NULL);
 	if (ret) {
-		DRV_LOG(ERR, "Unable to pick proxy port for port %u", port_id);
+		DRV_LOG(ERR, "Unable to pick transfer proxy port for port %u. Transfer proxy "
+			     "port must be present to create default SQ miss flows.",
+			     port_id);
 		return ret;
 	}
 	proxy_dev = &rte_eth_devices[proxy_port_id];
 	proxy_priv = proxy_dev->data->dev_private;
-	if (!proxy_priv->dr_ctx)
+	if (!proxy_priv->dr_ctx) {
+		DRV_LOG(DEBUG, "Transfer proxy port (port %u) of port %u must be configured "
+			       "for HWS to create default SQ miss flows. Default flows will "
+			       "not be created.",
+			       proxy_port_id, port_id);
 		return 0;
+	}
 	if (!proxy_priv->hw_esw_sq_miss_root_tbl ||
 	    !proxy_priv->hw_esw_sq_miss_tbl) {
-		DRV_LOG(ERR, "port %u proxy port %u was configured but default"
-			" flow tables are not created",
-			port_id, proxy_port_id);
+		DRV_LOG(ERR, "Transfer proxy port (port %u) of port %u was configured, but "
+			     "default flow tables were not created.",
+			     proxy_port_id, port_id);
 		rte_errno = ENOMEM;
 		return -rte_errno;
 	}
-	marker_bit = flow_hw_usable_lsb_vport_mask(proxy_priv);
-	if (!marker_bit) {
-		DRV_LOG(ERR, "Unable to set up control flow in SQ miss table");
-		rte_errno = EINVAL;
-		return -rte_errno;
+	/*
+	 * Create a root SQ miss flow rule - match E-Switch Manager and SQ,
+	 * and jump to group 1.
+	 */
+	items[0] = (struct rte_flow_item){
+		.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+		.spec = &esw_mgr_spec,
+		.mask = &esw_mgr_mask,
+	};
+	items[1] = (struct rte_flow_item){
+		.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_SQ,
+		.spec = &sq_spec,
+	};
+	items[2] = (struct rte_flow_item){
+		.type = RTE_FLOW_ITEM_TYPE_END,
+	};
+	actions[0] = (struct rte_flow_action){
+		.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+	};
+	actions[1] = (struct rte_flow_action){
+		.type = RTE_FLOW_ACTION_TYPE_JUMP,
+	};
+	actions[2] = (struct rte_flow_action) {
+		.type = RTE_FLOW_ACTION_TYPE_END,
+	};
+	ret = flow_hw_create_ctrl_flow(dev, proxy_dev, proxy_priv->hw_esw_sq_miss_root_tbl,
+				       items, 0, actions, 0);
+	if (ret) {
+		DRV_LOG(ERR, "Port %u failed to create root SQ miss flow rule for SQ %u, ret %d",
+			port_id, sqn, ret);
+		return ret;
 	}
-	reg_c0_spec.data = marker_bit;
-	reg_c0_mask.data = marker_bit;
-	return flow_hw_create_ctrl_flow(dev, proxy_dev,
-					proxy_priv->hw_esw_sq_miss_tbl,
-					items, 0, actions, 0);
+	/*
+	 * Create a non-root SQ miss flow rule - match REG_C_0 marker and SQ,
+	 * and forward to port.
+	 */
+	items[0] = (struct rte_flow_item){
+		.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+		.spec = &reg_c0_spec,
+		.mask = &reg_c0_mask,
+	};
+	items[1] = (struct rte_flow_item){
+		.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_SQ,
+		.spec = &sq_spec,
+	};
+	items[2] = (struct rte_flow_item){
+		.type = RTE_FLOW_ITEM_TYPE_END,
+	};
+	actions[0] = (struct rte_flow_action){
+		.type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT,
+		.conf = &port,
+	};
+	actions[1] = (struct rte_flow_action){
+		.type = RTE_FLOW_ACTION_TYPE_END,
+	};
+	ret = flow_hw_create_ctrl_flow(dev, proxy_dev, proxy_priv->hw_esw_sq_miss_tbl,
+				       items, 0, actions, 0);
+	if (ret) {
+		DRV_LOG(ERR, "Port %u failed to create HWS SQ miss flow rule for SQ %u, ret %d",
+			port_id, sqn, ret);
+		return ret;
+	}
+	return 0;
 }
 
 int
@@ -7969,17 +7959,24 @@ mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev)
 
 	ret = rte_flow_pick_transfer_proxy(port_id, &proxy_port_id, NULL);
 	if (ret) {
-		DRV_LOG(ERR, "Unable to pick proxy port for port %u", port_id);
+		DRV_LOG(ERR, "Unable to pick transfer proxy port for port %u. Transfer proxy "
+			     "port must be present to create default FDB jump rule.",
+			     port_id);
 		return ret;
 	}
 	proxy_dev = &rte_eth_devices[proxy_port_id];
 	proxy_priv = proxy_dev->data->dev_private;
-	if (!proxy_priv->dr_ctx)
+	if (!proxy_priv->dr_ctx) {
+		DRV_LOG(DEBUG, "Transfer proxy port (port %u) of port %u must be configured "
+			       "for HWS to create default FDB jump rule. Default rule will "
+			       "not be created.",
+			       proxy_port_id, port_id);
 		return 0;
+	}
 	if (!proxy_priv->hw_esw_zero_tbl) {
-		DRV_LOG(ERR, "port %u proxy port %u was configured but default"
-			" flow tables are not created",
-			port_id, proxy_port_id);
+		DRV_LOG(ERR, "Transfer proxy port (port %u) of port %u was configured, but "
+			     "default flow tables were not created.",
+			     proxy_port_id, port_id);
 		rte_errno = EINVAL;
 		return -rte_errno;
 	}
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index c260c81e57..715f2891cf 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -426,7 +426,7 @@ mlx5_hairpin_queue_peer_update(struct rte_eth_dev *dev, uint16_t peer_queue,
 			mlx5_txq_release(dev, peer_queue);
 			return -rte_errno;
 		}
-		peer_info->qp_id = txq_ctrl->obj->sq->id;
+		peer_info->qp_id = mlx5_txq_get_sqn(txq_ctrl);
 		peer_info->vhca_id = priv->sh->cdev->config.hca_attr.vhca_id;
 		/* 1-to-1 mapping, only the first one is used. */
 		peer_info->peer_q = txq_ctrl->hairpin_conf.peers[0].queue;
@@ -818,7 +818,7 @@ mlx5_hairpin_bind_single_port(struct rte_eth_dev *dev, uint16_t rx_port)
 		}
 		/* Pass TxQ's information to peer RxQ and try binding. */
 		cur.peer_q = rx_queue;
-		cur.qp_id = txq_ctrl->obj->sq->id;
+		cur.qp_id = mlx5_txq_get_sqn(txq_ctrl);
 		cur.vhca_id = priv->sh->cdev->config.hca_attr.vhca_id;
 		cur.tx_explicit = txq_ctrl->hairpin_conf.tx_explicit;
 		cur.manual_bind = txq_ctrl->hairpin_conf.manual_bind;
@@ -1300,8 +1300,6 @@ mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
 	int ret;
 
 	if (priv->sh->config.dv_esw_en && priv->master) {
-		if (mlx5_flow_hw_esw_create_mgr_sq_miss_flow(dev))
-			goto error;
 		if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS)
 			if (mlx5_flow_hw_create_tx_default_mreg_copy_flow(dev))
 				goto error;
@@ -1312,10 +1310,7 @@ mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
 
 		if (!txq)
 			continue;
-		if (txq->is_hairpin)
-			queue = txq->obj->sq->id;
-		else
-			queue = txq->obj->sq_obj.sq->id;
+		queue = mlx5_txq_get_sqn(txq);
 		if ((priv->representor || priv->master) &&
 		    priv->sh->config.dv_esw_en) {
 			if (mlx5_flow_hw_esw_create_sq_miss_flow(dev, queue)) {
@@ -1325,9 +1320,15 @@ mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
 		}
 		mlx5_txq_release(dev, i);
 	}
-	if ((priv->master || priv->representor) && priv->sh->config.dv_esw_en) {
-		if (mlx5_flow_hw_esw_create_default_jump_flow(dev))
-			goto error;
+	if (priv->sh->config.fdb_def_rule) {
+		if ((priv->master || priv->representor) && priv->sh->config.dv_esw_en) {
+			if (!mlx5_flow_hw_esw_create_default_jump_flow(dev))
+				priv->fdb_def_rule = 1;
+			else
+				goto error;
+		}
+	} else {
+		DRV_LOG(INFO, "port %u FDB default rule is disabled", dev->data->port_id);
 	}
 	return 0;
 error:
@@ -1393,14 +1394,18 @@ mlx5_traffic_enable(struct rte_eth_dev *dev)
 		    txq_ctrl->hairpin_conf.tx_explicit == 0 &&
 		    txq_ctrl->hairpin_conf.peers[0].port ==
 		    priv->dev_data->port_id) {
-			ret = mlx5_ctrl_flow_source_queue(dev, i);
+			ret = mlx5_ctrl_flow_source_queue(dev,
+					mlx5_txq_get_sqn(txq_ctrl));
 			if (ret) {
 				mlx5_txq_release(dev, i);
 				goto error;
 			}
 		}
 		if (priv->sh->config.dv_esw_en) {
-			if (mlx5_flow_create_devx_sq_miss_flow(dev, i) == 0) {
+			uint32_t q = mlx5_txq_get_sqn(txq_ctrl);
+
+			if (mlx5_flow_create_devx_sq_miss_flow(dev, q) == 0) {
+				mlx5_txq_release(dev, i);
 				DRV_LOG(ERR,
 					"Port %u Tx queue %u SQ create representor devx default miss rule failed.",
 					dev->data->port_id, i);
diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h
index e0fc1872fe..6471ebf59f 100644
--- a/drivers/net/mlx5/mlx5_tx.h
+++ b/drivers/net/mlx5/mlx5_tx.h
@@ -213,6 +213,7 @@ struct mlx5_txq_ctrl *mlx5_txq_get(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_release(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_releasable(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_verify(struct rte_eth_dev *dev);
+int mlx5_txq_get_sqn(struct mlx5_txq_ctrl *txq);
 void txq_alloc_elts(struct mlx5_txq_ctrl *txq_ctrl);
 void txq_free_elts(struct mlx5_txq_ctrl *txq_ctrl);
 uint64_t mlx5_get_tx_port_offloads(struct rte_eth_dev *dev);
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 9150ced72d..5543f2c570 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -27,6 +27,8 @@
 #include "mlx5_tx.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_autoconf.h"
+#include "rte_pmd_mlx5.h"
+#include "mlx5_flow.h"
 
 /**
  * Allocate TX queue elements.
@@ -1274,6 +1276,51 @@ mlx5_txq_verify(struct rte_eth_dev *dev)
 	return ret;
 }
 
+int
+mlx5_txq_get_sqn(struct mlx5_txq_ctrl *txq)
+{
+	return txq->is_hairpin ? txq->obj->sq->id : txq->obj->sq_obj.sq->id;
+}
+
+int
+rte_pmd_mlx5_external_sq_enable(uint16_t port_id, uint32_t sq_num)
+{
+	struct rte_eth_dev *dev;
+	struct mlx5_priv *priv;
+	uint32_t flow;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		DRV_LOG(ERR, "There is no Ethernet device for port %u.",
+			port_id);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+	dev = &rte_eth_devices[port_id];
+	priv = dev->data->dev_private;
+	if ((!priv->representor && !priv->master) ||
+	    !priv->sh->config.dv_esw_en) {
+		DRV_LOG(ERR, "Port %u must be representor or master port in E-Switch mode.",
+			port_id);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	if (sq_num == 0) {
+		DRV_LOG(ERR, "Invalid SQ number.");
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+#ifdef HAVE_MLX5_HWS_SUPPORT
+	if (priv->sh->config.dv_flow_en == 2)
+		return mlx5_flow_hw_esw_create_sq_miss_flow(dev, sq_num);
+#endif
+	flow = mlx5_flow_create_devx_sq_miss_flow(dev, sq_num);
+	if (flow > 0)
+		return 0;
+	DRV_LOG(ERR, "Port %u failed to create default miss flow for SQ %u.",
+		port_id, sq_num);
+	return -rte_errno;
+}
+
 /**
  * Set the Tx queue dynamic timestamp (mask and offset)
  *
diff --git a/drivers/net/mlx5/rte_pmd_mlx5.h b/drivers/net/mlx5/rte_pmd_mlx5.h
index fbfdd9737b..d4caea5b20 100644
--- a/drivers/net/mlx5/rte_pmd_mlx5.h
+++ b/drivers/net/mlx5/rte_pmd_mlx5.h
@@ -139,6 +139,23 @@ int rte_pmd_mlx5_external_rx_queue_id_unmap(uint16_t port_id,
 __rte_experimental
 int rte_pmd_mlx5_host_shaper_config(int port_id, uint8_t rate, uint32_t flags);
 
+/**
+ * Enable traffic for external SQ.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] sq_num
+ *   SQ HW number.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ *   Possible values for rte_errno:
+ *   - EINVAL - invalid sq_number or port type.
+ *   - ENODEV - there is no Ethernet device for this port id.
+ */
+__rte_experimental
+int rte_pmd_mlx5_external_sq_enable(uint16_t port_id, uint32_t sq_num);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/drivers/net/mlx5/version.map b/drivers/net/mlx5/version.map
index 9942de5079..848270da13 100644
--- a/drivers/net/mlx5/version.map
+++ b/drivers/net/mlx5/version.map
@@ -14,4 +14,5 @@ EXPERIMENTAL {
 	rte_pmd_mlx5_external_rx_queue_id_unmap;
 	# added in 22.07
 	rte_pmd_mlx5_host_shaper_config;
+	rte_pmd_mlx5_external_sq_enable;
 };
-- 
2.25.1
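
A minimal usage sketch for the new rte_pmd_mlx5_external_sq_enable()
API, assuming the application already knows the HW number of its
externally created SQ (the port id and SQ number below are placeholders):

    #include <stdio.h>
    #include <stdint.h>
    #include <rte_errno.h>
    #include <rte_pmd_mlx5.h>

    /* Install the default miss flow so traffic sent on an external SQ
     * is handled by the E-Switch rules of the given port.
     */
    static int
    enable_external_sq(uint16_t port_id, uint32_t sq_num)
    {
        int ret;

        ret = rte_pmd_mlx5_external_sq_enable(port_id, sq_num);
        if (ret != 0)
            printf("port %u: enabling external SQ %u failed: %s\n",
                   port_id, sq_num, rte_strerror(rte_errno));
        return ret;
    }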


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v5 17/18] net/mlx5: support device control of representor matching
  2022-10-20  3:21 ` [PATCH v5 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (15 preceding siblings ...)
  2022-10-20  3:22   ` [PATCH v5 16/18] net/mlx5: support device control for E-Switch default rule Suanming Mou
@ 2022-10-20  3:22   ` Suanming Mou
  2022-10-20  3:22   ` [PATCH v5 18/18] net/mlx5: create control flow rules with HWS Suanming Mou
  17 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-10-20  3:22 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Dariusz Sosnowski

From: Dariusz Sosnowski <dsosnowski@nvidia.com>

In some E-Switch use cases applications want to receive all traffic
on a single port. Since the flow API currently does not provide a way
to match traffic forwarded to any port representor, this patch adds
support for controlling representor matching on ingress flow rules.

Representor matching is controlled through a new device argument,
repr_matching_en.

- If representor matching is enabled (default setting),
  then each ingress pattern template has an implicit REPRESENTED_PORT
  item added. Flow rules based on this pattern template will match
  the vport associated with the port on which the rule is created.
- If representor matching is disabled, then no implicit item is
  added. As a result, ingress flow rules will match traffic coming
  to any port, not only the port on which the flow rule is created.

Representor matching is enabled by default to provide the expected
default behavior.
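
For example, probing the device with devargs such as
dv_flow_en=2,repr_matching_en=0 disables the implicit item, so an
application can receive all E-Switch traffic with a single set of
ingress flow rules created on one port.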

This patch enables egress flow rules on representors when E-Switch is
enabled in the following configurations:

- repr_matching_en=1 and dv_xmeta_en=4
- repr_matching_en=1 and dv_xmeta_en=0
- repr_matching_en=0 and dv_xmeta_en=0

When representor matching is enabled, the following logic is
implemented:

1. Creating an egress template table in group 0 for each port. These
   tables will hold default flow rules defined as follows:

      pattern SQ
      actions MODIFY_FIELD (set available bits in REG_C_0 to
                            vport_meta_tag)
              MODIFY_FIELD (copy REG_A to REG_C_1, only when
                            dv_xmeta_en == 4)
              JUMP (group 1)

2. Egress pattern templates created by an application have an implicit
   MLX5_RTE_FLOW_ITEM_TYPE_TAG item prepended to pattern, which matches
   available bits of REG_C_0.

3. Egress flow rules created by an application have an implicit
   MLX5_RTE_FLOW_ITEM_TYPE_TAG item prepended to pattern, which matches
   vport_meta_tag placed in available bits of REG_C_0.

4. Egress template tables created by an application, which are in
   group n, are placed in group n + 1.

5. Items and actions related to META operate on REG_A when
   dv_xmeta_en == 0, or on REG_C_1 when dv_xmeta_en == 4.

When representor matching is disabled and extended metadata is disabled,
no changes to the current logic are required.
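
For illustration, a minimal sketch of creating an ingress pattern
template under the default (representor matching enabled) setting; the
ETH item and the helper name are examples only. The PMD prepends the
implicit REPRESENTED_PORT item internally, so the application supplies
only its own items:

    #include <rte_flow.h>

    /* Ingress pattern template on a port with repr_matching_en=1.
     * The PMD prepends an implicit REPRESENTED_PORT item; the
     * application must not add that item itself on ingress.
     */
    static struct rte_flow_pattern_template *
    create_ingress_pattern_tmpl(uint16_t port_id, struct rte_flow_error *error)
    {
        const struct rte_flow_pattern_template_attr attr = {
            .relaxed_matching = 0,
            .ingress = 1,
        };
        const struct rte_flow_item pattern[] = {
            { .type = RTE_FLOW_ITEM_TYPE_ETH },
            { .type = RTE_FLOW_ITEM_TYPE_END },
        };

        return rte_flow_pattern_template_create(port_id, &attr, pattern, error);
    }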

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 doc/guides/nics/mlx5.rst         |  11 +
 drivers/net/mlx5/linux/mlx5_os.c |  11 +
 drivers/net/mlx5/mlx5.c          |  13 +
 drivers/net/mlx5/mlx5.h          |   5 +
 drivers/net/mlx5/mlx5_flow.c     |   8 +-
 drivers/net/mlx5/mlx5_flow.h     |   7 +
 drivers/net/mlx5/mlx5_flow_hw.c  | 738 ++++++++++++++++++++++++-------
 drivers/net/mlx5/mlx5_trigger.c  | 167 ++++++-
 8 files changed, 794 insertions(+), 166 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index ae4d406ca1..b923976fad 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -1163,6 +1163,17 @@ for an additional list of options shared with other mlx5 drivers.
 
   By default, the PMD will set this value to 1.
 
+- ``repr_matching_en`` parameter [int]
+
+  - 0. If representor matching is disabled, then no implicit item
+    is added. As a result, ingress flow rules will match traffic
+    coming to any port, not only the port on which the flow rule is created.
+
+  - 1. If representor matching is enabled (default setting),
+    then each ingress pattern template has an implicit REPRESENTED_PORT
+    item added. Flow rules based on this pattern template will match
+    the vport associated with the port on which the rule is created.
+
 Supported NICs
 --------------
 
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index c23fe6daf1..8efc7dbb3f 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1555,6 +1555,9 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	if (priv->sh->config.dv_flow_en == 2) {
 #ifdef HAVE_MLX5_HWS_SUPPORT
 		if (priv->sh->config.dv_esw_en) {
+			uint32_t usable_bits;
+			uint32_t required_bits;
+
 			if (priv->sh->dv_regc0_mask == UINT32_MAX) {
 				DRV_LOG(ERR, "E-Switch port metadata is required when using HWS "
 					     "but it is disabled (configure it through devlink)");
@@ -1567,6 +1570,14 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 				err = ENOTSUP;
 				goto error;
 			}
+			usable_bits = __builtin_popcount(priv->sh->dv_regc0_mask);
+			required_bits = __builtin_popcount(priv->vport_meta_mask);
+			if (usable_bits < required_bits) {
+				DRV_LOG(ERR, "Not enough bits available in reg_c[0] to provide "
+					     "representor matching.");
+				err = ENOTSUP;
+				goto error;
+			}
 		}
 		if (priv->vport_meta_mask)
 			flow_hw_set_port_info(eth_dev);
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 4e532f0807..78234b116c 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -181,6 +181,9 @@
 /* HW steering counter's query interval. */
 #define MLX5_HWS_CNT_CYCLE_TIME "svc_cycle_time"
 
+/* Device parameter to control representor matching in ingress/egress flows with HWS. */
+#define MLX5_REPR_MATCHING_EN "repr_matching_en"
+
 /* Shared memory between primary and secondary processes. */
 struct mlx5_shared_data *mlx5_shared_data;
 
@@ -1283,6 +1286,8 @@ mlx5_dev_args_check_handler(const char *key, const char *val, void *opaque)
 		config->cnt_svc.service_core = tmp;
 	} else if (strcmp(MLX5_HWS_CNT_CYCLE_TIME, key) == 0) {
 		config->cnt_svc.cycle_time = tmp;
+	} else if (strcmp(MLX5_REPR_MATCHING_EN, key) == 0) {
+		config->repr_matching = !!tmp;
 	}
 	return 0;
 }
@@ -1321,6 +1326,7 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 		MLX5_FDB_DEFAULT_RULE_EN,
 		MLX5_HWS_CNT_SERVICE_CORE,
 		MLX5_HWS_CNT_CYCLE_TIME,
+		MLX5_REPR_MATCHING_EN,
 		NULL,
 	};
 	int ret = 0;
@@ -1335,6 +1341,7 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 	config->fdb_def_rule = 1;
 	config->cnt_svc.cycle_time = MLX5_CNT_SVC_CYCLE_TIME_DEFAULT;
 	config->cnt_svc.service_core = rte_get_main_lcore();
+	config->repr_matching = 1;
 	if (mkvlist != NULL) {
 		/* Process parameters. */
 		ret = mlx5_kvargs_process(mkvlist, params,
@@ -1368,6 +1375,11 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 			config->dv_xmeta_en);
 		config->dv_xmeta_en = MLX5_XMETA_MODE_LEGACY;
 	}
+	if (config->dv_flow_en != 2 && !config->repr_matching) {
+		DRV_LOG(DEBUG, "Disabling representor matching is valid only "
+			       "when HW Steering is enabled.");
+		config->repr_matching = 1;
+	}
 	if (config->tx_pp && !sh->dev_cap.txpp_en) {
 		DRV_LOG(ERR, "Packet pacing is not supported.");
 		rte_errno = ENODEV;
@@ -1411,6 +1423,7 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 	DRV_LOG(DEBUG, "\"allow_duplicate_pattern\" is %u.",
 		config->allow_duplicate_pattern);
 	DRV_LOG(DEBUG, "\"fdb_def_rule_en\" is %u.", config->fdb_def_rule);
+	DRV_LOG(DEBUG, "\"repr_matching_en\" is %u.", config->repr_matching);
 	return 0;
 }
 
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 9a1718e2f2..87c90d58d7 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -321,6 +321,7 @@ struct mlx5_sh_config {
 	} cnt_svc; /* configure for HW steering's counter's service. */
 	/* Allow/Prevent the duplicate rules pattern. */
 	uint32_t fdb_def_rule:1; /* Create FDB default jump rule */
+	uint32_t repr_matching:1; /* Enable implicit vport matching in HWS FDB. */
 };
 
 /* Structure for VF VLAN workaround. */
@@ -371,6 +372,7 @@ struct mlx5_hw_q_job {
 			void *out_data;
 		} __rte_packed;
 		struct rte_flow_item_ethdev port_spec;
+		struct rte_flow_item_tag tag_spec;
 	} __rte_packed;
 };
 
@@ -1680,6 +1682,9 @@ struct mlx5_priv {
 	struct rte_flow_template_table *hw_esw_sq_miss_tbl;
 	struct rte_flow_template_table *hw_esw_zero_tbl;
 	struct rte_flow_template_table *hw_tx_meta_cpy_tbl;
+	struct rte_flow_pattern_template *hw_tx_repr_tagging_pt;
+	struct rte_flow_actions_template *hw_tx_repr_tagging_at;
+	struct rte_flow_template_table *hw_tx_repr_tagging_tbl;
 	struct mlx5_indexed_pool *flows[MLX5_FLOW_TYPE_MAXI];
 	/* RTE Flow rules. */
 	uint32_t ctrl_flows; /* Control flow rules. */
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 76187d76ea..60af09dbeb 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -1127,7 +1127,11 @@ mlx5_flow_get_reg_id(struct rte_eth_dev *dev,
 		}
 		break;
 	case MLX5_METADATA_TX:
-		return REG_A;
+		if (config->dv_flow_en == 2 && config->dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS) {
+			return REG_C_1;
+		} else {
+			return REG_A;
+		}
 	case MLX5_METADATA_FDB:
 		switch (config->dv_xmeta_en) {
 		case MLX5_XMETA_MODE_LEGACY:
@@ -11323,7 +11327,7 @@ mlx5_flow_pick_transfer_proxy(struct rte_eth_dev *dev,
 			return 0;
 		}
 	}
-	return rte_flow_error_set(error, EINVAL,
+	return rte_flow_error_set(error, ENODEV,
 				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				  NULL, "unable to find a proxy port");
 }
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 88d92b18c7..edf45b814d 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1207,12 +1207,18 @@ struct rte_flow_pattern_template {
 	struct rte_flow_pattern_template_attr attr;
 	struct mlx5dr_match_template *mt; /* mlx5 match template. */
 	uint64_t item_flags; /* Item layer flags. */
+	uint64_t orig_item_nb; /* Number of pattern items provided by the user (with END item). */
 	uint32_t refcnt;  /* Reference counter. */
 	/*
 	 * If true, then rule pattern should be prepended with
 	 * represented_port pattern item.
 	 */
 	bool implicit_port;
+	/*
+	 * If true, then rule pattern should be prepended with
+	 * tag pattern item for representor matching.
+	 */
+	bool implicit_tag;
 };
 
 /* Flow action template struct. */
@@ -2489,6 +2495,7 @@ int mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev,
 					 uint32_t sqn);
 int mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev);
 int mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev);
+int mlx5_flow_hw_tx_repr_matching_flow(struct rte_eth_dev *dev, uint32_t sqn);
 int mlx5_flow_actions_validate(struct rte_eth_dev *dev,
 		const struct rte_flow_actions_template_attr *attr,
 		const struct rte_flow_action actions[],
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 9294866628..49186c4339 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -32,12 +32,15 @@
 /* Maximum number of rules in control flow tables. */
 #define MLX5_HW_CTRL_FLOW_NB_RULES (4096)
 
-/* Lowest flow group usable by an application. */
+/* Lowest flow group usable by an application if group translation is done. */
 #define MLX5_HW_LOWEST_USABLE_GROUP (1)
 
 /* Maximum group index usable by user applications for transfer flows. */
 #define MLX5_HW_MAX_TRANSFER_GROUP (UINT32_MAX - 1)
 
+/* Maximum group index usable by user applications for egress flows. */
+#define MLX5_HW_MAX_EGRESS_GROUP (UINT32_MAX - 1)
+
 /* Lowest priority for HW root table. */
 #define MLX5_HW_LOWEST_PRIO_ROOT 15
 
@@ -61,6 +64,9 @@ flow_hw_set_vlan_vid_construct(struct rte_eth_dev *dev,
 			       const struct mlx5_hw_actions *hw_acts,
 			       const struct rte_flow_action *action);
 
+static __rte_always_inline uint32_t flow_hw_tx_tag_regc_mask(struct rte_eth_dev *dev);
+static __rte_always_inline uint32_t flow_hw_tx_tag_regc_value(struct rte_eth_dev *dev);
+
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops;
 
 /* DR action flags with different table. */
@@ -2346,21 +2352,18 @@ flow_hw_get_rule_items(struct rte_eth_dev *dev,
 		       uint8_t pattern_template_index,
 		       struct mlx5_hw_q_job *job)
 {
-	if (table->its[pattern_template_index]->implicit_port) {
-		const struct rte_flow_item *curr_item;
-		unsigned int nb_items;
-		bool found_end;
-		unsigned int i;
-
-		/* Count number of pattern items. */
-		nb_items = 0;
-		found_end = false;
-		for (curr_item = items; !found_end; ++curr_item) {
-			++nb_items;
-			if (curr_item->type == RTE_FLOW_ITEM_TYPE_END)
-				found_end = true;
+	struct rte_flow_pattern_template *pt = table->its[pattern_template_index];
+
+	/* Only one implicit item can be added to flow rule pattern. */
+	MLX5_ASSERT(!pt->implicit_port || !pt->implicit_tag);
+	/* At least one item was allocated in job descriptor for items. */
+	MLX5_ASSERT(MLX5_HW_MAX_ITEMS >= 1);
+	if (pt->implicit_port) {
+		if (pt->orig_item_nb + 1 > MLX5_HW_MAX_ITEMS) {
+			rte_errno = ENOMEM;
+			return NULL;
 		}
-		/* Prepend represented port item. */
+		/* Set up represented port item in job descriptor. */
 		job->port_spec = (struct rte_flow_item_ethdev){
 			.port_id = dev->data->port_id,
 		};
@@ -2368,21 +2371,26 @@ flow_hw_get_rule_items(struct rte_eth_dev *dev,
 			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
 			.spec = &job->port_spec,
 		};
-		found_end = false;
-		for (i = 1; i < MLX5_HW_MAX_ITEMS && i - 1 < nb_items; ++i) {
-			job->items[i] = items[i - 1];
-			if (items[i - 1].type == RTE_FLOW_ITEM_TYPE_END) {
-				found_end = true;
-				break;
-			}
-		}
-		if (i >= MLX5_HW_MAX_ITEMS && !found_end) {
+		rte_memcpy(&job->items[1], items, sizeof(*items) * pt->orig_item_nb);
+		return job->items;
+	} else if (pt->implicit_tag) {
+		if (pt->orig_item_nb + 1 > MLX5_HW_MAX_ITEMS) {
 			rte_errno = ENOMEM;
 			return NULL;
 		}
+		/* Set up tag item in job descriptor. */
+		job->tag_spec = (struct rte_flow_item_tag){
+			.data = flow_hw_tx_tag_regc_value(dev),
+		};
+		job->items[0] = (struct rte_flow_item){
+			.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+			.spec = &job->tag_spec,
+		};
+		rte_memcpy(&job->items[1], items, sizeof(*items) * pt->orig_item_nb);
 		return job->items;
+	} else {
+		return items;
 	}
-	return items;
 }
 
 /**
@@ -2960,6 +2968,11 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 		     uint8_t nb_action_templates,
 		     struct rte_flow_error *error)
 {
+	struct rte_flow_error sub_error = {
+		.type = RTE_FLOW_ERROR_TYPE_NONE,
+		.cause = NULL,
+		.message = NULL,
+	};
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5dr_matcher_attr matcher_attr = {0};
 	struct rte_flow_template_table *tbl = NULL;
@@ -2970,7 +2983,7 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	struct rte_flow_attr flow_attr = attr->flow_attr;
 	struct mlx5_flow_cb_ctx ctx = {
 		.dev = dev,
-		.error = error,
+		.error = &sub_error,
 		.data = &flow_attr,
 	};
 	struct mlx5_indexed_pool_config cfg = {
@@ -3064,7 +3077,7 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 			continue;
 		err = __flow_hw_actions_translate(dev, &tbl->cfg,
 						  &tbl->ats[i].acts,
-						  action_templates[i], error);
+						  action_templates[i], &sub_error);
 		if (err) {
 			i++;
 			goto at_error;
@@ -3105,12 +3118,11 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 		mlx5_free(tbl);
 	}
 	if (error != NULL) {
-		rte_flow_error_set(error, err,
-				error->type == RTE_FLOW_ERROR_TYPE_NONE ?
-				RTE_FLOW_ERROR_TYPE_UNSPECIFIED : error->type,
-				NULL,
-				error->message == NULL ?
-				"fail to create rte table" : error->message);
+		if (sub_error.type == RTE_FLOW_ERROR_TYPE_NONE)
+			rte_flow_error_set(error, err, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					   "Failed to create template table");
+		else
+			rte_memcpy(error, &sub_error, sizeof(sub_error));
 	}
 	return NULL;
 }
@@ -3171,9 +3183,10 @@ flow_hw_translate_group(struct rte_eth_dev *dev,
 			struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_sh_config *config = &priv->sh->config;
 	const struct rte_flow_attr *flow_attr = &cfg->attr.flow_attr;
 
-	if (priv->sh->config.dv_esw_en &&
+	if (config->dv_esw_en &&
 	    priv->fdb_def_rule &&
 	    cfg->external &&
 	    flow_attr->transfer) {
@@ -3183,6 +3196,22 @@ flow_hw_translate_group(struct rte_eth_dev *dev,
 						  NULL,
 						  "group index not supported");
 		*table_group = group + 1;
+	} else if (config->dv_esw_en &&
+		   !(config->repr_matching && config->dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS) &&
+		   cfg->external &&
+		   flow_attr->egress) {
+		/*
+		 * On E-Switch setups, egress group translation is not done if and only if
+		 * representor matching is disabled and legacy metadata mode is selected.
+	 * In all other cases, egress group 0 is reserved for representor tagging flows
+		 * and metadata copy flows.
+		 */
+		if (group > MLX5_HW_MAX_EGRESS_GROUP)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
+						  NULL,
+						  "group index not supported");
+		*table_group = group + 1;
 	} else {
 		*table_group = group;
 	}
@@ -3223,7 +3252,6 @@ flow_hw_template_table_create(struct rte_eth_dev *dev,
 			      uint8_t nb_action_templates,
 			      struct rte_flow_error *error)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_flow_template_table_cfg cfg = {
 		.attr = *attr,
 		.external = true,
@@ -3232,12 +3260,6 @@ flow_hw_template_table_create(struct rte_eth_dev *dev,
 
 	if (flow_hw_translate_group(dev, &cfg, group, &cfg.attr.flow_attr.group, error))
 		return NULL;
-	if (priv->sh->config.dv_esw_en && cfg.attr.flow_attr.egress) {
-		rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ATTR, NULL,
-				  "egress flows are not supported with HW Steering"
-				  " when E-Switch is enabled");
-		return NULL;
-	}
 	return flow_hw_table_create(dev, &cfg, item_templates, nb_item_templates,
 				    action_templates, nb_action_templates, error);
 }
@@ -4493,26 +4515,28 @@ flow_hw_actions_template_destroy(struct rte_eth_dev *dev __rte_unused,
 	return 0;
 }
 
-static struct rte_flow_item *
-flow_hw_copy_prepend_port_item(const struct rte_flow_item *items,
-			       struct rte_flow_error *error)
+static uint32_t
+flow_hw_count_items(const struct rte_flow_item *items)
 {
 	const struct rte_flow_item *curr_item;
-	struct rte_flow_item *copied_items;
-	bool found_end;
-	unsigned int nb_items;
-	unsigned int i;
-	size_t size;
+	uint32_t nb_items;
 
-	/* Count number of pattern items. */
 	nb_items = 0;
-	found_end = false;
-	for (curr_item = items; !found_end; ++curr_item) {
+	for (curr_item = items; curr_item->type != RTE_FLOW_ITEM_TYPE_END; ++curr_item)
 		++nb_items;
-		if (curr_item->type == RTE_FLOW_ITEM_TYPE_END)
-			found_end = true;
-	}
-	/* Allocate new array of items and prepend REPRESENTED_PORT item. */
+	return ++nb_items;
+}
+
+static struct rte_flow_item *
+flow_hw_prepend_item(const struct rte_flow_item *items,
+		     const uint32_t nb_items,
+		     const struct rte_flow_item *new_item,
+		     struct rte_flow_error *error)
+{
+	struct rte_flow_item *copied_items;
+	size_t size;
+
+	/* Allocate new array of items. */
 	size = sizeof(*copied_items) * (nb_items + 1);
 	copied_items = mlx5_malloc(MLX5_MEM_ZERO, size, 0, rte_socket_id());
 	if (!copied_items) {
@@ -4522,14 +4546,9 @@ flow_hw_copy_prepend_port_item(const struct rte_flow_item *items,
 				   "cannot allocate item template");
 		return NULL;
 	}
-	copied_items[0] = (struct rte_flow_item){
-		.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
-		.spec = NULL,
-		.last = NULL,
-		.mask = &rte_flow_item_ethdev_mask,
-	};
-	for (i = 1; i < nb_items + 1; ++i)
-		copied_items[i] = items[i - 1];
+	/* Put new item at the beginning and copy the rest. */
+	copied_items[0] = *new_item;
+	rte_memcpy(&copied_items[1], items, sizeof(*items) * nb_items);
 	return copied_items;
 }
 
@@ -4550,17 +4569,13 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 	if (priv->sh->config.dv_esw_en) {
 		MLX5_ASSERT(priv->master || priv->representor);
 		if (priv->master) {
-			/*
-			 * It is allowed to specify ingress, egress and transfer attributes
-			 * at the same time, in order to construct flows catching all missed
-			 * FDB traffic and forwarding it to the master port.
-			 */
-			if (!(attr->ingress ^ attr->egress ^ attr->transfer))
+			if ((attr->ingress && attr->egress) ||
+			    (attr->ingress && attr->transfer) ||
+			    (attr->egress && attr->transfer))
 				return rte_flow_error_set(error, EINVAL,
 							  RTE_FLOW_ERROR_TYPE_ATTR, NULL,
-							  "only one or all direction attributes"
-							  " at once can be used on transfer proxy"
-							  " port");
+							  "only one direction attribute at once"
+							  " can be used on transfer proxy port");
 		} else {
 			if (attr->transfer)
 				return rte_flow_error_set(error, EINVAL,
@@ -4613,11 +4628,16 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 			break;
 		}
 		case RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT:
-			if (attr->ingress || attr->egress)
+			if (attr->ingress && priv->sh->config.repr_matching)
+				return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ITEM, NULL,
+						  "represented port item cannot be used"
+						  " when ingress attribute is set");
+			if (attr->egress)
 				return rte_flow_error_set(error, EINVAL,
 						  RTE_FLOW_ERROR_TYPE_ITEM, NULL,
 						  "represented port item cannot be used"
-						  " when transfer attribute is set");
+						  " when egress attribute is set");
 			break;
 		case RTE_FLOW_ITEM_TYPE_META:
 			if (!priv->sh->config.dv_esw_en ||
@@ -4679,6 +4699,17 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static bool
+flow_hw_pattern_has_sq_match(const struct rte_flow_item *items)
+{
+	unsigned int i;
+
+	for (i = 0; items[i].type != RTE_FLOW_ITEM_TYPE_END; ++i)
+		if (items[i].type == (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_SQ)
+			return true;
+	return false;
+}
+
 /**
  * Create flow item template.
  *
@@ -4704,17 +4735,53 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 	struct rte_flow_pattern_template *it;
 	struct rte_flow_item *copied_items = NULL;
 	const struct rte_flow_item *tmpl_items;
+	uint64_t orig_item_nb;
+	struct rte_flow_item port = {
+		.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+		.mask = &rte_flow_item_ethdev_mask,
+	};
+	struct rte_flow_item_tag tag_v = {
+		.data = 0,
+		.index = REG_C_0,
+	};
+	struct rte_flow_item_tag tag_m = {
+		.data = flow_hw_tx_tag_regc_mask(dev),
+		.index = 0xff,
+	};
+	struct rte_flow_item tag = {
+		.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+		.spec = &tag_v,
+		.mask = &tag_m,
+		.last = NULL
+	};
 
 	if (flow_hw_pattern_validate(dev, attr, items, error))
 		return NULL;
-	if (priv->sh->config.dv_esw_en && attr->ingress && !attr->egress && !attr->transfer) {
-		copied_items = flow_hw_copy_prepend_port_item(items, error);
+	orig_item_nb = flow_hw_count_items(items);
+	if (priv->sh->config.dv_esw_en &&
+	    priv->sh->config.repr_matching &&
+	    attr->ingress && !attr->egress && !attr->transfer) {
+		copied_items = flow_hw_prepend_item(items, orig_item_nb, &port, error);
+		if (!copied_items)
+			return NULL;
+		tmpl_items = copied_items;
+	} else if (priv->sh->config.dv_esw_en &&
+		   priv->sh->config.repr_matching &&
+		   !attr->ingress && attr->egress && !attr->transfer) {
+		if (flow_hw_pattern_has_sq_match(items)) {
+			DRV_LOG(DEBUG, "Port %u omitting implicit REG_C_0 match for egress "
+				       "pattern template", dev->data->port_id);
+			tmpl_items = items;
+			goto setup_pattern_template;
+		}
+		copied_items = flow_hw_prepend_item(items, orig_item_nb, &tag, error);
 		if (!copied_items)
 			return NULL;
 		tmpl_items = copied_items;
 	} else {
 		tmpl_items = items;
 	}
+setup_pattern_template:
 	it = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*it), 0, rte_socket_id());
 	if (!it) {
 		if (copied_items)
@@ -4726,6 +4793,7 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 		return NULL;
 	}
 	it->attr = *attr;
+	it->orig_item_nb = orig_item_nb;
 	it->mt = mlx5dr_match_template_create(tmpl_items, attr->relaxed_matching);
 	if (!it->mt) {
 		if (copied_items)
@@ -4738,11 +4806,15 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 		return NULL;
 	}
 	it->item_flags = flow_hw_rss_item_flags_get(tmpl_items);
-	it->implicit_port = !!copied_items;
+	if (copied_items) {
+		if (attr->ingress)
+			it->implicit_port = true;
+		else if (attr->egress)
+			it->implicit_tag = true;
+		mlx5_free(copied_items);
+	}
 	__atomic_fetch_add(&it->refcnt, 1, __ATOMIC_RELAXED);
 	LIST_INSERT_HEAD(&priv->flow_hw_itt, it, next);
-	if (copied_items)
-		mlx5_free(copied_items);
 	return it;
 }
 
@@ -5139,6 +5211,254 @@ flow_hw_free_vport_actions(struct mlx5_priv *priv)
 	priv->hw_vport = NULL;
 }
 
+/**
+ * Create an egress pattern template matching on source SQ.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to pattern template on success. NULL otherwise, and rte_errno is set.
+ */
+static struct rte_flow_pattern_template *
+flow_hw_create_tx_repr_sq_pattern_tmpl(struct rte_eth_dev *dev)
+{
+	struct rte_flow_pattern_template_attr attr = {
+		.relaxed_matching = 0,
+		.egress = 1,
+	};
+	struct mlx5_rte_flow_item_sq sq_mask = {
+		.queue = UINT32_MAX,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_SQ,
+			.mask = &sq_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+
+	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
+}
+
+static __rte_always_inline uint32_t
+flow_hw_tx_tag_regc_mask(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint32_t mask = priv->sh->dv_regc0_mask;
+
+	/* Mask is verified during device initialization. Sanity checking here. */
+	MLX5_ASSERT(mask != 0);
+	/*
+	 * Availability of sufficient number of bits in REG_C_0 is verified on initialization.
+	 * Sanity checking here.
+	 */
+	MLX5_ASSERT(__builtin_popcount(mask) >= __builtin_popcount(priv->vport_meta_mask));
+	return mask;
+}
+
+static __rte_always_inline uint32_t
+flow_hw_tx_tag_regc_value(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint32_t tag;
+
+	/* Mask is verified during device initialization. Sanity checking here. */
+	MLX5_ASSERT(priv->vport_meta_mask != 0);
+	tag = priv->vport_meta_tag >> (rte_bsf32(priv->vport_meta_mask));
+	/*
+	 * Availability of sufficient number of bits in REG_C_0 is verified on initialization.
+	 * Sanity checking here.
+	 */
+	MLX5_ASSERT((tag & priv->sh->dv_regc0_mask) == tag);
+	return tag;
+}
+
+static void
+flow_hw_update_action_mask(struct rte_flow_action *action,
+			   struct rte_flow_action *mask,
+			   enum rte_flow_action_type type,
+			   void *conf_v,
+			   void *conf_m)
+{
+	action->type = type;
+	action->conf = conf_v;
+	mask->type = type;
+	mask->conf = conf_m;
+}
+
+/**
+ * Create an egress actions template with MODIFY_FIELD action for setting unused REG_C_0 bits
+ * to vport tag and JUMP action to group 1.
+ *
+ * If extended metadata mode is enabled, then MODIFY_FIELD action for copying software metadata
+ * to REG_C_1 is added as well.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to actions template on success. NULL otherwise, and rte_errno is set.
+ */
+static struct rte_flow_actions_template *
+flow_hw_create_tx_repr_tag_jump_acts_tmpl(struct rte_eth_dev *dev)
+{
+	uint32_t tag_mask = flow_hw_tx_tag_regc_mask(dev);
+	uint32_t tag_value = flow_hw_tx_tag_regc_value(dev);
+	struct rte_flow_actions_template_attr attr = {
+		.egress = 1,
+	};
+	struct rte_flow_action_modify_field set_tag_v = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_C_0,
+			.offset = rte_bsf32(tag_mask),
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+		},
+		.width = __builtin_popcount(tag_mask),
+	};
+	struct rte_flow_action_modify_field set_tag_m = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+		},
+		.width = UINT32_MAX,
+	};
+	struct rte_flow_action_modify_field copy_metadata_v = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_C_1,
+		},
+		.src = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_A,
+		},
+		.width = 32,
+	};
+	struct rte_flow_action_modify_field copy_metadata_m = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.src = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.width = UINT32_MAX,
+	};
+	struct rte_flow_action_jump jump_v = {
+		.group = MLX5_HW_LOWEST_USABLE_GROUP,
+	};
+	struct rte_flow_action_jump jump_m = {
+		.group = UINT32_MAX,
+	};
+	struct rte_flow_action actions_v[4] = { { 0 } };
+	struct rte_flow_action actions_m[4] = { { 0 } };
+	unsigned int idx = 0;
+
+	rte_memcpy(set_tag_v.src.value, &tag_value, sizeof(tag_value));
+	rte_memcpy(set_tag_m.src.value, &tag_mask, sizeof(tag_mask));
+	flow_hw_update_action_mask(&actions_v[idx], &actions_m[idx],
+				   RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+				   &set_tag_v, &set_tag_m);
+	idx++;
+	if (MLX5_SH(dev)->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS) {
+		flow_hw_update_action_mask(&actions_v[idx], &actions_m[idx],
+					   RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+					   &copy_metadata_v, &copy_metadata_m);
+		idx++;
+	}
+	flow_hw_update_action_mask(&actions_v[idx], &actions_m[idx], RTE_FLOW_ACTION_TYPE_JUMP,
+				   &jump_v, &jump_m);
+	idx++;
+	flow_hw_update_action_mask(&actions_v[idx], &actions_m[idx], RTE_FLOW_ACTION_TYPE_END,
+				   NULL, NULL);
+	idx++;
+	MLX5_ASSERT(idx <= RTE_DIM(actions_v));
+	return flow_hw_actions_template_create(dev, &attr, actions_v, actions_m, NULL);
+}
+
+static void
+flow_hw_cleanup_tx_repr_tagging(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (priv->hw_tx_repr_tagging_tbl) {
+		flow_hw_table_destroy(dev, priv->hw_tx_repr_tagging_tbl, NULL);
+		priv->hw_tx_repr_tagging_tbl = NULL;
+	}
+	if (priv->hw_tx_repr_tagging_at) {
+		flow_hw_actions_template_destroy(dev, priv->hw_tx_repr_tagging_at, NULL);
+		priv->hw_tx_repr_tagging_at = NULL;
+	}
+	if (priv->hw_tx_repr_tagging_pt) {
+		flow_hw_pattern_template_destroy(dev, priv->hw_tx_repr_tagging_pt, NULL);
+		priv->hw_tx_repr_tagging_pt = NULL;
+	}
+}
+
+/**
+ * Setup templates and table used to create default Tx flow rules. These default rules
+ * allow for matching Tx representor traffic using a vport tag placed in unused bits of
+ * REG_C_0 register.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise.
+ */
+static int
+flow_hw_setup_tx_repr_tagging(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_flow_template_table_attr attr = {
+		.flow_attr = {
+			.group = 0,
+			.priority = MLX5_HW_LOWEST_PRIO_ROOT,
+			.egress = 1,
+		},
+		.nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
+	};
+	struct mlx5_flow_template_table_cfg cfg = {
+		.attr = attr,
+		.external = false,
+	};
+
+	MLX5_ASSERT(priv->sh->config.dv_esw_en);
+	MLX5_ASSERT(priv->sh->config.repr_matching);
+	priv->hw_tx_repr_tagging_pt = flow_hw_create_tx_repr_sq_pattern_tmpl(dev);
+	if (!priv->hw_tx_repr_tagging_pt)
+		goto error;
+	priv->hw_tx_repr_tagging_at = flow_hw_create_tx_repr_tag_jump_acts_tmpl(dev);
+	if (!priv->hw_tx_repr_tagging_at)
+		goto error;
+	priv->hw_tx_repr_tagging_tbl = flow_hw_table_create(dev, &cfg,
+							    &priv->hw_tx_repr_tagging_pt, 1,
+							    &priv->hw_tx_repr_tagging_at, 1,
+							    NULL);
+	if (!priv->hw_tx_repr_tagging_tbl)
+		goto error;
+	return 0;
+error:
+	flow_hw_cleanup_tx_repr_tagging(dev);
+	return -rte_errno;
+}
+
 static uint32_t
 flow_hw_esw_mgr_regc_marker_mask(struct rte_eth_dev *dev)
 {
@@ -5545,29 +5865,43 @@ flow_hw_create_tx_default_mreg_copy_actions_template(struct rte_eth_dev *dev)
 		},
 		.width = UINT32_MAX,
 	};
-	const struct rte_flow_action copy_reg_action[] = {
+	const struct rte_flow_action_jump jump_action = {
+		.group = 1,
+	};
+	const struct rte_flow_action_jump jump_mask = {
+		.group = UINT32_MAX,
+	};
+	const struct rte_flow_action actions[] = {
 		[0] = {
 			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
 			.conf = &mreg_action,
 		},
 		[1] = {
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump_action,
+		},
+		[2] = {
 			.type = RTE_FLOW_ACTION_TYPE_END,
 		},
 	};
-	const struct rte_flow_action copy_reg_mask[] = {
+	const struct rte_flow_action masks[] = {
 		[0] = {
 			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
 			.conf = &mreg_mask,
 		},
 		[1] = {
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump_mask,
+		},
+		[2] = {
 			.type = RTE_FLOW_ACTION_TYPE_END,
 		},
 	};
 	struct rte_flow_error drop_err;
 
 	RTE_SET_USED(drop_err);
-	return flow_hw_actions_template_create(dev, &tx_act_attr, copy_reg_action,
-					       copy_reg_mask, &drop_err);
+	return flow_hw_actions_template_create(dev, &tx_act_attr, actions,
+					       masks, &drop_err);
 }
 
 /**
@@ -5745,63 +6079,21 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 	struct rte_flow_actions_template *jump_one_actions_tmpl = NULL;
 	struct rte_flow_actions_template *tx_meta_actions_tmpl = NULL;
 	uint32_t xmeta = priv->sh->config.dv_xmeta_en;
+	uint32_t repr_matching = priv->sh->config.repr_matching;
 
-	/* Item templates */
+	/* Create templates and table for default SQ miss flow rules - root table. */
 	esw_mgr_items_tmpl = flow_hw_create_ctrl_esw_mgr_pattern_template(dev);
 	if (!esw_mgr_items_tmpl) {
 		DRV_LOG(ERR, "port %u failed to create E-Switch Manager item"
 			" template for control flows", dev->data->port_id);
 		goto error;
 	}
-	regc_sq_items_tmpl = flow_hw_create_ctrl_regc_sq_pattern_template(dev);
-	if (!regc_sq_items_tmpl) {
-		DRV_LOG(ERR, "port %u failed to create SQ item template for"
-			" control flows", dev->data->port_id);
-		goto error;
-	}
-	port_items_tmpl = flow_hw_create_ctrl_port_pattern_template(dev);
-	if (!port_items_tmpl) {
-		DRV_LOG(ERR, "port %u failed to create SQ item template for"
-			" control flows", dev->data->port_id);
-		goto error;
-	}
-	if (xmeta == MLX5_XMETA_MODE_META32_HWS) {
-		tx_meta_items_tmpl = flow_hw_create_tx_default_mreg_copy_pattern_template(dev);
-		if (!tx_meta_items_tmpl) {
-			DRV_LOG(ERR, "port %u failed to Tx metadata copy pattern"
-				" template for control flows", dev->data->port_id);
-			goto error;
-		}
-	}
-	/* Action templates */
 	regc_jump_actions_tmpl = flow_hw_create_ctrl_regc_jump_actions_template(dev);
 	if (!regc_jump_actions_tmpl) {
 		DRV_LOG(ERR, "port %u failed to create REG_C set and jump action template"
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
-	port_actions_tmpl = flow_hw_create_ctrl_port_actions_template(dev);
-	if (!port_actions_tmpl) {
-		DRV_LOG(ERR, "port %u failed to create port action template"
-			" for control flows", dev->data->port_id);
-		goto error;
-	}
-	jump_one_actions_tmpl = flow_hw_create_ctrl_jump_actions_template
-			(dev, MLX5_HW_LOWEST_USABLE_GROUP);
-	if (!jump_one_actions_tmpl) {
-		DRV_LOG(ERR, "port %u failed to create jump action template"
-			" for control flows", dev->data->port_id);
-		goto error;
-	}
-	if (xmeta == MLX5_XMETA_MODE_META32_HWS) {
-		tx_meta_actions_tmpl = flow_hw_create_tx_default_mreg_copy_actions_template(dev);
-		if (!tx_meta_actions_tmpl) {
-			DRV_LOG(ERR, "port %u failed to Tx metadata copy actions"
-				" template for control flows", dev->data->port_id);
-			goto error;
-		}
-	}
-	/* Tables */
 	MLX5_ASSERT(priv->hw_esw_sq_miss_root_tbl == NULL);
 	priv->hw_esw_sq_miss_root_tbl = flow_hw_create_ctrl_sq_miss_root_table
 			(dev, esw_mgr_items_tmpl, regc_jump_actions_tmpl);
@@ -5810,6 +6102,19 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
+	/* Create templates and table for default SQ miss flow rules - non-root table. */
+	regc_sq_items_tmpl = flow_hw_create_ctrl_regc_sq_pattern_template(dev);
+	if (!regc_sq_items_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create SQ item template for"
+			" control flows", dev->data->port_id);
+		goto error;
+	}
+	port_actions_tmpl = flow_hw_create_ctrl_port_actions_template(dev);
+	if (!port_actions_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create port action template"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
 	MLX5_ASSERT(priv->hw_esw_sq_miss_tbl == NULL);
 	priv->hw_esw_sq_miss_tbl = flow_hw_create_ctrl_sq_miss_table(dev, regc_sq_items_tmpl,
 								     port_actions_tmpl);
@@ -5818,6 +6123,20 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
+	/* Create templates and table for default FDB jump flow rules. */
+	port_items_tmpl = flow_hw_create_ctrl_port_pattern_template(dev);
+	if (!port_items_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create port item template for"
+			" control flows", dev->data->port_id);
+		goto error;
+	}
+	jump_one_actions_tmpl = flow_hw_create_ctrl_jump_actions_template
+			(dev, MLX5_HW_LOWEST_USABLE_GROUP);
+	if (!jump_one_actions_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create jump action template"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
 	MLX5_ASSERT(priv->hw_esw_zero_tbl == NULL);
 	priv->hw_esw_zero_tbl = flow_hw_create_ctrl_jump_table(dev, port_items_tmpl,
 							       jump_one_actions_tmpl);
@@ -5826,7 +6145,20 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
-	if (xmeta == MLX5_XMETA_MODE_META32_HWS) {
+	/* Create templates and table for default Tx metadata copy flow rule. */
+	if (!repr_matching && xmeta == MLX5_XMETA_MODE_META32_HWS) {
+		tx_meta_items_tmpl = flow_hw_create_tx_default_mreg_copy_pattern_template(dev);
+		if (!tx_meta_items_tmpl) {
+			DRV_LOG(ERR, "port %u failed to create Tx metadata copy pattern"
+				" template for control flows", dev->data->port_id);
+			goto error;
+		}
+		tx_meta_actions_tmpl = flow_hw_create_tx_default_mreg_copy_actions_template(dev);
+		if (!tx_meta_actions_tmpl) {
+			DRV_LOG(ERR, "port %u failed to create Tx metadata copy actions"
+				" template for control flows", dev->data->port_id);
+			goto error;
+		}
 		MLX5_ASSERT(priv->hw_tx_meta_cpy_tbl == NULL);
 		priv->hw_tx_meta_cpy_tbl = flow_hw_create_tx_default_mreg_copy_table(dev,
 					tx_meta_items_tmpl, tx_meta_actions_tmpl);
@@ -5850,7 +6182,7 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 		flow_hw_table_destroy(dev, priv->hw_esw_sq_miss_root_tbl, NULL);
 		priv->hw_esw_sq_miss_root_tbl = NULL;
 	}
-	if (xmeta == MLX5_XMETA_MODE_META32_HWS && tx_meta_actions_tmpl)
+	if (tx_meta_actions_tmpl)
 		flow_hw_actions_template_destroy(dev, tx_meta_actions_tmpl, NULL);
 	if (jump_one_actions_tmpl)
 		flow_hw_actions_template_destroy(dev, jump_one_actions_tmpl, NULL);
@@ -5858,7 +6190,7 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 		flow_hw_actions_template_destroy(dev, port_actions_tmpl, NULL);
 	if (regc_jump_actions_tmpl)
 		flow_hw_actions_template_destroy(dev, regc_jump_actions_tmpl, NULL);
-	if (xmeta == MLX5_XMETA_MODE_META32_HWS && tx_meta_items_tmpl)
+	if (tx_meta_items_tmpl)
 		flow_hw_pattern_template_destroy(dev, tx_meta_items_tmpl, NULL);
 	if (port_items_tmpl)
 		flow_hw_pattern_template_destroy(dev, port_items_tmpl, NULL);
@@ -6199,6 +6531,13 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		if (!priv->hw_tag[i])
 			goto err;
 	}
+	if (priv->sh->config.dv_esw_en && priv->sh->config.repr_matching) {
+		ret = flow_hw_setup_tx_repr_tagging(dev);
+		if (ret) {
+			rte_errno = -ret;
+			goto err;
+		}
+	}
 	if (is_proxy) {
 		ret = flow_hw_create_vport_actions(priv);
 		if (ret) {
@@ -6325,6 +6664,7 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 		return;
 	flow_hw_rxq_flag_set(dev, false);
 	flow_hw_flush_all_ctrl_flows(dev);
+	flow_hw_cleanup_tx_repr_tagging(dev);
 	while (!LIST_EMPTY(&priv->flow_hw_tbl_ongo)) {
 		tbl = LIST_FIRST(&priv->flow_hw_tbl_ongo);
 		flow_hw_table_destroy(dev, tbl, NULL);
@@ -7720,45 +8060,30 @@ flow_hw_destroy_ctrl_flow(struct rte_eth_dev *dev, struct rte_flow *flow)
 }
 
 /**
- * Destroys control flows created on behalf of @p owner_dev device.
+ * Destroys control flows created on behalf of @p owner device on @p dev device.
  *
- * @param owner_dev
+ * @param dev
+ *   Pointer to Ethernet device on which control flows were created.
+ * @param owner
  *   Pointer to Ethernet device owning control flows.
  *
  * @return
  *   0 on success, otherwise negative error code is returned and
  *   rte_errno is set.
  */
-int
-mlx5_flow_hw_flush_ctrl_flows(struct rte_eth_dev *owner_dev)
+static int
+flow_hw_flush_ctrl_flows_owned_by(struct rte_eth_dev *dev, struct rte_eth_dev *owner)
 {
-	struct mlx5_priv *owner_priv = owner_dev->data->dev_private;
-	struct rte_eth_dev *proxy_dev;
-	struct mlx5_priv *proxy_priv;
+	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_hw_ctrl_flow *cf;
 	struct mlx5_hw_ctrl_flow *cf_next;
-	uint16_t owner_port_id = owner_dev->data->port_id;
-	uint16_t proxy_port_id = owner_dev->data->port_id;
 	int ret;
 
-	if (owner_priv->sh->config.dv_esw_en) {
-		if (rte_flow_pick_transfer_proxy(owner_port_id, &proxy_port_id, NULL)) {
-			DRV_LOG(ERR, "Unable to find proxy port for port %u",
-				owner_port_id);
-			rte_errno = EINVAL;
-			return -rte_errno;
-		}
-		proxy_dev = &rte_eth_devices[proxy_port_id];
-		proxy_priv = proxy_dev->data->dev_private;
-	} else {
-		proxy_dev = owner_dev;
-		proxy_priv = owner_priv;
-	}
-	cf = LIST_FIRST(&proxy_priv->hw_ctrl_flows);
+	cf = LIST_FIRST(&priv->hw_ctrl_flows);
 	while (cf != NULL) {
 		cf_next = LIST_NEXT(cf, next);
-		if (cf->owner_dev == owner_dev) {
-			ret = flow_hw_destroy_ctrl_flow(proxy_dev, cf->flow);
+		if (cf->owner_dev == owner) {
+			ret = flow_hw_destroy_ctrl_flow(dev, cf->flow);
 			if (ret) {
 				rte_errno = ret;
 				return -ret;
@@ -7771,6 +8096,50 @@ mlx5_flow_hw_flush_ctrl_flows(struct rte_eth_dev *owner_dev)
 	return 0;
 }
 
+/**
+ * Destroys control flows created for @p owner_dev device.
+ *
+ * @param owner_dev
+ *   Pointer to Ethernet device owning control flows.
+ *
+ * @return
+ *   0 on success, otherwise negative error code is returned and
+ *   rte_errno is set.
+ */
+int
+mlx5_flow_hw_flush_ctrl_flows(struct rte_eth_dev *owner_dev)
+{
+	struct mlx5_priv *owner_priv = owner_dev->data->dev_private;
+	struct rte_eth_dev *proxy_dev;
+	uint16_t owner_port_id = owner_dev->data->port_id;
+	uint16_t proxy_port_id = owner_dev->data->port_id;
+	int ret;
+
+	/* Flush all flows created by this port for itself. */
+	ret = flow_hw_flush_ctrl_flows_owned_by(owner_dev, owner_dev);
+	if (ret)
+		return ret;
+	/* Flush all flows created for this port on proxy port. */
+	if (owner_priv->sh->config.dv_esw_en) {
+		ret = rte_flow_pick_transfer_proxy(owner_port_id, &proxy_port_id, NULL);
+		if (ret == -ENODEV) {
+			DRV_LOG(DEBUG, "Unable to find transfer proxy port for port %u. It was "
+				       "probably closed. Control flows were cleared.",
+				       owner_port_id);
+			rte_errno = 0;
+			return 0;
+		} else if (ret) {
+			DRV_LOG(ERR, "Unable to find proxy port for port %u (ret = %d)",
+				owner_port_id, ret);
+			return ret;
+		}
+		proxy_dev = &rte_eth_devices[proxy_port_id];
+	} else {
+		proxy_dev = owner_dev;
+	}
+	return flow_hw_flush_ctrl_flows_owned_by(proxy_dev, owner_dev);
+}
+
 /**
  * Destroys all control flows created on @p dev device.
  *
@@ -8022,6 +8391,9 @@ mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev)
 			.conf = &mreg_action,
 		},
 		[1] = {
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+		},
+		[2] = {
 			.type = RTE_FLOW_ACTION_TYPE_END,
 		},
 	};
@@ -8034,6 +8406,60 @@ mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev)
 					eth_all, 0, copy_reg_action, 0);
 }
 
+int
+mlx5_flow_hw_tx_repr_matching_flow(struct rte_eth_dev *dev, uint32_t sqn)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rte_flow_item_sq sq_spec = {
+		.queue = sqn,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_SQ,
+			.spec = &sq_spec,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	/*
+	 * Allocate actions array suitable for all cases - extended metadata enabled or not.
+	 * With extended metadata there will be an additional MODIFY_FIELD action before JUMP.
+	 */
+	struct rte_flow_action actions[] = {
+		{ .type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD },
+		{ .type = RTE_FLOW_ACTION_TYPE_JUMP },
+		{ .type = RTE_FLOW_ACTION_TYPE_END },
+		{ .type = RTE_FLOW_ACTION_TYPE_END },
+	};
+
+	/* It is assumed that caller checked for representor matching. */
+	MLX5_ASSERT(priv->sh->config.repr_matching);
+	if (!priv->dr_ctx) {
+		DRV_LOG(DEBUG, "Port %u must be configured for HWS before creating "
+			       "default egress flow rules. Omitting creation.",
+			       dev->data->port_id);
+		return 0;
+	}
+	if (!priv->hw_tx_repr_tagging_tbl) {
+		DRV_LOG(ERR, "Port %u is configured for HWS, but table for default "
+			     "egress flow rules does not exist.",
+			     dev->data->port_id);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	/*
+	 * If extended metadata mode is enabled, then an additional MODIFY_FIELD action must be
+	 * placed before terminating JUMP action.
+	 */
+	if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS) {
+		actions[1].type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD;
+		actions[2].type = RTE_FLOW_ACTION_TYPE_JUMP;
+	}
+	return flow_hw_create_ctrl_flow(dev, dev, priv->hw_tx_repr_tagging_tbl,
+					items, 0, actions, 0);
+}
+
 void
 mlx5_flow_meter_uninit(struct rte_eth_dev *dev)
 {
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 715f2891cf..8c9d5c1b13 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -1065,6 +1065,69 @@ mlx5_hairpin_get_peer_ports(struct rte_eth_dev *dev, uint16_t *peer_ports,
 	return ret;
 }
 
+#ifdef HAVE_MLX5_HWS_SUPPORT
+
+/**
+ * Check if starting representor port is allowed.
+ *
+ * If transfer proxy port is configured for HWS, then starting representor port
+ * is allowed if and only if transfer proxy port is started as well.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   If starting representor port is allowed, then 0 is returned.
+ *   Otherwise rte_errno is set, and negative errno value is returned.
+ */
+static int
+mlx5_hw_representor_port_allowed_start(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_eth_dev *proxy_dev;
+	struct mlx5_priv *proxy_priv;
+	uint16_t proxy_port_id = UINT16_MAX;
+	int ret;
+
+	MLX5_ASSERT(priv->sh->config.dv_flow_en == 2);
+	MLX5_ASSERT(priv->sh->config.dv_esw_en);
+	MLX5_ASSERT(priv->representor);
+	ret = rte_flow_pick_transfer_proxy(dev->data->port_id, &proxy_port_id, NULL);
+	if (ret) {
+		if (ret == -ENODEV)
+			DRV_LOG(ERR, "Starting representor port %u is not allowed. Transfer "
+				     "proxy port is not available.", dev->data->port_id);
+		else
+			DRV_LOG(ERR, "Failed to pick transfer proxy for port %u (ret = %d)",
+				dev->data->port_id, ret);
+		return ret;
+	}
+	proxy_dev = &rte_eth_devices[proxy_port_id];
+	proxy_priv = proxy_dev->data->dev_private;
+	if (proxy_priv->dr_ctx == NULL) {
+		DRV_LOG(DEBUG, "Starting representor port %u is allowed, but default traffic flows"
+			       " will not be created. Transfer proxy port must be configured"
+			       " for HWS and started.",
+			       dev->data->port_id);
+		return 0;
+	}
+	if (!proxy_dev->data->dev_started) {
+		DRV_LOG(ERR, "Failed to start port %u: transfer proxy (port %u) must be started",
+			     dev->data->port_id, proxy_port_id);
+		rte_errno = EAGAIN;
+		return -rte_errno;
+	}
+	if (priv->sh->config.repr_matching && !priv->dr_ctx) {
+		DRV_LOG(ERR, "Failed to start port %u: with representor matching enabled, port "
+			     "must be configured for HWS", dev->data->port_id);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	return 0;
+}
+
+#endif
+
 /**
  * DPDK callback to start the device.
  *
@@ -1084,6 +1147,19 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 	int fine_inline;
 
 	DRV_LOG(DEBUG, "port %u starting device", dev->data->port_id);
+#ifdef HAVE_MLX5_HWS_SUPPORT
+	if (priv->sh->config.dv_flow_en == 2) {
+		/* If there is no E-Switch, then there are no start/stop order limitations. */
+		if (!priv->sh->config.dv_esw_en)
+			goto continue_dev_start;
+		/* If master is being started, then it is always allowed. */
+		if (priv->master)
+			goto continue_dev_start;
+		if (mlx5_hw_representor_port_allowed_start(dev))
+			return -rte_errno;
+	}
+continue_dev_start:
+#endif
 	fine_inline = rte_mbuf_dynflag_lookup
 		(RTE_PMD_MLX5_FINE_GRANULARITY_INLINE, NULL);
 	if (fine_inline >= 0)
@@ -1248,6 +1324,53 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 	return -rte_errno;
 }
 
+#ifdef HAVE_MLX5_HWS_SUPPORT
+/**
+ * Check if stopping transfer proxy port is allowed.
+ *
+ * If transfer proxy port is configured for HWS, then it is allowed to stop it
+ * if and only if all other representor ports are stopped.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   If stopping transfer proxy port is allowed, then 0 is returned.
+ *   Otherwise rte_errno is set, and negative errno value is returned.
+ */
+static int
+mlx5_hw_proxy_port_allowed_stop(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	bool representor_started = false;
+	uint16_t port_id;
+
+	MLX5_ASSERT(priv->sh->config.dv_flow_en == 2);
+	MLX5_ASSERT(priv->sh->config.dv_esw_en);
+	MLX5_ASSERT(priv->master);
+	/* If transfer proxy port was not configured for HWS, then stopping it is allowed. */
+	if (!priv->dr_ctx)
+		return 0;
+	MLX5_ETH_FOREACH_DEV(port_id, dev->device) {
+		const struct rte_eth_dev *port_dev = &rte_eth_devices[port_id];
+		const struct mlx5_priv *port_priv = port_dev->data->dev_private;
+
+		if (port_id != dev->data->port_id &&
+		    port_priv->domain_id == priv->domain_id &&
+		    port_dev->data->dev_started)
+			representor_started = true;
+	}
+	if (representor_started) {
+		DRV_LOG(INFO, "Failed to stop port %u: attached representor ports"
+			      " must be stopped before stopping transfer proxy port",
+			      dev->data->port_id);
+		rte_errno = EBUSY;
+		return -rte_errno;
+	}
+	return 0;
+}
+#endif
+
 /**
  * DPDK callback to stop the device.
  *
@@ -1261,6 +1384,21 @@ mlx5_dev_stop(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 
+#ifdef HAVE_MLX5_HWS_SUPPORT
+	if (priv->sh->config.dv_flow_en == 2) {
+		/* If there is no E-Switch, then there are no start/stop order limitations. */
+		if (!priv->sh->config.dv_esw_en)
+			goto continue_dev_stop;
+		/* If representor is being stopped, then it is always allowed. */
+		if (priv->representor)
+			goto continue_dev_stop;
+		if (mlx5_hw_proxy_port_allowed_stop(dev)) {
+			dev->data->dev_started = 1;
+			return -rte_errno;
+		}
+	}
+continue_dev_stop:
+#endif
 	dev->data->dev_started = 0;
 	/* Prevent crashes when queues are still in use. */
 	dev->rx_pkt_burst = rte_eth_pkt_burst_dummy;
@@ -1296,13 +1434,21 @@ static int
 mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_sh_config *config = &priv->sh->config;
 	unsigned int i;
 	int ret;
 
-	if (priv->sh->config.dv_esw_en && priv->master) {
-		if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS)
-			if (mlx5_flow_hw_create_tx_default_mreg_copy_flow(dev))
-				goto error;
+	/*
+	 * With extended metadata enabled, the Tx metadata copy is handled by default
+	 * Tx tagging flow rules, so default Tx flow rule is not needed. It is only
+	 * required when representor matching is disabled.
+	 */
+	if (config->dv_esw_en &&
+	    !config->repr_matching &&
+	    config->dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
+	    priv->master) {
+		if (mlx5_flow_hw_create_tx_default_mreg_copy_flow(dev))
+			goto error;
 	}
 	for (i = 0; i < priv->txqs_n; ++i) {
 		struct mlx5_txq_ctrl *txq = mlx5_txq_get(dev, i);
@@ -1311,17 +1457,22 @@ mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
 		if (!txq)
 			continue;
 		queue = mlx5_txq_get_sqn(txq);
-		if ((priv->representor || priv->master) &&
-		    priv->sh->config.dv_esw_en) {
+		if ((priv->representor || priv->master) && config->dv_esw_en) {
 			if (mlx5_flow_hw_esw_create_sq_miss_flow(dev, queue)) {
 				mlx5_txq_release(dev, i);
 				goto error;
 			}
 		}
+		if (config->dv_esw_en && config->repr_matching) {
+			if (mlx5_flow_hw_tx_repr_matching_flow(dev, queue)) {
+				mlx5_txq_release(dev, i);
+				goto error;
+			}
+		}
 		mlx5_txq_release(dev, i);
 	}
-	if (priv->sh->config.fdb_def_rule) {
-		if ((priv->master || priv->representor) && priv->sh->config.dv_esw_en) {
+	if (config->fdb_def_rule) {
+		if ((priv->master || priv->representor) && config->dv_esw_en) {
 			if (!mlx5_flow_hw_esw_create_default_jump_flow(dev))
 				priv->fdb_def_rule = 1;
 			else
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v5 18/18] net/mlx5: create control flow rules with HWS
  2022-10-20  3:21 ` [PATCH v5 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (16 preceding siblings ...)
  2022-10-20  3:22   ` [PATCH v5 17/18] net/mlx5: support device control of representor matching Suanming Mou
@ 2022-10-20  3:22   ` Suanming Mou
  17 siblings, 0 replies; 140+ messages in thread
From: Suanming Mou @ 2022-10-20  3:22 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Dariusz Sosnowski

From: Dariusz Sosnowski <dsosnowski@nvidia.com>

This patch adds creation of control flow rules required to receive
default traffic (based on port configuration) with HWS.

Control flow rules are created on port start and destroyed on port stop.
Handling of destroying these rules was already implemented before this
patch.

Control flow rules are created if and only if flow isolation mode is
disabled, and the creation process goes as follows (a simplified sketch
follows the list below):

- Port configuration is collected into a set of flags. Each flag
  corresponds to a certain Ethernet pattern type, defined by
  mlx5_flow_ctrl_rx_eth_pattern_type enumeration. There is a separate
  flag for VLAN filtering.
- For each possible Ethernet pattern type:
  - For each possible RSS action configuration:
    - If configuration flags do not match this combination, it is
      omitted.
    - A template table is created using this combination of pattern
      and actions template (templates are fetched from hw_ctrl_rx
      struct stored in port's private data).
    - Flow rules are created in this table.
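
A simplified sketch of that creation loop (illustration only, NOT the
actual driver code; pattern_is_requested(), rss_is_requested(),
get_or_create_table() and create_rules_in_table() are hypothetical
placeholders for the flow_hw_* functions added by this patch):

/* Sketch only: all helpers below are hypothetical placeholders. */
static int
ctrl_rx_create_sketch(uint16_t port_id, uint32_t flags)
{
	unsigned int i, j;

	/* Outer loop: Ethernet pattern types (promisc, bcast, DMAC, ...). */
	for (i = 0; i < MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_MAX; ++i) {
		if (!pattern_is_requested(i, flags))
			continue; /* Port configuration does not ask for it. */
		/* Inner loop: expanded RSS configurations (non-IP, IPv4, ...). */
		for (j = 0; j < MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_MAX; ++j) {
			if (!rss_is_requested(port_id, j))
				continue; /* Hash type not enabled on the port. */
			/* Templates are fetched from hw_ctrl_rx in port private data. */
			if (get_or_create_table(port_id, i, j))
				return -1;
			if (create_rules_in_table(port_id, i, j))
				return -1;
		}
	}
	return 0;
}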

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 doc/guides/rel_notes/release_22_11.rst |   1 +
 drivers/net/mlx5/mlx5.h                |   4 +
 drivers/net/mlx5/mlx5_flow.h           |  56 ++
 drivers/net/mlx5/mlx5_flow_hw.c        | 799 +++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_rxq.c            |   3 +-
 drivers/net/mlx5/mlx5_trigger.c        |  20 +-
 6 files changed, 881 insertions(+), 2 deletions(-)

diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 1ec218a5d1..8e3412a7ff 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -245,6 +245,7 @@ New Features
     - Support of meter.
     - Support of counter.
     - Support of CT.
+    - Support of control flow and isolate mode.
 
 * **Rewritten pmdinfo script.**
 
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 87c90d58d7..911bb43344 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1636,6 +1636,8 @@ struct mlx5_hw_ctrl_flow {
 	struct rte_flow *flow;
 };
 
+struct mlx5_flow_hw_ctrl_rx;
+
 struct mlx5_priv {
 	struct rte_eth_dev_data *dev_data;  /* Pointer to device data. */
 	struct mlx5_dev_ctx_shared *sh; /* Shared device context. */
@@ -1767,6 +1769,8 @@ struct mlx5_priv {
 	/* Management data for ASO connection tracking. */
 	struct mlx5_aso_ct_pool *hws_ctpool; /* HW steering's CT pool. */
 	struct mlx5_aso_mtr_pool *hws_mpool; /* HW steering's Meter pool. */
+	struct mlx5_flow_hw_ctrl_rx *hw_ctrl_rx;
+	/**< HW steering templates used to create control flow rules. */
 #endif
 };
 
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index edf45b814d..e9e4537700 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -2113,6 +2113,62 @@ rte_col_2_mlx5_col(enum rte_color rcol)
 	return MLX5_FLOW_COLOR_UNDEFINED;
 }
 
+/* All types of Ethernet patterns used in control flow rules. */
+enum mlx5_flow_ctrl_rx_eth_pattern_type {
+	MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_ALL = 0,
+	MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_ALL_MCAST,
+	MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST,
+	MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST_VLAN,
+	MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST,
+	MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST_VLAN,
+	MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST,
+	MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST_VLAN,
+	MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_DMAC,
+	MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_DMAC_VLAN,
+	MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_MAX,
+};
+
+/* All types of RSS actions used in control flow rules. */
+enum mlx5_flow_ctrl_rx_expanded_rss_type {
+	MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_NON_IP = 0,
+	MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4,
+	MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4_UDP,
+	MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4_TCP,
+	MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6,
+	MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6_UDP,
+	MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6_TCP,
+	MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_MAX,
+};
+
+/**
+ * Contains pattern template, template table and its attributes for a single
+ * combination of Ethernet pattern and RSS action. Used to create control flow rules
+ * with HWS.
+ */
+struct mlx5_flow_hw_ctrl_rx_table {
+	struct rte_flow_template_table_attr attr;
+	struct rte_flow_pattern_template *pt;
+	struct rte_flow_template_table *tbl;
+};
+
+/* Contains all templates required to create control flow rules with HWS. */
+struct mlx5_flow_hw_ctrl_rx {
+	struct rte_flow_actions_template *rss[MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_MAX];
+	struct mlx5_flow_hw_ctrl_rx_table tables[MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_MAX]
+						[MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_MAX];
+};
+
+#define MLX5_CTRL_PROMISCUOUS    (RTE_BIT32(0))
+#define MLX5_CTRL_ALL_MULTICAST  (RTE_BIT32(1))
+#define MLX5_CTRL_BROADCAST      (RTE_BIT32(2))
+#define MLX5_CTRL_IPV4_MULTICAST (RTE_BIT32(3))
+#define MLX5_CTRL_IPV6_MULTICAST (RTE_BIT32(4))
+#define MLX5_CTRL_DMAC           (RTE_BIT32(5))
+#define MLX5_CTRL_VLAN_FILTER    (RTE_BIT32(6))
+
+int mlx5_flow_hw_ctrl_flows(struct rte_eth_dev *dev, uint32_t flags);
+void mlx5_flow_hw_cleanup_ctrl_rx_templates(struct rte_eth_dev *dev);
+
 int mlx5_flow_group_to_table(struct rte_eth_dev *dev,
 			     const struct mlx5_flow_tunnel *tunnel,
 			     uint32_t group, uint32_t *table,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 49186c4339..84c891cab6 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -47,6 +47,11 @@
 /* Lowest priority for HW non-root table. */
 #define MLX5_HW_LOWEST_PRIO_NON_ROOT (UINT32_MAX)
 
+/* Priorities for Rx control flow rules. */
+#define MLX5_HW_CTRL_RX_PRIO_L2 (MLX5_HW_LOWEST_PRIO_ROOT)
+#define MLX5_HW_CTRL_RX_PRIO_L3 (MLX5_HW_LOWEST_PRIO_ROOT - 1)
+#define MLX5_HW_CTRL_RX_PRIO_L4 (MLX5_HW_LOWEST_PRIO_ROOT - 2)
+
 #define MLX5_HW_VLAN_PUSH_TYPE_IDX 0
 #define MLX5_HW_VLAN_PUSH_VID_IDX 1
 #define MLX5_HW_VLAN_PUSH_PCP_IDX 2
@@ -84,6 +89,72 @@ static uint32_t mlx5_hw_act_flag[MLX5_HW_ACTION_FLAG_MAX]
 	},
 };
 
+/* Ethernet item spec for promiscuous mode. */
+static const struct rte_flow_item_eth ctrl_rx_eth_promisc_spec = {
+	.dst.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	.type = 0,
+};
+/* Ethernet item mask for promiscuous mode. */
+static const struct rte_flow_item_eth ctrl_rx_eth_promisc_mask = {
+	.dst.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	.type = 0,
+};
+
+/* Ethernet item spec for all multicast mode. */
+static const struct rte_flow_item_eth ctrl_rx_eth_mcast_spec = {
+	.dst.addr_bytes = "\x01\x00\x00\x00\x00\x00",
+	.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	.type = 0,
+};
+/* Ethernet item mask for all multicast mode. */
+static const struct rte_flow_item_eth ctrl_rx_eth_mcast_mask = {
+	.dst.addr_bytes = "\x01\x00\x00\x00\x00\x00",
+	.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	.type = 0,
+};
+
+/* Ethernet item spec for IPv4 multicast traffic. */
+static const struct rte_flow_item_eth ctrl_rx_eth_ipv4_mcast_spec = {
+	.dst.addr_bytes = "\x01\x00\x5e\x00\x00\x00",
+	.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	.type = 0,
+};
+/* Ethernet item mask for IPv4 multicast traffic. */
+static const struct rte_flow_item_eth ctrl_rx_eth_ipv4_mcast_mask = {
+	.dst.addr_bytes = "\xff\xff\xff\x00\x00\x00",
+	.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	.type = 0,
+};
+
+/* Ethernet item spec for IPv6 multicast traffic. */
+static const struct rte_flow_item_eth ctrl_rx_eth_ipv6_mcast_spec = {
+	.dst.addr_bytes = "\x33\x33\x00\x00\x00\x00",
+	.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	.type = 0,
+};
+/* Ethernet item mask for IPv6 multicast traffic. */
+static const struct rte_flow_item_eth ctrl_rx_eth_ipv6_mcast_mask = {
+	.dst.addr_bytes = "\xff\xff\x00\x00\x00\x00",
+	.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	.type = 0,
+};
+
+/* Ethernet item mask for unicast traffic. */
+static const struct rte_flow_item_eth ctrl_rx_eth_dmac_mask = {
+	.dst.addr_bytes = "\xff\xff\xff\xff\xff\xff",
+	.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	.type = 0,
+};
+
+/* Ethernet item spec for broadcast. */
+static const struct rte_flow_item_eth ctrl_rx_eth_bcast_spec = {
+	.dst.addr_bytes = "\xff\xff\xff\xff\xff\xff",
+	.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	.type = 0,
+};
+
 /**
  * Set rxq flag.
  *
@@ -6346,6 +6417,365 @@ flow_hw_create_vlan(struct rte_eth_dev *dev)
 	return 0;
 }
 
+static void
+flow_hw_cleanup_ctrl_rx_tables(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	unsigned int i;
+	unsigned int j;
+
+	if (!priv->hw_ctrl_rx)
+		return;
+	for (i = 0; i < MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_MAX; ++i) {
+		for (j = 0; j < MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_MAX; ++j) {
+			struct rte_flow_template_table *tbl = priv->hw_ctrl_rx->tables[i][j].tbl;
+			struct rte_flow_pattern_template *pt = priv->hw_ctrl_rx->tables[i][j].pt;
+
+			if (tbl)
+				claim_zero(flow_hw_table_destroy(dev, tbl, NULL));
+			if (pt)
+				claim_zero(flow_hw_pattern_template_destroy(dev, pt, NULL));
+		}
+	}
+	for (i = 0; i < MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_MAX; ++i) {
+		struct rte_flow_actions_template *at = priv->hw_ctrl_rx->rss[i];
+
+		if (at)
+			claim_zero(flow_hw_actions_template_destroy(dev, at, NULL));
+	}
+	mlx5_free(priv->hw_ctrl_rx);
+	priv->hw_ctrl_rx = NULL;
+}
+
+static uint64_t
+flow_hw_ctrl_rx_rss_type_hash_types(const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type)
+{
+	switch (rss_type) {
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_NON_IP:
+		return 0;
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4:
+		return RTE_ETH_RSS_IPV4 | RTE_ETH_RSS_FRAG_IPV4 | RTE_ETH_RSS_NONFRAG_IPV4_OTHER;
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4_UDP:
+		return RTE_ETH_RSS_NONFRAG_IPV4_UDP;
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4_TCP:
+		return RTE_ETH_RSS_NONFRAG_IPV4_TCP;
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6:
+		return RTE_ETH_RSS_IPV6 | RTE_ETH_RSS_FRAG_IPV6 | RTE_ETH_RSS_NONFRAG_IPV6_OTHER;
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6_UDP:
+		return RTE_ETH_RSS_NONFRAG_IPV6_UDP | RTE_ETH_RSS_IPV6_UDP_EX;
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6_TCP:
+		return RTE_ETH_RSS_NONFRAG_IPV6_TCP | RTE_ETH_RSS_IPV6_TCP_EX;
+	default:
+		/* Should not reach here. */
+		MLX5_ASSERT(false);
+		return 0;
+	}
+}
+
+static struct rte_flow_actions_template *
+flow_hw_create_ctrl_rx_rss_template(struct rte_eth_dev *dev,
+				    const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_flow_actions_template_attr attr = {
+		.ingress = 1,
+	};
+	uint16_t queue[RTE_MAX_QUEUES_PER_PORT];
+	struct rte_flow_action_rss rss_conf = {
+		.func = RTE_ETH_HASH_FUNCTION_DEFAULT,
+		.level = 0,
+		.types = 0,
+		.key_len = priv->rss_conf.rss_key_len,
+		.key = priv->rss_conf.rss_key,
+		.queue_num = priv->reta_idx_n,
+		.queue = queue,
+	};
+	struct rte_flow_action actions[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_RSS,
+			.conf = &rss_conf,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+	struct rte_flow_action masks[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_RSS,
+			.conf = &rss_conf,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+	struct rte_flow_actions_template *at;
+	struct rte_flow_error error;
+	unsigned int i;
+
+	MLX5_ASSERT(priv->reta_idx_n > 0 && priv->reta_idx);
+	/* Select proper RSS hash types and based on that configure the actions template. */
+	rss_conf.types = flow_hw_ctrl_rx_rss_type_hash_types(rss_type);
+	if (rss_conf.types) {
+		for (i = 0; i < priv->reta_idx_n; ++i)
+			queue[i] = (*priv->reta_idx)[i];
+	} else {
+		rss_conf.queue_num = 1;
+		queue[0] = (*priv->reta_idx)[0];
+	}
+	at = flow_hw_actions_template_create(dev, &attr, actions, masks, &error);
+	if (!at)
+		DRV_LOG(ERR,
+			"Failed to create ctrl flow actions template: rte_errno(%d), type(%d): %s",
+			rte_errno, error.type,
+			error.message ? error.message : "(no stated reason)");
+	return at;
+}
+
+static uint32_t ctrl_rx_rss_priority_map[MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_MAX] = {
+	[MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_NON_IP] = MLX5_HW_CTRL_RX_PRIO_L2,
+	[MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4] = MLX5_HW_CTRL_RX_PRIO_L3,
+	[MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4_UDP] = MLX5_HW_CTRL_RX_PRIO_L4,
+	[MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4_TCP] = MLX5_HW_CTRL_RX_PRIO_L4,
+	[MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6] = MLX5_HW_CTRL_RX_PRIO_L3,
+	[MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6_UDP] = MLX5_HW_CTRL_RX_PRIO_L4,
+	[MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6_TCP] = MLX5_HW_CTRL_RX_PRIO_L4,
+};
+
+static uint32_t ctrl_rx_nb_flows_map[MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_MAX] = {
+	[MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_ALL] = 1,
+	[MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_ALL_MCAST] = 1,
+	[MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST] = 1,
+	[MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST_VLAN] = MLX5_MAX_VLAN_IDS,
+	[MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST] = 1,
+	[MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST_VLAN] = MLX5_MAX_VLAN_IDS,
+	[MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST] = 1,
+	[MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST_VLAN] = MLX5_MAX_VLAN_IDS,
+	[MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_DMAC] = MLX5_MAX_UC_MAC_ADDRESSES,
+	[MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_DMAC_VLAN] =
+			MLX5_MAX_UC_MAC_ADDRESSES * MLX5_MAX_VLAN_IDS,
+};
+
+static struct rte_flow_template_table_attr
+flow_hw_get_ctrl_rx_table_attr(enum mlx5_flow_ctrl_rx_eth_pattern_type eth_pattern_type,
+			       const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type)
+{
+	return (struct rte_flow_template_table_attr){
+		.flow_attr = {
+			.group = 0,
+			.priority = ctrl_rx_rss_priority_map[rss_type],
+			.ingress = 1,
+		},
+		.nb_flows = ctrl_rx_nb_flows_map[eth_pattern_type],
+	};
+}
+
+static struct rte_flow_item
+flow_hw_get_ctrl_rx_eth_item(const enum mlx5_flow_ctrl_rx_eth_pattern_type eth_pattern_type)
+{
+	struct rte_flow_item item = {
+		.type = RTE_FLOW_ITEM_TYPE_ETH,
+		.mask = NULL,
+	};
+
+	switch (eth_pattern_type) {
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_ALL:
+		item.mask = &ctrl_rx_eth_promisc_mask;
+		break;
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_ALL_MCAST:
+		item.mask = &ctrl_rx_eth_mcast_mask;
+		break;
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST_VLAN:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_DMAC:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_DMAC_VLAN:
+		item.mask = &ctrl_rx_eth_dmac_mask;
+		break;
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST_VLAN:
+		item.mask = &ctrl_rx_eth_ipv4_mcast_mask;
+		break;
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST_VLAN:
+		item.mask = &ctrl_rx_eth_ipv6_mcast_mask;
+		break;
+	default:
+		/* Should not reach here - ETH mask must be present. */
+		item.type = RTE_FLOW_ITEM_TYPE_END;
+		MLX5_ASSERT(false);
+		break;
+	}
+	return item;
+}
+
+static struct rte_flow_item
+flow_hw_get_ctrl_rx_vlan_item(const enum mlx5_flow_ctrl_rx_eth_pattern_type eth_pattern_type)
+{
+	struct rte_flow_item item = {
+		.type = RTE_FLOW_ITEM_TYPE_VOID,
+		.mask = NULL,
+	};
+
+	switch (eth_pattern_type) {
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST_VLAN:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST_VLAN:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST_VLAN:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_DMAC_VLAN:
+		item.type = RTE_FLOW_ITEM_TYPE_VLAN;
+		item.mask = &rte_flow_item_vlan_mask;
+		break;
+	default:
+		/* Nothing to update. */
+		break;
+	}
+	return item;
+}
+
+static struct rte_flow_item
+flow_hw_get_ctrl_rx_l3_item(const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type)
+{
+	struct rte_flow_item item = {
+		.type = RTE_FLOW_ITEM_TYPE_VOID,
+		.mask = NULL,
+	};
+
+	switch (rss_type) {
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4:
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4_UDP:
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4_TCP:
+		item.type = RTE_FLOW_ITEM_TYPE_IPV4;
+		break;
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6:
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6_UDP:
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6_TCP:
+		item.type = RTE_FLOW_ITEM_TYPE_IPV6;
+		break;
+	default:
+		/* Nothing to update. */
+		break;
+	}
+	return item;
+}
+
+static struct rte_flow_item
+flow_hw_get_ctrl_rx_l4_item(const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type)
+{
+	struct rte_flow_item item = {
+		.type = RTE_FLOW_ITEM_TYPE_VOID,
+		.mask = NULL,
+	};
+
+	switch (rss_type) {
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4_UDP:
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6_UDP:
+		item.type = RTE_FLOW_ITEM_TYPE_UDP;
+		break;
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4_TCP:
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6_TCP:
+		item.type = RTE_FLOW_ITEM_TYPE_TCP;
+		break;
+	default:
+		/* Nothing to update. */
+		break;
+	}
+	return item;
+}
+
+static struct rte_flow_pattern_template *
+flow_hw_create_ctrl_rx_pattern_template
+		(struct rte_eth_dev *dev,
+		 const enum mlx5_flow_ctrl_rx_eth_pattern_type eth_pattern_type,
+		 const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type)
+{
+	const struct rte_flow_pattern_template_attr attr = {
+		.relaxed_matching = 0,
+		.ingress = 1,
+	};
+	struct rte_flow_item items[] = {
+		/* Matching patterns */
+		flow_hw_get_ctrl_rx_eth_item(eth_pattern_type),
+		flow_hw_get_ctrl_rx_vlan_item(eth_pattern_type),
+		flow_hw_get_ctrl_rx_l3_item(rss_type),
+		flow_hw_get_ctrl_rx_l4_item(rss_type),
+		/* Terminate pattern */
+		{ .type = RTE_FLOW_ITEM_TYPE_END }
+	};
+
+	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
+}
+
+static int
+flow_hw_create_ctrl_rx_tables(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	unsigned int i;
+	unsigned int j;
+	int ret;
+
+	MLX5_ASSERT(!priv->hw_ctrl_rx);
+	priv->hw_ctrl_rx = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*priv->hw_ctrl_rx),
+				       RTE_CACHE_LINE_SIZE, rte_socket_id());
+	if (!priv->hw_ctrl_rx) {
+		DRV_LOG(ERR, "Failed to allocate memory for Rx control flow tables");
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	/* Create all pattern template variants. */
+	for (i = 0; i < MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_MAX; ++i) {
+		enum mlx5_flow_ctrl_rx_eth_pattern_type eth_pattern_type = i;
+
+		for (j = 0; j < MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_MAX; ++j) {
+			const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type = j;
+			struct rte_flow_template_table_attr attr;
+			struct rte_flow_pattern_template *pt;
+
+			attr = flow_hw_get_ctrl_rx_table_attr(eth_pattern_type, rss_type);
+			pt = flow_hw_create_ctrl_rx_pattern_template(dev, eth_pattern_type,
+								     rss_type);
+			if (!pt)
+				goto err;
+			priv->hw_ctrl_rx->tables[i][j].attr = attr;
+			priv->hw_ctrl_rx->tables[i][j].pt = pt;
+		}
+	}
+	return 0;
+err:
+	ret = rte_errno;
+	flow_hw_cleanup_ctrl_rx_tables(dev);
+	rte_errno = ret;
+	return -ret;
+}
+
+void
+mlx5_flow_hw_cleanup_ctrl_rx_templates(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_hw_ctrl_rx *hw_ctrl_rx;
+	unsigned int i;
+	unsigned int j;
+
+	if (!priv->dr_ctx)
+		return;
+	if (!priv->hw_ctrl_rx)
+		return;
+	hw_ctrl_rx = priv->hw_ctrl_rx;
+	for (i = 0; i < MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_MAX; ++i) {
+		for (j = 0; j < MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_MAX; ++j) {
+			struct mlx5_flow_hw_ctrl_rx_table *tmpls = &hw_ctrl_rx->tables[i][j];
+
+			if (tmpls->tbl) {
+				claim_zero(flow_hw_table_destroy(dev, tmpls->tbl, NULL));
+				tmpls->tbl = NULL;
+			}
+		}
+	}
+	for (j = 0; j < MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_MAX; ++j) {
+		if (hw_ctrl_rx->rss[j]) {
+			claim_zero(flow_hw_actions_template_destroy(dev, hw_ctrl_rx->rss[j], NULL));
+			hw_ctrl_rx->rss[j] = NULL;
+		}
+	}
+}
+
 /**
  * Configure port HWS resources.
  *
@@ -6512,6 +6942,12 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	priv->nb_queue = nb_q_updated;
 	rte_spinlock_init(&priv->hw_ctrl_lock);
 	LIST_INIT(&priv->hw_ctrl_flows);
+	ret = flow_hw_create_ctrl_rx_tables(dev);
+	if (ret) {
+		rte_flow_error_set(error, -ret, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				   "Failed to set up Rx control flow templates");
+		goto err;
+	}
 	/* Initialize meter library*/
 	if (port_attr->nb_meters)
 		if (mlx5_flow_meter_init(dev, port_attr->nb_meters, 1, 1, nb_q_updated))
@@ -6665,6 +7101,7 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 	flow_hw_rxq_flag_set(dev, false);
 	flow_hw_flush_all_ctrl_flows(dev);
 	flow_hw_cleanup_tx_repr_tagging(dev);
+	flow_hw_cleanup_ctrl_rx_tables(dev);
 	while (!LIST_EMPTY(&priv->flow_hw_tbl_ongo)) {
 		tbl = LIST_FIRST(&priv->flow_hw_tbl_ongo);
 		flow_hw_table_destroy(dev, tbl, NULL);
@@ -8460,6 +8897,368 @@ mlx5_flow_hw_tx_repr_matching_flow(struct rte_eth_dev *dev, uint32_t sqn)
 					items, 0, actions, 0);
 }
 
+static uint32_t
+__calc_pattern_flags(const enum mlx5_flow_ctrl_rx_eth_pattern_type eth_pattern_type)
+{
+	switch (eth_pattern_type) {
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_ALL:
+		return MLX5_CTRL_PROMISCUOUS;
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_ALL_MCAST:
+		return MLX5_CTRL_ALL_MULTICAST;
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST_VLAN:
+		return MLX5_CTRL_BROADCAST;
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST_VLAN:
+		return MLX5_CTRL_IPV4_MULTICAST;
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST_VLAN:
+		return MLX5_CTRL_IPV6_MULTICAST;
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_DMAC:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_DMAC_VLAN:
+		return MLX5_CTRL_DMAC;
+	default:
+		/* Should not reach here. */
+		MLX5_ASSERT(false);
+		return 0;
+	}
+}
+
+static uint32_t
+__calc_vlan_flags(const enum mlx5_flow_ctrl_rx_eth_pattern_type eth_pattern_type)
+{
+	switch (eth_pattern_type) {
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST_VLAN:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST_VLAN:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST_VLAN:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_DMAC_VLAN:
+		return MLX5_CTRL_VLAN_FILTER;
+	default:
+		return 0;
+	}
+}
+
+static bool
+eth_pattern_type_is_requested(const enum mlx5_flow_ctrl_rx_eth_pattern_type eth_pattern_type,
+			      uint32_t flags)
+{
+	uint32_t pattern_flags = __calc_pattern_flags(eth_pattern_type);
+	uint32_t vlan_flags = __calc_vlan_flags(eth_pattern_type);
+	bool pattern_requested = !!(pattern_flags & flags);
+	bool consider_vlan = vlan_flags || (MLX5_CTRL_VLAN_FILTER & flags);
+	bool vlan_requested = !!(vlan_flags & flags);
+
+	if (consider_vlan)
+		return pattern_requested && vlan_requested;
+	else
+		return pattern_requested;
+}
+
+static bool
+rss_type_is_requested(struct mlx5_priv *priv,
+		      const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type)
+{
+	struct rte_flow_actions_template *at = priv->hw_ctrl_rx->rss[rss_type];
+	unsigned int i;
+
+	for (i = 0; at->actions[i].type != RTE_FLOW_ACTION_TYPE_END; ++i) {
+		if (at->actions[i].type == RTE_FLOW_ACTION_TYPE_RSS) {
+			const struct rte_flow_action_rss *rss = at->actions[i].conf;
+			uint64_t rss_types = rss->types;
+
+			if ((rss_types & priv->rss_conf.rss_hf) != rss_types)
+				return false;
+		}
+	}
+	return true;
+}
+
+static const struct rte_flow_item_eth *
+__get_eth_spec(const enum mlx5_flow_ctrl_rx_eth_pattern_type pattern)
+{
+	switch (pattern) {
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_ALL:
+		return &ctrl_rx_eth_promisc_spec;
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_ALL_MCAST:
+		return &ctrl_rx_eth_mcast_spec;
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST_VLAN:
+		return &ctrl_rx_eth_bcast_spec;
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST_VLAN:
+		return &ctrl_rx_eth_ipv4_mcast_spec;
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST_VLAN:
+		return &ctrl_rx_eth_ipv6_mcast_spec;
+	default:
+		/* This case should not be reached. */
+		MLX5_ASSERT(false);
+		return NULL;
+	}
+}
+
+static int
+__flow_hw_ctrl_flows_single(struct rte_eth_dev *dev,
+			    struct rte_flow_template_table *tbl,
+			    const enum mlx5_flow_ctrl_rx_eth_pattern_type pattern_type,
+			    const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type)
+{
+	const struct rte_flow_item_eth *eth_spec = __get_eth_spec(pattern_type);
+	struct rte_flow_item items[5];
+	struct rte_flow_action actions[] = {
+		{ .type = RTE_FLOW_ACTION_TYPE_RSS },
+		{ .type = RTE_FLOW_ACTION_TYPE_END },
+	};
+
+	if (!eth_spec)
+		return -EINVAL;
+	memset(items, 0, sizeof(items));
+	items[0] = (struct rte_flow_item){
+		.type = RTE_FLOW_ITEM_TYPE_ETH,
+		.spec = eth_spec,
+	};
+	items[1] = (struct rte_flow_item){ .type = RTE_FLOW_ITEM_TYPE_VOID };
+	items[2] = flow_hw_get_ctrl_rx_l3_item(rss_type);
+	items[3] = flow_hw_get_ctrl_rx_l4_item(rss_type);
+	items[4] = (struct rte_flow_item){ .type = RTE_FLOW_ITEM_TYPE_END };
+	/* Without VLAN filtering, only a single flow rule must be created. */
+	return flow_hw_create_ctrl_flow(dev, dev, tbl, items, 0, actions, 0);
+}
+
+static int
+__flow_hw_ctrl_flows_single_vlan(struct rte_eth_dev *dev,
+				 struct rte_flow_template_table *tbl,
+				 const enum mlx5_flow_ctrl_rx_eth_pattern_type pattern_type,
+				 const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_item_eth *eth_spec = __get_eth_spec(pattern_type);
+	struct rte_flow_item items[5];
+	struct rte_flow_action actions[] = {
+		{ .type = RTE_FLOW_ACTION_TYPE_RSS },
+		{ .type = RTE_FLOW_ACTION_TYPE_END },
+	};
+	unsigned int i;
+
+	if (!eth_spec)
+		return -EINVAL;
+	memset(items, 0, sizeof(items));
+	items[0] = (struct rte_flow_item){
+		.type = RTE_FLOW_ITEM_TYPE_ETH,
+		.spec = eth_spec,
+	};
+	/* Optional VLAN for now will be VOID - will be filled later. */
+	items[1] = (struct rte_flow_item){ .type = RTE_FLOW_ITEM_TYPE_VLAN };
+	items[2] = flow_hw_get_ctrl_rx_l3_item(rss_type);
+	items[3] = flow_hw_get_ctrl_rx_l4_item(rss_type);
+	items[4] = (struct rte_flow_item){ .type = RTE_FLOW_ITEM_TYPE_END };
+	/* Since VLAN filtering is done, create a single flow rule for each registered vid. */
+	for (i = 0; i < priv->vlan_filter_n; ++i) {
+		uint16_t vlan = priv->vlan_filter[i];
+		struct rte_flow_item_vlan vlan_spec = {
+			.tci = rte_cpu_to_be_16(vlan),
+		};
+
+		items[1].spec = &vlan_spec;
+		if (flow_hw_create_ctrl_flow(dev, dev, tbl, items, 0, actions, 0))
+			return -rte_errno;
+	}
+	return 0;
+}
+
+static int
+__flow_hw_ctrl_flows_unicast(struct rte_eth_dev *dev,
+			     struct rte_flow_template_table *tbl,
+			     const enum mlx5_flow_ctrl_rx_eth_pattern_type pattern_type,
+			     const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type)
+{
+	struct rte_flow_item_eth eth_spec;
+	struct rte_flow_item items[5];
+	struct rte_flow_action actions[] = {
+		{ .type = RTE_FLOW_ACTION_TYPE_RSS },
+		{ .type = RTE_FLOW_ACTION_TYPE_END },
+	};
+	const struct rte_ether_addr cmp = {
+		.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	};
+	unsigned int i;
+
+	RTE_SET_USED(pattern_type);
+
+	memset(&eth_spec, 0, sizeof(eth_spec));
+	memset(items, 0, sizeof(items));
+	items[0] = (struct rte_flow_item){
+		.type = RTE_FLOW_ITEM_TYPE_ETH,
+		.spec = &eth_spec,
+	};
+	items[1] = (struct rte_flow_item){ .type = RTE_FLOW_ITEM_TYPE_VOID };
+	items[2] = flow_hw_get_ctrl_rx_l3_item(rss_type);
+	items[3] = flow_hw_get_ctrl_rx_l4_item(rss_type);
+	items[4] = (struct rte_flow_item){ .type = RTE_FLOW_ITEM_TYPE_END };
+	for (i = 0; i < MLX5_MAX_MAC_ADDRESSES; ++i) {
+		struct rte_ether_addr *mac = &dev->data->mac_addrs[i];
+
+		if (!memcmp(mac, &cmp, sizeof(*mac)))
+			continue;
+		memcpy(&eth_spec.dst.addr_bytes, mac->addr_bytes, RTE_ETHER_ADDR_LEN);
+		if (flow_hw_create_ctrl_flow(dev, dev, tbl, items, 0, actions, 0))
+			return -rte_errno;
+	}
+	return 0;
+}
+
+static int
+__flow_hw_ctrl_flows_unicast_vlan(struct rte_eth_dev *dev,
+				  struct rte_flow_template_table *tbl,
+				  const enum mlx5_flow_ctrl_rx_eth_pattern_type pattern_type,
+				  const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_flow_item_eth eth_spec;
+	struct rte_flow_item items[5];
+	struct rte_flow_action actions[] = {
+		{ .type = RTE_FLOW_ACTION_TYPE_RSS },
+		{ .type = RTE_FLOW_ACTION_TYPE_END },
+	};
+	const struct rte_ether_addr cmp = {
+		.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	};
+	unsigned int i;
+	unsigned int j;
+
+	RTE_SET_USED(pattern_type);
+
+	memset(&eth_spec, 0, sizeof(eth_spec));
+	memset(items, 0, sizeof(items));
+	items[0] = (struct rte_flow_item){
+		.type = RTE_FLOW_ITEM_TYPE_ETH,
+		.spec = &eth_spec,
+	};
+	items[1] = (struct rte_flow_item){ .type = RTE_FLOW_ITEM_TYPE_VLAN };
+	items[2] = flow_hw_get_ctrl_rx_l3_item(rss_type);
+	items[3] = flow_hw_get_ctrl_rx_l4_item(rss_type);
+	items[4] = (struct rte_flow_item){ .type = RTE_FLOW_ITEM_TYPE_END };
+	for (i = 0; i < MLX5_MAX_MAC_ADDRESSES; ++i) {
+		struct rte_ether_addr *mac = &dev->data->mac_addrs[i];
+
+		if (!memcmp(mac, &cmp, sizeof(*mac)))
+			continue;
+		memcpy(&eth_spec.dst.addr_bytes, mac->addr_bytes, RTE_ETHER_ADDR_LEN);
+		for (j = 0; j < priv->vlan_filter_n; ++j) {
+			uint16_t vlan = priv->vlan_filter[j];
+			struct rte_flow_item_vlan vlan_spec = {
+				.tci = rte_cpu_to_be_16(vlan),
+			};
+
+			items[1].spec = &vlan_spec;
+			if (flow_hw_create_ctrl_flow(dev, dev, tbl, items, 0, actions, 0))
+				return -rte_errno;
+		}
+	}
+	return 0;
+}
+
+static int
+__flow_hw_ctrl_flows(struct rte_eth_dev *dev,
+		     struct rte_flow_template_table *tbl,
+		     const enum mlx5_flow_ctrl_rx_eth_pattern_type pattern_type,
+		     const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type)
+{
+	switch (pattern_type) {
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_ALL:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_ALL_MCAST:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST:
+		return __flow_hw_ctrl_flows_single(dev, tbl, pattern_type, rss_type);
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST_VLAN:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST_VLAN:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST_VLAN:
+		return __flow_hw_ctrl_flows_single_vlan(dev, tbl, pattern_type, rss_type);
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_DMAC:
+		return __flow_hw_ctrl_flows_unicast(dev, tbl, pattern_type, rss_type);
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_DMAC_VLAN:
+		return __flow_hw_ctrl_flows_unicast_vlan(dev, tbl, pattern_type, rss_type);
+	default:
+		/* Should not reach here. */
+		MLX5_ASSERT(false);
+		rte_errno = EINVAL;
+		return -EINVAL;
+	}
+}
+
+
+int
+mlx5_flow_hw_ctrl_flows(struct rte_eth_dev *dev, uint32_t flags)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_hw_ctrl_rx *hw_ctrl_rx;
+	unsigned int i;
+	unsigned int j;
+	int ret = 0;
+
+	RTE_SET_USED(priv);
+	RTE_SET_USED(flags);
+	if (!priv->dr_ctx) {
+		DRV_LOG(DEBUG, "port %u Control flow rules will not be created. "
+			       "HWS needs to be configured beforehand.",
+			       dev->data->port_id);
+		return 0;
+	}
+	if (!priv->hw_ctrl_rx) {
+		DRV_LOG(ERR, "port %u Control flow rules templates were not created.",
+			dev->data->port_id);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	hw_ctrl_rx = priv->hw_ctrl_rx;
+	for (i = 0; i < MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_MAX; ++i) {
+		const enum mlx5_flow_ctrl_rx_eth_pattern_type eth_pattern_type = i;
+
+		if (!eth_pattern_type_is_requested(eth_pattern_type, flags))
+			continue;
+		for (j = 0; j < MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_MAX; ++j) {
+			const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type = j;
+			struct rte_flow_actions_template *at;
+			struct mlx5_flow_hw_ctrl_rx_table *tmpls = &hw_ctrl_rx->tables[i][j];
+			const struct mlx5_flow_template_table_cfg cfg = {
+				.attr = tmpls->attr,
+				.external = 0,
+			};
+
+			if (!hw_ctrl_rx->rss[rss_type]) {
+				at = flow_hw_create_ctrl_rx_rss_template(dev, rss_type);
+				if (!at)
+					return -rte_errno;
+				hw_ctrl_rx->rss[rss_type] = at;
+			} else {
+				at = hw_ctrl_rx->rss[rss_type];
+			}
+			if (!rss_type_is_requested(priv, rss_type))
+				continue;
+			if (!tmpls->tbl) {
+				tmpls->tbl = flow_hw_table_create(dev, &cfg,
+								  &tmpls->pt, 1, &at, 1, NULL);
+				if (!tmpls->tbl) {
+					DRV_LOG(ERR, "port %u Failed to create template table "
+						     "for control flow rules. Unable to create "
+						     "control flow rules.",
+						     dev->data->port_id);
+					return -rte_errno;
+				}
+			}
+
+			ret = __flow_hw_ctrl_flows(dev, tmpls->tbl, eth_pattern_type, rss_type);
+			if (ret) {
+				DRV_LOG(ERR, "port %u Failed to create control flow rule.",
+					dev->data->port_id);
+				return ret;
+			}
+		}
+	}
+	return 0;
+}
+
 void
 mlx5_flow_meter_uninit(struct rte_eth_dev *dev)
 {
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index b1543b480e..b7818f9598 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -2568,13 +2568,14 @@ mlx5_ind_table_obj_new(struct rte_eth_dev *dev, const uint16_t *queues,
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_ind_table_obj *ind_tbl;
 	int ret;
+	uint32_t max_queues_n = priv->rxqs_n > queues_n ? priv->rxqs_n : queues_n;
 
 	/*
 	 * Allocate maximum queues for shared action as queue number
 	 * maybe modified later.
 	 */
 	ind_tbl = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*ind_tbl) +
-			      (standalone ? priv->rxqs_n : queues_n) *
+			      (standalone ? max_queues_n : queues_n) *
 			      sizeof(uint16_t), 0, SOCKET_ID_ANY);
 	if (!ind_tbl) {
 		rte_errno = ENOMEM;
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 8c9d5c1b13..4b821a1076 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -1415,6 +1415,9 @@ mlx5_dev_stop(struct rte_eth_dev *dev)
 	mlx5_flow_list_flush(dev, MLX5_FLOW_TYPE_GEN, true);
 	mlx5_flow_meter_rxq_flush(dev);
 	mlx5_action_handle_detach(dev);
+#ifdef HAVE_MLX5_HWS_SUPPORT
+	mlx5_flow_hw_cleanup_ctrl_rx_templates(dev);
+#endif
 	mlx5_rx_intr_vec_disable(dev);
 	priv->sh->port[priv->dev_port - 1].ih_port_id = RTE_MAX_ETHPORTS;
 	priv->sh->port[priv->dev_port - 1].devx_ih_port_id = RTE_MAX_ETHPORTS;
@@ -1435,6 +1438,7 @@ mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_sh_config *config = &priv->sh->config;
+	uint64_t flags = 0;
 	unsigned int i;
 	int ret;
 
@@ -1481,7 +1485,18 @@ mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
 	} else {
 		DRV_LOG(INFO, "port %u FDB default rule is disabled", dev->data->port_id);
 	}
-	return 0;
+	if (priv->isolated)
+		return 0;
+	if (dev->data->promiscuous)
+		flags |= MLX5_CTRL_PROMISCUOUS;
+	if (dev->data->all_multicast)
+		flags |= MLX5_CTRL_ALL_MULTICAST;
+	else
+		flags |= MLX5_CTRL_BROADCAST | MLX5_CTRL_IPV4_MULTICAST | MLX5_CTRL_IPV6_MULTICAST;
+	flags |= MLX5_CTRL_DMAC;
+	if (priv->vlan_filter_n)
+		flags |= MLX5_CTRL_VLAN_FILTER;
+	return mlx5_flow_hw_ctrl_flows(dev, flags);
 error:
 	ret = rte_errno;
 	mlx5_flow_hw_flush_ctrl_flows(dev);
@@ -1717,6 +1732,9 @@ mlx5_traffic_restart(struct rte_eth_dev *dev)
 {
 	if (dev->data->dev_started) {
 		mlx5_traffic_disable(dev);
+#ifdef HAVE_MLX5_HWS_SUPPORT
+		mlx5_flow_hw_cleanup_ctrl_rx_templates(dev);
+#endif
 		return mlx5_traffic_enable(dev);
 	}
 	return 0;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v6 00/18] net/mlx5: HW steering PMD update
  2022-09-23 14:43 [PATCH 00/27] net/mlx5: HW steering PMD update Suanming Mou
                   ` (30 preceding siblings ...)
  2022-10-20  3:21 ` [PATCH v5 00/18] net/mlx5: HW steering PMD update Suanming Mou
@ 2022-10-20 15:41 ` Suanming Mou
  2022-10-20 15:41   ` [PATCH v6 01/18] net/mlx5: fix invalid flow attributes Suanming Mou
                     ` (18 more replies)
  31 siblings, 19 replies; 140+ messages in thread
From: Suanming Mou @ 2022-10-20 15:41 UTC (permalink / raw)
  Cc: dev, rasland, orika

The skeleton of mlx5 HW steering (HWS) was merged upstream quite a long
time ago, but has not been updated since then due to the missing
low-level steering layer code. Luckily, better late than never, the
steering layer finally comes[1].

This series will add more features to the existing PMD code:
 - FDB and metadata copy.
 - Modify field.
 - Meter color.
 - Counter.
 - Aging.
 - Action template pre-parser optimization.
 - Connection tracking.
 - Control flow.

Some features such as meter/aging/CT touch the public API, and the
public API changes have been sent to the ML much earlier in other
threads in order not to be swallowed by this big series.

The depended-on patches are listed below:
 [1]https://inbox.dpdk.org/dev/20220922190345.394-1-valex@nvidia.com/

---

 v6:
  - Rebase to the latest version.

 v5:
  - Rebase to the latest version.

 v4:
  - Disable aging due to the flow age API change still in progress.
    https://patches.dpdk.org/project/dpdk/cover/20221019144904.2543586-1-michaelba@nvidia.com/
  - Add control flow for HWS.

 v3:
  - Fixed flows that could not be aged out.
  - Fixed error not being filled properly when table creation failed.
  - Removed transfer_mode in flow attributes before the ethdev layer
    change is applied.
    https://patches.dpdk.org/project/dpdk/patch/20220928092425.68214-1-rongweil@nvidia.com/

 v2:
  - Remove the rte_flow patches as they will be integrated in other thread.
  - Fix compilation issues.
  - Make the patches be better organized.

Alexander Kozyrev (2):
  net/mlx5: add HW steering meter action
  net/mlx5: implement METER MARK indirect action for HWS

Bing Zhao (1):
  net/mlx5: add extended metadata mode for hardware steering

Dariusz Sosnowski (5):
  net/mlx5: add HW steering port action
  net/mlx5: support DR action template API
  net/mlx5: support device control for E-Switch default rule
  net/mlx5: support device control of representor matching
  net/mlx5: create control flow rules with HWS

Gregory Etelson (2):
  net/mlx5: add HW steering VLAN push, pop and VID modify flow actions
  net/mlx5: support flow integrity in HWS group 0

Michael Baum (1):
  net/mlx5: add HWS AGE action support

Suanming Mou (6):
  net/mlx5: fix invalid flow attributes
  net/mlx5: fix IPv6 and TCP RSS hash fields
  net/mlx5: add shared header reformat support
  net/mlx5: add modify field hws support
  net/mlx5: add HW steering connection tracking support
  net/mlx5: add async action push and pull support

Xiaoyu Min (1):
  net/mlx5: add HW steering counter action

 doc/guides/nics/features/default.ini   |    1 +
 doc/guides/nics/features/mlx5.ini      |    2 +
 doc/guides/nics/mlx5.rst               |   43 +-
 doc/guides/rel_notes/release_22_11.rst |    8 +-
 drivers/common/mlx5/mlx5_devx_cmds.c   |   50 +
 drivers/common/mlx5/mlx5_devx_cmds.h   |   27 +
 drivers/common/mlx5/mlx5_prm.h         |   22 +-
 drivers/common/mlx5/version.map        |    1 +
 drivers/net/mlx5/linux/mlx5_os.c       |   78 +-
 drivers/net/mlx5/meson.build           |    1 +
 drivers/net/mlx5/mlx5.c                |  126 +-
 drivers/net/mlx5/mlx5.h                |  322 +-
 drivers/net/mlx5/mlx5_defs.h           |    5 +
 drivers/net/mlx5/mlx5_flow.c           |  409 +-
 drivers/net/mlx5/mlx5_flow.h           |  335 +-
 drivers/net/mlx5/mlx5_flow_aso.c       |  797 ++-
 drivers/net/mlx5/mlx5_flow_dv.c        | 1128 +--
 drivers/net/mlx5/mlx5_flow_hw.c        | 8789 +++++++++++++++++++++---
 drivers/net/mlx5/mlx5_flow_meter.c     |  776 ++-
 drivers/net/mlx5/mlx5_flow_verbs.c     |    8 +-
 drivers/net/mlx5/mlx5_hws_cnt.c        | 1247 ++++
 drivers/net/mlx5/mlx5_hws_cnt.h        |  703 ++
 drivers/net/mlx5/mlx5_rxq.c            |    3 +-
 drivers/net/mlx5/mlx5_trigger.c        |  272 +-
 drivers/net/mlx5/mlx5_tx.h             |    1 +
 drivers/net/mlx5/mlx5_txq.c            |   47 +
 drivers/net/mlx5/mlx5_utils.h          |   10 +-
 drivers/net/mlx5/rte_pmd_mlx5.h        |   17 +
 drivers/net/mlx5/version.map           |    1 +
 29 files changed, 13589 insertions(+), 1640 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_hws_cnt.c
 create mode 100644 drivers/net/mlx5/mlx5_hws_cnt.h

-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v6 01/18] net/mlx5: fix invalid flow attributes
  2022-10-20 15:41 ` [PATCH v6 00/18] net/mlx5: HW steering PMD update Suanming Mou
@ 2022-10-20 15:41   ` Suanming Mou
  2022-10-24  9:43     ` Slava Ovsiienko
  2022-10-20 15:41   ` [PATCH v6 02/18] net/mlx5: fix IPv6 and TCP RSS hash fields Suanming Mou
                     ` (17 subsequent siblings)
  18 siblings, 1 reply; 140+ messages in thread
From: Suanming Mou @ 2022-10-20 15:41 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika

In the function flow_get_drv_type(), attr will be read in non-HWS mode.
If the user calls the HWS API in SWS mode, a valid attr has to be
passed from the HWS functions, otherwise the NULL pointer causes a
crash.
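
For illustration (application side, not part of the patch), one call
path that could hit this crash is a template-API query such as
rte_flow_info_get() issued on a port running in SWS mode; after this
fix the PMD rejects it with ENOTSUP instead of dereferencing a NULL
attribute:

/* Hedged example; error handling kept minimal. */
#include <rte_flow.h>

static int
query_hws_info(uint16_t port_id)
{
	struct rte_flow_port_info info;
	struct rte_flow_queue_info qinfo;
	struct rte_flow_error err;

	/* On an SWS-mode mlx5 port this now fails with ENOTSUP. */
	return rte_flow_info_get(port_id, &info, &qinfo, &err);
}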

Fixes: c40c061a022e ("net/mlx5: add basic flow queue operation")

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/net/mlx5/mlx5_flow.c | 38 ++++++++++++++++++++++++------------
 1 file changed, 26 insertions(+), 12 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 1e32031443..eb8faf90f7 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -3742,6 +3742,8 @@ flow_get_drv_type(struct rte_eth_dev *dev, const struct rte_flow_attr *attr)
 	 */
 	if (priv->sh->config.dv_flow_en == 2)
 		return MLX5_FLOW_TYPE_HW;
+	if (!attr)
+		return MLX5_FLOW_TYPE_MIN;
 	/* If no OS specific type - continue with DV/VERBS selection */
 	if (attr->transfer && priv->sh->config.dv_esw_en)
 		type = MLX5_FLOW_TYPE_DV;
@@ -8254,8 +8256,9 @@ mlx5_flow_info_get(struct rte_eth_dev *dev,
 		   struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8289,8 +8292,9 @@ mlx5_flow_port_configure(struct rte_eth_dev *dev,
 			 struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8321,8 +8325,9 @@ mlx5_flow_pattern_template_create(struct rte_eth_dev *dev,
 		struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW) {
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW) {
 		rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8352,8 +8357,9 @@ mlx5_flow_pattern_template_destroy(struct rte_eth_dev *dev,
 				   struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8387,8 +8393,9 @@ mlx5_flow_actions_template_create(struct rte_eth_dev *dev,
 			struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW) {
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW) {
 		rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8418,8 +8425,9 @@ mlx5_flow_actions_template_destroy(struct rte_eth_dev *dev,
 				   struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8459,8 +8467,9 @@ mlx5_flow_table_create(struct rte_eth_dev *dev,
 		       struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW) {
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW) {
 		rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8496,8 +8505,9 @@ mlx5_flow_table_destroy(struct rte_eth_dev *dev,
 			struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8544,8 +8554,9 @@ mlx5_flow_async_flow_create(struct rte_eth_dev *dev,
 			    struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW) {
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW) {
 		rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8587,8 +8598,9 @@ mlx5_flow_async_flow_destroy(struct rte_eth_dev *dev,
 			     struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8623,8 +8635,9 @@ mlx5_flow_pull(struct rte_eth_dev *dev,
 	       struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -8652,8 +8665,9 @@ mlx5_flow_push(struct rte_eth_dev *dev,
 	       struct rte_flow_error *error)
 {
 	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = {0};
 
-	if (flow_get_drv_type(dev, NULL) != MLX5_FLOW_TYPE_HW)
+	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
 		return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v6 02/18] net/mlx5: fix IPv6 and TCP RSS hash fields
  2022-10-20 15:41 ` [PATCH v6 00/18] net/mlx5: HW steering PMD update Suanming Mou
  2022-10-20 15:41   ` [PATCH v6 01/18] net/mlx5: fix invalid flow attributes Suanming Mou
@ 2022-10-20 15:41   ` Suanming Mou
  2022-10-24  9:43     ` Slava Ovsiienko
  2022-10-20 15:41   ` [PATCH v6 03/18] net/mlx5: add shared header reformat support Suanming Mou
                     ` (16 subsequent siblings)
  18 siblings, 1 reply; 140+ messages in thread
From: Suanming Mou @ 2022-10-20 15:41 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika

In the flow_dv_hashfields_set() function, when item_flags was 0, the
code went directly into the first if branch, so the else branches never
had a chance to be checked. As a result, the IPv6 and TCP hash fields
handled in those else branches were never set.

This commit adds a dedicated HW steering hash field set function to
generate the RSS hash fields.
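
The control-flow pitfall can be reproduced with a small standalone
snippet (illustration only; the L3_* macros are hypothetical stand-ins
for the real MLX5_FLOW_LAYER_* item flags):

#include <stdint.h>
#include <stdio.h>

#define L3_IPV4 (UINT64_C(1) << 0) /* hypothetical stand-in */
#define L3_IPV6 (UINT64_C(1) << 1) /* hypothetical stand-in */

static const char *
pick_branch(uint64_t items)
{
	if ((items & L3_IPV4) || !items)
		return "IPv4 branch";	/* always taken when items == 0 */
	else if ((items & L3_IPV6) || !items)
		return "IPv6 branch";	/* unreachable when items == 0 */
	return "none";
}

int
main(void)
{
	printf("%s\n", pick_branch(0));	/* prints "IPv4 branch" */
	return 0;
}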

Fixes: 3a2f674b6aa8 ("net/mlx5: add queue and RSS HW steering action")
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/net/mlx5/mlx5_flow_dv.c | 12 +++----
 drivers/net/mlx5/mlx5_flow_hw.c | 59 ++++++++++++++++++++++++++++++++-
 2 files changed, 62 insertions(+), 9 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index fb542ffde9..5dd93078ac 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -11276,8 +11276,7 @@ flow_dv_hashfields_set(uint64_t item_flags,
 		rss_inner = 1;
 #endif
 	if ((rss_inner && (items & MLX5_FLOW_LAYER_INNER_L3_IPV4)) ||
-	    (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L3_IPV4)) ||
-	     !items) {
+	    (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L3_IPV4))) {
 		if (rss_types & MLX5_IPV4_LAYER_TYPES) {
 			if (rss_types & RTE_ETH_RSS_L3_SRC_ONLY)
 				fields |= IBV_RX_HASH_SRC_IPV4;
@@ -11287,8 +11286,7 @@ flow_dv_hashfields_set(uint64_t item_flags,
 				fields |= MLX5_IPV4_IBV_RX_HASH;
 		}
 	} else if ((rss_inner && (items & MLX5_FLOW_LAYER_INNER_L3_IPV6)) ||
-		   (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L3_IPV6)) ||
-		   !items) {
+		   (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L3_IPV6))) {
 		if (rss_types & MLX5_IPV6_LAYER_TYPES) {
 			if (rss_types & RTE_ETH_RSS_L3_SRC_ONLY)
 				fields |= IBV_RX_HASH_SRC_IPV6;
@@ -11311,8 +11309,7 @@ flow_dv_hashfields_set(uint64_t item_flags,
 		return;
 	}
 	if ((rss_inner && (items & MLX5_FLOW_LAYER_INNER_L4_UDP)) ||
-	    (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L4_UDP)) ||
-	    !items) {
+	    (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L4_UDP))) {
 		if (rss_types & RTE_ETH_RSS_UDP) {
 			if (rss_types & RTE_ETH_RSS_L4_SRC_ONLY)
 				fields |= IBV_RX_HASH_SRC_PORT_UDP;
@@ -11322,8 +11319,7 @@ flow_dv_hashfields_set(uint64_t item_flags,
 				fields |= MLX5_UDP_IBV_RX_HASH;
 		}
 	} else if ((rss_inner && (items & MLX5_FLOW_LAYER_INNER_L4_TCP)) ||
-		   (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L4_TCP)) ||
-		   !items) {
+		   (!rss_inner && (items & MLX5_FLOW_LAYER_OUTER_L4_TCP))) {
 		if (rss_types & RTE_ETH_RSS_TCP) {
 			if (rss_types & RTE_ETH_RSS_L4_SRC_ONLY)
 				fields |= IBV_RX_HASH_SRC_PORT_TCP;
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 8de6757737..28b24490e4 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -62,6 +62,63 @@ flow_hw_rxq_flag_set(struct rte_eth_dev *dev, bool enable)
 	priv->mark_enabled = enable;
 }
 
+/**
+ * Set the hash fields according to the @p rss_desc information.
+ *
+ * @param[in] rss_desc
+ *   Pointer to the mlx5_flow_rss_desc.
+ * @param[out] hash_fields
+ *   Pointer to the RSS hash fields.
+ */
+static void
+flow_hw_hashfields_set(struct mlx5_flow_rss_desc *rss_desc,
+		       uint64_t *hash_fields)
+{
+	uint64_t fields = 0;
+	int rss_inner = 0;
+	uint64_t rss_types = rte_eth_rss_hf_refine(rss_desc->types);
+
+#ifdef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
+	if (rss_desc->level >= 2)
+		rss_inner = 1;
+#endif
+	if (rss_types & MLX5_IPV4_LAYER_TYPES) {
+		if (rss_types & RTE_ETH_RSS_L3_SRC_ONLY)
+			fields |= IBV_RX_HASH_SRC_IPV4;
+		else if (rss_types & RTE_ETH_RSS_L3_DST_ONLY)
+			fields |= IBV_RX_HASH_DST_IPV4;
+		else
+			fields |= MLX5_IPV4_IBV_RX_HASH;
+	} else if (rss_types & MLX5_IPV6_LAYER_TYPES) {
+		if (rss_types & RTE_ETH_RSS_L3_SRC_ONLY)
+			fields |= IBV_RX_HASH_SRC_IPV6;
+		else if (rss_types & RTE_ETH_RSS_L3_DST_ONLY)
+			fields |= IBV_RX_HASH_DST_IPV6;
+		else
+			fields |= MLX5_IPV6_IBV_RX_HASH;
+	}
+	if (rss_types & RTE_ETH_RSS_UDP) {
+		if (rss_types & RTE_ETH_RSS_L4_SRC_ONLY)
+			fields |= IBV_RX_HASH_SRC_PORT_UDP;
+		else if (rss_types & RTE_ETH_RSS_L4_DST_ONLY)
+			fields |= IBV_RX_HASH_DST_PORT_UDP;
+		else
+			fields |= MLX5_UDP_IBV_RX_HASH;
+	} else if (rss_types & RTE_ETH_RSS_TCP) {
+		if (rss_types & RTE_ETH_RSS_L4_SRC_ONLY)
+			fields |= IBV_RX_HASH_SRC_PORT_TCP;
+		else if (rss_types & RTE_ETH_RSS_L4_DST_ONLY)
+			fields |= IBV_RX_HASH_DST_PORT_TCP;
+		else
+			fields |= MLX5_TCP_IBV_RX_HASH;
+	}
+	if (rss_types & RTE_ETH_RSS_ESP)
+		fields |= IBV_RX_HASH_IPSEC_SPI;
+	if (rss_inner)
+		fields |= IBV_RX_HASH_INNER;
+	*hash_fields = fields;
+}
+
 /**
  * Generate the pattern item flags.
  * Will be used for shared RSS action.
@@ -225,7 +282,7 @@ flow_hw_tir_action_register(struct rte_eth_dev *dev,
 		       MLX5_RSS_HASH_KEY_LEN);
 		rss_desc.key_len = MLX5_RSS_HASH_KEY_LEN;
 		rss_desc.types = !rss->types ? RTE_ETH_RSS_IP : rss->types;
-		flow_dv_hashfields_set(0, &rss_desc, &rss_desc.hash_fields);
+		flow_hw_hashfields_set(&rss_desc, &rss_desc.hash_fields);
 		flow_dv_action_rss_l34_hash_adjust(rss->types,
 						   &rss_desc.hash_fields);
 		if (rss->level > 1) {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v6 03/18] net/mlx5: add shared header reformat support
  2022-10-20 15:41 ` [PATCH v6 00/18] net/mlx5: HW steering PMD update Suanming Mou
  2022-10-20 15:41   ` [PATCH v6 01/18] net/mlx5: fix invalid flow attributes Suanming Mou
  2022-10-20 15:41   ` [PATCH v6 02/18] net/mlx5: fix IPv6 and TCP RSS hash fields Suanming Mou
@ 2022-10-20 15:41   ` Suanming Mou
  2022-10-24  9:44     ` Slava Ovsiienko
  2022-10-20 15:41   ` [PATCH v6 04/18] net/mlx5: add modify field hws support Suanming Mou
                     ` (15 subsequent siblings)
  18 siblings, 1 reply; 140+ messages in thread
From: Suanming Mou @ 2022-10-20 15:41 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika

As the rte_flow_async API defines, an action mask with a non-zero
field value means the action will be shared by all the flows in the
table.

A header reformat action whose action mask field is non-zero will be
created as a constant shared action. For the encapsulation header
reformat action there are two kinds of encapsulation data:
raw_encap_data and rte_flow_item encap_data. For both kinds, the
action mask conf identifies whether the data is constant or not.

Examples:
1. VXLAN encap (encap_data: rte_flow_item)
	action conf (eth/ipv4/udp/vxlan_hdr)

	a. action mask conf (eth/ipv4/udp/vxlan_hdr)
	  - items are constant.
	b. action mask conf (NULL)
	  - items will change.

2. RAW encap (encap_data: raw)
	action conf (raw_data)

	a. action mask conf (not NULL)
	  - encap_data constant.
	b. action mask conf (NULL)
	  - encap_data will change.
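
A hedged, application-side sketch of case 2 above (not PMD code; the
encap buffer contents and sizes are placeholders): a non-NULL mask conf
for RAW_ENCAP marks the encapsulation data as constant and therefore
shared by all flows of the table, while a NULL mask conf means the data
is provided per flow rule.

#include <stdbool.h>
#include <rte_flow.h>

static struct rte_flow_actions_template *
create_encap_template(uint16_t port_id, bool constant_encap)
{
	static uint8_t encap_buf[64]; /* pre-built encap header (placeholder) */
	struct rte_flow_action_raw_encap encap_conf = {
		.data = encap_buf,
		.size = sizeof(encap_buf),
	};
	struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_RAW_ENCAP, .conf = &encap_conf },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	struct rte_flow_action masks[] = {
		/* Non-NULL conf => constant/shared, NULL conf => per-flow data. */
		{ .type = RTE_FLOW_ACTION_TYPE_RAW_ENCAP,
		  .conf = constant_encap ? &encap_conf : NULL },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	struct rte_flow_actions_template_attr attr = { .ingress = 1 };
	struct rte_flow_error error;

	return rte_flow_actions_template_create(port_id, &attr, actions,
						masks, &error);
}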

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/net/mlx5/mlx5_flow.h    |   6 +-
 drivers/net/mlx5/mlx5_flow_hw.c | 124 ++++++++++----------------------
 2 files changed, 39 insertions(+), 91 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index a274808802..b225528216 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1078,10 +1078,6 @@ struct mlx5_action_construct_data {
 	uint16_t action_dst; /* mlx5dr_rule_action dst offset. */
 	union {
 		struct {
-			/* encap src(item) offset. */
-			uint16_t src;
-			/* encap dst data offset. */
-			uint16_t dst;
 			/* encap data len. */
 			uint16_t len;
 		} encap;
@@ -1124,6 +1120,8 @@ struct mlx5_hw_jump_action {
 /* Encap decap action struct. */
 struct mlx5_hw_encap_decap_action {
 	struct mlx5dr_action *action; /* Action object. */
+	/* Is header_reformat action shared across flows in table. */
+	bool shared;
 	size_t data_size; /* Action metadata size. */
 	uint8_t data[]; /* Action data. */
 };
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 28b24490e4..066ce4694b 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -402,10 +402,6 @@ __flow_hw_act_data_general_append(struct mlx5_priv *priv,
  *   Offset of source rte flow action.
  * @param[in] action_dst
  *   Offset of destination DR action.
- * @param[in] encap_src
- *   Offset of source encap raw data.
- * @param[in] encap_dst
- *   Offset of destination encap raw data.
  * @param[in] len
  *   Length of the data to be updated.
  *
@@ -418,16 +414,12 @@ __flow_hw_act_data_encap_append(struct mlx5_priv *priv,
 				enum rte_flow_action_type type,
 				uint16_t action_src,
 				uint16_t action_dst,
-				uint16_t encap_src,
-				uint16_t encap_dst,
 				uint16_t len)
 {	struct mlx5_action_construct_data *act_data;
 
 	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
 	if (!act_data)
 		return -1;
-	act_data->encap.src = encap_src;
-	act_data->encap.dst = encap_dst;
 	act_data->encap.len = len;
 	LIST_INSERT_HEAD(&acts->act_list, act_data, next);
 	return 0;
@@ -523,53 +515,6 @@ flow_hw_shared_action_translate(struct rte_eth_dev *dev,
 	return 0;
 }
 
-/**
- * Translate encap items to encapsulation list.
- *
- * @param[in] dev
- *   Pointer to the rte_eth_dev data structure.
- * @param[in] acts
- *   Pointer to the template HW steering DR actions.
- * @param[in] type
- *   Action type.
- * @param[in] action_src
- *   Offset of source rte flow action.
- * @param[in] action_dst
- *   Offset of destination DR action.
- * @param[in] items
- *   Encap item pattern.
- * @param[in] items_m
- *   Encap item mask indicates which part are constant and dynamic.
- *
- * @return
- *    0 on success, negative value otherwise and rte_errno is set.
- */
-static __rte_always_inline int
-flow_hw_encap_item_translate(struct rte_eth_dev *dev,
-			     struct mlx5_hw_actions *acts,
-			     enum rte_flow_action_type type,
-			     uint16_t action_src,
-			     uint16_t action_dst,
-			     const struct rte_flow_item *items,
-			     const struct rte_flow_item *items_m)
-{
-	struct mlx5_priv *priv = dev->data->dev_private;
-	size_t len, total_len = 0;
-	uint32_t i = 0;
-
-	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++, items_m++, i++) {
-		len = flow_dv_get_item_hdr_len(items->type);
-		if ((!items_m->spec ||
-		    memcmp(items_m->spec, items->spec, len)) &&
-		    __flow_hw_act_data_encap_append(priv, acts, type,
-						    action_src, action_dst, i,
-						    total_len, len))
-			return -1;
-		total_len += len;
-	}
-	return 0;
-}
-
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -611,7 +556,7 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 	const struct rte_flow_action_raw_encap *raw_encap_data;
 	const struct rte_flow_item *enc_item = NULL, *enc_item_m = NULL;
 	uint16_t reformat_pos = MLX5_HW_MAX_ACTS, reformat_src = 0;
-	uint8_t *encap_data = NULL;
+	uint8_t *encap_data = NULL, *encap_data_m = NULL;
 	size_t data_size = 0;
 	bool actions_end = false;
 	uint32_t type, i;
@@ -718,9 +663,9 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			MLX5_ASSERT(reformat_pos == MLX5_HW_MAX_ACTS);
 			enc_item = ((const struct rte_flow_action_vxlan_encap *)
 				   actions->conf)->definition;
-			enc_item_m =
-				((const struct rte_flow_action_vxlan_encap *)
-				 masks->conf)->definition;
+			if (masks->conf)
+				enc_item_m = ((const struct rte_flow_action_vxlan_encap *)
+					     masks->conf)->definition;
 			reformat_pos = i++;
 			reformat_src = actions - action_start;
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_L2_TO_TNL_L2;
@@ -729,9 +674,9 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			MLX5_ASSERT(reformat_pos == MLX5_HW_MAX_ACTS);
 			enc_item = ((const struct rte_flow_action_nvgre_encap *)
 				   actions->conf)->definition;
-			enc_item_m =
-				((const struct rte_flow_action_nvgre_encap *)
-				actions->conf)->definition;
+			if (masks->conf)
+				enc_item_m = ((const struct rte_flow_action_nvgre_encap *)
+					     masks->conf)->definition;
 			reformat_pos = i++;
 			reformat_src = actions - action_start;
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_L2_TO_TNL_L2;
@@ -743,6 +688,11 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_TNL_L2_TO_L2;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
+			raw_encap_data =
+				(const struct rte_flow_action_raw_encap *)
+				 masks->conf;
+			if (raw_encap_data)
+				encap_data_m = raw_encap_data->data;
 			raw_encap_data =
 				(const struct rte_flow_action_raw_encap *)
 				 actions->conf;
@@ -776,22 +726,17 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 	}
 	if (reformat_pos != MLX5_HW_MAX_ACTS) {
 		uint8_t buf[MLX5_ENCAP_MAX_LEN];
+		bool shared_rfmt = true;
 
 		if (enc_item) {
 			MLX5_ASSERT(!encap_data);
-			if (flow_dv_convert_encap_data
-				(enc_item, buf, &data_size, error) ||
-			    flow_hw_encap_item_translate
-				(dev, acts, (action_start + reformat_src)->type,
-				 reformat_src, reformat_pos,
-				 enc_item, enc_item_m))
+			if (flow_dv_convert_encap_data(enc_item, buf, &data_size, error))
 				goto err;
 			encap_data = buf;
-		} else if (encap_data && __flow_hw_act_data_encap_append
-				(priv, acts,
-				 (action_start + reformat_src)->type,
-				 reformat_src, reformat_pos, 0, 0, data_size)) {
-			goto err;
+			if (!enc_item_m)
+				shared_rfmt = false;
+		} else if (encap_data && !encap_data_m) {
+			shared_rfmt = false;
 		}
 		acts->encap_decap = mlx5_malloc(MLX5_MEM_ZERO,
 				    sizeof(*acts->encap_decap) + data_size,
@@ -805,12 +750,22 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 		acts->encap_decap->action = mlx5dr_action_create_reformat
 				(priv->dr_ctx, refmt_type,
 				 data_size, encap_data,
-				 rte_log2_u32(table_attr->nb_flows),
-				 mlx5_hw_act_flag[!!attr->group][type]);
+				 shared_rfmt ? 0 : rte_log2_u32(table_attr->nb_flows),
+				 mlx5_hw_act_flag[!!attr->group][type] |
+				 (shared_rfmt ? MLX5DR_ACTION_FLAG_SHARED : 0));
 		if (!acts->encap_decap->action)
 			goto err;
 		acts->rule_acts[reformat_pos].action =
 						acts->encap_decap->action;
+		acts->rule_acts[reformat_pos].reformat.data =
+						acts->encap_decap->data;
+		if (shared_rfmt)
+			acts->rule_acts[reformat_pos].reformat.offset = 0;
+		else if (__flow_hw_act_data_encap_append(priv, acts,
+				 (action_start + reformat_src)->type,
+				 reformat_src, reformat_pos, data_size))
+			goto err;
+		acts->encap_decap->shared = shared_rfmt;
 		acts->encap_decap_pos = reformat_pos;
 	}
 	acts->acts_num = i;
@@ -975,6 +930,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			.ingress = 1,
 	};
 	uint32_t ft_flag;
+	size_t encap_len = 0;
 
 	memcpy(rule_acts, hw_acts->rule_acts,
 	       sizeof(*rule_acts) * hw_acts->acts_num);
@@ -992,9 +948,6 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	} else {
 		attr.ingress = 1;
 	}
-	if (hw_acts->encap_decap && hw_acts->encap_decap->data_size)
-		memcpy(buf, hw_acts->encap_decap->data,
-		       hw_acts->encap_decap->data_size);
 	LIST_FOREACH(act_data, &hw_acts->act_list, next) {
 		uint32_t jump_group;
 		uint32_t tag;
@@ -1053,23 +1006,20 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
 			enc_item = ((const struct rte_flow_action_vxlan_encap *)
 				   action->conf)->definition;
-			rte_memcpy((void *)&buf[act_data->encap.dst],
-				   enc_item[act_data->encap.src].spec,
-				   act_data->encap.len);
+			if (flow_dv_convert_encap_data(enc_item, buf, &encap_len, NULL))
+				return -1;
 			break;
 		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
 			enc_item = ((const struct rte_flow_action_nvgre_encap *)
 				   action->conf)->definition;
-			rte_memcpy((void *)&buf[act_data->encap.dst],
-				   enc_item[act_data->encap.src].spec,
-				   act_data->encap.len);
+			if (flow_dv_convert_encap_data(enc_item, buf, &encap_len, NULL))
+				return -1;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
 			raw_encap_data =
 				(const struct rte_flow_action_raw_encap *)
 				 action->conf;
-			rte_memcpy((void *)&buf[act_data->encap.dst],
-				   raw_encap_data->data, act_data->encap.len);
+			rte_memcpy((void *)buf, raw_encap_data->data, act_data->encap.len);
 			MLX5_ASSERT(raw_encap_data->size ==
 				    act_data->encap.len);
 			break;
@@ -1077,7 +1027,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			break;
 		}
 	}
-	if (hw_acts->encap_decap) {
+	if (hw_acts->encap_decap && !hw_acts->encap_decap->shared) {
 		rule_acts[hw_acts->encap_decap_pos].reformat.offset =
 				job->flow->idx - 1;
 		rule_acts[hw_acts->encap_decap_pos].reformat.data = buf;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v6 04/18] net/mlx5: add modify field hws support
  2022-10-20 15:41 ` [PATCH v6 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (2 preceding siblings ...)
  2022-10-20 15:41   ` [PATCH v6 03/18] net/mlx5: add shared header reformat support Suanming Mou
@ 2022-10-20 15:41   ` Suanming Mou
  2022-10-24  9:44     ` Slava Ovsiienko
  2022-10-20 15:41   ` [PATCH v6 05/18] net/mlx5: add HW steering port action Suanming Mou
                     ` (14 subsequent siblings)
  18 siblings, 1 reply; 140+ messages in thread
From: Suanming Mou @ 2022-10-20 15:41 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Dariusz Sosnowski

This patch introduces support for modify_field rte_flow actions in HWS
mode. Support includes:

- Ingress and egress domains,
- SET and ADD operations,
- usage of arbitrary bit offsets and widths for packet and metadata
  fields.

Support is implemented in two phases:

1. On flow table creation the hardware commands are generated, based
   on rte_flow action templates, and stored alongside the action template.
2. On flow rule creation/queueing the hardware commands are updated with
   values provided by the user. Any masks over immediate values, provided
   in action templates, are applied to these values before enqueueing rules
   for creation.
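
For reference, a minimal sketch (an illustration with assumed values,
not taken from this patch) of an action template that exercises the
dynamic path: operation, destination and width are fully masked, as the
new template validation requires, while the zeroed source value mask
keeps the TTL per rule, to be supplied at enqueue time (phase 2 above):

    #include <stdint.h>
    #include <rte_flow.h>

    /* SET IPv4 TTL from an immediate value. */
    static const struct rte_flow_action_modify_field set_ttl = {
            .operation = RTE_FLOW_MODIFY_SET,
            .dst = { .field = RTE_FLOW_FIELD_IPV4_TTL },
            .src = { .field = RTE_FLOW_FIELD_VALUE }, /* filled per rule */
            .width = 8,
    };
    static const struct rte_flow_action_modify_field set_ttl_mask = {
            .operation = RTE_FLOW_MODIFY_SET,
            .dst = {
                    .field = RTE_FLOW_FIELD_IPV4_TTL,
                    .level = UINT32_MAX,
                    .offset = UINT32_MAX,
            },
            /* Zero source value mask: command stays dynamic (not shared). */
            .src = { .field = RTE_FLOW_FIELD_VALUE },
            .width = UINT32_MAX,
    };

    static const struct rte_flow_action tmpl_actions[] = {
            { .type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD, .conf = &set_ttl },
            { .type = RTE_FLOW_ACTION_TYPE_END },
    };
    static const struct rte_flow_action tmpl_masks[] = {
            { .type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
              .conf = &set_ttl_mask },
            { .type = RTE_FLOW_ACTION_TYPE_END },
    };

Fully masking src.value instead would make the command value immediate
and shared by all flow rules in the table (phase 1 only).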

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 doc/guides/rel_notes/release_22_11.rst |   3 +-
 drivers/common/mlx5/mlx5_prm.h         |   2 +
 drivers/net/mlx5/linux/mlx5_os.c       |  18 +-
 drivers/net/mlx5/mlx5.h                |   1 +
 drivers/net/mlx5/mlx5_flow.h           |  96 ++++
 drivers/net/mlx5/mlx5_flow_dv.c        | 551 +++++++++++-----------
 drivers/net/mlx5/mlx5_flow_hw.c        | 614 ++++++++++++++++++++++++-
 7 files changed, 1009 insertions(+), 276 deletions(-)

diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index cdc5837f1d..8f56a99ec9 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -251,7 +251,8 @@ New Features
 
 * **Updated Nvidia mlx5 driver.**
 
-  * Added fully support for queue based async HW steering to the PMD.
+  * Added fully support for queue based async HW steering to the PMD:
+    - Support of modify fields.
 
 * **Rewritten pmdinfo script.**
 
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 371942ae50..fb3c43eed9 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -751,6 +751,8 @@ enum mlx5_modification_field {
 	MLX5_MODI_IN_TCP_ACK_NUM = 0x5C,
 	MLX5_MODI_GTP_TEID = 0x6E,
 	MLX5_MODI_OUT_IP_ECN = 0x73,
+	MLX5_MODI_TUNNEL_HDR_DW_1 = 0x75,
+	MLX5_MODI_GTPU_FIRST_EXT_DW_0 = 0x76,
 };
 
 /* Total number of metadata reg_c's. */
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 12f503474a..07c238f422 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1553,6 +1553,15 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 				       mlx5_hrxq_clone_free_cb);
 	if (!priv->hrxqs)
 		goto error;
+	mlx5_set_metadata_mask(eth_dev);
+	if (sh->config.dv_xmeta_en != MLX5_XMETA_MODE_LEGACY &&
+	    !priv->sh->dv_regc0_mask) {
+		DRV_LOG(ERR, "metadata mode %u is not supported "
+			     "(no metadata reg_c[0] is available)",
+			     sh->config.dv_xmeta_en);
+			err = ENOTSUP;
+			goto error;
+	}
 	rte_rwlock_init(&priv->ind_tbls_lock);
 	if (priv->sh->config.dv_flow_en == 2) {
 #ifdef HAVE_IBV_FLOW_DV_SUPPORT
@@ -1579,15 +1588,6 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 		err = -err;
 		goto error;
 	}
-	mlx5_set_metadata_mask(eth_dev);
-	if (sh->config.dv_xmeta_en != MLX5_XMETA_MODE_LEGACY &&
-	    !priv->sh->dv_regc0_mask) {
-		DRV_LOG(ERR, "metadata mode %u is not supported "
-			     "(no metadata reg_c[0] is available)",
-			     sh->config.dv_xmeta_en);
-			err = ENOTSUP;
-			goto error;
-	}
 	/* Query availability of metadata reg_c's. */
 	if (!priv->sh->metadata_regc_check_flag) {
 		err = mlx5_flow_discover_mreg_c(eth_dev);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index fc8e1190f3..4ca53a62f5 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -348,6 +348,7 @@ struct mlx5_hw_q_job {
 	struct rte_flow_hw *flow; /* Flow attached to the job. */
 	void *user_data; /* Job user data. */
 	uint8_t *encap_data; /* Encap data. */
+	struct mlx5_modification_cmd *mhdr_cmd;
 };
 
 /* HW steering job descriptor LIFO pool. */
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index b225528216..88a08ff877 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1013,6 +1013,51 @@ flow_items_to_tunnel(const struct rte_flow_item items[])
 	return items[0].spec;
 }
 
+/**
+ * Fetch 1, 2, 3 or 4 byte field from the byte array
+ * and return as unsigned integer in host-endian format.
+ *
+ * @param[in] data
+ *   Pointer to data array.
+ * @param[in] size
+ *   Size of field to extract.
+ *
+ * @return
+ *   converted field in host endian format.
+ */
+static inline uint32_t
+flow_dv_fetch_field(const uint8_t *data, uint32_t size)
+{
+	uint32_t ret;
+
+	switch (size) {
+	case 1:
+		ret = *data;
+		break;
+	case 2:
+		ret = rte_be_to_cpu_16(*(const unaligned_uint16_t *)data);
+		break;
+	case 3:
+		ret = rte_be_to_cpu_16(*(const unaligned_uint16_t *)data);
+		ret = (ret << 8) | *(data + sizeof(uint16_t));
+		break;
+	case 4:
+		ret = rte_be_to_cpu_32(*(const unaligned_uint32_t *)data);
+		break;
+	default:
+		MLX5_ASSERT(false);
+		ret = 0;
+		break;
+	}
+	return ret;
+}
+
+struct field_modify_info {
+	uint32_t size; /* Size of field in protocol header, in bytes. */
+	uint32_t offset; /* Offset of field in protocol header, in bytes. */
+	enum mlx5_modification_field id;
+};
+
 /* HW steering flow attributes. */
 struct mlx5_flow_attr {
 	uint32_t port_id; /* Port index. */
@@ -1081,6 +1126,29 @@ struct mlx5_action_construct_data {
 			/* encap data len. */
 			uint16_t len;
 		} encap;
+		struct {
+			/* Modify header action offset in pattern. */
+			uint16_t mhdr_cmds_off;
+			/* Offset in pattern after modify header actions. */
+			uint16_t mhdr_cmds_end;
+			/*
+			 * True if this action is masked and does not need to
+			 * be generated.
+			 */
+			bool shared;
+			/*
+			 * Modified field definitions in dst field (SET, ADD)
+			 * or src field (COPY).
+			 */
+			struct field_modify_info field[MLX5_ACT_MAX_MOD_FIELDS];
+			/* Modified field definitions in dst field (COPY). */
+			struct field_modify_info dcopy[MLX5_ACT_MAX_MOD_FIELDS];
+			/*
+			 * Masks applied to field values to generate
+			 * PRM actions.
+			 */
+			uint32_t mask[MLX5_ACT_MAX_MOD_FIELDS];
+		} modify_header;
 		struct {
 			uint64_t types; /* RSS hash types. */
 			uint32_t level; /* RSS level. */
@@ -1106,6 +1174,7 @@ struct rte_flow_actions_template {
 	struct rte_flow_actions_template_attr attr;
 	struct rte_flow_action *actions; /* Cached flow actions. */
 	struct rte_flow_action *masks; /* Cached action masks.*/
+	uint16_t mhdr_off; /* Offset of DR modify header action. */
 	uint32_t refcnt; /* Reference counter. */
 };
 
@@ -1126,6 +1195,22 @@ struct mlx5_hw_encap_decap_action {
 	uint8_t data[]; /* Action data. */
 };
 
+#define MLX5_MHDR_MAX_CMD ((MLX5_MAX_MODIFY_NUM) * 2 + 1)
+
+/* Modify field action struct. */
+struct mlx5_hw_modify_header_action {
+	/* Reference to DR action */
+	struct mlx5dr_action *action;
+	/* Modify header action position in action rule table. */
+	uint16_t pos;
+	/* Is MODIFY_HEADER action shared across flows in table. */
+	bool shared;
+	/* Amount of modification commands stored in the precompiled buffer. */
+	uint32_t mhdr_cmds_num;
+	/* Precompiled modification commands. */
+	struct mlx5_modification_cmd mhdr_cmds[MLX5_MHDR_MAX_CMD];
+};
+
 /* The maximum actions support in the flow. */
 #define MLX5_HW_MAX_ACTS 16
 
@@ -1135,6 +1220,7 @@ struct mlx5_hw_actions {
 	LIST_HEAD(act_list, mlx5_action_construct_data) act_list;
 	struct mlx5_hw_jump_action *jump; /* Jump action. */
 	struct mlx5_hrxq *tir; /* TIR action. */
+	struct mlx5_hw_modify_header_action *mhdr; /* Modify header action. */
 	/* Encap/Decap action. */
 	struct mlx5_hw_encap_decap_action *encap_decap;
 	uint16_t encap_decap_pos; /* Encap/Decap action position. */
@@ -2244,6 +2330,16 @@ int flow_dv_action_query(struct rte_eth_dev *dev,
 size_t flow_dv_get_item_hdr_len(const enum rte_flow_item_type item_type);
 int flow_dv_convert_encap_data(const struct rte_flow_item *items, uint8_t *buf,
 			   size_t *size, struct rte_flow_error *error);
+void mlx5_flow_field_id_to_modify_info
+		(const struct rte_flow_action_modify_data *data,
+		 struct field_modify_info *info, uint32_t *mask,
+		 uint32_t width, struct rte_eth_dev *dev,
+		 const struct rte_flow_attr *attr, struct rte_flow_error *error);
+int flow_dv_convert_modify_action(struct rte_flow_item *item,
+			      struct field_modify_info *field,
+			      struct field_modify_info *dcopy,
+			      struct mlx5_flow_dv_modify_hdr_resource *resource,
+			      uint32_t type, struct rte_flow_error *error);
 
 #define MLX5_PF_VPORT_ID 0
 #define MLX5_ECPF_VPORT_ID 0xFFFE
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 5dd93078ac..c3ada4815e 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -212,12 +212,6 @@ flow_dv_attr_init(const struct rte_flow_item *item, union flow_dv_attr *attr,
 	attr->valid = 1;
 }
 
-struct field_modify_info {
-	uint32_t size; /* Size of field in protocol header, in bytes. */
-	uint32_t offset; /* Offset of field in protocol header, in bytes. */
-	enum mlx5_modification_field id;
-};
-
 struct field_modify_info modify_eth[] = {
 	{4,  0, MLX5_MODI_OUT_DMAC_47_16},
 	{2,  4, MLX5_MODI_OUT_DMAC_15_0},
@@ -350,45 +344,6 @@ mlx5_update_vlan_vid_pcp(const struct rte_flow_action *action,
 	}
 }
 
-/**
- * Fetch 1, 2, 3 or 4 byte field from the byte array
- * and return as unsigned integer in host-endian format.
- *
- * @param[in] data
- *   Pointer to data array.
- * @param[in] size
- *   Size of field to extract.
- *
- * @return
- *   converted field in host endian format.
- */
-static inline uint32_t
-flow_dv_fetch_field(const uint8_t *data, uint32_t size)
-{
-	uint32_t ret;
-
-	switch (size) {
-	case 1:
-		ret = *data;
-		break;
-	case 2:
-		ret = rte_be_to_cpu_16(*(const unaligned_uint16_t *)data);
-		break;
-	case 3:
-		ret = rte_be_to_cpu_16(*(const unaligned_uint16_t *)data);
-		ret = (ret << 8) | *(data + sizeof(uint16_t));
-		break;
-	case 4:
-		ret = rte_be_to_cpu_32(*(const unaligned_uint32_t *)data);
-		break;
-	default:
-		MLX5_ASSERT(false);
-		ret = 0;
-		break;
-	}
-	return ret;
-}
-
 /**
  * Convert modify-header action to DV specification.
  *
@@ -417,7 +372,7 @@ flow_dv_fetch_field(const uint8_t *data, uint32_t size)
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
-static int
+int
 flow_dv_convert_modify_action(struct rte_flow_item *item,
 			      struct field_modify_info *field,
 			      struct field_modify_info *dcopy,
@@ -1435,7 +1390,32 @@ mlx5_flow_item_field_width(struct rte_eth_dev *dev,
 	return 0;
 }
 
-static void
+static __rte_always_inline uint8_t
+flow_modify_info_mask_8(uint32_t length, uint32_t off)
+{
+	return (0xffu >> (8 - length)) << off;
+}
+
+static __rte_always_inline uint16_t
+flow_modify_info_mask_16(uint32_t length, uint32_t off)
+{
+	return rte_cpu_to_be_16((0xffffu >> (16 - length)) << off);
+}
+
+static __rte_always_inline uint32_t
+flow_modify_info_mask_32(uint32_t length, uint32_t off)
+{
+	return rte_cpu_to_be_32((0xffffffffu >> (32 - length)) << off);
+}
+
+static __rte_always_inline uint32_t
+flow_modify_info_mask_32_masked(uint32_t length, uint32_t off, uint32_t post_mask)
+{
+	uint32_t mask = (0xffffffffu >> (32 - length)) << off;
+	return rte_cpu_to_be_32(mask & post_mask);
+}
+
+void
 mlx5_flow_field_id_to_modify_info
 		(const struct rte_flow_action_modify_data *data,
 		 struct field_modify_info *info, uint32_t *mask,
@@ -1444,323 +1424,340 @@ mlx5_flow_field_id_to_modify_info
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	uint32_t idx = 0;
-	uint32_t off = 0;
-
-	switch (data->field) {
+	uint32_t off_be = 0;
+	uint32_t length = 0;
+	switch ((int)data->field) {
 	case RTE_FLOW_FIELD_START:
 		/* not supported yet */
 		MLX5_ASSERT(false);
 		break;
 	case RTE_FLOW_FIELD_MAC_DST:
-		off = data->offset > 16 ? data->offset - 16 : 0;
-		if (mask) {
-			if (data->offset < 16) {
-				info[idx] = (struct field_modify_info){2, 4,
-						MLX5_MODI_OUT_DMAC_15_0};
-				if (width < 16) {
-					mask[1] = rte_cpu_to_be_16(0xffff >>
-								 (16 - width));
-					width = 0;
-				} else {
-					mask[1] = RTE_BE16(0xffff);
-					width -= 16;
-				}
-				if (!width)
-					break;
-				++idx;
-			}
-			info[idx] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_DMAC_47_16};
-			mask[0] = rte_cpu_to_be_32((0xffffffff >>
-						    (32 - width)) << off);
+		MLX5_ASSERT(data->offset + width <= 48);
+		off_be = 48 - (data->offset + width);
+		if (off_be < 16) {
+			info[idx] = (struct field_modify_info){2, 4,
+					MLX5_MODI_OUT_DMAC_15_0};
+			length = off_be + width <= 16 ? width : 16 - off_be;
+			if (mask)
+				mask[1] = flow_modify_info_mask_16(length,
+								   off_be);
+			else
+				info[idx].offset = off_be;
+			width -= length;
+			if (!width)
+				break;
+			off_be = 0;
+			idx++;
 		} else {
-			if (data->offset < 16)
-				info[idx++] = (struct field_modify_info){2, 0,
-						MLX5_MODI_OUT_DMAC_15_0};
-			info[idx] = (struct field_modify_info){4, off,
-						MLX5_MODI_OUT_DMAC_47_16};
+			off_be -= 16;
 		}
+		info[idx] = (struct field_modify_info){4, 0,
+				MLX5_MODI_OUT_DMAC_47_16};
+		if (mask)
+			mask[0] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_MAC_SRC:
-		off = data->offset > 16 ? data->offset - 16 : 0;
-		if (mask) {
-			if (data->offset < 16) {
-				info[idx] = (struct field_modify_info){2, 4,
-						MLX5_MODI_OUT_SMAC_15_0};
-				if (width < 16) {
-					mask[1] = rte_cpu_to_be_16(0xffff >>
-								 (16 - width));
-					width = 0;
-				} else {
-					mask[1] = RTE_BE16(0xffff);
-					width -= 16;
-				}
-				if (!width)
-					break;
-				++idx;
-			}
-			info[idx] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_SMAC_47_16};
-			mask[0] = rte_cpu_to_be_32((0xffffffff >>
-						    (32 - width)) << off);
+		MLX5_ASSERT(data->offset + width <= 48);
+		off_be = 48 - (data->offset + width);
+		if (off_be < 16) {
+			info[idx] = (struct field_modify_info){2, 4,
+					MLX5_MODI_OUT_SMAC_15_0};
+			length = off_be + width <= 16 ? width : 16 - off_be;
+			if (mask)
+				mask[1] = flow_modify_info_mask_16(length,
+								   off_be);
+			else
+				info[idx].offset = off_be;
+			width -= length;
+			if (!width)
+				break;
+			off_be = 0;
+			idx++;
 		} else {
-			if (data->offset < 16)
-				info[idx++] = (struct field_modify_info){2, 0,
-						MLX5_MODI_OUT_SMAC_15_0};
-			info[idx] = (struct field_modify_info){4, off,
-						MLX5_MODI_OUT_SMAC_47_16};
+			off_be -= 16;
 		}
+		info[idx] = (struct field_modify_info){4, 0,
+				MLX5_MODI_OUT_SMAC_47_16};
+		if (mask)
+			mask[0] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_VLAN_TYPE:
 		/* not supported yet */
 		break;
 	case RTE_FLOW_FIELD_VLAN_ID:
+		MLX5_ASSERT(data->offset + width <= 12);
+		off_be = 12 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_FIRST_VID};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0x0fff >> (12 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_MAC_TYPE:
+		MLX5_ASSERT(data->offset + width <= 16);
+		off_be = 16 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_ETHERTYPE};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0xffff >> (16 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_IPV4_DSCP:
+		MLX5_ASSERT(data->offset + width <= 6);
+		off_be = 6 - (data->offset + width);
 		info[idx] = (struct field_modify_info){1, 0,
 					MLX5_MODI_OUT_IP_DSCP};
 		if (mask)
-			mask[idx] = 0x3f >> (6 - width);
+			mask[idx] = flow_modify_info_mask_8(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_IPV4_TTL:
+		MLX5_ASSERT(data->offset + width <= 8);
+		off_be = 8 - (data->offset + width);
 		info[idx] = (struct field_modify_info){1, 0,
 					MLX5_MODI_OUT_IPV4_TTL};
 		if (mask)
-			mask[idx] = 0xff >> (8 - width);
+			mask[idx] = flow_modify_info_mask_8(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_IPV4_SRC:
+		MLX5_ASSERT(data->offset + width <= 32);
+		off_be = 32 - (data->offset + width);
 		info[idx] = (struct field_modify_info){4, 0,
 					MLX5_MODI_OUT_SIPV4};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_32(0xffffffff >>
-						     (32 - width));
+			mask[idx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_IPV4_DST:
+		MLX5_ASSERT(data->offset + width <= 32);
+		off_be = 32 - (data->offset + width);
 		info[idx] = (struct field_modify_info){4, 0,
 					MLX5_MODI_OUT_DIPV4};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_32(0xffffffff >>
-						     (32 - width));
+			mask[idx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_IPV6_DSCP:
+		MLX5_ASSERT(data->offset + width <= 6);
+		off_be = 6 - (data->offset + width);
 		info[idx] = (struct field_modify_info){1, 0,
 					MLX5_MODI_OUT_IP_DSCP};
 		if (mask)
-			mask[idx] = 0x3f >> (6 - width);
+			mask[idx] = flow_modify_info_mask_8(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_IPV6_HOPLIMIT:
+		MLX5_ASSERT(data->offset + width <= 8);
+		off_be = 8 - (data->offset + width);
 		info[idx] = (struct field_modify_info){1, 0,
 					MLX5_MODI_OUT_IPV6_HOPLIMIT};
 		if (mask)
-			mask[idx] = 0xff >> (8 - width);
+			mask[idx] = flow_modify_info_mask_8(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
-	case RTE_FLOW_FIELD_IPV6_SRC:
-		if (mask) {
-			if (data->offset < 32) {
-				info[idx] = (struct field_modify_info){4, 12,
-						MLX5_MODI_OUT_SIPV6_31_0};
-				if (width < 32) {
-					mask[3] =
-						rte_cpu_to_be_32(0xffffffff >>
-								 (32 - width));
-					width = 0;
-				} else {
-					mask[3] = RTE_BE32(0xffffffff);
-					width -= 32;
-				}
-				if (!width)
-					break;
-				++idx;
-			}
-			if (data->offset < 64) {
-				info[idx] = (struct field_modify_info){4, 8,
-						MLX5_MODI_OUT_SIPV6_63_32};
-				if (width < 32) {
-					mask[2] =
-						rte_cpu_to_be_32(0xffffffff >>
-								 (32 - width));
-					width = 0;
-				} else {
-					mask[2] = RTE_BE32(0xffffffff);
-					width -= 32;
-				}
-				if (!width)
-					break;
-				++idx;
-			}
-			if (data->offset < 96) {
-				info[idx] = (struct field_modify_info){4, 4,
-						MLX5_MODI_OUT_SIPV6_95_64};
-				if (width < 32) {
-					mask[1] =
-						rte_cpu_to_be_32(0xffffffff >>
-								 (32 - width));
-					width = 0;
-				} else {
-					mask[1] = RTE_BE32(0xffffffff);
-					width -= 32;
-				}
-				if (!width)
-					break;
-				++idx;
+	case RTE_FLOW_FIELD_IPV6_SRC: {
+		/*
+		 * Fields corresponding to IPv6 source address bytes
+		 * arranged according to network byte ordering.
+		 */
+		struct field_modify_info fields[] = {
+			{ 4, 0, MLX5_MODI_OUT_SIPV6_127_96 },
+			{ 4, 4, MLX5_MODI_OUT_SIPV6_95_64 },
+			{ 4, 8, MLX5_MODI_OUT_SIPV6_63_32 },
+			{ 4, 12, MLX5_MODI_OUT_SIPV6_31_0 },
+		};
+		/* First mask to be modified is the mask of 4th address byte. */
+		uint32_t midx = 3;
+
+		MLX5_ASSERT(data->offset + width <= 128);
+		off_be = 128 - (data->offset + width);
+		while (width > 0 && midx > 0) {
+			if (off_be < 32) {
+				info[idx] = fields[midx];
+				length = off_be + width <= 32 ?
+					 width : 32 - off_be;
+				if (mask)
+					mask[midx] = flow_modify_info_mask_32
+						(length, off_be);
+				else
+					info[idx].offset = off_be;
+				width -= length;
+				off_be = 0;
+				idx++;
+			} else {
+				off_be -= 32;
 			}
-			info[idx] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_SIPV6_127_96};
-			mask[0] = rte_cpu_to_be_32(0xffffffff >> (32 - width));
-		} else {
-			if (data->offset < 32)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_SIPV6_31_0};
-			if (data->offset < 64)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_SIPV6_63_32};
-			if (data->offset < 96)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_SIPV6_95_64};
-			if (data->offset < 128)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_SIPV6_127_96};
+			midx--;
 		}
+		if (!width)
+			break;
+		info[idx] = fields[midx];
+		if (mask)
+			mask[midx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
-	case RTE_FLOW_FIELD_IPV6_DST:
-		if (mask) {
-			if (data->offset < 32) {
-				info[idx] = (struct field_modify_info){4, 12,
-						MLX5_MODI_OUT_DIPV6_31_0};
-				if (width < 32) {
-					mask[3] =
-						rte_cpu_to_be_32(0xffffffff >>
-								 (32 - width));
-					width = 0;
-				} else {
-					mask[3] = RTE_BE32(0xffffffff);
-					width -= 32;
-				}
-				if (!width)
-					break;
-				++idx;
-			}
-			if (data->offset < 64) {
-				info[idx] = (struct field_modify_info){4, 8,
-						MLX5_MODI_OUT_DIPV6_63_32};
-				if (width < 32) {
-					mask[2] =
-						rte_cpu_to_be_32(0xffffffff >>
-								 (32 - width));
-					width = 0;
-				} else {
-					mask[2] = RTE_BE32(0xffffffff);
-					width -= 32;
-				}
-				if (!width)
-					break;
-				++idx;
-			}
-			if (data->offset < 96) {
-				info[idx] = (struct field_modify_info){4, 4,
-						MLX5_MODI_OUT_DIPV6_95_64};
-				if (width < 32) {
-					mask[1] =
-						rte_cpu_to_be_32(0xffffffff >>
-								 (32 - width));
-					width = 0;
-				} else {
-					mask[1] = RTE_BE32(0xffffffff);
-					width -= 32;
-				}
-				if (!width)
-					break;
-				++idx;
+	}
+	case RTE_FLOW_FIELD_IPV6_DST: {
+		/*
+		 * Fields corresponding to IPv6 destination address bytes
+		 * arranged according to network byte ordering.
+		 */
+		struct field_modify_info fields[] = {
+			{ 4, 0, MLX5_MODI_OUT_DIPV6_127_96 },
+			{ 4, 4, MLX5_MODI_OUT_DIPV6_95_64 },
+			{ 4, 8, MLX5_MODI_OUT_DIPV6_63_32 },
+			{ 4, 12, MLX5_MODI_OUT_DIPV6_31_0 },
+		};
+		/* First mask to be modified is the mask of 4th address byte. */
+		uint32_t midx = 3;
+
+		MLX5_ASSERT(data->offset + width <= 128);
+		off_be = 128 - (data->offset + width);
+		while (width > 0 && midx > 0) {
+			if (off_be < 32) {
+				info[idx] = fields[midx];
+				length = off_be + width <= 32 ?
+					 width : 32 - off_be;
+				if (mask)
+					mask[midx] = flow_modify_info_mask_32
+						(length, off_be);
+				else
+					info[idx].offset = off_be;
+				width -= length;
+				off_be = 0;
+				idx++;
+			} else {
+				off_be -= 32;
 			}
-			info[idx] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_DIPV6_127_96};
-			mask[0] = rte_cpu_to_be_32(0xffffffff >> (32 - width));
-		} else {
-			if (data->offset < 32)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_DIPV6_31_0};
-			if (data->offset < 64)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_DIPV6_63_32};
-			if (data->offset < 96)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_DIPV6_95_64};
-			if (data->offset < 128)
-				info[idx++] = (struct field_modify_info){4, 0,
-						MLX5_MODI_OUT_DIPV6_127_96};
+			midx--;
 		}
+		if (!width)
+			break;
+		info[idx] = fields[midx];
+		if (mask)
+			mask[midx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
+	}
 	case RTE_FLOW_FIELD_TCP_PORT_SRC:
+		MLX5_ASSERT(data->offset + width <= 16);
+		off_be = 16 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_TCP_SPORT};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0xffff >> (16 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_TCP_PORT_DST:
+		MLX5_ASSERT(data->offset + width <= 16);
+		off_be = 16 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_TCP_DPORT};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0xffff >> (16 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_TCP_SEQ_NUM:
+		MLX5_ASSERT(data->offset + width <= 32);
+		off_be = 32 - (data->offset + width);
 		info[idx] = (struct field_modify_info){4, 0,
 					MLX5_MODI_OUT_TCP_SEQ_NUM};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_32(0xffffffff >>
-						     (32 - width));
+			mask[idx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_TCP_ACK_NUM:
+		MLX5_ASSERT(data->offset + width <= 32);
+		off_be = 32 - (data->offset + width);
 		info[idx] = (struct field_modify_info){4, 0,
 					MLX5_MODI_OUT_TCP_ACK_NUM};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_32(0xffffffff >>
-						     (32 - width));
+			mask[idx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_TCP_FLAGS:
+		MLX5_ASSERT(data->offset + width <= 9);
+		off_be = 9 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_TCP_FLAGS};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0x1ff >> (9 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_UDP_PORT_SRC:
+		MLX5_ASSERT(data->offset + width <= 16);
+		off_be = 16 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_UDP_SPORT};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0xffff >> (16 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_UDP_PORT_DST:
+		MLX5_ASSERT(data->offset + width <= 16);
+		off_be = 16 - (data->offset + width);
 		info[idx] = (struct field_modify_info){2, 0,
 					MLX5_MODI_OUT_UDP_DPORT};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_16(0xffff >> (16 - width));
+			mask[idx] = flow_modify_info_mask_16(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_VXLAN_VNI:
-		/* not supported yet */
+		MLX5_ASSERT(data->offset + width <= 24);
+		/* VNI is on bits 31-8 of TUNNEL_HDR_DW_1. */
+		off_be = 24 - (data->offset + width) + 8;
+		info[idx] = (struct field_modify_info){4, 0,
+					MLX5_MODI_TUNNEL_HDR_DW_1};
+		if (mask)
+			mask[idx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_GENEVE_VNI:
 		/* not supported yet*/
 		break;
 	case RTE_FLOW_FIELD_GTP_TEID:
+		MLX5_ASSERT(data->offset + width <= 32);
+		off_be = 32 - (data->offset + width);
 		info[idx] = (struct field_modify_info){4, 0,
 					MLX5_MODI_GTP_TEID};
 		if (mask)
-			mask[idx] = rte_cpu_to_be_32(0xffffffff >>
-						     (32 - width));
+			mask[idx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_TAG:
 		{
-			int reg = mlx5_flow_get_reg_id(dev, MLX5_APP_TAG,
-						   data->level, error);
+			MLX5_ASSERT(data->offset + width <= 32);
+			int reg;
+
+			if (priv->sh->config.dv_flow_en == 2)
+				reg = REG_C_1;
+			else
+				reg = mlx5_flow_get_reg_id(dev, MLX5_APP_TAG,
+							   data->level, error);
 			if (reg < 0)
 				return;
 			MLX5_ASSERT(reg != REG_NON);
@@ -1768,15 +1765,18 @@ mlx5_flow_field_id_to_modify_info
 			info[idx] = (struct field_modify_info){4, 0,
 						reg_to_field[reg]};
 			if (mask)
-				mask[idx] =
-					rte_cpu_to_be_32(0xffffffff >>
-							 (32 - width));
+				mask[idx] = flow_modify_info_mask_32
+					(width, data->offset);
+			else
+				info[idx].offset = data->offset;
 		}
 		break;
 	case RTE_FLOW_FIELD_MARK:
 		{
 			uint32_t mark_mask = priv->sh->dv_mark_mask;
 			uint32_t mark_count = __builtin_popcount(mark_mask);
+			RTE_SET_USED(mark_count);
+			MLX5_ASSERT(data->offset + width <= mark_count);
 			int reg = mlx5_flow_get_reg_id(dev, MLX5_FLOW_MARK,
 						       0, error);
 			if (reg < 0)
@@ -1786,14 +1786,18 @@ mlx5_flow_field_id_to_modify_info
 			info[idx] = (struct field_modify_info){4, 0,
 						reg_to_field[reg]};
 			if (mask)
-				mask[idx] = rte_cpu_to_be_32((mark_mask >>
-					 (mark_count - width)) & mark_mask);
+				mask[idx] = flow_modify_info_mask_32_masked
+					(width, data->offset, mark_mask);
+			else
+				info[idx].offset = data->offset;
 		}
 		break;
 	case RTE_FLOW_FIELD_META:
 		{
 			uint32_t meta_mask = priv->sh->dv_meta_mask;
 			uint32_t meta_count = __builtin_popcount(meta_mask);
+			RTE_SET_USED(meta_count);
+			MLX5_ASSERT(data->offset + width <= meta_count);
 			int reg = flow_dv_get_metadata_reg(dev, attr, error);
 			if (reg < 0)
 				return;
@@ -1802,16 +1806,32 @@ mlx5_flow_field_id_to_modify_info
 			info[idx] = (struct field_modify_info){4, 0,
 						reg_to_field[reg]};
 			if (mask)
-				mask[idx] = rte_cpu_to_be_32((meta_mask >>
-					(meta_count - width)) & meta_mask);
+				mask[idx] = flow_modify_info_mask_32_masked
+					(width, data->offset, meta_mask);
+			else
+				info[idx].offset = data->offset;
 		}
 		break;
 	case RTE_FLOW_FIELD_IPV4_ECN:
 	case RTE_FLOW_FIELD_IPV6_ECN:
+		MLX5_ASSERT(data->offset + width <= 2);
+		off_be = 2 - (data->offset + width);
 		info[idx] = (struct field_modify_info){1, 0,
 					MLX5_MODI_OUT_IP_ECN};
 		if (mask)
-			mask[idx] = 0x3 >> (2 - width);
+			mask[idx] = flow_modify_info_mask_8(width, off_be);
+		else
+			info[idx].offset = off_be;
+		break;
+	case RTE_FLOW_FIELD_GTP_PSC_QFI:
+		MLX5_ASSERT(data->offset + width <= 8);
+		off_be = data->offset + 8;
+		info[idx] = (struct field_modify_info){4, 0,
+					MLX5_MODI_GTPU_FIRST_EXT_DW_0};
+		if (mask)
+			mask[idx] = flow_modify_info_mask_32(width, off_be);
+		else
+			info[idx].offset = off_be;
 		break;
 	case RTE_FLOW_FIELD_POINTER:
 	case RTE_FLOW_FIELD_VALUE:
@@ -1861,7 +1881,8 @@ flow_dv_convert_action_modify_field
 
 	if (conf->src.field == RTE_FLOW_FIELD_POINTER ||
 	    conf->src.field == RTE_FLOW_FIELD_VALUE) {
-		type = MLX5_MODIFICATION_TYPE_SET;
+		type = conf->operation == RTE_FLOW_MODIFY_SET ?
+			MLX5_MODIFICATION_TYPE_SET : MLX5_MODIFICATION_TYPE_ADD;
 		/** For SET fill the destination field (field) first. */
 		mlx5_flow_field_id_to_modify_info(&conf->dst, field, mask,
 						  conf->width, dev,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 066ce4694b..8af87657f6 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -319,6 +319,11 @@ __flow_hw_action_template_destroy(struct rte_eth_dev *dev,
 		mlx5_hlist_unregister(priv->sh->flow_tbls, &grp->entry);
 		acts->jump = NULL;
 	}
+	if (acts->mhdr) {
+		if (acts->mhdr->action)
+			mlx5dr_action_destroy(acts->mhdr->action);
+		mlx5_free(acts->mhdr);
+	}
 }
 
 /**
@@ -425,6 +430,37 @@ __flow_hw_act_data_encap_append(struct mlx5_priv *priv,
 	return 0;
 }
 
+static __rte_always_inline int
+__flow_hw_act_data_hdr_modify_append(struct mlx5_priv *priv,
+				     struct mlx5_hw_actions *acts,
+				     enum rte_flow_action_type type,
+				     uint16_t action_src,
+				     uint16_t action_dst,
+				     uint16_t mhdr_cmds_off,
+				     uint16_t mhdr_cmds_end,
+				     bool shared,
+				     struct field_modify_info *field,
+				     struct field_modify_info *dcopy,
+				     uint32_t *mask)
+{
+	struct mlx5_action_construct_data *act_data;
+
+	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
+	if (!act_data)
+		return -1;
+	act_data->modify_header.mhdr_cmds_off = mhdr_cmds_off;
+	act_data->modify_header.mhdr_cmds_end = mhdr_cmds_end;
+	act_data->modify_header.shared = shared;
+	rte_memcpy(act_data->modify_header.field, field,
+		   sizeof(*field) * MLX5_ACT_MAX_MOD_FIELDS);
+	rte_memcpy(act_data->modify_header.dcopy, dcopy,
+		   sizeof(*dcopy) * MLX5_ACT_MAX_MOD_FIELDS);
+	rte_memcpy(act_data->modify_header.mask, mask,
+		   sizeof(*mask) * MLX5_ACT_MAX_MOD_FIELDS);
+	LIST_INSERT_HEAD(&acts->act_list, act_data, next);
+	return 0;
+}
+
 /**
  * Append shared RSS action to the dynamic action list.
  *
@@ -515,6 +551,265 @@ flow_hw_shared_action_translate(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static __rte_always_inline bool
+flow_hw_action_modify_field_is_shared(const struct rte_flow_action *action,
+				      const struct rte_flow_action *mask)
+{
+	const struct rte_flow_action_modify_field *v = action->conf;
+	const struct rte_flow_action_modify_field *m = mask->conf;
+
+	if (v->src.field == RTE_FLOW_FIELD_VALUE) {
+		uint32_t j;
+
+		if (m == NULL)
+			return false;
+		for (j = 0; j < RTE_DIM(m->src.value); ++j) {
+			/*
+			 * Immediate value is considered to be masked
+			 * (and thus shared by all flow rules), if mask
+			 * is non-zero. Partial mask over immediate value
+			 * is not allowed.
+			 */
+			if (m->src.value[j])
+				return true;
+		}
+		return false;
+	}
+	if (v->src.field == RTE_FLOW_FIELD_POINTER)
+		return m->src.pvalue != NULL;
+	/*
+	 * Source field types other than VALUE and
+	 * POINTER are always shared.
+	 */
+	return true;
+}
+
+static __rte_always_inline bool
+flow_hw_should_insert_nop(const struct mlx5_hw_modify_header_action *mhdr,
+			  const struct mlx5_modification_cmd *cmd)
+{
+	struct mlx5_modification_cmd last_cmd = { { 0 } };
+	struct mlx5_modification_cmd new_cmd = { { 0 } };
+	const uint32_t cmds_num = mhdr->mhdr_cmds_num;
+	unsigned int last_type;
+	bool should_insert = false;
+
+	if (cmds_num == 0)
+		return false;
+	last_cmd = *(&mhdr->mhdr_cmds[cmds_num - 1]);
+	last_cmd.data0 = rte_be_to_cpu_32(last_cmd.data0);
+	last_cmd.data1 = rte_be_to_cpu_32(last_cmd.data1);
+	last_type = last_cmd.action_type;
+	new_cmd = *cmd;
+	new_cmd.data0 = rte_be_to_cpu_32(new_cmd.data0);
+	new_cmd.data1 = rte_be_to_cpu_32(new_cmd.data1);
+	switch (new_cmd.action_type) {
+	case MLX5_MODIFICATION_TYPE_SET:
+	case MLX5_MODIFICATION_TYPE_ADD:
+		if (last_type == MLX5_MODIFICATION_TYPE_SET ||
+		    last_type == MLX5_MODIFICATION_TYPE_ADD)
+			should_insert = new_cmd.field == last_cmd.field;
+		else if (last_type == MLX5_MODIFICATION_TYPE_COPY)
+			should_insert = new_cmd.field == last_cmd.dst_field;
+		else if (last_type == MLX5_MODIFICATION_TYPE_NOP)
+			should_insert = false;
+		else
+			MLX5_ASSERT(false); /* Other types are not supported. */
+		break;
+	case MLX5_MODIFICATION_TYPE_COPY:
+		if (last_type == MLX5_MODIFICATION_TYPE_SET ||
+		    last_type == MLX5_MODIFICATION_TYPE_ADD)
+			should_insert = (new_cmd.field == last_cmd.field ||
+					 new_cmd.dst_field == last_cmd.field);
+		else if (last_type == MLX5_MODIFICATION_TYPE_COPY)
+			should_insert = (new_cmd.field == last_cmd.dst_field ||
+					 new_cmd.dst_field == last_cmd.dst_field);
+		else if (last_type == MLX5_MODIFICATION_TYPE_NOP)
+			should_insert = false;
+		else
+			MLX5_ASSERT(false); /* Other types are not supported. */
+		break;
+	default:
+		/* Other action types should be rejected on AT validation. */
+		MLX5_ASSERT(false);
+		break;
+	}
+	return should_insert;
+}
+
+static __rte_always_inline int
+flow_hw_mhdr_cmd_nop_append(struct mlx5_hw_modify_header_action *mhdr)
+{
+	struct mlx5_modification_cmd *nop;
+	uint32_t num = mhdr->mhdr_cmds_num;
+
+	if (num + 1 >= MLX5_MHDR_MAX_CMD)
+		return -ENOMEM;
+	nop = mhdr->mhdr_cmds + num;
+	nop->data0 = 0;
+	nop->action_type = MLX5_MODIFICATION_TYPE_NOP;
+	nop->data0 = rte_cpu_to_be_32(nop->data0);
+	nop->data1 = 0;
+	mhdr->mhdr_cmds_num = num + 1;
+	return 0;
+}
+
+static __rte_always_inline int
+flow_hw_mhdr_cmd_append(struct mlx5_hw_modify_header_action *mhdr,
+			struct mlx5_modification_cmd *cmd)
+{
+	uint32_t num = mhdr->mhdr_cmds_num;
+
+	if (num + 1 >= MLX5_MHDR_MAX_CMD)
+		return -ENOMEM;
+	mhdr->mhdr_cmds[num] = *cmd;
+	mhdr->mhdr_cmds_num = num + 1;
+	return 0;
+}
+
+static __rte_always_inline int
+flow_hw_converted_mhdr_cmds_append(struct mlx5_hw_modify_header_action *mhdr,
+				   struct mlx5_flow_dv_modify_hdr_resource *resource)
+{
+	uint32_t idx;
+	int ret;
+
+	for (idx = 0; idx < resource->actions_num; ++idx) {
+		struct mlx5_modification_cmd *src = &resource->actions[idx];
+
+		if (flow_hw_should_insert_nop(mhdr, src)) {
+			ret = flow_hw_mhdr_cmd_nop_append(mhdr);
+			if (ret)
+				return ret;
+		}
+		ret = flow_hw_mhdr_cmd_append(mhdr, src);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+static __rte_always_inline void
+flow_hw_modify_field_init(struct mlx5_hw_modify_header_action *mhdr,
+			  struct rte_flow_actions_template *at)
+{
+	memset(mhdr, 0, sizeof(*mhdr));
+	/* Modify header action without any commands is shared by default. */
+	mhdr->shared = true;
+	mhdr->pos = at->mhdr_off;
+}
+
+static __rte_always_inline int
+flow_hw_modify_field_compile(struct rte_eth_dev *dev,
+			     const struct rte_flow_attr *attr,
+			     const struct rte_flow_action *action_start, /* Start of AT actions. */
+			     const struct rte_flow_action *action, /* Current action from AT. */
+			     const struct rte_flow_action *action_mask, /* Current mask from AT. */
+			     struct mlx5_hw_actions *acts,
+			     struct mlx5_hw_modify_header_action *mhdr,
+			     struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_action_modify_field *conf = action->conf;
+	union {
+		struct mlx5_flow_dv_modify_hdr_resource resource;
+		uint8_t data[sizeof(struct mlx5_flow_dv_modify_hdr_resource) +
+			     sizeof(struct mlx5_modification_cmd) * MLX5_MHDR_MAX_CMD];
+	} dummy;
+	struct mlx5_flow_dv_modify_hdr_resource *resource;
+	struct rte_flow_item item = {
+		.spec = NULL,
+		.mask = NULL
+	};
+	struct field_modify_info field[MLX5_ACT_MAX_MOD_FIELDS] = {
+						{0, 0, MLX5_MODI_OUT_NONE} };
+	struct field_modify_info dcopy[MLX5_ACT_MAX_MOD_FIELDS] = {
+						{0, 0, MLX5_MODI_OUT_NONE} };
+	uint32_t mask[MLX5_ACT_MAX_MOD_FIELDS] = { 0 };
+	uint32_t type, value = 0;
+	uint16_t cmds_start, cmds_end;
+	bool shared;
+	int ret;
+
+	/*
+	 * Modify header action is shared if previous modify_field actions
+	 * are shared and currently compiled action is shared.
+	 */
+	shared = flow_hw_action_modify_field_is_shared(action, action_mask);
+	mhdr->shared &= shared;
+	if (conf->src.field == RTE_FLOW_FIELD_POINTER ||
+	    conf->src.field == RTE_FLOW_FIELD_VALUE) {
+		type = conf->operation == RTE_FLOW_MODIFY_SET ? MLX5_MODIFICATION_TYPE_SET :
+								MLX5_MODIFICATION_TYPE_ADD;
+		/* For SET/ADD fill the destination field (field) first. */
+		mlx5_flow_field_id_to_modify_info(&conf->dst, field, mask,
+						  conf->width, dev,
+						  attr, error);
+		item.spec = conf->src.field == RTE_FLOW_FIELD_POINTER ?
+				(void *)(uintptr_t)conf->src.pvalue :
+				(void *)(uintptr_t)&conf->src.value;
+		if (conf->dst.field == RTE_FLOW_FIELD_META ||
+		    conf->dst.field == RTE_FLOW_FIELD_TAG) {
+			value = *(const unaligned_uint32_t *)item.spec;
+			value = rte_cpu_to_be_32(value);
+			item.spec = &value;
+		} else if (conf->dst.field == RTE_FLOW_FIELD_GTP_PSC_QFI) {
+			/*
+			 * QFI is passed as an uint8_t integer, but it is accessed through
+			 * a 2nd least significant byte of a 32-bit field in modify header command.
+			 */
+			value = *(const uint8_t *)item.spec;
+			value = rte_cpu_to_be_32(value << 8);
+			item.spec = &value;
+		}
+	} else {
+		type = MLX5_MODIFICATION_TYPE_COPY;
+		/* For COPY fill the destination field (dcopy) without mask. */
+		mlx5_flow_field_id_to_modify_info(&conf->dst, dcopy, NULL,
+						  conf->width, dev,
+						  attr, error);
+		/* Then construct the source field (field) with mask. */
+		mlx5_flow_field_id_to_modify_info(&conf->src, field, mask,
+						  conf->width, dev,
+						  attr, error);
+	}
+	item.mask = &mask;
+	memset(&dummy, 0, sizeof(dummy));
+	resource = &dummy.resource;
+	ret = flow_dv_convert_modify_action(&item, field, dcopy, resource, type, error);
+	if (ret)
+		return ret;
+	MLX5_ASSERT(resource->actions_num > 0);
+	/*
+	 * If previous modify field action collide with this one, then insert NOP command.
+	 * This NOP command will not be a part of action's command range used to update commands
+	 * on rule creation.
+	 */
+	if (flow_hw_should_insert_nop(mhdr, &resource->actions[0])) {
+		ret = flow_hw_mhdr_cmd_nop_append(mhdr);
+		if (ret)
+			return rte_flow_error_set(error, ret, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+						  NULL, "too many modify field operations specified");
+	}
+	cmds_start = mhdr->mhdr_cmds_num;
+	ret = flow_hw_converted_mhdr_cmds_append(mhdr, resource);
+	if (ret)
+		return rte_flow_error_set(error, ret, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "too many modify field operations specified");
+
+	cmds_end = mhdr->mhdr_cmds_num;
+	if (shared)
+		return 0;
+	ret = __flow_hw_act_data_hdr_modify_append(priv, acts, RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+						   action - action_start, mhdr->pos,
+						   cmds_start, cmds_end, shared,
+						   field, dcopy, mask);
+	if (ret)
+		return rte_flow_error_set(error, ENOMEM, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "not enough memory to store modify field metadata");
+	return 0;
+}
+
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -558,10 +853,12 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 	uint16_t reformat_pos = MLX5_HW_MAX_ACTS, reformat_src = 0;
 	uint8_t *encap_data = NULL, *encap_data_m = NULL;
 	size_t data_size = 0;
+	struct mlx5_hw_modify_header_action mhdr = { 0 };
 	bool actions_end = false;
 	uint32_t type, i;
 	int err;
 
+	flow_hw_modify_field_init(&mhdr, at);
 	if (attr->transfer)
 		type = MLX5DR_TABLE_TYPE_FDB;
 	else if (attr->egress)
@@ -717,6 +1014,15 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_SEND_TO_KERNEL:
 			DRV_LOG(ERR, "send to kernel action is not supported in HW steering.");
 			goto err;
+		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
+			if (mhdr.pos == UINT16_MAX)
+				mhdr.pos = i++;
+			err = flow_hw_modify_field_compile(dev, attr, action_start,
+							   actions, masks, acts, &mhdr,
+							   error);
+			if (err)
+				goto err;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -724,6 +1030,31 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			break;
 		}
 	}
+	if (mhdr.pos != UINT16_MAX) {
+		uint32_t flags;
+		uint32_t bulk_size;
+		size_t mhdr_len;
+
+		acts->mhdr = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*acts->mhdr),
+					 0, SOCKET_ID_ANY);
+		if (!acts->mhdr)
+			goto err;
+		rte_memcpy(acts->mhdr, &mhdr, sizeof(*acts->mhdr));
+		mhdr_len = sizeof(struct mlx5_modification_cmd) * acts->mhdr->mhdr_cmds_num;
+		flags = mlx5_hw_act_flag[!!attr->group][type];
+		if (acts->mhdr->shared) {
+			flags |= MLX5DR_ACTION_FLAG_SHARED;
+			bulk_size = 0;
+		} else {
+			bulk_size = rte_log2_u32(table_attr->nb_flows);
+		}
+		acts->mhdr->action = mlx5dr_action_create_modify_header
+				(priv->dr_ctx, mhdr_len, (__be64 *)acts->mhdr->mhdr_cmds,
+				 bulk_size, flags);
+		if (!acts->mhdr->action)
+			goto err;
+		acts->rule_acts[acts->mhdr->pos].action = acts->mhdr->action;
+	}
 	if (reformat_pos != MLX5_HW_MAX_ACTS) {
 		uint8_t buf[MLX5_ENCAP_MAX_LEN];
 		bool shared_rfmt = true;
@@ -887,6 +1218,110 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static __rte_always_inline int
+flow_hw_mhdr_cmd_is_nop(const struct mlx5_modification_cmd *cmd)
+{
+	struct mlx5_modification_cmd cmd_he = {
+		.data0 = rte_be_to_cpu_32(cmd->data0),
+		.data1 = 0,
+	};
+
+	return cmd_he.action_type == MLX5_MODIFICATION_TYPE_NOP;
+}
+
+/**
+ * Construct flow action array.
+ *
+ * For action template contains dynamic actions, these actions need to
+ * be updated according to the rte_flow action during flow creation.
+ *
+ * @param[in] dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param[in] job
+ *   Pointer to job descriptor.
+ * @param[in] hw_acts
+ *   Pointer to translated actions from template.
+ * @param[in] it_idx
+ *   Item template index the action template refer to.
+ * @param[in] actions
+ *   Array of rte_flow action need to be checked.
+ * @param[in] rule_acts
+ *   Array of DR rule actions to be used during flow creation..
+ * @param[in] acts_num
+ *   Pointer to the real acts_num flow has.
+ *
+ * @return
+ *    0 on success, negative value otherwise and rte_errno is set.
+ */
+static __rte_always_inline int
+flow_hw_modify_field_construct(struct mlx5_hw_q_job *job,
+			       struct mlx5_action_construct_data *act_data,
+			       const struct mlx5_hw_actions *hw_acts,
+			       const struct rte_flow_action *action)
+{
+	const struct rte_flow_action_modify_field *mhdr_action = action->conf;
+	uint8_t values[16] = { 0 };
+	unaligned_uint32_t *value_p;
+	uint32_t i;
+	struct field_modify_info *field;
+
+	if (!hw_acts->mhdr)
+		return -1;
+	if (hw_acts->mhdr->shared || act_data->modify_header.shared)
+		return 0;
+	MLX5_ASSERT(mhdr_action->operation == RTE_FLOW_MODIFY_SET ||
+		    mhdr_action->operation == RTE_FLOW_MODIFY_ADD);
+	if (mhdr_action->src.field != RTE_FLOW_FIELD_VALUE &&
+	    mhdr_action->src.field != RTE_FLOW_FIELD_POINTER)
+		return 0;
+	if (mhdr_action->src.field == RTE_FLOW_FIELD_VALUE)
+		rte_memcpy(values, &mhdr_action->src.value, sizeof(values));
+	else
+		rte_memcpy(values, mhdr_action->src.pvalue, sizeof(values));
+	if (mhdr_action->dst.field == RTE_FLOW_FIELD_META ||
+	    mhdr_action->dst.field == RTE_FLOW_FIELD_TAG) {
+		value_p = (unaligned_uint32_t *)values;
+		*value_p = rte_cpu_to_be_32(*value_p);
+	} else if (mhdr_action->dst.field == RTE_FLOW_FIELD_GTP_PSC_QFI) {
+		uint32_t tmp;
+
+		/*
+		 * QFI is passed as an uint8_t integer, but it is accessed through
+		 * a 2nd least significant byte of a 32-bit field in modify header command.
+		 */
+		tmp = values[0];
+		value_p = (unaligned_uint32_t *)values;
+		*value_p = rte_cpu_to_be_32(tmp << 8);
+	}
+	i = act_data->modify_header.mhdr_cmds_off;
+	field = act_data->modify_header.field;
+	do {
+		uint32_t off_b;
+		uint32_t mask;
+		uint32_t data;
+		const uint8_t *mask_src;
+
+		if (i >= act_data->modify_header.mhdr_cmds_end)
+			return -1;
+		if (flow_hw_mhdr_cmd_is_nop(&job->mhdr_cmd[i])) {
+			++i;
+			continue;
+		}
+		mask_src = (const uint8_t *)act_data->modify_header.mask;
+		mask = flow_dv_fetch_field(mask_src + field->offset, field->size);
+		if (!mask) {
+			++field;
+			continue;
+		}
+		off_b = rte_bsf32(mask);
+		data = flow_dv_fetch_field(values + field->offset, field->size);
+		data = (data & mask) >> off_b;
+		job->mhdr_cmd[i++].data1 = rte_cpu_to_be_32(data);
+		++field;
+	} while (field->size);
+	return 0;
+}
+
 /**
  * Construct flow action array.
  *
@@ -931,6 +1366,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	};
 	uint32_t ft_flag;
 	size_t encap_len = 0;
+	int ret;
 
 	memcpy(rule_acts, hw_acts->rule_acts,
 	       sizeof(*rule_acts) * hw_acts->acts_num);
@@ -948,6 +1384,18 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	} else {
 		attr.ingress = 1;
 	}
+	if (hw_acts->mhdr && hw_acts->mhdr->mhdr_cmds_num > 0) {
+		uint16_t pos = hw_acts->mhdr->pos;
+
+		if (!hw_acts->mhdr->shared) {
+			rule_acts[pos].modify_header.offset =
+						job->flow->idx - 1;
+			rule_acts[pos].modify_header.data =
+						(uint8_t *)job->mhdr_cmd;
+			rte_memcpy(job->mhdr_cmd, hw_acts->mhdr->mhdr_cmds,
+				   sizeof(*job->mhdr_cmd) * hw_acts->mhdr->mhdr_cmds_num);
+		}
+	}
 	LIST_FOREACH(act_data, &hw_acts->act_list, next) {
 		uint32_t jump_group;
 		uint32_t tag;
@@ -1023,6 +1471,14 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			MLX5_ASSERT(raw_encap_data->size ==
 				    act_data->encap.len);
 			break;
+		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
+			ret = flow_hw_modify_field_construct(job,
+							     act_data,
+							     hw_acts,
+							     action);
+			if (ret)
+				return -1;
+			break;
 		default:
 			break;
 		}
@@ -1612,6 +2068,155 @@ flow_hw_table_destroy(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static bool
+flow_hw_modify_field_is_used(const struct rte_flow_action_modify_field *action,
+			     enum rte_flow_field_id field)
+{
+	return action->src.field == field || action->dst.field == field;
+}
+
+static int
+flow_hw_validate_action_modify_field(const struct rte_flow_action *action,
+				     const struct rte_flow_action *mask,
+				     struct rte_flow_error *error)
+{
+	const struct rte_flow_action_modify_field *action_conf =
+		action->conf;
+	const struct rte_flow_action_modify_field *mask_conf =
+		mask->conf;
+
+	if (action_conf->operation != mask_conf->operation)
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"modify_field operation mask and template are not equal");
+	if (action_conf->dst.field != mask_conf->dst.field)
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"destination field mask and template are not equal");
+	if (action_conf->dst.field == RTE_FLOW_FIELD_POINTER ||
+	    action_conf->dst.field == RTE_FLOW_FIELD_VALUE)
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"immediate value and pointer cannot be used as destination");
+	if (mask_conf->dst.level != UINT32_MAX)
+		return rte_flow_error_set(error, EINVAL,
+			RTE_FLOW_ERROR_TYPE_ACTION, action,
+			"destination encapsulation level must be fully masked");
+	if (mask_conf->dst.offset != UINT32_MAX)
+		return rte_flow_error_set(error, EINVAL,
+			RTE_FLOW_ERROR_TYPE_ACTION, action,
+			"destination offset level must be fully masked");
+	if (action_conf->src.field != mask_conf->src.field)
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"source field mask and template are not equal");
+	if (action_conf->src.field != RTE_FLOW_FIELD_POINTER &&
+	    action_conf->src.field != RTE_FLOW_FIELD_VALUE) {
+		if (mask_conf->src.level != UINT32_MAX)
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"source encapsulation level must be fully masked");
+		if (mask_conf->src.offset != UINT32_MAX)
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"source offset level must be fully masked");
+	}
+	if (mask_conf->width != UINT32_MAX)
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"modify_field width field must be fully masked");
+	if (flow_hw_modify_field_is_used(action_conf, RTE_FLOW_FIELD_START))
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"modifying arbitrary place in a packet is not supported");
+	if (flow_hw_modify_field_is_used(action_conf, RTE_FLOW_FIELD_VLAN_TYPE))
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"modifying vlan_type is not supported");
+	if (flow_hw_modify_field_is_used(action_conf, RTE_FLOW_FIELD_GENEVE_VNI))
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION, action,
+				"modifying Geneve VNI is not supported");
+	return 0;
+}
+
+static int
+flow_hw_action_validate(const struct rte_flow_action actions[],
+			const struct rte_flow_action masks[],
+			struct rte_flow_error *error)
+{
+	int i;
+	bool actions_end = false;
+	int ret;
+
+	for (i = 0; !actions_end; ++i) {
+		const struct rte_flow_action *action = &actions[i];
+		const struct rte_flow_action *mask = &masks[i];
+
+		if (action->type != mask->type)
+			return rte_flow_error_set(error, ENOTSUP,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  action,
+						  "mask type does not match action type");
+		switch (action->type) {
+		case RTE_FLOW_ACTION_TYPE_VOID:
+			break;
+		case RTE_FLOW_ACTION_TYPE_INDIRECT:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_MARK:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_DROP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_JUMP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_QUEUE:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_RSS:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_VXLAN_DECAP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_NVGRE_DECAP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_DECAP:
+			/* TODO: Validation logic */
+			break;
+		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
+			ret = flow_hw_validate_action_modify_field(action,
+									mask,
+									error);
+			if (ret < 0)
+				return ret;
+			break;
+		case RTE_FLOW_ACTION_TYPE_END:
+			actions_end = true;
+			break;
+		default:
+			return rte_flow_error_set(error, ENOTSUP,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  action,
+						  "action not supported in template API");
+		}
+	}
+	return 0;
+}
+
 /**
  * Create flow action template.
  *
@@ -1640,6 +2245,8 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 	int len, act_len, mask_len, i;
 	struct rte_flow_actions_template *at;
 
+	if (flow_hw_action_validate(actions, masks, error))
+		return NULL;
 	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS,
 				NULL, 0, actions, error);
 	if (act_len <= 0)
@@ -2096,6 +2703,8 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		}
 		mem_size += (sizeof(struct mlx5_hw_q_job *) +
 			    sizeof(uint8_t) * MLX5_ENCAP_MAX_LEN +
+			    sizeof(struct mlx5_modification_cmd) *
+			    MLX5_MHDR_MAX_CMD +
 			    sizeof(struct mlx5_hw_q_job)) *
 			    queue_attr[0]->size;
 	}
@@ -2107,6 +2716,7 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	}
 	for (i = 0; i < nb_queue; i++) {
 		uint8_t *encap = NULL;
+		struct mlx5_modification_cmd *mhdr_cmd = NULL;
 
 		priv->hw_q[i].job_idx = queue_attr[i]->size;
 		priv->hw_q[i].size = queue_attr[i]->size;
@@ -2118,8 +2728,10 @@ flow_hw_configure(struct rte_eth_dev *dev,
 					    &job[queue_attr[i - 1]->size];
 		job = (struct mlx5_hw_q_job *)
 		      &priv->hw_q[i].job[queue_attr[i]->size];
-		encap = (uint8_t *)&job[queue_attr[i]->size];
+		mhdr_cmd = (struct mlx5_modification_cmd *)&job[queue_attr[i]->size];
+		encap = (uint8_t *)&mhdr_cmd[queue_attr[i]->size * MLX5_MHDR_MAX_CMD];
 		for (j = 0; j < queue_attr[i]->size; j++) {
+			job[j].mhdr_cmd = &mhdr_cmd[j * MLX5_MHDR_MAX_CMD];
 			job[j].encap_data = &encap[j * MLX5_ENCAP_MAX_LEN];
 			priv->hw_q[i].job[j] = &job[j];
 		}
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v6 05/18] net/mlx5: add HW steering port action
  2022-10-20 15:41 ` [PATCH v6 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (3 preceding siblings ...)
  2022-10-20 15:41   ` [PATCH v6 04/18] net/mlx5: add modify field hws support Suanming Mou
@ 2022-10-20 15:41   ` Suanming Mou
  2022-10-24  9:44     ` Slava Ovsiienko
  2022-10-20 15:41   ` [PATCH v6 06/18] net/mlx5: add extended metadata mode for hardware steering Suanming Mou
                     ` (13 subsequent siblings)
  18 siblings, 1 reply; 140+ messages in thread
From: Suanming Mou @ 2022-10-20 15:41 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Dariusz Sosnowski

From: Dariusz Sosnowski <dsosnowski@nvidia.com>

This patch implements creating and caching of port actions for use with
HW Steering FDB flows.

Actions are created on flow template API configuration and only on the
port designated as master. Attaching and detaching ports in the same
switching domain causes an update to the port actions cache by,
respectively, creating and destroying actions.

A new devarg fdb_def_rule_en is added to control whether the default
dedicated E-Switch rule is created by the PMD implicitly or not.
The PMD sets this value to 1 by default.
If set to 0, the default E-Switch rule will not be created and the user
can create a specific E-Switch rule on the root table if needed.
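
For reference, below is a minimal, illustrative sketch (not part of this
patch) of how an application could use the new functionality: HW steering
is selected via devargs, e.g.
"dpdk-testpmd -a <PCI_BDF>,dv_flow_en=2,fdb_def_rule_en=1 -- -i", and a
fully masked REPRESENTED_PORT action in an actions template created on
the transfer proxy port resolves to the cached vport action at template
translation time. The devargs line and port numbers are hypothetical:

    #include <rte_flow.h>

    uint16_t proxy_port_id = 0; /* hypothetical E-Switch proxy port */
    struct rte_flow_actions_template_attr at_attr = { .transfer = 1 };
    /* Forward matching packets to the representor with port_id 1. */
    struct rte_flow_action_ethdev port = { .port_id = 1 };
    struct rte_flow_action_ethdev port_m = { .port_id = UINT16_MAX };
    const struct rte_flow_action acts[] = {
        { .type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT, .conf = &port },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };
    const struct rte_flow_action masks[] = {
        { .type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT, .conf = &port_m },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };
    struct rte_flow_error err;
    struct rte_flow_actions_template *at =
        rte_flow_actions_template_create(proxy_port_id, &at_attr,
                                         acts, masks, &err);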

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 doc/guides/nics/mlx5.rst           |    9 +
 drivers/net/mlx5/linux/mlx5_os.c   |   16 +-
 drivers/net/mlx5/mlx5.c            |   14 +
 drivers/net/mlx5/mlx5.h            |   26 +-
 drivers/net/mlx5/mlx5_flow.c       |   96 +-
 drivers/net/mlx5/mlx5_flow.h       |   22 +-
 drivers/net/mlx5/mlx5_flow_dv.c    |   93 +-
 drivers/net/mlx5/mlx5_flow_hw.c    | 1356 +++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_flow_verbs.c |    4 +-
 drivers/net/mlx5/mlx5_trigger.c    |   77 +-
 10 files changed, 1595 insertions(+), 118 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 303eb17714..7d2095f075 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -1132,6 +1132,15 @@ for an additional list of options shared with other mlx5 drivers.
 
   By default, the PMD will set this value to 1.
 
+- ``fdb_def_rule_en`` parameter [int]
+
+  A non-zero value enables the PMD to create a dedicated rule on the E-Switch
+  root table that forwards all incoming packets into table 1. All other rules
+  are then created at their original E-Switch table level plus one, which
+  improves the flow insertion rate by skipping the firmware-managed root table.
+  If set to 0, all rules will be created on the original E-Switch table level.
+
+  By default, the PMD will set this value to 1.
 
 Supported NICs
 --------------
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 07c238f422..4c004ee2ef 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1564,11 +1564,18 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	}
 	rte_rwlock_init(&priv->ind_tbls_lock);
 	if (priv->sh->config.dv_flow_en == 2) {
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+#ifdef HAVE_MLX5_HWS_SUPPORT
 		if (priv->vport_meta_mask)
 			flow_hw_set_port_info(eth_dev);
 		/* Only HWS requires this information. */
 		flow_hw_init_tags_set(eth_dev);
+		if (priv->sh->config.dv_esw_en &&
+		    flow_hw_create_vport_action(eth_dev)) {
+			DRV_LOG(ERR, "port %u failed to create vport action",
+				eth_dev->data->port_id);
+			err = EINVAL;
+			goto error;
+		}
 		return eth_dev;
 #else
 		DRV_LOG(ERR, "DV support is missing for HWS.");
@@ -1633,6 +1640,13 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	return eth_dev;
 error:
 	if (priv) {
+#ifdef HAVE_MLX5_HWS_SUPPORT
+		if (eth_dev &&
+		    priv->sh &&
+		    priv->sh->config.dv_flow_en == 2 &&
+		    priv->sh->config.dv_esw_en)
+			flow_hw_destroy_vport_action(eth_dev);
+#endif
 		if (priv->mreg_cp_tbl)
 			mlx5_hlist_destroy(priv->mreg_cp_tbl);
 		if (priv->sh)
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index a34fbcf74d..470b9c2d0f 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -172,6 +172,9 @@
 /* Device parameter to configure the delay drop when creating Rxqs. */
 #define MLX5_DELAY_DROP "delay_drop"
 
+/* Device parameter to create the fdb default rule in PMD */
+#define MLX5_FDB_DEFAULT_RULE_EN "fdb_def_rule_en"
+
 /* Shared memory between primary and secondary processes. */
 struct mlx5_shared_data *mlx5_shared_data;
 
@@ -1239,6 +1242,8 @@ mlx5_dev_args_check_handler(const char *key, const char *val, void *opaque)
 		config->decap_en = !!tmp;
 	} else if (strcmp(MLX5_ALLOW_DUPLICATE_PATTERN, key) == 0) {
 		config->allow_duplicate_pattern = !!tmp;
+	} else if (strcmp(MLX5_FDB_DEFAULT_RULE_EN, key) == 0) {
+		config->fdb_def_rule = !!tmp;
 	}
 	return 0;
 }
@@ -1274,6 +1279,7 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 		MLX5_RECLAIM_MEM,
 		MLX5_DECAP_EN,
 		MLX5_ALLOW_DUPLICATE_PATTERN,
+		MLX5_FDB_DEFAULT_RULE_EN,
 		NULL,
 	};
 	int ret = 0;
@@ -1285,6 +1291,7 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 	config->dv_flow_en = 1;
 	config->decap_en = 1;
 	config->allow_duplicate_pattern = 1;
+	config->fdb_def_rule = 1;
 	if (mkvlist != NULL) {
 		/* Process parameters. */
 		ret = mlx5_kvargs_process(mkvlist, params,
@@ -1360,6 +1367,7 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 	DRV_LOG(DEBUG, "\"decap_en\" is %u.", config->decap_en);
 	DRV_LOG(DEBUG, "\"allow_duplicate_pattern\" is %u.",
 		config->allow_duplicate_pattern);
+	DRV_LOG(DEBUG, "\"fdb_def_rule_en\" is %u.", config->fdb_def_rule);
 	return 0;
 }
 
@@ -1943,6 +1951,7 @@ mlx5_dev_close(struct rte_eth_dev *dev)
 	mlx5_flex_parser_ecpri_release(dev);
 	mlx5_flex_item_port_cleanup(dev);
 #ifdef HAVE_MLX5_HWS_SUPPORT
+	flow_hw_destroy_vport_action(dev);
 	flow_hw_resource_release(dev);
 	flow_hw_clear_port_info(dev);
 	if (priv->sh->config.dv_flow_en == 2)
@@ -2644,6 +2653,11 @@ mlx5_probe_again_args_validate(struct mlx5_common_device *cdev,
 			sh->ibdev_name);
 		goto error;
 	}
+	if (sh->config.fdb_def_rule ^ config->fdb_def_rule) {
+		DRV_LOG(ERR, "\"fdb_def_rule_en\" configuration mismatch for shared %s context.",
+			sh->ibdev_name);
+		goto error;
+	}
 	if (sh->config.l3_vxlan_en ^ config->l3_vxlan_en) {
 		DRV_LOG(ERR, "\"l3_vxlan_en\" "
 			"configuration mismatch for shared %s context.",
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 4ca53a62f5..fc37b06bd1 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -314,6 +314,7 @@ struct mlx5_sh_config {
 	uint32_t allow_duplicate_pattern:1;
 	uint32_t lro_allowed:1; /* Whether LRO is allowed. */
 	/* Allow/Prevent the duplicate rules pattern. */
+	uint32_t fdb_def_rule:1; /* Create FDB default jump rule */
 };
 
 
@@ -342,6 +343,8 @@ enum {
 	MLX5_HW_Q_JOB_TYPE_DESTROY, /* Flow destroy job type. */
 };
 
+#define MLX5_HW_MAX_ITEMS (16)
+
 /* HW steering flow management job descriptor. */
 struct mlx5_hw_q_job {
 	uint32_t type; /* Job type. */
@@ -349,6 +352,8 @@ struct mlx5_hw_q_job {
 	void *user_data; /* Job user data. */
 	uint8_t *encap_data; /* Encap data. */
 	struct mlx5_modification_cmd *mhdr_cmd;
+	struct rte_flow_item *items;
+	struct rte_flow_item_ethdev port_spec;
 };
 
 /* HW steering job descriptor LIFO pool. */
@@ -1212,6 +1217,8 @@ struct mlx5_dev_ctx_shared {
 	uint32_t flow_priority_check_flag:1; /* Check Flag for flow priority. */
 	uint32_t metadata_regc_check_flag:1; /* Check Flag for metadata REGC. */
 	uint32_t hws_tags:1; /* Check if tags info for HWS initialized. */
+	uint32_t shared_mark_enabled:1;
+	/* If mark action is enabled on Rxqs (shared E-Switch domain). */
 	uint32_t max_port; /* Maximal IB device port index. */
 	struct mlx5_bond_info bond; /* Bonding information. */
 	struct mlx5_common_device *cdev; /* Backend mlx5 device. */
@@ -1463,6 +1470,12 @@ struct mlx5_obj_ops {
 
 #define MLX5_RSS_HASH_FIELDS_LEN RTE_DIM(mlx5_rss_hash_fields)
 
+struct mlx5_hw_ctrl_flow {
+	LIST_ENTRY(mlx5_hw_ctrl_flow) next;
+	struct rte_eth_dev *owner_dev;
+	struct rte_flow *flow;
+};
+
 struct mlx5_priv {
 	struct rte_eth_dev_data *dev_data;  /* Pointer to device data. */
 	struct mlx5_dev_ctx_shared *sh; /* Shared device context. */
@@ -1503,6 +1516,11 @@ struct mlx5_priv {
 	unsigned int reta_idx_n; /* RETA index size. */
 	struct mlx5_drop drop_queue; /* Flow drop queues. */
 	void *root_drop_action; /* Pointer to root drop action. */
+	rte_spinlock_t hw_ctrl_lock;
+	LIST_HEAD(hw_ctrl_flow, mlx5_hw_ctrl_flow) hw_ctrl_flows;
+	struct rte_flow_template_table *hw_esw_sq_miss_root_tbl;
+	struct rte_flow_template_table *hw_esw_sq_miss_tbl;
+	struct rte_flow_template_table *hw_esw_zero_tbl;
 	struct mlx5_indexed_pool *flows[MLX5_FLOW_TYPE_MAXI];
 	/* RTE Flow rules. */
 	uint32_t ctrl_flows; /* Control flow rules. */
@@ -1563,11 +1581,11 @@ struct mlx5_priv {
 	struct mlx5_hw_q *hw_q;
 	/* HW steering rte flow table list header. */
 	LIST_HEAD(flow_hw_tbl, rte_flow_template_table) flow_hw_tbl;
+	struct mlx5dr_action **hw_vport;
 	/* HW steering global drop action. */
-	struct mlx5dr_action *hw_drop[MLX5_HW_ACTION_FLAG_MAX]
-				     [MLX5DR_TABLE_TYPE_MAX];
-	/* HW steering global drop action. */
-	struct mlx5dr_action *hw_tag[MLX5_HW_ACTION_FLAG_MAX];
+	struct mlx5dr_action *hw_drop[2];
+	/* HW steering global tag action. */
+	struct mlx5dr_action *hw_tag[2];
 	struct mlx5_indexed_pool *acts_ipool; /* Action data indexed pool. */
 #endif
 };
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index eb8faf90f7..a7da9c923d 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -1001,6 +1001,7 @@ static const struct rte_flow_ops mlx5_flow_ops = {
 	.flex_item_create = mlx5_flow_flex_item_create,
 	.flex_item_release = mlx5_flow_flex_item_release,
 	.info_get = mlx5_flow_info_get,
+	.pick_transfer_proxy = mlx5_flow_pick_transfer_proxy,
 	.configure = mlx5_flow_port_configure,
 	.pattern_template_create = mlx5_flow_pattern_template_create,
 	.pattern_template_destroy = mlx5_flow_pattern_template_destroy,
@@ -1244,7 +1245,7 @@ mlx5_get_lowest_priority(struct rte_eth_dev *dev,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 
-	if (!attr->group && !attr->transfer)
+	if (!attr->group && !(attr->transfer && priv->fdb_def_rule))
 		return priv->sh->flow_max_priority - 2;
 	return MLX5_NON_ROOT_FLOW_MAX_PRIO - 1;
 }
@@ -1271,11 +1272,14 @@ mlx5_get_matcher_priority(struct rte_eth_dev *dev,
 	uint16_t priority = (uint16_t)attr->priority;
 	struct mlx5_priv *priv = dev->data->dev_private;
 
+	/* NIC root rules */
 	if (!attr->group && !attr->transfer) {
 		if (attr->priority == MLX5_FLOW_LOWEST_PRIO_INDICATOR)
 			priority = priv->sh->flow_max_priority - 1;
 		return mlx5_os_flow_adjust_priority(dev, priority, subpriority);
-	} else if (!external && attr->transfer && attr->group == 0 &&
+	/* FDB root rules */
+	} else if (attr->transfer && (!external || !priv->fdb_def_rule) &&
+		   attr->group == 0 &&
 		   attr->priority == MLX5_FLOW_LOWEST_PRIO_INDICATOR) {
 		return (priv->sh->flow_max_priority - 1) * 3;
 	}
@@ -1483,13 +1487,32 @@ flow_rxq_mark_flag_set(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_rxq_ctrl *rxq_ctrl;
+	uint16_t port_id;
 
-	if (priv->mark_enabled)
+	if (priv->sh->shared_mark_enabled)
 		return;
-	LIST_FOREACH(rxq_ctrl, &priv->rxqsctrl, next) {
-		rxq_ctrl->rxq.mark = 1;
+	if (priv->master || priv->representor) {
+		MLX5_ETH_FOREACH_DEV(port_id, dev->device) {
+			struct mlx5_priv *opriv =
+				rte_eth_devices[port_id].data->dev_private;
+
+			if (!opriv ||
+			    opriv->sh != priv->sh ||
+			    opriv->domain_id != priv->domain_id ||
+			    opriv->mark_enabled)
+				continue;
+			LIST_FOREACH(rxq_ctrl, &opriv->rxqsctrl, next) {
+				rxq_ctrl->rxq.mark = 1;
+			}
+			opriv->mark_enabled = 1;
+		}
+	} else {
+		LIST_FOREACH(rxq_ctrl, &priv->rxqsctrl, next) {
+			rxq_ctrl->rxq.mark = 1;
+		}
+		priv->mark_enabled = 1;
 	}
-	priv->mark_enabled = 1;
+	priv->sh->shared_mark_enabled = 1;
 }
 
 /**
@@ -1625,6 +1648,7 @@ flow_rxq_flags_clear(struct rte_eth_dev *dev)
 		rxq->ctrl->rxq.tunnel = 0;
 	}
 	priv->mark_enabled = 0;
+	priv->sh->shared_mark_enabled = 0;
 }
 
 /**
@@ -2810,8 +2834,8 @@ mlx5_flow_validate_item_tcp(const struct rte_flow_item *item,
  *   Item specification.
  * @param[in] item_flags
  *   Bit-fields that holds the items detected until now.
- * @param[in] attr
- *   Flow rule attributes.
+ * @param root
+ *   Whether action is on root table.
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -2823,7 +2847,7 @@ mlx5_flow_validate_item_vxlan(struct rte_eth_dev *dev,
 			      uint16_t udp_dport,
 			      const struct rte_flow_item *item,
 			      uint64_t item_flags,
-			      const struct rte_flow_attr *attr,
+			      bool root,
 			      struct rte_flow_error *error)
 {
 	const struct rte_flow_item_vxlan *spec = item->spec;
@@ -2860,12 +2884,11 @@ mlx5_flow_validate_item_vxlan(struct rte_eth_dev *dev,
 	if (priv->sh->steering_format_version !=
 	    MLX5_STEERING_LOGIC_FORMAT_CONNECTX_5 ||
 	    !udp_dport || udp_dport == MLX5_UDP_PORT_VXLAN) {
-		/* FDB domain & NIC domain non-zero group */
-		if ((attr->transfer || attr->group) && priv->sh->misc5_cap)
+		/* non-root table */
+		if (!root && priv->sh->misc5_cap)
 			valid_mask = &nic_mask;
 		/* Group zero in NIC domain */
-		if (!attr->group && !attr->transfer &&
-		    priv->sh->tunnel_header_0_1)
+		if (!root && priv->sh->tunnel_header_0_1)
 			valid_mask = &nic_mask;
 	}
 	ret = mlx5_flow_item_acceptable
@@ -3104,11 +3127,11 @@ mlx5_flow_validate_item_gre_option(struct rte_eth_dev *dev,
 	if (mask->checksum_rsvd.checksum || mask->sequence.sequence) {
 		if (priv->sh->steering_format_version ==
 		    MLX5_STEERING_LOGIC_FORMAT_CONNECTX_5 ||
-		    ((attr->group || attr->transfer) &&
+		    ((attr->group || (attr->transfer && priv->fdb_def_rule)) &&
 		     !priv->sh->misc5_cap) ||
 		    (!(priv->sh->tunnel_header_0_1 &&
 		       priv->sh->tunnel_header_2_3) &&
-		    !attr->group && !attr->transfer))
+		    !attr->group && (!attr->transfer || !priv->fdb_def_rule)))
 			return rte_flow_error_set(error, EINVAL,
 						  RTE_FLOW_ERROR_TYPE_ITEM,
 						  item,
@@ -6165,7 +6188,8 @@ flow_create_split_metadata(struct rte_eth_dev *dev,
 	}
 	if (qrss) {
 		/* Check if it is in meter suffix table. */
-		mtr_sfx = attr->group == (attr->transfer ?
+		mtr_sfx = attr->group ==
+			  ((attr->transfer && priv->fdb_def_rule) ?
 			  (MLX5_FLOW_TABLE_LEVEL_METER - 1) :
 			  MLX5_FLOW_TABLE_LEVEL_METER);
 		/*
@@ -11120,3 +11144,43 @@ int mlx5_flow_get_item_vport_id(struct rte_eth_dev *dev,
 
 	return 0;
 }
+
+int
+mlx5_flow_pick_transfer_proxy(struct rte_eth_dev *dev,
+			      uint16_t *proxy_port_id,
+			      struct rte_flow_error *error)
+{
+	const struct mlx5_priv *priv = dev->data->dev_private;
+	uint16_t port_id;
+
+	if (!priv->sh->config.dv_esw_en)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL,
+					  "unable to provide a proxy port"
+					  " without E-Switch configured");
+	if (!priv->master && !priv->representor)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL,
+					  "unable to provide a proxy port"
+					  " for port which is not a master"
+					  " or a representor port");
+	if (priv->master) {
+		*proxy_port_id = dev->data->port_id;
+		return 0;
+	}
+	MLX5_ETH_FOREACH_DEV(port_id, dev->device) {
+		const struct rte_eth_dev *port_dev = &rte_eth_devices[port_id];
+		const struct mlx5_priv *port_priv = port_dev->data->dev_private;
+
+		if (port_priv->master &&
+		    port_priv->domain_id == priv->domain_id) {
+			*proxy_port_id = port_id;
+			return 0;
+		}
+	}
+	return rte_flow_error_set(error, EINVAL,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "unable to find a proxy port");
+}
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 88a08ff877..88caec606d 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1165,6 +1165,11 @@ struct rte_flow_pattern_template {
 	struct mlx5dr_match_template *mt; /* mlx5 match template. */
 	uint64_t item_flags; /* Item layer flags. */
 	uint32_t refcnt;  /* Reference counter. */
+	/*
+	 * If true, then rule pattern should be prepended with
+	 * represented_port pattern item.
+	 */
+	bool implicit_port;
 };
 
 /* Flow action template struct. */
@@ -1240,6 +1245,7 @@ struct mlx5_hw_action_template {
 /* mlx5 flow group struct. */
 struct mlx5_flow_group {
 	struct mlx5_list_entry entry;
+	struct rte_eth_dev *dev; /* Reference to corresponding device. */
 	struct mlx5dr_table *tbl; /* HWS table object. */
 	struct mlx5_hw_jump_action jump; /* Jump action. */
 	enum mlx5dr_table_type type; /* Table type. */
@@ -1498,6 +1504,9 @@ void flow_hw_clear_port_info(struct rte_eth_dev *dev);
 void flow_hw_init_tags_set(struct rte_eth_dev *dev);
 void flow_hw_clear_tags_set(struct rte_eth_dev *dev);
 
+int flow_hw_create_vport_action(struct rte_eth_dev *dev);
+void flow_hw_destroy_vport_action(struct rte_eth_dev *dev);
+
 typedef int (*mlx5_flow_validate_t)(struct rte_eth_dev *dev,
 				    const struct rte_flow_attr *attr,
 				    const struct rte_flow_item items[],
@@ -2097,7 +2106,7 @@ int mlx5_flow_validate_item_vxlan(struct rte_eth_dev *dev,
 				  uint16_t udp_dport,
 				  const struct rte_flow_item *item,
 				  uint64_t item_flags,
-				  const struct rte_flow_attr *attr,
+				  bool root,
 				  struct rte_flow_error *error);
 int mlx5_flow_validate_item_vxlan_gpe(const struct rte_flow_item *item,
 				      uint64_t item_flags,
@@ -2356,4 +2365,15 @@ int flow_dv_translate_items_hws(const struct rte_flow_item *items,
 				uint32_t key_type, uint64_t *item_flags,
 				uint8_t *match_criteria,
 				struct rte_flow_error *error);
+
+int mlx5_flow_pick_transfer_proxy(struct rte_eth_dev *dev,
+				  uint16_t *proxy_port_id,
+				  struct rte_flow_error *error);
+
+int mlx5_flow_hw_flush_ctrl_flows(struct rte_eth_dev *dev);
+
+int mlx5_flow_hw_esw_create_mgr_sq_miss_flow(struct rte_eth_dev *dev);
+int mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev,
+					 uint32_t txq);
+int mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev);
 #endif /* RTE_PMD_MLX5_FLOW_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index c3ada4815e..f6f4f20a6f 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -2442,8 +2442,8 @@ flow_dv_validate_item_gtp(struct rte_eth_dev *dev,
  *   Previous validated item in the pattern items.
  * @param[in] gtp_item
  *   Previous GTP item specification.
- * @param[in] attr
- *   Pointer to flow attributes.
+ * @param root
+ *   Whether action is on root table.
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -2454,7 +2454,7 @@ static int
 flow_dv_validate_item_gtp_psc(const struct rte_flow_item *item,
 			      uint64_t last_item,
 			      const struct rte_flow_item *gtp_item,
-			      const struct rte_flow_attr *attr,
+			      bool root,
 			      struct rte_flow_error *error)
 {
 	const struct rte_flow_item_gtp *gtp_spec;
@@ -2479,7 +2479,7 @@ flow_dv_validate_item_gtp_psc(const struct rte_flow_item *item,
 			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM, item,
 			 "GTP E flag must be 1 to match GTP PSC");
 	/* Check the flow is not created in group zero. */
-	if (!attr->transfer && !attr->group)
+	if (root)
 		return rte_flow_error_set
 			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
 			 "GTP PSC is not supported for group 0");
@@ -3344,20 +3344,19 @@ flow_dv_validate_action_set_tag(struct rte_eth_dev *dev,
 /**
  * Indicates whether ASO aging is supported.
  *
- * @param[in] sh
- *   Pointer to shared device context structure.
- * @param[in] attr
- *   Attributes of flow that includes AGE action.
+ * @param[in] priv
+ *   Pointer to device private context structure.
+ * @param[in] root
+ *   Whether action is on root table.
  *
  * @return
  *   True when ASO aging is supported, false otherwise.
  */
 static inline bool
-flow_hit_aso_supported(const struct mlx5_dev_ctx_shared *sh,
-		const struct rte_flow_attr *attr)
+flow_hit_aso_supported(const struct mlx5_priv *priv, bool root)
 {
-	MLX5_ASSERT(sh && attr);
-	return (sh->flow_hit_aso_en && (attr->transfer || attr->group));
+	MLX5_ASSERT(priv);
+	return (priv->sh->flow_hit_aso_en && !root);
 }
 
 /**
@@ -3369,8 +3368,8 @@ flow_hit_aso_supported(const struct mlx5_dev_ctx_shared *sh,
  *   Indicator if action is shared.
  * @param[in] action_flags
  *   Holds the actions detected until now.
- * @param[in] attr
- *   Attributes of flow that includes this action.
+ * @param[in] root
+ *   Whether action is on root table.
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -3380,7 +3379,7 @@ flow_hit_aso_supported(const struct mlx5_dev_ctx_shared *sh,
 static int
 flow_dv_validate_action_count(struct rte_eth_dev *dev, bool shared,
 			      uint64_t action_flags,
-			      const struct rte_flow_attr *attr,
+			      bool root,
 			      struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -3392,7 +3391,7 @@ flow_dv_validate_action_count(struct rte_eth_dev *dev, bool shared,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "duplicate count actions set");
 	if (shared && (action_flags & MLX5_FLOW_ACTION_AGE) &&
-	    !flow_hit_aso_supported(priv->sh, attr))
+	    !flow_hit_aso_supported(priv, root))
 		return rte_flow_error_set(error, EINVAL,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "old age and indirect count combination is not supported");
@@ -3623,8 +3622,8 @@ flow_dv_validate_action_raw_encap_decap
  *   Holds the actions detected until now.
  * @param[in] item_flags
  *   The items found in this flow rule.
- * @param[in] attr
- *   Pointer to flow attributes.
+ * @param root
+ *   Whether action is on root table.
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -3635,12 +3634,12 @@ static int
 flow_dv_validate_action_aso_ct(struct rte_eth_dev *dev,
 			       uint64_t action_flags,
 			       uint64_t item_flags,
-			       const struct rte_flow_attr *attr,
+			       bool root,
 			       struct rte_flow_error *error)
 {
 	RTE_SET_USED(dev);
 
-	if (attr->group == 0 && !attr->transfer)
+	if (root)
 		return rte_flow_error_set(error, ENOTSUP,
 					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 					  NULL,
@@ -4890,6 +4889,8 @@ flow_dv_validate_action_modify_ttl(const uint64_t action_flags,
  *   Pointer to the modify action.
  * @param[in] attr
  *   Pointer to the flow attributes.
+ * @param root
+ *   Whether action is on root table.
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -4902,6 +4903,7 @@ flow_dv_validate_action_modify_field(struct rte_eth_dev *dev,
 				   const uint64_t action_flags,
 				   const struct rte_flow_action *action,
 				   const struct rte_flow_attr *attr,
+				   bool root,
 				   struct rte_flow_error *error)
 {
 	int ret = 0;
@@ -4949,7 +4951,7 @@ flow_dv_validate_action_modify_field(struct rte_eth_dev *dev,
 	}
 	if (action_modify_field->src.field != RTE_FLOW_FIELD_VALUE &&
 	    action_modify_field->src.field != RTE_FLOW_FIELD_POINTER) {
-		if (!attr->transfer && !attr->group)
+		if (root)
 			return rte_flow_error_set(error, ENOTSUP,
 					RTE_FLOW_ERROR_TYPE_ACTION, action,
 					"modify field action is not"
@@ -5039,8 +5041,7 @@ flow_dv_validate_action_modify_field(struct rte_eth_dev *dev,
 	    action_modify_field->src.field == RTE_FLOW_FIELD_IPV4_ECN ||
 	    action_modify_field->dst.field == RTE_FLOW_FIELD_IPV6_ECN ||
 	    action_modify_field->src.field == RTE_FLOW_FIELD_IPV6_ECN)
-		if (!hca_attr->modify_outer_ip_ecn &&
-		    !attr->transfer && !attr->group)
+		if (!hca_attr->modify_outer_ip_ecn && root)
 			return rte_flow_error_set(error, ENOTSUP,
 				RTE_FLOW_ERROR_TYPE_ACTION, action,
 				"modifications of the ECN for current firmware is not supported");
@@ -5074,11 +5075,12 @@ flow_dv_validate_action_jump(struct rte_eth_dev *dev,
 			     bool external, struct rte_flow_error *error)
 {
 	uint32_t target_group, table = 0;
+	struct mlx5_priv *priv = dev->data->dev_private;
 	int ret = 0;
 	struct flow_grp_info grp_info = {
 		.external = !!external,
 		.transfer = !!attributes->transfer,
-		.fdb_def_rule = 1,
+		.fdb_def_rule = !!priv->fdb_def_rule,
 		.std_tbl_fix = 0
 	};
 	if (action_flags & (MLX5_FLOW_FATE_ACTIONS |
@@ -5658,6 +5660,8 @@ flow_dv_modify_clone_free_cb(void *tool_ctx, struct mlx5_list_entry *entry)
  *   Pointer to the COUNT action in sample action list.
  * @param[out] fdb_mirror_limit
  *   Pointer to the FDB mirror limitation flag.
+ * @param root
+ *   Whether action is on root table.
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -5674,6 +5678,7 @@ flow_dv_validate_action_sample(uint64_t *action_flags,
 			       const struct rte_flow_action_rss **sample_rss,
 			       const struct rte_flow_action_count **count,
 			       int *fdb_mirror_limit,
+			       bool root,
 			       struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -5775,7 +5780,7 @@ flow_dv_validate_action_sample(uint64_t *action_flags,
 		case RTE_FLOW_ACTION_TYPE_COUNT:
 			ret = flow_dv_validate_action_count
 				(dev, false, *action_flags | sub_action_flags,
-				 attr, error);
+				 root, error);
 			if (ret < 0)
 				return ret;
 			*count = act->conf;
@@ -7255,7 +7260,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 		case RTE_FLOW_ITEM_TYPE_VXLAN:
 			ret = mlx5_flow_validate_item_vxlan(dev, udp_dport,
 							    items, item_flags,
-							    attr, error);
+							    is_root, error);
 			if (ret < 0)
 				return ret;
 			last_item = MLX5_FLOW_LAYER_VXLAN;
@@ -7349,7 +7354,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			break;
 		case RTE_FLOW_ITEM_TYPE_GTP_PSC:
 			ret = flow_dv_validate_item_gtp_psc(items, last_item,
-							    gtp_item, attr,
+							    gtp_item, is_root,
 							    error);
 			if (ret < 0)
 				return ret;
@@ -7566,7 +7571,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 		case RTE_FLOW_ACTION_TYPE_COUNT:
 			ret = flow_dv_validate_action_count(dev, shared_count,
 							    action_flags,
-							    attr, error);
+							    is_root, error);
 			if (ret < 0)
 				return ret;
 			count = actions->conf;
@@ -7860,7 +7865,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			rw_act_num += MLX5_ACT_NUM_SET_TAG;
 			break;
 		case MLX5_RTE_FLOW_ACTION_TYPE_AGE:
-			if (!attr->transfer && !attr->group)
+			if (is_root)
 				return rte_flow_error_set(error, ENOTSUP,
 						RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 									   NULL,
@@ -7885,7 +7890,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			 * Validate the regular AGE action (using counter)
 			 * mutual exclusion with indirect counter actions.
 			 */
-			if (!flow_hit_aso_supported(priv->sh, attr)) {
+			if (!flow_hit_aso_supported(priv, is_root)) {
 				if (shared_count)
 					return rte_flow_error_set
 						(error, EINVAL,
@@ -7941,6 +7946,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 							     rss, &sample_rss,
 							     &sample_count,
 							     &fdb_mirror_limit,
+							     is_root,
 							     error);
 			if (ret < 0)
 				return ret;
@@ -7957,6 +7963,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 								   action_flags,
 								   actions,
 								   attr,
+								   is_root,
 								   error);
 			if (ret < 0)
 				return ret;
@@ -7970,8 +7977,8 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			break;
 		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
 			ret = flow_dv_validate_action_aso_ct(dev, action_flags,
-							     item_flags, attr,
-							     error);
+							     item_flags,
+							     is_root, error);
 			if (ret < 0)
 				return ret;
 			action_flags |= MLX5_FLOW_ACTION_CT;
@@ -9174,15 +9181,18 @@ flow_dv_translate_item_vxlan(struct rte_eth_dev *dev,
 	if (MLX5_ITEM_VALID(item, key_type))
 		return;
 	MLX5_ITEM_UPDATE(item, key_type, vxlan_v, vxlan_m, &nic_mask);
-	if (item->mask == &nic_mask &&
-	    ((!attr->group && !priv->sh->tunnel_header_0_1) ||
-	    (attr->group && !priv->sh->misc5_cap)))
+	if ((item->mask == &nic_mask) &&
+	    ((!attr->group && !(attr->transfer && priv->fdb_def_rule) &&
+	    !priv->sh->tunnel_header_0_1) ||
+	    ((attr->group || (attr->transfer && priv->fdb_def_rule)) &&
+	    !priv->sh->misc5_cap)))
 		vxlan_m = &rte_flow_item_vxlan_mask;
 	if ((priv->sh->steering_format_version ==
 	     MLX5_STEERING_LOGIC_FORMAT_CONNECTX_5 &&
 	     dport != MLX5_UDP_PORT_VXLAN) ||
-	    (!attr->group && !attr->transfer) ||
-	    ((attr->group || attr->transfer) && !priv->sh->misc5_cap)) {
+	    (!attr->group && !(attr->transfer && priv->fdb_def_rule)) ||
+	    ((attr->group || (attr->transfer && priv->fdb_def_rule)) &&
+	    !priv->sh->misc5_cap)) {
 		misc_v = MLX5_ADDR_OF(fte_match_param, key, misc_parameters);
 		size = sizeof(vxlan_m->vni);
 		vni_v = MLX5_ADDR_OF(fte_match_set_misc, misc_v, vxlan_vni);
@@ -14210,7 +14220,7 @@ flow_dv_translate(struct rte_eth_dev *dev,
 			 */
 			if (action_flags & MLX5_FLOW_ACTION_AGE) {
 				if ((non_shared_age && count) ||
-				    !flow_hit_aso_supported(priv->sh, attr)) {
+				    !flow_hit_aso_supported(priv, !dev_flow->dv.group)) {
 					/* Creates age by counters. */
 					cnt_act = flow_dv_prepare_counter
 								(dev, dev_flow,
@@ -18365,6 +18375,7 @@ flow_dv_action_validate(struct rte_eth_dev *dev,
 			struct rte_flow_error *err)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	/* called from RTE API */
 
 	RTE_SET_USED(conf);
 	switch (action->type) {
@@ -18392,7 +18403,7 @@ flow_dv_action_validate(struct rte_eth_dev *dev,
 						"Indirect age action not supported");
 		return flow_dv_validate_action_age(0, action, dev, err);
 	case RTE_FLOW_ACTION_TYPE_COUNT:
-		return flow_dv_validate_action_count(dev, true, 0, NULL, err);
+		return flow_dv_validate_action_count(dev, true, 0, false, err);
 	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
 		if (!priv->sh->ct_aso_en)
 			return rte_flow_error_set(err, ENOTSUP,
@@ -18569,6 +18580,8 @@ flow_dv_validate_mtr_policy_acts(struct rte_eth_dev *dev,
 	bool def_green = false;
 	bool def_yellow = false;
 	const struct rte_flow_action_rss *rss_color[RTE_COLORS] = {NULL};
+	/* Called from RTE API */
+	bool is_root = !(attr->group || (attr->transfer && priv->fdb_def_rule));
 
 	if (!dev_conf->dv_esw_en)
 		def_domain &= ~MLX5_MTR_DOMAIN_TRANSFER_BIT;
@@ -18770,7 +18783,7 @@ flow_dv_validate_mtr_policy_acts(struct rte_eth_dev *dev,
 				break;
 			case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
 				ret = flow_dv_validate_action_modify_field(dev,
-					action_flags[i], act, attr, &flow_err);
+					action_flags[i], act, attr, is_root, &flow_err);
 				if (ret < 0)
 					return -rte_mtr_error_set(error,
 					  ENOTSUP,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 8af87657f6..6187ed20cb 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -20,6 +20,14 @@
 /* Default queue to flush the flows. */
 #define MLX5_DEFAULT_FLUSH_QUEUE 0
 
+/* Maximum number of rules in control flow tables */
+#define MLX5_HW_CTRL_FLOW_NB_RULES (4096)
+
+/* Flow group for SQ miss default flows. */
+#define MLX5_HW_SQ_MISS_GROUP (UINT32_MAX)
+
+static int flow_hw_flush_all_ctrl_flows(struct rte_eth_dev *dev);
+
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops;
 
 /* DR action flags with different table. */
@@ -57,6 +65,9 @@ flow_hw_rxq_flag_set(struct rte_eth_dev *dev, bool enable)
 	for (i = 0; i < priv->rxqs_n; ++i) {
 		struct mlx5_rxq_ctrl *rxq_ctrl = mlx5_rxq_ctrl_get(dev, i);
 
+		/* With RXQ start/stop feature, RXQ might be stopped. */
+		if (!rxq_ctrl)
+			continue;
 		rxq_ctrl->rxq.mark = enable;
 	}
 	priv->mark_enabled = enable;
@@ -810,6 +821,77 @@ flow_hw_modify_field_compile(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static int
+flow_hw_represented_port_compile(struct rte_eth_dev *dev,
+				 const struct rte_flow_attr *attr,
+				 const struct rte_flow_action *action_start,
+				 const struct rte_flow_action *action,
+				 const struct rte_flow_action *action_mask,
+				 struct mlx5_hw_actions *acts,
+				 uint16_t action_dst,
+				 struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_action_ethdev *v = action->conf;
+	const struct rte_flow_action_ethdev *m = action_mask->conf;
+	int ret;
+
+	if (!attr->group)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ATTR, NULL,
+					  "represented_port action cannot"
+					  " be used on group 0");
+	if (!attr->transfer)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ATTR_TRANSFER,
+					  NULL,
+					  "represented_port action requires"
+					  " transfer attribute");
+	if (attr->ingress || attr->egress)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ATTR, NULL,
+					  "represented_port action cannot"
+					  " be used with direction attributes");
+	if (!priv->master)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  "represented_port action must"
+					  " be used on proxy port");
+	if (m && !!m->port_id) {
+		struct mlx5_priv *port_priv;
+
+		port_priv = mlx5_port_to_eswitch_info(v->port_id, false);
+		if (port_priv == NULL)
+			return rte_flow_error_set
+					(error, EINVAL,
+					 RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					 "port does not exist or unable to"
+					 " obtain E-Switch info for port");
+		MLX5_ASSERT(priv->hw_vport != NULL);
+		if (priv->hw_vport[v->port_id]) {
+			acts->rule_acts[action_dst].action =
+					priv->hw_vport[v->port_id];
+		} else {
+			return rte_flow_error_set
+					(error, EINVAL,
+					 RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					 "cannot use represented_port action"
+					 " with this port");
+		}
+	} else {
+		ret = __flow_hw_act_data_general_append
+				(priv, acts, action->type,
+				 action - action_start, action_dst);
+		if (ret)
+			return rte_flow_error_set
+					(error, ENOMEM,
+					 RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					 "not enough memory to store"
+					 " vport action");
+	}
+	return 0;
+}
+
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -887,7 +969,7 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			break;
 		case RTE_FLOW_ACTION_TYPE_DROP:
 			acts->rule_acts[i++].action =
-				priv->hw_drop[!!attr->group][type];
+				priv->hw_drop[!!attr->group];
 			break;
 		case RTE_FLOW_ACTION_TYPE_MARK:
 			acts->mark = true;
@@ -1023,6 +1105,13 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			if (err)
 				goto err;
 			break;
+		case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
+			if (flow_hw_represented_port_compile
+					(dev, attr, action_start, actions,
+					 masks, acts, i, error))
+				goto err;
+			i++;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -1355,11 +1444,13 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			  struct mlx5dr_rule_action *rule_acts,
 			  uint32_t *acts_num)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow_template_table *table = job->flow->table;
 	struct mlx5_action_construct_data *act_data;
 	const struct rte_flow_action *action;
 	const struct rte_flow_action_raw_encap *raw_encap_data;
 	const struct rte_flow_item *enc_item = NULL;
+	const struct rte_flow_action_ethdev *port_action = NULL;
 	uint8_t *buf = job->encap_data;
 	struct rte_flow_attr attr = {
 			.ingress = 1,
@@ -1479,6 +1570,13 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			if (ret)
 				return -1;
 			break;
+		case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
+			port_action = action->conf;
+			if (!priv->hw_vport[port_action->port_id])
+				return -1;
+			rule_acts[act_data->action_dst].action =
+					priv->hw_vport[port_action->port_id];
+			break;
 		default:
 			break;
 		}
@@ -1491,6 +1589,52 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static const struct rte_flow_item *
+flow_hw_get_rule_items(struct rte_eth_dev *dev,
+		       struct rte_flow_template_table *table,
+		       const struct rte_flow_item items[],
+		       uint8_t pattern_template_index,
+		       struct mlx5_hw_q_job *job)
+{
+	if (table->its[pattern_template_index]->implicit_port) {
+		const struct rte_flow_item *curr_item;
+		unsigned int nb_items;
+		bool found_end;
+		unsigned int i;
+
+		/* Count number of pattern items. */
+		nb_items = 0;
+		found_end = false;
+		for (curr_item = items; !found_end; ++curr_item) {
+			++nb_items;
+			if (curr_item->type == RTE_FLOW_ITEM_TYPE_END)
+				found_end = true;
+		}
+		/* Prepend represented port item. */
+		job->port_spec = (struct rte_flow_item_ethdev){
+			.port_id = dev->data->port_id,
+		};
+		job->items[0] = (struct rte_flow_item){
+			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+			.spec = &job->port_spec,
+		};
+		found_end = false;
+		for (i = 1; i < MLX5_HW_MAX_ITEMS && i - 1 < nb_items; ++i) {
+			job->items[i] = items[i - 1];
+			if (items[i - 1].type == RTE_FLOW_ITEM_TYPE_END) {
+				found_end = true;
+				break;
+			}
+		}
+		if (i >= MLX5_HW_MAX_ITEMS && !found_end) {
+			rte_errno = ENOMEM;
+			return NULL;
+		}
+		return job->items;
+	}
+	return items;
+}
+
 /**
  * Enqueue HW steering flow creation.
  *
@@ -1542,6 +1686,7 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	struct mlx5_hw_actions *hw_acts;
 	struct rte_flow_hw *flow;
 	struct mlx5_hw_q_job *job;
+	const struct rte_flow_item *rule_items;
 	uint32_t acts_num, flow_idx;
 	int ret;
 
@@ -1568,15 +1713,23 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	job->user_data = user_data;
 	rule_attr.user_data = job;
 	hw_acts = &table->ats[action_template_index].acts;
-	/* Construct the flow action array based on the input actions.*/
-	flow_hw_actions_construct(dev, job, hw_acts, pattern_template_index,
-				  actions, rule_acts, &acts_num);
+	/* Construct the flow actions based on the input actions.*/
+	if (flow_hw_actions_construct(dev, job, hw_acts, pattern_template_index,
+				  actions, rule_acts, &acts_num)) {
+		rte_errno = EINVAL;
+		goto free;
+	}
+	rule_items = flow_hw_get_rule_items(dev, table, items,
+					    pattern_template_index, job);
+	if (!rule_items)
+		goto free;
 	ret = mlx5dr_rule_create(table->matcher,
-				 pattern_template_index, items,
+				 pattern_template_index, rule_items,
 				 action_template_index, rule_acts,
 				 &rule_attr, (struct mlx5dr_rule *)flow->rule);
 	if (likely(!ret))
 		return (struct rte_flow *)flow;
+free:
 	/* Flow created fail, return the descriptor and flow memory. */
 	mlx5_ipool_free(table->flow, flow_idx);
 	priv->hw_q[queue].job_idx++;
@@ -1757,7 +1910,9 @@ __flow_hw_pull_comp(struct rte_eth_dev *dev,
 	struct rte_flow_op_result comp[BURST_THR];
 	int ret, i, empty_loop = 0;
 
-	flow_hw_push(dev, queue, error);
+	ret = flow_hw_push(dev, queue, error);
+	if (ret < 0)
+		return ret;
 	while (pending_rules) {
 		ret = flow_hw_pull(dev, queue, comp, BURST_THR, error);
 		if (ret < 0)
@@ -2042,8 +2197,12 @@ flow_hw_table_destroy(struct rte_eth_dev *dev,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	int i;
+	uint32_t fidx = 1;
 
-	if (table->refcnt) {
+	/* Build ipool allocated object bitmap. */
+	mlx5_ipool_flush_cache(table->flow);
+	/* Check if ipool has allocated objects. */
+	if (table->refcnt || mlx5_ipool_get_next(table->flow, &fidx)) {
 		DRV_LOG(WARNING, "Table %p is still in using.", (void *)table);
 		return rte_flow_error_set(error, EBUSY,
 				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
@@ -2055,8 +2214,6 @@ flow_hw_table_destroy(struct rte_eth_dev *dev,
 		__atomic_sub_fetch(&table->its[i]->refcnt,
 				   1, __ATOMIC_RELAXED);
 	for (i = 0; i < table->nb_action_templates; i++) {
-		if (table->ats[i].acts.mark)
-			flow_hw_rxq_flag_set(dev, false);
 		__flow_hw_action_template_destroy(dev, &table->ats[i].acts);
 		__atomic_sub_fetch(&table->ats[i].action_template->refcnt,
 				   1, __ATOMIC_RELAXED);
@@ -2141,7 +2298,51 @@ flow_hw_validate_action_modify_field(const struct rte_flow_action *action,
 }
 
 static int
-flow_hw_action_validate(const struct rte_flow_action actions[],
+flow_hw_validate_action_represented_port(struct rte_eth_dev *dev,
+					 const struct rte_flow_action *action,
+					 const struct rte_flow_action *mask,
+					 struct rte_flow_error *error)
+{
+	const struct rte_flow_action_ethdev *action_conf = action->conf;
+	const struct rte_flow_action_ethdev *mask_conf = mask->conf;
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (!priv->sh->config.dv_esw_en)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  "cannot use represented_port actions"
+					  " without an E-Switch");
+	if (mask_conf->port_id) {
+		struct mlx5_priv *port_priv;
+		struct mlx5_priv *dev_priv;
+
+		port_priv = mlx5_port_to_eswitch_info(action_conf->port_id, false);
+		if (!port_priv)
+			return rte_flow_error_set(error, rte_errno,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  action,
+						  "failed to obtain E-Switch"
+						  " info for port");
+		dev_priv = mlx5_dev_to_eswitch_info(dev);
+		if (!dev_priv)
+			return rte_flow_error_set(error, rte_errno,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  action,
+						  "failed to obtain E-Switch"
+						  " info for transfer proxy");
+		if (port_priv->domain_id != dev_priv->domain_id)
+			return rte_flow_error_set(error, rte_errno,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  action,
+						  "cannot forward to port from"
+						  " a different E-Switch");
+	}
+	return 0;
+}
+
+static int
+flow_hw_action_validate(struct rte_eth_dev *dev,
+			const struct rte_flow_action actions[],
 			const struct rte_flow_action masks[],
 			struct rte_flow_error *error)
 {
@@ -2204,6 +2405,12 @@ flow_hw_action_validate(const struct rte_flow_action actions[],
 			if (ret < 0)
 				return ret;
 			break;
+		case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
+			ret = flow_hw_validate_action_represented_port
+					(dev, action, mask, error);
+			if (ret < 0)
+				return ret;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -2245,7 +2452,7 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 	int len, act_len, mask_len, i;
 	struct rte_flow_actions_template *at;
 
-	if (flow_hw_action_validate(actions, masks, error))
+	if (flow_hw_action_validate(dev, actions, masks, error))
 		return NULL;
 	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS,
 				NULL, 0, actions, error);
@@ -2328,6 +2535,46 @@ flow_hw_actions_template_destroy(struct rte_eth_dev *dev __rte_unused,
 	return 0;
 }
 
+static struct rte_flow_item *
+flow_hw_copy_prepend_port_item(const struct rte_flow_item *items,
+			       struct rte_flow_error *error)
+{
+	const struct rte_flow_item *curr_item;
+	struct rte_flow_item *copied_items;
+	bool found_end;
+	unsigned int nb_items;
+	unsigned int i;
+	size_t size;
+
+	/* Count number of pattern items. */
+	nb_items = 0;
+	found_end = false;
+	for (curr_item = items; !found_end; ++curr_item) {
+		++nb_items;
+		if (curr_item->type == RTE_FLOW_ITEM_TYPE_END)
+			found_end = true;
+	}
+	/* Allocate new array of items and prepend REPRESENTED_PORT item. */
+	size = sizeof(*copied_items) * (nb_items + 1);
+	copied_items = mlx5_malloc(MLX5_MEM_ZERO, size, 0, rte_socket_id());
+	if (!copied_items) {
+		rte_flow_error_set(error, ENOMEM,
+				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				   NULL,
+				   "cannot allocate item template");
+		return NULL;
+	}
+	copied_items[0] = (struct rte_flow_item){
+		.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+		.spec = NULL,
+		.last = NULL,
+		.mask = &rte_flow_item_ethdev_mask,
+	};
+	for (i = 1; i < nb_items + 1; ++i)
+		copied_items[i] = items[i - 1];
+	return copied_items;
+}
+
 /**
  * Create flow item template.
  *
@@ -2351,9 +2598,35 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow_pattern_template *it;
+	struct rte_flow_item *copied_items = NULL;
+	const struct rte_flow_item *tmpl_items;
 
+	if (priv->sh->config.dv_esw_en && attr->ingress) {
+		/*
+		 * Disallow pattern template with ingress and egress/transfer
+		 * attributes in order to forbid implicit port matching
+		 * on egress and transfer traffic.
+		 */
+		if (attr->egress || attr->transfer) {
+			rte_flow_error_set(error, EINVAL,
+					   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					   NULL,
+					   "item template for ingress traffic"
+					   " cannot be used for egress/transfer"
+					   " traffic when E-Switch is enabled");
+			return NULL;
+		}
+		copied_items = flow_hw_copy_prepend_port_item(items, error);
+		if (!copied_items)
+			return NULL;
+		tmpl_items = copied_items;
+	} else {
+		tmpl_items = items;
+	}
 	it = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*it), 0, rte_socket_id());
 	if (!it) {
+		if (copied_items)
+			mlx5_free(copied_items);
 		rte_flow_error_set(error, ENOMEM,
 				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				   NULL,
@@ -2361,8 +2634,10 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 		return NULL;
 	}
 	it->attr = *attr;
-	it->mt = mlx5dr_match_template_create(items, attr->relaxed_matching);
+	it->mt = mlx5dr_match_template_create(tmpl_items, attr->relaxed_matching);
 	if (!it->mt) {
+		if (copied_items)
+			mlx5_free(copied_items);
 		mlx5_free(it);
 		rte_flow_error_set(error, rte_errno,
 				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
@@ -2370,9 +2645,12 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 				   "cannot create match template");
 		return NULL;
 	}
-	it->item_flags = flow_hw_rss_item_flags_get(items);
+	it->item_flags = flow_hw_rss_item_flags_get(tmpl_items);
+	it->implicit_port = !!copied_items;
 	__atomic_fetch_add(&it->refcnt, 1, __ATOMIC_RELAXED);
 	LIST_INSERT_HEAD(&priv->flow_hw_itt, it, next);
+	if (copied_items)
+		mlx5_free(copied_items);
 	return it;
 }
 
@@ -2498,6 +2776,7 @@ flow_hw_grp_create_cb(void *tool_ctx, void *cb_ctx)
 			goto error;
 		grp_data->jump.root_action = jump;
 	}
+	grp_data->dev = dev;
 	grp_data->idx = idx;
 	grp_data->group_id = attr->group;
 	grp_data->type = dr_tbl_attr.type;
@@ -2566,7 +2845,8 @@ flow_hw_grp_match_cb(void *tool_ctx __rte_unused, struct mlx5_list_entry *entry,
 	struct rte_flow_attr *attr =
 			(struct rte_flow_attr *)ctx->data;
 
-	return (grp_data->group_id != attr->group) ||
+	return (grp_data->dev != ctx->dev) ||
+		(grp_data->group_id != attr->group) ||
 		((grp_data->type != MLX5DR_TABLE_TYPE_FDB) &&
 		attr->transfer) ||
 		((grp_data->type != MLX5DR_TABLE_TYPE_NIC_TX) &&
@@ -2629,6 +2909,545 @@ flow_hw_grp_clone_free_cb(void *tool_ctx, struct mlx5_list_entry *entry)
 	mlx5_ipool_free(sh->ipool[MLX5_IPOOL_HW_GRP], grp_data->idx);
 }
 
+/**
+ * Create and cache a vport action for given @p dev port. vport actions
+ * cache is used in HWS with FDB flows.
+ *
+ * This function does not create any action if the proxy port for @p dev
+ * port was not configured for HW Steering.
+ *
+ * This function assumes that E-Switch is enabled and PMD is running with
+ * HW Steering configured.
+ *
+ * @param dev
+ *   Pointer to Ethernet device which will be the action destination.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int
+flow_hw_create_vport_action(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_eth_dev *proxy_dev;
+	struct mlx5_priv *proxy_priv;
+	uint16_t port_id = dev->data->port_id;
+	uint16_t proxy_port_id = port_id;
+	int ret;
+
+	ret = mlx5_flow_pick_transfer_proxy(dev, &proxy_port_id, NULL);
+	if (ret)
+		return ret;
+	proxy_dev = &rte_eth_devices[proxy_port_id];
+	proxy_priv = proxy_dev->data->dev_private;
+	if (!proxy_priv->hw_vport)
+		return 0;
+	if (proxy_priv->hw_vport[port_id]) {
+		DRV_LOG(ERR, "port %u HWS vport action already created",
+			port_id);
+		return -EINVAL;
+	}
+	proxy_priv->hw_vport[port_id] = mlx5dr_action_create_dest_vport
+			(proxy_priv->dr_ctx, priv->dev_port,
+			 MLX5DR_ACTION_FLAG_HWS_FDB);
+	if (!proxy_priv->hw_vport[port_id]) {
+		DRV_LOG(ERR, "port %u unable to create HWS vport action",
+			port_id);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+/**
+ * Destroys the vport action associated with @p dev device
+ * from actions' cache.
+ *
+ * This function does not destroy any action if there is no action cached
+ * for @p dev or proxy port was not configured for HW Steering.
+ *
+ * This function assumes that E-Switch is enabled and PMD is running with
+ * HW Steering configured.
+ *
+ * @param dev
+ *   Pointer to Ethernet device which will be the action destination.
+ */
+void
+flow_hw_destroy_vport_action(struct rte_eth_dev *dev)
+{
+	struct rte_eth_dev *proxy_dev;
+	struct mlx5_priv *proxy_priv;
+	uint16_t port_id = dev->data->port_id;
+	uint16_t proxy_port_id = port_id;
+
+	if (mlx5_flow_pick_transfer_proxy(dev, &proxy_port_id, NULL))
+		return;
+	proxy_dev = &rte_eth_devices[proxy_port_id];
+	proxy_priv = proxy_dev->data->dev_private;
+	if (!proxy_priv->hw_vport || !proxy_priv->hw_vport[port_id])
+		return;
+	mlx5dr_action_destroy(proxy_priv->hw_vport[port_id]);
+	proxy_priv->hw_vport[port_id] = NULL;
+}
+
+static int
+flow_hw_create_vport_actions(struct mlx5_priv *priv)
+{
+	uint16_t port_id;
+
+	MLX5_ASSERT(!priv->hw_vport);
+	priv->hw_vport = mlx5_malloc(MLX5_MEM_ZERO,
+				     sizeof(*priv->hw_vport) * RTE_MAX_ETHPORTS,
+				     0, SOCKET_ID_ANY);
+	if (!priv->hw_vport)
+		return -ENOMEM;
+	DRV_LOG(DEBUG, "port %u :: creating vport actions", priv->dev_data->port_id);
+	DRV_LOG(DEBUG, "port %u ::    domain_id=%u", priv->dev_data->port_id, priv->domain_id);
+	MLX5_ETH_FOREACH_DEV(port_id, NULL) {
+		struct mlx5_priv *port_priv = rte_eth_devices[port_id].data->dev_private;
+
+		if (!port_priv ||
+		    port_priv->domain_id != priv->domain_id)
+			continue;
+		DRV_LOG(DEBUG, "port %u :: for port_id=%u, calling mlx5dr_action_create_dest_vport() with ibport=%u",
+			priv->dev_data->port_id, port_id, port_priv->dev_port);
+		priv->hw_vport[port_id] = mlx5dr_action_create_dest_vport
+				(priv->dr_ctx, port_priv->dev_port,
+				 MLX5DR_ACTION_FLAG_HWS_FDB);
+		DRV_LOG(DEBUG, "port %u :: priv->hw_vport[%u]=%p",
+			priv->dev_data->port_id, port_id, (void *)priv->hw_vport[port_id]);
+		if (!priv->hw_vport[port_id])
+			return -EINVAL;
+	}
+	return 0;
+}
+
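+/**
+ * Destroys all cached vport actions and releases the vport action cache
+ * of the given proxy port.
+ *
+ * @param priv
+ *   Pointer to private data of the proxy port.
+ */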
+static void
+flow_hw_free_vport_actions(struct mlx5_priv *priv)
+{
+	uint16_t port_id;
+
+	if (!priv->hw_vport)
+		return;
+	for (port_id = 0; port_id < RTE_MAX_ETHPORTS; ++port_id)
+		if (priv->hw_vport[port_id])
+			mlx5dr_action_destroy(priv->hw_vport[port_id]);
+	mlx5_free(priv->hw_vport);
+	priv->hw_vport = NULL;
+}
+
+/**
+ * Creates a flow pattern template used to match on E-Switch Manager.
+ * This template is used to set up a table for SQ miss default flow.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow pattern template on success, NULL otherwise.
+ */
+static struct rte_flow_pattern_template *
+flow_hw_create_ctrl_esw_mgr_pattern_template(struct rte_eth_dev *dev)
+{
+	struct rte_flow_pattern_template_attr attr = {
+		.relaxed_matching = 0,
+		.transfer = 1,
+	};
+	struct rte_flow_item_ethdev port_spec = {
+		.port_id = MLX5_REPRESENTED_PORT_ESW_MGR,
+	};
+	struct rte_flow_item_ethdev port_mask = {
+		.port_id = UINT16_MAX,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+			.spec = &port_spec,
+			.mask = &port_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+
+	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
+}
+
+/**
+ * Creates a flow pattern template used to match on a TX queue.
+ * This template is used to set up a table for SQ miss default flow.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow pattern template on success, NULL otherwise.
+ */
+static struct rte_flow_pattern_template *
+flow_hw_create_ctrl_sq_pattern_template(struct rte_eth_dev *dev)
+{
+	struct rte_flow_pattern_template_attr attr = {
+		.relaxed_matching = 0,
+		.transfer = 1,
+	};
+	struct mlx5_rte_flow_item_sq queue_mask = {
+		.queue = UINT32_MAX,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = (enum rte_flow_item_type)
+				MLX5_RTE_FLOW_ITEM_TYPE_SQ,
+			.mask = &queue_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+
+	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
+}
+
+/**
+ * Creates a flow pattern template with unmasked represented port matching.
+ * This template is used to set up a table for default transfer flows
+ * directing packets to group 1.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow pattern template on success, NULL otherwise.
+ */
+static struct rte_flow_pattern_template *
+flow_hw_create_ctrl_port_pattern_template(struct rte_eth_dev *dev)
+{
+	struct rte_flow_pattern_template_attr attr = {
+		.relaxed_matching = 0,
+		.transfer = 1,
+	};
+	struct rte_flow_item_ethdev port_mask = {
+		.port_id = UINT16_MAX,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+			.mask = &port_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+
+	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
+}
+
+/**
+ * Creates a flow actions template with an unmasked JUMP action. Flows
+ * based on this template will perform a jump to some group. This template
+ * is used to set up tables for control flows.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param group
+ *   Destination group for this action template.
+ *
+ * @return
+ *   Pointer to flow actions template on success, NULL otherwise.
+ */
+static struct rte_flow_actions_template *
+flow_hw_create_ctrl_jump_actions_template(struct rte_eth_dev *dev,
+					  uint32_t group)
+{
+	struct rte_flow_actions_template_attr attr = {
+		.transfer = 1,
+	};
+	struct rte_flow_action_jump jump_v = {
+		.group = group,
+	};
+	struct rte_flow_action_jump jump_m = {
+		.group = UINT32_MAX,
+	};
+	struct rte_flow_action actions_v[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump_v,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+	struct rte_flow_action actions_m[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump_m,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+
+	return flow_hw_actions_template_create(dev, &attr, actions_v, actions_m,
+					       NULL);
+}
+
+/**
+ * Creates a flow actions template with an unmasked REPRESENTED_PORT action.
+ * It is used to create control flow tables.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow actions template on success, NULL otherwise.
+ */
+static struct rte_flow_actions_template *
+flow_hw_create_ctrl_port_actions_template(struct rte_eth_dev *dev)
+{
+	struct rte_flow_actions_template_attr attr = {
+		.transfer = 1,
+	};
+	struct rte_flow_action_ethdev port_v = {
+		.port_id = 0,
+	};
+	struct rte_flow_action actions_v[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT,
+			.conf = &port_v,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+	struct rte_flow_action_ethdev port_m = {
+		.port_id = 0,
+	};
+	struct rte_flow_action actions_m[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT,
+			.conf = &port_m,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+
+	return flow_hw_actions_template_create(dev, &attr, actions_v, actions_m,
+					       NULL);
+}
+
+/**
+ * Creates a root control flow table used to redirect traffic originating
+ * from the E-Switch Manager from group 0 to the SQ miss group.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param it
+ *   Pointer to flow pattern template.
+ * @param at
+ *   Pointer to flow actions template.
+ *
+ * @return
+ *   Pointer to flow table on success, NULL otherwise.
+ */
+static struct rte_flow_template_table *
+flow_hw_create_ctrl_sq_miss_root_table(struct rte_eth_dev *dev,
+				       struct rte_flow_pattern_template *it,
+				       struct rte_flow_actions_template *at)
+{
+	struct rte_flow_template_table_attr attr = {
+		.flow_attr = {
+			.group = 0,
+			.priority = 0,
+			.ingress = 0,
+			.egress = 0,
+			.transfer = 1,
+		},
+		.nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
+	};
+
+	return flow_hw_table_create(dev, &attr, &it, 1, &at, 1, NULL);
+}
+
+/**
+ * Creates a non-root control flow table in the SQ miss group, used to
+ * forward traffic matched on a TX queue to its destination represented port.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param it
+ *   Pointer to flow pattern template.
+ * @param at
+ *   Pointer to flow actions template.
+ *
+ * @return
+ *   Pointer to flow table on success, NULL otherwise.
+ */
+static struct rte_flow_template_table *
+flow_hw_create_ctrl_sq_miss_table(struct rte_eth_dev *dev,
+				  struct rte_flow_pattern_template *it,
+				  struct rte_flow_actions_template *at)
+{
+	struct rte_flow_template_table_attr attr = {
+		.flow_attr = {
+			.group = MLX5_HW_SQ_MISS_GROUP,
+			.priority = 0,
+			.ingress = 0,
+			.egress = 0,
+			.transfer = 1,
+		},
+		.nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
+	};
+
+	return flow_hw_table_create(dev, &attr, &it, 1, &at, 1, NULL);
+}
+
+/**
+ * Creates a control flow table used to transfer traffic
+ * from group 0 to group 1.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param it
+ *   Pointer to flow pattern template.
+ * @param at
+ *   Pointer to flow actions template.
+ *
+ * @return
+ *   Pointer to flow table on success, NULL otherwise.
+ */
+static struct rte_flow_template_table *
+flow_hw_create_ctrl_jump_table(struct rte_eth_dev *dev,
+			       struct rte_flow_pattern_template *it,
+			       struct rte_flow_actions_template *at)
+{
+	struct rte_flow_template_table_attr attr = {
+		.flow_attr = {
+			.group = 0,
+			.priority = 15, /* TODO: Flow priority discovery. */
+			.ingress = 0,
+			.egress = 0,
+			.transfer = 1,
+		},
+		.nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
+	};
+
+	return flow_hw_table_create(dev, &attr, &it, 1, &at, 1, NULL);
+}
+
+/**
+ * Creates a set of flow tables used to hold control flows
+ * when E-Switch is engaged.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, a negative errno value (-EINVAL) otherwise.
+ */
+static __rte_unused int
+flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_flow_pattern_template *esw_mgr_items_tmpl = NULL;
+	struct rte_flow_pattern_template *sq_items_tmpl = NULL;
+	struct rte_flow_pattern_template *port_items_tmpl = NULL;
+	struct rte_flow_actions_template *jump_sq_actions_tmpl = NULL;
+	struct rte_flow_actions_template *port_actions_tmpl = NULL;
+	struct rte_flow_actions_template *jump_one_actions_tmpl = NULL;
+
+	/* Item templates */
+	esw_mgr_items_tmpl = flow_hw_create_ctrl_esw_mgr_pattern_template(dev);
+	if (!esw_mgr_items_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create E-Switch Manager item"
+			" template for control flows", dev->data->port_id);
+		goto error;
+	}
+	sq_items_tmpl = flow_hw_create_ctrl_sq_pattern_template(dev);
+	if (!sq_items_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create SQ item template for"
+			" control flows", dev->data->port_id);
+		goto error;
+	}
+	port_items_tmpl = flow_hw_create_ctrl_port_pattern_template(dev);
+	if (!port_items_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create port item template for"
+			" control flows", dev->data->port_id);
+		goto error;
+	}
+	/* Action templates */
+	jump_sq_actions_tmpl = flow_hw_create_ctrl_jump_actions_template(dev,
+									 MLX5_HW_SQ_MISS_GROUP);
+	if (!jump_sq_actions_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create jump action template"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
+	port_actions_tmpl = flow_hw_create_ctrl_port_actions_template(dev);
+	if (!port_actions_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create port action template"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
+	jump_one_actions_tmpl = flow_hw_create_ctrl_jump_actions_template(dev, 1);
+	if (!jump_one_actions_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create jump action template"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
+	/* Tables */
+	MLX5_ASSERT(priv->hw_esw_sq_miss_root_tbl == NULL);
+	priv->hw_esw_sq_miss_root_tbl = flow_hw_create_ctrl_sq_miss_root_table
+			(dev, esw_mgr_items_tmpl, jump_sq_actions_tmpl);
+	if (!priv->hw_esw_sq_miss_root_tbl) {
+		DRV_LOG(ERR, "port %u failed to create table for default sq miss (root table)"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
+	MLX5_ASSERT(priv->hw_esw_sq_miss_tbl == NULL);
+	priv->hw_esw_sq_miss_tbl = flow_hw_create_ctrl_sq_miss_table(dev, sq_items_tmpl,
+								     port_actions_tmpl);
+	if (!priv->hw_esw_sq_miss_tbl) {
+		DRV_LOG(ERR, "port %u failed to create table for default sq miss (non-root table)"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
+	MLX5_ASSERT(priv->hw_esw_zero_tbl == NULL);
+	priv->hw_esw_zero_tbl = flow_hw_create_ctrl_jump_table(dev, port_items_tmpl,
+							       jump_one_actions_tmpl);
+	if (!priv->hw_esw_zero_tbl) {
+		DRV_LOG(ERR, "port %u failed to create table for default jump to group 1"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
+	return 0;
+error:
+	if (priv->hw_esw_zero_tbl) {
+		flow_hw_table_destroy(dev, priv->hw_esw_zero_tbl, NULL);
+		priv->hw_esw_zero_tbl = NULL;
+	}
+	if (priv->hw_esw_sq_miss_tbl) {
+		flow_hw_table_destroy(dev, priv->hw_esw_sq_miss_tbl, NULL);
+		priv->hw_esw_sq_miss_tbl = NULL;
+	}
+	if (priv->hw_esw_sq_miss_root_tbl) {
+		flow_hw_table_destroy(dev, priv->hw_esw_sq_miss_root_tbl, NULL);
+		priv->hw_esw_sq_miss_root_tbl = NULL;
+	}
+	if (jump_one_actions_tmpl)
+		flow_hw_actions_template_destroy(dev, jump_one_actions_tmpl, NULL);
+	if (port_actions_tmpl)
+		flow_hw_actions_template_destroy(dev, port_actions_tmpl, NULL);
+	if (jump_sq_actions_tmpl)
+		flow_hw_actions_template_destroy(dev, jump_sq_actions_tmpl, NULL);
+	if (port_items_tmpl)
+		flow_hw_pattern_template_destroy(dev, port_items_tmpl, NULL);
+	if (sq_items_tmpl)
+		flow_hw_pattern_template_destroy(dev, sq_items_tmpl, NULL);
+	if (esw_mgr_items_tmpl)
+		flow_hw_pattern_template_destroy(dev, esw_mgr_items_tmpl, NULL);
+	return -EINVAL;
+}
+
 /**
  * Configure port HWS resources.
  *
@@ -2646,7 +3465,6 @@ flow_hw_grp_clone_free_cb(void *tool_ctx, struct mlx5_list_entry *entry)
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
-
 static int
 flow_hw_configure(struct rte_eth_dev *dev,
 		  const struct rte_flow_port_attr *port_attr,
@@ -2669,6 +3487,14 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		.free = mlx5_free,
 		.type = "mlx5_hw_action_construct_data",
 	};
+	/* One additional queue is appended for PMD internal operations.
+	 * The last queue is reserved for control flows.
+	 */
+	uint16_t nb_q_updated;
+	struct rte_flow_queue_attr **_queue_attr = NULL;
+	struct rte_flow_queue_attr ctrl_queue_attr = {0};
+	bool is_proxy = !!(priv->sh->config.dv_esw_en && priv->master);
+	int ret;
 
 	if (!port_attr || !nb_queue || !queue_attr) {
 		rte_errno = EINVAL;
@@ -2677,7 +3503,7 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	/* In case re-configuring, release existing context at first. */
 	if (priv->dr_ctx) {
 		/* */
-		for (i = 0; i < nb_queue; i++) {
+		for (i = 0; i < priv->nb_queue; i++) {
 			hw_q = &priv->hw_q[i];
 			/* Make sure all queues are empty. */
 			if (hw_q->size != hw_q->job_idx) {
@@ -2687,26 +3513,42 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		}
 		flow_hw_resource_release(dev);
 	}
+	ctrl_queue_attr.size = queue_attr[0]->size;
+	nb_q_updated = nb_queue + 1;
+	_queue_attr = mlx5_malloc(MLX5_MEM_ZERO,
+				  nb_q_updated *
+				  sizeof(struct rte_flow_queue_attr *),
+				  64, SOCKET_ID_ANY);
+	if (!_queue_attr) {
+		rte_errno = ENOMEM;
+		goto err;
+	}
+
+	memcpy(_queue_attr, queue_attr,
+	       sizeof(void *) * nb_queue);
+	_queue_attr[nb_queue] = &ctrl_queue_attr;
 	priv->acts_ipool = mlx5_ipool_create(&cfg);
 	if (!priv->acts_ipool)
 		goto err;
 	/* Allocate the queue job descriptor LIFO. */
-	mem_size = sizeof(priv->hw_q[0]) * nb_queue;
-	for (i = 0; i < nb_queue; i++) {
+	mem_size = sizeof(priv->hw_q[0]) * nb_q_updated;
+	for (i = 0; i < nb_q_updated; i++) {
 		/*
 		 * Check if the queues' size are all the same as the
 		 * limitation from HWS layer.
 		 */
-		if (queue_attr[i]->size != queue_attr[0]->size) {
+		if (_queue_attr[i]->size != _queue_attr[0]->size) {
 			rte_errno = EINVAL;
 			goto err;
 		}
 		mem_size += (sizeof(struct mlx5_hw_q_job *) +
+			    sizeof(struct mlx5_hw_q_job) +
 			    sizeof(uint8_t) * MLX5_ENCAP_MAX_LEN +
 			    sizeof(struct mlx5_modification_cmd) *
 			    MLX5_MHDR_MAX_CMD +
-			    sizeof(struct mlx5_hw_q_job)) *
-			    queue_attr[0]->size;
+			    sizeof(struct rte_flow_item) *
+			    MLX5_HW_MAX_ITEMS) *
+			    _queue_attr[i]->size;
 	}
 	priv->hw_q = mlx5_malloc(MLX5_MEM_ZERO, mem_size,
 				 64, SOCKET_ID_ANY);
@@ -2714,58 +3556,82 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		rte_errno = ENOMEM;
 		goto err;
 	}
-	for (i = 0; i < nb_queue; i++) {
+	for (i = 0; i < nb_q_updated; i++) {
 		uint8_t *encap = NULL;
 		struct mlx5_modification_cmd *mhdr_cmd = NULL;
+		struct rte_flow_item *items = NULL;
 
-		priv->hw_q[i].job_idx = queue_attr[i]->size;
-		priv->hw_q[i].size = queue_attr[i]->size;
+		priv->hw_q[i].job_idx = _queue_attr[i]->size;
+		priv->hw_q[i].size = _queue_attr[i]->size;
 		if (i == 0)
 			priv->hw_q[i].job = (struct mlx5_hw_q_job **)
-					    &priv->hw_q[nb_queue];
+					    &priv->hw_q[nb_q_updated];
 		else
 			priv->hw_q[i].job = (struct mlx5_hw_q_job **)
-					    &job[queue_attr[i - 1]->size];
+				&job[_queue_attr[i - 1]->size - 1].items
+				 [MLX5_HW_MAX_ITEMS];
 		job = (struct mlx5_hw_q_job *)
-		      &priv->hw_q[i].job[queue_attr[i]->size];
-		mhdr_cmd = (struct mlx5_modification_cmd *)&job[queue_attr[i]->size];
-		encap = (uint8_t *)&mhdr_cmd[queue_attr[i]->size * MLX5_MHDR_MAX_CMD];
-		for (j = 0; j < queue_attr[i]->size; j++) {
+		      &priv->hw_q[i].job[_queue_attr[i]->size];
+		mhdr_cmd = (struct mlx5_modification_cmd *)
+			   &job[_queue_attr[i]->size];
+		encap = (uint8_t *)
+			 &mhdr_cmd[_queue_attr[i]->size * MLX5_MHDR_MAX_CMD];
+		items = (struct rte_flow_item *)
+			 &encap[_queue_attr[i]->size * MLX5_ENCAP_MAX_LEN];
+		for (j = 0; j < _queue_attr[i]->size; j++) {
 			job[j].mhdr_cmd = &mhdr_cmd[j * MLX5_MHDR_MAX_CMD];
 			job[j].encap_data = &encap[j * MLX5_ENCAP_MAX_LEN];
+			job[j].items = &items[j * MLX5_HW_MAX_ITEMS];
 			priv->hw_q[i].job[j] = &job[j];
 		}
 	}
 	dr_ctx_attr.pd = priv->sh->cdev->pd;
-	dr_ctx_attr.queues = nb_queue;
+	dr_ctx_attr.queues = nb_q_updated;
 	/* Queue size should all be the same. Take the first one. */
-	dr_ctx_attr.queue_size = queue_attr[0]->size;
+	dr_ctx_attr.queue_size = _queue_attr[0]->size;
 	dr_ctx = mlx5dr_context_open(priv->sh->cdev->ctx, &dr_ctx_attr);
 	/* rte_errno has been updated by HWS layer. */
 	if (!dr_ctx)
 		goto err;
 	priv->dr_ctx = dr_ctx;
-	priv->nb_queue = nb_queue;
+	priv->nb_queue = nb_q_updated;
+	rte_spinlock_init(&priv->hw_ctrl_lock);
+	LIST_INIT(&priv->hw_ctrl_flows);
 	/* Add global actions. */
 	for (i = 0; i < MLX5_HW_ACTION_FLAG_MAX; i++) {
-		for (j = 0; j < MLX5DR_TABLE_TYPE_MAX; j++) {
-			priv->hw_drop[i][j] = mlx5dr_action_create_dest_drop
-				(priv->dr_ctx, mlx5_hw_act_flag[i][j]);
-			if (!priv->hw_drop[i][j])
-				goto err;
-		}
+		uint32_t act_flags = 0;
+
+		act_flags = mlx5_hw_act_flag[i][0] | mlx5_hw_act_flag[i][1];
+		if (is_proxy)
+			act_flags |= mlx5_hw_act_flag[i][2];
+		priv->hw_drop[i] = mlx5dr_action_create_dest_drop(priv->dr_ctx, act_flags);
+		if (!priv->hw_drop[i])
+			goto err;
 		priv->hw_tag[i] = mlx5dr_action_create_tag
 			(priv->dr_ctx, mlx5_hw_act_flag[i][0]);
 		if (!priv->hw_tag[i])
 			goto err;
 	}
+	if (is_proxy) {
+		ret = flow_hw_create_vport_actions(priv);
+		if (ret) {
+			rte_errno = -ret;
+			goto err;
+		}
+		ret = flow_hw_create_ctrl_tables(dev);
+		if (ret) {
+			rte_errno = -ret;
+			goto err;
+		}
+	}
+	if (_queue_attr)
+		mlx5_free(_queue_attr);
 	return 0;
 err:
+	flow_hw_free_vport_actions(priv);
 	for (i = 0; i < MLX5_HW_ACTION_FLAG_MAX; i++) {
-		for (j = 0; j < MLX5DR_TABLE_TYPE_MAX; j++) {
-			if (priv->hw_drop[i][j])
-				mlx5dr_action_destroy(priv->hw_drop[i][j]);
-		}
+		if (priv->hw_drop[i])
+			mlx5dr_action_destroy(priv->hw_drop[i]);
 		if (priv->hw_tag[i])
 			mlx5dr_action_destroy(priv->hw_tag[i]);
 	}
@@ -2777,6 +3643,8 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		mlx5_ipool_destroy(priv->acts_ipool);
 		priv->acts_ipool = NULL;
 	}
+	if (_queue_attr)
+		mlx5_free(_queue_attr);
 	return rte_flow_error_set(error, rte_errno,
 				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
 				  "fail to configure port");
@@ -2795,10 +3663,12 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 	struct rte_flow_template_table *tbl;
 	struct rte_flow_pattern_template *it;
 	struct rte_flow_actions_template *at;
-	int i, j;
+	int i;
 
 	if (!priv->dr_ctx)
 		return;
+	flow_hw_rxq_flag_set(dev, false);
+	flow_hw_flush_all_ctrl_flows(dev);
 	while (!LIST_EMPTY(&priv->flow_hw_tbl)) {
 		tbl = LIST_FIRST(&priv->flow_hw_tbl);
 		flow_hw_table_destroy(dev, tbl, NULL);
@@ -2812,13 +3682,12 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 		flow_hw_actions_template_destroy(dev, at, NULL);
 	}
 	for (i = 0; i < MLX5_HW_ACTION_FLAG_MAX; i++) {
-		for (j = 0; j < MLX5DR_TABLE_TYPE_MAX; j++) {
-			if (priv->hw_drop[i][j])
-				mlx5dr_action_destroy(priv->hw_drop[i][j]);
-		}
+		if (priv->hw_drop[i])
+			mlx5dr_action_destroy(priv->hw_drop[i]);
 		if (priv->hw_tag[i])
 			mlx5dr_action_destroy(priv->hw_tag[i]);
 	}
+	flow_hw_free_vport_actions(priv);
 	if (priv->acts_ipool) {
 		mlx5_ipool_destroy(priv->acts_ipool);
 		priv->acts_ipool = NULL;
@@ -3061,4 +3930,397 @@ const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.action_query = flow_dv_action_query,
 };
 
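+/**
+ * Returns the index of the queue reserved for PMD internal operations.
+ *
+ * flow_hw_configure() appends one extra queue which is used by the PMD
+ * for control flows; it is always the last queue.
+ *
+ * @param priv
+ *   Pointer to the port's private data.
+ *
+ * @return
+ *   Index of the control flow queue.
+ */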
+static uint32_t
+flow_hw_get_ctrl_queue(struct mlx5_priv *priv)
+{
+	MLX5_ASSERT(priv->nb_queue > 0);
+	return priv->nb_queue - 1;
+}
+
+/**
+ * Creates a control flow using flow template API on @p proxy_dev device,
+ * on behalf of @p owner_dev device.
+ *
+ * This function uses locks internally to synchronize access to the
+ * flow queue.
+ *
+ * The created flow is stored in a private list associated with the
+ * @p proxy_dev device.
+ *
+ * @param owner_dev
+ *   Pointer to Ethernet device on behalf of which flow is created.
+ * @param proxy_dev
+ *   Pointer to Ethernet device on which flow is created.
+ * @param table
+ *   Pointer to flow table.
+ * @param items
+ *   Pointer to flow rule items.
+ * @param item_template_idx
+ *   Index of an item template associated with @p table.
+ * @param actions
+ *   Pointer to flow rule actions.
+ * @param action_template_idx
+ *   Index of an action template associated with @p table.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno set.
+ */
+static __rte_unused int
+flow_hw_create_ctrl_flow(struct rte_eth_dev *owner_dev,
+			 struct rte_eth_dev *proxy_dev,
+			 struct rte_flow_template_table *table,
+			 struct rte_flow_item items[],
+			 uint8_t item_template_idx,
+			 struct rte_flow_action actions[],
+			 uint8_t action_template_idx)
+{
+	struct mlx5_priv *priv = proxy_dev->data->dev_private;
+	uint32_t queue = flow_hw_get_ctrl_queue(priv);
+	struct rte_flow_op_attr op_attr = {
+		.postpone = 0,
+	};
+	struct rte_flow *flow = NULL;
+	struct mlx5_hw_ctrl_flow *entry = NULL;
+	int ret;
+
+	rte_spinlock_lock(&priv->hw_ctrl_lock);
+	entry = mlx5_malloc(MLX5_MEM_ZERO | MLX5_MEM_SYS, sizeof(*entry),
+			    0, SOCKET_ID_ANY);
+	if (!entry) {
+		DRV_LOG(ERR, "port %u not enough memory to create control flows",
+			proxy_dev->data->port_id);
+		rte_errno = ENOMEM;
+		ret = -rte_errno;
+		goto error;
+	}
+	flow = flow_hw_async_flow_create(proxy_dev, queue, &op_attr, table,
+					 items, item_template_idx,
+					 actions, action_template_idx,
+					 NULL, NULL);
+	if (!flow) {
+		DRV_LOG(ERR, "port %u failed to enqueue create control"
+			" flow operation", proxy_dev->data->port_id);
+		ret = -rte_errno;
+		goto error;
+	}
+	ret = flow_hw_push(proxy_dev, queue, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "port %u failed to drain control flow queue",
+			proxy_dev->data->port_id);
+		goto error;
+	}
+	ret = __flow_hw_pull_comp(proxy_dev, queue, 1, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "port %u failed to insert control flow",
+			proxy_dev->data->port_id);
+		rte_errno = EINVAL;
+		ret = -rte_errno;
+		goto error;
+	}
+	entry->owner_dev = owner_dev;
+	entry->flow = flow;
+	LIST_INSERT_HEAD(&priv->hw_ctrl_flows, entry, next);
+	rte_spinlock_unlock(&priv->hw_ctrl_lock);
+	return 0;
+error:
+	if (entry)
+		mlx5_free(entry);
+	rte_spinlock_unlock(&priv->hw_ctrl_lock);
+	return ret;
+}
+
+/**
+ * Destroys a control flow @p flow using flow template API on @p dev device.
+ *
+ * This function uses locks internally to synchronize access to the
+ * flow queue.
+ *
+ * If the @p flow is stored on any private list/pool, then the caller must
+ * free up the relevant resources.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param flow
+ *   Pointer to flow rule.
+ *
+ * @return
+ *   0 on success, non-zero value otherwise.
+ */
+static int
+flow_hw_destroy_ctrl_flow(struct rte_eth_dev *dev, struct rte_flow *flow)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint32_t queue = flow_hw_get_ctrl_queue(priv);
+	struct rte_flow_op_attr op_attr = {
+		.postpone = 0,
+	};
+	int ret;
+
+	rte_spinlock_lock(&priv->hw_ctrl_lock);
+	ret = flow_hw_async_flow_destroy(dev, queue, &op_attr, flow, NULL, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "port %u failed to enqueue destroy control"
+			" flow operation", dev->data->port_id);
+		goto exit;
+	}
+	ret = flow_hw_push(dev, queue, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "port %u failed to drain control flow queue",
+			dev->data->port_id);
+		goto exit;
+	}
+	ret = __flow_hw_pull_comp(dev, queue, 1, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "port %u failed to destroy control flow",
+			dev->data->port_id);
+		rte_errno = EINVAL;
+		ret = -rte_errno;
+		goto exit;
+	}
+exit:
+	rte_spinlock_unlock(&priv->hw_ctrl_lock);
+	return ret;
+}
+
+/**
+ * Destroys control flows created on behalf of @p owner_dev device.
+ *
+ * @param owner_dev
+ *   Pointer to Ethernet device owning control flows.
+ *
+ * @return
+ *   0 on success, otherwise negative error code is returned and
+ *   rte_errno is set.
+ */
+int
+mlx5_flow_hw_flush_ctrl_flows(struct rte_eth_dev *owner_dev)
+{
+	struct mlx5_priv *owner_priv = owner_dev->data->dev_private;
+	struct rte_eth_dev *proxy_dev;
+	struct mlx5_priv *proxy_priv;
+	struct mlx5_hw_ctrl_flow *cf;
+	struct mlx5_hw_ctrl_flow *cf_next;
+	uint16_t owner_port_id = owner_dev->data->port_id;
+	uint16_t proxy_port_id = owner_dev->data->port_id;
+	int ret;
+
+	if (owner_priv->sh->config.dv_esw_en) {
+		if (rte_flow_pick_transfer_proxy(owner_port_id, &proxy_port_id, NULL)) {
+			DRV_LOG(ERR, "Unable to find proxy port for port %u",
+				owner_port_id);
+			rte_errno = EINVAL;
+			return -rte_errno;
+		}
+		proxy_dev = &rte_eth_devices[proxy_port_id];
+		proxy_priv = proxy_dev->data->dev_private;
+	} else {
+		proxy_dev = owner_dev;
+		proxy_priv = owner_priv;
+	}
+	cf = LIST_FIRST(&proxy_priv->hw_ctrl_flows);
+	while (cf != NULL) {
+		cf_next = LIST_NEXT(cf, next);
+		if (cf->owner_dev == owner_dev) {
+			ret = flow_hw_destroy_ctrl_flow(proxy_dev, cf->flow);
+			if (ret) {
+				rte_errno = ret;
+				return -ret;
+			}
+			LIST_REMOVE(cf, next);
+			mlx5_free(cf);
+		}
+		cf = cf_next;
+	}
+	return 0;
+}
+
+/**
+ * Destroys all control flows created on @p dev device.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, otherwise negative error code is returned and
+ *   rte_errno is set.
+ */
+static int
+flow_hw_flush_all_ctrl_flows(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hw_ctrl_flow *cf;
+	struct mlx5_hw_ctrl_flow *cf_next;
+	int ret;
+
+	cf = LIST_FIRST(&priv->hw_ctrl_flows);
+	while (cf != NULL) {
+		cf_next = LIST_NEXT(cf, next);
+		ret = flow_hw_destroy_ctrl_flow(dev, cf->flow);
+		if (ret) {
+			rte_errno = ret;
+			return -ret;
+		}
+		LIST_REMOVE(cf, next);
+		mlx5_free(cf);
+		cf = cf_next;
+	}
+	return 0;
+}
+
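+/**
+ * Creates a control flow in the root table which matches traffic originating
+ * from the E-Switch Manager and jumps to the SQ miss group.
+ *
+ * @param dev
+ *   Pointer to Ethernet device acting as the E-Switch Manager (proxy) port.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise.
+ */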
+int
+mlx5_flow_hw_esw_create_mgr_sq_miss_flow(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_flow_item_ethdev port_spec = {
+		.port_id = MLX5_REPRESENTED_PORT_ESW_MGR,
+	};
+	struct rte_flow_item_ethdev port_mask = {
+		.port_id = MLX5_REPRESENTED_PORT_ESW_MGR,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+			.spec = &port_spec,
+			.mask = &port_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	struct rte_flow_action_jump jump = {
+		.group = MLX5_HW_SQ_MISS_GROUP,
+	};
+	struct rte_flow_action actions[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		},
+	};
+
+	MLX5_ASSERT(priv->master);
+	if (!priv->dr_ctx ||
+	    !priv->hw_esw_sq_miss_root_tbl)
+		return 0;
+	return flow_hw_create_ctrl_flow(dev, dev,
+					priv->hw_esw_sq_miss_root_tbl,
+					items, 0, actions, 0);
+}
+
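+/**
+ * Creates a control flow matching packets sent from the given SQ and
+ * directing them to port @p dev with the REPRESENTED_PORT action.
+ * The flow is created through the transfer proxy port.
+ *
+ * @param dev
+ *   Pointer to Ethernet device owning the SQ.
+ * @param txq
+ *   Hardware SQ number to match on.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise.
+ */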
+int
+mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
+{
+	uint16_t port_id = dev->data->port_id;
+	struct mlx5_rte_flow_item_sq queue_spec = {
+		.queue = txq,
+	};
+	struct mlx5_rte_flow_item_sq queue_mask = {
+		.queue = UINT32_MAX,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = (enum rte_flow_item_type)
+				MLX5_RTE_FLOW_ITEM_TYPE_SQ,
+			.spec = &queue_spec,
+			.mask = &queue_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	struct rte_flow_action_ethdev port = {
+		.port_id = port_id,
+	};
+	struct rte_flow_action actions[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT,
+			.conf = &port,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		},
+	};
+	struct rte_eth_dev *proxy_dev;
+	struct mlx5_priv *proxy_priv;
+	uint16_t proxy_port_id = dev->data->port_id;
+	int ret;
+
+	RTE_SET_USED(txq);
+	ret = rte_flow_pick_transfer_proxy(port_id, &proxy_port_id, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "Unable to pick proxy port for port %u", port_id);
+		return ret;
+	}
+	proxy_dev = &rte_eth_devices[proxy_port_id];
+	proxy_priv = proxy_dev->data->dev_private;
+	if (!proxy_priv->dr_ctx)
+		return 0;
+	if (!proxy_priv->hw_esw_sq_miss_root_tbl ||
+	    !proxy_priv->hw_esw_sq_miss_tbl) {
+		DRV_LOG(ERR, "port %u proxy port %u was configured but default"
+			" flow tables are not created",
+			port_id, proxy_port_id);
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	return flow_hw_create_ctrl_flow(dev, proxy_dev,
+					proxy_priv->hw_esw_sq_miss_tbl,
+					items, 0, actions, 0);
+}
+
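+/**
+ * Creates a control flow in group 0 which matches traffic coming from
+ * the represented port @p dev and jumps to group 1.
+ * The flow is created through the transfer proxy port.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise.
+ */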
+int
+mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev)
+{
+	uint16_t port_id = dev->data->port_id;
+	struct rte_flow_item_ethdev port_spec = {
+		.port_id = port_id,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+			.spec = &port_spec,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	struct rte_flow_action_jump jump = {
+		.group = 1,
+	};
+	struct rte_flow_action actions[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+	struct rte_eth_dev *proxy_dev;
+	struct mlx5_priv *proxy_priv;
+	uint16_t proxy_port_id = dev->data->port_id;
+	int ret;
+
+	ret = rte_flow_pick_transfer_proxy(port_id, &proxy_port_id, NULL);
+	if (ret) {
+		DRV_LOG(ERR, "Unable to pick proxy port for port %u", port_id);
+		return ret;
+	}
+	proxy_dev = &rte_eth_devices[proxy_port_id];
+	proxy_priv = proxy_dev->data->dev_private;
+	if (!proxy_priv->dr_ctx)
+		return 0;
+	if (!proxy_priv->hw_esw_zero_tbl) {
+		DRV_LOG(ERR, "port %u proxy port %u was configured but default"
+			" flow tables are not created",
+			port_id, proxy_port_id);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	return flow_hw_create_ctrl_flow(dev, proxy_dev,
+					proxy_priv->hw_esw_zero_tbl,
+					items, 0, actions, 0);
+}
+
 #endif
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index fd902078f8..7ffaf4c227 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -1245,12 +1245,14 @@ flow_verbs_validate(struct rte_eth_dev *dev,
 	uint16_t ether_type = 0;
 	bool is_empty_vlan = false;
 	uint16_t udp_dport = 0;
+	bool is_root;
 
 	if (items == NULL)
 		return -1;
 	ret = mlx5_flow_validate_attributes(dev, attr, error);
 	if (ret < 0)
 		return ret;
+	is_root = ret;
 	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
 		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
 		int ret = 0;
@@ -1380,7 +1382,7 @@ flow_verbs_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ITEM_TYPE_VXLAN:
 			ret = mlx5_flow_validate_item_vxlan(dev, udp_dport,
 							    items, item_flags,
-							    attr, error);
+							    is_root, error);
 			if (ret < 0)
 				return ret;
 			last_item = MLX5_FLOW_LAYER_VXLAN;
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index c68b32cf14..f59d314ff4 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -1280,6 +1280,52 @@ mlx5_dev_stop(struct rte_eth_dev *dev)
 	return 0;
 }
 
+#ifdef HAVE_MLX5_HWS_SUPPORT
+
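+/**
+ * Creates the default HW steering control flows: the E-Switch Manager
+ * SQ miss flow, the per-SQ miss flows and the default jump flow,
+ * depending on the E-Switch configuration.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */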
+static int
+mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	unsigned int i;
+	int ret;
+
+	if (priv->sh->config.dv_esw_en && priv->master) {
+		if (mlx5_flow_hw_esw_create_mgr_sq_miss_flow(dev))
+			goto error;
+	}
+	for (i = 0; i < priv->txqs_n; ++i) {
+		struct mlx5_txq_ctrl *txq = mlx5_txq_get(dev, i);
+		uint32_t queue;
+
+		if (!txq)
+			continue;
+		if (txq->is_hairpin)
+			queue = txq->obj->sq->id;
+		else
+			queue = txq->obj->sq_obj.sq->id;
+		if ((priv->representor || priv->master) &&
+		    priv->sh->config.dv_esw_en) {
+			if (mlx5_flow_hw_esw_create_sq_miss_flow(dev, queue)) {
+				mlx5_txq_release(dev, i);
+				goto error;
+			}
+		}
+		mlx5_txq_release(dev, i);
+	}
+	if ((priv->master || priv->representor) && priv->sh->config.dv_esw_en) {
+		if (mlx5_flow_hw_esw_create_default_jump_flow(dev))
+			goto error;
+	}
+	return 0;
+error:
+	ret = rte_errno;
+	mlx5_flow_hw_flush_ctrl_flows(dev);
+	rte_errno = ret;
+	return -rte_errno;
+}
+
+#endif
+
 /**
  * Enable traffic flows configured by control plane
  *
@@ -1316,6 +1362,10 @@ mlx5_traffic_enable(struct rte_eth_dev *dev)
 	unsigned int j;
 	int ret;
 
+#ifdef HAVE_MLX5_HWS_SUPPORT
+	if (priv->sh->config.dv_flow_en == 2)
+		return mlx5_traffic_enable_hws(dev);
+#endif
 	/*
 	 * Hairpin txq default flow should be created no matter if it is
 	 * isolation mode. Or else all the packets to be sent will be sent
@@ -1346,13 +1396,17 @@ mlx5_traffic_enable(struct rte_eth_dev *dev)
 		}
 		mlx5_txq_release(dev, i);
 	}
-	if (priv->sh->config.dv_esw_en) {
-		if (mlx5_flow_create_esw_table_zero_flow(dev))
-			priv->fdb_def_rule = 1;
-		else
-			DRV_LOG(INFO, "port %u FDB default rule cannot be"
-				" configured - only Eswitch group 0 flows are"
-				" supported.", dev->data->port_id);
+	if (priv->sh->config.fdb_def_rule) {
+		if (priv->sh->config.dv_esw_en) {
+			if (mlx5_flow_create_esw_table_zero_flow(dev))
+				priv->fdb_def_rule = 1;
+			else
+				DRV_LOG(INFO, "port %u FDB default rule cannot be configured - only Eswitch group 0 flows are supported.",
+					dev->data->port_id);
+		}
+	} else {
+		DRV_LOG(INFO, "port %u FDB default rule is disabled",
+			dev->data->port_id);
 	}
 	if (!priv->sh->config.lacp_by_user && priv->pf_bond >= 0) {
 		ret = mlx5_flow_lacp_miss(dev);
@@ -1470,7 +1524,14 @@ mlx5_traffic_enable(struct rte_eth_dev *dev)
 void
 mlx5_traffic_disable(struct rte_eth_dev *dev)
 {
-	mlx5_flow_list_flush(dev, MLX5_FLOW_TYPE_CTL, false);
+#ifdef HAVE_MLX5_HWS_SUPPORT
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (priv->sh->config.dv_flow_en == 2)
+		mlx5_flow_hw_flush_ctrl_flows(dev);
+	else
+#endif
+		mlx5_flow_list_flush(dev, MLX5_FLOW_TYPE_CTL, false);
 }
 
 /**
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v6 06/18] net/mlx5: add extended metadata mode for hardware steering
  2022-10-20 15:41 ` [PATCH v6 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (4 preceding siblings ...)
  2022-10-20 15:41   ` [PATCH v6 05/18] net/mlx5: add HW steering port action Suanming Mou
@ 2022-10-20 15:41   ` Suanming Mou
  2022-10-24  9:45     ` Slava Ovsiienko
  2022-10-20 15:41   ` [PATCH v6 07/18] net/mlx5: add HW steering meter action Suanming Mou
                     ` (12 subsequent siblings)
  18 siblings, 1 reply; 140+ messages in thread
From: Suanming Mou @ 2022-10-20 15:41 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Bing Zhao

From: Bing Zhao <bingz@nvidia.com>

A new mode 4 of the devarg "dv_xmeta_en" is added for HWS only. In this
mode, copying 32-bit wide Rx / Tx metadata between the FDB and NIC
domains is supported. The MARK item and action are supported only in
the NIC domain and the mark value is not copied.
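
For reference, a minimal usage sketch (the PCI address and the testpmd
application below are placeholders, not part of this patch; the devargs
themselves are the existing "dv_flow_en" and "dv_xmeta_en" options):

    dpdk-testpmd -a 0000:08:00.0,dv_flow_en=2,dv_xmeta_en=4 -- -i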

Signed-off-by: Bing Zhao <bingz@nvidia.com>
---
 doc/guides/nics/mlx5.rst               |   4 +
 doc/guides/rel_notes/release_22_11.rst |   1 +
 drivers/net/mlx5/linux/mlx5_os.c       |  10 +-
 drivers/net/mlx5/mlx5.c                |   7 +-
 drivers/net/mlx5/mlx5.h                |   8 +-
 drivers/net/mlx5/mlx5_flow.c           |   8 +-
 drivers/net/mlx5/mlx5_flow.h           |  14 +
 drivers/net/mlx5/mlx5_flow_dv.c        |  43 +-
 drivers/net/mlx5/mlx5_flow_hw.c        | 864 ++++++++++++++++++++++---
 drivers/net/mlx5/mlx5_trigger.c        |   3 +
 10 files changed, 877 insertions(+), 85 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 7d2095f075..0c7bd042a4 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -980,6 +980,10 @@ for an additional list of options shared with other mlx5 drivers.
   - 3, this engages tunnel offload mode. In E-Switch configuration, that
     mode implicitly activates ``dv_xmeta_en=1``.
 
+  - 4, this mode is only supported in HWS (``dv_flow_en=2``). Copying
+    32-bit wide Rx / Tx metadata between FDB and NIC is supported.
+    The mark is supported only in NIC and is not copied.
+
   +------+-----------+-----------+-------------+-------------+
   | Mode | ``MARK``  | ``META``  | ``META`` Tx | FDB/Through |
   +======+===========+===========+=============+=============+
diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 8f56a99ec9..c69e796c16 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -253,6 +253,7 @@ New Features
 
   * Added fully support for queue based async HW steering to the PMD:
     - Support of modify fields.
+    - Support of FDB.
 
 * **Rewritten pmdinfo script.**
 
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 4c004ee2ef..62b957839c 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1567,6 +1567,15 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 #ifdef HAVE_MLX5_HWS_SUPPORT
 		if (priv->vport_meta_mask)
 			flow_hw_set_port_info(eth_dev);
+		if (priv->sh->config.dv_esw_en &&
+		    priv->sh->config.dv_xmeta_en != MLX5_XMETA_MODE_LEGACY &&
+		    priv->sh->config.dv_xmeta_en != MLX5_XMETA_MODE_META32_HWS) {
+			DRV_LOG(ERR,
+				"metadata mode %u is not supported in HWS eswitch mode",
+				priv->sh->config.dv_xmeta_en);
+			err = ENOTSUP;
+			goto error;
+		}
 		/* Only HWS requires this information. */
 		flow_hw_init_tags_set(eth_dev);
 		if (priv->sh->config.dv_esw_en &&
@@ -1582,7 +1591,6 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 		goto error;
 #endif
 	}
-	/* Port representor shares the same max priority with pf port. */
 	if (!priv->sh->flow_priority_check_flag) {
 		/* Supported Verbs flow priority number detection. */
 		err = mlx5_flow_discover_priorities(eth_dev);
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 470b9c2d0f..9cd4892858 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1218,7 +1218,8 @@ mlx5_dev_args_check_handler(const char *key, const char *val, void *opaque)
 		if (tmp != MLX5_XMETA_MODE_LEGACY &&
 		    tmp != MLX5_XMETA_MODE_META16 &&
 		    tmp != MLX5_XMETA_MODE_META32 &&
-		    tmp != MLX5_XMETA_MODE_MISS_INFO) {
+		    tmp != MLX5_XMETA_MODE_MISS_INFO &&
+		    tmp != MLX5_XMETA_MODE_META32_HWS) {
 			DRV_LOG(ERR, "Invalid extensive metadata parameter.");
 			rte_errno = EINVAL;
 			return -rte_errno;
@@ -2849,6 +2850,10 @@ mlx5_set_metadata_mask(struct rte_eth_dev *dev)
 		meta = UINT32_MAX;
 		mark = (reg_c0 >> rte_bsf32(reg_c0)) & MLX5_FLOW_MARK_MASK;
 		break;
+	case MLX5_XMETA_MODE_META32_HWS:
+		meta = UINT32_MAX;
+		mark = MLX5_FLOW_MARK_MASK;
+		break;
 	default:
 		meta = 0;
 		mark = 0;
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index fc37b06bd1..e15b80ba92 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -303,8 +303,8 @@ struct mlx5_sh_config {
 	uint32_t reclaim_mode:2; /* Memory reclaim mode. */
 	uint32_t dv_esw_en:1; /* Enable E-Switch DV flow. */
 	/* Enable DV flow. 1 means SW steering, 2 means HW steering. */
-	unsigned int dv_flow_en:2;
-	uint32_t dv_xmeta_en:2; /* Enable extensive flow metadata. */
+	uint32_t dv_flow_en:2; /* Enable DV flow. */
+	uint32_t dv_xmeta_en:3; /* Enable extensive flow metadata. */
 	uint32_t dv_miss_info:1; /* Restore packet after partial hw miss. */
 	uint32_t l3_vxlan_en:1; /* Enable L3 VXLAN flow creation. */
 	uint32_t vf_nl_en:1; /* Enable Netlink requests in VF mode. */
@@ -317,7 +317,6 @@ struct mlx5_sh_config {
 	uint32_t fdb_def_rule:1; /* Create FDB default jump rule */
 };
 
-
 /* Structure for VF VLAN workaround. */
 struct mlx5_vf_vlan {
 	uint32_t tag:12;
@@ -1290,12 +1289,12 @@ struct mlx5_dev_ctx_shared {
 	struct mlx5_lb_ctx self_lb; /* QP to enable self loopback for Devx. */
 	unsigned int flow_max_priority;
 	enum modify_reg flow_mreg_c[MLX5_MREG_C_NUM];
+	/* Availability of mreg_c's. */
 	void *devx_channel_lwm;
 	struct rte_intr_handle *intr_handle_lwm;
 	pthread_mutex_t lwm_config_lock;
 	uint32_t host_shaper_rate:8;
 	uint32_t lwm_triggered:1;
-	/* Availability of mreg_c's. */
 	struct mlx5_dev_shared_port port[]; /* per device port data array. */
 };
 
@@ -1521,6 +1520,7 @@ struct mlx5_priv {
 	struct rte_flow_template_table *hw_esw_sq_miss_root_tbl;
 	struct rte_flow_template_table *hw_esw_sq_miss_tbl;
 	struct rte_flow_template_table *hw_esw_zero_tbl;
+	struct rte_flow_template_table *hw_tx_meta_cpy_tbl;
 	struct mlx5_indexed_pool *flows[MLX5_FLOW_TYPE_MAXI];
 	/* RTE Flow rules. */
 	uint32_t ctrl_flows; /* Control flow rules. */
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index a7da9c923d..51d2b42755 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -1109,6 +1109,8 @@ mlx5_flow_get_reg_id(struct rte_eth_dev *dev,
 			return REG_C_0;
 		case MLX5_XMETA_MODE_META32:
 			return REG_C_1;
+		case MLX5_XMETA_MODE_META32_HWS:
+			return REG_C_1;
 		}
 		break;
 	case MLX5_METADATA_TX:
@@ -1121,11 +1123,14 @@ mlx5_flow_get_reg_id(struct rte_eth_dev *dev,
 			return REG_C_0;
 		case MLX5_XMETA_MODE_META32:
 			return REG_C_1;
+		case MLX5_XMETA_MODE_META32_HWS:
+			return REG_C_1;
 		}
 		break;
 	case MLX5_FLOW_MARK:
 		switch (config->dv_xmeta_en) {
 		case MLX5_XMETA_MODE_LEGACY:
+		case MLX5_XMETA_MODE_META32_HWS:
 			return REG_NON;
 		case MLX5_XMETA_MODE_META16:
 			return REG_C_1;
@@ -4444,7 +4449,8 @@ static bool flow_check_modify_action_type(struct rte_eth_dev *dev,
 		return true;
 	case RTE_FLOW_ACTION_TYPE_FLAG:
 	case RTE_FLOW_ACTION_TYPE_MARK:
-		if (priv->sh->config.dv_xmeta_en != MLX5_XMETA_MODE_LEGACY)
+		if (priv->sh->config.dv_xmeta_en != MLX5_XMETA_MODE_LEGACY &&
+		    priv->sh->config.dv_xmeta_en != MLX5_XMETA_MODE_META32_HWS)
 			return true;
 		else
 			return false;
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 88caec606d..2adf516691 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -48,6 +48,12 @@ enum mlx5_rte_flow_action_type {
 	MLX5_RTE_FLOW_ACTION_TYPE_RSS,
 };
 
+/* Private (internal) Field IDs for MODIFY_FIELD action. */
+enum mlx5_rte_flow_field_id {
+	MLX5_RTE_FLOW_FIELD_END = INT_MIN,
+	MLX5_RTE_FLOW_FIELD_META_REG,
+};
+
 #define MLX5_INDIRECT_ACTION_TYPE_OFFSET 30
 
 enum {
@@ -1181,6 +1187,7 @@ struct rte_flow_actions_template {
 	struct rte_flow_action *masks; /* Cached action masks.*/
 	uint16_t mhdr_off; /* Offset of DR modify header action. */
 	uint32_t refcnt; /* Reference counter. */
+	uint16_t rx_cpy_pos; /* Action position of Rx metadata to be copied. */
 };
 
 /* Jump action struct. */
@@ -1257,6 +1264,11 @@ struct mlx5_flow_group {
 #define MLX5_HW_TBL_MAX_ITEM_TEMPLATE 2
 #define MLX5_HW_TBL_MAX_ACTION_TEMPLATE 32
 
+struct mlx5_flow_template_table_cfg {
+	struct rte_flow_template_table_attr attr; /* Table attributes passed through flow API. */
+	bool external; /* True if created by flow API, false if table is internal to PMD. */
+};
+
 struct rte_flow_template_table {
 	LIST_ENTRY(rte_flow_template_table) next;
 	struct mlx5_flow_group *grp; /* The group rte_flow_template_table uses. */
@@ -1266,6 +1278,7 @@ struct rte_flow_template_table {
 	/* Action templates bind to the table. */
 	struct mlx5_hw_action_template ats[MLX5_HW_TBL_MAX_ACTION_TEMPLATE];
 	struct mlx5_indexed_pool *flow; /* The table's flow ipool. */
+	struct mlx5_flow_template_table_cfg cfg;
 	uint32_t type; /* Flow table type RX/TX/FDB. */
 	uint8_t nb_item_templates; /* Item template number. */
 	uint8_t nb_action_templates; /* Action template number. */
@@ -2376,4 +2389,5 @@ int mlx5_flow_hw_esw_create_mgr_sq_miss_flow(struct rte_eth_dev *dev);
 int mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev,
 					 uint32_t txq);
 int mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev);
+int mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev);
 #endif /* RTE_PMD_MLX5_FLOW_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index f6f4f20a6f..2ca83f5d7a 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -1754,7 +1754,8 @@ mlx5_flow_field_id_to_modify_info
 			int reg;
 
 			if (priv->sh->config.dv_flow_en == 2)
-				reg = REG_C_1;
+				reg = flow_hw_get_reg_id(RTE_FLOW_ITEM_TYPE_TAG,
+							 data->level);
 			else
 				reg = mlx5_flow_get_reg_id(dev, MLX5_APP_TAG,
 							   data->level, error);
@@ -1833,6 +1834,24 @@ mlx5_flow_field_id_to_modify_info
 		else
 			info[idx].offset = off_be;
 		break;
+	case MLX5_RTE_FLOW_FIELD_META_REG:
+		{
+			uint32_t meta_mask = priv->sh->dv_meta_mask;
+			uint32_t meta_count = __builtin_popcount(meta_mask);
+			uint32_t reg = data->level;
+
+			RTE_SET_USED(meta_count);
+			MLX5_ASSERT(data->offset + width <= meta_count);
+			MLX5_ASSERT(reg != REG_NON);
+			MLX5_ASSERT(reg < RTE_DIM(reg_to_field));
+			info[idx] = (struct field_modify_info){4, 0, reg_to_field[reg]};
+			if (mask)
+				mask[idx] = flow_modify_info_mask_32_masked
+					(width, data->offset, meta_mask);
+			else
+				info[idx].offset = data->offset;
+		}
+		break;
 	case RTE_FLOW_FIELD_POINTER:
 	case RTE_FLOW_FIELD_VALUE:
 	default:
@@ -9796,7 +9815,19 @@ flow_dv_translate_item_meta(struct rte_eth_dev *dev,
 	mask = meta_m->data;
 	if (key_type == MLX5_SET_MATCHER_HS_M)
 		mask = value;
-	reg = flow_dv_get_metadata_reg(dev, attr, NULL);
+	/*
+	 * In the current implementation, REG_B cannot be used to match.
+	 * Force the use of REG_C_1 in the HWS root table, as in other tables.
+	 * This mapping may change.
+	 * NIC: modify - REG_B to be present in SW
+	 *      match - REG_C_1 when copied from FDB, different from SWS
+	 * FDB: modify - REG_C_1 in Xmeta mode, REG_NON in legacy mode
+	 *      match - REG_C_1 in FDB
+	 */
+	if (!!(key_type & MLX5_SET_MATCHER_SW))
+		reg = flow_dv_get_metadata_reg(dev, attr, NULL);
+	else
+		reg = flow_hw_get_reg_id(RTE_FLOW_ITEM_TYPE_META, 0);
 	if (reg < 0)
 		return;
 	MLX5_ASSERT(reg != REG_NON);
@@ -9896,7 +9927,10 @@ flow_dv_translate_item_tag(struct rte_eth_dev *dev, void *key,
 	/* When set mask, the index should be from spec. */
 	index = tag_vv ? tag_vv->index : tag_v->index;
 	/* Get the metadata register index for the tag. */
-	reg = mlx5_flow_get_reg_id(dev, MLX5_APP_TAG, index, NULL);
+	if (!!(key_type & MLX5_SET_MATCHER_SW))
+		reg = mlx5_flow_get_reg_id(dev, MLX5_APP_TAG, index, NULL);
+	else
+		reg = flow_hw_get_reg_id(RTE_FLOW_ITEM_TYPE_TAG, index);
 	MLX5_ASSERT(reg > 0);
 	flow_dv_match_meta_reg(key, reg, tag_v->data, tag_m->data);
 }
@@ -13459,7 +13493,8 @@ flow_dv_translate_items_sws(struct rte_eth_dev *dev,
 	 */
 	if (!(wks.item_flags & MLX5_FLOW_ITEM_PORT_ID) &&
 	    !(wks.item_flags & MLX5_FLOW_ITEM_REPRESENTED_PORT) && priv->sh->esw_mode &&
-	    !(attr->egress && !attr->transfer)) {
+	    !(attr->egress && !attr->transfer) &&
+	    attr->group != MLX5_FLOW_MREG_CP_TABLE_GROUP) {
 		if (flow_dv_translate_item_port_id_all(dev, match_mask,
 						   match_value, NULL, attr))
 			return -rte_errno;
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 6187ed20cb..e4c3ec9b28 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -20,13 +20,27 @@
 /* Default queue to flush the flows. */
 #define MLX5_DEFAULT_FLUSH_QUEUE 0
 
-/* Maximum number of rules in control flow tables */
+/* Maximum number of rules in control flow tables. */
 #define MLX5_HW_CTRL_FLOW_NB_RULES (4096)
 
-/* Flow group for SQ miss default flows/ */
-#define MLX5_HW_SQ_MISS_GROUP (UINT32_MAX)
+/* Lowest flow group usable by an application. */
+#define MLX5_HW_LOWEST_USABLE_GROUP (1)
+
+/* Maximum group index usable by user applications for transfer flows. */
+#define MLX5_HW_MAX_TRANSFER_GROUP (UINT32_MAX - 1)
+
+/* Lowest priority for HW root table. */
+#define MLX5_HW_LOWEST_PRIO_ROOT 15
+
+/* Lowest priority for HW non-root table. */
+#define MLX5_HW_LOWEST_PRIO_NON_ROOT (UINT32_MAX)
 
 static int flow_hw_flush_all_ctrl_flows(struct rte_eth_dev *dev);
+static int flow_hw_translate_group(struct rte_eth_dev *dev,
+				   const struct mlx5_flow_template_table_cfg *cfg,
+				   uint32_t group,
+				   uint32_t *table_group,
+				   struct rte_flow_error *error);
 
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops;
 
@@ -213,12 +227,12 @@ flow_hw_rss_item_flags_get(const struct rte_flow_item items[])
  */
 static struct mlx5_hw_jump_action *
 flow_hw_jump_action_register(struct rte_eth_dev *dev,
-			     const struct rte_flow_attr *attr,
+			     const struct mlx5_flow_template_table_cfg *cfg,
 			     uint32_t dest_group,
 			     struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct rte_flow_attr jattr = *attr;
+	struct rte_flow_attr jattr = cfg->attr.flow_attr;
 	struct mlx5_flow_group *grp;
 	struct mlx5_flow_cb_ctx ctx = {
 		.dev = dev,
@@ -226,9 +240,13 @@ flow_hw_jump_action_register(struct rte_eth_dev *dev,
 		.data = &jattr,
 	};
 	struct mlx5_list_entry *ge;
+	uint32_t target_group;
 
-	jattr.group = dest_group;
-	ge = mlx5_hlist_register(priv->sh->flow_tbls, dest_group, &ctx);
+	target_group = dest_group;
+	if (flow_hw_translate_group(dev, cfg, dest_group, &target_group, error))
+		return NULL;
+	jattr.group = target_group;
+	ge = mlx5_hlist_register(priv->sh->flow_tbls, target_group, &ctx);
 	if (!ge)
 		return NULL;
 	grp = container_of(ge, struct mlx5_flow_group, entry);
@@ -760,7 +778,8 @@ flow_hw_modify_field_compile(struct rte_eth_dev *dev,
 				(void *)(uintptr_t)conf->src.pvalue :
 				(void *)(uintptr_t)&conf->src.value;
 		if (conf->dst.field == RTE_FLOW_FIELD_META ||
-		    conf->dst.field == RTE_FLOW_FIELD_TAG) {
+		    conf->dst.field == RTE_FLOW_FIELD_TAG ||
+		    conf->dst.field == (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG) {
 			value = *(const unaligned_uint32_t *)item.spec;
 			value = rte_cpu_to_be_32(value);
 			item.spec = &value;
@@ -860,6 +879,9 @@ flow_hw_represented_port_compile(struct rte_eth_dev *dev,
 	if (m && !!m->port_id) {
 		struct mlx5_priv *port_priv;
 
+		if (!v)
+			return rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ACTION,
+						  action, "port index was not provided");
 		port_priv = mlx5_port_to_eswitch_info(v->port_id, false);
 		if (port_priv == NULL)
 			return rte_flow_error_set
@@ -903,8 +925,8 @@ flow_hw_represented_port_compile(struct rte_eth_dev *dev,
  *
  * @param[in] dev
  *   Pointer to the rte_eth_dev structure.
- * @param[in] table_attr
- *   Pointer to the table attributes.
+ * @param[in] cfg
+ *   Pointer to the table configuration.
  * @param[in] item_templates
  *   Item template array to be binded to the table.
  * @param[in/out] acts
@@ -919,12 +941,13 @@ flow_hw_represented_port_compile(struct rte_eth_dev *dev,
  */
 static int
 flow_hw_actions_translate(struct rte_eth_dev *dev,
-			  const struct rte_flow_template_table_attr *table_attr,
+			  const struct mlx5_flow_template_table_cfg *cfg,
 			  struct mlx5_hw_actions *acts,
 			  struct rte_flow_actions_template *at,
 			  struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_template_table_attr *table_attr = &cfg->attr;
 	const struct rte_flow_attr *attr = &table_attr->flow_attr;
 	struct rte_flow_action *actions = at->actions;
 	struct rte_flow_action *action_start = actions;
@@ -991,7 +1014,7 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 					((const struct rte_flow_action_jump *)
 					actions->conf)->group;
 				acts->jump = flow_hw_jump_action_register
-						(dev, attr, jump_group, error);
+						(dev, cfg, jump_group, error);
 				if (!acts->jump)
 					goto err;
 				acts->rule_acts[i].action = (!!attr->group) ?
@@ -1104,6 +1127,16 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 							   error);
 			if (err)
 				goto err;
+			/*
+			 * Adjust the source position of the subsequent actions:
+			 * ... / MODIFY_FIELD: rx_cpy_pos / (QUEUE|RSS) / ...
+			 * The next action is Q/RSS, so no further adjustment is
+			 * needed and the real source position of the following
+			 * actions is decreased by 1.
+			 * The total number of actions in the new template does
+			 * not change.
+			 */
+			if ((actions - action_start) == at->rx_cpy_pos)
+				action_start += 1;
 			break;
 		case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
 			if (flow_hw_represented_port_compile
@@ -1368,7 +1401,8 @@ flow_hw_modify_field_construct(struct mlx5_hw_q_job *job,
 	else
 		rte_memcpy(values, mhdr_action->src.pvalue, sizeof(values));
 	if (mhdr_action->dst.field == RTE_FLOW_FIELD_META ||
-	    mhdr_action->dst.field == RTE_FLOW_FIELD_TAG) {
+	    mhdr_action->dst.field == RTE_FLOW_FIELD_TAG ||
+	    mhdr_action->dst.field == (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG) {
 		value_p = (unaligned_uint32_t *)values;
 		*value_p = rte_cpu_to_be_32(*value_p);
 	} else if (mhdr_action->dst.field == RTE_FLOW_FIELD_GTP_PSC_QFI) {
@@ -1516,7 +1550,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			jump_group = ((const struct rte_flow_action_jump *)
 						action->conf)->group;
 			jump = flow_hw_jump_action_register
-				(dev, &attr, jump_group, NULL);
+				(dev, &table->cfg, jump_group, NULL);
 			if (!jump)
 				return -1;
 			rule_acts[act_data->action_dst].action =
@@ -1713,7 +1747,13 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	job->user_data = user_data;
 	rule_attr.user_data = job;
 	hw_acts = &table->ats[action_template_index].acts;
-	/* Construct the flow actions based on the input actions.*/
+	/*
+	 * Construct the flow actions based on the input actions.
+	 * The implicitly appended action is always fixed, like the metadata
+	 * copy action from FDB to NIC Rx.
+	 * There is no need to copy and construct a new "actions" list based
+	 * on the user's input, which saves the cost.
+	 */
 	if (flow_hw_actions_construct(dev, job, hw_acts, pattern_template_index,
 				  actions, rule_acts, &acts_num)) {
 		rte_errno = EINVAL;
@@ -1984,6 +2024,8 @@ flow_hw_q_flow_flush(struct rte_eth_dev *dev,
 	/* Flush flow per-table from MLX5_DEFAULT_FLUSH_QUEUE. */
 	hw_q = &priv->hw_q[MLX5_DEFAULT_FLUSH_QUEUE];
 	LIST_FOREACH(tbl, &priv->flow_hw_tbl, next) {
+		if (!tbl->cfg.external)
+			continue;
 		MLX5_IPOOL_FOREACH(tbl->flow, fidx, flow) {
 			if (flow_hw_async_flow_destroy(dev,
 						MLX5_DEFAULT_FLUSH_QUEUE,
@@ -2021,8 +2063,8 @@ flow_hw_q_flow_flush(struct rte_eth_dev *dev,
  *
  * @param[in] dev
  *   Pointer to the rte_eth_dev structure.
- * @param[in] attr
- *   Pointer to the table attributes.
+ * @param[in] table_cfg
+ *   Pointer to the table configuration.
  * @param[in] item_templates
  *   Item template array to be binded to the table.
  * @param[in] nb_item_templates
@@ -2039,7 +2081,7 @@ flow_hw_q_flow_flush(struct rte_eth_dev *dev,
  */
 static struct rte_flow_template_table *
 flow_hw_table_create(struct rte_eth_dev *dev,
-		     const struct rte_flow_template_table_attr *attr,
+		     const struct mlx5_flow_template_table_cfg *table_cfg,
 		     struct rte_flow_pattern_template *item_templates[],
 		     uint8_t nb_item_templates,
 		     struct rte_flow_actions_template *action_templates[],
@@ -2051,6 +2093,7 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	struct rte_flow_template_table *tbl = NULL;
 	struct mlx5_flow_group *grp;
 	struct mlx5dr_match_template *mt[MLX5_HW_TBL_MAX_ITEM_TEMPLATE];
+	const struct rte_flow_template_table_attr *attr = &table_cfg->attr;
 	struct rte_flow_attr flow_attr = attr->flow_attr;
 	struct mlx5_flow_cb_ctx ctx = {
 		.dev = dev,
@@ -2091,6 +2134,7 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	tbl = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*tbl), 0, rte_socket_id());
 	if (!tbl)
 		goto error;
+	tbl->cfg = *table_cfg;
 	/* Allocate flow indexed pool. */
 	tbl->flow = mlx5_ipool_create(&cfg);
 	if (!tbl->flow)
@@ -2134,7 +2178,7 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 			goto at_error;
 		}
 		LIST_INIT(&tbl->ats[i].acts.act_list);
-		err = flow_hw_actions_translate(dev, attr,
+		err = flow_hw_actions_translate(dev, &tbl->cfg,
 						&tbl->ats[i].acts,
 						action_templates[i], error);
 		if (err) {
@@ -2177,6 +2221,96 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	return NULL;
 }
 
+/**
+ * Translates the group index specified by the user to the internal
+ * group index.
+ *
+ * Translation is done by incrementing group index, so group n becomes n + 1.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] cfg
+ *   Pointer to the template table configuration.
+ * @param[in] group
+ *   Currently used group index (table group or jump destination).
+ * @param[out] table_group
+ *   Pointer to output group index.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success. Otherwise, returns negative error code, rte_errno is set
+ *   and error structure is filled.
+ */
+static int
+flow_hw_translate_group(struct rte_eth_dev *dev,
+			const struct mlx5_flow_template_table_cfg *cfg,
+			uint32_t group,
+			uint32_t *table_group,
+			struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_attr *flow_attr = &cfg->attr.flow_attr;
+
+	if (priv->sh->config.dv_esw_en && cfg->external && flow_attr->transfer) {
+		if (group > MLX5_HW_MAX_TRANSFER_GROUP)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
+						  NULL,
+						  "group index not supported");
+		*table_group = group + 1;
+	} else {
+		*table_group = group;
+	}
+	return 0;
+}
+
+/**
+ * Create flow table.
+ *
+ * This function is a wrapper over @ref flow_hw_table_create(), which translates parameters
+ * provided by user to proper internal values.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] attr
+ *   Pointer to the table attributes.
+ * @param[in] item_templates
+ *   Item template array to be bound to the table.
+ * @param[in] nb_item_templates
+ *   Number of item templates.
+ * @param[in] action_templates
+ *   Action template array to be bound to the table.
+ * @param[in] nb_action_templates
+ *   Number of action templates.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   Table pointer on success. Otherwise, NULL is returned, rte_errno is set
+ *   and error structure is filled.
+ */
+static struct rte_flow_template_table *
+flow_hw_template_table_create(struct rte_eth_dev *dev,
+			      const struct rte_flow_template_table_attr *attr,
+			      struct rte_flow_pattern_template *item_templates[],
+			      uint8_t nb_item_templates,
+			      struct rte_flow_actions_template *action_templates[],
+			      uint8_t nb_action_templates,
+			      struct rte_flow_error *error)
+{
+	struct mlx5_flow_template_table_cfg cfg = {
+		.attr = *attr,
+		.external = true,
+	};
+	uint32_t group = attr->flow_attr.group;
+
+	if (flow_hw_translate_group(dev, &cfg, group, &cfg.attr.flow_attr.group, error))
+		return NULL;
+	return flow_hw_table_create(dev, &cfg, item_templates, nb_item_templates,
+				    action_templates, nb_action_templates, error);
+}
+
 /**
  * Destroy flow table.
  *
@@ -2312,10 +2446,13 @@ flow_hw_validate_action_represented_port(struct rte_eth_dev *dev,
 					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
 					  "cannot use represented_port actions"
 					  " without an E-Switch");
-	if (mask_conf->port_id) {
+	if (mask_conf && mask_conf->port_id) {
 		struct mlx5_priv *port_priv;
 		struct mlx5_priv *dev_priv;
 
+		if (!action_conf)
+			return rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ACTION,
+						  action, "port index was not provided");
 		port_priv = mlx5_port_to_eswitch_info(action_conf->port_id, false);
 		if (!port_priv)
 			return rte_flow_error_set(error, rte_errno,
@@ -2340,20 +2477,77 @@ flow_hw_validate_action_represented_port(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static inline int
+flow_hw_action_meta_copy_insert(const struct rte_flow_action actions[],
+				const struct rte_flow_action masks[],
+				const struct rte_flow_action *ins_actions,
+				const struct rte_flow_action *ins_masks,
+				struct rte_flow_action *new_actions,
+				struct rte_flow_action *new_masks,
+				uint16_t *ins_pos)
+{
+	uint16_t idx, total = 0;
+	bool ins = false;
+	bool act_end = false;
+
+	MLX5_ASSERT(actions && masks);
+	MLX5_ASSERT(new_actions && new_masks);
+	MLX5_ASSERT(ins_actions && ins_masks);
+	for (idx = 0; !act_end; idx++) {
+		if (idx >= MLX5_HW_MAX_ACTS)
+			return -1;
+		if (actions[idx].type == RTE_FLOW_ACTION_TYPE_RSS ||
+		    actions[idx].type == RTE_FLOW_ACTION_TYPE_QUEUE) {
+			ins = true;
+			*ins_pos = idx;
+		}
+		if (actions[idx].type == RTE_FLOW_ACTION_TYPE_END)
+			act_end = true;
+	}
+	if (!ins)
+		return 0;
+	else if (idx == MLX5_HW_MAX_ACTS)
+		return -1; /* No more space. */
+	total = idx;
+	/* Before the position, no change for the actions. */
+	for (idx = 0; idx < *ins_pos; idx++) {
+		new_actions[idx] = actions[idx];
+		new_masks[idx] = masks[idx];
+	}
+	/* Insert the new action and mask to the position. */
+	new_actions[idx] = *ins_actions;
+	new_masks[idx] = *ins_masks;
+	/* Remaining content is right shifted by one position. */
+	for (; idx < total; idx++) {
+		new_actions[idx + 1] = actions[idx];
+		new_masks[idx + 1] = masks[idx];
+	}
+	return 0;
+}
+
 static int
 flow_hw_action_validate(struct rte_eth_dev *dev,
+			const struct rte_flow_actions_template_attr *attr,
 			const struct rte_flow_action actions[],
 			const struct rte_flow_action masks[],
 			struct rte_flow_error *error)
 {
-	int i;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint16_t i;
 	bool actions_end = false;
 	int ret;
 
+	/* FDB actions are only valid on the proxy port. */
+	if (attr->transfer && (!priv->sh->config.dv_esw_en || !priv->master))
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL,
+					  "transfer actions are only valid to proxy port");
 	for (i = 0; !actions_end; ++i) {
 		const struct rte_flow_action *action = &actions[i];
 		const struct rte_flow_action *mask = &masks[i];
 
+		MLX5_ASSERT(i < MLX5_HW_MAX_ACTS);
 		if (action->type != mask->type)
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ACTION,
@@ -2450,21 +2644,77 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	int len, act_len, mask_len, i;
-	struct rte_flow_actions_template *at;
+	struct rte_flow_actions_template *at = NULL;
+	uint16_t pos = MLX5_HW_MAX_ACTS;
+	struct rte_flow_action tmp_action[MLX5_HW_MAX_ACTS];
+	struct rte_flow_action tmp_mask[MLX5_HW_MAX_ACTS];
+	const struct rte_flow_action *ra;
+	const struct rte_flow_action *rm;
+	const struct rte_flow_action_modify_field rx_mreg = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_B,
+		},
+		.src = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_C_1,
+		},
+		.width = 32,
+	};
+	const struct rte_flow_action_modify_field rx_mreg_mask = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.src = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.width = UINT32_MAX,
+	};
+	const struct rte_flow_action rx_cpy = {
+		.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+		.conf = &rx_mreg,
+	};
+	const struct rte_flow_action rx_cpy_mask = {
+		.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+		.conf = &rx_mreg_mask,
+	};
 
-	if (flow_hw_action_validate(dev, actions, masks, error))
+	if (flow_hw_action_validate(dev, attr, actions, masks, error))
 		return NULL;
-	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS,
-				NULL, 0, actions, error);
+	if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
+	    priv->sh->config.dv_esw_en) {
+		if (flow_hw_action_meta_copy_insert(actions, masks, &rx_cpy, &rx_cpy_mask,
+						    tmp_action, tmp_mask, &pos)) {
+			rte_flow_error_set(error, EINVAL,
+					   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					   "Failed to concatenate new action/mask");
+			return NULL;
+		}
+	}
+	/* Application should make sure only one Q/RSS exists in one rule. */
+	if (pos == MLX5_HW_MAX_ACTS) {
+		ra = actions;
+		rm = masks;
+	} else {
+		ra = tmp_action;
+		rm = tmp_mask;
+	}
+	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, NULL, 0, ra, error);
 	if (act_len <= 0)
 		return NULL;
 	len = RTE_ALIGN(act_len, 16);
-	mask_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS,
-				 NULL, 0, masks, error);
+	mask_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, NULL, 0, rm, error);
 	if (mask_len <= 0)
 		return NULL;
 	len += RTE_ALIGN(mask_len, 16);
-	at = mlx5_malloc(MLX5_MEM_ZERO, len + sizeof(*at), 64, rte_socket_id());
+	at = mlx5_malloc(MLX5_MEM_ZERO, len + sizeof(*at),
+			 RTE_CACHE_LINE_SIZE, rte_socket_id());
 	if (!at) {
 		rte_flow_error_set(error, ENOMEM,
 				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
@@ -2472,18 +2722,20 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 				   "cannot allocate action template");
 		return NULL;
 	}
+	/* Actions part is in the first half. */
 	at->attr = *attr;
 	at->actions = (struct rte_flow_action *)(at + 1);
-	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, at->actions, len,
-				actions, error);
+	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, at->actions,
+				len, ra, error);
 	if (act_len <= 0)
 		goto error;
-	at->masks = (struct rte_flow_action *)
-		    (((uint8_t *)at->actions) + act_len);
+	/* Masks part is in the second half. */
+	at->masks = (struct rte_flow_action *)(((uint8_t *)at->actions) + act_len);
 	mask_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, at->masks,
-				 len - act_len, masks, error);
+				 len - act_len, rm, error);
 	if (mask_len <= 0)
 		goto error;
+	at->rx_cpy_pos = pos;
 	/*
 	 * mlx5 PMD hacks indirect action index directly to the action conf.
 	 * The rte_flow_conv() function copies the content from conf pointer.
@@ -2500,7 +2752,8 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 	LIST_INSERT_HEAD(&priv->flow_hw_at, at, next);
 	return at;
 error:
-	mlx5_free(at);
+	if (at)
+		mlx5_free(at);
 	return NULL;
 }
 
@@ -2575,6 +2828,80 @@ flow_hw_copy_prepend_port_item(const struct rte_flow_item *items,
 	return copied_items;
 }
 
+static int
+flow_hw_pattern_validate(struct rte_eth_dev *dev,
+			 const struct rte_flow_pattern_template_attr *attr,
+			 const struct rte_flow_item items[],
+			 struct rte_flow_error *error)
+{
+	int i;
+	bool items_end = false;
+	RTE_SET_USED(dev);
+	RTE_SET_USED(attr);
+
+	for (i = 0; !items_end; i++) {
+		int type = items[i].type;
+
+		switch (type) {
+		case RTE_FLOW_ITEM_TYPE_TAG:
+		{
+			int reg;
+			const struct rte_flow_item_tag *tag =
+				(const struct rte_flow_item_tag *)items[i].spec;
+
+			reg = flow_hw_get_reg_id(RTE_FLOW_ITEM_TYPE_TAG, tag->index);
+			if (reg == REG_NON)
+				return rte_flow_error_set(error, EINVAL,
+							  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+							  NULL,
+							  "Unsupported tag index");
+			break;
+		}
+		case MLX5_RTE_FLOW_ITEM_TYPE_TAG:
+		{
+			const struct rte_flow_item_tag *tag =
+				(const struct rte_flow_item_tag *)items[i].spec;
+			struct mlx5_priv *priv = dev->data->dev_private;
+			uint8_t regcs = (uint8_t)priv->sh->cdev->config.hca_attr.set_reg_c;
+
+			if (!((1 << (tag->index - REG_C_0)) & regcs))
+				return rte_flow_error_set(error, EINVAL,
+							  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+							  NULL,
+							  "Unsupported internal tag index");
+		}
+		case RTE_FLOW_ITEM_TYPE_VOID:
+		case RTE_FLOW_ITEM_TYPE_ETH:
+		case RTE_FLOW_ITEM_TYPE_VLAN:
+		case RTE_FLOW_ITEM_TYPE_IPV4:
+		case RTE_FLOW_ITEM_TYPE_IPV6:
+		case RTE_FLOW_ITEM_TYPE_UDP:
+		case RTE_FLOW_ITEM_TYPE_TCP:
+		case RTE_FLOW_ITEM_TYPE_GTP:
+		case RTE_FLOW_ITEM_TYPE_GTP_PSC:
+		case RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT:
+		case RTE_FLOW_ITEM_TYPE_VXLAN:
+		case MLX5_RTE_FLOW_ITEM_TYPE_SQ:
+		case RTE_FLOW_ITEM_TYPE_META:
+		case RTE_FLOW_ITEM_TYPE_GRE:
+		case RTE_FLOW_ITEM_TYPE_GRE_KEY:
+		case RTE_FLOW_ITEM_TYPE_GRE_OPTION:
+		case RTE_FLOW_ITEM_TYPE_ICMP:
+		case RTE_FLOW_ITEM_TYPE_ICMP6:
+			break;
+		case RTE_FLOW_ITEM_TYPE_END:
+			items_end = true;
+			break;
+		default:
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+						  NULL,
+						  "Unsupported item type");
+		}
+	}
+	return 0;
+}
+
 /**
  * Create flow item template.
  *
@@ -2601,6 +2928,8 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 	struct rte_flow_item *copied_items = NULL;
 	const struct rte_flow_item *tmpl_items;
 
+	if (flow_hw_pattern_validate(dev, attr, items, error))
+		return NULL;
 	if (priv->sh->config.dv_esw_en && attr->ingress) {
 		/*
 		 * Disallow pattern template with ingress and egress/transfer
@@ -3035,6 +3364,17 @@ flow_hw_free_vport_actions(struct mlx5_priv *priv)
 	priv->hw_vport = NULL;
 }
 
+static uint32_t
+flow_hw_usable_lsb_vport_mask(struct mlx5_priv *priv)
+{
+	uint32_t usable_mask = ~priv->vport_meta_mask;
+
+	if (usable_mask)
+		return (1 << rte_bsf32(usable_mask));
+	else
+		return 0;
+}
+
 /**
  * Creates a flow pattern template used to match on E-Switch Manager.
  * This template is used to set up a table for SQ miss default flow.
@@ -3073,7 +3413,10 @@ flow_hw_create_ctrl_esw_mgr_pattern_template(struct rte_eth_dev *dev)
 }
 
 /**
- * Creates a flow pattern template used to match on a TX queue.
+ * Creates a flow pattern template used to match REG_C_0 and a TX queue.
+ * Matching on REG_C_0 is set up to match on the least significant bit usable
+ * by user-space, which is set when the packet originates from the E-Switch Manager.
+ *
  * This template is used to set up a table for SQ miss default flow.
  *
  * @param dev
@@ -3083,16 +3426,30 @@ flow_hw_create_ctrl_esw_mgr_pattern_template(struct rte_eth_dev *dev)
  *   Pointer to flow pattern template on success, NULL otherwise.
  */
 static struct rte_flow_pattern_template *
-flow_hw_create_ctrl_sq_pattern_template(struct rte_eth_dev *dev)
+flow_hw_create_ctrl_regc_sq_pattern_template(struct rte_eth_dev *dev)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint32_t marker_bit = flow_hw_usable_lsb_vport_mask(priv);
 	struct rte_flow_pattern_template_attr attr = {
 		.relaxed_matching = 0,
 		.transfer = 1,
 	};
+	struct rte_flow_item_tag reg_c0_spec = {
+		.index = (uint8_t)REG_C_0,
+	};
+	struct rte_flow_item_tag reg_c0_mask = {
+		.index = 0xff,
+	};
 	struct mlx5_rte_flow_item_sq queue_mask = {
 		.queue = UINT32_MAX,
 	};
 	struct rte_flow_item items[] = {
+		{
+			.type = (enum rte_flow_item_type)
+				MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+			.spec = &reg_c0_spec,
+			.mask = &reg_c0_mask,
+		},
 		{
 			.type = (enum rte_flow_item_type)
 				MLX5_RTE_FLOW_ITEM_TYPE_SQ,
@@ -3103,6 +3460,12 @@ flow_hw_create_ctrl_sq_pattern_template(struct rte_eth_dev *dev)
 		},
 	};
 
+	if (!marker_bit) {
+		DRV_LOG(ERR, "Unable to set up pattern template for SQ miss table");
+		return NULL;
+	}
+	reg_c0_spec.data = marker_bit;
+	reg_c0_mask.data = marker_bit;
 	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
 }
 
@@ -3140,6 +3503,132 @@ flow_hw_create_ctrl_port_pattern_template(struct rte_eth_dev *dev)
 	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
 }
 
+/**
+ * Creates a flow pattern template matching all Ethernet packets.
+ * This template is used to set up a table for default Tx copy (Tx metadata
+ * to REG_C_1) flow rule usage.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow pattern template on success, NULL otherwise.
+ */
+static struct rte_flow_pattern_template *
+flow_hw_create_tx_default_mreg_copy_pattern_template(struct rte_eth_dev *dev)
+{
+	struct rte_flow_pattern_template_attr tx_pa_attr = {
+		.relaxed_matching = 0,
+		.egress = 1,
+	};
+	struct rte_flow_item_eth promisc = {
+		.dst.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+		.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+		.type = 0,
+	};
+	struct rte_flow_item eth_all[] = {
+		[0] = {
+			.type = RTE_FLOW_ITEM_TYPE_ETH,
+			.spec = &promisc,
+			.mask = &promisc,
+		},
+		[1] = {
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	struct rte_flow_error drop_err;
+
+	RTE_SET_USED(drop_err);
+	return flow_hw_pattern_template_create(dev, &tx_pa_attr, eth_all, &drop_err);
+}
+
+/**
+ * Creates a flow actions template with modify field action and masked jump action.
+ * Modify field action sets the least significant bit of REG_C_0 (usable by user-space)
+ * to 1, meaning that the packet originated from the E-Switch Manager. The jump
+ * action transfers steering to group 1.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow actions template on success, NULL otherwise.
+ */
+static struct rte_flow_actions_template *
+flow_hw_create_ctrl_regc_jump_actions_template(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint32_t marker_bit = flow_hw_usable_lsb_vport_mask(priv);
+	uint32_t marker_bit_mask = UINT32_MAX;
+	struct rte_flow_actions_template_attr attr = {
+		.transfer = 1,
+	};
+	struct rte_flow_action_modify_field set_reg_v = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_C_0,
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+		},
+		.width = 1,
+	};
+	struct rte_flow_action_modify_field set_reg_m = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+		},
+		.width = UINT32_MAX,
+	};
+	struct rte_flow_action_jump jump_v = {
+		.group = MLX5_HW_LOWEST_USABLE_GROUP,
+	};
+	struct rte_flow_action_jump jump_m = {
+		.group = UINT32_MAX,
+	};
+	struct rte_flow_action actions_v[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+			.conf = &set_reg_v,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump_v,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+	struct rte_flow_action actions_m[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+			.conf = &set_reg_m,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump_m,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+
+	if (!marker_bit) {
+		DRV_LOG(ERR, "Unable to set up actions template for SQ miss table");
+		return NULL;
+	}
+	set_reg_v.dst.offset = rte_bsf32(marker_bit);
+	rte_memcpy(set_reg_v.src.value, &marker_bit, sizeof(marker_bit));
+	rte_memcpy(set_reg_m.src.value, &marker_bit_mask, sizeof(marker_bit_mask));
+	return flow_hw_actions_template_create(dev, &attr, actions_v, actions_m, NULL);
+}
+
 /**
  * Creates a flow actions template with an unmasked JUMP action. Flows
  * based on this template will perform a jump to some group. This template
@@ -3234,6 +3723,73 @@ flow_hw_create_ctrl_port_actions_template(struct rte_eth_dev *dev)
 					       NULL);
 }
 
+/**
+ * Creates an actions template that uses the header modify action for
+ * register copying. This template is used to set up a table for the copy flow.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to flow actions template on success, NULL otherwise.
+ */
+static struct rte_flow_actions_template *
+flow_hw_create_tx_default_mreg_copy_actions_template(struct rte_eth_dev *dev)
+{
+	struct rte_flow_actions_template_attr tx_act_attr = {
+		.egress = 1,
+	};
+	const struct rte_flow_action_modify_field mreg_action = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_C_1,
+		},
+		.src = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_A,
+		},
+		.width = 32,
+	};
+	const struct rte_flow_action_modify_field mreg_mask = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.src = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.width = UINT32_MAX,
+	};
+	const struct rte_flow_action copy_reg_action[] = {
+		[0] = {
+			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+			.conf = &mreg_action,
+		},
+		[1] = {
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		},
+	};
+	const struct rte_flow_action copy_reg_mask[] = {
+		[0] = {
+			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+			.conf = &mreg_mask,
+		},
+		[1] = {
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		},
+	};
+	struct rte_flow_error drop_err;
+
+	RTE_SET_USED(drop_err);
+	return flow_hw_actions_template_create(dev, &tx_act_attr, copy_reg_action,
+					       copy_reg_mask, &drop_err);
+}
+
 /**
  * Creates a control flow table used to transfer traffic from E-Switch Manager
  * and TX queues from group 0 to group 1.
@@ -3263,8 +3819,12 @@ flow_hw_create_ctrl_sq_miss_root_table(struct rte_eth_dev *dev,
 		},
 		.nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
 	};
+	struct mlx5_flow_template_table_cfg cfg = {
+		.attr = attr,
+		.external = false,
+	};
 
-	return flow_hw_table_create(dev, &attr, &it, 1, &at, 1, NULL);
+	return flow_hw_table_create(dev, &cfg, &it, 1, &at, 1, NULL);
 }
 
 
@@ -3289,16 +3849,56 @@ flow_hw_create_ctrl_sq_miss_table(struct rte_eth_dev *dev,
 {
 	struct rte_flow_template_table_attr attr = {
 		.flow_attr = {
-			.group = MLX5_HW_SQ_MISS_GROUP,
-			.priority = 0,
+			.group = 1,
+			.priority = MLX5_HW_LOWEST_PRIO_NON_ROOT,
 			.ingress = 0,
 			.egress = 0,
 			.transfer = 1,
 		},
 		.nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
 	};
+	struct mlx5_flow_template_table_cfg cfg = {
+		.attr = attr,
+		.external = false,
+	};
+
+	return flow_hw_table_create(dev, &cfg, &it, 1, &at, 1, NULL);
+}
+
+/**
+ * Creates the default Tx metadata copy table on NIC Tx group 0.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param pt
+ *   Pointer to flow pattern template.
+ * @param at
+ *   Pointer to flow actions template.
+ *
+ * @return
+ *   Pointer to flow table on success, NULL otherwise.
+ */
+static struct rte_flow_template_table *
+flow_hw_create_tx_default_mreg_copy_table(struct rte_eth_dev *dev,
+					  struct rte_flow_pattern_template *pt,
+					  struct rte_flow_actions_template *at)
+{
+	struct rte_flow_template_table_attr tx_tbl_attr = {
+		.flow_attr = {
+			.group = 0, /* Root */
+			.priority = MLX5_HW_LOWEST_PRIO_ROOT,
+			.egress = 1,
+		},
+		.nb_flows = 1, /* One default flow rule for all. */
+	};
+	struct mlx5_flow_template_table_cfg tx_tbl_cfg = {
+		.attr = tx_tbl_attr,
+		.external = false,
+	};
+	struct rte_flow_error drop_err;
 
-	return flow_hw_table_create(dev, &attr, &it, 1, &at, 1, NULL);
+	RTE_SET_USED(drop_err);
+	return flow_hw_table_create(dev, &tx_tbl_cfg, &pt, 1, &at, 1, &drop_err);
 }
 
 /**
@@ -3323,15 +3923,19 @@ flow_hw_create_ctrl_jump_table(struct rte_eth_dev *dev,
 	struct rte_flow_template_table_attr attr = {
 		.flow_attr = {
 			.group = 0,
-			.priority = 15, /* TODO: Flow priority discovery. */
+			.priority = MLX5_HW_LOWEST_PRIO_ROOT,
 			.ingress = 0,
 			.egress = 0,
 			.transfer = 1,
 		},
 		.nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
 	};
+	struct mlx5_flow_template_table_cfg cfg = {
+		.attr = attr,
+		.external = false,
+	};
 
-	return flow_hw_table_create(dev, &attr, &it, 1, &at, 1, NULL);
+	return flow_hw_table_create(dev, &cfg, &it, 1, &at, 1, NULL);
 }
 
 /**
@@ -3349,11 +3953,14 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow_pattern_template *esw_mgr_items_tmpl = NULL;
-	struct rte_flow_pattern_template *sq_items_tmpl = NULL;
+	struct rte_flow_pattern_template *regc_sq_items_tmpl = NULL;
 	struct rte_flow_pattern_template *port_items_tmpl = NULL;
-	struct rte_flow_actions_template *jump_sq_actions_tmpl = NULL;
+	struct rte_flow_pattern_template *tx_meta_items_tmpl = NULL;
+	struct rte_flow_actions_template *regc_jump_actions_tmpl = NULL;
 	struct rte_flow_actions_template *port_actions_tmpl = NULL;
 	struct rte_flow_actions_template *jump_one_actions_tmpl = NULL;
+	struct rte_flow_actions_template *tx_meta_actions_tmpl = NULL;
+	uint32_t xmeta = priv->sh->config.dv_xmeta_en;
 
 	/* Item templates */
 	esw_mgr_items_tmpl = flow_hw_create_ctrl_esw_mgr_pattern_template(dev);
@@ -3362,8 +3969,8 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 			" template for control flows", dev->data->port_id);
 		goto error;
 	}
-	sq_items_tmpl = flow_hw_create_ctrl_sq_pattern_template(dev);
-	if (!sq_items_tmpl) {
+	regc_sq_items_tmpl = flow_hw_create_ctrl_regc_sq_pattern_template(dev);
+	if (!regc_sq_items_tmpl) {
 		DRV_LOG(ERR, "port %u failed to create SQ item template for"
 			" control flows", dev->data->port_id);
 		goto error;
@@ -3374,11 +3981,18 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 			" control flows", dev->data->port_id);
 		goto error;
 	}
+	if (xmeta == MLX5_XMETA_MODE_META32_HWS) {
+		tx_meta_items_tmpl = flow_hw_create_tx_default_mreg_copy_pattern_template(dev);
+		if (!tx_meta_items_tmpl) {
+			DRV_LOG(ERR, "port %u failed to create Tx metadata copy pattern"
+				" template for control flows", dev->data->port_id);
+			goto error;
+		}
+	}
 	/* Action templates */
-	jump_sq_actions_tmpl = flow_hw_create_ctrl_jump_actions_template(dev,
-									 MLX5_HW_SQ_MISS_GROUP);
-	if (!jump_sq_actions_tmpl) {
-		DRV_LOG(ERR, "port %u failed to create jump action template"
+	regc_jump_actions_tmpl = flow_hw_create_ctrl_regc_jump_actions_template(dev);
+	if (!regc_jump_actions_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create REG_C set and jump action template"
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
@@ -3388,23 +4002,32 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
-	jump_one_actions_tmpl = flow_hw_create_ctrl_jump_actions_template(dev, 1);
+	jump_one_actions_tmpl = flow_hw_create_ctrl_jump_actions_template
+			(dev, MLX5_HW_LOWEST_USABLE_GROUP);
 	if (!jump_one_actions_tmpl) {
 		DRV_LOG(ERR, "port %u failed to create jump action template"
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
+	if (xmeta == MLX5_XMETA_MODE_META32_HWS) {
+		tx_meta_actions_tmpl = flow_hw_create_tx_default_mreg_copy_actions_template(dev);
+		if (!tx_meta_actions_tmpl) {
+			DRV_LOG(ERR, "port %u failed to create Tx metadata copy actions"
+				" template for control flows", dev->data->port_id);
+			goto error;
+		}
+	}
 	/* Tables */
 	MLX5_ASSERT(priv->hw_esw_sq_miss_root_tbl == NULL);
 	priv->hw_esw_sq_miss_root_tbl = flow_hw_create_ctrl_sq_miss_root_table
-			(dev, esw_mgr_items_tmpl, jump_sq_actions_tmpl);
+			(dev, esw_mgr_items_tmpl, regc_jump_actions_tmpl);
 	if (!priv->hw_esw_sq_miss_root_tbl) {
 		DRV_LOG(ERR, "port %u failed to create table for default sq miss (root table)"
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
 	MLX5_ASSERT(priv->hw_esw_sq_miss_tbl == NULL);
-	priv->hw_esw_sq_miss_tbl = flow_hw_create_ctrl_sq_miss_table(dev, sq_items_tmpl,
+	priv->hw_esw_sq_miss_tbl = flow_hw_create_ctrl_sq_miss_table(dev, regc_sq_items_tmpl,
 								     port_actions_tmpl);
 	if (!priv->hw_esw_sq_miss_tbl) {
 		DRV_LOG(ERR, "port %u failed to create table for default sq miss (non-root table)"
@@ -3419,6 +4042,16 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
+	if (xmeta == MLX5_XMETA_MODE_META32_HWS) {
+		MLX5_ASSERT(priv->hw_tx_meta_cpy_tbl == NULL);
+		priv->hw_tx_meta_cpy_tbl = flow_hw_create_tx_default_mreg_copy_table(dev,
+					tx_meta_items_tmpl, tx_meta_actions_tmpl);
+		if (!priv->hw_tx_meta_cpy_tbl) {
+			DRV_LOG(ERR, "port %u failed to create table for default"
+				" Tx metadata copy flow rule", dev->data->port_id);
+			goto error;
+		}
+	}
 	return 0;
 error:
 	if (priv->hw_esw_zero_tbl) {
@@ -3433,16 +4066,20 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 		flow_hw_table_destroy(dev, priv->hw_esw_sq_miss_root_tbl, NULL);
 		priv->hw_esw_sq_miss_root_tbl = NULL;
 	}
+	if (xmeta == MLX5_XMETA_MODE_META32_HWS && tx_meta_actions_tmpl)
+		flow_hw_actions_template_destroy(dev, tx_meta_actions_tmpl, NULL);
 	if (jump_one_actions_tmpl)
 		flow_hw_actions_template_destroy(dev, jump_one_actions_tmpl, NULL);
 	if (port_actions_tmpl)
 		flow_hw_actions_template_destroy(dev, port_actions_tmpl, NULL);
-	if (jump_sq_actions_tmpl)
-		flow_hw_actions_template_destroy(dev, jump_sq_actions_tmpl, NULL);
+	if (regc_jump_actions_tmpl)
+		flow_hw_actions_template_destroy(dev, regc_jump_actions_tmpl, NULL);
+	if (xmeta == MLX5_XMETA_MODE_META32_HWS && tx_meta_items_tmpl)
+		flow_hw_pattern_template_destroy(dev, tx_meta_items_tmpl, NULL);
 	if (port_items_tmpl)
 		flow_hw_pattern_template_destroy(dev, port_items_tmpl, NULL);
-	if (sq_items_tmpl)
-		flow_hw_pattern_template_destroy(dev, sq_items_tmpl, NULL);
+	if (regc_sq_items_tmpl)
+		flow_hw_pattern_template_destroy(dev, regc_sq_items_tmpl, NULL);
 	if (esw_mgr_items_tmpl)
 		flow_hw_pattern_template_destroy(dev, esw_mgr_items_tmpl, NULL);
 	return -EINVAL;
@@ -3494,7 +4131,7 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	struct rte_flow_queue_attr **_queue_attr = NULL;
 	struct rte_flow_queue_attr ctrl_queue_attr = {0};
 	bool is_proxy = !!(priv->sh->config.dv_esw_en && priv->master);
-	int ret;
+	int ret = 0;
 
 	if (!port_attr || !nb_queue || !queue_attr) {
 		rte_errno = EINVAL;
@@ -3645,6 +4282,9 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	}
 	if (_queue_attr)
 		mlx5_free(_queue_attr);
+	/* Do not overwrite the internal errno information. */
+	if (ret)
+		return ret;
 	return rte_flow_error_set(error, rte_errno,
 				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
 				  "fail to configure port");
@@ -3754,17 +4394,17 @@ void flow_hw_init_tags_set(struct rte_eth_dev *dev)
 		return;
 	unset |= 1 << (priv->mtr_color_reg - REG_C_0);
 	unset |= 1 << (REG_C_6 - REG_C_0);
-	if (meta_mode == MLX5_XMETA_MODE_META32_HWS) {
-		unset |= 1 << (REG_C_1 - REG_C_0);
+	if (priv->sh->config.dv_esw_en)
 		unset |= 1 << (REG_C_0 - REG_C_0);
-	}
+	if (meta_mode == MLX5_XMETA_MODE_META32_HWS)
+		unset |= 1 << (REG_C_1 - REG_C_0);
 	masks &= ~unset;
 	if (mlx5_flow_hw_avl_tags_init_cnt) {
 		for (i = 0; i < MLX5_FLOW_HW_TAGS_MAX; i++) {
 			if (mlx5_flow_hw_avl_tags[i] != REG_NON && !!((1 << i) & masks)) {
 				copy[mlx5_flow_hw_avl_tags[i] - REG_C_0] =
 						mlx5_flow_hw_avl_tags[i];
-				copy_masks |= (1 << i);
+				copy_masks |= (1 << (mlx5_flow_hw_avl_tags[i] - REG_C_0));
 			}
 		}
 		if (copy_masks != masks) {
@@ -3906,7 +4546,6 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 	return flow_dv_action_destroy(dev, handle, error);
 }
 
-
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.info_get = flow_hw_info_get,
 	.configure = flow_hw_configure,
@@ -3914,7 +4553,7 @@ const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.pattern_template_destroy = flow_hw_pattern_template_destroy,
 	.actions_template_create = flow_hw_actions_template_create,
 	.actions_template_destroy = flow_hw_actions_template_destroy,
-	.template_table_create = flow_hw_table_create,
+	.template_table_create = flow_hw_template_table_create,
 	.template_table_destroy = flow_hw_table_destroy,
 	.async_flow_create = flow_hw_async_flow_create,
 	.async_flow_destroy = flow_hw_async_flow_destroy,
@@ -3930,13 +4569,6 @@ const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.action_query = flow_dv_action_query,
 };
 
-static uint32_t
-flow_hw_get_ctrl_queue(struct mlx5_priv *priv)
-{
-	MLX5_ASSERT(priv->nb_queue > 0);
-	return priv->nb_queue - 1;
-}
-
 /**
  * Creates a control flow using flow template API on @p proxy_dev device,
  * on behalf of @p owner_dev device.
@@ -3974,7 +4606,7 @@ flow_hw_create_ctrl_flow(struct rte_eth_dev *owner_dev,
 			 uint8_t action_template_idx)
 {
 	struct mlx5_priv *priv = proxy_dev->data->dev_private;
-	uint32_t queue = flow_hw_get_ctrl_queue(priv);
+	uint32_t queue = priv->nb_queue - 1;
 	struct rte_flow_op_attr op_attr = {
 		.postpone = 0,
 	};
@@ -4049,7 +4681,7 @@ static int
 flow_hw_destroy_ctrl_flow(struct rte_eth_dev *dev, struct rte_flow *flow)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	uint32_t queue = flow_hw_get_ctrl_queue(priv);
+	uint32_t queue = priv->nb_queue - 1;
 	struct rte_flow_op_attr op_attr = {
 		.postpone = 0,
 	};
@@ -4186,10 +4818,24 @@ mlx5_flow_hw_esw_create_mgr_sq_miss_flow(struct rte_eth_dev *dev)
 			.type = RTE_FLOW_ITEM_TYPE_END,
 		},
 	};
+	struct rte_flow_action_modify_field modify_field = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+		},
+		.width = 1,
+	};
 	struct rte_flow_action_jump jump = {
-		.group = MLX5_HW_SQ_MISS_GROUP,
+		.group = 1,
 	};
 	struct rte_flow_action actions[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+			.conf = &modify_field,
+		},
 		{
 			.type = RTE_FLOW_ACTION_TYPE_JUMP,
 			.conf = &jump,
@@ -4212,6 +4858,12 @@ int
 mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
 {
 	uint16_t port_id = dev->data->port_id;
+	struct rte_flow_item_tag reg_c0_spec = {
+		.index = (uint8_t)REG_C_0,
+	};
+	struct rte_flow_item_tag reg_c0_mask = {
+		.index = 0xff,
+	};
 	struct mlx5_rte_flow_item_sq queue_spec = {
 		.queue = txq,
 	};
@@ -4219,6 +4871,12 @@ mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
 		.queue = UINT32_MAX,
 	};
 	struct rte_flow_item items[] = {
+		{
+			.type = (enum rte_flow_item_type)
+				MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+			.spec = &reg_c0_spec,
+			.mask = &reg_c0_mask,
+		},
 		{
 			.type = (enum rte_flow_item_type)
 				MLX5_RTE_FLOW_ITEM_TYPE_SQ,
@@ -4244,6 +4902,7 @@ mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
 	struct rte_eth_dev *proxy_dev;
 	struct mlx5_priv *proxy_priv;
 	uint16_t proxy_port_id = dev->data->port_id;
+	uint32_t marker_bit;
 	int ret;
 
 	RTE_SET_USED(txq);
@@ -4264,6 +4923,14 @@ mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
 		rte_errno = ENOMEM;
 		return -rte_errno;
 	}
+	marker_bit = flow_hw_usable_lsb_vport_mask(proxy_priv);
+	if (!marker_bit) {
+		DRV_LOG(ERR, "Unable to set up control flow in SQ miss table");
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	reg_c0_spec.data = marker_bit;
+	reg_c0_mask.data = marker_bit;
 	return flow_hw_create_ctrl_flow(dev, proxy_dev,
 					proxy_priv->hw_esw_sq_miss_tbl,
 					items, 0, actions, 0);
@@ -4323,4 +4990,53 @@ mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev)
 					items, 0, actions, 0);
 }
 
+int
+mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_flow_item_eth promisc = {
+		.dst.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+		.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+		.type = 0,
+	};
+	struct rte_flow_item eth_all[] = {
+		[0] = {
+			.type = RTE_FLOW_ITEM_TYPE_ETH,
+			.spec = &promisc,
+			.mask = &promisc,
+		},
+		[1] = {
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	struct rte_flow_action_modify_field mreg_action = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_C_1,
+		},
+		.src = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_A,
+		},
+		.width = 32,
+	};
+	struct rte_flow_action copy_reg_action[] = {
+		[0] = {
+			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+			.conf = &mreg_action,
+		},
+		[1] = {
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		},
+	};
+
+	MLX5_ASSERT(priv->master);
+	if (!priv->dr_ctx || !priv->hw_tx_meta_cpy_tbl)
+		return 0;
+	return flow_hw_create_ctrl_flow(dev, dev,
+					priv->hw_tx_meta_cpy_tbl,
+					eth_all, 0, copy_reg_action, 0);
+}
+
 #endif
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index f59d314ff4..cccec08d70 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -1292,6 +1292,9 @@ mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
 	if (priv->sh->config.dv_esw_en && priv->master) {
 		if (mlx5_flow_hw_esw_create_mgr_sq_miss_flow(dev))
 			goto error;
+		if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS)
+			if (mlx5_flow_hw_create_tx_default_mreg_copy_flow(dev))
+				goto error;
 	}
 	for (i = 0; i < priv->txqs_n; ++i) {
 		struct mlx5_txq_ctrl *txq = mlx5_txq_get(dev, i);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v6 07/18] net/mlx5: add HW steering meter action
  2022-10-20 15:41 ` [PATCH v6 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (5 preceding siblings ...)
  2022-10-20 15:41   ` [PATCH v6 06/18] net/mlx5: add extended metadata mode for hardware steering Suanming Mou
@ 2022-10-20 15:41   ` Suanming Mou
  2022-10-24  9:44     ` Slava Ovsiienko
  2022-10-20 15:41   ` [PATCH v6 08/18] net/mlx5: add HW steering counter action Suanming Mou
                     ` (11 subsequent siblings)
  18 siblings, 1 reply; 140+ messages in thread
From: Suanming Mou @ 2022-10-20 15:41 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Alexander Kozyrev

From: Alexander Kozyrev <akozyrev@nvidia.com>

This commit adds meter action for HWS steering.

HW steering meter is based on ASO. The number of meters to be
used by flows should be specified in advance in the flow
configure API.
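
For illustration, a minimal sketch (not part of this patch) of how an
application could reserve meters up front through the template API; the
function name, port ID, queue size and meter count below are arbitrary
placeholders:

    #include <rte_flow.h>

    /* Sketch: pre-provision ASO meters at configure time so that rules
     * created later can reference them via RTE_FLOW_ACTION_TYPE_METER.
     */
    static int
    example_configure_meters(uint16_t port_id)
    {
            struct rte_flow_error err;
            const struct rte_flow_port_attr port_attr = {
                    .nb_meters = 128, /* must be declared in advance */
            };
            const struct rte_flow_queue_attr q_attr = { .size = 64 };
            const struct rte_flow_queue_attr *q_attrs[] = { &q_attr };

            return rte_flow_configure(port_id, &port_attr, 1, q_attrs, &err);
    }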

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
---
 doc/guides/rel_notes/release_22_11.rst |   1 +
 drivers/net/mlx5/mlx5.h                |  61 ++-
 drivers/net/mlx5/mlx5_flow.c           |  71 +++
 drivers/net/mlx5/mlx5_flow.h           |  24 +
 drivers/net/mlx5/mlx5_flow_aso.c       |  30 +-
 drivers/net/mlx5/mlx5_flow_hw.c        | 258 ++++++++-
 drivers/net/mlx5/mlx5_flow_meter.c     | 702 ++++++++++++++++++++++++-
 7 files changed, 1111 insertions(+), 36 deletions(-)

diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index c69e796c16..a0030aac37 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -254,6 +254,7 @@ New Features
   * Added full support for queue based async HW steering to the PMD:
     - Support of modify fields.
     - Support of FDB.
+    - Support of meter.
 
 * **Rewritten pmdinfo script.**
 
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index e15b80ba92..f99820c045 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -362,6 +362,9 @@ struct mlx5_hw_q {
 	struct mlx5_hw_q_job **job; /* LIFO header. */
 } __rte_cache_aligned;
 
+
+
+
 #define MLX5_COUNTERS_PER_POOL 512
 #define MLX5_MAX_PENDING_QUERIES 4
 #define MLX5_CNT_CONTAINER_RESIZE 64
@@ -787,15 +790,29 @@ struct mlx5_flow_meter_policy {
 	/* Is meter action in policy table. */
 	uint32_t hierarchy_drop_cnt:1;
 	/* Is any meter in hierarchy contains drop_cnt. */
+	uint32_t skip_r:1;
+	/* If red color policy is skipped. */
 	uint32_t skip_y:1;
 	/* If yellow color policy is skipped. */
 	uint32_t skip_g:1;
 	/* If green color policy is skipped. */
 	uint32_t mark:1;
 	/* If policy contains mark action. */
+	uint32_t initialized:1;
+	/* Initialized. */
+	uint16_t group;
+	/* The group. */
 	rte_spinlock_t sl;
 	uint32_t ref_cnt;
 	/* Use count. */
+	struct rte_flow_pattern_template *hws_item_templ;
+	/* Hardware steering item templates. */
+	struct rte_flow_actions_template *hws_act_templ[MLX5_MTR_DOMAIN_MAX];
+	/* Hardware steering action templates. */
+	struct rte_flow_template_table *hws_flow_table[MLX5_MTR_DOMAIN_MAX];
+	/* Hardware steering tables. */
+	struct rte_flow *hws_flow_rule[MLX5_MTR_DOMAIN_MAX][RTE_COLORS];
+	/* Hardware steering rules. */
 	struct mlx5_meter_policy_action_container act_cnt[MLX5_MTR_RTE_COLORS];
 	/* Policy actions container. */
 	void *dr_drop_action[MLX5_MTR_DOMAIN_MAX];
@@ -870,6 +887,7 @@ struct mlx5_flow_meter_info {
 	 */
 	uint32_t transfer:1;
 	uint32_t def_policy:1;
+	uint32_t initialized:1;
 	/* Meter points to default policy. */
 	uint32_t color_aware:1;
 	/* Meter is color aware mode. */
@@ -885,6 +903,10 @@ struct mlx5_flow_meter_info {
 	/**< Flow meter action. */
 	void *meter_action_y;
 	/**< Flow meter action for yellow init_color. */
+	uint32_t meter_offset;
+	/**< Flow meter offset. */
+	uint16_t group;
+	/**< Flow meter group. */
 };
 
 /* PPS(packets per second) map to BPS(Bytes per second).
@@ -919,6 +941,7 @@ struct mlx5_flow_meter_profile {
 	uint32_t ref_cnt; /**< Use count. */
 	uint32_t g_support:1; /**< If G color will be generated. */
 	uint32_t y_support:1; /**< If Y color will be generated. */
+	uint32_t initialized:1; /**< Initialized. */
 };
 
 /* 2 meters in each ASO cache line */
@@ -939,13 +962,20 @@ enum mlx5_aso_mtr_state {
 	ASO_METER_READY, /* CQE received. */
 };
 
+/* ASO flow meter type. */
+enum mlx5_aso_mtr_type {
+	ASO_METER_INDIRECT,
+	ASO_METER_DIRECT,
+};
+
 /* Generic aso_flow_meter information. */
 struct mlx5_aso_mtr {
 	LIST_ENTRY(mlx5_aso_mtr) next;
+	enum mlx5_aso_mtr_type type;
 	struct mlx5_flow_meter_info fm;
 	/**< Pointer to the next aso flow meter structure. */
 	uint8_t state; /**< ASO flow meter state. */
-	uint8_t offset;
+	uint32_t offset;
 };
 
 /* Generic aso_flow_meter pool structure. */
@@ -969,6 +999,14 @@ struct mlx5_aso_mtr_pools_mng {
 	struct mlx5_aso_mtr_pool **pools; /* ASO flow meter pool array. */
 };
 
+/* Bulk management structure for ASO flow meter. */
+struct mlx5_mtr_bulk {
+	uint32_t size; /* Number of ASO objects. */
+	struct mlx5dr_action *action; /* HWS action */
+	struct mlx5_devx_obj *devx_obj; /* DEVX object. */
+	struct mlx5_aso_mtr *aso; /* Array of ASO objects. */
+};
+
 /* Meter management structure for global flow meter resource. */
 struct mlx5_flow_mtr_mng {
 	struct mlx5_aso_mtr_pools_mng pools_mng;
@@ -1022,6 +1060,7 @@ struct mlx5_flow_tbl_resource {
 #define MLX5_FLOW_TABLE_LEVEL_METER (MLX5_MAX_TABLES - 3)
 #define MLX5_FLOW_TABLE_LEVEL_POLICY (MLX5_MAX_TABLES - 4)
 #define MLX5_MAX_TABLES_EXTERNAL MLX5_FLOW_TABLE_LEVEL_POLICY
+#define MLX5_FLOW_TABLE_HWS_POLICY (MLX5_MAX_TABLES - 10)
 #define MLX5_MAX_TABLES_FDB UINT16_MAX
 #define MLX5_FLOW_TABLE_FACTOR 10
 
@@ -1314,6 +1353,12 @@ TAILQ_HEAD(mlx5_mtr_profiles, mlx5_flow_meter_profile);
 /* MTR list. */
 TAILQ_HEAD(mlx5_legacy_flow_meters, mlx5_legacy_flow_meter);
 
+struct mlx5_mtr_config {
+	uint32_t nb_meters; /**< Number of configured meters */
+	uint32_t nb_meter_profiles; /**< Number of configured meter profiles */
+	uint32_t nb_meter_policies; /**< Number of configured meter policies */
+};
+
 /* RSS description. */
 struct mlx5_flow_rss_desc {
 	uint32_t level;
@@ -1551,12 +1596,16 @@ struct mlx5_priv {
 	struct mlx5_nl_vlan_vmwa_context *vmwa_context; /* VLAN WA context. */
 	struct mlx5_hlist *mreg_cp_tbl;
 	/* Hash table of Rx metadata register copy table. */
+	struct mlx5_mtr_config mtr_config; /* Meter configuration */
 	uint8_t mtr_sfx_reg; /* Meter prefix-suffix flow match REG_C. */
 	uint8_t mtr_color_reg; /* Meter color match REG_C. */
 	struct mlx5_legacy_flow_meters flow_meters; /* MTR list. */
 	struct mlx5_l3t_tbl *mtr_profile_tbl; /* Meter index lookup table. */
+	struct mlx5_flow_meter_profile *mtr_profile_arr; /* Profile array. */
 	struct mlx5_l3t_tbl *policy_idx_tbl; /* Policy index lookup table. */
+	struct mlx5_flow_meter_policy *mtr_policy_arr; /* Policy array. */
 	struct mlx5_l3t_tbl *mtr_idx_tbl; /* Meter index lookup table. */
+	struct mlx5_mtr_bulk mtr_bulk; /* Meter index mapping for HWS */
 	uint8_t skip_default_rss_reta; /* Skip configuration of default reta. */
 	uint8_t fdb_def_rule; /* Whether fdb jump to table 1 is configured. */
 	struct mlx5_mp_id mp_id; /* ID of a multi-process process */
@@ -1570,13 +1619,13 @@ struct mlx5_priv {
 	struct mlx5_flex_item flex_item[MLX5_PORT_FLEX_ITEM_NUM];
 	/* Flex items have been created on the port. */
 	uint32_t flex_item_map; /* Map of allocated flex item elements. */
+	uint32_t nb_queue; /* HW steering queue number. */
 #if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
 	/* Item template list. */
 	LIST_HEAD(flow_hw_itt, rte_flow_pattern_template) flow_hw_itt;
 	/* Action template list. */
 	LIST_HEAD(flow_hw_at, rte_flow_actions_template) flow_hw_at;
 	struct mlx5dr_context *dr_ctx; /**< HW steering DR context. */
-	uint32_t nb_queue; /* HW steering queue number. */
 	/* HW steering queue polling mechanism job descriptor LIFO. */
 	struct mlx5_hw_q *hw_q;
 	/* HW steering rte flow table list header. */
@@ -1592,6 +1641,7 @@ struct mlx5_priv {
 
 #define PORT_ID(priv) ((priv)->dev_data->port_id)
 #define ETH_DEV(priv) (&rte_eth_devices[PORT_ID(priv)])
+#define CTRL_QUEUE_ID(priv) ((priv)->nb_queue - 1)
 
 struct rte_hairpin_peer_info {
 	uint32_t qp_id;
@@ -1903,6 +1953,11 @@ void mlx5_pmd_socket_uninit(void);
 
 /* mlx5_flow_meter.c */
 
+int mlx5_flow_meter_init(struct rte_eth_dev *dev,
+			 uint32_t nb_meters,
+			 uint32_t nb_meter_profiles,
+			 uint32_t nb_meter_policies);
+void mlx5_flow_meter_uninit(struct rte_eth_dev *dev);
 int mlx5_flow_meter_ops_get(struct rte_eth_dev *dev, void *arg);
 struct mlx5_flow_meter_info *mlx5_flow_meter_find(struct mlx5_priv *priv,
 		uint32_t meter_id, uint32_t *mtr_idx);
@@ -1977,7 +2032,7 @@ int mlx5_aso_flow_hit_queue_poll_stop(struct mlx5_dev_ctx_shared *sh);
 void mlx5_aso_queue_uninit(struct mlx5_dev_ctx_shared *sh,
 		enum mlx5_access_aso_opc_mod aso_opc_mod);
 int mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
-		struct mlx5_aso_mtr *mtr);
+		struct mlx5_aso_mtr *mtr, struct mlx5_mtr_bulk *bulk);
 int mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh,
 		struct mlx5_aso_mtr *mtr);
 int mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 51d2b42755..1c97b77031 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -8333,6 +8333,40 @@ mlx5_flow_port_configure(struct rte_eth_dev *dev,
 	return fops->configure(dev, port_attr, nb_queue, queue_attr, error);
 }
 
+/**
+ * Validate item template.
+ *
+ * @param[in] dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param[in] attr
+ *   Pointer to the item template attributes.
+ * @param[in] items
+ *   The template item pattern.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_flow_pattern_validate(struct rte_eth_dev *dev,
+		const struct rte_flow_pattern_template_attr *attr,
+		const struct rte_flow_item items[],
+		struct rte_flow_error *error)
+{
+	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
+
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW) {
+		rte_flow_error_set(error, ENOTSUP,
+			RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+			"pattern validate with incorrect steering mode");
+		return -ENOTSUP;
+	}
+	fops = flow_get_drv_ops(MLX5_FLOW_TYPE_HW);
+	return fops->pattern_validate(dev, attr, items, error);
+}
+
 /**
  * Create flow item template.
  *
@@ -8398,6 +8432,43 @@ mlx5_flow_pattern_template_destroy(struct rte_eth_dev *dev,
 	return fops->pattern_template_destroy(dev, template, error);
 }
 
+/**
+ * Validate flow actions template.
+ *
+ * @param[in] dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param[in] attr
+ *   Pointer to the action template attributes.
+ * @param[in] actions
+ *   Associated actions (list terminated by the END action).
+ * @param[in] masks
+ *   List of actions that marks which of the action's member is constant.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_flow_actions_validate(struct rte_eth_dev *dev,
+			const struct rte_flow_actions_template_attr *attr,
+			const struct rte_flow_action actions[],
+			const struct rte_flow_action masks[],
+			struct rte_flow_error *error)
+{
+	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr fattr = {0};
+
+	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW) {
+		rte_flow_error_set(error, ENOTSUP,
+			RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+			"actions validate with incorrect steering mode");
+		return -ENOTSUP;
+	}
+	fops = flow_get_drv_ops(MLX5_FLOW_TYPE_HW);
+	return fops->actions_validate(dev, attr, actions, masks, error);
+}
+
 /**
  * Create flow item template.
  *
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 2adf516691..f9600568a0 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1668,6 +1668,11 @@ typedef int (*mlx5_flow_port_configure_t)
 			 uint16_t nb_queue,
 			 const struct rte_flow_queue_attr *queue_attr[],
 			 struct rte_flow_error *err);
+typedef int (*mlx5_flow_pattern_validate_t)
+			(struct rte_eth_dev *dev,
+			 const struct rte_flow_pattern_template_attr *attr,
+			 const struct rte_flow_item items[],
+			 struct rte_flow_error *error);
 typedef struct rte_flow_pattern_template *(*mlx5_flow_pattern_template_create_t)
 			(struct rte_eth_dev *dev,
 			 const struct rte_flow_pattern_template_attr *attr,
@@ -1677,6 +1682,12 @@ typedef int (*mlx5_flow_pattern_template_destroy_t)
 			(struct rte_eth_dev *dev,
 			 struct rte_flow_pattern_template *template,
 			 struct rte_flow_error *error);
+typedef int (*mlx5_flow_actions_validate_t)
+			(struct rte_eth_dev *dev,
+			 const struct rte_flow_actions_template_attr *attr,
+			 const struct rte_flow_action actions[],
+			 const struct rte_flow_action masks[],
+			 struct rte_flow_error *error);
 typedef struct rte_flow_actions_template *(*mlx5_flow_actions_template_create_t)
 			(struct rte_eth_dev *dev,
 			 const struct rte_flow_actions_template_attr *attr,
@@ -1793,8 +1804,10 @@ struct mlx5_flow_driver_ops {
 	mlx5_flow_item_update_t item_update;
 	mlx5_flow_info_get_t info_get;
 	mlx5_flow_port_configure_t configure;
+	mlx5_flow_pattern_validate_t pattern_validate;
 	mlx5_flow_pattern_template_create_t pattern_template_create;
 	mlx5_flow_pattern_template_destroy_t pattern_template_destroy;
+	mlx5_flow_actions_validate_t actions_validate;
 	mlx5_flow_actions_template_create_t actions_template_create;
 	mlx5_flow_actions_template_destroy_t actions_template_destroy;
 	mlx5_flow_table_create_t template_table_create;
@@ -1876,6 +1889,8 @@ mlx5_aso_meter_by_idx(struct mlx5_priv *priv, uint32_t idx)
 
 	/* Decrease to original index. */
 	idx--;
+	if (priv->mtr_bulk.aso)
+		return priv->mtr_bulk.aso + idx;
 	MLX5_ASSERT(idx / MLX5_ASO_MTRS_PER_POOL < pools_mng->n);
 	rte_rwlock_read_lock(&pools_mng->resize_mtrwl);
 	pool = pools_mng->pools[idx / MLX5_ASO_MTRS_PER_POOL];
@@ -2390,4 +2405,13 @@ int mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev,
 					 uint32_t txq);
 int mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev);
 int mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev);
+int mlx5_flow_actions_validate(struct rte_eth_dev *dev,
+		const struct rte_flow_actions_template_attr *attr,
+		const struct rte_flow_action actions[],
+		const struct rte_flow_action masks[],
+		struct rte_flow_error *error);
+int mlx5_flow_pattern_validate(struct rte_eth_dev *dev,
+		const struct rte_flow_pattern_template_attr *attr,
+		const struct rte_flow_item items[],
+		struct rte_flow_error *error);
 #endif /* RTE_PMD_MLX5_FLOW_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow_aso.c b/drivers/net/mlx5/mlx5_flow_aso.c
index 4129e3a9e0..60d0280367 100644
--- a/drivers/net/mlx5/mlx5_flow_aso.c
+++ b/drivers/net/mlx5/mlx5_flow_aso.c
@@ -642,7 +642,8 @@ mlx5_aso_flow_hit_queue_poll_stop(struct mlx5_dev_ctx_shared *sh)
 static uint16_t
 mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 			       struct mlx5_aso_sq *sq,
-			       struct mlx5_aso_mtr *aso_mtr)
+			       struct mlx5_aso_mtr *aso_mtr,
+			       struct mlx5_mtr_bulk *bulk)
 {
 	volatile struct mlx5_aso_wqe *wqe = NULL;
 	struct mlx5_flow_meter_info *fm = NULL;
@@ -653,6 +654,7 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	uint32_t dseg_idx = 0;
 	struct mlx5_aso_mtr_pool *pool = NULL;
 	uint32_t param_le;
+	int id;
 
 	rte_spinlock_lock(&sq->sqsl);
 	res = size - (uint16_t)(sq->head - sq->tail);
@@ -666,14 +668,19 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	/* Fill next WQE. */
 	fm = &aso_mtr->fm;
 	sq->elts[sq->head & mask].mtr = aso_mtr;
-	pool = container_of(aso_mtr, struct mlx5_aso_mtr_pool,
-			mtrs[aso_mtr->offset]);
-	wqe->general_cseg.misc = rte_cpu_to_be_32(pool->devx_obj->id +
-			(aso_mtr->offset >> 1));
-	wqe->general_cseg.opcode = rte_cpu_to_be_32(MLX5_OPCODE_ACCESS_ASO |
-			(ASO_OPC_MOD_POLICER <<
-			WQE_CSEG_OPC_MOD_OFFSET) |
-			sq->pi << WQE_CSEG_WQE_INDEX_OFFSET);
+	if (aso_mtr->type == ASO_METER_INDIRECT) {
+		pool = container_of(aso_mtr, struct mlx5_aso_mtr_pool,
+				    mtrs[aso_mtr->offset]);
+		id = pool->devx_obj->id;
+	} else {
+		id = bulk->devx_obj->id;
+	}
+	wqe->general_cseg.misc = rte_cpu_to_be_32(id +
+						  (aso_mtr->offset >> 1));
+	wqe->general_cseg.opcode =
+		rte_cpu_to_be_32(MLX5_OPCODE_ACCESS_ASO |
+			(ASO_OPC_MOD_POLICER << WQE_CSEG_OPC_MOD_OFFSET) |
+			 sq->pi << WQE_CSEG_WQE_INDEX_OFFSET);
 	/* There are 2 meters in one ASO cache line. */
 	dseg_idx = aso_mtr->offset & 0x1;
 	wqe->aso_cseg.data_mask =
@@ -811,14 +818,15 @@ mlx5_aso_mtr_completion_handle(struct mlx5_aso_sq *sq)
  */
 int
 mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
-			struct mlx5_aso_mtr *mtr)
+			struct mlx5_aso_mtr *mtr,
+			struct mlx5_mtr_bulk *bulk)
 {
 	struct mlx5_aso_sq *sq = &sh->mtrmng->pools_mng.sq;
 	uint32_t poll_wqe_times = MLX5_MTR_POLL_WQE_CQE_TIMES;
 
 	do {
 		mlx5_aso_mtr_completion_handle(sq);
-		if (mlx5_aso_mtr_sq_enqueue_single(sh, sq, mtr))
+		if (mlx5_aso_mtr_sq_enqueue_single(sh, sq, mtr, bulk))
 			return 0;
 		/* Waiting for wqe resource. */
 		rte_delay_us_sleep(MLX5_ASO_WQE_CQE_RESPONSE_DELAY);
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index e4c3ec9b28..6be83e37de 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -914,6 +914,38 @@ flow_hw_represented_port_compile(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static __rte_always_inline int
+flow_hw_meter_compile(struct rte_eth_dev *dev,
+		      const struct mlx5_flow_template_table_cfg *cfg,
+		      uint32_t  start_pos, const struct rte_flow_action *action,
+		      struct mlx5_hw_actions *acts, uint32_t *end_pos,
+		      struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr *aso_mtr;
+	const struct rte_flow_action_meter *meter = action->conf;
+	uint32_t pos = start_pos;
+	uint32_t group = cfg->attr.flow_attr.group;
+
+	aso_mtr = mlx5_aso_meter_by_idx(priv, meter->mtr_id);
+	acts->rule_acts[pos].action = priv->mtr_bulk.action;
+	acts->rule_acts[pos].aso_meter.offset = aso_mtr->offset;
+	acts->jump = flow_hw_jump_action_register
+			(dev, cfg, aso_mtr->fm.group, error);
+	if (!acts->jump) {
+		*end_pos = start_pos;
+		return -ENOMEM;
+	}
+	acts->rule_acts[++pos].action = (!!group) ?
+				    acts->jump->hws_action :
+				    acts->jump->root_action;
+	*end_pos = pos;
+	if (mlx5_aso_mtr_wait(priv->sh, aso_mtr)) {
+		*end_pos = start_pos;
+		return -ENOMEM;
+	}
+	return 0;
+}
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -1145,6 +1177,21 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 				goto err;
 			i++;
 			break;
+		case RTE_FLOW_ACTION_TYPE_METER:
+			if (actions->conf && masks->conf &&
+			    ((const struct rte_flow_action_meter *)
+			     masks->conf)->mtr_id) {
+				err = flow_hw_meter_compile(dev, cfg,
+						i, actions, acts, &i, error);
+				if (err)
+					goto err;
+			} else if (__flow_hw_act_data_general_append(priv, acts,
+							actions->type,
+							actions - action_start,
+							i))
+				goto err;
+			i++;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -1485,6 +1532,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	const struct rte_flow_action_raw_encap *raw_encap_data;
 	const struct rte_flow_item *enc_item = NULL;
 	const struct rte_flow_action_ethdev *port_action = NULL;
+	const struct rte_flow_action_meter *meter = NULL;
 	uint8_t *buf = job->encap_data;
 	struct rte_flow_attr attr = {
 			.ingress = 1,
@@ -1492,6 +1540,8 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	uint32_t ft_flag;
 	size_t encap_len = 0;
 	int ret;
+	struct mlx5_aso_mtr *mtr;
+	uint32_t mtr_id;
 
 	memcpy(rule_acts, hw_acts->rule_acts,
 	       sizeof(*rule_acts) * hw_acts->acts_num);
@@ -1611,6 +1661,29 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			rule_acts[act_data->action_dst].action =
 					priv->hw_vport[port_action->port_id];
 			break;
+		case RTE_FLOW_ACTION_TYPE_METER:
+			meter = action->conf;
+			mtr_id = meter->mtr_id;
+			mtr = mlx5_aso_meter_by_idx(priv, mtr_id);
+			rule_acts[act_data->action_dst].action =
+				priv->mtr_bulk.action;
+			rule_acts[act_data->action_dst].aso_meter.offset =
+								mtr->offset;
+			jump = flow_hw_jump_action_register
+				(dev, &table->cfg, mtr->fm.group, NULL);
+			if (!jump)
+				return -1;
+			MLX5_ASSERT
+				(!rule_acts[act_data->action_dst + 1].action);
+			rule_acts[act_data->action_dst + 1].action =
+					(!!attr.group) ? jump->hws_action :
+							 jump->root_action;
+			job->flow->jump = jump;
+			job->flow->fate_type = MLX5_FLOW_FATE_JUMP;
+			(*acts_num)++;
+			if (mlx5_aso_mtr_wait(priv->sh, mtr))
+				return -1;
+			break;
 		default:
 			break;
 		}
@@ -2526,7 +2599,7 @@ flow_hw_action_meta_copy_insert(const struct rte_flow_action actions[],
 }
 
 static int
-flow_hw_action_validate(struct rte_eth_dev *dev,
+flow_hw_actions_validate(struct rte_eth_dev *dev,
 			const struct rte_flow_actions_template_attr *attr,
 			const struct rte_flow_action actions[],
 			const struct rte_flow_action masks[],
@@ -2592,6 +2665,9 @@ flow_hw_action_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_RAW_DECAP:
 			/* TODO: Validation logic */
 			break;
+		case RTE_FLOW_ACTION_TYPE_METER:
+			/* TODO: Validation logic */
+			break;
 		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
 			ret = flow_hw_validate_action_modify_field(action,
 									mask,
@@ -2685,7 +2761,7 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 		.conf = &rx_mreg_mask,
 	};
 
-	if (flow_hw_action_validate(dev, attr, actions, masks, error))
+	if (flow_hw_actions_validate(dev, attr, actions, masks, error))
 		return NULL;
 	if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
 	    priv->sh->config.dv_esw_en) {
@@ -3031,15 +3107,24 @@ flow_hw_pattern_template_destroy(struct rte_eth_dev *dev __rte_unused,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-flow_hw_info_get(struct rte_eth_dev *dev __rte_unused,
-		 struct rte_flow_port_info *port_info __rte_unused,
-		 struct rte_flow_queue_info *queue_info __rte_unused,
+flow_hw_info_get(struct rte_eth_dev *dev,
+		 struct rte_flow_port_info *port_info,
+		 struct rte_flow_queue_info *queue_info,
 		 struct rte_flow_error *error __rte_unused)
 {
-	/* Nothing to be updated currently. */
+	uint16_t port_id = dev->data->port_id;
+	struct rte_mtr_capabilities mtr_cap;
+	int ret;
+
 	memset(port_info, 0, sizeof(*port_info));
 	/* Queue size is unlimited from low-level. */
+	port_info->max_nb_queues = UINT32_MAX;
 	queue_info->max_size = UINT32_MAX;
+
+	memset(&mtr_cap, 0, sizeof(struct rte_mtr_capabilities));
+	ret = rte_mtr_capabilities_get(port_id, &mtr_cap, NULL);
+	if (!ret)
+		port_info->max_nb_meters = mtr_cap.n_max;
 	return 0;
 }
 
@@ -4234,6 +4319,10 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	priv->nb_queue = nb_q_updated;
 	rte_spinlock_init(&priv->hw_ctrl_lock);
 	LIST_INIT(&priv->hw_ctrl_flows);
+	/* Initialize meter library. */
+	if (port_attr->nb_meters)
+		if (mlx5_flow_meter_init(dev, port_attr->nb_meters, 1, 1))
+			goto err;
 	/* Add global actions. */
 	for (i = 0; i < MLX5_HW_ACTION_FLAG_MAX; i++) {
 		uint32_t act_flags = 0;
@@ -4549,8 +4638,10 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.info_get = flow_hw_info_get,
 	.configure = flow_hw_configure,
+	.pattern_validate = flow_hw_pattern_validate,
 	.pattern_template_create = flow_hw_pattern_template_create,
 	.pattern_template_destroy = flow_hw_pattern_template_destroy,
+	.actions_validate = flow_hw_actions_validate,
 	.actions_template_create = flow_hw_actions_template_create,
 	.actions_template_destroy = flow_hw_actions_template_destroy,
 	.template_table_create = flow_hw_template_table_create,
@@ -4606,7 +4697,7 @@ flow_hw_create_ctrl_flow(struct rte_eth_dev *owner_dev,
 			 uint8_t action_template_idx)
 {
 	struct mlx5_priv *priv = proxy_dev->data->dev_private;
-	uint32_t queue = priv->nb_queue - 1;
+	uint32_t queue = CTRL_QUEUE_ID(priv);
 	struct rte_flow_op_attr op_attr = {
 		.postpone = 0,
 	};
@@ -4681,7 +4772,7 @@ static int
 flow_hw_destroy_ctrl_flow(struct rte_eth_dev *dev, struct rte_flow *flow)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	uint32_t queue = priv->nb_queue - 1;
+	uint32_t queue = CTRL_QUEUE_ID(priv);
 	struct rte_flow_op_attr op_attr = {
 		.postpone = 0,
 	};
@@ -5039,4 +5130,155 @@ mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev)
 					eth_all, 0, copy_reg_action, 0);
 }
 
+void
+mlx5_flow_meter_uninit(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (priv->mtr_policy_arr) {
+		mlx5_free(priv->mtr_policy_arr);
+		priv->mtr_policy_arr = NULL;
+	}
+	if (priv->mtr_profile_arr) {
+		mlx5_free(priv->mtr_profile_arr);
+		priv->mtr_profile_arr = NULL;
+	}
+	if (priv->mtr_bulk.aso) {
+		mlx5_free(priv->mtr_bulk.aso);
+		priv->mtr_bulk.aso = NULL;
+		priv->mtr_bulk.size = 0;
+		mlx5_aso_queue_uninit(priv->sh, ASO_OPC_MOD_POLICER);
+	}
+	if (priv->mtr_bulk.action) {
+		mlx5dr_action_destroy(priv->mtr_bulk.action);
+		priv->mtr_bulk.action = NULL;
+	}
+	if (priv->mtr_bulk.devx_obj) {
+		claim_zero(mlx5_devx_cmd_destroy(priv->mtr_bulk.devx_obj));
+		priv->mtr_bulk.devx_obj = NULL;
+	}
+}
+
+int
+mlx5_flow_meter_init(struct rte_eth_dev *dev,
+		     uint32_t nb_meters,
+		     uint32_t nb_meter_profiles,
+		     uint32_t nb_meter_policies)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_devx_obj *dcs = NULL;
+	uint32_t log_obj_size;
+	int ret = 0;
+	int reg_id;
+	struct mlx5_aso_mtr *aso;
+	uint32_t i;
+	struct rte_flow_error error;
+
+	if (!nb_meters || !nb_meter_profiles || !nb_meter_policies) {
+		ret = ENOTSUP;
+		rte_flow_error_set(&error, ENOMEM,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter configuration is invalid.");
+		goto err;
+	}
+	if (!priv->mtr_en || !priv->sh->meter_aso_en) {
+		ret = ENOTSUP;
+		rte_flow_error_set(&error, ENOTSUP,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter ASO is not supported.");
+		goto err;
+	}
+	priv->mtr_config.nb_meters = nb_meters;
+	if (mlx5_aso_queue_init(priv->sh, ASO_OPC_MOD_POLICER)) {
+		ret = ENOMEM;
+		rte_flow_error_set(&error, ENOMEM,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter ASO queue allocation failed.");
+		goto err;
+	}
+	log_obj_size = rte_log2_u32(nb_meters >> 1);
+	dcs = mlx5_devx_cmd_create_flow_meter_aso_obj
+		(priv->sh->cdev->ctx, priv->sh->cdev->pdn,
+			log_obj_size);
+	if (!dcs) {
+		ret = ENOMEM;
+		rte_flow_error_set(&error, ENOMEM,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter ASO object allocation failed.");
+		goto err;
+	}
+	priv->mtr_bulk.devx_obj = dcs;
+	reg_id = mlx5_flow_get_reg_id(dev, MLX5_MTR_COLOR, 0, NULL);
+	if (reg_id < 0) {
+		ret = ENOTSUP;
+		rte_flow_error_set(&error, ENOTSUP,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter register is not available.");
+		goto err;
+	}
+	priv->mtr_bulk.action = mlx5dr_action_create_aso_meter
+			(priv->dr_ctx, (struct mlx5dr_devx_obj *)dcs,
+				reg_id - REG_C_0, MLX5DR_ACTION_FLAG_HWS_RX |
+				MLX5DR_ACTION_FLAG_HWS_TX |
+				MLX5DR_ACTION_FLAG_HWS_FDB);
+	if (!priv->mtr_bulk.action) {
+		ret = ENOMEM;
+		rte_flow_error_set(&error, ENOMEM,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter action creation failed.");
+		goto err;
+	}
+	priv->mtr_bulk.aso = mlx5_malloc(MLX5_MEM_ZERO,
+						sizeof(struct mlx5_aso_mtr) * nb_meters,
+						RTE_CACHE_LINE_SIZE,
+						SOCKET_ID_ANY);
+	if (!priv->mtr_bulk.aso) {
+		ret = ENOMEM;
+		rte_flow_error_set(&error, ENOMEM,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter bulk ASO allocation failed.");
+		goto err;
+	}
+	priv->mtr_bulk.size = nb_meters;
+	aso = priv->mtr_bulk.aso;
+	for (i = 0; i < priv->mtr_bulk.size; i++) {
+		aso->type = ASO_METER_DIRECT;
+		aso->state = ASO_METER_WAIT;
+		aso->offset = i;
+		aso++;
+	}
+	priv->mtr_config.nb_meter_profiles = nb_meter_profiles;
+	priv->mtr_profile_arr =
+		mlx5_malloc(MLX5_MEM_ZERO,
+				sizeof(struct mlx5_flow_meter_profile) *
+				nb_meter_profiles,
+				RTE_CACHE_LINE_SIZE,
+				SOCKET_ID_ANY);
+	if (!priv->mtr_profile_arr) {
+		ret = ENOMEM;
+		rte_flow_error_set(&error, ENOMEM,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter profile allocation failed.");
+		goto err;
+	}
+	priv->mtr_config.nb_meter_policies = nb_meter_policies;
+	priv->mtr_policy_arr =
+		mlx5_malloc(MLX5_MEM_ZERO,
+				sizeof(struct mlx5_flow_meter_policy) *
+				nb_meter_policies,
+				RTE_CACHE_LINE_SIZE,
+				SOCKET_ID_ANY);
+	if (!priv->mtr_policy_arr) {
+		ret = ENOMEM;
+		rte_flow_error_set(&error, ENOMEM,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL, "Meter policy allocation failed.");
+		goto err;
+	}
+	return 0;
+err:
+	mlx5_flow_meter_uninit(dev);
+	return ret;
+}
+
 #endif
diff --git a/drivers/net/mlx5/mlx5_flow_meter.c b/drivers/net/mlx5/mlx5_flow_meter.c
index d4aafe4eea..8cf24d1f7a 100644
--- a/drivers/net/mlx5/mlx5_flow_meter.c
+++ b/drivers/net/mlx5/mlx5_flow_meter.c
@@ -98,6 +98,8 @@ mlx5_flow_meter_profile_find(struct mlx5_priv *priv, uint32_t meter_profile_id)
 	union mlx5_l3t_data data;
 	int32_t ret;
 
+	if (priv->mtr_profile_arr)
+		return &priv->mtr_profile_arr[meter_profile_id];
 	if (mlx5_l3t_get_entry(priv->mtr_profile_tbl,
 			       meter_profile_id, &data) || !data.ptr)
 		return NULL;
@@ -145,17 +147,29 @@ mlx5_flow_meter_profile_validate(struct rte_eth_dev *dev,
 					  RTE_MTR_ERROR_TYPE_METER_PROFILE,
 					  NULL, "Meter profile is null.");
 	/* Meter profile ID must be valid. */
-	if (meter_profile_id == UINT32_MAX)
-		return -rte_mtr_error_set(error, EINVAL,
-					  RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
-					  NULL, "Meter profile id not valid.");
-	/* Meter profile must not exist. */
-	fmp = mlx5_flow_meter_profile_find(priv, meter_profile_id);
-	if (fmp)
-		return -rte_mtr_error_set(error, EEXIST,
-					  RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
-					  NULL,
-					  "Meter profile already exists.");
+	if (priv->mtr_profile_arr) {
+		if (meter_profile_id >= priv->mtr_config.nb_meter_profiles)
+			return -rte_mtr_error_set(error, EINVAL,
+					RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					NULL, "Meter profile id not valid.");
+		fmp = mlx5_flow_meter_profile_find(priv, meter_profile_id);
+		/* Meter profile must not exist. */
+		if (fmp->initialized)
+			return -rte_mtr_error_set(error, EEXIST,
+					RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					NULL, "Meter profile already exists.");
+	} else {
+		if (meter_profile_id == UINT32_MAX)
+			return -rte_mtr_error_set(error, EINVAL,
+					RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					NULL, "Meter profile id not valid.");
+		fmp = mlx5_flow_meter_profile_find(priv, meter_profile_id);
+		/* Meter profile must not exist. */
+		if (fmp)
+			return -rte_mtr_error_set(error, EEXIST,
+					RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					NULL, "Meter profile already exists.");
+	}
 	if (!priv->sh->meter_aso_en) {
 		/* Old version is even not supported. */
 		if (!priv->sh->cdev->config.hca_attr.qos.flow_meter_old)
@@ -574,6 +588,96 @@ mlx5_flow_meter_profile_delete(struct rte_eth_dev *dev,
 	return 0;
 }
 
+/**
+ * Callback to add MTR profile with HWS.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] meter_profile_id
+ *   Meter profile id.
+ * @param[in] profile
+ *   Pointer to meter profile detail.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_profile_hws_add(struct rte_eth_dev *dev,
+			uint32_t meter_profile_id,
+			struct rte_mtr_meter_profile *profile,
+			struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_meter_profile *fmp;
+	int ret;
+
+	if (!priv->mtr_profile_arr)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "Meter profile array is not allocated");
+	/* Check input params. */
+	ret = mlx5_flow_meter_profile_validate(dev, meter_profile_id,
+					       profile, error);
+	if (ret)
+		return ret;
+	fmp = mlx5_flow_meter_profile_find(priv, meter_profile_id);
+	/* Fill profile info. */
+	fmp->id = meter_profile_id;
+	fmp->profile = *profile;
+	fmp->initialized = 1;
+	/* Fill the flow meter parameters for the PRM. */
+	return mlx5_flow_meter_param_fill(fmp, error);
+}
+
+/**
+ * Callback to delete MTR profile with HWS.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] meter_profile_id
+ *   Meter profile id.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_profile_hws_delete(struct rte_eth_dev *dev,
+			uint32_t meter_profile_id,
+			struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_meter_profile *fmp;
+
+	if (!priv->mtr_profile_arr)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "Meter profile array is not allocated");
+	/* Meter id must be valid. */
+	if (meter_profile_id >= priv->mtr_config.nb_meter_profiles)
+		return -rte_mtr_error_set(error, EINVAL,
+					  RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					  &meter_profile_id,
+					  "Meter profile id not valid.");
+	/* Meter profile must exist. */
+	fmp = mlx5_flow_meter_profile_find(priv, meter_profile_id);
+	if (!fmp->initialized)
+		return -rte_mtr_error_set(error, ENOENT,
+					  RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					  &meter_profile_id,
+					  "Meter profile does not exist.");
+	/* Check profile is unused. */
+	if (fmp->ref_cnt)
+		return -rte_mtr_error_set(error, EBUSY,
+					  RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+					  NULL, "Meter profile is in use.");
+	memset(fmp, 0, sizeof(struct mlx5_flow_meter_profile));
+	return 0;
+}
+
 /**
  * Find policy by id.
  *
@@ -594,6 +698,11 @@ mlx5_flow_meter_policy_find(struct rte_eth_dev *dev,
 	struct mlx5_flow_meter_sub_policy *sub_policy = NULL;
 	union mlx5_l3t_data data;
 
+	if (priv->mtr_policy_arr) {
+		if (policy_idx)
+			*policy_idx = policy_id;
+		return &priv->mtr_policy_arr[policy_id];
+	}
 	if (policy_id > MLX5_MAX_SUB_POLICY_TBL_NUM || !priv->policy_idx_tbl)
 		return NULL;
 	if (mlx5_l3t_get_entry(priv->policy_idx_tbl, policy_id, &data) ||
@@ -710,6 +819,43 @@ mlx5_flow_meter_policy_validate(struct rte_eth_dev *dev,
 	return 0;
 }
 
+/**
+ * Callback to check MTR policy action validate for HWS
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] actions
+ *   Pointer to meter policy action detail.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_policy_hws_validate(struct rte_eth_dev *dev,
+	struct rte_mtr_meter_policy_params *policy,
+	struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_actions_template_attr attr = {
+		.transfer = priv->sh->config.dv_esw_en ? 1 : 0 };
+	int ret;
+	int i;
+
+	if (!priv->mtr_en || !priv->sh->meter_aso_en)
+		return -rte_mtr_error_set(error, ENOTSUP,
+				RTE_MTR_ERROR_TYPE_METER_POLICY,
+				NULL, "meter policy unsupported.");
+	for (i = 0; i < RTE_COLORS; i++) {
+		ret = mlx5_flow_actions_validate(dev, &attr, policy->actions[i],
+						 policy->actions[i], NULL);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
 static int
 __mlx5_flow_meter_policy_delete(struct rte_eth_dev *dev,
 			uint32_t policy_id,
@@ -1004,6 +1150,338 @@ mlx5_flow_meter_policy_delete(struct rte_eth_dev *dev,
 	return 0;
 }
 
+/**
+ * Callback to delete MTR policy for HWS.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] policy_id
+ *   Meter policy id.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_policy_hws_delete(struct rte_eth_dev *dev,
+			  uint32_t policy_id,
+			  struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_meter_policy *mtr_policy;
+	uint32_t i, j;
+	uint32_t nb_flows = 0;
+	int ret;
+	struct rte_flow_op_attr op_attr = { .postpone = 1 };
+	struct rte_flow_op_result result[RTE_COLORS * MLX5_MTR_DOMAIN_MAX];
+
+	if (!priv->mtr_policy_arr)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "Meter policy array is not allocated");
+	/* Meter id must be valid. */
+	if (policy_id >= priv->mtr_config.nb_meter_policies)
+		return -rte_mtr_error_set(error, EINVAL,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+					  &policy_id,
+					  "Meter policy id not valid.");
+	/* Meter policy must exist. */
+	mtr_policy = mlx5_flow_meter_policy_find(dev, policy_id, NULL);
+	if (!mtr_policy->initialized)
+		return -rte_mtr_error_set(error, ENOENT,
+			RTE_MTR_ERROR_TYPE_METER_POLICY_ID, NULL,
+			"Meter policy does not exist.");
+	/* Check policy is unused. */
+	if (mtr_policy->ref_cnt)
+		return -rte_mtr_error_set(error, EBUSY,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+					  NULL, "Meter policy is in use.");
+	rte_spinlock_lock(&priv->hw_ctrl_lock);
+	for (i = 0; i < MLX5_MTR_DOMAIN_MAX; i++) {
+		for (j = 0; j < RTE_COLORS; j++) {
+			if (mtr_policy->hws_flow_rule[i][j]) {
+				ret = rte_flow_async_destroy(dev->data->port_id,
+					CTRL_QUEUE_ID(priv), &op_attr,
+					mtr_policy->hws_flow_rule[i][j],
+					NULL, NULL);
+				if (ret < 0)
+					continue;
+				nb_flows++;
+			}
+		}
+	}
+	ret = rte_flow_push(dev->data->port_id, CTRL_QUEUE_ID(priv), NULL);
+	while (nb_flows && (ret >= 0)) {
+		ret = rte_flow_pull(dev->data->port_id,
+					CTRL_QUEUE_ID(priv), result,
+					nb_flows, NULL);
+		nb_flows -= ret;
+	}
+	for (i = 0; i < MLX5_MTR_DOMAIN_MAX; i++) {
+		if (mtr_policy->hws_flow_table[i])
+			rte_flow_template_table_destroy(dev->data->port_id,
+				 mtr_policy->hws_flow_table[i], NULL);
+	}
+	for (i = 0; i < RTE_COLORS; i++) {
+		if (mtr_policy->hws_act_templ[i])
+			rte_flow_actions_template_destroy(dev->data->port_id,
+				 mtr_policy->hws_act_templ[i], NULL);
+	}
+	if (mtr_policy->hws_item_templ)
+		rte_flow_pattern_template_destroy(dev->data->port_id,
+				mtr_policy->hws_item_templ, NULL);
+	rte_spinlock_unlock(&priv->hw_ctrl_lock);
+	memset(mtr_policy, 0, sizeof(struct mlx5_flow_meter_policy));
+	return 0;
+}
+
+/**
+ * Callback to add MTR policy for HWS.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[out] policy_id
+ *   Pointer to policy id
+ * @param[in] actions
+ *   Pointer to meter policy action detail.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_policy_hws_add(struct rte_eth_dev *dev,
+			uint32_t policy_id,
+			struct rte_mtr_meter_policy_params *policy,
+			struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_meter_policy *mtr_policy = NULL;
+	const struct rte_flow_action *act;
+	const struct rte_flow_action_meter *mtr;
+	struct mlx5_flow_meter_info *fm;
+	struct mlx5_flow_meter_policy *plc;
+	uint8_t domain_color = MLX5_MTR_ALL_DOMAIN_BIT;
+	bool is_rss = false;
+	bool is_hierarchy = false;
+	int i, j;
+	uint32_t nb_colors = 0;
+	uint32_t nb_flows = 0;
+	int color;
+	int ret;
+	struct rte_flow_pattern_template_attr pta = {0};
+	struct rte_flow_actions_template_attr ata = {0};
+	struct rte_flow_template_table_attr ta = { {0}, 0 };
+	struct rte_flow_op_attr op_attr = { .postpone = 1 };
+	struct rte_flow_op_result result[RTE_COLORS * MLX5_MTR_DOMAIN_MAX];
+	const uint32_t color_mask = (UINT32_C(1) << MLX5_MTR_COLOR_BITS) - 1;
+	int color_reg_c_idx = mlx5_flow_get_reg_id(dev, MLX5_MTR_COLOR,
+						   0, NULL);
+	struct rte_flow_item_tag tag_spec = {
+		.data = 0,
+		.index = color_reg_c_idx
+	};
+	struct rte_flow_item_tag tag_mask = {
+		.data = color_mask,
+		.index = 0xff};
+	struct rte_flow_item pattern[] = {
+		[0] = {
+			.type = (enum rte_flow_item_type)
+				MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+			.spec = &tag_spec,
+			.mask = &tag_mask,
+		},
+		[1] = { .type = RTE_FLOW_ITEM_TYPE_END }
+	};
+
+	if (!priv->mtr_policy_arr)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY,
+					  NULL, "Meter policy array is not allocated.");
+	if (policy_id >= priv->mtr_config.nb_meter_policies)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+					  NULL, "Meter policy id not valid.");
+	mtr_policy = mlx5_flow_meter_policy_find(dev, policy_id, NULL);
+	if (mtr_policy->initialized)
+		return -rte_mtr_error_set(error, EEXIST,
+			RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+			NULL, "Meter policy already exists.");
+	if (!policy ||
+	    !policy->actions[RTE_COLOR_RED] ||
+	    !policy->actions[RTE_COLOR_YELLOW] ||
+	    !policy->actions[RTE_COLOR_GREEN])
+		return -rte_mtr_error_set(error, EINVAL,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY,
+					  NULL, "Meter policy actions are not valid.");
+	if (policy->actions[RTE_COLOR_RED] == RTE_FLOW_ACTION_TYPE_END)
+		mtr_policy->skip_r = 1;
+	if (policy->actions[RTE_COLOR_YELLOW] == RTE_FLOW_ACTION_TYPE_END)
+		mtr_policy->skip_y = 1;
+	if (policy->actions[RTE_COLOR_GREEN] == RTE_FLOW_ACTION_TYPE_END)
+		mtr_policy->skip_g = 1;
+	if (mtr_policy->skip_r && mtr_policy->skip_y && mtr_policy->skip_g)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+					  NULL, "Meter policy actions are empty.");
+	for (i = 0; i < RTE_COLORS; i++) {
+		act = policy->actions[i];
+		while (act && act->type != RTE_FLOW_ACTION_TYPE_END) {
+			switch (act->type) {
+			case RTE_FLOW_ACTION_TYPE_PORT_ID:
+				/* fall-through. */
+			case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
+				domain_color &= ~(MLX5_MTR_DOMAIN_INGRESS_BIT |
+						  MLX5_MTR_DOMAIN_EGRESS_BIT);
+				break;
+			case RTE_FLOW_ACTION_TYPE_RSS:
+				is_rss = true;
+				/* fall-through. */
+			case RTE_FLOW_ACTION_TYPE_QUEUE:
+				domain_color &= ~(MLX5_MTR_DOMAIN_EGRESS_BIT |
+						  MLX5_MTR_DOMAIN_TRANSFER_BIT);
+				break;
+			case RTE_FLOW_ACTION_TYPE_METER:
+				is_hierarchy = true;
+				mtr = act->conf;
+				fm = mlx5_flow_meter_find(priv,
+							  mtr->mtr_id, NULL);
+				if (!fm)
+					return -rte_mtr_error_set(error, EINVAL,
+						RTE_MTR_ERROR_TYPE_MTR_ID, NULL,
+						"Meter not found in meter hierarchy.");
+				plc = mlx5_flow_meter_policy_find(dev,
+								  fm->policy_id,
+								  NULL);
+				MLX5_ASSERT(plc);
+				domain_color &= MLX5_MTR_ALL_DOMAIN_BIT &
+					(plc->ingress <<
+					 MLX5_MTR_DOMAIN_INGRESS);
+				domain_color &= MLX5_MTR_ALL_DOMAIN_BIT &
+					(plc->egress <<
+					 MLX5_MTR_DOMAIN_EGRESS);
+				domain_color &= MLX5_MTR_ALL_DOMAIN_BIT &
+					(plc->transfer <<
+					 MLX5_MTR_DOMAIN_TRANSFER);
+				break;
+			default:
+				break;
+			}
+			act++;
+		}
+	}
+	if (!domain_color)
+		return -rte_mtr_error_set(error, ENOTSUP,
+					  RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+					  NULL, "Meter policy domains are conflicting.");
+	mtr_policy->is_rss = is_rss;
+	mtr_policy->ingress = !!(domain_color & MLX5_MTR_DOMAIN_INGRESS_BIT);
+	pta.ingress = mtr_policy->ingress;
+	mtr_policy->egress = !!(domain_color & MLX5_MTR_DOMAIN_EGRESS_BIT);
+	pta.egress = mtr_policy->egress;
+	mtr_policy->transfer = !!(domain_color & MLX5_MTR_DOMAIN_TRANSFER_BIT);
+	pta.transfer = mtr_policy->transfer;
+	mtr_policy->group = MLX5_FLOW_TABLE_HWS_POLICY - policy_id;
+	mtr_policy->is_hierarchy = is_hierarchy;
+	mtr_policy->initialized = 1;
+	rte_spinlock_lock(&priv->hw_ctrl_lock);
+	mtr_policy->hws_item_templ =
+		rte_flow_pattern_template_create(dev->data->port_id,
+						 &pta, pattern, NULL);
+	if (!mtr_policy->hws_item_templ)
+		goto policy_add_err;
+	for (i = 0; i < RTE_COLORS; i++) {
+		if (mtr_policy->skip_g && i == RTE_COLOR_GREEN)
+			continue;
+		if (mtr_policy->skip_y && i == RTE_COLOR_YELLOW)
+			continue;
+		if (mtr_policy->skip_r && i == RTE_COLOR_RED)
+			continue;
+		mtr_policy->hws_act_templ[nb_colors] =
+			rte_flow_actions_template_create(dev->data->port_id,
+						&ata, policy->actions[i],
+						policy->actions[i], NULL);
+		if (!mtr_policy->hws_act_templ[nb_colors])
+			goto policy_add_err;
+		nb_colors++;
+	}
+	for (i = 0; i < MLX5_MTR_DOMAIN_MAX; i++) {
+		memset(&ta, 0, sizeof(ta));
+		ta.nb_flows = RTE_COLORS;
+		ta.flow_attr.group = mtr_policy->group;
+		if (i == MLX5_MTR_DOMAIN_INGRESS) {
+			if (!mtr_policy->ingress)
+				continue;
+			ta.flow_attr.ingress = 1;
+		} else if (i == MLX5_MTR_DOMAIN_EGRESS) {
+			if (!mtr_policy->egress)
+				continue;
+			ta.flow_attr.egress = 1;
+		} else if (i == MLX5_MTR_DOMAIN_TRANSFER) {
+			if (!mtr_policy->transfer)
+				continue;
+			ta.flow_attr.transfer = 1;
+		}
+		mtr_policy->hws_flow_table[i] =
+			rte_flow_template_table_create(dev->data->port_id,
+					&ta, &mtr_policy->hws_item_templ, 1,
+					mtr_policy->hws_act_templ, nb_colors,
+					NULL);
+		if (!mtr_policy->hws_flow_table[i])
+			goto policy_add_err;
+		nb_colors = 0;
+		for (j = 0; j < RTE_COLORS; j++) {
+			if (mtr_policy->skip_g && j == RTE_COLOR_GREEN)
+				continue;
+			if (mtr_policy->skip_y && j == RTE_COLOR_YELLOW)
+				continue;
+			if (mtr_policy->skip_r && j == RTE_COLOR_RED)
+				continue;
+			color = rte_col_2_mlx5_col((enum rte_color)j);
+			tag_spec.data = color;
+			mtr_policy->hws_flow_rule[i][j] =
+				rte_flow_async_create(dev->data->port_id,
+					CTRL_QUEUE_ID(priv), &op_attr,
+					mtr_policy->hws_flow_table[i],
+					pattern, 0, policy->actions[j],
+					nb_colors, NULL, NULL);
+			if (!mtr_policy->hws_flow_rule[i][j])
+				goto policy_add_err;
+			nb_colors++;
+			nb_flows++;
+		}
+		ret = rte_flow_push(dev->data->port_id,
+				    CTRL_QUEUE_ID(priv), NULL);
+		if (ret < 0)
+			goto policy_add_err;
+		while (nb_flows) {
+			ret = rte_flow_pull(dev->data->port_id,
+					    CTRL_QUEUE_ID(priv), result,
+					    nb_flows, NULL);
+			if (ret < 0)
+				goto policy_add_err;
+			for (j = 0; j < ret; j++) {
+				if (result[j].status == RTE_FLOW_OP_ERROR)
+					goto policy_add_err;
+			}
+			nb_flows -= ret;
+		}
+	}
+	rte_spinlock_unlock(&priv->hw_ctrl_lock);
+	return 0;
+policy_add_err:
+	rte_spinlock_unlock(&priv->hw_ctrl_lock);
+	ret = mlx5_flow_meter_policy_hws_delete(dev, policy_id, error);
+	memset(mtr_policy, 0, sizeof(struct mlx5_flow_meter_policy));
+	if (ret)
+		return ret;
+	return -rte_mtr_error_set(error, ENOTSUP,
+				  RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Failed to create meter policy.");
+}
+
 /**
  * Check meter validation.
  *
@@ -1087,7 +1565,8 @@ mlx5_flow_meter_action_modify(struct mlx5_priv *priv,
 	if (priv->sh->meter_aso_en) {
 		fm->is_enable = !!is_enable;
 		aso_mtr = container_of(fm, struct mlx5_aso_mtr, fm);
-		ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr);
+		ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr,
+						   &priv->mtr_bulk);
 		if (ret)
 			return ret;
 		ret = mlx5_aso_mtr_wait(priv->sh, aso_mtr);
@@ -1336,7 +1815,8 @@ mlx5_flow_meter_create(struct rte_eth_dev *dev, uint32_t meter_id,
 	/* If ASO meter supported, update ASO flow meter by wqe. */
 	if (priv->sh->meter_aso_en) {
 		aso_mtr = container_of(fm, struct mlx5_aso_mtr, fm);
-		ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr);
+		ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr,
+						   &priv->mtr_bulk);
 		if (ret)
 			goto error;
 		if (!priv->mtr_idx_tbl) {
@@ -1369,6 +1849,90 @@ mlx5_flow_meter_create(struct rte_eth_dev *dev, uint32_t meter_id,
 		NULL, "Failed to create devx meter.");
 }
 
+/**
+ * Create meter rules.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] meter_id
+ *   Meter id.
+ * @param[in] params
+ *   Pointer to rte meter parameters.
+ * @param[in] shared
+ *   Meter shared with other flow or not.
+ * @param[out] error
+ *   Pointer to rte meter error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_hws_create(struct rte_eth_dev *dev, uint32_t meter_id,
+		       struct rte_mtr_params *params, int shared,
+		       struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_meter_profile *profile;
+	struct mlx5_flow_meter_info *fm;
+	struct mlx5_flow_meter_policy *policy = NULL;
+	struct mlx5_aso_mtr *aso_mtr;
+	int ret;
+
+	if (!priv->mtr_profile_arr ||
+	    !priv->mtr_policy_arr ||
+	    !priv->mtr_bulk.aso)
+		return -rte_mtr_error_set(error, ENOTSUP,
+			RTE_MTR_ERROR_TYPE_UNSPECIFIED, NULL,
+			"Meter bulk array is not allocated.");
+	/* Meter profile must exist. */
+	profile = mlx5_flow_meter_profile_find(priv, params->meter_profile_id);
+	if (!profile->initialized)
+		return -rte_mtr_error_set(error, ENOENT,
+			RTE_MTR_ERROR_TYPE_METER_PROFILE_ID,
+			NULL, "Meter profile id not valid.");
+	/* Meter policy must exist. */
+	policy = mlx5_flow_meter_policy_find(dev,
+			params->meter_policy_id, NULL);
+	if (!policy->initialized)
+		return -rte_mtr_error_set(error, ENOENT,
+			RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
+			NULL, "Meter policy id not valid.");
+	/* Meter ID must be valid. */
+	if (meter_id >= priv->mtr_config.nb_meters)
+		return -rte_mtr_error_set(error, EINVAL,
+			RTE_MTR_ERROR_TYPE_MTR_ID,
+			NULL, "Meter id not valid.");
+	/* Find ASO object. */
+	aso_mtr = mlx5_aso_meter_by_idx(priv, meter_id);
+	fm = &aso_mtr->fm;
+	if (fm->initialized)
+		return -rte_mtr_error_set(error, EEXIST,
+					  RTE_MTR_ERROR_TYPE_MTR_ID,
+					  NULL, "Meter object already exists.");
+	/* Fill the flow meter parameters. */
+	fm->meter_id = meter_id;
+	fm->policy_id = params->meter_policy_id;
+	fm->profile = profile;
+	fm->meter_offset = meter_id;
+	fm->group = policy->group;
+	/* Add to the flow meter list. */
+	fm->active_state = 1; /* Config meter starts as active. */
+	fm->is_enable = params->meter_enable;
+	fm->shared = !!shared;
+	fm->initialized = 1;
+	/* Update ASO flow meter by wqe. */
+	ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr,
+					   &priv->mtr_bulk);
+	if (ret)
+		return -rte_mtr_error_set(error, ENOTSUP,
+			RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+			NULL, "Failed to create devx meter.");
+	fm->active_state = params->meter_enable;
+	__atomic_add_fetch(&fm->profile->ref_cnt, 1, __ATOMIC_RELAXED);
+	__atomic_add_fetch(&policy->ref_cnt, 1, __ATOMIC_RELAXED);
+	return 0;
+}
+
 static int
 mlx5_flow_meter_params_flush(struct rte_eth_dev *dev,
 			struct mlx5_flow_meter_info *fm,
@@ -1475,6 +2039,58 @@ mlx5_flow_meter_destroy(struct rte_eth_dev *dev, uint32_t meter_id,
 	return 0;
 }
 
+/**
+ * Destroy meter rules.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] meter_id
+ *   Meter id.
+ * @param[out] error
+ *   Pointer to rte meter error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_meter_hws_destroy(struct rte_eth_dev *dev, uint32_t meter_id,
+			struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr *aso_mtr;
+	struct mlx5_flow_meter_info *fm;
+	struct mlx5_flow_meter_policy *policy;
+
+	if (!priv->mtr_profile_arr ||
+	    !priv->mtr_policy_arr ||
+	    !priv->mtr_bulk.aso)
+		return -rte_mtr_error_set(error, ENOTSUP,
+			RTE_MTR_ERROR_TYPE_METER_POLICY, NULL,
+			"Meter bulk array is not allocated.");
+	/* Find ASO object. */
+	aso_mtr = mlx5_aso_meter_by_idx(priv, meter_id);
+	fm = &aso_mtr->fm;
+	if (!fm->initialized)
+		return -rte_mtr_error_set(error, ENOENT,
+					  RTE_MTR_ERROR_TYPE_MTR_ID,
+					  NULL, "Meter object id not valid.");
+	/* Meter object must not have any owner. */
+	if (fm->ref_cnt > 0)
+		return -rte_mtr_error_set(error, EBUSY,
+					  RTE_MTR_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "Meter object is being used.");
+	/* Destroy the meter profile. */
+	__atomic_sub_fetch(&fm->profile->ref_cnt,
+						1, __ATOMIC_RELAXED);
+	/* Destroy the meter policy. */
+	policy = mlx5_flow_meter_policy_find(dev,
+			fm->policy_id, NULL);
+	__atomic_sub_fetch(&policy->ref_cnt,
+						1, __ATOMIC_RELAXED);
+	memset(fm, 0, sizeof(struct mlx5_flow_meter_info));
+	return 0;
+}
+
 /**
  * Modify meter state.
  *
@@ -1798,6 +2414,23 @@ static const struct rte_mtr_ops mlx5_flow_mtr_ops = {
 	.stats_read = mlx5_flow_meter_stats_read,
 };
 
+static const struct rte_mtr_ops mlx5_flow_mtr_hws_ops = {
+	.capabilities_get = mlx5_flow_mtr_cap_get,
+	.meter_profile_add = mlx5_flow_meter_profile_hws_add,
+	.meter_profile_delete = mlx5_flow_meter_profile_hws_delete,
+	.meter_policy_validate = mlx5_flow_meter_policy_hws_validate,
+	.meter_policy_add = mlx5_flow_meter_policy_hws_add,
+	.meter_policy_delete = mlx5_flow_meter_policy_hws_delete,
+	.create = mlx5_flow_meter_hws_create,
+	.destroy = mlx5_flow_meter_hws_destroy,
+	.meter_enable = mlx5_flow_meter_enable,
+	.meter_disable = mlx5_flow_meter_disable,
+	.meter_profile_update = mlx5_flow_meter_profile_update,
+	.meter_dscp_table_update = NULL,
+	.stats_update = NULL,
+	.stats_read = NULL,
+};
+
 /**
  * Get meter operations.
  *
@@ -1812,7 +2445,12 @@ static const struct rte_mtr_ops mlx5_flow_mtr_ops = {
 int
 mlx5_flow_meter_ops_get(struct rte_eth_dev *dev __rte_unused, void *arg)
 {
-	*(const struct rte_mtr_ops **)arg = &mlx5_flow_mtr_ops;
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (priv->sh->config.dv_flow_en == 2)
+		*(const struct rte_mtr_ops **)arg = &mlx5_flow_mtr_hws_ops;
+	else
+		*(const struct rte_mtr_ops **)arg = &mlx5_flow_mtr_ops;
 	return 0;
 }
 
@@ -1841,6 +2479,12 @@ mlx5_flow_meter_find(struct mlx5_priv *priv, uint32_t meter_id,
 	union mlx5_l3t_data data;
 	uint16_t n_valid;
 
+	if (priv->mtr_bulk.aso) {
+		if (mtr_idx)
+			*mtr_idx = meter_id;
+		aso_mtr = priv->mtr_bulk.aso + meter_id;
+		return &aso_mtr->fm;
+	}
 	if (priv->sh->meter_aso_en) {
 		rte_rwlock_read_lock(&pools_mng->resize_mtrwl);
 		n_valid = pools_mng->n_valid;
@@ -2185,6 +2829,7 @@ mlx5_flow_meter_flush(struct rte_eth_dev *dev, struct rte_mtr_error *error)
 	struct mlx5_flow_meter_profile *fmp;
 	struct mlx5_legacy_flow_meter *legacy_fm;
 	struct mlx5_flow_meter_info *fm;
+	struct mlx5_flow_meter_policy *policy;
 	struct mlx5_flow_meter_sub_policy *sub_policy;
 	void *tmp;
 	uint32_t i, mtr_idx, policy_idx;
@@ -2219,6 +2864,14 @@ mlx5_flow_meter_flush(struct rte_eth_dev *dev, struct rte_mtr_error *error)
 				NULL, "MTR object meter profile invalid.");
 		}
 	}
+	if (priv->mtr_bulk.aso) {
+		for (i = 0; i < priv->mtr_config.nb_meters; i++) {
+			aso_mtr = mlx5_aso_meter_by_idx(priv, i);
+			fm = &aso_mtr->fm;
+			if (fm->initialized)
+				mlx5_flow_meter_hws_destroy(dev, i, error);
+		}
+	}
 	if (priv->policy_idx_tbl) {
 		MLX5_L3T_FOREACH(priv->policy_idx_tbl, i, entry) {
 			policy_idx = *(uint32_t *)entry;
@@ -2244,6 +2897,15 @@ mlx5_flow_meter_flush(struct rte_eth_dev *dev, struct rte_mtr_error *error)
 		mlx5_l3t_destroy(priv->policy_idx_tbl);
 		priv->policy_idx_tbl = NULL;
 	}
+	if (priv->mtr_policy_arr) {
+		for (i = 0; i < priv->mtr_config.nb_meter_policies; i++) {
+			policy = mlx5_flow_meter_policy_find(dev, i,
+							     &policy_idx);
+			if (policy->initialized)
+				mlx5_flow_meter_policy_hws_delete(dev, i,
+								  error);
+		}
+	}
 	if (priv->mtr_profile_tbl) {
 		MLX5_L3T_FOREACH(priv->mtr_profile_tbl, i, entry) {
 			fmp = entry;
@@ -2257,9 +2919,21 @@ mlx5_flow_meter_flush(struct rte_eth_dev *dev, struct rte_mtr_error *error)
 		mlx5_l3t_destroy(priv->mtr_profile_tbl);
 		priv->mtr_profile_tbl = NULL;
 	}
+	if (priv->mtr_profile_arr) {
+		for (i = 0; i < priv->mtr_config.nb_meter_profiles; i++) {
+			fmp = mlx5_flow_meter_profile_find(priv, i);
+			if (fmp->initialized)
+				mlx5_flow_meter_profile_hws_delete(dev, i,
+								   error);
+		}
+	}
 	/* Delete default policy table. */
 	mlx5_flow_destroy_def_policy(dev);
 	if (priv->sh->refcnt == 1)
 		mlx5_flow_destroy_mtr_drop_tbls(dev);
+#ifdef HAVE_MLX5_HWS_SUPPORT
+	/* Destroy HWS configuration. */
+	mlx5_flow_meter_uninit(dev);
+#endif
 	return 0;
 }
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v6 08/18] net/mlx5: add HW steering counter action
  2022-10-20 15:41 ` [PATCH v6 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (6 preceding siblings ...)
  2022-10-20 15:41   ` [PATCH v6 07/18] net/mlx5: add HW steering meter action Suanming Mou
@ 2022-10-20 15:41   ` Suanming Mou
  2022-10-24  9:45     ` Slava Ovsiienko
  2022-10-20 15:41   ` [PATCH v6 09/18] net/mlx5: support DR action template API Suanming Mou
                     ` (10 subsequent siblings)
  18 siblings, 1 reply; 140+ messages in thread
From: Suanming Mou @ 2022-10-20 15:41 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko, Ray Kinsella
  Cc: dev, rasland, orika, Xiaoyu Min

From: Xiaoyu Min <jackmin@nvidia.com>

This commit adds HW steering counter action support.
A pool mechanism is the basic data structure for HW steering counters.

The HW steering counter pool is built on the zero-copy variant of
rte_ring.

There are two global rte_rings:
1. free_list:
     Stores the counter indexes that are ready for use.
2. wait_reset_list:
     Stores the counter indexes that have just been freed by the user;
     the hardware counter must be queried for its reset value before
     such a counter can be reused again.

The counter pool also supports a per-queue cache for the HW steering
queues, likewise based on the zero-copy rte_ring variant.

The cache size, preload, threshold, and fetch size are all configurable
and exposed via device arguments.
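
A rough, illustrative layout is sketched below. This is not the
driver's definition (the real structures are added in mlx5_hws_cnt.h by
this patch); the struct and field names here are simplified assumptions.

#include <stdint.h>
#include <rte_ring.h>

/* Illustrative sketch only -- simplified names, not mlx5_hws_cnt.h. */
struct hws_cnt_pool_sketch {
	struct rte_ring *free_list;       /* Indexes ready for use. */
	struct rte_ring *wait_reset_list; /* Freed, awaiting a reset query. */
	struct rte_ring **qcache;         /* Optional per-queue cache rings. */
	uint32_t query_gen;               /* Bumped by the query service. */
};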

The main operations of the counter pool are as follows:

 - Get one counter from the pool:
   1. The user calls the _get_* API.
   2. If the cache is enabled, dequeue one counter index from the local
      cache:
      2.A: If the dequeued counter is still in reset state (its
	query_gen_when_free is equal to the pool's query gen):
	I. Flush all counters in local cache back to global
	   wait_reset_list.
	II. Fetch _fetch_sz_ counters into the cache from the global
	    free list.
	III. Fetch one counter from the cache.
   3. If the cache is empty, fetch _fetch_sz_ counters from the global
      free list into the cache and fetch one counter from the cache.
 - Free one counter into the pool:
   1. The user calls the _put_* API.
   2. Put the counter into the local cache.
   3. If the local cache is full:
      3.A: Write back all counters above _threshold_ into the global
           wait_reset_list.
      3.B: Also, write back this counter into the global wait_reset_list.

When the local cache is disabled, _get_/_put_ work directly on the
global lists.
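
As a hedged illustration of the two paths above (not the driver code),
the sketch below uses the plain rte_ring element API on the layout
struct sketched earlier; the driver itself uses the zero-copy burst
variants plus the per-queue cache handling described above, and the
_sketch_ names are assumptions.

#include <stdint.h>
#include <rte_ring_elem.h>

/* Illustrative only: single-index get/put, no per-queue cache. */
static inline int
hws_cnt_get_sketch(struct hws_cnt_pool_sketch *p, uint32_t *idx)
{
	/* Take one ready-to-use counter index from the global free list. */
	return rte_ring_dequeue_elem(p->free_list, idx, sizeof(*idx));
}

static inline int
hws_cnt_put_sketch(struct hws_cnt_pool_sketch *p, uint32_t idx)
{
	/*
	 * A freed counter is not immediately reusable: it is queued on
	 * wait_reset_list until the background service queries its reset
	 * value, after which it returns to free_list.
	 */
	return rte_ring_enqueue_elem(p->wait_reset_list, &idx, sizeof(idx));
}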

Signed-off-by: Xiaoyu Min <jackmin@nvidia.com>
---
 doc/guides/rel_notes/release_22_11.rst |   1 +
 drivers/common/mlx5/mlx5_devx_cmds.c   |  50 +++
 drivers/common/mlx5/mlx5_devx_cmds.h   |  27 ++
 drivers/common/mlx5/mlx5_prm.h         |  20 +-
 drivers/common/mlx5/version.map        |   1 +
 drivers/net/mlx5/meson.build           |   1 +
 drivers/net/mlx5/mlx5.c                |  14 +
 drivers/net/mlx5/mlx5.h                |  27 ++
 drivers/net/mlx5/mlx5_defs.h           |   2 +
 drivers/net/mlx5/mlx5_flow.c           |  27 +-
 drivers/net/mlx5/mlx5_flow.h           |   5 +
 drivers/net/mlx5/mlx5_flow_aso.c       | 261 +++++++++++-
 drivers/net/mlx5/mlx5_flow_hw.c        | 340 ++++++++++++++-
 drivers/net/mlx5/mlx5_hws_cnt.c        | 528 +++++++++++++++++++++++
 drivers/net/mlx5/mlx5_hws_cnt.h        | 558 +++++++++++++++++++++++++
 15 files changed, 1831 insertions(+), 31 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_hws_cnt.c
 create mode 100644 drivers/net/mlx5/mlx5_hws_cnt.h

diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index a0030aac37..4e1634c4d8 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -255,6 +255,7 @@ New Features
     - Support of modify fields.
     - Support of FDB.
     - Support of meter.
+    - Support of counter.
 
 * **Rewritten pmdinfo script.**
 
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 9c185366d0..05b9429c7f 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -176,6 +176,41 @@ mlx5_devx_cmd_register_write(void *ctx, uint16_t reg_id, uint32_t arg,
 	return 0;
 }
 
+struct mlx5_devx_obj *
+mlx5_devx_cmd_flow_counter_alloc_general(void *ctx,
+		struct mlx5_devx_counter_attr *attr)
+{
+	struct mlx5_devx_obj *dcs = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*dcs),
+						0, SOCKET_ID_ANY);
+	uint32_t in[MLX5_ST_SZ_DW(alloc_flow_counter_in)]   = {0};
+	uint32_t out[MLX5_ST_SZ_DW(alloc_flow_counter_out)] = {0};
+
+	if (!dcs) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	MLX5_SET(alloc_flow_counter_in, in, opcode,
+		 MLX5_CMD_OP_ALLOC_FLOW_COUNTER);
+	if (attr->bulk_log_max_alloc)
+		MLX5_SET(alloc_flow_counter_in, in, flow_counter_bulk_log_size,
+			 attr->flow_counter_bulk_log_size);
+	else
+		MLX5_SET(alloc_flow_counter_in, in, flow_counter_bulk,
+			 attr->bulk_n_128);
+	if (attr->pd_valid)
+		MLX5_SET(alloc_flow_counter_in, in, pd, attr->pd);
+	dcs->obj = mlx5_glue->devx_obj_create(ctx, in,
+					      sizeof(in), out, sizeof(out));
+	if (!dcs->obj) {
+		DRV_LOG(ERR, "Can't allocate counters - error %d", errno);
+		rte_errno = errno;
+		mlx5_free(dcs);
+		return NULL;
+	}
+	dcs->id = MLX5_GET(alloc_flow_counter_out, out, flow_counter_id);
+	return dcs;
+}
+
 /**
  * Allocate flow counters via devx interface.
  *
@@ -967,6 +1002,16 @@ mlx5_devx_cmd_query_hca_attr(void *ctx,
 					 general_obj_types) &
 			      MLX5_GENERAL_OBJ_TYPES_CAP_CONN_TRACK_OFFLOAD);
 	attr->rq_delay_drop = MLX5_GET(cmd_hca_cap, hcattr, rq_delay_drop);
+	attr->max_flow_counter_15_0 = MLX5_GET(cmd_hca_cap, hcattr,
+			max_flow_counter_15_0);
+	attr->max_flow_counter_31_16 = MLX5_GET(cmd_hca_cap, hcattr,
+			max_flow_counter_31_16);
+	attr->alloc_flow_counter_pd = MLX5_GET(cmd_hca_cap, hcattr,
+			alloc_flow_counter_pd);
+	attr->flow_counter_access_aso = MLX5_GET(cmd_hca_cap, hcattr,
+			flow_counter_access_aso);
+	attr->flow_access_aso_opc_mod = MLX5_GET(cmd_hca_cap, hcattr,
+			flow_access_aso_opc_mod);
 	if (attr->crypto) {
 		attr->aes_xts = MLX5_GET(cmd_hca_cap, hcattr, aes_xts);
 		hcattr = mlx5_devx_get_hca_cap(ctx, in, out, &rc,
@@ -995,6 +1040,11 @@ mlx5_devx_cmd_query_hca_attr(void *ctx,
 							   hairpin_sq_wq_in_host_mem);
 		attr->hairpin_data_buffer_locked = MLX5_GET(cmd_hca_cap_2, hcattr,
 							    hairpin_data_buffer_locked);
+		attr->flow_counter_bulk_log_max_alloc = MLX5_GET(cmd_hca_cap_2,
+				hcattr, flow_counter_bulk_log_max_alloc);
+		attr->flow_counter_bulk_log_granularity =
+			MLX5_GET(cmd_hca_cap_2, hcattr,
+				 flow_counter_bulk_log_granularity);
 	}
 	if (attr->log_min_stride_wqe_sz == 0)
 		attr->log_min_stride_wqe_sz = MLX5_MPRQ_LOG_MIN_STRIDE_WQE_SIZE;
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index a10aa3331b..c94b9eac06 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -15,6 +15,16 @@
 #define MLX5_DEVX_MAX_KLM_ENTRIES ((UINT16_MAX - \
 		MLX5_ST_SZ_DW(create_mkey_in) * 4) / (MLX5_ST_SZ_DW(klm) * 4))
 
+struct mlx5_devx_counter_attr {
+	uint32_t pd_valid:1;
+	uint32_t pd:24;
+	uint32_t bulk_log_max_alloc:1;
+	union {
+		uint8_t flow_counter_bulk_log_size;
+		uint8_t bulk_n_128;
+	};
+};
+
 struct mlx5_devx_mkey_attr {
 	uint64_t addr;
 	uint64_t size;
@@ -266,6 +276,18 @@ struct mlx5_hca_attr {
 	uint32_t set_reg_c:8;
 	uint32_t nic_flow_table:1;
 	uint32_t modify_outer_ip_ecn:1;
+	union {
+		uint32_t max_flow_counter;
+		struct {
+			uint16_t max_flow_counter_15_0;
+			uint16_t max_flow_counter_31_16;
+		};
+	};
+	uint32_t flow_counter_bulk_log_max_alloc:5;
+	uint32_t flow_counter_bulk_log_granularity:5;
+	uint32_t alloc_flow_counter_pd:1;
+	uint32_t flow_counter_access_aso:1;
+	uint32_t flow_access_aso_opc_mod:8;
 };
 
 /* LAG Context. */
@@ -598,6 +620,11 @@ struct mlx5_devx_crypto_login_attr {
 
 /* mlx5_devx_cmds.c */
 
+__rte_internal
+struct mlx5_devx_obj *
+mlx5_devx_cmd_flow_counter_alloc_general(void *ctx,
+				struct mlx5_devx_counter_attr *attr);
+
 __rte_internal
 struct mlx5_devx_obj *mlx5_devx_cmd_flow_counter_alloc(void *ctx,
 						       uint32_t bulk_sz);
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index fb3c43eed9..2b5c43ee6e 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -1170,8 +1170,10 @@ struct mlx5_ifc_alloc_flow_counter_in_bits {
 	u8 reserved_at_10[0x10];
 	u8 reserved_at_20[0x10];
 	u8 op_mod[0x10];
-	u8 flow_counter_id[0x20];
-	u8 reserved_at_40[0x18];
+	u8 reserved_at_40[0x8];
+	u8 pd[0x18];
+	u8 reserved_at_60[0x13];
+	u8 flow_counter_bulk_log_size[0x5];
 	u8 flow_counter_bulk[0x8];
 };
 
@@ -1405,7 +1407,13 @@ enum {
 #define MLX5_STEERING_LOGIC_FORMAT_CONNECTX_6DX 0x1
 
 struct mlx5_ifc_cmd_hca_cap_bits {
-	u8 reserved_at_0[0x20];
+	u8 access_other_hca_roce[0x1];
+	u8 alloc_flow_counter_pd[0x1];
+	u8 flow_counter_access_aso[0x1];
+	u8 reserved_at_3[0x5];
+	u8 flow_access_aso_opc_mod[0x8];
+	u8 reserved_at_10[0xf];
+	u8 vhca_resource_manager[0x1];
 	u8 hca_cap_2[0x1];
 	u8 reserved_at_21[0xf];
 	u8 vhca_id[0x10];
@@ -2118,7 +2126,11 @@ struct mlx5_ifc_cmd_hca_cap_2_bits {
 	u8 format_select_dw_8_6_ext[0x1];
 	u8 reserved_at_1ac[0x14];
 	u8 general_obj_types_127_64[0x40];
-	u8 reserved_at_200[0x80];
+	u8 reserved_at_200[0x53];
+	u8 flow_counter_bulk_log_max_alloc[0x5];
+	u8 reserved_at_258[0x3];
+	u8 flow_counter_bulk_log_granularity[0x5];
+	u8 reserved_at_260[0x20];
 	u8 format_select_dw_gtpu_dw_0[0x8];
 	u8 format_select_dw_gtpu_dw_1[0x8];
 	u8 format_select_dw_gtpu_dw_2[0x8];
diff --git a/drivers/common/mlx5/version.map b/drivers/common/mlx5/version.map
index 413dec14ab..4f72900519 100644
--- a/drivers/common/mlx5/version.map
+++ b/drivers/common/mlx5/version.map
@@ -40,6 +40,7 @@ INTERNAL {
 	mlx5_devx_cmd_create_virtq;
 	mlx5_devx_cmd_destroy;
 	mlx5_devx_cmd_flow_counter_alloc;
+	mlx5_devx_cmd_flow_counter_alloc_general;
 	mlx5_devx_cmd_flow_counter_query;
 	mlx5_devx_cmd_flow_dump;
 	mlx5_devx_cmd_flow_single_dump;
diff --git a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build
index 6b947eaab5..ff84448186 100644
--- a/drivers/net/mlx5/meson.build
+++ b/drivers/net/mlx5/meson.build
@@ -41,6 +41,7 @@ sources = files(
 if is_linux
     sources += files(
             'mlx5_flow_hw.c',
+            'mlx5_hws_cnt.c',
             'mlx5_flow_verbs.c',
     )
     if (dpdk_conf.has('RTE_ARCH_X86_64')
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 9cd4892858..4d87da8e29 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -175,6 +175,12 @@
 /* Device parameter to create the fdb default rule in PMD */
 #define MLX5_FDB_DEFAULT_RULE_EN "fdb_def_rule_en"
 
+/* HW steering counter configuration. */
+#define MLX5_HWS_CNT_SERVICE_CORE "service_core"
+
+/* HW steering counter's query interval. */
+#define MLX5_HWS_CNT_CYCLE_TIME "svc_cycle_time"
+
 /* Shared memory between primary and secondary processes. */
 struct mlx5_shared_data *mlx5_shared_data;
 
@@ -1245,6 +1251,10 @@ mlx5_dev_args_check_handler(const char *key, const char *val, void *opaque)
 		config->allow_duplicate_pattern = !!tmp;
 	} else if (strcmp(MLX5_FDB_DEFAULT_RULE_EN, key) == 0) {
 		config->fdb_def_rule = !!tmp;
+	} else if (strcmp(MLX5_HWS_CNT_SERVICE_CORE, key) == 0) {
+		config->cnt_svc.service_core = tmp;
+	} else if (strcmp(MLX5_HWS_CNT_CYCLE_TIME, key) == 0) {
+		config->cnt_svc.cycle_time = tmp;
 	}
 	return 0;
 }
@@ -1281,6 +1291,8 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 		MLX5_DECAP_EN,
 		MLX5_ALLOW_DUPLICATE_PATTERN,
 		MLX5_FDB_DEFAULT_RULE_EN,
+		MLX5_HWS_CNT_SERVICE_CORE,
+		MLX5_HWS_CNT_CYCLE_TIME,
 		NULL,
 	};
 	int ret = 0;
@@ -1293,6 +1305,8 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 	config->decap_en = 1;
 	config->allow_duplicate_pattern = 1;
 	config->fdb_def_rule = 1;
+	config->cnt_svc.cycle_time = MLX5_CNT_SVC_CYCLE_TIME_DEFAULT;
+	config->cnt_svc.service_core = rte_get_main_lcore();
 	if (mkvlist != NULL) {
 		/* Process parameters. */
 		ret = mlx5_kvargs_process(mkvlist, params,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index f99820c045..6f42053ef0 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -313,6 +313,10 @@ struct mlx5_sh_config {
 	uint32_t hw_fcs_strip:1; /* FCS stripping is supported. */
 	uint32_t allow_duplicate_pattern:1;
 	uint32_t lro_allowed:1; /* Whether LRO is allowed. */
+	struct {
+		uint16_t service_core;
+		uint32_t cycle_time; /* Query cycle time in milliseconds. */
+	} cnt_svc; /* configure for HW steering's counter's service. */
 	/* Allow/Prevent the duplicate rules pattern. */
 	uint32_t fdb_def_rule:1; /* Create FDB default jump rule */
 };
@@ -1234,6 +1238,22 @@ struct mlx5_send_to_kernel_action {
 	void *tbl;
 };
 
+#define HWS_CNT_ASO_SQ_NUM 4
+
+struct mlx5_hws_aso_mng {
+	uint16_t sq_num;
+	struct mlx5_aso_sq sqs[HWS_CNT_ASO_SQ_NUM];
+};
+
+struct mlx5_hws_cnt_svc_mng {
+	uint32_t refcnt;
+	uint32_t service_core;
+	uint32_t query_interval;
+	pthread_t service_thread;
+	uint8_t svc_running;
+	struct mlx5_hws_aso_mng aso_mng __rte_cache_aligned;
+};
+
 /*
  * Shared Infiniband device context for Master/Representors
  * which belong to same IB device with multiple IB ports.
@@ -1334,6 +1354,7 @@ struct mlx5_dev_ctx_shared {
 	pthread_mutex_t lwm_config_lock;
 	uint32_t host_shaper_rate:8;
 	uint32_t lwm_triggered:1;
+	struct mlx5_hws_cnt_svc_mng *cnt_svc;
 	struct mlx5_dev_shared_port port[]; /* per device port data array. */
 };
 
@@ -1620,6 +1641,7 @@ struct mlx5_priv {
 	/* Flex items have been created on the port. */
 	uint32_t flex_item_map; /* Map of allocated flex item elements. */
 	uint32_t nb_queue; /* HW steering queue number. */
+	struct mlx5_hws_cnt_pool *hws_cpool; /* HW steering's counter pool. */
 #if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
 	/* Item template list. */
 	LIST_HEAD(flow_hw_itt, rte_flow_pattern_template) flow_hw_itt;
@@ -2050,6 +2072,11 @@ mlx5_get_supported_sw_parsing_offloads(const struct mlx5_hca_attr *attr);
 uint32_t
 mlx5_get_supported_tunneling_offloads(const struct mlx5_hca_attr *attr);
 
+int mlx5_aso_cnt_queue_init(struct mlx5_dev_ctx_shared *sh);
+void mlx5_aso_cnt_queue_uninit(struct mlx5_dev_ctx_shared *sh);
+int mlx5_aso_cnt_query(struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_hws_cnt_pool *cpool);
+
 /* mlx5_flow_flex.c */
 
 struct rte_flow_item_flex_handle *
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index 585afb0a98..d064abfef3 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -188,4 +188,6 @@
 #define static_assert _Static_assert
 #endif
 
+#define MLX5_CNT_SVC_CYCLE_TIME_DEFAULT 500
+
 #endif /* RTE_PMD_MLX5_DEFS_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 1c97b77031..320a11958f 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -7834,24 +7834,33 @@ mlx5_flow_isolate(struct rte_eth_dev *dev,
  */
 static int
 flow_drv_query(struct rte_eth_dev *dev,
-	       uint32_t flow_idx,
+	       struct rte_flow *eflow,
 	       const struct rte_flow_action *actions,
 	       void *data,
 	       struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	const struct mlx5_flow_driver_ops *fops;
-	struct rte_flow *flow = mlx5_ipool_get(priv->flows[MLX5_FLOW_TYPE_GEN],
-					       flow_idx);
-	enum mlx5_flow_drv_type ftype;
+	struct rte_flow *flow = NULL;
+	enum mlx5_flow_drv_type ftype = MLX5_FLOW_TYPE_MIN;
 
+	if (priv->sh->config.dv_flow_en == 2) {
+#ifdef HAVE_MLX5_HWS_SUPPORT
+		flow = eflow;
+		ftype = MLX5_FLOW_TYPE_HW;
+#endif
+	} else {
+		flow = (struct rte_flow *)mlx5_ipool_get(priv->flows[MLX5_FLOW_TYPE_GEN],
+				(uintptr_t)(void *)eflow);
+	}
 	if (!flow) {
 		return rte_flow_error_set(error, ENOENT,
 			  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 			  NULL,
 			  "invalid flow handle");
 	}
-	ftype = flow->drv_type;
+	if (ftype == MLX5_FLOW_TYPE_MIN)
+		ftype = flow->drv_type;
 	MLX5_ASSERT(ftype > MLX5_FLOW_TYPE_MIN && ftype < MLX5_FLOW_TYPE_MAX);
 	fops = flow_get_drv_ops(ftype);
 
@@ -7872,14 +7881,8 @@ mlx5_flow_query(struct rte_eth_dev *dev,
 		struct rte_flow_error *error)
 {
 	int ret;
-	struct mlx5_priv *priv = dev->data->dev_private;
 
-	if (priv->sh->config.dv_flow_en == 2)
-		return rte_flow_error_set(error, ENOTSUP,
-			  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-			  NULL,
-			  "Flow non-Q query not supported");
-	ret = flow_drv_query(dev, (uintptr_t)(void *)flow, actions, data,
+	ret = flow_drv_query(dev, flow, actions, data,
 			     error);
 	if (ret < 0)
 		return ret;
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index f9600568a0..213d8c2689 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1112,6 +1112,7 @@ struct rte_flow_hw {
 		struct mlx5_hrxq *hrxq; /* TIR action. */
 	};
 	struct rte_flow_template_table *table; /* The table flow allcated from. */
+	uint32_t cnt_id;
 	uint8_t rule[0]; /* HWS layer data struct. */
 } __rte_packed;
 
@@ -1160,6 +1161,9 @@ struct mlx5_action_construct_data {
 			uint32_t level; /* RSS level. */
 			uint32_t idx; /* Shared action index. */
 		} shared_rss;
+		struct {
+			uint32_t id;
+		} shared_counter;
 	};
 };
 
@@ -1238,6 +1242,7 @@ struct mlx5_hw_actions {
 	uint16_t encap_decap_pos; /* Encap/Decap action position. */
 	uint32_t acts_num:4; /* Total action number. */
 	uint32_t mark:1; /* Indicate the mark action. */
+	uint32_t cnt_id; /* Counter id. */
 	/* Translated DR action array from action template. */
 	struct mlx5dr_rule_action rule_acts[MLX5_HW_MAX_ACTS];
 };
diff --git a/drivers/net/mlx5/mlx5_flow_aso.c b/drivers/net/mlx5/mlx5_flow_aso.c
index 60d0280367..ed9272e583 100644
--- a/drivers/net/mlx5/mlx5_flow_aso.c
+++ b/drivers/net/mlx5/mlx5_flow_aso.c
@@ -12,6 +12,9 @@
 
 #include "mlx5.h"
 #include "mlx5_flow.h"
+#include "mlx5_hws_cnt.h"
+
+#define MLX5_ASO_CNT_QUEUE_LOG_DESC 14
 
 /**
  * Free MR resources.
@@ -79,6 +82,33 @@ mlx5_aso_destroy_sq(struct mlx5_aso_sq *sq)
 	memset(sq, 0, sizeof(*sq));
 }
 
+/**
+ * Initialize Send Queue used for ASO access counter.
+ *
+ * @param[in] sq
+ *   ASO SQ to initialize.
+ */
+static void
+mlx5_aso_cnt_init_sq(struct mlx5_aso_sq *sq)
+{
+	volatile struct mlx5_aso_wqe *restrict wqe;
+	int i;
+	int size = 1 << sq->log_desc_n;
+
+	/* All the next fields state should stay constant. */
+	for (i = 0, wqe = &sq->sq_obj.aso_wqes[0]; i < size; ++i, ++wqe) {
+		wqe->general_cseg.sq_ds = rte_cpu_to_be_32((sq->sqn << 8) |
+							  (sizeof(*wqe) >> 4));
+		wqe->aso_cseg.operand_masks = rte_cpu_to_be_32
+			(0u |
+			 (ASO_OPER_LOGICAL_OR << ASO_CSEG_COND_OPER_OFFSET) |
+			 (ASO_OP_ALWAYS_FALSE << ASO_CSEG_COND_1_OPER_OFFSET) |
+			 (ASO_OP_ALWAYS_FALSE << ASO_CSEG_COND_0_OPER_OFFSET) |
+			 (BYTEWISE_64BYTE << ASO_CSEG_DATA_MASK_MODE_OFFSET));
+		wqe->aso_cseg.data_mask = RTE_BE64(UINT64_MAX);
+	}
+}
+
 /**
  * Initialize Send Queue used for ASO access.
  *
@@ -191,7 +221,7 @@ mlx5_aso_ct_init_sq(struct mlx5_aso_sq *sq)
  */
 static int
 mlx5_aso_sq_create(struct mlx5_common_device *cdev, struct mlx5_aso_sq *sq,
-		   void *uar)
+		   void *uar, uint16_t log_desc_n)
 {
 	struct mlx5_devx_cq_attr cq_attr = {
 		.uar_page_id = mlx5_os_get_devx_uar_page_id(uar),
@@ -212,12 +242,12 @@ mlx5_aso_sq_create(struct mlx5_common_device *cdev, struct mlx5_aso_sq *sq,
 	int ret;
 
 	if (mlx5_devx_cq_create(cdev->ctx, &sq->cq.cq_obj,
-				MLX5_ASO_QUEUE_LOG_DESC, &cq_attr,
+				log_desc_n, &cq_attr,
 				SOCKET_ID_ANY))
 		goto error;
 	sq->cq.cq_ci = 0;
-	sq->cq.log_desc_n = MLX5_ASO_QUEUE_LOG_DESC;
-	sq->log_desc_n = MLX5_ASO_QUEUE_LOG_DESC;
+	sq->cq.log_desc_n = log_desc_n;
+	sq->log_desc_n = log_desc_n;
 	sq_attr.cqn = sq->cq.cq_obj.cq->id;
 	/* for mlx5_aso_wqe that is twice the size of mlx5_wqe */
 	log_wqbb_n = sq->log_desc_n + 1;
@@ -269,7 +299,8 @@ mlx5_aso_queue_init(struct mlx5_dev_ctx_shared *sh,
 				    sq_desc_n, &sh->aso_age_mng->aso_sq.mr))
 			return -1;
 		if (mlx5_aso_sq_create(cdev, &sh->aso_age_mng->aso_sq,
-				       sh->tx_uar.obj)) {
+				       sh->tx_uar.obj,
+				       MLX5_ASO_QUEUE_LOG_DESC)) {
 			mlx5_aso_dereg_mr(cdev, &sh->aso_age_mng->aso_sq.mr);
 			return -1;
 		}
@@ -277,7 +308,7 @@ mlx5_aso_queue_init(struct mlx5_dev_ctx_shared *sh,
 		break;
 	case ASO_OPC_MOD_POLICER:
 		if (mlx5_aso_sq_create(cdev, &sh->mtrmng->pools_mng.sq,
-				       sh->tx_uar.obj))
+				       sh->tx_uar.obj, MLX5_ASO_QUEUE_LOG_DESC))
 			return -1;
 		mlx5_aso_mtr_init_sq(&sh->mtrmng->pools_mng.sq);
 		break;
@@ -287,7 +318,7 @@ mlx5_aso_queue_init(struct mlx5_dev_ctx_shared *sh,
 				    &sh->ct_mng->aso_sq.mr))
 			return -1;
 		if (mlx5_aso_sq_create(cdev, &sh->ct_mng->aso_sq,
-				       sh->tx_uar.obj)) {
+				       sh->tx_uar.obj, MLX5_ASO_QUEUE_LOG_DESC)) {
 			mlx5_aso_dereg_mr(cdev, &sh->ct_mng->aso_sq.mr);
 			return -1;
 		}
@@ -1403,3 +1434,219 @@ mlx5_aso_ct_available(struct mlx5_dev_ctx_shared *sh,
 	rte_errno = EBUSY;
 	return -rte_errno;
 }
+
+int
+mlx5_aso_cnt_queue_init(struct mlx5_dev_ctx_shared *sh)
+{
+	struct mlx5_hws_aso_mng *aso_mng = NULL;
+	uint8_t idx;
+	struct mlx5_aso_sq *sq;
+
+	MLX5_ASSERT(sh);
+	MLX5_ASSERT(sh->cnt_svc);
+	aso_mng = &sh->cnt_svc->aso_mng;
+	aso_mng->sq_num = HWS_CNT_ASO_SQ_NUM;
+	for (idx = 0; idx < HWS_CNT_ASO_SQ_NUM; idx++) {
+		sq = &aso_mng->sqs[idx];
+		if (mlx5_aso_sq_create(sh->cdev, sq, sh->tx_uar.obj,
+					MLX5_ASO_CNT_QUEUE_LOG_DESC))
+			goto error;
+		mlx5_aso_cnt_init_sq(sq);
+	}
+	return 0;
+error:
+	mlx5_aso_cnt_queue_uninit(sh);
+	return -1;
+}
+
+void
+mlx5_aso_cnt_queue_uninit(struct mlx5_dev_ctx_shared *sh)
+{
+	uint16_t idx;
+
+	for (idx = 0; idx < sh->cnt_svc->aso_mng.sq_num; idx++)
+		mlx5_aso_destroy_sq(&sh->cnt_svc->aso_mng.sqs[idx]);
+	sh->cnt_svc->aso_mng.sq_num = 0;
+}
+
+static uint16_t
+mlx5_aso_cnt_sq_enqueue_burst(struct mlx5_hws_cnt_pool *cpool,
+		struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_aso_sq *sq, uint32_t n,
+		uint32_t offset, uint32_t dcs_id_base)
+{
+	volatile struct mlx5_aso_wqe *wqe;
+	uint16_t size = 1 << sq->log_desc_n;
+	uint16_t mask = size - 1;
+	uint16_t max;
+	uint32_t upper_offset = offset;
+	uint64_t addr;
+	uint32_t ctrl_gen_id = 0;
+	uint8_t opcmod = sh->cdev->config.hca_attr.flow_access_aso_opc_mod;
+	rte_be32_t lkey = rte_cpu_to_be_32(cpool->raw_mng->mr.lkey);
+	uint16_t aso_n = (uint16_t)(RTE_ALIGN_CEIL(n, 4) / 4);
+	uint32_t ccntid;
+
+	max = RTE_MIN(size - (uint16_t)(sq->head - sq->tail), aso_n);
+	if (unlikely(!max))
+		return 0;
+	upper_offset += (max * 4);
+	/* Only one burst is in flight at a time, so the same elt can be reused. */
+	sq->elts[0].burst_size = max;
+	ctrl_gen_id = dcs_id_base;
+	ctrl_gen_id /= 4;
+	do {
+		ccntid = upper_offset - max * 4;
+		wqe = &sq->sq_obj.aso_wqes[sq->head & mask];
+		rte_prefetch0(&sq->sq_obj.aso_wqes[(sq->head + 1) & mask]);
+		wqe->general_cseg.misc = rte_cpu_to_be_32(ctrl_gen_id);
+		wqe->general_cseg.flags = RTE_BE32(MLX5_COMP_ONLY_FIRST_ERR <<
+							 MLX5_COMP_MODE_OFFSET);
+		wqe->general_cseg.opcode = rte_cpu_to_be_32
+						(MLX5_OPCODE_ACCESS_ASO |
+						 (opcmod <<
+						  WQE_CSEG_OPC_MOD_OFFSET) |
+						 (sq->pi <<
+						  WQE_CSEG_WQE_INDEX_OFFSET));
+		addr = (uint64_t)RTE_PTR_ADD(cpool->raw_mng->raw,
+				ccntid * sizeof(struct flow_counter_stats));
+		wqe->aso_cseg.va_h = rte_cpu_to_be_32((uint32_t)(addr >> 32));
+		wqe->aso_cseg.va_l_r = rte_cpu_to_be_32((uint32_t)addr | 1u);
+		wqe->aso_cseg.lkey = lkey;
+		sq->pi += 2; /* Each WQE contains 2 WQEBB's. */
+		sq->head++;
+		sq->next++;
+		ctrl_gen_id++;
+		max--;
+	} while (max);
+	wqe->general_cseg.flags = RTE_BE32(MLX5_COMP_ALWAYS <<
+							 MLX5_COMP_MODE_OFFSET);
+	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
+			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
+			   !sh->tx_uar.dbnc);
+	return sq->elts[0].burst_size;
+}
+
+static uint16_t
+mlx5_aso_cnt_completion_handle(struct mlx5_aso_sq *sq)
+{
+	struct mlx5_aso_cq *cq = &sq->cq;
+	volatile struct mlx5_cqe *restrict cqe;
+	const unsigned int cq_size = 1 << cq->log_desc_n;
+	const unsigned int mask = cq_size - 1;
+	uint32_t idx;
+	uint32_t next_idx = cq->cq_ci & mask;
+	const uint16_t max = (uint16_t)(sq->head - sq->tail);
+	uint16_t i = 0;
+	int ret;
+	if (unlikely(!max))
+		return 0;
+	idx = next_idx;
+	next_idx = (cq->cq_ci + 1) & mask;
+	rte_prefetch0(&cq->cq_obj.cqes[next_idx]);
+	cqe = &cq->cq_obj.cqes[idx];
+	ret = check_cqe(cqe, cq_size, cq->cq_ci);
+	/*
+	 * Be sure owner read is done before any other cookie field or
+	 * opaque field.
+	 */
+	rte_io_rmb();
+	if (unlikely(ret != MLX5_CQE_STATUS_SW_OWN)) {
+		if (likely(ret == MLX5_CQE_STATUS_HW_OWN))
+			return 0; /* return immediately. */
+		mlx5_aso_cqe_err_handle(sq);
+	}
+	i += sq->elts[0].burst_size;
+	sq->elts[0].burst_size = 0;
+	cq->cq_ci++;
+	if (likely(i)) {
+		sq->tail += i;
+		rte_io_wmb();
+		cq->cq_obj.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci);
+	}
+	return i;
+}
+
+static uint16_t
+mlx5_aso_cnt_query_one_dcs(struct mlx5_dev_ctx_shared *sh,
+			   struct mlx5_hws_cnt_pool *cpool,
+			   uint8_t dcs_idx, uint32_t num)
+{
+	uint32_t dcs_id = cpool->dcs_mng.dcs[dcs_idx].obj->id;
+	uint64_t cnt_num = cpool->dcs_mng.dcs[dcs_idx].batch_sz;
+	uint64_t left;
+	uint32_t iidx = cpool->dcs_mng.dcs[dcs_idx].iidx;
+	uint32_t offset;
+	uint16_t mask;
+	uint16_t sq_idx;
+	uint64_t burst_sz = (uint64_t)(1 << MLX5_ASO_CNT_QUEUE_LOG_DESC) * 4 *
+		sh->cnt_svc->aso_mng.sq_num;
+	uint64_t qburst_sz = burst_sz / sh->cnt_svc->aso_mng.sq_num;
+	uint64_t n;
+	struct mlx5_aso_sq *sq;
+
+	cnt_num = RTE_MIN(num, cnt_num);
+	left = cnt_num;
+	while (left) {
+		mask = 0;
+		for (sq_idx = 0; sq_idx < sh->cnt_svc->aso_mng.sq_num;
+				sq_idx++) {
+			if (left == 0) {
+				mask |= (1 << sq_idx);
+				continue;
+			}
+			n = RTE_MIN(left, qburst_sz);
+			offset = cnt_num - left;
+			offset += iidx;
+			mlx5_aso_cnt_sq_enqueue_burst(cpool, sh,
+					&sh->cnt_svc->aso_mng.sqs[sq_idx], n,
+					offset, dcs_id);
+			left -= n;
+		}
+		do {
+			for (sq_idx = 0; sq_idx < sh->cnt_svc->aso_mng.sq_num;
+					sq_idx++) {
+				sq = &sh->cnt_svc->aso_mng.sqs[sq_idx];
+				if (mlx5_aso_cnt_completion_handle(sq))
+					mask |= (1 << sq_idx);
+			}
+		} while (mask < ((1 << sh->cnt_svc->aso_mng.sq_num) - 1));
+	}
+	return cnt_num;
+}
+
+/*
+ * Query FW counters via ASO WQE.
+ *
+ * The ASO counter query runs in _sync_ mode, meaning:
+ * 1. each SQ issues one burst of several WQEs
+ * 2. a CQE is requested only for the last WQE
+ * 3. the CQ of every SQ is busy polled
+ * 4. once every SQ's CQE is received, go back to step 1 and issue the next burst
+ *
+ * @param[in] sh
+ *   Pointer to shared device.
+ * @param[in] cpool
+ *   Pointer to counter pool.
+ *
+ * @return
+ *   0 on success, -1 on failure.
+ */
+int
+mlx5_aso_cnt_query(struct mlx5_dev_ctx_shared *sh,
+		   struct mlx5_hws_cnt_pool *cpool)
+{
+	uint32_t idx;
+	uint32_t num;
+	uint32_t cnt_num = mlx5_hws_cnt_pool_get_size(cpool) -
+		rte_ring_count(cpool->free_list);
+
+	for (idx = 0; idx < cpool->dcs_mng.batch_total; idx++) {
+		num = RTE_MIN(cnt_num, cpool->dcs_mng.dcs[idx].batch_sz);
+		mlx5_aso_cnt_query_one_dcs(sh, cpool, idx, num);
+		cnt_num -= num;
+		if (cnt_num == 0)
+			break;
+	}
+	return 0;
+}
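
The control flow described in the comment above can be hard to see through the WQE plumbing in mlx5_aso_cnt_sq_enqueue_burst() and mlx5_aso_cnt_query_one_dcs(). The stand-alone sketch below models only that flow (one burst per SQ, completion on the last WQE, busy-poll every SQ before the next round); issue_burst() and poll_cqe() are illustrative stand-ins, not driver functions, and the sizes are made up.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SQ_NUM 4

/* Illustrative stand-ins for the WQE enqueue and the CQE poll steps. */
static void issue_burst(unsigned int sq) { printf("SQ %u: burst issued\n", sq); }
static bool poll_cqe(unsigned int sq) { (void)sq; return true; }

int main(void)
{
	uint32_t left = 1000;        /* counters left to query */
	uint32_t per_sq_burst = 256; /* counters covered by one burst */

	while (left) {
		uint32_t done_mask = 0;
		unsigned int sq;

		/* Steps 1-2: each SQ issues one burst, CQE asked only on the last WQE. */
		for (sq = 0; sq < SQ_NUM; sq++) {
			if (left == 0) {
				done_mask |= 1u << sq;
				continue;
			}
			issue_burst(sq);
			left -= left < per_sq_burst ? left : per_sq_burst;
		}
		/* Steps 3-4: busy poll every SQ's CQ before issuing the next burst. */
		while (done_mask != (1u << SQ_NUM) - 1) {
			for (sq = 0; sq < SQ_NUM; sq++)
				if (poll_cqe(sq))
					done_mask |= 1u << sq;
		}
	}
	return 0;
}
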
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 6be83e37de..3b5dfcd9e1 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -10,6 +10,7 @@
 #include "mlx5_rx.h"
 
 #if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
+#include "mlx5_hws_cnt.h"
 
 /* The maximum actions support in the flow. */
 #define MLX5_HW_MAX_ACTS 16
@@ -353,6 +354,10 @@ __flow_hw_action_template_destroy(struct rte_eth_dev *dev,
 			mlx5dr_action_destroy(acts->mhdr->action);
 		mlx5_free(acts->mhdr);
 	}
+	if (mlx5_hws_cnt_id_valid(acts->cnt_id)) {
+		mlx5_hws_cnt_shared_put(priv->hws_cpool, &acts->cnt_id);
+		acts->cnt_id = 0;
+	}
 }
 
 /**
@@ -532,6 +537,44 @@ __flow_hw_act_data_shared_rss_append(struct mlx5_priv *priv,
 	return 0;
 }
 
+/**
+ * Append shared counter action to the dynamic action list.
+ *
+ * @param[in] priv
+ *   Pointer to the port private data structure.
+ * @param[in] acts
+ *   Pointer to the template HW steering DR actions.
+ * @param[in] type
+ *   Action type.
+ * @param[in] action_src
+ *   Offset of source rte flow action.
+ * @param[in] action_dst
+ *   Offset of destination DR action.
+ * @param[in] cnt_id
+ *   Shared counter id.
+ *
+ * @return
+ *    0 on success, negative value otherwise and rte_errno is set.
+ */
+static __rte_always_inline int
+__flow_hw_act_data_shared_cnt_append(struct mlx5_priv *priv,
+				     struct mlx5_hw_actions *acts,
+				     enum rte_flow_action_type type,
+				     uint16_t action_src,
+				     uint16_t action_dst,
+				     cnt_id_t cnt_id)
+{
+	struct mlx5_action_construct_data *act_data;
+
+	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
+	if (!act_data)
+		return -1;
+	act_data->type = type;
+	act_data->shared_counter.id = cnt_id;
+	LIST_INSERT_HEAD(&acts->act_list, act_data, next);
+	return 0;
+}
+
+
 /**
  * Translate shared indirect action.
  *
@@ -573,6 +616,13 @@ flow_hw_shared_action_translate(struct rte_eth_dev *dev,
 		    action_src, action_dst, idx, shared_rss))
 			return -1;
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
+		if (__flow_hw_act_data_shared_cnt_append(priv, acts,
+			(enum rte_flow_action_type)
+			MLX5_RTE_FLOW_ACTION_TYPE_COUNT,
+			action_src, action_dst, act_idx))
+			return -1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type:%d", type);
 		break;
@@ -946,6 +996,30 @@ flow_hw_meter_compile(struct rte_eth_dev *dev,
 	}
 	return 0;
 }
+
+static __rte_always_inline int
+flow_hw_cnt_compile(struct rte_eth_dev *dev, uint32_t start_pos,
+		      struct mlx5_hw_actions *acts)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint32_t pos = start_pos;
+	cnt_id_t cnt_id;
+	int ret;
+
+	ret = mlx5_hws_cnt_shared_get(priv->hws_cpool, &cnt_id);
+	if (ret != 0)
+		return ret;
+	ret = mlx5_hws_cnt_pool_get_action_offset
+				(priv->hws_cpool,
+				 cnt_id,
+				 &acts->rule_acts[pos].action,
+				 &acts->rule_acts[pos].counter.offset);
+	if (ret != 0)
+		return ret;
+	acts->cnt_id = cnt_id;
+	return 0;
+}
+
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -1192,6 +1266,20 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 				goto err;
 			i++;
 			break;
+		case RTE_FLOW_ACTION_TYPE_COUNT:
+			if (masks->conf &&
+			    ((const struct rte_flow_action_count *)
+			     masks->conf)->id) {
+				err = flow_hw_cnt_compile(dev, i, acts);
+				if (err)
+					goto err;
+			} else if (__flow_hw_act_data_general_append
+					(priv, acts, actions->type,
+					 actions - action_start, i)) {
+				goto err;
+			}
+			i++;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -1380,6 +1468,13 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev,
 				(dev, &act_data, item_flags, rule_act))
 			return -1;
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
+		if (mlx5_hws_cnt_pool_get_action_offset(priv->hws_cpool,
+				act_idx,
+				&rule_act->action,
+				&rule_act->counter.offset))
+			return -1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type:%d", type);
 		break;
@@ -1523,7 +1618,8 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			  const uint8_t it_idx,
 			  const struct rte_flow_action actions[],
 			  struct mlx5dr_rule_action *rule_acts,
-			  uint32_t *acts_num)
+			  uint32_t *acts_num,
+			  uint32_t queue)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow_template_table *table = job->flow->table;
@@ -1577,6 +1673,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		uint64_t item_flags;
 		struct mlx5_hw_jump_action *jump;
 		struct mlx5_hrxq *hrxq;
+		cnt_id_t cnt_id;
 
 		action = &actions[act_data->action_src];
 		MLX5_ASSERT(action->type == RTE_FLOW_ACTION_TYPE_INDIRECT ||
@@ -1684,6 +1781,32 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			if (mlx5_aso_mtr_wait(priv->sh, mtr))
 				return -1;
 			break;
+		case RTE_FLOW_ACTION_TYPE_COUNT:
+			ret = mlx5_hws_cnt_pool_get(priv->hws_cpool, &queue,
+					&cnt_id);
+			if (ret != 0)
+				return ret;
+			ret = mlx5_hws_cnt_pool_get_action_offset
+				(priv->hws_cpool,
+				 cnt_id,
+				 &rule_acts[act_data->action_dst].action,
+				 &rule_acts[act_data->action_dst].counter.offset
+				 );
+			if (ret != 0)
+				return ret;
+			job->flow->cnt_id = cnt_id;
+			break;
+		case MLX5_RTE_FLOW_ACTION_TYPE_COUNT:
+			ret = mlx5_hws_cnt_pool_get_action_offset
+				(priv->hws_cpool,
+				 act_data->shared_counter.id,
+				 &rule_acts[act_data->action_dst].action,
+				 &rule_acts[act_data->action_dst].counter.offset
+				 );
+			if (ret != 0)
+				return ret;
+			job->flow->cnt_id = act_data->shared_counter.id;
+			break;
 		default:
 			break;
 		}
@@ -1693,6 +1816,8 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 				job->flow->idx - 1;
 		rule_acts[hw_acts->encap_decap_pos].reformat.data = buf;
 	}
+	if (mlx5_hws_cnt_id_valid(hw_acts->cnt_id))
+		job->flow->cnt_id = hw_acts->cnt_id;
 	return 0;
 }
 
@@ -1828,7 +1953,7 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	 * user's input, in order to save the cost.
 	 */
 	if (flow_hw_actions_construct(dev, job, hw_acts, pattern_template_index,
-				  actions, rule_acts, &acts_num)) {
+				  actions, rule_acts, &acts_num, queue)) {
 		rte_errno = EINVAL;
 		goto free;
 	}
@@ -1958,6 +2083,13 @@ flow_hw_pull(struct rte_eth_dev *dev,
 				flow_hw_jump_release(dev, job->flow->jump);
 			else if (job->flow->fate_type == MLX5_FLOW_FATE_QUEUE)
 				mlx5_hrxq_obj_release(dev, job->flow->hrxq);
+			if (mlx5_hws_cnt_id_valid(job->flow->cnt_id) &&
+			    mlx5_hws_cnt_is_shared
+				(priv->hws_cpool, job->flow->cnt_id) == false) {
+				mlx5_hws_cnt_pool_put(priv->hws_cpool, &queue,
+						&job->flow->cnt_id);
+				job->flow->cnt_id = 0;
+			}
 			mlx5_ipool_free(job->flow->table->flow, job->flow->idx);
 		}
 		priv->hw_q[queue].job[priv->hw_q[queue].job_idx++] = job;
@@ -2681,6 +2813,9 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 			if (ret < 0)
 				return ret;
 			break;
+		case RTE_FLOW_ACTION_TYPE_COUNT:
+			/* TODO: Validation logic */
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -4352,6 +4487,12 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	}
 	if (_queue_attr)
 		mlx5_free(_queue_attr);
+	if (port_attr->nb_counters) {
+		priv->hws_cpool = mlx5_hws_cnt_pool_create(dev, port_attr,
+				nb_queue);
+		if (priv->hws_cpool == NULL)
+			goto err;
+	}
 	return 0;
 err:
 	flow_hw_free_vport_actions(priv);
@@ -4421,6 +4562,8 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 		mlx5_ipool_destroy(priv->acts_ipool);
 		priv->acts_ipool = NULL;
 	}
+	if (priv->hws_cpool)
+		mlx5_hws_cnt_pool_destroy(priv->sh, priv->hws_cpool);
 	mlx5_free(priv->hw_q);
 	priv->hw_q = NULL;
 	claim_zero(mlx5dr_context_close(priv->dr_ctx));
@@ -4562,10 +4705,28 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 			     void *user_data,
 			     struct rte_flow_error *error)
 {
+	struct rte_flow_action_handle *handle = NULL;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	cnt_id_t cnt_id;
+
 	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
 	RTE_SET_USED(user_data);
-	return flow_dv_action_create(dev, conf, action, error);
+	switch (action->type) {
+	case RTE_FLOW_ACTION_TYPE_COUNT:
+		if (mlx5_hws_cnt_shared_get(priv->hws_cpool, &cnt_id))
+			rte_flow_error_set(error, ENODEV,
+					RTE_FLOW_ERROR_TYPE_ACTION,
+					NULL,
+					"counter are not configured!");
+		else
+			handle = (struct rte_flow_action_handle *)
+				 (uintptr_t)cnt_id;
+		break;
+	default:
+		handle = flow_dv_action_create(dev, conf, action, error);
+	}
+	return handle;
 }
 
 /**
@@ -4629,10 +4790,172 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 			      void *user_data,
 			      struct rte_flow_error *error)
 {
+	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
+	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+	struct mlx5_priv *priv = dev->data->dev_private;
+
 	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
 	RTE_SET_USED(user_data);
-	return flow_dv_action_destroy(dev, handle, error);
+	switch (type) {
+	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
+		return mlx5_hws_cnt_shared_put(priv->hws_cpool, &act_idx);
+	default:
+		return flow_dv_action_destroy(dev, handle, error);
+	}
+}
+
+static int
+flow_hw_query_counter(const struct rte_eth_dev *dev, uint32_t counter,
+		      void *data, struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hws_cnt *cnt;
+	struct rte_flow_query_count *qc = data;
+	uint32_t iidx = mlx5_hws_cnt_iidx(priv->hws_cpool, counter);
+	uint64_t pkts, bytes;
+
+	if (!mlx5_hws_cnt_id_valid(counter))
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				"counter are not available");
+	cnt = &priv->hws_cpool->pool[iidx];
+	__hws_cnt_query_raw(priv->hws_cpool, counter, &pkts, &bytes);
+	qc->hits_set = 1;
+	qc->bytes_set = 1;
+	qc->hits = pkts - cnt->reset.hits;
+	qc->bytes = bytes - cnt->reset.bytes;
+	if (qc->reset) {
+		cnt->reset.bytes = bytes;
+		cnt->reset.hits = pkts;
+	}
+	return 0;
+}
+
+static int
+flow_hw_query(struct rte_eth_dev *dev,
+	      struct rte_flow *flow __rte_unused,
+	      const struct rte_flow_action *actions __rte_unused,
+	      void *data __rte_unused,
+	      struct rte_flow_error *error __rte_unused)
+{
+	int ret = -EINVAL;
+	struct rte_flow_hw *hw_flow = (struct rte_flow_hw *)flow;
+
+	for (; actions->type != RTE_FLOW_ACTION_TYPE_END; actions++) {
+		switch (actions->type) {
+		case RTE_FLOW_ACTION_TYPE_VOID:
+			break;
+		case RTE_FLOW_ACTION_TYPE_COUNT:
+			ret = flow_hw_query_counter(dev, hw_flow->cnt_id, data,
+						  error);
+			break;
+		default:
+			return rte_flow_error_set(error, ENOTSUP,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  actions,
+						  "action not supported");
+		}
+	}
+	return ret;
+}
+
+/**
+ * Create indirect action.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] conf
+ *   Shared action configuration.
+ * @param[in] action
+ *   Action specification used to create indirect action.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. Initialized in case of
+ *   error only.
+ *
+ * @return
+ *   A valid shared action handle in case of success, NULL otherwise and
+ *   rte_errno is set.
+ */
+static struct rte_flow_action_handle *
+flow_hw_action_create(struct rte_eth_dev *dev,
+		       const struct rte_flow_indir_action_conf *conf,
+		       const struct rte_flow_action *action,
+		       struct rte_flow_error *err)
+{
+	return flow_hw_action_handle_create(dev, UINT32_MAX, NULL, conf, action,
+					    NULL, err);
+}
+
+/**
+ * Destroy the indirect action.
+ * Release action related resources on the NIC and the memory.
+ * Lock free, (mutex should be acquired by caller).
+ * Dispatcher for action type specific call.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] handle
+ *   The indirect action object handle to be removed.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. Initialized in case of
+ *   error only.
+ *
+ * @return
+ *   0 on success, otherwise negative errno value.
+ */
+static int
+flow_hw_action_destroy(struct rte_eth_dev *dev,
+		       struct rte_flow_action_handle *handle,
+		       struct rte_flow_error *error)
+{
+	return flow_hw_action_handle_destroy(dev, UINT32_MAX, NULL, handle,
+			NULL, error);
+}
+
+/**
+ * Update a shared action configuration in place.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] handle
+ *   The indirect action object handle to be updated.
+ * @param[in] update
+ *   Action specification used to modify the action pointed by *handle*.
+ *   *update* could be of same type with the action pointed by the *handle*
+ *   handle argument, or some other structures like a wrapper, depending on
+ *   the indirect action type.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. Initialized in case of
+ *   error only.
+ *
+ * @return
+ *   0 on success, otherwise negative errno value.
+ */
+static int
+flow_hw_action_update(struct rte_eth_dev *dev,
+		      struct rte_flow_action_handle *handle,
+		      const void *update,
+		      struct rte_flow_error *err)
+{
+	return flow_hw_action_handle_update(dev, UINT32_MAX, NULL, handle,
+			update, NULL, err);
+}
+
+static int
+flow_hw_action_query(struct rte_eth_dev *dev,
+		     const struct rte_flow_action_handle *handle, void *data,
+		     struct rte_flow_error *error)
+{
+	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
+	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+
+	switch (type) {
+	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
+		return flow_hw_query_counter(dev, act_idx, data, error);
+	default:
+		return flow_dv_action_query(dev, handle, data, error);
+	}
 }
 
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
@@ -4654,10 +4977,11 @@ const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.async_action_destroy = flow_hw_action_handle_destroy,
 	.async_action_update = flow_hw_action_handle_update,
 	.action_validate = flow_dv_action_validate,
-	.action_create = flow_dv_action_create,
-	.action_destroy = flow_dv_action_destroy,
-	.action_update = flow_dv_action_update,
-	.action_query = flow_dv_action_query,
+	.action_create = flow_hw_action_create,
+	.action_destroy = flow_hw_action_destroy,
+	.action_update = flow_hw_action_update,
+	.action_query = flow_hw_action_query,
+	.query = flow_hw_query,
 };
 
 /**
diff --git a/drivers/net/mlx5/mlx5_hws_cnt.c b/drivers/net/mlx5/mlx5_hws_cnt.c
new file mode 100644
index 0000000000..d826ebaa25
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_hws_cnt.c
@@ -0,0 +1,528 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2020 Mellanox Technologies, Ltd
+ */
+
+#include <stdint.h>
+#include <rte_malloc.h>
+#include <mlx5_malloc.h>
+#include <rte_ring.h>
+#include <mlx5_devx_cmds.h>
+#include <rte_cycles.h>
+
+#if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
+
+#include "mlx5_utils.h"
+#include "mlx5_hws_cnt.h"
+
+#define HWS_CNT_CACHE_SZ_DEFAULT 511
+#define HWS_CNT_CACHE_PRELOAD_DEFAULT 254
+#define HWS_CNT_CACHE_FETCH_DEFAULT 254
+#define HWS_CNT_CACHE_THRESHOLD_DEFAULT 254
+#define HWS_CNT_ALLOC_FACTOR_DEFAULT 20
+
+static void
+__hws_cnt_id_load(struct mlx5_hws_cnt_pool *cpool)
+{
+	uint32_t preload;
+	uint32_t q_num = cpool->cache->q_num;
+	uint32_t cnt_num = mlx5_hws_cnt_pool_get_size(cpool);
+	cnt_id_t cnt_id, iidx = 0;
+	uint32_t qidx;
+	struct rte_ring *qcache = NULL;
+
+	/*
+	 * Counter ID order matters for tracking the maximum number of counters
+	 * in use during a query: the internal index order must run from zero
+	 * up to the number the user configured, e.g. 0 - 8000000.
+	 * Counter IDs are therefore loaded in this order, first into the
+	 * per-queue caches and then into the global free list, so that users
+	 * fetch counters from the lowest index to the highest.
+	 */
+	preload = RTE_MIN(cpool->cache->preload_sz, cnt_num / q_num);
+	for (qidx = 0; qidx < q_num; qidx++) {
+		for (; iidx < preload * (qidx + 1); iidx++) {
+			cnt_id = mlx5_hws_cnt_id_gen(cpool, iidx);
+			qcache = cpool->cache->qcache[qidx];
+			if (qcache)
+				rte_ring_enqueue_elem(qcache, &cnt_id,
+						sizeof(cnt_id));
+		}
+	}
+	for (; iidx < cnt_num; iidx++) {
+		cnt_id = mlx5_hws_cnt_id_gen(cpool, iidx);
+		rte_ring_enqueue_elem(cpool->free_list, &cnt_id,
+				sizeof(cnt_id));
+	}
+}
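
To illustrate the ordering above, the snippet below shows where each internal counter ID range lands, using the cache preload default defined earlier (254) and an assumed 4-queue, 1024-counter pool; the pool and queue sizes are examples only.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint32_t cnt_num = 1024, q_num = 4;  /* assumed pool and queue sizes */
	uint32_t preload_sz = 254;           /* HWS_CNT_CACHE_PRELOAD_DEFAULT */
	uint32_t preload = preload_sz < cnt_num / q_num ?
			   preload_sz : cnt_num / q_num;
	uint32_t q;

	for (q = 0; q < q_num; q++)
		printf("qcache[%u] preloaded with internal IDs [%u, %u)\n",
		       q, q * preload, (q + 1) * preload);
	printf("free_list gets internal IDs [%u, %u)\n",
	       q_num * preload, cnt_num);
	return 0;
}
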
+
+static void
+__mlx5_hws_cnt_svc(struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_hws_cnt_pool *cpool)
+{
+	struct rte_ring *reset_list = cpool->wait_reset_list;
+	struct rte_ring *reuse_list = cpool->reuse_list;
+	uint32_t reset_cnt_num;
+	struct rte_ring_zc_data zcdr = {0};
+	struct rte_ring_zc_data zcdu = {0};
+
+	reset_cnt_num = rte_ring_count(reset_list);
+	do {
+		cpool->query_gen++;
+		mlx5_aso_cnt_query(sh, cpool);
+		zcdr.n1 = 0;
+		zcdu.n1 = 0;
+		rte_ring_enqueue_zc_burst_elem_start(reuse_list,
+				sizeof(cnt_id_t), reset_cnt_num, &zcdu,
+				NULL);
+		rte_ring_dequeue_zc_burst_elem_start(reset_list,
+				sizeof(cnt_id_t), reset_cnt_num, &zcdr,
+				NULL);
+		__hws_cnt_r2rcpy(&zcdu, &zcdr, reset_cnt_num);
+		rte_ring_dequeue_zc_elem_finish(reset_list,
+				reset_cnt_num);
+		rte_ring_enqueue_zc_elem_finish(reuse_list,
+				reset_cnt_num);
+		reset_cnt_num = rte_ring_count(reset_list);
+	} while (reset_cnt_num > 0);
+}
+
+static void
+mlx5_hws_cnt_raw_data_free(struct mlx5_dev_ctx_shared *sh,
+			   struct mlx5_hws_cnt_raw_data_mng *mng)
+{
+	if (mng == NULL)
+		return;
+	sh->cdev->mr_scache.dereg_mr_cb(&mng->mr);
+	mlx5_free(mng->raw);
+	mlx5_free(mng);
+}
+
+__rte_unused
+static struct mlx5_hws_cnt_raw_data_mng *
+mlx5_hws_cnt_raw_data_alloc(struct mlx5_dev_ctx_shared *sh, uint32_t n)
+{
+	struct mlx5_hws_cnt_raw_data_mng *mng = NULL;
+	int ret;
+	size_t sz = n * sizeof(struct flow_counter_stats);
+
+	mng = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO, sizeof(*mng), 0,
+			SOCKET_ID_ANY);
+	if (mng == NULL)
+		goto error;
+	mng->raw = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO, sz, 0,
+			SOCKET_ID_ANY);
+	if (mng->raw == NULL)
+		goto error;
+	ret = sh->cdev->mr_scache.reg_mr_cb(sh->cdev->pd, mng->raw, sz,
+					    &mng->mr);
+	if (ret) {
+		rte_errno = errno;
+		goto error;
+	}
+	return mng;
+error:
+	mlx5_hws_cnt_raw_data_free(sh, mng);
+	return NULL;
+}
+
+static void *
+mlx5_hws_cnt_svc(void *opaque)
+{
+	struct mlx5_dev_ctx_shared *sh =
+		(struct mlx5_dev_ctx_shared *)opaque;
+	uint64_t interval =
+		(uint64_t)sh->cnt_svc->query_interval * (US_PER_S / MS_PER_S);
+	uint16_t port_id;
+	uint64_t start_cycle, query_cycle = 0;
+	uint64_t query_us;
+	uint64_t sleep_us;
+
+	while (sh->cnt_svc->svc_running != 0) {
+		start_cycle = rte_rdtsc();
+		MLX5_ETH_FOREACH_DEV(port_id, sh->cdev->dev) {
+			struct mlx5_priv *opriv =
+				rte_eth_devices[port_id].data->dev_private;
+			if (opriv != NULL &&
+			    opriv->sh == sh &&
+			    opriv->hws_cpool != NULL) {
+				__mlx5_hws_cnt_svc(sh, opriv->hws_cpool);
+			}
+		}
+		query_cycle = rte_rdtsc() - start_cycle;
+		query_us = query_cycle / (rte_get_timer_hz() / US_PER_S);
+		sleep_us = interval - query_us;
+		if (interval > query_us)
+			rte_delay_us_sleep(sleep_us);
+	}
+	return NULL;
+}
+
+struct mlx5_hws_cnt_pool *
+mlx5_hws_cnt_pool_init(const struct mlx5_hws_cnt_pool_cfg *pcfg,
+		const struct mlx5_hws_cache_param *ccfg)
+{
+	char mz_name[RTE_MEMZONE_NAMESIZE];
+	struct mlx5_hws_cnt_pool *cntp;
+	uint64_t cnt_num = 0;
+	uint32_t qidx;
+
+	MLX5_ASSERT(pcfg);
+	MLX5_ASSERT(ccfg);
+	cntp = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO, sizeof(*cntp), 0,
+			   SOCKET_ID_ANY);
+	if (cntp == NULL)
+		return NULL;
+
+	cntp->cfg = *pcfg;
+	cntp->cache = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO,
+			sizeof(*cntp->cache) +
+			sizeof(((struct mlx5_hws_cnt_pool_caches *)0)->qcache[0])
+				* ccfg->q_num, 0, SOCKET_ID_ANY);
+	if (cntp->cache == NULL)
+		goto error;
+	/* Store the necessary cache parameters. */
+	cntp->cache->fetch_sz = ccfg->fetch_sz;
+	cntp->cache->preload_sz = ccfg->preload_sz;
+	cntp->cache->threshold = ccfg->threshold;
+	cntp->cache->q_num = ccfg->q_num;
+	cnt_num = pcfg->request_num * (100 + pcfg->alloc_factor) / 100;
+	if (cnt_num > UINT32_MAX) {
+		DRV_LOG(ERR, "counter number %"PRIu64" is out of 32bit range",
+			cnt_num);
+		goto error;
+	}
+	cntp->pool = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO,
+			sizeof(struct mlx5_hws_cnt) *
+			pcfg->request_num * (100 + pcfg->alloc_factor) / 100,
+			0, SOCKET_ID_ANY);
+	if (cntp->pool == NULL)
+		goto error;
+	snprintf(mz_name, sizeof(mz_name), "%s_F_RING", pcfg->name);
+	cntp->free_list = rte_ring_create_elem(mz_name, sizeof(cnt_id_t),
+			(uint32_t)cnt_num, SOCKET_ID_ANY,
+			RING_F_SP_ENQ | RING_F_MC_HTS_DEQ | RING_F_EXACT_SZ);
+	if (cntp->free_list == NULL) {
+		DRV_LOG(ERR, "failed to create free list ring");
+		goto error;
+	}
+	snprintf(mz_name, sizeof(mz_name), "%s_R_RING", pcfg->name);
+	cntp->wait_reset_list = rte_ring_create_elem(mz_name, sizeof(cnt_id_t),
+			(uint32_t)cnt_num, SOCKET_ID_ANY,
+			RING_F_MP_HTS_ENQ | RING_F_SC_DEQ | RING_F_EXACT_SZ);
+	if (cntp->wait_reset_list == NULL) {
+		DRV_LOG(ERR, "failed to create free list ring");
+		goto error;
+	}
+	snprintf(mz_name, sizeof(mz_name), "%s_U_RING", pcfg->name);
+	cntp->reuse_list = rte_ring_create_elem(mz_name, sizeof(cnt_id_t),
+			(uint32_t)cnt_num, SOCKET_ID_ANY,
+			RING_F_SP_ENQ | RING_F_MC_HTS_DEQ | RING_F_EXACT_SZ);
+	if (cntp->reuse_list == NULL) {
+		DRV_LOG(ERR, "failed to create reuse list ring");
+		goto error;
+	}
+	for (qidx = 0; qidx < ccfg->q_num; qidx++) {
+		snprintf(mz_name, sizeof(mz_name), "%s_cache/%u", pcfg->name,
+				qidx);
+		cntp->cache->qcache[qidx] = rte_ring_create(mz_name, ccfg->size,
+				SOCKET_ID_ANY,
+				RING_F_SP_ENQ | RING_F_SC_DEQ |
+				RING_F_EXACT_SZ);
+		if (cntp->cache->qcache[qidx] == NULL)
+			goto error;
+	}
+	return cntp;
+error:
+	mlx5_hws_cnt_pool_deinit(cntp);
+	return NULL;
+}
+
+void
+mlx5_hws_cnt_pool_deinit(struct mlx5_hws_cnt_pool * const cntp)
+{
+	uint32_t qidx = 0;
+	if (cntp == NULL)
+		return;
+	rte_ring_free(cntp->free_list);
+	rte_ring_free(cntp->wait_reset_list);
+	rte_ring_free(cntp->reuse_list);
+	if (cntp->cache) {
+		for (qidx = 0; qidx < cntp->cache->q_num; qidx++)
+			rte_ring_free(cntp->cache->qcache[qidx]);
+	}
+	mlx5_free(cntp->cache);
+	mlx5_free(cntp->raw_mng);
+	mlx5_free(cntp->pool);
+	mlx5_free(cntp);
+}
+
+int
+mlx5_hws_cnt_service_thread_create(struct mlx5_dev_ctx_shared *sh)
+{
+#define CNT_THREAD_NAME_MAX 256
+	char name[CNT_THREAD_NAME_MAX];
+	rte_cpuset_t cpuset;
+	int ret;
+	uint32_t service_core = sh->cnt_svc->service_core;
+
+	CPU_ZERO(&cpuset);
+	sh->cnt_svc->svc_running = 1;
+	ret = pthread_create(&sh->cnt_svc->service_thread, NULL,
+			mlx5_hws_cnt_svc, sh);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Failed to create HW steering's counter service thread.");
+		return -ENOSYS;
+	}
+	snprintf(name, CNT_THREAD_NAME_MAX - 1, "%s/svc@%d",
+		 sh->ibdev_name, service_core);
+	rte_thread_setname(sh->cnt_svc->service_thread, name);
+	CPU_SET(service_core, &cpuset);
+	pthread_setaffinity_np(sh->cnt_svc->service_thread, sizeof(cpuset),
+				&cpuset);
+	return 0;
+}
+
+void
+mlx5_hws_cnt_service_thread_destroy(struct mlx5_dev_ctx_shared *sh)
+{
+	if (sh->cnt_svc->service_thread == 0)
+		return;
+	sh->cnt_svc->svc_running = 0;
+	pthread_join(sh->cnt_svc->service_thread, NULL);
+	sh->cnt_svc->service_thread = 0;
+}
+
+int
+mlx5_hws_cnt_pool_dcs_alloc(struct mlx5_dev_ctx_shared *sh,
+			    struct mlx5_hws_cnt_pool *cpool)
+{
+	struct mlx5_hca_attr *hca_attr = &sh->cdev->config.hca_attr;
+	uint32_t max_log_bulk_sz = 0;
+	uint32_t log_bulk_sz;
+	uint32_t idx, alloced = 0;
+	unsigned int cnt_num = mlx5_hws_cnt_pool_get_size(cpool);
+	struct mlx5_devx_counter_attr attr = {0};
+	struct mlx5_devx_obj *dcs;
+
+	if (hca_attr->flow_counter_bulk_log_max_alloc == 0) {
+		DRV_LOG(ERR,
+			"Fw doesn't support bulk log max alloc");
+		return -1;
+	}
+	max_log_bulk_sz = 23; /* hard code to 8M (1 << 23). */
+	cnt_num = RTE_ALIGN_CEIL(cnt_num, 4); /* A bulk covers at least 4 counters. */
+	log_bulk_sz = RTE_MIN(max_log_bulk_sz, rte_log2_u32(cnt_num));
+	attr.pd = sh->cdev->pdn;
+	attr.pd_valid = 1;
+	attr.bulk_log_max_alloc = 1;
+	attr.flow_counter_bulk_log_size = log_bulk_sz;
+	idx = 0;
+	dcs = mlx5_devx_cmd_flow_counter_alloc_general(sh->cdev->ctx, &attr);
+	if (dcs == NULL)
+		goto error;
+	cpool->dcs_mng.dcs[idx].obj = dcs;
+	cpool->dcs_mng.dcs[idx].batch_sz = (1 << log_bulk_sz);
+	cpool->dcs_mng.batch_total++;
+	idx++;
+	cpool->dcs_mng.dcs[0].iidx = 0;
+	alloced = cpool->dcs_mng.dcs[0].batch_sz;
+	if (cnt_num > cpool->dcs_mng.dcs[0].batch_sz) {
+		for (; idx < MLX5_HWS_CNT_DCS_NUM; idx++) {
+			attr.flow_counter_bulk_log_size = --max_log_bulk_sz;
+			dcs = mlx5_devx_cmd_flow_counter_alloc_general
+				(sh->cdev->ctx, &attr);
+			if (dcs == NULL)
+				goto error;
+			cpool->dcs_mng.dcs[idx].obj = dcs;
+			cpool->dcs_mng.dcs[idx].batch_sz =
+				(1 << max_log_bulk_sz);
+			cpool->dcs_mng.dcs[idx].iidx = alloced;
+			alloced += cpool->dcs_mng.dcs[idx].batch_sz;
+			cpool->dcs_mng.batch_total++;
+		}
+	}
+	return 0;
+error:
+	DRV_LOG(DEBUG,
+		"Cannot alloc device counter, allocated[%" PRIu32 "] request[%" PRIu32 "]",
+		alloced, cnt_num);
+	for (idx = 0; idx < cpool->dcs_mng.batch_total; idx++) {
+		mlx5_devx_cmd_destroy(cpool->dcs_mng.dcs[idx].obj);
+		cpool->dcs_mng.dcs[idx].obj = NULL;
+		cpool->dcs_mng.dcs[idx].batch_sz = 0;
+		cpool->dcs_mng.dcs[idx].iidx = 0;
+	}
+	cpool->dcs_mng.batch_total = 0;
+	return -1;
+}
+
+void
+mlx5_hws_cnt_pool_dcs_free(struct mlx5_dev_ctx_shared *sh,
+			   struct mlx5_hws_cnt_pool *cpool)
+{
+	uint32_t idx;
+
+	if (cpool == NULL)
+		return;
+	for (idx = 0; idx < MLX5_HWS_CNT_DCS_NUM; idx++)
+		mlx5_devx_cmd_destroy(cpool->dcs_mng.dcs[idx].obj);
+	if (cpool->raw_mng) {
+		mlx5_hws_cnt_raw_data_free(sh, cpool->raw_mng);
+		cpool->raw_mng = NULL;
+	}
+}
+
+int
+mlx5_hws_cnt_pool_action_create(struct mlx5_priv *priv,
+		struct mlx5_hws_cnt_pool *cpool)
+{
+	uint32_t idx;
+	int ret = 0;
+	struct mlx5_hws_cnt_dcs *dcs;
+	uint32_t flags;
+
+	flags = MLX5DR_ACTION_FLAG_HWS_RX | MLX5DR_ACTION_FLAG_HWS_TX;
+	if (priv->sh->config.dv_esw_en && priv->master)
+		flags |= MLX5DR_ACTION_FLAG_HWS_FDB;
+	for (idx = 0; idx < cpool->dcs_mng.batch_total; idx++) {
+		dcs = &cpool->dcs_mng.dcs[idx];
+		dcs->dr_action = mlx5dr_action_create_counter(priv->dr_ctx,
+					(struct mlx5dr_devx_obj *)dcs->obj,
+					flags);
+		if (dcs->dr_action == NULL) {
+			mlx5_hws_cnt_pool_action_destroy(cpool);
+			ret = -ENOSYS;
+			break;
+		}
+	}
+	return ret;
+}
+
+void
+mlx5_hws_cnt_pool_action_destroy(struct mlx5_hws_cnt_pool *cpool)
+{
+	uint32_t idx;
+	struct mlx5_hws_cnt_dcs *dcs;
+
+	for (idx = 0; idx < cpool->dcs_mng.batch_total; idx++) {
+		dcs = &cpool->dcs_mng.dcs[idx];
+		if (dcs->dr_action != NULL) {
+			mlx5dr_action_destroy(dcs->dr_action);
+			dcs->dr_action = NULL;
+		}
+	}
+}
+
+struct mlx5_hws_cnt_pool *
+mlx5_hws_cnt_pool_create(struct rte_eth_dev *dev,
+		const struct rte_flow_port_attr *pattr, uint16_t nb_queue)
+{
+	struct mlx5_hws_cnt_pool *cpool = NULL;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hws_cache_param cparam = {0};
+	struct mlx5_hws_cnt_pool_cfg pcfg = {0};
+	char *mp_name;
+	int ret = 0;
+	size_t sz;
+
+	/* Initialize the counter service if it does not exist yet. */
+	if (priv->sh->cnt_svc == NULL) {
+		ret = mlx5_hws_cnt_svc_init(priv->sh);
+		if (ret != 0)
+			return NULL;
+	}
+	cparam.fetch_sz = HWS_CNT_CACHE_FETCH_DEFAULT;
+	cparam.preload_sz = HWS_CNT_CACHE_PRELOAD_DEFAULT;
+	cparam.q_num = nb_queue;
+	cparam.threshold = HWS_CNT_CACHE_THRESHOLD_DEFAULT;
+	cparam.size = HWS_CNT_CACHE_SZ_DEFAULT;
+	pcfg.alloc_factor = HWS_CNT_ALLOC_FACTOR_DEFAULT;
+	mp_name = mlx5_malloc(MLX5_MEM_ZERO, RTE_MEMZONE_NAMESIZE, 0,
+			SOCKET_ID_ANY);
+	if (mp_name == NULL)
+		goto error;
+	snprintf(mp_name, RTE_MEMZONE_NAMESIZE, "MLX5_HWS_CNT_POOL_%u",
+			dev->data->port_id);
+	pcfg.name = mp_name;
+	pcfg.request_num = pattr->nb_counters;
+	cpool = mlx5_hws_cnt_pool_init(&pcfg, &cparam);
+	if (cpool == NULL)
+		goto error;
+	ret = mlx5_hws_cnt_pool_dcs_alloc(priv->sh, cpool);
+	if (ret != 0)
+		goto error;
+	sz = RTE_ALIGN_CEIL(mlx5_hws_cnt_pool_get_size(cpool), 4);
+	cpool->raw_mng = mlx5_hws_cnt_raw_data_alloc(priv->sh, sz);
+	if (cpool->raw_mng == NULL)
+		goto error;
+	__hws_cnt_id_load(cpool);
+	/*
+	 * Bump the query generation right after pool creation so that the
+	 * pre-loaded counters can be used directly: they already hold their
+	 * initial values and do not need to wait for a query.
+	 */
+	cpool->query_gen = 1;
+	ret = mlx5_hws_cnt_pool_action_create(priv, cpool);
+	if (ret != 0)
+		goto error;
+	priv->sh->cnt_svc->refcnt++;
+	return cpool;
+error:
+	mlx5_hws_cnt_pool_destroy(priv->sh, cpool);
+	return NULL;
+}
+
+void
+mlx5_hws_cnt_pool_destroy(struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_hws_cnt_pool *cpool)
+{
+	if (cpool == NULL)
+		return;
+	if (--sh->cnt_svc->refcnt == 0)
+		mlx5_hws_cnt_svc_deinit(sh);
+	mlx5_hws_cnt_pool_action_destroy(cpool);
+	mlx5_hws_cnt_pool_dcs_free(sh, cpool);
+	mlx5_hws_cnt_raw_data_free(sh, cpool->raw_mng);
+	mlx5_free((void *)cpool->cfg.name);
+	mlx5_hws_cnt_pool_deinit(cpool);
+}
+
+int
+mlx5_hws_cnt_svc_init(struct mlx5_dev_ctx_shared *sh)
+{
+	int ret;
+
+	sh->cnt_svc = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO,
+			sizeof(*sh->cnt_svc), 0, SOCKET_ID_ANY);
+	if (sh->cnt_svc == NULL)
+		return -1;
+	sh->cnt_svc->query_interval = sh->config.cnt_svc.cycle_time;
+	sh->cnt_svc->service_core = sh->config.cnt_svc.service_core;
+	ret = mlx5_aso_cnt_queue_init(sh);
+	if (ret != 0) {
+		mlx5_free(sh->cnt_svc);
+		sh->cnt_svc = NULL;
+		return -1;
+	}
+	ret = mlx5_hws_cnt_service_thread_create(sh);
+	if (ret != 0) {
+		mlx5_aso_cnt_queue_uninit(sh);
+		mlx5_free(sh->cnt_svc);
+		sh->cnt_svc = NULL;
+	}
+	return 0;
+}
+
+void
+mlx5_hws_cnt_svc_deinit(struct mlx5_dev_ctx_shared *sh)
+{
+	if (sh->cnt_svc == NULL)
+		return;
+	mlx5_hws_cnt_service_thread_destroy(sh);
+	mlx5_aso_cnt_queue_uninit(sh);
+	mlx5_free(sh->cnt_svc);
+	sh->cnt_svc = NULL;
+}
+
+#endif
diff --git a/drivers/net/mlx5/mlx5_hws_cnt.h b/drivers/net/mlx5/mlx5_hws_cnt.h
new file mode 100644
index 0000000000..5fab4ba597
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_hws_cnt.h
@@ -0,0 +1,558 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2022 Mellanox Technologies, Ltd
+ */
+
+#ifndef _MLX5_HWS_CNT_H_
+#define _MLX5_HWS_CNT_H_
+
+#include <rte_ring.h>
+#include "mlx5_utils.h"
+#include "mlx5_flow.h"
+
+/*
+ * COUNTER ID's layout
+ *       3                   2                   1                   0
+ *     1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    | T |       | D |                                               |
+ *    ~ Y |       | C |                    IDX                        ~
+ *    | P |       | S |                                               |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *
+ *    Bit 31:30 = TYPE = MLX5_INDIRECT_ACTION_TYPE_COUNT = b'10
+ *    Bit 25:24 = DCS index
+ *    Bit 23:00 = IDX, the index within the DCS bulk this counter belongs to.
+ */
+typedef uint32_t cnt_id_t;
+
+#define MLX5_HWS_CNT_DCS_NUM 4
+#define MLX5_HWS_CNT_DCS_IDX_OFFSET 24
+#define MLX5_HWS_CNT_DCS_IDX_MASK 0x3
+#define MLX5_HWS_CNT_IDX_MASK ((1UL << MLX5_HWS_CNT_DCS_IDX_OFFSET) - 1)
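
To make the layout concrete, the snippet below builds and splits one counter ID using the bit positions from the diagram; the EX_* constants simply mirror the comment and macros above and are not driver symbols.

#include <stdint.h>
#include <stdio.h>

#define EX_TYPE_COUNT  0x2u                        /* bits 31:30, b'10 */
#define EX_DCS_OFFSET  24                          /* bits 25:24 */
#define EX_DCS_MASK    0x3u
#define EX_IDX_MASK    ((1u << EX_DCS_OFFSET) - 1) /* bits 23:0 */

int main(void)
{
	uint32_t dcs = 1, idx = 42;
	uint32_t cnt_id = (EX_TYPE_COUNT << 30) | (dcs << EX_DCS_OFFSET) | idx;

	printf("cnt_id = 0x%08x\n", cnt_id);          /* 0x8100002a */
	printf("type = %u, dcs = %u, idx = %u\n",
	       cnt_id >> 30,
	       (cnt_id >> EX_DCS_OFFSET) & EX_DCS_MASK,
	       cnt_id & EX_IDX_MASK);
	return 0;
}
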
+
+struct mlx5_hws_cnt_dcs {
+	void *dr_action;
+	uint32_t batch_sz;
+	uint32_t iidx; /* internal index of first counter in this bulk. */
+	struct mlx5_devx_obj *obj;
+};
+
+struct mlx5_hws_cnt_dcs_mng {
+	uint32_t batch_total;
+	struct mlx5_hws_cnt_dcs dcs[MLX5_HWS_CNT_DCS_NUM];
+};
+
+struct mlx5_hws_cnt {
+	struct flow_counter_stats reset;
+	union {
+		uint32_t share: 1;
+		/*
+		 * share will be set to 1 when this counter is used as indirect
+		 * action. Only meaningful when user own this counter.
+		 */
+		uint32_t query_gen_when_free;
+		/*
+		 * When PMD own this counter (user put back counter to PMD
+		 * counter pool, i.e), this field recorded value of counter
+		 * pools query generation at time user release the counter.
+		 */
+	};
+};
+
+struct mlx5_hws_cnt_raw_data_mng {
+	struct flow_counter_stats *raw;
+	struct mlx5_pmd_mr mr;
+};
+
+struct mlx5_hws_cache_param {
+	uint32_t size;
+	uint32_t q_num;
+	uint32_t fetch_sz;
+	uint32_t threshold;
+	uint32_t preload_sz;
+};
+
+struct mlx5_hws_cnt_pool_cfg {
+	char *name;
+	uint32_t request_num;
+	uint32_t alloc_factor;
+};
+
+struct mlx5_hws_cnt_pool_caches {
+	uint32_t fetch_sz;
+	uint32_t threshold;
+	uint32_t preload_sz;
+	uint32_t q_num;
+	struct rte_ring *qcache[];
+};
+
+struct mlx5_hws_cnt_pool {
+	struct mlx5_hws_cnt_pool_cfg cfg __rte_cache_aligned;
+	struct mlx5_hws_cnt_dcs_mng dcs_mng __rte_cache_aligned;
+	uint32_t query_gen __rte_cache_aligned;
+	struct mlx5_hws_cnt *pool;
+	struct mlx5_hws_cnt_raw_data_mng *raw_mng;
+	struct rte_ring *reuse_list;
+	struct rte_ring *free_list;
+	struct rte_ring *wait_reset_list;
+	struct mlx5_hws_cnt_pool_caches *cache;
+} __rte_cache_aligned;
+
+/**
+ * Translate a counter ID into its internal index (starting from 0), which can
+ * be used to index the raw data area and the counter pool.
+ *
+ * @param cnt_id
+ *   The external counter id
+ * @return
+ *   Internal index
+ */
+static __rte_always_inline cnt_id_t
+mlx5_hws_cnt_iidx(struct mlx5_hws_cnt_pool *cpool, cnt_id_t cnt_id)
+{
+	uint8_t dcs_idx = cnt_id >> MLX5_HWS_CNT_DCS_IDX_OFFSET;
+	uint32_t offset = cnt_id & MLX5_HWS_CNT_IDX_MASK;
+
+	dcs_idx &= MLX5_HWS_CNT_DCS_IDX_MASK;
+	return (cpool->dcs_mng.dcs[dcs_idx].iidx + offset);
+}
+
+/**
+ * Check whether a counter ID is valid.
+ */
+static __rte_always_inline bool
+mlx5_hws_cnt_id_valid(cnt_id_t cnt_id)
+{
+	return (cnt_id >> MLX5_INDIRECT_ACTION_TYPE_OFFSET) ==
+		MLX5_INDIRECT_ACTION_TYPE_COUNT ? true : false;
+}
+
+/**
+ * Generate a counter ID from an internal index.
+ *
+ * @param cpool
+ *   The pointer to the counter pool.
+ * @param iidx
+ *   The internal counter index.
+ *
+ * @return
+ *   Counter id
+ */
+static __rte_always_inline cnt_id_t
+mlx5_hws_cnt_id_gen(struct mlx5_hws_cnt_pool *cpool, cnt_id_t iidx)
+{
+	struct mlx5_hws_cnt_dcs_mng *dcs_mng = &cpool->dcs_mng;
+	uint32_t idx;
+	uint32_t offset;
+	cnt_id_t cnt_id;
+
+	for (idx = 0, offset = iidx; idx < dcs_mng->batch_total; idx++) {
+		if (dcs_mng->dcs[idx].batch_sz <= offset)
+			offset -= dcs_mng->dcs[idx].batch_sz;
+		else
+			break;
+	}
+	cnt_id = offset;
+	cnt_id |= (idx << MLX5_HWS_CNT_DCS_IDX_OFFSET);
+	return (MLX5_INDIRECT_ACTION_TYPE_COUNT <<
+			MLX5_INDIRECT_ACTION_TYPE_OFFSET) | cnt_id;
+}
+
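+/**
+ * Read one counter's raw hits/bytes snapshot from the raw data area.
+ *
+ * The background service may overwrite the statistics concurrently, so the
+ * entry is re-read until two consecutive copies match, which yields a
+ * torn-write-free snapshot without taking any lock.
+ */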
+static __rte_always_inline void
+__hws_cnt_query_raw(struct mlx5_hws_cnt_pool *cpool, cnt_id_t cnt_id,
+		uint64_t *raw_pkts, uint64_t *raw_bytes)
+{
+	struct mlx5_hws_cnt_raw_data_mng *raw_mng = cpool->raw_mng;
+	struct flow_counter_stats s[2];
+	uint8_t i = 0x1;
+	size_t stat_sz = sizeof(s[0]);
+	uint32_t iidx = mlx5_hws_cnt_iidx(cpool, cnt_id);
+
+	memcpy(&s[0], &raw_mng->raw[iidx], stat_sz);
+	do {
+		memcpy(&s[i & 1], &raw_mng->raw[iidx], stat_sz);
+		if (memcmp(&s[0], &s[1], stat_sz) == 0) {
+			*raw_pkts = rte_be_to_cpu_64(s[0].hits);
+			*raw_bytes = rte_be_to_cpu_64(s[0].bytes);
+			break;
+		}
+		i = ~i;
+	} while (1);
+}
+
+/**
+ * Copy elements from one zero-copy ring to another zero-copy ring in place.
+ *
+ * The input is an rte ring zero-copy data struct, which holds two pointers;
+ * ptr2 is only meaningful when the ring area wraps around.
+ *
+ * This routine therefore has to handle the case where both the source and
+ * the destination addresses wrap.
+ * First, copy the number of elements that fit before the first wrap point,
+ * which may be in the source or in the destination.
+ * Second, copy the elements up to the second wrap point. If the first wrap
+ * point was in the source, this one must be in the destination, and vice
+ * versa.
+ * Third, copy all remaining elements.
+ *
+ * In the worst case, three pieces of contiguous memory are copied.
+ *
+ * @param zcdd
+ *   A pointer to zero-copy data of dest ring.
+ * @param zcds
+ *   A pointer to zero-copy data of source ring.
+ * @param n
+ *   Number of elems to copy.
+ */
+static __rte_always_inline void
+__hws_cnt_r2rcpy(struct rte_ring_zc_data *zcdd, struct rte_ring_zc_data *zcds,
+		unsigned int n)
+{
+	unsigned int n1, n2, n3;
+	void *s1, *s2, *s3;
+	void *d1, *d2, *d3;
+
+	s1 = zcds->ptr1;
+	d1 = zcdd->ptr1;
+	n1 = RTE_MIN(zcdd->n1, zcds->n1);
+	if (zcds->n1 > n1) {
+		n2 = zcds->n1 - n1;
+		s2 = RTE_PTR_ADD(zcds->ptr1, sizeof(cnt_id_t) * n1);
+		d2 = zcdd->ptr2;
+		n3 = n - n1 - n2;
+		s3 = zcds->ptr2;
+		d3 = RTE_PTR_ADD(zcdd->ptr2, sizeof(cnt_id_t) * n2);
+	} else {
+		n2 = zcdd->n1 - n1;
+		s2 = zcds->ptr2;
+		d2 = RTE_PTR_ADD(zcdd->ptr1, sizeof(cnt_id_t) * n1);
+		n3 = n - n1 - n2;
+		s3 = RTE_PTR_ADD(zcds->ptr2, sizeof(cnt_id_t) * n2);
+		d3 = zcdd->ptr2;
+	}
+	memcpy(d1, s1, n1 * sizeof(cnt_id_t));
+	if (n2 != 0) {
+		memcpy(d2, s2, n2 * sizeof(cnt_id_t));
+		if (n3 != 0)
+			memcpy(d3, s3, n3 * sizeof(cnt_id_t));
+	}
+}
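
A quick numeric walk-through of the split above, with assumed segment lengths (the source wraps after 5 elements, the destination after 3, and 8 elements are copied):

#include <stdio.h>

int main(void)
{
	unsigned int n = 8, src_n1 = 5, dst_n1 = 3;          /* assumed lengths */
	unsigned int n1 = src_n1 < dst_n1 ? src_n1 : dst_n1; /* up to 1st wrap */
	unsigned int n2 = (src_n1 > n1 ? src_n1 : dst_n1) - n1; /* up to 2nd wrap */
	unsigned int n3 = n - n1 - n2;                       /* the remainder */

	/* Prints: copy pieces: n1=3 n2=2 n3=3 */
	printf("copy pieces: n1=%u n2=%u n3=%u\n", n1, n2, n3);
	return 0;
}
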
+
+static __rte_always_inline int
+mlx5_hws_cnt_pool_cache_flush(struct mlx5_hws_cnt_pool *cpool,
+			      uint32_t queue_id)
+{
+	unsigned int ret;
+	struct rte_ring_zc_data zcdr = {0};
+	struct rte_ring_zc_data zcdc = {0};
+	struct rte_ring *reset_list = NULL;
+	struct rte_ring *qcache = cpool->cache->qcache[queue_id];
+
+	ret = rte_ring_dequeue_zc_burst_elem_start(qcache,
+			sizeof(cnt_id_t), rte_ring_count(qcache), &zcdc,
+			NULL);
+	MLX5_ASSERT(ret);
+	reset_list = cpool->wait_reset_list;
+	rte_ring_enqueue_zc_burst_elem_start(reset_list,
+			sizeof(cnt_id_t), ret, &zcdr, NULL);
+	__hws_cnt_r2rcpy(&zcdr, &zcdc, ret);
+	rte_ring_enqueue_zc_elem_finish(reset_list, ret);
+	rte_ring_dequeue_zc_elem_finish(qcache, ret);
+	return 0;
+}
+
+static __rte_always_inline int
+mlx5_hws_cnt_pool_cache_fetch(struct mlx5_hws_cnt_pool *cpool,
+			      uint32_t queue_id)
+{
+	struct rte_ring *qcache = cpool->cache->qcache[queue_id];
+	struct rte_ring *free_list = NULL;
+	struct rte_ring *reuse_list = NULL;
+	struct rte_ring *list = NULL;
+	struct rte_ring_zc_data zcdf = {0};
+	struct rte_ring_zc_data zcdc = {0};
+	struct rte_ring_zc_data zcdu = {0};
+	struct rte_ring_zc_data zcds = {0};
+	struct mlx5_hws_cnt_pool_caches *cache = cpool->cache;
+	unsigned int ret;
+
+	reuse_list = cpool->reuse_list;
+	ret = rte_ring_dequeue_zc_burst_elem_start(reuse_list,
+			sizeof(cnt_id_t), cache->fetch_sz, &zcdu, NULL);
+	zcds = zcdu;
+	list = reuse_list;
+	if (unlikely(ret == 0)) { /* no reuse counter. */
+		rte_ring_dequeue_zc_elem_finish(reuse_list, 0);
+		free_list = cpool->free_list;
+		ret = rte_ring_dequeue_zc_burst_elem_start(free_list,
+				sizeof(cnt_id_t), cache->fetch_sz, &zcdf, NULL);
+		zcds = zcdf;
+		list = free_list;
+		if (unlikely(ret == 0)) { /* no free counter. */
+			rte_ring_dequeue_zc_elem_finish(free_list, 0);
+			if (rte_ring_count(cpool->wait_reset_list))
+				return -EAGAIN;
+			return -ENOENT;
+		}
+	}
+	rte_ring_enqueue_zc_burst_elem_start(qcache, sizeof(cnt_id_t),
+			ret, &zcdc, NULL);
+	__hws_cnt_r2rcpy(&zcdc, &zcds, ret);
+	rte_ring_dequeue_zc_elem_finish(list, ret);
+	rte_ring_enqueue_zc_elem_finish(qcache, ret);
+	return 0;
+}
+
+static __rte_always_inline int
+__mlx5_hws_cnt_pool_enqueue_revert(struct rte_ring *r, unsigned int n,
+		struct rte_ring_zc_data *zcd)
+{
+	uint32_t current_head = 0;
+	uint32_t revert2head = 0;
+
+	MLX5_ASSERT(r->prod.sync_type == RTE_RING_SYNC_ST);
+	MLX5_ASSERT(r->cons.sync_type == RTE_RING_SYNC_ST);
+	current_head = __atomic_load_n(&r->prod.head, __ATOMIC_RELAXED);
+	MLX5_ASSERT(n <= r->capacity);
+	MLX5_ASSERT(n <= rte_ring_count(r));
+	revert2head = current_head - n;
+	r->prod.head = revert2head; /* This ring should be SP. */
+	__rte_ring_get_elem_addr(r, revert2head, sizeof(cnt_id_t), n,
+			&zcd->ptr1, &zcd->n1, &zcd->ptr2);
+	/* Update tail */
+	__atomic_store_n(&r->prod.tail, revert2head, __ATOMIC_RELEASE);
+	return n;
+}
+
+/**
+ * Put one counter back into the pool.
+ *
+ * @param cpool
+ *   A pointer to the counter pool structure.
+ * @param queue
+ *   A pointer to the HWS queue. If NULL, the counter goes back to the common pool.
+ * @param cnt_id
+ *   The counter ID to be returned.
+ * @return
+ *   - 0: Success; the counter was returned to the pool.
+ *   - -ENOENT: the counter could not be queued back (no room left in the pool rings).
+ */
+static __rte_always_inline int
+mlx5_hws_cnt_pool_put(struct mlx5_hws_cnt_pool *cpool,
+		uint32_t *queue, cnt_id_t *cnt_id)
+{
+	unsigned int ret = 0;
+	struct rte_ring_zc_data zcdc = {0};
+	struct rte_ring_zc_data zcdr = {0};
+	struct rte_ring *qcache = NULL;
+	unsigned int wb_num = 0; /* cache write-back number. */
+	cnt_id_t iidx;
+
+	iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
+	cpool->pool[iidx].query_gen_when_free =
+		__atomic_load_n(&cpool->query_gen, __ATOMIC_RELAXED);
+	if (likely(queue != NULL))
+		qcache = cpool->cache->qcache[*queue];
+	if (unlikely(qcache == NULL)) {
+		ret = rte_ring_enqueue_elem(cpool->wait_reset_list, cnt_id,
+				sizeof(cnt_id_t));
+		MLX5_ASSERT(ret == 0);
+		return ret;
+	}
+	ret = rte_ring_enqueue_burst_elem(qcache, cnt_id, sizeof(cnt_id_t), 1,
+					  NULL);
+	if (unlikely(ret == 0)) { /* cache is full. */
+		wb_num = rte_ring_count(qcache) - cpool->cache->threshold;
+		MLX5_ASSERT(wb_num < rte_ring_count(qcache));
+		__mlx5_hws_cnt_pool_enqueue_revert(qcache, wb_num, &zcdc);
+		rte_ring_enqueue_zc_burst_elem_start(cpool->wait_reset_list,
+				sizeof(cnt_id_t), wb_num, &zcdr, NULL);
+		__hws_cnt_r2rcpy(&zcdr, &zcdc, wb_num);
+		rte_ring_enqueue_zc_elem_finish(cpool->wait_reset_list, wb_num);
+		/* write-back THIS counter too */
+		ret = rte_ring_enqueue_burst_elem(cpool->wait_reset_list,
+				cnt_id, sizeof(cnt_id_t), 1, NULL);
+	}
+	return ret == 1 ? 0 : -ENOENT;
+}
+
+/**
+ * Get one counter from the pool.
+ *
+ * If @p queue is not NULL, the counter is taken first from the queue's
+ * cache and then from the common pool. Note that -ENOENT can be returned
+ * when the local cache and the common pool are empty, even if the caches
+ * of other queues are full.
+ *
+ * @param cpool
+ *   A pointer to the counter pool structure.
+ * @param queue
+ *   A pointer to the HWS queue. If NULL, the counter is fetched from the common pool.
+ * @param cnt_id
+ *   A pointer to a cnt_id_t (counter ID) that will be filled in.
+ * @return
+ *   - 0: Success; objects taken.
+ *   - -ENOENT: Not enough entries in the mempool; no object is retrieved.
+ *   - -EAGAIN: counter is not ready; try again.
+ */
+static __rte_always_inline int
+mlx5_hws_cnt_pool_get(struct mlx5_hws_cnt_pool *cpool,
+		uint32_t *queue, cnt_id_t *cnt_id)
+{
+	unsigned int ret;
+	struct rte_ring_zc_data zcdc = {0};
+	struct rte_ring *qcache = NULL;
+	uint32_t query_gen = 0;
+	cnt_id_t iidx, tmp_cid = 0;
+
+	if (likely(queue != NULL))
+		qcache = cpool->cache->qcache[*queue];
+	if (unlikely(qcache == NULL)) {
+		ret = rte_ring_dequeue_elem(cpool->reuse_list, &tmp_cid,
+				sizeof(cnt_id_t));
+		if (unlikely(ret != 0)) {
+			ret = rte_ring_dequeue_elem(cpool->free_list, &tmp_cid,
+					sizeof(cnt_id_t));
+			if (unlikely(ret != 0)) {
+				if (rte_ring_count(cpool->wait_reset_list))
+					return -EAGAIN;
+				return -ENOENT;
+			}
+		}
+		*cnt_id = tmp_cid;
+		iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
+		__hws_cnt_query_raw(cpool, *cnt_id,
+				    &cpool->pool[iidx].reset.hits,
+				    &cpool->pool[iidx].reset.bytes);
+		return 0;
+	}
+	ret = rte_ring_dequeue_zc_burst_elem_start(qcache, sizeof(cnt_id_t), 1,
+			&zcdc, NULL);
+	if (unlikely(ret == 0)) { /* local cache is empty. */
+		rte_ring_dequeue_zc_elem_finish(qcache, 0);
+		/* let's fetch from global free list. */
+		ret = mlx5_hws_cnt_pool_cache_fetch(cpool, *queue);
+		if (unlikely(ret != 0))
+			return ret;
+		rte_ring_dequeue_zc_burst_elem_start(qcache, sizeof(cnt_id_t),
+				1, &zcdc, NULL);
+	}
+	/* get one from local cache. */
+	*cnt_id = (*(cnt_id_t *)zcdc.ptr1);
+	iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
+	query_gen = cpool->pool[iidx].query_gen_when_free;
+	if (cpool->query_gen == query_gen) { /* counter is waiting to reset. */
+		rte_ring_dequeue_zc_elem_finish(qcache, 0);
+		/* write-back counter to reset list. */
+		mlx5_hws_cnt_pool_cache_flush(cpool, *queue);
+		/* let's fetch from global free list. */
+		ret = mlx5_hws_cnt_pool_cache_fetch(cpool, *queue);
+		if (unlikely(ret != 0))
+			return ret;
+		rte_ring_dequeue_zc_burst_elem_start(qcache, sizeof(cnt_id_t),
+				1, &zcdc, NULL);
+		*cnt_id = *(cnt_id_t *)zcdc.ptr1;
+	}
+	__hws_cnt_query_raw(cpool, *cnt_id, &cpool->pool[iidx].reset.hits,
+			    &cpool->pool[iidx].reset.bytes);
+	rte_ring_dequeue_zc_elem_finish(qcache, 1);
+	cpool->pool[iidx].share = 0;
+	return 0;
+}
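
A sketch of how a queue-based caller is expected to use the pool, mirroring the calls made from mlx5_flow_hw.c above. example_take_and_return() is illustrative only, assumes an already initialized pool, and is meant to be compiled inside the driver where these headers are available.

#include "mlx5_hws_cnt.h"

/* Take a counter for a flow created on @queue, resolve its DR action and
 * offset, then give the counter back as a flow-destroy path would. */
static int
example_take_and_return(struct mlx5_hws_cnt_pool *cpool, uint32_t queue)
{
	struct mlx5dr_action *action;
	uint32_t offset;
	cnt_id_t cnt_id;
	int ret;

	ret = mlx5_hws_cnt_pool_get(cpool, &queue, &cnt_id);
	if (ret != 0)
		return ret; /* -EAGAIN: counters pending reset; -ENOENT: exhausted. */
	mlx5_hws_cnt_pool_get_action_offset(cpool, cnt_id, &action, &offset);
	/* ... plug (action, offset) into the rule's counter action here ... */
	return mlx5_hws_cnt_pool_put(cpool, &queue, &cnt_id);
}
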
+
+static __rte_always_inline unsigned int
+mlx5_hws_cnt_pool_get_size(struct mlx5_hws_cnt_pool *cpool)
+{
+	return rte_ring_get_capacity(cpool->free_list);
+}
+
+static __rte_always_inline int
+mlx5_hws_cnt_pool_get_action_offset(struct mlx5_hws_cnt_pool *cpool,
+		cnt_id_t cnt_id, struct mlx5dr_action **action,
+		uint32_t *offset)
+{
+	uint8_t idx = cnt_id >> MLX5_HWS_CNT_DCS_IDX_OFFSET;
+
+	idx &= MLX5_HWS_CNT_DCS_IDX_MASK;
+	*action = cpool->dcs_mng.dcs[idx].dr_action;
+	*offset = cnt_id & MLX5_HWS_CNT_IDX_MASK;
+	return 0;
+}
+
+static __rte_always_inline int
+mlx5_hws_cnt_shared_get(struct mlx5_hws_cnt_pool *cpool, cnt_id_t *cnt_id)
+{
+	int ret;
+	uint32_t iidx;
+
+	ret = mlx5_hws_cnt_pool_get(cpool, NULL, cnt_id);
+	if (ret != 0)
+		return ret;
+	iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
+	MLX5_ASSERT(cpool->pool[iidx].share == 0);
+	cpool->pool[iidx].share = 1;
+	return 0;
+}
+
+static __rte_always_inline int
+mlx5_hws_cnt_shared_put(struct mlx5_hws_cnt_pool *cpool, cnt_id_t *cnt_id)
+{
+	int ret;
+	uint32_t iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
+
+	cpool->pool[iidx].share = 0;
+	ret = mlx5_hws_cnt_pool_put(cpool, NULL, cnt_id);
+	if (unlikely(ret != 0))
+		cpool->pool[iidx].share = 1; /* fail to release, restore. */
+	return ret;
+}
+
+static __rte_always_inline bool
+mlx5_hws_cnt_is_shared(struct mlx5_hws_cnt_pool *cpool, cnt_id_t cnt_id)
+{
+	uint32_t iidx = mlx5_hws_cnt_iidx(cpool, cnt_id);
+
+	return cpool->pool[iidx].share ? true : false;
+}
+
+/* init HWS counter pool. */
+struct mlx5_hws_cnt_pool *
+mlx5_hws_cnt_pool_init(const struct mlx5_hws_cnt_pool_cfg *pcfg,
+		const struct mlx5_hws_cache_param *ccfg);
+
+void
+mlx5_hws_cnt_pool_deinit(struct mlx5_hws_cnt_pool *cntp);
+
+int
+mlx5_hws_cnt_service_thread_create(struct mlx5_dev_ctx_shared *sh);
+
+void
+mlx5_hws_cnt_service_thread_destroy(struct mlx5_dev_ctx_shared *sh);
+
+int
+mlx5_hws_cnt_pool_dcs_alloc(struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_hws_cnt_pool *cpool);
+void
+mlx5_hws_cnt_pool_dcs_free(struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_hws_cnt_pool *cpool);
+
+int
+mlx5_hws_cnt_pool_action_create(struct mlx5_priv *priv,
+		struct mlx5_hws_cnt_pool *cpool);
+
+void
+mlx5_hws_cnt_pool_action_destroy(struct mlx5_hws_cnt_pool *cpool);
+
+struct mlx5_hws_cnt_pool *
+mlx5_hws_cnt_pool_create(struct rte_eth_dev *dev,
+		const struct rte_flow_port_attr *pattr, uint16_t nb_queue);
+
+void
+mlx5_hws_cnt_pool_destroy(struct mlx5_dev_ctx_shared *sh,
+		struct mlx5_hws_cnt_pool *cpool);
+
+int
+mlx5_hws_cnt_svc_init(struct mlx5_dev_ctx_shared *sh);
+
+void
+mlx5_hws_cnt_svc_deinit(struct mlx5_dev_ctx_shared *sh);
+
+#endif /* _MLX5_HWS_CNT_H_ */
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v6 09/18] net/mlx5: support DR action template API
  2022-10-20 15:41 ` [PATCH v6 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (7 preceding siblings ...)
  2022-10-20 15:41   ` [PATCH v6 08/18] net/mlx5: add HW steering counter action Suanming Mou
@ 2022-10-20 15:41   ` Suanming Mou
  2022-10-24  9:45     ` Slava Ovsiienko
  2022-10-20 15:41   ` [PATCH v6 10/18] net/mlx5: add HW steering connection tracking support Suanming Mou
                     ` (9 subsequent siblings)
  18 siblings, 1 reply; 140+ messages in thread
From: Suanming Mou @ 2022-10-20 15:41 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Dariusz Sosnowski

From: Dariusz Sosnowski <dsosnowski@nvidia.com>

This patch adapts mlx5 PMD to changes in mlx5dr API regarding action
templates. It changes the following:

1. Actions template creation:

    - Flow action types are translated to mlx5dr action types in order
      to create the mlx5dr_action_template object.
    - An offset is assigned to each flow action. This offset is used to
      predetermine the action's location in the rule_acts array passed on
      rule creation (see the sketch after this list).

2. Template table creation:

    - Fixed actions are created and put in rule_acts cache using
      predetermined offsets
    - mlx5dr matcher is parametrized by action templates bound to
      template table.
    - mlx5dr matcher is configured to optimize rule creation based on
      passed rule indices.

3. Flow rule creation:

    - mlx5dr rule is parametrized by the action template on which the
      rule's actions are based.
    - Rule index hint is provided to mlx5dr.
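
The offset scheme used in points 1-3 can be sketched as follows; the names below are invented for illustration only and are not PMD or mlx5dr symbols. Both table creation (for fixed actions) and rule creation (for dynamic actions) write into the per-rule action array at an offset precomputed per flow action, so no per-rule scan of the template is needed.

#include <stdint.h>

/* One slot per DR action; the layout is fixed per actions template. */
struct example_rule_action { void *action; };

/* Write a DR action into the slot precomputed for the given flow action. */
void
example_set_action(struct example_rule_action *rule_acts,
		   const uint16_t *actions_off, uint16_t flow_action_idx,
		   void *dr_action)
{
	rule_acts[actions_off[flow_action_idx]].action = dr_action;
}
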

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 drivers/net/mlx5/linux/mlx5_os.c |   1 +
 drivers/net/mlx5/mlx5.c          |   4 +-
 drivers/net/mlx5/mlx5.h          |   2 +
 drivers/net/mlx5/mlx5_flow.h     |  32 +-
 drivers/net/mlx5/mlx5_flow_hw.c  | 617 +++++++++++++++++++++++++------
 drivers/net/mlx5/mlx5_trigger.c  |  10 +
 6 files changed, 543 insertions(+), 123 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 62b957839c..85a8247a6f 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1578,6 +1578,7 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 		}
 		/* Only HWS requires this information. */
 		flow_hw_init_tags_set(eth_dev);
+		flow_hw_init_flow_metadata_config(eth_dev);
 		if (priv->sh->config.dv_esw_en &&
 		    flow_hw_create_vport_action(eth_dev)) {
 			DRV_LOG(ERR, "port %u failed to create vport action",
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 4d87da8e29..e7a4aac354 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1969,8 +1969,10 @@ mlx5_dev_close(struct rte_eth_dev *dev)
 	flow_hw_destroy_vport_action(dev);
 	flow_hw_resource_release(dev);
 	flow_hw_clear_port_info(dev);
-	if (priv->sh->config.dv_flow_en == 2)
+	if (priv->sh->config.dv_flow_en == 2) {
+		flow_hw_clear_flow_metadata_config();
 		flow_hw_clear_tags_set(dev);
+	}
 #endif
 	if (priv->rxq_privs != NULL) {
 		/* XXX race condition if mlx5_rx_burst() is still running. */
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 6f42053ef0..26f5af22a6 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1657,6 +1657,8 @@ struct mlx5_priv {
 	struct mlx5dr_action *hw_drop[2];
 	/* HW steering global tag action. */
 	struct mlx5dr_action *hw_tag[2];
+	/* HW steering create ongoing rte flow table list header. */
+	LIST_HEAD(flow_hw_tbl_ongo, rte_flow_template_table) flow_hw_tbl_ongo;
 	struct mlx5_indexed_pool *acts_ipool; /* Action data indexed pool. */
 #endif
 };
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 213d8c2689..6782f4b2bb 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1189,6 +1189,11 @@ struct rte_flow_actions_template {
 	struct rte_flow_actions_template_attr attr;
 	struct rte_flow_action *actions; /* Cached flow actions. */
 	struct rte_flow_action *masks; /* Cached action masks.*/
+	struct mlx5dr_action_template *tmpl; /* mlx5dr action template. */
+	uint16_t dr_actions_num; /* Number of DR rule actions. */
+	uint16_t actions_num; /* Number of flow actions. */
+	uint16_t *actions_off; /* DR action offset for given rte action offset. */
+	uint16_t reformat_off; /* Offset of DR reformat action. */
 	uint16_t mhdr_off; /* Offset of DR modify header action. */
 	uint32_t refcnt; /* Reference counter. */
 	uint16_t rx_cpy_pos; /* Action position of Rx metadata to be copied. */
@@ -1240,7 +1245,6 @@ struct mlx5_hw_actions {
 	/* Encap/Decap action. */
 	struct mlx5_hw_encap_decap_action *encap_decap;
 	uint16_t encap_decap_pos; /* Encap/Decap action position. */
-	uint32_t acts_num:4; /* Total action number. */
 	uint32_t mark:1; /* Indicate the mark action. */
 	uint32_t cnt_id; /* Counter id. */
 	/* Translated DR action array from action template. */
@@ -1496,6 +1500,13 @@ flow_hw_get_wire_port(struct ibv_context *ibctx)
 }
 #endif
 
+extern uint32_t mlx5_flow_hw_flow_metadata_config_refcnt;
+extern uint8_t mlx5_flow_hw_flow_metadata_esw_en;
+extern uint8_t mlx5_flow_hw_flow_metadata_xmeta_en;
+
+void flow_hw_init_flow_metadata_config(struct rte_eth_dev *dev);
+void flow_hw_clear_flow_metadata_config(void);
+
 /*
  * Convert metadata or tag to the actual register.
  * META: Can only be used to match in the FDB in this stage, fixed C_1.
@@ -1507,7 +1518,22 @@ flow_hw_get_reg_id(enum rte_flow_item_type type, uint32_t id)
 {
 	switch (type) {
 	case RTE_FLOW_ITEM_TYPE_META:
-		return REG_C_1;
+#ifdef HAVE_MLX5_HWS_SUPPORT
+		if (mlx5_flow_hw_flow_metadata_esw_en &&
+		    mlx5_flow_hw_flow_metadata_xmeta_en == MLX5_XMETA_MODE_META32_HWS) {
+			return REG_C_1;
+		}
+#endif
+		/*
+		 * On root table - PMD allows only egress META matching, thus
+		 * REG_A matching is sufficient.
+		 *
+		 * On non-root tables - REG_A corresponds to general_purpose_lookup_field,
+		 * which translates to REG_A in NIC TX and to REG_B in NIC RX.
+		 * However, current FW does not implement REG_B case right now, so
+		 * REG_B case should be rejected on pattern template validation.
+		 */
+		return REG_A;
 	case RTE_FLOW_ITEM_TYPE_TAG:
 		MLX5_ASSERT(id < MLX5_FLOW_HW_TAGS_MAX);
 		return mlx5_flow_hw_avl_tags[id];
@@ -2419,4 +2445,6 @@ int mlx5_flow_pattern_validate(struct rte_eth_dev *dev,
 		const struct rte_flow_pattern_template_attr *attr,
 		const struct rte_flow_item items[],
 		struct rte_flow_error *error);
+int flow_hw_table_update(struct rte_eth_dev *dev,
+			 struct rte_flow_error *error);
 #endif /* RTE_PMD_MLX5_FLOW_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 3b5dfcd9e1..c1eef12116 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -340,6 +340,13 @@ __flow_hw_action_template_destroy(struct rte_eth_dev *dev,
 				 struct mlx5_hw_actions *acts)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_action_construct_data *data;
+
+	while (!LIST_EMPTY(&acts->act_list)) {
+		data = LIST_FIRST(&acts->act_list);
+		LIST_REMOVE(data, next);
+		mlx5_ipool_free(priv->acts_ipool, data->idx);
+	}
 
 	if (acts->jump) {
 		struct mlx5_flow_group *grp;
@@ -349,6 +356,16 @@ __flow_hw_action_template_destroy(struct rte_eth_dev *dev,
 		mlx5_hlist_unregister(priv->sh->flow_tbls, &grp->entry);
 		acts->jump = NULL;
 	}
+	if (acts->tir) {
+		mlx5_hrxq_release(dev, acts->tir->idx);
+		acts->tir = NULL;
+	}
+	if (acts->encap_decap) {
+		if (acts->encap_decap->action)
+			mlx5dr_action_destroy(acts->encap_decap->action);
+		mlx5_free(acts->encap_decap);
+		acts->encap_decap = NULL;
+	}
 	if (acts->mhdr) {
 		if (acts->mhdr->action)
 			mlx5dr_action_destroy(acts->mhdr->action);
@@ -967,33 +984,29 @@ flow_hw_represented_port_compile(struct rte_eth_dev *dev,
 static __rte_always_inline int
 flow_hw_meter_compile(struct rte_eth_dev *dev,
 		      const struct mlx5_flow_template_table_cfg *cfg,
-		      uint32_t  start_pos, const struct rte_flow_action *action,
-		      struct mlx5_hw_actions *acts, uint32_t *end_pos,
+		      uint16_t aso_mtr_pos,
+		      uint16_t jump_pos,
+		      const struct rte_flow_action *action,
+		      struct mlx5_hw_actions *acts,
 		      struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_aso_mtr *aso_mtr;
 	const struct rte_flow_action_meter *meter = action->conf;
-	uint32_t pos = start_pos;
 	uint32_t group = cfg->attr.flow_attr.group;
 
 	aso_mtr = mlx5_aso_meter_by_idx(priv, meter->mtr_id);
-	acts->rule_acts[pos].action = priv->mtr_bulk.action;
-	acts->rule_acts[pos].aso_meter.offset = aso_mtr->offset;
-		acts->jump = flow_hw_jump_action_register
+	acts->rule_acts[aso_mtr_pos].action = priv->mtr_bulk.action;
+	acts->rule_acts[aso_mtr_pos].aso_meter.offset = aso_mtr->offset;
+	acts->jump = flow_hw_jump_action_register
 		(dev, cfg, aso_mtr->fm.group, error);
-	if (!acts->jump) {
-		*end_pos = start_pos;
+	if (!acts->jump)
 		return -ENOMEM;
-	}
-	acts->rule_acts[++pos].action = (!!group) ?
+	acts->rule_acts[jump_pos].action = (!!group) ?
 				    acts->jump->hws_action :
 				    acts->jump->root_action;
-	*end_pos = pos;
-	if (mlx5_aso_mtr_wait(priv->sh, aso_mtr)) {
-		*end_pos = start_pos;
+	if (mlx5_aso_mtr_wait(priv->sh, aso_mtr))
 		return -ENOMEM;
-	}
 	return 0;
 }
 
@@ -1046,11 +1059,11 @@ flow_hw_cnt_compile(struct rte_eth_dev *dev, uint32_t  start_pos,
  *    Table on success, NULL otherwise and rte_errno is set.
  */
 static int
-flow_hw_actions_translate(struct rte_eth_dev *dev,
-			  const struct mlx5_flow_template_table_cfg *cfg,
-			  struct mlx5_hw_actions *acts,
-			  struct rte_flow_actions_template *at,
-			  struct rte_flow_error *error)
+__flow_hw_actions_translate(struct rte_eth_dev *dev,
+			    const struct mlx5_flow_template_table_cfg *cfg,
+			    struct mlx5_hw_actions *acts,
+			    struct rte_flow_actions_template *at,
+			    struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	const struct rte_flow_template_table_attr *table_attr = &cfg->attr;
@@ -1061,12 +1074,15 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 	enum mlx5dr_action_reformat_type refmt_type = 0;
 	const struct rte_flow_action_raw_encap *raw_encap_data;
 	const struct rte_flow_item *enc_item = NULL, *enc_item_m = NULL;
-	uint16_t reformat_pos = MLX5_HW_MAX_ACTS, reformat_src = 0;
+	uint16_t reformat_src = 0;
 	uint8_t *encap_data = NULL, *encap_data_m = NULL;
 	size_t data_size = 0;
 	struct mlx5_hw_modify_header_action mhdr = { 0 };
 	bool actions_end = false;
-	uint32_t type, i;
+	uint32_t type;
+	bool reformat_used = false;
+	uint16_t action_pos;
+	uint16_t jump_pos;
 	int err;
 
 	flow_hw_modify_field_init(&mhdr, at);
@@ -1076,46 +1092,53 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 		type = MLX5DR_TABLE_TYPE_NIC_TX;
 	else
 		type = MLX5DR_TABLE_TYPE_NIC_RX;
-	for (i = 0; !actions_end; actions++, masks++) {
+	for (; !actions_end; actions++, masks++) {
 		switch (actions->type) {
 		case RTE_FLOW_ACTION_TYPE_INDIRECT:
+			action_pos = at->actions_off[actions - at->actions];
 			if (!attr->group) {
 				DRV_LOG(ERR, "Indirect action is not supported in root table.");
 				goto err;
 			}
 			if (actions->conf && masks->conf) {
 				if (flow_hw_shared_action_translate
-				(dev, actions, acts, actions - action_start, i))
+				(dev, actions, acts, actions - action_start, action_pos))
 					goto err;
 			} else if (__flow_hw_act_data_general_append
 					(priv, acts, actions->type,
-					 actions - action_start, i)){
+					 actions - action_start, action_pos)){
 				goto err;
 			}
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			break;
 		case RTE_FLOW_ACTION_TYPE_DROP:
-			acts->rule_acts[i++].action =
+			action_pos = at->actions_off[actions - at->actions];
+			acts->rule_acts[action_pos].action =
 				priv->hw_drop[!!attr->group];
 			break;
 		case RTE_FLOW_ACTION_TYPE_MARK:
+			action_pos = at->actions_off[actions - at->actions];
 			acts->mark = true;
-			if (masks->conf)
-				acts->rule_acts[i].tag.value =
+			if (masks->conf &&
+			    ((const struct rte_flow_action_mark *)
+			     masks->conf)->id)
+				acts->rule_acts[action_pos].tag.value =
 					mlx5_flow_mark_set
 					(((const struct rte_flow_action_mark *)
-					(masks->conf))->id);
+					(actions->conf))->id);
 			else if (__flow_hw_act_data_general_append(priv, acts,
-				actions->type, actions - action_start, i))
+				actions->type, actions - action_start, action_pos))
 				goto err;
-			acts->rule_acts[i++].action =
+			acts->rule_acts[action_pos].action =
 				priv->hw_tag[!!attr->group];
 			flow_hw_rxq_flag_set(dev, true);
 			break;
 		case RTE_FLOW_ACTION_TYPE_JUMP:
-			if (masks->conf) {
+			action_pos = at->actions_off[actions - at->actions];
+			if (masks->conf &&
+			    ((const struct rte_flow_action_jump *)
+			     masks->conf)->group) {
 				uint32_t jump_group =
 					((const struct rte_flow_action_jump *)
 					actions->conf)->group;
@@ -1123,76 +1146,77 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 						(dev, cfg, jump_group, error);
 				if (!acts->jump)
 					goto err;
-				acts->rule_acts[i].action = (!!attr->group) ?
+				acts->rule_acts[action_pos].action = (!!attr->group) ?
 						acts->jump->hws_action :
 						acts->jump->root_action;
 			} else if (__flow_hw_act_data_general_append
 					(priv, acts, actions->type,
-					 actions - action_start, i)){
+					 actions - action_start, action_pos)){
 				goto err;
 			}
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_QUEUE:
-			if (masks->conf) {
+			action_pos = at->actions_off[actions - at->actions];
+			if (masks->conf &&
+			    ((const struct rte_flow_action_queue *)
+			     masks->conf)->index) {
 				acts->tir = flow_hw_tir_action_register
 				(dev,
 				 mlx5_hw_act_flag[!!attr->group][type],
 				 actions);
 				if (!acts->tir)
 					goto err;
-				acts->rule_acts[i].action =
+				acts->rule_acts[action_pos].action =
 					acts->tir->action;
 			} else if (__flow_hw_act_data_general_append
 					(priv, acts, actions->type,
-					 actions - action_start, i)) {
+					 actions - action_start, action_pos)) {
 				goto err;
 			}
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RSS:
-			if (masks->conf) {
+			action_pos = at->actions_off[actions - at->actions];
+			if (actions->conf && masks->conf) {
 				acts->tir = flow_hw_tir_action_register
 				(dev,
 				 mlx5_hw_act_flag[!!attr->group][type],
 				 actions);
 				if (!acts->tir)
 					goto err;
-				acts->rule_acts[i].action =
+				acts->rule_acts[action_pos].action =
 					acts->tir->action;
 			} else if (__flow_hw_act_data_general_append
 					(priv, acts, actions->type,
-					 actions - action_start, i)) {
+					 actions - action_start, action_pos)) {
 				goto err;
 			}
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
-			MLX5_ASSERT(reformat_pos == MLX5_HW_MAX_ACTS);
+			MLX5_ASSERT(!reformat_used);
 			enc_item = ((const struct rte_flow_action_vxlan_encap *)
 				   actions->conf)->definition;
 			if (masks->conf)
 				enc_item_m = ((const struct rte_flow_action_vxlan_encap *)
 					     masks->conf)->definition;
-			reformat_pos = i++;
+			reformat_used = true;
 			reformat_src = actions - action_start;
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_L2_TO_TNL_L2;
 			break;
 		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
-			MLX5_ASSERT(reformat_pos == MLX5_HW_MAX_ACTS);
+			MLX5_ASSERT(!reformat_used);
 			enc_item = ((const struct rte_flow_action_nvgre_encap *)
 				   actions->conf)->definition;
 			if (masks->conf)
 				enc_item_m = ((const struct rte_flow_action_nvgre_encap *)
 					     masks->conf)->definition;
-			reformat_pos = i++;
+			reformat_used = true;
 			reformat_src = actions - action_start;
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_L2_TO_TNL_L2;
 			break;
 		case RTE_FLOW_ACTION_TYPE_VXLAN_DECAP:
 		case RTE_FLOW_ACTION_TYPE_NVGRE_DECAP:
-			MLX5_ASSERT(reformat_pos == MLX5_HW_MAX_ACTS);
-			reformat_pos = i++;
+			MLX5_ASSERT(!reformat_used);
+			reformat_used = true;
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_TNL_L2_TO_L2;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
@@ -1206,28 +1230,26 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 				 actions->conf;
 			encap_data = raw_encap_data->data;
 			data_size = raw_encap_data->size;
-			if (reformat_pos != MLX5_HW_MAX_ACTS) {
+			if (reformat_used) {
 				refmt_type = data_size <
 				MLX5_ENCAPSULATION_DECISION_SIZE ?
 				MLX5DR_ACTION_REFORMAT_TYPE_TNL_L3_TO_L2 :
 				MLX5DR_ACTION_REFORMAT_TYPE_L2_TO_TNL_L3;
 			} else {
-				reformat_pos = i++;
+				reformat_used = true;
 				refmt_type =
 				MLX5DR_ACTION_REFORMAT_TYPE_L2_TO_TNL_L2;
 			}
 			reformat_src = actions - action_start;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RAW_DECAP:
-			reformat_pos = i++;
+			reformat_used = true;
 			refmt_type = MLX5DR_ACTION_REFORMAT_TYPE_TNL_L2_TO_L2;
 			break;
 		case RTE_FLOW_ACTION_TYPE_SEND_TO_KERNEL:
 			DRV_LOG(ERR, "send to kernel action is not supported in HW steering.");
 			goto err;
 		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
-			if (mhdr.pos == UINT16_MAX)
-				mhdr.pos = i++;
 			err = flow_hw_modify_field_compile(dev, attr, action_start,
 							   actions, masks, acts, &mhdr,
 							   error);
@@ -1245,40 +1267,46 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 				action_start += 1;
 			break;
 		case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
+			action_pos = at->actions_off[actions - at->actions];
 			if (flow_hw_represented_port_compile
 					(dev, attr, action_start, actions,
-					 masks, acts, i, error))
+					 masks, acts, action_pos, error))
 				goto err;
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_METER:
+			/*
+			 * METER action is compiled to 2 DR actions - ASO_METER and FT.
+			 * Calculated DR offset is stored only for ASO_METER and FT
+			 * is assumed to be the next action.
+			 */
+			action_pos = at->actions_off[actions - at->actions];
+			jump_pos = action_pos + 1;
 			if (actions->conf && masks->conf &&
 			    ((const struct rte_flow_action_meter *)
 			     masks->conf)->mtr_id) {
 				err = flow_hw_meter_compile(dev, cfg,
-						i, actions, acts, &i, error);
+						action_pos, jump_pos, actions, acts, error);
 				if (err)
 					goto err;
 			} else if (__flow_hw_act_data_general_append(priv, acts,
 							actions->type,
 							actions - action_start,
-							i))
+							action_pos))
 				goto err;
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_COUNT:
+			action_pos = at->actions_off[actions - action_start];
 			if (masks->conf &&
 			    ((const struct rte_flow_action_count *)
 			     masks->conf)->id) {
-				err = flow_hw_cnt_compile(dev, i, acts);
+				err = flow_hw_cnt_compile(dev, action_pos, acts);
 				if (err)
 					goto err;
 			} else if (__flow_hw_act_data_general_append
 					(priv, acts, actions->type,
-					 actions - action_start, i)) {
+					 actions - action_start, action_pos)) {
 				goto err;
 			}
-			i++;
 			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
@@ -1312,10 +1340,11 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 			goto err;
 		acts->rule_acts[acts->mhdr->pos].action = acts->mhdr->action;
 	}
-	if (reformat_pos != MLX5_HW_MAX_ACTS) {
+	if (reformat_used) {
 		uint8_t buf[MLX5_ENCAP_MAX_LEN];
 		bool shared_rfmt = true;
 
+		MLX5_ASSERT(at->reformat_off != UINT16_MAX);
 		if (enc_item) {
 			MLX5_ASSERT(!encap_data);
 			if (flow_dv_convert_encap_data(enc_item, buf, &data_size, error))
@@ -1343,20 +1372,17 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 				 (shared_rfmt ? MLX5DR_ACTION_FLAG_SHARED : 0));
 		if (!acts->encap_decap->action)
 			goto err;
-		acts->rule_acts[reformat_pos].action =
-						acts->encap_decap->action;
-		acts->rule_acts[reformat_pos].reformat.data =
-						acts->encap_decap->data;
+		acts->rule_acts[at->reformat_off].action = acts->encap_decap->action;
+		acts->rule_acts[at->reformat_off].reformat.data = acts->encap_decap->data;
 		if (shared_rfmt)
-			acts->rule_acts[reformat_pos].reformat.offset = 0;
+			acts->rule_acts[at->reformat_off].reformat.offset = 0;
 		else if (__flow_hw_act_data_encap_append(priv, acts,
 				 (action_start + reformat_src)->type,
-				 reformat_src, reformat_pos, data_size))
+				 reformat_src, at->reformat_off, data_size))
 			goto err;
 		acts->encap_decap->shared = shared_rfmt;
-		acts->encap_decap_pos = reformat_pos;
+		acts->encap_decap_pos = at->reformat_off;
 	}
-	acts->acts_num = i;
 	return 0;
 err:
 	err = rte_errno;
@@ -1366,6 +1392,40 @@ flow_hw_actions_translate(struct rte_eth_dev *dev,
 				  "fail to create rte table");
 }
 
+/**
+ * Translate rte_flow actions to DR action.
+ *
+ * @param[in] dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param[in] tbl
+ *   Pointer to the flow template table.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *    0 on success, negative value otherwise and rte_errno is set.
+ */
+static int
+flow_hw_actions_translate(struct rte_eth_dev *dev,
+			  struct rte_flow_template_table *tbl,
+			  struct rte_flow_error *error)
+{
+	uint32_t i;
+
+	for (i = 0; i < tbl->nb_action_templates; i++) {
+		if (__flow_hw_actions_translate(dev, &tbl->cfg,
+						&tbl->ats[i].acts,
+						tbl->ats[i].action_template,
+						error))
+			goto err;
+	}
+	return 0;
+err:
+	while (i--)
+		__flow_hw_action_template_destroy(dev, &tbl->ats[i].acts);
+	return -1;
+}
+
 /**
  * Get shared indirect action.
  *
@@ -1614,16 +1674,17 @@ flow_hw_modify_field_construct(struct mlx5_hw_q_job *job,
 static __rte_always_inline int
 flow_hw_actions_construct(struct rte_eth_dev *dev,
 			  struct mlx5_hw_q_job *job,
-			  const struct mlx5_hw_actions *hw_acts,
+			  const struct mlx5_hw_action_template *hw_at,
 			  const uint8_t it_idx,
 			  const struct rte_flow_action actions[],
 			  struct mlx5dr_rule_action *rule_acts,
-			  uint32_t *acts_num,
 			  uint32_t queue)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct rte_flow_template_table *table = job->flow->table;
 	struct mlx5_action_construct_data *act_data;
+	const struct rte_flow_actions_template *at = hw_at->action_template;
+	const struct mlx5_hw_actions *hw_acts = &hw_at->acts;
 	const struct rte_flow_action *action;
 	const struct rte_flow_action_raw_encap *raw_encap_data;
 	const struct rte_flow_item *enc_item = NULL;
@@ -1639,11 +1700,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	struct mlx5_aso_mtr *mtr;
 	uint32_t mtr_id;
 
-	memcpy(rule_acts, hw_acts->rule_acts,
-	       sizeof(*rule_acts) * hw_acts->acts_num);
-	*acts_num = hw_acts->acts_num;
-	if (LIST_EMPTY(&hw_acts->act_list))
-		return 0;
+	rte_memcpy(rule_acts, hw_acts->rule_acts, sizeof(*rule_acts) * at->dr_actions_num);
 	attr.group = table->grp->group_id;
 	ft_flag = mlx5_hw_act_flag[!!table->grp->group_id][table->type];
 	if (table->type == MLX5DR_TABLE_TYPE_FDB) {
@@ -1777,7 +1834,6 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 							 jump->root_action;
 			job->flow->jump = jump;
 			job->flow->fate_type = MLX5_FLOW_FATE_JUMP;
-			(*acts_num)++;
 			if (mlx5_aso_mtr_wait(priv->sh, mtr))
 				return -1;
 			break;
@@ -1915,13 +1971,16 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 		.burst = attr->postpone,
 	};
 	struct mlx5dr_rule_action rule_acts[MLX5_HW_MAX_ACTS];
-	struct mlx5_hw_actions *hw_acts;
 	struct rte_flow_hw *flow;
 	struct mlx5_hw_q_job *job;
 	const struct rte_flow_item *rule_items;
-	uint32_t acts_num, flow_idx;
+	uint32_t flow_idx;
 	int ret;
 
+	if (unlikely((!dev->data->dev_started))) {
+		rte_errno = EINVAL;
+		goto error;
+	}
 	if (unlikely(!priv->hw_q[queue].job_idx)) {
 		rte_errno = ENOMEM;
 		goto error;
@@ -1944,7 +2003,12 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	job->flow = flow;
 	job->user_data = user_data;
 	rule_attr.user_data = job;
-	hw_acts = &table->ats[action_template_index].acts;
+	/*
+	 * Indexed pool returns 1-based indices, but mlx5dr expects 0-based indices for rule
+	 * insertion hints.
+	 */
+	MLX5_ASSERT(flow_idx > 0);
+	rule_attr.rule_idx = flow_idx - 1;
 	/*
 	 * Construct the flow actions based on the input actions.
 	 * The implicitly appended action is always fixed, like metadata
@@ -1952,8 +2016,8 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	 * No need to copy and contrust a new "actions" list based on the
 	 * user's input, in order to save the cost.
 	 */
-	if (flow_hw_actions_construct(dev, job, hw_acts, pattern_template_index,
-				  actions, rule_acts, &acts_num, queue)) {
+	if (flow_hw_actions_construct(dev, job, &table->ats[action_template_index],
+				      pattern_template_index, actions, rule_acts, queue)) {
 		rte_errno = EINVAL;
 		goto free;
 	}
@@ -1962,7 +2026,7 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	if (!rule_items)
 		goto free;
 	ret = mlx5dr_rule_create(table->matcher,
-				 pattern_template_index, items,
+				 pattern_template_index, rule_items,
 				 action_template_index, rule_acts,
 				 &rule_attr, (struct mlx5dr_rule *)flow->rule);
 	if (likely(!ret))
@@ -2298,6 +2362,7 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	struct rte_flow_template_table *tbl = NULL;
 	struct mlx5_flow_group *grp;
 	struct mlx5dr_match_template *mt[MLX5_HW_TBL_MAX_ITEM_TEMPLATE];
+	struct mlx5dr_action_template *at[MLX5_HW_TBL_MAX_ACTION_TEMPLATE];
 	const struct rte_flow_template_table_attr *attr = &table_cfg->attr;
 	struct rte_flow_attr flow_attr = attr->flow_attr;
 	struct mlx5_flow_cb_ctx ctx = {
@@ -2318,6 +2383,7 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	struct mlx5_list_entry *ge;
 	uint32_t i, max_tpl = MLX5_HW_TBL_MAX_ITEM_TEMPLATE;
 	uint32_t nb_flows = rte_align32pow2(attr->nb_flows);
+	bool port_started = !!dev->data->dev_started;
 	int err;
 
 	/* HWS layer accepts only 1 item template with root table. */
@@ -2352,12 +2418,20 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	tbl->grp = grp;
 	/* Prepare matcher information. */
 	matcher_attr.priority = attr->flow_attr.priority;
+	matcher_attr.optimize_using_rule_idx = true;
 	matcher_attr.mode = MLX5DR_MATCHER_RESOURCE_MODE_RULE;
 	matcher_attr.rule.num_log = rte_log2_u32(nb_flows);
 	/* Build the item template. */
 	for (i = 0; i < nb_item_templates; i++) {
 		uint32_t ret;
 
+		if ((flow_attr.ingress && !item_templates[i]->attr.ingress) ||
+		    (flow_attr.egress && !item_templates[i]->attr.egress) ||
+		    (flow_attr.transfer && !item_templates[i]->attr.transfer)) {
+			DRV_LOG(ERR, "pattern template and template table attribute mismatch");
+			rte_errno = EINVAL;
+			goto it_error;
+		}
 		ret = __atomic_add_fetch(&item_templates[i]->refcnt, 1,
 					 __ATOMIC_RELAXED);
 		if (ret <= 1) {
@@ -2367,10 +2441,6 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 		mt[i] = item_templates[i]->mt;
 		tbl->its[i] = item_templates[i];
 	}
-	tbl->matcher = mlx5dr_matcher_create
-		(tbl->grp->tbl, mt, nb_item_templates, NULL, 0, &matcher_attr);
-	if (!tbl->matcher)
-		goto it_error;
 	tbl->nb_item_templates = nb_item_templates;
 	/* Build the action template. */
 	for (i = 0; i < nb_action_templates; i++) {
@@ -2382,21 +2452,31 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 			rte_errno = EINVAL;
 			goto at_error;
 		}
+		at[i] = action_templates[i]->tmpl;
+		tbl->ats[i].action_template = action_templates[i];
 		LIST_INIT(&tbl->ats[i].acts.act_list);
-		err = flow_hw_actions_translate(dev, &tbl->cfg,
-						&tbl->ats[i].acts,
-						action_templates[i], error);
+		if (!port_started)
+			continue;
+		err = __flow_hw_actions_translate(dev, &tbl->cfg,
+						  &tbl->ats[i].acts,
+						  action_templates[i], error);
 		if (err) {
 			i++;
 			goto at_error;
 		}
-		tbl->ats[i].action_template = action_templates[i];
 	}
 	tbl->nb_action_templates = nb_action_templates;
+	tbl->matcher = mlx5dr_matcher_create
+		(tbl->grp->tbl, mt, nb_item_templates, at, nb_action_templates, &matcher_attr);
+	if (!tbl->matcher)
+		goto at_error;
 	tbl->type = attr->flow_attr.transfer ? MLX5DR_TABLE_TYPE_FDB :
 		    (attr->flow_attr.egress ? MLX5DR_TABLE_TYPE_NIC_TX :
 		    MLX5DR_TABLE_TYPE_NIC_RX);
-	LIST_INSERT_HEAD(&priv->flow_hw_tbl, tbl, next);
+	if (port_started)
+		LIST_INSERT_HEAD(&priv->flow_hw_tbl, tbl, next);
+	else
+		LIST_INSERT_HEAD(&priv->flow_hw_tbl_ongo, tbl, next);
 	return tbl;
 at_error:
 	while (i--) {
@@ -2409,7 +2489,6 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	while (i--)
 		__atomic_sub_fetch(&item_templates[i]->refcnt,
 				   1, __ATOMIC_RELAXED);
-	mlx5dr_matcher_destroy(tbl->matcher);
 error:
 	err = rte_errno;
 	if (tbl) {
@@ -2426,6 +2505,33 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	return NULL;
 }
 
+/**
+ * Update flow template table.
+ *
+ * @param[in] dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *    0 on success, negative value otherwise and rte_errno is set.
+ */
+int
+flow_hw_table_update(struct rte_eth_dev *dev,
+		     struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_flow_template_table *tbl;
+
+	while ((tbl = LIST_FIRST(&priv->flow_hw_tbl_ongo)) != NULL) {
+		if (flow_hw_actions_translate(dev, tbl, error))
+			return -1;
+		LIST_REMOVE(tbl, next);
+		LIST_INSERT_HEAD(&priv->flow_hw_tbl, tbl, next);
+	}
+	return 0;
+}
+
 /**
  * Translates group index specified by the user in @p attr to internal
  * group index.
@@ -2504,6 +2610,7 @@ flow_hw_template_table_create(struct rte_eth_dev *dev,
 			      uint8_t nb_action_templates,
 			      struct rte_flow_error *error)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_flow_template_table_cfg cfg = {
 		.attr = *attr,
 		.external = true,
@@ -2512,6 +2619,12 @@ flow_hw_template_table_create(struct rte_eth_dev *dev,
 
 	if (flow_hw_translate_group(dev, &cfg, group, &cfg.attr.flow_attr.group, error))
 		return NULL;
+	if (priv->sh->config.dv_esw_en && cfg.attr.flow_attr.egress) {
+		rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ATTR, NULL,
+				  "egress flows are not supported with HW Steering"
+				  " when E-Switch is enabled");
+		return NULL;
+	}
 	return flow_hw_table_create(dev, &cfg, item_templates, nb_item_templates,
 				    action_templates, nb_action_templates, error);
 }
@@ -2753,7 +2866,8 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 		const struct rte_flow_action *mask = &masks[i];
 
 		MLX5_ASSERT(i < MLX5_HW_MAX_ACTS);
-		if (action->type != mask->type)
+		if (action->type != RTE_FLOW_ACTION_TYPE_INDIRECT &&
+		    action->type != mask->type)
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ACTION,
 						  action,
@@ -2829,6 +2943,157 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static enum mlx5dr_action_type mlx5_hw_dr_action_types[] = {
+	[RTE_FLOW_ACTION_TYPE_MARK] = MLX5DR_ACTION_TYP_TAG,
+	[RTE_FLOW_ACTION_TYPE_DROP] = MLX5DR_ACTION_TYP_DROP,
+	[RTE_FLOW_ACTION_TYPE_JUMP] = MLX5DR_ACTION_TYP_FT,
+	[RTE_FLOW_ACTION_TYPE_QUEUE] = MLX5DR_ACTION_TYP_TIR,
+	[RTE_FLOW_ACTION_TYPE_RSS] = MLX5DR_ACTION_TYP_TIR,
+	[RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP] = MLX5DR_ACTION_TYP_L2_TO_TNL_L2,
+	[RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP] = MLX5DR_ACTION_TYP_L2_TO_TNL_L2,
+	[RTE_FLOW_ACTION_TYPE_VXLAN_DECAP] = MLX5DR_ACTION_TYP_TNL_L2_TO_L2,
+	[RTE_FLOW_ACTION_TYPE_NVGRE_DECAP] = MLX5DR_ACTION_TYP_TNL_L2_TO_L2,
+	[RTE_FLOW_ACTION_TYPE_MODIFY_FIELD] = MLX5DR_ACTION_TYP_MODIFY_HDR,
+	[RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT] = MLX5DR_ACTION_TYP_VPORT,
+	[RTE_FLOW_ACTION_TYPE_COUNT] = MLX5DR_ACTION_TYP_CTR,
+};
+
+static int
+flow_hw_dr_actions_template_handle_shared(const struct rte_flow_action *mask,
+					  unsigned int action_src,
+					  enum mlx5dr_action_type *action_types,
+					  uint16_t *curr_off,
+					  struct rte_flow_actions_template *at)
+{
+	uint32_t type;
+
+	if (!mask) {
+		DRV_LOG(WARNING, "Unable to determine indirect action type "
+			"without a mask specified");
+		return -EINVAL;
+	}
+	type = mask->type;
+	switch (type) {
+	case RTE_FLOW_ACTION_TYPE_RSS:
+		at->actions_off[action_src] = *curr_off;
+		action_types[*curr_off] = MLX5DR_ACTION_TYP_TIR;
+		*curr_off = *curr_off + 1;
+		break;
+	case RTE_FLOW_ACTION_TYPE_COUNT:
+		at->actions_off[action_src] = *curr_off;
+		action_types[*curr_off] = MLX5DR_ACTION_TYP_CTR;
+		*curr_off = *curr_off + 1;
+		break;
+	default:
+		DRV_LOG(WARNING, "Unsupported shared action type: %d", type);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+/**
+ * Create DR action template based on a provided sequence of flow actions.
+ *
+ * @param[in] at
+ *   Pointer to flow actions template to be updated.
+ *
+ * @return
+ *   DR action template pointer on success and action offsets in @p at are updated.
+ *   NULL otherwise.
+ */
+static struct mlx5dr_action_template *
+flow_hw_dr_actions_template_create(struct rte_flow_actions_template *at)
+{
+	struct mlx5dr_action_template *dr_template;
+	enum mlx5dr_action_type action_types[MLX5_HW_MAX_ACTS] = { MLX5DR_ACTION_TYP_LAST };
+	unsigned int i;
+	uint16_t curr_off;
+	enum mlx5dr_action_type reformat_act_type = MLX5DR_ACTION_TYP_TNL_L2_TO_L2;
+	uint16_t reformat_off = UINT16_MAX;
+	uint16_t mhdr_off = UINT16_MAX;
+	int ret;
+	for (i = 0, curr_off = 0; at->actions[i].type != RTE_FLOW_ACTION_TYPE_END; ++i) {
+		const struct rte_flow_action_raw_encap *raw_encap_data;
+		size_t data_size;
+		enum mlx5dr_action_type type;
+
+		if (curr_off >= MLX5_HW_MAX_ACTS)
+			goto err_actions_num;
+		switch (at->actions[i].type) {
+		case RTE_FLOW_ACTION_TYPE_VOID:
+			break;
+		case RTE_FLOW_ACTION_TYPE_INDIRECT:
+			ret = flow_hw_dr_actions_template_handle_shared(&at->masks[i], i,
+									action_types,
+									&curr_off, at);
+			if (ret)
+				return NULL;
+			break;
+		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
+		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
+		case RTE_FLOW_ACTION_TYPE_VXLAN_DECAP:
+		case RTE_FLOW_ACTION_TYPE_NVGRE_DECAP:
+			MLX5_ASSERT(reformat_off == UINT16_MAX);
+			reformat_off = curr_off++;
+			reformat_act_type = mlx5_hw_dr_action_types[at->actions[i].type];
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
+			raw_encap_data = at->actions[i].conf;
+			data_size = raw_encap_data->size;
+			if (reformat_off != UINT16_MAX) {
+				reformat_act_type = data_size < MLX5_ENCAPSULATION_DECISION_SIZE ?
+					MLX5DR_ACTION_TYP_TNL_L3_TO_L2 :
+					MLX5DR_ACTION_TYP_L2_TO_TNL_L3;
+			} else {
+				reformat_off = curr_off++;
+				reformat_act_type = MLX5DR_ACTION_TYP_L2_TO_TNL_L2;
+			}
+			break;
+		case RTE_FLOW_ACTION_TYPE_RAW_DECAP:
+			reformat_off = curr_off++;
+			reformat_act_type = MLX5DR_ACTION_TYP_TNL_L2_TO_L2;
+			break;
+		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
+			if (mhdr_off == UINT16_MAX) {
+				mhdr_off = curr_off++;
+				type = mlx5_hw_dr_action_types[at->actions[i].type];
+				action_types[mhdr_off] = type;
+			}
+			break;
+		case RTE_FLOW_ACTION_TYPE_METER:
+			at->actions_off[i] = curr_off;
+			action_types[curr_off++] = MLX5DR_ACTION_TYP_ASO_METER;
+			if (curr_off >= MLX5_HW_MAX_ACTS)
+				goto err_actions_num;
+			action_types[curr_off++] = MLX5DR_ACTION_TYP_FT;
+			break;
+		default:
+			type = mlx5_hw_dr_action_types[at->actions[i].type];
+			at->actions_off[i] = curr_off;
+			action_types[curr_off++] = type;
+			break;
+		}
+	}
+	if (curr_off >= MLX5_HW_MAX_ACTS)
+		goto err_actions_num;
+	if (mhdr_off != UINT16_MAX)
+		at->mhdr_off = mhdr_off;
+	if (reformat_off != UINT16_MAX) {
+		at->reformat_off = reformat_off;
+		action_types[reformat_off] = reformat_act_type;
+	}
+	dr_template = mlx5dr_action_template_create(action_types);
+	if (dr_template)
+		at->dr_actions_num = curr_off;
+	else
+		DRV_LOG(ERR, "Failed to create DR action template: %d", rte_errno);
+	return dr_template;
+err_actions_num:
+	DRV_LOG(ERR, "Number of HW actions (%u) exceeded maximum (%u) allowed in template",
+		curr_off, MLX5_HW_MAX_ACTS);
+	return NULL;
+}
+
 /**
  * Create flow action template.
  *
@@ -2854,7 +3119,8 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 			struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	int len, act_len, mask_len, i;
+	int len, act_num, act_len, mask_len;
+	unsigned int i;
 	struct rte_flow_actions_template *at = NULL;
 	uint16_t pos = MLX5_HW_MAX_ACTS;
 	struct rte_flow_action tmp_action[MLX5_HW_MAX_ACTS];
@@ -2924,6 +3190,11 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 	if (mask_len <= 0)
 		return NULL;
 	len += RTE_ALIGN(mask_len, 16);
+	/* Count flow actions to allocate required space for storing DR offsets. */
+	act_num = 0;
+	for (i = 0; ra[i].type != RTE_FLOW_ACTION_TYPE_END; ++i)
+		act_num++;
+	len += RTE_ALIGN(act_num * sizeof(*at->actions_off), 16);
 	at = mlx5_malloc(MLX5_MEM_ZERO, len + sizeof(*at),
 			 RTE_CACHE_LINE_SIZE, rte_socket_id());
 	if (!at) {
@@ -2933,19 +3204,26 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 				   "cannot allocate action template");
 		return NULL;
 	}
-	/* Actions part is in the first half. */
+	/* Actions part is in the first part. */
 	at->attr = *attr;
 	at->actions = (struct rte_flow_action *)(at + 1);
 	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, at->actions,
 				len, ra, error);
 	if (act_len <= 0)
 		goto error;
-	/* Masks part is in the second half. */
+	/* Masks part is in the second part. */
 	at->masks = (struct rte_flow_action *)(((uint8_t *)at->actions) + act_len);
 	mask_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, at->masks,
 				 len - act_len, rm, error);
 	if (mask_len <= 0)
 		goto error;
+	/* DR actions offsets in the third part. */
+	at->actions_off = (uint16_t *)((uint8_t *)at->masks + mask_len);
+	at->actions_num = act_num;
+	for (i = 0; i < at->actions_num; ++i)
+		at->actions_off[i] = UINT16_MAX;
+	at->reformat_off = UINT16_MAX;
+	at->mhdr_off = UINT16_MAX;
 	at->rx_cpy_pos = pos;
 	/*
 	 * mlx5 PMD hacks indirect action index directly to the action conf.
@@ -2959,12 +3237,18 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 			at->masks[i].conf = masks->conf;
 		}
 	}
+	at->tmpl = flow_hw_dr_actions_template_create(at);
+	if (!at->tmpl)
+		goto error;
 	__atomic_fetch_add(&at->refcnt, 1, __ATOMIC_RELAXED);
 	LIST_INSERT_HEAD(&priv->flow_hw_at, at, next);
 	return at;
 error:
-	if (at)
+	if (at) {
+		if (at->tmpl)
+			mlx5dr_action_template_destroy(at->tmpl);
 		mlx5_free(at);
+	}
 	return NULL;
 }
 
@@ -2995,6 +3279,8 @@ flow_hw_actions_template_destroy(struct rte_eth_dev *dev __rte_unused,
 				   "action template in using");
 	}
 	LIST_REMOVE(template, next);
+	if (template->tmpl)
+		mlx5dr_action_template_destroy(template->tmpl);
 	mlx5_free(template);
 	return 0;
 }
@@ -3045,11 +3331,48 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 			 const struct rte_flow_item items[],
 			 struct rte_flow_error *error)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
 	int i;
 	bool items_end = false;
-	RTE_SET_USED(dev);
-	RTE_SET_USED(attr);
 
+	if (!attr->ingress && !attr->egress && !attr->transfer)
+		return rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ATTR, NULL,
+					  "at least one of the direction attributes"
+					  " must be specified");
+	if (priv->sh->config.dv_esw_en) {
+		MLX5_ASSERT(priv->master || priv->representor);
+		if (priv->master) {
+			/*
+			 * It is allowed to specify ingress, egress and transfer attributes
+			 * at the same time, in order to construct flows catching all missed
+			 * FDB traffic and forwarding it to the master port.
+			 */
+			if (!(attr->ingress ^ attr->egress ^ attr->transfer))
+				return rte_flow_error_set(error, EINVAL,
+							  RTE_FLOW_ERROR_TYPE_ATTR, NULL,
+							  "only one or all direction attributes"
+							  " at once can be used on transfer proxy"
+							  " port");
+		} else {
+			if (attr->transfer)
+				return rte_flow_error_set(error, EINVAL,
+							  RTE_FLOW_ERROR_TYPE_ATTR_TRANSFER, NULL,
+							  "transfer attribute cannot be used with"
+							  " port representors");
+			if (attr->ingress && attr->egress)
+				return rte_flow_error_set(error, EINVAL,
+							  RTE_FLOW_ERROR_TYPE_ATTR, NULL,
+							  "ingress and egress direction attributes"
+							  " cannot be used at the same time on"
+							  " port representors");
+		}
+	} else {
+		if (attr->transfer)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ATTR_TRANSFER, NULL,
+						  "transfer attribute cannot be used when"
+						  " E-Switch is disabled");
+	}
 	for (i = 0; !items_end; i++) {
 		int type = items[i].type;
 
@@ -3072,7 +3395,6 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 		{
 			const struct rte_flow_item_tag *tag =
 				(const struct rte_flow_item_tag *)items[i].spec;
-			struct mlx5_priv *priv = dev->data->dev_private;
 			uint8_t regcs = (uint8_t)priv->sh->cdev->config.hca_attr.set_reg_c;
 
 			if (!((1 << (tag->index - REG_C_0)) & regcs))
@@ -3080,7 +3402,26 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 							  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 							  NULL,
 							  "Unsupported internal tag index");
+			break;
 		}
+		case RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT:
+			if (attr->ingress || attr->egress)
+				return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ITEM, NULL,
+						  "represented port item cannot be used"
+						  " when transfer attribute is set");
+			break;
+		case RTE_FLOW_ITEM_TYPE_META:
+			if (!priv->sh->config.dv_esw_en ||
+			    priv->sh->config.dv_xmeta_en != MLX5_XMETA_MODE_META32_HWS) {
+				if (attr->ingress)
+					return rte_flow_error_set(error, EINVAL,
+								  RTE_FLOW_ERROR_TYPE_ITEM, NULL,
+								  "META item is not supported"
+								  " on current FW with ingress"
+								  " attribute");
+			}
+			break;
 		case RTE_FLOW_ITEM_TYPE_VOID:
 		case RTE_FLOW_ITEM_TYPE_ETH:
 		case RTE_FLOW_ITEM_TYPE_VLAN:
@@ -3090,10 +3431,8 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ITEM_TYPE_TCP:
 		case RTE_FLOW_ITEM_TYPE_GTP:
 		case RTE_FLOW_ITEM_TYPE_GTP_PSC:
-		case RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT:
 		case RTE_FLOW_ITEM_TYPE_VXLAN:
 		case MLX5_RTE_FLOW_ITEM_TYPE_SQ:
-		case RTE_FLOW_ITEM_TYPE_META:
 		case RTE_FLOW_ITEM_TYPE_GRE:
 		case RTE_FLOW_ITEM_TYPE_GRE_KEY:
 		case RTE_FLOW_ITEM_TYPE_GRE_OPTION:
@@ -3141,21 +3480,7 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 
 	if (flow_hw_pattern_validate(dev, attr, items, error))
 		return NULL;
-	if (priv->sh->config.dv_esw_en && attr->ingress) {
-		/*
-		 * Disallow pattern template with ingress and egress/transfer
-		 * attributes in order to forbid implicit port matching
-		 * on egress and transfer traffic.
-		 */
-		if (attr->egress || attr->transfer) {
-			rte_flow_error_set(error, EINVAL,
-					   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					   NULL,
-					   "item template for ingress traffic"
-					   " cannot be used for egress/transfer"
-					   " traffic when E-Switch is enabled");
-			return NULL;
-		}
+	if (priv->sh->config.dv_esw_en && attr->ingress && !attr->egress && !attr->transfer) {
 		copied_items = flow_hw_copy_prepend_port_item(items, error);
 		if (!copied_items)
 			return NULL;
@@ -4539,6 +4864,10 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 		return;
 	flow_hw_rxq_flag_set(dev, false);
 	flow_hw_flush_all_ctrl_flows(dev);
+	while (!LIST_EMPTY(&priv->flow_hw_tbl_ongo)) {
+		tbl = LIST_FIRST(&priv->flow_hw_tbl_ongo);
+		flow_hw_table_destroy(dev, tbl, NULL);
+	}
 	while (!LIST_EMPTY(&priv->flow_hw_tbl)) {
 		tbl = LIST_FIRST(&priv->flow_hw_tbl);
 		flow_hw_table_destroy(dev, tbl, NULL);
@@ -4676,6 +5005,54 @@ void flow_hw_clear_tags_set(struct rte_eth_dev *dev)
 		       sizeof(enum modify_reg) * MLX5_FLOW_HW_TAGS_MAX);
 }
 
+uint32_t mlx5_flow_hw_flow_metadata_config_refcnt;
+uint8_t mlx5_flow_hw_flow_metadata_esw_en;
+uint8_t mlx5_flow_hw_flow_metadata_xmeta_en;
+
+/**
+ * Initializes static configuration of META flow items.
+ *
+ * As a temporary workaround, META flow item is translated to a register,
+ * based on statically saved dv_esw_en and dv_xmeta_en device arguments.
+ * It is a workaround for flow_hw_get_reg_id() where port specific information
+ * is not available at runtime.
+ *
+ * Values of dv_esw_en and dv_xmeta_en device arguments are taken from the first opened port.
+ * This means that each mlx5 port will use the same configuration for translation
+ * of META flow items.
+ *
+ * @param[in] dev
+ *    Pointer to Ethernet device.
+ */
+void
+flow_hw_init_flow_metadata_config(struct rte_eth_dev *dev)
+{
+	uint32_t refcnt;
+
+	refcnt = __atomic_fetch_add(&mlx5_flow_hw_flow_metadata_config_refcnt, 1,
+				    __ATOMIC_RELAXED);
+	if (refcnt > 0)
+		return;
+	mlx5_flow_hw_flow_metadata_esw_en = MLX5_SH(dev)->config.dv_esw_en;
+	mlx5_flow_hw_flow_metadata_xmeta_en = MLX5_SH(dev)->config.dv_xmeta_en;
+}
+
+/**
+ * Clears statically stored configuration related to META flow items.
+ */
+void
+flow_hw_clear_flow_metadata_config(void)
+{
+	uint32_t refcnt;
+
+	refcnt = __atomic_sub_fetch(&mlx5_flow_hw_flow_metadata_config_refcnt, 1,
+				    __ATOMIC_RELAXED);
+	if (refcnt > 0)
+		return;
+	mlx5_flow_hw_flow_metadata_esw_en = 0;
+	mlx5_flow_hw_flow_metadata_xmeta_en = 0;
+}
+
 /**
  * Create shared action.
  *
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index cccec08d70..c260c81e57 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -1170,6 +1170,16 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 			dev->data->port_id, rte_strerror(rte_errno));
 		goto error;
 	}
+#ifdef HAVE_MLX5_HWS_SUPPORT
+	if (priv->sh->config.dv_flow_en == 2) {
+		ret = flow_hw_table_update(dev, NULL);
+		if (ret) {
+			DRV_LOG(ERR, "port %u failed to update HWS tables",
+				dev->data->port_id);
+			goto error;
+		}
+	}
+#endif
 	ret = mlx5_traffic_enable(dev);
 	if (ret) {
 		DRV_LOG(ERR, "port %u failed to set defaults flows",
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v6 10/18] net/mlx5: add HW steering connection tracking support
  2022-10-20 15:41 ` [PATCH v6 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (8 preceding siblings ...)
  2022-10-20 15:41   ` [PATCH v6 09/18] net/mlx5: support DR action template API Suanming Mou
@ 2022-10-20 15:41   ` Suanming Mou
  2022-10-24  9:46     ` Slava Ovsiienko
  2022-10-20 15:41   ` [PATCH v6 11/18] net/mlx5: add HW steering VLAN push, pop and VID modify flow actions Suanming Mou
                     ` (8 subsequent siblings)
  18 siblings, 1 reply; 140+ messages in thread
From: Suanming Mou @ 2022-10-20 15:41 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika

This commit adds connection tracking support to HW steering, matching
what SW steering already provides.

Unlike the SW steering implementation, HW steering takes advantage of
bulk action allocation support, so only a single CT pool is needed.

An indexed pool is introduced to record the actions allocated from the
bulk, the CT action state, etc. Whenever a CT action is allocated from
the bulk, an indexed object is also allocated from the indexed pool,
and released in the same way on deallocation. This lets
mlx5_aso_ct_action objects be managed by the indexed pool, with no need
to reserve them inside mlx5_aso_ct_pool. The single CT pool is also
stored directly in the mlx5_aso_ct_action struct (a minimal sketch of
the allocation idea follows below).

The ASO operation functions are shared with the SW steering
implementation.
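
A minimal sketch of the allocation idea, under simplified assumptions
(the ex_* names and the linear free-index handling are illustrative
stand-ins for the driver's indexed pool, not the real API):

#include <stddef.h>
#include <stdint.h>

struct ex_ct_pool;

struct ex_ct_action {
	struct ex_ct_pool *pool; /* Back-pointer kept in the action itself. */
	uint32_t offset;         /* Offset inside the single bulk DevX object. */
	uint8_t state;           /* CT action state tracking (simplified). */
};

struct ex_ct_pool {
	void *devx_bulk_obj;       /* One bulk allocation covers all CT objects. */
	struct ex_ct_action *objs; /* Stand-in for the indexed pool entries. */
	uint32_t next_free;        /* Simplified free-index management. */
	uint32_t size;
};

/*
 * Allocating a CT action is just taking one index: the per-action
 * bookkeeping lives in the indexed object, so no per-pool reserved
 * array of actions is needed as in the SW steering path.
 */
static struct ex_ct_action *
ex_ct_alloc(struct ex_ct_pool *pool)
{
	struct ex_ct_action *ct;

	if (pool->next_free >= pool->size)
		return NULL;
	ct = &pool->objs[pool->next_free];
	ct->offset = pool->next_free++;
	ct->pool = pool;
	return ct;
}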

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 doc/guides/nics/mlx5.rst               |   2 +-
 doc/guides/rel_notes/release_22_11.rst |   1 +
 drivers/net/mlx5/linux/mlx5_os.c       |   8 +-
 drivers/net/mlx5/mlx5.c                |   3 +-
 drivers/net/mlx5/mlx5.h                |  54 +++-
 drivers/net/mlx5/mlx5_flow.c           |   1 +
 drivers/net/mlx5/mlx5_flow.h           |   7 +
 drivers/net/mlx5/mlx5_flow_aso.c       | 212 ++++++++++----
 drivers/net/mlx5/mlx5_flow_dv.c        |  28 +-
 drivers/net/mlx5/mlx5_flow_hw.c        | 381 ++++++++++++++++++++++++-
 10 files changed, 619 insertions(+), 78 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 0c7bd042a4..e499c38dcf 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -506,7 +506,7 @@ Limitations
   - Cannot co-exist with ASO meter, ASO age action in a single flow rule.
   - Flow rules insertion rate and memory consumption need more optimization.
   - 256 ports maximum.
-  - 4M connections maximum.
+  - 4M connections maximum with ``dv_flow_en`` 1 mode. 16M with ``dv_flow_en`` 2.
 
 - Multi-thread flow insertion:
 
diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 4e1634c4d8..725382c1b7 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -256,6 +256,7 @@ New Features
     - Support of FDB.
     - Support of meter.
     - Support of counter.
+    - Support of CT.
 
 * **Rewritten pmdinfo script.**
 
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 85a8247a6f..5f1fd9b4e7 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1362,9 +1362,11 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 			DRV_LOG(DEBUG, "Flow Hit ASO is supported.");
 		}
 #endif /* HAVE_MLX5_DR_CREATE_ACTION_ASO */
-#if defined(HAVE_MLX5_DR_CREATE_ACTION_ASO) && \
-	defined(HAVE_MLX5_DR_ACTION_ASO_CT)
-		if (hca_attr->ct_offload && priv->mtr_color_reg == REG_C_3) {
+#if defined (HAVE_MLX5_DR_CREATE_ACTION_ASO) && \
+    defined (HAVE_MLX5_DR_ACTION_ASO_CT)
+		/* HWS creates the CT ASO SQs based on the number of HWS configured queues. */
+		if (sh->config.dv_flow_en != 2 &&
+		    hca_attr->ct_offload && priv->mtr_color_reg == REG_C_3) {
 			err = mlx5_flow_aso_ct_mng_init(sh);
 			if (err) {
 				err = -err;
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index e7a4aac354..6490ac636c 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -755,7 +755,8 @@ mlx5_flow_aso_ct_mng_init(struct mlx5_dev_ctx_shared *sh)
 
 	if (sh->ct_mng)
 		return 0;
-	sh->ct_mng = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*sh->ct_mng),
+	sh->ct_mng = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*sh->ct_mng) +
+				 sizeof(struct mlx5_aso_sq) * MLX5_ASO_CT_SQ_NUM,
 				 RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
 	if (!sh->ct_mng) {
 		DRV_LOG(ERR, "ASO CT management allocation failed.");
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 26f5af22a6..a5565dc391 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -44,6 +44,8 @@
 
 #define MLX5_SH(dev) (((struct mlx5_priv *)(dev)->data->dev_private)->sh)
 
+#define MLX5_HW_INV_QUEUE UINT32_MAX
+
 /*
  * Number of modification commands.
  * The maximal actions amount in FW is some constant, and it is 16 in the
@@ -1164,7 +1166,12 @@ enum mlx5_aso_ct_state {
 
 /* Generic ASO connection tracking structure. */
 struct mlx5_aso_ct_action {
-	LIST_ENTRY(mlx5_aso_ct_action) next; /* Pointer to the next ASO CT. */
+	union {
+		LIST_ENTRY(mlx5_aso_ct_action) next;
+		/* Pointer to the next ASO CT. Used only in SWS. */
+		struct mlx5_aso_ct_pool *pool;
+		/* Pointer to action pool. Used only in HWS. */
+	};
 	void *dr_action_orig; /* General action object for original dir. */
 	void *dr_action_rply; /* General action object for reply dir. */
 	uint32_t refcnt; /* Action used count in device flows. */
@@ -1178,28 +1185,48 @@ struct mlx5_aso_ct_action {
 #define MLX5_ASO_CT_UPDATE_STATE(c, s) \
 	__atomic_store_n(&((c)->state), (s), __ATOMIC_RELAXED)
 
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+
 /* ASO connection tracking software pool definition. */
 struct mlx5_aso_ct_pool {
 	uint16_t index; /* Pool index in pools array. */
+	/* Free ASO CT index in the pool. Used by HWS. */
+	struct mlx5_indexed_pool *cts;
 	struct mlx5_devx_obj *devx_obj;
-	/* The first devx object in the bulk, used for freeing (not yet). */
-	struct mlx5_aso_ct_action actions[MLX5_ASO_CT_ACTIONS_PER_POOL];
+	union {
+		void *dummy_action;
+		/* Dummy action to increase the reference count in the driver. */
+		struct mlx5dr_action *dr_action;
+		/* HWS action. */
+	};
+	struct mlx5_aso_sq *sq; /* Async ASO SQ. */
+	struct mlx5_aso_sq *shared_sq; /* Shared ASO SQ. */
+	struct mlx5_aso_ct_action actions[0];
 	/* CT action structures bulk. */
 };
 
 LIST_HEAD(aso_ct_list, mlx5_aso_ct_action);
 
+#define MLX5_ASO_CT_SQ_NUM 16
+
 /* Pools management structure for ASO connection tracking pools. */
 struct mlx5_aso_ct_pools_mng {
 	struct mlx5_aso_ct_pool **pools;
 	uint16_t n; /* Total number of pools. */
 	uint16_t next; /* Number of pools in use, index of next free pool. */
+	uint32_t nb_sq; /* Number of ASO SQ. */
 	rte_spinlock_t ct_sl; /* The ASO CT free list lock. */
 	rte_rwlock_t resize_rwl; /* The ASO CT pool resize lock. */
 	struct aso_ct_list free_cts; /* Free ASO CT objects list. */
-	struct mlx5_aso_sq aso_sq; /* ASO queue objects. */
+	struct mlx5_aso_sq aso_sqs[0]; /* ASO queue objects. */
 };
 
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
 /* LAG attr. */
 struct mlx5_lag {
 	uint8_t tx_remap_affinity[16]; /* The PF port number of affinity */
@@ -1343,8 +1370,7 @@ struct mlx5_dev_ctx_shared {
 	rte_spinlock_t geneve_tlv_opt_sl; /* Lock for geneve tlv resource */
 	struct mlx5_flow_mtr_mng *mtrmng;
 	/* Meter management structure. */
-	struct mlx5_aso_ct_pools_mng *ct_mng;
-	/* Management data for ASO connection tracking. */
+	struct mlx5_aso_ct_pools_mng *ct_mng; /* Management data for ASO CT in HWS only. */
 	struct mlx5_lb_ctx self_lb; /* QP to enable self loopback for Devx. */
 	unsigned int flow_max_priority;
 	enum modify_reg flow_mreg_c[MLX5_MREG_C_NUM];
@@ -1660,6 +1686,9 @@ struct mlx5_priv {
 	/* HW steering create ongoing rte flow table list header. */
 	LIST_HEAD(flow_hw_tbl_ongo, rte_flow_template_table) flow_hw_tbl_ongo;
 	struct mlx5_indexed_pool *acts_ipool; /* Action data indexed pool. */
+	struct mlx5_aso_ct_pools_mng *ct_mng;
+	/* Management data for ASO connection tracking. */
+	struct mlx5_aso_ct_pool *hws_ctpool; /* HW steering's CT pool. */
 #endif
 };
 
@@ -2059,15 +2088,15 @@ int mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
 		struct mlx5_aso_mtr *mtr, struct mlx5_mtr_bulk *bulk);
 int mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh,
 		struct mlx5_aso_mtr *mtr);
-int mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
+int mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			      struct mlx5_aso_ct_action *ct,
 			      const struct rte_flow_action_conntrack *profile);
-int mlx5_aso_ct_wait_ready(struct mlx5_dev_ctx_shared *sh,
+int mlx5_aso_ct_wait_ready(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			   struct mlx5_aso_ct_action *ct);
-int mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh,
+int mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			     struct mlx5_aso_ct_action *ct,
 			     struct rte_flow_action_conntrack *profile);
-int mlx5_aso_ct_available(struct mlx5_dev_ctx_shared *sh,
+int mlx5_aso_ct_available(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			  struct mlx5_aso_ct_action *ct);
 uint32_t
 mlx5_get_supported_sw_parsing_offloads(const struct mlx5_hca_attr *attr);
@@ -2078,6 +2107,11 @@ int mlx5_aso_cnt_queue_init(struct mlx5_dev_ctx_shared *sh);
 void mlx5_aso_cnt_queue_uninit(struct mlx5_dev_ctx_shared *sh);
 int mlx5_aso_cnt_query(struct mlx5_dev_ctx_shared *sh,
 		struct mlx5_hws_cnt_pool *cpool);
+int mlx5_aso_ct_queue_init(struct mlx5_dev_ctx_shared *sh,
+			   struct mlx5_aso_ct_pools_mng *ct_mng,
+			   uint32_t nb_queues);
+int mlx5_aso_ct_queue_uninit(struct mlx5_dev_ctx_shared *sh,
+			     struct mlx5_aso_ct_pools_mng *ct_mng);
 
 /* mlx5_flow_flex.c */
 
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 320a11958f..82692120b1 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -49,6 +49,7 @@ struct flow_hw_port_info mlx5_flow_hw_port_infos[RTE_MAX_ETHPORTS];
  */
 uint32_t mlx5_flow_hw_avl_tags_init_cnt;
 enum modify_reg mlx5_flow_hw_avl_tags[MLX5_FLOW_HW_TAGS_MAX] = {REG_NON};
+enum modify_reg mlx5_flow_hw_aso_tag;
 
 struct tunnel_default_miss_ctx {
 	uint16_t *queue;
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 6782f4b2bb..9621c167d7 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -82,6 +82,10 @@ enum {
 #define MLX5_INDIRECT_ACT_CT_GET_IDX(index) \
 	((index) & ((1 << MLX5_INDIRECT_ACT_CT_OWNER_SHIFT) - 1))
 
+#define MLX5_ACTION_CTX_CT_GET_IDX  MLX5_INDIRECT_ACT_CT_GET_IDX
+#define MLX5_ACTION_CTX_CT_GET_OWNER MLX5_INDIRECT_ACT_CT_GET_OWNER
+#define MLX5_ACTION_CTX_CT_GEN_IDX MLX5_INDIRECT_ACT_CT_GEN_IDX
+
 /* Matches on selected register. */
 struct mlx5_rte_flow_item_tag {
 	enum modify_reg id;
@@ -1458,6 +1462,7 @@ extern struct flow_hw_port_info mlx5_flow_hw_port_infos[RTE_MAX_ETHPORTS];
 #define MLX5_FLOW_HW_TAGS_MAX 8
 extern uint32_t mlx5_flow_hw_avl_tags_init_cnt;
 extern enum modify_reg mlx5_flow_hw_avl_tags[];
+extern enum modify_reg mlx5_flow_hw_aso_tag;
 
 /*
  * Get metadata match tag and mask for given rte_eth_dev port.
@@ -1534,6 +1539,8 @@ flow_hw_get_reg_id(enum rte_flow_item_type type, uint32_t id)
 		 * REG_B case should be rejected on pattern template validation.
 		 */
 		return REG_A;
+	case RTE_FLOW_ITEM_TYPE_CONNTRACK:
+		return mlx5_flow_hw_aso_tag;
 	case RTE_FLOW_ITEM_TYPE_TAG:
 		MLX5_ASSERT(id < MLX5_FLOW_HW_TAGS_MAX);
 		return mlx5_flow_hw_avl_tags[id];
diff --git a/drivers/net/mlx5/mlx5_flow_aso.c b/drivers/net/mlx5/mlx5_flow_aso.c
index ed9272e583..c00c07b891 100644
--- a/drivers/net/mlx5/mlx5_flow_aso.c
+++ b/drivers/net/mlx5/mlx5_flow_aso.c
@@ -313,16 +313,8 @@ mlx5_aso_queue_init(struct mlx5_dev_ctx_shared *sh,
 		mlx5_aso_mtr_init_sq(&sh->mtrmng->pools_mng.sq);
 		break;
 	case ASO_OPC_MOD_CONNECTION_TRACKING:
-		/* 64B per object for query. */
-		if (mlx5_aso_reg_mr(cdev, 64 * sq_desc_n,
-				    &sh->ct_mng->aso_sq.mr))
+		if (mlx5_aso_ct_queue_init(sh, sh->ct_mng, MLX5_ASO_CT_SQ_NUM))
 			return -1;
-		if (mlx5_aso_sq_create(cdev, &sh->ct_mng->aso_sq,
-				       sh->tx_uar.obj, MLX5_ASO_QUEUE_LOG_DESC)) {
-			mlx5_aso_dereg_mr(cdev, &sh->ct_mng->aso_sq.mr);
-			return -1;
-		}
-		mlx5_aso_ct_init_sq(&sh->ct_mng->aso_sq);
 		break;
 	default:
 		DRV_LOG(ERR, "Unknown ASO operation mode");
@@ -343,7 +335,7 @@ void
 mlx5_aso_queue_uninit(struct mlx5_dev_ctx_shared *sh,
 		      enum mlx5_access_aso_opc_mod aso_opc_mod)
 {
-	struct mlx5_aso_sq *sq;
+	struct mlx5_aso_sq *sq = NULL;
 
 	switch (aso_opc_mod) {
 	case ASO_OPC_MOD_FLOW_HIT:
@@ -354,14 +346,14 @@ mlx5_aso_queue_uninit(struct mlx5_dev_ctx_shared *sh,
 		sq = &sh->mtrmng->pools_mng.sq;
 		break;
 	case ASO_OPC_MOD_CONNECTION_TRACKING:
-		mlx5_aso_dereg_mr(sh->cdev, &sh->ct_mng->aso_sq.mr);
-		sq = &sh->ct_mng->aso_sq;
+		mlx5_aso_ct_queue_uninit(sh, sh->ct_mng);
 		break;
 	default:
 		DRV_LOG(ERR, "Unknown ASO operation mode");
 		return;
 	}
-	mlx5_aso_destroy_sq(sq);
+	if (sq)
+		mlx5_aso_destroy_sq(sq);
 }
 
 /**
@@ -903,6 +895,89 @@ mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh,
 	return -1;
 }
 
+static inline struct mlx5_aso_sq*
+__mlx5_aso_ct_get_sq_in_hws(uint32_t queue,
+			    struct mlx5_aso_ct_pool *pool)
+{
+	return (queue == MLX5_HW_INV_QUEUE) ?
+		pool->shared_sq : &pool->sq[queue];
+}
+
+static inline struct mlx5_aso_sq*
+__mlx5_aso_ct_get_sq_in_sws(struct mlx5_dev_ctx_shared *sh,
+			    struct mlx5_aso_ct_action *ct)
+{
+	return &sh->ct_mng->aso_sqs[ct->offset & (MLX5_ASO_CT_SQ_NUM - 1)];
+}
+
+static inline struct mlx5_aso_ct_pool*
+__mlx5_aso_ct_get_pool(struct mlx5_dev_ctx_shared *sh,
+		       struct mlx5_aso_ct_action *ct)
+{
+	if (likely(sh->config.dv_flow_en == 2))
+		return ct->pool;
+	return container_of(ct, struct mlx5_aso_ct_pool, actions[ct->offset]);
+}
+
+int
+mlx5_aso_ct_queue_uninit(struct mlx5_dev_ctx_shared *sh,
+			 struct mlx5_aso_ct_pools_mng *ct_mng)
+{
+	uint32_t i;
+
+	/* Deregister the per-object query MR and destroy the SQ of each queue. */
+	for (i = 0; i < ct_mng->nb_sq; i++) {
+		if (ct_mng->aso_sqs[i].mr.addr)
+			mlx5_aso_dereg_mr(sh->cdev, &ct_mng->aso_sqs[i].mr);
+		mlx5_aso_destroy_sq(&ct_mng->aso_sqs[i]);
+	}
+	return 0;
+}
+
+/**
+ * API to create and initialize CT Send Queue used for ASO access.
+ *
+ * @param[in] sh
+ *   Pointer to shared device context.
+ * @param[in] ct_mng
+ *   Pointer to the CT management struct.
+ * @param[in] nb_queues
+ *   Number of queues to be allocated.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_aso_ct_queue_init(struct mlx5_dev_ctx_shared *sh,
+		       struct mlx5_aso_ct_pools_mng *ct_mng,
+		       uint32_t nb_queues)
+{
+	uint32_t i;
+
+	/* 64B per object for query. */
+	for (i = 0; i < nb_queues; i++) {
+		if (mlx5_aso_reg_mr(sh->cdev, 64 * (1 << MLX5_ASO_QUEUE_LOG_DESC),
+				    &ct_mng->aso_sqs[i].mr))
+			goto error;
+		if (mlx5_aso_sq_create(sh->cdev, &ct_mng->aso_sqs[i],
+				       sh->tx_uar.obj,
+				       MLX5_ASO_QUEUE_LOG_DESC))
+			goto error;
+		mlx5_aso_ct_init_sq(&ct_mng->aso_sqs[i]);
+	}
+	ct_mng->nb_sq = nb_queues;
+	return 0;
+error:
+	do {
+		if (ct_mng->aso_sqs[i].mr.addr)
+			mlx5_aso_dereg_mr(sh->cdev, &ct_mng->aso_sqs[i].mr);
+		if (&ct_mng->aso_sqs[i])
+			mlx5_aso_destroy_sq(&ct_mng->aso_sqs[i]);
+	} while (i--);
+	ct_mng->nb_sq = 0;
+	return -1;
+}
+
 /*
  * Post a WQE to the ASO CT SQ to modify the context.
  *
@@ -918,11 +993,12 @@ mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh,
  */
 static uint16_t
 mlx5_aso_ct_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
+			      struct mlx5_aso_sq *sq,
 			      struct mlx5_aso_ct_action *ct,
-			      const struct rte_flow_action_conntrack *profile)
+			      const struct rte_flow_action_conntrack *profile,
+			      bool need_lock)
 {
 	volatile struct mlx5_aso_wqe *wqe = NULL;
-	struct mlx5_aso_sq *sq = &sh->ct_mng->aso_sq;
 	uint16_t size = 1 << sq->log_desc_n;
 	uint16_t mask = size - 1;
 	uint16_t res;
@@ -931,11 +1007,13 @@ mlx5_aso_ct_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	void *orig_dir;
 	void *reply_dir;
 
-	rte_spinlock_lock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_lock(&sq->sqsl);
 	/* Prevent other threads to update the index. */
 	res = size - (uint16_t)(sq->head - sq->tail);
 	if (unlikely(!res)) {
-		rte_spinlock_unlock(&sq->sqsl);
+		if (need_lock)
+			rte_spinlock_unlock(&sq->sqsl);
 		DRV_LOG(ERR, "Fail: SQ is full and no free WQE to send");
 		return 0;
 	}
@@ -945,7 +1023,7 @@ mlx5_aso_ct_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	MLX5_ASO_CT_UPDATE_STATE(ct, ASO_CONNTRACK_WAIT);
 	sq->elts[sq->head & mask].ct = ct;
 	sq->elts[sq->head & mask].query_data = NULL;
-	pool = container_of(ct, struct mlx5_aso_ct_pool, actions[ct->offset]);
+	pool = __mlx5_aso_ct_get_pool(sh, ct);
 	/* Each WQE will have a single CT object. */
 	wqe->general_cseg.misc = rte_cpu_to_be_32(pool->devx_obj->id +
 						  ct->offset);
@@ -1028,7 +1106,8 @@ mlx5_aso_ct_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
 			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
 			   !sh->tx_uar.dbnc);
-	rte_spinlock_unlock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_unlock(&sq->sqsl);
 	return 1;
 }
 
@@ -1080,10 +1159,11 @@ mlx5_aso_ct_status_update(struct mlx5_aso_sq *sq, uint16_t num)
  */
 static int
 mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
-			    struct mlx5_aso_ct_action *ct, char *data)
+			    struct mlx5_aso_sq *sq,
+			    struct mlx5_aso_ct_action *ct, char *data,
+			    bool need_lock)
 {
 	volatile struct mlx5_aso_wqe *wqe = NULL;
-	struct mlx5_aso_sq *sq = &sh->ct_mng->aso_sq;
 	uint16_t size = 1 << sq->log_desc_n;
 	uint16_t mask = size - 1;
 	uint16_t res;
@@ -1098,10 +1178,12 @@ mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
 	} else if (state == ASO_CONNTRACK_WAIT) {
 		return 0;
 	}
-	rte_spinlock_lock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_lock(&sq->sqsl);
 	res = size - (uint16_t)(sq->head - sq->tail);
 	if (unlikely(!res)) {
-		rte_spinlock_unlock(&sq->sqsl);
+		if (need_lock)
+			rte_spinlock_unlock(&sq->sqsl);
 		DRV_LOG(ERR, "Fail: SQ is full and no free WQE to send");
 		return 0;
 	}
@@ -1113,7 +1195,7 @@ mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
 	wqe_idx = sq->head & mask;
 	sq->elts[wqe_idx].ct = ct;
 	sq->elts[wqe_idx].query_data = data;
-	pool = container_of(ct, struct mlx5_aso_ct_pool, actions[ct->offset]);
+	pool = __mlx5_aso_ct_get_pool(sh, ct);
 	/* Each WQE will have a single CT object. */
 	wqe->general_cseg.misc = rte_cpu_to_be_32(pool->devx_obj->id +
 						  ct->offset);
@@ -1141,7 +1223,8 @@ mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
 	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
 			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
 			   !sh->tx_uar.dbnc);
-	rte_spinlock_unlock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_unlock(&sq->sqsl);
 	return 1;
 }
 
@@ -1152,9 +1235,10 @@ mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
  *   Pointer to the CT pools management structure.
  */
 static void
-mlx5_aso_ct_completion_handle(struct mlx5_aso_ct_pools_mng *mng)
+mlx5_aso_ct_completion_handle(struct mlx5_dev_ctx_shared *sh __rte_unused,
+			      struct mlx5_aso_sq *sq,
+			      bool need_lock)
 {
-	struct mlx5_aso_sq *sq = &mng->aso_sq;
 	struct mlx5_aso_cq *cq = &sq->cq;
 	volatile struct mlx5_cqe *restrict cqe;
 	const uint32_t cq_size = 1 << cq->log_desc_n;
@@ -1165,10 +1249,12 @@ mlx5_aso_ct_completion_handle(struct mlx5_aso_ct_pools_mng *mng)
 	uint16_t n = 0;
 	int ret;
 
-	rte_spinlock_lock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_lock(&sq->sqsl);
 	max = (uint16_t)(sq->head - sq->tail);
 	if (unlikely(!max)) {
-		rte_spinlock_unlock(&sq->sqsl);
+		if (need_lock)
+			rte_spinlock_unlock(&sq->sqsl);
 		return;
 	}
 	next_idx = cq->cq_ci & mask;
@@ -1199,7 +1285,8 @@ mlx5_aso_ct_completion_handle(struct mlx5_aso_ct_pools_mng *mng)
 		rte_io_wmb();
 		cq->cq_obj.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci);
 	}
-	rte_spinlock_unlock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_unlock(&sq->sqsl);
 }
 
 /*
@@ -1207,6 +1294,8 @@ mlx5_aso_ct_completion_handle(struct mlx5_aso_ct_pools_mng *mng)
  *
  * @param[in] sh
  *   Pointer to mlx5_dev_ctx_shared object.
+ * @param[in] queue
+ *   The queue index.
  * @param[in] ct
  *   Pointer to connection tracking offload object.
  * @param[in] profile
@@ -1217,21 +1306,26 @@ mlx5_aso_ct_completion_handle(struct mlx5_aso_ct_pools_mng *mng)
  */
 int
 mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
+			  uint32_t queue,
 			  struct mlx5_aso_ct_action *ct,
 			  const struct rte_flow_action_conntrack *profile)
 {
 	uint32_t poll_wqe_times = MLX5_CT_POLL_WQE_CQE_TIMES;
-	struct mlx5_aso_ct_pool *pool;
+	struct mlx5_aso_ct_pool *pool = __mlx5_aso_ct_get_pool(sh, ct);
+	struct mlx5_aso_sq *sq;
+	bool need_lock = !!(queue == MLX5_HW_INV_QUEUE);
 
-	MLX5_ASSERT(ct);
+	if (sh->config.dv_flow_en == 2)
+		sq = __mlx5_aso_ct_get_sq_in_hws(queue, pool);
+	else
+		sq = __mlx5_aso_ct_get_sq_in_sws(sh, ct);
 	do {
-		mlx5_aso_ct_completion_handle(sh->ct_mng);
-		if (mlx5_aso_ct_sq_enqueue_single(sh, ct, profile))
+		mlx5_aso_ct_completion_handle(sh, sq, need_lock);
+		if (mlx5_aso_ct_sq_enqueue_single(sh, sq, ct, profile, need_lock))
 			return 0;
 		/* Waiting for wqe resource. */
 		rte_delay_us_sleep(10u);
 	} while (--poll_wqe_times);
-	pool = container_of(ct, struct mlx5_aso_ct_pool, actions[ct->offset]);
 	DRV_LOG(ERR, "Fail to send WQE for ASO CT %d in pool %d",
 		ct->offset, pool->index);
 	return -1;
@@ -1242,6 +1336,8 @@ mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
  *
  * @param[in] sh
  *   Pointer to mlx5_dev_ctx_shared object.
+ * @param[in] queue
+ *   The queue which the CT works on.
  * @param[in] ct
  *   Pointer to connection tracking offload object.
  *
@@ -1249,25 +1345,29 @@ mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
  *   0 on success, -1 on failure.
  */
 int
-mlx5_aso_ct_wait_ready(struct mlx5_dev_ctx_shared *sh,
+mlx5_aso_ct_wait_ready(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 		       struct mlx5_aso_ct_action *ct)
 {
-	struct mlx5_aso_ct_pools_mng *mng = sh->ct_mng;
 	uint32_t poll_cqe_times = MLX5_CT_POLL_WQE_CQE_TIMES;
-	struct mlx5_aso_ct_pool *pool;
+	struct mlx5_aso_ct_pool *pool = __mlx5_aso_ct_get_pool(sh, ct);
+	struct mlx5_aso_sq *sq;
+	bool need_lock = !!(queue == MLX5_HW_INV_QUEUE);
 
+	if (sh->config.dv_flow_en == 2)
+		sq = __mlx5_aso_ct_get_sq_in_hws(queue, pool);
+	else
+		sq = __mlx5_aso_ct_get_sq_in_sws(sh, ct);
 	if (__atomic_load_n(&ct->state, __ATOMIC_RELAXED) ==
 	    ASO_CONNTRACK_READY)
 		return 0;
 	do {
-		mlx5_aso_ct_completion_handle(mng);
+		mlx5_aso_ct_completion_handle(sh, sq, need_lock);
 		if (__atomic_load_n(&ct->state, __ATOMIC_RELAXED) ==
 		    ASO_CONNTRACK_READY)
 			return 0;
 		/* Waiting for CQE ready, consider should block or sleep. */
 		rte_delay_us_sleep(MLX5_ASO_WQE_CQE_RESPONSE_DELAY);
 	} while (--poll_cqe_times);
-	pool = container_of(ct, struct mlx5_aso_ct_pool, actions[ct->offset]);
 	DRV_LOG(ERR, "Fail to poll CQE for ASO CT %d in pool %d",
 		ct->offset, pool->index);
 	return -1;
@@ -1363,18 +1463,24 @@ mlx5_aso_ct_obj_analyze(struct rte_flow_action_conntrack *profile,
  */
 int
 mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh,
+			 uint32_t queue,
 			 struct mlx5_aso_ct_action *ct,
 			 struct rte_flow_action_conntrack *profile)
 {
 	uint32_t poll_wqe_times = MLX5_CT_POLL_WQE_CQE_TIMES;
-	struct mlx5_aso_ct_pool *pool;
+	struct mlx5_aso_ct_pool *pool = __mlx5_aso_ct_get_pool(sh, ct);
+	struct mlx5_aso_sq *sq;
+	bool need_lock = !!(queue == MLX5_HW_INV_QUEUE);
 	char out_data[64 * 2];
 	int ret;
 
-	MLX5_ASSERT(ct);
+	if (sh->config.dv_flow_en == 2)
+		sq = __mlx5_aso_ct_get_sq_in_hws(queue, pool);
+	else
+		sq = __mlx5_aso_ct_get_sq_in_sws(sh, ct);
 	do {
-		mlx5_aso_ct_completion_handle(sh->ct_mng);
-		ret = mlx5_aso_ct_sq_query_single(sh, ct, out_data);
+		mlx5_aso_ct_completion_handle(sh, sq, need_lock);
+		ret = mlx5_aso_ct_sq_query_single(sh, sq, ct, out_data, need_lock);
 		if (ret < 0)
 			return ret;
 		else if (ret > 0)
@@ -1383,12 +1489,11 @@ mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh,
 		else
 			rte_delay_us_sleep(10u);
 	} while (--poll_wqe_times);
-	pool = container_of(ct, struct mlx5_aso_ct_pool, actions[ct->offset]);
 	DRV_LOG(ERR, "Fail to send WQE for ASO CT %d in pool %d",
 		ct->offset, pool->index);
 	return -1;
 data_handle:
-	ret = mlx5_aso_ct_wait_ready(sh, ct);
+	ret = mlx5_aso_ct_wait_ready(sh, queue, ct);
 	if (!ret)
 		mlx5_aso_ct_obj_analyze(profile, out_data);
 	return ret;
@@ -1408,13 +1513,20 @@ mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh,
  */
 int
 mlx5_aso_ct_available(struct mlx5_dev_ctx_shared *sh,
+		      uint32_t queue,
 		      struct mlx5_aso_ct_action *ct)
 {
-	struct mlx5_aso_ct_pools_mng *mng = sh->ct_mng;
+	struct mlx5_aso_ct_pool *pool = __mlx5_aso_ct_get_pool(sh, ct);
+	struct mlx5_aso_sq *sq;
+	bool need_lock = !!(queue == MLX5_HW_INV_QUEUE);
 	uint32_t poll_cqe_times = MLX5_CT_POLL_WQE_CQE_TIMES;
 	enum mlx5_aso_ct_state state =
 				__atomic_load_n(&ct->state, __ATOMIC_RELAXED);
 
+	if (sh->config.dv_flow_en == 2)
+		sq = __mlx5_aso_ct_get_sq_in_hws(queue, pool);
+	else
+		sq = __mlx5_aso_ct_get_sq_in_sws(sh, ct);
 	if (state == ASO_CONNTRACK_FREE) {
 		rte_errno = ENXIO;
 		return -rte_errno;
@@ -1423,13 +1535,13 @@ mlx5_aso_ct_available(struct mlx5_dev_ctx_shared *sh,
 		return 0;
 	}
 	do {
-		mlx5_aso_ct_completion_handle(mng);
+		mlx5_aso_ct_completion_handle(sh, sq, need_lock);
 		state = __atomic_load_n(&ct->state, __ATOMIC_RELAXED);
 		if (state == ASO_CONNTRACK_READY ||
 		    state == ASO_CONNTRACK_QUERY)
 			return 0;
-		/* Waiting for CQE ready, consider should block or sleep. */
-		rte_delay_us_sleep(MLX5_ASO_WQE_CQE_RESPONSE_DELAY);
+		/* Waiting for CQE ready; consider whether to block or sleep. */
+		rte_delay_us_block(MLX5_ASO_WQE_CQE_RESPONSE_DELAY);
 	} while (--poll_cqe_times);
 	rte_errno = EBUSY;
 	return -rte_errno;
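
For reference, each exported CT routine above now resolves its send queue
with the same few lines; a condensed sketch of that repeated pattern (the
helper name ct_select_sq() is illustrative only and is not a function added
by this patch):

	static struct mlx5_aso_sq *
	ct_select_sq(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
		     struct mlx5_aso_ct_action *ct, bool *need_lock)
	{
		struct mlx5_aso_ct_pool *pool = __mlx5_aso_ct_get_pool(sh, ct);

		/* Only the queue-less (synchronous) path takes the spinlock. */
		*need_lock = (queue == MLX5_HW_INV_QUEUE);
		if (sh->config.dv_flow_en == 2)
			/* HWS: per-flow-queue SQ, or the shared SQ. */
			return __mlx5_aso_ct_get_sq_in_hws(queue, pool);
		/* SWS: hash the CT offset into the SQ array. */
		return __mlx5_aso_ct_get_sq_in_sws(sh, ct);
	}
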
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 2ca83f5d7a..0b65167451 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -12860,6 +12860,7 @@ flow_dv_ct_pool_create(struct rte_eth_dev *dev,
 	struct mlx5_devx_obj *obj = NULL;
 	uint32_t i;
 	uint32_t log_obj_size = rte_log2_u32(MLX5_ASO_CT_ACTIONS_PER_POOL);
+	size_t mem_size;
 
 	obj = mlx5_devx_cmd_create_conn_track_offload_obj(priv->sh->cdev->ctx,
 							  priv->sh->cdev->pdn,
@@ -12869,7 +12870,10 @@ flow_dv_ct_pool_create(struct rte_eth_dev *dev,
 		DRV_LOG(ERR, "Failed to create conn_track_offload_obj using DevX.");
 		return NULL;
 	}
-	pool = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*pool), 0, SOCKET_ID_ANY);
+	mem_size = sizeof(struct mlx5_aso_ct_action) *
+		   MLX5_ASO_CT_ACTIONS_PER_POOL +
+		   sizeof(*pool);
+	pool = mlx5_malloc(MLX5_MEM_ZERO, mem_size, 0, SOCKET_ID_ANY);
 	if (!pool) {
 		rte_errno = ENOMEM;
 		claim_zero(mlx5_devx_cmd_destroy(obj));
@@ -13009,10 +13013,13 @@ flow_dv_translate_create_conntrack(struct rte_eth_dev *dev,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "Failed to allocate CT object");
 	ct = flow_aso_ct_get_by_dev_idx(dev, idx);
-	if (mlx5_aso_ct_update_by_wqe(sh, ct, pro))
-		return rte_flow_error_set(error, EBUSY,
-					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
-					  "Failed to update CT");
+	if (mlx5_aso_ct_update_by_wqe(sh, MLX5_HW_INV_QUEUE, ct, pro)) {
+		flow_dv_aso_ct_dev_release(dev, idx);
+		rte_flow_error_set(error, EBUSY,
+				   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+				   "Failed to update CT");
+		return 0;
+	}
 	ct->is_original = !!pro->is_original_dir;
 	ct->peer = pro->peer_port;
 	return idx;
@@ -14218,7 +14225,7 @@ flow_dv_translate(struct rte_eth_dev *dev,
 						RTE_FLOW_ERROR_TYPE_ACTION,
 						NULL,
 						"Failed to get CT object.");
-			if (mlx5_aso_ct_available(priv->sh, ct))
+			if (mlx5_aso_ct_available(priv->sh, MLX5_HW_INV_QUEUE, ct))
 				return rte_flow_error_set(error, rte_errno,
 						RTE_FLOW_ERROR_TYPE_ACTION,
 						NULL,
@@ -15832,14 +15839,15 @@ __flow_dv_action_ct_update(struct rte_eth_dev *dev, uint32_t idx,
 		ret = mlx5_validate_action_ct(dev, new_prf, error);
 		if (ret)
 			return ret;
-		ret = mlx5_aso_ct_update_by_wqe(priv->sh, ct, new_prf);
+		ret = mlx5_aso_ct_update_by_wqe(priv->sh, MLX5_HW_INV_QUEUE,
+						ct, new_prf);
 		if (ret)
 			return rte_flow_error_set(error, EIO,
 					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 					NULL,
 					"Failed to send CT context update WQE");
-		/* Block until ready or a failure. */
-		ret = mlx5_aso_ct_available(priv->sh, ct);
+		/* Block until ready or a failure; the default HWS path is asynchronous. */
+		ret = mlx5_aso_ct_available(priv->sh, MLX5_HW_INV_QUEUE, ct);
 		if (ret)
 			rte_flow_error_set(error, rte_errno,
 					   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
@@ -16668,7 +16676,7 @@ flow_dv_action_query(struct rte_eth_dev *dev,
 							ct->peer;
 		((struct rte_flow_action_conntrack *)data)->is_original_dir =
 							ct->is_original;
-		if (mlx5_aso_ct_query_by_wqe(priv->sh, ct, data))
+		if (mlx5_aso_ct_query_by_wqe(priv->sh, MLX5_HW_INV_QUEUE, ct, data))
 			return rte_flow_error_set(error, EIO,
 					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 					NULL,
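
All the call sites changed in this file keep the legacy synchronous
semantics by passing the MLX5_HW_INV_QUEUE sentinel, which maps to the
shared, spinlock-protected SQ in the helpers above; roughly:

	/* SWS/DV caller: post the WQE, then block for the CQE or time out. */
	ret = mlx5_aso_ct_update_by_wqe(priv->sh, MLX5_HW_INV_QUEUE, ct, new_prf);
	if (!ret)
		ret = mlx5_aso_ct_available(priv->sh, MLX5_HW_INV_QUEUE, ct);
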
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index c1eef12116..ac349d35ef 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -15,6 +15,14 @@
 /* The maximum actions support in the flow. */
 #define MLX5_HW_MAX_ACTS 16
 
+/*
+ * The default ipool size threshold that determines which per_core_cache
+ * value to use.
+ */
+#define MLX5_HW_IPOOL_SIZE_THRESHOLD (1 << 19)
+/* The default min local cache size. */
+#define MLX5_HW_IPOOL_CACHE_MIN (1 << 9)
+
 /* Default push burst threshold. */
 #define BURST_THR 32u
 
@@ -324,6 +332,25 @@ flow_hw_tir_action_register(struct rte_eth_dev *dev,
 	return hrxq;
 }
 
+static __rte_always_inline int
+flow_hw_ct_compile(struct rte_eth_dev *dev,
+		   uint32_t queue, uint32_t idx,
+		   struct mlx5dr_rule_action *rule_act)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_ct_action *ct;
+
+	ct = mlx5_ipool_get(priv->hws_ctpool->cts, MLX5_ACTION_CTX_CT_GET_IDX(idx));
+	if (!ct || mlx5_aso_ct_available(priv->sh, queue, ct))
+		return -1;
+	rule_act->action = priv->hws_ctpool->dr_action;
+	rule_act->aso_ct.offset = ct->offset;
+	rule_act->aso_ct.direction = ct->is_original ?
+		MLX5DR_ACTION_ASO_CT_DIRECTION_INITIATOR :
+		MLX5DR_ACTION_ASO_CT_DIRECTION_RESPONDER;
+	return 0;
+}
+
 /**
  * Destroy DR actions created by action template.
  *
@@ -640,6 +667,11 @@ flow_hw_shared_action_translate(struct rte_eth_dev *dev,
 			action_src, action_dst, act_idx))
 			return -1;
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_CT:
+		if (flow_hw_ct_compile(dev, MLX5_HW_INV_QUEUE,
+				       idx, &acts->rule_acts[action_dst]))
+			return -1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type:%d", type);
 		break;
@@ -1083,6 +1115,7 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
 	bool reformat_used = false;
 	uint16_t action_pos;
 	uint16_t jump_pos;
+	uint32_t ct_idx;
 	int err;
 
 	flow_hw_modify_field_init(&mhdr, at);
@@ -1308,6 +1341,20 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
 				goto err;
 			}
 			break;
+		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+			action_pos = at->actions_off[actions - at->actions];
+			if (masks->conf) {
+				ct_idx = MLX5_ACTION_CTX_CT_GET_IDX
+					 ((uint32_t)(uintptr_t)actions->conf);
+				if (flow_hw_ct_compile(dev, MLX5_HW_INV_QUEUE, ct_idx,
+						       &acts->rule_acts[action_pos]))
+					goto err;
+			} else if (__flow_hw_act_data_general_append
+					(priv, acts, actions->type,
+					 actions - action_start, action_pos)) {
+				goto err;
+			}
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -1482,6 +1529,8 @@ flow_hw_shared_action_get(struct rte_eth_dev *dev,
  *
  * @param[in] dev
  *   Pointer to the rte_eth_dev data structure.
+ * @param[in] queue
+ *   The flow creation queue index.
  * @param[in] action
  *   Pointer to the shared indirect rte_flow action.
  * @param[in] table
@@ -1495,7 +1544,7 @@ flow_hw_shared_action_get(struct rte_eth_dev *dev,
  *    0 on success, negative value otherwise and rte_errno is set.
  */
 static __rte_always_inline int
-flow_hw_shared_action_construct(struct rte_eth_dev *dev,
+flow_hw_shared_action_construct(struct rte_eth_dev *dev, uint32_t queue,
 				const struct rte_flow_action *action,
 				struct rte_flow_template_table *table,
 				const uint8_t it_idx,
@@ -1535,6 +1584,10 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev,
 				&rule_act->counter.offset))
 			return -1;
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_CT:
+		if (flow_hw_ct_compile(dev, queue, idx, rule_act))
+			return -1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type:%d", type);
 		break;
@@ -1730,6 +1783,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		uint64_t item_flags;
 		struct mlx5_hw_jump_action *jump;
 		struct mlx5_hrxq *hrxq;
+		uint32_t ct_idx;
 		cnt_id_t cnt_id;
 
 		action = &actions[act_data->action_src];
@@ -1738,7 +1792,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		switch (act_data->type) {
 		case RTE_FLOW_ACTION_TYPE_INDIRECT:
 			if (flow_hw_shared_action_construct
-					(dev, action, table, it_idx,
+					(dev, queue, action, table, it_idx,
 					 &rule_acts[act_data->action_dst]))
 				return -1;
 			break;
@@ -1863,6 +1917,13 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 				return ret;
 			job->flow->cnt_id = act_data->shared_counter.id;
 			break;
+		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+			ct_idx = MLX5_ACTION_CTX_CT_GET_IDX
+				 ((uint32_t)(uintptr_t)action->conf);
+			if (flow_hw_ct_compile(dev, queue, ct_idx,
+					       &rule_acts[act_data->action_dst]))
+				return -1;
+			break;
 		default:
 			break;
 		}
@@ -2394,6 +2455,8 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	if (nb_flows < cfg.trunk_size) {
 		cfg.per_core_cache = 0;
 		cfg.trunk_size = nb_flows;
+	} else if (nb_flows <= MLX5_HW_IPOOL_SIZE_THRESHOLD) {
+		cfg.per_core_cache = MLX5_HW_IPOOL_CACHE_MIN;
 	}
 	/* Check if we require too many templates. */
 	if (nb_item_templates > max_tpl ||
@@ -2930,6 +2993,9 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_COUNT:
 			/* TODO: Validation logic */
 			break;
+		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+			/* TODO: Validation logic */
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -2956,6 +3022,7 @@ static enum mlx5dr_action_type mlx5_hw_dr_action_types[] = {
 	[RTE_FLOW_ACTION_TYPE_MODIFY_FIELD] = MLX5DR_ACTION_TYP_MODIFY_HDR,
 	[RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT] = MLX5DR_ACTION_TYP_VPORT,
 	[RTE_FLOW_ACTION_TYPE_COUNT] = MLX5DR_ACTION_TYP_CTR,
+	[RTE_FLOW_ACTION_TYPE_CONNTRACK] = MLX5DR_ACTION_TYP_ASO_CT,
 };
 
 static int
@@ -2984,6 +3051,11 @@ flow_hw_dr_actions_template_handle_shared(const struct rte_flow_action *mask,
 		action_types[*curr_off] = MLX5DR_ACTION_TYP_CTR;
 		*curr_off = *curr_off + 1;
 		break;
+	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+		at->actions_off[action_src] = *curr_off;
+		action_types[*curr_off] = MLX5DR_ACTION_TYP_ASO_CT;
+		*curr_off = *curr_off + 1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type: %d", type);
 		return -EINVAL;
@@ -3438,6 +3510,7 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ITEM_TYPE_GRE_OPTION:
 		case RTE_FLOW_ITEM_TYPE_ICMP:
 		case RTE_FLOW_ITEM_TYPE_ICMP6:
+		case RTE_FLOW_ITEM_TYPE_CONNTRACK:
 			break;
 		case RTE_FLOW_ITEM_TYPE_END:
 			items_end = true;
@@ -4630,6 +4703,97 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 	return -EINVAL;
 }
 
+static void
+flow_hw_ct_mng_destroy(struct rte_eth_dev *dev,
+		       struct mlx5_aso_ct_pools_mng *ct_mng)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	mlx5_aso_ct_queue_uninit(priv->sh, ct_mng);
+	mlx5_free(ct_mng);
+}
+
+static void
+flow_hw_ct_pool_destroy(struct rte_eth_dev *dev __rte_unused,
+			struct mlx5_aso_ct_pool *pool)
+{
+	if (pool->dr_action)
+		mlx5dr_action_destroy(pool->dr_action);
+	if (pool->devx_obj)
+		claim_zero(mlx5_devx_cmd_destroy(pool->devx_obj));
+	if (pool->cts)
+		mlx5_ipool_destroy(pool->cts);
+	mlx5_free(pool);
+}
+
+static struct mlx5_aso_ct_pool *
+flow_hw_ct_pool_create(struct rte_eth_dev *dev,
+		       const struct rte_flow_port_attr *port_attr)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_ct_pool *pool;
+	struct mlx5_devx_obj *obj;
+	uint32_t nb_cts = rte_align32pow2(port_attr->nb_conn_tracks);
+	uint32_t log_obj_size = rte_log2_u32(nb_cts);
+	struct mlx5_indexed_pool_config cfg = {
+		.size = sizeof(struct mlx5_aso_ct_action),
+		.trunk_size = 1 << 12,
+		.per_core_cache = 1 << 13,
+		.need_lock = 1,
+		.release_mem_en = !!priv->sh->config.reclaim_mode,
+		.malloc = mlx5_malloc,
+		.free = mlx5_free,
+		.type = "mlx5_hw_ct_action",
+	};
+	int reg_id;
+	uint32_t flags;
+
+	pool = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*pool), 0, SOCKET_ID_ANY);
+	if (!pool) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	obj = mlx5_devx_cmd_create_conn_track_offload_obj(priv->sh->cdev->ctx,
+							  priv->sh->cdev->pdn,
+							  log_obj_size);
+	if (!obj) {
+		rte_errno = ENODATA;
+		DRV_LOG(ERR, "Failed to create conn_track_offload_obj using DevX.");
+		goto err;
+	}
+	pool->devx_obj = obj;
+	reg_id = mlx5_flow_get_reg_id(dev, MLX5_ASO_CONNTRACK, 0, NULL);
+	flags = MLX5DR_ACTION_FLAG_HWS_RX | MLX5DR_ACTION_FLAG_HWS_TX;
+	if (priv->sh->config.dv_esw_en && priv->master)
+		flags |= MLX5DR_ACTION_FLAG_HWS_FDB;
+	pool->dr_action = mlx5dr_action_create_aso_ct(priv->dr_ctx,
+						      (struct mlx5dr_devx_obj *)obj,
+						      reg_id - REG_C_0, flags);
+	if (!pool->dr_action)
+		goto err;
+	/*
+	 * No need for a local cache if the CT number is small, since the
+	 * flow insertion rate will be very limited in that case. Set the
+	 * trunk size below the default 4K.
+	 */
+	if (nb_cts <= cfg.trunk_size) {
+		cfg.per_core_cache = 0;
+		cfg.trunk_size = nb_cts;
+	} else if (nb_cts <= MLX5_HW_IPOOL_SIZE_THRESHOLD) {
+		cfg.per_core_cache = MLX5_HW_IPOOL_CACHE_MIN;
+	}
+	pool->cts = mlx5_ipool_create(&cfg);
+	if (!pool->cts)
+		goto err;
+	pool->sq = priv->ct_mng->aso_sqs;
+	/* Assign the last extra ASO SQ as public SQ. */
+	pool->shared_sq = &priv->ct_mng->aso_sqs[priv->nb_queue - 1];
+	return pool;
+err:
+	flow_hw_ct_pool_destroy(dev, pool);
+	return NULL;
+}
+
 /**
  * Configure port HWS resources.
  *
@@ -4812,6 +4976,20 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	}
 	if (_queue_attr)
 		mlx5_free(_queue_attr);
+	if (port_attr->nb_conn_tracks) {
+		mem_size = sizeof(struct mlx5_aso_sq) * nb_q_updated +
+			   sizeof(*priv->ct_mng);
+		priv->ct_mng = mlx5_malloc(MLX5_MEM_ZERO, mem_size,
+					   RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
+		if (!priv->ct_mng)
+			goto err;
+		if (mlx5_aso_ct_queue_init(priv->sh, priv->ct_mng, nb_q_updated))
+			goto err;
+		priv->hws_ctpool = flow_hw_ct_pool_create(dev, port_attr);
+		if (!priv->hws_ctpool)
+			goto err;
+		priv->sh->ct_aso_en = 1;
+	}
 	if (port_attr->nb_counters) {
 		priv->hws_cpool = mlx5_hws_cnt_pool_create(dev, port_attr,
 				nb_queue);
@@ -4820,6 +4998,14 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	}
 	return 0;
 err:
+	if (priv->hws_ctpool) {
+		flow_hw_ct_pool_destroy(dev, priv->hws_ctpool);
+		priv->hws_ctpool = NULL;
+	}
+	if (priv->ct_mng) {
+		flow_hw_ct_mng_destroy(dev, priv->ct_mng);
+		priv->ct_mng = NULL;
+	}
 	flow_hw_free_vport_actions(priv);
 	for (i = 0; i < MLX5_HW_ACTION_FLAG_MAX; i++) {
 		if (priv->hw_drop[i])
@@ -4893,6 +5079,14 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 	}
 	if (priv->hws_cpool)
 		mlx5_hws_cnt_pool_destroy(priv->sh, priv->hws_cpool);
+	if (priv->hws_ctpool) {
+		flow_hw_ct_pool_destroy(dev, priv->hws_ctpool);
+		priv->hws_ctpool = NULL;
+	}
+	if (priv->ct_mng) {
+		flow_hw_ct_mng_destroy(dev, priv->ct_mng);
+		priv->ct_mng = NULL;
+	}
 	mlx5_free(priv->hw_q);
 	priv->hw_q = NULL;
 	claim_zero(mlx5dr_context_close(priv->dr_ctx));
@@ -4961,6 +5155,7 @@ void flow_hw_init_tags_set(struct rte_eth_dev *dev)
 		unset |= 1 << (REG_C_1 - REG_C_0);
 	masks &= ~unset;
 	if (mlx5_flow_hw_avl_tags_init_cnt) {
+		MLX5_ASSERT(mlx5_flow_hw_aso_tag == priv->mtr_color_reg);
 		for (i = 0; i < MLX5_FLOW_HW_TAGS_MAX; i++) {
 			if (mlx5_flow_hw_avl_tags[i] != REG_NON && !!((1 << i) & masks)) {
 				copy[mlx5_flow_hw_avl_tags[i] - REG_C_0] =
@@ -4983,6 +5178,7 @@ void flow_hw_init_tags_set(struct rte_eth_dev *dev)
 		}
 	}
 	priv->sh->hws_tags = 1;
+	mlx5_flow_hw_aso_tag = (enum modify_reg)priv->mtr_color_reg;
 	mlx5_flow_hw_avl_tags_init_cnt++;
 }
 
@@ -5053,6 +5249,170 @@ flow_hw_clear_flow_metadata_config(void)
 	mlx5_flow_hw_flow_metadata_xmeta_en = 0;
 }
 
+static int
+flow_hw_conntrack_destroy(struct rte_eth_dev *dev __rte_unused,
+			  uint32_t idx,
+			  struct rte_flow_error *error)
+{
+	uint16_t owner = (uint16_t)MLX5_ACTION_CTX_CT_GET_OWNER(idx);
+	uint32_t ct_idx = MLX5_ACTION_CTX_CT_GET_IDX(idx);
+	struct rte_eth_dev *owndev = &rte_eth_devices[owner];
+	struct mlx5_priv *priv = owndev->data->dev_private;
+	struct mlx5_aso_ct_pool *pool = priv->hws_ctpool;
+	struct mlx5_aso_ct_action *ct;
+
+	ct = mlx5_ipool_get(pool->cts, ct_idx);
+	if (!ct) {
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL,
+				"Invalid CT destruction index");
+	}
+	__atomic_store_n(&ct->state, ASO_CONNTRACK_FREE,
+				 __ATOMIC_RELAXED);
+	mlx5_ipool_free(pool->cts, ct_idx);
+	return 0;
+}
+
+static int
+flow_hw_conntrack_query(struct rte_eth_dev *dev, uint32_t idx,
+			struct rte_flow_action_conntrack *profile,
+			struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_ct_pool *pool = priv->hws_ctpool;
+	struct mlx5_aso_ct_action *ct;
+	uint16_t owner = (uint16_t)MLX5_ACTION_CTX_CT_GET_OWNER(idx);
+	uint32_t ct_idx;
+
+	if (owner != PORT_ID(priv))
+		return rte_flow_error_set(error, EACCES,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL,
+				"Can't query CT object owned by another port");
+	ct_idx = MLX5_ACTION_CTX_CT_GET_IDX(idx);
+	ct = mlx5_ipool_get(pool->cts, ct_idx);
+	if (!ct) {
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL,
+				"Invalid CT query index");
+	}
+	profile->peer_port = ct->peer;
+	profile->is_original_dir = ct->is_original;
+	if (mlx5_aso_ct_query_by_wqe(priv->sh, MLX5_HW_INV_QUEUE, ct, profile))
+		return rte_flow_error_set(error, EIO,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL,
+				"Failed to query CT context");
+	return 0;
+}
+
+
+static int
+flow_hw_conntrack_update(struct rte_eth_dev *dev, uint32_t queue,
+			 const struct rte_flow_modify_conntrack *action_conf,
+			 uint32_t idx, struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_ct_pool *pool = priv->hws_ctpool;
+	struct mlx5_aso_ct_action *ct;
+	const struct rte_flow_action_conntrack *new_prf;
+	uint16_t owner = (uint16_t)MLX5_ACTION_CTX_CT_GET_OWNER(idx);
+	uint32_t ct_idx;
+	int ret = 0;
+
+	if (PORT_ID(priv) != owner)
+		return rte_flow_error_set(error, EACCES,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL,
+					  "Can't update CT object owned by another port");
+	ct_idx = MLX5_ACTION_CTX_CT_GET_IDX(idx);
+	ct = mlx5_ipool_get(pool->cts, ct_idx);
+	if (!ct) {
+		return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL,
+				"Invalid CT update index");
+	}
+	new_prf = &action_conf->new_ct;
+	if (action_conf->direction)
+		ct->is_original = !!new_prf->is_original_dir;
+	if (action_conf->state) {
+		/* Only validate the profile when it needs to be updated. */
+		ret = mlx5_validate_action_ct(dev, new_prf, error);
+		if (ret)
+			return ret;
+		ret = mlx5_aso_ct_update_by_wqe(priv->sh, queue, ct, new_prf);
+		if (ret)
+			return rte_flow_error_set(error, EIO,
+					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					NULL,
+					"Failed to send CT context update WQE");
+		if (queue != MLX5_HW_INV_QUEUE)
+			return 0;
+		/* Block until ready or a failure in synchronous mode. */
+		ret = mlx5_aso_ct_available(priv->sh, queue, ct);
+		if (ret)
+			rte_flow_error_set(error, rte_errno,
+					   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					   NULL,
+					   "Timeout to get the CT update");
+	}
+	return ret;
+}
+
+static struct rte_flow_action_handle *
+flow_hw_conntrack_create(struct rte_eth_dev *dev, uint32_t queue,
+			 const struct rte_flow_action_conntrack *pro,
+			 struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_ct_pool *pool = priv->hws_ctpool;
+	struct mlx5_aso_ct_action *ct;
+	uint32_t ct_idx = 0;
+	int ret;
+	bool async = !!(queue != MLX5_HW_INV_QUEUE);
+
+	if (!pool) {
+		rte_flow_error_set(error, EINVAL,
+				   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+				   "CT is not enabled");
+		return 0;
+	}
+	ct = mlx5_ipool_zmalloc(pool->cts, &ct_idx);
+	if (!ct) {
+		rte_flow_error_set(error, rte_errno,
+				   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+				   "Failed to allocate CT object");
+		return 0;
+	}
+	ct->offset = ct_idx - 1;
+	ct->is_original = !!pro->is_original_dir;
+	ct->peer = pro->peer_port;
+	ct->pool = pool;
+	if (mlx5_aso_ct_update_by_wqe(priv->sh, queue, ct, pro)) {
+		mlx5_ipool_free(pool->cts, ct_idx);
+		rte_flow_error_set(error, EBUSY,
+				   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+				   "Failed to update CT");
+		return 0;
+	}
+	if (!async) {
+		ret = mlx5_aso_ct_available(priv->sh, queue, ct);
+		if (ret) {
+			mlx5_ipool_free(pool->cts, ct_idx);
+			rte_flow_error_set(error, rte_errno,
+					   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					   NULL,
+					   "Timeout to get the CT update");
+			return 0;
+		}
+	}
+	return (struct rte_flow_action_handle *)(uintptr_t)
+		MLX5_ACTION_CTX_CT_GEN_IDX(PORT_ID(priv), ct_idx);
+}
+
 /**
  * Create shared action.
  *
@@ -5100,6 +5460,9 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 			handle = (struct rte_flow_action_handle *)
 				 (uintptr_t)cnt_id;
 		break;
+	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+		handle = flow_hw_conntrack_create(dev, queue, action->conf, error);
+		break;
 	default:
 		handle = flow_dv_action_create(dev, conf, action, error);
 	}
@@ -5135,10 +5498,18 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 			     void *user_data,
 			     struct rte_flow_error *error)
 {
+	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
+	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+
 	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
 	RTE_SET_USED(user_data);
-	return flow_dv_action_update(dev, handle, update, error);
+	switch (type) {
+	case MLX5_INDIRECT_ACTION_TYPE_CT:
+		return flow_hw_conntrack_update(dev, queue, update, act_idx, error);
+	default:
+		return flow_dv_action_update(dev, handle, update, error);
+	}
 }
 
 /**
@@ -5177,6 +5548,8 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 	switch (type) {
 	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
 		return mlx5_hws_cnt_shared_put(priv->hws_cpool, &act_idx);
+	case MLX5_INDIRECT_ACTION_TYPE_CT:
+		return flow_hw_conntrack_destroy(dev, act_idx, error);
 	default:
 		return flow_dv_action_destroy(dev, handle, error);
 	}
@@ -5330,6 +5703,8 @@ flow_hw_action_query(struct rte_eth_dev *dev,
 	switch (type) {
 	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
 		return flow_hw_query_counter(dev, act_idx, data, error);
+	case MLX5_INDIRECT_ACTION_TYPE_CT:
+		return flow_hw_conntrack_query(dev, act_idx, data, error);
 	default:
 		return flow_dv_action_query(dev, handle, data, error);
 	}
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread
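
From the application's point of view, the conntrack support added above is
reached through the standard asynchronous indirect action API; a minimal
sketch (port_id, queue_id and peer_port_id are placeholders, error handling
omitted, only a few profile fields shown):

	struct rte_flow_error err;
	struct rte_flow_op_attr op = { .postpone = 0 };
	struct rte_flow_indir_action_conf ind_conf = { .ingress = 1 };
	struct rte_flow_action_conntrack ct_prof = {
		.peer_port = peer_port_id,
		.is_original_dir = 1,
		.enable = 1,
	};
	struct rte_flow_action act = {
		.type = RTE_FLOW_ACTION_TYPE_CONNTRACK,
		.conf = &ct_prof,
	};
	struct rte_flow_action_handle *handle;

	/* Served by flow_hw_conntrack_create() on the given flow queue. */
	handle = rte_flow_async_action_handle_create(port_id, queue_id, &op,
						     &ind_conf, &act, NULL, &err);
	/* Update/destroy dispatch to flow_hw_conntrack_update()/_destroy(). */
	rte_flow_async_action_handle_destroy(port_id, queue_id, &op, handle,
					     NULL, &err);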

* [PATCH v6 11/18] net/mlx5: add HW steering VLAN push, pop and VID modify flow actions
  2022-10-20 15:41 ` [PATCH v6 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (9 preceding siblings ...)
  2022-10-20 15:41   ` [PATCH v6 10/18] net/mlx5: add HW steering connection tracking support Suanming Mou
@ 2022-10-20 15:41   ` Suanming Mou
  2022-10-24  9:46     ` Slava Ovsiienko
  2022-10-20 15:41   ` [PATCH v6 12/18] net/mlx5: implement METER MARK indirect action for HWS Suanming Mou
                     ` (7 subsequent siblings)
  18 siblings, 1 reply; 140+ messages in thread
From: Suanming Mou @ 2022-10-20 15:41 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Gregory Etelson

From: Gregory Etelson <getelson@nvidia.com>

Add PMD implementation for HW steering VLAN push, pop and modify flow
actions.

The HWS VLAN push flow action is triggered by a sequence of the mandatory
OF_PUSH_VLAN and OF_SET_VLAN_VID flow action commands and the optional
OF_SET_VLAN_PCP flow action command.
The commands must be arranged in the exact order:
OF_PUSH_VLAN / OF_SET_VLAN_VID [ / OF_SET_VLAN_PCP ].
In a masked HWS VLAN push flow action template *ALL* of the above flow
actions must be masked.
In a non-masked HWS VLAN push flow action template *ALL* of the above flow
actions must be non-masked.

Example:

flow actions_template <port id> create \
actions_template_id <action id> \
template \
  of_push_vlan / \
  of_set_vlan_vid \
  [ / of_set_vlan_pcp  ] / end \
mask \
  of_push_vlan ethertype 0 / \
  of_set_vlan_vid vlan_vid 0 \
  [ / of_set_vlan_pcp vlan_pcp 0 ] / end\

flow actions_template <port id> create \
actions_template_id <action id> \
template \
  of_push_vlan ethertype <E>/ \
  of_set_vlan_vid vlan_vid <VID>\
  [ / of_set_vlan_pcp  <PCP>] / end \
mask \
  of_push_vlan ethertype <type != 0> / \
  of_set_vlan_vid vlan_vid <vid_mask != 0>\
  [ / of_set_vlan_pcp vlan_pcp <pcp_mask != 0> ] / end\

The HWS VLAN pop flow action is triggered by the OF_POP_VLAN
flow action command.
The HWS VLAN pop action template is always non-masked.

Example:

flow actions_template <port id> create \
actions_template_id <action id> \
template of_pop_vlan / end mask of_pop_vlan / end

The HWS VLAN VID modify flow action is triggered by a standalone
OF_SET_VLAN_VID flow action command.
The HWS VLAN VID modify action template can be either masked or non-masked.

Example:

flow actions_template <port id> create \
actions_template_id <action id> \
template of_set_vlan_vid / end mask of_set_vlan_vid vlan_vid 0 / end

flow actions_template <port id> create \
actions_template_id <action id> \
template of_set_vlan_vid vlan_vid 0x101 / end \
mask of_set_vlan_vid vlan_vid 0xffff / end
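
The same masked VLAN push template can also be built directly through the
rte_flow C API; a minimal sketch (port_id and the VID value are
placeholders, error handling omitted):

	struct rte_flow_actions_template_attr at_attr = { .ingress = 1 };
	struct rte_flow_action_of_push_vlan push = {
		.ethertype = RTE_BE16(RTE_ETHER_TYPE_VLAN),
	};
	struct rte_flow_action_of_set_vlan_vid vid = {
		.vlan_vid = RTE_BE16(0x123),
	};
	struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN, .conf = &push },
		{ .type = RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID, .conf = &vid },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	/* Masked template: the VLAN push fields in the masks are non-zero. */
	struct rte_flow_action masks[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN, .conf = &push },
		{ .type = RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID, .conf = &vid },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	struct rte_flow_error err;
	struct rte_flow_actions_template *at =
		rte_flow_actions_template_create(port_id, &at_attr,
						 actions, masks, &err);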

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
---
 drivers/net/mlx5/mlx5.h         |   2 +
 drivers/net/mlx5/mlx5_flow.h    |   4 +
 drivers/net/mlx5/mlx5_flow_dv.c |   2 +-
 drivers/net/mlx5/mlx5_flow_hw.c | 492 +++++++++++++++++++++++++++++---
 4 files changed, 463 insertions(+), 37 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index a5565dc391..7a6f13536e 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1678,6 +1678,8 @@ struct mlx5_priv {
 	struct mlx5_hw_q *hw_q;
 	/* HW steering rte flow table list header. */
 	LIST_HEAD(flow_hw_tbl, rte_flow_template_table) flow_hw_tbl;
+	struct mlx5dr_action *hw_push_vlan[MLX5DR_TABLE_TYPE_MAX];
+	struct mlx5dr_action *hw_pop_vlan[MLX5DR_TABLE_TYPE_MAX];
 	struct mlx5dr_action **hw_vport;
 	/* HW steering global drop action. */
 	struct mlx5dr_action *hw_drop[2];
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 9621c167d7..ae1d30b0d0 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -2454,4 +2454,8 @@ int mlx5_flow_pattern_validate(struct rte_eth_dev *dev,
 		struct rte_flow_error *error);
 int flow_hw_table_update(struct rte_eth_dev *dev,
 			 struct rte_flow_error *error);
+int mlx5_flow_item_field_width(struct rte_eth_dev *dev,
+			   enum rte_flow_field_id field, int inherit,
+			   const struct rte_flow_attr *attr,
+			   struct rte_flow_error *error);
 #endif /* RTE_PMD_MLX5_FLOW_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 0b65167451..b7f3ee3d7e 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -1322,7 +1322,7 @@ flow_dv_convert_action_modify_ipv6_dscp
 					     MLX5_MODIFICATION_TYPE_SET, error);
 }
 
-static int
+int
 mlx5_flow_item_field_width(struct rte_eth_dev *dev,
 			   enum rte_flow_field_id field, int inherit,
 			   const struct rte_flow_attr *attr,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index ac349d35ef..0c4e18a4bd 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -44,12 +44,22 @@
 /* Lowest priority for HW non-root table. */
 #define MLX5_HW_LOWEST_PRIO_NON_ROOT (UINT32_MAX)
 
+#define MLX5_HW_VLAN_PUSH_TYPE_IDX 0
+#define MLX5_HW_VLAN_PUSH_VID_IDX 1
+#define MLX5_HW_VLAN_PUSH_PCP_IDX 2
+
 static int flow_hw_flush_all_ctrl_flows(struct rte_eth_dev *dev);
 static int flow_hw_translate_group(struct rte_eth_dev *dev,
 				   const struct mlx5_flow_template_table_cfg *cfg,
 				   uint32_t group,
 				   uint32_t *table_group,
 				   struct rte_flow_error *error);
+static __rte_always_inline int
+flow_hw_set_vlan_vid_construct(struct rte_eth_dev *dev,
+			       struct mlx5_hw_q_job *job,
+			       struct mlx5_action_construct_data *act_data,
+			       const struct mlx5_hw_actions *hw_acts,
+			       const struct rte_flow_action *action);
 
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops;
 
@@ -1065,6 +1075,52 @@ flow_hw_cnt_compile(struct rte_eth_dev *dev, uint32_t  start_pos,
 	return 0;
 }
 
+static __rte_always_inline bool
+is_of_vlan_pcp_present(const struct rte_flow_action *actions)
+{
+	/*
+	 * Order of RTE VLAN push actions is
+	 * OF_PUSH_VLAN / OF_SET_VLAN_VID [ / OF_SET_VLAN_PCP ]
+	 */
+	return actions[MLX5_HW_VLAN_PUSH_PCP_IDX].type ==
+		RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_PCP;
+}
+
+static __rte_always_inline bool
+is_template_masked_push_vlan(const struct rte_flow_action_of_push_vlan *mask)
+{
+	/*
+	 * In masked push VLAN template all RTE push actions are masked.
+	 */
+	return mask && mask->ethertype != 0;
+}
+
+static rte_be32_t vlan_hdr_to_be32(const struct rte_flow_action *actions)
+{
+/*
+ * The OpenFlow Switch Specification defines the 802.1q VID as 12+1 bits.
+ */
+	rte_be32_t type, vid, pcp;
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+	rte_be32_t vid_lo, vid_hi;
+#endif
+
+	type = ((const struct rte_flow_action_of_push_vlan *)
+		actions[MLX5_HW_VLAN_PUSH_TYPE_IDX].conf)->ethertype;
+	vid = ((const struct rte_flow_action_of_set_vlan_vid *)
+		actions[MLX5_HW_VLAN_PUSH_VID_IDX].conf)->vlan_vid;
+	pcp = is_of_vlan_pcp_present(actions) ?
+	      ((const struct rte_flow_action_of_set_vlan_pcp *)
+		      actions[MLX5_HW_VLAN_PUSH_PCP_IDX].conf)->vlan_pcp : 0;
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+	vid_hi = vid & 0xff;
+	vid_lo = vid >> 8;
+	return (((vid_lo << 8) | (pcp << 5) | vid_hi) << 16) | type;
+#else
+	return (type << 16) | (pcp << 13) | vid;
+#endif
+}
+
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -1167,6 +1223,26 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
 				priv->hw_tag[!!attr->group];
 			flow_hw_rxq_flag_set(dev, true);
 			break;
+		case RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN:
+			action_pos = at->actions_off[actions - at->actions];
+			acts->rule_acts[action_pos].action =
+				priv->hw_push_vlan[type];
+			if (is_template_masked_push_vlan(masks->conf))
+				acts->rule_acts[action_pos].push_vlan.vlan_hdr =
+					vlan_hdr_to_be32(actions);
+			else if (__flow_hw_act_data_general_append
+					(priv, acts, actions->type,
+					 actions - action_start, action_pos))
+				goto err;
+			actions += is_of_vlan_pcp_present(actions) ?
+					MLX5_HW_VLAN_PUSH_PCP_IDX :
+					MLX5_HW_VLAN_PUSH_VID_IDX;
+			break;
+		case RTE_FLOW_ACTION_TYPE_OF_POP_VLAN:
+			action_pos = at->actions_off[actions - at->actions];
+			acts->rule_acts[action_pos].action =
+				priv->hw_pop_vlan[type];
+			break;
 		case RTE_FLOW_ACTION_TYPE_JUMP:
 			action_pos = at->actions_off[actions - at->actions];
 			if (masks->conf &&
@@ -1787,8 +1863,17 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		cnt_id_t cnt_id;
 
 		action = &actions[act_data->action_src];
-		MLX5_ASSERT(action->type == RTE_FLOW_ACTION_TYPE_INDIRECT ||
-			    (int)action->type == act_data->type);
+		/*
+		 * Actions template construction replaces
+		 * OF_SET_VLAN_VID with MODIFY_FIELD.
+		 */
+		if (action->type == RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID)
+			MLX5_ASSERT(act_data->type ==
+				    RTE_FLOW_ACTION_TYPE_MODIFY_FIELD);
+		else
+			MLX5_ASSERT(action->type ==
+				    RTE_FLOW_ACTION_TYPE_INDIRECT ||
+				    (int)action->type == act_data->type);
 		switch (act_data->type) {
 		case RTE_FLOW_ACTION_TYPE_INDIRECT:
 			if (flow_hw_shared_action_construct
@@ -1804,6 +1889,10 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			      (action->conf))->id);
 			rule_acts[act_data->action_dst].tag.value = tag;
 			break;
+		case RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN:
+			rule_acts[act_data->action_dst].push_vlan.vlan_hdr =
+				vlan_hdr_to_be32(action);
+			break;
 		case RTE_FLOW_ACTION_TYPE_JUMP:
 			jump_group = ((const struct rte_flow_action_jump *)
 						action->conf)->group;
@@ -1855,10 +1944,16 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 				    act_data->encap.len);
 			break;
 		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
-			ret = flow_hw_modify_field_construct(job,
-							     act_data,
-							     hw_acts,
-							     action);
+			if (action->type == RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID)
+				ret = flow_hw_set_vlan_vid_construct(dev, job,
+								     act_data,
+								     hw_acts,
+								     action);
+			else
+				ret = flow_hw_modify_field_construct(job,
+								     act_data,
+								     hw_acts,
+								     action);
 			if (ret)
 				return -1;
 			break;
@@ -2562,9 +2657,14 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 			mlx5_ipool_destroy(tbl->flow);
 		mlx5_free(tbl);
 	}
-	rte_flow_error_set(error, err,
-			  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
-			  "fail to create rte table");
+	if (error != NULL) {
+		rte_flow_error_set(error, err,
+				error->type == RTE_FLOW_ERROR_TYPE_NONE ?
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED : error->type,
+				NULL,
+				error->message == NULL ?
+				"fail to create rte table" : error->message);
+	}
 	return NULL;
 }
 
@@ -2868,28 +2968,76 @@ flow_hw_action_meta_copy_insert(const struct rte_flow_action actions[],
 				uint16_t *ins_pos)
 {
 	uint16_t idx, total = 0;
-	bool ins = false;
+	uint16_t end_idx = UINT16_MAX;
 	bool act_end = false;
+	bool modify_field = false;
+	bool rss_or_queue = false;
 
 	MLX5_ASSERT(actions && masks);
 	MLX5_ASSERT(new_actions && new_masks);
 	MLX5_ASSERT(ins_actions && ins_masks);
 	for (idx = 0; !act_end; idx++) {
-		if (idx >= MLX5_HW_MAX_ACTS)
-			return -1;
-		if (actions[idx].type == RTE_FLOW_ACTION_TYPE_RSS ||
-		    actions[idx].type == RTE_FLOW_ACTION_TYPE_QUEUE) {
-			ins = true;
-			*ins_pos = idx;
-		}
-		if (actions[idx].type == RTE_FLOW_ACTION_TYPE_END)
+		switch (actions[idx].type) {
+		case RTE_FLOW_ACTION_TYPE_RSS:
+		case RTE_FLOW_ACTION_TYPE_QUEUE:
+			/* It is assumed that the application provided only a single RSS/QUEUE action. */
+			MLX5_ASSERT(!rss_or_queue);
+			rss_or_queue = true;
+			break;
+		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
+			modify_field = true;
+			break;
+		case RTE_FLOW_ACTION_TYPE_END:
+			end_idx = idx;
 			act_end = true;
+			break;
+		default:
+			break;
+		}
 	}
-	if (!ins)
+	if (!rss_or_queue)
 		return 0;
-	else if (idx == MLX5_HW_MAX_ACTS)
+	else if (idx >= MLX5_HW_MAX_ACTS)
 		return -1; /* No more space. */
 	total = idx;
+	/*
+	 * If the actions template contains a MODIFY_FIELD action, then the meta copy action can be
+	 * inserted at the template's end. The position of the MODIFY_HDR action is based on the
+	 * position of the first MODIFY_FIELD flow action.
+	 */
+	if (modify_field) {
+		*ins_pos = end_idx;
+		goto insert_meta_copy;
+	}
+	/*
+	 * If the actions template does not contain a MODIFY_FIELD action, then the meta copy action
+	 * must be inserted at a place conforming with the action order defined in steering/mlx5dr_action.c.
+	 */
+	act_end = false;
+	for (idx = 0; !act_end; idx++) {
+		switch (actions[idx].type) {
+		case RTE_FLOW_ACTION_TYPE_COUNT:
+		case RTE_FLOW_ACTION_TYPE_METER:
+		case RTE_FLOW_ACTION_TYPE_METER_MARK:
+		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
+		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
+		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
+		case RTE_FLOW_ACTION_TYPE_RSS:
+		case RTE_FLOW_ACTION_TYPE_QUEUE:
+			*ins_pos = idx;
+			act_end = true;
+			break;
+		case RTE_FLOW_ACTION_TYPE_END:
+			act_end = true;
+			break;
+		default:
+			break;
+		}
+	}
+insert_meta_copy:
+	MLX5_ASSERT(*ins_pos != UINT16_MAX);
+	MLX5_ASSERT(*ins_pos < total);
 	/* Before the position, no change for the actions. */
 	for (idx = 0; idx < *ins_pos; idx++) {
 		new_actions[idx] = actions[idx];
@@ -2906,6 +3054,73 @@ flow_hw_action_meta_copy_insert(const struct rte_flow_action actions[],
 	return 0;
 }
 
+static int
+flow_hw_validate_action_push_vlan(struct rte_eth_dev *dev,
+				  const
+				  struct rte_flow_actions_template_attr *attr,
+				  const struct rte_flow_action *action,
+				  const struct rte_flow_action *mask,
+				  struct rte_flow_error *error)
+{
+#define X_FIELD(ptr, t, f) (((ptr)->conf) && ((t *)((ptr)->conf))->f)
+
+	const bool masked_push =
+		X_FIELD(mask + MLX5_HW_VLAN_PUSH_TYPE_IDX,
+			const struct rte_flow_action_of_push_vlan, ethertype);
+	bool masked_param;
+
+	/*
+	 * Mandatory actions order:
+	 * OF_PUSH_VLAN / OF_SET_VLAN_VID [ / OF_SET_VLAN_PCP ]
+	 */
+	RTE_SET_USED(dev);
+	RTE_SET_USED(attr);
+	/* Check that the mask matches OF_PUSH_VLAN. */
+	if (mask[MLX5_HW_VLAN_PUSH_TYPE_IDX].type !=
+	    RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION,
+					  action, "OF_PUSH_VLAN: mask does not match");
+	/* Check that the second template and mask items are SET_VLAN_VID */
+	if (action[MLX5_HW_VLAN_PUSH_VID_IDX].type !=
+	    RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID ||
+	    mask[MLX5_HW_VLAN_PUSH_VID_IDX].type !=
+	    RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION,
+					  action, "OF_PUSH_VLAN: invalid actions order");
+	masked_param = X_FIELD(mask + MLX5_HW_VLAN_PUSH_VID_IDX,
+			       const struct rte_flow_action_of_set_vlan_vid,
+			       vlan_vid);
+	/*
+	 * The PMD requires the OF_SET_VLAN_VID mask to match OF_PUSH_VLAN.
+	 */
+	if (masked_push ^ masked_param)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "OF_SET_VLAN_VID: mask does not match OF_PUSH_VLAN");
+	if (is_of_vlan_pcp_present(action)) {
+		if (mask[MLX5_HW_VLAN_PUSH_PCP_IDX].type !=
+		     RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_PCP)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  action, "OF_SET_VLAN_PCP: missing mask configuration");
+		masked_param = X_FIELD(mask + MLX5_HW_VLAN_PUSH_PCP_IDX,
+				       const struct
+				       rte_flow_action_of_set_vlan_pcp,
+				       vlan_pcp);
+		/*
+		 * The PMD requires the OF_SET_VLAN_PCP mask to match OF_PUSH_VLAN.
+		 */
+		if (masked_push ^ masked_param)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ACTION, action,
+						  "OF_SET_VLAN_PCP: mask does not match OF_PUSH_VLAN");
+	}
+	return 0;
+#undef X_FIELD
+}
+
 static int
 flow_hw_actions_validate(struct rte_eth_dev *dev,
 			const struct rte_flow_actions_template_attr *attr,
@@ -2996,6 +3211,18 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
 			/* TODO: Validation logic */
 			break;
+		case RTE_FLOW_ACTION_TYPE_OF_POP_VLAN:
+		case RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID:
+			break;
+		case RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN:
+			ret = flow_hw_validate_action_push_vlan
+					(dev, attr, action, mask, error);
+			if (ret != 0)
+				return ret;
+			i += is_of_vlan_pcp_present(action) ?
+				MLX5_HW_VLAN_PUSH_PCP_IDX :
+				MLX5_HW_VLAN_PUSH_VID_IDX;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -3023,6 +3250,8 @@ static enum mlx5dr_action_type mlx5_hw_dr_action_types[] = {
 	[RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT] = MLX5DR_ACTION_TYP_VPORT,
 	[RTE_FLOW_ACTION_TYPE_COUNT] = MLX5DR_ACTION_TYP_CTR,
 	[RTE_FLOW_ACTION_TYPE_CONNTRACK] = MLX5DR_ACTION_TYP_ASO_CT,
+	[RTE_FLOW_ACTION_TYPE_OF_POP_VLAN] = MLX5DR_ACTION_TYP_POP_VLAN,
+	[RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN] = MLX5DR_ACTION_TYP_PUSH_VLAN,
 };
 
 static int
@@ -3139,6 +3368,14 @@ flow_hw_dr_actions_template_create(struct rte_flow_actions_template *at)
 				goto err_actions_num;
 			action_types[curr_off++] = MLX5DR_ACTION_TYP_FT;
 			break;
+		case RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN:
+			type = mlx5_hw_dr_action_types[at->actions[i].type];
+			at->actions_off[i] = curr_off;
+			action_types[curr_off++] = type;
+			i += is_of_vlan_pcp_present(at->actions + i) ?
+				MLX5_HW_VLAN_PUSH_PCP_IDX :
+				MLX5_HW_VLAN_PUSH_VID_IDX;
+			break;
 		default:
 			type = mlx5_hw_dr_action_types[at->actions[i].type];
 			at->actions_off[i] = curr_off;
@@ -3166,6 +3403,89 @@ flow_hw_dr_actions_template_create(struct rte_flow_actions_template *at)
 	return NULL;
 }
 
+static void
+flow_hw_set_vlan_vid(struct rte_eth_dev *dev,
+		     struct rte_flow_action *ra,
+		     struct rte_flow_action *rm,
+		     struct rte_flow_action_modify_field *spec,
+		     struct rte_flow_action_modify_field *mask,
+		     int set_vlan_vid_ix)
+{
+	struct rte_flow_error error;
+	const bool masked = rm[set_vlan_vid_ix].conf &&
+		(((const struct rte_flow_action_of_set_vlan_vid *)
+			rm[set_vlan_vid_ix].conf)->vlan_vid != 0);
+	const struct rte_flow_action_of_set_vlan_vid *conf =
+		ra[set_vlan_vid_ix].conf;
+	rte_be16_t vid = masked ? conf->vlan_vid : 0;
+	int width = mlx5_flow_item_field_width(dev, RTE_FLOW_FIELD_VLAN_ID, 0,
+					       NULL, &error);
+	*spec = (typeof(*spec)) {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = RTE_FLOW_FIELD_VLAN_ID,
+			.level = 0, .offset = 0,
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+			.level = vid,
+			.offset = 0,
+		},
+		.width = width,
+	};
+	*mask = (typeof(*mask)) {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = RTE_FLOW_FIELD_VLAN_ID,
+			.level = 0xffffffff, .offset = 0xffffffff,
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+			.level = masked ? (1U << width) - 1 : 0,
+			.offset = 0,
+		},
+		.width = 0xffffffff,
+	};
+	ra[set_vlan_vid_ix].type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD;
+	ra[set_vlan_vid_ix].conf = spec;
+	rm[set_vlan_vid_ix].type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD;
+	rm[set_vlan_vid_ix].conf = mask;
+}
+
+static __rte_always_inline int
+flow_hw_set_vlan_vid_construct(struct rte_eth_dev *dev,
+			       struct mlx5_hw_q_job *job,
+			       struct mlx5_action_construct_data *act_data,
+			       const struct mlx5_hw_actions *hw_acts,
+			       const struct rte_flow_action *action)
+{
+	struct rte_flow_error error;
+	rte_be16_t vid = ((const struct rte_flow_action_of_set_vlan_vid *)
+			   action->conf)->vlan_vid;
+	int width = mlx5_flow_item_field_width(dev, RTE_FLOW_FIELD_VLAN_ID, 0,
+					       NULL, &error);
+	struct rte_flow_action_modify_field conf = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = RTE_FLOW_FIELD_VLAN_ID,
+			.level = 0, .offset = 0,
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+			.level = vid,
+			.offset = 0,
+		},
+		.width = width,
+	};
+	struct rte_flow_action modify_action = {
+		.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+		.conf = &conf
+	};
+
+	return flow_hw_modify_field_construct(job, act_data, hw_acts,
+					      &modify_action);
+}
+
 /**
  * Create flow action template.
  *
@@ -3191,14 +3511,18 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 			struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	int len, act_num, act_len, mask_len;
+	int len, act_len, mask_len;
+	unsigned int act_num;
 	unsigned int i;
 	struct rte_flow_actions_template *at = NULL;
-	uint16_t pos = MLX5_HW_MAX_ACTS;
+	uint16_t pos = UINT16_MAX;
 	struct rte_flow_action tmp_action[MLX5_HW_MAX_ACTS];
 	struct rte_flow_action tmp_mask[MLX5_HW_MAX_ACTS];
-	const struct rte_flow_action *ra;
-	const struct rte_flow_action *rm;
+	struct rte_flow_action *ra = (void *)(uintptr_t)actions;
+	struct rte_flow_action *rm = (void *)(uintptr_t)masks;
+	int set_vlan_vid_ix = -1;
+	struct rte_flow_action_modify_field set_vlan_vid_spec = {0, };
+	struct rte_flow_action_modify_field set_vlan_vid_mask = {0, };
 	const struct rte_flow_action_modify_field rx_mreg = {
 		.operation = RTE_FLOW_MODIFY_SET,
 		.dst = {
@@ -3238,21 +3562,58 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 		return NULL;
 	if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
 	    priv->sh->config.dv_esw_en) {
+		/* The application should make sure only one Q/RSS action exists in one rule. */
 		if (flow_hw_action_meta_copy_insert(actions, masks, &rx_cpy, &rx_cpy_mask,
 						    tmp_action, tmp_mask, &pos)) {
 			rte_flow_error_set(error, EINVAL,
 					   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					   "Failed to concatenate new action/mask");
 			return NULL;
+		} else if (pos != UINT16_MAX) {
+			ra = tmp_action;
+			rm = tmp_mask;
 		}
 	}
-	/* Application should make sure only one Q/RSS exist in one rule. */
-	if (pos == MLX5_HW_MAX_ACTS) {
-		ra = actions;
-		rm = masks;
-	} else {
-		ra = tmp_action;
-		rm = tmp_mask;
+	for (i = 0; ra[i].type != RTE_FLOW_ACTION_TYPE_END; ++i) {
+		switch (ra[i].type) {
+		/* OF_PUSH_VLAN *MUST* come before OF_SET_VLAN_VID */
+		case RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN:
+			i += is_of_vlan_pcp_present(ra + i) ?
+				MLX5_HW_VLAN_PUSH_PCP_IDX :
+				MLX5_HW_VLAN_PUSH_VID_IDX;
+			break;
+		case RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID:
+			set_vlan_vid_ix = i;
+			break;
+		default:
+			break;
+		}
+	}
+	/*
+	 * Count flow actions to allocate the required space for storing DR offsets
+	 * and to check that the temporary buffer is not overrun.
+	 */
+	act_num = i + 1;
+	if (act_num >= MLX5_HW_MAX_ACTS) {
+		rte_flow_error_set(error, EINVAL,
+				   RTE_FLOW_ERROR_TYPE_ACTION, NULL, "Too many actions");
+		return NULL;
+	}
+	if (set_vlan_vid_ix != -1) {
+		/* If temporary action buffer was not used, copy template actions to it */
+		if (ra == actions && rm == masks) {
+			for (i = 0; i < act_num; ++i) {
+				tmp_action[i] = actions[i];
+				tmp_mask[i] = masks[i];
+				if (actions[i].type == RTE_FLOW_ACTION_TYPE_END)
+					break;
+			}
+			ra = tmp_action;
+			rm = tmp_mask;
+		}
+		flow_hw_set_vlan_vid(dev, ra, rm,
+				     &set_vlan_vid_spec, &set_vlan_vid_mask,
+				     set_vlan_vid_ix);
 	}
 	act_len = rte_flow_conv(RTE_FLOW_CONV_OP_ACTIONS, NULL, 0, ra, error);
 	if (act_len <= 0)
@@ -3262,10 +3623,6 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 	if (mask_len <= 0)
 		return NULL;
 	len += RTE_ALIGN(mask_len, 16);
-	/* Count flow actions to allocate required space for storing DR offsets. */
-	act_num = 0;
-	for (i = 0; ra[i].type != RTE_FLOW_ACTION_TYPE_END; ++i)
-		act_num++;
 	len += RTE_ALIGN(act_num * sizeof(*at->actions_off), 16);
 	at = mlx5_malloc(MLX5_MEM_ZERO, len + sizeof(*at),
 			 RTE_CACHE_LINE_SIZE, rte_socket_id());
@@ -4513,7 +4870,11 @@ flow_hw_create_tx_default_mreg_copy_table(struct rte_eth_dev *dev,
 		.attr = tx_tbl_attr,
 		.external = false,
 	};
-	struct rte_flow_error drop_err;
+	struct rte_flow_error drop_err = {
+		.type = RTE_FLOW_ERROR_TYPE_NONE,
+		.cause = NULL,
+		.message = NULL,
+	};
 
 	RTE_SET_USED(drop_err);
 	return flow_hw_table_create(dev, &tx_tbl_cfg, &pt, 1, &at, 1, &drop_err);
@@ -4794,6 +5155,60 @@ flow_hw_ct_pool_create(struct rte_eth_dev *dev,
 	return NULL;
 }
 
+static void
+flow_hw_destroy_vlan(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	enum mlx5dr_table_type i;
+
+	for (i = MLX5DR_TABLE_TYPE_NIC_RX; i < MLX5DR_TABLE_TYPE_MAX; i++) {
+		if (priv->hw_pop_vlan[i]) {
+			mlx5dr_action_destroy(priv->hw_pop_vlan[i]);
+			priv->hw_pop_vlan[i] = NULL;
+		}
+		if (priv->hw_push_vlan[i]) {
+			mlx5dr_action_destroy(priv->hw_push_vlan[i]);
+			priv->hw_push_vlan[i] = NULL;
+		}
+	}
+}
+
+static int
+flow_hw_create_vlan(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	enum mlx5dr_table_type i;
+	const enum mlx5dr_action_flags flags[MLX5DR_TABLE_TYPE_MAX] = {
+		MLX5DR_ACTION_FLAG_HWS_RX,
+		MLX5DR_ACTION_FLAG_HWS_TX,
+		MLX5DR_ACTION_FLAG_HWS_FDB
+	};
+
+	for (i = MLX5DR_TABLE_TYPE_NIC_RX; i <= MLX5DR_TABLE_TYPE_NIC_TX; i++) {
+		priv->hw_pop_vlan[i] =
+			mlx5dr_action_create_pop_vlan(priv->dr_ctx, flags[i]);
+		if (!priv->hw_pop_vlan[i])
+			return -ENOENT;
+		priv->hw_push_vlan[i] =
+			mlx5dr_action_create_push_vlan(priv->dr_ctx, flags[i]);
+		if (!priv->hw_push_vlan[i])
+			return -ENOENT;
+	}
+	if (priv->sh->config.dv_esw_en && priv->master) {
+		priv->hw_pop_vlan[MLX5DR_TABLE_TYPE_FDB] =
+			mlx5dr_action_create_pop_vlan
+				(priv->dr_ctx, MLX5DR_ACTION_FLAG_HWS_FDB);
+		if (!priv->hw_pop_vlan[MLX5DR_TABLE_TYPE_FDB])
+			return -ENOENT;
+		priv->hw_push_vlan[MLX5DR_TABLE_TYPE_FDB] =
+			mlx5dr_action_create_push_vlan
+				(priv->dr_ctx, MLX5DR_ACTION_FLAG_HWS_FDB);
+		if (!priv->hw_push_vlan[MLX5DR_TABLE_TYPE_FDB])
+			return -ENOENT;
+	}
+	return 0;
+}
+
 /**
  * Configure port HWS resources.
  *
@@ -4996,6 +5411,9 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		if (priv->hws_cpool == NULL)
 			goto err;
 	}
+	ret = flow_hw_create_vlan(dev);
+	if (ret)
+		goto err;
 	return 0;
 err:
 	if (priv->hws_ctpool) {
@@ -5013,6 +5431,7 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		if (priv->hw_tag[i])
 			mlx5dr_action_destroy(priv->hw_tag[i]);
 	}
+	flow_hw_destroy_vlan(dev);
 	if (dr_ctx)
 		claim_zero(mlx5dr_context_close(dr_ctx));
 	mlx5_free(priv->hw_q);
@@ -5072,6 +5491,7 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 		if (priv->hw_tag[i])
 			mlx5dr_action_destroy(priv->hw_tag[i]);
 	}
+	flow_hw_destroy_vlan(dev);
 	flow_hw_free_vport_actions(priv);
 	if (priv->acts_ipool) {
 		mlx5_ipool_destroy(priv->acts_ipool);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v6 12/18] net/mlx5: implement METER MARK indirect action for HWS
  2022-10-20 15:41 ` [PATCH v6 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (10 preceding siblings ...)
  2022-10-20 15:41   ` [PATCH v6 11/18] net/mlx5: add HW steering VLAN push, pop and VID modify flow actions Suanming Mou
@ 2022-10-20 15:41   ` Suanming Mou
  2022-10-24  9:46     ` Slava Ovsiienko
  2022-10-20 15:41   ` [PATCH v6 13/18] net/mlx5: add HWS AGE action support Suanming Mou
                     ` (6 subsequent siblings)
  18 siblings, 1 reply; 140+ messages in thread
From: Suanming Mou @ 2022-10-20 15:41 UTC (permalink / raw)
  To: Ferruh Yigit, Matan Azrad, Viacheslav Ovsiienko
  Cc: dev, rasland, orika, Alexander Kozyrev

From: Alexander Kozyrev <akozyrev@nvidia.com>

Add the ability to create an indirect action handle for METER_MARK.
It allows one meter to be shared between several different actions.
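
For reference, below is a minimal sketch of how an application could
create such a handle through the asynchronous flow API. The port_id,
queue_id and profile variables are placeholders assumed to be set up
beforehand (e.g. the profile obtained after rte_mtr_meter_profile_add());
this snippet is illustrative only and not part of the patch itself.

	/* Sketch only: placeholder profile handle, color-blind meter. */
	struct rte_flow_action_meter_mark mtr_mark = {
		.profile = profile,
		.color_mode = 0,
		.init_color = RTE_COLOR_GREEN,
		.state = 1,
	};
	const struct rte_flow_action action = {
		.type = RTE_FLOW_ACTION_TYPE_METER_MARK,
		.conf = &mtr_mark,
	};
	const struct rte_flow_indir_action_conf indir_conf = {
		.ingress = 1,
	};
	const struct rte_flow_op_attr op_attr = { .postpone = 0 };
	struct rte_flow_error flow_err;
	struct rte_flow_action_handle *handle =
		rte_flow_async_action_handle_create(port_id, queue_id,
						    &op_attr, &indir_conf,
						    &action, NULL, &flow_err);

The returned handle can then be referenced from multiple flow rules via
RTE_FLOW_ACTION_TYPE_INDIRECT, so the same meter state is applied to all
of them.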

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
---
 doc/guides/nics/features/default.ini |   1 +
 doc/guides/nics/features/mlx5.ini    |   1 +
 doc/guides/nics/mlx5.rst             |   3 +
 drivers/net/mlx5/mlx5.c              |   4 +-
 drivers/net/mlx5/mlx5.h              |  33 +-
 drivers/net/mlx5/mlx5_flow.c         |   6 +
 drivers/net/mlx5/mlx5_flow.h         |  20 +-
 drivers/net/mlx5/mlx5_flow_aso.c     | 141 +++++++--
 drivers/net/mlx5/mlx5_flow_dv.c      | 145 ++++++++-
 drivers/net/mlx5/mlx5_flow_hw.c      | 437 +++++++++++++++++++++++----
 drivers/net/mlx5/mlx5_flow_meter.c   |  92 +++++-
 11 files changed, 778 insertions(+), 105 deletions(-)

diff --git a/doc/guides/nics/features/default.ini b/doc/guides/nics/features/default.ini
index 67ba3567c2..54af499f57 100644
--- a/doc/guides/nics/features/default.ini
+++ b/doc/guides/nics/features/default.ini
@@ -200,3 +200,4 @@ set_ttl              =
 vf                   =
 vxlan_decap          =
 vxlan_encap          =
+meter_mark           =
diff --git a/doc/guides/nics/features/mlx5.ini b/doc/guides/nics/features/mlx5.ini
index b129f5787d..de4b109c31 100644
--- a/doc/guides/nics/features/mlx5.ini
+++ b/doc/guides/nics/features/mlx5.ini
@@ -130,3 +130,4 @@ set_tp_src           = Y
 set_ttl              = Y
 vxlan_decap          = Y
 vxlan_encap          = Y
+meter_mark           = Y
diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index e499c38dcf..12646550b0 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -105,6 +105,7 @@ Features
 - Sub-Function representors.
 - Sub-Function.
 - Matching on represented port.
+- Meter mark.
 
 
 Limitations
@@ -485,6 +486,8 @@ Limitations
     if meter has drop count
     or meter hierarchy contains any meter that uses drop count,
     it cannot be used by flow rule matching all ports.
+  - When using HWS flow engine (``dv_flow_en`` = 2),
+    only the meter mark action is supported.
 
 - Integrity:
 
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 6490ac636c..64a0e6f31d 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -442,7 +442,7 @@ mlx5_flow_aso_age_mng_init(struct mlx5_dev_ctx_shared *sh)
 		rte_errno = ENOMEM;
 		return -ENOMEM;
 	}
-	err = mlx5_aso_queue_init(sh, ASO_OPC_MOD_FLOW_HIT);
+	err = mlx5_aso_queue_init(sh, ASO_OPC_MOD_FLOW_HIT, 1);
 	if (err) {
 		mlx5_free(sh->aso_age_mng);
 		return -1;
@@ -763,7 +763,7 @@ mlx5_flow_aso_ct_mng_init(struct mlx5_dev_ctx_shared *sh)
 		rte_errno = ENOMEM;
 		return -rte_errno;
 	}
-	err = mlx5_aso_queue_init(sh, ASO_OPC_MOD_CONNECTION_TRACKING);
+	err = mlx5_aso_queue_init(sh, ASO_OPC_MOD_CONNECTION_TRACKING, MLX5_ASO_CT_SQ_NUM);
 	if (err) {
 		mlx5_free(sh->ct_mng);
 		/* rte_errno should be extracted from the failure. */
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 7a6f13536e..2bf5bf553e 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -976,12 +976,16 @@ enum mlx5_aso_mtr_type {
 
 /* Generic aso_flow_meter information. */
 struct mlx5_aso_mtr {
-	LIST_ENTRY(mlx5_aso_mtr) next;
+	union {
+		LIST_ENTRY(mlx5_aso_mtr) next;
+		struct mlx5_aso_mtr_pool *pool;
+	};
 	enum mlx5_aso_mtr_type type;
 	struct mlx5_flow_meter_info fm;
 	/**< Pointer to the next aso flow meter structure. */
 	uint8_t state; /**< ASO flow meter state. */
 	uint32_t offset;
+	enum rte_color init_color;
 };
 
 /* Generic aso_flow_meter pool structure. */
@@ -990,7 +994,11 @@ struct mlx5_aso_mtr_pool {
 	/*Must be the first in pool*/
 	struct mlx5_devx_obj *devx_obj;
 	/* The devx object of the minimum aso flow meter ID. */
+	struct mlx5dr_action *action; /* HWS action. */
+	struct mlx5_indexed_pool *idx_pool; /* HWS index pool. */
 	uint32_t index; /* Pool index in management structure. */
+	uint32_t nb_sq; /* Number of ASO SQ. */
+	struct mlx5_aso_sq *sq; /* ASO SQs. */
 };
 
 LIST_HEAD(aso_meter_list, mlx5_aso_mtr);
@@ -1691,6 +1699,7 @@ struct mlx5_priv {
 	struct mlx5_aso_ct_pools_mng *ct_mng;
 	/* Management data for ASO connection tracking. */
 	struct mlx5_aso_ct_pool *hws_ctpool; /* HW steering's CT pool. */
+	struct mlx5_aso_mtr_pool *hws_mpool; /* HW steering's Meter pool. */
 #endif
 };
 
@@ -2011,7 +2020,8 @@ void mlx5_pmd_socket_uninit(void);
 int mlx5_flow_meter_init(struct rte_eth_dev *dev,
 			 uint32_t nb_meters,
 			 uint32_t nb_meter_profiles,
-			 uint32_t nb_meter_policies);
+			 uint32_t nb_meter_policies,
+			 uint32_t nb_queues);
 void mlx5_flow_meter_uninit(struct rte_eth_dev *dev);
 int mlx5_flow_meter_ops_get(struct rte_eth_dev *dev, void *arg);
 struct mlx5_flow_meter_info *mlx5_flow_meter_find(struct mlx5_priv *priv,
@@ -2080,15 +2090,24 @@ eth_tx_burst_t mlx5_select_tx_function(struct rte_eth_dev *dev);
 
 /* mlx5_flow_aso.c */
 
+int mlx5_aso_mtr_queue_init(struct mlx5_dev_ctx_shared *sh,
+			    struct mlx5_aso_mtr_pool *hws_pool,
+			    struct mlx5_aso_mtr_pools_mng *pool_mng,
+			    uint32_t nb_queues);
+void mlx5_aso_mtr_queue_uninit(struct mlx5_dev_ctx_shared *sh,
+			       struct mlx5_aso_mtr_pool *hws_pool,
+			       struct mlx5_aso_mtr_pools_mng *pool_mng);
 int mlx5_aso_queue_init(struct mlx5_dev_ctx_shared *sh,
-		enum mlx5_access_aso_opc_mod aso_opc_mod);
+			enum mlx5_access_aso_opc_mod aso_opc_mode,
+			uint32_t nb_queues);
 int mlx5_aso_flow_hit_queue_poll_start(struct mlx5_dev_ctx_shared *sh);
 int mlx5_aso_flow_hit_queue_poll_stop(struct mlx5_dev_ctx_shared *sh);
 void mlx5_aso_queue_uninit(struct mlx5_dev_ctx_shared *sh,
-		enum mlx5_access_aso_opc_mod aso_opc_mod);
-int mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
-		struct mlx5_aso_mtr *mtr, struct mlx5_mtr_bulk *bulk);
-int mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh,
+			   enum mlx5_access_aso_opc_mod aso_opc_mod);
+int mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
+				 struct mlx5_aso_mtr *mtr,
+				 struct mlx5_mtr_bulk *bulk);
+int mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 		struct mlx5_aso_mtr *mtr);
 int mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			      struct mlx5_aso_ct_action *ct,
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 82692120b1..aeaeb15f80 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -4223,6 +4223,12 @@ flow_action_handles_translate(struct rte_eth_dev *dev,
 						MLX5_RTE_FLOW_ACTION_TYPE_COUNT;
 			translated[handle->index].conf = (void *)(uintptr_t)idx;
 			break;
+		case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
+			translated[handle->index].type =
+						(enum rte_flow_action_type)
+						MLX5_RTE_FLOW_ACTION_TYPE_METER_MARK;
+			translated[handle->index].conf = (void *)(uintptr_t)idx;
+			break;
 		case MLX5_INDIRECT_ACTION_TYPE_AGE:
 			if (priv->sh->flow_hit_aso_en) {
 				translated[handle->index].type =
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index ae1d30b0d0..81bb7a70c1 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -46,6 +46,7 @@ enum mlx5_rte_flow_action_type {
 	MLX5_RTE_FLOW_ACTION_TYPE_COUNT,
 	MLX5_RTE_FLOW_ACTION_TYPE_JUMP,
 	MLX5_RTE_FLOW_ACTION_TYPE_RSS,
+	MLX5_RTE_FLOW_ACTION_TYPE_METER_MARK,
 };
 
 /* Private (internal) Field IDs for MODIFY_FIELD action. */
@@ -54,22 +55,23 @@ enum mlx5_rte_flow_field_id {
 			MLX5_RTE_FLOW_FIELD_META_REG,
 };
 
-#define MLX5_INDIRECT_ACTION_TYPE_OFFSET 30
+#define MLX5_INDIRECT_ACTION_TYPE_OFFSET 29
 
 enum {
 	MLX5_INDIRECT_ACTION_TYPE_RSS,
 	MLX5_INDIRECT_ACTION_TYPE_AGE,
 	MLX5_INDIRECT_ACTION_TYPE_COUNT,
 	MLX5_INDIRECT_ACTION_TYPE_CT,
+	MLX5_INDIRECT_ACTION_TYPE_METER_MARK,
 };
 
-/* Now, the maximal ports will be supported is 256, action number is 4M. */
-#define MLX5_INDIRECT_ACT_CT_MAX_PORT 0x100
+/* Now, the maximum number of supported ports is 16 and the action number is 32M. */
+#define MLX5_INDIRECT_ACT_CT_MAX_PORT 0x10
 
 #define MLX5_INDIRECT_ACT_CT_OWNER_SHIFT 22
 #define MLX5_INDIRECT_ACT_CT_OWNER_MASK (MLX5_INDIRECT_ACT_CT_MAX_PORT - 1)
 
-/* 30-31: type, 22-29: owner port, 0-21: index. */
+/* 29-31: type, 25-28: owner port, 0-24: index */
 #define MLX5_INDIRECT_ACT_CT_GEN_IDX(owner, index) \
 	((MLX5_INDIRECT_ACTION_TYPE_CT << MLX5_INDIRECT_ACTION_TYPE_OFFSET) | \
 	 (((owner) & MLX5_INDIRECT_ACT_CT_OWNER_MASK) << \
@@ -1117,6 +1119,7 @@ struct rte_flow_hw {
 	};
 	struct rte_flow_template_table *table; /* The table flow allcated from. */
 	uint32_t cnt_id;
+	uint32_t mtr_id;
 	uint8_t rule[0]; /* HWS layer data struct. */
 } __rte_packed;
 
@@ -1168,6 +1171,9 @@ struct mlx5_action_construct_data {
 		struct {
 			uint32_t id;
 		} shared_counter;
+		struct {
+			uint32_t id;
+		} shared_meter;
 	};
 };
 
@@ -1251,6 +1257,7 @@ struct mlx5_hw_actions {
 	uint16_t encap_decap_pos; /* Encap/Decap action position. */
 	uint32_t mark:1; /* Indicate the mark action. */
 	uint32_t cnt_id; /* Counter id. */
+	uint32_t mtr_id; /* Meter id. */
 	/* Translated DR action array from action template. */
 	struct mlx5dr_rule_action rule_acts[MLX5_HW_MAX_ACTS];
 };
@@ -1540,6 +1547,7 @@ flow_hw_get_reg_id(enum rte_flow_item_type type, uint32_t id)
 		 */
 		return REG_A;
 	case RTE_FLOW_ITEM_TYPE_CONNTRACK:
+	case RTE_FLOW_ITEM_TYPE_METER_COLOR:
 		return mlx5_flow_hw_aso_tag;
 	case RTE_FLOW_ITEM_TYPE_TAG:
 		MLX5_ASSERT(id < MLX5_FLOW_HW_TAGS_MAX);
@@ -1925,10 +1933,10 @@ mlx5_aso_meter_by_idx(struct mlx5_priv *priv, uint32_t idx)
 	struct mlx5_aso_mtr_pools_mng *pools_mng =
 				&priv->sh->mtrmng->pools_mng;
 
-	/* Decrease to original index. */
-	idx--;
 	if (priv->mtr_bulk.aso)
 		return priv->mtr_bulk.aso + idx;
+	/* Decrease to original index. */
+	idx--;
 	MLX5_ASSERT(idx / MLX5_ASO_MTRS_PER_POOL < pools_mng->n);
 	rte_rwlock_read_lock(&pools_mng->resize_mtrwl);
 	pool = pools_mng->pools[idx / MLX5_ASO_MTRS_PER_POOL];
diff --git a/drivers/net/mlx5/mlx5_flow_aso.c b/drivers/net/mlx5/mlx5_flow_aso.c
index c00c07b891..a5f58301eb 100644
--- a/drivers/net/mlx5/mlx5_flow_aso.c
+++ b/drivers/net/mlx5/mlx5_flow_aso.c
@@ -275,6 +275,65 @@ mlx5_aso_sq_create(struct mlx5_common_device *cdev, struct mlx5_aso_sq *sq,
 	return -1;
 }
 
+void
+mlx5_aso_mtr_queue_uninit(struct mlx5_dev_ctx_shared *sh __rte_unused,
+			  struct mlx5_aso_mtr_pool *hws_pool,
+			  struct mlx5_aso_mtr_pools_mng *pool_mng)
+{
+	uint32_t i;
+
+	if (hws_pool) {
+		for (i = 0; i < hws_pool->nb_sq; i++)
+			mlx5_aso_destroy_sq(hws_pool->sq + i);
+		mlx5_free(hws_pool->sq);
+		return;
+	}
+	if (pool_mng)
+		mlx5_aso_destroy_sq(&pool_mng->sq);
+}
+
+int
+mlx5_aso_mtr_queue_init(struct mlx5_dev_ctx_shared *sh,
+				struct mlx5_aso_mtr_pool *hws_pool,
+				struct mlx5_aso_mtr_pools_mng *pool_mng,
+				uint32_t nb_queues)
+{
+	struct mlx5_common_device *cdev = sh->cdev;
+	struct mlx5_aso_sq *sq;
+	uint32_t i;
+
+	if (hws_pool) {
+		sq = mlx5_malloc(MLX5_MEM_ZERO,
+			sizeof(struct mlx5_aso_sq) * nb_queues,
+			RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
+		if (!sq)
+			return -1;
+		hws_pool->sq = sq;
+		for (i = 0; i < nb_queues; i++) {
+			if (mlx5_aso_sq_create(cdev, hws_pool->sq + i,
+					       sh->tx_uar.obj,
+					       MLX5_ASO_QUEUE_LOG_DESC))
+				goto error;
+			mlx5_aso_mtr_init_sq(hws_pool->sq + i);
+		}
+		hws_pool->nb_sq = nb_queues;
+	}
+	if (pool_mng) {
+		if (mlx5_aso_sq_create(cdev, &pool_mng->sq,
+				       sh->tx_uar.obj,
+				       MLX5_ASO_QUEUE_LOG_DESC))
+			return -1;
+		mlx5_aso_mtr_init_sq(&pool_mng->sq);
+	}
+	return 0;
+error:
+	do {
+		mlx5_aso_destroy_sq(hws_pool->sq + i);
+	} while (i--);
+	return -1;
+}
+
 /**
  * API to create and initialize Send Queue used for ASO access.
  *
@@ -282,13 +341,16 @@ mlx5_aso_sq_create(struct mlx5_common_device *cdev, struct mlx5_aso_sq *sq,
  *   Pointer to shared device context.
  * @param[in] aso_opc_mod
  *   Mode of ASO feature.
+ * @param[in] nb_queues
+ *   Number of Send Queues to create.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
 mlx5_aso_queue_init(struct mlx5_dev_ctx_shared *sh,
-		    enum mlx5_access_aso_opc_mod aso_opc_mod)
+		    enum mlx5_access_aso_opc_mod aso_opc_mod,
+			uint32_t nb_queues)
 {
 	uint32_t sq_desc_n = 1 << MLX5_ASO_QUEUE_LOG_DESC;
 	struct mlx5_common_device *cdev = sh->cdev;
@@ -307,10 +369,9 @@ mlx5_aso_queue_init(struct mlx5_dev_ctx_shared *sh,
 		mlx5_aso_age_init_sq(&sh->aso_age_mng->aso_sq);
 		break;
 	case ASO_OPC_MOD_POLICER:
-		if (mlx5_aso_sq_create(cdev, &sh->mtrmng->pools_mng.sq,
-				       sh->tx_uar.obj, MLX5_ASO_QUEUE_LOG_DESC))
+		if (mlx5_aso_mtr_queue_init(sh, NULL,
+					    &sh->mtrmng->pools_mng, nb_queues))
 			return -1;
-		mlx5_aso_mtr_init_sq(&sh->mtrmng->pools_mng.sq);
 		break;
 	case ASO_OPC_MOD_CONNECTION_TRACKING:
 		if (mlx5_aso_ct_queue_init(sh, sh->ct_mng, MLX5_ASO_CT_SQ_NUM))
@@ -343,7 +404,7 @@ mlx5_aso_queue_uninit(struct mlx5_dev_ctx_shared *sh,
 		sq = &sh->aso_age_mng->aso_sq;
 		break;
 	case ASO_OPC_MOD_POLICER:
-		sq = &sh->mtrmng->pools_mng.sq;
+		mlx5_aso_mtr_queue_uninit(sh, NULL, &sh->mtrmng->pools_mng);
 		break;
 	case ASO_OPC_MOD_CONNECTION_TRACKING:
 		mlx5_aso_ct_queue_uninit(sh, sh->ct_mng);
@@ -666,7 +727,8 @@ static uint16_t
 mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 			       struct mlx5_aso_sq *sq,
 			       struct mlx5_aso_mtr *aso_mtr,
-			       struct mlx5_mtr_bulk *bulk)
+			       struct mlx5_mtr_bulk *bulk,
+				   bool need_lock)
 {
 	volatile struct mlx5_aso_wqe *wqe = NULL;
 	struct mlx5_flow_meter_info *fm = NULL;
@@ -679,11 +741,13 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	uint32_t param_le;
 	int id;
 
-	rte_spinlock_lock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_lock(&sq->sqsl);
 	res = size - (uint16_t)(sq->head - sq->tail);
 	if (unlikely(!res)) {
 		DRV_LOG(ERR, "Fail: SQ is full and no free WQE to send");
-		rte_spinlock_unlock(&sq->sqsl);
+		if (need_lock)
+			rte_spinlock_unlock(&sq->sqsl);
 		return 0;
 	}
 	wqe = &sq->sq_obj.aso_wqes[sq->head & mask];
@@ -692,8 +756,11 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	fm = &aso_mtr->fm;
 	sq->elts[sq->head & mask].mtr = aso_mtr;
 	if (aso_mtr->type == ASO_METER_INDIRECT) {
-		pool = container_of(aso_mtr, struct mlx5_aso_mtr_pool,
-				    mtrs[aso_mtr->offset]);
+		if (likely(sh->config.dv_flow_en == 2))
+			pool = aso_mtr->pool;
+		else
+			pool = container_of(aso_mtr, struct mlx5_aso_mtr_pool,
+					    mtrs[aso_mtr->offset]);
 		id = pool->devx_obj->id;
 	} else {
 		id = bulk->devx_obj->id;
@@ -756,7 +823,8 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
 			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
 			   !sh->tx_uar.dbnc);
-	rte_spinlock_unlock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_unlock(&sq->sqsl);
 	return 1;
 }
 
@@ -779,7 +847,7 @@ mlx5_aso_mtrs_status_update(struct mlx5_aso_sq *sq, uint16_t aso_mtrs_nums)
 }
 
 static void
-mlx5_aso_mtr_completion_handle(struct mlx5_aso_sq *sq)
+mlx5_aso_mtr_completion_handle(struct mlx5_aso_sq *sq, bool need_lock)
 {
 	struct mlx5_aso_cq *cq = &sq->cq;
 	volatile struct mlx5_cqe *restrict cqe;
@@ -791,7 +859,8 @@ mlx5_aso_mtr_completion_handle(struct mlx5_aso_sq *sq)
 	uint16_t n = 0;
 	int ret;
 
-	rte_spinlock_lock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_lock(&sq->sqsl);
 	max = (uint16_t)(sq->head - sq->tail);
 	if (unlikely(!max)) {
-		rte_spinlock_unlock(&sq->sqsl);
+		if (need_lock)
+			rte_spinlock_unlock(&sq->sqsl);
@@ -823,7 +892,8 @@ mlx5_aso_mtr_completion_handle(struct mlx5_aso_sq *sq)
 		rte_io_wmb();
 		cq->cq_obj.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci);
 	}
-	rte_spinlock_unlock(&sq->sqsl);
+	if (need_lock)
+		rte_spinlock_unlock(&sq->sqsl);
 }
 
 /**
@@ -840,16 +910,31 @@ mlx5_aso_mtr_completion_handle(struct mlx5_aso_sq *sq)
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
+mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			struct mlx5_aso_mtr *mtr,
 			struct mlx5_mtr_bulk *bulk)
 {
-	struct mlx5_aso_sq *sq = &sh->mtrmng->pools_mng.sq;
+	struct mlx5_aso_sq *sq;
 	uint32_t poll_wqe_times = MLX5_MTR_POLL_WQE_CQE_TIMES;
+	bool need_lock;
 
+	if (likely(sh->config.dv_flow_en == 2) &&
+	    mtr->type == ASO_METER_INDIRECT) {
+		if (queue == MLX5_HW_INV_QUEUE) {
+			sq = &mtr->pool->sq[mtr->pool->nb_sq - 1];
+			need_lock = true;
+		} else {
+			sq = &mtr->pool->sq[queue];
+			need_lock = false;
+		}
+	} else {
+		sq = &sh->mtrmng->pools_mng.sq;
+		need_lock = true;
+	}
 	do {
-		mlx5_aso_mtr_completion_handle(sq);
-		if (mlx5_aso_mtr_sq_enqueue_single(sh, sq, mtr, bulk))
+		mlx5_aso_mtr_completion_handle(sq, need_lock);
+		if (mlx5_aso_mtr_sq_enqueue_single(sh, sq, mtr,
+						   bulk, need_lock))
 			return 0;
 		/* Waiting for wqe resource. */
 		rte_delay_us_sleep(MLX5_ASO_WQE_CQE_RESPONSE_DELAY);
@@ -873,17 +958,31 @@ mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh,
+mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			struct mlx5_aso_mtr *mtr)
 {
-	struct mlx5_aso_sq *sq = &sh->mtrmng->pools_mng.sq;
+	struct mlx5_aso_sq *sq;
 	uint32_t poll_cqe_times = MLX5_MTR_POLL_WQE_CQE_TIMES;
+	bool need_lock;
 
+	if (likely(sh->config.dv_flow_en == 2) &&
+	    mtr->type == ASO_METER_INDIRECT) {
+		if (queue == MLX5_HW_INV_QUEUE) {
+			sq = &mtr->pool->sq[mtr->pool->nb_sq - 1];
+			need_lock = true;
+		} else {
+			sq = &mtr->pool->sq[queue];
+			need_lock = false;
+		}
+	} else {
+		sq = &sh->mtrmng->pools_mng.sq;
+		need_lock = true;
+	}
 	if (__atomic_load_n(&mtr->state, __ATOMIC_RELAXED) ==
 					    ASO_METER_READY)
 		return 0;
 	do {
-		mlx5_aso_mtr_completion_handle(sq);
+		mlx5_aso_mtr_completion_handle(sq, need_lock);
 		if (__atomic_load_n(&mtr->state, __ATOMIC_RELAXED) ==
 					    ASO_METER_READY)
 			return 0;
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index b7f3ee3d7e..d6a762d57d 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -1383,6 +1383,7 @@ mlx5_flow_item_field_width(struct rte_eth_dev *dev,
 		return inherit < 0 ? 0 : inherit;
 	case RTE_FLOW_FIELD_IPV4_ECN:
 	case RTE_FLOW_FIELD_IPV6_ECN:
+	case RTE_FLOW_FIELD_METER_COLOR:
 		return 2;
 	default:
 		MLX5_ASSERT(false);
@@ -1852,6 +1853,31 @@ mlx5_flow_field_id_to_modify_info
 				info[idx].offset = data->offset;
 		}
 		break;
+	case RTE_FLOW_FIELD_METER_COLOR:
+		{
+			const uint32_t color_mask =
+				(UINT32_C(1) << MLX5_MTR_COLOR_BITS) - 1;
+			int reg;
+
+			if (priv->sh->config.dv_flow_en == 2)
+				reg = flow_hw_get_reg_id
+					(RTE_FLOW_ITEM_TYPE_METER_COLOR, 0);
+			else
+				reg = mlx5_flow_get_reg_id(dev, MLX5_MTR_COLOR,
+						       0, error);
+			if (reg < 0)
+				return;
+			MLX5_ASSERT(reg != REG_NON);
+			MLX5_ASSERT((unsigned int)reg < RTE_DIM(reg_to_field));
+			info[idx] = (struct field_modify_info){4, 0,
+						reg_to_field[reg]};
+			if (mask)
+				mask[idx] = flow_modify_info_mask_32_masked
+					(width, data->offset, color_mask);
+			else
+				info[idx].offset = data->offset;
+		}
+		break;
 	case RTE_FLOW_FIELD_POINTER:
 	case RTE_FLOW_FIELD_VALUE:
 	default:
@@ -1909,7 +1935,9 @@ flow_dv_convert_action_modify_field
 		item.spec = conf->src.field == RTE_FLOW_FIELD_POINTER ?
 					(void *)(uintptr_t)conf->src.pvalue :
 					(void *)(uintptr_t)&conf->src.value;
-		if (conf->dst.field == RTE_FLOW_FIELD_META) {
+		if (conf->dst.field == RTE_FLOW_FIELD_META ||
+		    conf->dst.field == RTE_FLOW_FIELD_TAG ||
+		    conf->dst.field == RTE_FLOW_FIELD_METER_COLOR) {
 			meta = *(const unaligned_uint32_t *)item.spec;
 			meta = rte_cpu_to_be_32(meta);
 			item.spec = &meta;
@@ -3683,6 +3711,69 @@ flow_dv_validate_action_aso_ct(struct rte_eth_dev *dev,
 	return 0;
 }
 
+/**
+ * Validate METER_COLOR item.
+ *
+ * @param[in] dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param[in] item
+ *   Item specification.
+ * @param[in] attr
+ *   Attributes of flow that includes this item.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_dv_validate_item_meter_color(struct rte_eth_dev *dev,
+			   const struct rte_flow_item *item,
+			   const struct rte_flow_attr *attr __rte_unused,
+			   struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_item_meter_color *spec = item->spec;
+	const struct rte_flow_item_meter_color *mask = item->mask;
+	struct rte_flow_item_meter_color nic_mask = {
+		.color = RTE_COLORS
+	};
+	int ret;
+
+	if (priv->mtr_color_reg == REG_NON)
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_ITEM, item,
+					  "meter color register"
+					  " isn't available");
+	ret = mlx5_flow_get_reg_id(dev, MLX5_MTR_COLOR, 0, error);
+	if (ret < 0)
+		return ret;
+	if (!spec)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ITEM_SPEC,
+					  item->spec,
+					  "data cannot be empty");
+	if (spec->color > RTE_COLORS)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION_CONF,
+					  &spec->color,
+					  "meter color is invalid");
+	if (!mask)
+		mask = &rte_flow_item_meter_color_mask;
+	if (!mask->color)
+		return rte_flow_error_set(error, EINVAL,
+					RTE_FLOW_ERROR_TYPE_ITEM_SPEC, NULL,
+					"mask cannot be zero");
+
+	ret = mlx5_flow_item_acceptable(item, (const uint8_t *)mask,
+				(const uint8_t *)&nic_mask,
+				sizeof(struct rte_flow_item_meter_color),
+				MLX5_ITEM_RANGE_NOT_ACCEPTED, error);
+	if (ret < 0)
+		return ret;
+	return 0;
+}
+
 int
 flow_dv_encap_decap_match_cb(void *tool_ctx __rte_unused,
 			     struct mlx5_list_entry *entry, void *cb_ctx)
@@ -6515,7 +6606,7 @@ flow_dv_mtr_container_resize(struct rte_eth_dev *dev)
 		return -ENOMEM;
 	}
 	if (!pools_mng->n)
-		if (mlx5_aso_queue_init(priv->sh, ASO_OPC_MOD_POLICER)) {
+		if (mlx5_aso_queue_init(priv->sh, ASO_OPC_MOD_POLICER, 1)) {
 			mlx5_free(pools);
 			return -ENOMEM;
 		}
@@ -7417,6 +7508,13 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			if (ret < 0)
 				return ret;
 			break;
+		case RTE_FLOW_ITEM_TYPE_METER_COLOR:
+			ret = flow_dv_validate_item_meter_color(dev, items,
+								attr, error);
+			if (ret < 0)
+				return ret;
+			last_item = MLX5_FLOW_ITEM_METER_COLOR;
+			break;
 		default:
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ITEM,
@@ -10510,6 +10608,45 @@ flow_dv_translate_item_flex(struct rte_eth_dev *dev, void *matcher, void *key,
 	mlx5_flex_flow_translate_item(dev, matcher, key, item, is_inner);
 }
 
+/**
+ * Add METER_COLOR item to matcher
+ *
+ * @param[in] dev
+ *   The device to configure through.
+ * @param[in, out] key
+ *   Flow matcher value.
+ * @param[in] item
+ *   Flow pattern to translate.
+ * @param[in] key_type
+ *   Set flow matcher mask or value.
+ */
+static void
+flow_dv_translate_item_meter_color(struct rte_eth_dev *dev, void *key,
+			    const struct rte_flow_item *item,
+			    uint32_t key_type)
+{
+	const struct rte_flow_item_meter_color *color_m = item->mask;
+	const struct rte_flow_item_meter_color *color_v = item->spec;
+	uint32_t value, mask;
+	int reg = REG_NON;
+
+	MLX5_ASSERT(color_v);
+	if (MLX5_ITEM_VALID(item, key_type))
+		return;
+	MLX5_ITEM_UPDATE(item, key_type, color_v, color_m,
+		&rte_flow_item_meter_color_mask);
+	value = rte_col_2_mlx5_col(color_v->color);
+	mask = color_m ?
+		color_m->color : (UINT32_C(1) << MLX5_MTR_COLOR_BITS) - 1;
+	if (!!(key_type & MLX5_SET_MATCHER_SW))
+		reg = mlx5_flow_get_reg_id(dev, MLX5_MTR_COLOR, 0, NULL);
+	else
+		reg = flow_hw_get_reg_id(RTE_FLOW_ITEM_TYPE_METER_COLOR, 0);
+	if (reg == REG_NON)
+		return;
+	flow_dv_match_meta_reg(key, (enum modify_reg)reg, value, mask);
+}
+
 static uint32_t matcher_zero[MLX5_ST_SZ_DW(fte_match_param)] = { 0 };
 
 #define HEADER_IS_ZERO(match_criteria, headers)				     \
@@ -13307,6 +13444,10 @@ flow_dv_translate_items(struct rte_eth_dev *dev,
 		/* No other protocol should follow eCPRI layer. */
 		last_item = MLX5_FLOW_LAYER_ECPRI;
 		break;
+	case RTE_FLOW_ITEM_TYPE_METER_COLOR:
+		flow_dv_translate_item_meter_color(dev, key, items, key_type);
+		last_item = MLX5_FLOW_ITEM_METER_COLOR;
+		break;
 	default:
 		break;
 	}
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 0c4e18a4bd..6a1c82978f 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -412,6 +412,10 @@ __flow_hw_action_template_destroy(struct rte_eth_dev *dev,
 		mlx5_hws_cnt_shared_put(priv->hws_cpool, &acts->cnt_id);
 		acts->cnt_id = 0;
 	}
+	if (acts->mtr_id) {
+		mlx5_ipool_free(priv->hws_mpool->idx_pool, acts->mtr_id);
+		acts->mtr_id = 0;
+	}
 }
 
 /**
@@ -628,6 +632,42 @@ __flow_hw_act_data_shared_cnt_append(struct mlx5_priv *priv,
 	return 0;
 }
 
+/**
+ * Append shared meter_mark action to the dynamic action list.
+ *
+ * @param[in] priv
+ *   Pointer to the port private data structure.
+ * @param[in] acts
+ *   Pointer to the template HW steering DR actions.
+ * @param[in] type
+ *   Action type.
+ * @param[in] action_src
+ *   Offset of source rte flow action.
+ * @param[in] action_dst
+ *   Offset of destination DR action.
+ * @param[in] mtr_id
+ *   Shared meter id.
+ *
+ * @return
+ *    0 on success, negative value otherwise and rte_errno is set.
+ */
+static __rte_always_inline int
+__flow_hw_act_data_shared_mtr_append(struct mlx5_priv *priv,
+				     struct mlx5_hw_actions *acts,
+				     enum rte_flow_action_type type,
+				     uint16_t action_src,
+				     uint16_t action_dst,
+				     cnt_id_t mtr_id)
+{
+	struct mlx5_action_construct_data *act_data;
+
+	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
+	if (!act_data)
+		return -1;
+	act_data->type = type;
+	act_data->shared_meter.id = mtr_id;
+	LIST_INSERT_HEAD(&acts->act_list, act_data, next);
+	return 0;
+}
 
 /**
  * Translate shared indirect action.
@@ -682,6 +722,13 @@ flow_hw_shared_action_translate(struct rte_eth_dev *dev,
 				       idx, &acts->rule_acts[action_dst]))
 			return -1;
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
+		if (__flow_hw_act_data_shared_mtr_append(priv, acts,
+			(enum rte_flow_action_type)
+			MLX5_RTE_FLOW_ACTION_TYPE_METER_MARK,
+			action_src, action_dst, idx))
+			return -1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type:%d", type);
 		break;
@@ -888,6 +935,7 @@ flow_hw_modify_field_compile(struct rte_eth_dev *dev,
 				(void *)(uintptr_t)&conf->src.value;
 		if (conf->dst.field == RTE_FLOW_FIELD_META ||
 		    conf->dst.field == RTE_FLOW_FIELD_TAG ||
+		    conf->dst.field == RTE_FLOW_FIELD_METER_COLOR ||
 		    conf->dst.field == (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG) {
 			value = *(const unaligned_uint32_t *)item.spec;
 			value = rte_cpu_to_be_32(value);
@@ -1047,7 +1095,7 @@ flow_hw_meter_compile(struct rte_eth_dev *dev,
 	acts->rule_acts[jump_pos].action = (!!group) ?
 				    acts->jump->hws_action :
 				    acts->jump->root_action;
-	if (mlx5_aso_mtr_wait(priv->sh, aso_mtr))
+	if (mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr))
 		return -ENOMEM;
 	return 0;
 }
@@ -1121,6 +1169,74 @@ static rte_be32_t vlan_hdr_to_be32(const struct rte_flow_action *actions)
 #endif
 }
 
+static __rte_always_inline struct mlx5_aso_mtr *
+flow_hw_meter_mark_alloc(struct rte_eth_dev *dev,
+			   const struct rte_flow_action *action,
+			   uint32_t queue)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
+	const struct rte_flow_action_meter_mark *meter_mark = action->conf;
+	struct mlx5_aso_mtr *aso_mtr;
+	struct mlx5_flow_meter_info *fm;
+	uint32_t mtr_id;
+
+	aso_mtr = mlx5_ipool_malloc(priv->hws_mpool->idx_pool, &mtr_id);
+	if (!aso_mtr)
+		return NULL;
+	/* Fill the flow meter parameters. */
+	aso_mtr->type = ASO_METER_INDIRECT;
+	fm = &aso_mtr->fm;
+	fm->meter_id = mtr_id;
+	fm->profile = (struct mlx5_flow_meter_profile *)(meter_mark->profile);
+	fm->is_enable = meter_mark->state;
+	fm->color_aware = meter_mark->color_mode;
+	aso_mtr->pool = pool;
+	aso_mtr->state = ASO_METER_WAIT;
+	aso_mtr->offset = mtr_id - 1;
+	aso_mtr->init_color = (meter_mark->color_mode) ?
+		meter_mark->init_color : RTE_COLOR_GREEN;
+	/* Update ASO flow meter by wqe. */
+	if (mlx5_aso_meter_update_by_wqe(priv->sh, queue, aso_mtr,
+					 &priv->mtr_bulk)) {
+		mlx5_ipool_free(pool->idx_pool, mtr_id);
+		return NULL;
+	}
+	/* Wait for ASO object completion. */
+	if (queue == MLX5_HW_INV_QUEUE &&
+	    mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr)) {
+		mlx5_ipool_free(pool->idx_pool, mtr_id);
+		return NULL;
+	}
+	return aso_mtr;
+}
+
+static __rte_always_inline int
+flow_hw_meter_mark_compile(struct rte_eth_dev *dev,
+			   uint16_t aso_mtr_pos,
+			   const struct rte_flow_action *action,
+			   struct mlx5dr_rule_action *acts,
+			   uint32_t *index,
+			   uint32_t queue)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
+	struct mlx5_aso_mtr *aso_mtr;
+
+	aso_mtr = flow_hw_meter_mark_alloc(dev, action, queue);
+	if (!aso_mtr)
+		return -1;
+
+	/* Compile METER_MARK action */
+	acts[aso_mtr_pos].action = pool->action;
+	acts[aso_mtr_pos].aso_meter.offset = aso_mtr->offset;
+	acts[aso_mtr_pos].aso_meter.init_color =
+		(enum mlx5dr_action_aso_meter_color)
+		rte_col_2_mlx5_col(aso_mtr->init_color);
+	*index = aso_mtr->fm.meter_id;
+	return 0;
+}
+
 /**
  * Translate rte_flow actions to DR action.
  *
@@ -1431,6 +1547,24 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
 				goto err;
 			}
 			break;
+		case RTE_FLOW_ACTION_TYPE_METER_MARK:
+			action_pos = at->actions_off[actions - at->actions];
+			if (actions->conf && masks->conf &&
+			    ((const struct rte_flow_action_meter_mark *)
+			     masks->conf)->profile) {
+				err = flow_hw_meter_mark_compile(dev,
+							action_pos, actions,
+							acts->rule_acts,
+							&acts->mtr_id,
+							MLX5_HW_INV_QUEUE);
+				if (err)
+					goto err;
+			} else if (__flow_hw_act_data_general_append(priv, acts,
+							actions->type,
+							actions - action_start,
+							action_pos))
+				goto err;
+			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
 			break;
@@ -1627,8 +1761,10 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev, uint32_t queue,
 				struct mlx5dr_rule_action *rule_act)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
 	struct mlx5_action_construct_data act_data;
 	struct mlx5_shared_action_rss *shared_rss;
+	struct mlx5_aso_mtr *aso_mtr;
 	uint32_t act_idx = (uint32_t)(uintptr_t)action->conf;
 	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
 	uint32_t idx = act_idx &
@@ -1664,6 +1800,17 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev, uint32_t queue,
 		if (flow_hw_ct_compile(dev, queue, idx, rule_act))
 			return -1;
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
+		/* Find ASO object. */
+		aso_mtr = mlx5_ipool_get(pool->idx_pool, idx);
+		if (!aso_mtr)
+			return -1;
+		rule_act->action = pool->action;
+		rule_act->aso_meter.offset = aso_mtr->offset;
+		rule_act->aso_meter.init_color =
+			(enum mlx5dr_action_aso_meter_color)
+			rte_col_2_mlx5_col(aso_mtr->init_color);
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type:%d", type);
 		break;
@@ -1733,6 +1880,7 @@ flow_hw_modify_field_construct(struct mlx5_hw_q_job *job,
 		rte_memcpy(values, mhdr_action->src.pvalue, sizeof(values));
 	if (mhdr_action->dst.field == RTE_FLOW_FIELD_META ||
 	    mhdr_action->dst.field == RTE_FLOW_FIELD_TAG ||
+	    mhdr_action->dst.field == RTE_FLOW_FIELD_METER_COLOR ||
 	    mhdr_action->dst.field == (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG) {
 		value_p = (unaligned_uint32_t *)values;
 		*value_p = rte_cpu_to_be_32(*value_p);
@@ -1810,6 +1958,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			  uint32_t queue)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
 	struct rte_flow_template_table *table = job->flow->table;
 	struct mlx5_action_construct_data *act_data;
 	const struct rte_flow_actions_template *at = hw_at->action_template;
@@ -1826,8 +1975,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	uint32_t ft_flag;
 	size_t encap_len = 0;
 	int ret;
-	struct mlx5_aso_mtr *mtr;
-	uint32_t mtr_id;
+	struct mlx5_aso_mtr *aso_mtr;
 
 	rte_memcpy(rule_acts, hw_acts->rule_acts, sizeof(*rule_acts) * at->dr_actions_num);
 	attr.group = table->grp->group_id;
@@ -1861,6 +2009,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		struct mlx5_hrxq *hrxq;
 		uint32_t ct_idx;
 		cnt_id_t cnt_id;
+		uint32_t mtr_id;
 
 		action = &actions[act_data->action_src];
 		/*
@@ -1967,13 +2116,13 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_METER:
 			meter = action->conf;
 			mtr_id = meter->mtr_id;
-			mtr = mlx5_aso_meter_by_idx(priv, mtr_id);
+			aso_mtr = mlx5_aso_meter_by_idx(priv, mtr_id);
 			rule_acts[act_data->action_dst].action =
 				priv->mtr_bulk.action;
 			rule_acts[act_data->action_dst].aso_meter.offset =
-								mtr->offset;
+								aso_mtr->offset;
 			jump = flow_hw_jump_action_register
-				(dev, &table->cfg, mtr->fm.group, NULL);
+				(dev, &table->cfg, aso_mtr->fm.group, NULL);
 			if (!jump)
 				return -1;
 			MLX5_ASSERT
@@ -1983,7 +2132,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 							 jump->root_action;
 			job->flow->jump = jump;
 			job->flow->fate_type = MLX5_FLOW_FATE_JUMP;
-			if (mlx5_aso_mtr_wait(priv->sh, mtr))
+			if (mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr))
 				return -1;
 			break;
 		case RTE_FLOW_ACTION_TYPE_COUNT:
@@ -2019,6 +2168,28 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 					       &rule_acts[act_data->action_dst]))
 				return -1;
 			break;
+		case MLX5_RTE_FLOW_ACTION_TYPE_METER_MARK:
+			mtr_id = act_data->shared_meter.id &
+				((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
+			/* Find ASO object. */
+			aso_mtr = mlx5_ipool_get(pool->idx_pool, mtr_id);
+			if (!aso_mtr)
+				return -1;
+			rule_acts[act_data->action_dst].action =
+							pool->action;
+			rule_acts[act_data->action_dst].aso_meter.offset =
+							aso_mtr->offset;
+			rule_acts[act_data->action_dst].aso_meter.init_color =
+				(enum mlx5dr_action_aso_meter_color)
+				rte_col_2_mlx5_col(aso_mtr->init_color);
+			break;
+		case RTE_FLOW_ACTION_TYPE_METER_MARK:
+			ret = flow_hw_meter_mark_compile(dev,
+				act_data->action_dst, action,
+				rule_acts, &job->flow->mtr_id, queue);
+			if (ret != 0)
+				return ret;
+			break;
 		default:
 			break;
 		}
@@ -2286,6 +2457,7 @@ flow_hw_pull(struct rte_eth_dev *dev,
 	     struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
 	struct mlx5_hw_q_job *job;
 	int ret, i;
 
@@ -2310,6 +2482,10 @@ flow_hw_pull(struct rte_eth_dev *dev,
 						&job->flow->cnt_id);
 				job->flow->cnt_id = 0;
 			}
+			if (job->flow->mtr_id) {
+				mlx5_ipool_free(pool->idx_pool,	job->flow->mtr_id);
+				job->flow->mtr_id = 0;
+			}
 			mlx5_ipool_free(job->flow->table->flow, job->flow->idx);
 		}
 		priv->hw_q[queue].job[priv->hw_q[queue].job_idx++] = job;
@@ -3192,6 +3368,9 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_METER:
 			/* TODO: Validation logic */
 			break;
+		case RTE_FLOW_ACTION_TYPE_METER_MARK:
+			/* TODO: Validation logic */
+			break;
 		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
 			ret = flow_hw_validate_action_modify_field(action,
 									mask,
@@ -3285,6 +3464,11 @@ flow_hw_dr_actions_template_handle_shared(const struct rte_flow_action *mask,
 		action_types[*curr_off] = MLX5DR_ACTION_TYP_ASO_CT;
 		*curr_off = *curr_off + 1;
 		break;
+	case RTE_FLOW_ACTION_TYPE_METER_MARK:
+		at->actions_off[action_src] = *curr_off;
+		action_types[*curr_off] = MLX5DR_ACTION_TYP_ASO_METER;
+		*curr_off = *curr_off + 1;
+		break;
 	default:
 		DRV_LOG(WARNING, "Unsupported shared action type: %d", type);
 		return -EINVAL;
@@ -3376,6 +3560,12 @@ flow_hw_dr_actions_template_create(struct rte_flow_actions_template *at)
 				MLX5_HW_VLAN_PUSH_PCP_IDX :
 				MLX5_HW_VLAN_PUSH_VID_IDX;
 			break;
+		case RTE_FLOW_ACTION_TYPE_METER_MARK:
+			at->actions_off[i] = curr_off;
+			action_types[curr_off++] = MLX5DR_ACTION_TYP_ASO_METER;
+			if (curr_off >= MLX5_HW_MAX_ACTS)
+				goto err_actions_num;
+			break;
 		default:
 			type = mlx5_hw_dr_action_types[at->actions[i].type];
 			at->actions_off[i] = curr_off;
@@ -3851,6 +4041,16 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 								  " attribute");
 			}
 			break;
+		case RTE_FLOW_ITEM_TYPE_METER_COLOR:
+		{
+			int reg = flow_hw_get_reg_id(RTE_FLOW_ITEM_TYPE_METER_COLOR, 0);
+			if (reg == REG_NON)
+				return rte_flow_error_set(error, EINVAL,
+							  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+							  NULL,
+							  "Unsupported meter color register");
+			break;
+		}
 		case RTE_FLOW_ITEM_TYPE_VOID:
 		case RTE_FLOW_ITEM_TYPE_ETH:
 		case RTE_FLOW_ITEM_TYPE_VLAN:
@@ -5360,7 +5560,7 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	LIST_INIT(&priv->hw_ctrl_flows);
 	/* Initialize meter library*/
 	if (port_attr->nb_meters)
-		if (mlx5_flow_meter_init(dev, port_attr->nb_meters, 1, 1))
+		if (mlx5_flow_meter_init(dev, port_attr->nb_meters, 1, 1, nb_q_updated))
 			goto err;
 	/* Add global actions. */
 	for (i = 0; i < MLX5_HW_ACTION_FLAG_MAX; i++) {
@@ -5864,7 +6064,9 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 {
 	struct rte_flow_action_handle *handle = NULL;
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr *aso_mtr;
 	cnt_id_t cnt_id;
+	uint32_t mtr_id;
 
 	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
@@ -5883,6 +6085,14 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
 		handle = flow_hw_conntrack_create(dev, queue, action->conf, error);
 		break;
+	case RTE_FLOW_ACTION_TYPE_METER_MARK:
+		aso_mtr = flow_hw_meter_mark_alloc(dev, action, queue);
+		if (!aso_mtr)
+			break;
+		mtr_id = (MLX5_INDIRECT_ACTION_TYPE_METER_MARK <<
+			MLX5_INDIRECT_ACTION_TYPE_OFFSET) | (aso_mtr->fm.meter_id);
+		handle = (struct rte_flow_action_handle *)(uintptr_t)mtr_id;
+		break;
 	default:
 		handle = flow_dv_action_create(dev, conf, action, error);
 	}
@@ -5918,18 +6128,59 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 			     void *user_data,
 			     struct rte_flow_error *error)
 {
-	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
-	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
-
 	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
 	RTE_SET_USED(user_data);
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
+	const struct rte_flow_update_meter_mark *upd_meter_mark =
+		(const struct rte_flow_update_meter_mark *)update;
+	const struct rte_flow_action_meter_mark *meter_mark;
+	struct mlx5_aso_mtr *aso_mtr;
+	struct mlx5_flow_meter_info *fm;
+	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
+	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+	uint32_t idx = act_idx & ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
+
 	switch (type) {
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
 		return flow_hw_conntrack_update(dev, queue, update, act_idx, error);
+	case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
+		meter_mark = &upd_meter_mark->meter_mark;
+		/* Find ASO object. */
+		aso_mtr = mlx5_ipool_get(pool->idx_pool, idx);
+		if (!aso_mtr)
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL, "Invalid meter_mark update index");
+		fm = &aso_mtr->fm;
+		if (upd_meter_mark->profile_valid)
+			fm->profile = (struct mlx5_flow_meter_profile *)
+							(meter_mark->profile);
+		if (upd_meter_mark->color_mode_valid)
+			fm->color_aware = meter_mark->color_mode;
+		if (upd_meter_mark->init_color_valid)
+			aso_mtr->init_color = (meter_mark->color_mode) ?
+				meter_mark->init_color : RTE_COLOR_GREEN;
+		if (upd_meter_mark->state_valid)
+			fm->is_enable = meter_mark->state;
+		/* Update ASO flow meter by wqe. */
+		if (mlx5_aso_meter_update_by_wqe(priv->sh, queue,
+						 aso_mtr, &priv->mtr_bulk))
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL, "Unable to update ASO meter WQE");
+		/* Wait for ASO object completion. */
+		if (queue == MLX5_HW_INV_QUEUE &&
+		    mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr))
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL, "Unable to wait for ASO meter CQE");
+		return 0;
 	default:
-		return flow_dv_action_update(dev, handle, update, error);
+		break;
 	}
+	return flow_dv_action_update(dev, handle, update, error);
 }
 
 /**
@@ -5960,7 +6211,11 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 {
 	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
 	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+	uint32_t idx = act_idx & ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
+	struct mlx5_aso_mtr *aso_mtr;
+	struct mlx5_flow_meter_info *fm;
 
 	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
@@ -5970,6 +6225,28 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 		return mlx5_hws_cnt_shared_put(priv->hws_cpool, &act_idx);
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
 		return flow_hw_conntrack_destroy(dev, act_idx, error);
+	case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
+		aso_mtr = mlx5_ipool_get(pool->idx_pool, idx);
+		if (!aso_mtr)
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL, "Invalid meter_mark destroy index");
+		fm = &aso_mtr->fm;
+		fm->is_enable = 0;
+		/* Update ASO flow meter by wqe. */
+		if (mlx5_aso_meter_update_by_wqe(priv->sh, queue, aso_mtr,
+						 &priv->mtr_bulk))
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL, "Unable to update ASO meter WQE");
+		/* Wait for ASO object completion. */
+		if (queue == MLX5_HW_INV_QUEUE &&
+		    mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr))
+			return rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				NULL, "Unable to wait for ASO meter CQE");
+		mlx5_ipool_free(pool->idx_pool, idx);
+		return 0;
 	default:
 		return flow_dv_action_destroy(dev, handle, error);
 	}
@@ -6053,8 +6330,8 @@ flow_hw_action_create(struct rte_eth_dev *dev,
 		       const struct rte_flow_action *action,
 		       struct rte_flow_error *err)
 {
-	return flow_hw_action_handle_create(dev, UINT32_MAX, NULL, conf, action,
-					    NULL, err);
+	return flow_hw_action_handle_create(dev, MLX5_HW_INV_QUEUE,
+					    NULL, conf, action, NULL, err);
 }
 
 /**
@@ -6079,8 +6356,8 @@ flow_hw_action_destroy(struct rte_eth_dev *dev,
 		       struct rte_flow_action_handle *handle,
 		       struct rte_flow_error *error)
 {
-	return flow_hw_action_handle_destroy(dev, UINT32_MAX, NULL, handle,
-			NULL, error);
+	return flow_hw_action_handle_destroy(dev, MLX5_HW_INV_QUEUE,
+			NULL, handle, NULL, error);
 }
 
 /**
@@ -6108,8 +6385,8 @@ flow_hw_action_update(struct rte_eth_dev *dev,
 		      const void *update,
 		      struct rte_flow_error *err)
 {
-	return flow_hw_action_handle_update(dev, UINT32_MAX, NULL, handle,
-			update, NULL, err);
+	return flow_hw_action_handle_update(dev, MLX5_HW_INV_QUEUE,
+			NULL, handle, update, NULL, err);
 }
 
 static int
@@ -6639,6 +6916,12 @@ mlx5_flow_meter_uninit(struct rte_eth_dev *dev)
 		mlx5_free(priv->mtr_profile_arr);
 		priv->mtr_profile_arr = NULL;
 	}
+	if (priv->hws_mpool) {
+		mlx5_aso_mtr_queue_uninit(priv->sh, priv->hws_mpool, NULL);
+		mlx5_ipool_destroy(priv->hws_mpool->idx_pool);
+		mlx5_free(priv->hws_mpool);
+		priv->hws_mpool = NULL;
+	}
 	if (priv->mtr_bulk.aso) {
 		mlx5_free(priv->mtr_bulk.aso);
 		priv->mtr_bulk.aso = NULL;
@@ -6659,7 +6942,8 @@ int
 mlx5_flow_meter_init(struct rte_eth_dev *dev,
 		     uint32_t nb_meters,
 		     uint32_t nb_meter_profiles,
-		     uint32_t nb_meter_policies)
+		     uint32_t nb_meter_policies,
+		     uint32_t nb_queues)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_devx_obj *dcs = NULL;
@@ -6669,29 +6953,35 @@ mlx5_flow_meter_init(struct rte_eth_dev *dev,
 	struct mlx5_aso_mtr *aso;
 	uint32_t i;
 	struct rte_flow_error error;
+	uint32_t flags;
+	uint32_t nb_mtrs = rte_align32pow2(nb_meters);
+	struct mlx5_indexed_pool_config cfg = {
+		.size = sizeof(struct mlx5_aso_mtr),
+		.trunk_size = 1 << 12,
+		.per_core_cache = 1 << 13,
+		.need_lock = 1,
+		.release_mem_en = !!priv->sh->config.reclaim_mode,
+		.malloc = mlx5_malloc,
+		.max_idx = nb_meters,
+		.free = mlx5_free,
+		.type = "mlx5_hw_mtr_mark_action",
+	};
 
 	if (!nb_meters || !nb_meter_profiles || !nb_meter_policies) {
 		ret = ENOTSUP;
 		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter configuration is invalid.");
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter configuration is invalid.");
 		goto err;
 	}
 	if (!priv->mtr_en || !priv->sh->meter_aso_en) {
 		ret = ENOTSUP;
 		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter ASO is not supported.");
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter ASO is not supported.");
 		goto err;
 	}
 	priv->mtr_config.nb_meters = nb_meters;
-	if (mlx5_aso_queue_init(priv->sh, ASO_OPC_MOD_POLICER)) {
-		ret = ENOMEM;
-		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter ASO queue allocation failed.");
-		goto err;
-	}
 	log_obj_size = rte_log2_u32(nb_meters >> 1);
 	dcs = mlx5_devx_cmd_create_flow_meter_aso_obj
 		(priv->sh->cdev->ctx, priv->sh->cdev->pdn,
@@ -6699,8 +6989,8 @@ mlx5_flow_meter_init(struct rte_eth_dev *dev,
 	if (!dcs) {
 		ret = ENOMEM;
 		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter ASO object allocation failed.");
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter ASO object allocation failed.");
 		goto err;
 	}
 	priv->mtr_bulk.devx_obj = dcs;
@@ -6708,31 +6998,33 @@ mlx5_flow_meter_init(struct rte_eth_dev *dev,
 	if (reg_id < 0) {
 		ret = ENOTSUP;
 		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter register is not available.");
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter register is not available.");
 		goto err;
 	}
+	flags = MLX5DR_ACTION_FLAG_HWS_RX | MLX5DR_ACTION_FLAG_HWS_TX;
+	if (priv->sh->config.dv_esw_en && priv->master)
+		flags |= MLX5DR_ACTION_FLAG_HWS_FDB;
 	priv->mtr_bulk.action = mlx5dr_action_create_aso_meter
 			(priv->dr_ctx, (struct mlx5dr_devx_obj *)dcs,
-				reg_id - REG_C_0, MLX5DR_ACTION_FLAG_HWS_RX |
-				MLX5DR_ACTION_FLAG_HWS_TX |
-				MLX5DR_ACTION_FLAG_HWS_FDB);
+				reg_id - REG_C_0, flags);
 	if (!priv->mtr_bulk.action) {
 		ret = ENOMEM;
 		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter action creation failed.");
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter action creation failed.");
 		goto err;
 	}
 	priv->mtr_bulk.aso = mlx5_malloc(MLX5_MEM_ZERO,
-						sizeof(struct mlx5_aso_mtr) * nb_meters,
-						RTE_CACHE_LINE_SIZE,
-						SOCKET_ID_ANY);
+					 sizeof(struct mlx5_aso_mtr) *
+					 nb_meters,
+					 RTE_CACHE_LINE_SIZE,
+					 SOCKET_ID_ANY);
 	if (!priv->mtr_bulk.aso) {
 		ret = ENOMEM;
 		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter bulk ASO allocation failed.");
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter bulk ASO allocation failed.");
 		goto err;
 	}
 	priv->mtr_bulk.size = nb_meters;
@@ -6743,32 +7035,65 @@ mlx5_flow_meter_init(struct rte_eth_dev *dev,
 		aso->offset = i;
 		aso++;
 	}
+	priv->hws_mpool = mlx5_malloc(MLX5_MEM_ZERO,
+				sizeof(struct mlx5_aso_mtr_pool),
+				RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
+	if (!priv->hws_mpool) {
+		ret = ENOMEM;
+		rte_flow_error_set(&error, ENOMEM,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter ipool allocation failed.");
+		goto err;
+	}
+	priv->hws_mpool->devx_obj = priv->mtr_bulk.devx_obj;
+	priv->hws_mpool->action = priv->mtr_bulk.action;
+	priv->hws_mpool->nb_sq = nb_queues;
+	if (mlx5_aso_mtr_queue_init(priv->sh, priv->hws_mpool,
+				    &priv->sh->mtrmng->pools_mng, nb_queues)) {
+		ret = ENOMEM;
+		rte_flow_error_set(&error, ENOMEM,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter ASO queue allocation failed.");
+		goto err;
+	}
+	/*
+	 * No need for a local cache if the meter number is small, since the
+	 * flow insertion rate will be very limited in that case.
+	 * Here let's set the number to less than the default trunk size of 4K.
+	 */
+	if (nb_mtrs <= cfg.trunk_size) {
+		cfg.per_core_cache = 0;
+		cfg.trunk_size = nb_mtrs;
+	} else if (nb_mtrs <= MLX5_HW_IPOOL_SIZE_THRESHOLD) {
+		cfg.per_core_cache = MLX5_HW_IPOOL_CACHE_MIN;
+	}
+	priv->hws_mpool->idx_pool = mlx5_ipool_create(&cfg);
 	priv->mtr_config.nb_meter_profiles = nb_meter_profiles;
 	priv->mtr_profile_arr =
 		mlx5_malloc(MLX5_MEM_ZERO,
-				sizeof(struct mlx5_flow_meter_profile) *
-				nb_meter_profiles,
-				RTE_CACHE_LINE_SIZE,
-				SOCKET_ID_ANY);
+			    sizeof(struct mlx5_flow_meter_profile) *
+			    nb_meter_profiles,
+			    RTE_CACHE_LINE_SIZE,
+			    SOCKET_ID_ANY);
 	if (!priv->mtr_profile_arr) {
 		ret = ENOMEM;
 		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter profile allocation failed.");
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter profile allocation failed.");
 		goto err;
 	}
 	priv->mtr_config.nb_meter_policies = nb_meter_policies;
 	priv->mtr_policy_arr =
 		mlx5_malloc(MLX5_MEM_ZERO,
-				sizeof(struct mlx5_flow_meter_policy) *
-				nb_meter_policies,
-				RTE_CACHE_LINE_SIZE,
-				SOCKET_ID_ANY);
+			    sizeof(struct mlx5_flow_meter_policy) *
+			    nb_meter_policies,
+			    RTE_CACHE_LINE_SIZE,
+			    SOCKET_ID_ANY);
 	if (!priv->mtr_policy_arr) {
 		ret = ENOMEM;
 		rte_flow_error_set(&error, ENOMEM,
-					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					NULL, "Meter policy allocation failed.");
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, "Meter policy allocation failed.");
 		goto err;
 	}
 	return 0;
diff --git a/drivers/net/mlx5/mlx5_flow_meter.c b/drivers/net/mlx5/mlx5_flow_meter.c
index 8cf24d1f7a..ed2306283d 100644
--- a/drivers/net/mlx5/mlx5_flow_meter.c
+++ b/drivers/net/mlx5/mlx5_flow_meter.c
@@ -588,6 +588,36 @@ mlx5_flow_meter_profile_delete(struct rte_eth_dev *dev,
 	return 0;
 }
 
+/**
+ * Callback to get MTR profile.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] meter_profile_id
+ *   Meter profile id.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   A valid handle in case of success, NULL otherwise.
+ */
+static struct rte_flow_meter_profile *
+mlx5_flow_meter_profile_get(struct rte_eth_dev *dev,
+			  uint32_t meter_profile_id,
+			  struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (!priv->mtr_en) {
+		rte_mtr_error_set(error, ENOTSUP,
+				  RTE_MTR_ERROR_TYPE_UNSPECIFIED, NULL,
+				  "Meter is not supported");
+		return NULL;
+	}
+	return (void *)(uintptr_t)mlx5_flow_meter_profile_find(priv,
+							meter_profile_id);
+}
+
 /**
  * Callback to add MTR profile with HWS.
  *
@@ -1150,6 +1180,37 @@ mlx5_flow_meter_policy_delete(struct rte_eth_dev *dev,
 	return 0;
 }
 
+/**
+ * Callback to get MTR policy.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] policy_id
+ *   Meter policy id.
+ * @param[out] error
+ *   Pointer to the error structure.
+ *
+ * @return
+ *   A valid handle in case of success, NULL otherwise.
+ */
+static struct rte_flow_meter_policy *
+mlx5_flow_meter_policy_get(struct rte_eth_dev *dev,
+			  uint32_t policy_id,
+			  struct rte_mtr_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint32_t policy_idx;
+
+	if (!priv->mtr_en) {
+		rte_mtr_error_set(error, ENOTSUP,
+				  RTE_MTR_ERROR_TYPE_UNSPECIFIED, NULL,
+				  "Meter is not supported");
+		return NULL;
+	}
+	return (void *)(uintptr_t)mlx5_flow_meter_policy_find(dev, policy_id,
+							      &policy_idx);
+}
+
 /**
  * Callback to delete MTR policy for HWS.
  *
@@ -1310,9 +1371,9 @@ mlx5_flow_meter_policy_hws_add(struct rte_eth_dev *dev,
 			RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
 			NULL, "Meter policy already exists.");
 	if (!policy ||
-	    !policy->actions[RTE_COLOR_RED] ||
-	    !policy->actions[RTE_COLOR_YELLOW] ||
-	    !policy->actions[RTE_COLOR_GREEN])
+	    (!policy->actions[RTE_COLOR_RED] &&
+	    !policy->actions[RTE_COLOR_YELLOW] &&
+	    !policy->actions[RTE_COLOR_GREEN]))
 		return -rte_mtr_error_set(error, EINVAL,
 					  RTE_MTR_ERROR_TYPE_METER_POLICY,
 					  NULL, "Meter policy actions are not valid.");
@@ -1372,6 +1433,11 @@ mlx5_flow_meter_policy_hws_add(struct rte_eth_dev *dev,
 			act++;
 		}
 	}
+	if (priv->sh->config.dv_esw_en)
+		domain_color &= ~(MLX5_MTR_DOMAIN_EGRESS_BIT |
+				  MLX5_MTR_DOMAIN_TRANSFER_BIT);
+	else
+		domain_color &= ~MLX5_MTR_DOMAIN_TRANSFER_BIT;
 	if (!domain_color)
 		return -rte_mtr_error_set(error, ENOTSUP,
 					  RTE_MTR_ERROR_TYPE_METER_POLICY_ID,
@@ -1565,11 +1631,11 @@ mlx5_flow_meter_action_modify(struct mlx5_priv *priv,
 	if (priv->sh->meter_aso_en) {
 		fm->is_enable = !!is_enable;
 		aso_mtr = container_of(fm, struct mlx5_aso_mtr, fm);
-		ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr,
-						   &priv->mtr_bulk);
+		ret = mlx5_aso_meter_update_by_wqe(priv->sh, MLX5_HW_INV_QUEUE,
+						   aso_mtr, &priv->mtr_bulk);
 		if (ret)
 			return ret;
-		ret = mlx5_aso_mtr_wait(priv->sh, aso_mtr);
+		ret = mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr);
 		if (ret)
 			return ret;
 	} else {
@@ -1815,8 +1881,8 @@ mlx5_flow_meter_create(struct rte_eth_dev *dev, uint32_t meter_id,
 	/* If ASO meter supported, update ASO flow meter by wqe. */
 	if (priv->sh->meter_aso_en) {
 		aso_mtr = container_of(fm, struct mlx5_aso_mtr, fm);
-		ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr,
-						   &priv->mtr_bulk);
+		ret = mlx5_aso_meter_update_by_wqe(priv->sh, MLX5_HW_INV_QUEUE,
+						   aso_mtr, &priv->mtr_bulk);
 		if (ret)
 			goto error;
 		if (!priv->mtr_idx_tbl) {
@@ -1921,7 +1987,7 @@ mlx5_flow_meter_hws_create(struct rte_eth_dev *dev, uint32_t meter_id,
 	fm->shared = !!shared;
 	fm->initialized = 1;
 	/* Update ASO flow meter by wqe. */
-	ret = mlx5_aso_meter_update_by_wqe(priv->sh, aso_mtr,
+	ret = mlx5_aso_meter_update_by_wqe(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr,
 					   &priv->mtr_bulk);
 	if (ret)
 		return -rte_mtr_error_set(error, ENOTSUP,
@@ -2401,9 +2467,11 @@ static const struct rte_mtr_ops mlx5_flow_mtr_ops = {
 	.capabilities_get = mlx5_flow_mtr_cap_get,
 	.meter_profile_add = mlx5_flow_meter_profile_add,
 	.meter_profile_delete = mlx5_flow_meter_profile_delete,
+	.meter_profile_get = mlx5_flow_meter_profile_get,
 	.meter_policy_validate = mlx5_flow_meter_policy_validate,
 	.meter_policy_add = mlx5_flow_meter_policy_add,
 	.meter_policy_delete = mlx5_flow_meter_policy_delete,
+	.meter_policy_get = mlx5_flow_meter_policy_get,
 	.create = mlx5_flow_meter_create,
 	.destroy = mlx5_flow_meter_destroy,
 	.meter_enable = mlx5_flow_meter_enable,
@@ -2418,9 +2486,11 @@ static const struct rte_mtr_ops mlx5_flow_mtr_hws_ops = {
 	.capabilities_get = mlx5_flow_mtr_cap_get,
 	.meter_profile_add = mlx5_flow_meter_profile_hws_add,
 	.meter_profile_delete = mlx5_flow_meter_profile_hws_delete,
+	.meter_profile_get = mlx5_flow_meter_profile_get,
 	.meter_policy_validate = mlx5_flow_meter_policy_hws_validate,
 	.meter_policy_add = mlx5_flow_meter_policy_hws_add,
 	.meter_policy_delete = mlx5_flow_meter_policy_hws_delete,
+	.meter_policy_get = mlx5_flow_meter_policy_get,
 	.create = mlx5_flow_meter_hws_create,
 	.destroy = mlx5_flow_meter_hws_destroy,
 	.meter_enable = mlx5_flow_meter_enable,
@@ -2566,7 +2636,7 @@ mlx5_flow_meter_attach(struct mlx5_priv *priv,
 		struct mlx5_aso_mtr *aso_mtr;
 
 		aso_mtr = container_of(fm, struct mlx5_aso_mtr, fm);
-		if (mlx5_aso_mtr_wait(priv->sh, aso_mtr)) {
+		if (mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr)) {
 			return rte_flow_error_set(error, ENOENT,
 					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 					NULL,
@@ -2865,7 +2935,7 @@ mlx5_flow_meter_flush(struct rte_eth_dev *dev, struct rte_mtr_error *error)
 		}
 	}
 	if (priv->mtr_bulk.aso) {
-		for (i = 1; i <= priv->mtr_config.nb_meter_profiles; i++) {
+		for (i = 0; i < priv->mtr_config.nb_meters; i++) {
 			aso_mtr = mlx5_aso_meter_by_idx(priv, i);
 			fm = &aso_mtr->fm;
 			if (fm->initialized)
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v6 13/18] net/mlx5: add HWS AGE action support
  2022-10-20 15:41 ` [PATCH v6 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (11 preceding siblings ...)
  2022-10-20 15:41   ` [PATCH v6 12/18] net/mlx5: implement METER MARK indirect action for HWS Suanming Mou
@ 2022-10-20 15:41   ` Suanming Mou
  2022-10-24  9:46     ` Slava Ovsiienko
  2022-10-20 15:41   ` [PATCH v6 14/18] net/mlx5: add async action push and pull support Suanming Mou
                     ` (5 subsequent siblings)
  18 siblings, 1 reply; 140+ messages in thread
From: Suanming Mou @ 2022-10-20 15:41 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Michael Baum

From: Michael Baum <michaelba@nvidia.com>

Add support for the AGE action in HW steering.
This patch includes:

 1. Add new structures to manage aging.
 2. Initialize all of them in the configure function.
 3. Implement a per-second aging check using the CNT background thread.
 4. Enable the AGE action in flow create/destroy operations.
 5. Implement a queue-based function to report aged flow rules
    (a usage sketch follows this list).
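For illustration only (not part of this patch): a minimal sketch of how an
application might drain the per-queue aged-out reports that the new
get_q_aged_flows callback exposes, i.e. via rte_flow_get_q_aged_flows().
The batch size MAX_AGED and the handle_aged_rule() helper are hypothetical
placeholders.

#include <stdio.h>
#include <rte_flow.h>

#define MAX_AGED 64 /* arbitrary batch size for this sketch */

/* Placeholder for application logic (destroy or refresh the rule). */
static void
handle_aged_rule(void *ctx)
{
	(void)ctx;
}

static void
drain_aged_flows(uint16_t port_id, uint32_t queue_id)
{
	void *contexts[MAX_AGED];
	struct rte_flow_error error;
	int n, i;

	do {
		/* Returns the AGE action 'context' of each aged-out rule. */
		n = rte_flow_get_q_aged_flows(port_id, queue_id, contexts,
					      MAX_AGED, &error);
		if (n < 0) {
			printf("port %u queue %u: %s\n", port_id, queue_id,
			       error.message ? error.message : "query failed");
			return;
		}
		for (i = 0; i < n; i++)
			handle_aged_rule(contexts[i]);
	} while (n == MAX_AGED);
}

Typically this would run after an RTE_ETH_EVENT_FLOW_AGED event on the port.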

Signed-off-by: Michael Baum <michaelba@nvidia.com>
---
 doc/guides/nics/mlx5.rst           |   14 +
 drivers/net/mlx5/mlx5.c            |   67 +-
 drivers/net/mlx5/mlx5.h            |   51 +-
 drivers/net/mlx5/mlx5_defs.h       |    3 +
 drivers/net/mlx5/mlx5_flow.c       |   91 ++-
 drivers/net/mlx5/mlx5_flow.h       |   33 +-
 drivers/net/mlx5/mlx5_flow_dv.c    |   30 +-
 drivers/net/mlx5/mlx5_flow_hw.c    | 1145 ++++++++++++++++++++++++----
 drivers/net/mlx5/mlx5_flow_verbs.c |    4 +-
 drivers/net/mlx5/mlx5_hws_cnt.c    |  753 +++++++++++++++++-
 drivers/net/mlx5/mlx5_hws_cnt.h    |  193 ++++-
 drivers/net/mlx5/mlx5_utils.h      |   10 +-
 12 files changed, 2127 insertions(+), 267 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 12646550b0..ae4d406ca1 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -560,6 +560,20 @@ Limitations
 - The NIC egress flow rules on representor port are not supported.
 
 
+- HWS AGE action in mlx5:
+
+  - Using the same indirect COUNT action combined with multiple AGE actions in
+    different flows may cause a wrong AGE state for the AGE actions.
+  - Creating/destroying flow rules with indirect AGE action when it is active
+    (timeout != 0) may cause a wrong AGE state for the indirect AGE action.
+  - The mlx5 driver reuses counters for the AGE action, so for optimization
+    the values in the ``rte_flow_port_attr`` structure should describe:
+
+    - ``nb_counters`` is the number of flow rules using a counter (with or without
+      AGE) plus the number of flow rules using only AGE (without a COUNT action).
+    - ``nb_aging_objects`` is the number of flow rules containing an AGE action.
+
+
 Statistics
 ----------
 
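Illustration only, not part of the patch: following the sizing guidance in the
limitation above, a hypothetical rte_flow_configure() call for a port using
HWS counters and aging could look as follows (the numeric values are
placeholders chosen for the sketch):

#include <stdio.h>
#include <rte_flow.h>

static int
configure_hws_aging(uint16_t port_id)
{
	/* Illustrative sizing only; values are application-specific. */
	const struct rte_flow_port_attr port_attr = {
		.nb_counters = 1 << 20,      /* rules using COUNT and/or AGE */
		.nb_aging_objects = 1 << 16, /* rules carrying an AGE action */
	};
	const struct rte_flow_queue_attr queue_attr = { .size = 1024 };
	const struct rte_flow_queue_attr *queue_attrs[] = { &queue_attr };
	struct rte_flow_error error;

	if (rte_flow_configure(port_id, &port_attr, 1, queue_attrs, &error)) {
		printf("rte_flow_configure: %s\n",
		       error.message ? error.message : "failed");
		return -1;
	}
	return 0;
}
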
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 64a0e6f31d..4e532f0807 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -497,6 +497,12 @@ mlx5_flow_aging_init(struct mlx5_dev_ctx_shared *sh)
 	uint32_t i;
 	struct mlx5_age_info *age_info;
 
+	/*
+	 * In HW steering, the aging information structure is initialized
+	 * later, in the configure function.
+	 */
+	if (sh->config.dv_flow_en == 2)
+		return;
 	for (i = 0; i < sh->max_port; i++) {
 		age_info = &sh->port[i].age_info;
 		age_info->flags = 0;
@@ -540,8 +546,8 @@ mlx5_flow_counter_mode_config(struct rte_eth_dev *dev __rte_unused)
 			hca_attr->flow_counter_bulk_alloc_bitmap);
 	/* Initialize fallback mode only on the port initializes sh. */
 	if (sh->refcnt == 1)
-		sh->cmng.counter_fallback = fallback;
-	else if (fallback != sh->cmng.counter_fallback)
+		sh->sws_cmng.counter_fallback = fallback;
+	else if (fallback != sh->sws_cmng.counter_fallback)
 		DRV_LOG(WARNING, "Port %d in sh has different fallback mode "
 			"with others:%d.", PORT_ID(priv), fallback);
 #endif
@@ -556,17 +562,38 @@ mlx5_flow_counter_mode_config(struct rte_eth_dev *dev __rte_unused)
 static void
 mlx5_flow_counters_mng_init(struct mlx5_dev_ctx_shared *sh)
 {
-	int i;
+	int i, j;
+
+	if (sh->config.dv_flow_en < 2) {
+		memset(&sh->sws_cmng, 0, sizeof(sh->sws_cmng));
+		TAILQ_INIT(&sh->sws_cmng.flow_counters);
+		sh->sws_cmng.min_id = MLX5_CNT_BATCH_OFFSET;
+		sh->sws_cmng.max_id = -1;
+		sh->sws_cmng.last_pool_idx = POOL_IDX_INVALID;
+		rte_spinlock_init(&sh->sws_cmng.pool_update_sl);
+		for (i = 0; i < MLX5_COUNTER_TYPE_MAX; i++) {
+			TAILQ_INIT(&sh->sws_cmng.counters[i]);
+			rte_spinlock_init(&sh->sws_cmng.csl[i]);
+		}
+	} else {
+		struct mlx5_hca_attr *attr = &sh->cdev->config.hca_attr;
+		uint32_t fw_max_nb_cnts = attr->max_flow_counter;
+		uint8_t log_dcs = log2above(fw_max_nb_cnts) - 1;
+		uint32_t max_nb_cnts = 0;
+
+		for (i = 0, j = 0; j < MLX5_HWS_CNT_DCS_NUM; ++i) {
+			int log_dcs_i = log_dcs - i;
 
-	memset(&sh->cmng, 0, sizeof(sh->cmng));
-	TAILQ_INIT(&sh->cmng.flow_counters);
-	sh->cmng.min_id = MLX5_CNT_BATCH_OFFSET;
-	sh->cmng.max_id = -1;
-	sh->cmng.last_pool_idx = POOL_IDX_INVALID;
-	rte_spinlock_init(&sh->cmng.pool_update_sl);
-	for (i = 0; i < MLX5_COUNTER_TYPE_MAX; i++) {
-		TAILQ_INIT(&sh->cmng.counters[i]);
-		rte_spinlock_init(&sh->cmng.csl[i]);
+			if (log_dcs_i < 0)
+				break;
+			if ((max_nb_cnts | RTE_BIT32(log_dcs_i)) >
+			    fw_max_nb_cnts)
+				continue;
+			max_nb_cnts |= RTE_BIT32(log_dcs_i);
+			j++;
+		}
+		sh->hws_max_log_bulk_sz = log_dcs;
+		sh->hws_max_nb_counters = max_nb_cnts;
 	}
 }
 
@@ -607,13 +634,13 @@ mlx5_flow_counters_mng_close(struct mlx5_dev_ctx_shared *sh)
 		rte_pause();
 	}
 
-	if (sh->cmng.pools) {
+	if (sh->sws_cmng.pools) {
 		struct mlx5_flow_counter_pool *pool;
-		uint16_t n_valid = sh->cmng.n_valid;
-		bool fallback = sh->cmng.counter_fallback;
+		uint16_t n_valid = sh->sws_cmng.n_valid;
+		bool fallback = sh->sws_cmng.counter_fallback;
 
 		for (i = 0; i < n_valid; ++i) {
-			pool = sh->cmng.pools[i];
+			pool = sh->sws_cmng.pools[i];
 			if (!fallback && pool->min_dcs)
 				claim_zero(mlx5_devx_cmd_destroy
 							       (pool->min_dcs));
@@ -632,14 +659,14 @@ mlx5_flow_counters_mng_close(struct mlx5_dev_ctx_shared *sh)
 			}
 			mlx5_free(pool);
 		}
-		mlx5_free(sh->cmng.pools);
+		mlx5_free(sh->sws_cmng.pools);
 	}
-	mng = LIST_FIRST(&sh->cmng.mem_mngs);
+	mng = LIST_FIRST(&sh->sws_cmng.mem_mngs);
 	while (mng) {
 		mlx5_flow_destroy_counter_stat_mem_mng(mng);
-		mng = LIST_FIRST(&sh->cmng.mem_mngs);
+		mng = LIST_FIRST(&sh->sws_cmng.mem_mngs);
 	}
-	memset(&sh->cmng, 0, sizeof(sh->cmng));
+	memset(&sh->sws_cmng, 0, sizeof(sh->sws_cmng));
 }
 
 /**
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 2bf5bf553e..482ec83c61 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -644,12 +644,45 @@ struct mlx5_geneve_tlv_option_resource {
 /* Current time in seconds. */
 #define MLX5_CURR_TIME_SEC	(rte_rdtsc() / rte_get_tsc_hz())
 
+/*
+ * HW steering queue oriented AGE info.
+ * It contains an array of rings, one for each HWS queue.
+ */
+struct mlx5_hws_q_age_info {
+	uint16_t nb_rings; /* Number of aged-out ring lists. */
+	struct rte_ring *aged_lists[]; /* Aged-out lists. */
+};
+
+/*
+ * HW steering AGE info.
+ * It has a ring list containing all aged out flow rules.
+ */
+struct mlx5_hws_age_info {
+	struct rte_ring *aged_list; /* Aged out lists. */
+};
+
 /* Aging information for per port. */
 struct mlx5_age_info {
 	uint8_t flags; /* Indicate if is new event or need to be triggered. */
-	struct mlx5_counters aged_counters; /* Aged counter list. */
-	struct aso_age_list aged_aso; /* Aged ASO actions list. */
-	rte_spinlock_t aged_sl; /* Aged flow list lock. */
+	union {
+		/* SW/FW steering AGE info. */
+		struct {
+			struct mlx5_counters aged_counters;
+			/* Aged counter list. */
+			struct aso_age_list aged_aso;
+			/* Aged ASO actions list. */
+			rte_spinlock_t aged_sl; /* Aged flow list lock. */
+		};
+		struct {
+			struct mlx5_indexed_pool *ages_ipool;
+			union {
+				struct mlx5_hws_age_info hw_age;
+				/* HW steering AGE info. */
+				struct mlx5_hws_q_age_info *hw_q_age;
+				/* HW steering queue oriented AGE info. */
+			};
+		};
+	};
 };
 
 /* Per port data of shared IB device. */
@@ -1312,6 +1345,9 @@ struct mlx5_dev_ctx_shared {
 	uint32_t hws_tags:1; /* Check if tags info for HWS initialized. */
 	uint32_t shared_mark_enabled:1;
 	/* If mark action is enabled on Rxqs (shared E-Switch domain). */
+	uint32_t hws_max_log_bulk_sz:5;
+	/* Log of minimal HWS counters created hard coded. */
+	uint32_t hws_max_nb_counters; /* Maximal number for HWS counters. */
 	uint32_t max_port; /* Maximal IB device port index. */
 	struct mlx5_bond_info bond; /* Bonding information. */
 	struct mlx5_common_device *cdev; /* Backend mlx5 device. */
@@ -1353,7 +1389,8 @@ struct mlx5_dev_ctx_shared {
 	struct mlx5_list *dest_array_list;
 	struct mlx5_list *flex_parsers_dv; /* Flex Item parsers. */
 	/* List of destination array actions. */
-	struct mlx5_flow_counter_mng cmng; /* Counters management structure. */
+	struct mlx5_flow_counter_mng sws_cmng;
+	/* SW steering counters management structure. */
 	void *default_miss_action; /* Default miss action. */
 	struct mlx5_indexed_pool *ipool[MLX5_IPOOL_MAX];
 	struct mlx5_indexed_pool *mdh_ipools[MLX5_MAX_MODIFY_NUM];
@@ -1683,6 +1720,9 @@ struct mlx5_priv {
 	LIST_HEAD(flow_hw_at, rte_flow_actions_template) flow_hw_at;
 	struct mlx5dr_context *dr_ctx; /**< HW steering DR context. */
 	/* HW steering queue polling mechanism job descriptor LIFO. */
+	uint32_t hws_strict_queue:1;
+	/**< Whether all operations strictly happen on the same HWS queue. */
+	uint32_t hws_age_req:1; /**< Whether this port has AGE indexed pool. */
 	struct mlx5_hw_q *hw_q;
 	/* HW steering rte flow table list header. */
 	LIST_HEAD(flow_hw_tbl, rte_flow_template_table) flow_hw_tbl;
@@ -1998,6 +2038,9 @@ int mlx5_validate_action_ct(struct rte_eth_dev *dev,
 			    const struct rte_flow_action_conntrack *conntrack,
 			    struct rte_flow_error *error);
 
+int mlx5_flow_get_q_aged_flows(struct rte_eth_dev *dev, uint32_t queue_id,
+			       void **contexts, uint32_t nb_contexts,
+			       struct rte_flow_error *error);
 
 /* mlx5_mp_os.c */
 
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index d064abfef3..2af8c731ef 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -43,6 +43,9 @@
 #define MLX5_PMD_SOFT_COUNTERS 1
 #endif
 
+/* Maximum number of DCS created per port. */
+#define MLX5_HWS_CNT_DCS_NUM 4
+
 /* Alarm timeout. */
 #define MLX5_ALARM_TIMEOUT_US 100000
 
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index aeaeb15f80..f79ac265a4 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -989,6 +989,9 @@ static const struct rte_flow_ops mlx5_flow_ops = {
 	.isolate = mlx5_flow_isolate,
 	.query = mlx5_flow_query,
 	.dev_dump = mlx5_flow_dev_dump,
+#ifdef MLX5_HAVE_RTE_FLOW_Q_AGE
+	.get_q_aged_flows = mlx5_flow_get_q_aged_flows,
+#endif
 	.get_aged_flows = mlx5_flow_get_aged_flows,
 	.action_handle_create = mlx5_action_handle_create,
 	.action_handle_destroy = mlx5_action_handle_destroy,
@@ -8944,11 +8947,11 @@ mlx5_flow_create_counter_stat_mem_mng(struct mlx5_dev_ctx_shared *sh)
 		mem_mng->raws[i].data = raw_data + i * MLX5_COUNTERS_PER_POOL;
 	}
 	for (i = 0; i < MLX5_MAX_PENDING_QUERIES; ++i)
-		LIST_INSERT_HEAD(&sh->cmng.free_stat_raws,
+		LIST_INSERT_HEAD(&sh->sws_cmng.free_stat_raws,
 				 mem_mng->raws + MLX5_CNT_CONTAINER_RESIZE + i,
 				 next);
-	LIST_INSERT_HEAD(&sh->cmng.mem_mngs, mem_mng, next);
-	sh->cmng.mem_mng = mem_mng;
+	LIST_INSERT_HEAD(&sh->sws_cmng.mem_mngs, mem_mng, next);
+	sh->sws_cmng.mem_mng = mem_mng;
 	return 0;
 }
 
@@ -8967,7 +8970,7 @@ static int
 mlx5_flow_set_counter_stat_mem(struct mlx5_dev_ctx_shared *sh,
 			       struct mlx5_flow_counter_pool *pool)
 {
-	struct mlx5_flow_counter_mng *cmng = &sh->cmng;
+	struct mlx5_flow_counter_mng *cmng = &sh->sws_cmng;
 	/* Resize statistic memory once used out. */
 	if (!(pool->index % MLX5_CNT_CONTAINER_RESIZE) &&
 	    mlx5_flow_create_counter_stat_mem_mng(sh)) {
@@ -8996,14 +8999,14 @@ mlx5_set_query_alarm(struct mlx5_dev_ctx_shared *sh)
 {
 	uint32_t pools_n, us;
 
-	pools_n = __atomic_load_n(&sh->cmng.n_valid, __ATOMIC_RELAXED);
+	pools_n = __atomic_load_n(&sh->sws_cmng.n_valid, __ATOMIC_RELAXED);
 	us = MLX5_POOL_QUERY_FREQ_US / pools_n;
 	DRV_LOG(DEBUG, "Set alarm for %u pools each %u us", pools_n, us);
 	if (rte_eal_alarm_set(us, mlx5_flow_query_alarm, sh)) {
-		sh->cmng.query_thread_on = 0;
+		sh->sws_cmng.query_thread_on = 0;
 		DRV_LOG(ERR, "Cannot reinitialize query alarm");
 	} else {
-		sh->cmng.query_thread_on = 1;
+		sh->sws_cmng.query_thread_on = 1;
 	}
 }
 
@@ -9019,12 +9022,12 @@ mlx5_flow_query_alarm(void *arg)
 {
 	struct mlx5_dev_ctx_shared *sh = arg;
 	int ret;
-	uint16_t pool_index = sh->cmng.pool_index;
-	struct mlx5_flow_counter_mng *cmng = &sh->cmng;
+	uint16_t pool_index = sh->sws_cmng.pool_index;
+	struct mlx5_flow_counter_mng *cmng = &sh->sws_cmng;
 	struct mlx5_flow_counter_pool *pool;
 	uint16_t n_valid;
 
-	if (sh->cmng.pending_queries >= MLX5_MAX_PENDING_QUERIES)
+	if (sh->sws_cmng.pending_queries >= MLX5_MAX_PENDING_QUERIES)
 		goto set_alarm;
 	rte_spinlock_lock(&cmng->pool_update_sl);
 	pool = cmng->pools[pool_index];
@@ -9037,7 +9040,7 @@ mlx5_flow_query_alarm(void *arg)
 		/* There is a pool query in progress. */
 		goto set_alarm;
 	pool->raw_hw =
-		LIST_FIRST(&sh->cmng.free_stat_raws);
+		LIST_FIRST(&sh->sws_cmng.free_stat_raws);
 	if (!pool->raw_hw)
 		/* No free counter statistics raw memory. */
 		goto set_alarm;
@@ -9063,12 +9066,12 @@ mlx5_flow_query_alarm(void *arg)
 		goto set_alarm;
 	}
 	LIST_REMOVE(pool->raw_hw, next);
-	sh->cmng.pending_queries++;
+	sh->sws_cmng.pending_queries++;
 	pool_index++;
 	if (pool_index >= n_valid)
 		pool_index = 0;
 set_alarm:
-	sh->cmng.pool_index = pool_index;
+	sh->sws_cmng.pool_index = pool_index;
 	mlx5_set_query_alarm(sh);
 }
 
@@ -9151,7 +9154,7 @@ mlx5_flow_async_pool_query_handle(struct mlx5_dev_ctx_shared *sh,
 		(struct mlx5_flow_counter_pool *)(uintptr_t)async_id;
 	struct mlx5_counter_stats_raw *raw_to_free;
 	uint8_t query_gen = pool->query_gen ^ 1;
-	struct mlx5_flow_counter_mng *cmng = &sh->cmng;
+	struct mlx5_flow_counter_mng *cmng = &sh->sws_cmng;
 	enum mlx5_counter_type cnt_type =
 		pool->is_aged ? MLX5_COUNTER_TYPE_AGE :
 				MLX5_COUNTER_TYPE_ORIGIN;
@@ -9174,9 +9177,9 @@ mlx5_flow_async_pool_query_handle(struct mlx5_dev_ctx_shared *sh,
 			rte_spinlock_unlock(&cmng->csl[cnt_type]);
 		}
 	}
-	LIST_INSERT_HEAD(&sh->cmng.free_stat_raws, raw_to_free, next);
+	LIST_INSERT_HEAD(&sh->sws_cmng.free_stat_raws, raw_to_free, next);
 	pool->raw_hw = NULL;
-	sh->cmng.pending_queries--;
+	sh->sws_cmng.pending_queries--;
 }
 
 static int
@@ -9536,7 +9539,7 @@ mlx5_flow_dev_dump_sh_all(struct rte_eth_dev *dev,
 	struct mlx5_list_inconst *l_inconst;
 	struct mlx5_list_entry *e;
 	int lcore_index;
-	struct mlx5_flow_counter_mng *cmng = &priv->sh->cmng;
+	struct mlx5_flow_counter_mng *cmng = &priv->sh->sws_cmng;
 	uint32_t max;
 	void *action;
 
@@ -9707,18 +9710,58 @@ mlx5_flow_get_aged_flows(struct rte_eth_dev *dev, void **contexts,
 {
 	const struct mlx5_flow_driver_ops *fops;
 	struct rte_flow_attr attr = { .transfer = 0 };
+	enum mlx5_flow_drv_type type = flow_get_drv_type(dev, &attr);
 
-	if (flow_get_drv_type(dev, &attr) == MLX5_FLOW_TYPE_DV) {
-		fops = flow_get_drv_ops(MLX5_FLOW_TYPE_DV);
-		return fops->get_aged_flows(dev, contexts, nb_contexts,
-						    error);
+	if (type == MLX5_FLOW_TYPE_DV || type == MLX5_FLOW_TYPE_HW) {
+		fops = flow_get_drv_ops(type);
+		return fops->get_aged_flows(dev, contexts, nb_contexts, error);
 	}
-	DRV_LOG(ERR,
-		"port %u get aged flows is not supported.",
-		 dev->data->port_id);
+	DRV_LOG(ERR, "port %u get aged flows is not supported.",
+		dev->data->port_id);
 	return -ENOTSUP;
 }
 
+/**
+ * Get aged-out flows per HWS queue.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] queue_id
+ *   Flow queue to query.
+ * @param[in] contexts
+ *   The address of an array of pointers to the aged-out flow contexts.
+ * @param[in] nb_contexts
+ *   The length of the context array pointers.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. Initialized in case of
+ *   error only.
+ *
+ * @return
+ *   The number of contexts retrieved on success, otherwise a negative errno
+ *   value. If nb_contexts is 0, return the total number of aged-out contexts.
+ *   If nb_contexts is not 0, return the number of aged-out flows reported
+ *   in the context array.
+ */
+int
+mlx5_flow_get_q_aged_flows(struct rte_eth_dev *dev, uint32_t queue_id,
+			   void **contexts, uint32_t nb_contexts,
+			   struct rte_flow_error *error)
+{
+	const struct mlx5_flow_driver_ops *fops;
+	struct rte_flow_attr attr = { 0 };
+
+	if (flow_get_drv_type(dev, &attr) == MLX5_FLOW_TYPE_HW) {
+		fops = flow_get_drv_ops(MLX5_FLOW_TYPE_HW);
+		return fops->get_q_aged_flows(dev, queue_id, contexts,
+					      nb_contexts, error);
+	}
+	DRV_LOG(ERR, "port %u queue %u get aged flows is not supported.",
+		dev->data->port_id, queue_id);
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				  "get Q aged flows with incorrect steering mode");
+}
+
 /* Wrapper for driver action_validate op callback */
 static int
 flow_drv_action_validate(struct rte_eth_dev *dev,
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 81bb7a70c1..9bfb2908a1 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -294,6 +294,8 @@ enum mlx5_feature_name {
 #define MLX5_FLOW_ACTION_METER_WITH_TERMINATED_POLICY (1ull << 40)
 #define MLX5_FLOW_ACTION_CT (1ull << 41)
 #define MLX5_FLOW_ACTION_SEND_TO_KERNEL (1ull << 42)
+#define MLX5_FLOW_ACTION_INDIRECT_COUNT (1ull << 43)
+#define MLX5_FLOW_ACTION_INDIRECT_AGE (1ull << 44)
 
 #define MLX5_FLOW_FATE_ACTIONS \
 	(MLX5_FLOW_ACTION_DROP | MLX5_FLOW_ACTION_QUEUE | \
@@ -1102,6 +1104,22 @@ struct rte_flow {
 	uint32_t geneve_tlv_option; /**< Holds Geneve TLV option id. > */
 } __rte_packed;
 
+/*
+ * HWS COUNTER ID's layout
+ *       3                   2                   1                   0
+ *     1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |  T  |     | D |                                               |
+ *    ~  Y  |     | C |                    IDX                        ~
+ *    |  P  |     | S |                                               |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *
+ *    Bit 31:29 = TYPE = MLX5_INDIRECT_ACTION_TYPE_COUNT = b'10
+ *    Bit 25:24 = DCS index
+ *    Bit 23:00 = IDX of the counter within its DCS bulk.
+ */
+typedef uint32_t cnt_id_t;
+
 #if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
 
 #ifdef PEDANTIC
@@ -1118,7 +1136,8 @@ struct rte_flow_hw {
 		struct mlx5_hrxq *hrxq; /* TIR action. */
 	};
 	struct rte_flow_template_table *table; /* The table flow allcated from. */
-	uint32_t cnt_id;
+	uint32_t age_idx;
+	cnt_id_t cnt_id;
 	uint32_t mtr_id;
 	uint8_t rule[0]; /* HWS layer data struct. */
 } __rte_packed;
@@ -1169,7 +1188,7 @@ struct mlx5_action_construct_data {
 			uint32_t idx; /* Shared action index. */
 		} shared_rss;
 		struct {
-			uint32_t id;
+			cnt_id_t id;
 		} shared_counter;
 		struct {
 			uint32_t id;
@@ -1200,6 +1219,7 @@ struct rte_flow_actions_template {
 	struct rte_flow_action *actions; /* Cached flow actions. */
 	struct rte_flow_action *masks; /* Cached action masks.*/
 	struct mlx5dr_action_template *tmpl; /* mlx5dr action template. */
+	uint64_t action_flags; /* Bit-map of all valid action in template. */
 	uint16_t dr_actions_num; /* Amount of DR rules actions. */
 	uint16_t actions_num; /* Amount of flow actions */
 	uint16_t *actions_off; /* DR action offset for given rte action offset. */
@@ -1256,7 +1276,7 @@ struct mlx5_hw_actions {
 	struct mlx5_hw_encap_decap_action *encap_decap;
 	uint16_t encap_decap_pos; /* Encap/Decap action position. */
 	uint32_t mark:1; /* Indicate the mark action. */
-	uint32_t cnt_id; /* Counter id. */
+	cnt_id_t cnt_id; /* Counter id. */
 	uint32_t mtr_id; /* Meter id. */
 	/* Translated DR action array from action template. */
 	struct mlx5dr_rule_action rule_acts[MLX5_HW_MAX_ACTS];
@@ -1632,6 +1652,12 @@ typedef int (*mlx5_flow_get_aged_flows_t)
 					 void **context,
 					 uint32_t nb_contexts,
 					 struct rte_flow_error *error);
+typedef int (*mlx5_flow_get_q_aged_flows_t)
+					(struct rte_eth_dev *dev,
+					 uint32_t queue_id,
+					 void **context,
+					 uint32_t nb_contexts,
+					 struct rte_flow_error *error);
 typedef int (*mlx5_flow_action_validate_t)
 				(struct rte_eth_dev *dev,
 				 const struct rte_flow_indir_action_conf *conf,
@@ -1838,6 +1864,7 @@ struct mlx5_flow_driver_ops {
 	mlx5_flow_counter_free_t counter_free;
 	mlx5_flow_counter_query_t counter_query;
 	mlx5_flow_get_aged_flows_t get_aged_flows;
+	mlx5_flow_get_q_aged_flows_t get_q_aged_flows;
 	mlx5_flow_action_validate_t action_validate;
 	mlx5_flow_action_create_t action_create;
 	mlx5_flow_action_destroy_t action_destroy;
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index d6a762d57d..12fd62f5e8 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -5520,7 +5520,7 @@ flow_dv_validate_action_age(uint64_t action_flags,
 	const struct rte_flow_action_age *age = action->conf;
 
 	if (!priv->sh->cdev->config.devx ||
-	    (priv->sh->cmng.counter_fallback && !priv->sh->aso_age_mng))
+	    (priv->sh->sws_cmng.counter_fallback && !priv->sh->aso_age_mng))
 		return rte_flow_error_set(error, ENOTSUP,
 					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 					  NULL,
@@ -6081,7 +6081,7 @@ flow_dv_counter_get_by_idx(struct rte_eth_dev *dev,
 			   struct mlx5_flow_counter_pool **ppool)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_flow_counter_mng *cmng = &priv->sh->cmng;
+	struct mlx5_flow_counter_mng *cmng = &priv->sh->sws_cmng;
 	struct mlx5_flow_counter_pool *pool;
 
 	/* Decrease to original index and clear shared bit. */
@@ -6175,7 +6175,7 @@ static int
 flow_dv_container_resize(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_flow_counter_mng *cmng = &priv->sh->cmng;
+	struct mlx5_flow_counter_mng *cmng = &priv->sh->sws_cmng;
 	void *old_pools = cmng->pools;
 	uint32_t resize = cmng->n + MLX5_CNT_CONTAINER_RESIZE;
 	uint32_t mem_size = sizeof(struct mlx5_flow_counter_pool *) * resize;
@@ -6221,7 +6221,7 @@ _flow_dv_query_count(struct rte_eth_dev *dev, uint32_t counter, uint64_t *pkts,
 
 	cnt = flow_dv_counter_get_by_idx(dev, counter, &pool);
 	MLX5_ASSERT(pool);
-	if (priv->sh->cmng.counter_fallback)
+	if (priv->sh->sws_cmng.counter_fallback)
 		return mlx5_devx_cmd_flow_counter_query(cnt->dcs_when_active, 0,
 					0, pkts, bytes, 0, NULL, NULL, 0);
 	rte_spinlock_lock(&pool->sl);
@@ -6258,8 +6258,8 @@ flow_dv_pool_create(struct rte_eth_dev *dev, struct mlx5_devx_obj *dcs,
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_flow_counter_pool *pool;
-	struct mlx5_flow_counter_mng *cmng = &priv->sh->cmng;
-	bool fallback = priv->sh->cmng.counter_fallback;
+	struct mlx5_flow_counter_mng *cmng = &priv->sh->sws_cmng;
+	bool fallback = priv->sh->sws_cmng.counter_fallback;
 	uint32_t size = sizeof(*pool);
 
 	size += MLX5_COUNTERS_PER_POOL * MLX5_CNT_SIZE;
@@ -6320,14 +6320,14 @@ flow_dv_counter_pool_prepare(struct rte_eth_dev *dev,
 			     uint32_t age)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_flow_counter_mng *cmng = &priv->sh->cmng;
+	struct mlx5_flow_counter_mng *cmng = &priv->sh->sws_cmng;
 	struct mlx5_flow_counter_pool *pool;
 	struct mlx5_counters tmp_tq;
 	struct mlx5_devx_obj *dcs = NULL;
 	struct mlx5_flow_counter *cnt;
 	enum mlx5_counter_type cnt_type =
 			age ? MLX5_COUNTER_TYPE_AGE : MLX5_COUNTER_TYPE_ORIGIN;
-	bool fallback = priv->sh->cmng.counter_fallback;
+	bool fallback = priv->sh->sws_cmng.counter_fallback;
 	uint32_t i;
 
 	if (fallback) {
@@ -6391,8 +6391,8 @@ flow_dv_counter_alloc(struct rte_eth_dev *dev, uint32_t age)
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_flow_counter_pool *pool = NULL;
 	struct mlx5_flow_counter *cnt_free = NULL;
-	bool fallback = priv->sh->cmng.counter_fallback;
-	struct mlx5_flow_counter_mng *cmng = &priv->sh->cmng;
+	bool fallback = priv->sh->sws_cmng.counter_fallback;
+	struct mlx5_flow_counter_mng *cmng = &priv->sh->sws_cmng;
 	enum mlx5_counter_type cnt_type =
 			age ? MLX5_COUNTER_TYPE_AGE : MLX5_COUNTER_TYPE_ORIGIN;
 	uint32_t cnt_idx;
@@ -6438,7 +6438,7 @@ flow_dv_counter_alloc(struct rte_eth_dev *dev, uint32_t age)
 	if (_flow_dv_query_count(dev, cnt_idx, &cnt_free->hits,
 				 &cnt_free->bytes))
 		goto err;
-	if (!fallback && !priv->sh->cmng.query_thread_on)
+	if (!fallback && !priv->sh->sws_cmng.query_thread_on)
 		/* Start the asynchronous batch query by the host thread. */
 		mlx5_set_query_alarm(priv->sh);
 	/*
@@ -6566,7 +6566,7 @@ flow_dv_counter_free(struct rte_eth_dev *dev, uint32_t counter)
 	 * this case, lock will not be needed as query callback and release
 	 * function both operate with the different list.
 	 */
-	if (!priv->sh->cmng.counter_fallback) {
+	if (!priv->sh->sws_cmng.counter_fallback) {
 		rte_spinlock_lock(&pool->csl);
 		TAILQ_INSERT_TAIL(&pool->counters[pool->query_gen], cnt, next);
 		rte_spinlock_unlock(&pool->csl);
@@ -6574,10 +6574,10 @@ flow_dv_counter_free(struct rte_eth_dev *dev, uint32_t counter)
 		cnt->dcs_when_free = cnt->dcs_when_active;
 		cnt_type = pool->is_aged ? MLX5_COUNTER_TYPE_AGE :
 					   MLX5_COUNTER_TYPE_ORIGIN;
-		rte_spinlock_lock(&priv->sh->cmng.csl[cnt_type]);
-		TAILQ_INSERT_TAIL(&priv->sh->cmng.counters[cnt_type],
+		rte_spinlock_lock(&priv->sh->sws_cmng.csl[cnt_type]);
+		TAILQ_INSERT_TAIL(&priv->sh->sws_cmng.counters[cnt_type],
 				  cnt, next);
-		rte_spinlock_unlock(&priv->sh->cmng.csl[cnt_type]);
+		rte_spinlock_unlock(&priv->sh->sws_cmng.csl[cnt_type]);
 	}
 }
 
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 6a1c82978f..1a9c5e6d7f 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -477,7 +477,8 @@ __flow_hw_act_data_general_append(struct mlx5_priv *priv,
 				  enum rte_flow_action_type type,
 				  uint16_t action_src,
 				  uint16_t action_dst)
-{	struct mlx5_action_construct_data *act_data;
+{
+	struct mlx5_action_construct_data *act_data;
 
 	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
 	if (!act_data)
@@ -512,7 +513,8 @@ __flow_hw_act_data_encap_append(struct mlx5_priv *priv,
 				uint16_t action_src,
 				uint16_t action_dst,
 				uint16_t len)
-{	struct mlx5_action_construct_data *act_data;
+{
+	struct mlx5_action_construct_data *act_data;
 
 	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
 	if (!act_data)
@@ -582,7 +584,8 @@ __flow_hw_act_data_shared_rss_append(struct mlx5_priv *priv,
 				     uint16_t action_dst,
 				     uint32_t idx,
 				     struct mlx5_shared_action_rss *rss)
-{	struct mlx5_action_construct_data *act_data;
+{
+	struct mlx5_action_construct_data *act_data;
 
 	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
 	if (!act_data)
@@ -621,7 +624,8 @@ __flow_hw_act_data_shared_cnt_append(struct mlx5_priv *priv,
 				     uint16_t action_src,
 				     uint16_t action_dst,
 				     cnt_id_t cnt_id)
-{	struct mlx5_action_construct_data *act_data;
+{
+	struct mlx5_action_construct_data *act_data;
 
 	act_data = __flow_hw_act_data_alloc(priv, type, action_src, action_dst);
 	if (!act_data)
@@ -717,6 +721,10 @@ flow_hw_shared_action_translate(struct rte_eth_dev *dev,
 			action_src, action_dst, act_idx))
 			return -1;
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_AGE:
+		/* Not supported, prevent by validate function. */
+		MLX5_ASSERT(0);
+		break;
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
 		if (flow_hw_ct_compile(dev, MLX5_HW_INV_QUEUE,
 				       idx, &acts->rule_acts[action_dst]))
@@ -1109,7 +1117,7 @@ flow_hw_cnt_compile(struct rte_eth_dev *dev, uint32_t  start_pos,
 	cnt_id_t cnt_id;
 	int ret;
 
-	ret = mlx5_hws_cnt_shared_get(priv->hws_cpool, &cnt_id);
+	ret = mlx5_hws_cnt_shared_get(priv->hws_cpool, &cnt_id, 0);
 	if (ret != 0)
 		return ret;
 	ret = mlx5_hws_cnt_pool_get_action_offset
@@ -1250,8 +1258,6 @@ flow_hw_meter_mark_compile(struct rte_eth_dev *dev,
  *   Pointer to the rte_eth_dev structure.
  * @param[in] cfg
  *   Pointer to the table configuration.
- * @param[in] item_templates
- *   Item template array to be binded to the table.
  * @param[in/out] acts
  *   Pointer to the template HW steering DR actions.
  * @param[in] at
@@ -1260,7 +1266,7 @@ flow_hw_meter_mark_compile(struct rte_eth_dev *dev,
  *   Pointer to error structure.
  *
  * @return
- *    Table on success, NULL otherwise and rte_errno is set.
+ *   0 on success, a negative errno otherwise and rte_errno is set.
  */
 static int
 __flow_hw_actions_translate(struct rte_eth_dev *dev,
@@ -1289,6 +1295,7 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
 	uint16_t jump_pos;
 	uint32_t ct_idx;
 	int err;
+	uint32_t target_grp = 0;
 
 	flow_hw_modify_field_init(&mhdr, at);
 	if (attr->transfer)
@@ -1519,8 +1526,42 @@ __flow_hw_actions_translate(struct rte_eth_dev *dev,
 							action_pos))
 				goto err;
 			break;
+		case RTE_FLOW_ACTION_TYPE_AGE:
+			flow_hw_translate_group(dev, cfg, attr->group,
+						&target_grp, error);
+			if (target_grp == 0) {
+				__flow_hw_action_template_destroy(dev, acts);
+				return rte_flow_error_set(error, ENOTSUP,
+						RTE_FLOW_ERROR_TYPE_ACTION,
+						NULL,
+						"Age action on root table is not supported in HW steering mode");
+			}
+			action_pos = at->actions_off[actions - at->actions];
+			if (__flow_hw_act_data_general_append(priv, acts,
+							 actions->type,
+							 actions - action_start,
+							 action_pos))
+				goto err;
+			break;
 		case RTE_FLOW_ACTION_TYPE_COUNT:
-			action_pos = at->actions_off[actions - action_start];
+			flow_hw_translate_group(dev, cfg, attr->group,
+						&target_grp, error);
+			if (target_grp == 0) {
+				__flow_hw_action_template_destroy(dev, acts);
+				return rte_flow_error_set(error, ENOTSUP,
+						RTE_FLOW_ERROR_TYPE_ACTION,
+						NULL,
+						"Counter action on root table is not supported in HW steering mode");
+			}
+			if ((at->action_flags & MLX5_FLOW_ACTION_AGE) ||
+			    (at->action_flags & MLX5_FLOW_ACTION_INDIRECT_AGE))
+				/*
+				 * When both COUNT and AGE are requested, they
+				 * are saved as an AGE action, which also
+				 * creates the counter.
+				 */
+				break;
+			action_pos = at->actions_off[actions - at->actions];
 			if (masks->conf &&
 			    ((const struct rte_flow_action_count *)
 			     masks->conf)->id) {
@@ -1747,6 +1788,10 @@ flow_hw_shared_action_get(struct rte_eth_dev *dev,
  *   Pointer to the flow table.
  * @param[in] it_idx
  *   Item template index the action template refer to.
+ * @param[in] action_flags
+ *   Actions bit-map detected in this template.
+ * @param[in, out] flow
+ *   Pointer to the flow containing the counter.
  * @param[in] rule_act
  *   Pointer to the shared action's destination rule DR action.
  *
@@ -1757,7 +1802,8 @@ static __rte_always_inline int
 flow_hw_shared_action_construct(struct rte_eth_dev *dev, uint32_t queue,
 				const struct rte_flow_action *action,
 				struct rte_flow_template_table *table,
-				const uint8_t it_idx,
+				const uint8_t it_idx, uint64_t action_flags,
+				struct rte_flow_hw *flow,
 				struct mlx5dr_rule_action *rule_act)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -1765,11 +1811,14 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev, uint32_t queue,
 	struct mlx5_action_construct_data act_data;
 	struct mlx5_shared_action_rss *shared_rss;
 	struct mlx5_aso_mtr *aso_mtr;
+	struct mlx5_age_info *age_info;
+	struct mlx5_hws_age_param *param;
 	uint32_t act_idx = (uint32_t)(uintptr_t)action->conf;
 	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
 	uint32_t idx = act_idx &
 		       ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
 	uint64_t item_flags;
+	cnt_id_t age_cnt;
 
 	memset(&act_data, 0, sizeof(act_data));
 	switch (type) {
@@ -1795,6 +1844,44 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev, uint32_t queue,
 				&rule_act->action,
 				&rule_act->counter.offset))
 			return -1;
+		flow->cnt_id = act_idx;
+		break;
+	case MLX5_INDIRECT_ACTION_TYPE_AGE:
+		/*
+		 * Save the index with the indirect type, to recognize
+		 * it in flow destroy.
+		 */
+		flow->age_idx = act_idx;
+		if (action_flags & MLX5_FLOW_ACTION_INDIRECT_COUNT)
+			/*
+			 * The mutual update for indirect AGE & COUNT will be
+			 * performed later, once we have IDs for both of them.
+			 */
+			break;
+		age_info = GET_PORT_AGE_INFO(priv);
+		param = mlx5_ipool_get(age_info->ages_ipool, idx);
+		if (param == NULL)
+			return -1;
+		if (action_flags & MLX5_FLOW_ACTION_COUNT) {
+			if (mlx5_hws_cnt_pool_get(priv->hws_cpool,
+						  &param->queue_id, &age_cnt,
+						  idx) < 0)
+				return -1;
+			flow->cnt_id = age_cnt;
+			param->nb_cnts++;
+		} else {
+			/*
+			 * Get the counter of this indirect AGE or create one
+			 * if it does not exist.
+			 */
+			age_cnt = mlx5_hws_age_cnt_get(priv, param, idx);
+			if (age_cnt == 0)
+				return -1;
+		}
+		if (mlx5_hws_cnt_pool_get_action_offset(priv->hws_cpool,
+						     age_cnt, &rule_act->action,
+						     &rule_act->counter.offset))
+			return -1;
 		break;
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
 		if (flow_hw_ct_compile(dev, queue, idx, rule_act))
@@ -1955,7 +2042,8 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			  const uint8_t it_idx,
 			  const struct rte_flow_action actions[],
 			  struct mlx5dr_rule_action *rule_acts,
-			  uint32_t queue)
+			  uint32_t queue,
+			  struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
@@ -1968,6 +2056,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	const struct rte_flow_item *enc_item = NULL;
 	const struct rte_flow_action_ethdev *port_action = NULL;
 	const struct rte_flow_action_meter *meter = NULL;
+	const struct rte_flow_action_age *age = NULL;
 	uint8_t *buf = job->encap_data;
 	struct rte_flow_attr attr = {
 			.ingress = 1,
@@ -1975,6 +2064,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	uint32_t ft_flag;
 	size_t encap_len = 0;
 	int ret;
+	uint32_t age_idx = 0;
 	struct mlx5_aso_mtr *aso_mtr;
 
 	rte_memcpy(rule_acts, hw_acts->rule_acts, sizeof(*rule_acts) * at->dr_actions_num);
@@ -2027,6 +2117,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_INDIRECT:
 			if (flow_hw_shared_action_construct
 					(dev, queue, action, table, it_idx,
+					 at->action_flags, job->flow,
 					 &rule_acts[act_data->action_dst]))
 				return -1;
 			break;
@@ -2135,9 +2226,32 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			if (mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr))
 				return -1;
 			break;
+		case RTE_FLOW_ACTION_TYPE_AGE:
+			age = action->conf;
+			/*
+			 * First, create the AGE parameter, then create its
+			 * counter later:
+			 * Regular counter - in the next case.
+			 * Indirect counter - updated after the loop.
+			 */
+			age_idx = mlx5_hws_age_action_create(priv, queue, 0,
+							     age,
+							     job->flow->idx,
+							     error);
+			if (age_idx == 0)
+				return -rte_errno;
+			job->flow->age_idx = age_idx;
+			if (at->action_flags & MLX5_FLOW_ACTION_INDIRECT_COUNT)
+				/*
+				 * When AGE uses an indirect counter, there is
+				 * no need to create a counter here; it will be
+				 * updated with the AGE parameter after the loop.
+				 */
+				break;
+			/* Fall-through. */
 		case RTE_FLOW_ACTION_TYPE_COUNT:
 			ret = mlx5_hws_cnt_pool_get(priv->hws_cpool, &queue,
-					&cnt_id);
+						    &cnt_id, age_idx);
 			if (ret != 0)
 				return ret;
 			ret = mlx5_hws_cnt_pool_get_action_offset
@@ -2194,6 +2308,25 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			break;
 		}
 	}
+	if (at->action_flags & MLX5_FLOW_ACTION_INDIRECT_COUNT) {
+		if (at->action_flags & MLX5_FLOW_ACTION_INDIRECT_AGE) {
+			age_idx = job->flow->age_idx & MLX5_HWS_AGE_IDX_MASK;
+			if (mlx5_hws_cnt_age_get(priv->hws_cpool,
+						 job->flow->cnt_id) != age_idx)
+				/*
+				 * This is first use of this indirect counter
+				 * This is the first use of this indirect
+				 * counter for this indirect AGE; increase the
+				 * number of counters.
+				mlx5_hws_age_nb_cnt_increase(priv, age_idx);
+		}
+		/*
+	 * Update this indirect counter with the indirect/direct AGE that
+	 * is using it.
+		 */
+		mlx5_hws_cnt_age_set(priv->hws_cpool, job->flow->cnt_id,
+				     age_idx);
+	}
 	if (hw_acts->encap_decap && !hw_acts->encap_decap->shared) {
 		rule_acts[hw_acts->encap_decap_pos].reformat.offset =
 				job->flow->idx - 1;
@@ -2343,8 +2476,10 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	 * No need to copy and contrust a new "actions" list based on the
 	 * user's input, in order to save the cost.
 	 */
-	if (flow_hw_actions_construct(dev, job, &table->ats[action_template_index],
-				      pattern_template_index, actions, rule_acts, queue)) {
+	if (flow_hw_actions_construct(dev, job,
+				      &table->ats[action_template_index],
+				      pattern_template_index, actions,
+				      rule_acts, queue, error)) {
 		rte_errno = EINVAL;
 		goto free;
 	}
@@ -2429,6 +2564,49 @@ flow_hw_async_flow_destroy(struct rte_eth_dev *dev,
 			"fail to create rte flow");
 }
 
+/**
+ * Release the AGE and counter for given flow.
+ *
+ * @param[in] priv
+ *   Pointer to the port private data structure.
+ * @param[in] queue
+ *   The queue to release the counter.
+ * @param[in, out] flow
+ *   Pointer to the flow containing the counter.
+ * @param[out] error
+ *   Pointer to error structure.
+ */
+static void
+flow_hw_age_count_release(struct mlx5_priv *priv, uint32_t queue,
+			  struct rte_flow_hw *flow,
+			  struct rte_flow_error *error)
+{
+	if (mlx5_hws_cnt_is_shared(priv->hws_cpool, flow->cnt_id)) {
+		if (flow->age_idx && !mlx5_hws_age_is_indirect(flow->age_idx)) {
+			/* Remove this AGE parameter from indirect counter. */
+			mlx5_hws_cnt_age_set(priv->hws_cpool, flow->cnt_id, 0);
+			/* Release the AGE parameter. */
+			mlx5_hws_age_action_destroy(priv, flow->age_idx, error);
+			flow->age_idx = 0;
+		}
+		return;
+	}
+	/* Put the counter first to reduce the race risk in BG thread. */
+	mlx5_hws_cnt_pool_put(priv->hws_cpool, &queue, &flow->cnt_id);
+	flow->cnt_id = 0;
+	if (flow->age_idx) {
+		if (mlx5_hws_age_is_indirect(flow->age_idx)) {
+			uint32_t idx = flow->age_idx & MLX5_HWS_AGE_IDX_MASK;
+
+			mlx5_hws_age_nb_cnt_decrease(priv, idx);
+		} else {
+			/* Release the AGE parameter. */
+			mlx5_hws_age_action_destroy(priv, flow->age_idx, error);
+		}
+		flow->age_idx = 0;
+	}
+}
+
 /**
  * Pull the enqueued flows.
  *
@@ -2475,13 +2653,9 @@ flow_hw_pull(struct rte_eth_dev *dev,
 				flow_hw_jump_release(dev, job->flow->jump);
 			else if (job->flow->fate_type == MLX5_FLOW_FATE_QUEUE)
 				mlx5_hrxq_obj_release(dev, job->flow->hrxq);
-			if (mlx5_hws_cnt_id_valid(job->flow->cnt_id) &&
-			    mlx5_hws_cnt_is_shared
-				(priv->hws_cpool, job->flow->cnt_id) == false) {
-				mlx5_hws_cnt_pool_put(priv->hws_cpool, &queue,
-						&job->flow->cnt_id);
-				job->flow->cnt_id = 0;
-			}
+			if (mlx5_hws_cnt_id_valid(job->flow->cnt_id))
+				flow_hw_age_count_release(priv, queue,
+							  job->flow, error);
 			if (job->flow->mtr_id) {
 				mlx5_ipool_free(pool->idx_pool,	job->flow->mtr_id);
 				job->flow->mtr_id = 0;
@@ -3134,100 +3308,315 @@ flow_hw_validate_action_represented_port(struct rte_eth_dev *dev,
 	return 0;
 }
 
-static inline int
-flow_hw_action_meta_copy_insert(const struct rte_flow_action actions[],
-				const struct rte_flow_action masks[],
-				const struct rte_flow_action *ins_actions,
-				const struct rte_flow_action *ins_masks,
-				struct rte_flow_action *new_actions,
-				struct rte_flow_action *new_masks,
-				uint16_t *ins_pos)
+/**
+ * Validate AGE action.
+ *
+ * @param[in] dev
+ *   Pointer to rte_eth_dev structure.
+ * @param[in] action
+ *   Pointer to the indirect action.
+ * @param[in] action_flags
+ *   Holds the actions detected until now.
+ * @param[in] fixed_cnt
+ *   Indicator if this list has a fixed COUNT action.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_hw_validate_action_age(struct rte_eth_dev *dev,
+			    const struct rte_flow_action *action,
+			    uint64_t action_flags, bool fixed_cnt,
+			    struct rte_flow_error *error)
 {
-	uint16_t idx, total = 0;
-	uint16_t end_idx = UINT16_MAX;
-	bool act_end = false;
-	bool modify_field = false;
-	bool rss_or_queue = false;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
 
-	MLX5_ASSERT(actions && masks);
-	MLX5_ASSERT(new_actions && new_masks);
-	MLX5_ASSERT(ins_actions && ins_masks);
-	for (idx = 0; !act_end; idx++) {
-		switch (actions[idx].type) {
-		case RTE_FLOW_ACTION_TYPE_RSS:
-		case RTE_FLOW_ACTION_TYPE_QUEUE:
-			/* It is assumed that application provided only single RSS/QUEUE action. */
-			MLX5_ASSERT(!rss_or_queue);
-			rss_or_queue = true;
-			break;
-		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
-			modify_field = true;
-			break;
-		case RTE_FLOW_ACTION_TYPE_END:
-			end_idx = idx;
-			act_end = true;
-			break;
-		default:
-			break;
+	if (!priv->sh->cdev->config.devx)
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "AGE action not supported");
+	if (age_info->ages_ipool == NULL)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					  "aging pool not initialized");
+	if ((action_flags & MLX5_FLOW_ACTION_AGE) ||
+	    (action_flags & MLX5_FLOW_ACTION_INDIRECT_AGE))
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "duplicate AGE actions set");
+	if (fixed_cnt)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "AGE and fixed COUNT combination is not supported");
+	return 0;
+}
+
+/**
+ * Validate count action.
+ *
+ * @param[in] dev
+ *   Pointer to rte_eth_dev structure.
+ * @param[in] action
+ *   Pointer to the indirect action.
+ * @param[in] mask
+ *   Pointer to the indirect action mask.
+ * @param[in] action_flags
+ *   Holds the actions detected until now.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_hw_validate_action_count(struct rte_eth_dev *dev,
+			      const struct rte_flow_action *action,
+			      const struct rte_flow_action *mask,
+			      uint64_t action_flags,
+			      struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_action_count *count = mask->conf;
+
+	if (!priv->sh->cdev->config.devx)
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "count action not supported");
+	if (!priv->hws_cpool)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "counters pool not initialized");
+	if ((action_flags & MLX5_FLOW_ACTION_COUNT) ||
+	    (action_flags & MLX5_FLOW_ACTION_INDIRECT_COUNT))
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "duplicate count actions set");
+	if (count && count->id && (action_flags & MLX5_FLOW_ACTION_AGE))
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, mask,
+					  "AGE and COUNT action shared by mask combination is not supported");
+	return 0;
+}
+
+/**
+ * Validate meter_mark action.
+ *
+ * @param[in] dev
+ *   Pointer to rte_eth_dev structure.
+ * @param[in] action
+ *   Pointer to the indirect action.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_hw_validate_action_meter_mark(struct rte_eth_dev *dev,
+			      const struct rte_flow_action *action,
+			      struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	RTE_SET_USED(action);
+
+	if (!priv->sh->cdev->config.devx)
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "meter_mark action not supported");
+	if (!priv->hws_mpool)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "meter_mark pool not initialized");
+	return 0;
+}
+
+/**
+ * Validate indirect action.
+ *
+ * @param[in] dev
+ *   Pointer to rte_eth_dev structure.
+ * @param[in] action
+ *   Pointer to the indirect action.
+ * @param[in] mask
+ *   Pointer to the indirect action mask.
+ * @param[in, out] action_flags
+ *   Holds the actions detected until now.
+ * @param[in, out] fixed_cnt
+ *   Pointer to indicator if this list has a fixed COUNT action.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_hw_validate_action_indirect(struct rte_eth_dev *dev,
+				 const struct rte_flow_action *action,
+				 const struct rte_flow_action *mask,
+				 uint64_t *action_flags, bool *fixed_cnt,
+				 struct rte_flow_error *error)
+{
+	uint32_t type;
+	int ret;
+
+	if (!mask)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  "Unable to determine indirect action type without a mask specified");
+	type = mask->type;
+	switch (type) {
+	case RTE_FLOW_ACTION_TYPE_METER_MARK:
+		ret = flow_hw_validate_action_meter_mark(dev, mask, error);
+		if (ret < 0)
+			return ret;
+		*action_flags |= MLX5_FLOW_ACTION_METER;
+		break;
+	case RTE_FLOW_ACTION_TYPE_RSS:
+		/* TODO: Validation logic (same as flow_hw_actions_validate) */
+		*action_flags |= MLX5_FLOW_ACTION_RSS;
+		break;
+	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+		/* TODO: Validation logic (same as flow_hw_actions_validate) */
+		*action_flags |= MLX5_FLOW_ACTION_CT;
+		break;
+	case RTE_FLOW_ACTION_TYPE_COUNT:
+		if (action->conf && mask->conf) {
+			if ((*action_flags & MLX5_FLOW_ACTION_AGE) ||
+			    (*action_flags & MLX5_FLOW_ACTION_INDIRECT_AGE))
+				/*
+				 * AGE cannot use an indirect counter that is
+				 * shared with other flow rules.
+				 */
+				return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  NULL,
+						  "AGE and fixed COUNT combination is not supported");
+			*fixed_cnt = true;
 		}
+		ret = flow_hw_validate_action_count(dev, action, mask,
+						    *action_flags, error);
+		if (ret < 0)
+			return ret;
+		*action_flags |= MLX5_FLOW_ACTION_INDIRECT_COUNT;
+		break;
+	case RTE_FLOW_ACTION_TYPE_AGE:
+		ret = flow_hw_validate_action_age(dev, action, *action_flags,
+						  *fixed_cnt, error);
+		if (ret < 0)
+			return ret;
+		*action_flags |= MLX5_FLOW_ACTION_INDIRECT_AGE;
+		break;
+	default:
+		DRV_LOG(WARNING, "Unsupported shared action type: %d", type);
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, mask,
+					  "Unsupported indirect action type");
 	}
-	if (!rss_or_queue)
-		return 0;
-	else if (idx >= MLX5_HW_MAX_ACTS)
-		return -1; /* No more space. */
-	total = idx;
-	/*
-	 * If actions template contains MODIFY_FIELD action, then meta copy action can be inserted
-	 * at the template's end. Position of MODIFY_HDR action is based on the position of the
-	 * first MODIFY_FIELD flow action.
-	 */
-	if (modify_field) {
-		*ins_pos = end_idx;
-		goto insert_meta_copy;
-	}
-	/*
-	 * If actions template does not contain MODIFY_FIELD action, then meta copy action must be
-	 * inserted at aplace conforming with action order defined in steering/mlx5dr_action.c.
+	return 0;
+}
+
+/**
+ * Validate raw_encap action.
+ *
+ * @param[in] dev
+ *   Pointer to rte_eth_dev structure.
+ * @param[in] action
+ *   Pointer to the raw_encap action.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_hw_validate_action_raw_encap(struct rte_eth_dev *dev __rte_unused,
+				  const struct rte_flow_action *action,
+				  struct rte_flow_error *error)
+{
+	const struct rte_flow_action_raw_encap *raw_encap_data = action->conf;
+
+	if (!raw_encap_data || !raw_encap_data->size || !raw_encap_data->data)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION, action,
+					  "invalid raw_encap_data");
+	return 0;
+}
+
+static inline uint16_t
+flow_hw_template_expand_modify_field(const struct rte_flow_action actions[],
+				     const struct rte_flow_action masks[],
+				     const struct rte_flow_action *mf_action,
+				     const struct rte_flow_action *mf_mask,
+				     struct rte_flow_action *new_actions,
+				     struct rte_flow_action *new_masks,
+				     uint64_t flags, uint32_t act_num)
+{
+	uint32_t i, tail;
+
+	MLX5_ASSERT(actions && masks);
+	MLX5_ASSERT(new_actions && new_masks);
+	MLX5_ASSERT(mf_action && mf_mask);
+	if (flags & MLX5_FLOW_ACTION_MODIFY_FIELD) {
+		/*
+		 * The application action template already has Modify Field.
+		 * Its location will be used in DR.
+		 * The expanded MF action can be added before the END.
+		 */
+		i = act_num - 1;
+		goto insert;
+	}
+	/**
+	 * Locate the first action positioned BEFORE the new MF.
+	 *
+	 * Search for a place to insert modify header
+	 * from the END action backwards:
+	 * 1. END is always present in actions array
+	 * 2. END location is always at action[act_num - 1]
+	 * 3. END always positioned AFTER modify field location
+	 *
+	 * Relative actions order is the same for RX, TX and FDB.
+	 *
+	 * Current actions order (draft-3)
+	 * @see action_order_arr[]
 	 */
-	act_end = false;
-	for (idx = 0; !act_end; idx++) {
-		switch (actions[idx].type) {
-		case RTE_FLOW_ACTION_TYPE_COUNT:
-		case RTE_FLOW_ACTION_TYPE_METER:
-		case RTE_FLOW_ACTION_TYPE_METER_MARK:
-		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+	for (i = act_num - 2; (int)i >= 0; i--) {
+		enum rte_flow_action_type type = actions[i].type;
+
+		if (type == RTE_FLOW_ACTION_TYPE_INDIRECT)
+			type = masks[i].type;
+		switch (type) {
 		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
 		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
 		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
-		case RTE_FLOW_ACTION_TYPE_RSS:
+		case RTE_FLOW_ACTION_TYPE_DROP:
+		case RTE_FLOW_ACTION_TYPE_JUMP:
 		case RTE_FLOW_ACTION_TYPE_QUEUE:
-			*ins_pos = idx;
-			act_end = true;
-			break;
+		case RTE_FLOW_ACTION_TYPE_RSS:
+		case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
+		case RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID:
+		case RTE_FLOW_ACTION_TYPE_VOID:
 		case RTE_FLOW_ACTION_TYPE_END:
-			act_end = true;
 			break;
 		default:
+			i++; /* new MF inserted AFTER actions[i] */
+			goto insert;
 			break;
 		}
 	}
-insert_meta_copy:
-	MLX5_ASSERT(*ins_pos != UINT16_MAX);
-	MLX5_ASSERT(*ins_pos < total);
-	/* Before the position, no change for the actions. */
-	for (idx = 0; idx < *ins_pos; idx++) {
-		new_actions[idx] = actions[idx];
-		new_masks[idx] = masks[idx];
-	}
-	/* Insert the new action and mask to the position. */
-	new_actions[idx] = *ins_actions;
-	new_masks[idx] = *ins_masks;
-	/* Remaining content is right shifted by one position. */
-	for (; idx < total; idx++) {
-		new_actions[idx + 1] = actions[idx];
-		new_masks[idx + 1] = masks[idx];
-	}
-	return 0;
+	i = 0;
+insert:
+	tail = act_num - i; /* num action to move */
+	memcpy(new_actions, actions, sizeof(actions[0]) * i);
+	new_actions[i] = *mf_action;
+	memcpy(new_actions + i + 1, actions + i, sizeof(actions[0]) * tail);
+	memcpy(new_masks, masks, sizeof(masks[0]) * i);
+	new_masks[i] = *mf_mask;
+	memcpy(new_masks + i + 1, masks + i, sizeof(masks[0]) * tail);
+	return i;
 }
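
For clarity, a minimal standalone sketch of the insert-at-position pattern used by flow_hw_template_expand_modify_field() above; the helper name and buffer sizes are illustrative only:

#include <string.h>
#include <rte_flow.h>

/*
 * Insert one action at position "i" of "src", writing the result into "dst".
 * "dst" must have room for at least "act_num" + 1 entries.
 */
static void
actions_insert_at(struct rte_flow_action *dst,
		  const struct rte_flow_action *src,
		  const struct rte_flow_action *elem,
		  uint32_t i, uint32_t act_num)
{
	uint32_t tail = act_num - i; /* number of trailing actions to shift */

	memcpy(dst, src, sizeof(src[0]) * i);                /* unchanged head */
	dst[i] = *elem;                                      /* inserted action */
	memcpy(dst + i + 1, src + i, sizeof(src[0]) * tail); /* shifted tail */
}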
 
 static int
@@ -3298,13 +3687,17 @@ flow_hw_validate_action_push_vlan(struct rte_eth_dev *dev,
 }
 
 static int
-flow_hw_actions_validate(struct rte_eth_dev *dev,
-			const struct rte_flow_actions_template_attr *attr,
-			const struct rte_flow_action actions[],
-			const struct rte_flow_action masks[],
-			struct rte_flow_error *error)
+mlx5_flow_hw_actions_validate(struct rte_eth_dev *dev,
+			      const struct rte_flow_actions_template_attr *attr,
+			      const struct rte_flow_action actions[],
+			      const struct rte_flow_action masks[],
+			      uint64_t *act_flags,
+			      struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_action_count *count_mask = NULL;
+	bool fixed_cnt = false;
+	uint64_t action_flags = 0;
 	uint16_t i;
 	bool actions_end = false;
 	int ret;
@@ -3330,46 +3723,70 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			break;
 		case RTE_FLOW_ACTION_TYPE_INDIRECT:
-			/* TODO: Validation logic */
+			ret = flow_hw_validate_action_indirect(dev, action,
+							       mask,
+							       &action_flags,
+							       &fixed_cnt,
+							       error);
+			if (ret < 0)
+				return ret;
 			break;
 		case RTE_FLOW_ACTION_TYPE_MARK:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_MARK;
 			break;
 		case RTE_FLOW_ACTION_TYPE_DROP:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_DROP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_JUMP:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_JUMP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_QUEUE:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_QUEUE;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RSS:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_RSS;
 			break;
 		case RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_ENCAP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_NVGRE_ENCAP:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_ENCAP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_VXLAN_DECAP:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_DECAP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_NVGRE_DECAP:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_DECAP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RAW_ENCAP:
-			/* TODO: Validation logic */
+			ret = flow_hw_validate_action_raw_encap(dev, action, error);
+			if (ret < 0)
+				return ret;
+			action_flags |= MLX5_FLOW_ACTION_ENCAP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_RAW_DECAP:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_DECAP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_METER:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_METER;
 			break;
 		case RTE_FLOW_ACTION_TYPE_METER_MARK:
-			/* TODO: Validation logic */
+			ret = flow_hw_validate_action_meter_mark(dev, action,
+								 error);
+			if (ret < 0)
+				return ret;
+			action_flags |= MLX5_FLOW_ACTION_METER;
 			break;
 		case RTE_FLOW_ACTION_TYPE_MODIFY_FIELD:
 			ret = flow_hw_validate_action_modify_field(action,
@@ -3377,21 +3794,43 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 									error);
 			if (ret < 0)
 				return ret;
+			action_flags |= MLX5_FLOW_ACTION_MODIFY_FIELD;
 			break;
 		case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
 			ret = flow_hw_validate_action_represented_port
 					(dev, action, mask, error);
 			if (ret < 0)
 				return ret;
+			action_flags |= MLX5_FLOW_ACTION_PORT_ID;
+			break;
+		case RTE_FLOW_ACTION_TYPE_AGE:
+			if (count_mask && count_mask->id)
+				fixed_cnt = true;
+			ret = flow_hw_validate_action_age(dev, action,
+							  action_flags,
+							  fixed_cnt, error);
+			if (ret < 0)
+				return ret;
+			action_flags |= MLX5_FLOW_ACTION_AGE;
 			break;
 		case RTE_FLOW_ACTION_TYPE_COUNT:
-			/* TODO: Validation logic */
+			ret = flow_hw_validate_action_count(dev, action, mask,
+							    action_flags,
+							    error);
+			if (ret < 0)
+				return ret;
+			count_mask = mask->conf;
+			action_flags |= MLX5_FLOW_ACTION_COUNT;
 			break;
 		case RTE_FLOW_ACTION_TYPE_CONNTRACK:
 			/* TODO: Validation logic */
+			action_flags |= MLX5_FLOW_ACTION_CT;
 			break;
 		case RTE_FLOW_ACTION_TYPE_OF_POP_VLAN:
+			action_flags |= MLX5_FLOW_ACTION_OF_POP_VLAN;
+			break;
 		case RTE_FLOW_ACTION_TYPE_OF_SET_VLAN_VID:
+			action_flags |= MLX5_FLOW_ACTION_OF_SET_VLAN_VID;
 			break;
 		case RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN:
 			ret = flow_hw_validate_action_push_vlan
@@ -3401,6 +3840,7 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 			i += is_of_vlan_pcp_present(action) ?
 				MLX5_HW_VLAN_PUSH_PCP_IDX :
 				MLX5_HW_VLAN_PUSH_VID_IDX;
+			action_flags |= MLX5_FLOW_ACTION_OF_PUSH_VLAN;
 			break;
 		case RTE_FLOW_ACTION_TYPE_END:
 			actions_end = true;
@@ -3412,9 +3852,23 @@ flow_hw_actions_validate(struct rte_eth_dev *dev,
 						  "action not supported in template API");
 		}
 	}
+	if (act_flags != NULL)
+		*act_flags = action_flags;
 	return 0;
 }
 
+static int
+flow_hw_actions_validate(struct rte_eth_dev *dev,
+			 const struct rte_flow_actions_template_attr *attr,
+			 const struct rte_flow_action actions[],
+			 const struct rte_flow_action masks[],
+			 struct rte_flow_error *error)
+{
+	return mlx5_flow_hw_actions_validate(dev, attr, actions, masks, NULL,
+					     error);
+}
+
+
 static enum mlx5dr_action_type mlx5_hw_dr_action_types[] = {
 	[RTE_FLOW_ACTION_TYPE_MARK] = MLX5DR_ACTION_TYP_TAG,
 	[RTE_FLOW_ACTION_TYPE_DROP] = MLX5DR_ACTION_TYP_DROP,
@@ -3427,7 +3881,6 @@ static enum mlx5dr_action_type mlx5_hw_dr_action_types[] = {
 	[RTE_FLOW_ACTION_TYPE_NVGRE_DECAP] = MLX5DR_ACTION_TYP_TNL_L2_TO_L2,
 	[RTE_FLOW_ACTION_TYPE_MODIFY_FIELD] = MLX5DR_ACTION_TYP_MODIFY_HDR,
 	[RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT] = MLX5DR_ACTION_TYP_VPORT,
-	[RTE_FLOW_ACTION_TYPE_COUNT] = MLX5DR_ACTION_TYP_CTR,
 	[RTE_FLOW_ACTION_TYPE_CONNTRACK] = MLX5DR_ACTION_TYP_ASO_CT,
 	[RTE_FLOW_ACTION_TYPE_OF_POP_VLAN] = MLX5DR_ACTION_TYP_POP_VLAN,
 	[RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN] = MLX5DR_ACTION_TYP_PUSH_VLAN,
@@ -3437,7 +3890,7 @@ static int
 flow_hw_dr_actions_template_handle_shared(const struct rte_flow_action *mask,
 					  unsigned int action_src,
 					  enum mlx5dr_action_type *action_types,
-					  uint16_t *curr_off,
+					  uint16_t *curr_off, uint16_t *cnt_off,
 					  struct rte_flow_actions_template *at)
 {
 	uint32_t type;
@@ -3454,10 +3907,18 @@ flow_hw_dr_actions_template_handle_shared(const struct rte_flow_action *mask,
 		action_types[*curr_off] = MLX5DR_ACTION_TYP_TIR;
 		*curr_off = *curr_off + 1;
 		break;
+	case RTE_FLOW_ACTION_TYPE_AGE:
 	case RTE_FLOW_ACTION_TYPE_COUNT:
-		at->actions_off[action_src] = *curr_off;
-		action_types[*curr_off] = MLX5DR_ACTION_TYP_CTR;
-		*curr_off = *curr_off + 1;
+		/*
+		 * Both the AGE and COUNT actions need a counter: the first
+		 * one fills the action_types array and the second only saves
+		 * the offset.
+		 */
+		if (*cnt_off == UINT16_MAX) {
+			*cnt_off = *curr_off;
+			action_types[*cnt_off] = MLX5DR_ACTION_TYP_CTR;
+			*curr_off = *curr_off + 1;
+		}
+		at->actions_off[action_src] = *cnt_off;
 		break;
 	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
 		at->actions_off[action_src] = *curr_off;
@@ -3496,6 +3957,7 @@ flow_hw_dr_actions_template_create(struct rte_flow_actions_template *at)
 	enum mlx5dr_action_type reformat_act_type = MLX5DR_ACTION_TYP_TNL_L2_TO_L2;
 	uint16_t reformat_off = UINT16_MAX;
 	uint16_t mhdr_off = UINT16_MAX;
+	uint16_t cnt_off = UINT16_MAX;
 	int ret;
 	for (i = 0, curr_off = 0; at->actions[i].type != RTE_FLOW_ACTION_TYPE_END; ++i) {
 		const struct rte_flow_action_raw_encap *raw_encap_data;
@@ -3508,9 +3970,12 @@ flow_hw_dr_actions_template_create(struct rte_flow_actions_template *at)
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			break;
 		case RTE_FLOW_ACTION_TYPE_INDIRECT:
-			ret = flow_hw_dr_actions_template_handle_shared(&at->masks[i], i,
-									action_types,
-									&curr_off, at);
+			ret = flow_hw_dr_actions_template_handle_shared
+								 (&at->masks[i],
+								  i,
+								  action_types,
+								  &curr_off,
+								  &cnt_off, at);
 			if (ret)
 				return NULL;
 			break;
@@ -3566,6 +4031,19 @@ flow_hw_dr_actions_template_create(struct rte_flow_actions_template *at)
 			if (curr_off >= MLX5_HW_MAX_ACTS)
 				goto err_actions_num;
 			break;
+		case RTE_FLOW_ACTION_TYPE_AGE:
+		case RTE_FLOW_ACTION_TYPE_COUNT:
+			/*
+			 * Both the AGE and COUNT actions need a counter: the
+			 * first one fills the action_types array and the
+			 * second only saves the offset.
+			 */
+			if (cnt_off == UINT16_MAX) {
+				cnt_off = curr_off++;
+				action_types[cnt_off] = MLX5DR_ACTION_TYP_CTR;
+			}
+			at->actions_off[i] = cnt_off;
+			break;
 		default:
 			type = mlx5_hw_dr_action_types[at->actions[i].type];
 			at->actions_off[i] = curr_off;
@@ -3706,6 +4184,7 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 	unsigned int i;
 	struct rte_flow_actions_template *at = NULL;
 	uint16_t pos = UINT16_MAX;
+	uint64_t action_flags = 0;
 	struct rte_flow_action tmp_action[MLX5_HW_MAX_ACTS];
 	struct rte_flow_action tmp_mask[MLX5_HW_MAX_ACTS];
 	struct rte_flow_action *ra = (void *)(uintptr_t)actions;
@@ -3748,22 +4227,9 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 		.conf = &rx_mreg_mask,
 	};
 
-	if (flow_hw_actions_validate(dev, attr, actions, masks, error))
+	if (mlx5_flow_hw_actions_validate(dev, attr, actions, masks,
+					  &action_flags, error))
 		return NULL;
-	if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
-	    priv->sh->config.dv_esw_en) {
-		/* Application should make sure only one Q/RSS exist in one rule. */
-		if (flow_hw_action_meta_copy_insert(actions, masks, &rx_cpy, &rx_cpy_mask,
-						    tmp_action, tmp_mask, &pos)) {
-			rte_flow_error_set(error, EINVAL,
-					   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
-					   "Failed to concatenate new action/mask");
-			return NULL;
-		} else if (pos != UINT16_MAX) {
-			ra = tmp_action;
-			rm = tmp_mask;
-		}
-	}
 	for (i = 0; ra[i].type != RTE_FLOW_ACTION_TYPE_END; ++i) {
 		switch (ra[i].type) {
 		/* OF_PUSH_VLAN *MUST* come before OF_SET_VLAN_VID */
@@ -3789,6 +4255,28 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 				   RTE_FLOW_ERROR_TYPE_ACTION, NULL, "Too many actions");
 		return NULL;
 	}
+	if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
+	    priv->sh->config.dv_esw_en &&
+	    (action_flags & (MLX5_FLOW_ACTION_QUEUE | MLX5_FLOW_ACTION_RSS))) {
+		/* Insert META copy */
+		if (act_num + 1 > MLX5_HW_MAX_ACTS) {
+			rte_flow_error_set(error, E2BIG,
+					   RTE_FLOW_ERROR_TYPE_ACTION,
+					   NULL, "cannot expand: too many actions");
+			return NULL;
+		}
+		/* Application should make sure only one Q/RSS exists in one rule. */
+		pos = flow_hw_template_expand_modify_field(actions, masks,
+							   &rx_cpy,
+							   &rx_cpy_mask,
+							   tmp_action, tmp_mask,
+							   action_flags,
+							   act_num);
+		ra = tmp_action;
+		rm = tmp_mask;
+		act_num++;
+		action_flags |= MLX5_FLOW_ACTION_MODIFY_FIELD;
+	}
 	if (set_vlan_vid_ix != -1) {
 		/* If temporary action buffer was not used, copy template actions to it */
 		if (ra == actions && rm == masks) {
@@ -3859,6 +4347,7 @@ flow_hw_actions_template_create(struct rte_eth_dev *dev,
 	at->tmpl = flow_hw_dr_actions_template_create(at);
 	if (!at->tmpl)
 		goto error;
+	at->action_flags = action_flags;
 	__atomic_fetch_add(&at->refcnt, 1, __ATOMIC_RELAXED);
 	LIST_INSERT_HEAD(&priv->flow_hw_at, at, next);
 	return at;
@@ -4202,6 +4691,7 @@ flow_hw_info_get(struct rte_eth_dev *dev,
 		 struct rte_flow_queue_info *queue_info,
 		 struct rte_flow_error *error __rte_unused)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
 	uint16_t port_id = dev->data->port_id;
 	struct rte_mtr_capabilities mtr_cap;
 	int ret;
@@ -4215,6 +4705,8 @@ flow_hw_info_get(struct rte_eth_dev *dev,
 	ret = rte_mtr_capabilities_get(port_id, &mtr_cap, NULL);
 	if (!ret)
 		port_info->max_nb_meters = mtr_cap.n_max;
+	port_info->max_nb_counters = priv->sh->hws_max_nb_counters;
+	port_info->max_nb_aging_objects = port_info->max_nb_counters;
 	return 0;
 }
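
As a usage sketch (assuming the rte_flow_port_info fields exposed by the ethdev API), an application may read these limits before calling rte_flow_configure():

#include <stdio.h>
#include <rte_flow.h>

/* Print the HWS object limits of a port; port_id is assumed to be valid. */
static void
print_hw_limits(uint16_t port_id)
{
	struct rte_flow_port_info port_info = { 0 };
	struct rte_flow_queue_info queue_info = { 0 };
	struct rte_flow_error error;

	if (rte_flow_info_get(port_id, &port_info, &queue_info, &error) == 0)
		printf("max counters %u, max aging objects %u, max meters %u\n",
		       port_info.max_nb_counters,
		       port_info.max_nb_aging_objects,
		       port_info.max_nb_meters);
}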
 
@@ -5589,8 +6081,6 @@ flow_hw_configure(struct rte_eth_dev *dev,
 			goto err;
 		}
 	}
-	if (_queue_attr)
-		mlx5_free(_queue_attr);
 	if (port_attr->nb_conn_tracks) {
 		mem_size = sizeof(struct mlx5_aso_sq) * nb_q_updated +
 			   sizeof(*priv->ct_mng);
@@ -5607,13 +6097,37 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	}
 	if (port_attr->nb_counters) {
 		priv->hws_cpool = mlx5_hws_cnt_pool_create(dev, port_attr,
-				nb_queue);
+							   nb_queue);
 		if (priv->hws_cpool == NULL)
 			goto err;
 	}
+	if (port_attr->nb_aging_objects) {
+		if (port_attr->nb_counters == 0) {
+			/*
+			 * Aging management uses counters. The number of
+			 * requested counters should account for a counter for
+			 * each flow rule containing AGE without a counter.
+			 */
+			DRV_LOG(ERR, "Port %u AGE objects are requested (%u) "
+				"but no counters are requested.",
+				dev->data->port_id,
+				port_attr->nb_aging_objects);
+			rte_errno = EINVAL;
+			goto err;
+		}
+		ret = mlx5_hws_age_pool_init(dev, port_attr, nb_queue);
+		if (ret < 0)
+			goto err;
+	}
 	ret = flow_hw_create_vlan(dev);
 	if (ret)
 		goto err;
+	if (_queue_attr)
+		mlx5_free(_queue_attr);
+#ifdef MLX5_HAVE_RTE_FLOW_Q_AGE
+	if (port_attr->flags & RTE_FLOW_PORT_FLAG_STRICT_QUEUE)
+		priv->hws_strict_queue = 1;
+#endif
 	return 0;
 err:
 	if (priv->hws_ctpool) {
@@ -5624,6 +6138,12 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		flow_hw_ct_mng_destroy(dev, priv->ct_mng);
 		priv->ct_mng = NULL;
 	}
+	if (priv->hws_age_req)
+		mlx5_hws_age_pool_destroy(priv);
+	if (priv->hws_cpool) {
+		mlx5_hws_cnt_pool_destroy(priv->sh, priv->hws_cpool);
+		priv->hws_cpool = NULL;
+	}
 	flow_hw_free_vport_actions(priv);
 	for (i = 0; i < MLX5_HW_ACTION_FLAG_MAX; i++) {
 		if (priv->hw_drop[i])
@@ -5697,8 +6217,12 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 		mlx5_ipool_destroy(priv->acts_ipool);
 		priv->acts_ipool = NULL;
 	}
-	if (priv->hws_cpool)
+	if (priv->hws_age_req)
+		mlx5_hws_age_pool_destroy(priv);
+	if (priv->hws_cpool) {
 		mlx5_hws_cnt_pool_destroy(priv->sh, priv->hws_cpool);
+		priv->hws_cpool = NULL;
+	}
 	if (priv->hws_ctpool) {
 		flow_hw_ct_pool_destroy(dev, priv->hws_ctpool);
 		priv->hws_ctpool = NULL;
@@ -6033,13 +6557,81 @@ flow_hw_conntrack_create(struct rte_eth_dev *dev, uint32_t queue,
 		MLX5_ACTION_CTX_CT_GEN_IDX(PORT_ID(priv), ct_idx);
 }
 
+/**
+ * Validate shared action.
+ *
+ * @param[in] dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param[in] queue
+ *   Which queue to be used.
+ * @param[in] attr
+ *   Operation attribute.
+ * @param[in] conf
+ *   Indirect action configuration.
+ * @param[in] action
+ *   rte_flow action detail.
+ * @param[in] user_data
+ *   Pointer to the user_data.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, otherwise negative errno value.
+ */
+static int
+flow_hw_action_handle_validate(struct rte_eth_dev *dev, uint32_t queue,
+			       const struct rte_flow_op_attr *attr,
+			       const struct rte_flow_indir_action_conf *conf,
+			       const struct rte_flow_action *action,
+			       void *user_data,
+			       struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	RTE_SET_USED(attr);
+	RTE_SET_USED(queue);
+	RTE_SET_USED(user_data);
+	switch (action->type) {
+	case RTE_FLOW_ACTION_TYPE_AGE:
+		if (!priv->hws_age_req)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  NULL,
+						  "aging pool not initialized");
+		break;
+	case RTE_FLOW_ACTION_TYPE_COUNT:
+		if (!priv->hws_cpool)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  NULL,
+						  "counters pool not initialized");
+		break;
+	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
+		if (priv->hws_ctpool == NULL)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ACTION,
+						  NULL,
+						  "CT pool not initialized");
+		return mlx5_validate_action_ct(dev, action->conf, error);
+	case RTE_FLOW_ACTION_TYPE_METER_MARK:
+		return flow_hw_validate_action_meter_mark(dev, action, error);
+	case RTE_FLOW_ACTION_TYPE_RSS:
+		return flow_dv_action_validate(dev, conf, action, error);
+	default:
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					  "action type not supported");
+	}
+	return 0;
+}
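
A minimal sketch of creating an indirect AGE action through the async API, which ends up in the validate/create handlers here; port_id and queue_id are assumed to be valid:

#include <stddef.h>
#include <rte_flow.h>

/* Enqueue creation of an indirect AGE action on the given flow queue. */
static struct rte_flow_action_handle *
create_indirect_age(uint16_t port_id, uint32_t queue_id, uint32_t timeout_sec)
{
	const struct rte_flow_indir_action_conf conf = { .ingress = 1 };
	const struct rte_flow_action_age age_conf = { .timeout = timeout_sec };
	const struct rte_flow_action action = {
		.type = RTE_FLOW_ACTION_TYPE_AGE,
		.conf = &age_conf,
	};
	const struct rte_flow_op_attr op_attr = { .postpone = 0 };
	struct rte_flow_error error;

	return rte_flow_async_action_handle_create(port_id, queue_id, &op_attr,
						   &conf, &action, NULL, &error);
}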
+
 /**
  * Create shared action.
  *
  * @param[in] dev
  *   Pointer to the rte_eth_dev structure.
  * @param[in] queue
- *   Which queue to be used..
+ *   Which queue to be used.
  * @param[in] attr
  *   Operation attribute.
  * @param[in] conf
@@ -6064,16 +6656,44 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 {
 	struct rte_flow_action_handle *handle = NULL;
 	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_action_age *age;
 	struct mlx5_aso_mtr *aso_mtr;
 	cnt_id_t cnt_id;
 	uint32_t mtr_id;
+	uint32_t age_idx;
 
-	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
 	RTE_SET_USED(user_data);
 	switch (action->type) {
+	case RTE_FLOW_ACTION_TYPE_AGE:
+		if (priv->hws_strict_queue) {
+			struct mlx5_age_info *info = GET_PORT_AGE_INFO(priv);
+
+			if (queue >= info->hw_q_age->nb_rings) {
+				rte_flow_error_set(error, EINVAL,
+						   RTE_FLOW_ERROR_TYPE_ACTION,
+						   NULL,
+						   "Invalid queue ID for indirect AGE.");
+				rte_errno = EINVAL;
+				return NULL;
+			}
+		}
+		age = action->conf;
+		age_idx = mlx5_hws_age_action_create(priv, queue, true, age,
+						     0, error);
+		if (age_idx == 0) {
+			rte_flow_error_set(error, ENODEV,
+					   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					   "AGE is not configured!");
+		} else {
+			age_idx = (MLX5_INDIRECT_ACTION_TYPE_AGE <<
+				   MLX5_INDIRECT_ACTION_TYPE_OFFSET) | age_idx;
+			handle =
+			    (struct rte_flow_action_handle *)(uintptr_t)age_idx;
+		}
+		break;
 	case RTE_FLOW_ACTION_TYPE_COUNT:
-		if (mlx5_hws_cnt_shared_get(priv->hws_cpool, &cnt_id))
+		if (mlx5_hws_cnt_shared_get(priv->hws_cpool, &cnt_id, 0))
 			rte_flow_error_set(error, ENODEV,
 					RTE_FLOW_ERROR_TYPE_ACTION,
 					NULL,
@@ -6093,8 +6713,13 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 			MLX5_INDIRECT_ACTION_TYPE_OFFSET) | (aso_mtr->fm.meter_id);
 		handle = (struct rte_flow_action_handle *)(uintptr_t)mtr_id;
 		break;
-	default:
+	case RTE_FLOW_ACTION_TYPE_RSS:
 		handle = flow_dv_action_create(dev, conf, action, error);
+		break;
+	default:
+		rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION,
+				   NULL, "action type not supported");
+		return NULL;
 	}
 	return handle;
 }
@@ -6105,7 +6730,7 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
  * @param[in] dev
  *   Pointer to the rte_eth_dev structure.
  * @param[in] queue
- *   Which queue to be used..
+ *   Which queue to be used.
  * @param[in] attr
  *   Operation attribute.
  * @param[in] handle
@@ -6128,7 +6753,6 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 			     void *user_data,
 			     struct rte_flow_error *error)
 {
-	RTE_SET_USED(queue);
 	RTE_SET_USED(attr);
 	RTE_SET_USED(user_data);
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -6143,6 +6767,8 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 	uint32_t idx = act_idx & ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
 
 	switch (type) {
+	case MLX5_INDIRECT_ACTION_TYPE_AGE:
+		return mlx5_hws_age_action_update(priv, idx, update, error);
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
 		return flow_hw_conntrack_update(dev, queue, update, act_idx, error);
 	case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
@@ -6176,11 +6802,15 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 			return rte_flow_error_set(error, EINVAL,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL, "Unable to wait for ASO meter CQE");
-		return 0;
-	default:
 		break;
+	case MLX5_INDIRECT_ACTION_TYPE_RSS:
+		return flow_dv_action_update(dev, handle, update, error);
+	default:
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					  "action type not supported");
 	}
-	return flow_dv_action_update(dev, handle, update, error);
+	return 0;
 }
 
 /**
@@ -6189,7 +6819,7 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
  * @param[in] dev
  *   Pointer to the rte_eth_dev structure.
  * @param[in] queue
- *   Which queue to be used..
+ *   Which queue to be used.
  * @param[in] attr
  *   Operation attribute.
  * @param[in] handle
@@ -6211,6 +6841,7 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 {
 	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
 	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+	uint32_t age_idx = act_idx & MLX5_HWS_AGE_IDX_MASK;
 	uint32_t idx = act_idx & ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
@@ -6221,7 +6852,16 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 	RTE_SET_USED(attr);
 	RTE_SET_USED(user_data);
 	switch (type) {
+	case MLX5_INDIRECT_ACTION_TYPE_AGE:
+		return mlx5_hws_age_action_destroy(priv, age_idx, error);
 	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
+		age_idx = mlx5_hws_cnt_age_get(priv->hws_cpool, act_idx);
+		if (age_idx != 0)
+			/*
+			 * If this counter belongs to an indirect AGE, this is
+			 * the time to update the AGE.
+			 */
+			mlx5_hws_age_nb_cnt_decrease(priv, age_idx);
 		return mlx5_hws_cnt_shared_put(priv->hws_cpool, &act_idx);
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
 		return flow_hw_conntrack_destroy(dev, act_idx, error);
@@ -6246,10 +6886,15 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL, "Unable to wait for ASO meter CQE");
 		mlx5_ipool_free(pool->idx_pool, idx);
-		return 0;
-	default:
+		break;
+	case MLX5_INDIRECT_ACTION_TYPE_RSS:
 		return flow_dv_action_destroy(dev, handle, error);
+	default:
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					  "action type not supported");
 	}
+	return 0;
 }
 
 static int
@@ -6259,13 +6904,14 @@ flow_hw_query_counter(const struct rte_eth_dev *dev, uint32_t counter,
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_hws_cnt *cnt;
 	struct rte_flow_query_count *qc = data;
-	uint32_t iidx = mlx5_hws_cnt_iidx(priv->hws_cpool, counter);
+	uint32_t iidx;
 	uint64_t pkts, bytes;
 
 	if (!mlx5_hws_cnt_id_valid(counter))
 		return rte_flow_error_set(error, EINVAL,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
 				"counter are not available");
+	iidx = mlx5_hws_cnt_iidx(priv->hws_cpool, counter);
 	cnt = &priv->hws_cpool->pool[iidx];
 	__hws_cnt_query_raw(priv->hws_cpool, counter, &pkts, &bytes);
 	qc->hits_set = 1;
@@ -6279,12 +6925,64 @@ flow_hw_query_counter(const struct rte_eth_dev *dev, uint32_t counter,
 	return 0;
 }
 
+/**
+ * Query a flow rule AGE action for aging information.
+ *
+ * @param[in] dev
+ *   Pointer to Ethernet device.
+ * @param[in] age_idx
+ *   Index of AGE action parameter.
+ * @param[out] data
+ *   Data retrieved by the query.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_hw_query_age(const struct rte_eth_dev *dev, uint32_t age_idx, void *data,
+		  struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct mlx5_indexed_pool *ipool = age_info->ages_ipool;
+	struct mlx5_hws_age_param *param = mlx5_ipool_get(ipool, age_idx);
+	struct rte_flow_query_age *resp = data;
+
+	if (!param || !param->timeout)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "age data not available");
+	switch (__atomic_load_n(&param->state, __ATOMIC_RELAXED)) {
+	case HWS_AGE_AGED_OUT_REPORTED:
+	case HWS_AGE_AGED_OUT_NOT_REPORTED:
+		resp->aged = 1;
+		break;
+	case HWS_AGE_CANDIDATE:
+	case HWS_AGE_CANDIDATE_INSIDE_RING:
+		resp->aged = 0;
+		break;
+	case HWS_AGE_FREE:
+		/*
+		 * When state is FREE the flow itself should be invalid.
+		 * Fall-through.
+		 */
+	default:
+		MLX5_ASSERT(0);
+		break;
+	}
+	resp->sec_since_last_hit_valid = !resp->aged;
+	if (resp->sec_since_last_hit_valid)
+		resp->sec_since_last_hit = __atomic_load_n
+				 (&param->sec_since_last_hit, __ATOMIC_RELAXED);
+	return 0;
+}
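
A sketch of how an application reaches this path through the generic query API; the flow pointer is assumed to reference a rule created with an AGE action:

#include <stdio.h>
#include <rte_flow.h>

/* Query the AGE state of a rule created with an AGE action. */
static void
query_flow_age(uint16_t port_id, struct rte_flow *flow)
{
	const struct rte_flow_action query_actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_AGE },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	struct rte_flow_query_age age_data = { 0 };
	struct rte_flow_error error;

	if (rte_flow_query(port_id, flow, query_actions, &age_data, &error))
		return;
	if (age_data.aged)
		printf("flow aged out\n");
	else if (age_data.sec_since_last_hit_valid)
		printf("flow idle for %u seconds\n",
		       age_data.sec_since_last_hit);
}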
+
 static int
-flow_hw_query(struct rte_eth_dev *dev,
-	      struct rte_flow *flow __rte_unused,
-	      const struct rte_flow_action *actions __rte_unused,
-	      void *data __rte_unused,
-	      struct rte_flow_error *error __rte_unused)
+flow_hw_query(struct rte_eth_dev *dev, struct rte_flow *flow,
+	      const struct rte_flow_action *actions, void *data,
+	      struct rte_flow_error *error)
 {
 	int ret = -EINVAL;
 	struct rte_flow_hw *hw_flow = (struct rte_flow_hw *)flow;
@@ -6295,7 +6993,11 @@ flow_hw_query(struct rte_eth_dev *dev,
 			break;
 		case RTE_FLOW_ACTION_TYPE_COUNT:
 			ret = flow_hw_query_counter(dev, hw_flow->cnt_id, data,
-						  error);
+						    error);
+			break;
+		case RTE_FLOW_ACTION_TYPE_AGE:
+			ret = flow_hw_query_age(dev, hw_flow->age_idx, data,
+						error);
 			break;
 		default:
 			return rte_flow_error_set(error, ENOTSUP,
@@ -6307,6 +7009,32 @@ flow_hw_query(struct rte_eth_dev *dev,
 	return ret;
 }
 
+/**
+ * Validate indirect action.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] conf
+ *   Shared action configuration.
+ * @param[in] action
+ *   Action specification used to create indirect action.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. Initialized in case of
+ *   error only.
+ *
+ * @return
+ *   0 on success, otherwise negative errno value.
+ */
+static int
+flow_hw_action_validate(struct rte_eth_dev *dev,
+			const struct rte_flow_indir_action_conf *conf,
+			const struct rte_flow_action *action,
+			struct rte_flow_error *err)
+{
+	return flow_hw_action_handle_validate(dev, MLX5_HW_INV_QUEUE, NULL,
+					      conf, action, NULL, err);
+}
+
 /**
  * Create indirect action.
  *
@@ -6396,17 +7124,118 @@ flow_hw_action_query(struct rte_eth_dev *dev,
 {
 	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
 	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
+	uint32_t age_idx = act_idx & MLX5_HWS_AGE_IDX_MASK;
 
 	switch (type) {
+	case MLX5_INDIRECT_ACTION_TYPE_AGE:
+		return flow_hw_query_age(dev, age_idx, data, error);
 	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
 		return flow_hw_query_counter(dev, act_idx, data, error);
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
 		return flow_hw_conntrack_query(dev, act_idx, data, error);
-	default:
+	case MLX5_INDIRECT_ACTION_TYPE_RSS:
 		return flow_dv_action_query(dev, handle, data, error);
+	default:
+		return rte_flow_error_set(error, ENOTSUP,
+					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					  "action type not supported");
 	}
 }
 
+/**
+ * Get aged-out flows of a given port on the given HWS flow queue.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] queue_id
+ *   Flow queue to query.
+ *   Ignored when RTE_FLOW_PORT_FLAG_STRICT_QUEUE is not set.
+ * @param[in, out] contexts
+ *   The address of an array of pointers to the aged-out flows contexts.
+ * @param[in] nb_contexts
+ *   The length of context array pointers.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. Initialized in case of
+ *   error only.
+ *
+ * @return
+ *   If nb_contexts is 0, the number of all aged contexts.
+ *   If nb_contexts is not 0, the number of aged flows reported in the
+ *   context array, otherwise a negative errno value.
+ */
+static int
+flow_hw_get_q_aged_flows(struct rte_eth_dev *dev, uint32_t queue_id,
+			 void **contexts, uint32_t nb_contexts,
+			 struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct rte_ring *r;
+	int nb_flows = 0;
+
+	if (nb_contexts && !contexts)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "empty context");
+	if (priv->hws_strict_queue) {
+		if (queue_id >= age_info->hw_q_age->nb_rings)
+			return rte_flow_error_set(error, EINVAL,
+						RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+						NULL, "invalid queue id");
+		r = age_info->hw_q_age->aged_lists[queue_id];
+	} else {
+		r = age_info->hw_age.aged_list;
+		MLX5_AGE_SET(age_info, MLX5_AGE_TRIGGER);
+	}
+	if (nb_contexts == 0)
+		return rte_ring_count(r);
+	while ((uint32_t)nb_flows < nb_contexts) {
+		uint32_t age_idx;
+
+		if (rte_ring_dequeue_elem(r, &age_idx, sizeof(uint32_t)) < 0)
+			break;
+		/* get the AGE context if the aged-out index is still valid. */
+		contexts[nb_flows] = mlx5_hws_age_context_get(priv, age_idx);
+		if (!contexts[nb_flows])
+			continue;
+		nb_flows++;
+	}
+	return nb_flows;
+}
+
+/**
+ * Get aged-out flows.
+ *
+ * This function is relevant only if RTE_FLOW_PORT_FLAG_STRICT_QUEUE isn't set.
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ * @param[in] contexts
+ *   The address of an array of pointers to the aged-out flows contexts.
+ * @param[in] nb_contexts
+ *   The length of context array pointers.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. Initialized in case of
+ *   error only.
+ *
+ * @return
+ *   The number of retrieved contexts on success, otherwise a negative
+ *   errno value.
+ *   If nb_contexts is 0, the number of all aged contexts is returned.
+ *   If nb_contexts is not 0, the number of aged flows reported in the
+ *   context array is returned.
+ */
+static int
+flow_hw_get_aged_flows(struct rte_eth_dev *dev, void **contexts,
+		       uint32_t nb_contexts, struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (priv->hws_strict_queue)
+		DRV_LOG(WARNING,
+			"port %u get aged flows called in strict queue mode.",
+			dev->data->port_id);
+	return flow_hw_get_q_aged_flows(dev, 0, contexts, nb_contexts, error);
+}
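
A polling sketch over the aged-out ring, assuming the rte_flow_get_q_aged_flows() API from the referenced ethdev series; handle_aged_context() is a hypothetical application callback:

#include <rte_flow.h>

#define MAX_AGED_FLOWS 64

/* handle_aged_context() is a hypothetical application callback. */
extern void handle_aged_context(void *ctx);

/* Drain the aged-out ring of one queue in strict-queue mode. */
static void
drain_aged_flows(uint16_t port_id, uint32_t queue_id)
{
	void *contexts[MAX_AGED_FLOWS];
	struct rte_flow_error error;
	int i, n;

	n = rte_flow_get_q_aged_flows(port_id, queue_id, contexts,
				      MAX_AGED_FLOWS, &error);
	for (i = 0; i < n; i++)
		handle_aged_context(contexts[i]);
}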
+
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.info_get = flow_hw_info_get,
 	.configure = flow_hw_configure,
@@ -6425,12 +7254,14 @@ const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.async_action_create = flow_hw_action_handle_create,
 	.async_action_destroy = flow_hw_action_handle_destroy,
 	.async_action_update = flow_hw_action_handle_update,
-	.action_validate = flow_dv_action_validate,
+	.action_validate = flow_hw_action_validate,
 	.action_create = flow_hw_action_create,
 	.action_destroy = flow_hw_action_destroy,
 	.action_update = flow_hw_action_update,
 	.action_query = flow_hw_action_query,
 	.query = flow_hw_query,
+	.get_aged_flows = flow_hw_get_aged_flows,
+	.get_q_aged_flows = flow_hw_get_q_aged_flows,
 };
 
 /**
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index 7ffaf4c227..81a33ddf09 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -122,7 +122,7 @@ flow_verbs_counter_get_by_idx(struct rte_eth_dev *dev,
 			      struct mlx5_flow_counter_pool **ppool)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_flow_counter_mng *cmng = &priv->sh->cmng;
+	struct mlx5_flow_counter_mng *cmng = &priv->sh->sws_cmng;
 	struct mlx5_flow_counter_pool *pool;
 
 	idx = (idx - 1) & (MLX5_CNT_SHARED_OFFSET - 1);
@@ -215,7 +215,7 @@ static uint32_t
 flow_verbs_counter_new(struct rte_eth_dev *dev, uint32_t id __rte_unused)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct mlx5_flow_counter_mng *cmng = &priv->sh->cmng;
+	struct mlx5_flow_counter_mng *cmng = &priv->sh->sws_cmng;
 	struct mlx5_flow_counter_pool *pool = NULL;
 	struct mlx5_flow_counter *cnt = NULL;
 	uint32_t n_valid = cmng->n_valid;
diff --git a/drivers/net/mlx5/mlx5_hws_cnt.c b/drivers/net/mlx5/mlx5_hws_cnt.c
index d826ebaa25..9c37700f94 100644
--- a/drivers/net/mlx5/mlx5_hws_cnt.c
+++ b/drivers/net/mlx5/mlx5_hws_cnt.c
@@ -8,6 +8,7 @@
 #include <rte_ring.h>
 #include <mlx5_devx_cmds.h>
 #include <rte_cycles.h>
+#include <rte_eal_paging.h>
 
 #if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
 
@@ -26,8 +27,8 @@ __hws_cnt_id_load(struct mlx5_hws_cnt_pool *cpool)
 	uint32_t preload;
 	uint32_t q_num = cpool->cache->q_num;
 	uint32_t cnt_num = mlx5_hws_cnt_pool_get_size(cpool);
-	cnt_id_t cnt_id, iidx = 0;
-	uint32_t qidx;
+	cnt_id_t cnt_id;
+	uint32_t qidx, iidx = 0;
 	struct rte_ring *qcache = NULL;
 
 	/*
@@ -86,6 +87,174 @@ __mlx5_hws_cnt_svc(struct mlx5_dev_ctx_shared *sh,
 	} while (reset_cnt_num > 0);
 }
 
+/**
+ * Release AGE parameter.
+ *
+ * @param priv
+ *   Pointer to the port private data structure.
+ * @param own_cnt_index
+ *   Counter ID created only for this AGE, to be released.
+ *   Zero means there is no such counter.
+ * @param age_ipool
+ *   Pointer to AGE parameter indexed pool.
+ * @param idx
+ *   Index of AGE parameter in the indexed pool.
+ */
+static void
+mlx5_hws_age_param_free(struct mlx5_priv *priv, cnt_id_t own_cnt_index,
+			struct mlx5_indexed_pool *age_ipool, uint32_t idx)
+{
+	if (own_cnt_index) {
+		struct mlx5_hws_cnt_pool *cpool = priv->hws_cpool;
+
+		MLX5_ASSERT(mlx5_hws_cnt_is_shared(cpool, own_cnt_index));
+		mlx5_hws_cnt_shared_put(cpool, &own_cnt_index);
+	}
+	mlx5_ipool_free(age_ipool, idx);
+}
+
+/**
+ * Check for new aged-out flows in the HWS counter pool and trigger the
+ * aging event.
+ *
+ * @param[in] priv
+ *   Pointer to port private object.
+ * @param[in] cpool
+ *   Pointer to current counter pool.
+ */
+static void
+mlx5_hws_aging_check(struct mlx5_priv *priv, struct mlx5_hws_cnt_pool *cpool)
+{
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct flow_counter_stats *stats = cpool->raw_mng->raw;
+	struct mlx5_hws_age_param *param;
+	struct rte_ring *r;
+	const uint64_t curr_time = MLX5_CURR_TIME_SEC;
+	const uint32_t time_delta = curr_time - cpool->time_of_last_age_check;
+	uint32_t nb_alloc_cnts = mlx5_hws_cnt_pool_get_size(cpool);
+	uint16_t expected1 = HWS_AGE_CANDIDATE;
+	uint16_t expected2 = HWS_AGE_CANDIDATE_INSIDE_RING;
+	uint32_t i;
+
+	cpool->time_of_last_age_check = curr_time;
+	for (i = 0; i < nb_alloc_cnts; ++i) {
+		uint32_t age_idx = cpool->pool[i].age_idx;
+		uint64_t hits;
+
+		if (!cpool->pool[i].in_used || age_idx == 0)
+			continue;
+		param = mlx5_ipool_get(age_info->ages_ipool, age_idx);
+		if (unlikely(param == NULL)) {
+			/*
+			 * When an AGE uses an indirect counter, it is the
+			 * user's responsibility not to use this indirect
+			 * counter without the AGE.
+			 * If the counter is used after the AGE was freed, the
+			 * AGE index is invalid and using it here would cause a
+			 * segmentation fault.
+			 */
+			DRV_LOG(WARNING,
+				"Counter %u has lost its AGE, it is unused.", i);
+			continue;
+		}
+		if (param->timeout == 0)
+			continue;
+		switch (__atomic_load_n(&param->state, __ATOMIC_RELAXED)) {
+		case HWS_AGE_AGED_OUT_NOT_REPORTED:
+		case HWS_AGE_AGED_OUT_REPORTED:
+			/* Already aged-out, no action is needed. */
+			continue;
+		case HWS_AGE_CANDIDATE:
+		case HWS_AGE_CANDIDATE_INSIDE_RING:
+			/* This AGE candidate to be aged-out, go to checking. */
+			break;
+		case HWS_AGE_FREE:
+			/*
+			 * AGE parameter with state "FREE" couldn't be pointed
+			 * by any counter since counter is destroyed first.
+			 * Fall-through.
+			 */
+		default:
+			MLX5_ASSERT(0);
+			continue;
+		}
+		hits = rte_be_to_cpu_64(stats[i].hits);
+		if (param->nb_cnts == 1) {
+			if (hits != param->accumulator_last_hits) {
+				__atomic_store_n(&param->sec_since_last_hit, 0,
+						 __ATOMIC_RELAXED);
+				param->accumulator_last_hits = hits;
+				continue;
+			}
+		} else {
+			param->accumulator_hits += hits;
+			param->accumulator_cnt++;
+			if (param->accumulator_cnt < param->nb_cnts)
+				continue;
+			param->accumulator_cnt = 0;
+			if (param->accumulator_last_hits !=
+						param->accumulator_hits) {
+				__atomic_store_n(&param->sec_since_last_hit,
+						 0, __ATOMIC_RELAXED);
+				param->accumulator_last_hits =
+							param->accumulator_hits;
+				param->accumulator_hits = 0;
+				continue;
+			}
+			param->accumulator_hits = 0;
+		}
+		if (__atomic_add_fetch(&param->sec_since_last_hit, time_delta,
+				       __ATOMIC_RELAXED) <=
+		   __atomic_load_n(&param->timeout, __ATOMIC_RELAXED))
+			continue;
+		/* Prepare the relevant ring for this AGE parameter */
+		if (priv->hws_strict_queue)
+			r = age_info->hw_q_age->aged_lists[param->queue_id];
+		else
+			r = age_info->hw_age.aged_list;
+		/* Change the state atomically and insert it into the ring. */
+		if (__atomic_compare_exchange_n(&param->state, &expected1,
+						HWS_AGE_AGED_OUT_NOT_REPORTED,
+						false, __ATOMIC_RELAXED,
+						__ATOMIC_RELAXED)) {
+			int ret = rte_ring_enqueue_burst_elem(r, &age_idx,
+							      sizeof(uint32_t),
+							      1, NULL);
+
+			/*
+			 * The ring doesn't have enough room for this entry,
+			 * so restore the state for the next second.
+			 *
+			 * FIXME: if the flow gets traffic before the next
+			 *        second, this "aged out" event is lost; to be
+			 *        fixed later when the ring is filled in bulks.
+			 */
+			expected2 = HWS_AGE_AGED_OUT_NOT_REPORTED;
+			if (ret == 0 &&
+			    !__atomic_compare_exchange_n(&param->state,
+							 &expected2, expected1,
+							 false,
+							 __ATOMIC_RELAXED,
+							 __ATOMIC_RELAXED) &&
+			    expected2 == HWS_AGE_FREE)
+				mlx5_hws_age_param_free(priv,
+							param->own_cnt_index,
+							age_info->ages_ipool,
+							age_idx);
+			/* The event is irrelevant in strict queue mode. */
+			if (!priv->hws_strict_queue)
+				MLX5_AGE_SET(age_info, MLX5_AGE_EVENT_NEW);
+		} else {
+			__atomic_compare_exchange_n(&param->state, &expected2,
+						  HWS_AGE_AGED_OUT_NOT_REPORTED,
+						  false, __ATOMIC_RELAXED,
+						  __ATOMIC_RELAXED);
+		}
+	}
+	/* The event is irrelevant in strict queue mode. */
+	if (!priv->hws_strict_queue)
+		mlx5_age_event_prepare(priv->sh);
+}
+
 static void
 mlx5_hws_cnt_raw_data_free(struct mlx5_dev_ctx_shared *sh,
 			   struct mlx5_hws_cnt_raw_data_mng *mng)
@@ -104,12 +273,14 @@ mlx5_hws_cnt_raw_data_alloc(struct mlx5_dev_ctx_shared *sh, uint32_t n)
 	struct mlx5_hws_cnt_raw_data_mng *mng = NULL;
 	int ret;
 	size_t sz = n * sizeof(struct flow_counter_stats);
+	size_t pgsz = rte_mem_page_size();
 
+	MLX5_ASSERT(pgsz > 0);
 	mng = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO, sizeof(*mng), 0,
 			SOCKET_ID_ANY);
 	if (mng == NULL)
 		goto error;
-	mng->raw = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO, sz, 0,
+	mng->raw = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO, sz, pgsz,
 			SOCKET_ID_ANY);
 	if (mng->raw == NULL)
 		goto error;
@@ -146,6 +317,9 @@ mlx5_hws_cnt_svc(void *opaque)
 			    opriv->sh == sh &&
 			    opriv->hws_cpool != NULL) {
 				__mlx5_hws_cnt_svc(sh, opriv->hws_cpool);
+				if (opriv->hws_age_req)
+					mlx5_hws_aging_check(opriv,
+							     opriv->hws_cpool);
 			}
 		}
 		query_cycle = rte_rdtsc() - start_cycle;
@@ -158,8 +332,9 @@ mlx5_hws_cnt_svc(void *opaque)
 }
 
 struct mlx5_hws_cnt_pool *
-mlx5_hws_cnt_pool_init(const struct mlx5_hws_cnt_pool_cfg *pcfg,
-		const struct mlx5_hws_cache_param *ccfg)
+mlx5_hws_cnt_pool_init(struct mlx5_dev_ctx_shared *sh,
+		       const struct mlx5_hws_cnt_pool_cfg *pcfg,
+		       const struct mlx5_hws_cache_param *ccfg)
 {
 	char mz_name[RTE_MEMZONE_NAMESIZE];
 	struct mlx5_hws_cnt_pool *cntp;
@@ -185,16 +360,26 @@ mlx5_hws_cnt_pool_init(const struct mlx5_hws_cnt_pool_cfg *pcfg,
 	cntp->cache->preload_sz = ccfg->preload_sz;
 	cntp->cache->threshold = ccfg->threshold;
 	cntp->cache->q_num = ccfg->q_num;
+	if (pcfg->request_num > sh->hws_max_nb_counters) {
+		DRV_LOG(ERR, "Counter number %u "
+			"is greater than the maximum supported (%u).",
+			pcfg->request_num, sh->hws_max_nb_counters);
+		goto error;
+	}
 	cnt_num = pcfg->request_num * (100 + pcfg->alloc_factor) / 100;
 	if (cnt_num > UINT32_MAX) {
 		DRV_LOG(ERR, "counter number %"PRIu64" is out of 32bit range",
 			cnt_num);
 		goto error;
 	}
+	/*
+	 * When the requested counter number is supported but the allocation
+	 * factor takes it over the maximum size, the factor is reduced.
+	 */
+	cnt_num = RTE_MIN((uint32_t)cnt_num, sh->hws_max_nb_counters);
 	cntp->pool = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO,
-			sizeof(struct mlx5_hws_cnt) *
-			pcfg->request_num * (100 + pcfg->alloc_factor) / 100,
-			0, SOCKET_ID_ANY);
+				 sizeof(struct mlx5_hws_cnt) * cnt_num,
+				 0, SOCKET_ID_ANY);
 	if (cntp->pool == NULL)
 		goto error;
 	snprintf(mz_name, sizeof(mz_name), "%s_F_RING", pcfg->name);
@@ -231,6 +416,8 @@ mlx5_hws_cnt_pool_init(const struct mlx5_hws_cnt_pool_cfg *pcfg,
 		if (cntp->cache->qcache[qidx] == NULL)
 			goto error;
 	}
+	/* Initialize the time for aging-out calculation. */
+	cntp->time_of_last_age_check = MLX5_CURR_TIME_SEC;
 	return cntp;
 error:
 	mlx5_hws_cnt_pool_deinit(cntp);
@@ -297,19 +484,17 @@ mlx5_hws_cnt_pool_dcs_alloc(struct mlx5_dev_ctx_shared *sh,
 			    struct mlx5_hws_cnt_pool *cpool)
 {
 	struct mlx5_hca_attr *hca_attr = &sh->cdev->config.hca_attr;
-	uint32_t max_log_bulk_sz = 0;
+	uint32_t max_log_bulk_sz = sh->hws_max_log_bulk_sz;
 	uint32_t log_bulk_sz;
-	uint32_t idx, alloced = 0;
+	uint32_t idx, alloc_candidate, alloced = 0;
 	unsigned int cnt_num = mlx5_hws_cnt_pool_get_size(cpool);
 	struct mlx5_devx_counter_attr attr = {0};
 	struct mlx5_devx_obj *dcs;
 
 	if (hca_attr->flow_counter_bulk_log_max_alloc == 0) {
-		DRV_LOG(ERR,
-			"Fw doesn't support bulk log max alloc");
+		DRV_LOG(ERR, "Fw doesn't support bulk log max alloc");
 		return -1;
 	}
-	max_log_bulk_sz = 23; /* hard code to 8M (1 << 23). */
 	cnt_num = RTE_ALIGN_CEIL(cnt_num, 4); /* minimal 4 counter in bulk. */
 	log_bulk_sz = RTE_MIN(max_log_bulk_sz, rte_log2_u32(cnt_num));
 	attr.pd = sh->cdev->pdn;
@@ -327,18 +512,23 @@ mlx5_hws_cnt_pool_dcs_alloc(struct mlx5_dev_ctx_shared *sh,
 	cpool->dcs_mng.dcs[0].iidx = 0;
 	alloced = cpool->dcs_mng.dcs[0].batch_sz;
 	if (cnt_num > cpool->dcs_mng.dcs[0].batch_sz) {
-		for (; idx < MLX5_HWS_CNT_DCS_NUM; idx++) {
+		while (idx < MLX5_HWS_CNT_DCS_NUM) {
 			attr.flow_counter_bulk_log_size = --max_log_bulk_sz;
+			alloc_candidate = RTE_BIT32(max_log_bulk_sz);
+			if (alloced + alloc_candidate > sh->hws_max_nb_counters)
+				continue;
 			dcs = mlx5_devx_cmd_flow_counter_alloc_general
 				(sh->cdev->ctx, &attr);
 			if (dcs == NULL)
 				goto error;
 			cpool->dcs_mng.dcs[idx].obj = dcs;
-			cpool->dcs_mng.dcs[idx].batch_sz =
-				(1 << max_log_bulk_sz);
+			cpool->dcs_mng.dcs[idx].batch_sz = alloc_candidate;
 			cpool->dcs_mng.dcs[idx].iidx = alloced;
 			alloced += cpool->dcs_mng.dcs[idx].batch_sz;
 			cpool->dcs_mng.batch_total++;
+			if (alloced >= cnt_num)
+				break;
+			idx++;
 		}
 	}
 	return 0;
@@ -445,7 +635,7 @@ mlx5_hws_cnt_pool_create(struct rte_eth_dev *dev,
 			dev->data->port_id);
 	pcfg.name = mp_name;
 	pcfg.request_num = pattr->nb_counters;
-	cpool = mlx5_hws_cnt_pool_init(&pcfg, &cparam);
+	cpool = mlx5_hws_cnt_pool_init(priv->sh, &pcfg, &cparam);
 	if (cpool == NULL)
 		goto error;
 	ret = mlx5_hws_cnt_pool_dcs_alloc(priv->sh, cpool);
@@ -525,4 +715,533 @@ mlx5_hws_cnt_svc_deinit(struct mlx5_dev_ctx_shared *sh)
 	sh->cnt_svc = NULL;
 }
 
+/**
+ * Destroy AGE action.
+ *
+ * @param priv
+ *   Pointer to the port private data structure.
+ * @param idx
+ *   Index of AGE parameter.
+ * @param error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_hws_age_action_destroy(struct mlx5_priv *priv, uint32_t idx,
+			    struct rte_flow_error *error)
+{
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct mlx5_indexed_pool *ipool = age_info->ages_ipool;
+	struct mlx5_hws_age_param *param = mlx5_ipool_get(ipool, idx);
+
+	if (param == NULL)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  "invalid AGE parameter index");
+	switch (__atomic_exchange_n(&param->state, HWS_AGE_FREE,
+				    __ATOMIC_RELAXED)) {
+	case HWS_AGE_CANDIDATE:
+	case HWS_AGE_AGED_OUT_REPORTED:
+		mlx5_hws_age_param_free(priv, param->own_cnt_index, ipool, idx);
+		break;
+	case HWS_AGE_AGED_OUT_NOT_REPORTED:
+	case HWS_AGE_CANDIDATE_INSIDE_RING:
+		/*
+		 * In both cases AGE is inside the ring. Change the state here
+		 * and destroy it later when it is taken out of the ring.
+		 */
+		break;
+	case HWS_AGE_FREE:
+		/*
+		 * If the index is valid and the state is FREE, it means this
+		 * AGE has been freed for the user but not for the PMD since
+		 * it is still inside the ring.
+		 */
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  "this AGE has already been released");
+	default:
+		MLX5_ASSERT(0);
+		break;
+	}
+	return 0;
+}
+
+/**
+ * Create AGE action parameter.
+ *
+ * @param[in] priv
+ *   Pointer to the port private data structure.
+ * @param[in] queue_id
+ *   Which HWS queue to be used.
+ * @param[in] shared
+ *   Whether it is an indirect AGE action.
+ * @param[in] flow_idx
+ *   Flow index from indexed pool.
+ *   For an indirect AGE action it has no effect.
+ * @param[in] age
+ *   Pointer to the aging action configuration.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   Index to AGE action parameter on success, 0 otherwise.
+ */
+uint32_t
+mlx5_hws_age_action_create(struct mlx5_priv *priv, uint32_t queue_id,
+			   bool shared, const struct rte_flow_action_age *age,
+			   uint32_t flow_idx, struct rte_flow_error *error)
+{
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct mlx5_indexed_pool *ipool = age_info->ages_ipool;
+	struct mlx5_hws_age_param *param;
+	uint32_t age_idx;
+
+	param = mlx5_ipool_malloc(ipool, &age_idx);
+	if (param == NULL) {
+		rte_flow_error_set(error, ENOMEM,
+				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				   "cannot allocate AGE parameter");
+		return 0;
+	}
+	MLX5_ASSERT(__atomic_load_n(&param->state,
+				    __ATOMIC_RELAXED) == HWS_AGE_FREE);
+	if (shared) {
+		param->nb_cnts = 0;
+		param->accumulator_hits = 0;
+		param->accumulator_cnt = 0;
+		flow_idx = age_idx;
+	} else {
+		param->nb_cnts = 1;
+	}
+	param->context = age->context ? age->context :
+					(void *)(uintptr_t)flow_idx;
+	param->timeout = age->timeout;
+	param->queue_id = queue_id;
+	param->accumulator_last_hits = 0;
+	param->own_cnt_index = 0;
+	param->sec_since_last_hit = 0;
+	param->state = HWS_AGE_CANDIDATE;
+	return age_idx;
+}
+
+/**
+ * Update indirect AGE action parameter.
+ *
+ * @param[in] priv
+ *   Pointer to the port private data structure.
+ * @param[in] idx
+ *   Index of AGE parameter.
+ * @param[in] update
+ *   Update value.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_hws_age_action_update(struct mlx5_priv *priv, uint32_t idx,
+			   const void *update, struct rte_flow_error *error)
+{
+#ifdef MLX5_HAVE_RTE_FLOW_Q_AGE
+	const struct rte_flow_update_age *update_ade = update;
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct mlx5_indexed_pool *ipool = age_info->ages_ipool;
+	struct mlx5_hws_age_param *param = mlx5_ipool_get(ipool, idx);
+	bool sec_since_last_hit_reset = false;
+	bool state_update = false;
+
+	if (param == NULL)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  "invalid AGE parameter index");
+	if (update_ade->timeout_valid) {
+		uint32_t old_timeout = __atomic_exchange_n(&param->timeout,
+							   update_ade->timeout,
+							   __ATOMIC_RELAXED);
+
+		if (old_timeout == 0)
+			sec_since_last_hit_reset = true;
+		else if (old_timeout < update_ade->timeout ||
+			 update_ade->timeout == 0)
+			 * When the timeout is increased, aged-out flows might
+			 * be active again and the state should be updated
+			 * accordingly.
+			 * When the new timeout is 0, the state is updated to
+			 * stop reporting the flow as aged-out.
+			 */
+			state_update = true;
+	}
+	if (update_ade->touch) {
+		sec_since_last_hit_reset = true;
+		state_update = true;
+	}
+	if (sec_since_last_hit_reset)
+		__atomic_store_n(&param->sec_since_last_hit, 0,
+				 __ATOMIC_RELAXED);
+	if (state_update) {
+		uint16_t expected = HWS_AGE_AGED_OUT_NOT_REPORTED;
+
+		/*
+		 * Change states of aged-out flows to active:
+		 *  - AGED_OUT_NOT_REPORTED -> CANDIDATE_INSIDE_RING
+		 *  - AGED_OUT_REPORTED -> CANDIDATE
+		 */
+		if (!__atomic_compare_exchange_n(&param->state, &expected,
+						 HWS_AGE_CANDIDATE_INSIDE_RING,
+						 false, __ATOMIC_RELAXED,
+						 __ATOMIC_RELAXED) &&
+		    expected == HWS_AGE_AGED_OUT_REPORTED)
+			__atomic_store_n(&param->state, HWS_AGE_CANDIDATE,
+					 __ATOMIC_RELAXED);
+	}
+	return 0;
+#else
+	RTE_SET_USED(priv);
+	RTE_SET_USED(idx);
+	RTE_SET_USED(update);
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				  "update age action not supported");
+#endif
+}
+
+/**
+ * Get the AGE context if the aged-out index is still valid.
+ *
+ * @param priv
+ *   Pointer to the port private data structure.
+ * @param idx
+ *   Index of AGE parameter.
+ *
+ * @return
+ *   AGE context if the index is still aged-out, NULL otherwise.
+ */
+void *
+mlx5_hws_age_context_get(struct mlx5_priv *priv, uint32_t idx)
+{
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct mlx5_indexed_pool *ipool = age_info->ages_ipool;
+	struct mlx5_hws_age_param *param = mlx5_ipool_get(ipool, idx);
+	uint16_t expected = HWS_AGE_AGED_OUT_NOT_REPORTED;
+
+	MLX5_ASSERT(param != NULL);
+	if (__atomic_compare_exchange_n(&param->state, &expected,
+					HWS_AGE_AGED_OUT_REPORTED, false,
+					__ATOMIC_RELAXED, __ATOMIC_RELAXED))
+		return param->context;
+	switch (expected) {
+	case HWS_AGE_FREE:
+		/*
+		 * This AGE couldn't have been destroyed since it was inside
+		 * the ring. Its state has been updated, and now it is actually
+		 * destroyed.
+		 */
+		mlx5_hws_age_param_free(priv, param->own_cnt_index, ipool, idx);
+		break;
+	case HWS_AGE_CANDIDATE_INSIDE_RING:
+		__atomic_store_n(&param->state, HWS_AGE_CANDIDATE,
+				 __ATOMIC_RELAXED);
+		break;
+	case HWS_AGE_CANDIDATE:
+		/*
+		 * Only the background thread pushes to the ring and it never
+		 * pushes this state.
+		 * When an AGE inside the ring becomes a candidate, it has a
+		 * special state called HWS_AGE_CANDIDATE_INSIDE_RING.
+		 * Fall-through.
+		 */
+	case HWS_AGE_AGED_OUT_REPORTED:
+		/*
+		 * Only this thread (doing query) may write this state, and it
+		 * happens only after the query thread takes it out of the ring.
+		 * Fall-through.
+		 */
+	case HWS_AGE_AGED_OUT_NOT_REPORTED:
+		/*
+		 * In this case the compare returns true and the function
+		 * returns the context immediately.
+		 * Fall-through.
+		 */
+	default:
+		MLX5_ASSERT(0);
+		break;
+	}
+	return NULL;
+}
+
+#ifdef RTE_ARCH_64
+#define MLX5_HWS_AGED_OUT_RING_SIZE_MAX UINT32_MAX
+#else
+#define MLX5_HWS_AGED_OUT_RING_SIZE_MAX RTE_BIT32(8)
+#endif
+
+/**
+ * Get the size of aged out ring list for each queue.
+ * Get the size of the aged-out ring list for each queue.
+ *
+ * The size is one percent of nb_counters divided by nb_queues.
+ * The ring size must be a power of 2, so it is aligned up to a power of 2.
+ * On 32-bit systems, the size is limited to 256.
+ *
+ * This function is called when RTE_FLOW_PORT_FLAG_STRICT_QUEUE is on.
+ *
+ * @param nb_counters
+ *   Final number of allocated counters in the pool.
+ * @param nb_queues
+ *   Number of HWS queues in this port.
+ *
+ * @return
+ *   Size of aged out ring per queue.
+ */
+static __rte_always_inline uint32_t
+mlx5_hws_aged_out_q_ring_size_get(uint32_t nb_counters, uint32_t nb_queues)
+{
+	uint32_t size = rte_align32pow2((nb_counters / 100) / nb_queues);
+	uint32_t max_size = MLX5_HWS_AGED_OUT_RING_SIZE_MAX;
+
+	return RTE_MIN(size, max_size);
+}
+
+/**
+ * Get the size of the aged-out ring list.
+ *
+ * The size is one percent of nb_counters.
+ * The ring size must be a power of 2, so it is aligned up to a power of 2.
+ * On 32-bit systems, the size is limited to 256.
+ *
+ * This function is called when RTE_FLOW_PORT_FLAG_STRICT_QUEUE is off.
+ *
+ * @param nb_counters
+ *   Final number of allocated counters in the pool.
+ *
+ * @return
+ *   Size of the aged out ring list.
+ */
+static __rte_always_inline uint32_t
+mlx5_hws_aged_out_ring_size_get(uint32_t nb_counters)
+{
+	uint32_t size = rte_align32pow2(nb_counters / 100);
+	uint32_t max_size = MLX5_HWS_AGED_OUT_RING_SIZE_MAX;
+
+	return RTE_MIN(size, max_size);
+}
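
A worked example of the sizing above, with illustrative numbers:

/*
 * nb_counters = 100000, nb_queues = 4:
 *   strict queue: (100000 / 100) / 4 = 250  -> rte_align32pow2(250)  = 256
 *   single list:   100000 / 100     = 1000  -> rte_align32pow2(1000) = 1024
 * On 32-bit systems both results are additionally capped at 256.
 */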
+
+/**
+ * Initialize the shared aging list information per port.
+ *
+ * @param dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param nb_queues
+ *   Number of HWS queues.
+ * @param strict_queue
+ *   Indicator whether strict_queue mode is enabled.
+ * @param ring_size
+ *   Size of aged-out ring for creation.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_hws_age_info_init(struct rte_eth_dev *dev, uint16_t nb_queues,
+		       bool strict_queue, uint32_t ring_size)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	uint32_t flags = RING_F_SP_ENQ | RING_F_SC_DEQ | RING_F_EXACT_SZ;
+	char mz_name[RTE_MEMZONE_NAMESIZE];
+	struct rte_ring *r = NULL;
+	uint32_t qidx;
+
+	age_info->flags = 0;
+	if (strict_queue) {
+		size_t size = sizeof(*age_info->hw_q_age) +
+			      sizeof(struct rte_ring *) * nb_queues;
+
+		age_info->hw_q_age = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO,
+						 size, 0, SOCKET_ID_ANY);
+		if (age_info->hw_q_age == NULL)
+			return -ENOMEM;
+		for (qidx = 0; qidx < nb_queues; ++qidx) {
+			snprintf(mz_name, sizeof(mz_name),
+				 "port_%u_queue_%u_aged_out_ring",
+				 dev->data->port_id, qidx);
+			r = rte_ring_create(mz_name, ring_size, SOCKET_ID_ANY,
+					    flags);
+			if (r == NULL) {
+				DRV_LOG(ERR, "\"%s\" creation failed: %s",
+					mz_name, rte_strerror(rte_errno));
+				goto error;
+			}
+			age_info->hw_q_age->aged_lists[qidx] = r;
+			DRV_LOG(DEBUG,
+				"\"%s\" is successfully created (size=%u).",
+				mz_name, ring_size);
+		}
+		age_info->hw_q_age->nb_rings = nb_queues;
+	} else {
+		snprintf(mz_name, sizeof(mz_name), "port_%u_aged_out_ring",
+			 dev->data->port_id);
+		r = rte_ring_create(mz_name, ring_size, SOCKET_ID_ANY, flags);
+		if (r == NULL) {
+			DRV_LOG(ERR, "\"%s\" creation failed: %s", mz_name,
+				rte_strerror(rte_errno));
+			return -rte_errno;
+		}
+		age_info->hw_age.aged_list = r;
+		DRV_LOG(DEBUG, "\"%s\" is successfully created (size=%u).",
+			mz_name, ring_size);
+		/* In non "strict_queue" mode, initialize the event. */
+		MLX5_AGE_SET(age_info, MLX5_AGE_TRIGGER);
+	}
+	return 0;
+error:
+	MLX5_ASSERT(strict_queue);
+	while (qidx--)
+		rte_ring_free(age_info->hw_q_age->aged_lists[qidx]);
+	rte_free(age_info->hw_q_age);
+	return -1;
+}
+
+/**
+ * Cleanup aged-out ring before destroying.
+ *
+ * @param priv
+ *   Pointer to port private object.
+ * @param r
+ *   Pointer to aged-out ring object.
+ */
+static void
+mlx5_hws_aged_out_ring_cleanup(struct mlx5_priv *priv, struct rte_ring *r)
+{
+	int ring_size = rte_ring_count(r);
+
+	while (ring_size > 0) {
+		uint32_t age_idx = 0;
+
+		if (rte_ring_dequeue_elem(r, &age_idx, sizeof(uint32_t)) < 0)
+			break;
+		/* get the AGE context if the aged-out index is still valid. */
+		mlx5_hws_age_context_get(priv, age_idx);
+		ring_size--;
+	}
+	rte_ring_free(r);
+}
+
+/**
+ * Destroy the shared aging list information per port.
+ *
+ * @param priv
+ *   Pointer to port private object.
+ */
+static void
+mlx5_hws_age_info_destroy(struct mlx5_priv *priv)
+{
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	uint16_t nb_queues = age_info->hw_q_age->nb_rings;
+	struct rte_ring *r;
+
+	if (priv->hws_strict_queue) {
+		uint32_t qidx;
+
+		for (qidx = 0; qidx < nb_queues; ++qidx) {
+			r = age_info->hw_q_age->aged_lists[qidx];
+			mlx5_hws_aged_out_ring_cleanup(priv, r);
+		}
+		mlx5_free(age_info->hw_q_age);
+	} else {
+		r = age_info->hw_age.aged_list;
+		mlx5_hws_aged_out_ring_cleanup(priv, r);
+	}
+}
+
+/**
+ * Initialize the aging mechanism per port.
+ *
+ * @param dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param attr
+ *   Port configuration attributes.
+ * @param nb_queues
+ *   Number of HWS queues.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_hws_age_pool_init(struct rte_eth_dev *dev,
+		       const struct rte_flow_port_attr *attr,
+		       uint16_t nb_queues)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct mlx5_indexed_pool_config cfg = {
+		.size =
+		      RTE_CACHE_LINE_ROUNDUP(sizeof(struct mlx5_hws_age_param)),
+		.trunk_size = 1 << 12,
+		.per_core_cache = 1 << 13,
+		.need_lock = 1,
+		.release_mem_en = !!priv->sh->config.reclaim_mode,
+		.malloc = mlx5_malloc,
+		.free = mlx5_free,
+		.type = "mlx5_hws_age_pool",
+	};
+	bool strict_queue = false;
+	uint32_t nb_alloc_cnts;
+	uint32_t rsize;
+	uint32_t nb_ages_updated;
+	int ret;
+
+#ifdef MLX5_HAVE_RTE_FLOW_Q_AGE
+	strict_queue = !!(attr->flags & RTE_FLOW_PORT_FLAG_STRICT_QUEUE);
+#endif
+	MLX5_ASSERT(priv->hws_cpool);
+	nb_alloc_cnts = mlx5_hws_cnt_pool_get_size(priv->hws_cpool);
+	if (strict_queue) {
+		rsize = mlx5_hws_aged_out_q_ring_size_get(nb_alloc_cnts,
+							  nb_queues);
+		nb_ages_updated = rsize * nb_queues + attr->nb_aging_objects;
+	} else {
+		rsize = mlx5_hws_aged_out_ring_size_get(nb_alloc_cnts);
+		nb_ages_updated = rsize + attr->nb_aging_objects;
+	}
+	ret = mlx5_hws_age_info_init(dev, nb_queues, strict_queue, rsize);
+	if (ret < 0)
+		return ret;
+	cfg.max_idx = rte_align32pow2(nb_ages_updated);
+	if (cfg.max_idx <= cfg.trunk_size) {
+		cfg.per_core_cache = 0;
+		cfg.trunk_size = cfg.max_idx;
+	} else if (cfg.max_idx <= MLX5_HW_IPOOL_SIZE_THRESHOLD) {
+		cfg.per_core_cache = MLX5_HW_IPOOL_CACHE_MIN;
+	}
+	age_info->ages_ipool = mlx5_ipool_create(&cfg);
+	if (age_info->ages_ipool == NULL) {
+		mlx5_hws_age_info_destroy(priv);
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	priv->hws_age_req = 1;
+	return 0;
+}
+
+/**
+ * Cleanup all aging resources per port.
+ *
+ * @param priv
+ *   Pointer to port private object.
+ */
+void
+mlx5_hws_age_pool_destroy(struct mlx5_priv *priv)
+{
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+
+	MLX5_ASSERT(priv->hws_age_req);
+	mlx5_hws_age_info_destroy(priv);
+	mlx5_ipool_destroy(age_info->ages_ipool);
+	age_info->ages_ipool = NULL;
+	priv->hws_age_req = 0;
+}
+
 #endif
diff --git a/drivers/net/mlx5/mlx5_hws_cnt.h b/drivers/net/mlx5/mlx5_hws_cnt.h
index 5fab4ba597..e311923f71 100644
--- a/drivers/net/mlx5/mlx5_hws_cnt.h
+++ b/drivers/net/mlx5/mlx5_hws_cnt.h
@@ -10,26 +10,26 @@
 #include "mlx5_flow.h"
 
 /*
- * COUNTER ID's layout
+ * HWS COUNTER ID's layout
  *       3                   2                   1                   0
  *     1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
  *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- *    | T |       | D |                                               |
- *    ~ Y |       | C |                    IDX                        ~
- *    | P |       | S |                                               |
+ *    |  T  |     | D |                                               |
+ *    ~  Y  |     | C |                    IDX                        ~
+ *    |  P  |     | S |                                               |
  *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  *
- *    Bit 31:30 = TYPE = MLX5_INDIRECT_ACTION_TYPE_COUNT = b'10
+ *    Bit 31:29 = TYPE = MLX5_INDIRECT_ACTION_TYPE_COUNT = b'10
  *    Bit 25:24 = DCS index
  *    Bit 23:00 = IDX in this counter belonged DCS bulk.
  */
-typedef uint32_t cnt_id_t;
 
-#define MLX5_HWS_CNT_DCS_NUM 4
 #define MLX5_HWS_CNT_DCS_IDX_OFFSET 24
 #define MLX5_HWS_CNT_DCS_IDX_MASK 0x3
 #define MLX5_HWS_CNT_IDX_MASK ((1UL << MLX5_HWS_CNT_DCS_IDX_OFFSET) - 1)
 
+#define MLX5_HWS_AGE_IDX_MASK (RTE_BIT32(MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1)
+
 struct mlx5_hws_cnt_dcs {
 	void *dr_action;
 	uint32_t batch_sz;
@@ -44,12 +44,22 @@ struct mlx5_hws_cnt_dcs_mng {
 
 struct mlx5_hws_cnt {
 	struct flow_counter_stats reset;
+	bool in_used; /* Indicator whether this counter is in use or in the pool. */
 	union {
-		uint32_t share: 1;
-		/*
-		 * share will be set to 1 when this counter is used as indirect
-		 * action. Only meaningful when user own this counter.
-		 */
+		struct {
+			uint32_t share:1;
+			/*
+			 * share will be set to 1 when this counter is used as
+			 * an indirect action.
+			 */
+			uint32_t age_idx:24;
+			/*
+			 * When this counter is used for aging, it saves the
+			 * index of the AGE parameter. For a pure counter
+			 * (without aging) this index is zero.
+			 */
+		};
+		/* This struct is only meaningful when the user owns this counter. */
 		uint32_t query_gen_when_free;
 		/*
 		 * When PMD own this counter (user put back counter to PMD
@@ -96,8 +106,48 @@ struct mlx5_hws_cnt_pool {
 	struct rte_ring *free_list;
 	struct rte_ring *wait_reset_list;
 	struct mlx5_hws_cnt_pool_caches *cache;
+	uint64_t time_of_last_age_check;
 } __rte_cache_aligned;
 
+/* HWS AGE status. */
+enum {
+	HWS_AGE_FREE, /* Initialized state. */
+	HWS_AGE_CANDIDATE, /* AGE assigned to flows. */
+	HWS_AGE_CANDIDATE_INSIDE_RING,
+	/*
+	 * AGE assigned to flows but still in the ring. It was aged-out but the
+	 * timeout was changed, so it stays in the ring as a candidate again.
+	 */
+	HWS_AGE_AGED_OUT_REPORTED,
+	/*
+	 * Aged-out, reported by rte_flow_get_q_aged_flows and waiting for destroy.
+	 */
+	HWS_AGE_AGED_OUT_NOT_REPORTED,
+	/*
+	 * Aged-out, inside the aged-out ring.
+	 * Waiting for rte_flow_get_q_aged_flows and destroy.
+	 */
+};
+
+/* HWS counter age parameter. */
+struct mlx5_hws_age_param {
+	uint32_t timeout; /* Aging timeout in seconds (atomically accessed). */
+	uint32_t sec_since_last_hit;
+	/* Time in seconds since last hit (atomically accessed). */
+	uint16_t state; /* AGE state (atomically accessed). */
+	uint64_t accumulator_last_hits;
+	/* Last total value of hits for comparing. */
+	uint64_t accumulator_hits;
+	/* Accumulator for hits coming from several counters. */
+	uint32_t accumulator_cnt;
+	/* Number of counters which already updated the accumulator this second. */
+	uint32_t nb_cnts; /* Number of counters used by this AGE. */
+	uint32_t queue_id; /* Queue id of the counter. */
+	cnt_id_t own_cnt_index;
+	/* Counter action created specifically for this AGE action. */
+	void *context; /* Flow AGE context. */
+} __rte_packed __rte_cache_aligned;
+
 /**
  * Translate counter id into internal index (start from 0), which can be used
  * as index of raw/cnt pool.
@@ -107,7 +157,7 @@ struct mlx5_hws_cnt_pool {
  * @return
  *   Internal index
  */
-static __rte_always_inline cnt_id_t
+static __rte_always_inline uint32_t
 mlx5_hws_cnt_iidx(struct mlx5_hws_cnt_pool *cpool, cnt_id_t cnt_id)
 {
 	uint8_t dcs_idx = cnt_id >> MLX5_HWS_CNT_DCS_IDX_OFFSET;
@@ -139,7 +189,7 @@ mlx5_hws_cnt_id_valid(cnt_id_t cnt_id)
  *   Counter id
  */
 static __rte_always_inline cnt_id_t
-mlx5_hws_cnt_id_gen(struct mlx5_hws_cnt_pool *cpool, cnt_id_t iidx)
+mlx5_hws_cnt_id_gen(struct mlx5_hws_cnt_pool *cpool, uint32_t iidx)
 {
 	struct mlx5_hws_cnt_dcs_mng *dcs_mng = &cpool->dcs_mng;
 	uint32_t idx;
@@ -344,9 +394,10 @@ mlx5_hws_cnt_pool_put(struct mlx5_hws_cnt_pool *cpool,
 	struct rte_ring_zc_data zcdr = {0};
 	struct rte_ring *qcache = NULL;
 	unsigned int wb_num = 0; /* cache write-back number. */
-	cnt_id_t iidx;
+	uint32_t iidx;
 
 	iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
+	cpool->pool[iidx].in_used = false;
 	cpool->pool[iidx].query_gen_when_free =
 		__atomic_load_n(&cpool->query_gen, __ATOMIC_RELAXED);
 	if (likely(queue != NULL))
@@ -388,20 +439,23 @@ mlx5_hws_cnt_pool_put(struct mlx5_hws_cnt_pool *cpool,
  *   A pointer to HWS queue. If null, it means fetch from common pool.
  * @param cnt_id
  *   A pointer to a cnt_id_t * pointer (counter id) that will be filled.
+ * @param age_idx
+ *   Index of the AGE parameter using this counter; zero means no such AGE.
+ *
  * @return
  *   - 0: Success; objects taken.
  *   - -ENOENT: Not enough entries in the mempool; no object is retrieved.
  *   - -EAGAIN: counter is not ready; try again.
  */
 static __rte_always_inline int
-mlx5_hws_cnt_pool_get(struct mlx5_hws_cnt_pool *cpool,
-		uint32_t *queue, cnt_id_t *cnt_id)
+mlx5_hws_cnt_pool_get(struct mlx5_hws_cnt_pool *cpool, uint32_t *queue,
+		      cnt_id_t *cnt_id, uint32_t age_idx)
 {
 	unsigned int ret;
 	struct rte_ring_zc_data zcdc = {0};
 	struct rte_ring *qcache = NULL;
-	uint32_t query_gen = 0;
-	cnt_id_t iidx, tmp_cid = 0;
+	uint32_t iidx, query_gen = 0;
+	cnt_id_t tmp_cid = 0;
 
 	if (likely(queue != NULL))
 		qcache = cpool->cache->qcache[*queue];
@@ -422,6 +476,8 @@ mlx5_hws_cnt_pool_get(struct mlx5_hws_cnt_pool *cpool,
 		__hws_cnt_query_raw(cpool, *cnt_id,
 				    &cpool->pool[iidx].reset.hits,
 				    &cpool->pool[iidx].reset.bytes);
+		cpool->pool[iidx].in_used = true;
+		cpool->pool[iidx].age_idx = age_idx;
 		return 0;
 	}
 	ret = rte_ring_dequeue_zc_burst_elem_start(qcache, sizeof(cnt_id_t), 1,
@@ -455,6 +511,8 @@ mlx5_hws_cnt_pool_get(struct mlx5_hws_cnt_pool *cpool,
 			    &cpool->pool[iidx].reset.bytes);
 	rte_ring_dequeue_zc_elem_finish(qcache, 1);
 	cpool->pool[iidx].share = 0;
+	cpool->pool[iidx].in_used = true;
+	cpool->pool[iidx].age_idx = age_idx;
 	return 0;
 }
 
@@ -478,16 +536,16 @@ mlx5_hws_cnt_pool_get_action_offset(struct mlx5_hws_cnt_pool *cpool,
 }
 
 static __rte_always_inline int
-mlx5_hws_cnt_shared_get(struct mlx5_hws_cnt_pool *cpool, cnt_id_t *cnt_id)
+mlx5_hws_cnt_shared_get(struct mlx5_hws_cnt_pool *cpool, cnt_id_t *cnt_id,
+			uint32_t age_idx)
 {
 	int ret;
 	uint32_t iidx;
 
-	ret = mlx5_hws_cnt_pool_get(cpool, NULL, cnt_id);
+	ret = mlx5_hws_cnt_pool_get(cpool, NULL, cnt_id, age_idx);
 	if (ret != 0)
 		return ret;
 	iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
-	MLX5_ASSERT(cpool->pool[iidx].share == 0);
 	cpool->pool[iidx].share = 1;
 	return 0;
 }
@@ -513,10 +571,73 @@ mlx5_hws_cnt_is_shared(struct mlx5_hws_cnt_pool *cpool, cnt_id_t cnt_id)
 	return cpool->pool[iidx].share ? true : false;
 }
 
+static __rte_always_inline void
+mlx5_hws_cnt_age_set(struct mlx5_hws_cnt_pool *cpool, cnt_id_t cnt_id,
+		     uint32_t age_idx)
+{
+	uint32_t iidx = mlx5_hws_cnt_iidx(cpool, cnt_id);
+
+	MLX5_ASSERT(cpool->pool[iidx].share);
+	cpool->pool[iidx].age_idx = age_idx;
+}
+
+static __rte_always_inline uint32_t
+mlx5_hws_cnt_age_get(struct mlx5_hws_cnt_pool *cpool, cnt_id_t cnt_id)
+{
+	uint32_t iidx = mlx5_hws_cnt_iidx(cpool, cnt_id);
+
+	MLX5_ASSERT(cpool->pool[iidx].share);
+	return cpool->pool[iidx].age_idx;
+}
+
+static __rte_always_inline cnt_id_t
+mlx5_hws_age_cnt_get(struct mlx5_priv *priv, struct mlx5_hws_age_param *param,
+		     uint32_t age_idx)
+{
+	if (!param->own_cnt_index) {
+		/* Create indirect counter one for internal usage. */
+		if (mlx5_hws_cnt_shared_get(priv->hws_cpool,
+					    &param->own_cnt_index, age_idx) < 0)
+			return 0;
+		param->nb_cnts++;
+	}
+	return param->own_cnt_index;
+}
+
+static __rte_always_inline void
+mlx5_hws_age_nb_cnt_increase(struct mlx5_priv *priv, uint32_t age_idx)
+{
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct mlx5_indexed_pool *ipool = age_info->ages_ipool;
+	struct mlx5_hws_age_param *param = mlx5_ipool_get(ipool, age_idx);
+
+	MLX5_ASSERT(param != NULL);
+	param->nb_cnts++;
+}
+
+static __rte_always_inline void
+mlx5_hws_age_nb_cnt_decrease(struct mlx5_priv *priv, uint32_t age_idx)
+{
+	struct mlx5_age_info *age_info = GET_PORT_AGE_INFO(priv);
+	struct mlx5_indexed_pool *ipool = age_info->ages_ipool;
+	struct mlx5_hws_age_param *param = mlx5_ipool_get(ipool, age_idx);
+
+	if (param != NULL)
+		param->nb_cnts--;
+}
+
+static __rte_always_inline bool
+mlx5_hws_age_is_indirect(uint32_t age_idx)
+{
+	return (age_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET) ==
+		MLX5_INDIRECT_ACTION_TYPE_AGE ? true : false;
+}
+
 /* init HWS counter pool. */
 struct mlx5_hws_cnt_pool *
-mlx5_hws_cnt_pool_init(const struct mlx5_hws_cnt_pool_cfg *pcfg,
-		const struct mlx5_hws_cache_param *ccfg);
+mlx5_hws_cnt_pool_init(struct mlx5_dev_ctx_shared *sh,
+		       const struct mlx5_hws_cnt_pool_cfg *pcfg,
+		       const struct mlx5_hws_cache_param *ccfg);
 
 void
 mlx5_hws_cnt_pool_deinit(struct mlx5_hws_cnt_pool *cntp);
@@ -555,4 +676,28 @@ mlx5_hws_cnt_svc_init(struct mlx5_dev_ctx_shared *sh);
 void
 mlx5_hws_cnt_svc_deinit(struct mlx5_dev_ctx_shared *sh);
 
+int
+mlx5_hws_age_action_destroy(struct mlx5_priv *priv, uint32_t idx,
+			    struct rte_flow_error *error);
+
+uint32_t
+mlx5_hws_age_action_create(struct mlx5_priv *priv, uint32_t queue_id,
+			   bool shared, const struct rte_flow_action_age *age,
+			   uint32_t flow_idx, struct rte_flow_error *error);
+
+int
+mlx5_hws_age_action_update(struct mlx5_priv *priv, uint32_t idx,
+			   const void *update, struct rte_flow_error *error);
+
+void *
+mlx5_hws_age_context_get(struct mlx5_priv *priv, uint32_t idx);
+
+int
+mlx5_hws_age_pool_init(struct rte_eth_dev *dev,
+		       const struct rte_flow_port_attr *attr,
+		       uint16_t nb_queues);
+
+void
+mlx5_hws_age_pool_destroy(struct mlx5_priv *priv);
+
 #endif /* _MLX5_HWS_CNT_H_ */
diff --git a/drivers/net/mlx5/mlx5_utils.h b/drivers/net/mlx5/mlx5_utils.h
index 254c879d1a..82e8298781 100644
--- a/drivers/net/mlx5/mlx5_utils.h
+++ b/drivers/net/mlx5/mlx5_utils.h
@@ -170,6 +170,14 @@ struct mlx5_l3t_tbl {
 typedef int32_t (*mlx5_l3t_alloc_callback_fn)(void *ctx,
 					   union mlx5_l3t_data *data);
 
+/*
+ * The default ipool threshold value indicates which per_core_cache
+ * value to set.
+ */
+#define MLX5_HW_IPOOL_SIZE_THRESHOLD (1 << 19)
+/* The default min local cache size. */
+#define MLX5_HW_IPOOL_CACHE_MIN (1 << 9)
+
 /*
  * The indexed memory entry index is made up of trunk index and offset of
  * the entry in the trunk. Since the entry index is 32 bits, in case user
@@ -207,7 +215,7 @@ struct mlx5_indexed_pool_config {
 	 */
 	uint32_t need_lock:1;
 	/* Lock is needed for multiple thread usage. */
-	uint32_t release_mem_en:1; /* Rlease trunk when it is free. */
+	uint32_t release_mem_en:1; /* Release trunk when it is free. */
 	uint32_t max_idx; /* The maximum index can be allocated. */
 	uint32_t per_core_cache;
 	/*
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v6 14/18] net/mlx5: add async action push and pull support
  2022-10-20 15:41 ` [PATCH v6 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (12 preceding siblings ...)
  2022-10-20 15:41   ` [PATCH v6 13/18] net/mlx5: add HWS AGE action support Suanming Mou
@ 2022-10-20 15:41   ` Suanming Mou
  2022-10-24  9:47     ` Slava Ovsiienko
  2022-10-20 15:41   ` [PATCH v6 15/18] net/mlx5: support flow integrity in HWS group 0 Suanming Mou
                     ` (4 subsequent siblings)
  18 siblings, 1 reply; 140+ messages in thread
From: Suanming Mou @ 2022-10-20 15:41 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika

The queue based rte_flow_async_action_* functions work the same way
as the queue based async flow functions: the operations can be pushed
asynchronously, and their completions are pulled in the same way.

This commit adds the missing push and pull support for async actions.
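
Below is a minimal usage sketch of the model from the application side
(illustrative only, not part of this patch). It assumes the port was
already set up with rte_flow_configure() and at least one queue; the
helper name, the fixed queue 0 and the METER_MARK configuration are
placeholders, and error handling is trimmed.

#include <rte_common.h>
#include <rte_flow.h>

/*
 * Sketch: create a METER_MARK indirect action asynchronously on queue 0
 * with a postponed doorbell, then push the queue and pull the completion.
 */
static int
async_meter_mark_example(uint16_t port_id,
			 const struct rte_flow_action_meter_mark *mm)
{
	const struct rte_flow_op_attr op_attr = { .postpone = 1 };
	const struct rte_flow_indir_action_conf indir_conf = { .ingress = 1 };
	const struct rte_flow_action action = {
		.type = RTE_FLOW_ACTION_TYPE_METER_MARK,
		.conf = mm,
	};
	struct rte_flow_action_handle *handle;
	struct rte_flow_op_result res[4];
	struct rte_flow_error error;
	int n;

	handle = rte_flow_async_action_handle_create(port_id, 0, &op_attr,
						     &indir_conf, &action,
						     NULL, &error);
	if (handle == NULL)
		return -1;
	/* Ring the doorbell for everything postponed on queue 0. */
	if (rte_flow_push(port_id, 0, &error))
		return -1;
	/* Poll the queue until the creation completion shows up. */
	do {
		n = rte_flow_pull(port_id, 0, res, RTE_DIM(res), &error);
	} while (n == 0);
	return (n > 0 && res[0].status == RTE_FLOW_OP_SUCCESS) ? 0 : -1;
}

With op_attr.postpone set, the WQE is only written to the queue and the
doorbell is deferred to rte_flow_push(); the completion is then delivered
through rte_flow_pull(). This is the model the patch extends to the
indirect ASO meter and CT actions.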

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/net/mlx5/mlx5.h            |  62 ++++-
 drivers/net/mlx5/mlx5_flow.c       |  45 ++++
 drivers/net/mlx5/mlx5_flow.h       |  17 ++
 drivers/net/mlx5/mlx5_flow_aso.c   | 181 +++++++++++--
 drivers/net/mlx5/mlx5_flow_dv.c    |   7 +-
 drivers/net/mlx5/mlx5_flow_hw.c    | 412 +++++++++++++++++++++++++----
 drivers/net/mlx5/mlx5_flow_meter.c |   6 +-
 7 files changed, 626 insertions(+), 104 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 482ec83c61..42a1e206c0 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -346,6 +346,8 @@ struct mlx5_lb_ctx {
 enum {
 	MLX5_HW_Q_JOB_TYPE_CREATE, /* Flow create job type. */
 	MLX5_HW_Q_JOB_TYPE_DESTROY, /* Flow destroy job type. */
+	MLX5_HW_Q_JOB_TYPE_UPDATE, /* Flow/action update job type. */
+	MLX5_HW_Q_JOB_TYPE_QUERY, /* Action query job type. */
 };
 
 #define MLX5_HW_MAX_ITEMS (16)
@@ -353,12 +355,23 @@ enum {
 /* HW steering flow management job descriptor. */
 struct mlx5_hw_q_job {
 	uint32_t type; /* Job type. */
-	struct rte_flow_hw *flow; /* Flow attached to the job. */
+	union {
+		struct rte_flow_hw *flow; /* Flow attached to the job. */
+		const void *action; /* Indirect action attached to the job. */
+	};
 	void *user_data; /* Job user data. */
 	uint8_t *encap_data; /* Encap data. */
 	struct mlx5_modification_cmd *mhdr_cmd;
 	struct rte_flow_item *items;
-	struct rte_flow_item_ethdev port_spec;
+	union {
+		struct {
+			/* Pointer to ct query user memory. */
+			struct rte_flow_action_conntrack *profile;
+			/* Pointer to ct ASO query out memory. */
+			void *out_data;
+		} __rte_packed;
+		struct rte_flow_item_ethdev port_spec;
+	} __rte_packed;
 };
 
 /* HW steering job descriptor LIFO pool. */
@@ -366,6 +379,8 @@ struct mlx5_hw_q {
 	uint32_t job_idx; /* Free job index. */
 	uint32_t size; /* LIFO size. */
 	struct mlx5_hw_q_job **job; /* LIFO header. */
+	struct rte_ring *indir_cq; /* Indirect action SW completion queue. */
+	struct rte_ring *indir_iq; /* Indirect action SW in progress queue. */
 } __rte_cache_aligned;
 
 
@@ -574,6 +589,7 @@ struct mlx5_aso_sq_elem {
 			struct mlx5_aso_ct_action *ct;
 			char *query_data;
 		};
+		void *user_data; /* Completion user data in async mode. */
 	};
 };
 
@@ -583,7 +599,9 @@ struct mlx5_aso_sq {
 	struct mlx5_aso_cq cq;
 	struct mlx5_devx_sq sq_obj;
 	struct mlx5_pmd_mr mr;
+	volatile struct mlx5_aso_wqe *db; /* Last posted WQE for doorbell. */
 	uint16_t pi;
+	uint16_t db_pi; /* Producer index at the last doorbell ring. */
 	uint32_t head;
 	uint32_t tail;
 	uint32_t sqn;
@@ -998,6 +1016,7 @@ struct mlx5_flow_meter_profile {
 enum mlx5_aso_mtr_state {
 	ASO_METER_FREE, /* In free list. */
 	ASO_METER_WAIT, /* ACCESS_ASO WQE in progress. */
+	ASO_METER_WAIT_ASYNC, /* CQE will be handled by async pull. */
 	ASO_METER_READY, /* CQE received. */
 };
 
@@ -1200,6 +1219,7 @@ struct mlx5_bond_info {
 enum mlx5_aso_ct_state {
 	ASO_CONNTRACK_FREE, /* Inactive, in the free list. */
 	ASO_CONNTRACK_WAIT, /* WQE sent in the SQ. */
+	ASO_CONNTRACK_WAIT_ASYNC, /* CQE will be handled by async pull. */
 	ASO_CONNTRACK_READY, /* CQE received w/o error. */
 	ASO_CONNTRACK_QUERY, /* WQE for query sent. */
 	ASO_CONNTRACK_MAX, /* Guard. */
@@ -1208,13 +1228,21 @@ enum mlx5_aso_ct_state {
 /* Generic ASO connection tracking structure. */
 struct mlx5_aso_ct_action {
 	union {
-		LIST_ENTRY(mlx5_aso_ct_action) next;
-		/* Pointer to the next ASO CT. Used only in SWS. */
-		struct mlx5_aso_ct_pool *pool;
-		/* Pointer to action pool. Used only in HWS. */
+		/* SWS mode struct. */
+		struct {
+			/* Pointer to the next ASO CT. Used only in SWS. */
+			LIST_ENTRY(mlx5_aso_ct_action) next;
+		};
+		/* HWS mode struct. */
+		struct {
+			/* Pointer to action pool. Used only in HWS. */
+			struct mlx5_aso_ct_pool *pool;
+		};
 	};
-	void *dr_action_orig; /* General action object for original dir. */
-	void *dr_action_rply; /* General action object for reply dir. */
+	/* General action object for original dir. */
+	void *dr_action_orig;
+	/* General action object for reply dir. */
+	void *dr_action_rply;
 	uint32_t refcnt; /* Action used count in device flows. */
 	uint16_t offset; /* Offset of ASO CT in DevX objects bulk. */
 	uint16_t peer; /* The only peer port index could also use this CT. */
@@ -2148,18 +2176,21 @@ int mlx5_aso_flow_hit_queue_poll_stop(struct mlx5_dev_ctx_shared *sh);
 void mlx5_aso_queue_uninit(struct mlx5_dev_ctx_shared *sh,
 			   enum mlx5_access_aso_opc_mod aso_opc_mod);
 int mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
-				 struct mlx5_aso_mtr *mtr,
-				 struct mlx5_mtr_bulk *bulk);
+		struct mlx5_aso_mtr *mtr, struct mlx5_mtr_bulk *bulk,
+		void *user_data, bool push);
 int mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 		struct mlx5_aso_mtr *mtr);
 int mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			      struct mlx5_aso_ct_action *ct,
-			      const struct rte_flow_action_conntrack *profile);
+			      const struct rte_flow_action_conntrack *profile,
+			      void *user_data,
+			      bool push);
 int mlx5_aso_ct_wait_ready(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			   struct mlx5_aso_ct_action *ct);
 int mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			     struct mlx5_aso_ct_action *ct,
-			     struct rte_flow_action_conntrack *profile);
+			     struct rte_flow_action_conntrack *profile,
+			     void *user_data, bool push);
 int mlx5_aso_ct_available(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			  struct mlx5_aso_ct_action *ct);
 uint32_t
@@ -2167,6 +2198,13 @@ mlx5_get_supported_sw_parsing_offloads(const struct mlx5_hca_attr *attr);
 uint32_t
 mlx5_get_supported_tunneling_offloads(const struct mlx5_hca_attr *attr);
 
+void mlx5_aso_ct_obj_analyze(struct rte_flow_action_conntrack *profile,
+			     char *wdata);
+void mlx5_aso_push_wqe(struct mlx5_dev_ctx_shared *sh,
+		       struct mlx5_aso_sq *sq);
+int mlx5_aso_pull_completion(struct mlx5_aso_sq *sq,
+			     struct rte_flow_op_result res[],
+			     uint16_t n_res);
 int mlx5_aso_cnt_queue_init(struct mlx5_dev_ctx_shared *sh);
 void mlx5_aso_cnt_queue_uninit(struct mlx5_dev_ctx_shared *sh);
 int mlx5_aso_cnt_query(struct mlx5_dev_ctx_shared *sh,
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index f79ac265a4..9121b90b4e 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -981,6 +981,14 @@ mlx5_flow_async_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 				  void *user_data,
 				  struct rte_flow_error *error);
 
+static int
+mlx5_flow_async_action_handle_query(struct rte_eth_dev *dev, uint32_t queue,
+				 const struct rte_flow_op_attr *attr,
+				 const struct rte_flow_action_handle *handle,
+				 void *data,
+				 void *user_data,
+				 struct rte_flow_error *error);
+
 static const struct rte_flow_ops mlx5_flow_ops = {
 	.validate = mlx5_flow_validate,
 	.create = mlx5_flow_create,
@@ -1019,6 +1027,7 @@ static const struct rte_flow_ops mlx5_flow_ops = {
 	.push = mlx5_flow_push,
 	.async_action_handle_create = mlx5_flow_async_action_handle_create,
 	.async_action_handle_update = mlx5_flow_async_action_handle_update,
+	.async_action_handle_query = mlx5_flow_async_action_handle_query,
 	.async_action_handle_destroy = mlx5_flow_async_action_handle_destroy,
 };
 
@@ -8862,6 +8871,42 @@ mlx5_flow_async_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 					 update, user_data, error);
 }
 
+/**
+ * Query shared action.
+ *
+ * @param[in] dev
+ *   Pointer to the rte_eth_dev structure.
+ * @param[in] queue
+ *   Which queue to be used.
+ * @param[in] attr
+ *   Operation attribute.
+ * @param[in] handle
+ *   Action handle to be queried.
+ * @param[in] data
+ *   Pointer to the query result data.
+ * @param[in] user_data
+ *   Pointer to the user_data.
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, negative value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_async_action_handle_query(struct rte_eth_dev *dev, uint32_t queue,
+				    const struct rte_flow_op_attr *attr,
+				    const struct rte_flow_action_handle *handle,
+				    void *data,
+				    void *user_data,
+				    struct rte_flow_error *error)
+{
+	const struct mlx5_flow_driver_ops *fops =
+			flow_get_drv_ops(MLX5_FLOW_TYPE_HW);
+
+	return fops->async_action_query(dev, queue, attr, handle,
+					data, user_data, error);
+}
+
 /**
  * Destroy shared action.
  *
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 9bfb2908a1..10d4cdb502 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -57,6 +57,13 @@ enum mlx5_rte_flow_field_id {
 
 #define MLX5_INDIRECT_ACTION_TYPE_OFFSET 29
 
+#define MLX5_INDIRECT_ACTION_TYPE_GET(handle) \
+	(((uint32_t)(uintptr_t)(handle)) >> MLX5_INDIRECT_ACTION_TYPE_OFFSET)
+
+#define MLX5_INDIRECT_ACTION_IDX_GET(handle) \
+	(((uint32_t)(uintptr_t)(handle)) & \
+	 ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1))
+
 enum {
 	MLX5_INDIRECT_ACTION_TYPE_RSS,
 	MLX5_INDIRECT_ACTION_TYPE_AGE,
@@ -1829,6 +1836,15 @@ typedef int (*mlx5_flow_async_action_handle_update_t)
 			 void *user_data,
 			 struct rte_flow_error *error);
 
+typedef int (*mlx5_flow_async_action_handle_query_t)
+			(struct rte_eth_dev *dev,
+			 uint32_t queue,
+			 const struct rte_flow_op_attr *attr,
+			 const struct rte_flow_action_handle *handle,
+			 void *data,
+			 void *user_data,
+			 struct rte_flow_error *error);
+
 typedef int (*mlx5_flow_async_action_handle_destroy_t)
 			(struct rte_eth_dev *dev,
 			 uint32_t queue,
@@ -1891,6 +1907,7 @@ struct mlx5_flow_driver_ops {
 	mlx5_flow_push_t push;
 	mlx5_flow_async_action_handle_create_t async_action_create;
 	mlx5_flow_async_action_handle_update_t async_action_update;
+	mlx5_flow_async_action_handle_query_t async_action_query;
 	mlx5_flow_async_action_handle_destroy_t async_action_destroy;
 };
 
diff --git a/drivers/net/mlx5/mlx5_flow_aso.c b/drivers/net/mlx5/mlx5_flow_aso.c
index a5f58301eb..1ddf71e44e 100644
--- a/drivers/net/mlx5/mlx5_flow_aso.c
+++ b/drivers/net/mlx5/mlx5_flow_aso.c
@@ -519,6 +519,70 @@ mlx5_aso_cqe_err_handle(struct mlx5_aso_sq *sq)
 			       (volatile uint32_t *)&sq->sq_obj.aso_wqes[idx]);
 }
 
+int
+mlx5_aso_pull_completion(struct mlx5_aso_sq *sq,
+			 struct rte_flow_op_result res[],
+			 uint16_t n_res)
+{
+	struct mlx5_aso_cq *cq = &sq->cq;
+	volatile struct mlx5_cqe *restrict cqe;
+	const uint32_t cq_size = 1 << cq->log_desc_n;
+	const uint32_t mask = cq_size - 1;
+	uint32_t idx;
+	uint32_t next_idx;
+	uint16_t max;
+	uint16_t n = 0;
+	int ret;
+
+	max = (uint16_t)(sq->head - sq->tail);
+	if (unlikely(!max || !n_res))
+		return 0;
+	next_idx = cq->cq_ci & mask;
+	do {
+		idx = next_idx;
+		next_idx = (cq->cq_ci + 1) & mask;
+		/* Need to confirm the position of the prefetch. */
+		rte_prefetch0(&cq->cq_obj.cqes[next_idx]);
+		cqe = &cq->cq_obj.cqes[idx];
+		ret = check_cqe(cqe, cq_size, cq->cq_ci);
+		/*
+		 * Be sure owner read is done before any other cookie field or
+		 * opaque field.
+		 */
+		rte_io_rmb();
+		if (ret == MLX5_CQE_STATUS_HW_OWN)
+			break;
+		res[n].user_data = sq->elts[(uint16_t)((sq->tail + n) & mask)].user_data;
+		if (unlikely(ret != MLX5_CQE_STATUS_SW_OWN)) {
+			mlx5_aso_cqe_err_handle(sq);
+			res[n].status = RTE_FLOW_OP_ERROR;
+		} else {
+			res[n].status = RTE_FLOW_OP_SUCCESS;
+		}
+		cq->cq_ci++;
+		if (++n == n_res)
+			break;
+	} while (1);
+	if (likely(n)) {
+		sq->tail += n;
+		rte_io_wmb();
+		cq->cq_obj.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci);
+	}
+	return n;
+}
+
+void
+mlx5_aso_push_wqe(struct mlx5_dev_ctx_shared *sh,
+		  struct mlx5_aso_sq *sq)
+{
+	if (sq->db_pi == sq->pi)
+		return;
+	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)sq->db,
+			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
+			   !sh->tx_uar.dbnc);
+	sq->db_pi = sq->pi;
+}
+
 /**
  * Update ASO objects upon completion.
  *
@@ -728,7 +792,9 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 			       struct mlx5_aso_sq *sq,
 			       struct mlx5_aso_mtr *aso_mtr,
 			       struct mlx5_mtr_bulk *bulk,
-				   bool need_lock)
+			       bool need_lock,
+			       void *user_data,
+			       bool push)
 {
 	volatile struct mlx5_aso_wqe *wqe = NULL;
 	struct mlx5_flow_meter_info *fm = NULL;
@@ -754,7 +820,7 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	rte_prefetch0(&sq->sq_obj.aso_wqes[(sq->head + 1) & mask]);
 	/* Fill next WQE. */
 	fm = &aso_mtr->fm;
-	sq->elts[sq->head & mask].mtr = aso_mtr;
+	sq->elts[sq->head & mask].mtr = user_data ? user_data : aso_mtr;
 	if (aso_mtr->type == ASO_METER_INDIRECT) {
 		if (likely(sh->config.dv_flow_en == 2))
 			pool = aso_mtr->pool;
@@ -820,9 +886,13 @@ mlx5_aso_mtr_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	 */
 	sq->head++;
 	sq->pi += 2;/* Each WQE contains 2 WQEBB's. */
-	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
+	if (push) {
+		mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
 			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
 			   !sh->tx_uar.dbnc);
+		sq->db_pi = sq->pi;
+	}
+	sq->db = wqe;
 	if (need_lock)
 		rte_spinlock_unlock(&sq->sqsl);
 	return 1;
@@ -912,11 +982,14 @@ mlx5_aso_mtr_completion_handle(struct mlx5_aso_sq *sq, bool need_lock)
 int
 mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 			struct mlx5_aso_mtr *mtr,
-			struct mlx5_mtr_bulk *bulk)
+			struct mlx5_mtr_bulk *bulk,
+			void *user_data,
+			bool push)
 {
 	struct mlx5_aso_sq *sq;
 	uint32_t poll_wqe_times = MLX5_MTR_POLL_WQE_CQE_TIMES;
 	bool need_lock;
+	int ret;
 
 	if (likely(sh->config.dv_flow_en == 2) &&
 	    mtr->type == ASO_METER_INDIRECT) {
@@ -931,10 +1004,15 @@ mlx5_aso_meter_update_by_wqe(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 		sq = &sh->mtrmng->pools_mng.sq;
 		need_lock = true;
 	}
+	if (queue != MLX5_HW_INV_QUEUE) {
+		ret = mlx5_aso_mtr_sq_enqueue_single(sh, sq, mtr, bulk,
+						     need_lock, user_data, push);
+		return ret > 0 ? 0 : -1;
+	}
 	do {
 		mlx5_aso_mtr_completion_handle(sq, need_lock);
-		if (mlx5_aso_mtr_sq_enqueue_single(sh, sq, mtr,
-						   bulk, need_lock))
+		if (mlx5_aso_mtr_sq_enqueue_single(sh, sq, mtr, bulk,
+						   need_lock, NULL, true))
 			return 0;
 		/* Waiting for wqe resource. */
 		rte_delay_us_sleep(MLX5_ASO_WQE_CQE_RESPONSE_DELAY);
@@ -963,6 +1041,7 @@ mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 {
 	struct mlx5_aso_sq *sq;
 	uint32_t poll_cqe_times = MLX5_MTR_POLL_WQE_CQE_TIMES;
+	uint8_t state;
 	bool need_lock;
 
 	if (likely(sh->config.dv_flow_en == 2) &&
@@ -978,8 +1057,8 @@ mlx5_aso_mtr_wait(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
 		sq = &sh->mtrmng->pools_mng.sq;
 		need_lock = true;
 	}
-	if (__atomic_load_n(&mtr->state, __ATOMIC_RELAXED) ==
-					    ASO_METER_READY)
+	state = __atomic_load_n(&mtr->state, __ATOMIC_RELAXED);
+	if (state == ASO_METER_READY || state == ASO_METER_WAIT_ASYNC)
 		return 0;
 	do {
 		mlx5_aso_mtr_completion_handle(sq, need_lock);
@@ -1095,7 +1174,9 @@ mlx5_aso_ct_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 			      struct mlx5_aso_sq *sq,
 			      struct mlx5_aso_ct_action *ct,
 			      const struct rte_flow_action_conntrack *profile,
-			      bool need_lock)
+			      bool need_lock,
+			      void *user_data,
+			      bool push)
 {
 	volatile struct mlx5_aso_wqe *wqe = NULL;
 	uint16_t size = 1 << sq->log_desc_n;
@@ -1119,10 +1200,16 @@ mlx5_aso_ct_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 	wqe = &sq->sq_obj.aso_wqes[sq->head & mask];
 	rte_prefetch0(&sq->sq_obj.aso_wqes[(sq->head + 1) & mask]);
 	/* Fill next WQE. */
-	MLX5_ASO_CT_UPDATE_STATE(ct, ASO_CONNTRACK_WAIT);
-	sq->elts[sq->head & mask].ct = ct;
-	sq->elts[sq->head & mask].query_data = NULL;
+	MLX5_ASO_CT_UPDATE_STATE(ct,
+			user_data ? ASO_CONNTRACK_WAIT_ASYNC : ASO_CONNTRACK_WAIT);
+	if (user_data) {
+		sq->elts[sq->head & mask].user_data = user_data;
+	} else {
+		sq->elts[sq->head & mask].ct = ct;
+		sq->elts[sq->head & mask].query_data = NULL;
+	}
 	pool = __mlx5_aso_ct_get_pool(sh, ct);
+
 	/* Each WQE will have a single CT object. */
 	wqe->general_cseg.misc = rte_cpu_to_be_32(pool->devx_obj->id +
 						  ct->offset);
@@ -1202,9 +1289,13 @@ mlx5_aso_ct_sq_enqueue_single(struct mlx5_dev_ctx_shared *sh,
 		 profile->reply_dir.max_ack);
 	sq->head++;
 	sq->pi += 2; /* Each WQE contains 2 WQEBB's. */
-	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
-			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
-			   !sh->tx_uar.dbnc);
+	if (push) {
+		mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
+				   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
+				   !sh->tx_uar.dbnc);
+		sq->db_pi = sq->pi;
+	}
+	sq->db = wqe;
 	if (need_lock)
 		rte_spinlock_unlock(&sq->sqsl);
 	return 1;
@@ -1260,7 +1351,9 @@ static int
 mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
 			    struct mlx5_aso_sq *sq,
 			    struct mlx5_aso_ct_action *ct, char *data,
-			    bool need_lock)
+			    bool need_lock,
+			    void *user_data,
+			    bool push)
 {
 	volatile struct mlx5_aso_wqe *wqe = NULL;
 	uint16_t size = 1 << sq->log_desc_n;
@@ -1286,14 +1379,23 @@ mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
 		DRV_LOG(ERR, "Fail: SQ is full and no free WQE to send");
 		return 0;
 	}
-	MLX5_ASO_CT_UPDATE_STATE(ct, ASO_CONNTRACK_QUERY);
+	MLX5_ASO_CT_UPDATE_STATE(ct,
+			user_data ? ASO_CONNTRACK_WAIT_ASYNC : ASO_CONNTRACK_QUERY);
 	wqe = &sq->sq_obj.aso_wqes[sq->head & mask];
 	/* Confirm the location and address of the prefetch instruction. */
 	rte_prefetch0(&sq->sq_obj.aso_wqes[(sq->head + 1) & mask]);
 	/* Fill next WQE. */
 	wqe_idx = sq->head & mask;
-	sq->elts[wqe_idx].ct = ct;
-	sq->elts[wqe_idx].query_data = data;
+	/* Check if this is async mode. */
+	if (user_data) {
+		struct mlx5_hw_q_job *job = (struct mlx5_hw_q_job *)user_data;
+
+		sq->elts[wqe_idx].ct = user_data;
+		job->out_data = (char *)((uintptr_t)sq->mr.addr + wqe_idx * 64);
+	} else {
+		sq->elts[wqe_idx].query_data = data;
+		sq->elts[wqe_idx].ct = ct;
+	}
 	pool = __mlx5_aso_ct_get_pool(sh, ct);
 	/* Each WQE will have a single CT object. */
 	wqe->general_cseg.misc = rte_cpu_to_be_32(pool->devx_obj->id +
@@ -1319,9 +1421,13 @@ mlx5_aso_ct_sq_query_single(struct mlx5_dev_ctx_shared *sh,
 	 * data segment is not used in this case.
 	 */
 	sq->pi += 2;
-	mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
-			   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
-			   !sh->tx_uar.dbnc);
+	if (push) {
+		mlx5_doorbell_ring(&sh->tx_uar.bf_db, *(volatile uint64_t *)wqe,
+				   sq->pi, &sq->sq_obj.db_rec[MLX5_SND_DBR],
+				   !sh->tx_uar.dbnc);
+		sq->db_pi = sq->pi;
+	}
+	sq->db = wqe;
 	if (need_lock)
 		rte_spinlock_unlock(&sq->sqsl);
 	return 1;
@@ -1407,20 +1513,29 @@ int
 mlx5_aso_ct_update_by_wqe(struct mlx5_dev_ctx_shared *sh,
 			  uint32_t queue,
 			  struct mlx5_aso_ct_action *ct,
-			  const struct rte_flow_action_conntrack *profile)
+			  const struct rte_flow_action_conntrack *profile,
+			  void *user_data,
+			  bool push)
 {
 	uint32_t poll_wqe_times = MLX5_CT_POLL_WQE_CQE_TIMES;
 	struct mlx5_aso_ct_pool *pool = __mlx5_aso_ct_get_pool(sh, ct);
 	struct mlx5_aso_sq *sq;
 	bool need_lock = !!(queue == MLX5_HW_INV_QUEUE);
+	int ret;
 
 	if (sh->config.dv_flow_en == 2)
 		sq = __mlx5_aso_ct_get_sq_in_hws(queue, pool);
 	else
 		sq = __mlx5_aso_ct_get_sq_in_sws(sh, ct);
+	if (queue != MLX5_HW_INV_QUEUE) {
+		ret = mlx5_aso_ct_sq_enqueue_single(sh, sq, ct, profile,
+						    need_lock, user_data, push);
+		return ret > 0 ? 0 : -1;
+	}
 	do {
-		mlx5_aso_ct_completion_handle(sh, sq, need_lock);
-		if (mlx5_aso_ct_sq_enqueue_single(sh, sq, ct, profile, need_lock))
+		mlx5_aso_ct_completion_handle(sh, sq, need_lock);
+		if (mlx5_aso_ct_sq_enqueue_single(sh, sq, ct, profile,
+						  need_lock, NULL, true))
 			return 0;
 		/* Waiting for wqe resource. */
 		rte_delay_us_sleep(10u);
@@ -1480,7 +1595,7 @@ mlx5_aso_ct_wait_ready(struct mlx5_dev_ctx_shared *sh, uint32_t queue,
  * @param[in] wdata
  *   Pointer to data fetched from hardware.
  */
-static inline void
+void
 mlx5_aso_ct_obj_analyze(struct rte_flow_action_conntrack *profile,
 			char *wdata)
 {
@@ -1564,7 +1679,8 @@ int
 mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh,
 			 uint32_t queue,
 			 struct mlx5_aso_ct_action *ct,
-			 struct rte_flow_action_conntrack *profile)
+			 struct rte_flow_action_conntrack *profile,
+			 void *user_data, bool push)
 {
 	uint32_t poll_wqe_times = MLX5_CT_POLL_WQE_CQE_TIMES;
 	struct mlx5_aso_ct_pool *pool = __mlx5_aso_ct_get_pool(sh, ct);
@@ -1577,9 +1693,15 @@ mlx5_aso_ct_query_by_wqe(struct mlx5_dev_ctx_shared *sh,
 		sq = __mlx5_aso_ct_get_sq_in_hws(queue, pool);
 	else
 		sq = __mlx5_aso_ct_get_sq_in_sws(sh, ct);
+	if (queue != MLX5_HW_INV_QUEUE) {
+		ret = mlx5_aso_ct_sq_query_single(sh, sq, ct, out_data,
+						  need_lock, user_data, push);
+		return ret > 0 ? 0 : -1;
+	}
 	do {
 		mlx5_aso_ct_completion_handle(sh, sq, need_lock);
-		ret = mlx5_aso_ct_sq_query_single(sh, sq, ct, out_data, need_lock);
+		ret = mlx5_aso_ct_sq_query_single(sh, sq, ct, out_data,
+				need_lock, NULL, true);
 		if (ret < 0)
 			return ret;
 		else if (ret > 0)
@@ -1630,7 +1752,8 @@ mlx5_aso_ct_available(struct mlx5_dev_ctx_shared *sh,
 		rte_errno = ENXIO;
 		return -rte_errno;
 	} else if (state == ASO_CONNTRACK_READY ||
-		   state == ASO_CONNTRACK_QUERY) {
+		   state == ASO_CONNTRACK_QUERY ||
+		   state == ASO_CONNTRACK_WAIT_ASYNC) {
 		return 0;
 	}
 	do {
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 12fd62f5e8..42c4231286 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -13150,7 +13150,7 @@ flow_dv_translate_create_conntrack(struct rte_eth_dev *dev,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "Failed to allocate CT object");
 	ct = flow_aso_ct_get_by_dev_idx(dev, idx);
-	if (mlx5_aso_ct_update_by_wqe(sh, MLX5_HW_INV_QUEUE, ct, pro)) {
+	if (mlx5_aso_ct_update_by_wqe(sh, MLX5_HW_INV_QUEUE, ct, pro, NULL, true)) {
 		flow_dv_aso_ct_dev_release(dev, idx);
 		rte_flow_error_set(error, EBUSY,
 				   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
@@ -15981,7 +15981,7 @@ __flow_dv_action_ct_update(struct rte_eth_dev *dev, uint32_t idx,
 		if (ret)
 			return ret;
 		ret = mlx5_aso_ct_update_by_wqe(priv->sh, MLX5_HW_INV_QUEUE,
-						ct, new_prf);
+						ct, new_prf, NULL, true);
 		if (ret)
 			return rte_flow_error_set(error, EIO,
 					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
@@ -16817,7 +16817,8 @@ flow_dv_action_query(struct rte_eth_dev *dev,
 							ct->peer;
 		((struct rte_flow_action_conntrack *)data)->is_original_dir =
 							ct->is_original;
-		if (mlx5_aso_ct_query_by_wqe(priv->sh, MLX5_HW_INV_QUEUE, ct, data))
+		if (mlx5_aso_ct_query_by_wqe(priv->sh, MLX5_HW_INV_QUEUE, ct,
+					data, NULL, true))
 			return rte_flow_error_set(error, EIO,
 					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 					NULL,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 1a9c5e6d7f..59c5383553 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -1178,9 +1178,9 @@ static rte_be32_t vlan_hdr_to_be32(const struct rte_flow_action *actions)
 }
 
 static __rte_always_inline struct mlx5_aso_mtr *
-flow_hw_meter_mark_alloc(struct rte_eth_dev *dev,
-			   const struct rte_flow_action *action,
-			   uint32_t queue)
+flow_hw_meter_mark_alloc(struct rte_eth_dev *dev, uint32_t queue,
+			 const struct rte_flow_action *action,
+			 void *user_data, bool push)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
@@ -1200,13 +1200,14 @@ flow_hw_meter_mark_alloc(struct rte_eth_dev *dev,
 	fm->is_enable = meter_mark->state;
 	fm->color_aware = meter_mark->color_mode;
 	aso_mtr->pool = pool;
-	aso_mtr->state = ASO_METER_WAIT;
+	aso_mtr->state = (queue == MLX5_HW_INV_QUEUE) ?
+			  ASO_METER_WAIT : ASO_METER_WAIT_ASYNC;
 	aso_mtr->offset = mtr_id - 1;
 	aso_mtr->init_color = (meter_mark->color_mode) ?
 		meter_mark->init_color : RTE_COLOR_GREEN;
 	/* Update ASO flow meter by wqe. */
 	if (mlx5_aso_meter_update_by_wqe(priv->sh, queue, aso_mtr,
-					 &priv->mtr_bulk)) {
+					 &priv->mtr_bulk, user_data, push)) {
 		mlx5_ipool_free(pool->idx_pool, mtr_id);
 		return NULL;
 	}
@@ -1231,7 +1232,7 @@ flow_hw_meter_mark_compile(struct rte_eth_dev *dev,
 	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
 	struct mlx5_aso_mtr *aso_mtr;
 
-	aso_mtr = flow_hw_meter_mark_alloc(dev, action, queue);
+	aso_mtr = flow_hw_meter_mark_alloc(dev, queue, action, NULL, true);
 	if (!aso_mtr)
 		return -1;
 
@@ -2298,9 +2299,13 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 				rte_col_2_mlx5_col(aso_mtr->init_color);
 			break;
 		case RTE_FLOW_ACTION_TYPE_METER_MARK:
+			/*
+			 * Allocating the meter directly would slow down the
+			 * flow insertion rate.
+			 */
 			ret = flow_hw_meter_mark_compile(dev,
 				act_data->action_dst, action,
-				rule_acts, &job->flow->mtr_id, queue);
+				rule_acts, &job->flow->mtr_id, MLX5_HW_INV_QUEUE);
 			if (ret != 0)
 				return ret;
 			break;
@@ -2607,6 +2612,74 @@ flow_hw_age_count_release(struct mlx5_priv *priv, uint32_t queue,
 	}
 }
 
+static inline int
+__flow_hw_pull_indir_action_comp(struct rte_eth_dev *dev,
+				 uint32_t queue,
+				 struct rte_flow_op_result res[],
+				 uint16_t n_res)
+
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_ring *r = priv->hw_q[queue].indir_cq;
+	struct mlx5_hw_q_job *job;
+	void *user_data = NULL;
+	uint32_t type, idx;
+	struct mlx5_aso_mtr *aso_mtr;
+	struct mlx5_aso_ct_action *aso_ct;
+	int ret_comp, i;
+
+	ret_comp = (int)rte_ring_count(r);
+	if (ret_comp > n_res)
+		ret_comp = n_res;
+	for (i = 0; i < ret_comp; i++) {
+		rte_ring_dequeue(r, &user_data);
+		res[i].user_data = user_data;
+		res[i].status = RTE_FLOW_OP_SUCCESS;
+	}
+	if (ret_comp < n_res && priv->hws_mpool)
+		ret_comp += mlx5_aso_pull_completion(&priv->hws_mpool->sq[queue],
+				&res[ret_comp], n_res - ret_comp);
+	if (ret_comp < n_res && priv->hws_ctpool)
+		ret_comp += mlx5_aso_pull_completion(&priv->ct_mng->aso_sqs[queue],
+				&res[ret_comp], n_res - ret_comp);
+	for (i = 0; i <  ret_comp; i++) {
+		job = (struct mlx5_hw_q_job *)res[i].user_data;
+		/* Restore user data. */
+		res[i].user_data = job->user_data;
+		if (job->type == MLX5_HW_Q_JOB_TYPE_DESTROY) {
+			type = MLX5_INDIRECT_ACTION_TYPE_GET(job->action);
+			if (type == MLX5_INDIRECT_ACTION_TYPE_METER_MARK) {
+				idx = MLX5_INDIRECT_ACTION_IDX_GET(job->action);
+				mlx5_ipool_free(priv->hws_mpool->idx_pool, idx);
+			}
+		} else if (job->type == MLX5_HW_Q_JOB_TYPE_CREATE) {
+			type = MLX5_INDIRECT_ACTION_TYPE_GET(job->action);
+			if (type == MLX5_INDIRECT_ACTION_TYPE_METER_MARK) {
+				idx = MLX5_INDIRECT_ACTION_IDX_GET(job->action);
+				aso_mtr = mlx5_ipool_get(priv->hws_mpool->idx_pool, idx);
+				aso_mtr->state = ASO_METER_READY;
+			} else if (type == MLX5_INDIRECT_ACTION_TYPE_CT) {
+				idx = MLX5_ACTION_CTX_CT_GET_IDX
+					((uint32_t)(uintptr_t)job->action);
+				aso_ct = mlx5_ipool_get(priv->hws_ctpool->cts, idx);
+				aso_ct->state = ASO_CONNTRACK_READY;
+			}
+		} else if (job->type == MLX5_HW_Q_JOB_TYPE_QUERY) {
+			type = MLX5_INDIRECT_ACTION_TYPE_GET(job->action);
+			if (type == MLX5_INDIRECT_ACTION_TYPE_CT) {
+				idx = MLX5_ACTION_CTX_CT_GET_IDX
+					((uint32_t)(uintptr_t)job->action);
+				aso_ct = mlx5_ipool_get(priv->hws_ctpool->cts, idx);
+				mlx5_aso_ct_obj_analyze(job->profile,
+							job->out_data);
+				aso_ct->state = ASO_CONNTRACK_READY;
+			}
+		}
+		priv->hw_q[queue].job[priv->hw_q[queue].job_idx++] = job;
+	}
+	return ret_comp;
+}
+
 /**
  * Pull the enqueued flows.
  *
@@ -2639,6 +2712,7 @@ flow_hw_pull(struct rte_eth_dev *dev,
 	struct mlx5_hw_q_job *job;
 	int ret, i;
 
+	/* 1. Pull the flow completions. */
 	ret = mlx5dr_send_queue_poll(priv->dr_ctx, queue, res, n_res);
 	if (ret < 0)
 		return rte_flow_error_set(error, rte_errno,
@@ -2664,9 +2738,34 @@ flow_hw_pull(struct rte_eth_dev *dev,
 		}
 		priv->hw_q[queue].job[priv->hw_q[queue].job_idx++] = job;
 	}
+	/* 2. Pull the indirect action completions. */
+	if (ret < n_res)
+		ret += __flow_hw_pull_indir_action_comp(dev, queue, &res[ret],
+							n_res - ret);
 	return ret;
 }
 
+static inline void
+__flow_hw_push_action(struct rte_eth_dev *dev,
+		    uint32_t queue)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_ring *iq = priv->hw_q[queue].indir_iq;
+	struct rte_ring *cq = priv->hw_q[queue].indir_cq;
+	void *job = NULL;
+	uint32_t ret, i;
+
+	ret = rte_ring_count(iq);
+	for (i = 0; i < ret; i++) {
+		rte_ring_dequeue(iq, &job);
+		rte_ring_enqueue(cq, job);
+	}
+	if (priv->hws_ctpool)
+		mlx5_aso_push_wqe(priv->sh, &priv->ct_mng->aso_sqs[queue]);
+	if (priv->hws_mpool)
+		mlx5_aso_push_wqe(priv->sh, &priv->hws_mpool->sq[queue]);
+}
+
 /**
  * Push the enqueued flows to HW.
  *
@@ -2690,6 +2789,7 @@ flow_hw_push(struct rte_eth_dev *dev,
 	struct mlx5_priv *priv = dev->data->dev_private;
 	int ret;
 
+	__flow_hw_push_action(dev, queue);
 	ret = mlx5dr_send_queue_action(priv->dr_ctx, queue,
 				       MLX5DR_SEND_QUEUE_ACTION_DRAIN);
 	if (ret) {
@@ -5943,7 +6043,7 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	/* Adds one queue to be used by PMD.
 	 * The last queue will be used by the PMD.
 	 */
-	uint16_t nb_q_updated;
+	uint16_t nb_q_updated = 0;
 	struct rte_flow_queue_attr **_queue_attr = NULL;
 	struct rte_flow_queue_attr ctrl_queue_attr = {0};
 	bool is_proxy = !!(priv->sh->config.dv_esw_en && priv->master);
@@ -6010,6 +6110,7 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		goto err;
 	}
 	for (i = 0; i < nb_q_updated; i++) {
+		char mz_name[RTE_MEMZONE_NAMESIZE];
 		uint8_t *encap = NULL;
 		struct mlx5_modification_cmd *mhdr_cmd = NULL;
 		struct rte_flow_item *items = NULL;
@@ -6037,6 +6138,22 @@ flow_hw_configure(struct rte_eth_dev *dev,
 			job[j].items = &items[j * MLX5_HW_MAX_ITEMS];
 			priv->hw_q[i].job[j] = &job[j];
 		}
+		snprintf(mz_name, sizeof(mz_name), "port_%u_indir_act_cq_%u",
+			 dev->data->port_id, i);
+		priv->hw_q[i].indir_cq = rte_ring_create(mz_name,
+				_queue_attr[i]->size, SOCKET_ID_ANY,
+				RING_F_SP_ENQ | RING_F_SC_DEQ |
+				RING_F_EXACT_SZ);
+		if (!priv->hw_q[i].indir_cq)
+			goto err;
+		snprintf(mz_name, sizeof(mz_name), "port_%u_indir_act_iq_%u",
+			 dev->data->port_id, i);
+		priv->hw_q[i].indir_iq = rte_ring_create(mz_name,
+				_queue_attr[i]->size, SOCKET_ID_ANY,
+				RING_F_SP_ENQ | RING_F_SC_DEQ |
+				RING_F_EXACT_SZ);
+		if (!priv->hw_q[i].indir_iq)
+			goto err;
 	}
 	dr_ctx_attr.pd = priv->sh->cdev->pd;
 	dr_ctx_attr.queues = nb_q_updated;
@@ -6154,6 +6271,12 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	flow_hw_destroy_vlan(dev);
 	if (dr_ctx)
 		claim_zero(mlx5dr_context_close(dr_ctx));
+	for (i = 0; i < nb_q_updated; i++) {
+		if (priv->hw_q[i].indir_iq)
+			rte_ring_free(priv->hw_q[i].indir_iq);
+		if (priv->hw_q[i].indir_cq)
+			rte_ring_free(priv->hw_q[i].indir_cq);
+	}
 	mlx5_free(priv->hw_q);
 	priv->hw_q = NULL;
 	if (priv->acts_ipool) {
@@ -6183,7 +6306,7 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 	struct rte_flow_template_table *tbl;
 	struct rte_flow_pattern_template *it;
 	struct rte_flow_actions_template *at;
-	int i;
+	uint32_t i;
 
 	if (!priv->dr_ctx)
 		return;
@@ -6231,6 +6354,10 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 		flow_hw_ct_mng_destroy(dev, priv->ct_mng);
 		priv->ct_mng = NULL;
 	}
+	for (i = 0; i < priv->nb_queue; i++) {
+		rte_ring_free(priv->hw_q[i].indir_iq);
+		rte_ring_free(priv->hw_q[i].indir_cq);
+	}
 	mlx5_free(priv->hw_q);
 	priv->hw_q = NULL;
 	claim_zero(mlx5dr_context_close(priv->dr_ctx));
@@ -6419,8 +6546,9 @@ flow_hw_conntrack_destroy(struct rte_eth_dev *dev __rte_unused,
 }
 
 static int
-flow_hw_conntrack_query(struct rte_eth_dev *dev, uint32_t idx,
+flow_hw_conntrack_query(struct rte_eth_dev *dev, uint32_t queue, uint32_t idx,
 			struct rte_flow_action_conntrack *profile,
+			void *user_data, bool push,
 			struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -6444,7 +6572,7 @@ flow_hw_conntrack_query(struct rte_eth_dev *dev, uint32_t idx,
 	}
 	profile->peer_port = ct->peer;
 	profile->is_original_dir = ct->is_original;
-	if (mlx5_aso_ct_query_by_wqe(priv->sh, MLX5_HW_INV_QUEUE, ct, profile))
+	if (mlx5_aso_ct_query_by_wqe(priv->sh, queue, ct, profile, user_data, push))
 		return rte_flow_error_set(error, EIO,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL,
@@ -6456,7 +6584,8 @@ flow_hw_conntrack_query(struct rte_eth_dev *dev, uint32_t idx,
 static int
 flow_hw_conntrack_update(struct rte_eth_dev *dev, uint32_t queue,
 			 const struct rte_flow_modify_conntrack *action_conf,
-			 uint32_t idx, struct rte_flow_error *error)
+			 uint32_t idx, void *user_data, bool push,
+			 struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_aso_ct_pool *pool = priv->hws_ctpool;
@@ -6487,7 +6616,8 @@ flow_hw_conntrack_update(struct rte_eth_dev *dev, uint32_t queue,
 		ret = mlx5_validate_action_ct(dev, new_prf, error);
 		if (ret)
 			return ret;
-		ret = mlx5_aso_ct_update_by_wqe(priv->sh, queue, ct, new_prf);
+		ret = mlx5_aso_ct_update_by_wqe(priv->sh, queue, ct, new_prf,
+						user_data, push);
 		if (ret)
 			return rte_flow_error_set(error, EIO,
 					RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
@@ -6509,6 +6639,7 @@ flow_hw_conntrack_update(struct rte_eth_dev *dev, uint32_t queue,
 static struct rte_flow_action_handle *
 flow_hw_conntrack_create(struct rte_eth_dev *dev, uint32_t queue,
 			 const struct rte_flow_action_conntrack *pro,
+			 void *user_data, bool push,
 			 struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -6535,7 +6666,7 @@ flow_hw_conntrack_create(struct rte_eth_dev *dev, uint32_t queue,
 	ct->is_original = !!pro->is_original_dir;
 	ct->peer = pro->peer_port;
 	ct->pool = pool;
-	if (mlx5_aso_ct_update_by_wqe(priv->sh, queue, ct, pro)) {
+	if (mlx5_aso_ct_update_by_wqe(priv->sh, queue, ct, pro, user_data, push)) {
 		mlx5_ipool_free(pool->cts, ct_idx);
 		rte_flow_error_set(error, EBUSY,
 				   RTE_FLOW_ERROR_TYPE_ACTION, NULL,
@@ -6655,15 +6786,29 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 			     struct rte_flow_error *error)
 {
 	struct rte_flow_action_handle *handle = NULL;
+	struct mlx5_hw_q_job *job = NULL;
 	struct mlx5_priv *priv = dev->data->dev_private;
 	const struct rte_flow_action_age *age;
 	struct mlx5_aso_mtr *aso_mtr;
 	cnt_id_t cnt_id;
 	uint32_t mtr_id;
 	uint32_t age_idx;
+	bool push = true;
+	bool aso = false;
 
-	RTE_SET_USED(attr);
-	RTE_SET_USED(user_data);
+	if (attr) {
+		MLX5_ASSERT(queue != MLX5_HW_INV_QUEUE);
+		if (unlikely(!priv->hw_q[queue].job_idx)) {
+			rte_flow_error_set(error, ENOMEM,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				"Flow queue full.");
+			return NULL;
+		}
+		job = priv->hw_q[queue].job[--priv->hw_q[queue].job_idx];
+		job->type = MLX5_HW_Q_JOB_TYPE_CREATE;
+		job->user_data = user_data;
+		push = !attr->postpone;
+	}
 	switch (action->type) {
 	case RTE_FLOW_ACTION_TYPE_AGE:
 		if (priv->hws_strict_queue) {
@@ -6703,10 +6848,13 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 				 (uintptr_t)cnt_id;
 		break;
 	case RTE_FLOW_ACTION_TYPE_CONNTRACK:
-		handle = flow_hw_conntrack_create(dev, queue, action->conf, error);
+		aso = true;
+		handle = flow_hw_conntrack_create(dev, queue, action->conf, job,
+						  push, error);
 		break;
 	case RTE_FLOW_ACTION_TYPE_METER_MARK:
-		aso_mtr = flow_hw_meter_mark_alloc(dev, action, queue);
+		aso = true;
+		aso_mtr = flow_hw_meter_mark_alloc(dev, queue, action, job, push);
 		if (!aso_mtr)
 			break;
 		mtr_id = (MLX5_INDIRECT_ACTION_TYPE_METER_MARK <<
@@ -6719,7 +6867,20 @@ flow_hw_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
 	default:
 		rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION,
 				   NULL, "action type not supported");
-		return NULL;
+		break;
+	}
+	if (job) {
+		if (!handle) {
+			priv->hw_q[queue].job_idx++;
+			return NULL;
+		}
+		job->action = handle;
+		if (push)
+			__flow_hw_push_action(dev, queue);
+		if (aso)
+			return handle;
+		rte_ring_enqueue(push ? priv->hw_q[queue].indir_cq :
+				 priv->hw_q[queue].indir_iq, job);
 	}
 	return handle;
 }
@@ -6753,32 +6914,56 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 			     void *user_data,
 			     struct rte_flow_error *error)
 {
-	RTE_SET_USED(attr);
-	RTE_SET_USED(user_data);
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
+	const struct rte_flow_modify_conntrack *ct_conf =
+		(const struct rte_flow_modify_conntrack *)update;
 	const struct rte_flow_update_meter_mark *upd_meter_mark =
 		(const struct rte_flow_update_meter_mark *)update;
 	const struct rte_flow_action_meter_mark *meter_mark;
+	struct mlx5_hw_q_job *job = NULL;
 	struct mlx5_aso_mtr *aso_mtr;
 	struct mlx5_flow_meter_info *fm;
 	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
 	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
 	uint32_t idx = act_idx & ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
+	int ret = 0;
+	bool push = true;
+	bool aso = false;
 
+	if (attr) {
+		MLX5_ASSERT(queue != MLX5_HW_INV_QUEUE);
+		if (unlikely(!priv->hw_q[queue].job_idx))
+			return rte_flow_error_set(error, ENOMEM,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				"Action update failed due to queue full.");
+		job = priv->hw_q[queue].job[--priv->hw_q[queue].job_idx];
+		job->type = MLX5_HW_Q_JOB_TYPE_UPDATE;
+		job->user_data = user_data;
+		push = !attr->postpone;
+	}
 	switch (type) {
 	case MLX5_INDIRECT_ACTION_TYPE_AGE:
-		return mlx5_hws_age_action_update(priv, idx, update, error);
+		ret = mlx5_hws_age_action_update(priv, idx, update, error);
+		break;
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
-		return flow_hw_conntrack_update(dev, queue, update, act_idx, error);
+		if (ct_conf->state)
+			aso = true;
+		ret = flow_hw_conntrack_update(dev, queue, update, act_idx,
+					       job, push, error);
+		break;
 	case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
+		aso = true;
 		meter_mark = &upd_meter_mark->meter_mark;
 		/* Find ASO object. */
 		aso_mtr = mlx5_ipool_get(pool->idx_pool, idx);
-		if (!aso_mtr)
-			return rte_flow_error_set(error, EINVAL,
+		if (!aso_mtr) {
+			ret = -EINVAL;
+			rte_flow_error_set(error, EINVAL,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL, "Invalid meter_mark update index");
+			break;
+		}
 		fm = &aso_mtr->fm;
 		if (upd_meter_mark->profile_valid)
 			fm->profile = (struct mlx5_flow_meter_profile *)
@@ -6792,25 +6977,46 @@ flow_hw_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
 			fm->is_enable = meter_mark->state;
 		/* Update ASO flow meter by wqe. */
 		if (mlx5_aso_meter_update_by_wqe(priv->sh, queue,
-						 aso_mtr, &priv->mtr_bulk))
-			return rte_flow_error_set(error, EINVAL,
+						 aso_mtr, &priv->mtr_bulk, job, push)) {
+			ret = -EINVAL;
+			rte_flow_error_set(error, EINVAL,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL, "Unable to update ASO meter WQE");
+			break;
+		}
 		/* Wait for ASO object completion. */
 		if (queue == MLX5_HW_INV_QUEUE &&
-		    mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr))
-			return rte_flow_error_set(error, EINVAL,
+		    mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr)) {
+			ret = -EINVAL;
+			rte_flow_error_set(error, EINVAL,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL, "Unable to wait for ASO meter CQE");
+		}
 		break;
 	case MLX5_INDIRECT_ACTION_TYPE_RSS:
-		return flow_dv_action_update(dev, handle, update, error);
+		ret = flow_dv_action_update(dev, handle, update, error);
+		break;
 	default:
-		return rte_flow_error_set(error, ENOTSUP,
+		ret = -ENOTSUP;
+		rte_flow_error_set(error, ENOTSUP,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "action type not supported");
+		break;
 	}
-	return 0;
+	if (job) {
+		if (ret) {
+			priv->hw_q[queue].job_idx++;
+			return ret;
+		}
+		job->action = handle;
+		if (push)
+			__flow_hw_push_action(dev, queue);
+		if (aso)
+			return 0;
+		rte_ring_enqueue(push ? priv->hw_q[queue].indir_cq :
+				 priv->hw_q[queue].indir_iq, job);
+	}
+	return ret;
 }
 
 /**
@@ -6845,15 +7051,28 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 	uint32_t idx = act_idx & ((1u << MLX5_INDIRECT_ACTION_TYPE_OFFSET) - 1);
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_aso_mtr_pool *pool = priv->hws_mpool;
+	struct mlx5_hw_q_job *job = NULL;
 	struct mlx5_aso_mtr *aso_mtr;
 	struct mlx5_flow_meter_info *fm;
+	bool push = true;
+	bool aso = false;
+	int ret = 0;
 
-	RTE_SET_USED(queue);
-	RTE_SET_USED(attr);
-	RTE_SET_USED(user_data);
+	if (attr) {
+		MLX5_ASSERT(queue != MLX5_HW_INV_QUEUE);
+		if (unlikely(!priv->hw_q[queue].job_idx))
+			return rte_flow_error_set(error, ENOMEM,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				"Action destroy failed due to queue full.");
+		job = priv->hw_q[queue].job[--priv->hw_q[queue].job_idx];
+		job->type = MLX5_HW_Q_JOB_TYPE_DESTROY;
+		job->user_data = user_data;
+		push = !attr->postpone;
+	}
 	switch (type) {
 	case MLX5_INDIRECT_ACTION_TYPE_AGE:
-		return mlx5_hws_age_action_destroy(priv, age_idx, error);
+		ret = mlx5_hws_age_action_destroy(priv, age_idx, error);
+		break;
 	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
 		age_idx = mlx5_hws_cnt_age_get(priv->hws_cpool, act_idx);
 		if (age_idx != 0)
@@ -6862,39 +7081,69 @@ flow_hw_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
 			 * time to update the AGE.
 			 */
 			mlx5_hws_age_nb_cnt_decrease(priv, age_idx);
-		return mlx5_hws_cnt_shared_put(priv->hws_cpool, &act_idx);
+		ret = mlx5_hws_cnt_shared_put(priv->hws_cpool, &act_idx);
+		break;
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
-		return flow_hw_conntrack_destroy(dev, act_idx, error);
+		ret = flow_hw_conntrack_destroy(dev, act_idx, error);
+		break;
 	case MLX5_INDIRECT_ACTION_TYPE_METER_MARK:
 		aso_mtr = mlx5_ipool_get(pool->idx_pool, idx);
-		if (!aso_mtr)
-			return rte_flow_error_set(error, EINVAL,
+		if (!aso_mtr) {
+			ret = -EINVAL;
+			rte_flow_error_set(error, EINVAL,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL, "Invalid meter_mark destroy index");
+			break;
+		}
 		fm = &aso_mtr->fm;
 		fm->is_enable = 0;
 		/* Update ASO flow meter by wqe. */
 		if (mlx5_aso_meter_update_by_wqe(priv->sh, queue, aso_mtr,
-						 &priv->mtr_bulk))
-			return rte_flow_error_set(error, EINVAL,
+						 &priv->mtr_bulk, job, push)) {
+			ret = -EINVAL;
+			rte_flow_error_set(error, EINVAL,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL, "Unable to update ASO meter WQE");
+			break;
+		}
 		/* Wait for ASO object completion. */
 		if (queue == MLX5_HW_INV_QUEUE &&
-		    mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr))
-			return rte_flow_error_set(error, EINVAL,
+		    mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr)) {
+			ret = -EINVAL;
+			rte_flow_error_set(error, EINVAL,
 				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				NULL, "Unable to wait for ASO meter CQE");
-		mlx5_ipool_free(pool->idx_pool, idx);
+			break;
+		}
+		if (!job)
+			mlx5_ipool_free(pool->idx_pool, idx);
+		else
+			aso = true;
 		break;
 	case MLX5_INDIRECT_ACTION_TYPE_RSS:
-		return flow_dv_action_destroy(dev, handle, error);
+		ret = flow_dv_action_destroy(dev, handle, error);
+		break;
 	default:
-		return rte_flow_error_set(error, ENOTSUP,
+		ret = -ENOTSUP;
+		rte_flow_error_set(error, ENOTSUP,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "action type not supported");
+		break;
 	}
-	return 0;
+	if (job) {
+		if (ret) {
+			priv->hw_q[queue].job_idx++;
+			return ret;
+		}
+		job->action = handle;
+		if (push)
+			__flow_hw_push_action(dev, queue);
+		if (aso)
+			return ret;
+		rte_ring_enqueue(push ? priv->hw_q[queue].indir_cq :
+				 priv->hw_q[queue].indir_iq, job);
+	}
+	return ret;
 }
 
 static int
@@ -7118,28 +7367,76 @@ flow_hw_action_update(struct rte_eth_dev *dev,
 }
 
 static int
-flow_hw_action_query(struct rte_eth_dev *dev,
-		     const struct rte_flow_action_handle *handle, void *data,
-		     struct rte_flow_error *error)
+flow_hw_action_handle_query(struct rte_eth_dev *dev, uint32_t queue,
+			    const struct rte_flow_op_attr *attr,
+			    const struct rte_flow_action_handle *handle,
+			    void *data, void *user_data,
+			    struct rte_flow_error *error)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hw_q_job *job = NULL;
 	uint32_t act_idx = (uint32_t)(uintptr_t)handle;
 	uint32_t type = act_idx >> MLX5_INDIRECT_ACTION_TYPE_OFFSET;
 	uint32_t age_idx = act_idx & MLX5_HWS_AGE_IDX_MASK;
+	int ret;
+	bool push = true;
+	bool aso = false;
 
+	if (attr) {
+		MLX5_ASSERT(queue != MLX5_HW_INV_QUEUE);
+		if (unlikely(!priv->hw_q[queue].job_idx))
+			return rte_flow_error_set(error, ENOMEM,
+				RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				"Action query failed due to queue full.");
+		job = priv->hw_q[queue].job[--priv->hw_q[queue].job_idx];
+		job->type = MLX5_HW_Q_JOB_TYPE_QUERY;
+		job->user_data = user_data;
+		push = !attr->postpone;
+	}
 	switch (type) {
 	case MLX5_INDIRECT_ACTION_TYPE_AGE:
-		return flow_hw_query_age(dev, age_idx, data, error);
+		ret = flow_hw_query_age(dev, age_idx, data, error);
+		break;
 	case MLX5_INDIRECT_ACTION_TYPE_COUNT:
-		return flow_hw_query_counter(dev, act_idx, data, error);
+		ret = flow_hw_query_counter(dev, act_idx, data, error);
+		break;
 	case MLX5_INDIRECT_ACTION_TYPE_CT:
-		return flow_hw_conntrack_query(dev, act_idx, data, error);
-	case MLX5_INDIRECT_ACTION_TYPE_RSS:
-		return flow_dv_action_query(dev, handle, data, error);
+		aso = true;
+		if (job)
+			job->profile = (struct rte_flow_action_conntrack *)data;
+		ret = flow_hw_conntrack_query(dev, queue, act_idx, data,
+					      job, push, error);
+		break;
 	default:
-		return rte_flow_error_set(error, ENOTSUP,
+		ret = -ENOTSUP;
+		rte_flow_error_set(error, ENOTSUP,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "action type not supported");
+		break;
+	}
+	if (job) {
+		if (ret) {
+			priv->hw_q[queue].job_idx++;
+			return ret;
+		}
+		job->action = handle;
+		if (push)
+			__flow_hw_push_action(dev, queue);
+		if (aso)
+			return ret;
+		rte_ring_enqueue(push ? priv->hw_q[queue].indir_cq :
+				 priv->hw_q[queue].indir_iq, job);
 	}
+	return ret;
+}
+
+static int
+flow_hw_action_query(struct rte_eth_dev *dev,
+		     const struct rte_flow_action_handle *handle, void *data,
+		     struct rte_flow_error *error)
+{
+	return flow_hw_action_handle_query(dev, MLX5_HW_INV_QUEUE, NULL,
+			handle, data, NULL, error);
 }
 
 /**
@@ -7254,6 +7551,7 @@ const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops = {
 	.async_action_create = flow_hw_action_handle_create,
 	.async_action_destroy = flow_hw_action_handle_destroy,
 	.async_action_update = flow_hw_action_handle_update,
+	.async_action_query = flow_hw_action_handle_query,
 	.action_validate = flow_hw_action_validate,
 	.action_create = flow_hw_action_create,
 	.action_destroy = flow_hw_action_destroy,
diff --git a/drivers/net/mlx5/mlx5_flow_meter.c b/drivers/net/mlx5/mlx5_flow_meter.c
index ed2306283d..08f8aad70a 100644
--- a/drivers/net/mlx5/mlx5_flow_meter.c
+++ b/drivers/net/mlx5/mlx5_flow_meter.c
@@ -1632,7 +1632,7 @@ mlx5_flow_meter_action_modify(struct mlx5_priv *priv,
 		fm->is_enable = !!is_enable;
 		aso_mtr = container_of(fm, struct mlx5_aso_mtr, fm);
 		ret = mlx5_aso_meter_update_by_wqe(priv->sh, MLX5_HW_INV_QUEUE,
-						   aso_mtr, &priv->mtr_bulk);
+						   aso_mtr, &priv->mtr_bulk, NULL, true);
 		if (ret)
 			return ret;
 		ret = mlx5_aso_mtr_wait(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr);
@@ -1882,7 +1882,7 @@ mlx5_flow_meter_create(struct rte_eth_dev *dev, uint32_t meter_id,
 	if (priv->sh->meter_aso_en) {
 		aso_mtr = container_of(fm, struct mlx5_aso_mtr, fm);
 		ret = mlx5_aso_meter_update_by_wqe(priv->sh, MLX5_HW_INV_QUEUE,
-						   aso_mtr, &priv->mtr_bulk);
+						   aso_mtr, &priv->mtr_bulk, NULL, true);
 		if (ret)
 			goto error;
 		if (!priv->mtr_idx_tbl) {
@@ -1988,7 +1988,7 @@ mlx5_flow_meter_hws_create(struct rte_eth_dev *dev, uint32_t meter_id,
 	fm->initialized = 1;
 	/* Update ASO flow meter by wqe. */
 	ret = mlx5_aso_meter_update_by_wqe(priv->sh, MLX5_HW_INV_QUEUE, aso_mtr,
-					   &priv->mtr_bulk);
+					   &priv->mtr_bulk, NULL, true);
 	if (ret)
 		return -rte_mtr_error_set(error, ENOTSUP,
 			RTE_MTR_ERROR_TYPE_UNSPECIFIED,
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v6 15/18] net/mlx5: support flow integrity in HWS group 0
  2022-10-20 15:41 ` [PATCH v6 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (13 preceding siblings ...)
  2022-10-20 15:41   ` [PATCH v6 14/18] net/mlx5: add async action push and pull support Suanming Mou
@ 2022-10-20 15:41   ` Suanming Mou
  2022-10-24  9:47     ` Slava Ovsiienko
  2022-10-20 15:41   ` [PATCH v6 16/18] net/mlx5: support device control for E-Switch default rule Suanming Mou
                     ` (3 subsequent siblings)
  18 siblings, 1 reply; 140+ messages in thread
From: Suanming Mou @ 2022-10-20 15:41 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Gregory Etelson

From: Gregory Etelson <getelson@nvidia.com>

- Reformat flow integrity item translation for HWS code.
- Support flow integrity bits in HWS group 0.
- Update integrity item translation to match positive semantics only.
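
For reference, a minimal sketch (not part of this patch) of the positive-only
integrity match an application could request; the item layout comes from
rte_flow, everything else is a placeholder:

    /* Match packets whose outer L3 and L4 integrity checks passed. */
    struct rte_flow_item_integrity integrity = {
        .level = 0, /* outer headers */
        .l3_ok = 1,
        .l4_ok = 1,
    };
    struct rte_flow_item pattern[] = {
        {
            .type = RTE_FLOW_ITEM_TYPE_INTEGRITY,
            /*
             * With the template API the mask belongs to the pattern
             * template and the spec to the flow rule; both use this
             * layout. Only bits set to 1 (positive semantics) are
             * honored by the PMD after this change.
             */
            .spec = &integrity,
            .mask = &integrity,
        },
        { .type = RTE_FLOW_ITEM_TYPE_END },
    };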

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
---
 drivers/net/mlx5/mlx5_flow.h    |   1 +
 drivers/net/mlx5/mlx5_flow_dv.c | 163 ++++++++++++++++----------------
 drivers/net/mlx5/mlx5_flow_hw.c |   8 ++
 3 files changed, 90 insertions(+), 82 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 10d4cdb502..8ba3c2ddb1 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1473,6 +1473,7 @@ struct mlx5_dv_matcher_workspace {
 	struct mlx5_flow_rss_desc *rss_desc; /* RSS descriptor. */
 	const struct rte_flow_item *tunnel_item; /* Flow tunnel item. */
 	const struct rte_flow_item *gre_item; /* Flow GRE item. */
+	const struct rte_flow_item *integrity_items[2];
 };
 
 struct mlx5_flow_split_info {
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 42c4231286..5c6ecc4a1a 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -12695,132 +12695,121 @@ flow_dv_aso_age_params_init(struct rte_eth_dev *dev,
 
 static void
 flow_dv_translate_integrity_l4(const struct rte_flow_item_integrity *mask,
-			       const struct rte_flow_item_integrity *value,
-			       void *headers_m, void *headers_v)
+			       void *headers)
 {
+	/*
+	 * In HWS mode MLX5_ITEM_UPDATE() macro assigns the same pointer to
+	 * both mask and value, therefore either can be used.
+	 * In SWS SW_V mode mask points to item mask and value points to item
+	 * spec. Integrity item value is used only if matching mask is set.
+	 * Use mask reference here to keep SWS functionality.
+	 */
 	if (mask->l4_ok) {
 		/* RTE l4_ok filter aggregates hardware l4_ok and
 		 * l4_checksum_ok filters.
 		 * Positive RTE l4_ok match requires hardware match on both L4
 		 * hardware integrity bits.
-		 * For negative match, check hardware l4_checksum_ok bit only,
-		 * because hardware sets that bit to 0 for all packets
-		 * with bad L4.
+		 * PMD supports positive integrity item semantics only.
 		 */
-		if (value->l4_ok) {
-			MLX5_SET(fte_match_set_lyr_2_4, headers_m, l4_ok, 1);
-			MLX5_SET(fte_match_set_lyr_2_4, headers_v, l4_ok, 1);
-		}
-		MLX5_SET(fte_match_set_lyr_2_4, headers_m, l4_checksum_ok, 1);
-		MLX5_SET(fte_match_set_lyr_2_4, headers_v, l4_checksum_ok,
-			 !!value->l4_ok);
-	}
-	if (mask->l4_csum_ok) {
-		MLX5_SET(fte_match_set_lyr_2_4, headers_m, l4_checksum_ok, 1);
-		MLX5_SET(fte_match_set_lyr_2_4, headers_v, l4_checksum_ok,
-			 value->l4_csum_ok);
+		MLX5_SET(fte_match_set_lyr_2_4, headers, l4_ok, 1);
+		MLX5_SET(fte_match_set_lyr_2_4, headers, l4_checksum_ok, 1);
+	} else if (mask->l4_csum_ok) {
+		MLX5_SET(fte_match_set_lyr_2_4, headers, l4_checksum_ok, 1);
 	}
 }
 
 static void
 flow_dv_translate_integrity_l3(const struct rte_flow_item_integrity *mask,
-			       const struct rte_flow_item_integrity *value,
-			       void *headers_m, void *headers_v, bool is_ipv4)
+			       void *headers, bool is_ipv4)
 {
+	/*
+	 * In HWS mode MLX5_ITEM_UPDATE() macro assigns the same pointer to
+	 * both mask and value, therefore either can be used.
+	 * In SWS SW_V mode mask points to item mask and value points to item
+	 * spec. Integrity item value is used only if matching mask is set.
+	 * Use mask reference here to keep SWS functionality.
+	 */
 	if (mask->l3_ok) {
 		/* RTE l3_ok filter aggregates for IPv4 hardware l3_ok and
 		 * ipv4_csum_ok filters.
 		 * Positive RTE l3_ok match requires hardware match on both L3
 		 * hardware integrity bits.
-		 * For negative match, check hardware l3_csum_ok bit only,
-		 * because hardware sets that bit to 0 for all packets
-		 * with bad L3.
+		 * PMD supports positive integrity item semantics only.
 		 */
+		MLX5_SET(fte_match_set_lyr_2_4, headers, l3_ok, 1);
 		if (is_ipv4) {
-			if (value->l3_ok) {
-				MLX5_SET(fte_match_set_lyr_2_4, headers_m,
-					 l3_ok, 1);
-				MLX5_SET(fte_match_set_lyr_2_4, headers_v,
-					 l3_ok, 1);
-			}
-			MLX5_SET(fte_match_set_lyr_2_4, headers_m,
+			MLX5_SET(fte_match_set_lyr_2_4, headers,
 				 ipv4_checksum_ok, 1);
-			MLX5_SET(fte_match_set_lyr_2_4, headers_v,
-				 ipv4_checksum_ok, !!value->l3_ok);
-		} else {
-			MLX5_SET(fte_match_set_lyr_2_4, headers_m, l3_ok, 1);
-			MLX5_SET(fte_match_set_lyr_2_4, headers_v, l3_ok,
-				 value->l3_ok);
 		}
-	}
-	if (mask->ipv4_csum_ok) {
-		MLX5_SET(fte_match_set_lyr_2_4, headers_m, ipv4_checksum_ok, 1);
-		MLX5_SET(fte_match_set_lyr_2_4, headers_v, ipv4_checksum_ok,
-			 value->ipv4_csum_ok);
+	} else if (is_ipv4 && mask->ipv4_csum_ok) {
+		MLX5_SET(fte_match_set_lyr_2_4, headers, ipv4_checksum_ok, 1);
 	}
 }
 
 static void
-set_integrity_bits(void *headers_m, void *headers_v,
-		   const struct rte_flow_item *integrity_item, bool is_l3_ip4)
+set_integrity_bits(void *headers, const struct rte_flow_item *integrity_item,
+		   bool is_l3_ip4, uint32_t key_type)
 {
-	const struct rte_flow_item_integrity *spec = integrity_item->spec;
-	const struct rte_flow_item_integrity *mask = integrity_item->mask;
+	const struct rte_flow_item_integrity *spec;
+	const struct rte_flow_item_integrity *mask;
 
 	/* Integrity bits validation cleared spec pointer */
-	MLX5_ASSERT(spec != NULL);
-	if (!mask)
-		mask = &rte_flow_item_integrity_mask;
-	flow_dv_translate_integrity_l3(mask, spec, headers_m, headers_v,
-				       is_l3_ip4);
-	flow_dv_translate_integrity_l4(mask, spec, headers_m, headers_v);
+	if (MLX5_ITEM_VALID(integrity_item, key_type))
+		return;
+	MLX5_ITEM_UPDATE(integrity_item, key_type, spec, mask,
+			 &rte_flow_item_integrity_mask);
+	flow_dv_translate_integrity_l3(mask, headers, is_l3_ip4);
+	flow_dv_translate_integrity_l4(mask, headers);
 }
 
 static void
-flow_dv_translate_item_integrity_post(void *matcher, void *key,
+flow_dv_translate_item_integrity_post(void *key,
 				      const
 				      struct rte_flow_item *integrity_items[2],
-				      uint64_t pattern_flags)
+				      uint64_t pattern_flags, uint32_t key_type)
 {
-	void *headers_m, *headers_v;
+	void *headers;
 	bool is_l3_ip4;
 
 	if (pattern_flags & MLX5_FLOW_ITEM_INNER_INTEGRITY) {
-		headers_m = MLX5_ADDR_OF(fte_match_param, matcher,
-					 inner_headers);
-		headers_v = MLX5_ADDR_OF(fte_match_param, key, inner_headers);
+		headers = MLX5_ADDR_OF(fte_match_param, key, inner_headers);
 		is_l3_ip4 = (pattern_flags & MLX5_FLOW_LAYER_INNER_L3_IPV4) !=
 			    0;
-		set_integrity_bits(headers_m, headers_v,
-				   integrity_items[1], is_l3_ip4);
+		set_integrity_bits(headers, integrity_items[1], is_l3_ip4,
+				   key_type);
 	}
 	if (pattern_flags & MLX5_FLOW_ITEM_OUTER_INTEGRITY) {
-		headers_m = MLX5_ADDR_OF(fte_match_param, matcher,
-					 outer_headers);
-		headers_v = MLX5_ADDR_OF(fte_match_param, key, outer_headers);
+		headers = MLX5_ADDR_OF(fte_match_param, key, outer_headers);
 		is_l3_ip4 = (pattern_flags & MLX5_FLOW_LAYER_OUTER_L3_IPV4) !=
 			    0;
-		set_integrity_bits(headers_m, headers_v,
-				   integrity_items[0], is_l3_ip4);
+		set_integrity_bits(headers, integrity_items[0], is_l3_ip4,
+				   key_type);
 	}
 }
 
-static void
+static uint64_t
 flow_dv_translate_item_integrity(const struct rte_flow_item *item,
-				 const struct rte_flow_item *integrity_items[2],
-				 uint64_t *last_item)
+				 struct mlx5_dv_matcher_workspace *wks,
+				 uint64_t key_type)
 {
-	const struct rte_flow_item_integrity *spec = (typeof(spec))item->spec;
+	if ((key_type & MLX5_SET_MATCHER_SW) != 0) {
+		const struct rte_flow_item_integrity
+			*spec = (typeof(spec))item->spec;
 
-	/* integrity bits validation cleared spec pointer */
-	MLX5_ASSERT(spec != NULL);
-	if (spec->level > 1) {
-		integrity_items[1] = item;
-		*last_item |= MLX5_FLOW_ITEM_INNER_INTEGRITY;
+		/* SWS integrity bits validation cleared spec pointer */
+		if (spec->level > 1) {
+			wks->integrity_items[1] = item;
+			wks->last_item |= MLX5_FLOW_ITEM_INNER_INTEGRITY;
+		} else {
+			wks->integrity_items[0] = item;
+			wks->last_item |= MLX5_FLOW_ITEM_OUTER_INTEGRITY;
+		}
 	} else {
-		integrity_items[0] = item;
-		*last_item |= MLX5_FLOW_ITEM_OUTER_INTEGRITY;
+		/* HWS supports outer integrity only */
+		wks->integrity_items[0] = item;
+		wks->last_item |= MLX5_FLOW_ITEM_OUTER_INTEGRITY;
 	}
+	return wks->last_item;
 }
 
 /**
@@ -13448,6 +13437,10 @@ flow_dv_translate_items(struct rte_eth_dev *dev,
 		flow_dv_translate_item_meter_color(dev, key, items, key_type);
 		last_item = MLX5_FLOW_ITEM_METER_COLOR;
 		break;
+	case RTE_FLOW_ITEM_TYPE_INTEGRITY:
+		last_item = flow_dv_translate_item_integrity(items,
+							     wks, key_type);
+		break;
 	default:
 		break;
 	}
@@ -13511,6 +13504,12 @@ flow_dv_translate_items_hws(const struct rte_flow_item *items,
 		if (ret)
 			return ret;
 	}
+	if (wks.item_flags & MLX5_FLOW_ITEM_INTEGRITY) {
+		flow_dv_translate_item_integrity_post(key,
+						      wks.integrity_items,
+						      wks.item_flags,
+						      key_type);
+	}
 	if (wks.item_flags & MLX5_FLOW_LAYER_VXLAN_GPE) {
 		flow_dv_translate_item_vxlan_gpe(key,
 						 wks.tunnel_item,
@@ -13591,7 +13590,6 @@ flow_dv_translate_items_sws(struct rte_eth_dev *dev,
 			     mlx5_flow_get_thread_workspace())->rss_desc,
 	};
 	struct mlx5_dv_matcher_workspace wks_m = wks;
-	const struct rte_flow_item *integrity_items[2] = {NULL, NULL};
 	int ret = 0;
 	int tunnel;
 
@@ -13602,10 +13600,6 @@ flow_dv_translate_items_sws(struct rte_eth_dev *dev,
 						  NULL, "item not supported");
 		tunnel = !!(wks.item_flags & MLX5_FLOW_LAYER_TUNNEL);
 		switch (items->type) {
-		case RTE_FLOW_ITEM_TYPE_INTEGRITY:
-			flow_dv_translate_item_integrity(items, integrity_items,
-							 &wks.last_item);
-			break;
 		case RTE_FLOW_ITEM_TYPE_CONNTRACK:
 			flow_dv_translate_item_aso_ct(dev, match_mask,
 						      match_value, items);
@@ -13648,9 +13642,14 @@ flow_dv_translate_items_sws(struct rte_eth_dev *dev,
 			return -rte_errno;
 	}
 	if (wks.item_flags & MLX5_FLOW_ITEM_INTEGRITY) {
-		flow_dv_translate_item_integrity_post(match_mask, match_value,
-						      integrity_items,
-						      wks.item_flags);
+		flow_dv_translate_item_integrity_post(match_mask,
+						      wks_m.integrity_items,
+						      wks_m.item_flags,
+						      MLX5_SET_MATCHER_SW_M);
+		flow_dv_translate_item_integrity_post(match_value,
+						      wks.integrity_items,
+						      wks.item_flags,
+						      MLX5_SET_MATCHER_SW_V);
 	}
 	if (wks.item_flags & MLX5_FLOW_LAYER_VXLAN_GPE) {
 		flow_dv_translate_item_vxlan_gpe(match_mask,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 59c5383553..07b58db044 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -4658,6 +4658,14 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ITEM_TYPE_ICMP6:
 		case RTE_FLOW_ITEM_TYPE_CONNTRACK:
 			break;
+		case RTE_FLOW_ITEM_TYPE_INTEGRITY:
+			/*
+			 * Integrity flow item validation requires access to
+			 * both item mask and spec.
+			 * Current HWS model allows item mask in pattern
+			 * template and item spec in flow rule.
+			 */
+			break;
 		case RTE_FLOW_ITEM_TYPE_END:
 			items_end = true;
 			break;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v6 16/18] net/mlx5: support device control for E-Switch default rule
  2022-10-20 15:41 ` [PATCH v6 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (14 preceding siblings ...)
  2022-10-20 15:41   ` [PATCH v6 15/18] net/mlx5: support flow integrity in HWS group 0 Suanming Mou
@ 2022-10-20 15:41   ` Suanming Mou
  2022-10-24  9:47     ` Slava Ovsiienko
  2022-10-20 15:41   ` [PATCH v6 17/18] net/mlx5: support device control of representor matching Suanming Mou
                     ` (2 subsequent siblings)
  18 siblings, 1 reply; 140+ messages in thread
From: Suanming Mou @ 2022-10-20 15:41 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko, Ray Kinsella
  Cc: dev, rasland, orika, Dariusz Sosnowski, Xueming Li

From: Dariusz Sosnowski <dsosnowski@nvidia.com>

This patch adds support for the fdb_def_rule_en device argument to HW
Steering, which controls:

- creation of the default FDB jump flow rule,
- ability of the user to create transfer flow rules in the root table.
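
As an illustration only (not part of the patch), the default FDB jump rule
could be disabled through devargs when starting an application with HW
steering; the PCI address and representor list below are placeholders:

    dpdk-testpmd -a 0000:08:00.0,representor=vf[0-1],dv_flow_en=2,fdb_def_rule_en=0 -- -i

With fdb_def_rule_en=0 the PMD skips the default FDB jump flow, which is
intended to let the application create transfer flow rules directly in the
root table.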

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
 doc/guides/nics/features/mlx5.ini |   1 +
 drivers/net/mlx5/linux/mlx5_os.c  |  14 ++
 drivers/net/mlx5/mlx5.h           |   4 +-
 drivers/net/mlx5/mlx5_flow.c      |  20 +--
 drivers/net/mlx5/mlx5_flow.h      |   5 +-
 drivers/net/mlx5/mlx5_flow_dv.c   |  62 ++++---
 drivers/net/mlx5/mlx5_flow_hw.c   | 273 +++++++++++++++---------------
 drivers/net/mlx5/mlx5_trigger.c   |  31 ++--
 drivers/net/mlx5/mlx5_tx.h        |   1 +
 drivers/net/mlx5/mlx5_txq.c       |  47 +++++
 drivers/net/mlx5/rte_pmd_mlx5.h   |  17 ++
 drivers/net/mlx5/version.map      |   1 +
 12 files changed, 288 insertions(+), 188 deletions(-)

diff --git a/doc/guides/nics/features/mlx5.ini b/doc/guides/nics/features/mlx5.ini
index de4b109c31..0ac0fa9663 100644
--- a/doc/guides/nics/features/mlx5.ini
+++ b/doc/guides/nics/features/mlx5.ini
@@ -85,6 +85,7 @@ vxlan                = Y
 vxlan_gpe            = Y
 represented_port     = Y
 meter_color          = Y
+port_representor     = Y
 
 [rte_flow actions]
 age                  = I
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 5f1fd9b4e7..a6cb802500 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1567,6 +1567,20 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	rte_rwlock_init(&priv->ind_tbls_lock);
 	if (priv->sh->config.dv_flow_en == 2) {
 #ifdef HAVE_MLX5_HWS_SUPPORT
+		if (priv->sh->config.dv_esw_en) {
+			if (priv->sh->dv_regc0_mask == UINT32_MAX) {
+				DRV_LOG(ERR, "E-Switch port metadata is required when using HWS "
+					     "but it is disabled (configure it through devlink)");
+				err = ENOTSUP;
+				goto error;
+			}
+			if (priv->sh->dv_regc0_mask == 0) {
+				DRV_LOG(ERR, "E-Switch with HWS is not supported "
+					     "(no available bits in reg_c[0])");
+				err = ENOTSUP;
+				goto error;
+			}
+		}
 		if (priv->vport_meta_mask)
 			flow_hw_set_port_info(eth_dev);
 		if (priv->sh->config.dv_esw_en &&
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 42a1e206c0..a715df693e 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -2028,7 +2028,7 @@ int mlx5_flow_ops_get(struct rte_eth_dev *dev, const struct rte_flow_ops **ops);
 int mlx5_flow_start_default(struct rte_eth_dev *dev);
 void mlx5_flow_stop_default(struct rte_eth_dev *dev);
 int mlx5_flow_verify(struct rte_eth_dev *dev);
-int mlx5_ctrl_flow_source_queue(struct rte_eth_dev *dev, uint32_t queue);
+int mlx5_ctrl_flow_source_queue(struct rte_eth_dev *dev, uint32_t sq_num);
 int mlx5_ctrl_flow_vlan(struct rte_eth_dev *dev,
 			struct rte_flow_item_eth *eth_spec,
 			struct rte_flow_item_eth *eth_mask,
@@ -2040,7 +2040,7 @@ int mlx5_ctrl_flow(struct rte_eth_dev *dev,
 int mlx5_flow_lacp_miss(struct rte_eth_dev *dev);
 struct rte_flow *mlx5_flow_create_esw_table_zero_flow(struct rte_eth_dev *dev);
 uint32_t mlx5_flow_create_devx_sq_miss_flow(struct rte_eth_dev *dev,
-					    uint32_t txq);
+					    uint32_t sq_num);
 void mlx5_flow_async_pool_query_handle(struct mlx5_dev_ctx_shared *sh,
 				       uint64_t async_id, int status);
 void mlx5_set_query_alarm(struct mlx5_dev_ctx_shared *sh);
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 9121b90b4e..01ad1f774b 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -7159,14 +7159,14 @@ mlx5_flow_create_esw_table_zero_flow(struct rte_eth_dev *dev)
  *
  * @param dev
  *   Pointer to Ethernet device.
- * @param txq
- *   Txq index.
+ * @param sq_num
+ *   SQ number.
  *
  * @return
  *   Flow ID on success, 0 otherwise and rte_errno is set.
  */
 uint32_t
-mlx5_flow_create_devx_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
+mlx5_flow_create_devx_sq_miss_flow(struct rte_eth_dev *dev, uint32_t sq_num)
 {
 	struct rte_flow_attr attr = {
 		.group = 0,
@@ -7178,8 +7178,8 @@ mlx5_flow_create_devx_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
 	struct rte_flow_item_port_id port_spec = {
 		.id = MLX5_PORT_ESW_MGR,
 	};
-	struct mlx5_rte_flow_item_sq txq_spec = {
-		.queue = txq,
+	struct mlx5_rte_flow_item_sq sq_spec = {
+		.queue = sq_num,
 	};
 	struct rte_flow_item pattern[] = {
 		{
@@ -7189,7 +7189,7 @@ mlx5_flow_create_devx_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
 		{
 			.type = (enum rte_flow_item_type)
 				MLX5_RTE_FLOW_ITEM_TYPE_SQ,
-			.spec = &txq_spec,
+			.spec = &sq_spec,
 		},
 		{
 			.type = RTE_FLOW_ITEM_TYPE_END,
@@ -7560,22 +7560,22 @@ mlx5_flow_verify(struct rte_eth_dev *dev __rte_unused)
  *
  * @param dev
  *   Pointer to Ethernet device.
- * @param queue
- *   The queue index.
+ * @param sq_num
+ *   The SQ hw number.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
 mlx5_ctrl_flow_source_queue(struct rte_eth_dev *dev,
-			    uint32_t queue)
+			    uint32_t sq_num)
 {
 	const struct rte_flow_attr attr = {
 		.egress = 1,
 		.priority = 0,
 	};
 	struct mlx5_rte_flow_item_sq queue_spec = {
-		.queue = queue,
+		.queue = sq_num,
 	};
 	struct mlx5_rte_flow_item_sq queue_mask = {
 		.queue = UINT32_MAX,
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 8ba3c2ddb1..1a4b33d592 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -116,7 +116,7 @@ struct mlx5_flow_action_copy_mreg {
 
 /* Matches on source queue. */
 struct mlx5_rte_flow_item_sq {
-	uint32_t queue;
+	uint32_t queue; /* DevX SQ number */
 };
 
 /* Feature name to allocate metadata register. */
@@ -2491,9 +2491,8 @@ int mlx5_flow_pick_transfer_proxy(struct rte_eth_dev *dev,
 
 int mlx5_flow_hw_flush_ctrl_flows(struct rte_eth_dev *dev);
 
-int mlx5_flow_hw_esw_create_mgr_sq_miss_flow(struct rte_eth_dev *dev);
 int mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev,
-					 uint32_t txq);
+					 uint32_t sqn);
 int mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev);
 int mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev);
 int mlx5_flow_actions_validate(struct rte_eth_dev *dev,
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 5c6ecc4a1a..dbe55a5103 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -10125,6 +10125,29 @@ flow_dv_translate_item_port_id(struct rte_eth_dev *dev, void *key,
 	return 0;
 }
 
+/**
+ * Translate port representor item to eswitch match on port id.
+ *
+ * @param[in] dev
+ *   The device to configure through.
+ * @param[in, out] key
+ *   Flow matcher value.
+ * @param[in] key_type
+ *   Set flow matcher mask or value.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise.
+ */
+static int
+flow_dv_translate_item_port_representor(struct rte_eth_dev *dev, void *key,
+					uint32_t key_type)
+{
+	flow_dv_translate_item_source_vport(key,
+			key_type & MLX5_SET_MATCHER_V ?
+			mlx5_flow_get_esw_manager_vport_id(dev) : 0xffff);
+	return 0;
+}
+
 /**
  * Translate represented port item to eswitch match on port id.
  *
@@ -11404,10 +11427,10 @@ flow_dv_translate_create_counter(struct rte_eth_dev *dev,
 }
 
 /**
- * Add Tx queue matcher
+ * Add SQ matcher
  *
- * @param[in] dev
- *   Pointer to the dev struct.
+ * @param[in, out] matcher
+ *   Flow matcher.
  * @param[in, out] key
  *   Flow matcher value.
  * @param[in] item
@@ -11416,40 +11439,29 @@ flow_dv_translate_create_counter(struct rte_eth_dev *dev,
  *   Set flow matcher mask or value.
  */
 static void
-flow_dv_translate_item_tx_queue(struct rte_eth_dev *dev,
-				void *key,
-				const struct rte_flow_item *item,
-				uint32_t key_type)
+flow_dv_translate_item_sq(void *key,
+			  const struct rte_flow_item *item,
+			  uint32_t key_type)
 {
 	const struct mlx5_rte_flow_item_sq *queue_m;
 	const struct mlx5_rte_flow_item_sq *queue_v;
 	const struct mlx5_rte_flow_item_sq queue_mask = {
 		.queue = UINT32_MAX,
 	};
-	void *misc_v =
-		MLX5_ADDR_OF(fte_match_param, key, misc_parameters);
-	struct mlx5_txq_ctrl *txq = NULL;
+	void *misc_v = MLX5_ADDR_OF(fte_match_param, key, misc_parameters);
 	uint32_t queue;
 
 	MLX5_ITEM_UPDATE(item, key_type, queue_v, queue_m, &queue_mask);
 	if (!queue_m || !queue_v)
 		return;
 	if (key_type & MLX5_SET_MATCHER_V) {
-		txq = mlx5_txq_get(dev, queue_v->queue);
-		if (!txq)
-			return;
-		if (txq->is_hairpin)
-			queue = txq->obj->sq->id;
-		else
-			queue = txq->obj->sq_obj.sq->id;
+		queue = queue_v->queue;
 		if (key_type == MLX5_SET_MATCHER_SW_V)
 			queue &= queue_m->queue;
 	} else {
 		queue = queue_m->queue;
 	}
 	MLX5_SET(fte_match_set_misc, misc_v, source_sqn, queue);
-	if (txq)
-		mlx5_txq_release(dev, queue_v->queue);
 }
 
 /**
@@ -13195,6 +13207,11 @@ flow_dv_translate_items(struct rte_eth_dev *dev,
 			(dev, key, items, wks->attr, key_type);
 		last_item = MLX5_FLOW_ITEM_PORT_ID;
 		break;
+	case RTE_FLOW_ITEM_TYPE_PORT_REPRESENTOR:
+		flow_dv_translate_item_port_representor
+			(dev, key, key_type);
+		last_item = MLX5_FLOW_ITEM_PORT_REPRESENTOR;
+		break;
 	case RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT:
 		flow_dv_translate_item_represented_port
 			(dev, key, items, wks->attr, key_type);
@@ -13401,7 +13418,7 @@ flow_dv_translate_items(struct rte_eth_dev *dev,
 		last_item = MLX5_FLOW_ITEM_TAG;
 		break;
 	case MLX5_RTE_FLOW_ITEM_TYPE_SQ:
-		flow_dv_translate_item_tx_queue(dev, key, items, key_type);
+		flow_dv_translate_item_sq(key, items, key_type);
 		last_item = MLX5_FLOW_ITEM_SQ;
 		break;
 	case RTE_FLOW_ITEM_TYPE_GTP:
@@ -13611,7 +13628,6 @@ flow_dv_translate_items_sws(struct rte_eth_dev *dev,
 			wks.last_item = tunnel ? MLX5_FLOW_ITEM_INNER_FLEX :
 						 MLX5_FLOW_ITEM_OUTER_FLEX;
 			break;
-
 		default:
 			ret = flow_dv_translate_items(dev, items, &wks_m,
 				match_mask, MLX5_SET_MATCHER_SW_M, error);
@@ -13634,7 +13650,9 @@ flow_dv_translate_items_sws(struct rte_eth_dev *dev,
 	 * in use.
 	 */
 	if (!(wks.item_flags & MLX5_FLOW_ITEM_PORT_ID) &&
-	    !(wks.item_flags & MLX5_FLOW_ITEM_REPRESENTED_PORT) && priv->sh->esw_mode &&
+	    !(wks.item_flags & MLX5_FLOW_ITEM_REPRESENTED_PORT) &&
+	    !(wks.item_flags & MLX5_FLOW_ITEM_PORT_REPRESENTOR) &&
+	    priv->sh->esw_mode &&
 	    !(attr->egress && !attr->transfer) &&
 	    attr->group != MLX5_FLOW_MREG_CP_TABLE_GROUP) {
 		if (flow_dv_translate_item_port_id_all(dev, match_mask,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 07b58db044..1516ee9e25 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -3176,7 +3176,10 @@ flow_hw_translate_group(struct rte_eth_dev *dev,
 	struct mlx5_priv *priv = dev->data->dev_private;
 	const struct rte_flow_attr *flow_attr = &cfg->attr.flow_attr;
 
-	if (priv->sh->config.dv_esw_en && cfg->external && flow_attr->transfer) {
+	if (priv->sh->config.dv_esw_en &&
+	    priv->fdb_def_rule &&
+	    cfg->external &&
+	    flow_attr->transfer) {
 		if (group > MLX5_HW_MAX_TRANSFER_GROUP)
 			return rte_flow_error_set(error, EINVAL,
 						  RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
@@ -5140,14 +5143,23 @@ flow_hw_free_vport_actions(struct mlx5_priv *priv)
 }
 
 static uint32_t
-flow_hw_usable_lsb_vport_mask(struct mlx5_priv *priv)
+flow_hw_esw_mgr_regc_marker_mask(struct rte_eth_dev *dev)
 {
-	uint32_t usable_mask = ~priv->vport_meta_mask;
+	uint32_t mask = MLX5_SH(dev)->dv_regc0_mask;
 
-	if (usable_mask)
-		return (1 << rte_bsf32(usable_mask));
-	else
-		return 0;
+	/* Mask is verified during device initialization. */
+	MLX5_ASSERT(mask != 0);
+	return mask;
+}
+
+static uint32_t
+flow_hw_esw_mgr_regc_marker(struct rte_eth_dev *dev)
+{
+	uint32_t mask = MLX5_SH(dev)->dv_regc0_mask;
+
+	/* Mask is verified during device initialization. */
+	MLX5_ASSERT(mask != 0);
+	return RTE_BIT32(rte_bsf32(mask));
 }
 
 /**
@@ -5173,12 +5185,19 @@ flow_hw_create_ctrl_esw_mgr_pattern_template(struct rte_eth_dev *dev)
 	struct rte_flow_item_ethdev port_mask = {
 		.port_id = UINT16_MAX,
 	};
+	struct mlx5_rte_flow_item_sq sq_mask = {
+		.queue = UINT32_MAX,
+	};
 	struct rte_flow_item items[] = {
 		{
 			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
 			.spec = &port_spec,
 			.mask = &port_mask,
 		},
+		{
+			.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_SQ,
+			.mask = &sq_mask,
+		},
 		{
 			.type = RTE_FLOW_ITEM_TYPE_END,
 		},
@@ -5188,9 +5207,10 @@ flow_hw_create_ctrl_esw_mgr_pattern_template(struct rte_eth_dev *dev)
 }
 
 /**
- * Creates a flow pattern template used to match REG_C_0 and a TX queue.
- * Matching on REG_C_0 is set up to match on least significant bit usable
- * by user-space, which is set when packet was originated from E-Switch Manager.
+ * Creates a flow pattern template used to match REG_C_0 and a SQ.
+ * Matching on REG_C_0 is set up to match on all bits usable by user-space.
+ * If traffic was sent from E-Switch Manager, then all usable bits will be set to 0,
+ * except the least significant bit, which will be set to 1.
  *
  * This template is used to set up a table for SQ miss default flow.
  *
@@ -5203,8 +5223,6 @@ flow_hw_create_ctrl_esw_mgr_pattern_template(struct rte_eth_dev *dev)
 static struct rte_flow_pattern_template *
 flow_hw_create_ctrl_regc_sq_pattern_template(struct rte_eth_dev *dev)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
-	uint32_t marker_bit = flow_hw_usable_lsb_vport_mask(priv);
 	struct rte_flow_pattern_template_attr attr = {
 		.relaxed_matching = 0,
 		.transfer = 1,
@@ -5214,6 +5232,7 @@ flow_hw_create_ctrl_regc_sq_pattern_template(struct rte_eth_dev *dev)
 	};
 	struct rte_flow_item_tag reg_c0_mask = {
 		.index = 0xff,
+		.data = flow_hw_esw_mgr_regc_marker_mask(dev),
 	};
 	struct mlx5_rte_flow_item_sq queue_mask = {
 		.queue = UINT32_MAX,
@@ -5235,12 +5254,6 @@ flow_hw_create_ctrl_regc_sq_pattern_template(struct rte_eth_dev *dev)
 		},
 	};
 
-	if (!marker_bit) {
-		DRV_LOG(ERR, "Unable to set up pattern template for SQ miss table");
-		return NULL;
-	}
-	reg_c0_spec.data = marker_bit;
-	reg_c0_mask.data = marker_bit;
 	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
 }
 
@@ -5332,9 +5345,8 @@ flow_hw_create_tx_default_mreg_copy_pattern_template(struct rte_eth_dev *dev)
 static struct rte_flow_actions_template *
 flow_hw_create_ctrl_regc_jump_actions_template(struct rte_eth_dev *dev)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
-	uint32_t marker_bit = flow_hw_usable_lsb_vport_mask(priv);
-	uint32_t marker_bit_mask = UINT32_MAX;
+	uint32_t marker_mask = flow_hw_esw_mgr_regc_marker_mask(dev);
+	uint32_t marker_bits = flow_hw_esw_mgr_regc_marker(dev);
 	struct rte_flow_actions_template_attr attr = {
 		.transfer = 1,
 	};
@@ -5347,7 +5359,7 @@ flow_hw_create_ctrl_regc_jump_actions_template(struct rte_eth_dev *dev)
 		.src = {
 			.field = RTE_FLOW_FIELD_VALUE,
 		},
-		.width = 1,
+		.width = __builtin_popcount(marker_mask),
 	};
 	struct rte_flow_action_modify_field set_reg_m = {
 		.operation = RTE_FLOW_MODIFY_SET,
@@ -5394,13 +5406,9 @@ flow_hw_create_ctrl_regc_jump_actions_template(struct rte_eth_dev *dev)
 		}
 	};
 
-	if (!marker_bit) {
-		DRV_LOG(ERR, "Unable to set up actions template for SQ miss table");
-		return NULL;
-	}
-	set_reg_v.dst.offset = rte_bsf32(marker_bit);
-	rte_memcpy(set_reg_v.src.value, &marker_bit, sizeof(marker_bit));
-	rte_memcpy(set_reg_m.src.value, &marker_bit_mask, sizeof(marker_bit_mask));
+	set_reg_v.dst.offset = rte_bsf32(marker_mask);
+	rte_memcpy(set_reg_v.src.value, &marker_bits, sizeof(marker_bits));
+	rte_memcpy(set_reg_m.src.value, &marker_mask, sizeof(marker_mask));
 	return flow_hw_actions_template_create(dev, &attr, actions_v, actions_m, NULL);
 }
 
@@ -5587,7 +5595,7 @@ flow_hw_create_ctrl_sq_miss_root_table(struct rte_eth_dev *dev,
 	struct rte_flow_template_table_attr attr = {
 		.flow_attr = {
 			.group = 0,
-			.priority = 0,
+			.priority = MLX5_HW_LOWEST_PRIO_ROOT,
 			.ingress = 0,
 			.egress = 0,
 			.transfer = 1,
@@ -5702,7 +5710,7 @@ flow_hw_create_ctrl_jump_table(struct rte_eth_dev *dev,
 	struct rte_flow_template_table_attr attr = {
 		.flow_attr = {
 			.group = 0,
-			.priority = MLX5_HW_LOWEST_PRIO_ROOT,
+			.priority = 0,
 			.ingress = 0,
 			.egress = 0,
 			.transfer = 1,
@@ -7800,141 +7808,123 @@ flow_hw_flush_all_ctrl_flows(struct rte_eth_dev *dev)
 }
 
 int
-mlx5_flow_hw_esw_create_mgr_sq_miss_flow(struct rte_eth_dev *dev)
+mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev, uint32_t sqn)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
-	struct rte_flow_item_ethdev port_spec = {
+	uint16_t port_id = dev->data->port_id;
+	struct rte_flow_item_ethdev esw_mgr_spec = {
 		.port_id = MLX5_REPRESENTED_PORT_ESW_MGR,
 	};
-	struct rte_flow_item_ethdev port_mask = {
+	struct rte_flow_item_ethdev esw_mgr_mask = {
 		.port_id = MLX5_REPRESENTED_PORT_ESW_MGR,
 	};
-	struct rte_flow_item items[] = {
-		{
-			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
-			.spec = &port_spec,
-			.mask = &port_mask,
-		},
-		{
-			.type = RTE_FLOW_ITEM_TYPE_END,
-		},
-	};
-	struct rte_flow_action_modify_field modify_field = {
-		.operation = RTE_FLOW_MODIFY_SET,
-		.dst = {
-			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
-		},
-		.src = {
-			.field = RTE_FLOW_FIELD_VALUE,
-		},
-		.width = 1,
-	};
-	struct rte_flow_action_jump jump = {
-		.group = 1,
-	};
-	struct rte_flow_action actions[] = {
-		{
-			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
-			.conf = &modify_field,
-		},
-		{
-			.type = RTE_FLOW_ACTION_TYPE_JUMP,
-			.conf = &jump,
-		},
-		{
-			.type = RTE_FLOW_ACTION_TYPE_END,
-		},
-	};
-
-	MLX5_ASSERT(priv->master);
-	if (!priv->dr_ctx ||
-	    !priv->hw_esw_sq_miss_root_tbl)
-		return 0;
-	return flow_hw_create_ctrl_flow(dev, dev,
-					priv->hw_esw_sq_miss_root_tbl,
-					items, 0, actions, 0);
-}
-
-int
-mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev, uint32_t txq)
-{
-	uint16_t port_id = dev->data->port_id;
 	struct rte_flow_item_tag reg_c0_spec = {
 		.index = (uint8_t)REG_C_0,
+		.data = flow_hw_esw_mgr_regc_marker(dev),
 	};
 	struct rte_flow_item_tag reg_c0_mask = {
 		.index = 0xff,
+		.data = flow_hw_esw_mgr_regc_marker_mask(dev),
 	};
-	struct mlx5_rte_flow_item_sq queue_spec = {
-		.queue = txq,
-	};
-	struct mlx5_rte_flow_item_sq queue_mask = {
-		.queue = UINT32_MAX,
-	};
-	struct rte_flow_item items[] = {
-		{
-			.type = (enum rte_flow_item_type)
-				MLX5_RTE_FLOW_ITEM_TYPE_TAG,
-			.spec = &reg_c0_spec,
-			.mask = &reg_c0_mask,
-		},
-		{
-			.type = (enum rte_flow_item_type)
-				MLX5_RTE_FLOW_ITEM_TYPE_SQ,
-			.spec = &queue_spec,
-			.mask = &queue_mask,
-		},
-		{
-			.type = RTE_FLOW_ITEM_TYPE_END,
-		},
+	struct mlx5_rte_flow_item_sq sq_spec = {
+		.queue = sqn,
 	};
 	struct rte_flow_action_ethdev port = {
 		.port_id = port_id,
 	};
-	struct rte_flow_action actions[] = {
-		{
-			.type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT,
-			.conf = &port,
-		},
-		{
-			.type = RTE_FLOW_ACTION_TYPE_END,
-		},
-	};
+	struct rte_flow_item items[3] = { { 0 } };
+	struct rte_flow_action actions[3] = { { 0 } };
 	struct rte_eth_dev *proxy_dev;
 	struct mlx5_priv *proxy_priv;
 	uint16_t proxy_port_id = dev->data->port_id;
-	uint32_t marker_bit;
 	int ret;
 
-	RTE_SET_USED(txq);
 	ret = rte_flow_pick_transfer_proxy(port_id, &proxy_port_id, NULL);
 	if (ret) {
-		DRV_LOG(ERR, "Unable to pick proxy port for port %u", port_id);
+		DRV_LOG(ERR, "Unable to pick transfer proxy port for port %u. Transfer proxy "
+			     "port must be present to create default SQ miss flows.",
+			     port_id);
 		return ret;
 	}
 	proxy_dev = &rte_eth_devices[proxy_port_id];
 	proxy_priv = proxy_dev->data->dev_private;
-	if (!proxy_priv->dr_ctx)
+	if (!proxy_priv->dr_ctx) {
+		DRV_LOG(DEBUG, "Transfer proxy port (port %u) of port %u must be configured "
+			       "for HWS to create default SQ miss flows. Default flows will "
+			       "not be created.",
+			       proxy_port_id, port_id);
 		return 0;
+	}
 	if (!proxy_priv->hw_esw_sq_miss_root_tbl ||
 	    !proxy_priv->hw_esw_sq_miss_tbl) {
-		DRV_LOG(ERR, "port %u proxy port %u was configured but default"
-			" flow tables are not created",
-			port_id, proxy_port_id);
+		DRV_LOG(ERR, "Transfer proxy port (port %u) of port %u was configured, but "
+			     "default flow tables were not created.",
+			     proxy_port_id, port_id);
 		rte_errno = ENOMEM;
 		return -rte_errno;
 	}
-	marker_bit = flow_hw_usable_lsb_vport_mask(proxy_priv);
-	if (!marker_bit) {
-		DRV_LOG(ERR, "Unable to set up control flow in SQ miss table");
-		rte_errno = EINVAL;
-		return -rte_errno;
+	/*
+	 * Create a root SQ miss flow rule - match E-Switch Manager and SQ,
+	 * and jump to group 1.
+	 */
+	items[0] = (struct rte_flow_item){
+		.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+		.spec = &esw_mgr_spec,
+		.mask = &esw_mgr_mask,
+	};
+	items[1] = (struct rte_flow_item){
+		.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_SQ,
+		.spec = &sq_spec,
+	};
+	items[2] = (struct rte_flow_item){
+		.type = RTE_FLOW_ITEM_TYPE_END,
+	};
+	actions[0] = (struct rte_flow_action){
+		.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+	};
+	actions[1] = (struct rte_flow_action){
+		.type = RTE_FLOW_ACTION_TYPE_JUMP,
+	};
+	actions[2] = (struct rte_flow_action) {
+		.type = RTE_FLOW_ACTION_TYPE_END,
+	};
+	ret = flow_hw_create_ctrl_flow(dev, proxy_dev, proxy_priv->hw_esw_sq_miss_root_tbl,
+				       items, 0, actions, 0);
+	if (ret) {
+		DRV_LOG(ERR, "Port %u failed to create root SQ miss flow rule for SQ %u, ret %d",
+			port_id, sqn, ret);
+		return ret;
 	}
-	reg_c0_spec.data = marker_bit;
-	reg_c0_mask.data = marker_bit;
-	return flow_hw_create_ctrl_flow(dev, proxy_dev,
-					proxy_priv->hw_esw_sq_miss_tbl,
-					items, 0, actions, 0);
+	/*
+	 * Create a non-root SQ miss flow rule - match REG_C_0 marker and SQ,
+	 * and forward to port.
+	 */
+	items[0] = (struct rte_flow_item){
+		.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+		.spec = &reg_c0_spec,
+		.mask = &reg_c0_mask,
+	};
+	items[1] = (struct rte_flow_item){
+		.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_SQ,
+		.spec = &sq_spec,
+	};
+	items[2] = (struct rte_flow_item){
+		.type = RTE_FLOW_ITEM_TYPE_END,
+	};
+	actions[0] = (struct rte_flow_action){
+		.type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT,
+		.conf = &port,
+	};
+	actions[1] = (struct rte_flow_action){
+		.type = RTE_FLOW_ACTION_TYPE_END,
+	};
+	ret = flow_hw_create_ctrl_flow(dev, proxy_dev, proxy_priv->hw_esw_sq_miss_tbl,
+				       items, 0, actions, 0);
+	if (ret) {
+		DRV_LOG(ERR, "Port %u failed to create HWS SQ miss flow rule for SQ %u, ret %d",
+			port_id, sqn, ret);
+		return ret;
+	}
+	return 0;
 }
 
 int
@@ -7972,17 +7962,24 @@ mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev)
 
 	ret = rte_flow_pick_transfer_proxy(port_id, &proxy_port_id, NULL);
 	if (ret) {
-		DRV_LOG(ERR, "Unable to pick proxy port for port %u", port_id);
+		DRV_LOG(ERR, "Unable to pick transfer proxy port for port %u. Transfer proxy "
+			     "port must be present to create default FDB jump rule.",
+			     port_id);
 		return ret;
 	}
 	proxy_dev = &rte_eth_devices[proxy_port_id];
 	proxy_priv = proxy_dev->data->dev_private;
-	if (!proxy_priv->dr_ctx)
+	if (!proxy_priv->dr_ctx) {
+		DRV_LOG(DEBUG, "Transfer proxy port (port %u) of port %u must be configured "
+			       "for HWS to create default FDB jump rule. Default rule will "
+			       "not be created.",
+			       proxy_port_id, port_id);
 		return 0;
+	}
 	if (!proxy_priv->hw_esw_zero_tbl) {
-		DRV_LOG(ERR, "port %u proxy port %u was configured but default"
-			" flow tables are not created",
-			port_id, proxy_port_id);
+		DRV_LOG(ERR, "Transfer proxy port (port %u) of port %u was configured, but "
+			     "default flow tables were not created.",
+			     proxy_port_id, port_id);
 		rte_errno = EINVAL;
 		return -rte_errno;
 	}
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index c260c81e57..715f2891cf 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -426,7 +426,7 @@ mlx5_hairpin_queue_peer_update(struct rte_eth_dev *dev, uint16_t peer_queue,
 			mlx5_txq_release(dev, peer_queue);
 			return -rte_errno;
 		}
-		peer_info->qp_id = txq_ctrl->obj->sq->id;
+		peer_info->qp_id = mlx5_txq_get_sqn(txq_ctrl);
 		peer_info->vhca_id = priv->sh->cdev->config.hca_attr.vhca_id;
 		/* 1-to-1 mapping, only the first one is used. */
 		peer_info->peer_q = txq_ctrl->hairpin_conf.peers[0].queue;
@@ -818,7 +818,7 @@ mlx5_hairpin_bind_single_port(struct rte_eth_dev *dev, uint16_t rx_port)
 		}
 		/* Pass TxQ's information to peer RxQ and try binding. */
 		cur.peer_q = rx_queue;
-		cur.qp_id = txq_ctrl->obj->sq->id;
+		cur.qp_id = mlx5_txq_get_sqn(txq_ctrl);
 		cur.vhca_id = priv->sh->cdev->config.hca_attr.vhca_id;
 		cur.tx_explicit = txq_ctrl->hairpin_conf.tx_explicit;
 		cur.manual_bind = txq_ctrl->hairpin_conf.manual_bind;
@@ -1300,8 +1300,6 @@ mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
 	int ret;
 
 	if (priv->sh->config.dv_esw_en && priv->master) {
-		if (mlx5_flow_hw_esw_create_mgr_sq_miss_flow(dev))
-			goto error;
 		if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS)
 			if (mlx5_flow_hw_create_tx_default_mreg_copy_flow(dev))
 				goto error;
@@ -1312,10 +1310,7 @@ mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
 
 		if (!txq)
 			continue;
-		if (txq->is_hairpin)
-			queue = txq->obj->sq->id;
-		else
-			queue = txq->obj->sq_obj.sq->id;
+		queue = mlx5_txq_get_sqn(txq);
 		if ((priv->representor || priv->master) &&
 		    priv->sh->config.dv_esw_en) {
 			if (mlx5_flow_hw_esw_create_sq_miss_flow(dev, queue)) {
@@ -1325,9 +1320,15 @@ mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
 		}
 		mlx5_txq_release(dev, i);
 	}
-	if ((priv->master || priv->representor) && priv->sh->config.dv_esw_en) {
-		if (mlx5_flow_hw_esw_create_default_jump_flow(dev))
-			goto error;
+	if (priv->sh->config.fdb_def_rule) {
+		if ((priv->master || priv->representor) && priv->sh->config.dv_esw_en) {
+			if (!mlx5_flow_hw_esw_create_default_jump_flow(dev))
+				priv->fdb_def_rule = 1;
+			else
+				goto error;
+		}
+	} else {
+		DRV_LOG(INFO, "port %u FDB default rule is disabled", dev->data->port_id);
 	}
 	return 0;
 error:
@@ -1393,14 +1394,18 @@ mlx5_traffic_enable(struct rte_eth_dev *dev)
 		    txq_ctrl->hairpin_conf.tx_explicit == 0 &&
 		    txq_ctrl->hairpin_conf.peers[0].port ==
 		    priv->dev_data->port_id) {
-			ret = mlx5_ctrl_flow_source_queue(dev, i);
+			ret = mlx5_ctrl_flow_source_queue(dev,
+					mlx5_txq_get_sqn(txq_ctrl));
 			if (ret) {
 				mlx5_txq_release(dev, i);
 				goto error;
 			}
 		}
 		if (priv->sh->config.dv_esw_en) {
-			if (mlx5_flow_create_devx_sq_miss_flow(dev, i) == 0) {
+			uint32_t q = mlx5_txq_get_sqn(txq_ctrl);
+
+			if (mlx5_flow_create_devx_sq_miss_flow(dev, q) == 0) {
+				mlx5_txq_release(dev, i);
 				DRV_LOG(ERR,
 					"Port %u Tx queue %u SQ create representor devx default miss rule failed.",
 					dev->data->port_id, i);
diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h
index e0fc1872fe..6471ebf59f 100644
--- a/drivers/net/mlx5/mlx5_tx.h
+++ b/drivers/net/mlx5/mlx5_tx.h
@@ -213,6 +213,7 @@ struct mlx5_txq_ctrl *mlx5_txq_get(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_release(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_releasable(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_verify(struct rte_eth_dev *dev);
+int mlx5_txq_get_sqn(struct mlx5_txq_ctrl *txq);
 void txq_alloc_elts(struct mlx5_txq_ctrl *txq_ctrl);
 void txq_free_elts(struct mlx5_txq_ctrl *txq_ctrl);
 uint64_t mlx5_get_tx_port_offloads(struct rte_eth_dev *dev);
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 9150ced72d..5543f2c570 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -27,6 +27,8 @@
 #include "mlx5_tx.h"
 #include "mlx5_rxtx.h"
 #include "mlx5_autoconf.h"
+#include "rte_pmd_mlx5.h"
+#include "mlx5_flow.h"
 
 /**
  * Allocate TX queue elements.
@@ -1274,6 +1276,51 @@ mlx5_txq_verify(struct rte_eth_dev *dev)
 	return ret;
 }
 
+int
+mlx5_txq_get_sqn(struct mlx5_txq_ctrl *txq)
+{
+	return txq->is_hairpin ? txq->obj->sq->id : txq->obj->sq_obj.sq->id;
+}
+
+int
+rte_pmd_mlx5_external_sq_enable(uint16_t port_id, uint32_t sq_num)
+{
+	struct rte_eth_dev *dev;
+	struct mlx5_priv *priv;
+	uint32_t flow;
+
+	if (rte_eth_dev_is_valid_port(port_id) < 0) {
+		DRV_LOG(ERR, "There is no Ethernet device for port %u.",
+			port_id);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+	dev = &rte_eth_devices[port_id];
+	priv = dev->data->dev_private;
+	if ((!priv->representor && !priv->master) ||
+	    !priv->sh->config.dv_esw_en) {
+		DRV_LOG(ERR, "Port %u must be representor or master port in E-Switch mode.",
+			port_id);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	if (sq_num == 0) {
+		DRV_LOG(ERR, "Invalid SQ number.");
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+#ifdef HAVE_MLX5_HWS_SUPPORT
+	if (priv->sh->config.dv_flow_en == 2)
+		return mlx5_flow_hw_esw_create_sq_miss_flow(dev, sq_num);
+#endif
+	flow = mlx5_flow_create_devx_sq_miss_flow(dev, sq_num);
+	if (flow > 0)
+		return 0;
+	DRV_LOG(ERR, "Port %u failed to create default miss flow for SQ %u.",
+		port_id, sq_num);
+	return -rte_errno;
+}
+
 /**
  * Set the Tx queue dynamic timestamp (mask and offset)
  *
diff --git a/drivers/net/mlx5/rte_pmd_mlx5.h b/drivers/net/mlx5/rte_pmd_mlx5.h
index fbfdd9737b..d4caea5b20 100644
--- a/drivers/net/mlx5/rte_pmd_mlx5.h
+++ b/drivers/net/mlx5/rte_pmd_mlx5.h
@@ -139,6 +139,23 @@ int rte_pmd_mlx5_external_rx_queue_id_unmap(uint16_t port_id,
 __rte_experimental
 int rte_pmd_mlx5_host_shaper_config(int port_id, uint8_t rate, uint32_t flags);
 
+/**
+ * Enable traffic for external SQ.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] sq_num
+ *   SQ HW number.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ *   Possible values for rte_errno:
+ *   - EINVAL - invalid sq_number or port type.
+ *   - ENODEV - there is no Ethernet device for this port id.
+ */
+__rte_experimental
+int rte_pmd_mlx5_external_sq_enable(uint16_t port_id, uint32_t sq_num);
+
 #ifdef __cplusplus
 }
 #endif
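
For context (illustrative only, not part of the diff), the new API would be
called roughly as below once the external SQ exists; the port id and SQ number
are placeholders:

    #include <stdio.h>
    #include <rte_errno.h>
    #include <rte_pmd_mlx5.h>

    /* Hypothetical helper: install default miss flows for an external SQ. */
    static int
    enable_external_sq(uint16_t port_id, uint32_t sq_num)
    {
        int ret = rte_pmd_mlx5_external_sq_enable(port_id, sq_num);

        if (ret < 0)
            printf("external SQ %u enable failed: %s\n",
                   sq_num, rte_strerror(rte_errno));
        return ret;
    }
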
diff --git a/drivers/net/mlx5/version.map b/drivers/net/mlx5/version.map
index 9942de5079..848270da13 100644
--- a/drivers/net/mlx5/version.map
+++ b/drivers/net/mlx5/version.map
@@ -14,4 +14,5 @@ EXPERIMENTAL {
 	rte_pmd_mlx5_external_rx_queue_id_unmap;
 	# added in 22.07
 	rte_pmd_mlx5_host_shaper_config;
+	rte_pmd_mlx5_external_sq_enable;
 };
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v6 17/18] net/mlx5: support device control of representor matching
  2022-10-20 15:41 ` [PATCH v6 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (15 preceding siblings ...)
  2022-10-20 15:41   ` [PATCH v6 16/18] net/mlx5: support device control for E-Switch default rule Suanming Mou
@ 2022-10-20 15:41   ` Suanming Mou
  2022-10-24  9:47     ` Slava Ovsiienko
  2022-10-20 15:41   ` [PATCH v6 18/18] net/mlx5: create control flow rules with HWS Suanming Mou
  2022-10-24 10:57   ` [PATCH v6 00/18] net/mlx5: HW steering PMD update Raslan Darawsheh
  18 siblings, 1 reply; 140+ messages in thread
From: Suanming Mou @ 2022-10-20 15:41 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Dariusz Sosnowski

From: Dariusz Sosnowski <dsosnowski@nvidia.com>

In some E-Switch use cases, applications want to receive all traffic on a
single port. Since the flow API currently does not provide a way to match
traffic forwarded to any port representor, this patch adds support for
controlling representor matching on ingress flow rules.

Representor matching is controlled through the new device argument
repr_matching_en.

- If representor matching is enabled (default setting),
  then each ingress pattern template has an implicit REPRESENTED_PORT
  item added. Flow rules based on this pattern template will match
  the vport associated with the port on which the rule is created.
- If representor matching is disabled, then no implicit item is added.
  As a result, ingress flow rules will match traffic coming to any port,
  not only the port on which the flow rule is created.

Representor matching is enabled by default to provide the expected
default behavior.

This patch enables egress flow rules on representors when E-Switch is
enabled in the following configurations:

- repr_matching_en=1 and dv_xmeta_en=4
- repr_matching_en=1 and dv_xmeta_en=0
- repr_matching_en=0 and dv_xmeta_en=0

When representor matching is enabled, the following logic is
implemented:

1. Creating an egress template table in group 0 for each port. These
   tables will hold default flow rules defined as follows:

      pattern SQ
      actions MODIFY_FIELD (set available bits in REG_C_0 to
                            vport_meta_tag)
              MODIFY_FIELD (copy REG_A to REG_C_1, only when
                            dv_xmeta_en == 4)
              JUMP (group 1)

2. Egress pattern templates created by an application have an implicit
   MLX5_RTE_FLOW_ITEM_TYPE_TAG item prepended to pattern, which matches
   available bits of REG_C_0.

3. Egress flow rules created by an application have an implicit
   MLX5_RTE_FLOW_ITEM_TYPE_TAG item prepended to pattern, which matches
   vport_meta_tag placed in available bits of REG_C_0.

4. Egress template tables created by an application, which are in
   group n, are placed in group n + 1.

5. Items and actions related to META operate on REG_A when
   dv_xmeta_en == 0, or on REG_C_1 when dv_xmeta_en == 4.

When representor matching is disabled and extended metadata is disabled,
no changes to current logic are required.
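
As an illustration only (not part of the patch), representor matching could be
disabled together with legacy metadata mode through devargs; the PCI address
and representor list are placeholders:

    dpdk-testpmd -a 0000:08:00.0,representor=vf[0-1],dv_flow_en=2,dv_xmeta_en=0,repr_matching_en=0 -- -i

In this configuration, ingress flow rules created on a port match traffic
coming to any port of the E-Switch domain, because no implicit
REPRESENTED_PORT item is added to the pattern templates.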

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 doc/guides/nics/mlx5.rst         |  11 +
 drivers/net/mlx5/linux/mlx5_os.c |  11 +
 drivers/net/mlx5/mlx5.c          |  13 +
 drivers/net/mlx5/mlx5.h          |   5 +
 drivers/net/mlx5/mlx5_flow.c     |   8 +-
 drivers/net/mlx5/mlx5_flow.h     |   7 +
 drivers/net/mlx5/mlx5_flow_hw.c  | 738 ++++++++++++++++++++++++-------
 drivers/net/mlx5/mlx5_trigger.c  | 167 ++++++-
 8 files changed, 794 insertions(+), 166 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index ae4d406ca1..b923976fad 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -1163,6 +1163,17 @@ for an additional list of options shared with other mlx5 drivers.
 
   By default, the PMD will set this value to 1.
 
+- ``repr_matching_en`` parameter [int]
+
+  - 0. If representor matching is disabled, then no implicit item is added.
+    As a result, ingress flow rules will match traffic coming to any port,
+    not only the port on which the flow rule is created.
+
+  - 1. If representor matching is enabled (default setting),
+    then each ingress pattern template has an implicit REPRESENTED_PORT
+    item added. Flow rules based on this pattern template will match
+    the vport associated with the port on which the rule is created.
+
 Supported NICs
 --------------
 
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index a6cb802500..ab7ffa0931 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1568,6 +1568,9 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	if (priv->sh->config.dv_flow_en == 2) {
 #ifdef HAVE_MLX5_HWS_SUPPORT
 		if (priv->sh->config.dv_esw_en) {
+			uint32_t usable_bits;
+			uint32_t required_bits;
+
 			if (priv->sh->dv_regc0_mask == UINT32_MAX) {
 				DRV_LOG(ERR, "E-Switch port metadata is required when using HWS "
 					     "but it is disabled (configure it through devlink)");
@@ -1580,6 +1583,14 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 				err = ENOTSUP;
 				goto error;
 			}
+			usable_bits = __builtin_popcount(priv->sh->dv_regc0_mask);
+			required_bits = __builtin_popcount(priv->vport_meta_mask);
+			if (usable_bits < required_bits) {
+				DRV_LOG(ERR, "Not enough bits available in reg_c[0] to provide "
+					     "representor matching.");
+				err = ENOTSUP;
+				goto error;
+			}
 		}
 		if (priv->vport_meta_mask)
 			flow_hw_set_port_info(eth_dev);
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 4e532f0807..78234b116c 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -181,6 +181,9 @@
 /* HW steering counter's query interval. */
 #define MLX5_HWS_CNT_CYCLE_TIME "svc_cycle_time"
 
+/* Device parameter to control representor matching in ingress/egress flows with HWS. */
+#define MLX5_REPR_MATCHING_EN "repr_matching_en"
+
 /* Shared memory between primary and secondary processes. */
 struct mlx5_shared_data *mlx5_shared_data;
 
@@ -1283,6 +1286,8 @@ mlx5_dev_args_check_handler(const char *key, const char *val, void *opaque)
 		config->cnt_svc.service_core = tmp;
 	} else if (strcmp(MLX5_HWS_CNT_CYCLE_TIME, key) == 0) {
 		config->cnt_svc.cycle_time = tmp;
+	} else if (strcmp(MLX5_REPR_MATCHING_EN, key) == 0) {
+		config->repr_matching = !!tmp;
 	}
 	return 0;
 }
@@ -1321,6 +1326,7 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 		MLX5_FDB_DEFAULT_RULE_EN,
 		MLX5_HWS_CNT_SERVICE_CORE,
 		MLX5_HWS_CNT_CYCLE_TIME,
+		MLX5_REPR_MATCHING_EN,
 		NULL,
 	};
 	int ret = 0;
@@ -1335,6 +1341,7 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 	config->fdb_def_rule = 1;
 	config->cnt_svc.cycle_time = MLX5_CNT_SVC_CYCLE_TIME_DEFAULT;
 	config->cnt_svc.service_core = rte_get_main_lcore();
+	config->repr_matching = 1;
 	if (mkvlist != NULL) {
 		/* Process parameters. */
 		ret = mlx5_kvargs_process(mkvlist, params,
@@ -1368,6 +1375,11 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 			config->dv_xmeta_en);
 		config->dv_xmeta_en = MLX5_XMETA_MODE_LEGACY;
 	}
+	if (config->dv_flow_en != 2 && !config->repr_matching) {
+		DRV_LOG(DEBUG, "Disabling representor matching is valid only "
+			       "when HW Steering is enabled.");
+		config->repr_matching = 1;
+	}
 	if (config->tx_pp && !sh->dev_cap.txpp_en) {
 		DRV_LOG(ERR, "Packet pacing is not supported.");
 		rte_errno = ENODEV;
@@ -1411,6 +1423,7 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 	DRV_LOG(DEBUG, "\"allow_duplicate_pattern\" is %u.",
 		config->allow_duplicate_pattern);
 	DRV_LOG(DEBUG, "\"fdb_def_rule_en\" is %u.", config->fdb_def_rule);
+	DRV_LOG(DEBUG, "\"repr_matching_en\" is %u.", config->repr_matching);
 	return 0;
 }
 
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index a715df693e..5a961a69b7 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -321,6 +321,7 @@ struct mlx5_sh_config {
 	} cnt_svc; /* configure for HW steering's counter's service. */
 	/* Allow/Prevent the duplicate rules pattern. */
 	uint32_t fdb_def_rule:1; /* Create FDB default jump rule */
+	uint32_t repr_matching:1; /* Enable implicit vport matching in HWS FDB. */
 };
 
 /* Structure for VF VLAN workaround. */
@@ -371,6 +372,7 @@ struct mlx5_hw_q_job {
 			void *out_data;
 		} __rte_packed;
 		struct rte_flow_item_ethdev port_spec;
+		struct rte_flow_item_tag tag_spec;
 	} __rte_packed;
 };
 
@@ -1686,6 +1688,9 @@ struct mlx5_priv {
 	struct rte_flow_template_table *hw_esw_sq_miss_tbl;
 	struct rte_flow_template_table *hw_esw_zero_tbl;
 	struct rte_flow_template_table *hw_tx_meta_cpy_tbl;
+	struct rte_flow_pattern_template *hw_tx_repr_tagging_pt;
+	struct rte_flow_actions_template *hw_tx_repr_tagging_at;
+	struct rte_flow_template_table *hw_tx_repr_tagging_tbl;
 	struct mlx5_indexed_pool *flows[MLX5_FLOW_TYPE_MAXI];
 	/* RTE Flow rules. */
 	uint32_t ctrl_flows; /* Control flow rules. */
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 01ad1f774b..e19e9b20ed 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -1127,7 +1127,11 @@ mlx5_flow_get_reg_id(struct rte_eth_dev *dev,
 		}
 		break;
 	case MLX5_METADATA_TX:
-		return REG_A;
+		if (config->dv_flow_en == 2 && config->dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS) {
+			return REG_C_1;
+		} else {
+			return REG_A;
+		}
 	case MLX5_METADATA_FDB:
 		switch (config->dv_xmeta_en) {
 		case MLX5_XMETA_MODE_LEGACY:
@@ -11355,7 +11359,7 @@ mlx5_flow_pick_transfer_proxy(struct rte_eth_dev *dev,
 			return 0;
 		}
 	}
-	return rte_flow_error_set(error, EINVAL,
+	return rte_flow_error_set(error, ENODEV,
 				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				  NULL, "unable to find a proxy port");
 }
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 1a4b33d592..c3a5fba25e 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1210,12 +1210,18 @@ struct rte_flow_pattern_template {
 	struct rte_flow_pattern_template_attr attr;
 	struct mlx5dr_match_template *mt; /* mlx5 match template. */
 	uint64_t item_flags; /* Item layer flags. */
+	uint64_t orig_item_nb; /* Number of pattern items provided by the user (with END item). */
 	uint32_t refcnt;  /* Reference counter. */
 	/*
 	 * If true, then rule pattern should be prepended with
 	 * represented_port pattern item.
 	 */
 	bool implicit_port;
+	/*
+	 * If true, then rule pattern should be prepended with
+	 * tag pattern item for representor matching.
+	 */
+	bool implicit_tag;
 };
 
 /* Flow action template struct. */
@@ -2495,6 +2501,7 @@ int mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev,
 					 uint32_t sqn);
 int mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev);
 int mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev);
+int mlx5_flow_hw_tx_repr_matching_flow(struct rte_eth_dev *dev, uint32_t sqn);
 int mlx5_flow_actions_validate(struct rte_eth_dev *dev,
 		const struct rte_flow_actions_template_attr *attr,
 		const struct rte_flow_action actions[],
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 1516ee9e25..d036240794 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -32,12 +32,15 @@
 /* Maximum number of rules in control flow tables. */
 #define MLX5_HW_CTRL_FLOW_NB_RULES (4096)
 
-/* Lowest flow group usable by an application. */
+/* Lowest flow group usable by an application if group translation is done. */
 #define MLX5_HW_LOWEST_USABLE_GROUP (1)
 
 /* Maximum group index usable by user applications for transfer flows. */
 #define MLX5_HW_MAX_TRANSFER_GROUP (UINT32_MAX - 1)
 
+/* Maximum group index usable by user applications for egress flows. */
+#define MLX5_HW_MAX_EGRESS_GROUP (UINT32_MAX - 1)
+
 /* Lowest priority for HW root table. */
 #define MLX5_HW_LOWEST_PRIO_ROOT 15
 
@@ -61,6 +64,9 @@ flow_hw_set_vlan_vid_construct(struct rte_eth_dev *dev,
 			       const struct mlx5_hw_actions *hw_acts,
 			       const struct rte_flow_action *action);
 
+static __rte_always_inline uint32_t flow_hw_tx_tag_regc_mask(struct rte_eth_dev *dev);
+static __rte_always_inline uint32_t flow_hw_tx_tag_regc_value(struct rte_eth_dev *dev);
+
 const struct mlx5_flow_driver_ops mlx5_flow_hw_drv_ops;
 
 /* DR action flags with different table. */
@@ -2349,21 +2355,18 @@ flow_hw_get_rule_items(struct rte_eth_dev *dev,
 		       uint8_t pattern_template_index,
 		       struct mlx5_hw_q_job *job)
 {
-	if (table->its[pattern_template_index]->implicit_port) {
-		const struct rte_flow_item *curr_item;
-		unsigned int nb_items;
-		bool found_end;
-		unsigned int i;
-
-		/* Count number of pattern items. */
-		nb_items = 0;
-		found_end = false;
-		for (curr_item = items; !found_end; ++curr_item) {
-			++nb_items;
-			if (curr_item->type == RTE_FLOW_ITEM_TYPE_END)
-				found_end = true;
+	struct rte_flow_pattern_template *pt = table->its[pattern_template_index];
+
+	/* Only one implicit item can be added to flow rule pattern. */
+	MLX5_ASSERT(!pt->implicit_port || !pt->implicit_tag);
+	/* At least one item was allocated in job descriptor for items. */
+	MLX5_ASSERT(MLX5_HW_MAX_ITEMS >= 1);
+	if (pt->implicit_port) {
+		if (pt->orig_item_nb + 1 > MLX5_HW_MAX_ITEMS) {
+			rte_errno = ENOMEM;
+			return NULL;
 		}
-		/* Prepend represented port item. */
+		/* Set up represented port item in job descriptor. */
 		job->port_spec = (struct rte_flow_item_ethdev){
 			.port_id = dev->data->port_id,
 		};
@@ -2371,21 +2374,26 @@ flow_hw_get_rule_items(struct rte_eth_dev *dev,
 			.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
 			.spec = &job->port_spec,
 		};
-		found_end = false;
-		for (i = 1; i < MLX5_HW_MAX_ITEMS && i - 1 < nb_items; ++i) {
-			job->items[i] = items[i - 1];
-			if (items[i - 1].type == RTE_FLOW_ITEM_TYPE_END) {
-				found_end = true;
-				break;
-			}
-		}
-		if (i >= MLX5_HW_MAX_ITEMS && !found_end) {
+		rte_memcpy(&job->items[1], items, sizeof(*items) * pt->orig_item_nb);
+		return job->items;
+	} else if (pt->implicit_tag) {
+		if (pt->orig_item_nb + 1 > MLX5_HW_MAX_ITEMS) {
 			rte_errno = ENOMEM;
 			return NULL;
 		}
+		/* Set up tag item in job descriptor. */
+		job->tag_spec = (struct rte_flow_item_tag){
+			.data = flow_hw_tx_tag_regc_value(dev),
+		};
+		job->items[0] = (struct rte_flow_item){
+			.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+			.spec = &job->tag_spec,
+		};
+		rte_memcpy(&job->items[1], items, sizeof(*items) * pt->orig_item_nb);
 		return job->items;
+	} else {
+		return items;
 	}
-	return items;
 }
 
 /**
@@ -2963,6 +2971,11 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 		     uint8_t nb_action_templates,
 		     struct rte_flow_error *error)
 {
+	struct rte_flow_error sub_error = {
+		.type = RTE_FLOW_ERROR_TYPE_NONE,
+		.cause = NULL,
+		.message = NULL,
+	};
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5dr_matcher_attr matcher_attr = {0};
 	struct rte_flow_template_table *tbl = NULL;
@@ -2973,7 +2986,7 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 	struct rte_flow_attr flow_attr = attr->flow_attr;
 	struct mlx5_flow_cb_ctx ctx = {
 		.dev = dev,
-		.error = error,
+		.error = &sub_error,
 		.data = &flow_attr,
 	};
 	struct mlx5_indexed_pool_config cfg = {
@@ -3067,7 +3080,7 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 			continue;
 		err = __flow_hw_actions_translate(dev, &tbl->cfg,
 						  &tbl->ats[i].acts,
-						  action_templates[i], error);
+						  action_templates[i], &sub_error);
 		if (err) {
 			i++;
 			goto at_error;
@@ -3108,12 +3121,11 @@ flow_hw_table_create(struct rte_eth_dev *dev,
 		mlx5_free(tbl);
 	}
 	if (error != NULL) {
-		rte_flow_error_set(error, err,
-				error->type == RTE_FLOW_ERROR_TYPE_NONE ?
-				RTE_FLOW_ERROR_TYPE_UNSPECIFIED : error->type,
-				NULL,
-				error->message == NULL ?
-				"fail to create rte table" : error->message);
+		if (sub_error.type == RTE_FLOW_ERROR_TYPE_NONE)
+			rte_flow_error_set(error, err, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					   "Failed to create template table");
+		else
+			rte_memcpy(error, &sub_error, sizeof(sub_error));
 	}
 	return NULL;
 }
@@ -3174,9 +3186,10 @@ flow_hw_translate_group(struct rte_eth_dev *dev,
 			struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_sh_config *config = &priv->sh->config;
 	const struct rte_flow_attr *flow_attr = &cfg->attr.flow_attr;
 
-	if (priv->sh->config.dv_esw_en &&
+	if (config->dv_esw_en &&
 	    priv->fdb_def_rule &&
 	    cfg->external &&
 	    flow_attr->transfer) {
@@ -3186,6 +3199,22 @@ flow_hw_translate_group(struct rte_eth_dev *dev,
 						  NULL,
 						  "group index not supported");
 		*table_group = group + 1;
+	} else if (config->dv_esw_en &&
+		   !(config->repr_matching && config->dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS) &&
+		   cfg->external &&
+		   flow_attr->egress) {
+		/*
+		 * On E-Switch setups, egress group translation is not done if and only if
+		 * representor matching is disabled and legacy metadata mode is selected.
+		 * In all other cases, egress group 0 is reserved for representor tagging flows
+		 * and metadata copy flows.
+		 */
+		if (group > MLX5_HW_MAX_EGRESS_GROUP)
+			return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
+						  NULL,
+						  "group index not supported");
+		*table_group = group + 1;
 	} else {
 		*table_group = group;
 	}
@@ -3226,7 +3255,6 @@ flow_hw_template_table_create(struct rte_eth_dev *dev,
 			      uint8_t nb_action_templates,
 			      struct rte_flow_error *error)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_flow_template_table_cfg cfg = {
 		.attr = *attr,
 		.external = true,
@@ -3235,12 +3263,6 @@ flow_hw_template_table_create(struct rte_eth_dev *dev,
 
 	if (flow_hw_translate_group(dev, &cfg, group, &cfg.attr.flow_attr.group, error))
 		return NULL;
-	if (priv->sh->config.dv_esw_en && cfg.attr.flow_attr.egress) {
-		rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ATTR, NULL,
-				  "egress flows are not supported with HW Steering"
-				  " when E-Switch is enabled");
-		return NULL;
-	}
 	return flow_hw_table_create(dev, &cfg, item_templates, nb_item_templates,
 				    action_templates, nb_action_templates, error);
 }
@@ -4496,26 +4518,28 @@ flow_hw_actions_template_destroy(struct rte_eth_dev *dev __rte_unused,
 	return 0;
 }
 
-static struct rte_flow_item *
-flow_hw_copy_prepend_port_item(const struct rte_flow_item *items,
-			       struct rte_flow_error *error)
+static uint32_t
+flow_hw_count_items(const struct rte_flow_item *items)
 {
 	const struct rte_flow_item *curr_item;
-	struct rte_flow_item *copied_items;
-	bool found_end;
-	unsigned int nb_items;
-	unsigned int i;
-	size_t size;
+	uint32_t nb_items;
 
-	/* Count number of pattern items. */
 	nb_items = 0;
-	found_end = false;
-	for (curr_item = items; !found_end; ++curr_item) {
+	for (curr_item = items; curr_item->type != RTE_FLOW_ITEM_TYPE_END; ++curr_item)
 		++nb_items;
-		if (curr_item->type == RTE_FLOW_ITEM_TYPE_END)
-			found_end = true;
-	}
-	/* Allocate new array of items and prepend REPRESENTED_PORT item. */
+	return ++nb_items;
+}
+
+static struct rte_flow_item *
+flow_hw_prepend_item(const struct rte_flow_item *items,
+		     const uint32_t nb_items,
+		     const struct rte_flow_item *new_item,
+		     struct rte_flow_error *error)
+{
+	struct rte_flow_item *copied_items;
+	size_t size;
+
+	/* Allocate new array of items. */
 	size = sizeof(*copied_items) * (nb_items + 1);
 	copied_items = mlx5_malloc(MLX5_MEM_ZERO, size, 0, rte_socket_id());
 	if (!copied_items) {
@@ -4525,14 +4549,9 @@ flow_hw_copy_prepend_port_item(const struct rte_flow_item *items,
 				   "cannot allocate item template");
 		return NULL;
 	}
-	copied_items[0] = (struct rte_flow_item){
-		.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
-		.spec = NULL,
-		.last = NULL,
-		.mask = &rte_flow_item_ethdev_mask,
-	};
-	for (i = 1; i < nb_items + 1; ++i)
-		copied_items[i] = items[i - 1];
+	/* Put new item at the beginning and copy the rest. */
+	copied_items[0] = *new_item;
+	rte_memcpy(&copied_items[1], items, sizeof(*items) * nb_items);
 	return copied_items;
 }
 
@@ -4553,17 +4572,13 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 	if (priv->sh->config.dv_esw_en) {
 		MLX5_ASSERT(priv->master || priv->representor);
 		if (priv->master) {
-			/*
-			 * It is allowed to specify ingress, egress and transfer attributes
-			 * at the same time, in order to construct flows catching all missed
-			 * FDB traffic and forwarding it to the master port.
-			 */
-			if (!(attr->ingress ^ attr->egress ^ attr->transfer))
+			if ((attr->ingress && attr->egress) ||
+			    (attr->ingress && attr->transfer) ||
+			    (attr->egress && attr->transfer))
 				return rte_flow_error_set(error, EINVAL,
 							  RTE_FLOW_ERROR_TYPE_ATTR, NULL,
-							  "only one or all direction attributes"
-							  " at once can be used on transfer proxy"
-							  " port");
+							  "only one direction attribute at once"
+							  " can be used on transfer proxy port");
 		} else {
 			if (attr->transfer)
 				return rte_flow_error_set(error, EINVAL,
@@ -4616,11 +4631,16 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 			break;
 		}
 		case RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT:
-			if (attr->ingress || attr->egress)
+			if (attr->ingress && priv->sh->config.repr_matching)
+				return rte_flow_error_set(error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ITEM, NULL,
+						  "represented port item cannot be used"
+						  " when ingress attribute is set");
+			if (attr->egress)
 				return rte_flow_error_set(error, EINVAL,
 						  RTE_FLOW_ERROR_TYPE_ITEM, NULL,
 						  "represented port item cannot be used"
-						  " when transfer attribute is set");
+						  " when egress attribute is set");
 			break;
 		case RTE_FLOW_ITEM_TYPE_META:
 			if (!priv->sh->config.dv_esw_en ||
@@ -4682,6 +4702,17 @@ flow_hw_pattern_validate(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static bool
+flow_hw_pattern_has_sq_match(const struct rte_flow_item *items)
+{
+	unsigned int i;
+
+	for (i = 0; items[i].type != RTE_FLOW_ITEM_TYPE_END; ++i)
+		if (items[i].type == (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_SQ)
+			return true;
+	return false;
+}
+
 /**
  * Create flow item template.
  *
@@ -4707,17 +4738,53 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 	struct rte_flow_pattern_template *it;
 	struct rte_flow_item *copied_items = NULL;
 	const struct rte_flow_item *tmpl_items;
+	uint64_t orig_item_nb;
+	struct rte_flow_item port = {
+		.type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT,
+		.mask = &rte_flow_item_ethdev_mask,
+	};
+	struct rte_flow_item_tag tag_v = {
+		.data = 0,
+		.index = REG_C_0,
+	};
+	struct rte_flow_item_tag tag_m = {
+		.data = flow_hw_tx_tag_regc_mask(dev),
+		.index = 0xff,
+	};
+	struct rte_flow_item tag = {
+		.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_TAG,
+		.spec = &tag_v,
+		.mask = &tag_m,
+		.last = NULL
+	};
 
 	if (flow_hw_pattern_validate(dev, attr, items, error))
 		return NULL;
-	if (priv->sh->config.dv_esw_en && attr->ingress && !attr->egress && !attr->transfer) {
-		copied_items = flow_hw_copy_prepend_port_item(items, error);
+	orig_item_nb = flow_hw_count_items(items);
+	if (priv->sh->config.dv_esw_en &&
+	    priv->sh->config.repr_matching &&
+	    attr->ingress && !attr->egress && !attr->transfer) {
+		copied_items = flow_hw_prepend_item(items, orig_item_nb, &port, error);
+		if (!copied_items)
+			return NULL;
+		tmpl_items = copied_items;
+	} else if (priv->sh->config.dv_esw_en &&
+		   priv->sh->config.repr_matching &&
+		   !attr->ingress && attr->egress && !attr->transfer) {
+		if (flow_hw_pattern_has_sq_match(items)) {
+			DRV_LOG(DEBUG, "Port %u omitting implicit REG_C_0 match for egress "
+				       "pattern template", dev->data->port_id);
+			tmpl_items = items;
+			goto setup_pattern_template;
+		}
+		copied_items = flow_hw_prepend_item(items, orig_item_nb, &tag, error);
 		if (!copied_items)
 			return NULL;
 		tmpl_items = copied_items;
 	} else {
 		tmpl_items = items;
 	}
+setup_pattern_template:
 	it = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*it), 0, rte_socket_id());
 	if (!it) {
 		if (copied_items)
@@ -4729,6 +4796,7 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 		return NULL;
 	}
 	it->attr = *attr;
+	it->orig_item_nb = orig_item_nb;
 	it->mt = mlx5dr_match_template_create(tmpl_items, attr->relaxed_matching);
 	if (!it->mt) {
 		if (copied_items)
@@ -4741,11 +4809,15 @@ flow_hw_pattern_template_create(struct rte_eth_dev *dev,
 		return NULL;
 	}
 	it->item_flags = flow_hw_rss_item_flags_get(tmpl_items);
-	it->implicit_port = !!copied_items;
+	if (copied_items) {
+		if (attr->ingress)
+			it->implicit_port = true;
+		else if (attr->egress)
+			it->implicit_tag = true;
+		mlx5_free(copied_items);
+	}
 	__atomic_fetch_add(&it->refcnt, 1, __ATOMIC_RELAXED);
 	LIST_INSERT_HEAD(&priv->flow_hw_itt, it, next);
-	if (copied_items)
-		mlx5_free(copied_items);
 	return it;
 }
 
@@ -5142,6 +5214,254 @@ flow_hw_free_vport_actions(struct mlx5_priv *priv)
 	priv->hw_vport = NULL;
 }
 
+/**
+ * Create an egress pattern template matching on source SQ.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to pattern template on success. NULL otherwise, and rte_errno is set.
+ */
+static struct rte_flow_pattern_template *
+flow_hw_create_tx_repr_sq_pattern_tmpl(struct rte_eth_dev *dev)
+{
+	struct rte_flow_pattern_template_attr attr = {
+		.relaxed_matching = 0,
+		.egress = 1,
+	};
+	struct mlx5_rte_flow_item_sq sq_mask = {
+		.queue = UINT32_MAX,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_SQ,
+			.mask = &sq_mask,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+
+	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
+}
+
+static __rte_always_inline uint32_t
+flow_hw_tx_tag_regc_mask(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint32_t mask = priv->sh->dv_regc0_mask;
+
+	/* Mask is verified during device initialization. Sanity checking here. */
+	MLX5_ASSERT(mask != 0);
+	/*
+	 * Availability of sufficient number of bits in REG_C_0 is verified on initialization.
+	 * Sanity checking here.
+	 */
+	MLX5_ASSERT(__builtin_popcount(mask) >= __builtin_popcount(priv->vport_meta_mask));
+	return mask;
+}
+
+static __rte_always_inline uint32_t
+flow_hw_tx_tag_regc_value(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint32_t tag;
+
+	/* Mask is verified during device initialization. Sanity checking here. */
+	MLX5_ASSERT(priv->vport_meta_mask != 0);
+	tag = priv->vport_meta_tag >> (rte_bsf32(priv->vport_meta_mask));
+	/*
+	 * Availability of sufficient number of bits in REG_C_0 is verified on initialization.
+	 * Sanity checking here.
+	 */
+	MLX5_ASSERT((tag & priv->sh->dv_regc0_mask) == tag);
+	return tag;
+}
+
+static void
+flow_hw_update_action_mask(struct rte_flow_action *action,
+			   struct rte_flow_action *mask,
+			   enum rte_flow_action_type type,
+			   void *conf_v,
+			   void *conf_m)
+{
+	action->type = type;
+	action->conf = conf_v;
+	mask->type = type;
+	mask->conf = conf_m;
+}
+
+/**
+ * Create an egress actions template with MODIFY_FIELD action for setting unused REG_C_0 bits
+ * to vport tag and JUMP action to group 1.
+ *
+ * If extended metadata mode is enabled, then MODIFY_FIELD action for copying software metadata
+ * to REG_C_1 is added as well.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   Pointer to actions template on success. NULL otherwise, and rte_errno is set.
+ */
+static struct rte_flow_actions_template *
+flow_hw_create_tx_repr_tag_jump_acts_tmpl(struct rte_eth_dev *dev)
+{
+	uint32_t tag_mask = flow_hw_tx_tag_regc_mask(dev);
+	uint32_t tag_value = flow_hw_tx_tag_regc_value(dev);
+	struct rte_flow_actions_template_attr attr = {
+		.egress = 1,
+	};
+	struct rte_flow_action_modify_field set_tag_v = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_C_0,
+			.offset = rte_bsf32(tag_mask),
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+		},
+		.width = __builtin_popcount(tag_mask),
+	};
+	struct rte_flow_action_modify_field set_tag_m = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.src = {
+			.field = RTE_FLOW_FIELD_VALUE,
+		},
+		.width = UINT32_MAX,
+	};
+	struct rte_flow_action_modify_field copy_metadata_v = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_C_1,
+		},
+		.src = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = REG_A,
+		},
+		.width = 32,
+	};
+	struct rte_flow_action_modify_field copy_metadata_m = {
+		.operation = RTE_FLOW_MODIFY_SET,
+		.dst = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.src = {
+			.field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+			.level = UINT32_MAX,
+			.offset = UINT32_MAX,
+		},
+		.width = UINT32_MAX,
+	};
+	struct rte_flow_action_jump jump_v = {
+		.group = MLX5_HW_LOWEST_USABLE_GROUP,
+	};
+	struct rte_flow_action_jump jump_m = {
+		.group = UINT32_MAX,
+	};
+	struct rte_flow_action actions_v[4] = { { 0 } };
+	struct rte_flow_action actions_m[4] = { { 0 } };
+	unsigned int idx = 0;
+
+	rte_memcpy(set_tag_v.src.value, &tag_value, sizeof(tag_value));
+	rte_memcpy(set_tag_m.src.value, &tag_mask, sizeof(tag_mask));
+	flow_hw_update_action_mask(&actions_v[idx], &actions_m[idx],
+				   RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+				   &set_tag_v, &set_tag_m);
+	idx++;
+	if (MLX5_SH(dev)->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS) {
+		flow_hw_update_action_mask(&actions_v[idx], &actions_m[idx],
+					   RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+					   &copy_metadata_v, &copy_metadata_m);
+		idx++;
+	}
+	flow_hw_update_action_mask(&actions_v[idx], &actions_m[idx], RTE_FLOW_ACTION_TYPE_JUMP,
+				   &jump_v, &jump_m);
+	idx++;
+	flow_hw_update_action_mask(&actions_v[idx], &actions_m[idx], RTE_FLOW_ACTION_TYPE_END,
+				   NULL, NULL);
+	idx++;
+	MLX5_ASSERT(idx <= RTE_DIM(actions_v));
+	return flow_hw_actions_template_create(dev, &attr, actions_v, actions_m, NULL);
+}
+
+static void
+flow_hw_cleanup_tx_repr_tagging(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (priv->hw_tx_repr_tagging_tbl) {
+		flow_hw_table_destroy(dev, priv->hw_tx_repr_tagging_tbl, NULL);
+		priv->hw_tx_repr_tagging_tbl = NULL;
+	}
+	if (priv->hw_tx_repr_tagging_at) {
+		flow_hw_actions_template_destroy(dev, priv->hw_tx_repr_tagging_at, NULL);
+		priv->hw_tx_repr_tagging_at = NULL;
+	}
+	if (priv->hw_tx_repr_tagging_pt) {
+		flow_hw_pattern_template_destroy(dev, priv->hw_tx_repr_tagging_pt, NULL);
+		priv->hw_tx_repr_tagging_pt = NULL;
+	}
+}
+
+/**
+ * Setup templates and table used to create default Tx flow rules. These default rules
+ * allow for matching Tx representor traffic using a vport tag placed in unused bits of
+ * REG_C_0 register.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise.
+ */
+static int
+flow_hw_setup_tx_repr_tagging(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_flow_template_table_attr attr = {
+		.flow_attr = {
+			.group = 0,
+			.priority = MLX5_HW_LOWEST_PRIO_ROOT,
+			.egress = 1,
+		},
+		.nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
+	};
+	struct mlx5_flow_template_table_cfg cfg = {
+		.attr = attr,
+		.external = false,
+	};
+
+	MLX5_ASSERT(priv->sh->config.dv_esw_en);
+	MLX5_ASSERT(priv->sh->config.repr_matching);
+	priv->hw_tx_repr_tagging_pt = flow_hw_create_tx_repr_sq_pattern_tmpl(dev);
+	if (!priv->hw_tx_repr_tagging_pt)
+		goto error;
+	priv->hw_tx_repr_tagging_at = flow_hw_create_tx_repr_tag_jump_acts_tmpl(dev);
+	if (!priv->hw_tx_repr_tagging_at)
+		goto error;
+	priv->hw_tx_repr_tagging_tbl = flow_hw_table_create(dev, &cfg,
+							    &priv->hw_tx_repr_tagging_pt, 1,
+							    &priv->hw_tx_repr_tagging_at, 1,
+							    NULL);
+	if (!priv->hw_tx_repr_tagging_tbl)
+		goto error;
+	return 0;
+error:
+	flow_hw_cleanup_tx_repr_tagging(dev);
+	return -rte_errno;
+}
+
 static uint32_t
 flow_hw_esw_mgr_regc_marker_mask(struct rte_eth_dev *dev)
 {
@@ -5548,29 +5868,43 @@ flow_hw_create_tx_default_mreg_copy_actions_template(struct rte_eth_dev *dev)
 		},
 		.width = UINT32_MAX,
 	};
-	const struct rte_flow_action copy_reg_action[] = {
+	const struct rte_flow_action_jump jump_action = {
+		.group = 1,
+	};
+	const struct rte_flow_action_jump jump_mask = {
+		.group = UINT32_MAX,
+	};
+	const struct rte_flow_action actions[] = {
 		[0] = {
 			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
 			.conf = &mreg_action,
 		},
 		[1] = {
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump_action,
+		},
+		[2] = {
 			.type = RTE_FLOW_ACTION_TYPE_END,
 		},
 	};
-	const struct rte_flow_action copy_reg_mask[] = {
+	const struct rte_flow_action masks[] = {
 		[0] = {
 			.type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
 			.conf = &mreg_mask,
 		},
 		[1] = {
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+			.conf = &jump_mask,
+		},
+		[2] = {
 			.type = RTE_FLOW_ACTION_TYPE_END,
 		},
 	};
 	struct rte_flow_error drop_err;
 
 	RTE_SET_USED(drop_err);
-	return flow_hw_actions_template_create(dev, &tx_act_attr, copy_reg_action,
-					       copy_reg_mask, &drop_err);
+	return flow_hw_actions_template_create(dev, &tx_act_attr, actions,
+					       masks, &drop_err);
 }
 
 /**
@@ -5748,63 +6082,21 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 	struct rte_flow_actions_template *jump_one_actions_tmpl = NULL;
 	struct rte_flow_actions_template *tx_meta_actions_tmpl = NULL;
 	uint32_t xmeta = priv->sh->config.dv_xmeta_en;
+	uint32_t repr_matching = priv->sh->config.repr_matching;
 
-	/* Item templates */
+	/* Create templates and table for default SQ miss flow rules - root table. */
 	esw_mgr_items_tmpl = flow_hw_create_ctrl_esw_mgr_pattern_template(dev);
 	if (!esw_mgr_items_tmpl) {
 		DRV_LOG(ERR, "port %u failed to create E-Switch Manager item"
 			" template for control flows", dev->data->port_id);
 		goto error;
 	}
-	regc_sq_items_tmpl = flow_hw_create_ctrl_regc_sq_pattern_template(dev);
-	if (!regc_sq_items_tmpl) {
-		DRV_LOG(ERR, "port %u failed to create SQ item template for"
-			" control flows", dev->data->port_id);
-		goto error;
-	}
-	port_items_tmpl = flow_hw_create_ctrl_port_pattern_template(dev);
-	if (!port_items_tmpl) {
-		DRV_LOG(ERR, "port %u failed to create SQ item template for"
-			" control flows", dev->data->port_id);
-		goto error;
-	}
-	if (xmeta == MLX5_XMETA_MODE_META32_HWS) {
-		tx_meta_items_tmpl = flow_hw_create_tx_default_mreg_copy_pattern_template(dev);
-		if (!tx_meta_items_tmpl) {
-			DRV_LOG(ERR, "port %u failed to Tx metadata copy pattern"
-				" template for control flows", dev->data->port_id);
-			goto error;
-		}
-	}
-	/* Action templates */
 	regc_jump_actions_tmpl = flow_hw_create_ctrl_regc_jump_actions_template(dev);
 	if (!regc_jump_actions_tmpl) {
 		DRV_LOG(ERR, "port %u failed to create REG_C set and jump action template"
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
-	port_actions_tmpl = flow_hw_create_ctrl_port_actions_template(dev);
-	if (!port_actions_tmpl) {
-		DRV_LOG(ERR, "port %u failed to create port action template"
-			" for control flows", dev->data->port_id);
-		goto error;
-	}
-	jump_one_actions_tmpl = flow_hw_create_ctrl_jump_actions_template
-			(dev, MLX5_HW_LOWEST_USABLE_GROUP);
-	if (!jump_one_actions_tmpl) {
-		DRV_LOG(ERR, "port %u failed to create jump action template"
-			" for control flows", dev->data->port_id);
-		goto error;
-	}
-	if (xmeta == MLX5_XMETA_MODE_META32_HWS) {
-		tx_meta_actions_tmpl = flow_hw_create_tx_default_mreg_copy_actions_template(dev);
-		if (!tx_meta_actions_tmpl) {
-			DRV_LOG(ERR, "port %u failed to Tx metadata copy actions"
-				" template for control flows", dev->data->port_id);
-			goto error;
-		}
-	}
-	/* Tables */
 	MLX5_ASSERT(priv->hw_esw_sq_miss_root_tbl == NULL);
 	priv->hw_esw_sq_miss_root_tbl = flow_hw_create_ctrl_sq_miss_root_table
 			(dev, esw_mgr_items_tmpl, regc_jump_actions_tmpl);
@@ -5813,6 +6105,19 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
+	/* Create templates and table for default SQ miss flow rules - non-root table. */
+	regc_sq_items_tmpl = flow_hw_create_ctrl_regc_sq_pattern_template(dev);
+	if (!regc_sq_items_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create SQ item template for"
+			" control flows", dev->data->port_id);
+		goto error;
+	}
+	port_actions_tmpl = flow_hw_create_ctrl_port_actions_template(dev);
+	if (!port_actions_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create port action template"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
 	MLX5_ASSERT(priv->hw_esw_sq_miss_tbl == NULL);
 	priv->hw_esw_sq_miss_tbl = flow_hw_create_ctrl_sq_miss_table(dev, regc_sq_items_tmpl,
 								     port_actions_tmpl);
@@ -5821,6 +6126,20 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
+	/* Create templates and table for default FDB jump flow rules. */
+	port_items_tmpl = flow_hw_create_ctrl_port_pattern_template(dev);
+	if (!port_items_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create SQ item template for"
+			" control flows", dev->data->port_id);
+		goto error;
+	}
+	jump_one_actions_tmpl = flow_hw_create_ctrl_jump_actions_template
+			(dev, MLX5_HW_LOWEST_USABLE_GROUP);
+	if (!jump_one_actions_tmpl) {
+		DRV_LOG(ERR, "port %u failed to create jump action template"
+			" for control flows", dev->data->port_id);
+		goto error;
+	}
 	MLX5_ASSERT(priv->hw_esw_zero_tbl == NULL);
 	priv->hw_esw_zero_tbl = flow_hw_create_ctrl_jump_table(dev, port_items_tmpl,
 							       jump_one_actions_tmpl);
@@ -5829,7 +6148,20 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 			" for control flows", dev->data->port_id);
 		goto error;
 	}
-	if (xmeta == MLX5_XMETA_MODE_META32_HWS) {
+	/* Create templates and table for default Tx metadata copy flow rule. */
+	if (!repr_matching && xmeta == MLX5_XMETA_MODE_META32_HWS) {
+		tx_meta_items_tmpl = flow_hw_create_tx_default_mreg_copy_pattern_template(dev);
+		if (!tx_meta_items_tmpl) {
+			DRV_LOG(ERR, "port %u failed to Tx metadata copy pattern"
+				" template for control flows", dev->data->port_id);
+			goto error;
+		}
+		tx_meta_actions_tmpl = flow_hw_create_tx_default_mreg_copy_actions_template(dev);
+		if (!tx_meta_actions_tmpl) {
+			DRV_LOG(ERR, "port %u failed to Tx metadata copy actions"
+				" template for control flows", dev->data->port_id);
+			goto error;
+		}
 		MLX5_ASSERT(priv->hw_tx_meta_cpy_tbl == NULL);
 		priv->hw_tx_meta_cpy_tbl = flow_hw_create_tx_default_mreg_copy_table(dev,
 					tx_meta_items_tmpl, tx_meta_actions_tmpl);
@@ -5853,7 +6185,7 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 		flow_hw_table_destroy(dev, priv->hw_esw_sq_miss_root_tbl, NULL);
 		priv->hw_esw_sq_miss_root_tbl = NULL;
 	}
-	if (xmeta == MLX5_XMETA_MODE_META32_HWS && tx_meta_actions_tmpl)
+	if (tx_meta_actions_tmpl)
 		flow_hw_actions_template_destroy(dev, tx_meta_actions_tmpl, NULL);
 	if (jump_one_actions_tmpl)
 		flow_hw_actions_template_destroy(dev, jump_one_actions_tmpl, NULL);
@@ -5861,7 +6193,7 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev)
 		flow_hw_actions_template_destroy(dev, port_actions_tmpl, NULL);
 	if (regc_jump_actions_tmpl)
 		flow_hw_actions_template_destroy(dev, regc_jump_actions_tmpl, NULL);
-	if (xmeta == MLX5_XMETA_MODE_META32_HWS && tx_meta_items_tmpl)
+	if (tx_meta_items_tmpl)
 		flow_hw_pattern_template_destroy(dev, tx_meta_items_tmpl, NULL);
 	if (port_items_tmpl)
 		flow_hw_pattern_template_destroy(dev, port_items_tmpl, NULL);
@@ -6202,6 +6534,13 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		if (!priv->hw_tag[i])
 			goto err;
 	}
+	if (priv->sh->config.dv_esw_en && priv->sh->config.repr_matching) {
+		ret = flow_hw_setup_tx_repr_tagging(dev);
+		if (ret) {
+			rte_errno = -ret;
+			goto err;
+		}
+	}
 	if (is_proxy) {
 		ret = flow_hw_create_vport_actions(priv);
 		if (ret) {
@@ -6328,6 +6667,7 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 		return;
 	flow_hw_rxq_flag_set(dev, false);
 	flow_hw_flush_all_ctrl_flows(dev);
+	flow_hw_cleanup_tx_repr_tagging(dev);
 	while (!LIST_EMPTY(&priv->flow_hw_tbl_ongo)) {
 		tbl = LIST_FIRST(&priv->flow_hw_tbl_ongo);
 		flow_hw_table_destroy(dev, tbl, NULL);
@@ -7723,45 +8063,30 @@ flow_hw_destroy_ctrl_flow(struct rte_eth_dev *dev, struct rte_flow *flow)
 }
 
 /**
- * Destroys control flows created on behalf of @p owner_dev device.
+ * Destroys control flows created on behalf of @p owner device on @p dev device.
  *
- * @param owner_dev
+ * @param dev
+ *   Pointer to Ethernet device on which control flows were created.
+ * @param owner
  *   Pointer to Ethernet device owning control flows.
  *
  * @return
  *   0 on success, otherwise negative error code is returned and
  *   rte_errno is set.
  */
-int
-mlx5_flow_hw_flush_ctrl_flows(struct rte_eth_dev *owner_dev)
+static int
+flow_hw_flush_ctrl_flows_owned_by(struct rte_eth_dev *dev, struct rte_eth_dev *owner)
 {
-	struct mlx5_priv *owner_priv = owner_dev->data->dev_private;
-	struct rte_eth_dev *proxy_dev;
-	struct mlx5_priv *proxy_priv;
+	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_hw_ctrl_flow *cf;
 	struct mlx5_hw_ctrl_flow *cf_next;
-	uint16_t owner_port_id = owner_dev->data->port_id;
-	uint16_t proxy_port_id = owner_dev->data->port_id;
 	int ret;
 
-	if (owner_priv->sh->config.dv_esw_en) {
-		if (rte_flow_pick_transfer_proxy(owner_port_id, &proxy_port_id, NULL)) {
-			DRV_LOG(ERR, "Unable to find proxy port for port %u",
-				owner_port_id);
-			rte_errno = EINVAL;
-			return -rte_errno;
-		}
-		proxy_dev = &rte_eth_devices[proxy_port_id];
-		proxy_priv = proxy_dev->data->dev_private;
-	} else {
-		proxy_dev = owner_dev;
-		proxy_priv = owner_priv;
-	}
-	cf = LIST_FIRST(&proxy_priv->hw_ctrl_flows);
+	cf = LIST_FIRST(&priv->hw_ctrl_flows);
 	while (cf != NULL) {
 		cf_next = LIST_NEXT(cf, next);
-		if (cf->owner_dev == owner_dev) {
-			ret = flow_hw_destroy_ctrl_flow(proxy_dev, cf->flow);
+		if (cf->owner_dev == owner) {
+			ret = flow_hw_destroy_ctrl_flow(dev, cf->flow);
 			if (ret) {
 				rte_errno = ret;
 				return -ret;
@@ -7774,6 +8099,50 @@ mlx5_flow_hw_flush_ctrl_flows(struct rte_eth_dev *owner_dev)
 	return 0;
 }
 
+/**
+ * Destroys control flows created for @p owner_dev device.
+ *
+ * @param owner_dev
+ *   Pointer to Ethernet device owning control flows.
+ *
+ * @return
+ *   0 on success, otherwise negative error code is returned and
+ *   rte_errno is set.
+ */
+int
+mlx5_flow_hw_flush_ctrl_flows(struct rte_eth_dev *owner_dev)
+{
+	struct mlx5_priv *owner_priv = owner_dev->data->dev_private;
+	struct rte_eth_dev *proxy_dev;
+	uint16_t owner_port_id = owner_dev->data->port_id;
+	uint16_t proxy_port_id = owner_dev->data->port_id;
+	int ret;
+
+	/* Flush all flows created by this port for itself. */
+	ret = flow_hw_flush_ctrl_flows_owned_by(owner_dev, owner_dev);
+	if (ret)
+		return ret;
+	/* Flush all flows created for this port on proxy port. */
+	if (owner_priv->sh->config.dv_esw_en) {
+		ret = rte_flow_pick_transfer_proxy(owner_port_id, &proxy_port_id, NULL);
+		if (ret == -ENODEV) {
+			DRV_LOG(DEBUG, "Unable to find transfer proxy port for port %u. It was "
+				       "probably closed. Control flows were cleared.",
+				       owner_port_id);
+			rte_errno = 0;
+			return 0;
+		} else if (ret) {
+			DRV_LOG(ERR, "Unable to find proxy port for port %u (ret = %d)",
+				owner_port_id, ret);
+			return ret;
+		}
+		proxy_dev = &rte_eth_devices[proxy_port_id];
+	} else {
+		proxy_dev = owner_dev;
+	}
+	return flow_hw_flush_ctrl_flows_owned_by(proxy_dev, owner_dev);
+}
+
 /**
  * Destroys all control flows created on @p dev device.
  *
@@ -8025,6 +8394,9 @@ mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev)
 			.conf = &mreg_action,
 		},
 		[1] = {
+			.type = RTE_FLOW_ACTION_TYPE_JUMP,
+		},
+		[2] = {
 			.type = RTE_FLOW_ACTION_TYPE_END,
 		},
 	};
@@ -8037,6 +8409,60 @@ mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev)
 					eth_all, 0, copy_reg_action, 0);
 }
 
+int
+mlx5_flow_hw_tx_repr_matching_flow(struct rte_eth_dev *dev, uint32_t sqn)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_rte_flow_item_sq sq_spec = {
+		.queue = sqn,
+	};
+	struct rte_flow_item items[] = {
+		{
+			.type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_SQ,
+			.spec = &sq_spec,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	/*
+	 * Allocate actions array suitable for all cases - extended metadata enabled or not.
+	 * With extended metadata there will be an additional MODIFY_FIELD action before JUMP.
+	 */
+	struct rte_flow_action actions[] = {
+		{ .type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD },
+		{ .type = RTE_FLOW_ACTION_TYPE_JUMP },
+		{ .type = RTE_FLOW_ACTION_TYPE_END },
+		{ .type = RTE_FLOW_ACTION_TYPE_END },
+	};
+
+	/* It is assumed that caller checked for representor matching. */
+	MLX5_ASSERT(priv->sh->config.repr_matching);
+	if (!priv->dr_ctx) {
+		DRV_LOG(DEBUG, "Port %u must be configured for HWS, before creating "
+			       "default egress flow rules. Omitting creation.",
+			       dev->data->port_id);
+		return 0;
+	}
+	if (!priv->hw_tx_repr_tagging_tbl) {
+		DRV_LOG(ERR, "Port %u is configured for HWS, but table for default "
+			     "egress flow rules does not exist.",
+			     dev->data->port_id);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	/*
+	 * If extended metadata mode is enabled, then an additional MODIFY_FIELD action must be
+	 * placed before terminating JUMP action.
+	 */
+	if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS) {
+		actions[1].type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD;
+		actions[2].type = RTE_FLOW_ACTION_TYPE_JUMP;
+	}
+	return flow_hw_create_ctrl_flow(dev, dev, priv->hw_tx_repr_tagging_tbl,
+					items, 0, actions, 0);
+}
+
 void
 mlx5_flow_meter_uninit(struct rte_eth_dev *dev)
 {
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 715f2891cf..8c9d5c1b13 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -1065,6 +1065,69 @@ mlx5_hairpin_get_peer_ports(struct rte_eth_dev *dev, uint16_t *peer_ports,
 	return ret;
 }
 
+#ifdef HAVE_MLX5_HWS_SUPPORT
+
+/**
+ * Check if starting representor port is allowed.
+ *
+ * If transfer proxy port is configured for HWS, then starting representor port
+ * is allowed if and only if transfer proxy port is started as well.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   If stopping representor port is allowed, then 0 is returned.
+ *   Otherwise rte_errno is set, and negative errno value is returned.
+ */
+static int
+mlx5_hw_representor_port_allowed_start(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_eth_dev *proxy_dev;
+	struct mlx5_priv *proxy_priv;
+	uint16_t proxy_port_id = UINT16_MAX;
+	int ret;
+
+	MLX5_ASSERT(priv->sh->config.dv_flow_en == 2);
+	MLX5_ASSERT(priv->sh->config.dv_esw_en);
+	MLX5_ASSERT(priv->representor);
+	ret = rte_flow_pick_transfer_proxy(dev->data->port_id, &proxy_port_id, NULL);
+	if (ret) {
+		if (ret == -ENODEV)
+			DRV_LOG(ERR, "Starting representor port %u is not allowed. Transfer "
+				     "proxy port is not available.", dev->data->port_id);
+		else
+			DRV_LOG(ERR, "Failed to pick transfer proxy for port %u (ret = %d)",
+				dev->data->port_id, ret);
+		return ret;
+	}
+	proxy_dev = &rte_eth_devices[proxy_port_id];
+	proxy_priv = proxy_dev->data->dev_private;
+	if (proxy_priv->dr_ctx == NULL) {
+		DRV_LOG(DEBUG, "Starting representor port %u is allowed, but default traffic flows"
+			       " will not be created. Transfer proxy port must be configured"
+			       " for HWS and started.",
+			       dev->data->port_id);
+		return 0;
+	}
+	if (!proxy_dev->data->dev_started) {
+		DRV_LOG(ERR, "Failed to start port %u: transfer proxy (port %u) must be started",
+			     dev->data->port_id, proxy_port_id);
+		rte_errno = EAGAIN;
+		return -rte_errno;
+	}
+	if (priv->sh->config.repr_matching && !priv->dr_ctx) {
+		DRV_LOG(ERR, "Failed to start port %u: with representor matching enabled, port "
+			     "must be configured for HWS", dev->data->port_id);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	return 0;
+}
+
+#endif
+
 /**
  * DPDK callback to start the device.
  *
@@ -1084,6 +1147,19 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 	int fine_inline;
 
 	DRV_LOG(DEBUG, "port %u starting device", dev->data->port_id);
+#ifdef HAVE_MLX5_HWS_SUPPORT
+	if (priv->sh->config.dv_flow_en == 2) {
+		/* If there is no E-Switch, then there are no start/stop order limitations. */
+		if (!priv->sh->config.dv_esw_en)
+			goto continue_dev_start;
+		/* If master is being started, then it is always allowed. */
+		if (priv->master)
+			goto continue_dev_start;
+		if (mlx5_hw_representor_port_allowed_start(dev))
+			return -rte_errno;
+	}
+continue_dev_start:
+#endif
 	fine_inline = rte_mbuf_dynflag_lookup
 		(RTE_PMD_MLX5_FINE_GRANULARITY_INLINE, NULL);
 	if (fine_inline >= 0)
@@ -1248,6 +1324,53 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 	return -rte_errno;
 }
 
+#ifdef HAVE_MLX5_HWS_SUPPORT
+/**
+ * Check if stopping transfer proxy port is allowed.
+ *
+ * If transfer proxy port is configured for HWS, then it is allowed to stop it
+ * if and only if all other representor ports are stopped.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   If stopping transfer proxy port is allowed, then 0 is returned.
+ *   Otherwise rte_errno is set, and negative errno value is returned.
+ */
+static int
+mlx5_hw_proxy_port_allowed_stop(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	bool representor_started = false;
+	uint16_t port_id;
+
+	MLX5_ASSERT(priv->sh->config.dv_flow_en == 2);
+	MLX5_ASSERT(priv->sh->config.dv_esw_en);
+	MLX5_ASSERT(priv->master);
+	/* If transfer proxy port was not configured for HWS, then stopping it is allowed. */
+	if (!priv->dr_ctx)
+		return 0;
+	MLX5_ETH_FOREACH_DEV(port_id, dev->device) {
+		const struct rte_eth_dev *port_dev = &rte_eth_devices[port_id];
+		const struct mlx5_priv *port_priv = port_dev->data->dev_private;
+
+		if (port_id != dev->data->port_id &&
+		    port_priv->domain_id == priv->domain_id &&
+		    port_dev->data->dev_started)
+			representor_started = true;
+	}
+	if (representor_started) {
+		DRV_LOG(INFO, "Failed to stop port %u: attached representor ports"
+			      " must be stopped before stopping transfer proxy port",
+			      dev->data->port_id);
+		rte_errno = EBUSY;
+		return -rte_errno;
+	}
+	return 0;
+}
+#endif
+
 /**
  * DPDK callback to stop the device.
  *
@@ -1261,6 +1384,21 @@ mlx5_dev_stop(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 
+#ifdef HAVE_MLX5_HWS_SUPPORT
+	if (priv->sh->config.dv_flow_en == 2) {
+		/* If there is no E-Switch, then there are no start/stop order limitations. */
+		if (!priv->sh->config.dv_esw_en)
+			goto continue_dev_stop;
+		/* If representor is being stopped, then it is always allowed. */
+		if (priv->representor)
+			goto continue_dev_stop;
+		if (mlx5_hw_proxy_port_allowed_stop(dev)) {
+			dev->data->dev_started = 1;
+			return -rte_errno;
+		}
+	}
+continue_dev_stop:
+#endif
 	dev->data->dev_started = 0;
 	/* Prevent crashes when queues are still in use. */
 	dev->rx_pkt_burst = rte_eth_pkt_burst_dummy;
@@ -1296,13 +1434,21 @@ static int
 mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_sh_config *config = &priv->sh->config;
 	unsigned int i;
 	int ret;
 
-	if (priv->sh->config.dv_esw_en && priv->master) {
-		if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS)
-			if (mlx5_flow_hw_create_tx_default_mreg_copy_flow(dev))
-				goto error;
+	/*
+	 * With extended metadata enabled, the Tx metadata copy is handled by default
+	 * Tx tagging flow rules, so default Tx flow rule is not needed. It is only
+	 * required when representor matching is disabled.
+	 */
+	if (config->dv_esw_en &&
+	    !config->repr_matching &&
+	    config->dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
+	    priv->master) {
+		if (mlx5_flow_hw_create_tx_default_mreg_copy_flow(dev))
+			goto error;
 	}
 	for (i = 0; i < priv->txqs_n; ++i) {
 		struct mlx5_txq_ctrl *txq = mlx5_txq_get(dev, i);
@@ -1311,17 +1457,22 @@ mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
 		if (!txq)
 			continue;
 		queue = mlx5_txq_get_sqn(txq);
-		if ((priv->representor || priv->master) &&
-		    priv->sh->config.dv_esw_en) {
+		if ((priv->representor || priv->master) && config->dv_esw_en) {
 			if (mlx5_flow_hw_esw_create_sq_miss_flow(dev, queue)) {
 				mlx5_txq_release(dev, i);
 				goto error;
 			}
 		}
+		if (config->dv_esw_en && config->repr_matching) {
+			if (mlx5_flow_hw_tx_repr_matching_flow(dev, queue)) {
+				mlx5_txq_release(dev, i);
+				goto error;
+			}
+		}
 		mlx5_txq_release(dev, i);
 	}
-	if (priv->sh->config.fdb_def_rule) {
-		if ((priv->master || priv->representor) && priv->sh->config.dv_esw_en) {
+	if (config->fdb_def_rule) {
+		if ((priv->master || priv->representor) && config->dv_esw_en) {
 			if (!mlx5_flow_hw_esw_create_default_jump_flow(dev))
 				priv->fdb_def_rule = 1;
 			else
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v6 18/18] net/mlx5: create control flow rules with HWS
  2022-10-20 15:41 ` [PATCH v6 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (16 preceding siblings ...)
  2022-10-20 15:41   ` [PATCH v6 17/18] net/mlx5: support device control of representor matching Suanming Mou
@ 2022-10-20 15:41   ` Suanming Mou
  2022-10-24  9:48     ` Slava Ovsiienko
  2022-10-24 10:57   ` [PATCH v6 00/18] net/mlx5: HW steering PMD update Raslan Darawsheh
  18 siblings, 1 reply; 140+ messages in thread
From: Suanming Mou @ 2022-10-20 15:41 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland, orika, Dariusz Sosnowski

From: Dariusz Sosnowski <dsosnowski@nvidia.com>

This patch adds the creation of control flow rules required to receive
default traffic (based on port configuration) with HWS.

Control flow rules are created on port start and destroyed on port stop.
Handling of destroying these rules was already implemented before this
patch.

Control flow rules are created if and only if flow isolation mode is
disabled, and the creation process goes as follows (a minimal sketch of
this loop is given after the list):

- Port configuration is collected into a set of flags. Each flag
  corresponds to a certain Ethernet pattern type, defined by the
  mlx5_flow_ctrl_rx_eth_pattern_type enumeration. There is a separate
  flag for VLAN filtering.
- For each possible Ethernet pattern type:
  - For each possible RSS action configuration:
    - If configuration flags do not match this combination, it is
      omitted.
    - A template table is created using this combination of pattern
      and actions templates (the templates are fetched from the
      hw_ctrl_rx struct stored in the port's private data).
    - Flow rules are created in this table.
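
A minimal sketch of this creation loop follows (illustrative only; the
loop bounds mirror the mlx5_flow_ctrl_rx_* enumerations added in
mlx5_flow.h, while combination_requested(), table_create() and
rules_create() are placeholder stubs, not functions introduced by this
patch):

    #include <stdbool.h>
    #include <stdint.h>

    #define ETH_PATTERN_MAX 10 /* MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_MAX */
    #define RSS_TYPE_MAX     7 /* MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_MAX */

    /* Placeholder stubs standing in for the real PMD helpers. */
    static bool combination_requested(uint32_t flags, int eth, int rss)
    { (void)flags; (void)eth; (void)rss; return true; }
    static int table_create(int eth, int rss)
    { (void)eth; (void)rss; return 0; }
    static int rules_create(int eth, int rss)
    { (void)eth; (void)rss; return 0; }

    static int
    ctrl_flows_sketch(uint32_t flags)
    {
            int eth, rss;

            for (eth = 0; eth < ETH_PATTERN_MAX; eth++) {
                    for (rss = 0; rss < RSS_TYPE_MAX; rss++) {
                            /* Skip combinations not enabled by port config. */
                            if (!combination_requested(flags, eth, rss))
                                    continue;
                            /* Create a table from templates, then its rules. */
                            if (table_create(eth, rss) || rules_create(eth, rss))
                                    return -1;
                    }
            }
            return 0;
    }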

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 doc/guides/rel_notes/release_22_11.rst |   1 +
 drivers/net/mlx5/mlx5.h                |   4 +
 drivers/net/mlx5/mlx5_flow.h           |  56 ++
 drivers/net/mlx5/mlx5_flow_hw.c        | 799 +++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_rxq.c            |   3 +-
 drivers/net/mlx5/mlx5_trigger.c        |  20 +-
 6 files changed, 881 insertions(+), 2 deletions(-)

diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 725382c1b7..a056109e17 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -257,6 +257,7 @@ New Features
     - Support of meter.
     - Support of counter.
     - Support of CT.
+    - Support of control flow and isolate mode.
 
 * **Rewritten pmdinfo script.**
 
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 5a961a69b7..c9fcb71b69 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1642,6 +1642,8 @@ struct mlx5_hw_ctrl_flow {
 	struct rte_flow *flow;
 };
 
+struct mlx5_flow_hw_ctrl_rx;
+
 struct mlx5_priv {
 	struct rte_eth_dev_data *dev_data;  /* Pointer to device data. */
 	struct mlx5_dev_ctx_shared *sh; /* Shared device context. */
@@ -1773,6 +1775,8 @@ struct mlx5_priv {
 	/* Management data for ASO connection tracking. */
 	struct mlx5_aso_ct_pool *hws_ctpool; /* HW steering's CT pool. */
 	struct mlx5_aso_mtr_pool *hws_mpool; /* HW steering's Meter pool. */
+	struct mlx5_flow_hw_ctrl_rx *hw_ctrl_rx;
+	/**< HW steering templates used to create control flow rules. */
 #endif
 };
 
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index c3a5fba25e..85d2fd219d 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -2116,6 +2116,62 @@ rte_col_2_mlx5_col(enum rte_color rcol)
 	return MLX5_FLOW_COLOR_UNDEFINED;
 }
 
+/* All types of Ethernet patterns used in control flow rules. */
+enum mlx5_flow_ctrl_rx_eth_pattern_type {
+	MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_ALL = 0,
+	MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_ALL_MCAST,
+	MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST,
+	MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST_VLAN,
+	MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST,
+	MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST_VLAN,
+	MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST,
+	MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST_VLAN,
+	MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_DMAC,
+	MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_DMAC_VLAN,
+	MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_MAX,
+};
+
+/* All types of RSS actions used in control flow rules. */
+enum mlx5_flow_ctrl_rx_expanded_rss_type {
+	MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_NON_IP = 0,
+	MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4,
+	MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4_UDP,
+	MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4_TCP,
+	MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6,
+	MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6_UDP,
+	MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6_TCP,
+	MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_MAX,
+};
+
+/**
+ * Contains pattern template, template table and its attributes for a single
+ * combination of Ethernet pattern and RSS action. Used to create control flow rules
+ * with HWS.
+ */
+struct mlx5_flow_hw_ctrl_rx_table {
+	struct rte_flow_template_table_attr attr;
+	struct rte_flow_pattern_template *pt;
+	struct rte_flow_template_table *tbl;
+};
+
+/* Contains all templates required to create control flow rules with HWS. */
+struct mlx5_flow_hw_ctrl_rx {
+	struct rte_flow_actions_template *rss[MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_MAX];
+	struct mlx5_flow_hw_ctrl_rx_table tables[MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_MAX]
+						[MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_MAX];
+};
+
+#define MLX5_CTRL_PROMISCUOUS    (RTE_BIT32(0))
+#define MLX5_CTRL_ALL_MULTICAST  (RTE_BIT32(1))
+#define MLX5_CTRL_BROADCAST      (RTE_BIT32(2))
+#define MLX5_CTRL_IPV4_MULTICAST (RTE_BIT32(3))
+#define MLX5_CTRL_IPV6_MULTICAST (RTE_BIT32(4))
+#define MLX5_CTRL_DMAC           (RTE_BIT32(5))
+#define MLX5_CTRL_VLAN_FILTER    (RTE_BIT32(6))
+
+int mlx5_flow_hw_ctrl_flows(struct rte_eth_dev *dev, uint32_t flags);
+void mlx5_flow_hw_cleanup_ctrl_rx_templates(struct rte_eth_dev *dev);
+
 int mlx5_flow_group_to_table(struct rte_eth_dev *dev,
 			     const struct mlx5_flow_tunnel *tunnel,
 			     uint32_t group, uint32_t *table,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index d036240794..2d275ad111 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -47,6 +47,11 @@
 /* Lowest priority for HW non-root table. */
 #define MLX5_HW_LOWEST_PRIO_NON_ROOT (UINT32_MAX)
 
+/* Priorities for Rx control flow rules. */
+#define MLX5_HW_CTRL_RX_PRIO_L2 (MLX5_HW_LOWEST_PRIO_ROOT)
+#define MLX5_HW_CTRL_RX_PRIO_L3 (MLX5_HW_LOWEST_PRIO_ROOT - 1)
+#define MLX5_HW_CTRL_RX_PRIO_L4 (MLX5_HW_LOWEST_PRIO_ROOT - 2)
+
 #define MLX5_HW_VLAN_PUSH_TYPE_IDX 0
 #define MLX5_HW_VLAN_PUSH_VID_IDX 1
 #define MLX5_HW_VLAN_PUSH_PCP_IDX 2
@@ -84,6 +89,72 @@ static uint32_t mlx5_hw_act_flag[MLX5_HW_ACTION_FLAG_MAX]
 	},
 };
 
+/* Ethernet item spec for promiscuous mode. */
+static const struct rte_flow_item_eth ctrl_rx_eth_promisc_spec = {
+	.dst.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	.type = 0,
+};
+/* Ethernet item mask for promiscuous mode. */
+static const struct rte_flow_item_eth ctrl_rx_eth_promisc_mask = {
+	.dst.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	.type = 0,
+};
+
+/* Ethernet item spec for all multicast mode. */
+static const struct rte_flow_item_eth ctrl_rx_eth_mcast_spec = {
+	.dst.addr_bytes = "\x01\x00\x00\x00\x00\x00",
+	.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	.type = 0,
+};
+/* Ethernet item mask for all multicast mode. */
+static const struct rte_flow_item_eth ctrl_rx_eth_mcast_mask = {
+	.dst.addr_bytes = "\x01\x00\x00\x00\x00\x00",
+	.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	.type = 0,
+};
+
+/* Ethernet item spec for IPv4 multicast traffic. */
+static const struct rte_flow_item_eth ctrl_rx_eth_ipv4_mcast_spec = {
+	.dst.addr_bytes = "\x01\x00\x5e\x00\x00\x00",
+	.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	.type = 0,
+};
+/* Ethernet item mask for IPv4 multicast traffic. */
+static const struct rte_flow_item_eth ctrl_rx_eth_ipv4_mcast_mask = {
+	.dst.addr_bytes = "\xff\xff\xff\x00\x00\x00",
+	.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	.type = 0,
+};
+
+/* Ethernet item spec for IPv6 multicast traffic. */
+static const struct rte_flow_item_eth ctrl_rx_eth_ipv6_mcast_spec = {
+	.dst.addr_bytes = "\x33\x33\x00\x00\x00\x00",
+	.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	.type = 0,
+};
+/* Ethernet item mask for IPv6 multicast traffic. */
+static const struct rte_flow_item_eth ctrl_rx_eth_ipv6_mcast_mask = {
+	.dst.addr_bytes = "\xff\xff\x00\x00\x00\x00",
+	.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	.type = 0,
+};
+
+/* Ethernet item mask for unicast traffic. */
+static const struct rte_flow_item_eth ctrl_rx_eth_dmac_mask = {
+	.dst.addr_bytes = "\xff\xff\xff\xff\xff\xff",
+	.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	.type = 0,
+};
+
+/* Ethernet item spec for broadcast. */
+static const struct rte_flow_item_eth ctrl_rx_eth_bcast_spec = {
+	.dst.addr_bytes = "\xff\xff\xff\xff\xff\xff",
+	.src.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	.type = 0,
+};
+
 /**
  * Set rxq flag.
  *
@@ -6349,6 +6420,365 @@ flow_hw_create_vlan(struct rte_eth_dev *dev)
 	return 0;
 }
 
+static void
+flow_hw_cleanup_ctrl_rx_tables(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	unsigned int i;
+	unsigned int j;
+
+	if (!priv->hw_ctrl_rx)
+		return;
+	for (i = 0; i < MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_MAX; ++i) {
+		for (j = 0; j < MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_MAX; ++j) {
+			struct rte_flow_template_table *tbl = priv->hw_ctrl_rx->tables[i][j].tbl;
+			struct rte_flow_pattern_template *pt = priv->hw_ctrl_rx->tables[i][j].pt;
+
+			if (tbl)
+				claim_zero(flow_hw_table_destroy(dev, tbl, NULL));
+			if (pt)
+				claim_zero(flow_hw_pattern_template_destroy(dev, pt, NULL));
+		}
+	}
+	for (i = 0; i < MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_MAX; ++i) {
+		struct rte_flow_actions_template *at = priv->hw_ctrl_rx->rss[i];
+
+		if (at)
+			claim_zero(flow_hw_actions_template_destroy(dev, at, NULL));
+	}
+	mlx5_free(priv->hw_ctrl_rx);
+	priv->hw_ctrl_rx = NULL;
+}
+
+static uint64_t
+flow_hw_ctrl_rx_rss_type_hash_types(const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type)
+{
+	switch (rss_type) {
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_NON_IP:
+		return 0;
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4:
+		return RTE_ETH_RSS_IPV4 | RTE_ETH_RSS_FRAG_IPV4 | RTE_ETH_RSS_NONFRAG_IPV4_OTHER;
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4_UDP:
+		return RTE_ETH_RSS_NONFRAG_IPV4_UDP;
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4_TCP:
+		return RTE_ETH_RSS_NONFRAG_IPV4_TCP;
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6:
+		return RTE_ETH_RSS_IPV6 | RTE_ETH_RSS_FRAG_IPV6 | RTE_ETH_RSS_NONFRAG_IPV6_OTHER;
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6_UDP:
+		return RTE_ETH_RSS_NONFRAG_IPV6_UDP | RTE_ETH_RSS_IPV6_UDP_EX;
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6_TCP:
+		return RTE_ETH_RSS_NONFRAG_IPV6_TCP | RTE_ETH_RSS_IPV6_TCP_EX;
+	default:
+		/* Should not reach here. */
+		MLX5_ASSERT(false);
+		return 0;
+	}
+}
+
+static struct rte_flow_actions_template *
+flow_hw_create_ctrl_rx_rss_template(struct rte_eth_dev *dev,
+				    const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_flow_actions_template_attr attr = {
+		.ingress = 1,
+	};
+	uint16_t queue[RTE_MAX_QUEUES_PER_PORT];
+	struct rte_flow_action_rss rss_conf = {
+		.func = RTE_ETH_HASH_FUNCTION_DEFAULT,
+		.level = 0,
+		.types = 0,
+		.key_len = priv->rss_conf.rss_key_len,
+		.key = priv->rss_conf.rss_key,
+		.queue_num = priv->reta_idx_n,
+		.queue = queue,
+	};
+	struct rte_flow_action actions[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_RSS,
+			.conf = &rss_conf,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+	struct rte_flow_action masks[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_RSS,
+			.conf = &rss_conf,
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		}
+	};
+	struct rte_flow_actions_template *at;
+	struct rte_flow_error error;
+	unsigned int i;
+
+	MLX5_ASSERT(priv->reta_idx_n > 0 && priv->reta_idx);
+	/* Select proper RSS hash types and based on that configure the actions template. */
+	rss_conf.types = flow_hw_ctrl_rx_rss_type_hash_types(rss_type);
+	if (rss_conf.types) {
+		for (i = 0; i < priv->reta_idx_n; ++i)
+			queue[i] = (*priv->reta_idx)[i];
+	} else {
+		rss_conf.queue_num = 1;
+		queue[0] = (*priv->reta_idx)[0];
+	}
+	at = flow_hw_actions_template_create(dev, &attr, actions, masks, &error);
+	if (!at)
+		DRV_LOG(ERR,
+			"Failed to create ctrl flow actions template: rte_errno(%d), type(%d): %s",
+			rte_errno, error.type,
+			error.message ? error.message : "(no stated reason)");
+	return at;
+}
+
+static uint32_t ctrl_rx_rss_priority_map[MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_MAX] = {
+	[MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_NON_IP] = MLX5_HW_CTRL_RX_PRIO_L2,
+	[MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4] = MLX5_HW_CTRL_RX_PRIO_L3,
+	[MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4_UDP] = MLX5_HW_CTRL_RX_PRIO_L4,
+	[MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4_TCP] = MLX5_HW_CTRL_RX_PRIO_L4,
+	[MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6] = MLX5_HW_CTRL_RX_PRIO_L3,
+	[MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6_UDP] = MLX5_HW_CTRL_RX_PRIO_L4,
+	[MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6_TCP] = MLX5_HW_CTRL_RX_PRIO_L4,
+};
+
+static uint32_t ctrl_rx_nb_flows_map[MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_MAX] = {
+	[MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_ALL] = 1,
+	[MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_ALL_MCAST] = 1,
+	[MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST] = 1,
+	[MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST_VLAN] = MLX5_MAX_VLAN_IDS,
+	[MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST] = 1,
+	[MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST_VLAN] = MLX5_MAX_VLAN_IDS,
+	[MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST] = 1,
+	[MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST_VLAN] = MLX5_MAX_VLAN_IDS,
+	[MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_DMAC] = MLX5_MAX_UC_MAC_ADDRESSES,
+	[MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_DMAC_VLAN] =
+			MLX5_MAX_UC_MAC_ADDRESSES * MLX5_MAX_VLAN_IDS,
+};
+
+static struct rte_flow_template_table_attr
+flow_hw_get_ctrl_rx_table_attr(enum mlx5_flow_ctrl_rx_eth_pattern_type eth_pattern_type,
+			       const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type)
+{
+	return (struct rte_flow_template_table_attr){
+		.flow_attr = {
+			.group = 0,
+			.priority = ctrl_rx_rss_priority_map[rss_type],
+			.ingress = 1,
+		},
+		.nb_flows = ctrl_rx_nb_flows_map[eth_pattern_type],
+	};
+}
+
+static struct rte_flow_item
+flow_hw_get_ctrl_rx_eth_item(const enum mlx5_flow_ctrl_rx_eth_pattern_type eth_pattern_type)
+{
+	struct rte_flow_item item = {
+		.type = RTE_FLOW_ITEM_TYPE_ETH,
+		.mask = NULL,
+	};
+
+	switch (eth_pattern_type) {
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_ALL:
+		item.mask = &ctrl_rx_eth_promisc_mask;
+		break;
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_ALL_MCAST:
+		item.mask = &ctrl_rx_eth_mcast_mask;
+		break;
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST_VLAN:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_DMAC:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_DMAC_VLAN:
+		item.mask = &ctrl_rx_eth_dmac_mask;
+		break;
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST_VLAN:
+		item.mask = &ctrl_rx_eth_ipv4_mcast_mask;
+		break;
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST_VLAN:
+		item.mask = &ctrl_rx_eth_ipv6_mcast_mask;
+		break;
+	default:
+		/* Should not reach here - ETH mask must be present. */
+		item.type = RTE_FLOW_ITEM_TYPE_END;
+		MLX5_ASSERT(false);
+		break;
+	}
+	return item;
+}
+
+static struct rte_flow_item
+flow_hw_get_ctrl_rx_vlan_item(const enum mlx5_flow_ctrl_rx_eth_pattern_type eth_pattern_type)
+{
+	struct rte_flow_item item = {
+		.type = RTE_FLOW_ITEM_TYPE_VOID,
+		.mask = NULL,
+	};
+
+	switch (eth_pattern_type) {
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST_VLAN:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST_VLAN:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST_VLAN:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_DMAC_VLAN:
+		item.type = RTE_FLOW_ITEM_TYPE_VLAN;
+		item.mask = &rte_flow_item_vlan_mask;
+		break;
+	default:
+		/* Nothing to update. */
+		break;
+	}
+	return item;
+}
+
+static struct rte_flow_item
+flow_hw_get_ctrl_rx_l3_item(const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type)
+{
+	struct rte_flow_item item = {
+		.type = RTE_FLOW_ITEM_TYPE_VOID,
+		.mask = NULL,
+	};
+
+	switch (rss_type) {
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4:
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4_UDP:
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4_TCP:
+		item.type = RTE_FLOW_ITEM_TYPE_IPV4;
+		break;
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6:
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6_UDP:
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6_TCP:
+		item.type = RTE_FLOW_ITEM_TYPE_IPV6;
+		break;
+	default:
+		/* Nothing to update. */
+		break;
+	}
+	return item;
+}
+
+static struct rte_flow_item
+flow_hw_get_ctrl_rx_l4_item(const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type)
+{
+	struct rte_flow_item item = {
+		.type = RTE_FLOW_ITEM_TYPE_VOID,
+		.mask = NULL,
+	};
+
+	switch (rss_type) {
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4_UDP:
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6_UDP:
+		item.type = RTE_FLOW_ITEM_TYPE_UDP;
+		break;
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV4_TCP:
+	case MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_IPV6_TCP:
+		item.type = RTE_FLOW_ITEM_TYPE_TCP;
+		break;
+	default:
+		/* Nothing to update. */
+		break;
+	}
+	return item;
+}
+
+static struct rte_flow_pattern_template *
+flow_hw_create_ctrl_rx_pattern_template
+		(struct rte_eth_dev *dev,
+		 const enum mlx5_flow_ctrl_rx_eth_pattern_type eth_pattern_type,
+		 const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type)
+{
+	const struct rte_flow_pattern_template_attr attr = {
+		.relaxed_matching = 0,
+		.ingress = 1,
+	};
+	struct rte_flow_item items[] = {
+		/* Matching patterns */
+		flow_hw_get_ctrl_rx_eth_item(eth_pattern_type),
+		flow_hw_get_ctrl_rx_vlan_item(eth_pattern_type),
+		flow_hw_get_ctrl_rx_l3_item(rss_type),
+		flow_hw_get_ctrl_rx_l4_item(rss_type),
+		/* Terminate pattern */
+		{ .type = RTE_FLOW_ITEM_TYPE_END }
+	};
+
+	return flow_hw_pattern_template_create(dev, &attr, items, NULL);
+}
+
+static int
+flow_hw_create_ctrl_rx_tables(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	unsigned int i;
+	unsigned int j;
+	int ret;
+
+	MLX5_ASSERT(!priv->hw_ctrl_rx);
+	priv->hw_ctrl_rx = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*priv->hw_ctrl_rx),
+				       RTE_CACHE_LINE_SIZE, rte_socket_id());
+	if (!priv->hw_ctrl_rx) {
+		DRV_LOG(ERR, "Failed to allocate memory for Rx control flow tables");
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	/* Create all pattern template variants. */
+	for (i = 0; i < MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_MAX; ++i) {
+		enum mlx5_flow_ctrl_rx_eth_pattern_type eth_pattern_type = i;
+
+		for (j = 0; j < MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_MAX; ++j) {
+			const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type = j;
+			struct rte_flow_template_table_attr attr;
+			struct rte_flow_pattern_template *pt;
+
+			attr = flow_hw_get_ctrl_rx_table_attr(eth_pattern_type, rss_type);
+			pt = flow_hw_create_ctrl_rx_pattern_template(dev, eth_pattern_type,
+								     rss_type);
+			if (!pt)
+				goto err;
+			priv->hw_ctrl_rx->tables[i][j].attr = attr;
+			priv->hw_ctrl_rx->tables[i][j].pt = pt;
+		}
+	}
+	return 0;
+err:
+	ret = rte_errno;
+	flow_hw_cleanup_ctrl_rx_tables(dev);
+	rte_errno = ret;
+	return -ret;
+}
+
+void
+mlx5_flow_hw_cleanup_ctrl_rx_templates(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_hw_ctrl_rx *hw_ctrl_rx;
+	unsigned int i;
+	unsigned int j;
+
+	if (!priv->dr_ctx)
+		return;
+	if (!priv->hw_ctrl_rx)
+		return;
+	hw_ctrl_rx = priv->hw_ctrl_rx;
+	for (i = 0; i < MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_MAX; ++i) {
+		for (j = 0; j < MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_MAX; ++j) {
+			struct mlx5_flow_hw_ctrl_rx_table *tmpls = &hw_ctrl_rx->tables[i][j];
+
+			if (tmpls->tbl) {
+				claim_zero(flow_hw_table_destroy(dev, tmpls->tbl, NULL));
+				tmpls->tbl = NULL;
+			}
+		}
+	}
+	for (j = 0; j < MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_MAX; ++j) {
+		if (hw_ctrl_rx->rss[j]) {
+			claim_zero(flow_hw_actions_template_destroy(dev, hw_ctrl_rx->rss[j], NULL));
+			hw_ctrl_rx->rss[j] = NULL;
+		}
+	}
+}
+
 /**
  * Configure port HWS resources.
  *
@@ -6515,6 +6945,12 @@ flow_hw_configure(struct rte_eth_dev *dev,
 	priv->nb_queue = nb_q_updated;
 	rte_spinlock_init(&priv->hw_ctrl_lock);
 	LIST_INIT(&priv->hw_ctrl_flows);
+	ret = flow_hw_create_ctrl_rx_tables(dev);
+	if (ret) {
+		rte_flow_error_set(error, -ret, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				   "Failed to set up Rx control flow templates");
+		goto err;
+	}
 	/* Initialize meter library*/
 	if (port_attr->nb_meters)
 		if (mlx5_flow_meter_init(dev, port_attr->nb_meters, 1, 1, nb_q_updated))
@@ -6668,6 +7104,7 @@ flow_hw_resource_release(struct rte_eth_dev *dev)
 	flow_hw_rxq_flag_set(dev, false);
 	flow_hw_flush_all_ctrl_flows(dev);
 	flow_hw_cleanup_tx_repr_tagging(dev);
+	flow_hw_cleanup_ctrl_rx_tables(dev);
 	while (!LIST_EMPTY(&priv->flow_hw_tbl_ongo)) {
 		tbl = LIST_FIRST(&priv->flow_hw_tbl_ongo);
 		flow_hw_table_destroy(dev, tbl, NULL);
@@ -8463,6 +8900,368 @@ mlx5_flow_hw_tx_repr_matching_flow(struct rte_eth_dev *dev, uint32_t sqn)
 					items, 0, actions, 0);
 }
 
+static uint32_t
+__calc_pattern_flags(const enum mlx5_flow_ctrl_rx_eth_pattern_type eth_pattern_type)
+{
+	switch (eth_pattern_type) {
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_ALL:
+		return MLX5_CTRL_PROMISCUOUS;
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_ALL_MCAST:
+		return MLX5_CTRL_ALL_MULTICAST;
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST_VLAN:
+		return MLX5_CTRL_BROADCAST;
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST_VLAN:
+		return MLX5_CTRL_IPV4_MULTICAST;
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST_VLAN:
+		return MLX5_CTRL_IPV6_MULTICAST;
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_DMAC:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_DMAC_VLAN:
+		return MLX5_CTRL_DMAC;
+	default:
+		/* Should not reach here. */
+		MLX5_ASSERT(false);
+		return 0;
+	}
+}
+
+static uint32_t
+__calc_vlan_flags(const enum mlx5_flow_ctrl_rx_eth_pattern_type eth_pattern_type)
+{
+	switch (eth_pattern_type) {
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST_VLAN:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST_VLAN:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST_VLAN:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_DMAC_VLAN:
+		return MLX5_CTRL_VLAN_FILTER;
+	default:
+		return 0;
+	}
+}
+
+static bool
+eth_pattern_type_is_requested(const enum mlx5_flow_ctrl_rx_eth_pattern_type eth_pattern_type,
+			      uint32_t flags)
+{
+	uint32_t pattern_flags = __calc_pattern_flags(eth_pattern_type);
+	uint32_t vlan_flags = __calc_vlan_flags(eth_pattern_type);
+	bool pattern_requested = !!(pattern_flags & flags);
+	bool consider_vlan = vlan_flags || (MLX5_CTRL_VLAN_FILTER & flags);
+	bool vlan_requested = !!(vlan_flags & flags);
+
+	if (consider_vlan)
+		return pattern_requested && vlan_requested;
+	else
+		return pattern_requested;
+}
+
+static bool
+rss_type_is_requested(struct mlx5_priv *priv,
+		      const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type)
+{
+	struct rte_flow_actions_template *at = priv->hw_ctrl_rx->rss[rss_type];
+	unsigned int i;
+
+	for (i = 0; at->actions[i].type != RTE_FLOW_ACTION_TYPE_END; ++i) {
+		if (at->actions[i].type == RTE_FLOW_ACTION_TYPE_RSS) {
+			const struct rte_flow_action_rss *rss = at->actions[i].conf;
+			uint64_t rss_types = rss->types;
+
+			if ((rss_types & priv->rss_conf.rss_hf) != rss_types)
+				return false;
+		}
+	}
+	return true;
+}
+
+static const struct rte_flow_item_eth *
+__get_eth_spec(const enum mlx5_flow_ctrl_rx_eth_pattern_type pattern)
+{
+	switch (pattern) {
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_ALL:
+		return &ctrl_rx_eth_promisc_spec;
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_ALL_MCAST:
+		return &ctrl_rx_eth_mcast_spec;
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST_VLAN:
+		return &ctrl_rx_eth_bcast_spec;
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST_VLAN:
+		return &ctrl_rx_eth_ipv4_mcast_spec;
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST_VLAN:
+		return &ctrl_rx_eth_ipv6_mcast_spec;
+	default:
+		/* This case should not be reached. */
+		MLX5_ASSERT(false);
+		return NULL;
+	}
+}
+
+static int
+__flow_hw_ctrl_flows_single(struct rte_eth_dev *dev,
+			    struct rte_flow_template_table *tbl,
+			    const enum mlx5_flow_ctrl_rx_eth_pattern_type pattern_type,
+			    const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type)
+{
+	const struct rte_flow_item_eth *eth_spec = __get_eth_spec(pattern_type);
+	struct rte_flow_item items[5];
+	struct rte_flow_action actions[] = {
+		{ .type = RTE_FLOW_ACTION_TYPE_RSS },
+		{ .type = RTE_FLOW_ACTION_TYPE_END },
+	};
+
+	if (!eth_spec)
+		return -EINVAL;
+	memset(items, 0, sizeof(items));
+	items[0] = (struct rte_flow_item){
+		.type = RTE_FLOW_ITEM_TYPE_ETH,
+		.spec = eth_spec,
+	};
+	items[1] = (struct rte_flow_item){ .type = RTE_FLOW_ITEM_TYPE_VOID };
+	items[2] = flow_hw_get_ctrl_rx_l3_item(rss_type);
+	items[3] = flow_hw_get_ctrl_rx_l4_item(rss_type);
+	items[4] = (struct rte_flow_item){ .type = RTE_FLOW_ITEM_TYPE_END };
+	/* Without VLAN filtering, only a single flow rule must be created. */
+	return flow_hw_create_ctrl_flow(dev, dev, tbl, items, 0, actions, 0);
+}
+
+static int
+__flow_hw_ctrl_flows_single_vlan(struct rte_eth_dev *dev,
+				 struct rte_flow_template_table *tbl,
+				 const enum mlx5_flow_ctrl_rx_eth_pattern_type pattern_type,
+				 const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	const struct rte_flow_item_eth *eth_spec = __get_eth_spec(pattern_type);
+	struct rte_flow_item items[5];
+	struct rte_flow_action actions[] = {
+		{ .type = RTE_FLOW_ACTION_TYPE_RSS },
+		{ .type = RTE_FLOW_ACTION_TYPE_END },
+	};
+	unsigned int i;
+
+	if (!eth_spec)
+		return -EINVAL;
+	memset(items, 0, sizeof(items));
+	items[0] = (struct rte_flow_item){
+		.type = RTE_FLOW_ITEM_TYPE_ETH,
+		.spec = eth_spec,
+	};
+	/* The VLAN item spec is filled per VLAN ID in the loop below. */
+	items[1] = (struct rte_flow_item){ .type = RTE_FLOW_ITEM_TYPE_VLAN };
+	items[2] = flow_hw_get_ctrl_rx_l3_item(rss_type);
+	items[3] = flow_hw_get_ctrl_rx_l4_item(rss_type);
+	items[4] = (struct rte_flow_item){ .type = RTE_FLOW_ITEM_TYPE_END };
+	/* Since VLAN filtering is enabled, create a single flow rule for each registered VID. */
+	for (i = 0; i < priv->vlan_filter_n; ++i) {
+		uint16_t vlan = priv->vlan_filter[i];
+		struct rte_flow_item_vlan vlan_spec = {
+			.tci = rte_cpu_to_be_16(vlan),
+		};
+
+		items[1].spec = &vlan_spec;
+		if (flow_hw_create_ctrl_flow(dev, dev, tbl, items, 0, actions, 0))
+			return -rte_errno;
+	}
+	return 0;
+}
+
+static int
+__flow_hw_ctrl_flows_unicast(struct rte_eth_dev *dev,
+			     struct rte_flow_template_table *tbl,
+			     const enum mlx5_flow_ctrl_rx_eth_pattern_type pattern_type,
+			     const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type)
+{
+	struct rte_flow_item_eth eth_spec;
+	struct rte_flow_item items[5];
+	struct rte_flow_action actions[] = {
+		{ .type = RTE_FLOW_ACTION_TYPE_RSS },
+		{ .type = RTE_FLOW_ACTION_TYPE_END },
+	};
+	const struct rte_ether_addr cmp = {
+		.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	};
+	unsigned int i;
+
+	RTE_SET_USED(pattern_type);
+
+	memset(&eth_spec, 0, sizeof(eth_spec));
+	memset(items, 0, sizeof(items));
+	items[0] = (struct rte_flow_item){
+		.type = RTE_FLOW_ITEM_TYPE_ETH,
+		.spec = &eth_spec,
+	};
+	items[1] = (struct rte_flow_item){ .type = RTE_FLOW_ITEM_TYPE_VOID };
+	items[2] = flow_hw_get_ctrl_rx_l3_item(rss_type);
+	items[3] = flow_hw_get_ctrl_rx_l4_item(rss_type);
+	items[4] = (struct rte_flow_item){ .type = RTE_FLOW_ITEM_TYPE_END };
+	for (i = 0; i < MLX5_MAX_MAC_ADDRESSES; ++i) {
+		struct rte_ether_addr *mac = &dev->data->mac_addrs[i];
+
+		if (!memcmp(mac, &cmp, sizeof(*mac)))
+			continue;
+		memcpy(&eth_spec.dst.addr_bytes, mac->addr_bytes, RTE_ETHER_ADDR_LEN);
+		if (flow_hw_create_ctrl_flow(dev, dev, tbl, items, 0, actions, 0))
+			return -rte_errno;
+	}
+	return 0;
+}
+
+static int
+__flow_hw_ctrl_flows_unicast_vlan(struct rte_eth_dev *dev,
+				  struct rte_flow_template_table *tbl,
+				  const enum mlx5_flow_ctrl_rx_eth_pattern_type pattern_type,
+				  const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct rte_flow_item_eth eth_spec;
+	struct rte_flow_item items[5];
+	struct rte_flow_action actions[] = {
+		{ .type = RTE_FLOW_ACTION_TYPE_RSS },
+		{ .type = RTE_FLOW_ACTION_TYPE_END },
+	};
+	const struct rte_ether_addr cmp = {
+		.addr_bytes = "\x00\x00\x00\x00\x00\x00",
+	};
+	unsigned int i;
+	unsigned int j;
+
+	RTE_SET_USED(pattern_type);
+
+	memset(&eth_spec, 0, sizeof(eth_spec));
+	memset(items, 0, sizeof(items));
+	items[0] = (struct rte_flow_item){
+		.type = RTE_FLOW_ITEM_TYPE_ETH,
+		.spec = &eth_spec,
+	};
+	items[1] = (struct rte_flow_item){ .type = RTE_FLOW_ITEM_TYPE_VLAN };
+	items[2] = flow_hw_get_ctrl_rx_l3_item(rss_type);
+	items[3] = flow_hw_get_ctrl_rx_l4_item(rss_type);
+	items[4] = (struct rte_flow_item){ .type = RTE_FLOW_ITEM_TYPE_END };
+	for (i = 0; i < MLX5_MAX_MAC_ADDRESSES; ++i) {
+		struct rte_ether_addr *mac = &dev->data->mac_addrs[i];
+
+		if (!memcmp(mac, &cmp, sizeof(*mac)))
+			continue;
+		memcpy(&eth_spec.dst.addr_bytes, mac->addr_bytes, RTE_ETHER_ADDR_LEN);
+		for (j = 0; j < priv->vlan_filter_n; ++j) {
+			uint16_t vlan = priv->vlan_filter[j];
+			struct rte_flow_item_vlan vlan_spec = {
+				.tci = rte_cpu_to_be_16(vlan),
+			};
+
+			items[1].spec = &vlan_spec;
+			if (flow_hw_create_ctrl_flow(dev, dev, tbl, items, 0, actions, 0))
+				return -rte_errno;
+		}
+	}
+	return 0;
+}
+
+static int
+__flow_hw_ctrl_flows(struct rte_eth_dev *dev,
+		     struct rte_flow_template_table *tbl,
+		     const enum mlx5_flow_ctrl_rx_eth_pattern_type pattern_type,
+		     const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type)
+{
+	switch (pattern_type) {
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_ALL:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_ALL_MCAST:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST:
+		return __flow_hw_ctrl_flows_single(dev, tbl, pattern_type, rss_type);
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_BCAST_VLAN:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV4_MCAST_VLAN:
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_IPV6_MCAST_VLAN:
+		return __flow_hw_ctrl_flows_single_vlan(dev, tbl, pattern_type, rss_type);
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_DMAC:
+		return __flow_hw_ctrl_flows_unicast(dev, tbl, pattern_type, rss_type);
+	case MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_DMAC_VLAN:
+		return __flow_hw_ctrl_flows_unicast_vlan(dev, tbl, pattern_type, rss_type);
+	default:
+		/* Should not reach here. */
+		MLX5_ASSERT(false);
+		rte_errno = EINVAL;
+		return -EINVAL;
+	}
+}
+
+
+int
+mlx5_flow_hw_ctrl_flows(struct rte_eth_dev *dev, uint32_t flags)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow_hw_ctrl_rx *hw_ctrl_rx;
+	unsigned int i;
+	unsigned int j;
+	int ret = 0;
+
+	RTE_SET_USED(priv);
+	RTE_SET_USED(flags);
+	if (!priv->dr_ctx) {
+		DRV_LOG(DEBUG, "port %u Control flow rules will not be created. "
+			       "HWS needs to be configured beforehand.",
+			       dev->data->port_id);
+		return 0;
+	}
+	if (!priv->hw_ctrl_rx) {
+		DRV_LOG(ERR, "port %u Control flow rules templates were not created.",
+			dev->data->port_id);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	hw_ctrl_rx = priv->hw_ctrl_rx;
+	for (i = 0; i < MLX5_FLOW_HW_CTRL_RX_ETH_PATTERN_MAX; ++i) {
+		const enum mlx5_flow_ctrl_rx_eth_pattern_type eth_pattern_type = i;
+
+		if (!eth_pattern_type_is_requested(eth_pattern_type, flags))
+			continue;
+		for (j = 0; j < MLX5_FLOW_HW_CTRL_RX_EXPANDED_RSS_MAX; ++j) {
+			const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type = j;
+			struct rte_flow_actions_template *at;
+			struct mlx5_flow_hw_ctrl_rx_table *tmpls = &hw_ctrl_rx->tables[i][j];
+			const struct mlx5_flow_template_table_cfg cfg = {
+				.attr = tmpls->attr,
+				.external = 0,
+			};
+
+			if (!hw_ctrl_rx->rss[rss_type]) {
+				at = flow_hw_create_ctrl_rx_rss_template(dev, rss_type);
+				if (!at)
+					return -rte_errno;
+				hw_ctrl_rx->rss[rss_type] = at;
+			} else {
+				at = hw_ctrl_rx->rss[rss_type];
+			}
+			if (!rss_type_is_requested(priv, rss_type))
+				continue;
+			if (!tmpls->tbl) {
+				tmpls->tbl = flow_hw_table_create(dev, &cfg,
+								  &tmpls->pt, 1, &at, 1, NULL);
+				if (!tmpls->tbl) {
+					DRV_LOG(ERR, "port %u Failed to create template table "
+						     "for control flow rules. Unable to create "
+						     "control flow rules.",
+						     dev->data->port_id);
+					return -rte_errno;
+				}
+			}
+
+			ret = __flow_hw_ctrl_flows(dev, tmpls->tbl, eth_pattern_type, rss_type);
+			if (ret) {
+				DRV_LOG(ERR, "port %u Failed to create control flow rule.",
+					dev->data->port_id);
+				return ret;
+			}
+		}
+	}
+	return 0;
+}
+
 void
 mlx5_flow_meter_uninit(struct rte_eth_dev *dev)
 {
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index b1543b480e..b7818f9598 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -2568,13 +2568,14 @@ mlx5_ind_table_obj_new(struct rte_eth_dev *dev, const uint16_t *queues,
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_ind_table_obj *ind_tbl;
 	int ret;
+	uint32_t max_queues_n = priv->rxqs_n > queues_n ? priv->rxqs_n : queues_n;
 
 	/*
 	 * Allocate maximum queues for shared action as queue number
 	 * maybe modified later.
 	 */
 	ind_tbl = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*ind_tbl) +
-			      (standalone ? priv->rxqs_n : queues_n) *
+			      (standalone ? max_queues_n : queues_n) *
 			      sizeof(uint16_t), 0, SOCKET_ID_ANY);
 	if (!ind_tbl) {
 		rte_errno = ENOMEM;
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 8c9d5c1b13..4b821a1076 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -1415,6 +1415,9 @@ mlx5_dev_stop(struct rte_eth_dev *dev)
 	mlx5_flow_list_flush(dev, MLX5_FLOW_TYPE_GEN, true);
 	mlx5_flow_meter_rxq_flush(dev);
 	mlx5_action_handle_detach(dev);
+#ifdef HAVE_MLX5_HWS_SUPPORT
+	mlx5_flow_hw_cleanup_ctrl_rx_templates(dev);
+#endif
 	mlx5_rx_intr_vec_disable(dev);
 	priv->sh->port[priv->dev_port - 1].ih_port_id = RTE_MAX_ETHPORTS;
 	priv->sh->port[priv->dev_port - 1].devx_ih_port_id = RTE_MAX_ETHPORTS;
@@ -1435,6 +1438,7 @@ mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_sh_config *config = &priv->sh->config;
+	uint64_t flags = 0;
 	unsigned int i;
 	int ret;
 
@@ -1481,7 +1485,18 @@ mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
 	} else {
 		DRV_LOG(INFO, "port %u FDB default rule is disabled", dev->data->port_id);
 	}
-	return 0;
+	if (priv->isolated)
+		return 0;
+	if (dev->data->promiscuous)
+		flags |= MLX5_CTRL_PROMISCUOUS;
+	if (dev->data->all_multicast)
+		flags |= MLX5_CTRL_ALL_MULTICAST;
+	else
+		flags |= MLX5_CTRL_BROADCAST | MLX5_CTRL_IPV4_MULTICAST | MLX5_CTRL_IPV6_MULTICAST;
+	flags |= MLX5_CTRL_DMAC;
+	if (priv->vlan_filter_n)
+		flags |= MLX5_CTRL_VLAN_FILTER;
+	return mlx5_flow_hw_ctrl_flows(dev, flags);
 error:
 	ret = rte_errno;
 	mlx5_flow_hw_flush_ctrl_flows(dev);
@@ -1717,6 +1732,9 @@ mlx5_traffic_restart(struct rte_eth_dev *dev)
 {
 	if (dev->data->dev_started) {
 		mlx5_traffic_disable(dev);
+#ifdef HAVE_MLX5_HWS_SUPPORT
+		mlx5_flow_hw_cleanup_ctrl_rx_templates(dev);
+#endif
 		return mlx5_traffic_enable(dev);
 	}
 	return 0;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [PATCH v6 01/18] net/mlx5: fix invalid flow attributes
  2022-10-20 15:41   ` [PATCH v6 01/18] net/mlx5: fix invalid flow attributes Suanming Mou
@ 2022-10-24  9:43     ` Slava Ovsiienko
  0 siblings, 0 replies; 140+ messages in thread
From: Slava Ovsiienko @ 2022-10-24  9:43 UTC (permalink / raw)
  To: Suanming Mou, Matan Azrad; +Cc: dev, Raslan Darawsheh, Ori Kam

> -----Original Message-----
> From: Suanming Mou <suanmingm@nvidia.com>
> Sent: Thursday, October 20, 2022 18:42
> To: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>; Ori Kam
> <orika@nvidia.com>
> Subject: [PATCH v6 01/18] net/mlx5: fix invalid flow attributes
> 
> In the function flow_get_drv_type(), attr will be read in non-HWS mode.
> In case the user calls the HWS API in SWS mode, attr should be placed in
> HWS functions, or it will cause a crash.
> 
> Fixes: c40c061a022e ("net/mlx5: add basic flow queue operation")
> 
> Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>


^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [PATCH v6 02/18] net/mlx5: fix IPv6 and TCP RSS hash fields
  2022-10-20 15:41   ` [PATCH v6 02/18] net/mlx5: fix IPv6 and TCP RSS hash fields Suanming Mou
@ 2022-10-24  9:43     ` Slava Ovsiienko
  0 siblings, 0 replies; 140+ messages in thread
From: Slava Ovsiienko @ 2022-10-24  9:43 UTC (permalink / raw)
  To: Suanming Mou, Matan Azrad; +Cc: dev, Raslan Darawsheh, Ori Kam

> -----Original Message-----
> From: Suanming Mou <suanmingm@nvidia.com>
> Sent: Thursday, October 20, 2022 18:42
> To: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>; Ori Kam
> <orika@nvidia.com>
> Subject: [PATCH v6 02/18] net/mlx5: fix IPv6 and TCP RSS hash fields
> 
> In the flow_dv_hashfields_set() function, when item_flags was 0, the code
> went directly to the first if branch and the else case never had a chance
> to be checked. This caused the IPv6 and TCP hash fields in the else case
> to never be set.
> 
> This commit adds the dedicate HW steering hash field set function to
> generate the RSS hash fields.
> 
> Fixes: 3a2f674b6aa8 ("net/mlx5: add queue and RSS HW steering action")
> Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>


^ permalink raw reply	[flat|nested] 140+ messages in thread
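
A standalone sketch (not the mlx5 code) of the branch-ordering issue the
commit describes: when item_flags is 0, the first "if" is always taken, so
the IPv6/TCP branch can never set its hash fields. All names below are
invented for the illustration.

#include <stdint.h>
#include <stdio.h>

#define LAYER_L3_IPV6 (1u << 1)

/* Simplified stand-in for the original hash-field selection logic. */
static uint32_t
pick_hash_fields(uint32_t item_flags)
{
	/* Buggy ordering: with item_flags == 0 this test is always true. */
	if (!(item_flags & LAYER_L3_IPV6))
		return 0x1;	/* IPv4 hash fields */
	else
		return 0x2;	/* IPv6/TCP hash fields - unreachable here */
}

int
main(void)
{
	/* Template-based paths may legitimately pass item_flags == 0. */
	printf("fields=%#x\n", pick_hash_fields(0));
	return 0;
}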

* RE: [PATCH v6 03/18] net/mlx5: add shared header reformat support
  2022-10-20 15:41   ` [PATCH v6 03/18] net/mlx5: add shared header reformat support Suanming Mou
@ 2022-10-24  9:44     ` Slava Ovsiienko
  0 siblings, 0 replies; 140+ messages in thread
From: Slava Ovsiienko @ 2022-10-24  9:44 UTC (permalink / raw)
  To: Suanming Mou, Matan Azrad; +Cc: dev, Raslan Darawsheh, Ori Kam

> -----Original Message-----
> From: Suanming Mou <suanmingm@nvidia.com>
> Sent: Thursday, October 20, 2022 18:42
> To: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>; Ori Kam
> <orika@nvidia.com>
> Subject: [PATCH v6 03/18] net/mlx5: add shared header reformat support
> 
> As the rte_flow_async API defines, an action mask with a non-zero field
> value means the action will be used as shared by all the flows in the
> table.
> 
> A header reformat action with a non-zero action mask field will be created
> as a constant shared action. For the encapsulation header reformat action,
> there are two kinds of encapsulation data, raw_encap_data and rte_flow_item
> encap_data. Both kinds of data can be identified from the action mask conf
> as constant or not.
> 
> Examples:
> 1. VXLAN encap (encap_data: rte_flow_item)
> 	action conf (eth/ipv4/udp/vxlan_hdr)
> 
> 	a. action mask conf (eth/ipv4/udp/vxlan_hdr)
> 	  - items are constant.
> 	b. action mask conf (NULL)
> 	  - items will change.
> 
> 2. RAW encap (encap_data: raw)
> 	action conf (raw_data)
> 
> 	a. action mask conf (not NULL)
> 	  - encap_data constant.
> 	b. action mask conf (NULL)
> 	  - encap_data will change.
> 
> Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>


^ permalink raw reply	[flat|nested] 140+ messages in thread
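
A rough usage sketch of the rule above (a non-NULL mask conf marks the
encapsulation data as constant and shared by the table, a NULL mask conf
means per-rule data), assuming the standard template API; the helper and
buffer handling are illustrative only.

#include <stdbool.h>
#include <rte_flow.h>

static struct rte_flow_actions_template *
raw_encap_template(uint16_t port_id, uint8_t *hdr, size_t hdr_len,
		   bool constant_data)
{
	struct rte_flow_actions_template_attr attr = { .ingress = 1 };
	struct rte_flow_action_raw_encap encap = {
		.data = hdr,
		.size = hdr_len,
	};
	struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_RAW_ENCAP, .conf = &encap },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	struct rte_flow_action masks[] = {
		/* Non-NULL mask conf => constant shared encap data. */
		{ .type = RTE_FLOW_ACTION_TYPE_RAW_ENCAP,
		  .conf = constant_data ? &encap : NULL },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	struct rte_flow_error error;

	return rte_flow_actions_template_create(port_id, &attr, actions,
						masks, &error);
}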

* RE: [PATCH v6 04/18] net/mlx5: add modify field hws support
  2022-10-20 15:41   ` [PATCH v6 04/18] net/mlx5: add modify field hws support Suanming Mou
@ 2022-10-24  9:44     ` Slava Ovsiienko
  0 siblings, 0 replies; 140+ messages in thread
From: Slava Ovsiienko @ 2022-10-24  9:44 UTC (permalink / raw)
  To: Suanming Mou, Matan Azrad
  Cc: dev, Raslan Darawsheh, Ori Kam, Dariusz Sosnowski

> -----Original Message-----
> From: Suanming Mou <suanmingm@nvidia.com>
> Sent: Thursday, October 20, 2022 18:42
> To: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>; Ori Kam
> <orika@nvidia.com>; Dariusz Sosnowski <dsosnowski@nvidia.com>
> Subject: [PATCH v6 04/18] net/mlx5: add modify field hws support
> 
> This patch introduces support for modify_field rte_flow actions in HWS
> mode. Support includes:
> 
> - Ingress and egress domains,
> - SET and ADD operations,
> - usage of arbitrary bit offsets and widths for packet and metadata
>   fields.
> 
> Support is implemented in two phases:
> 
> 1. On flow table creation the hardware commands are generated, based
>    on rte_flow action templates, and stored alongside action template.
> 2. On flow rule creation/queueing the hardware commands are updated with
>    values provided by the user. Any masks over immediate values, provided
>    in action templates, are applied to these values before enqueueing
>    rules for creation.
> 
> Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
> Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>


^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [PATCH v6 05/18] net/mlx5: add HW steering port action
  2022-10-20 15:41   ` [PATCH v6 05/18] net/mlx5: add HW steering port action Suanming Mou
@ 2022-10-24  9:44     ` Slava Ovsiienko
  0 siblings, 0 replies; 140+ messages in thread
From: Slava Ovsiienko @ 2022-10-24  9:44 UTC (permalink / raw)
  To: Suanming Mou, Matan Azrad
  Cc: dev, Raslan Darawsheh, Ori Kam, Dariusz Sosnowski

> -----Original Message-----
> From: Suanming Mou <suanmingm@nvidia.com>
> Sent: Thursday, October 20, 2022 18:42
> To: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>; Ori Kam
> <orika@nvidia.com>; Dariusz Sosnowski <dsosnowski@nvidia.com>
> Subject: [PATCH v6 05/18] net/mlx5: add HW steering port action
> 
> From: Dariusz Sosnowski <dsosnowski@nvidia.com>
> 
> This patch implements creating and caching of port actions for use with
> HW Steering FDB flows.
> 
> Actions are created on flow template API configuration and only on the
> port designated as master. Attaching and detaching of ports in the same
> switching domain causes an update to the port actions cache by,
> respectively, creating and destroying actions.
> 
> A new devarg fdb_def_rule_en is added to control whether the default
> dedicated E-Switch rule is created implicitly by the PMD; the PMD sets
> this value to 1 by default.
> If set to 0, the default E-Switch rule will not be created and the user
> can create the specific E-Switch rule on the root table if needed.
> 
> Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>


^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [PATCH v6 07/18] net/mlx5: add HW steering meter action
  2022-10-20 15:41   ` [PATCH v6 07/18] net/mlx5: add HW steering meter action Suanming Mou
@ 2022-10-24  9:44     ` Slava Ovsiienko
  0 siblings, 0 replies; 140+ messages in thread
From: Slava Ovsiienko @ 2022-10-24  9:44 UTC (permalink / raw)
  To: Suanming Mou, Matan Azrad
  Cc: dev, Raslan Darawsheh, Ori Kam, Alexander Kozyrev

> -----Original Message-----
> From: Suanming Mou <suanmingm@nvidia.com>
> Sent: Thursday, October 20, 2022 18:42
> To: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>; Ori Kam
> <orika@nvidia.com>; Alexander Kozyrev <akozyrev@nvidia.com>
> Subject: [PATCH v6 07/18] net/mlx5: add HW steering meter action
> 
> From: Alexander Kozyrev <akozyrev@nvidia.com>
> 
> This commit adds the meter action for HW steering.
> 
> The HW steering meter is based on ASO. The number of meters that will
> be used by flows should be specified in advance in the flow
> configure API.
> 
> Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>


^ permalink raw reply	[flat|nested] 140+ messages in thread
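
A rough sketch of reserving the meters in advance through the flow
configure API, as required above; the nb_meters field name and the queue
sizing are assumptions to be checked against the rte_flow.h in use.

#include <rte_flow.h>

/* Sketch: pre-allocate ASO meters before creating templates/tables. */
static int
reserve_hws_meters(uint16_t port_id, uint32_t nb_meters)
{
	/* Assumed rte_flow_port_attr field; verify against rte_flow.h. */
	struct rte_flow_port_attr port_attr = { .nb_meters = nb_meters };
	struct rte_flow_queue_attr queue_attr = { .size = 64 };
	const struct rte_flow_queue_attr *qattr[1] = { &queue_attr };
	struct rte_flow_error error;

	return rte_flow_configure(port_id, &port_attr, 1, qattr, &error);
}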

* RE: [PATCH v6 08/18] net/mlx5: add HW steering counter action
  2022-10-20 15:41   ` [PATCH v6 08/18] net/mlx5: add HW steering counter action Suanming Mou
@ 2022-10-24  9:45     ` Slava Ovsiienko
  0 siblings, 0 replies; 140+ messages in thread
From: Slava Ovsiienko @ 2022-10-24  9:45 UTC (permalink / raw)
  To: Suanming Mou, Matan Azrad, Ray Kinsella
  Cc: dev, Raslan Darawsheh, Ori Kam, Jack Min

> -----Original Message-----
> From: Suanming Mou <suanmingm@nvidia.com>
> Sent: Thursday, October 20, 2022 18:42
> To: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Ray Kinsella <mdr@ashroe.eu>
> Cc: dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>; Ori Kam
> <orika@nvidia.com>; Jack Min <jackmin@nvidia.com>
> Subject: [PATCH v6 08/18] net/mlx5: add HW steering counter action
> 
> From: Xiaoyu Min <jackmin@nvidia.com>
> 
> This commit adds HW steering counter action support.
> A pool mechanism is the basic data structure for the HW steering counter.
> 
> The HW steering counter pool is based on the zero-copy variant of
> rte_ring.
> 
> There are two global rte_rings:
> 1. free_list:
>      Stores the counter indexes which are ready for use.
> 2. wait_reset_list:
>      Stores the counter indexes which have just been freed by the user and
>      need a hardware counter query to get the reset value before the
>      counter can be reused again.
> 
> The counter pool also supports a cache per HW steering queue, which is
> also based on the zero-copy variant of rte_ring.
> 
> The cache can be configured in size, preload, threshold, and fetch size;
> they are all exposed via device args.
> 
> The main operations of the counter pool are as follows:
> 
>  - Get one counter from the pool:
>    1. The user call _get_* API.
>    2. If the cache is enabled, dequeue one counter index from the local
>       cache:
>       2.A: if the dequeued one from the local cache is still in reset
> 	status (counter's query_gen_when_free is equal to pool's query
> 	gen):
> 	I. Flush all counters in local cache back to global
> 	   wait_reset_list.
> 	II. Fetch _fetch_sz_ counters into the cache from the global
> 	    free list.
> 	III. Fetch one counter from the cache.
>    3. If the cache is empty, fetch _fetch_sz_ counters from the global
>       free list into the cache and fetch one counter from the cache.
>  - Free one counter into the pool:
>    1. The user calls _put_* API.
>    2. Put the counter into the local cache.
>    3. If the local cache is full:
>       3.A: Write back all counters above _threshold_ into the global
>            wait_reset_list.
>       3.B: Also, write back this counter into the global wait_reset_list.
> 
> When the local cache is disabled, _get_/_put_ operate directly on the
> global lists.
> 
> Signed-off-by: Xiaoyu Min <jackmin@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>


^ permalink raw reply	[flat|nested] 140+ messages in thread
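
A self-contained sketch of the get/put flow listed above, with plain arrays
standing in for the zero-copy rte_ring objects; the sizes, helper names and
the query_gen bookkeeping are illustrative only.

#include <stdint.h>

#define POOL_SZ   1024
#define CACHE_SZ  64
#define THRESHOLD 48
#define FETCH_SZ  16

struct cnt_pool {
	uint32_t query_gen;               /* advanced by the background query */
	uint32_t gen_when_free[POOL_SZ];  /* generation recorded at free time */
	uint32_t free_list[POOL_SZ];      /* global: ready for use */
	uint32_t free_n;
	uint32_t wait_reset[POOL_SZ];     /* global: waiting for HW reset */
	uint32_t wait_n;
	uint32_t cache[CACHE_SZ];         /* per-queue local cache */
	uint32_t cache_n;
};

/* Flush all cached counters back to the global wait_reset list (step 2.A-I). */
static void
cache_flush(struct cnt_pool *p)
{
	while (p->cache_n)
		p->wait_reset[p->wait_n++] = p->cache[--p->cache_n];
}

/* Refill the cache from the global free list (steps 2.A-II and 3). */
static void
cache_fetch(struct cnt_pool *p)
{
	uint32_t n = FETCH_SZ;

	while (n-- && p->free_n)
		p->cache[p->cache_n++] = p->free_list[--p->free_n];
}

static int
cnt_get(struct cnt_pool *p, uint32_t *idx)
{
	if (!p->cache_n)
		cache_fetch(p);
	if (!p->cache_n)
		return -1;	/* pool exhausted */
	*idx = p->cache[--p->cache_n];
	if (p->gen_when_free[*idx] == p->query_gen) {
		/* Still in reset state: flush, refetch, take a fresh one. */
		p->cache[p->cache_n++] = *idx;
		cache_flush(p);
		cache_fetch(p);
		if (!p->cache_n)
			return -1;
		*idx = p->cache[--p->cache_n];
	}
	return 0;
}

static void
cnt_put(struct cnt_pool *p, uint32_t idx)
{
	p->gen_when_free[idx] = p->query_gen;
	if (p->cache_n < CACHE_SZ) {
		p->cache[p->cache_n++] = idx;
		return;
	}
	/* Cache full: write back everything above the threshold (3.A)... */
	while (p->cache_n > THRESHOLD)
		p->wait_reset[p->wait_n++] = p->cache[--p->cache_n];
	/* ...and this counter as well (3.B). */
	p->wait_reset[p->wait_n++] = idx;
}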

* RE: [PATCH v6 06/18] net/mlx5: add extended metadata mode for hardware steering
  2022-10-20 15:41   ` [PATCH v6 06/18] net/mlx5: add extended metadata mode for hardware steering Suanming Mou
@ 2022-10-24  9:45     ` Slava Ovsiienko
  0 siblings, 0 replies; 140+ messages in thread
From: Slava Ovsiienko @ 2022-10-24  9:45 UTC (permalink / raw)
  To: Suanming Mou, Matan Azrad; +Cc: dev, Raslan Darawsheh, Ori Kam, Bing Zhao

> -----Original Message-----
> From: Suanming Mou <suanmingm@nvidia.com>
> Sent: Thursday, October 20, 2022 18:42
> To: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>; Ori Kam
> <orika@nvidia.com>; Bing Zhao <bingz@nvidia.com>
> Subject: [PATCH v6 06/18] net/mlx5: add extended metadata mode for hardware
> steering
> 
> From: Bing Zhao <bingz@nvidia.com>
> 
> The new mode 4 of the devarg "dv_xmeta_en" is added for HWS only. In this
> mode, copying the 32-bit wide Rx / Tx metadata between FDB and NIC is
> supported. The mark is only supported in NIC and is not copied.
> 
> Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>


^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [PATCH v6 09/18] net/mlx5: support DR action template API
  2022-10-20 15:41   ` [PATCH v6 09/18] net/mlx5: support DR action template API Suanming Mou
@ 2022-10-24  9:45     ` Slava Ovsiienko
  0 siblings, 0 replies; 140+ messages in thread
From: Slava Ovsiienko @ 2022-10-24  9:45 UTC (permalink / raw)
  To: Suanming Mou, Matan Azrad
  Cc: dev, Raslan Darawsheh, Ori Kam, Dariusz Sosnowski

> -----Original Message-----
> From: Suanming Mou <suanmingm@nvidia.com>
> Sent: Thursday, October 20, 2022 18:42
> To: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>; Ori Kam
> <orika@nvidia.com>; Dariusz Sosnowski <dsosnowski@nvidia.com>
> Subject: [PATCH v6 09/18] net/mlx5: support DR action template API
> 
> From: Dariusz Sosnowski <dsosnowski@nvidia.com>
> 
> This patch adapts mlx5 PMD to changes in mlx5dr API regarding action
> templates. It changes the following:
> 
> 1. Actions template creation:
> 
>     - Flow actions types are translated to mlx5dr action types in order
>       to create mlx5dr_action_template object.
>     - An offset is assigned to each flow action. This offset is used to
>       predetermine action's location in rule_acts array passed on rule
>       creation.
> 
> 2. Template table creation:
> 
>     - Fixed actions are created and put in rule_acts cache using
>       predetermined offsets
>     - mlx5dr matcher is parametrized by action templates bound to
>       template table.
>     - mlx5dr matcher is configured to optimize rule creation based on
>       passed rule indices.
> 
> 3. Flow rule creation:
> 
>     - mlx5dr rule is parametrized by action template on which these
>       rule's actions are based.
>     - Rule index hint is provided to mlx5dr.
> 
> Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>


^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [PATCH v6 10/18] net/mlx5: add HW steering connection tracking support
  2022-10-20 15:41   ` [PATCH v6 10/18] net/mlx5: add HW steering connection tracking support Suanming Mou
@ 2022-10-24  9:46     ` Slava Ovsiienko
  0 siblings, 0 replies; 140+ messages in thread
From: Slava Ovsiienko @ 2022-10-24  9:46 UTC (permalink / raw)
  To: Suanming Mou, Matan Azrad; +Cc: dev, Raslan Darawsheh, Ori Kam

> -----Original Message-----
> From: Suanming Mou <suanmingm@nvidia.com>
> Sent: Thursday, October 20, 2022 18:42
> To: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>; Ori Kam
> <orika@nvidia.com>
> Subject: [PATCH v6 10/18] net/mlx5: add HW steering connection tracking
> support
> 
> This commit adds support for connection tracking to HW steering, as SW
> steering did before.
> 
> Unlike the SW steering implementation, HW steering takes advantage of
> bulk action allocation support, so only a single CT pool is needed.
> 
> An indexed pool is introduced to record actions allocated from the bulk
> and the CT action state, etc. Once a CT action is allocated from the bulk,
> an indexed object is also allocated from the indexed pool, and similarly
> for deallocation. That allows mlx5_aso_ct_action to also be managed by the
> indexed pool, with no need to reserve it from mlx5_aso_ct_pool. The single
> CT pool is also saved directly in the mlx5_aso_ct_action struct.
> 
> The ASO operation functions are shared with SW steering implementation.
> 
> Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>


^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [PATCH v6 11/18] net/mlx5: add HW steering VLAN push, pop and VID modify flow actions
  2022-10-20 15:41   ` [PATCH v6 11/18] net/mlx5: add HW steering VLAN push, pop and VID modify flow actions Suanming Mou
@ 2022-10-24  9:46     ` Slava Ovsiienko
  0 siblings, 0 replies; 140+ messages in thread
From: Slava Ovsiienko @ 2022-10-24  9:46 UTC (permalink / raw)
  To: Suanming Mou, Matan Azrad; +Cc: dev, Raslan Darawsheh, Ori Kam, Gregory Etelson

> -----Original Message-----
> From: Suanming Mou <suanmingm@nvidia.com>
> Sent: Thursday, October 20, 2022 18:42
> To: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>; Ori Kam
> <orika@nvidia.com>; Gregory Etelson <getelson@nvidia.com>
> Subject: [PATCH v6 11/18] net/mlx5: add HW steering VLAN push, pop and VID
> modify flow actions
> 
> From: Gregory Etelson <getelson@nvidia.com>
> 
> Add PMD implementation for HW steering VLAN push, pop and modify flow
> actions.
> 
> HWS VLAN push flow action is triggered by a sequence of mandatory
> OF_PUSH_VLAN, OF_SET_VLAN_VID and optional OF_SET_VLAN_PCP flow action
> commands.
> The commands must be arranged in the exact order:
> OF_PUSH_VLAN / OF_SET_VLAN_VID [ / OF_SET_VLAN_PCP ].
> In masked HWS VLAN push flow action template *ALL* the above flow actions
> must be masked.
> In non-masked HWS VLAN push flow action template *ALL* the above flow
> actions must not be masked.
> 
> Example:
> 
> flow actions_template <port id> create \
> actions_template_id <action id> \
> template \
>   of_push_vlan / \
>   of_set_vlan_vid \
>   [ / of_set_vlan_pcp  ] / end \
> mask \
>   of_push_vlan ethertype 0 / \
>   of_set_vlan_vid vlan_vid 0 \
>   [ / of_set_vlan_pcp vlan_pcp 0 ] / end\
> 
> flow actions_template <port id> create \
> actions_template_id <action id> \
> template \
>   of_push_vlan ethertype <E>/ \
>   of_set_vlan_vid vlan_vid <VID>\
>   [ / of_set_vlan_pcp  <PCP>] / end \
> mask \
>   of_push_vlan ethertype <type != 0> / \
>   of_set_vlan_vid vlan_vid <vid_mask != 0>\
>   [ / of_set_vlan_pcp vlan_pcp <pcp_mask != 0> ] / end\
> 
> HWS VLAN pop flow action is triggered by OF_POP_VLAN flow action command.
> HWS VLAN pop action template is always non-masked.
> 
> Example:
> 
> flow actions_template <port id> create \
> actions_template_id <action id> \
> template of_pop_vlan / end mask of_pop_vlan / end
> 
> HWS VLAN VID modify flow action is triggered by a standalone OF_SET_VLAN_VID
> flow action command.
> HWS VLAN VID modify action template can be either masked or non-masked.
> 
> Example:
> 
> flow actions_template <port id> create \
> actions_template_id <action id> \
> template of_set_vlan_vid / end mask of_set_vlan_vid vlan_vid 0 / end
> 
> flow actions_template <port id> create \
> actions_template_id <action id> \
> template of_set_vlan_vid vlan_vid 0x101 / end \
> mask of_set_vlan_vid vlan_vid 0xffff / end
> 
> Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>


^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [PATCH v6 12/18] net/mlx5: implement METER MARK indirect action for HWS
  2022-10-20 15:41   ` [PATCH v6 12/18] net/mlx5: implement METER MARK indirect action for HWS Suanming Mou
@ 2022-10-24  9:46     ` Slava Ovsiienko
  0 siblings, 0 replies; 140+ messages in thread
From: Slava Ovsiienko @ 2022-10-24  9:46 UTC (permalink / raw)
  To: Suanming Mou, Ferruh Yigit, Matan Azrad
  Cc: dev, Raslan Darawsheh, Ori Kam, Alexander Kozyrev

> -----Original Message-----
> From: Suanming Mou <suanmingm@nvidia.com>
> Sent: Thursday, October 20, 2022 18:42
> To: Ferruh Yigit <ferruh.yigit@xilinx.com>; Matan Azrad <matan@nvidia.com>;
> Slava Ovsiienko <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>; Ori Kam
> <orika@nvidia.com>; Alexander Kozyrev <akozyrev@nvidia.com>
> Subject: [PATCH v6 12/18] net/mlx5: implement METER MARK indirect action for
> HWS
> 
> From: Alexander Kozyrev <akozyrev@nvidia.com>
> 
> Add ability to create an indirect action handle for METER_MARK.
> It allows sharing one meter between several different actions.
> 
> Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>


^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [PATCH v6 13/18] net/mlx5: add HWS AGE action support
  2022-10-20 15:41   ` [PATCH v6 13/18] net/mlx5: add HWS AGE action support Suanming Mou
@ 2022-10-24  9:46     ` Slava Ovsiienko
  0 siblings, 0 replies; 140+ messages in thread
From: Slava Ovsiienko @ 2022-10-24  9:46 UTC (permalink / raw)
  To: Suanming Mou, Matan Azrad; +Cc: dev, Raslan Darawsheh, Ori Kam, Michael Baum

> -----Original Message-----
> From: Suanming Mou <suanmingm@nvidia.com>
> Sent: Thursday, October 20, 2022 18:42
> To: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>; Ori Kam
> <orika@nvidia.com>; Michael Baum <michaelba@nvidia.com>
> Subject: [PATCH v6 13/18] net/mlx5: add HWS AGE action support
> 
> From: Michael Baum <michaelba@nvidia.com>
> 
> Add support for AGE action for HW steering.
> This patch includes:
> 
>  1. Add new structures to manage the aging.
>  2. Initialize all them in configure function.
>  3. Implement per second aging check using CNT background thread.
>  4. Enable AGE action in flow create/destroy operations.
>  5. Implement queue-based function to report aged flow rules.
> 
> Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>


^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [PATCH v6 14/18] net/mlx5: add async action push and pull support
  2022-10-20 15:41   ` [PATCH v6 14/18] net/mlx5: add async action push and pull support Suanming Mou
@ 2022-10-24  9:47     ` Slava Ovsiienko
  0 siblings, 0 replies; 140+ messages in thread
From: Slava Ovsiienko @ 2022-10-24  9:47 UTC (permalink / raw)
  To: Suanming Mou, Matan Azrad; +Cc: dev, Raslan Darawsheh, Ori Kam

> -----Original Message-----
> From: Suanming Mou <suanmingm@nvidia.com>
> Sent: Thursday, October 20, 2022 18:42
> To: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>; Ori Kam
> <orika@nvidia.com>
> Subject: [PATCH v6 14/18] net/mlx5: add async action push and pull support
> 
> The queue-based rte_flow_async_action_* functions work the same as the
> queue-based async flow functions. The operations can be pushed
> asynchronously, and so can the pull.
> 
> This commit adds the missing push and pull support for async actions.
> 
> Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>


^ permalink raw reply	[flat|nested] 140+ messages in thread
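
A minimal sketch of the queue-based usage: after enqueueing flow or
indirect-action operations, push them to hardware and pull the completions
back. The helper, the batch size and the accounting of pending operations
are assumptions; only rte_flow_push()/rte_flow_pull() are the actual
ethdev API calls.

#include <rte_common.h>
#include <rte_flow.h>

/* Sketch: push pending async ops on one queue and wait for completions. */
static int
drain_flow_queue(uint16_t port_id, uint32_t queue_id, uint32_t pending)
{
	struct rte_flow_op_result res[32];
	struct rte_flow_error error;
	int n;

	if (rte_flow_push(port_id, queue_id, &error))
		return -1;
	while (pending) {
		/* Both flow and indirect action results arrive here. */
		n = rte_flow_pull(port_id, queue_id, res, RTE_DIM(res), &error);
		if (n < 0)
			return -1;
		pending -= n;
	}
	return 0;
}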

* RE: [PATCH v6 15/18] net/mlx5: support flow integrity in HWS group 0
  2022-10-20 15:41   ` [PATCH v6 15/18] net/mlx5: support flow integrity in HWS group 0 Suanming Mou
@ 2022-10-24  9:47     ` Slava Ovsiienko
  0 siblings, 0 replies; 140+ messages in thread
From: Slava Ovsiienko @ 2022-10-24  9:47 UTC (permalink / raw)
  To: Suanming Mou, Matan Azrad; +Cc: dev, Raslan Darawsheh, Ori Kam, Gregory Etelson

> -----Original Message-----
> From: Suanming Mou <suanmingm@nvidia.com>
> Sent: Thursday, October 20, 2022 18:42
> To: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>; Ori Kam
> <orika@nvidia.com>; Gregory Etelson <getelson@nvidia.com>
> Subject: [PATCH v6 15/18] net/mlx5: support flow integrity in HWS group 0
> 
> From: Gregory Etelson <getelson@nvidia.com>
> 
> - Reformat flow integrity item translation for HWS code.
> - Support flow integrity bits in HWS group 0.
> - Update integrity item translation to match positive semantics only.
> 
> Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>


^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [PATCH v6 17/18] net/mlx5: support device control of representor matching
  2022-10-20 15:41   ` [PATCH v6 17/18] net/mlx5: support device control of representor matching Suanming Mou
@ 2022-10-24  9:47     ` Slava Ovsiienko
  0 siblings, 0 replies; 140+ messages in thread
From: Slava Ovsiienko @ 2022-10-24  9:47 UTC (permalink / raw)
  To: Suanming Mou, Matan Azrad
  Cc: dev, Raslan Darawsheh, Ori Kam, Dariusz Sosnowski

> -----Original Message-----
> From: Suanming Mou <suanmingm@nvidia.com>
> Sent: Thursday, October 20, 2022 18:42
> To: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>; Ori Kam
> <orika@nvidia.com>; Dariusz Sosnowski <dsosnowski@nvidia.com>
> Subject: [PATCH v6 17/18] net/mlx5: support device control of representor
> matching
> 
> From: Dariusz Sosnowski <dsosnowski@nvidia.com>
> 
> In some E-Switch use cases applications want to receive all traffic
> on a single port. Since currently flow API does not provide a way to
> match traffic forwarded to any port representor, this patch adds
> support for controlling representor matching on ingress flow rules.
> 
> Representor matching is controlled through new device argument
> repr_matching_en.
> 
> - If representor matching is enabled (default setting),
>   then each ingress pattern template has an implicit REPRESENTED_PORT
>   item added. Flow rules based on this pattern template will match
>   the vport associated with port on which rule is created.
> - If representor matching is disabled, then there will be no implicit
>   item added. As a result ingress flow rules will match traffic
>   coming to any port, not only the port on which flow rule is created.
> 
> Representor matching is enabled by default, to provide an expected
> default behavior.
> 
> This patch enables egress flow rules on representors when E-Switch is
> enabled in the following configurations:
> 
> - repr_matching_en=1 and dv_xmeta_en=4
> - repr_matching_en=1 and dv_xmeta_en=0
> - repr_matching_en=0 and dv_xmeta_en=0
> 
> When representor matching is enabled, the following logic is
> implemented:
> 
> 1. Creating an egress template table in group 0 for each port. These
>    tables will hold default flow rules defined as follows:
> 
>       pattern SQ
>       actions MODIFY_FIELD (set available bits in REG_C_0 to
>                             vport_meta_tag)
>               MODIFY_FIELD (copy REG_A to REG_C_1, only when
>                             dv_xmeta_en == 4)
>               JUMP (group 1)
> 
> 2. Egress pattern templates created by an application have an implicit
>    MLX5_RTE_FLOW_ITEM_TYPE_TAG item prepended to the pattern, which
>    matches the available bits of REG_C_0.
> 
> 3. Egress flow rules created by an application have an implicit
>    MLX5_RTE_FLOW_ITEM_TYPE_TAG item prepended to the pattern, which
>    matches the vport_meta_tag placed in the available bits of REG_C_0.
> 
> 4. Egress template tables created by an application, which are in
>    group n, are placed in group n + 1.
> 
> 5. Items and actions related to META operate on REG_A when
>    dv_xmeta_en == 0, or on REG_C_1 when dv_xmeta_en == 4.
> 
> When representor matching is disabled and extended metadata is disabled,
> no changes to current logic are required.
> 
> Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
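A rough user-level sketch of the egress side described above, assuming the
application only uses the public template API and relies on the PMD to prepend
the implicit tag item and shift the table group internally (names below are
illustrative, not taken from the patch):

    #include <stdint.h>
    #include <rte_flow.h>

    /* Illustrative only: the application creates a plain egress pattern
     * template. With representor matching enabled, the PMD transparently
     * prepends the implicit tag item matching REG_C_0, and egress template
     * tables created from it are placed one group higher internally.
     */
    struct rte_flow_pattern_template *
    create_egress_pattern_template(uint16_t port_id, struct rte_flow_error *err)
    {
            const struct rte_flow_pattern_template_attr attr = {
                    .egress = 1,
            };
            const struct rte_flow_item pattern[] = {
                    { .type = RTE_FLOW_ITEM_TYPE_ETH },
                    { .type = RTE_FLOW_ITEM_TYPE_END },
            };

            return rte_flow_pattern_template_create(port_id, &attr, pattern, err);
    }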
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>


^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [PATCH v6 16/18] net/mlx5: support device control for E-Switch default rule
  2022-10-20 15:41   ` [PATCH v6 16/18] net/mlx5: support device control for E-Switch default rule Suanming Mou
@ 2022-10-24  9:47     ` Slava Ovsiienko
  0 siblings, 0 replies; 140+ messages in thread
From: Slava Ovsiienko @ 2022-10-24  9:47 UTC (permalink / raw)
  To: Suanming Mou, Matan Azrad, Ray Kinsella
  Cc: dev, Raslan Darawsheh, Ori Kam, Dariusz Sosnowski, Xueming(Steven) Li

> -----Original Message-----
> From: Suanming Mou <suanmingm@nvidia.com>
> Sent: Thursday, October 20, 2022 18:42
> To: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Ray Kinsella <mdr@ashroe.eu>
> Cc: dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>; Ori Kam
> <orika@nvidia.com>; Dariusz Sosnowski <dsosnowski@nvidia.com>;
> Xueming(Steven) Li <xuemingl@nvidia.com>
> Subject: [PATCH v6 16/18] net/mlx5: support device control for E-Switch
> default rule
> 
> From: Dariusz Sosnowski <dsosnowski@nvidia.com>
> 
> This patch adds support for the fdb_def_rule_en device argument to HW
> Steering, which controls:
> 
> - creation of the default FDB jump flow rule,
> - the user's ability to create transfer flow rules in the root table.
> 
> Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
> Signed-off-by: Xueming Li <xuemingl@nvidia.com>
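For illustration (placeholders only, not taken from the patch), the default
FDB jump rule could be turned off at probe time so that transfer rules may be
created directly in the root table:

    dpdk-testpmd -a <PCI_BDF>,dv_flow_en=2,representor=vf[0-1],fdb_def_rule_en=0 -- -i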
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>


^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [PATCH v6 18/18] net/mlx5: create control flow rules with HWS
  2022-10-20 15:41   ` [PATCH v6 18/18] net/mlx5: create control flow rules with HWS Suanming Mou
@ 2022-10-24  9:48     ` Slava Ovsiienko
  0 siblings, 0 replies; 140+ messages in thread
From: Slava Ovsiienko @ 2022-10-24  9:48 UTC (permalink / raw)
  To: Suanming Mou, Matan Azrad
  Cc: dev, Raslan Darawsheh, Ori Kam, Dariusz Sosnowski

> -----Original Message-----
> From: Suanming Mou <suanmingm@nvidia.com>
> Sent: Thursday, October 20, 2022 18:42
> To: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>; Ori Kam
> <orika@nvidia.com>; Dariusz Sosnowski <dsosnowski@nvidia.com>
> Subject: [PATCH v6 18/18] net/mlx5: create control flow rules with HWS
> 
> From: Dariusz Sosnowski <dsosnowski@nvidia.com>
> 
> This patch adds the creation of control flow rules required to receive
> default traffic (based on port configuration) with HWS.
> 
> Control flow rules are created on port start and destroyed on port stop.
> Handling of the destruction of these rules was already implemented before
> this patch.
> 
> Control flow rules are created if and only if flow isolation mode is
> disabled, and the creation process goes as follows:
> 
> - Port configuration is collected into a set of flags. Each flag
>   corresponds to a certain Ethernet pattern type, defined by the
>   mlx5_flow_ctrl_rx_eth_pattern_type enumeration. There is a separate
>   flag for VLAN filtering.
> - For each possible Ethernet pattern type:
>   - For each possible RSS action configuration:
>     - If the configuration flags do not match this combination, it is
>       omitted.
>     - A template table is created using this combination of pattern
>       and actions templates (templates are fetched from the hw_ctrl_rx
>       struct stored in the port's private data).
>     - Flow rules are created in this table.
> 
> Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
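The creation loop described above could be sketched roughly as follows. All
names are hypothetical stand-ins rather than the actual PMD code; the PMD's
real enumeration is the mlx5_flow_ctrl_rx_eth_pattern_type mentioned in the
commit message:

    #include <stdint.h>
    #include <stdbool.h>

    /* Illustrative subsets of Ethernet pattern types and RSS configurations. */
    enum ctrl_rx_eth_pattern_type {
            CTRL_RX_PATTERN_BCAST,
            CTRL_RX_PATTERN_ALL_MCAST,
            CTRL_RX_PATTERN_DMAC,
            CTRL_RX_PATTERN_MAX,
    };

    enum ctrl_rx_rss_type {
            CTRL_RX_RSS_NON_IP,
            CTRL_RX_RSS_IPV4,
            CTRL_RX_RSS_IPV6,
            CTRL_RX_RSS_MAX,
    };

    /* Port configuration collected into one flag bit per pattern type. */
    bool
    ctrl_rx_pattern_enabled(uint32_t flags, int pattern)
    {
            return (flags & (UINT32_C(1) << pattern)) != 0;
    }

    void
    ctrl_rx_create_rules(uint32_t flags)
    {
            for (int p = 0; p < CTRL_RX_PATTERN_MAX; p++) {
                    if (!ctrl_rx_pattern_enabled(flags, p))
                            continue; /* combination omitted */
                    for (int r = 0; r < CTRL_RX_RSS_MAX; r++) {
                            /* Create a template table from the (p, r) pattern
                             * and actions templates cached in the port private
                             * data, then insert the control flow rules into it.
                             */
                    }
            }
    }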
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>


^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [PATCH v6 00/18] net/mlx5: HW steering PMD update
  2022-10-20 15:41 ` [PATCH v6 00/18] net/mlx5: HW steering PMD update Suanming Mou
                     ` (17 preceding siblings ...)
  2022-10-20 15:41   ` [PATCH v6 18/18] net/mlx5: create control flow rules with HWS Suanming Mou
@ 2022-10-24 10:57   ` Raslan Darawsheh
  18 siblings, 0 replies; 140+ messages in thread
From: Raslan Darawsheh @ 2022-10-24 10:57 UTC (permalink / raw)
  To: Suanming Mou; +Cc: dev, Ori Kam

Hi,

> -----Original Message-----
> From: Suanming Mou <suanmingm@nvidia.com>
> Sent: Thursday, October 20, 2022 6:42 PM
> Cc: dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>; Ori Kam
> <orika@nvidia.com>
> Subject: [PATCH v6 00/18] net/mlx5: HW steering PMD update
> 
> The skeleton of mlx5 HW steering (HWS) was merged upstream quite a while
> ago, but has not been updated since then because the low-level steering
> layer code was missing. Luckily, better late than never, the steering
> layer finally arrived[1].
> 
> This series will add more features to the existing PMD code:
>  - FDB and metadata copy.
>  - Modify field.
>  - Meter color.
>  - Counter.
>  - Aging.
>  - Action template pre-parser optimization.
>  - Connection tracking.
>  - Control flow.
> 
> Some features such as meter/aging/CT touch the public API, and the
> public API changes were sent to the ML much earlier in other threads so
> that they would not be swallowed by this big series.
> 
> The dependent patches are listed below:
>  [1]https://inbox.dpdk.org/dev/20220922190345.394-1-valex@nvidia.com/
> 
> ---
> 
>  v6:
>   - Rebase to the latest version.
> 
>  v5:
>   - Rebase to the latest version.
> 
>  v4:
>   - Disable aging as the flow age API change is still in progress.
>     https://patches.dpdk.org/project/dpdk/cover/20221019144904.2543586-1-michaelba@nvidia.com/
>   - Add control flow for HWS.
> 
>  v3:
>   - Fix flows that could not be aged out.
>   - Fix error not being filled properly when table creation failed.
>   - Remove transfer_mode in flow attributes before the ethdev layer
>     change is applied.
>     https://patches.dpdk.org/project/dpdk/patch/20220928092425.68214-1-rongweil@nvidia.com/
> 
>  v2:
>   - Remove the rte_flow patches as they will be integrated in another thread.
>   - Fix compilation issues.
>   - Organize the patches better.
> 
> Alexander Kozyrev (2):
>   net/mlx5: add HW steering meter action
>   net/mlx5: implement METER MARK indirect action for HWS
> 
> Bing Zhao (1):
>   net/mlx5: add extended metadata mode for hardware steering
> 
> Dariusz Sosnowski (5):
>   net/mlx5: add HW steering port action
>   net/mlx5: support DR action template API
>   net/mlx5: support device control for E-Switch default rule
>   net/mlx5: support device control of representor matching
>   net/mlx5: create control flow rules with HWS
> 
> Gregory Etelson (2):
>   net/mlx5: add HW steering VLAN push, pop and VID modify flow actions
>   net/mlx5: support flow integrity in HWS group 0
> 
> Michael Baum (1):
>   net/mlx5: add HWS AGE action support
> 
> Suanming Mou (6):
>   net/mlx5: fix invalid flow attributes
>   net/mlx5: fix IPv6 and TCP RSS hash fields
>   net/mlx5: add shared header reformat support
>   net/mlx5: add modify field hws support
>   net/mlx5: add HW steering connection tracking support
>   net/mlx5: add async action push and pull support
> 
> Xiaoyu Min (1):
>   net/mlx5: add HW steering counter action
> 
>  doc/guides/nics/features/default.ini   |    1 +
>  doc/guides/nics/features/mlx5.ini      |    2 +
>  doc/guides/nics/mlx5.rst               |   43 +-
>  doc/guides/rel_notes/release_22_11.rst |    8 +-
>  drivers/common/mlx5/mlx5_devx_cmds.c   |   50 +
>  drivers/common/mlx5/mlx5_devx_cmds.h   |   27 +
>  drivers/common/mlx5/mlx5_prm.h         |   22 +-
>  drivers/common/mlx5/version.map        |    1 +
>  drivers/net/mlx5/linux/mlx5_os.c       |   78 +-
>  drivers/net/mlx5/meson.build           |    1 +
>  drivers/net/mlx5/mlx5.c                |  126 +-
>  drivers/net/mlx5/mlx5.h                |  322 +-
>  drivers/net/mlx5/mlx5_defs.h           |    5 +
>  drivers/net/mlx5/mlx5_flow.c           |  409 +-
>  drivers/net/mlx5/mlx5_flow.h           |  335 +-
>  drivers/net/mlx5/mlx5_flow_aso.c       |  797 ++-
>  drivers/net/mlx5/mlx5_flow_dv.c        | 1128 +--
>  drivers/net/mlx5/mlx5_flow_hw.c        | 8789 +++++++++++++++++++++---
>  drivers/net/mlx5/mlx5_flow_meter.c     |  776 ++-
>  drivers/net/mlx5/mlx5_flow_verbs.c     |    8 +-
>  drivers/net/mlx5/mlx5_hws_cnt.c        | 1247 ++++
>  drivers/net/mlx5/mlx5_hws_cnt.h        |  703 ++
>  drivers/net/mlx5/mlx5_rxq.c            |    3 +-
>  drivers/net/mlx5/mlx5_trigger.c        |  272 +-
>  drivers/net/mlx5/mlx5_tx.h             |    1 +
>  drivers/net/mlx5/mlx5_txq.c            |   47 +
>  drivers/net/mlx5/mlx5_utils.h          |   10 +-
>  drivers/net/mlx5/rte_pmd_mlx5.h        |   17 +
>  drivers/net/mlx5/version.map           |    1 +
>  29 files changed, 13589 insertions(+), 1640 deletions(-)
>  create mode 100644 drivers/net/mlx5/mlx5_hws_cnt.c
>  create mode 100644 drivers/net/mlx5/mlx5_hws_cnt.h
> 
> --
> 2.25.1

Series applied to next-net-mlx with small modifications to the commit logs.

Kindest regards,
Raslan Darawsheh

^ permalink raw reply	[flat|nested] 140+ messages in thread

end of thread, other threads:[~2022-10-24 10:57 UTC | newest]

Thread overview: 140+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-23 14:43 [PATCH 00/27] net/mlx5: HW steering PMD update Suanming Mou
2022-09-23 14:43 ` [PATCH 01/27] net/mlx5: fix invalid flow attributes Suanming Mou
2022-09-23 14:43 ` [PATCH 02/27] net/mlx5: fix IPv6 and TCP RSS hash fields Suanming Mou
2022-09-23 14:43 ` [PATCH 03/27] net/mlx5: add shared header reformat support Suanming Mou
2022-09-23 14:43 ` [PATCH 04/27] net/mlx5: add modify field hws support Suanming Mou
2022-09-23 14:43 ` [PATCH 05/27] net/mlx5: validate modify field action template Suanming Mou
2022-09-23 14:43 ` [PATCH 06/27] net/mlx5: enable mark flag for all ports in the same domain Suanming Mou
2022-09-23 14:43 ` [PATCH 07/27] net/mlx5: create port actions Suanming Mou
2022-09-23 14:43 ` [PATCH 08/27] net/mlx5: add extended metadata mode for hardware steering Suanming Mou
2022-09-23 14:43 ` [PATCH 09/27] ethdev: add meter profiles/policies config Suanming Mou
2022-09-23 14:43 ` [PATCH 10/27] net/mlx5: add HW steering meter action Suanming Mou
2022-09-23 14:43 ` [PATCH 11/27] net/mlx5: add HW steering counter action Suanming Mou
2022-09-23 14:43 ` [PATCH 12/27] net/mlx5: support caching queue action Suanming Mou
2022-09-23 14:43 ` [PATCH 13/27] net/mlx5: support DR action template API Suanming Mou
2022-09-23 14:43 ` [PATCH 14/27] net/mlx5: fix indirect action validate Suanming Mou
2022-09-23 14:43 ` [PATCH 15/27] net/mlx5: update indirect actions ops to HW variation Suanming Mou
2022-09-23 14:43 ` [PATCH 16/27] net/mlx5: support indirect count action for HW steering Suanming Mou
2022-09-23 14:43 ` [PATCH 17/27] net/mlx5: add pattern and table attribute validation Suanming Mou
2022-09-23 14:43 ` [PATCH 18/27] net/mlx5: add meta item support in egress Suanming Mou
2022-09-23 14:43 ` [PATCH 19/27] net/mlx5: add support for ASO return register Suanming Mou
2022-09-23 14:43 ` [PATCH 20/27] lib/ethdev: add connection tracking configuration Suanming Mou
2022-09-23 14:43 ` [PATCH 21/27] net/mlx5: add HW steering connection tracking support Suanming Mou
2022-09-23 14:43 ` [PATCH 22/27] net/mlx5: add HW steering VLAN push, pop and VID modify flow actions Suanming Mou
2022-09-23 14:43 ` [PATCH 23/27] net/mlx5: add meter color flow matching in dv Suanming Mou
2022-09-23 14:43 ` [PATCH 24/27] net/mlx5: add meter color flow matching in hws Suanming Mou
2022-09-23 14:43 ` [PATCH 25/27] net/mlx5: implement profile/policy get Suanming Mou
2022-09-23 14:43 ` [PATCH 26/27] net/mlx5: implement METER MARK action for HWS Suanming Mou
2022-09-23 14:43 ` [PATCH 27/27] net/mlx5: implement METER MARK indirect " Suanming Mou
2022-09-28  3:31 ` [PATCH v2 00/17] net/mlx5: HW steering PMD update Suanming Mou
2022-09-28  3:31   ` [PATCH v2 01/17] net/mlx5: fix invalid flow attributes Suanming Mou
2022-09-28  3:31   ` [PATCH v2 02/17] net/mlx5: fix IPv6 and TCP RSS hash fields Suanming Mou
2022-09-28  3:31   ` [PATCH v2 03/17] net/mlx5: add shared header reformat support Suanming Mou
2022-09-28  3:31   ` [PATCH v2 04/17] net/mlx5: add modify field hws support Suanming Mou
2022-09-28  3:31   ` [PATCH v2 05/17] net/mlx5: add HW steering port action Suanming Mou
2022-09-28  3:31   ` [PATCH v2 06/17] net/mlx5: add extended metadata mode for hardware steering Suanming Mou
2022-09-28  3:31   ` [PATCH v2 07/17] net/mlx5: add HW steering meter action Suanming Mou
2022-09-28  3:31   ` [PATCH v2 08/17] net/mlx5: add HW steering counter action Suanming Mou
2022-09-28  3:31   ` [PATCH v2 09/17] net/mlx5: support DR action template API Suanming Mou
2022-09-28  3:31   ` [PATCH v2 10/17] net/mlx5: add HW steering connection tracking support Suanming Mou
2022-09-28  3:31   ` [PATCH v2 11/17] net/mlx5: add HW steering VLAN push, pop and VID modify flow actions Suanming Mou
2022-09-28  3:31   ` [PATCH v2 12/17] net/mlx5: implement METER MARK indirect action for HWS Suanming Mou
2022-09-28  3:31   ` [PATCH v2 13/17] net/mlx5: add HWS AGE action support Suanming Mou
2022-09-28  3:31   ` [PATCH v2 14/17] net/mlx5: add async action push and pull support Suanming Mou
2022-09-28  3:31   ` [PATCH v2 15/17] net/mlx5: support flow integrity in HWS group 0 Suanming Mou
2022-09-28  3:31   ` [PATCH v2 16/17] net/mlx5: support device control for E-Switch default rule Suanming Mou
2022-09-28  3:31   ` [PATCH v2 17/17] net/mlx5: support device control of representor matching Suanming Mou
2022-09-30 12:52 ` [PATCH v3 00/17] net/mlx5: HW steering PMD update Suanming Mou
2022-09-30 12:52   ` [PATCH v3 01/17] net/mlx5: fix invalid flow attributes Suanming Mou
2022-09-30 12:53   ` [PATCH v3 02/17] net/mlx5: fix IPv6 and TCP RSS hash fields Suanming Mou
2022-09-30 12:53   ` [PATCH v3 03/17] net/mlx5: add shared header reformat support Suanming Mou
2022-09-30 12:53   ` [PATCH v3 04/17] net/mlx5: add modify field hws support Suanming Mou
2022-09-30 12:53   ` [PATCH v3 05/17] net/mlx5: add HW steering port action Suanming Mou
2022-09-30 12:53   ` [PATCH v3 06/17] net/mlx5: add extended metadata mode for hardware steering Suanming Mou
2022-09-30 12:53   ` [PATCH v3 07/17] net/mlx5: add HW steering meter action Suanming Mou
2022-09-30 12:53   ` [PATCH v3 08/17] net/mlx5: add HW steering counter action Suanming Mou
2022-09-30 12:53   ` [PATCH v3 09/17] net/mlx5: support DR action template API Suanming Mou
2022-09-30 12:53   ` [PATCH v3 10/17] net/mlx5: add HW steering connection tracking support Suanming Mou
2022-09-30 12:53   ` [PATCH v3 11/17] net/mlx5: add HW steering VLAN push, pop and VID modify flow actions Suanming Mou
2022-09-30 12:53   ` [PATCH v3 12/17] net/mlx5: implement METER MARK indirect action for HWS Suanming Mou
2022-09-30 12:53   ` [PATCH v3 13/17] net/mlx5: add HWS AGE action support Suanming Mou
2022-09-30 12:53   ` [PATCH v3 14/17] net/mlx5: add async action push and pull support Suanming Mou
2022-09-30 12:53   ` [PATCH v3 15/17] net/mlx5: support flow integrity in HWS group 0 Suanming Mou
2022-09-30 12:53   ` [PATCH v3 16/17] net/mlx5: support device control for E-Switch default rule Suanming Mou
2022-09-30 12:53   ` [PATCH v3 17/17] net/mlx5: support device control of representor matching Suanming Mou
2022-10-19 16:25 ` [PATCH v4 00/18] net/mlx5: HW steering PMD update Suanming Mou
2022-10-19 16:25   ` [PATCH v4 01/18] net/mlx5: fix invalid flow attributes Suanming Mou
2022-10-19 16:25   ` [PATCH v4 02/18] net/mlx5: fix IPv6 and TCP RSS hash fields Suanming Mou
2022-10-19 16:25   ` [PATCH v4 03/18] net/mlx5: add shared header reformat support Suanming Mou
2022-10-19 16:25   ` [PATCH v4 04/18] net/mlx5: add modify field hws support Suanming Mou
2022-10-19 16:25   ` [PATCH v4 05/18] net/mlx5: add HW steering port action Suanming Mou
2022-10-19 16:25   ` [PATCH v4 06/18] net/mlx5: add extended metadata mode for hardware steering Suanming Mou
2022-10-19 16:25   ` [PATCH v4 07/18] net/mlx5: add HW steering meter action Suanming Mou
2022-10-19 16:25   ` [PATCH v4 08/18] net/mlx5: add HW steering counter action Suanming Mou
2022-10-19 16:25   ` [PATCH v4 09/18] net/mlx5: support DR action template API Suanming Mou
2022-10-19 16:25   ` [PATCH v4 10/18] net/mlx5: add HW steering connection tracking support Suanming Mou
2022-10-19 16:25   ` [PATCH v4 11/18] net/mlx5: add HW steering VLAN push, pop and VID modify flow actions Suanming Mou
2022-10-19 16:25   ` [PATCH v4 12/18] net/mlx5: implement METER MARK indirect action for HWS Suanming Mou
2022-10-19 16:25   ` [PATCH v4 13/18] net/mlx5: add HWS AGE action support Suanming Mou
2022-10-19 16:25   ` [PATCH v4 14/18] net/mlx5: add async action push and pull support Suanming Mou
2022-10-19 16:25   ` [PATCH v4 15/18] net/mlx5: support flow integrity in HWS group 0 Suanming Mou
2022-10-19 16:25   ` [PATCH v4 16/18] net/mlx5: support device control for E-Switch default rule Suanming Mou
2022-10-19 16:25   ` [PATCH v4 17/18] net/mlx5: support device control of representor matching Suanming Mou
2022-10-19 16:25   ` [PATCH v4 18/18] net/mlx5: create control flow rules with HWS Suanming Mou
2022-10-20  3:21 ` [PATCH v5 00/18] net/mlx5: HW steering PMD update Suanming Mou
2022-10-20  3:21   ` [PATCH v5 01/18] net/mlx5: fix invalid flow attributes Suanming Mou
2022-10-20  3:21   ` [PATCH v5 02/18] net/mlx5: fix IPv6 and TCP RSS hash fields Suanming Mou
2022-10-20  3:21   ` [PATCH v5 03/18] net/mlx5: add shared header reformat support Suanming Mou
2022-10-20  3:21   ` [PATCH v5 04/18] net/mlx5: add modify field hws support Suanming Mou
2022-10-20  3:21   ` [PATCH v5 05/18] net/mlx5: add HW steering port action Suanming Mou
2022-10-20  3:21   ` [PATCH v5 06/18] net/mlx5: add extended metadata mode for hardware steering Suanming Mou
2022-10-20  3:21   ` [PATCH v5 07/18] net/mlx5: add HW steering meter action Suanming Mou
2022-10-20  3:22   ` [PATCH v5 08/18] net/mlx5: add HW steering counter action Suanming Mou
2022-10-20  3:22   ` [PATCH v5 09/18] net/mlx5: support DR action template API Suanming Mou
2022-10-20  3:22   ` [PATCH v5 10/18] net/mlx5: add HW steering connection tracking support Suanming Mou
2022-10-20  3:22   ` [PATCH v5 11/18] net/mlx5: add HW steering VLAN push, pop and VID modify flow actions Suanming Mou
2022-10-20  3:22   ` [PATCH v5 12/18] net/mlx5: implement METER MARK indirect action for HWS Suanming Mou
2022-10-20  3:22   ` [PATCH v5 13/18] net/mlx5: add HWS AGE action support Suanming Mou
2022-10-20  3:22   ` [PATCH v5 14/18] net/mlx5: add async action push and pull support Suanming Mou
2022-10-20  3:22   ` [PATCH v5 15/18] net/mlx5: support flow integrity in HWS group 0 Suanming Mou
2022-10-20  3:22   ` [PATCH v5 16/18] net/mlx5: support device control for E-Switch default rule Suanming Mou
2022-10-20  3:22   ` [PATCH v5 17/18] net/mlx5: support device control of representor matching Suanming Mou
2022-10-20  3:22   ` [PATCH v5 18/18] net/mlx5: create control flow rules with HWS Suanming Mou
2022-10-20 15:41 ` [PATCH v6 00/18] net/mlx5: HW steering PMD update Suanming Mou
2022-10-20 15:41   ` [PATCH v6 01/18] net/mlx5: fix invalid flow attributes Suanming Mou
2022-10-24  9:43     ` Slava Ovsiienko
2022-10-20 15:41   ` [PATCH v6 02/18] net/mlx5: fix IPv6 and TCP RSS hash fields Suanming Mou
2022-10-24  9:43     ` Slava Ovsiienko
2022-10-20 15:41   ` [PATCH v6 03/18] net/mlx5: add shared header reformat support Suanming Mou
2022-10-24  9:44     ` Slava Ovsiienko
2022-10-20 15:41   ` [PATCH v6 04/18] net/mlx5: add modify field hws support Suanming Mou
2022-10-24  9:44     ` Slava Ovsiienko
2022-10-20 15:41   ` [PATCH v6 05/18] net/mlx5: add HW steering port action Suanming Mou
2022-10-24  9:44     ` Slava Ovsiienko
2022-10-20 15:41   ` [PATCH v6 06/18] net/mlx5: add extended metadata mode for hardware steering Suanming Mou
2022-10-24  9:45     ` Slava Ovsiienko
2022-10-20 15:41   ` [PATCH v6 07/18] net/mlx5: add HW steering meter action Suanming Mou
2022-10-24  9:44     ` Slava Ovsiienko
2022-10-20 15:41   ` [PATCH v6 08/18] net/mlx5: add HW steering counter action Suanming Mou
2022-10-24  9:45     ` Slava Ovsiienko
2022-10-20 15:41   ` [PATCH v6 09/18] net/mlx5: support DR action template API Suanming Mou
2022-10-24  9:45     ` Slava Ovsiienko
2022-10-20 15:41   ` [PATCH v6 10/18] net/mlx5: add HW steering connection tracking support Suanming Mou
2022-10-24  9:46     ` Slava Ovsiienko
2022-10-20 15:41   ` [PATCH v6 11/18] net/mlx5: add HW steering VLAN push, pop and VID modify flow actions Suanming Mou
2022-10-24  9:46     ` Slava Ovsiienko
2022-10-20 15:41   ` [PATCH v6 12/18] net/mlx5: implement METER MARK indirect action for HWS Suanming Mou
2022-10-24  9:46     ` Slava Ovsiienko
2022-10-20 15:41   ` [PATCH v6 13/18] net/mlx5: add HWS AGE action support Suanming Mou
2022-10-24  9:46     ` Slava Ovsiienko
2022-10-20 15:41   ` [PATCH v6 14/18] net/mlx5: add async action push and pull support Suanming Mou
2022-10-24  9:47     ` Slava Ovsiienko
2022-10-20 15:41   ` [PATCH v6 15/18] net/mlx5: support flow integrity in HWS group 0 Suanming Mou
2022-10-24  9:47     ` Slava Ovsiienko
2022-10-20 15:41   ` [PATCH v6 16/18] net/mlx5: support device control for E-Switch default rule Suanming Mou
2022-10-24  9:47     ` Slava Ovsiienko
2022-10-20 15:41   ` [PATCH v6 17/18] net/mlx5: support device control of representor matching Suanming Mou
2022-10-24  9:47     ` Slava Ovsiienko
2022-10-20 15:41   ` [PATCH v6 18/18] net/mlx5: create control flow rules with HWS Suanming Mou
2022-10-24  9:48     ` Slava Ovsiienko
2022-10-24 10:57   ` [PATCH v6 00/18] net/mlx5: HW steering PMD update Raslan Darawsheh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).