DPDK patches and discussions
* [dpdk-dev] [PATCH 0/2] ethdev: tunnel offload model
@ 2020-06-25 16:03 Gregory Etelson
  2020-06-25 16:03 ` [dpdk-dev] [PATCH 1/2] ethdev: allow negative values in flow rule types Gregory Etelson
                   ` (17 more replies)
  0 siblings, 18 replies; 95+ messages in thread
From: Gregory Etelson @ 2020-06-25 16:03 UTC (permalink / raw)
  To: dev; +Cc: getelson, matan, rasland

Hardware vendors implement tunneled traffic offload techniques
differently. Although the RTE flow API provides tools capable of
offloading all sorts of network stacks, a software application must
account for these hardware differences when compiling flow rules. As a
result, tunneled traffic flow rules that utilize hardware capabilities
can differ for the same traffic.

Tunnel port offload proposed in [1] provides a software application with
a unified rules model for tunneled traffic regardless of the underlying
hardware.
 - The model introduces a concept of a virtual tunnel port (VTP).
 - The model uses VTP to offload ingress tunneled network traffic
   with RTE flow rules.
 - The model is implemented as a set of helper functions. Each PMD
   implements VTP offload according to the underlying hardware offload
   capabilities.  Applications must query the PMD for VTP flow
   items / actions before using them to create a VTP flow rule.

The model components:
- Virtual Tunnel Port (VTP) is a stateless software object that
  describes tunneled network traffic.  A VTP object usually contains
  descriptions of outer headers, tunnel headers and inner headers.
- Tunnel Steering flow Rule (TSR) detects tunneled packets and
  delegates them, for further processing, to the tunnel processing
  infrastructure implemented in the PMD for optimal hardware utilization.
- Tunnel Matching flow Rule (TMR) verifies packet configuration and
  runs offload actions in case of a match.

Application actions:
1 Initialize a VTP object according to tunnel
  network parameters.
2 Create a TSR flow rule (a C sketch of this step follows step 3.3):
2.1 Query the PMD for VTP actions. The application can query for VTP
    actions more than once:
    int
    rte_flow_tunnel_decap_set(uint16_t port_id,
                              struct rte_flow_tunnel *tunnel,
                              struct rte_flow_action **pmd_actions,
                              uint32_t *num_of_pmd_actions,
                              struct rte_flow_error *error);

2.2 Integrate PMD actions into the TSR actions list.
2.3 Create the TSR flow rule:
    flow create <port> group 0
          match {tunnel items} / end
          actions {PMD actions} / {App actions} / end

3 Create a TMR flow rule:
3.1 Query the PMD for VTP items. The application can query for VTP
    items more than once:
    int
    rte_flow_tunnel_match(uint16_t port_id,
                          struct rte_flow_tunnel *tunnel,
                          struct rte_flow_item **pmd_items,
                          uint32_t *num_of_pmd_items,
                          struct rte_flow_error *error);

3.2 Integrate PMD items into the TMR items list.
3.3 Create the TMR flow rule:
    flow create <port> group 0
          match {PMD items} / {APP items} / end
          actions {offload actions} / end
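
A minimal C sketch of step 2 (illustration only, not part of the patch):
it assumes the PMD actions array returned by rte_flow_tunnel_decap_set()
carries no END action and is simply prepended to the application
actions; the tunnel values, group numbers and application actions are
made up for the example.

    #include <string.h>
    #include <rte_common.h>
    #include <rte_byteorder.h>
    #include <rte_flow.h>

    static struct rte_flow *
    create_tsr(uint16_t port_id)
    {
        struct rte_flow_error error;
        struct rte_flow_tunnel tunnel = {
            .type = RTE_FLOW_ITEM_TYPE_VXLAN,
            .tun_info = { .tun_id = rte_cpu_to_be_64(42), }, /* example VNI */
        };
        struct rte_flow_action *pmd_actions;
        uint32_t num_pmd_actions;

        /* 2.1 Query the PMD for the VTP actions of this tunnel. */
        if (rte_flow_tunnel_decap_set(port_id, &tunnel, &pmd_actions,
                                      &num_pmd_actions, &error))
            return NULL;

        /* 2.2 Application part of the TSR actions: here a jump to the
         * group that will hold the TMR rules.
         */
        struct rte_flow_action app_actions[] = {
            { .type = RTE_FLOW_ACTION_TYPE_JUMP,
              .conf = &(struct rte_flow_action_jump){ .group = 1 } },
            { .type = RTE_FLOW_ACTION_TYPE_END },
        };
        struct rte_flow_action tsr_actions[num_pmd_actions +
                                           RTE_DIM(app_actions)];
        memcpy(tsr_actions, pmd_actions,
               num_pmd_actions * sizeof(tsr_actions[0]));
        memcpy(tsr_actions + num_pmd_actions, app_actions,
               sizeof(app_actions));

        /* 2.3 Create the TSR rule in group 0 on the tunnel items.
         * pmd_actions can be released once the rule exists; see the
         * requirements section below.
         */
        struct rte_flow_item tsr_pattern[] = {
            { .type = RTE_FLOW_ITEM_TYPE_ETH },
            { .type = RTE_FLOW_ITEM_TYPE_IPV4 },
            { .type = RTE_FLOW_ITEM_TYPE_UDP },
            { .type = RTE_FLOW_ITEM_TYPE_VXLAN },
            { .type = RTE_FLOW_ITEM_TYPE_END },
        };
        struct rte_flow_attr attr = { .group = 0, .ingress = 1 };

        return rte_flow_create(port_id, &attr, tsr_pattern,
                               tsr_actions, &error);
    }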

The model provides a helper function to restore packets that miss
tunnel TMR rules to their original state:
int
rte_flow_get_restore_info(uint16_t port_id,
                          struct rte_mbuf *mbuf,
                          struct rte_flow_restore_info *info,
                          struct rte_flow_error *error);

The rte_tunnel object filled by the call inside the
rte_flow_restore_info *info parameter can be used by the application
to create a new TMR rule for that tunnel.
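
A minimal datapath sketch of the restore call (illustration only). It
assumes mbuf was just received with rte_eth_rx_burst() and uses the
rte_flow_tunnel_get_restore_info() name exported by the patch
implementation below:

    struct rte_flow_restore_info info;
    struct rte_flow_error error;

    if (rte_flow_tunnel_get_restore_info(port_id, mbuf, &info, &error) == 0 &&
        (info.flags & RTE_FLOW_RESTORE_INFO_TUNNEL)) {
        /* The packet belongs to info.tunnel but missed its TMR rules.
         * info.tunnel can be fed back to rte_flow_tunnel_match() /
         * rte_flow_tunnel_decap_set() to offload a rule for it.
         * If RTE_FLOW_RESTORE_INFO_ENCAPSULATED is set, the outer
         * headers were not stripped and must be parsed in software.
         */
    }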

The model requirements:
The software application must initialize the
rte_tunnel object with tunnel parameters before calling
rte_flow_tunnel_decap_set() & rte_flow_tunnel_match().

The PMD actions array obtained in rte_flow_tunnel_decap_set() must be
released by the application with the rte_flow_action_release() call.
The application can release the actions after the TSR rule was created.

The PMD items array obtained with rte_flow_tunnel_match() must be
released by the application with the rte_flow_item_release() call.  The
application can release the items after the rule was created. However,
if the application needs to create an additional TMR rule for the same
tunnel, it will need to obtain the PMD items again.

The application cannot destroy the rte_tunnel object before it releases
all PMD actions & PMD items referencing that tunnel.
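
A short sketch of the release sequence implied by the requirements
above (illustration only). The helpers are shown with the names
exported by the patch, rte_flow_tunnel_action_decap_release() and
rte_flow_tunnel_item_release(); pmd_actions / pmd_items are the arrays
obtained from the query calls:

    /* After the TSR rule was created, return the PMD actions array. */
    rte_flow_tunnel_action_decap_release(port_id, pmd_actions,
                                         num_pmd_actions, &error);

    /* After the TMR rule(s) were created, return the PMD items array. */
    rte_flow_tunnel_item_release(port_id, pmd_items, num_pmd_items, &error);

    /* Only now may the rte_flow_tunnel object be destroyed or reused. */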

[1] https://mails.dpdk.org/archives/dev/2020-June/169656.html


Eli Britstein (1):
  ethdev: tunnel offload model

Gregory Etelson (1):
  ethdev: allow negative values in flow rule types

 doc/guides/prog_guide/rte_flow.rst       | 105 ++++++++++++
 lib/librte_ethdev/rte_ethdev_version.map |   5 +
 lib/librte_ethdev/rte_flow.c             | 142 +++++++++++++++-
 lib/librte_ethdev/rte_flow.h             | 196 +++++++++++++++++++++++
 lib/librte_ethdev/rte_flow_driver.h      |  32 ++++
 5 files changed, 474 insertions(+), 6 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 95+ messages in thread

* [dpdk-dev] [PATCH 1/2] ethdev: allow negative values in flow rule types
  2020-06-25 16:03 [dpdk-dev] [PATCH 0/2] ethdev: tunnel offload model Gregory Etelson
@ 2020-06-25 16:03 ` Gregory Etelson
  2020-07-05 13:34   ` Andrew Rybchenko
  2020-06-25 16:03 ` [dpdk-dev] [PATCH 2/2] ethdev: tunnel offload model Gregory Etelson
                   ` (16 subsequent siblings)
  17 siblings, 1 reply; 95+ messages in thread
From: Gregory Etelson @ 2020-06-25 16:03 UTC (permalink / raw)
  To: dev; +Cc: getelson, matan, rasland, Ori Kam

RTE flow items & actions use positive values in item & action types.
Negative values are reserved for PMD private types. PMD
items & actions are usually not exposed to the application and are not
used to create RTE flows.

The patch gives applications that have access to PMD flow
items & actions the ability to integrate RTE and PMD items & actions
and use them to create a flow rule.
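
A hedged illustration of the convention the change relies on; the
helper below is hypothetical and only shows why a cast to uint32_t
separates the two type ranges:

    #include <limits.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Standard RTE item/action types are non-negative enum values.
     * PMD private types are negative, so after a cast to uint32_t they
     * land above INT_MAX and a single unsigned comparison tells the
     * two ranges apart without a descriptor table lookup.
     */
    static inline bool
    flow_type_is_pmd_private(int type)
    {
        return (uint32_t)type > INT_MAX;
    }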

Signed-off-by: Gregory Etelson <getelson@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
---
 lib/librte_ethdev/rte_flow.c | 30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
index 1685be5f73..c19d25649f 100644
--- a/lib/librte_ethdev/rte_flow.c
+++ b/lib/librte_ethdev/rte_flow.c
@@ -563,7 +563,12 @@ rte_flow_conv_item_spec(void *buf, const size_t size,
 		}
 		break;
 	default:
-		off = rte_flow_desc_item[item->type].size;
+		/**
+		 * allow PMD private flow item
+		 */
+		off = (uint32_t)item->type <= INT_MAX ?
+			rte_flow_desc_item[item->type].size :
+			sizeof(void *);
 		rte_memcpy(buf, data, (size > off ? off : size));
 		break;
 	}
@@ -666,7 +671,12 @@ rte_flow_conv_action_conf(void *buf, const size_t size,
 		}
 		break;
 	default:
-		off = rte_flow_desc_action[action->type].size;
+		/**
+		 * allow PMD private flow action
+		 */
+		off = (uint32_t)action->type <= INT_MAX ?
+			rte_flow_desc_action[action->type].size :
+			sizeof(void *);
 		rte_memcpy(buf, action->conf, (size > off ? off : size));
 		break;
 	}
@@ -708,8 +718,12 @@ rte_flow_conv_pattern(struct rte_flow_item *dst,
 	unsigned int i;
 
 	for (i = 0, off = 0; !num || i != num; ++i, ++src, ++dst) {
-		if ((size_t)src->type >= RTE_DIM(rte_flow_desc_item) ||
-		    !rte_flow_desc_item[src->type].name)
+		/**
+		 * allow PMD private flow item
+		 */
+		if (((uint32_t)src->type <= INT_MAX) &&
+			((size_t)src->type >= RTE_DIM(rte_flow_desc_item) ||
+		    !rte_flow_desc_item[src->type].name))
 			return rte_flow_error_set
 				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM, src,
 				 "cannot convert unknown item type");
@@ -797,8 +811,12 @@ rte_flow_conv_actions(struct rte_flow_action *dst,
 	unsigned int i;
 
 	for (i = 0, off = 0; !num || i != num; ++i, ++src, ++dst) {
-		if ((size_t)src->type >= RTE_DIM(rte_flow_desc_action) ||
-		    !rte_flow_desc_action[src->type].name)
+		/**
+		 * allow PMD private flow action
+		 */
+		if (((uint32_t)src->type <= INT_MAX) &&
+		    ((size_t)src->type >= RTE_DIM(rte_flow_desc_action) ||
+		    !rte_flow_desc_action[src->type].name))
 			return rte_flow_error_set
 				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION,
 				 src, "cannot convert unknown action type");
-- 
2.25.1


^ permalink raw reply	[flat|nested] 95+ messages in thread

* [dpdk-dev] [PATCH 2/2] ethdev: tunnel offload model
  2020-06-25 16:03 [dpdk-dev] [PATCH 0/2] ethdev: tunnel offload model Gregory Etelson
  2020-06-25 16:03 ` [dpdk-dev] [PATCH 1/2] ethdev: allow negative values in flow rule types Gregory Etelson
@ 2020-06-25 16:03 ` Gregory Etelson
       [not found]   ` <DB8PR05MB6761ED02BCD188771BDCDE64A86F0@DB8PR05MB6761.eurprd05.prod.outlook.com>
  2020-07-05 14:50   ` Andrew Rybchenko
  2020-07-05 13:39 ` [dpdk-dev] [PATCH 0/2] " Andrew Rybchenko
                   ` (15 subsequent siblings)
  17 siblings, 2 replies; 95+ messages in thread
From: Gregory Etelson @ 2020-06-25 16:03 UTC (permalink / raw)
  To: dev; +Cc: getelson, matan, rasland, Eli Britstein, Ori Kam

From: Eli Britstein <elibr@mellanox.com>

Hardware vendors implement tunneled traffic offload techniques
differently. Although the RTE flow API provides tools capable of
offloading all sorts of network stacks, a software application must
account for these hardware differences when compiling flow rules. As a
result, tunneled traffic flow rules that utilize hardware capabilities
can differ for the same traffic.

Tunnel port offload proposed in [1] provides a software application with
a unified rules model for tunneled traffic regardless of the underlying
hardware.
 - The model introduces a concept of a virtual tunnel port (VTP).
 - The model uses VTP to offload ingress tunneled network traffic
   with RTE flow rules.
 - The model is implemented as a set of helper functions. Each PMD
   implements VTP offload according to the underlying hardware offload
   capabilities.  Applications must query the PMD for VTP flow
   items / actions before using them to create a VTP flow rule.

The model components:
- Virtual Tunnel Port (VTP) is a stateless software object that
  describes tunneled network traffic.  A VTP object usually contains
  descriptions of outer headers, tunnel headers and inner headers.
- Tunnel Steering flow Rule (TSR) detects tunneled packets and
  delegates them, for further processing, to the tunnel processing
  infrastructure implemented in the PMD for optimal hardware utilization.
- Tunnel Matching flow Rule (TMR) verifies packet configuration and
  runs offload actions in case of a match.

Application actions:
1 Initialize a VTP object according to tunnel
  network parameters.
2 Create a TSR flow rule:
2.1 Query the PMD for VTP actions. The application can query for VTP
    actions more than once:
    int
    rte_flow_tunnel_decap_set(uint16_t port_id,
                              struct rte_flow_tunnel *tunnel,
                              struct rte_flow_action **pmd_actions,
                              uint32_t *num_of_pmd_actions,
                              struct rte_flow_error *error);

2.2 Integrate PMD actions into the TSR actions list.
2.3 Create the TSR flow rule:
    flow create <port> group 0
          match {tunnel items} / end
          actions {PMD actions} / {App actions} / end

3 Create a TMR flow rule (a C sketch of this step follows step 3.3):
3.1 Query the PMD for VTP items. The application can query for VTP
    items more than once:
    int
    rte_flow_tunnel_match(uint16_t port_id,
                          struct rte_flow_tunnel *tunnel,
                          struct rte_flow_item **pmd_items,
                          uint32_t *num_of_pmd_items,
                          struct rte_flow_error *error);

3.2 Integrate PMD items into the TMR items list.
3.3 Create the TMR flow rule:
    flow create <port> group 0
          match {PMD items} / {APP items} / end
          actions {offload actions} / end
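
A minimal C sketch of step 3 (illustration only, not part of the
patch). It assumes tunnel is the rte_flow_tunnel object already used
for the steering rule, that the PMD items array carries no END item and
is simply prepended to the application items, and that the inner-header
items, offload action and group number are made up for the example.

    #include <string.h>
    #include <rte_common.h>
    #include <rte_flow.h>

    static struct rte_flow *
    create_tmr(uint16_t port_id, struct rte_flow_tunnel *tunnel)
    {
        struct rte_flow_error error;
        struct rte_flow_item *pmd_items;
        uint32_t num_pmd_items;

        /* 3.1 Query the PMD for the VTP items of this tunnel. */
        if (rte_flow_tunnel_match(port_id, tunnel, &pmd_items,
                                  &num_pmd_items, &error))
            return NULL;

        /* 3.2 Application part of the match: the inner headers. */
        struct rte_flow_item app_items[] = {
            { .type = RTE_FLOW_ITEM_TYPE_ETH },
            { .type = RTE_FLOW_ITEM_TYPE_IPV4 },
            { .type = RTE_FLOW_ITEM_TYPE_END },
        };
        struct rte_flow_item tmr_pattern[num_pmd_items + RTE_DIM(app_items)];
        memcpy(tmr_pattern, pmd_items,
               num_pmd_items * sizeof(tmr_pattern[0]));
        memcpy(tmr_pattern + num_pmd_items, app_items, sizeof(app_items));

        /* 3.3 Offload actions for packets of this tunnel; pmd_items can
         * be released once the rule exists.
         */
        struct rte_flow_action tmr_actions[] = {
            { .type = RTE_FLOW_ACTION_TYPE_QUEUE,
              .conf = &(struct rte_flow_action_queue){ .index = 0 } },
            { .type = RTE_FLOW_ACTION_TYPE_END },
        };
        struct rte_flow_attr attr = { .group = 1, .ingress = 1 };

        return rte_flow_create(port_id, &attr, tmr_pattern,
                               tmr_actions, &error);
    }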

The model provides a helper function to restore packets that miss
tunnel TMR rules to their original state:
int
rte_flow_get_restore_info(uint16_t port_id,
                          struct rte_mbuf *mbuf,
                          struct rte_flow_restore_info *info,
                          struct rte_flow_error *error);

The rte_tunnel object filled by the call inside the
rte_flow_restore_info *info parameter can be used by the application
to create a new TMR rule for that tunnel.

The model requirements:
The software application must initialize the
rte_tunnel object with tunnel parameters before calling
rte_flow_tunnel_decap_set() & rte_flow_tunnel_match().

The PMD actions array obtained in rte_flow_tunnel_decap_set() must be
released by the application with the rte_flow_action_release() call.
The application can release the actions after the TSR rule was created.

The PMD items array obtained with rte_flow_tunnel_match() must be
released by the application with the rte_flow_item_release() call.  The
application can release the items after the rule was created. However,
if the application needs to create an additional TMR rule for the same
tunnel, it will need to obtain the PMD items again.

The application cannot destroy the rte_tunnel object before it releases
all PMD actions & PMD items referencing that tunnel.

[1] https://mails.dpdk.org/archives/dev/2020-June/169656.html

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
---
 doc/guides/prog_guide/rte_flow.rst       | 105 ++++++++++++
 lib/librte_ethdev/rte_ethdev_version.map |   5 +
 lib/librte_ethdev/rte_flow.c             | 112 +++++++++++++
 lib/librte_ethdev/rte_flow.h             | 196 +++++++++++++++++++++++
 lib/librte_ethdev/rte_flow_driver.h      |  32 ++++
 5 files changed, 450 insertions(+)

diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index d5dd18ce99..cfd98c2e7d 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -3010,6 +3010,111 @@ operations include:
 - Duplication of a complete flow rule description.
 - Pattern item or action name retrieval.
 
+Tunneled traffic offload
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Provide the application with a unified rules model for tunneled traffic
+regardless of the underlying hardware.
+
+ - The model introduces a concept of a virtual tunnel port (VTP).
+ - The model uses VTP to offload ingress tunneled network traffic
+   with RTE flow rules.
+ - The model is implemented as a set of helper functions. Each PMD
+   implements VTP offload according to the underlying hardware offload
+   capabilities.  Applications must query the PMD for VTP flow
+   items / actions before using them to create a VTP flow rule.
+
+The model components:
+
+- Virtual Tunnel Port (VTP) is a stateless software object that
+  describes tunneled network traffic.  VTP object usually contains
+  descriptions of outer headers, tunnel headers and inner headers.
+- Tunnel Steering flow Rule (TSR) detects tunneled packets and
+  delegates them to tunnel processing infrastructure, implemented
+  in PMD for optimal hardware utilization, for further processing.
+- Tunnel Matching flow Rule (TMR) verifies packet configuration and
+  runs offload actions in case of a match.
+
+Application actions:
+
+1 Initialize VTP object according to tunnel network parameters.
+
+2 Create TSR flow rule.
+
+2.1 Query PMD for VTP actions. Application can query for VTP actions more than once.
+
+  .. code-block:: c
+
+    int
+    rte_flow_tunnel_decap_set(uint16_t port_id,
+                              struct rte_flow_tunnel *tunnel,
+                              struct rte_flow_action **pmd_actions,
+                              uint32_t *num_of_pmd_actions,
+                              struct rte_flow_error *error);
+
+2.2 Integrate PMD actions into TSR actions list.
+
+2.3 Create TSR flow rule.
+
+    .. code-block:: console
+
+      flow create <port> group 0 match {tunnel items} / end actions {PMD actions} / {App actions} / end
+
+3 Create TMR flow rule.
+
+3.1 Query PMD for VTP items. Application can query for VTP items more than once.
+
+    .. code-block:: c
+
+      int
+      rte_flow_tunnel_match(uint16_t port_id,
+                            struct rte_flow_tunnel *tunnel,
+                            struct rte_flow_item **pmd_items,
+                            uint32_t *num_of_pmd_items,
+                            struct rte_flow_error *error);
+
+3.2 Integrate PMD items into TMR items list.
+
+3.3 Create TMR flow rule.
+
+    .. code-block:: console
+
+      flow create <port> group 0 match {PMD items} / {APP items} / end actions {offload actions} / end
+
+The model provides a helper function to restore packets that miss
+tunnel TMR rules to their original state:
+
+.. code-block:: c
+
+  int
+  rte_flow_get_restore_info(uint16_t port_id,
+                            struct rte_mbuf *mbuf,
+                            struct rte_flow_restore_info *info,
+                            struct rte_flow_error *error);
+
+The rte_tunnel object filled by the call inside the
+``rte_flow_restore_info *info`` parameter can be used by the application
+to create a new TMR rule for that tunnel.
+
+The model requirements:
+
+The software application must initialize the
+rte_tunnel object with tunnel parameters before calling
+rte_flow_tunnel_decap_set() & rte_flow_tunnel_match().
+
+The PMD actions array obtained in rte_flow_tunnel_decap_set() must be
+released by the application with the rte_flow_action_release() call.
+The application can release the actions after the TSR rule was created.
+
+The PMD items array obtained with rte_flow_tunnel_match() must be released
+by the application with the rte_flow_item_release() call.  The application
+can release the items after the rule was created. However, if the application
+needs to create an additional TMR rule for the same tunnel it will need
+to obtain the PMD items again.
+
+The application cannot destroy the rte_tunnel object before it releases
+all PMD actions & PMD items referencing that tunnel.
+
 Caveats
 -------
 
diff --git a/lib/librte_ethdev/rte_ethdev_version.map b/lib/librte_ethdev/rte_ethdev_version.map
index 7155056045..63800811df 100644
--- a/lib/librte_ethdev/rte_ethdev_version.map
+++ b/lib/librte_ethdev/rte_ethdev_version.map
@@ -241,4 +241,9 @@ EXPERIMENTAL {
 	__rte_ethdev_trace_rx_burst;
 	__rte_ethdev_trace_tx_burst;
 	rte_flow_get_aged_flows;
+	rte_flow_tunnel_decap_set;
+	rte_flow_tunnel_match;
+	rte_flow_tunnel_get_restore_info;
+	rte_flow_tunnel_action_decap_release;
+	rte_flow_tunnel_item_release;
 };
diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
index c19d25649f..2dc5bfbb3f 100644
--- a/lib/librte_ethdev/rte_flow.c
+++ b/lib/librte_ethdev/rte_flow.c
@@ -1268,3 +1268,115 @@ rte_flow_get_aged_flows(uint16_t port_id, void **contexts,
 				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				  NULL, rte_strerror(ENOTSUP));
 }
+
+int
+rte_flow_tunnel_decap_set(uint16_t port_id,
+			  struct rte_flow_tunnel *tunnel,
+			  struct rte_flow_action **actions,
+			  uint32_t *num_of_actions,
+			  struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->tunnel_decap_set)) {
+		return flow_err(port_id,
+				ops->tunnel_decap_set(dev, tunnel, actions,
+						      num_of_actions, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_tunnel_match(uint16_t port_id,
+		      struct rte_flow_tunnel *tunnel,
+		      struct rte_flow_item **items,
+		      uint32_t *num_of_items,
+		      struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->tunnel_match)) {
+		return flow_err(port_id,
+				ops->tunnel_match(dev, tunnel, items,
+						  num_of_items, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_tunnel_get_restore_info(uint16_t port_id,
+				 struct rte_mbuf *m,
+				 struct rte_flow_restore_info *restore_info,
+				 struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->get_restore_info)) {
+		return flow_err(port_id,
+				ops->get_restore_info(dev, m, restore_info,
+						      error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_tunnel_action_decap_release(uint16_t port_id,
+				     struct rte_flow_action *actions,
+				     uint32_t num_of_actions,
+				     struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->action_release)) {
+		return flow_err(port_id,
+				ops->action_release(dev, actions,
+						    num_of_actions, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_tunnel_item_release(uint16_t port_id,
+			     struct rte_flow_item *items,
+			     uint32_t num_of_items,
+			     struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->item_release)) {
+		return flow_err(port_id,
+				ops->item_release(dev, items,
+						  num_of_items, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
index b0e4199192..1374b6e5a7 100644
--- a/lib/librte_ethdev/rte_flow.h
+++ b/lib/librte_ethdev/rte_flow.h
@@ -3324,6 +3324,202 @@ int
 rte_flow_get_aged_flows(uint16_t port_id, void **contexts,
 			uint32_t nb_contexts, struct rte_flow_error *error);
 
+/* Tunnel information. */
+__rte_experimental
+struct rte_flow_ip_tunnel_key {
+	rte_be64_t tun_id; /**< Tunnel identification. */
+	union {
+		struct {
+			rte_be32_t src_addr; /**< IPv4 source address. */
+			rte_be32_t dst_addr; /**< IPv4 destination address. */
+		} ipv4;
+		struct {
+			uint8_t src_addr[16]; /**< IPv6 source address. */
+			uint8_t dst_addr[16]; /**< IPv6 destination address. */
+		} ipv6;
+	} u;
+	bool       is_ipv6; /**< True for valid IPv6 fields. Otherwise IPv4. */
+	rte_be16_t tun_flags; /**< Tunnel flags. */
+	uint8_t    tos; /**< TOS for IPv4, TC for IPv6. */
+	uint8_t    ttl; /**< TTL for IPv4, HL for IPv6. */
+	rte_be32_t label; /**< Flow Label for IPv6. */
+	rte_be16_t tp_src; /**< Tunnel port source. */
+	rte_be16_t tp_dst; /**< Tunnel port destination. */
+};
+
+
+/* Tunnel has a type and the key information. */
+__rte_experimental
+struct rte_flow_tunnel {
+	/**
+	 * Tunnel type, for example RTE_FLOW_ITEM_TYPE_VXLAN,
+	 * RTE_FLOW_ITEM_TYPE_NVGRE etc.
+	 */
+	enum rte_flow_item_type		type;
+	struct rte_flow_ip_tunnel_key	tun_info; /**< Tunnel key info. */
+};
+
+/**
+ * Indicate that the packet has a tunnel.
+ */
+#define RTE_FLOW_RESTORE_INFO_TUNNEL  (1ULL << 0)
+
+/**
+ * Indicate that the packet has a non decapsulated tunnel header.
+ */
+#define RTE_FLOW_RESTORE_INFO_ENCAPSULATED  (1ULL << 1)
+
+/**
+ * Indicate that the packet has a group_id.
+ */
+#define RTE_FLOW_RESTORE_INFO_GROUP_ID  (1ULL << 2)
+
+/**
+ * Restore information structure to communicate the current packet processing
+ * state when some of the processing pipeline is done in hardware and should
+ * continue in software.
+ */
+__rte_experimental
+struct rte_flow_restore_info {
+	/**
+	 * Bitwise flags (RTE_FLOW_RESTORE_INFO_*) to indicate validation of
+	 * other fields in struct rte_flow_restore_info.
+	 */
+	uint64_t flags;
+	uint32_t group_id; /**< Group ID. */
+	struct rte_flow_tunnel tunnel; /**< Tunnel information. */
+};
+
+/**
+ * Allocate an array of actions to be used in rte_flow_create, to implement
+ * tunnel-decap-set for the given tunnel.
+ * Sample usage:
+ *   actions vxlan_decap / tunnel-decap-set(tunnel properties) /
+ *            jump group 0 / end
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] tunnel
+ *   Tunnel properties.
+ * @param[out] actions
+ *   Array of actions to be allocated by the PMD. This array should be
+ *   concatenated with the actions array provided to rte_flow_create.
+ * @param[out] num_of_actions
+ *   Number of actions allocated.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_decap_set(uint16_t port_id,
+			  struct rte_flow_tunnel *tunnel,
+			  struct rte_flow_action **actions,
+			  uint32_t *num_of_actions,
+			  struct rte_flow_error *error);
+
+/**
+ * Allocate an array of items to be used in rte_flow_create, to implement
+ * tunnel-match for the given tunnel.
+ * Sample usage:
+ *   pattern tunnel-match(tunnel properties) / outer-header-matches /
+ *           inner-header-matches / end
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] tunnel
+ *   Tunnel properties.
+ * @param[out] items
+ *   Array of items to be allocated by the PMD. This array should be
+ *   concatenated with the items array provided to rte_flow_create.
+ * @param[out] num_of_items
+ *   Number of items allocated.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_match(uint16_t port_id,
+		      struct rte_flow_tunnel *tunnel,
+		      struct rte_flow_item **items,
+		      uint32_t *num_of_items,
+		      struct rte_flow_error *error);
+
+/**
+ * Populate the current packet processing state, if exists, for the given mbuf.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] m
+ *   Mbuf struct.
+ * @param[out] info
+ *   Restore information. Upon success contains the HW state.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_get_restore_info(uint16_t port_id,
+				 struct rte_mbuf *m,
+				 struct rte_flow_restore_info *info,
+				 struct rte_flow_error *error);
+
+/**
+ * Release the action array as allocated by rte_flow_tunnel_decap_set.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] actions
+ *   Array of actions to be released.
+ * @param[in] num_of_actions
+ *   Number of elements in actions array.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_action_decap_release(uint16_t port_id,
+				     struct rte_flow_action *actions,
+				     uint32_t num_of_actions,
+				     struct rte_flow_error *error);
+
+/**
+ * Release the item array as allocated by rte_flow_tunnel_match.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] items
+ *   Array of items to be released.
+ * @param[in] num_of_items
+ *   Number of elements in item array.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_item_release(uint16_t port_id,
+			     struct rte_flow_item *items,
+			     uint32_t num_of_items,
+			     struct rte_flow_error *error);
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_ethdev/rte_flow_driver.h b/lib/librte_ethdev/rte_flow_driver.h
index 881cc469b7..ad1d7a2cdc 100644
--- a/lib/librte_ethdev/rte_flow_driver.h
+++ b/lib/librte_ethdev/rte_flow_driver.h
@@ -107,6 +107,38 @@ struct rte_flow_ops {
 		 void **context,
 		 uint32_t nb_contexts,
 		 struct rte_flow_error *err);
+	/** See rte_flow_tunnel_decap_set() */
+	int (*tunnel_decap_set)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_tunnel *tunnel,
+		 struct rte_flow_action **pmd_actions,
+		 uint32_t *num_of_actions,
+		 struct rte_flow_error *err);
+	/** See rte_flow_tunnel_match() */
+	int (*tunnel_match)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_tunnel *tunnel,
+		 struct rte_flow_item **pmd_items,
+		 uint32_t *num_of_items,
+		 struct rte_flow_error *err);
+	/** See rte_flow_get_rte_flow_restore_info() */
+	int (*get_restore_info)
+		(struct rte_eth_dev *dev,
+		 struct rte_mbuf *m,
+		 struct rte_flow_restore_info *info,
+		 struct rte_flow_error *err);
+	/** See rte_flow_action_tunnel_decap_release() */
+	int (*action_release)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_action *pmd_actions,
+		 uint32_t num_of_actions,
+		 struct rte_flow_error *err);
+	/** See rte_flow_item_release() */
+	int (*item_release)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_item *pmd_items,
+		 uint32_t num_of_items,
+		 struct rte_flow_error *err);
 };
 
 /**
-- 
2.25.1


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH 2/2] ethdev: tunnel offload model
       [not found]     ` <38d3513f-1261-0fbc-7c56-f83ced61f97a@ashroe.eu>
@ 2020-07-01  6:52       ` Gregory Etelson
  2020-07-13  8:21         ` Thomas Monjalon
  0 siblings, 1 reply; 95+ messages in thread
From: Gregory Etelson @ 2020-07-01  6:52 UTC (permalink / raw)
  To: Kinsella, Ray
  Cc: Matan Azrad, Raslan Darawsheh, Ori Kam, John McNamara,
	Marko Kovacevic, Neil Horman, Thomas Monjalon, Ferruh Yigit,
	Andrew Rybchenko, Ajit Khaparde, sriharsha.basavapatna,
	hemal.shah, Eli Britstein, Oz Shlomo, dev



> -----Original Message-----
> From: Kinsella, Ray <mdr@ashroe.eu>
> Sent: Tuesday, June 30, 2020 14:30
> To: Gregory Etelson <getelson@mellanox.com>
> Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
> <rasland@mellanox.com>; Ori Kam <orika@mellanox.com>; John McNamara
> <john.mcnamara@intel.com>; Marko Kovacevic
> <marko.kovacevic@intel.com>; Neil Horman <nhorman@tuxdriver.com>;
> Thomas Monjalon <thomas@monjalon.net>; Ferruh Yigit
> <ferruh.yigit@intel.com>; Andrew Rybchenko
> <arybchenko@solarflare.com>; Ajit Khaparde
> <ajit.khaparde@broadcom.com>; sriharsha.basavapatna@broadcom.com;
> hemal.shah@broadcom.com; Eli Britstein <elibr@mellanox.com>; Oz Shlomo
> <ozsh@mellanox.com>
> Subject: Re: [PATCH 2/2] ethdev: tunnel offload model
> 
> 
> 
> On 30/06/2020 10:05, Gregory Etelson wrote:
> >
> > + maintainers
> >
> > -----Original Message-----
> > From: Gregory Etelson <getelson@mellanox.com>
> > Sent: Thursday, June 25, 2020 19:04
> > To: dev@dpdk.org
> > Cc: Gregory Etelson <getelson@mellanox.com>; Matan Azrad
> > <matan@mellanox.com>; Raslan Darawsheh <rasland@mellanox.com>; Eli
> > Britstein <elibr@mellanox.com>; Ori Kam <orika@mellanox.com>
> > Subject: [PATCH 2/2] ethdev: tunnel offload model
> >
> > From: Eli Britstein <elibr@mellanox.com>
> >
> > Hardware vendors implement tunneled traffic offload techniques
> differently. Although RTE flow API provides tools capable to offload all sorts
> of network stacks, software application must reference this hardware
> differences in flow rules compilation. As the result tunneled traffic flow rules
> that utilize hardware capabilities can be different for the same traffic.
> >
> > Tunnel port offload proposed in [1] provides software application with
> unified rules model for tunneled traffic regardless underlying hardware.
> >  - The model introduces a concept of a virtual tunnel port (VTP).
> >  - The model uses VTP to offload ingress tunneled network traffic
> >    with RTE flow rules.
> >  - The model is implemented as set of helper functions. Each PMD
> >    implements VTP offload according to underlying hardware offload
> >    capabilities.  Applications must query PMD for VTP flow
> >    items / actions before using in creation of a VTP flow rule.
> >
> > The model components:
> > - Virtual Tunnel Port (VTP) is a stateless software object that
> >   describes tunneled network traffic.  VTP object usually contains
> >   descriptions of outer headers, tunnel headers and inner headers.
> > - Tunnel Steering flow Rule (TSR) detects tunneled packets and
> >   delegates them to tunnel processing infrastructure, implemented
> >   in PMD for optimal hardware utilization, for further processing.
> > - Tunnel Matching flow Rule (TMR) verifies packet configuration and
> >   runs offload actions in case of a match.
> >
> > Application actions:
> > 1 Initialize VTP object according to tunnel
> >   network parameters.
> > 2 Create TSR flow rule:
> > 2.1 Query PMD for VTP actions: application can query for VTP actions
> >     more than once
> >     int
> >     rte_flow_tunnel_decap_set(uint16_t port_id,
> >                               struct rte_flow_tunnel *tunnel,
> >                               struct rte_flow_action **pmd_actions,
> >                               uint32_t *num_of_pmd_actions,
> >                               struct rte_flow_error *error);
> >
> > 2.2 Integrate PMD actions into TSR actions list.
> > 2.3 Create TSR flow rule:
> >     flow create <port> group 0
> >           match {tunnel items} / end
> >           actions {PMD actions} / {App actions} / end
> >
> > 3 Create TMR flow rule:
> > 3.1 Query PMD for VTP items: application can query for VTP items
> >     more than once
> >     int
> >     rte_flow_tunnel_match(uint16_t port_id,
> >                           struct rte_flow_tunnel *tunnel,
> >                           struct rte_flow_item **pmd_items,
> >                           uint32_t *num_of_pmd_items,
> >                           struct rte_flow_error *error);
> >
> > 3.2 Integrate PMD items into TMR items list:
> > 3.3 Create TMR flow rule
> >     flow create <port> group 0
> >           match {PMD items} / {APP items} / end
> >           actions {offload actions} / end
> >
> > The model provides helper function call to restore packets that miss tunnel
> TMR rules to its original state:
> > int
> > rte_flow_get_restore_info(uint16_t port_id,
> >                           struct rte_mbuf *mbuf,
> >                           struct rte_flow_restore_info *info,
> >                           struct rte_flow_error *error);
> >
> > rte_tunnel object filled by the call inside rte_flow_restore_info *info
> parameter can be used by the application to create new TMR rule for that
> tunnel.
> >
> > The model requirements:
> > Software application must initialize
> > rte_tunnel object with tunnel parameters before calling
> > rte_flow_tunnel_decap_set() & rte_flow_tunnel_match().
> >
> > PMD actions array obtained in rte_flow_tunnel_decap_set() must be
> released by application with rte_flow_action_release() call.
> > Application can release the actions after TSR rule was created.
> >
> > PMD items array obtained with rte_flow_tunnel_match() must be released
> by application with rte_flow_item_release() call.  Application can release the
> items after rule was created. However, if the application needs to create
> additional TMR rule for the same tunnel it will need to obtain PMD items
> again.
> >
> > Application cannot destroy rte_tunnel object before it releases all PMD
> actions & PMD items referencing that tunnel.
> >
> > [1] https://mails.dpdk.org/archives/dev/2020-June/169656.html
> >
> > Signed-off-by: Eli Britstein <elibr@mellanox.com>
> > Acked-by: Ori Kam <orika@mellanox.com>
> > ---
> >  doc/guides/prog_guide/rte_flow.rst       | 105 ++++++++++++
> >  lib/librte_ethdev/rte_ethdev_version.map |   5 +
> >  lib/librte_ethdev/rte_flow.c             | 112 +++++++++++++
> >  lib/librte_ethdev/rte_flow.h             | 196 +++++++++++++++++++++++
> >  lib/librte_ethdev/rte_flow_driver.h      |  32 ++++
> >  5 files changed, 450 insertions(+)
> >
> > diff --git a/doc/guides/prog_guide/rte_flow.rst
> > b/doc/guides/prog_guide/rte_flow.rst
> > index d5dd18ce99..cfd98c2e7d 100644
> > --- a/doc/guides/prog_guide/rte_flow.rst
> > +++ b/doc/guides/prog_guide/rte_flow.rst
> > @@ -3010,6 +3010,111 @@ operations include:
> >  - Duplication of a complete flow rule description.
> >  - Pattern item or action name retrieval.
> >
> > +Tunneled traffic offload
> > +~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +Provide software application with unified rules model for tunneled
> > +traffic regardless underlying hardware.
> > +
> > + - The model introduces a concept of a virtual tunnel port (VTP).
> > + - The model uses VTP to offload ingress tunneled network traffic
> > +   with RTE flow rules.
> > + - The model is implemented as set of helper functions. Each PMD
> > +   implements VTP offload according to underlying hardware offload
> > +   capabilities.  Applications must query PMD for VTP flow
> > +   items / actions before using in creation of a VTP flow rule.
> > +
> > +The model components:
> > +
> > +- Virtual Tunnel Port (VTP) is a stateless software object that
> > +  describes tunneled network traffic.  VTP object usually contains
> > +  descriptions of outer headers, tunnel headers and inner headers.
> > +- Tunnel Steering flow Rule (TSR) detects tunneled packets and
> > +  delegates them to tunnel processing infrastructure, implemented
> > +  in PMD for optimal hardware utilization, for further processing.
> > +- Tunnel Matching flow Rule (TMR) verifies packet configuration and
> > +  runs offload actions in case of a match.
> > +
> > +Application actions:
> > +
> > +1 Initialize VTP object according to tunnel network parameters.
> > +
> > +2 Create TSR flow rule.
> > +
> > +2.1 Query PMD for VTP actions. Application can query for VTP actions
> more than once.
> > +
> > +  .. code-block:: c
> > +
> > +    int
> > +    rte_flow_tunnel_decap_set(uint16_t port_id,
> > +                              struct rte_flow_tunnel *tunnel,
> > +                              struct rte_flow_action **pmd_actions,
> > +                              uint32_t *num_of_pmd_actions,
> > +                              struct rte_flow_error *error);
> > +
> > +2.2 Integrate PMD actions into TSR actions list.
> > +
> > +2.3 Create TSR flow rule.
> > +
> > +    .. code-block:: console
> > +
> > +      flow create <port> group 0 match {tunnel items} / end actions
> > + {PMD actions} / {App actions} / end
> > +
> > +3 Create TMR flow rule.
> > +
> > +3.1 Query PMD for VTP items. Application can query for VTP items more
> than once.
> > +
> > +    .. code-block:: c
> > +
> > +      int
> > +      rte_flow_tunnel_match(uint16_t port_id,
> > +                            struct rte_flow_tunnel *tunnel,
> > +                            struct rte_flow_item **pmd_items,
> > +                            uint32_t *num_of_pmd_items,
> > +                            struct rte_flow_error *error);
> > +
> > +3.2 Integrate PMD items into TMR items list.
> > +
> > +3.3 Create TMR flow rule.
> > +
> > +    .. code-block:: console
> > +
> > +      flow create <port> group 0 match {PMD items} / {APP items} /
> > + end actions {offload actions} / end
> > +
> > +The model provides helper function call to restore packets that miss
> > +tunnel TMR rules to its original state:
> > +
> > +.. code-block:: c
> > +
> > +  int
> > +  rte_flow_get_restore_info(uint16_t port_id,
> > +                            struct rte_mbuf *mbuf,
> > +                            struct rte_flow_restore_info *info,
> > +                            struct rte_flow_error *error);
> > +
> > +rte_tunnel object filled by the call inside ``rte_flow_restore_info
> > +*info parameter`` can be used by the application to create new TMR
> > +rule for that tunnel.
> > +
> > +The model requirements:
> > +
> > +Software application must initialize
> > +rte_tunnel object with tunnel parameters before calling
> > +rte_flow_tunnel_decap_set() & rte_flow_tunnel_match().
> > +
> > +PMD actions array obtained in rte_flow_tunnel_decap_set() must be
> > +released by application with rte_flow_action_release() call.
> > +Application can release the actions after TSR rule was created.
> > +
> > +PMD items array obtained with rte_flow_tunnel_match() must be
> > +released by application with rte_flow_item_release() call.
> > +Application can release the items after rule was created. However, if
> > +the application needs to create additional TMR rule for the same
> > +tunnel it will need to obtain PMD items again.
> > +
> > +Application cannot destroy rte_tunnel object before it releases all
> > +PMD actions & PMD items referencing that tunnel.
> > +
> >  Caveats
> >  -------
> >
> > diff --git a/lib/librte_ethdev/rte_ethdev_version.map
> > b/lib/librte_ethdev/rte_ethdev_version.map
> > index 7155056045..63800811df 100644
> > --- a/lib/librte_ethdev/rte_ethdev_version.map
> > +++ b/lib/librte_ethdev/rte_ethdev_version.map
> > @@ -241,4 +241,9 @@ EXPERIMENTAL {
> >  	__rte_ethdev_trace_rx_burst;
> >  	__rte_ethdev_trace_tx_burst;
> >  	rte_flow_get_aged_flows;
> > +	rte_flow_tunnel_decap_set;
> > +	rte_flow_tunnel_match;
> > +	rte_flow_tunnel_get_restore_info;
> > +	rte_flow_tunnel_action_decap_release;
> > +	rte_flow_tunnel_item_release;
> >  };
> > diff --git a/lib/librte_ethdev/rte_flow.c
> > b/lib/librte_ethdev/rte_flow.c index c19d25649f..2dc5bfbb3f 100644
> > --- a/lib/librte_ethdev/rte_flow.c
> > +++ b/lib/librte_ethdev/rte_flow.c
> > @@ -1268,3 +1268,115 @@ rte_flow_get_aged_flows(uint16_t port_id,
> void **contexts,
> >  				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> >  				  NULL, rte_strerror(ENOTSUP));
> >  }
> > +
> > +int
> > +rte_flow_tunnel_decap_set(uint16_t port_id,
> > +			  struct rte_flow_tunnel *tunnel,
> > +			  struct rte_flow_action **actions,
> > +			  uint32_t *num_of_actions,
> > +			  struct rte_flow_error *error)
> > +{
> > +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> > +	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
> > +
> > +	if (unlikely(!ops))
> > +		return -rte_errno;
> > +	if (likely(!!ops->tunnel_decap_set)) {
> > +		return flow_err(port_id,
> > +				ops->tunnel_decap_set(dev, tunnel, actions,
> > +						      num_of_actions, error),
> > +				error);
> > +	}
> > +	return rte_flow_error_set(error, ENOTSUP,
> > +				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> > +				  NULL, rte_strerror(ENOTSUP));
> > +}
> > +
> > +int
> > +rte_flow_tunnel_match(uint16_t port_id,
> > +		      struct rte_flow_tunnel *tunnel,
> > +		      struct rte_flow_item **items,
> > +		      uint32_t *num_of_items,
> > +		      struct rte_flow_error *error) {
> > +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> > +	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
> > +
> > +	if (unlikely(!ops))
> > +		return -rte_errno;
> > +	if (likely(!!ops->tunnel_match)) {
> > +		return flow_err(port_id,
> > +				ops->tunnel_match(dev, tunnel, items,
> > +						  num_of_items, error),
> > +				error);
> > +	}
> > +	return rte_flow_error_set(error, ENOTSUP,
> > +				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> > +				  NULL, rte_strerror(ENOTSUP));
> > +}
> > +
> > +int
> > +rte_flow_tunnel_get_restore_info(uint16_t port_id,
> > +				 struct rte_mbuf *m,
> > +				 struct rte_flow_restore_info *restore_info,
> > +				 struct rte_flow_error *error)
> > +{
> > +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> > +	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
> > +
> > +	if (unlikely(!ops))
> > +		return -rte_errno;
> > +	if (likely(!!ops->get_restore_info)) {
> > +		return flow_err(port_id,
> > +				ops->get_restore_info(dev, m, restore_info,
> > +						      error),
> > +				error);
> > +	}
> > +	return rte_flow_error_set(error, ENOTSUP,
> > +				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> > +				  NULL, rte_strerror(ENOTSUP));
> > +}
> > +
> > +int
> > +rte_flow_tunnel_action_decap_release(uint16_t port_id,
> > +				     struct rte_flow_action *actions,
> > +				     uint32_t num_of_actions,
> > +				     struct rte_flow_error *error) {
> > +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> > +	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
> > +
> > +	if (unlikely(!ops))
> > +		return -rte_errno;
> > +	if (likely(!!ops->action_release)) {
> > +		return flow_err(port_id,
> > +				ops->action_release(dev, actions,
> > +						    num_of_actions, error),
> > +				error);
> > +	}
> > +	return rte_flow_error_set(error, ENOTSUP,
> > +				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> > +				  NULL, rte_strerror(ENOTSUP));
> > +}
> > +
> > +int
> > +rte_flow_tunnel_item_release(uint16_t port_id,
> > +			     struct rte_flow_item *items,
> > +			     uint32_t num_of_items,
> > +			     struct rte_flow_error *error) {
> > +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> > +	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
> > +
> > +	if (unlikely(!ops))
> > +		return -rte_errno;
> > +	if (likely(!!ops->item_release)) {
> > +		return flow_err(port_id,
> > +				ops->item_release(dev, items,
> > +						  num_of_items, error),
> > +				error);
> > +	}
> > +	return rte_flow_error_set(error, ENOTSUP,
> > +				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> > +				  NULL, rte_strerror(ENOTSUP));
> > +}
> > diff --git a/lib/librte_ethdev/rte_flow.h
> > b/lib/librte_ethdev/rte_flow.h index b0e4199192..1374b6e5a7 100644
> > --- a/lib/librte_ethdev/rte_flow.h
> > +++ b/lib/librte_ethdev/rte_flow.h
> > @@ -3324,6 +3324,202 @@ int
> >  rte_flow_get_aged_flows(uint16_t port_id, void **contexts,
> >  			uint32_t nb_contexts, struct rte_flow_error *error);
> >
> > +/* Tunnel information. */
> > +__rte_experimental
> 
> __rte_experimental is not required AFAIK on structure definitions, structure
> definitions are not symbols, just on exported functions and variables.
> 
> Did you get a specific warning, that made you add this?

[Gregory Etelson] The attribute is not required in structure definitions.
It is removed in the v2 patch version.

> 
> > +struct rte_flow_ip_tunnel_key {
> > +	rte_be64_t tun_id; /**< Tunnel identification. */
> > +	union {
> > +		struct {
> > +			rte_be32_t src_addr; /**< IPv4 source address. */
> > +			rte_be32_t dst_addr; /**< IPv4 destination address.
> */
> > +		} ipv4;
> > +		struct {
> > +			uint8_t src_addr[16]; /**< IPv6 source address. */
> > +			uint8_t dst_addr[16]; /**< IPv6 destination address.
> */
> > +		} ipv6;
> > +	} u;
> > +	bool       is_ipv6; /**< True for valid IPv6 fields. Otherwise IPv4. */
> > +	rte_be16_t tun_flags; /**< Tunnel flags. */
> > +	uint8_t    tos; /**< TOS for IPv4, TC for IPv6. */
> > +	uint8_t    ttl; /**< TTL for IPv4, HL for IPv6. */
> > +	rte_be32_t label; /**< Flow Label for IPv6. */
> > +	rte_be16_t tp_src; /**< Tunnel port source. */
> > +	rte_be16_t tp_dst; /**< Tunnel port destination. */ };
> > +
> > +
> > +/* Tunnel has a type and the key information. */ __rte_experimental
> > +struct rte_flow_tunnel {
> > +	/**
> > +	 * Tunnel type, for example RTE_FLOW_ITEM_TYPE_VXLAN,
> > +	 * RTE_FLOW_ITEM_TYPE_NVGRE etc.
> > +	 */
> > +	enum rte_flow_item_type		type;
> > +	struct rte_flow_ip_tunnel_key	tun_info; /**< Tunnel key info. */
> > +};
> > +
> > +/**
> > + * Indicate that the packet has a tunnel.
> > + */
> > +#define RTE_FLOW_RESTORE_INFO_TUNNEL  (1ULL << 0)
> > +
> > +/**
> > + * Indicate that the packet has a non decapsulated tunnel header.
> > + */
> > +#define RTE_FLOW_RESTORE_INFO_ENCAPSULATED  (1ULL << 1)
> > +
> > +/**
> > + * Indicate that the packet has a group_id.
> > + */
> > +#define RTE_FLOW_RESTORE_INFO_GROUP_ID  (1ULL << 2)
> > +
> > +/**
> > + * Restore information structure to communicate the current packet
> > +processing
> > + * state when some of the processing pipeline is done in hardware and
> > +should
> > + * continue in software.
> > + */
> > +__rte_experimental
> > +struct rte_flow_restore_info {
> > +	/**
> > +	 * Bitwise flags (RTE_FLOW_RESTORE_INFO_*) to indicate validation
> of
> > +	 * other fields in struct rte_flow_restore_info.
> > +	 */
> > +	uint64_t flags;
> > +	uint32_t group_id; /**< Group ID. */
> > +	struct rte_flow_tunnel tunnel; /**< Tunnel information. */ };
> > +
> > +/**
> > + * Allocate an array of actions to be used in rte_flow_create, to
> > +implement
> > + * tunnel-decap-set for the given tunnel.
> > + * Sample usage:
> > + *   actions vxlan_decap / tunnel-decap-set(tunnel properties) /
> > + *            jump group 0 / end
> > + *
> > + * @param port_id
> > + *   Port identifier of Ethernet device.
> > + * @param[in] tunnel
> > + *   Tunnel properties.
> > + * @param[out] actions
> > + *   Array of actions to be allocated by the PMD. This array should be
> > + *   concatenated with the actions array provided to rte_flow_create.
> > + * @param[out] num_of_actions
> > + *   Number of actions allocated.
> > + * @param[out] error
> > + *   Perform verbose error reporting if not NULL. PMDs initialize this
> > + *   structure in case of error only.
> > + *
> > + * @return
> > + *   0 on success, a negative errno value otherwise and rte_errno is set.
> > + */
> > +__rte_experimental
> > +int
> > +rte_flow_tunnel_decap_set(uint16_t port_id,
> > +			  struct rte_flow_tunnel *tunnel,
> > +			  struct rte_flow_action **actions,
> > +			  uint32_t *num_of_actions,
> > +			  struct rte_flow_error *error);
> > +
> > +/**
> > + * Allocate an array of items to be used in rte_flow_create, to
> > +implement
> > + * tunnel-match for the given tunnel.
> > + * Sample usage:
> > + *   pattern tunnel-match(tunnel properties) / outer-header-matches /
> > + *           inner-header-matches / end
> > + *
> > + * @param port_id
> > + *   Port identifier of Ethernet device.
> > + * @param[in] tunnel
> > + *   Tunnel properties.
> > + * @param[out] items
> > + *   Array of items to be allocated by the PMD. This array should be
> > + *   concatenated with the items array provided to rte_flow_create.
> > + * @param[out] num_of_items
> > + *   Number of items allocated.
> > + * @param[out] error
> > + *   Perform verbose error reporting if not NULL. PMDs initialize this
> > + *   structure in case of error only.
> > + *
> > + * @return
> > + *   0 on success, a negative errno value otherwise and rte_errno is set.
> > + */
> > +__rte_experimental
> > +int
> > +rte_flow_tunnel_match(uint16_t port_id,
> > +		      struct rte_flow_tunnel *tunnel,
> > +		      struct rte_flow_item **items,
> > +		      uint32_t *num_of_items,
> > +		      struct rte_flow_error *error);
> > +
> > +/**
> > + * Populate the current packet processing state, if exists, for the given
> mbuf.
> > + *
> > + * @param port_id
> > + *   Port identifier of Ethernet device.
> > + * @param[in] m
> > + *   Mbuf struct.
> > + * @param[out] info
> > + *   Restore information. Upon success contains the HW state.
> > + * @param[out] error
> > + *   Perform verbose error reporting if not NULL. PMDs initialize this
> > + *   structure in case of error only.
> > + *
> > + * @return
> > + *   0 on success, a negative errno value otherwise and rte_errno is set.
> > + */
> > +__rte_experimental
> > +int
> > +rte_flow_tunnel_get_restore_info(uint16_t port_id,
> > +				 struct rte_mbuf *m,
> > +				 struct rte_flow_restore_info *info,
> > +				 struct rte_flow_error *error);
> > +
> > +/**
> > + * Release the action array as allocated by rte_flow_tunnel_decap_set.
> > + *
> > + * @param port_id
> > + *   Port identifier of Ethernet device.
> > + * @param[in] actions
> > + *   Array of actions to be released.
> > + * @param[in] num_of_actions
> > + *   Number of elements in actions array.
> > + * @param[out] error
> > + *   Perform verbose error reporting if not NULL. PMDs initialize this
> > + *   structure in case of error only.
> > + *
> > + * @return
> > + *   0 on success, a negative errno value otherwise and rte_errno is set.
> > + */
> > +__rte_experimental
> > +int
> > +rte_flow_tunnel_action_decap_release(uint16_t port_id,
> > +				     struct rte_flow_action *actions,
> > +				     uint32_t num_of_actions,
> > +				     struct rte_flow_error *error);
> > +
> > +/**
> > + * Release the item array as allocated by rte_flow_tunnel_match.
> > + *
> > + * @param port_id
> > + *   Port identifier of Ethernet device.
> > + * @param[in] items
> > + *   Array of items to be released.
> > + * @param[in] num_of_items
> > + *   Number of elements in item array.
> > + * @param[out] error
> > + *   Perform verbose error reporting if not NULL. PMDs initialize this
> > + *   structure in case of error only.
> > + *
> > + * @return
> > + *   0 on success, a negative errno value otherwise and rte_errno is set.
> > + */
> > +__rte_experimental
> > +int
> > +rte_flow_tunnel_item_release(uint16_t port_id,
> > +			     struct rte_flow_item *items,
> > +			     uint32_t num_of_items,
> > +			     struct rte_flow_error *error);
> >  #ifdef __cplusplus
> >  }
> >  #endif
> > diff --git a/lib/librte_ethdev/rte_flow_driver.h
> > b/lib/librte_ethdev/rte_flow_driver.h
> > index 881cc469b7..ad1d7a2cdc 100644
> > --- a/lib/librte_ethdev/rte_flow_driver.h
> > +++ b/lib/librte_ethdev/rte_flow_driver.h
> > @@ -107,6 +107,38 @@ struct rte_flow_ops {
> >  		 void **context,
> >  		 uint32_t nb_contexts,
> >  		 struct rte_flow_error *err);
> > +	/** See rte_flow_tunnel_decap_set() */
> > +	int (*tunnel_decap_set)
> > +		(struct rte_eth_dev *dev,
> > +		 struct rte_flow_tunnel *tunnel,
> > +		 struct rte_flow_action **pmd_actions,
> > +		 uint32_t *num_of_actions,
> > +		 struct rte_flow_error *err);
> > +	/** See rte_flow_tunnel_match() */
> > +	int (*tunnel_match)
> > +		(struct rte_eth_dev *dev,
> > +		 struct rte_flow_tunnel *tunnel,
> > +		 struct rte_flow_item **pmd_items,
> > +		 uint32_t *num_of_items,
> > +		 struct rte_flow_error *err);
> > +	/** See rte_flow_get_rte_flow_restore_info() */
> > +	int (*get_restore_info)
> > +		(struct rte_eth_dev *dev,
> > +		 struct rte_mbuf *m,
> > +		 struct rte_flow_restore_info *info,
> > +		 struct rte_flow_error *err);
> > +	/** See rte_flow_action_tunnel_decap_release() */
> > +	int (*action_release)
> > +		(struct rte_eth_dev *dev,
> > +		 struct rte_flow_action *pmd_actions,
> > +		 uint32_t num_of_actions,
> > +		 struct rte_flow_error *err);
> > +	/** See rte_flow_item_release() */
> > +	int (*item_release)
> > +		(struct rte_eth_dev *dev,
> > +		 struct rte_flow_item *pmd_items,
> > +		 uint32_t num_of_items,
> > +		 struct rte_flow_error *err);
> >  };
> >
> >  /**
> > --
> > 2.25.1
> >

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH 1/2] ethdev: allow negative values in flow rule types
  2020-06-25 16:03 ` [dpdk-dev] [PATCH 1/2] ethdev: allow negative values in flow rule types Gregory Etelson
@ 2020-07-05 13:34   ` Andrew Rybchenko
  2020-08-19 14:33     ` Gregory Etelson
  0 siblings, 1 reply; 95+ messages in thread
From: Andrew Rybchenko @ 2020-07-05 13:34 UTC (permalink / raw)
  To: Gregory Etelson, dev; +Cc: matan, rasland, Ori Kam

On 6/25/20 7:03 PM, Gregory Etelson wrote:
> RTE flow items & actions use positive values in item & action type.
> Negative values are reserved for PMD private types. PMD
> items & actions usually are not exposed to application and are not
> used to create RTE flows.
> 
> The patch allows applications with access to PMD flow
> items & actions ability to integrate RTE and PMD items & actions
> and use them to create flow rule.
> 
> Signed-off-by: Gregory Etelson <getelson@mellanox.com>
> Acked-by: Ori Kam <orika@mellanox.com>
> ---
>  lib/librte_ethdev/rte_flow.c | 30 ++++++++++++++++++++++++------
>  1 file changed, 24 insertions(+), 6 deletions(-)
> 
> diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
> index 1685be5f73..c19d25649f 100644
> --- a/lib/librte_ethdev/rte_flow.c
> +++ b/lib/librte_ethdev/rte_flow.c
> @@ -563,7 +563,12 @@ rte_flow_conv_item_spec(void *buf, const size_t size,
>  		}
>  		break;
>  	default:
> -		off = rte_flow_desc_item[item->type].size;
> +		/**
> +		 * allow PMD private flow item
> +		 */
> +		off = (uint32_t)item->type <= INT_MAX ?
> +			rte_flow_desc_item[item->type].size :
> +			sizeof(void *);

May be it is out-of-scope of the patch (strictly speaking), but
usage of 'off' variable is very misleading here. It is not used
as an offset. It is used as a size to copy.

Also it is absolutely unclear why sizeof(void *) is a right
size for PMD private flow items.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH 0/2] ethdev: tunnel offload model
  2020-06-25 16:03 [dpdk-dev] [PATCH 0/2] ethdev: tunnel offload model Gregory Etelson
  2020-06-25 16:03 ` [dpdk-dev] [PATCH 1/2] ethdev: allow negative values in flow rule types Gregory Etelson
  2020-06-25 16:03 ` [dpdk-dev] [PATCH 2/2] ethdev: tunnel offload model Gregory Etelson
@ 2020-07-05 13:39 ` Andrew Rybchenko
  2020-09-08 20:15 ` [dpdk-dev] [PATCH v2 0/4] Tunnel Offload API Gregory Etelson
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 95+ messages in thread
From: Andrew Rybchenko @ 2020-07-05 13:39 UTC (permalink / raw)
  To: Gregory Etelson, dev; +Cc: matan, rasland

On 6/25/20 7:03 PM, Gregory Etelson wrote:
> Hardware vendors implement tunneled traffic offload techniques
> differently. Although RTE flow API provides tools capable to offload
> all sorts of network stacks, software application must reference this
> hardware differences in flow rules compilation. As the result tunneled
> traffic flow rules that utilize hardware capabilities can be different
> for the same traffic.  
> 
> Tunnel port offload proposed in [1] provides software application with
> unified rules model for tunneled traffic regardless underlying
> hardware.
>  - The model introduces a concept of a virtual tunnel port (VTP).
>  - The model uses VTP to offload ingress tunneled network traffic 
>    with RTE flow rules.
>  - The model is implemented as set of helper functions. Each PMD
>    implements VTP offload according to underlying hardware offload
>    capabilities.  Applications must query PMD for VTP flow
>    items / actions before using in creation of a VTP flow rule.
> 
> The model components:
> - Virtual Tunnel Port (VTP) is a stateless software object that
>   describes tunneled network traffic.  VTP object usually contains
>   descriptions of outer headers, tunnel headers and inner headers.
> - Tunnel Steering flow Rule (TSR) detects tunneled packets and
>   delegates them to tunnel processing infrastructure, implemented 
>   in PMD for optimal hardware utilization, for further processing.
> - Tunnel Matching flow Rule (TMR) verifies packet configuration and
>   runs offload actions in case of a match.
> 
> Application actions: 
> 1 Initialize VTP object according to tunnel
>   network parameters.
> 2 Create TSR flow rule:
> 2.1 Query PMD for VTP actions: application can query for VTP actions
>     more than once
>     int
>     rte_flow_tunnel_decap_set(uint16_t port_id,
>                               struct rte_flow_tunnel *tunnel,
>                               struct rte_flow_action **pmd_actions,
>                               uint32_t *num_of_pmd_actions,
>                               struct rte_flow_error *error);
> 
> 2.2 Integrate PMD actions into TSR actions list.
> 2.3 Create TSR flow rule:
>     flow create <port> group 0
>           match {tunnel items} / end
>           actions {PMD actions} / {App actions} / end
> 
> 3 Create TMR flow rule:
> 3.1 Query PMD for VTP items: application can query for VTP items
>     more than once
>     int
>     rte_flow_tunnel_match(uint16_t port_id,
>                           struct rte_flow_tunnel *tunnel,
>                           struct rte_flow_item **pmd_items,
>                           uint32_t *num_of_pmd_items,
>                           struct rte_flow_error *error);
> 
> 3.2 Integrate PMD items into TMR items list:
> 3.3 Create TMR flow rule
>     flow create <port> group 0
>           match {PMD items} / {APP items} / end
>           actions {offload actions} / end
> 
> The model provides helper function call to restore packets that miss
> tunnel TMR rules to its original state:
> int
> rte_flow_get_restore_info(uint16_t port_id,
>                           struct rte_mbuf *mbuf,
>                           struct rte_flow_restore_info *info,
>                           struct rte_flow_error *error);
> 
> rte_tunnel object filled by the call inside
> rte_flow_restore_info *info parameter can be used by the application
> to create new TMR rule for that tunnel.
> 
> The model requirements:
> Software application must initialize
> rte_tunnel object with tunnel parameters before calling
> rte_flow_tunnel_decap_set() & rte_flow_tunnel_match().
> 
> PMD actions array obtained in rte_flow_tunnel_decap_set() must be
> released by application with rte_flow_action_release() call.
> Application can release the actionsfter TSR rule was created.
> 
> PMD items array obtained with rte_flow_tunnel_match() must be released
> by application with rte_flow_item_release() call.  Application can
> release the items after rule was created. However, if the application
> needs to create additional TMR rule for the same tunnel it will need
> to obtain PMD items again.
> 
> Application cannot destroy rte_tunnel object before it releases all
> PMD actions & PMD items referencing that tunnel.
> 
> [1] https://mails.dpdk.org/archives/dev/2020-June/169656.html

Three copies of the above text (here, in the 2/2 description, and in the 2/2
content in the DPDK documentation) are too much. IMHO, it should be in only
one place in the documentation, with a brief summary in the patch/series
description.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH 2/2] ethdev: tunnel offload model
  2020-06-25 16:03 ` [dpdk-dev] [PATCH 2/2] ethdev: tunnel offload model Gregory Etelson
       [not found]   ` <DB8PR05MB6761ED02BCD188771BDCDE64A86F0@DB8PR05MB6761.eurprd05.prod.outlook.com>
@ 2020-07-05 14:50   ` Andrew Rybchenko
  2020-08-19 14:30     ` Gregory Etelson
  1 sibling, 1 reply; 95+ messages in thread
From: Andrew Rybchenko @ 2020-07-05 14:50 UTC (permalink / raw)
  To: Gregory Etelson, dev; +Cc: matan, rasland, Eli Britstein, Ori Kam

Hi Gregory,

I'm sorry for the review with toooo many questions without any
suggestions on how to answer it. Please, see below.

On 6/25/20 7:03 PM, Gregory Etelson wrote:
> From: Eli Britstein <elibr@mellanox.com>
> 
> Hardware vendors implement tunneled traffic offload techniques
> differently. Although RTE flow API provides tools capable to offload
> all sorts of network stacks, software application must reference this
> hardware differences in flow rules compilation. As the result tunneled
> traffic flow rules that utilize hardware capabilities can be different
> for the same traffic.
> 
> Tunnel port offload proposed in [1] provides software application with
> unified rules model for tunneled traffic regardless underlying
> hardware.
>  - The model introduces a concept of a virtual tunnel port (VTP).
>  - The model uses VTP to offload ingress tunneled network traffic 
>    with RTE flow rules.
>  - The model is implemented as set of helper functions. Each PMD
>    implements VTP offload according to underlying hardware offload
>    capabilities.  Applications must query PMD for VTP flow
>    items / actions before using in creation of a VTP flow rule.
> 
> The model components:
> - Virtual Tunnel Port (VTP) is a stateless software object that
>   describes tunneled network traffic.  VTP object usually contains
>   descriptions of outer headers, tunnel headers and inner headers.
> - Tunnel Steering flow Rule (TSR) detects tunneled packets and
>   delegates them to tunnel processing infrastructure, implemented
>   in PMD for optimal hardware utilization, for further processing.
> - Tunnel Matching flow Rule (TMR) verifies packet configuration and
>   runs offload actions in case of a match.
> 
> Application actions:
> 1 Initialize VTP object according to tunnel
>   network parameters.
> 2 Create TSR flow rule:
> 2.1 Query PMD for VTP actions: application can query for VTP actions
>     more than once
>     int
>     rte_flow_tunnel_decap_set(uint16_t port_id,
>                               struct rte_flow_tunnel *tunnel,
>                               struct rte_flow_action **pmd_actions,
>                               uint32_t *num_of_pmd_actions,
>                               struct rte_flow_error *error);
> 
> 2.2 Integrate PMD actions into TSR actions list.
> 2.3 Create TSR flow rule:
>     flow create <port> group 0
>           match {tunnel items} / end
>           actions {PMD actions} / {App actions} / end
> 
> 3 Create TMR flow rule:
> 3.1 Query PMD for VTP items: application can query for VTP items
>     more than once
>     int
>     rte_flow_tunnel_match(uint16_t port_id,
>                           struct rte_flow_tunnel *tunnel,
>                           struct rte_flow_item **pmd_items,
>                           uint32_t *num_of_pmd_items,
>                           struct rte_flow_error *error);
> 
> 3.2 Integrate PMD items into TMR items list:
> 3.3 Create TMR flow rule
>     flow create <port> group 0
>           match {PMD items} / {APP items} / end
>           actions {offload actions} / end
> 
> The model provides helper function call to restore packets that miss
> tunnel TMR rules to its original state:
> int
> rte_flow_get_restore_info(uint16_t port_id,
>                           struct rte_mbuf *mbuf,
>                           struct rte_flow_restore_info *info,
>                           struct rte_flow_error *error);
> 
> rte_tunnel object filled by the call inside
> rte_flow_restore_info *info parameter can be used by the application
> to create new TMR rule for that tunnel.
> 
> The model requirements:
> Software application must initialize
> rte_tunnel object with tunnel parameters before calling
> rte_flow_tunnel_decap_set() & rte_flow_tunnel_match().
> 
> PMD actions array obtained in rte_flow_tunnel_decap_set() must be
> released by application with rte_flow_action_release() call.
> Application can release the actionsfter TSR rule was created.
> 
> PMD items array obtained with rte_flow_tunnel_match() must be released
> by application with rte_flow_item_release() call.  Application can
> release the items after rule was created. However, if the application
> needs to create additional TMR rule for the same tunnel it will need
> to obtain PMD items again.
> 
> Application cannot destroy rte_tunnel object before it releases all
> PMD actions & PMD items referencing that tunnel.
> 
> [1] https://mails.dpdk.org/archives/dev/2020-June/169656.html
> 
> Signed-off-by: Eli Britstein <elibr@mellanox.com>
> Acked-by: Ori Kam <orika@mellanox.com>
> ---
>  doc/guides/prog_guide/rte_flow.rst       | 105 ++++++++++++
>  lib/librte_ethdev/rte_ethdev_version.map |   5 +
>  lib/librte_ethdev/rte_flow.c             | 112 +++++++++++++
>  lib/librte_ethdev/rte_flow.h             | 196 +++++++++++++++++++++++
>  lib/librte_ethdev/rte_flow_driver.h      |  32 ++++
>  5 files changed, 450 insertions(+)
> 
> diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
> index d5dd18ce99..cfd98c2e7d 100644
> --- a/doc/guides/prog_guide/rte_flow.rst
> +++ b/doc/guides/prog_guide/rte_flow.rst
> @@ -3010,6 +3010,111 @@ operations include:
>  - Duplication of a complete flow rule description.
>  - Pattern item or action name retrieval.
>  
> +Tunneled traffic offload
> +~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Provide software application with unified rules model for tunneled traffic
> +regardless underlying hardware.
> +
> + - The model introduces a concept of a virtual tunnel port (VTP).

It looks like it is absolutely abstract concept now, since it
is not mentioned/referenced in the header file. It makes it
hard to put the description and API together.

> + - The model uses VTP to offload ingress tunneled network traffic 
> +   with RTE flow rules.
> + - The model is implemented as set of helper functions. Each PMD
> +   implements VTP offload according to underlying hardware offload
> +   capabilities.  Applications must query PMD for VTP flow
> +   items / actions before using in creation of a VTP flow rule.

For me it looks like "creation of a VTP flow rule" is not
covered yet. Flow rules examples mention it in pattern and
actions, but there is no corresponding pattern items and
actions. May be I simply misunderstand the idea.

> +
> +The model components:
> +
> +- Virtual Tunnel Port (VTP) is a stateless software object that
> +  describes tunneled network traffic.  VTP object usually contains
> +  descriptions of outer headers, tunnel headers and inner headers.

Are inner headers really a part of the tunnel description?

> +- Tunnel Steering flow Rule (TSR) detects tunneled packets and
> +  delegates them to tunnel processing infrastructure, implemented
> +  in PMD for optimal hardware utilization, for further processing.
> +- Tunnel Matching flow Rule (TMR) verifies packet configuration and
> +  runs offload actions in case of a match.

Is it for fully offloaded tunnels with encap/decap or all
tunnels (detected, but partially offloaded, e.g. checksumming)?

> +
> +Application actions:
> +
> +1 Initialize VTP object according to tunnel network parameters.

I.e. fill in 'struct rte_flow_tunnel'. Is it correct?

> +
> +2 Create TSR flow rule.
> +
> +2.1 Query PMD for VTP actions. Application can query for VTP actions more than once.
> +
> +  .. code-block:: c
> +
> +    int
> +    rte_flow_tunnel_decap_set(uint16_t port_id,
> +                              struct rte_flow_tunnel *tunnel,
> +                              struct rte_flow_action **pmd_actions,
> +                              uint32_t *num_of_pmd_actions,
> +                              struct rte_flow_error *error);
> +
> +2.2 Integrate PMD actions into TSR actions list.
> +
> +2.3 Create TSR flow rule.
> +
> +    .. code-block:: console
> +
> +      flow create <port> group 0 match {tunnel items} / end actions {PMD actions} / {App actions} / end

Are application actions strictly required?
If no, it is better to make it clear.
Do tunnel items correlate here somehow with tunnel
specification in 'struct rte_flow_tunnel'?
Is it obtained using rte_flow_tunnel_match()?

> +
> +3 Create TMR flow rule.
> +
> +3.1 Query PMD for VTP items. Application can query for VTP items more than once.
> +
> +    .. code-block:: c
> +
> +      int
> +      rte_flow_tunnel_match(uint16_t port_id,
> +                            struct rte_flow_tunnel *tunnel,
> +                            struct rte_flow_item **pmd_items,
> +                            uint32_t *num_of_pmd_items,
> +                            struct rte_flow_error *error);
> +
> +3.2 Integrate PMD items into TMR items list.
> +
> +3.3 Create TMR flow rule.
> +
> +    .. code-block:: console
> +
> +      flow create <port> group 0 match {PMD items} / {APP items} / end actions {offload actions} / end
> +
> +The model provides helper function call to restore packets that miss
> +tunnel TMR rules to its original state:
> +
> +.. code-block:: c
> +
> +  int
> +  rte_flow_get_restore_info(uint16_t port_id,
> +                            struct rte_mbuf *mbuf,
> +                            struct rte_flow_restore_info *info,
> +                            struct rte_flow_error *error);
> +
> +rte_tunnel object filled by the call inside
> +``rte_flow_restore_info *info parameter`` can be used by the application
> +to create new TMR rule for that tunnel.

I think an example, for example, for VXLAN over IPv4 tunnel
case with some concrete parameters would be very useful here
for understanding. Could it be annotated with a description
of the transformations happening with a packet on different
stages of the processing (including restore example).

> +
> +The model requirements:
> +
> +Software application must initialize
> +rte_tunnel object with tunnel parameters before calling
> +rte_flow_tunnel_decap_set() & rte_flow_tunnel_match().
> +
> +PMD actions array obtained in rte_flow_tunnel_decap_set() must be
> +released by application with rte_flow_action_release() call.
> +Application can release the actionsfter TSR rule was created.

actionsfter ?

> +
> +PMD items array obtained with rte_flow_tunnel_match() must be released
> +by application with rte_flow_item_release() call.  Application can
> +release the items after rule was created. However, if the application
> +needs to create additional TMR rule for the same tunnel it will need
> +to obtain PMD items again.
> +
> +Application cannot destroy rte_tunnel object before it releases all
> +PMD actions & PMD items referencing that tunnel.
> +
>  Caveats
>  -------
>  

[snip]

> diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
> index b0e4199192..1374b6e5a7 100644
> --- a/lib/librte_ethdev/rte_flow.h
> +++ b/lib/librte_ethdev/rte_flow.h
> @@ -3324,6 +3324,202 @@ int
>  rte_flow_get_aged_flows(uint16_t port_id, void **contexts,
>  			uint32_t nb_contexts, struct rte_flow_error *error);
>  
> +/* Tunnel information. */
> +__rte_experimental
> +struct rte_flow_ip_tunnel_key {
> +	rte_be64_t tun_id; /**< Tunnel identification. */

What is it? Why is it big-endian? Why is it in IP tunnel key?
I.e. why is it not in a generic structure?

> +	union {
> +		struct {
> +			rte_be32_t src_addr; /**< IPv4 source address. */
> +			rte_be32_t dst_addr; /**< IPv4 destination address. */
> +		} ipv4;
> +		struct {
> +			uint8_t src_addr[16]; /**< IPv6 source address. */
> +			uint8_t dst_addr[16]; /**< IPv6 destination address. */
> +		} ipv6;
> +	} u;
> +	bool       is_ipv6; /**< True for valid IPv6 fields. Otherwise IPv4. */
> +	rte_be16_t tun_flags; /**< Tunnel flags. */

Which flags? Where are these flags defined?
Why is it big-endian?

> +	uint8_t    tos; /**< TOS for IPv4, TC for IPv6. */
> +	uint8_t    ttl; /**< TTL for IPv4, HL for IPv6. */

If combine, I'd stick to IPv6 terminology since it is a bit
better (well-thought, especially current tendencies in
(re)naming in software).

> +	rte_be32_t label; /**< Flow Label for IPv6. */

What about IPv6 tunnels with extension headers? How to extend?

> +	rte_be16_t tp_src; /**< Tunnel port source. */
> +	rte_be16_t tp_dst; /**< Tunnel port destination. */

What about IP-in-IP tunnels? Is it applicable?

> +};
> +
> +
> +/* Tunnel has a type and the key information. */
> +__rte_experimental
> +struct rte_flow_tunnel {
> +	/**
> +	 * Tunnel type, for example RTE_FLOW_ITEM_TYPE_VXLAN,
> +	 * RTE_FLOW_ITEM_TYPE_NVGRE etc.
> +	 */
> +	enum rte_flow_item_type		type;
> +	struct rte_flow_ip_tunnel_key	tun_info; /**< Tunnel key info. */

How to extended for non-IP tunnels? MPLS?
Or tunnels with more protocols? E.g. MPLS-over-UDP?

> +};
> +
> +/**
> + * Indicate that the packet has a tunnel.
> + */
> +#define RTE_FLOW_RESTORE_INFO_TUNNEL  (1ULL << 0)
> +
> +/**
> + * Indicate that the packet has a non decapsulated tunnel header.
> + */
> +#define RTE_FLOW_RESTORE_INFO_ENCAPSULATED  (1ULL << 1)
> +
> +/**
> + * Indicate that the packet has a group_id.
> + */
> +#define RTE_FLOW_RESTORE_INFO_GROUP_ID  (1ULL << 2)
> +
> +/**
> + * Restore information structure to communicate the current packet processing
> + * state when some of the processing pipeline is done in hardware and should
> + * continue in software.
> + */
> +__rte_experimental
> +struct rte_flow_restore_info {
> +	/**
> +	 * Bitwise flags (RTE_FLOW_RESTORE_INFO_*) to indicate validation of
> +	 * other fields in struct rte_flow_restore_info.
> +	 */
> +	uint64_t flags;
> +	uint32_t group_id; /**< Group ID. */

What is the group ID here?

> +	struct rte_flow_tunnel tunnel; /**< Tunnel information. */
> +};
> +
> +/**
> + * Allocate an array of actions to be used in rte_flow_create, to implement
> + * tunnel-decap-set for the given tunnel.
> + * Sample usage:
> + *   actions vxlan_decap / tunnel-decap-set(tunnel properties) /
> + *            jump group 0 / end

Why is jump to group used in example above? Is it mandatory?

> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param[in] tunnel
> + *   Tunnel properties.
> + * @param[out] actions
> + *   Array of actions to be allocated by the PMD. This array should be
> + *   concatenated with the actions array provided to rte_flow_create.

Please, specify concatenation order explicitly.

> + * @param[out] num_of_actions
> + *   Number of actions allocated.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL. PMDs initialize this
> + *   structure in case of error only.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_tunnel_decap_set(uint16_t port_id,
> +			  struct rte_flow_tunnel *tunnel,
> +			  struct rte_flow_action **actions,
> +			  uint32_t *num_of_actions,

Why does approach to specify actions differ here?
I.e. array of specified size vs END-terminated array?
Must the actions array be END-terminated here?
It must be a strong reason to do it and it should be
explained.

> +			  struct rte_flow_error *error);
> +
> +/**
> + * Allocate an array of items to be used in rte_flow_create, to implement
> + * tunnel-match for the given tunnel.
> + * Sample usage:
> + *   pattern tunnel-match(tunnel properties) / outer-header-matches /
> + *           inner-header-matches / end
> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param[in] tunnel
> + *   Tunnel properties.
> + * @param[out] items
> + *   Array of items to be allocated by the PMD. This array should be
> + *   concatenated with the items array provided to rte_flow_create.

Concatenation order/rules should be described.
Since it is an output which entity does the concatenation.
Is it allowed to refine PMD rules in application
rule specification?

> + * @param[out] num_of_items
> + *   Number of items allocated.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL. PMDs initialize this
> + *   structure in case of error only.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_tunnel_match(uint16_t port_id,
> +		      struct rte_flow_tunnel *tunnel,
> +		      struct rte_flow_item **items,
> +		      uint32_t *num_of_items,

Same as above for actions.

> +		      struct rte_flow_error *error);
> +
> +/**
> + * Populate the current packet processing state, if exists, for the given mbuf.
> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param[in] m
> + *   Mbuf struct.
> + * @param[out] info
> + *   Restore information. Upon success contains the HW state.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL. PMDs initialize this
> + *   structure in case of error only.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_tunnel_get_restore_info(uint16_t port_id,
> +				 struct rte_mbuf *m,
> +				 struct rte_flow_restore_info *info,

Is it suggesting to make a copy of the restore info for each
mbuf? It sounds very expensive. Could you share your thoughts
about it.

> +				 struct rte_flow_error *error);
> +
> +/**
> + * Release the action array as allocated by rte_flow_tunnel_decap_set.
> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param[in] actions
> + *   Array of actions to be released.
> + * @param[in] num_of_actions
> + *   Number of elements in actions array.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL. PMDs initialize this
> + *   structure in case of error only.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_tunnel_action_decap_release(uint16_t port_id,
> +				     struct rte_flow_action *actions,
> +				     uint32_t num_of_actions,

Same question as above for actions and items specification
approach.

> +				     struct rte_flow_error *error);
> +
> +/**
> + * Release the item array as allocated by rte_flow_tunnel_match.
> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param[in] items
> + *   Array of items to be released.
> + * @param[in] num_of_items
> + *   Number of elements in item array.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL. PMDs initialize this
> + *   structure in case of error only.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_tunnel_item_release(uint16_t port_id,
> +			     struct rte_flow_item *items,
> +			     uint32_t num_of_items,

Same question as above for actions and items specification
approach.

> +			     struct rte_flow_error *error);
>  #ifdef __cplusplus
>  }
>  #endif

[snip]

Andrew.

(Right now it is hard to fully imagine how to deal with it.
And it looks like a shim to vendor-specific API. May be I'm
wrong. Hopefully the next version will have PMD implementation
example and it will shed a bit more light on it.)

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH 2/2] ethdev: tunnel offload model
  2020-07-01  6:52       ` Gregory Etelson
@ 2020-07-13  8:21         ` Thomas Monjalon
  2020-07-13 13:23           ` Gregory Etelson
  0 siblings, 1 reply; 95+ messages in thread
From: Thomas Monjalon @ 2020-07-13  8:21 UTC (permalink / raw)
  To: Gregory Etelson
  Cc: Kinsella, Ray, dev, Matan Azrad, Raslan Darawsheh, Ori Kam,
	John McNamara, Marko Kovacevic, Neil Horman, Ferruh Yigit,
	Andrew Rybchenko, Ajit Khaparde, sriharsha.basavapatna,
	hemal.shah, Eli Britstein, Oz Shlomo, dev

01/07/2020 08:52, Gregory Etelson:
> From: Kinsella, Ray <mdr@ashroe.eu>
> > __rte_experimental is not required AFAIK on structure definitions, structure
> > definitions are not symbols, just on exported functions and variables.
> > 
> > Did you get a specific warning, that made you add this?
> 
> [Gregory Etelson] The attribute is not required in structures definition.
> It's removed in v2 patch version

Is v2 sent? I don't find it.



^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH 2/2] ethdev: tunnel offload model
  2020-07-13  8:21         ` Thomas Monjalon
@ 2020-07-13 13:23           ` Gregory Etelson
  0 siblings, 0 replies; 95+ messages in thread
From: Gregory Etelson @ 2020-07-13 13:23 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Kinsella, Ray, dev, Matan Azrad, Raslan Darawsheh, Ori Kam,
	John McNamara, Marko Kovacevic, Neil Horman, Ferruh Yigit,
	Andrew Rybchenko, Ajit Khaparde, sriharsha.basavapatna,
	hemal.shah, Eli Britstein, Oz Shlomo, dev

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Monday, July 13, 2020 11:21
> To: Gregory Etelson <getelson@mellanox.com>
> Cc: Kinsella, Ray <mdr@ashroe.eu>; dev@dpdk.org; Matan Azrad
> <matan@mellanox.com>; Raslan Darawsheh <rasland@mellanox.com>; Ori
> Kam <orika@mellanox.com>; John McNamara <john.mcnamara@intel.com>;
> Marko Kovacevic <marko.kovacevic@intel.com>; Neil Horman
> <nhorman@tuxdriver.com>; Ferruh Yigit <ferruh.yigit@intel.com>; Andrew
> Rybchenko <arybchenko@solarflare.com>; Ajit Khaparde
> <ajit.khaparde@broadcom.com>; sriharsha.basavapatna@broadcom.com;
> hemal.shah@broadcom.com; Eli Britstein <elibr@mellanox.com>; Oz Shlomo
> <ozsh@mellanox.com>; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 2/2] ethdev: tunnel offload model
> 
> 01/07/2020 08:52, Gregory Etelson:
> > From: Kinsella, Ray <mdr@ashroe.eu>
> > > __rte_experimental is not required AFAIK on structure definitions,
> > > structure definitions are not symbols, just on exported functions and
> variables.
> > >
> > > Did you get a specific warning, that made you add this?
> >
> > [Gregory Etelson] The attribute is not required in structures definition.
> > It's removed in v2 patch version
> 
> Is v2 sent? I don't find it.
> 

V2 has not been posted yet; it's in progress.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH 2/2] ethdev: tunnel offload model
  2020-07-05 14:50   ` Andrew Rybchenko
@ 2020-08-19 14:30     ` Gregory Etelson
  0 siblings, 0 replies; 95+ messages in thread
From: Gregory Etelson @ 2020-08-19 14:30 UTC (permalink / raw)
  To: Andrew Rybchenko, dev
  Cc: Matan Azrad, Raslan Darawsheh, Eli Britstein, Ori Kam

Hello Andrew,

Thank you for the detailed review.
Sorry for the late response.
Please see my answers inline.

Regards,
Gregory

> -----Original Message-----
> From: Andrew Rybchenko <arybchenko@solarflare.com>
> Sent: Sunday, July 5, 2020 17:51
> To: Gregory Etelson <getelson@mellanox.com>; dev@dpdk.org
> Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh 
> <rasland@mellanox.com>; Eli Britstein <elibr@mellanox.com>; Ori Kam 
> <orika@mellanox.com>
> Subject: Re: [dpdk-dev] [PATCH 2/2] ethdev: tunnel offload model
> 
> Hi Gregory,
> 
> I'm sorry for the review with toooo many questions without any 
> suggestions on how to answer it. Please, see below.
> 
> On 6/25/20 7:03 PM, Gregory Etelson wrote:
> > From: Eli Britstein <elibr@mellanox.com>
> >
> > Hardware vendors implement tunneled traffic offload techniques 
> > differently. Although RTE flow API provides tools capable to offload 
> > all sorts of network stacks, software application must reference 
> > this hardware differences in flow rules compilation. As the result 
> > tunneled traffic flow rules that utilize hardware capabilities can 
> > be different for the same traffic.
> >
> > Tunnel port offload proposed in [1] provides software application 
> > with unified rules model for tunneled traffic regardless underlying 
> > hardware.
> >  - The model introduces a concept of a virtual tunnel port (VTP).
> >  - The model uses VTP to offload ingress tunneled network traffic
> >    with RTE flow rules.
> >  - The model is implemented as set of helper functions. Each PMD
> >    implements VTP offload according to underlying hardware offload
> >    capabilities.  Applications must query PMD for VTP flow
> >    items / actions before using in creation of a VTP flow rule.
> >
> > The model components:
> > - Virtual Tunnel Port (VTP) is a stateless software object that
> >   describes tunneled network traffic.  VTP object usually contains
> >   descriptions of outer headers, tunnel headers and inner headers.
> > - Tunnel Steering flow Rule (TSR) detects tunneled packets and
> >   delegates them to tunnel processing infrastructure, implemented
> >   in PMD for optimal hardware utilization, for further processing.
> > - Tunnel Matching flow Rule (TMR) verifies packet configuration and
> >   runs offload actions in case of a match.
> >
> > Application actions:
> > 1 Initialize VTP object according to tunnel
> >   network parameters.
> > 2 Create TSR flow rule:
> > 2.1 Query PMD for VTP actions: application can query for VTP actions
> >     more than once
> >     int
> >     rte_flow_tunnel_decap_set(uint16_t port_id,
> >                               struct rte_flow_tunnel *tunnel,
> >                               struct rte_flow_action **pmd_actions,
> >                               uint32_t *num_of_pmd_actions,
> >                               struct rte_flow_error *error);
> >
> > 2.2 Integrate PMD actions into TSR actions list.
> > 2.3 Create TSR flow rule:
> >     flow create <port> group 0
> >           match {tunnel items} / end
> >           actions {PMD actions} / {App actions} / end
> >
> > 3 Create TMR flow rule:
> > 3.1 Query PMD for VTP items: application can query for VTP items
> >     more than once
> >     int
> >     rte_flow_tunnel_match(uint16_t port_id,
> >                           struct rte_flow_tunnel *tunnel,
> >                           struct rte_flow_item **pmd_items,
> >                           uint32_t *num_of_pmd_items,
> >                           struct rte_flow_error *error);
> >
> > 3.2 Integrate PMD items into TMR items list:
> > 3.3 Create TMR flow rule
> >     flow create <port> group 0
> >           match {PMD items} / {APP items} / end
> >           actions {offload actions} / end
> >
> > The model provides helper function call to restore packets that miss 
> > tunnel TMR rules to its original state:
> > int
> > rte_flow_get_restore_info(uint16_t port_id,
> >                           struct rte_mbuf *mbuf,
> >                           struct rte_flow_restore_info *info,
> >                           struct rte_flow_error *error);
> >
> > rte_tunnel object filled by the call inside rte_flow_restore_info 
> > *info parameter can be used by the application to create new TMR 
> > rule for that tunnel.
> >
> > The model requirements:
> > Software application must initialize rte_tunnel object with tunnel 
> > parameters before calling
> > rte_flow_tunnel_decap_set() & rte_flow_tunnel_match().
> >
> > PMD actions array obtained in rte_flow_tunnel_decap_set() must be 
> > released by application with rte_flow_action_release() call.
> > Application can release the actionsfter TSR rule was created.
> >
> > PMD items array obtained with rte_flow_tunnel_match() must be 
> > released by application with rte_flow_item_release() call.
> > Application can release the items after rule was created. However, 
> > if the application needs to create additional TMR rule for the same 
> > tunnel it will need to obtain PMD items again.
> >
> > Application cannot destroy rte_tunnel object before it releases all 
> > PMD actions & PMD items referencing that tunnel.
> >
> > [1] https://mails.dpdk.org/archives/dev/2020-June/169656.html
> >
> > Signed-off-by: Eli Britstein <elibr@mellanox.com>
> > Acked-by: Ori Kam <orika@mellanox.com>
> > ---
> >  doc/guides/prog_guide/rte_flow.rst       | 105 ++++++++++++
> >  lib/librte_ethdev/rte_ethdev_version.map |   5 +
> >  lib/librte_ethdev/rte_flow.c             | 112 +++++++++++++
> >  lib/librte_ethdev/rte_flow.h             | 196 +++++++++++++++++++++++
> >  lib/librte_ethdev/rte_flow_driver.h      |  32 ++++
> >  5 files changed, 450 insertions(+)
> >
> > diff --git a/doc/guides/prog_guide/rte_flow.rst
> > b/doc/guides/prog_guide/rte_flow.rst
> > index d5dd18ce99..cfd98c2e7d 100644
> > --- a/doc/guides/prog_guide/rte_flow.rst
> > +++ b/doc/guides/prog_guide/rte_flow.rst
> > @@ -3010,6 +3010,111 @@ operations include:
> >  - Duplication of a complete flow rule description.
> >  - Pattern item or action name retrieval.
> >
> > +Tunneled traffic offload
> > +~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +Provide software application with unified rules model for tunneled 
> > +traffic regardless underlying hardware.
> > +
> > + - The model introduces a concept of a virtual tunnel port (VTP).
> 
> It looks like it is absolutely abstract concept now, since it is not 
> mentioned/referenced in the header file. It makes it hard to put the 
> description and API together.
> 

A virtual tunnel is the concept that allows a PMD to restore a packet's outer header after decap.
The example below shows how this feature is used.

> > + - The model uses VTP to offload ingress tunneled network traffic
> > +   with RTE flow rules.
> > + - The model is implemented as set of helper functions. Each PMD
> > +   implements VTP offload according to underlying hardware offload
> > +   capabilities.  Applications must query PMD for VTP flow
> > +   items / actions before using in creation of a VTP flow rule.
> 
> For me it looks like "creation of a VTP flow rule" is not covered yet. 
> Flow rules examples mention it in pattern and actions, but there is no 
> corresponding pattern items and actions. May be I simply misunderstand the idea.
>

This feature does not introduce new RTE flow items or actions.
The idea is that a PMD uses private items and actions to implement the internal logic
that allows packet restore after decap. There is a detailed example below.
Internal PMD elements are referred to as pmd_item / pmd_action.

> > +
> > +The model components:
> > +
> > +- Virtual Tunnel Port (VTP) is a stateless software object that
> > +  describes tunneled network traffic.  VTP object usually contains
> > +  descriptions of outer headers, tunnel headers and inner headers.
> 
> Are inner headers really a part of the tunnel description?
>

Inner headers are not part of tunnel description.

> > +- Tunnel Steering flow Rule (TSR) detects tunneled packets and
> > +  delegates them to tunnel processing infrastructure, implemented
> > +  in PMD for optimal hardware utilization, for further processing.
> > +- Tunnel Matching flow Rule (TMR) verifies packet configuration and
> > +  runs offload actions in case of a match.
> 
> Is it for fully offloaded tunnels with encap/decap or all tunnels 
> (detected, but partially offloaded, e.g. checksumming)?
> 

The tunnel API is designed for tunneled packets that will go through a decap action.
The current tunnel API version focuses on VXLAN tunnels.

> > +
> > +Application actions:
> > +
> > +1 Initialize VTP object according to tunnel network parameters.
> 
> I.e. fill in 'struct rte_flow_tunnel'. Is it correct?
>

Correct. The application initializes the rte_flow_tunnel structure.

> > +
> > +2 Create TSR flow rule.
> > +
> > +2.1 Query PMD for VTP actions. Application can query for VTP 
> > +actions
> more than once.
> > +
> > +  .. code-block:: c
> > +
> > +    int
> > +    rte_flow_tunnel_decap_set(uint16_t port_id,
> > +                              struct rte_flow_tunnel *tunnel,
> > +                              struct rte_flow_action **pmd_actions,
> > +                              uint32_t *num_of_pmd_actions,
> > +                              struct rte_flow_error *error);
> > +
> > +2.2 Integrate PMD actions into TSR actions list.
> > +
> > +2.3 Create TSR flow rule.
> > +
> > +    .. code-block:: console
> > +
> > +      flow create <port> group 0 match {tunnel items} / end actions 
> > + {PMD actions} / {App actions} / end
> 
> Are application actions strictly required?
> If no, it is better to make it clear.
> Do tunnel items correlate here somehow with tunnel specification in 
> 'struct rte_flow_tunnel'?
> Is it obtained using rte_flow_tunnel_match()?
>

Application actions are not required in the steering rule.
{tunnel items} refer to the RTE flow items that identify the tunnel.
These items correlate with the tunnel description in the rte_flow_tunnel structure, as the short sketch below shows.
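
For example, for the steering rule above, a minimal sketch of that correlation (the concrete
values and local variable names are illustrative; only the rte_flow_* identifiers come from
this patch or the existing rte_flow API):

/* Tunnel description the application passes to
 * rte_flow_tunnel_decap_set() / rte_flow_tunnel_match() ...
 */
struct rte_flow_tunnel tunnel = {
	.type = RTE_FLOW_ITEM_TYPE_VXLAN,
	.tun_info = { .tun_id = rte_cpu_to_be_64(10), },
};

/* ... and the {tunnel items} of the TSR pattern describing the same
 * tunnel (masks omitted for brevity, so default masks apply)
 */
struct rte_flow_item_udp udp_spec = { .hdr.dst_port = RTE_BE16(4789) };
struct rte_flow_item tunnel_items[] = {
	{ .type = RTE_FLOW_ITEM_TYPE_ETH },
	{ .type = RTE_FLOW_ITEM_TYPE_IPV4 },
	{ .type = RTE_FLOW_ITEM_TYPE_UDP, .spec = &udp_spec },
	{ .type = RTE_FLOW_ITEM_TYPE_VXLAN },
	{ .type = RTE_FLOW_ITEM_TYPE_END },
};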

> > +
> > +3 Create TMR flow rule.
> > +
> > +3.1 Query PMD for VTP items. Application can query for VTP items 
> > +more
> than once.
> > +
> > +    .. code-block:: c
> > +
> > +      int
> > +      rte_flow_tunnel_match(uint16_t port_id,
> > +                            struct rte_flow_tunnel *tunnel,
> > +                            struct rte_flow_item **pmd_items,
> > +                            uint32_t *num_of_pmd_items,
> > +                            struct rte_flow_error *error);
> > +
> > +3.2 Integrate PMD items into TMR items list.
> > +
> > +3.3 Create TMR flow rule.
> > +
> > +    .. code-block:: console
> > +
> > +      flow create <port> group 0 match {PMD items} / {APP items} / 
> > + end actions {offload actions} / end
> > +
> > +The model provides helper function call to restore packets that 
> > +miss tunnel TMR rules to its original state:
> > +
> > +.. code-block:: c
> > +
> > +  int
> > +  rte_flow_get_restore_info(uint16_t port_id,
> > +                            struct rte_mbuf *mbuf,
> > +                            struct rte_flow_restore_info *info,
> > +                            struct rte_flow_error *error);
> > +
> > +rte_tunnel object filled by the call inside ``rte_flow_restore_info 
> > +*info parameter`` can be used by the application to create new TMR 
> > +rule for that tunnel.
> 
> I think an example, for example, for VXLAN over IPv4 tunnel case with 
> some concrete parameters would be very useful here for understanding.
> Could it be annotated with a description of the transformations 
> happening with a packet on different stages of the processing (including restore example).
>

VXLAN code example:
Assume the application needs to do inner NAT on a VXLAN packet.
The first rule in group 0:

flow create <port id> ingress group 0 pattern eth / ipv4 / udp dst is 4789 / vxlan / end
         actions {pmd actions} / jump group 3 / end

The first VXLAN packet that arrives matches the rule in group 0 and jumps to group 3.
In group 3 the packet will miss, since there is no flow to match, and will be uploaded
to the application.
The application will call rte_flow_get_restore_info() to get the packet outer header.
The application will insert a new rule in group 3 to match the outer and inner headers:

flow create <port id> ingress group 3 pattern {pmd items} / eth / ipv4 dst is 172.10.10.1 / udp dst 4789 / vxlan vni is 10 /
        ipv4 dst is 184.1.2.3 / end
        actions set_ipv4_dst 186.1.1.1 / queue index 3 / end

The result of these rules is that a VXLAN packet with vni=10, outer IPv4 dst=172.10.10.1 and inner IPv4 dst=184.1.2.3
will be received decapsulated on queue 3 with inner IPv4 dst=186.1.1.1.

Note: a packet in group 3 is considered decapsulated. All actions in that group are applied to
the header that was the inner header before decap. The application may specify outer header fields to match on.
It is the PMD's responsibility to translate these items to outer metadata.
 
API usage:
/**
 * 1. Initialize the RTE flow tunnel object
 */
struct rte_flow_tunnel tunnel = {
	.type = RTE_FLOW_ITEM_TYPE_VXLAN,
	.tun_info = {
		.tun_id = rte_cpu_to_be_64(10),
	},
};

/**
 * 2. Obtain PMD tunnel actions
 *
 * pmd_actions is an intermediate variable the application uses to
 * compile the actions array
 */
struct rte_flow_action *pmd_actions;
uint32_t num_pmd_actions;
rte_flow_tunnel_decap_set(port_id, &tunnel, &pmd_actions,
			  &num_pmd_actions, &error);

/**
 * 3. Offload the first rule:
 * matches VXLAN traffic and jumps to group 3
 * (implicitly decaps the packet)
 */
app_actions  = jump group 3
rule_items   = app_items;	/* eth / ipv4 / udp / vxlan */
rule_actions = { pmd_actions, app_actions };
attr.group = 0;
flow_1 = rte_flow_create(port_id, &attr,
			 rule_items, rule_actions, &error);

/**
 * 4. After flow creation the application does not need to keep the
 * tunnel action resources.
 */
rte_flow_tunnel_action_decap_release(port_id, pmd_actions,
				     num_pmd_actions, &error);

/**
 * 5. After a partially offloaded packet misses in group 3 because
 * there was no matching rule, handle the miss.
 */
struct rte_flow_restore_info info;
rte_flow_get_restore_info(port_id, mbuf, &info, &error);

/**
 * 6. Offload the NAT rule:
 */
app_items   = eth / ipv4 dst is 172.10.10.1 / udp dst 4789 / vxlan vni is 10 / ipv4 dst is 184.1.2.3
app_actions = set_ipv4_dst 186.1.1.1 / queue index 3

struct rte_flow_item *pmd_items;
uint32_t num_pmd_items;
rte_flow_tunnel_match(port_id, &info.tunnel, &pmd_items,
		      &num_pmd_items, &error);
rule_items   = {pmd_items, app_items};
rule_actions = app_actions;
attr.group = info.group_id;
flow_2 = rte_flow_create(port_id, &attr,
			 rule_items, rule_actions, &error);

/**
 * 7. Release the PMD items after rule creation.
 */
rte_flow_tunnel_item_release(port_id, pmd_items, num_pmd_items, &error);

> > +
> > +The model requirements:
> > +
> > +Software application must initialize rte_tunnel object with tunnel 
> > +parameters before calling
> > +rte_flow_tunnel_decap_set() & rte_flow_tunnel_match().
> > +
> > +PMD actions array obtained in rte_flow_tunnel_decap_set() must be 
> > +released by application with rte_flow_action_release() call.
> > +Application can release the actionsfter TSR rule was created.
> 
> actionsfter ?
> 

Typo. Original text was
"Application can release the actions after TSR rule was created."

> > +
> > +PMD items array obtained with rte_flow_tunnel_match() must be 
> > +released by application with rte_flow_item_release() call.
> > +Application can release the items after rule was created. However, 
> > +if the application needs to create additional TMR rule for the same 
> > +tunnel it will need to obtain PMD items again.
> > +
> > +Application cannot destroy rte_tunnel object before it releases all 
> > +PMD actions & PMD items referencing that tunnel.
> > +
> >  Caveats
> >  -------
> >
> 
> [snip]
> 
> > diff --git a/lib/librte_ethdev/rte_flow.h 
> > b/lib/librte_ethdev/rte_flow.h index b0e4199192..1374b6e5a7 100644
> > --- a/lib/librte_ethdev/rte_flow.h
> > +++ b/lib/librte_ethdev/rte_flow.h
> > @@ -3324,6 +3324,202 @@ int
> >  rte_flow_get_aged_flows(uint16_t port_id, void **contexts,
> >  			uint32_t nb_contexts, struct rte_flow_error *error);
> >
> > +/* Tunnel information. */
> > +__rte_experimental
> > +struct rte_flow_ip_tunnel_key {
> > +	rte_be64_t tun_id; /**< Tunnel identification. */
> 
> What is it? Why is it big-endian? Why is it in IP tunnel key?
> I.e. why is it not in a generic structure?
>

tun_id is the tunnel identification key, the VNI value in the VXLAN case.
I'll change it to CPU byte order.
Also, I will merge the tunnel_key structure into rte_tunnel in the next version.

> > +	union {
> > +		struct {
> > +			rte_be32_t src_addr; /**< IPv4 source address. */
> > +			rte_be32_t dst_addr; /**< IPv4 destination address.
> */
> > +		} ipv4;
> > +		struct {
> > +			uint8_t src_addr[16]; /**< IPv6 source address. */
> > +			uint8_t dst_addr[16]; /**< IPv6 destination address.
> */
> > +		} ipv6;
> > +	} u;
> > +	bool       is_ipv6; /**< True for valid IPv6 fields. Otherwise IPv4. */
> > +	rte_be16_t tun_flags; /**< Tunnel flags. */
> 
> Which flags? Where are these flags defined?
> Why is it big-endian?
> 

The tunnel flags value, as it appeared in the incoming packet.
Since this structure may be used for different tunnel types,
the exact definitions differ for each type.
I'll change it to CPU byte order.

> > +	uint8_t    tos; /**< TOS for IPv4, TC for IPv6. */
> > +	uint8_t    ttl; /**< TTL for IPv4, HL for IPv6. */
> 
> If combine, I'd stick to IPv6 terminology since it is a bit better 
> (well-thought, especially current tendencies in (re)naming in software).
> 
> > +	rte_be32_t label; /**< Flow Label for IPv6. */
> 
> What about IPv6 tunnels with extension headers? How to extend?
> 

These tunnels are not supported at this point.
The structure can be extended in the future.

> > +	rte_be16_t tp_src; /**< Tunnel port source. */
> > +	rte_be16_t tp_dst; /**< Tunnel port destination. */
> 
> What about IP-in-IP tunnels? Is it applicable?
> 

No support for IP-in-IP tunnels in the current version.
Future releases can add that feature, based on demand.

> > +};
> > +
> > +
> > +/* Tunnel has a type and the key information. */ __rte_experimental 
> > +struct rte_flow_tunnel {
> > +	/**
> > +	 * Tunnel type, for example RTE_FLOW_ITEM_TYPE_VXLAN,
> > +	 * RTE_FLOW_ITEM_TYPE_NVGRE etc.
> > +	 */
> > +	enum rte_flow_item_type		type;
> > +	struct rte_flow_ip_tunnel_key	tun_info; /**< Tunnel key info. */
> 
> How to extended for non-IP tunnels? MPLS?
> Or tunnels with more protocols? E.g. MPLS-over-UDP?
>

Regarding extensions to new tunnels:
basically, this structure is a software representation. Each tunnel format can add its own
fields. In the current implementation we target the VXLAN tunnel type.
Additional tunnel types can be added in the future, per demand, by adding the relevant members to
the base structure.
 
> > +};
> > +
> > +/**
> > + * Indicate that the packet has a tunnel.
> > + */
> > +#define RTE_FLOW_RESTORE_INFO_TUNNEL  (1ULL << 0)
> > +
> > +/**
> > + * Indicate that the packet has a non decapsulated tunnel header.
> > + */
> > +#define RTE_FLOW_RESTORE_INFO_ENCAPSULATED  (1ULL << 1)
> > +
> > +/**
> > + * Indicate that the packet has a group_id.
> > + */
> > +#define RTE_FLOW_RESTORE_INFO_GROUP_ID  (1ULL << 2)
> > +
> > +/**
> > + * Restore information structure to communicate the current packet 
> > +processing
> > + * state when some of the processing pipeline is done in hardware 
> > +and should
> > + * continue in software.
> > + */
> > +__rte_experimental
> > +struct rte_flow_restore_info {
> > +	/**
> > +	 * Bitwise flags (RTE_FLOW_RESTORE_INFO_*) to indicate validation
> of
> > +	 * other fields in struct rte_flow_restore_info.
> > +	 */
> > +	uint64_t flags;
> > +	uint32_t group_id; /**< Group ID. */
> 
> What is the group ID here?
>

The flow group ID where the partially offloaded packet missed.
 
> > +	struct rte_flow_tunnel tunnel; /**< Tunnel information. */ };
> > +
> > +/**
> > + * Allocate an array of actions to be used in rte_flow_create, to 
> > +implement
> > + * tunnel-decap-set for the given tunnel.
> > + * Sample usage:
> > + *   actions vxlan_decap / tunnel-decap-set(tunnel properties) /
> > + *            jump group 0 / end
> 
> Why is jump to group used in example above? Is it mandatory?
>

The JUMP action is not strictly required. The application may use any terminating action.
The most common case, however, is jump.

> > + *
> > + * @param port_id
> > + *   Port identifier of Ethernet device.
> > + * @param[in] tunnel
> > + *   Tunnel properties.
> > + * @param[out] actions
> > + *   Array of actions to be allocated by the PMD. This array should be
> > + *   concatenated with the actions array provided to rte_flow_create.
> 
> Please, specify concatenation order explicitly.
> 

PMD actions precede application actions.
I'll update that spec in the next version.

> > + * @param[out] num_of_actions
> > + *   Number of actions allocated.
> > + * @param[out] error
> > + *   Perform verbose error reporting if not NULL. PMDs initialize this
> > + *   structure in case of error only.
> > + *
> > + * @return
> > + *   0 on success, a negative errno value otherwise and rte_errno is set.
> > + */
> > +__rte_experimental
> > +int
> > +rte_flow_tunnel_decap_set(uint16_t port_id,
> > +			  struct rte_flow_tunnel *tunnel,
> > +			  struct rte_flow_action **actions,
> > +			  uint32_t *num_of_actions,
> 
> Why does approach to specify actions differ here?
> I.e. array of specified size vs END-terminated array?
> Must the actions array be END-terminated here?
> It must be a strong reason to do it and it should be explained.
> 

PMD actions returned by rte_flow_tunnel_decap_set() are concatenated with the application actions.
The actions array produced by the concatenation is passed to rte_flow_create() and must be END terminated.
The PMD actions array is an intermediate parameter and does not require END termination.
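
As a minimal sketch of that convention (the local variable names and the JUMP target are
illustrative; the same END-termination rule applies to the pmd_items returned by
rte_flow_tunnel_match()):

/* Concatenate the PMD actions with the application actions; only the
 * final array passed to rte_flow_create() carries the END terminator.
 */
struct rte_flow_action_jump jump = { .group = 3 };
struct rte_flow_action app_actions[] = {
	{ .type = RTE_FLOW_ACTION_TYPE_JUMP, .conf = &jump },
	{ .type = RTE_FLOW_ACTION_TYPE_END },
};
struct rte_flow_action rule_actions[num_pmd_actions + RTE_DIM(app_actions)];

memcpy(rule_actions, pmd_actions,
       num_pmd_actions * sizeof(rule_actions[0]));
memcpy(rule_actions + num_pmd_actions, app_actions, sizeof(app_actions));
flow = rte_flow_create(port_id, &attr, rule_items, rule_actions, &error);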

> > +			  struct rte_flow_error *error);
> > +
> > +/**
> > + * Allocate an array of items to be used in rte_flow_create, to 
> > +implement
> > + * tunnel-match for the given tunnel.
> > + * Sample usage:
> > + *   pattern tunnel-match(tunnel properties) / outer-header-matches /
> > + *           inner-header-matches / end
> > + *
> > + * @param port_id
> > + *   Port identifier of Ethernet device.
> > + * @param[in] tunnel
> > + *   Tunnel properties.
> > + * @param[out] items
> > + *   Array of items to be allocated by the PMD. This array should be
> > + *   concatenated with the items array provided to rte_flow_create.
> 
> Concatenation order/rules should be described.
> Since it is an output which entity does the concatenation.
> Is it allowed to refine PMD rules in application rule specification?
> 

PMD items precede application items.
I'll update this section too.
I'm not sure about the last question, on refining rules. Please elaborate.

> > + * @param[out] num_of_items
> > + *   Number of items allocated.
> > + * @param[out] error
> > + *   Perform verbose error reporting if not NULL. PMDs initialize this
> > + *   structure in case of error only.
> > + *
> > + * @return
> > + *   0 on success, a negative errno value otherwise and rte_errno is set.
> > + */
> > +__rte_experimental
> > +int
> > +rte_flow_tunnel_match(uint16_t port_id,
> > +		      struct rte_flow_tunnel *tunnel,
> > +		      struct rte_flow_item **items,
> > +		      uint32_t *num_of_items,
> 
> Same as above for actions.
> 

PMD items returned by rte_flow_tunnel_match() are concatenated with the application items.
The items array produced by the concatenation is passed to rte_flow_create() and must be END terminated.
The PMD items array is an intermediate parameter and does not require END termination.

> > +		      struct rte_flow_error *error);
> > +
> > +/**
> > + * Populate the current packet processing state, if exists, for the 
> > +given
> mbuf.
> > + *
> > + * @param port_id
> > + *   Port identifier of Ethernet device.
> > + * @param[in] m
> > + *   Mbuf struct.
> > + * @param[out] info
> > + *   Restore information. Upon success contains the HW state.
> > + * @param[out] error
> > + *   Perform verbose error reporting if not NULL. PMDs initialize this
> > + *   structure in case of error only.
> > + *
> > + * @return
> > + *   0 on success, a negative errno value otherwise and rte_errno is set.
> > + */
> > +__rte_experimental
> > +int
> > +rte_flow_tunnel_get_restore_info(uint16_t port_id,
> > +				 struct rte_mbuf *m,
> > +				 struct rte_flow_restore_info *info,
> 
> Is it suggesting to make a copy of the restore info for each mbuf? It 
> sounds very expensive. Could you share your thoughts about it.
> 

Restore info may be different for each packet. For example,
if the application declared the tunnel info with the tunnel type only,
the restore info will contain the outer header data as well.
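
As a rough sketch of the intended datapath usage (the burst loop, queue_id and
handle_tunnel_miss() are illustrative; error is a struct rte_flow_error as in the example
above; only the rte_flow_* / rte_eth_* calls and the flag come from this patch or the
existing ethdev API):

struct rte_mbuf *pkts[32];
uint16_t nb = rte_eth_rx_burst(port_id, queue_id, pkts, 32);

for (uint16_t i = 0; i < nb; i++) {
	struct rte_flow_restore_info info;

	/* Only packets that missed the offloaded rules need this lookup;
	 * a non-zero return means no HW state is attached to the mbuf.
	 */
	if (rte_flow_tunnel_get_restore_info(port_id, pkts[i], &info, &error))
		continue;
	if (info.flags & RTE_FLOW_RESTORE_INFO_TUNNEL)
		handle_tunnel_miss(pkts[i], &info); /* e.g. create a TMR rule */
}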

> > +				 struct rte_flow_error *error);
> > +
> > +/**
> > + * Release the action array as allocated by rte_flow_tunnel_decap_set.
> > + *
> > + * @param port_id
> > + *   Port identifier of Ethernet device.
> > + * @param[in] actions
> > + *   Array of actions to be released.
> > + * @param[in] num_of_actions
> > + *   Number of elements in actions array.
> > + * @param[out] error
> > + *   Perform verbose error reporting if not NULL. PMDs initialize this
> > + *   structure in case of error only.
> > + *
> > + * @return
> > + *   0 on success, a negative errno value otherwise and rte_errno is set.
> > + */
> > +__rte_experimental
> > +int
> > +rte_flow_tunnel_action_decap_release(uint16_t port_id,
> > +				     struct rte_flow_action *actions,
> > +				     uint32_t num_of_actions,
> 
> Same question as above for actions and items specification approach.
> 

PMD items and actions are intermediate parameters used by the application
to build the arguments for the rte_flow_create() call.

> > +				     struct rte_flow_error *error);
> > +
> > +/**
> > + * Release the item array as allocated by rte_flow_tunnel_match.
> > + *
> > + * @param port_id
> > + *   Port identifier of Ethernet device.
> > + * @param[in] items
> > + *   Array of items to be released.
> > + * @param[in] num_of_items
> > + *   Number of elements in item array.
> > + * @param[out] error
> > + *   Perform verbose error reporting if not NULL. PMDs initialize this
> > + *   structure in case of error only.
> > + *
> > + * @return
> > + *   0 on success, a negative errno value otherwise and rte_errno is set.
> > + */
> > +__rte_experimental
> > +int
> > +rte_flow_tunnel_item_release(uint16_t port_id,
> > +			     struct rte_flow_item *items,
> > +			     uint32_t num_of_items,
> 
> Same question as above for actions and items specification approach.
> 

PMD items and actions are intermediate parameters used by the application
to build the arguments for the rte_flow_create() call.

> > +			     struct rte_flow_error *error);
> >  #ifdef __cplusplus
> >  }
> >  #endif
> 
> [snip]
> 
> Andrew.
> 
> (Right now it is hard to fully imagine how to deal with it.
> And it looks like a shim to vendor-specific API. May be I'm wrong. 
> Hopefully the next version will have PMD implementation example and it 
> will shed a bit more light on it.)

The main idea of this API is to hide the internal logic and best practices of specific vendors.
For example, one vendor may use internal registers to save the outer header, while another vendor
will only decap the packet at the final stage.

I hope the examples above give you a better understanding of the API.

The PMD implementation is in progress.
However, we are in contact with other vendors that are also interested in implementing this API.
Since we don't want to block their progress, it's important to reach a conclusion on the general API parts.
In any case, I'll be more than happy to get your review of the PMD.


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH 1/2] ethdev: allow negative values in flow rule types
  2020-07-05 13:34   ` Andrew Rybchenko
@ 2020-08-19 14:33     ` Gregory Etelson
  0 siblings, 0 replies; 95+ messages in thread
From: Gregory Etelson @ 2020-08-19 14:33 UTC (permalink / raw)
  To: Andrew Rybchenko, dev; +Cc: Matan Azrad, Raslan Darawsheh, Ori Kam

> -----Original Message-----
> From: Andrew Rybchenko <arybchenko@solarflare.com>
> Sent: Sunday, July 5, 2020 16:34
> To: Gregory Etelson <getelson@mellanox.com>; dev@dpdk.org
> Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh 
> <rasland@mellanox.com>; Ori Kam <orika@mellanox.com>
> Subject: Re: [dpdk-dev] [PATCH 1/2] ethdev: allow negative values in 
> flow rule types
> 
> On 6/25/20 7:03 PM, Gregory Etelson wrote:
> > RTE flow items & actions use positive values in item & action type.
> > Negative values are reserved for PMD private types. PMD items & 
> > actions usually are not exposed to application and are not used to 
> > create RTE flows.
> >
> > The patch allows applications with access to PMD flow items & 
> > actions ability to integrate RTE and PMD items & actions and use 
> > them to create flow rule.
> >
> > Signed-off-by: Gregory Etelson <getelson@mellanox.com>
> > Acked-by: Ori Kam <orika@mellanox.com>
> > ---
> >  lib/librte_ethdev/rte_flow.c | 30 ++++++++++++++++++++++++------
> >  1 file changed, 24 insertions(+), 6 deletions(-)
> >
> > diff --git a/lib/librte_ethdev/rte_flow.c 
> > b/lib/librte_ethdev/rte_flow.c index 1685be5f73..c19d25649f 100644
> > --- a/lib/librte_ethdev/rte_flow.c
> > +++ b/lib/librte_ethdev/rte_flow.c
> > @@ -563,7 +563,12 @@ rte_flow_conv_item_spec(void *buf, const size_t
> size,
> >  		}
> >  		break;
> >  	default:
> > -		off = rte_flow_desc_item[item->type].size;
> > +		/**
> > +		 * allow PMD private flow item
> > +		 */
> > +		off = (uint32_t)item->type <= INT_MAX ?
> > +			rte_flow_desc_item[item->type].size :
> > +			sizeof(void *);
> 
> May be it is out-of-scope of the patch (strictly speaking), but usage of 'off'
> variable is very misleading here. It is not used as an offset. It is 
> used as a size to copy.
>

The 'off' variable in that scope refers to the object size to copy.
I did not change it because it's not related to the proposed change.
 
> Also it is absolutely unclear why sizeof(void *) is a right size for 
> PMD private flow items.

RTE flow library functions cannot work with PMD private items & actions (elements)
because RTE flow has no API to query the PMD flow object size. In the patch,
PMD flow elements use an object pointer. RTE flow library functions handle the PMD element
object size as the size of a pointer. The PMD handles its objects internally.
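
To illustrate why a pointer-size copy is enough, here is a sketch only;
the negative type value and the names below are illustrative, not part
of the patch:

/* A PMD private item carries only an opaque pointer in its spec. */
#include <limits.h>
#include <rte_flow.h>

enum { PMD_PRIVATE_ITEM_TUNNEL = INT_MIN }; /* illustrative value */

struct pmd_tunnel_obj; /* internal PMD object, opaque to the application */

static void
fill_private_item(struct rte_flow_item *item, struct pmd_tunnel_obj *obj)
{
	item->type = (enum rte_flow_item_type)PMD_PRIVATE_ITEM_TUNNEL;
	item->spec = obj; /* only the pointer value needs to be copied */
	item->last = NULL;
	item->mask = NULL;
}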

^ permalink raw reply	[flat|nested] 95+ messages in thread

* [dpdk-dev] [PATCH v2 0/4] Tunnel Offload API
  2020-06-25 16:03 [dpdk-dev] [PATCH 0/2] ethdev: tunnel offload model Gregory Etelson
                   ` (2 preceding siblings ...)
  2020-07-05 13:39 ` [dpdk-dev] [PATCH 0/2] " Andrew Rybchenko
@ 2020-09-08 20:15 ` Gregory Etelson
  2020-09-08 20:15   ` [dpdk-dev] [PATCH v2 1/4] ethdev: allow negative values in flow rule types Gregory Etelson
                     ` (3 more replies)
  2020-09-30  9:18 ` [dpdk-dev] [PATCH v3 0/4] Tunnel Offload API Gregory Etelson
                   ` (13 subsequent siblings)
  17 siblings, 4 replies; 95+ messages in thread
From: Gregory Etelson @ 2020-09-08 20:15 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, orika

Tunnel Offload API provides a hardware-independent, unified model
to offload tunneled traffic. Key model elements are:
 - apply matches to both outer and inner packet headers
   during the entire offload procedure;
 - restore the outer header of a partially offloaded packet;
 - the model is implemented as a set of helper functions (listed below
   for reference).
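
For reference, these are the helper declarations this series adds to
rte_flow.h (copied from patch 2/4), as noted in the last bullet above:

int
rte_flow_tunnel_decap_set(uint16_t port_id,
                          struct rte_flow_tunnel *tunnel,
                          struct rte_flow_action **actions,
                          uint32_t *num_of_actions,
                          struct rte_flow_error *error);
int
rte_flow_tunnel_match(uint16_t port_id,
                      struct rte_flow_tunnel *tunnel,
                      struct rte_flow_item **items,
                      uint32_t *num_of_items,
                      struct rte_flow_error *error);
int
rte_flow_tunnel_get_restore_info(uint16_t port_id,
                                 struct rte_mbuf *m,
                                 struct rte_flow_restore_info *info,
                                 struct rte_flow_error *error);
int
rte_flow_tunnel_action_decap_release(uint16_t port_id,
                                     struct rte_flow_action *actions,
                                     uint32_t num_of_actions,
                                     struct rte_flow_error *error);
int
rte_flow_tunnel_item_release(uint16_t port_id,
                             struct rte_flow_item *items,
                             uint32_t num_of_items,
                             struct rte_flow_error *error);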

Eli Britstein (1):
  ethdev: tunnel offload model

Gregory Etelson (3):
  ethdev: allow negative values in flow rule types
  net/mlx5: implement tunnel offload API
  app/testpmd: support tunnel offload API

 app/test-pmd/cmdline_flow.c              | 102 ++++-
 app/test-pmd/config.c                    | 147 +++++++-
 app/test-pmd/testpmd.c                   |   5 +-
 app/test-pmd/testpmd.h                   |  27 +-
 app/test-pmd/util.c                      |  30 +-
 doc/guides/prog_guide/rte_flow.rst       | 105 ++++++
 drivers/net/mlx5/linux/mlx5_os.c         |  14 +
 drivers/net/mlx5/mlx5.c                  |   6 +
 drivers/net/mlx5/mlx5.h                  |   4 +
 drivers/net/mlx5/mlx5_flow.c             | 453 +++++++++++++++++++++++
 drivers/net/mlx5/mlx5_flow.h             |  49 +++
 drivers/net/mlx5/mlx5_flow_dv.c          |  71 +++-
 lib/librte_ethdev/rte_ethdev_version.map |   5 +
 lib/librte_ethdev/rte_flow.c             | 142 ++++++-
 lib/librte_ethdev/rte_flow.h             | 195 ++++++++++
 lib/librte_ethdev/rte_flow_driver.h      |  32 ++
 16 files changed, 1370 insertions(+), 17 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 95+ messages in thread

* [dpdk-dev] [PATCH v2 1/4] ethdev: allow negative values in flow rule types
  2020-09-08 20:15 ` [dpdk-dev] [PATCH v2 0/4] Tunnel Offload API Gregory Etelson
@ 2020-09-08 20:15   ` Gregory Etelson
  2020-09-15  4:36     ` Ajit Khaparde
  2020-09-15  8:45     ` Andrew Rybchenko
  2020-09-08 20:15   ` [dpdk-dev] [PATCH v2 2/4] ethdev: tunnel offload model Gregory Etelson
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 95+ messages in thread
From: Gregory Etelson @ 2020-09-08 20:15 UTC (permalink / raw)
  To: dev
  Cc: matan, rasland, orika, Gregory Etelson, Ori Kam, Thomas Monjalon,
	Ferruh Yigit, Andrew Rybchenko

From: Gregory Etelson <getelson@mellanox.com>

RTE flow items & actions use positive values in item & action type.
Negative values are reserved for PMD private types. PMD
items & actions are usually not exposed to the application and are not
used to create RTE flows.

The patch gives applications with access to PMD flow
items & actions the ability to integrate RTE and PMD items & actions
and use them to create a flow rule.

RTE flow library functions cannot work with PMD private items and
actions (elements) because RTE flow has no API to query PMD flow
object size. In the patch, PMD flow elements use object pointer.
RTE flow library functions handle PMD element object size as
size of a pointer. PMD handles its objects internally.

Signed-off-by: Gregory Etelson <getelson@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
---
v2:
* Update commit log
---
 lib/librte_ethdev/rte_flow.c | 30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
index f8fdd68fe9..9905426bc9 100644
--- a/lib/librte_ethdev/rte_flow.c
+++ b/lib/librte_ethdev/rte_flow.c
@@ -564,7 +564,12 @@ rte_flow_conv_item_spec(void *buf, const size_t size,
 		}
 		break;
 	default:
-		off = rte_flow_desc_item[item->type].size;
+		/**
+		 * allow PMD private flow item
+		 */
+		off = (uint32_t)item->type <= INT_MAX ?
+			rte_flow_desc_item[item->type].size :
+			sizeof(void *);
 		rte_memcpy(buf, data, (size > off ? off : size));
 		break;
 	}
@@ -667,7 +672,12 @@ rte_flow_conv_action_conf(void *buf, const size_t size,
 		}
 		break;
 	default:
-		off = rte_flow_desc_action[action->type].size;
+		/**
+		 * allow PMD private flow action
+		 */
+		off = (uint32_t)action->type <= INT_MAX ?
+			rte_flow_desc_action[action->type].size :
+			sizeof(void *);
 		rte_memcpy(buf, action->conf, (size > off ? off : size));
 		break;
 	}
@@ -709,8 +719,12 @@ rte_flow_conv_pattern(struct rte_flow_item *dst,
 	unsigned int i;
 
 	for (i = 0, off = 0; !num || i != num; ++i, ++src, ++dst) {
-		if ((size_t)src->type >= RTE_DIM(rte_flow_desc_item) ||
-		    !rte_flow_desc_item[src->type].name)
+		/**
+		 * allow PMD private flow item
+		 */
+		if (((uint32_t)src->type <= INT_MAX) &&
+			((size_t)src->type >= RTE_DIM(rte_flow_desc_item) ||
+		    !rte_flow_desc_item[src->type].name))
 			return rte_flow_error_set
 				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM, src,
 				 "cannot convert unknown item type");
@@ -798,8 +812,12 @@ rte_flow_conv_actions(struct rte_flow_action *dst,
 	unsigned int i;
 
 	for (i = 0, off = 0; !num || i != num; ++i, ++src, ++dst) {
-		if ((size_t)src->type >= RTE_DIM(rte_flow_desc_action) ||
-		    !rte_flow_desc_action[src->type].name)
+		/**
+		 * allow PMD private flow action
+		 */
+		if (((uint32_t)src->type <= INT_MAX) &&
+		    ((size_t)src->type >= RTE_DIM(rte_flow_desc_action) ||
+		    !rte_flow_desc_action[src->type].name))
 			return rte_flow_error_set
 				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION,
 				 src, "cannot convert unknown action type");
-- 
2.25.1


^ permalink raw reply	[flat|nested] 95+ messages in thread

* [dpdk-dev] [PATCH v2 2/4] ethdev: tunnel offload model
  2020-09-08 20:15 ` [dpdk-dev] [PATCH v2 0/4] Tunnel Offload API Gregory Etelson
  2020-09-08 20:15   ` [dpdk-dev] [PATCH v2 1/4] ethdev: allow negative values in flow rule types Gregory Etelson
@ 2020-09-08 20:15   ` Gregory Etelson
  2020-09-08 20:15   ` [dpdk-dev] [PATCH v2 3/4] net/mlx5: implement tunnel offload API Gregory Etelson
  2020-09-08 20:15   ` [dpdk-dev] [PATCH v2 4/4] app/testpmd: support " Gregory Etelson
  3 siblings, 0 replies; 95+ messages in thread
From: Gregory Etelson @ 2020-09-08 20:15 UTC (permalink / raw)
  To: dev
  Cc: matan, rasland, orika, Eli Britstein, Ori Kam, John McNamara,
	Marko Kovacevic, Ray Kinsella, Neil Horman, Thomas Monjalon,
	Ferruh Yigit, Andrew Rybchenko

From: Eli Britstein <elibr@mellanox.com>

Rte_flow API provides the building blocks for vendor agnostic flow
classification offloads.  The rte_flow match and action primitives are
fine-grained, thus giving DPDK applications the flexibility to
offload network stacks and complex pipelines.

Applications wishing to offload complex data structures (e.g. tunnel
virtual ports) are required to use the rte_flow primitives, such as
group, meta, mark, tag and others to model their high level objects.

The hardware model design for high level software objects is not
trivial.  Furthermore, an optimal design is often vendor specific.

The goal of this API is to provide applications with the hardware
offload model for common high level software objects which is optimal
in regards to the underlying hardware.

Tunnel ports are the first of such objects.

Tunnel ports
------------
Ingress processing of tunneled traffic requires the classification of
the tunnel type followed by a decap action.

In software, once a packet is decapsulated the in_port field is
changed to a virtual port representing the tunnel type. The outer
header fields are stored as packet metadata members and may be matched
by subsequent flows.

Openvswitch, for example, uses two flows:
1. classification flow - setting the virtual port representing the
tunnel type. For example: match on udp port 4789,
actions=tnl_pop(vxlan_vport).
2. steering flow according to outer and inner header matches: match on
in_port=vxlan_vport and outer/inner headers, actions=forward to
port X. The benefits of multi-flow tables are described in [1].

Offloading tunnel ports
-----------------------
Tunnel ports introduce a new stateless field that can be matched on.
Currently the rte_flow library provides an API to encap, decap and
match on tunnel headers. However, there is no rte_flow primitive to
set and match tunnel virtual ports.

There are several possible hardware models for offloading virtual
tunnel port flows including, but not limited to, the following:
1. Setting the virtual port on a hw register using the
rte_flow_action_mark/ rte_flow_action_tag/rte_flow_set_meta objects.
2. Mapping a virtual port to an rte_flow group
3. Avoiding the need to match on transient objects by merging
multi-table flows to a single rte_flow rule.

Every approach has its pros and cons.  The preferred approach should
take into account the entire system architecture and is very often
vendor specific.

The proposed rte_flow_tunnel_decap_set helper function (drafted below)
is designed to provide a common, vendor agnostic, API for setting the
virtual port value.  The helper API enables PMD implementations to
return vendor specific combination of rte_flow actions realizing the
vendor's hardware model for setting a tunnel port.  Applications may
append the list of actions returned from the helper function when
creating an rte_flow rule in hardware.

Similarly, the rte_flow_tunnel_match helper (drafted below)
allows for multiple hardware implementations to return a list of
rte_flow items.

Miss handling
-------------
Packets going through multiple rte_flow groups are exposed to hw
misses due to partial packet processing. In such cases, the software
should continue the packet's processing from the point where the
hardware missed.

We propose a generic rte_flow_restore structure providing the state
that was stored in hardware when the packet missed.

Currently, the structure will provide the tunnel state of the packet
that missed, namely:
1. The group id that missed
2. The tunnel port that missed
3. Tunnel information that was stored in memory (due to decap action).
In the future, we may add additional fields as more state may be
stored in the device memory (e.g. ct_state).

Applications may query the state via a new
rte_flow_tunnel_get_restore_info(mbuf) API, thus allowing
a vendor specific implementation.

VXLAN Code example:
Assume the application needs to do inner NAT on the VXLAN packet.
The first rule in group 0:

flow create <port id> ingress group 0
  pattern eth / ipv4 / udp dst is 4789 / vxlan / end
  actions {pmd actions} / jump group 3 / end

The first VXLAN packet that arrives matches the rule in group 0 and jumps
to group 3. In group 3 the packet will miss, since there is no flow to
match, and will be uploaded to the application.  The application will call
rte_flow_get_restore_info() to get the packet outer header.
The application will then insert a new rule in group 3 to match outer and
inner headers:

flow create <port id> ingress group 3
  pattern {pmd items} / eth / ipv4 dst is 172.10.10.1 /
          udp dst 4789 / vxlan vni is 10 /
          ipv4 dst is 184.1.2.3 / end
  actions  set_ipv4_dst  186.1.1.1 / queue index 3 / end

The result of these rules will be that a VXLAN packet with vni=10, outer IPv4
dst=172.10.10.1 and inner IPv4 dst=184.1.2.3 will be received decapped
on queue 3 with IPv4 dst=186.1.1.1.

Note: a packet in group 3 is considered decapped. All actions in that
group will be applied to the header that was the inner header before decap.
The application may specify the outer header to be matched on.  It is the
PMD's responsibility to translate these items to outer metadata.

API usage:
/**
 * 1. Initiate RTE flow tunnel object
 */
const struct rte_flow_tunnel tunnel = {
  .type = RTE_FLOW_ITEM_TYPE_VXLAN,
  .tun_id = 10,
};

/**
 * 2. Obtain PMD tunnel actions
 *
 * pmd_actions is an intermediate variable application uses to
 * compile actions array
 */
struct rte_flow_action *pmd_actions;
rte_flow_tunnel_decap_set(port_id, &tunnel, &pmd_actions,
                          &num_pmd_actions, &error);

/**
 * 3. offload the first  rule
 * matching on VXLAN traffic and jumps to group 3
 * (implicitly decaps packet)
 */
app_actions  =   jump group 3
rule_items = app_items;  /** eth / ipv4 / udp / vxlan  */
rule_actions = { pmd_actions, app_actions };
attr.group = 0;
flow_1 = rte_flow_create(port_id, &attr,
                         rule_items, rule_actions, &error);
/**
  * 4. after flow creation application does not need to keep tunnel
  * action resources.
  */
rte_flow_tunnel_action_decap_release(port_id, pmd_actions,
                                     num_pmd_actions, &error);

/**
  * 5. After a partially offloaded packet misses because there was no
  * matching rule, handle the miss on group 3
  */
struct rte_flow_restore_info info;
rte_flow_get_restore_info(port_id, mbuf, &info, &error);

/**
 * 6. Offload NAT rule:
 */
app_items = { eth / ipv4 dst is 172.10.10.1 / udp dst 4789 /
            vxlan vni is 10 / ipv4 dst is 184.1.2.3 }
app_actions = { set_ipv4_dst 186.1.1.1 / queue index 3 }

rte_flow_tunnel_match(port_id, &info.tunnel, &pmd_items,
                      &num_pmd_items, &error);
rule_items = {pmd_items, app_items};
rule_actions = app_actions;
attr.group = info.group_id;
flow_2 = rte_flow_create(port_id, &attr,
                         rule_items, rule_actions, &error);

/**
 * 7. Release PMD items after rule creation
 */
rte_flow_tunnel_item_release(port_id, pmd_items, num_pmd_items, &error);

References
1. https://mails.dpdk.org/archives/dev/2020-June/index.html

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
---
v2:
* Update commit log
---
 doc/guides/prog_guide/rte_flow.rst       | 105 ++++++++++++
 lib/librte_ethdev/rte_ethdev_version.map |   5 +
 lib/librte_ethdev/rte_flow.c             | 112 +++++++++++++
 lib/librte_ethdev/rte_flow.h             | 195 +++++++++++++++++++++++
 lib/librte_ethdev/rte_flow_driver.h      |  32 ++++
 5 files changed, 449 insertions(+)

diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 3e5cd1e0d8..827ea0ca76 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -3018,6 +3018,111 @@ operations include:
 - Duplication of a complete flow rule description.
 - Pattern item or action name retrieval.
 
+Tunneled traffic offload
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Provide the software application with a unified rules model for tunneled traffic
+regardless of the underlying hardware.
+
+ - The model introduces a concept of a virtual tunnel port (VTP).
+ - The model uses VTP to offload ingress tunneled network traffic 
+   with RTE flow rules.
+ - The model is implemented as set of helper functions. Each PMD
+   implements VTP offload according to underlying hardware offload
+   capabilities.  Applications must query PMD for VTP flow
+   items / actions before using in creation of a VTP flow rule.
+
+The model components:
+
+- Virtual Tunnel Port (VTP) is a stateless software object that
+  describes tunneled network traffic.  VTP object usually contains
+  descriptions of outer headers, tunnel headers and inner headers.
+- Tunnel Steering flow Rule (TSR) detects tunneled packets and
+  delegates them to tunnel processing infrastructure, implemented
+  in PMD for optimal hardware utilization, for further processing.
+- Tunnel Matching flow Rule (TMR) verifies packet configuration and
+  runs offload actions in case of a match.
+
+Application actions:
+
+1 Initialize VTP object according to tunnel network parameters.
+
+2 Create TSR flow rule.
+
+2.1 Query PMD for VTP actions. Application can query for VTP actions more than once.
+
+  .. code-block:: c
+
+    int
+    rte_flow_tunnel_decap_set(uint16_t port_id,
+                              struct rte_flow_tunnel *tunnel,
+                              struct rte_flow_action **pmd_actions,
+                              uint32_t *num_of_pmd_actions,
+                              struct rte_flow_error *error);
+
+2.2 Integrate PMD actions into TSR actions list.
+
+2.3 Create TSR flow rule.
+
+    .. code-block:: console
+
+      flow create <port> group 0 match {tunnel items} / end actions {PMD actions} / {App actions} / end
+
+3 Create TMR flow rule.
+
+3.1 Query PMD for VTP items. Application can query for VTP items more than once.
+
+    .. code-block:: c
+
+      int
+      rte_flow_tunnel_match(uint16_t port_id,
+                            struct rte_flow_tunnel *tunnel,
+                            struct rte_flow_item **pmd_items,
+                            uint32_t *num_of_pmd_items,
+                            struct rte_flow_error *error);
+
+3.2 Integrate PMD items into TMR items list.
+
+3.3 Create TMR flow rule.
+
+    .. code-block:: console
+
+      flow create <port> group 0 match {PMD items} / {APP items} / end actions {offload actions} / end
+
+The model provides helper function call to restore packets that miss
+tunnel TMR rules to its original state:
+
+.. code-block:: c
+
+  int
+  rte_flow_get_restore_info(uint16_t port_id,
+                            struct rte_mbuf *mbuf,
+                            struct rte_flow_restore_info *info,
+                            struct rte_flow_error *error);
+
+The rte_tunnel object filled by the call inside the
+``rte_flow_restore_info *info`` parameter can be used by the application
+to create a new TMR rule for that tunnel.
+
+The model requirements:
+
+The software application must initialize the
+rte_tunnel object with tunnel parameters before calling
+rte_flow_tunnel_decap_set() & rte_flow_tunnel_match().
+
+The PMD actions array obtained in rte_flow_tunnel_decap_set() must be
+released by the application with the rte_flow_tunnel_action_decap_release()
+call. The application can release the actions after the TSR rule was created.
+
+The PMD items array obtained with rte_flow_tunnel_match() must be released
+by the application with the rte_flow_tunnel_item_release() call.  The
+application can release the items after the rule was created. However, if
+the application needs to create an additional TMR rule for the same tunnel
+it will need to obtain the PMD items again.
+
+The application cannot destroy the rte_tunnel object before it releases all
+PMD actions & PMD items referencing that tunnel.
+
 Caveats
 -------
 
diff --git a/lib/librte_ethdev/rte_ethdev_version.map b/lib/librte_ethdev/rte_ethdev_version.map
index 1212a17d32..8bb6b99d4a 100644
--- a/lib/librte_ethdev/rte_ethdev_version.map
+++ b/lib/librte_ethdev/rte_ethdev_version.map
@@ -241,6 +241,11 @@ EXPERIMENTAL {
 	__rte_ethdev_trace_rx_burst;
 	__rte_ethdev_trace_tx_burst;
 	rte_flow_get_aged_flows;
+	rte_flow_tunnel_decap_set;
+	rte_flow_tunnel_match;
+	rte_flow_tunnel_get_restore_info;
+	rte_flow_tunnel_action_decap_release;
+	rte_flow_tunnel_item_release;
 };
 
 INTERNAL {
diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
index 9905426bc9..23e364f337 100644
--- a/lib/librte_ethdev/rte_flow.c
+++ b/lib/librte_ethdev/rte_flow.c
@@ -1269,3 +1269,115 @@ rte_flow_get_aged_flows(uint16_t port_id, void **contexts,
 				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				  NULL, rte_strerror(ENOTSUP));
 }
+
+int
+rte_flow_tunnel_decap_set(uint16_t port_id,
+			  struct rte_flow_tunnel *tunnel,
+			  struct rte_flow_action **actions,
+			  uint32_t *num_of_actions,
+			  struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->tunnel_decap_set)) {
+		return flow_err(port_id,
+				ops->tunnel_decap_set(dev, tunnel, actions,
+						      num_of_actions, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_tunnel_match(uint16_t port_id,
+		      struct rte_flow_tunnel *tunnel,
+		      struct rte_flow_item **items,
+		      uint32_t *num_of_items,
+		      struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->tunnel_match)) {
+		return flow_err(port_id,
+				ops->tunnel_match(dev, tunnel, items,
+						  num_of_items, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_tunnel_get_restore_info(uint16_t port_id,
+				 struct rte_mbuf *m,
+				 struct rte_flow_restore_info *restore_info,
+				 struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->get_restore_info)) {
+		return flow_err(port_id,
+				ops->get_restore_info(dev, m, restore_info,
+						      error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_tunnel_action_decap_release(uint16_t port_id,
+				     struct rte_flow_action *actions,
+				     uint32_t num_of_actions,
+				     struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->action_release)) {
+		return flow_err(port_id,
+				ops->action_release(dev, actions,
+						    num_of_actions, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_tunnel_item_release(uint16_t port_id,
+			     struct rte_flow_item *items,
+			     uint32_t num_of_items,
+			     struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->item_release)) {
+		return flow_err(port_id,
+				ops->item_release(dev, items,
+						  num_of_items, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
index da8bfa5489..d485fb2f77 100644
--- a/lib/librte_ethdev/rte_flow.h
+++ b/lib/librte_ethdev/rte_flow.h
@@ -3357,6 +3357,201 @@ int
 rte_flow_get_aged_flows(uint16_t port_id, void **contexts,
 			uint32_t nb_contexts, struct rte_flow_error *error);
 
+/* Tunnel has a type and the key information. */
+struct rte_flow_tunnel {
+	/**
+	 * Tunnel type, for example RTE_FLOW_ITEM_TYPE_VXLAN,
+	 * RTE_FLOW_ITEM_TYPE_NVGRE etc.
+	 */
+	enum rte_flow_item_type	type;
+	uint64_t tun_id; /**< Tunnel identification. */
+
+	RTE_STD_C11
+	union {
+		struct {
+			rte_be32_t src_addr; /**< IPv4 source address. */
+			rte_be32_t dst_addr; /**< IPv4 destination address. */
+		} ipv4;
+		struct {
+			uint8_t src_addr[16]; /**< IPv6 source address. */
+			uint8_t dst_addr[16]; /**< IPv6 destination address. */
+		} ipv6;
+	};
+	rte_be16_t tp_src; /**< Tunnel port source. */
+	rte_be16_t tp_dst; /**< Tunnel port destination. */
+	uint16_t   tun_flags; /**< Tunnel flags. */
+
+	bool       is_ipv6; /**< True for valid IPv6 fields. Otherwise IPv4. */
+
+	/**
+	 * following members required to restore packet
+	 * after miss
+	 */
+	uint8_t    tos; /**< TOS for IPv4, TC for IPv6. */
+	uint8_t    ttl; /**< TTL for IPv4, HL for IPv6. */
+	uint32_t label; /**< Flow Label for IPv6. */
+};
+
+/**
+ * Indicate that the packet has a tunnel.
+ */
+#define RTE_FLOW_RESTORE_INFO_TUNNEL  (1ULL << 0)
+
+/**
+ * Indicate that the packet has a non decapsulated tunnel header.
+ */
+#define RTE_FLOW_RESTORE_INFO_ENCAPSULATED  (1ULL << 1)
+
+/**
+ * Indicate that the packet has a group_id.
+ */
+#define RTE_FLOW_RESTORE_INFO_GROUP_ID  (1ULL << 2)
+
+/**
+ * Restore information structure to communicate the current packet processing
+ * state when some of the processing pipeline is done in hardware and should
+ * continue in software.
+ */
+struct rte_flow_restore_info {
+	/**
+	 * Bitwise flags (RTE_FLOW_RESTORE_INFO_*) to indicate validation of
+	 * other fields in struct rte_flow_restore_info.
+	 */
+	uint64_t flags;
+	uint32_t group_id; /**< Group ID where packet missed */
+	struct rte_flow_tunnel tunnel; /**< Tunnel information. */
+};
+
+/**
+ * Allocate an array of actions to be used in rte_flow_create, to implement
+ * tunnel-decap-set for the given tunnel.
+ * Sample usage:
+ *   actions vxlan_decap / tunnel-decap-set(tunnel properties) /
+ *            jump group 0 / end
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] tunnel
+ *   Tunnel properties.
+ * @param[out] actions
+ *   Array of actions to be allocated by the PMD. This array should be
+ *   concatenated with the actions array provided to rte_flow_create.
+ * @param[out] num_of_actions
+ *   Number of actions allocated.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_decap_set(uint16_t port_id,
+			  struct rte_flow_tunnel *tunnel,
+			  struct rte_flow_action **actions,
+			  uint32_t *num_of_actions,
+			  struct rte_flow_error *error);
+
+/**
+ * Allocate an array of items to be used in rte_flow_create, to implement
+ * tunnel-match for the given tunnel.
+ * Sample usage:
+ *   pattern tunnel-match(tunnel properties) / outer-header-matches /
+ *           inner-header-matches / end
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] tunnel
+ *   Tunnel properties.
+ * @param[out] items
+ *   Array of items to be allocated by the PMD. This array should be
+ *   concatenated with the items array provided to rte_flow_create.
+ * @param[out] num_of_items
+ *   Number of items allocated.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_match(uint16_t port_id,
+		      struct rte_flow_tunnel *tunnel,
+		      struct rte_flow_item **items,
+		      uint32_t *num_of_items,
+		      struct rte_flow_error *error);
+
+/**
+ * Populate the current packet processing state, if exists, for the given mbuf.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] m
+ *   Mbuf struct.
+ * @param[out] info
+ *   Restore information. Upon success contains the HW state.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_get_restore_info(uint16_t port_id,
+				 struct rte_mbuf *m,
+				 struct rte_flow_restore_info *info,
+				 struct rte_flow_error *error);
+
+/**
+ * Release the action array as allocated by rte_flow_tunnel_decap_set.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] actions
+ *   Array of actions to be released.
+ * @param[in] num_of_actions
+ *   Number of elements in actions array.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_action_decap_release(uint16_t port_id,
+				     struct rte_flow_action *actions,
+				     uint32_t num_of_actions,
+				     struct rte_flow_error *error);
+
+/**
+ * Release the item array as allocated by rte_flow_tunnel_match.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] items
+ *   Array of items to be released.
+ * @param[in] num_of_items
+ *   Number of elements in item array.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_item_release(uint16_t port_id,
+			     struct rte_flow_item *items,
+			     uint32_t num_of_items,
+			     struct rte_flow_error *error);
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_ethdev/rte_flow_driver.h b/lib/librte_ethdev/rte_flow_driver.h
index 881cc469b7..ad1d7a2cdc 100644
--- a/lib/librte_ethdev/rte_flow_driver.h
+++ b/lib/librte_ethdev/rte_flow_driver.h
@@ -107,6 +107,38 @@ struct rte_flow_ops {
 		 void **context,
 		 uint32_t nb_contexts,
 		 struct rte_flow_error *err);
+	/** See rte_flow_tunnel_decap_set() */
+	int (*tunnel_decap_set)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_tunnel *tunnel,
+		 struct rte_flow_action **pmd_actions,
+		 uint32_t *num_of_actions,
+		 struct rte_flow_error *err);
+	/** See rte_flow_tunnel_match() */
+	int (*tunnel_match)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_tunnel *tunnel,
+		 struct rte_flow_item **pmd_items,
+		 uint32_t *num_of_items,
+		 struct rte_flow_error *err);
+	/** See rte_flow_get_rte_flow_restore_info() */
+	int (*get_restore_info)
+		(struct rte_eth_dev *dev,
+		 struct rte_mbuf *m,
+		 struct rte_flow_restore_info *info,
+		 struct rte_flow_error *err);
+	/** See rte_flow_action_tunnel_decap_release() */
+	int (*action_release)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_action *pmd_actions,
+		 uint32_t num_of_actions,
+		 struct rte_flow_error *err);
+	/** See rte_flow_item_release() */
+	int (*item_release)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_item *pmd_items,
+		 uint32_t num_of_items,
+		 struct rte_flow_error *err);
 };
 
 /**
-- 
2.25.1


^ permalink raw reply	[flat|nested] 95+ messages in thread

* [dpdk-dev] [PATCH v2 3/4] net/mlx5: implement tunnel offload API
  2020-09-08 20:15 ` [dpdk-dev] [PATCH v2 0/4] Tunnel Offload API Gregory Etelson
  2020-09-08 20:15   ` [dpdk-dev] [PATCH v2 1/4] ethdev: allow negative values in flow rule types Gregory Etelson
  2020-09-08 20:15   ` [dpdk-dev] [PATCH v2 2/4] ethdev: tunnel offload model Gregory Etelson
@ 2020-09-08 20:15   ` Gregory Etelson
  2020-09-08 20:15   ` [dpdk-dev] [PATCH v2 4/4] app/testpmd: support " Gregory Etelson
  3 siblings, 0 replies; 95+ messages in thread
From: Gregory Etelson @ 2020-09-08 20:15 UTC (permalink / raw)
  To: dev
  Cc: matan, rasland, orika, Gregory Etelson, Matan Azrad,
	Shahaf Shuler, Viacheslav Ovsiienko

From: Gregory Etelson <getelson@mellanox.com>

Tunnel Offload API provides a hardware-independent, unified model
to offload tunneled traffic. Key model elements are:
 - apply matches to both outer and inner packet headers
   during the entire offload procedure;
 - restore the outer header of a partially offloaded packet;
 - the model is implemented as a set of helper functions.

Implementation details:
* the tunnel_offload PMD parameter must be set to 1 to enable the feature
  (see the example after this list).
* the application cannot use MARK and META flow actions together with
  tunnel offload.
* offloading the JUMP action is restricted to the tunnel steering rule only.
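
For illustration, with testpmd the feature would typically be enabled
through the usual mlx5 devargs syntax (the PCI address below is only an
example):

  testpmd -w 0000:03:00.0,tunnel_offload=1 -- -i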

Signed-off-by: Gregory Etelson <getelson@mellanox.com>
---
v2:
* introduce MLX5 PMD API implementation
---
 drivers/net/mlx5/linux/mlx5_os.c |  14 +
 drivers/net/mlx5/mlx5.c          |   6 +
 drivers/net/mlx5/mlx5.h          |   4 +
 drivers/net/mlx5/mlx5_flow.c     | 453 +++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_flow.h     |  49 ++++
 drivers/net/mlx5/mlx5_flow_dv.c  |  71 ++++-
 6 files changed, 595 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 69123e12c3..7193398750 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -281,6 +281,12 @@ mlx5_alloc_shared_dr(struct mlx5_priv *priv)
 		sh->esw_drop_action = mlx5_glue->dr_create_flow_action_drop();
 	}
 #endif
+	if (!sh->tunnel_hub)
+		err = mlx5_alloc_tunnel_hub(sh);
+	if (err) {
+		DRV_LOG(ERR, "mlx5_alloc_tunnel_hub failed err=%d", err);
+		goto error;
+	}
 	if (priv->config.reclaim_mode == MLX5_RCM_AGGR) {
 		mlx5_glue->dr_reclaim_domain_memory(sh->rx_domain, 1);
 		mlx5_glue->dr_reclaim_domain_memory(sh->tx_domain, 1);
@@ -319,6 +325,10 @@ mlx5_alloc_shared_dr(struct mlx5_priv *priv)
 		mlx5_hlist_destroy(sh->tag_table, NULL, NULL);
 		sh->tag_table = NULL;
 	}
+	if (sh->tunnel_hub) {
+		mlx5_release_tunnel_hub(sh);
+		sh->tunnel_hub = NULL;
+	}
 	mlx5_free_table_hash_list(priv);
 	return err;
 }
@@ -372,6 +382,10 @@ mlx5_os_free_shared_dr(struct mlx5_priv *priv)
 		mlx5_hlist_destroy(sh->tag_table, NULL, NULL);
 		sh->tag_table = NULL;
 	}
+	if (sh->tunnel_hub) {
+		mlx5_release_tunnel_hub(sh);
+		sh->tunnel_hub = NULL;
+	}
 	mlx5_free_table_hash_list(priv);
 }
 
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 1e4c695f84..569070d3db 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -177,6 +177,9 @@
 /* Decap will be used or not. */
 #define MLX5_DECAP_EN "decap_en"
 
+/* Configure flow tunnel offload functionality */
+#define MLX5_TUNNEL_OFFLOAD "tunnel_offload"
+
 /* Shared memory between primary and secondary processes. */
 struct mlx5_shared_data *mlx5_shared_data;
 
@@ -1621,6 +1624,8 @@ mlx5_args_check(const char *key, const char *val, void *opaque)
 		config->sys_mem_en = !!tmp;
 	} else if (strcmp(MLX5_DECAP_EN, key) == 0) {
 		config->decap_en = !!tmp;
+	} else if (strcmp(MLX5_TUNNEL_OFFLOAD, key) == 0) {
+		config->tunnel_offload = !!tmp;
 	} else {
 		DRV_LOG(WARNING, "%s: unknown parameter", key);
 		rte_errno = EINVAL;
@@ -1681,6 +1686,7 @@ mlx5_args(struct mlx5_dev_config *config, struct rte_devargs *devargs)
 		MLX5_RECLAIM_MEM,
 		MLX5_SYS_MEM_EN,
 		MLX5_DECAP_EN,
+		MLX5_TUNNEL_OFFLOAD,
 		NULL,
 	};
 	struct rte_kvargs *kvlist;
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 78d6eb7281..e450aec029 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -208,6 +208,7 @@ struct mlx5_dev_config {
 	unsigned int rt_timestamp:1; /* realtime timestamp format. */
 	unsigned int sys_mem_en:1; /* The default memory allocator. */
 	unsigned int decap_en:1; /* Whether decap will be used or not. */
+	unsigned int tunnel_offload:1; /* Flow tunnel offload functionality */
 	struct {
 		unsigned int enabled:1; /* Whether MPRQ is enabled. */
 		unsigned int stride_num_n; /* Number of strides. */
@@ -605,6 +606,7 @@ struct mlx5_dev_ctx_shared {
 	LIST_ENTRY(mlx5_dev_ctx_shared) next;
 	uint32_t refcnt;
 	uint32_t devx:1; /* Opened with DV. */
+	uint32_t tunnel:1; /* 1 RTE flow tunnel enabled */
 	uint32_t max_port; /* Maximal IB device port index. */
 	void *ctx; /* Verbs/DV/DevX context. */
 	void *pd; /* Protection Domain. */
@@ -634,6 +636,8 @@ struct mlx5_dev_ctx_shared {
 	/* UAR same-page access control required in 32bit implementations. */
 #endif
 	struct mlx5_hlist *flow_tbls;
+	struct rte_hash *flow_tbl_map; /* app group-to-flow table map */
+	struct mlx5_flow_tunnel_hub *tunnel_hub;
 	/* Direct Rules tables for FDB, NIC TX+RX */
 	void *esw_drop_action; /* Pointer to DR E-Switch drop action. */
 	void *pop_vlan_action; /* Pointer to DR pop VLAN action. */
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 71501730b5..7d4bddee39 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -18,6 +18,7 @@
 #include <rte_flow_driver.h>
 #include <rte_malloc.h>
 #include <rte_ip.h>
+#include <rte_hash.h>
 
 #include <mlx5_glue.h>
 #include <mlx5_devx_cmds.h>
@@ -30,6 +31,13 @@
 #include "mlx5_flow_os.h"
 #include "mlx5_rxtx.h"
 
+static struct mlx5_flow_tunnel *
+mlx5_find_tunnel_id(struct rte_eth_dev *dev, uint32_t id);
+static void
+mlx5_flow_tunnel_free(struct rte_eth_dev *dev, struct mlx5_flow_tunnel *tunnel);
+static uint32_t
+mlx5_mark_to_tunnel_id(uint32_t mark);
+
 /** Device flow drivers. */
 extern const struct mlx5_flow_driver_ops mlx5_flow_verbs_drv_ops;
 
@@ -220,6 +228,169 @@ static const struct rte_flow_expand_node mlx5_support_expansion[] = {
 	},
 };
 
+static inline bool
+mlx5_flow_tunnel_validate(__rte_unused struct rte_eth_dev *dev,
+			  struct rte_flow_tunnel *tunnel)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+
+	if (!priv->config.tunnel_offload || !tunnel)
+		goto err;
+
+	switch (tunnel->type) {
+	default:
+		goto err;
+	case RTE_FLOW_ITEM_TYPE_VXLAN:
+		break;
+	}
+
+	return true;
+
+err:
+	return false;
+}
+
+static int
+mlx5_flow_tunnel_set(struct rte_eth_dev *dev,
+		    struct rte_flow_tunnel *app_tunnel,
+		    struct rte_flow_action **actions,
+		    uint32_t *num_of_actions,
+		    struct rte_flow_error *error)
+{
+	int ret;
+	struct mlx5_flow_tunnel *tunnel;
+	if (!mlx5_flow_tunnel_validate(dev, app_tunnel))
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION_CONF, NULL,
+					  "invalid argument");
+
+	ret = mlx5_get_flow_tunnel(dev, app_tunnel, &tunnel);
+	if (ret < 0) {
+		return rte_flow_error_set(error, ret,
+					  RTE_FLOW_ERROR_TYPE_ACTION_CONF, NULL,
+					  "failed to match pmd tunnel");
+	}
+	rte_atomic32_inc(&tunnel->refctn);
+	*actions = &tunnel->action;
+	*num_of_actions = 1;
+	return 0;
+}
+
+static int
+mlx5_flow_tunnel_match(struct rte_eth_dev *dev,
+		       struct rte_flow_tunnel *app_tunnel,
+		       struct rte_flow_item **items,
+		       uint32_t *num_of_items,
+		       struct rte_flow_error *error)
+{
+	int ret;
+	struct mlx5_flow_tunnel *tunnel;
+	if (!mlx5_flow_tunnel_validate(dev, app_tunnel))
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_HANDLE, NULL,
+					  "invalid argument");
+
+	ret = mlx5_get_flow_tunnel(dev, app_tunnel, &tunnel);
+	if (ret < 0) {
+		return rte_flow_error_set(error, ret,
+					  RTE_FLOW_ERROR_TYPE_HANDLE, NULL,
+					  "failed to match pmd tunnel");
+	}
+
+	rte_atomic32_inc(&tunnel->refctn);
+	*items = &tunnel->item;
+	*num_of_items = 1;
+	return 0;
+}
+
+static int
+mlx5_flow_item_release(struct rte_eth_dev *dev,
+		       struct rte_flow_item *pmd_items,
+		       uint32_t num_items, struct rte_flow_error *err)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_dev_ctx_shared *sh = priv->sh;
+	struct mlx5_flow_tunnel_hub *thub = sh->tunnel_hub;
+	struct mlx5_flow_tunnel *tun;
+
+	LIST_FOREACH(tun, &thub->tunnels, chain) {
+		if (&tun->item == pmd_items)
+			break;
+	}
+	if (!tun || num_items != 1)
+		return rte_flow_error_set(err, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_HANDLE, NULL,
+					  "invalid argument");
+	if (rte_atomic32_dec_and_test(&tun->refctn))
+		mlx5_flow_tunnel_free(dev, tun);
+	return 0;
+}
+
+static int
+mlx5_flow_action_release(struct rte_eth_dev *dev,
+			 struct rte_flow_action *pmd_actions,
+			 uint32_t num_actions, struct rte_flow_error *err)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_dev_ctx_shared *sh = priv->sh;
+	struct mlx5_flow_tunnel_hub *thub = sh->tunnel_hub;
+	struct mlx5_flow_tunnel *tun;
+
+	LIST_FOREACH(tun, &thub->tunnels, chain) {
+		if (&tun->action == pmd_actions)
+			break;
+	}
+	if (!tun || num_actions != 1)
+		return rte_flow_error_set(err, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_HANDLE, NULL,
+					  "invalid argument");
+	if (rte_atomic32_dec_and_test(&tun->refctn))
+		mlx5_flow_tunnel_free(dev, tun);
+
+	return 0;
+}
+
+static void
+mlx5_restore_packet_outer(struct rte_eth_dev *dev __rte_unused,
+			  struct mlx5_flow_tunnel *tunnel __rte_unused,
+			  struct rte_mbuf *m __rte_unused)
+{
+}
+
+static int
+mlx5_flow_tunnel_get_restore_info(struct rte_eth_dev *dev,
+				  struct rte_mbuf *m,
+				  struct rte_flow_restore_info *info,
+				  struct rte_flow_error *err)
+{
+	uint32_t id;
+	uint64_t ol_flags = m->ol_flags;
+	struct mlx5_flow_tunnel *tunnel;
+	const uint64_t mask = PKT_RX_FDIR | PKT_RX_FDIR_ID;
+
+	if ((ol_flags & mask) != mask)
+		goto err;
+	id = mlx5_mark_to_tunnel_id(m->hash.fdir.hi);
+	if (!id)
+		goto err;
+	tunnel = mlx5_find_tunnel_id(dev, id);
+	if (!tunnel)
+		goto err;
+	mlx5_restore_packet_outer(dev, tunnel, m);
+	memcpy(&info->tunnel, &tunnel->app_tunnel, sizeof(info->tunnel));
+	m->ol_flags &= ~PKT_RX_FDIR;
+	info->group_id = -1u;
+	info->flags = RTE_FLOW_RESTORE_INFO_TUNNEL |
+		      RTE_FLOW_RESTORE_INFO_ENCAPSULATED;
+
+	return 0;
+
+err:
+	return rte_flow_error_set(err, EINVAL,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				  "failed to get restore info");
+}
+
 static const struct rte_flow_ops mlx5_flow_ops = {
 	.validate = mlx5_flow_validate,
 	.create = mlx5_flow_create,
@@ -229,6 +400,11 @@ static const struct rte_flow_ops mlx5_flow_ops = {
 	.query = mlx5_flow_query,
 	.dev_dump = mlx5_flow_dev_dump,
 	.get_aged_flows = mlx5_flow_get_aged_flows,
+	.tunnel_decap_set = mlx5_flow_tunnel_set,
+	.tunnel_match = mlx5_flow_tunnel_match,
+	.action_release = mlx5_flow_action_release,
+	.item_release = mlx5_flow_item_release,
+	.get_restore_info = mlx5_flow_tunnel_get_restore_info,
 };
 
 /* Convert FDIR request to Generic flow. */
@@ -3524,6 +3700,104 @@ flow_hairpin_split(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static uint32_t
+mlx5_tunnel_id_to_mark(uint32_t id)
+{
+	return (!id || id >= MLX5_MAX_TUNNELS) ?
+		0 : (id << 16);
+}
+
+static uint32_t
+mlx5_mark_to_tunnel_id(uint32_t mark)
+{
+	return mark & MLX5_TUNNEL_MARK_MASK ?
+		mark >> 16 : 0;
+}
+
+static int
+flow_tunnel_add_default_miss(struct rte_eth_dev *dev,
+			     struct rte_flow *flow,
+			     const struct rte_flow_attr *attr,
+			     const struct rte_flow_action *app_actions,
+			     uint32_t flow_idx,
+			     struct rte_flow_error *error)
+{
+	struct mlx5_flow *dev_flow;
+	struct rte_flow_attr miss_attr = *attr;
+	const struct mlx5_flow_tunnel *tunnel = app_actions[0].conf;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	uint16_t queue[priv->reta_idx_n];
+	struct rte_flow_action_rss action_rss = {
+		.func = RTE_ETH_HASH_FUNCTION_DEFAULT,
+		.level = 0,
+		.types = priv->rss_conf.rss_hf,
+		.key_len = priv->rss_conf.rss_key_len,
+		.queue_num = priv->reta_idx_n,
+		.key = priv->rss_conf.rss_key,
+		.queue = queue,
+	};
+	const struct rte_flow_action_mark miss_mark = {
+		.id = mlx5_tunnel_id_to_mark(tunnel->tunnel_id)
+	};
+	const struct rte_flow_item *items, miss_items[2] = {
+		{
+			.type = RTE_FLOW_ITEM_TYPE_ETH,
+			.spec = NULL,
+			.last = NULL,
+			.mask = NULL
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+			.spec = NULL,
+			.last = NULL,
+			.mask = NULL
+		}
+	};
+	const struct rte_flow_action *actions, miss_actions[3] = {
+		{ .type = RTE_FLOW_ACTION_TYPE_MARK, .conf = &miss_mark },
+		{ .type = RTE_FLOW_ACTION_TYPE_RSS, .conf = &action_rss },
+		{ .type = RTE_FLOW_ACTION_TYPE_END, .conf = NULL }
+	};
+	const struct rte_flow_action_jump *jump_data;
+	uint32_t i;
+
+	if (!priv->reta_idx_n || !priv->rxqs_n)
+		return rte_flow_error_set(error, EINVAL,
+				   RTE_FLOW_ERROR_TYPE_ACTION_CONF,
+				   NULL, "invalid port configuration");
+	if (!(dev->data->dev_conf.rxmode.mq_mode & ETH_MQ_RX_RSS_FLAG))
+		action_rss.types = 0;
+	for (i = 0; i != priv->reta_idx_n; ++i)
+		queue[i] = (*priv->reta_idx)[i];
+
+	if (!miss_mark.id)
+		rte_flow_error_set(error, EINVAL,
+				   RTE_FLOW_ERROR_TYPE_ACTION_CONF,
+				   NULL, "invalid tunnel id");
+	items = (typeof(items))miss_items;
+	actions = (typeof(actions))miss_actions;
+	for (; app_actions->type != RTE_FLOW_ACTION_TYPE_JUMP; app_actions++);
+	jump_data = app_actions->conf;
+	miss_attr.priority = 3;
+	miss_attr.group = TUNNEL_STEER_GROUP(jump_data->group);
+	dev_flow = flow_drv_prepare(dev, flow, &miss_attr,
+				    items, actions, flow_idx, error);
+	if (!dev_flow)
+		return -rte_errno;
+	dev_flow->flow = flow;
+	dev_flow->external = true;
+	/* Subflow object was created, we must include one in the list. */
+	SILIST_INSERT(&flow->dev_handles, dev_flow->handle_idx,
+		      dev_flow->handle, next);
+
+	DRV_LOG(DEBUG,
+		"port %u tunnel type=%d id=%u miss rule priority=%u group=%u",
+		dev->data->port_id, tunnel->app_tunnel.type,
+		tunnel->tunnel_id, miss_attr.priority, miss_attr.group);
+	return flow_drv_translate(dev, dev_flow, &miss_attr, items,
+				  actions, error);
+}
+
 /**
  * The last stage of splitting chain, just creates the subflow
  * without any modification.
@@ -4296,6 +4570,27 @@ flow_create_split_outer(struct rte_eth_dev *dev,
 	return ret;
 }
 
+static struct mlx5_flow_tunnel *
+flow_tunnel_from_rule(struct rte_eth_dev *dev,
+		      const struct rte_flow_attr *attr,
+		      const struct rte_flow_item items[],
+		      const struct rte_flow_action actions[])
+{
+	struct mlx5_flow_tunnel *tunnel;
+
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wcast-qual"
+	if (is_flow_tunnel_match_rule(dev, attr, items, actions))
+		tunnel = (struct mlx5_flow_tunnel *)items[0].spec;
+	else if (is_flow_tunnel_steer_rule(dev, attr, items, actions))
+		tunnel = (struct mlx5_flow_tunnel *)actions[0].conf;
+	else
+		tunnel = NULL;
+#pragma GCC diagnostic pop
+
+	return tunnel;
+}
+
 /**
  * Create a flow and add it to @p list.
  *
@@ -4356,6 +4651,7 @@ flow_list_create(struct rte_eth_dev *dev, uint32_t *list,
 	int hairpin_flow;
 	uint32_t hairpin_id = 0;
 	struct rte_flow_attr attr_tx = { .priority = 0 };
+	struct mlx5_flow_tunnel *tunnel;
 	int ret;
 
 	hairpin_flow = flow_check_hairpin_split(dev, attr, actions);
@@ -4430,6 +4726,15 @@ flow_list_create(struct rte_eth_dev *dev, uint32_t *list,
 					      error);
 		if (ret < 0)
 			goto error;
+		if (is_flow_tunnel_steer_rule(dev, attr,
+					      buf->entry[i].pattern,
+					      p_actions_rx)) {
+			ret = flow_tunnel_add_default_miss(dev, flow, attr,
+							   p_actions_rx,
+							   idx, error);
+			if (ret < 0)
+				goto error;
+		}
 	}
 	/* Create the tx flow. */
 	if (hairpin_flow) {
@@ -4484,6 +4789,12 @@ flow_list_create(struct rte_eth_dev *dev, uint32_t *list,
 	priv->flow_idx = priv->flow_nested_idx;
 	if (priv->flow_nested_idx)
 		priv->flow_nested_idx = 0;
+	tunnel = flow_tunnel_from_rule(dev, attr, items, actions);
+	if (tunnel) {
+		flow->tunnel = 1;
+		flow->tunnel_id = tunnel->tunnel_id;
+		rte_atomic32_inc(&tunnel->refctn);
+	}
 	return idx;
 error:
 	MLX5_ASSERT(flow);
@@ -4657,6 +4968,13 @@ flow_list_destroy(struct rte_eth_dev *dev, uint32_t *list,
 		}
 	}
 	mlx5_ipool_free(priv->sh->ipool[MLX5_IPOOL_RTE_FLOW], flow_idx);
+	if (flow->tunnel) {
+		struct mlx5_flow_tunnel *tunnel;
+		tunnel = mlx5_find_tunnel_id(dev, flow->tunnel_id);
+		RTE_VERIFY(tunnel);
+		if (rte_atomic32_dec_and_test(&tunnel->refctn))
+			mlx5_flow_tunnel_free(dev, tunnel);
+	}
 }
 
 /**
@@ -6301,3 +6619,138 @@ mlx5_flow_get_aged_flows(struct rte_eth_dev *dev, void **contexts,
 		 dev->data->port_id);
 	return -ENOTSUP;
 }
+
+static void
+mlx5_flow_tunnel_free(struct rte_eth_dev *dev,
+		      struct mlx5_flow_tunnel *tunnel)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_dev_ctx_shared *sh = priv->sh;
+	struct mlx5_flow_tunnel_hub *thub = sh->tunnel_hub;
+	struct mlx5_flow_id_pool *id_pool = thub->tunnel_ids;
+
+	DRV_LOG(DEBUG, "port %u release pmd tunnel id=0x%x",
+		dev->data->port_id, tunnel->tunnel_id);
+	RTE_VERIFY(!rte_atomic32_read(&tunnel->refctn));
+	LIST_REMOVE(tunnel, chain);
+	mlx5_flow_id_release(id_pool, tunnel->tunnel_id);
+	free(tunnel);
+}
+
+static struct mlx5_flow_tunnel *
+mlx5_find_tunnel_id(struct rte_eth_dev *dev, uint32_t id)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_dev_ctx_shared *sh = priv->sh;
+	struct mlx5_flow_tunnel_hub *thub = sh->tunnel_hub;
+	struct mlx5_flow_tunnel *tun;
+
+	LIST_FOREACH(tun, &thub->tunnels, chain) {
+		if (tun->tunnel_id == id)
+			break;
+	}
+
+	return tun;
+}
+
+static struct mlx5_flow_tunnel *
+mlx5_flow_tunnel_allocate(struct rte_eth_dev *dev,
+			  const struct rte_flow_tunnel *app_tunnel)
+{
+	int ret;
+	struct mlx5_flow_tunnel *tunnel;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_dev_ctx_shared *sh = priv->sh;
+	struct mlx5_flow_tunnel_hub *thub = sh->tunnel_hub;
+	struct mlx5_flow_id_pool *id_pool = thub->tunnel_ids;
+	uint32_t id;
+
+	ret = mlx5_flow_id_get(id_pool, &id);
+	if (ret)
+		return NULL;
+	/**
+	 * mlx5 flow tunnel is an auxiliary data structure
+	 * It's not part of IO. No need to allocate it from
+	 * huge pages pools dedicated for IO
+	 */
+	tunnel = calloc(1, sizeof(*tunnel));
+	if (!tunnel) {
+		mlx5_flow_id_release(id_pool, id);
+		return NULL;
+	}
+	/* initiate new PMD tunnel */
+	memcpy(&tunnel->app_tunnel, app_tunnel, sizeof(*app_tunnel));
+	rte_atomic32_init(&tunnel->refctn);
+	tunnel->tunnel_id = id;
+	tunnel->action.type = MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET;
+	tunnel->action.conf = tunnel;
+	tunnel->item.type = MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL;
+	tunnel->item.spec = tunnel;
+	tunnel->item.last = NULL;
+	tunnel->item.mask = NULL;
+	DRV_LOG(DEBUG, "port %u new pmd tunnel id=0x%x",
+		dev->data->port_id, tunnel->tunnel_id);
+
+	return tunnel;
+}
+
+int
+mlx5_get_flow_tunnel(struct rte_eth_dev *dev,
+		     const struct rte_flow_tunnel *app_tunnel,
+		     struct mlx5_flow_tunnel **tunnel)
+{
+	int ret;
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_dev_ctx_shared *sh = priv->sh;
+	struct mlx5_flow_tunnel_hub *thub = sh->tunnel_hub;
+	struct mlx5_flow_tunnel *tun;
+
+	LIST_FOREACH(tun, &thub->tunnels, chain) {
+		if (!memcmp(app_tunnel, &tun->app_tunnel,
+			    sizeof(*app_tunnel))) {
+			*tunnel = tun;
+			ret = 0;
+			break;
+		}
+	}
+	if (!tun) {
+		tun = mlx5_flow_tunnel_allocate(dev, app_tunnel);
+		if (tun) {
+			LIST_INSERT_HEAD(&thub->tunnels, tun, chain);
+			*tunnel = tun;
+		} else {
+			ret = -ENOMEM;
+		}
+	}
+
+	return ret;
+}
+
+void mlx5_release_tunnel_hub(struct mlx5_dev_ctx_shared *sh)
+{
+	if (!sh->tunnel_hub)
+		return;
+	RTE_VERIFY(LIST_EMPTY(&sh->tunnel_hub->tunnels));
+	mlx5_flow_id_pool_release(sh->tunnel_hub->tunnel_ids);
+	free(sh->tunnel_hub);
+}
+
+int mlx5_alloc_tunnel_hub(struct mlx5_dev_ctx_shared *sh)
+{
+	int err;
+
+	sh->tunnel_hub = calloc(1, sizeof(*sh->tunnel_hub));
+	if (!sh->tunnel_hub)
+		return -ENOMEM;
+	LIST_INIT(&sh->tunnel_hub->tunnels);
+	sh->tunnel_hub->tunnel_ids = mlx5_flow_id_pool_alloc(MLX5_MAX_TUNNELS);
+	if (!sh->tunnel_hub->tunnel_ids) {
+		free(sh->tunnel_hub);
+		err = -rte_errno;
+		goto out;
+	}
+	err = 0;
+
+out:
+	return err;
+}
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 66caefce46..5d2be7d123 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -26,6 +26,7 @@ enum mlx5_rte_flow_item_type {
 	MLX5_RTE_FLOW_ITEM_TYPE_TAG,
 	MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
 	MLX5_RTE_FLOW_ITEM_TYPE_VLAN,
+	MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL,
 };
 
 /* Private (internal) rte flow actions. */
@@ -35,6 +36,7 @@ enum mlx5_rte_flow_action_type {
 	MLX5_RTE_FLOW_ACTION_TYPE_MARK,
 	MLX5_RTE_FLOW_ACTION_TYPE_COPY_MREG,
 	MLX5_RTE_FLOW_ACTION_TYPE_DEFAULT_MISS,
+	MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET,
 };
 
 /* Matches on selected register. */
@@ -196,6 +198,7 @@ enum mlx5_feature_name {
 #define MLX5_FLOW_ACTION_SET_IPV6_DSCP (1ull << 33)
 #define MLX5_FLOW_ACTION_AGE (1ull << 34)
 #define MLX5_FLOW_ACTION_DEFAULT_MISS (1ull << 35)
+#define MLX5_FLOW_ACTION_TUNNEL_TYPE1 (1ull << 36)
 
 #define MLX5_FLOW_FATE_ACTIONS \
 	(MLX5_FLOW_ACTION_DROP | MLX5_FLOW_ACTION_QUEUE | \
@@ -816,6 +819,45 @@ struct mlx5_fdir_flow {
 
 #define HAIRPIN_FLOW_ID_BITS 28
 
+#define MLX5_MAX_TUNNELS 63
+#define MLX5_TUNNEL_MARK_MASK 0x3F0000u
+#define TUNNEL_STEER_GROUP(grp) ((grp) | (1u << 31))
+
+struct mlx5_flow_tunnel {
+	LIST_ENTRY(mlx5_flow_tunnel) chain;
+	struct rte_flow_tunnel app_tunnel;	/** app tunnel copy */
+	uint32_t tunnel_id;		/** unique tunnel ID */
+	rte_atomic32_t refctn;
+	struct rte_flow_action action;
+	struct rte_flow_item item;
+};
+
+/** PMD tunnel related context */
+struct mlx5_flow_tunnel_hub {
+	LIST_HEAD(, mlx5_flow_tunnel) tunnels;
+	struct mlx5_flow_id_pool *tunnel_ids;
+};
+
+static inline bool
+is_flow_tunnel_match_rule(__rte_unused struct rte_eth_dev *dev,
+			  __rte_unused const struct rte_flow_attr *attr,
+			  __rte_unused const struct rte_flow_item items[],
+			  __rte_unused const struct rte_flow_action actions[])
+{
+	return (items[0].type == (typeof(items[0].type))
+			MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL);
+}
+
+static inline bool
+is_flow_tunnel_steer_rule(__rte_unused struct rte_eth_dev *dev,
+			  __rte_unused const struct rte_flow_attr *attr,
+			  __rte_unused const struct rte_flow_item items[],
+			  __rte_unused const struct rte_flow_action actions[])
+{
+	return (actions[0].type == (typeof(actions[0].type))
+			MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET);
+}
+
 /* Flow structure. */
 struct rte_flow {
 	ILIST_ENTRY(uint32_t)next; /**< Index to the next flow structure. */
@@ -823,12 +865,14 @@ struct rte_flow {
 	/**< Device flow handles that are part of the flow. */
 	uint32_t drv_type:2; /**< Driver type. */
 	uint32_t fdir:1; /**< Identifier of associated FDIR if any. */
+	uint32_t tunnel:1;
 	uint32_t hairpin_flow_id:HAIRPIN_FLOW_ID_BITS;
 	/**< The flow id used for hairpin. */
 	uint32_t copy_applied:1; /**< The MARK copy Flow os applied. */
 	uint32_t rix_mreg_copy;
 	/**< Index to metadata register copy table resource. */
 	uint32_t counter; /**< Holds flow counter. */
+	uint32_t tunnel_id;  /**< Tunnel id */
 	uint16_t meter; /**< Holds flow meter id. */
 } __rte_packed;
 
@@ -1045,4 +1089,9 @@ int mlx5_flow_destroy_policer_rules(struct rte_eth_dev *dev,
 				    const struct rte_flow_attr *attr);
 int mlx5_flow_meter_flush(struct rte_eth_dev *dev,
 			  struct rte_mtr_error *error);
+int mlx5_get_flow_tunnel(struct rte_eth_dev *dev,
+			 const struct rte_flow_tunnel *app_tunnel,
+			 struct mlx5_flow_tunnel **tunnel);
+void mlx5_release_tunnel_hub(struct mlx5_dev_ctx_shared *sh);
+int mlx5_alloc_tunnel_hub(struct mlx5_dev_ctx_shared *sh);
 #endif /* RTE_PMD_MLX5_FLOW_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 53399800ff..21af7fb93a 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -3676,6 +3676,8 @@ flow_dv_validate_action_jump(const struct rte_flow_action *action,
 					  NULL, "action configuration not set");
 	target_group =
 		((const struct rte_flow_action_jump *)action->conf)->group;
+	if (action_flags & MLX5_FLOW_ACTION_TUNNEL_TYPE1)
+		target_group = TUNNEL_STEER_GROUP(target_group);
 	ret = mlx5_flow_group_to_table(attributes, external, target_group,
 				       true, &table, error);
 	if (ret)
@@ -5035,6 +5037,15 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 						  RTE_FLOW_ERROR_TYPE_ITEM,
 						  NULL, "item not supported");
 		switch (type) {
+		case MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL:
+			if (items[0].type != (typeof(items[0].type))
+						MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL)
+				return rte_flow_error_set
+						(error, EINVAL,
+						RTE_FLOW_ERROR_TYPE_ITEM,
+						NULL, "MLX5 private items "
+						"must be the first");
+			break;
 		case RTE_FLOW_ITEM_TYPE_VOID:
 			break;
 		case RTE_FLOW_ITEM_TYPE_PORT_ID:
@@ -5699,6 +5710,17 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			action_flags |= MLX5_FLOW_ACTION_SET_IPV6_DSCP;
 			rw_act_num += MLX5_ACT_NUM_SET_DSCP;
 			break;
+		case MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET:
+			if (actions[0].type != (typeof(actions[0].type))
+				MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET)
+				return rte_flow_error_set
+						(error, EINVAL,
+						RTE_FLOW_ERROR_TYPE_ACTION,
+						NULL, "MLX5 private action "
+						"must be the first");
+
+			action_flags |= MLX5_FLOW_ACTION_TUNNEL_TYPE1;
+			break;
 		default:
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ACTION,
@@ -5706,6 +5728,31 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 						  "action not supported");
 		}
 	}
+	/*
+	 * Validate flow tunnel decap_set rule
+	 */
+	if (action_flags & MLX5_FLOW_ACTION_TUNNEL_TYPE1) {
+		uint64_t bad_actions_mask = MLX5_FLOW_ACTION_DECAP |
+						MLX5_FLOW_ACTION_MARK;
+
+		if (action_flags & bad_actions_mask)
+			return rte_flow_error_set
+					(error, EINVAL,
+					RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					"Invalid RTE action in tunnel "
+					"set decap rule");
+		if (!(action_flags & MLX5_FLOW_ACTION_JUMP))
+			return rte_flow_error_set
+					(error, EINVAL,
+					RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					"tunnel set decap rule must terminate "
+					"with JUMP");
+		if (!attr->ingress)
+			return rte_flow_error_set
+					(error, EINVAL,
+					RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					"tunnel flows for ingress traffic only");
+	}
 	/*
 	 * Validate the drop action mutual exclusion with other actions.
 	 * Drop action is mutually-exclusive with any other action, except for
@@ -8110,11 +8157,14 @@ __flow_dv_translate(struct rte_eth_dev *dev,
 	uint8_t next_protocol = 0xff;
 	struct rte_vlan_hdr vlan = { 0 };
 	uint32_t table;
+	uint32_t attr_group;
 	int ret = 0;
 
+	attr_group = !is_flow_tunnel_match_rule(dev, attr, items, actions) ?
+			attr->group : TUNNEL_STEER_GROUP(attr->group);
 	mhdr_res->ft_type = attr->egress ? MLX5DV_FLOW_TABLE_TYPE_NIC_TX :
 					   MLX5DV_FLOW_TABLE_TYPE_NIC_RX;
-	ret = mlx5_flow_group_to_table(attr, dev_flow->external, attr->group,
+	ret = mlx5_flow_group_to_table(attr, dev_flow->external, attr_group,
 				       !!priv->fdb_def_rule, &table, error);
 	if (ret)
 		return ret;
@@ -8125,6 +8175,15 @@ __flow_dv_translate(struct rte_eth_dev *dev,
 		priority = dev_conf->flow_prio - 1;
 	/* number of actions must be set to 0 in case of dirty stack. */
 	mhdr_res->actions_num = 0;
+	if (is_flow_tunnel_match_rule(dev, attr, items, actions)) {
+		if (flow_dv_create_action_l2_decap(dev, dev_flow,
+						   attr->transfer,
+						   error))
+			return -rte_errno;
+		dev_flow->dv.actions[actions_n++] =
+				dev_flow->dv.encap_decap->action;
+		action_flags |= MLX5_FLOW_ACTION_DECAP;
+	}
 	for (; !actions_end ; actions++) {
 		const struct rte_flow_action_queue *queue;
 		const struct rte_flow_action_rss *rss;
@@ -8134,6 +8193,7 @@ __flow_dv_translate(struct rte_eth_dev *dev,
 		const struct rte_flow_action_meter *mtr;
 		struct mlx5_flow_tbl_resource *tbl;
 		uint32_t port_id = 0;
+		uint32_t jump_group;
 		struct mlx5_flow_dv_port_id_action_resource port_id_resource;
 		int action_type = actions->type;
 		const struct rte_flow_action *found_action = NULL;
@@ -8145,6 +8205,9 @@ __flow_dv_translate(struct rte_eth_dev *dev,
 						  actions,
 						  "action not supported");
 		switch (action_type) {
+		case MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET:
+			action_flags |= MLX5_FLOW_ACTION_TUNNEL_TYPE1;
+			break;
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			break;
 		case RTE_FLOW_ACTION_TYPE_PORT_ID:
@@ -8377,8 +8440,12 @@ __flow_dv_translate(struct rte_eth_dev *dev,
 			break;
 		case RTE_FLOW_ACTION_TYPE_JUMP:
 			jump_data = action->conf;
+			jump_group = !(action_flags &
+					MLX5_FLOW_ACTION_TUNNEL_TYPE1) ?
+					jump_data->group :
+					TUNNEL_STEER_GROUP(jump_data->group);
 			ret = mlx5_flow_group_to_table(attr, dev_flow->external,
-						       jump_data->group,
+						       jump_group,
 						       !!priv->fdb_def_rule,
 						       &table, error);
 			if (ret)
-- 
2.25.1


^ permalink raw reply	[flat|nested] 95+ messages in thread

* [dpdk-dev] [PATCH v2 4/4] app/testpmd: support tunnel offload API
  2020-09-08 20:15 ` [dpdk-dev] [PATCH v2 0/4] Tunnel Offload API Gregory Etelson
                     ` (2 preceding siblings ...)
  2020-09-08 20:15   ` [dpdk-dev] [PATCH v2 3/4] net/mlx5: implement tunnel offload API Gregory Etelson
@ 2020-09-08 20:15   ` Gregory Etelson
  2020-09-15  4:47     ` Ajit Khaparde
  3 siblings, 1 reply; 95+ messages in thread
From: Gregory Etelson @ 2020-09-08 20:15 UTC (permalink / raw)
  To: dev
  Cc: matan, rasland, orika, Ori Kam, Wenzhuo Lu, Beilei Xing,
	Bernard Iremonger

Tunnel Offload API provides a hardware-independent, unified model
to offload tunneled traffic. Key model elements are:
 - matches are applied to both outer and inner packet headers
   during the entire offload procedure;
 - the outer header of a partially offloaded packet is restored;
 - the model is implemented as a set of helper functions.

Implementation details:

* Create application tunnel:
flow tunnel <port> type <tunnel type>
On success, the command creates an application tunnel object and returns
its tunnel descriptor. The tunnel descriptor is used in subsequent flow
creation commands to reference the tunnel.

* Create tunnel steering flow rule:
the tunnel_set <tunnel descriptor> parameter is used with the steering
rule template.

* Create tunnel matching flow rule:
the tunnel_match <tunnel descriptor> parameter is used with the matching
rule template.

* If the tunnel steering rule was offloaded, the outer header of a
partially offloaded packet is restored after a miss.

Example:
test packet=
<Ether  dst=24:8a:07:8d:ae:d6 src=50:6b:4b:cc:fc:e2 type=IPv4 |
<IP  version=4 ihl=5 proto=udp src=1.1.1.1 dst=1.1.1.10 |
<UDP  sport=4789 dport=4789 len=58 chksum=0x7f7b |
<VXLAN  NextProtocol=Ethernet vni=0x0 |
<Ether  dst=24:aa:aa:aa:aa:d6 src=50:bb:bb:bb:bb:e2 type=IPv4 |
<IP  version=4 ihl=5 proto=icmp src=2.2.2.2 dst=2.2.2.200 |
<ICMP  type=echo-request code=0 chksum=0xf7ff id=0x0 seq=0x0 |>>>>>>>
>>> len(packet)
92

testpmd> flow flush 0
testpmd> port 0/queue 0: received 1 packets
src=50:6B:4B:CC:FC:E2 - dst=24:8A:07:8D:AE:D6 - type=0x0800 -
length=92

testpmd> flow tunnel 0 type vxlan
port 0: flow tunnel #1 type vxlan
testpmd> flow create 0 ingress group 0 tunnel_set 1
         pattern eth /ipv4 / udp dst is 4789 / vxlan / end
         actions  jump group 0 / end
Flow rule #0 created
testpmd> port 0/queue 0: received 1 packets
tunnel restore info: - vxlan tunnel - outer header present # <--
  src=50:6B:4B:CC:FC:E2 - dst=24:8A:07:8D:AE:D6 - type=0x0800 -
length=92

testpmd> flow create 0 ingress group 0 tunnel_match 1
         pattern eth / ipv4 / udp dst is 4789 / vxlan / eth / ipv4 /
         end
         actions set_mac_dst mac_addr 02:CA:FE:CA:FA:80 /
         queue index 0 / end
Flow rule #1 created
testpmd> port 0/queue 0: received 1 packets
  src=50:BB:BB:BB:BB:E2 - dst=02:CA:FE:CA:FA:80 - type=0x0800 -
length=42

TODO: add tunnel destruction command.

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
---
v2:
* introduce testpmd support for tunnel offload API
---
 app/test-pmd/cmdline_flow.c | 102 ++++++++++++++++++++++++-
 app/test-pmd/config.c       | 147 +++++++++++++++++++++++++++++++++++-
 app/test-pmd/testpmd.c      |   5 +-
 app/test-pmd/testpmd.h      |  27 ++++++-
 app/test-pmd/util.c         |  30 +++++++-
 5 files changed, 302 insertions(+), 9 deletions(-)

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 6263d307ed..f0a7a4a9ea 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -69,6 +69,10 @@ enum index {
 	LIST,
 	AGED,
 	ISOLATE,
+	TUNNEL,
+
+	/* Tunnel arguments. */
+	TUNNEL_RULE,
 
 	/* Destroy arguments. */
 	DESTROY_RULE,
@@ -88,6 +92,8 @@ enum index {
 	INGRESS,
 	EGRESS,
 	TRANSFER,
+	TUNNEL_SET,
+	TUNNEL_MATCH,
 
 	/* Validate/create pattern. */
 	PATTERN,
@@ -653,6 +659,7 @@ struct buffer {
 	union {
 		struct {
 			struct rte_flow_attr attr;
+			struct tunnel_ops tunnel_ops;
 			struct rte_flow_item *pattern;
 			struct rte_flow_action *actions;
 			uint32_t pattern_n;
@@ -713,10 +720,18 @@ static const enum index next_vc_attr[] = {
 	INGRESS,
 	EGRESS,
 	TRANSFER,
+	TUNNEL_SET,
+	TUNNEL_MATCH,
 	PATTERN,
 	ZERO,
 };
 
+static const enum index next_tunnel_attr[] = {
+	TUNNEL_RULE,
+	END,
+	ZERO,
+};
+
 static const enum index next_destroy_attr[] = {
 	DESTROY_RULE,
 	END,
@@ -1516,6 +1531,9 @@ static int parse_aged(struct context *, const struct token *,
 static int parse_isolate(struct context *, const struct token *,
 			 const char *, unsigned int,
 			 void *, unsigned int);
+static int parse_tunnel(struct context *, const struct token *,
+			const char *, unsigned int,
+			void *, unsigned int);
 static int parse_int(struct context *, const struct token *,
 		     const char *, unsigned int,
 		     void *, unsigned int);
@@ -1698,7 +1716,8 @@ static const struct token token_list[] = {
 			      LIST,
 			      AGED,
 			      QUERY,
-			      ISOLATE)),
+			      ISOLATE,
+			      TUNNEL)),
 		.call = parse_init,
 	},
 	/* Sub-level commands. */
@@ -1772,6 +1791,21 @@ static const struct token token_list[] = {
 			     ARGS_ENTRY(struct buffer, port)),
 		.call = parse_isolate,
 	},
+	[TUNNEL] = {
+		.name = "tunnel",
+		.help = "new tunnel API",
+		.next = NEXT(NEXT_ENTRY(TUNNEL_RULE), NEXT_ENTRY(PORT_ID)),
+		.args = ARGS(ARGS_ENTRY(struct buffer, port)),
+		.call = parse_tunnel,
+	},
+	/* Tunnel arguments. */
+	[TUNNEL_RULE] = {
+		.name = "type",
+		.help = "specify tunnel type",
+		.next = NEXT(next_tunnel_attr, NEXT_ENTRY(FILE_PATH)),
+		.args = ARGS(ARGS_ENTRY(struct tunnel_ops, type)),
+		.call = parse_tunnel,
+	},
 	/* Destroy arguments. */
 	[DESTROY_RULE] = {
 		.name = "rule",
@@ -1835,6 +1869,20 @@ static const struct token token_list[] = {
 		.next = NEXT(next_vc_attr),
 		.call = parse_vc,
 	},
+	[TUNNEL_SET] = {
+		.name = "tunnel_set",
+		.help = "tunnel steer rule",
+		.next = NEXT(next_vc_attr, NEXT_ENTRY(UNSIGNED)),
+		.args = ARGS(ARGS_ENTRY(struct tunnel_ops, id)),
+		.call = parse_vc,
+	},
+	[TUNNEL_MATCH] = {
+		.name = "tunnel_match",
+		.help = "tunnel match rule",
+		.next = NEXT(next_vc_attr, NEXT_ENTRY(UNSIGNED)),
+		.args = ARGS(ARGS_ENTRY(struct tunnel_ops, id)),
+		.call = parse_vc,
+	},
 	/* Validate/create pattern. */
 	[PATTERN] = {
 		.name = "pattern",
@@ -4054,12 +4102,28 @@ parse_vc(struct context *ctx, const struct token *token,
 		return len;
 	}
 	ctx->objdata = 0;
-	ctx->object = &out->args.vc.attr;
+	switch (ctx->curr) {
+	default:
+		ctx->object = &out->args.vc.attr;
+		break;
+	case TUNNEL_SET:
+	case TUNNEL_MATCH:
+		ctx->object = &out->args.vc.tunnel_ops;
+		break;
+	}
 	ctx->objmask = NULL;
 	switch (ctx->curr) {
 	case GROUP:
 	case PRIORITY:
 		return len;
+	case TUNNEL_SET:
+		out->args.vc.tunnel_ops.enabled = 1;
+		out->args.vc.tunnel_ops.actions = 1;
+		return len;
+	case TUNNEL_MATCH:
+		out->args.vc.tunnel_ops.enabled = 1;
+		out->args.vc.tunnel_ops.items = 1;
+		return len;
 	case INGRESS:
 		out->args.vc.attr.ingress = 1;
 		return len;
@@ -5597,6 +5661,34 @@ parse_isolate(struct context *ctx, const struct token *token,
 	return len;
 }
 
+static int
+parse_tunnel(struct context *ctx, const struct token *token,
+	     const char *str, unsigned int len,
+	     void *buf, unsigned int size)
+{
+	struct buffer *out = buf;
+
+	/* Token name must match. */
+	if (parse_default(ctx, token, str, len, NULL, 0) < 0)
+		return -1;
+	/* Nothing else to do if there is no buffer. */
+	if (!out)
+		return len;
+	if (!out->command) {
+		if (ctx->curr != TUNNEL)
+			return -1;
+		if (sizeof(*out) > size)
+			return -1;
+		out->command = ctx->curr;
+		ctx->objdata = 0;
+		ctx->object = out;
+		ctx->objmask = NULL;
+	}
+	if (ctx->curr == TUNNEL_RULE)
+		ctx->object = &out->args.vc.tunnel_ops;
+	return len;
+}
+
 /**
  * Parse signed/unsigned integers 8 to 64-bit long.
  *
@@ -6547,7 +6639,8 @@ cmd_flow_parsed(const struct buffer *in)
 		break;
 	case CREATE:
 		port_flow_create(in->port, &in->args.vc.attr,
-				 in->args.vc.pattern, in->args.vc.actions);
+				 in->args.vc.pattern, in->args.vc.actions,
+				 &in->args.vc.tunnel_ops);
 		break;
 	case DESTROY:
 		port_flow_destroy(in->port, in->args.destroy.rule_n,
@@ -6573,6 +6666,9 @@ cmd_flow_parsed(const struct buffer *in)
 	case AGED:
 		port_flow_aged(in->port, in->args.aged.destroy);
 		break;
+	case TUNNEL:
+		port_flow_add_tunnel(in->port, &in->args.vc.tunnel_ops);
+		break;
 	default:
 		break;
 	}
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 30bee33248..b90927f451 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1337,6 +1337,58 @@ port_mtu_set(portid_t port_id, uint16_t mtu)
 
 /* Generic flow management functions. */
 
+static struct port_flow_tunnel *
+port_flow_locate_tunnel(struct rte_port *port, uint32_t port_tunnel_id)
+{
+	struct port_flow_tunnel *flow_tunnel;
+
+	LIST_FOREACH(flow_tunnel, &port->flow_tunnel_list, chain) {
+		if (flow_tunnel->id == port_tunnel_id)
+			goto out;
+	}
+	flow_tunnel = NULL;
+
+out:
+	return flow_tunnel;
+}
+
+void port_flow_release_tunnel(struct port_flow_tunnel *flow_tunnel)
+{
+	LIST_REMOVE(flow_tunnel, chain);
+	free(flow_tunnel);
+}
+
+void port_flow_add_tunnel(portid_t port_id, const struct tunnel_ops *ops)
+{
+	struct rte_port *port = &ports[port_id];
+	enum rte_flow_item_type	type;
+	struct port_flow_tunnel *flow_tunnel;
+
+	if (!strncmp(ops->type, "vxlan", strlen("vxlan")))
+		type = RTE_FLOW_ITEM_TYPE_VXLAN;
+	else {
+		printf("cannot offload \"%s\" tunnel type\n", ops->type);
+		return;
+	}
+	LIST_FOREACH(flow_tunnel, &port->flow_tunnel_list, chain) {
+		if (flow_tunnel->tunnel.type == type)
+			break;
+	}
+	if (!flow_tunnel) {
+		flow_tunnel = calloc(1, sizeof(*flow_tunnel));
+		if (!flow_tunnel) {
+			printf("failed to allocate port flow_tunnel object\n");
+			return;
+		}
+		flow_tunnel->tunnel.type = type;
+		flow_tunnel->id = LIST_EMPTY(&port->flow_tunnel_list) ? 1 :
+				  LIST_FIRST(&port->flow_tunnel_list)->id + 1;
+		LIST_INSERT_HEAD(&port->flow_tunnel_list, flow_tunnel, chain);
+	}
+	printf("port %d: flow tunnel #%u type %s\n",
+		port_id, flow_tunnel->id, ops->type);
+}
+
 /** Generate a port_flow entry from attributes/pattern/actions. */
 static struct port_flow *
 port_flow_new(const struct rte_flow_attr *attr,
@@ -1503,13 +1555,15 @@ int
 port_flow_create(portid_t port_id,
 		 const struct rte_flow_attr *attr,
 		 const struct rte_flow_item *pattern,
-		 const struct rte_flow_action *actions)
+		 const struct rte_flow_action *actions,
+		 const struct tunnel_ops *tunnel_ops)
 {
 	struct rte_flow *flow;
 	struct rte_port *port;
 	struct port_flow *pf;
 	uint32_t id = 0;
 	struct rte_flow_error error;
+	struct port_flow_tunnel *pft;
 
 	port = &ports[port_id];
 	if (port->flow_list) {
@@ -1520,6 +1574,75 @@ port_flow_create(portid_t port_id,
 		}
 		id = port->flow_list->id + 1;
 	}
+	if (tunnel_ops->enabled) {
+		int ret;
+		pft = port_flow_locate_tunnel(port, tunnel_ops->id);
+		if (!pft) {
+			printf("failed to locate port flow tunnel #%u\n",
+				tunnel_ops->id);
+			return -ENOENT;
+		}
+		if (tunnel_ops->actions) {
+			uint32_t num_actions;
+			const struct rte_flow_action *aptr;
+
+			ret = rte_flow_tunnel_decap_set(port_id, &pft->tunnel,
+							&pft->pmd_actions,
+							&pft->num_pmd_actions,
+							&error);
+			if (ret) {
+				port_flow_complain(&error);
+				return -EINVAL;
+			}
+			for (aptr = actions, num_actions = 1;
+			     aptr->type != RTE_FLOW_ACTION_TYPE_END;
+			     aptr++, num_actions++);
+			pft->actions = malloc(
+					(num_actions +  pft->num_pmd_actions) *
+					sizeof(actions[0]));
+			if (!pft->actions) {
+				rte_flow_tunnel_action_decap_release(
+						port_id, pft->actions,
+						pft->num_pmd_actions, &error);
+				return -ENOMEM;
+			}
+			rte_memcpy(pft->actions, pft->pmd_actions,
+				   pft->num_pmd_actions * sizeof(actions[0]));
+			rte_memcpy(pft->actions + pft->num_pmd_actions, actions,
+				   num_actions * sizeof(actions[0]));
+			actions = pft->actions;
+		}
+		if (tunnel_ops->items) {
+			uint32_t num_items;
+			const struct rte_flow_item *iptr;
+
+			ret = rte_flow_tunnel_match(port_id, &pft->tunnel,
+						    &pft->pmd_items,
+						    &pft->num_pmd_items,
+						    &error);
+			if (ret) {
+				port_flow_complain(&error);
+				return -EINVAL;
+			}
+			for (iptr = pattern, num_items = 1;
+			     iptr->type != RTE_FLOW_ITEM_TYPE_END;
+			     iptr++, num_items++);
+			pft->items = malloc((num_items + pft->num_pmd_items) *
+					    sizeof(pattern[0]));
+			if (!pft->items) {
+				rte_flow_tunnel_item_release(
+						port_id, pft->pmd_items,
+						pft->num_pmd_items, &error);
+				return -ENOMEM;
+			}
+			rte_memcpy(pft->items, pft->pmd_items,
+				   pft->num_pmd_items * sizeof(pattern[0]));
+			rte_memcpy(pft->items + pft->num_pmd_items, pattern,
+				   num_items * sizeof(pattern[0]));
+			pattern = pft->items;
+		}
+
+	}
 	pf = port_flow_new(attr, pattern, actions, &error);
 	if (!pf)
 		return port_flow_complain(&error);
@@ -1535,6 +1658,20 @@ port_flow_create(portid_t port_id,
 	pf->id = id;
 	pf->flow = flow;
 	port->flow_list = pf;
+	if (tunnel_ops->enabled) {
+		if (tunnel_ops->actions) {
+			free(pft->actions);
+			rte_flow_tunnel_action_decap_release(
+				port_id, pft->pmd_actions,
+				pft->num_pmd_actions, &error);
+		}
+		if (tunnel_ops->items) {
+			free(pft->items);
+			rte_flow_tunnel_item_release(port_id, pft->pmd_items,
+						     pft->num_pmd_items,
+						     &error);
+		}
+	}
 	printf("Flow rule #%u created\n", pf->id);
 	return 0;
 }
@@ -1829,7 +1966,9 @@ port_flow_list(portid_t port_id, uint32_t n, const uint32_t group[n])
 		       pf->rule.attr->egress ? 'e' : '-',
 		       pf->rule.attr->transfer ? 't' : '-');
 		while (item->type != RTE_FLOW_ITEM_TYPE_END) {
-			if (rte_flow_conv(RTE_FLOW_CONV_OP_ITEM_NAME_PTR,
+			if ((uint32_t)item->type > INT_MAX)
+				name = "PMD_INTERNAL";
+			else if (rte_flow_conv(RTE_FLOW_CONV_OP_ITEM_NAME_PTR,
 					  &name, sizeof(name),
 					  (void *)(uintptr_t)item->type,
 					  NULL) <= 0)
@@ -1840,7 +1979,9 @@ port_flow_list(portid_t port_id, uint32_t n, const uint32_t group[n])
 		}
 		printf("=>");
 		while (action->type != RTE_FLOW_ACTION_TYPE_END) {
-			if (rte_flow_conv(RTE_FLOW_CONV_OP_ACTION_NAME_PTR,
+			if ((uint32_t)action->type > INT_MAX)
+				name = "PMD_INTERNAL";
+			else if (rte_flow_conv(RTE_FLOW_CONV_OP_ACTION_NAME_PTR,
 					  &name, sizeof(name),
 					  (void *)(uintptr_t)action->type,
 					  NULL) <= 0)
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 7842c3b781..e2330116e1 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -3591,6 +3591,8 @@ init_port_dcb_config(portid_t pid,
 static void
 init_port(void)
 {
+	int i;
+
 	/* Configuration of Ethernet ports. */
 	ports = rte_zmalloc("testpmd: ports",
 			    sizeof(struct rte_port) * RTE_MAX_ETHPORTS,
@@ -3600,7 +3602,8 @@ init_port(void)
 				"rte_zmalloc(%d struct rte_port) failed\n",
 				RTE_MAX_ETHPORTS);
 	}
-
+	for (i = 0; i < RTE_MAX_ETHPORTS; i++)
+		LIST_INIT(&ports[i].flow_tunnel_list);
 	/* Initialize ports NUMA structures */
 	memset(port_numa, NUMA_NO_CONFIG, RTE_MAX_ETHPORTS);
 	memset(rxring_numa, NUMA_NO_CONFIG, RTE_MAX_ETHPORTS);
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 25a12b14f2..9c47533c68 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -12,6 +12,7 @@
 #include <rte_gro.h>
 #include <rte_gso.h>
 #include <cmdline.h>
+#include <sys/queue.h>
 
 #define RTE_PORT_ALL            (~(portid_t)0x0)
 
@@ -148,6 +149,26 @@ struct port_flow {
 	uint8_t data[]; /**< Storage for flow rule description */
 };
 
+struct port_flow_tunnel {
+	LIST_ENTRY(port_flow_tunnel) chain;
+	struct rte_flow_action *pmd_actions;
+	struct rte_flow_item   *pmd_items;
+	uint32_t id;
+	uint32_t num_pmd_actions;
+	uint32_t num_pmd_items;
+	struct rte_flow_tunnel tunnel;
+	struct rte_flow_action *actions;
+	struct rte_flow_item *items;
+};
+
+struct tunnel_ops {
+	uint32_t id;
+	char type[16];
+	uint32_t enabled:1;
+	uint32_t actions:1;
+	uint32_t items:1;
+};
+
 /**
  * The data structure associated with each port.
  */
@@ -178,6 +199,7 @@ struct rte_port {
 	uint32_t                mc_addr_nb; /**< nb. of addr. in mc_addr_pool */
 	uint8_t                 slave_flag; /**< bonding slave port */
 	struct port_flow        *flow_list; /**< Associated flows. */
+	LIST_HEAD(, port_flow_tunnel) flow_tunnel_list;
 	const struct rte_eth_rxtx_callback *rx_dump_cb[RTE_MAX_QUEUES_PER_PORT+1];
 	const struct rte_eth_rxtx_callback *tx_dump_cb[RTE_MAX_QUEUES_PER_PORT+1];
 	/**< metadata value to insert in Tx packets. */
@@ -729,7 +751,8 @@ int port_flow_validate(portid_t port_id,
 int port_flow_create(portid_t port_id,
 		     const struct rte_flow_attr *attr,
 		     const struct rte_flow_item *pattern,
-		     const struct rte_flow_action *actions);
+		     const struct rte_flow_action *actions,
+		     const struct tunnel_ops *tunnel_ops);
 void update_age_action_context(const struct rte_flow_action *actions,
 		     struct port_flow *pf);
 int port_flow_destroy(portid_t port_id, uint32_t n, const uint32_t *rule);
@@ -739,6 +762,8 @@ int port_flow_query(portid_t port_id, uint32_t rule,
 		    const struct rte_flow_action *action);
 void port_flow_list(portid_t port_id, uint32_t n, const uint32_t *group);
 void port_flow_aged(portid_t port_id, uint8_t destroy);
+void port_flow_release_tunnel(struct port_flow_tunnel *flow_tunnel);
+void port_flow_add_tunnel(portid_t port_id, const struct tunnel_ops *ops);
 int port_flow_isolate(portid_t port_id, int set);
 
 void rx_ring_desc_display(portid_t port_id, queueid_t rxq_id, uint16_t rxd_id);
diff --git a/app/test-pmd/util.c b/app/test-pmd/util.c
index 8488fa1a8f..6757acde9a 100644
--- a/app/test-pmd/util.c
+++ b/app/test-pmd/util.c
@@ -22,6 +22,12 @@ print_ether_addr(const char *what, const struct rte_ether_addr *eth_addr)
 	printf("%s%s", what, buf);
 }
 
+static bool tunnel_missed_packet(struct rte_mbuf  *mb)
+{
+	uint64_t mask = PKT_RX_FDIR | PKT_RX_FDIR_ID;
+	return (mb->ol_flags & mask) == mask;
+}
+
 static inline void
 dump_pkt_burst(uint16_t port_id, uint16_t queue, struct rte_mbuf *pkts[],
 	      uint16_t nb_pkts, int is_rx)
@@ -51,15 +57,37 @@ dump_pkt_burst(uint16_t port_id, uint16_t queue, struct rte_mbuf *pkts[],
 		mb = pkts[i];
 		eth_hdr = rte_pktmbuf_read(mb, 0, sizeof(_eth_hdr), &_eth_hdr);
 		eth_type = RTE_BE_TO_CPU_16(eth_hdr->ether_type);
-		ol_flags = mb->ol_flags;
 		packet_type = mb->packet_type;
 		is_encapsulation = RTE_ETH_IS_TUNNEL_PKT(packet_type);
 
+		if (tunnel_missed_packet(mb)) {
+			int ret;
+			struct rte_flow_error error;
+			struct rte_flow_restore_info info = { 0, };
+
+			ret = rte_flow_tunnel_get_restore_info(port_id, mb,
+							       &info, &error);
+			if (!ret) {
+				printf("tunnel restore info:");
+				if (info.flags & RTE_FLOW_RESTORE_INFO_TUNNEL)
+					if (info.tunnel.type ==
+					    RTE_FLOW_ITEM_TYPE_VXLAN)
+						printf(" - vxlan tunnel");
+				if (info.flags &
+				    RTE_FLOW_RESTORE_INFO_ENCAPSULATED)
+					printf(" - outer header present");
+				if (info.flags & RTE_FLOW_RESTORE_INFO_GROUP_ID)
+					printf(" - miss group %u",
+						info.group_id);
+			}
+			printf("\n");
+		}
 		print_ether_addr("  src=", &eth_hdr->s_addr);
 		print_ether_addr(" - dst=", &eth_hdr->d_addr);
 		printf(" - type=0x%04x - length=%u - nb_segs=%d",
 		       eth_type, (unsigned int) mb->pkt_len,
 		       (int)mb->nb_segs);
+		ol_flags = mb->ol_flags;
 		if (ol_flags & PKT_RX_RSS_HASH) {
 			printf(" - RSS hash=0x%x", (unsigned int) mb->hash.rss);
 			printf(" - RSS queue=0x%x", (unsigned int) queue);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/4] ethdev: allow negative values in flow rule types
  2020-09-08 20:15   ` [dpdk-dev] [PATCH v2 1/4] ethdev: allow negative values in flow rule types Gregory Etelson
@ 2020-09-15  4:36     ` Ajit Khaparde
  2020-09-15  8:46       ` Andrew Rybchenko
  2020-09-15  8:45     ` Andrew Rybchenko
  1 sibling, 1 reply; 95+ messages in thread
From: Ajit Khaparde @ 2020-09-15  4:36 UTC (permalink / raw)
  To: Gregory Etelson
  Cc: dpdk-dev, matan, rasland, orika, Gregory Etelson, Ori Kam,
	Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko

On Tue, Sep 8, 2020 at 1:16 PM Gregory Etelson <getelson@nvidia.com> wrote:

> From: Gregory Etelson <getelson@mellanox.com>
>
> RTE flow items & actions use positive values in item & action type.
> Negative values are reserved for PMD private types. PMD
> items & actions usually are not exposed to application and are not
> used to create RTE flows.
>
> The patch allows applications with access to PMD flow
> items & actions ability to integrate RTE and PMD items & actions
> and use them to create flow rule.
>
While we are reviewing this, some quick comment/questions..

Doesn't this go against the above "PMD
items & actions usually are not exposed to application and are not
used to create RTE flows."?
Why would an application try to use PMD specific private types?
Isn't this contrary to having a standard API?



>
> RTE flow library functions cannot work with PMD private items and
> actions (elements) because RTE flow has no API to query PMD flow
> object size. In the patch, PMD flow elements use object pointer.
> RTE flow library functions handle PMD element object size as
> size of a pointer. PMD handles its objects internally.
>
> Signed-off-by: Gregory Etelson <getelson@mellanox.com>
> Acked-by: Ori Kam <orika@mellanox.com>
> ---
> v2:
> * Update commit log
> ---
>  lib/librte_ethdev/rte_flow.c | 30 ++++++++++++++++++++++++------
>  1 file changed, 24 insertions(+), 6 deletions(-)
>
> diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
> index f8fdd68fe9..9905426bc9 100644
> --- a/lib/librte_ethdev/rte_flow.c
> +++ b/lib/librte_ethdev/rte_flow.c
> @@ -564,7 +564,12 @@ rte_flow_conv_item_spec(void *buf, const size_t size,
>                 }
>                 break;
>         default:
> -               off = rte_flow_desc_item[item->type].size;
> +               /**
> +                * allow PMD private flow item
> +                */
> +               off = (uint32_t)item->type <= INT_MAX ?
> +                       rte_flow_desc_item[item->type].size :
> +                       sizeof(void *);
>                 rte_memcpy(buf, data, (size > off ? off : size));
>                 break;
>         }
> @@ -667,7 +672,12 @@ rte_flow_conv_action_conf(void *buf, const size_t
> size,
>                 }
>                 break;
>         default:
> -               off = rte_flow_desc_action[action->type].size;
> +               /**
> +                * allow PMD private flow action
> +                */
> +               off = (uint32_t)action->type <= INT_MAX ?
> +                       rte_flow_desc_action[action->type].size :
> +                       sizeof(void *);
>                 rte_memcpy(buf, action->conf, (size > off ? off : size));
>                 break;
>         }
> @@ -709,8 +719,12 @@ rte_flow_conv_pattern(struct rte_flow_item *dst,
>         unsigned int i;
>
>         for (i = 0, off = 0; !num || i != num; ++i, ++src, ++dst) {
> -               if ((size_t)src->type >= RTE_DIM(rte_flow_desc_item) ||
> -                   !rte_flow_desc_item[src->type].name)
> +               /**
> +                * allow PMD private flow item
> +                */
> +               if (((uint32_t)src->type <= INT_MAX) &&
> +                       ((size_t)src->type >= RTE_DIM(rte_flow_desc_item)
> ||
> +                   !rte_flow_desc_item[src->type].name))
>                         return rte_flow_error_set
>                                 (error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM,
> src,
>                                  "cannot convert unknown item type");
> @@ -798,8 +812,12 @@ rte_flow_conv_actions(struct rte_flow_action *dst,
>         unsigned int i;
>
>         for (i = 0, off = 0; !num || i != num; ++i, ++src, ++dst) {
> -               if ((size_t)src->type >= RTE_DIM(rte_flow_desc_action) ||
> -                   !rte_flow_desc_action[src->type].name)
> +               /**
> +                * allow PMD private flow action
> +                */
> +               if (((uint32_t)src->type <= INT_MAX) &&
> +                   ((size_t)src->type >= RTE_DIM(rte_flow_desc_action) ||
> +                   !rte_flow_desc_action[src->type].name))
>                         return rte_flow_error_set
>                                 (error, ENOTSUP,
> RTE_FLOW_ERROR_TYPE_ACTION,
>                                  src, "cannot convert unknown action
> type");
> --
> 2.25.1
>
>

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH v2 4/4] app/testpmd: support tunnel offload API
  2020-09-08 20:15   ` [dpdk-dev] [PATCH v2 4/4] app/testpmd: support " Gregory Etelson
@ 2020-09-15  4:47     ` Ajit Khaparde
  2020-09-15 10:44       ` Gregory Etelson
  0 siblings, 1 reply; 95+ messages in thread
From: Ajit Khaparde @ 2020-09-15  4:47 UTC (permalink / raw)
  To: Gregory Etelson
  Cc: dpdk-dev, matan, rasland, orika, Ori Kam, Wenzhuo Lu,
	Beilei Xing, Bernard Iremonger

On Tue, Sep 8, 2020 at 1:17 PM Gregory Etelson <getelson@nvidia.com> wrote:

> ::::snip::::
> @@ -1520,6 +1574,75 @@ port_flow_create(portid_t port_id,
>                 }
>                 id = port->flow_list->id + 1;
>         }
> +       if (tunnel_ops->enabled) {
> +               int ret;
> +               pft = port_flow_locate_tunnel(port, tunnel_ops->id);
> +               if (!pft) {
> +                       printf("failed to locate port flow tunnel #%u\n",
> +                               tunnel_ops->id);
> +                       return -ENOENT;
> +               }
> +               if (tunnel_ops->actions) {
> +                       uint32_t num_actions;
> +                       const struct rte_flow_action *aptr;
> +
> +                       ret = rte_flow_tunnel_decap_set(port_id,
> &pft->tunnel,
> +                                                       &pft->pmd_actions,
> +
>  &pft->num_pmd_actions,
> +                                                       &error);
>
Does tunnel_ops always indicate decap?
Shouldn't there be a check for encap/decap? Or check for direction?



> +                       if (ret) {
> +                               port_flow_complain(&error);
> +                               return -EINVAL;
> +                       }
> ::::snip::::
>
>

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/4] ethdev: allow negative values in flow rule types
  2020-09-08 20:15   ` [dpdk-dev] [PATCH v2 1/4] ethdev: allow negative values in flow rule types Gregory Etelson
  2020-09-15  4:36     ` Ajit Khaparde
@ 2020-09-15  8:45     ` Andrew Rybchenko
  2020-09-15 16:17       ` Gregory Etelson
  1 sibling, 1 reply; 95+ messages in thread
From: Andrew Rybchenko @ 2020-09-15  8:45 UTC (permalink / raw)
  To: Gregory Etelson, dev
  Cc: matan, rasland, orika, Gregory Etelson, Ori Kam, Thomas Monjalon,
	Ferruh Yigit

On 9/8/20 11:15 PM, Gregory Etelson wrote:
> From: Gregory Etelson <getelson@mellanox.com>
> 
> RTE flow items & actions use positive values in item & action type.
> Negative values are reserved for PMD private types. PMD
> items & actions usually are not exposed to application and are not
> used to create RTE flows.
> 
> The patch allows applications with access to PMD flow
> items & actions ability to integrate RTE and PMD items & actions
> and use them to create flow rule.
> 
> RTE flow library functions cannot work with PMD private items and
> actions (elements) because RTE flow has no API to query PMD flow
> object size. In the patch, PMD flow elements use object pointer.
> RTE flow library functions handle PMD element object size as
> size of a pointer. PMD handles its objects internally.
> 
> Signed-off-by: Gregory Etelson <getelson@mellanox.com>
> Acked-by: Ori Kam <orika@mellanox.com>
> ---
> v2:
> * Update commit log
> ---
>  lib/librte_ethdev/rte_flow.c | 30 ++++++++++++++++++++++++------
>  1 file changed, 24 insertions(+), 6 deletions(-)
> 
> diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
> index f8fdd68fe9..9905426bc9 100644
> --- a/lib/librte_ethdev/rte_flow.c
> +++ b/lib/librte_ethdev/rte_flow.c
> @@ -564,7 +564,12 @@ rte_flow_conv_item_spec(void *buf, const size_t size,
>  		}
>  		break;
>  	default:
> -		off = rte_flow_desc_item[item->type].size;
> +		/**
> +		 * allow PMD private flow item
> +		 */
> +		off = (uint32_t)item->type <= INT_MAX ?

It looks inconsistent to cast to uint32_t and compare vs
INT_MAX. It should be either cast to 'unsigned int' or
compare vs INT32_MAX. I think cast to 'unsigned int' is
better for 'enum' values.
But may be it should be 'int' and < 0 in fact to match
the description that negative values are vendor-specific.

> +			rte_flow_desc_item[item->type].size :
> +			sizeof(void *);
>  		rte_memcpy(buf, data, (size > off ? off : size));
>  		break;
>  	}
> @@ -667,7 +672,12 @@ rte_flow_conv_action_conf(void *buf, const size_t size,
>  		}
>  		break;
>  	default:
> -		off = rte_flow_desc_action[action->type].size;
> +		/**
> +		 * allow PMD private flow action
> +		 */
> +		off = (uint32_t)action->type <= INT_MAX ?

Same

> +			rte_flow_desc_action[action->type].size :
> +			sizeof(void *);
>  		rte_memcpy(buf, action->conf, (size > off ? off : size));
>  		break;
>  	}
> @@ -709,8 +719,12 @@ rte_flow_conv_pattern(struct rte_flow_item *dst,
>  	unsigned int i;
>  
>  	for (i = 0, off = 0; !num || i != num; ++i, ++src, ++dst) {
> -		if ((size_t)src->type >= RTE_DIM(rte_flow_desc_item) ||
> -		    !rte_flow_desc_item[src->type].name)
> +		/**
> +		 * allow PMD private flow item
> +		 */
> +		if (((uint32_t)src->type <= INT_MAX) &&

Same

> +			((size_t)src->type >= RTE_DIM(rte_flow_desc_item) ||
> +		    !rte_flow_desc_item[src->type].name))
>  			return rte_flow_error_set
>  				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM, src,
>  				 "cannot convert unknown item type");
> @@ -798,8 +812,12 @@ rte_flow_conv_actions(struct rte_flow_action *dst,
>  	unsigned int i;
>  
>  	for (i = 0, off = 0; !num || i != num; ++i, ++src, ++dst) {
> -		if ((size_t)src->type >= RTE_DIM(rte_flow_desc_action) ||
> -		    !rte_flow_desc_action[src->type].name)
> +		/**
> +		 * allow PMD private flow action
> +		 */
> +		if (((uint32_t)src->type <= INT_MAX) &&

Same

> +		    ((size_t)src->type >= RTE_DIM(rte_flow_desc_action) ||
> +		    !rte_flow_desc_action[src->type].name))
>  			return rte_flow_error_set
>  				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION,
>  				 src, "cannot convert unknown action type");
> 


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/4] ethdev: allow negative values in flow rule types
  2020-09-15  4:36     ` Ajit Khaparde
@ 2020-09-15  8:46       ` Andrew Rybchenko
  2020-09-15 10:27         ` Gregory Etelson
  0 siblings, 1 reply; 95+ messages in thread
From: Andrew Rybchenko @ 2020-09-15  8:46 UTC (permalink / raw)
  To: Ajit Khaparde, Gregory Etelson
  Cc: dpdk-dev, matan, rasland, orika, Gregory Etelson, Ori Kam,
	Thomas Monjalon, Ferruh Yigit

On 9/15/20 7:36 AM, Ajit Khaparde wrote:
> On Tue, Sep 8, 2020 at 1:16 PM Gregory Etelson <getelson@nvidia.com
> <mailto:getelson@nvidia.com>> wrote:
>
>     From: Gregory Etelson <getelson@mellanox.com
>     <mailto:getelson@mellanox.com>>
>
>     RTE flow items & actions use positive values in item & action type.
>     Negative values are reserved for PMD private types. PMD
>     items & actions usually are not exposed to application and are not
>     used to create RTE flows.
>
>     The patch allows applications with access to PMD flow
>     items & actions ability to integrate RTE and PMD items & actions
>     and use them to create flow rule.
>
> While we are reviewing this, some quick comment/questions..
>
> Doesn't this go against the above "PMD
> items & actions usually are not exposed to application and are not
> used to create RTE flows."?
> Why would an application try to use PMD specific private types?
> Isn't this contrary to having a standard API?

+1


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/4] ethdev: allow negative values in flow rule types
  2020-09-15  8:46       ` Andrew Rybchenko
@ 2020-09-15 10:27         ` Gregory Etelson
  2020-09-16 17:21           ` Gregory Etelson
  0 siblings, 1 reply; 95+ messages in thread
From: Gregory Etelson @ 2020-09-15 10:27 UTC (permalink / raw)
  To: Andrew Rybchenko, Ajit Khaparde
  Cc: dpdk-dev, Matan Azrad, Raslan Darawsheh, Ori Kam,
	Gregory Etelson, Ori Kam, NBU-Contact-Thomas Monjalon,
	Ferruh Yigit

Subject: Re: [dpdk-dev] [PATCH v2 1/4] ethdev: allow negative values in flow rule types
On 9/15/20 7:36 AM, Ajit Khaparde wrote:
On Tue, Sep 8, 2020 at 1:16 PM Gregory Etelson <getelson@nvidia.com<mailto:getelson@nvidia.com>> wrote:
From: Gregory Etelson <getelson@mellanox.com<mailto:getelson@mellanox.com>>

RTE flow items & actions use positive values in item & action type.
Negative values are reserved for PMD private types. PMD
items & actions usually are not exposed to application and are not
used to create RTE flows.

The patch allows applications with access to PMD flow
items & actions ability to integrate RTE and PMD items & actions
and use them to create flow rule.
While we are reviewing this, some quick comment/questions..

Doesn't this go against the above "PMD
items & actions usually are not exposed to application and are not
used to create RTE flows."?
Why would an application try to use PMD specific private types?
Isn't this contrary to having a standard API?

+1
The PMD items and actions obtained with this patch are not intended to be interpreted by the application.
In general, the application is not aware of the content it receives from the PMD. This is a special case in which
the application receives opaque elements from the PMD and passes them back unchanged.
^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH v2 4/4] app/testpmd: support tunnel offload API
  2020-09-15  4:47     ` Ajit Khaparde
@ 2020-09-15 10:44       ` Gregory Etelson
  0 siblings, 0 replies; 95+ messages in thread
From: Gregory Etelson @ 2020-09-15 10:44 UTC (permalink / raw)
  To: Ajit Khaparde
  Cc: dpdk-dev, Matan Azrad, Raslan Darawsheh, Ori Kam, Ori Kam,
	Wenzhuo Lu, Beilei Xing, Bernard Iremonger

::::snip::::
@@ -1520,6 +1574,75 @@ port_flow_create(portid_t port_id,
                }
                id = port->flow_list->id + 1;
        }
+       if (tunnel_ops->enabled) {
+               int ret;
+               pft = port_flow_locate_tunnel(port, tunnel_ops->id);
+               if (!pft) {
+                       printf("failed to locate port flow tunnel #%u\n",
+                               tunnel_ops->id);
+                       return -ENOENT;
+               }
+               if (tunnel_ops->actions) {
+                       uint32_t num_actions;
+                       const struct rte_flow_action *aptr;
+
+                       ret = rte_flow_tunnel_decap_set(port_id, &pft->tunnel,
+                                                       &pft->pmd_actions,
+                                                       &pft->num_pmd_actions,
+                                                       &error);
> Does tunnel_ops always indicate decap?
> Shouldn't there be a check for encap/decap? Or check for direction?

When the tunnel offload API is enabled, the application does not need to specify
decap in flow rules - that action is carried out internally by the PMD.
If the application activated the API, it is expected to call rte_flow_tunnel_decap_set()
and rte_flow_tunnel_match() and use the results obtained from these functions to compile the flow rule parameters.
The tunnel_ops members specify two things:
1 - the tunnel offload API was activated;
2 - which function, decap_set or match, should be called and how to proceed with its results.
 
+                       if (ret) {
+                               port_flow_complain(&error);
+                               return -EINVAL;
+                       }
::::snip::::

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/4] ethdev: allow negative values in flow rule types
  2020-09-15  8:45     ` Andrew Rybchenko
@ 2020-09-15 16:17       ` Gregory Etelson
  0 siblings, 0 replies; 95+ messages in thread
From: Gregory Etelson @ 2020-09-15 16:17 UTC (permalink / raw)
  To: Andrew Rybchenko, dev
  Cc: Matan Azrad, Raslan Darawsheh, Ori Kam, Gregory Etelson, Ori Kam,
	NBU-Contact-Thomas Monjalon, Ferruh Yigit

> On 9/8/20 11:15 PM, Gregory Etelson wrote:
> > From: Gregory Etelson <getelson@mellanox.com>
> >
> > RTE flow items & actions use positive values in item & action type.
> > Negative values are reserved for PMD private types. PMD items &
> > actions usually are not exposed to application and are not used to
> > create RTE flows.
> >
> > The patch allows applications with access to PMD flow items & actions
> > ability to integrate RTE and PMD items & actions and use them to
> > create flow rule.
> >
> > RTE flow library functions cannot work with PMD private items and
> > actions (elements) because RTE flow has no API to query PMD flow
> > object size. In the patch, PMD flow elements use object pointer.
> > RTE flow library functions handle PMD element object size as size of a
> > pointer. PMD handles its objects internally.
> >
> > Signed-off-by: Gregory Etelson <getelson@mellanox.com>
> > Acked-by: Ori Kam <orika@mellanox.com>
> > ---
> > v2:
> > * Update commit log
> > ---
> >  lib/librte_ethdev/rte_flow.c | 30 ++++++++++++++++++++++++------
> >  1 file changed, 24 insertions(+), 6 deletions(-)
> >
> > diff --git a/lib/librte_ethdev/rte_flow.c
> > b/lib/librte_ethdev/rte_flow.c index f8fdd68fe9..9905426bc9 100644
> > --- a/lib/librte_ethdev/rte_flow.c
> > +++ b/lib/librte_ethdev/rte_flow.c
> > @@ -564,7 +564,12 @@ rte_flow_conv_item_spec(void *buf, const size_t
> size,
> >               }
> >               break;
> >       default:
> > -             off = rte_flow_desc_item[item->type].size;
> > +             /**
> > +              * allow PMD private flow item
> > +              */
> > +             off = (uint32_t)item->type <= INT_MAX ?
> 
> It looks inconsistent to cast to uint32_t and compare vs INT_MAX. It should
> be either cast to 'unsigned int' or compare vs INT32_MAX. I think cast to
> 'unsigned int' is better for 'enum' values.
> But may be it should be 'int' and < 0 in fact to match the description that
> negative values are vendor-specific.
> 

I'll update the PMD private item verification to
(int)item->type < 0
in the next patch release.
The same applies to the PMD private action verification.
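
For example, the size selection inside rte_flow.c could then look roughly like the
sketch below (illustration only, not the final wording of the next revision;
flow_item_conf_size is a made-up helper name and rte_flow_desc_item is the
internal table already defined in rte_flow.c):

static size_t
flow_item_conf_size(const struct rte_flow_item *item)
{
	/* Negative item types are PMD-private; the generic layer only
	 * knows that their conf is an opaque pointer.
	 */
	if ((int)item->type < 0)
		return sizeof(void *);
	return rte_flow_desc_item[item->type].size;
}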

> > +                     rte_flow_desc_item[item->type].size :
> > +                     sizeof(void *);
> >               rte_memcpy(buf, data, (size > off ? off : size));
> >               break;
> >       }
> > @@ -667,7 +672,12 @@ rte_flow_conv_action_conf(void *buf, const size_t
> size,
> >               }
> >               break;
> >       default:
> > -             off = rte_flow_desc_action[action->type].size;
> > +             /**
> > +              * allow PMD private flow action
> > +              */
> > +             off = (uint32_t)action->type <= INT_MAX ?
> 
> Same
> 
> > +                     rte_flow_desc_action[action->type].size :
> > +                     sizeof(void *);
> >               rte_memcpy(buf, action->conf, (size > off ? off : size));
> >               break;
> >       }
> > @@ -709,8 +719,12 @@ rte_flow_conv_pattern(struct rte_flow_item *dst,
> >       unsigned int i;
> >
> >       for (i = 0, off = 0; !num || i != num; ++i, ++src, ++dst) {
> > -             if ((size_t)src->type >= RTE_DIM(rte_flow_desc_item) ||
> > -                 !rte_flow_desc_item[src->type].name)
> > +             /**
> > +              * allow PMD private flow item
> > +              */
> > +             if (((uint32_t)src->type <= INT_MAX) &&
> 
> Same
> 
> > +                     ((size_t)src->type >= RTE_DIM(rte_flow_desc_item) ||
> > +                 !rte_flow_desc_item[src->type].name))
> >                       return rte_flow_error_set
> >                               (error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM, src,
> >                                "cannot convert unknown item type"); @@
> > -798,8 +812,12 @@ rte_flow_conv_actions(struct rte_flow_action *dst,
> >       unsigned int i;
> >
> >       for (i = 0, off = 0; !num || i != num; ++i, ++src, ++dst) {
> > -             if ((size_t)src->type >= RTE_DIM(rte_flow_desc_action) ||
> > -                 !rte_flow_desc_action[src->type].name)
> > +             /**
> > +              * allow PMD private flow action
> > +              */
> > +             if (((uint32_t)src->type <= INT_MAX) &&
> 
> Same
> 
> > +                 ((size_t)src->type >= RTE_DIM(rte_flow_desc_action) ||
> > +                 !rte_flow_desc_action[src->type].name))
> >                       return rte_flow_error_set
> >                               (error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION,
> >                                src, "cannot convert unknown action
> > type");
> >


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/4] ethdev: allow negative values in flow rule types
  2020-09-15 10:27         ` Gregory Etelson
@ 2020-09-16 17:21           ` Gregory Etelson
  2020-09-17  6:49             ` Andrew Rybchenko
  0 siblings, 1 reply; 95+ messages in thread
From: Gregory Etelson @ 2020-09-16 17:21 UTC (permalink / raw)
  To: Andrew Rybchenko, Ajit Khaparde
  Cc: dpdk-dev, Matan Azrad, Raslan Darawsheh, Ori Kam, Ori Kam,
	NBU-Contact-Thomas Monjalon, Ferruh Yigit

From: Gregory Etelson 
Sent: Tuesday, September 15, 2020 13:27
To: Andrew Rybchenko <arybchenko@solarflare.com>; Ajit Khaparde <ajit.khaparde@broadcom.com>
Cc: dpdk-dev <dev@dpdk.org>; Matan Azrad <matan@nvidia.com>; Raslan Darawsheh <rasland@nvidia.com>; Ori Kam <orika@nvidia.com>; Gregory Etelson <getelson@mellanox.com>; Ori Kam <orika@mellanox.com>; NBU-Contact-Thomas Monjalon <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>
Subject: RE: [dpdk-dev] [PATCH v2 1/4] ethdev: allow negative values in flow rule types

Subject: Re: [dpdk-dev] [PATCH v2 1/4] ethdev: allow negative values in flow rule types
On 9/15/20 7:36 AM, Ajit Khaparde wrote:
On Tue, Sep 8, 2020 at 1:16 PM Gregory Etelson <mailto:getelson@nvidia.com> wrote:
From: Gregory Etelson <mailto:getelson@mellanox.com>

RTE flow items & actions use positive values in item & action type.
Negative values are reserved for PMD private types. PMD
items & actions usually are not exposed to application and are not
used to create RTE flows.

The patch allows applications with access to PMD flow
items & actions ability to integrate RTE and PMD items & actions
and use them to create flow rule.
While we are reviewing this, some quick comment/questions..

Doesn't this go against the above "PMD
items & actions usually are not exposed to application and are not
used to create RTE flows."?
Why would an application try to use PMD specific private types?
Isn't this contrary to having a standard API?

+1

I would like to clarify the purpose and use of the private elements patch.
That patch is a prerequisite for the [PATCH v2 2/4] ethdev: tunnel offload model patch.
The tunnel offload API provides a unified, hardware-independent model to offload tunneled packets,
match on packet headers in hardware and restore the outer headers of partially offloaded packets.
The model implementation depends on hardware capabilities. For example, if hardware supports inner NAT,
it can do NAT first and postpone decap to the end, while hardware that cannot do inner NAT must decap first
and run the NAT actions afterwards. Such hardware has to save the outer header in some hardware context,
register or memory, so that the application can restore the packet later, if needed. Also, in this case the
exact solution depends on the PMD because of the limited number of hardware contexts.
Although an application working with DPDK can implement all these requirements with the existing flow rules API,
it would have to address each hardware specification separately.
To remove this limitation we selected a design where the application queries the PMD for the actions, or items,
that are optimal for the hardware that the PMD represents. The result can be a mixture of RTE and PMD private
elements - it is up to the PMD implementation. The application passes these elements back to the PMD as a flow
rule recipe that is already optimal for the underlying hardware.
If the PMD put private elements into such rule items or actions, these private elements must not be rejected by
the RTE layer.
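
To make the outer header restore part concrete, here is an illustrative sketch only
(not part of the patch series): it uses the restore-info call and flags with the
names they have in the v2 testpmd patch, trims all error handling, and
report_tunnel_miss is a made-up helper name.

#include <stdio.h>
#include <stdint.h>
#include <rte_mbuf.h>
#include <rte_flow.h>

/* Inspect a packet that missed the tunnel matching rule and report
 * the tunnel context the PMD attached to it.
 */
static void
report_tunnel_miss(uint16_t port_id, struct rte_mbuf *m)
{
	struct rte_flow_restore_info info = { 0 };
	struct rte_flow_error error;

	if (rte_flow_tunnel_get_restore_info(port_id, m, &info, &error))
		return; /* no tunnel context for this packet */
	if ((info.flags & RTE_FLOW_RESTORE_INFO_TUNNEL) &&
	    info.tunnel.type == RTE_FLOW_ITEM_TYPE_VXLAN)
		printf("missed packet arrived on a vxlan tunnel\n");
	if (info.flags & RTE_FLOW_RESTORE_INFO_ENCAPSULATED)
		printf("outer header is still present on the packet\n");
	if (info.flags & RTE_FLOW_RESTORE_INFO_GROUP_ID)
		printf("miss occurred in group %u\n", info.group_id);
}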
 
I hope this helps to explain what the model is trying to achieve.
Does that clarify your concerns?



^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/4] ethdev: allow negative values in flow rule types
  2020-09-16 17:21           ` Gregory Etelson
@ 2020-09-17  6:49             ` Andrew Rybchenko
  2020-09-17  7:47               ` Ori Kam
  2020-09-17  7:56               ` Gregory Etelson
  0 siblings, 2 replies; 95+ messages in thread
From: Andrew Rybchenko @ 2020-09-17  6:49 UTC (permalink / raw)
  To: Gregory Etelson, Ajit Khaparde
  Cc: dpdk-dev, Matan Azrad, Raslan Darawsheh, Ori Kam, Ori Kam,
	NBU-Contact-Thomas Monjalon, Ferruh Yigit

On 9/16/20 8:21 PM, Gregory Etelson wrote:
> From: Gregory Etelson 
> Sent: Tuesday, September 15, 2020 13:27
> To: Andrew Rybchenko <arybchenko@solarflare.com>; Ajit Khaparde <ajit.khaparde@broadcom.com>
> Cc: dpdk-dev <dev@dpdk.org>; Matan Azrad <matan@nvidia.com>; Raslan Darawsheh <rasland@nvidia.com>; Ori Kam <orika@nvidia.com>; Gregory Etelson <getelson@mellanox.com>; Ori Kam <orika@mellanox.com>; NBU-Contact-Thomas Monjalon <thomas@monjalon.net>; Ferruh Yigit <ferruh.yigit@intel.com>
> Subject: RE: [dpdk-dev] [PATCH v2 1/4] ethdev: allow negative values in flow rule types
> 
> Subject: Re: [dpdk-dev] [PATCH v2 1/4] ethdev: allow negative values in flow rule types
> On 9/15/20 7:36 AM, Ajit Khaparde wrote:
> On Tue, Sep 8, 2020 at 1:16 PM Gregory Etelson <mailto:getelson@nvidia.com> wrote:
> From: Gregory Etelson <mailto:getelson@mellanox.com>
> 
> RTE flow items & actions use positive values in item & action type.
> Negative values are reserved for PMD private types. PMD
> items & actions usually are not exposed to application and are not
> used to create RTE flows.
> 
> The patch allows applications with access to PMD flow
> items & actions ability to integrate RTE and PMD items & actions
> and use them to create flow rule.
> While we are reviewing this, some quick comment/questions..
> 
> Doesn't this go against the above "PMD
> items & actions usually are not exposed to application and are not
> used to create RTE flows."?
> Why would an application try to use PMD specific private types?
> Isn't this contrary to having a standard API?
> 
> +1
> 
> I would like to clarify the purpose and use of private elements patch.
> That patch is prerequisite for  [PATCH v2 2/4] ethdev: tunnel offload model patch.
> The tunnel offload API provides unified hardware independent model to offload tunneled packets,
> match on packet headers in hardware and to restore outer headers of partially offloaded packets.
> The model implementation depends on hardware capabilities. For example,  if hardware supports inner nat,
> it can do nat first and postpone decap to the end, while other hardware that cannot do inner nat must decap first
> and run nat actions afterwards. Such hardware has to save outer header in some hardware context,
> register or memory, for application to restore a packet later, if needed. Also, in this case the exact solution
> depends on PMD because of limited number of hardware contexts.
> Although application working with DKDK can implement all these requirements with existing flow rules API,
> it will have to address each hardware specifications separately.
> To solve this limitation we selected design where application quires PMD for actions, or items,
> that are optimal for a hardware that PMD represents. Result can be a mixture of RTE and PMD private elements -
> it's up to PMD implementation. Application passes these elements back to PMD as a flow rule recipe
> that's already optimal for underlying hardware.
> If PMD has private elements in such rule items or actions, these private elements must not be rejected by RTE layer.
>  
> I hope it helps to understand what this model is trying to achieve. 
> Did that clarify your concerns ?

There is a very simple question which I can't answer after
reading it.
Why these PMD specific actions and items do not bind
application to a specific vendor. If it binds, it should
be clearly stated in the description. If no, I'd like to
understand why since opaque actions/items are not really
well defined and hardly portable across vendors.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/4] ethdev: allow negative values in flow rule types
  2020-09-17  6:49             ` Andrew Rybchenko
@ 2020-09-17  7:47               ` Ori Kam
  2020-09-17 15:15                 ` Andrew Rybchenko
  2020-09-17  7:56               ` Gregory Etelson
  1 sibling, 1 reply; 95+ messages in thread
From: Ori Kam @ 2020-09-17  7:47 UTC (permalink / raw)
  To: Andrew Rybchenko, Gregory Etelson, Ajit Khaparde
  Cc: dpdk-dev, Matan Azrad, Raslan Darawsheh, Ori Kam,
	NBU-Contact-Thomas Monjalon, Ferruh Yigit

Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <arybchenko@solarflare.com>
> Sent: Thursday, September 17, 2020 9:50 AM
> 
> On 9/16/20 8:21 PM, Gregory Etelson wrote:
> > From: Gregory Etelson
> > Sent: Tuesday, September 15, 2020 13:27
> > To: Andrew Rybchenko <arybchenko@solarflare.com>; Ajit Khaparde
> <ajit.khaparde@broadcom.com>
> > Cc: dpdk-dev <dev@dpdk.org>; Matan Azrad <matan@nvidia.com>; Raslan
> Darawsheh <rasland@nvidia.com>; Ori Kam <orika@nvidia.com>; Gregory
> Etelson <getelson@mellanox.com>; Ori Kam <orika@mellanox.com>; NBU-
> Contact-Thomas Monjalon <thomas@monjalon.net>; Ferruh Yigit
> <ferruh.yigit@intel.com>
> > Subject: RE: [dpdk-dev] [PATCH v2 1/4] ethdev: allow negative values in flow
> rule types
> >
> > Subject: Re: [dpdk-dev] [PATCH v2 1/4] ethdev: allow negative values in flow
> rule types
> > On 9/15/20 7:36 AM, Ajit Khaparde wrote:
> > On Tue, Sep 8, 2020 at 1:16 PM Gregory Etelson
> <mailto:getelson@nvidia.com> wrote:
> > From: Gregory Etelson <mailto:getelson@mellanox.com>
> >
> > RTE flow items & actions use positive values in item & action type.
> > Negative values are reserved for PMD private types. PMD
> > items & actions usually are not exposed to application and are not
> > used to create RTE flows.
> >
> > The patch allows applications with access to PMD flow
> > items & actions ability to integrate RTE and PMD items & actions
> > and use them to create flow rule.
> > While we are reviewing this, some quick comment/questions..
> >
> > Doesn't this go against the above "PMD
> > items & actions usually are not exposed to application and are not
> > used to create RTE flows."?
> > Why would an application try to use PMD specific private types?
> > Isn't this contrary to having a standard API?
> >
> > +1
> >
> > I would like to clarify the purpose and use of private elements patch.
> > That patch is prerequisite for  [PATCH v2 2/4] ethdev: tunnel offload model
> patch.
> > The tunnel offload API provides unified hardware independent model to
> offload tunneled packets,
> > match on packet headers in hardware and to restore outer headers of
> partially offloaded packets.
> > The model implementation depends on hardware capabilities. For example,
> if hardware supports inner nat,
> > it can do nat first and postpone decap to the end, while other hardware that
> cannot do inner nat must decap first
> > and run nat actions afterwards. Such hardware has to save outer header in
> some hardware context,
> > register or memory, for application to restore a packet later, if needed. Also,
> in this case the exact solution
> > depends on PMD because of limited number of hardware contexts.
> > Although application working with DPDK can implement all these
> requirements with existing flow rules API,
> > it will have to address each hardware specifications separately.
> > To solve this limitation we selected design where application queries PMD for
> actions, or items,
> > that are optimal for a hardware that PMD represents. Result can be a
> mixture of RTE and PMD private elements -
> > it's up to PMD implementation. Application passes these elements back to
> PMD as a flow rule recipe
> > that's already optimal for underlying hardware.
> > If PMD has private elements in such rule items or actions, these private
> elements must not be rejected by RTE layer.
> >
> > I hope it helps to understand what this model is trying to achieve.
> > Did that clarify your concerns ?
> 
> There is a very simple question which I can't answer after
> reading it.
> Why these PMD specific actions and items do not bind
> application to a specific vendor. If it binds, it should
> be clearly stated in the description. If no, I'd like to
> understand why since opaque actions/items are not really
> well defined and hardly portable across vendors.

You are correct: when looking at this patch as a standalone patch,
using such actions / items does bind the application to a specific PMD.
First, sometimes that is required; for example, one vendor may introduce a private
action to support a key customer, or to enable a feature that is not supported by
the standard rte_flow API.

The main reason for this patch is the tunnel API [1]. As stated in the reply
from Gregory, the tunnel API exposes a public function that returns a list of
actions / items. The list is generated by the PMD, so using the API is not binding,
since it is generic. The returned actions / items are private, but the application
is not aware of that; from its point of view it called a generic function
and got actions that are configured to do the requested job. All the application needs
to do is pass those actions / items as ordinary actions / items when calling flow create.
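
To illustrate (this is only a sketch, not code from the patch; it assumes the
rte_flow_tunnel_match() helper and struct rte_flow_tunnel from the tunnel API
series, and that port_id is already known), the same application code works
for any vendor:

struct rte_flow_tunnel tunnel = { .type = RTE_FLOW_ITEM_TYPE_VXLAN, .tun_id = 10 };
struct rte_flow_item *pmd_items;
uint32_t num_pmd_items;
struct rte_flow_error error;

/* the PMD fills pmd_items; their content is opaque to the application */
rte_flow_tunnel_match(port_id, &tunnel, &pmd_items, &num_pmd_items, &error);

/* the application only concatenates pmd_items with its own match items
 * and passes the combined pattern to rte_flow_create() as usual */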

Does this answer your question?

[1] https://patches.dpdk.org/patch/76931/


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/4] ethdev: allow negative values in flow rule types
  2020-09-17  6:49             ` Andrew Rybchenko
  2020-09-17  7:47               ` Ori Kam
@ 2020-09-17  7:56               ` Gregory Etelson
  2020-09-17 15:18                 ` Andrew Rybchenko
  1 sibling, 1 reply; 95+ messages in thread
From: Gregory Etelson @ 2020-09-17  7:56 UTC (permalink / raw)
  To: Andrew Rybchenko, Ajit Khaparde
  Cc: dpdk-dev, Matan Azrad, Raslan Darawsheh, Ori Kam, Ori Kam,
	NBU-Contact-Thomas Monjalon, Ferruh Yigit

> On 9/16/20 8:21 PM, Gregory Etelson wrote:
> > From: Gregory Etelson
> > Sent: Tuesday, September 15, 2020 13:27
> > To: Andrew Rybchenko <arybchenko@solarflare.com>; Ajit Khaparde
> > <ajit.khaparde@broadcom.com>
> > Cc: dpdk-dev <dev@dpdk.org>; Matan Azrad <matan@nvidia.com>; Raslan
> > Darawsheh <rasland@nvidia.com>; Ori Kam <orika@nvidia.com>; Gregory
> > Etelson <getelson@mellanox.com>; Ori Kam <orika@mellanox.com>;
> > NBU-Contact-Thomas Monjalon <thomas@monjalon.net>; Ferruh Yigit
> > <ferruh.yigit@intel.com>
> > Subject: RE: [dpdk-dev] [PATCH v2 1/4] ethdev: allow negative values
> > in flow rule types
> >
> > Subject: Re: [dpdk-dev] [PATCH v2 1/4] ethdev: allow negative values
> > in flow rule types On 9/15/20 7:36 AM, Ajit Khaparde wrote:
> > On Tue, Sep 8, 2020 at 1:16 PM Gregory Etelson
> <mailto:getelson@nvidia.com> wrote:
> > From: Gregory Etelson <mailto:getelson@mellanox.com>
> >
> > RTE flow items & actions use positive values in item & action type.
> > Negative values are reserved for PMD private types. PMD items &
> > actions usually are not exposed to application and are not used to
> > create RTE flows.
> >
> > The patch allows applications with access to PMD flow items & actions
> > ability to integrate RTE and PMD items & actions and use them to
> > create flow rule.
> > While we are reviewing this, some quick comment/questions..
> >
> > Doesn't this go against the above "PMD items & actions usually are not
> > exposed to application and are not used to create RTE flows."?
> > Why would an application try to use PMD specific private types?
> > Isn't this contrary to having a standard API?
> >
> > +1
> >
> > I would like to clarify the purpose and use of private elements patch.
> > That patch is prerequisite for  [PATCH v2 2/4] ethdev: tunnel offload model
> patch.
> > The tunnel offload API provides unified hardware independent model to
> > offload tunneled packets, match on packet headers in hardware and to
> restore outer headers of partially offloaded packets.
> > The model implementation depends on hardware capabilities. For
> > example,  if hardware supports inner nat, it can do nat first and
> > postpone decap to the end, while other hardware that cannot do inner
> > nat must decap first and run nat actions afterwards. Such hardware has
> > to save outer header in some hardware context, register or memory, for
> application to restore a packet later, if needed. Also, in this case the exact
> solution depends on PMD because of limited number of hardware contexts.
> > Although application working with DPDK can implement all these
> > requirements with existing flow rules API, it will have to address each
> hardware specifications separately.
> > To solve this limitation we selected a design where the application queries
> > PMD for actions, or items, that are optimal for a hardware that PMD
> > represents. Result can be a mixture of RTE and PMD private elements -
> > it's up to PMD implementation. Application passes these elements back to
> PMD as a flow rule recipe that's already optimal for underlying hardware.
> > If PMD has private elements in such rule items or actions, these private
> elements must not be rejected by RTE layer.
> >
> > I hope it helps to understand what this model is trying to achieve.
> > Did that clarify your concerns ?
> 
> There is a very simple question which I can't answer after reading it.
> Why these PMD specific actions and items do not bind application to a
> specific vendor. If it binds, it should be clearly stated in the description. If no,
> I'd like to understand why since opaque actions/items are not really well
> defined and hardly portable across vendors.

The Tunnel Offload API does not bind the application to a vendor.
One of the main goals of the model is to provide the application with a
vendor/hardware independent solution.
The PMD transfers an array of actions / items to the application. The application
passes that array back to the PMD, as opaque data, in rte_flow_create(), without
inspecting the array content. Therefore, if there are internal PMD actions in the
array, they have no effect on the application.
Consider the following application code example:

/* get PMD actions that implement tunnel offload */
rte_tunnel_decap_set(&tunnel, &pmd_actions, pmd_actions_num, error);

/* compile an array of actions to create the flow rule */
memcpy(actions, pmd_actions, pmd_actions_num * sizeof(actions[0]));
memcpy(actions + pmd_actions_num, app_actions, app_actions_num * sizeof(actions[0]));

/* create the flow rule */
rte_flow_create(port_id, attr, pattern, actions, error);

Vendor A provides pmd_actions_A = {va1, va2 …. vaN}
Vendor B provides pmd_actions_B = {vb1}
Regardless of the pmd_actions content, the application code does not change.
However, each PMD receives the exact, hardware-specific actions for tunnel offload.
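
(An illustrative continuation, not part of the original mail: once the steering
rule exists, the PMD-allocated action array can be returned with the release
helper defined in the same series.)

/* the PMD-allocated actions are no longer needed after rule creation */
rte_flow_tunnel_action_decap_release(port_id, pmd_actions, pmd_actions_num, error);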

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/4] ethdev: allow negative values in flow rule types
  2020-09-17  7:47               ` Ori Kam
@ 2020-09-17 15:15                 ` Andrew Rybchenko
  0 siblings, 0 replies; 95+ messages in thread
From: Andrew Rybchenko @ 2020-09-17 15:15 UTC (permalink / raw)
  To: Ori Kam, Gregory Etelson, Ajit Khaparde
  Cc: dpdk-dev, Matan Azrad, Raslan Darawsheh, Ori Kam,
	NBU-Contact-Thomas Monjalon, Ferruh Yigit

Hi Ori,

On 9/17/20 10:47 AM, Ori Kam wrote:
> Hi Andrew,
> 
>> -----Original Message-----
>> From: Andrew Rybchenko <arybchenko@solarflare.com>
>> Sent: Thursday, September 17, 2020 9:50 AM
>>
>> On 9/16/20 8:21 PM, Gregory Etelson wrote:
>>> From: Gregory Etelson
>>> Sent: Tuesday, September 15, 2020 13:27
>>> To: Andrew Rybchenko <arybchenko@solarflare.com>; Ajit Khaparde
>> <ajit.khaparde@broadcom.com>
>>> Cc: dpdk-dev <dev@dpdk.org>; Matan Azrad <matan@nvidia.com>; Raslan
>> Darawsheh <rasland@nvidia.com>; Ori Kam <orika@nvidia.com>; Gregory
>> Etelson <getelson@mellanox.com>; Ori Kam <orika@mellanox.com>; NBU-
>> Contact-Thomas Monjalon <thomas@monjalon.net>; Ferruh Yigit
>> <ferruh.yigit@intel.com>
>>> Subject: RE: [dpdk-dev] [PATCH v2 1/4] ethdev: allow negative values in flow
>> rule types
>>>
>>> Subject: Re: [dpdk-dev] [PATCH v2 1/4] ethdev: allow negative values in flow
>> rule types
>>> On 9/15/20 7:36 AM, Ajit Khaparde wrote:
>>> On Tue, Sep 8, 2020 at 1:16 PM Gregory Etelson
>> <mailto:getelson@nvidia.com> wrote:
>>> From: Gregory Etelson <mailto:getelson@mellanox.com>
>>>
>>> RTE flow items & actions use positive values in item & action type.
>>> Negative values are reserved for PMD private types. PMD
>>> items & actions usually are not exposed to application and are not
>>> used to create RTE flows.
>>>
>>> The patch allows applications with access to PMD flow
>>> items & actions ability to integrate RTE and PMD items & actions
>>> and use them to create flow rule.
>>> While we are reviewing this, some quick comment/questions..
>>>
>>> Doesn't this go against the above "PMD
>>> items & actions usually are not exposed to application and are not
>>> used to create RTE flows."?
>>> Why would an application try to use PMD specific private types?
>>> Isn't this contrary to having a standard API?
>>>
>>> +1
>>>
>>> I would like to clarify the purpose and use of private elements patch.
>>> That patch is prerequisite for  [PATCH v2 2/4] ethdev: tunnel offload model
>> patch.
>>> The tunnel offload API provides unified hardware independent model to
>> offload tunneled packets,
>>> match on packet headers in hardware and to restore outer headers of
>> partially offloaded packets.
>>> The model implementation depends on hardware capabilities. For example,
>> if hardware supports inner nat,
>>> it can do nat first and postpone decap to the end, while other hardware that
>> cannot do inner nat must decap first
>>> and run nat actions afterwards. Such hardware has to save outer header in
>> some hardware context,
>>> register or memory, for application to restore a packet later, if needed. Also,
>> in this case the exact solution
>>> depends on PMD because of limited number of hardware contexts.
>>> Although application working with DPDK can implement all these
>> requirements with existing flow rules API,
>>> it will have to address each hardware specifications separately.
>>> To solve this limitation we selected design where application queries PMD for
>> actions, or items,
>>> that are optimal for a hardware that PMD represents. Result can be a
>> mixture of RTE and PMD private elements -
>>> it's up to PMD implementation. Application passes these elements back to
>> PMD as a flow rule recipe
>>> that's already optimal for underlying hardware.
>>> If PMD has private elements in such rule items or actions, these private
>> elements must not be rejected by RTE layer.
>>>
>>> I hope it helps to understand what this model is trying to achieve.
>>> Did that clarify your concerns ?
>>
>> There is a very simple question which I can't answer after
>> reading it.
>> Why these PMD specific actions and items do not bind
>> application to a specific vendor. If it binds, it should
>> be clearly stated in the description. If no, I'd like to
>> understand why since opaque actions/items are not really
>> well defined and hardly portable across vendors.
> 
> You are correct, when looking at this patch as a standalone
> patch using such action / items does bind the application to specific PMD.
> first sometimes it is required, for example one vendor may introduce private action
> to support some key customer, or enable feature that is not supported using standard rte flow API.

Good. So I understand it correctly.

> The main reason for this patch is the tunnel API[1] as stated in the reply
> from Gregory, the tunnel API exposes a public function that returns a list of
> actions / items. The list is generated by the PMD, so using the API is not binding
> since it is generic, but the action / items returned are private but the application is not aware of those actions / items, from its point of view it called a generic function
> and got actions that are configured to do the requested job. All the application needs to do is send the actions / item as actions / item when calling flow create.
> 
> Does this answer your question?

Yes, many thanks. I'm still trying to put the feature design in my head
and your explanations helped a lot. Maybe it is just the same words as
before, but in a bit different order and with different emphasis.

> [1] https://patches.dpdk.org/patch/76931/
> 


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/4] ethdev: allow negative values in flow rule types
  2020-09-17  7:56               ` Gregory Etelson
@ 2020-09-17 15:18                 ` Andrew Rybchenko
  0 siblings, 0 replies; 95+ messages in thread
From: Andrew Rybchenko @ 2020-09-17 15:18 UTC (permalink / raw)
  To: Gregory Etelson, Ajit Khaparde
  Cc: dpdk-dev, Matan Azrad, Raslan Darawsheh, Ori Kam, Ori Kam,
	NBU-Contact-Thomas Monjalon, Ferruh Yigit

On 9/17/20 10:56 AM, Gregory Etelson wrote:
>> On 9/16/20 8:21 PM, Gregory Etelson wrote:
>>> From: Gregory Etelson
>>> Sent: Tuesday, September 15, 2020 13:27
>>> To: Andrew Rybchenko <arybchenko@solarflare.com>; Ajit Khaparde
>>> <ajit.khaparde@broadcom.com>
>>> Cc: dpdk-dev <dev@dpdk.org>; Matan Azrad <matan@nvidia.com>; Raslan
>>> Darawsheh <rasland@nvidia.com>; Ori Kam <orika@nvidia.com>; Gregory
>>> Etelson <getelson@mellanox.com>; Ori Kam <orika@mellanox.com>;
>>> NBU-Contact-Thomas Monjalon <thomas@monjalon.net>; Ferruh Yigit
>>> <ferruh.yigit@intel.com>
>>> Subject: RE: [dpdk-dev] [PATCH v2 1/4] ethdev: allow negative values
>>> in flow rule types
>>>
>>> Subject: Re: [dpdk-dev] [PATCH v2 1/4] ethdev: allow negative values
>>> in flow rule types On 9/15/20 7:36 AM, Ajit Khaparde wrote:
>>> On Tue, Sep 8, 2020 at 1:16 PM Gregory Etelson
>> <mailto:getelson@nvidia.com> wrote:
>>> From: Gregory Etelson <mailto:getelson@mellanox.com>
>>>
>>> RTE flow items & actions use positive values in item & action type.
>>> Negative values are reserved for PMD private types. PMD items &
>>> actions usually are not exposed to application and are not used to
>>> create RTE flows.
>>>
>>> The patch allows applications with access to PMD flow items & actions
>>> ability to integrate RTE and PMD items & actions and use them to
>>> create flow rule.
>>> While we are reviewing this, some quick comment/questions..
>>>
>>> Doesn't this go against the above "PMD items & actions usually are not
>>> exposed to application and are not used to create RTE flows."?
>>> Why would an application try to use PMD specific private types?
>>> Isn't this contrary to having a standard API?
>>>
>>> +1
>>>
>>> I would like to clarify the purpose and use of private elements patch.
>>> That patch is prerequisite for  [PATCH v2 2/4] ethdev: tunnel offload model
>> patch.
>>> The tunnel offload API provides unified hardware independent model to
>>> offload tunneled packets, match on packet headers in hardware and to
>> restore outer headers of partially offloaded packets.
>>> The model implementation depends on hardware capabilities. For
>>> example,  if hardware supports inner nat, it can do nat first and
>>> postpone decap to the end, while other hardware that cannot do inner
>>> nat must decap first and run nat actions afterwards. Such hardware has
>>> to save outer header in some hardware context, register or memory, for
>> application to restore a packet later, if needed. Also, in this case the exact
>> solution depends on PMD because of limited number of hardware contexts.
>>> Although application working with DPDK can implement all these
>>> requirements with existing flow rules API, it will have to address each
>> hardware specifications separately.
>>> To solve this limitation we selected a design where the application queries
>>> PMD for actions, or items, that are optimal for a hardware that PMD
>>> represents. Result can be a mixture of RTE and PMD private elements -
>>> it's up to PMD implementation. Application passes these elements back to
>> PMD as a flow rule recipe that's already optimal for underlying hardware.
>>> If PMD has private elements in such rule items or actions, these private
>> elements must not be rejected by RTE layer.
>>>
>>> I hope it helps to understand what this model is trying to achieve.
>>> Did that clarify your concerns ?
>>
>> There is a very simple question which I can't answer after reading it.
>> Why these PMD specific actions and items do not bind application to a
>> specific vendor. If it binds, it should be clearly stated in the description. If no,
>> I'd like to understand why since opaque actions/items are not really well
>> defined and hardly portable across vendors.
> 
> Tunnel Offload API does not bind application to a vendor.
> One of the main goals of that model is to provide application with vendor/hardware independent solution.
> PMD transfer to application an array of items. Application passes that array back to PMD as opaque data,
> in rte_flow_create(), without reviewing the array content. Therefore, if there are internal PMD actions in the array,
> they have no effect on application.
> Consider the following application code example:
> 
> /* get PMD actions that implement tunnel offload */
> rte_tunnel_decap_set(&tunnel, &pmd_actions, pmd_actions_num, error);
> 
> /* compile an array of actions to create flow rule */
> memcpy(actions, pmd_actions,  pmd_actions_num * sizeof(actions[0]));
> memcpy(actions + pmd_actions_num, app_actions, app_actions_num * sizeof(actions[0]));
> 
> /* create flow rule*/
> rte_flow_create(port_id, attr, pattern, actions, error);
> 
> vendor A provides pmd_actions_A = {va1, va2 …. vaN}
> vendor B provides pmd_actions_B = {vb1}
> Regardless of pmd_actions content, application code will not change.
> However, each PMD will receive exact, hardware related, actions for tunnel offload.
> 

Many thanks for explanations. I got it. I'll wait for the next version
to take a look at code once again.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* [dpdk-dev] [PATCH v3 0/4] Tunnel Offload API
  2020-06-25 16:03 [dpdk-dev] [PATCH 0/2] ethdev: tunnel offload model Gregory Etelson
                   ` (3 preceding siblings ...)
  2020-09-08 20:15 ` [dpdk-dev] [PATCH v2 0/4] Tunnel Offload API Gregory Etelson
@ 2020-09-30  9:18 ` Gregory Etelson
  2020-09-30  9:18   ` [dpdk-dev] [PATCH v3 1/4] ethdev: allow negative values in flow rule types Gregory Etelson
                     ` (3 more replies)
  2020-10-04 13:50 ` [dpdk-dev] [PATCH v4 0/4] Tunnel Offload API Gregory Etelson
                   ` (12 subsequent siblings)
  17 siblings, 4 replies; 95+ messages in thread
From: Gregory Etelson @ 2020-09-30  9:18 UTC (permalink / raw)
  To: dev; +Cc: getelson, matan, rasland

Tunnel Offload API provides hardware independent, unified model
to offload tunneled traffic. Key model elements are:
 - apply matches to both outer and inner packet headers
   during entire offload procedure;
 - restore outer header of partially offloaded packet;
 - model is implemented as a set of helper functions.

Eli Britstein (1):
  ethdev: tunnel offload model

Gregory Etelson (3):
  ethdev: allow negative values in flow rule types
  net/mlx5: implement tunnel offload API
  app/testpmd: add commands for tunnel offload API

 app/test-pmd/cmdline_flow.c                 | 170 ++++-
 app/test-pmd/config.c                       | 253 +++++++-
 app/test-pmd/testpmd.c                      |   5 +-
 app/test-pmd/testpmd.h                      |  34 +-
 app/test-pmd/util.c                         |  35 +-
 doc/guides/nics/mlx5.rst                    |   3 +
 doc/guides/prog_guide/rte_flow.rst          | 105 +++
 doc/guides/rel_notes/release_20_11.rst      |  10 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  49 ++
 drivers/net/mlx5/linux/mlx5_os.c            |  18 +
 drivers/net/mlx5/mlx5.c                     |   8 +-
 drivers/net/mlx5/mlx5.h                     |   3 +
 drivers/net/mlx5/mlx5_defs.h                |   2 +
 drivers/net/mlx5/mlx5_flow.c                | 678 +++++++++++++++++++-
 drivers/net/mlx5/mlx5_flow.h                | 173 ++++-
 drivers/net/mlx5/mlx5_flow_dv.c             | 241 ++++++-
 lib/librte_ethdev/rte_ethdev_version.map    |   6 +
 lib/librte_ethdev/rte_flow.c                | 140 +++-
 lib/librte_ethdev/rte_flow.h                | 195 ++++++
 lib/librte_ethdev/rte_flow_driver.h         |  32 +
 20 files changed, 2095 insertions(+), 65 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 95+ messages in thread

* [dpdk-dev] [PATCH v3 1/4] ethdev: allow negative values in flow rule types
  2020-09-30  9:18 ` [dpdk-dev] [PATCH v3 0/4] Tunnel Offload API Gregory Etelson
@ 2020-09-30  9:18   ` Gregory Etelson
  2020-10-04  5:40     ` Ajit Khaparde
  2020-09-30  9:18   ` [dpdk-dev] [PATCH v3 2/4] ethdev: tunnel offload model Gregory Etelson
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 95+ messages in thread
From: Gregory Etelson @ 2020-09-30  9:18 UTC (permalink / raw)
  To: dev
  Cc: getelson, matan, rasland, Gregory Etelson, Ori Kam,
	Viacheslav Ovsiienko, Ori Kam, Thomas Monjalon, Ferruh Yigit,
	Andrew Rybchenko

From: Gregory Etelson <getelson@mellanox.com>

RTE flow items & actions use positive values in item & action type.
Negative values are reserved for PMD private types. PMD
items & actions usually are not exposed to application and are not
used to create RTE flows.

The patch gives applications that have access to PMD flow
items & actions the ability to integrate RTE and PMD items & actions
and use them to create a flow rule.

RTE flow library functions cannot work with PMD private items and
actions (elements) because RTE flow has no API to query PMD flow
object size. In the patch, PMD flow elements use an object pointer.
RTE flow library functions handle PMD element object size as the
size of a pointer. PMD handles its objects internally.
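
For illustration only (a hypothetical helper, not part of this patch), the sign
convention above lets a caller detect a PMD private element with a simple check:

/* hypothetical helper: negative type values mark PMD private elements */
static inline int
flow_elem_is_pmd_private(int type)
{
	return type < 0;
}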

Signed-off-by: Gregory Etelson <getelson@mellanox.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 lib/librte_ethdev/rte_flow.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
index f8fdd68fe9..c8c6d62a8b 100644
--- a/lib/librte_ethdev/rte_flow.c
+++ b/lib/librte_ethdev/rte_flow.c
@@ -564,7 +564,11 @@ rte_flow_conv_item_spec(void *buf, const size_t size,
 		}
 		break;
 	default:
-		off = rte_flow_desc_item[item->type].size;
+		/**
+		 * allow PMD private flow item
+		 */
+		off = (int)item->type >= 0 ?
+		      rte_flow_desc_item[item->type].size : sizeof(void *);
 		rte_memcpy(buf, data, (size > off ? off : size));
 		break;
 	}
@@ -667,7 +671,11 @@ rte_flow_conv_action_conf(void *buf, const size_t size,
 		}
 		break;
 	default:
-		off = rte_flow_desc_action[action->type].size;
+		/**
+		 * allow PMD private flow action
+		 */
+		off = (int)action->type >= 0 ?
+		      rte_flow_desc_action[action->type].size : sizeof(void *);
 		rte_memcpy(buf, action->conf, (size > off ? off : size));
 		break;
 	}
@@ -709,8 +717,12 @@ rte_flow_conv_pattern(struct rte_flow_item *dst,
 	unsigned int i;
 
 	for (i = 0, off = 0; !num || i != num; ++i, ++src, ++dst) {
-		if ((size_t)src->type >= RTE_DIM(rte_flow_desc_item) ||
-		    !rte_flow_desc_item[src->type].name)
+		/**
+		 * allow PMD private flow item
+		 */
+		if (((int)src->type >= 0) &&
+			((size_t)src->type >= RTE_DIM(rte_flow_desc_item) ||
+		    !rte_flow_desc_item[src->type].name))
 			return rte_flow_error_set
 				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM, src,
 				 "cannot convert unknown item type");
@@ -798,8 +810,12 @@ rte_flow_conv_actions(struct rte_flow_action *dst,
 	unsigned int i;
 
 	for (i = 0, off = 0; !num || i != num; ++i, ++src, ++dst) {
-		if ((size_t)src->type >= RTE_DIM(rte_flow_desc_action) ||
-		    !rte_flow_desc_action[src->type].name)
+		/**
+		 * allow PMD private flow action
+		 */
+		if (((int)src->type >= 0) &&
+		    ((size_t)src->type >= RTE_DIM(rte_flow_desc_action) ||
+		    !rte_flow_desc_action[src->type].name))
 			return rte_flow_error_set
 				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION,
 				 src, "cannot convert unknown action type");
-- 
2.25.1


^ permalink raw reply	[flat|nested] 95+ messages in thread

* [dpdk-dev] [PATCH v3 2/4] ethdev: tunnel offload model
  2020-09-30  9:18 ` [dpdk-dev] [PATCH v3 0/4] Tunnel Offload API Gregory Etelson
  2020-09-30  9:18   ` [dpdk-dev] [PATCH v3 1/4] ethdev: allow negative values in flow rule types Gregory Etelson
@ 2020-09-30  9:18   ` Gregory Etelson
  2020-09-30  9:18   ` [dpdk-dev] [PATCH v3 3/4] net/mlx5: implement tunnel offload API Gregory Etelson
  2020-09-30  9:18   ` [dpdk-dev] [PATCH v3 4/4] app/testpmd: add commands for " Gregory Etelson
  3 siblings, 0 replies; 95+ messages in thread
From: Gregory Etelson @ 2020-09-30  9:18 UTC (permalink / raw)
  To: dev
  Cc: getelson, matan, rasland, Eli Britstein, Ori Kam,
	Viacheslav Ovsiienko, Ori Kam, John McNamara, Marko Kovacevic,
	Ray Kinsella, Neil Horman, Thomas Monjalon, Ferruh Yigit,
	Andrew Rybchenko

From: Eli Britstein <elibr@mellanox.com>

Rte_flow API provides the building blocks for vendor agnostic flow
classification offloads.  The rte_flow match and action primitives are
fine grained, thus enabling DPDK applications the flexibility to
offload network stacks and complex pipelines.

Applications wishing to offload complex data structures (e.g. tunnel
virtual ports) are required to use the rte_flow primitives, such as
group, meta, mark, tag and others to model their high level objects.

The hardware model design for high level software objects is not
trivial.  Furthermore, an optimal design is often vendor specific.

The goal of this API is to provide applications with the hardware
offload model for common high level software objects which is optimal
in regards to the underlying hardware.

Tunnel ports are the first of such objects.

Tunnel ports
------------
Ingress processing of tunneled traffic requires the classification of
the tunnel type followed by a decap action.

In software, once a packet is decapsulated, the in_port field is
changed to a virtual port representing the tunnel type. The outer
header fields are stored as packet metadata members and may be matched
by subsequent flows.

Openvswitch, for example, uses two flows:
1. classification flow - sets the virtual port representing the tunnel
   type. For example: match on udp port 4789,
   actions=tnl_pop(vxlan_vport).
2. steering flow according to outer and inner header matches - match on
   in_port=vxlan_vport and outer/inner header matches,
   actions=forward to port X.
The benefits of multi-flow tables are described in [1].

Offloading tunnel ports
-----------------------
Tunnel ports introduce a new stateless field that can be matched on.
Currently the rte_flow library provides an API to encap, decap and
match on tunnel headers. However, there is no rte_flow primitive to
set and match tunnel virtual ports.

There are several possible hardware models for offloading virtual
tunnel port flows including, but not limited to, the following:
1. Setting the virtual port on a hw register using the
rte_flow_action_mark/ rte_flow_action_tag/rte_flow_set_meta objects.
2. Mapping a virtual port to an rte_flow group
3. Avoiding the need to match on transient objects by merging
multi-table flows to a single rte_flow rule.

Every approach has its pros and cons.  The preferred approach should
take into account the entire system architecture and is very often
vendor specific.

The proposed rte_flow_tunnel_decap_set helper function (drafted below)
is designed to provide a common, vendor agnostic, API for setting the
virtual port value.  The helper API enables PMD implementations to
return vendor specific combination of rte_flow actions realizing the
vendor's hardware model for setting a tunnel port.  Applications may
append the list of actions returned from the helper function when
creating an rte_flow rule in hardware.

Similarly, the rte_flow_tunnel_match helper (drafted below)
allows for multiple hardware implementations to return a list of
rte_flow items.

Miss handling
-------------
Packets going through multiple rte_flow groups are exposed to hw
misses due to partial packet processing. In such cases, the software
should continue the packet's processing from the point where the
hardware missed.

We propose a generic rte_flow_restore structure providing the state
that was stored in hardware when the packet missed.

Currently, the structure will provide the tunnel state of the packet
that missed, namely:
1. The group id that missed
2. The tunnel port that missed
3. Tunnel information that was stored in memory (due to decap action).
In the future, we may add additional fields as more state may be
stored in the device memory (e.g. ct_state).

Applications may query the state via a new
rte_flow_get_restore_info(mbuf) API, thus allowing
a vendor specific implementation.
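
A minimal datapath sketch (illustration only, using the structures and flags
added by this patch) of how an application consumes the restore information:

struct rte_flow_restore_info info;
struct rte_flow_error error;

/* for a packet received after a partial hardware miss */
if (rte_flow_get_restore_info(port_id, mbuf, &info, &error) == 0 &&
    (info.flags & RTE_FLOW_RESTORE_INFO_TUNNEL)) {
	/* info.tunnel holds the outer headers already matched by hw;
	 * when RTE_FLOW_RESTORE_INFO_GROUP_ID is set, info.group_id is
	 * the group where the miss occurred, so software can resume
	 * processing from that point.
	 */
}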

VXLAN Code example:
Assume the application needs to do inner NAT on a VXLAN packet.
The first rule in group 0:

flow create <port id> ingress group 0
  pattern eth / ipv4 / udp dst is 4789 / vxlan / end
  actions {pmd actions} / jump group 3 / end

The first VXLAN packet that arrives matches the rule in group 0 and jumps
to group 3. In group 3 the packet will miss, since there is no flow to
match, and will be uploaded to the application. The application will call
rte_flow_get_restore_info() to get the packet's outer header.
The application will then insert a new rule in group 3 to match outer and
inner headers:

flow create <port id> ingress group 3
  pattern {pmd items} / eth / ipv4 dst is 172.10.10.1 /
          udp dst 4789 / vxlan vni is 10 /
          ipv4 dst is 184.1.2.3 / end
  actions  set_ipv4_dst  186.1.1.1 / queue index 3 / end

The result of these rules is that a VXLAN packet with vni=10, outer IPv4
dst=172.10.10.1 and inner IPv4 dst=184.1.2.3 will be received decapsulated
on queue 3 with inner IPv4 dst=186.1.1.1.

Note: a packet in group 3 is considered decapsulated. All actions in that
group are applied to the header that was the inner header before decap.
The application may still specify outer header fields to match on; it is
the PMD's responsibility to translate these items to outer metadata.

API usage:
/**
 * 1. Initiate RTE flow tunnel object
 */
const struct rte_flow_tunnel tunnel = {
  .type = RTE_FLOW_ITEM_TYPE_VXLAN,
  .tun_id = 10,
}

/**
 * 2. Obtain PMD tunnel actions
 *
 * pmd_actions is an intermediate variable application uses to
 * compile actions array
 */
struct rte_flow_action **pmd_actions;
rte_flow_tunnel_decap_set(port_id, &tunnel, &pmd_actions,
                          &num_pmd_actions, &error);

/**
 * 3. offload the first rule:
 * match VXLAN traffic and jump to group 3
 * (implicitly decaps the packet)
 */
app_actions  =   jump group 3
rule_items = app_items;  /** eth / ipv4 / udp / vxlan  */
rule_actions = { pmd_actions, app_actions };
attr.group = 0;
flow_1 = rte_flow_create(port_id, &attr,
                         rule_items, rule_actions, &error);
/**
  * 4. after flow creation application does not need to keep tunnel
  * action resources.
  */
rte_flow_tunnel_action_decap_release(port_id, pmd_actions,
                                     num_pmd_actions, &error);

/**
  * 5. After a partially offloaded packet misses because there was no
  * matching rule, handle the miss on group 3
  */
struct rte_flow_restore_info info;
rte_flow_get_restore_info(port_id, mbuf, &info, &error);

/**
 * 6. Offload NAT rule:
 */
app_items = { eth / ipv4 dst is 172.10.10.1 / udp dst 4789 /
            vxlan vni is 10 / ipv4 dst is 184.1.2.3 }
app_actions = { set_ipv4_dst 186.1.1.1 / queue index 3 }

rte_flow_tunnel_match(port_id, &info.tunnel, &pmd_items,
                      &num_pmd_items, &error);
rule_items = {pmd_items, app_items};
rule_actions = app_actions;
attr.group = info.group_id;
flow_2 = rte_flow_create(port_id, &attr,
                         rule_items, rule_actions, &error);

/**
 * 7. Release PMD items after rule creation
 */
rte_flow_tunnel_item_release(port_id, pmd_items, num_pmd_items, &error);

References
1. https://mails.dpdk.org/archives/dev/2020-June/index.html

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 doc/guides/prog_guide/rte_flow.rst       | 105 ++++++++++++
 doc/guides/rel_notes/release_20_11.rst   |  10 ++
 lib/librte_ethdev/rte_ethdev_version.map |   6 +
 lib/librte_ethdev/rte_flow.c             | 112 +++++++++++++
 lib/librte_ethdev/rte_flow.h             | 195 +++++++++++++++++++++++
 lib/librte_ethdev/rte_flow_driver.h      |  32 ++++
 6 files changed, 460 insertions(+)

diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 119b128739..e62030150e 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -3031,6 +3031,111 @@ operations include:
 - Duplication of a complete flow rule description.
 - Pattern item or action name retrieval.
 
+Tunneled traffic offload
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Provide software application with unified rules model for tunneled traffic
+regardless of the underlying hardware.
+
+ - The model introduces a concept of a virtual tunnel port (VTP).
+ - The model uses VTP to offload ingress tunneled network traffic 
+   with RTE flow rules.
+ - The model is implemented as set of helper functions. Each PMD
+   implements VTP offload according to underlying hardware offload
+   capabilities.  Applications must query PMD for VTP flow
+   items / actions before using in creation of a VTP flow rule.
+
+The model components:
+
+- Virtual Tunnel Port (VTP) is a stateless software object that
+  describes tunneled network traffic.  VTP object usually contains
+  descriptions of outer headers, tunnel headers and inner headers.
+- Tunnel Steering flow Rule (TSR) detects tunneled packets and
+  delegates them to tunnel processing infrastructure, implemented
+  in PMD for optimal hardware utilization, for further processing.
+- Tunnel Matching flow Rule (TMR) verifies packet configuration and
+  runs offload actions in case of a match.
+
+Application actions:
+
+1 Initialize VTP object according to tunnel network parameters.
+
+2 Create TSR flow rule.
+
+2.1 Query PMD for VTP actions. Application can query for VTP actions more than once.
+
+  .. code-block:: c
+
+    int
+    rte_flow_tunnel_decap_set(uint16_t port_id,
+                              struct rte_flow_tunnel *tunnel,
+                              struct rte_flow_action **pmd_actions,
+                              uint32_t *num_of_pmd_actions,
+                              struct rte_flow_error *error);
+
+2.2 Integrate PMD actions into TSR actions list.
+
+2.3 Create TSR flow rule.
+
+    .. code-block:: console
+
+      flow create <port> group 0 match {tunnel items} / end actions {PMD actions} / {App actions} / end
+
+3 Create TMR flow rule.
+
+3.1 Query PMD for VTP items. Application can query for VTP items more than once.
+
+    .. code-block:: c
+
+      int
+      rte_flow_tunnel_match(uint16_t port_id,
+                            struct rte_flow_tunnel *tunnel,
+                            struct rte_flow_item **pmd_items,
+                            uint32_t *num_of_pmd_items,
+                            struct rte_flow_error *error);
+
+3.2 Integrate PMD items into TMR items list.
+
+3.3 Create TMR flow rule.
+
+    .. code-block:: console
+
+      flow create <port> group 0 match {PMD items} / {APP items} / end actions {offload actions} / end
+
+The model provides helper function call to restore packets that miss
+tunnel TMR rules to its original state:
+
+.. code-block:: c
+
+  int
+  rte_flow_get_restore_info(uint16_t port_id,
+                            struct rte_mbuf *mbuf,
+                            struct rte_flow_restore_info *info,
+                            struct rte_flow_error *error);
+
+rte_tunnel object filled by the call inside
+``rte_flow_restore_info *info parameter`` can be used by the application
+to create new TMR rule for that tunnel.
+
+The model requirements:
+
+Software application must initialize
+rte_tunnel object with tunnel parameters before calling
+rte_flow_tunnel_decap_set() & rte_flow_tunnel_match().
+
+PMD actions array obtained in rte_flow_tunnel_decap_set() must be
+released by application with rte_flow_tunnel_action_decap_release() call.
+Application can release the actions after the TSR rule was created.
+
+PMD items array obtained with rte_flow_tunnel_match() must be released
+by application with rte_flow_tunnel_item_release() call. Application can
+release the items after the rule was created. However, if the application
+needs to create additional TMR rule for the same tunnel it will need
+to obtain PMD items again.
+
+Application cannot destroy rte_tunnel object before it releases all
+PMD actions & PMD items referencing that tunnel.
+
 Caveats
 -------
 
diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index c6642f5f94..802d9e74d6 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -55,6 +55,16 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Flow rules allowed to use private PMD items / actions.**
+
+  * Flow rule verification was updated to accept private PMD
+    items and actions.
+
+* **Added generic API to offload tunneled traffic and restore missed packet.**
+
+  * Added a new hardware independent helper API to RTE flow library that
+    offloads tunneled traffic and restores missed packets.
+
 * **Updated Cisco enic driver.**
 
   * Added support for VF representors with single-queue Tx/Rx and flow API
diff --git a/lib/librte_ethdev/rte_ethdev_version.map b/lib/librte_ethdev/rte_ethdev_version.map
index c95ef5157a..9832c138a2 100644
--- a/lib/librte_ethdev/rte_ethdev_version.map
+++ b/lib/librte_ethdev/rte_ethdev_version.map
@@ -226,6 +226,12 @@ EXPERIMENTAL {
 	rte_tm_wred_profile_add;
 	rte_tm_wred_profile_delete;
 
+	rte_flow_tunnel_decap_set;
+	rte_flow_tunnel_match;
+	rte_flow_get_restore_info;
+	rte_flow_tunnel_action_decap_release;
+	rte_flow_tunnel_item_release;
+
 	# added in 20.11
 	rte_eth_link_speed_to_str;
 	rte_eth_link_to_str;
diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
index c8c6d62a8b..181c02792d 100644
--- a/lib/librte_ethdev/rte_flow.c
+++ b/lib/librte_ethdev/rte_flow.c
@@ -1267,3 +1267,115 @@ rte_flow_get_aged_flows(uint16_t port_id, void **contexts,
 				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				  NULL, rte_strerror(ENOTSUP));
 }
+
+int
+rte_flow_tunnel_decap_set(uint16_t port_id,
+			  struct rte_flow_tunnel *tunnel,
+			  struct rte_flow_action **actions,
+			  uint32_t *num_of_actions,
+			  struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->tunnel_decap_set)) {
+		return flow_err(port_id,
+				ops->tunnel_decap_set(dev, tunnel, actions,
+						      num_of_actions, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_tunnel_match(uint16_t port_id,
+		      struct rte_flow_tunnel *tunnel,
+		      struct rte_flow_item **items,
+		      uint32_t *num_of_items,
+		      struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->tunnel_match)) {
+		return flow_err(port_id,
+				ops->tunnel_match(dev, tunnel, items,
+						  num_of_items, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_get_restore_info(uint16_t port_id,
+			  struct rte_mbuf *m,
+			  struct rte_flow_restore_info *restore_info,
+			  struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->get_restore_info)) {
+		return flow_err(port_id,
+				ops->get_restore_info(dev, m, restore_info,
+						      error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_tunnel_action_decap_release(uint16_t port_id,
+				     struct rte_flow_action *actions,
+				     uint32_t num_of_actions,
+				     struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->action_release)) {
+		return flow_err(port_id,
+				ops->action_release(dev, actions,
+						    num_of_actions, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_tunnel_item_release(uint16_t port_id,
+			     struct rte_flow_item *items,
+			     uint32_t num_of_items,
+			     struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->item_release)) {
+		return flow_err(port_id,
+				ops->item_release(dev, items,
+						  num_of_items, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
index da8bfa5489..2f12d3ea1a 100644
--- a/lib/librte_ethdev/rte_flow.h
+++ b/lib/librte_ethdev/rte_flow.h
@@ -3357,6 +3357,201 @@ int
 rte_flow_get_aged_flows(uint16_t port_id, void **contexts,
 			uint32_t nb_contexts, struct rte_flow_error *error);
 
+/* Tunnel has a type and the key information. */
+struct rte_flow_tunnel {
+	/**
+	 * Tunnel type, for example RTE_FLOW_ITEM_TYPE_VXLAN,
+	 * RTE_FLOW_ITEM_TYPE_NVGRE etc.
+	 */
+	enum rte_flow_item_type	type;
+	uint64_t tun_id; /**< Tunnel identification. */
+
+	RTE_STD_C11
+	union {
+		struct {
+			rte_be32_t src_addr; /**< IPv4 source address. */
+			rte_be32_t dst_addr; /**< IPv4 destination address. */
+		} ipv4;
+		struct {
+			uint8_t src_addr[16]; /**< IPv6 source address. */
+			uint8_t dst_addr[16]; /**< IPv6 destination address. */
+		} ipv6;
+	};
+	rte_be16_t tp_src; /**< Tunnel port source. */
+	rte_be16_t tp_dst; /**< Tunnel port destination. */
+	uint16_t   tun_flags; /**< Tunnel flags. */
+
+	bool       is_ipv6; /**< True for valid IPv6 fields. Otherwise IPv4. */
+
+	/**
+	 * the following members are required to restore packet
+	 * after miss
+	 */
+	uint8_t    tos; /**< TOS for IPv4, TC for IPv6. */
+	uint8_t    ttl; /**< TTL for IPv4, HL for IPv6. */
+	uint32_t label; /**< Flow Label for IPv6. */
+};
+
+/**
+ * Indicate that the packet has a tunnel.
+ */
+#define RTE_FLOW_RESTORE_INFO_TUNNEL  (1ULL << 0)
+
+/**
+ * Indicate that the packet has a non decapsulated tunnel header.
+ */
+#define RTE_FLOW_RESTORE_INFO_ENCAPSULATED  (1ULL << 1)
+
+/**
+ * Indicate that the packet has a group_id.
+ */
+#define RTE_FLOW_RESTORE_INFO_GROUP_ID  (1ULL << 2)
+
+/**
+ * Restore information structure to communicate the current packet processing
+ * state when some of the processing pipeline is done in hardware and should
+ * continue in software.
+ */
+struct rte_flow_restore_info {
+	/**
+	 * Bitwise flags (RTE_FLOW_RESTORE_INFO_*) to indicate validation of
+	 * other fields in struct rte_flow_restore_info.
+	 */
+	uint64_t flags;
+	uint32_t group_id; /**< Group ID where packet missed. */
+	struct rte_flow_tunnel tunnel; /**< Tunnel information. */
+};
+
+/**
+ * Allocate an array of actions to be used in rte_flow_create, to implement
+ * tunnel-decap-set for the given tunnel.
+ * Sample usage:
+ *   actions vxlan_decap / tunnel-decap-set(tunnel properties) /
+ *            jump group 0 / end
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] tunnel
+ *   Tunnel properties.
+ * @param[out] actions
+ *   Array of actions to be allocated by the PMD. This array should be
+ *   concatenated with the actions array provided to rte_flow_create.
+ * @param[out] num_of_actions
+ *   Number of actions allocated.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_decap_set(uint16_t port_id,
+			  struct rte_flow_tunnel *tunnel,
+			  struct rte_flow_action **actions,
+			  uint32_t *num_of_actions,
+			  struct rte_flow_error *error);
+
+/**
+ * Allocate an array of items to be used in rte_flow_create, to implement
+ * tunnel-match for the given tunnel.
+ * Sample usage:
+ *   pattern tunnel-match(tunnel properties) / outer-header-matches /
+ *           inner-header-matches / end
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] tunnel
+ *   Tunnel properties.
+ * @param[out] items
+ *   Array of items to be allocated by the PMD. This array should be
+ *   concatenated with the items array provided to rte_flow_create.
+ * @param[out] num_of_items
+ *   Number of items allocated.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_match(uint16_t port_id,
+		      struct rte_flow_tunnel *tunnel,
+		      struct rte_flow_item **items,
+		      uint32_t *num_of_items,
+		      struct rte_flow_error *error);
+
+/**
+ * Populate the current packet processing state, if exists, for the given mbuf.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] m
+ *   Mbuf struct.
+ * @param[out] info
+ *   Restore information. Upon success contains the HW state.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_get_restore_info(uint16_t port_id,
+			  struct rte_mbuf *m,
+			  struct rte_flow_restore_info *info,
+			  struct rte_flow_error *error);
+
+/**
+ * Release the action array as allocated by rte_flow_tunnel_decap_set.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] actions
+ *   Array of actions to be released.
+ * @param[in] num_of_actions
+ *   Number of elements in actions array.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_action_decap_release(uint16_t port_id,
+				     struct rte_flow_action *actions,
+				     uint32_t num_of_actions,
+				     struct rte_flow_error *error);
+
+/**
+ * Release the item array as allocated by rte_flow_tunnel_match.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] items
+ *   Array of items to be released.
+ * @param[in] num_of_items
+ *   Number of elements in item array.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_item_release(uint16_t port_id,
+			     struct rte_flow_item *items,
+			     uint32_t num_of_items,
+			     struct rte_flow_error *error);
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_ethdev/rte_flow_driver.h b/lib/librte_ethdev/rte_flow_driver.h
index 3ee871d3eb..9d87407203 100644
--- a/lib/librte_ethdev/rte_flow_driver.h
+++ b/lib/librte_ethdev/rte_flow_driver.h
@@ -108,6 +108,38 @@ struct rte_flow_ops {
 		 void **context,
 		 uint32_t nb_contexts,
 		 struct rte_flow_error *err);
+	/** See rte_flow_tunnel_decap_set() */
+	int (*tunnel_decap_set)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_tunnel *tunnel,
+		 struct rte_flow_action **pmd_actions,
+		 uint32_t *num_of_actions,
+		 struct rte_flow_error *err);
+	/** See rte_flow_tunnel_match() */
+	int (*tunnel_match)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_tunnel *tunnel,
+		 struct rte_flow_item **pmd_items,
+		 uint32_t *num_of_items,
+		 struct rte_flow_error *err);
+	/** See rte_flow_get_restore_info() */
+	int (*get_restore_info)
+		(struct rte_eth_dev *dev,
+		 struct rte_mbuf *m,
+		 struct rte_flow_restore_info *info,
+		 struct rte_flow_error *err);
+	/** See rte_flow_tunnel_action_decap_release() */
+	int (*action_release)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_action *pmd_actions,
+		 uint32_t num_of_actions,
+		 struct rte_flow_error *err);
+	/** See rte_flow_item_release() */
+	int (*item_release)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_item *pmd_items,
+		 uint32_t num_of_items,
+		 struct rte_flow_error *err);
 };
 
 /**
-- 
2.25.1


^ permalink raw reply	[flat|nested] 95+ messages in thread

* [dpdk-dev] [PATCH v3 3/4] net/mlx5: implement tunnel offload API
  2020-09-30  9:18 ` [dpdk-dev] [PATCH v3 0/4] Tunnel Offload API Gregory Etelson
  2020-09-30  9:18   ` [dpdk-dev] [PATCH v3 1/4] ethdev: allow negative values in flow rule types Gregory Etelson
  2020-09-30  9:18   ` [dpdk-dev] [PATCH v3 2/4] ethdev: tunnel offload model Gregory Etelson
@ 2020-09-30  9:18   ` Gregory Etelson
  2020-09-30  9:18   ` [dpdk-dev] [PATCH v3 4/4] app/testpmd: add commands for " Gregory Etelson
  3 siblings, 0 replies; 95+ messages in thread
From: Gregory Etelson @ 2020-09-30  9:18 UTC (permalink / raw)
  To: dev
  Cc: getelson, matan, rasland, Viacheslav Ovsiienko, Matan Azrad,
	Shahaf Shuler, Viacheslav Ovsiienko, John McNamara,
	Marko Kovacevic

Tunnel Offload API provides hardware independent, unified model
to offload tunneled traffic. Key model elements are:
 - apply matches to both outer and inner packet headers
   during entire offload procedure;
 - restore outer header of partially offloaded packet;
 - model is implemented as a set of helper functions.

Implementation details (see the illustrative command below):
* tunnel_offload PMD parameter must be set to 1 to enable the feature.
* application cannot use MARK and META flow actions with tunnel.
* offload JUMP action is restricted to steering tunnel rule only.
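
For illustration only (the device address and EAL options are placeholders;
dv_xmeta_en=3 is the mode documented in the mlx5 guide hunk below):

dpdk-testpmd -a 0000:03:00.0,dv_xmeta_en=3 -- -i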

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
v2:
* introduce MLX5 PMD API implementation
v3:
* bug fixes
---
 doc/guides/nics/mlx5.rst         |   3 +
 drivers/net/mlx5/linux/mlx5_os.c |  18 +
 drivers/net/mlx5/mlx5.c          |   8 +-
 drivers/net/mlx5/mlx5.h          |   3 +
 drivers/net/mlx5/mlx5_defs.h     |   2 +
 drivers/net/mlx5/mlx5_flow.c     | 678 ++++++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_flow.h     | 173 +++++++-
 drivers/net/mlx5/mlx5_flow_dv.c  | 241 +++++++++--
 8 files changed, 1080 insertions(+), 46 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 211c0c5a6c..287fd23b43 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -815,6 +815,9 @@ Driver options
     24 bits. The actual supported width can be retrieved in runtime by
     series of rte_flow_validate() trials.
 
+  - 3, this engages tunnel offload mode. In E-Switch configuration, that
+    mode implicitly activates ``dv_xmeta_en=1``.
+
   +------+-----------+-----------+-------------+-------------+
   | Mode | ``MARK``  | ``META``  | ``META`` Tx | FDB/Through |
   +======+===========+===========+=============+=============+
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 81a2e99e71..ecf4d2f2a6 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -298,6 +298,12 @@ mlx5_alloc_shared_dr(struct mlx5_priv *priv)
 		sh->esw_drop_action = mlx5_glue->dr_create_flow_action_drop();
 	}
 #endif
+	if (!sh->tunnel_hub)
+		err = mlx5_alloc_tunnel_hub(sh);
+	if (err) {
+		DRV_LOG(ERR, "mlx5_alloc_tunnel_hub failed err=%d", err);
+		goto error;
+	}
 	if (priv->config.reclaim_mode == MLX5_RCM_AGGR) {
 		mlx5_glue->dr_reclaim_domain_memory(sh->rx_domain, 1);
 		mlx5_glue->dr_reclaim_domain_memory(sh->tx_domain, 1);
@@ -344,6 +350,10 @@ mlx5_alloc_shared_dr(struct mlx5_priv *priv)
 		mlx5_hlist_destroy(sh->tag_table, NULL, NULL);
 		sh->tag_table = NULL;
 	}
+	if (sh->tunnel_hub) {
+		mlx5_release_tunnel_hub(sh, priv->dev_port);
+		sh->tunnel_hub = NULL;
+	}
 	mlx5_free_table_hash_list(priv);
 	return err;
 }
@@ -405,6 +415,10 @@ mlx5_os_free_shared_dr(struct mlx5_priv *priv)
 		mlx5_hlist_destroy(sh->tag_table, NULL, NULL);
 		sh->tag_table = NULL;
 	}
+	if (sh->tunnel_hub) {
+		mlx5_release_tunnel_hub(sh, priv->dev_port);
+		sh->tunnel_hub = NULL;
+	}
 	mlx5_free_table_hash_list(priv);
 }
 
@@ -658,6 +672,10 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 			strerror(rte_errno));
 		goto error;
 	}
+	if (config->dv_miss_info) {
+		if (switch_info->master || switch_info->representor)
+			config->dv_xmeta_en = MLX5_XMETA_MODE_META16;
+	}
 	mlx5_malloc_mem_select(config->sys_mem_en);
 	sh = mlx5_alloc_shared_dev_ctx(spawn, config);
 	if (!sh)
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 4a807fb4fd..569a865da8 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1590,13 +1590,17 @@ mlx5_args_check(const char *key, const char *val, void *opaque)
 	} else if (strcmp(MLX5_DV_XMETA_EN, key) == 0) {
 		if (tmp != MLX5_XMETA_MODE_LEGACY &&
 		    tmp != MLX5_XMETA_MODE_META16 &&
-		    tmp != MLX5_XMETA_MODE_META32) {
+		    tmp != MLX5_XMETA_MODE_META32 &&
+		    tmp != MLX5_XMETA_MODE_MISS_INFO) {
 			DRV_LOG(ERR, "invalid extensive "
 				     "metadata parameter");
 			rte_errno = EINVAL;
 			return -rte_errno;
 		}
-		config->dv_xmeta_en = tmp;
+		if (tmp != MLX5_XMETA_MODE_MISS_INFO)
+			config->dv_xmeta_en = tmp;
+		else
+			config->dv_miss_info = 1;
 	} else if (strcmp(MLX5_LACP_BY_USER, key) == 0) {
 		config->lacp_by_user = !!tmp;
 	} else if (strcmp(MLX5_MR_EXT_MEMSEG_EN, key) == 0) {
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 0907506755..e12c4cee4b 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -206,6 +206,7 @@ struct mlx5_dev_config {
 	unsigned int rt_timestamp:1; /* realtime timestamp format. */
 	unsigned int sys_mem_en:1; /* The default memory allocator. */
 	unsigned int decap_en:1; /* Whether decap will be used or not. */
+	unsigned int dv_miss_info:1; /* restore packet after partial hw miss */
 	struct {
 		unsigned int enabled:1; /* Whether MPRQ is enabled. */
 		unsigned int stride_num_n; /* Number of strides. */
@@ -632,6 +633,8 @@ struct mlx5_dev_ctx_shared {
 	/* UAR same-page access control required in 32bit implementations. */
 #endif
 	struct mlx5_hlist *flow_tbls;
+	struct rte_hash *flow_tbl_map; /* app group-to-flow table map */
+	struct mlx5_flow_tunnel_hub *tunnel_hub;
 	/* Direct Rules tables for FDB, NIC TX+RX */
 	void *esw_drop_action; /* Pointer to DR E-Switch drop action. */
 	void *pop_vlan_action; /* Pointer to DR pop VLAN action. */
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index 0df47391ee..41a7537d5e 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -165,6 +165,8 @@
 #define MLX5_XMETA_MODE_LEGACY 0
 #define MLX5_XMETA_MODE_META16 1
 #define MLX5_XMETA_MODE_META32 2
+/* Provide info on partial hw miss. Implies MLX5_XMETA_MODE_META16 */
+#define MLX5_XMETA_MODE_MISS_INFO 3
 
 /* MLX5_TX_DB_NC supported values. */
 #define MLX5_TXDB_CACHED 0
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 416505f1c8..26625e0bd8 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -18,6 +18,7 @@
 #include <rte_flow_driver.h>
 #include <rte_malloc.h>
 #include <rte_ip.h>
+#include <rte_hash.h>
 
 #include <mlx5_glue.h>
 #include <mlx5_devx_cmds.h>
@@ -30,6 +31,18 @@
 #include "mlx5_flow_os.h"
 #include "mlx5_rxtx.h"
 
+static struct mlx5_flow_tunnel *
+mlx5_find_tunnel_id(struct rte_eth_dev *dev, uint32_t id);
+static void
+mlx5_flow_tunnel_free(struct rte_eth_dev *dev, struct mlx5_flow_tunnel *tunnel);
+static uint32_t
+tunnel_flow_group_to_flow_table(struct rte_eth_dev *dev,
+				const struct mlx5_flow_tunnel *tunnel,
+				uint32_t group, uint32_t *table,
+				struct rte_flow_error *error);
+static const struct mlx5_flow_tbl_data_entry  *
+tunnel_mark_decode(struct rte_eth_dev *dev, uint32_t mark);
+
 /** Device flow drivers. */
 extern const struct mlx5_flow_driver_ops mlx5_flow_verbs_drv_ops;
 
@@ -220,6 +233,171 @@ static const struct rte_flow_expand_node mlx5_support_expansion[] = {
 	},
 };
 
+struct tunnel_validation {
+	bool verdict;
+	const char *msg;
+};
+
+static inline struct tunnel_validation
+mlx5_flow_tunnel_validate(struct rte_eth_dev *dev,
+			  struct rte_flow_tunnel *tunnel)
+{
+	struct tunnel_validation tv;
+
+	if (!is_tunnel_offload_active(dev)) {
+		tv.msg = "tunnel offload was not activated";
+		goto err;
+	} else if (!tunnel) {
+		tv.msg = "no application tunnel";
+		goto err;
+	}
+
+	switch (tunnel->type) {
+	default:
+		tv.msg = "unsupported tunnel type";
+		goto err;
+	case RTE_FLOW_ITEM_TYPE_VXLAN:
+		break;
+	}
+
+	tv.verdict = true;
+	return tv;
+
+err:
+	tv.verdict = false;
+	return tv;
+}
+
+static int
+mlx5_flow_tunnel_decap_set(struct rte_eth_dev *dev,
+		    struct rte_flow_tunnel *app_tunnel,
+		    struct rte_flow_action **actions,
+		    uint32_t *num_of_actions,
+		    struct rte_flow_error *error)
+{
+	int ret;
+	struct mlx5_flow_tunnel *tunnel;
+	struct tunnel_validation tv;
+
+	tv = mlx5_flow_tunnel_validate(dev, app_tunnel);
+	if (!tv.verdict)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION_CONF, NULL,
+					  tv.msg);
+	ret = mlx5_get_flow_tunnel(dev, app_tunnel, &tunnel);
+	if (ret < 0) {
+		return rte_flow_error_set(error, ret,
+					  RTE_FLOW_ERROR_TYPE_ACTION_CONF, NULL,
+					  "failed to initialize pmd tunnel");
+	}
+	*actions = &tunnel->action;
+	*num_of_actions = 1;
+	return 0;
+}
+
+static int
+mlx5_flow_tunnel_match(struct rte_eth_dev *dev,
+		       struct rte_flow_tunnel *app_tunnel,
+		       struct rte_flow_item **items,
+		       uint32_t *num_of_items,
+		       struct rte_flow_error *error)
+{
+	int ret;
+	struct mlx5_flow_tunnel *tunnel;
+	struct tunnel_validation tv;
+
+	tv = mlx5_flow_tunnel_validate(dev, app_tunnel);
+	if (!tv.verdict)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_HANDLE, NULL,
+					  tv.msg);
+	ret = mlx5_get_flow_tunnel(dev, app_tunnel, &tunnel);
+	if (ret < 0) {
+		return rte_flow_error_set(error, ret,
+					  RTE_FLOW_ERROR_TYPE_HANDLE, NULL,
+					  "failed to initialize pmd tunnel");
+	}
+	*items = &tunnel->item;
+	*num_of_items = 1;
+	return 0;
+}
+
+static int
+mlx5_flow_item_release(struct rte_eth_dev *dev,
+		       struct rte_flow_item *pmd_items,
+		       uint32_t num_items, struct rte_flow_error *err)
+{
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_tunnel *tun;
+
+	LIST_FOREACH(tun, &thub->tunnels, chain) {
+		if (&tun->item == pmd_items)
+			break;
+	}
+	if (!tun || num_items != 1)
+		return rte_flow_error_set(err, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_HANDLE, NULL,
+					  "invalid argument");
+	if (!__atomic_sub_fetch(&tun->refctn, 1, __ATOMIC_RELAXED))
+		mlx5_flow_tunnel_free(dev, tun);
+	return 0;
+}
+
+static int
+mlx5_flow_action_release(struct rte_eth_dev *dev,
+			 struct rte_flow_action *pmd_actions,
+			 uint32_t num_actions, struct rte_flow_error *err)
+{
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_tunnel *tun;
+
+	LIST_FOREACH(tun, &thub->tunnels, chain) {
+		if (&tun->action == pmd_actions)
+			break;
+	}
+	if (!tun || num_actions != 1)
+		return rte_flow_error_set(err, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_HANDLE, NULL,
+					  "invalid argument");
+	if (!__atomic_sub_fetch(&tun->refctn, 1, __ATOMIC_RELAXED))
+		mlx5_flow_tunnel_free(dev, tun);
+
+	return 0;
+}
+
+static int
+mlx5_flow_tunnel_get_restore_info(struct rte_eth_dev *dev,
+				  struct rte_mbuf *m,
+				  struct rte_flow_restore_info *info,
+				  struct rte_flow_error *err)
+{
+	uint64_t ol_flags = m->ol_flags;
+	const struct mlx5_flow_tbl_data_entry *tble;
+	const uint64_t mask = PKT_RX_FDIR | PKT_RX_FDIR_ID;
+
+	if ((ol_flags & mask) != mask)
+		goto err;
+	tble = tunnel_mark_decode(dev, m->hash.fdir.hi);
+	if (!tble) {
+		DRV_LOG(DEBUG, "port %u invalid miss tunnel mark %#x",
+			dev->data->port_id, m->hash.fdir.hi);
+		goto err;
+	}
+	MLX5_ASSERT(tble->tunnel);
+	memcpy(&info->tunnel, &tble->tunnel->app_tunnel, sizeof(info->tunnel));
+	info->group_id = tble->group_id;
+	info->flags = RTE_FLOW_RESTORE_INFO_TUNNEL |
+		      RTE_FLOW_RESTORE_INFO_GROUP_ID |
+		      RTE_FLOW_RESTORE_INFO_ENCAPSULATED;
+
+	return 0;
+
+err:
+	return rte_flow_error_set(err, EINVAL,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				  "failed to get restore info");
+}
+
 static const struct rte_flow_ops mlx5_flow_ops = {
 	.validate = mlx5_flow_validate,
 	.create = mlx5_flow_create,
@@ -229,6 +407,11 @@ static const struct rte_flow_ops mlx5_flow_ops = {
 	.query = mlx5_flow_query,
 	.dev_dump = mlx5_flow_dev_dump,
 	.get_aged_flows = mlx5_flow_get_aged_flows,
+	.tunnel_decap_set = mlx5_flow_tunnel_decap_set,
+	.tunnel_match = mlx5_flow_tunnel_match,
+	.action_release = mlx5_flow_action_release,
+	.item_release = mlx5_flow_item_release,
+	.get_restore_info = mlx5_flow_tunnel_get_restore_info,
 };
 
 /* Convert FDIR request to Generic flow. */
@@ -3524,6 +3707,136 @@ flow_hairpin_split(struct rte_eth_dev *dev,
 	return 0;
 }
 
+__extension__
+union tunnel_offload_mark {
+	uint32_t val;
+	struct {
+		uint32_t app_reserve:8;
+		uint32_t table_id:15;
+		uint32_t transfer:1;
+		uint32_t _unused_:8;
+	};
+};
+
+struct tunnel_default_miss_ctx {
+	uint16_t *queue;
+	__extension__
+	union {
+		struct rte_flow_action_rss action_rss;
+		struct rte_flow_action_queue miss_queue;
+		struct rte_flow_action_jump miss_jump;
+		uint8_t raw[0];
+	};
+};
+
+static int
+flow_tunnel_add_default_miss(struct rte_eth_dev *dev,
+			     struct rte_flow *flow,
+			     const struct rte_flow_attr *attr,
+			     const struct rte_flow_action *app_actions,
+			     uint32_t flow_idx,
+			     struct tunnel_default_miss_ctx *ctx,
+			     struct rte_flow_error *error)
+{
+	struct mlx5_flow *dev_flow;
+	struct rte_flow_attr miss_attr = *attr;
+	const struct mlx5_flow_tunnel *tunnel = app_actions[0].conf;
+	const struct rte_flow_item miss_items[2] = {
+		{
+			.type = RTE_FLOW_ITEM_TYPE_ETH,
+			.spec = NULL,
+			.last = NULL,
+			.mask = NULL
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+			.spec = NULL,
+			.last = NULL,
+			.mask = NULL
+		}
+	};
+	union tunnel_offload_mark mark_id;
+	struct rte_flow_action_mark miss_mark;
+	struct rte_flow_action miss_actions[3] = {
+		[0] = { .type = RTE_FLOW_ACTION_TYPE_MARK, .conf = &miss_mark },
+		[2] = { .type = RTE_FLOW_ACTION_TYPE_END,  .conf = NULL }
+	};
+	const struct rte_flow_action_jump *jump_data;
+	uint32_t i, flow_table = 0; /* prevent compilation warning */
+	int ret;
+
+	if (!attr->transfer) {
+		struct mlx5_priv *priv = dev->data->dev_private;
+		uint32_t q_size;
+
+		miss_actions[1].type = RTE_FLOW_ACTION_TYPE_RSS;
+		q_size = priv->reta_idx_n * sizeof(ctx->queue[0]);
+		ctx->queue = mlx5_malloc(MLX5_MEM_SYS | MLX5_MEM_ZERO, q_size,
+					 0, SOCKET_ID_ANY);
+		if (!ctx->queue)
+			return rte_flow_error_set
+				(error, ENOMEM,
+				RTE_FLOW_ERROR_TYPE_ACTION_CONF,
+				NULL, "invalid default miss RSS");
+		ctx->action_rss.func = RTE_ETH_HASH_FUNCTION_DEFAULT,
+		ctx->action_rss.level = 0,
+		ctx->action_rss.types = priv->rss_conf.rss_hf,
+		ctx->action_rss.key_len = priv->rss_conf.rss_key_len,
+		ctx->action_rss.queue_num = priv->reta_idx_n,
+		ctx->action_rss.key = priv->rss_conf.rss_key,
+		ctx->action_rss.queue = ctx->queue;
+		if (!priv->reta_idx_n || !priv->rxqs_n)
+			return rte_flow_error_set
+				(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION_CONF,
+				NULL, "invalid port configuration");
+		if (!(dev->data->dev_conf.rxmode.mq_mode & ETH_MQ_RX_RSS_FLAG))
+			ctx->action_rss.types = 0;
+		for (i = 0; i != priv->reta_idx_n; ++i)
+			ctx->queue[i] = (*priv->reta_idx)[i];
+	} else {
+		miss_actions[1].type = RTE_FLOW_ACTION_TYPE_JUMP;
+		ctx->miss_jump.group = MLX5_TNL_MISS_FDB_JUMP_GRP;
+	}
+	miss_actions[1].conf = (typeof(miss_actions[1].conf))ctx->raw;
+	for (; app_actions->type != RTE_FLOW_ACTION_TYPE_JUMP; app_actions++);
+	jump_data = app_actions->conf;
+	miss_attr.priority = MLX5_TNL_MISS_RULE_PRIORITY;
+	miss_attr.group = jump_data->group;
+	ret = tunnel_flow_group_to_flow_table(dev, tunnel, jump_data->group,
+					      &flow_table, error);
+	if (ret)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION_CONF,
+					  NULL, "invalid tunnel id");
+	mark_id.app_reserve = 0;
+	mark_id.table_id = tunnel_flow_tbl_to_id(flow_table);
+	mark_id.transfer = !!attr->transfer;
+	mark_id._unused_ = 0;
+	miss_mark.id = mark_id.val;
+	dev_flow = flow_drv_prepare(dev, flow, &miss_attr,
+				    miss_items, miss_actions, flow_idx, error);
+	if (!dev_flow)
+		return -rte_errno;
+	dev_flow->flow = flow;
+	dev_flow->external = true;
+	dev_flow->tunnel = tunnel;
+	/* Subflow object was created, we must include one in the list. */
+	SILIST_INSERT(&flow->dev_handles, dev_flow->handle_idx,
+		      dev_flow->handle, next);
+	DRV_LOG(DEBUG,
+		"port %u tunnel type=%d id=%u miss rule priority=%u group=%u",
+		dev->data->port_id, tunnel->app_tunnel.type,
+		tunnel->tunnel_id, miss_attr.priority, miss_attr.group);
+	ret = flow_drv_translate(dev, dev_flow, &miss_attr, miss_items,
+				  miss_actions, error);
+	if (!ret)
+		ret = flow_mreg_update_copy_table(dev, flow, miss_actions,
+						  error);
+
+	return ret;
+}
+
 /**
  * The last stage of splitting chain, just creates the subflow
  * without any modification.
@@ -4296,6 +4609,27 @@ flow_create_split_outer(struct rte_eth_dev *dev,
 	return ret;
 }
 
+static struct mlx5_flow_tunnel *
+flow_tunnel_from_rule(struct rte_eth_dev *dev,
+		      const struct rte_flow_attr *attr,
+		      const struct rte_flow_item items[],
+		      const struct rte_flow_action actions[])
+{
+	struct mlx5_flow_tunnel *tunnel;
+
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wcast-qual"
+	if (is_flow_tunnel_match_rule(dev, attr, items, actions))
+		tunnel = (struct mlx5_flow_tunnel *)items[0].spec;
+	else if (is_flow_tunnel_steer_rule(dev, attr, items, actions))
+		tunnel = (struct mlx5_flow_tunnel *)actions[0].conf;
+	else
+		tunnel = NULL;
+#pragma GCC diagnostic pop
+
+	return tunnel;
+}
+
 /**
  * Create a flow and add it to @p list.
  *
@@ -4356,6 +4690,8 @@ flow_list_create(struct rte_eth_dev *dev, uint32_t *list,
 	int hairpin_flow;
 	uint32_t hairpin_id = 0;
 	struct rte_flow_attr attr_tx = { .priority = 0 };
+	struct mlx5_flow_tunnel *tunnel;
+	struct tunnel_default_miss_ctx default_miss_ctx = { 0, };
 	int ret;
 
 	hairpin_flow = flow_check_hairpin_split(dev, attr, actions);
@@ -4430,6 +4766,19 @@ flow_list_create(struct rte_eth_dev *dev, uint32_t *list,
 					      error);
 		if (ret < 0)
 			goto error;
+		if (is_flow_tunnel_steer_rule(dev, attr,
+					      buf->entry[i].pattern,
+					      p_actions_rx)) {
+			ret = flow_tunnel_add_default_miss(dev, flow, attr,
+							   p_actions_rx,
+							   idx,
+							   &default_miss_ctx,
+							   error);
+			if (ret < 0) {
+				mlx5_free(default_miss_ctx.queue);
+				goto error;
+			}
+		}
 	}
 	/* Create the tx flow. */
 	if (hairpin_flow) {
@@ -4484,6 +4833,13 @@ flow_list_create(struct rte_eth_dev *dev, uint32_t *list,
 	priv->flow_idx = priv->flow_nested_idx;
 	if (priv->flow_nested_idx)
 		priv->flow_nested_idx = 0;
+	tunnel = flow_tunnel_from_rule(dev, attr, items, actions);
+	if (tunnel) {
+		flow->tunnel = 1;
+		flow->tunnel_id = tunnel->tunnel_id;
+		__atomic_add_fetch(&tunnel->refctn, 1, __ATOMIC_RELAXED);
+		mlx5_free(default_miss_ctx.queue);
+	}
 	return idx;
 error:
 	MLX5_ASSERT(flow);
@@ -4603,6 +4959,7 @@ mlx5_flow_create(struct rte_eth_dev *dev,
 				   "port not started");
 		return NULL;
 	}
+
 	return (void *)(uintptr_t)flow_list_create(dev, &priv->flows,
 				  attr, items, actions, true, error);
 }
@@ -4657,6 +5014,13 @@ flow_list_destroy(struct rte_eth_dev *dev, uint32_t *list,
 		}
 	}
 	mlx5_ipool_free(priv->sh->ipool[MLX5_IPOOL_RTE_FLOW], flow_idx);
+	if (flow->tunnel) {
+		struct mlx5_flow_tunnel *tunnel;
+		tunnel = mlx5_find_tunnel_id(dev, flow->tunnel_id);
+		RTE_VERIFY(tunnel);
+		if (!__atomic_sub_fetch(&tunnel->refctn, 1, __ATOMIC_RELAXED))
+			mlx5_flow_tunnel_free(dev, tunnel);
+	}
 }
 
 /**
@@ -6131,19 +6495,122 @@ mlx5_flow_async_pool_query_handle(struct mlx5_dev_ctx_shared *sh,
 	sh->cmng.pending_queries--;
 }
 
+static const struct mlx5_flow_tbl_data_entry  *
+tunnel_mark_decode(struct rte_eth_dev *dev, uint32_t mark)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_dev_ctx_shared *sh = priv->sh;
+	struct mlx5_hlist_entry *he;
+	union tunnel_offload_mark mbits = { .val = mark };
+	union mlx5_flow_tbl_key table_key = {
+		{
+			.table_id = tunnel_id_to_flow_tbl(mbits.table_id),
+			.reserved = 0,
+			.domain = !!mbits.transfer,
+			.direction = 0,
+		}
+	};
+	he = mlx5_hlist_lookup(sh->flow_tbls, table_key.v64);
+	return he ?
+	       container_of(he, struct mlx5_flow_tbl_data_entry, entry) : NULL;
+}
+
+static uint32_t
+tunnel_flow_group_to_flow_table(struct rte_eth_dev *dev,
+				const struct mlx5_flow_tunnel *tunnel,
+				uint32_t group, uint32_t *table,
+				struct rte_flow_error *error)
+{
+	struct mlx5_hlist_entry *he;
+	struct tunnel_tbl_entry *tte;
+	union tunnel_tbl_key key = {
+		.tunnel_id = tunnel ? tunnel->tunnel_id : 0,
+		.group = group
+	};
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_hlist *group_hash;
+
+	group_hash = tunnel ? tunnel->groups : thub->groups;
+	he = mlx5_hlist_lookup(group_hash, key.val);
+	if (!he) {
+		int ret;
+		tte = mlx5_malloc(MLX5_MEM_SYS | MLX5_MEM_ZERO,
+				  sizeof(*tte), 0,
+				  SOCKET_ID_ANY);
+		if (!tte)
+			goto err;
+		tte->hash.key = key.val;
+		ret = mlx5_flow_id_get(thub->table_ids, &tte->flow_table);
+		if (ret) {
+			mlx5_free(tte);
+			goto err;
+		}
+		tte->flow_table = tunnel_id_to_flow_tbl(tte->flow_table);
+		mlx5_hlist_insert(group_hash, &tte->hash);
+	} else {
+		tte = container_of(he, typeof(*tte), hash);
+	}
+	*table = tte->flow_table;
+	DRV_LOG(DEBUG, "port %u tunnel %u group=%#x table=%#x",
+		dev->data->port_id, key.tunnel_id, group, *table);
+	return 0;
+
+err:
+	return rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
+				  NULL, "tunnel group index not supported");
+}
+
+static int
+flow_group_to_table(uint32_t port_id, uint32_t group, uint32_t *table,
+		    struct flow_grp_info grp_info, struct rte_flow_error *error)
+{
+	if (grp_info.transfer && grp_info.external && grp_info.fdb_def_rule) {
+		if (group == UINT32_MAX)
+			return rte_flow_error_set
+						(error, EINVAL,
+						 RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
+						 NULL,
+						 "group index not supported");
+		*table = group + 1;
+	} else {
+		*table = group;
+	}
+	DRV_LOG(DEBUG, "port %u group=%#x table=%#x", port_id, group, *table);
+	return 0;
+}
+
 /**
  * Translate the rte_flow group index to HW table value.
  *
- * @param[in] attributes
- *   Pointer to flow attributes
- * @param[in] external
- *   Value is part of flow rule created by request external to PMD.
+ * If tunnel offload is disabled, all group ids are converted to flow
+ * table ids using the standard method.
+ * If tunnel offload is enabled, group id can be converted using the
+ * standard or tunnel conversion method. Group conversion method
+ * selection depends on flags in `grp_info` parameter:
+ * - Internal (grp_info.external == 0) groups conversion uses the
+ *   standard method.
+ * - Group ids in JUMP action are converted with the tunnel conversion
+ *   method.
+ * - Group id in rule attribute conversion depends on a rule type and
+ *   group id value:
+ *   ** non-zero group attributes are converted with the tunnel method
+ *   ** zero group attribute in non-tunnel rule is converted using the
+ *      standard method - there's only one root table
+ *   ** zero group attribute in steer tunnel rule is converted with the
+ *      standard method - single root table
+ *   ** zero group attribute in match tunnel rule is a special OvS
+ *      case: that value is used for portability reasons. That group
+ *      id is converted with the tunnel conversion method.
+ *
+ * @param[in] dev
+ *   Port device
+ * @param[in] tunnel
+ *   PMD tunnel offload object
  * @param[in] group
  *   rte_flow group index value.
- * @param[out] fdb_def_rule
- *   Whether fdb jump to table 1 is configured.
  * @param[out] table
  *   HW table value.
+ * @param[in] grp_info
+ *   flags used for conversion
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -6151,22 +6618,34 @@ mlx5_flow_async_pool_query_handle(struct mlx5_dev_ctx_shared *sh,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_flow_group_to_table(const struct rte_flow_attr *attributes, bool external,
-			 uint32_t group, bool fdb_def_rule, uint32_t *table,
+mlx5_flow_group_to_table(struct rte_eth_dev *dev,
+			 const struct mlx5_flow_tunnel *tunnel,
+			 uint32_t group, uint32_t *table,
+			 struct flow_grp_info grp_info,
 			 struct rte_flow_error *error)
 {
-	if (attributes->transfer && external && fdb_def_rule) {
-		if (group == UINT32_MAX)
-			return rte_flow_error_set
-						(error, EINVAL,
-						 RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
-						 NULL,
-						 "group index not supported");
-		*table = group + 1;
+	int ret;
+	bool standard_translation;
+
+	if (is_tunnel_offload_active(dev)) {
+		standard_translation = !grp_info.external ||
+					grp_info.std_tbl_fix;
 	} else {
-		*table = group;
+		standard_translation = true;
 	}
-	return 0;
+	DRV_LOG(DEBUG,
+		"port %u group=%#x transfer=%d external=%d fdb_def_rule=%d translate=%s",
+		dev->data->port_id, group, grp_info.transfer,
+		grp_info.external, grp_info.fdb_def_rule,
+		standard_translation ? "STANDARD" : "TUNNEL");
+	if (standard_translation)
+		ret = flow_group_to_table(dev->data->port_id, group, table,
+					  grp_info, error);
+	else
+		ret = tunnel_flow_group_to_flow_table(dev, tunnel, group,
+						      table, error);
+
+	return ret;
 }
 
 /**
@@ -6305,3 +6784,166 @@ mlx5_flow_get_aged_flows(struct rte_eth_dev *dev, void **contexts,
 		 dev->data->port_id);
 	return -ENOTSUP;
 }
+
+static void
+mlx5_flow_tunnel_free(struct rte_eth_dev *dev,
+		      struct mlx5_flow_tunnel *tunnel)
+{
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_id_pool *id_pool = thub->tunnel_ids;
+
+	DRV_LOG(DEBUG, "port %u release pmd tunnel id=0x%x",
+		dev->data->port_id, tunnel->tunnel_id);
+	RTE_VERIFY(!__atomic_load_n(&tunnel->refctn, __ATOMIC_RELAXED));
+	LIST_REMOVE(tunnel, chain);
+	mlx5_flow_id_release(id_pool, tunnel->tunnel_id);
+	mlx5_hlist_destroy(tunnel->groups, NULL, NULL);
+	mlx5_free(tunnel);
+}
+
+static struct mlx5_flow_tunnel *
+mlx5_find_tunnel_id(struct rte_eth_dev *dev, uint32_t id)
+{
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_tunnel *tun;
+
+	LIST_FOREACH(tun, &thub->tunnels, chain) {
+		if (tun->tunnel_id == id)
+			break;
+	}
+
+	return tun;
+}
+
+static struct mlx5_flow_tunnel *
+mlx5_flow_tunnel_allocate(struct rte_eth_dev *dev,
+			  const struct rte_flow_tunnel *app_tunnel)
+{
+	int ret;
+	struct mlx5_flow_tunnel *tunnel;
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_id_pool *id_pool = thub->tunnel_ids;
+	uint32_t id;
+
+	ret = mlx5_flow_id_get(id_pool, &id);
+	if (ret)
+		return NULL;
+	/**
+	 * mlx5 flow tunnel is an auxiliary data structure
+	 * It's not part of IO. No need to allocate it from
+	 * huge pages pools dedicated for IO
+	 */
+	tunnel = mlx5_malloc(MLX5_MEM_SYS | MLX5_MEM_ZERO, sizeof(*tunnel),
+			     0, SOCKET_ID_ANY);
+	if (!tunnel) {
+		mlx5_flow_id_pool_release(id_pool);
+		return NULL;
+	}
+	tunnel->groups = mlx5_hlist_create("tunnel groups", 1024);
+	if (!tunnel->groups) {
+		mlx5_flow_id_pool_release(id_pool);
+		mlx5_free(tunnel);
+		return NULL;
+	}
+	/* initiate new PMD tunnel */
+	memcpy(&tunnel->app_tunnel, app_tunnel, sizeof(*app_tunnel));
+	tunnel->tunnel_id = id;
+	tunnel->action.type = MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET;
+	tunnel->action.conf = tunnel;
+	tunnel->item.type = MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL;
+	tunnel->item.spec = tunnel;
+	tunnel->item.last = NULL;
+	tunnel->item.mask = NULL;
+
+	DRV_LOG(DEBUG, "port %u new pmd tunnel id=0x%x",
+		dev->data->port_id, tunnel->tunnel_id);
+
+	return tunnel;
+}
+
+int
+mlx5_get_flow_tunnel(struct rte_eth_dev *dev,
+		     const struct rte_flow_tunnel *app_tunnel,
+		     struct mlx5_flow_tunnel **tunnel)
+{
+	int ret;
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_tunnel *tun;
+
+	LIST_FOREACH(tun, &thub->tunnels, chain) {
+		if (!memcmp(app_tunnel, &tun->app_tunnel,
+			    sizeof(*app_tunnel))) {
+			*tunnel = tun;
+			ret = 0;
+			break;
+		}
+	}
+	if (!tun) {
+		tun = mlx5_flow_tunnel_allocate(dev, app_tunnel);
+		if (tun) {
+			LIST_INSERT_HEAD(&thub->tunnels, tun, chain);
+			*tunnel = tun;
+		} else {
+			ret = -ENOMEM;
+		}
+	}
+	if (tun)
+		__atomic_add_fetch(&tun->refctn, 1, __ATOMIC_RELAXED);
+
+	return ret;
+}
+
+void mlx5_release_tunnel_hub(struct mlx5_dev_ctx_shared *sh, uint16_t port_id)
+{
+	struct mlx5_flow_tunnel_hub *thub = sh->tunnel_hub;
+
+	if (!thub)
+		return;
+	if (!LIST_EMPTY(&thub->tunnels))
+		DRV_LOG(WARNING, "port %u tunnels present\n", port_id);
+	mlx5_flow_id_pool_release(thub->tunnel_ids);
+	mlx5_flow_id_pool_release(thub->table_ids);
+	mlx5_hlist_destroy(thub->groups, NULL, NULL);
+	mlx5_free(thub);
+}
+
+int mlx5_alloc_tunnel_hub(struct mlx5_dev_ctx_shared *sh)
+{
+	int err;
+	struct mlx5_flow_tunnel_hub *thub;
+
+	thub = mlx5_malloc(MLX5_MEM_SYS | MLX5_MEM_ZERO, sizeof(*thub),
+			   0, SOCKET_ID_ANY);
+	if (!thub)
+		return -ENOMEM;
+	LIST_INIT(&thub->tunnels);
+	thub->tunnel_ids = mlx5_flow_id_pool_alloc(MLX5_MAX_TUNNELS);
+	if (!thub->tunnel_ids) {
+		err = -rte_errno;
+		goto err;
+	}
+	thub->table_ids = mlx5_flow_id_pool_alloc(MLX5_MAX_TABLES);
+	if (!thub->table_ids) {
+		err = -rte_errno;
+		goto err;
+	}
+	thub->groups = mlx5_hlist_create("flow groups", MLX5_MAX_TABLES);
+	if (!thub->groups) {
+		err = -rte_errno;
+		goto err;
+	}
+	sh->tunnel_hub = thub;
+
+	return 0;
+
+err:
+	if (thub->groups)
+		mlx5_hlist_destroy(thub->groups, NULL, NULL);
+	if (thub->table_ids)
+		mlx5_flow_id_pool_release(thub->table_ids);
+	if (thub->tunnel_ids)
+		mlx5_flow_id_pool_release(thub->tunnel_ids);
+	if (thub)
+		mlx5_free(thub);
+	return err;
+}
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 279daf21f5..8691db16ab 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -26,6 +26,7 @@ enum mlx5_rte_flow_item_type {
 	MLX5_RTE_FLOW_ITEM_TYPE_TAG,
 	MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
 	MLX5_RTE_FLOW_ITEM_TYPE_VLAN,
+	MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL,
 };
 
 /* Private (internal) rte flow actions. */
@@ -35,6 +36,7 @@ enum mlx5_rte_flow_action_type {
 	MLX5_RTE_FLOW_ACTION_TYPE_MARK,
 	MLX5_RTE_FLOW_ACTION_TYPE_COPY_MREG,
 	MLX5_RTE_FLOW_ACTION_TYPE_DEFAULT_MISS,
+	MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET,
 };
 
 /* Matches on selected register. */
@@ -196,6 +198,8 @@ enum mlx5_feature_name {
 #define MLX5_FLOW_ACTION_SET_IPV6_DSCP (1ull << 33)
 #define MLX5_FLOW_ACTION_AGE (1ull << 34)
 #define MLX5_FLOW_ACTION_DEFAULT_MISS (1ull << 35)
+#define MLX5_FLOW_ACTION_TUNNEL_SET (1ull << 36)
+#define MLX5_FLOW_ACTION_TUNNEL_MATCH (1ull << 37)
 
 #define MLX5_FLOW_FATE_ACTIONS \
 	(MLX5_FLOW_ACTION_DROP | MLX5_FLOW_ACTION_QUEUE | \
@@ -517,6 +521,10 @@ struct mlx5_flow_tbl_data_entry {
 	struct mlx5_flow_dv_jump_tbl_resource jump;
 	/**< jump resource, at most one for each table created. */
 	uint32_t idx; /**< index for the indexed mempool. */
+	/**< tunnel offload */
+	const struct mlx5_flow_tunnel *tunnel;
+	uint32_t group_id;
+	bool external;
 };
 
 /* Verbs specification header. */
@@ -695,6 +703,7 @@ struct mlx5_flow {
 	};
 	struct mlx5_flow_handle *handle;
 	uint32_t handle_idx; /* Index of the mlx5 flow handle memory. */
+	const struct mlx5_flow_tunnel *tunnel;
 };
 
 /* Flow meter state. */
@@ -840,6 +849,112 @@ struct mlx5_fdir_flow {
 
 #define HAIRPIN_FLOW_ID_BITS 28
 
+#define MLX5_MAX_TUNNELS 256
+#define MLX5_TNL_MISS_RULE_PRIORITY 3
+#define MLX5_TNL_MISS_FDB_JUMP_GRP  0xfaac
+
+/*
+ * When tunnel offload is active, all JUMP group ids are converted
+ * using the same method. That conversion is applied both to tunnel and
+ * regular rule types.
+ * Group ids used in tunnel rules are relative to the tunnel they
+ * belong to (!).
+ * An application can create a number of steer rules, using the same
+ * tunnel, with a different group id in each rule.
+ * Each tunnel stores its groups internally in PMD tunnel object.
+ * Groups used in regular rules do not belong to any tunnel and are stored
+ * in tunnel hub.
+ */
+
+struct mlx5_flow_tunnel {
+	LIST_ENTRY(mlx5_flow_tunnel) chain;
+	struct rte_flow_tunnel app_tunnel;	/** app tunnel copy */
+	uint32_t tunnel_id;			/** unique tunnel ID */
+	uint32_t refctn;
+	struct rte_flow_action action;
+	struct rte_flow_item item;
+	struct mlx5_hlist *groups;		/** tunnel groups */
+};
+
+/** PMD tunnel related context */
+struct mlx5_flow_tunnel_hub {
+	LIST_HEAD(, mlx5_flow_tunnel) tunnels;
+	struct mlx5_flow_id_pool *tunnel_ids;
+	struct mlx5_flow_id_pool *table_ids;
+	struct mlx5_hlist *groups;		/** non tunnel groups */
+};
+
+/* convert jump group to flow table ID in tunnel rules */
+struct tunnel_tbl_entry {
+	struct mlx5_hlist_entry hash;
+	uint32_t flow_table;
+};
+
+static inline uint32_t
+tunnel_id_to_flow_tbl(uint32_t id)
+{
+	return id | (1u << 16);
+}
+
+static inline uint32_t
+tunnel_flow_tbl_to_id(uint32_t flow_tbl)
+{
+	return flow_tbl & ~(1u << 16);
+}
+
+union tunnel_tbl_key {
+	uint64_t val;
+	struct {
+		uint32_t tunnel_id;
+		uint32_t group;
+	};
+};
+
+static inline struct mlx5_flow_tunnel_hub *
+mlx5_tunnel_hub(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	return priv->sh->tunnel_hub;
+}
+
+static inline bool
+is_tunnel_offload_active(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	return !!priv->config.dv_miss_info;
+}
+
+static inline bool
+is_flow_tunnel_match_rule(__rte_unused struct rte_eth_dev *dev,
+			  __rte_unused const struct rte_flow_attr *attr,
+			  __rte_unused const struct rte_flow_item items[],
+			  __rte_unused const struct rte_flow_action actions[])
+{
+	return (items[0].type == (typeof(items[0].type))
+				 MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL);
+}
+
+static inline bool
+is_flow_tunnel_steer_rule(__rte_unused struct rte_eth_dev *dev,
+			  __rte_unused const struct rte_flow_attr *attr,
+			  __rte_unused const struct rte_flow_item items[],
+			  __rte_unused const struct rte_flow_action actions[])
+{
+	return (actions[0].type == (typeof(actions[0].type))
+				   MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET);
+}
+
+static inline const struct mlx5_flow_tunnel *
+flow_actions_to_tunnel(const struct rte_flow_action actions[])
+{
+	return actions[0].conf;
+}
+
+static inline const struct mlx5_flow_tunnel *
+flow_items_to_tunnel(const struct rte_flow_item items[])
+{
+	return items[0].spec;
+}
+
 /* Flow structure. */
 struct rte_flow {
 	ILIST_ENTRY(uint32_t)next; /**< Index to the next flow structure. */
@@ -847,12 +962,14 @@ struct rte_flow {
 	/**< Device flow handles that are part of the flow. */
 	uint32_t drv_type:2; /**< Driver type. */
 	uint32_t fdir:1; /**< Identifier of associated FDIR if any. */
+	uint32_t tunnel:1;
 	uint32_t hairpin_flow_id:HAIRPIN_FLOW_ID_BITS;
 	/**< The flow id used for hairpin. */
 	uint32_t copy_applied:1; /**< The MARK copy Flow os applied. */
 	uint32_t rix_mreg_copy;
 	/**< Index to metadata register copy table resource. */
 	uint32_t counter; /**< Holds flow counter. */
+	uint32_t tunnel_id;  /**< Tunnel id */
 	uint16_t meter; /**< Holds flow meter id. */
 } __rte_packed;
 
@@ -935,9 +1052,54 @@ void mlx5_flow_id_pool_release(struct mlx5_flow_id_pool *pool);
 uint32_t mlx5_flow_id_get(struct mlx5_flow_id_pool *pool, uint32_t *id);
 uint32_t mlx5_flow_id_release(struct mlx5_flow_id_pool *pool,
 			      uint32_t id);
-int mlx5_flow_group_to_table(const struct rte_flow_attr *attributes,
-			     bool external, uint32_t group, bool fdb_def_rule,
-			     uint32_t *table, struct rte_flow_error *error);
+__extension__
+struct flow_grp_info {
+	uint64_t external:1;
+	uint64_t transfer:1;
+	uint64_t fdb_def_rule:1;
+	/* force standard group translation */
+	uint64_t std_tbl_fix:1;
+};
+
+static inline bool
+tunnel_use_standard_attr_group_translate
+		    (struct rte_eth_dev *dev,
+		     const struct mlx5_flow_tunnel *tunnel,
+		     const struct rte_flow_attr *attr,
+		     const struct rte_flow_item items[],
+		     const struct rte_flow_action actions[])
+{
+	bool verdict;
+
+	if (!is_tunnel_offload_active(dev))
+		/* no tunnel offload API */
+		verdict = true;
+	else if (tunnel) {
+		/*
+		 * OvS will use jump to group 0 in tunnel steer rule.
+		 * If tunnel steer rule starts from group 0 (attr.group == 0)
+		 * that group 0 must be translated with the standard method.
+		 * attr.group == 0 in a tunnel match rule is translated with
+		 * the tunnel method.
+		 */
+		verdict = !attr->group &&
+			  is_flow_tunnel_steer_rule(dev, attr, items, actions);
+	} else {
+		/*
+		 * non-tunnel group translation uses standard method for
+		 * root group only: attr.group == 0
+		 */
+		verdict = !attr->group;
+	}
+
+	return verdict;
+}
+
+int mlx5_flow_group_to_table(struct rte_eth_dev *dev,
+			     const struct mlx5_flow_tunnel *tunnel,
+			     uint32_t group, uint32_t *table,
+			     struct flow_grp_info flags,
+				 struct rte_flow_error *error);
 uint64_t mlx5_flow_hashfields_adjust(struct mlx5_flow_rss_desc *rss_desc,
 				     int tunnel, uint64_t layer_types,
 				     uint64_t hash_fields);
@@ -1069,4 +1231,9 @@ int mlx5_flow_destroy_policer_rules(struct rte_eth_dev *dev,
 				    const struct rte_flow_attr *attr);
 int mlx5_flow_meter_flush(struct rte_eth_dev *dev,
 			  struct rte_mtr_error *error);
+int mlx5_get_flow_tunnel(struct rte_eth_dev *dev,
+			 const struct rte_flow_tunnel *app_tunnel,
+			 struct mlx5_flow_tunnel **tunnel);
+void mlx5_release_tunnel_hub(struct mlx5_dev_ctx_shared *sh, uint16_t port_id);
+int mlx5_alloc_tunnel_hub(struct mlx5_dev_ctx_shared *sh);
 #endif /* RTE_PMD_MLX5_FLOW_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 3819cdb266..2e6cb779fb 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -3702,14 +3702,21 @@ flow_dv_validate_action_modify_ttl(const uint64_t action_flags,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-flow_dv_validate_action_jump(const struct rte_flow_action *action,
+flow_dv_validate_action_jump(struct rte_eth_dev *dev,
+			     const struct mlx5_flow_tunnel *tunnel,
+			     const struct rte_flow_action *action,
 			     uint64_t action_flags,
 			     const struct rte_flow_attr *attributes,
 			     bool external, struct rte_flow_error *error)
 {
 	uint32_t target_group, table;
 	int ret = 0;
-
+	struct flow_grp_info grp_info = {
+		.external = !!external,
+		.transfer = !!attributes->transfer,
+		.fdb_def_rule = 1,
+		.std_tbl_fix = 0
+	};
 	if (action_flags & (MLX5_FLOW_FATE_ACTIONS |
 			    MLX5_FLOW_FATE_ESWITCH_ACTIONS))
 		return rte_flow_error_set(error, EINVAL,
@@ -3726,11 +3733,13 @@ flow_dv_validate_action_jump(const struct rte_flow_action *action,
 					  NULL, "action configuration not set");
 	target_group =
 		((const struct rte_flow_action_jump *)action->conf)->group;
-	ret = mlx5_flow_group_to_table(attributes, external, target_group,
-				       true, &table, error);
+	ret = mlx5_flow_group_to_table(dev, tunnel, target_group, &table,
+				       grp_info, error);
 	if (ret)
 		return ret;
-	if (attributes->group == target_group)
+	if (attributes->group == target_group &&
+	    !(action_flags & (MLX5_FLOW_ACTION_TUNNEL_SET |
+			      MLX5_FLOW_ACTION_TUNNEL_MATCH)))
 		return rte_flow_error_set(error, EINVAL,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "target group must be other than"
@@ -4982,8 +4991,9 @@ flow_dv_counter_release(struct rte_eth_dev *dev, uint32_t counter)
  */
 static int
 flow_dv_validate_attributes(struct rte_eth_dev *dev,
+			    const struct mlx5_flow_tunnel *tunnel,
 			    const struct rte_flow_attr *attributes,
-			    bool external __rte_unused,
+			    struct flow_grp_info grp_info,
 			    struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -4999,9 +5009,8 @@ flow_dv_validate_attributes(struct rte_eth_dev *dev,
 #else
 	uint32_t table = 0;
 
-	ret = mlx5_flow_group_to_table(attributes, external,
-				       attributes->group, !!priv->fdb_def_rule,
-				       &table, error);
+	ret = mlx5_flow_group_to_table(dev, tunnel, attributes->group, &table,
+				       grp_info, error);
 	if (ret)
 		return ret;
 	if (!table)
@@ -5123,10 +5132,28 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 	const struct rte_flow_item_vlan *vlan_m = NULL;
 	int16_t rw_act_num = 0;
 	uint64_t is_root;
+	const struct mlx5_flow_tunnel *tunnel;
+	struct flow_grp_info grp_info = {
+		.external = !!external,
+		.transfer = !!attr->transfer,
+		.fdb_def_rule = !!priv->fdb_def_rule,
+	};
 
 	if (items == NULL)
 		return -1;
-	ret = flow_dv_validate_attributes(dev, attr, external, error);
+	if (is_flow_tunnel_match_rule(dev, attr, items, actions)) {
+		tunnel = flow_items_to_tunnel(items);
+		action_flags |= MLX5_FLOW_ACTION_TUNNEL_MATCH |
+				MLX5_FLOW_ACTION_DECAP;
+	} else if (is_flow_tunnel_steer_rule(dev, attr, items, actions)) {
+		tunnel = flow_actions_to_tunnel(actions);
+		action_flags |= MLX5_FLOW_ACTION_TUNNEL_SET;
+	} else {
+		tunnel = NULL;
+	}
+	grp_info.std_tbl_fix = tunnel_use_standard_attr_group_translate
+				(dev, tunnel, attr, items, actions);
+	ret = flow_dv_validate_attributes(dev, tunnel, attr, grp_info, error);
 	if (ret < 0)
 		return ret;
 	is_root = (uint64_t)ret;
@@ -5139,6 +5166,15 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 						  RTE_FLOW_ERROR_TYPE_ITEM,
 						  NULL, "item not supported");
 		switch (type) {
+		case MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL:
+			if (items[0].type != (typeof(items[0].type))
+						MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL)
+				return rte_flow_error_set
+						(error, EINVAL,
+						RTE_FLOW_ERROR_TYPE_ITEM,
+						NULL, "MLX5 private items "
+						"must be the first");
+			break;
 		case RTE_FLOW_ITEM_TYPE_VOID:
 			break;
 		case RTE_FLOW_ITEM_TYPE_PORT_ID:
@@ -5703,7 +5739,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			rw_act_num += MLX5_ACT_NUM_MDF_TTL;
 			break;
 		case RTE_FLOW_ACTION_TYPE_JUMP:
-			ret = flow_dv_validate_action_jump(actions,
+			ret = flow_dv_validate_action_jump(dev, tunnel, actions,
 							   action_flags,
 							   attr, external,
 							   error);
@@ -5803,6 +5839,17 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			action_flags |= MLX5_FLOW_ACTION_SET_IPV6_DSCP;
 			rw_act_num += MLX5_ACT_NUM_SET_DSCP;
 			break;
+		case MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET:
+			if (actions[0].type != (typeof(actions[0].type))
+				MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET)
+				return rte_flow_error_set
+						(error, EINVAL,
+						RTE_FLOW_ERROR_TYPE_ACTION,
+						NULL, "MLX5 private action "
+						"must be the first");
+
+			action_flags |= MLX5_FLOW_ACTION_TUNNEL_SET;
+			break;
 		default:
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ACTION,
@@ -5810,6 +5857,54 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 						  "action not supported");
 		}
 	}
+	/*
+	 * Validate actions in flow rules
+	 * - Explicit decap action is prohibited by the tunnel offload API.
+	 * - Drop action in tunnel steer rule is prohibited by the API.
+	 * - Application cannot use MARK action because its value can mask
+	 *   tunnel default miss notification.
+	 * - JUMP in tunnel match rule has no support in current PMD
+	 *   implementation.
+	 * - TAG & META are reserved for future uses.
+	 */
+	if (action_flags & MLX5_FLOW_ACTION_TUNNEL_SET) {
+		uint64_t bad_actions_mask = MLX5_FLOW_ACTION_DECAP    |
+					    MLX5_FLOW_ACTION_MARK     |
+					    MLX5_FLOW_ACTION_SET_TAG  |
+					    MLX5_FLOW_ACTION_SET_META |
+					    MLX5_FLOW_ACTION_DROP;
+
+		if (action_flags & bad_actions_mask)
+			return rte_flow_error_set
+					(error, EINVAL,
+					RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					"Invalid RTE action in tunnel "
+					"set decap rule");
+		if (!(action_flags & MLX5_FLOW_ACTION_JUMP))
+			return rte_flow_error_set
+					(error, EINVAL,
+					RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					"tunnel set decap rule must terminate "
+					"with JUMP");
+		if (!attr->ingress)
+			return rte_flow_error_set
+					(error, EINVAL,
+					RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					"tunnel flows for ingress traffic only");
+	}
+	if (action_flags & MLX5_FLOW_ACTION_TUNNEL_MATCH) {
+		uint64_t bad_actions_mask = MLX5_FLOW_ACTION_JUMP    |
+					    MLX5_FLOW_ACTION_MARK    |
+					    MLX5_FLOW_ACTION_SET_TAG |
+					    MLX5_FLOW_ACTION_SET_META;
+
+		if (action_flags & bad_actions_mask)
+			return rte_flow_error_set
+					(error, EINVAL,
+					RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					"Invalid RTE action in tunnel "
+					"set match rule");
+	}
 	/*
 	 * Validate the drop action mutual exclusion with other actions.
 	 * Drop action is mutually-exclusive with any other action, except for
@@ -7616,6 +7711,9 @@ static struct mlx5_flow_tbl_resource *
 flow_dv_tbl_resource_get(struct rte_eth_dev *dev,
 			 uint32_t table_id, uint8_t egress,
 			 uint8_t transfer,
+			 bool external,
+			 const struct mlx5_flow_tunnel *tunnel,
+			 uint32_t group_id,
 			 struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -7652,6 +7750,9 @@ flow_dv_tbl_resource_get(struct rte_eth_dev *dev,
 		return NULL;
 	}
 	tbl_data->idx = idx;
+	tbl_data->tunnel = tunnel;
+	tbl_data->group_id = group_id;
+	tbl_data->external = external;
 	tbl = &tbl_data->tbl;
 	pos = &tbl_data->entry;
 	if (transfer)
@@ -7715,6 +7816,41 @@ flow_dv_tbl_resource_release(struct rte_eth_dev *dev,
 
 		mlx5_flow_os_destroy_flow_tbl(tbl->obj);
 		tbl->obj = NULL;
+		if (is_tunnel_offload_active(dev) && tbl_data->external) {
+			struct mlx5_hlist_entry *he;
+			struct mlx5_hlist *tunnel_grp_hash;
+			struct mlx5_flow_tunnel_hub *thub =
+							mlx5_tunnel_hub(dev);
+			union tunnel_tbl_key tunnel_key = {
+				.tunnel_id = tbl_data->tunnel ?
+						tbl_data->tunnel->tunnel_id : 0,
+				.group = tbl_data->group_id
+			};
+			union mlx5_flow_tbl_key table_key = {
+				.v64 = pos->key
+			};
+			uint32_t table_id = table_key.table_id;
+
+			tunnel_grp_hash = tbl_data->tunnel ?
+						tbl_data->tunnel->groups :
+						thub->groups;
+			he = mlx5_hlist_lookup(tunnel_grp_hash, tunnel_key.val);
+			if (he) {
+				struct tunnel_tbl_entry *tte;
+				tte = container_of(he, typeof(*tte), hash);
+				MLX5_ASSERT(tte->flow_table == table_id);
+				mlx5_hlist_remove(tunnel_grp_hash, he);
+				mlx5_free(tte);
+			}
+			mlx5_flow_id_release(mlx5_tunnel_hub(dev)->table_ids,
+					     tunnel_flow_tbl_to_id(table_id));
+			DRV_LOG(DEBUG,
+				"port %u release table_id %#x tunnel %u group %u",
+				dev->data->port_id, table_id,
+				tbl_data->tunnel ?
+				tbl_data->tunnel->tunnel_id : 0,
+				tbl_data->group_id);
+		}
 		/* remove the entry from the hash list and free memory. */
 		mlx5_hlist_remove(sh->flow_tbls, pos);
 		mlx5_ipool_free(priv->sh->ipool[MLX5_IPOOL_JUMP],
@@ -7760,7 +7896,7 @@ flow_dv_matcher_register(struct rte_eth_dev *dev,
 	int ret;
 
 	tbl = flow_dv_tbl_resource_get(dev, key->table_id, key->direction,
-				       key->domain, error);
+				       key->domain, false, NULL, 0, error);
 	if (!tbl)
 		return -rte_errno;	/* No need to refill the error info */
 	tbl_data = container_of(tbl, struct mlx5_flow_tbl_data_entry, tbl);
@@ -8215,11 +8351,23 @@ __flow_dv_translate(struct rte_eth_dev *dev,
 	struct rte_vlan_hdr vlan = { 0 };
 	uint32_t table;
 	int ret = 0;
-
+	const struct mlx5_flow_tunnel *tunnel;
+	struct flow_grp_info grp_info = {
+		.external = !!dev_flow->external,
+		.transfer = !!attr->transfer,
+		.fdb_def_rule = !!priv->fdb_def_rule,
+	};
+	tunnel = is_flow_tunnel_match_rule(dev, attr, items, actions) ?
+		 flow_items_to_tunnel(items) :
+		 is_flow_tunnel_steer_rule(dev, attr, items, actions) ?
+		 flow_actions_to_tunnel(actions) :
+		 dev_flow->tunnel ? dev_flow->tunnel : NULL;
 	mhdr_res->ft_type = attr->egress ? MLX5DV_FLOW_TABLE_TYPE_NIC_TX :
 					   MLX5DV_FLOW_TABLE_TYPE_NIC_RX;
-	ret = mlx5_flow_group_to_table(attr, dev_flow->external, attr->group,
-				       !!priv->fdb_def_rule, &table, error);
+	grp_info.std_tbl_fix = tunnel_use_standard_attr_group_translate
+				(dev, tunnel, attr, items, actions);
+	ret = mlx5_flow_group_to_table(dev, tunnel, attr->group, &table,
+				       grp_info, error);
 	if (ret)
 		return ret;
 	dev_flow->dv.group = table;
@@ -8229,6 +8377,45 @@ __flow_dv_translate(struct rte_eth_dev *dev,
 		priority = dev_conf->flow_prio - 1;
 	/* number of actions must be set to 0 in case of dirty stack. */
 	mhdr_res->actions_num = 0;
+	if (is_flow_tunnel_match_rule(dev, attr, items, actions)) {
+		/*
+		 * do not add a decap action if the match rule drops the packet:
+		 * HW rejects rules with both decap and drop
+		 */
+		bool add_decap = true;
+		const struct rte_flow_action *ptr = actions;
+		struct mlx5_flow_tbl_resource *tbl;
+
+		for (; ptr->type != RTE_FLOW_ACTION_TYPE_END; ptr++) {
+			if (ptr->type == RTE_FLOW_ACTION_TYPE_DROP) {
+				add_decap = false;
+				break;
+			}
+		}
+		if (add_decap) {
+			if (flow_dv_create_action_l2_decap(dev, dev_flow,
+							   attr->transfer,
+							   error))
+				return -rte_errno;
+			dev_flow->dv.actions[actions_n++] =
+					dev_flow->dv.encap_decap->action;
+			action_flags |= MLX5_FLOW_ACTION_DECAP;
+		}
+		/*
+		 * bind table_id with <group, table> for tunnel match rule.
+		 * The tunnel set rule establishes that binding in the JUMP
+		 * action handler. Required for the scenario when the
+		 * application creates the tunnel match rule before the
+		 * tunnel set rule.
+		 */
+		tbl = flow_dv_tbl_resource_get(dev, table, attr->egress,
+					       attr->transfer,
+					       !!dev_flow->external, tunnel,
+					       attr->group, error);
+		if (!tbl)
+			return rte_flow_error_set
+			       (error, EINVAL, RTE_FLOW_ERROR_TYPE_ACTION,
+			       actions, "cannot register tunnel group");
+	}
 	for (; !actions_end ; actions++) {
 		const struct rte_flow_action_queue *queue;
 		const struct rte_flow_action_rss *rss;
@@ -8249,6 +8436,9 @@ __flow_dv_translate(struct rte_eth_dev *dev,
 						  actions,
 						  "action not supported");
 		switch (action_type) {
+		case MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET:
+			action_flags |= MLX5_FLOW_ACTION_TUNNEL_SET;
+			break;
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			break;
 		case RTE_FLOW_ACTION_TYPE_PORT_ID:
@@ -8480,16 +8670,19 @@ __flow_dv_translate(struct rte_eth_dev *dev,
 			action_flags |= MLX5_FLOW_ACTION_DECAP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_JUMP:
+			grp_info.std_tbl_fix = 0;
 			jump_data = action->conf;
-			ret = mlx5_flow_group_to_table(attr, dev_flow->external,
+			ret = mlx5_flow_group_to_table(dev, tunnel,
 						       jump_data->group,
-						       !!priv->fdb_def_rule,
-						       &table, error);
+						       &table,
+						       grp_info, error);
 			if (ret)
 				return ret;
-			tbl = flow_dv_tbl_resource_get(dev, table,
-						       attr->egress,
-						       attr->transfer, error);
+			tbl = flow_dv_tbl_resource_get(dev, table, attr->egress,
+						       attr->transfer,
+						       !!dev_flow->external,
+						       tunnel, jump_data->group,
+						       error);
 			if (!tbl)
 				return rte_flow_error_set
 						(error, errno,
@@ -9681,7 +9874,8 @@ flow_dv_prepare_mtr_tables(struct rte_eth_dev *dev,
 		dtb = &mtb->ingress;
 	/* Create the meter table with METER level. */
 	dtb->tbl = flow_dv_tbl_resource_get(dev, MLX5_FLOW_TABLE_LEVEL_METER,
-					    egress, transfer, &error);
+					    egress, transfer, false, NULL, 0,
+					    &error);
 	if (!dtb->tbl) {
 		DRV_LOG(ERR, "Failed to create meter policer table.");
 		return -1;
@@ -9689,7 +9883,8 @@ flow_dv_prepare_mtr_tables(struct rte_eth_dev *dev,
 	/* Create the meter suffix table with SUFFIX level. */
 	dtb->sfx_tbl = flow_dv_tbl_resource_get(dev,
 					    MLX5_FLOW_TABLE_LEVEL_SUFFIX,
-					    egress, transfer, &error);
+					    egress, transfer, false, NULL, 0,
+					    &error);
 	if (!dtb->sfx_tbl) {
 		DRV_LOG(ERR, "Failed to create meter suffix table.");
 		return -1;
-- 
2.25.1
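
The miss handling above hinges on encoding the destination flow table
into the 32-bit MARK value that comes back in mbuf->hash.fdir.hi. The
following standalone sketch (not part of the patch) shows that
encode/decode round trip, reusing the bit layout of union
tunnel_offload_mark and the bit-16 table-id convention from the patch;
the concrete values are illustrative only.

#include <stdint.h>
#include <stdio.h>
#include <assert.h>

/* Same bit layout as union tunnel_offload_mark in the patch:
 * 8 application bits, 15 table-id bits, 1 transfer bit, 8 unused. */
union tunnel_offload_mark {
	uint32_t val;
	struct {
		uint32_t app_reserve:8;
		uint32_t table_id:15;
		uint32_t transfer:1;
		uint32_t _unused_:8;
	};
};

/* Tunnel flow tables are tagged with bit 16, as in the patch helpers. */
static uint32_t tunnel_id_to_flow_tbl(uint32_t id) { return id | (1u << 16); }
static uint32_t tunnel_flow_tbl_to_id(uint32_t tbl) { return tbl & ~(1u << 16); }

int main(void)
{
	/* Table allocated for some <tunnel, group> pair, e.g. 0x10005. */
	uint32_t flow_table = tunnel_id_to_flow_tbl(5);
	union tunnel_offload_mark mark = { .val = 0 };

	/* What the default-miss rule programs into the MARK action. */
	mark.table_id = tunnel_flow_tbl_to_id(flow_table);
	mark.transfer = 0;
	/* On an RX miss the mark comes back in mbuf->hash.fdir.hi and is
	 * decoded to find the table, and from it the tunnel and group. */
	union tunnel_offload_mark rx = { .val = mark.val };

	assert(tunnel_id_to_flow_tbl(rx.table_id) == flow_table);
	printf("mark=%#x -> table=%#x\n", rx.val, flow_table);
	return 0;
}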


^ permalink raw reply	[flat|nested] 95+ messages in thread

* [dpdk-dev] [PATCH v3 4/4] app/testpmd: add commands for tunnel offload API
  2020-09-30  9:18 ` [dpdk-dev] [PATCH v3 0/4] Tunnel Offload API Gregory Etelson
                     ` (2 preceding siblings ...)
  2020-09-30  9:18   ` [dpdk-dev] [PATCH v3 3/4] net/mlx5: implement tunnel offload API Gregory Etelson
@ 2020-09-30  9:18   ` Gregory Etelson
  2020-10-01  5:32     ` Ajit Khaparde
  3 siblings, 1 reply; 95+ messages in thread
From: Gregory Etelson @ 2020-09-30  9:18 UTC (permalink / raw)
  To: dev
  Cc: getelson, matan, rasland, Ori Kam, Wenzhuo Lu, Beilei Xing,
	Bernard Iremonger, John McNamara, Marko Kovacevic

Tunnel Offload API provides a hardware-independent, unified model
to offload tunneled traffic. Key model elements are:
 - apply matches to both outer and inner packet headers
   during entire offload procedure;
 - restore outer header of partially offloaded packet;
 - model is implemented as a set of helper functions.
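
As a reference, a minimal application-side sketch of the helper
functions this model relies on (independent of testpmd; error handling
is omitted and the VXLAN tunnel parameters are illustrative):

#include <stdio.h>
#include <rte_flow.h>
#include <rte_mbuf.h>

/* Query the PMD for its private actions/items and put them in front of
 * the application's own when building the steering and matching rules.
 * The returned arrays must later be handed back to the PMD with the
 * matching release helpers. */
static void
setup_vxlan_offload(uint16_t port_id)
{
	struct rte_flow_error err;
	struct rte_flow_tunnel tunnel = { .type = RTE_FLOW_ITEM_TYPE_VXLAN };
	struct rte_flow_action *pmd_actions;
	struct rte_flow_item *pmd_items;
	uint32_t n_actions, n_items;

	/* Steering rule: PMD actions precede the application JUMP action. */
	rte_flow_tunnel_decap_set(port_id, &tunnel,
				  &pmd_actions, &n_actions, &err);
	/* Matching rule: PMD items precede the outer/inner pattern. */
	rte_flow_tunnel_match(port_id, &tunnel,
			      &pmd_items, &n_items, &err);
	/* ... assemble the two rule templates and call rte_flow_create() ... */
}

/* On a packet that missed the matching rule, recover its tunnel context. */
static void
on_tunnel_miss(uint16_t port_id, struct rte_mbuf *m)
{
	struct rte_flow_restore_info info;
	struct rte_flow_error err;

	if (rte_flow_get_restore_info(port_id, m, &info, &err))
		return; /* not a partially offloaded tunneled packet */
	if (info.flags & RTE_FLOW_RESTORE_INFO_TUNNEL)
		printf("miss on %s tunnel, outer header %s\n",
		       info.tunnel.type == RTE_FLOW_ITEM_TYPE_VXLAN ?
				"vxlan" : "other",
		       (info.flags & RTE_FLOW_RESTORE_INFO_ENCAPSULATED) ?
				"present" : "stripped");
}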

Implementation details:

* Create application tunnel:
flow tunnel create <port> type <tunnel type>
On success, the command creates an application tunnel object and returns
its tunnel descriptor. The tunnel descriptor is used in subsequent flow
creation commands to reference the tunnel.

* Create tunnel steering flow rule:
The tunnel_set <tunnel descriptor> parameter is used with the steering
rule template.

* Create tunnel matching flow rule:
The tunnel_match <tunnel descriptor> parameter is used with the
matching rule template.

* If the tunnel steering rule was offloaded, the outer header of a
partially offloaded packet is restored after a miss.

Example:
test packet=
<Ether  dst=24:8a:07:8d:ae:d6 src=50:6b:4b:cc:fc:e2 type=IPv4 |
<IP  version=4 ihl=5 proto=udp src=1.1.1.1 dst=1.1.1.10 |
<UDP  sport=4789 dport=4789 len=58 chksum=0x7f7b |
<VXLAN  NextProtocol=Ethernet vni=0x0 |
<Ether  dst=24:aa:aa:aa:aa:d6 src=50:bb:bb:bb:bb:e2 type=IPv4 |
<IP  version=4 ihl=5 proto=icmp src=2.2.2.2 dst=2.2.2.200 |
<ICMP  type=echo-request code=0 chksum=0xf7ff id=0x0 seq=0x0 |>>>>>>>
>>> len(packet)
92

testpmd> flow flush 0
testpmd> port 0/queue 0: received 1 packets
src=50:6B:4B:CC:FC:E2 - dst=24:8A:07:8D:AE:D6 - type=0x0800 -
length=92

testpmd> flow tunnel 0 type vxlan
port 0: flow tunnel #1 type vxlan
testpmd> flow create 0 ingress group 0 tunnel_set 1
         pattern eth /ipv4 / udp dst is 4789 / vxlan / end
         actions  jump group 0 / end
Flow rule #0 created
testpmd> port 0/queue 0: received 1 packets
tunnel restore info: - vxlan tunnel - outer header present # <--
  src=50:6B:4B:CC:FC:E2 - dst=24:8A:07:8D:AE:D6 - type=0x0800 -
length=92

testpmd> flow create 0 ingress group 0 tunnel_match 1
         pattern eth / ipv4 / udp dst is 4789 / vxlan / eth / ipv4 /
         end
         actions set_mac_dst mac_addr 02:CA:FE:CA:FA:80 /
         queue index 0 / end
Flow rule #1 created
testpmd> port 0/queue 0: received 1 packets
  src=50:BB:BB:BB:BB:E2 - dst=02:CA:FE:CA:FA:80 - type=0x0800 -
length=42

* Destroy flow tunnel
flow tunnel destroy <port> id <tunnel id>

* Show existing flow tunnels
flow tunnel list <port>

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
---
v2:
* introduce testpmd support for tunnel offload API

v3:
* update flow tunnel commands
---
 app/test-pmd/cmdline_flow.c                 | 170 ++++++++++++-
 app/test-pmd/config.c                       | 253 +++++++++++++++++++-
 app/test-pmd/testpmd.c                      |   5 +-
 app/test-pmd/testpmd.h                      |  34 ++-
 app/test-pmd/util.c                         |  35 ++-
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  49 ++++
 6 files changed, 533 insertions(+), 13 deletions(-)
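
In terms of mechanics, tunnel_set / tunnel_match boil down to
prepending the PMD-provided actions or items, obtained through the
tunnel API, to what was typed on the command line before
rte_flow_create() is called. A schematic sketch of that splice for the
pattern side, with a hypothetical helper name, kept separate from the
actual config.c changes below:

#include <rte_common.h>
#include <rte_flow.h>

/* Hypothetical helper for illustration: build the final pattern of a
 * tunnel_match rule by placing the PMD items returned by
 * rte_flow_tunnel_match() ahead of the user-supplied pattern. */
static struct rte_flow *
create_tunnel_match_rule(uint16_t port_id,
			 const struct rte_flow_attr *attr,
			 const struct rte_flow_item *pmd_items,
			 uint32_t n_pmd,
			 const struct rte_flow_item *user_pattern,
			 const struct rte_flow_action *actions,
			 struct rte_flow_error *err)
{
	struct rte_flow_item pattern[32];
	uint32_t i, n_user = 0;

	while (user_pattern[n_user].type != RTE_FLOW_ITEM_TYPE_END)
		n_user++;
	if (n_pmd + n_user + 1 > RTE_DIM(pattern))
		return NULL;
	for (i = 0; i < n_pmd; i++)
		pattern[i] = pmd_items[i];
	for (i = 0; i <= n_user; i++)	/* include the END item */
		pattern[n_pmd + i] = user_pattern[i];
	return rte_flow_create(port_id, attr, pattern, actions, err);
}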

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 6263d307ed..0fb61860cd 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -69,6 +69,14 @@ enum index {
 	LIST,
 	AGED,
 	ISOLATE,
+	TUNNEL,
+
+	/* Tunnel arguments. */
+	TUNNEL_CREATE,
+	TUNNEL_CREATE_TYPE,
+	TUNNEL_LIST,
+	TUNNEL_DESTROY,
+	TUNNEL_DESTROY_ID,
 
 	/* Destroy arguments. */
 	DESTROY_RULE,
@@ -88,6 +96,8 @@ enum index {
 	INGRESS,
 	EGRESS,
 	TRANSFER,
+	TUNNEL_SET,
+	TUNNEL_MATCH,
 
 	/* Validate/create pattern. */
 	PATTERN,
@@ -653,6 +663,7 @@ struct buffer {
 	union {
 		struct {
 			struct rte_flow_attr attr;
+			struct tunnel_ops tunnel_ops;
 			struct rte_flow_item *pattern;
 			struct rte_flow_action *actions;
 			uint32_t pattern_n;
@@ -713,10 +724,32 @@ static const enum index next_vc_attr[] = {
 	INGRESS,
 	EGRESS,
 	TRANSFER,
+	TUNNEL_SET,
+	TUNNEL_MATCH,
 	PATTERN,
 	ZERO,
 };
 
+static const enum index tunnel_create_attr[] = {
+	TUNNEL_CREATE,
+	TUNNEL_CREATE_TYPE,
+	END,
+	ZERO,
+};
+
+static const enum index tunnel_destroy_attr[] = {
+	TUNNEL_DESTROY,
+	TUNNEL_DESTROY_ID,
+	END,
+	ZERO,
+};
+
+static const enum index tunnel_list_attr[] = {
+	TUNNEL_LIST,
+	END,
+	ZERO,
+};
+
 static const enum index next_destroy_attr[] = {
 	DESTROY_RULE,
 	END,
@@ -1516,6 +1549,9 @@ static int parse_aged(struct context *, const struct token *,
 static int parse_isolate(struct context *, const struct token *,
 			 const char *, unsigned int,
 			 void *, unsigned int);
+static int parse_tunnel(struct context *, const struct token *,
+			const char *, unsigned int,
+			void *, unsigned int);
 static int parse_int(struct context *, const struct token *,
 		     const char *, unsigned int,
 		     void *, unsigned int);
@@ -1698,7 +1734,8 @@ static const struct token token_list[] = {
 			      LIST,
 			      AGED,
 			      QUERY,
-			      ISOLATE)),
+			      ISOLATE,
+			      TUNNEL)),
 		.call = parse_init,
 	},
 	/* Sub-level commands. */
@@ -1772,6 +1809,49 @@ static const struct token token_list[] = {
 			     ARGS_ENTRY(struct buffer, port)),
 		.call = parse_isolate,
 	},
+	[TUNNEL] = {
+		.name = "tunnel",
+		.help = "new tunnel API",
+		.next = NEXT(NEXT_ENTRY
+			     (TUNNEL_CREATE, TUNNEL_LIST, TUNNEL_DESTROY)),
+		.call = parse_tunnel,
+	},
+	/* Tunnel arguments. */
+	[TUNNEL_CREATE] = {
+		.name = "create",
+		.help = "create new tunnel object",
+		.next = NEXT(tunnel_create_attr, NEXT_ENTRY(PORT_ID)),
+		.args = ARGS(ARGS_ENTRY(struct buffer, port)),
+		.call = parse_tunnel,
+	},
+	[TUNNEL_CREATE_TYPE] = {
+		.name = "type",
+		.help = "create new tunnel",
+		.next = NEXT(tunnel_create_attr, NEXT_ENTRY(FILE_PATH)),
+		.args = ARGS(ARGS_ENTRY(struct tunnel_ops, type)),
+		.call = parse_tunnel,
+	},
+	[TUNNEL_DESTROY] = {
+		.name = "destroy",
+		.help = "destroy tunel",
+		.next = NEXT(tunnel_destroy_attr, NEXT_ENTRY(PORT_ID)),
+		.args = ARGS(ARGS_ENTRY(struct buffer, port)),
+		.call = parse_tunnel,
+	},
+	[TUNNEL_DESTROY_ID] = {
+		.name = "id",
+		.help = "tunnel identifier to testroy",
+		.next = NEXT(tunnel_destroy_attr, NEXT_ENTRY(UNSIGNED)),
+		.args = ARGS(ARGS_ENTRY(struct tunnel_ops, id)),
+		.call = parse_tunnel,
+	},
+	[TUNNEL_LIST] = {
+		.name = "list",
+		.help = "list existing tunnels",
+		.next = NEXT(tunnel_list_attr, NEXT_ENTRY(PORT_ID)),
+		.args = ARGS(ARGS_ENTRY(struct buffer, port)),
+		.call = parse_tunnel,
+	},
 	/* Destroy arguments. */
 	[DESTROY_RULE] = {
 		.name = "rule",
@@ -1835,6 +1915,20 @@ static const struct token token_list[] = {
 		.next = NEXT(next_vc_attr),
 		.call = parse_vc,
 	},
+	[TUNNEL_SET] = {
+		.name = "tunnel_set",
+		.help = "tunnel steer rule",
+		.next = NEXT(next_vc_attr, NEXT_ENTRY(UNSIGNED)),
+		.args = ARGS(ARGS_ENTRY(struct tunnel_ops, id)),
+		.call = parse_vc,
+	},
+	[TUNNEL_MATCH] = {
+		.name = "tunnel_match",
+		.help = "tunnel match rule",
+		.next = NEXT(next_vc_attr, NEXT_ENTRY(UNSIGNED)),
+		.args = ARGS(ARGS_ENTRY(struct tunnel_ops, id)),
+		.call = parse_vc,
+	},
 	/* Validate/create pattern. */
 	[PATTERN] = {
 		.name = "pattern",
@@ -4054,12 +4148,28 @@ parse_vc(struct context *ctx, const struct token *token,
 		return len;
 	}
 	ctx->objdata = 0;
-	ctx->object = &out->args.vc.attr;
+	switch (ctx->curr) {
+	default:
+		ctx->object = &out->args.vc.attr;
+		break;
+	case TUNNEL_SET:
+	case TUNNEL_MATCH:
+		ctx->object = &out->args.vc.tunnel_ops;
+		break;
+	}
 	ctx->objmask = NULL;
 	switch (ctx->curr) {
 	case GROUP:
 	case PRIORITY:
 		return len;
+	case TUNNEL_SET:
+		out->args.vc.tunnel_ops.enabled = 1;
+		out->args.vc.tunnel_ops.actions = 1;
+		return len;
+	case TUNNEL_MATCH:
+		out->args.vc.tunnel_ops.enabled = 1;
+		out->args.vc.tunnel_ops.items = 1;
+		return len;
 	case INGRESS:
 		out->args.vc.attr.ingress = 1;
 		return len;
@@ -5597,6 +5707,47 @@ parse_isolate(struct context *ctx, const struct token *token,
 	return len;
 }
 
+static int
+parse_tunnel(struct context *ctx, const struct token *token,
+	     const char *str, unsigned int len,
+	     void *buf, unsigned int size)
+{
+	struct buffer *out = buf;
+
+	/* Token name must match. */
+	if (parse_default(ctx, token, str, len, NULL, 0) < 0)
+		return -1;
+	/* Nothing else to do if there is no buffer. */
+	if (!out)
+		return len;
+	if (!out->command) {
+		if (ctx->curr != TUNNEL)
+			return -1;
+		if (sizeof(*out) > size)
+			return -1;
+		out->command = ctx->curr;
+		ctx->objdata = 0;
+		ctx->object = out;
+		ctx->objmask = NULL;
+	} else {
+		switch (ctx->curr) {
+		default:
+			break;
+		case TUNNEL_CREATE:
+		case TUNNEL_DESTROY:
+		case TUNNEL_LIST:
+			out->command = ctx->curr;
+			break;
+		case TUNNEL_CREATE_TYPE:
+		case TUNNEL_DESTROY_ID:
+			ctx->object = &out->args.vc.tunnel_ops;
+			break;
+		}
+	}
+
+	return len;
+}
+
 /**
  * Parse signed/unsigned integers 8 to 64-bit long.
  *
@@ -6543,11 +6694,13 @@ cmd_flow_parsed(const struct buffer *in)
 	switch (in->command) {
 	case VALIDATE:
 		port_flow_validate(in->port, &in->args.vc.attr,
-				   in->args.vc.pattern, in->args.vc.actions);
+				   in->args.vc.pattern, in->args.vc.actions,
+				   &in->args.vc.tunnel_ops);
 		break;
 	case CREATE:
 		port_flow_create(in->port, &in->args.vc.attr,
-				 in->args.vc.pattern, in->args.vc.actions);
+				 in->args.vc.pattern, in->args.vc.actions,
+				 &in->args.vc.tunnel_ops);
 		break;
 	case DESTROY:
 		port_flow_destroy(in->port, in->args.destroy.rule_n,
@@ -6573,6 +6726,15 @@ cmd_flow_parsed(const struct buffer *in)
 	case AGED:
 		port_flow_aged(in->port, in->args.aged.destroy);
 		break;
+	case TUNNEL_CREATE:
+		port_flow_tunnel_create(in->port, &in->args.vc.tunnel_ops);
+		break;
+	case TUNNEL_DESTROY:
+		port_flow_tunnel_destroy(in->port, in->args.vc.tunnel_ops.id);
+		break;
+	case TUNNEL_LIST:
+		port_flow_tunnel_list(in->port);
+		break;
 	default:
 		break;
 	}
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 2d9a456467..d0f86230d0 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1339,6 +1339,115 @@ port_mtu_set(portid_t port_id, uint16_t mtu)
 
 /* Generic flow management functions. */
 
+static struct port_flow_tunnel *
+port_flow_locate_tunnel_id(struct rte_port *port, uint32_t port_tunnel_id)
+{
+	struct port_flow_tunnel *flow_tunnel;
+
+	LIST_FOREACH(flow_tunnel, &port->flow_tunnel_list, chain) {
+		if (flow_tunnel->id == port_tunnel_id)
+			goto out;
+	}
+	flow_tunnel = NULL;
+
+out:
+	return flow_tunnel;
+}
+
+const char *
+port_flow_tunnel_type(struct rte_flow_tunnel *tunnel)
+{
+	const char *type;
+	switch (tunnel->type) {
+	default:
+		type = "unknown";
+		break;
+	case RTE_FLOW_ITEM_TYPE_VXLAN:
+		type = "vxlan";
+		break;
+	}
+
+	return type;
+}
+
+struct port_flow_tunnel *
+port_flow_locate_tunnel(uint16_t port_id, struct rte_flow_tunnel *tun)
+{
+	struct rte_port *port = &ports[port_id];
+	struct port_flow_tunnel *flow_tunnel;
+
+	LIST_FOREACH(flow_tunnel, &port->flow_tunnel_list, chain) {
+		if (!memcmp(&flow_tunnel->tunnel, tun, sizeof(*tun)))
+			goto out;
+	}
+	flow_tunnel = NULL;
+
+out:
+	return flow_tunnel;
+}
+
+void port_flow_tunnel_list(portid_t port_id)
+{
+	struct rte_port *port = &ports[port_id];
+	struct port_flow_tunnel *flt;
+
+	LIST_FOREACH(flt, &port->flow_tunnel_list, chain) {
+		printf("port %u tunnel #%u type=%s",
+			port_id, flt->id, port_flow_tunnel_type(&flt->tunnel));
+		if (flt->tunnel.tun_id)
+			printf(" id=%lu", flt->tunnel.tun_id);
+		printf("\n");
+	}
+}
+
+void port_flow_tunnel_destroy(portid_t port_id, uint32_t tunnel_id)
+{
+	struct rte_port *port = &ports[port_id];
+	struct port_flow_tunnel *flt;
+
+	LIST_FOREACH(flt, &port->flow_tunnel_list, chain) {
+		if (flt->id == tunnel_id)
+			break;
+	}
+	if (flt) {
+		LIST_REMOVE(flt, chain);
+		free(flt);
+		printf("port %u: flow tunnel #%u destroyed\n",
+			port_id, tunnel_id);
+	}
+}
+
+void port_flow_tunnel_create(portid_t port_id, const struct tunnel_ops *ops)
+{
+	struct rte_port *port = &ports[port_id];
+	enum rte_flow_item_type	type;
+	struct port_flow_tunnel *flt;
+
+	if (!strcmp(ops->type, "vxlan"))
+		type = RTE_FLOW_ITEM_TYPE_VXLAN;
+	else {
+		printf("cannot offload \"%s\" tunnel type\n", ops->type);
+		return;
+	}
+	LIST_FOREACH(flt, &port->flow_tunnel_list, chain) {
+		if (flt->tunnel.type == type)
+			break;
+	}
+	if (!flt) {
+		flt = calloc(1, sizeof(*flt));
+		if (!flt) {
+			printf("failed to allocate port flt object\n");
+			return;
+		}
+		flt->tunnel.type = type;
+		flt->id = LIST_EMPTY(&port->flow_tunnel_list) ? 1 :
+				  LIST_FIRST(&port->flow_tunnel_list)->id + 1;
+		LIST_INSERT_HEAD(&port->flow_tunnel_list, flt, chain);
+	}
+	printf("port %d: flow tunnel #%u type %s\n",
+		port_id, flt->id, ops->type);
+}
+
 /** Generate a port_flow entry from attributes/pattern/actions. */
 static struct port_flow *
 port_flow_new(const struct rte_flow_attr *attr,
@@ -1463,19 +1572,137 @@ rss_config_display(struct rte_flow_action_rss *rss_conf)
 	}
 }
 
+static struct port_flow_tunnel *
+port_flow_tunnel_offload_cmd_prep(portid_t port_id,
+				  const struct rte_flow_item *pattern,
+				  const struct rte_flow_action *actions,
+				  const struct tunnel_ops *tunnel_ops)
+{
+	int ret;
+	struct rte_port *port;
+	struct port_flow_tunnel *pft;
+	struct rte_flow_error error;
+
+	port = &ports[port_id];
+	pft = port_flow_locate_tunnel_id(port, tunnel_ops->id);
+	if (!pft) {
+		printf("failed to locate port flow tunnel #%u\n",
+			tunnel_ops->id);
+		return NULL;
+	}
+	if (tunnel_ops->actions) {
+		uint32_t num_actions;
+		const struct rte_flow_action *aptr;
+
+		ret = rte_flow_tunnel_decap_set(port_id, &pft->tunnel,
+						&pft->pmd_actions,
+						&pft->num_pmd_actions,
+						&error);
+		if (ret) {
+			port_flow_complain(&error);
+			return NULL;
+		}
+		for (aptr = actions, num_actions = 1;
+		     aptr->type != RTE_FLOW_ACTION_TYPE_END;
+		     aptr++, num_actions++);
+		pft->actions = malloc(
+				(num_actions +  pft->num_pmd_actions) *
+				sizeof(actions[0]));
+		if (!pft->actions) {
+			rte_flow_tunnel_action_decap_release(
+					port_id, pft->pmd_actions,
+					pft->num_pmd_actions, &error);
+			return NULL;
+		}
+		rte_memcpy(pft->actions, pft->pmd_actions,
+			   pft->num_pmd_actions * sizeof(actions[0]));
+		rte_memcpy(pft->actions + pft->num_pmd_actions, actions,
+			   num_actions * sizeof(actions[0]));
+	}
+	if (tunnel_ops->items) {
+		uint32_t num_items;
+		const struct rte_flow_item *iptr;
+
+		ret = rte_flow_tunnel_match(port_id, &pft->tunnel,
+					    &pft->pmd_items,
+					    &pft->num_pmd_items,
+					    &error);
+		if (ret) {
+			port_flow_complain(&error);
+			return NULL;
+		}
+		for (iptr = pattern, num_items = 1;
+		     iptr->type != RTE_FLOW_ITEM_TYPE_END;
+		     iptr++, num_items++);
+		pft->items = malloc((num_items + pft->num_pmd_items) *
+				    sizeof(pattern[0]));
+		if (!pft->items) {
+			rte_flow_tunnel_item_release(
+					port_id, pft->pmd_items,
+					pft->num_pmd_items, &error);
+			return NULL;
+		}
+		rte_memcpy(pft->items, pft->pmd_items,
+			   pft->num_pmd_items * sizeof(pattern[0]));
+		rte_memcpy(pft->items + pft->num_pmd_items, pattern,
+			   num_items * sizeof(pattern[0]));
+	}
+
+	return pft;
+}
+
+static void
+port_flow_tunnel_offload_cmd_release(portid_t port_id,
+				     const struct tunnel_ops *tunnel_ops,
+				     struct port_flow_tunnel *pft)
+{
+	struct rte_flow_error error;
+
+	if (tunnel_ops->actions) {
+		free(pft->actions);
+		rte_flow_tunnel_action_decap_release(
+			port_id, pft->pmd_actions,
+			pft->num_pmd_actions, &error);
+		pft->actions = NULL;
+		pft->pmd_actions = NULL;
+	}
+	if (tunnel_ops->items) {
+		free(pft->items);
+		rte_flow_tunnel_item_release(port_id, pft->pmd_items,
+					     pft->num_pmd_items,
+					     &error);
+		pft->items = NULL;
+		pft->pmd_items = NULL;
+	}
+}
+
 /** Validate flow rule. */
 int
 port_flow_validate(portid_t port_id,
 		   const struct rte_flow_attr *attr,
 		   const struct rte_flow_item *pattern,
-		   const struct rte_flow_action *actions)
+		   const struct rte_flow_action *actions,
+		   const struct tunnel_ops *tunnel_ops)
 {
 	struct rte_flow_error error;
+	struct port_flow_tunnel *pft = NULL;
 
 	/* Poisoning to make sure PMDs update it in case of error. */
 	memset(&error, 0x11, sizeof(error));
+	if (tunnel_ops->enabled) {
+		pft = port_flow_tunnel_offload_cmd_prep(port_id, pattern,
+							actions, tunnel_ops);
+		if (!pft)
+			return -ENOENT;
+		if (pft->items)
+			pattern = pft->items;
+		if (pft->actions)
+			actions = pft->actions;
+	}
 	if (rte_flow_validate(port_id, attr, pattern, actions, &error))
 		return port_flow_complain(&error);
+	if (tunnel_ops->enabled)
+		port_flow_tunnel_offload_cmd_release(port_id, tunnel_ops, pft);
 	printf("Flow rule validated\n");
 	return 0;
 }
@@ -1505,13 +1732,15 @@ int
 port_flow_create(portid_t port_id,
 		 const struct rte_flow_attr *attr,
 		 const struct rte_flow_item *pattern,
-		 const struct rte_flow_action *actions)
+		 const struct rte_flow_action *actions,
+		 const struct tunnel_ops *tunnel_ops)
 {
 	struct rte_flow *flow;
 	struct rte_port *port;
 	struct port_flow *pf;
 	uint32_t id = 0;
 	struct rte_flow_error error;
+	struct port_flow_tunnel *pft = NULL;
 
 	port = &ports[port_id];
 	if (port->flow_list) {
@@ -1522,6 +1751,16 @@ port_flow_create(portid_t port_id,
 		}
 		id = port->flow_list->id + 1;
 	}
+	if (tunnel_ops->enabled) {
+		pft = port_flow_tunnel_offload_cmd_prep(port_id, pattern,
+							actions, tunnel_ops);
+		if (!pft)
+			return -ENOENT;
+		if (pft->items)
+			pattern = pft->items;
+		if (pft->actions)
+			actions = pft->actions;
+	}
 	pf = port_flow_new(attr, pattern, actions, &error);
 	if (!pf)
 		return port_flow_complain(&error);
@@ -1537,6 +1776,8 @@ port_flow_create(portid_t port_id,
 	pf->id = id;
 	pf->flow = flow;
 	port->flow_list = pf;
+	if (tunnel_ops->enabled)
+		port_flow_tunnel_offload_cmd_release(port_id, tunnel_ops, pft);
 	printf("Flow rule #%u created\n", pf->id);
 	return 0;
 }
@@ -1831,7 +2072,9 @@ port_flow_list(portid_t port_id, uint32_t n, const uint32_t group[n])
 		       pf->rule.attr->egress ? 'e' : '-',
 		       pf->rule.attr->transfer ? 't' : '-');
 		while (item->type != RTE_FLOW_ITEM_TYPE_END) {
-			if (rte_flow_conv(RTE_FLOW_CONV_OP_ITEM_NAME_PTR,
+			if ((uint32_t)item->type > INT_MAX)
+				name = "PMD_INTERNAL";
+			else if (rte_flow_conv(RTE_FLOW_CONV_OP_ITEM_NAME_PTR,
 					  &name, sizeof(name),
 					  (void *)(uintptr_t)item->type,
 					  NULL) <= 0)
@@ -1842,7 +2085,9 @@ port_flow_list(portid_t port_id, uint32_t n, const uint32_t group[n])
 		}
 		printf("=>");
 		while (action->type != RTE_FLOW_ACTION_TYPE_END) {
-			if (rte_flow_conv(RTE_FLOW_CONV_OP_ACTION_NAME_PTR,
+			if ((uint32_t)action->type > INT_MAX)
+				name = "PMD_INTERNAL";
+			else if (rte_flow_conv(RTE_FLOW_CONV_OP_ACTION_NAME_PTR,
 					  &name, sizeof(name),
 					  (void *)(uintptr_t)action->type,
 					  NULL) <= 0)
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index fe6450cc0d..e484079147 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -3588,6 +3588,8 @@ init_port_dcb_config(portid_t pid,
 static void
 init_port(void)
 {
+	int i;
+
 	/* Configuration of Ethernet ports. */
 	ports = rte_zmalloc("testpmd: ports",
 			    sizeof(struct rte_port) * RTE_MAX_ETHPORTS,
@@ -3597,7 +3599,8 @@ init_port(void)
 				"rte_zmalloc(%d struct rte_port) failed\n",
 				RTE_MAX_ETHPORTS);
 	}
-
+	for (i = 0; i < RTE_MAX_ETHPORTS; i++)
+		LIST_INIT(&ports[i].flow_tunnel_list);
 	/* Initialize ports NUMA structures */
 	memset(port_numa, NUMA_NO_CONFIG, RTE_MAX_ETHPORTS);
 	memset(rxring_numa, NUMA_NO_CONFIG, RTE_MAX_ETHPORTS);
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index f139fe7a0a..1dd8cfd3d8 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -12,6 +12,7 @@
 #include <rte_gro.h>
 #include <rte_gso.h>
 #include <cmdline.h>
+#include <sys/queue.h>
 
 #define RTE_PORT_ALL            (~(portid_t)0x0)
 
@@ -142,6 +143,26 @@ struct port_flow {
 	uint8_t data[]; /**< Storage for flow rule description */
 };
 
+struct port_flow_tunnel {
+	LIST_ENTRY(port_flow_tunnel) chain;
+	struct rte_flow_action *pmd_actions;
+	struct rte_flow_item   *pmd_items;
+	uint32_t id;
+	uint32_t num_pmd_actions;
+	uint32_t num_pmd_items;
+	struct rte_flow_tunnel tunnel;
+	struct rte_flow_action *actions;
+	struct rte_flow_item *items;
+};
+
+struct tunnel_ops {
+	uint32_t id;
+	char type[16];
+	uint32_t enabled:1;
+	uint32_t actions:1;
+	uint32_t items:1;
+};
+
 /**
  * The data structure associated with each port.
  */
@@ -172,6 +193,7 @@ struct rte_port {
 	uint32_t                mc_addr_nb; /**< nb. of addr. in mc_addr_pool */
 	uint8_t                 slave_flag; /**< bonding slave port */
 	struct port_flow        *flow_list; /**< Associated flows. */
+	LIST_HEAD(, port_flow_tunnel) flow_tunnel_list;
 	const struct rte_eth_rxtx_callback *rx_dump_cb[RTE_MAX_QUEUES_PER_PORT+1];
 	const struct rte_eth_rxtx_callback *tx_dump_cb[RTE_MAX_QUEUES_PER_PORT+1];
 	/**< metadata value to insert in Tx packets. */
@@ -749,11 +771,13 @@ void port_reg_set(portid_t port_id, uint32_t reg_off, uint32_t value);
 int port_flow_validate(portid_t port_id,
 		       const struct rte_flow_attr *attr,
 		       const struct rte_flow_item *pattern,
-		       const struct rte_flow_action *actions);
+		       const struct rte_flow_action *actions,
+		       const struct tunnel_ops *tunnel_ops);
 int port_flow_create(portid_t port_id,
 		     const struct rte_flow_attr *attr,
 		     const struct rte_flow_item *pattern,
-		     const struct rte_flow_action *actions);
+		     const struct rte_flow_action *actions,
+		     const struct tunnel_ops *tunnel_ops);
 void update_age_action_context(const struct rte_flow_action *actions,
 		     struct port_flow *pf);
 int port_flow_destroy(portid_t port_id, uint32_t n, const uint32_t *rule);
@@ -763,6 +787,12 @@ int port_flow_query(portid_t port_id, uint32_t rule,
 		    const struct rte_flow_action *action);
 void port_flow_list(portid_t port_id, uint32_t n, const uint32_t *group);
 void port_flow_aged(portid_t port_id, uint8_t destroy);
+const char *port_flow_tunnel_type(struct rte_flow_tunnel *tunnel);
+struct port_flow_tunnel *
+port_flow_locate_tunnel(uint16_t port_id, struct rte_flow_tunnel *tun);
+void port_flow_tunnel_list(portid_t port_id);
+void port_flow_tunnel_destroy(portid_t port_id, uint32_t tunnel_id);
+void port_flow_tunnel_create(portid_t port_id, const struct tunnel_ops *ops);
 int port_flow_isolate(portid_t port_id, int set);
 
 void rx_ring_desc_display(portid_t port_id, queueid_t rxq_id, uint16_t rxd_id);
diff --git a/app/test-pmd/util.c b/app/test-pmd/util.c
index 8488fa1a8f..781a813759 100644
--- a/app/test-pmd/util.c
+++ b/app/test-pmd/util.c
@@ -48,18 +48,49 @@ dump_pkt_burst(uint16_t port_id, uint16_t queue, struct rte_mbuf *pkts[],
 	       is_rx ? "received" : "sent",
 	       (unsigned int) nb_pkts);
 	for (i = 0; i < nb_pkts; i++) {
+		int ret;
+		struct rte_flow_error error;
+		struct rte_flow_restore_info info = { 0, };
+
 		mb = pkts[i];
 		eth_hdr = rte_pktmbuf_read(mb, 0, sizeof(_eth_hdr), &_eth_hdr);
 		eth_type = RTE_BE_TO_CPU_16(eth_hdr->ether_type);
-		ol_flags = mb->ol_flags;
 		packet_type = mb->packet_type;
 		is_encapsulation = RTE_ETH_IS_TUNNEL_PKT(packet_type);
-
+		ret = rte_flow_get_restore_info(port_id, mb, &info, &error);
+		if (!ret) {
+			printf("restore info:");
+			if (info.flags & RTE_FLOW_RESTORE_INFO_TUNNEL) {
+				struct port_flow_tunnel *port_tunnel;
+
+				port_tunnel = port_flow_locate_tunnel
+					      (port_id, &info.tunnel);
+				printf(" - tunnel");
+				if (port_tunnel)
+					printf(" #%u", port_tunnel->id);
+				else
+					printf(" %s", "-none-");
+				printf(" type %s",
+					port_flow_tunnel_type(&info.tunnel));
+			} else {
+				printf(" - no tunnel info");
+			}
+			if (info.flags & RTE_FLOW_RESTORE_INFO_ENCAPSULATED)
+				printf(" - outer header present");
+			else
+				printf(" - no outer header");
+			if (info.flags & RTE_FLOW_RESTORE_INFO_GROUP_ID)
+				printf(" - miss group %u", info.group_id);
+			else
+				printf(" - no miss group");
+			printf("\n");
+		}
 		print_ether_addr("  src=", &eth_hdr->s_addr);
 		print_ether_addr(" - dst=", &eth_hdr->d_addr);
 		printf(" - type=0x%04x - length=%u - nb_segs=%d",
 		       eth_type, (unsigned int) mb->pkt_len,
 		       (int)mb->nb_segs);
+		ol_flags = mb->ol_flags;
 		if (ol_flags & PKT_RX_RSS_HASH) {
 			printf(" - RSS hash=0x%x", (unsigned int) mb->hash.rss);
 			printf(" - RSS queue=0x%x", (unsigned int) queue);
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index a972ef8951..97fcbfd329 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -3720,6 +3720,45 @@ following sections.
 
    flow aged {port_id} [destroy]
 
+- Tunnel offload - create a tunnel stub::
+
+   flow tunnel create {port_id} type {tunnel_type}
+
+- Tunnel offload - destroy a tunnel stub::
+
+   flow tunnel destroy {port_id} id {tunnel_id}
+
+- Tunnel offload - list port tunnel stubs::
+
+   flow tunnel list {port_id}
+
+Creating a tunnel stub for offload
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``flow tunnel create`` sets up a tunnel stub for tunnel offload flow rules::
+
+   flow tunnel create {port_id} type {tunnel_type}
+
+If successful, it will return a tunnel stub ID usable with other commands::
+
+   port [...]: flow tunnel #[...] type [...]
+
+Tunnel stub IDs are scoped per port.
+
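+For example, creating a VXLAN tunnel stub on port 0 (illustrative output)::
+
+   testpmd> flow tunnel create 0 type vxlan
+   port 0: flow tunnel #1 type vxlan
+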
+Destroying tunnel offload stub
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``flow tunnel destroy`` destroys a port tunnel stub::
+
+   flow tunnel destroy {port_id} id {tunnel_id}
+
+Listing tunnel offload stubs
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``flow tunnel list`` lists the port tunnel offload stubs::
+
+   flow tunnel list {port_id}
+
 Validating flow rules
 ~~~~~~~~~~~~~~~~~~~~~
 
@@ -3766,6 +3805,7 @@ to ``rte_flow_create()``::
 
    flow create {port_id}
       [group {group_id}] [priority {level}] [ingress] [egress] [transfer]
+      [tunnel_set {tunnel_id}] [tunnel_match {tunnel_id}]
       pattern {item} [/ {item} [...]] / end
       actions {action} [/ {action} [...]] / end
 
@@ -3780,6 +3820,7 @@ Otherwise it will show an error message of the form::
 Parameters describe in the following order:
 
 - Attributes (*group*, *priority*, *ingress*, *egress*, *transfer* tokens).
+- Tunnel offload specification (*tunnel_set*, *tunnel_match* tokens).
 - A matching pattern, starting with the *pattern* token and terminated by an
   *end* pattern item.
 - Actions, starting with the *actions* token and terminated by an *end*
@@ -3823,6 +3864,14 @@ Most rules affect RX therefore contain the ``ingress`` token::
 
    testpmd> flow create 0 ingress pattern [...]
 
+Tunnel offload
+^^^^^^^^^^^^^^
+
+Indicate the tunnel offload rule type (an example follows the list):
+
+- ``tunnel_set {tunnel_id}``: mark rule as tunnel offload decap_set type.
+- ``tunnel_match {tunnel_id}``: mark rule as tunnel offload match type.
+
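+For example, steering and matching rules that reference tunnel stub #1
+(illustrative; the stub is created beforehand with ``flow tunnel create``)::
+
+   testpmd> flow create 0 ingress group 0 tunnel_set 1
+            pattern eth / ipv4 / udp dst is 4789 / vxlan / end
+            actions jump group 0 / end
+   testpmd> flow create 0 ingress group 0 tunnel_match 1
+            pattern eth / ipv4 / udp dst is 4789 / vxlan / eth / ipv4 / end
+            actions set_mac_dst mac_addr 02:CA:FE:CA:FA:80 /
+            queue index 0 / end
+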
 Matching pattern
 ^^^^^^^^^^^^^^^^
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH v3 4/4] app/testpmd: add commands for tunnel offload API
  2020-09-30  9:18   ` [dpdk-dev] [PATCH v3 4/4] app/testpmd: add commands for " Gregory Etelson
@ 2020-10-01  5:32     ` Ajit Khaparde
  2020-10-01  9:05       ` Gregory Etelson
  0 siblings, 1 reply; 95+ messages in thread
From: Ajit Khaparde @ 2020-10-01  5:32 UTC (permalink / raw)
  To: Gregory Etelson
  Cc: dpdk-dev, Matan Azrad, rasland, Ori Kam, Wenzhuo Lu, Beilei Xing,
	Bernard Iremonger, John McNamara, Marko Kovacevic

On Wed, Sep 30, 2020 at 2:21 AM Gregory Etelson <getelson@nvidia.com> wrote:
>
> Tunnel Offload API provides hardware independent, unified model
> to offload tunneled traffic. Key model elements are:
>  - apply matches to both outer and inner packet headers
>    during entire offload procedure;
>  - restore outer header of partially offloaded packet;
>  - model is implemented as a set of helper functions.
>
> Implementation details:
>
> * Create application tunnel:
> flow tunnel create <port> type <tunnel type>
> On success, the command creates application tunnel object and returns
> the tunnel descriptor. Tunnel descriptor is used in subsequent flow
> creation commands to reference the tunnel.
>
> * Create tunnel steering flow rule:
> tunnel_set <tunnel descriptor> parameter used with steering rule
> template.
>
> * Create tunnel matching flow rule:
> tunnel_match <tunnel descriptor> used with matching rule template.
>
> * If tunnel steering rule was offloaded, outer header of a partially
> offloaded packet is restored after miss.
>
> Example:
> test packet=
> <Ether  dst=24:8a:07:8d:ae:d6 src=50:6b:4b:cc:fc:e2 type=IPv4 |
> <IP  version=4 ihl=5 proto=udp src=1.1.1.1 dst=1.1.1.10 |
> <UDP  sport=4789 dport=4789 len=58 chksum=0x7f7b |
> <VXLAN  NextProtocol=Ethernet vni=0x0 |
> <Ether  dst=24:aa:aa:aa:aa:d6 src=50:bb:bb:bb:bb:e2 type=IPv4 |
> <IP  version=4 ihl=5 proto=icmp src=2.2.2.2 dst=2.2.2.200 |
> <ICMP  type=echo-request code=0 chksum=0xf7ff id=0x0 seq=0x0 |>>>>>>>
> >>> len(packet)
> 92
>
> testpmd> flow flush 0
> testpmd> port 0/queue 0: received 1 packets
> src=50:6B:4B:CC:FC:E2 - dst=24:8A:07:8D:AE:D6 - type=0x0800 -
> length=92
>
> testpmd> flow tunnel 0 type vxlan
> port 0: flow tunnel #1 type vxlan
> testpmd> flow create 0 ingress group 0 tunnel_set 1
>          pattern eth /ipv4 / udp dst is 4789 / vxlan / end
>          actions  jump group 0 / end

I could not get enough time to completely look at this.
Can you help me understand the sequence a bit?

So when this flow create is issued, it looks like the application issues
one of the helper APIs.
In response the PMD returns the vendor specific items/actions.
So what happens to the original match and action criteria provided by the user
or applications?

Do the vendor-provided actions and items in the tunnel descriptor
take precedence over what was used by the application?

And are the criteria provided by the PMD opaque to the application,
such that only the PMD can decipher them when the application passes them
to subsequent flow create(s)?

> Flow rule #0 created
> testpmd> port 0/queue 0: received 1 packets
> tunnel restore info: - vxlan tunnel - outer header present # <--
>   src=50:6B:4B:CC:FC:E2 - dst=24:8A:07:8D:AE:D6 - type=0x0800 -
> length=92
>
> testpmd> flow create 0 ingress group 0 tunnel_match 1
>          pattern eth / ipv4 / udp dst is 4789 / vxlan / eth / ipv4 /
>          end
>          actions set_mac_dst mac_addr 02:CA:FE:CA:FA:80 /
>          queue index 0 / end
> Flow rule #1 created
> testpmd> port 0/queue 0: received 1 packets
>   src=50:BB:BB:BB:BB:E2 - dst=02:CA:FE:CA:FA:80 - type=0x0800 -
> length=42
>
> * Destroy flow tunnel
> flow tunnel destroy <port> id <tunnel id>
>
> * Show existing flow tunnels
> flow tunnel list <port>
>
> Signed-off-by: Gregory Etelson <getelson@nvidia.com>
> ---
> v2:
> * introduce testpmd support for tunnel offload API
>
> v3:
> * update flow tunnel commands
> ---
>  app/test-pmd/cmdline_flow.c                 | 170 ++++++++++++-
>  app/test-pmd/config.c                       | 253 +++++++++++++++++++-
>  app/test-pmd/testpmd.c                      |   5 +-
>  app/test-pmd/testpmd.h                      |  34 ++-
>  app/test-pmd/util.c                         |  35 ++-
>  doc/guides/testpmd_app_ug/testpmd_funcs.rst |  49 ++++
>  6 files changed, 533 insertions(+), 13 deletions(-)
>

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH v3 4/4] app/testpmd: add commands for tunnel offload API
  2020-10-01  5:32     ` Ajit Khaparde
@ 2020-10-01  9:05       ` Gregory Etelson
  2020-10-04  5:40         ` Ajit Khaparde
  0 siblings, 1 reply; 95+ messages in thread
From: Gregory Etelson @ 2020-10-01  9:05 UTC (permalink / raw)
  To: Ajit Khaparde
  Cc: dpdk-dev, Matan Azrad, Raslan Darawsheh, Ori Kam, Wenzhuo Lu,
	Beilei Xing, Bernard Iremonger, John McNamara, Marko Kovacevic,
	Eli Britstein, Oz Shlomo

Hello Ajit,

> -----Original Message-----
> On Wed, Sep 30, 2020 at 2:21 AM Gregory Etelson <getelson@nvidia.com>
> wrote:
> >
> > Tunnel Offload API provides hardware independent, unified model
> > to offload tunneled traffic. Key model elements are:
> >  - apply matches to both outer and inner packet headers
> >    during entire offload procedure;
> >  - restore outer header of partially offloaded packet;
> >  - model is implemented as a set of helper functions.
> >
> > Implementation details:
> >
> > * Create application tunnel:
> > flow tunnel create <port> type <tunnel type>
> > On success, the command creates application tunnel object and returns
> > the tunnel descriptor. Tunnel descriptor is used in subsequent flow
> > creation commands to reference the tunnel.
> >
> > * Create tunnel steering flow rule:
> > tunnel_set <tunnel descriptor> parameter used with steering rule
> > template.
> >
> > * Create tunnel matching flow rule:
> > tunnel_match <tunnel descriptor> used with matching rule template.
> >
> > * If tunnel steering rule was offloaded, outer header of a partially
> > offloaded packet is restored after miss.
> >
> > Example:
> > test packet=
> > <Ether  dst=24:8a:07:8d:ae:d6 src=50:6b:4b:cc:fc:e2 type=IPv4 |
> > <IP  version=4 ihl=5 proto=udp src=1.1.1.1 dst=1.1.1.10 |
> > <UDP  sport=4789 dport=4789 len=58 chksum=0x7f7b |
> > <VXLAN  NextProtocol=Ethernet vni=0x0 |
> > <Ether  dst=24:aa:aa:aa:aa:d6 src=50:bb:bb:bb:bb:e2 type=IPv4 |
> > <IP  version=4 ihl=5 proto=icmp src=2.2.2.2 dst=2.2.2.200 |
> > <ICMP  type=echo-request code=0 chksum=0xf7ff id=0x0 seq=0x0 |>>>>>>>
> > >>> len(packet)
> > 92
> >
> > testpmd> flow flush 0
> > testpmd> port 0/queue 0: received 1 packets
> > src=50:6B:4B:CC:FC:E2 - dst=24:8A:07:8D:AE:D6 - type=0x0800 -
> > length=92
> >
> > testpmd> flow tunnel create 0 type vxlan
> > port 0: flow tunnel #1 type vxlan
> > testpmd> flow create 0 ingress group 0 tunnel_set 1
> >          pattern eth / ipv4 / udp dst is 4789 / vxlan / end
> >          actions jump group 0 / end
> 
> I could not get enough time to completely look at this.
> Can you help understand the sequence a bit.
> 
> So when this flow create is issued, it looks like the application issues
> one of the helper APIs.
> In response the PMD returns the vendor specific items/actions.
> So what happens to the original match and action criteria provided by the
> user
> or applications?
> 
> Does the vendor provided actions and items in the tunnel descriptor
> take precedence over what was used by the application?
> 
> And is the criteria provided by the PMD opaque to the application?
> Such that only the PMD can decipher it when the application calls it
> for subsequent flow create(s)?
> 

Flow rules that implement the tunnel offload API are compiled from both
application and PMD actions and items; both kinds of elements are equally
important for offload rule creation. The application is "aware" which type of
tunnel offload flow rule, decap_set or match, it builds. Therefore, the
application elements are created according to the tunnel offload rule type,
and the API helper function that matches the selected rule type is called to
provide the PMD elements.
PMD elements are opaque. Once all elements are present, the application
concatenates the PMD items with the application items and the PMD actions
with the application actions. As a result, the application has the pattern
and actions arrays it needs for flow rule creation. For the decap_set rule,
the API expects the application to provide a pattern that describes a tunnel
and a JUMP action, although the JUMP is not strictly required. For the match
rule, the application pattern should describe the packet headers and the
actions refer to the inner parts.
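
A minimal application-side sketch of that composition for the decap_set rule
could look like the following. It relies only on the rte_flow_tunnel_* helpers
from this series; the surrounding function and variable names are hypothetical
and error handling is trimmed:

#include <stdlib.h>
#include <rte_flow.h>

/* Hypothetical helper: create a tunnel steering (decap_set) rule.
 * app_pattern and app_actions are application arrays terminated by the
 * respective RTE_FLOW_*_TYPE_END element; app_actions_num includes END. */
static struct rte_flow *
create_decap_set_rule(uint16_t port_id, struct rte_flow_tunnel *tunnel,
                      const struct rte_flow_attr *attr,
                      const struct rte_flow_item *app_pattern,
                      const struct rte_flow_action *app_actions,
                      uint32_t app_actions_num,
                      struct rte_flow_error *error)
{
        struct rte_flow_action *pmd_actions, *actions;
        uint32_t pmd_actions_num, i;
        struct rte_flow *flow = NULL;

        /* Query the PMD for its opaque decap_set actions. */
        if (rte_flow_tunnel_decap_set(port_id, tunnel, &pmd_actions,
                                      &pmd_actions_num, error))
                return NULL;
        /* Concatenate: PMD actions first, application actions after. */
        actions = malloc((pmd_actions_num + app_actions_num) *
                         sizeof(actions[0]));
        if (actions == NULL)
                goto release;
        for (i = 0; i < pmd_actions_num; i++)
                actions[i] = pmd_actions[i];
        for (i = 0; i < app_actions_num; i++)
                actions[pmd_actions_num + i] = app_actions[i];
        /* The pattern stays the application tunnel pattern, e.g.
         * eth / ipv4 / udp dst is 4789 / vxlan / end. */
        flow = rte_flow_create(port_id, attr, app_pattern, actions, error);
        free(actions);
release:
        /* PMD actions remain owned by the PMD; release them explicitly. */
        rte_flow_tunnel_action_decap_release(port_id, pmd_actions,
                                             pmd_actions_num, error);
        return flow;
}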

> > Flow rule #0 created
> > testpmd> port 0/queue 0: received 1 packets
> > tunnel restore info: - vxlan tunnel - outer header present # <--
> >   src=50:6B:4B:CC:FC:E2 - dst=24:8A:07:8D:AE:D6 - type=0x0800 -
> > length=92
> >
> > testpmd> flow create 0 ingress group 0 tunnel_match 1
> >          pattern eth / ipv4 / udp dst is 4789 / vxlan / eth / ipv4 /
> >          end
> >          actions set_mac_dst mac_addr 02:CA:FE:CA:FA:80 /
> >          queue index 0 / end
> > Flow rule #1 created
> > testpmd> port 0/queue 0: received 1 packets
> >   src=50:BB:BB:BB:BB:E2 - dst=02:CA:FE:CA:FA:80 - type=0x0800 -
> > length=42
> >
> > * Destroy flow tunnel
> > flow tunnel destroy <port> id <tunnel id>
> >
> > * Show existing flow tunnels
> > flow tunnel list <port>
> >
> > Signed-off-by: Gregory Etelson <getelson@nvidia.com>
> > ---
> > v2:
> > * introduce testpmd support for tunnel offload API
> >
> > v3:
> > * update flow tunnel commands
> > ---
> >  app/test-pmd/cmdline_flow.c                 | 170 ++++++++++++-
> >  app/test-pmd/config.c                       | 253 +++++++++++++++++++-
> >  app/test-pmd/testpmd.c                      |   5 +-
> >  app/test-pmd/testpmd.h                      |  34 ++-
> >  app/test-pmd/util.c                         |  35 ++-
> >  doc/guides/testpmd_app_ug/testpmd_funcs.rst |  49 ++++
> >  6 files changed, 533 insertions(+), 13 deletions(-)
> >
> > diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
> > index 6263d307ed..0fb61860cd 100644
> > --- a/app/test-pmd/cmdline_flow.c
> > +++ b/app/test-pmd/cmdline_flow.c
> > @@ -69,6 +69,14 @@ enum index {
> >         LIST,
> >         AGED,
> >         ISOLATE,
> > +       TUNNEL,
> > +
> > +       /* Tunnel arguments. */
> > +       TUNNEL_CREATE,
> > +       TUNNEL_CREATE_TYPE,
> > +       TUNNEL_LIST,
> > +       TUNNEL_DESTROY,
> > +       TUNNEL_DESTROY_ID,
> >
> >         /* Destroy arguments. */
> >         DESTROY_RULE,
> > @@ -88,6 +96,8 @@ enum index {
> >         INGRESS,
> >         EGRESS,
> >         TRANSFER,
> > +       TUNNEL_SET,
> > +       TUNNEL_MATCH,
> >
> >         /* Validate/create pattern. */
> >         PATTERN,
> > @@ -653,6 +663,7 @@ struct buffer {
> >         union {
> >                 struct {
> >                         struct rte_flow_attr attr;
> > +                       struct tunnel_ops tunnel_ops;
> >                         struct rte_flow_item *pattern;
> >                         struct rte_flow_action *actions;
> >                         uint32_t pattern_n;
> > @@ -713,10 +724,32 @@ static const enum index next_vc_attr[] = {
> >         INGRESS,
> >         EGRESS,
> >         TRANSFER,
> > +       TUNNEL_SET,
> > +       TUNNEL_MATCH,
> >         PATTERN,
> >         ZERO,
> >  };
> >
> > +static const enum index tunnel_create_attr[] = {
> > +       TUNNEL_CREATE,
> > +       TUNNEL_CREATE_TYPE,
> > +       END,
> > +       ZERO,
> > +};
> > +
> > +static const enum index tunnel_destroy_attr[] = {
> > +       TUNNEL_DESTROY,
> > +       TUNNEL_DESTROY_ID,
> > +       END,
> > +       ZERO,
> > +};
> > +
> > +static const enum index tunnel_list_attr[] = {
> > +       TUNNEL_LIST,
> > +       END,
> > +       ZERO,
> > +};
> > +
> >  static const enum index next_destroy_attr[] = {
> >         DESTROY_RULE,
> >         END,
> > @@ -1516,6 +1549,9 @@ static int parse_aged(struct context *, const
> struct token *,
> >  static int parse_isolate(struct context *, const struct token *,
> >                          const char *, unsigned int,
> >                          void *, unsigned int);
> > +static int parse_tunnel(struct context *, const struct token *,
> > +                       const char *, unsigned int,
> > +                       void *, unsigned int);
> >  static int parse_int(struct context *, const struct token *,
> >                      const char *, unsigned int,
> >                      void *, unsigned int);
> > @@ -1698,7 +1734,8 @@ static const struct token token_list[] = {
> >                               LIST,
> >                               AGED,
> >                               QUERY,
> > -                             ISOLATE)),
> > +                             ISOLATE,
> > +                             TUNNEL)),
> >                 .call = parse_init,
> >         },
> >         /* Sub-level commands. */
> > @@ -1772,6 +1809,49 @@ static const struct token token_list[] = {
> >                              ARGS_ENTRY(struct buffer, port)),
> >                 .call = parse_isolate,
> >         },
> > +       [TUNNEL] = {
> > +               .name = "tunnel",
> > +               .help = "new tunnel API",
> > +               .next = NEXT(NEXT_ENTRY
> > +                            (TUNNEL_CREATE, TUNNEL_LIST, TUNNEL_DESTROY)),
> > +               .call = parse_tunnel,
> > +       },
> > +       /* Tunnel arguments. */
> > +       [TUNNEL_CREATE] = {
> > +               .name = "create",
> > +               .help = "create new tunnel object",
> > +               .next = NEXT(tunnel_create_attr, NEXT_ENTRY(PORT_ID)),
> > +               .args = ARGS(ARGS_ENTRY(struct buffer, port)),
> > +               .call = parse_tunnel,
> > +       },
> > +       [TUNNEL_CREATE_TYPE] = {
> > +               .name = "type",
> > +               .help = "create new tunnel",
> > +               .next = NEXT(tunnel_create_attr, NEXT_ENTRY(FILE_PATH)),
> > +               .args = ARGS(ARGS_ENTRY(struct tunnel_ops, type)),
> > +               .call = parse_tunnel,
> > +       },
> > +       [TUNNEL_DESTROY] = {
> > +               .name = "destroy",
> > +               .help = "destroy tunnel",
> > +               .next = NEXT(tunnel_destroy_attr, NEXT_ENTRY(PORT_ID)),
> > +               .args = ARGS(ARGS_ENTRY(struct buffer, port)),
> > +               .call = parse_tunnel,
> > +       },
> > +       [TUNNEL_DESTROY_ID] = {
> > +               .name = "id",
> > +               .help = "tunnel identifier to destroy",
> > +               .next = NEXT(tunnel_destroy_attr, NEXT_ENTRY(UNSIGNED)),
> > +               .args = ARGS(ARGS_ENTRY(struct tunnel_ops, id)),
> > +               .call = parse_tunnel,
> > +       },
> > +       [TUNNEL_LIST] = {
> > +               .name = "list",
> > +               .help = "list existing tunnels",
> > +               .next = NEXT(tunnel_list_attr, NEXT_ENTRY(PORT_ID)),
> > +               .args = ARGS(ARGS_ENTRY(struct buffer, port)),
> > +               .call = parse_tunnel,
> > +       },
> >         /* Destroy arguments. */
> >         [DESTROY_RULE] = {
> >                 .name = "rule",
> > @@ -1835,6 +1915,20 @@ static const struct token token_list[] = {
> >                 .next = NEXT(next_vc_attr),
> >                 .call = parse_vc,
> >         },
> > +       [TUNNEL_SET] = {
> > +               .name = "tunnel_set",
> > +               .help = "tunnel steer rule",
> > +               .next = NEXT(next_vc_attr, NEXT_ENTRY(UNSIGNED)),
> > +               .args = ARGS(ARGS_ENTRY(struct tunnel_ops, id)),
> > +               .call = parse_vc,
> > +       },
> > +       [TUNNEL_MATCH] = {
> > +               .name = "tunnel_match",
> > +               .help = "tunnel match rule",
> > +               .next = NEXT(next_vc_attr, NEXT_ENTRY(UNSIGNED)),
> > +               .args = ARGS(ARGS_ENTRY(struct tunnel_ops, id)),
> > +               .call = parse_vc,
> > +       },
> >         /* Validate/create pattern. */
> >         [PATTERN] = {
> >                 .name = "pattern",
> > @@ -4054,12 +4148,28 @@ parse_vc(struct context *ctx, const struct
> token *token,
> >                 return len;
> >         }
> >         ctx->objdata = 0;
> > -       ctx->object = &out->args.vc.attr;
> > +       switch (ctx->curr) {
> > +       default:
> > +               ctx->object = &out->args.vc.attr;
> > +               break;
> > +       case TUNNEL_SET:
> > +       case TUNNEL_MATCH:
> > +               ctx->object = &out->args.vc.tunnel_ops;
> > +               break;
> > +       }
> >         ctx->objmask = NULL;
> >         switch (ctx->curr) {
> >         case GROUP:
> >         case PRIORITY:
> >                 return len;
> > +       case TUNNEL_SET:
> > +               out->args.vc.tunnel_ops.enabled = 1;
> > +               out->args.vc.tunnel_ops.actions = 1;
> > +               return len;
> > +       case TUNNEL_MATCH:
> > +               out->args.vc.tunnel_ops.enabled = 1;
> > +               out->args.vc.tunnel_ops.items = 1;
> > +               return len;
> >         case INGRESS:
> >                 out->args.vc.attr.ingress = 1;
> >                 return len;
> > @@ -5597,6 +5707,47 @@ parse_isolate(struct context *ctx, const struct
> token *token,
> >         return len;
> >  }
> >
> > +static int
> > +parse_tunnel(struct context *ctx, const struct token *token,
> > +            const char *str, unsigned int len,
> > +            void *buf, unsigned int size)
> > +{
> > +       struct buffer *out = buf;
> > +
> > +       /* Token name must match. */
> > +       if (parse_default(ctx, token, str, len, NULL, 0) < 0)
> > +               return -1;
> > +       /* Nothing else to do if there is no buffer. */
> > +       if (!out)
> > +               return len;
> > +       if (!out->command) {
> > +               if (ctx->curr != TUNNEL)
> > +                       return -1;
> > +               if (sizeof(*out) > size)
> > +                       return -1;
> > +               out->command = ctx->curr;
> > +               ctx->objdata = 0;
> > +               ctx->object = out;
> > +               ctx->objmask = NULL;
> > +       } else {
> > +               switch (ctx->curr) {
> > +               default:
> > +                       break;
> > +               case TUNNEL_CREATE:
> > +               case TUNNEL_DESTROY:
> > +               case TUNNEL_LIST:
> > +                       out->command = ctx->curr;
> > +                       break;
> > +               case TUNNEL_CREATE_TYPE:
> > +               case TUNNEL_DESTROY_ID:
> > +                       ctx->object = &out->args.vc.tunnel_ops;
> > +                       break;
> > +               }
> > +       }
> > +
> > +       return len;
> > +}
> > +
> >  /**
> >   * Parse signed/unsigned integers 8 to 64-bit long.
> >   *
> > @@ -6543,11 +6694,13 @@ cmd_flow_parsed(const struct buffer *in)
> >         switch (in->command) {
> >         case VALIDATE:
> >                 port_flow_validate(in->port, &in->args.vc.attr,
> > -                                  in->args.vc.pattern, in->args.vc.actions);
> > +                                  in->args.vc.pattern, in->args.vc.actions,
> > +                                  &in->args.vc.tunnel_ops);
> >                 break;
> >         case CREATE:
> >                 port_flow_create(in->port, &in->args.vc.attr,
> > -                                in->args.vc.pattern, in->args.vc.actions);
> > +                                in->args.vc.pattern, in->args.vc.actions,
> > +                                &in->args.vc.tunnel_ops);
> >                 break;
> >         case DESTROY:
> >                 port_flow_destroy(in->port, in->args.destroy.rule_n,
> > @@ -6573,6 +6726,15 @@ cmd_flow_parsed(const struct buffer *in)
> >         case AGED:
> >                 port_flow_aged(in->port, in->args.aged.destroy);
> >                 break;
> > +       case TUNNEL_CREATE:
> > +               port_flow_tunnel_create(in->port, &in->args.vc.tunnel_ops);
> > +               break;
> > +       case TUNNEL_DESTROY:
> > +               port_flow_tunnel_destroy(in->port, in->args.vc.tunnel_ops.id);
> > +               break;
> > +       case TUNNEL_LIST:
> > +               port_flow_tunnel_list(in->port);
> > +               break;
> >         default:
> >                 break;
> >         }
> > diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
> > index 2d9a456467..d0f86230d0 100644
> > --- a/app/test-pmd/config.c
> > +++ b/app/test-pmd/config.c
> > @@ -1339,6 +1339,115 @@ port_mtu_set(portid_t port_id, uint16_t mtu)
> >
> >  /* Generic flow management functions. */
> >
> > +static struct port_flow_tunnel *
> > +port_flow_locate_tunnel_id(struct rte_port *port, uint32_t
> port_tunnel_id)
> > +{
> > +       struct port_flow_tunnel *flow_tunnel;
> > +
> > +       LIST_FOREACH(flow_tunnel, &port->flow_tunnel_list, chain) {
> > +               if (flow_tunnel->id == port_tunnel_id)
> > +                       goto out;
> > +       }
> > +       flow_tunnel = NULL;
> > +
> > +out:
> > +       return flow_tunnel;
> > +}
> > +
> > +const char *
> > +port_flow_tunnel_type(struct rte_flow_tunnel *tunnel)
> > +{
> > +       const char *type;
> > +       switch (tunnel->type) {
> > +       default:
> > +               type = "unknown";
> > +               break;
> > +       case RTE_FLOW_ITEM_TYPE_VXLAN:
> > +               type = "vxlan";
> > +               break;
> > +       }
> > +
> > +       return type;
> > +}
> > +
> > +struct port_flow_tunnel *
> > +port_flow_locate_tunnel(uint16_t port_id, struct rte_flow_tunnel *tun)
> > +{
> > +       struct rte_port *port = &ports[port_id];
> > +       struct port_flow_tunnel *flow_tunnel;
> > +
> > +       LIST_FOREACH(flow_tunnel, &port->flow_tunnel_list, chain) {
> > +               if (!memcmp(&flow_tunnel->tunnel, tun, sizeof(*tun)))
> > +                       goto out;
> > +       }
> > +       flow_tunnel = NULL;
> > +
> > +out:
> > +       return flow_tunnel;
> > +}
> > +
> > +void port_flow_tunnel_list(portid_t port_id)
> > +{
> > +       struct rte_port *port = &ports[port_id];
> > +       struct port_flow_tunnel *flt;
> > +
> > +       LIST_FOREACH(flt, &port->flow_tunnel_list, chain) {
> > +               printf("port %u tunnel #%u type=%s",
> > +                       port_id, flt->id, port_flow_tunnel_type(&flt->tunnel));
> > +               if (flt->tunnel.tun_id)
> > +                       printf(" id=%lu", flt->tunnel.tun_id);
> > +               printf("\n");
> > +       }
> > +}
> > +
> > +void port_flow_tunnel_destroy(portid_t port_id, uint32_t tunnel_id)
> > +{
> > +       struct rte_port *port = &ports[port_id];
> > +       struct port_flow_tunnel *flt;
> > +
> > +       LIST_FOREACH(flt, &port->flow_tunnel_list, chain) {
> > +               if (flt->id == tunnel_id)
> > +                       break;
> > +       }
> > +       if (flt) {
> > +               LIST_REMOVE(flt, chain);
> > +               free(flt);
> > +               printf("port %u: flow tunnel #%u destroyed\n",
> > +                       port_id, tunnel_id);
> > +       }
> > +}
> > +
> > +void port_flow_tunnel_create(portid_t port_id, const struct tunnel_ops
> *ops)
> > +{
> > +       struct rte_port *port = &ports[port_id];
> > +       enum rte_flow_item_type type;
> > +       struct port_flow_tunnel *flt;
> > +
> > +       if (!strcmp(ops->type, "vxlan"))
> > +               type = RTE_FLOW_ITEM_TYPE_VXLAN;
> > +       else {
> > +               printf("cannot offload \"%s\" tunnel type\n", ops->type);
> > +               return;
> > +       }
> > +       LIST_FOREACH(flt, &port->flow_tunnel_list, chain) {
> > +               if (flt->tunnel.type == type)
> > +                       break;
> > +       }
> > +       if (!flt) {
> > +               flt = calloc(1, sizeof(*flt));
> > +               if (!flt) {
> > +                       printf("failed to allocate port flt object\n");
> > +                       return;
> > +               }
> > +               flt->tunnel.type = type;
> > +               flt->id = LIST_EMPTY(&port->flow_tunnel_list) ? 1 :
> > +                                 LIST_FIRST(&port->flow_tunnel_list)->id + 1;
> > +               LIST_INSERT_HEAD(&port->flow_tunnel_list, flt, chain);
> > +       }
> > +       printf("port %d: flow tunnel #%u type %s\n",
> > +               port_id, flt->id, ops->type);
> > +}
> > +
> >  /** Generate a port_flow entry from attributes/pattern/actions. */
> >  static struct port_flow *
> >  port_flow_new(const struct rte_flow_attr *attr,
> > @@ -1463,19 +1572,137 @@ rss_config_display(struct rte_flow_action_rss
> *rss_conf)
> >         }
> >  }
> >
> > +static struct port_flow_tunnel *
> > +port_flow_tunnel_offload_cmd_prep(portid_t port_id,
> > +                                 const struct rte_flow_item *pattern,
> > +                                 const struct rte_flow_action *actions,
> > +                                 const struct tunnel_ops *tunnel_ops)
> > +{
> > +       int ret;
> > +       struct rte_port *port;
> > +       struct port_flow_tunnel *pft;
> > +       struct rte_flow_error error;
> > +
> > +       port = &ports[port_id];
> > +       pft = port_flow_locate_tunnel_id(port, tunnel_ops->id);
> > +       if (!pft) {
> > +               printf("failed to locate port flow tunnel #%u\n",
> > +                       tunnel_ops->id);
> > +               return NULL;
> > +       }
> > +       if (tunnel_ops->actions) {
> > +               uint32_t num_actions;
> > +               const struct rte_flow_action *aptr;
> > +
> > +               ret = rte_flow_tunnel_decap_set(port_id, &pft->tunnel,
> > +                                               &pft->pmd_actions,
> > +                                               &pft->num_pmd_actions,
> > +                                               &error);
> > +               if (ret) {
> > +                       port_flow_complain(&error);
> > +                       return NULL;
> > +               }
> > +               for (aptr = actions, num_actions = 1;
> > +                    aptr->type != RTE_FLOW_ACTION_TYPE_END;
> > +                    aptr++, num_actions++);
> > +               pft->actions = malloc(
> > +                               (num_actions +  pft->num_pmd_actions) *
> > +                               sizeof(actions[0]));
> > +               if (!pft->actions) {
> > +                       rte_flow_tunnel_action_decap_release(
> > +                                       port_id, pft->pmd_actions,
> > +                                       pft->num_pmd_actions, &error);
> > +                       return NULL;
> > +               }
> > +               rte_memcpy(pft->actions, pft->pmd_actions,
> > +                          pft->num_pmd_actions * sizeof(actions[0]));
> > +               rte_memcpy(pft->actions + pft->num_pmd_actions, actions,
> > +                          num_actions * sizeof(actions[0]));
> > +       }
> > +       if (tunnel_ops->items) {
> > +               uint32_t num_items;
> > +               const struct rte_flow_item *iptr;
> > +
> > +               ret = rte_flow_tunnel_match(port_id, &pft->tunnel,
> > +                                           &pft->pmd_items,
> > +                                           &pft->num_pmd_items,
> > +                                           &error);
> > +               if (ret) {
> > +                       port_flow_complain(&error);
> > +                       return NULL;
> > +               }
> > +               for (iptr = pattern, num_items = 1;
> > +                    iptr->type != RTE_FLOW_ITEM_TYPE_END;
> > +                    iptr++, num_items++);
> > +               pft->items = malloc((num_items + pft->num_pmd_items) *
> > +                                   sizeof(pattern[0]));
> > +               if (!pft->items) {
> > +                       rte_flow_tunnel_item_release(
> > +                                       port_id, pft->pmd_items,
> > +                                       pft->num_pmd_items, &error);
> > +                       return NULL;
> > +               }
> > +               rte_memcpy(pft->items, pft->pmd_items,
> > +                          pft->num_pmd_items * sizeof(pattern[0]));
> > +               rte_memcpy(pft->items + pft->num_pmd_items, pattern,
> > +                          num_items * sizeof(pattern[0]));
> > +       }
> > +
> > +       return pft;
> > +}
> > +
> > +static void
> > +port_flow_tunnel_offload_cmd_release(portid_t port_id,
> > +                                    const struct tunnel_ops *tunnel_ops,
> > +                                    struct port_flow_tunnel *pft)
> > +{
> > +       struct rte_flow_error error;
> > +
> > +       if (tunnel_ops->actions) {
> > +               free(pft->actions);
> > +               rte_flow_tunnel_action_decap_release(
> > +                       port_id, pft->pmd_actions,
> > +                       pft->num_pmd_actions, &error);
> > +               pft->actions = NULL;
> > +               pft->pmd_actions = NULL;
> > +       }
> > +       if (tunnel_ops->items) {
> > +               free(pft->items);
> > +               rte_flow_tunnel_item_release(port_id, pft->pmd_items,
> > +                                            pft->num_pmd_items,
> > +                                            &error);
> > +               pft->items = NULL;
> > +               pft->pmd_items = NULL;
> > +       }
> > +}
> > +
> >  /** Validate flow rule. */
> >  int
> >  port_flow_validate(portid_t port_id,
> >                    const struct rte_flow_attr *attr,
> >                    const struct rte_flow_item *pattern,
> > -                  const struct rte_flow_action *actions)
> > +                  const struct rte_flow_action *actions,
> > +                  const struct tunnel_ops *tunnel_ops)
> >  {
> >         struct rte_flow_error error;
> > +       struct port_flow_tunnel *pft = NULL;
> >
> >         /* Poisoning to make sure PMDs update it in case of error. */
> >         memset(&error, 0x11, sizeof(error));
> > +       if (tunnel_ops->enabled) {
> > +               pft = port_flow_tunnel_offload_cmd_prep(port_id, pattern,
> > +                                                       actions, tunnel_ops);
> > +               if (!pft)
> > +                       return -ENOENT;
> > +               if (pft->items)
> > +                       pattern = pft->items;
> > +               if (pft->actions)
> > +                       actions = pft->actions;
> > +       }
> >         if (rte_flow_validate(port_id, attr, pattern, actions, &error))
> >                 return port_flow_complain(&error);
> > +       if (tunnel_ops->enabled)
> > +               port_flow_tunnel_offload_cmd_release(port_id, tunnel_ops, pft);
> >         printf("Flow rule validated\n");
> >         return 0;
> >  }
> > @@ -1505,13 +1732,15 @@ int
> >  port_flow_create(portid_t port_id,
> >                  const struct rte_flow_attr *attr,
> >                  const struct rte_flow_item *pattern,
> > -                const struct rte_flow_action *actions)
> > +                const struct rte_flow_action *actions,
> > +                const struct tunnel_ops *tunnel_ops)
> >  {
> >         struct rte_flow *flow;
> >         struct rte_port *port;
> >         struct port_flow *pf;
> >         uint32_t id = 0;
> >         struct rte_flow_error error;
> > +       struct port_flow_tunnel *pft = NULL;
> >
> >         port = &ports[port_id];
> >         if (port->flow_list) {
> > @@ -1522,6 +1751,16 @@ port_flow_create(portid_t port_id,
> >                 }
> >                 id = port->flow_list->id + 1;
> >         }
> > +       if (tunnel_ops->enabled) {
> > +               pft = port_flow_tunnel_offload_cmd_prep(port_id, pattern,
> > +                                                       actions, tunnel_ops);
> > +               if (!pft)
> > +                       return -ENOENT;
> > +               if (pft->items)
> > +                       pattern = pft->items;
> > +               if (pft->actions)
> > +                       actions = pft->actions;
> > +       }
> >         pf = port_flow_new(attr, pattern, actions, &error);
> >         if (!pf)
> >                 return port_flow_complain(&error);
> > @@ -1537,6 +1776,8 @@ port_flow_create(portid_t port_id,
> >         pf->id = id;
> >         pf->flow = flow;
> >         port->flow_list = pf;
> > +       if (tunnel_ops->enabled)
> > +               port_flow_tunnel_offload_cmd_release(port_id, tunnel_ops, pft);
> >         printf("Flow rule #%u created\n", pf->id);
> >         return 0;
> >  }
> > @@ -1831,7 +2072,9 @@ port_flow_list(portid_t port_id, uint32_t n, const
> uint32_t group[n])
> >                        pf->rule.attr->egress ? 'e' : '-',
> >                        pf->rule.attr->transfer ? 't' : '-');
> >                 while (item->type != RTE_FLOW_ITEM_TYPE_END) {
> > -                       if (rte_flow_conv(RTE_FLOW_CONV_OP_ITEM_NAME_PTR,
> > +                       if ((uint32_t)item->type > INT_MAX)
> > +                               name = "PMD_INTERNAL";
> > +                       else if
> (rte_flow_conv(RTE_FLOW_CONV_OP_ITEM_NAME_PTR,
> >                                           &name, sizeof(name),
> >                                           (void *)(uintptr_t)item->type,
> >                                           NULL) <= 0)
> > @@ -1842,7 +2085,9 @@ port_flow_list(portid_t port_id, uint32_t n, const
> uint32_t group[n])
> >                 }
> >                 printf("=>");
> >                 while (action->type != RTE_FLOW_ACTION_TYPE_END) {
> > -                       if (rte_flow_conv(RTE_FLOW_CONV_OP_ACTION_NAME_PTR,
> > +                       if ((uint32_t)action->type > INT_MAX)
> > +                               name = "PMD_INTERNAL";
> > +                       else if
> (rte_flow_conv(RTE_FLOW_CONV_OP_ACTION_NAME_PTR,
> >                                           &name, sizeof(name),
> >                                           (void *)(uintptr_t)action->type,
> >                                           NULL) <= 0)
> > diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> > index fe6450cc0d..e484079147 100644
> > --- a/app/test-pmd/testpmd.c
> > +++ b/app/test-pmd/testpmd.c
> > @@ -3588,6 +3588,8 @@ init_port_dcb_config(portid_t pid,
> >  static void
> >  init_port(void)
> >  {
> > +       int i;
> > +
> >         /* Configuration of Ethernet ports. */
> >         ports = rte_zmalloc("testpmd: ports",
> >                             sizeof(struct rte_port) * RTE_MAX_ETHPORTS,
> > @@ -3597,7 +3599,8 @@ init_port(void)
> >                                 "rte_zmalloc(%d struct rte_port) failed\n",
> >                                 RTE_MAX_ETHPORTS);
> >         }
> > -
> > +       for (i = 0; i < RTE_MAX_ETHPORTS; i++)
> > +               LIST_INIT(&ports[i].flow_tunnel_list);
> >         /* Initialize ports NUMA structures */
> >         memset(port_numa, NUMA_NO_CONFIG, RTE_MAX_ETHPORTS);
> >         memset(rxring_numa, NUMA_NO_CONFIG, RTE_MAX_ETHPORTS);
> > diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
> > index f139fe7a0a..1dd8cfd3d8 100644
> > --- a/app/test-pmd/testpmd.h
> > +++ b/app/test-pmd/testpmd.h
> > @@ -12,6 +12,7 @@
> >  #include <rte_gro.h>
> >  #include <rte_gso.h>
> >  #include <cmdline.h>
> > +#include <sys/queue.h>
> >
> >  #define RTE_PORT_ALL            (~(portid_t)0x0)
> >
> > @@ -142,6 +143,26 @@ struct port_flow {
> >         uint8_t data[]; /**< Storage for flow rule description */
> >  };
> >
> > +struct port_flow_tunnel {
> > +       LIST_ENTRY(port_flow_tunnel) chain;
> > +       struct rte_flow_action *pmd_actions;
> > +       struct rte_flow_item   *pmd_items;
> > +       uint32_t id;
> > +       uint32_t num_pmd_actions;
> > +       uint32_t num_pmd_items;
> > +       struct rte_flow_tunnel tunnel;
> > +       struct rte_flow_action *actions;
> > +       struct rte_flow_item *items;
> > +};
> > +
> > +struct tunnel_ops {
> > +       uint32_t id;
> > +       char type[16];
> > +       uint32_t enabled:1;
> > +       uint32_t actions:1;
> > +       uint32_t items:1;
> > +};
> > +
> >  /**
> >   * The data structure associated with each port.
> >   */
> > @@ -172,6 +193,7 @@ struct rte_port {
> >         uint32_t                mc_addr_nb; /**< nb. of addr. in mc_addr_pool */
> >         uint8_t                 slave_flag; /**< bonding slave port */
> >         struct port_flow        *flow_list; /**< Associated flows. */
> > +       LIST_HEAD(, port_flow_tunnel) flow_tunnel_list;
> >         const struct rte_eth_rxtx_callback
> *rx_dump_cb[RTE_MAX_QUEUES_PER_PORT+1];
> >         const struct rte_eth_rxtx_callback
> *tx_dump_cb[RTE_MAX_QUEUES_PER_PORT+1];
> >         /**< metadata value to insert in Tx packets. */
> > @@ -749,11 +771,13 @@ void port_reg_set(portid_t port_id, uint32_t
> reg_off, uint32_t value);
> >  int port_flow_validate(portid_t port_id,
> >                        const struct rte_flow_attr *attr,
> >                        const struct rte_flow_item *pattern,
> > -                      const struct rte_flow_action *actions);
> > +                      const struct rte_flow_action *actions,
> > +                      const struct tunnel_ops *tunnel_ops);
> >  int port_flow_create(portid_t port_id,
> >                      const struct rte_flow_attr *attr,
> >                      const struct rte_flow_item *pattern,
> > -                    const struct rte_flow_action *actions);
> > +                    const struct rte_flow_action *actions,
> > +                    const struct tunnel_ops *tunnel_ops);
> >  void update_age_action_context(const struct rte_flow_action *actions,
> >                      struct port_flow *pf);
> >  int port_flow_destroy(portid_t port_id, uint32_t n, const uint32_t *rule);
> > @@ -763,6 +787,12 @@ int port_flow_query(portid_t port_id, uint32_t
> rule,
> >                     const struct rte_flow_action *action);
> >  void port_flow_list(portid_t port_id, uint32_t n, const uint32_t *group);
> >  void port_flow_aged(portid_t port_id, uint8_t destroy);
> > +const char *port_flow_tunnel_type(struct rte_flow_tunnel *tunnel);
> > +struct port_flow_tunnel *
> > +port_flow_locate_tunnel(uint16_t port_id, struct rte_flow_tunnel *tun);
> > +void port_flow_tunnel_list(portid_t port_id);
> > +void port_flow_tunnel_destroy(portid_t port_id, uint32_t tunnel_id);
> > +void port_flow_tunnel_create(portid_t port_id, const struct tunnel_ops
> *ops);
> >  int port_flow_isolate(portid_t port_id, int set);
> >
> >  void rx_ring_desc_display(portid_t port_id, queueid_t rxq_id, uint16_t
> rxd_id);
> > diff --git a/app/test-pmd/util.c b/app/test-pmd/util.c
> > index 8488fa1a8f..781a813759 100644
> > --- a/app/test-pmd/util.c
> > +++ b/app/test-pmd/util.c
> > @@ -48,18 +48,49 @@ dump_pkt_burst(uint16_t port_id, uint16_t queue,
> struct rte_mbuf *pkts[],
> >                is_rx ? "received" : "sent",
> >                (unsigned int) nb_pkts);
> >         for (i = 0; i < nb_pkts; i++) {
> > +               int ret;
> > +               struct rte_flow_error error;
> > +               struct rte_flow_restore_info info = { 0, };
> > +
> >                 mb = pkts[i];
> >                 eth_hdr = rte_pktmbuf_read(mb, 0, sizeof(_eth_hdr), &_eth_hdr);
> >                 eth_type = RTE_BE_TO_CPU_16(eth_hdr->ether_type);
> > -               ol_flags = mb->ol_flags;
> >                 packet_type = mb->packet_type;
> >                 is_encapsulation = RTE_ETH_IS_TUNNEL_PKT(packet_type);
> > -
> > +               ret = rte_flow_get_restore_info(port_id, mb, &info, &error);
> > +               if (!ret) {
> > +                       printf("restore info:");
> > +                       if (info.flags & RTE_FLOW_RESTORE_INFO_TUNNEL) {
> > +                               struct port_flow_tunnel *port_tunnel;
> > +
> > +                               port_tunnel = port_flow_locate_tunnel
> > +                                             (port_id, &info.tunnel);
> > +                               printf(" - tunnel");
> > +                               if (port_tunnel)
> > +                                       printf(" #%u", port_tunnel->id);
> > +                               else
> > +                                       printf(" %s", "-none-");
> > +                               printf(" type %s",
> > +                                       port_flow_tunnel_type(&info.tunnel));
> > +                       } else {
> > +                               printf(" - no tunnel info");
> > +                       }
> > +                       if (info.flags & RTE_FLOW_RESTORE_INFO_ENCAPSULATED)
> > +                               printf(" - outer header present");
> > +                       else
> > +                               printf(" - no outer header");
> > +                       if (info.flags & RTE_FLOW_RESTORE_INFO_GROUP_ID)
> > +                               printf(" - miss group %u", info.group_id);
> > +                       else
> > +                               printf(" - no miss group");
> > +                       printf("\n");
> > +               }
> >                 print_ether_addr("  src=", &eth_hdr->s_addr);
> >                 print_ether_addr(" - dst=", &eth_hdr->d_addr);
> >                 printf(" - type=0x%04x - length=%u - nb_segs=%d",
> >                        eth_type, (unsigned int) mb->pkt_len,
> >                        (int)mb->nb_segs);
> > +               ol_flags = mb->ol_flags;
> >                 if (ol_flags & PKT_RX_RSS_HASH) {
> >                         printf(" - RSS hash=0x%x", (unsigned int) mb->hash.rss);
> >                         printf(" - RSS queue=0x%x", (unsigned int) queue);
> > diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> > index a972ef8951..97fcbfd329 100644
> > --- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> > +++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> > @@ -3720,6 +3720,45 @@ following sections.
> >
> >     flow aged {port_id} [destroy]
> >
> > +- Tunnel offload - create a tunnel stub::
> > +
> > +   flow tunnel create {port_id} type {tunnel_type}
> > +
> > +- Tunnel offload - destroy a tunnel stub::
> > +
> > +   flow tunnel destroy {port_id} id {tunnel_id}
> > +
> > +- Tunnel offload - list port tunnel stubs::
> > +
> > +   flow tunnel list {port_id}
> > +
> > +Creating a tunnel stub for offload
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +``flow tunnel create`` sets up a tunnel stub for tunnel offload flow rules::
> > +
> > +   flow tunnel create {port_id} type {tunnel_type}
> > +
> > +If successful, it will return a tunnel stub ID usable with other commands::
> > +
> > +   port [...]: flow tunnel #[...] type [...]
> > +
> > +Tunnel stub ID is relative to a port.
> > +
> > +Destroying tunnel offload stub
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +``flow tunnel destroy`` destroys a port tunnel stub::
> > +
> > +   flow tunnel destroy {port_id} id {tunnel_id}
> > +
> > +Listing tunnel offload stubs
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +``flow tunnel list`` lists port tunnel offload stubs::
> > +
> > +   flow tunnel list {port_id}
> > +
> >  Validating flow rules
> >  ~~~~~~~~~~~~~~~~~~~~~
> >
> > @@ -3766,6 +3805,7 @@ to ``rte_flow_create()``::
> >
> >     flow create {port_id}
> >        [group {group_id}] [priority {level}] [ingress] [egress] [transfer]
> > +      [tunnel_set {tunnel_id}] [tunnel_match {tunnel_id}]
> >        pattern {item} [/ {item} [...]] / end
> >        actions {action} [/ {action} [...]] / end
> >
> > @@ -3780,6 +3820,7 @@ Otherwise it will show an error message of the
> form::
> >  Parameters describe in the following order:
> >
> >  - Attributes (*group*, *priority*, *ingress*, *egress*, *transfer* tokens).
> > +- Tunnel offload specification (tunnel_set, tunnel_match)
> >  - A matching pattern, starting with the *pattern* token and terminated by
> an
> >    *end* pattern item.
> >  - Actions, starting with the *actions* token and terminated by an *end*
> > @@ -3823,6 +3864,14 @@ Most rules affect RX therefore contain the
> ``ingress`` token::
> >
> >     testpmd> flow create 0 ingress pattern [...]
> >
> > +Tunnel offload
> > +^^^^^^^^^^^^^^
> > +
> > +Indicate the tunnel offload rule type:
> > +
> > +- ``tunnel_set {tunnel_id}``: mark rule as tunnel offload decap_set type.
> > +- ``tunnel_match {tunnel_id}``: mark rule as tunnel offload match type.
> > +
> >  Matching pattern
> >  ^^^^^^^^^^^^^^^^
> >
> > --
> > 2.25.1
> >

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/4] ethdev: allow negative values in flow rule types
  2020-09-30  9:18   ` [dpdk-dev] [PATCH v3 1/4] ethdev: allow negative values in flow rule types Gregory Etelson
@ 2020-10-04  5:40     ` Ajit Khaparde
  2020-10-04  9:24       ` Gregory Etelson
  0 siblings, 1 reply; 95+ messages in thread
From: Ajit Khaparde @ 2020-10-04  5:40 UTC (permalink / raw)
  To: Gregory Etelson
  Cc: dpdk-dev, Matan Azrad, rasland, Gregory Etelson, Ori Kam,
	Viacheslav Ovsiienko, Ori Kam, Thomas Monjalon, Ferruh Yigit,
	Andrew Rybchenko

Gregory,
Please see inline.

On Wed, Sep 30, 2020 at 2:19 AM Gregory Etelson <getelson@nvidia.com> wrote:
>
> From: Gregory Etelson <getelson@mellanox.com>
>
> RTE flow items & actions use positive values in item & action type.
> Negative values are reserved for PMD private types. PMD
> items & actions usually are not exposed to application and are not
> used to create RTE flows.
>
> The patch gives applications with access to PMD flow
> items & actions the ability to integrate RTE and PMD items & actions
> and use them to create flow rules.
>
> RTE flow library functions cannot work with PMD private items and
> actions (elements) because RTE flow has no API to query PMD flow
> object size. In the patch, PMD flow elements use object pointer.
> RTE flow library functions handle PMD element object size as
> size of a pointer. PMD handles its objects internally.

This is important information. Apart from the commit log,
this should also be added to the rte_flow API documentation.
The comment in the code/API could be elaborated with this info as well.

>
> Signed-off-by: Gregory Etelson <getelson@mellanox.com>
> Acked-by: Ori Kam <orika@nvidia.com>
> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> ---
>  lib/librte_ethdev/rte_flow.c | 28 ++++++++++++++++++++++------
>  1 file changed, 22 insertions(+), 6 deletions(-)
>
> diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
> index f8fdd68fe9..c8c6d62a8b 100644
> --- a/lib/librte_ethdev/rte_flow.c
> +++ b/lib/librte_ethdev/rte_flow.c
> @@ -564,7 +564,11 @@ rte_flow_conv_item_spec(void *buf, const size_t size,
>                 }
>                 break;
>         default:
> -               off = rte_flow_desc_item[item->type].size;
> +               /**
> +                * allow PMD private flow item
> +                */
> +               off = (int)item->type >= 0 ?
> +                     rte_flow_desc_item[item->type].size : sizeof(void *);
>                 rte_memcpy(buf, data, (size > off ? off : size));
>                 break;
>         }
> @@ -667,7 +671,11 @@ rte_flow_conv_action_conf(void *buf, const size_t size,
>                 }
>                 break;
>         default:
> -               off = rte_flow_desc_action[action->type].size;
> +               /**
> +                * allow PMD private flow action
> +                */
> +               off = (int)action->type >= 0 ?
> +                     rte_flow_desc_action[action->type].size : sizeof(void *);
>                 rte_memcpy(buf, action->conf, (size > off ? off : size));
>                 break;
>         }
> @@ -709,8 +717,12 @@ rte_flow_conv_pattern(struct rte_flow_item *dst,
>         unsigned int i;
>
>         for (i = 0, off = 0; !num || i != num; ++i, ++src, ++dst) {
> -               if ((size_t)src->type >= RTE_DIM(rte_flow_desc_item) ||
> -                   !rte_flow_desc_item[src->type].name)
> +               /**
> +                * allow PMD private flow item
> +                */
> +               if (((int)src->type >= 0) &&
> +                       ((size_t)src->type >= RTE_DIM(rte_flow_desc_item) ||
> +                   !rte_flow_desc_item[src->type].name))
>                         return rte_flow_error_set
>                                 (error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM, src,
>                                  "cannot convert unknown item type");
> @@ -798,8 +810,12 @@ rte_flow_conv_actions(struct rte_flow_action *dst,
>         unsigned int i;
>
>         for (i = 0, off = 0; !num || i != num; ++i, ++src, ++dst) {
> -               if ((size_t)src->type >= RTE_DIM(rte_flow_desc_action) ||
> -                   !rte_flow_desc_action[src->type].name)
> +               /**
> +                * allow PMD private flow action
> +                */
> +               if (((int)src->type >= 0) &&
> +                   ((size_t)src->type >= RTE_DIM(rte_flow_desc_action) ||
> +                   !rte_flow_desc_action[src->type].name))
>                         return rte_flow_error_set
>                                 (error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION,
>                                  src, "cannot convert unknown action type");
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH v3 4/4] app/testpmd: add commands for tunnel offload API
  2020-10-01  9:05       ` Gregory Etelson
@ 2020-10-04  5:40         ` Ajit Khaparde
  2020-10-04  9:29           ` Gregory Etelson
  0 siblings, 1 reply; 95+ messages in thread
From: Ajit Khaparde @ 2020-10-04  5:40 UTC (permalink / raw)
  To: Gregory Etelson
  Cc: dpdk-dev, Matan Azrad, Raslan Darawsheh, Ori Kam, Wenzhuo Lu,
	Beilei Xing, Bernard Iremonger, John McNamara, Marko Kovacevic,
	Eli Britstein, Oz Shlomo

Gregory,

[snip]
> > > testpmd> flow tunnel 0 type vxlan
> > > port 0: flow tunnel #1 type vxlan
> > > testpmd> flow create 0 ingress group 0 tunnel_set 1
> > >          pattern eth /ipv4 / udp dst is 4789 / vxlan / end
> > >          actions  jump group 0 / end
> >
> > I could not get enough time to completely look at this.
> > Can you help understand the sequence a bit.
> >
> > So when this flow create is issued, it looks like the application issues
> > one of the helper APIs.
> > In response the PMD returns the vendor specific items/actions.
> > So what happens to the original match and action criteria provided by the
> > user
> > or applications?
> >
> > Does the vendor provided actions and items in the tunnel descriptor
> > take precedence over what was used by the application?
> >
> > And is the criteria provided by the PMD opaque to the application?
> > Such that only the PMD can decipher it when the application calls it
> > for subsequent flow create(s)?
> >
>
> Flow rules that implement the tunnel offload API are compiled from both
> application and PMD actions and items; both kinds of elements are equally
> important for offload rule creation. The application is "aware" which type of
> tunnel offload flow rule, decap_set or match, it builds. Therefore, the
> application elements are created according to the tunnel offload rule type,
> and the API helper function that matches the selected rule type is called to
> provide the PMD elements.
> PMD elements are opaque. Once all elements are present, the application
> concatenates the PMD items with the application items and the PMD actions
> with the application actions. As a result, the application has the pattern
> and actions arrays it needs for flow rule creation. For the decap_set rule,
> the API expects the application to provide a pattern that describes a tunnel
> and a JUMP action, although the JUMP is not strictly required. For the match
> rule, the application pattern should describe the packet headers and the
> actions refer to the inner parts.
Since the application concatenates the elements provided by the PMD with
that of its own, the PMD is free to validate, parse and use a subset
or all of them?

[snip]

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/4] ethdev: allow negative values in flow rule types
  2020-10-04  5:40     ` Ajit Khaparde
@ 2020-10-04  9:24       ` Gregory Etelson
  0 siblings, 0 replies; 95+ messages in thread
From: Gregory Etelson @ 2020-10-04  9:24 UTC (permalink / raw)
  To: Ajit Khaparde
  Cc: dpdk-dev, Matan Azrad, Raslan Darawsheh, Gregory Etelson,
	Ori Kam, Slava Ovsiienko, Ori Kam, NBU-Contact-Thomas Monjalon,
	Ferruh Yigit, Andrew Rybchenko, Eli Britstein, Oz Shlomo

Hello Ajit,

[snip]
> > RTE flow library functions cannot work with PMD private items and
> > actions (elements) because RTE flow has no API to query PMD flow
> > object size. In the patch, PMD flow elements use object pointer.
> > RTE flow library functions handle PMD element object size as
> > size of a pointer. PMD handles its objects internally.
> 
> This is important information. Apart from the commit log,
> this should also be added  in the rte_flow API documentation.
> The comment in the code/API could be elaborated with this info as well.
> 

I'll update the code comments & rte_flow API documentation in the next patch update.
The update will be ready this week.

> >
> > Signed-off-by: Gregory Etelson <getelson@mellanox.com>
> > Acked-by: Ori Kam <orika@nvidia.com>
> > Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> > ---
> >  lib/librte_ethdev/rte_flow.c | 28 ++++++++++++++++++++++------
> >  1 file changed, 22 insertions(+), 6 deletions(-)
> >
> > diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
> > index f8fdd68fe9..c8c6d62a8b 100644
> > --- a/lib/librte_ethdev/rte_flow.c
> > +++ b/lib/librte_ethdev/rte_flow.c
> > @@ -564,7 +564,11 @@ rte_flow_conv_item_spec(void *buf, const size_t
> size,
> >                 }
> >                 break;
> >         default:
> > -               off = rte_flow_desc_item[item->type].size;
> > +               /**
> > +                * allow PMD private flow item
> > +                */
> > +               off = (int)item->type >= 0 ?
> > +                     rte_flow_desc_item[item->type].size : sizeof(void *);
> >                 rte_memcpy(buf, data, (size > off ? off : size));
> >                 break;
> >         }
> > @@ -667,7 +671,11 @@ rte_flow_conv_action_conf(void *buf, const size_t
> size,
> >                 }
> >                 break;
> >         default:
> > -               off = rte_flow_desc_action[action->type].size;
> > +               /**
> > +                * allow PMD private flow action
> > +                */
> > +               off = (int)action->type >= 0 ?
> > +                     rte_flow_desc_action[action->type].size : sizeof(void *);
> >                 rte_memcpy(buf, action->conf, (size > off ? off : size));
> >                 break;
> >         }
> > @@ -709,8 +717,12 @@ rte_flow_conv_pattern(struct rte_flow_item *dst,
> >         unsigned int i;
> >
> >         for (i = 0, off = 0; !num || i != num; ++i, ++src, ++dst) {
> > -               if ((size_t)src->type >= RTE_DIM(rte_flow_desc_item) ||
> > -                   !rte_flow_desc_item[src->type].name)
> > +               /**
> > +                * allow PMD private flow item
> > +                */
> > +               if (((int)src->type >= 0) &&
> > +                       ((size_t)src->type >= RTE_DIM(rte_flow_desc_item) ||
> > +                   !rte_flow_desc_item[src->type].name))
> >                         return rte_flow_error_set
> >                                 (error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM, src,
> >                                  "cannot convert unknown item type");
> > @@ -798,8 +810,12 @@ rte_flow_conv_actions(struct rte_flow_action
> *dst,
> >         unsigned int i;
> >
> >         for (i = 0, off = 0; !num || i != num; ++i, ++src, ++dst) {
> > -               if ((size_t)src->type >= RTE_DIM(rte_flow_desc_action) ||
> > -                   !rte_flow_desc_action[src->type].name)
> > +               /**
> > +                * allow PMD private flow action
> > +                */
> > +               if (((int)src->type >= 0) &&
> > +                   ((size_t)src->type >= RTE_DIM(rte_flow_desc_action) ||
> > +                   !rte_flow_desc_action[src->type].name))
> >                         return rte_flow_error_set
> >                                 (error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION,
> >                                  src, "cannot convert unknown action type");
> > --
> > 2.25.1
> >

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH v3 4/4] app/testpmd: add commands for tunnel offload API
  2020-10-04  5:40         ` Ajit Khaparde
@ 2020-10-04  9:29           ` Gregory Etelson
  0 siblings, 0 replies; 95+ messages in thread
From: Gregory Etelson @ 2020-10-04  9:29 UTC (permalink / raw)
  To: Ajit Khaparde
  Cc: dpdk-dev, Matan Azrad, Raslan Darawsheh, Ori Kam, Wenzhuo Lu,
	Beilei Xing, Bernard Iremonger, John McNamara, Marko Kovacevic,
	Eli Britstein, Oz Shlomo

Hello Ajit,

> [snip]
> > > > testpmd> flow tunnel 0 type vxlan
> > > > port 0: flow tunnel #1 type vxlan
> > > > testpmd> flow create 0 ingress group 0 tunnel_set 1
> > > >          pattern eth /ipv4 / udp dst is 4789 / vxlan / end
> > > >          actions  jump group 0 / end
> > >
> > > I could not get enough time to completely look at this.
> > > Can you help understand the sequence a bit.
> > >
> > > So when this flow create is issued, it looks like the application issues
> > > one of the helper APIs.
> > > In response the PMD returns the vendor specific items/actions.
> > > So what happens to the original match and action criteria provided by the
> > > user
> > > or applications?
> > >
> > > Does the vendor provided actions and items in the tunnel descriptor
> > > take precedence over what was used by the application?
> > >
> > > And is the criteria provided by the PMD opaque to the application?
> > > Such that only the PMD can decipher it when the application calls it
> > > for subsequent flow create(s)?
> > >
> >
> > Flow rules that implement the tunnel offload API are compiled from both
> > application and PMD actions and items; both kinds of elements are equally
> > important for offload rule creation. The application is "aware" which type
> > of tunnel offload flow rule, decap_set or match, it builds. Therefore, the
> > application elements are created according to the tunnel offload rule type,
> > and the API helper function that matches the selected rule type is called
> > to provide the PMD elements.
> > PMD elements are opaque. Once all elements are present, the application
> > concatenates the PMD items with the application items and the PMD actions
> > with the application actions. As a result, the application has the pattern
> > and actions arrays it needs for flow rule creation. For the decap_set rule,
> > the API expects the application to provide a pattern that describes a
> > tunnel and a JUMP action, although the JUMP is not strictly required. For
> > the match rule, the application pattern should describe the packet headers
> > and the actions refer to the inner parts.
> Since the application concatenates the elements provided by the PMD with
> that of its own, the PMD is free to validate, parse and use a subset
> or all of them?
> 

The concatenated array must provide enough information for the PMD to build a
rule. The application RTE elements describe the flow and the flow actions,
while the PMD elements cover internal PMD hardware offload and restore after a
partial miss. The PMD uses all application elements, but it may skip its
internal elements, depending on the implementation.
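
A similar sketch for the match (TMR) side, assuming only the
rte_flow_tunnel_match() / rte_flow_tunnel_item_release() helpers from this
series (the surrounding names are hypothetical): the opaque PMD items are
simply placed in front of the application items.

#include <stdlib.h>
#include <rte_flow.h>

/* Hypothetical helper: build the TMR pattern. app_pattern must end with
 * RTE_FLOW_ITEM_TYPE_END and app_items_num includes that END item. The
 * caller creates the rule from the returned array, frees it, and releases
 * the PMD items with rte_flow_tunnel_item_release(). */
static struct rte_flow_item *
build_match_pattern(uint16_t port_id, struct rte_flow_tunnel *tunnel,
                    const struct rte_flow_item *app_pattern,
                    uint32_t app_items_num,
                    struct rte_flow_item **pmd_items,
                    uint32_t *pmd_items_num,
                    struct rte_flow_error *error)
{
        struct rte_flow_item *pattern;
        uint32_t i;

        if (rte_flow_tunnel_match(port_id, tunnel, pmd_items,
                                  pmd_items_num, error))
                return NULL;
        pattern = malloc((*pmd_items_num + app_items_num) *
                         sizeof(pattern[0]));
        if (pattern == NULL) {
                rte_flow_tunnel_item_release(port_id, *pmd_items,
                                             *pmd_items_num, error);
                return NULL;
        }
        for (i = 0; i < *pmd_items_num; i++)
                pattern[i] = (*pmd_items)[i];   /* opaque PMD items first */
        for (i = 0; i < app_items_num; i++)     /* then application headers */
                pattern[*pmd_items_num + i] = app_pattern[i];
        return pattern;
}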

> [snip]

^ permalink raw reply	[flat|nested] 95+ messages in thread

* [dpdk-dev] [PATCH v4 0/4] Tunnel Offload API
  2020-06-25 16:03 [dpdk-dev] [PATCH 0/2] ethdev: tunnel offload model Gregory Etelson
                   ` (4 preceding siblings ...)
  2020-09-30  9:18 ` [dpdk-dev] [PATCH v3 0/4] Tunnel Offload API Gregory Etelson
@ 2020-10-04 13:50 ` Gregory Etelson
  2020-10-04 13:50   ` [dpdk-dev] [PATCH v4 1/4] ethdev: allow negative values in flow rule types Gregory Etelson
                     ` (4 more replies)
  2020-10-15 12:41 ` [dpdk-dev] [PATCH v5 0/3] " Gregory Etelson
                   ` (11 subsequent siblings)
  17 siblings, 5 replies; 95+ messages in thread
From: Gregory Etelson @ 2020-10-04 13:50 UTC (permalink / raw)
  To: dev; +Cc: getelson, matan, rasland, elibr, ozsh, ajit.khaparde

The Tunnel Offload API provides a hardware-independent, unified model
to offload tunneled traffic. Key model elements are:
 - apply matches to both outer and inner packet headers
   during the entire offload procedure;
 - restore the outer header of a partially offloaded packet;
 - the model is implemented as a set of helper functions.

 v2:
 * documentation updates
 * MLX5 PMD implementation for tunnel offload
 * testpmd updates for tunnel offload

 v3:
 * documentation updates
 * MLX5 PMD updates
 * testpmd updates

 v4:
 * updated patch: allow negative values in flow rule types


Eli Britstein (1):
  ethdev: tunnel offload model

Gregory Etelson (3):
  ethdev: allow negative values in flow rule types
  net/mlx5: implement tunnel offload API
  app/testpmd: add commands for tunnel offload API

 app/test-pmd/cmdline_flow.c                 | 170 ++++-
 app/test-pmd/config.c                       | 253 +++++++-
 app/test-pmd/testpmd.c                      |   5 +-
 app/test-pmd/testpmd.h                      |  34 +-
 app/test-pmd/util.c                         |  35 +-
 doc/guides/nics/mlx5.rst                    |   3 +
 doc/guides/prog_guide/rte_flow.rst          | 108 ++++
 doc/guides/rel_notes/release_20_11.rst      |   9 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  49 ++
 drivers/net/mlx5/linux/mlx5_os.c            |  18 +
 drivers/net/mlx5/mlx5.c                     |   8 +-
 drivers/net/mlx5/mlx5.h                     |   3 +
 drivers/net/mlx5/mlx5_defs.h                |   2 +
 drivers/net/mlx5/mlx5_flow.c                | 678 +++++++++++++++++++-
 drivers/net/mlx5/mlx5_flow.h                | 173 ++++-
 drivers/net/mlx5/mlx5_flow_dv.c             | 241 ++++++-
 lib/librte_ethdev/rte_ethdev_version.map    |   6 +
 lib/librte_ethdev/rte_flow.c                | 140 +++-
 lib/librte_ethdev/rte_flow.h                | 195 ++++++
 lib/librte_ethdev/rte_flow_driver.h         |  32 +
 20 files changed, 2097 insertions(+), 65 deletions(-)

-- 
2.28.0


^ permalink raw reply	[flat|nested] 95+ messages in thread

* [dpdk-dev] [PATCH v4 1/4] ethdev: allow negative values in flow rule types
  2020-10-04 13:50 ` [dpdk-dev] [PATCH v4 0/4] Tunnel Offload API Gregory Etelson
@ 2020-10-04 13:50   ` Gregory Etelson
  2020-10-14 23:40     ` Thomas Monjalon
  2020-10-04 13:50   ` [dpdk-dev] [PATCH v4 2/4] ethdev: tunnel offload model Gregory Etelson
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 95+ messages in thread
From: Gregory Etelson @ 2020-10-04 13:50 UTC (permalink / raw)
  To: dev
  Cc: getelson, matan, rasland, elibr, ozsh, ajit.khaparde,
	Gregory Etelson, Ori Kam, Viacheslav Ovsiienko, Thomas Monjalon,
	Ferruh Yigit, Andrew Rybchenko

From: Gregory Etelson <getelson@mellanox.com>

RTE flow items & actions use positive values for item & action types.
Negative values are reserved for PMD private types. PMD
items & actions are usually not exposed to the application and are not
used to create RTE flows.

The patch gives applications with access to PMD flow
items & actions the ability to integrate RTE and PMD items & actions
and use them to create a flow rule.

The RTE flow item and action conversion library accepts only positive, known
element types with predefined sizes. Private PMD items and
actions do not fit into this scheme because PMD type values are
negative, each PMD has its own type numbering, and element types and
their sizes are not visible at the RTE level.  To resolve these
limitations the patch proposes this solution:
1. A PMD can expose elements of pointer size only.  RTE flow
   conversion functions will use pointer size for each configuration
   object in the private PMD elements they process;
2. RTE flow verification will not reject elements with negative types.
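
For illustration only, a minimal sketch of a flow rule that mixes a PMD
private action with regular RTE actions. EXAMPLE_PMD_PRIVATE_ACTION and
pmd_private_ctx are hypothetical placeholders; in practice an application
obtains private elements from the PMD (for example via the tunnel offload
helpers) rather than defining them itself.

#include <limits.h>
#include <stdint.h>
#include <rte_flow.h>

/* Hypothetical private action type; real PMDs define their own
 * negative values internally. */
#define EXAMPLE_PMD_PRIVATE_ACTION ((enum rte_flow_action_type)INT_MIN)

static struct rte_flow *
create_rule_with_private_action(uint16_t port_id,
				const struct rte_flow_attr *attr,
				const struct rte_flow_item *pattern,
				void *pmd_private_ctx,
				struct rte_flow_error *error)
{
	struct rte_flow_action_jump jump = { .group = 3 };
	struct rte_flow_action actions[] = {
		{
			.type = EXAMPLE_PMD_PRIVATE_ACTION,
			/* conf of a private element is a single pointer,
			 * so rte_flow_conv() copies sizeof(void *) bytes */
			.conf = pmd_private_ctx,
		},
		{ .type = RTE_FLOW_ACTION_TYPE_JUMP, .conf = &jump },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};

	return rte_flow_create(port_id, attr, pattern, actions, error);
}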

Signed-off-by: Gregory Etelson <getelson@mellanox.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

---
v4:
* update the 'Negative types' section in rte_flow.rst

* update the patch comment
---
 doc/guides/prog_guide/rte_flow.rst |  3 +++
 lib/librte_ethdev/rte_flow.c       | 28 ++++++++++++++++++++++------
 2 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 119b128739..df71bd2eeb 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -2678,6 +2678,9 @@ identifiers they are not aware of.
 
 A method to generate them remains to be defined.
 
+An application may use PMD dynamic items or actions in flow rules. In that case
+the size of the configuration object in the dynamic element must be a pointer size.
+
 Planned types
 ~~~~~~~~~~~~~
 
diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
index f8fdd68fe9..c8c6d62a8b 100644
--- a/lib/librte_ethdev/rte_flow.c
+++ b/lib/librte_ethdev/rte_flow.c
@@ -564,7 +564,11 @@ rte_flow_conv_item_spec(void *buf, const size_t size,
 		}
 		break;
 	default:
-		off = rte_flow_desc_item[item->type].size;
+		/**
+		 * allow PMD private flow item
+		 */
+		off = (int)item->type >= 0 ?
+		      rte_flow_desc_item[item->type].size : sizeof(void *);
 		rte_memcpy(buf, data, (size > off ? off : size));
 		break;
 	}
@@ -667,7 +671,11 @@ rte_flow_conv_action_conf(void *buf, const size_t size,
 		}
 		break;
 	default:
-		off = rte_flow_desc_action[action->type].size;
+		/**
+		 * allow PMD private flow action
+		 */
+		off = (int)action->type >= 0 ?
+		      rte_flow_desc_action[action->type].size : sizeof(void *);
 		rte_memcpy(buf, action->conf, (size > off ? off : size));
 		break;
 	}
@@ -709,8 +717,12 @@ rte_flow_conv_pattern(struct rte_flow_item *dst,
 	unsigned int i;
 
 	for (i = 0, off = 0; !num || i != num; ++i, ++src, ++dst) {
-		if ((size_t)src->type >= RTE_DIM(rte_flow_desc_item) ||
-		    !rte_flow_desc_item[src->type].name)
+		/**
+		 * allow PMD private flow item
+		 */
+		if (((int)src->type >= 0) &&
+			((size_t)src->type >= RTE_DIM(rte_flow_desc_item) ||
+		    !rte_flow_desc_item[src->type].name))
 			return rte_flow_error_set
 				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM, src,
 				 "cannot convert unknown item type");
@@ -798,8 +810,12 @@ rte_flow_conv_actions(struct rte_flow_action *dst,
 	unsigned int i;
 
 	for (i = 0, off = 0; !num || i != num; ++i, ++src, ++dst) {
-		if ((size_t)src->type >= RTE_DIM(rte_flow_desc_action) ||
-		    !rte_flow_desc_action[src->type].name)
+		/**
+		 * allow PMD private flow action
+		 */
+		if (((int)src->type >= 0) &&
+		    ((size_t)src->type >= RTE_DIM(rte_flow_desc_action) ||
+		    !rte_flow_desc_action[src->type].name))
 			return rte_flow_error_set
 				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION,
 				 src, "cannot convert unknown action type");
-- 
2.28.0


^ permalink raw reply	[flat|nested] 95+ messages in thread

* [dpdk-dev] [PATCH v4 2/4] ethdev: tunnel offload model
  2020-10-04 13:50 ` [dpdk-dev] [PATCH v4 0/4] Tunnel Offload API Gregory Etelson
  2020-10-04 13:50   ` [dpdk-dev] [PATCH v4 1/4] ethdev: allow negative values in flow rule types Gregory Etelson
@ 2020-10-04 13:50   ` Gregory Etelson
  2020-10-06  9:47     ` Sriharsha Basavapatna
  2020-10-14 23:55     ` Thomas Monjalon
  2020-10-04 13:50   ` [dpdk-dev] [PATCH v4 3/4] net/mlx5: implement tunnel offload API Gregory Etelson
                     ` (2 subsequent siblings)
  4 siblings, 2 replies; 95+ messages in thread
From: Gregory Etelson @ 2020-10-04 13:50 UTC (permalink / raw)
  To: dev
  Cc: getelson, matan, rasland, elibr, ozsh, ajit.khaparde,
	Eli Britstein, Ori Kam, Viacheslav Ovsiienko, Ray Kinsella,
	Neil Horman, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko

From: Eli Britstein <elibr@mellanox.com>

The rte_flow API provides the building blocks for vendor-agnostic flow
classification offloads.  The rte_flow match and action primitives are
fine grained, thus giving DPDK applications the flexibility to
offload network stacks and complex pipelines.

Applications wishing to offload complex data structures (e.g. tunnel
virtual ports) are required to use the rte_flow primitives, such as
group, meta, mark, tag and others to model their high level objects.

The hardware model design for high level software objects is not
trivial.  Furthermore, an optimal design is often vendor specific.

The goal of this API is to provide applications with the hardware
offload model for common high level software objects which is optimal
in regards to the underlying hardware.

Tunnel ports are the first of such objects.

Tunnel ports
------------
Ingress processing of tunneled traffic requires the classification of
the tunnel type followed by a decap action.

In software, once a packet is decapsulated the in_port field is
changed to a virtual port representing the tunnel type. The outer
header fields are stored as packet metadata members and may be matched
by subsequent flows.

Open vSwitch, for example, uses two flows:
1. Classification flow - sets the virtual port representing the tunnel
   type. For example: match on udp port 4789,
   actions=tnl_pop(vxlan_vport).
2. Steering flow according to outer and inner header matches - match on
   in_port=vxlan_vport and outer/inner header matches, actions=forward
   to port X.
The benefits of multi-flow tables are described in [1].

Offloading tunnel ports
-----------------------
Tunnel ports introduce a new stateless field that can be matched on.
Currently the rte_flow library provides an API to encap, decap and
match on tunnel headers. However, there is no rte_flow primitive to
set and match tunnel virtual ports.

There are several possible hardware models for offloading virtual
tunnel port flows including, but not limited to, the following:
1. Setting the virtual port on a hw register using the
rte_flow_action_mark/ rte_flow_action_tag/rte_flow_action_set_meta objects.
2. Mapping a virtual port to an rte_flow group
3. Avoiding the need to match on transient objects by merging
multi-table flows to a single rte_flow rule.

Every approach has its pros and cons.  The preferred approach should
take into account the entire system architecture and is very often
vendor specific.

The proposed rte_flow_tunnel_decap_set helper function (drafted below)
is designed to provide a common, vendor agnostic, API for setting the
virtual port value.  The helper API enables PMD implementations to
return vendor specific combination of rte_flow actions realizing the
vendor's hardware model for setting a tunnel port.  Applications may
append the list of actions returned from the helper function when
creating an rte_flow rule in hardware.

Similarly, the rte_flow_tunnel_match helper (drafted below)
allows for multiple hardware implementations to return a list of
rte_flow items.

Miss handling
-------------
Packets going through multiple rte_flow groups are exposed to hw
misses due to partial packet processing. In such cases, the software
should continue the packet's processing from the point where the
hardware missed.

We propose a generic rte_flow_restore_info structure providing the state
that was stored in hardware when the packet missed.

Currently, the structure will provide the tunnel state of the packet
that missed, namely:
1. The group id that missed
2. The tunnel port that missed
3. Tunnel information that was stored in memory (due to decap action).
In the future, we may add additional fields as more state may be
stored in the device memory (e.g. ct_state).

Applications may query the state via the new
rte_flow_get_restore_info(mbuf) API, thus allowing
a vendor specific implementation.
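
A minimal receive-path sketch (mine, not part of this patch) of how an
application could consume that state; handle_tunnel_miss() is a hypothetical
application callback:

#include <stdbool.h>
#include <stdint.h>
#include <rte_mbuf.h>
#include <rte_flow.h>

/* Hypothetical application-specific slow-path handler, shown only to make
 * the sketch self-contained. */
static void
handle_tunnel_miss(struct rte_mbuf *m, const struct rte_flow_tunnel *tunnel,
		   uint32_t miss_group, bool still_encapsulated)
{
	(void)m; (void)tunnel; (void)miss_group; (void)still_encapsulated;
}

static void
process_partial_miss(uint16_t port_id, struct rte_mbuf *m)
{
	struct rte_flow_restore_info info;
	struct rte_flow_error error;

	if (rte_flow_get_restore_info(port_id, m, &info, &error) < 0)
		return; /* no hardware state attached to this packet */
	if (!(info.flags & RTE_FLOW_RESTORE_INFO_TUNNEL))
		return; /* not a tunnel miss */
	handle_tunnel_miss(m, &info.tunnel,
			   (info.flags & RTE_FLOW_RESTORE_INFO_GROUP_ID) ?
			   info.group_id : 0,
			   !!(info.flags & RTE_FLOW_RESTORE_INFO_ENCAPSULATED));
}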

VXLAN Code example:
Assume the application needs to do inner NAT on a VXLAN packet.
The first rule in group 0:

flow create <port id> ingress group 0
  pattern eth / ipv4 / udp dst is 4789 / vxlan / end
  actions {pmd actions} / jump group 3 / end

The first VXLAN packet that arrives matches the rule in group 0 and jumps
to group 3. In group 3 the packet will miss since there is no flow to
match, and will be delivered to the application.  The application will call
rte_flow_get_restore_info() to get the packet outer header.
The application will insert a new rule in group 3 to match outer and inner
headers:

flow create <port id> ingress group 3
  pattern {pmd items} / eth / ipv4 dst is 172.10.10.1 /
          udp dst 4789 / vxlan vni is 10 /
          ipv4 dst is 184.1.2.3 / end
  actions  set_ipv4_dst  186.1.1.1 / queue index 3 / end

The result of these rules will be that a VXLAN packet with vni=10, outer IPv4
dst=172.10.10.1 and inner IPv4 dst=184.1.2.3 will be received decapsulated
on queue 3 with IPv4 dst=186.1.1.1

Note: a packet in group 3 is considered decapsulated. All actions in that
group will be applied to the header that was the inner header before decap.
The application may specify the outer header to be matched on.  It is the
PMD's responsibility to translate these items to outer metadata.

API usage:
/**
 * 1. Initiate RTE flow tunnel object
 */
const struct rte_flow_tunnel tunnel = {
  .type = RTE_FLOW_ITEM_TYPE_VXLAN,
  .tun_id = 10,
};

/**
 * 2. Obtain PMD tunnel actions
 *
 * pmd_actions is an intermediate variable application uses to
 * compile actions array
 */
struct rte_flow_action *pmd_actions;
rte_flow_tunnel_decap_set(port_id, &tunnel, &pmd_actions,
                          &num_pmd_actions, &error);

/**
 * 3. offload the first  rule
 * matching on VXLAN traffic and jumps to group 3
 * (implicitly decaps packet)
 */
app_actions  =   jump group 3
rule_items = app_items;  /** eth / ipv4 / udp / vxlan  */
rule_actions = { pmd_actions, app_actions };
attr.group = 0;
flow_1 = rte_flow_create(port_id, &attr,
                         rule_items, rule_actions, &error);
/**
  * 4. after flow creation application does not need to keep tunnel
  * action resources.
  */
rte_flow_tunnel_action_decap_release(port_id, pmd_actions,
                                     num_pmd_actions, &error);

/**
  * 5. After a partially offloaded packet missed because there was no
  * matching rule, handle the miss on group 3
  */
struct rte_flow_restore_info info;
rte_flow_get_restore_info(port_id, mbuf, &info, &error);

/**
 * 6. Offload NAT rule:
 */
app_items = { eth / ipv4 dst is 172.10.10.1 / udp dst 4789 /
            vxlan vni is 10 / ipv4 dst is 184.1.2.3 }
app_actions = { set_ipv4_dst 186.1.1.1 / queue index 3 }

rte_flow_tunnel_match(port_id, &info.tunnel, &pmd_items,
                      &num_pmd_items, &error);
rule_items = {pmd_items, app_items};
rule_actions = app_actions;
attr.group = info.group_id;
flow_2 = rte_flow_create(port_id, &attr,
                         rule_items, rule_actions, &error);

/**
 * 7. Release PMD items after rule creation
 */
rte_flow_tunnel_item_release(port_id, pmd_items, num_pmd_items, &error);

References
1. https://mails.dpdk.org/archives/dev/2020-June/index.html

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 doc/guides/prog_guide/rte_flow.rst       | 105 ++++++++++++
 doc/guides/rel_notes/release_20_11.rst   |   9 ++
 lib/librte_ethdev/rte_ethdev_version.map |   6 +
 lib/librte_ethdev/rte_flow.c             | 112 +++++++++++++
 lib/librte_ethdev/rte_flow.h             | 195 +++++++++++++++++++++++
 lib/librte_ethdev/rte_flow_driver.h      |  32 ++++
 6 files changed, 459 insertions(+)

diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index df71bd2eeb..28f2aca984 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -3034,6 +3034,111 @@ operations include:
 - Duplication of a complete flow rule description.
 - Pattern item or action name retrieval.
 
+Tunneled traffic offload
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Provide the software application with a unified rules model for tunneled traffic
+regardless of the underlying hardware.
+
+ - The model introduces a concept of a virtual tunnel port (VTP).
+ - The model uses VTP to offload ingress tunneled network traffic 
+   with RTE flow rules.
+ - The model is implemented as set of helper functions. Each PMD
+   implements VTP offload according to underlying hardware offload
+   capabilities.  Applications must query PMD for VTP flow
+   items / actions before using in creation of a VTP flow rule.
+
+The model components:
+
+- Virtual Tunnel Port (VTP) is a stateless software object that
+  describes tunneled network traffic.  VTP object usually contains
+  descriptions of outer headers, tunnel headers and inner headers.
+- Tunnel Steering flow Rule (TSR) detects tunneled packets and
+  delegates them to tunnel processing infrastructure, implemented
+  in PMD for optimal hardware utilization, for further processing.
+- Tunnel Matching flow Rule (TMR) verifies packet configuration and
+  runs offload actions in case of a match.
+
+Application actions:
+
+1 Initialize VTP object according to tunnel network parameters.
+
+2 Create TSR flow rule.
+
+2.1 Query PMD for VTP actions. Application can query for VTP actions more than once.
+
+  .. code-block:: c
+
+    int
+    rte_flow_tunnel_decap_set(uint16_t port_id,
+                              struct rte_flow_tunnel *tunnel,
+                              struct rte_flow_action **pmd_actions,
+                              uint32_t *num_of_pmd_actions,
+                              struct rte_flow_error *error);
+
+2.2 Integrate PMD actions into TSR actions list.
+
+2.3 Create TSR flow rule.
+
+    .. code-block:: console
+
+      flow create <port> group 0 match {tunnel items} / end actions {PMD actions} / {App actions} / end
+
+3 Create TMR flow rule.
+
+3.1 Query PMD for VTP items. Application can query for VTP items more than once.
+
+    .. code-block:: c
+
+      int
+      rte_flow_tunnel_match(uint16_t port_id,
+                            struct rte_flow_tunnel *tunnel,
+                            struct rte_flow_item **pmd_items,
+                            uint32_t *num_of_pmd_items,
+                            struct rte_flow_error *error);
+
+3.2 Integrate PMD items into TMR items list.
+
+3.3 Create TMR flow rule.
+
+    .. code-block:: console
+
+      flow create <port> group 0 match {PMD items} / {APP items} / end actions {offload actions} / end
+
+The model provides a helper function call to restore packets that miss
+tunnel TMR rules to their original state:
+
+.. code-block:: c
+
+  int
+  rte_flow_get_restore_info(uint16_t port_id,
+                            struct rte_mbuf *mbuf,
+                            struct rte_flow_restore_info *info,
+                            struct rte_flow_error *error);
+
+The rte_tunnel object filled by the call inside the
+``rte_flow_restore_info *info`` parameter can be used by the application
+to create a new TMR rule for that tunnel.
+
+The model requirements:
+
+The software application must initialize the
+rte_tunnel object with tunnel parameters before calling
+rte_flow_tunnel_decap_set() & rte_flow_tunnel_match().
+
+The PMD actions array obtained in rte_flow_tunnel_decap_set() must be
+released by the application with a rte_flow_tunnel_action_decap_release() call.
+The application can release the actions after the TSR rule was created.
+
+The PMD items array obtained with rte_flow_tunnel_match() must be released
+by the application with a rte_flow_tunnel_item_release() call.  The application
+can release the items after the rule was created. However, if the application
+needs to create an additional TMR rule for the same tunnel it will need
+to obtain the PMD items again.
+
+Application cannot destroy rte_tunnel object before it releases all
+PMD actions & PMD items referencing that tunnel.
+
 Caveats
 -------
 
diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index 1a9945f314..9ea8ee3170 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -62,6 +62,15 @@ New Features
   * Added support for 200G PAM4 link speed.
   * Added support for RSS hash level selection.
   * Updated HWRM structures to 1.10.1.70 version.
+* **Flow rules allowed to use private PMD items / actions.**
+
+  * Flow rule verification was updated to accept private PMD
+    items and actions.
+
+* **Added generic API to offload tunneled traffic and restore missed packet.**
+
+  * Added a new hardware independent helper API to RTE flow library that
+    offloads tunneled traffic and restores missed packets.
 
 * **Updated Cisco enic driver.**
 
diff --git a/lib/librte_ethdev/rte_ethdev_version.map b/lib/librte_ethdev/rte_ethdev_version.map
index c95ef5157a..9832c138a2 100644
--- a/lib/librte_ethdev/rte_ethdev_version.map
+++ b/lib/librte_ethdev/rte_ethdev_version.map
@@ -226,6 +226,12 @@ EXPERIMENTAL {
 	rte_tm_wred_profile_add;
 	rte_tm_wred_profile_delete;
 
+	rte_flow_tunnel_decap_set;
+	rte_flow_tunnel_match;
+	rte_flow_get_restore_info;
+	rte_flow_tunnel_action_decap_release;
+	rte_flow_tunnel_item_release;
+
 	# added in 20.11
 	rte_eth_link_speed_to_str;
 	rte_eth_link_to_str;
diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
index c8c6d62a8b..181c02792d 100644
--- a/lib/librte_ethdev/rte_flow.c
+++ b/lib/librte_ethdev/rte_flow.c
@@ -1267,3 +1267,115 @@ rte_flow_get_aged_flows(uint16_t port_id, void **contexts,
 				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				  NULL, rte_strerror(ENOTSUP));
 }
+
+int
+rte_flow_tunnel_decap_set(uint16_t port_id,
+			  struct rte_flow_tunnel *tunnel,
+			  struct rte_flow_action **actions,
+			  uint32_t *num_of_actions,
+			  struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->tunnel_decap_set)) {
+		return flow_err(port_id,
+				ops->tunnel_decap_set(dev, tunnel, actions,
+						      num_of_actions, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_tunnel_match(uint16_t port_id,
+		      struct rte_flow_tunnel *tunnel,
+		      struct rte_flow_item **items,
+		      uint32_t *num_of_items,
+		      struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->tunnel_match)) {
+		return flow_err(port_id,
+				ops->tunnel_match(dev, tunnel, items,
+						  num_of_items, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_get_restore_info(uint16_t port_id,
+			  struct rte_mbuf *m,
+			  struct rte_flow_restore_info *restore_info,
+			  struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->get_restore_info)) {
+		return flow_err(port_id,
+				ops->get_restore_info(dev, m, restore_info,
+						      error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_tunnel_action_decap_release(uint16_t port_id,
+				     struct rte_flow_action *actions,
+				     uint32_t num_of_actions,
+				     struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->action_release)) {
+		return flow_err(port_id,
+				ops->action_release(dev, actions,
+						    num_of_actions, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_tunnel_item_release(uint16_t port_id,
+			     struct rte_flow_item *items,
+			     uint32_t num_of_items,
+			     struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->item_release)) {
+		return flow_err(port_id,
+				ops->item_release(dev, items,
+						  num_of_items, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
index da8bfa5489..2f12d3ea1a 100644
--- a/lib/librte_ethdev/rte_flow.h
+++ b/lib/librte_ethdev/rte_flow.h
@@ -3357,6 +3357,201 @@ int
 rte_flow_get_aged_flows(uint16_t port_id, void **contexts,
 			uint32_t nb_contexts, struct rte_flow_error *error);
 
+/* Tunnel has a type and the key information. */
+struct rte_flow_tunnel {
+	/**
+	 * Tunnel type, for example RTE_FLOW_ITEM_TYPE_VXLAN,
+	 * RTE_FLOW_ITEM_TYPE_NVGRE etc.
+	 */
+	enum rte_flow_item_type	type;
+	uint64_t tun_id; /**< Tunnel identification. */
+
+	RTE_STD_C11
+	union {
+		struct {
+			rte_be32_t src_addr; /**< IPv4 source address. */
+			rte_be32_t dst_addr; /**< IPv4 destination address. */
+		} ipv4;
+		struct {
+			uint8_t src_addr[16]; /**< IPv6 source address. */
+			uint8_t dst_addr[16]; /**< IPv6 destination address. */
+		} ipv6;
+	};
+	rte_be16_t tp_src; /**< Tunnel port source. */
+	rte_be16_t tp_dst; /**< Tunnel port destination. */
+	uint16_t   tun_flags; /**< Tunnel flags. */
+
+	bool       is_ipv6; /**< True for valid IPv6 fields. Otherwise IPv4. */
+
+	/**
+	 * the following members are required to restore packet
+	 * after miss
+	 */
+	uint8_t    tos; /**< TOS for IPv4, TC for IPv6. */
+	uint8_t    ttl; /**< TTL for IPv4, HL for IPv6. */
+	uint32_t label; /**< Flow Label for IPv6. */
+};
+
+/**
+ * Indicate that the packet has a tunnel.
+ */
+#define RTE_FLOW_RESTORE_INFO_TUNNEL  (1ULL << 0)
+
+/**
+ * Indicate that the packet has a non decapsulated tunnel header.
+ */
+#define RTE_FLOW_RESTORE_INFO_ENCAPSULATED  (1ULL << 1)
+
+/**
+ * Indicate that the packet has a group_id.
+ */
+#define RTE_FLOW_RESTORE_INFO_GROUP_ID  (1ULL << 2)
+
+/**
+ * Restore information structure to communicate the current packet processing
+ * state when some of the processing pipeline is done in hardware and should
+ * continue in software.
+ */
+struct rte_flow_restore_info {
+	/**
+	 * Bitwise flags (RTE_FLOW_RESTORE_INFO_*) to indicate validation of
+	 * other fields in struct rte_flow_restore_info.
+	 */
+	uint64_t flags;
+	uint32_t group_id; /**< Group ID where the packet missed. */
+	struct rte_flow_tunnel tunnel; /**< Tunnel information. */
+};
+
+/**
+ * Allocate an array of actions to be used in rte_flow_create, to implement
+ * tunnel-decap-set for the given tunnel.
+ * Sample usage:
+ *   actions vxlan_decap / tunnel-decap-set(tunnel properties) /
+ *            jump group 0 / end
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] tunnel
+ *   Tunnel properties.
+ * @param[out] actions
+ *   Array of actions to be allocated by the PMD. This array should be
+ *   concatenated with the actions array provided to rte_flow_create.
+ * @param[out] num_of_actions
+ *   Number of actions allocated.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_decap_set(uint16_t port_id,
+			  struct rte_flow_tunnel *tunnel,
+			  struct rte_flow_action **actions,
+			  uint32_t *num_of_actions,
+			  struct rte_flow_error *error);
+
+/**
+ * Allocate an array of items to be used in rte_flow_create, to implement
+ * tunnel-match for the given tunnel.
+ * Sample usage:
+ *   pattern tunnel-match(tunnel properties) / outer-header-matches /
+ *           inner-header-matches / end
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] tunnel
+ *   Tunnel properties.
+ * @param[out] items
+ *   Array of items to be allocated by the PMD. This array should be
+ *   concatenated with the items array provided to rte_flow_create.
+ * @param[out] num_of_items
+ *   Number of items allocated.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_match(uint16_t port_id,
+		      struct rte_flow_tunnel *tunnel,
+		      struct rte_flow_item **items,
+		      uint32_t *num_of_items,
+		      struct rte_flow_error *error);
+
+/**
+ * Populate the current packet processing state, if exists, for the given mbuf.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] m
+ *   Mbuf struct.
+ * @param[out] info
+ *   Restore information. Upon success contains the HW state.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_get_restore_info(uint16_t port_id,
+			  struct rte_mbuf *m,
+			  struct rte_flow_restore_info *info,
+			  struct rte_flow_error *error);
+
+/**
+ * Release the action array as allocated by rte_flow_tunnel_decap_set.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] actions
+ *   Array of actions to be released.
+ * @param[in] num_of_actions
+ *   Number of elements in actions array.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_action_decap_release(uint16_t port_id,
+				     struct rte_flow_action *actions,
+				     uint32_t num_of_actions,
+				     struct rte_flow_error *error);
+
+/**
+ * Release the item array as allocated by rte_flow_tunnel_match.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] items
+ *   Array of items to be released.
+ * @param[in] num_of_items
+ *   Number of elements in item array.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_item_release(uint16_t port_id,
+			     struct rte_flow_item *items,
+			     uint32_t num_of_items,
+			     struct rte_flow_error *error);
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_ethdev/rte_flow_driver.h b/lib/librte_ethdev/rte_flow_driver.h
index 3ee871d3eb..9d87407203 100644
--- a/lib/librte_ethdev/rte_flow_driver.h
+++ b/lib/librte_ethdev/rte_flow_driver.h
@@ -108,6 +108,38 @@ struct rte_flow_ops {
 		 void **context,
 		 uint32_t nb_contexts,
 		 struct rte_flow_error *err);
+	/** See rte_flow_tunnel_decap_set() */
+	int (*tunnel_decap_set)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_tunnel *tunnel,
+		 struct rte_flow_action **pmd_actions,
+		 uint32_t *num_of_actions,
+		 struct rte_flow_error *err);
+	/** See rte_flow_tunnel_match() */
+	int (*tunnel_match)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_tunnel *tunnel,
+		 struct rte_flow_item **pmd_items,
+		 uint32_t *num_of_items,
+		 struct rte_flow_error *err);
+	/** See rte_flow_get_restore_info() */
+	int (*get_restore_info)
+		(struct rte_eth_dev *dev,
+		 struct rte_mbuf *m,
+		 struct rte_flow_restore_info *info,
+		 struct rte_flow_error *err);
+	/** See rte_flow_tunnel_action_decap_release() */
+	int (*action_release)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_action *pmd_actions,
+		 uint32_t num_of_actions,
+		 struct rte_flow_error *err);
+	/** See rte_flow_item_release() */
+	int (*item_release)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_item *pmd_items,
+		 uint32_t num_of_items,
+		 struct rte_flow_error *err);
 };
 
 /**
-- 
2.28.0


^ permalink raw reply	[flat|nested] 95+ messages in thread

* [dpdk-dev] [PATCH v4 3/4] net/mlx5: implement tunnel offload API
  2020-10-04 13:50 ` [dpdk-dev] [PATCH v4 0/4] Tunnel Offload API Gregory Etelson
  2020-10-04 13:50   ` [dpdk-dev] [PATCH v4 1/4] ethdev: allow negative values in flow rule types Gregory Etelson
  2020-10-04 13:50   ` [dpdk-dev] [PATCH v4 2/4] ethdev: tunnel offload model Gregory Etelson
@ 2020-10-04 13:50   ` Gregory Etelson
  2020-10-04 13:50   ` [dpdk-dev] [PATCH v4 4/4] app/testpmd: add commands for " Gregory Etelson
  2020-10-14 17:25   ` [dpdk-dev] [PATCH v4 0/4] Tunnel Offload API Ferruh Yigit
  4 siblings, 0 replies; 95+ messages in thread
From: Gregory Etelson @ 2020-10-04 13:50 UTC (permalink / raw)
  To: dev
  Cc: getelson, matan, rasland, elibr, ozsh, ajit.khaparde,
	Viacheslav Ovsiienko, Shahaf Shuler

Tunnel Offload API provides a hardware-independent, unified model
to offload tunneled traffic. Key model elements are:
 - apply matches to both outer and inner packet headers
   during the entire offload procedure;
 - restore the outer header of a partially offloaded packet;
 - the model is implemented as a set of helper functions.

Implementation details:
* The tunnel_offload PMD parameter must be set to 1 to enable the feature.
* The application cannot use MARK and META flow actions with tunnel offload rules.
* The offload JUMP action is restricted to the tunnel steering rule only.

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
v2:
* introduce MLX5 PMD API implementation
v3:
* bug fixes
---
 doc/guides/nics/mlx5.rst         |   3 +
 drivers/net/mlx5/linux/mlx5_os.c |  18 +
 drivers/net/mlx5/mlx5.c          |   8 +-
 drivers/net/mlx5/mlx5.h          |   3 +
 drivers/net/mlx5/mlx5_defs.h     |   2 +
 drivers/net/mlx5/mlx5_flow.c     | 678 ++++++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_flow.h     | 173 +++++++-
 drivers/net/mlx5/mlx5_flow_dv.c  | 241 +++++++++--
 8 files changed, 1080 insertions(+), 46 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index b0614ae335..03eec3503d 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -815,6 +815,9 @@ Driver options
     24 bits. The actual supported width can be retrieved in runtime by
     series of rte_flow_validate() trials.
 
+  - 3, this engages tunnel offload mode. In E-Switch configuration, that
+    mode implicitly activates ``dv_xmeta_en=1``.
+
   +------+-----------+-----------+-------------+-------------+
   | Mode | ``MARK``  | ``META``  | ``META`` Tx | FDB/Through |
   +======+===========+===========+=============+=============+
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 188a6d4c38..70db822987 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -298,6 +298,12 @@ mlx5_alloc_shared_dr(struct mlx5_priv *priv)
 		sh->esw_drop_action = mlx5_glue->dr_create_flow_action_drop();
 	}
 #endif
+	if (!sh->tunnel_hub)
+		err = mlx5_alloc_tunnel_hub(sh);
+	if (err) {
+		DRV_LOG(ERR, "mlx5_alloc_tunnel_hub failed err=%d", err);
+		goto error;
+	}
 	if (priv->config.reclaim_mode == MLX5_RCM_AGGR) {
 		mlx5_glue->dr_reclaim_domain_memory(sh->rx_domain, 1);
 		mlx5_glue->dr_reclaim_domain_memory(sh->tx_domain, 1);
@@ -344,6 +350,10 @@ mlx5_alloc_shared_dr(struct mlx5_priv *priv)
 		mlx5_hlist_destroy(sh->tag_table, NULL, NULL);
 		sh->tag_table = NULL;
 	}
+	if (sh->tunnel_hub) {
+		mlx5_release_tunnel_hub(sh, priv->dev_port);
+		sh->tunnel_hub = NULL;
+	}
 	mlx5_free_table_hash_list(priv);
 	return err;
 }
@@ -405,6 +415,10 @@ mlx5_os_free_shared_dr(struct mlx5_priv *priv)
 		mlx5_hlist_destroy(sh->tag_table, NULL, NULL);
 		sh->tag_table = NULL;
 	}
+	if (sh->tunnel_hub) {
+		mlx5_release_tunnel_hub(sh, priv->dev_port);
+		sh->tunnel_hub = NULL;
+	}
 	mlx5_free_table_hash_list(priv);
 }
 
@@ -658,6 +672,10 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 			strerror(rte_errno));
 		goto error;
 	}
+	if (config->dv_miss_info) {
+		if (switch_info->master || switch_info->representor)
+			config->dv_xmeta_en = MLX5_XMETA_MODE_META16;
+	}
 	mlx5_malloc_mem_select(config->sys_mem_en);
 	sh = mlx5_alloc_shared_dev_ctx(spawn, config);
 	if (!sh)
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 01ead6e6af..88d843fc4e 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1591,13 +1591,17 @@ mlx5_args_check(const char *key, const char *val, void *opaque)
 	} else if (strcmp(MLX5_DV_XMETA_EN, key) == 0) {
 		if (tmp != MLX5_XMETA_MODE_LEGACY &&
 		    tmp != MLX5_XMETA_MODE_META16 &&
-		    tmp != MLX5_XMETA_MODE_META32) {
+		    tmp != MLX5_XMETA_MODE_META32 &&
+		    tmp != MLX5_XMETA_MODE_MISS_INFO) {
 			DRV_LOG(ERR, "invalid extensive "
 				     "metadata parameter");
 			rte_errno = EINVAL;
 			return -rte_errno;
 		}
-		config->dv_xmeta_en = tmp;
+		if (tmp != MLX5_XMETA_MODE_MISS_INFO)
+			config->dv_xmeta_en = tmp;
+		else
+			config->dv_miss_info = 1;
 	} else if (strcmp(MLX5_LACP_BY_USER, key) == 0) {
 		config->lacp_by_user = !!tmp;
 	} else if (strcmp(MLX5_MR_EXT_MEMSEG_EN, key) == 0) {
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index bd91e167e0..b008531736 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -206,6 +206,7 @@ struct mlx5_dev_config {
 	unsigned int rt_timestamp:1; /* realtime timestamp format. */
 	unsigned int sys_mem_en:1; /* The default memory allocator. */
 	unsigned int decap_en:1; /* Whether decap will be used or not. */
+	unsigned int dv_miss_info:1; /* restore packet after partial hw miss */
 	struct {
 		unsigned int enabled:1; /* Whether MPRQ is enabled. */
 		unsigned int stride_num_n; /* Number of strides. */
@@ -632,6 +633,8 @@ struct mlx5_dev_ctx_shared {
 	/* UAR same-page access control required in 32bit implementations. */
 #endif
 	struct mlx5_hlist *flow_tbls;
+	struct rte_hash *flow_tbl_map; /* app group-to-flow table map */
+	struct mlx5_flow_tunnel_hub *tunnel_hub;
 	/* Direct Rules tables for FDB, NIC TX+RX */
 	void *esw_drop_action; /* Pointer to DR E-Switch drop action. */
 	void *pop_vlan_action; /* Pointer to DR pop VLAN action. */
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index 0df47391ee..41a7537d5e 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -165,6 +165,8 @@
 #define MLX5_XMETA_MODE_LEGACY 0
 #define MLX5_XMETA_MODE_META16 1
 #define MLX5_XMETA_MODE_META32 2
+/* Provide info on partial hw miss. Implies MLX5_XMETA_MODE_META16 */
+#define MLX5_XMETA_MODE_MISS_INFO 3
 
 /* MLX5_TX_DB_NC supported values. */
 #define MLX5_TXDB_CACHED 0
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index ffa7646ca4..36c1aa4543 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -18,6 +18,7 @@
 #include <rte_flow_driver.h>
 #include <rte_malloc.h>
 #include <rte_ip.h>
+#include <rte_hash.h>
 
 #include <mlx5_glue.h>
 #include <mlx5_devx_cmds.h>
@@ -30,6 +31,18 @@
 #include "mlx5_flow_os.h"
 #include "mlx5_rxtx.h"
 
+static struct mlx5_flow_tunnel *
+mlx5_find_tunnel_id(struct rte_eth_dev *dev, uint32_t id);
+static void
+mlx5_flow_tunnel_free(struct rte_eth_dev *dev, struct mlx5_flow_tunnel *tunnel);
+static uint32_t
+tunnel_flow_group_to_flow_table(struct rte_eth_dev *dev,
+				const struct mlx5_flow_tunnel *tunnel,
+				uint32_t group, uint32_t *table,
+				struct rte_flow_error *error);
+static const struct mlx5_flow_tbl_data_entry  *
+tunnel_mark_decode(struct rte_eth_dev *dev, uint32_t mark);
+
 /** Device flow drivers. */
 extern const struct mlx5_flow_driver_ops mlx5_flow_verbs_drv_ops;
 
@@ -220,6 +233,171 @@ static const struct rte_flow_expand_node mlx5_support_expansion[] = {
 	},
 };
 
+struct tunnel_validation {
+	bool verdict;
+	const char *msg;
+};
+
+static inline struct tunnel_validation
+mlx5_flow_tunnel_validate(struct rte_eth_dev *dev,
+			  struct rte_flow_tunnel *tunnel)
+{
+	struct tunnel_validation tv;
+
+	if (!is_tunnel_offload_active(dev)) {
+		tv.msg = "tunnel offload was not activated";
+		goto err;
+	} else if (!tunnel) {
+		tv.msg = "no application tunnel";
+		goto err;
+	}
+
+	switch (tunnel->type) {
+	default:
+		tv.msg = "unsupported tunnel type";
+		goto err;
+	case RTE_FLOW_ITEM_TYPE_VXLAN:
+		break;
+	}
+
+	tv.verdict = true;
+	return tv;
+
+err:
+	tv.verdict = false;
+	return tv;
+}
+
+static int
+mlx5_flow_tunnel_decap_set(struct rte_eth_dev *dev,
+		    struct rte_flow_tunnel *app_tunnel,
+		    struct rte_flow_action **actions,
+		    uint32_t *num_of_actions,
+		    struct rte_flow_error *error)
+{
+	int ret;
+	struct mlx5_flow_tunnel *tunnel;
+	struct tunnel_validation tv;
+
+	tv = mlx5_flow_tunnel_validate(dev, app_tunnel);
+	if (!tv.verdict)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION_CONF, NULL,
+					  tv.msg);
+	ret = mlx5_get_flow_tunnel(dev, app_tunnel, &tunnel);
+	if (ret < 0) {
+		return rte_flow_error_set(error, ret,
+					  RTE_FLOW_ERROR_TYPE_ACTION_CONF, NULL,
+					  "failed to initialize pmd tunnel");
+	}
+	*actions = &tunnel->action;
+	*num_of_actions = 1;
+	return 0;
+}
+
+static int
+mlx5_flow_tunnel_match(struct rte_eth_dev *dev,
+		       struct rte_flow_tunnel *app_tunnel,
+		       struct rte_flow_item **items,
+		       uint32_t *num_of_items,
+		       struct rte_flow_error *error)
+{
+	int ret;
+	struct mlx5_flow_tunnel *tunnel;
+	struct tunnel_validation tv;
+
+	tv = mlx5_flow_tunnel_validate(dev, app_tunnel);
+	if (!tv.verdict)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_HANDLE, NULL,
+					  tv.msg);
+	ret = mlx5_get_flow_tunnel(dev, app_tunnel, &tunnel);
+	if (ret < 0) {
+		return rte_flow_error_set(error, ret,
+					  RTE_FLOW_ERROR_TYPE_HANDLE, NULL,
+					  "failed to initialize pmd tunnel");
+	}
+	*items = &tunnel->item;
+	*num_of_items = 1;
+	return 0;
+}
+
+static int
+mlx5_flow_item_release(struct rte_eth_dev *dev,
+		       struct rte_flow_item *pmd_items,
+		       uint32_t num_items, struct rte_flow_error *err)
+{
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_tunnel *tun;
+
+	LIST_FOREACH(tun, &thub->tunnels, chain) {
+		if (&tun->item == pmd_items)
+			break;
+	}
+	if (!tun || num_items != 1)
+		return rte_flow_error_set(err, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_HANDLE, NULL,
+					  "invalid argument");
+	if (!__atomic_sub_fetch(&tun->refctn, 1, __ATOMIC_RELAXED))
+		mlx5_flow_tunnel_free(dev, tun);
+	return 0;
+}
+
+static int
+mlx5_flow_action_release(struct rte_eth_dev *dev,
+			 struct rte_flow_action *pmd_actions,
+			 uint32_t num_actions, struct rte_flow_error *err)
+{
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_tunnel *tun;
+
+	LIST_FOREACH(tun, &thub->tunnels, chain) {
+		if (&tun->action == pmd_actions)
+			break;
+	}
+	if (!tun || num_actions != 1)
+		return rte_flow_error_set(err, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_HANDLE, NULL,
+					  "invalid argument");
+	if (!__atomic_sub_fetch(&tun->refctn, 1, __ATOMIC_RELAXED))
+		mlx5_flow_tunnel_free(dev, tun);
+
+	return 0;
+}
+
+static int
+mlx5_flow_tunnel_get_restore_info(struct rte_eth_dev *dev,
+				  struct rte_mbuf *m,
+				  struct rte_flow_restore_info *info,
+				  struct rte_flow_error *err)
+{
+	uint64_t ol_flags = m->ol_flags;
+	const struct mlx5_flow_tbl_data_entry *tble;
+	const uint64_t mask = PKT_RX_FDIR | PKT_RX_FDIR_ID;
+
+	if ((ol_flags & mask) != mask)
+		goto err;
+	tble = tunnel_mark_decode(dev, m->hash.fdir.hi);
+	if (!tble) {
+		DRV_LOG(DEBUG, "port %u invalid miss tunnel mark %#x",
+			dev->data->port_id, m->hash.fdir.hi);
+		goto err;
+	}
+	MLX5_ASSERT(tble->tunnel);
+	memcpy(&info->tunnel, &tble->tunnel->app_tunnel, sizeof(info->tunnel));
+	info->group_id = tble->group_id;
+	info->flags = RTE_FLOW_RESTORE_INFO_TUNNEL |
+		      RTE_FLOW_RESTORE_INFO_GROUP_ID |
+		      RTE_FLOW_RESTORE_INFO_ENCAPSULATED;
+
+	return 0;
+
+err:
+	return rte_flow_error_set(err, EINVAL,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				  "failed to get restore info");
+}
+
 static const struct rte_flow_ops mlx5_flow_ops = {
 	.validate = mlx5_flow_validate,
 	.create = mlx5_flow_create,
@@ -229,6 +407,11 @@ static const struct rte_flow_ops mlx5_flow_ops = {
 	.query = mlx5_flow_query,
 	.dev_dump = mlx5_flow_dev_dump,
 	.get_aged_flows = mlx5_flow_get_aged_flows,
+	.tunnel_decap_set = mlx5_flow_tunnel_decap_set,
+	.tunnel_match = mlx5_flow_tunnel_match,
+	.action_release = mlx5_flow_action_release,
+	.item_release = mlx5_flow_item_release,
+	.get_restore_info = mlx5_flow_tunnel_get_restore_info,
 };
 
 /* Convert FDIR request to Generic flow. */
@@ -3524,6 +3707,136 @@ flow_hairpin_split(struct rte_eth_dev *dev,
 	return 0;
 }
 
+__extension__
+union tunnel_offload_mark {
+	uint32_t val;
+	struct {
+		uint32_t app_reserve:8;
+		uint32_t table_id:15;
+		uint32_t transfer:1;
+		uint32_t _unused_:8;
+	};
+};
+
+struct tunnel_default_miss_ctx {
+	uint16_t *queue;
+	__extension__
+	union {
+		struct rte_flow_action_rss action_rss;
+		struct rte_flow_action_queue miss_queue;
+		struct rte_flow_action_jump miss_jump;
+		uint8_t raw[0];
+	};
+};
+
+static int
+flow_tunnel_add_default_miss(struct rte_eth_dev *dev,
+			     struct rte_flow *flow,
+			     const struct rte_flow_attr *attr,
+			     const struct rte_flow_action *app_actions,
+			     uint32_t flow_idx,
+			     struct tunnel_default_miss_ctx *ctx,
+			     struct rte_flow_error *error)
+{
+	struct mlx5_flow *dev_flow;
+	struct rte_flow_attr miss_attr = *attr;
+	const struct mlx5_flow_tunnel *tunnel = app_actions[0].conf;
+	const struct rte_flow_item miss_items[2] = {
+		{
+			.type = RTE_FLOW_ITEM_TYPE_ETH,
+			.spec = NULL,
+			.last = NULL,
+			.mask = NULL
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+			.spec = NULL,
+			.last = NULL,
+			.mask = NULL
+		}
+	};
+	union tunnel_offload_mark mark_id;
+	struct rte_flow_action_mark miss_mark;
+	struct rte_flow_action miss_actions[3] = {
+		[0] = { .type = RTE_FLOW_ACTION_TYPE_MARK, .conf = &miss_mark },
+		[2] = { .type = RTE_FLOW_ACTION_TYPE_END,  .conf = NULL }
+	};
+	const struct rte_flow_action_jump *jump_data;
+	uint32_t i, flow_table = 0; /* prevent compilation warning */
+	int ret;
+
+	if (!attr->transfer) {
+		struct mlx5_priv *priv = dev->data->dev_private;
+		uint32_t q_size;
+
+		miss_actions[1].type = RTE_FLOW_ACTION_TYPE_RSS;
+		q_size = priv->reta_idx_n * sizeof(ctx->queue[0]);
+		ctx->queue = mlx5_malloc(MLX5_MEM_SYS | MLX5_MEM_ZERO, q_size,
+					 0, SOCKET_ID_ANY);
+		if (!ctx->queue)
+			return rte_flow_error_set
+				(error, ENOMEM,
+				RTE_FLOW_ERROR_TYPE_ACTION_CONF,
+				NULL, "invalid default miss RSS");
+		ctx->action_rss.func = RTE_ETH_HASH_FUNCTION_DEFAULT,
+		ctx->action_rss.level = 0,
+		ctx->action_rss.types = priv->rss_conf.rss_hf,
+		ctx->action_rss.key_len = priv->rss_conf.rss_key_len,
+		ctx->action_rss.queue_num = priv->reta_idx_n,
+		ctx->action_rss.key = priv->rss_conf.rss_key,
+		ctx->action_rss.queue = ctx->queue;
+		if (!priv->reta_idx_n || !priv->rxqs_n)
+			return rte_flow_error_set
+				(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION_CONF,
+				NULL, "invalid port configuration");
+		if (!(dev->data->dev_conf.rxmode.mq_mode & ETH_MQ_RX_RSS_FLAG))
+			ctx->action_rss.types = 0;
+		for (i = 0; i != priv->reta_idx_n; ++i)
+			ctx->queue[i] = (*priv->reta_idx)[i];
+	} else {
+		miss_actions[1].type = RTE_FLOW_ACTION_TYPE_JUMP;
+		ctx->miss_jump.group = MLX5_TNL_MISS_FDB_JUMP_GRP;
+	}
+	miss_actions[1].conf = (typeof(miss_actions[1].conf))ctx->raw;
+	for (; app_actions->type != RTE_FLOW_ACTION_TYPE_JUMP; app_actions++);
+	jump_data = app_actions->conf;
+	miss_attr.priority = MLX5_TNL_MISS_RULE_PRIORITY;
+	miss_attr.group = jump_data->group;
+	ret = tunnel_flow_group_to_flow_table(dev, tunnel, jump_data->group,
+					      &flow_table, error);
+	if (ret)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION_CONF,
+					  NULL, "invalid tunnel id");
+	mark_id.app_reserve = 0;
+	mark_id.table_id = tunnel_flow_tbl_to_id(flow_table);
+	mark_id.transfer = !!attr->transfer;
+	mark_id._unused_ = 0;
+	miss_mark.id = mark_id.val;
+	dev_flow = flow_drv_prepare(dev, flow, &miss_attr,
+				    miss_items, miss_actions, flow_idx, error);
+	if (!dev_flow)
+		return -rte_errno;
+	dev_flow->flow = flow;
+	dev_flow->external = true;
+	dev_flow->tunnel = tunnel;
+	/* Subflow object was created, we must include one in the list. */
+	SILIST_INSERT(&flow->dev_handles, dev_flow->handle_idx,
+		      dev_flow->handle, next);
+	DRV_LOG(DEBUG,
+		"port %u tunnel type=%d id=%u miss rule priority=%u group=%u",
+		dev->data->port_id, tunnel->app_tunnel.type,
+		tunnel->tunnel_id, miss_attr.priority, miss_attr.group);
+	ret = flow_drv_translate(dev, dev_flow, &miss_attr, miss_items,
+				  miss_actions, error);
+	if (!ret)
+		ret = flow_mreg_update_copy_table(dev, flow, miss_actions,
+						  error);
+
+	return ret;
+}
+
 /**
  * The last stage of splitting chain, just creates the subflow
  * without any modification.
@@ -4296,6 +4609,27 @@ flow_create_split_outer(struct rte_eth_dev *dev,
 	return ret;
 }
 
+static struct mlx5_flow_tunnel *
+flow_tunnel_from_rule(struct rte_eth_dev *dev,
+		      const struct rte_flow_attr *attr,
+		      const struct rte_flow_item items[],
+		      const struct rte_flow_action actions[])
+{
+	struct mlx5_flow_tunnel *tunnel;
+
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wcast-qual"
+	if (is_flow_tunnel_match_rule(dev, attr, items, actions))
+		tunnel = (struct mlx5_flow_tunnel *)items[0].spec;
+	else if (is_flow_tunnel_steer_rule(dev, attr, items, actions))
+		tunnel = (struct mlx5_flow_tunnel *)actions[0].conf;
+	else
+		tunnel = NULL;
+#pragma GCC diagnostic pop
+
+	return tunnel;
+}
+
 /**
  * Create a flow and add it to @p list.
  *
@@ -4356,6 +4690,8 @@ flow_list_create(struct rte_eth_dev *dev, uint32_t *list,
 	int hairpin_flow;
 	uint32_t hairpin_id = 0;
 	struct rte_flow_attr attr_tx = { .priority = 0 };
+	struct mlx5_flow_tunnel *tunnel;
+	struct tunnel_default_miss_ctx default_miss_ctx = { 0, };
 	int ret;
 
 	hairpin_flow = flow_check_hairpin_split(dev, attr, actions);
@@ -4430,6 +4766,19 @@ flow_list_create(struct rte_eth_dev *dev, uint32_t *list,
 					      error);
 		if (ret < 0)
 			goto error;
+		if (is_flow_tunnel_steer_rule(dev, attr,
+					      buf->entry[i].pattern,
+					      p_actions_rx)) {
+			ret = flow_tunnel_add_default_miss(dev, flow, attr,
+							   p_actions_rx,
+							   idx,
+							   &default_miss_ctx,
+							   error);
+			if (ret < 0) {
+				mlx5_free(default_miss_ctx.queue);
+				goto error;
+			}
+		}
 	}
 	/* Create the tx flow. */
 	if (hairpin_flow) {
@@ -4484,6 +4833,13 @@ flow_list_create(struct rte_eth_dev *dev, uint32_t *list,
 	priv->flow_idx = priv->flow_nested_idx;
 	if (priv->flow_nested_idx)
 		priv->flow_nested_idx = 0;
+	tunnel = flow_tunnel_from_rule(dev, attr, items, actions);
+	if (tunnel) {
+		flow->tunnel = 1;
+		flow->tunnel_id = tunnel->tunnel_id;
+		__atomic_add_fetch(&tunnel->refctn, 1, __ATOMIC_RELAXED);
+		mlx5_free(default_miss_ctx.queue);
+	}
 	return idx;
 error:
 	MLX5_ASSERT(flow);
@@ -4603,6 +4959,7 @@ mlx5_flow_create(struct rte_eth_dev *dev,
 				   "port not started");
 		return NULL;
 	}
+
 	return (void *)(uintptr_t)flow_list_create(dev, &priv->flows,
 				  attr, items, actions, true, error);
 }
@@ -4657,6 +5014,13 @@ flow_list_destroy(struct rte_eth_dev *dev, uint32_t *list,
 		}
 	}
 	mlx5_ipool_free(priv->sh->ipool[MLX5_IPOOL_RTE_FLOW], flow_idx);
+	if (flow->tunnel) {
+		struct mlx5_flow_tunnel *tunnel;
+		tunnel = mlx5_find_tunnel_id(dev, flow->tunnel_id);
+		RTE_VERIFY(tunnel);
+		if (!__atomic_sub_fetch(&tunnel->refctn, 1, __ATOMIC_RELAXED))
+			mlx5_flow_tunnel_free(dev, tunnel);
+	}
 }
 
 /**
@@ -6131,19 +6495,122 @@ mlx5_flow_async_pool_query_handle(struct mlx5_dev_ctx_shared *sh,
 	sh->cmng.pending_queries--;
 }
 
+static const struct mlx5_flow_tbl_data_entry  *
+tunnel_mark_decode(struct rte_eth_dev *dev, uint32_t mark)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_dev_ctx_shared *sh = priv->sh;
+	struct mlx5_hlist_entry *he;
+	union tunnel_offload_mark mbits = { .val = mark };
+	union mlx5_flow_tbl_key table_key = {
+		{
+			.table_id = tunnel_id_to_flow_tbl(mbits.table_id),
+			.reserved = 0,
+			.domain = !!mbits.transfer,
+			.direction = 0,
+		}
+	};
+	he = mlx5_hlist_lookup(sh->flow_tbls, table_key.v64);
+	return he ?
+	       container_of(he, struct mlx5_flow_tbl_data_entry, entry) : NULL;
+}
+
+static uint32_t
+tunnel_flow_group_to_flow_table(struct rte_eth_dev *dev,
+				const struct mlx5_flow_tunnel *tunnel,
+				uint32_t group, uint32_t *table,
+				struct rte_flow_error *error)
+{
+	struct mlx5_hlist_entry *he;
+	struct tunnel_tbl_entry *tte;
+	union tunnel_tbl_key key = {
+		.tunnel_id = tunnel ? tunnel->tunnel_id : 0,
+		.group = group
+	};
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_hlist *group_hash;
+
+	group_hash = tunnel ? tunnel->groups : thub->groups;
+	he = mlx5_hlist_lookup(group_hash, key.val);
+	if (!he) {
+		int ret;
+		tte = mlx5_malloc(MLX5_MEM_SYS | MLX5_MEM_ZERO,
+				  sizeof(*tte), 0,
+				  SOCKET_ID_ANY);
+		if (!tte)
+			goto err;
+		tte->hash.key = key.val;
+		ret = mlx5_flow_id_get(thub->table_ids, &tte->flow_table);
+		if (ret) {
+			mlx5_free(tte);
+			goto err;
+		}
+		tte->flow_table = tunnel_id_to_flow_tbl(tte->flow_table);
+		mlx5_hlist_insert(group_hash, &tte->hash);
+	} else {
+		tte = container_of(he, typeof(*tte), hash);
+	}
+	*table = tte->flow_table;
+	DRV_LOG(DEBUG, "port %u tunnel %u group=%#x table=%#x",
+		dev->data->port_id, key.tunnel_id, group, *table);
+	return 0;
+
+err:
+	return rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
+				  NULL, "tunnel group index not supported");
+}
+
+static int
+flow_group_to_table(uint32_t port_id, uint32_t group, uint32_t *table,
+		    struct flow_grp_info grp_info, struct rte_flow_error *error)
+{
+	if (grp_info.transfer && grp_info.external && grp_info.fdb_def_rule) {
+		if (group == UINT32_MAX)
+			return rte_flow_error_set
+						(error, EINVAL,
+						 RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
+						 NULL,
+						 "group index not supported");
+		*table = group + 1;
+	} else {
+		*table = group;
+	}
+	DRV_LOG(DEBUG, "port %u group=%#x table=%#x", port_id, group, *table);
+	return 0;
+}
+
 /**
  * Translate the rte_flow group index to HW table value.
  *
- * @param[in] attributes
- *   Pointer to flow attributes
- * @param[in] external
- *   Value is part of flow rule created by request external to PMD.
+ * If tunnel offload is disabled, all group ids are converted to flow table
+ * ids using the standard method.
+ * If tunnel offload is enabled, group id can be converted using the
+ * standard or tunnel conversion method. Group conversion method
+ * selection depends on flags in `grp_info` parameter:
+ * - Internal (grp_info.external == 0) groups conversion uses the
+ *   standard method.
+ * - Group ids in JUMP action converted with the tunnel conversion.
+ * - Group id in rule attribute conversion depends on a rule type and
+ *   group id value:
+ *   ** non zero group attributes converted with the tunnel method
+ *   ** zero group attribute in non-tunnel rule is converted using the
+ *      standard method - there's only one root table
+ *   ** zero group attribute in steer tunnel rule is converted with the
+ *      standard method - single root table
+ *   ** zero group attribute in match tunnel rule is a special OvS
+ *      case: that value is used for portability reasons. That group
+ *      id is converted with the tunnel conversion method.
+ *
+ * @param[in] dev
+ *   Port device
+ * @param[in] tunnel
+ *   PMD tunnel offload object
  * @param[in] group
  *   rte_flow group index value.
- * @param[out] fdb_def_rule
- *   Whether fdb jump to table 1 is configured.
  * @param[out] table
  *   HW table value.
+ * @param[in] grp_info
+ *   flags used for conversion
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -6151,22 +6618,34 @@ mlx5_flow_async_pool_query_handle(struct mlx5_dev_ctx_shared *sh,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_flow_group_to_table(const struct rte_flow_attr *attributes, bool external,
-			 uint32_t group, bool fdb_def_rule, uint32_t *table,
+mlx5_flow_group_to_table(struct rte_eth_dev *dev,
+			 const struct mlx5_flow_tunnel *tunnel,
+			 uint32_t group, uint32_t *table,
+			 struct flow_grp_info grp_info,
 			 struct rte_flow_error *error)
 {
-	if (attributes->transfer && external && fdb_def_rule) {
-		if (group == UINT32_MAX)
-			return rte_flow_error_set
-						(error, EINVAL,
-						 RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
-						 NULL,
-						 "group index not supported");
-		*table = group + 1;
+	int ret;
+	bool standard_translation;
+
+	if (is_tunnel_offload_active(dev)) {
+		standard_translation = !grp_info.external ||
+					grp_info.std_tbl_fix;
 	} else {
-		*table = group;
+		standard_translation = true;
 	}
-	return 0;
+	DRV_LOG(DEBUG,
+		"port %u group=%#x transfer=%d external=%d fdb_def_rule=%d translate=%s",
+		dev->data->port_id, group, grp_info.transfer,
+		grp_info.external, grp_info.fdb_def_rule,
+		standard_translation ? "STANDARD" : "TUNNEL");
+	if (standard_translation)
+		ret = flow_group_to_table(dev->data->port_id, group, table,
+					  grp_info, error);
+	else
+		ret = tunnel_flow_group_to_flow_table(dev, tunnel, group,
+						      table, error);
+
+	return ret;
 }
 
 /**
@@ -6305,3 +6784,166 @@ mlx5_flow_get_aged_flows(struct rte_eth_dev *dev, void **contexts,
 		 dev->data->port_id);
 	return -ENOTSUP;
 }
+
+static void
+mlx5_flow_tunnel_free(struct rte_eth_dev *dev,
+		      struct mlx5_flow_tunnel *tunnel)
+{
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_id_pool *id_pool = thub->tunnel_ids;
+
+	DRV_LOG(DEBUG, "port %u release pmd tunnel id=0x%x",
+		dev->data->port_id, tunnel->tunnel_id);
+	RTE_VERIFY(!__atomic_load_n(&tunnel->refctn, __ATOMIC_RELAXED));
+	LIST_REMOVE(tunnel, chain);
+	mlx5_flow_id_release(id_pool, tunnel->tunnel_id);
+	mlx5_hlist_destroy(tunnel->groups, NULL, NULL);
+	mlx5_free(tunnel);
+}
+
+static struct mlx5_flow_tunnel *
+mlx5_find_tunnel_id(struct rte_eth_dev *dev, uint32_t id)
+{
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_tunnel *tun;
+
+	LIST_FOREACH(tun, &thub->tunnels, chain) {
+		if (tun->tunnel_id == id)
+			break;
+	}
+
+	return tun;
+}
+
+static struct mlx5_flow_tunnel *
+mlx5_flow_tunnel_allocate(struct rte_eth_dev *dev,
+			  const struct rte_flow_tunnel *app_tunnel)
+{
+	int ret;
+	struct mlx5_flow_tunnel *tunnel;
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_id_pool *id_pool = thub->tunnel_ids;
+	uint32_t id;
+
+	ret = mlx5_flow_id_get(id_pool, &id);
+	if (ret)
+		return NULL;
+	/**
+	 * mlx5 flow tunnel is an auxiliary data structure
+	 * It's not part of IO. No need to allocate it from
+	 * huge pages pools dedicated for IO
+	 */
+	tunnel = mlx5_malloc(MLX5_MEM_SYS | MLX5_MEM_ZERO, sizeof(*tunnel),
+			     0, SOCKET_ID_ANY);
+	if (!tunnel) {
+		mlx5_flow_id_release(id_pool, id);
+		return NULL;
+	}
+	tunnel->groups = mlx5_hlist_create("tunnel groups", 1024);
+	if (!tunnel->groups) {
+		mlx5_flow_id_release(id_pool, id);
+		mlx5_free(tunnel);
+		return NULL;
+	}
+	/* initiate new PMD tunnel */
+	memcpy(&tunnel->app_tunnel, app_tunnel, sizeof(*app_tunnel));
+	tunnel->tunnel_id = id;
+	tunnel->action.type = MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET;
+	tunnel->action.conf = tunnel;
+	tunnel->item.type = MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL;
+	tunnel->item.spec = tunnel;
+	tunnel->item.last = NULL;
+	tunnel->item.mask = NULL;
+
+	DRV_LOG(DEBUG, "port %u new pmd tunnel id=0x%x",
+		dev->data->port_id, tunnel->tunnel_id);
+
+	return tunnel;
+}
+
+int
+mlx5_get_flow_tunnel(struct rte_eth_dev *dev,
+		     const struct rte_flow_tunnel *app_tunnel,
+		     struct mlx5_flow_tunnel **tunnel)
+{
+	int ret = 0;
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_tunnel *tun;
+
+	LIST_FOREACH(tun, &thub->tunnels, chain) {
+		if (!memcmp(app_tunnel, &tun->app_tunnel,
+			    sizeof(*app_tunnel))) {
+			*tunnel = tun;
+			ret = 0;
+			break;
+		}
+	}
+	if (!tun) {
+		tun = mlx5_flow_tunnel_allocate(dev, app_tunnel);
+		if (tun) {
+			LIST_INSERT_HEAD(&thub->tunnels, tun, chain);
+			*tunnel = tun;
+		} else {
+			ret = -ENOMEM;
+		}
+	}
+	if (tun)
+		__atomic_add_fetch(&tun->refctn, 1, __ATOMIC_RELAXED);
+
+	return ret;
+}
+
+void mlx5_release_tunnel_hub(struct mlx5_dev_ctx_shared *sh, uint16_t port_id)
+{
+	struct mlx5_flow_tunnel_hub *thub = sh->tunnel_hub;
+
+	if (!thub)
+		return;
+	if (!LIST_EMPTY(&thub->tunnels))
+		DRV_LOG(WARNING, "port %u tunnels present", port_id);
+	mlx5_flow_id_pool_release(thub->tunnel_ids);
+	mlx5_flow_id_pool_release(thub->table_ids);
+	mlx5_hlist_destroy(thub->groups, NULL, NULL);
+	mlx5_free(thub);
+}
+
+int mlx5_alloc_tunnel_hub(struct mlx5_dev_ctx_shared *sh)
+{
+	int err;
+	struct mlx5_flow_tunnel_hub *thub;
+
+	thub = mlx5_malloc(MLX5_MEM_SYS | MLX5_MEM_ZERO, sizeof(*thub),
+			   0, SOCKET_ID_ANY);
+	if (!thub)
+		return -ENOMEM;
+	LIST_INIT(&thub->tunnels);
+	thub->tunnel_ids = mlx5_flow_id_pool_alloc(MLX5_MAX_TUNNELS);
+	if (!thub->tunnel_ids) {
+		err = -rte_errno;
+		goto err;
+	}
+	thub->table_ids = mlx5_flow_id_pool_alloc(MLX5_MAX_TABLES);
+	if (!thub->table_ids) {
+		err = -rte_errno;
+		goto err;
+	}
+	thub->groups = mlx5_hlist_create("flow groups", MLX5_MAX_TABLES);
+	if (!thub->groups) {
+		err = -rte_errno;
+		goto err;
+	}
+	sh->tunnel_hub = thub;
+
+	return 0;
+
+err:
+	if (thub->groups)
+		mlx5_hlist_destroy(thub->groups, NULL, NULL);
+	if (thub->table_ids)
+		mlx5_flow_id_pool_release(thub->table_ids);
+	if (thub->tunnel_ids)
+		mlx5_flow_id_pool_release(thub->tunnel_ids);
+	if (thub)
+		mlx5_free(thub);
+	return err;
+}
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 279daf21f5..8691db16ab 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -26,6 +26,7 @@ enum mlx5_rte_flow_item_type {
 	MLX5_RTE_FLOW_ITEM_TYPE_TAG,
 	MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
 	MLX5_RTE_FLOW_ITEM_TYPE_VLAN,
+	MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL,
 };
 
 /* Private (internal) rte flow actions. */
@@ -35,6 +36,7 @@ enum mlx5_rte_flow_action_type {
 	MLX5_RTE_FLOW_ACTION_TYPE_MARK,
 	MLX5_RTE_FLOW_ACTION_TYPE_COPY_MREG,
 	MLX5_RTE_FLOW_ACTION_TYPE_DEFAULT_MISS,
+	MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET,
 };
 
 /* Matches on selected register. */
@@ -196,6 +198,8 @@ enum mlx5_feature_name {
 #define MLX5_FLOW_ACTION_SET_IPV6_DSCP (1ull << 33)
 #define MLX5_FLOW_ACTION_AGE (1ull << 34)
 #define MLX5_FLOW_ACTION_DEFAULT_MISS (1ull << 35)
+#define MLX5_FLOW_ACTION_TUNNEL_SET (1ull << 36)
+#define MLX5_FLOW_ACTION_TUNNEL_MATCH (1ull << 37)
 
 #define MLX5_FLOW_FATE_ACTIONS \
 	(MLX5_FLOW_ACTION_DROP | MLX5_FLOW_ACTION_QUEUE | \
@@ -517,6 +521,10 @@ struct mlx5_flow_tbl_data_entry {
 	struct mlx5_flow_dv_jump_tbl_resource jump;
 	/**< jump resource, at most one for each table created. */
 	uint32_t idx; /**< index for the indexed mempool. */
+	/**< tunnel offload */
+	const struct mlx5_flow_tunnel *tunnel;
+	uint32_t group_id;
+	bool external;
 };
 
 /* Verbs specification header. */
@@ -695,6 +703,7 @@ struct mlx5_flow {
 	};
 	struct mlx5_flow_handle *handle;
 	uint32_t handle_idx; /* Index of the mlx5 flow handle memory. */
+	const struct mlx5_flow_tunnel *tunnel;
 };
 
 /* Flow meter state. */
@@ -840,6 +849,112 @@ struct mlx5_fdir_flow {
 
 #define HAIRPIN_FLOW_ID_BITS 28
 
+#define MLX5_MAX_TUNNELS 256
+#define MLX5_TNL_MISS_RULE_PRIORITY 3
+#define MLX5_TNL_MISS_FDB_JUMP_GRP  0xfaac
+
+/*
+ * When tunnel offload is active, all JUMP group ids are converted
+ * using the same method. That conversion is applied both to tunnel and
+ * regular rule types.
+ * Group ids used in tunnel rules are relative to their tunnel (!).
+ * An application can create a number of steer rules, using the same
+ * tunnel, with a different group id in each rule.
+ * Each tunnel stores its groups internally in the PMD tunnel object.
+ * Groups used in regular rules do not belong to any tunnel and are
+ * stored in the tunnel hub.
+ */
+
+struct mlx5_flow_tunnel {
+	LIST_ENTRY(mlx5_flow_tunnel) chain;
+	struct rte_flow_tunnel app_tunnel;	/** app tunnel copy */
+	uint32_t tunnel_id;			/** unique tunnel ID */
+	uint32_t refctn;
+	struct rte_flow_action action;
+	struct rte_flow_item item;
+	struct mlx5_hlist *groups;		/** tunnel groups */
+};
+
+/** PMD tunnel related context */
+struct mlx5_flow_tunnel_hub {
+	LIST_HEAD(, mlx5_flow_tunnel) tunnels;
+	struct mlx5_flow_id_pool *tunnel_ids;
+	struct mlx5_flow_id_pool *table_ids;
+	struct mlx5_hlist *groups;		/** non tunnel groups */
+};
+
+/* convert jump group to flow table ID in tunnel rules */
+struct tunnel_tbl_entry {
+	struct mlx5_hlist_entry hash;
+	uint32_t flow_table;
+};
+
+static inline uint32_t
+tunnel_id_to_flow_tbl(uint32_t id)
+{
+	return id | (1u << 16);
+}
+
+static inline uint32_t
+tunnel_flow_tbl_to_id(uint32_t flow_tbl)
+{
+	return flow_tbl & ~(1u << 16);
+}
+
+union tunnel_tbl_key {
+	uint64_t val;
+	struct {
+		uint32_t tunnel_id;
+		uint32_t group;
+	};
+};
+
+static inline struct mlx5_flow_tunnel_hub *
+mlx5_tunnel_hub(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	return priv->sh->tunnel_hub;
+}
+
+static inline bool
+is_tunnel_offload_active(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	return !!priv->config.dv_miss_info;
+}
+
+static inline bool
+is_flow_tunnel_match_rule(__rte_unused struct rte_eth_dev *dev,
+			  __rte_unused const struct rte_flow_attr *attr,
+			  __rte_unused const struct rte_flow_item items[],
+			  __rte_unused const struct rte_flow_action actions[])
+{
+	return (items[0].type == (typeof(items[0].type))
+				 MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL);
+}
+
+static inline bool
+is_flow_tunnel_steer_rule(__rte_unused struct rte_eth_dev *dev,
+			  __rte_unused const struct rte_flow_attr *attr,
+			  __rte_unused const struct rte_flow_item items[],
+			  __rte_unused const struct rte_flow_action actions[])
+{
+	return (actions[0].type == (typeof(actions[0].type))
+				   MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET);
+}
+
+static inline const struct mlx5_flow_tunnel *
+flow_actions_to_tunnel(const struct rte_flow_action actions[])
+{
+	return actions[0].conf;
+}
+
+static inline const struct mlx5_flow_tunnel *
+flow_items_to_tunnel(const struct rte_flow_item items[])
+{
+	return items[0].spec;
+}
+
 /* Flow structure. */
 struct rte_flow {
 	ILIST_ENTRY(uint32_t)next; /**< Index to the next flow structure. */
@@ -847,12 +962,14 @@ struct rte_flow {
 	/**< Device flow handles that are part of the flow. */
 	uint32_t drv_type:2; /**< Driver type. */
 	uint32_t fdir:1; /**< Identifier of associated FDIR if any. */
+	uint32_t tunnel:1;
 	uint32_t hairpin_flow_id:HAIRPIN_FLOW_ID_BITS;
 	/**< The flow id used for hairpin. */
 	uint32_t copy_applied:1; /**< The MARK copy Flow os applied. */
 	uint32_t rix_mreg_copy;
 	/**< Index to metadata register copy table resource. */
 	uint32_t counter; /**< Holds flow counter. */
+	uint32_t tunnel_id;  /**< Tunnel id */
 	uint16_t meter; /**< Holds flow meter id. */
 } __rte_packed;
 
@@ -935,9 +1052,54 @@ void mlx5_flow_id_pool_release(struct mlx5_flow_id_pool *pool);
 uint32_t mlx5_flow_id_get(struct mlx5_flow_id_pool *pool, uint32_t *id);
 uint32_t mlx5_flow_id_release(struct mlx5_flow_id_pool *pool,
 			      uint32_t id);
-int mlx5_flow_group_to_table(const struct rte_flow_attr *attributes,
-			     bool external, uint32_t group, bool fdb_def_rule,
-			     uint32_t *table, struct rte_flow_error *error);
+__extension__
+struct flow_grp_info {
+	uint64_t external:1;
+	uint64_t transfer:1;
+	uint64_t fdb_def_rule:1;
+	/* force standard group translation */
+	uint64_t std_tbl_fix:1;
+};
+
+static inline bool
+tunnel_use_standard_attr_group_translate
+		    (struct rte_eth_dev *dev,
+		     const struct mlx5_flow_tunnel *tunnel,
+		     const struct rte_flow_attr *attr,
+		     const struct rte_flow_item items[],
+		     const struct rte_flow_action actions[])
+{
+	bool verdict;
+
+	if (!is_tunnel_offload_active(dev))
+		/* no tunnel offload API */
+		verdict = true;
+	else if (tunnel) {
+		/*
+		 * OvS will use jump to group 0 in tunnel steer rule.
+		 * If a tunnel steer rule starts from group 0 (attr.group == 0)
+		 * that group must be translated with the standard method.
+		 * attr.group == 0 in a tunnel match rule is translated with
+		 * the tunnel method.
+		 */
+		verdict = !attr->group &&
+			  is_flow_tunnel_steer_rule(dev, attr, items, actions);
+	} else {
+		/*
+		 * non-tunnel group translation uses standard method for
+		 * root group only: attr.group == 0
+		 */
+		verdict = !attr->group;
+	}
+
+	return verdict;
+}
+
+int mlx5_flow_group_to_table(struct rte_eth_dev *dev,
+			     const struct mlx5_flow_tunnel *tunnel,
+			     uint32_t group, uint32_t *table,
+			     struct flow_grp_info grp_info,
+			     struct rte_flow_error *error);
 uint64_t mlx5_flow_hashfields_adjust(struct mlx5_flow_rss_desc *rss_desc,
 				     int tunnel, uint64_t layer_types,
 				     uint64_t hash_fields);
@@ -1069,4 +1231,9 @@ int mlx5_flow_destroy_policer_rules(struct rte_eth_dev *dev,
 				    const struct rte_flow_attr *attr);
 int mlx5_flow_meter_flush(struct rte_eth_dev *dev,
 			  struct rte_mtr_error *error);
+int mlx5_get_flow_tunnel(struct rte_eth_dev *dev,
+			 const struct rte_flow_tunnel *app_tunnel,
+			 struct mlx5_flow_tunnel **tunnel);
+void mlx5_release_tunnel_hub(struct mlx5_dev_ctx_shared *sh, uint16_t port_id);
+int mlx5_alloc_tunnel_hub(struct mlx5_dev_ctx_shared *sh);
 #endif /* RTE_PMD_MLX5_FLOW_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 79fdf34c0e..380fb0fb09 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -3702,14 +3702,21 @@ flow_dv_validate_action_modify_ttl(const uint64_t action_flags,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-flow_dv_validate_action_jump(const struct rte_flow_action *action,
+flow_dv_validate_action_jump(struct rte_eth_dev *dev,
+			     const struct mlx5_flow_tunnel *tunnel,
+			     const struct rte_flow_action *action,
 			     uint64_t action_flags,
 			     const struct rte_flow_attr *attributes,
 			     bool external, struct rte_flow_error *error)
 {
 	uint32_t target_group, table;
 	int ret = 0;
-
+	struct flow_grp_info grp_info = {
+		.external = !!external,
+		.transfer = !!attributes->transfer,
+		.fdb_def_rule = 1,
+		.std_tbl_fix = 0
+	};
 	if (action_flags & (MLX5_FLOW_FATE_ACTIONS |
 			    MLX5_FLOW_FATE_ESWITCH_ACTIONS))
 		return rte_flow_error_set(error, EINVAL,
@@ -3726,11 +3733,13 @@ flow_dv_validate_action_jump(const struct rte_flow_action *action,
 					  NULL, "action configuration not set");
 	target_group =
 		((const struct rte_flow_action_jump *)action->conf)->group;
-	ret = mlx5_flow_group_to_table(attributes, external, target_group,
-				       true, &table, error);
+	ret = mlx5_flow_group_to_table(dev, tunnel, target_group, &table,
+				       grp_info, error);
 	if (ret)
 		return ret;
-	if (attributes->group == target_group)
+	if (attributes->group == target_group &&
+	    !(action_flags & (MLX5_FLOW_ACTION_TUNNEL_SET |
+			      MLX5_FLOW_ACTION_TUNNEL_MATCH)))
 		return rte_flow_error_set(error, EINVAL,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "target group must be other than"
@@ -4982,8 +4991,9 @@ flow_dv_counter_release(struct rte_eth_dev *dev, uint32_t counter)
  */
 static int
 flow_dv_validate_attributes(struct rte_eth_dev *dev,
+			    const struct mlx5_flow_tunnel *tunnel,
 			    const struct rte_flow_attr *attributes,
-			    bool external __rte_unused,
+			    struct flow_grp_info grp_info,
 			    struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -4999,9 +5009,8 @@ flow_dv_validate_attributes(struct rte_eth_dev *dev,
 #else
 	uint32_t table = 0;
 
-	ret = mlx5_flow_group_to_table(attributes, external,
-				       attributes->group, !!priv->fdb_def_rule,
-				       &table, error);
+	ret = mlx5_flow_group_to_table(dev, tunnel, attributes->group, &table,
+				       grp_info, error);
 	if (ret)
 		return ret;
 	if (!table)
@@ -5123,10 +5132,28 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 	const struct rte_flow_item_vlan *vlan_m = NULL;
 	int16_t rw_act_num = 0;
 	uint64_t is_root;
+	const struct mlx5_flow_tunnel *tunnel;
+	struct flow_grp_info grp_info = {
+		.external = !!external,
+		.transfer = !!attr->transfer,
+		.fdb_def_rule = !!priv->fdb_def_rule,
+	};
 
 	if (items == NULL)
 		return -1;
-	ret = flow_dv_validate_attributes(dev, attr, external, error);
+	if (is_flow_tunnel_match_rule(dev, attr, items, actions)) {
+		tunnel = flow_items_to_tunnel(items);
+		action_flags |= MLX5_FLOW_ACTION_TUNNEL_MATCH |
+				MLX5_FLOW_ACTION_DECAP;
+	} else if (is_flow_tunnel_steer_rule(dev, attr, items, actions)) {
+		tunnel = flow_actions_to_tunnel(actions);
+		action_flags |= MLX5_FLOW_ACTION_TUNNEL_SET;
+	} else {
+		tunnel = NULL;
+	}
+	grp_info.std_tbl_fix = tunnel_use_standard_attr_group_translate
+				(dev, tunnel, attr, items, actions);
+	ret = flow_dv_validate_attributes(dev, tunnel, attr, grp_info, error);
 	if (ret < 0)
 		return ret;
 	is_root = (uint64_t)ret;
@@ -5139,6 +5166,15 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 						  RTE_FLOW_ERROR_TYPE_ITEM,
 						  NULL, "item not supported");
 		switch (type) {
+		case MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL:
+			if (items[0].type != (typeof(items[0].type))
+						MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL)
+				return rte_flow_error_set
+						(error, EINVAL,
+						RTE_FLOW_ERROR_TYPE_ITEM,
+						NULL, "MLX5 private items "
+						"must be the first");
+			break;
 		case RTE_FLOW_ITEM_TYPE_VOID:
 			break;
 		case RTE_FLOW_ITEM_TYPE_PORT_ID:
@@ -5703,7 +5739,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			rw_act_num += MLX5_ACT_NUM_MDF_TTL;
 			break;
 		case RTE_FLOW_ACTION_TYPE_JUMP:
-			ret = flow_dv_validate_action_jump(actions,
+			ret = flow_dv_validate_action_jump(dev, tunnel, actions,
 							   action_flags,
 							   attr, external,
 							   error);
@@ -5803,6 +5839,17 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			action_flags |= MLX5_FLOW_ACTION_SET_IPV6_DSCP;
 			rw_act_num += MLX5_ACT_NUM_SET_DSCP;
 			break;
+		case MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET:
+			if (actions[0].type != (typeof(actions[0].type))
+				MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET)
+				return rte_flow_error_set
+						(error, EINVAL,
+						RTE_FLOW_ERROR_TYPE_ACTION,
+						NULL, "MLX5 private action "
+						"must be the first");
+
+			action_flags |= MLX5_FLOW_ACTION_TUNNEL_SET;
+			break;
 		default:
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ACTION,
@@ -5810,6 +5857,54 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 						  "action not supported");
 		}
 	}
+	/*
+	 * Validate actions in flow rules
+	 * - Explicit decap action is prohibited by the tunnel offload API.
+	 * - Drop action in tunnel steer rule is prohibited by the API.
+	 * - Application cannot use MARK action because its value can mask
+	 *   the tunnel default miss notification.
+	 * - JUMP in tunnel match rule has no support in current PMD
+	 *   implementation.
+	 * - TAG & META are reserved for future uses.
+	 */
+	if (action_flags & MLX5_FLOW_ACTION_TUNNEL_SET) {
+		uint64_t bad_actions_mask = MLX5_FLOW_ACTION_DECAP    |
+					    MLX5_FLOW_ACTION_MARK     |
+					    MLX5_FLOW_ACTION_SET_TAG  |
+					    MLX5_FLOW_ACTION_SET_META |
+					    MLX5_FLOW_ACTION_DROP;
+
+		if (action_flags & bad_actions_mask)
+			return rte_flow_error_set
+					(error, EINVAL,
+					RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					"Invalid RTE action in tunnel "
+					"set decap rule");
+		if (!(action_flags & MLX5_FLOW_ACTION_JUMP))
+			return rte_flow_error_set
+					(error, EINVAL,
+					RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					"tunnel set decap rule must terminate "
+					"with JUMP");
+		if (!attr->ingress)
+			return rte_flow_error_set
+					(error, EINVAL,
+					RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					"tunnel flows for ingress traffic only");
+	}
+	if (action_flags & MLX5_FLOW_ACTION_TUNNEL_MATCH) {
+		uint64_t bad_actions_mask = MLX5_FLOW_ACTION_JUMP    |
+					    MLX5_FLOW_ACTION_MARK    |
+					    MLX5_FLOW_ACTION_SET_TAG |
+					    MLX5_FLOW_ACTION_SET_META;
+
+		if (action_flags & bad_actions_mask)
+			return rte_flow_error_set
+					(error, EINVAL,
+					RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					"Invalid RTE action in tunnel "
+					"set match rule");
+	}
 	/*
 	 * Validate the drop action mutual exclusion with other actions.
 	 * Drop action is mutually-exclusive with any other action, except for
@@ -7616,6 +7711,9 @@ static struct mlx5_flow_tbl_resource *
 flow_dv_tbl_resource_get(struct rte_eth_dev *dev,
 			 uint32_t table_id, uint8_t egress,
 			 uint8_t transfer,
+			 bool external,
+			 const struct mlx5_flow_tunnel *tunnel,
+			 uint32_t group_id,
 			 struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -7652,6 +7750,9 @@ flow_dv_tbl_resource_get(struct rte_eth_dev *dev,
 		return NULL;
 	}
 	tbl_data->idx = idx;
+	tbl_data->tunnel = tunnel;
+	tbl_data->group_id = group_id;
+	tbl_data->external = external;
 	tbl = &tbl_data->tbl;
 	pos = &tbl_data->entry;
 	if (transfer)
@@ -7715,6 +7816,41 @@ flow_dv_tbl_resource_release(struct rte_eth_dev *dev,
 
 		mlx5_flow_os_destroy_flow_tbl(tbl->obj);
 		tbl->obj = NULL;
+		if (is_tunnel_offload_active(dev) && tbl_data->external) {
+			struct mlx5_hlist_entry *he;
+			struct mlx5_hlist *tunnel_grp_hash;
+			struct mlx5_flow_tunnel_hub *thub =
+							mlx5_tunnel_hub(dev);
+			union tunnel_tbl_key tunnel_key = {
+				.tunnel_id = tbl_data->tunnel ?
+						tbl_data->tunnel->tunnel_id : 0,
+				.group = tbl_data->group_id
+			};
+			union mlx5_flow_tbl_key table_key = {
+				.v64 = pos->key
+			};
+			uint32_t table_id = table_key.table_id;
+
+			tunnel_grp_hash = tbl_data->tunnel ?
+						tbl_data->tunnel->groups :
+						thub->groups;
+			he = mlx5_hlist_lookup(tunnel_grp_hash, tunnel_key.val);
+			if (he) {
+				struct tunnel_tbl_entry *tte;
+				tte = container_of(he, typeof(*tte), hash);
+				MLX5_ASSERT(tte->flow_table == table_id);
+				mlx5_hlist_remove(tunnel_grp_hash, he);
+				mlx5_free(tte);
+			}
+			mlx5_flow_id_release(mlx5_tunnel_hub(dev)->table_ids,
+					     tunnel_flow_tbl_to_id(table_id));
+			DRV_LOG(DEBUG,
+				"port %u release table_id %#x tunnel %u group %u",
+				dev->data->port_id, table_id,
+				tbl_data->tunnel ?
+				tbl_data->tunnel->tunnel_id : 0,
+				tbl_data->group_id);
+		}
 		/* remove the entry from the hash list and free memory. */
 		mlx5_hlist_remove(sh->flow_tbls, pos);
 		mlx5_ipool_free(priv->sh->ipool[MLX5_IPOOL_JUMP],
@@ -7760,7 +7896,7 @@ flow_dv_matcher_register(struct rte_eth_dev *dev,
 	int ret;
 
 	tbl = flow_dv_tbl_resource_get(dev, key->table_id, key->direction,
-				       key->domain, error);
+				       key->domain, false, NULL, 0, error);
 	if (!tbl)
 		return -rte_errno;	/* No need to refill the error info */
 	tbl_data = container_of(tbl, struct mlx5_flow_tbl_data_entry, tbl);
@@ -8215,11 +8351,23 @@ __flow_dv_translate(struct rte_eth_dev *dev,
 	struct rte_vlan_hdr vlan = { 0 };
 	uint32_t table;
 	int ret = 0;
-
+	const struct mlx5_flow_tunnel *tunnel;
+	struct flow_grp_info grp_info = {
+		.external = !!dev_flow->external,
+		.transfer = !!attr->transfer,
+		.fdb_def_rule = !!priv->fdb_def_rule,
+	};
+	tunnel = is_flow_tunnel_match_rule(dev, attr, items, actions) ?
+		 flow_items_to_tunnel(items) :
+		 is_flow_tunnel_steer_rule(dev, attr, items, actions) ?
+		 flow_actions_to_tunnel(actions) :
+		 dev_flow->tunnel ? dev_flow->tunnel : NULL;
 	mhdr_res->ft_type = attr->egress ? MLX5DV_FLOW_TABLE_TYPE_NIC_TX :
 					   MLX5DV_FLOW_TABLE_TYPE_NIC_RX;
-	ret = mlx5_flow_group_to_table(attr, dev_flow->external, attr->group,
-				       !!priv->fdb_def_rule, &table, error);
+	grp_info.std_tbl_fix = tunnel_use_standard_attr_group_translate
+				(dev, tunnel, attr, items, actions);
+	ret = mlx5_flow_group_to_table(dev, tunnel, attr->group, &table,
+				       grp_info, error);
 	if (ret)
 		return ret;
 	dev_flow->dv.group = table;
@@ -8229,6 +8377,45 @@ __flow_dv_translate(struct rte_eth_dev *dev,
 		priority = dev_conf->flow_prio - 1;
 	/* number of actions must be set to 0 in case of dirty stack. */
 	mhdr_res->actions_num = 0;
+	if (is_flow_tunnel_match_rule(dev, attr, items, actions)) {
+		/*
+		 * do not add decap action if the match rule drops the packet;
+		 * HW rejects rules with decap & drop
+		 */
+		bool add_decap = true;
+		const struct rte_flow_action *ptr = actions;
+		struct mlx5_flow_tbl_resource *tbl;
+
+		for (; ptr->type != RTE_FLOW_ACTION_TYPE_END; ptr++) {
+			if (ptr->type == RTE_FLOW_ACTION_TYPE_DROP) {
+				add_decap = false;
+				break;
+			}
+		}
+		if (add_decap) {
+			if (flow_dv_create_action_l2_decap(dev, dev_flow,
+							   attr->transfer,
+							   error))
+				return -rte_errno;
+			dev_flow->dv.actions[actions_n++] =
+					dev_flow->dv.encap_decap->action;
+			action_flags |= MLX5_FLOW_ACTION_DECAP;
+		}
+		/*
+		 * bind table_id with <group, table> for tunnel match rule.
+		 * A tunnel set rule establishes that binding in the JUMP
+		 * action handler; it is required when an application creates
+		 * a tunnel match rule before the tunnel set rule.
+		 */
+		tbl = flow_dv_tbl_resource_get(dev, table, attr->egress,
+					       attr->transfer,
+					       !!dev_flow->external, tunnel,
+					       attr->group, error);
+		if (!tbl)
+			return rte_flow_error_set
+			       (error, EINVAL, RTE_FLOW_ERROR_TYPE_ACTION,
+			       actions, "cannot register tunnel group");
+	}
 	for (; !actions_end ; actions++) {
 		const struct rte_flow_action_queue *queue;
 		const struct rte_flow_action_rss *rss;
@@ -8249,6 +8436,9 @@ __flow_dv_translate(struct rte_eth_dev *dev,
 						  actions,
 						  "action not supported");
 		switch (action_type) {
+		case MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET:
+			action_flags |= MLX5_FLOW_ACTION_TUNNEL_SET;
+			break;
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			break;
 		case RTE_FLOW_ACTION_TYPE_PORT_ID:
@@ -8480,16 +8670,19 @@ __flow_dv_translate(struct rte_eth_dev *dev,
 			action_flags |= MLX5_FLOW_ACTION_DECAP;
 			break;
 		case RTE_FLOW_ACTION_TYPE_JUMP:
+			grp_info.std_tbl_fix = 0;
 			jump_data = action->conf;
-			ret = mlx5_flow_group_to_table(attr, dev_flow->external,
+			ret = mlx5_flow_group_to_table(dev, tunnel,
 						       jump_data->group,
-						       !!priv->fdb_def_rule,
-						       &table, error);
+						       &table,
+						       grp_info, error);
 			if (ret)
 				return ret;
-			tbl = flow_dv_tbl_resource_get(dev, table,
-						       attr->egress,
-						       attr->transfer, error);
+			tbl = flow_dv_tbl_resource_get(dev, table, attr->egress,
+						       attr->transfer,
+						       !!dev_flow->external,
+						       tunnel, jump_data->group,
+						       error);
 			if (!tbl)
 				return rte_flow_error_set
 						(error, errno,
@@ -9681,7 +9874,8 @@ flow_dv_prepare_mtr_tables(struct rte_eth_dev *dev,
 		dtb = &mtb->ingress;
 	/* Create the meter table with METER level. */
 	dtb->tbl = flow_dv_tbl_resource_get(dev, MLX5_FLOW_TABLE_LEVEL_METER,
-					    egress, transfer, &error);
+					    egress, transfer, false, NULL, 0,
+					    &error);
 	if (!dtb->tbl) {
 		DRV_LOG(ERR, "Failed to create meter policer table.");
 		return -1;
@@ -9689,7 +9883,8 @@ flow_dv_prepare_mtr_tables(struct rte_eth_dev *dev,
 	/* Create the meter suffix table with SUFFIX level. */
 	dtb->sfx_tbl = flow_dv_tbl_resource_get(dev,
 					    MLX5_FLOW_TABLE_LEVEL_SUFFIX,
-					    egress, transfer, &error);
+					    egress, transfer, false, NULL, 0,
+					    &error);
 	if (!dtb->sfx_tbl) {
 		DRV_LOG(ERR, "Failed to create meter suffix table.");
 		return -1;
-- 
2.28.0


^ permalink raw reply	[flat|nested] 95+ messages in thread

* [dpdk-dev] [PATCH v4 4/4] app/testpmd: add commands for tunnel offload API
  2020-10-04 13:50 ` [dpdk-dev] [PATCH v4 0/4] Tunnel Offload API Gregory Etelson
                     ` (2 preceding siblings ...)
  2020-10-04 13:50   ` [dpdk-dev] [PATCH v4 3/4] net/mlx5: implement tunnel offload API Gregory Etelson
@ 2020-10-04 13:50   ` Gregory Etelson
  2020-10-04 13:59     ` Ori Kam
  2020-10-14 17:25   ` [dpdk-dev] [PATCH v4 0/4] Tunnel Offload API Ferruh Yigit
  4 siblings, 1 reply; 95+ messages in thread
From: Gregory Etelson @ 2020-10-04 13:50 UTC (permalink / raw)
  To: dev
  Cc: getelson, matan, rasland, elibr, ozsh, ajit.khaparde, Ori Kam,
	Wenzhuo Lu, Beilei Xing, Bernard Iremonger

The Tunnel Offload API provides a hardware-independent, unified model
to offload tunneled traffic. Key model elements are:
 - apply matches to both outer and inner packet headers
   during the entire offload procedure;
 - restore the outer header of a partially offloaded packet;
 - the model is implemented as a set of helper functions.

Implementation details:

* Create application tunnel:
flow tunnel create <port> type <tunnel type>
On success, the command creates an application tunnel object and
returns the tunnel descriptor. The tunnel descriptor is used in
subsequent flow creation commands to reference the tunnel.

* Create tunnel steering flow rule:
The tunnel_set <tunnel descriptor> parameter is used with the
steering rule template.

* Create tunnel matching flow rule:
The tunnel_match <tunnel descriptor> parameter is used with the
matching rule template.

* If the tunnel steering rule was offloaded, the outer header of a
partially offloaded packet is restored after a miss.

Example:
test packet=
<Ether  dst=24:8a:07:8d:ae:d6 src=50:6b:4b:cc:fc:e2 type=IPv4 |
<IP  version=4 ihl=5 proto=udp src=1.1.1.1 dst=1.1.1.10 |
<UDP  sport=4789 dport=4789 len=58 chksum=0x7f7b |
<VXLAN  NextProtocol=Ethernet vni=0x0 |
<Ether  dst=24:aa:aa:aa:aa:d6 src=50:bb:bb:bb:bb:e2 type=IPv4 |
<IP  version=4 ihl=5 proto=icmp src=2.2.2.2 dst=2.2.2.200 |
<ICMP  type=echo-request code=0 chksum=0xf7ff id=0x0 seq=0x0 |>>>>>>>
>>> len(packet)
92

testpmd> flow flush 0
testpmd> port 0/queue 0: received 1 packets
src=50:6B:4B:CC:FC:E2 - dst=24:8A:07:8D:AE:D6 - type=0x0800 -
length=92

testpmd> flow tunnel 0 type vxlan
port 0: flow tunnel #1 type vxlan
testpmd> flow create 0 ingress group 0 tunnel_set 1
         pattern eth /ipv4 / udp dst is 4789 / vxlan / end
         actions  jump group 0 / end
Flow rule #0 created
testpmd> port 0/queue 0: received 1 packets
tunnel restore info: - vxlan tunnel - outer header present # <--
  src=50:6B:4B:CC:FC:E2 - dst=24:8A:07:8D:AE:D6 - type=0x0800 -
length=92

testpmd> flow create 0 ingress group 0 tunnel_match 1
         pattern eth / ipv4 / udp dst is 4789 / vxlan / eth / ipv4 /
         end
         actions set_mac_dst mac_addr 02:CA:FE:CA:FA:80 /
         queue index 0 / end
Flow rule #1 created
testpmd> port 0/queue 0: received 1 packets
  src=50:BB:BB:BB:BB:E2 - dst=02:CA:FE:CA:FA:80 - type=0x0800 -
length=42

* Destroy flow tunnel
flow tunnel destroy <port> id <tunnel id>

* Show existing flow tunnels
flow tunnel list <port>

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
---
v2:
* introduce testpmd support for tunnel offload API

v3:
* update flow tunnel commands
---
 app/test-pmd/cmdline_flow.c                 | 170 ++++++++++++-
 app/test-pmd/config.c                       | 253 +++++++++++++++++++-
 app/test-pmd/testpmd.c                      |   5 +-
 app/test-pmd/testpmd.h                      |  34 ++-
 app/test-pmd/util.c                         |  35 ++-
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  49 ++++
 6 files changed, 533 insertions(+), 13 deletions(-)

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 6e04d538ea..973dc274c0 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -69,6 +69,14 @@ enum index {
 	LIST,
 	AGED,
 	ISOLATE,
+	TUNNEL,
+
+	/* Tunnel arguments. */
+	TUNNEL_CREATE,
+	TUNNEL_CREATE_TYPE,
+	TUNNEL_LIST,
+	TUNNEL_DESTROY,
+	TUNNEL_DESTROY_ID,
 
 	/* Destroy arguments. */
 	DESTROY_RULE,
@@ -88,6 +96,8 @@ enum index {
 	INGRESS,
 	EGRESS,
 	TRANSFER,
+	TUNNEL_SET,
+	TUNNEL_MATCH,
 
 	/* Validate/create pattern. */
 	PATTERN,
@@ -655,6 +665,7 @@ struct buffer {
 	union {
 		struct {
 			struct rte_flow_attr attr;
+			struct tunnel_ops tunnel_ops;
 			struct rte_flow_item *pattern;
 			struct rte_flow_action *actions;
 			uint32_t pattern_n;
@@ -715,10 +726,32 @@ static const enum index next_vc_attr[] = {
 	INGRESS,
 	EGRESS,
 	TRANSFER,
+	TUNNEL_SET,
+	TUNNEL_MATCH,
 	PATTERN,
 	ZERO,
 };
 
+static const enum index tunnel_create_attr[] = {
+	TUNNEL_CREATE,
+	TUNNEL_CREATE_TYPE,
+	END,
+	ZERO,
+};
+
+static const enum index tunnel_destroy_attr[] = {
+	TUNNEL_DESTROY,
+	TUNNEL_DESTROY_ID,
+	END,
+	ZERO,
+};
+
+static const enum index tunnel_list_attr[] = {
+	TUNNEL_LIST,
+	END,
+	ZERO,
+};
+
 static const enum index next_destroy_attr[] = {
 	DESTROY_RULE,
 	END,
@@ -1520,6 +1553,9 @@ static int parse_aged(struct context *, const struct token *,
 static int parse_isolate(struct context *, const struct token *,
 			 const char *, unsigned int,
 			 void *, unsigned int);
+static int parse_tunnel(struct context *, const struct token *,
+			const char *, unsigned int,
+			void *, unsigned int);
 static int parse_int(struct context *, const struct token *,
 		     const char *, unsigned int,
 		     void *, unsigned int);
@@ -1702,7 +1738,8 @@ static const struct token token_list[] = {
 			      LIST,
 			      AGED,
 			      QUERY,
-			      ISOLATE)),
+			      ISOLATE,
+			      TUNNEL)),
 		.call = parse_init,
 	},
 	/* Sub-level commands. */
@@ -1776,6 +1813,49 @@ static const struct token token_list[] = {
 			     ARGS_ENTRY(struct buffer, port)),
 		.call = parse_isolate,
 	},
+	[TUNNEL] = {
+		.name = "tunnel",
+		.help = "new tunnel API",
+		.next = NEXT(NEXT_ENTRY
+			     (TUNNEL_CREATE, TUNNEL_LIST, TUNNEL_DESTROY)),
+		.call = parse_tunnel,
+	},
+	/* Tunnel arguments. */
+	[TUNNEL_CREATE] = {
+		.name = "create",
+		.help = "create new tunnel object",
+		.next = NEXT(tunnel_create_attr, NEXT_ENTRY(PORT_ID)),
+		.args = ARGS(ARGS_ENTRY(struct buffer, port)),
+		.call = parse_tunnel,
+	},
+	[TUNNEL_CREATE_TYPE] = {
+		.name = "type",
+		.help = "create new tunnel",
+		.next = NEXT(tunnel_create_attr, NEXT_ENTRY(FILE_PATH)),
+		.args = ARGS(ARGS_ENTRY(struct tunnel_ops, type)),
+		.call = parse_tunnel,
+	},
+	[TUNNEL_DESTROY] = {
+		.name = "destroy",
+		.help = "destroy tunnel",
+		.next = NEXT(tunnel_destroy_attr, NEXT_ENTRY(PORT_ID)),
+		.args = ARGS(ARGS_ENTRY(struct buffer, port)),
+		.call = parse_tunnel,
+	},
+	[TUNNEL_DESTROY_ID] = {
+		.name = "id",
+		.help = "tunnel identifier to destroy",
+		.next = NEXT(tunnel_destroy_attr, NEXT_ENTRY(UNSIGNED)),
+		.args = ARGS(ARGS_ENTRY(struct tunnel_ops, id)),
+		.call = parse_tunnel,
+	},
+	[TUNNEL_LIST] = {
+		.name = "list",
+		.help = "list existing tunnels",
+		.next = NEXT(tunnel_list_attr, NEXT_ENTRY(PORT_ID)),
+		.args = ARGS(ARGS_ENTRY(struct buffer, port)),
+		.call = parse_tunnel,
+	},
 	/* Destroy arguments. */
 	[DESTROY_RULE] = {
 		.name = "rule",
@@ -1839,6 +1919,20 @@ static const struct token token_list[] = {
 		.next = NEXT(next_vc_attr),
 		.call = parse_vc,
 	},
+	[TUNNEL_SET] = {
+		.name = "tunnel_set",
+		.help = "tunnel steer rule",
+		.next = NEXT(next_vc_attr, NEXT_ENTRY(UNSIGNED)),
+		.args = ARGS(ARGS_ENTRY(struct tunnel_ops, id)),
+		.call = parse_vc,
+	},
+	[TUNNEL_MATCH] = {
+		.name = "tunnel_match",
+		.help = "tunnel match rule",
+		.next = NEXT(next_vc_attr, NEXT_ENTRY(UNSIGNED)),
+		.args = ARGS(ARGS_ENTRY(struct tunnel_ops, id)),
+		.call = parse_vc,
+	},
 	/* Validate/create pattern. */
 	[PATTERN] = {
 		.name = "pattern",
@@ -4072,12 +4166,28 @@ parse_vc(struct context *ctx, const struct token *token,
 		return len;
 	}
 	ctx->objdata = 0;
-	ctx->object = &out->args.vc.attr;
+	switch (ctx->curr) {
+	default:
+		ctx->object = &out->args.vc.attr;
+		break;
+	case TUNNEL_SET:
+	case TUNNEL_MATCH:
+		ctx->object = &out->args.vc.tunnel_ops;
+		break;
+	}
 	ctx->objmask = NULL;
 	switch (ctx->curr) {
 	case GROUP:
 	case PRIORITY:
 		return len;
+	case TUNNEL_SET:
+		out->args.vc.tunnel_ops.enabled = 1;
+		out->args.vc.tunnel_ops.actions = 1;
+		return len;
+	case TUNNEL_MATCH:
+		out->args.vc.tunnel_ops.enabled = 1;
+		out->args.vc.tunnel_ops.items = 1;
+		return len;
 	case INGRESS:
 		out->args.vc.attr.ingress = 1;
 		return len;
@@ -5615,6 +5725,47 @@ parse_isolate(struct context *ctx, const struct token *token,
 	return len;
 }
 
+static int
+parse_tunnel(struct context *ctx, const struct token *token,
+	     const char *str, unsigned int len,
+	     void *buf, unsigned int size)
+{
+	struct buffer *out = buf;
+
+	/* Token name must match. */
+	if (parse_default(ctx, token, str, len, NULL, 0) < 0)
+		return -1;
+	/* Nothing else to do if there is no buffer. */
+	if (!out)
+		return len;
+	if (!out->command) {
+		if (ctx->curr != TUNNEL)
+			return -1;
+		if (sizeof(*out) > size)
+			return -1;
+		out->command = ctx->curr;
+		ctx->objdata = 0;
+		ctx->object = out;
+		ctx->objmask = NULL;
+	} else {
+		switch (ctx->curr) {
+		default:
+			break;
+		case TUNNEL_CREATE:
+		case TUNNEL_DESTROY:
+		case TUNNEL_LIST:
+			out->command = ctx->curr;
+			break;
+		case TUNNEL_CREATE_TYPE:
+		case TUNNEL_DESTROY_ID:
+			ctx->object = &out->args.vc.tunnel_ops;
+			break;
+		}
+	}
+
+	return len;
+}
+
 /**
  * Parse signed/unsigned integers 8 to 64-bit long.
  *
@@ -6561,11 +6712,13 @@ cmd_flow_parsed(const struct buffer *in)
 	switch (in->command) {
 	case VALIDATE:
 		port_flow_validate(in->port, &in->args.vc.attr,
-				   in->args.vc.pattern, in->args.vc.actions);
+				   in->args.vc.pattern, in->args.vc.actions,
+				   &in->args.vc.tunnel_ops);
 		break;
 	case CREATE:
 		port_flow_create(in->port, &in->args.vc.attr,
-				 in->args.vc.pattern, in->args.vc.actions);
+				 in->args.vc.pattern, in->args.vc.actions,
+				 &in->args.vc.tunnel_ops);
 		break;
 	case DESTROY:
 		port_flow_destroy(in->port, in->args.destroy.rule_n,
@@ -6591,6 +6744,15 @@ cmd_flow_parsed(const struct buffer *in)
 	case AGED:
 		port_flow_aged(in->port, in->args.aged.destroy);
 		break;
+	case TUNNEL_CREATE:
+		port_flow_tunnel_create(in->port, &in->args.vc.tunnel_ops);
+		break;
+	case TUNNEL_DESTROY:
+		port_flow_tunnel_destroy(in->port, in->args.vc.tunnel_ops.id);
+		break;
+	case TUNNEL_LIST:
+		port_flow_tunnel_list(in->port);
+		break;
 	default:
 		break;
 	}
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 418ea6dda4..1a43684709 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1456,6 +1456,115 @@ port_mtu_set(portid_t port_id, uint16_t mtu)
 
 /* Generic flow management functions. */
 
+static struct port_flow_tunnel *
+port_flow_locate_tunnel_id(struct rte_port *port, uint32_t port_tunnel_id)
+{
+	struct port_flow_tunnel *flow_tunnel;
+
+	LIST_FOREACH(flow_tunnel, &port->flow_tunnel_list, chain) {
+		if (flow_tunnel->id == port_tunnel_id)
+			goto out;
+	}
+	flow_tunnel = NULL;
+
+out:
+	return flow_tunnel;
+}
+
+const char *
+port_flow_tunnel_type(struct rte_flow_tunnel *tunnel)
+{
+	const char *type;
+	switch (tunnel->type) {
+	default:
+		type = "unknown";
+		break;
+	case RTE_FLOW_ITEM_TYPE_VXLAN:
+		type = "vxlan";
+		break;
+	}
+
+	return type;
+}
+
+struct port_flow_tunnel *
+port_flow_locate_tunnel(uint16_t port_id, struct rte_flow_tunnel *tun)
+{
+	struct rte_port *port = &ports[port_id];
+	struct port_flow_tunnel *flow_tunnel;
+
+	LIST_FOREACH(flow_tunnel, &port->flow_tunnel_list, chain) {
+		if (!memcmp(&flow_tunnel->tunnel, tun, sizeof(*tun)))
+			goto out;
+	}
+	flow_tunnel = NULL;
+
+out:
+	return flow_tunnel;
+}
+
+void port_flow_tunnel_list(portid_t port_id)
+{
+	struct rte_port *port = &ports[port_id];
+	struct port_flow_tunnel *flt;
+
+	LIST_FOREACH(flt, &port->flow_tunnel_list, chain) {
+		printf("port %u tunnel #%u type=%s",
+			port_id, flt->id, port_flow_tunnel_type(&flt->tunnel));
+		if (flt->tunnel.tun_id)
+			printf(" id=%" PRIu64, flt->tunnel.tun_id);
+		printf("\n");
+	}
+}
+
+void port_flow_tunnel_destroy(portid_t port_id, uint32_t tunnel_id)
+{
+	struct rte_port *port = &ports[port_id];
+	struct port_flow_tunnel *flt;
+
+	LIST_FOREACH(flt, &port->flow_tunnel_list, chain) {
+		if (flt->id == tunnel_id)
+			break;
+	}
+	if (flt) {
+		LIST_REMOVE(flt, chain);
+		free(flt);
+		printf("port %u: flow tunnel #%u destroyed\n",
+			port_id, tunnel_id);
+	}
+}
+
+void port_flow_tunnel_create(portid_t port_id, const struct tunnel_ops *ops)
+{
+	struct rte_port *port = &ports[port_id];
+	enum rte_flow_item_type	type;
+	struct port_flow_tunnel *flt;
+
+	if (!strcmp(ops->type, "vxlan"))
+		type = RTE_FLOW_ITEM_TYPE_VXLAN;
+	else {
+		printf("cannot offload \"%s\" tunnel type\n", ops->type);
+		return;
+	}
+	LIST_FOREACH(flt, &port->flow_tunnel_list, chain) {
+		if (flt->tunnel.type == type)
+			break;
+	}
+	if (!flt) {
+		flt = calloc(1, sizeof(*flt));
+		if (!flt) {
+			printf("failed to allocate port flt object\n");
+			return;
+		}
+		flt->tunnel.type = type;
+		flt->id = LIST_EMPTY(&port->flow_tunnel_list) ? 1 :
+				  LIST_FIRST(&port->flow_tunnel_list)->id + 1;
+		LIST_INSERT_HEAD(&port->flow_tunnel_list, flt, chain);
+	}
+	printf("port %d: flow tunnel #%u type %s\n",
+		port_id, flt->id, ops->type);
+}
+
 /** Generate a port_flow entry from attributes/pattern/actions. */
 static struct port_flow *
 port_flow_new(const struct rte_flow_attr *attr,
@@ -1580,19 +1689,137 @@ rss_config_display(struct rte_flow_action_rss *rss_conf)
 	}
 }
 
+static struct port_flow_tunnel *
+port_flow_tunnel_offload_cmd_prep(portid_t port_id,
+				  const struct rte_flow_item *pattern,
+				  const struct rte_flow_action *actions,
+				  const struct tunnel_ops *tunnel_ops)
+{
+	int ret;
+	struct rte_port *port;
+	struct port_flow_tunnel *pft;
+	struct rte_flow_error error;
+
+	port = &ports[port_id];
+	pft = port_flow_locate_tunnel_id(port, tunnel_ops->id);
+	if (!pft) {
+		printf("failed to locate port flow tunnel #%u\n",
+			tunnel_ops->id);
+		return NULL;
+	}
+	if (tunnel_ops->actions) {
+		uint32_t num_actions;
+		const struct rte_flow_action *aptr;
+
+		ret = rte_flow_tunnel_decap_set(port_id, &pft->tunnel,
+						&pft->pmd_actions,
+						&pft->num_pmd_actions,
+						&error);
+		if (ret) {
+			port_flow_complain(&error);
+			return NULL;
+		}
+		for (aptr = actions, num_actions = 1;
+		     aptr->type != RTE_FLOW_ACTION_TYPE_END;
+		     aptr++, num_actions++);
+		pft->actions = malloc(
+				(num_actions +  pft->num_pmd_actions) *
+				sizeof(actions[0]));
+		if (!pft->actions) {
+			rte_flow_tunnel_action_decap_release(
+					port_id, pft->pmd_actions,
+					pft->num_pmd_actions, &error);
+			return NULL;
+		}
+		rte_memcpy(pft->actions, pft->pmd_actions,
+			   pft->num_pmd_actions * sizeof(actions[0]));
+		rte_memcpy(pft->actions + pft->num_pmd_actions, actions,
+			   num_actions * sizeof(actions[0]));
+	}
+	if (tunnel_ops->items) {
+		uint32_t num_items;
+		const struct rte_flow_item *iptr;
+
+		ret = rte_flow_tunnel_match(port_id, &pft->tunnel,
+					    &pft->pmd_items,
+					    &pft->num_pmd_items,
+					    &error);
+		if (ret) {
+			port_flow_complain(&error);
+			return NULL;
+		}
+		for (iptr = pattern, num_items = 1;
+		     iptr->type != RTE_FLOW_ITEM_TYPE_END;
+		     iptr++, num_items++);
+		pft->items = malloc((num_items + pft->num_pmd_items) *
+				    sizeof(pattern[0]));
+		if (!pft->items) {
+			rte_flow_tunnel_item_release(
+					port_id, pft->pmd_items,
+					pft->num_pmd_items, &error);
+			return NULL;
+		}
+		rte_memcpy(pft->items, pft->pmd_items,
+			   pft->num_pmd_items * sizeof(pattern[0]));
+		rte_memcpy(pft->items + pft->num_pmd_items, pattern,
+			   num_items * sizeof(pattern[0]));
+	}
+
+	return pft;
+}
+
+static void
+port_flow_tunnel_offload_cmd_release(portid_t port_id,
+				     const struct tunnel_ops *tunnel_ops,
+				     struct port_flow_tunnel *pft)
+{
+	struct rte_flow_error error;
+
+	if (tunnel_ops->actions) {
+		free(pft->actions);
+		rte_flow_tunnel_action_decap_release(
+			port_id, pft->pmd_actions,
+			pft->num_pmd_actions, &error);
+		pft->actions = NULL;
+		pft->pmd_actions = NULL;
+	}
+	if (tunnel_ops->items) {
+		free(pft->items);
+		rte_flow_tunnel_item_release(port_id, pft->pmd_items,
+					     pft->num_pmd_items,
+					     &error);
+		pft->items = NULL;
+		pft->pmd_items = NULL;
+	}
+}
+
 /** Validate flow rule. */
 int
 port_flow_validate(portid_t port_id,
 		   const struct rte_flow_attr *attr,
 		   const struct rte_flow_item *pattern,
-		   const struct rte_flow_action *actions)
+		   const struct rte_flow_action *actions,
+		   const struct tunnel_ops *tunnel_ops)
 {
 	struct rte_flow_error error;
+	struct port_flow_tunnel *pft = NULL;
 
 	/* Poisoning to make sure PMDs update it in case of error. */
 	memset(&error, 0x11, sizeof(error));
+	if (tunnel_ops->enabled) {
+		pft = port_flow_tunnel_offload_cmd_prep(port_id, pattern,
+							actions, tunnel_ops);
+		if (!pft)
+			return -ENOENT;
+		if (pft->items)
+			pattern = pft->items;
+		if (pft->actions)
+			actions = pft->actions;
+	}
 	if (rte_flow_validate(port_id, attr, pattern, actions, &error))
 		return port_flow_complain(&error);
+	if (tunnel_ops->enabled)
+		port_flow_tunnel_offload_cmd_release(port_id, tunnel_ops, pft);
 	printf("Flow rule validated\n");
 	return 0;
 }
@@ -1622,13 +1849,15 @@ int
 port_flow_create(portid_t port_id,
 		 const struct rte_flow_attr *attr,
 		 const struct rte_flow_item *pattern,
-		 const struct rte_flow_action *actions)
+		 const struct rte_flow_action *actions,
+		 const struct tunnel_ops *tunnel_ops)
 {
 	struct rte_flow *flow;
 	struct rte_port *port;
 	struct port_flow *pf;
 	uint32_t id = 0;
 	struct rte_flow_error error;
+	struct port_flow_tunnel *pft = NULL;
 
 	port = &ports[port_id];
 	if (port->flow_list) {
@@ -1639,6 +1868,16 @@ port_flow_create(portid_t port_id,
 		}
 		id = port->flow_list->id + 1;
 	}
+	if (tunnel_ops->enabled) {
+		pft = port_flow_tunnel_offload_cmd_prep(port_id, pattern,
+							actions, tunnel_ops);
+		if (!pft)
+			return -ENOENT;
+		if (pft->items)
+			pattern = pft->items;
+		if (pft->actions)
+			actions = pft->actions;
+	}
 	pf = port_flow_new(attr, pattern, actions, &error);
 	if (!pf)
 		return port_flow_complain(&error);
@@ -1654,6 +1893,8 @@ port_flow_create(portid_t port_id,
 	pf->id = id;
 	pf->flow = flow;
 	port->flow_list = pf;
+	if (tunnel_ops->enabled)
+		port_flow_tunnel_offload_cmd_release(port_id, tunnel_ops, pft);
 	printf("Flow rule #%u created\n", pf->id);
 	return 0;
 }
@@ -1951,7 +2192,9 @@ port_flow_list(portid_t port_id, uint32_t n, const uint32_t group[n])
 		       pf->rule.attr->egress ? 'e' : '-',
 		       pf->rule.attr->transfer ? 't' : '-');
 		while (item->type != RTE_FLOW_ITEM_TYPE_END) {
-			if (rte_flow_conv(RTE_FLOW_CONV_OP_ITEM_NAME_PTR,
+			if ((uint32_t)item->type > INT_MAX)
+				name = "PMD_INTERNAL";
+			else if (rte_flow_conv(RTE_FLOW_CONV_OP_ITEM_NAME_PTR,
 					  &name, sizeof(name),
 					  (void *)(uintptr_t)item->type,
 					  NULL) <= 0)
@@ -1962,7 +2205,9 @@ port_flow_list(portid_t port_id, uint32_t n, const uint32_t group[n])
 		}
 		printf("=>");
 		while (action->type != RTE_FLOW_ACTION_TYPE_END) {
-			if (rte_flow_conv(RTE_FLOW_CONV_OP_ACTION_NAME_PTR,
+			if ((uint32_t)action->type > INT_MAX)
+				name = "PMD_INTERNAL";
+			else if (rte_flow_conv(RTE_FLOW_CONV_OP_ACTION_NAME_PTR,
 					  &name, sizeof(name),
 					  (void *)(uintptr_t)action->type,
 					  NULL) <= 0)
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index ccba71c076..93b9e4dde1 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -3567,6 +3567,8 @@ init_port_dcb_config(portid_t pid,
 static void
 init_port(void)
 {
+	int i;
+
 	/* Configuration of Ethernet ports. */
 	ports = rte_zmalloc("testpmd: ports",
 			    sizeof(struct rte_port) * RTE_MAX_ETHPORTS,
@@ -3576,7 +3578,8 @@ init_port(void)
 				"rte_zmalloc(%d struct rte_port) failed\n",
 				RTE_MAX_ETHPORTS);
 	}
-
+	for (i = 0; i < RTE_MAX_ETHPORTS; i++)
+		LIST_INIT(&ports[i].flow_tunnel_list);
 	/* Initialize ports NUMA structures */
 	memset(port_numa, NUMA_NO_CONFIG, RTE_MAX_ETHPORTS);
 	memset(rxring_numa, NUMA_NO_CONFIG, RTE_MAX_ETHPORTS);
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index c7e7e41a97..d3d43b4504 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -12,6 +12,7 @@
 #include <rte_gro.h>
 #include <rte_gso.h>
 #include <cmdline.h>
+#include <sys/queue.h>
 
 #define RTE_PORT_ALL            (~(portid_t)0x0)
 
@@ -142,6 +143,26 @@ struct port_flow {
 	uint8_t data[]; /**< Storage for flow rule description */
 };
 
+struct port_flow_tunnel {
+	LIST_ENTRY(port_flow_tunnel) chain;
+	struct rte_flow_action *pmd_actions;
+	struct rte_flow_item   *pmd_items;
+	uint32_t id;
+	uint32_t num_pmd_actions;
+	uint32_t num_pmd_items;
+	struct rte_flow_tunnel tunnel;
+	struct rte_flow_action *actions;
+	struct rte_flow_item *items;
+};
+
+struct tunnel_ops {
+	uint32_t id;
+	char type[16];
+	uint32_t enabled:1;
+	uint32_t actions:1;
+	uint32_t items:1;
+};
+
 /**
  * The data structure associated with each port.
  */
@@ -172,6 +193,7 @@ struct rte_port {
 	uint32_t                mc_addr_nb; /**< nb. of addr. in mc_addr_pool */
 	uint8_t                 slave_flag; /**< bonding slave port */
 	struct port_flow        *flow_list; /**< Associated flows. */
+	LIST_HEAD(, port_flow_tunnel) flow_tunnel_list;
 	const struct rte_eth_rxtx_callback *rx_dump_cb[RTE_MAX_QUEUES_PER_PORT+1];
 	const struct rte_eth_rxtx_callback *tx_dump_cb[RTE_MAX_QUEUES_PER_PORT+1];
 	/**< metadata value to insert in Tx packets. */
@@ -751,11 +773,13 @@ void port_reg_set(portid_t port_id, uint32_t reg_off, uint32_t value);
 int port_flow_validate(portid_t port_id,
 		       const struct rte_flow_attr *attr,
 		       const struct rte_flow_item *pattern,
-		       const struct rte_flow_action *actions);
+		       const struct rte_flow_action *actions,
+		       const struct tunnel_ops *tunnel_ops);
 int port_flow_create(portid_t port_id,
 		     const struct rte_flow_attr *attr,
 		     const struct rte_flow_item *pattern,
-		     const struct rte_flow_action *actions);
+		     const struct rte_flow_action *actions,
+		     const struct tunnel_ops *tunnel_ops);
 void update_age_action_context(const struct rte_flow_action *actions,
 		     struct port_flow *pf);
 int port_flow_destroy(portid_t port_id, uint32_t n, const uint32_t *rule);
@@ -765,6 +789,12 @@ int port_flow_query(portid_t port_id, uint32_t rule,
 		    const struct rte_flow_action *action);
 void port_flow_list(portid_t port_id, uint32_t n, const uint32_t *group);
 void port_flow_aged(portid_t port_id, uint8_t destroy);
+const char *port_flow_tunnel_type(struct rte_flow_tunnel *tunnel);
+struct port_flow_tunnel *
+port_flow_locate_tunnel(uint16_t port_id, struct rte_flow_tunnel *tun);
+void port_flow_tunnel_list(portid_t port_id);
+void port_flow_tunnel_destroy(portid_t port_id, uint32_t tunnel_id);
+void port_flow_tunnel_create(portid_t port_id, const struct tunnel_ops *ops);
 int port_flow_isolate(portid_t port_id, int set);
 
 void rx_ring_desc_display(portid_t port_id, queueid_t rxq_id, uint16_t rxd_id);
diff --git a/app/test-pmd/util.c b/app/test-pmd/util.c
index 8488fa1a8f..781a813759 100644
--- a/app/test-pmd/util.c
+++ b/app/test-pmd/util.c
@@ -48,18 +48,49 @@ dump_pkt_burst(uint16_t port_id, uint16_t queue, struct rte_mbuf *pkts[],
 	       is_rx ? "received" : "sent",
 	       (unsigned int) nb_pkts);
 	for (i = 0; i < nb_pkts; i++) {
+		int ret;
+		struct rte_flow_error error;
+		struct rte_flow_restore_info info = { 0, };
+
 		mb = pkts[i];
 		eth_hdr = rte_pktmbuf_read(mb, 0, sizeof(_eth_hdr), &_eth_hdr);
 		eth_type = RTE_BE_TO_CPU_16(eth_hdr->ether_type);
-		ol_flags = mb->ol_flags;
 		packet_type = mb->packet_type;
 		is_encapsulation = RTE_ETH_IS_TUNNEL_PKT(packet_type);
-
+		ret = rte_flow_get_restore_info(port_id, mb, &info, &error);
+		if (!ret) {
+			printf("restore info:");
+			if (info.flags & RTE_FLOW_RESTORE_INFO_TUNNEL) {
+				struct port_flow_tunnel *port_tunnel;
+
+				port_tunnel = port_flow_locate_tunnel
+					      (port_id, &info.tunnel);
+				printf(" - tunnel");
+				if (port_tunnel)
+					printf(" #%u", port_tunnel->id);
+				else
+					printf(" %s", "-none-");
+				printf(" type %s",
+					port_flow_tunnel_type(&info.tunnel));
+			} else {
+				printf(" - no tunnel info");
+			}
+			if (info.flags & RTE_FLOW_RESTORE_INFO_ENCAPSULATED)
+				printf(" - outer header present");
+			else
+				printf(" - no outer header");
+			if (info.flags & RTE_FLOW_RESTORE_INFO_GROUP_ID)
+				printf(" - miss group %u", info.group_id);
+			else
+				printf(" - no miss group");
+			printf("\n");
+		}
 		print_ether_addr("  src=", &eth_hdr->s_addr);
 		print_ether_addr(" - dst=", &eth_hdr->d_addr);
 		printf(" - type=0x%04x - length=%u - nb_segs=%d",
 		       eth_type, (unsigned int) mb->pkt_len,
 		       (int)mb->nb_segs);
+		ol_flags = mb->ol_flags;
 		if (ol_flags & PKT_RX_RSS_HASH) {
 			printf(" - RSS hash=0x%x", (unsigned int) mb->hash.rss);
 			printf(" - RSS queue=0x%x", (unsigned int) queue);
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 72bdb1be43..ec3ddd102e 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -3727,6 +3727,45 @@ following sections.
 
    flow aged {port_id} [destroy]
 
+- Tunnel offload - create a tunnel stub::
+
+   flow tunnel create {port_id} type {tunnel_type}
+
+- Tunnel offload - destroy a tunnel stub::
+
+   flow tunnel destroy {port_id} id {tunnel_id}
+
+- Tunnel offload - list port tunnel stubs::
+
+   flow tunnel list {port_id}
+
+Creating a tunnel stub for offload
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``flow tunnel create`` sets up a tunnel stub for tunnel offload flow rules::
+
+   flow tunnel create {port_id} type {tunnel_type}
+
+If successful, it will return a tunnel stub ID usable with other commands::
+
+   port [...]: flow tunnel #[...] type [...]
+
+Tunnel stub ID is relative to a port.
+
+Destroying tunnel offload stub
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``flow tunnel destroy`` destroys a port tunnel stub::
+
+   flow tunnel destroy {port_id} id {tunnel_id}
+
+Listing tunnel offload stubs
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``flow tunnel list`` lists port tunnel offload stubs::
+
+   flow tunnel list {port_id}
+
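+The command prints each stub with its identifier and tunnel type. Sample
+output, assuming a single VXLAN stub has been created on port 0::
+
+   port 0 tunnel #1 type=vxlan
+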
 Validating flow rules
 ~~~~~~~~~~~~~~~~~~~~~
 
@@ -3773,6 +3812,7 @@ to ``rte_flow_create()``::
 
    flow create {port_id}
       [group {group_id}] [priority {level}] [ingress] [egress] [transfer]
+      [tunnel_set {tunnel_id}] [tunnel_match {tunnel_id}]
       pattern {item} [/ {item} [...]] / end
       actions {action} [/ {action} [...]] / end
 
@@ -3787,6 +3827,7 @@ Otherwise it will show an error message of the form::
 Parameters describe in the following order:
 
 - Attributes (*group*, *priority*, *ingress*, *egress*, *transfer* tokens).
+- Tunnel offload specification (tunnel_set, tunnel_match)
 - A matching pattern, starting with the *pattern* token and terminated by an
   *end* pattern item.
 - Actions, starting with the *actions* token and terminated by an *end*
@@ -3830,6 +3871,14 @@ Most rules affect RX therefore contain the ``ingress`` token::
 
    testpmd> flow create 0 ingress pattern [...]
 
+Tunnel offload
+^^^^^^^^^^^^^^
+
+Indicate the tunnel offload rule type; see the example below.
+
+- ``tunnel_set {tunnel_id}``: mark the rule as tunnel offload decap_set type.
+- ``tunnel_match {tunnel_id}``: mark the rule as tunnel offload match type.
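+
+For example, assuming a VXLAN tunnel stub #1 exists on port 0, a steering
+rule and a matching rule may look as follows::
+
+   testpmd> flow create 0 ingress group 0 tunnel_set 1
+            pattern eth / ipv4 / udp dst is 4789 / vxlan / end
+            actions jump group 0 / end
+
+   testpmd> flow create 0 ingress group 0 tunnel_match 1
+            pattern eth / ipv4 / udp dst is 4789 / vxlan / eth / ipv4 / end
+            actions set_mac_dst mac_addr 02:CA:FE:CA:FA:80 /
+            queue index 0 / end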
+
 Matching pattern
 ^^^^^^^^^^^^^^^^
 
-- 
2.28.0


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH v4 4/4] app/testpmd: add commands for tunnel offload API
  2020-10-04 13:50   ` [dpdk-dev] [PATCH v4 4/4] app/testpmd: add commands for " Gregory Etelson
@ 2020-10-04 13:59     ` Ori Kam
  0 siblings, 0 replies; 95+ messages in thread
From: Ori Kam @ 2020-10-04 13:59 UTC (permalink / raw)
  To: Gregory Etelson, dev
  Cc: Matan Azrad, Raslan Darawsheh, Eli Britstein, Oz Shlomo,
	ajit.khaparde, Wenzhuo Lu, Beilei Xing, Bernard Iremonger

Hi

> -----Original Message-----
> From: Gregory Etelson <getelson@nvidia.com>
> Sent: Sunday, October 4, 2020 4:51 PM
> Subject: [PATCH v4 4/4] app/testpmd: add commands for tunnel offload API
> 
> Tunnel Offload API provides hardware independent, unified model
> to offload tunneled traffic. Key model elements are:
>  - apply matches to both outer and inner packet headers
>    during entire offload procedure;
>  - restore outer header of partially offloaded packet;
>  - model is implemented as a set of helper functions.
> 
> Implementation details:
> 
> * Create application tunnel:
> flow tunnel create <port> type <tunnel type>
> On success, the command creates application tunnel object and returns
> the tunnel descriptor. Tunnel descriptor is used in subsequent flow
> creation commands to reference the tunnel.
> 
> * Create tunnel steering flow rule:
> tunnel_set <tunnel descriptor> parameter used with steering rule
> template.
> 
> * Create tunnel matching flow rule:
> tunnel_match <tunnel descriptor> used with matching rule template.
> 
> * If tunnel steering rule was offloaded, outer header of a partially
> offloaded packet is restored after miss.
> 
> Example:
> test packet=
> <Ether  dst=24:8a:07:8d:ae:d6 src=50:6b:4b:cc:fc:e2 type=IPv4 |
> <IP  version=4 ihl=5 proto=udp src=1.1.1.1 dst=1.1.1.10 |
> <UDP  sport=4789 dport=4789 len=58 chksum=0x7f7b |
> <VXLAN  NextProtocol=Ethernet vni=0x0 |
> <Ether  dst=24:aa:aa:aa:aa:d6 src=50:bb:bb:bb:bb:e2 type=IPv4 |
> <IP  version=4 ihl=5 proto=icmp src=2.2.2.2 dst=2.2.2.200 |
> <ICMP  type=echo-request code=0 chksum=0xf7ff id=0x0 seq=0x0 |>>>>>>>
> >>> len(packet)
> 92
> 
> testpmd> flow flush 0
> testpmd> port 0/queue 0: received 1 packets
> src=50:6B:4B:CC:FC:E2 - dst=24:8A:07:8D:AE:D6 - type=0x0800 -
> length=92
> 
> testpmd> flow tunnel 0 type vxlan
> port 0: flow tunnel #1 type vxlan
> testpmd> flow create 0 ingress group 0 tunnel_set 1
>          pattern eth /ipv4 / udp dst is 4789 / vxlan / end
>          actions  jump group 0 / end
> Flow rule #0 created
> testpmd> port 0/queue 0: received 1 packets
> tunnel restore info: - vxlan tunnel - outer header present # <--
>   src=50:6B:4B:CC:FC:E2 - dst=24:8A:07:8D:AE:D6 - type=0x0800 -
> length=92
> 
> testpmd> flow create 0 ingress group 0 tunnel_match 1
>          pattern eth / ipv4 / udp dst is 4789 / vxlan / eth / ipv4 /
>          end
>          actions set_mac_dst mac_addr 02:CA:FE:CA:FA:80 /
>          queue index 0 / end
> Flow rule #1 created
> testpmd> port 0/queue 0: received 1 packets
>   src=50:BB:BB:BB:BB:E2 - dst=02:CA:FE:CA:FA:80 - type=0x0800 -
> length=42
> 
> * Destroy flow tunnel
> flow tunnel destroy <port> id <tunnel id>
> 
> * Show existing flow tunnels
> flow tunnel list <port>
> 
> Signed-off-by: Gregory Etelson <getelson@nvidia.com>
> ---
> v2:
> * introduce testpmd support for tunnel offload API
> 
> v3:
> * update flow tunnel commands
> ---


Acked-by: Ori Kam <orika@nvidia.com>
Thanks,
Ori

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH v4 2/4] ethdev: tunnel offload model
  2020-10-04 13:50   ` [dpdk-dev] [PATCH v4 2/4] ethdev: tunnel offload model Gregory Etelson
@ 2020-10-06  9:47     ` Sriharsha Basavapatna
  2020-10-07 12:36       ` Gregory Etelson
  2020-10-14 23:55     ` Thomas Monjalon
  1 sibling, 1 reply; 95+ messages in thread
From: Sriharsha Basavapatna @ 2020-10-06  9:47 UTC (permalink / raw)
  To: Gregory Etelson, Eli Britstein, dev; +Cc: Sriharsha Basavapatna

On Sun, Oct 4, 2020 at 7:23 PM Gregory Etelson <getelson@nvidia.com> wrote:
>
> From: Eli Britstein <elibr@mellanox.com>
>
> Rte_flow API provides the building blocks for vendor agnostic flow
> classification offloads.  The rte_flow match and action primitives are
> fine grained, thus enabling DPDK applications the flexibility to
> offload network stacks and complex pipelines.
>
> Applications wishing to offload complex data structures (e.g. tunnel
> virtual ports) are required to use the rte_flow primitives, such as
> group, meta, mark, tag and others to model their high level objects.
>
> The hardware model design for high level software objects is not
> trivial.  Furthermore, an optimal design is often vendor specific.
>
> The goal of this API is to provide applications with the hardware
> offload model for common high level software objects which is optimal
> in regards to the underlying hardware.
>
> Tunnel ports are the first of such objects.
>
> Tunnel ports
> ------------
> Ingress processing of tunneled traffic requires the classification of
> the tunnel type followed by a decap action.
>
> In software, once a packet is decapsulated the in_port field is
> changed to a virtual port representing the tunnel type. The outer
> header fields are stored as packet metadata members and may be matched
> by proceeding flows.
>
> Openvswitch, for example, uses two flows:
> 1. classification flow - setting the virtual port representing the
>    tunnel type. For example: match on udp port 4789,
>    actions=tnl_pop(vxlan_vport)
> 2. steering flow according to outer and inner header matches: match on
>    in_port=vxlan_vport and outer/inner header matches,
>    actions=forward to port X
> The benefits of multi-flow tables are described in [1].
>
> Offloading tunnel ports
> -----------------------
> Tunnel ports introduce a new stateless field that can be matched on.
> Currently the rte_flow library provides an API to encap, decap and
> match on tunnel headers. However, there is no rte_flow primitive to
> set and match tunnel virtual ports.

Tunnel vport is an internal construct used by one specific
application: OVS. So, shouldn't the rte APIs also be application
agnostic apart from being vendor agnostic ? For OVS, the match fields
in the existing datapath flow rules contain enough information to
identify the tunnel type.
>
> There are several possible hardware models for offloading virtual
> tunnel port flows including, but not limited to, the following:
> 1. Setting the virtual port on a hw register using the
> rte_flow_action_mark/ rte_flow_action_tag/rte_flow_set_meta objects.
> 2. Mapping a virtual port to an rte_flow group
> 3. Avoiding the need to match on transient objects by merging
> multi-table flows to a single rte_flow rule.
>
> Every approach has its pros and cons.  The preferred approach should
> take into account the entire system architecture and is very often
> vendor specific.
>
> The proposed rte_flow_tunnel_decap_set helper function (drafted below)
> is designed to provide a common, vendor agnostic, API for setting the
> virtual port value.  The helper API enables PMD implementations to
> return vendor specific combination of rte_flow actions realizing the
> vendor's hardware model for setting a tunnel port.  Applications may
> append the list of actions returned from the helper function when
> creating an rte_flow rule in hardware.

Wouldn't it be better if the APIs do not refer to vports and avoid
percolating it down to the PMD ? My point here is to avoid bringing in
the knowledge of an application specific virtual object (vport) to the
PMD.

Here's some other issues that I see with the helper APIs and
vendor-specific variable actions.
1) The application needs some kind of validation (or understanding) of
the actions returned by the PMD. The application can't just blindly
use the actions specified by the PMD. That is, the decision to pick
the set of actions can't be left entirely to the PMD.
2) The application needs to learn a PMD-specific way of action
processing for each vendor. For example, how should the application
handle flow-miss, given a different set of actions between two vendors
(if one vendor has already popped the tunnel header while the other
one hasn't).
3) The end-users/customers won't have a common interface (as in,
consistent actions) to perform tunnel decap action. This becomes a
manageability/maintenance issue for the application while working with
different vendors.

IMO, the API shouldn't expect the PMD to understand the notion of
vport. The goal here is to offload a flow rule to decap the tunnel
header and forward the packet to a HW endpoint.  The problem is that
we don't have a way to express the "tnl_pop" datapath action to the HW
(decap flow #1, in the context of br-phy in OVS-DPDK) and also we may
not want the HW to really pop the tunnel header at that stage. If this
cannot be expressed with existing rte action types, maybe we should
introduce a new action that clearly defines what is expected to the
PMD.
>
> Similarly, the rte_flow_tunnel_match helper (drafted below)
> allows for multiple hardware implementations to return a list of
> rte_flow items.
>
> Miss handling
> -------------
> Packets going through multiple rte_flow groups are exposed to hw
> misses due to partial packet processing. In such cases, the software
> should continue the packet's processing from the point where the
> hardware missed.

Whether the packet goes through multiple groups or not for tunnel
decap processing, should be left to the PMD/HW.  These assumptions
shouldn't be built into the APIs. The encapsulated packet (i.e. with
outer headers) should be provided to the application, rather than
making SW understand that there was a miss in stage-1, or stage-n in
HW. That is, HW either processes it entirely, or punts the whole
packet to SW if there's a miss. And the packet should take the normal
processing path in SW (no action offload).

Thanks,
-Harsha

>
> We propose a generic rte_flow_restore structure providing the state
> that was stored in hardware when the packet missed.
>
> Currently, the structure will provide the tunnel state of the packet
> that missed, namely:
> 1. The group id that missed
> 2. The tunnel port that missed
> 3. Tunnel information that was stored in memory (due to decap action).
> In the future, we may add additional fields as more state may be
> stored in the device memory (e.g. ct_state).
>
> Applications may query the state via a new
> rte_flow_tunnel_get_restore_info(mbuf) API, thus allowing
> a vendor specific implementation.
>
> VXLAN Code example:
> Assume application needs to do inner NAT on VXLAN packet.
> The first  rule in group 0:
>
> flow create <port id> ingress group 0
>   pattern eth / ipv4 / udp dst is 4789 / vxlan / end
>   actions {pmd actions} / jump group 3 / end
>
> The first VXLAN packet that arrives matches the rule in group 0 and jumps
> to group 3. In group 3 the packet will miss since there is no flow to
> match and will be passed to the application. The application will call
> rte_flow_get_restore_info() to get the packet outer header.
> Application will insert a new rule in group 3 to match outer and inner
> headers:
>
> flow create <port id> ingress group 3
>   pattern {pmd items} / eth / ipv4 dst is 172.10.10.1 /
>           udp dst 4789 / vxlan vni is 10 /
>           ipv4 dst is 184.1.2.3 / end
>   actions  set_ipv4_dst  186.1.1.1 / queue index 3 / end
>
> The result of these rules will be that a VXLAN packet with vni=10, outer IPv4
> dst=172.10.10.1 and inner IPv4 dst=184.1.2.3 will be received decapsulated
> on queue 3 with IPv4 dst=186.1.1.1
>
> Note: A packet in group 3 is considered decapsulated. All actions in that
> group will be done on the header that was inner before decap. The application
> may specify outer headers to be matched on.  It is the PMD's responsibility to
> translate these items to outer metadata.
>
> API usage:
> /**
>  * 1. Initiate RTE flow tunnel object
>  */
> const struct rte_flow_tunnel tunnel = {
>   .type = RTE_FLOW_ITEM_TYPE_VXLAN,
>   .tun_id = 10,
> }
>
> /**
>  * 2. Obtain PMD tunnel actions
>  *
>  * pmd_actions is an intermediate variable application uses to
>  * compile actions array
>  */
> struct rte_flow_action **pmd_actions;
> rte_flow_tunnel_decap_set(port_id, &tunnel, &pmd_actions,
>                           &num_pmd_actions, &error);
>
> /**
>  * 3. offload the first  rule
>  * matching on VXLAN traffic and jumps to group 3
>  * (implicitly decaps packet)
>  */
> app_actions  =   jump group 3
> rule_items = app_items;  /** eth / ipv4 / udp / vxlan  */
> rule_actions = { pmd_actions, app_actions };
> attr.group = 0;
> flow_1 = rte_flow_create(port_id, &attr,
>                          rule_items, rule_actions, &error);
> /**
>   * 4. after flow creation application does not need to keep tunnel
>   * action resources.
>   */
> rte_flow_tunnel_action_decap_release(port_id, pmd_actions,
>                                      num_pmd_actions, &error);
>
> /**
>   * 5. After partially offloaded packet miss because there was no
>   * matching rule handle miss on group 3
>   */
> struct rte_flow_restore_info info;
> rte_flow_get_restore_info(port_id, mbuf, &info, &error);
>
> /**
>  * 6. Offload NAT rule:
>  */
> app_items = { eth / ipv4 dst is 172.10.10.1 / udp dst 4789 /
>             vxlan vni is 10 / ipv4 dst is 184.1.2.3 }
> app_actions = { set_ipv4_dst 186.1.1.1 / queue index 3 }
>
> rte_flow_tunnel_match(&info.tunnel, &pmd_items,
>                       &num_pmd_items,  &error);
> rule_items = {pmd_items, app_items};
> rule_actions = app_actions;
> attr.group = info.group_id;
> flow_2 = rte_flow_create(port_id, &attr,
>                          rule_items, rule_actions, &error);
>
> /**
>  * 7. Release PMD items after rule creation
>  */
> rte_flow_tunnel_item_release(port_id, pmd_items, num_pmd_items, &error);
>
> References
> 1. https://mails.dpdk.org/archives/dev/2020-June/index.html
>
> Signed-off-by: Eli Britstein <elibr@mellanox.com>
> Signed-off-by: Gregory Etelson <getelson@nvidia.com>
> Acked-by: Ori Kam <orika@nvidia.com>
> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> ---
>  doc/guides/prog_guide/rte_flow.rst       | 105 ++++++++++++
>  doc/guides/rel_notes/release_20_11.rst   |   9 ++
>  lib/librte_ethdev/rte_ethdev_version.map |   6 +
>  lib/librte_ethdev/rte_flow.c             | 112 +++++++++++++
>  lib/librte_ethdev/rte_flow.h             | 195 +++++++++++++++++++++++
>  lib/librte_ethdev/rte_flow_driver.h      |  32 ++++
>  6 files changed, 459 insertions(+)
>
> diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
> index df71bd2eeb..28f2aca984 100644
> --- a/doc/guides/prog_guide/rte_flow.rst
> +++ b/doc/guides/prog_guide/rte_flow.rst
> @@ -3034,6 +3034,111 @@ operations include:
>  - Duplication of a complete flow rule description.
>  - Pattern item or action name retrieval.
>
> +Tunneled traffic offload
> +~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Provide software application with unified rules model for tunneled traffic
> +regardless underlying hardware.
> +
> + - The model introduces a concept of a virtual tunnel port (VTP).
> + - The model uses VTP to offload ingress tunneled network traffic
> +   with RTE flow rules.
> + - The model is implemented as set of helper functions. Each PMD
> +   implements VTP offload according to underlying hardware offload
> +   capabilities.  Applications must query PMD for VTP flow
> +   items / actions before using in creation of a VTP flow rule.
> +
> +The model components:
> +
> +- Virtual Tunnel Port (VTP) is a stateless software object that
> +  describes tunneled network traffic.  VTP object usually contains
> +  descriptions of outer headers, tunnel headers and inner headers.
> +- Tunnel Steering flow Rule (TSR) detects tunneled packets and
> +  delegates them to tunnel processing infrastructure, implemented
> +  in PMD for optimal hardware utilization, for further processing.
> +- Tunnel Matching flow Rule (TMR) verifies packet configuration and
> +  runs offload actions in case of a match.
> +
> +Application actions:
> +
> +1 Initialize VTP object according to tunnel network parameters.
> +
> +2 Create TSR flow rule.
> +
> +2.1 Query PMD for VTP actions. Application can query for VTP actions more than once.
> +
> +  .. code-block:: c
> +
> +    int
> +    rte_flow_tunnel_decap_set(uint16_t port_id,
> +                              struct rte_flow_tunnel *tunnel,
> +                              struct rte_flow_action **pmd_actions,
> +                              uint32_t *num_of_pmd_actions,
> +                              struct rte_flow_error *error);
> +
> +2.2 Integrate PMD actions into TSR actions list.
> +
> +2.3 Create TSR flow rule.
> +
> +    .. code-block:: console
> +
> +      flow create <port> group 0 match {tunnel items} / end actions {PMD actions} / {App actions} / end
> +
> +3 Create TMR flow rule.
> +
> +3.1 Query PMD for VTP items. Application can query for VTP items more than once.
> +
> +    .. code-block:: c
> +
> +      int
> +      rte_flow_tunnel_match(uint16_t port_id,
> +                            struct rte_flow_tunnel *tunnel,
> +                            struct rte_flow_item **pmd_items,
> +                            uint32_t *num_of_pmd_items,
> +                            struct rte_flow_error *error);
> +
> +3.2 Integrate PMD items into TMR items list.
> +
> +3.3 Create TMR flow rule.
> +
> +    .. code-block:: console
> +
> +      flow create <port> group 0 match {PMD items} / {APP items} / end actions {offload actions} / end
> +
> +The model provides helper function call to restore packets that miss
> +tunnel TMR rules to its original state:
> +
> +.. code-block:: c
> +
> +  int
> +  rte_flow_get_restore_info(uint16_t port_id,
> +                            struct rte_mbuf *mbuf,
> +                            struct rte_flow_restore_info *info,
> +                            struct rte_flow_error *error);
> +
> +rte_tunnel object filled by the call inside
> +``rte_flow_restore_info *info parameter`` can be used by the application
> +to create new TMR rule for that tunnel.
> +
> +The model requirements:
> +
> +Software application must initialize
> +rte_tunnel object with tunnel parameters before calling
> +rte_flow_tunnel_decap_set() & rte_flow_tunnel_match().
> +
> +PMD actions array obtained in rte_flow_tunnel_decap_set() must be
> +released by application with rte_flow_action_release() call.
> +Application can release the actionsfter TSR rule was created.
> +
> +PMD items array obtained with rte_flow_tunnel_match() must be released
> +by application with rte_flow_item_release() call.  Application can
> +release the items after rule was created. However, if the application
> +needs to create additional TMR rule for the same tunnel it will need
> +to obtain PMD items again.
> +
> +Application cannot destroy rte_tunnel object before it releases all
> +PMD actions & PMD items referencing that tunnel.
> +
>  Caveats
>  -------
>
> diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
> index 1a9945f314..9ea8ee3170 100644
> --- a/doc/guides/rel_notes/release_20_11.rst
> +++ b/doc/guides/rel_notes/release_20_11.rst
> @@ -62,6 +62,15 @@ New Features
>    * Added support for 200G PAM4 link speed.
>    * Added support for RSS hash level selection.
>    * Updated HWRM structures to 1.10.1.70 version.
> +* **Flow rules allowed to use private PMD items / actions.**
> +
> +  * Flow rule verification was updated to accept private PMD
> +    items and actions.
> +
> +* **Added generic API to offload tunneled traffic and restore missed packet.**
> +
> +  * Added a new hardware independent helper API to RTE flow library that
> +    offloads tunneled traffic and restores missed packets.
>
>  * **Updated Cisco enic driver.**
>
> diff --git a/lib/librte_ethdev/rte_ethdev_version.map b/lib/librte_ethdev/rte_ethdev_version.map
> index c95ef5157a..9832c138a2 100644
> --- a/lib/librte_ethdev/rte_ethdev_version.map
> +++ b/lib/librte_ethdev/rte_ethdev_version.map
> @@ -226,6 +226,12 @@ EXPERIMENTAL {
>         rte_tm_wred_profile_add;
>         rte_tm_wred_profile_delete;
>
> +       rte_flow_tunnel_decap_set;
> +       rte_flow_tunnel_match;
> +       rte_flow_get_restore_info;
> +       rte_flow_tunnel_action_decap_release;
> +       rte_flow_tunnel_item_release;
> +
>         # added in 20.11
>         rte_eth_link_speed_to_str;
>         rte_eth_link_to_str;
> diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
> index c8c6d62a8b..181c02792d 100644
> --- a/lib/librte_ethdev/rte_flow.c
> +++ b/lib/librte_ethdev/rte_flow.c
> @@ -1267,3 +1267,115 @@ rte_flow_get_aged_flows(uint16_t port_id, void **contexts,
>                                   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
>                                   NULL, rte_strerror(ENOTSUP));
>  }
> +
> +int
> +rte_flow_tunnel_decap_set(uint16_t port_id,
> +                         struct rte_flow_tunnel *tunnel,
> +                         struct rte_flow_action **actions,
> +                         uint32_t *num_of_actions,
> +                         struct rte_flow_error *error)
> +{
> +       struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +       const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
> +
> +       if (unlikely(!ops))
> +               return -rte_errno;
> +       if (likely(!!ops->tunnel_decap_set)) {
> +               return flow_err(port_id,
> +                               ops->tunnel_decap_set(dev, tunnel, actions,
> +                                                     num_of_actions, error),
> +                               error);
> +       }
> +       return rte_flow_error_set(error, ENOTSUP,
> +                                 RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> +                                 NULL, rte_strerror(ENOTSUP));
> +}
> +
> +int
> +rte_flow_tunnel_match(uint16_t port_id,
> +                     struct rte_flow_tunnel *tunnel,
> +                     struct rte_flow_item **items,
> +                     uint32_t *num_of_items,
> +                     struct rte_flow_error *error)
> +{
> +       struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +       const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
> +
> +       if (unlikely(!ops))
> +               return -rte_errno;
> +       if (likely(!!ops->tunnel_match)) {
> +               return flow_err(port_id,
> +                               ops->tunnel_match(dev, tunnel, items,
> +                                                 num_of_items, error),
> +                               error);
> +       }
> +       return rte_flow_error_set(error, ENOTSUP,
> +                                 RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> +                                 NULL, rte_strerror(ENOTSUP));
> +}
> +
> +int
> +rte_flow_get_restore_info(uint16_t port_id,
> +                         struct rte_mbuf *m,
> +                         struct rte_flow_restore_info *restore_info,
> +                         struct rte_flow_error *error)
> +{
> +       struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +       const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
> +
> +       if (unlikely(!ops))
> +               return -rte_errno;
> +       if (likely(!!ops->get_restore_info)) {
> +               return flow_err(port_id,
> +                               ops->get_restore_info(dev, m, restore_info,
> +                                                     error),
> +                               error);
> +       }
> +       return rte_flow_error_set(error, ENOTSUP,
> +                                 RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> +                                 NULL, rte_strerror(ENOTSUP));
> +}
> +
> +int
> +rte_flow_tunnel_action_decap_release(uint16_t port_id,
> +                                    struct rte_flow_action *actions,
> +                                    uint32_t num_of_actions,
> +                                    struct rte_flow_error *error)
> +{
> +       struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +       const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
> +
> +       if (unlikely(!ops))
> +               return -rte_errno;
> +       if (likely(!!ops->action_release)) {
> +               return flow_err(port_id,
> +                               ops->action_release(dev, actions,
> +                                                   num_of_actions, error),
> +                               error);
> +       }
> +       return rte_flow_error_set(error, ENOTSUP,
> +                                 RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> +                                 NULL, rte_strerror(ENOTSUP));
> +}
> +
> +int
> +rte_flow_tunnel_item_release(uint16_t port_id,
> +                            struct rte_flow_item *items,
> +                            uint32_t num_of_items,
> +                            struct rte_flow_error *error)
> +{
> +       struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +       const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
> +
> +       if (unlikely(!ops))
> +               return -rte_errno;
> +       if (likely(!!ops->item_release)) {
> +               return flow_err(port_id,
> +                               ops->item_release(dev, items,
> +                                                 num_of_items, error),
> +                               error);
> +       }
> +       return rte_flow_error_set(error, ENOTSUP,
> +                                 RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> +                                 NULL, rte_strerror(ENOTSUP));
> +}
> diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
> index da8bfa5489..2f12d3ea1a 100644
> --- a/lib/librte_ethdev/rte_flow.h
> +++ b/lib/librte_ethdev/rte_flow.h
> @@ -3357,6 +3357,201 @@ int
>  rte_flow_get_aged_flows(uint16_t port_id, void **contexts,
>                         uint32_t nb_contexts, struct rte_flow_error *error);
>
> +/* Tunnel has a type and the key information. */
> +struct rte_flow_tunnel {
> +       /**
> +        * Tunnel type, for example RTE_FLOW_ITEM_TYPE_VXLAN,
> +        * RTE_FLOW_ITEM_TYPE_NVGRE etc.
> +        */
> +       enum rte_flow_item_type type;
> +       uint64_t tun_id; /**< Tunnel identification. */
> +
> +       RTE_STD_C11
> +       union {
> +               struct {
> +                       rte_be32_t src_addr; /**< IPv4 source address. */
> +                       rte_be32_t dst_addr; /**< IPv4 destination address. */
> +               } ipv4;
> +               struct {
> +                       uint8_t src_addr[16]; /**< IPv6 source address. */
> +                       uint8_t dst_addr[16]; /**< IPv6 destination address. */
> +               } ipv6;
> +       };
> +       rte_be16_t tp_src; /**< Tunnel port source. */
> +       rte_be16_t tp_dst; /**< Tunnel port destination. */
> +       uint16_t   tun_flags; /**< Tunnel flags. */
> +
> +       bool       is_ipv6; /**< True for valid IPv6 fields. Otherwise IPv4. */
> +
> +       /**
> +        * the following members are required to restore packet
> +        * after miss
> +        */
> +       uint8_t    tos; /**< TOS for IPv4, TC for IPv6. */
> +       uint8_t    ttl; /**< TTL for IPv4, HL for IPv6. */
> +       uint32_t label; /**< Flow Label for IPv6. */
> +};
> +
> +/**
> + * Indicate that the packet has a tunnel.
> + */
> +#define RTE_FLOW_RESTORE_INFO_TUNNEL  (1ULL << 0)
> +
> +/**
> + * Indicate that the packet has a non decapsulated tunnel header.
> + */
> +#define RTE_FLOW_RESTORE_INFO_ENCAPSULATED  (1ULL << 1)
> +
> +/**
> + * Indicate that the packet has a group_id.
> + */
> +#define RTE_FLOW_RESTORE_INFO_GROUP_ID  (1ULL << 2)
> +
> +/**
> + * Restore information structure to communicate the current packet processing
> + * state when some of the processing pipeline is done in hardware and should
> + * continue in software.
> + */
> +struct rte_flow_restore_info {
> +       /**
> +        * Bitwise flags (RTE_FLOW_RESTORE_INFO_*) to indicate validation of
> +        * other fields in struct rte_flow_restore_info.
> +        */
> +       uint64_t flags;
> +       uint32_t group_id; /**< Group ID where packet missed */
> +       struct rte_flow_tunnel tunnel; /**< Tunnel information. */
> +};
> +
> +/**
> + * Allocate an array of actions to be used in rte_flow_create, to implement
> + * tunnel-decap-set for the given tunnel.
> + * Sample usage:
> + *   actions vxlan_decap / tunnel-decap-set(tunnel properties) /
> + *            jump group 0 / end
> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param[in] tunnel
> + *   Tunnel properties.
> + * @param[out] actions
> + *   Array of actions to be allocated by the PMD. This array should be
> + *   concatenated with the actions array provided to rte_flow_create.
> + * @param[out] num_of_actions
> + *   Number of actions allocated.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL. PMDs initialize this
> + *   structure in case of error only.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_tunnel_decap_set(uint16_t port_id,
> +                         struct rte_flow_tunnel *tunnel,
> +                         struct rte_flow_action **actions,
> +                         uint32_t *num_of_actions,
> +                         struct rte_flow_error *error);
> +
> +/**
> + * Allocate an array of items to be used in rte_flow_create, to implement
> + * tunnel-match for the given tunnel.
> + * Sample usage:
> + *   pattern tunnel-match(tunnel properties) / outer-header-matches /
> + *           inner-header-matches / end
> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param[in] tunnel
> + *   Tunnel properties.
> + * @param[out] items
> + *   Array of items to be allocated by the PMD. This array should be
> + *   concatenated with the items array provided to rte_flow_create.
> + * @param[out] num_of_items
> + *   Number of items allocated.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL. PMDs initialize this
> + *   structure in case of error only.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_tunnel_match(uint16_t port_id,
> +                     struct rte_flow_tunnel *tunnel,
> +                     struct rte_flow_item **items,
> +                     uint32_t *num_of_items,
> +                     struct rte_flow_error *error);
> +
> +/**
> + * Populate the current packet processing state, if exists, for the given mbuf.
> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param[in] m
> + *   Mbuf struct.
> + * @param[out] info
> + *   Restore information. Upon success contains the HW state.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL. PMDs initialize this
> + *   structure in case of error only.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_get_restore_info(uint16_t port_id,
> +                         struct rte_mbuf *m,
> +                         struct rte_flow_restore_info *info,
> +                         struct rte_flow_error *error);
> +
> +/**
> + * Release the action array as allocated by rte_flow_tunnel_decap_set.
> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param[in] actions
> + *   Array of actions to be released.
> + * @param[in] num_of_actions
> + *   Number of elements in actions array.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL. PMDs initialize this
> + *   structure in case of error only.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_tunnel_action_decap_release(uint16_t port_id,
> +                                    struct rte_flow_action *actions,
> +                                    uint32_t num_of_actions,
> +                                    struct rte_flow_error *error);
> +
> +/**
> + * Release the item array as allocated by rte_flow_tunnel_match.
> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param[in] items
> + *   Array of items to be released.
> + * @param[in] num_of_items
> + *   Number of elements in item array.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL. PMDs initialize this
> + *   structure in case of error only.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_tunnel_item_release(uint16_t port_id,
> +                            struct rte_flow_item *items,
> +                            uint32_t num_of_items,
> +                            struct rte_flow_error *error);
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_ethdev/rte_flow_driver.h b/lib/librte_ethdev/rte_flow_driver.h
> index 3ee871d3eb..9d87407203 100644
> --- a/lib/librte_ethdev/rte_flow_driver.h
> +++ b/lib/librte_ethdev/rte_flow_driver.h
> @@ -108,6 +108,38 @@ struct rte_flow_ops {
>                  void **context,
>                  uint32_t nb_contexts,
>                  struct rte_flow_error *err);
> +       /** See rte_flow_tunnel_decap_set() */
> +       int (*tunnel_decap_set)
> +               (struct rte_eth_dev *dev,
> +                struct rte_flow_tunnel *tunnel,
> +                struct rte_flow_action **pmd_actions,
> +                uint32_t *num_of_actions,
> +                struct rte_flow_error *err);
> +       /** See rte_flow_tunnel_match() */
> +       int (*tunnel_match)
> +               (struct rte_eth_dev *dev,
> +                struct rte_flow_tunnel *tunnel,
> +                struct rte_flow_item **pmd_items,
> +                uint32_t *num_of_items,
> +                struct rte_flow_error *err);
> +       /** See rte_flow_get_restore_info() */
> +       int (*get_restore_info)
> +               (struct rte_eth_dev *dev,
> +                struct rte_mbuf *m,
> +                struct rte_flow_restore_info *info,
> +                struct rte_flow_error *err);
> +       /** See rte_flow_tunnel_action_decap_release() */
> +       int (*action_release)
> +               (struct rte_eth_dev *dev,
> +                struct rte_flow_action *pmd_actions,
> +                uint32_t num_of_actions,
> +                struct rte_flow_error *err);
> +       /** See rte_flow_tunnel_item_release() */
> +       int (*item_release)
> +               (struct rte_eth_dev *dev,
> +                struct rte_flow_item *pmd_items,
> +                uint32_t num_of_items,
> +                struct rte_flow_error *err);
>  };
>
>  /**
> --
> 2.28.0
>

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH v4 2/4] ethdev: tunnel offload model
  2020-10-06  9:47     ` Sriharsha Basavapatna
@ 2020-10-07 12:36       ` Gregory Etelson
  2020-10-14 17:23         ` Ferruh Yigit
  0 siblings, 1 reply; 95+ messages in thread
From: Gregory Etelson @ 2020-10-07 12:36 UTC (permalink / raw)
  To: Sriharsha Basavapatna, dev
  Cc: Gregory Etelson, Eli Britstein, Oz Shlomo, Ori Kam

Hello Harsha,

> -----Original Message-----

[snip]
> 
> Tunnel vport is an internal construct used by one specific
> application: OVS. So, shouldn't the rte APIs also be application
> agnostic apart from being vendor agnostic ? For OVS, the match fields
> in the existing datapath flow rules contain enough information to
> identify the tunnel type.

Tunnel offload model was inspired by OVS vport, but it is not part of the existing API.
It looks like the API documentation should not use that term to avoid confusion.

[snip]

[snip]
> 
> Wouldn't it be better if the APIs do not refer to vports and avoid
> percolating it down to the PMD ? My point here is to avoid bringing in
> the knowledge of an application specific virtual object (vport) to the
> PMD.
> 

As I have mentioned above, the API description should not mention vport.
I'll post updated documents. 

> Here's some other issues that I see with the helper APIs and
> vendor-specific variable actions.
> 1) The application needs some kind of validation (or understanding) of
> the actions returned by the PMD. The application can't just blindly
> use the actions specified by the PMD. That is, the decision to pick
> the set of actions can't be left entirely to the PMD.
> 2) The application needs to learn a PMD-specific way of action
> processing for each vendor. For example, how should the application
> handle flow-miss, given a different set of actions between two vendors
> (if one vendor has already popped the tunnel header while the other
> one hasn't).
> 3) The end-users/customers won't have a common interface (as in,
> consistent actions) to perform tunnel decap action. This becomes a
> manageability/maintenance issue for the application while working with
> different vendors.
> 
> IMO, the API shouldn't expect the PMD to understand the notion of
> vport. The goal here is to offload a flow rule to decap the tunnel
> header and forward the packet to a HW endpoint.  The problem is that
> we don't have a way to express the "tnl_pop" datapath action to the HW
> (decap flow #1, in the context of br-phy in OVS-DPDK) and also we may
> not want the HW to really pop the tunnel header at that stage. If this
> cannot be expressed with existing rte action types, maybe we should
> introduce a new action that clearly defines what is expected to the
> PMD.

The Tunnel Offload API provides a common interface for all HW vendors:
Rule #1: define tunneled traffic and steer / group the traffic related to
that tunnel.
Rule #2: within the tunnel selection, run matchers on all packet headers,
outer and inner, and perform actions on the inner headers in case of a match.
For rule #1 the application provides tunnel matchers and traffic selection
actions, and for rule #2 the application provides full header matchers and
actions for the inner parts. The rest is supplied by the PMD according to
the HW and rule type. The application does not need to understand the exact
PMD implementation of these elements.
The helper return value notifies the application whether it received the
requested PMD elements or not. If the helper completed successfully, the
application received the required elements and can complete flow rule
compilation. As a result, a packet will either be fully offloaded or
returned to the application with enough information to continue
processing in SW.
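
To make this concrete, here is a rough sketch (not part of the patch) of how
an application could build rule #1. The function name, the fixed-size merge
buffer and the VXLAN parameters are only illustrative; rule #2 is composed
the same way from rte_flow_tunnel_match() / rte_flow_tunnel_item_release()
in the group that the jump action targets.

  #include <string.h>
  #include <rte_byteorder.h>
  #include <rte_flow.h>

  static int
  offload_tunnel_steering(uint16_t port_id)
  {
          struct rte_flow_error error;
          struct rte_flow_tunnel tunnel = {
                  .type = RTE_FLOW_ITEM_TYPE_VXLAN,
                  .tun_id = 10,
          };
          struct rte_flow_attr attr = { .ingress = 1, .group = 0 };
          struct rte_flow_item_udp udp_spec = {
                  .hdr.dst_port = RTE_BE16(4789) };
          struct rte_flow_item_udp udp_mask = {
                  .hdr.dst_port = RTE_BE16(0xffff) };
          /* application part of the steering rule: tunnel matchers ... */
          struct rte_flow_item pattern[] = {
                  { .type = RTE_FLOW_ITEM_TYPE_ETH },
                  { .type = RTE_FLOW_ITEM_TYPE_IPV4 },
                  { .type = RTE_FLOW_ITEM_TYPE_UDP,
                    .spec = &udp_spec, .mask = &udp_mask },
                  { .type = RTE_FLOW_ITEM_TYPE_VXLAN },
                  { .type = RTE_FLOW_ITEM_TYPE_END },
          };
          /* ... and traffic selection actions (jump to the tunnel group) */
          struct rte_flow_action_jump jump = { .group = 3 };
          struct rte_flow_action app_actions[] = {
                  { .type = RTE_FLOW_ACTION_TYPE_JUMP, .conf = &jump },
                  { .type = RTE_FLOW_ACTION_TYPE_END },
          };
          struct rte_flow_action rule_actions[8];
          struct rte_flow_action *pmd_actions;
          uint32_t i, n_pmd_actions;
          struct rte_flow *flow;

          /* query PMD elements for this tunnel */
          if (rte_flow_tunnel_decap_set(port_id, &tunnel, &pmd_actions,
                                        &n_pmd_actions, &error))
                  return -1;
          /* PMD actions first, application actions (ending with END) after */
          for (i = 0; i < n_pmd_actions && i < 6; i++)
                  rule_actions[i] = pmd_actions[i];
          memcpy(&rule_actions[i], app_actions, sizeof(app_actions));
          flow = rte_flow_create(port_id, &attr, pattern,
                                 rule_actions, &error);
          /* PMD actions are no longer needed once the rule was created */
          rte_flow_tunnel_action_decap_release(port_id, pmd_actions,
                                               n_pmd_actions, &error);
          return flow == NULL ? -1 : 0;
  }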

[snip]

[snip]

> > Miss handling
> > -------------
> > Packets going through multiple rte_flow groups are exposed to hw
> > misses due to partial packet processing. In such cases, the software
> > should continue the packet's processing from the point where the
> > hardware missed.
> 
> Whether the packet goes through multiple groups or not for tunnel
> decap processing, should be left to the PMD/HW.  These assumptions
> shouldn't be built into the APIs. The encapsulated packet (i,e with
> outer headers) should be provided to the application, rather than
> making SW understand that there was a miss in stage-1, or stage-n in
> HW. That is, HW either processes it entirely, or punts the whole
> packet to SW if there's a miss. And the packet should take the normal
> processing path in SW (no action offload).
> 
> Thanks,
> -Harsha

The packet is provided to the application via the standard rte_eth_rx_burst API.
Additional information about the HW packet processing state is provided to
the application by the suggested rte_flow_get_restore_info API. It is up to the
application whether to use the provided info, or even whether to call this API at all.
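
For example, a rough sketch of that receive path (handle_tunnel_miss() and
handle_packet() below are hypothetical application callbacks, not part of
the API):

  #include <rte_ethdev.h>
  #include <rte_flow.h>

  static void handle_tunnel_miss(struct rte_mbuf *m,
                                 const struct rte_flow_restore_info *info);
  static void handle_packet(struct rte_mbuf *m);

  static void
  poll_port(uint16_t port_id, uint16_t queue_id)
  {
          struct rte_mbuf *pkts[32];
          uint16_t i, n;

          n = rte_eth_rx_burst(port_id, queue_id, pkts, RTE_DIM(pkts));
          for (i = 0; i < n; i++) {
                  struct rte_flow_restore_info info;
                  struct rte_flow_error error;

                  if (rte_flow_get_restore_info(port_id, pkts[i],
                                                &info, &error) == 0 &&
                      (info.flags & RTE_FLOW_RESTORE_INFO_TUNNEL)) {
                          /* info.tunnel describes the tunnel header;
                           * when RTE_FLOW_RESTORE_INFO_GROUP_ID is set,
                           * info.group_id tells where the miss occurred. */
                          handle_tunnel_miss(pkts[i], &info);
                  } else {
                          handle_packet(pkts[i]);
                  }
          }
  }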

[snip]

Regards,
Gregory

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH v4 2/4] ethdev: tunnel offload model
  2020-10-07 12:36       ` Gregory Etelson
@ 2020-10-14 17:23         ` Ferruh Yigit
  2020-10-16  9:15           ` Gregory Etelson
  0 siblings, 1 reply; 95+ messages in thread
From: Ferruh Yigit @ 2020-10-14 17:23 UTC (permalink / raw)
  To: Gregory Etelson, Sriharsha Basavapatna, dev
  Cc: Eli Britstein, Oz Shlomo, Ori Kam

On 10/7/2020 1:36 PM, Gregory Etelson wrote:
> Hello Harsha,
> 
>> -----Original Message-----
> 
> [snip]
>>
>> Tunnel vport is an internal construct used by one specific
>> application: OVS. So, shouldn't the rte APIs also be application
>> agnostic apart from being vendor agnostic ? For OVS, the match fields
>> in the existing datapath flow rules contain enough information to
>> identify the tunnel type.
> 
> Tunnel offload model was inspired by OVS vport, but it is not part of the existing API.
> It looks like the API documentation should not use that term to avoid confusion.
> 
> [snip]
> 
> [snip]
>>
>> Wouldn't it be better if the APIs do not refer to vports and avoid
>> percolating it down to the PMD ? My point here is to avoid bringing in
>> the knowledge of an application specific virtual object (vport) to the
>> PMD.
>>
> 
> As I have mentioned above, the API description should not mention vport.
> I'll post updated documents.
> 
>> Here's some other issues that I see with the helper APIs and
>> vendor-specific variable actions.
>> 1) The application needs some kind of validation (or understanding) of
>> the actions returned by the PMD. The application can't just blindly
>> use the actions specified by the PMD. That is, the decision to pick
>> the set of actions can't be left entirely to the PMD.
>> 2) The application needs to learn a PMD-specific way of action
>> processing for each vendor. For example, how should the application
>> handle flow-miss, given a different set of actions between two vendors
>> (if one vendor has already popped the tunnel header while the other
>> one hasn't).
>> 3) The end-users/customers won't have a common interface (as in,
>> consistent actions) to perform tunnel decap action. This becomes a
>> manageability/maintenance issue for the application while working with
>> different vendors.
>>
>> IMO, the API shouldn't expect the PMD to understand the notion of
>> vport. The goal here is to offload a flow rule to decap the tunnel
>> header and forward the packet to a HW endpoint.  The problem is that
>> we don't have a way to express the "tnl_pop" datapath action to the HW
>> (decap flow #1, in the context of br-phy in OVS-DPDK) and also we may
>> not want the HW to really pop the tunnel header at that stage. If this
>> cannot be expressed with existing rte action types, maybe we should
>> introduce a new action that clearly defines what is expected to the
>> PMD.
> 
> The Tunnel Offload API provides a common interface for all HW vendors:
> Rule #1: define tunneled traffic and steer / group the traffic related to
> that tunnel.
> Rule #2: within the tunnel selection, run matchers on all packet headers,
> outer and inner, and perform actions on the inner headers in case of a match.
> For rule #1 the application provides tunnel matchers and traffic selection
> actions, and for rule #2 the application provides full header matchers and
> actions for the inner parts. The rest is supplied by the PMD according to
> the HW and rule type. The application does not need to understand the exact
> PMD implementation of these elements.
> The helper return value notifies the application whether it received the
> requested PMD elements or not. If the helper completed successfully, the
> application received the required elements and can complete flow rule
> compilation. As a result, a packet will either be fully offloaded or
> returned to the application with enough information to continue
> processing in SW.
> 
> [snip]
> 
> [snip]
> 
>>> Miss handling
>>> -------------
>>> Packets going through multiple rte_flow groups are exposed to hw
>>> misses due to partial packet processing. In such cases, the software
>>> should continue the packet's processing from the point where the
>>> hardware missed.
>>
>> Whether the packet goes through multiple groups or not for tunnel
>> decap processing, should be left to the PMD/HW.  These assumptions
>> shouldn't be built into the APIs. The encapsulated packet (i.e. with
>> outer headers) should be provided to the application, rather than
>> making SW understand that there was a miss in stage-1, or stage-n in
>> HW. That is, HW either processes it entirely, or punts the whole
>> packet to SW if there's a miss. And the packet should take the normal
>> processing path in SW (no action offload).
>>
>> Thanks,
>> -Harsha
> 
> The packet is provided to the application via the standard rte_eth_rx_burst API.
> Additional information about the HW packet processing state is provided to
> the application by the suggested rte_flow_get_restore_info API. It is up to the
> application whether to use the provided info, or even whether to call this API at all.
> 
> [snip]
> 
> Regards,
> Gregory
> 


Hi Gregory, Sriharsha,

Is there any output of the discussion?

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH v4 0/4] Tunnel Offload API
  2020-10-04 13:50 ` [dpdk-dev] [PATCH v4 0/4] Tunnel Offload API Gregory Etelson
                     ` (3 preceding siblings ...)
  2020-10-04 13:50   ` [dpdk-dev] [PATCH v4 4/4] app/testpmd: add commands for " Gregory Etelson
@ 2020-10-14 17:25   ` Ferruh Yigit
  4 siblings, 0 replies; 95+ messages in thread
From: Ferruh Yigit @ 2020-10-14 17:25 UTC (permalink / raw)
  To: Gregory Etelson, dev; +Cc: matan, rasland, elibr, ozsh, ajit.khaparde

On 10/4/2020 2:50 PM, Gregory Etelson wrote:
> Tunnel Offload API provides hardware independent, unified model
> to offload tunneled traffic. Key model elements are:
>   - apply matches to both outer and inner packet headers
>     during entire offload procedure;
>   - restore outer header of partially offloaded packet;
>   - model is implemented as a set of helper functions.
> 
>   v2:
>   * documentation updates
>   * MLX5 PMD implementation for tunnel offload
>   * testpmd updates for tunnel offload
> 
>   v3:
>   * documentation updates
>   * MLX5 PMD updates
>   * testpmd updates
> 
>   v4:
>   * updated patch: allow negative values in flow rule types
> 
> 
> Eli Britstein (1):
>    ethdev: tunnel offload model
> 
> Gregory Etelson (3):
>    ethdev: allow negative values in flow rule types
>    net/mlx5: implement tunnel offload API
>    app/testpmd: add commands for tunnel offload API
> 

Hi Gregory,

If there are no more discussion results you are waiting for, can you please
rebase your patch on top of the latest next-net?


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH v4 1/4] ethdev: allow negative values in flow rule types
  2020-10-04 13:50   ` [dpdk-dev] [PATCH v4 1/4] ethdev: allow negative values in flow rule types Gregory Etelson
@ 2020-10-14 23:40     ` Thomas Monjalon
  0 siblings, 0 replies; 95+ messages in thread
From: Thomas Monjalon @ 2020-10-14 23:40 UTC (permalink / raw)
  To: Gregory Etelson
  Cc: dev, matan, rasland, elibr, ozsh, ajit.khaparde, Ori Kam,
	Viacheslav Ovsiienko, Ferruh Yigit, Andrew Rybchenko

04/10/2020 15:50, Gregory Etelson:
> From: Gregory Etelson <getelson@mellanox.com>
> 
> RTE flow items & actions use positive values in item & action type.
> Negative values are reserved for PMD private types. PMD
> items & actions usually are not exposed to application and are not
> used to create RTE flows.
> 
> The patch allows applications with access to PMD flow
> items & actions ability to integrate RTE and PMD items & actions
> and use them to create flow rule.
> 
> RTE flow item or action coversion library accepts positive known

typo: coversion

> element types with predefined sizes only. Private PMD items and
> actions do not fit into this scheme becase PMD type values are
> negative, each PMD has it's own types numeration and element types and
> their sizes are not visible at RTE level.  To resolve these
> limitations the patch proposes this solution:
> 1. PMD can to expose elements of pointer size only.  RTE flow

typo: "can to"

>    conversion functions will use pointer size for each configuration
>    object in private PMD element it processes;
> 2. RTE flow verification will not reject elements with negative type.
> 
> Signed-off-by: Gregory Etelson <getelson@mellanox.com>

Don't you want to use nvidia email?

> Acked-by: Ori Kam <orika@nvidia.com>
> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

I was not aware of negative PMD types in rte_flow.
Allowing to use these opaque PMD types for magic complex rule looks OK.




^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH v4 2/4] ethdev: tunnel offload model
  2020-10-04 13:50   ` [dpdk-dev] [PATCH v4 2/4] ethdev: tunnel offload model Gregory Etelson
  2020-10-06  9:47     ` Sriharsha Basavapatna
@ 2020-10-14 23:55     ` Thomas Monjalon
  1 sibling, 0 replies; 95+ messages in thread
From: Thomas Monjalon @ 2020-10-14 23:55 UTC (permalink / raw)
  To: Gregory Etelson
  Cc: dev, matan, rasland, ozsh, ajit.khaparde, Eli Britstein, Ori Kam,
	Viacheslav Ovsiienko, Ray Kinsella, Neil Horman, Ferruh Yigit,
	Andrew Rybchenko, asafp

Formatting review below (someone has to do it):

04/10/2020 15:50, Gregory Etelson:
> --- a/doc/guides/prog_guide/rte_flow.rst
> +++ b/doc/guides/prog_guide/rte_flow.rst
> @@ -3034,6 +3034,111 @@ operations include:
> +Tunneled traffic offload
> +~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Provide software application with unified rules model for tunneled traffic
> +regardless underlying hardware.
> +
> + - The model introduces a concept of a virtual tunnel port (VTP).

Given the API does not use this terminology, it is confusing to find
this wording in the doc.

> + - The model uses VTP to offload ingress tunneled network traffic 
> +   with RTE flow rules.
> + - The model is implemented as set of helper functions. Each PMD
> +   implements VTP offload according to underlying hardware offload
> +   capabilities.  Applications must query PMD for VTP flow
> +   items / actions before using in creation of a VTP flow rule.
> +
> +The model components:
> +
> +- Virtual Tunnel Port (VTP) is a stateless software object that
> +  describes tunneled network traffic.  VTP object usually contains
> +  descriptions of outer headers, tunnel headers and inner headers.
> +- Tunnel Steering flow Rule (TSR) detects tunneled packets and
> +  delegates them to tunnel processing infrastructure, implemented
> +  in PMD for optimal hardware utilization, for further processing.
> +- Tunnel Matching flow Rule (TMR) verifies packet configuration and
> +  runs offload actions in case of a match.

I'm not a fan of all those acronyms.
It makes reading more difficult in my opinion.

> +
> +Application actions:
> +
> +1 Initialize VTP object according to tunnel network parameters.
> +
> +2 Create TSR flow rule.
> +
> +2.1 Query PMD for VTP actions. Application can query for VTP actions more than once.
> +
> +  .. code-block:: c
> +
> +    int
> +    rte_flow_tunnel_decap_set(uint16_t port_id,
> +                              struct rte_flow_tunnel *tunnel,
> +                              struct rte_flow_action **pmd_actions,
> +                              uint32_t *num_of_pmd_actions,
> +                              struct rte_flow_error *error);
> +
> +2.2 Integrate PMD actions into TSR actions list.
> +
> +2.3 Create TSR flow rule.
> +
> +    .. code-block:: console
> +
> +      flow create <port> group 0 match {tunnel items} / end actions {PMD actions} / {App actions} / end

Not sure about the testpmd syntax here.
Is it bringing a value compared to some text description?

> +
> +3 Create TMR flow rule.
> +
> +3.1 Query PMD for VTP items. Application can query for VTP items more than once.
> +
> +    .. code-block:: c
> +
> +      int
> +      rte_flow_tunnel_match(uint16_t port_id,
> +                            struct rte_flow_tunnel *tunnel,
> +                            struct rte_flow_item **pmd_items,
> +                            uint32_t *num_of_pmd_items,
> +                            struct rte_flow_error *error);
> +
> +3.2 Integrate PMD items into TMR items list.
> +
> +3.3 Create TMR flow rule.
> +
> +    .. code-block:: console
> +
> +      flow create <port> group 0 match {PMD items} / {APP items} / end actions {offload actions} / end
> +
> +The model provides helper function call to restore packets that miss
> +tunnel TMR rules to its original state:
> +
> +.. code-block:: c
> +
> +  int
> +  rte_flow_get_restore_info(uint16_t port_id,
> +                            struct rte_mbuf *mbuf,
> +                            struct rte_flow_restore_info *info,
> +                            struct rte_flow_error *error);
> +
> +rte_tunnel object filled by the call inside
> +``rte_flow_restore_info *info parameter`` can be used by the application
> +to create new TMR rule for that tunnel.
> +
> +The model requirements:

Should it be a section title?

> +
> +Software application must initialize
> +rte_tunnel object with tunnel parameters before calling
> +rte_flow_tunnel_decap_set() & rte_flow_tunnel_match().

It is preferred having code symbols between double backquotes.

> +
> +PMD actions array obtained in rte_flow_tunnel_decap_set() must be
> +released by application with rte_flow_action_release() call.
> +Application can release the actionsfter TSR rule was created.

typo: actionsfter

> +
> +PMD items array obtained with rte_flow_tunnel_match() must be released
> +by application with rte_flow_item_release() call.  Application can
> +release the items after rule was created. However, if the application
> +needs to create additional TMR rule for the same tunnel it will need
> +to obtain PMD items again.
> +
> +Application cannot destroy rte_tunnel object before it releases all
> +PMD actions & PMD items referencing that tunnel.
[...]
> --- a/doc/guides/rel_notes/release_20_11.rst
> +++ b/doc/guides/rel_notes/release_20_11.rst
> @@ -62,6 +62,15 @@ New Features
>    * Added support for 200G PAM4 link speed.
>    * Added support for RSS hash level selection.
>    * Updated HWRM structures to 1.10.1.70 version.
> +* **Flow rules allowed to use private PMD items / actions.**
> +
> +  * Flow rule verification was updated to accept private PMD
> +    items and actions.

This should be in the previous patch, but not sure it's worth noting at all.

> +
> +* **Added generic API to offload tunneled traffic and restore missed packet.**
> +
> +  * Added a new hardware independent helper API to RTE flow library that
> +    offloads tunneled traffic and restores missed packets.

Here and elsewhere, "hardware independent" is implied for rte_flow API.
Please write "flow API" or "rte_flow".
This block should be before driver ones, with other ethdev features.

> --- a/lib/librte_ethdev/rte_ethdev_version.map
> +++ b/lib/librte_ethdev/rte_ethdev_version.map
> @@ -226,6 +226,12 @@ EXPERIMENTAL {
>  	rte_tm_wred_profile_add;
>  	rte_tm_wred_profile_delete;
>  
> +	rte_flow_tunnel_decap_set;
> +	rte_flow_tunnel_match;
> +	rte_flow_get_restore_info;
> +	rte_flow_tunnel_action_decap_release;
> +	rte_flow_tunnel_item_release;

It is for 20.11 now, so should be placed below.

> +
>  	# added in 20.11
>  	rte_eth_link_speed_to_str;
>  	rte_eth_link_to_str;




^ permalink raw reply	[flat|nested] 95+ messages in thread

* [dpdk-dev] [PATCH v5 0/3] Tunnel Offload API
  2020-06-25 16:03 [dpdk-dev] [PATCH 0/2] ethdev: tunnel offload model Gregory Etelson
                   ` (5 preceding siblings ...)
  2020-10-04 13:50 ` [dpdk-dev] [PATCH v4 0/4] Tunnel Offload API Gregory Etelson
@ 2020-10-15 12:41 ` Gregory Etelson
  2020-10-15 12:41   ` [dpdk-dev] [PATCH v5 1/3] ethdev: allow negative values in flow rule types Gregory Etelson
                     ` (3 more replies)
  2020-10-16  8:55 ` [dpdk-dev] [PATCH v6 " Gregory Etelson
                   ` (10 subsequent siblings)
  17 siblings, 4 replies; 95+ messages in thread
From: Gregory Etelson @ 2020-10-15 12:41 UTC (permalink / raw)
  To: dev; +Cc: getelson, matan, rasland, elibr, ozsh

Tunnel Offload API provides hardware independent, unified model
to offload tunneled traffic. Key model elements are:
 - apply matches to both outer and inner packet headers
   during entire offload procedure;
 - restore outer header of partially offloaded packet;
 - model is implemented as a set of helper functions.

 v2:
 * documentation updates
 * MLX5 PMD implementation for tunnel offload
 * testpmd updates for tunnel offload

 v3:
 * documentation updates
 * MLX5 PMD updates
 * testpmd updates

 v4:
 * updated patch: allow negative values in flow rule types

v5:
 * rebase to next-net

Eli Britstein (1):
  ethdev: tunnel offload model

Gregory Etelson (2):
  ethdev: allow negative values in flow rule types
  app/testpmd: add commands for tunnel offload API

 app/test-pmd/cmdline_flow.c                 | 170 ++++++++++++-
 app/test-pmd/config.c                       | 252 +++++++++++++++++++-
 app/test-pmd/testpmd.c                      |   5 +-
 app/test-pmd/testpmd.h                      |  34 ++-
 app/test-pmd/util.c                         |  35 ++-
 doc/guides/prog_guide/rte_flow.rst          | 108 +++++++++
 doc/guides/rel_notes/release_20_11.rst      |  10 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  49 ++++
 lib/librte_ethdev/rte_ethdev_version.map    |   5 +
 lib/librte_ethdev/rte_flow.c                | 140 ++++++++++-
 lib/librte_ethdev/rte_flow.h                | 195 +++++++++++++++
 lib/librte_ethdev/rte_flow_driver.h         |  32 +++
 12 files changed, 1016 insertions(+), 19 deletions(-)

-- 
2.28.0


^ permalink raw reply	[flat|nested] 95+ messages in thread

* [dpdk-dev] [PATCH v5 1/3] ethdev: allow negative values in flow rule types
  2020-10-15 12:41 ` [dpdk-dev] [PATCH v5 0/3] " Gregory Etelson
@ 2020-10-15 12:41   ` Gregory Etelson
  2020-10-15 12:41   ` [dpdk-dev] [PATCH v5 2/3] ethdev: tunnel offload model Gregory Etelson
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 95+ messages in thread
From: Gregory Etelson @ 2020-10-15 12:41 UTC (permalink / raw)
  To: dev
  Cc: getelson, matan, rasland, elibr, ozsh, Ori Kam,
	Viacheslav Ovsiienko, Thomas Monjalon, Ferruh Yigit,
	Andrew Rybchenko

RTE flow items & actions use positive values in item & action type.
Negative values are reserved for PMD private types. PMD
items & actions usually are not exposed to application and are not
used to create RTE flows.

The patch allows applications with access to PMD flow
items & actions to integrate RTE and PMD items & actions
and use them to create a flow rule.

RTE flow item or action conversion library accepts positive known
element types with predefined sizes only. Private PMD items and
actions do not fit into this scheme because PMD type values are
negative, each PMD has its own type numeration, and element types and
their sizes are not visible at RTE level.  To resolve these
limitations the patch proposes this solution:
1. PMD can expose elements of pointer size only.  RTE flow
   conversion functions will use pointer size for each configuration
   object in private PMD element it processes;
2. RTE flow verification will not reject elements with negative type.

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

---
v4:
* update the 'Negative types' section in the rtre_flow.rst

* update the patch comment

v5:
* rebase to next-net
---
 doc/guides/prog_guide/rte_flow.rst     |  3 +++
 doc/guides/rel_notes/release_20_11.rst |  5 +++++
 lib/librte_ethdev/rte_flow.c           | 28 ++++++++++++++++++++------
 3 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 55497c9033..d0dd7a00f1 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -2765,6 +2765,9 @@ identifiers they are not aware of.
 
 A method to generate them remains to be defined.
 
+Application may use PMD dynamic items or actions in flow rules. In that case
+size of configuration object in dynamic element must be a pointer size.
+
 Planned types
 ~~~~~~~~~~~~~
 
diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index 3181fdb6a7..55bcb5e91f 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -92,6 +92,11 @@ New Features
   * Updated HWRM structures to 1.10.1.70 version.
   * Added TRUFLOW support for Stingray devices.
 
+* **Flow rules allowed to use private PMD items / actions.**
+
+  * Flow rule verification was updated to accept private PMD
+    items and actions.
+
 * **Updated Cisco enic driver.**
 
   * Added support for VF representors with single-queue Tx/Rx and flow API
diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
index ee4f3464fe..411da1940e 100644
--- a/lib/librte_ethdev/rte_flow.c
+++ b/lib/librte_ethdev/rte_flow.c
@@ -480,7 +480,11 @@ rte_flow_conv_item_spec(void *buf, const size_t size,
 		}
 		break;
 	default:
-		off = rte_flow_desc_item[item->type].size;
+		/**
+		 * allow PMD private flow item
+		 */
+		off = (int)item->type >= 0 ?
+		      rte_flow_desc_item[item->type].size : sizeof(void *);
 		rte_memcpy(buf, data, (size > off ? off : size));
 		break;
 	}
@@ -583,7 +587,11 @@ rte_flow_conv_action_conf(void *buf, const size_t size,
 		}
 		break;
 	default:
-		off = rte_flow_desc_action[action->type].size;
+		/**
+		 * allow PMD private flow action
+		 */
+		off = (int)action->type >= 0 ?
+		      rte_flow_desc_action[action->type].size : sizeof(void *);
 		rte_memcpy(buf, action->conf, (size > off ? off : size));
 		break;
 	}
@@ -625,8 +633,12 @@ rte_flow_conv_pattern(struct rte_flow_item *dst,
 	unsigned int i;
 
 	for (i = 0, off = 0; !num || i != num; ++i, ++src, ++dst) {
-		if ((size_t)src->type >= RTE_DIM(rte_flow_desc_item) ||
-		    !rte_flow_desc_item[src->type].name)
+		/**
+		 * allow PMD private flow item
+		 */
+		if (((int)src->type >= 0) &&
+			((size_t)src->type >= RTE_DIM(rte_flow_desc_item) ||
+		    !rte_flow_desc_item[src->type].name))
 			return rte_flow_error_set
 				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM, src,
 				 "cannot convert unknown item type");
@@ -714,8 +726,12 @@ rte_flow_conv_actions(struct rte_flow_action *dst,
 	unsigned int i;
 
 	for (i = 0, off = 0; !num || i != num; ++i, ++src, ++dst) {
-		if ((size_t)src->type >= RTE_DIM(rte_flow_desc_action) ||
-		    !rte_flow_desc_action[src->type].name)
+		/**
+		 * allow PMD private flow action
+		 */
+		if (((int)src->type >= 0) &&
+		    ((size_t)src->type >= RTE_DIM(rte_flow_desc_action) ||
+		    !rte_flow_desc_action[src->type].name))
 			return rte_flow_error_set
 				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION,
 				 src, "cannot convert unknown action type");
-- 
2.28.0


^ permalink raw reply	[flat|nested] 95+ messages in thread

* [dpdk-dev] [PATCH v5 2/3] ethdev: tunnel offload model
  2020-10-15 12:41 ` [dpdk-dev] [PATCH v5 0/3] " Gregory Etelson
  2020-10-15 12:41   ` [dpdk-dev] [PATCH v5 1/3] ethdev: allow negative values in flow rule types Gregory Etelson
@ 2020-10-15 12:41   ` Gregory Etelson
  2020-10-15 12:41   ` [dpdk-dev] [PATCH v5 3/3] app/testpmd: add commands for tunnel offload API Gregory Etelson
  2020-10-15 22:47   ` [dpdk-dev] [PATCH v5 0/3] Tunnel Offload API Ferruh Yigit
  3 siblings, 0 replies; 95+ messages in thread
From: Gregory Etelson @ 2020-10-15 12:41 UTC (permalink / raw)
  To: dev
  Cc: getelson, matan, rasland, elibr, ozsh, Eli Britstein, Ori Kam,
	Viacheslav Ovsiienko, Ray Kinsella, Neil Horman, Thomas Monjalon,
	Ferruh Yigit, Andrew Rybchenko

From: Eli Britstein <elibr@mellanox.com>

The rte_flow API provides the building blocks for vendor agnostic flow
classification offloads.  The rte_flow match and action primitives are
fine grained, giving DPDK applications the flexibility to
offload network stacks and complex pipelines.

Applications wishing to offload complex data structures (e.g. tunnel
virtual ports) are required to use the rte_flow primitives, such as
group, meta, mark, tag and others to model their high level objects.

The hardware model design for high level software objects is not
trivial.  Furthermore, an optimal design is often vendor specific.

The goal of this API is to provide applications with the hardware
offload model for common high level software objects which is optimal
in regards to the underlying hardware.

Tunnel ports are the first of such objects.

Tunnel ports
------------
Ingress processing of tunneled traffic requires the classification of
the tunnel type followed by a decap action.

In software, once a packet is decapsulated, the in_port field is
changed to a virtual port representing the tunnel type. The outer
header fields are stored as packet metadata members and may be matched
by subsequent flows.

Open vSwitch, for example, uses two flows:
1. Classification flow - sets the virtual port representing the
   tunnel type. For example: match on udp port 4789,
   actions=tnl_pop(vxlan_vport).
2. Steering flow according to outer and inner header matches - match on
   in_port=vxlan_vport and outer/inner header matches,
   actions=forward to port X.
The benefits of multi-flow tables are described in [1].

Offloading tunnel ports
-----------------------
Tunnel ports introduce a new stateless field that can be matched on.
Currently the rte_flow library provides an API to encap, decap and
match on tunnel headers. However, there is no rte_flow primitive to
set and match tunnel virtual ports.

There are several possible hardware models for offloading virtual
tunnel port flows including, but not limited to, the following:
1. Setting the virtual port on a hw register using the
rte_flow_action_mark/ rte_flow_action_tag/rte_flow_set_meta objects.
2. Mapping a virtual port to an rte_flow group
3. Avoiding the need to match on transient objects by merging
multi-table flows to a single rte_flow rule.

Every approach has its pros and cons.  The preferred approach should
take into account the entire system architecture and is very often
vendor specific.

The proposed rte_flow_tunnel_decap_set helper function (drafted below)
is designed to provide a common, vendor agnostic, API for setting the
virtual port value.  The helper API enables PMD implementations to
return vendor specific combination of rte_flow actions realizing the
vendor's hardware model for setting a tunnel port.  Applications may
append the list of actions returned from the helper function when
creating an rte_flow rule in hardware.

Similarly, the rte_flow_tunnel_match helper (drafted below)
allows for multiple hardware implementations to return a list of
rte_flow items.

Miss handling
-------------
Packets going through multiple rte_flow groups are exposed to hw
misses due to partial packet processing. In such cases, the software
should continue the packet's processing from the point where the
hardware missed.

We propose a generic rte_flow_restore structure providing the state
that was stored in hardware when the packet missed.

Currently, the structure will provide the tunnel state of the packet
that missed, namely:
1. The group id that missed
2. The tunnel port that missed
3. Tunnel information that was stored in memory (due to decap action).
In the future, we may add additional fields as more state may be
stored in the device memory (e.g. ct_state).

Applications may query the state via the new
rte_flow_get_restore_info(mbuf) API, thus allowing
a vendor specific implementation.
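
A minimal sketch, using the helper and flag names introduced by this
patch (mbuf, port_id and the application handlers are placeholders),
of how an application might inspect that state for a packet that
missed in hardware:

    struct rte_flow_restore_info info;
    struct rte_flow_error error;

    if (rte_flow_get_restore_info(port_id, mbuf, &info, &error) == 0) {
            if (info.flags & RTE_FLOW_RESTORE_INFO_GROUP_ID)
                    resume_from_group(mbuf, info.group_id);
            if (info.flags & RTE_FLOW_RESTORE_INFO_TUNNEL)
                    /* info.tunnel describes the tunnel; if
                     * RTE_FLOW_RESTORE_INFO_ENCAPSULATED is also set,
                     * the outer header is still present on the packet.
                     */
                    handle_tunnel_miss(mbuf, &info.tunnel);
    }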

VXLAN Code example:
Assume the application needs to do inner NAT on a VXLAN packet.
The first rule, in group 0:

flow create <port id> ingress group 0
  pattern eth / ipv4 / udp dst is 4789 / vxlan / end
  actions {pmd actions} / jump group 3 / end

The first VXLAN packet that arrives matches the rule in group 0 and jumps
to group 3. In group 3 the packet will miss since there is no flow to
match and will be uploaded to the application.  The application will call
rte_flow_get_restore_info() to get the packet outer header.
The application will then insert a new rule in group 3 to match outer and
inner headers:

flow create <port id> ingress group 3
  pattern {pmd items} / eth / ipv4 dst is 172.10.10.1 /
          udp dst 4789 / vxlan vni is 10 /
          ipv4 dst is 184.1.2.3 / end
  actions  set_ipv4_dst  186.1.1.1 / queue index 3 / end

The result of these rules is that a VXLAN packet with vni=10, outer IPv4
dst=172.10.10.1 and inner IPv4 dst=184.1.2.3 will be received decapsulated
on queue 3 with IPv4 dst=186.1.1.1.

Note: a packet in group 3 is considered decapsulated. All actions in that
group will be done on the header that was inner before decap. The application
may specify the outer header to be matched on.  It is the PMD's responsibility
to translate these items to outer metadata.

API usage:
/**
 * 1. Initialize an RTE flow tunnel object
 */
const struct rte_flow_tunnel tunnel = {
  .type = RTE_FLOW_ITEM_TYPE_VXLAN,
  .tun_id = 10,
};

/**
 * 2. Obtain PMD tunnel actions
 *
 * pmd_actions is an intermediate variable application uses to
 * compile actions array
 */
struct rte_flow_action *pmd_actions;
uint32_t num_pmd_actions;
struct rte_flow_error error;
rte_flow_tunnel_decap_set(port_id, &tunnel, &pmd_actions,
                          &num_pmd_actions, &error);

/**
 * 3. Offload the first rule:
 * match on VXLAN traffic and jump to group 3
 * (implicitly decaps the packet)
 */
app_actions  =   jump group 3
rule_items = app_items;  /** eth / ipv4 / udp / vxlan  */
rule_actions = { pmd_actions, app_actions };
attr.group = 0;
flow_1 = rte_flow_create(port_id, &attr,
                         rule_items, rule_actions, &error);
/**
  * 4. After flow creation the application does not need to keep the
  * tunnel action resources.
  */
rte_flow_tunnel_action_decap_release(port_id, pmd_actions,
                                     num_pmd_actions, &error);

/**
  * 5. After a partially offloaded packet misses because there was no
  * matching rule, handle the miss on group 3
  */
struct rte_flow_restore_info info;
rte_flow_get_restore_info(port_id, mbuf, &info, &error);

/**
 * 6. Offload NAT rule:
 */
app_items = { eth / ipv4 dst is 172.10.10.1 / udp dst 4789 /
            vxlan vni is 10 / ipv4 dst is 184.1.2.3 }
app_actions = { set_ipv4_dst 186.1.1.1 / queue index 3 }

rte_flow_tunnel_match(port_id, &info.tunnel, &pmd_items,
                      &num_pmd_items, &error);
rule_items = {pmd_items, app_items};
rule_actions = app_actions;
attr.group = info.group_id;
flow_2 = rte_flow_create(port_id, &attr,
                         rule_items, rule_actions, &error);

/**
 * 7. Release PMD items after rule creation
 */
rte_flow_tunnel_item_release(port_id, pmd_items, num_pmd_items, &error);

References
1. https://mails.dpdk.org/archives/dev/2020-June/index.html

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

---
v5:

* rebase to next-net
---
 doc/guides/prog_guide/rte_flow.rst       | 105 ++++++++++++
 doc/guides/rel_notes/release_20_11.rst   |   5 +
 lib/librte_ethdev/rte_ethdev_version.map |   5 +
 lib/librte_ethdev/rte_flow.c             | 112 +++++++++++++
 lib/librte_ethdev/rte_flow.h             | 195 +++++++++++++++++++++++
 lib/librte_ethdev/rte_flow_driver.h      |  32 ++++
 6 files changed, 454 insertions(+)

diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index d0dd7a00f1..72ec789c27 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -3121,6 +3121,111 @@ operations include:
 - Duplication of a complete flow rule description.
 - Pattern item or action name retrieval.
 
+Tunneled traffic offload
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Provide the application with a unified flow rule model for tunneled traffic
+regardless of the underlying hardware.
+
+ - The model introduces a concept of a virtual tunnel port (VTP).
+ - The model uses VTP to offload ingress tunneled network traffic 
+   with RTE flow rules.
+ - The model is implemented as set of helper functions. Each PMD
+   implements VTP offload according to underlying hardware offload
+   capabilities.  Applications must query PMD for VTP flow
+   items / actions before using them in creation of a VTP flow rule.
+
+The model components:
+
+- Virtual Tunnel Port (VTP) is a stateless software object that
+  describes tunneled network traffic.  VTP object usually contains
+  descriptions of outer headers, tunnel headers and inner headers.
+- Tunnel Steering flow Rule (TSR) detects tunneled packets and
+  delegates them to tunnel processing infrastructure, implemented
+  in PMD for optimal hardware utilization, for further processing.
+- Tunnel Matching flow Rule (TMR) verifies packet configuration and
+  runs offload actions in case of a match.
+
+Application actions:
+
+1 Initialize VTP object according to tunnel network parameters.
+
+2 Create TSR flow rule.
+
+2.1 Query PMD for VTP actions. Application can query for VTP actions more than once.
+
+  .. code-block:: c
+
+    int
+    rte_flow_tunnel_decap_set(uint16_t port_id,
+                              struct rte_flow_tunnel *tunnel,
+                              struct rte_flow_action **pmd_actions,
+                              uint32_t *num_of_pmd_actions,
+                              struct rte_flow_error *error);
+
+2.2 Integrate PMD actions into TSR actions list.
+
+2.3 Create TSR flow rule.
+
+    .. code-block:: console
+
+      flow create <port> group 0 match {tunnel items} / end actions {PMD actions} / {App actions} / end
+
+3 Create TMR flow rule.
+
+3.1 Query PMD for VTP items. Application can query for VTP items more than once.
+
+    .. code-block:: c
+
+      int
+      rte_flow_tunnel_match(uint16_t port_id,
+                            struct rte_flow_tunnel *tunnel,
+                            struct rte_flow_item **pmd_items,
+                            uint32_t *num_of_pmd_items,
+                            struct rte_flow_error *error);
+
+3.2 Integrate PMD items into TMR items list.
+
+3.3 Create TMR flow rule.
+
+    .. code-block:: console
+
+      flow create <port> group 0 match {PMD items} / {APP items} / end actions {offload actions} / end
+
+The model provides helper function call to restore packets that miss
+tunnel TMR rules to its original state:
+
+.. code-block:: c
+
+  int
+  rte_flow_get_restore_info(uint16_t port_id,
+                            struct rte_mbuf *mbuf,
+                            struct rte_flow_restore_info *info,
+                            struct rte_flow_error *error);
+
+The rte_tunnel object filled by the call inside the
+``rte_flow_restore_info *info`` parameter can be used by the application
+to create a new TMR rule for that tunnel.
+
+The model requirements:
+
+Software application must initialize
+rte_tunnel object with tunnel parameters before calling
+rte_flow_tunnel_decap_set() & rte_flow_tunnel_match().
+
+The PMD actions array obtained in rte_flow_tunnel_decap_set() must be
+released by the application with a rte_flow_tunnel_action_decap_release() call.
+The application can release the actions after the TSR rule was created.
+
+The PMD items array obtained with rte_flow_tunnel_match() must be released
+by the application with a rte_flow_tunnel_item_release() call.  The application
+can release the items after the rule was created. However, if the application
+needs to create an additional TMR rule for the same tunnel it will need
+to obtain the PMD items again.
+
+Application cannot destroy rte_tunnel object before it releases all
+PMD actions & PMD items referencing that tunnel.
+
 Caveats
 -------
 
diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index 55bcb5e91f..a0c6f7264d 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -97,6 +97,11 @@ New Features
   * Flow rule verification was updated to accept private PMD
     items and actions.
 
+* **Added generic API to offload tunneled traffic and restore missed packet.**
+
+  * Added a new hardware independent helper API to RTE flow library that
+    offloads tunneled traffic and restores missed packets.
+
 * **Updated Cisco enic driver.**
 
   * Added support for VF representors with single-queue Tx/Rx and flow API
diff --git a/lib/librte_ethdev/rte_ethdev_version.map b/lib/librte_ethdev/rte_ethdev_version.map
index 31242dabbf..734657737f 100644
--- a/lib/librte_ethdev/rte_ethdev_version.map
+++ b/lib/librte_ethdev/rte_ethdev_version.map
@@ -236,6 +236,11 @@ EXPERIMENTAL {
 	rte_flow_shared_action_destroy;
 	rte_flow_shared_action_query;
 	rte_flow_shared_action_update;
+	rte_flow_tunnel_decap_set;
+	rte_flow_tunnel_match;
+	rte_flow_get_restore_info;
+	rte_flow_tunnel_action_decap_release;
+	rte_flow_tunnel_item_release;
 };
 
 INTERNAL {
diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
index 411da1940e..f2433677c8 100644
--- a/lib/librte_ethdev/rte_flow.c
+++ b/lib/librte_ethdev/rte_flow.c
@@ -1097,3 +1097,115 @@ rte_flow_shared_action_query(uint16_t port_id,
 				       data, error);
 	return flow_err(port_id, ret, error);
 }
+
+int
+rte_flow_tunnel_decap_set(uint16_t port_id,
+			  struct rte_flow_tunnel *tunnel,
+			  struct rte_flow_action **actions,
+			  uint32_t *num_of_actions,
+			  struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->tunnel_decap_set)) {
+		return flow_err(port_id,
+				ops->tunnel_decap_set(dev, tunnel, actions,
+						      num_of_actions, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_tunnel_match(uint16_t port_id,
+		      struct rte_flow_tunnel *tunnel,
+		      struct rte_flow_item **items,
+		      uint32_t *num_of_items,
+		      struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->tunnel_match)) {
+		return flow_err(port_id,
+				ops->tunnel_match(dev, tunnel, items,
+						  num_of_items, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_get_restore_info(uint16_t port_id,
+			  struct rte_mbuf *m,
+			  struct rte_flow_restore_info *restore_info,
+			  struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->get_restore_info)) {
+		return flow_err(port_id,
+				ops->get_restore_info(dev, m, restore_info,
+						      error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_tunnel_action_decap_release(uint16_t port_id,
+				     struct rte_flow_action *actions,
+				     uint32_t num_of_actions,
+				     struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->action_release)) {
+		return flow_err(port_id,
+				ops->action_release(dev, actions,
+						    num_of_actions, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_tunnel_item_release(uint16_t port_id,
+			     struct rte_flow_item *items,
+			     uint32_t num_of_items,
+			     struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->item_release)) {
+		return flow_err(port_id,
+				ops->item_release(dev, items,
+						  num_of_items, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
index d3e8d8a9c2..2471ae740d 100644
--- a/lib/librte_ethdev/rte_flow.h
+++ b/lib/librte_ethdev/rte_flow.h
@@ -3611,6 +3611,201 @@ rte_flow_shared_action_query(uint16_t port_id,
 			     void *data,
 			     struct rte_flow_error *error);
 
+/* Tunnel has a type and the key information. */
+struct rte_flow_tunnel {
+	/**
+	 * Tunnel type, for example RTE_FLOW_ITEM_TYPE_VXLAN,
+	 * RTE_FLOW_ITEM_TYPE_NVGRE etc.
+	 */
+	enum rte_flow_item_type	type;
+	uint64_t tun_id; /**< Tunnel identification. */
+
+	RTE_STD_C11
+	union {
+		struct {
+			rte_be32_t src_addr; /**< IPv4 source address. */
+			rte_be32_t dst_addr; /**< IPv4 destination address. */
+		} ipv4;
+		struct {
+			uint8_t src_addr[16]; /**< IPv6 source address. */
+			uint8_t dst_addr[16]; /**< IPv6 destination address. */
+		} ipv6;
+	};
+	rte_be16_t tp_src; /**< Tunnel port source. */
+	rte_be16_t tp_dst; /**< Tunnel port destination. */
+	uint16_t   tun_flags; /**< Tunnel flags. */
+
+	bool       is_ipv6; /**< True for valid IPv6 fields. Otherwise IPv4. */
+
+	/**
+	 * the following members are required to restore packet
+	 * after miss
+	 */
+	uint8_t    tos; /**< TOS for IPv4, TC for IPv6. */
+	uint8_t    ttl; /**< TTL for IPv4, HL for IPv6. */
+	uint32_t label; /**< Flow Label for IPv6. */
+};
+
+/**
+ * Indicate that the packet has a tunnel.
+ */
+#define RTE_FLOW_RESTORE_INFO_TUNNEL  (1ULL << 0)
+
+/**
+ * Indicate that the packet has a non decapsulated tunnel header.
+ */
+#define RTE_FLOW_RESTORE_INFO_ENCAPSULATED  (1ULL << 1)
+
+/**
+ * Indicate that the packet has a group_id.
+ */
+#define RTE_FLOW_RESTORE_INFO_GROUP_ID  (1ULL << 2)
+
+/**
+ * Restore information structure to communicate the current packet processing
+ * state when some of the processing pipeline is done in hardware and should
+ * continue in software.
+ */
+struct rte_flow_restore_info {
+	/**
+	 * Bitwise flags (RTE_FLOW_RESTORE_INFO_*) to indicate validation of
+	 * other fields in struct rte_flow_restore_info.
+	 */
+	uint64_t flags;
+	uint32_t group_id; /**< Group ID where the packet missed. */
+	struct rte_flow_tunnel tunnel; /**< Tunnel information. */
+};
+
+/**
+ * Allocate an array of actions to be used in rte_flow_create, to implement
+ * tunnel-decap-set for the given tunnel.
+ * Sample usage:
+ *   actions vxlan_decap / tunnel-decap-set(tunnel properties) /
+ *            jump group 0 / end
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] tunnel
+ *   Tunnel properties.
+ * @param[out] actions
+ *   Array of actions to be allocated by the PMD. This array should be
+ *   concatenated with the actions array provided to rte_flow_create.
+ * @param[out] num_of_actions
+ *   Number of actions allocated.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_decap_set(uint16_t port_id,
+			  struct rte_flow_tunnel *tunnel,
+			  struct rte_flow_action **actions,
+			  uint32_t *num_of_actions,
+			  struct rte_flow_error *error);
+
+/**
+ * Allocate an array of items to be used in rte_flow_create, to implement
+ * tunnel-match for the given tunnel.
+ * Sample usage:
+ *   pattern tunnel-match(tunnel properties) / outer-header-matches /
+ *           inner-header-matches / end
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] tunnel
+ *   Tunnel properties.
+ * @param[out] items
+ *   Array of items to be allocated by the PMD. This array should be
+ *   concatenated with the items array provided to rte_flow_create.
+ * @param[out] num_of_items
+ *   Number of items allocated.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_match(uint16_t port_id,
+		      struct rte_flow_tunnel *tunnel,
+		      struct rte_flow_item **items,
+		      uint32_t *num_of_items,
+		      struct rte_flow_error *error);
+
+/**
+ * Populate the current packet processing state, if exists, for the given mbuf.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] m
+ *   Mbuf struct.
+ * @param[out] info
+ *   Restore information. Upon success contains the HW state.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_get_restore_info(uint16_t port_id,
+			  struct rte_mbuf *m,
+			  struct rte_flow_restore_info *info,
+			  struct rte_flow_error *error);
+
+/**
+ * Release the action array as allocated by rte_flow_tunnel_decap_set.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] actions
+ *   Array of actions to be released.
+ * @param[in] num_of_actions
+ *   Number of elements in actions array.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_action_decap_release(uint16_t port_id,
+				     struct rte_flow_action *actions,
+				     uint32_t num_of_actions,
+				     struct rte_flow_error *error);
+
+/**
+ * Release the item array as allocated by rte_flow_tunnel_match.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] items
+ *   Array of items to be released.
+ * @param[in] num_of_items
+ *   Number of elements in item array.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_item_release(uint16_t port_id,
+			     struct rte_flow_item *items,
+			     uint32_t num_of_items,
+			     struct rte_flow_error *error);
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_ethdev/rte_flow_driver.h b/lib/librte_ethdev/rte_flow_driver.h
index 58f56b0262..bd5ffc0bb1 100644
--- a/lib/librte_ethdev/rte_flow_driver.h
+++ b/lib/librte_ethdev/rte_flow_driver.h
@@ -131,6 +131,38 @@ struct rte_flow_ops {
 		 const struct rte_flow_shared_action *shared_action,
 		 void *data,
 		 struct rte_flow_error *error);
+	/** See rte_flow_tunnel_decap_set() */
+	int (*tunnel_decap_set)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_tunnel *tunnel,
+		 struct rte_flow_action **pmd_actions,
+		 uint32_t *num_of_actions,
+		 struct rte_flow_error *err);
+	/** See rte_flow_tunnel_match() */
+	int (*tunnel_match)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_tunnel *tunnel,
+		 struct rte_flow_item **pmd_items,
+		 uint32_t *num_of_items,
+		 struct rte_flow_error *err);
+	/** See rte_flow_get_restore_info() */
+	int (*get_restore_info)
+		(struct rte_eth_dev *dev,
+		 struct rte_mbuf *m,
+		 struct rte_flow_restore_info *info,
+		 struct rte_flow_error *err);
+	/** See rte_flow_tunnel_action_decap_release() */
+	int (*action_release)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_action *pmd_actions,
+		 uint32_t num_of_actions,
+		 struct rte_flow_error *err);
+	/** See rte_flow_item_release() */
+	int (*item_release)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_item *pmd_items,
+		 uint32_t num_of_items,
+		 struct rte_flow_error *err);
 };
 
 /**
-- 
2.28.0


^ permalink raw reply	[flat|nested] 95+ messages in thread

* [dpdk-dev] [PATCH v5 3/3] app/testpmd: add commands for tunnel offload API
  2020-10-15 12:41 ` [dpdk-dev] [PATCH v5 0/3] " Gregory Etelson
  2020-10-15 12:41   ` [dpdk-dev] [PATCH v5 1/3] ethdev: allow negative values in flow rule types Gregory Etelson
  2020-10-15 12:41   ` [dpdk-dev] [PATCH v5 2/3] ethdev: tunnel offload model Gregory Etelson
@ 2020-10-15 12:41   ` Gregory Etelson
  2020-10-15 22:47   ` [dpdk-dev] [PATCH v5 0/3] Tunnel Offload API Ferruh Yigit
  3 siblings, 0 replies; 95+ messages in thread
From: Gregory Etelson @ 2020-10-15 12:41 UTC (permalink / raw)
  To: dev
  Cc: getelson, matan, rasland, elibr, ozsh, Ori Kam, Wenzhuo Lu,
	Beilei Xing, Bernard Iremonger

Tunnel Offload API provides hardware independent, unified model
to offload tunneled traffic. Key model elements are:
 - apply matches to both outer and inner packet headers
   during entire offload procedure;
 - restore outer header of partially offloaded packet;
 - model is implemented as a set of helper functions.

Implementation details:

* Create application tunnel:
flow tunnel create <port> type <tunnel type>
On success, the command creates an application tunnel object and returns
the tunnel descriptor. The tunnel descriptor is used in subsequent flow
creation commands to reference the tunnel.

* Create tunnel steering flow rule:
The tunnel_set <tunnel descriptor> parameter is used with the steering
rule template.

* Create tunnel matching flow rule:
The tunnel_match <tunnel descriptor> parameter is used with the matching
rule template.

* If the tunnel steering rule was offloaded, the outer header of a partially
offloaded packet is restored after a miss.

Example:
test packet=
<Ether  dst=24:8a:07:8d:ae:d6 src=50:6b:4b:cc:fc:e2 type=IPv4 |
<IP  version=4 ihl=5 proto=udp src=1.1.1.1 dst=1.1.1.10 |
<UDP  sport=4789 dport=4789 len=58 chksum=0x7f7b |
<VXLAN  NextProtocol=Ethernet vni=0x0 |
<Ether  dst=24:aa:aa:aa:aa:d6 src=50:bb:bb:bb:bb:e2 type=IPv4 |
<IP  version=4 ihl=5 proto=icmp src=2.2.2.2 dst=2.2.2.200 |
<ICMP  type=echo-request code=0 chksum=0xf7ff id=0x0 seq=0x0 |>>>>>>>
>>> len(packet)
92

testpmd> flow flush 0
testpmd> port 0/queue 0: received 1 packets
src=50:6B:4B:CC:FC:E2 - dst=24:8A:07:8D:AE:D6 - type=0x0800 -
length=92

testpmd> flow tunnel create 0 type vxlan
port 0: flow tunnel #1 type vxlan
testpmd> flow create 0 ingress group 0 tunnel_set 1
         pattern eth /ipv4 / udp dst is 4789 / vxlan / end
         actions  jump group 0 / end
Flow rule #0 created
testpmd> port 0/queue 0: received 1 packets
tunnel restore info: - vxlan tunnel - outer header present # <--
  src=50:6B:4B:CC:FC:E2 - dst=24:8A:07:8D:AE:D6 - type=0x0800 -
length=92

testpmd> flow create 0 ingress group 0 tunnel_match 1
         pattern eth / ipv4 / udp dst is 4789 / vxlan / eth / ipv4 /
         end
         actions set_mac_dst mac_addr 02:CA:FE:CA:FA:80 /
         queue index 0 / end
Flow rule #1 created
testpmd> port 0/queue 0: received 1 packets
  src=50:BB:BB:BB:BB:E2 - dst=02:CA:FE:CA:FA:80 - type=0x0800 -
length=42

* Destroy flow tunnel
flow tunnel destroy <port> id <tunnel id>

* Show existing flow tunnels
flow tunnel list <port>
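
As an illustration, a hypothetical testpmd session combining these
commands could look as follows; the output lines follow the printf
formats used in the implementation below, and the port and tunnel ids
are examples only:

testpmd> flow tunnel create 0 type vxlan
port 0: flow tunnel #1 type vxlan
testpmd> flow tunnel list 0
port 0 tunnel #1 type=vxlan
testpmd> flow tunnel destroy 0 id 1
port 0: flow tunnel #1 destroyed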

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
---
v2:
* introduce testpmd support for tunnel offload API

v3:
* update flow tunnel commands

v5:
* rebase to next-net
---
 app/test-pmd/cmdline_flow.c                 | 170 ++++++++++++-
 app/test-pmd/config.c                       | 252 +++++++++++++++++++-
 app/test-pmd/testpmd.c                      |   5 +-
 app/test-pmd/testpmd.h                      |  34 ++-
 app/test-pmd/util.c                         |  35 ++-
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  49 ++++
 6 files changed, 532 insertions(+), 13 deletions(-)

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 84bba0f29e..ce7e9fc079 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -74,6 +74,14 @@ enum index {
 	LIST,
 	AGED,
 	ISOLATE,
+	TUNNEL,
+
+	/* Tunnel argumens. */
+	TUNNEL_CREATE,
+	TUNNEL_CREATE_TYPE,
+	TUNNEL_LIST,
+	TUNNEL_DESTROY,
+	TUNNEL_DESTROY_ID,
 
 	/* Destroy arguments. */
 	DESTROY_RULE,
@@ -93,6 +101,8 @@ enum index {
 	INGRESS,
 	EGRESS,
 	TRANSFER,
+	TUNNEL_SET,
+	TUNNEL_MATCH,
 
 	/* Shared action arguments */
 	SHARED_ACTION_CREATE,
@@ -711,6 +721,7 @@ struct buffer {
 		} sa; /* Shared action query arguments */
 		struct {
 			struct rte_flow_attr attr;
+			struct tunnel_ops tunnel_ops;
 			struct rte_flow_item *pattern;
 			struct rte_flow_action *actions;
 			uint32_t pattern_n;
@@ -787,10 +798,32 @@ static const enum index next_vc_attr[] = {
 	INGRESS,
 	EGRESS,
 	TRANSFER,
+	TUNNEL_SET,
+	TUNNEL_MATCH,
 	PATTERN,
 	ZERO,
 };
 
+static const enum index tunnel_create_attr[] = {
+	TUNNEL_CREATE,
+	TUNNEL_CREATE_TYPE,
+	END,
+	ZERO,
+};
+
+static const enum index tunnel_destroy_attr[] = {
+	TUNNEL_DESTROY,
+	TUNNEL_DESTROY_ID,
+	END,
+	ZERO,
+};
+
+static const enum index tunnel_list_attr[] = {
+	TUNNEL_LIST,
+	END,
+	ZERO,
+};
+
 static const enum index next_destroy_attr[] = {
 	DESTROY_RULE,
 	END,
@@ -1639,6 +1672,9 @@ static int parse_aged(struct context *, const struct token *,
 static int parse_isolate(struct context *, const struct token *,
 			 const char *, unsigned int,
 			 void *, unsigned int);
+static int parse_tunnel(struct context *, const struct token *,
+			const char *, unsigned int,
+			void *, unsigned int);
 static int parse_int(struct context *, const struct token *,
 		     const char *, unsigned int,
 		     void *, unsigned int);
@@ -1840,7 +1876,8 @@ static const struct token token_list[] = {
 			      LIST,
 			      AGED,
 			      QUERY,
-			      ISOLATE)),
+			      ISOLATE,
+			      TUNNEL)),
 		.call = parse_init,
 	},
 	/* Top-level command. */
@@ -1951,6 +1988,49 @@ static const struct token token_list[] = {
 			     ARGS_ENTRY(struct buffer, port)),
 		.call = parse_isolate,
 	},
+	[TUNNEL] = {
+		.name = "tunnel",
+		.help = "new tunnel API",
+		.next = NEXT(NEXT_ENTRY
+			     (TUNNEL_CREATE, TUNNEL_LIST, TUNNEL_DESTROY)),
+		.call = parse_tunnel,
+	},
+	/* Tunnel arguments. */
+	[TUNNEL_CREATE] = {
+		.name = "create",
+		.help = "create new tunnel object",
+		.next = NEXT(tunnel_create_attr, NEXT_ENTRY(PORT_ID)),
+		.args = ARGS(ARGS_ENTRY(struct buffer, port)),
+		.call = parse_tunnel,
+	},
+	[TUNNEL_CREATE_TYPE] = {
+		.name = "type",
+		.help = "create new tunnel",
+		.next = NEXT(tunnel_create_attr, NEXT_ENTRY(FILE_PATH)),
+		.args = ARGS(ARGS_ENTRY(struct tunnel_ops, type)),
+		.call = parse_tunnel,
+	},
+	[TUNNEL_DESTROY] = {
+		.name = "destroy",
+		.help = "destroy tunel",
+		.next = NEXT(tunnel_destroy_attr, NEXT_ENTRY(PORT_ID)),
+		.args = ARGS(ARGS_ENTRY(struct buffer, port)),
+		.call = parse_tunnel,
+	},
+	[TUNNEL_DESTROY_ID] = {
+		.name = "id",
+		.help = "tunnel identifier to testroy",
+		.next = NEXT(tunnel_destroy_attr, NEXT_ENTRY(UNSIGNED)),
+		.args = ARGS(ARGS_ENTRY(struct tunnel_ops, id)),
+		.call = parse_tunnel,
+	},
+	[TUNNEL_LIST] = {
+		.name = "list",
+		.help = "list existing tunnels",
+		.next = NEXT(tunnel_list_attr, NEXT_ENTRY(PORT_ID)),
+		.args = ARGS(ARGS_ENTRY(struct buffer, port)),
+		.call = parse_tunnel,
+	},
 	/* Destroy arguments. */
 	[DESTROY_RULE] = {
 		.name = "rule",
@@ -2014,6 +2094,20 @@ static const struct token token_list[] = {
 		.next = NEXT(next_vc_attr),
 		.call = parse_vc,
 	},
+	[TUNNEL_SET] = {
+		.name = "tunnel_set",
+		.help = "tunnel steer rule",
+		.next = NEXT(next_vc_attr, NEXT_ENTRY(UNSIGNED)),
+		.args = ARGS(ARGS_ENTRY(struct tunnel_ops, id)),
+		.call = parse_vc,
+	},
+	[TUNNEL_MATCH] = {
+		.name = "tunnel_match",
+		.help = "tunnel match rule",
+		.next = NEXT(next_vc_attr, NEXT_ENTRY(UNSIGNED)),
+		.args = ARGS(ARGS_ENTRY(struct tunnel_ops, id)),
+		.call = parse_vc,
+	},
 	/* Validate/create pattern. */
 	[PATTERN] = {
 		.name = "pattern",
@@ -4477,12 +4571,28 @@ parse_vc(struct context *ctx, const struct token *token,
 		return len;
 	}
 	ctx->objdata = 0;
-	ctx->object = &out->args.vc.attr;
+	switch (ctx->curr) {
+	default:
+		ctx->object = &out->args.vc.attr;
+		break;
+	case TUNNEL_SET:
+	case TUNNEL_MATCH:
+		ctx->object = &out->args.vc.tunnel_ops;
+		break;
+	}
 	ctx->objmask = NULL;
 	switch (ctx->curr) {
 	case GROUP:
 	case PRIORITY:
 		return len;
+	case TUNNEL_SET:
+		out->args.vc.tunnel_ops.enabled = 1;
+		out->args.vc.tunnel_ops.actions = 1;
+		return len;
+	case TUNNEL_MATCH:
+		out->args.vc.tunnel_ops.enabled = 1;
+		out->args.vc.tunnel_ops.items = 1;
+		return len;
 	case INGRESS:
 		out->args.vc.attr.ingress = 1;
 		return len;
@@ -6090,6 +6200,47 @@ parse_isolate(struct context *ctx, const struct token *token,
 	return len;
 }
 
+static int
+parse_tunnel(struct context *ctx, const struct token *token,
+	     const char *str, unsigned int len,
+	     void *buf, unsigned int size)
+{
+	struct buffer *out = buf;
+
+	/* Token name must match. */
+	if (parse_default(ctx, token, str, len, NULL, 0) < 0)
+		return -1;
+	/* Nothing else to do if there is no buffer. */
+	if (!out)
+		return len;
+	if (!out->command) {
+		if (ctx->curr != TUNNEL)
+			return -1;
+		if (sizeof(*out) > size)
+			return -1;
+		out->command = ctx->curr;
+		ctx->objdata = 0;
+		ctx->object = out;
+		ctx->objmask = NULL;
+	} else {
+		switch (ctx->curr) {
+		default:
+			break;
+		case TUNNEL_CREATE:
+		case TUNNEL_DESTROY:
+		case TUNNEL_LIST:
+			out->command = ctx->curr;
+			break;
+		case TUNNEL_CREATE_TYPE:
+		case TUNNEL_DESTROY_ID:
+			ctx->object = &out->args.vc.tunnel_ops;
+			break;
+		}
+	}
+
+	return len;
+}
+
 /**
  * Parse signed/unsigned integers 8 to 64-bit long.
  *
@@ -7130,11 +7281,13 @@ cmd_flow_parsed(const struct buffer *in)
 		break;
 	case VALIDATE:
 		port_flow_validate(in->port, &in->args.vc.attr,
-				   in->args.vc.pattern, in->args.vc.actions);
+				   in->args.vc.pattern, in->args.vc.actions,
+				   &in->args.vc.tunnel_ops);
 		break;
 	case CREATE:
 		port_flow_create(in->port, &in->args.vc.attr,
-				 in->args.vc.pattern, in->args.vc.actions);
+				 in->args.vc.pattern, in->args.vc.actions,
+				 &in->args.vc.tunnel_ops);
 		break;
 	case DESTROY:
 		port_flow_destroy(in->port, in->args.destroy.rule_n,
@@ -7160,6 +7313,15 @@ cmd_flow_parsed(const struct buffer *in)
 	case AGED:
 		port_flow_aged(in->port, in->args.aged.destroy);
 		break;
+	case TUNNEL_CREATE:
+		port_flow_tunnel_create(in->port, &in->args.vc.tunnel_ops);
+		break;
+	case TUNNEL_DESTROY:
+		port_flow_tunnel_destroy(in->port, in->args.vc.tunnel_ops.id);
+		break;
+	case TUNNEL_LIST:
+		port_flow_tunnel_list(in->port);
+		break;
 	default:
 		break;
 	}
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 2c00b55440..6bce791dfb 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1521,6 +1521,115 @@ port_mtu_set(portid_t port_id, uint16_t mtu)
 
 /* Generic flow management functions. */
 
+static struct port_flow_tunnel *
+port_flow_locate_tunnel_id(struct rte_port *port, uint32_t port_tunnel_id)
+{
+	struct port_flow_tunnel *flow_tunnel;
+
+	LIST_FOREACH(flow_tunnel, &port->flow_tunnel_list, chain) {
+		if (flow_tunnel->id == port_tunnel_id)
+			goto out;
+	}
+	flow_tunnel = NULL;
+
+out:
+	return flow_tunnel;
+}
+
+const char *
+port_flow_tunnel_type(struct rte_flow_tunnel *tunnel)
+{
+	const char *type;
+	switch (tunnel->type) {
+	default:
+		type = "unknown";
+		break;
+	case RTE_FLOW_ITEM_TYPE_VXLAN:
+		type = "vxlan";
+		break;
+	}
+
+	return type;
+}
+
+struct port_flow_tunnel *
+port_flow_locate_tunnel(uint16_t port_id, struct rte_flow_tunnel *tun)
+{
+	struct rte_port *port = &ports[port_id];
+	struct port_flow_tunnel *flow_tunnel;
+
+	LIST_FOREACH(flow_tunnel, &port->flow_tunnel_list, chain) {
+		if (!memcmp(&flow_tunnel->tunnel, tun, sizeof(*tun)))
+			goto out;
+	}
+	flow_tunnel = NULL;
+
+out:
+	return flow_tunnel;
+}
+
+void port_flow_tunnel_list(portid_t port_id)
+{
+	struct rte_port *port = &ports[port_id];
+	struct port_flow_tunnel *flt;
+
+	LIST_FOREACH(flt, &port->flow_tunnel_list, chain) {
+		printf("port %u tunnel #%u type=%s",
+			port_id, flt->id, port_flow_tunnel_type(&flt->tunnel));
+		if (flt->tunnel.tun_id)
+			printf(" id=%lu", flt->tunnel.tun_id);
+		printf("\n");
+	}
+}
+
+void port_flow_tunnel_destroy(portid_t port_id, uint32_t tunnel_id)
+{
+	struct rte_port *port = &ports[port_id];
+	struct port_flow_tunnel *flt;
+
+	LIST_FOREACH(flt, &port->flow_tunnel_list, chain) {
+		if (flt->id == tunnel_id)
+			break;
+	}
+	if (flt) {
+		LIST_REMOVE(flt, chain);
+		free(flt);
+		printf("port %u: flow tunnel #%u destroyed\n",
+			port_id, tunnel_id);
+	}
+}
+
+void port_flow_tunnel_create(portid_t port_id, const struct tunnel_ops *ops)
+{
+	struct rte_port *port = &ports[port_id];
+	enum rte_flow_item_type	type;
+	struct port_flow_tunnel *flt;
+
+	if (!strcmp(ops->type, "vxlan"))
+		type = RTE_FLOW_ITEM_TYPE_VXLAN;
+	else {
+		printf("cannot offload \"%s\" tunnel type\n", ops->type);
+		return;
+	}
+	LIST_FOREACH(flt, &port->flow_tunnel_list, chain) {
+		if (flt->tunnel.type == type)
+			break;
+	}
+	if (!flt) {
+		flt = calloc(1, sizeof(*flt));
+		if (!flt) {
+			printf("failed to allocate port flt object\n");
+			return;
+		}
+		flt->tunnel.type = type;
+		flt->id = LIST_EMPTY(&port->flow_tunnel_list) ? 1 :
+				  LIST_FIRST(&port->flow_tunnel_list)->id + 1;
+		LIST_INSERT_HEAD(&port->flow_tunnel_list, flt, chain);
+	}
+	printf("port %d: flow tunnel #%u type %s\n",
+		port_id, flt->id, ops->type);
+}
+
 /** Generate a port_flow entry from attributes/pattern/actions. */
 static struct port_flow *
 port_flow_new(const struct rte_flow_attr *attr,
@@ -1860,20 +1969,137 @@ port_shared_action_query(portid_t port_id, uint32_t id)
 	}
 	return ret;
 }
+static struct port_flow_tunnel *
+port_flow_tunnel_offload_cmd_prep(portid_t port_id,
+				  const struct rte_flow_item *pattern,
+				  const struct rte_flow_action *actions,
+				  const struct tunnel_ops *tunnel_ops)
+{
+	int ret;
+	struct rte_port *port;
+	struct port_flow_tunnel *pft;
+	struct rte_flow_error error;
+
+	port = &ports[port_id];
+	pft = port_flow_locate_tunnel_id(port, tunnel_ops->id);
+	if (!pft) {
+		printf("failed to locate port flow tunnel #%u\n",
+			tunnel_ops->id);
+		return NULL;
+	}
+	if (tunnel_ops->actions) {
+		uint32_t num_actions;
+		const struct rte_flow_action *aptr;
+
+		ret = rte_flow_tunnel_decap_set(port_id, &pft->tunnel,
+						&pft->pmd_actions,
+						&pft->num_pmd_actions,
+						&error);
+		if (ret) {
+			port_flow_complain(&error);
+			return NULL;
+		}
+		for (aptr = actions, num_actions = 1;
+		     aptr->type != RTE_FLOW_ACTION_TYPE_END;
+		     aptr++, num_actions++);
+		pft->actions = malloc(
+				(num_actions +  pft->num_pmd_actions) *
+				sizeof(actions[0]));
+		if (!pft->actions) {
+			rte_flow_tunnel_action_decap_release(
+					port_id, pft->actions,
+					pft->num_pmd_actions, &error);
+			return NULL;
+		}
+		rte_memcpy(pft->actions, pft->pmd_actions,
+			   pft->num_pmd_actions * sizeof(actions[0]));
+		rte_memcpy(pft->actions + pft->num_pmd_actions, actions,
+			   num_actions * sizeof(actions[0]));
+	}
+	if (tunnel_ops->items) {
+		uint32_t num_items;
+		const struct rte_flow_item *iptr;
+
+		ret = rte_flow_tunnel_match(port_id, &pft->tunnel,
+					    &pft->pmd_items,
+					    &pft->num_pmd_items,
+					    &error);
+		if (ret) {
+			port_flow_complain(&error);
+			return NULL;
+		}
+		for (iptr = pattern, num_items = 1;
+		     iptr->type != RTE_FLOW_ITEM_TYPE_END;
+		     iptr++, num_items++);
+		pft->items = malloc((num_items + pft->num_pmd_items) *
+				    sizeof(pattern[0]));
+		if (!pft->items) {
+			rte_flow_tunnel_item_release(
+					port_id, pft->pmd_items,
+					pft->num_pmd_items, &error);
+			return NULL;
+		}
+		rte_memcpy(pft->items, pft->pmd_items,
+			   pft->num_pmd_items * sizeof(pattern[0]));
+		rte_memcpy(pft->items + pft->num_pmd_items, pattern,
+			   num_items * sizeof(pattern[0]));
+	}
+
+	return pft;
+}
+
+static void
+port_flow_tunnel_offload_cmd_release(portid_t port_id,
+				     const struct tunnel_ops *tunnel_ops,
+				     struct port_flow_tunnel *pft)
+{
+	struct rte_flow_error error;
+
+	if (tunnel_ops->actions) {
+		free(pft->actions);
+		rte_flow_tunnel_action_decap_release(
+			port_id, pft->pmd_actions,
+			pft->num_pmd_actions, &error);
+		pft->actions = NULL;
+		pft->pmd_actions = NULL;
+	}
+	if (tunnel_ops->items) {
+		free(pft->items);
+		rte_flow_tunnel_item_release(port_id, pft->pmd_items,
+					     pft->num_pmd_items,
+					     &error);
+		pft->items = NULL;
+		pft->pmd_items = NULL;
+	}
+}
 
 /** Validate flow rule. */
 int
 port_flow_validate(portid_t port_id,
 		   const struct rte_flow_attr *attr,
 		   const struct rte_flow_item *pattern,
-		   const struct rte_flow_action *actions)
+		   const struct rte_flow_action *actions,
+		   const struct tunnel_ops *tunnel_ops)
 {
 	struct rte_flow_error error;
+	struct port_flow_tunnel *pft = NULL;
 
 	/* Poisoning to make sure PMDs update it in case of error. */
 	memset(&error, 0x11, sizeof(error));
+	if (tunnel_ops->enabled) {
+		pft = port_flow_tunnel_offload_cmd_prep(port_id, pattern,
+							actions, tunnel_ops);
+		if (!pft)
+			return -ENOENT;
+		if (pft->items)
+			pattern = pft->items;
+		if (pft->actions)
+			actions = pft->actions;
+	}
 	if (rte_flow_validate(port_id, attr, pattern, actions, &error))
 		return port_flow_complain(&error);
+	if (tunnel_ops->enabled)
+		port_flow_tunnel_offload_cmd_release(port_id, tunnel_ops, pft);
 	printf("Flow rule validated\n");
 	return 0;
 }
@@ -1903,13 +2129,15 @@ int
 port_flow_create(portid_t port_id,
 		 const struct rte_flow_attr *attr,
 		 const struct rte_flow_item *pattern,
-		 const struct rte_flow_action *actions)
+		 const struct rte_flow_action *actions,
+		 const struct tunnel_ops *tunnel_ops)
 {
 	struct rte_flow *flow;
 	struct rte_port *port;
 	struct port_flow *pf;
 	uint32_t id = 0;
 	struct rte_flow_error error;
+	struct port_flow_tunnel *pft = NULL;
 
 	port = &ports[port_id];
 	if (port->flow_list) {
@@ -1920,6 +2148,16 @@ port_flow_create(portid_t port_id,
 		}
 		id = port->flow_list->id + 1;
 	}
+	if (tunnel_ops->enabled) {
+		pft = port_flow_tunnel_offload_cmd_prep(port_id, pattern,
+							actions, tunnel_ops);
+		if (!pft)
+			return -ENOENT;
+		if (pft->items)
+			pattern = pft->items;
+		if (pft->actions)
+			actions = pft->actions;
+	}
 	pf = port_flow_new(attr, pattern, actions, &error);
 	if (!pf)
 		return port_flow_complain(&error);
@@ -1935,6 +2173,8 @@ port_flow_create(portid_t port_id,
 	pf->id = id;
 	pf->flow = flow;
 	port->flow_list = pf;
+	if (tunnel_ops->enabled)
+		port_flow_tunnel_offload_cmd_release(port_id, tunnel_ops, pft);
 	printf("Flow rule #%u created\n", pf->id);
 	return 0;
 }
@@ -2244,7 +2484,9 @@ port_flow_list(portid_t port_id, uint32_t n, const uint32_t *group)
 		       pf->rule.attr->egress ? 'e' : '-',
 		       pf->rule.attr->transfer ? 't' : '-');
 		while (item->type != RTE_FLOW_ITEM_TYPE_END) {
-			if (rte_flow_conv(RTE_FLOW_CONV_OP_ITEM_NAME_PTR,
+			if ((uint32_t)item->type > INT_MAX)
+				name = "PMD_INTERNAL";
+			else if (rte_flow_conv(RTE_FLOW_CONV_OP_ITEM_NAME_PTR,
 					  &name, sizeof(name),
 					  (void *)(uintptr_t)item->type,
 					  NULL) <= 0)
@@ -2255,7 +2497,9 @@ port_flow_list(portid_t port_id, uint32_t n, const uint32_t *group)
 		}
 		printf("=>");
 		while (action->type != RTE_FLOW_ACTION_TYPE_END) {
-			if (rte_flow_conv(RTE_FLOW_CONV_OP_ACTION_NAME_PTR,
+			if ((uint32_t)action->type > INT_MAX)
+				name = "PMD_INTERNAL";
+			else if (rte_flow_conv(RTE_FLOW_CONV_OP_ACTION_NAME_PTR,
 					  &name, sizeof(name),
 					  (void *)(uintptr_t)action->type,
 					  NULL) <= 0)
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index ccba71c076..93b9e4dde1 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -3567,6 +3567,8 @@ init_port_dcb_config(portid_t pid,
 static void
 init_port(void)
 {
+	int i;
+
 	/* Configuration of Ethernet ports. */
 	ports = rte_zmalloc("testpmd: ports",
 			    sizeof(struct rte_port) * RTE_MAX_ETHPORTS,
@@ -3576,7 +3578,8 @@ init_port(void)
 				"rte_zmalloc(%d struct rte_port) failed\n",
 				RTE_MAX_ETHPORTS);
 	}
-
+	for (i = 0; i < RTE_MAX_ETHPORTS; i++)
+		LIST_INIT(&ports[i].flow_tunnel_list);
 	/* Initialize ports NUMA structures */
 	memset(port_numa, NUMA_NO_CONFIG, RTE_MAX_ETHPORTS);
 	memset(rxring_numa, NUMA_NO_CONFIG, RTE_MAX_ETHPORTS);
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index a4dfd4fc23..2fc69497f5 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -12,6 +12,7 @@
 #include <rte_gro.h>
 #include <rte_gso.h>
 #include <cmdline.h>
+#include <sys/queue.h>
 
 #define RTE_PORT_ALL            (~(portid_t)0x0)
 
@@ -150,6 +151,26 @@ struct port_shared_action {
 	struct rte_flow_shared_action *action;	/**< Shared action handle. */
 };
 
+struct port_flow_tunnel {
+	LIST_ENTRY(port_flow_tunnel) chain;
+	struct rte_flow_action *pmd_actions;
+	struct rte_flow_item   *pmd_items;
+	uint32_t id;
+	uint32_t num_pmd_actions;
+	uint32_t num_pmd_items;
+	struct rte_flow_tunnel tunnel;
+	struct rte_flow_action *actions;
+	struct rte_flow_item *items;
+};
+
+struct tunnel_ops {
+	uint32_t id;
+	char type[16];
+	uint32_t enabled:1;
+	uint32_t actions:1;
+	uint32_t items:1;
+};
+
 /**
  * The data structure associated with each port.
  */
@@ -182,6 +203,7 @@ struct rte_port {
 	struct port_flow        *flow_list; /**< Associated flows. */
 	struct port_shared_action *actions_list;
 	/**< Associated shared actions. */
+	LIST_HEAD(, port_flow_tunnel) flow_tunnel_list;
 	const struct rte_eth_rxtx_callback *rx_dump_cb[RTE_MAX_QUEUES_PER_PORT+1];
 	const struct rte_eth_rxtx_callback *tx_dump_cb[RTE_MAX_QUEUES_PER_PORT+1];
 	/**< metadata value to insert in Tx packets. */
@@ -771,11 +793,13 @@ int port_shared_action_update(portid_t port_id, uint32_t id,
 int port_flow_validate(portid_t port_id,
 		       const struct rte_flow_attr *attr,
 		       const struct rte_flow_item *pattern,
-		       const struct rte_flow_action *actions);
+		       const struct rte_flow_action *actions,
+		       const struct tunnel_ops *tunnel_ops);
 int port_flow_create(portid_t port_id,
 		     const struct rte_flow_attr *attr,
 		     const struct rte_flow_item *pattern,
-		     const struct rte_flow_action *actions);
+		     const struct rte_flow_action *actions,
+		     const struct tunnel_ops *tunnel_ops);
 int port_shared_action_query(portid_t port_id, uint32_t id);
 void update_age_action_context(const struct rte_flow_action *actions,
 		     struct port_flow *pf);
@@ -786,6 +810,12 @@ int port_flow_query(portid_t port_id, uint32_t rule,
 		    const struct rte_flow_action *action);
 void port_flow_list(portid_t port_id, uint32_t n, const uint32_t *group);
 void port_flow_aged(portid_t port_id, uint8_t destroy);
+const char *port_flow_tunnel_type(struct rte_flow_tunnel *tunnel);
+struct port_flow_tunnel *
+port_flow_locate_tunnel(uint16_t port_id, struct rte_flow_tunnel *tun);
+void port_flow_tunnel_list(portid_t port_id);
+void port_flow_tunnel_destroy(portid_t port_id, uint32_t tunnel_id);
+void port_flow_tunnel_create(portid_t port_id, const struct tunnel_ops *ops);
 int port_flow_isolate(portid_t port_id, int set);
 
 void rx_ring_desc_display(portid_t port_id, queueid_t rxq_id, uint16_t rxd_id);
diff --git a/app/test-pmd/util.c b/app/test-pmd/util.c
index 8488fa1a8f..781a813759 100644
--- a/app/test-pmd/util.c
+++ b/app/test-pmd/util.c
@@ -48,18 +48,49 @@ dump_pkt_burst(uint16_t port_id, uint16_t queue, struct rte_mbuf *pkts[],
 	       is_rx ? "received" : "sent",
 	       (unsigned int) nb_pkts);
 	for (i = 0; i < nb_pkts; i++) {
+		int ret;
+		struct rte_flow_error error;
+		struct rte_flow_restore_info info = { 0, };
+
 		mb = pkts[i];
 		eth_hdr = rte_pktmbuf_read(mb, 0, sizeof(_eth_hdr), &_eth_hdr);
 		eth_type = RTE_BE_TO_CPU_16(eth_hdr->ether_type);
-		ol_flags = mb->ol_flags;
 		packet_type = mb->packet_type;
 		is_encapsulation = RTE_ETH_IS_TUNNEL_PKT(packet_type);
-
+		ret = rte_flow_get_restore_info(port_id, mb, &info, &error);
+		if (!ret) {
+			printf("restore info:");
+			if (info.flags & RTE_FLOW_RESTORE_INFO_TUNNEL) {
+				struct port_flow_tunnel *port_tunnel;
+
+				port_tunnel = port_flow_locate_tunnel
+					      (port_id, &info.tunnel);
+				printf(" - tunnel");
+				if (port_tunnel)
+					printf(" #%u", port_tunnel->id);
+				else
+					printf(" %s", "-none-");
+				printf(" type %s",
+					port_flow_tunnel_type(&info.tunnel));
+			} else {
+				printf(" - no tunnel info");
+			}
+			if (info.flags & RTE_FLOW_RESTORE_INFO_ENCAPSULATED)
+				printf(" - outer header present");
+			else
+				printf(" - no outer header");
+			if (info.flags & RTE_FLOW_RESTORE_INFO_GROUP_ID)
+				printf(" - miss group %u", info.group_id);
+			else
+				printf(" - no miss group");
+			printf("\n");
+		}
 		print_ether_addr("  src=", &eth_hdr->s_addr);
 		print_ether_addr(" - dst=", &eth_hdr->d_addr);
 		printf(" - type=0x%04x - length=%u - nb_segs=%d",
 		       eth_type, (unsigned int) mb->pkt_len,
 		       (int)mb->nb_segs);
+		ol_flags = mb->ol_flags;
 		if (ol_flags & PKT_RX_RSS_HASH) {
 			printf(" - RSS hash=0x%x", (unsigned int) mb->hash.rss);
 			printf(" - RSS queue=0x%x", (unsigned int) queue);
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 43c0ea0599..05a4446757 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -3749,6 +3749,45 @@ following sections.
 
    flow aged {port_id} [destroy]
 
+- Tunnel offload - create a tunnel stub::
+
+   flow tunnel create {port_id} type {tunnel_type}
+
+- Tunnel offload - destroy a tunnel stub::
+
+   flow tunnel destroy {port_id} id {tunnel_id}
+
+- Tunnel offload - list port tunnel stubs::
+
+   flow tunnel list {port_id}
+
+Creating a tunnel stub for offload
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``flow tunnel create`` sets up a tunnel stub for tunnel offload flow rules::
+
+   flow tunnel create {port_id} type {tunnel_type}
+
+If successful, it will return a tunnel stub ID usable with other commands::
+
+   port [...]: flow tunnel #[...] type [...]
+
+Tunnel stub ID is relative to a port.
+
+Destroying tunnel offload stub
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``flow tunnel destroy`` destroys a port tunnel stub::
+
+   flow tunnel destroy {port_id} id {tunnel_id}
+
+Listing tunnel offload stubs
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``flow tunnel list`` lists port tunnel offload stubs::
+
+   flow tunnel list {port_id}
+
 Validating flow rules
 ~~~~~~~~~~~~~~~~~~~~~
 
@@ -3795,6 +3834,7 @@ to ``rte_flow_create()``::
 
    flow create {port_id}
       [group {group_id}] [priority {level}] [ingress] [egress] [transfer]
+      [tunnel_set {tunnel_id}] [tunnel_match {tunnel_id}]
       pattern {item} [/ {item} [...]] / end
       actions {action} [/ {action} [...]] / end
 
@@ -3809,6 +3849,7 @@ Otherwise it will show an error message of the form::
 Parameters describe in the following order:
 
 - Attributes (*group*, *priority*, *ingress*, *egress*, *transfer* tokens).
+- Tunnel offload specification (tunnel_set, tunnel_match)
 - A matching pattern, starting with the *pattern* token and terminated by an
   *end* pattern item.
 - Actions, starting with the *actions* token and terminated by an *end*
@@ -3852,6 +3893,14 @@ Most rules affect RX therefore contain the ``ingress`` token::
 
    testpmd> flow create 0 ingress pattern [...]
 
+Tunnel offload
+^^^^^^^^^^^^^^
+
+Indicate tunnel offload rule type
+
+- ``tunnel_set {tunnel_id}``: mark rule as tunnel offload decap_set type.
+- ``tunnel_match {tunnel_id}``: mark rule as tunnel offload match type.
+
 Matching pattern
 ^^^^^^^^^^^^^^^^
 
-- 
2.28.0


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH v5 0/3] Tunnel Offload API
  2020-10-15 12:41 ` [dpdk-dev] [PATCH v5 0/3] " Gregory Etelson
                     ` (2 preceding siblings ...)
  2020-10-15 12:41   ` [dpdk-dev] [PATCH v5 3/3] app/testpmd: add commands for tunnel offload API Gregory Etelson
@ 2020-10-15 22:47   ` Ferruh Yigit
  3 siblings, 0 replies; 95+ messages in thread
From: Ferruh Yigit @ 2020-10-15 22:47 UTC (permalink / raw)
  To: Gregory Etelson, dev; +Cc: matan, rasland, elibr, ozsh

On 10/15/2020 1:41 PM, Gregory Etelson wrote:
> Tunnel Offload API provides hardware independent, unified model
> to offload tunneled traffic. Key model elements are:
>   - apply matches to both outer and inner packet headers
>     during entire offload procedure;
>   - restore outer header of partially offloaded packet;
>   - model is implemented as a set of helper functions.
> 
>   v2:
>   * documentation updates
>   * MLX5 PMD implementation for tunnel offload
>   * testpmd updates for tunnel offload
> 
>   v3:
>   * documentation updates
>   * MLX5 PMD updates
>   * testpmd updates
> 
>   v4:
>   * updated patch: allow negative values in flow rule types
> 
> v5:
>   * rebase to next-net
> 
> Eli Britstein (1):
>    ethdev: tunnel offload model
> 
> Gregory Etelson (2):
>    ethdev: allow negative values in flow rule types
>    app/testpmd: add commands for tunnel offload API
> 

The 32-bit build fails [1], can you please check?

Also there are a few typos [2].


[1]
../app/test-pmd/config.c: In function ‘port_flow_tunnel_list’:
../app/test-pmd/config.c:1580:18: error: format ‘%lu’ expects argument of type 
‘long unsigned int’, but argument 2 has type ‘uint64_t’ {aka ‘long long unsigned 
int’} [-Werror=format=]

1580 |    printf(" id=%lu", flt->tunnel.tun_id);
      |                ~~^   ~~~~~~~~~~~~~~~~~~
      |                  |              |
      |                  |              uint64_t {aka long long unsigned int}
      |                  long unsigned int
      |                %llu
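
A portable fix would be the PRIu64 format macro from <inttypes.h>,
for example:

	if (flt->tunnel.tun_id)
		printf(" id=%" PRIu64, flt->tunnel.tun_id);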


[2]
### ethdev: allow negative values in flow rule types

WARNING:TYPO_SPELLING: 'becase' may be misspelled - perhaps 'because'?
#17:
actions do not fit into this scheme becase PMD type values are

total: 0 errors, 1 warnings, 72 lines checked

### app/testpmd: add commands for tunnel offload API

WARNING:TYPO_SPELLING: 'argumens' may be misspelled - perhaps 'arguments'?
#87: FILE: app/test-pmd/cmdline_flow.c:79:
+       /* Tunnel argumens. */

^ permalink raw reply	[flat|nested] 95+ messages in thread

* [dpdk-dev] [PATCH v6 0/3] Tunnel Offload API
  2020-06-25 16:03 [dpdk-dev] [PATCH 0/2] ethdev: tunnel offload model Gregory Etelson
                   ` (6 preceding siblings ...)
  2020-10-15 12:41 ` [dpdk-dev] [PATCH v5 0/3] " Gregory Etelson
@ 2020-10-16  8:55 ` Gregory Etelson
  2020-10-16  8:55   ` [dpdk-dev] [PATCH v6 1/3] ethdev: allow negative values in flow rule types Gregory Etelson
                     ` (2 more replies)
  2020-10-16 10:33 ` [dpdk-dev] [PATCH v7 0/3] Tunnel Offload API Gregory Etelson
                   ` (9 subsequent siblings)
  17 siblings, 3 replies; 95+ messages in thread
From: Gregory Etelson @ 2020-10-16  8:55 UTC (permalink / raw)
  To: dev; +Cc: getelson, matan, rasland, elibr, ozsh, ajit.khaparde, asafp

Tunnel Offload API provides hardware independent, unified model
to offload tunneled traffic. Key model elements are:
 - apply matches to both outer and inner packet headers
   during entire offload procedure;
 - restore outer header of partially offloaded packet;
 - model is implemented as a set of helper functions.

 v2:
 * documentation updates
 * MLX5 PMD implementation for tunnel offload
 * testpmd updates for tunnel offload

 v3:
 * documentation updates
 * MLX5 PMD updates
 * testpmd updates

 v4:
 * updated patch: allow negative values in flow rule types

v5:
 * rebase to next-net

v6:
* update tunnel offload API documentation

Eli Britstein (1):
  ethdev: tunnel offload model

Gregory Etelson (2):
  ethdev: allow negative values in flow rule types
  app/testpmd: add commands for tunnel offload API

 app/test-pmd/cmdline_flow.c                 | 170 ++++++++++++-
 app/test-pmd/config.c                       | 252 +++++++++++++++++++-
 app/test-pmd/testpmd.c                      |   5 +-
 app/test-pmd/testpmd.h                      |  34 ++-
 app/test-pmd/util.c                         |  35 ++-
 doc/guides/prog_guide/rte_flow.rst          |  81 +++++++
 doc/guides/rel_notes/release_20_11.rst      |  10 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  49 ++++
 lib/librte_ethdev/rte_ethdev_version.map    |   5 +
 lib/librte_ethdev/rte_flow.c                | 140 ++++++++++-
 lib/librte_ethdev/rte_flow.h                | 195 +++++++++++++++
 lib/librte_ethdev/rte_flow_driver.h         |  32 +++
 12 files changed, 989 insertions(+), 19 deletions(-)

-- 
2.28.0


^ permalink raw reply	[flat|nested] 95+ messages in thread

* [dpdk-dev] [PATCH v6 1/3] ethdev: allow negative values in flow rule types
  2020-10-16  8:55 ` [dpdk-dev] [PATCH v6 " Gregory Etelson
@ 2020-10-16  8:55   ` Gregory Etelson
  2020-10-16  8:55   ` [dpdk-dev] [PATCH v6 2/3] ethdev: tunnel offload model Gregory Etelson
  2020-10-16  8:55   ` [dpdk-dev] [PATCH v6 3/3] app/testpmd: add commands for tunnel offload API Gregory Etelson
  2 siblings, 0 replies; 95+ messages in thread
From: Gregory Etelson @ 2020-10-16  8:55 UTC (permalink / raw)
  To: dev
  Cc: getelson, matan, rasland, elibr, ozsh, ajit.khaparde, asafp,
	Ori Kam, Viacheslav Ovsiienko, Thomas Monjalon, Ferruh Yigit,
	Andrew Rybchenko

RTE flow items & actions use positive values in item & action type.
Negative values are reserved for PMD private types. PMD
items & actions usually are not exposed to application and are not
used to create RTE flows.

The patch allows applications that have access to PMD flow
items & actions to integrate them with RTE items & actions
and use the combination to create flow rules.

RTE flow item or action conversion library accepts positive known
element types with predefined sizes only. Private PMD items and
actions do not fit into this scheme because PMD type values are
negative, each PMD has its own type enumeration and element types and
their sizes are not visible at RTE level.  To resolve these
limitations the patch proposes this solution:
1. PMD can expose elements of pointer size only.  RTE flow
   conversion functions will use pointer size for each configuration
   object in private PMD element it processes;
2. RTE flow verification will not reject elements with negative type.
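
For illustration, a minimal sketch of how an application could combine
PMD private actions (for example, an array returned by a PMD helper such
as the tunnel offload API) with its own END-terminated actions before
calling rte_flow_create(); all variable names are illustrative:

struct rte_flow_action *rule_actions;
uint32_t i, n_app = 0;

while (app_actions[n_app].type != RTE_FLOW_ACTION_TYPE_END)
	n_app++;
/* PMD private actions first, then application actions and the END terminator. */
rule_actions = malloc((num_pmd_actions + n_app + 1) * sizeof(*rule_actions));
if (rule_actions != NULL) {
	for (i = 0; i < num_pmd_actions; i++)
		rule_actions[i] = pmd_actions[i];
	for (i = 0; i <= n_app; i++)
		rule_actions[num_pmd_actions + i] = app_actions[i];
	flow = rte_flow_create(port_id, &attr, pattern, rule_actions, &error);
}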

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

---
v4:
* update the 'Negative types' section in the rte_flow.rst

* update the patch comment

v5:
* rebase to next-net
---
 doc/guides/prog_guide/rte_flow.rst     |  3 +++
 doc/guides/rel_notes/release_20_11.rst |  5 +++++
 lib/librte_ethdev/rte_flow.c           | 28 ++++++++++++++++++++------
 3 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 6ee0d3a10a..7fb5ec9059 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -2775,6 +2775,9 @@ identifiers they are not aware of.
 
 A method to generate them remains to be defined.
 
+An application may use PMD dynamic items or actions in flow rules. In that case
+the size of the configuration object in a dynamic element must be a pointer size.
+
 Planned types
 ~~~~~~~~~~~~~
 
diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index 79d9ebac4e..9155b468d6 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -116,6 +116,11 @@ New Features
   * Updated HWRM structures to 1.10.1.70 version.
   * Added TRUFLOW support for Stingray devices.
 
+* **Flow rules allowed to use private PMD items / actions.**
+
+  * Flow rule verification was updated to accept private PMD
+    items and actions.
+
 * **Updated Cisco enic driver.**
 
   * Added support for VF representors with single-queue Tx/Rx and flow API
diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
index 686fe40eaa..b74ea5593a 100644
--- a/lib/librte_ethdev/rte_flow.c
+++ b/lib/librte_ethdev/rte_flow.c
@@ -518,7 +518,11 @@ rte_flow_conv_item_spec(void *buf, const size_t size,
 		}
 		break;
 	default:
-		off = rte_flow_desc_item[item->type].size;
+		/**
+		 * allow PMD private flow item
+		 */
+		off = (int)item->type >= 0 ?
+		      rte_flow_desc_item[item->type].size : sizeof(void *);
 		rte_memcpy(buf, data, (size > off ? off : size));
 		break;
 	}
@@ -621,7 +625,11 @@ rte_flow_conv_action_conf(void *buf, const size_t size,
 		}
 		break;
 	default:
-		off = rte_flow_desc_action[action->type].size;
+		/**
+		 * allow PMD private flow action
+		 */
+		off = (int)action->type >= 0 ?
+		      rte_flow_desc_action[action->type].size : sizeof(void *);
 		rte_memcpy(buf, action->conf, (size > off ? off : size));
 		break;
 	}
@@ -663,8 +671,12 @@ rte_flow_conv_pattern(struct rte_flow_item *dst,
 	unsigned int i;
 
 	for (i = 0, off = 0; !num || i != num; ++i, ++src, ++dst) {
-		if ((size_t)src->type >= RTE_DIM(rte_flow_desc_item) ||
-		    !rte_flow_desc_item[src->type].name)
+		/**
+		 * allow PMD private flow item
+		 */
+		if (((int)src->type >= 0) &&
+			((size_t)src->type >= RTE_DIM(rte_flow_desc_item) ||
+		    !rte_flow_desc_item[src->type].name))
 			return rte_flow_error_set
 				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM, src,
 				 "cannot convert unknown item type");
@@ -752,8 +764,12 @@ rte_flow_conv_actions(struct rte_flow_action *dst,
 	unsigned int i;
 
 	for (i = 0, off = 0; !num || i != num; ++i, ++src, ++dst) {
-		if ((size_t)src->type >= RTE_DIM(rte_flow_desc_action) ||
-		    !rte_flow_desc_action[src->type].name)
+		/**
+		 * allow PMD private flow action
+		 */
+		if (((int)src->type >= 0) &&
+		    ((size_t)src->type >= RTE_DIM(rte_flow_desc_action) ||
+		    !rte_flow_desc_action[src->type].name))
 			return rte_flow_error_set
 				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION,
 				 src, "cannot convert unknown action type");
-- 
2.28.0


^ permalink raw reply	[flat|nested] 95+ messages in thread

* [dpdk-dev] [PATCH v6 2/3] ethdev: tunnel offload model
  2020-10-16  8:55 ` [dpdk-dev] [PATCH v6 " Gregory Etelson
  2020-10-16  8:55   ` [dpdk-dev] [PATCH v6 1/3] ethdev: allow negative values in flow rule types Gregory Etelson
@ 2020-10-16  8:55   ` Gregory Etelson
  2020-10-16  8:55   ` [dpdk-dev] [PATCH v6 3/3] app/testpmd: add commands for tunnel offload API Gregory Etelson
  2 siblings, 0 replies; 95+ messages in thread
From: Gregory Etelson @ 2020-10-16  8:55 UTC (permalink / raw)
  To: dev
  Cc: getelson, matan, rasland, elibr, ozsh, ajit.khaparde, asafp,
	Eli Britstein, Ori Kam, Viacheslav Ovsiienko, Ray Kinsella,
	Neil Horman, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko

From: Eli Britstein <elibr@mellanox.com>

rte_flow API provides the building blocks for vendor-agnostic flow
classification offloads. The rte_flow "patterns" and "actions"
primitives are fine-grained, thus enabling DPDK applications the
flexibility to offload network stacks and complex pipelines.
Applications wishing to offload tunneled traffic are required to use
the rte_flow primitives, such as group, meta, mark, tag, and others to
model their high-level objects.  The hardware model design for
high-level software objects is not trivial.  Furthermore, an optimal
design is often vendor-specific.

When hardware offloads tunneled traffic in multi-group logic,
partially offloaded packets may arrive at the application after they
were modified in hardware. In this case, the application may need to
restore the original packet headers. Consider the following sequence:
The application decaps a packet in one group and jumps to a second
group where it tries to match on a 5-tuple, that will miss and send
the packet to the application. In this case, the application does not
receive the original packet but a modified one. Also, in this case,
the application cannot match on the outer header fields, such as VXLAN
vni and 5-tuple.

There are several possible ways to use rte_flow "patterns" and
"actions" to resolve the issues above. For example:
1 Mapping headers to hardware registers using the
rte_flow_action_mark/rte_flow_action_tag/rte_flow_set_meta objects.
2 Apply the decap only at the last offload stage after all the
"patterns" were matched and the packet will be fully offloaded.
Every approach has its pros and cons and is highly dependent on the
hardware vendor.  For example, some hardware may have a limited number
of registers while other hardware could not support inner actions and
must decap before accessing inner headers.

The tunnel offload model resolves these issues. The model goals are:
1 Provide a unified application API to offload tunneled traffic that
is capable of matching on outer headers after decap.
2 Allow the application to restore the outer header of partially
offloaded packets.

The tunnel offload model does not introduce new elements to the
existing RTE flow model and is implemented as a set of helper
functions.

For the application to work with the tunnel offload API it
has to adjust flow rules in multi-table tunnel offload in the
following way:
1 Remove explicit call to decap action and replace it with PMD actions
obtained from rte_flow_tunnel_decap_and_set() helper.
2 Add PMD items obtained from rte_flow_tunnel_match() helper to all
other rules in the tunnel offload sequence.

VXLAN Code example:

Assume application needs to do inner NAT on the VXLAN packet.
The first  rule in group 0:

flow create <port id> ingress group 0
  pattern eth / ipv4 / udp dst is 4789 / vxlan / end
  actions {pmd actions} / jump group 3 / end

The first VXLAN packet that arrives matches the rule in group 0 and
jumps to group 3.  In group 3 the packet will miss since there is no
flow to match and will be sent to the application. The application will
call rte_flow_get_restore_info() to get the packet outer header.

Application will insert a new rule in group 3 to match outer and inner
headers:

flow create <port id> ingress group 3
  pattern {pmd items} / eth / ipv4 dst is 172.10.10.1 /
          udp dst 4789 / vxlan vni is 10 /
          ipv4 dst is 184.1.2.3 / end
  actions  set_ipv4_dst  186.1.1.1 / queue index 3 / end

The result of these rules is that a VXLAN packet with vni=10, outer
IPv4 dst=172.10.10.1 and inner IPv4 dst=184.1.2.3 will be received
decapped on queue 3 with IPv4 dst=186.1.1.1.

Note: The packet in group 3 is considered decapped. All actions in
that group will be done on the header that was inner before decap. The
application may specify an outer header to be matched on. It is the
PMD's responsibility to translate these items to outer metadata.

API usage:

/**
 * 1. Initiate RTE flow tunnel object
 */
const struct rte_flow_tunnel tunnel = {
  .type = RTE_FLOW_ITEM_TYPE_VXLAN,
  .tun_id = 10,
}

/**
 * 2. Obtain PMD tunnel actions
 *
 * pmd_actions is an intermediate variable application uses to
 * compile actions array
 */
struct rte_flow_action **pmd_actions;
rte_flow_tunnel_decap_and_set(&tunnel, &pmd_actions,
                              &num_pmd_actions, &error);
/**
 * 3. offload the first  rule
 * matching on VXLAN traffic and jumps to group 3
 * (implicitly decaps packet)
 */
app_actions  =   jump group 3
rule_items = app_items;  /** eth / ipv4 / udp / vxlan  */
rule_actions = { pmd_actions, app_actions };
attr.group = 0;
flow_1 = rte_flow_create(port_id, &attr,
                         rule_items, rule_actions, &error);

/**
  * 4. after flow creation application does not need to keep the
  * tunnel action resources.
  */
rte_flow_tunnel_action_release(port_id, pmd_actions,
                               num_pmd_actions);
/**
  * 5. After partially offloaded packet miss because there was no
  * matching rule handle miss on group 3
  */
struct rte_flow_restore_info info;
rte_flow_get_restore_info(port_id, mbuf, &info, &error);

/**
 * 6. Offload NAT rule:
 */
app_items = { eth / ipv4 dst is 172.10.10.1 / udp dst 4789 /
            vxlan vni is 10 / ipv4 dst is 184.1.2.3 }
app_actions = { set_ipv4_dst 186.1.1.1 / queue index 3 }

rte_flow_tunnel_match(&info.tunnel, &pmd_items,
                      &num_pmd_items,  &error);
rule_items = {pmd_items, app_items};
rule_actions = app_actions;
attr.group = info.group_id;
flow_2 = rte_flow_create(port_id, &attr,
                         rule_items, rule_actions, &error);

/**
 * 7. Release PMD items after rule creation
 */
rte_flow_tunnel_item_release(port_id,
                             pmd_items, num_pmd_items);

References
1. https://mails.dpdk.org/archives/dev/2020-June/index.html

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

---
v5:
* rebase to next-net

v6:
* update the patch comment
* update tunnel offload section in rte_flow.rst
---
 doc/guides/prog_guide/rte_flow.rst       |  78 +++++++++
 doc/guides/rel_notes/release_20_11.rst   |   5 +
 lib/librte_ethdev/rte_ethdev_version.map |   5 +
 lib/librte_ethdev/rte_flow.c             | 112 +++++++++++++
 lib/librte_ethdev/rte_flow.h             | 195 +++++++++++++++++++++++
 lib/librte_ethdev/rte_flow_driver.h      |  32 ++++
 6 files changed, 427 insertions(+)

diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 7fb5ec9059..8dc048c6f4 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -3131,6 +3131,84 @@ operations include:
 - Duplication of a complete flow rule description.
 - Pattern item or action name retrieval.
 
+Tunneled traffic offload
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+rte_flow API provides the building blocks for vendor-agnostic flow
+classification offloads. The rte_flow "patterns" and "actions"
+primitives are fine-grained, thus enabling DPDK applications the
+flexibility to offload network stacks and complex pipelines.
+Applications wishing to offload tunneled traffic are required to use
+the rte_flow primitives, such as group, meta, mark, tag, and others to
+model their high-level objects.  The hardware model design for
+high-level software objects is not trivial.  Furthermore, an optimal
+design is often vendor-specific.
+
+When hardware offloads tunneled traffic in multi-group logic,
+partially offloaded packets may arrive at the application after they
+were modified in hardware. In this case, the application may need to
+restore the original packet headers. Consider the following sequence:
+The application decaps a packet in one group and jumps to a second
+group where it tries to match on a 5-tuple, that will miss and send
+the packet to the application. In this case, the application does not
+receive the original packet but a modified one. Also, in this case,
+the application cannot match on the outer header fields, such as VXLAN
+vni and 5-tuple.
+
+There are several possible ways to use rte_flow "patterns" and
+"actions" to resolve the issues above. For example:
+
+1 Mapping headers to hardware registers using the
+rte_flow_action_mark/rte_flow_action_tag/rte_flow_set_meta objects.
+
+2 Apply the decap only at the last offload stage after all the
+"patterns" were matched and the packet will be fully offloaded.
+
+Every approach has its pros and cons and is highly dependent on the
+hardware vendor.  For example, some hardware may have a limited number
+of registers while other hardware could not support inner actions and
+must decap before accessing inner headers.
+
+The tunnel offload model resolves these issues. The model goals are:
+
+1 Provide a unified application API to offload tunneled traffic that
+is capable of matching on outer headers after decap.
+
+2 Allow the application to restore the outer header of partially
+offloaded packets.
+
+The tunnel offload model does not introduce new elements to the
+existing RTE flow model and is implemented as a set of helper
+functions.
+
+For the application to work with the tunnel offload API it
+has to adjust flow rules in multi-table tunnel offload in the
+following way:
+
+1 Remove explicit call to decap action and replace it with PMD actions
+obtained from rte_flow_tunnel_decap_and_set() helper.
+
+2 Add PMD items obtained from rte_flow_tunnel_match() helper to all
+other rules in the tunnel offload sequence.
+
+The model requirements:
+
+Software application must initialize
+rte_tunnel object with tunnel parameters before calling
+rte_flow_tunnel_decap_set() & rte_flow_tunnel_match().
+
+PMD actions array obtained in rte_flow_tunnel_decap_set() must be
+released by application with rte_flow_tunnel_action_decap_release() call.
+
+PMD items array obtained with rte_flow_tunnel_match() must be released
+by application with rte_flow_tunnel_item_release() call. The application
+can release PMD items and actions after the rule was created. However, if
+the application needs to create an additional rule for the same tunnel it
+will need to obtain the PMD items again.
+
+Application cannot destroy rte_tunnel object before it releases all
+PMD actions & PMD items referencing that tunnel.
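+
+A rough usage sketch of the decap-set helpers (error checks and rule
+creation omitted; ``port_id`` is assumed to identify the port)::
+
+   struct rte_flow_tunnel tunnel = {
+       .type = RTE_FLOW_ITEM_TYPE_VXLAN,
+       .tun_id = 10,
+   };
+   struct rte_flow_action *pmd_actions;
+   uint32_t num_pmd_actions;
+   struct rte_flow_error error;
+
+   rte_flow_tunnel_decap_set(port_id, &tunnel, &pmd_actions,
+                             &num_pmd_actions, &error);
+   /* prepend pmd_actions to the rule actions and create the rule, then: */
+   rte_flow_tunnel_action_decap_release(port_id, pmd_actions,
+                                        num_pmd_actions, &error);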
+
 Caveats
 -------
 
diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index 9155b468d6..f125ce79dd 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -121,6 +121,11 @@ New Features
   * Flow rule verification was updated to accept private PMD
     items and actions.
 
+* **Added generic API to offload tunneled traffic and restore missed packet.**
+
+  * Added a new hardware independent helper API to RTE flow library that
+    offloads tunneled traffic and restores missed packets.
+
 * **Updated Cisco enic driver.**
 
   * Added support for VF representors with single-queue Tx/Rx and flow API
diff --git a/lib/librte_ethdev/rte_ethdev_version.map b/lib/librte_ethdev/rte_ethdev_version.map
index f64c379ac2..8ddda2547f 100644
--- a/lib/librte_ethdev/rte_ethdev_version.map
+++ b/lib/librte_ethdev/rte_ethdev_version.map
@@ -239,6 +239,11 @@ EXPERIMENTAL {
 	rte_flow_shared_action_destroy;
 	rte_flow_shared_action_query;
 	rte_flow_shared_action_update;
+	rte_flow_tunnel_decap_set;
+	rte_flow_tunnel_match;
+	rte_flow_get_restore_info;
+	rte_flow_tunnel_action_decap_release;
+	rte_flow_tunnel_item_release;
 };
 
 INTERNAL {
diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
index b74ea5593a..380c5cae2c 100644
--- a/lib/librte_ethdev/rte_flow.c
+++ b/lib/librte_ethdev/rte_flow.c
@@ -1143,3 +1143,115 @@ rte_flow_shared_action_query(uint16_t port_id,
 				       data, error);
 	return flow_err(port_id, ret, error);
 }
+
+int
+rte_flow_tunnel_decap_set(uint16_t port_id,
+			  struct rte_flow_tunnel *tunnel,
+			  struct rte_flow_action **actions,
+			  uint32_t *num_of_actions,
+			  struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->tunnel_decap_set)) {
+		return flow_err(port_id,
+				ops->tunnel_decap_set(dev, tunnel, actions,
+						      num_of_actions, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_tunnel_match(uint16_t port_id,
+		      struct rte_flow_tunnel *tunnel,
+		      struct rte_flow_item **items,
+		      uint32_t *num_of_items,
+		      struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->tunnel_match)) {
+		return flow_err(port_id,
+				ops->tunnel_match(dev, tunnel, items,
+						  num_of_items, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_get_restore_info(uint16_t port_id,
+			  struct rte_mbuf *m,
+			  struct rte_flow_restore_info *restore_info,
+			  struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->get_restore_info)) {
+		return flow_err(port_id,
+				ops->get_restore_info(dev, m, restore_info,
+						      error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_tunnel_action_decap_release(uint16_t port_id,
+				     struct rte_flow_action *actions,
+				     uint32_t num_of_actions,
+				     struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->action_release)) {
+		return flow_err(port_id,
+				ops->action_release(dev, actions,
+						    num_of_actions, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_tunnel_item_release(uint16_t port_id,
+			     struct rte_flow_item *items,
+			     uint32_t num_of_items,
+			     struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->item_release)) {
+		return flow_err(port_id,
+				ops->item_release(dev, items,
+						  num_of_items, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
index 48395284b5..a8eac4deb8 100644
--- a/lib/librte_ethdev/rte_flow.h
+++ b/lib/librte_ethdev/rte_flow.h
@@ -3620,6 +3620,201 @@ rte_flow_shared_action_query(uint16_t port_id,
 			     void *data,
 			     struct rte_flow_error *error);
 
+/* Tunnel has a type and the key information. */
+struct rte_flow_tunnel {
+	/**
+	 * Tunnel type, for example RTE_FLOW_ITEM_TYPE_VXLAN,
+	 * RTE_FLOW_ITEM_TYPE_NVGRE etc.
+	 */
+	enum rte_flow_item_type	type;
+	uint64_t tun_id; /**< Tunnel identification. */
+
+	RTE_STD_C11
+	union {
+		struct {
+			rte_be32_t src_addr; /**< IPv4 source address. */
+			rte_be32_t dst_addr; /**< IPv4 destination address. */
+		} ipv4;
+		struct {
+			uint8_t src_addr[16]; /**< IPv6 source address. */
+			uint8_t dst_addr[16]; /**< IPv6 destination address. */
+		} ipv6;
+	};
+	rte_be16_t tp_src; /**< Tunnel port source. */
+	rte_be16_t tp_dst; /**< Tunnel port destination. */
+	uint16_t   tun_flags; /**< Tunnel flags. */
+
+	bool       is_ipv6; /**< True for valid IPv6 fields. Otherwise IPv4. */
+
+	/**
+	 * the following members are required to restore packet
+	 * after miss
+	 */
+	uint8_t    tos; /**< TOS for IPv4, TC for IPv6. */
+	uint8_t    ttl; /**< TTL for IPv4, HL for IPv6. */
+	uint32_t label; /**< Flow Label for IPv6. */
+};
+
+/**
+ * Indicate that the packet has a tunnel.
+ */
+#define RTE_FLOW_RESTORE_INFO_TUNNEL  (1ULL << 0)
+
+/**
+ * Indicate that the packet has a non decapsulated tunnel header.
+ */
+#define RTE_FLOW_RESTORE_INFO_ENCAPSULATED  (1ULL << 1)
+
+/**
+ * Indicate that the packet has a group_id.
+ */
+#define RTE_FLOW_RESTORE_INFO_GROUP_ID  (1ULL << 2)
+
+/**
+ * Restore information structure to communicate the current packet processing
+ * state when some of the processing pipeline is done in hardware and should
+ * continue in software.
+ */
+struct rte_flow_restore_info {
+	/**
+	 * Bitwise flags (RTE_FLOW_RESTORE_INFO_*) to indicate validation of
+	 * other fields in struct rte_flow_restore_info.
+	 */
+	uint64_t flags;
+	uint32_t group_id; /**< Group ID where packet missed. */
+	struct rte_flow_tunnel tunnel; /**< Tunnel information. */
+};
+
+/**
+ * Allocate an array of actions to be used in rte_flow_create, to implement
+ * tunnel-decap-set for the given tunnel.
+ * Sample usage:
+ *   actions vxlan_decap / tunnel-decap-set(tunnel properties) /
+ *            jump group 0 / end
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] tunnel
+ *   Tunnel properties.
+ * @param[out] actions
+ *   Array of actions to be allocated by the PMD. This array should be
+ *   concatenated with the actions array provided to rte_flow_create.
+ * @param[out] num_of_actions
+ *   Number of actions allocated.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_decap_set(uint16_t port_id,
+			  struct rte_flow_tunnel *tunnel,
+			  struct rte_flow_action **actions,
+			  uint32_t *num_of_actions,
+			  struct rte_flow_error *error);
+
+/**
+ * Allocate an array of items to be used in rte_flow_create, to implement
+ * tunnel-match for the given tunnel.
+ * Sample usage:
+ *   pattern tunnel-match(tunnel properties) / outer-header-matches /
+ *           inner-header-matches / end
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] tunnel
+ *   Tunnel properties.
+ * @param[out] items
+ *   Array of items to be allocated by the PMD. This array should be
+ *   concatenated with the items array provided to rte_flow_create.
+ * @param[out] num_of_items
+ *   Number of items allocated.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_match(uint16_t port_id,
+		      struct rte_flow_tunnel *tunnel,
+		      struct rte_flow_item **items,
+		      uint32_t *num_of_items,
+		      struct rte_flow_error *error);
+
+/**
+ * Populate the current packet processing state, if exists, for the given mbuf.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] m
+ *   Mbuf struct.
+ * @param[out] info
+ *   Restore information. Upon success contains the HW state.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_get_restore_info(uint16_t port_id,
+			  struct rte_mbuf *m,
+			  struct rte_flow_restore_info *info,
+			  struct rte_flow_error *error);
+
+/**
+ * Release the action array as allocated by rte_flow_tunnel_decap_set.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] actions
+ *   Array of actions to be released.
+ * @param[in] num_of_actions
+ *   Number of elements in actions array.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_action_decap_release(uint16_t port_id,
+				     struct rte_flow_action *actions,
+				     uint32_t num_of_actions,
+				     struct rte_flow_error *error);
+
+/**
+ * Release the item array as allocated by rte_flow_tunnel_match.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] items
+ *   Array of items to be released.
+ * @param[in] num_of_items
+ *   Number of elements in item array.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_item_release(uint16_t port_id,
+			     struct rte_flow_item *items,
+			     uint32_t num_of_items,
+			     struct rte_flow_error *error);
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_ethdev/rte_flow_driver.h b/lib/librte_ethdev/rte_flow_driver.h
index 58f56b0262..bd5ffc0bb1 100644
--- a/lib/librte_ethdev/rte_flow_driver.h
+++ b/lib/librte_ethdev/rte_flow_driver.h
@@ -131,6 +131,38 @@ struct rte_flow_ops {
 		 const struct rte_flow_shared_action *shared_action,
 		 void *data,
 		 struct rte_flow_error *error);
+	/** See rte_flow_tunnel_decap_set() */
+	int (*tunnel_decap_set)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_tunnel *tunnel,
+		 struct rte_flow_action **pmd_actions,
+		 uint32_t *num_of_actions,
+		 struct rte_flow_error *err);
+	/** See rte_flow_tunnel_match() */
+	int (*tunnel_match)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_tunnel *tunnel,
+		 struct rte_flow_item **pmd_items,
+		 uint32_t *num_of_items,
+		 struct rte_flow_error *err);
+	/** See rte_flow_get_restore_info() */
+	int (*get_restore_info)
+		(struct rte_eth_dev *dev,
+		 struct rte_mbuf *m,
+		 struct rte_flow_restore_info *info,
+		 struct rte_flow_error *err);
+	/** See rte_flow_tunnel_action_decap_release() */
+	int (*action_release)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_action *pmd_actions,
+		 uint32_t num_of_actions,
+		 struct rte_flow_error *err);
+	/** See rte_flow_tunnel_item_release() */
+	int (*item_release)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_item *pmd_items,
+		 uint32_t num_of_items,
+		 struct rte_flow_error *err);
 };
 
 /**
-- 
2.28.0


^ permalink raw reply	[flat|nested] 95+ messages in thread

* [dpdk-dev] [PATCH v6 3/3] app/testpmd: add commands for tunnel offload API
  2020-10-16  8:55 ` [dpdk-dev] [PATCH v6 " Gregory Etelson
  2020-10-16  8:55   ` [dpdk-dev] [PATCH v6 1/3] ethdev: allow negative values in flow rule types Gregory Etelson
  2020-10-16  8:55   ` [dpdk-dev] [PATCH v6 2/3] ethdev: tunnel offload model Gregory Etelson
@ 2020-10-16  8:55   ` Gregory Etelson
  2 siblings, 0 replies; 95+ messages in thread
From: Gregory Etelson @ 2020-10-16  8:55 UTC (permalink / raw)
  To: dev
  Cc: getelson, matan, rasland, elibr, ozsh, ajit.khaparde, asafp,
	Ori Kam, Wenzhuo Lu, Beilei Xing, Bernard Iremonger

Tunnel Offload API provides hardware independent, unified model
to offload tunneled traffic. Key model elements are:
 - apply matches to both outer and inner packet headers
   during entire offload procedure;
 - restore outer header of partially offloaded packet;
 - model is implemented as a set of helper functions.

Implementation details:

* Create application tunnel:
flow tunnel create <port> type <tunnel type>
On success, the command creates application tunnel object and returns
the tunnel descriptor. Tunnel descriptor is used in subsequent flow
creation commands to reference the tunnel.

* Create tunnel steering flow rule:
tunnel_set <tunnel descriptor> parameter used with steering rule
template.

* Create tunnel matching flow rule:
tunnel_match <tunnel descriptor> used with matching rule template.

* If tunnel steering rule was offloaded, outer header of a partially
offloaded packet is restored after miss.

Example:
test packet=
<Ether  dst=24:8a:07:8d:ae:d6 src=50:6b:4b:cc:fc:e2 type=IPv4 |
<IP  version=4 ihl=5 proto=udp src=1.1.1.1 dst=1.1.1.10 |
<UDP  sport=4789 dport=4789 len=58 chksum=0x7f7b |
<VXLAN  NextProtocol=Ethernet vni=0x0 |
<Ether  dst=24:aa:aa:aa:aa:d6 src=50:bb:bb:bb:bb:e2 type=IPv4 |
<IP  version=4 ihl=5 proto=icmp src=2.2.2.2 dst=2.2.2.200 |
<ICMP  type=echo-request code=0 chksum=0xf7ff id=0x0 seq=0x0 |>>>>>>>
>>> len(packet)
92

testpmd> flow flush 0
testpmd> port 0/queue 0: received 1 packets
src=50:6B:4B:CC:FC:E2 - dst=24:8A:07:8D:AE:D6 - type=0x0800 -
length=92

testpmd> flow tunnel create 0 type vxlan
port 0: flow tunnel #1 type vxlan
testpmd> flow create 0 ingress group 0 tunnel_set 1
         pattern eth / ipv4 / udp dst is 4789 / vxlan / end
         actions  jump group 0 / end
Flow rule #0 created
testpmd> port 0/queue 0: received 1 packets
tunnel restore info: - vxlan tunnel - outer header present # <--
  src=50:6B:4B:CC:FC:E2 - dst=24:8A:07:8D:AE:D6 - type=0x0800 -
length=92

testpmd> flow create 0 ingress group 0 tunnel_match 1
         pattern eth / ipv4 / udp dst is 4789 / vxlan / eth / ipv4 /
         end
         actions set_mac_dst mac_addr 02:CA:FE:CA:FA:80 /
         queue index 0 / end
Flow rule #1 created
testpmd> port 0/queue 0: received 1 packets
  src=50:BB:BB:BB:BB:E2 - dst=02:CA:FE:CA:FA:80 - type=0x0800 -
length=42

* Destroy flow tunnel
flow tunnel destroy <port> id <tunnel id>

* Show existing flow tunnels
flow tunnel list <port>

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
---
v2:
* introduce testpmd support for tunnel offload API

v3:
* update flow tunnel commands

v5:
* rebase to next-net
---
 app/test-pmd/cmdline_flow.c                 | 170 ++++++++++++-
 app/test-pmd/config.c                       | 252 +++++++++++++++++++-
 app/test-pmd/testpmd.c                      |   5 +-
 app/test-pmd/testpmd.h                      |  34 ++-
 app/test-pmd/util.c                         |  35 ++-
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  49 ++++
 6 files changed, 532 insertions(+), 13 deletions(-)

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 00c70a144a..b9a1f7178a 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -74,6 +74,14 @@ enum index {
 	LIST,
 	AGED,
 	ISOLATE,
+	TUNNEL,
+
+	/* Tunnel arguments. */
+	TUNNEL_CREATE,
+	TUNNEL_CREATE_TYPE,
+	TUNNEL_LIST,
+	TUNNEL_DESTROY,
+	TUNNEL_DESTROY_ID,
 
 	/* Destroy arguments. */
 	DESTROY_RULE,
@@ -93,6 +101,8 @@ enum index {
 	INGRESS,
 	EGRESS,
 	TRANSFER,
+	TUNNEL_SET,
+	TUNNEL_MATCH,
 
 	/* Shared action arguments */
 	SHARED_ACTION_CREATE,
@@ -713,6 +723,7 @@ struct buffer {
 		} sa; /* Shared action query arguments */
 		struct {
 			struct rte_flow_attr attr;
+			struct tunnel_ops tunnel_ops;
 			struct rte_flow_item *pattern;
 			struct rte_flow_action *actions;
 			uint32_t pattern_n;
@@ -789,10 +800,32 @@ static const enum index next_vc_attr[] = {
 	INGRESS,
 	EGRESS,
 	TRANSFER,
+	TUNNEL_SET,
+	TUNNEL_MATCH,
 	PATTERN,
 	ZERO,
 };
 
+static const enum index tunnel_create_attr[] = {
+	TUNNEL_CREATE,
+	TUNNEL_CREATE_TYPE,
+	END,
+	ZERO,
+};
+
+static const enum index tunnel_destroy_attr[] = {
+	TUNNEL_DESTROY,
+	TUNNEL_DESTROY_ID,
+	END,
+	ZERO,
+};
+
+static const enum index tunnel_list_attr[] = {
+	TUNNEL_LIST,
+	END,
+	ZERO,
+};
+
 static const enum index next_destroy_attr[] = {
 	DESTROY_RULE,
 	END,
@@ -1643,6 +1676,9 @@ static int parse_aged(struct context *, const struct token *,
 static int parse_isolate(struct context *, const struct token *,
 			 const char *, unsigned int,
 			 void *, unsigned int);
+static int parse_tunnel(struct context *, const struct token *,
+			const char *, unsigned int,
+			void *, unsigned int);
 static int parse_int(struct context *, const struct token *,
 		     const char *, unsigned int,
 		     void *, unsigned int);
@@ -1844,7 +1880,8 @@ static const struct token token_list[] = {
 			      LIST,
 			      AGED,
 			      QUERY,
-			      ISOLATE)),
+			      ISOLATE,
+			      TUNNEL)),
 		.call = parse_init,
 	},
 	/* Top-level command. */
@@ -1955,6 +1992,49 @@ static const struct token token_list[] = {
 			     ARGS_ENTRY(struct buffer, port)),
 		.call = parse_isolate,
 	},
+	[TUNNEL] = {
+		.name = "tunnel",
+		.help = "new tunnel API",
+		.next = NEXT(NEXT_ENTRY
+			     (TUNNEL_CREATE, TUNNEL_LIST, TUNNEL_DESTROY)),
+		.call = parse_tunnel,
+	},
+	/* Tunnel arguments. */
+	[TUNNEL_CREATE] = {
+		.name = "create",
+		.help = "create new tunnel object",
+		.next = NEXT(tunnel_create_attr, NEXT_ENTRY(PORT_ID)),
+		.args = ARGS(ARGS_ENTRY(struct buffer, port)),
+		.call = parse_tunnel,
+	},
+	[TUNNEL_CREATE_TYPE] = {
+		.name = "type",
+		.help = "create new tunnel",
+		.next = NEXT(tunnel_create_attr, NEXT_ENTRY(FILE_PATH)),
+		.args = ARGS(ARGS_ENTRY(struct tunnel_ops, type)),
+		.call = parse_tunnel,
+	},
+	[TUNNEL_DESTROY] = {
+		.name = "destroy",
+		.help = "destroy tunnel",
+		.next = NEXT(tunnel_destroy_attr, NEXT_ENTRY(PORT_ID)),
+		.args = ARGS(ARGS_ENTRY(struct buffer, port)),
+		.call = parse_tunnel,
+	},
+	[TUNNEL_DESTROY_ID] = {
+		.name = "id",
+		.help = "tunnel identifier to destroy",
+		.next = NEXT(tunnel_destroy_attr, NEXT_ENTRY(UNSIGNED)),
+		.args = ARGS(ARGS_ENTRY(struct tunnel_ops, id)),
+		.call = parse_tunnel,
+	},
+	[TUNNEL_LIST] = {
+		.name = "list",
+		.help = "list existing tunnels",
+		.next = NEXT(tunnel_list_attr, NEXT_ENTRY(PORT_ID)),
+		.args = ARGS(ARGS_ENTRY(struct buffer, port)),
+		.call = parse_tunnel,
+	},
 	/* Destroy arguments. */
 	[DESTROY_RULE] = {
 		.name = "rule",
@@ -2018,6 +2098,20 @@ static const struct token token_list[] = {
 		.next = NEXT(next_vc_attr),
 		.call = parse_vc,
 	},
+	[TUNNEL_SET] = {
+		.name = "tunnel_set",
+		.help = "tunnel steer rule",
+		.next = NEXT(next_vc_attr, NEXT_ENTRY(UNSIGNED)),
+		.args = ARGS(ARGS_ENTRY(struct tunnel_ops, id)),
+		.call = parse_vc,
+	},
+	[TUNNEL_MATCH] = {
+		.name = "tunnel_match",
+		.help = "tunnel match rule",
+		.next = NEXT(next_vc_attr, NEXT_ENTRY(UNSIGNED)),
+		.args = ARGS(ARGS_ENTRY(struct tunnel_ops, id)),
+		.call = parse_vc,
+	},
 	/* Validate/create pattern. */
 	[PATTERN] = {
 		.name = "pattern",
@@ -4495,12 +4589,28 @@ parse_vc(struct context *ctx, const struct token *token,
 		return len;
 	}
 	ctx->objdata = 0;
-	ctx->object = &out->args.vc.attr;
+	switch (ctx->curr) {
+	default:
+		ctx->object = &out->args.vc.attr;
+		break;
+	case TUNNEL_SET:
+	case TUNNEL_MATCH:
+		ctx->object = &out->args.vc.tunnel_ops;
+		break;
+	}
 	ctx->objmask = NULL;
 	switch (ctx->curr) {
 	case GROUP:
 	case PRIORITY:
 		return len;
+	case TUNNEL_SET:
+		out->args.vc.tunnel_ops.enabled = 1;
+		out->args.vc.tunnel_ops.actions = 1;
+		return len;
+	case TUNNEL_MATCH:
+		out->args.vc.tunnel_ops.enabled = 1;
+		out->args.vc.tunnel_ops.items = 1;
+		return len;
 	case INGRESS:
 		out->args.vc.attr.ingress = 1;
 		return len;
@@ -6108,6 +6218,47 @@ parse_isolate(struct context *ctx, const struct token *token,
 	return len;
 }
 
+static int
+parse_tunnel(struct context *ctx, const struct token *token,
+	     const char *str, unsigned int len,
+	     void *buf, unsigned int size)
+{
+	struct buffer *out = buf;
+
+	/* Token name must match. */
+	if (parse_default(ctx, token, str, len, NULL, 0) < 0)
+		return -1;
+	/* Nothing else to do if there is no buffer. */
+	if (!out)
+		return len;
+	if (!out->command) {
+		if (ctx->curr != TUNNEL)
+			return -1;
+		if (sizeof(*out) > size)
+			return -1;
+		out->command = ctx->curr;
+		ctx->objdata = 0;
+		ctx->object = out;
+		ctx->objmask = NULL;
+	} else {
+		switch (ctx->curr) {
+		default:
+			break;
+		case TUNNEL_CREATE:
+		case TUNNEL_DESTROY:
+		case TUNNEL_LIST:
+			out->command = ctx->curr;
+			break;
+		case TUNNEL_CREATE_TYPE:
+		case TUNNEL_DESTROY_ID:
+			ctx->object = &out->args.vc.tunnel_ops;
+			break;
+		}
+	}
+
+	return len;
+}
+
 /**
  * Parse signed/unsigned integers 8 to 64-bit long.
  *
@@ -7148,11 +7299,13 @@ cmd_flow_parsed(const struct buffer *in)
 		break;
 	case VALIDATE:
 		port_flow_validate(in->port, &in->args.vc.attr,
-				   in->args.vc.pattern, in->args.vc.actions);
+				   in->args.vc.pattern, in->args.vc.actions,
+				   &in->args.vc.tunnel_ops);
 		break;
 	case CREATE:
 		port_flow_create(in->port, &in->args.vc.attr,
-				 in->args.vc.pattern, in->args.vc.actions);
+				 in->args.vc.pattern, in->args.vc.actions,
+				 &in->args.vc.tunnel_ops);
 		break;
 	case DESTROY:
 		port_flow_destroy(in->port, in->args.destroy.rule_n,
@@ -7178,6 +7331,15 @@ cmd_flow_parsed(const struct buffer *in)
 	case AGED:
 		port_flow_aged(in->port, in->args.aged.destroy);
 		break;
+	case TUNNEL_CREATE:
+		port_flow_tunnel_create(in->port, &in->args.vc.tunnel_ops);
+		break;
+	case TUNNEL_DESTROY:
+		port_flow_tunnel_destroy(in->port, in->args.vc.tunnel_ops.id);
+		break;
+	case TUNNEL_LIST:
+		port_flow_tunnel_list(in->port);
+		break;
 	default:
 		break;
 	}
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 2c00b55440..6bce791dfb 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1521,6 +1521,115 @@ port_mtu_set(portid_t port_id, uint16_t mtu)
 
 /* Generic flow management functions. */
 
+static struct port_flow_tunnel *
+port_flow_locate_tunnel_id(struct rte_port *port, uint32_t port_tunnel_id)
+{
+	struct port_flow_tunnel *flow_tunnel;
+
+	LIST_FOREACH(flow_tunnel, &port->flow_tunnel_list, chain) {
+		if (flow_tunnel->id == port_tunnel_id)
+			goto out;
+	}
+	flow_tunnel = NULL;
+
+out:
+	return flow_tunnel;
+}
+
+const char *
+port_flow_tunnel_type(struct rte_flow_tunnel *tunnel)
+{
+	const char *type;
+	switch (tunnel->type) {
+	default:
+		type = "unknown";
+		break;
+	case RTE_FLOW_ITEM_TYPE_VXLAN:
+		type = "vxlan";
+		break;
+	}
+
+	return type;
+}
+
+struct port_flow_tunnel *
+port_flow_locate_tunnel(uint16_t port_id, struct rte_flow_tunnel *tun)
+{
+	struct rte_port *port = &ports[port_id];
+	struct port_flow_tunnel *flow_tunnel;
+
+	LIST_FOREACH(flow_tunnel, &port->flow_tunnel_list, chain) {
+		if (!memcmp(&flow_tunnel->tunnel, tun, sizeof(*tun)))
+			goto out;
+	}
+	flow_tunnel = NULL;
+
+out:
+	return flow_tunnel;
+}
+
+void port_flow_tunnel_list(portid_t port_id)
+{
+	struct rte_port *port = &ports[port_id];
+	struct port_flow_tunnel *flt;
+
+	LIST_FOREACH(flt, &port->flow_tunnel_list, chain) {
+		printf("port %u tunnel #%u type=%s",
+			port_id, flt->id, port_flow_tunnel_type(&flt->tunnel));
+		if (flt->tunnel.tun_id)
+			printf(" id=%" PRIu64, flt->tunnel.tun_id);
+		printf("\n");
+	}
+}
+
+void port_flow_tunnel_destroy(portid_t port_id, uint32_t tunnel_id)
+{
+	struct rte_port *port = &ports[port_id];
+	struct port_flow_tunnel *flt;
+
+	LIST_FOREACH(flt, &port->flow_tunnel_list, chain) {
+		if (flt->id == tunnel_id)
+			break;
+	}
+	if (flt) {
+		LIST_REMOVE(flt, chain);
+		free(flt);
+		printf("port %u: flow tunnel #%u destroyed\n",
+			port_id, tunnel_id);
+	}
+}
+
+void port_flow_tunnel_create(portid_t port_id, const struct tunnel_ops *ops)
+{
+	struct rte_port *port = &ports[port_id];
+	enum rte_flow_item_type	type;
+	struct port_flow_tunnel *flt;
+
+	if (!strcmp(ops->type, "vxlan"))
+		type = RTE_FLOW_ITEM_TYPE_VXLAN;
+	else {
+		printf("cannot offload \"%s\" tunnel type\n", ops->type);
+		return;
+	}
+	LIST_FOREACH(flt, &port->flow_tunnel_list, chain) {
+		if (flt->tunnel.type == type)
+			break;
+	}
+	if (!flt) {
+		flt = calloc(1, sizeof(*flt));
+		if (!flt) {
+			printf("failed to allocate port flt object\n");
+			return;
+		}
+		flt->tunnel.type = type;
+		flt->id = LIST_EMPTY(&port->flow_tunnel_list) ? 1 :
+				  LIST_FIRST(&port->flow_tunnel_list)->id + 1;
+		LIST_INSERT_HEAD(&port->flow_tunnel_list, flt, chain);
+	}
+	printf("port %d: flow tunnel #%u type %s\n",
+		port_id, flt->id, ops->type);
+}
+
 /** Generate a port_flow entry from attributes/pattern/actions. */
 static struct port_flow *
 port_flow_new(const struct rte_flow_attr *attr,
@@ -1860,20 +1969,137 @@ port_shared_action_query(portid_t port_id, uint32_t id)
 	}
 	return ret;
 }
+static struct port_flow_tunnel *
+port_flow_tunnel_offload_cmd_prep(portid_t port_id,
+				  const struct rte_flow_item *pattern,
+				  const struct rte_flow_action *actions,
+				  const struct tunnel_ops *tunnel_ops)
+{
+	int ret;
+	struct rte_port *port;
+	struct port_flow_tunnel *pft;
+	struct rte_flow_error error;
+
+	port = &ports[port_id];
+	pft = port_flow_locate_tunnel_id(port, tunnel_ops->id);
+	if (!pft) {
+		printf("failed to locate port flow tunnel #%u\n",
+			tunnel_ops->id);
+		return NULL;
+	}
+	if (tunnel_ops->actions) {
+		uint32_t num_actions;
+		const struct rte_flow_action *aptr;
+
+		ret = rte_flow_tunnel_decap_set(port_id, &pft->tunnel,
+						&pft->pmd_actions,
+						&pft->num_pmd_actions,
+						&error);
+		if (ret) {
+			port_flow_complain(&error);
+			return NULL;
+		}
+		for (aptr = actions, num_actions = 1;
+		     aptr->type != RTE_FLOW_ACTION_TYPE_END;
+		     aptr++, num_actions++);
+		pft->actions = malloc(
+				(num_actions +  pft->num_pmd_actions) *
+				sizeof(actions[0]));
+		if (!pft->actions) {
+			rte_flow_tunnel_action_decap_release(
+					port_id, pft->actions,
+					pft->num_pmd_actions, &error);
+			return NULL;
+		}
+		rte_memcpy(pft->actions, pft->pmd_actions,
+			   pft->num_pmd_actions * sizeof(actions[0]));
+		rte_memcpy(pft->actions + pft->num_pmd_actions, actions,
+			   num_actions * sizeof(actions[0]));
+	}
+	if (tunnel_ops->items) {
+		uint32_t num_items;
+		const struct rte_flow_item *iptr;
+
+		ret = rte_flow_tunnel_match(port_id, &pft->tunnel,
+					    &pft->pmd_items,
+					    &pft->num_pmd_items,
+					    &error);
+		if (ret) {
+			port_flow_complain(&error);
+			return NULL;
+		}
+		for (iptr = pattern, num_items = 1;
+		     iptr->type != RTE_FLOW_ITEM_TYPE_END;
+		     iptr++, num_items++);
+		pft->items = malloc((num_items + pft->num_pmd_items) *
+				    sizeof(pattern[0]));
+		if (!pft->items) {
+			rte_flow_tunnel_item_release(
+					port_id, pft->pmd_items,
+					pft->num_pmd_items, &error);
+			return NULL;
+		}
+		rte_memcpy(pft->items, pft->pmd_items,
+			   pft->num_pmd_items * sizeof(pattern[0]));
+		rte_memcpy(pft->items + pft->num_pmd_items, pattern,
+			   num_items * sizeof(pattern[0]));
+	}
+
+	return pft;
+}
+
+static void
+port_flow_tunnel_offload_cmd_release(portid_t port_id,
+				     const struct tunnel_ops *tunnel_ops,
+				     struct port_flow_tunnel *pft)
+{
+	struct rte_flow_error error;
+
+	if (tunnel_ops->actions) {
+		free(pft->actions);
+		rte_flow_tunnel_action_decap_release(
+			port_id, pft->pmd_actions,
+			pft->num_pmd_actions, &error);
+		pft->actions = NULL;
+		pft->pmd_actions = NULL;
+	}
+	if (tunnel_ops->items) {
+		free(pft->items);
+		rte_flow_tunnel_item_release(port_id, pft->pmd_items,
+					     pft->num_pmd_items,
+					     &error);
+		pft->items = NULL;
+		pft->pmd_items = NULL;
+	}
+}
 
 /** Validate flow rule. */
 int
 port_flow_validate(portid_t port_id,
 		   const struct rte_flow_attr *attr,
 		   const struct rte_flow_item *pattern,
-		   const struct rte_flow_action *actions)
+		   const struct rte_flow_action *actions,
+		   const struct tunnel_ops *tunnel_ops)
 {
 	struct rte_flow_error error;
+	struct port_flow_tunnel *pft = NULL;
 
 	/* Poisoning to make sure PMDs update it in case of error. */
 	memset(&error, 0x11, sizeof(error));
+	if (tunnel_ops->enabled) {
+		pft = port_flow_tunnel_offload_cmd_prep(port_id, pattern,
+							actions, tunnel_ops);
+		if (!pft)
+			return -ENOENT;
+		if (pft->items)
+			pattern = pft->items;
+		if (pft->actions)
+			actions = pft->actions;
+	}
 	if (rte_flow_validate(port_id, attr, pattern, actions, &error))
 		return port_flow_complain(&error);
+	if (tunnel_ops->enabled)
+		port_flow_tunnel_offload_cmd_release(port_id, tunnel_ops, pft);
 	printf("Flow rule validated\n");
 	return 0;
 }
@@ -1903,13 +2129,15 @@ int
 port_flow_create(portid_t port_id,
 		 const struct rte_flow_attr *attr,
 		 const struct rte_flow_item *pattern,
-		 const struct rte_flow_action *actions)
+		 const struct rte_flow_action *actions,
+		 const struct tunnel_ops *tunnel_ops)
 {
 	struct rte_flow *flow;
 	struct rte_port *port;
 	struct port_flow *pf;
 	uint32_t id = 0;
 	struct rte_flow_error error;
+	struct port_flow_tunnel *pft = NULL;
 
 	port = &ports[port_id];
 	if (port->flow_list) {
@@ -1920,6 +2148,16 @@ port_flow_create(portid_t port_id,
 		}
 		id = port->flow_list->id + 1;
 	}
+	if (tunnel_ops->enabled) {
+		pft = port_flow_tunnel_offload_cmd_prep(port_id, pattern,
+							actions, tunnel_ops);
+		if (!pft)
+			return -ENOENT;
+		if (pft->items)
+			pattern = pft->items;
+		if (pft->actions)
+			actions = pft->actions;
+	}
 	pf = port_flow_new(attr, pattern, actions, &error);
 	if (!pf)
 		return port_flow_complain(&error);
@@ -1935,6 +2173,8 @@ port_flow_create(portid_t port_id,
 	pf->id = id;
 	pf->flow = flow;
 	port->flow_list = pf;
+	if (tunnel_ops->enabled)
+		port_flow_tunnel_offload_cmd_release(port_id, tunnel_ops, pft);
 	printf("Flow rule #%u created\n", pf->id);
 	return 0;
 }
@@ -2244,7 +2484,9 @@ port_flow_list(portid_t port_id, uint32_t n, const uint32_t *group)
 		       pf->rule.attr->egress ? 'e' : '-',
 		       pf->rule.attr->transfer ? 't' : '-');
 		while (item->type != RTE_FLOW_ITEM_TYPE_END) {
-			if (rte_flow_conv(RTE_FLOW_CONV_OP_ITEM_NAME_PTR,
+			if ((uint32_t)item->type > INT_MAX)
+				name = "PMD_INTERNAL";
+			else if (rte_flow_conv(RTE_FLOW_CONV_OP_ITEM_NAME_PTR,
 					  &name, sizeof(name),
 					  (void *)(uintptr_t)item->type,
 					  NULL) <= 0)
@@ -2255,7 +2497,9 @@ port_flow_list(portid_t port_id, uint32_t n, const uint32_t *group)
 		}
 		printf("=>");
 		while (action->type != RTE_FLOW_ACTION_TYPE_END) {
-			if (rte_flow_conv(RTE_FLOW_CONV_OP_ACTION_NAME_PTR,
+			if ((uint32_t)action->type > INT_MAX)
+				name = "PMD_INTERNAL";
+			else if (rte_flow_conv(RTE_FLOW_CONV_OP_ACTION_NAME_PTR,
 					  &name, sizeof(name),
 					  (void *)(uintptr_t)action->type,
 					  NULL) <= 0)
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 6caba60988..333904d686 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -3684,6 +3684,8 @@ init_port_dcb_config(portid_t pid,
 static void
 init_port(void)
 {
+	int i;
+
 	/* Configuration of Ethernet ports. */
 	ports = rte_zmalloc("testpmd: ports",
 			    sizeof(struct rte_port) * RTE_MAX_ETHPORTS,
@@ -3693,7 +3695,8 @@ init_port(void)
 				"rte_zmalloc(%d struct rte_port) failed\n",
 				RTE_MAX_ETHPORTS);
 	}
-
+	for (i = 0; i < RTE_MAX_ETHPORTS; i++)
+		LIST_INIT(&ports[i].flow_tunnel_list);
 	/* Initialize ports NUMA structures */
 	memset(port_numa, NUMA_NO_CONFIG, RTE_MAX_ETHPORTS);
 	memset(rxring_numa, NUMA_NO_CONFIG, RTE_MAX_ETHPORTS);
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index f8b0a3517d..5238ac3dd5 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -12,6 +12,7 @@
 #include <rte_gro.h>
 #include <rte_gso.h>
 #include <cmdline.h>
+#include <sys/queue.h>
 
 #define RTE_PORT_ALL            (~(portid_t)0x0)
 
@@ -150,6 +151,26 @@ struct port_shared_action {
 	struct rte_flow_shared_action *action;	/**< Shared action handle. */
 };
 
+struct port_flow_tunnel {
+	LIST_ENTRY(port_flow_tunnel) chain;
+	struct rte_flow_action *pmd_actions;
+	struct rte_flow_item   *pmd_items;
+	uint32_t id;
+	uint32_t num_pmd_actions;
+	uint32_t num_pmd_items;
+	struct rte_flow_tunnel tunnel;
+	struct rte_flow_action *actions;
+	struct rte_flow_item *items;
+};
+
+struct tunnel_ops {
+	uint32_t id;
+	char type[16];
+	uint32_t enabled:1;
+	uint32_t actions:1;
+	uint32_t items:1;
+};
+
 /**
  * The data structure associated with each port.
  */
@@ -182,6 +203,7 @@ struct rte_port {
 	struct port_flow        *flow_list; /**< Associated flows. */
 	struct port_shared_action *actions_list;
 	/**< Associated shared actions. */
+	LIST_HEAD(, port_flow_tunnel) flow_tunnel_list;
 	const struct rte_eth_rxtx_callback *rx_dump_cb[RTE_MAX_QUEUES_PER_PORT+1];
 	const struct rte_eth_rxtx_callback *tx_dump_cb[RTE_MAX_QUEUES_PER_PORT+1];
 	/**< metadata value to insert in Tx packets. */
@@ -773,11 +795,13 @@ int port_shared_action_update(portid_t port_id, uint32_t id,
 int port_flow_validate(portid_t port_id,
 		       const struct rte_flow_attr *attr,
 		       const struct rte_flow_item *pattern,
-		       const struct rte_flow_action *actions);
+		       const struct rte_flow_action *actions,
+		       const struct tunnel_ops *tunnel_ops);
 int port_flow_create(portid_t port_id,
 		     const struct rte_flow_attr *attr,
 		     const struct rte_flow_item *pattern,
-		     const struct rte_flow_action *actions);
+		     const struct rte_flow_action *actions,
+		     const struct tunnel_ops *tunnel_ops);
 int port_shared_action_query(portid_t port_id, uint32_t id);
 void update_age_action_context(const struct rte_flow_action *actions,
 		     struct port_flow *pf);
@@ -788,6 +812,12 @@ int port_flow_query(portid_t port_id, uint32_t rule,
 		    const struct rte_flow_action *action);
 void port_flow_list(portid_t port_id, uint32_t n, const uint32_t *group);
 void port_flow_aged(portid_t port_id, uint8_t destroy);
+const char *port_flow_tunnel_type(struct rte_flow_tunnel *tunnel);
+struct port_flow_tunnel *
+port_flow_locate_tunnel(uint16_t port_id, struct rte_flow_tunnel *tun);
+void port_flow_tunnel_list(portid_t port_id);
+void port_flow_tunnel_destroy(portid_t port_id, uint32_t tunnel_id);
+void port_flow_tunnel_create(portid_t port_id, const struct tunnel_ops *ops);
 int port_flow_isolate(portid_t port_id, int set);
 
 void rx_ring_desc_display(portid_t port_id, queueid_t rxq_id, uint16_t rxd_id);
diff --git a/app/test-pmd/util.c b/app/test-pmd/util.c
index 8488fa1a8f..781a813759 100644
--- a/app/test-pmd/util.c
+++ b/app/test-pmd/util.c
@@ -48,18 +48,49 @@ dump_pkt_burst(uint16_t port_id, uint16_t queue, struct rte_mbuf *pkts[],
 	       is_rx ? "received" : "sent",
 	       (unsigned int) nb_pkts);
 	for (i = 0; i < nb_pkts; i++) {
+		int ret;
+		struct rte_flow_error error;
+		struct rte_flow_restore_info info = { 0, };
+
 		mb = pkts[i];
 		eth_hdr = rte_pktmbuf_read(mb, 0, sizeof(_eth_hdr), &_eth_hdr);
 		eth_type = RTE_BE_TO_CPU_16(eth_hdr->ether_type);
-		ol_flags = mb->ol_flags;
 		packet_type = mb->packet_type;
 		is_encapsulation = RTE_ETH_IS_TUNNEL_PKT(packet_type);
-
+		ret = rte_flow_get_restore_info(port_id, mb, &info, &error);
+		if (!ret) {
+			printf("restore info:");
+			if (info.flags & RTE_FLOW_RESTORE_INFO_TUNNEL) {
+				struct port_flow_tunnel *port_tunnel;
+
+				port_tunnel = port_flow_locate_tunnel
+					      (port_id, &info.tunnel);
+				printf(" - tunnel");
+				if (port_tunnel)
+					printf(" #%u", port_tunnel->id);
+				else
+					printf(" %s", "-none-");
+				printf(" type %s",
+					port_flow_tunnel_type(&info.tunnel));
+			} else {
+				printf(" - no tunnel info");
+			}
+			if (info.flags & RTE_FLOW_RESTORE_INFO_ENCAPSULATED)
+				printf(" - outer header present");
+			else
+				printf(" - no outer header");
+			if (info.flags & RTE_FLOW_RESTORE_INFO_GROUP_ID)
+				printf(" - miss group %u", info.group_id);
+			else
+				printf(" - no miss group");
+			printf("\n");
+		}
 		print_ether_addr("  src=", &eth_hdr->s_addr);
 		print_ether_addr(" - dst=", &eth_hdr->d_addr);
 		printf(" - type=0x%04x - length=%u - nb_segs=%d",
 		       eth_type, (unsigned int) mb->pkt_len,
 		       (int)mb->nb_segs);
+		ol_flags = mb->ol_flags;
 		if (ol_flags & PKT_RX_RSS_HASH) {
 			printf(" - RSS hash=0x%x", (unsigned int) mb->hash.rss);
 			printf(" - RSS queue=0x%x", (unsigned int) queue);
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 43c0ea0599..05a4446757 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -3749,6 +3749,45 @@ following sections.
 
    flow aged {port_id} [destroy]
 
+- Tunnel offload - create a tunnel stub::
+
+   flow tunnel create {port_id} type {tunnel_type}
+
+- Tunnel offload - destroy a tunnel stub::
+
+   flow tunnel destroy {port_id} id {tunnel_id}
+
+- Tunnel offload - list port tunnel stubs::
+
+   flow tunnel list {port_id}
+
+Creating a tunnel stub for offload
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``flow tunnel create`` sets up a tunnel stub for tunnel offload flow rules::
+
+   flow tunnel create {port_id} type {tunnel_type}
+
+If successful, it will return a tunnel stub ID usable with other commands::
+
+   port [...]: flow tunnel #[...] type [...]
+
+Tunnel stub ID is relative to a port.
+
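+For example, creating a VXLAN tunnel stub on port 0 could produce::
+
+   testpmd> flow tunnel create 0 type vxlan
+   port 0: flow tunnel #1 type vxlan
+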
+Destroying tunnel offload stub
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``flow tunnel destroy`` destroys a port tunnel stub::
+
+   flow tunnel destroy {port_id} id {tunnel_id}
+
+Listing tunnel offload stubs
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``flow tunnel list`` lists port tunnel offload stubs::
+
+   flow tunnel list {port_id}
+
 Validating flow rules
 ~~~~~~~~~~~~~~~~~~~~~
 
@@ -3795,6 +3834,7 @@ to ``rte_flow_create()``::
 
    flow create {port_id}
       [group {group_id}] [priority {level}] [ingress] [egress] [transfer]
+      [tunnel_set {tunnel_id}] [tunnel_match {tunnel_id}]
       pattern {item} [/ {item} [...]] / end
       actions {action} [/ {action} [...]] / end
 
@@ -3809,6 +3849,7 @@ Otherwise it will show an error message of the form::
 Parameters describe in the following order:
 
 - Attributes (*group*, *priority*, *ingress*, *egress*, *transfer* tokens).
+- Tunnel offload specification (tunnel_set, tunnel_match)
 - A matching pattern, starting with the *pattern* token and terminated by an
   *end* pattern item.
 - Actions, starting with the *actions* token and terminated by an *end*
@@ -3852,6 +3893,14 @@ Most rules affect RX therefore contain the ``ingress`` token::
 
    testpmd> flow create 0 ingress pattern [...]
 
+Tunnel offload
+^^^^^^^^^^^^^^
+
+Indicate the tunnel offload rule type; a ``tunnel_set`` example is shown below:
+
+- ``tunnel_set {tunnel_id}``: mark the rule as tunnel offload decap_set type.
+- ``tunnel_match {tunnel_id}``: mark the rule as tunnel offload match type.
+
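+A steering rule that references tunnel stub #1 could, for instance, look like::
+
+   testpmd> flow create 0 ingress group 0 tunnel_set 1
+            pattern eth / ipv4 / udp dst is 4789 / vxlan / end
+            actions jump group 0 / end
+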
 Matching pattern
 ^^^^^^^^^^^^^^^^
 
-- 
2.28.0


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH v4 2/4] ethdev: tunnel offload model
  2020-10-14 17:23         ` Ferruh Yigit
@ 2020-10-16  9:15           ` Gregory Etelson
  0 siblings, 0 replies; 95+ messages in thread
From: Gregory Etelson @ 2020-10-16  9:15 UTC (permalink / raw)
  To: Ferruh Yigit, Sriharsha Basavapatna, dev
  Cc: Eli Britstein, Oz Shlomo, Ori Kam

Hello Ferruh,

> Subject: Re: [dpdk-dev] [PATCH v4 2/4] ethdev: tunnel offload model
> 
> External email: Use caution opening links or attachments
> 
> 
> On 10/7/2020 1:36 PM, Gregory Etelson wrote:
> > Hello Harsha,
> >
> >> -----Original Message-----
> >
> > [snip]
> >>
> >> Tunnel vport is an internal construct used by one specific
> >> application: OVS. So, shouldn't the rte APIs also be application
> >> agnostic apart from being vendor agnostic ? For OVS, the match fields
> >> in the existing datapath flow rules contain enough information to
> >> identify the tunnel type.
> >
> > Tunnel offload model was inspired by OVS vport, but it is not part of
> the existing API.
> > It looks like the API documentation should not use that term to avoid
> confusion.
> >
> > [snip]
> >
> > [snip]
> >>
> >> Wouldn't it be better if the APIs do not refer to vports and avoid
> >> percolating it down to the PMD ? My point here is to avoid bringing in
> >> the knowledge of an application specific virtual object (vport) to the
> >> PMD.
> >>
> >
> > As I have mentioned above, the API description should not mention vport.
> > I'll post updated documents.
> >
> >> Here's some other issues that I see with the helper APIs and
> >> vendor-specific variable actions.
> >> 1) The application needs some kind of validation (or understanding) of
> >> the actions returned by the PMD. The application can't just blindly
> >> use the actions specified by the PMD. That is, the decision to pick
> >> the set of actions can't be left entirely to the PMD.
> >> 2) The application needs to learn a PMD-specific way of action
> >> processing for each vendor. For example, how should the application
> >> handle flow-miss, given a different set of actions between two vendors
> >> (if one vendor has already popped the tunnel header while the other
> >> one hasn't).
> >> 3) The end-users/customers won't have a common interface (as in,
> >> consistent actions) to perform tunnel decap action. This becomes a
> >> manageability/maintenance issue for the application while working with
> >> different vendors.
> >>
> >> IMO, the API shouldn't expect the PMD to understand the notion of
> >> vport. The goal here is to offload a flow rule to decap the tunnel
> >> header and forward the packet to a HW endpoint.  The problem is that
> >> we don't have a way to express the "tnl_pop" datapath action to the HW
> >> (decap flow #1, in the context of br-phy in OVS-DPDK) and also we may
> >> not want the HW to really pop the tunnel header at that stage. If this
> >> cannot be expressed with existing rte action types, maybe we should
> >> introduce a new action that clearly defines what is expected to the
> >> PMD.
> >
> > Tunnel Offload API provides a common interface for all HW vendors:
> > Rule #1: define a tunneled traffic and steer / group traffic related to
> > that tunnel
> > Rule #2: within the tunnel selection, run matchers on all packet
> headers,
> > outer and inner, and perform actions on inner headers in case of a
> match.
> > For the rule #1 application provides tunnel matchers and traffic
> selection actions
> > and for rule #2 application provides full header matchers and actions
> for inner parts.
> > The rest is supplied by PMD according to HW and rule type. Application
> does not
> > need to understand exact PMD elements implementation.
> > Helper return value notifies application whether it received requested
> PMD elements or not.
> > If helper completed successfully, it means that application received
> required elements
> > and can complete flow rule compilation.
> > As the result, a packet will be fully offloaded or returned to
> application with enough
> > information to continue processing in SW.
> >
> > [snip]
> >
> > [snip]
> >
> >>> Miss handling
> >>> -------------
> >>> Packets going through multiple rte_flow groups are exposed to hw
> >>> misses due to partial packet processing. In such cases, the software
> >>> should continue the packet's processing from the point where the
> >>> hardware missed.
> >>
> >> Whether the packet goes through multiple groups or not for tunnel
> >> decap processing, should be left to the PMD/HW.  These assumptions
> >> shouldn't be built into the APIs. The encapsulated packet (i,e with
> >> outer headers) should be provided to the application, rather than
> >> making SW understand that there was a miss in stage-1, or stage-n in
> >> HW. That is, HW either processes it entirely, or punts the whole
> >> packet to SW if there's a miss. And the packet should take the normal
> >> processing path in SW (no action offload).
> >>
> >> Thanks,
> >> -Harsha
> >
> > The packet is provided to the application via the standard
> rte_eth_rx_burst API.
> > Additional information about the HW packet processing state is provided
> to
> > the application by the suggested rte_flow_get_restore_info API. It is up
> to the
> > application if to use such provided info, or even if to call this API at
> all.
> >
> > [snip]
> >
> > Regards,
> > Gregory
> >
> 
> 
> Hi Gregory, Sriharsha,
> 
> Is there any output of the discussion?

Tunnel API documentation was updated in V6.

Regards,
Gregory


^ permalink raw reply	[flat|nested] 95+ messages in thread

* [dpdk-dev] [PATCH v7 0/3] Tunnel Offload API
  2020-06-25 16:03 [dpdk-dev] [PATCH 0/2] ethdev: tunnel offload model Gregory Etelson
                   ` (7 preceding siblings ...)
  2020-10-16  8:55 ` [dpdk-dev] [PATCH v6 " Gregory Etelson
@ 2020-10-16 10:33 ` Gregory Etelson
  2020-10-16 10:33   ` [dpdk-dev] [PATCH v7 1/3] ethdev: allow negative values in flow rule types Gregory Etelson
                     ` (3 more replies)
  2020-10-16 12:51 ` [dpdk-dev] [PATCH v8 " Gregory Etelson
                   ` (8 subsequent siblings)
  17 siblings, 4 replies; 95+ messages in thread
From: Gregory Etelson @ 2020-10-16 10:33 UTC (permalink / raw)
  To: dev; +Cc: getelson, matan, rasland, elibr, ozsh, ajit.khaparde, asafp

Tunnel Offload API provides hardware independent, unified model
to offload tunneled traffic. Key model elements are:
 - apply matches to both outer and inner packet headers
   during entire offload procedure;
 - restore outer header of partially offloaded packet;
 - model is implemented as a set of helper functions.

 v2:
 * documentation updates
 * MLX5 PMD implementation for tunnel offload
 * testpmd updates for tunnel offload

 v3:
 * documentation updates
 * MLX5 PMD updates
 * testpmd updates

 v4:
 * updated patch: allow negative values in flow rule types

v5:
 * rebase to next-net

v6:
* update tunnel offload API documentation

v7:
* testpmd: resolve "%lu" differences in ubuntu 32 & 64

Eli Britstein (1):
  ethdev: tunnel offload model

Gregory Etelson (2):
  ethdev: allow negative values in flow rule types
  app/testpmd: add commands for tunnel offload API

 app/test-pmd/cmdline_flow.c                 | 170 ++++++++++++-
 app/test-pmd/config.c                       | 252 +++++++++++++++++++-
 app/test-pmd/testpmd.c                      |   5 +-
 app/test-pmd/testpmd.h                      |  34 ++-
 app/test-pmd/util.c                         |  35 ++-
 doc/guides/prog_guide/rte_flow.rst          |  81 +++++++
 doc/guides/rel_notes/release_20_11.rst      |  10 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  49 ++++
 lib/librte_ethdev/rte_ethdev_version.map    |   5 +
 lib/librte_ethdev/rte_flow.c                | 140 ++++++++++-
 lib/librte_ethdev/rte_flow.h                | 195 +++++++++++++++
 lib/librte_ethdev/rte_flow_driver.h         |  32 +++
 12 files changed, 989 insertions(+), 19 deletions(-)

-- 
2.28.0


^ permalink raw reply	[flat|nested] 95+ messages in thread

* [dpdk-dev] [PATCH v7 1/3] ethdev: allow negative values in flow rule types
  2020-10-16 10:33 ` [dpdk-dev] [PATCH v7 0/3] Tunnel Offload API Gregory Etelson
@ 2020-10-16 10:33   ` Gregory Etelson
  2020-10-16 10:33   ` [dpdk-dev] [PATCH v7 2/3] ethdev: tunnel offload model Gregory Etelson
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 95+ messages in thread
From: Gregory Etelson @ 2020-10-16 10:33 UTC (permalink / raw)
  To: dev
  Cc: getelson, matan, rasland, elibr, ozsh, ajit.khaparde, asafp,
	Ori Kam, Viacheslav Ovsiienko, Thomas Monjalon, Ferruh Yigit,
	Andrew Rybchenko

RTE flow items & actions use positive values in item & action type.
Negative values are reserved for PMD private types. PMD
items & actions are usually not exposed to applications and are not
used to create RTE flows.

The patch allows applications that have access to PMD flow
items & actions to integrate them with RTE items & actions
and use both to create flow rules.

The RTE flow item / action conversion library accepts only positive,
known element types with predefined sizes. Private PMD items and
actions do not fit into this scheme because PMD type values are
negative, each PMD has its own type numbering, and element types and
their sizes are not visible at the RTE level.  To resolve these
limitations the patch proposes this solution (an illustrative usage
sketch follows the list):
1. A PMD can expose elements of pointer size only.  RTE flow
   conversion functions will use the pointer size for each configuration
   object in a private PMD element they process;
2. RTE flow verification will not reject elements with negative type.
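
An illustrative sketch (not part of this patch) of how an application
might combine PMD private actions, obtained with the tunnel offload
helpers added later in this series, with its own actions before rule
creation. The tunnel, jump, attr, pattern and flow variables are
assumed to be initialized elsewhere:

struct rte_flow_action *pmd_actions;
uint32_t num_pmd_actions, i;
struct rte_flow_error error;
struct rte_flow_action app_actions[] = {
	{ .type = RTE_FLOW_ACTION_TYPE_JUMP, .conf = &jump },
	{ .type = RTE_FLOW_ACTION_TYPE_END },
};

if (!rte_flow_tunnel_decap_set(port_id, &tunnel, &pmd_actions,
			       &num_pmd_actions, &error)) {
	/* PMD private (negative type) actions first, then the
	 * application actions and the END terminator */
	struct rte_flow_action rule_actions[num_pmd_actions +
					    RTE_DIM(app_actions)];

	for (i = 0; i < num_pmd_actions; i++)
		rule_actions[i] = pmd_actions[i];
	for (i = 0; i < RTE_DIM(app_actions); i++)
		rule_actions[num_pmd_actions + i] = app_actions[i];
	flow = rte_flow_create(port_id, &attr, pattern,
			       rule_actions, &error);
}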

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

---
v4:
* update the 'Negative types' section in the rtre_flow.rst

* update the patch comment

v5:
* rebase to next-net
---
 doc/guides/prog_guide/rte_flow.rst     |  3 +++
 doc/guides/rel_notes/release_20_11.rst |  5 +++++
 lib/librte_ethdev/rte_flow.c           | 28 ++++++++++++++++++++------
 3 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 6ee0d3a10a..7fb5ec9059 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -2775,6 +2775,9 @@ identifiers they are not aware of.
 
 A method to generate them remains to be defined.
 
+An application may use PMD dynamic items or actions in flow rules. In that
+case the size of the configuration object in the dynamic element must be the
+size of a pointer.
+
 Planned types
 ~~~~~~~~~~~~~
 
diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index 79d9ebac4e..9155b468d6 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -116,6 +116,11 @@ New Features
   * Updated HWRM structures to 1.10.1.70 version.
   * Added TRUFLOW support for Stingray devices.
 
+* **Flow rules allowed to use private PMD items / actions.**
+
+  * Flow rule verification was updated to accept private PMD
+    items and actions.
+
 * **Updated Cisco enic driver.**
 
   * Added support for VF representors with single-queue Tx/Rx and flow API
diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
index 686fe40eaa..b74ea5593a 100644
--- a/lib/librte_ethdev/rte_flow.c
+++ b/lib/librte_ethdev/rte_flow.c
@@ -518,7 +518,11 @@ rte_flow_conv_item_spec(void *buf, const size_t size,
 		}
 		break;
 	default:
-		off = rte_flow_desc_item[item->type].size;
+		/**
+		 * allow PMD private flow item
+		 */
+		off = (int)item->type >= 0 ?
+		      rte_flow_desc_item[item->type].size : sizeof(void *);
 		rte_memcpy(buf, data, (size > off ? off : size));
 		break;
 	}
@@ -621,7 +625,11 @@ rte_flow_conv_action_conf(void *buf, const size_t size,
 		}
 		break;
 	default:
-		off = rte_flow_desc_action[action->type].size;
+		/**
+		 * allow PMD private flow action
+		 */
+		off = (int)action->type >= 0 ?
+		      rte_flow_desc_action[action->type].size : sizeof(void *);
 		rte_memcpy(buf, action->conf, (size > off ? off : size));
 		break;
 	}
@@ -663,8 +671,12 @@ rte_flow_conv_pattern(struct rte_flow_item *dst,
 	unsigned int i;
 
 	for (i = 0, off = 0; !num || i != num; ++i, ++src, ++dst) {
-		if ((size_t)src->type >= RTE_DIM(rte_flow_desc_item) ||
-		    !rte_flow_desc_item[src->type].name)
+		/**
+		 * allow PMD private flow item
+		 */
+		if (((int)src->type >= 0) &&
+			((size_t)src->type >= RTE_DIM(rte_flow_desc_item) ||
+		    !rte_flow_desc_item[src->type].name))
 			return rte_flow_error_set
 				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM, src,
 				 "cannot convert unknown item type");
@@ -752,8 +764,12 @@ rte_flow_conv_actions(struct rte_flow_action *dst,
 	unsigned int i;
 
 	for (i = 0, off = 0; !num || i != num; ++i, ++src, ++dst) {
-		if ((size_t)src->type >= RTE_DIM(rte_flow_desc_action) ||
-		    !rte_flow_desc_action[src->type].name)
+		/**
+		 * allow PMD private flow action
+		 */
+		if (((int)src->type >= 0) &&
+		    ((size_t)src->type >= RTE_DIM(rte_flow_desc_action) ||
+		    !rte_flow_desc_action[src->type].name))
 			return rte_flow_error_set
 				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION,
 				 src, "cannot convert unknown action type");
-- 
2.28.0


^ permalink raw reply	[flat|nested] 95+ messages in thread

* [dpdk-dev] [PATCH v7 2/3] ethdev: tunnel offload model
  2020-10-16 10:33 ` [dpdk-dev] [PATCH v7 0/3] Tunnel Offload API Gregory Etelson
  2020-10-16 10:33   ` [dpdk-dev] [PATCH v7 1/3] ethdev: allow negative values in flow rule types Gregory Etelson
@ 2020-10-16 10:33   ` Gregory Etelson
  2020-10-16 10:34   ` [dpdk-dev] [PATCH v7 3/3] app/testpmd: add commands for tunnel offload API Gregory Etelson
  2020-10-16 12:10   ` [dpdk-dev] [PATCH v7 0/3] Tunnel Offload API Ferruh Yigit
  3 siblings, 0 replies; 95+ messages in thread
From: Gregory Etelson @ 2020-10-16 10:33 UTC (permalink / raw)
  To: dev
  Cc: getelson, matan, rasland, elibr, ozsh, ajit.khaparde, asafp,
	Eli Britstein, Ori Kam, Viacheslav Ovsiienko, Ray Kinsella,
	Neil Horman, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko

From: Eli Britstein <elibr@mellanox.com>

rte_flow API provides the building blocks for vendor-agnostic flow
classification offloads. The rte_flow "patterns" and "actions"
primitives are fine-grained, thus enabling DPDK applications the
flexibility to offload network stacks and complex pipelines.
Applications wishing to offload tunneled traffic are required to use
the rte_flow primitives, such as group, meta, mark, tag, and others to
model their high-level objects.  The hardware model design for
high-level software objects is not trivial.  Furthermore, an optimal
design is often vendor-specific.

When hardware offloads tunneled traffic in multi-group logic,
partially offloaded packets may arrive at the application after they
were modified in hardware. In this case, the application may need to
restore the original packet headers. Consider the following sequence:
The application decaps a packet in one group and jumps to a second
group where it tries to match on a 5-tuple, that will miss and send
the packet to the application. In this case, the application does not
receive the original packet but a modified one. Also, in this case,
the application cannot match on the outer header fields, such as VXLAN
vni and 5-tuple.

There are several possible ways to use rte_flow "patterns" and
"actions" to resolve the issues above. For example:
1 Mapping headers to a hardware registers using the
rte_flow_action_mark/rte_flow_action_tag/rte_flow_set_meta objects.
2 Apply the decap only at the last offload stage after all the
"patterns" were matched and the packet will be fully offloaded.
Every approach has its pros and cons and is highly dependent on the
hardware vendor.  For example, some hardware may have a limited number
of registers while other hardware could not support inner actions and
must decap before accessing inner headers.

The tunnel offload model resolves these issues. The model goals are:
1 Provide a unified application API to offload tunneled traffic that
is capable of matching on outer headers after decap.
2 Allow the application to restore the outer header of partially
offloaded packets.

The tunnel offload model does not introduce new elements to the
existing RTE flow model and is implemented as a set of helper
functions.

For the application to work with the tunnel offload API it
has to adjust flow rules in multi-table tunnel offload in the
following way:
1 Remove explicit call to decap action and replace it with PMD actions
obtained from rte_flow_tunnel_decap_and_set() helper.
2 Add PMD items obtained from rte_flow_tunnel_match() helper to all
other rules in the tunnel offload sequence.

VXLAN Code example:

Assume application needs to do inner NAT on the VXLAN packet.
The first  rule in group 0:

flow create <port id> ingress group 0
  pattern eth / ipv4 / udp dst is 4789 / vxlan / end
  actions {pmd actions} / jump group 3 / end

The first VXLAN packet that arrives matches the rule in group 0 and
jumps to group 3.  In group 3 the packet will miss since there is no
flow to match and will be sent to the application.  Application  will
call rte_flow_get_restore_info() to get the packet outer header.

Application will insert a new rule in group 3 to match outer and inner
headers:

flow create <port id> ingress group 3
  pattern {pmd items} / eth / ipv4 dst is 172.10.10.1 /
          udp dst 4789 / vxlan vni is 10 /
          ipv4 dst is 184.1.2.3 / end
  actions  set_ipv4_dst  186.1.1.1 / queue index 3 / end

The result of these rules is that a VXLAN packet with vni=10, outer
IPv4 dst=172.10.10.1 and inner IPv4 dst=184.1.2.3 will be received
decapped on queue 3 with IPv4 dst=186.1.1.1.

Note: The packet in group 3 is considered decapped. All actions in
that group will be done on the header that was inner before decap. The
application may specify an outer header to be matched on.  It is the PMD's
responsibility to translate these items to outer metadata.

API usage:

/**
 * 1. Initiate RTE flow tunnel object
 */
const struct rte_flow_tunnel tunnel = {
  .type = RTE_FLOW_ITEM_TYPE_VXLAN,
  .tun_id = 10,
}

/**
 * 2. Obtain PMD tunnel actions
 *
 * pmd_actions is an intermediate variable application uses to
 * compile actions array
 */
struct rte_flow_action **pmd_actions;
rte_flow_tunnel_decap_and_set(&tunnel, &pmd_actions,
                              &num_pmd_actions, &error);
/**
 * 3. offload the first  rule
 * matching on VXLAN traffic and jumps to group 3
 * (implicitly decaps packet)
 */
app_actions  =   jump group 3
rule_items = app_items;  /** eth / ipv4 / udp / vxlan  */
rule_actions = { pmd_actions, app_actions };
attr.group = 0;
flow_1 = rte_flow_create(port_id, &attr,
                         rule_items, rule_actions, &error);

/**
  * 4. after flow creation application does not need to keep the
  * tunnel action resources.
  */
rte_flow_tunnel_action_release(port_id, pmd_actions,
                               num_pmd_actions);
/**
  * 5. After partially offloaded packet miss because there was no
  * matching rule handle miss on group 3
  */
struct rte_flow_restore_info info;
rte_flow_get_restore_info(port_id, mbuf, &info, &error);

/**
 * 6. Offload NAT rule:
 */
app_items = { eth / ipv4 dst is 172.10.10.1 / udp dst 4789 /
            vxlan vni is 10 / ipv4 dst is 184.1.2.3 }
app_actions = { set_ipv4_dst 186.1.1.1 / queue index 3 }

rte_flow_tunnel_match(&info.tunnel, &pmd_items,
                      &num_pmd_items,  &error);
rule_items = {pmd_items, app_items};
rule_actions = app_actions;
attr.group = info.group_id;
flow_2 = rte_flow_create(port_id, &attr,
                         rule_items, rule_actions, &error);

/**
 * 7. Release PMD items after rule creation
 */
rte_flow_tunnel_item_release(port_id,
                             pmd_items, num_pmd_items);

References
1. https://mails.dpdk.org/archives/dev/2020-June/index.html

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

---
v5:
* rebase to next-net

v6:
* update the patch comment
* update tunnel offload section in rte_flow.rst
---
 doc/guides/prog_guide/rte_flow.rst       |  78 +++++++++
 doc/guides/rel_notes/release_20_11.rst   |   5 +
 lib/librte_ethdev/rte_ethdev_version.map |   5 +
 lib/librte_ethdev/rte_flow.c             | 112 +++++++++++++
 lib/librte_ethdev/rte_flow.h             | 195 +++++++++++++++++++++++
 lib/librte_ethdev/rte_flow_driver.h      |  32 ++++
 6 files changed, 427 insertions(+)

diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 7fb5ec9059..8dc048c6f4 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -3131,6 +3131,84 @@ operations include:
 - Duplication of a complete flow rule description.
 - Pattern item or action name retrieval.
 
+Tunneled traffic offload
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+rte_flow API provides the building blocks for vendor-agnostic flow
+classification offloads. The rte_flow "patterns" and "actions"
+primitives are fine-grained, thus enabling DPDK applications the
+flexibility to offload network stacks and complex pipelines.
+Applications wishing to offload tunneled traffic are required to use
+the rte_flow primitives, such as group, meta, mark, tag, and others to
+model their high-level objects.  The hardware model design for
+high-level software objects is not trivial.  Furthermore, an optimal
+design is often vendor-specific.
+
+When hardware offloads tunneled traffic in multi-group logic,
+partially offloaded packets may arrive at the application after they
+were modified in hardware. In this case, the application may need to
+restore the original packet headers. Consider the following sequence:
+The application decaps a packet in one group and jumps to a second
+group where it tries to match on a 5-tuple, that will miss and send
+the packet to the application. In this case, the application does not
+receive the original packet but a modified one. Also, in this case,
+the application cannot match on the outer header fields, such as VXLAN
+vni and 5-tuple.
+
+There are several possible ways to use rte_flow "patterns" and
+"actions" to resolve the issues above. For example:
+
+1 Mapping headers to a hardware registers using the
+rte_flow_action_mark/rte_flow_action_tag/rte_flow_set_meta objects.
+
+2 Apply the decap only at the last offload stage after all the
+"patterns" were matched and the packet will be fully offloaded.
+
+Every approach has its pros and cons and is highly dependent on the
+hardware vendor.  For example, some hardware may have a limited number
+of registers while other hardware could not support inner actions and
+must decap before accessing inner headers.
+
+The tunnel offload model resolves these issues. The model goals are:
+
+1 Provide a unified application API to offload tunneled traffic that
+is capable of matching on outer headers after decap.
+
+2 Allow the application to restore the outer header of partially
+offloaded packets.
+
+The tunnel offload model does not introduce new elements to the
+existing RTE flow model and is implemented as a set of helper
+functions.
+
+For the application to work with the tunnel offload API it
+has to adjust flow rules in multi-table tunnel offload in the
+following way:
+
+1 Remove explicit call to decap action and replace it with PMD actions
+obtained from rte_flow_tunnel_decap_and_set() helper.
+
+2 Add PMD items obtained from rte_flow_tunnel_match() helper to all
+other rules in the tunnel offload sequence.
+
+The model requirements:
+
+The software application must initialize the
+rte_flow_tunnel object with tunnel parameters before calling
+rte_flow_tunnel_decap_set() & rte_flow_tunnel_match().
+
+The PMD actions array obtained in rte_flow_tunnel_decap_set() must be
+released by the application with rte_flow_tunnel_action_decap_release().
+
+The PMD items array obtained with rte_flow_tunnel_match() must be released
+by the application with rte_flow_tunnel_item_release().  The application can
+release PMD items and actions after the rule was created. However, if the
+application needs to create an additional rule for the same tunnel it
+will need to obtain the PMD items again.
+
+The application cannot destroy the rte_flow_tunnel object before it releases
+all PMD actions & PMD items referencing that tunnel.
+
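+A minimal usage sketch, with error handling omitted and ``port_id``
+assumed to refer to a configured port, could look like::
+
+   struct rte_flow_tunnel tunnel = {
+       .type = RTE_FLOW_ITEM_TYPE_VXLAN,
+       .tun_id = 10,
+   };
+   struct rte_flow_action *pmd_actions;
+   uint32_t num_pmd_actions;
+   struct rte_flow_error error;
+
+   rte_flow_tunnel_decap_set(port_id, &tunnel, &pmd_actions,
+                             &num_pmd_actions, &error);
+   /* prepend pmd_actions to the application actions, create the rule,
+    * then release the PMD actions */
+   rte_flow_tunnel_action_decap_release(port_id, pmd_actions,
+                                        num_pmd_actions, &error);
+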
 Caveats
 -------
 
diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index 9155b468d6..f125ce79dd 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -121,6 +121,11 @@ New Features
   * Flow rule verification was updated to accept private PMD
     items and actions.
 
+* **Added generic API to offload tunneled traffic and restore missed packet.**
+
+  * Added a new hardware independent helper API to RTE flow library that
+    offloads tunneled traffic and restores missed packets.
+
 * **Updated Cisco enic driver.**
 
   * Added support for VF representors with single-queue Tx/Rx and flow API
diff --git a/lib/librte_ethdev/rte_ethdev_version.map b/lib/librte_ethdev/rte_ethdev_version.map
index f64c379ac2..8ddda2547f 100644
--- a/lib/librte_ethdev/rte_ethdev_version.map
+++ b/lib/librte_ethdev/rte_ethdev_version.map
@@ -239,6 +239,11 @@ EXPERIMENTAL {
 	rte_flow_shared_action_destroy;
 	rte_flow_shared_action_query;
 	rte_flow_shared_action_update;
+	rte_flow_tunnel_decap_set;
+	rte_flow_tunnel_match;
+	rte_flow_get_restore_info;
+	rte_flow_tunnel_action_decap_release;
+	rte_flow_tunnel_item_release;
 };
 
 INTERNAL {
diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
index b74ea5593a..380c5cae2c 100644
--- a/lib/librte_ethdev/rte_flow.c
+++ b/lib/librte_ethdev/rte_flow.c
@@ -1143,3 +1143,115 @@ rte_flow_shared_action_query(uint16_t port_id,
 				       data, error);
 	return flow_err(port_id, ret, error);
 }
+
+int
+rte_flow_tunnel_decap_set(uint16_t port_id,
+			  struct rte_flow_tunnel *tunnel,
+			  struct rte_flow_action **actions,
+			  uint32_t *num_of_actions,
+			  struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->tunnel_decap_set)) {
+		return flow_err(port_id,
+				ops->tunnel_decap_set(dev, tunnel, actions,
+						      num_of_actions, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_tunnel_match(uint16_t port_id,
+		      struct rte_flow_tunnel *tunnel,
+		      struct rte_flow_item **items,
+		      uint32_t *num_of_items,
+		      struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->tunnel_match)) {
+		return flow_err(port_id,
+				ops->tunnel_match(dev, tunnel, items,
+						  num_of_items, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_get_restore_info(uint16_t port_id,
+			  struct rte_mbuf *m,
+			  struct rte_flow_restore_info *restore_info,
+			  struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->get_restore_info)) {
+		return flow_err(port_id,
+				ops->get_restore_info(dev, m, restore_info,
+						      error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_tunnel_action_decap_release(uint16_t port_id,
+				     struct rte_flow_action *actions,
+				     uint32_t num_of_actions,
+				     struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->action_release)) {
+		return flow_err(port_id,
+				ops->action_release(dev, actions,
+						    num_of_actions, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_tunnel_item_release(uint16_t port_id,
+			     struct rte_flow_item *items,
+			     uint32_t num_of_items,
+			     struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->item_release)) {
+		return flow_err(port_id,
+				ops->item_release(dev, items,
+						  num_of_items, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
index 48395284b5..a8eac4deb8 100644
--- a/lib/librte_ethdev/rte_flow.h
+++ b/lib/librte_ethdev/rte_flow.h
@@ -3620,6 +3620,201 @@ rte_flow_shared_action_query(uint16_t port_id,
 			     void *data,
 			     struct rte_flow_error *error);
 
+/* Tunnel has a type and the key information. */
+struct rte_flow_tunnel {
+	/**
+	 * Tunnel type, for example RTE_FLOW_ITEM_TYPE_VXLAN,
+	 * RTE_FLOW_ITEM_TYPE_NVGRE etc.
+	 */
+	enum rte_flow_item_type	type;
+	uint64_t tun_id; /**< Tunnel identification. */
+
+	RTE_STD_C11
+	union {
+		struct {
+			rte_be32_t src_addr; /**< IPv4 source address. */
+			rte_be32_t dst_addr; /**< IPv4 destination address. */
+		} ipv4;
+		struct {
+			uint8_t src_addr[16]; /**< IPv6 source address. */
+			uint8_t dst_addr[16]; /**< IPv6 destination address. */
+		} ipv6;
+	};
+	rte_be16_t tp_src; /**< Tunnel port source. */
+	rte_be16_t tp_dst; /**< Tunnel port destination. */
+	uint16_t   tun_flags; /**< Tunnel flags. */
+
+	bool       is_ipv6; /**< True for valid IPv6 fields. Otherwise IPv4. */
+
+	/**
+	 * the following members are required to restore packet
+	 * after miss
+	 */
+	uint8_t    tos; /**< TOS for IPv4, TC for IPv6. */
+	uint8_t    ttl; /**< TTL for IPv4, HL for IPv6. */
+	uint32_t label; /**< Flow Label for IPv6. */
+};
+
+/**
+ * Indicate that the packet has a tunnel.
+ */
+#define RTE_FLOW_RESTORE_INFO_TUNNEL  (1ULL << 0)
+
+/**
+ * Indicate that the packet has a non-decapsulated tunnel header.
+ */
+#define RTE_FLOW_RESTORE_INFO_ENCAPSULATED  (1ULL << 1)
+
+/**
+ * Indicate that the packet has a group_id.
+ */
+#define RTE_FLOW_RESTORE_INFO_GROUP_ID  (1ULL << 2)
+
+/**
+ * Restore information structure to communicate the current packet processing
+ * state when some of the processing pipeline is done in hardware and should
+ * continue in software.
+ */
+struct rte_flow_restore_info {
+	/**
+	 * Bitwise flags (RTE_FLOW_RESTORE_INFO_*) to indicate validation of
+	 * other fields in struct rte_flow_restore_info.
+	 */
+	uint64_t flags;
+	uint32_t group_id; /**< Group ID where packet missed */
+	struct rte_flow_tunnel tunnel; /**< Tunnel information. */
+};
+
+/**
+ * Allocate an array of actions to be used in rte_flow_create, to implement
+ * tunnel-decap-set for the given tunnel.
+ * Sample usage:
+ *   actions vxlan_decap / tunnel-decap-set(tunnel properties) /
+ *            jump group 0 / end
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] tunnel
+ *   Tunnel properties.
+ * @param[out] actions
+ *   Array of actions to be allocated by the PMD. This array should be
+ *   concatenated with the actions array provided to rte_flow_create.
+ * @param[out] num_of_actions
+ *   Number of actions allocated.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_decap_set(uint16_t port_id,
+			  struct rte_flow_tunnel *tunnel,
+			  struct rte_flow_action **actions,
+			  uint32_t *num_of_actions,
+			  struct rte_flow_error *error);
+
+/**
+ * Allocate an array of items to be used in rte_flow_create, to implement
+ * tunnel-match for the given tunnel.
+ * Sample usage:
+ *   pattern tunnel-match(tunnel properties) / outer-header-matches /
+ *           inner-header-matches / end
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] tunnel
+ *   Tunnel properties.
+ * @param[out] items
+ *   Array of items to be allocated by the PMD. This array should be
+ *   concatenated with the items array provided to rte_flow_create.
+ * @param[out] num_of_items
+ *   Number of items allocated.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_match(uint16_t port_id,
+		      struct rte_flow_tunnel *tunnel,
+		      struct rte_flow_item **items,
+		      uint32_t *num_of_items,
+		      struct rte_flow_error *error);
+
+/**
+ * Populate the current packet processing state, if exists, for the given mbuf.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] m
+ *   Mbuf struct.
+ * @param[out] info
+ *   Restore information. Upon success contains the HW state.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_get_restore_info(uint16_t port_id,
+			  struct rte_mbuf *m,
+			  struct rte_flow_restore_info *info,
+			  struct rte_flow_error *error);
+
+/**
+ * Release the action array as allocated by rte_flow_tunnel_decap_set.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] actions
+ *   Array of actions to be released.
+ * @param[in] num_of_actions
+ *   Number of elements in actions array.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_action_decap_release(uint16_t port_id,
+				     struct rte_flow_action *actions,
+				     uint32_t num_of_actions,
+				     struct rte_flow_error *error);
+
+/**
+ * Release the item array as allocated by rte_flow_tunnel_match.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] items
+ *   Array of items to be released.
+ * @param[in] num_of_items
+ *   Number of elements in item array.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_item_release(uint16_t port_id,
+			     struct rte_flow_item *items,
+			     uint32_t num_of_items,
+			     struct rte_flow_error *error);
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_ethdev/rte_flow_driver.h b/lib/librte_ethdev/rte_flow_driver.h
index 58f56b0262..bd5ffc0bb1 100644
--- a/lib/librte_ethdev/rte_flow_driver.h
+++ b/lib/librte_ethdev/rte_flow_driver.h
@@ -131,6 +131,38 @@ struct rte_flow_ops {
 		 const struct rte_flow_shared_action *shared_action,
 		 void *data,
 		 struct rte_flow_error *error);
+	/** See rte_flow_tunnel_decap_set() */
+	int (*tunnel_decap_set)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_tunnel *tunnel,
+		 struct rte_flow_action **pmd_actions,
+		 uint32_t *num_of_actions,
+		 struct rte_flow_error *err);
+	/** See rte_flow_tunnel_match() */
+	int (*tunnel_match)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_tunnel *tunnel,
+		 struct rte_flow_item **pmd_items,
+		 uint32_t *num_of_items,
+		 struct rte_flow_error *err);
+	/** See rte_flow_get_restore_info() */
+	int (*get_restore_info)
+		(struct rte_eth_dev *dev,
+		 struct rte_mbuf *m,
+		 struct rte_flow_restore_info *info,
+		 struct rte_flow_error *err);
+	/** See rte_flow_tunnel_action_decap_release() */
+	int (*action_release)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_action *pmd_actions,
+		 uint32_t num_of_actions,
+		 struct rte_flow_error *err);
+	/** See rte_flow_tunnel_item_release() */
+	int (*item_release)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_item *pmd_items,
+		 uint32_t num_of_items,
+		 struct rte_flow_error *err);
 };
 
 /**
-- 
2.28.0


^ permalink raw reply	[flat|nested] 95+ messages in thread

* [dpdk-dev] [PATCH v7 3/3] app/testpmd: add commands for tunnel offload API
  2020-10-16 10:33 ` [dpdk-dev] [PATCH v7 0/3] Tunnel Offload API Gregory Etelson
  2020-10-16 10:33   ` [dpdk-dev] [PATCH v7 1/3] ethdev: allow negative values in flow rule types Gregory Etelson
  2020-10-16 10:33   ` [dpdk-dev] [PATCH v7 2/3] ethdev: tunnel offload model Gregory Etelson
@ 2020-10-16 10:34   ` Gregory Etelson
  2020-10-16 12:10   ` [dpdk-dev] [PATCH v7 0/3] Tunnel Offload API Ferruh Yigit
  3 siblings, 0 replies; 95+ messages in thread
From: Gregory Etelson @ 2020-10-16 10:34 UTC (permalink / raw)
  To: dev
  Cc: getelson, matan, rasland, elibr, ozsh, ajit.khaparde, asafp,
	Ori Kam, Wenzhuo Lu, Beilei Xing, Bernard Iremonger

Tunnel Offload API provides hardware independent, unified model
to offload tunneled traffic. Key model elements are:
 - apply matches to both outer and inner packet headers
   during entire offload procedure;
 - restore outer header of partially offloaded packet;
 - model is implemented as a set of helper functions.

Implementation details:

* Create application tunnel:
flow tunnel create <port> type <tunnel type>
On success, the command creates an application tunnel object and returns
the tunnel descriptor. The tunnel descriptor is used in subsequent flow
creation commands to reference the tunnel.

* Create tunnel steering flow rule:
The tunnel_set <tunnel descriptor> parameter is used with the steering
rule template.

* Create tunnel matching flow rule:
The tunnel_match <tunnel descriptor> parameter is used with the matching
rule template.

* If tunnel steering rule was offloaded, outer header of a partially
offloaded packet is restored after miss.

Example:
test packet=
<Ether  dst=24:8a:07:8d:ae:d6 src=50:6b:4b:cc:fc:e2 type=IPv4 |
<IP  version=4 ihl=5 proto=udp src=1.1.1.1 dst=1.1.1.10 |
<UDP  sport=4789 dport=4789 len=58 chksum=0x7f7b |
<VXLAN  NextProtocol=Ethernet vni=0x0 |
<Ether  dst=24:aa:aa:aa:aa:d6 src=50:bb:bb:bb:bb:e2 type=IPv4 |
<IP  version=4 ihl=5 proto=icmp src=2.2.2.2 dst=2.2.2.200 |
<ICMP  type=echo-request code=0 chksum=0xf7ff id=0x0 seq=0x0 |>>>>>>>
>>> len(packet)
92

testpmd> flow flush 0
testpmd> port 0/queue 0: received 1 packets
src=50:6B:4B:CC:FC:E2 - dst=24:8A:07:8D:AE:D6 - type=0x0800 -
length=92

testpmd> flow tunnel create 0 type vxlan
port 0: flow tunnel #1 type vxlan
testpmd> flow create 0 ingress group 0 tunnel_set 1
         pattern eth / ipv4 / udp dst is 4789 / vxlan / end
         actions jump group 0 / end
Flow rule #0 created
testpmd> port 0/queue 0: received 1 packets
tunnel restore info: - vxlan tunnel - outer header present # <--
  src=50:6B:4B:CC:FC:E2 - dst=24:8A:07:8D:AE:D6 - type=0x0800 -
length=92

testpmd> flow create 0 ingress group 0 tunnel_match 1
         pattern eth / ipv4 / udp dst is 4789 / vxlan / eth / ipv4 /
         end
         actions set_mac_dst mac_addr 02:CA:FE:CA:FA:80 /
         queue index 0 / end
Flow rule #1 created
testpmd> port 0/queue 0: received 1 packets
  src=50:BB:BB:BB:BB:E2 - dst=02:CA:FE:CA:FA:80 - type=0x0800 -
length=42

* Destroy flow tunnel
flow tunnel destroy <port> id <tunnel id>

* Show existing flow tunnels
flow tunnel list <port>
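
An illustrative listing output, following the print format added to
config.c below, could be:

testpmd> flow tunnel list 0
port 0 tunnel #1 type=vxlan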

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
---
v2:
* introduce testpmd support for tunnel offload API

v3:
* update flow tunnel commands

v5:
* rebase to next-net

v7:
* resolve "%lu" differences in ubuntu 32 & 64
---
 app/test-pmd/cmdline_flow.c                 | 170 ++++++++++++-
 app/test-pmd/config.c                       | 252 +++++++++++++++++++-
 app/test-pmd/testpmd.c                      |   5 +-
 app/test-pmd/testpmd.h                      |  34 ++-
 app/test-pmd/util.c                         |  35 ++-
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  49 ++++
 6 files changed, 532 insertions(+), 13 deletions(-)

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 00c70a144a..b9a1f7178a 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -74,6 +74,14 @@ enum index {
 	LIST,
 	AGED,
 	ISOLATE,
+	TUNNEL,
+
+	/* Tunnel arguments. */
+	TUNNEL_CREATE,
+	TUNNEL_CREATE_TYPE,
+	TUNNEL_LIST,
+	TUNNEL_DESTROY,
+	TUNNEL_DESTROY_ID,
 
 	/* Destroy arguments. */
 	DESTROY_RULE,
@@ -93,6 +101,8 @@ enum index {
 	INGRESS,
 	EGRESS,
 	TRANSFER,
+	TUNNEL_SET,
+	TUNNEL_MATCH,
 
 	/* Shared action arguments */
 	SHARED_ACTION_CREATE,
@@ -713,6 +723,7 @@ struct buffer {
 		} sa; /* Shared action query arguments */
 		struct {
 			struct rte_flow_attr attr;
+			struct tunnel_ops tunnel_ops;
 			struct rte_flow_item *pattern;
 			struct rte_flow_action *actions;
 			uint32_t pattern_n;
@@ -789,10 +800,32 @@ static const enum index next_vc_attr[] = {
 	INGRESS,
 	EGRESS,
 	TRANSFER,
+	TUNNEL_SET,
+	TUNNEL_MATCH,
 	PATTERN,
 	ZERO,
 };
 
+static const enum index tunnel_create_attr[] = {
+	TUNNEL_CREATE,
+	TUNNEL_CREATE_TYPE,
+	END,
+	ZERO,
+};
+
+static const enum index tunnel_destroy_attr[] = {
+	TUNNEL_DESTROY,
+	TUNNEL_DESTROY_ID,
+	END,
+	ZERO,
+};
+
+static const enum index tunnel_list_attr[] = {
+	TUNNEL_LIST,
+	END,
+	ZERO,
+};
+
 static const enum index next_destroy_attr[] = {
 	DESTROY_RULE,
 	END,
@@ -1643,6 +1676,9 @@ static int parse_aged(struct context *, const struct token *,
 static int parse_isolate(struct context *, const struct token *,
 			 const char *, unsigned int,
 			 void *, unsigned int);
+static int parse_tunnel(struct context *, const struct token *,
+			const char *, unsigned int,
+			void *, unsigned int);
 static int parse_int(struct context *, const struct token *,
 		     const char *, unsigned int,
 		     void *, unsigned int);
@@ -1844,7 +1880,8 @@ static const struct token token_list[] = {
 			      LIST,
 			      AGED,
 			      QUERY,
-			      ISOLATE)),
+			      ISOLATE,
+			      TUNNEL)),
 		.call = parse_init,
 	},
 	/* Top-level command. */
@@ -1955,6 +1992,49 @@ static const struct token token_list[] = {
 			     ARGS_ENTRY(struct buffer, port)),
 		.call = parse_isolate,
 	},
+	[TUNNEL] = {
+		.name = "tunnel",
+		.help = "new tunnel API",
+		.next = NEXT(NEXT_ENTRY
+			     (TUNNEL_CREATE, TUNNEL_LIST, TUNNEL_DESTROY)),
+		.call = parse_tunnel,
+	},
+	/* Tunnel arguments. */
+	[TUNNEL_CREATE] = {
+		.name = "create",
+		.help = "create new tunnel object",
+		.next = NEXT(tunnel_create_attr, NEXT_ENTRY(PORT_ID)),
+		.args = ARGS(ARGS_ENTRY(struct buffer, port)),
+		.call = parse_tunnel,
+	},
+	[TUNNEL_CREATE_TYPE] = {
+		.name = "type",
+		.help = "create new tunnel",
+		.next = NEXT(tunnel_create_attr, NEXT_ENTRY(FILE_PATH)),
+		.args = ARGS(ARGS_ENTRY(struct tunnel_ops, type)),
+		.call = parse_tunnel,
+	},
+	[TUNNEL_DESTROY] = {
+		.name = "destroy",
+		.help = "destroy tunnel",
+		.next = NEXT(tunnel_destroy_attr, NEXT_ENTRY(PORT_ID)),
+		.args = ARGS(ARGS_ENTRY(struct buffer, port)),
+		.call = parse_tunnel,
+	},
+	[TUNNEL_DESTROY_ID] = {
+		.name = "id",
+		.help = "tunnel identifier to destroy",
+		.next = NEXT(tunnel_destroy_attr, NEXT_ENTRY(UNSIGNED)),
+		.args = ARGS(ARGS_ENTRY(struct tunnel_ops, id)),
+		.call = parse_tunnel,
+	},
+	[TUNNEL_LIST] = {
+		.name = "list",
+		.help = "list existing tunnels",
+		.next = NEXT(tunnel_list_attr, NEXT_ENTRY(PORT_ID)),
+		.args = ARGS(ARGS_ENTRY(struct buffer, port)),
+		.call = parse_tunnel,
+	},
 	/* Destroy arguments. */
 	[DESTROY_RULE] = {
 		.name = "rule",
@@ -2018,6 +2098,20 @@ static const struct token token_list[] = {
 		.next = NEXT(next_vc_attr),
 		.call = parse_vc,
 	},
+	[TUNNEL_SET] = {
+		.name = "tunnel_set",
+		.help = "tunnel steer rule",
+		.next = NEXT(next_vc_attr, NEXT_ENTRY(UNSIGNED)),
+		.args = ARGS(ARGS_ENTRY(struct tunnel_ops, id)),
+		.call = parse_vc,
+	},
+	[TUNNEL_MATCH] = {
+		.name = "tunnel_match",
+		.help = "tunnel match rule",
+		.next = NEXT(next_vc_attr, NEXT_ENTRY(UNSIGNED)),
+		.args = ARGS(ARGS_ENTRY(struct tunnel_ops, id)),
+		.call = parse_vc,
+	},
 	/* Validate/create pattern. */
 	[PATTERN] = {
 		.name = "pattern",
@@ -4495,12 +4589,28 @@ parse_vc(struct context *ctx, const struct token *token,
 		return len;
 	}
 	ctx->objdata = 0;
-	ctx->object = &out->args.vc.attr;
+	switch (ctx->curr) {
+	default:
+		ctx->object = &out->args.vc.attr;
+		break;
+	case TUNNEL_SET:
+	case TUNNEL_MATCH:
+		ctx->object = &out->args.vc.tunnel_ops;
+		break;
+	}
 	ctx->objmask = NULL;
 	switch (ctx->curr) {
 	case GROUP:
 	case PRIORITY:
 		return len;
+	case TUNNEL_SET:
+		out->args.vc.tunnel_ops.enabled = 1;
+		out->args.vc.tunnel_ops.actions = 1;
+		return len;
+	case TUNNEL_MATCH:
+		out->args.vc.tunnel_ops.enabled = 1;
+		out->args.vc.tunnel_ops.items = 1;
+		return len;
 	case INGRESS:
 		out->args.vc.attr.ingress = 1;
 		return len;
@@ -6108,6 +6218,47 @@ parse_isolate(struct context *ctx, const struct token *token,
 	return len;
 }
 
+static int
+parse_tunnel(struct context *ctx, const struct token *token,
+	     const char *str, unsigned int len,
+	     void *buf, unsigned int size)
+{
+	struct buffer *out = buf;
+
+	/* Token name must match. */
+	if (parse_default(ctx, token, str, len, NULL, 0) < 0)
+		return -1;
+	/* Nothing else to do if there is no buffer. */
+	if (!out)
+		return len;
+	if (!out->command) {
+		if (ctx->curr != TUNNEL)
+			return -1;
+		if (sizeof(*out) > size)
+			return -1;
+		out->command = ctx->curr;
+		ctx->objdata = 0;
+		ctx->object = out;
+		ctx->objmask = NULL;
+	} else {
+		switch (ctx->curr) {
+		default:
+			break;
+		case TUNNEL_CREATE:
+		case TUNNEL_DESTROY:
+		case TUNNEL_LIST:
+			out->command = ctx->curr;
+			break;
+		case TUNNEL_CREATE_TYPE:
+		case TUNNEL_DESTROY_ID:
+			ctx->object = &out->args.vc.tunnel_ops;
+			break;
+		}
+	}
+
+	return len;
+}
+
 /**
  * Parse signed/unsigned integers 8 to 64-bit long.
  *
@@ -7148,11 +7299,13 @@ cmd_flow_parsed(const struct buffer *in)
 		break;
 	case VALIDATE:
 		port_flow_validate(in->port, &in->args.vc.attr,
-				   in->args.vc.pattern, in->args.vc.actions);
+				   in->args.vc.pattern, in->args.vc.actions,
+				   &in->args.vc.tunnel_ops);
 		break;
 	case CREATE:
 		port_flow_create(in->port, &in->args.vc.attr,
-				 in->args.vc.pattern, in->args.vc.actions);
+				 in->args.vc.pattern, in->args.vc.actions,
+				 &in->args.vc.tunnel_ops);
 		break;
 	case DESTROY:
 		port_flow_destroy(in->port, in->args.destroy.rule_n,
@@ -7178,6 +7331,15 @@ cmd_flow_parsed(const struct buffer *in)
 	case AGED:
 		port_flow_aged(in->port, in->args.aged.destroy);
 		break;
+	case TUNNEL_CREATE:
+		port_flow_tunnel_create(in->port, &in->args.vc.tunnel_ops);
+		break;
+	case TUNNEL_DESTROY:
+		port_flow_tunnel_destroy(in->port, in->args.vc.tunnel_ops.id);
+		break;
+	case TUNNEL_LIST:
+		port_flow_tunnel_list(in->port);
+		break;
 	default:
 		break;
 	}
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 2c00b55440..660eb5af97 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1521,6 +1521,115 @@ port_mtu_set(portid_t port_id, uint16_t mtu)
 
 /* Generic flow management functions. */
 
+static struct port_flow_tunnel *
+port_flow_locate_tunnel_id(struct rte_port *port, uint32_t port_tunnel_id)
+{
+	struct port_flow_tunnel *flow_tunnel;
+
+	LIST_FOREACH(flow_tunnel, &port->flow_tunnel_list, chain) {
+		if (flow_tunnel->id == port_tunnel_id)
+			goto out;
+	}
+	flow_tunnel = NULL;
+
+out:
+	return flow_tunnel;
+}
+
+const char *
+port_flow_tunnel_type(struct rte_flow_tunnel *tunnel)
+{
+	const char *type;
+	switch (tunnel->type) {
+	default:
+		type = "unknown";
+		break;
+	case RTE_FLOW_ITEM_TYPE_VXLAN:
+		type = "vxlan";
+		break;
+	}
+
+	return type;
+}
+
+struct port_flow_tunnel *
+port_flow_locate_tunnel(uint16_t port_id, struct rte_flow_tunnel *tun)
+{
+	struct rte_port *port = &ports[port_id];
+	struct port_flow_tunnel *flow_tunnel;
+
+	LIST_FOREACH(flow_tunnel, &port->flow_tunnel_list, chain) {
+		if (!memcmp(&flow_tunnel->tunnel, tun, sizeof(*tun)))
+			goto out;
+	}
+	flow_tunnel = NULL;
+
+out:
+	return flow_tunnel;
+}
+
+void port_flow_tunnel_list(portid_t port_id)
+{
+	struct rte_port *port = &ports[port_id];
+	struct port_flow_tunnel *flt;
+
+	LIST_FOREACH(flt, &port->flow_tunnel_list, chain) {
+		printf("port %u tunnel #%u type=%s",
+			port_id, flt->id, port_flow_tunnel_type(&flt->tunnel));
+		if (flt->tunnel.tun_id)
+			printf(" id=%lu", (unsigned long)flt->tunnel.tun_id);
+		printf("\n");
+	}
+}
+
+void port_flow_tunnel_destroy(portid_t port_id, uint32_t tunnel_id)
+{
+	struct rte_port *port = &ports[port_id];
+	struct port_flow_tunnel *flt;
+
+	LIST_FOREACH(flt, &port->flow_tunnel_list, chain) {
+		if (flt->id == tunnel_id)
+			break;
+	}
+	if (flt) {
+		LIST_REMOVE(flt, chain);
+		free(flt);
+		printf("port %u: flow tunnel #%u destroyed\n",
+			port_id, tunnel_id);
+	}
+}
+
+void port_flow_tunnel_create(portid_t port_id, const struct tunnel_ops *ops)
+{
+	struct rte_port *port = &ports[port_id];
+	enum rte_flow_item_type	type;
+	struct port_flow_tunnel *flt;
+
+	if (!strcmp(ops->type, "vxlan"))
+		type = RTE_FLOW_ITEM_TYPE_VXLAN;
+	else {
+		printf("cannot offload \"%s\" tunnel type\n", ops->type);
+		return;
+	}
+	LIST_FOREACH(flt, &port->flow_tunnel_list, chain) {
+		if (flt->tunnel.type == type)
+			break;
+	}
+	if (!flt) {
+		flt = calloc(1, sizeof(*flt));
+		if (!flt) {
+			printf("failed to allocate port flt object\n");
+			return;
+		}
+		flt->tunnel.type = type;
+		flt->id = LIST_EMPTY(&port->flow_tunnel_list) ? 1 :
+				  LIST_FIRST(&port->flow_tunnel_list)->id + 1;
+		LIST_INSERT_HEAD(&port->flow_tunnel_list, flt, chain);
+	}
+	printf("port %d: flow tunnel #%u type %s\n",
+		port_id, flt->id, ops->type);
+}
+
 /** Generate a port_flow entry from attributes/pattern/actions. */
 static struct port_flow *
 port_flow_new(const struct rte_flow_attr *attr,
@@ -1860,20 +1969,137 @@ port_shared_action_query(portid_t port_id, uint32_t id)
 	}
 	return ret;
 }
+static struct port_flow_tunnel *
+port_flow_tunnel_offload_cmd_prep(portid_t port_id,
+				  const struct rte_flow_item *pattern,
+				  const struct rte_flow_action *actions,
+				  const struct tunnel_ops *tunnel_ops)
+{
+	int ret;
+	struct rte_port *port;
+	struct port_flow_tunnel *pft;
+	struct rte_flow_error error;
+
+	port = &ports[port_id];
+	pft = port_flow_locate_tunnel_id(port, tunnel_ops->id);
+	if (!pft) {
+		printf("failed to locate port flow tunnel #%u\n",
+			tunnel_ops->id);
+		return NULL;
+	}
+	if (tunnel_ops->actions) {
+		uint32_t num_actions;
+		const struct rte_flow_action *aptr;
+
+		ret = rte_flow_tunnel_decap_set(port_id, &pft->tunnel,
+						&pft->pmd_actions,
+						&pft->num_pmd_actions,
+						&error);
+		if (ret) {
+			port_flow_complain(&error);
+			return NULL;
+		}
+		for (aptr = actions, num_actions = 1;
+		     aptr->type != RTE_FLOW_ACTION_TYPE_END;
+		     aptr++, num_actions++);
+		pft->actions = malloc(
+				(num_actions +  pft->num_pmd_actions) *
+				sizeof(actions[0]));
+		if (!pft->actions) {
+			rte_flow_tunnel_action_decap_release(
+					port_id, pft->pmd_actions,
+					pft->num_pmd_actions, &error);
+			return NULL;
+		}
+		rte_memcpy(pft->actions, pft->pmd_actions,
+			   pft->num_pmd_actions * sizeof(actions[0]));
+		rte_memcpy(pft->actions + pft->num_pmd_actions, actions,
+			   num_actions * sizeof(actions[0]));
+	}
+	if (tunnel_ops->items) {
+		uint32_t num_items;
+		const struct rte_flow_item *iptr;
+
+		ret = rte_flow_tunnel_match(port_id, &pft->tunnel,
+					    &pft->pmd_items,
+					    &pft->num_pmd_items,
+					    &error);
+		if (ret) {
+			port_flow_complain(&error);
+			return NULL;
+		}
+		for (iptr = pattern, num_items = 1;
+		     iptr->type != RTE_FLOW_ITEM_TYPE_END;
+		     iptr++, num_items++);
+		pft->items = malloc((num_items + pft->num_pmd_items) *
+				    sizeof(pattern[0]));
+		if (!pft->items) {
+			rte_flow_tunnel_item_release(
+					port_id, pft->pmd_items,
+					pft->num_pmd_items, &error);
+			return NULL;
+		}
+		rte_memcpy(pft->items, pft->pmd_items,
+			   pft->num_pmd_items * sizeof(pattern[0]));
+		rte_memcpy(pft->items + pft->num_pmd_items, pattern,
+			   num_items * sizeof(pattern[0]));
+	}
+
+	return pft;
+}
+
+static void
+port_flow_tunnel_offload_cmd_release(portid_t port_id,
+				     const struct tunnel_ops *tunnel_ops,
+				     struct port_flow_tunnel *pft)
+{
+	struct rte_flow_error error;
+
+	if (tunnel_ops->actions) {
+		free(pft->actions);
+		rte_flow_tunnel_action_decap_release(
+			port_id, pft->pmd_actions,
+			pft->num_pmd_actions, &error);
+		pft->actions = NULL;
+		pft->pmd_actions = NULL;
+	}
+	if (tunnel_ops->items) {
+		free(pft->items);
+		rte_flow_tunnel_item_release(port_id, pft->pmd_items,
+					     pft->num_pmd_items,
+					     &error);
+		pft->items = NULL;
+		pft->pmd_items = NULL;
+	}
+}
 
 /** Validate flow rule. */
 int
 port_flow_validate(portid_t port_id,
 		   const struct rte_flow_attr *attr,
 		   const struct rte_flow_item *pattern,
-		   const struct rte_flow_action *actions)
+		   const struct rte_flow_action *actions,
+		   const struct tunnel_ops *tunnel_ops)
 {
 	struct rte_flow_error error;
+	struct port_flow_tunnel *pft = NULL;
 
 	/* Poisoning to make sure PMDs update it in case of error. */
 	memset(&error, 0x11, sizeof(error));
+	if (tunnel_ops->enabled) {
+		pft = port_flow_tunnel_offload_cmd_prep(port_id, pattern,
+							actions, tunnel_ops);
+		if (!pft)
+			return -ENOENT;
+		if (pft->items)
+			pattern = pft->items;
+		if (pft->actions)
+			actions = pft->actions;
+	}
 	if (rte_flow_validate(port_id, attr, pattern, actions, &error))
 		return port_flow_complain(&error);
+	if (tunnel_ops->enabled)
+		port_flow_tunnel_offload_cmd_release(port_id, tunnel_ops, pft);
 	printf("Flow rule validated\n");
 	return 0;
 }
@@ -1903,13 +2129,15 @@ int
 port_flow_create(portid_t port_id,
 		 const struct rte_flow_attr *attr,
 		 const struct rte_flow_item *pattern,
-		 const struct rte_flow_action *actions)
+		 const struct rte_flow_action *actions,
+		 const struct tunnel_ops *tunnel_ops)
 {
 	struct rte_flow *flow;
 	struct rte_port *port;
 	struct port_flow *pf;
 	uint32_t id = 0;
 	struct rte_flow_error error;
+	struct port_flow_tunnel *pft = NULL;
 
 	port = &ports[port_id];
 	if (port->flow_list) {
@@ -1920,6 +2148,16 @@ port_flow_create(portid_t port_id,
 		}
 		id = port->flow_list->id + 1;
 	}
+	if (tunnel_ops->enabled) {
+		pft = port_flow_tunnel_offload_cmd_prep(port_id, pattern,
+							actions, tunnel_ops);
+		if (!pft)
+			return -ENOENT;
+		if (pft->items)
+			pattern = pft->items;
+		if (pft->actions)
+			actions = pft->actions;
+	}
 	pf = port_flow_new(attr, pattern, actions, &error);
 	if (!pf)
 		return port_flow_complain(&error);
@@ -1935,6 +2173,8 @@ port_flow_create(portid_t port_id,
 	pf->id = id;
 	pf->flow = flow;
 	port->flow_list = pf;
+	if (tunnel_ops->enabled)
+		port_flow_tunnel_offload_cmd_release(port_id, tunnel_ops, pft);
 	printf("Flow rule #%u created\n", pf->id);
 	return 0;
 }
@@ -2244,7 +2484,9 @@ port_flow_list(portid_t port_id, uint32_t n, const uint32_t *group)
 		       pf->rule.attr->egress ? 'e' : '-',
 		       pf->rule.attr->transfer ? 't' : '-');
 		while (item->type != RTE_FLOW_ITEM_TYPE_END) {
-			if (rte_flow_conv(RTE_FLOW_CONV_OP_ITEM_NAME_PTR,
+			if ((uint32_t)item->type > INT_MAX)
+				name = "PMD_INTERNAL";
+			else if (rte_flow_conv(RTE_FLOW_CONV_OP_ITEM_NAME_PTR,
 					  &name, sizeof(name),
 					  (void *)(uintptr_t)item->type,
 					  NULL) <= 0)
@@ -2255,7 +2497,9 @@ port_flow_list(portid_t port_id, uint32_t n, const uint32_t *group)
 		}
 		printf("=>");
 		while (action->type != RTE_FLOW_ACTION_TYPE_END) {
-			if (rte_flow_conv(RTE_FLOW_CONV_OP_ACTION_NAME_PTR,
+			if ((uint32_t)action->type > INT_MAX)
+				name = "PMD_INTERNAL";
+			else if (rte_flow_conv(RTE_FLOW_CONV_OP_ACTION_NAME_PTR,
 					  &name, sizeof(name),
 					  (void *)(uintptr_t)action->type,
 					  NULL) <= 0)
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 6caba60988..333904d686 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -3684,6 +3684,8 @@ init_port_dcb_config(portid_t pid,
 static void
 init_port(void)
 {
+	int i;
+
 	/* Configuration of Ethernet ports. */
 	ports = rte_zmalloc("testpmd: ports",
 			    sizeof(struct rte_port) * RTE_MAX_ETHPORTS,
@@ -3693,7 +3695,8 @@ init_port(void)
 				"rte_zmalloc(%d struct rte_port) failed\n",
 				RTE_MAX_ETHPORTS);
 	}
-
+	for (i = 0; i < RTE_MAX_ETHPORTS; i++)
+		LIST_INIT(&ports[i].flow_tunnel_list);
 	/* Initialize ports NUMA structures */
 	memset(port_numa, NUMA_NO_CONFIG, RTE_MAX_ETHPORTS);
 	memset(rxring_numa, NUMA_NO_CONFIG, RTE_MAX_ETHPORTS);
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index f8b0a3517d..5238ac3dd5 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -12,6 +12,7 @@
 #include <rte_gro.h>
 #include <rte_gso.h>
 #include <cmdline.h>
+#include <sys/queue.h>
 
 #define RTE_PORT_ALL            (~(portid_t)0x0)
 
@@ -150,6 +151,26 @@ struct port_shared_action {
 	struct rte_flow_shared_action *action;	/**< Shared action handle. */
 };
 
+struct port_flow_tunnel {
+	LIST_ENTRY(port_flow_tunnel) chain;
+	struct rte_flow_action *pmd_actions;
+	struct rte_flow_item   *pmd_items;
+	uint32_t id;
+	uint32_t num_pmd_actions;
+	uint32_t num_pmd_items;
+	struct rte_flow_tunnel tunnel;
+	struct rte_flow_action *actions;
+	struct rte_flow_item *items;
+};
+
+struct tunnel_ops {
+	uint32_t id;
+	char type[16];
+	uint32_t enabled:1;
+	uint32_t actions:1;
+	uint32_t items:1;
+};
+
 /**
  * The data structure associated with each port.
  */
@@ -182,6 +203,7 @@ struct rte_port {
 	struct port_flow        *flow_list; /**< Associated flows. */
 	struct port_shared_action *actions_list;
 	/**< Associated shared actions. */
+	LIST_HEAD(, port_flow_tunnel) flow_tunnel_list;
 	const struct rte_eth_rxtx_callback *rx_dump_cb[RTE_MAX_QUEUES_PER_PORT+1];
 	const struct rte_eth_rxtx_callback *tx_dump_cb[RTE_MAX_QUEUES_PER_PORT+1];
 	/**< metadata value to insert in Tx packets. */
@@ -773,11 +795,13 @@ int port_shared_action_update(portid_t port_id, uint32_t id,
 int port_flow_validate(portid_t port_id,
 		       const struct rte_flow_attr *attr,
 		       const struct rte_flow_item *pattern,
-		       const struct rte_flow_action *actions);
+		       const struct rte_flow_action *actions,
+		       const struct tunnel_ops *tunnel_ops);
 int port_flow_create(portid_t port_id,
 		     const struct rte_flow_attr *attr,
 		     const struct rte_flow_item *pattern,
-		     const struct rte_flow_action *actions);
+		     const struct rte_flow_action *actions,
+		     const struct tunnel_ops *tunnel_ops);
 int port_shared_action_query(portid_t port_id, uint32_t id);
 void update_age_action_context(const struct rte_flow_action *actions,
 		     struct port_flow *pf);
@@ -788,6 +812,12 @@ int port_flow_query(portid_t port_id, uint32_t rule,
 		    const struct rte_flow_action *action);
 void port_flow_list(portid_t port_id, uint32_t n, const uint32_t *group);
 void port_flow_aged(portid_t port_id, uint8_t destroy);
+const char *port_flow_tunnel_type(struct rte_flow_tunnel *tunnel);
+struct port_flow_tunnel *
+port_flow_locate_tunnel(uint16_t port_id, struct rte_flow_tunnel *tun);
+void port_flow_tunnel_list(portid_t port_id);
+void port_flow_tunnel_destroy(portid_t port_id, uint32_t tunnel_id);
+void port_flow_tunnel_create(portid_t port_id, const struct tunnel_ops *ops);
 int port_flow_isolate(portid_t port_id, int set);
 
 void rx_ring_desc_display(portid_t port_id, queueid_t rxq_id, uint16_t rxd_id);
diff --git a/app/test-pmd/util.c b/app/test-pmd/util.c
index 8488fa1a8f..781a813759 100644
--- a/app/test-pmd/util.c
+++ b/app/test-pmd/util.c
@@ -48,18 +48,49 @@ dump_pkt_burst(uint16_t port_id, uint16_t queue, struct rte_mbuf *pkts[],
 	       is_rx ? "received" : "sent",
 	       (unsigned int) nb_pkts);
 	for (i = 0; i < nb_pkts; i++) {
+		int ret;
+		struct rte_flow_error error;
+		struct rte_flow_restore_info info = { 0, };
+
 		mb = pkts[i];
 		eth_hdr = rte_pktmbuf_read(mb, 0, sizeof(_eth_hdr), &_eth_hdr);
 		eth_type = RTE_BE_TO_CPU_16(eth_hdr->ether_type);
-		ol_flags = mb->ol_flags;
 		packet_type = mb->packet_type;
 		is_encapsulation = RTE_ETH_IS_TUNNEL_PKT(packet_type);
-
+		ret = rte_flow_get_restore_info(port_id, mb, &info, &error);
+		if (!ret) {
+			printf("restore info:");
+			if (info.flags & RTE_FLOW_RESTORE_INFO_TUNNEL) {
+				struct port_flow_tunnel *port_tunnel;
+
+				port_tunnel = port_flow_locate_tunnel
+					      (port_id, &info.tunnel);
+				printf(" - tunnel");
+				if (port_tunnel)
+					printf(" #%u", port_tunnel->id);
+				else
+					printf(" %s", "-none-");
+				printf(" type %s",
+					port_flow_tunnel_type(&info.tunnel));
+			} else {
+				printf(" - no tunnel info");
+			}
+			if (info.flags & RTE_FLOW_RESTORE_INFO_ENCAPSULATED)
+				printf(" - outer header present");
+			else
+				printf(" - no outer header");
+			if (info.flags & RTE_FLOW_RESTORE_INFO_GROUP_ID)
+				printf(" - miss group %u", info.group_id);
+			else
+				printf(" - no miss group");
+			printf("\n");
+		}
 		print_ether_addr("  src=", &eth_hdr->s_addr);
 		print_ether_addr(" - dst=", &eth_hdr->d_addr);
 		printf(" - type=0x%04x - length=%u - nb_segs=%d",
 		       eth_type, (unsigned int) mb->pkt_len,
 		       (int)mb->nb_segs);
+		ol_flags = mb->ol_flags;
 		if (ol_flags & PKT_RX_RSS_HASH) {
 			printf(" - RSS hash=0x%x", (unsigned int) mb->hash.rss);
 			printf(" - RSS queue=0x%x", (unsigned int) queue);
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 43c0ea0599..05a4446757 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -3749,6 +3749,45 @@ following sections.
 
    flow aged {port_id} [destroy]
 
+- Tunnel offload - create a tunnel stub::
+
+   flow tunnel create {port_id} type {tunnel_type}
+
+- Tunnel offload - destroy a tunnel stub::
+
+   flow tunnel destroy {port_id} id {tunnel_id}
+
+- Tunnel offload - list port tunnel stubs::
+
+   flow tunnel list {port_id}
+
+Creating a tunnel stub for offload
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``flow tunnel create`` sets up a tunnel stub for tunnel offload flow rules::
+
+   flow tunnel create {port_id} type {tunnel_type}
+
+If successful, it will return a tunnel stub ID usable with other commands::
+
+   port [...]: flow tunnel #[...] type [...]
+
+Tunnel stub ID is relative to a port.
+
+Destroying tunnel offload stub
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``flow tunnel destroy`` destroys a port tunnel stub::
+
+   flow tunnel destroy {port_id} id {tunnel_id}
+
+Listing tunnel offload stubs
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``flow tunnel list`` lists port tunnel offload stubs::
+
+   flow tunnel list {port_id}
+
 Validating flow rules
 ~~~~~~~~~~~~~~~~~~~~~
 
@@ -3795,6 +3834,7 @@ to ``rte_flow_create()``::
 
    flow create {port_id}
       [group {group_id}] [priority {level}] [ingress] [egress] [transfer]
+      [tunnel_set {tunnel_id}] [tunnel_match {tunnel_id}]
       pattern {item} [/ {item} [...]] / end
       actions {action} [/ {action} [...]] / end
 
@@ -3809,6 +3849,7 @@ Otherwise it will show an error message of the form::
 Parameters describe in the following order:
 
 - Attributes (*group*, *priority*, *ingress*, *egress*, *transfer* tokens).
+- Tunnel offload specification (tunnel_set, tunnel_match)
 - A matching pattern, starting with the *pattern* token and terminated by an
   *end* pattern item.
 - Actions, starting with the *actions* token and terminated by an *end*
@@ -3852,6 +3893,14 @@ Most rules affect RX therefore contain the ``ingress`` token::
 
    testpmd> flow create 0 ingress pattern [...]
 
+Tunnel offload
+^^^^^^^^^^^^^^
+
+Indicate tunnel offload rule type
+
+- ``tunnel_set {tunnel_id}``: mark rule as tunnel offload decap_set type.
+- ``tunnel_match {tunnel_id}``: mark rule as tunnel offload match type.
+
 Matching pattern
 ^^^^^^^^^^^^^^^^
 
-- 
2.28.0


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH v7 0/3] Tunnel Offload API
  2020-10-16 10:33 ` [dpdk-dev] [PATCH v7 0/3] Tunnel Offload API Gregory Etelson
                     ` (2 preceding siblings ...)
  2020-10-16 10:34   ` [dpdk-dev] [PATCH v7 3/3] app/testpmd: add commands for tunnel offload API Gregory Etelson
@ 2020-10-16 12:10   ` Ferruh Yigit
  3 siblings, 0 replies; 95+ messages in thread
From: Ferruh Yigit @ 2020-10-16 12:10 UTC (permalink / raw)
  To: Gregory Etelson, dev; +Cc: matan, rasland, elibr, ozsh, ajit.khaparde, asafp

On 10/16/2020 11:33 AM, Gregory Etelson wrote:
> Tunnel Offload API provides hardware independent, unified model
> to offload tunneled traffic. Key model elements are:
>   - apply matches to both outer and inner packet headers
>     during entire offload procedure;
>   - restore outer header of partially offloaded packet;
>   - model is implemented as a set of helper functions.
> 
>   v2:
>   * documentation updates
>   * MLX5 PMD implementation for tunnel offload
>   * testpmd updates for tunnel offload
> 
>   v3:
>   * documentation updates
>   * MLX5 PMD updates
>   * testpmd updates
> 
>   v4:
>   * updated patch: allow negative values in flow rule types
> 
> v5:
>   * rebase to next-net
> 
> v6:
> * update tunnel offload API documentation
> 
> v7:
> * testpmd: resolve "%lu" differences in ubuntu 32 & 64
> 
> Eli Britstein (1):
>    ethdev: tunnel offload model
> 
> Gregory Etelson (2):
>    ethdev: allow negative values in flow rule types
>    app/testpmd: add commands for tunnel offload API

Series applied to dpdk-next-net/main, thanks.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* [dpdk-dev] [PATCH v8 0/3] Tunnel Offload API
  2020-06-25 16:03 [dpdk-dev] [PATCH 0/2] ethdev: tunnel offload model Gregory Etelson
                   ` (8 preceding siblings ...)
  2020-10-16 10:33 ` [dpdk-dev] [PATCH v7 0/3] Tunnel Offload API Gregory Etelson
@ 2020-10-16 12:51 ` Gregory Etelson
  2020-10-16 12:51   ` [dpdk-dev] [PATCH v8 1/3] ethdev: allow negative values in flow rule types Gregory Etelson
                     ` (3 more replies)
  2020-10-18 12:15 ` [dpdk-dev] [PATCH] ethdev: rename tunnel offload callbacks Gregory Etelson
                   ` (7 subsequent siblings)
  17 siblings, 4 replies; 95+ messages in thread
From: Gregory Etelson @ 2020-10-16 12:51 UTC (permalink / raw)
  To: dev; +Cc: getelson, matan, rasland, elibr, ozsh, asafp

The Tunnel Offload API provides a hardware-independent, unified model
to offload tunneled traffic. Key model elements are:
 - apply matches to both outer and inner packet headers
   during entire offload procedure;
 - restore outer header of partially offloaded packet;
 - model is implemented as a set of helper functions.
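
A minimal sketch of how an application might consume the restore helper
on its Rx path (the handle_miss() wrapper and its prints are only an
illustration, not part of the series):

#include <stdio.h>
#include <rte_mbuf.h>
#include <rte_flow.h>

/* Inspect a packet that missed in hardware and reached software. */
static void
handle_miss(uint16_t port_id, struct rte_mbuf *m)
{
	struct rte_flow_restore_info info = { 0 };
	struct rte_flow_error error;

	if (rte_flow_get_restore_info(port_id, m, &info, &error))
		return; /* no restore information for this mbuf */
	if (info.flags & RTE_FLOW_RESTORE_INFO_TUNNEL)
		printf("missed packet belongs to a %s tunnel\n",
		       info.tunnel.type == RTE_FLOW_ITEM_TYPE_VXLAN ?
		       "vxlan" : "other");
	if (info.flags & RTE_FLOW_RESTORE_INFO_ENCAPSULATED)
		printf("outer header is still present on the packet\n");
	if (info.flags & RTE_FLOW_RESTORE_INFO_GROUP_ID)
		printf("miss happened in group %u\n", info.group_id);
}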

 v2:
 * documentation updates
 * MLX5 PMD implementation for tunnel offload
 * testpmd updates for tunnel offload

 v3:
 * documentation updates
 * MLX5 PMD updates
 * testpmd updates

 v4:
 * updated patch: allow negative values in flow rule types

v5:
 * rebase to next-net

v6:
* update tunnel offload API documentation

v7:
* testpmd: resolve "%lu" differences in ubuntu 32 & 64

v8:
* testpmd: use PRIu64 for general cast to uint64_t

Eli Britstein (1):
  ethdev: tunnel offload model

Gregory Etelson (2):
  ethdev: allow negative values in flow rule types
  app/testpmd: add commands for tunnel offload API

 app/test-pmd/cmdline_flow.c                 | 170 ++++++++++++-
 app/test-pmd/config.c                       | 252 +++++++++++++++++++-
 app/test-pmd/testpmd.c                      |   5 +-
 app/test-pmd/testpmd.h                      |  34 ++-
 app/test-pmd/util.c                         |  35 ++-
 doc/guides/prog_guide/rte_flow.rst          |  81 +++++++
 doc/guides/rel_notes/release_20_11.rst      |  10 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  49 ++++
 lib/librte_ethdev/rte_ethdev_version.map    |   5 +
 lib/librte_ethdev/rte_flow.c                | 140 ++++++++++-
 lib/librte_ethdev/rte_flow.h                | 195 +++++++++++++++
 lib/librte_ethdev/rte_flow_driver.h         |  32 +++
 12 files changed, 989 insertions(+), 19 deletions(-)

-- 
2.28.0


^ permalink raw reply	[flat|nested] 95+ messages in thread

* [dpdk-dev] [PATCH v8 1/3] ethdev: allow negative values in flow rule types
  2020-10-16 12:51 ` [dpdk-dev] [PATCH v8 " Gregory Etelson
@ 2020-10-16 12:51   ` Gregory Etelson
  2020-10-16 12:51   ` [dpdk-dev] [PATCH v8 2/3] ethdev: tunnel offload model Gregory Etelson
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 95+ messages in thread
From: Gregory Etelson @ 2020-10-16 12:51 UTC (permalink / raw)
  To: dev
  Cc: getelson, matan, rasland, elibr, ozsh, asafp, Ori Kam,
	Viacheslav Ovsiienko, Thomas Monjalon, Ferruh Yigit,
	Andrew Rybchenko

RTE flow items & actions use positive values in item & action type.
Negative values are reserved for PMD private types. PMD
items & actions usually are not exposed to application and are not
used to create RTE flows.

The patch gives applications with access to PMD flow
items & actions the ability to integrate RTE and PMD items & actions
and use them to create flow rules.

RTE flow item or action conversion library accepts positive known
element types with predefined sizes only. Private PMD items and
actions do not fit into this scheme because PMD type values are
negative, each PMD has its own type numeration, and element types and
their sizes are not visible at RTE level.  To resolve these
limitations the patch proposes this solution:
1. PMD can expose elements of pointer size only.  RTE flow
   conversion functions will use pointer size for each configuration
   object in private PMD element it processes;
2. RTE flow verification will not reject elements with negative type.

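For illustration only, a sketch of how an application could splice
PMD-private actions (negative types, pointer-sized configuration
objects), e.g. as returned by rte_flow_tunnel_decap_set(), in front of
its own actions before calling rte_flow_create(); the splice_actions()
helper below is a made-up name, not part of the patch:

#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <rte_flow.h>

/* Build one action array: PMD-private actions first, then application
 * actions including the terminating END entry. */
static struct rte_flow_action *
splice_actions(const struct rte_flow_action *pmd_actions, uint32_t num_pmd,
	       const struct rte_flow_action *app_actions)
{
	uint32_t num_app = 0;
	struct rte_flow_action *rule_actions;

	while (app_actions[num_app++].type != RTE_FLOW_ACTION_TYPE_END)
		;
	rule_actions = malloc((num_pmd + num_app) * sizeof(*rule_actions));
	if (!rule_actions)
		return NULL;
	memcpy(rule_actions, pmd_actions, num_pmd * sizeof(*rule_actions));
	memcpy(rule_actions + num_pmd, app_actions,
	       num_app * sizeof(*rule_actions));
	return rule_actions; /* ready for rte_flow_validate()/create() */
}
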
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

---
v4:
* update the 'Negative types' section in the rtre_flow.rst

* update the patch comment

v5:
* rebase to next-net
---
 doc/guides/prog_guide/rte_flow.rst     |  3 +++
 doc/guides/rel_notes/release_20_11.rst |  5 +++++
 lib/librte_ethdev/rte_flow.c           | 28 ++++++++++++++++++++------
 3 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 6ee0d3a10a..7fb5ec9059 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -2775,6 +2775,9 @@ identifiers they are not aware of.
 
 A method to generate them remains to be defined.
 
+An application may use PMD dynamic items or actions in flow rules. In that case,
+the size of the configuration object in a dynamic element must be a pointer size.
+
 Planned types
 ~~~~~~~~~~~~~
 
diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index 79d9ebac4e..9155b468d6 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -116,6 +116,11 @@ New Features
   * Updated HWRM structures to 1.10.1.70 version.
   * Added TRUFLOW support for Stingray devices.
 
+* **Flow rules allowed to use private PMD items / actions.**
+
+  * Flow rule verification was updated to accept private PMD
+    items and actions.
+
 * **Updated Cisco enic driver.**
 
   * Added support for VF representors with single-queue Tx/Rx and flow API
diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
index 686fe40eaa..b74ea5593a 100644
--- a/lib/librte_ethdev/rte_flow.c
+++ b/lib/librte_ethdev/rte_flow.c
@@ -518,7 +518,11 @@ rte_flow_conv_item_spec(void *buf, const size_t size,
 		}
 		break;
 	default:
-		off = rte_flow_desc_item[item->type].size;
+		/**
+		 * allow PMD private flow item
+		 */
+		off = (int)item->type >= 0 ?
+		      rte_flow_desc_item[item->type].size : sizeof(void *);
 		rte_memcpy(buf, data, (size > off ? off : size));
 		break;
 	}
@@ -621,7 +625,11 @@ rte_flow_conv_action_conf(void *buf, const size_t size,
 		}
 		break;
 	default:
-		off = rte_flow_desc_action[action->type].size;
+		/**
+		 * allow PMD private flow action
+		 */
+		off = (int)action->type >= 0 ?
+		      rte_flow_desc_action[action->type].size : sizeof(void *);
 		rte_memcpy(buf, action->conf, (size > off ? off : size));
 		break;
 	}
@@ -663,8 +671,12 @@ rte_flow_conv_pattern(struct rte_flow_item *dst,
 	unsigned int i;
 
 	for (i = 0, off = 0; !num || i != num; ++i, ++src, ++dst) {
-		if ((size_t)src->type >= RTE_DIM(rte_flow_desc_item) ||
-		    !rte_flow_desc_item[src->type].name)
+		/**
+		 * allow PMD private flow item
+		 */
+		if (((int)src->type >= 0) &&
+			((size_t)src->type >= RTE_DIM(rte_flow_desc_item) ||
+		    !rte_flow_desc_item[src->type].name))
 			return rte_flow_error_set
 				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM, src,
 				 "cannot convert unknown item type");
@@ -752,8 +764,12 @@ rte_flow_conv_actions(struct rte_flow_action *dst,
 	unsigned int i;
 
 	for (i = 0, off = 0; !num || i != num; ++i, ++src, ++dst) {
-		if ((size_t)src->type >= RTE_DIM(rte_flow_desc_action) ||
-		    !rte_flow_desc_action[src->type].name)
+		/**
+		 * allow PMD private flow action
+		 */
+		if (((int)src->type >= 0) &&
+		    ((size_t)src->type >= RTE_DIM(rte_flow_desc_action) ||
+		    !rte_flow_desc_action[src->type].name))
 			return rte_flow_error_set
 				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION,
 				 src, "cannot convert unknown action type");
-- 
2.28.0


^ permalink raw reply	[flat|nested] 95+ messages in thread

* [dpdk-dev] [PATCH v8 2/3] ethdev: tunnel offload model
  2020-10-16 12:51 ` [dpdk-dev] [PATCH v8 " Gregory Etelson
  2020-10-16 12:51   ` [dpdk-dev] [PATCH v8 1/3] ethdev: allow negative values in flow rule types Gregory Etelson
@ 2020-10-16 12:51   ` Gregory Etelson
  2020-10-16 15:41     ` Kinsella, Ray
  2021-03-02  9:22     ` Ivan Malov
  2020-10-16 12:51   ` [dpdk-dev] [PATCH v8 3/3] app/testpmd: add commands for tunnel offload API Gregory Etelson
  2020-10-16 13:19   ` [dpdk-dev] [PATCH v8 0/3] Tunnel Offload API Ferruh Yigit
  3 siblings, 2 replies; 95+ messages in thread
From: Gregory Etelson @ 2020-10-16 12:51 UTC (permalink / raw)
  To: dev
  Cc: getelson, matan, rasland, elibr, ozsh, asafp, Eli Britstein,
	Ori Kam, Viacheslav Ovsiienko, Ray Kinsella, Neil Horman,
	Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko

From: Eli Britstein <elibr@mellanox.com>

rte_flow API provides the building blocks for vendor-agnostic flow
classification offloads. The rte_flow "patterns" and "actions"
primitives are fine-grained, thus enabling DPDK applications the
flexibility to offload network stacks and complex pipelines.
Applications wishing to offload tunneled traffic are required to use
the rte_flow primitives, such as group, meta, mark, tag, and others to
model their high-level objects.  The hardware model design for
high-level software objects is not trivial.  Furthermore, an optimal
design is often vendor-specific.

When hardware offloads tunneled traffic in multi-group logic,
partially offloaded packets may arrive at the application after they
were modified in hardware. In this case, the application may need to
restore the original packet headers. Consider the following sequence:
The application decaps a packet in one group and jumps to a second
group where it tries to match on a 5-tuple, that will miss and send
the packet to the application. In this case, the application does not
receive the original packet but a modified one. Also, in this case,
the application cannot match on the outer header fields, such as VXLAN
vni and 5-tuple.

There are several possible ways to use rte_flow "patterns" and
"actions" to resolve the issues above. For example:
1 Mapping headers to hardware registers using the
rte_flow_action_mark/rte_flow_action_tag/rte_flow_set_meta objects.
2 Apply the decap only at the last offload stage, after all the
"patterns" have been matched and the packet is fully offloaded.
Every approach has its pros and cons and is highly dependent on the
hardware vendor.  For example, some hardware may have a limited number
of registers, while other hardware may not support inner actions and
must decap before accessing inner headers.

The tunnel offload model resolves these issues. The model goals are:
1 Provide a unified application API to offload tunneled traffic that
is capable of matching on outer headers after decap.
2 Allow the application to restore the outer header of partially
offloaded packets.

The tunnel offload model does not introduce new elements to the
existing RTE flow model and is implemented as a set of helper
functions.

For the application to work with the tunnel offload API it
has to adjust flow rules in multi-table tunnel offload in the
following way:
1 Remove explicit call to decap action and replace it with PMD actions
obtained from the rte_flow_tunnel_decap_set() helper.
2 Add PMD items obtained from rte_flow_tunnel_match() helper to all
other rules in the tunnel offload sequence.

VXLAN Code example:

Assume application needs to do inner NAT on the VXLAN packet.
The first  rule in group 0:

flow create <port id> ingress group 0
  pattern eth / ipv4 / udp dst is 4789 / vxlan / end
  actions {pmd actions} / jump group 3 / end

The first VXLAN packet that arrives matches the rule in group 0 and
jumps to group 3.  In group 3 the packet will miss since there is no
flow to match and will be sent to the application.  Application  will
call rte_flow_get_restore_info() to get the packet outer header.

Application will insert a new rule in group 3 to match outer and inner
headers:

flow create <port id> ingress group 3
  pattern {pmd items} / eth / ipv4 dst is 172.10.10.1 /
          udp dst 4789 / vxlan vni is 10 /
          ipv4 dst is 184.1.2.3 / end
  actions  set_ipv4_dst  186.1.1.1 / queue index 3 / end

The result of these rules is that a VXLAN packet with vni=10, outer
IPv4 dst=172.10.10.1 and inner IPv4 dst=184.1.2.3 will be received
decapped on queue 3 with IPv4 dst=186.1.1.1

Note: The packet in group 3 is considered decapped. All actions in
that group will be done on the header that was inner before decap. The
application may specify an outer header to be matched on. It is the PMD's
responsibility to translate these items to outer metadata.

API usage:

/**
 * 1. Initiate RTE flow tunnel object
 */
const struct rte_flow_tunnel tunnel = {
  .type = RTE_FLOW_ITEM_TYPE_VXLAN,
  .tun_id = 10,
}

/**
 * 2. Obtain PMD tunnel actions
 *
 * pmd_actions is an intermediate variable application uses to
 * compile actions array
 */
struct rte_flow_action **pmd_actions;
rte_flow_tunnel_decap_set(port_id, &tunnel, &pmd_actions,
                              &num_pmd_actions, &error);
/**
 * 3. offload the first  rule
 * matching on VXLAN traffic and jumps to group 3
 * (implicitly decaps packet)
 */
app_actions  =   jump group 3
rule_items = app_items;  /** eth / ipv4 / udp / vxlan  */
rule_actions = { pmd_actions, app_actions };
attr.group = 0;
flow_1 = rte_flow_create(port_id, &attr,
                         rule_items, rule_actions, &error);

/**
  * 4. after flow creation application does not need to keep the
  * tunnel action resources.
  */
rte_flow_tunnel_action_decap_release(port_id, pmd_actions,
                                     num_pmd_actions, &error);
/**
  * 5. After partially offloaded packet miss because there was no
  * matching rule handle miss on group 3
  */
struct rte_flow_restore_info info;
rte_flow_get_restore_info(port_id, mbuf, &info, &error);

/**
 * 6. Offload NAT rule:
 */
app_items = { eth / ipv4 dst is 172.10.10.1 / udp dst 4789 /
            vxlan vni is 10 / ipv4 dst is 184.1.2.3 }
app_actions = { set_ipv4_dst 186.1.1.1 / queue index 3 }

rte_flow_tunnel_match(port_id, &info.tunnel, &pmd_items,
                      &num_pmd_items,  &error);
rule_items = {pmd_items, app_items};
rule_actions = app_actions;
attr.group = info.group_id;
flow_2 = rte_flow_create(port_id, &attr,
                         rule_items, rule_actions, &error);

/**
 * 7. Release PMD items after rule creation
 */
rte_flow_tunnel_item_release(port_id, pmd_items,
                             num_pmd_items, &error);

References
1. https://mails.dpdk.org/archives/dev/2020-June/index.html

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

---
v5:
* rebase to next-net

v6:
* update the patch comment
* update tunnel offload section in rte_flow.rst
---
 doc/guides/prog_guide/rte_flow.rst       |  78 +++++++++
 doc/guides/rel_notes/release_20_11.rst   |   5 +
 lib/librte_ethdev/rte_ethdev_version.map |   5 +
 lib/librte_ethdev/rte_flow.c             | 112 +++++++++++++
 lib/librte_ethdev/rte_flow.h             | 195 +++++++++++++++++++++++
 lib/librte_ethdev/rte_flow_driver.h      |  32 ++++
 6 files changed, 427 insertions(+)

diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 7fb5ec9059..8dc048c6f4 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -3131,6 +3131,84 @@ operations include:
 - Duplication of a complete flow rule description.
 - Pattern item or action name retrieval.
 
+Tunneled traffic offload
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+rte_flow API provides the building blocks for vendor-agnostic flow
+classification offloads. The rte_flow "patterns" and "actions"
+primitives are fine-grained, thus enabling DPDK applications the
+flexibility to offload network stacks and complex pipelines.
+Applications wishing to offload tunneled traffic are required to use
+the rte_flow primitives, such as group, meta, mark, tag, and others to
+model their high-level objects.  The hardware model design for
+high-level software objects is not trivial.  Furthermore, an optimal
+design is often vendor-specific.
+
+When hardware offloads tunneled traffic in multi-group logic,
+partially offloaded packets may arrive at the application after they
+were modified in hardware. In this case, the application may need to
+restore the original packet headers. Consider the following sequence:
+The application decaps a packet in one group and jumps to a second
+group where it tries to match on a 5-tuple, that will miss and send
+the packet to the application. In this case, the application does not
+receive the original packet but a modified one. Also, in this case,
+the application cannot match on the outer header fields, such as VXLAN
+vni and 5-tuple.
+
+There are several possible ways to use rte_flow "patterns" and
+"actions" to resolve the issues above. For example:
+
+1 Mapping headers to hardware registers using the
+rte_flow_action_mark/rte_flow_action_tag/rte_flow_set_meta objects.
+
+2 Apply the decap only at the last offload stage, after all the
+"patterns" have been matched and the packet is fully offloaded.
+
+Every approach has its pros and cons and is highly dependent on the
+hardware vendor.  For example, some hardware may have a limited number
+of registers, while other hardware may not support inner actions and
+must decap before accessing inner headers.
+
+The tunnel offload model resolves these issues. The model goals are:
+
+1 Provide a unified application API to offload tunneled traffic that
+is capable of matching on outer headers after decap.
+
+2 Allow the application to restore the outer header of partially
+offloaded packets.
+
+The tunnel offload model does not introduce new elements to the
+existing RTE flow model and is implemented as a set of helper
+functions.
+
+For the application to work with the tunnel offload API it
+has to adjust flow rules in multi-table tunnel offload in the
+following way:
+
+1 Remove explicit call to decap action and replace it with PMD actions
+obtained from the rte_flow_tunnel_decap_set() helper.
+
+2 Add PMD items obtained from rte_flow_tunnel_match() helper to all
+other rules in the tunnel offload sequence.
+
+The model requirements:
+
+Software application must initialize
+rte_tunnel object with tunnel parameters before calling
+rte_flow_tunnel_decap_set() & rte_flow_tunnel_match().
+
+The PMD actions array obtained in rte_flow_tunnel_decap_set() must be
+released by the application with an rte_flow_tunnel_action_decap_release() call.
+
+The PMD items array obtained with rte_flow_tunnel_match() must be released
+by the application with an rte_flow_tunnel_item_release() call.  Application can
+release PMD items and actions after rule was created. However, if the
+application needs to create additional rule for the same tunnel it
+will need to obtain PMD items again.
+
+Application cannot destroy rte_tunnel object before it releases all
+PMD actions & PMD items referencing that tunnel.
+
 Caveats
 -------
 
diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index 9155b468d6..f125ce79dd 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -121,6 +121,11 @@ New Features
   * Flow rule verification was updated to accept private PMD
     items and actions.
 
+* **Added generic API to offload tunneled traffic and restore missed packet.**
+
+  * Added a new hardware independent helper API to RTE flow library that
+    offloads tunneled traffic and restores missed packets.
+
 * **Updated Cisco enic driver.**
 
   * Added support for VF representors with single-queue Tx/Rx and flow API
diff --git a/lib/librte_ethdev/rte_ethdev_version.map b/lib/librte_ethdev/rte_ethdev_version.map
index f64c379ac2..8ddda2547f 100644
--- a/lib/librte_ethdev/rte_ethdev_version.map
+++ b/lib/librte_ethdev/rte_ethdev_version.map
@@ -239,6 +239,11 @@ EXPERIMENTAL {
 	rte_flow_shared_action_destroy;
 	rte_flow_shared_action_query;
 	rte_flow_shared_action_update;
+	rte_flow_tunnel_decap_set;
+	rte_flow_tunnel_match;
+	rte_flow_get_restore_info;
+	rte_flow_tunnel_action_decap_release;
+	rte_flow_tunnel_item_release;
 };
 
 INTERNAL {
diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
index b74ea5593a..380c5cae2c 100644
--- a/lib/librte_ethdev/rte_flow.c
+++ b/lib/librte_ethdev/rte_flow.c
@@ -1143,3 +1143,115 @@ rte_flow_shared_action_query(uint16_t port_id,
 				       data, error);
 	return flow_err(port_id, ret, error);
 }
+
+int
+rte_flow_tunnel_decap_set(uint16_t port_id,
+			  struct rte_flow_tunnel *tunnel,
+			  struct rte_flow_action **actions,
+			  uint32_t *num_of_actions,
+			  struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->tunnel_decap_set)) {
+		return flow_err(port_id,
+				ops->tunnel_decap_set(dev, tunnel, actions,
+						      num_of_actions, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_tunnel_match(uint16_t port_id,
+		      struct rte_flow_tunnel *tunnel,
+		      struct rte_flow_item **items,
+		      uint32_t *num_of_items,
+		      struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->tunnel_match)) {
+		return flow_err(port_id,
+				ops->tunnel_match(dev, tunnel, items,
+						  num_of_items, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_get_restore_info(uint16_t port_id,
+			  struct rte_mbuf *m,
+			  struct rte_flow_restore_info *restore_info,
+			  struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->get_restore_info)) {
+		return flow_err(port_id,
+				ops->get_restore_info(dev, m, restore_info,
+						      error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_tunnel_action_decap_release(uint16_t port_id,
+				     struct rte_flow_action *actions,
+				     uint32_t num_of_actions,
+				     struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->action_release)) {
+		return flow_err(port_id,
+				ops->action_release(dev, actions,
+						    num_of_actions, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
+
+int
+rte_flow_tunnel_item_release(uint16_t port_id,
+			     struct rte_flow_item *items,
+			     uint32_t num_of_items,
+			     struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->item_release)) {
+		return flow_err(port_id,
+				ops->item_release(dev, items,
+						  num_of_items, error),
+				error);
+	}
+	return rte_flow_error_set(error, ENOTSUP,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOTSUP));
+}
diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
index 48395284b5..a8eac4deb8 100644
--- a/lib/librte_ethdev/rte_flow.h
+++ b/lib/librte_ethdev/rte_flow.h
@@ -3620,6 +3620,201 @@ rte_flow_shared_action_query(uint16_t port_id,
 			     void *data,
 			     struct rte_flow_error *error);
 
+/* Tunnel has a type and the key information. */
+struct rte_flow_tunnel {
+	/**
+	 * Tunnel type, for example RTE_FLOW_ITEM_TYPE_VXLAN,
+	 * RTE_FLOW_ITEM_TYPE_NVGRE etc.
+	 */
+	enum rte_flow_item_type	type;
+	uint64_t tun_id; /**< Tunnel identification. */
+
+	RTE_STD_C11
+	union {
+		struct {
+			rte_be32_t src_addr; /**< IPv4 source address. */
+			rte_be32_t dst_addr; /**< IPv4 destination address. */
+		} ipv4;
+		struct {
+			uint8_t src_addr[16]; /**< IPv6 source address. */
+			uint8_t dst_addr[16]; /**< IPv6 destination address. */
+		} ipv6;
+	};
+	rte_be16_t tp_src; /**< Tunnel port source. */
+	rte_be16_t tp_dst; /**< Tunnel port destination. */
+	uint16_t   tun_flags; /**< Tunnel flags. */
+
+	bool       is_ipv6; /**< True for valid IPv6 fields. Otherwise IPv4. */
+
+	/**
+	 * the following members are required to restore packet
+	 * after miss
+	 */
+	uint8_t    tos; /**< TOS for IPv4, TC for IPv6. */
+	uint8_t    ttl; /**< TTL for IPv4, HL for IPv6. */
+	uint32_t label; /**< Flow Label for IPv6. */
+};
+
+/**
+ * Indicate that the packet has a tunnel.
+ */
+#define RTE_FLOW_RESTORE_INFO_TUNNEL  (1ULL << 0)
+
+/**
+ * Indicate that the packet has a non decapsulated tunnel header.
+ */
+#define RTE_FLOW_RESTORE_INFO_ENCAPSULATED  (1ULL << 1)
+
+/**
+ * Indicate that the packet has a group_id.
+ */
+#define RTE_FLOW_RESTORE_INFO_GROUP_ID  (1ULL << 2)
+
+/**
+ * Restore information structure to communicate the current packet processing
+ * state when some of the processing pipeline is done in hardware and should
+ * continue in software.
+ */
+struct rte_flow_restore_info {
+	/**
+	 * Bitwise flags (RTE_FLOW_RESTORE_INFO_*) to indicate validation of
+	 * other fields in struct rte_flow_restore_info.
+	 */
+	uint64_t flags;
+	uint32_t group_id; /**< Group ID where packet missed. */
+	struct rte_flow_tunnel tunnel; /**< Tunnel information. */
+};
+
+/**
+ * Allocate an array of actions to be used in rte_flow_create, to implement
+ * tunnel-decap-set for the given tunnel.
+ * Sample usage:
+ *   actions vxlan_decap / tunnel-decap-set(tunnel properties) /
+ *            jump group 0 / end
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] tunnel
+ *   Tunnel properties.
+ * @param[out] actions
+ *   Array of actions to be allocated by the PMD. This array should be
+ *   concatenated with the actions array provided to rte_flow_create.
+ * @param[out] num_of_actions
+ *   Number of actions allocated.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_decap_set(uint16_t port_id,
+			  struct rte_flow_tunnel *tunnel,
+			  struct rte_flow_action **actions,
+			  uint32_t *num_of_actions,
+			  struct rte_flow_error *error);
+
+/**
+ * Allocate an array of items to be used in rte_flow_create, to implement
+ * tunnel-match for the given tunnel.
+ * Sample usage:
+ *   pattern tunnel-match(tunnel properties) / outer-header-matches /
+ *           inner-header-matches / end
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] tunnel
+ *   Tunnel properties.
+ * @param[out] items
+ *   Array of items to be allocated by the PMD. This array should be
+ *   concatenated with the items array provided to rte_flow_create.
+ * @param[out] num_of_items
+ *   Number of items allocated.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_match(uint16_t port_id,
+		      struct rte_flow_tunnel *tunnel,
+		      struct rte_flow_item **items,
+		      uint32_t *num_of_items,
+		      struct rte_flow_error *error);
+
+/**
+ * Populate the current packet processing state, if exists, for the given mbuf.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] m
+ *   Mbuf struct.
+ * @param[out] info
+ *   Restore information. Upon success contains the HW state.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_get_restore_info(uint16_t port_id,
+			  struct rte_mbuf *m,
+			  struct rte_flow_restore_info *info,
+			  struct rte_flow_error *error);
+
+/**
+ * Release the action array as allocated by rte_flow_tunnel_decap_set.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] actions
+ *   Array of actions to be released.
+ * @param[in] num_of_actions
+ *   Number of elements in actions array.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_action_decap_release(uint16_t port_id,
+				     struct rte_flow_action *actions,
+				     uint32_t num_of_actions,
+				     struct rte_flow_error *error);
+
+/**
+ * Release the item array as allocated by rte_flow_tunnel_match.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] items
+ *   Array of items to be released.
+ * @param[in] num_of_items
+ *   Number of elements in item array.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_item_release(uint16_t port_id,
+			     struct rte_flow_item *items,
+			     uint32_t num_of_items,
+			     struct rte_flow_error *error);
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_ethdev/rte_flow_driver.h b/lib/librte_ethdev/rte_flow_driver.h
index 58f56b0262..bd5ffc0bb1 100644
--- a/lib/librte_ethdev/rte_flow_driver.h
+++ b/lib/librte_ethdev/rte_flow_driver.h
@@ -131,6 +131,38 @@ struct rte_flow_ops {
 		 const struct rte_flow_shared_action *shared_action,
 		 void *data,
 		 struct rte_flow_error *error);
+	/** See rte_flow_tunnel_decap_set() */
+	int (*tunnel_decap_set)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_tunnel *tunnel,
+		 struct rte_flow_action **pmd_actions,
+		 uint32_t *num_of_actions,
+		 struct rte_flow_error *err);
+	/** See rte_flow_tunnel_match() */
+	int (*tunnel_match)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_tunnel *tunnel,
+		 struct rte_flow_item **pmd_items,
+		 uint32_t *num_of_items,
+		 struct rte_flow_error *err);
+	/** See rte_flow_get_restore_info() */
+	int (*get_restore_info)
+		(struct rte_eth_dev *dev,
+		 struct rte_mbuf *m,
+		 struct rte_flow_restore_info *info,
+		 struct rte_flow_error *err);
+	/** See rte_flow_tunnel_action_decap_release() */
+	int (*action_release)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_action *pmd_actions,
+		 uint32_t num_of_actions,
+		 struct rte_flow_error *err);
+	/** See rte_flow_tunnel_item_release() */
+	int (*item_release)
+		(struct rte_eth_dev *dev,
+		 struct rte_flow_item *pmd_items,
+		 uint32_t num_of_items,
+		 struct rte_flow_error *err);
 };
 
 /**
-- 
2.28.0


^ permalink raw reply	[flat|nested] 95+ messages in thread

* [dpdk-dev] [PATCH v8 3/3] app/testpmd: add commands for tunnel offload API
  2020-10-16 12:51 ` [dpdk-dev] [PATCH v8 " Gregory Etelson
  2020-10-16 12:51   ` [dpdk-dev] [PATCH v8 1/3] ethdev: allow negative values in flow rule types Gregory Etelson
  2020-10-16 12:51   ` [dpdk-dev] [PATCH v8 2/3] ethdev: tunnel offload model Gregory Etelson
@ 2020-10-16 12:51   ` Gregory Etelson
  2020-10-16 13:19   ` [dpdk-dev] [PATCH v8 0/3] Tunnel Offload API Ferruh Yigit
  3 siblings, 0 replies; 95+ messages in thread
From: Gregory Etelson @ 2020-10-16 12:51 UTC (permalink / raw)
  To: dev
  Cc: getelson, matan, rasland, elibr, ozsh, asafp, Ori Kam,
	Wenzhuo Lu, Beilei Xing, Bernard Iremonger

The Tunnel Offload API provides a hardware-independent, unified model
to offload tunneled traffic. Key model elements are:
 - apply matches to both outer and inner packet headers
   during entire offload procedure;
 - restore outer header of partially offloaded packet;
 - model is implemented as a set of helper functions.

Implementation details:

* Create application tunnel:
flow tunnel create <port> type <tunnel type>
On success, the command creates application tunnel object and returns
the tunnel descriptor. Tunnel descriptor is used in subsequent flow
creation commands to reference the tunnel.

* Create tunnel steering flow rule:
tunnel_set <tunnel descriptor> parameter used with steering rule
template.

* Create tunnel matching flow rule:
tunnel_match <tunnel descriptor> used with matching rule template.

* If tunnel steering rule was offloaded, outer header of a partially
offloaded packet is restored after miss.

Example:
test packet=
<Ether  dst=24:8a:07:8d:ae:d6 src=50:6b:4b:cc:fc:e2 type=IPv4 |
<IP  version=4 ihl=5 proto=udp src=1.1.1.1 dst=1.1.1.10 |
<UDP  sport=4789 dport=4789 len=58 chksum=0x7f7b |
<VXLAN  NextProtocol=Ethernet vni=0x0 |
<Ether  dst=24:aa:aa:aa:aa:d6 src=50:bb:bb:bb:bb:e2 type=IPv4 |
<IP  version=4 ihl=5 proto=icmp src=2.2.2.2 dst=2.2.2.200 |
<ICMP  type=echo-request code=0 chksum=0xf7ff id=0x0 seq=0x0 |>>>>>>>
>>> len(packet)
92

testpmd> flow flush 0
testpmd> port 0/queue 0: received 1 packets
src=50:6B:4B:CC:FC:E2 - dst=24:8A:07:8D:AE:D6 - type=0x0800 -
length=92

testpmd> flow tunnel 0 type vxlan
port 0: flow tunnel #1 type vxlan
testpmd> flow create 0 ingress group 0 tunnel_set 1
         pattern eth /ipv4 / udp dst is 4789 / vxlan / end
         actions  jump group 0 / end
Flow rule #0 created
testpmd> port 0/queue 0: received 1 packets
tunnel restore info: - vxlan tunnel - outer header present # <--
  src=50:6B:4B:CC:FC:E2 - dst=24:8A:07:8D:AE:D6 - type=0x0800 -
length=92

testpmd> flow create 0 ingress group 0 tunnel_match 1
         pattern eth / ipv4 / udp dst is 4789 / vxlan / eth / ipv4 /
         end
         actions set_mac_dst mac_addr 02:CA:FE:CA:FA:80 /
         queue index 0 / end
Flow rule #1 created
testpmd> port 0/queue 0: received 1 packets
  src=50:BB:BB:BB:BB:E2 - dst=02:CA:FE:CA:FA:80 - type=0x0800 -
length=42

* Destroy flow tunnel
flow tunnel destroy <port> id <tunnel id>

* Show existing flow tunnels
flow tunnel list <port>

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
---
v2:
* introduce testpmd support for tunnel offload API

v3:
* update flow tunnel commands

v5:
* rebase to next-net

v7:
* resolve "%lu" differences in ubuntu 32 & 64

v8:
* use PRIu64 for general cast to uint64_t
---
 app/test-pmd/cmdline_flow.c                 | 170 ++++++++++++-
 app/test-pmd/config.c                       | 252 +++++++++++++++++++-
 app/test-pmd/testpmd.c                      |   5 +-
 app/test-pmd/testpmd.h                      |  34 ++-
 app/test-pmd/util.c                         |  35 ++-
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  49 ++++
 6 files changed, 532 insertions(+), 13 deletions(-)

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 00c70a144a..b9a1f7178a 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -74,6 +74,14 @@ enum index {
 	LIST,
 	AGED,
 	ISOLATE,
+	TUNNEL,
+
+	/* Tunnel arguments. */
+	TUNNEL_CREATE,
+	TUNNEL_CREATE_TYPE,
+	TUNNEL_LIST,
+	TUNNEL_DESTROY,
+	TUNNEL_DESTROY_ID,
 
 	/* Destroy arguments. */
 	DESTROY_RULE,
@@ -93,6 +101,8 @@ enum index {
 	INGRESS,
 	EGRESS,
 	TRANSFER,
+	TUNNEL_SET,
+	TUNNEL_MATCH,
 
 	/* Shared action arguments */
 	SHARED_ACTION_CREATE,
@@ -713,6 +723,7 @@ struct buffer {
 		} sa; /* Shared action query arguments */
 		struct {
 			struct rte_flow_attr attr;
+			struct tunnel_ops tunnel_ops;
 			struct rte_flow_item *pattern;
 			struct rte_flow_action *actions;
 			uint32_t pattern_n;
@@ -789,10 +800,32 @@ static const enum index next_vc_attr[] = {
 	INGRESS,
 	EGRESS,
 	TRANSFER,
+	TUNNEL_SET,
+	TUNNEL_MATCH,
 	PATTERN,
 	ZERO,
 };
 
+static const enum index tunnel_create_attr[] = {
+	TUNNEL_CREATE,
+	TUNNEL_CREATE_TYPE,
+	END,
+	ZERO,
+};
+
+static const enum index tunnel_destroy_attr[] = {
+	TUNNEL_DESTROY,
+	TUNNEL_DESTROY_ID,
+	END,
+	ZERO,
+};
+
+static const enum index tunnel_list_attr[] = {
+	TUNNEL_LIST,
+	END,
+	ZERO,
+};
+
 static const enum index next_destroy_attr[] = {
 	DESTROY_RULE,
 	END,
@@ -1643,6 +1676,9 @@ static int parse_aged(struct context *, const struct token *,
 static int parse_isolate(struct context *, const struct token *,
 			 const char *, unsigned int,
 			 void *, unsigned int);
+static int parse_tunnel(struct context *, const struct token *,
+			const char *, unsigned int,
+			void *, unsigned int);
 static int parse_int(struct context *, const struct token *,
 		     const char *, unsigned int,
 		     void *, unsigned int);
@@ -1844,7 +1880,8 @@ static const struct token token_list[] = {
 			      LIST,
 			      AGED,
 			      QUERY,
-			      ISOLATE)),
+			      ISOLATE,
+			      TUNNEL)),
 		.call = parse_init,
 	},
 	/* Top-level command. */
@@ -1955,6 +1992,49 @@ static const struct token token_list[] = {
 			     ARGS_ENTRY(struct buffer, port)),
 		.call = parse_isolate,
 	},
+	[TUNNEL] = {
+		.name = "tunnel",
+		.help = "new tunnel API",
+		.next = NEXT(NEXT_ENTRY
+			     (TUNNEL_CREATE, TUNNEL_LIST, TUNNEL_DESTROY)),
+		.call = parse_tunnel,
+	},
+	/* Tunnel arguments. */
+	[TUNNEL_CREATE] = {
+		.name = "create",
+		.help = "create new tunnel object",
+		.next = NEXT(tunnel_create_attr, NEXT_ENTRY(PORT_ID)),
+		.args = ARGS(ARGS_ENTRY(struct buffer, port)),
+		.call = parse_tunnel,
+	},
+	[TUNNEL_CREATE_TYPE] = {
+		.name = "type",
+		.help = "create new tunnel",
+		.next = NEXT(tunnel_create_attr, NEXT_ENTRY(FILE_PATH)),
+		.args = ARGS(ARGS_ENTRY(struct tunnel_ops, type)),
+		.call = parse_tunnel,
+	},
+	[TUNNEL_DESTROY] = {
+		.name = "destroy",
+		.help = "destroy tunnel",
+		.next = NEXT(tunnel_destroy_attr, NEXT_ENTRY(PORT_ID)),
+		.args = ARGS(ARGS_ENTRY(struct buffer, port)),
+		.call = parse_tunnel,
+	},
+	[TUNNEL_DESTROY_ID] = {
+		.name = "id",
+		.help = "tunnel identifier to destroy",
+		.next = NEXT(tunnel_destroy_attr, NEXT_ENTRY(UNSIGNED)),
+		.args = ARGS(ARGS_ENTRY(struct tunnel_ops, id)),
+		.call = parse_tunnel,
+	},
+	[TUNNEL_LIST] = {
+		.name = "list",
+		.help = "list existing tunnels",
+		.next = NEXT(tunnel_list_attr, NEXT_ENTRY(PORT_ID)),
+		.args = ARGS(ARGS_ENTRY(struct buffer, port)),
+		.call = parse_tunnel,
+	},
 	/* Destroy arguments. */
 	[DESTROY_RULE] = {
 		.name = "rule",
@@ -2018,6 +2098,20 @@ static const struct token token_list[] = {
 		.next = NEXT(next_vc_attr),
 		.call = parse_vc,
 	},
+	[TUNNEL_SET] = {
+		.name = "tunnel_set",
+		.help = "tunnel steer rule",
+		.next = NEXT(next_vc_attr, NEXT_ENTRY(UNSIGNED)),
+		.args = ARGS(ARGS_ENTRY(struct tunnel_ops, id)),
+		.call = parse_vc,
+	},
+	[TUNNEL_MATCH] = {
+		.name = "tunnel_match",
+		.help = "tunnel match rule",
+		.next = NEXT(next_vc_attr, NEXT_ENTRY(UNSIGNED)),
+		.args = ARGS(ARGS_ENTRY(struct tunnel_ops, id)),
+		.call = parse_vc,
+	},
 	/* Validate/create pattern. */
 	[PATTERN] = {
 		.name = "pattern",
@@ -4495,12 +4589,28 @@ parse_vc(struct context *ctx, const struct token *token,
 		return len;
 	}
 	ctx->objdata = 0;
-	ctx->object = &out->args.vc.attr;
+	switch (ctx->curr) {
+	default:
+		ctx->object = &out->args.vc.attr;
+		break;
+	case TUNNEL_SET:
+	case TUNNEL_MATCH:
+		ctx->object = &out->args.vc.tunnel_ops;
+		break;
+	}
 	ctx->objmask = NULL;
 	switch (ctx->curr) {
 	case GROUP:
 	case PRIORITY:
 		return len;
+	case TUNNEL_SET:
+		out->args.vc.tunnel_ops.enabled = 1;
+		out->args.vc.tunnel_ops.actions = 1;
+		return len;
+	case TUNNEL_MATCH:
+		out->args.vc.tunnel_ops.enabled = 1;
+		out->args.vc.tunnel_ops.items = 1;
+		return len;
 	case INGRESS:
 		out->args.vc.attr.ingress = 1;
 		return len;
@@ -6108,6 +6218,47 @@ parse_isolate(struct context *ctx, const struct token *token,
 	return len;
 }
 
+static int
+parse_tunnel(struct context *ctx, const struct token *token,
+	     const char *str, unsigned int len,
+	     void *buf, unsigned int size)
+{
+	struct buffer *out = buf;
+
+	/* Token name must match. */
+	if (parse_default(ctx, token, str, len, NULL, 0) < 0)
+		return -1;
+	/* Nothing else to do if there is no buffer. */
+	if (!out)
+		return len;
+	if (!out->command) {
+		if (ctx->curr != TUNNEL)
+			return -1;
+		if (sizeof(*out) > size)
+			return -1;
+		out->command = ctx->curr;
+		ctx->objdata = 0;
+		ctx->object = out;
+		ctx->objmask = NULL;
+	} else {
+		switch (ctx->curr) {
+		default:
+			break;
+		case TUNNEL_CREATE:
+		case TUNNEL_DESTROY:
+		case TUNNEL_LIST:
+			out->command = ctx->curr;
+			break;
+		case TUNNEL_CREATE_TYPE:
+		case TUNNEL_DESTROY_ID:
+			ctx->object = &out->args.vc.tunnel_ops;
+			break;
+		}
+	}
+
+	return len;
+}
+
 /**
  * Parse signed/unsigned integers 8 to 64-bit long.
  *
@@ -7148,11 +7299,13 @@ cmd_flow_parsed(const struct buffer *in)
 		break;
 	case VALIDATE:
 		port_flow_validate(in->port, &in->args.vc.attr,
-				   in->args.vc.pattern, in->args.vc.actions);
+				   in->args.vc.pattern, in->args.vc.actions,
+				   &in->args.vc.tunnel_ops);
 		break;
 	case CREATE:
 		port_flow_create(in->port, &in->args.vc.attr,
-				 in->args.vc.pattern, in->args.vc.actions);
+				 in->args.vc.pattern, in->args.vc.actions,
+				 &in->args.vc.tunnel_ops);
 		break;
 	case DESTROY:
 		port_flow_destroy(in->port, in->args.destroy.rule_n,
@@ -7178,6 +7331,15 @@ cmd_flow_parsed(const struct buffer *in)
 	case AGED:
 		port_flow_aged(in->port, in->args.aged.destroy);
 		break;
+	case TUNNEL_CREATE:
+		port_flow_tunnel_create(in->port, &in->args.vc.tunnel_ops);
+		break;
+	case TUNNEL_DESTROY:
+		port_flow_tunnel_destroy(in->port, in->args.vc.tunnel_ops.id);
+		break;
+	case TUNNEL_LIST:
+		port_flow_tunnel_list(in->port);
+		break;
 	default:
 		break;
 	}
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 2c00b55440..c9505e5661 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1521,6 +1521,115 @@ port_mtu_set(portid_t port_id, uint16_t mtu)
 
 /* Generic flow management functions. */
 
+static struct port_flow_tunnel *
+port_flow_locate_tunnel_id(struct rte_port *port, uint32_t port_tunnel_id)
+{
+	struct port_flow_tunnel *flow_tunnel;
+
+	LIST_FOREACH(flow_tunnel, &port->flow_tunnel_list, chain) {
+		if (flow_tunnel->id == port_tunnel_id)
+			goto out;
+	}
+	flow_tunnel = NULL;
+
+out:
+	return flow_tunnel;
+}
+
+const char *
+port_flow_tunnel_type(struct rte_flow_tunnel *tunnel)
+{
+	const char *type;
+	switch (tunnel->type) {
+	default:
+		type = "unknown";
+		break;
+	case RTE_FLOW_ITEM_TYPE_VXLAN:
+		type = "vxlan";
+		break;
+	}
+
+	return type;
+}
+
+struct port_flow_tunnel *
+port_flow_locate_tunnel(uint16_t port_id, struct rte_flow_tunnel *tun)
+{
+	struct rte_port *port = &ports[port_id];
+	struct port_flow_tunnel *flow_tunnel;
+
+	LIST_FOREACH(flow_tunnel, &port->flow_tunnel_list, chain) {
+		if (!memcmp(&flow_tunnel->tunnel, tun, sizeof(*tun)))
+			goto out;
+	}
+	flow_tunnel = NULL;
+
+out:
+	return flow_tunnel;
+}
+
+void port_flow_tunnel_list(portid_t port_id)
+{
+	struct rte_port *port = &ports[port_id];
+	struct port_flow_tunnel *flt;
+
+	LIST_FOREACH(flt, &port->flow_tunnel_list, chain) {
+		printf("port %u tunnel #%u type=%s",
+			port_id, flt->id, port_flow_tunnel_type(&flt->tunnel));
+		if (flt->tunnel.tun_id)
+			printf(" id=%" PRIu64, flt->tunnel.tun_id);
+		printf("\n");
+	}
+}
+
+void port_flow_tunnel_destroy(portid_t port_id, uint32_t tunnel_id)
+{
+	struct rte_port *port = &ports[port_id];
+	struct port_flow_tunnel *flt;
+
+	LIST_FOREACH(flt, &port->flow_tunnel_list, chain) {
+		if (flt->id == tunnel_id)
+			break;
+	}
+	if (flt) {
+		LIST_REMOVE(flt, chain);
+		free(flt);
+		printf("port %u: flow tunnel #%u destroyed\n",
+			port_id, tunnel_id);
+	}
+}
+
+void port_flow_tunnel_create(portid_t port_id, const struct tunnel_ops *ops)
+{
+	struct rte_port *port = &ports[port_id];
+	enum rte_flow_item_type	type;
+	struct port_flow_tunnel *flt;
+
+	if (!strcmp(ops->type, "vxlan"))
+		type = RTE_FLOW_ITEM_TYPE_VXLAN;
+	else {
+		printf("cannot offload \"%s\" tunnel type\n", ops->type);
+		return;
+	}
+	LIST_FOREACH(flt, &port->flow_tunnel_list, chain) {
+		if (flt->tunnel.type == type)
+			break;
+	}
+	if (!flt) {
+		flt = calloc(1, sizeof(*flt));
+		if (!flt) {
+			printf("failed to allocate port flt object\n");
+			return;
+		}
+		flt->tunnel.type = type;
+		flt->id = LIST_EMPTY(&port->flow_tunnel_list) ? 1 :
+				  LIST_FIRST(&port->flow_tunnel_list)->id + 1;
+		LIST_INSERT_HEAD(&port->flow_tunnel_list, flt, chain);
+	}
+	printf("port %d: flow tunnel #%u type %s\n",
+		port_id, flt->id, ops->type);
+}
+
 /** Generate a port_flow entry from attributes/pattern/actions. */
 static struct port_flow *
 port_flow_new(const struct rte_flow_attr *attr,
@@ -1860,20 +1969,137 @@ port_shared_action_query(portid_t port_id, uint32_t id)
 	}
 	return ret;
 }
+static struct port_flow_tunnel *
+port_flow_tunnel_offload_cmd_prep(portid_t port_id,
+				  const struct rte_flow_item *pattern,
+				  const struct rte_flow_action *actions,
+				  const struct tunnel_ops *tunnel_ops)
+{
+	int ret;
+	struct rte_port *port;
+	struct port_flow_tunnel *pft;
+	struct rte_flow_error error;
+
+	port = &ports[port_id];
+	pft = port_flow_locate_tunnel_id(port, tunnel_ops->id);
+	if (!pft) {
+		printf("failed to locate port flow tunnel #%u\n",
+			tunnel_ops->id);
+		return NULL;
+	}
+	if (tunnel_ops->actions) {
+		uint32_t num_actions;
+		const struct rte_flow_action *aptr;
+
+		ret = rte_flow_tunnel_decap_set(port_id, &pft->tunnel,
+						&pft->pmd_actions,
+						&pft->num_pmd_actions,
+						&error);
+		if (ret) {
+			port_flow_complain(&error);
+			return NULL;
+		}
+		for (aptr = actions, num_actions = 1;
+		     aptr->type != RTE_FLOW_ACTION_TYPE_END;
+		     aptr++, num_actions++);
+		pft->actions = malloc(
+				(num_actions +  pft->num_pmd_actions) *
+				sizeof(actions[0]));
+		if (!pft->actions) {
+			rte_flow_tunnel_action_decap_release(
+					port_id, pft->actions,
+					pft->num_pmd_actions, &error);
+			return NULL;
+		}
+		rte_memcpy(pft->actions, pft->pmd_actions,
+			   pft->num_pmd_actions * sizeof(actions[0]));
+		rte_memcpy(pft->actions + pft->num_pmd_actions, actions,
+			   num_actions * sizeof(actions[0]));
+	}
+	if (tunnel_ops->items) {
+		uint32_t num_items;
+		const struct rte_flow_item *iptr;
+
+		ret = rte_flow_tunnel_match(port_id, &pft->tunnel,
+					    &pft->pmd_items,
+					    &pft->num_pmd_items,
+					    &error);
+		if (ret) {
+			port_flow_complain(&error);
+			return NULL;
+		}
+		for (iptr = pattern, num_items = 1;
+		     iptr->type != RTE_FLOW_ITEM_TYPE_END;
+		     iptr++, num_items++);
+		pft->items = malloc((num_items + pft->num_pmd_items) *
+				    sizeof(pattern[0]));
+		if (!pft->items) {
+			rte_flow_tunnel_item_release(
+					port_id, pft->pmd_items,
+					pft->num_pmd_items, &error);
+			return NULL;
+		}
+		rte_memcpy(pft->items, pft->pmd_items,
+			   pft->num_pmd_items * sizeof(pattern[0]));
+		rte_memcpy(pft->items + pft->num_pmd_items, pattern,
+			   num_items * sizeof(pattern[0]));
+	}
+
+	return pft;
+}
+
+static void
+port_flow_tunnel_offload_cmd_release(portid_t port_id,
+				     const struct tunnel_ops *tunnel_ops,
+				     struct port_flow_tunnel *pft)
+{
+	struct rte_flow_error error;
+
+	if (tunnel_ops->actions) {
+		free(pft->actions);
+		rte_flow_tunnel_action_decap_release(
+			port_id, pft->pmd_actions,
+			pft->num_pmd_actions, &error);
+		pft->actions = NULL;
+		pft->pmd_actions = NULL;
+	}
+	if (tunnel_ops->items) {
+		free(pft->items);
+		rte_flow_tunnel_item_release(port_id, pft->pmd_items,
+					     pft->num_pmd_items,
+					     &error);
+		pft->items = NULL;
+		pft->pmd_items = NULL;
+	}
+}
 
 /** Validate flow rule. */
 int
 port_flow_validate(portid_t port_id,
 		   const struct rte_flow_attr *attr,
 		   const struct rte_flow_item *pattern,
-		   const struct rte_flow_action *actions)
+		   const struct rte_flow_action *actions,
+		   const struct tunnel_ops *tunnel_ops)
 {
 	struct rte_flow_error error;
+	struct port_flow_tunnel *pft = NULL;
 
 	/* Poisoning to make sure PMDs update it in case of error. */
 	memset(&error, 0x11, sizeof(error));
+	if (tunnel_ops->enabled) {
+		pft = port_flow_tunnel_offload_cmd_prep(port_id, pattern,
+							actions, tunnel_ops);
+		if (!pft)
+			return -ENOENT;
+		if (pft->items)
+			pattern = pft->items;
+		if (pft->actions)
+			actions = pft->actions;
+	}
 	if (rte_flow_validate(port_id, attr, pattern, actions, &error))
 		return port_flow_complain(&error);
+	if (tunnel_ops->enabled)
+		port_flow_tunnel_offload_cmd_release(port_id, tunnel_ops, pft);
 	printf("Flow rule validated\n");
 	return 0;
 }
@@ -1903,13 +2129,15 @@ int
 port_flow_create(portid_t port_id,
 		 const struct rte_flow_attr *attr,
 		 const struct rte_flow_item *pattern,
-		 const struct rte_flow_action *actions)
+		 const struct rte_flow_action *actions,
+		 const struct tunnel_ops *tunnel_ops)
 {
 	struct rte_flow *flow;
 	struct rte_port *port;
 	struct port_flow *pf;
 	uint32_t id = 0;
 	struct rte_flow_error error;
+	struct port_flow_tunnel *pft = NULL;
 
 	port = &ports[port_id];
 	if (port->flow_list) {
@@ -1920,6 +2148,16 @@ port_flow_create(portid_t port_id,
 		}
 		id = port->flow_list->id + 1;
 	}
+	if (tunnel_ops->enabled) {
+		pft = port_flow_tunnel_offload_cmd_prep(port_id, pattern,
+							actions, tunnel_ops);
+		if (!pft)
+			return -ENOENT;
+		if (pft->items)
+			pattern = pft->items;
+		if (pft->actions)
+			actions = pft->actions;
+	}
 	pf = port_flow_new(attr, pattern, actions, &error);
 	if (!pf)
 		return port_flow_complain(&error);
@@ -1935,6 +2173,8 @@ port_flow_create(portid_t port_id,
 	pf->id = id;
 	pf->flow = flow;
 	port->flow_list = pf;
+	if (tunnel_ops->enabled)
+		port_flow_tunnel_offload_cmd_release(port_id, tunnel_ops, pft);
 	printf("Flow rule #%u created\n", pf->id);
 	return 0;
 }
@@ -2244,7 +2484,9 @@ port_flow_list(portid_t port_id, uint32_t n, const uint32_t *group)
 		       pf->rule.attr->egress ? 'e' : '-',
 		       pf->rule.attr->transfer ? 't' : '-');
 		while (item->type != RTE_FLOW_ITEM_TYPE_END) {
-			if (rte_flow_conv(RTE_FLOW_CONV_OP_ITEM_NAME_PTR,
+			if ((uint32_t)item->type > INT_MAX)
+				name = "PMD_INTERNAL";
+			else if (rte_flow_conv(RTE_FLOW_CONV_OP_ITEM_NAME_PTR,
 					  &name, sizeof(name),
 					  (void *)(uintptr_t)item->type,
 					  NULL) <= 0)
@@ -2255,7 +2497,9 @@ port_flow_list(portid_t port_id, uint32_t n, const uint32_t *group)
 		}
 		printf("=>");
 		while (action->type != RTE_FLOW_ACTION_TYPE_END) {
-			if (rte_flow_conv(RTE_FLOW_CONV_OP_ACTION_NAME_PTR,
+			if ((uint32_t)action->type > INT_MAX)
+				name = "PMD_INTERNAL";
+			else if (rte_flow_conv(RTE_FLOW_CONV_OP_ACTION_NAME_PTR,
 					  &name, sizeof(name),
 					  (void *)(uintptr_t)action->type,
 					  NULL) <= 0)
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 6caba60988..333904d686 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -3684,6 +3684,8 @@ init_port_dcb_config(portid_t pid,
 static void
 init_port(void)
 {
+	int i;
+
 	/* Configuration of Ethernet ports. */
 	ports = rte_zmalloc("testpmd: ports",
 			    sizeof(struct rte_port) * RTE_MAX_ETHPORTS,
@@ -3693,7 +3695,8 @@ init_port(void)
 				"rte_zmalloc(%d struct rte_port) failed\n",
 				RTE_MAX_ETHPORTS);
 	}
-
+	for (i = 0; i < RTE_MAX_ETHPORTS; i++)
+		LIST_INIT(&ports[i].flow_tunnel_list);
 	/* Initialize ports NUMA structures */
 	memset(port_numa, NUMA_NO_CONFIG, RTE_MAX_ETHPORTS);
 	memset(rxring_numa, NUMA_NO_CONFIG, RTE_MAX_ETHPORTS);
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index f8b0a3517d..5238ac3dd5 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -12,6 +12,7 @@
 #include <rte_gro.h>
 #include <rte_gso.h>
 #include <cmdline.h>
+#include <sys/queue.h>
 
 #define RTE_PORT_ALL            (~(portid_t)0x0)
 
@@ -150,6 +151,26 @@ struct port_shared_action {
 	struct rte_flow_shared_action *action;	/**< Shared action handle. */
 };
 
+struct port_flow_tunnel {
+	LIST_ENTRY(port_flow_tunnel) chain;
+	struct rte_flow_action *pmd_actions;
+	struct rte_flow_item   *pmd_items;
+	uint32_t id;
+	uint32_t num_pmd_actions;
+	uint32_t num_pmd_items;
+	struct rte_flow_tunnel tunnel;
+	struct rte_flow_action *actions;
+	struct rte_flow_item *items;
+};
+
+struct tunnel_ops {
+	uint32_t id;
+	char type[16];
+	uint32_t enabled:1;
+	uint32_t actions:1;
+	uint32_t items:1;
+};
+
 /**
  * The data structure associated with each port.
  */
@@ -182,6 +203,7 @@ struct rte_port {
 	struct port_flow        *flow_list; /**< Associated flows. */
 	struct port_shared_action *actions_list;
 	/**< Associated shared actions. */
+	LIST_HEAD(, port_flow_tunnel) flow_tunnel_list;
 	const struct rte_eth_rxtx_callback *rx_dump_cb[RTE_MAX_QUEUES_PER_PORT+1];
 	const struct rte_eth_rxtx_callback *tx_dump_cb[RTE_MAX_QUEUES_PER_PORT+1];
 	/**< metadata value to insert in Tx packets. */
@@ -773,11 +795,13 @@ int port_shared_action_update(portid_t port_id, uint32_t id,
 int port_flow_validate(portid_t port_id,
 		       const struct rte_flow_attr *attr,
 		       const struct rte_flow_item *pattern,
-		       const struct rte_flow_action *actions);
+		       const struct rte_flow_action *actions,
+		       const struct tunnel_ops *tunnel_ops);
 int port_flow_create(portid_t port_id,
 		     const struct rte_flow_attr *attr,
 		     const struct rte_flow_item *pattern,
-		     const struct rte_flow_action *actions);
+		     const struct rte_flow_action *actions,
+		     const struct tunnel_ops *tunnel_ops);
 int port_shared_action_query(portid_t port_id, uint32_t id);
 void update_age_action_context(const struct rte_flow_action *actions,
 		     struct port_flow *pf);
@@ -788,6 +812,12 @@ int port_flow_query(portid_t port_id, uint32_t rule,
 		    const struct rte_flow_action *action);
 void port_flow_list(portid_t port_id, uint32_t n, const uint32_t *group);
 void port_flow_aged(portid_t port_id, uint8_t destroy);
+const char *port_flow_tunnel_type(struct rte_flow_tunnel *tunnel);
+struct port_flow_tunnel *
+port_flow_locate_tunnel(uint16_t port_id, struct rte_flow_tunnel *tun);
+void port_flow_tunnel_list(portid_t port_id);
+void port_flow_tunnel_destroy(portid_t port_id, uint32_t tunnel_id);
+void port_flow_tunnel_create(portid_t port_id, const struct tunnel_ops *ops);
 int port_flow_isolate(portid_t port_id, int set);
 
 void rx_ring_desc_display(portid_t port_id, queueid_t rxq_id, uint16_t rxd_id);
diff --git a/app/test-pmd/util.c b/app/test-pmd/util.c
index 8488fa1a8f..781a813759 100644
--- a/app/test-pmd/util.c
+++ b/app/test-pmd/util.c
@@ -48,18 +48,49 @@ dump_pkt_burst(uint16_t port_id, uint16_t queue, struct rte_mbuf *pkts[],
 	       is_rx ? "received" : "sent",
 	       (unsigned int) nb_pkts);
 	for (i = 0; i < nb_pkts; i++) {
+		int ret;
+		struct rte_flow_error error;
+		struct rte_flow_restore_info info = { 0, };
+
 		mb = pkts[i];
 		eth_hdr = rte_pktmbuf_read(mb, 0, sizeof(_eth_hdr), &_eth_hdr);
 		eth_type = RTE_BE_TO_CPU_16(eth_hdr->ether_type);
-		ol_flags = mb->ol_flags;
 		packet_type = mb->packet_type;
 		is_encapsulation = RTE_ETH_IS_TUNNEL_PKT(packet_type);
-
+		ret = rte_flow_get_restore_info(port_id, mb, &info, &error);
+		if (!ret) {
+			printf("restore info:");
+			if (info.flags & RTE_FLOW_RESTORE_INFO_TUNNEL) {
+				struct port_flow_tunnel *port_tunnel;
+
+				port_tunnel = port_flow_locate_tunnel
+					      (port_id, &info.tunnel);
+				printf(" - tunnel");
+				if (port_tunnel)
+					printf(" #%u", port_tunnel->id);
+				else
+					printf(" %s", "-none-");
+				printf(" type %s",
+					port_flow_tunnel_type(&info.tunnel));
+			} else {
+				printf(" - no tunnel info");
+			}
+			if (info.flags & RTE_FLOW_RESTORE_INFO_ENCAPSULATED)
+				printf(" - outer header present");
+			else
+				printf(" - no outer header");
+			if (info.flags & RTE_FLOW_RESTORE_INFO_GROUP_ID)
+				printf(" - miss group %u", info.group_id);
+			else
+				printf(" - no miss group");
+			printf("\n");
+		}
 		print_ether_addr("  src=", &eth_hdr->s_addr);
 		print_ether_addr(" - dst=", &eth_hdr->d_addr);
 		printf(" - type=0x%04x - length=%u - nb_segs=%d",
 		       eth_type, (unsigned int) mb->pkt_len,
 		       (int)mb->nb_segs);
+		ol_flags = mb->ol_flags;
 		if (ol_flags & PKT_RX_RSS_HASH) {
 			printf(" - RSS hash=0x%x", (unsigned int) mb->hash.rss);
 			printf(" - RSS queue=0x%x", (unsigned int) queue);
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 43c0ea0599..05a4446757 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -3749,6 +3749,45 @@ following sections.
 
    flow aged {port_id} [destroy]
 
+- Tunnel offload - create a tunnel stub::
+
+   flow tunnel create {port_id} type {tunnel_type}
+
+- Tunnel offload - destroy a tunnel stub::
+
+   flow tunnel destroy {port_id} id {tunnel_id}
+
+- Tunnel offload - list port tunnel stubs::
+
+   flow tunnel list {port_id}
+
+Creating a tunnel stub for offload
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``flow tunnel create`` sets up a tunnel stub for tunnel offload flow rules::
+
+   flow tunnel create {port_id} type {tunnel_type}
+
+If successful, it will return a tunnel stub ID usable with other commands::
+
+   port [...]: flow tunnel #[...] type [...]
+
+Tunnel stub ID is relative to a port.
+
+Destroying tunnel offload stub
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``flow tunnel destroy`` destroys a port tunnel stub::
+
+   flow tunnel destroy {port_id} id {tunnel_id}
+
+Listing tunnel offload stubs
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``flow tunnel list`` lists port tunnel offload stubs::
+
+   flow tunnel list {port_id}
+
 Validating flow rules
 ~~~~~~~~~~~~~~~~~~~~~
 
@@ -3795,6 +3834,7 @@ to ``rte_flow_create()``::
 
    flow create {port_id}
       [group {group_id}] [priority {level}] [ingress] [egress] [transfer]
+      [tunnel_set {tunnel_id}] [tunnel_match {tunnel_id}]
       pattern {item} [/ {item} [...]] / end
       actions {action} [/ {action} [...]] / end
 
@@ -3809,6 +3849,7 @@ Otherwise it will show an error message of the form::
 Parameters describe in the following order:
 
 - Attributes (*group*, *priority*, *ingress*, *egress*, *transfer* tokens).
+- Tunnel offload specification (tunnel_set, tunnel_match).
 - A matching pattern, starting with the *pattern* token and terminated by an
   *end* pattern item.
 - Actions, starting with the *actions* token and terminated by an *end*
@@ -3852,6 +3893,14 @@ Most rules affect RX therefore contain the ``ingress`` token::
 
    testpmd> flow create 0 ingress pattern [...]
 
+Tunnel offload
+^^^^^^^^^^^^^^
+
+Indicate the tunnel offload rule type:
+
+- ``tunnel_set {tunnel_id}``: mark the rule as tunnel offload decap_set type.
+- ``tunnel_match {tunnel_id}``: mark the rule as tunnel offload match type.
+
 Matching pattern
 ^^^^^^^^^^^^^^^^
 
-- 
2.28.0


^ permalink raw reply	[flat|nested] 95+ messages in thread
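To see how the commands above combine, here is an illustrative testpmd session
assembled from the syntax and console messages added by this patch; the port
number, group ids, rule numbering and the VXLAN pattern are arbitrary examples
rather than output captured from a real device:

   testpmd> flow tunnel create 0 type vxlan
   port 0: flow tunnel #1 type vxlan
   testpmd> flow create 0 group 0 ingress tunnel_set 1
            pattern eth / ipv4 / udp dst is 4789 / vxlan / end
            actions jump group 3 / end
   Flow rule #0 created
   testpmd> flow create 0 group 3 ingress tunnel_match 1
            pattern eth / ipv4 / udp / vxlan / ipv4 / end
            actions queue index 0 / end
   Flow rule #1 created
   testpmd> flow tunnel list 0
   port 0 tunnel #1 type=vxlan
   testpmd> flow tunnel destroy 0 id 1
   port 0: flow tunnel #1 destroyed

The tunnel_set rule is extended with the PMD actions obtained from
rte_flow_tunnel_decap_set(), and the tunnel_match rule with the PMD items from
rte_flow_tunnel_match(), as done by port_flow_tunnel_offload_cmd_prep() in
config.c above.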

* Re: [dpdk-dev] [PATCH v8 0/3] Tunnel Offload API
  2020-10-16 12:51 ` [dpdk-dev] [PATCH v8 " Gregory Etelson
                     ` (2 preceding siblings ...)
  2020-10-16 12:51   ` [dpdk-dev] [PATCH v8 3/3] app/testpmd: add commands for tunnel offload API Gregory Etelson
@ 2020-10-16 13:19   ` Ferruh Yigit
  2020-10-16 14:20     ` Ferruh Yigit
  3 siblings, 1 reply; 95+ messages in thread
From: Ferruh Yigit @ 2020-10-16 13:19 UTC (permalink / raw)
  To: Gregory Etelson, dev; +Cc: matan, rasland, elibr, ozsh, asafp

On 10/16/2020 1:51 PM, Gregory Etelson wrote:
> Tunnel Offload API provides hardware independent, unified model
> to offload tunneled traffic. Key model elements are:
>   - apply matches to both outer and inner packet headers
>     during entire offload procedure;
>   - restore outer header of partially offloaded packet;
>   - model is implemented as a set of helper functions.
> 
>   v2:
>   * documentation updates
>   * MLX5 PMD implementation for tunnel offload
>   * testpmd updates for tunnel offload
> 
>   v3:
>   * documentation updates
>   * MLX5 PMD updates
>   * testpmd updates
> 
>   v4:
>   * updated patch: allow negative values in flow rule types
> 
> v5:
>   * rebase to next-net
> 
> v6:
> * update tunnel offload API documentation
> 
> v7:
> * testpmd: resolve "%lu" differences in ubuntu 32 & 64
> 
> v8:
> * testpmd: use PRIu64 for general cast to uint64_t

Ahh, I thought build issue solved in v7 already with 'PRIu64' but it seems not.

I will drop the v7 and get this one.

> 
> Eli Britstein (1):
>    ethdev: tunnel offload model
> 
> Gregory Etelson (2):
>    ethdev: allow negative values in flow rule types
>    app/testpmd: add commands for tunnel offload API
> 
>   app/test-pmd/cmdline_flow.c                 | 170 ++++++++++++-
>   app/test-pmd/config.c                       | 252 +++++++++++++++++++-
>   app/test-pmd/testpmd.c                      |   5 +-
>   app/test-pmd/testpmd.h                      |  34 ++-
>   app/test-pmd/util.c                         |  35 ++-
>   doc/guides/prog_guide/rte_flow.rst          |  81 +++++++
>   doc/guides/rel_notes/release_20_11.rst      |  10 +
>   doc/guides/testpmd_app_ug/testpmd_funcs.rst |  49 ++++
>   lib/librte_ethdev/rte_ethdev_version.map    |   5 +
>   lib/librte_ethdev/rte_flow.c                | 140 ++++++++++-
>   lib/librte_ethdev/rte_flow.h                | 195 +++++++++++++++
>   lib/librte_ethdev/rte_flow_driver.h         |  32 +++
>   12 files changed, 989 insertions(+), 19 deletions(-)
> 


^ permalink raw reply	[flat|nested] 95+ messages in thread
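The v7/v8 difference discussed here concerns only the printf conversion used
for 64-bit values: "%lu" expects unsigned long, whose width differs between
32-bit and 64-bit Ubuntu targets, while PRIu64 from <inttypes.h> always matches
uint64_t. A minimal stand-alone illustration, not code taken from the patch:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t tun_id = 10;

	/*
	 * "%lu" would be wrong on ILP32 targets where long is 32 bits;
	 * PRIu64 expands to the conversion that matches uint64_t everywhere.
	 */
	printf("tunnel id=%" PRIu64 "\n", tun_id);
	return 0;
}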

* Re: [dpdk-dev] [PATCH v8 0/3] Tunnel Offload API
  2020-10-16 13:19   ` [dpdk-dev] [PATCH v8 0/3] Tunnel Offload API Ferruh Yigit
@ 2020-10-16 14:20     ` Ferruh Yigit
  0 siblings, 0 replies; 95+ messages in thread
From: Ferruh Yigit @ 2020-10-16 14:20 UTC (permalink / raw)
  To: Gregory Etelson, dev; +Cc: matan, rasland, elibr, ozsh, asafp

On 10/16/2020 2:19 PM, Ferruh Yigit wrote:
> On 10/16/2020 1:51 PM, Gregory Etelson wrote:
>> Tunnel Offload API provides hardware independent, unified model
>> to offload tunneled traffic. Key model elements are:
>>   - apply matches to both outer and inner packet headers
>>     during entire offload procedure;
>>   - restore outer header of partially offloaded packet;
>>   - model is implemented as a set of helper functions.
>>
>>   v2:
>>   * documentation updates
>>   * MLX5 PMD implementation for tunnel offload
>>   * testpmd updates for tunnel offload
>>
>>   v3:
>>   * documentation updates
>>   * MLX5 PMD updates
>>   * testpmd updates
>>
>>   v4:
>>   * updated patch: allow negative values in flow rule types
>>
>> v5:
>>   * rebase to next-net
>>
>> v6:
>> * update tunnel offload API documentation
>>
>> v7:
>> * testpmd: resolve "%lu" differences in ubuntu 32 & 64
>>
>> v8:
>> * testpmd: use PRIu64 for general cast to uint64_t
> 
> Ahh, I thought build issue solved in v7 already with 'PRIu64' but it seems not.
> 
> I will drop the v7 and get this one.
> 
>>
>> Eli Britstein (1):
>>    ethdev: tunnel offload model
>>
>> Gregory Etelson (2):
>>    ethdev: allow negative values in flow rule types
>>    app/testpmd: add commands for tunnel offload API
>>

Series applied to dpdk-next-net/main, thanks.

(v7 dropped and patchwork status updated)


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH v8 2/3] ethdev: tunnel offload model
  2020-10-16 12:51   ` [dpdk-dev] [PATCH v8 2/3] ethdev: tunnel offload model Gregory Etelson
@ 2020-10-16 15:41     ` Kinsella, Ray
  2021-03-02  9:22     ` Ivan Malov
  1 sibling, 0 replies; 95+ messages in thread
From: Kinsella, Ray @ 2020-10-16 15:41 UTC (permalink / raw)
  To: Gregory Etelson, dev
  Cc: matan, rasland, elibr, ozsh, asafp, Eli Britstein, Ori Kam,
	Viacheslav Ovsiienko, Neil Horman, Thomas Monjalon, Ferruh Yigit,
	Andrew Rybchenko



On 16/10/2020 13:51, Gregory Etelson wrote:
> From: Eli Britstein <elibr@mellanox.com>
> 
> rte_flow API provides the building blocks for vendor-agnostic flow
> classification offloads. The rte_flow "patterns" and "actions"
> primitives are fine-grained, thus enabling DPDK applications the
> flexibility to offload network stacks and complex pipelines.
> Applications wishing to offload tunneled traffic are required to use
> the rte_flow primitives, such as group, meta, mark, tag, and others to
> model their high-level objects.  The hardware model design for
> high-level software objects is not trivial.  Furthermore, an optimal
> design is often vendor-specific.
> 
> When hardware offloads tunneled traffic in multi-group logic,
> partially offloaded packets may arrive to the application after they
> were modified in hardware. In this case, the application may need to
> restore the original packet headers. Consider the following sequence:
> The application decaps a packet in one group and jumps to a second
> group where it tries to match on a 5-tuple, that will miss and send
> the packet to the application. In this case, the application does not
> receive the original packet but a modified one. Also, in this case,
> the application cannot match on the outer header fields, such as VXLAN
> vni and 5-tuple.
> 
> There are several possible ways to use rte_flow "patterns" and
> "actions" to resolve the issues above. For example:
> 1 Mapping headers to a hardware registers using the
> rte_flow_action_mark/rte_flow_action_tag/rte_flow_set_meta objects.
> 2 Apply the decap only at the last offload stage after all the
> "patterns" were matched and the packet will be fully offloaded.
> Every approach has its pros and cons and is highly dependent on the
> hardware vendor.  For example, some hardware may have a limited number
> of registers while other hardware could not support inner actions and
> must decap before accessing inner headers.
> 
> The tunnel offload model resolves these issues. The model goals are:
> 1 Provide a unified application API to offload tunneled traffic that
> is capable to match on outer headers after decap.
> 2 Allow the application to restore the outer header of partially
> offloaded packets.
> 
> The tunnel offload model does not introduce new elements to the
> existing RTE flow model and is implemented as a set of helper
> functions.
> 
> For the application to work with the tunnel offload API it
> has to adjust flow rules in multi-table tunnel offload in the
> following way:
> 1 Remove explicit call to decap action and replace it with PMD actions
> obtained from rte_flow_tunnel_decap_and_set() helper.
> 2 Add PMD items obtained from rte_flow_tunnel_match() helper to all
> other rules in the tunnel offload sequence.
> 
> VXLAN Code example:
> 
> Assume application needs to do inner NAT on the VXLAN packet.
> The first  rule in group 0:
> 
> flow create <port id> ingress group 0
>   pattern eth / ipv4 / udp dst is 4789 / vxlan / end
>   actions {pmd actions} / jump group 3 / end
> 
> The first VXLAN packet that arrives matches the rule in group 0 and
> jumps to group 3.  In group 3 the packet will miss since there is no
> flow to match and will be sent to the application.  Application  will
> call rte_flow_get_restore_info() to get the packet outer header.
> 
> Application will insert a new rule in group 3 to match outer and inner
> headers:
> 
> flow create <port id> ingress group 3
>   pattern {pmd items} / eth / ipv4 dst is 172.10.10.1 /
>           udp dst 4789 / vxlan vni is 10 /
>           ipv4 dst is 184.1.2.3 / end
>   actions  set_ipv4_dst  186.1.1.1 / queue index 3 / end
> 
> Resulting of the rules will be that VXLAN packet with vni=10, outer
> IPv4 dst=172.10.10.1 and inner IPv4 dst=184.1.2.3 will be received
> decapped on queue 3 with IPv4 dst=186.1.1.1
> 
> Note: The packet in group 3 is considered decapped. All actions in
> that group will be done on the header that was inner before decap. The
> application may specify an outer header to be matched on.  It's PMD
> responsibility to translate these items to outer metadata.
> 
> API usage:
> 
> /**
>  * 1. Initiate RTE flow tunnel object
>  */
> const struct rte_flow_tunnel tunnel = {
>   .type = RTE_FLOW_ITEM_TYPE_VXLAN,
>   .tun_id = 10,
> }
> 
> /**
>  * 2. Obtain PMD tunnel actions
>  *
>  * pmd_actions is an intermediate variable application uses to
>  * compile actions array
>  */
> struct rte_flow_action **pmd_actions;
> rte_flow_tunnel_decap_and_set(&tunnel, &pmd_actions,
>                               &num_pmd_actions, &error);
> /**
>  * 3. offload the first  rule
>  * matching on VXLAN traffic and jumps to group 3
>  * (implicitly decaps packet)
>  */
> app_actions  =   jump group 3
> rule_items = app_items;  /** eth / ipv4 / udp / vxlan  */
> rule_actions = { pmd_actions, app_actions };
> attr.group = 0;
> flow_1 = rte_flow_create(port_id, &attr,
>                          rule_items, rule_actions, &error);
> 
> /**
>   * 4. after flow creation application does not need to keep the
>   * tunnel action resources.
>   */
> rte_flow_tunnel_action_release(port_id, pmd_actions,
>                                num_pmd_actions);
> /**
>   * 5. After partially offloaded packet miss because there was no
>   * matching rule handle miss on group 3
>   */
> struct rte_flow_restore_info info;
> rte_flow_get_restore_info(port_id, mbuf, &info, &error);
> 
> /**
>  * 6. Offload NAT rule:
>  */
> app_items = { eth / ipv4 dst is 172.10.10.1 / udp dst 4789 /
>             vxlan vni is 10 / ipv4 dst is 184.1.2.3 }
> app_actions = { set_ipv4_dst 186.1.1.1 / queue index 3 }
> 
> rte_flow_tunnel_match(&info.tunnel, &pmd_items,
>                       &num_pmd_items,  &error);
> rule_items = {pmd_items, app_items};
> rule_actions = app_actions;
> attr.group = info.group_id;
> flow_2 = rte_flow_create(port_id, &attr,
>                          rule_items, rule_actions, &error);
> 
> /**
>  * 7. Release PMD items after rule creation
>  */
> rte_flow_tunnel_item_release(port_id,
>                              pmd_items, num_pmd_items);
> 
> References
> 1. https://mails.dpdk.org/archives/dev/2020-June/index.html
> 
> Signed-off-by: Eli Britstein <elibr@mellanox.com>
> Signed-off-by: Gregory Etelson <getelson@nvidia.com>
> Acked-by: Ori Kam <orika@nvidia.com>
> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> 
> ---
> v5:
> * rebase to next-net
> 
> v6:
> * update the patch comment
> * update tunnel offload section in rte_flow.rst
> ---
>  doc/guides/prog_guide/rte_flow.rst       |  78 +++++++++
>  doc/guides/rel_notes/release_20_11.rst   |   5 +
>  lib/librte_ethdev/rte_ethdev_version.map |   5 +
>  lib/librte_ethdev/rte_flow.c             | 112 +++++++++++++
>  lib/librte_ethdev/rte_flow.h             | 195 +++++++++++++++++++++++
>  lib/librte_ethdev/rte_flow_driver.h      |  32 ++++
>  6 files changed, 427 insertions(+)
> 
> diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
> index 7fb5ec9059..8dc048c6f4 100644
> --- a/doc/guides/prog_guide/rte_flow.rst
> +++ b/doc/guides/prog_guide/rte_flow.rst
> @@ -3131,6 +3131,84 @@ operations include:
>  - Duplication of a complete flow rule description.
>  - Pattern item or action name retrieval.
>  
> +Tunneled traffic offload
> +~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +rte_flow API provides the building blocks for vendor-agnostic flow
> +classification offloads. The rte_flow "patterns" and "actions"
> +primitives are fine-grained, thus enabling DPDK applications the
> +flexibility to offload network stacks and complex pipelines.
> +Applications wishing to offload tunneled traffic are required to use
> +the rte_flow primitives, such as group, meta, mark, tag, and others to
> +model their high-level objects.  The hardware model design for
> +high-level software objects is not trivial.  Furthermore, an optimal
> +design is often vendor-specific.
> +
> +When hardware offloads tunneled traffic in multi-group logic,
> +partially offloaded packets may arrive to the application after they
> +were modified in hardware. In this case, the application may need to
> +restore the original packet headers. Consider the following sequence:
> +The application decaps a packet in one group and jumps to a second
> +group where it tries to match on a 5-tuple, that will miss and send
> +the packet to the application. In this case, the application does not
> +receive the original packet but a modified one. Also, in this case,
> +the application cannot match on the outer header fields, such as VXLAN
> +vni and 5-tuple.
> +
> +There are several possible ways to use rte_flow "patterns" and
> +"actions" to resolve the issues above. For example:
> +
> +1 Mapping headers to a hardware registers using the
> +rte_flow_action_mark/rte_flow_action_tag/rte_flow_set_meta objects.
> +
> +2 Apply the decap only at the last offload stage after all the
> +"patterns" were matched and the packet will be fully offloaded.
> +
> +Every approach has its pros and cons and is highly dependent on the
> +hardware vendor.  For example, some hardware may have a limited number
> +of registers while other hardware could not support inner actions and
> +must decap before accessing inner headers.
> +
> +The tunnel offload model resolves these issues. The model goals are:
> +
> +1 Provide a unified application API to offload tunneled traffic that
> +is capable to match on outer headers after decap.
> +
> +2 Allow the application to restore the outer header of partially
> +offloaded packets.
> +
> +The tunnel offload model does not introduce new elements to the
> +existing RTE flow model and is implemented as a set of helper
> +functions.
> +
> +For the application to work with the tunnel offload API it
> +has to adjust flow rules in multi-table tunnel offload in the
> +following way:
> +
> +1 Remove explicit call to decap action and replace it with PMD actions
> +obtained from rte_flow_tunnel_decap_and_set() helper.
> +
> +2 Add PMD items obtained from rte_flow_tunnel_match() helper to all
> +other rules in the tunnel offload sequence.
> +
> +The model requirements:
> +
> +Software application must initialize
> +rte_tunnel object with tunnel parameters before calling
> +rte_flow_tunnel_decap_set() & rte_flow_tunnel_match().
> +
> +PMD actions array obtained in rte_flow_tunnel_decap_set() must be
> +released by application with rte_flow_action_release() call.
> +
> +PMD items array obtained with rte_flow_tunnel_match() must be released

Should be rte_flow_tunnel_item_release ?

> +by application with rte_flow_item_release() call.  Application can
> +release PMD items and actions after rule was created. However, if the
> +application needs to create additional rule for the same tunnel it
> +will need to obtain PMD items again.
> +
> +Application cannot destroy rte_tunnel object before it releases all
> +PMD actions & PMD items referencing that tunnel.
> +
>  Caveats
>  -------
>  
> diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
> index 9155b468d6..f125ce79dd 100644
> --- a/doc/guides/rel_notes/release_20_11.rst
> +++ b/doc/guides/rel_notes/release_20_11.rst
> @@ -121,6 +121,11 @@ New Features
>    * Flow rule verification was updated to accept private PMD
>      items and actions.
>  
> +* **Added generic API to offload tunneled traffic and restore missed packet.**
> +
> +  * Added a new hardware independent helper API to RTE flow library that
> +    offloads tunneled traffic and restores missed packets.
> +
>  * **Updated Cisco enic driver.**
>  
>    * Added support for VF representors with single-queue Tx/Rx and flow API
> diff --git a/lib/librte_ethdev/rte_ethdev_version.map b/lib/librte_ethdev/rte_ethdev_version.map
> index f64c379ac2..8ddda2547f 100644
> --- a/lib/librte_ethdev/rte_ethdev_version.map
> +++ b/lib/librte_ethdev/rte_ethdev_version.map
> @@ -239,6 +239,11 @@ EXPERIMENTAL {
>  	rte_flow_shared_action_destroy;
>  	rte_flow_shared_action_query;
>  	rte_flow_shared_action_update;
> +	rte_flow_tunnel_decap_set;
> +	rte_flow_tunnel_match;
> +	rte_flow_get_restore_info;
> +	rte_flow_tunnel_action_decap_release;
> +	rte_flow_tunnel_item_release;
>  };
>  
>  INTERNAL {
> diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
> index b74ea5593a..380c5cae2c 100644
> --- a/lib/librte_ethdev/rte_flow.c
> +++ b/lib/librte_ethdev/rte_flow.c
> @@ -1143,3 +1143,115 @@ rte_flow_shared_action_query(uint16_t port_id,
>  				       data, error);
>  	return flow_err(port_id, ret, error);
>  }
> +
> +int
> +rte_flow_tunnel_decap_set(uint16_t port_id,
> +			  struct rte_flow_tunnel *tunnel,
> +			  struct rte_flow_action **actions,
> +			  uint32_t *num_of_actions,
> +			  struct rte_flow_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
> +
> +	if (unlikely(!ops))
> +		return -rte_errno;
> +	if (likely(!!ops->tunnel_decap_set)) {
> +		return flow_err(port_id,
> +				ops->tunnel_decap_set(dev, tunnel, actions,
> +						      num_of_actions, error),
> +				error);
> +	}
> +	return rte_flow_error_set(error, ENOTSUP,
> +				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> +				  NULL, rte_strerror(ENOTSUP));
> +}
> +
> +int
> +rte_flow_tunnel_match(uint16_t port_id,
> +		      struct rte_flow_tunnel *tunnel,
> +		      struct rte_flow_item **items,
> +		      uint32_t *num_of_items,
> +		      struct rte_flow_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
> +
> +	if (unlikely(!ops))
> +		return -rte_errno;
> +	if (likely(!!ops->tunnel_match)) {
> +		return flow_err(port_id,
> +				ops->tunnel_match(dev, tunnel, items,
> +						  num_of_items, error),
> +				error);
> +	}
> +	return rte_flow_error_set(error, ENOTSUP,
> +				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> +				  NULL, rte_strerror(ENOTSUP));
> +}
> +
> +int
> +rte_flow_get_restore_info(uint16_t port_id,
> +			  struct rte_mbuf *m,
> +			  struct rte_flow_restore_info *restore_info,
> +			  struct rte_flow_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
> +
> +	if (unlikely(!ops))
> +		return -rte_errno;
> +	if (likely(!!ops->get_restore_info)) {
> +		return flow_err(port_id,
> +				ops->get_restore_info(dev, m, restore_info,
> +						      error),
> +				error);
> +	}
> +	return rte_flow_error_set(error, ENOTSUP,
> +				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> +				  NULL, rte_strerror(ENOTSUP));
> +}
> +
> +int
> +rte_flow_tunnel_action_decap_release(uint16_t port_id,
> +				     struct rte_flow_action *actions,
> +				     uint32_t num_of_actions,
> +				     struct rte_flow_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
> +
> +	if (unlikely(!ops))
> +		return -rte_errno;
> +	if (likely(!!ops->action_release)) {
> +		return flow_err(port_id,
> +				ops->action_release(dev, actions,
> +						    num_of_actions, error),
> +				error);
> +	}
> +	return rte_flow_error_set(error, ENOTSUP,
> +				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> +				  NULL, rte_strerror(ENOTSUP));
> +}
> +
> +int
> +rte_flow_tunnel_item_release(uint16_t port_id,
> +			     struct rte_flow_item *items,
> +			     uint32_t num_of_items,
> +			     struct rte_flow_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
> +
> +	if (unlikely(!ops))
> +		return -rte_errno;
> +	if (likely(!!ops->item_release)) {
> +		return flow_err(port_id,
> +				ops->item_release(dev, items,
> +						  num_of_items, error),
> +				error);
> +	}
> +	return rte_flow_error_set(error, ENOTSUP,
> +				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> +				  NULL, rte_strerror(ENOTSUP));
> +}
> diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
> index 48395284b5..a8eac4deb8 100644
> --- a/lib/librte_ethdev/rte_flow.h
> +++ b/lib/librte_ethdev/rte_flow.h
> @@ -3620,6 +3620,201 @@ rte_flow_shared_action_query(uint16_t port_id,
>  			     void *data,
>  			     struct rte_flow_error *error);
>  
> +/* Tunnel has a type and the key information. */
> +struct rte_flow_tunnel {
> +	/**
> +	 * Tunnel type, for example RTE_FLOW_ITEM_TYPE_VXLAN,
> +	 * RTE_FLOW_ITEM_TYPE_NVGRE etc.
> +	 */
> +	enum rte_flow_item_type	type;
> +	uint64_t tun_id; /**< Tunnel identification. */
> +
> +	RTE_STD_C11
> +	union {
> +		struct {
> +			rte_be32_t src_addr; /**< IPv4 source address. */
> +			rte_be32_t dst_addr; /**< IPv4 destination address. */
> +		} ipv4;
> +		struct {
> +			uint8_t src_addr[16]; /**< IPv6 source address. */
> +			uint8_t dst_addr[16]; /**< IPv6 destination address. */
> +		} ipv6;
> +	};
> +	rte_be16_t tp_src; /**< Tunnel port source. */
> +	rte_be16_t tp_dst; /**< Tunnel port destination. */
> +	uint16_t   tun_flags; /**< Tunnel flags. */
> +
> +	bool       is_ipv6; /**< True for valid IPv6 fields. Otherwise IPv4. */
> +
> +	/**
> +	 * the following members are required to restore packet
> +	 * after miss
> +	 */
> +	uint8_t    tos; /**< TOS for IPv4, TC for IPv6. */
> +	uint8_t    ttl; /**< TTL for IPv4, HL for IPv6. */
> +	uint32_t label; /**< Flow Label for IPv6. */
> +};
> +
> +/**
> + * Indicate that the packet has a tunnel.
> + */
> +#define RTE_FLOW_RESTORE_INFO_TUNNEL  (1ULL << 0)
> +
> +/**
> + * Indicate that the packet has a non decapsulated tunnel header.
> + */
> +#define RTE_FLOW_RESTORE_INFO_ENCAPSULATED  (1ULL << 1)
> +
> +/**
> + * Indicate that the packet has a group_id.
> + */
> +#define RTE_FLOW_RESTORE_INFO_GROUP_ID  (1ULL << 2)
> +
> +/**
> + * Restore information structure to communicate the current packet processing
> + * state when some of the processing pipeline is done in hardware and should
> + * continue in software.
> + */
> +struct rte_flow_restore_info {
> +	/**
> +	 * Bitwise flags (RTE_FLOW_RESTORE_INFO_*) to indicate validation of
> +	 * other fields in struct rte_flow_restore_info.
> +	 */
> +	uint64_t flags;
> +	uint32_t group_id; /**< Group ID where packed missed */
> +	struct rte_flow_tunnel tunnel; /**< Tunnel information. */
> +};
> +
> +/**
> + * Allocate an array of actions to be used in rte_flow_create, to implement
> + * tunnel-decap-set for the given tunnel.
> + * Sample usage:
> + *   actions vxlan_decap / tunnel-decap-set(tunnel properties) /
> + *            jump group 0 / end
> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param[in] tunnel
> + *   Tunnel properties.
> + * @param[out] actions
> + *   Array of actions to be allocated by the PMD. This array should be
> + *   concatenated with the actions array provided to rte_flow_create.
> + * @param[out] num_of_actions
> + *   Number of actions allocated.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL. PMDs initialize this
> + *   structure in case of error only.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_tunnel_decap_set(uint16_t port_id,
> +			  struct rte_flow_tunnel *tunnel,
> +			  struct rte_flow_action **actions,
> +			  uint32_t *num_of_actions,
> +			  struct rte_flow_error *error);
> +
> +/**
> + * Allocate an array of items to be used in rte_flow_create, to implement
> + * tunnel-match for the given tunnel.
> + * Sample usage:
> + *   pattern tunnel-match(tunnel properties) / outer-header-matches /
> + *           inner-header-matches / end
> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param[in] tunnel
> + *   Tunnel properties.
> + * @param[out] items
> + *   Array of items to be allocated by the PMD. This array should be
> + *   concatenated with the items array provided to rte_flow_create.
> + * @param[out] num_of_items
> + *   Number of items allocated.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL. PMDs initialize this
> + *   structure in case of error only.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_tunnel_match(uint16_t port_id,
> +		      struct rte_flow_tunnel *tunnel,
> +		      struct rte_flow_item **items,
> +		      uint32_t *num_of_items,
> +		      struct rte_flow_error *error);
> +
> +/**
> + * Populate the current packet processing state, if exists, for the given mbuf.
> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param[in] m
> + *   Mbuf struct.
> + * @param[out] info
> + *   Restore information. Upon success contains the HW state.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL. PMDs initialize this
> + *   structure in case of error only.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_get_restore_info(uint16_t port_id,
> +			  struct rte_mbuf *m,
> +			  struct rte_flow_restore_info *info,
> +			  struct rte_flow_error *error);
> +
> +/**
> + * Release the action array as allocated by rte_flow_tunnel_decap_set.
> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param[in] actions
> + *   Array of actions to be released.
> + * @param[in] num_of_actions
> + *   Number of elements in actions array.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL. PMDs initialize this
> + *   structure in case of error only.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_tunnel_action_decap_release(uint16_t port_id,
> +				     struct rte_flow_action *actions,
> +				     uint32_t num_of_actions,
> +				     struct rte_flow_error *error);
> +
> +/**
> + * Release the item array as allocated by rte_flow_tunnel_match.
> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param[in] items
> + *   Array of items to be released.
> + * @param[in] num_of_items
> + *   Number of elements in item array.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL. PMDs initialize this
> + *   structure in case of error only.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_tunnel_item_release(uint16_t port_id,
> +			     struct rte_flow_item *items,
> +			     uint32_t num_of_items,
> +			     struct rte_flow_error *error);
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_ethdev/rte_flow_driver.h b/lib/librte_ethdev/rte_flow_driver.h
> index 58f56b0262..bd5ffc0bb1 100644
> --- a/lib/librte_ethdev/rte_flow_driver.h
> +++ b/lib/librte_ethdev/rte_flow_driver.h
> @@ -131,6 +131,38 @@ struct rte_flow_ops {
>  		 const struct rte_flow_shared_action *shared_action,
>  		 void *data,
>  		 struct rte_flow_error *error);
> +	/** See rte_flow_tunnel_decap_set() */
> +	int (*tunnel_decap_set)
> +		(struct rte_eth_dev *dev,
> +		 struct rte_flow_tunnel *tunnel,
> +		 struct rte_flow_action **pmd_actions,
> +		 uint32_t *num_of_actions,
> +		 struct rte_flow_error *err);
> +	/** See rte_flow_tunnel_match() */
> +	int (*tunnel_match)
> +		(struct rte_eth_dev *dev,
> +		 struct rte_flow_tunnel *tunnel,
> +		 struct rte_flow_item **pmd_items,
> +		 uint32_t *num_of_items,
> +		 struct rte_flow_error *err);

Should be rte_flow_get_restore_info

> +	/** See rte_flow_get_rte_flow_restore_info() */
> +	int (*get_restore_info)
> +		(struct rte_eth_dev *dev,
> +		 struct rte_mbuf *m,
> +		 struct rte_flow_restore_info *info,
> +		 struct rte_flow_error *err);

Should be rte_flow_tunnel_action_decap_release
> +	/** See rte_flow_action_tunnel_decap_release() */
> +	int (*action_release)
> +		(struct rte_eth_dev *dev,
> +		 struct rte_flow_action *pmd_actions,
> +		 uint32_t num_of_actions,
> +		 struct rte_flow_error *err);

Should be rte_flow_tunnel_item_release?
> +	/** See rte_flow_item_release() */
> +	int (*item_release)
> +		(struct rte_eth_dev *dev,
> +		 struct rte_flow_item *pmd_items,
> +		 uint32_t num_of_items,
> +		 struct rte_flow_error *err);
>  };
>  
>  /**
> 

ABI Changes Acked-by: Ray Kinsella <mdr@ashroe.eu>

^ permalink raw reply	[flat|nested] 95+ messages in thread
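As a complement to the quoted commit message, the Rx-path side of step 5 can be
sketched as follows. This is an illustrative fragment that uses only the API,
structure and flag names introduced by the patch; handle_tunnel_miss() is a
hypothetical application function, not part of the patch:

#include <rte_flow.h>
#include <rte_mbuf.h>

/* Hypothetical handler for a packet that was partially offloaded
 * (e.g. decapped in group 0) and then missed in hardware.
 */
static void
handle_tunnel_miss(uint16_t port_id, struct rte_mbuf *m)
{
	struct rte_flow_restore_info info;
	struct rte_flow_error error;

	if (rte_flow_get_restore_info(port_id, m, &info, &error) != 0)
		return; /* no hardware restore state for this packet */
	if (info.flags & RTE_FLOW_RESTORE_INFO_TUNNEL) {
		/* info.tunnel describes the outer headers, e.g.
		 * info.tunnel.type == RTE_FLOW_ITEM_TYPE_VXLAN and
		 * info.tunnel.tun_id holding the VNI.
		 */
	}
	if (!(info.flags & RTE_FLOW_RESTORE_INFO_ENCAPSULATED)) {
		/* the outer header was already removed by hardware,
		 * so the mbuf data starts at the inner headers.
		 */
	}
	if (info.flags & RTE_FLOW_RESTORE_INFO_GROUP_ID) {
		/* info.group_id is the group where the miss happened;
		 * the follow-up rule (step 6 above) is created with
		 * attr.group = info.group_id.
		 */
	}
}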

* [dpdk-dev] [PATCH] ethdev: rename tunnel offload callbacks
  2020-06-25 16:03 [dpdk-dev] [PATCH 0/2] ethdev: tunnel offload model Gregory Etelson
                   ` (9 preceding siblings ...)
  2020-10-16 12:51 ` [dpdk-dev] [PATCH v8 " Gregory Etelson
@ 2020-10-18 12:15 ` Gregory Etelson
  2020-10-19  8:31   ` Ferruh Yigit
  2020-10-21  9:22 ` [dpdk-dev] [PATCH] net/mlx5: implement tunnel offload API Gregory Etelson
                   ` (6 subsequent siblings)
  17 siblings, 1 reply; 95+ messages in thread
From: Gregory Etelson @ 2020-10-18 12:15 UTC (permalink / raw)
  To: dev
  Cc: getelson, matan, rasland, elibr, ozsh, mdr, Ori Kam,
	Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko

rename new rte_flow ops callbacks to emphasize relation to tunnel
offload API.

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
---
 lib/librte_ethdev/rte_flow.c        | 13 +++++++------
 lib/librte_ethdev/rte_flow_driver.h |  4 ++--
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
index 380c5cae2c..d3e5cbc194 100644
--- a/lib/librte_ethdev/rte_flow.c
+++ b/lib/librte_ethdev/rte_flow.c
@@ -1223,10 +1223,11 @@ rte_flow_tunnel_action_decap_release(uint16_t port_id,
 
 	if (unlikely(!ops))
 		return -rte_errno;
-	if (likely(!!ops->action_release)) {
+	if (likely(!!ops->tunnel_action_decap_release)) {
 		return flow_err(port_id,
-				ops->action_release(dev, actions,
-						    num_of_actions, error),
+				ops->tunnel_action_decap_release(dev, actions,
+								 num_of_actions,
+								 error),
 				error);
 	}
 	return rte_flow_error_set(error, ENOTSUP,
@@ -1245,10 +1246,10 @@ rte_flow_tunnel_item_release(uint16_t port_id,
 
 	if (unlikely(!ops))
 		return -rte_errno;
-	if (likely(!!ops->item_release)) {
+	if (likely(!!ops->tunnel_item_release)) {
 		return flow_err(port_id,
-				ops->item_release(dev, items,
-						  num_of_items, error),
+				ops->tunnel_item_release(dev, items,
+							 num_of_items, error),
 				error);
 	}
 	return rte_flow_error_set(error, ENOTSUP,
diff --git a/lib/librte_ethdev/rte_flow_driver.h b/lib/librte_ethdev/rte_flow_driver.h
index bd5ffc0bb1..f3d72827cc 100644
--- a/lib/librte_ethdev/rte_flow_driver.h
+++ b/lib/librte_ethdev/rte_flow_driver.h
@@ -152,13 +152,13 @@ struct rte_flow_ops {
 		 struct rte_flow_restore_info *info,
 		 struct rte_flow_error *err);
 	/** See rte_flow_action_tunnel_decap_release() */
-	int (*action_release)
+	int (*tunnel_action_decap_release)
 		(struct rte_eth_dev *dev,
 		 struct rte_flow_action *pmd_actions,
 		 uint32_t num_of_actions,
 		 struct rte_flow_error *err);
 	/** See rte_flow_item_release() */
-	int (*item_release)
+	int (*tunnel_item_release)
 		(struct rte_eth_dev *dev,
 		 struct rte_flow_item *pmd_items,
 		 uint32_t num_of_items,
-- 
2.28.0


^ permalink raw reply	[flat|nested] 95+ messages in thread
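On the driver side the rename only changes two field names in struct
rte_flow_ops. A minimal sketch of how a PMD could populate the renamed
callbacks is shown below; the pmd_* names are placeholders for driver-internal
functions, not symbols from any existing driver:

#include <rte_flow_driver.h>

/* Illustrative stubs; a real PMD releases here the arrays it allocated
 * in its tunnel_decap_set() and tunnel_match() callbacks.
 */
static int
pmd_tunnel_action_decap_release(struct rte_eth_dev *dev,
				struct rte_flow_action *pmd_actions,
				uint32_t num_of_actions,
				struct rte_flow_error *err)
{
	(void)dev; (void)pmd_actions; (void)num_of_actions; (void)err;
	return 0;
}

static int
pmd_tunnel_item_release(struct rte_eth_dev *dev,
			struct rte_flow_item *pmd_items,
			uint32_t num_of_items,
			struct rte_flow_error *err)
{
	(void)dev; (void)pmd_items; (void)num_of_items; (void)err;
	return 0;
}

static const struct rte_flow_ops pmd_flow_ops = {
	/* tunnel_decap_set, tunnel_match and get_restore_info are the
	 * other tunnel offload callbacks and keep their previous names.
	 */
	.tunnel_action_decap_release = pmd_tunnel_action_decap_release,
	.tunnel_item_release = pmd_tunnel_item_release,
};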

* Re: [dpdk-dev] [PATCH] ethdev: rename tunnel offload callbacks
  2020-10-18 12:15 ` [dpdk-dev] [PATCH] ethdev: rename tunnel offload callbacks Gregory Etelson
@ 2020-10-19  8:31   ` Ferruh Yigit
  2020-10-19  9:56     ` Kinsella, Ray
  0 siblings, 1 reply; 95+ messages in thread
From: Ferruh Yigit @ 2020-10-19  8:31 UTC (permalink / raw)
  To: Gregory Etelson, dev
  Cc: matan, rasland, elibr, ozsh, mdr, Ori Kam, Thomas Monjalon,
	Andrew Rybchenko

On 10/18/2020 1:15 PM, Gregory Etelson wrote:
> rename new rte_flow ops callbacks to emphasize relation to tunnel
> offload API.
> 
> Signed-off-by: Gregory Etelson <getelson@nvidia.com>
> Acked-by: Ori Kam <orika@nvidia.com>
> ---
>   lib/librte_ethdev/rte_flow.c        | 13 +++++++------
>   lib/librte_ethdev/rte_flow_driver.h |  4 ++--
>   2 files changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
> index 380c5cae2c..d3e5cbc194 100644
> --- a/lib/librte_ethdev/rte_flow.c
> +++ b/lib/librte_ethdev/rte_flow.c
> @@ -1223,10 +1223,11 @@ rte_flow_tunnel_action_decap_release(uint16_t port_id,
>   
>   	if (unlikely(!ops))
>   		return -rte_errno;
> -	if (likely(!!ops->action_release)) {
> +	if (likely(!!ops->tunnel_action_decap_release)) {

+1 to rename.

Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH] ethdev: rename tunnel offload callbacks
  2020-10-19  8:31   ` Ferruh Yigit
@ 2020-10-19  9:56     ` Kinsella, Ray
  2020-10-19 21:29       ` Thomas Monjalon
  0 siblings, 1 reply; 95+ messages in thread
From: Kinsella, Ray @ 2020-10-19  9:56 UTC (permalink / raw)
  To: Ferruh Yigit, Gregory Etelson, dev
  Cc: matan, rasland, elibr, ozsh, Ori Kam, Thomas Monjalon, Andrew Rybchenko



On 19/10/2020 09:31, Ferruh Yigit wrote:
> On 10/18/2020 1:15 PM, Gregory Etelson wrote:
>> rename new rte_flow ops callbacks to emphasize relation to tunnel
>> offload API.
>>
>> Signed-off-by: Gregory Etelson <getelson@nvidia.com>
>> Acked-by: Ori Kam <orika@nvidia.com>
>> ---
>>   lib/librte_ethdev/rte_flow.c        | 13 +++++++------
>>   lib/librte_ethdev/rte_flow_driver.h |  4 ++--
>>   2 files changed, 9 insertions(+), 8 deletions(-)
>>
>> diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
>> index 380c5cae2c..d3e5cbc194 100644
>> --- a/lib/librte_ethdev/rte_flow.c
>> +++ b/lib/librte_ethdev/rte_flow.c
>> @@ -1223,10 +1223,11 @@ rte_flow_tunnel_action_decap_release(uint16_t port_id,
>>         if (unlikely(!ops))
>>           return -rte_errno;
>> -    if (likely(!!ops->action_release)) {
>> +    if (likely(!!ops->tunnel_action_decap_release)) {
> 
> +1 to rename.
> 
> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>

Acked-by: Ray Kinsella <mdr@ashroe.eu>

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH] ethdev: rename tunnel offload callbacks
  2020-10-19  9:56     ` Kinsella, Ray
@ 2020-10-19 21:29       ` Thomas Monjalon
  0 siblings, 0 replies; 95+ messages in thread
From: Thomas Monjalon @ 2020-10-19 21:29 UTC (permalink / raw)
  To: Gregory Etelson, dev
  Cc: Ferruh Yigit, matan, rasland, elibr, ozsh, Ori Kam,
	Andrew Rybchenko, Kinsella, Ray

19/10/2020 11:56, Kinsella, Ray:
> On 19/10/2020 09:31, Ferruh Yigit wrote:
> > On 10/18/2020 1:15 PM, Gregory Etelson wrote:
> >> rename new rte_flow ops callbacks to emphasize relation to tunnel
> >> offload API.
> >>
> >> Signed-off-by: Gregory Etelson <getelson@nvidia.com>
> >> Acked-by: Ori Kam <orika@nvidia.com>
> > 
> > +1 to rename.
> > 
> > Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
> 
> Acked-by: Ray Kinsella <mdr@ashroe.eu>

Applied, thanks



^ permalink raw reply	[flat|nested] 95+ messages in thread

* [dpdk-dev] [PATCH] net/mlx5: implement tunnel offload API
  2020-06-25 16:03 [dpdk-dev] [PATCH 0/2] ethdev: tunnel offload model Gregory Etelson
                   ` (10 preceding siblings ...)
  2020-10-18 12:15 ` [dpdk-dev] [PATCH] ethdev: rename tunnel offload callbacks Gregory Etelson
@ 2020-10-21  9:22 ` Gregory Etelson
  2020-10-22 16:00 ` [dpdk-dev] [PATCH v2] " Gregory Etelson
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 95+ messages in thread
From: Gregory Etelson @ 2020-10-21  9:22 UTC (permalink / raw)
  To: dev
  Cc: getelson, matan, rasland, elibr, ozsh, Viacheslav Ovsiienko,
	Shahaf Shuler

Tunnel Offload API provides hardware independent, unified model
to offload tunneled traffic. Key model elements are:
 - apply matches to both outer and inner packet headers
   during entire offload procedure;
 - restore outer header of partially offloaded packet;
 - model is implemented as a set of helper functions.

Implementation details:
* tunnel_offload PMD parameter must be set to 1 to enable the feature.
* application cannot use MARK and META flow actions with tunnel offload rules.
* offload JUMP action is restricted to the tunnel steering rule only.

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 doc/guides/nics/mlx5.rst         |   3 +
 drivers/net/mlx5/linux/mlx5_os.c |  18 +
 drivers/net/mlx5/mlx5.c          |   8 +-
 drivers/net/mlx5/mlx5.h          |   3 +
 drivers/net/mlx5/mlx5_defs.h     |   2 +
 drivers/net/mlx5/mlx5_flow.c     | 680 ++++++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_flow.h     | 170 +++++++-
 drivers/net/mlx5/mlx5_flow_dv.c  | 248 +++++++++--
 8 files changed, 1081 insertions(+), 51 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 1a8808e854..678d2a9597 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -831,6 +831,9 @@ Driver options
     24 bits. The actual supported width can be retrieved in runtime by
     series of rte_flow_validate() trials.
 
+  - 3, this engages tunnel offload mode. In E-Switch configuration, that
+    mode implicitly activates ``dv_xmeta_en=1``.
+
   +------+-----------+-----------+-------------+-------------+
   | Mode | ``MARK``  | ``META``  | ``META`` Tx | FDB/Through |
   +======+===========+===========+=============+=============+
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index d95082f927..74a7cd9a1c 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -298,6 +298,12 @@ mlx5_alloc_shared_dr(struct mlx5_priv *priv)
 		sh->esw_drop_action = mlx5_glue->dr_create_flow_action_drop();
 	}
 #endif
+	if (!sh->tunnel_hub)
+		err = mlx5_alloc_tunnel_hub(sh);
+	if (err) {
+		DRV_LOG(ERR, "mlx5_alloc_tunnel_hub failed err=%d", err);
+		goto error;
+	}
 	if (priv->config.reclaim_mode == MLX5_RCM_AGGR) {
 		mlx5_glue->dr_reclaim_domain_memory(sh->rx_domain, 1);
 		mlx5_glue->dr_reclaim_domain_memory(sh->tx_domain, 1);
@@ -344,6 +350,10 @@ mlx5_alloc_shared_dr(struct mlx5_priv *priv)
 		mlx5_hlist_destroy(sh->tag_table, NULL, NULL);
 		sh->tag_table = NULL;
 	}
+	if (sh->tunnel_hub) {
+		mlx5_release_tunnel_hub(sh, priv->dev_port);
+		sh->tunnel_hub = NULL;
+	}
 	mlx5_free_table_hash_list(priv);
 	return err;
 }
@@ -405,6 +415,10 @@ mlx5_os_free_shared_dr(struct mlx5_priv *priv)
 		mlx5_hlist_destroy(sh->tag_table, NULL, NULL);
 		sh->tag_table = NULL;
 	}
+	if (sh->tunnel_hub) {
+		mlx5_release_tunnel_hub(sh, priv->dev_port);
+		sh->tunnel_hub = NULL;
+	}
 	mlx5_free_table_hash_list(priv);
 }
 
@@ -708,6 +722,10 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 			strerror(rte_errno));
 		goto error;
 	}
+	if (config->dv_miss_info) {
+		if (switch_info->master || switch_info->representor)
+			config->dv_xmeta_en = MLX5_XMETA_MODE_META16;
+	}
 	mlx5_malloc_mem_select(config->sys_mem_en);
 	sh = mlx5_alloc_shared_dev_ctx(spawn, config);
 	if (!sh)
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 74a537b164..330523feb8 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1621,13 +1621,17 @@ mlx5_args_check(const char *key, const char *val, void *opaque)
 	} else if (strcmp(MLX5_DV_XMETA_EN, key) == 0) {
 		if (tmp != MLX5_XMETA_MODE_LEGACY &&
 		    tmp != MLX5_XMETA_MODE_META16 &&
-		    tmp != MLX5_XMETA_MODE_META32) {
+		    tmp != MLX5_XMETA_MODE_META32 &&
+		    tmp != MLX5_XMETA_MODE_MISS_INFO) {
 			DRV_LOG(ERR, "invalid extensive "
 				     "metadata parameter");
 			rte_errno = EINVAL;
 			return -rte_errno;
 		}
-		config->dv_xmeta_en = tmp;
+		if (tmp != MLX5_XMETA_MODE_MISS_INFO)
+			config->dv_xmeta_en = tmp;
+		else
+			config->dv_miss_info = 1;
 	} else if (strcmp(MLX5_LACP_BY_USER, key) == 0) {
 		config->lacp_by_user = !!tmp;
 	} else if (strcmp(MLX5_MR_EXT_MEMSEG_EN, key) == 0) {
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index afa2f31f5b..90034232c8 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -208,6 +208,7 @@ struct mlx5_dev_config {
 	unsigned int rt_timestamp:1; /* realtime timestamp format. */
 	unsigned int sys_mem_en:1; /* The default memory allocator. */
 	unsigned int decap_en:1; /* Whether decap will be used or not. */
+	unsigned int dv_miss_info:1; /* Restore packet after partial hw miss. */
 	struct {
 		unsigned int enabled:1; /* Whether MPRQ is enabled. */
 		unsigned int stride_num_n; /* Number of strides. */
@@ -634,6 +635,8 @@ struct mlx5_dev_ctx_shared {
 	/* UAR same-page access control required in 32bit implementations. */
 #endif
 	struct mlx5_hlist *flow_tbls;
+	struct rte_hash *flow_tbl_map; /* app group-to-flow table map */
+	struct mlx5_flow_tunnel_hub *tunnel_hub;
 	/* Direct Rules tables for FDB, NIC TX+RX */
 	void *esw_drop_action; /* Pointer to DR E-Switch drop action. */
 	void *pop_vlan_action; /* Pointer to DR pop VLAN action. */
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index 0df47391ee..41a7537d5e 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -165,6 +165,8 @@
 #define MLX5_XMETA_MODE_LEGACY 0
 #define MLX5_XMETA_MODE_META16 1
 #define MLX5_XMETA_MODE_META32 2
+/* Provide info on partial hw miss. Implies MLX5_XMETA_MODE_META16 */
+#define MLX5_XMETA_MODE_MISS_INFO 3
 
 /* MLX5_TX_DB_NC supported values. */
 #define MLX5_TXDB_CACHED 0
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index c56dac89f9..3a657f1616 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -18,6 +18,7 @@
 #include <rte_flow_driver.h>
 #include <rte_malloc.h>
 #include <rte_ip.h>
+#include <rte_hash.h>
 
 #include <mlx5_glue.h>
 #include <mlx5_devx_cmds.h>
@@ -30,6 +31,18 @@
 #include "mlx5_flow_os.h"
 #include "mlx5_rxtx.h"
 
+static struct mlx5_flow_tunnel *
+mlx5_find_tunnel_id(struct rte_eth_dev *dev, uint32_t id);
+static void
+mlx5_flow_tunnel_free(struct rte_eth_dev *dev, struct mlx5_flow_tunnel *tunnel);
+static const struct mlx5_flow_tbl_data_entry  *
+tunnel_mark_decode(struct rte_eth_dev *dev, uint32_t mark);
+static int
+mlx5_get_flow_tunnel(struct rte_eth_dev *dev,
+		     const struct rte_flow_tunnel *app_tunnel,
+		     struct mlx5_flow_tunnel **tunnel);
+
+
 /** Device flow drivers. */
 extern const struct mlx5_flow_driver_ops mlx5_flow_verbs_drv_ops;
 
@@ -546,6 +559,163 @@ static const struct mlx5_flow_expand_node mlx5_support_expansion[] = {
 	},
 };
 
+static inline bool
+mlx5_flow_tunnel_validate(struct rte_eth_dev *dev,
+			  struct rte_flow_tunnel *tunnel,
+			  const char *err_msg)
+{
+	err_msg = NULL;
+	if (!is_tunnel_offload_active(dev)) {
+		err_msg = "tunnel offload was not activated";
+		goto out;
+	} else if (!tunnel) {
+		err_msg = "no application tunnel";
+		goto out;
+	}
+
+	switch (tunnel->type) {
+	default:
+		err_msg = "unsupported tunnel type";
+		goto out;
+	case RTE_FLOW_ITEM_TYPE_VXLAN:
+		break;
+	}
+
+out:
+	return !err_msg;
+}
+
+
+static int
+mlx5_flow_tunnel_decap_set(struct rte_eth_dev *dev,
+		    struct rte_flow_tunnel *app_tunnel,
+		    struct rte_flow_action **actions,
+		    uint32_t *num_of_actions,
+		    struct rte_flow_error *error)
+{
+	int ret;
+	struct mlx5_flow_tunnel *tunnel;
+	const char *err_msg = NULL;
+	bool verdict = mlx5_flow_tunnel_validate(dev, app_tunnel, err_msg);
+
+	if (!verdict)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION_CONF, NULL,
+					  err_msg);
+	ret = mlx5_get_flow_tunnel(dev, app_tunnel, &tunnel);
+	if (ret < 0) {
+		return rte_flow_error_set(error, ret,
+					  RTE_FLOW_ERROR_TYPE_ACTION_CONF, NULL,
+					  "failed to initialize pmd tunnel");
+	}
+	*actions = &tunnel->action;
+	*num_of_actions = 1;
+	return 0;
+}
+
+static int
+mlx5_flow_tunnel_match(struct rte_eth_dev *dev,
+		       struct rte_flow_tunnel *app_tunnel,
+		       struct rte_flow_item **items,
+		       uint32_t *num_of_items,
+		       struct rte_flow_error *error)
+{
+	int ret;
+	struct mlx5_flow_tunnel *tunnel;
+	const char *err_msg = NULL;
+	bool verdict = mlx5_flow_tunnel_validate(dev, app_tunnel, err_msg);
+
+	if (!verdict)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_HANDLE, NULL,
+					  err_msg);
+	ret = mlx5_get_flow_tunnel(dev, app_tunnel, &tunnel);
+	if (ret < 0) {
+		return rte_flow_error_set(error, ret,
+					  RTE_FLOW_ERROR_TYPE_HANDLE, NULL,
+					  "failed to initialize pmd tunnel");
+	}
+	*items = &tunnel->item;
+	*num_of_items = 1;
+	return 0;
+}
+
+static int
+mlx5_flow_item_release(struct rte_eth_dev *dev,
+		       struct rte_flow_item *pmd_items,
+		       uint32_t num_items, struct rte_flow_error *err)
+{
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_tunnel *tun;
+
+	LIST_FOREACH(tun, &thub->tunnels, chain) {
+		if (&tun->item == pmd_items)
+			break;
+	}
+	if (!tun || num_items != 1)
+		return rte_flow_error_set(err, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_HANDLE, NULL,
+					  "invalid argument");
+	if (!__atomic_sub_fetch(&tun->refctn, 1, __ATOMIC_RELAXED))
+		mlx5_flow_tunnel_free(dev, tun);
+	return 0;
+}
+
+static int
+mlx5_flow_action_release(struct rte_eth_dev *dev,
+			 struct rte_flow_action *pmd_actions,
+			 uint32_t num_actions, struct rte_flow_error *err)
+{
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_tunnel *tun;
+
+	LIST_FOREACH(tun, &thub->tunnels, chain) {
+		if (&tun->action == pmd_actions)
+			break;
+	}
+	if (!tun || num_actions != 1)
+		return rte_flow_error_set(err, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_HANDLE, NULL,
+					  "invalid argument");
+	if (!__atomic_sub_fetch(&tun->refctn, 1, __ATOMIC_RELAXED))
+		mlx5_flow_tunnel_free(dev, tun);
+
+	return 0;
+}
+
+static int
+mlx5_flow_tunnel_get_restore_info(struct rte_eth_dev *dev,
+				  struct rte_mbuf *m,
+				  struct rte_flow_restore_info *info,
+				  struct rte_flow_error *err)
+{
+	uint64_t ol_flags = m->ol_flags;
+	const struct mlx5_flow_tbl_data_entry *tble;
+	const uint64_t mask = PKT_RX_FDIR | PKT_RX_FDIR_ID;
+
+	if ((ol_flags & mask) != mask)
+		goto err;
+	tble = tunnel_mark_decode(dev, m->hash.fdir.hi);
+	if (!tble) {
+		DRV_LOG(DEBUG, "port %u invalid miss tunnel mark %#x",
+			dev->data->port_id, m->hash.fdir.hi);
+		goto err;
+	}
+	MLX5_ASSERT(tble->tunnel);
+	memcpy(&info->tunnel, &tble->tunnel->app_tunnel, sizeof(info->tunnel));
+	info->group_id = tble->group_id;
+	info->flags = RTE_FLOW_RESTORE_INFO_TUNNEL |
+		      RTE_FLOW_RESTORE_INFO_GROUP_ID |
+		      RTE_FLOW_RESTORE_INFO_ENCAPSULATED;
+
+	return 0;
+
+err:
+	return rte_flow_error_set(err, EINVAL,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				  "failed to get restore info");
+}
+
 static const struct rte_flow_ops mlx5_flow_ops = {
 	.validate = mlx5_flow_validate,
 	.create = mlx5_flow_create,
@@ -555,6 +725,11 @@ static const struct rte_flow_ops mlx5_flow_ops = {
 	.query = mlx5_flow_query,
 	.dev_dump = mlx5_flow_dev_dump,
 	.get_aged_flows = mlx5_flow_get_aged_flows,
+	.tunnel_decap_set = mlx5_flow_tunnel_decap_set,
+	.tunnel_match = mlx5_flow_tunnel_match,
+	.tunnel_action_decap_release = mlx5_flow_action_release,
+	.tunnel_item_release = mlx5_flow_item_release,
+	.get_restore_info = mlx5_flow_tunnel_get_restore_info,
 };
 
 /* Convert FDIR request to Generic flow. */
@@ -3880,6 +4055,142 @@ flow_hairpin_split(struct rte_eth_dev *dev,
 	return 0;
 }
 
+__extension__
+union tunnel_offload_mark {
+	uint32_t val;
+	struct {
+		uint32_t app_reserve:8;
+		uint32_t table_id:15;
+		uint32_t transfer:1;
+		uint32_t _unused_:8;
+	};
+};
+
+struct tunnel_default_miss_ctx {
+	uint16_t *queue;
+	__extension__
+	union {
+		struct rte_flow_action_rss action_rss;
+		struct rte_flow_action_queue miss_queue;
+		struct rte_flow_action_jump miss_jump;
+		uint8_t raw[0];
+	};
+};
+
+static int
+flow_tunnel_add_default_miss(struct rte_eth_dev *dev,
+			     struct rte_flow *flow,
+			     const struct rte_flow_attr *attr,
+			     const struct rte_flow_action *app_actions,
+			     uint32_t flow_idx,
+			     struct tunnel_default_miss_ctx *ctx,
+			     struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow *dev_flow;
+	struct rte_flow_attr miss_attr = *attr;
+	const struct mlx5_flow_tunnel *tunnel = app_actions[0].conf;
+	const struct rte_flow_item miss_items[2] = {
+		{
+			.type = RTE_FLOW_ITEM_TYPE_ETH,
+			.spec = NULL,
+			.last = NULL,
+			.mask = NULL
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+			.spec = NULL,
+			.last = NULL,
+			.mask = NULL
+		}
+	};
+	union tunnel_offload_mark mark_id;
+	struct rte_flow_action_mark miss_mark;
+	struct rte_flow_action miss_actions[3] = {
+		[0] = { .type = RTE_FLOW_ACTION_TYPE_MARK, .conf = &miss_mark },
+		[2] = { .type = RTE_FLOW_ACTION_TYPE_END,  .conf = NULL }
+	};
+	const struct rte_flow_action_jump *jump_data;
+	uint32_t i, flow_table = 0; /* prevent compilation warning */
+	struct flow_grp_info grp_info = {
+		.external = 1,
+		.transfer = attr->transfer,
+		.fdb_def_rule = !!priv->fdb_def_rule,
+		.std_tbl_fix = 0,
+	};
+	int ret;
+
+	if (!attr->transfer) {
+		uint32_t q_size;
+
+		miss_actions[1].type = RTE_FLOW_ACTION_TYPE_RSS;
+		q_size = priv->reta_idx_n * sizeof(ctx->queue[0]);
+		ctx->queue = mlx5_malloc(MLX5_MEM_SYS | MLX5_MEM_ZERO, q_size,
+					 0, SOCKET_ID_ANY);
+		if (!ctx->queue)
+			return rte_flow_error_set
+				(error, ENOMEM,
+				RTE_FLOW_ERROR_TYPE_ACTION_CONF,
+				NULL, "invalid default miss RSS");
+		ctx->action_rss.func = RTE_ETH_HASH_FUNCTION_DEFAULT,
+		ctx->action_rss.level = 0,
+		ctx->action_rss.types = priv->rss_conf.rss_hf,
+		ctx->action_rss.key_len = priv->rss_conf.rss_key_len,
+		ctx->action_rss.queue_num = priv->reta_idx_n,
+		ctx->action_rss.key = priv->rss_conf.rss_key,
+		ctx->action_rss.queue = ctx->queue;
+		if (!priv->reta_idx_n || !priv->rxqs_n)
+			return rte_flow_error_set
+				(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION_CONF,
+				NULL, "invalid port configuration");
+		if (!(dev->data->dev_conf.rxmode.mq_mode & ETH_MQ_RX_RSS_FLAG))
+			ctx->action_rss.types = 0;
+		for (i = 0; i != priv->reta_idx_n; ++i)
+			ctx->queue[i] = (*priv->reta_idx)[i];
+	} else {
+		miss_actions[1].type = RTE_FLOW_ACTION_TYPE_JUMP;
+		ctx->miss_jump.group = MLX5_TNL_MISS_FDB_JUMP_GRP;
+	}
+	miss_actions[1].conf = (typeof(miss_actions[1].conf))ctx->raw;
+	for (; app_actions->type != RTE_FLOW_ACTION_TYPE_JUMP; app_actions++);
+	jump_data = app_actions->conf;
+	miss_attr.priority = MLX5_TNL_MISS_RULE_PRIORITY;
+	miss_attr.group = jump_data->group;
+	ret = mlx5_flow_group_to_table(dev, tunnel, jump_data->group,
+				       &flow_table, grp_info, error);
+	if (ret)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION_CONF,
+					  NULL, "invalid tunnel id");
+	mark_id.app_reserve = 0;
+	mark_id.table_id = tunnel_flow_tbl_to_id(flow_table);
+	mark_id.transfer = !!attr->transfer;
+	mark_id._unused_ = 0;
+	miss_mark.id = mark_id.val;
+	dev_flow = flow_drv_prepare(dev, flow, &miss_attr,
+				    miss_items, miss_actions, flow_idx, error);
+	if (!dev_flow)
+		return -rte_errno;
+	dev_flow->flow = flow;
+	dev_flow->external = true;
+	dev_flow->tunnel = tunnel;
+	/* Subflow object was created, we must include one in the list. */
+	SILIST_INSERT(&flow->dev_handles, dev_flow->handle_idx,
+		      dev_flow->handle, next);
+	DRV_LOG(DEBUG,
+		"port %u tunnel type=%d id=%u miss rule priority=%u group=%u",
+		dev->data->port_id, tunnel->app_tunnel.type,
+		tunnel->tunnel_id, miss_attr.priority, miss_attr.group);
+	ret = flow_drv_translate(dev, dev_flow, &miss_attr, miss_items,
+				  miss_actions, error);
+	if (!ret)
+		ret = flow_mreg_update_copy_table(dev, flow, miss_actions,
+						  error);
+
+	return ret;
+}
+
 /**
  * The last stage of splitting chain, just creates the subflow
  * without any modification.
@@ -5002,6 +5313,27 @@ flow_create_split_outer(struct rte_eth_dev *dev,
 	return ret;
 }
 
+static struct mlx5_flow_tunnel *
+flow_tunnel_from_rule(struct rte_eth_dev *dev,
+		      const struct rte_flow_attr *attr,
+		      const struct rte_flow_item items[],
+		      const struct rte_flow_action actions[])
+{
+	struct mlx5_flow_tunnel *tunnel;
+
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wcast-qual"
+	if (is_flow_tunnel_match_rule(dev, attr, items, actions))
+		tunnel = (struct mlx5_flow_tunnel *)items[0].spec;
+	else if (is_flow_tunnel_steer_rule(dev, attr, items, actions))
+		tunnel = (struct mlx5_flow_tunnel *)actions[0].conf;
+	else
+		tunnel = NULL;
+#pragma GCC diagnostic pop
+
+	return tunnel;
+}
+
 /**
  * Create a flow and add it to @p list.
  *
@@ -5063,11 +5395,11 @@ flow_list_create(struct rte_eth_dev *dev, uint32_t *list,
 	uint32_t hairpin_id = 0;
 	struct rte_flow_attr attr_tx = { .priority = 0 };
 	struct rte_flow_attr attr_factor = {0};
+	struct mlx5_flow_tunnel *tunnel;
+	struct tunnel_default_miss_ctx default_miss_ctx = { 0, };
 	int ret;
 
 	memcpy((void *)&attr_factor, (const void *)attr, sizeof(*attr));
-	if (external)
-		attr_factor.group *= MLX5_FLOW_TABLE_FACTOR;
 	hairpin_flow = flow_check_hairpin_split(dev, &attr_factor, actions);
 	ret = flow_drv_validate(dev, &attr_factor, items, p_actions_rx,
 				external, hairpin_flow, error);
@@ -5139,6 +5471,19 @@ flow_list_create(struct rte_eth_dev *dev, uint32_t *list,
 					      error);
 		if (ret < 0)
 			goto error;
+		if (is_flow_tunnel_steer_rule(dev, attr,
+					      buf->entry[i].pattern,
+					      p_actions_rx)) {
+			ret = flow_tunnel_add_default_miss(dev, flow, attr,
+							   p_actions_rx,
+							   idx,
+							   &default_miss_ctx,
+							   error);
+			if (ret < 0) {
+				mlx5_free(default_miss_ctx.queue);
+				goto error;
+			}
+		}
 	}
 	/* Create the tx flow. */
 	if (hairpin_flow) {
@@ -5193,6 +5538,13 @@ flow_list_create(struct rte_eth_dev *dev, uint32_t *list,
 	priv->flow_idx = priv->flow_nested_idx;
 	if (priv->flow_nested_idx)
 		priv->flow_nested_idx = 0;
+	tunnel = flow_tunnel_from_rule(dev, attr, items, actions);
+	if (tunnel) {
+		flow->tunnel = 1;
+		flow->tunnel_id = tunnel->tunnel_id;
+		__atomic_add_fetch(&tunnel->refctn, 1, __ATOMIC_RELAXED);
+		mlx5_free(default_miss_ctx.queue);
+	}
 	return idx;
 error:
 	MLX5_ASSERT(flow);
@@ -5312,6 +5664,7 @@ mlx5_flow_create(struct rte_eth_dev *dev,
 				   "port not started");
 		return NULL;
 	}
+
 	return (void *)(uintptr_t)flow_list_create(dev, &priv->flows,
 				  attr, items, actions, true, error);
 }
@@ -5366,6 +5719,13 @@ flow_list_destroy(struct rte_eth_dev *dev, uint32_t *list,
 		}
 	}
 	mlx5_ipool_free(priv->sh->ipool[MLX5_IPOOL_RTE_FLOW], flow_idx);
+	if (flow->tunnel) {
+		struct mlx5_flow_tunnel *tunnel;
+		tunnel = mlx5_find_tunnel_id(dev, flow->tunnel_id);
+		RTE_VERIFY(tunnel);
+		if (!__atomic_sub_fetch(&tunnel->refctn, 1, __ATOMIC_RELAXED))
+			mlx5_flow_tunnel_free(dev, tunnel);
+	}
 }
 
 /**
@@ -6845,19 +7205,122 @@ mlx5_flow_async_pool_query_handle(struct mlx5_dev_ctx_shared *sh,
 	sh->cmng.pending_queries--;
 }
 
+static const struct mlx5_flow_tbl_data_entry  *
+tunnel_mark_decode(struct rte_eth_dev *dev, uint32_t mark)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_dev_ctx_shared *sh = priv->sh;
+	struct mlx5_hlist_entry *he;
+	union tunnel_offload_mark mbits = { .val = mark };
+	union mlx5_flow_tbl_key table_key = {
+		{
+			.table_id = tunnel_id_to_flow_tbl(mbits.table_id),
+			.reserved = 0,
+			.domain = !!mbits.transfer,
+			.direction = 0,
+		}
+	};
+	he = mlx5_hlist_lookup(sh->flow_tbls, table_key.v64);
+	return he ?
+	       container_of(he, struct mlx5_flow_tbl_data_entry, entry) : NULL;
+}
+
+static uint32_t
+tunnel_flow_group_to_flow_table(struct rte_eth_dev *dev,
+				const struct mlx5_flow_tunnel *tunnel,
+				uint32_t group, uint32_t *table,
+				struct rte_flow_error *error)
+{
+	struct mlx5_hlist_entry *he;
+	struct tunnel_tbl_entry *tte;
+	union tunnel_tbl_key key = {
+		.tunnel_id = tunnel ? tunnel->tunnel_id : 0,
+		.group = group
+	};
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_hlist *group_hash;
+
+	group_hash = tunnel ? tunnel->groups : thub->groups;
+	he = mlx5_hlist_lookup(group_hash, key.val);
+	if (!he) {
+		int ret;
+		tte = mlx5_malloc(MLX5_MEM_SYS | MLX5_MEM_ZERO,
+				  sizeof(*tte), 0,
+				  SOCKET_ID_ANY);
+		if (!tte)
+			goto err;
+		tte->hash.key = key.val;
+		ret = mlx5_flow_id_get(thub->table_ids, &tte->flow_table);
+		if (ret) {
+			mlx5_free(tte);
+			goto err;
+		}
+		tte->flow_table = tunnel_id_to_flow_tbl(tte->flow_table);
+		mlx5_hlist_insert(group_hash, &tte->hash);
+	} else {
+		tte = container_of(he, typeof(*tte), hash);
+	}
+	*table = tte->flow_table;
+	DRV_LOG(DEBUG, "port %u tunnel %u group=%#x table=%#x",
+		dev->data->port_id, key.tunnel_id, group, *table);
+	return 0;
+
+err:
+	return rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
+				  NULL, "tunnel group index not supported");
+}
+
+static int
+flow_group_to_table(uint32_t port_id, uint32_t group, uint32_t *table,
+		    struct flow_grp_info grp_info, struct rte_flow_error *error)
+{
+	if (grp_info.transfer && grp_info.external && grp_info.fdb_def_rule) {
+		if (group == UINT32_MAX)
+			return rte_flow_error_set
+						(error, EINVAL,
+						 RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
+						 NULL,
+						 "group index not supported");
+		*table = group + 1;
+	} else {
+		*table = group;
+	}
+	DRV_LOG(DEBUG, "port %u group=%#x table=%#x", port_id, group, *table);
+	return 0;
+}
+
 /**
  * Translate the rte_flow group index to HW table value.
  *
- * @param[in] attributes
- *   Pointer to flow attributes
- * @param[in] external
- *   Value is part of flow rule created by request external to PMD.
+ * If tunnel offload is disabled, all group ids are converted to flow
+ * table ids using the standard method.
+ * If tunnel offload is enabled, a group id can be converted using either
+ * the standard or the tunnel conversion method. The conversion method
+ * selection depends on flags in the `grp_info` parameter:
+ * - Internal (grp_info.external == 0) groups are converted with the
+ *   standard method.
+ * - Group ids in a JUMP action are converted with the tunnel method.
+ * - Conversion of a group id in a rule attribute depends on the rule type
+ *   and the group id value:
+ *   ** non-zero group attributes are converted with the tunnel method
+ *   ** a zero group attribute in a non-tunnel rule is converted with the
+ *      standard method - there is only one root table
+ *   ** a zero group attribute in a steer tunnel rule is converted with the
+ *      standard method - single root table
+ *   ** a zero group attribute in a match tunnel rule is a special OvS
+ *      case: that value is used for portability reasons. That group
+ *      id is converted with the tunnel conversion method.
+ *
+ * @param[in] dev
+ *   Port device
+ * @param[in] tunnel
+ *   PMD tunnel offload object
  * @param[in] group
  *   rte_flow group index value.
- * @param[out] fdb_def_rule
- *   Whether fdb jump to table 1 is configured.
  * @param[out] table
  *   HW table value.
+ * @param[in] grp_info
+ *   flags used for conversion
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -6865,22 +7328,36 @@ mlx5_flow_async_pool_query_handle(struct mlx5_dev_ctx_shared *sh,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_flow_group_to_table(const struct rte_flow_attr *attributes, bool external,
-			 uint32_t group, bool fdb_def_rule, uint32_t *table,
+mlx5_flow_group_to_table(struct rte_eth_dev *dev,
+			 const struct mlx5_flow_tunnel *tunnel,
+			 uint32_t group, uint32_t *table,
+			 struct flow_grp_info grp_info,
 			 struct rte_flow_error *error)
 {
-	if (attributes->transfer && external && fdb_def_rule) {
-		if (group == UINT32_MAX)
-			return rte_flow_error_set
-						(error, EINVAL,
-						 RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
-						 NULL,
-						 "group index not supported");
-		*table = group + 1;
+	int ret;
+	bool standard_translation;
+
+	if (grp_info.external && group < MLX5_MAX_TABLES_EXTERNAL)
+		group *= MLX5_FLOW_TABLE_FACTOR;
+	if (is_tunnel_offload_active(dev)) {
+		standard_translation = !grp_info.external ||
+					grp_info.std_tbl_fix;
 	} else {
-		*table = group;
+		standard_translation = true;
 	}
-	return 0;
+	DRV_LOG(DEBUG,
+		"port %u group=%#x transfer=%d external=%d fdb_def_rule=%d translate=%s",
+		dev->data->port_id, group, grp_info.transfer,
+		grp_info.external, grp_info.fdb_def_rule,
+		standard_translation ? "STANDARD" : "TUNNEL");
+	if (standard_translation)
+		ret = flow_group_to_table(dev->data->port_id, group, table,
+					  grp_info, error);
+	else
+		ret = tunnel_flow_group_to_flow_table(dev, tunnel, group,
+						      table, error);
+
+	return ret;
 }
 
 /**
@@ -7024,3 +7501,166 @@ mlx5_flow_get_aged_flows(struct rte_eth_dev *dev, void **contexts,
 		 dev->data->port_id);
 	return -ENOTSUP;
 }
+
+static void
+mlx5_flow_tunnel_free(struct rte_eth_dev *dev,
+		      struct mlx5_flow_tunnel *tunnel)
+{
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_id_pool *id_pool = thub->tunnel_ids;
+
+	DRV_LOG(DEBUG, "port %u release pmd tunnel id=0x%x",
+		dev->data->port_id, tunnel->tunnel_id);
+	RTE_VERIFY(!__atomic_load_n(&tunnel->refctn, __ATOMIC_RELAXED));
+	LIST_REMOVE(tunnel, chain);
+	mlx5_flow_id_release(id_pool, tunnel->tunnel_id);
+	mlx5_hlist_destroy(tunnel->groups, NULL, NULL);
+	mlx5_free(tunnel);
+}
+
+static struct mlx5_flow_tunnel *
+mlx5_find_tunnel_id(struct rte_eth_dev *dev, uint32_t id)
+{
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_tunnel *tun;
+
+	LIST_FOREACH(tun, &thub->tunnels, chain) {
+		if (tun->tunnel_id == id)
+			break;
+	}
+
+	return tun;
+}
+
+static struct mlx5_flow_tunnel *
+mlx5_flow_tunnel_allocate(struct rte_eth_dev *dev,
+			  const struct rte_flow_tunnel *app_tunnel)
+{
+	int ret;
+	struct mlx5_flow_tunnel *tunnel;
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_id_pool *id_pool = thub->tunnel_ids;
+	uint32_t id;
+
+	ret = mlx5_flow_id_get(id_pool, &id);
+	if (ret)
+		return NULL;
+	/**
+	 * mlx5 flow tunnel is an auxiliary data structure
+	 * It's not part of IO. No need to allocate it from
+	 * huge pages pools dedicated for IO
+	 */
+	tunnel = mlx5_malloc(MLX5_MEM_SYS | MLX5_MEM_ZERO, sizeof(*tunnel),
+			     0, SOCKET_ID_ANY);
+	if (!tunnel) {
+		mlx5_flow_id_pool_release(id_pool);
+		return NULL;
+	}
+	tunnel->groups = mlx5_hlist_create("tunnel groups", 1024);
+	if (!tunnel->groups) {
+		mlx5_flow_id_pool_release(id_pool);
+		mlx5_free(tunnel);
+		return NULL;
+	}
+	/* initiate new PMD tunnel */
+	memcpy(&tunnel->app_tunnel, app_tunnel, sizeof(*app_tunnel));
+	tunnel->tunnel_id = id;
+	tunnel->action.type = MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET;
+	tunnel->action.conf = tunnel;
+	tunnel->item.type = MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL;
+	tunnel->item.spec = tunnel;
+	tunnel->item.last = NULL;
+	tunnel->item.mask = NULL;
+
+	DRV_LOG(DEBUG, "port %u new pmd tunnel id=0x%x",
+		dev->data->port_id, tunnel->tunnel_id);
+
+	return tunnel;
+}
+
+static int
+mlx5_get_flow_tunnel(struct rte_eth_dev *dev,
+		     const struct rte_flow_tunnel *app_tunnel,
+		     struct mlx5_flow_tunnel **tunnel)
+{
+	int ret;
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_tunnel *tun;
+
+	LIST_FOREACH(tun, &thub->tunnels, chain) {
+		if (!memcmp(app_tunnel, &tun->app_tunnel,
+			    sizeof(*app_tunnel))) {
+			*tunnel = tun;
+			ret = 0;
+			break;
+		}
+	}
+	if (!tun) {
+		tun = mlx5_flow_tunnel_allocate(dev, app_tunnel);
+		if (tun) {
+			LIST_INSERT_HEAD(&thub->tunnels, tun, chain);
+			*tunnel = tun;
+		} else {
+			ret = -ENOMEM;
+		}
+	}
+	if (tun)
+		__atomic_add_fetch(&tun->refctn, 1, __ATOMIC_RELAXED);
+
+	return ret;
+}
+
+void mlx5_release_tunnel_hub(struct mlx5_dev_ctx_shared *sh, uint16_t port_id)
+{
+	struct mlx5_flow_tunnel_hub *thub = sh->tunnel_hub;
+
+	if (!thub)
+		return;
+	if (!LIST_EMPTY(&thub->tunnels))
+		DRV_LOG(WARNING, "port %u tunnels present\n", port_id);
+	mlx5_flow_id_pool_release(thub->tunnel_ids);
+	mlx5_flow_id_pool_release(thub->table_ids);
+	mlx5_hlist_destroy(thub->groups, NULL, NULL);
+	mlx5_free(thub);
+}
+
+int mlx5_alloc_tunnel_hub(struct mlx5_dev_ctx_shared *sh)
+{
+	int err;
+	struct mlx5_flow_tunnel_hub *thub;
+
+	thub = mlx5_malloc(MLX5_MEM_SYS | MLX5_MEM_ZERO, sizeof(*thub),
+			   0, SOCKET_ID_ANY);
+	if (!thub)
+		return -ENOMEM;
+	LIST_INIT(&thub->tunnels);
+	thub->tunnel_ids = mlx5_flow_id_pool_alloc(MLX5_MAX_TUNNELS);
+	if (!thub->tunnel_ids) {
+		err = -rte_errno;
+		goto err;
+	}
+	thub->table_ids = mlx5_flow_id_pool_alloc(MLX5_MAX_TABLES);
+	if (!thub->table_ids) {
+		err = -rte_errno;
+		goto err;
+	}
+	thub->groups = mlx5_hlist_create("flow groups", MLX5_MAX_TABLES);
+	if (!thub->groups) {
+		err = -rte_errno;
+		goto err;
+	}
+	sh->tunnel_hub = thub;
+
+	return 0;
+
+err:
+	if (thub->groups)
+		mlx5_hlist_destroy(thub->groups, NULL, NULL);
+	if (thub->table_ids)
+		mlx5_flow_id_pool_release(thub->table_ids);
+	if (thub->tunnel_ids)
+		mlx5_flow_id_pool_release(thub->tunnel_ids);
+	if (thub)
+		mlx5_free(thub);
+	return err;
+}
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index ec6aa199d3..3037cdcbfd 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -26,6 +26,7 @@ enum mlx5_rte_flow_item_type {
 	MLX5_RTE_FLOW_ITEM_TYPE_TAG,
 	MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
 	MLX5_RTE_FLOW_ITEM_TYPE_VLAN,
+	MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL,
 };
 
 /* Private (internal) rte flow actions. */
@@ -35,6 +36,7 @@ enum mlx5_rte_flow_action_type {
 	MLX5_RTE_FLOW_ACTION_TYPE_MARK,
 	MLX5_RTE_FLOW_ACTION_TYPE_COPY_MREG,
 	MLX5_RTE_FLOW_ACTION_TYPE_DEFAULT_MISS,
+	MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET,
 };
 
 /* Matches on selected register. */
@@ -201,6 +203,8 @@ enum mlx5_feature_name {
 #define MLX5_FLOW_ACTION_AGE (1ull << 34)
 #define MLX5_FLOW_ACTION_DEFAULT_MISS (1ull << 35)
 #define MLX5_FLOW_ACTION_SAMPLE (1ull << 36)
+#define MLX5_FLOW_ACTION_TUNNEL_SET (1ull << 37)
+#define MLX5_FLOW_ACTION_TUNNEL_MATCH (1ull << 38)
 
 #define MLX5_FLOW_FATE_ACTIONS \
 	(MLX5_FLOW_ACTION_DROP | MLX5_FLOW_ACTION_QUEUE | \
@@ -530,6 +534,10 @@ struct mlx5_flow_tbl_data_entry {
 	struct mlx5_flow_dv_jump_tbl_resource jump;
 	/**< jump resource, at most one for each table created. */
 	uint32_t idx; /**< index for the indexed mempool. */
+	/**< tunnel offload */
+	const struct mlx5_flow_tunnel *tunnel;
+	uint32_t group_id;
+	bool external;
 };
 
 /* Sub rdma-core actions list. */
@@ -768,6 +776,7 @@ struct mlx5_flow {
 	};
 	struct mlx5_flow_handle *handle;
 	uint32_t handle_idx; /* Index of the mlx5 flow handle memory. */
+	const struct mlx5_flow_tunnel *tunnel;
 };
 
 /* Flow meter state. */
@@ -913,6 +922,112 @@ struct mlx5_fdir_flow {
 
 #define HAIRPIN_FLOW_ID_BITS 28
 
+#define MLX5_MAX_TUNNELS 256
+#define MLX5_TNL_MISS_RULE_PRIORITY 3
+#define MLX5_TNL_MISS_FDB_JUMP_GRP  0x1234faac
+
+/*
+ * When tunnel offload is active, all JUMP group ids are converted
+ * using the same method. That conversion is applied both to tunnel and
+ * regular rule types.
+ * Group ids used in tunnel rules are relative to their tunnel (!).
+ * An application can create a number of steer rules, using the same
+ * tunnel, with a different group id in each rule.
+ * Each tunnel stores its groups internally in PMD tunnel object.
+ * Groups used in regular rules do not belong to any tunnel and are stored
+ * in tunnel hub.
+ */
+
+struct mlx5_flow_tunnel {
+	LIST_ENTRY(mlx5_flow_tunnel) chain;
+	struct rte_flow_tunnel app_tunnel;	/** app tunnel copy */
+	uint32_t tunnel_id;			/** unique tunnel ID */
+	uint32_t refctn;
+	struct rte_flow_action action;
+	struct rte_flow_item item;
+	struct mlx5_hlist *groups;		/** tunnel groups */
+};
+
+/** PMD tunnel related context */
+struct mlx5_flow_tunnel_hub {
+	LIST_HEAD(, mlx5_flow_tunnel) tunnels;
+	struct mlx5_flow_id_pool *tunnel_ids;
+	struct mlx5_flow_id_pool *table_ids;
+	struct mlx5_hlist *groups;		/** non tunnel groups */
+};
+
+/* convert jump group to flow table ID in tunnel rules */
+struct tunnel_tbl_entry {
+	struct mlx5_hlist_entry hash;
+	uint32_t flow_table;
+};
+
+static inline uint32_t
+tunnel_id_to_flow_tbl(uint32_t id)
+{
+	return id | (1u << 16);
+}
+
+static inline uint32_t
+tunnel_flow_tbl_to_id(uint32_t flow_tbl)
+{
+	return flow_tbl & ~(1u << 16);
+}
+
+union tunnel_tbl_key {
+	uint64_t val;
+	struct {
+		uint32_t tunnel_id;
+		uint32_t group;
+	};
+};
+
+static inline struct mlx5_flow_tunnel_hub *
+mlx5_tunnel_hub(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	return priv->sh->tunnel_hub;
+}
+
+static inline bool
+is_tunnel_offload_active(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	return !!priv->config.dv_miss_info;
+}
+
+static inline bool
+is_flow_tunnel_match_rule(__rte_unused struct rte_eth_dev *dev,
+			  __rte_unused const struct rte_flow_attr *attr,
+			  __rte_unused const struct rte_flow_item items[],
+			  __rte_unused const struct rte_flow_action actions[])
+{
+	return (items[0].type == (typeof(items[0].type))
+				 MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL);
+}
+
+static inline bool
+is_flow_tunnel_steer_rule(__rte_unused struct rte_eth_dev *dev,
+			  __rte_unused const struct rte_flow_attr *attr,
+			  __rte_unused const struct rte_flow_item items[],
+			  __rte_unused const struct rte_flow_action actions[])
+{
+	return (actions[0].type == (typeof(actions[0].type))
+				   MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET);
+}
+
+static inline const struct mlx5_flow_tunnel *
+flow_actions_to_tunnel(const struct rte_flow_action actions[])
+{
+	return actions[0].conf;
+}
+
+static inline const struct mlx5_flow_tunnel *
+flow_items_to_tunnel(const struct rte_flow_item items[])
+{
+	return items[0].spec;
+}
+
 /* Flow structure. */
 struct rte_flow {
 	ILIST_ENTRY(uint32_t)next; /**< Index to the next flow structure. */
@@ -920,12 +1035,14 @@ struct rte_flow {
 	/**< Device flow handles that are part of the flow. */
 	uint32_t drv_type:2; /**< Driver type. */
 	uint32_t fdir:1; /**< Identifier of associated FDIR if any. */
+	uint32_t tunnel:1;
 	uint32_t hairpin_flow_id:HAIRPIN_FLOW_ID_BITS;
 	/**< The flow id used for hairpin. */
 	uint32_t copy_applied:1; /**< The MARK copy Flow os applied. */
 	uint32_t rix_mreg_copy;
 	/**< Index to metadata register copy table resource. */
 	uint32_t counter; /**< Holds flow counter. */
+	uint32_t tunnel_id;  /**< Tunnel id */
 	uint16_t meter; /**< Holds flow meter id. */
 } __rte_packed;
 
@@ -1008,9 +1125,54 @@ void mlx5_flow_id_pool_release(struct mlx5_flow_id_pool *pool);
 uint32_t mlx5_flow_id_get(struct mlx5_flow_id_pool *pool, uint32_t *id);
 uint32_t mlx5_flow_id_release(struct mlx5_flow_id_pool *pool,
 			      uint32_t id);
-int mlx5_flow_group_to_table(const struct rte_flow_attr *attributes,
-			     bool external, uint32_t group, bool fdb_def_rule,
-			     uint32_t *table, struct rte_flow_error *error);
+__extension__
+struct flow_grp_info {
+	uint64_t external:1;
+	uint64_t transfer:1;
+	uint64_t fdb_def_rule:1;
+	/* force standard group translation */
+	uint64_t std_tbl_fix:1;
+};
+
+static inline bool
+tunnel_use_standard_attr_group_translate
+		    (struct rte_eth_dev *dev,
+		     const struct mlx5_flow_tunnel *tunnel,
+		     const struct rte_flow_attr *attr,
+		     const struct rte_flow_item items[],
+		     const struct rte_flow_action actions[])
+{
+	bool verdict;
+
+	if (!is_tunnel_offload_active(dev))
+		/* no tunnel offload API */
+		verdict = true;
+	else if (tunnel) {
+		/*
+		 * OvS will use jump to group 0 in tunnel steer rule.
+		 * If tunnel steer rule starts from group 0 (attr.group == 0)
+		 * that 0 group must be translated with the standard method.
+		 * attr.group == 0 in a tunnel match rule is translated with
+		 * the tunnel method
+		 */
+		verdict = !attr->group &&
+			  is_flow_tunnel_steer_rule(dev, attr, items, actions);
+	} else {
+		/*
+		 * non-tunnel group translation uses standard method for
+		 * root group only: attr.group == 0
+		 */
+		verdict = !attr->group;
+	}
+
+	return verdict;
+}
+
+int mlx5_flow_group_to_table(struct rte_eth_dev *dev,
+			     const struct mlx5_flow_tunnel *tunnel,
+			     uint32_t group, uint32_t *table,
+			     struct flow_grp_info flags,
+			     struct rte_flow_error *error);
 uint64_t mlx5_flow_hashfields_adjust(struct mlx5_flow_rss_desc *rss_desc,
 				     int tunnel, uint64_t layer_types,
 				     uint64_t hash_fields);
@@ -1144,4 +1306,6 @@ int mlx5_flow_destroy_policer_rules(struct rte_eth_dev *dev,
 				    const struct rte_flow_attr *attr);
 int mlx5_flow_meter_flush(struct rte_eth_dev *dev,
 			  struct rte_mtr_error *error);
+void mlx5_release_tunnel_hub(struct mlx5_dev_ctx_shared *sh, uint16_t port_id);
+int mlx5_alloc_tunnel_hub(struct mlx5_dev_ctx_shared *sh);
 #endif /* RTE_PMD_MLX5_FLOW_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index d3a3f23b80..39f97abb12 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -3947,14 +3947,21 @@ flow_dv_validate_action_modify_ttl(const uint64_t action_flags,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-flow_dv_validate_action_jump(const struct rte_flow_action *action,
+flow_dv_validate_action_jump(struct rte_eth_dev *dev,
+			     const struct mlx5_flow_tunnel *tunnel,
+			     const struct rte_flow_action *action,
 			     uint64_t action_flags,
 			     const struct rte_flow_attr *attributes,
 			     bool external, struct rte_flow_error *error)
 {
 	uint32_t target_group, table;
 	int ret = 0;
-
+	struct flow_grp_info grp_info = {
+		.external = !!external,
+		.transfer = !!attributes->transfer,
+		.fdb_def_rule = 1,
+		.std_tbl_fix = 0
+	};
 	if (action_flags & (MLX5_FLOW_FATE_ACTIONS |
 			    MLX5_FLOW_FATE_ESWITCH_ACTIONS))
 		return rte_flow_error_set(error, EINVAL,
@@ -3977,11 +3984,13 @@ flow_dv_validate_action_jump(const struct rte_flow_action *action,
 					  NULL, "action configuration not set");
 	target_group =
 		((const struct rte_flow_action_jump *)action->conf)->group;
-	ret = mlx5_flow_group_to_table(attributes, external, target_group,
-				       true, &table, error);
+	ret = mlx5_flow_group_to_table(dev, tunnel, target_group, &table,
+				       grp_info, error);
 	if (ret)
 		return ret;
-	if (attributes->group == target_group)
+	if (attributes->group == target_group &&
+	    !(action_flags & (MLX5_FLOW_ACTION_TUNNEL_SET |
+			      MLX5_FLOW_ACTION_TUNNEL_MATCH)))
 		return rte_flow_error_set(error, EINVAL,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "target group must be other than"
@@ -5419,8 +5428,9 @@ flow_dv_counter_release(struct rte_eth_dev *dev, uint32_t counter)
  */
 static int
 flow_dv_validate_attributes(struct rte_eth_dev *dev,
+			    const struct mlx5_flow_tunnel *tunnel,
 			    const struct rte_flow_attr *attributes,
-			    bool external __rte_unused,
+			    struct flow_grp_info grp_info,
 			    struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -5436,9 +5446,8 @@ flow_dv_validate_attributes(struct rte_eth_dev *dev,
 #else
 	uint32_t table = 0;
 
-	ret = mlx5_flow_group_to_table(attributes, external,
-				       attributes->group, !!priv->fdb_def_rule,
-				       &table, error);
+	ret = mlx5_flow_group_to_table(dev, tunnel, attributes->group, &table,
+				       grp_info, error);
 	if (ret)
 		return ret;
 	if (!table)
@@ -5552,10 +5561,28 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 	const struct rte_flow_item_vlan *vlan_m = NULL;
 	int16_t rw_act_num = 0;
 	uint64_t is_root;
+	const struct mlx5_flow_tunnel *tunnel;
+	struct flow_grp_info grp_info = {
+		.external = !!external,
+		.transfer = !!attr->transfer,
+		.fdb_def_rule = !!priv->fdb_def_rule,
+	};
 
 	if (items == NULL)
 		return -1;
-	ret = flow_dv_validate_attributes(dev, attr, external, error);
+	if (is_flow_tunnel_match_rule(dev, attr, items, actions)) {
+		tunnel = flow_items_to_tunnel(items);
+		action_flags |= MLX5_FLOW_ACTION_TUNNEL_MATCH |
+				MLX5_FLOW_ACTION_DECAP;
+	} else if (is_flow_tunnel_steer_rule(dev, attr, items, actions)) {
+		tunnel = flow_actions_to_tunnel(actions);
+		action_flags |= MLX5_FLOW_ACTION_TUNNEL_SET;
+	} else {
+		tunnel = NULL;
+	}
+	grp_info.std_tbl_fix = tunnel_use_standard_attr_group_translate
+				(dev, tunnel, attr, items, actions);
+	ret = flow_dv_validate_attributes(dev, tunnel, attr, grp_info, error);
 	if (ret < 0)
 		return ret;
 	is_root = (uint64_t)ret;
@@ -5568,6 +5595,15 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 						  RTE_FLOW_ERROR_TYPE_ITEM,
 						  NULL, "item not supported");
 		switch (type) {
+		case MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL:
+			if (items[0].type != (typeof(items[0].type))
+						MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL)
+				return rte_flow_error_set
+						(error, EINVAL,
+						RTE_FLOW_ERROR_TYPE_ITEM,
+						NULL, "MLX5 private items "
+						"must be the first");
+			break;
 		case RTE_FLOW_ITEM_TYPE_VOID:
 			break;
 		case RTE_FLOW_ITEM_TYPE_PORT_ID:
@@ -6153,7 +6189,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			rw_act_num += MLX5_ACT_NUM_MDF_TTL;
 			break;
 		case RTE_FLOW_ACTION_TYPE_JUMP:
-			ret = flow_dv_validate_action_jump(actions,
+			ret = flow_dv_validate_action_jump(dev, tunnel, actions,
 							   action_flags,
 							   attr, external,
 							   error);
@@ -6262,6 +6298,17 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			action_flags |= MLX5_FLOW_ACTION_SAMPLE;
 			++actions_n;
 			break;
+		case MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET:
+			if (actions[0].type != (typeof(actions[0].type))
+				MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET)
+				return rte_flow_error_set
+						(error, EINVAL,
+						RTE_FLOW_ERROR_TYPE_ACTION,
+						NULL, "MLX5 private action "
+						"must be the first");
+
+			action_flags |= MLX5_FLOW_ACTION_TUNNEL_SET;
+			break;
 		default:
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ACTION,
@@ -6269,6 +6316,54 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 						  "action not supported");
 		}
 	}
+	/*
+	 * Validate actions in flow rules
+	 * - Explicit decap action is prohibited by the tunnel offload API.
+	 * - Drop action in tunnel steer rule is prohibited by the API.
+	 * - Application cannot use MARK action because its value can mask
+	 *   tunnel default miss notification.
+	 * - JUMP in tunnel match rule has no support in current PMD
+	 *   implementation.
+	 * - TAG & META are reserved for future uses.
+	 */
+	if (action_flags & MLX5_FLOW_ACTION_TUNNEL_SET) {
+		uint64_t bad_actions_mask = MLX5_FLOW_ACTION_DECAP    |
+					    MLX5_FLOW_ACTION_MARK     |
+					    MLX5_FLOW_ACTION_SET_TAG  |
+					    MLX5_FLOW_ACTION_SET_META |
+					    MLX5_FLOW_ACTION_DROP;
+
+		if (action_flags & bad_actions_mask)
+			return rte_flow_error_set
+					(error, EINVAL,
+					RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					"Invalid RTE action in tunnel "
+					"set decap rule");
+		if (!(action_flags & MLX5_FLOW_ACTION_JUMP))
+			return rte_flow_error_set
+					(error, EINVAL,
+					RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					"tunnel set decap rule must terminate "
+					"with JUMP");
+		if (!attr->ingress)
+			return rte_flow_error_set
+					(error, EINVAL,
+					RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					"tunnel flows for ingress traffic only");
+	}
+	if (action_flags & MLX5_FLOW_ACTION_TUNNEL_MATCH) {
+		uint64_t bad_actions_mask = MLX5_FLOW_ACTION_JUMP    |
+					    MLX5_FLOW_ACTION_MARK    |
+					    MLX5_FLOW_ACTION_SET_TAG |
+					    MLX5_FLOW_ACTION_SET_META;
+
+		if (action_flags & bad_actions_mask)
+			return rte_flow_error_set
+					(error, EINVAL,
+					RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					"Invalid RTE action in tunnel "
+					"set match rule");
+	}
 	/*
 	 * Validate the drop action mutual exclusion with other actions.
 	 * Drop action is mutually-exclusive with any other action, except for
@@ -8135,6 +8230,9 @@ static struct mlx5_flow_tbl_resource *
 flow_dv_tbl_resource_get(struct rte_eth_dev *dev,
 			 uint32_t table_id, uint8_t egress,
 			 uint8_t transfer,
+			 bool external,
+			 const struct mlx5_flow_tunnel *tunnel,
+			 uint32_t group_id,
 			 struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -8171,6 +8269,9 @@ flow_dv_tbl_resource_get(struct rte_eth_dev *dev,
 		return NULL;
 	}
 	tbl_data->idx = idx;
+	tbl_data->tunnel = tunnel;
+	tbl_data->group_id = group_id;
+	tbl_data->external = external;
 	tbl = &tbl_data->tbl;
 	pos = &tbl_data->entry;
 	if (transfer)
@@ -8234,6 +8335,41 @@ flow_dv_tbl_resource_release(struct rte_eth_dev *dev,
 
 		mlx5_flow_os_destroy_flow_tbl(tbl->obj);
 		tbl->obj = NULL;
+		if (is_tunnel_offload_active(dev) && tbl_data->external) {
+			struct mlx5_hlist_entry *he;
+			struct mlx5_hlist *tunnel_grp_hash;
+			struct mlx5_flow_tunnel_hub *thub =
+							mlx5_tunnel_hub(dev);
+			union tunnel_tbl_key tunnel_key = {
+				.tunnel_id = tbl_data->tunnel ?
+						tbl_data->tunnel->tunnel_id : 0,
+				.group = tbl_data->group_id
+			};
+			union mlx5_flow_tbl_key table_key = {
+				.v64 = pos->key
+			};
+			uint32_t table_id = table_key.table_id;
+
+			tunnel_grp_hash = tbl_data->tunnel ?
+						tbl_data->tunnel->groups :
+						thub->groups;
+			he = mlx5_hlist_lookup(tunnel_grp_hash, tunnel_key.val);
+			if (he) {
+				struct tunnel_tbl_entry *tte;
+				tte = container_of(he, typeof(*tte), hash);
+				MLX5_ASSERT(tte->flow_table == table_id);
+				mlx5_hlist_remove(tunnel_grp_hash, he);
+				mlx5_free(tte);
+			}
+			mlx5_flow_id_release(mlx5_tunnel_hub(dev)->table_ids,
+					     tunnel_flow_tbl_to_id(table_id));
+			DRV_LOG(DEBUG,
+				"port %u release table_id %#x tunnel %u group %u",
+				dev->data->port_id, table_id,
+				tbl_data->tunnel ?
+				tbl_data->tunnel->tunnel_id : 0,
+				tbl_data->group_id);
+		}
 		/* remove the entry from the hash list and free memory. */
 		mlx5_hlist_remove(sh->flow_tbls, pos);
 		mlx5_ipool_free(priv->sh->ipool[MLX5_IPOOL_JUMP],
@@ -8279,7 +8415,7 @@ flow_dv_matcher_register(struct rte_eth_dev *dev,
 	int ret;
 
 	tbl = flow_dv_tbl_resource_get(dev, key->table_id, key->direction,
-				       key->domain, error);
+				       key->domain, false, NULL, 0, error);
 	if (!tbl)
 		return -rte_errno;	/* No need to refill the error info */
 	tbl_data = container_of(tbl, struct mlx5_flow_tbl_data_entry, tbl);
@@ -8774,7 +8910,8 @@ flow_dv_sample_resource_register(struct rte_eth_dev *dev,
 	*cache_resource = *resource;
 	/* Create normal path table level */
 	tbl = flow_dv_tbl_resource_get(dev, next_ft_id,
-					attr->egress, attr->transfer, error);
+					attr->egress, attr->transfer,
+					dev_flow->external, NULL, 0, error);
 	if (!tbl) {
 		rte_flow_error_set(error, ENOMEM,
 					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
@@ -9386,6 +9523,12 @@ __flow_dv_translate(struct rte_eth_dev *dev,
 	int tmp_actions_n = 0;
 	uint32_t table;
 	int ret = 0;
+	const struct mlx5_flow_tunnel *tunnel;
+	struct flow_grp_info grp_info = {
+		.external = !!dev_flow->external,
+		.transfer = !!attr->transfer,
+		.fdb_def_rule = !!priv->fdb_def_rule,
+	};
 
 	memset(&mdest_res, 0, sizeof(struct mlx5_flow_dv_dest_array_resource));
 	memset(&sample_res, 0, sizeof(struct mlx5_flow_dv_sample_resource));
@@ -9393,8 +9536,17 @@ __flow_dv_translate(struct rte_eth_dev *dev,
 					   MLX5DV_FLOW_TABLE_TYPE_NIC_RX;
 	/* update normal path action resource into last index of array */
 	sample_act = &mdest_res.sample_act[MLX5_MAX_DEST_NUM - 1];
-	ret = mlx5_flow_group_to_table(attr, dev_flow->external, attr->group,
-				       !!priv->fdb_def_rule, &table, error);
+	tunnel = is_flow_tunnel_match_rule(dev, attr, items, actions) ?
+		 flow_items_to_tunnel(items) :
+		 is_flow_tunnel_steer_rule(dev, attr, items, actions) ?
+		 flow_actions_to_tunnel(actions) :
+		 dev_flow->tunnel ? dev_flow->tunnel : NULL;
+	mhdr_res->ft_type = attr->egress ? MLX5DV_FLOW_TABLE_TYPE_NIC_TX :
+					   MLX5DV_FLOW_TABLE_TYPE_NIC_RX;
+	grp_info.std_tbl_fix = tunnel_use_standard_attr_group_translate
+				(dev, tunnel, attr, items, actions);
+	ret = mlx5_flow_group_to_table(dev, tunnel, attr->group, &table,
+				       grp_info, error);
 	if (ret)
 		return ret;
 	dev_flow->dv.group = table;
@@ -9404,6 +9556,45 @@ __flow_dv_translate(struct rte_eth_dev *dev,
 		priority = dev_conf->flow_prio - 1;
 	/* number of actions must be set to 0 in case of dirty stack. */
 	mhdr_res->actions_num = 0;
+	if (is_flow_tunnel_match_rule(dev, attr, items, actions)) {
+		/*
+		 * do not add decap action if match rule drops packet
+		 * HW rejects rules with decap & drop
+		 */
+		bool add_decap = true;
+		const struct rte_flow_action *ptr = actions;
+		struct mlx5_flow_tbl_resource *tbl;
+
+		for (; ptr->type != RTE_FLOW_ACTION_TYPE_END; ptr++) {
+			if (ptr->type == RTE_FLOW_ACTION_TYPE_DROP) {
+				add_decap = false;
+				break;
+			}
+		}
+		if (add_decap) {
+			if (flow_dv_create_action_l2_decap(dev, dev_flow,
+							   attr->transfer,
+							   error))
+				return -rte_errno;
+			dev_flow->dv.actions[actions_n++] =
+					dev_flow->dv.encap_decap->action;
+			action_flags |= MLX5_FLOW_ACTION_DECAP;
+		}
+		/*
+		 * bind table_id with <group, table> for tunnel match rule.
+		 * Tunnel set rule establishes that binding in JUMP action handler.
+		 * Required for scenario when application creates tunnel match
+		 * rule before tunnel set rule.
+		 */
+		tbl = flow_dv_tbl_resource_get(dev, table, attr->egress,
+					       attr->transfer,
+					       !!dev_flow->external, tunnel,
+					       attr->group, error);
+		if (!tbl)
+			return rte_flow_error_set
+			       (error, EINVAL, RTE_FLOW_ERROR_TYPE_ACTION,
+			       actions, "cannot register tunnel group");
+	}
 	for (; !actions_end ; actions++) {
 		const struct rte_flow_action_queue *queue;
 		const struct rte_flow_action_rss *rss;
@@ -9424,6 +9615,9 @@ __flow_dv_translate(struct rte_eth_dev *dev,
 						  actions,
 						  "action not supported");
 		switch (action_type) {
+		case MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET:
+			action_flags |= MLX5_FLOW_ACTION_TUNNEL_SET;
+			break;
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			break;
 		case RTE_FLOW_ACTION_TYPE_PORT_ID:
@@ -9667,18 +9861,18 @@ __flow_dv_translate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_JUMP:
 			jump_group = ((const struct rte_flow_action_jump *)
 							action->conf)->group;
-			if (dev_flow->external && jump_group <
-					MLX5_MAX_TABLES_EXTERNAL)
-				jump_group *= MLX5_FLOW_TABLE_FACTOR;
-			ret = mlx5_flow_group_to_table(attr, dev_flow->external,
+			grp_info.std_tbl_fix = 0;
+			ret = mlx5_flow_group_to_table(dev, tunnel,
 						       jump_group,
-						       !!priv->fdb_def_rule,
-						       &table, error);
+						       &table,
+						       grp_info, error);
 			if (ret)
 				return ret;
-			tbl = flow_dv_tbl_resource_get(dev, table,
-						       attr->egress,
-						       attr->transfer, error);
+			tbl = flow_dv_tbl_resource_get(dev, table, attr->egress,
+						       attr->transfer,
+						       !!dev_flow->external,
+						       tunnel, jump_group,
+						       error);
 			if (!tbl)
 				return rte_flow_error_set
 						(error, errno,
@@ -11149,7 +11343,8 @@ flow_dv_prepare_mtr_tables(struct rte_eth_dev *dev,
 		dtb = &mtb->ingress;
 	/* Create the meter table with METER level. */
 	dtb->tbl = flow_dv_tbl_resource_get(dev, MLX5_FLOW_TABLE_LEVEL_METER,
-					    egress, transfer, &error);
+					    egress, transfer, false, NULL, 0,
+					    &error);
 	if (!dtb->tbl) {
 		DRV_LOG(ERR, "Failed to create meter policer table.");
 		return -1;
@@ -11157,7 +11352,8 @@ flow_dv_prepare_mtr_tables(struct rte_eth_dev *dev,
 	/* Create the meter suffix table with SUFFIX level. */
 	dtb->sfx_tbl = flow_dv_tbl_resource_get(dev,
 					    MLX5_FLOW_TABLE_LEVEL_SUFFIX,
-					    egress, transfer, &error);
+					    egress, transfer, false, NULL, 0,
+					    &error);
 	if (!dtb->sfx_tbl) {
 		DRV_LOG(ERR, "Failed to create meter suffix table.");
 		return -1;
-- 
2.28.0
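
The group-to-table translation introduced by this patch has two paths: the
pre-existing standard one and a tunnel one that tags allocated table ids with
a dedicated bit so that a miss mark can be mapped back to a tunnel group. A
standalone C sketch of the arithmetic only - the helpers are restated from
the patch, while the per-group id allocation from the tunnel-hub id pool and
the MLX5_FLOW_TABLE_FACTOR scaling of external groups are omitted:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Tunnel path: tunnel-group flow tables carry bit 16 (restated from
     * tunnel_id_to_flow_tbl()/tunnel_flow_tbl_to_id() in mlx5_flow.h). */
    static uint32_t tunnel_id_to_flow_tbl(uint32_t id)  { return id | (1u << 16); }
    static uint32_t tunnel_flow_tbl_to_id(uint32_t tbl) { return tbl & ~(1u << 16); }

    /* Standard path: with the FDB default rule installed, external transfer
     * rules are shifted past table 0 (restated from flow_group_to_table()). */
    static uint32_t std_group_to_table(uint32_t group, bool transfer,
                                       bool external, bool fdb_def_rule)
    {
            return (transfer && external && fdb_def_rule) ? group + 1 : group;
    }

    int main(void)
    {
            uint32_t id = 7; /* assumed value taken from the tunnel id pool */
            uint32_t tbl = tunnel_id_to_flow_tbl(id);

            printf("pool id %u -> tunnel table %#x -> back to id %u\n",
                   id, tbl, tunnel_flow_tbl_to_id(tbl));
            printf("regular FDB group 5 -> table %u\n",
                   std_group_to_table(5, true, true, true));
            return 0;
    }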


^ permalink raw reply	[flat|nested] 95+ messages in thread

* [dpdk-dev] [PATCH v2] net/mlx5: implement tunnel offload API
  2020-06-25 16:03 [dpdk-dev] [PATCH 0/2] ethdev: tunnel offload model Gregory Etelson
                   ` (11 preceding siblings ...)
  2020-10-21  9:22 ` [dpdk-dev] [PATCH] net/mlx5: implement tunnel offload API Gregory Etelson
@ 2020-10-22 16:00 ` Gregory Etelson
  2020-10-23 13:49 ` [dpdk-dev] [PATCH v3] " Gregory Etelson
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 95+ messages in thread
From: Gregory Etelson @ 2020-10-22 16:00 UTC (permalink / raw)
  To: dev
  Cc: getelson, matan, rasland, elibr, ozsh, Viacheslav Ovsiienko,
	Shahaf Shuler

Tunnel Offload API provides hardware independent, unified model
to offload tunneled traffic. Key model elements are:
 - apply matches to both outer and inner packet headers
   during entire offload procedure;
 - restore outer header of partially offloaded packet;
 - model is implemented as a set of helper functions.

Implementation details:
* tunnel_offload PMD parameter must be set to 1 to enable the feature.
* application cannot use MARK and META flow actions with tunnel.
* offload JUMP action is restricted to steering tunnel rule only.
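
A minimal sketch of the miss/restore path behind the last point above
(illustrative only - the union layout is copied from the patch, while the
table id value and the mbuf handling are assumed examples):

    #include <stdint.h>
    #include <stdio.h>

    /* MARK value the PMD programs into the tunnel default-miss rule;
     * layout copied from union tunnel_offload_mark in mlx5_flow.c. */
    union tunnel_offload_mark {
            uint32_t val;
            struct {
                    uint32_t app_reserve:8; /* kept free for application marks */
                    uint32_t table_id:15;   /* flow table of the tunnel group */
                    uint32_t transfer:1;    /* FDB (E-Switch) vs NIC Rx domain */
                    uint32_t _unused_:8;
            };
    };

    int main(void)
    {
            union tunnel_offload_mark tx = { .val = 0 };

            tx.table_id = 0x2a; /* assumed table id for the example */
            tx.transfer = 1;    /* rule lives in the FDB domain */
            printf("mark in default-miss rule: %#x\n", tx.val);
            /* On a partial miss the value comes back in mbuf->hash.fdir.hi
             * (PKT_RX_FDIR | PKT_RX_FDIR_ID set) and is decoded back into
             * the tunnel and group id for rte_flow_get_restore_info(). */
            union tunnel_offload_mark rx = { .val = tx.val };
            printf("decoded: table_id=%#x transfer=%u\n",
                   (unsigned)rx.table_id, (unsigned)rx.transfer);
            return 0;
    }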

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
v2:
* fix rebase conflicts
---
 doc/guides/nics/mlx5.rst         |   3 +
 drivers/net/mlx5/linux/mlx5_os.c |  18 +
 drivers/net/mlx5/mlx5.c          |   8 +-
 drivers/net/mlx5/mlx5.h          |   3 +
 drivers/net/mlx5/mlx5_defs.h     |   2 +
 drivers/net/mlx5/mlx5_flow.c     | 682 ++++++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_flow.h     | 170 +++++++-
 drivers/net/mlx5/mlx5_flow_dv.c  | 252 ++++++++++--
 8 files changed, 1085 insertions(+), 53 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 1a8808e854..678d2a9597 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -831,6 +831,9 @@ Driver options
     24 bits. The actual supported width can be retrieved in runtime by
     series of rte_flow_validate() trials.
 
+  - 3, this engages tunnel offload mode. In E-Switch configuration, that
+    mode implicitly activates ``dv_xmeta_en=1``.
+
   +------+-----------+-----------+-------------+-------------+
   | Mode | ``MARK``  | ``META``  | ``META`` Tx | FDB/Through |
   +======+===========+===========+=============+=============+
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 40f9446d43..ed3f020d82 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -291,6 +291,12 @@ mlx5_alloc_shared_dr(struct mlx5_priv *priv)
 		sh->esw_drop_action = mlx5_glue->dr_create_flow_action_drop();
 	}
 #endif
+	if (!sh->tunnel_hub)
+		err = mlx5_alloc_tunnel_hub(sh);
+	if (err) {
+		DRV_LOG(ERR, "mlx5_alloc_tunnel_hub failed err=%d", err);
+		goto error;
+	}
 	if (priv->config.reclaim_mode == MLX5_RCM_AGGR) {
 		mlx5_glue->dr_reclaim_domain_memory(sh->rx_domain, 1);
 		mlx5_glue->dr_reclaim_domain_memory(sh->tx_domain, 1);
@@ -335,6 +341,10 @@ mlx5_alloc_shared_dr(struct mlx5_priv *priv)
 		mlx5_hlist_destroy(sh->tag_table, NULL, NULL);
 		sh->tag_table = NULL;
 	}
+	if (sh->tunnel_hub) {
+		mlx5_release_tunnel_hub(sh, priv->dev_port);
+		sh->tunnel_hub = NULL;
+	}
 	mlx5_free_table_hash_list(priv);
 	return err;
 }
@@ -391,6 +401,10 @@ mlx5_os_free_shared_dr(struct mlx5_priv *priv)
 		mlx5_hlist_destroy(sh->tag_table, NULL, NULL);
 		sh->tag_table = NULL;
 	}
+	if (sh->tunnel_hub) {
+		mlx5_release_tunnel_hub(sh, priv->dev_port);
+		sh->tunnel_hub = NULL;
+	}
 	mlx5_free_table_hash_list(priv);
 }
 
@@ -733,6 +747,10 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 			strerror(rte_errno));
 		goto error;
 	}
+	if (config->dv_miss_info) {
+		if (switch_info->master || switch_info->representor)
+			config->dv_xmeta_en = MLX5_XMETA_MODE_META16;
+	}
 	mlx5_malloc_mem_select(config->sys_mem_en);
 	sh = mlx5_alloc_shared_dev_ctx(spawn, config);
 	if (!sh)
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index e4ce9a9cb7..25048df070 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1618,13 +1618,17 @@ mlx5_args_check(const char *key, const char *val, void *opaque)
 	} else if (strcmp(MLX5_DV_XMETA_EN, key) == 0) {
 		if (tmp != MLX5_XMETA_MODE_LEGACY &&
 		    tmp != MLX5_XMETA_MODE_META16 &&
-		    tmp != MLX5_XMETA_MODE_META32) {
+		    tmp != MLX5_XMETA_MODE_META32 &&
+		    tmp != MLX5_XMETA_MODE_MISS_INFO) {
 			DRV_LOG(ERR, "invalid extensive "
 				     "metadata parameter");
 			rte_errno = EINVAL;
 			return -rte_errno;
 		}
-		config->dv_xmeta_en = tmp;
+		if (tmp != MLX5_XMETA_MODE_MISS_INFO)
+			config->dv_xmeta_en = tmp;
+		else
+			config->dv_miss_info = 1;
 	} else if (strcmp(MLX5_LACP_BY_USER, key) == 0) {
 		config->lacp_by_user = !!tmp;
 	} else if (strcmp(MLX5_MR_EXT_MEMSEG_EN, key) == 0) {
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index c9d5d71630..479a815eed 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -208,6 +208,7 @@ struct mlx5_dev_config {
 	unsigned int rt_timestamp:1; /* realtime timestamp format. */
 	unsigned int sys_mem_en:1; /* The default memory allocator. */
 	unsigned int decap_en:1; /* Whether decap will be used or not. */
+	unsigned int dv_miss_info:1; /* Restore packet after partial hw miss. */
 	struct {
 		unsigned int enabled:1; /* Whether MPRQ is enabled. */
 		unsigned int stride_num_n; /* Number of strides. */
@@ -644,6 +645,8 @@ struct mlx5_dev_ctx_shared {
 	/* UAR same-page access control required in 32bit implementations. */
 #endif
 	struct mlx5_hlist *flow_tbls;
+	struct rte_hash *flow_tbl_map; /* app group-to-flow table map */
+	struct mlx5_flow_tunnel_hub *tunnel_hub;
 	/* Direct Rules tables for FDB, NIC TX+RX */
 	void *esw_drop_action; /* Pointer to DR E-Switch drop action. */
 	void *pop_vlan_action; /* Pointer to DR pop VLAN action. */
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index 0df47391ee..41a7537d5e 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -165,6 +165,8 @@
 #define MLX5_XMETA_MODE_LEGACY 0
 #define MLX5_XMETA_MODE_META16 1
 #define MLX5_XMETA_MODE_META32 2
+/* Provide info on partial hw miss. Implies MLX5_XMETA_MODE_META16 */
+#define MLX5_XMETA_MODE_MISS_INFO 3
 
 /* MLX5_TX_DB_NC supported values. */
 #define MLX5_TXDB_CACHED 0
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index d7243a878b..21004452e3 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -19,6 +19,7 @@
 #include <rte_flow_driver.h>
 #include <rte_malloc.h>
 #include <rte_ip.h>
+#include <rte_hash.h>
 
 #include <mlx5_glue.h>
 #include <mlx5_devx_cmds.h>
@@ -32,6 +33,18 @@
 #include "mlx5_rxtx.h"
 #include "mlx5_common_os.h"
 
+static struct mlx5_flow_tunnel *
+mlx5_find_tunnel_id(struct rte_eth_dev *dev, uint32_t id);
+static void
+mlx5_flow_tunnel_free(struct rte_eth_dev *dev, struct mlx5_flow_tunnel *tunnel);
+static const struct mlx5_flow_tbl_data_entry  *
+tunnel_mark_decode(struct rte_eth_dev *dev, uint32_t mark);
+static int
+mlx5_get_flow_tunnel(struct rte_eth_dev *dev,
+		     const struct rte_flow_tunnel *app_tunnel,
+		     struct mlx5_flow_tunnel **tunnel);
+
+
 /** Device flow drivers. */
 extern const struct mlx5_flow_driver_ops mlx5_flow_verbs_drv_ops;
 
@@ -548,6 +561,163 @@ static const struct mlx5_flow_expand_node mlx5_support_expansion[] = {
 	},
 };
 
+static inline bool
+mlx5_flow_tunnel_validate(struct rte_eth_dev *dev,
+			  struct rte_flow_tunnel *tunnel,
+			  const char *err_msg)
+{
+	err_msg = NULL;
+	if (!is_tunnel_offload_active(dev)) {
+		err_msg = "tunnel offload was not activated";
+		goto out;
+	} else if (!tunnel) {
+		err_msg = "no application tunnel";
+		goto out;
+	}
+
+	switch (tunnel->type) {
+	default:
+		err_msg = "unsupported tunnel type";
+		goto out;
+	case RTE_FLOW_ITEM_TYPE_VXLAN:
+		break;
+	}
+
+out:
+	return !err_msg;
+}
+
+
+static int
+mlx5_flow_tunnel_decap_set(struct rte_eth_dev *dev,
+		    struct rte_flow_tunnel *app_tunnel,
+		    struct rte_flow_action **actions,
+		    uint32_t *num_of_actions,
+		    struct rte_flow_error *error)
+{
+	int ret;
+	struct mlx5_flow_tunnel *tunnel;
+	const char *err_msg = NULL;
+	bool verdict = mlx5_flow_tunnel_validate(dev, app_tunnel, err_msg);
+
+	if (!verdict)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION_CONF, NULL,
+					  err_msg);
+	ret = mlx5_get_flow_tunnel(dev, app_tunnel, &tunnel);
+	if (ret < 0) {
+		return rte_flow_error_set(error, ret,
+					  RTE_FLOW_ERROR_TYPE_ACTION_CONF, NULL,
+					  "failed to initialize pmd tunnel");
+	}
+	*actions = &tunnel->action;
+	*num_of_actions = 1;
+	return 0;
+}
+
+static int
+mlx5_flow_tunnel_match(struct rte_eth_dev *dev,
+		       struct rte_flow_tunnel *app_tunnel,
+		       struct rte_flow_item **items,
+		       uint32_t *num_of_items,
+		       struct rte_flow_error *error)
+{
+	int ret;
+	struct mlx5_flow_tunnel *tunnel;
+	const char *err_msg = NULL;
+	bool verdict = mlx5_flow_tunnel_validate(dev, app_tunnel, &err_msg);
+
+	if (!verdict)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_HANDLE, NULL,
+					  err_msg);
+	ret = mlx5_get_flow_tunnel(dev, app_tunnel, &tunnel);
+	if (ret < 0) {
+		return rte_flow_error_set(error, ret,
+					  RTE_FLOW_ERROR_TYPE_HANDLE, NULL,
+					  "failed to initialize pmd tunnel");
+	}
+	*items = &tunnel->item;
+	*num_of_items = 1;
+	return 0;
+}
+
+static int
+mlx5_flow_item_release(struct rte_eth_dev *dev,
+		       struct rte_flow_item *pmd_items,
+		       uint32_t num_items, struct rte_flow_error *err)
+{
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_tunnel *tun;
+
+	LIST_FOREACH(tun, &thub->tunnels, chain) {
+		if (&tun->item == pmd_items)
+			break;
+	}
+	if (!tun || num_items != 1)
+		return rte_flow_error_set(err, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_HANDLE, NULL,
+					  "invalid argument");
+	if (!__atomic_sub_fetch(&tun->refctn, 1, __ATOMIC_RELAXED))
+		mlx5_flow_tunnel_free(dev, tun);
+	return 0;
+}
+
+static int
+mlx5_flow_action_release(struct rte_eth_dev *dev,
+			 struct rte_flow_action *pmd_actions,
+			 uint32_t num_actions, struct rte_flow_error *err)
+{
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_tunnel *tun;
+
+	LIST_FOREACH(tun, &thub->tunnels, chain) {
+		if (&tun->action == pmd_actions)
+			break;
+	}
+	if (!tun || num_actions != 1)
+		return rte_flow_error_set(err, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_HANDLE, NULL,
+					  "invalid argument");
+	if (!__atomic_sub_fetch(&tun->refctn, 1, __ATOMIC_RELAXED))
+		mlx5_flow_tunnel_free(dev, tun);
+
+	return 0;
+}
+
+static int
+mlx5_flow_tunnel_get_restore_info(struct rte_eth_dev *dev,
+				  struct rte_mbuf *m,
+				  struct rte_flow_restore_info *info,
+				  struct rte_flow_error *err)
+{
+	uint64_t ol_flags = m->ol_flags;
+	const struct mlx5_flow_tbl_data_entry *tble;
+	const uint64_t mask = PKT_RX_FDIR | PKT_RX_FDIR_ID;
+
+	if ((ol_flags & mask) != mask)
+		goto err;
+	tble = tunnel_mark_decode(dev, m->hash.fdir.hi);
+	if (!tble) {
+		DRV_LOG(DEBUG, "port %u invalid miss tunnel mark %#x",
+			dev->data->port_id, m->hash.fdir.hi);
+		goto err;
+	}
+	MLX5_ASSERT(tble->tunnel);
+	memcpy(&info->tunnel, &tble->tunnel->app_tunnel, sizeof(info->tunnel));
+	info->group_id = tble->group_id;
+	info->flags = RTE_FLOW_RESTORE_INFO_TUNNEL |
+		      RTE_FLOW_RESTORE_INFO_GROUP_ID |
+		      RTE_FLOW_RESTORE_INFO_ENCAPSULATED;
+
+	return 0;
+
+err:
+	return rte_flow_error_set(err, EINVAL,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				  "failed to get restore info");
+}
+
 static const struct rte_flow_ops mlx5_flow_ops = {
 	.validate = mlx5_flow_validate,
 	.create = mlx5_flow_create,
@@ -557,6 +727,11 @@ static const struct rte_flow_ops mlx5_flow_ops = {
 	.query = mlx5_flow_query,
 	.dev_dump = mlx5_flow_dev_dump,
 	.get_aged_flows = mlx5_flow_get_aged_flows,
+	.tunnel_decap_set = mlx5_flow_tunnel_decap_set,
+	.tunnel_match = mlx5_flow_tunnel_match,
+	.tunnel_action_decap_release = mlx5_flow_action_release,
+	.tunnel_item_release = mlx5_flow_item_release,
+	.get_restore_info = mlx5_flow_tunnel_get_restore_info,
 };
 
 /* Convert FDIR request to Generic flow. */
@@ -3882,6 +4057,142 @@ flow_hairpin_split(struct rte_eth_dev *dev,
 	return 0;
 }
 
+__extension__
+union tunnel_offload_mark {
+	uint32_t val;
+	struct {
+		uint32_t app_reserve:8;
+		uint32_t table_id:15;
+		uint32_t transfer:1;
+		uint32_t _unused_:8;
+	};
+};
+
+struct tunnel_default_miss_ctx {
+	uint16_t *queue;
+	__extension__
+	union {
+		struct rte_flow_action_rss action_rss;
+		struct rte_flow_action_queue miss_queue;
+		struct rte_flow_action_jump miss_jump;
+		uint8_t raw[0];
+	};
+};
+
+static int
+flow_tunnel_add_default_miss(struct rte_eth_dev *dev,
+			     struct rte_flow *flow,
+			     const struct rte_flow_attr *attr,
+			     const struct rte_flow_action *app_actions,
+			     uint32_t flow_idx,
+			     struct tunnel_default_miss_ctx *ctx,
+			     struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow *dev_flow;
+	struct rte_flow_attr miss_attr = *attr;
+	const struct mlx5_flow_tunnel *tunnel = app_actions[0].conf;
+	const struct rte_flow_item miss_items[2] = {
+		{
+			.type = RTE_FLOW_ITEM_TYPE_ETH,
+			.spec = NULL,
+			.last = NULL,
+			.mask = NULL
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+			.spec = NULL,
+			.last = NULL,
+			.mask = NULL
+		}
+	};
+	union tunnel_offload_mark mark_id;
+	struct rte_flow_action_mark miss_mark;
+	struct rte_flow_action miss_actions[3] = {
+		[0] = { .type = RTE_FLOW_ACTION_TYPE_MARK, .conf = &miss_mark },
+		[2] = { .type = RTE_FLOW_ACTION_TYPE_END,  .conf = NULL }
+	};
+	const struct rte_flow_action_jump *jump_data;
+	uint32_t i, flow_table = 0; /* prevent compilation warning */
+	struct flow_grp_info grp_info = {
+		.external = 1,
+		.transfer = attr->transfer,
+		.fdb_def_rule = !!priv->fdb_def_rule,
+		.std_tbl_fix = 0,
+	};
+	int ret;
+
+	if (!attr->transfer) {
+		uint32_t q_size;
+
+		miss_actions[1].type = RTE_FLOW_ACTION_TYPE_RSS;
+		q_size = priv->reta_idx_n * sizeof(ctx->queue[0]);
+		ctx->queue = mlx5_malloc(MLX5_MEM_SYS | MLX5_MEM_ZERO, q_size,
+					 0, SOCKET_ID_ANY);
+		if (!ctx->queue)
+			return rte_flow_error_set
+				(error, ENOMEM,
+				RTE_FLOW_ERROR_TYPE_ACTION_CONF,
+				NULL, "invalid default miss RSS");
+		ctx->action_rss.func = RTE_ETH_HASH_FUNCTION_DEFAULT;
+		ctx->action_rss.level = 0;
+		ctx->action_rss.types = priv->rss_conf.rss_hf;
+		ctx->action_rss.key_len = priv->rss_conf.rss_key_len;
+		ctx->action_rss.queue_num = priv->reta_idx_n;
+		ctx->action_rss.key = priv->rss_conf.rss_key;
+		ctx->action_rss.queue = ctx->queue;
+		if (!priv->reta_idx_n || !priv->rxqs_n)
+			return rte_flow_error_set
+				(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION_CONF,
+				NULL, "invalid port configuration");
+		if (!(dev->data->dev_conf.rxmode.mq_mode & ETH_MQ_RX_RSS_FLAG))
+			ctx->action_rss.types = 0;
+		for (i = 0; i != priv->reta_idx_n; ++i)
+			ctx->queue[i] = (*priv->reta_idx)[i];
+	} else {
+		miss_actions[1].type = RTE_FLOW_ACTION_TYPE_JUMP;
+		ctx->miss_jump.group = MLX5_TNL_MISS_FDB_JUMP_GRP;
+	}
+	miss_actions[1].conf = (typeof(miss_actions[1].conf))ctx->raw;
+	for (; app_actions->type != RTE_FLOW_ACTION_TYPE_JUMP; app_actions++);
+	jump_data = app_actions->conf;
+	miss_attr.priority = MLX5_TNL_MISS_RULE_PRIORITY;
+	miss_attr.group = jump_data->group;
+	ret = mlx5_flow_group_to_table(dev, tunnel, jump_data->group,
+				       &flow_table, grp_info, error);
+	if (ret)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION_CONF,
+					  NULL, "invalid tunnel id");
+	mark_id.app_reserve = 0;
+	mark_id.table_id = tunnel_flow_tbl_to_id(flow_table);
+	mark_id.transfer = !!attr->transfer;
+	mark_id._unused_ = 0;
+	miss_mark.id = mark_id.val;
+	dev_flow = flow_drv_prepare(dev, flow, &miss_attr,
+				    miss_items, miss_actions, flow_idx, error);
+	if (!dev_flow)
+		return -rte_errno;
+	dev_flow->flow = flow;
+	dev_flow->external = true;
+	dev_flow->tunnel = tunnel;
+	/* Subflow object was created, we must include one in the list. */
+	SILIST_INSERT(&flow->dev_handles, dev_flow->handle_idx,
+		      dev_flow->handle, next);
+	DRV_LOG(DEBUG,
+		"port %u tunnel type=%d id=%u miss rule priority=%u group=%u",
+		dev->data->port_id, tunnel->app_tunnel.type,
+		tunnel->tunnel_id, miss_attr.priority, miss_attr.group);
+	ret = flow_drv_translate(dev, dev_flow, &miss_attr, miss_items,
+				  miss_actions, error);
+	if (!ret)
+		ret = flow_mreg_update_copy_table(dev, flow, miss_actions,
+						  error);
+
+	return ret;
+}
+
 /**
  * The last stage of splitting chain, just creates the subflow
  * without any modification.
@@ -5004,6 +5315,27 @@ flow_create_split_outer(struct rte_eth_dev *dev,
 	return ret;
 }
 
+static struct mlx5_flow_tunnel *
+flow_tunnel_from_rule(struct rte_eth_dev *dev,
+		      const struct rte_flow_attr *attr,
+		      const struct rte_flow_item items[],
+		      const struct rte_flow_action actions[])
+{
+	struct mlx5_flow_tunnel *tunnel;
+
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wcast-qual"
+	if (is_flow_tunnel_match_rule(dev, attr, items, actions))
+		tunnel = (struct mlx5_flow_tunnel *)items[0].spec;
+	else if (is_flow_tunnel_steer_rule(dev, attr, items, actions))
+		tunnel = (struct mlx5_flow_tunnel *)actions[0].conf;
+	else
+		tunnel = NULL;
+#pragma GCC diagnostic pop
+
+	return tunnel;
+}
+
 /**
  * Create a flow and add it to @p list.
  *
@@ -5065,11 +5397,11 @@ flow_list_create(struct rte_eth_dev *dev, uint32_t *list,
 	uint32_t hairpin_id = 0;
 	struct rte_flow_attr attr_tx = { .priority = 0 };
 	struct rte_flow_attr attr_factor = {0};
+	struct mlx5_flow_tunnel *tunnel;
+	struct tunnel_default_miss_ctx default_miss_ctx = { 0, };
 	int ret;
 
 	memcpy((void *)&attr_factor, (const void *)attr, sizeof(*attr));
-	if (external)
-		attr_factor.group *= MLX5_FLOW_TABLE_FACTOR;
 	hairpin_flow = flow_check_hairpin_split(dev, &attr_factor, actions);
 	ret = flow_drv_validate(dev, &attr_factor, items, p_actions_rx,
 				external, hairpin_flow, error);
@@ -5141,6 +5473,19 @@ flow_list_create(struct rte_eth_dev *dev, uint32_t *list,
 					      error);
 		if (ret < 0)
 			goto error;
+		if (is_flow_tunnel_steer_rule(dev, attr,
+					      buf->entry[i].pattern,
+					      p_actions_rx)) {
+			ret = flow_tunnel_add_default_miss(dev, flow, attr,
+							   p_actions_rx,
+							   idx,
+							   &default_miss_ctx,
+							   error);
+			if (ret < 0) {
+				mlx5_free(default_miss_ctx.queue);
+				goto error;
+			}
+		}
 	}
 	/* Create the tx flow. */
 	if (hairpin_flow) {
@@ -5195,6 +5540,13 @@ flow_list_create(struct rte_eth_dev *dev, uint32_t *list,
 	priv->flow_idx = priv->flow_nested_idx;
 	if (priv->flow_nested_idx)
 		priv->flow_nested_idx = 0;
+	tunnel = flow_tunnel_from_rule(dev, attr, items, actions);
+	if (tunnel) {
+		flow->tunnel = 1;
+		flow->tunnel_id = tunnel->tunnel_id;
+		__atomic_add_fetch(&tunnel->refctn, 1, __ATOMIC_RELAXED);
+		mlx5_free(default_miss_ctx.queue);
+	}
 	return idx;
 error:
 	MLX5_ASSERT(flow);
@@ -5314,6 +5666,7 @@ mlx5_flow_create(struct rte_eth_dev *dev,
 				   "port not started");
 		return NULL;
 	}
+
 	return (void *)(uintptr_t)flow_list_create(dev, &priv->flows,
 				  attr, items, actions, true, error);
 }
@@ -5368,6 +5721,13 @@ flow_list_destroy(struct rte_eth_dev *dev, uint32_t *list,
 		}
 	}
 	mlx5_ipool_free(priv->sh->ipool[MLX5_IPOOL_RTE_FLOW], flow_idx);
+	if (flow->tunnel) {
+		struct mlx5_flow_tunnel *tunnel;
+		tunnel = mlx5_find_tunnel_id(dev, flow->tunnel_id);
+		RTE_VERIFY(tunnel);
+		if (!__atomic_sub_fetch(&tunnel->refctn, 1, __ATOMIC_RELAXED))
+			mlx5_flow_tunnel_free(dev, tunnel);
+	}
 }
 
 /**
@@ -6902,19 +7262,122 @@ mlx5_flow_async_pool_query_handle(struct mlx5_dev_ctx_shared *sh,
 	sh->cmng.pending_queries--;
 }
 
+static const struct mlx5_flow_tbl_data_entry  *
+tunnel_mark_decode(struct rte_eth_dev *dev, uint32_t mark)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_dev_ctx_shared *sh = priv->sh;
+	struct mlx5_hlist_entry *he;
+	union tunnel_offload_mark mbits = { .val = mark };
+	union mlx5_flow_tbl_key table_key = {
+		{
+			.table_id = tunnel_id_to_flow_tbl(mbits.table_id),
+			.reserved = 0,
+			.domain = !!mbits.transfer,
+			.direction = 0,
+		}
+	};
+	he = mlx5_hlist_lookup(sh->flow_tbls, table_key.v64);
+	return he ?
+	       container_of(he, struct mlx5_flow_tbl_data_entry, entry) : NULL;
+}
+
+static uint32_t
+tunnel_flow_group_to_flow_table(struct rte_eth_dev *dev,
+				const struct mlx5_flow_tunnel *tunnel,
+				uint32_t group, uint32_t *table,
+				struct rte_flow_error *error)
+{
+	struct mlx5_hlist_entry *he;
+	struct tunnel_tbl_entry *tte;
+	union tunnel_tbl_key key = {
+		.tunnel_id = tunnel ? tunnel->tunnel_id : 0,
+		.group = group
+	};
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_hlist *group_hash;
+
+	group_hash = tunnel ? tunnel->groups : thub->groups;
+	he = mlx5_hlist_lookup(group_hash, key.val);
+	if (!he) {
+		int ret;
+		tte = mlx5_malloc(MLX5_MEM_SYS | MLX5_MEM_ZERO,
+				  sizeof(*tte), 0,
+				  SOCKET_ID_ANY);
+		if (!tte)
+			goto err;
+		tte->hash.key = key.val;
+		ret = mlx5_flow_id_get(thub->table_ids, &tte->flow_table);
+		if (ret) {
+			mlx5_free(tte);
+			goto err;
+		}
+		tte->flow_table = tunnel_id_to_flow_tbl(tte->flow_table);
+		mlx5_hlist_insert(group_hash, &tte->hash);
+	} else {
+		tte = container_of(he, typeof(*tte), hash);
+	}
+	*table = tte->flow_table;
+	DRV_LOG(DEBUG, "port %u tunnel %u group=%#x table=%#x",
+		dev->data->port_id, key.tunnel_id, group, *table);
+	return 0;
+
+err:
+	return rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
+				  NULL, "tunnel group index not supported");
+}
+
+static int
+flow_group_to_table(uint32_t port_id, uint32_t group, uint32_t *table,
+		    struct flow_grp_info grp_info, struct rte_flow_error *error)
+{
+	if (grp_info.transfer && grp_info.external && grp_info.fdb_def_rule) {
+		if (group == UINT32_MAX)
+			return rte_flow_error_set
+						(error, EINVAL,
+						 RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
+						 NULL,
+						 "group index not supported");
+		*table = group + 1;
+	} else {
+		*table = group;
+	}
+	DRV_LOG(DEBUG, "port %u group=%#x table=%#x", port_id, group, *table);
+	return 0;
+}
+
 /**
  * Translate the rte_flow group index to HW table value.
  *
- * @param[in] attributes
- *   Pointer to flow attributes
- * @param[in] external
- *   Value is part of flow rule created by request external to PMD.
+ * If tunnel offload is disabled, all group ids are converted to flow
+ * table ids using the standard method.
+ * If tunnel offload is enabled, group id can be converted using the
+ * standard or tunnel conversion method. Group conversion method
+ * selection depends on flags in `grp_info` parameter:
+ * - Internal (grp_info.external == 0) groups conversion uses the
+ *   standard method.
+ * - Group ids in JUMP actions are converted with the tunnel method.
+ * - Group id in rule attribute conversion depends on a rule type and
+ *   group id value:
+ *   ** non-zero group attributes are converted with the tunnel method
+ *   ** zero group attribute in non-tunnel rule is converted using the
+ *      standard method - there's only one root table
+ *   ** zero group attribute in steer tunnel rule is converted with the
+ *      standard method - single root table
+ *   ** zero group attribute in match tunnel rule is a special OvS
+ *      case: that value is used for portability reasons. That group
+ *      id is converted with the tunnel conversion method.
+ *
+ * @param[in] dev
+ *   Port device
+ * @param[in] tunnel
+ *   PMD tunnel offload object
  * @param[in] group
  *   rte_flow group index value.
- * @param[out] fdb_def_rule
- *   Whether fdb jump to table 1 is configured.
  * @param[out] table
  *   HW table value.
+ * @param[in] grp_info
+ *   flags used for conversion
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -6922,22 +7385,36 @@ mlx5_flow_async_pool_query_handle(struct mlx5_dev_ctx_shared *sh,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_flow_group_to_table(const struct rte_flow_attr *attributes, bool external,
-			 uint32_t group, bool fdb_def_rule, uint32_t *table,
+mlx5_flow_group_to_table(struct rte_eth_dev *dev,
+			 const struct mlx5_flow_tunnel *tunnel,
+			 uint32_t group, uint32_t *table,
+			 struct flow_grp_info grp_info,
 			 struct rte_flow_error *error)
 {
-	if (attributes->transfer && external && fdb_def_rule) {
-		if (group == UINT32_MAX)
-			return rte_flow_error_set
-						(error, EINVAL,
-						 RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
-						 NULL,
-						 "group index not supported");
-		*table = group + 1;
+	int ret;
+	bool standard_translation;
+
+	if (grp_info.external && group < MLX5_MAX_TABLES_EXTERNAL)
+		group *= MLX5_FLOW_TABLE_FACTOR;
+	if (is_tunnel_offload_active(dev)) {
+		standard_translation = !grp_info.external ||
+					grp_info.std_tbl_fix;
 	} else {
-		*table = group;
+		standard_translation = true;
 	}
-	return 0;
+	DRV_LOG(DEBUG,
+		"port %u group=%#x transfer=%d external=%d fdb_def_rule=%d translate=%s",
+		dev->data->port_id, group, grp_info.transfer,
+		grp_info.external, grp_info.fdb_def_rule,
+		standard_translation ? "STANDARD" : "TUNNEL");
+	if (standard_translation)
+		ret = flow_group_to_table(dev->data->port_id, group, table,
+					  grp_info, error);
+	else
+		ret = tunnel_flow_group_to_flow_table(dev, tunnel, group,
+						      table, error);
+
+	return ret;
 }
 
 /**
@@ -7081,3 +7558,168 @@ mlx5_flow_get_aged_flows(struct rte_eth_dev *dev, void **contexts,
 		 dev->data->port_id);
 	return -ENOTSUP;
 }
+
+static void
+mlx5_flow_tunnel_free(struct rte_eth_dev *dev,
+		      struct mlx5_flow_tunnel *tunnel)
+{
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_id_pool *id_pool = thub->tunnel_ids;
+
+	DRV_LOG(DEBUG, "port %u release pmd tunnel id=0x%x",
+		dev->data->port_id, tunnel->tunnel_id);
+	RTE_VERIFY(!__atomic_load_n(&tunnel->refctn, __ATOMIC_RELAXED));
+	LIST_REMOVE(tunnel, chain);
+	mlx5_flow_id_release(id_pool, tunnel->tunnel_id);
+	mlx5_hlist_destroy(tunnel->groups, NULL, NULL);
+	mlx5_free(tunnel);
+}
+
+static struct mlx5_flow_tunnel *
+mlx5_find_tunnel_id(struct rte_eth_dev *dev, uint32_t id)
+{
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_tunnel *tun;
+
+	LIST_FOREACH(tun, &thub->tunnels, chain) {
+		if (tun->tunnel_id == id)
+			break;
+	}
+
+	return tun;
+}
+
+static struct mlx5_flow_tunnel *
+mlx5_flow_tunnel_allocate(struct rte_eth_dev *dev,
+			  const struct rte_flow_tunnel *app_tunnel)
+{
+	int ret;
+	struct mlx5_flow_tunnel *tunnel;
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_id_pool *id_pool = thub->tunnel_ids;
+	uint32_t id;
+
+	ret = mlx5_flow_id_get(id_pool, &id);
+	if (ret)
+		return NULL;
+	/**
+	 * mlx5 flow tunnel is an auxiliary data structure.
+	 * It is not part of IO. No need to allocate it from
+	 * the huge-page pools dedicated for IO.
+	 */
+	tunnel = mlx5_malloc(MLX5_MEM_SYS | MLX5_MEM_ZERO, sizeof(*tunnel),
+			     0, SOCKET_ID_ANY);
+	if (!tunnel) {
+		mlx5_flow_id_release(id_pool, id);
+		return NULL;
+	}
+	tunnel->groups = mlx5_hlist_create("tunnel groups", 1024);
+	if (!tunnel->groups) {
+		mlx5_flow_id_release(id_pool, id);
+		mlx5_free(tunnel);
+		return NULL;
+	}
+	/* initiate new PMD tunnel */
+	memcpy(&tunnel->app_tunnel, app_tunnel, sizeof(*app_tunnel));
+	tunnel->tunnel_id = id;
+	tunnel->action.type = (typeof(tunnel->action.type))
+			      MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET;
+	tunnel->action.conf = tunnel;
+	tunnel->item.type = (typeof(tunnel->item.type))
+			    MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL;
+	tunnel->item.spec = tunnel;
+	tunnel->item.last = NULL;
+	tunnel->item.mask = NULL;
+
+	DRV_LOG(DEBUG, "port %u new pmd tunnel id=0x%x",
+		dev->data->port_id, tunnel->tunnel_id);
+
+	return tunnel;
+}
+
+static int
+mlx5_get_flow_tunnel(struct rte_eth_dev *dev,
+		     const struct rte_flow_tunnel *app_tunnel,
+		     struct mlx5_flow_tunnel **tunnel)
+{
+	int ret = 0;
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_tunnel *tun;
+
+	LIST_FOREACH(tun, &thub->tunnels, chain) {
+		if (!memcmp(app_tunnel, &tun->app_tunnel,
+			    sizeof(*app_tunnel))) {
+			*tunnel = tun;
+			ret = 0;
+			break;
+		}
+	}
+	if (!tun) {
+		tun = mlx5_flow_tunnel_allocate(dev, app_tunnel);
+		if (tun) {
+			LIST_INSERT_HEAD(&thub->tunnels, tun, chain);
+			*tunnel = tun;
+		} else {
+			ret = -ENOMEM;
+		}
+	}
+	if (tun)
+		__atomic_add_fetch(&tun->refctn, 1, __ATOMIC_RELAXED);
+
+	return ret;
+}
+
+void mlx5_release_tunnel_hub(struct mlx5_dev_ctx_shared *sh, uint16_t port_id)
+{
+	struct mlx5_flow_tunnel_hub *thub = sh->tunnel_hub;
+
+	if (!thub)
+		return;
+	if (!LIST_EMPTY(&thub->tunnels))
+		DRV_LOG(WARNING, "port %u tunnels present\n", port_id);
+	mlx5_flow_id_pool_release(thub->tunnel_ids);
+	mlx5_flow_id_pool_release(thub->table_ids);
+	mlx5_hlist_destroy(thub->groups, NULL, NULL);
+	mlx5_free(thub);
+}
+
+int mlx5_alloc_tunnel_hub(struct mlx5_dev_ctx_shared *sh)
+{
+	int err;
+	struct mlx5_flow_tunnel_hub *thub;
+
+	thub = mlx5_malloc(MLX5_MEM_SYS | MLX5_MEM_ZERO, sizeof(*thub),
+			   0, SOCKET_ID_ANY);
+	if (!thub)
+		return -ENOMEM;
+	LIST_INIT(&thub->tunnels);
+	thub->tunnel_ids = mlx5_flow_id_pool_alloc(MLX5_MAX_TUNNELS);
+	if (!thub->tunnel_ids) {
+		err = -rte_errno;
+		goto err;
+	}
+	thub->table_ids = mlx5_flow_id_pool_alloc(MLX5_MAX_TABLES);
+	if (!thub->table_ids) {
+		err = -rte_errno;
+		goto err;
+	}
+	thub->groups = mlx5_hlist_create("flow groups", MLX5_MAX_TABLES);
+	if (!thub->groups) {
+		err = -rte_errno;
+		goto err;
+	}
+	sh->tunnel_hub = thub;
+
+	return 0;
+
+err:
+	if (thub->groups)
+		mlx5_hlist_destroy(thub->groups, NULL, NULL);
+	if (thub->table_ids)
+		mlx5_flow_id_pool_release(thub->table_ids);
+	if (thub->tunnel_ids)
+		mlx5_flow_id_pool_release(thub->tunnel_ids);
+	if (thub)
+		mlx5_free(thub);
+	return err;
+}
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index b4be4769ef..ecb5f023d9 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -26,6 +26,7 @@ enum mlx5_rte_flow_item_type {
 	MLX5_RTE_FLOW_ITEM_TYPE_TAG,
 	MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
 	MLX5_RTE_FLOW_ITEM_TYPE_VLAN,
+	MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL,
 };
 
 /* Private (internal) rte flow actions. */
@@ -35,6 +36,7 @@ enum mlx5_rte_flow_action_type {
 	MLX5_RTE_FLOW_ACTION_TYPE_MARK,
 	MLX5_RTE_FLOW_ACTION_TYPE_COPY_MREG,
 	MLX5_RTE_FLOW_ACTION_TYPE_DEFAULT_MISS,
+	MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET,
 };
 
 /* Matches on selected register. */
@@ -201,6 +203,8 @@ enum mlx5_feature_name {
 #define MLX5_FLOW_ACTION_AGE (1ull << 34)
 #define MLX5_FLOW_ACTION_DEFAULT_MISS (1ull << 35)
 #define MLX5_FLOW_ACTION_SAMPLE (1ull << 36)
+#define MLX5_FLOW_ACTION_TUNNEL_SET (1ull << 37)
+#define MLX5_FLOW_ACTION_TUNNEL_MATCH (1ull << 38)
 
 #define MLX5_FLOW_FATE_ACTIONS \
 	(MLX5_FLOW_ACTION_DROP | MLX5_FLOW_ACTION_QUEUE | \
@@ -530,6 +534,10 @@ struct mlx5_flow_tbl_data_entry {
 	struct mlx5_flow_dv_jump_tbl_resource jump;
 	/**< jump resource, at most one for each table created. */
 	uint32_t idx; /**< index for the indexed mempool. */
+	/**< tunnel offload */
+	const struct mlx5_flow_tunnel *tunnel;
+	uint32_t group_id;
+	bool external;
 };
 
 /* Sub rdma-core actions list. */
@@ -768,6 +776,7 @@ struct mlx5_flow {
 	};
 	struct mlx5_flow_handle *handle;
 	uint32_t handle_idx; /* Index of the mlx5 flow handle memory. */
+	const struct mlx5_flow_tunnel *tunnel;
 };
 
 /* Flow meter state. */
@@ -913,6 +922,112 @@ struct mlx5_fdir_flow {
 
 #define HAIRPIN_FLOW_ID_BITS 28
 
+#define MLX5_MAX_TUNNELS 256
+#define MLX5_TNL_MISS_RULE_PRIORITY 3
+#define MLX5_TNL_MISS_FDB_JUMP_GRP  0x1234faac
+
+/*
+ * When tunnel offload is active, all JUMP group ids are converted
+ * using the same method. That conversion is applied both to tunnel and
+ * regular rule types.
+ * Group ids used in tunnel rules are relative to their tunnel (!).
+ * An application can create a number of steer rules using the same
+ * tunnel, with a different group id in each rule.
+ * Each tunnel stores its groups internally in PMD tunnel object.
+ * Groups used in regular rules do not belong to any tunnel and are stored
+ * in tunnel hub.
+ */
+
+struct mlx5_flow_tunnel {
+	LIST_ENTRY(mlx5_flow_tunnel) chain;
+	struct rte_flow_tunnel app_tunnel;	/** app tunnel copy */
+	uint32_t tunnel_id;			/** unique tunnel ID */
+	uint32_t refctn;
+	struct rte_flow_action action;
+	struct rte_flow_item item;
+	struct mlx5_hlist *groups;		/** tunnel groups */
+};
+
+/** PMD tunnel related context */
+struct mlx5_flow_tunnel_hub {
+	LIST_HEAD(, mlx5_flow_tunnel) tunnels;
+	struct mlx5_flow_id_pool *tunnel_ids;
+	struct mlx5_flow_id_pool *table_ids;
+	struct mlx5_hlist *groups;		/** non tunnel groups */
+};
+
+/* convert jump group to flow table ID in tunnel rules */
+struct tunnel_tbl_entry {
+	struct mlx5_hlist_entry hash;
+	uint32_t flow_table;
+};
+
+static inline uint32_t
+tunnel_id_to_flow_tbl(uint32_t id)
+{
+	return id | (1u << 16);
+}
+
+static inline uint32_t
+tunnel_flow_tbl_to_id(uint32_t flow_tbl)
+{
+	return flow_tbl & ~(1u << 16);
+}
+
+union tunnel_tbl_key {
+	uint64_t val;
+	struct {
+		uint32_t tunnel_id;
+		uint32_t group;
+	};
+};
+
+static inline struct mlx5_flow_tunnel_hub *
+mlx5_tunnel_hub(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	return priv->sh->tunnel_hub;
+}
+
+static inline bool
+is_tunnel_offload_active(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	return !!priv->config.dv_miss_info;
+}
+
+static inline bool
+is_flow_tunnel_match_rule(__rte_unused struct rte_eth_dev *dev,
+			  __rte_unused const struct rte_flow_attr *attr,
+			  __rte_unused const struct rte_flow_item items[],
+			  __rte_unused const struct rte_flow_action actions[])
+{
+	return (items[0].type == (typeof(items[0].type))
+				 MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL);
+}
+
+static inline bool
+is_flow_tunnel_steer_rule(__rte_unused struct rte_eth_dev *dev,
+			  __rte_unused const struct rte_flow_attr *attr,
+			  __rte_unused const struct rte_flow_item items[],
+			  __rte_unused const struct rte_flow_action actions[])
+{
+	return (actions[0].type == (typeof(actions[0].type))
+				   MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET);
+}
+
+static inline const struct mlx5_flow_tunnel *
+flow_actions_to_tunnel(const struct rte_flow_action actions[])
+{
+	return actions[0].conf;
+}
+
+static inline const struct mlx5_flow_tunnel *
+flow_items_to_tunnel(const struct rte_flow_item items[])
+{
+	return items[0].spec;
+}
+
 /* Flow structure. */
 struct rte_flow {
 	ILIST_ENTRY(uint32_t)next; /**< Index to the next flow structure. */
@@ -920,12 +1035,14 @@ struct rte_flow {
 	/**< Device flow handles that are part of the flow. */
 	uint32_t drv_type:2; /**< Driver type. */
 	uint32_t fdir:1; /**< Identifier of associated FDIR if any. */
+	uint32_t tunnel:1;
 	uint32_t hairpin_flow_id:HAIRPIN_FLOW_ID_BITS;
 	/**< The flow id used for hairpin. */
 	uint32_t copy_applied:1; /**< The MARK copy Flow os applied. */
 	uint32_t rix_mreg_copy;
 	/**< Index to metadata register copy table resource. */
 	uint32_t counter; /**< Holds flow counter. */
+	uint32_t tunnel_id;  /**< Tunnel id */
 	uint16_t meter; /**< Holds flow meter id. */
 } __rte_packed;
 
@@ -1008,9 +1125,54 @@ void mlx5_flow_id_pool_release(struct mlx5_flow_id_pool *pool);
 uint32_t mlx5_flow_id_get(struct mlx5_flow_id_pool *pool, uint32_t *id);
 uint32_t mlx5_flow_id_release(struct mlx5_flow_id_pool *pool,
 			      uint32_t id);
-int mlx5_flow_group_to_table(const struct rte_flow_attr *attributes,
-			     bool external, uint32_t group, bool fdb_def_rule,
-			     uint32_t *table, struct rte_flow_error *error);
+__extension__
+struct flow_grp_info {
+	uint64_t external:1;
+	uint64_t transfer:1;
+	uint64_t fdb_def_rule:1;
+	/* force standard group translation */
+	uint64_t std_tbl_fix:1;
+};
+
+static inline bool
+tunnel_use_standard_attr_group_translate
+		    (struct rte_eth_dev *dev,
+		     const struct mlx5_flow_tunnel *tunnel,
+		     const struct rte_flow_attr *attr,
+		     const struct rte_flow_item items[],
+		     const struct rte_flow_action actions[])
+{
+	bool verdict;
+
+	if (!is_tunnel_offload_active(dev))
+		/* no tunnel offload API */
+		verdict = true;
+	else if (tunnel) {
+		/*
+		 * OvS will use jump to group 0 in tunnel steer rule.
+		 * If tunnel steer rule starts from group 0 (attr.group == 0)
+		 * that 0 group must be translated with the standard method.
+		 * attr.group == 0 in a tunnel match rule is translated with
+		 * the tunnel method.
+		 */
+		verdict = !attr->group &&
+			  is_flow_tunnel_steer_rule(dev, attr, items, actions);
+	} else {
+		/*
+		 * non-tunnel group translation uses standard method for
+		 * root group only: attr.group == 0
+		 */
+		verdict = !attr->group;
+	}
+
+	return verdict;
+}
+
+int mlx5_flow_group_to_table(struct rte_eth_dev *dev,
+			     const struct mlx5_flow_tunnel *tunnel,
+			     uint32_t group, uint32_t *table,
+			     struct flow_grp_info flags,
+				 struct rte_flow_error *error);
 uint64_t mlx5_flow_hashfields_adjust(struct mlx5_flow_rss_desc *rss_desc,
 				     int tunnel, uint64_t layer_types,
 				     uint64_t hash_fields);
@@ -1145,4 +1307,6 @@ int mlx5_flow_destroy_policer_rules(struct rte_eth_dev *dev,
 int mlx5_flow_meter_flush(struct rte_eth_dev *dev,
 			  struct rte_mtr_error *error);
 int mlx5_flow_dv_discover_counter_offset_support(struct rte_eth_dev *dev);
+void mlx5_release_tunnel_hub(struct mlx5_dev_ctx_shared *sh, uint16_t port_id);
+int mlx5_alloc_tunnel_hub(struct mlx5_dev_ctx_shared *sh);
 #endif /* RTE_PMD_MLX5_FLOW_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 15cd34e331..59c9d3e433 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -3947,14 +3947,21 @@ flow_dv_validate_action_modify_ttl(const uint64_t action_flags,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-flow_dv_validate_action_jump(const struct rte_flow_action *action,
+flow_dv_validate_action_jump(struct rte_eth_dev *dev,
+			     const struct mlx5_flow_tunnel *tunnel,
+			     const struct rte_flow_action *action,
 			     uint64_t action_flags,
 			     const struct rte_flow_attr *attributes,
 			     bool external, struct rte_flow_error *error)
 {
 	uint32_t target_group, table;
 	int ret = 0;
-
+	struct flow_grp_info grp_info = {
+		.external = !!external,
+		.transfer = !!attributes->transfer,
+		.fdb_def_rule = 1,
+		.std_tbl_fix = 0
+	};
 	if (action_flags & (MLX5_FLOW_FATE_ACTIONS |
 			    MLX5_FLOW_FATE_ESWITCH_ACTIONS))
 		return rte_flow_error_set(error, EINVAL,
@@ -3977,11 +3984,13 @@ flow_dv_validate_action_jump(const struct rte_flow_action *action,
 					  NULL, "action configuration not set");
 	target_group =
 		((const struct rte_flow_action_jump *)action->conf)->group;
-	ret = mlx5_flow_group_to_table(attributes, external, target_group,
-				       true, &table, error);
+	ret = mlx5_flow_group_to_table(dev, tunnel, target_group, &table,
+				       grp_info, error);
 	if (ret)
 		return ret;
-	if (attributes->group == target_group)
+	if (attributes->group == target_group &&
+	    !(action_flags & (MLX5_FLOW_ACTION_TUNNEL_SET |
+			      MLX5_FLOW_ACTION_TUNNEL_MATCH)))
 		return rte_flow_error_set(error, EINVAL,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "target group must be other than"
@@ -5160,8 +5169,9 @@ flow_dv_counter_release(struct rte_eth_dev *dev, uint32_t counter)
  */
 static int
 flow_dv_validate_attributes(struct rte_eth_dev *dev,
+			    const struct mlx5_flow_tunnel *tunnel,
 			    const struct rte_flow_attr *attributes,
-			    bool external __rte_unused,
+			    struct flow_grp_info grp_info,
 			    struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -5177,9 +5187,8 @@ flow_dv_validate_attributes(struct rte_eth_dev *dev,
 #else
 	uint32_t table = 0;
 
-	ret = mlx5_flow_group_to_table(attributes, external,
-				       attributes->group, !!priv->fdb_def_rule,
-				       &table, error);
+	ret = mlx5_flow_group_to_table(dev, tunnel, attributes->group, &table,
+				       grp_info, error);
 	if (ret)
 		return ret;
 	if (!table)
@@ -5293,10 +5302,28 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 	const struct rte_flow_item_vlan *vlan_m = NULL;
 	int16_t rw_act_num = 0;
 	uint64_t is_root;
+	const struct mlx5_flow_tunnel *tunnel;
+	struct flow_grp_info grp_info = {
+		.external = !!external,
+		.transfer = !!attr->transfer,
+		.fdb_def_rule = !!priv->fdb_def_rule,
+	};
 
 	if (items == NULL)
 		return -1;
-	ret = flow_dv_validate_attributes(dev, attr, external, error);
+	if (is_flow_tunnel_match_rule(dev, attr, items, actions)) {
+		tunnel = flow_items_to_tunnel(items);
+		action_flags |= MLX5_FLOW_ACTION_TUNNEL_MATCH |
+				MLX5_FLOW_ACTION_DECAP;
+	} else if (is_flow_tunnel_steer_rule(dev, attr, items, actions)) {
+		tunnel = flow_actions_to_tunnel(actions);
+		action_flags |= MLX5_FLOW_ACTION_TUNNEL_SET;
+	} else {
+		tunnel = NULL;
+	}
+	grp_info.std_tbl_fix = tunnel_use_standard_attr_group_translate
+				(dev, tunnel, attr, items, actions);
+	ret = flow_dv_validate_attributes(dev, tunnel, attr, grp_info, error);
 	if (ret < 0)
 		return ret;
 	is_root = (uint64_t)ret;
@@ -5309,6 +5336,15 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 						  RTE_FLOW_ERROR_TYPE_ITEM,
 						  NULL, "item not supported");
 		switch (type) {
+		case MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL:
+			if (items[0].type != (typeof(items[0].type))
+						MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL)
+				return rte_flow_error_set
+						(error, EINVAL,
+						RTE_FLOW_ERROR_TYPE_ITEM,
+						NULL, "MLX5 private items "
+						"must be the first");
+			break;
 		case RTE_FLOW_ITEM_TYPE_VOID:
 			break;
 		case RTE_FLOW_ITEM_TYPE_PORT_ID:
@@ -5894,7 +5930,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			rw_act_num += MLX5_ACT_NUM_MDF_TTL;
 			break;
 		case RTE_FLOW_ACTION_TYPE_JUMP:
-			ret = flow_dv_validate_action_jump(actions,
+			ret = flow_dv_validate_action_jump(dev, tunnel, actions,
 							   action_flags,
 							   attr, external,
 							   error);
@@ -6003,6 +6039,17 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			action_flags |= MLX5_FLOW_ACTION_SAMPLE;
 			++actions_n;
 			break;
+		case MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET:
+			if (actions[0].type != (typeof(actions[0].type))
+				MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET)
+				return rte_flow_error_set
+						(error, EINVAL,
+						RTE_FLOW_ERROR_TYPE_ACTION,
+						NULL, "MLX5 private action "
+						"must be the first");
+
+			action_flags |= MLX5_FLOW_ACTION_TUNNEL_SET;
+			break;
 		default:
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ACTION,
@@ -6010,6 +6057,54 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 						  "action not supported");
 		}
 	}
+	/*
+	 * Validate actions in flow rules
+	 * - Explicit decap action is prohibited by the tunnel offload API.
+	 * - Drop action in tunnel steer rule is prohibited by the API.
+	 * - Application cannot use the MARK action because its value can mask
+	 *   the tunnel default miss notification.
+	 * - JUMP in tunnel match rule has no support in current PMD
+	 *   implementation.
+	 * - TAG & META are reserved for future uses.
+	 */
+	if (action_flags & MLX5_FLOW_ACTION_TUNNEL_SET) {
+		uint64_t bad_actions_mask = MLX5_FLOW_ACTION_DECAP    |
+					    MLX5_FLOW_ACTION_MARK     |
+					    MLX5_FLOW_ACTION_SET_TAG  |
+					    MLX5_FLOW_ACTION_SET_META |
+					    MLX5_FLOW_ACTION_DROP;
+
+		if (action_flags & bad_actions_mask)
+			return rte_flow_error_set
+					(error, EINVAL,
+					RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					"Invalid RTE action in tunnel "
+					"set decap rule");
+		if (!(action_flags & MLX5_FLOW_ACTION_JUMP))
+			return rte_flow_error_set
+					(error, EINVAL,
+					RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					"tunnel set decap rule must terminate "
+					"with JUMP");
+		if (!attr->ingress)
+			return rte_flow_error_set
+					(error, EINVAL,
+					RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					"tunnel flows for ingress traffic only");
+	}
+	if (action_flags & MLX5_FLOW_ACTION_TUNNEL_MATCH) {
+		uint64_t bad_actions_mask = MLX5_FLOW_ACTION_JUMP    |
+					    MLX5_FLOW_ACTION_MARK    |
+					    MLX5_FLOW_ACTION_SET_TAG |
+					    MLX5_FLOW_ACTION_SET_META;
+
+		if (action_flags & bad_actions_mask)
+			return rte_flow_error_set
+					(error, EINVAL,
+					RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					"Invalid RTE action in tunnel "
+					"set match rule");
+	}
 	/*
 	 * Validate the drop action mutual exclusion with other actions.
 	 * Drop action is mutually-exclusive with any other action, except for
@@ -7876,6 +7971,9 @@ static struct mlx5_flow_tbl_resource *
 flow_dv_tbl_resource_get(struct rte_eth_dev *dev,
 			 uint32_t table_id, uint8_t egress,
 			 uint8_t transfer,
+			 bool external,
+			 const struct mlx5_flow_tunnel *tunnel,
+			 uint32_t group_id,
 			 struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -7912,6 +8010,9 @@ flow_dv_tbl_resource_get(struct rte_eth_dev *dev,
 		return NULL;
 	}
 	tbl_data->idx = idx;
+	tbl_data->tunnel = tunnel;
+	tbl_data->group_id = group_id;
+	tbl_data->external = external;
 	tbl = &tbl_data->tbl;
 	pos = &tbl_data->entry;
 	if (transfer)
@@ -7975,6 +8076,41 @@ flow_dv_tbl_resource_release(struct rte_eth_dev *dev,
 
 		mlx5_flow_os_destroy_flow_tbl(tbl->obj);
 		tbl->obj = NULL;
+		if (is_tunnel_offload_active(dev) && tbl_data->external) {
+			struct mlx5_hlist_entry *he;
+			struct mlx5_hlist *tunnel_grp_hash;
+			struct mlx5_flow_tunnel_hub *thub =
+							mlx5_tunnel_hub(dev);
+			union tunnel_tbl_key tunnel_key = {
+				.tunnel_id = tbl_data->tunnel ?
+						tbl_data->tunnel->tunnel_id : 0,
+				.group = tbl_data->group_id
+			};
+			union mlx5_flow_tbl_key table_key = {
+				.v64 = pos->key
+			};
+			uint32_t table_id = table_key.table_id;
+
+			tunnel_grp_hash = tbl_data->tunnel ?
+						tbl_data->tunnel->groups :
+						thub->groups;
+			he = mlx5_hlist_lookup(tunnel_grp_hash, tunnel_key.val);
+			if (he) {
+				struct tunnel_tbl_entry *tte;
+				tte = container_of(he, typeof(*tte), hash);
+				MLX5_ASSERT(tte->flow_table == table_id);
+				mlx5_hlist_remove(tunnel_grp_hash, he);
+				mlx5_free(tte);
+			}
+			mlx5_flow_id_release(mlx5_tunnel_hub(dev)->table_ids,
+					     tunnel_flow_tbl_to_id(table_id));
+			DRV_LOG(DEBUG,
+				"port %u release table_id %#x tunnel %u group %u",
+				dev->data->port_id, table_id,
+				tbl_data->tunnel ?
+				tbl_data->tunnel->tunnel_id : 0,
+				tbl_data->group_id);
+		}
 		/* remove the entry from the hash list and free memory. */
 		mlx5_hlist_remove(sh->flow_tbls, pos);
 		mlx5_ipool_free(priv->sh->ipool[MLX5_IPOOL_JUMP],
@@ -8020,7 +8156,7 @@ flow_dv_matcher_register(struct rte_eth_dev *dev,
 	int ret;
 
 	tbl = flow_dv_tbl_resource_get(dev, key->table_id, key->direction,
-				       key->domain, error);
+				       key->domain, false, NULL, 0, error);
 	if (!tbl)
 		return -rte_errno;	/* No need to refill the error info */
 	tbl_data = container_of(tbl, struct mlx5_flow_tbl_data_entry, tbl);
@@ -8515,7 +8651,8 @@ flow_dv_sample_resource_register(struct rte_eth_dev *dev,
 	*cache_resource = *resource;
 	/* Create normal path table level */
 	tbl = flow_dv_tbl_resource_get(dev, next_ft_id,
-					attr->egress, attr->transfer, error);
+					attr->egress, attr->transfer,
+					dev_flow->external, NULL, 0, error);
 	if (!tbl) {
 		rte_flow_error_set(error, ENOMEM,
 					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
@@ -9127,6 +9264,12 @@ __flow_dv_translate(struct rte_eth_dev *dev,
 	int tmp_actions_n = 0;
 	uint32_t table;
 	int ret = 0;
+	const struct mlx5_flow_tunnel *tunnel;
+	struct flow_grp_info grp_info = {
+		.external = !!dev_flow->external,
+		.transfer = !!attr->transfer,
+		.fdb_def_rule = !!priv->fdb_def_rule,
+	};
 
 	memset(&mdest_res, 0, sizeof(struct mlx5_flow_dv_dest_array_resource));
 	memset(&sample_res, 0, sizeof(struct mlx5_flow_dv_sample_resource));
@@ -9134,8 +9277,17 @@ __flow_dv_translate(struct rte_eth_dev *dev,
 					   MLX5DV_FLOW_TABLE_TYPE_NIC_RX;
 	/* update normal path action resource into last index of array */
 	sample_act = &mdest_res.sample_act[MLX5_MAX_DEST_NUM - 1];
-	ret = mlx5_flow_group_to_table(attr, dev_flow->external, attr->group,
-				       !!priv->fdb_def_rule, &table, error);
+	tunnel = is_flow_tunnel_match_rule(dev, attr, items, actions) ?
+		 flow_items_to_tunnel(items) :
+		 is_flow_tunnel_steer_rule(dev, attr, items, actions) ?
+		 flow_actions_to_tunnel(actions) :
+		 dev_flow->tunnel ? dev_flow->tunnel : NULL;
+	mhdr_res->ft_type = attr->egress ? MLX5DV_FLOW_TABLE_TYPE_NIC_TX :
+					   MLX5DV_FLOW_TABLE_TYPE_NIC_RX;
+	grp_info.std_tbl_fix = tunnel_use_standard_attr_group_translate
+				(dev, tunnel, attr, items, actions);
+	ret = mlx5_flow_group_to_table(dev, tunnel, attr->group, &table,
+				       grp_info, error);
 	if (ret)
 		return ret;
 	dev_flow->dv.group = table;
@@ -9145,6 +9297,45 @@ __flow_dv_translate(struct rte_eth_dev *dev,
 		priority = dev_conf->flow_prio - 1;
 	/* number of actions must be set to 0 in case of dirty stack. */
 	mhdr_res->actions_num = 0;
+	if (is_flow_tunnel_match_rule(dev, attr, items, actions)) {
+		/*
+		 * do not add a decap action if the match rule drops the packet:
+		 * HW rejects rules with decap & drop
+		 */
+		bool add_decap = true;
+		const struct rte_flow_action *ptr = actions;
+		struct mlx5_flow_tbl_resource *tbl;
+
+		for (; ptr->type != RTE_FLOW_ACTION_TYPE_END; ptr++) {
+			if (ptr->type == RTE_FLOW_ACTION_TYPE_DROP) {
+				add_decap = false;
+				break;
+			}
+		}
+		if (add_decap) {
+			if (flow_dv_create_action_l2_decap(dev, dev_flow,
+							   attr->transfer,
+							   error))
+				return -rte_errno;
+			dev_flow->dv.actions[actions_n++] =
+					dev_flow->dv.encap_decap->action;
+			action_flags |= MLX5_FLOW_ACTION_DECAP;
+		}
+		/*
+		 * bind table_id with <group, table> for tunnel match rule.
+		 * The tunnel set rule establishes that binding in the JUMP
+		 * action handler. Required when the application creates the
+		 * tunnel match rule before the tunnel set rule.
+		 */
+		tbl = flow_dv_tbl_resource_get(dev, table, attr->egress,
+					       attr->transfer,
+					       !!dev_flow->external, tunnel,
+					       attr->group, error);
+		if (!tbl)
+			return rte_flow_error_set
+			       (error, EINVAL, RTE_FLOW_ERROR_TYPE_ACTION,
+			       actions, "cannot register tunnel group");
+	}
 	for (; !actions_end ; actions++) {
 		const struct rte_flow_action_queue *queue;
 		const struct rte_flow_action_rss *rss;
@@ -9165,6 +9356,9 @@ __flow_dv_translate(struct rte_eth_dev *dev,
 						  actions,
 						  "action not supported");
 		switch (action_type) {
+		case MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET:
+			action_flags |= MLX5_FLOW_ACTION_TUNNEL_SET;
+			break;
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			break;
 		case RTE_FLOW_ACTION_TYPE_PORT_ID:
@@ -9408,18 +9602,18 @@ __flow_dv_translate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_JUMP:
 			jump_group = ((const struct rte_flow_action_jump *)
 							action->conf)->group;
-			if (dev_flow->external && jump_group <
-					MLX5_MAX_TABLES_EXTERNAL)
-				jump_group *= MLX5_FLOW_TABLE_FACTOR;
-			ret = mlx5_flow_group_to_table(attr, dev_flow->external,
+			grp_info.std_tbl_fix = 0;
+			ret = mlx5_flow_group_to_table(dev, tunnel,
 						       jump_group,
-						       !!priv->fdb_def_rule,
-						       &table, error);
+						       &table,
+						       grp_info, error);
 			if (ret)
 				return ret;
-			tbl = flow_dv_tbl_resource_get(dev, table,
-						       attr->egress,
-						       attr->transfer, error);
+			tbl = flow_dv_tbl_resource_get(dev, table, attr->egress,
+						       attr->transfer,
+						       !!dev_flow->external,
+						       tunnel, jump_group,
+						       error);
 			if (!tbl)
 				return rte_flow_error_set
 						(error, errno,
@@ -10890,7 +11084,8 @@ flow_dv_prepare_mtr_tables(struct rte_eth_dev *dev,
 		dtb = &mtb->ingress;
 	/* Create the meter table with METER level. */
 	dtb->tbl = flow_dv_tbl_resource_get(dev, MLX5_FLOW_TABLE_LEVEL_METER,
-					    egress, transfer, &error);
+					    egress, transfer, false, NULL, 0,
+					    &error);
 	if (!dtb->tbl) {
 		DRV_LOG(ERR, "Failed to create meter policer table.");
 		return -1;
@@ -10898,7 +11093,8 @@ flow_dv_prepare_mtr_tables(struct rte_eth_dev *dev,
 	/* Create the meter suffix table with SUFFIX level. */
 	dtb->sfx_tbl = flow_dv_tbl_resource_get(dev,
 					    MLX5_FLOW_TABLE_LEVEL_SUFFIX,
-					    egress, transfer, &error);
+					    egress, transfer, false, NULL, 0,
+					    &error);
 	if (!dtb->sfx_tbl) {
 		DRV_LOG(ERR, "Failed to create meter suffix table.");
 		return -1;
@@ -11217,10 +11413,10 @@ mlx5_flow_dv_discover_counter_offset_support(struct rte_eth_dev *dev)
 	void *flow = NULL;
 	int i, ret = -1;
 
-	tbl = flow_dv_tbl_resource_get(dev, 0, 0, 0, NULL);
+	tbl = flow_dv_tbl_resource_get(dev, 0, 0, 0, false, NULL, 0, NULL);
 	if (!tbl)
 		goto err;
-	dest_tbl = flow_dv_tbl_resource_get(dev, 1, 0, 0, NULL);
+	dest_tbl = flow_dv_tbl_resource_get(dev, 1, 0, 0, false, NULL, 0, NULL);
 	if (!dest_tbl)
 		goto err;
 	dcs = mlx5_devx_cmd_flow_counter_alloc(priv->sh->ctx, 0x4);
-- 
2.28.0


^ permalink raw reply	[flat|nested] 95+ messages in thread

* [dpdk-dev] [PATCH v3] net/mlx5: implement tunnel offload API
  2020-06-25 16:03 [dpdk-dev] [PATCH 0/2] ethdev: tunnel offload model Gregory Etelson
                   ` (12 preceding siblings ...)
  2020-10-22 16:00 ` [dpdk-dev] [PATCH v2] " Gregory Etelson
@ 2020-10-23 13:49 ` Gregory Etelson
  2020-10-23 13:57 ` [dpdk-dev] [PATCH v4] " Gregory Etelson
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 95+ messages in thread
From: Gregory Etelson @ 2020-10-23 13:49 UTC (permalink / raw)
  To: dev
  Cc: getelson, matan, rasland, elibr, ozsh, Viacheslav Ovsiienko,
	Shahaf Shuler

The Tunnel Offload API provides a hardware-independent, unified model
to offload tunneled traffic. Key model elements are:
 - apply matches to both outer and inner packet headers
   during the entire offload procedure;
 - restore the outer header of a partially offloaded packet;
 - the model is implemented as a set of helper functions.

Implementation details:
* tunnel_offload PMD parameter must be set to 1 to enable the feature.
* application cannot use MARK and META flow actions with tunnel offload.
* offload JUMP action is restricted to steering tunnel rule only.
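
The sketch below shows the application-side view of the helpers wired up in
this patch (tunnel_decap_set, tunnel_match, get_restore_info): a tunnel
steering rule in group 0 built from the PMD tunnel-set action plus an
application JUMP, a tunnel matching rule in the jump group built from the
PMD tunnel item, and outer-header recovery for packets that miss the
matching rule. It is illustrative only and not part of the patch; the jump
group, queue index and the simplified error handling are assumptions made
for the example.

    #include <rte_flow.h>
    #include <rte_mbuf.h>

    /* example-only values: tunnel rules jump to group 3, inner traffic to queue 0 */
    #define TUN_JUMP_GROUP 3u

    static int
    tunnel_offload_example(uint16_t port_id, struct rte_flow_error *err)
    {
    	struct rte_flow_tunnel tunnel = {
    		.type = RTE_FLOW_ITEM_TYPE_VXLAN, /* the PMD accepts VXLAN only */
    	};
    	struct rte_flow_action_jump jump = { .group = TUN_JUMP_GROUP };
    	struct rte_flow_action_queue queue = { .index = 0 };
    	struct rte_flow_action *pmd_actions;
    	struct rte_flow_item *pmd_items;
    	uint32_t n_actions, n_items;
    	struct rte_flow *steer_rule, *match_rule;

    	/* tunnel steering rule in group 0: PMD tunnel-set action first,
    	 * terminated by JUMP; MARK, META, DROP and explicit decap are
    	 * rejected by validation */
    	if (rte_flow_tunnel_decap_set(port_id, &tunnel, &pmd_actions,
    				      &n_actions, err) || n_actions != 1)
    		return -1;
    	{
    		struct rte_flow_attr attr = { .ingress = 1, .group = 0 };
    		struct rte_flow_item pattern[] = {
    			{ .type = RTE_FLOW_ITEM_TYPE_ETH },
    			{ .type = RTE_FLOW_ITEM_TYPE_IPV4 },
    			{ .type = RTE_FLOW_ITEM_TYPE_UDP },
    			{ .type = RTE_FLOW_ITEM_TYPE_VXLAN },
    			{ .type = RTE_FLOW_ITEM_TYPE_END },
    		};
    		struct rte_flow_action actions[] = {
    			pmd_actions[0],
    			{ .type = RTE_FLOW_ACTION_TYPE_JUMP, .conf = &jump },
    			{ .type = RTE_FLOW_ACTION_TYPE_END },
    		};

    		steer_rule = rte_flow_create(port_id, &attr, pattern,
    					     actions, err);
    	}
    	rte_flow_tunnel_action_decap_release(port_id, pmd_actions,
    					     n_actions, err);
    	if (!steer_rule)
    		return -1;
    	/* tunnel matching rule in the jump group: PMD tunnel item first,
    	 * then inner headers; decap is added implicitly by the PMD */
    	if (rte_flow_tunnel_match(port_id, &tunnel, &pmd_items,
    				  &n_items, err) || n_items != 1)
    		return -1;
    	{
    		struct rte_flow_attr attr = { .ingress = 1,
    					      .group = TUN_JUMP_GROUP };
    		struct rte_flow_item pattern[] = {
    			pmd_items[0],
    			{ .type = RTE_FLOW_ITEM_TYPE_ETH }, /* inner L2 */
    			{ .type = RTE_FLOW_ITEM_TYPE_END },
    		};
    		struct rte_flow_action actions[] = {
    			{ .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
    			{ .type = RTE_FLOW_ACTION_TYPE_END },
    		};

    		match_rule = rte_flow_create(port_id, &attr, pattern,
    					     actions, err);
    	}
    	rte_flow_tunnel_item_release(port_id, pmd_items, n_items, err);
    	return match_rule ? 0 : -1;
    }

    /* datapath: recover outer header info for a packet that hit the steering
     * rule but missed all matching rules */
    static void
    tunnel_miss_example(uint16_t port_id, struct rte_mbuf *m)
    {
    	struct rte_flow_restore_info info;
    	struct rte_flow_error err;

    	if (rte_flow_get_restore_info(port_id, m, &info, &err))
    		return; /* not a tunnel offload miss */
    	if (info.flags & RTE_FLOW_RESTORE_INFO_TUNNEL) {
    		/* info.tunnel describes the outer headers; when
    		 * RTE_FLOW_RESTORE_INFO_ENCAPSULATED is set the packet
    		 * still carries them and can go to the software path as-is */
    	}
    }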

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
v2:
* fix rebase conflicts
v3:
* remove unused variables
---
 doc/guides/nics/mlx5.rst         |   3 +
 drivers/net/mlx5/linux/mlx5_os.c |  18 +
 drivers/net/mlx5/mlx5.c          |   8 +-
 drivers/net/mlx5/mlx5.h          |   3 +
 drivers/net/mlx5/mlx5_defs.h     |   2 +
 drivers/net/mlx5/mlx5_flow.c     | 682 ++++++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_flow.h     | 170 +++++++-
 drivers/net/mlx5/mlx5_flow_dv.c  | 254 ++++++++++--
 8 files changed, 1087 insertions(+), 53 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 1a8808e854..678d2a9597 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -831,6 +831,9 @@ Driver options
     24 bits. The actual supported width can be retrieved in runtime by
     series of rte_flow_validate() trials.
 
+  - 3, this engages tunnel offload mode. In E-Switch configuration, that
+    mode implicitly activates ``dv_xmeta_en=1``.
+
   +------+-----------+-----------+-------------+-------------+
   | Mode | ``MARK``  | ``META``  | ``META`` Tx | FDB/Through |
   +======+===========+===========+=============+=============+
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 40f9446d43..ed3f020d82 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -291,6 +291,12 @@ mlx5_alloc_shared_dr(struct mlx5_priv *priv)
 		sh->esw_drop_action = mlx5_glue->dr_create_flow_action_drop();
 	}
 #endif
+	if (!sh->tunnel_hub)
+		err = mlx5_alloc_tunnel_hub(sh);
+	if (err) {
+		DRV_LOG(ERR, "mlx5_alloc_tunnel_hub failed err=%d", err);
+		goto error;
+	}
 	if (priv->config.reclaim_mode == MLX5_RCM_AGGR) {
 		mlx5_glue->dr_reclaim_domain_memory(sh->rx_domain, 1);
 		mlx5_glue->dr_reclaim_domain_memory(sh->tx_domain, 1);
@@ -335,6 +341,10 @@ mlx5_alloc_shared_dr(struct mlx5_priv *priv)
 		mlx5_hlist_destroy(sh->tag_table, NULL, NULL);
 		sh->tag_table = NULL;
 	}
+	if (sh->tunnel_hub) {
+		mlx5_release_tunnel_hub(sh, priv->dev_port);
+		sh->tunnel_hub = NULL;
+	}
 	mlx5_free_table_hash_list(priv);
 	return err;
 }
@@ -391,6 +401,10 @@ mlx5_os_free_shared_dr(struct mlx5_priv *priv)
 		mlx5_hlist_destroy(sh->tag_table, NULL, NULL);
 		sh->tag_table = NULL;
 	}
+	if (sh->tunnel_hub) {
+		mlx5_release_tunnel_hub(sh, priv->dev_port);
+		sh->tunnel_hub = NULL;
+	}
 	mlx5_free_table_hash_list(priv);
 }
 
@@ -733,6 +747,10 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 			strerror(rte_errno));
 		goto error;
 	}
+	if (config->dv_miss_info) {
+		if (switch_info->master || switch_info->representor)
+			config->dv_xmeta_en = MLX5_XMETA_MODE_META16;
+	}
 	mlx5_malloc_mem_select(config->sys_mem_en);
 	sh = mlx5_alloc_shared_dev_ctx(spawn, config);
 	if (!sh)
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index e4ce9a9cb7..25048df070 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1618,13 +1618,17 @@ mlx5_args_check(const char *key, const char *val, void *opaque)
 	} else if (strcmp(MLX5_DV_XMETA_EN, key) == 0) {
 		if (tmp != MLX5_XMETA_MODE_LEGACY &&
 		    tmp != MLX5_XMETA_MODE_META16 &&
-		    tmp != MLX5_XMETA_MODE_META32) {
+		    tmp != MLX5_XMETA_MODE_META32 &&
+		    tmp != MLX5_XMETA_MODE_MISS_INFO) {
 			DRV_LOG(ERR, "invalid extensive "
 				     "metadata parameter");
 			rte_errno = EINVAL;
 			return -rte_errno;
 		}
-		config->dv_xmeta_en = tmp;
+		if (tmp != MLX5_XMETA_MODE_MISS_INFO)
+			config->dv_xmeta_en = tmp;
+		else
+			config->dv_miss_info = 1;
 	} else if (strcmp(MLX5_LACP_BY_USER, key) == 0) {
 		config->lacp_by_user = !!tmp;
 	} else if (strcmp(MLX5_MR_EXT_MEMSEG_EN, key) == 0) {
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index c9d5d71630..479a815eed 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -208,6 +208,7 @@ struct mlx5_dev_config {
 	unsigned int rt_timestamp:1; /* realtime timestamp format. */
 	unsigned int sys_mem_en:1; /* The default memory allocator. */
 	unsigned int decap_en:1; /* Whether decap will be used or not. */
+	unsigned int dv_miss_info:1; /* restore packet after partial hw miss */
 	struct {
 		unsigned int enabled:1; /* Whether MPRQ is enabled. */
 		unsigned int stride_num_n; /* Number of strides. */
@@ -644,6 +645,8 @@ struct mlx5_dev_ctx_shared {
 	/* UAR same-page access control required in 32bit implementations. */
 #endif
 	struct mlx5_hlist *flow_tbls;
+	struct rte_hash *flow_tbl_map; /* app group-to-flow table map */
+	struct mlx5_flow_tunnel_hub *tunnel_hub;
 	/* Direct Rules tables for FDB, NIC TX+RX */
 	void *esw_drop_action; /* Pointer to DR E-Switch drop action. */
 	void *pop_vlan_action; /* Pointer to DR pop VLAN action. */
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index 0df47391ee..41a7537d5e 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -165,6 +165,8 @@
 #define MLX5_XMETA_MODE_LEGACY 0
 #define MLX5_XMETA_MODE_META16 1
 #define MLX5_XMETA_MODE_META32 2
+/* Provide info on partial hw miss. Implies MLX5_XMETA_MODE_META16 */
+#define MLX5_XMETA_MODE_MISS_INFO 3
 
 /* MLX5_TX_DB_NC supported values. */
 #define MLX5_TXDB_CACHED 0
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index d7243a878b..21004452e3 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -19,6 +19,7 @@
 #include <rte_flow_driver.h>
 #include <rte_malloc.h>
 #include <rte_ip.h>
+#include <rte_hash.h>
 
 #include <mlx5_glue.h>
 #include <mlx5_devx_cmds.h>
@@ -32,6 +33,18 @@
 #include "mlx5_rxtx.h"
 #include "mlx5_common_os.h"
 
+static struct mlx5_flow_tunnel *
+mlx5_find_tunnel_id(struct rte_eth_dev *dev, uint32_t id);
+static void
+mlx5_flow_tunnel_free(struct rte_eth_dev *dev, struct mlx5_flow_tunnel *tunnel);
+static const struct mlx5_flow_tbl_data_entry  *
+tunnel_mark_decode(struct rte_eth_dev *dev, uint32_t mark);
+static int
+mlx5_get_flow_tunnel(struct rte_eth_dev *dev,
+		     const struct rte_flow_tunnel *app_tunnel,
+		     struct mlx5_flow_tunnel **tunnel);
+
+
 /** Device flow drivers. */
 extern const struct mlx5_flow_driver_ops mlx5_flow_verbs_drv_ops;
 
@@ -548,6 +561,163 @@ static const struct mlx5_flow_expand_node mlx5_support_expansion[] = {
 	},
 };
 
+static inline bool
+mlx5_flow_tunnel_validate(struct rte_eth_dev *dev,
+			  struct rte_flow_tunnel *tunnel,
+			  const char **err_msg)
+{
+	*err_msg = NULL;
+	if (!is_tunnel_offload_active(dev)) {
+		*err_msg = "tunnel offload was not activated";
+		goto out;
+	} else if (!tunnel) {
+		*err_msg = "no application tunnel";
+		goto out;
+	}
+
+	switch (tunnel->type) {
+	default:
+		*err_msg = "unsupported tunnel type";
+		goto out;
+	case RTE_FLOW_ITEM_TYPE_VXLAN:
+		break;
+	}
+
+out:
+	return !*err_msg;
+}
+
+
+static int
+mlx5_flow_tunnel_decap_set(struct rte_eth_dev *dev,
+		    struct rte_flow_tunnel *app_tunnel,
+		    struct rte_flow_action **actions,
+		    uint32_t *num_of_actions,
+		    struct rte_flow_error *error)
+{
+	int ret;
+	struct mlx5_flow_tunnel *tunnel;
+	const char *err_msg = NULL;
+	bool verdict = mlx5_flow_tunnel_validate(dev, app_tunnel, &err_msg);
+
+	if (!verdict)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION_CONF, NULL,
+					  err_msg);
+	ret = mlx5_get_flow_tunnel(dev, app_tunnel, &tunnel);
+	if (ret < 0) {
+		return rte_flow_error_set(error, ret,
+					  RTE_FLOW_ERROR_TYPE_ACTION_CONF, NULL,
+					  "failed to initialize pmd tunnel");
+	}
+	*actions = &tunnel->action;
+	*num_of_actions = 1;
+	return 0;
+}
+
+static int
+mlx5_flow_tunnel_match(struct rte_eth_dev *dev,
+		       struct rte_flow_tunnel *app_tunnel,
+		       struct rte_flow_item **items,
+		       uint32_t *num_of_items,
+		       struct rte_flow_error *error)
+{
+	int ret;
+	struct mlx5_flow_tunnel *tunnel;
+	const char *err_msg = NULL;
+	bool verdict = mlx5_flow_tunnel_validate(dev, app_tunnel, &err_msg);
+
+	if (!verdict)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_HANDLE, NULL,
+					  err_msg);
+	ret = mlx5_get_flow_tunnel(dev, app_tunnel, &tunnel);
+	if (ret < 0) {
+		return rte_flow_error_set(error, ret,
+					  RTE_FLOW_ERROR_TYPE_HANDLE, NULL,
+					  "failed to initialize pmd tunnel");
+	}
+	*items = &tunnel->item;
+	*num_of_items = 1;
+	return 0;
+}
+
+static int
+mlx5_flow_item_release(struct rte_eth_dev *dev,
+		       struct rte_flow_item *pmd_items,
+		       uint32_t num_items, struct rte_flow_error *err)
+{
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_tunnel *tun;
+
+	LIST_FOREACH(tun, &thub->tunnels, chain) {
+		if (&tun->item == pmd_items)
+			break;
+	}
+	if (!tun || num_items != 1)
+		return rte_flow_error_set(err, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_HANDLE, NULL,
+					  "invalid argument");
+	if (!__atomic_sub_fetch(&tun->refctn, 1, __ATOMIC_RELAXED))
+		mlx5_flow_tunnel_free(dev, tun);
+	return 0;
+}
+
+static int
+mlx5_flow_action_release(struct rte_eth_dev *dev,
+			 struct rte_flow_action *pmd_actions,
+			 uint32_t num_actions, struct rte_flow_error *err)
+{
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_tunnel *tun;
+
+	LIST_FOREACH(tun, &thub->tunnels, chain) {
+		if (&tun->action == pmd_actions)
+			break;
+	}
+	if (!tun || num_actions != 1)
+		return rte_flow_error_set(err, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_HANDLE, NULL,
+					  "invalid argument");
+	if (!__atomic_sub_fetch(&tun->refctn, 1, __ATOMIC_RELAXED))
+		mlx5_flow_tunnel_free(dev, tun);
+
+	return 0;
+}
+
+static int
+mlx5_flow_tunnel_get_restore_info(struct rte_eth_dev *dev,
+				  struct rte_mbuf *m,
+				  struct rte_flow_restore_info *info,
+				  struct rte_flow_error *err)
+{
+	uint64_t ol_flags = m->ol_flags;
+	const struct mlx5_flow_tbl_data_entry *tble;
+	const uint64_t mask = PKT_RX_FDIR | PKT_RX_FDIR_ID;
+
+	if ((ol_flags & mask) != mask)
+		goto err;
+	tble = tunnel_mark_decode(dev, m->hash.fdir.hi);
+	if (!tble) {
+		DRV_LOG(DEBUG, "port %u invalid miss tunnel mark %#x",
+			dev->data->port_id, m->hash.fdir.hi);
+		goto err;
+	}
+	MLX5_ASSERT(tble->tunnel);
+	memcpy(&info->tunnel, &tble->tunnel->app_tunnel, sizeof(info->tunnel));
+	info->group_id = tble->group_id;
+	info->flags = RTE_FLOW_RESTORE_INFO_TUNNEL |
+		      RTE_FLOW_RESTORE_INFO_GROUP_ID |
+		      RTE_FLOW_RESTORE_INFO_ENCAPSULATED;
+
+	return 0;
+
+err:
+	return rte_flow_error_set(err, EINVAL,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				  "failed to get restore info");
+}
+
 static const struct rte_flow_ops mlx5_flow_ops = {
 	.validate = mlx5_flow_validate,
 	.create = mlx5_flow_create,
@@ -557,6 +727,11 @@ static const struct rte_flow_ops mlx5_flow_ops = {
 	.query = mlx5_flow_query,
 	.dev_dump = mlx5_flow_dev_dump,
 	.get_aged_flows = mlx5_flow_get_aged_flows,
+	.tunnel_decap_set = mlx5_flow_tunnel_decap_set,
+	.tunnel_match = mlx5_flow_tunnel_match,
+	.tunnel_action_decap_release = mlx5_flow_action_release,
+	.tunnel_item_release = mlx5_flow_item_release,
+	.get_restore_info = mlx5_flow_tunnel_get_restore_info,
 };
 
 /* Convert FDIR request to Generic flow. */
@@ -3882,6 +4057,142 @@ flow_hairpin_split(struct rte_eth_dev *dev,
 	return 0;
 }
 
+__extension__
+union tunnel_offload_mark {
+	uint32_t val;
+	struct {
+		uint32_t app_reserve:8;
+		uint32_t table_id:15;
+		uint32_t transfer:1;
+		uint32_t _unused_:8;
+	};
+};
+
+struct tunnel_default_miss_ctx {
+	uint16_t *queue;
+	__extension__
+	union {
+		struct rte_flow_action_rss action_rss;
+		struct rte_flow_action_queue miss_queue;
+		struct rte_flow_action_jump miss_jump;
+		uint8_t raw[0];
+	};
+};
+
+static int
+flow_tunnel_add_default_miss(struct rte_eth_dev *dev,
+			     struct rte_flow *flow,
+			     const struct rte_flow_attr *attr,
+			     const struct rte_flow_action *app_actions,
+			     uint32_t flow_idx,
+			     struct tunnel_default_miss_ctx *ctx,
+			     struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow *dev_flow;
+	struct rte_flow_attr miss_attr = *attr;
+	const struct mlx5_flow_tunnel *tunnel = app_actions[0].conf;
+	const struct rte_flow_item miss_items[2] = {
+		{
+			.type = RTE_FLOW_ITEM_TYPE_ETH,
+			.spec = NULL,
+			.last = NULL,
+			.mask = NULL
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+			.spec = NULL,
+			.last = NULL,
+			.mask = NULL
+		}
+	};
+	union tunnel_offload_mark mark_id;
+	struct rte_flow_action_mark miss_mark;
+	struct rte_flow_action miss_actions[3] = {
+		[0] = { .type = RTE_FLOW_ACTION_TYPE_MARK, .conf = &miss_mark },
+		[2] = { .type = RTE_FLOW_ACTION_TYPE_END,  .conf = NULL }
+	};
+	const struct rte_flow_action_jump *jump_data;
+	uint32_t i, flow_table = 0; /* prevent compilation warning */
+	struct flow_grp_info grp_info = {
+		.external = 1,
+		.transfer = attr->transfer,
+		.fdb_def_rule = !!priv->fdb_def_rule,
+		.std_tbl_fix = 0,
+	};
+	int ret;
+
+	if (!attr->transfer) {
+		uint32_t q_size;
+
+		miss_actions[1].type = RTE_FLOW_ACTION_TYPE_RSS;
+		q_size = priv->reta_idx_n * sizeof(ctx->queue[0]);
+		ctx->queue = mlx5_malloc(MLX5_MEM_SYS | MLX5_MEM_ZERO, q_size,
+					 0, SOCKET_ID_ANY);
+		if (!ctx->queue)
+			return rte_flow_error_set
+				(error, ENOMEM,
+				RTE_FLOW_ERROR_TYPE_ACTION_CONF,
+				NULL, "invalid default miss RSS");
+		ctx->action_rss.func = RTE_ETH_HASH_FUNCTION_DEFAULT,
+		ctx->action_rss.level = 0,
+		ctx->action_rss.types = priv->rss_conf.rss_hf,
+		ctx->action_rss.key_len = priv->rss_conf.rss_key_len,
+		ctx->action_rss.queue_num = priv->reta_idx_n,
+		ctx->action_rss.key = priv->rss_conf.rss_key,
+		ctx->action_rss.queue = ctx->queue;
+		if (!priv->reta_idx_n || !priv->rxqs_n)
+			return rte_flow_error_set
+				(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION_CONF,
+				NULL, "invalid port configuration");
+		if (!(dev->data->dev_conf.rxmode.mq_mode & ETH_MQ_RX_RSS_FLAG))
+			ctx->action_rss.types = 0;
+		for (i = 0; i != priv->reta_idx_n; ++i)
+			ctx->queue[i] = (*priv->reta_idx)[i];
+	} else {
+		miss_actions[1].type = RTE_FLOW_ACTION_TYPE_JUMP;
+		ctx->miss_jump.group = MLX5_TNL_MISS_FDB_JUMP_GRP;
+	}
+	miss_actions[1].conf = (typeof(miss_actions[1].conf))ctx->raw;
+	for (; app_actions->type != RTE_FLOW_ACTION_TYPE_JUMP; app_actions++);
+	jump_data = app_actions->conf;
+	miss_attr.priority = MLX5_TNL_MISS_RULE_PRIORITY;
+	miss_attr.group = jump_data->group;
+	ret = mlx5_flow_group_to_table(dev, tunnel, jump_data->group,
+				       &flow_table, grp_info, error);
+	if (ret)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION_CONF,
+					  NULL, "invalid tunnel id");
+	mark_id.app_reserve = 0;
+	mark_id.table_id = tunnel_flow_tbl_to_id(flow_table);
+	mark_id.transfer = !!attr->transfer;
+	mark_id._unused_ = 0;
+	miss_mark.id = mark_id.val;
+	dev_flow = flow_drv_prepare(dev, flow, &miss_attr,
+				    miss_items, miss_actions, flow_idx, error);
+	if (!dev_flow)
+		return -rte_errno;
+	dev_flow->flow = flow;
+	dev_flow->external = true;
+	dev_flow->tunnel = tunnel;
+	/* Subflow object was created, we must include one in the list. */
+	SILIST_INSERT(&flow->dev_handles, dev_flow->handle_idx,
+		      dev_flow->handle, next);
+	DRV_LOG(DEBUG,
+		"port %u tunnel type=%d id=%u miss rule priority=%u group=%u",
+		dev->data->port_id, tunnel->app_tunnel.type,
+		tunnel->tunnel_id, miss_attr.priority, miss_attr.group);
+	ret = flow_drv_translate(dev, dev_flow, &miss_attr, miss_items,
+				  miss_actions, error);
+	if (!ret)
+		ret = flow_mreg_update_copy_table(dev, flow, miss_actions,
+						  error);
+
+	return ret;
+}
+
 /**
  * The last stage of splitting chain, just creates the subflow
  * without any modification.
@@ -5004,6 +5315,27 @@ flow_create_split_outer(struct rte_eth_dev *dev,
 	return ret;
 }
 
+static struct mlx5_flow_tunnel *
+flow_tunnel_from_rule(struct rte_eth_dev *dev,
+		      const struct rte_flow_attr *attr,
+		      const struct rte_flow_item items[],
+		      const struct rte_flow_action actions[])
+{
+	struct mlx5_flow_tunnel *tunnel;
+
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wcast-qual"
+	if (is_flow_tunnel_match_rule(dev, attr, items, actions))
+		tunnel = (struct mlx5_flow_tunnel *)items[0].spec;
+	else if (is_flow_tunnel_steer_rule(dev, attr, items, actions))
+		tunnel = (struct mlx5_flow_tunnel *)actions[0].conf;
+	else
+		tunnel = NULL;
+#pragma GCC diagnostic pop
+
+	return tunnel;
+}
+
 /**
  * Create a flow and add it to @p list.
  *
@@ -5065,11 +5397,11 @@ flow_list_create(struct rte_eth_dev *dev, uint32_t *list,
 	uint32_t hairpin_id = 0;
 	struct rte_flow_attr attr_tx = { .priority = 0 };
 	struct rte_flow_attr attr_factor = {0};
+	struct mlx5_flow_tunnel *tunnel;
+	struct tunnel_default_miss_ctx default_miss_ctx = { 0, };
 	int ret;
 
 	memcpy((void *)&attr_factor, (const void *)attr, sizeof(*attr));
-	if (external)
-		attr_factor.group *= MLX5_FLOW_TABLE_FACTOR;
 	hairpin_flow = flow_check_hairpin_split(dev, &attr_factor, actions);
 	ret = flow_drv_validate(dev, &attr_factor, items, p_actions_rx,
 				external, hairpin_flow, error);
@@ -5141,6 +5473,19 @@ flow_list_create(struct rte_eth_dev *dev, uint32_t *list,
 					      error);
 		if (ret < 0)
 			goto error;
+		if (is_flow_tunnel_steer_rule(dev, attr,
+					      buf->entry[i].pattern,
+					      p_actions_rx)) {
+			ret = flow_tunnel_add_default_miss(dev, flow, attr,
+							   p_actions_rx,
+							   idx,
+							   &default_miss_ctx,
+							   error);
+			if (ret < 0) {
+				mlx5_free(default_miss_ctx.queue);
+				goto error;
+			}
+		}
 	}
 	/* Create the tx flow. */
 	if (hairpin_flow) {
@@ -5195,6 +5540,13 @@ flow_list_create(struct rte_eth_dev *dev, uint32_t *list,
 	priv->flow_idx = priv->flow_nested_idx;
 	if (priv->flow_nested_idx)
 		priv->flow_nested_idx = 0;
+	tunnel = flow_tunnel_from_rule(dev, attr, items, actions);
+	if (tunnel) {
+		flow->tunnel = 1;
+		flow->tunnel_id = tunnel->tunnel_id;
+		__atomic_add_fetch(&tunnel->refctn, 1, __ATOMIC_RELAXED);
+		mlx5_free(default_miss_ctx.queue);
+	}
 	return idx;
 error:
 	MLX5_ASSERT(flow);
@@ -5314,6 +5666,7 @@ mlx5_flow_create(struct rte_eth_dev *dev,
 				   "port not started");
 		return NULL;
 	}
+
 	return (void *)(uintptr_t)flow_list_create(dev, &priv->flows,
 				  attr, items, actions, true, error);
 }
@@ -5368,6 +5721,13 @@ flow_list_destroy(struct rte_eth_dev *dev, uint32_t *list,
 		}
 	}
 	mlx5_ipool_free(priv->sh->ipool[MLX5_IPOOL_RTE_FLOW], flow_idx);
+	if (flow->tunnel) {
+		struct mlx5_flow_tunnel *tunnel;
+		tunnel = mlx5_find_tunnel_id(dev, flow->tunnel_id);
+		RTE_VERIFY(tunnel);
+		if (!__atomic_sub_fetch(&tunnel->refctn, 1, __ATOMIC_RELAXED))
+			mlx5_flow_tunnel_free(dev, tunnel);
+	}
 }
 
 /**
@@ -6902,19 +7262,122 @@ mlx5_flow_async_pool_query_handle(struct mlx5_dev_ctx_shared *sh,
 	sh->cmng.pending_queries--;
 }
 
+static const struct mlx5_flow_tbl_data_entry  *
+tunnel_mark_decode(struct rte_eth_dev *dev, uint32_t mark)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_dev_ctx_shared *sh = priv->sh;
+	struct mlx5_hlist_entry *he;
+	union tunnel_offload_mark mbits = { .val = mark };
+	union mlx5_flow_tbl_key table_key = {
+		{
+			.table_id = tunnel_id_to_flow_tbl(mbits.table_id),
+			.reserved = 0,
+			.domain = !!mbits.transfer,
+			.direction = 0,
+		}
+	};
+	he = mlx5_hlist_lookup(sh->flow_tbls, table_key.v64);
+	return he ?
+	       container_of(he, struct mlx5_flow_tbl_data_entry, entry) : NULL;
+}
+
+static uint32_t
+tunnel_flow_group_to_flow_table(struct rte_eth_dev *dev,
+				const struct mlx5_flow_tunnel *tunnel,
+				uint32_t group, uint32_t *table,
+				struct rte_flow_error *error)
+{
+	struct mlx5_hlist_entry *he;
+	struct tunnel_tbl_entry *tte;
+	union tunnel_tbl_key key = {
+		.tunnel_id = tunnel ? tunnel->tunnel_id : 0,
+		.group = group
+	};
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_hlist *group_hash;
+
+	group_hash = tunnel ? tunnel->groups : thub->groups;
+	he = mlx5_hlist_lookup(group_hash, key.val);
+	if (!he) {
+		int ret;
+		tte = mlx5_malloc(MLX5_MEM_SYS | MLX5_MEM_ZERO,
+				  sizeof(*tte), 0,
+				  SOCKET_ID_ANY);
+		if (!tte)
+			goto err;
+		tte->hash.key = key.val;
+		ret = mlx5_flow_id_get(thub->table_ids, &tte->flow_table);
+		if (ret) {
+			mlx5_free(tte);
+			goto err;
+		}
+		tte->flow_table = tunnel_id_to_flow_tbl(tte->flow_table);
+		mlx5_hlist_insert(group_hash, &tte->hash);
+	} else {
+		tte = container_of(he, typeof(*tte), hash);
+	}
+	*table = tte->flow_table;
+	DRV_LOG(DEBUG, "port %u tunnel %u group=%#x table=%#x",
+		dev->data->port_id, key.tunnel_id, group, *table);
+	return 0;
+
+err:
+	return rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
+				  NULL, "tunnel group index not supported");
+}
+
+static int
+flow_group_to_table(uint32_t port_id, uint32_t group, uint32_t *table,
+		    struct flow_grp_info grp_info, struct rte_flow_error *error)
+{
+	if (grp_info.transfer && grp_info.external && grp_info.fdb_def_rule) {
+		if (group == UINT32_MAX)
+			return rte_flow_error_set
+						(error, EINVAL,
+						 RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
+						 NULL,
+						 "group index not supported");
+		*table = group + 1;
+	} else {
+		*table = group;
+	}
+	DRV_LOG(DEBUG, "port %u group=%#x table=%#x", port_id, group, *table);
+	return 0;
+}
+
 /**
  * Translate the rte_flow group index to HW table value.
  *
- * @param[in] attributes
- *   Pointer to flow attributes
- * @param[in] external
- *   Value is part of flow rule created by request external to PMD.
+ * If tunnel offload is disabled, all group ids are converted to flow
+ * table ids using the standard method.
+ * If tunnel offload is enabled, a group id can be converted using either
+ * the standard or the tunnel conversion method. The conversion method
+ * is selected by the flags in the `grp_info` parameter:
+ * - Internal (grp_info.external == 0) groups are converted with the
+ *   standard method.
+ * - Group ids in JUMP actions are converted with the tunnel method.
+ * - Conversion of the group id in rule attributes depends on the rule
+ *   type and the group id value:
+ *   ** non-zero group attributes are converted with the tunnel method
+ *   ** a zero group attribute in a non-tunnel rule is converted with the
+ *      standard method - there is only one root table
+ *   ** a zero group attribute in a steer tunnel rule is converted with
+ *      the standard method - single root table
+ *   ** a zero group attribute in a match tunnel rule is a special OvS
+ *      case: that value is used for portability reasons. That group
+ *      id is converted with the tunnel conversion method.
+ *
+ * @param[in] dev
+ *   Port device
+ * @param[in] tunnel
+ *   PMD tunnel offload object
  * @param[in] group
  *   rte_flow group index value.
- * @param[out] fdb_def_rule
- *   Whether fdb jump to table 1 is configured.
  * @param[out] table
  *   HW table value.
+ * @param[in] grp_info
+ *   flags used for conversion
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -6922,22 +7385,36 @@ mlx5_flow_async_pool_query_handle(struct mlx5_dev_ctx_shared *sh,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_flow_group_to_table(const struct rte_flow_attr *attributes, bool external,
-			 uint32_t group, bool fdb_def_rule, uint32_t *table,
+mlx5_flow_group_to_table(struct rte_eth_dev *dev,
+			 const struct mlx5_flow_tunnel *tunnel,
+			 uint32_t group, uint32_t *table,
+			 struct flow_grp_info grp_info,
 			 struct rte_flow_error *error)
 {
-	if (attributes->transfer && external && fdb_def_rule) {
-		if (group == UINT32_MAX)
-			return rte_flow_error_set
-						(error, EINVAL,
-						 RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
-						 NULL,
-						 "group index not supported");
-		*table = group + 1;
+	int ret;
+	bool standard_translation;
+
+	if (grp_info.external && group < MLX5_MAX_TABLES_EXTERNAL)
+		group *= MLX5_FLOW_TABLE_FACTOR;
+	if (is_tunnel_offload_active(dev)) {
+		standard_translation = !grp_info.external ||
+					grp_info.std_tbl_fix;
 	} else {
-		*table = group;
+		standard_translation = true;
 	}
-	return 0;
+	DRV_LOG(DEBUG,
+		"port %u group=%#x transfer=%d external=%d fdb_def_rule=%d translate=%s",
+		dev->data->port_id, group, grp_info.transfer,
+		grp_info.external, grp_info.fdb_def_rule,
+		standard_translation ? "STANDARD" : "TUNNEL");
+	if (standard_translation)
+		ret = flow_group_to_table(dev->data->port_id, group, table,
+					  grp_info, error);
+	else
+		ret = tunnel_flow_group_to_flow_table(dev, tunnel, group,
+						      table, error);
+
+	return ret;
 }
 
 /**
@@ -7081,3 +7558,168 @@ mlx5_flow_get_aged_flows(struct rte_eth_dev *dev, void **contexts,
 		 dev->data->port_id);
 	return -ENOTSUP;
 }
+
+static void
+mlx5_flow_tunnel_free(struct rte_eth_dev *dev,
+		      struct mlx5_flow_tunnel *tunnel)
+{
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_id_pool *id_pool = thub->tunnel_ids;
+
+	DRV_LOG(DEBUG, "port %u release pmd tunnel id=0x%x",
+		dev->data->port_id, tunnel->tunnel_id);
+	RTE_VERIFY(!__atomic_load_n(&tunnel->refctn, __ATOMIC_RELAXED));
+	LIST_REMOVE(tunnel, chain);
+	mlx5_flow_id_release(id_pool, tunnel->tunnel_id);
+	mlx5_hlist_destroy(tunnel->groups, NULL, NULL);
+	mlx5_free(tunnel);
+}
+
+static struct mlx5_flow_tunnel *
+mlx5_find_tunnel_id(struct rte_eth_dev *dev, uint32_t id)
+{
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_tunnel *tun;
+
+	LIST_FOREACH(tun, &thub->tunnels, chain) {
+		if (tun->tunnel_id == id)
+			break;
+	}
+
+	return tun;
+}
+
+static struct mlx5_flow_tunnel *
+mlx5_flow_tunnel_allocate(struct rte_eth_dev *dev,
+			  const struct rte_flow_tunnel *app_tunnel)
+{
+	int ret;
+	struct mlx5_flow_tunnel *tunnel;
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_id_pool *id_pool = thub->tunnel_ids;
+	uint32_t id;
+
+	ret = mlx5_flow_id_get(id_pool, &id);
+	if (ret)
+		return NULL;
+	/**
+	 * mlx5 flow tunnel is an auxiliary data structure.
+	 * It is not part of IO. There is no need to allocate it from
+	 * the huge-page pools dedicated for IO.
+	 */
+	tunnel = mlx5_malloc(MLX5_MEM_SYS | MLX5_MEM_ZERO, sizeof(*tunnel),
+			     0, SOCKET_ID_ANY);
+	if (!tunnel) {
+		mlx5_flow_id_release(id_pool, id);
+		return NULL;
+	}
+	tunnel->groups = mlx5_hlist_create("tunnel groups", 1024);
+	if (!tunnel->groups) {
+		mlx5_flow_id_release(id_pool, id);
+		mlx5_free(tunnel);
+		return NULL;
+	}
+	/* initiate new PMD tunnel */
+	memcpy(&tunnel->app_tunnel, app_tunnel, sizeof(*app_tunnel));
+	tunnel->tunnel_id = id;
+	tunnel->action.type = (typeof(tunnel->action.type))
+			      MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET;
+	tunnel->action.conf = tunnel;
+	tunnel->item.type = (typeof(tunnel->item.type))
+			    MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL;
+	tunnel->item.spec = tunnel;
+	tunnel->item.last = NULL;
+	tunnel->item.mask = NULL;
+
+	DRV_LOG(DEBUG, "port %u new pmd tunnel id=0x%x",
+		dev->data->port_id, tunnel->tunnel_id);
+
+	return tunnel;
+}
+
+static int
+mlx5_get_flow_tunnel(struct rte_eth_dev *dev,
+		     const struct rte_flow_tunnel *app_tunnel,
+		     struct mlx5_flow_tunnel **tunnel)
+{
+	int ret = 0;
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_tunnel *tun;
+
+	LIST_FOREACH(tun, &thub->tunnels, chain) {
+		if (!memcmp(app_tunnel, &tun->app_tunnel,
+			    sizeof(*app_tunnel))) {
+			*tunnel = tun;
+			ret = 0;
+			break;
+		}
+	}
+	if (!tun) {
+		tun = mlx5_flow_tunnel_allocate(dev, app_tunnel);
+		if (tun) {
+			LIST_INSERT_HEAD(&thub->tunnels, tun, chain);
+			*tunnel = tun;
+		} else {
+			ret = -ENOMEM;
+		}
+	}
+	if (tun)
+		__atomic_add_fetch(&tun->refctn, 1, __ATOMIC_RELAXED);
+
+	return ret;
+}
+
+void mlx5_release_tunnel_hub(struct mlx5_dev_ctx_shared *sh, uint16_t port_id)
+{
+	struct mlx5_flow_tunnel_hub *thub = sh->tunnel_hub;
+
+	if (!thub)
+		return;
+	if (!LIST_EMPTY(&thub->tunnels))
+		DRV_LOG(WARNING, "port %u tunnels present\n", port_id);
+	mlx5_flow_id_pool_release(thub->tunnel_ids);
+	mlx5_flow_id_pool_release(thub->table_ids);
+	mlx5_hlist_destroy(thub->groups, NULL, NULL);
+	mlx5_free(thub);
+}
+
+int mlx5_alloc_tunnel_hub(struct mlx5_dev_ctx_shared *sh)
+{
+	int err;
+	struct mlx5_flow_tunnel_hub *thub;
+
+	thub = mlx5_malloc(MLX5_MEM_SYS | MLX5_MEM_ZERO, sizeof(*thub),
+			   0, SOCKET_ID_ANY);
+	if (!thub)
+		return -ENOMEM;
+	LIST_INIT(&thub->tunnels);
+	thub->tunnel_ids = mlx5_flow_id_pool_alloc(MLX5_MAX_TUNNELS);
+	if (!thub->tunnel_ids) {
+		err = -rte_errno;
+		goto err;
+	}
+	thub->table_ids = mlx5_flow_id_pool_alloc(MLX5_MAX_TABLES);
+	if (!thub->table_ids) {
+		err = -rte_errno;
+		goto err;
+	}
+	thub->groups = mlx5_hlist_create("flow groups", MLX5_MAX_TABLES);
+	if (!thub->groups) {
+		err = -rte_errno;
+		goto err;
+	}
+	sh->tunnel_hub = thub;
+
+	return 0;
+
+err:
+	if (thub->groups)
+		mlx5_hlist_destroy(thub->groups, NULL, NULL);
+	if (thub->table_ids)
+		mlx5_flow_id_pool_release(thub->table_ids);
+	if (thub->tunnel_ids)
+		mlx5_flow_id_pool_release(thub->tunnel_ids);
+	if (thub)
+		mlx5_free(thub);
+	return err;
+}
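
The two conversion paths above can be checked in isolation. The following
standalone sketch is not part of the patch: it mirrors the arithmetic of
flow_group_to_table() and tunnel_flow_group_to_flow_table() with simplified
stand-in types, and omits the MLX5_FLOW_TABLE_FACTOR scaling and error
handling.

/*
 * Standalone model of the two group-to-table conversions added above.
 * Names and types are simplified stand-ins for the PMD structures.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct grp_info {
	bool external;     /* rule comes from the public flow API */
	bool transfer;     /* E-Switch (FDB) rule */
	bool fdb_def_rule; /* FDB jump-to-table-1 default rule is present */
};

/* Standard conversion: external FDB rules are shifted by one table. */
static uint32_t
standard_group_to_table(uint32_t group, const struct grp_info *g)
{
	return (g->transfer && g->external && g->fdb_def_rule) ?
	       group + 1 : group;
}

/*
 * Tunnel conversion: the table id comes from the tunnel table-id pool
 * and gets bit 16 set (as in tunnel_id_to_flow_tbl()), keeping tunnel
 * tables in a separate id range from the regular ones.
 */
static uint32_t
tunnel_group_to_table(uint32_t pool_id)
{
	return pool_id | (1u << 16);
}

int
main(void)
{
	struct grp_info g = {
		.external = true, .transfer = true, .fdb_def_rule = true,
	};

	assert(standard_group_to_table(5, &g) == 6);   /* FDB group 5 */
	assert(tunnel_group_to_table(7) == 0x10007);   /* pool id 7 */
	printf("standard=%u tunnel=%#x\n",
	       standard_group_to_table(5, &g), tunnel_group_to_table(7));
	return 0;
}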
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index b4be4769ef..ecb5f023d9 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -26,6 +26,7 @@ enum mlx5_rte_flow_item_type {
 	MLX5_RTE_FLOW_ITEM_TYPE_TAG,
 	MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
 	MLX5_RTE_FLOW_ITEM_TYPE_VLAN,
+	MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL,
 };
 
 /* Private (internal) rte flow actions. */
@@ -35,6 +36,7 @@ enum mlx5_rte_flow_action_type {
 	MLX5_RTE_FLOW_ACTION_TYPE_MARK,
 	MLX5_RTE_FLOW_ACTION_TYPE_COPY_MREG,
 	MLX5_RTE_FLOW_ACTION_TYPE_DEFAULT_MISS,
+	MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET,
 };
 
 /* Matches on selected register. */
@@ -201,6 +203,8 @@ enum mlx5_feature_name {
 #define MLX5_FLOW_ACTION_AGE (1ull << 34)
 #define MLX5_FLOW_ACTION_DEFAULT_MISS (1ull << 35)
 #define MLX5_FLOW_ACTION_SAMPLE (1ull << 36)
+#define MLX5_FLOW_ACTION_TUNNEL_SET (1ull << 37)
+#define MLX5_FLOW_ACTION_TUNNEL_MATCH (1ull << 38)
 
 #define MLX5_FLOW_FATE_ACTIONS \
 	(MLX5_FLOW_ACTION_DROP | MLX5_FLOW_ACTION_QUEUE | \
@@ -530,6 +534,10 @@ struct mlx5_flow_tbl_data_entry {
 	struct mlx5_flow_dv_jump_tbl_resource jump;
 	/**< jump resource, at most one for each table created. */
 	uint32_t idx; /**< index for the indexed mempool. */
+	/**< tunnel offload */
+	const struct mlx5_flow_tunnel *tunnel;
+	uint32_t group_id;
+	bool external;
 };
 
 /* Sub rdma-core actions list. */
@@ -768,6 +776,7 @@ struct mlx5_flow {
 	};
 	struct mlx5_flow_handle *handle;
 	uint32_t handle_idx; /* Index of the mlx5 flow handle memory. */
+	const struct mlx5_flow_tunnel *tunnel;
 };
 
 /* Flow meter state. */
@@ -913,6 +922,112 @@ struct mlx5_fdir_flow {
 
 #define HAIRPIN_FLOW_ID_BITS 28
 
+#define MLX5_MAX_TUNNELS 256
+#define MLX5_TNL_MISS_RULE_PRIORITY 3
+#define MLX5_TNL_MISS_FDB_JUMP_GRP  0x1234faac
+
+/*
+ * When tunnel offload is active, all JUMP group ids are converted
+ * using the same method. That conversion is applied both to tunnel and
+ * regular rule types.
+ * Group ids used in tunnel rules are relative to their tunnel (!).
+ * An application can create a number of steer rules that use the same
+ * tunnel, each with a different group id.
+ * Each tunnel stores its groups internally in the PMD tunnel object.
+ * Groups used in regular rules do not belong to any tunnel and are
+ * stored in the tunnel hub.
+ */
+
+struct mlx5_flow_tunnel {
+	LIST_ENTRY(mlx5_flow_tunnel) chain;
+	struct rte_flow_tunnel app_tunnel;	/** app tunnel copy */
+	uint32_t tunnel_id;			/** unique tunnel ID */
+	uint32_t refctn;
+	struct rte_flow_action action;
+	struct rte_flow_item item;
+	struct mlx5_hlist *groups;		/** tunnel groups */
+};
+
+/** PMD tunnel related context */
+struct mlx5_flow_tunnel_hub {
+	LIST_HEAD(, mlx5_flow_tunnel) tunnels;
+	struct mlx5_flow_id_pool *tunnel_ids;
+	struct mlx5_flow_id_pool *table_ids;
+	struct mlx5_hlist *groups;		/** non tunnel groups */
+};
+
+/* convert jump group to flow table ID in tunnel rules */
+struct tunnel_tbl_entry {
+	struct mlx5_hlist_entry hash;
+	uint32_t flow_table;
+};
+
+static inline uint32_t
+tunnel_id_to_flow_tbl(uint32_t id)
+{
+	return id | (1u << 16);
+}
+
+static inline uint32_t
+tunnel_flow_tbl_to_id(uint32_t flow_tbl)
+{
+	return flow_tbl & ~(1u << 16);
+}
+
+union tunnel_tbl_key {
+	uint64_t val;
+	struct {
+		uint32_t tunnel_id;
+		uint32_t group;
+	};
+};
+
+static inline struct mlx5_flow_tunnel_hub *
+mlx5_tunnel_hub(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	return priv->sh->tunnel_hub;
+}
+
+static inline bool
+is_tunnel_offload_active(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	return !!priv->config.dv_miss_info;
+}
+
+static inline bool
+is_flow_tunnel_match_rule(__rte_unused struct rte_eth_dev *dev,
+			  __rte_unused const struct rte_flow_attr *attr,
+			  __rte_unused const struct rte_flow_item items[],
+			  __rte_unused const struct rte_flow_action actions[])
+{
+	return (items[0].type == (typeof(items[0].type))
+				 MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL);
+}
+
+static inline bool
+is_flow_tunnel_steer_rule(__rte_unused struct rte_eth_dev *dev,
+			  __rte_unused const struct rte_flow_attr *attr,
+			  __rte_unused const struct rte_flow_item items[],
+			  __rte_unused const struct rte_flow_action actions[])
+{
+	return (actions[0].type == (typeof(actions[0].type))
+				   MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET);
+}
+
+static inline const struct mlx5_flow_tunnel *
+flow_actions_to_tunnel(const struct rte_flow_action actions[])
+{
+	return actions[0].conf;
+}
+
+static inline const struct mlx5_flow_tunnel *
+flow_items_to_tunnel(const struct rte_flow_item items[])
+{
+	return items[0].spec;
+}
+
 /* Flow structure. */
 struct rte_flow {
 	ILIST_ENTRY(uint32_t)next; /**< Index to the next flow structure. */
@@ -920,12 +1035,14 @@ struct rte_flow {
 	/**< Device flow handles that are part of the flow. */
 	uint32_t drv_type:2; /**< Driver type. */
 	uint32_t fdir:1; /**< Identifier of associated FDIR if any. */
+	uint32_t tunnel:1;
 	uint32_t hairpin_flow_id:HAIRPIN_FLOW_ID_BITS;
 	/**< The flow id used for hairpin. */
 	uint32_t copy_applied:1; /**< The MARK copy Flow os applied. */
 	uint32_t rix_mreg_copy;
 	/**< Index to metadata register copy table resource. */
 	uint32_t counter; /**< Holds flow counter. */
+	uint32_t tunnel_id;  /**< Tunnel id */
 	uint16_t meter; /**< Holds flow meter id. */
 } __rte_packed;
 
@@ -1008,9 +1125,54 @@ void mlx5_flow_id_pool_release(struct mlx5_flow_id_pool *pool);
 uint32_t mlx5_flow_id_get(struct mlx5_flow_id_pool *pool, uint32_t *id);
 uint32_t mlx5_flow_id_release(struct mlx5_flow_id_pool *pool,
 			      uint32_t id);
-int mlx5_flow_group_to_table(const struct rte_flow_attr *attributes,
-			     bool external, uint32_t group, bool fdb_def_rule,
-			     uint32_t *table, struct rte_flow_error *error);
+__extension__
+struct flow_grp_info {
+	uint64_t external:1;
+	uint64_t transfer:1;
+	uint64_t fdb_def_rule:1;
+	/* force standard group translation */
+	uint64_t std_tbl_fix:1;
+};
+
+static inline bool
+tunnel_use_standard_attr_group_translate
+		    (struct rte_eth_dev *dev,
+		     const struct mlx5_flow_tunnel *tunnel,
+		     const struct rte_flow_attr *attr,
+		     const struct rte_flow_item items[],
+		     const struct rte_flow_action actions[])
+{
+	bool verdict;
+
+	if (!is_tunnel_offload_active(dev))
+		/* no tunnel offload API */
+		verdict = true;
+	else if (tunnel) {
+		/*
+		 * OvS will use a jump to group 0 in the tunnel steer rule.
+		 * If the tunnel steer rule starts from group 0 (attr.group == 0),
+		 * that group 0 must be translated with the standard method.
+		 * attr.group == 0 in a tunnel match rule is translated with the
+		 * tunnel method.
+		 */
+		verdict = !attr->group &&
+			  is_flow_tunnel_steer_rule(dev, attr, items, actions);
+	} else {
+		/*
+		 * non-tunnel group translation uses standard method for
+		 * root group only: attr.group == 0
+		 */
+		verdict = !attr->group;
+	}
+
+	return verdict;
+}
+
+int mlx5_flow_group_to_table(struct rte_eth_dev *dev,
+			     const struct mlx5_flow_tunnel *tunnel,
+			     uint32_t group, uint32_t *table,
+			     struct flow_grp_info flags,
+			     struct rte_flow_error *error);
 uint64_t mlx5_flow_hashfields_adjust(struct mlx5_flow_rss_desc *rss_desc,
 				     int tunnel, uint64_t layer_types,
 				     uint64_t hash_fields);
@@ -1145,4 +1307,6 @@ int mlx5_flow_destroy_policer_rules(struct rte_eth_dev *dev,
 int mlx5_flow_meter_flush(struct rte_eth_dev *dev,
 			  struct rte_mtr_error *error);
 int mlx5_flow_dv_discover_counter_offset_support(struct rte_eth_dev *dev);
+void mlx5_release_tunnel_hub(struct mlx5_dev_ctx_shared *sh, uint16_t port_id);
+int mlx5_alloc_tunnel_hub(struct mlx5_dev_ctx_shared *sh);
 #endif /* RTE_PMD_MLX5_FLOW_H_ */
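
The per-tunnel group hash introduced in this header keys its entries on a
64-bit value that combines the tunnel id with the application group id. A
standalone sketch of that packing, mirroring union tunnel_tbl_key (the
anonymous struct member needs C11 or a GNU C extension, as in the header):

/* Standalone illustration of the <tunnel_id, group> hash key packing
 * used by the per-tunnel group tables (union tunnel_tbl_key). */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

union tbl_key {
	uint64_t val;
	struct {
		uint32_t tunnel_id; /* 0 for non-tunnel groups */
		uint32_t group;     /* application group id */
	};
};

int
main(void)
{
	union tbl_key a = { .tunnel_id = 1, .group = 5 };
	union tbl_key b = { .tunnel_id = 2, .group = 5 };

	/* Same application group, different tunnels: distinct keys. */
	assert(a.val != b.val);
	printf("tunnel 1/group 5 key=%#lx, tunnel 2/group 5 key=%#lx\n",
	       (unsigned long)a.val, (unsigned long)b.val);
	return 0;
}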
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 15cd34e331..ba9a9cf769 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -3947,14 +3947,21 @@ flow_dv_validate_action_modify_ttl(const uint64_t action_flags,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-flow_dv_validate_action_jump(const struct rte_flow_action *action,
+flow_dv_validate_action_jump(struct rte_eth_dev *dev,
+			     const struct mlx5_flow_tunnel *tunnel,
+			     const struct rte_flow_action *action,
 			     uint64_t action_flags,
 			     const struct rte_flow_attr *attributes,
 			     bool external, struct rte_flow_error *error)
 {
 	uint32_t target_group, table;
 	int ret = 0;
-
+	struct flow_grp_info grp_info = {
+		.external = !!external,
+		.transfer = !!attributes->transfer,
+		.fdb_def_rule = 1,
+		.std_tbl_fix = 0
+	};
 	if (action_flags & (MLX5_FLOW_FATE_ACTIONS |
 			    MLX5_FLOW_FATE_ESWITCH_ACTIONS))
 		return rte_flow_error_set(error, EINVAL,
@@ -3977,11 +3984,13 @@ flow_dv_validate_action_jump(const struct rte_flow_action *action,
 					  NULL, "action configuration not set");
 	target_group =
 		((const struct rte_flow_action_jump *)action->conf)->group;
-	ret = mlx5_flow_group_to_table(attributes, external, target_group,
-				       true, &table, error);
+	ret = mlx5_flow_group_to_table(dev, tunnel, target_group, &table,
+				       grp_info, error);
 	if (ret)
 		return ret;
-	if (attributes->group == target_group)
+	if (attributes->group == target_group &&
+	    !(action_flags & (MLX5_FLOW_ACTION_TUNNEL_SET |
+			      MLX5_FLOW_ACTION_TUNNEL_MATCH)))
 		return rte_flow_error_set(error, EINVAL,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "target group must be other than"
@@ -5160,8 +5169,9 @@ flow_dv_counter_release(struct rte_eth_dev *dev, uint32_t counter)
  */
 static int
 flow_dv_validate_attributes(struct rte_eth_dev *dev,
+			    const struct mlx5_flow_tunnel *tunnel,
 			    const struct rte_flow_attr *attributes,
-			    bool external __rte_unused,
+			    struct flow_grp_info grp_info,
 			    struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -5169,6 +5179,8 @@ flow_dv_validate_attributes(struct rte_eth_dev *dev,
 	int ret = 0;
 
 #ifndef HAVE_MLX5DV_DR
+	(void)tunnel;
+	(void)grp_info;
 	if (attributes->group)
 		return rte_flow_error_set(error, ENOTSUP,
 					  RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
@@ -5177,9 +5189,8 @@ flow_dv_validate_attributes(struct rte_eth_dev *dev,
 #else
 	uint32_t table = 0;
 
-	ret = mlx5_flow_group_to_table(attributes, external,
-				       attributes->group, !!priv->fdb_def_rule,
-				       &table, error);
+	ret = mlx5_flow_group_to_table(dev, tunnel, attributes->group, &table,
+				       grp_info, error);
 	if (ret)
 		return ret;
 	if (!table)
@@ -5293,10 +5304,28 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 	const struct rte_flow_item_vlan *vlan_m = NULL;
 	int16_t rw_act_num = 0;
 	uint64_t is_root;
+	const struct mlx5_flow_tunnel *tunnel;
+	struct flow_grp_info grp_info = {
+		.external = !!external,
+		.transfer = !!attr->transfer,
+		.fdb_def_rule = !!priv->fdb_def_rule,
+	};
 
 	if (items == NULL)
 		return -1;
-	ret = flow_dv_validate_attributes(dev, attr, external, error);
+	if (is_flow_tunnel_match_rule(dev, attr, items, actions)) {
+		tunnel = flow_items_to_tunnel(items);
+		action_flags |= MLX5_FLOW_ACTION_TUNNEL_MATCH |
+				MLX5_FLOW_ACTION_DECAP;
+	} else if (is_flow_tunnel_steer_rule(dev, attr, items, actions)) {
+		tunnel = flow_actions_to_tunnel(actions);
+		action_flags |= MLX5_FLOW_ACTION_TUNNEL_SET;
+	} else {
+		tunnel = NULL;
+	}
+	grp_info.std_tbl_fix = tunnel_use_standard_attr_group_translate
+				(dev, tunnel, attr, items, actions);
+	ret = flow_dv_validate_attributes(dev, tunnel, attr, grp_info, error);
 	if (ret < 0)
 		return ret;
 	is_root = (uint64_t)ret;
@@ -5309,6 +5338,15 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 						  RTE_FLOW_ERROR_TYPE_ITEM,
 						  NULL, "item not supported");
 		switch (type) {
+		case MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL:
+			if (items[0].type != (typeof(items[0].type))
+						MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL)
+				return rte_flow_error_set
+						(error, EINVAL,
+						RTE_FLOW_ERROR_TYPE_ITEM,
+						NULL, "MLX5 private items "
+						"must be the first");
+			break;
 		case RTE_FLOW_ITEM_TYPE_VOID:
 			break;
 		case RTE_FLOW_ITEM_TYPE_PORT_ID:
@@ -5894,7 +5932,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			rw_act_num += MLX5_ACT_NUM_MDF_TTL;
 			break;
 		case RTE_FLOW_ACTION_TYPE_JUMP:
-			ret = flow_dv_validate_action_jump(actions,
+			ret = flow_dv_validate_action_jump(dev, tunnel, actions,
 							   action_flags,
 							   attr, external,
 							   error);
@@ -6003,6 +6041,17 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			action_flags |= MLX5_FLOW_ACTION_SAMPLE;
 			++actions_n;
 			break;
+		case MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET:
+			if (actions[0].type != (typeof(actions[0].type))
+				MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET)
+				return rte_flow_error_set
+						(error, EINVAL,
+						RTE_FLOW_ERROR_TYPE_ACTION,
+						NULL, "MLX5 private action "
+						"must be the first");
+
+			action_flags |= MLX5_FLOW_ACTION_TUNNEL_SET;
+			break;
 		default:
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ACTION,
@@ -6010,6 +6059,54 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 						  "action not supported");
 		}
 	}
+	/*
+	 * Validate actions in flow rules:
+	 * - An explicit decap action is prohibited by the tunnel offload API.
+	 * - A drop action in a tunnel steer rule is prohibited by the API.
+	 * - The application cannot use the MARK action because its value can
+	 *   mask the tunnel default miss notification.
+	 * - JUMP in a tunnel match rule is not supported by the current PMD
+	 *   implementation.
+	 * - TAG & META are reserved for future use.
+	 */
+	if (action_flags & MLX5_FLOW_ACTION_TUNNEL_SET) {
+		uint64_t bad_actions_mask = MLX5_FLOW_ACTION_DECAP    |
+					    MLX5_FLOW_ACTION_MARK     |
+					    MLX5_FLOW_ACTION_SET_TAG  |
+					    MLX5_FLOW_ACTION_SET_META |
+					    MLX5_FLOW_ACTION_DROP;
+
+		if (action_flags & bad_actions_mask)
+			return rte_flow_error_set
+					(error, EINVAL,
+					RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					"Invalid RTE action in tunnel "
+					"set decap rule");
+		if (!(action_flags & MLX5_FLOW_ACTION_JUMP))
+			return rte_flow_error_set
+					(error, EINVAL,
+					RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					"tunnel set decap rule must terminate "
+					"with JUMP");
+		if (!attr->ingress)
+			return rte_flow_error_set
+					(error, EINVAL,
+					RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					"tunnel flows for ingress traffic only");
+	}
+	if (action_flags & MLX5_FLOW_ACTION_TUNNEL_MATCH) {
+		uint64_t bad_actions_mask = MLX5_FLOW_ACTION_JUMP    |
+					    MLX5_FLOW_ACTION_MARK    |
+					    MLX5_FLOW_ACTION_SET_TAG |
+					    MLX5_FLOW_ACTION_SET_META;
+
+		if (action_flags & bad_actions_mask)
+			return rte_flow_error_set
+					(error, EINVAL,
+					RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					"Invalid RTE action in tunnel "
+					"set match rule");
+	}
 	/*
 	 * Validate the drop action mutual exclusion with other actions.
 	 * Drop action is mutually-exclusive with any other action, except for
@@ -7876,6 +7973,9 @@ static struct mlx5_flow_tbl_resource *
 flow_dv_tbl_resource_get(struct rte_eth_dev *dev,
 			 uint32_t table_id, uint8_t egress,
 			 uint8_t transfer,
+			 bool external,
+			 const struct mlx5_flow_tunnel *tunnel,
+			 uint32_t group_id,
 			 struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -7912,6 +8012,9 @@ flow_dv_tbl_resource_get(struct rte_eth_dev *dev,
 		return NULL;
 	}
 	tbl_data->idx = idx;
+	tbl_data->tunnel = tunnel;
+	tbl_data->group_id = group_id;
+	tbl_data->external = external;
 	tbl = &tbl_data->tbl;
 	pos = &tbl_data->entry;
 	if (transfer)
@@ -7975,6 +8078,41 @@ flow_dv_tbl_resource_release(struct rte_eth_dev *dev,
 
 		mlx5_flow_os_destroy_flow_tbl(tbl->obj);
 		tbl->obj = NULL;
+		if (is_tunnel_offload_active(dev) && tbl_data->external) {
+			struct mlx5_hlist_entry *he;
+			struct mlx5_hlist *tunnel_grp_hash;
+			struct mlx5_flow_tunnel_hub *thub =
+							mlx5_tunnel_hub(dev);
+			union tunnel_tbl_key tunnel_key = {
+				.tunnel_id = tbl_data->tunnel ?
+						tbl_data->tunnel->tunnel_id : 0,
+				.group = tbl_data->group_id
+			};
+			union mlx5_flow_tbl_key table_key = {
+				.v64 = pos->key
+			};
+			uint32_t table_id = table_key.table_id;
+
+			tunnel_grp_hash = tbl_data->tunnel ?
+						tbl_data->tunnel->groups :
+						thub->groups;
+			he = mlx5_hlist_lookup(tunnel_grp_hash, tunnel_key.val);
+			if (he) {
+				struct tunnel_tbl_entry *tte;
+				tte = container_of(he, typeof(*tte), hash);
+				MLX5_ASSERT(tte->flow_table == table_id);
+				mlx5_hlist_remove(tunnel_grp_hash, he);
+				mlx5_free(tte);
+			}
+			mlx5_flow_id_release(mlx5_tunnel_hub(dev)->table_ids,
+					     tunnel_flow_tbl_to_id(table_id));
+			DRV_LOG(DEBUG,
+				"port %u release table_id %#x tunnel %u group %u",
+				dev->data->port_id, table_id,
+				tbl_data->tunnel ?
+				tbl_data->tunnel->tunnel_id : 0,
+				tbl_data->group_id);
+		}
 		/* remove the entry from the hash list and free memory. */
 		mlx5_hlist_remove(sh->flow_tbls, pos);
 		mlx5_ipool_free(priv->sh->ipool[MLX5_IPOOL_JUMP],
@@ -8020,7 +8158,7 @@ flow_dv_matcher_register(struct rte_eth_dev *dev,
 	int ret;
 
 	tbl = flow_dv_tbl_resource_get(dev, key->table_id, key->direction,
-				       key->domain, error);
+				       key->domain, false, NULL, 0, error);
 	if (!tbl)
 		return -rte_errno;	/* No need to refill the error info */
 	tbl_data = container_of(tbl, struct mlx5_flow_tbl_data_entry, tbl);
@@ -8515,7 +8653,8 @@ flow_dv_sample_resource_register(struct rte_eth_dev *dev,
 	*cache_resource = *resource;
 	/* Create normal path table level */
 	tbl = flow_dv_tbl_resource_get(dev, next_ft_id,
-					attr->egress, attr->transfer, error);
+					attr->egress, attr->transfer,
+					dev_flow->external, NULL, 0, error);
 	if (!tbl) {
 		rte_flow_error_set(error, ENOMEM,
 					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
@@ -9127,6 +9266,12 @@ __flow_dv_translate(struct rte_eth_dev *dev,
 	int tmp_actions_n = 0;
 	uint32_t table;
 	int ret = 0;
+	const struct mlx5_flow_tunnel *tunnel;
+	struct flow_grp_info grp_info = {
+		.external = !!dev_flow->external,
+		.transfer = !!attr->transfer,
+		.fdb_def_rule = !!priv->fdb_def_rule,
+	};
 
 	memset(&mdest_res, 0, sizeof(struct mlx5_flow_dv_dest_array_resource));
 	memset(&sample_res, 0, sizeof(struct mlx5_flow_dv_sample_resource));
@@ -9134,8 +9279,17 @@ __flow_dv_translate(struct rte_eth_dev *dev,
 					   MLX5DV_FLOW_TABLE_TYPE_NIC_RX;
 	/* update normal path action resource into last index of array */
 	sample_act = &mdest_res.sample_act[MLX5_MAX_DEST_NUM - 1];
-	ret = mlx5_flow_group_to_table(attr, dev_flow->external, attr->group,
-				       !!priv->fdb_def_rule, &table, error);
+	tunnel = is_flow_tunnel_match_rule(dev, attr, items, actions) ?
+		 flow_items_to_tunnel(items) :
+		 is_flow_tunnel_steer_rule(dev, attr, items, actions) ?
+		 flow_actions_to_tunnel(actions) :
+		 dev_flow->tunnel ? dev_flow->tunnel : NULL;
+	mhdr_res->ft_type = attr->egress ? MLX5DV_FLOW_TABLE_TYPE_NIC_TX :
+					   MLX5DV_FLOW_TABLE_TYPE_NIC_RX;
+	grp_info.std_tbl_fix = tunnel_use_standard_attr_group_translate
+				(dev, tunnel, attr, items, actions);
+	ret = mlx5_flow_group_to_table(dev, tunnel, attr->group, &table,
+				       grp_info, error);
 	if (ret)
 		return ret;
 	dev_flow->dv.group = table;
@@ -9145,6 +9299,45 @@ __flow_dv_translate(struct rte_eth_dev *dev,
 		priority = dev_conf->flow_prio - 1;
 	/* number of actions must be set to 0 in case of dirty stack. */
 	mhdr_res->actions_num = 0;
+	if (is_flow_tunnel_match_rule(dev, attr, items, actions)) {
+		/*
+		 * Do not add a decap action if the match rule drops the
+		 * packet: HW rejects rules with both decap and drop.
+		 */
+		bool add_decap = true;
+		const struct rte_flow_action *ptr = actions;
+		struct mlx5_flow_tbl_resource *tbl;
+
+		for (; ptr->type != RTE_FLOW_ACTION_TYPE_END; ptr++) {
+			if (ptr->type == RTE_FLOW_ACTION_TYPE_DROP) {
+				add_decap = false;
+				break;
+			}
+		}
+		if (add_decap) {
+			if (flow_dv_create_action_l2_decap(dev, dev_flow,
+							   attr->transfer,
+							   error))
+				return -rte_errno;
+			dev_flow->dv.actions[actions_n++] =
+					dev_flow->dv.encap_decap->action;
+			action_flags |= MLX5_FLOW_ACTION_DECAP;
+		}
+		/*
+		 * Bind table_id with <group, table> for the tunnel match rule.
+		 * The tunnel set rule establishes that binding in the JUMP
+		 * action handler. Required for the scenario when the
+		 * application creates the tunnel match rule before the
+		 * tunnel set rule.
+		 */
+		tbl = flow_dv_tbl_resource_get(dev, table, attr->egress,
+					       attr->transfer,
+					       !!dev_flow->external, tunnel,
+					       attr->group, error);
+		if (!tbl)
+			return rte_flow_error_set
+			       (error, EINVAL, RTE_FLOW_ERROR_TYPE_ACTION,
+			       actions, "cannot register tunnel group");
+	}
 	for (; !actions_end ; actions++) {
 		const struct rte_flow_action_queue *queue;
 		const struct rte_flow_action_rss *rss;
@@ -9165,6 +9358,9 @@ __flow_dv_translate(struct rte_eth_dev *dev,
 						  actions,
 						  "action not supported");
 		switch (action_type) {
+		case MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET:
+			action_flags |= MLX5_FLOW_ACTION_TUNNEL_SET;
+			break;
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			break;
 		case RTE_FLOW_ACTION_TYPE_PORT_ID:
@@ -9408,18 +9604,18 @@ __flow_dv_translate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_JUMP:
 			jump_group = ((const struct rte_flow_action_jump *)
 							action->conf)->group;
-			if (dev_flow->external && jump_group <
-					MLX5_MAX_TABLES_EXTERNAL)
-				jump_group *= MLX5_FLOW_TABLE_FACTOR;
-			ret = mlx5_flow_group_to_table(attr, dev_flow->external,
+			grp_info.std_tbl_fix = 0;
+			ret = mlx5_flow_group_to_table(dev, tunnel,
 						       jump_group,
-						       !!priv->fdb_def_rule,
-						       &table, error);
+						       &table,
+						       grp_info, error);
 			if (ret)
 				return ret;
-			tbl = flow_dv_tbl_resource_get(dev, table,
-						       attr->egress,
-						       attr->transfer, error);
+			tbl = flow_dv_tbl_resource_get(dev, table, attr->egress,
+						       attr->transfer,
+						       !!dev_flow->external,
+						       tunnel, jump_group,
+						       error);
 			if (!tbl)
 				return rte_flow_error_set
 						(error, errno,
@@ -10890,7 +11086,8 @@ flow_dv_prepare_mtr_tables(struct rte_eth_dev *dev,
 		dtb = &mtb->ingress;
 	/* Create the meter table with METER level. */
 	dtb->tbl = flow_dv_tbl_resource_get(dev, MLX5_FLOW_TABLE_LEVEL_METER,
-					    egress, transfer, &error);
+					    egress, transfer, false, NULL, 0,
+					    &error);
 	if (!dtb->tbl) {
 		DRV_LOG(ERR, "Failed to create meter policer table.");
 		return -1;
@@ -10898,7 +11095,8 @@ flow_dv_prepare_mtr_tables(struct rte_eth_dev *dev,
 	/* Create the meter suffix table with SUFFIX level. */
 	dtb->sfx_tbl = flow_dv_tbl_resource_get(dev,
 					    MLX5_FLOW_TABLE_LEVEL_SUFFIX,
-					    egress, transfer, &error);
+					    egress, transfer, false, NULL, 0,
+					    &error);
 	if (!dtb->sfx_tbl) {
 		DRV_LOG(ERR, "Failed to create meter suffix table.");
 		return -1;
@@ -11217,10 +11415,10 @@ mlx5_flow_dv_discover_counter_offset_support(struct rte_eth_dev *dev)
 	void *flow = NULL;
 	int i, ret = -1;
 
-	tbl = flow_dv_tbl_resource_get(dev, 0, 0, 0, NULL);
+	tbl = flow_dv_tbl_resource_get(dev, 0, 0, 0, false, NULL, 0, NULL);
 	if (!tbl)
 		goto err;
-	dest_tbl = flow_dv_tbl_resource_get(dev, 1, 0, 0, NULL);
+	dest_tbl = flow_dv_tbl_resource_get(dev, 1, 0, 0, false, NULL, 0, NULL);
 	if (!dest_tbl)
 		goto err;
 	dcs = mlx5_devx_cmd_flow_counter_alloc(priv->sh->ctx, 0x4);
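
Across the whole patch, the restore path works by encoding the tunnel flow
table and the transfer bit into the MARK value set by the default-miss rule,
and decoding it back from mbuf->hash.fdir.hi in tunnel_mark_decode() and
mlx5_flow_tunnel_get_restore_info(). A standalone round-trip of that layout,
mirroring union tunnel_offload_mark (bit-field placement is compiler-defined;
only the pack/unpack symmetry matters here):

/* Standalone round-trip of the tunnel-offload MARK layout:
 * 8 bits reserved for the application, 15 bits of flow table id,
 * 1 transfer bit, upper 8 bits unused. */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

union offload_mark {
	uint32_t val;
	struct {
		uint32_t app_reserve:8;
		uint32_t table_id:15;
		uint32_t transfer:1;
		uint32_t unused:8;
	};
};

int
main(void)
{
	union offload_mark mark = { .val = 0 };

	mark.table_id = 0x1234;   /* tunnel_flow_tbl_to_id(flow_table) */
	mark.transfer = 1;        /* FDB rule */

	/* Pretend mark.val arrived in mbuf->hash.fdir.hi and decode it. */
	union offload_mark rx = { .val = mark.val };

	assert(rx.table_id == 0x1234 && rx.transfer == 1);
	printf("mark=%#x table_id=%#x transfer=%u\n",
	       rx.val, rx.table_id, rx.transfer);
	return 0;
}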
-- 
2.28.0


^ permalink raw reply	[flat|nested] 95+ messages in thread

* [dpdk-dev] [PATCH v4] net/mlx5: implement tunnel offload API
  2020-06-25 16:03 [dpdk-dev] [PATCH 0/2] ethdev: tunnel offload model Gregory Etelson
                   ` (13 preceding siblings ...)
  2020-10-23 13:49 ` [dpdk-dev] [PATCH v3] " Gregory Etelson
@ 2020-10-23 13:57 ` Gregory Etelson
  2020-10-25 14:08 ` [dpdk-dev] [PATCH v5] " Gregory Etelson
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 95+ messages in thread
From: Gregory Etelson @ 2020-10-23 13:57 UTC (permalink / raw)
  To: dev
  Cc: getelson, matan, rasland, elibr, ozsh, Viacheslav Ovsiienko,
	Shahaf Shuler

Tunnel Offload API provides hardware independent, unified model
to offload tunneled traffic. Key model elements are:
 - apply matches to both outer and inner packet headers
   during entire offload procedure;
 - restore outer header of partially offloaded packet;
 - model is implemented as a set of helper functions.

Implementation details:
* tunnel_offload PMD parameter must be set to 1 to enable the feature.
* application cannot use MARK and META flow actions with tunnel offload.
* JUMP action offload is restricted to the tunnel steering rule only.

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
v2:
* fix rebase conflicts
v3:
* remove unused variables
v4:
* use RTE_SET_USED()
---
 doc/guides/nics/mlx5.rst         |   3 +
 drivers/net/mlx5/linux/mlx5_os.c |  18 +
 drivers/net/mlx5/mlx5.c          |   8 +-
 drivers/net/mlx5/mlx5.h          |   3 +
 drivers/net/mlx5/mlx5_defs.h     |   2 +
 drivers/net/mlx5/mlx5_flow.c     | 682 ++++++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_flow.h     | 170 +++++++-
 drivers/net/mlx5/mlx5_flow_dv.c  | 254 ++++++++++--
 8 files changed, 1087 insertions(+), 53 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 1a8808e854..678d2a9597 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -831,6 +831,9 @@ Driver options
     24 bits. The actual supported width can be retrieved in runtime by
     series of rte_flow_validate() trials.
 
+  - 3, this engages tunnel offload mode. In E-Switch configuration, that
+    mode implicitly activates ``dv_xmeta_en=1``.
+
   +------+-----------+-----------+-------------+-------------+
   | Mode | ``MARK``  | ``META``  | ``META`` Tx | FDB/Through |
   +======+===========+===========+=============+=============+
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 40f9446d43..ed3f020d82 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -291,6 +291,12 @@ mlx5_alloc_shared_dr(struct mlx5_priv *priv)
 		sh->esw_drop_action = mlx5_glue->dr_create_flow_action_drop();
 	}
 #endif
+	if (!sh->tunnel_hub)
+		err = mlx5_alloc_tunnel_hub(sh);
+	if (err) {
+		DRV_LOG(ERR, "mlx5_alloc_tunnel_hub failed err=%d", err);
+		goto error;
+	}
 	if (priv->config.reclaim_mode == MLX5_RCM_AGGR) {
 		mlx5_glue->dr_reclaim_domain_memory(sh->rx_domain, 1);
 		mlx5_glue->dr_reclaim_domain_memory(sh->tx_domain, 1);
@@ -335,6 +341,10 @@ mlx5_alloc_shared_dr(struct mlx5_priv *priv)
 		mlx5_hlist_destroy(sh->tag_table, NULL, NULL);
 		sh->tag_table = NULL;
 	}
+	if (sh->tunnel_hub) {
+		mlx5_release_tunnel_hub(sh, priv->dev_port);
+		sh->tunnel_hub = NULL;
+	}
 	mlx5_free_table_hash_list(priv);
 	return err;
 }
@@ -391,6 +401,10 @@ mlx5_os_free_shared_dr(struct mlx5_priv *priv)
 		mlx5_hlist_destroy(sh->tag_table, NULL, NULL);
 		sh->tag_table = NULL;
 	}
+	if (sh->tunnel_hub) {
+		mlx5_release_tunnel_hub(sh, priv->dev_port);
+		sh->tunnel_hub = NULL;
+	}
 	mlx5_free_table_hash_list(priv);
 }
 
@@ -733,6 +747,10 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 			strerror(rte_errno));
 		goto error;
 	}
+	if (config->dv_miss_info) {
+		if (switch_info->master || switch_info->representor)
+			config->dv_xmeta_en = MLX5_XMETA_MODE_META16;
+	}
 	mlx5_malloc_mem_select(config->sys_mem_en);
 	sh = mlx5_alloc_shared_dev_ctx(spawn, config);
 	if (!sh)
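
mlx5_alloc_shared_dr() only allocates the hub when sh->tunnel_hub is still
NULL, and both its error path and mlx5_os_free_shared_dr() release the hub
and clear the pointer. A tiny standalone model of that guarded
allocate/release pattern, with stand-in types rather than the PMD structures:

/*
 * Standalone model of the guarded allocate/release pattern used for
 * sh->tunnel_hub: allocate only once, always clear the pointer on release.
 */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct hub { int dummy; };
struct shared_ctx { struct hub *tunnel_hub; };

static int
ctx_alloc_hub(struct shared_ctx *sh)
{
	if (sh->tunnel_hub)
		return 0; /* already initialized */
	sh->tunnel_hub = calloc(1, sizeof(*sh->tunnel_hub));
	return sh->tunnel_hub ? 0 : -ENOMEM;
}

static void
ctx_release_hub(struct shared_ctx *sh)
{
	if (!sh->tunnel_hub)
		return;
	free(sh->tunnel_hub);
	sh->tunnel_hub = NULL; /* allows a later re-allocation */
}

int
main(void)
{
	struct shared_ctx sh = { 0 };

	assert(ctx_alloc_hub(&sh) == 0);
	assert(ctx_alloc_hub(&sh) == 0); /* second call is a no-op */
	ctx_release_hub(&sh);
	ctx_release_hub(&sh);            /* release is idempotent */
	assert(sh.tunnel_hub == NULL);
	return 0;
}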
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index e4ce9a9cb7..25048df070 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1618,13 +1618,17 @@ mlx5_args_check(const char *key, const char *val, void *opaque)
 	} else if (strcmp(MLX5_DV_XMETA_EN, key) == 0) {
 		if (tmp != MLX5_XMETA_MODE_LEGACY &&
 		    tmp != MLX5_XMETA_MODE_META16 &&
-		    tmp != MLX5_XMETA_MODE_META32) {
+		    tmp != MLX5_XMETA_MODE_META32 &&
+		    tmp != MLX5_XMETA_MODE_MISS_INFO) {
 			DRV_LOG(ERR, "invalid extensive "
 				     "metadata parameter");
 			rte_errno = EINVAL;
 			return -rte_errno;
 		}
-		config->dv_xmeta_en = tmp;
+		if (tmp != MLX5_XMETA_MODE_MISS_INFO)
+			config->dv_xmeta_en = tmp;
+		else
+			config->dv_miss_info = 1;
 	} else if (strcmp(MLX5_LACP_BY_USER, key) == 0) {
 		config->lacp_by_user = !!tmp;
 	} else if (strcmp(MLX5_MR_EXT_MEMSEG_EN, key) == 0) {
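
Mode 3 is folded into the existing dv_xmeta_en devarg: it only raises
config->dv_miss_info and deliberately leaves dv_xmeta_en itself untouched
(the E-Switch case is promoted to META16 later, in mlx5_os.c). A simplified
standalone model of that parsing decision; the MODE_* macros are shortened
stand-ins for the MLX5_XMETA_MODE_* values in mlx5_defs.h:

/* Simplified model of the dv_xmeta_en devarg handling added above. */
#include <assert.h>
#include <stdio.h>

#define MODE_LEGACY    0
#define MODE_META16    1
#define MODE_META32    2
#define MODE_MISS_INFO 3 /* tunnel offload: report partial HW miss */

struct cfg {
	unsigned int dv_xmeta_en;
	unsigned int dv_miss_info;
};

static int
parse_dv_xmeta_en(unsigned long tmp, struct cfg *c)
{
	if (tmp != MODE_LEGACY && tmp != MODE_META16 &&
	    tmp != MODE_META32 && tmp != MODE_MISS_INFO)
		return -1; /* invalid extensive metadata parameter */
	if (tmp != MODE_MISS_INFO)
		c->dv_xmeta_en = tmp;
	else
		c->dv_miss_info = 1;
	return 0;
}

int
main(void)
{
	struct cfg c = { .dv_xmeta_en = MODE_LEGACY, .dv_miss_info = 0 };

	assert(parse_dv_xmeta_en(MODE_MISS_INFO, &c) == 0);
	/* Mode 3 does not overwrite the metadata mode directly. */
	assert(c.dv_xmeta_en == MODE_LEGACY && c.dv_miss_info == 1);
	printf("dv_xmeta_en=%u dv_miss_info=%u\n",
	       c.dv_xmeta_en, c.dv_miss_info);
	return 0;
}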
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index c9d5d71630..479a815eed 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -208,6 +208,7 @@ struct mlx5_dev_config {
 	unsigned int rt_timestamp:1; /* realtime timestamp format. */
 	unsigned int sys_mem_en:1; /* The default memory allocator. */
 	unsigned int decap_en:1; /* Whether decap will be used or not. */
+	unsigned int dv_miss_info:1; /* restore packet after partial hw miss */
 	struct {
 		unsigned int enabled:1; /* Whether MPRQ is enabled. */
 		unsigned int stride_num_n; /* Number of strides. */
@@ -644,6 +645,8 @@ struct mlx5_dev_ctx_shared {
 	/* UAR same-page access control required in 32bit implementations. */
 #endif
 	struct mlx5_hlist *flow_tbls;
+	struct rte_hash *flow_tbl_map; /* app group-to-flow table map */
+	struct mlx5_flow_tunnel_hub *tunnel_hub;
 	/* Direct Rules tables for FDB, NIC TX+RX */
 	void *esw_drop_action; /* Pointer to DR E-Switch drop action. */
 	void *pop_vlan_action; /* Pointer to DR pop VLAN action. */
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index 0df47391ee..41a7537d5e 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -165,6 +165,8 @@
 #define MLX5_XMETA_MODE_LEGACY 0
 #define MLX5_XMETA_MODE_META16 1
 #define MLX5_XMETA_MODE_META32 2
+/* Provide info on partial HW miss. Implies MLX5_XMETA_MODE_META16. */
+#define MLX5_XMETA_MODE_MISS_INFO 3
 
 /* MLX5_TX_DB_NC supported values. */
 #define MLX5_TXDB_CACHED 0
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index d7243a878b..21004452e3 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -19,6 +19,7 @@
 #include <rte_flow_driver.h>
 #include <rte_malloc.h>
 #include <rte_ip.h>
+#include <rte_hash.h>
 
 #include <mlx5_glue.h>
 #include <mlx5_devx_cmds.h>
@@ -32,6 +33,18 @@
 #include "mlx5_rxtx.h"
 #include "mlx5_common_os.h"
 
+static struct mlx5_flow_tunnel *
+mlx5_find_tunnel_id(struct rte_eth_dev *dev, uint32_t id);
+static void
+mlx5_flow_tunnel_free(struct rte_eth_dev *dev, struct mlx5_flow_tunnel *tunnel);
+static const struct mlx5_flow_tbl_data_entry  *
+tunnel_mark_decode(struct rte_eth_dev *dev, uint32_t mark);
+static int
+mlx5_get_flow_tunnel(struct rte_eth_dev *dev,
+		     const struct rte_flow_tunnel *app_tunnel,
+		     struct mlx5_flow_tunnel **tunnel);
+
+
 /** Device flow drivers. */
 extern const struct mlx5_flow_driver_ops mlx5_flow_verbs_drv_ops;
 
@@ -548,6 +561,163 @@ static const struct mlx5_flow_expand_node mlx5_support_expansion[] = {
 	},
 };
 
+static inline bool
+mlx5_flow_tunnel_validate(struct rte_eth_dev *dev,
+			  struct rte_flow_tunnel *tunnel,
+			  const char **err_msg)
+{
+	*err_msg = NULL;
+	if (!is_tunnel_offload_active(dev)) {
+		err_msg = "tunnel offload was not activated";
+		goto out;
+	} else if (!tunnel) {
+		err_msg = "no application tunnel";
+		goto out;
+	}
+
+	switch (tunnel->type) {
+	default:
+		err_msg = "unsupported tunnel type";
+		goto out;
+	case RTE_FLOW_ITEM_TYPE_VXLAN:
+		break;
+	}
+
+out:
+	return !*err_msg;
+}
+
+
+static int
+mlx5_flow_tunnel_decap_set(struct rte_eth_dev *dev,
+		    struct rte_flow_tunnel *app_tunnel,
+		    struct rte_flow_action **actions,
+		    uint32_t *num_of_actions,
+		    struct rte_flow_error *error)
+{
+	int ret;
+	struct mlx5_flow_tunnel *tunnel;
+	const char *err_msg = NULL;
+	bool verdict = mlx5_flow_tunnel_validate(dev, app_tunnel, &err_msg);
+
+	if (!verdict)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION_CONF, NULL,
+					  err_msg);
+	ret = mlx5_get_flow_tunnel(dev, app_tunnel, &tunnel);
+	if (ret < 0) {
+		return rte_flow_error_set(error, ret,
+					  RTE_FLOW_ERROR_TYPE_ACTION_CONF, NULL,
+					  "failed to initialize pmd tunnel");
+	}
+	*actions = &tunnel->action;
+	*num_of_actions = 1;
+	return 0;
+}
+
+static int
+mlx5_flow_tunnel_match(struct rte_eth_dev *dev,
+		       struct rte_flow_tunnel *app_tunnel,
+		       struct rte_flow_item **items,
+		       uint32_t *num_of_items,
+		       struct rte_flow_error *error)
+{
+	int ret;
+	struct mlx5_flow_tunnel *tunnel;
+	const char *err_msg = NULL;
+	bool verdict = mlx5_flow_tunnel_validate(dev, app_tunnel, &err_msg);
+
+	if (!verdict)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_HANDLE, NULL,
+					  err_msg);
+	ret = mlx5_get_flow_tunnel(dev, app_tunnel, &tunnel);
+	if (ret < 0) {
+		return rte_flow_error_set(error, ret,
+					  RTE_FLOW_ERROR_TYPE_HANDLE, NULL,
+					  "failed to initialize pmd tunnel");
+	}
+	*items = &tunnel->item;
+	*num_of_items = 1;
+	return 0;
+}
+
+static int
+mlx5_flow_item_release(struct rte_eth_dev *dev,
+		       struct rte_flow_item *pmd_items,
+		       uint32_t num_items, struct rte_flow_error *err)
+{
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_tunnel *tun;
+
+	LIST_FOREACH(tun, &thub->tunnels, chain) {
+		if (&tun->item == pmd_items)
+			break;
+	}
+	if (!tun || num_items != 1)
+		return rte_flow_error_set(err, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_HANDLE, NULL,
+					  "invalid argument");
+	if (!__atomic_sub_fetch(&tun->refctn, 1, __ATOMIC_RELAXED))
+		mlx5_flow_tunnel_free(dev, tun);
+	return 0;
+}
+
+static int
+mlx5_flow_action_release(struct rte_eth_dev *dev,
+			 struct rte_flow_action *pmd_actions,
+			 uint32_t num_actions, struct rte_flow_error *err)
+{
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_tunnel *tun;
+
+	LIST_FOREACH(tun, &thub->tunnels, chain) {
+		if (&tun->action == pmd_actions)
+			break;
+	}
+	if (!tun || num_actions != 1)
+		return rte_flow_error_set(err, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_HANDLE, NULL,
+					  "invalid argument");
+	if (!__atomic_sub_fetch(&tun->refctn, 1, __ATOMIC_RELAXED))
+		mlx5_flow_tunnel_free(dev, tun);
+
+	return 0;
+}
+
+static int
+mlx5_flow_tunnel_get_restore_info(struct rte_eth_dev *dev,
+				  struct rte_mbuf *m,
+				  struct rte_flow_restore_info *info,
+				  struct rte_flow_error *err)
+{
+	uint64_t ol_flags = m->ol_flags;
+	const struct mlx5_flow_tbl_data_entry *tble;
+	const uint64_t mask = PKT_RX_FDIR | PKT_RX_FDIR_ID;
+
+	if ((ol_flags & mask) != mask)
+		goto err;
+	tble = tunnel_mark_decode(dev, m->hash.fdir.hi);
+	if (!tble) {
+		DRV_LOG(DEBUG, "port %u invalid miss tunnel mark %#x",
+			dev->data->port_id, m->hash.fdir.hi);
+		goto err;
+	}
+	MLX5_ASSERT(tble->tunnel);
+	memcpy(&info->tunnel, &tble->tunnel->app_tunnel, sizeof(info->tunnel));
+	info->group_id = tble->group_id;
+	info->flags = RTE_FLOW_RESTORE_INFO_TUNNEL |
+		      RTE_FLOW_RESTORE_INFO_GROUP_ID |
+		      RTE_FLOW_RESTORE_INFO_ENCAPSULATED;
+
+	return 0;
+
+err:
+	return rte_flow_error_set(err, EINVAL,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				  "failed to get restore info");
+}
+
 static const struct rte_flow_ops mlx5_flow_ops = {
 	.validate = mlx5_flow_validate,
 	.create = mlx5_flow_create,
@@ -557,6 +727,11 @@ static const struct rte_flow_ops mlx5_flow_ops = {
 	.query = mlx5_flow_query,
 	.dev_dump = mlx5_flow_dev_dump,
 	.get_aged_flows = mlx5_flow_get_aged_flows,
+	.tunnel_decap_set = mlx5_flow_tunnel_decap_set,
+	.tunnel_match = mlx5_flow_tunnel_match,
+	.tunnel_action_decap_release = mlx5_flow_action_release,
+	.tunnel_item_release = mlx5_flow_item_release,
+	.get_restore_info = mlx5_flow_tunnel_get_restore_info,
 };
 
 /* Convert FDIR request to Generic flow. */
@@ -3882,6 +4057,142 @@ flow_hairpin_split(struct rte_eth_dev *dev,
 	return 0;
 }
 
+__extension__
+union tunnel_offload_mark {
+	uint32_t val;
+	struct {
+		uint32_t app_reserve:8;
+		uint32_t table_id:15;
+		uint32_t transfer:1;
+		uint32_t _unused_:8;
+	};
+};
+
+struct tunnel_default_miss_ctx {
+	uint16_t *queue;
+	__extension__
+	union {
+		struct rte_flow_action_rss action_rss;
+		struct rte_flow_action_queue miss_queue;
+		struct rte_flow_action_jump miss_jump;
+		uint8_t raw[0];
+	};
+};
+
+static int
+flow_tunnel_add_default_miss(struct rte_eth_dev *dev,
+			     struct rte_flow *flow,
+			     const struct rte_flow_attr *attr,
+			     const struct rte_flow_action *app_actions,
+			     uint32_t flow_idx,
+			     struct tunnel_default_miss_ctx *ctx,
+			     struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow *dev_flow;
+	struct rte_flow_attr miss_attr = *attr;
+	const struct mlx5_flow_tunnel *tunnel = app_actions[0].conf;
+	const struct rte_flow_item miss_items[2] = {
+		{
+			.type = RTE_FLOW_ITEM_TYPE_ETH,
+			.spec = NULL,
+			.last = NULL,
+			.mask = NULL
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+			.spec = NULL,
+			.last = NULL,
+			.mask = NULL
+		}
+	};
+	union tunnel_offload_mark mark_id;
+	struct rte_flow_action_mark miss_mark;
+	struct rte_flow_action miss_actions[3] = {
+		[0] = { .type = RTE_FLOW_ACTION_TYPE_MARK, .conf = &miss_mark },
+		[2] = { .type = RTE_FLOW_ACTION_TYPE_END,  .conf = NULL }
+	};
+	const struct rte_flow_action_jump *jump_data;
+	uint32_t i, flow_table = 0; /* prevent compilation warning */
+	struct flow_grp_info grp_info = {
+		.external = 1,
+		.transfer = attr->transfer,
+		.fdb_def_rule = !!priv->fdb_def_rule,
+		.std_tbl_fix = 0,
+	};
+	int ret;
+
+	if (!attr->transfer) {
+		uint32_t q_size;
+
+		miss_actions[1].type = RTE_FLOW_ACTION_TYPE_RSS;
+		q_size = priv->reta_idx_n * sizeof(ctx->queue[0]);
+		ctx->queue = mlx5_malloc(MLX5_MEM_SYS | MLX5_MEM_ZERO, q_size,
+					 0, SOCKET_ID_ANY);
+		if (!ctx->queue)
+			return rte_flow_error_set
+				(error, ENOMEM,
+				RTE_FLOW_ERROR_TYPE_ACTION_CONF,
+				NULL, "invalid default miss RSS");
+		ctx->action_rss.func = RTE_ETH_HASH_FUNCTION_DEFAULT,
+		ctx->action_rss.level = 0,
+		ctx->action_rss.types = priv->rss_conf.rss_hf,
+		ctx->action_rss.key_len = priv->rss_conf.rss_key_len,
+		ctx->action_rss.queue_num = priv->reta_idx_n,
+		ctx->action_rss.key = priv->rss_conf.rss_key,
+		ctx->action_rss.queue = ctx->queue;
+		if (!priv->reta_idx_n || !priv->rxqs_n)
+			return rte_flow_error_set
+				(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION_CONF,
+				NULL, "invalid port configuration");
+		if (!(dev->data->dev_conf.rxmode.mq_mode & ETH_MQ_RX_RSS_FLAG))
+			ctx->action_rss.types = 0;
+		for (i = 0; i != priv->reta_idx_n; ++i)
+			ctx->queue[i] = (*priv->reta_idx)[i];
+	} else {
+		miss_actions[1].type = RTE_FLOW_ACTION_TYPE_JUMP;
+		ctx->miss_jump.group = MLX5_TNL_MISS_FDB_JUMP_GRP;
+	}
+	miss_actions[1].conf = (typeof(miss_actions[1].conf))ctx->raw;
+	for (; app_actions->type != RTE_FLOW_ACTION_TYPE_JUMP; app_actions++);
+	jump_data = app_actions->conf;
+	miss_attr.priority = MLX5_TNL_MISS_RULE_PRIORITY;
+	miss_attr.group = jump_data->group;
+	ret = mlx5_flow_group_to_table(dev, tunnel, jump_data->group,
+				       &flow_table, grp_info, error);
+	if (ret)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION_CONF,
+					  NULL, "invalid tunnel id");
+	mark_id.app_reserve = 0;
+	mark_id.table_id = tunnel_flow_tbl_to_id(flow_table);
+	mark_id.transfer = !!attr->transfer;
+	mark_id._unused_ = 0;
+	miss_mark.id = mark_id.val;
+	dev_flow = flow_drv_prepare(dev, flow, &miss_attr,
+				    miss_items, miss_actions, flow_idx, error);
+	if (!dev_flow)
+		return -rte_errno;
+	dev_flow->flow = flow;
+	dev_flow->external = true;
+	dev_flow->tunnel = tunnel;
+	/* Subflow object was created, we must include one in the list. */
+	SILIST_INSERT(&flow->dev_handles, dev_flow->handle_idx,
+		      dev_flow->handle, next);
+	DRV_LOG(DEBUG,
+		"port %u tunnel type=%d id=%u miss rule priority=%u group=%u",
+		dev->data->port_id, tunnel->app_tunnel.type,
+		tunnel->tunnel_id, miss_attr.priority, miss_attr.group);
+	ret = flow_drv_translate(dev, dev_flow, &miss_attr, miss_items,
+				  miss_actions, error);
+	if (!ret)
+		ret = flow_mreg_update_copy_table(dev, flow, miss_actions,
+						  error);
+
+	return ret;
+}
+
 /**
  * The last stage of splitting chain, just creates the subflow
  * without any modification.
@@ -5004,6 +5315,27 @@ flow_create_split_outer(struct rte_eth_dev *dev,
 	return ret;
 }
 
+static struct mlx5_flow_tunnel *
+flow_tunnel_from_rule(struct rte_eth_dev *dev,
+		      const struct rte_flow_attr *attr,
+		      const struct rte_flow_item items[],
+		      const struct rte_flow_action actions[])
+{
+	struct mlx5_flow_tunnel *tunnel;
+
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wcast-qual"
+	if (is_flow_tunnel_match_rule(dev, attr, items, actions))
+		tunnel = (struct mlx5_flow_tunnel *)items[0].spec;
+	else if (is_flow_tunnel_steer_rule(dev, attr, items, actions))
+		tunnel = (struct mlx5_flow_tunnel *)actions[0].conf;
+	else
+		tunnel = NULL;
+#pragma GCC diagnostic pop
+
+	return tunnel;
+}
+
 /**
  * Create a flow and add it to @p list.
  *
@@ -5065,11 +5397,11 @@ flow_list_create(struct rte_eth_dev *dev, uint32_t *list,
 	uint32_t hairpin_id = 0;
 	struct rte_flow_attr attr_tx = { .priority = 0 };
 	struct rte_flow_attr attr_factor = {0};
+	struct mlx5_flow_tunnel *tunnel;
+	struct tunnel_default_miss_ctx default_miss_ctx = { 0, };
 	int ret;
 
 	memcpy((void *)&attr_factor, (const void *)attr, sizeof(*attr));
-	if (external)
-		attr_factor.group *= MLX5_FLOW_TABLE_FACTOR;
 	hairpin_flow = flow_check_hairpin_split(dev, &attr_factor, actions);
 	ret = flow_drv_validate(dev, &attr_factor, items, p_actions_rx,
 				external, hairpin_flow, error);
@@ -5141,6 +5473,19 @@ flow_list_create(struct rte_eth_dev *dev, uint32_t *list,
 					      error);
 		if (ret < 0)
 			goto error;
+		if (is_flow_tunnel_steer_rule(dev, attr,
+					      buf->entry[i].pattern,
+					      p_actions_rx)) {
+			ret = flow_tunnel_add_default_miss(dev, flow, attr,
+							   p_actions_rx,
+							   idx,
+							   &default_miss_ctx,
+							   error);
+			if (ret < 0) {
+				mlx5_free(default_miss_ctx.queue);
+				goto error;
+			}
+		}
 	}
 	/* Create the tx flow. */
 	if (hairpin_flow) {
@@ -5195,6 +5540,13 @@ flow_list_create(struct rte_eth_dev *dev, uint32_t *list,
 	priv->flow_idx = priv->flow_nested_idx;
 	if (priv->flow_nested_idx)
 		priv->flow_nested_idx = 0;
+	tunnel = flow_tunnel_from_rule(dev, attr, items, actions);
+	if (tunnel) {
+		flow->tunnel = 1;
+		flow->tunnel_id = tunnel->tunnel_id;
+		__atomic_add_fetch(&tunnel->refctn, 1, __ATOMIC_RELAXED);
+		mlx5_free(default_miss_ctx.queue);
+	}
 	return idx;
 error:
 	MLX5_ASSERT(flow);
@@ -5314,6 +5666,7 @@ mlx5_flow_create(struct rte_eth_dev *dev,
 				   "port not started");
 		return NULL;
 	}
+
 	return (void *)(uintptr_t)flow_list_create(dev, &priv->flows,
 				  attr, items, actions, true, error);
 }
@@ -5368,6 +5721,13 @@ flow_list_destroy(struct rte_eth_dev *dev, uint32_t *list,
 		}
 	}
 	mlx5_ipool_free(priv->sh->ipool[MLX5_IPOOL_RTE_FLOW], flow_idx);
+	if (flow->tunnel) {
+		struct mlx5_flow_tunnel *tunnel;
+		tunnel = mlx5_find_tunnel_id(dev, flow->tunnel_id);
+		RTE_VERIFY(tunnel);
+		if (!__atomic_sub_fetch(&tunnel->refctn, 1, __ATOMIC_RELAXED))
+			mlx5_flow_tunnel_free(dev, tunnel);
+	}
 }
 
 /**
@@ -6902,19 +7262,122 @@ mlx5_flow_async_pool_query_handle(struct mlx5_dev_ctx_shared *sh,
 	sh->cmng.pending_queries--;
 }
 
+static const struct mlx5_flow_tbl_data_entry  *
+tunnel_mark_decode(struct rte_eth_dev *dev, uint32_t mark)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_dev_ctx_shared *sh = priv->sh;
+	struct mlx5_hlist_entry *he;
+	union tunnel_offload_mark mbits = { .val = mark };
+	union mlx5_flow_tbl_key table_key = {
+		{
+			.table_id = tunnel_id_to_flow_tbl(mbits.table_id),
+			.reserved = 0,
+			.domain = !!mbits.transfer,
+			.direction = 0,
+		}
+	};
+	he = mlx5_hlist_lookup(sh->flow_tbls, table_key.v64);
+	return he ?
+	       container_of(he, struct mlx5_flow_tbl_data_entry, entry) : NULL;
+}
+
+static uint32_t
+tunnel_flow_group_to_flow_table(struct rte_eth_dev *dev,
+				const struct mlx5_flow_tunnel *tunnel,
+				uint32_t group, uint32_t *table,
+				struct rte_flow_error *error)
+{
+	struct mlx5_hlist_entry *he;
+	struct tunnel_tbl_entry *tte;
+	union tunnel_tbl_key key = {
+		.tunnel_id = tunnel ? tunnel->tunnel_id : 0,
+		.group = group
+	};
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_hlist *group_hash;
+
+	group_hash = tunnel ? tunnel->groups : thub->groups;
+	he = mlx5_hlist_lookup(group_hash, key.val);
+	if (!he) {
+		int ret;
+		tte = mlx5_malloc(MLX5_MEM_SYS | MLX5_MEM_ZERO,
+				  sizeof(*tte), 0,
+				  SOCKET_ID_ANY);
+		if (!tte)
+			goto err;
+		tte->hash.key = key.val;
+		ret = mlx5_flow_id_get(thub->table_ids, &tte->flow_table);
+		if (ret) {
+			mlx5_free(tte);
+			goto err;
+		}
+		tte->flow_table = tunnel_id_to_flow_tbl(tte->flow_table);
+		mlx5_hlist_insert(group_hash, &tte->hash);
+	} else {
+		tte = container_of(he, typeof(*tte), hash);
+	}
+	*table = tte->flow_table;
+	DRV_LOG(DEBUG, "port %u tunnel %u group=%#x table=%#x",
+		dev->data->port_id, key.tunnel_id, group, *table);
+	return 0;
+
+err:
+	return rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
+				  NULL, "tunnel group index not supported");
+}
+
+static int
+flow_group_to_table(uint32_t port_id, uint32_t group, uint32_t *table,
+		    struct flow_grp_info grp_info, struct rte_flow_error *error)
+{
+	if (grp_info.transfer && grp_info.external && grp_info.fdb_def_rule) {
+		if (group == UINT32_MAX)
+			return rte_flow_error_set
+						(error, EINVAL,
+						 RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
+						 NULL,
+						 "group index not supported");
+		*table = group + 1;
+	} else {
+		*table = group;
+	}
+	DRV_LOG(DEBUG, "port %u group=%#x table=%#x", port_id, group, *table);
+	return 0;
+}
+
 /**
  * Translate the rte_flow group index to HW table value.
  *
- * @param[in] attributes
- *   Pointer to flow attributes
- * @param[in] external
- *   Value is part of flow rule created by request external to PMD.
+ * If tunnel offload is disabled, all group ids are converted to flow
+ * table ids using the standard method.
+ * If tunnel offload is enabled, a group id can be converted using the
+ * standard or the tunnel conversion method. The conversion method
+ * selection depends on flags in the `grp_info` parameter:
+ * - Internal groups (grp_info.external == 0) are converted with the
+ *   standard method.
+ * - Group ids in the JUMP action are converted with the tunnel method.
+ * - Conversion of the group id in rule attributes depends on the rule
+ *   type and the group id value:
+ *   ** non-zero group attributes are converted with the tunnel method
+ *   ** zero group attribute in non-tunnel rule is converted using the
+ *      standard method - there's only one root table
+ *   ** zero group attribute in steer tunnel rule is converted with the
+ *      standard method - single root table
+ *   ** zero group attribute in match tunnel rule is a special OvS
+ *      case: that value is used for portability reasons. That group
+ *      id is converted with the tunnel conversion method.
+ *
+ * @param[in] dev
+ *   Port device
+ * @param[in] tunnel
+ *   PMD tunnel offload object
  * @param[in] group
  *   rte_flow group index value.
- * @param[out] fdb_def_rule
- *   Whether fdb jump to table 1 is configured.
  * @param[out] table
  *   HW table value.
+ * @param[in] grp_info
+ *   flags used for conversion
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -6922,22 +7385,36 @@ mlx5_flow_async_pool_query_handle(struct mlx5_dev_ctx_shared *sh,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_flow_group_to_table(const struct rte_flow_attr *attributes, bool external,
-			 uint32_t group, bool fdb_def_rule, uint32_t *table,
+mlx5_flow_group_to_table(struct rte_eth_dev *dev,
+			 const struct mlx5_flow_tunnel *tunnel,
+			 uint32_t group, uint32_t *table,
+			 struct flow_grp_info grp_info,
 			 struct rte_flow_error *error)
 {
-	if (attributes->transfer && external && fdb_def_rule) {
-		if (group == UINT32_MAX)
-			return rte_flow_error_set
-						(error, EINVAL,
-						 RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
-						 NULL,
-						 "group index not supported");
-		*table = group + 1;
+	int ret;
+	bool standard_translation;
+
+	if (grp_info.external && group < MLX5_MAX_TABLES_EXTERNAL)
+		group *= MLX5_FLOW_TABLE_FACTOR;
+	if (is_tunnel_offload_active(dev)) {
+		standard_translation = !grp_info.external ||
+					grp_info.std_tbl_fix;
 	} else {
-		*table = group;
+		standard_translation = true;
 	}
-	return 0;
+	DRV_LOG(DEBUG,
+		"port %u group=%#x transfer=%d external=%d fdb_def_rule=%d translate=%s",
+		dev->data->port_id, group, grp_info.transfer,
+		grp_info.external, grp_info.fdb_def_rule,
+		standard_translation ? "STANDARD" : "TUNNEL");
+	if (standard_translation)
+		ret = flow_group_to_table(dev->data->port_id, group, table,
+					  grp_info, error);
+	else
+		ret = tunnel_flow_group_to_flow_table(dev, tunnel, group,
+						      table, error);
+
+	return ret;
 }
 
 /**
@@ -7081,3 +7558,168 @@ mlx5_flow_get_aged_flows(struct rte_eth_dev *dev, void **contexts,
 		 dev->data->port_id);
 	return -ENOTSUP;
 }
+
+static void
+mlx5_flow_tunnel_free(struct rte_eth_dev *dev,
+		      struct mlx5_flow_tunnel *tunnel)
+{
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_id_pool *id_pool = thub->tunnel_ids;
+
+	DRV_LOG(DEBUG, "port %u release pmd tunnel id=0x%x",
+		dev->data->port_id, tunnel->tunnel_id);
+	RTE_VERIFY(!__atomic_load_n(&tunnel->refctn, __ATOMIC_RELAXED));
+	LIST_REMOVE(tunnel, chain);
+	mlx5_flow_id_release(id_pool, tunnel->tunnel_id);
+	mlx5_hlist_destroy(tunnel->groups, NULL, NULL);
+	mlx5_free(tunnel);
+}
+
+static struct mlx5_flow_tunnel *
+mlx5_find_tunnel_id(struct rte_eth_dev *dev, uint32_t id)
+{
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_tunnel *tun;
+
+	LIST_FOREACH(tun, &thub->tunnels, chain) {
+		if (tun->tunnel_id == id)
+			break;
+	}
+
+	return tun;
+}
+
+static struct mlx5_flow_tunnel *
+mlx5_flow_tunnel_allocate(struct rte_eth_dev *dev,
+			  const struct rte_flow_tunnel *app_tunnel)
+{
+	int ret;
+	struct mlx5_flow_tunnel *tunnel;
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_id_pool *id_pool = thub->tunnel_ids;
+	uint32_t id;
+
+	ret = mlx5_flow_id_get(id_pool, &id);
+	if (ret)
+		return NULL;
+	/**
+	 * mlx5 flow tunnel is an auxiliary data structure
+	 * It's not part of IO. No need to allocate it from
+	 * huge pages pools dedicated for IO
+	 */
+	tunnel = mlx5_malloc(MLX5_MEM_SYS | MLX5_MEM_ZERO, sizeof(*tunnel),
+			     0, SOCKET_ID_ANY);
+	if (!tunnel) {
+		mlx5_flow_id_release(id_pool, id);
+		return NULL;
+	}
+	tunnel->groups = mlx5_hlist_create("tunnel groups", 1024);
+	if (!tunnel->groups) {
+		mlx5_flow_id_release(id_pool, id);
+		mlx5_free(tunnel);
+		return NULL;
+	}
+	/* initiate new PMD tunnel */
+	memcpy(&tunnel->app_tunnel, app_tunnel, sizeof(*app_tunnel));
+	tunnel->tunnel_id = id;
+	tunnel->action.type = (typeof(tunnel->action.type))
+			      MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET;
+	tunnel->action.conf = tunnel;
+	tunnel->item.type = (typeof(tunnel->item.type))
+			    MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL;
+	tunnel->item.spec = tunnel;
+	tunnel->item.last = NULL;
+	tunnel->item.mask = NULL;
+
+	DRV_LOG(DEBUG, "port %u new pmd tunnel id=0x%x",
+		dev->data->port_id, tunnel->tunnel_id);
+
+	return tunnel;
+}
+
+static int
+mlx5_get_flow_tunnel(struct rte_eth_dev *dev,
+		     const struct rte_flow_tunnel *app_tunnel,
+		     struct mlx5_flow_tunnel **tunnel)
+{
+	int ret = 0;
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_tunnel *tun;
+
+	LIST_FOREACH(tun, &thub->tunnels, chain) {
+		if (!memcmp(app_tunnel, &tun->app_tunnel,
+			    sizeof(*app_tunnel))) {
+			*tunnel = tun;
+			ret = 0;
+			break;
+		}
+	}
+	if (!tun) {
+		tun = mlx5_flow_tunnel_allocate(dev, app_tunnel);
+		if (tun) {
+			LIST_INSERT_HEAD(&thub->tunnels, tun, chain);
+			*tunnel = tun;
+		} else {
+			ret = -ENOMEM;
+		}
+	}
+	if (tun)
+		__atomic_add_fetch(&tun->refctn, 1, __ATOMIC_RELAXED);
+
+	return ret;
+}
+
+void mlx5_release_tunnel_hub(struct mlx5_dev_ctx_shared *sh, uint16_t port_id)
+{
+	struct mlx5_flow_tunnel_hub *thub = sh->tunnel_hub;
+
+	if (!thub)
+		return;
+	if (!LIST_EMPTY(&thub->tunnels))
+		DRV_LOG(WARNING, "port %u tunnels present\n", port_id);
+	mlx5_flow_id_pool_release(thub->tunnel_ids);
+	mlx5_flow_id_pool_release(thub->table_ids);
+	mlx5_hlist_destroy(thub->groups, NULL, NULL);
+	mlx5_free(thub);
+}
+
+int mlx5_alloc_tunnel_hub(struct mlx5_dev_ctx_shared *sh)
+{
+	int err;
+	struct mlx5_flow_tunnel_hub *thub;
+
+	thub = mlx5_malloc(MLX5_MEM_SYS | MLX5_MEM_ZERO, sizeof(*thub),
+			   0, SOCKET_ID_ANY);
+	if (!thub)
+		return -ENOMEM;
+	LIST_INIT(&thub->tunnels);
+	thub->tunnel_ids = mlx5_flow_id_pool_alloc(MLX5_MAX_TUNNELS);
+	if (!thub->tunnel_ids) {
+		err = -rte_errno;
+		goto err;
+	}
+	thub->table_ids = mlx5_flow_id_pool_alloc(MLX5_MAX_TABLES);
+	if (!thub->table_ids) {
+		err = -rte_errno;
+		goto err;
+	}
+	thub->groups = mlx5_hlist_create("flow groups", MLX5_MAX_TABLES);
+	if (!thub->groups) {
+		err = -rte_errno;
+		goto err;
+	}
+	sh->tunnel_hub = thub;
+
+	return 0;
+
+err:
+	if (thub->groups)
+		mlx5_hlist_destroy(thub->groups, NULL, NULL);
+	if (thub->table_ids)
+		mlx5_flow_id_pool_release(thub->table_ids);
+	if (thub->tunnel_ids)
+		mlx5_flow_id_pool_release(thub->tunnel_ids);
+	if (thub)
+		mlx5_free(thub);
+	return err;
+}
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index b4be4769ef..ecb5f023d9 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -26,6 +26,7 @@ enum mlx5_rte_flow_item_type {
 	MLX5_RTE_FLOW_ITEM_TYPE_TAG,
 	MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
 	MLX5_RTE_FLOW_ITEM_TYPE_VLAN,
+	MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL,
 };
 
 /* Private (internal) rte flow actions. */
@@ -35,6 +36,7 @@ enum mlx5_rte_flow_action_type {
 	MLX5_RTE_FLOW_ACTION_TYPE_MARK,
 	MLX5_RTE_FLOW_ACTION_TYPE_COPY_MREG,
 	MLX5_RTE_FLOW_ACTION_TYPE_DEFAULT_MISS,
+	MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET,
 };
 
 /* Matches on selected register. */
@@ -201,6 +203,8 @@ enum mlx5_feature_name {
 #define MLX5_FLOW_ACTION_AGE (1ull << 34)
 #define MLX5_FLOW_ACTION_DEFAULT_MISS (1ull << 35)
 #define MLX5_FLOW_ACTION_SAMPLE (1ull << 36)
+#define MLX5_FLOW_ACTION_TUNNEL_SET (1ull << 37)
+#define MLX5_FLOW_ACTION_TUNNEL_MATCH (1ull << 38)
 
 #define MLX5_FLOW_FATE_ACTIONS \
 	(MLX5_FLOW_ACTION_DROP | MLX5_FLOW_ACTION_QUEUE | \
@@ -530,6 +534,10 @@ struct mlx5_flow_tbl_data_entry {
 	struct mlx5_flow_dv_jump_tbl_resource jump;
 	/**< jump resource, at most one for each table created. */
 	uint32_t idx; /**< index for the indexed mempool. */
+	/**< tunnel offload */
+	const struct mlx5_flow_tunnel *tunnel;
+	uint32_t group_id;
+	bool external;
 };
 
 /* Sub rdma-core actions list. */
@@ -768,6 +776,7 @@ struct mlx5_flow {
 	};
 	struct mlx5_flow_handle *handle;
 	uint32_t handle_idx; /* Index of the mlx5 flow handle memory. */
+	const struct mlx5_flow_tunnel *tunnel;
 };
 
 /* Flow meter state. */
@@ -913,6 +922,112 @@ struct mlx5_fdir_flow {
 
 #define HAIRPIN_FLOW_ID_BITS 28
 
+#define MLX5_MAX_TUNNELS 256
+#define MLX5_TNL_MISS_RULE_PRIORITY 3
+#define MLX5_TNL_MISS_FDB_JUMP_GRP  0x1234faac
+
+/*
+ * When tunnel offload is active, all JUMP group ids are converted
+ * using the same method. That conversion is applied both to tunnel and
+ * regular rule types.
+ * Group ids used in tunnel rules are relative to their tunnel (!).
+ * An application can create a number of steer rules that use the same
+ * tunnel, each with a different group id.
+ * Each tunnel stores its groups internally in PMD tunnel object.
+ * Groups used in regular rules do not belong to any tunnel and are stored
+ * in tunnel hub.
+ */
+
+struct mlx5_flow_tunnel {
+	LIST_ENTRY(mlx5_flow_tunnel) chain;
+	struct rte_flow_tunnel app_tunnel;	/** app tunnel copy */
+	uint32_t tunnel_id;			/** unique tunnel ID */
+	uint32_t refctn;
+	struct rte_flow_action action;
+	struct rte_flow_item item;
+	struct mlx5_hlist *groups;		/** tunnel groups */
+};
+
+/** PMD tunnel related context */
+struct mlx5_flow_tunnel_hub {
+	LIST_HEAD(, mlx5_flow_tunnel) tunnels;
+	struct mlx5_flow_id_pool *tunnel_ids;
+	struct mlx5_flow_id_pool *table_ids;
+	struct mlx5_hlist *groups;		/** non tunnel groups */
+};
+
+/* convert jump group to flow table ID in tunnel rules */
+struct tunnel_tbl_entry {
+	struct mlx5_hlist_entry hash;
+	uint32_t flow_table;
+};
+
+static inline uint32_t
+tunnel_id_to_flow_tbl(uint32_t id)
+{
+	return id | (1u << 16);
+}
+
+static inline uint32_t
+tunnel_flow_tbl_to_id(uint32_t flow_tbl)
+{
+	return flow_tbl & ~(1u << 16);
+}
+
+union tunnel_tbl_key {
+	uint64_t val;
+	struct {
+		uint32_t tunnel_id;
+		uint32_t group;
+	};
+};
+
+static inline struct mlx5_flow_tunnel_hub *
+mlx5_tunnel_hub(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	return priv->sh->tunnel_hub;
+}
+
+static inline bool
+is_tunnel_offload_active(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	return !!priv->config.dv_miss_info;
+}
+
+static inline bool
+is_flow_tunnel_match_rule(__rte_unused struct rte_eth_dev *dev,
+			  __rte_unused const struct rte_flow_attr *attr,
+			  __rte_unused const struct rte_flow_item items[],
+			  __rte_unused const struct rte_flow_action actions[])
+{
+	return (items[0].type == (typeof(items[0].type))
+				 MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL);
+}
+
+static inline bool
+is_flow_tunnel_steer_rule(__rte_unused struct rte_eth_dev *dev,
+			  __rte_unused const struct rte_flow_attr *attr,
+			  __rte_unused const struct rte_flow_item items[],
+			  __rte_unused const struct rte_flow_action actions[])
+{
+	return (actions[0].type == (typeof(actions[0].type))
+				   MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET);
+}
+
+static inline const struct mlx5_flow_tunnel *
+flow_actions_to_tunnel(const struct rte_flow_action actions[])
+{
+	return actions[0].conf;
+}
+
+static inline const struct mlx5_flow_tunnel *
+flow_items_to_tunnel(const struct rte_flow_item items[])
+{
+	return items[0].spec;
+}
+
 /* Flow structure. */
 struct rte_flow {
 	ILIST_ENTRY(uint32_t)next; /**< Index to the next flow structure. */
@@ -920,12 +1035,14 @@ struct rte_flow {
 	/**< Device flow handles that are part of the flow. */
 	uint32_t drv_type:2; /**< Driver type. */
 	uint32_t fdir:1; /**< Identifier of associated FDIR if any. */
+	uint32_t tunnel:1;
 	uint32_t hairpin_flow_id:HAIRPIN_FLOW_ID_BITS;
 	/**< The flow id used for hairpin. */
 	uint32_t copy_applied:1; /**< The MARK copy Flow os applied. */
 	uint32_t rix_mreg_copy;
 	/**< Index to metadata register copy table resource. */
 	uint32_t counter; /**< Holds flow counter. */
+	uint32_t tunnel_id;  /**< Tunnel id */
 	uint16_t meter; /**< Holds flow meter id. */
 } __rte_packed;
 
@@ -1008,9 +1125,54 @@ void mlx5_flow_id_pool_release(struct mlx5_flow_id_pool *pool);
 uint32_t mlx5_flow_id_get(struct mlx5_flow_id_pool *pool, uint32_t *id);
 uint32_t mlx5_flow_id_release(struct mlx5_flow_id_pool *pool,
 			      uint32_t id);
-int mlx5_flow_group_to_table(const struct rte_flow_attr *attributes,
-			     bool external, uint32_t group, bool fdb_def_rule,
-			     uint32_t *table, struct rte_flow_error *error);
+__extension__
+struct flow_grp_info {
+	uint64_t external:1;
+	uint64_t transfer:1;
+	uint64_t fdb_def_rule:1;
+	/* force standard group translation */
+	uint64_t std_tbl_fix:1;
+};
+
+static inline bool
+tunnel_use_standard_attr_group_translate
+		    (struct rte_eth_dev *dev,
+		     const struct mlx5_flow_tunnel *tunnel,
+		     const struct rte_flow_attr *attr,
+		     const struct rte_flow_item items[],
+		     const struct rte_flow_action actions[])
+{
+	bool verdict;
+
+	if (!is_tunnel_offload_active(dev))
+		/* no tunnel offload API */
+		verdict = true;
+	else if (tunnel) {
+		/*
+		 * OvS will use jump to group 0 in tunnel steer rule.
+		 * If tunnel steer rule starts from group 0 (attr.group == 0)
+		 * that group 0 must be translated with the standard method.
+		 * attr.group == 0 in a tunnel match rule is translated with
+		 * the tunnel method.
+		 */
+		verdict = !attr->group &&
+			  is_flow_tunnel_steer_rule(dev, attr, items, actions);
+	} else {
+		/*
+		 * non-tunnel group translation uses standard method for
+		 * root group only: attr.group == 0
+		 */
+		verdict = !attr->group;
+	}
+
+	return verdict;
+}
+
+int mlx5_flow_group_to_table(struct rte_eth_dev *dev,
+			     const struct mlx5_flow_tunnel *tunnel,
+			     uint32_t group, uint32_t *table,
+			     struct flow_grp_info flags,
+			     struct rte_flow_error *error);
 uint64_t mlx5_flow_hashfields_adjust(struct mlx5_flow_rss_desc *rss_desc,
 				     int tunnel, uint64_t layer_types,
 				     uint64_t hash_fields);
@@ -1145,4 +1307,6 @@ int mlx5_flow_destroy_policer_rules(struct rte_eth_dev *dev,
 int mlx5_flow_meter_flush(struct rte_eth_dev *dev,
 			  struct rte_mtr_error *error);
 int mlx5_flow_dv_discover_counter_offset_support(struct rte_eth_dev *dev);
+void mlx5_release_tunnel_hub(struct mlx5_dev_ctx_shared *sh, uint16_t port_id);
+int mlx5_alloc_tunnel_hub(struct mlx5_dev_ctx_shared *sh);
 #endif /* RTE_PMD_MLX5_FLOW_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 15cd34e331..b19f644b5e 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -3947,14 +3947,21 @@ flow_dv_validate_action_modify_ttl(const uint64_t action_flags,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-flow_dv_validate_action_jump(const struct rte_flow_action *action,
+flow_dv_validate_action_jump(struct rte_eth_dev *dev,
+			     const struct mlx5_flow_tunnel *tunnel,
+			     const struct rte_flow_action *action,
 			     uint64_t action_flags,
 			     const struct rte_flow_attr *attributes,
 			     bool external, struct rte_flow_error *error)
 {
 	uint32_t target_group, table;
 	int ret = 0;
-
+	struct flow_grp_info grp_info = {
+		.external = !!external,
+		.transfer = !!attributes->transfer,
+		.fdb_def_rule = 1,
+		.std_tbl_fix = 0
+	};
 	if (action_flags & (MLX5_FLOW_FATE_ACTIONS |
 			    MLX5_FLOW_FATE_ESWITCH_ACTIONS))
 		return rte_flow_error_set(error, EINVAL,
@@ -3977,11 +3984,13 @@ flow_dv_validate_action_jump(const struct rte_flow_action *action,
 					  NULL, "action configuration not set");
 	target_group =
 		((const struct rte_flow_action_jump *)action->conf)->group;
-	ret = mlx5_flow_group_to_table(attributes, external, target_group,
-				       true, &table, error);
+	ret = mlx5_flow_group_to_table(dev, tunnel, target_group, &table,
+				       grp_info, error);
 	if (ret)
 		return ret;
-	if (attributes->group == target_group)
+	if (attributes->group == target_group &&
+	    !(action_flags & (MLX5_FLOW_ACTION_TUNNEL_SET |
+			      MLX5_FLOW_ACTION_TUNNEL_MATCH)))
 		return rte_flow_error_set(error, EINVAL,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "target group must be other than"
@@ -5160,8 +5169,9 @@ flow_dv_counter_release(struct rte_eth_dev *dev, uint32_t counter)
  */
 static int
 flow_dv_validate_attributes(struct rte_eth_dev *dev,
+			    const struct mlx5_flow_tunnel *tunnel,
 			    const struct rte_flow_attr *attributes,
-			    bool external __rte_unused,
+			    struct flow_grp_info grp_info,
 			    struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -5169,6 +5179,8 @@ flow_dv_validate_attributes(struct rte_eth_dev *dev,
 	int ret = 0;
 
 #ifndef HAVE_MLX5DV_DR
+	RTE_SET_USED(tunnel);
+	RTE_SET_USED(grp_info);
 	if (attributes->group)
 		return rte_flow_error_set(error, ENOTSUP,
 					  RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
@@ -5177,9 +5189,8 @@ flow_dv_validate_attributes(struct rte_eth_dev *dev,
 #else
 	uint32_t table = 0;
 
-	ret = mlx5_flow_group_to_table(attributes, external,
-				       attributes->group, !!priv->fdb_def_rule,
-				       &table, error);
+	ret = mlx5_flow_group_to_table(dev, tunnel, attributes->group, &table,
+				       grp_info, error);
 	if (ret)
 		return ret;
 	if (!table)
@@ -5293,10 +5304,28 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 	const struct rte_flow_item_vlan *vlan_m = NULL;
 	int16_t rw_act_num = 0;
 	uint64_t is_root;
+	const struct mlx5_flow_tunnel *tunnel;
+	struct flow_grp_info grp_info = {
+		.external = !!external,
+		.transfer = !!attr->transfer,
+		.fdb_def_rule = !!priv->fdb_def_rule,
+	};
 
 	if (items == NULL)
 		return -1;
-	ret = flow_dv_validate_attributes(dev, attr, external, error);
+	if (is_flow_tunnel_match_rule(dev, attr, items, actions)) {
+		tunnel = flow_items_to_tunnel(items);
+		action_flags |= MLX5_FLOW_ACTION_TUNNEL_MATCH |
+				MLX5_FLOW_ACTION_DECAP;
+	} else if (is_flow_tunnel_steer_rule(dev, attr, items, actions)) {
+		tunnel = flow_actions_to_tunnel(actions);
+		action_flags |= MLX5_FLOW_ACTION_TUNNEL_SET;
+	} else {
+		tunnel = NULL;
+	}
+	grp_info.std_tbl_fix = tunnel_use_standard_attr_group_translate
+				(dev, tunnel, attr, items, actions);
+	ret = flow_dv_validate_attributes(dev, tunnel, attr, grp_info, error);
 	if (ret < 0)
 		return ret;
 	is_root = (uint64_t)ret;
@@ -5309,6 +5338,15 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 						  RTE_FLOW_ERROR_TYPE_ITEM,
 						  NULL, "item not supported");
 		switch (type) {
+		case MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL:
+			if (items[0].type != (typeof(items[0].type))
+						MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL)
+				return rte_flow_error_set
+						(error, EINVAL,
+						RTE_FLOW_ERROR_TYPE_ITEM,
+						NULL, "MLX5 private items "
+						"must be the first");
+			break;
 		case RTE_FLOW_ITEM_TYPE_VOID:
 			break;
 		case RTE_FLOW_ITEM_TYPE_PORT_ID:
@@ -5894,7 +5932,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			rw_act_num += MLX5_ACT_NUM_MDF_TTL;
 			break;
 		case RTE_FLOW_ACTION_TYPE_JUMP:
-			ret = flow_dv_validate_action_jump(actions,
+			ret = flow_dv_validate_action_jump(dev, tunnel, actions,
 							   action_flags,
 							   attr, external,
 							   error);
@@ -6003,6 +6041,17 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			action_flags |= MLX5_FLOW_ACTION_SAMPLE;
 			++actions_n;
 			break;
+		case MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET:
+			if (actions[0].type != (typeof(actions[0].type))
+				MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET)
+				return rte_flow_error_set
+						(error, EINVAL,
+						RTE_FLOW_ERROR_TYPE_ACTION,
+						NULL, "MLX5 private action "
+						"must be the first");
+
+			action_flags |= MLX5_FLOW_ACTION_TUNNEL_SET;
+			break;
 		default:
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ACTION,
@@ -6010,6 +6059,54 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 						  "action not supported");
 		}
 	}
+	/*
+	 * Validate actions in flow rules
+	 * - Explicit decap action is prohibited by the tunnel offload API.
+	 * - Drop action in tunnel steer rule is prohibited by the API.
+	 * - Application cannot use MARK action because its value can mask
+	 *   the tunnel default miss notification.
+	 * - JUMP in tunnel match rule has no support in current PMD
+	 *   implementation.
+	 * - TAG & META are reserved for future uses.
+	 */
+	if (action_flags & MLX5_FLOW_ACTION_TUNNEL_SET) {
+		uint64_t bad_actions_mask = MLX5_FLOW_ACTION_DECAP    |
+					    MLX5_FLOW_ACTION_MARK     |
+					    MLX5_FLOW_ACTION_SET_TAG  |
+					    MLX5_FLOW_ACTION_SET_META |
+					    MLX5_FLOW_ACTION_DROP;
+
+		if (action_flags & bad_actions_mask)
+			return rte_flow_error_set
+					(error, EINVAL,
+					RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					"Invalid RTE action in tunnel "
+					"set decap rule");
+		if (!(action_flags & MLX5_FLOW_ACTION_JUMP))
+			return rte_flow_error_set
+					(error, EINVAL,
+					RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					"tunnel set decap rule must terminate "
+					"with JUMP");
+		if (!attr->ingress)
+			return rte_flow_error_set
+					(error, EINVAL,
+					RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					"tunnel flows for ingress traffic only");
+	}
+	if (action_flags & MLX5_FLOW_ACTION_TUNNEL_MATCH) {
+		uint64_t bad_actions_mask = MLX5_FLOW_ACTION_JUMP    |
+					    MLX5_FLOW_ACTION_MARK    |
+					    MLX5_FLOW_ACTION_SET_TAG |
+					    MLX5_FLOW_ACTION_SET_META;
+
+		if (action_flags & bad_actions_mask)
+			return rte_flow_error_set
+					(error, EINVAL,
+					RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					"Invalid RTE action in tunnel "
+					"set match rule");
+	}
 	/*
 	 * Validate the drop action mutual exclusion with other actions.
 	 * Drop action is mutually-exclusive with any other action, except for
@@ -7876,6 +7973,9 @@ static struct mlx5_flow_tbl_resource *
 flow_dv_tbl_resource_get(struct rte_eth_dev *dev,
 			 uint32_t table_id, uint8_t egress,
 			 uint8_t transfer,
+			 bool external,
+			 const struct mlx5_flow_tunnel *tunnel,
+			 uint32_t group_id,
 			 struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -7912,6 +8012,9 @@ flow_dv_tbl_resource_get(struct rte_eth_dev *dev,
 		return NULL;
 	}
 	tbl_data->idx = idx;
+	tbl_data->tunnel = tunnel;
+	tbl_data->group_id = group_id;
+	tbl_data->external = external;
 	tbl = &tbl_data->tbl;
 	pos = &tbl_data->entry;
 	if (transfer)
@@ -7975,6 +8078,41 @@ flow_dv_tbl_resource_release(struct rte_eth_dev *dev,
 
 		mlx5_flow_os_destroy_flow_tbl(tbl->obj);
 		tbl->obj = NULL;
+		if (is_tunnel_offload_active(dev) && tbl_data->external) {
+			struct mlx5_hlist_entry *he;
+			struct mlx5_hlist *tunnel_grp_hash;
+			struct mlx5_flow_tunnel_hub *thub =
+							mlx5_tunnel_hub(dev);
+			union tunnel_tbl_key tunnel_key = {
+				.tunnel_id = tbl_data->tunnel ?
+						tbl_data->tunnel->tunnel_id : 0,
+				.group = tbl_data->group_id
+			};
+			union mlx5_flow_tbl_key table_key = {
+				.v64 = pos->key
+			};
+			uint32_t table_id = table_key.table_id;
+
+			tunnel_grp_hash = tbl_data->tunnel ?
+						tbl_data->tunnel->groups :
+						thub->groups;
+			he = mlx5_hlist_lookup(tunnel_grp_hash, tunnel_key.val);
+			if (he) {
+				struct tunnel_tbl_entry *tte;
+				tte = container_of(he, typeof(*tte), hash);
+				MLX5_ASSERT(tte->flow_table == table_id);
+				mlx5_hlist_remove(tunnel_grp_hash, he);
+				mlx5_free(tte);
+			}
+			mlx5_flow_id_release(mlx5_tunnel_hub(dev)->table_ids,
+					     tunnel_flow_tbl_to_id(table_id));
+			DRV_LOG(DEBUG,
+				"port %u release table_id %#x tunnel %u group %u",
+				dev->data->port_id, table_id,
+				tbl_data->tunnel ?
+				tbl_data->tunnel->tunnel_id : 0,
+				tbl_data->group_id);
+		}
 		/* remove the entry from the hash list and free memory. */
 		mlx5_hlist_remove(sh->flow_tbls, pos);
 		mlx5_ipool_free(priv->sh->ipool[MLX5_IPOOL_JUMP],
@@ -8020,7 +8158,7 @@ flow_dv_matcher_register(struct rte_eth_dev *dev,
 	int ret;
 
 	tbl = flow_dv_tbl_resource_get(dev, key->table_id, key->direction,
-				       key->domain, error);
+				       key->domain, false, NULL, 0, error);
 	if (!tbl)
 		return -rte_errno;	/* No need to refill the error info */
 	tbl_data = container_of(tbl, struct mlx5_flow_tbl_data_entry, tbl);
@@ -8515,7 +8653,8 @@ flow_dv_sample_resource_register(struct rte_eth_dev *dev,
 	*cache_resource = *resource;
 	/* Create normal path table level */
 	tbl = flow_dv_tbl_resource_get(dev, next_ft_id,
-					attr->egress, attr->transfer, error);
+					attr->egress, attr->transfer,
+					dev_flow->external, NULL, 0, error);
 	if (!tbl) {
 		rte_flow_error_set(error, ENOMEM,
 					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
@@ -9127,6 +9266,12 @@ __flow_dv_translate(struct rte_eth_dev *dev,
 	int tmp_actions_n = 0;
 	uint32_t table;
 	int ret = 0;
+	const struct mlx5_flow_tunnel *tunnel;
+	struct flow_grp_info grp_info = {
+		.external = !!dev_flow->external,
+		.transfer = !!attr->transfer,
+		.fdb_def_rule = !!priv->fdb_def_rule,
+	};
 
 	memset(&mdest_res, 0, sizeof(struct mlx5_flow_dv_dest_array_resource));
 	memset(&sample_res, 0, sizeof(struct mlx5_flow_dv_sample_resource));
@@ -9134,8 +9279,17 @@ __flow_dv_translate(struct rte_eth_dev *dev,
 					   MLX5DV_FLOW_TABLE_TYPE_NIC_RX;
 	/* update normal path action resource into last index of array */
 	sample_act = &mdest_res.sample_act[MLX5_MAX_DEST_NUM - 1];
-	ret = mlx5_flow_group_to_table(attr, dev_flow->external, attr->group,
-				       !!priv->fdb_def_rule, &table, error);
+	tunnel = is_flow_tunnel_match_rule(dev, attr, items, actions) ?
+		 flow_items_to_tunnel(items) :
+		 is_flow_tunnel_steer_rule(dev, attr, items, actions) ?
+		 flow_actions_to_tunnel(actions) :
+		 dev_flow->tunnel ? dev_flow->tunnel : NULL;
+	mhdr_res->ft_type = attr->egress ? MLX5DV_FLOW_TABLE_TYPE_NIC_TX :
+					   MLX5DV_FLOW_TABLE_TYPE_NIC_RX;
+	grp_info.std_tbl_fix = tunnel_use_standard_attr_group_translate
+				(dev, tunnel, attr, items, actions);
+	ret = mlx5_flow_group_to_table(dev, tunnel, attr->group, &table,
+				       grp_info, error);
 	if (ret)
 		return ret;
 	dev_flow->dv.group = table;
@@ -9145,6 +9299,45 @@ __flow_dv_translate(struct rte_eth_dev *dev,
 		priority = dev_conf->flow_prio - 1;
 	/* number of actions must be set to 0 in case of dirty stack. */
 	mhdr_res->actions_num = 0;
+	if (is_flow_tunnel_match_rule(dev, attr, items, actions)) {
+		/*
+		 * do not add decap action if match rule drops packet
+		 * HW rejects rules with decap & drop
+		 */
+		bool add_decap = true;
+		const struct rte_flow_action *ptr = actions;
+		struct mlx5_flow_tbl_resource *tbl;
+
+		for (; ptr->type != RTE_FLOW_ACTION_TYPE_END; ptr++) {
+			if (ptr->type == RTE_FLOW_ACTION_TYPE_DROP) {
+				add_decap = false;
+				break;
+			}
+		}
+		if (add_decap) {
+			if (flow_dv_create_action_l2_decap(dev, dev_flow,
+							   attr->transfer,
+							   error))
+				return -rte_errno;
+			dev_flow->dv.actions[actions_n++] =
+					dev_flow->dv.encap_decap->action;
+			action_flags |= MLX5_FLOW_ACTION_DECAP;
+		}
+		/*
+		 * bind table_id with <group, table> for tunnel match rule.
+		 * Tunnel set rule establishes that bind in JUMP action handler.
+		 * Required for scenario when application creates tunnel match
+		 * rule before tunnel set rule.
+		 */
+		tbl = flow_dv_tbl_resource_get(dev, table, attr->egress,
+					       attr->transfer,
+					       !!dev_flow->external, tunnel,
+					       attr->group, error);
+		if (!tbl)
+			return rte_flow_error_set
+			       (error, EINVAL, RTE_FLOW_ERROR_TYPE_ACTION,
+			       actions, "cannot register tunnel group");
+	}
 	for (; !actions_end ; actions++) {
 		const struct rte_flow_action_queue *queue;
 		const struct rte_flow_action_rss *rss;
@@ -9165,6 +9358,9 @@ __flow_dv_translate(struct rte_eth_dev *dev,
 						  actions,
 						  "action not supported");
 		switch (action_type) {
+		case MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET:
+			action_flags |= MLX5_FLOW_ACTION_TUNNEL_SET;
+			break;
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			break;
 		case RTE_FLOW_ACTION_TYPE_PORT_ID:
@@ -9408,18 +9604,18 @@ __flow_dv_translate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_JUMP:
 			jump_group = ((const struct rte_flow_action_jump *)
 							action->conf)->group;
-			if (dev_flow->external && jump_group <
-					MLX5_MAX_TABLES_EXTERNAL)
-				jump_group *= MLX5_FLOW_TABLE_FACTOR;
-			ret = mlx5_flow_group_to_table(attr, dev_flow->external,
+			grp_info.std_tbl_fix = 0;
+			ret = mlx5_flow_group_to_table(dev, tunnel,
 						       jump_group,
-						       !!priv->fdb_def_rule,
-						       &table, error);
+						       &table,
+						       grp_info, error);
 			if (ret)
 				return ret;
-			tbl = flow_dv_tbl_resource_get(dev, table,
-						       attr->egress,
-						       attr->transfer, error);
+			tbl = flow_dv_tbl_resource_get(dev, table, attr->egress,
+						       attr->transfer,
+						       !!dev_flow->external,
+						       tunnel, jump_group,
+						       error);
 			if (!tbl)
 				return rte_flow_error_set
 						(error, errno,
@@ -10890,7 +11086,8 @@ flow_dv_prepare_mtr_tables(struct rte_eth_dev *dev,
 		dtb = &mtb->ingress;
 	/* Create the meter table with METER level. */
 	dtb->tbl = flow_dv_tbl_resource_get(dev, MLX5_FLOW_TABLE_LEVEL_METER,
-					    egress, transfer, &error);
+					    egress, transfer, false, NULL, 0,
+					    &error);
 	if (!dtb->tbl) {
 		DRV_LOG(ERR, "Failed to create meter policer table.");
 		return -1;
@@ -10898,7 +11095,8 @@ flow_dv_prepare_mtr_tables(struct rte_eth_dev *dev,
 	/* Create the meter suffix table with SUFFIX level. */
 	dtb->sfx_tbl = flow_dv_tbl_resource_get(dev,
 					    MLX5_FLOW_TABLE_LEVEL_SUFFIX,
-					    egress, transfer, &error);
+					    egress, transfer, false, NULL, 0,
+					    &error);
 	if (!dtb->sfx_tbl) {
 		DRV_LOG(ERR, "Failed to create meter suffix table.");
 		return -1;
@@ -11217,10 +11415,10 @@ mlx5_flow_dv_discover_counter_offset_support(struct rte_eth_dev *dev)
 	void *flow = NULL;
 	int i, ret = -1;
 
-	tbl = flow_dv_tbl_resource_get(dev, 0, 0, 0, NULL);
+	tbl = flow_dv_tbl_resource_get(dev, 0, 0, 0, false, NULL, 0, NULL);
 	if (!tbl)
 		goto err;
-	dest_tbl = flow_dv_tbl_resource_get(dev, 1, 0, 0, NULL);
+	dest_tbl = flow_dv_tbl_resource_get(dev, 1, 0, 0, false, NULL, 0, NULL);
 	if (!dest_tbl)
 		goto err;
 	dcs = mlx5_devx_cmd_flow_counter_alloc(priv->sh->ctx, 0x4);
-- 
2.28.0


^ permalink raw reply	[flat|nested] 95+ messages in thread

* [dpdk-dev] [PATCH v5] net/mlx5: implement tunnel offload API
  2020-06-25 16:03 [dpdk-dev] [PATCH 0/2] ethdev: tunnel offload model Gregory Etelson
                   ` (14 preceding siblings ...)
  2020-10-23 13:57 ` [dpdk-dev] [PATCH v4] " Gregory Etelson
@ 2020-10-25 14:08 ` Gregory Etelson
  2020-10-25 15:01   ` Raslan Darawsheh
  2020-10-27 16:12 ` [dpdk-dev] [PATCH] net/mlx5: tunnel offload code cleanup Gregory Etelson
  2020-10-28  4:58 ` [dpdk-dev] [PATCH] net/mlx5: fix tunnel flow destroy Gregory Etelson
  17 siblings, 1 reply; 95+ messages in thread
From: Gregory Etelson @ 2020-10-25 14:08 UTC (permalink / raw)
  To: dev
  Cc: getelson, matan, rasland, elibr, ozsh, Viacheslav Ovsiienko,
	Shahaf Shuler

Tunnel Offload API provides a hardware-independent, unified model
to offload tunneled traffic. Key model elements are:
 - apply matches to both outer and inner packet headers
   during entire offload procedure;
 - restore outer header of partially offloaded packet;
 - model is implemented as a set of helper functions.

Implementation details:
* tunnel offload mode is engaged with the dv_xmeta_en=3 device parameter.
* application cannot use MARK and META flow actions with tunnel offload.
* JUMP action offload is restricted to the tunnel steer rule only.

An illustrative application-side usage sketch follows the diffstat.

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 doc/guides/nics/mlx5.rst         |   3 +
 drivers/net/mlx5/linux/mlx5_os.c |  18 +
 drivers/net/mlx5/mlx5.c          |   8 +-
 drivers/net/mlx5/mlx5.h          |   3 +
 drivers/net/mlx5/mlx5_defs.h     |   2 +
 drivers/net/mlx5/mlx5_flow.c     | 681 ++++++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_flow.h     | 171 +++++++-
 drivers/net/mlx5/mlx5_flow_dv.c  | 254 ++++++++++--
 8 files changed, 1086 insertions(+), 54 deletions(-)
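
For reference, below is a minimal application-side sketch (not part of this
patch) of how the tunnel offload callbacks implemented here are expected to
be driven through the generic rte_flow API. The function name, the VXLAN
tunnel attributes, the jump target group (5) and the queue index (0) are
placeholders chosen for illustration; error handling and the release of
flow handles are trimmed.

    #include <rte_flow.h>

    /*
     * Sketch: offload VXLAN traffic with the rte_flow tunnel offload API.
     * Group 5, queue 0 and the VNI below are arbitrary placeholders.
     */
    static int
    tunnel_offload_example(uint16_t port_id)
    {
        struct rte_flow_error error;
        struct rte_flow_tunnel app_tunnel = {
            .type = RTE_FLOW_ITEM_TYPE_VXLAN,
            .tun_id = 42, /* placeholder VNI */
        };
        struct rte_flow_action *pmd_actions;
        struct rte_flow_item *pmd_items;
        uint32_t pmd_actions_n, pmd_items_n;

        /* PMD-private action/item for this tunnel; can be queried again. */
        if (rte_flow_tunnel_decap_set(port_id, &app_tunnel, &pmd_actions,
                                      &pmd_actions_n, &error) ||
            rte_flow_tunnel_match(port_id, &app_tunnel, &pmd_items,
                                  &pmd_items_n, &error))
            return -1;
        /* Tunnel steer rule in group 0: PMD action first, then JUMP. */
        struct rte_flow_item steer_pattern[] = {
            { .type = RTE_FLOW_ITEM_TYPE_ETH },
            { .type = RTE_FLOW_ITEM_TYPE_IPV4 },
            { .type = RTE_FLOW_ITEM_TYPE_UDP },
            { .type = RTE_FLOW_ITEM_TYPE_VXLAN },
            { .type = RTE_FLOW_ITEM_TYPE_END },
        };
        struct rte_flow_action_jump jump = { .group = 5 };
        struct rte_flow_action steer_actions[] = {
            pmd_actions[0],
            { .type = RTE_FLOW_ACTION_TYPE_JUMP, .conf = &jump },
            { .type = RTE_FLOW_ACTION_TYPE_END },
        };
        struct rte_flow_attr steer_attr = { .group = 0, .ingress = 1 };

        if (!rte_flow_create(port_id, &steer_attr, steer_pattern,
                             steer_actions, &error))
            return -1;
        /* Tunnel match rule in the jump target group: PMD item first. */
        struct rte_flow_item match_pattern[] = {
            pmd_items[0],
            { .type = RTE_FLOW_ITEM_TYPE_ETH },
            { .type = RTE_FLOW_ITEM_TYPE_END },
        };
        struct rte_flow_action_queue queue = { .index = 0 };
        struct rte_flow_action match_actions[] = {
            { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
            { .type = RTE_FLOW_ACTION_TYPE_END },
        };
        struct rte_flow_attr match_attr = { .group = 5, .ingress = 1 };

        if (!rte_flow_create(port_id, &match_attr, match_pattern,
                             match_actions, &error))
            return -1;
        /*
         * On the datapath, rte_flow_get_restore_info() recovers the
         * rte_flow_tunnel of packets that hit the steer rule but missed
         * every match rule, so the application can restore the outer
         * headers in software.
         */
        return 0;
    }

When the rules are torn down, the PMD action and item are returned with
rte_flow_tunnel_action_decap_release() and rte_flow_tunnel_item_release().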

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index e4e607c9ea..66524f1262 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -788,6 +788,9 @@ Driver options
     24 bits. The actual supported width can be retrieved in runtime by
     series of rte_flow_validate() trials.
 
+  - 3, this engages tunnel offload mode. In E-Switch configuration, that
+    mode implicitly activates ``dv_xmeta_en=1``.
+
   +------+-----------+-----------+-------------+-------------+
   | Mode | ``MARK``  | ``META``  | ``META`` Tx | FDB/Through |
   +======+===========+===========+=============+=============+
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 40f9446d43..ed3f020d82 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -291,6 +291,12 @@ mlx5_alloc_shared_dr(struct mlx5_priv *priv)
 		sh->esw_drop_action = mlx5_glue->dr_create_flow_action_drop();
 	}
 #endif
+	if (!sh->tunnel_hub)
+		err = mlx5_alloc_tunnel_hub(sh);
+	if (err) {
+		DRV_LOG(ERR, "mlx5_alloc_tunnel_hub failed err=%d", err);
+		goto error;
+	}
 	if (priv->config.reclaim_mode == MLX5_RCM_AGGR) {
 		mlx5_glue->dr_reclaim_domain_memory(sh->rx_domain, 1);
 		mlx5_glue->dr_reclaim_domain_memory(sh->tx_domain, 1);
@@ -335,6 +341,10 @@ mlx5_alloc_shared_dr(struct mlx5_priv *priv)
 		mlx5_hlist_destroy(sh->tag_table, NULL, NULL);
 		sh->tag_table = NULL;
 	}
+	if (sh->tunnel_hub) {
+		mlx5_release_tunnel_hub(sh, priv->dev_port);
+		sh->tunnel_hub = NULL;
+	}
 	mlx5_free_table_hash_list(priv);
 	return err;
 }
@@ -391,6 +401,10 @@ mlx5_os_free_shared_dr(struct mlx5_priv *priv)
 		mlx5_hlist_destroy(sh->tag_table, NULL, NULL);
 		sh->tag_table = NULL;
 	}
+	if (sh->tunnel_hub) {
+		mlx5_release_tunnel_hub(sh, priv->dev_port);
+		sh->tunnel_hub = NULL;
+	}
 	mlx5_free_table_hash_list(priv);
 }
 
@@ -733,6 +747,10 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 			strerror(rte_errno));
 		goto error;
 	}
+	if (config->dv_miss_info) {
+		if (switch_info->master || switch_info->representor)
+			config->dv_xmeta_en = MLX5_XMETA_MODE_META16;
+	}
 	mlx5_malloc_mem_select(config->sys_mem_en);
 	sh = mlx5_alloc_shared_dev_ctx(spawn, config);
 	if (!sh)
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 2484251b2f..6c422e8b4a 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1619,13 +1619,17 @@ mlx5_args_check(const char *key, const char *val, void *opaque)
 	} else if (strcmp(MLX5_DV_XMETA_EN, key) == 0) {
 		if (tmp != MLX5_XMETA_MODE_LEGACY &&
 		    tmp != MLX5_XMETA_MODE_META16 &&
-		    tmp != MLX5_XMETA_MODE_META32) {
+		    tmp != MLX5_XMETA_MODE_META32 &&
+		    tmp != MLX5_XMETA_MODE_MISS_INFO) {
 			DRV_LOG(ERR, "invalid extensive "
 				     "metadata parameter");
 			rte_errno = EINVAL;
 			return -rte_errno;
 		}
-		config->dv_xmeta_en = tmp;
+		if (tmp != MLX5_XMETA_MODE_MISS_INFO)
+			config->dv_xmeta_en = tmp;
+		else
+			config->dv_miss_info = 1;
 	} else if (strcmp(MLX5_LACP_BY_USER, key) == 0) {
 		config->lacp_by_user = !!tmp;
 	} else if (strcmp(MLX5_MR_EXT_MEMSEG_EN, key) == 0) {
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 658533de6f..bb954c41bb 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -208,6 +208,7 @@ struct mlx5_dev_config {
 	unsigned int rt_timestamp:1; /* realtime timestamp format. */
 	unsigned int sys_mem_en:1; /* The default memory allocator. */
 	unsigned int decap_en:1; /* Whether decap will be used or not. */
+	unsigned int dv_miss_info:1; /* restore packet after partial hw miss */
 	struct {
 		unsigned int enabled:1; /* Whether MPRQ is enabled. */
 		unsigned int stride_num_n; /* Number of strides. */
@@ -644,6 +645,8 @@ struct mlx5_dev_ctx_shared {
 	/* UAR same-page access control required in 32bit implementations. */
 #endif
 	struct mlx5_hlist *flow_tbls;
+	struct rte_hash *flow_tbl_map; /* app group-to-flow table map */
+	struct mlx5_flow_tunnel_hub *tunnel_hub;
 	/* Direct Rules tables for FDB, NIC TX+RX */
 	void *esw_drop_action; /* Pointer to DR E-Switch drop action. */
 	void *pop_vlan_action; /* Pointer to DR pop VLAN action. */
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index 22e41df1eb..42916ed7a7 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -165,6 +165,8 @@
 #define MLX5_XMETA_MODE_LEGACY 0
 #define MLX5_XMETA_MODE_META16 1
 #define MLX5_XMETA_MODE_META32 2
+/* Provide info on partial HW miss. Implies MLX5_XMETA_MODE_META16 */
+#define MLX5_XMETA_MODE_MISS_INFO 3
 
 /* MLX5_TX_DB_NC supported values. */
 #define MLX5_TXDB_CACHED 0
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 6077685430..949b9ced9b 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -19,6 +19,7 @@
 #include <rte_flow_driver.h>
 #include <rte_malloc.h>
 #include <rte_ip.h>
+#include <rte_hash.h>
 
 #include <mlx5_glue.h>
 #include <mlx5_devx_cmds.h>
@@ -32,6 +33,18 @@
 #include "mlx5_rxtx.h"
 #include "mlx5_common_os.h"
 
+static struct mlx5_flow_tunnel *
+mlx5_find_tunnel_id(struct rte_eth_dev *dev, uint32_t id);
+static void
+mlx5_flow_tunnel_free(struct rte_eth_dev *dev, struct mlx5_flow_tunnel *tunnel);
+static const struct mlx5_flow_tbl_data_entry  *
+tunnel_mark_decode(struct rte_eth_dev *dev, uint32_t mark);
+static int
+mlx5_get_flow_tunnel(struct rte_eth_dev *dev,
+		     const struct rte_flow_tunnel *app_tunnel,
+		     struct mlx5_flow_tunnel **tunnel);
+
+
 /** Device flow drivers. */
 extern const struct mlx5_flow_driver_ops mlx5_flow_verbs_drv_ops;
 
@@ -567,6 +580,162 @@ static int mlx5_shared_action_query
 				 const struct rte_flow_shared_action *action,
 				 void *data,
 				 struct rte_flow_error *error);
+static inline bool
+mlx5_flow_tunnel_validate(struct rte_eth_dev *dev,
+			  struct rte_flow_tunnel *tunnel,
+			  const char **err_msg)
+{
+	*err_msg = NULL;
+	if (!is_tunnel_offload_active(dev)) {
+		*err_msg = "tunnel offload was not activated";
+		goto out;
+	} else if (!tunnel) {
+		*err_msg = "no application tunnel";
+		goto out;
+	}
+
+	switch (tunnel->type) {
+	default:
+		*err_msg = "unsupported tunnel type";
+		goto out;
+	case RTE_FLOW_ITEM_TYPE_VXLAN:
+		break;
+	}
+
+out:
+	return !*err_msg;
+}
+
+
+static int
+mlx5_flow_tunnel_decap_set(struct rte_eth_dev *dev,
+		    struct rte_flow_tunnel *app_tunnel,
+		    struct rte_flow_action **actions,
+		    uint32_t *num_of_actions,
+		    struct rte_flow_error *error)
+{
+	int ret;
+	struct mlx5_flow_tunnel *tunnel;
+	const char *err_msg = NULL;
+	bool verdict = mlx5_flow_tunnel_validate(dev, app_tunnel, &err_msg);
+
+	if (!verdict)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION_CONF, NULL,
+					  err_msg);
+	ret = mlx5_get_flow_tunnel(dev, app_tunnel, &tunnel);
+	if (ret < 0) {
+		return rte_flow_error_set(error, ret,
+					  RTE_FLOW_ERROR_TYPE_ACTION_CONF, NULL,
+					  "failed to initialize pmd tunnel");
+	}
+	*actions = &tunnel->action;
+	*num_of_actions = 1;
+	return 0;
+}
+
+static int
+mlx5_flow_tunnel_match(struct rte_eth_dev *dev,
+		       struct rte_flow_tunnel *app_tunnel,
+		       struct rte_flow_item **items,
+		       uint32_t *num_of_items,
+		       struct rte_flow_error *error)
+{
+	int ret;
+	struct mlx5_flow_tunnel *tunnel;
+	const char *err_msg = NULL;
+	bool verdict = mlx5_flow_tunnel_validate(dev, app_tunnel, &err_msg);
+
+	if (!verdict)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_HANDLE, NULL,
+					  err_msg);
+	ret = mlx5_get_flow_tunnel(dev, app_tunnel, &tunnel);
+	if (ret < 0) {
+		return rte_flow_error_set(error, ret,
+					  RTE_FLOW_ERROR_TYPE_HANDLE, NULL,
+					  "failed to initialize pmd tunnel");
+	}
+	*items = &tunnel->item;
+	*num_of_items = 1;
+	return 0;
+}
+
+static int
+mlx5_flow_item_release(struct rte_eth_dev *dev,
+		       struct rte_flow_item *pmd_items,
+		       uint32_t num_items, struct rte_flow_error *err)
+{
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_tunnel *tun;
+
+	LIST_FOREACH(tun, &thub->tunnels, chain) {
+		if (&tun->item == pmd_items)
+			break;
+	}
+	if (!tun || num_items != 1)
+		return rte_flow_error_set(err, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_HANDLE, NULL,
+					  "invalid argument");
+	if (!__atomic_sub_fetch(&tun->refctn, 1, __ATOMIC_RELAXED))
+		mlx5_flow_tunnel_free(dev, tun);
+	return 0;
+}
+
+static int
+mlx5_flow_action_release(struct rte_eth_dev *dev,
+			 struct rte_flow_action *pmd_actions,
+			 uint32_t num_actions, struct rte_flow_error *err)
+{
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_tunnel *tun;
+
+	LIST_FOREACH(tun, &thub->tunnels, chain) {
+		if (&tun->action == pmd_actions)
+			break;
+	}
+	if (!tun || num_actions != 1)
+		return rte_flow_error_set(err, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_HANDLE, NULL,
+					  "invalid argument");
+	if (!__atomic_sub_fetch(&tun->refctn, 1, __ATOMIC_RELAXED))
+		mlx5_flow_tunnel_free(dev, tun);
+
+	return 0;
+}
+
+static int
+mlx5_flow_tunnel_get_restore_info(struct rte_eth_dev *dev,
+				  struct rte_mbuf *m,
+				  struct rte_flow_restore_info *info,
+				  struct rte_flow_error *err)
+{
+	uint64_t ol_flags = m->ol_flags;
+	const struct mlx5_flow_tbl_data_entry *tble;
+	const uint64_t mask = PKT_RX_FDIR | PKT_RX_FDIR_ID;
+
+	if ((ol_flags & mask) != mask)
+		goto err;
+	tble = tunnel_mark_decode(dev, m->hash.fdir.hi);
+	if (!tble) {
+		DRV_LOG(DEBUG, "port %u invalid miss tunnel mark %#x",
+			dev->data->port_id, m->hash.fdir.hi);
+		goto err;
+	}
+	MLX5_ASSERT(tble->tunnel);
+	memcpy(&info->tunnel, &tble->tunnel->app_tunnel, sizeof(info->tunnel));
+	info->group_id = tble->group_id;
+	info->flags = RTE_FLOW_RESTORE_INFO_TUNNEL |
+		      RTE_FLOW_RESTORE_INFO_GROUP_ID |
+		      RTE_FLOW_RESTORE_INFO_ENCAPSULATED;
+
+	return 0;
+
+err:
+	return rte_flow_error_set(err, EINVAL,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				  "failed to get restore info");
+}
 
 static const struct rte_flow_ops mlx5_flow_ops = {
 	.validate = mlx5_flow_validate,
@@ -581,6 +750,11 @@ static const struct rte_flow_ops mlx5_flow_ops = {
 	.shared_action_destroy = mlx5_shared_action_destroy,
 	.shared_action_update = mlx5_shared_action_update,
 	.shared_action_query = mlx5_shared_action_query,
+	.tunnel_decap_set = mlx5_flow_tunnel_decap_set,
+	.tunnel_match = mlx5_flow_tunnel_match,
+	.tunnel_action_decap_release = mlx5_flow_action_release,
+	.tunnel_item_release = mlx5_flow_item_release,
+	.get_restore_info = mlx5_flow_tunnel_get_restore_info,
 };
 
 /* Convert FDIR request to Generic flow. */
@@ -4065,6 +4239,142 @@ flow_hairpin_split(struct rte_eth_dev *dev,
 	return 0;
 }
 
+__extension__
+union tunnel_offload_mark {
+	uint32_t val;
+	struct {
+		uint32_t app_reserve:8;
+		uint32_t table_id:15;
+		uint32_t transfer:1;
+		uint32_t _unused_:8;
+	};
+};
+
+struct tunnel_default_miss_ctx {
+	uint16_t *queue;
+	__extension__
+	union {
+		struct rte_flow_action_rss action_rss;
+		struct rte_flow_action_queue miss_queue;
+		struct rte_flow_action_jump miss_jump;
+		uint8_t raw[0];
+	};
+};
+
+static int
+flow_tunnel_add_default_miss(struct rte_eth_dev *dev,
+			     struct rte_flow *flow,
+			     const struct rte_flow_attr *attr,
+			     const struct rte_flow_action *app_actions,
+			     uint32_t flow_idx,
+			     struct tunnel_default_miss_ctx *ctx,
+			     struct rte_flow_error *error)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_flow *dev_flow;
+	struct rte_flow_attr miss_attr = *attr;
+	const struct mlx5_flow_tunnel *tunnel = app_actions[0].conf;
+	const struct rte_flow_item miss_items[2] = {
+		{
+			.type = RTE_FLOW_ITEM_TYPE_ETH,
+			.spec = NULL,
+			.last = NULL,
+			.mask = NULL
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+			.spec = NULL,
+			.last = NULL,
+			.mask = NULL
+		}
+	};
+	union tunnel_offload_mark mark_id;
+	struct rte_flow_action_mark miss_mark;
+	struct rte_flow_action miss_actions[3] = {
+		[0] = { .type = RTE_FLOW_ACTION_TYPE_MARK, .conf = &miss_mark },
+		[2] = { .type = RTE_FLOW_ACTION_TYPE_END,  .conf = NULL }
+	};
+	const struct rte_flow_action_jump *jump_data;
+	uint32_t i, flow_table = 0; /* prevent compilation warning */
+	struct flow_grp_info grp_info = {
+		.external = 1,
+		.transfer = attr->transfer,
+		.fdb_def_rule = !!priv->fdb_def_rule,
+		.std_tbl_fix = 0,
+	};
+	int ret;
+
+	if (!attr->transfer) {
+		uint32_t q_size;
+
+		miss_actions[1].type = RTE_FLOW_ACTION_TYPE_RSS;
+		q_size = priv->reta_idx_n * sizeof(ctx->queue[0]);
+		ctx->queue = mlx5_malloc(MLX5_MEM_SYS | MLX5_MEM_ZERO, q_size,
+					 0, SOCKET_ID_ANY);
+		if (!ctx->queue)
+			return rte_flow_error_set
+				(error, ENOMEM,
+				RTE_FLOW_ERROR_TYPE_ACTION_CONF,
+				NULL, "invalid default miss RSS");
+		ctx->action_rss.func = RTE_ETH_HASH_FUNCTION_DEFAULT,
+		ctx->action_rss.level = 0,
+		ctx->action_rss.types = priv->rss_conf.rss_hf,
+		ctx->action_rss.key_len = priv->rss_conf.rss_key_len,
+		ctx->action_rss.queue_num = priv->reta_idx_n,
+		ctx->action_rss.key = priv->rss_conf.rss_key,
+		ctx->action_rss.queue = ctx->queue;
+		if (!priv->reta_idx_n || !priv->rxqs_n)
+			return rte_flow_error_set
+				(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ACTION_CONF,
+				NULL, "invalid port configuration");
+		if (!(dev->data->dev_conf.rxmode.mq_mode & ETH_MQ_RX_RSS_FLAG))
+			ctx->action_rss.types = 0;
+		for (i = 0; i != priv->reta_idx_n; ++i)
+			ctx->queue[i] = (*priv->reta_idx)[i];
+	} else {
+		miss_actions[1].type = RTE_FLOW_ACTION_TYPE_JUMP;
+		ctx->miss_jump.group = MLX5_TNL_MISS_FDB_JUMP_GRP;
+	}
+	miss_actions[1].conf = (typeof(miss_actions[1].conf))ctx->raw;
+	for (; app_actions->type != RTE_FLOW_ACTION_TYPE_JUMP; app_actions++);
+	jump_data = app_actions->conf;
+	miss_attr.priority = MLX5_TNL_MISS_RULE_PRIORITY;
+	miss_attr.group = jump_data->group;
+	ret = mlx5_flow_group_to_table(dev, tunnel, jump_data->group,
+				       &flow_table, grp_info, error);
+	if (ret)
+		return rte_flow_error_set(error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ACTION_CONF,
+					  NULL, "invalid tunnel id");
+	mark_id.app_reserve = 0;
+	mark_id.table_id = tunnel_flow_tbl_to_id(flow_table);
+	mark_id.transfer = !!attr->transfer;
+	mark_id._unused_ = 0;
+	miss_mark.id = mark_id.val;
+	dev_flow = flow_drv_prepare(dev, flow, &miss_attr,
+				    miss_items, miss_actions, flow_idx, error);
+	if (!dev_flow)
+		return -rte_errno;
+	dev_flow->flow = flow;
+	dev_flow->external = true;
+	dev_flow->tunnel = tunnel;
+	/* Subflow object was created, we must include one in the list. */
+	SILIST_INSERT(&flow->dev_handles, dev_flow->handle_idx,
+		      dev_flow->handle, next);
+	DRV_LOG(DEBUG,
+		"port %u tunnel type=%d id=%u miss rule priority=%u group=%u",
+		dev->data->port_id, tunnel->app_tunnel.type,
+		tunnel->tunnel_id, miss_attr.priority, miss_attr.group);
+	ret = flow_drv_translate(dev, dev_flow, &miss_attr, miss_items,
+				  miss_actions, error);
+	if (!ret)
+		ret = flow_mreg_update_copy_table(dev, flow, miss_actions,
+						  error);
+
+	return ret;
+}
+
 /**
  * The last stage of splitting chain, just creates the subflow
  * without any modification.
@@ -5187,6 +5497,27 @@ flow_create_split_outer(struct rte_eth_dev *dev,
 	return ret;
 }
 
+static struct mlx5_flow_tunnel *
+flow_tunnel_from_rule(struct rte_eth_dev *dev,
+		      const struct rte_flow_attr *attr,
+		      const struct rte_flow_item items[],
+		      const struct rte_flow_action actions[])
+{
+	struct mlx5_flow_tunnel *tunnel;
+
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wcast-qual"
+	if (is_flow_tunnel_match_rule(dev, attr, items, actions))
+		tunnel = (struct mlx5_flow_tunnel *)items[0].spec;
+	else if (is_flow_tunnel_steer_rule(dev, attr, items, actions))
+		tunnel = (struct mlx5_flow_tunnel *)actions[0].conf;
+	else
+		tunnel = NULL;
+#pragma GCC diagnostic pop
+
+	return tunnel;
+}
+
 /**
  * Create a flow and add it to @p list.
  *
@@ -5253,6 +5584,8 @@ flow_list_create(struct rte_eth_dev *dev, uint32_t *list,
 	struct rte_flow_attr attr_factor = {0};
 	const struct rte_flow_action *actions;
 	struct rte_flow_action *translated_actions = NULL;
+	struct mlx5_flow_tunnel *tunnel;
+	struct tunnel_default_miss_ctx default_miss_ctx = { 0, };
 	int ret = flow_shared_actions_translate(original_actions,
 						shared_actions,
 						&shared_actions_n,
@@ -5264,8 +5597,6 @@ flow_list_create(struct rte_eth_dev *dev, uint32_t *list,
 	}
 	actions = translated_actions ? translated_actions : original_actions;
 	memcpy((void *)&attr_factor, (const void *)attr, sizeof(*attr));
-	if (external)
-		attr_factor.group *= MLX5_FLOW_TABLE_FACTOR;
 	p_actions_rx = actions;
 	hairpin_flow = flow_check_hairpin_split(dev, &attr_factor, actions);
 	ret = flow_drv_validate(dev, &attr_factor, items, p_actions_rx,
@@ -5340,6 +5671,19 @@ flow_list_create(struct rte_eth_dev *dev, uint32_t *list,
 					      error);
 		if (ret < 0)
 			goto error;
+		if (is_flow_tunnel_steer_rule(dev, attr,
+					      buf->entry[i].pattern,
+					      p_actions_rx)) {
+			ret = flow_tunnel_add_default_miss(dev, flow, attr,
+							   p_actions_rx,
+							   idx,
+							   &default_miss_ctx,
+							   error);
+			if (ret < 0) {
+				mlx5_free(default_miss_ctx.queue);
+				goto error;
+			}
+		}
 	}
 	/* Create the tx flow. */
 	if (hairpin_flow) {
@@ -5395,6 +5739,13 @@ flow_list_create(struct rte_eth_dev *dev, uint32_t *list,
 	priv->flow_idx = priv->flow_nested_idx;
 	if (priv->flow_nested_idx)
 		priv->flow_nested_idx = 0;
+	tunnel = flow_tunnel_from_rule(dev, attr, items, actions);
+	if (tunnel) {
+		flow->tunnel = 1;
+		flow->tunnel_id = tunnel->tunnel_id;
+		__atomic_add_fetch(&tunnel->refctn, 1, __ATOMIC_RELAXED);
+		mlx5_free(default_miss_ctx.queue);
+	}
 	return idx;
 error:
 	MLX5_ASSERT(flow);
@@ -5530,6 +5881,7 @@ mlx5_flow_create(struct rte_eth_dev *dev,
 				   "port not started");
 		return NULL;
 	}
+
 	return (void *)(uintptr_t)flow_list_create(dev, &priv->flows,
 				  attr, items, actions, true, error);
 }
@@ -5584,6 +5936,13 @@ flow_list_destroy(struct rte_eth_dev *dev, uint32_t *list,
 		}
 	}
 	mlx5_ipool_free(priv->sh->ipool[MLX5_IPOOL_RTE_FLOW], flow_idx);
+	if (flow->tunnel) {
+		struct mlx5_flow_tunnel *tunnel;
+		tunnel = mlx5_find_tunnel_id(dev, flow->tunnel_id);
+		RTE_VERIFY(tunnel);
+		if (!__atomic_sub_fetch(&tunnel->refctn, 1, __ATOMIC_RELAXED))
+			mlx5_flow_tunnel_free(dev, tunnel);
+	}
 }
 
 /**
@@ -7118,19 +7477,122 @@ mlx5_flow_async_pool_query_handle(struct mlx5_dev_ctx_shared *sh,
 	sh->cmng.pending_queries--;
 }
 
+static const struct mlx5_flow_tbl_data_entry  *
+tunnel_mark_decode(struct rte_eth_dev *dev, uint32_t mark)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_dev_ctx_shared *sh = priv->sh;
+	struct mlx5_hlist_entry *he;
+	union tunnel_offload_mark mbits = { .val = mark };
+	union mlx5_flow_tbl_key table_key = {
+		{
+			.table_id = tunnel_id_to_flow_tbl(mbits.table_id),
+			.reserved = 0,
+			.domain = !!mbits.transfer,
+			.direction = 0,
+		}
+	};
+	he = mlx5_hlist_lookup(sh->flow_tbls, table_key.v64);
+	return he ?
+	       container_of(he, struct mlx5_flow_tbl_data_entry, entry) : NULL;
+}
+
+static uint32_t
+tunnel_flow_group_to_flow_table(struct rte_eth_dev *dev,
+				const struct mlx5_flow_tunnel *tunnel,
+				uint32_t group, uint32_t *table,
+				struct rte_flow_error *error)
+{
+	struct mlx5_hlist_entry *he;
+	struct tunnel_tbl_entry *tte;
+	union tunnel_tbl_key key = {
+		.tunnel_id = tunnel ? tunnel->tunnel_id : 0,
+		.group = group
+	};
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_hlist *group_hash;
+
+	group_hash = tunnel ? tunnel->groups : thub->groups;
+	he = mlx5_hlist_lookup(group_hash, key.val);
+	if (!he) {
+		int ret;
+		tte = mlx5_malloc(MLX5_MEM_SYS | MLX5_MEM_ZERO,
+				  sizeof(*tte), 0,
+				  SOCKET_ID_ANY);
+		if (!tte)
+			goto err;
+		tte->hash.key = key.val;
+		ret = mlx5_flow_id_get(thub->table_ids, &tte->flow_table);
+		if (ret) {
+			mlx5_free(tte);
+			goto err;
+		}
+		tte->flow_table = tunnel_id_to_flow_tbl(tte->flow_table);
+		mlx5_hlist_insert(group_hash, &tte->hash);
+	} else {
+		tte = container_of(he, typeof(*tte), hash);
+	}
+	*table = tte->flow_table;
+	DRV_LOG(DEBUG, "port %u tunnel %u group=%#x table=%#x",
+		dev->data->port_id, key.tunnel_id, group, *table);
+	return 0;
+
+err:
+	return rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
+				  NULL, "tunnel group index not supported");
+}
+
+static int
+flow_group_to_table(uint32_t port_id, uint32_t group, uint32_t *table,
+		    struct flow_grp_info grp_info, struct rte_flow_error *error)
+{
+	if (grp_info.transfer && grp_info.external && grp_info.fdb_def_rule) {
+		if (group == UINT32_MAX)
+			return rte_flow_error_set
+						(error, EINVAL,
+						 RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
+						 NULL,
+						 "group index not supported");
+		*table = group + 1;
+	} else {
+		*table = group;
+	}
+	DRV_LOG(DEBUG, "port %u group=%#x table=%#x", port_id, group, *table);
+	return 0;
+}
+
 /**
  * Translate the rte_flow group index to HW table value.
  *
- * @param[in] attributes
- *   Pointer to flow attributes
- * @param[in] external
- *   Value is part of flow rule created by request external to PMD.
+ * If tunnel offload is disabled, all group ids are converted to flow
+ * table ids using the standard method.
+ * If tunnel offload is enabled, a group id can be converted using the
+ * standard or the tunnel conversion method. The conversion method
+ * selection depends on flags in the `grp_info` parameter:
+ * - Internal (grp_info.external == 0) groups are converted with the
+ *   standard method.
+ * - Group ids in the JUMP action are converted with the tunnel method.
+ * - Conversion of a group id in a rule attribute depends on the rule
+ *   type and the group id value:
+ *   ** non-zero group attributes are converted with the tunnel method
+ *   ** a zero group attribute in a non-tunnel rule is converted using
+ *      the standard method - there's only one root table
+ *   ** a zero group attribute in a steer tunnel rule is converted with
+ *      the standard method - single root table
+ *   ** a zero group attribute in a match tunnel rule is a special OvS
+ *      case: that value is used for portability reasons. That group
+ *      id is converted with the tunnel conversion method.
+ *
+ * @param[in] dev
+ *   Port device
+ * @param[in] tunnel
+ *   PMD tunnel offload object
  * @param[in] group
  *   rte_flow group index value.
- * @param[out] fdb_def_rule
- *   Whether fdb jump to table 1 is configured.
  * @param[out] table
  *   HW table value.
+ * @param[in] grp_info
+ *   flags used for conversion
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -7138,22 +7600,36 @@ mlx5_flow_async_pool_query_handle(struct mlx5_dev_ctx_shared *sh,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx5_flow_group_to_table(const struct rte_flow_attr *attributes, bool external,
-			 uint32_t group, bool fdb_def_rule, uint32_t *table,
+mlx5_flow_group_to_table(struct rte_eth_dev *dev,
+			 const struct mlx5_flow_tunnel *tunnel,
+			 uint32_t group, uint32_t *table,
+			 struct flow_grp_info grp_info,
 			 struct rte_flow_error *error)
 {
-	if (attributes->transfer && external && fdb_def_rule) {
-		if (group == UINT32_MAX)
-			return rte_flow_error_set
-						(error, EINVAL,
-						 RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
-						 NULL,
-						 "group index not supported");
-		*table = group + 1;
+	int ret;
+	bool standard_translation;
+
+	if (grp_info.external && group < MLX5_MAX_TABLES_EXTERNAL)
+		group *= MLX5_FLOW_TABLE_FACTOR;
+	if (is_tunnel_offload_active(dev)) {
+		standard_translation = !grp_info.external ||
+					grp_info.std_tbl_fix;
 	} else {
-		*table = group;
+		standard_translation = true;
 	}
-	return 0;
+	DRV_LOG(DEBUG,
+		"port %u group=%#x transfer=%d external=%d fdb_def_rule=%d translate=%s",
+		dev->data->port_id, group, grp_info.transfer,
+		grp_info.external, grp_info.fdb_def_rule,
+		standard_translation ? "STANDARD" : "TUNNEL");
+	if (standard_translation)
+		ret = flow_group_to_table(dev->data->port_id, group, table,
+					  grp_info, error);
+	else
+		ret = tunnel_flow_group_to_flow_table(dev, tunnel, group,
+						      table, error);
+
+	return ret;
 }
 
 /**
@@ -7524,3 +8000,168 @@ mlx5_shared_action_flush(struct rte_eth_dev *dev)
 	}
 	return ret;
 }
+
+static void
+mlx5_flow_tunnel_free(struct rte_eth_dev *dev,
+		      struct mlx5_flow_tunnel *tunnel)
+{
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_id_pool *id_pool = thub->tunnel_ids;
+
+	DRV_LOG(DEBUG, "port %u release pmd tunnel id=0x%x",
+		dev->data->port_id, tunnel->tunnel_id);
+	RTE_VERIFY(!__atomic_load_n(&tunnel->refctn, __ATOMIC_RELAXED));
+	LIST_REMOVE(tunnel, chain);
+	mlx5_flow_id_release(id_pool, tunnel->tunnel_id);
+	mlx5_hlist_destroy(tunnel->groups, NULL, NULL);
+	mlx5_free(tunnel);
+}
+
+static struct mlx5_flow_tunnel *
+mlx5_find_tunnel_id(struct rte_eth_dev *dev, uint32_t id)
+{
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_tunnel *tun;
+
+	LIST_FOREACH(tun, &thub->tunnels, chain) {
+		if (tun->tunnel_id == id)
+			break;
+	}
+
+	return tun;
+}
+
+static struct mlx5_flow_tunnel *
+mlx5_flow_tunnel_allocate(struct rte_eth_dev *dev,
+			  const struct rte_flow_tunnel *app_tunnel)
+{
+	int ret;
+	struct mlx5_flow_tunnel *tunnel;
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_id_pool *id_pool = thub->tunnel_ids;
+	uint32_t id;
+
+	ret = mlx5_flow_id_get(id_pool, &id);
+	if (ret)
+		return NULL;
+	/**
+	 * mlx5 flow tunnel is an auxiliary data structure
+	 * It's not part of IO. No need to allocate it from
+	 * huge pages pools dedicated for IO
+	 */
+	tunnel = mlx5_malloc(MLX5_MEM_SYS | MLX5_MEM_ZERO, sizeof(*tunnel),
+			     0, SOCKET_ID_ANY);
+	if (!tunnel) {
+		mlx5_flow_id_pool_release(id_pool);
+		return NULL;
+	}
+	tunnel->groups = mlx5_hlist_create("tunnel groups", 1024);
+	if (!tunnel->groups) {
+		mlx5_flow_id_pool_release(id_pool);
+		mlx5_free(tunnel);
+		return NULL;
+	}
+	/* initiate new PMD tunnel */
+	memcpy(&tunnel->app_tunnel, app_tunnel, sizeof(*app_tunnel));
+	tunnel->tunnel_id = id;
+	tunnel->action.type = (typeof(tunnel->action.type))
+			      MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET;
+	tunnel->action.conf = tunnel;
+	tunnel->item.type = (typeof(tunnel->item.type))
+			    MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL;
+	tunnel->item.spec = tunnel;
+	tunnel->item.last = NULL;
+	tunnel->item.mask = NULL;
+
+	DRV_LOG(DEBUG, "port %u new pmd tunnel id=0x%x",
+		dev->data->port_id, tunnel->tunnel_id);
+
+	return tunnel;
+}
+
+static int
+mlx5_get_flow_tunnel(struct rte_eth_dev *dev,
+		     const struct rte_flow_tunnel *app_tunnel,
+		     struct mlx5_flow_tunnel **tunnel)
+{
+	int ret;
+	struct mlx5_flow_tunnel_hub *thub = mlx5_tunnel_hub(dev);
+	struct mlx5_flow_tunnel *tun;
+
+	LIST_FOREACH(tun, &thub->tunnels, chain) {
+		if (!memcmp(app_tunnel, &tun->app_tunnel,
+			    sizeof(*app_tunnel))) {
+			*tunnel = tun;
+			ret = 0;
+			break;
+		}
+	}
+	if (!tun) {
+		tun = mlx5_flow_tunnel_allocate(dev, app_tunnel);
+		if (tun) {
+			LIST_INSERT_HEAD(&thub->tunnels, tun, chain);
+			*tunnel = tun;
+		} else {
+			ret = -ENOMEM;
+		}
+	}
+	if (tun)
+		__atomic_add_fetch(&tun->refctn, 1, __ATOMIC_RELAXED);
+
+	return ret;
+}
+
+void mlx5_release_tunnel_hub(struct mlx5_dev_ctx_shared *sh, uint16_t port_id)
+{
+	struct mlx5_flow_tunnel_hub *thub = sh->tunnel_hub;
+
+	if (!thub)
+		return;
+	if (!LIST_EMPTY(&thub->tunnels))
+		DRV_LOG(WARNING, "port %u tunnels present\n", port_id);
+	mlx5_flow_id_pool_release(thub->tunnel_ids);
+	mlx5_flow_id_pool_release(thub->table_ids);
+	mlx5_hlist_destroy(thub->groups, NULL, NULL);
+	mlx5_free(thub);
+}
+
+int mlx5_alloc_tunnel_hub(struct mlx5_dev_ctx_shared *sh)
+{
+	int err;
+	struct mlx5_flow_tunnel_hub *thub;
+
+	thub = mlx5_malloc(MLX5_MEM_SYS | MLX5_MEM_ZERO, sizeof(*thub),
+			   0, SOCKET_ID_ANY);
+	if (!thub)
+		return -ENOMEM;
+	LIST_INIT(&thub->tunnels);
+	thub->tunnel_ids = mlx5_flow_id_pool_alloc(MLX5_MAX_TUNNELS);
+	if (!thub->tunnel_ids) {
+		err = -rte_errno;
+		goto err;
+	}
+	thub->table_ids = mlx5_flow_id_pool_alloc(MLX5_MAX_TABLES);
+	if (!thub->table_ids) {
+		err = -rte_errno;
+		goto err;
+	}
+	thub->groups = mlx5_hlist_create("flow groups", MLX5_MAX_TABLES);
+	if (!thub->groups) {
+		err = -rte_errno;
+		goto err;
+	}
+	sh->tunnel_hub = thub;
+
+	return 0;
+
+err:
+	if (thub->groups)
+		mlx5_hlist_destroy(thub->groups, NULL, NULL);
+	if (thub->table_ids)
+		mlx5_flow_id_pool_release(thub->table_ids);
+	if (thub->tunnel_ids)
+		mlx5_flow_id_pool_release(thub->tunnel_ids);
+	if (thub)
+		mlx5_free(thub);
+	return err;
+}
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 7faab43fe6..20beb96193 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -26,6 +26,7 @@ enum mlx5_rte_flow_item_type {
 	MLX5_RTE_FLOW_ITEM_TYPE_TAG,
 	MLX5_RTE_FLOW_ITEM_TYPE_TX_QUEUE,
 	MLX5_RTE_FLOW_ITEM_TYPE_VLAN,
+	MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL,
 };
 
 /* Private (internal) rte flow actions. */
@@ -36,6 +37,7 @@ enum mlx5_rte_flow_action_type {
 	MLX5_RTE_FLOW_ACTION_TYPE_COPY_MREG,
 	MLX5_RTE_FLOW_ACTION_TYPE_DEFAULT_MISS,
 	MLX5_RTE_FLOW_ACTION_TYPE_SHARED_RSS,
+	MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET,
 };
 
 /* Matches on selected register. */
@@ -74,7 +76,6 @@ enum mlx5_feature_name {
 	MLX5_MTR_SFX,
 };
 
-/* Pattern outer Layer bits. */
 #define MLX5_FLOW_LAYER_OUTER_L2 (1u << 0)
 #define MLX5_FLOW_LAYER_OUTER_L3_IPV4 (1u << 1)
 #define MLX5_FLOW_LAYER_OUTER_L3_IPV6 (1u << 2)
@@ -202,6 +203,8 @@ enum mlx5_feature_name {
 #define MLX5_FLOW_ACTION_AGE (1ull << 34)
 #define MLX5_FLOW_ACTION_DEFAULT_MISS (1ull << 35)
 #define MLX5_FLOW_ACTION_SAMPLE (1ull << 36)
+#define MLX5_FLOW_ACTION_TUNNEL_SET (1ull << 37)
+#define MLX5_FLOW_ACTION_TUNNEL_MATCH (1ull << 38)
 
 #define MLX5_FLOW_FATE_ACTIONS \
 	(MLX5_FLOW_ACTION_DROP | MLX5_FLOW_ACTION_QUEUE | \
@@ -531,6 +534,10 @@ struct mlx5_flow_tbl_data_entry {
 	struct mlx5_flow_dv_jump_tbl_resource jump;
 	/**< jump resource, at most one for each table created. */
 	uint32_t idx; /**< index for the indexed mempool. */
+	/**< tunnel offload */
+	const struct mlx5_flow_tunnel *tunnel;
+	uint32_t group_id;
+	bool external;
 };
 
 /* Sub rdma-core actions list. */
@@ -769,6 +776,7 @@ struct mlx5_flow {
 	};
 	struct mlx5_flow_handle *handle;
 	uint32_t handle_idx; /* Index of the mlx5 flow handle memory. */
+	const struct mlx5_flow_tunnel *tunnel;
 };
 
 /* Flow meter state. */
@@ -914,6 +922,112 @@ struct mlx5_fdir_flow {
 
 #define HAIRPIN_FLOW_ID_BITS 28
 
+#define MLX5_MAX_TUNNELS 256
+#define MLX5_TNL_MISS_RULE_PRIORITY 3
+#define MLX5_TNL_MISS_FDB_JUMP_GRP  0x1234faac
+
+/*
+ * When tunnel offload is active, all JUMP group ids are converted
+ * using the same method. That conversion is applied both to tunnel and
+ * regular rule types.
+ * Group ids used in tunnel rules are relative to their tunnel (!).
+ * An application can create a number of steer rules, using the same
+ * tunnel, with a different group id in each rule.
+ * Each tunnel stores its groups internally in PMD tunnel object.
+ * Groups used in regular rules do not belong to any tunnel and are stored
+ * in tunnel hub.
+ */
+
+struct mlx5_flow_tunnel {
+	LIST_ENTRY(mlx5_flow_tunnel) chain;
+	struct rte_flow_tunnel app_tunnel;	/** app tunnel copy */
+	uint32_t tunnel_id;			/** unique tunnel ID */
+	uint32_t refctn;
+	struct rte_flow_action action;
+	struct rte_flow_item item;
+	struct mlx5_hlist *groups;		/** tunnel groups */
+};
+
+/** PMD tunnel related context */
+struct mlx5_flow_tunnel_hub {
+	LIST_HEAD(, mlx5_flow_tunnel) tunnels;
+	struct mlx5_flow_id_pool *tunnel_ids;
+	struct mlx5_flow_id_pool *table_ids;
+	struct mlx5_hlist *groups;		/** non tunnel groups */
+};
+
+/* convert jump group to flow table ID in tunnel rules */
+struct tunnel_tbl_entry {
+	struct mlx5_hlist_entry hash;
+	uint32_t flow_table;
+};
+
+static inline uint32_t
+tunnel_id_to_flow_tbl(uint32_t id)
+{
+	return id | (1u << 16);
+}
+
+static inline uint32_t
+tunnel_flow_tbl_to_id(uint32_t flow_tbl)
+{
+	return flow_tbl & ~(1u << 16);
+}
+
+union tunnel_tbl_key {
+	uint64_t val;
+	struct {
+		uint32_t tunnel_id;
+		uint32_t group;
+	};
+};
+
+static inline struct mlx5_flow_tunnel_hub *
+mlx5_tunnel_hub(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	return priv->sh->tunnel_hub;
+}
+
+static inline bool
+is_tunnel_offload_active(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	return !!priv->config.dv_miss_info;
+}
+
+static inline bool
+is_flow_tunnel_match_rule(__rte_unused struct rte_eth_dev *dev,
+			  __rte_unused const struct rte_flow_attr *attr,
+			  __rte_unused const struct rte_flow_item items[],
+			  __rte_unused const struct rte_flow_action actions[])
+{
+	return (items[0].type == (typeof(items[0].type))
+				 MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL);
+}
+
+static inline bool
+is_flow_tunnel_steer_rule(__rte_unused struct rte_eth_dev *dev,
+			  __rte_unused const struct rte_flow_attr *attr,
+			  __rte_unused const struct rte_flow_item items[],
+			  __rte_unused const struct rte_flow_action actions[])
+{
+	return (actions[0].type == (typeof(actions[0].type))
+				   MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET);
+}
+
+static inline const struct mlx5_flow_tunnel *
+flow_actions_to_tunnel(const struct rte_flow_action actions[])
+{
+	return actions[0].conf;
+}
+
+static inline const struct mlx5_flow_tunnel *
+flow_items_to_tunnel(const struct rte_flow_item items[])
+{
+	return items[0].spec;
+}
+
 /* Flow structure. */
 struct rte_flow {
 	ILIST_ENTRY(uint32_t)next; /**< Index to the next flow structure. */
@@ -922,12 +1036,14 @@ struct rte_flow {
 	/**< Device flow handles that are part of the flow. */
 	uint32_t drv_type:2; /**< Driver type. */
 	uint32_t fdir:1; /**< Identifier of associated FDIR if any. */
+	uint32_t tunnel:1;
 	uint32_t hairpin_flow_id:HAIRPIN_FLOW_ID_BITS;
 	/**< The flow id used for hairpin. */
 	uint32_t copy_applied:1; /**< The MARK copy Flow os applied. */
 	uint32_t rix_mreg_copy;
 	/**< Index to metadata register copy table resource. */
 	uint32_t counter; /**< Holds flow counter. */
+	uint32_t tunnel_id;  /**< Tunnel id */
 	uint16_t meter; /**< Holds flow meter id. */
 } __rte_packed;
 
@@ -1089,9 +1205,54 @@ void mlx5_flow_id_pool_release(struct mlx5_flow_id_pool *pool);
 uint32_t mlx5_flow_id_get(struct mlx5_flow_id_pool *pool, uint32_t *id);
 uint32_t mlx5_flow_id_release(struct mlx5_flow_id_pool *pool,
 			      uint32_t id);
-int mlx5_flow_group_to_table(const struct rte_flow_attr *attributes,
-			     bool external, uint32_t group, bool fdb_def_rule,
-			     uint32_t *table, struct rte_flow_error *error);
+__extension__
+struct flow_grp_info {
+	uint64_t external:1;
+	uint64_t transfer:1;
+	uint64_t fdb_def_rule:1;
+	/* force standard group translation */
+	uint64_t std_tbl_fix:1;
+};
+
+static inline bool
+tunnel_use_standard_attr_group_translate
+		    (struct rte_eth_dev *dev,
+		     const struct mlx5_flow_tunnel *tunnel,
+		     const struct rte_flow_attr *attr,
+		     const struct rte_flow_item items[],
+		     const struct rte_flow_action actions[])
+{
+	bool verdict;
+
+	if (!is_tunnel_offload_active(dev))
+		/* no tunnel offload API */
+		verdict = true;
+	else if (tunnel) {
+		/*
+		 * OvS will use jump to group 0 in tunnel steer rule.
+		 * If tunnel steer rule starts from group 0 (attr.group == 0)
+		 * that 0 group must be translated with the standard method.
+		 * attr.group == 0 in tunnel match rule translated with tunnel
+		 * method
+		 */
+		verdict = !attr->group &&
+			  is_flow_tunnel_steer_rule(dev, attr, items, actions);
+	} else {
+		/*
+		 * non-tunnel group translation uses standard method for
+		 * root group only: attr.group == 0
+		 */
+		verdict = !attr->group;
+	}
+
+	return verdict;
+}
+
+int mlx5_flow_group_to_table(struct rte_eth_dev *dev,
+			     const struct mlx5_flow_tunnel *tunnel,
+			     uint32_t group, uint32_t *table,
+			     struct flow_grp_info flags,
+				 struct rte_flow_error *error);
 uint64_t mlx5_flow_hashfields_adjust(struct mlx5_flow_rss_desc *rss_desc,
 				     int tunnel, uint64_t layer_types,
 				     uint64_t hash_fields);
@@ -1231,4 +1392,6 @@ int mlx5_flow_meter_flush(struct rte_eth_dev *dev,
 int mlx5_flow_dv_discover_counter_offset_support(struct rte_eth_dev *dev);
 struct rte_flow_shared_action *mlx5_flow_get_shared_rss(struct rte_flow *flow);
 int mlx5_shared_action_flush(struct rte_eth_dev *dev);
+void mlx5_release_tunnel_hub(struct mlx5_dev_ctx_shared *sh, uint16_t port_id);
+int mlx5_alloc_tunnel_hub(struct mlx5_dev_ctx_shared *sh);
 #endif /* RTE_PMD_MLX5_FLOW_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 66d81e9598..504d842c09 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -3947,14 +3947,21 @@ flow_dv_validate_action_modify_ttl(const uint64_t action_flags,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-flow_dv_validate_action_jump(const struct rte_flow_action *action,
+flow_dv_validate_action_jump(struct rte_eth_dev *dev,
+			     const struct mlx5_flow_tunnel *tunnel,
+			     const struct rte_flow_action *action,
 			     uint64_t action_flags,
 			     const struct rte_flow_attr *attributes,
 			     bool external, struct rte_flow_error *error)
 {
 	uint32_t target_group, table;
 	int ret = 0;
-
+	struct flow_grp_info grp_info = {
+		.external = !!external,
+		.transfer = !!attributes->transfer,
+		.fdb_def_rule = 1,
+		.std_tbl_fix = 0
+	};
 	if (action_flags & (MLX5_FLOW_FATE_ACTIONS |
 			    MLX5_FLOW_FATE_ESWITCH_ACTIONS))
 		return rte_flow_error_set(error, EINVAL,
@@ -3977,11 +3984,13 @@ flow_dv_validate_action_jump(const struct rte_flow_action *action,
 					  NULL, "action configuration not set");
 	target_group =
 		((const struct rte_flow_action_jump *)action->conf)->group;
-	ret = mlx5_flow_group_to_table(attributes, external, target_group,
-				       true, &table, error);
+	ret = mlx5_flow_group_to_table(dev, tunnel, target_group, &table,
+				       grp_info, error);
 	if (ret)
 		return ret;
-	if (attributes->group == target_group)
+	if (attributes->group == target_group &&
+	    !(action_flags & (MLX5_FLOW_ACTION_TUNNEL_SET |
+			      MLX5_FLOW_ACTION_TUNNEL_MATCH)))
 		return rte_flow_error_set(error, EINVAL,
 					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
 					  "target group must be other than"
@@ -5160,8 +5169,9 @@ flow_dv_counter_release(struct rte_eth_dev *dev, uint32_t counter)
  */
 static int
 flow_dv_validate_attributes(struct rte_eth_dev *dev,
+			    const struct mlx5_flow_tunnel *tunnel,
 			    const struct rte_flow_attr *attributes,
-			    bool external __rte_unused,
+			    struct flow_grp_info grp_info,
 			    struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -5169,6 +5179,8 @@ flow_dv_validate_attributes(struct rte_eth_dev *dev,
 	int ret = 0;
 
 #ifndef HAVE_MLX5DV_DR
+	RTE_SET_USED(tunnel);
+	RTE_SET_USED(grp_info);
 	if (attributes->group)
 		return rte_flow_error_set(error, ENOTSUP,
 					  RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
@@ -5177,9 +5189,8 @@ flow_dv_validate_attributes(struct rte_eth_dev *dev,
 #else
 	uint32_t table = 0;
 
-	ret = mlx5_flow_group_to_table(attributes, external,
-				       attributes->group, !!priv->fdb_def_rule,
-				       &table, error);
+	ret = mlx5_flow_group_to_table(dev, tunnel, attributes->group, &table,
+				       grp_info, error);
 	if (ret)
 		return ret;
 	if (!table)
@@ -5293,10 +5304,28 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 	const struct rte_flow_item_vlan *vlan_m = NULL;
 	int16_t rw_act_num = 0;
 	uint64_t is_root;
+	const struct mlx5_flow_tunnel *tunnel;
+	struct flow_grp_info grp_info = {
+		.external = !!external,
+		.transfer = !!attr->transfer,
+		.fdb_def_rule = !!priv->fdb_def_rule,
+	};
 
 	if (items == NULL)
 		return -1;
-	ret = flow_dv_validate_attributes(dev, attr, external, error);
+	if (is_flow_tunnel_match_rule(dev, attr, items, actions)) {
+		tunnel = flow_items_to_tunnel(items);
+		action_flags |= MLX5_FLOW_ACTION_TUNNEL_MATCH |
+				MLX5_FLOW_ACTION_DECAP;
+	} else if (is_flow_tunnel_steer_rule(dev, attr, items, actions)) {
+		tunnel = flow_actions_to_tunnel(actions);
+		action_flags |= MLX5_FLOW_ACTION_TUNNEL_SET;
+	} else {
+		tunnel = NULL;
+	}
+	grp_info.std_tbl_fix = tunnel_use_standard_attr_group_translate
+				(dev, tunnel, attr, items, actions);
+	ret = flow_dv_validate_attributes(dev, tunnel, attr, grp_info, error);
 	if (ret < 0)
 		return ret;
 	is_root = (uint64_t)ret;
@@ -5309,6 +5338,15 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 						  RTE_FLOW_ERROR_TYPE_ITEM,
 						  NULL, "item not supported");
 		switch (type) {
+		case MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL:
+			if (items[0].type != (typeof(items[0].type))
+						MLX5_RTE_FLOW_ITEM_TYPE_TUNNEL)
+				return rte_flow_error_set
+						(error, EINVAL,
+						RTE_FLOW_ERROR_TYPE_ITEM,
+						NULL, "MLX5 private items "
+						"must be the first");
+			break;
 		case RTE_FLOW_ITEM_TYPE_VOID:
 			break;
 		case RTE_FLOW_ITEM_TYPE_PORT_ID:
@@ -5894,7 +5932,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			rw_act_num += MLX5_ACT_NUM_MDF_TTL;
 			break;
 		case RTE_FLOW_ACTION_TYPE_JUMP:
-			ret = flow_dv_validate_action_jump(actions,
+			ret = flow_dv_validate_action_jump(dev, tunnel, actions,
 							   action_flags,
 							   attr, external,
 							   error);
@@ -6003,6 +6041,17 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 			action_flags |= MLX5_FLOW_ACTION_SAMPLE;
 			++actions_n;
 			break;
+		case MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET:
+			if (actions[0].type != (typeof(actions[0].type))
+				MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET)
+				return rte_flow_error_set
+						(error, EINVAL,
+						RTE_FLOW_ERROR_TYPE_ACTION,
+						NULL, "MLX5 private action "
+						"must be the first");
+
+			action_flags |= MLX5_FLOW_ACTION_TUNNEL_SET;
+			break;
 		default:
 			return rte_flow_error_set(error, ENOTSUP,
 						  RTE_FLOW_ERROR_TYPE_ACTION,
@@ -6010,6 +6059,54 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 						  "action not supported");
 		}
 	}
+	/*
+	 * Validate actions in flow rules
+	 * - Explicit decap action is prohibited by the tunnel offload API.
+	 * - Drop action in tunnel steer rule is prohibited by the API.
+	 * - Application cannot use MARK action because its value can mask
+	 *   tunnel default miss notification.
+	 * - JUMP in tunnel match rule has no support in current PMD
+	 *   implementation.
+	 * - TAG & META are reserved for future uses.
+	 */
+	if (action_flags & MLX5_FLOW_ACTION_TUNNEL_SET) {
+		uint64_t bad_actions_mask = MLX5_FLOW_ACTION_DECAP    |
+					    MLX5_FLOW_ACTION_MARK     |
+					    MLX5_FLOW_ACTION_SET_TAG  |
+					    MLX5_FLOW_ACTION_SET_META |
+					    MLX5_FLOW_ACTION_DROP;
+
+		if (action_flags & bad_actions_mask)
+			return rte_flow_error_set
+					(error, EINVAL,
+					RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					"Invalid RTE action in tunnel "
+					"set decap rule");
+		if (!(action_flags & MLX5_FLOW_ACTION_JUMP))
+			return rte_flow_error_set
+					(error, EINVAL,
+					RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					"tunnel set decap rule must terminate "
+					"with JUMP");
+		if (!attr->ingress)
+			return rte_flow_error_set
+					(error, EINVAL,
+					RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					"tunnel flows for ingress traffic only");
+	}
+	if (action_flags & MLX5_FLOW_ACTION_TUNNEL_MATCH) {
+		uint64_t bad_actions_mask = MLX5_FLOW_ACTION_JUMP    |
+					    MLX5_FLOW_ACTION_MARK    |
+					    MLX5_FLOW_ACTION_SET_TAG |
+					    MLX5_FLOW_ACTION_SET_META;
+
+		if (action_flags & bad_actions_mask)
+			return rte_flow_error_set
+					(error, EINVAL,
+					RTE_FLOW_ERROR_TYPE_ACTION, NULL,
+					"Invalid RTE action in tunnel "
+					"set match rule");
+	}
 	/*
 	 * Validate the drop action mutual exclusion with other actions.
 	 * Drop action is mutually-exclusive with any other action, except for
@@ -7876,6 +7973,9 @@ static struct mlx5_flow_tbl_resource *
 flow_dv_tbl_resource_get(struct rte_eth_dev *dev,
 			 uint32_t table_id, uint8_t egress,
 			 uint8_t transfer,
+			 bool external,
+			 const struct mlx5_flow_tunnel *tunnel,
+			 uint32_t group_id,
 			 struct rte_flow_error *error)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
@@ -7912,6 +8012,9 @@ flow_dv_tbl_resource_get(struct rte_eth_dev *dev,
 		return NULL;
 	}
 	tbl_data->idx = idx;
+	tbl_data->tunnel = tunnel;
+	tbl_data->group_id = group_id;
+	tbl_data->external = external;
 	tbl = &tbl_data->tbl;
 	pos = &tbl_data->entry;
 	if (transfer)
@@ -7975,6 +8078,41 @@ flow_dv_tbl_resource_release(struct rte_eth_dev *dev,
 
 		mlx5_flow_os_destroy_flow_tbl(tbl->obj);
 		tbl->obj = NULL;
+		if (is_tunnel_offload_active(dev) && tbl_data->external) {
+			struct mlx5_hlist_entry *he;
+			struct mlx5_hlist *tunnel_grp_hash;
+			struct mlx5_flow_tunnel_hub *thub =
+							mlx5_tunnel_hub(dev);
+			union tunnel_tbl_key tunnel_key = {
+				.tunnel_id = tbl_data->tunnel ?
+						tbl_data->tunnel->tunnel_id : 0,
+				.group = tbl_data->group_id
+			};
+			union mlx5_flow_tbl_key table_key = {
+				.v64 = pos->key
+			};
+			uint32_t table_id = table_key.table_id;
+
+			tunnel_grp_hash = tbl_data->tunnel ?
+						tbl_data->tunnel->groups :
+						thub->groups;
+			he = mlx5_hlist_lookup(tunnel_grp_hash, tunnel_key.val);
+			if (he) {
+				struct tunnel_tbl_entry *tte;
+				tte = container_of(he, typeof(*tte), hash);
+				MLX5_ASSERT(tte->flow_table == table_id);
+				mlx5_hlist_remove(tunnel_grp_hash, he);
+				mlx5_free(tte);
+			}
+			mlx5_flow_id_release(mlx5_tunnel_hub(dev)->table_ids,
+					     tunnel_flow_tbl_to_id(table_id));
+			DRV_LOG(DEBUG,
+				"port %u release table_id %#x tunnel %u group %u",
+				dev->data->port_id, table_id,
+				tbl_data->tunnel ?
+				tbl_data->tunnel->tunnel_id : 0,
+				tbl_data->group_id);
+		}
 		/* remove the entry from the hash list and free memory. */
 		mlx5_hlist_remove(sh->flow_tbls, pos);
 		mlx5_ipool_free(priv->sh->ipool[MLX5_IPOOL_JUMP],
@@ -8020,7 +8158,7 @@ flow_dv_matcher_register(struct rte_eth_dev *dev,
 	int ret;
 
 	tbl = flow_dv_tbl_resource_get(dev, key->table_id, key->direction,
-				       key->domain, error);
+				       key->domain, false, NULL, 0, error);
 	if (!tbl)
 		return -rte_errno;	/* No need to refill the error info */
 	tbl_data = container_of(tbl, struct mlx5_flow_tbl_data_entry, tbl);
@@ -8511,7 +8649,8 @@ flow_dv_sample_resource_register(struct rte_eth_dev *dev,
 	*cache_resource = *resource;
 	/* Create normal path table level */
 	tbl = flow_dv_tbl_resource_get(dev, next_ft_id,
-					attr->egress, attr->transfer, error);
+					attr->egress, attr->transfer,
+					dev_flow->external, NULL, 0, error);
 	if (!tbl) {
 		rte_flow_error_set(error, ENOMEM,
 					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
@@ -9123,6 +9262,12 @@ __flow_dv_translate(struct rte_eth_dev *dev,
 	int tmp_actions_n = 0;
 	uint32_t table;
 	int ret = 0;
+	const struct mlx5_flow_tunnel *tunnel;
+	struct flow_grp_info grp_info = {
+		.external = !!dev_flow->external,
+		.transfer = !!attr->transfer,
+		.fdb_def_rule = !!priv->fdb_def_rule,
+	};
 
 	memset(&mdest_res, 0, sizeof(struct mlx5_flow_dv_dest_array_resource));
 	memset(&sample_res, 0, sizeof(struct mlx5_flow_dv_sample_resource));
@@ -9130,8 +9275,17 @@ __flow_dv_translate(struct rte_eth_dev *dev,
 					   MLX5DV_FLOW_TABLE_TYPE_NIC_RX;
 	/* update normal path action resource into last index of array */
 	sample_act = &mdest_res.sample_act[MLX5_MAX_DEST_NUM - 1];
-	ret = mlx5_flow_group_to_table(attr, dev_flow->external, attr->group,
-				       !!priv->fdb_def_rule, &table, error);
+	tunnel = is_flow_tunnel_match_rule(dev, attr, items, actions) ?
+		 flow_items_to_tunnel(items) :
+		 is_flow_tunnel_steer_rule(dev, attr, items, actions) ?
+		 flow_actions_to_tunnel(actions) :
+		 dev_flow->tunnel ? dev_flow->tunnel : NULL;
+	mhdr_res->ft_type = attr->egress ? MLX5DV_FLOW_TABLE_TYPE_NIC_TX :
+					   MLX5DV_FLOW_TABLE_TYPE_NIC_RX;
+	grp_info.std_tbl_fix = tunnel_use_standard_attr_group_translate
+				(dev, tunnel, attr, items, actions);
+	ret = mlx5_flow_group_to_table(dev, tunnel, attr->group, &table,
+				       grp_info, error);
 	if (ret)
 		return ret;
 	dev_flow->dv.group = table;
@@ -9141,6 +9295,45 @@ __flow_dv_translate(struct rte_eth_dev *dev,
 		priority = dev_conf->flow_prio - 1;
 	/* number of actions must be set to 0 in case of dirty stack. */
 	mhdr_res->actions_num = 0;
+	if (is_flow_tunnel_match_rule(dev, attr, items, actions)) {
+		/*
+		 * do not add decap action if match rule drops packet
+		 * HW rejects rules with decap & drop
+		 */
+		bool add_decap = true;
+		const struct rte_flow_action *ptr = actions;
+		struct mlx5_flow_tbl_resource *tbl;
+
+		for (; ptr->type != RTE_FLOW_ACTION_TYPE_END; ptr++) {
+			if (ptr->type == RTE_FLOW_ACTION_TYPE_DROP) {
+				add_decap = false;
+				break;
+			}
+		}
+		if (add_decap) {
+			if (flow_dv_create_action_l2_decap(dev, dev_flow,
+							   attr->transfer,
+							   error))
+				return -rte_errno;
+			dev_flow->dv.actions[actions_n++] =
+					dev_flow->dv.encap_decap->action;
+			action_flags |= MLX5_FLOW_ACTION_DECAP;
+		}
+		/*
+		 * bind table_id with <group, table> for tunnel match rule.
+		 * Tunnel set rule establishes that bind in JUMP action handler.
+		 * Required for scenario when application creates tunnel match
+		 * rule before tunnel set rule.
+		 */
+		tbl = flow_dv_tbl_resource_get(dev, table, attr->egress,
+					       attr->transfer,
+					       !!dev_flow->external, tunnel,
+					       attr->group, error);
+		if (!tbl)
+			return rte_flow_error_set
+			       (error, EINVAL, RTE_FLOW_ERROR_TYPE_ACTION,
+			       actions, "cannot register tunnel group");
+	}
 	for (; !actions_end ; actions++) {
 		const struct rte_flow_action_queue *queue;
 		const struct rte_flow_action_rss *rss;
@@ -9161,6 +9354,9 @@ __flow_dv_translate(struct rte_eth_dev *dev,
 						  actions,
 						  "action not supported");
 		switch (action_type) {
+		case MLX5_RTE_FLOW_ACTION_TYPE_TUNNEL_SET:
+			action_flags |= MLX5_FLOW_ACTION_TUNNEL_SET;
+			break;
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			break;
 		case RTE_FLOW_ACTION_TYPE_PORT_ID:
@@ -9404,18 +9600,18 @@ __flow_dv_translate(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_JUMP:
 			jump_group = ((const struct rte_flow_action_jump *)
 							action->conf)->group;
-			if (dev_flow->external && jump_group <
-					MLX5_MAX_TABLES_EXTERNAL)
-				jump_group *= MLX5_FLOW_TABLE_FACTOR;
-			ret = mlx5_flow_group_to_table(attr, dev_flow->external,
+			grp_info.std_tbl_fix = 0;
+			ret = mlx5_flow_group_to_table(dev, tunnel,
 						       jump_group,
-						       !!priv->fdb_def_rule,
-						       &table, error);
+						       &table,
+						       grp_info, error);
 			if (ret)
 				return ret;
-			tbl = flow_dv_tbl_resource_get(dev, table,
-						       attr->egress,
-						       attr->transfer, error);
+			tbl = flow_dv_tbl_resource_get(dev, table, attr->egress,
+						       attr->transfer,
+						       !!dev_flow->external,
+						       tunnel, jump_group,
+						       error);
 			if (!tbl)
 				return rte_flow_error_set
 						(error, errno,
@@ -11439,7 +11635,8 @@ flow_dv_prepare_mtr_tables(struct rte_eth_dev *dev,
 		dtb = &mtb->ingress;
 	/* Create the meter table with METER level. */
 	dtb->tbl = flow_dv_tbl_resource_get(dev, MLX5_FLOW_TABLE_LEVEL_METER,
-					    egress, transfer, &error);
+					    egress, transfer, false, NULL, 0,
+					    &error);
 	if (!dtb->tbl) {
 		DRV_LOG(ERR, "Failed to create meter policer table.");
 		return -1;
@@ -11447,7 +11644,8 @@ flow_dv_prepare_mtr_tables(struct rte_eth_dev *dev,
 	/* Create the meter suffix table with SUFFIX level. */
 	dtb->sfx_tbl = flow_dv_tbl_resource_get(dev,
 					    MLX5_FLOW_TABLE_LEVEL_SUFFIX,
-					    egress, transfer, &error);
+					    egress, transfer, false, NULL, 0,
+					    &error);
 	if (!dtb->sfx_tbl) {
 		DRV_LOG(ERR, "Failed to create meter suffix table.");
 		return -1;
@@ -11766,10 +11964,10 @@ mlx5_flow_dv_discover_counter_offset_support(struct rte_eth_dev *dev)
 	void *flow = NULL;
 	int i, ret = -1;
 
-	tbl = flow_dv_tbl_resource_get(dev, 0, 0, 0, NULL);
+	tbl = flow_dv_tbl_resource_get(dev, 0, 0, 0, false, NULL, 0, NULL);
 	if (!tbl)
 		goto err;
-	dest_tbl = flow_dv_tbl_resource_get(dev, 1, 0, 0, NULL);
+	dest_tbl = flow_dv_tbl_resource_get(dev, 1, 0, 0, false, NULL, 0, NULL);
 	if (!dest_tbl)
 		goto err;
 	dcs = mlx5_devx_cmd_flow_counter_alloc(priv->sh->ctx, 0x4);
-- 
2.28.0


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH v5] net/mlx5: implement tunnel offload API
  2020-10-25 14:08 ` [dpdk-dev] [PATCH v5] " Gregory Etelson
@ 2020-10-25 15:01   ` Raslan Darawsheh
  0 siblings, 0 replies; 95+ messages in thread
From: Raslan Darawsheh @ 2020-10-25 15:01 UTC (permalink / raw)
  To: Gregory Etelson, dev
  Cc: Matan Azrad, Eli Britstein, Oz Shlomo, Slava Ovsiienko, Shahaf Shuler

Hi,

> -----Original Message-----
> From: Gregory Etelson <getelson@nvidia.com>
> Sent: Sunday, October 25, 2020 4:08 PM
> To: dev@dpdk.org
> Cc: Gregory Etelson <getelson@nvidia.com>; Matan Azrad
> <matan@nvidia.com>; Raslan Darawsheh <rasland@nvidia.com>; Eli Britstein
> <elibr@nvidia.com>; Oz Shlomo <ozsh@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>
> Subject: [PATCH v5] net/mlx5: implement tunnel offload API
> 
> Tunnel Offload API provides hardware independent, unified model
> to offload tunneled traffic. Key model elements are:
>  - apply matches to both outer and inner packet headers
>    during entire offload procedure;
>  - restore outer header of partially offloaded packet;
>  - model is implemented as a set of helper functions.
> 
> Implementation details:
> * tunnel_offload PMD parameter must be set to 1 to enable the feature.
> * application cannot use MARK and META flow actions with tunnel offload.
> * offload JUMP action is restricted to steering tunnel rule only.
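> 
> A purely illustrative invocation enabling the feature (the device
> address and the application are arbitrary here):
> 
>     dpdk-testpmd -a 0000:03:00.0,tunnel_offload=1 -- -i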
> 
> Signed-off-by: Gregory Etelson <getelson@nvidia.com>
> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> ---
>  doc/guides/nics/mlx5.rst         |   3 +
>  drivers/net/mlx5/linux/mlx5_os.c |  18 +
>  drivers/net/mlx5/mlx5.c          |   8 +-
>  drivers/net/mlx5/mlx5.h          |   3 +
>  drivers/net/mlx5/mlx5_defs.h     |   2 +
>  drivers/net/mlx5/mlx5_flow.c     | 681
> ++++++++++++++++++++++++++++++-
>  drivers/net/mlx5/mlx5_flow.h     | 171 +++++++-
>  drivers/net/mlx5/mlx5_flow_dv.c  | 254 ++++++++++--
>  8 files changed, 1086 insertions(+), 54 deletions(-)
> 

Patch applied to next-net-mlx,

Kindest regards,
Raslan Darawsheh

^ permalink raw reply	[flat|nested] 95+ messages in thread

* [dpdk-dev] [PATCH] net/mlx5: tunnel offload code cleanup
  2020-06-25 16:03 [dpdk-dev] [PATCH 0/2] ethdev: tunnel offload model Gregory Etelson
                   ` (15 preceding siblings ...)
  2020-10-25 14:08 ` [dpdk-dev] [PATCH v5] " Gregory Etelson
@ 2020-10-27 16:12 ` Gregory Etelson
  2020-10-27 16:29   ` Slava Ovsiienko
  2020-10-27 17:16   ` Raslan Darawsheh
  2020-10-28  4:58 ` [dpdk-dev] [PATCH] net/mlx5: fix tunnel flow destroy Gregory Etelson
  17 siblings, 2 replies; 95+ messages in thread
From: Gregory Etelson @ 2020-10-27 16:12 UTC (permalink / raw)
  To: dev
  Cc: getelson, matan, rasland, elibr, ozsh, Shahaf Shuler,
	Viacheslav Ovsiienko

Remove unused reference to rte_hash structure from PMD tunnel offload
code.

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
---
 drivers/net/mlx5/mlx5.h      | 1 -
 drivers/net/mlx5/mlx5_flow.c | 1 -
 2 files changed, 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 09026288fc..88bbd316f0 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -648,7 +648,6 @@ struct mlx5_dev_ctx_shared {
 	/* UAR same-page access control required in 32bit implementations. */
 #endif
 	struct mlx5_hlist *flow_tbls;
-	struct rte_hash *flow_tbl_map; /* app group-to-flow table map */
 	struct mlx5_flow_tunnel_hub *tunnel_hub;
 	/* Direct Rules tables for FDB, NIC TX+RX */
 	void *esw_drop_action; /* Pointer to DR E-Switch drop action. */
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 02e19e83ae..082d886c74 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -19,7 +19,6 @@
 #include <rte_flow_driver.h>
 #include <rte_malloc.h>
 #include <rte_ip.h>
-#include <rte_hash.h>
 
 #include <mlx5_glue.h>
 #include <mlx5_devx_cmds.h>
-- 
2.28.0


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH] net/mlx5: tunnel offload code cleanup
  2020-10-27 16:12 ` [dpdk-dev] [PATCH] net/mlx5: tunnel offload code cleanup Gregory Etelson
@ 2020-10-27 16:29   ` Slava Ovsiienko
  2020-10-27 17:16   ` Raslan Darawsheh
  1 sibling, 0 replies; 95+ messages in thread
From: Slava Ovsiienko @ 2020-10-27 16:29 UTC (permalink / raw)
  To: Gregory Etelson, dev
  Cc: Matan Azrad, Raslan Darawsheh, Eli Britstein, Oz Shlomo, Shahaf Shuler

> -----Original Message-----
> From: Gregory Etelson <getelson@nvidia.com>
> Sent: Tuesday, October 27, 2020 18:12
> To: dev@dpdk.org
> Cc: Gregory Etelson <getelson@nvidia.com>; Matan Azrad
> <matan@nvidia.com>; Raslan Darawsheh <rasland@nvidia.com>; Eli Britstein
> <elibr@nvidia.com>; Oz Shlomo <ozsh@nvidia.com>; Shahaf Shuler
> <shahafs@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> Subject: [PATCH] net/mlx5: tunnel offload code cleanup
> 
> Remove unused reference to rte_hash structure from PMD tunnel offload
> code.
> 
> Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH] net/mlx5: tunnel offload code cleanup
  2020-10-27 16:12 ` [dpdk-dev] [PATCH] net/mlx5: tunnel offload code cleanup Gregory Etelson
  2020-10-27 16:29   ` Slava Ovsiienko
@ 2020-10-27 17:16   ` Raslan Darawsheh
  2020-10-28 12:33     ` Andrew Rybchenko
  1 sibling, 1 reply; 95+ messages in thread
From: Raslan Darawsheh @ 2020-10-27 17:16 UTC (permalink / raw)
  To: Gregory Etelson, dev, ferruh.yigit
  Cc: Matan Azrad, Eli Britstein, Oz Shlomo, Shahaf Shuler, Slava Ovsiienko

Hi,

> -----Original Message-----
> From: Gregory Etelson <getelson@nvidia.com>
> Sent: Tuesday, October 27, 2020 6:12 PM
> To: dev@dpdk.org
> Cc: Gregory Etelson <getelson@nvidia.com>; Matan Azrad
> <matan@nvidia.com>; Raslan Darawsheh <rasland@nvidia.com>; Eli Britstein
> <elibr@nvidia.com>; Oz Shlomo <ozsh@nvidia.com>; Shahaf Shuler
> <shahafs@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> Subject: [PATCH] net/mlx5: tunnel offload code cleanup
> 
> Remove unused reference to rte_hash structure from PMD tunnel offload
> code.
> 
Added missing fixes line:
Fixes: dc67aa65c698 ("net/mlx5: implement tunnel offload API")

> Signed-off-by: Gregory Etelson <getelson@nvidia.com>
> ---
>  drivers/net/mlx5/mlx5.h      | 1 -
>  drivers/net/mlx5/mlx5_flow.c | 1 -
>  2 files changed, 2 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
> index 09026288fc..88bbd316f0 100644
> --- a/drivers/net/mlx5/mlx5.h
> +++ b/drivers/net/mlx5/mlx5.h
> @@ -648,7 +648,6 @@ struct mlx5_dev_ctx_shared {
>  	/* UAR same-page access control required in 32bit implementations.
> */
>  #endif
>  	struct mlx5_hlist *flow_tbls;
> -	struct rte_hash *flow_tbl_map; /* app group-to-flow table map */
>  	struct mlx5_flow_tunnel_hub *tunnel_hub;
>  	/* Direct Rules tables for FDB, NIC TX+RX */
>  	void *esw_drop_action; /* Pointer to DR E-Switch drop action. */
> diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
> index 02e19e83ae..082d886c74 100644
> --- a/drivers/net/mlx5/mlx5_flow.c
> +++ b/drivers/net/mlx5/mlx5_flow.c
> @@ -19,7 +19,6 @@
>  #include <rte_flow_driver.h>
>  #include <rte_malloc.h>
>  #include <rte_ip.h>
> -#include <rte_hash.h>
> 
>  #include <mlx5_glue.h>
>  #include <mlx5_devx_cmds.h>
> --
> 2.28.0

Patch applied to next-net-mlx,
@ferruh.yigit@intel.com,
Can you kindly squash it into relevant commit in next-net ?

Kindest regards
Raslan Darawsheh

^ permalink raw reply	[flat|nested] 95+ messages in thread

* [dpdk-dev] [PATCH] net/mlx5: fix tunnel flow destroy
  2020-06-25 16:03 [dpdk-dev] [PATCH 0/2] ethdev: tunnel offload model Gregory Etelson
                   ` (16 preceding siblings ...)
  2020-10-27 16:12 ` [dpdk-dev] [PATCH] net/mlx5: tunnel offload code cleanup Gregory Etelson
@ 2020-10-28  4:58 ` Gregory Etelson
  2020-11-02 16:27   ` Raslan Darawsheh
  17 siblings, 1 reply; 95+ messages in thread
From: Gregory Etelson @ 2020-10-28  4:58 UTC (permalink / raw)
  To: dev
  Cc: getelson, matan, rasland, elibr, ozsh, Shahaf Shuler,
	Viacheslav Ovsiienko

Flow destructor tried to access flow-related resources after the
flow object memory was already released, crashing the DPDK process.

The patch moves the flow memory release to the end of the destructor.

Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/net/mlx5/mlx5_flow.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 949b9ced9b..3128da1b47 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -5935,7 +5935,6 @@ flow_list_destroy(struct rte_eth_dev *dev, uint32_t *list,
 			mlx5_free(priv_fdir_flow);
 		}
 	}
-	mlx5_ipool_free(priv->sh->ipool[MLX5_IPOOL_RTE_FLOW], flow_idx);
 	if (flow->tunnel) {
 		struct mlx5_flow_tunnel *tunnel;
 		tunnel = mlx5_find_tunnel_id(dev, flow->tunnel_id);
@@ -5943,6 +5942,7 @@ flow_list_destroy(struct rte_eth_dev *dev, uint32_t *list,
 		if (!__atomic_sub_fetch(&tunnel->refctn, 1, __ATOMIC_RELAXED))
 			mlx5_flow_tunnel_free(dev, tunnel);
 	}
+	mlx5_ipool_free(priv->sh->ipool[MLX5_IPOOL_RTE_FLOW], flow_idx);
 }
 
 /**
-- 
2.28.0


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH] net/mlx5: tunnel offload code cleanup
  2020-10-27 17:16   ` Raslan Darawsheh
@ 2020-10-28 12:33     ` Andrew Rybchenko
  0 siblings, 0 replies; 95+ messages in thread
From: Andrew Rybchenko @ 2020-10-28 12:33 UTC (permalink / raw)
  To: Raslan Darawsheh, Gregory Etelson, dev, ferruh.yigit
  Cc: Matan Azrad, Eli Britstein, Oz Shlomo, Shahaf Shuler, Slava Ovsiienko

On 10/27/20 8:16 PM, Raslan Darawsheh wrote:
> Hi,
> 
>> -----Original Message-----
>> From: Gregory Etelson <getelson@nvidia.com>
>> Sent: Tuesday, October 27, 2020 6:12 PM
>> To: dev@dpdk.org
>> Cc: Gregory Etelson <getelson@nvidia.com>; Matan Azrad
>> <matan@nvidia.com>; Raslan Darawsheh <rasland@nvidia.com>; Eli Britstein
>> <elibr@nvidia.com>; Oz Shlomo <ozsh@nvidia.com>; Shahaf Shuler
>> <shahafs@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
>> Subject: [PATCH] net/mlx5: tunnel offload code cleanup
>>
>> Remove unused reference to rte_hash structure from PMD tunnel offload
>> code.
>>
> Added missing fixes line:
> Fixes: dc67aa65c698 ("net/mlx5: implement tunnel offload API")
> 
>> Signed-off-by: Gregory Etelson <getelson@nvidia.com>
>> ---
>>  drivers/net/mlx5/mlx5.h      | 1 -
>>  drivers/net/mlx5/mlx5_flow.c | 1 -
>>  2 files changed, 2 deletions(-)
>>
>> diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
>> index 09026288fc..88bbd316f0 100644
>> --- a/drivers/net/mlx5/mlx5.h
>> +++ b/drivers/net/mlx5/mlx5.h
>> @@ -648,7 +648,6 @@ struct mlx5_dev_ctx_shared {
>>  	/* UAR same-page access control required in 32bit implementations.
>> */
>>  #endif
>>  	struct mlx5_hlist *flow_tbls;
>> -	struct rte_hash *flow_tbl_map; /* app group-to-flow table map */
>>  	struct mlx5_flow_tunnel_hub *tunnel_hub;
>>  	/* Direct Rules tables for FDB, NIC TX+RX */
>>  	void *esw_drop_action; /* Pointer to DR E-Switch drop action. */
>> diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
>> index 02e19e83ae..082d886c74 100644
>> --- a/drivers/net/mlx5/mlx5_flow.c
>> +++ b/drivers/net/mlx5/mlx5_flow.c
>> @@ -19,7 +19,6 @@
>>  #include <rte_flow_driver.h>
>>  #include <rte_malloc.h>
>>  #include <rte_ip.h>
>> -#include <rte_hash.h>
>>
>>  #include <mlx5_glue.h>
>>  #include <mlx5_devx_cmds.h>
>> --
>> 2.28.0
> 
> Patch applied to next-net-mlx,
> @ferruh.yigit@intel.com,
> Can you kindly squash it into relevant commit in next-net ?

Done, squashed it into relevant commit in next-net/main, thanks.


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH] net/mlx5: fix tunnel flow destroy
  2020-10-28  4:58 ` [dpdk-dev] [PATCH] net/mlx5: fix tunnel flow destroy Gregory Etelson
@ 2020-11-02 16:27   ` Raslan Darawsheh
  0 siblings, 0 replies; 95+ messages in thread
From: Raslan Darawsheh @ 2020-11-02 16:27 UTC (permalink / raw)
  To: Gregory Etelson, dev
  Cc: Matan Azrad, Eli Britstein, Oz Shlomo, Shahaf Shuler, Slava Ovsiienko

Hi,

> -----Original Message-----
> From: Gregory Etelson <getelson@nvidia.com>
> Sent: Wednesday, October 28, 2020 6:58 AM
> To: dev@dpdk.org
> Cc: Gregory Etelson <getelson@nvidia.com>; Matan Azrad
> <matan@nvidia.com>; Raslan Darawsheh <rasland@nvidia.com>; Eli Britstein
> <elibr@nvidia.com>; Oz Shlomo <ozsh@nvidia.com>; Shahaf Shuler
> <shahafs@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> Subject: [PATCH] net/mlx5: fix tunnel flow destroy
> 
> Flow destructor tired to access flow related resources after the
> flow object memory was already released and crashed dpdk process.
> 
> The patch moves flow memory release to the end of destructor.
> 
Added missing:
Fixes: 424ea8d8fe55 ("net/mlx5: implement tunnel offload")
> Signed-off-by: Gregory Etelson <getelson@nvidia.com>
> Acked-by: Matan Azrad <matan@nvidia.com>
> ---
>  drivers/net/mlx5/mlx5_flow.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
Patch applied to next-net-mlx,

Kindest regards,
Raslan Darawsheh

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH v8 2/3] ethdev: tunnel offload model
  2020-10-16 12:51   ` [dpdk-dev] [PATCH v8 2/3] ethdev: tunnel offload model Gregory Etelson
  2020-10-16 15:41     ` Kinsella, Ray
@ 2021-03-02  9:22     ` Ivan Malov
  2021-03-02  9:42       ` Thomas Monjalon
  2021-03-08 14:01       ` Gregory Etelson
  1 sibling, 2 replies; 95+ messages in thread
From: Ivan Malov @ 2021-03-02  9:22 UTC (permalink / raw)
  To: Gregory Etelson, dev
  Cc: matan, rasland, elibr, ozsh, asafp, Eli Britstein, Ori Kam,
	Viacheslav Ovsiienko, Ray Kinsella, Neil Horman, Thomas Monjalon,
	Ferruh Yigit, Andrew Rybchenko

Hi Gregory, Eli,

On 16/10/2020 15:51, Gregory Etelson wrote:
 > Applications wishing to offload tunneled traffic are required to use
 > the rte_flow primitives, such as group, meta, mark, tag, and others to
 > model their high-level objects.  The hardware model design for

As far as I understand, the model is an abstraction of sorts, and the 
quoted lines provide examples of primitives which applications would 
have to use if the model did not exist.

Looking at the implementation, I don't quite discern any means to let 
the application query PMD-specific values of "rte_flow_attr". In 
example, the PMD may need to enforce "transfer=1" both for the "tunnel 
set" rule and for "tunnel match" one. What if "group" and "priority" 
values also need to be implicitly controlled by the PMD for the tunnel 
offload rules?

Have you considered adding an abstraction for "rte_flow_attr" in terms 
of this model?
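
To make this part of the question concrete, a hypothetical helper along
the lines below is what I have in mind (the name and the signature are
made up purely for illustration; nothing like this exists in the series):

    /* Hypothetical: let the PMD report the flow rule attributes it
     * expects applications to use for the "tunnel set" rule and for
     * the "tunnel match" rule of the given tunnel.
     */
    int
    rte_flow_tunnel_attr_get(uint16_t port_id,
                             struct rte_flow_tunnel *tunnel,
                             struct rte_flow_attr *set_rule_attr,
                             struct rte_flow_attr *match_rule_attr,
                             struct rte_flow_error *error);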

 > The first VXLAN packet that arrives matches the rule in group 0 and
 > jumps to group 3.  In group 3 the packet will miss since there is no

This example considers jumping from group 0 to group 3. First of all, 
it's unclear whether this model intentionally lets the application 
choose group values freely (see my question above) or simply lacks an 
interface to let the application use values enforced by the PMD (if 
any). Secondly, given the fact that existing description of 
"rte_flow_attr" does not shed any light on how groups are ordered by 
default, when no JUMPs are configured (it only explains how priority 
levels are ordered within the given group but not how groups are 
ordered), it's unclear whether the model intentionally permits the 
application to jump between arbitrary groups (in example, from 0 to 3) 
and not necessarily between, say, 0 to 1. More to that, it's unclear 
whether the model lets the application jump from, say, group 0 to the 
very same group, 0 ("recirculation") or not. Or is the "recirculation" 
is in fact the main scenario in the model? Could you please elaborate on 
the model's expectations of groups?

Thank you.

-- 
Ivan M

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH v8 2/3] ethdev: tunnel offload model
  2021-03-02  9:22     ` Ivan Malov
@ 2021-03-02  9:42       ` Thomas Monjalon
  2021-03-03 14:03         ` Ivan Malov
  2021-03-08 14:01       ` Gregory Etelson
  1 sibling, 1 reply; 95+ messages in thread
From: Thomas Monjalon @ 2021-03-02  9:42 UTC (permalink / raw)
  To: Gregory Etelson, Ivan Malov
  Cc: dev, matan, rasland, elibr, ozsh, asafp, Eli Britstein, Ori Kam,
	Viacheslav Ovsiienko, Ray Kinsella, Neil Horman, Ferruh Yigit,
	Andrew Rybchenko

02/03/2021 10:22, Ivan Malov:
> Hi Gregory, Eli,
> 
> On 16/10/2020 15:51, Gregory Etelson wrote:
>  > Applications wishing to offload tunneled traffic are required to use
>  > the rte_flow primitives, such as group, meta, mark, tag, and others to
>  > model their high-level objects.  The hardware model design for
> 
> As far as I understand, the model is an abstraction of sorts, and the 
> quoted lines provide examples of primitives which applications would 
> have to use if the model did not exist.
> 
> Looking at the implementation, I don't quite discern any means to let 
> the application query PMD-specific values of "rte_flow_attr". In 
> example, the PMD may need to enforce "transfer=1" both for the "tunnel 
> set" rule and for "tunnel match" one. What if "group" and "priority" 
> values also need to be implicitly controlled by the PMD for the tunnel 
> offload rules?
> 
> Have you considered adding an abstraction for "rte_flow_attr" in terms 
> of this model?
> 
>  > The first VXLAN packet that arrives matches the rule in group 0 and
>  > jumps to group 3.  In group 3 the packet will miss since there is no
> 
> This example considers jumping from group 0 to group 3. First of all, 
> it's unclear whether this model intentionally lets the application 
> choose group values freely (see my question above) or simply lacks an 
> interface to let the application use values enforced by the PMD (if 
> any). Secondly, given the fact that existing description of 
> "rte_flow_attr" does not shed any light on how groups are ordered by 
> default, when no JUMPs are configured (it only explains how priority 
> levels are ordered within the given group but not how groups are 
> ordered), it's unclear whether the model intentionally permits the 
> application to jump between arbitrary groups (in example, from 0 to 3) 
> and not necessarily between, say, 0 to 1. More to that, it's unclear 
> whether the model lets the application jump from, say, group 0 to the 
> very same group, 0 ("recirculation") or not. Or is the "recirculation" 
> is in fact the main scenario in the model? Could you please elaborate on 
> the model's expectations of groups?
> 
> Thank you.

Given the questions, I think the discussion should be concluded
(when done) with a patch updating the API description.
Thanks for the questions and clarifications to come :)





^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH v8 2/3] ethdev: tunnel offload model
  2021-03-02  9:42       ` Thomas Monjalon
@ 2021-03-03 14:03         ` Ivan Malov
  2021-03-04  6:35           ` Eli Britstein
  0 siblings, 1 reply; 95+ messages in thread
From: Ivan Malov @ 2021-03-03 14:03 UTC (permalink / raw)
  To: Thomas Monjalon, Gregory Etelson
  Cc: dev, matan, rasland, elibr, ozsh, asafp, Eli Britstein, Ori Kam,
	Viacheslav Ovsiienko, Ray Kinsella, Neil Horman, Ferruh Yigit,
	Andrew Rybchenko

Hi,

Could someone please also clarify the meaning (from the PMD's standpoint
and from the application's standpoint) of the 64-bit field "tun_id" in
"rte_flow_tunnel"? The current comment reads: "Tunnel identification",
which is quite ambiguous, not to mention the fact that it simply expands
the field's name. Does this ID have anything to do with VXLAN's VNI
(which is a 24-bit field)?

As far as I can tell from the testpmd patch series, this field gets
printed only if it's non-zero. Hmm. A special value of sorts? Maybe
clarify this, too?

Thank you.

On 02/03/2021 12:42, Thomas Monjalon wrote:
> 02/03/2021 10:22, Ivan Malov:
>> Hi Gregory, Eli,
>>
>> On 16/10/2020 15:51, Gregory Etelson wrote:
>>   > Applications wishing to offload tunneled traffic are required to use
>>   > the rte_flow primitives, such as group, meta, mark, tag, and others to
>>   > model their high-level objects.  The hardware model design for
>>
>> As far as I understand, the model is an abstraction of sorts, and the
>> quoted lines provide examples of primitives which applications would
>> have to use if the model did not exist.
>>
>> Looking at the implementation, I don't quite discern any means to let
>> the application query PMD-specific values of "rte_flow_attr". In
>> example, the PMD may need to enforce "transfer=1" both for the "tunnel
>> set" rule and for "tunnel match" one. What if "group" and "priority"
>> values also need to be implicitly controlled by the PMD for the tunnel
>> offload rules?
>>
>> Have you considered adding an abstraction for "rte_flow_attr" in terms
>> of this model?
>>
>>   > The first VXLAN packet that arrives matches the rule in group 0 and
>>   > jumps to group 3.  In group 3 the packet will miss since there is no
>>
>> This example considers jumping from group 0 to group 3. First of all,
>> it's unclear whether this model intentionally lets the application
>> choose group values freely (see my question above) or simply lacks an
>> interface to let the application use values enforced by the PMD (if
>> any). Secondly, given the fact that existing description of
>> "rte_flow_attr" does not shed any light on how groups are ordered by
>> default, when no JUMPs are configured (it only explains how priority
>> levels are ordered within the given group but not how groups are
>> ordered), it's unclear whether the model intentionally permits the
>> application to jump between arbitrary groups (in example, from 0 to 3)
>> and not necessarily between, say, 0 to 1. More to that, it's unclear
>> whether the model lets the application jump from, say, group 0 to the
>> very same group, 0 ("recirculation") or not. Or is the "recirculation"
>> is in fact the main scenario in the model? Could you please elaborate on
>> the model's expectations of groups?
>>
>> Thank you.
> 
> Given the questions, I think the discussion should be concluded
> (when done) with a patch updating the API description.
> Thanks for the questions and clarifications to come :)
> 
> 
> 

-- 
Ivan M

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH v8 2/3] ethdev: tunnel offload model
  2021-03-03 14:03         ` Ivan Malov
@ 2021-03-04  6:35           ` Eli Britstein
  0 siblings, 0 replies; 95+ messages in thread
From: Eli Britstein @ 2021-03-04  6:35 UTC (permalink / raw)
  To: Ivan Malov, Thomas Monjalon, Gregory Etelson
  Cc: dev, matan, rasland, ozsh, asafp, Eli Britstein, Ori Kam,
	Viacheslav Ovsiienko, Ray Kinsella, Neil Horman, Ferruh Yigit,
	Andrew Rybchenko


On 3/3/2021 4:03 PM, Ivan Malov wrote:
> Hi,
>
> Could someone please also clarify the meaning (from PMD standpoint and 
> from application's standpoint) of the 64-bit field "tun_id" in 
> "rte_flow_tunnel"? The current comment reads: "Tunnel identification", 
> which is quite ambiguous, not to mention the fact that it simply 
> expands the field's name. Does this ID have anything to do with 
> VXLAN's VNI (which is 24-bit field)?

The rte_flow_tunnel struct carries the tunnel details communicated
between the application and the PMD when applying flows, and from the
PMD back to the application upon a miss.

Its fields are a port of the kernel's 'struct ip_tunnel_key'. See
https://elixir.bootlin.com/linux/latest/source/include/net/ip_tunnels.h#L40

Regarding 'tun_id' - yes. In the case of a VXLAN tunnel it stores the VNI.
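
For example (just a sketch - the field names follow the struct as posted
in this series and the values are made up), a VXLAN tunnel with VNI 100
could be described roughly as:

    struct rte_flow_tunnel tunnel = {
        .type   = RTE_FLOW_ITEM_TYPE_VXLAN,
        .tun_id = 100,                /* the VXLAN VNI */
        .tp_dst = RTE_BE16(4789),     /* VXLAN UDP destination port */
    };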

>
> As far as I can learn from the testpmd patch series, this field gets 
> printed only if it's non-zero. Hmm. A special value of sorts? Maybe 
> clarify this, too?
>
> Thank you.
>
> On 02/03/2021 12:42, Thomas Monjalon wrote:
>> 02/03/2021 10:22, Ivan Malov:
>>> Hi Gregory, Eli,
>>>
>>> On 16/10/2020 15:51, Gregory Etelson wrote:
>>>   > Applications wishing to offload tunneled traffic are required to 
>>> use
>>>   > the rte_flow primitives, such as group, meta, mark, tag, and 
>>> others to
>>>   > model their high-level objects.  The hardware model design for
>>>
>>> As far as I understand, the model is an abstraction of sorts, and the
>>> quoted lines provide examples of primitives which applications would
>>> have to use if the model did not exist.
>>>
>>> Looking at the implementation, I don't quite discern any means to let
>>> the application query PMD-specific values of "rte_flow_attr". In
>>> example, the PMD may need to enforce "transfer=1" both for the "tunnel
>>> set" rule and for "tunnel match" one. What if "group" and "priority"
>>> values also need to be implicitly controlled by the PMD for the tunnel
>>> offload rules?
>>>
>>> Have you considered adding an abstraction for "rte_flow_attr" in terms
>>> of this model?
>>>
>>>   > The first VXLAN packet that arrives matches the rule in group 0 and
>>>   > jumps to group 3.  In group 3 the packet will miss since there 
>>> is no
>>>
>>> This example considers jumping from group 0 to group 3. First of all,
>>> it's unclear whether this model intentionally lets the application
>>> choose group values freely (see my question above) or simply lacks an
>>> interface to let the application use values enforced by the PMD (if
>>> any). Secondly, given the fact that existing description of
>>> "rte_flow_attr" does not shed any light on how groups are ordered by
>>> default, when no JUMPs are configured (it only explains how priority
>>> levels are ordered within the given group but not how groups are
>>> ordered), it's unclear whether the model intentionally permits the
>>> application to jump between arbitrary groups (in example, from 0 to 3)
>>> and not necessarily between, say, 0 to 1. More to that, it's unclear
>>> whether the model lets the application jump from, say, group 0 to the
>>> very same group, 0 ("recirculation") or not. Or is the "recirculation"
>>> is in fact the main scenario in the model? Could you please 
>>> elaborate on
>>> the model's expectations of groups?
>>>
>>> Thank you.
>>
>> Given the questions, I think the discussion should be concluded
>> (when done) with a patch updating the API description.
>> Thanks for the questions and clarifications to come :)
>>
>>
>>
>

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [dpdk-dev] [PATCH v8 2/3] ethdev: tunnel offload model
  2021-03-02  9:22     ` Ivan Malov
  2021-03-02  9:42       ` Thomas Monjalon
@ 2021-03-08 14:01       ` Gregory Etelson
  1 sibling, 0 replies; 95+ messages in thread
From: Gregory Etelson @ 2021-03-08 14:01 UTC (permalink / raw)
  To: Ivan Malov, dev
  Cc: Matan Azrad, Raslan Darawsheh, Eli Britstein, Oz Shlomo,
	Asaf Penso, Eli Britstein, Ori Kam, Slava Ovsiienko,
	Ray Kinsella, Neil Horman, NBU-Contact-Thomas Monjalon,
	Ferruh Yigit, Andrew Rybchenko

Hello,

Please find my answers below.

Regards,
Gregory

> Hi Gregory, Eli,
> 
> On 16/10/2020 15:51, Gregory Etelson wrote:
>  > Applications wishing to offload tunneled traffic are required to use
>  > the rte_flow primitives, such as group, meta, mark, tag, and others to
>  > model their high-level objects.  The hardware model design for
> 
> As far as I understand, the model is an abstraction of sorts, and the quoted
> lines provide examples of primitives which applications would have to use if
> the model did not exist.
> 
> Looking at the implementation, I don't quite discern any means to let the
> application query PMD-specific values of "rte_flow_attr". In example, the
> PMD may need to enforce "transfer=1" both for the "tunnel set" rule and
> for "tunnel match" one. What if "group" and "priority"
> values also need to be implicitly controlled by the PMD for the tunnel
> offload rules?
> 
> Have you considered adding an abstraction for "rte_flow_attr" in terms of
> this model?
> 


Tunnel offload introduces an application model to recover packet outer
headers after a partial miss. We were able to build the model with the
addition of only the rte_flow_tunnel_decap_set & rte_flow_tunnel_match
functions. Please note that, besides returning private flow rule elements,
these functions can provide internal PMD hints on tunnel flow rule
processing. The PMD can use these hints, together with the private PMD
rule elements, to modify any part of an incoming tunnel flow rule -
attributes, items or actions - and implement the model according to the
hardware specification.
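
For illustration, an application-side sketch of the steering rule step
(error handling trimmed; port_id, tunnel, app_group and a pattern[]
matching the tunnel headers are assumed to be prepared by the
application):

    struct rte_flow_error err;
    struct rte_flow_action *pmd_actions = NULL;
    uint32_t num_pmd = 0;
    uint32_t i;

    /* PMD-private actions for this tunnel; may be queried more than once. */
    rte_flow_tunnel_decap_set(port_id, &tunnel, &pmd_actions, &num_pmd, &err);

    /* Steering rule actions: {PMD actions} followed by the application's
     * JUMP to the group holding its tunnel match rules. */
    struct rte_flow_action_jump jump = { .group = app_group };
    struct rte_flow_action actions[num_pmd + 2];

    for (i = 0; i < num_pmd; i++)
        actions[i] = pmd_actions[i];
    actions[i++] = (struct rte_flow_action){
        .type = RTE_FLOW_ACTION_TYPE_JUMP, .conf = &jump };
    actions[i] = (struct rte_flow_action){ .type = RTE_FLOW_ACTION_TYPE_END };

    struct rte_flow_attr attr = { .group = 0, .ingress = 1 };
    struct rte_flow *tsr = rte_flow_create(port_id, &attr, pattern,
                                           actions, &err);
    /* pmd_actions can later be returned to the PMD with
     * rte_flow_tunnel_action_decap_release(). */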

>  > The first VXLAN packet that arrives matches the rule in group 0 and
>  > jumps to group 3.  In group 3 the packet will miss since there is no
> 
> This example considers jumping from group 0 to group 3. First of all, it's
> unclear whether this model intentionally lets the application choose group
> values freely (see my question above) or simply lacks an interface to let the
> application use values enforced by the PMD (if any). Secondly, given the fact
> that existing description of "rte_flow_attr" does not shed any light on how
> groups are ordered by default, when no JUMPs are configured (it only
> explains how priority levels are ordered within the given group but not how
> groups are ordered), it's unclear whether the model intentionally permits
> the application to jump between arbitrary groups (in example, from 0 to 3)
> and not necessarily between, say, 0 to 1. More to that, it's unclear whether
> the model lets the application jump from, say, group 0 to the very same
> group, 0 ("recirculation") or not. Or is the "recirculation"
> is in fact the main scenario in the model? Could you please elaborate on the
> model's expectations of groups?
> 

The model places no limitation on the application's group selection.
The application can use any group value as the jump action destination,
including the same group that was used for flow creation.
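
For example (illustration only), both of these jump destinations are
acceptable from the model's point of view:

    struct rte_flow_action_jump to_other_group = { .group = 3 };
    struct rte_flow_action_jump to_same_group  = { .group = 0 }; /* "recirculation" */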

> Thank you.
> 
> --
> Ivan M

^ permalink raw reply	[flat|nested] 95+ messages in thread

end of thread, other threads:[~2021-03-08 14:01 UTC | newest]

Thread overview: 95+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-25 16:03 [dpdk-dev] [PATCH 0/2] ethdev: tunnel offload model Gregory Etelson
2020-06-25 16:03 ` [dpdk-dev] [PATCH 1/2] ethdev: allow negative values in flow rule types Gregory Etelson
2020-07-05 13:34   ` Andrew Rybchenko
2020-08-19 14:33     ` Gregory Etelson
2020-06-25 16:03 ` [dpdk-dev] [PATCH 2/2] ethdev: tunnel offload model Gregory Etelson
     [not found]   ` <DB8PR05MB6761ED02BCD188771BDCDE64A86F0@DB8PR05MB6761.eurprd05.prod.outlook.com>
     [not found]     ` <38d3513f-1261-0fbc-7c56-f83ced61f97a@ashroe.eu>
2020-07-01  6:52       ` Gregory Etelson
2020-07-13  8:21         ` Thomas Monjalon
2020-07-13 13:23           ` Gregory Etelson
2020-07-05 14:50   ` Andrew Rybchenko
2020-08-19 14:30     ` Gregory Etelson
2020-07-05 13:39 ` [dpdk-dev] [PATCH 0/2] " Andrew Rybchenko
2020-09-08 20:15 ` [dpdk-dev] [PATCH v2 0/4] Tunnel Offload API Gregory Etelson
2020-09-08 20:15   ` [dpdk-dev] [PATCH v2 1/4] ethdev: allow negative values in flow rule types Gregory Etelson
2020-09-15  4:36     ` Ajit Khaparde
2020-09-15  8:46       ` Andrew Rybchenko
2020-09-15 10:27         ` Gregory Etelson
2020-09-16 17:21           ` Gregory Etelson
2020-09-17  6:49             ` Andrew Rybchenko
2020-09-17  7:47               ` Ori Kam
2020-09-17 15:15                 ` Andrew Rybchenko
2020-09-17  7:56               ` Gregory Etelson
2020-09-17 15:18                 ` Andrew Rybchenko
2020-09-15  8:45     ` Andrew Rybchenko
2020-09-15 16:17       ` Gregory Etelson
2020-09-08 20:15   ` [dpdk-dev] [PATCH v2 2/4] ethdev: tunnel offload model Gregory Etelson
2020-09-08 20:15   ` [dpdk-dev] [PATCH v2 3/4] net/mlx5: implement tunnel offload API Gregory Etelson
2020-09-08 20:15   ` [dpdk-dev] [PATCH v2 4/4] app/testpmd: support " Gregory Etelson
2020-09-15  4:47     ` Ajit Khaparde
2020-09-15 10:44       ` Gregory Etelson
2020-09-30  9:18 ` [dpdk-dev] [PATCH v3 0/4] Tunnel Offload API Gregory Etelson
2020-09-30  9:18   ` [dpdk-dev] [PATCH v3 1/4] ethdev: allow negative values in flow rule types Gregory Etelson
2020-10-04  5:40     ` Ajit Khaparde
2020-10-04  9:24       ` Gregory Etelson
2020-09-30  9:18   ` [dpdk-dev] [PATCH v3 2/4] ethdev: tunnel offload model Gregory Etelson
2020-09-30  9:18   ` [dpdk-dev] [PATCH v3 3/4] net/mlx5: implement tunnel offload API Gregory Etelson
2020-09-30  9:18   ` [dpdk-dev] [PATCH v3 4/4] app/testpmd: add commands for " Gregory Etelson
2020-10-01  5:32     ` Ajit Khaparde
2020-10-01  9:05       ` Gregory Etelson
2020-10-04  5:40         ` Ajit Khaparde
2020-10-04  9:29           ` Gregory Etelson
2020-10-04 13:50 ` [dpdk-dev] [PATCH v4 0/4] Tunnel Offload API Gregory Etelson
2020-10-04 13:50   ` [dpdk-dev] [PATCH v4 1/4] ethdev: allow negative values in flow rule types Gregory Etelson
2020-10-14 23:40     ` Thomas Monjalon
2020-10-04 13:50   ` [dpdk-dev] [PATCH v4 2/4] ethdev: tunnel offload model Gregory Etelson
2020-10-06  9:47     ` Sriharsha Basavapatna
2020-10-07 12:36       ` Gregory Etelson
2020-10-14 17:23         ` Ferruh Yigit
2020-10-16  9:15           ` Gregory Etelson
2020-10-14 23:55     ` Thomas Monjalon
2020-10-04 13:50   ` [dpdk-dev] [PATCH v4 3/4] net/mlx5: implement tunnel offload API Gregory Etelson
2020-10-04 13:50   ` [dpdk-dev] [PATCH v4 4/4] app/testpmd: add commands for " Gregory Etelson
2020-10-04 13:59     ` Ori Kam
2020-10-14 17:25   ` [dpdk-dev] [PATCH v4 0/4] Tunnel Offload API Ferruh Yigit
2020-10-15 12:41 ` [dpdk-dev] [PATCH v5 0/3] " Gregory Etelson
2020-10-15 12:41   ` [dpdk-dev] [PATCH v5 1/3] ethdev: allow negative values in flow rule types Gregory Etelson
2020-10-15 12:41   ` [dpdk-dev] [PATCH v5 2/3] ethdev: tunnel offload model Gregory Etelson
2020-10-15 12:41   ` [dpdk-dev] [PATCH v5 3/3] app/testpmd: add commands for tunnel offload API Gregory Etelson
2020-10-15 22:47   ` [dpdk-dev] [PATCH v5 0/3] Tunnel Offload API Ferruh Yigit
2020-10-16  8:55 ` [dpdk-dev] [PATCH v6 " Gregory Etelson
2020-10-16  8:55   ` [dpdk-dev] [PATCH v6 1/3] ethdev: allow negative values in flow rule types Gregory Etelson
2020-10-16  8:55   ` [dpdk-dev] [PATCH v6 2/3] ethdev: tunnel offload model Gregory Etelson
2020-10-16  8:55   ` [dpdk-dev] [PATCH v6 3/3] app/testpmd: add commands for tunnel offload API Gregory Etelson
2020-10-16 10:33 ` [dpdk-dev] [PATCH v7 0/3] Tunnel Offload API Gregory Etelson
2020-10-16 10:33   ` [dpdk-dev] [PATCH v7 1/3] ethdev: allow negative values in flow rule types Gregory Etelson
2020-10-16 10:33   ` [dpdk-dev] [PATCH v7 2/3] ethdev: tunnel offload model Gregory Etelson
2020-10-16 10:34   ` [dpdk-dev] [PATCH v7 3/3] app/testpmd: add commands for tunnel offload API Gregory Etelson
2020-10-16 12:10   ` [dpdk-dev] [PATCH v7 0/3] Tunnel Offload API Ferruh Yigit
2020-10-16 12:51 ` [dpdk-dev] [PATCH v8 " Gregory Etelson
2020-10-16 12:51   ` [dpdk-dev] [PATCH v8 1/3] ethdev: allow negative values in flow rule types Gregory Etelson
2020-10-16 12:51   ` [dpdk-dev] [PATCH v8 2/3] ethdev: tunnel offload model Gregory Etelson
2020-10-16 15:41     ` Kinsella, Ray
2021-03-02  9:22     ` Ivan Malov
2021-03-02  9:42       ` Thomas Monjalon
2021-03-03 14:03         ` Ivan Malov
2021-03-04  6:35           ` Eli Britstein
2021-03-08 14:01       ` Gregory Etelson
2020-10-16 12:51   ` [dpdk-dev] [PATCH v8 3/3] app/testpmd: add commands for tunnel offload API Gregory Etelson
2020-10-16 13:19   ` [dpdk-dev] [PATCH v8 0/3] Tunnel Offload API Ferruh Yigit
2020-10-16 14:20     ` Ferruh Yigit
2020-10-18 12:15 ` [dpdk-dev] [PATCH] ethdev: rename tunnel offload callbacks Gregory Etelson
2020-10-19  8:31   ` Ferruh Yigit
2020-10-19  9:56     ` Kinsella, Ray
2020-10-19 21:29       ` Thomas Monjalon
2020-10-21  9:22 ` [dpdk-dev] [PATCH] net/mlx5: implement tunnel offload API Gregory Etelson
2020-10-22 16:00 ` [dpdk-dev] [PATCH v2] " Gregory Etelson
2020-10-23 13:49 ` [dpdk-dev] [PATCH v3] " Gregory Etelson
2020-10-23 13:57 ` [dpdk-dev] [PATCH v4] " Gregory Etelson
2020-10-25 14:08 ` [dpdk-dev] [PATCH v5] " Gregory Etelson
2020-10-25 15:01   ` Raslan Darawsheh
2020-10-27 16:12 ` [dpdk-dev] [PATCH] net/mlx5: tunnel offload code cleanup Gregory Etelson
2020-10-27 16:29   ` Slava Ovsiienko
2020-10-27 17:16   ` Raslan Darawsheh
2020-10-28 12:33     ` Andrew Rybchenko
2020-10-28  4:58 ` [dpdk-dev] [PATCH] net/mlx5: fix tunnel flow destroy Gregory Etelson
2020-11-02 16:27   ` Raslan Darawsheh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).