DPDK patches and discussions
* [dpdk-dev] [RFC] - Offloading tunnel ports
@ 2020-06-09 15:07 Oz Shlomo
  2020-06-24 17:09 ` Thomas Monjalon
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Oz Shlomo @ 2020-06-09 15:07 UTC (permalink / raw)
  To: dev
  Cc: Thomas Monjalon, Ori Kam, Eli Britstein, Sriharsha Basavapatna,
	Hemal Shah

The rte_flow API provides the building blocks for vendor-agnostic flow
classification offloads.  The rte_flow match and action primitives are
fine-grained, giving DPDK applications the flexibility to offload network
stacks and complex pipelines.

Applications wishing to offload complex data structures (e.g. tunnel virtual
ports) must use rte_flow primitives such as group, meta, mark, tag and
others to model their high-level objects.

Designing a hardware model for high-level software objects is not trivial.
Furthermore, an optimal design is often vendor specific.

The goal of this RFC is to provide applications with a hardware offload
model for common high-level software objects that is optimal with regard
to the underlying hardware.

Tunnel ports are the first such objects.

Tunnel ports
------------
Ingress processing of tunneled traffic requires the classification
of the tunnel type followed by a decap action.

In software, once a packet is decapsulated, the in_port field is changed
to a virtual port representing the tunnel type. The outer header fields
are stored as packet metadata members and may be matched by subsequent
flows.

Openvswitch, for example, uses two flows:
1. A classification flow, setting the virtual port that represents the tunnel type.
For example: match on udp port 4789, actions=tnl_pop(vxlan_vport)
2. A steering flow, according to outer and inner header matches.
For example: match on in_port=vxlan_vport and outer/inner header fields, actions=forward to port X
The benefits of multi-flow tables are described in [1].

Offloading tunnel ports
-----------------------
Tunnel ports introduce a new stateless field that can be matched on.
Currently, the rte_flow library provides an API to encap, decap and match
on tunnel headers. However, there is no rte_flow primitive to set or
match a tunnel virtual port.

There are several possible hardware models for offloading virtual tunnel port
flows, including, but not limited to, the following:
1. Setting the virtual port on a hw register using the rte_flow_action_mark/
rte_flow_action_tag/rte_flow_set_meta objects (a sketch follows this list).
2. Mapping a virtual port to an rte_flow group.
3. Avoiding the need to match on transient objects by merging multi-table
flows into a single rte_flow rule.
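
As a rough illustration of approach 1 only, a classification rule could record
the virtual port in the MARK register and jump to the group holding the
steering rules. This is a hypothetical sketch, not part of the proposal; the
mark value and the group number below are arbitrary application choices.

#include <rte_flow.h>

#define VXLAN_VPORT_MARK      0x10  /* stands in for vxlan_vport */
#define TUNNEL_STEERING_GROUP 1     /* group holding the steering rules */

static const struct rte_flow_action_mark set_vport = {
	.id = VXLAN_VPORT_MARK,
};
static const struct rte_flow_action_jump to_steering = {
	.group = TUNNEL_STEERING_GROUP,
};

/* Classification rule actions: decap, store the virtual port in a hw
 * register via MARK, then continue processing in the steering group. */
static const struct rte_flow_action classify_actions[] = {
	{ .type = RTE_FLOW_ACTION_TYPE_VXLAN_DECAP },
	{ .type = RTE_FLOW_ACTION_TYPE_MARK, .conf = &set_vport },
	{ .type = RTE_FLOW_ACTION_TYPE_JUMP, .conf = &to_steering },
	{ .type = RTE_FLOW_ACTION_TYPE_END },
};

/* The steering rules in TUNNEL_STEERING_GROUP would then match the virtual
 * port back with an item such as: */
static const struct rte_flow_item_mark match_vport = {
	.id = VXLAN_VPORT_MARK,
};

Whether MARK is available for this purpose, and whether its value survives the
jump, is exactly the kind of vendor-specific detail that the helpers proposed
below are meant to hide.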

Every approach has its pros and cons.
The preferred approach should take into account the entire system architecture
and is very often vendor specific.

The proposed rte_flow_tunnel_set helper function (drafted below) is designed
to provide a common, vendor-agnostic API for setting the virtual port value.
The helper enables PMD implementations to return the vendor-specific
combination of rte_flow actions that realizes the vendor's hardware model for
setting a tunnel port. Applications may append the list of actions returned
by the helper function to their own actions when creating an rte_flow rule in
hardware.

Similarly, the rte_flow_tunnel_match helper (drafted below) allows multiple
hardware implementations to return a list of rte_flow items.
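
For illustration, here is a minimal sketch of the intended application usage,
relying only on the draft signatures below. The pattern argument, the group
numbers, the assumption that the PMD-returned array is not END-terminated, and
the assumption that it may be released once the rule is created are all left
open by the draft; they are placeholders, not part of the proposal.

#include <rte_byteorder.h>
#include <rte_flow.h>

/* Sketch: offload the OVS classification flow for a VXLAN tunnel port.
 * 'pattern' is the application's existing match (e.g. eth / ipv4 /
 * udp dst 4789 / vxlan / end). */
static struct rte_flow *
offload_tnl_pop(uint16_t port_id, const struct rte_flow_item *pattern,
		struct rte_flow_error *err)
{
	struct rte_flow_tunnel tunnel = {
		.type = RTE_FLOW_ITEM_TYPE_VXLAN,
		.tun_info.tp_dst = RTE_BE16(4789),
	};
	const struct rte_flow_attr attr = { .ingress = 1 };
	struct rte_flow_action_jump jump = { .group = 0 };
	struct rte_flow_action *pmd_actions;
	uint32_t num_pmd_actions, i = 0, j;
	struct rte_flow *flow;

	/* Ask the PMD for its vendor-specific "set tunnel port" actions. */
	if (rte_flow_tunnel_set(port_id, &tunnel, &pmd_actions,
				&num_pmd_actions, err) != 0)
		return NULL;

	/* Build: vxlan_decap / <PMD tunnel-set actions> / jump group 0 / end,
	 * mirroring the sample usage in the draft. */
	struct rte_flow_action actions[num_pmd_actions + 3];
	actions[i++] = (struct rte_flow_action){
		.type = RTE_FLOW_ACTION_TYPE_VXLAN_DECAP };
	for (j = 0; j < num_pmd_actions; j++)
		actions[i++] = pmd_actions[j];
	actions[i++] = (struct rte_flow_action){
		.type = RTE_FLOW_ACTION_TYPE_JUMP, .conf = &jump };
	actions[i++] = (struct rte_flow_action){
		.type = RTE_FLOW_ACTION_TYPE_END };

	flow = rte_flow_create(port_id, &attr, pattern, actions, err);

	/* The PMD-allocated array is assumed to be no longer needed once the
	 * rule has been created. */
	rte_flow_action_release(port_id, pmd_actions, err);
	return flow;
}

The steering flow would be built the same way: the items returned by
rte_flow_tunnel_match are prepended to the application's outer/inner header
matches and released with rte_flow_item_release once the rule is created.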

Miss handling
-------------
Packets going through multiple rte_flow groups are exposed to hw misses due to
partial packet processing. In such cases, the software should continue the
packet's processing from the point where the hardware missed.

We propose a generic rte_flow_restore_info structure providing the state that
was stored in hardware when the packet missed.

Currently, the structure provides the tunnel state of the packet that
missed, namely:
1. The group id in which the packet missed
2. The tunnel port on which the packet missed
3. The tunnel information that was stored in memory (due to the decap action).
In the future, we may add additional fields as more state may be stored in
the device memory (e.g. ct_state).

Applications may query the state via a new rte_flow_get_restore_info(mbuf) API,
thus allowing a vendor specific implementation.
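
As a sketch of the intended slow-path usage, again relying only on the draft
below (how the application maps group_id back to one of its software tables is
an application decision and is only hinted at in the comments):

#include <rte_mbuf.h>
#include <rte_flow.h>

/* Sketch: resume software processing for a packet that the hardware could
 * not fully process. */
static void
handle_hw_miss(uint16_t port_id, struct rte_mbuf *m)
{
	struct rte_flow_restore_info info;
	struct rte_flow_error err;

	if (rte_flow_get_restore_info(port_id, m, &info, &err) != 0)
		return; /* No HW state attached; process the packet as usual. */

	if (info.flags & RTE_FLOW_RESTORE_INFO_GROUP_ID) {
		/* Resume software processing at the table corresponding to
		 * info.group_id instead of starting from the first table. */
	}

	if (info.flags & RTE_FLOW_RESTORE_INFO_TUNNEL) {
		/* The packet belongs to the tunnel described by info.tunnel.
		 * If RTE_FLOW_RESTORE_INFO_ENCAPSULATED is not set, the outer
		 * headers were already removed by the hardware decap and
		 * info.tunnel is the only copy of that outer information. */
	}
}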

The API draft is provided below.

---
diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
index b0e4199192..49c871fc46 100644
--- a/lib/librte_ethdev/rte_flow.h
+++ b/lib/librte_ethdev/rte_flow.h
@@ -3324,6 +3324,193 @@ int
  rte_flow_get_aged_flows(uint16_t port_id, void **contexts,
  			uint32_t nb_contexts, struct rte_flow_error *error);

+/* Tunnel information. */
+__rte_experimental
+struct rte_flow_ip_tunnel_key {
+	rte_be64_t tun_id; /**< Tunnel identification. */
+	union {
+		struct {
+			rte_be32_t src; /**< IPv4 source address. */
+			rte_be32_t dst; /**< IPv4 destination address. */
+		} ipv4;
+		struct {
+			uint8_t src[16]; /**< IPv6 source address. */
+			uint8_t dst[16]; /**< IPv6 destination address. */
+		} ipv6;
+	} u;
+	bool       is_ipv6; /**< True for valid IPv6 fields. Otherwise IPv4. */
+	rte_be16_t tun_flags; /**< Tunnel flags. */
+	uint8_t    tos; /**< TOS for IPv4, TC for IPv6. */
+	uint8_t    ttl; /**< TTL for IPv4, HL for IPv6. */
+	rte_be32_t label; /**< Flow Label for IPv6. */
+	rte_be16_t tp_src; /**< Tunnel port source. */
+	rte_be16_t tp_dst; /**< Tunnel port destination. */
+};
+
+
+/* Tunnel has a type and the key information. */
+__rte_experimental
+struct rte_flow_tunnel {
+	/** Tunnel type, for example RTE_FLOW_ITEM_TYPE_VXLAN,
+	  * RTE_FLOW_ITEM_TYPE_NVGRE etc. */
+	enum rte_flow_item_type		type;
+	struct rte_flow_ip_tunnel_key	tun_info; /**< Tunnel key info. */
+};
+
+/**
+ * Indicate that the packet has a tunnel.
+ */
+#define RTE_FLOW_RESTORE_INFO_TUNNEL  (1ULL << 0)
+
+/**
+ * Indicate that the packet has a non-decapsulated tunnel header.
+ */
+#define RTE_FLOW_RESTORE_INFO_ENCAPSULATED  (1ULL << 1)
+
+/**
+ * Indicate that the packet has a group_id.
+ */
+#define RTE_FLOW_RESTORE_INFO_GROUP_ID  (1ULL << 2)
+
+/**
+ * Restore information structure to communicate the current packet processing
+ * state when some of the processing pipeline is done in hardware and should
+ * continue in software.
+ */
+__rte_experimental
+struct rte_flow_restore_info {
+	/** Bitwise flags (RTE_FLOW_RESTORE_INFO_*) indicating which of the
+	  * other fields in struct rte_flow_restore_info are valid.
+	  */
+	uint64_t flags;
+	uint32_t group_id; /**< Group ID. */
+	struct rte_flow_tunnel tunnel; /**< Tunnel information. */
+};
+
+/**
+ * Allocate an array of actions to be used in rte_flow_create, to implement
+ * tunnel-set for the given tunnel.
+ * Sample usage:
+ *   actions vxlan_decap / tunnel_set(tunnel properties) / jump group 0 / end
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] tunnel
+ *   Tunnel properties.
+ * @param[out] actions
+ *   Array of actions to be allocated by the PMD. This array should be
+ *   concatenated with the actions array provided to rte_flow_create.
+ * @param[out] num_of_actions
+ *   Number of actions allocated.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_set(uint16_t port_id,
+		    struct rte_flow_tunnel *tunnel,
+		    struct rte_flow_action **actions,
+		    uint32_t *num_of_actions,
+		    struct rte_flow_error *error);
+
+/**
+ * Allocate an array of items to be used in rte_flow_create, to implement
+ * tunnel-match for the given tunnel.
+ * Sample usage:
+ *   pattern tunnel-match(tunnel properties) / outer-header-matches /
+ *           inner-header-matches / end
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] tunnel
+ *   Tunnel properties.
+ * @param[out] items
+ *   Array of items to be allocated by the PMD. This array should be
+ *   concatenated with the items array provided to rte_flow_create.
+ * @param[out] num_of_items
+ *   Number of items allocated.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_tunnel_match(uint16_t port_id,
+		      struct rte_flow_tunnel *tunnel,
+		      struct rte_flow_item **items,
+		      uint32_t *num_of_items,
+		      struct rte_flow_error *error);
+
+/**
+ * Populate the current packet processing state, if it exists, for the given mbuf.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] m
+ *   Mbuf struct.
+ * @param[out] info
+ *   Restore information. Upon success contains the HW state.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_get_restore_info(uint16_t port_id,
+			  struct rte_mbuf *m,
+			  struct rte_flow_restore_info *info,
+			  struct rte_flow_error *error);
+
+/**
+ * Release the action array as allocated by rte_flow_tunnel_set.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] actions
+ *   Array of actions to be released.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_action_release(uint16_t port_id,
+			struct rte_flow_action *actions,
+			struct rte_flow_error *error);
+
+/**
+ * Release the item array as allocated by rte_flow_tunnel_match.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] items
+ *   Array of items to be released.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_flow_item_release(uint16_t port_id,
+		      struct rte_flow_item *items,
+		      struct rte_flow_error *error);
+
  #ifdef __cplusplus
  }
  #endif


* Re: [dpdk-dev] [RFC] - Offloading tunnel ports
  2020-06-09 15:07 [dpdk-dev] [RFC] - Offloading tunnel ports Oz Shlomo
@ 2020-06-24 17:09 ` Thomas Monjalon
  2020-07-02 11:34 ` Sriharsha Basavapatna
  2020-07-12 16:34 ` William Tu
  2 siblings, 0 replies; 7+ messages in thread
From: Thomas Monjalon @ 2020-06-24 17:09 UTC (permalink / raw)
  To: dev
  Cc: Ori Kam, Eli Britstein, Sriharsha Basavapatna, Hemal Shah,
	Oz Shlomo, ajit.khaparde

Ping for review


* Re: [dpdk-dev] [RFC] - Offloading tunnel ports
  2020-06-09 15:07 [dpdk-dev] [RFC] - Offloading tunnel ports Oz Shlomo
  2020-06-24 17:09 ` Thomas Monjalon
@ 2020-07-02 11:34 ` Sriharsha Basavapatna
  2020-07-02 11:43   ` Oz Shlomo
  2020-07-12 16:34 ` William Tu
  2 siblings, 1 reply; 7+ messages in thread
From: Sriharsha Basavapatna @ 2020-07-02 11:34 UTC (permalink / raw)
  To: Oz Shlomo
  Cc: dev, Thomas Monjalon, Ori Kam, Eli Britstein, Hemal Shah,
	Sriharsha Basavapatna

On Tue, Jun 9, 2020 at 8:37 PM Oz Shlomo <ozsh@mellanox.com> wrote:
>
> Rte_flow API provides the building blocks for vendor agnostic flow
> classification offloads.  The rte_flow match and action primitives are fine
> grained, thus enabling DPDK applications the flexibility to offload network
> stacks and complex pipelines.
>
> Applications wishing to offload complex data structures (e.g. tunnel virtual
> ports) are required to use the rte_flow primitives, such as group, meta, mark,
> tag and others to model their high level objects.
>
> The hardware model design for high level software objects is not trivial.
> Furthermore, an optimal design is often vendor specific.
>
> The goal of this RFC is to provide applications with the hardware offload
> model for common high level software objects which is optimal in regards
> to the underlying hardware.
>
> Tunnel ports are the first of such objects.
>
> Tunnel ports
> ------------
> Ingress processing of tunneled traffic requires the classification
> of the tunnel type followed by a decap action.
>
> In software, once a packet is decapsulated the in_port field is changed
> to a virtual port representing the tunnel type. The outer header fields
> are stored as packet metadata members and may be matched by subsequent
> flows.
>
> Openvswitch, for example, uses two flows:
> 1. classification flow - setting the virtual port representing the tunnel type
> For example: match on udp port 4789 actions=tnl_pop(vxlan_vport)
> 2. steering flow according to outer and inner header matches
> match on in_port=vxlan_vport and outer/inner header matches actions=forward to port X
> The benefits of multi-flow tables are described in [1].

You probably forgot to add a link to this reference [1]? I couldn't
find it in this email.

Thanks,
-Harsha

* Re: [dpdk-dev] [RFC] - Offloading tunnel ports
  2020-07-02 11:34 ` Sriharsha Basavapatna
@ 2020-07-02 11:43   ` Oz Shlomo
  0 siblings, 0 replies; 7+ messages in thread
From: Oz Shlomo @ 2020-07-02 11:43 UTC (permalink / raw)
  To: Sriharsha Basavapatna
  Cc: dev, Thomas Monjalon, Ori Kam, Eli Britstein, Hemal Shah



On 7/2/2020 2:34 PM, Sriharsha Basavapatna wrote:
> On Tue, Jun 9, 2020 at 8:37 PM Oz Shlomo <ozsh@mellanox.com> wrote:
>>
>> Rte_flow API provides the building blocks for vendor agnostic flow
>> classification offloads.  The rte_flow match and action primitives are fine
>> grained, thus enabling DPDK applications the flexibility to offload network
>> stacks and complex pipelines.
>>
>> Applications wishing to offload complex data structures (e.g. tunnel virtual
>> ports) are required to use the rte_flow primitives, such as group, meta, mark,
>> tag and others to model their high level objects.
>>
>> The hardware model design for high level software objects is not trivial.
>> Furthermore, an optimal design is often vendor specific.
>>
>> The goal of this RFC is to provide applications with the hardware offload
>> model for common high level software objects which is optimal in regards
>> to the underlying hardware.
>>
>> Tunnel ports are the first of such objects.
>>
>> Tunnel ports
>> ------------
>> Ingress processing of tunneled traffic requires the classification
>> of the tunnel type followed by a decap action.
>>
>> In software, once a packet is decapsulated the in_port field is changed
>> to a virtual port representing the tunnel type. The outer header fields
>> are stored as packet metadata members and may be matched by subsequent
>> flows.
>>
>> Openvswitch, for example, uses two flows:
>> 1. classification flow - setting the virtual port representing the tunnel type
>> For example: match on udp port 4789 actions=tnl_pop(vxlan_vport)
>> 2. steering flow according to outer and inner header matches
>> match on in_port=vxlan_vport and outer/inner header matches actions=forward to port X
>> The benefits of multi-flow tables are described in [1].
> 
> You probably forgot to add a link to this reference [1]? I couldn't
> find it in this email.
> 
> Thanks,
> -Harsha

Right, sorry about that. Here is the reference:
[1] - https://www.opennetworking.org/wp-content/uploads/2014/10/TR_Multiple_Flow_Tables_and_TTPs.pdf


* Re: [dpdk-dev] [RFC] - Offloading tunnel ports
  2020-06-09 15:07 [dpdk-dev] [RFC] - Offloading tunnel ports Oz Shlomo
  2020-06-24 17:09 ` Thomas Monjalon
  2020-07-02 11:34 ` Sriharsha Basavapatna
@ 2020-07-12 16:34 ` William Tu
  2020-07-13  4:52   ` Oz Shlomo
  2 siblings, 1 reply; 7+ messages in thread
From: William Tu @ 2020-07-12 16:34 UTC (permalink / raw)
  To: Oz Shlomo
  Cc: dev, Thomas Monjalon, Ori Kam, Eli Britstein,
	Sriharsha Basavapatna, Hemal Shah

Hi Oz,

I started to learn about this and have a couple of questions below.
Thank you in advance.

On Tue, Jun 9, 2020 at 8:07 AM Oz Shlomo <ozsh@mellanox.com> wrote:
>
> Rte_flow API provides the building blocks for vendor agnostic flow
> classification offloads.  The rte_flow match and action primitives are fine
> grained, thus enabling DPDK applications the flexibility to offload network
> stacks and complex pipelines.
>
> Applications wishing to offload complex data structures (e.g. tunnel virtual
> ports) are required to use the rte_flow primitives, such as group, meta, mark,
> tag and others to model their high level objects.
>
> The hardware model design for high level software objects is not trivial.
> Furthermore, an optimal design is often vendor specific.
>
> The goal of this RFC is to provide applications with the hardware offload
> model for common high level software objects which is optimal in regards
> to the underlying hardware.
>
> Tunnel ports are the first of such objects.
>
> Tunnel ports
> ------------
> Ingress processing of tunneled traffic requires the classification
> of the tunnel type followed by a decap action.
>
> In software, once a packet is decapsulated the in_port field is changed
> to a virtual port representing the tunnel type. The outer header fields
> are stored as packet metadata members and may be matched by subsequent
> flows.
>
> Openvswitch, for example, uses two flows:
> 1. classification flow - setting the virtual port representing the tunnel type
> For example: match on udp port 4789 actions=tnl_pop(vxlan_vport)
> 2. steering flow according to outer and inner header matches
> match on in_port=vxlan_vport and outer/inner header matches actions=forward to port X
> The benefits of multi-flow tables are described in [1].
>
> Offloading tunnel ports
> -----------------------
> Tunnel ports introduce a new stateless field that can be matched on.
> Currently the rte_flow library provides an API to encap, decap and match
> on tunnel headers. However, there is no rte_flow primitive to set and
> match tunnel virtual ports.
>
> There are several possible hardware models for offloading virtual tunnel port
> flows including, but not limited to, the following:
> 1. Setting the virtual port on a hw register using the rte_flow_action_mark/
> rte_flow_action_tag/rte_flow_set_meta objects.
> 2. Mapping a virtual port to an rte_flow group
> 3. Avoiding the need to match on transient objects by merging multi-table
> flows to a single rte_flow rule.
>
> Every approach has its pros and cons.
> The preferred approach should take into account the entire system architecture
> and is very often vendor specific.

Are these three solutions mutually exclusive?
And IIUC, based on the description below, you're proposing solution 1, right?
and the patch on OVS is using solution 2?
https://patchwork.ozlabs.org/project/openvswitch/cover/20200120150830.16262-1-elibr@mellanox.com/

>
> The proposed rte_flow_tunnel_set helper function (drafted below) is designed
> to provide a common, vendor agnostic, API for setting the virtual port value.
> The helper API enables PMD implementations to return vendor specific combination of
> rte_flow actions realizing the vendor's hardware model for setting a tunnel port.
> Applications may append the list of actions returned from the helper function when
> creating an rte_flow rule in hardware.
>
> Similarly, the rte_flow_tunnel_match helper (drafted below) allows for
> multiple hardware implementations to return a list of rte_flow items.
>
And if we're using solution 1 "Setting the virtual port on a hw
register using the rte_flow_action_mark/
rte_flow_action_tag/rte_flow_set_meta objects."
For the classification flow, does that mean HW no longer needs to
translate tnl_pop to mark + jump,
but the HW can directly execute the tnl_pop(vxlan_vport) action
because the outer header is
saved using rte_flow_set_meta?

> Miss handling
> -------------
> Packets going through multiple rte_flow groups are exposed to hw misses due to
> partial packet processing. In such cases, the software should continue the
> packet's processing from the point where the hardware missed.
>
> We propose a generic rte_flow_restore_info structure providing the state that was
> stored in hardware when the packet missed.
>
> Currently, the structure will provide the tunnel state of the packet that
> missed, namely:
> 1. The group id that missed
> 2. The tunnel port that missed
> 3. Tunnel information that was stored in memory (due to decap action).
> In the future, we may add additional fields as more state may be stored in
> the device memory (e.g. ct_state).
>
> Applications may query the state via a new rte_flow_get_restore_info(mbuf) API,
> thus allowing a vendor specific implementation.
>

Thanks
William


* Re: [dpdk-dev] [RFC] - Offloading tunnel ports
  2020-07-12 16:34 ` William Tu
@ 2020-07-13  4:52   ` Oz Shlomo
  2020-07-13 15:12     ` William Tu
  0 siblings, 1 reply; 7+ messages in thread
From: Oz Shlomo @ 2020-07-13  4:52 UTC (permalink / raw)
  To: William Tu
  Cc: dev, Thomas Monjalon, Ori Kam, Eli Britstein,
	Sriharsha Basavapatna, Hemal Shah

Hi William,

On 7/12/2020 7:34 PM, William Tu wrote:
> Hi Oz,
> 
> I started to learn about this and have a couple of questions below.
> Thank you in advance.
> 
> On Tue, Jun 9, 2020 at 8:07 AM Oz Shlomo <ozsh@mellanox.com> wrote:
>>
>> Rte_flow API provides the building blocks for vendor agnostic flow
>> classification offloads.  The rte_flow match and action primitives are fine
>> grained, thus enabling DPDK applications the flexibility to offload network
>> stacks and complex pipelines.
>>
>> Applications wishing to offload complex data structures (e.g. tunnel virtual
>> ports) are required to use the rte_flow primitives, such as group, meta, mark,
>> tag and others to model their high level objects.
>>
>> The hardware model design for high level software objects is not trivial.
>> Furthermore, an optimal design is often vendor specific.
>>
>> The goal of this RFC is to provide applications with the hardware offload
>> model for common high level software objects which is optimal in regards
>> to the underlying hardware.
>>
>> Tunnel ports are the first of such objects.
>>
>> Tunnel ports
>> ------------
>> Ingress processing of tunneled traffic requires the classification
>> of the tunnel type followed by a decap action.
>>
>> In software, once a packet is decapsulated the in_port field is changed
>> to a virtual port representing the tunnel type. The outer header fields
>> are stored as packet metadata members and may be matched by subsequent
>> flows.
>>
>> Openvswitch, for example, uses two flows:
>> 1. classification flow - setting the virtual port representing the tunnel type
>> For example: match on udp port 4789 actions=tnl_pop(vxlan_vport)
>> 2. steering flow according to outer and inner header matches
>> match on in_port=vxlan_vport and outer/inner header matches actions=forward to port X
>> The benefits of multi-flow tables are described in [1].
>>
>> Offloading tunnel ports
>> -----------------------
>> Tunnel ports introduce a new stateless field that can be matched on.
>> Currently the rte_flow library provides an API to encap, decap and match
>> on tunnel headers. However, there is no rte_flow primitive to set and
>> match tunnel virtual ports.
>>
>> There are several possible hardware models for offloading virtual tunnel port
>> flows including, but not limited to, the following:
>> 1. Setting the virtual port on a hw register using the rte_flow_action_mark/
>> rte_flow_action_tag/rte_flow_set_meta objects.
>> 2. Mapping a virtual port to an rte_flow group
>> 3. Avoiding the need to match on transient objects by merging multi-table
>> flows to a single rte_flow rule.
>>
>> Every approach has its pros and cons.
>> The preferred approach should take into account the entire system architecture
>> and is very often vendor specific.
> 
> Are these three solutions mutually exclusive?
> And IIUC, based on the description below, you're proposing solution 1, right?
> and the patch on OVS is using solution 2?
> > https://patchwork.ozlabs.org/project/openvswitch/cover/20200120150830.16262-1-elibr@mellanox.com/
> 

From the OVS patchset we learned that it might be better to provide each vendor
with the flexibility to implement its optimal hardware model.
We propose this design as an alternative to the submitted OVS patchset.

This patch is designed to provide an abstract API.
As such, any of the solutions listed above, or others, are possible.
The Mellanox PMD is planned to implement solution 2.


>>
>> The proposed rte_flow_tunnel_set helper function (drafted below) is designed
>> to provide a common, vendor agnostic, API for setting the virtual port value.
>> The helper API enables PMD implementations to return vendor specific combination of
>> rte_flow actions realizing the vendor's hardware model for setting a tunnel port.
>> Applications may append the list of actions returned from the helper function when
>> creating an rte_flow rule in hardware.
>>
>> Similarly, the rte_flow_tunnel_match helper (drafted below) allows for
>> multiple hardware implementations to return a list of rte_flow items.
>>
> And if we're using solution 1 "Setting the virtual port on a hw
> register using the rte_flow_action_mark/
> rte_flow_action_tag/rte_flow_set_meta objects."
> For the classification flow, does that mean HW no longer needs to
> translate tnl_pop to mark + jump,
> but the HW can directly execute the tnl_pop(vxlan_vport) action
> because the outer header is
> saved using rte_flow_set_meta?
> 

In this case we would need to map the outer header fields to a unique id.
This can be done either from the datapath (for capable hardware) or from the
flows. The latter option requires the flow to match on the outer header fields
that should be stored. OVS matches on the outer header fields only after it
classifies the tunnel port (i.e. after the tnl_pop action).


>> Miss handling
>> -------------
>> Packets going through multiple rte_flow groups are exposed to hw misses due to
>> partial packet processing. In such cases, the software should continue the
>> packet's processing from the point where the hardware missed.
>>
>> We propose a generic rte_flow_restore_info structure providing the state that was
>> stored in hardware when the packet missed.
>>
>> Currently, the structure will provide the tunnel state of the packet that
>> missed, namely:
>> 1. The group id that missed
>> 2. The tunnel port that missed
>> 3. Tunnel information that was stored in memory (due to decap action).
>> In the future, we may add additional fields as more state may be stored in
>> the device memory (e.g. ct_state).
>>
>> Applications may query the state via a new rte_flow_get_restore_info(mbuf) API,
>> thus allowing a vendor specific implementation.
>>
> 
> Thanks
> William
> 


* Re: [dpdk-dev] [RFC] - Offloading tunnel ports
  2020-07-13  4:52   ` Oz Shlomo
@ 2020-07-13 15:12     ` William Tu
  0 siblings, 0 replies; 7+ messages in thread
From: William Tu @ 2020-07-13 15:12 UTC (permalink / raw)
  To: Oz Shlomo
  Cc: dev, Thomas Monjalon, Ori Kam, Eli Britstein,
	Sriharsha Basavapatna, Hemal Shah

On Sun, Jul 12, 2020 at 9:52 PM Oz Shlomo <ozsh@mellanox.com> wrote:
>
> Hi William,
>
> On 7/12/2020 7:34 PM, William Tu wrote:
> > Hi Oz,
> >
snip

> >>
> >> Openvswitch, for example, uses two flows:
> >> 1. classification flow - setting the virtual port representing the tunnel type
> >> For example: match on udp port 4789 actions=tnl_pop(vxlan_vport)
> >> 2. steering flow according to outer and inner header matches
> >> match on in_port=vxlan_vport and outer/inner header matches actions=forward to port X
> >> The benefits of multi-flow tables are described in [1].
> >>
> >> Offloading tunnel ports
> >> -----------------------
> >> Tunnel ports introduce a new stateless field that can be matched on.
> >> Currently the rte_flow library provides an API to encap, decap and match
> >> on tunnel headers. However, there is no rte_flow primitive to set and
> >> match tunnel virtual ports.
> >>
> >> There are several possible hardware models for offloading virtual tunnel port
> >> flows including, but not limited to, the following:
> >> 1. Setting the virtual port on a hw register using the rte_flow_action_mark/
> >> rte_flow_action_tag/rte_flow_set_meta objects.
> >> 2. Mapping a virtual port to an rte_flow group
> >> 3. Avoiding the need to match on transient objects by merging multi-table
> >> flows to a single rte_flow rule.
> >>
> >> Every approach has its pros and cons.
> >> The preferred approach should take into account the entire system architecture
> >> and is very often vendor specific.
> >
> > Are these three solutions mutually exclusive?
> > And IIUC, based on the description below, you're proposing solution 1, right?
> > and the patch on OVS is using solution 2?
> > https://patchwork.ozlabs.org/project/openvswitch/cover/20200120150830.16262-1-elibr@mellanox.com/
> >
>
>  From the OVS patchset we learned that it might be better to provide each vendor
> with the flexibility to implement its optimal hardware model.
> We propose this design as an alternative to the submitted OVS patchset.
>
> This patch is designed to provide an abstract API.
> As such, any of the solutions listed above, or others, are possible.
> The Mellanox PMD is planned to implement solution 2.
>
>
> >>
> >> The proposed rte_flow_tunnel_set helper function (drafted below) is designed
> >> to provide a common, vendor agnostic, API for setting the virtual port value.
> >> The helper API enables PMD implementations to return vendor specific combination of
> >> rte_flow actions realizing the vendor's hardware model for setting a tunnel port.
> >> Applications may append the list of actions returned from the helper function when
> >> creating an rte_flow rule in hardware.
> >>
> >> Similarly, the rte_flow_tunnel_match helper (drafted below) allows for
> >> multiple hardware implementations to return a list of rte_flow items.
> >>
> > And if we're using solution 1 "Setting the virtual port on a hw
> > register using the rte_flow_action_mark/
> > rte_flow_action_tag/rte_flow_set_meta objects."
> > For the classification flow, does that mean HW no longer needs to
> > translate tnl_pop to mark + jump,
> > but the HW can directly execute the tnl_pop(vxlan_vport) action
> > because the outer header is
> > saved using rte_flow_set_meta?
> >
>
> In this case we would need to map the outer header fields to a unique id.
> This can be done either from the datapath (for capable hardware) or from the
> flows. The latter option, requires the flow to match on the outer header fields
> that should be stored. OVS matches on the outer header fields only after it
> classifies the tunnel port (i.e. after the tnl_pop action).
>
Hi Oz,
Thanks for your explanation. It's much more clear to me now.
William
