DPDK patches and discussions
From: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
To: Oz Shlomo <ozsh@mellanox.com>
Cc: dev@dpdk.org, Thomas Monjalon <thomasm@mellanox.com>,
	Ori Kam <orika@mellanox.com>,  Eli Britstein <elibr@mellanox.com>,
	Hemal Shah <hemal.shah@broadcom.com>,
	 Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Subject: Re: [dpdk-dev] [RFC] - Offloading tunnel ports
Date: Thu, 2 Jul 2020 17:04:57 +0530
Message-ID: <CAHHeUGWf2fzGZiB2RwQjK5AkFyoghduss6A0mZypc1exZAqLsw@mail.gmail.com>
In-Reply-To: <5862610e-76cc-7783-7d66-2b2173eeb974@mellanox.com>

On Tue, Jun 9, 2020 at 8:37 PM Oz Shlomo <ozsh@mellanox.com> wrote:
>
> Rte_flow API provides the building blocks for vendor agnostic flow
> classification offloads.  The rte_flow match and action primitives are fine
> grained, thus enabling DPDK applications the flexibility to offload network
> stacks and complex pipelines.
>
> Applications wishing to offload complex data structures (e.g. tunnel virtual
> ports) are required to use the rte_flow primitives, such as group, meta, mark,
> tag and others to model their high level objects.
>
> The hardware model design for high level software objects is not trivial.
> Furthermore, an optimal design is often vendor specific.
>
> The goal of this RFC is to provide applications with a hardware offload
> model for common high level software objects that is optimal with regard
> to the underlying hardware.
>
> Tunnel ports are the first of such objects.
>
> Tunnel ports
> ------------
> Ingress processing of tunneled traffic requires the classification
> of the tunnel type followed by a decap action.
>
> In software, once a packet is decapsulated the in_port field is changed
> to a virtual port representing the tunnel type. The outer header fields
> are stored as packet metadata members and may be matched by subsequent
> flows.
>
> Open vSwitch, for example, uses two flows:
> 1. A classification flow that sets the virtual port representing the tunnel type.
>    For example: match on udp port 4789 actions=tnl_pop(vxlan_vport)
> 2. A steering flow based on outer and inner header matches.
>    For example: match on in_port=vxlan_vport and outer/inner header matches
>    actions=forward to port X
> The benefits of multi-flow tables are described in [1].

You probably forgot to add the link for reference [1]? I couldn't
find it in this email.

Thanks,
-Harsha
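
For readers following along, the two-flow software model quoted above can be
sketched roughly as below. This is only an illustration: every name in it
(pkt_meta, classify_tunnel, steer, VXLAN_VPORT, PORT_DROP) is hypothetical and
is not taken from OVS or DPDK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not OVS or DPDK code. */
#define VXLAN_UDP_PORT 4789u
#define VXLAN_VPORT    1000u   /* made-up id of the tunnel virtual port */
#define PORT_DROP      0xffffu /* made-up "no match" result */

struct pkt_meta {
	uint16_t in_port;       /* physical port, or vport after tnl_pop */
	uint16_t outer_udp_dst; /* outer UDP destination port, host order */
	uint32_t vni;           /* tunnel id kept as metadata after decap */
	bool     decapsulated;
};

/* Flow 1: classification. Matches the outer UDP port and "pops" the tunnel
 * by switching in_port to the virtual port. Returns true on a hit. */
static bool
classify_tunnel(struct pkt_meta *p)
{
	assert(p != NULL);
	if (p->decapsulated || p->outer_udp_dst != VXLAN_UDP_PORT)
		return false;
	p->in_port = VXLAN_VPORT;
	p->decapsulated = true;
	return true;
}

/* Flow 2: steering. Matches on the virtual port plus the stored tunnel
 * metadata and returns the output port, or PORT_DROP on a miss. */
static uint16_t
steer(const struct pkt_meta *p, uint32_t wanted_vni, uint16_t out_port)
{
	assert(p != NULL);
	if (p->in_port == VXLAN_VPORT && p->vni == wanted_vni)
		return out_port;
	return PORT_DROP;
}
```

The point of the split is that the steering flow never looks at the outer UDP
port again; it only needs the virtual port set by the first flow.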
>
> Offloading tunnel ports
> -----------------------
> Tunnel ports introduce a new stateless field that can be matched on.
> Currently the rte_flow library provides an API to encap, decap and match
> on tunnel headers. However, there is no rte_flow primitive to set and
> match tunnel virtual ports.
>
> There are several possible hardware models for offloading virtual tunnel port
> flows including, but not limited to, the following:
> 1. Setting the virtual port on a hw register using the rte_flow_action_mark/
> rte_flow_action_tag/rte_flow_action_set_meta objects.
> 2. Mapping a virtual port to an rte_flow group.
> 3. Avoiding the need to match on transient objects by merging multi-table
> flows into a single rte_flow rule.
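
As an illustration of model 2 in the list above, an application or PMD could
keep a small table that assigns one rte_flow group per tunnel virtual port.
The sketch below is an assumption about how such a mapping might look; the
struct, the function name and the fixed table size are all hypothetical, and
group 0 is treated as the root table here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_VPORTS 8u /* arbitrary limit for the sketch */

/* Hypothetical table: one rte_flow group per tunnel virtual port. */
struct vport_group_map {
	uint16_t vport[MAX_VPORTS];
	uint32_t group[MAX_VPORTS];
	size_t   used;
	uint32_t next_group; /* initialise to 1; group 0 is the root table */
};

/* Return the group for vport, allocating a new one on first use.
 * Returns 0 on success, -1 when the table is full. */
static int
vport_to_group(struct vport_group_map *m, uint16_t vport, uint32_t *group)
{
	size_t i;

	assert(m != NULL && group != NULL);
	for (i = 0; i < m->used; i++) {
		if (m->vport[i] == vport) {
			*group = m->group[i];
			return 0;
		}
	}
	if (m->used == MAX_VPORTS)
		return -1;
	m->vport[m->used] = vport;
	m->group[m->used] = m->next_group;
	*group = m->next_group++;
	m->used++;
	return 0;
}
```

With such a table the classification rule would end in a jump to the group
returned for the vport, and the steering rules would be installed in that
group, so no register or metadata match is needed.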
>
> Every approach has its pros and cons.
> The preferred approach should take into account the entire system architecture
> and is very often vendor specific.
>
> The proposed rte_flow_tunnel_port_set helper function (drafted below) is designed
> to provide a common, vendor agnostic, API for setting the virtual port value.
> The helper API enables PMD implementations to return a vendor specific
> combination of rte_flow actions that realizes the vendor's hardware model for
> setting a tunnel port. Applications may append the list of actions returned
> from the helper function when creating an rte_flow rule in hardware.
>
> Similarly, the rte_flow_tunnel_port_match helper (drafted below) lets each
> hardware implementation return its own list of rte_flow items.
>
> Miss handling
> -------------
> Packets going through multiple rte_flow groups are exposed to hw misses due to
> partial packet processing. In such cases, the software should continue the
> packet's processing from the point where the hardware missed.
>
> We propose a generic rte_flow_restore structure providing the state that was
> stored in hardware when the packet missed.
>
> Currently, the structure will provide the tunnel state of the packet that
> missed, namely:
> 1. The group id that missed
> 2. The tunnel port that missed
> 3. Tunnel information that was stored in memory (due to decap action).
> In the future, we may add additional fields as more state may be stored in
> the device memory (e.g. ct_state).
>
> Applications may query the state via a new rte_flow_get_restore_info(mbuf) API,
> thus allowing a vendor specific implementation.
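
To make the miss-handling flow concrete, here is a rough sketch of how a
software datapath might interpret the restore flags after a hardware miss.
The three flag values are the ones from the draft in this RFC; the trimmed
restore_info_sketch struct, the resume_plan struct and plan_resume() are
hypothetical, and the 1:1 mapping from group id to software table is an
assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values as in the RFC draft. */
#define RTE_FLOW_RESTORE_INFO_TUNNEL        (1ULL << 0)
#define RTE_FLOW_RESTORE_INFO_ENCAPSULATED  (1ULL << 1)
#define RTE_FLOW_RESTORE_INFO_GROUP_ID      (1ULL << 2)

/* Trimmed, hypothetical stand-in for struct rte_flow_restore_info. */
struct restore_info_sketch {
	uint64_t flags;
	uint32_t group_id;
	uint64_t tun_id;
};

/* Hypothetical description of where software should pick up the packet. */
struct resume_plan {
	uint32_t table;          /* software table to resume from */
	int      restore_tunnel; /* copy tunnel metadata from the info */
	int      need_sw_decap;  /* outer header is still on the packet */
};

static struct resume_plan
plan_resume(const struct restore_info_sketch *info)
{
	struct resume_plan plan = { 0, 0, 0 };

	assert(info != NULL);
	/* Only trust group_id when its flag is set; otherwise start at 0. */
	if (info->flags & RTE_FLOW_RESTORE_INFO_GROUP_ID)
		plan.table = info->group_id;
	if (info->flags & RTE_FLOW_RESTORE_INFO_TUNNEL) {
		plan.restore_tunnel = 1;
		plan.need_sw_decap =
			(info->flags & RTE_FLOW_RESTORE_INFO_ENCAPSULATED) != 0;
	}
	return plan;
}
```

A packet with no flags set carries no hardware state, so the plan falls back
to processing from the first table without tunnel metadata.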
>
> API draft is provided below
>
> ---
> diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
> index b0e4199192..49c871fc46 100644
> --- a/lib/librte_ethdev/rte_flow.h
> +++ b/lib/librte_ethdev/rte_flow.h
> @@ -3324,6 +3324,193 @@ int
>   rte_flow_get_aged_flows(uint16_t port_id, void **contexts,
>                         uint32_t nb_contexts, struct rte_flow_error *error);
>
> +/* Tunnel information. */
> +__rte_experimental
> +struct rte_flow_ip_tunnel_key {
> +       rte_be64_t tun_id; /**< Tunnel identification. */
> +       union {
> +               struct {
> +                       rte_be32_t src; /**< IPv4 source address. */
> +                       rte_be32_t dst; /**< IPv4 destination address. */
> +               } ipv4;
> +               struct {
> +                       uint8_t src[16]; /**< IPv6 source address. */
> +                       uint8_t dst[16]; /**< IPv6 destination address. */
> +               } ipv6;
> +       } u;
> +       bool       is_ipv6; /**< True for valid IPv6 fields. Otherwise IPv4. */
> +       rte_be16_t tun_flags; /**< Tunnel flags. */
> +       uint8_t    tos; /**< TOS for IPv4, TC for IPv6. */
> +       uint8_t    ttl; /**< TTL for IPv4, HL for IPv6. */
> +       rte_be32_t label; /**< Flow Label for IPv6. */
> +       rte_be16_t tp_src; /**< Tunnel port source. */
> +       rte_be16_t tp_dst; /**< Tunnel port destination. */
> +};
> +
> +
> +/* Tunnel has a type and the key information. */
> +__rte_experimental
> +struct rte_flow_tunnel {
> +       /** Tunnel type, for example RTE_FLOW_ITEM_TYPE_VXLAN,
> +         * RTE_FLOW_ITEM_TYPE_NVGRE etc. */
> +       enum rte_flow_item_type         type;
> +       struct rte_flow_ip_tunnel_key   tun_info; /**< Tunnel key info. */
> +};
> +
> +/**
> + * Indicate that the packet has a tunnel.
> + */
> +#define RTE_FLOW_RESTORE_INFO_TUNNEL  (1ULL << 0)
> +
> +/**
> + * Indicate that the packet has a non decapsulated tunnel header.
> + */
> +#define RTE_FLOW_RESTORE_INFO_ENCAPSULATED  (1ULL << 1)
> +
> +/**
> + * Indicate that the packet has a group_id.
> + */
> +#define RTE_FLOW_RESTORE_INFO_GROUP_ID  (1ULL << 2)
> +
> +/**
> + * Restore information structure to communicate the current packet processing
> + * state when some of the processing pipeline is done in hardware and should
> + * continue in software.
> + */
> +__rte_experimental
> +struct rte_flow_restore_info {
> +       /** Bitwise flags (RTE_FLOW_RESTORE_INFO_*) indicating which other
> +         * fields in struct rte_flow_restore_info are valid.
> +         */
> +       uint64_t flags;
> +       uint32_t group_id; /**< Group ID. */
> +       struct rte_flow_tunnel tunnel; /**< Tunnel information. */
> +};
> +
> +/**
> + * Allocate an array of actions to be used in rte_flow_create, to implement
> + * tunnel-set for the given tunnel.
> + * Sample usage:
> + *   actions vxlan_decap / tunnel_set(tunnel properties) / jump group 0 / end
> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param[in] tunnel
> + *   Tunnel properties.
> + * @param[out] actions
> + *   Array of actions to be allocated by the PMD. This array should be
> + *   concatenated with the actions array provided to rte_flow_create.
> + * @param[out] num_of_actions
> + *   Number of actions allocated.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL. PMDs initialize this
> + *   structure in case of error only.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_tunnel_set(uint16_t port_id,
> +                   struct rte_flow_tunnel *tunnel,
> +                   struct rte_flow_action **actions,
> +                   uint32_t *num_of_actions,
> +                   struct rte_flow_error *error);
> +
> +/**
> + * Allocate an array of items to be used in rte_flow_create, to implement
> + * tunnel-match for the given tunnel.
> + * Sample usage:
> + *   pattern tunnel-match(tunnel properties) / outer-header-matches /
> + *           inner-header-matches / end
> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param[in] tunnel
> + *   Tunnel properties.
> + * @param[out] items
> + *   Array of items to be allocated by the PMD. This array should be
> + *   concatenated with the items array provided to rte_flow_create.
> + * @param[out] num_of_items
> + *   Number of items allocated.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL. PMDs initialize this
> + *   structure in case of error only.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_tunnel_match(uint16_t port_id,
> +                     struct rte_flow_tunnel *tunnel,
> +                     struct rte_flow_item **items,
> +                     uint32_t *num_of_items,
> +                     struct rte_flow_error *error);
> +
> +/**
> + * Populate the current packet processing state, if it exists, for the given mbuf.
> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param[in] m
> + *   Mbuf struct.
> + * @param[out] info
> + *   Restore information. Upon success contains the HW state.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL. PMDs initialize this
> + *   structure in case of error only.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_get_restore_info(uint16_t port_id,
> +                         struct rte_mbuf *m,
> +                         struct rte_flow_restore_info *info,
> +                         struct rte_flow_error *error);
> +
> +/**
> + * Release the action array as allocated by rte_flow_tunnel_set.
> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param[in] actions
> + *   Array of actions to be released.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL. PMDs initialize this
> + *   structure in case of error only.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_action_release(uint16_t port_id,
> +                       struct rte_flow_action *actions,
> +                       struct rte_flow_error *error);
> +
> +/**
> + * Release the item array as allocated by rte_flow_tunnel_match.
> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param[in] items
> + *   Array of items to be released.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL. PMDs initialize this
> + *   structure in case of error only.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_item_release(uint16_t port_id,
> +                     struct rte_flow_item *items,
> +                     struct rte_flow_error *error);
> +
>   #ifdef __cplusplus
>   }
>   #endif
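
The draft says the PMD-allocated array "should be concatenated" with the
application's own actions. A minimal sketch of that step is below, using
stand-in types so it compiles without DPDK; act, act_type and
concat_actions() are hypothetical. The RFC does not state the required
ordering, so the helper simply joins two arrays in the order the caller
passes them and appends the END terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for enum rte_flow_action_type / struct rte_flow_action. */
enum act_type { ACT_END = 0, ACT_VXLAN_DECAP, ACT_JUMP, ACT_PMD_PRIVATE };

struct act {
	enum act_type type;
	const void   *conf;
};

/* Join two action arrays (neither END-terminated) into out and append
 * ACT_END. Returns the number of entries written, including END, or -1
 * if out is too small. */
static int
concat_actions(const struct act *first, uint32_t n_first,
	       const struct act *second, uint32_t n_second,
	       struct act *out, size_t out_cap)
{
	size_t need = (size_t)n_first + n_second + 1;
	size_t i, k = 0;

	assert(out != NULL);
	if (need > out_cap)
		return -1;
	for (i = 0; i < n_first; i++)
		out[k++] = first[i];
	for (i = 0; i < n_second; i++)
		out[k++] = second[i];
	out[k].type = ACT_END;
	out[k].conf = NULL;
	return (int)(k + 1);
}
```

In a real application one of the two arrays would come from the drafted
rte_flow_tunnel_set() and would be handed back through
rte_flow_action_release() once rte_flow_create() has returned; both calls
are part of this draft only.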
