From: Thomas Monjalon
To: dev@dpdk.org
Cc: Ori Kam, Eli Britstein, Sriharsha Basavapatna, Hemal Shah, Oz Shlomo, ajit.khaparde@broadcom.com
Date: Wed, 24 Jun 2020 19:09:01 +0200
Message-ID: <2787244.yqipf475h2@thomas>
In-Reply-To: <5862610e-76cc-7783-7d66-2b2173eeb974@mellanox.com>
References: <5862610e-76cc-7783-7d66-2b2173eeb974@mellanox.com>
Subject: Re: [dpdk-dev] [RFC] - Offloading tunnel ports
Ping for review

09/06/2020 17:07, Oz Shlomo:
> The rte_flow API provides the building blocks for vendor-agnostic flow
> classification offloads. The rte_flow match and action primitives are
> fine-grained, giving DPDK applications the flexibility to offload network
> stacks and complex pipelines.
>
> Applications wishing to offload complex data structures (e.g. tunnel virtual
> ports) are required to use the rte_flow primitives, such as group, meta, mark,
> tag and others, to model their high-level objects.
>
> The hardware model design for high-level software objects is not trivial.
> Furthermore, an optimal design is often vendor specific.
>
> The goal of this RFC is to provide applications with a hardware offload
> model for common high-level software objects which is optimal with regard
> to the underlying hardware.
>
> Tunnel ports are the first of such objects.
>
> Tunnel ports
> ------------
> Ingress processing of tunneled traffic requires the classification
> of the tunnel type followed by a decap action.
>
> In software, once a packet is decapsulated, the in_port field is changed
> to a virtual port representing the tunnel type. The outer header fields
> are stored as packet metadata members and may be matched by subsequent
> flows.
>
> Openvswitch, for example, uses two flows:
> 1. Classification flow - sets the virtual port representing the tunnel type.
>    For example: match on udp port 4789, actions=tnl_pop(vxlan_vport)
> 2. Steering flow according to outer and inner header matches.
>    For example: match on in_port=vxlan_vport and outer/inner header matches,
>    actions=forward to port X
> The benefits of multi-flow tables are described in [1].
>
> Offloading tunnel ports
> -----------------------
> Tunnel ports introduce a new stateless field that can be matched on.
> Currently the rte_flow library provides an API to encap, decap and match
> on tunnel headers. However, there is no rte_flow primitive to set and
> match tunnel virtual ports.
>
> There are several possible hardware models for offloading virtual tunnel port
> flows, including, but not limited to, the following:
> 1. Setting the virtual port on a hw register using the rte_flow_action_mark/
>    rte_flow_action_tag/rte_flow_set_meta objects.
> 2. Mapping a virtual port to an rte_flow group.
> 3. Avoiding the need to match on transient objects by merging multi-table
>    flows into a single rte_flow rule.
>
> Every approach has its pros and cons.
> The preferred approach should take into account the entire system architecture
> and is very often vendor specific.
>
> The proposed rte_flow_tunnel_port_set helper function (drafted below) is designed
> to provide a common, vendor-agnostic API for setting the virtual port value.
> The helper API enables PMD implementations to return a vendor-specific combination
> of rte_flow actions realizing the vendor's hardware model for setting a tunnel port.
> Applications may append the list of actions returned from the helper function when
> creating an rte_flow rule in hardware.
>
> Similarly, the rte_flow_tunnel_port_match helper (drafted below) allows multiple
> hardware implementations to return a list of rte_flow items.
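
As a rough illustration of how an application could consume such a helper, here is
a minimal sketch (not part of the RFC) of the classification rule, using the
rte_flow_tunnel_set() and rte_flow_action_release() names from the draft below.
The steering group number, array sizes and VXLAN properties are only illustrative;
the point is that the application never interprets the PMD-returned actions, it
only concatenates them with its own.

/*
 * Sketch: build the OVS-style classification rule ("tnl_pop") with the
 * drafted helper. Error handling is trimmed; the steering group, array
 * sizes and VXLAN key are illustrative only.
 */
#include <string.h>
#include <rte_byteorder.h>
#include <rte_flow.h>

#define TNL_STEER_GROUP 1 /* illustrative group holding the steering rules */

static struct rte_flow *
offload_tnl_pop(uint16_t port_id)
{
        struct rte_flow_error err;
        struct rte_flow_tunnel tunnel = {
                .type = RTE_FLOW_ITEM_TYPE_VXLAN,
        };
        struct rte_flow_action *tnl_actions = NULL;
        uint32_t nb_tnl_actions = 0;

        /* The PMD returns its vendor-specific "set tunnel port" actions. */
        if (rte_flow_tunnel_set(port_id, &tunnel, &tnl_actions,
                                &nb_tnl_actions, &err) < 0)
                return NULL;

        /* Per the draft's sample usage:
         * vxlan_decap / tunnel_set(tunnel properties) / jump / end
         */
        struct rte_flow_action_jump jump = { .group = TNL_STEER_GROUP };
        struct rte_flow_action actions[16];
        uint32_t i = 0, j;

        memset(actions, 0, sizeof(actions));
        actions[i++].type = RTE_FLOW_ACTION_TYPE_VXLAN_DECAP;
        for (j = 0; j < nb_tnl_actions && i < 13; j++)
                actions[i++] = tnl_actions[j];
        actions[i].type = RTE_FLOW_ACTION_TYPE_JUMP;
        actions[i].conf = &jump;
        i++;
        actions[i].type = RTE_FLOW_ACTION_TYPE_END;

        /* Classification pattern: outer UDP destination port 4789 (VXLAN). */
        struct rte_flow_item_udp udp_spec = { .hdr.dst_port = RTE_BE16(4789) };
        struct rte_flow_item_udp udp_mask = { .hdr.dst_port = RTE_BE16(0xffff) };
        struct rte_flow_item pattern[] = {
                { .type = RTE_FLOW_ITEM_TYPE_ETH },
                { .type = RTE_FLOW_ITEM_TYPE_IPV4 },
                { .type = RTE_FLOW_ITEM_TYPE_UDP,
                  .spec = &udp_spec, .mask = &udp_mask },
                { .type = RTE_FLOW_ITEM_TYPE_END },
        };
        struct rte_flow_attr attr = { .ingress = 1 };

        struct rte_flow *flow =
                rte_flow_create(port_id, &attr, pattern, actions, &err);

        /* Assuming the PMD-allocated action array may be released once the
         * rule has been created.
         */
        rte_flow_action_release(port_id, tnl_actions, &err);
        return flow;
}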
>
> Miss handling
> -------------
> Packets going through multiple rte_flow groups are exposed to hw misses due to
> partial packet processing. In such cases, the software should continue the
> packet's processing from the point where the hardware missed.
>
> We propose a generic rte_flow_restore structure providing the state that was
> stored in hardware when the packet missed.
>
> Currently, the structure will provide the tunnel state of the packet that
> missed, namely:
> 1. The group id that missed
> 2. The tunnel port that missed
> 3. Tunnel information that was stored in memory (due to a decap action)
> In the future, we may add additional fields as more state may be stored in
> the device memory (e.g. ct_state).
>
> Applications may query the state via a new rte_flow_get_restore_info(mbuf) API,
> thus allowing a vendor-specific implementation.
>
> The API draft is provided below.
>
> ---
> diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
> index b0e4199192..49c871fc46 100644
> --- a/lib/librte_ethdev/rte_flow.h
> +++ b/lib/librte_ethdev/rte_flow.h
> @@ -3324,6 +3324,193 @@ int
>  rte_flow_get_aged_flows(uint16_t port_id, void **contexts,
>                          uint32_t nb_contexts, struct rte_flow_error *error);
>
> +/* Tunnel information. */
> +__rte_experimental
> +struct rte_flow_ip_tunnel_key {
> +        rte_be64_t tun_id; /**< Tunnel identification. */
> +        union {
> +                struct {
> +                        rte_be32_t src; /**< IPv4 source address. */
> +                        rte_be32_t dst; /**< IPv4 destination address. */
> +                } ipv4;
> +                struct {
> +                        uint8_t src[16]; /**< IPv6 source address. */
> +                        uint8_t dst[16]; /**< IPv6 destination address. */
> +                } ipv6;
> +        } u;
> +        bool is_ipv6; /**< True for valid IPv6 fields. Otherwise IPv4. */
> +        rte_be16_t tun_flags; /**< Tunnel flags. */
> +        uint8_t tos; /**< TOS for IPv4, TC for IPv6. */
> +        uint8_t ttl; /**< TTL for IPv4, HL for IPv6. */
> +        rte_be32_t label; /**< Flow Label for IPv6. */
> +        rte_be16_t tp_src; /**< Tunnel port source. */
> +        rte_be16_t tp_dst; /**< Tunnel port destination. */
> +};
> +
> +/* Tunnel has a type and the key information. */
> +__rte_experimental
> +struct rte_flow_tunnel {
> +        /** Tunnel type, for example RTE_FLOW_ITEM_TYPE_VXLAN,
> +         * RTE_FLOW_ITEM_TYPE_NVGRE etc. */
> +        enum rte_flow_item_type type;
> +        struct rte_flow_ip_tunnel_key tun_info; /**< Tunnel key info. */
> +};
> +
> +/**
> + * Indicate that the packet has a tunnel.
> + */
> +#define RTE_FLOW_RESTORE_INFO_TUNNEL (1ULL << 0)
> +
> +/**
> + * Indicate that the packet has a non-decapsulated tunnel header.
> + */
> +#define RTE_FLOW_RESTORE_INFO_ENCAPSULATED (1ULL << 1)
> +
> +/**
> + * Indicate that the packet has a group_id.
> + */
> +#define RTE_FLOW_RESTORE_INFO_GROUP_ID (1ULL << 2)
> +
> +/**
> + * Restore information structure to communicate the current packet processing
> + * state when some of the processing pipeline is done in hardware and should
> + * continue in software.
> + */
> +__rte_experimental
> +struct rte_flow_restore_info {
> +        /** Bitwise flags (RTE_FLOW_RESTORE_INFO_*) to indicate validation of
> +         * other fields in struct rte_flow_restore_info.
> +         */
> +        uint64_t flags;
> +        uint32_t group_id; /**< Group ID. */
> +        struct rte_flow_tunnel tunnel; /**< Tunnel information. */
> +};
> +
> +/**
> + * Allocate an array of actions to be used in rte_flow_create, to implement
> + * tunnel-set for the given tunnel.
> + * Sample usage:
> + *   actions vxlan_decap / tunnel_set(tunnel properties) / jump group 0 / end
> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param[in] tunnel
> + *   Tunnel properties.
> + * @param[out] actions
> + *   Array of actions to be allocated by the PMD. This array should be
> + *   concatenated with the actions array provided to rte_flow_create.
> + * @param[out] num_of_actions
> + *   Number of actions allocated.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL. PMDs initialize this
> + *   structure in case of error only.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_tunnel_set(uint16_t port_id,
> +                    struct rte_flow_tunnel *tunnel,
> +                    struct rte_flow_action **actions,
> +                    uint32_t *num_of_actions,
> +                    struct rte_flow_error *error);
> +
> +/**
> + * Allocate an array of items to be used in rte_flow_create, to implement
> + * tunnel-match for the given tunnel.
> + * Sample usage:
> + *   pattern tunnel-match(tunnel properties) / outer-header-matches /
> + *           inner-header-matches / end
> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param[in] tunnel
> + *   Tunnel properties.
> + * @param[out] items
> + *   Array of items to be allocated by the PMD. This array should be
> + *   concatenated with the items array provided to rte_flow_create.
> + * @param[out] num_of_items
> + *   Number of items allocated.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL. PMDs initialize this
> + *   structure in case of error only.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_tunnel_match(uint16_t port_id,
> +                      struct rte_flow_tunnel *tunnel,
> +                      struct rte_flow_item **items,
> +                      uint32_t *num_of_items,
> +                      struct rte_flow_error *error);
> +
> +/**
> + * Populate the current packet processing state, if it exists, for the given mbuf.
> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param[in] m
> + *   Mbuf struct.
> + * @param[out] info
> + *   Restore information. Upon success, contains the HW state.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL. PMDs initialize this
> + *   structure in case of error only.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_get_restore_info(uint16_t port_id,
> +                          struct rte_mbuf *m,
> +                          struct rte_flow_restore_info *info,
> +                          struct rte_flow_error *error);
> +
> +/**
> + * Release the action array as allocated by rte_flow_tunnel_set.
> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param[in] actions
> + *   Array of actions to be released.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL. PMDs initialize this
> + *   structure in case of error only.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_action_release(uint16_t port_id,
> +                        struct rte_flow_action *actions,
> +                        struct rte_flow_error *error);
> +
> +/**
> + * Release the item array as allocated by rte_flow_tunnel_match.
> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param[in] items
> + *   Array of items to be released.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL. PMDs initialize this
> + *   structure in case of error only.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_item_release(uint16_t port_id,
> +                      struct rte_flow_item *items,
> +                      struct rte_flow_error *error);
> +
>  #ifdef __cplusplus
>  }
>  #endif
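
And a matching sketch (again, not part of the RFC) of the miss-handling path,
using rte_flow_get_restore_info(), rte_flow_tunnel_match() and
rte_flow_item_release() from the draft above. The forwarding port, steering
group and inner pattern are only illustrative.

/*
 * Sketch: software continuation after a hardware miss, and installation
 * of the steering rule for the restored tunnel.
 */
#include <string.h>
#include <rte_mbuf.h>
#include <rte_flow.h>

#define TNL_STEER_GROUP 1 /* illustrative group holding the steering rules */

static void
handle_sw_miss(uint16_t port_id, struct rte_mbuf *m)
{
        struct rte_flow_error err;
        struct rte_flow_restore_info info;

        /* Ask the PMD which HW state, if any, this packet carries. */
        if (rte_flow_get_restore_info(port_id, m, &info, &err) < 0)
                return; /* no restore info: process as a plain packet */

        if (info.flags & RTE_FLOW_RESTORE_INFO_GROUP_ID) {
                /* info.group_id is the group in which the packet missed. */
        }
        if (!(info.flags & RTE_FLOW_RESTORE_INFO_TUNNEL))
                return;

        /* The packet was classified as tunneled in HW. If the ENCAPSULATED
         * flag is set the outer header is still present; otherwise
         * info.tunnel carries the outer info the SW stack would normally
         * read from the packet (cf. OVS tnl_pop metadata).
         */

        /* Install the steering rule so the next packets stay in HW:
         * tunnel-match items first, then the outer/inner header matches.
         */
        struct rte_flow_item *tnl_items = NULL;
        uint32_t nb_tnl_items = 0;
        if (rte_flow_tunnel_match(port_id, &info.tunnel, &tnl_items,
                                  &nb_tnl_items, &err) < 0)
                return;

        struct rte_flow_item pattern[16];
        uint32_t i = 0, j;

        memset(pattern, 0, sizeof(pattern));
        for (j = 0; j < nb_tnl_items && i < 14; j++)
                pattern[i++] = tnl_items[j];
        pattern[i++].type = RTE_FLOW_ITEM_TYPE_ETH; /* inner matches here */
        pattern[i].type = RTE_FLOW_ITEM_TYPE_END;

        struct rte_flow_action_port_id fwd = { .id = 0 }; /* illustrative */
        struct rte_flow_action actions[] = {
                { .type = RTE_FLOW_ACTION_TYPE_PORT_ID, .conf = &fwd },
                { .type = RTE_FLOW_ACTION_TYPE_END },
        };
        struct rte_flow_attr attr = {
                .group = TNL_STEER_GROUP, .ingress = 1 };

        rte_flow_create(port_id, &attr, pattern, actions, &err);
        rte_flow_item_release(port_id, tnl_items, &err);
}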