From mboxrd@z Thu Jan 1 00:00:00 1970
To: Sriharsha Basavapatna
Cc: dev@dpdk.org, Thomas Monjalon, Ori Kam, Eli Britstein, Hemal Shah
References: <5862610e-76cc-7783-7d66-2b2173eeb974@mellanox.com>
From: Oz Shlomo
Date: Thu, 2 Jul 2020 14:43:06 +0300
Subject: Re: [dpdk-dev] [RFC] - Offloading tunnel ports
List-Id: DPDK patches and discussions
Content-Type: text/plain; charset=utf-8; format=flowed

On 7/2/2020 2:34 PM, Sriharsha Basavapatna wrote:
> On Tue, Jun 9, 2020 at 8:37 PM Oz Shlomo wrote:
>>
>> Rte_flow API provides the building blocks for vendor agnostic flow
>> classification offloads. The rte_flow match and action primitives are fine
>> grained, thus enabling DPDK applications the flexibility to offload network
>> stacks and complex pipelines.
>>
>> Applications wishing to offload complex data structures (e.g. tunnel virtual
>> ports) are required to use the rte_flow primitives, such as group, meta, mark,
>> tag and others to model their high level objects.
>>
>> The hardware model design for high level software objects is not trivial.
>> Furthermore, an optimal design is often vendor specific.
>>
>> The goal of this RFC is to provide applications with the hardware offload
>> model for common high level software objects which is optimal in regards
>> to the underlying hardware.
>>
>> Tunnel ports are the first of such objects.
>>
>> Tunnel ports
>> ------------
>> Ingress processing of tunneled traffic requires the classification
>> of the tunnel type followed by a decap action.
>>
>> In software, once a packet is decapsulated the in_port field is changed
>> to a virtual port representing the tunnel type. The outer header fields
>> are stored as packet metadata members and may be matched by subsequent
>> flows.
>>
>> Openvswitch, for example, uses two flows:
>> 1. classification flow - setting the virtual port representing the tunnel type
>>    For example: match on udp port 4789 actions=tnl_pop(vxlan_vport)
>> 2. steering flow according to outer and inner header matches
>>    match on in_port=vxlan_vport and outer/inner header matches actions=forward to port X
>> The benefits of multi-flow tables are described in [1].
>
> You probably forgot to add a link to this reference [1]? I couldn't
> find it in this email.
>
> Thanks,
> -Harsha

Right, sorry about that.
Here is the reference:
[1] - https://www.opennetworking.org/wp-content/uploads/2014/10/TR_Multiple_Flow_Tables_and_TTPs.pdf

>>
>> Offloading tunnel ports
>> -----------------------
>> Tunnel ports introduce a new stateless field that can be matched on.
>> Currently the rte_flow library provides an API to encap, decap and match
>> on tunnel headers. However, there is no rte_flow primitive to set and
>> match tunnel virtual ports.
>>
>> There are several possible hardware models for offloading virtual tunnel port
>> flows including, but not limited to, the following:
>> 1. Setting the virtual port on a hw register using the rte_flow_action_mark/
>>    rte_flow_action_tag/rte_flow_set_meta objects.
>> 2. Mapping a virtual port to an rte_flow group
>> 3. Avoiding the need to match on transient objects by merging multi-table
>>    flows to a single rte_flow rule.
>>
>> Every approach has its pros and cons.
>> The preferred approach should take into account the entire system architecture
>> and is very often vendor specific.
>>
>> The proposed rte_flow_tunnel_set helper function (drafted below) is designed
>> to provide a common, vendor agnostic, API for setting the virtual port value.
>> The helper API enables PMD implementations to return a vendor specific combination of
>> rte_flow actions realizing the vendor's hardware model for setting a tunnel port.
>> Applications may append the list of actions returned from the helper function when
>> creating an rte_flow rule in hardware.
>>
>> Similarly, the rte_flow_tunnel_match helper (drafted below) allows for
>> multiple hardware implementations to return a list of rte_flow items.
>>
>> Miss handling
>> -------------
>> Packets going through multiple rte_flow groups are exposed to hw misses due to
>> partial packet processing. In such cases, the software should continue the
>> packet's processing from the point where the hardware missed.
>>
>> We propose a generic rte_flow_restore_info structure providing the state that was
>> stored in hardware when the packet missed.
>>
>> Currently, the structure will provide the tunnel state of the packet that
>> missed, namely:
>> 1. The group id that missed
>> 2. The tunnel port that missed
>> 3. Tunnel information that was stored in memory (due to decap action).
>> In the future, we may add additional fields as more state may be stored in
>> the device memory (e.g. ct_state).
>>
>> Applications may query the state via a new rte_flow_get_restore_info(mbuf) API,
>> thus allowing a vendor specific implementation.
>>
>> API draft is provided below
>>
>> ---
>> diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
>> index b0e4199192..49c871fc46 100644
>> --- a/lib/librte_ethdev/rte_flow.h
>> +++ b/lib/librte_ethdev/rte_flow.h
>> @@ -3324,6 +3324,193 @@ int
>>  rte_flow_get_aged_flows(uint16_t port_id, void **contexts,
>>  			uint32_t nb_contexts, struct rte_flow_error *error);
>>
>> +/* Tunnel information. */
>> +__rte_experimental
>> +struct rte_flow_ip_tunnel_key {
>> +	rte_be64_t tun_id; /**< Tunnel identification. */
>> +	union {
>> +		struct {
>> +			rte_be32_t src; /**< IPv4 source address. */
>> +			rte_be32_t dst; /**< IPv4 destination address. */
>> +		} ipv4;
>> +		struct {
>> +			uint8_t src[16]; /**< IPv6 source address. */
>> +			uint8_t dst[16]; /**< IPv6 destination address. */
>> +		} ipv6;
>> +	} u;
>> +	bool is_ipv6; /**< True for valid IPv6 fields. Otherwise IPv4. */
>> +	rte_be16_t tun_flags; /**< Tunnel flags. */
>> +	uint8_t tos; /**< TOS for IPv4, TC for IPv6. */
>> +	uint8_t ttl; /**< TTL for IPv4, HL for IPv6. */
>> +	rte_be32_t label; /**< Flow Label for IPv6. */
>> +	rte_be16_t tp_src; /**< Tunnel port source. */
>> +	rte_be16_t tp_dst; /**< Tunnel port destination. */
>> +};
>> +
>> +
>> +/* Tunnel has a type and the key information. */
>> +__rte_experimental
>> +struct rte_flow_tunnel {
>> +	/** Tunnel type, for example RTE_FLOW_ITEM_TYPE_VXLAN,
>> +	 * RTE_FLOW_ITEM_TYPE_NVGRE etc. */
>> +	enum rte_flow_item_type type;
>> +	struct rte_flow_ip_tunnel_key tun_info; /**< Tunnel key info. */
>> +};
>> +
>> +/**
>> + * Indicate that the packet has a tunnel.
>> + */
>> +#define RTE_FLOW_RESTORE_INFO_TUNNEL (1ULL << 0)
>> +
>> +/**
>> + * Indicate that the packet has a non decapsulated tunnel header.
>> + */
>> +#define RTE_FLOW_RESTORE_INFO_ENCAPSULATED (1ULL << 1)
>> +
>> +/**
>> + * Indicate that the packet has a group_id.
>> + */
>> +#define RTE_FLOW_RESTORE_INFO_GROUP_ID (1ULL << 2)
>> +
>> +/**
>> + * Restore information structure to communicate the current packet processing
>> + * state when some of the processing pipeline is done in hardware and should
>> + * continue in software.
>> + */
>> +__rte_experimental
>> +struct rte_flow_restore_info {
>> +	/** Bitwise flags (RTE_FLOW_RESTORE_INFO_*) to indicate validation of
>> +	 * other fields in struct rte_flow_restore_info.
>> +	 */
>> +	uint64_t flags;
>> +	uint32_t group_id; /**< Group ID. */
>> +	struct rte_flow_tunnel tunnel; /**< Tunnel information. */
>> +};
>> +
>> +/**
>> + * Allocate an array of actions to be used in rte_flow_create, to implement
>> + * tunnel-set for the given tunnel.
>> + * Sample usage:
>> + *   actions vxlan_decap / tunnel_set(tunnel properties) / jump group 0 / end
>> + *
>> + * @param port_id
>> + *   Port identifier of Ethernet device.
>> + * @param[in] tunnel
>> + *   Tunnel properties.
>> + * @param[out] actions
>> + *   Array of actions to be allocated by the PMD. This array should be
>> + *   concatenated with the actions array provided to rte_flow_create.
>> + * @param[out] num_of_actions
>> + *   Number of actions allocated.
>> + * @param[out] error
>> + *   Perform verbose error reporting if not NULL. PMDs initialize this
>> + *   structure in case of error only.
>> + *
>> + * @return
>> + *   0 on success, a negative errno value otherwise and rte_errno is set.
>> + */
>> +__rte_experimental
>> +int
>> +rte_flow_tunnel_set(uint16_t port_id,
>> +		    struct rte_flow_tunnel *tunnel,
>> +		    struct rte_flow_action **actions,
>> +		    uint32_t *num_of_actions,
>> +		    struct rte_flow_error *error);
>> +
>> +/**
>> + * Allocate an array of items to be used in rte_flow_create, to implement
>> + * tunnel-match for the given tunnel.
>> + * Sample usage:
>> + *   pattern tunnel-match(tunnel properties) / outer-header-matches /
>> + *      inner-header-matches / end
>> + *
>> + * @param port_id
>> + *   Port identifier of Ethernet device.
>> + * @param[in] tunnel
>> + *   Tunnel properties.
>> + * @param[out] items
>> + *   Array of items to be allocated by the PMD. This array should be
>> + *   concatenated with the items array provided to rte_flow_create.
>> + * @param[out] num_of_items
>> + *   Number of items allocated.
>> + * @param[out] error
>> + *   Perform verbose error reporting if not NULL. PMDs initialize this
>> + *   structure in case of error only.
>> + *
>> + * @return
>> + *   0 on success, a negative errno value otherwise and rte_errno is set.
>> + */
>> +__rte_experimental
>> +int
>> +rte_flow_tunnel_match(uint16_t port_id,
>> +		      struct rte_flow_tunnel *tunnel,
>> +		      struct rte_flow_item **items,
>> +		      uint32_t *num_of_items,
>> +		      struct rte_flow_error *error);
>> +
>> +/**
>> + * Populate the current packet processing state, if exists, for the given mbuf.
>> + *
>> + * @param port_id
>> + *   Port identifier of Ethernet device.
>> + * @param[in] m
>> + *   Mbuf struct.
>> + * @param[out] info
>> + *   Restore information. Upon success contains the HW state.
>> + * @param[out] error
>> + *   Perform verbose error reporting if not NULL. PMDs initialize this
>> + *   structure in case of error only.
>> + *
>> + * @return
>> + *   0 on success, a negative errno value otherwise and rte_errno is set.
>> + */
>> +__rte_experimental
>> +int
>> +rte_flow_get_restore_info(uint16_t port_id,
>> +			  struct rte_mbuf *m,
>> +			  struct rte_flow_restore_info *info,
>> +			  struct rte_flow_error *error);
>> +
>> +/**
>> + * Release the action array as allocated by rte_flow_tunnel_set.
>> + *
>> + * @param port_id
>> + *   Port identifier of Ethernet device.
>> + * @param[in] actions
>> + *   Array of actions to be released.
>> + * @param[out] error
>> + *   Perform verbose error reporting if not NULL. PMDs initialize this
>> + *   structure in case of error only.
>> + *
>> + * @return
>> + *   0 on success, a negative errno value otherwise and rte_errno is set.
>> + */
>> +__rte_experimental
>> +int
>> +rte_flow_action_release(uint16_t port_id,
>> +			struct rte_flow_action *actions,
>> +			struct rte_flow_error *error);
>> +
>> +/**
>> + * Release the item array as allocated by rte_flow_tunnel_match.
>> + *
>> + * @param port_id
>> + *   Port identifier of Ethernet device.
>> + * @param[in] items
>> + *   Array of items to be released.
>> + * @param[out] error
>> + *   Perform verbose error reporting if not NULL. PMDs initialize this
>> + *   structure in case of error only.
>> + *
>> + * @return
>> + *   0 on success, a negative errno value otherwise and rte_errno is set.
>> + */
>> +__rte_experimental
>> +int
>> +rte_flow_item_release(uint16_t port_id,
>> +		      struct rte_flow_item *items,
>> +		      struct rte_flow_error *error);
>> +
>>  #ifdef __cplusplus
>>  }
>>  #endif
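
To make the "concatenate the PMD actions with the application's own actions"
step from the draft a bit more concrete, here is a rough, self-contained
sketch. It deliberately does not include the DPDK headers: every sketch_*
type and function is a stand-in invented for this example, and the single
MARK action returned by the stub is only an assumption about what one PMD
might choose (the "virtual port in a hw register" model). It mirrors the
sample usage "vxlan_decap / tunnel_set(...) / jump group 0 / end".

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for enum rte_flow_action_type; values are illustrative only. */
enum sketch_action_type {
	SKETCH_ACTION_END = 0,
	SKETCH_ACTION_VXLAN_DECAP,
	SKETCH_ACTION_MARK,
	SKETCH_ACTION_JUMP,
};

/* Stand-in for struct rte_flow_action. */
struct sketch_action {
	enum sketch_action_type type;
	const void *conf;
};

/*
 * Hypothetical PMD backend for the drafted rte_flow_tunnel_set(): it
 * allocates the action list realizing "set tunnel port" for this device.
 * Here that is one MARK action; a real PMD may return something else.
 */
static int
sketch_tunnel_set(struct sketch_action **actions, uint32_t *num_of_actions)
{
	struct sketch_action *a = calloc(1, sizeof(*a));

	if (a == NULL)
		return -1;
	a[0].type = SKETCH_ACTION_MARK;
	*actions = a;
	*num_of_actions = 1;
	return 0;
}

/*
 * Build "head / pmd / tail / END" as one array for flow creation.
 * All three input pointers must be non-NULL. Returns a heap array that
 * the caller frees, or NULL on allocation failure.
 */
static struct sketch_action *
sketch_concat_actions(const struct sketch_action *head, uint32_t n_head,
		      const struct sketch_action *pmd, uint32_t n_pmd,
		      const struct sketch_action *tail, uint32_t n_tail)
{
	uint32_t total = n_head + n_pmd + n_tail + 1;
	struct sketch_action *out;

	assert(head != NULL && pmd != NULL && tail != NULL);
	out = calloc(total, sizeof(*out));
	if (out == NULL)
		return NULL;
	memcpy(out, head, n_head * sizeof(*out));
	memcpy(out + n_head, pmd, n_pmd * sizeof(*out));
	memcpy(out + n_head + n_pmd, tail, n_tail * sizeof(*out));
	out[total - 1].type = SKETCH_ACTION_END; /* terminator */
	return out;
}
```

With the drafted API the application would, in the same spirit, call
rte_flow_tunnel_set(), splice the returned array between its own decap and
jump actions, create the rule, and then hand the PMD array back through
rte_flow_action_release().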