From mboxrd@z Thu Jan 1 00:00:00 1970
From: "John Daley (johndale)"
To: "Doherty, Declan", "dev@dpdk.org"
Thread-Topic: [dpdk-dev] [RFC] tunnel endpoint
hw acceleration enablement
Date: Thu, 11 Jan 2018 21:44:12 +0000
Message-ID: <9bc42f401ab543d1806d3a60ab86dd7e@XCH-RCD-007.cisco.com>
In-Reply-To: <345C63BAECC1AD42A2EC8C63AFFC3ADCC488E501@IRSMSX102.ger.corp.intel.com>
Subject: Re: [dpdk-dev] [RFC] tunnel endpoint hw acceleration enablement
List-Id: DPDK patches and discussions

Hi,

One comment on the DECAP action and a "feature request". I'll also reply to
the top-of-thread discussion separately. Thanks for the RFC, Declan!

Feature request associated with the ENCAP action:

VPP (and probably other apps) would like the ability to simply specify an
independent tunnel ID as part of the egress match criteria in an rte_flow
rule. Egress packets could then carry a tunnel ID and a valid flag in the
mbuf. If the ID matched the rte_flow tunnel ID item, the NIC could do a
simple lookup and execute the associated actions (particularly ENCAP). The
application already knows which tunnel the packet is associated with, so
there is no need to have the NIC match on a header pattern. Also, packet
headers alone may not be enough to determine the correct encap action (the
bridge the packet came from might be required).

This would require a new mbuf field to specify the tunnel ID (maybe in
tx_offload) and a valid flag.
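To make the request concrete, here is a minimal software sketch of the
lookup path being asked for. All names below (pkt_meta, flow_table_add,
lookup_encap) are hypothetical illustrations, not existing or proposed DPDK
API; the point is just that the egress path is a direct table lookup on the
application-supplied tunnel ID instead of header matching.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical per-packet metadata: a tunnel ID plus valid flag,
 * standing in for a new field in the mbuf (maybe in tx_offload). */
struct pkt_meta {
	uint32_t tunnel_id;
	uint8_t  tunnel_id_valid;
};

/* Mock of a NIC-side table that an rte_flow rule matching the proposed
 * META_TUNNEL_ID item would program: tunnel ID -> encap action. */
#define MAX_TEPS 16
static int encap_action_for_tep[MAX_TEPS]; /* -1 = no rule installed */

static void flow_table_init(void)
{
	for (size_t i = 0; i < MAX_TEPS; i++)
		encap_action_for_tep[i] = -1;
}

/* The rte_flow_create() equivalent: bind a tunnel ID to an encap action. */
static void flow_table_add(uint32_t tunnel_id, int encap_action)
{
	if (tunnel_id < MAX_TEPS)
		encap_action_for_tep[tunnel_id] = encap_action;
}

/* Egress lookup: no header parsing needed, since the application
 * already knows which tunnel the packet belongs to. */
static int lookup_encap(const struct pkt_meta *m)
{
	if (!m->tunnel_id_valid || m->tunnel_id >= MAX_TEPS)
		return -1;
	return encap_action_for_tep[m->tunnel_id];
}
```

A packet with the valid flag clear, or an ID with no installed rule, would
simply miss and fall through to normal (software) processing.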
It would also require a new rte_flow item type for matching the tunnel ID
(something like RTE_FLOW_ITEM_TYPE_META_TUNNEL_ID).

Is something like this being considered by others? If not, should it be part
of this RFC or a new one? I think this would be the first metadata match
criterion in rte_flow, but I could see others following.

-johnd

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Doherty, Declan
> Sent: Thursday, December 21, 2017 2:21 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [RFC] tunnel endpoint hw acceleration enablement
>
> This RFC contains a proposal to add a new tunnel endpoint API to DPDK that,
> when used in conjunction with rte_flow, enables the configuration of inline
> data path encapsulation and decapsulation of tunnel endpoint network
> overlays on accelerated IO devices.
>
> The proposed new API would provide for the creation, destruction, and
> monitoring of a tunnel endpoint in supporting hw, as well as capabilities
> APIs to allow the acceleration features to be discovered by applications.
>
> /** Tunnel Endpoint context, opaque structure */
> struct rte_tep;
>
> enum rte_tep_type {
>         RTE_TEP_TYPE_VXLAN = 1, /**< VXLAN Protocol */
>         RTE_TEP_TYPE_NVGRE,     /**< NVGRE Protocol */
>         ...
> };
>
> /** Tunnel Endpoint Attributes */
> struct rte_tep_attr {
>         enum rte_tep_type type;
>
>         /* other endpoint attributes here */
> };
>
> /**
> * Create a tunnel end-point context as specified by the flow attribute
> * and pattern
> *
> * @param port_id Port identifier of Ethernet device.
> * @param attr Flow rule attributes.
> * @param pattern Pattern specification by list of rte_flow_items.
> * @return
> *  - On success returns pointer to TEP context
> *  - On failure returns NULL
> */
> struct rte_tep *rte_tep_create(uint16_t port_id,
>                                struct rte_tep_attr *attr,
>                                struct rte_flow_item pattern[])
>
> /**
> * Destroy an existing tunnel end-point context.
All of the end-point's
> * context will be destroyed, so all active flows using the tep should be
> * freed before destroying the context.
> * @param port_id Port identifier of Ethernet device.
> * @param tep Tunnel endpoint context
> * @return
> *  - On success returns 0
> *  - On failure returns 1
> */
> int rte_tep_destroy(uint16_t port_id, struct rte_tep *tep)
>
> /**
> * Get tunnel endpoint statistics
> *
> * @param port_id Port identifier of Ethernet device.
> * @param tep Tunnel endpoint context
> * @param stats Tunnel endpoint statistics
> *
> * @return
> *  - On success returns 0
> *  - On failure returns 1
> */
> int
> rte_tep_stats_get(uint16_t port_id, struct rte_tep *tep,
>                   struct rte_tep_stats *stats)
>
> /**
> * Get a port's tunnel endpoint capabilities
> *
> * @param port_id Port identifier of Ethernet device.
> * @param capabilities Tunnel endpoint capabilities
> *
> * @return
> *  - On success returns 0
> *  - On failure returns 1
> */
> int
> rte_tep_capabilities_get(uint16_t port_id,
>                          struct rte_tep_capabilities *capabilities)
>
>
> To direct traffic flows to hw-terminated tunnel endpoints, the rte_flow
> API is enhanced with a new flow item type. This contains a pointer to the
> TEP context as well as the overlay flow id to which the traffic flow is
> associated.
>
> struct rte_flow_item_tep {
>         struct rte_tep *tep;
>         uint32_t flow_id;
> };
>
> Also, 2 new generic action types are added: encapsulation and
> decapsulation.
>
> RTE_FLOW_ACTION_TYPE_ENCAP
> RTE_FLOW_ACTION_TYPE_DECAP
>
> struct rte_flow_action_encap {
>         struct rte_flow_item *item;
> };
>
> struct rte_flow_action_decap {
>         struct rte_flow_item *item;
> };
>
> The following section outlines the intended usage of the new APIs and then
> how they are combined with the existing rte_flow APIs.
>
> Tunnel endpoints are created on logical ports which support the capability
> with rte_tep_create(), from a combination of TEP attributes and
> rte_flow_items.
In the example below a new IPv4 VxLAN endpoint is being
> defined.
> The attrs parameter sets the TEP type, and could be used for other
> possible attributes.
>
> struct rte_tep_attr attrs = { .type = RTE_TEP_TYPE_VXLAN };
>
> The values for the headers which make up the tunnel endpoint are then
> defined using the spec parameter in the rte_flow items (IPv4, UDP and
> VxLAN in this case):
>
> struct rte_flow_item_ipv4 ipv4_item = {
>         .hdr = { .src_addr = saddr, .dst_addr = daddr }
> };
>
> struct rte_flow_item_udp udp_item = {
>         .hdr = { .src_port = sport, .dst_port = dport }
> };
>
> struct rte_flow_item_vxlan vxlan_item = { .flags = vxlan_flags };
>
> struct rte_flow_item pattern[] = {
>         { .type = RTE_FLOW_ITEM_TYPE_IPV4, .spec = &ipv4_item },
>         { .type = RTE_FLOW_ITEM_TYPE_UDP, .spec = &udp_item },
>         { .type = RTE_FLOW_ITEM_TYPE_VXLAN, .spec = &vxlan_item },
>         { .type = RTE_FLOW_ITEM_TYPE_END }
> };
>
> The tunnel endpoint can then be created on the port. Whether or not any hw
> configuration is required at this point would be hw dependent, but if not,
> the context for the TEP is available for use in programming flows, so the
> application is not forced to redefine the TEP parameters on each flow
> addition.
>
> struct rte_tep *tep = rte_tep_create(port_id, &attrs, pattern);
>
> Once the tep context is created, flows can then be directed to that
> endpoint for processing. The following sections outline how the author
> envisages flow programming will work and also how TEP acceleration can be
> combined with other accelerations.
>
>
> Ingress TEP decapsulation, mark and forward to queue:
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> The flow definitions for TEP decapsulation actions should specify the full
> outer packet to be matched at a minimum. The outer packet definition
> should match the tunnel definition in the tep context and the tep flow id.
> This example describes matching on the outer header, marking the packet
> with the VXLAN VNI and directing it to a specified queue of the port.
>
> Source Packet
>
>           Decapsulate Outer Hdr
>          /                     \                       decap outer crc
>         /                       \                     /               \
> +-----+------+-----+-------+-----+------+-----+---------+-----+-----------+
> | ETH | IPv4 | UDP | VxLAN | ETH | IPv4 | TCP | PAYLOAD | CRC | OUTER CRC |
> +-----+------+-----+-------+-----+------+-----+---------+-----+-----------+
>
> /* Flow Attributes/Items Definitions */
>
> struct rte_flow_attr attr = { .ingress = 1 };
>
> struct rte_flow_item_eth eth_item = {
>         .src = s_addr, .dst = d_addr, .type = ether_type
> };
> struct rte_flow_item_tep tep_item = { .tep = tep, .flow_id = vni };
>
> struct rte_flow_item pattern[] = {
>         { .type = RTE_FLOW_ITEM_TYPE_ETH, .spec = &eth_item },
>         { .type = RTE_FLOW_ITEM_TYPE_TEP, .spec = &tep_item },
>         { .type = RTE_FLOW_ITEM_TYPE_END }
> };
>
> /* Flow Actions Definitions */
>
> struct rte_flow_action_decap decap_eth = {
>         .type = RTE_FLOW_ITEM_TYPE_ETH,
>         .item = { .src = s_addr, .dst = d_addr, .type = ether_type }
> };
>
> struct rte_flow_action_decap decap_tep = {
>         .type = RTE_FLOW_ITEM_TYPE_TEP,
>         .item = &tep_item
> };
>
> struct rte_flow_action_queue queue_action = { .index = qid };
>
> struct rte_flow_action_mark mark_action = { .id = vni };
>
> struct rte_flow_action actions[] = {
>         { .type = RTE_FLOW_ACTION_TYPE_DECAP, .conf = &decap_eth },
>         { .type = RTE_FLOW_ACTION_TYPE_DECAP, .conf = &decap_tep },
>         { .type = RTE_FLOW_ACTION_TYPE_MARK, .conf = &mark_action },
>         { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue_action },
>         { .type = RTE_FLOW_ACTION_TYPE_END }
> };
>

Does the conf for the RTE_FLOW_ACTION_TYPE_DECAP action specify the first
pattern to decap up to? In the above, is the 1st decap action needed?
Wouldn't the 2nd action decap up to the matching vni?
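For reference, here is a small software model of what the two DECAP actions
above would strip in the IPv4/UDP/VxLAN case. The header sizes assume no
IPv4 options and no VLAN tags; this is an illustrative sketch of the
byte-accounting only, not driver code.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative outer-header sizes for the IPv4/UDP/VxLAN example
 * (no IPv4 options, no VLAN tags assumed). */
enum {
	OUTER_ETH_LEN  = 14,
	OUTER_IPV4_LEN = 20,
	OUTER_UDP_LEN  = 8,
	VXLAN_HDR_LEN  = 8,
};

/* Total bytes a DECAP up to the inner Ethernet header would strip. */
static inline uint32_t vxlan_decap_len(void)
{
	return OUTER_ETH_LEN + OUTER_IPV4_LEN + OUTER_UDP_LEN + VXLAN_HDR_LEN;
}

/* Software model of the decap itself: drop the first 'off' bytes of
 * the frame and return the remaining length. */
static uint32_t decap(uint8_t *pkt, uint32_t len, uint32_t off)
{
	if (off > len)
		return 0;
	memmove(pkt, pkt + off, len - off);
	return len - off;
}
```

In other words, once the outer pattern (including the VNI) has matched,
the whole decap collapses to a single fixed offset, which is why a single
action covering both the outer Ethernet and the TEP headers looks
sufficient.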
On our nic, we would have to translate the decap actions into a
(level, offset) pair, which requires a lot of effort. Since the packet is
already matched, perhaps 'struct rte_flow_item' is not the right thing to
pass to the decap action, and a simple (layer, offset) pair could be used
instead. E.g. to decap up to the inner Ethernet header of a VxLAN packet:

struct rte_flow_action_decap {
	uint32_t level;
	uint8_t offset;
};

struct rte_flow_action_decap decap_tep = {
	.level = RTE_PTYPE_L4_UDP,
	.offset = sizeof(struct vxlan_hdr)
};

Using RTE_PTYPE... is just for illustration - we might want to define our
own layers in rte_flow.h. You could specify inner packet layers, and the
offset need not be restricted to the size of the header, so that decap to
an absolute offset could be allowed, e.g.:

struct rte_flow_action_decap decap_42 = {
	.level = RTE_PTYPE_L2_ETHER,
	.offset = 42
};

> /** VERY IMPORTANT NOTE **/
> One of the core concepts of this proposal is that actions which modify the
> packet are defined in the order in which they are to be processed. So
> first decap the outer Ethernet header, then the outer TEP headers.
> I think this is not only logical from a usability point of view, it should
> also simplify the logic required in PMDs to parse the desired actions.
>
> struct rte_flow *flow =
>         rte_flow_create(port_id, &attr, pattern, actions, &err);
>
> The processed packets are delivered to the specified queue with mbuf
> metadata denoting the marked flow id and with the mbuf ol_flags
> PKT_RX_TEP_OFFLOAD flag set.
>
> +-----+------+-----+---------+-----+
> | ETH | IPv4 | TCP | PAYLOAD | CRC |
> +-----+------+-----+---------+-----+
>
>
> Ingress TEP decapsulation switch to port:
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> This is intended to represent how a TEP decapsulation could be configured
> in a switching offload case. It assumes that there is a logical port
> representation for all ports on the hw switch in the DPDK application, but
> similar functionality could be achieved by specifying something like a VF
> ID of the device.
>
> Like the previous scenario, the flow definitions for TEP decapsulation
> actions should specify the full outer packet to be matched at a minimum,
> but also define the elements of the inner match to match against,
> including masks if required.
>
> struct rte_flow_attr attr = { .ingress = 1 };
>
> struct rte_flow_item pattern[] = {
>         { .type = RTE_FLOW_ITEM_TYPE_ETH, .spec = &outer_eth_item },
>         { .type = RTE_FLOW_ITEM_TYPE_TEP, .spec = &outer_tep_item,
>           .mask = &tep_mask },
>         { .type = RTE_FLOW_ITEM_TYPE_ETH, .spec = &inner_eth_item,
>           .mask = &eth_mask },
>         { .type = RTE_FLOW_ITEM_TYPE_IPV4, .spec = &inner_ipv4_item,
>           .mask = &ipv4_mask },
>         { .type = RTE_FLOW_ITEM_TYPE_TCP, .spec = &inner_tcp_item,
>           .mask = &tcp_mask },
>         { .type = RTE_FLOW_ITEM_TYPE_END }
> };
>
> /* Flow Actions Definitions */
>
> struct rte_flow_action_decap decap_eth = {
>         .type = RTE_FLOW_ITEM_TYPE_ETH,
>         .item = { .src = s_addr, .dst = d_addr, .type = ether_type }
> };
>
> struct rte_flow_action_decap decap_tep = {
>         .type = RTE_FLOW_ITEM_TYPE_TEP,
>         .item = &outer_tep_item
> };
>
> struct rte_flow_action_port port_action = { .index = port_id };
>
> struct rte_flow_action actions[] = {
>         { .type = RTE_FLOW_ACTION_TYPE_DECAP, .conf = &decap_eth },
>         { .type = RTE_FLOW_ACTION_TYPE_DECAP, .conf = &decap_tep },
>         { .type =
RTE_FLOW_ACTION_TYPE_PORT, .conf = &port_action },
>         { .type = RTE_FLOW_ACTION_TYPE_END }
> };
>
> struct rte_flow *flow =
>         rte_flow_create(port_id, &attr, pattern, actions, &err);
>
> This action will forward the decapsulated packets to another port of the
> switch fabric, but no information on the tunnel or on the fact that the
> packet was decapsulated will be passed with it, thereby enabling
> segregation of the infrastructure and
>
>
> Egress TEP encapsulation:
> ~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Encapsulation TEP actions require the flow definitions for the source
> packet and then the actions to apply to it. This example shows an IPv4/TCP
> packet action.
>
> Source Packet
>
> +-----+------+-----+---------+-----+
> | ETH | IPv4 | TCP | PAYLOAD | CRC |
> +-----+------+-----+---------+-----+
>
> struct rte_flow_attr attr = { .egress = 1 };
>
> struct rte_flow_item_eth eth_item = {
>         .src = s_addr, .dst = d_addr, .type = ether_type
> };
> struct rte_flow_item_ipv4 ipv4_item = {
>         .hdr = { .src_addr = src_addr, .dst_addr = dst_addr }
> };
> struct rte_flow_item_tcp tcp_item = {
>         .hdr = { .src_port = src_port, .dst_port = dst_port }
> };
>
> struct rte_flow_item pattern[] = {
>         { .type = RTE_FLOW_ITEM_TYPE_ETH, .spec = &eth_item },
>         { .type = RTE_FLOW_ITEM_TYPE_IPV4, .spec = &ipv4_item },
>         { .type = RTE_FLOW_ITEM_TYPE_TCP, .spec = &tcp_item },
>         { .type = RTE_FLOW_ITEM_TYPE_END }
> };
>
> /* Flow Actions Definitions */
>
> struct rte_flow_action_encap encap_eth = {
>         .type = RTE_FLOW_ITEM_TYPE_ETH,
>         .item = { .src = s_addr, .dst = d_addr, .type = ether_type }
> };
>
> struct rte_flow_action_encap encap_tep = {
>         .type = RTE_FLOW_ITEM_TYPE_TEP,
>         .item = { .tep = tep, .flow_id = vni }
> };
> struct rte_flow_action_port port_action = { .index = port_id };
>
> struct rte_flow_action actions[] = {
>         { .type = RTE_FLOW_ACTION_TYPE_ENCAP, .conf =
&encap_tep },
>         { .type = RTE_FLOW_ACTION_TYPE_ENCAP, .conf = &encap_eth },
>         { .type = RTE_FLOW_ACTION_TYPE_PORT, .conf = &port_action },
>         { .type = RTE_FLOW_ACTION_TYPE_END }
> };
>
> struct rte_flow *flow =
>         rte_flow_create(port_id, &attr, pattern, actions, &err);
>
>
>           encapsulating Outer Hdr
>          /                       \                      outer crc
>         /                         \                    /         \
> +-----+------+-----+-------+-----+------+-----+---------+-----+-----------+
> | ETH | IPv4 | UDP | VxLAN | ETH | IPv4 | TCP | PAYLOAD | CRC | OUTER CRC |
> +-----+------+-----+-------+-----+------+-----+---------+-----+-----------+
>
>
>
> Chaining multiple modification actions, e.g. IPsec and TEP
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> For example, the definition for full hw acceleration of an IPsec
> ESP/Transport SA encapsulated in a VxLAN tunnel would look something like:
>
> struct rte_flow_action actions[] = {
>         { .type = RTE_FLOW_ACTION_TYPE_ENCAP, .conf = &encap_tep },
>         { .type = RTE_FLOW_ACTION_TYPE_SECURITY, .conf = &sec_session },
>         { .type = RTE_FLOW_ACTION_TYPE_ENCAP, .conf = &encap_eth },
>         { .type = RTE_FLOW_ACTION_TYPE_END }
> };
>
> 1. Source Packet
>
> +-----+------+-----+---------+-----+
> | ETH | IPv4 | TCP | PAYLOAD | CRC |
> +-----+------+-----+---------+-----+
>
> 2. First Action - Tunnel Endpoint Encapsulation
>
> +------+-----+-------+-----+------+-----+---------+-----+
> | IPv4 | UDP | VxLAN | ETH | IPv4 | TCP | PAYLOAD | CRC |
> +------+-----+-------+-----+------+-----+---------+-----+
>
> 3. Second Action - IPsec ESP/Transport Security Processing
>
> +------+-----+-------------------+-------------+
> | IPv4 | ESP | ENCRYPTED PAYLOAD | ESP TRAILER |
> +------+-----+-------------------+-------------+
>
> 4.
Third Action - Outer Ethernet Encapsulation
>
> +-----+------+-----+-------------------+-------------+-----------+
> | ETH | IPv4 | ESP | ENCRYPTED PAYLOAD | ESP TRAILER | OUTER CRC |
> +-----+------+-----+-------------------+-------------+-----------+
>
> This example demonstrates the importance of making the interoperation of
> actions ordered: as in the above example, a security action can be defined
> on both the inner and outer packet by simply placing another security
> action at the beginning of the action list.
>
> It also demonstrates the rationale for not collapsing the Ethernet header
> into the TEP definition, as when there are multiple encapsulating actions,
> any of them could potentially be the place where the Ethernet header needs
> to be defined.
>
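The ordered-action principle in the chaining example can be modeled in a
few lines: applying actions in list order means each action wraps the
result of the previous one, so the last action in the list produces the
outermost header. The types and helper below are mock illustrations, not
DPDK API.

```c
#include <stdio.h>
#include <string.h>
#include <stddef.h>

/* Mock action types mirroring the proposal's ENCAP/SECURITY actions. */
enum action_type { ACT_ENCAP_TEP, ACT_SECURITY, ACT_ENCAP_ETH, ACT_END };

/* Fold over the action list: each step prepends its layer, modeling how
 * each action wraps the packet produced by the previous one. Returns the
 * final outer-to-inner header layout as a '/'-separated string. */
static const char *apply_actions(const enum action_type *acts,
				 char *buf, size_t buflen)
{
	char tmp[128];

	strncpy(buf, "inner", buflen - 1);
	buf[buflen - 1] = '\0';
	for (size_t i = 0; acts[i] != ACT_END; i++) {
		const char *layer =
			acts[i] == ACT_ENCAP_TEP ? "vxlan" :
			acts[i] == ACT_SECURITY  ? "esp"   : "eth";
		snprintf(tmp, sizeof(tmp), "%s/%s", layer, buf);
		strncpy(buf, tmp, buflen - 1);
		buf[buflen - 1] = '\0';
	}
	return buf;
}
```

For the IPsec-in-VxLAN list above (ENCAP tep, SECURITY, ENCAP eth) this
yields eth/esp/vxlan/inner: the outer Ethernet header lands outermost, and
moving a security action to the front of the list would instead protect
the inner packet, exactly as the note argues.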