From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from EUR02-HE1-obe.outbound.protection.outlook.com (mail-eopbgr10049.outbound.protection.outlook.com [40.107.1.49]) by dpdk.org (Postfix) with ESMTP id 9F7591B58C for ; Wed, 10 Oct 2018 15:17:03 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Mellanox.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=ZPp7SjvG2NKR/zIXmGBS57sq3jQjtrIpytsG5O1JnkE=; b=tFEGhPBHbutTEquL/89Mxud2fQ3oSUsTCWTpG5Qa9SxxvnIMi1hAIcyUHW5Fpj/UGltbHyQcGflmb/WRm5YtX11WF1bGLhDjXc0YG7tZA512UeKnXPrAVJfbqRjSHPwrcxnCd+TE1ncqwRVFQikYUmhIDSH+lSwPdTPUylyjqV8= Received: from AM4PR05MB3425.eurprd05.prod.outlook.com (10.171.187.142) by AM4PR05MB1650.eurprd05.prod.outlook.com (10.165.245.145) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.1207.26; Wed, 10 Oct 2018 13:17:01 +0000 Received: from AM4PR05MB3425.eurprd05.prod.outlook.com ([fe80::fdeb:3890:f0f0:2202]) by AM4PR05MB3425.eurprd05.prod.outlook.com ([fe80::fdeb:3890:f0f0:2202%4]) with mapi id 15.20.1207.029; Wed, 10 Oct 2018 13:17:01 +0000 From: Ori Kam To: Adrien Mazarguil CC: Andrew Rybchenko , Ferruh Yigit , "stephen@networkplumber.org" , Declan Doherty , "dev@dpdk.org" , Dekel Peled , Thomas Monjalon , =?iso-8859-1?Q?N=E9lio_Laranjeiro?= , Yongseok Koh , Shahaf Shuler Thread-Topic: [PATCH v3 0/3] ethdev: add generic L2/L3 tunnel encapsulation actions Thread-Index: AQHUXj17J8xRsVbRkUC9nhuj/0sKpaUXIzyAgADp24CAAB8FgIAAOYqAgAAG5YA= Date: Wed, 10 Oct 2018 13:17:01 +0000 Message-ID: References: <1537995646-95260-1-git-send-email-orika@mellanox.com> <1538917054-68283-1-git-send-email-orika@mellanox.com> <9760f054-bbe9-2036-dd5d-d39edd906496@intel.com> <1165fc19-c68b-f13c-e2a6-eeb3f6937922@solarflare.com> <20181010120207.GM18937@6wind.com> In-Reply-To: <20181010120207.GM18937@6wind.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: authentication-results: spf=none (sender IP is ) smtp.mailfrom=orika@mellanox.com; x-originating-ip: [193.47.165.251] x-ms-publictraffictype: Email x-microsoft-exchange-diagnostics: 1; AM4PR05MB1650; 6:y5LyJALWAE6hkPojfWUT7c1tk27VITTIXzp0lejVwufM76YJu49L8J0YKFbz4B0rwARkI3EV/d4rDDdPWSBnzZ6IEebU6By4+WaoOoD/7mxrXYYhkzcF/gHg0UWbytm8JCiDXLfmy44ZCAPL1HnocT6r8pJQWQNCvnvp64JUHzETS8pajBFrHWjRE7CLaJusvNHvqpgbl2tBB1FsuofSjuzk4cQby+Riao2ZOH9caK5FmqYOhj+OTlzK4wBdSpUZ3Ogf+IGBPMkif5PdVtw/OdVKAiYiq05XTTRWYfvhlGuc4KtxHsEPx6M4Egkr79i8gEgJgqDWPjuAVfq94YILFk59QdH/5aUpeFyjEe6fdhkE1Dfzh3l9J31paoy0ECECP/5otEGR3J0cSTXb1TJQxKHPRHwj6qw27/piRram8UhDFTNgPamjpZWIhrnZtf+jselPdgbgMcLKETcrhkqQmw==; 5:F5SJHqamWrXFPBJ0igZjz0MiAtVDEMMwBLpNZ4rkRPNOspM9g2WFG5Eq3dh13so/4K76wMIOUbNyO+oMMlDgoxho0scTiZCXngtsL+0mFSCNWRex51ApR2wbgpeDVOz7xAmBbPwkFiRalsaJwWJisCZv3SYtbvdJX8dpBCVKQys=; 7:M7IHrPNdfe4+5NRSLdugU0h92TIqhKmXM1qGdkipmTio7+3h4HmbDRCn6XhFYUyGcVNPlcQMzQxnTErx8QDc1Ib7JQLuP8p+vvCrkvahtX7CdAV+DfTXBnyk9jRyjUL53aZC/r9bqp2Sny9bP6MHgPZWpy+a1coFrvgsbA5kjoDe5yUG9iCVdHyy1HupujVW6dp3kVgdXqcNHbHtbxvtrHJEpN/8z/MiGmik7IpwVPEpnQ7ZwGt7XzLTrTyPyZBQ x-ms-exchange-antispam-srfa-diagnostics: SOS; x-ms-office365-filtering-correlation-id: 5cf80743-77f1-41d2-25a2-08d62eb2aa3b x-ms-office365-filtering-ht: Tenant x-microsoft-antispam: BCL:0; PCL:0; RULEID:(7020095)(4652040)(8989299)(4534185)(4627221)(201703031133081)(201702281549075)(8990200)(5600074)(711020)(4618075)(2017052603328)(7153060)(7193020); SRVR:AM4PR05MB1650; x-ms-traffictypediagnostic: AM4PR05MB1650: x-ld-processed: a652971c-7d2e-4d9b-a6a4-d149256f461b,ExtAddr x-microsoft-antispam-prvs: x-exchange-antispam-report-test: UriScan:(244540007438412)(45079756050767)(189930954265078)(228905959029699); x-ms-exchange-senderadcheck: 1 x-exchange-antispam-report-cfa-test: BCL:0; PCL:0; RULEID:(8211001083)(6040522)(2401047)(8121501046)(5005006)(3231355)(944501410)(52105095)(3002001)(93006095)(93001095)(10201501046)(6055026)(149066)(150057)(6041310)(20161123560045)(20161123562045)(201703131423095)(201702281528075)(20161123555045)(201703061421075)(201703061406153)(20161123564045)(20161123558120)(201708071742011)(7699051)(76991055); SRVR:AM4PR05MB1650; BCL:0; PCL:0; RULEID:; SRVR:AM4PR05MB1650; x-forefront-prvs: 08213D42D3 x-forefront-antispam-report: SFV:NSPM; SFS:(10009020)(979002)(39860400002)(376002)(346002)(366004)(136003)(396003)(189003)(199004)(13464003)(14444005)(68736007)(476003)(7736002)(74316002)(229853002)(4744004)(7696005)(305945005)(316002)(6506007)(66574009)(93886005)(446003)(53546011)(8676002)(53936002)(97736004)(86362001)(105586002)(575784001)(71190400001)(99286004)(71200400001)(106356001)(5250100002)(486006)(9686003)(25786009)(107886003)(2900100001)(102836004)(66066001)(45080400002)(4326008)(6306002)(26005)(3846002)(55016002)(6116002)(186003)(76176011)(11346002)(6916009)(2906002)(256004)(54906003)(561944003)(966005)(33656002)(478600001)(81166006)(5660300001)(6246003)(14454004)(54075001)(81156014)(8936002)(6436002)(41533002)(969003)(989001)(999001)(1009001)(1019001); DIR:OUT; SFP:1101; SCL:1; SRVR:AM4PR05MB1650; H:AM4PR05MB3425.eurprd05.prod.outlook.com; FPR:; SPF:None; LANG:en; PTR:InfoNoRecords; MX:1; A:1; received-spf: None (protection.outlook.com: mellanox.com does not designate permitted sender hosts) x-microsoft-antispam-message-info: VllQqa457bsjN8lv2v3kBJb67zwu9Mzp+XOnu/yXX2H4qnk47zy59CvQfDzPwQR7oFPJ/Hjf9fR4uPAPW/1o249vN7aIQ3Knsv/cfuO2mrYheKT3Hj8ZTSbOIXhnOI4RGVbmw7Y/ym0fOXKgT3WJpT1CJWKnU5ajGbgdbzB7Xu/t1TPFkxapaLUbvjrLCJhPM1NjL9WnL8+RQVNYX57qh8umxGdwPFjGb626DyU3RnG8Kk11mNLCcdJNU2/MXyN09LSdseMWPUFYkfc6emHpvRpVEUekdOXnpv4+aeTuQqWZX/zzUFWMlxkz0KZd9M3qlgdsIxOZxAxZb6vxCoaqim0JSMN2kM5qD4cylqAzmdQ= spamdiagnosticoutput: 1:99 spamdiagnosticmetadata: NSPM Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: Mellanox.com X-MS-Exchange-CrossTenant-Network-Message-Id: 5cf80743-77f1-41d2-25a2-08d62eb2aa3b X-MS-Exchange-CrossTenant-originalarrivaltime: 10 Oct 2018 13:17:01.3938 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: a652971c-7d2e-4d9b-a6a4-d149256f461b X-MS-Exchange-Transport-CrossTenantHeadersStamped: AM4PR05MB1650 Subject: Re: [dpdk-dev] [PATCH v3 0/3] ethdev: add generic L2/L3 tunnel encapsulation actions X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Oct 2018 13:17:04 -0000 Hi PSB. > -----Original Message----- > From: Adrien Mazarguil > Sent: Wednesday, October 10, 2018 3:02 PM > To: Ori Kam > Cc: Andrew Rybchenko ; Ferruh Yigit > ; stephen@networkplumber.org; Declan Doherty > ; dev@dpdk.org; Dekel Peled > ; Thomas Monjalon ; N=E9lio > Laranjeiro ; Yongseok Koh > ; Shahaf Shuler > Subject: Re: [PATCH v3 0/3] ethdev: add generic L2/L3 tunnel encapsulatio= n > actions >=20 > Sorry if I'm a bit late to the discussion, please see below. >=20 > On Wed, Oct 10, 2018 at 09:00:52AM +0000, Ori Kam wrote: > > > > On 10/7/2018 1:57 PM, Ori Kam wrote: > > > This series implement the generic L2/L3 tunnel encapsulation actions > > > and is based on rfc [1] "add generic L2/L3 tunnel encapsulation actio= ns" > > > > > > Currenlty the encap/decap actions only support encapsulation > > > of VXLAN and NVGRE L2 packets (L2 encapsulation is where > > > the inner packet has a valid Ethernet header, while L3 encapsulation > > > is where the inner packet doesn't have the Ethernet header). > > > In addtion the parameter to to the encap action is a list of rte item= s, > > > this results in 2 extra translation, between the application to the a= ction > > > and from the action to the NIC. This results in negetive impact on th= e > > > insertion performance. >=20 > Not sure it's a valid concern since in this proposal, PMD is still expect= ed > to interpret the opaque buffer contents regardless for validation and to > convert it to its internal format. >=20 This is the action to take, we should assume that the pattern is valid and not parse it at all. Another issue, we have a lot of complains about the time we take=20 for validation, I know that currently we must validate the rule when creati= ng it, but this can change, why should a rule that was validate and the only chang= e is the IP dest of the encap data? virtual switch after creating the first flow are just modifying it so why f= orce them into revalidating it? (but this issue is a different topic) > Worse, it will require a packet parser to iterate over enclosed headers > instead of a list of convenient rte_flow_whatever objects. It won't be > faster without the convenience of pointers to properly aligned structures > that only contain relevant data fields. > Also in the rte_item we are not aligned so there is no difference in perfor= mance, between the two approaches, In the rte_item actually we have unused pointer= which are just a waste. Also needs to consider how application are using it. They are already have = it in raw buffer so it saves the conversation time for the application. =20 > > > Looking forward there are going to be a need to support many more tun= nel > > > encapsulations. For example MPLSoGRE, MPLSoUDP. > > > Adding the new encapsulation will result in duplication of code. > > > For example the code for handling NVGRE and VXLAN are exactly the sam= e, > > > and each new tunnel will have the same exact structure. > > > > > > This series introduce a generic encapsulation for L2 tunnel types, an= d > > > generic encapsulation for L3 tunnel types. In addtion the new > > > encapsulations commands are using raw buffer inorder to save the > > > converstion time, both for the application and the PMD. >=20 > From a usability standpoint I'm not a fan of the current interface to > perform NVGRE/VXLAN encap, however this proposal adds another layer of > opaqueness in the name of making things more generic than rte_flow alread= y > is. >=20 I'm sorry but I don't understand why it is more opaqueness, as I see it is = very simple just give the encapsulation data and that's it. For example on system that = support number of encapsulations they don't need to call to a different function just to chan= ge the buffer. > Assuming they are not to be interpreted by PMDs, maybe there's a case for > prepending arbitrary buffers to outgoing traffic and removing them from > incoming traffic. However this feature should not be named "generic tunne= l > encap/decap" as it's misleading. >=20 > Something like RTE_FLOW_ACTION_TYPE_HEADER_(PUSH|POP) would be > more > appropriate. I think on the "pop" side, only the size would matter. >=20 Maybe the name can be change but again the application does encapsulation s= o it will be more intuitive for it. > Another problem is that you must not require actions to rely on specific > pattern content: >=20 I don't think this can be true anymore since for example what do you expect to happen when you place an action for example modify ip to packet with no = ip? This may raise issues in the NIC. Same goes for decap after the flow is in the NIC he must assume that he can= remove otherwise=20 really unexpected beaver can accord. > [...] > * > * Decapsulate outer most tunnel from matched flow. > * > * The flow pattern must have a valid tunnel header > */ > RTE_FLOW_ACTION_TYPE_TUNNEL_DECAP, >=20 > For maximum flexibility, all actions should be usable on their own on emp= ty > pattern. On the other hand, you can document undefined behavior when > performing some action on traffic that doesn't contain something. >=20 Like I said and like it is already defined for VXLAN_enacp we must know the pattern otherwise the rule can be declined in Kernel / crash when tryin= g to decap=20 packet without outer tunnel. > Reason is that invalid traffic may have already been removed by other flo= w > rules (or whatever happens) before such a rule is reached; it's a user's > responsibility to provide such guarantee. >=20 > When parsing an action, a PMD is not supposed to look at the pattern. Act= ion > list should contain all the needed info, otherwise it means the API is ba= dly > defined. >=20 > I'm aware the above makes it tough to implement something like > RTE_FLOW_ACTION_TYPE_TUNNEL_DECAP as defined in this series, but that's > unfortunately why I think it must not be defined like that. >=20 > My opinion is that the best generic approach to perform encap/decap with > rte_flow would use one dedicated action per protocol header to > add/remove/modify. This is the suggestion I originally made for > VXLAN/NVGRE [2] and this is one of the reasons the order of actions now > matters [3]. I agree that your approach make a lot of sense, but there are number of iss= ues with it * it is harder and takes more time from the application point of view. * it is slower when compared to the raw buffer.=20 >=20 > Remember that whatever is provided, be it an opaque buffer (like you did)= , a > separate list of items (like VXLAN/NVGRE) or the rte_flow action list its= elf > (what I'm suggesting to do), PMDs have to process it. There will be a CPU > cost. Keep in mind odd use cases that involve QinQinQinQinQ. >=20 > > > I like the idea to generalize encap/decap actions. It makes a bit har= der > > > for reader to find which encap/decap actions are supported in fact, > > > but it changes nothing for automated usage in the code - just try it > > > (as a generic way used in rte_flow). > > > > > > > Even now the user doesn't know which encapsulation is supported since > > it is PMD and sometime kernel related. On the other end it simplify add= ing > > encapsulation to specific costumers with some time just FW update. >=20 > Except for raw push/pop of uninterpreted headers, tunnel encapsulations n= ot > explicitly supported by rte_flow shouldn't be possible. Who will expect > something that isn't defined by the API to work and rely on it in their > application? I don't see it happening. >=20 Some of our customers are working with private tunnel type, and they can co= nfigure it using kernel=20 or just new FW this is a real use case. > Come on, adding new encap/decap actions to DPDK is shouldn't be such a pa= in > that the only alternative is a generic API to work around me :) >=20 Yes but like I said when a costumer asks for a ecnap and I can give it to h= im why wait for the DPDK next release? > > > Arguments about a way of encap/decap headers specification (flow item= s > > > vs raw) sound sensible, but I'm not sure about it. > > > It would be simpler if the tunnel header is added appended or removed > > > as is, but as I understand it is not true. For example, IPv4 ID will = be > > > different in incoming packets to be decapsulated and different values > > > should be used on encapsulation. Checksums will be different (but > > > offloaded in any case). > > > > > > > I'm not sure I understand your comment. > > Decapsulation is independent of encapsulation, for example if we decap > > L2 tunnel type then there is no parameter at all the NIC just removes > > the outer layers. >=20 > According to the pattern? As described above, you can't rely on that. > Pattern does not necessarily match the full stack of outer layers. >=20 > Decap action must be able to determine what to do on its own, possibly in > conjunction with other actions in the list but that's all. >=20 Decap removes the outer headers. Some tunnels don't have inner L2 and it must be added after the decap this is what L3 decap means, and the user must supply the valid L2 header. > > > Current way allows to specify which fields do not matter and which on= e > > > must match. It allows to say that, for example, VNI match is sufficie= nt > > > to decapsulate. > > > > > > > The encapsulation according to definition, is a list of headers that sh= ould > > encapsulate the packet. So I don't understand your comment about matchi= ng > > fields. The matching is based on the flow and the encapsulation is just= data > > that should be added on top of the packet. > > > > > Also arguments assume that action input is accepted as is by the HW. > > > It could be true, but could be obviously false and HW interface may > > > require parsed input (i.e. driver must parse the input buffer and ext= ract > > > required fields of packet headers). > > > > > > > You are correct there some PMD even Mellanox (for the E-Switch) require= to > parsed input > > There is no driver that knows rte_flow structure so in any case there s= hould > be > > Some translation between the encapsulation data and the NIC data. > > I agree that writing the code for translation can be harder in this app= roach, > > but the code is only written once is the insertion speed is much higher= this > way. >=20 > Avoiding code duplication enough of a reason to do something. Yes NVGRE a= nd > VXLAN encap/decap should be redefined because of that. But IMO, they shou= ld > prepend a single VXLAN or NVGRE header and be followed by other actions t= hat > in turn prepend a UDP header, an IPv4/IPv6 one, any number of VLAN header= s > and finally an Ethernet header. >=20 > > Also like I said some Virtual Switches are already store this data in r= aw buffer > > (they update only specific fields) so this will also save time for the = application > when > > creating a rule. > > > > > So, I'd say no. It should be better motivated if we change existing > > > approach (even advertised as experimental). > > > > I think the reasons I gave are very good motivation to change the appro= ach > > please also consider that there is no implementation yet that supports = the > > old approach. >=20 > Well, although the existing API made this painful, I did submit one [4] a= nd > there's an updated version from Slava [5] for mlx5. >=20 > > while we do have code that uses the new approach. >=20 > If you need the ability to prepend a raw buffer, please consider a differ= ent > name for the related actions, redefine them without reliance on specific > pattern items and leave NVGRE/VXLAN encap/decap as is for the time > being. They can deprecated anytime without ABI impact. >=20 > On the other hand if that raw buffer is to be interpreted by the PMD for > more intelligent tunnel encap/decap handling, I do not agree with the > proposed approach for usability reasons. >=20 > [2] [PATCH v3 2/4] ethdev: Add tunnel encap/decap actions >=20 > https://emea01.safelinks.protection.outlook.com/?url=3Dhttps%3A%2F%2Fmail= s.d > pdk.org%2Farchives%2Fdev%2F2018- > April%2F096418.html&data=3D02%7C01%7Corika%40mellanox.com%7C7b9 > 9c5f781424ba7950608d62ea83efa%7Ca652971c7d2e4d9ba6a4d149256f461b% > 7C0%7C0%7C636747697489048905&sdata=3DprABlYixGAkdnyZ2cetpgz5%2F > vkMmiC66T3ZNE%2FewkQ4%3D&reserved=3D0 >=20 > [3] ethdev: alter behavior of flow API actions >=20 > https://emea01.safelinks.protection.outlook.com/?url=3Dhttps%3A%2F%2Fgit.= dpdk > .org%2Fdpdk%2Fcommit%2F%3Fid%3Dcc17feb90413&data=3D02%7C01%7C > orika%40mellanox.com%7C7b99c5f781424ba7950608d62ea83efa%7Ca652971 > c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636747697489058915&sdata > =3DVavsHXeQ3SgMzaTlklBWdkKSEBjELMp9hwUHBlLQlVA%3D&reserved=3D0 >=20 > [4] net/mlx5: add VXLAN encap support to switch flow rules >=20 > https://emea01.safelinks.protection.outlook.com/?url=3Dhttps%3A%2F%2Fmail= s.d > pdk.org%2Farchives%2Fdev%2F2018- > August%2F110598.html&data=3D02%7C01%7Corika%40mellanox.com%7C7b > 99c5f781424ba7950608d62ea83efa%7Ca652971c7d2e4d9ba6a4d149256f461b > %7C0%7C0%7C636747697489058915&sdata=3DlpfDWp9oBN8AFNYZ6VL5BjI > 38SDFt91iuU7pvhbC%2F0E%3D&reserved=3D0 >=20 > [5] net/mlx5: e-switch VXLAN flow validation routine >=20 > https://emea01.safelinks.protection.outlook.com/?url=3Dhttps%3A%2F%2Fmail= s.d > pdk.org%2Farchives%2Fdev%2F2018- > October%2F113782.html&data=3D02%7C01%7Corika%40mellanox.com%7C7 > b99c5f781424ba7950608d62ea83efa%7Ca652971c7d2e4d9ba6a4d149256f461 > b%7C0%7C0%7C636747697489058915&sdata=3D8GCbYk6uB2ahZaHaqWX4 > OOq%2B7ZLwxiApcs%2FyRAT9qOw%3D&reserved=3D0 >=20 > -- > Adrien Mazarguil > 6WIND