From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm1-f66.google.com (mail-wm1-f66.google.com [209.85.128.66]) by dpdk.org (Postfix) with ESMTP id 0710E1B438 for ; Wed, 10 Oct 2018 18:10:34 +0200 (CEST) Received: by mail-wm1-f66.google.com with SMTP id y140-v6so13781932wmd.0 for ; Wed, 10 Oct 2018 09:10:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=6wind-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=PT+GnyMy0WMqFBajcW8LmCAG+HevCSqSjft9CcbOrV4=; b=g0YyD7tZTU0inHtmVEwhPzgu5WdjfzSJ/WmOn4N8iv/37bXPRWJIwf7KszJO7pyoT2 B1mOH/L0XkxFWffrW7Bl3MxSS69m7vcZCajZjMunmMjoo0Xx7wyyohA5mmZMAKnqjOGK E5OT4u7wx+Rj4AHymBNQ+bDRgrhIFwQ1BR9OTnHCFT0gFrbOCCe+i0nbrDod4NFxXxPC WoEPTB8faHXiWNXWeigChEzX9iE3ZfYsTUbGHZ0DHGlYKm+GBSRHB//CVwak0K+lzAoL /P+RufsBAKGYloTqoM9+b2+lqqd4uuG8OY8OMyuLD8B1xY0r34sP0ASOeCd8VSNyKMaO a90g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=PT+GnyMy0WMqFBajcW8LmCAG+HevCSqSjft9CcbOrV4=; b=p41VfMMDKNXyOz76IantT8loXT8gAo1k5D504LPywjanU/zN92UwBdu4T4gYzmrRA/ 856y48zzjfNq1FHXmVuWiMACj9+4PTAYCUpJbxP40p0Q8ar6pJjUkTRwUb5RvOX6ktR0 wgvRzReSU6jQZr1IfmfGJY3wTYyWZTGaF6MEFUMlyADhxjhXiZuHOKiRaZBMkXoXudI3 00f2URxKnRMHQK3R4PL+g2S2+FuBYhV9U7JS+ZuXCgpI1LSyfHW2F/GbM+wopshBYo7G xXZWgG5XpqwWYLPc5Mf64Eo/dNeYqDC9u1zdOYCN4BrxU5kpXeU5vzDzpa6W6SPNVLvR jJtw== X-Gm-Message-State: ABuFfogiDyfhxBrjl/ZV+L+lVhk/JjAjQAt/Q4gRrw8Upm9JU2jHip+0 xVHGRoe+1atjamrnYnh0c57Rlw== X-Google-Smtp-Source: ACcGV635GqaAoRaVlbbqjY6M3EnJ2+S56HrvJnhzJLgey85cFdtSrQEVDlMKAE4qEGTdB+OTRqrYzw== X-Received: by 2002:a1c:5fc2:: with SMTP id t185-v6mr1457803wmb.12.1539187834445; Wed, 10 Oct 2018 09:10:34 -0700 (PDT) Received: from 6wind.com (host.78.145.23.62.rev.coltfrance.com. [62.23.145.78]) by smtp.gmail.com with ESMTPSA id q77-v6sm9010110wmd.33.2018.10.10.09.10.32 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 10 Oct 2018 09:10:33 -0700 (PDT) Date: Wed, 10 Oct 2018 18:10:15 +0200 From: Adrien Mazarguil To: Ori Kam Cc: Andrew Rybchenko , Ferruh Yigit , "stephen@networkplumber.org" , Declan Doherty , "dev@dpdk.org" , Dekel Peled , Thomas Monjalon , =?utf-8?B?TsOpbGlv?= Laranjeiro , Yongseok Koh , Shahaf Shuler Message-ID: <20181010161015.GN18937@6wind.com> References: <1537995646-95260-1-git-send-email-orika@mellanox.com> <1538917054-68283-1-git-send-email-orika@mellanox.com> <9760f054-bbe9-2036-dd5d-d39edd906496@intel.com> <1165fc19-c68b-f13c-e2a6-eeb3f6937922@solarflare.com> <20181010120207.GM18937@6wind.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Subject: Re: [dpdk-dev] [PATCH v3 0/3] ethdev: add generic L2/L3 tunnel encapsulation actions X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Oct 2018 16:10:35 -0000 On Wed, Oct 10, 2018 at 01:17:01PM +0000, Ori Kam wrote: > > -----Original Message----- > > From: Adrien Mazarguil > > On Wed, Oct 10, 2018 at 09:00:52AM +0000, Ori Kam wrote: > > > > > > On 10/7/2018 1:57 PM, Ori Kam wrote: > > > > In addtion the parameter to to the encap action is a list of rte items, > > > > this results in 2 extra translation, between the application to the action > > > > and from the action to the NIC. This results in negetive impact on the > > > > insertion performance. > > > > Not sure it's a valid concern since in this proposal, PMD is still expected > > to interpret the opaque buffer contents regardless for validation and to > > convert it to its internal format. > > > This is the action to take, we should assume > that the pattern is valid and not parse it at all. > Another issue, we have a lot of complains about the time we take > for validation, I know that currently we must validate the rule when creating it, > but this can change, why should a rule that was validate and the only change > is the IP dest of the encap data? > virtual switch after creating the first flow are just modifying it so why force > them into revalidating it? (but this issue is a different topic) Did you measure what proportion of time is spent on validation when creating a flow rule? Based on past experience with mlx4/mlx5, creation used to involve a number of expensive system calls while validation was basically a single logic loop checking individual items/actions while performing conversion to HW format (mandatory for creation). Context switches related to kernel involvement are the true performance killers. I'm not sure this is a valid argument in favor of this approach since flow rule validation still needs to happen regardless. By the way, applications are not supposed to call rte_flow_validate() before rte_flow_create(). The former can be helpful in some cases (e.g. to get a rough idea of PMD capabilities during initialization) but they should in practice only rely on rte_flow_create(), then fall back to software processing if that fails. > > Worse, it will require a packet parser to iterate over enclosed headers > > instead of a list of convenient rte_flow_whatever objects. It won't be > > faster without the convenience of pointers to properly aligned structures > > that only contain relevant data fields. > > > Also in the rte_item we are not aligned so there is no difference in performance, > between the two approaches, In the rte_item actually we have unused pointer which > are just a waste. Regarding unused pointers: right, VXLAN/NVGRE encap actions shouldn't have relied on _pattern item_ structures, the room for their "last" pointer is arguably wasted. On the other hand, the "mask" pointer allows masking relevant fields that matter to the application (e.g. source/destination addresses as opposed to IPv4 length, version and other irrelevant fields for encap). Not sure why you think it's not aligned. We're comparing an array of rte_flow_item objects with raw packet data. The latter requires interpretation of each protocol header to jump to the next offset. This is more complex on both sides: to build such a buffer for the application, then to have it processed by the PMD. > Also needs to consider how application are using it. They are already have it in raw buffer > so it saves the conversation time for the application. I don't think so. Applications typically know where some traffic is supposed to go and what VNI it should use. They don't have a prefabricated packet handy to prepend to outgoing traffic. If that was the case they'd most likely do so themselves through a extra packet segment and not bother with PMD offloads. > > From a usability standpoint I'm not a fan of the current interface to > > perform NVGRE/VXLAN encap, however this proposal adds another layer of > > opaqueness in the name of making things more generic than rte_flow already > > is. > > > I'm sorry but I don't understand why it is more opaqueness, as I see it is very simple > just give the encapsulation data and that's it. For example on system that support number of > encapsulations they don't need to call to a different function just to change the buffer. I'm saying it's opaque from an API standpoint if you expect the PMD to interpret that buffer's contents in order to prepend it in a smart way. Since this generic encap does not support masks, there is no way for an application to at least tell a PMD what data matters and what doesn't in the provided buffer. This means invalid checksums, lengths and so on must be sent as is to the wire. What's the use case for such a behavior? > > Assuming they are not to be interpreted by PMDs, maybe there's a case for > > prepending arbitrary buffers to outgoing traffic and removing them from > > incoming traffic. However this feature should not be named "generic tunnel > > encap/decap" as it's misleading. > > > > Something like RTE_FLOW_ACTION_TYPE_HEADER_(PUSH|POP) would be > > more > > appropriate. I think on the "pop" side, only the size would matter. > > > Maybe the name can be change but again the application does encapsulation so it will > be more intuitive for it. > > > Another problem is that you must not require actions to rely on specific > > pattern content: > > > I don't think this can be true anymore since for example what do you expect > to happen when you place an action for example modify ip to packet with no ip? > > This may raise issues in the NIC. > Same goes for decap after the flow is in the NIC he must assume that he can remove otherwise > really unexpected beaver can accord. Right, that's why it must be documented as undefined behavior. The API is not supposed to enforce the relationship. A PMD may require the presence of some pattern item in order to perform some action, but this is a PMD limitation, not a limitation of the API itself. > For maximum flexibility, all actions should be usable on their own on empty > > pattern. On the other hand, you can document undefined behavior when > > performing some action on traffic that doesn't contain something. > > > > Like I said and like it is already defined for VXLAN_enacp we must know > the pattern otherwise the rule can be declined in Kernel / crash when trying to decap > packet without outer tunnel. Right, PMD limitation then. You are free to document it in the PMD. > > My opinion is that the best generic approach to perform encap/decap with > > rte_flow would use one dedicated action per protocol header to > > add/remove/modify. This is the suggestion I originally made for > > VXLAN/NVGRE [2] and this is one of the reasons the order of actions now > > matters [3]. > > I agree that your approach make a lot of sense, but there are number of issues with it > * it is harder and takes more time from the application point of view. > * it is slower when compared to the raw buffer. I'm convinced of the opposite :) We could try to implement your raw buffer approach as well as mine in testpmd (one action per layer, not the current VXLAN/NVGRE encap mess mind you) in order to determine which is the most convenient on the application side. > > Except for raw push/pop of uninterpreted headers, tunnel encapsulations not > > explicitly supported by rte_flow shouldn't be possible. Who will expect > > something that isn't defined by the API to work and rely on it in their > > application? I don't see it happening. > > > Some of our customers are working with private tunnel type, and they can configure it using kernel > or just new FW this is a real use case. You can already use negative types to quickly address HW and customer-specific needs by the way. Could this [6] perhaps address the issue? PMDs can expose public APIs. You could devise one that spits new negative item/action types based on some data, to be subsequently used by flow rules with that PMD only. > > Come on, adding new encap/decap actions to DPDK is shouldn't be such a pain > > that the only alternative is a generic API to work around me :) > > > > Yes but like I said when a costumer asks for a ecnap and I can give it to him why wait for the DPDK next release? I don't know, is rte_flow held to a special standard compared to other DPDK features in this regard? Engineering patches can always be provided, backported and whatnot. Customer applications will have to be modified and recompiled to benefit from any new FW capabilities regardless, it's extremely unlikely to be just a matter of installing a new FW image. > > Pattern does not necessarily match the full stack of outer layers. > > > > Decap action must be able to determine what to do on its own, possibly in > > conjunction with other actions in the list but that's all. > > > Decap removes the outer headers. > Some tunnels don't have inner L2 and it must be added after the decap > this is what L3 decap means, and the user must supply the valid L2 header. My point is that any data required to perform decap must be provided by the decap action itself, not through a pattern item, whose only purpose is to filter traffic and may not be present. Precisely what you did for L3 decap. > > > I think the reasons I gave are very good motivation to change the approach > > > please also consider that there is no implementation yet that supports the > > > old approach. > > > > Well, although the existing API made this painful, I did submit one [4] and > > there's an updated version from Slava [5] for mlx5. > > > > > while we do have code that uses the new approach. > > > > If you need the ability to prepend a raw buffer, please consider a different > > name for the related actions, redefine them without reliance on specific > > pattern items and leave NVGRE/VXLAN encap/decap as is for the time > > being. They can deprecated anytime without ABI impact. > > > > On the other hand if that raw buffer is to be interpreted by the PMD for > > more intelligent tunnel encap/decap handling, I do not agree with the > > proposed approach for usability reasons. I'm still not convinced by your approach. If these new actions *must* be included unmodified right now to prevent some customer cataclysm, then fine as an experiment but please leave VXLAN/NVGRE encaps alone for the time being. > > [2] [PATCH v3 2/4] ethdev: Add tunnel encap/decap actions > > > > https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmails.d > > pdk.org%2Farchives%2Fdev%2F2018- > > April%2F096418.html&data=02%7C01%7Corika%40mellanox.com%7C7b9 > > 9c5f781424ba7950608d62ea83efa%7Ca652971c7d2e4d9ba6a4d149256f461b% > > 7C0%7C0%7C636747697489048905&sdata=prABlYixGAkdnyZ2cetpgz5%2F > > vkMmiC66T3ZNE%2FewkQ4%3D&reserved=0 > > > > [3] ethdev: alter behavior of flow API actions > > > > https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.dpdk > > .org%2Fdpdk%2Fcommit%2F%3Fid%3Dcc17feb90413&data=02%7C01%7C > > orika%40mellanox.com%7C7b99c5f781424ba7950608d62ea83efa%7Ca652971 > > c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636747697489058915&sdata > > =VavsHXeQ3SgMzaTlklBWdkKSEBjELMp9hwUHBlLQlVA%3D&reserved=0 > > > > [4] net/mlx5: add VXLAN encap support to switch flow rules > > > > https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmails.d > > pdk.org%2Farchives%2Fdev%2F2018- > > August%2F110598.html&data=02%7C01%7Corika%40mellanox.com%7C7b > > 99c5f781424ba7950608d62ea83efa%7Ca652971c7d2e4d9ba6a4d149256f461b > > %7C0%7C0%7C636747697489058915&sdata=lpfDWp9oBN8AFNYZ6VL5BjI > > 38SDFt91iuU7pvhbC%2F0E%3D&reserved=0 > > > > [5] net/mlx5: e-switch VXLAN flow validation routine > > > > https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmails.d > > pdk.org%2Farchives%2Fdev%2F2018- > > October%2F113782.html&data=02%7C01%7Corika%40mellanox.com%7C7 > > b99c5f781424ba7950608d62ea83efa%7Ca652971c7d2e4d9ba6a4d149256f461 > > b%7C0%7C0%7C636747697489058915&sdata=8GCbYk6uB2ahZaHaqWX4 > > OOq%2B7ZLwxiApcs%2FyRAT9qOw%3D&reserved=0 [6] "9.2.9. Negative types" http://doc.dpdk.org/guides-18.08/prog_guide/rte_flow.html#negative-types On an unrelated note, is there a way to prevent Outlook from mangling URLs on your side? (those emea01.safelinks things) -- Adrien Mazarguil 6WIND