From mboxrd@z Thu Jan 1 00:00:00 1970
To: Rahul Lakkireddy
, dev@dpdk.org, Thomas Monjalon, Helin Zhang, Jingjing Wu, Rasesh Mody,
 Ajit Khaparde, Wenzhuo Lu, Jan Medala, John Daley, Jing Chen,
 Konstantin Ananyev, Matej Vido, Alejandro Lucero, Sony Chacko,
 Jerin Jacob, Pablo de Lara, Olga Shern, Kumar A S,
 Nirranjan Kirubaharan, Indranil Choudhury
References: <20160705181646.GO7621@6wind.com>
 <20160721081335.GA15856@chelsio.com>
 <20160721170738.GT7621@6wind.com>
 <20160725113229.GA24036@chelsio.com>
 <579640E2.50702@gmail.com>
 <20160726100731.GA2542@chelsio.com>
 <20160803164410.GH3336@6wind.com>
 <57A241FC.30508@gmail.com>
 <20160804132453.GN3336@6wind.com>
From: John Fastabend
Message-ID: <57AA4F80.6040101@gmail.com>
Date: Tue, 9 Aug 2016 14:47:44 -0700
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0
MIME-Version: 1.0
In-Reply-To: <20160804132453.GN3336@6wind.com>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Subject: Re: [dpdk-dev] [RFC] Generic flow director/filtering/classification API
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK
X-List-Received-Date: Tue, 09 Aug 2016 21:48:14 -0000

On 16-08-04 06:24 AM, Adrien Mazarguil wrote:
> On Wed, Aug 03, 2016 at 12:11:56PM -0700, John Fastabend wrote:
>> [...]
>>
>>>>>>>> The proposal looks very good. It satisfies most of the features
>>>>>>>> supported by Chelsio NICs. We are looking for suggestions on exposing
>>>>>>>> more additional features supported by Chelsio NICs via this API.
>>>>>>>>
>>>>>>>> Chelsio NICs have two regions in which filters can be placed -
>>>>>>>> Maskfull and Maskless regions. As their names imply, maskfull region
>>>>>>>> can accept masks to match a range of values; whereas, maskless region
>>>>>>>> don't accept any masks and hence perform a more strict exact-matches.
>>>>>>>> Filters without masks can also be placed in maskfull region. By
>>>>>>>> default, maskless region have higher priority over the maskfull region.
>>>>>>>> However, the priority between the two regions is configurable.
>>>>>>>
>>>>>>> I understand this configuration affects the entire device. Just to be clear,
>>>>>>> assuming some filters are already configured, are they affected by a change
>>>>>>> of region priority later?
>>>>>>>
>>>>>>
>>>>>> Both the regions exist at the same time in the device. Each filter can
>>>>>> either belong to maskfull or the maskless region.
>>>>>>
>>>>>> The priority is configured at time of filter creation for every
>>>>>> individual filter and cannot be changed while the filter is still
>>>>>> active. If priority needs to be changed for a particular filter then,
>>>>>> it needs to be deleted first and re-created.
>>>>>
>>>>> Could you model this as two tables and add a table_id to the API? This
>>>>> way user space could populate the table it chooses. We would have to add
>>>>> some capabilities attributes to "learn" if tables support masks or not
>>>>> though.
>>>>>
>>>>
>>>> This approach sounds interesting.
>>>
>>> Now I understand the idea behind these tables, however from an application
>>> point of view I still think it's better if the PMD could take care of flow
>>> rules optimizations automatically. Think about it, PMDs have exactly a
>>> single kind of device they know perfectly well to manage, while applications
>>> want the best possible performance out of any device in the most generic
>>> fashion.
>>
>> The problem is keeping priorities in order and/or possibly breaking
>> rules apart (e.g. you have an L2 table and an L3 table) becomes very
>> complex to manage at driver level. I think its easier for the
>> application which has some context to do this. The application "knows"
>> if its a router for example will likely be able to pack rules better
>> than a PMD will.
>
> I don't think most applications know they are L2 or L3 routers. They may not
> know more than the pattern provided to the PMD, which may indeed end at a L2
> or L3 protocol. If the application simply chooses a table based on this
> information, then the PMD could have easily done the same.
>

But when we start thinking about encap/decap, it's natural to start
using this interface to implement various forwarding dataplanes. One
common way to organize a switch is into TEP, router, switch (mac/vlan),
ACL tables, etc. In fact we see this topology starting to show up in
NICs now.

Further, each table may be "managed" by a different entity, in which
case the software will want to manage the physical and virtual networks
separately. It doesn't make sense to me to require a software aggregator
object to marshal the rules into a flat table, only for a PMD to split
them apart again.

> I understand the issue is what happens when applications really want to
> define e.g. L2/L3/L2 rules in this specific order (or any ordering that
> cannot be satisfied by HW due to table constraints).
>
> By exposing tables, in such a case applications should move all rules from
> L2 to a L3 table themselves (assuming this is even supported) to guarantee
> ordering between rules, or fail to add them. This is basically what the PMD
> could have done, possibly in a more efficient manner in my opinion.

I disagree with the "more efficient" comment :) If the software layer is
working on L2/TEP/ACL/router layers, merging them just to pull them back
apart is not going to be more efficient.

>
> Let's assume two opposite scenarios for this discussion:
>
> - App #1 is a command-line interface directly mapped to flow rules, which
>   basically gets slow random input from users depending on how they want to
>   configure their traffic. All rules differ considerably (L2, L3, L4, some
>   with incomplete bit-masks, etc). All in all, few but complex rules with
>   specific priorities.
>

Agree with this, and in this case the application should be behind any
network, physical or virtual, and not giving rules like encap/decap/etc.
This application either sits on the physical function and "owns" the
hardware resource or sits behind a virtual switch.

> - App #2 is something like OVS, creating and deleting a large number of very
>   specific (without incomplete bit-masks) and mostly identical
>   single-priority rules automatically and very frequently.
>

Maybe for OVS, but not all virtual switches are built with flat tables
at the bottom like this. Nor is it necessarily optimal. Another
application (the one I'm concerned about :) would be built as a
pipeline, something like

    ACL -> TEP -> ACL -> VEB -> ACL

If I have hardware that supports a TEP hardware block, an ACL hardware
block and a VEB block, for example, I don't want to merge my control
plane into a single table. The merging in this case is just pure
overhead/complexity for no gain.

> Actual applications will certainly be a mix of both.
>
> For app #1, users would have to be aware of these tables and base their
> filtering decisions according to them. Reporting tables capabilities, making
> sure priorities between tables are well configured will be their
> responsibility. Obviously applications may take care of these details for
> them, but the end result will be the same. At some point, some combination
> won't be possible. Getting there was only more complicated from
> users/applications point of view.
>
> For app #2 if the first rule can be created then subsequent rules shouldn't
> be a problem until their number reaches device limits. Selecting the proper
> table to use for these can easily be done by the PMD.
>

But it requires rewriting my pipeline software to be useful, and this I
want to avoid. Using my TEP example again, I'll need something in
software to catch every VEB/ACL rule and append the rest of the rule,
creating wide rules. For my use cases it's not a very user-friendly API.
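To put a rough number on the "wide rules" concern: if each pipeline
stage is programmed into its own hardware table, the rule count is the
sum of the per-stage counts, whereas merging the stages into one flat
table can, in the worst case, need one merged rule per combination (a
cross product). A minimal sketch in C, assuming every combination is
reachable and ignoring integer overflow; struct stage, pipeline_rules()
and flattened_rules() are hypothetical names for illustration, not DPDK
API:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one pipeline stage; not a DPDK type. */
struct stage {
    const char *name;   /* e.g. "TEP", "VEB", "ACL" */
    size_t n_rules;     /* rules the control plane keeps for this stage */
};

/* Rules programmed when each stage maps to its own hardware table. */
static size_t pipeline_rules(const struct stage *s, size_t n)
{
    size_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += s[i].n_rules;
    return total;
}

/*
 * Worst-case rules after merging all stages into one flat table:
 * every rule of a stage may have to be combined with every rule of
 * the other stages, assuming all combinations are reachable.
 */
static size_t flattened_rules(const struct stage *s, size_t n)
{
    size_t total = 1;

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        total *= s[i].n_rules;
    return total;
}
```

For a TEP stage with 4 rules in front of a VEB stage with 100 rules,
pipeline_rules() gives 104 and flattened_rules() gives 400.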
>>>>> I don't see how the PMD can sort this out in any meaningful way and it
>>>>> has to be exposed to the application that has the intelligence to 'know'
>>>>> priorities between masks and non-masks filters. I'm sure you could come
>>>>> up with something but it would be less than ideal in many cases I would
>>>>> guess and we can't have the driver getting priorities wrong or we may
>>>>> not get the correct behavior.
>>>
>>> It may be solved by having the PMD maintain a SW state to quickly know which
>>> rules are currently created and in what state the device is so basically the
>>> application doesn't have to perform this work.
>>>
>>> This API allows applications to express basic needs such as "redirect
>>> packets matching this pattern to that queue". It must not deal with HW
>>> details and limitations in my opinion. If a request cannot be satisfied,
>>> then the rule cannot be created. No help from the application must be
>>> expected by PMDs, otherwise it opens the door to the same issues as the
>>> legacy filtering APIs.
>>
>> This depends on the application and what/how it wants to manage the
>> device. If the application manages a pipeline with some set of tables,
>> then mapping this down to a single table, which then the PMD has to
>> unwind back to a multi-table topology to me seems like a waste.
>
> Of course, only I am not sure applications will behave differently if they
> are aware of HW tables. I fear it will make things more complicated for
> them and they will just stick with the most capable table all the time, but
> I agree it should be easier for PMDs.
>

On the other hand, if the API doesn't match my software pipeline, the
complexity/overhead of merging it just to tear it apart again may
prohibit use of the API in these cases.

>>> [...]
>>>>>> Unfortunately, our maskfull region is extremely small too compared to
>>>>>> maskless region.
>>>>>>
>>>>>
>>>>> To me this means a userspace application would want to pack it
>>>>> carefully to get the full benefit. So you need some mechanism to specify
>>>>> the "region" hence the above table proposal.
>>>>>
>>>>
>>>> Right. Makes sense.
>>>
>>> I do not agree, applications should not be aware of it. Note this case can
>>> be handled differently, so that rules do not have to be moved back and forth
>>> between both tables. If the first created rule requires a maskfull entry,
>>> then all subsequent rules will be entered into that table. Otherwise no
>>> maskfull entry can be created as long as there is one maskless entry. When
>>> either table is full, no more rules may be added. Would that work for you?
>>>
>>
>> Its not about mask vs no mask. The devices with multiple tables that I
>> have don't have this mask limitations. Its about how to optimally pack
>> the rules and who implements that logic. I think its best done in the
>> application where I have the context.
>>
>> Is there a way to omit the table field if the PMD is expected to do
>> a best effort and add the table field if the user wants explicit
>> control over table mgmt. This would support both models. I at least
>> would like to have explicit control over rule population in my pipeline
>> for use cases where I'm building a pipeline on top of the hardware.
>
> Yes that's a possibility. Perhaps the table ID to use could be specified as
> a meta pattern item? We'd still need methods to report how many tables exist
> and perhaps some way to report their limitations, these could be later
> through a separate set of functions.

Sure, I think a meta pattern item would be fine, or put it in the API
call directly, something like

    rte_flow_create(port_id, pattern, actions);
    rte_flow_create_table(port_id, table_id, pattern, actions);

>
> [...]
>>>>> For this adding a meta-data item seems simplest to me.
>>>>> And if you want
>>>>> to make the default to be only a single port that would maybe make it
>>>>> easier for existing apps to port from flow director. Then if an
>>>>> application cares it can create a list of ports if needed.
>>>>>
>>>>
>>>> Agreed.
>>>
>>> However although I'm not opposed to adding dedicated meta items, remember
>>> applications will not automatically benefit from the increased performance
>>> if a single PMD implements this feature, their maintainers will probably not
>>> bother with it.
>>>
>>
>> Unless as we noted in other thread the application is closely bound to
>> its hardware for capability reasons. In this case it would make sense
>> to implement.
>
> Sure.
>
> [...]
>
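
The two call shapes discussed earlier in this message (one without a
table, one with an explicit table_id) could coexist roughly as sketched
below. This is only an illustration under stated assumptions: every name
in it (sketch_table, sketch_rule, sketch_flow_create,
sketch_flow_create_table) is hypothetical and none of it is existing
DPDK API. The per-table supports_masks and capacity fields stand in for
the capability reporting mentioned in the thread, and the best-effort
path simply tries tables in index order, which a real driver may well
not do.

```c
#include <stdbool.h>
#include <stddef.h>

/* All names here are hypothetical; none of this is DPDK API. */
struct sketch_table {
    bool supports_masks;  /* capability an application could query */
    size_t capacity;      /* maximum number of rules */
    size_t used;          /* rules currently installed */
};

struct sketch_rule {
    bool needs_mask;      /* true if the pattern is not exact-match */
};

/* Explicit placement: the caller names the table. Returns 0 or -1. */
static int sketch_flow_create_table(struct sketch_table *tables,
                                    size_t n_tables, size_t table_id,
                                    const struct sketch_rule *rule)
{
    struct sketch_table *t;

    if (table_id >= n_tables)
        return -1;
    t = &tables[table_id];
    if (rule->needs_mask && !t->supports_masks)
        return -1;        /* table cannot express this pattern */
    if (t->used >= t->capacity)
        return -1;        /* table is full */
    t->used++;
    return 0;
}

/*
 * Best effort: no table given, so try each table in index order.
 * Returns the index of the table that accepted the rule, or -1.
 */
static int sketch_flow_create(struct sketch_table *tables, size_t n_tables,
                              const struct sketch_rule *rule)
{
    for (size_t id = 0; id < n_tables; id++)
        if (sketch_flow_create_table(tables, n_tables, id, rule) == 0)
            return (int)id;
    return -1;
}
```

An application that builds its own pipeline would call only the explicit
variant, while one that does not care about placement would call the
best-effort variant and let the driver choose.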