To: Jerin Jacob,
 dev@dpdk.org, Thomas Monjalon, Helin Zhang, Jingjing Wu, Rasesh Mody,
 Ajit Khaparde, Rahul Lakkireddy, Wenzhuo Lu, Jan Medala, John Daley,
 Jing Chen, Konstantin Ananyev, Matej Vido, Alejandro Lucero, Sony Chacko,
 Pablo de Lara, Olga Shern
References: <20160705181646.GO7621@6wind.com>
 <20160711104141.GA10172@localhost.localdomain>
 <20160721192023.GU7621@6wind.com> <5793DD3E.3080605@gmail.com>
 <57A0E423.2030804@gmail.com> <20160803143049.GF3336@6wind.com>
From: John Fastabend
Message-ID: <57A233A9.3000006@gmail.com>
Date: Wed, 3 Aug 2016 11:10:49 -0700
In-Reply-To: <20160803143049.GF3336@6wind.com>
Subject: Re: [dpdk-dev] [RFC] Generic flow director/filtering/classification API

[...]

>>>> Considering that allowed pattern/actions combinations cannot be known in
>>>> advance and would result in an unpractically large number of capabilities to
>>>> expose, a method is provided to validate a given rule from the current
>>>> device configuration state without actually adding it (akin to a "dry run"
>>>> mode).
>>>
>>> Rather than have a query/validate process why did we jump over having an
>>> intermediate representation of the capabilities? Here you state it is
>>> unpractical but we know how to represent parse graphs and the drivers
>>> could report their supported parse graph via a single query to a middle
>>> layer.
>>>
>>> This will actually reduce the msg chatter imagine many applications at
>>> init time or in boundary cases where a large set of applications come
>>> online at once and start banging on the interface all at once seems less
>>> than ideal.
>
> Well, I also thought about a kind of graph to represent capabilities but
> feared the extra complexity would not be worth the trouble, thus settled on
> the query idea. A couple more reasons:
>
> - Capabilities evolve at the same time as devices are configured. For
>   example, if a device supports a single RSS context, then a single rule
>   with a RSS action may be created. The graph would have to be rewritten
>   accordingly and thus queried/parsed again by the application.

The graph would not help here because this is an action restriction, not
a parsing restriction. This is yet another query to see what actions are
supported and how many of each action are supported.

   get_parse_graph - report the parsable fields
   get_actions - report the supported actions and possible num of each

>
> - Expressing capabilities at bit granularity (say, for a matching pattern
>   item mask) is complex, there is no way to simplify the representation of
>   capabilities without either losing information or making the graph more
>   complex to parse than simply providing a flow rule from an application
>   point of view.
>

I'm not sure I understand 'bit granularity' here. I would say we have
devices now that have rather strange restrictions due to hardware
implementation. Going forward we should get better hardware and a lot
of this will go away in my view. Yes, this is a long term view and
doesn't help the current state. The overall point you are making is
that the sum of all these strange/odd bits in the hardware
implementation means capability queries are very difficult to guarantee
on existing hardware, and I think you've convinced me.
Thanks ;)

> With that in mind, I am not opposed to the idea, both methods could even
> coexist, with the query function eventually evolving to become a front-end
> to a capability graph. Just remember that I am only defining the
> fundamentals for the initial implementation, i.e. how rules are expressed as
> patterns/actions and the basic functions to manage them, ideally without
> having to redefine them ever.
>

Agreed, they should be able to coexist. So I can get my capabilities
queries as a layer on top of the API here.

>> A bit more details on possible interface for capabilities query,
>>
>> One way I've used to describe these graphs from driver to software
>> stacks is to use a set of structures to build the graph. For fixed
>> graphs this could just be *.h file for programmable hardware (typically
>> coming from fw update on nics) the driver can read the parser details
>> out of firmware and render the structures.
>
> I understand, however I think this approach may be too low-level to express
> all the possible combinations. This graph would have to include possible
> actions for each possible pattern, all while considering that some actions
> are not possible with some patterns and that there are exclusive actions.
>

Really? You have hardware that has dependencies between the parser and
the supported actions? Ugh...

If the hardware has separate tables then we shouldn't try to have the
PMD flatten those into a single table because we will have no way of
knowing how to do that. (I'll respond to the other thread on this in
an attempt to not get too scattered).

> Also while memory consumption is not really an issue, such a graph may be
> huge. It could take a while for the PMD to update it when adding a rule
> impacting capabilities.

Ugh... I wouldn't suggest updating the capabilities at runtime like
this.
But I see your point: if the graph has to _guarantee_ correctness, how
does it represent a limited number of masks and other strange hw? It's
unfortunate the hardware isn't more regular.

You have convinced me that guaranteed correctness via capabilities is
going to be difficult for many types of devices, although not all.

[...]

>>
>> The cost doing all this is some additional overhead at init time. But
>> building generic function over this and having a set of predefined
>> uids for well-known protocols such ip, udp, tcp, etc helps. What you
>> get for the cost is a few things that I think are worth it. (i) Now
>> new protocols can be added/removed without recompiling DPDK (ii) a
>> software package can use the capability query to verify the required
>> protocols are off-loadable vs a possibly large set of test queries and
>> (iii) when we do the programming of the device we can provide a tuple
>> (table-uid, header-uid, field-uid, value, mask, priority) and the
>> middle layer "knowing" the above graph can verify the command so
>> drivers only ever see "good" commands, (iv) finally it should be
>> faster in terms of cmds per second because the drivers can map the
>> tuple (table, header, field, priority) to a slot efficiently vs
>> parsing.
>>
>> IMO point (iii) and (iv) will in practice make the code much simpler
>> because we can maintain common middle layer and not require parsing
>> by drivers. Making each driver simpler by abstracting into common
>> layer.
>
> Before answering your points, let's consider how applications are going to
> be written. Not only devices do not support all possible pattern/actions
> combinations, they also have memory constraints. Whichever method
> applications use to determine if a flow rule is supported, at some point
> they won't be able to add any more due to device limitations.
>
> Sane applications designed to work regardless of the underlying device won't
> simply call abort() at this point but provide a software fallback
> instead.
> My bet is that applications will provide one every time a rule
> cannot be added for any reason, they won't even bother to query capabilities
> except perhaps for a very small subset, as in "does this device support the
> ID action at all?".
>
> Applications that really want/need to know at init time whether all the
> rules they may want to possibly create are supported will spend about the
> same time in both cases (query or graph). For queries, by iterating on a
> list of typical rules. For a graph, by walking through it. Either way, it
> won't be done later from the data path.

The queries and graph suffer from the same problems you noted above if
actually instantiating the rules will impact what rules are allowed. So
in both cases we may run into corner cases, but it seems that this is a
result of hardware deficiencies and can't be solved easily, at least
with software.

My concern is that this non-determinism will create performance issues
in the network, because whether or not a flow is offloaded can have a
rather significant impact on its performance. This can make debugging
network wide performance miserable when at time X I get performance X
and then for whatever reason something degrades to software and at time
Y I get some performance Y << X. I suspect that in general applications
will bind tightly with hardware they know works.

>
> I think that for an application maintainer, writing or even generating a set
> of typical rules will also be easier than walking through a graph. It should
> also be easier on the PMD side.
>

I tend to think getting a graph and doing operations on graphs is easier
myself, but I can see this is a matter of opinion/style.

> For individual points:
>
> (i) should be doable with the query API without recompiling DPDK as well,
> the fact API/ABI breakage must be avoided being part of the requirements. If
> you think there is a problem regarding this, can you provide a specific
> example?
What I was after you noted yourself in the doc here,

"PMDs can rely on this capability to simulate support for protocols with
fixed headers not directly recognized by hardware."

I was trying to get variable header support with the RAW capabilities. A
parse graph supports this, for example, whereas the proposed query API
does not.

>
> (ii) as described above, I think this use case won't be very common in the
> wild, except for applications designed for a specific device and then they
> will probably know enough about it to skip the query step entirely. If time
> must be spent anyway, it will be in the control path at initialization
> time.
>

OK.

> (iii) misses the fact that capabilities evolve as flow rules get added,
> there is no way for PMDs to only see "valid" rules also because device
> limitations may prevent adding an otherwise valid rule.

OK, I agree that for devices with this evolving characteristic we are
lost.

>
> (iv) could be true if not for the same reason as (iii). The graph would have
> to be verfied again before adding another rule. Note that PMDs maintainers
> are encouraged to make their query function as fast as possible, they may
> rely on static data internally for this as well.
>

OK, I'm not going to get hung up on this because I think it's an
implementation detail and not an API problem. I would prefer to be
pragmatic and see how fast the API is before I bikeshed it to death for
no good reason.

>>> Worse in my opinion it requires all drivers to write mostly duplicating
>>> validation code where a common layer could easily do this if every
>>> driver reported a common data structure representing its parse graph
>>> instead. The nice fallout of this initial effort upfront is the driver
>>> no longer needs to do error handling/checking/etc and can assume all
>>> rules are correct and valid. It makes driver code much simpler to
>>> support. And IMO at least by doing this we get some other nice benefits
>>> described below.
>
> About duplicated code, my usual reply is that DPDK will provide internal
> helper methods to assist PMDs with rules management/parsing/etc. These are
> not discussed in the specification because I wanted everyone to agree to the
> application side of things first, and it is difficult to know how much
> assistance PMDs might need without an initial implementation.
>
> I think this private API will be built at the same time as support is added
> to PMDs and maintainers notice generic code that can be shared.
> Documentation may be written later once things start to settle down.

OK, let's see.

>
>>> Another related question is about performance.
>>>
>>>> Creation
>>>> ~~~~~~~~
>>>>
>>>> Creating a flow rule is similar to validating one, except the rule is
>>>> actually created.
>>>>
>>>> ::
>>>>
>>>>  struct rte_flow *
>>>>  rte_flow_create(uint8_t port_id,
>>>>                  const struct rte_flow_pattern *pattern,
>>>>                  const struct rte_flow_actions *actions);
>>>
>>> I gather this implies that each driver must parse the pattern/action
>>> block and map this onto the hardware. How many rules per second can this
>>> support? I've run into systems that expect a level of service somewhere
>>> around 50k cmds per second. So bulking will help at the message level
>>> but it seems like a lot of overhead to unpack the pattern/action section.
>
> There is indeed no guarantee on the time taken to create a flow rule, as
> debated with Sugesh (see the full thread):
>
>  http://dpdk.org/ml/archives/dev/2016-July/043958.html
>
> I will update the specification accordingly.
>
> Consider that even 50k cmds per second may not be fast enough. Applications
> always need to have some kind of fallback ready, and the ability to know
> whether a packet has been matched by a rule is a way to help with that.
>
> In any case, flow rules must be managed from the control path, the data path
> must only handle consequences.

Same as above, let's see. I think it can probably be made fast enough.
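
A minimal sketch of the control-path fallback pattern discussed above,
assuming a mock device with a tiny rule table. Every name here
(demo_rule, hw_rule_add, rule_install, HW_RULE_SLOTS) is hypothetical and
made up for illustration; none of it is part of the RFC or of DPDK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only, not RFC or DPDK names. */
#define HW_RULE_SLOTS 2u

enum rule_path { RULE_PATH_HW, RULE_PATH_SW };

struct demo_rule {
    uint32_t dst_ip; /* field to match */
    uint16_t queue;  /* queue to steer matching packets to */
};

static struct demo_rule hw_table[HW_RULE_SLOTS];
static size_t hw_used;

/* Mock device: accepts rules until its table is full, then refuses. */
bool hw_rule_add(const struct demo_rule *r)
{
    if (hw_used == HW_RULE_SLOTS)
        return false;
    hw_table[hw_used++] = *r;
    return true;
}

/*
 * Control path: try to offload the rule first. If the device refuses it
 * for any reason, count it as handled in software instead of aborting.
 */
enum rule_path rule_install(const struct demo_rule *r, size_t *sw_count)
{
    if (hw_rule_add(r))
        return RULE_PATH_HW;
    (*sw_count)++;
    return RULE_PATH_SW;
}
```

With two hardware slots, the third call to rule_install() reports
RULE_PATH_SW. That silent switch is exactly the kind of performance cliff
raised earlier in this reply, so a real application would probably want
to log or count it.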
>
>>> One strategy I've used in other systems that worked relatively well
>>> is if the query for the parse graph above returns a key for each node
>>> in the graph then a single lookup can map the key to a node. Its
>>> unambiguous and then these operations simply become a table lookup.
>>> So to be a bit more concrete this changes the pattern structure in
>>> rte_flow_create() into a tuple where the key is known
>>> by the initial parse graph query. If you reserve a set of well-defined
>>> key values for well known protocols like ethernet, ip, etc. then the
>>> query model also works but the middle layer catches errors in this case
>>> and again the driver only gets known good flows. So something like this,
>>>
>>>  struct rte_flow_pattern {
>>>      uint32_t priority;
>>>      uint32_t key;
>>>      uint32_t value_length;
>>>      u8 *value;
>>>  }
>
> I agree that having an integer representing an entire pattern/actions combo
> would be great, however how do you tell whether you want matched packets to
> be duplicated to queue 6 and redirected to queue 3? This method can be used
> to check if a type of rule is allowed but not whether it is actually
> applicable. You still need to provide the entire pattern/actions description
> to create a flow rule.

In reality it's almost the same as your proposal; it just took me a
moment to see it. The only difference I can see is that adding new
headers via the RAW type only supports fixed length headers.

To answer your question, the flow_pattern would have to include an
action set as well to give a list of actions to perform. I just didn't
include it here.

>
>>> Also if we have multiple tables what do you think about adding a
>>> table_id to the signature.
>>> Probably not needed in the first generation
>>> but is likely useful for hardware with multiple tables so that it
>>> would be,
>>>
>>>  rte_flow_create(uint8_t port_id, uint8_t table_id, ...);
>
> Not sure if I understand the table ID concept, do you mean in case a device
> supports entirely different sets of features depending on something? (What?)
>

In many devices we support multiple tables, each with its own size,
match fields and action set. This is useful for building routers, for
example, along with lots of other constructs. The basic idea is that
smashing everything into a single table creates a Cartesian product
problem.

>>> Finally one other problem we've had which would be great to address
>>> if we are doing a rewrite of the API is adding new protocols to
>>> already deployed DPDK stacks. This is mostly a Linux distribution
>>> problem where you can't easily update DPDK.
>>>
>>> In the prototype header linked in this document it seems to add new
>>> headers requires adding a new enum in the rte_flow_item_type but there
>>> is at least an attempt at a catch all here,
>>>
>>>> /**
>>>>  * Matches a string of a given length at a given offset (in bytes),
>>>>  * or anywhere in the payload of the current protocol layer
>>>>  * (including L2 header if used as the first item in the stack).
>>>>  *
>>>>  * See struct rte_flow_item_raw.
>>>>  */
>>>> RTE_FLOW_ITEM_TYPE_RAW,
>>>
>>> Actually this is a nice implementation because it works after the
>>> previous item in the stack correct?
>
> Yes, this is correct.

Great.

>
>>> So you can put it after "known"
>>> variable length headers like IP. The limitation is it can't get past
>>> undefined variable length headers.
>
> RTE_FLOW_ITEM_TYPE_ANY is made for that purpose. Is that what you are
> looking for?
>

But FLOW_ITEM_TYPE_ANY skips "any" header type, is my understanding. If
we have a new variable length header in the future we will have to add a
new type, RTE_FLOW_ITEM_TYPE_FOO for example.
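
To illustrate why a fixed-offset match cannot step over a variable
length header the parser does not know, here is a small sketch. It uses
IPv4 as the familiar case, where the header length is read from the IHL
field in the packet itself. The function names and ETH_HDR_LEN are
assumptions made for this example, not anything defined by the RFC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Untagged Ethernet header length, used here as the fixed-length example. */
#define ETH_HDR_LEN 14u

/* Fixed-length header: the next layer always starts at a constant offset. */
size_t next_offset_fixed(size_t hdr_start, size_t hdr_len)
{
    return hdr_start + hdr_len;
}

/*
 * Variable-length header: the length must be read from the packet.
 * For IPv4 the low 4 bits of the first byte (IHL) give the header
 * length in 32-bit words.
 */
size_t next_offset_ipv4(const uint8_t *pkt, size_t hdr_start)
{
    return hdr_start + (size_t)(pkt[hdr_start] & 0x0f) * 4;
}
```

A matcher that only knows offsets can handle the first case with a
constant. For the second it needs to know where the length field lives
and how to scale it, which is the per-protocol knowledge a parse graph
node would carry.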
The RAW type will work for fixed headers as noted above.

>>> However if you use the above parse
>>> graph reporting from the driver mechanism and the driver always reports
>>> its largest supported graph then we don't have this issue where a new
>>> hardware sku/ucode/etc added support for new headers but we have no
>>> way to deploy it to existing software users without recompiling and
>>> redeploying.
>
> I really would like to understand if you see a limitation regarding this
> with the specified API, even assuming DPDK is compiled as a shared library
> and thus not part of the user application.
>

Thanks, this thread was very helpful for me at least. So the summary for
me is: capability queries can be built on top of this API no problem,
and for many existing devices capability queries will not be able to
guarantee a flow insertion success due to hardware quirks/limitations.

The two open items from me are: do we need to support adding new
variable length headers? And how do we handle multiple tables? I'll take
that up in the other thread.

>>> I looked at the git repo but I only saw the header definition I guess
>>> the implementation is TBD after there is enough agreement on the
>>> interface?
>
> Precisely, I intend to update the tree and send a v2 soon (unfortunately did
> not have much time these past few days to work on this).
>
> Now what if, instead of a seemingly complex parse graph and still in
> addition to the query method, enum values were defined for PMDs to report
> an array of supported items, typical patterns and actions so applications
> can get a quick idea of what devices are capable of without being too
> specific. Something like:
>
>  enum rte_flow_capability {
>      RTE_FLOW_CAPABILITY_ITEM_ETH,
>      RTE_FLOW_CAPABILITY_PATTERN_ETH_IP_TCP,
>      RTE_FLOW_CAPABILITY_ACTION_ID,
>      ...
>  };
>
> Although I'm not convinced about the usefulness of this because it would
> have to be maintained separately, but that would be easier than building a
> dummy flow rule for simple query purposes.

I'm not sure it's necessary either at first.

>
> The main question I have for you is, do you think the core of the specified
> API is adequate enough assuming it can be extended later with new methods?
>

The above two items are my only opens at this point. I agree with your
summary of my capabilities proposal, namely that it can be added.

.John