Date: Thu, 4 Aug 2016 15:05:28 +0200
From: Adrien Mazarguil
To: John Fastabend
Cc: Jerin Jacob, dev@dpdk.org, Thomas Monjalon, Helin Zhang, Jingjing Wu,
 Rasesh Mody, Ajit Khaparde, Rahul Lakkireddy, Wenzhuo Lu, Jan Medala,
 John Daley, Jing Chen, Konstantin Ananyev, Matej Vido, Alejandro Lucero,
 Sony Chacko, Pablo de Lara, Olga Shern
Subject: Re: [dpdk-dev] [RFC] Generic flow director/filtering/classification API
Message-ID: <20160804130528.GM3336@6wind.com>
In-Reply-To: <57A233A9.3000006@gmail.com>

On Wed, Aug 03, 2016 at 11:10:49AM -0700, John Fastabend wrote:
> [...]
>
> >>>> Considering that allowed pattern/actions combinations cannot be known in
> >>>> advance and would result in an unpractically large number of capabilities to
> >>>> expose, a method is provided to validate a given rule from the current
> >>>> device configuration state without actually adding it (akin to a "dry run"
> >>>> mode).
> >>>
> >>> Rather than have a query/validate process why did we jump over having an
> >>> intermediate representation of the capabilities? Here you state it is
> >>> unpractical but we know how to represent parse graphs and the drivers
> >>> could report their supported parse graph via a single query to a middle
> >>> layer.
> >>>
> >>> This will actually reduce the msg chatter imagine many applications at
> >>> init time or in boundary cases where a large set of applications come
> >>> online at once and start banging on the interface all at once seems less
> >>> than ideal.
> >
> > Well, I also thought about a kind of graph to represent capabilities but
> > feared the extra complexity would not be worth the trouble, thus settled on
> > the query idea. A couple more reasons:
> >
> > - Capabilities evolve at the same time as devices are configured. For
> >   example, if a device supports a single RSS context, then a single rule
> >   with a RSS action may be created. The graph would have to be rewritten
> >   accordingly and thus queried/parsed again by the application.
>
> The graph would not help here because this is an action
> restriction not a parsing restriction. This is yet another query to see
> what actions are supported and how many of each action are supported.
>
>    get_parse_graph - report the parsable fields
>    get_actions - report the supported actions and possible num of each

OK, now I understand your idea; in my mind, the graph was indeed supposed to
represent complete flow rules.
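A minimal C sketch of the split John describes may help: one capability set for matchable fields (what a get_parse_graph query would report) and one for actions with a remaining count per action (what get_actions would report). All types and functions below are hypothetical and not part of the proposed rte_flow API. Parse capabilities are assumed static, while action counts are assumed to shrink as rules are created, which models the single-RSS-context example. rule_supported() plays the role of the "dry run" validation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical field identifiers a "get_parse_graph" query could report. */
enum hdr_field {
	FIELD_ETH_DST,
	FIELD_IPV4_SRC,
	FIELD_IPV4_DST,
	FIELD_TCP_DPORT,
	FIELD_MAX
};

/* Hypothetical action identifiers a "get_actions" query could report. */
enum act_type {
	ACT_QUEUE,
	ACT_RSS,
	ACT_DROP,
	ACT_MAX
};

/* Parse capabilities: which fields can be matched. Static for the device. */
struct parse_caps {
	bool matchable[FIELD_MAX];
};

/* Action capabilities: how many more rules may use each action.
 * Unlike parse_caps, these counts shrink as rules are created. */
struct action_caps {
	unsigned int remaining[ACT_MAX];
};

/* A deliberately simplified rule: matched fields plus a single action. */
struct simple_rule {
	const enum hdr_field *fields;
	size_t nb_fields;
	enum act_type action;
};

/* Dry-run check against both capability sets; modifies nothing. */
static bool rule_supported(const struct parse_caps *pc,
			   const struct action_caps *ac,
			   const struct simple_rule *r)
{
	assert(r->action < ACT_MAX);
	for (size_t i = 0; i < r->nb_fields; i++) {
		assert(r->fields[i] < FIELD_MAX);
		if (!pc->matchable[r->fields[i]])
			return false; /* parsing restriction */
	}
	return ac->remaining[r->action] > 0; /* action restriction */
}

/* Create a rule: on success, consume one instance of its action. */
static bool rule_create(const struct parse_caps *pc,
			struct action_caps *ac,
			const struct simple_rule *r)
{
	if (!rule_supported(pc, ac, r))
		return false;
	ac->remaining[r->action]--;
	return true;
}
```

With a device exposing a single RSS context (remaining[ACT_RSS] == 1), the first RSS rule is accepted and a second one is rejected, while the parse capabilities never change: only the action count moves. Mask restrictions and configuration side effects are not captured by such a flat report.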

> > - Expressing capabilities at bit granularity (say, for a matching pattern
> >   item mask) is complex, there is no way to simplify the representation of
> >   capabilities without either losing information or making the graph more
> >   complex to parse than simply providing a flow rule from an application
> >   point of view.
> >
>
> I'm not sure I understand 'bit granularity' here. I would say we have
> devices now that have rather strange restrictions due to hardware
> implementation. Going forward we should get better hardware and a lot
> of this will go away in my view. Yes this is a long term view and
> doesn't help the current state. The overall point you are making is
> the sum off all these strange/odd bits in the hardware implementation
> means capabilities queries are very difficult to guarantee. On existing
> hardware and I think you've convinced me. Thanks ;)

Precisely. By "bit granularity" I meant that while it is fairly easy to
report whether bit-masking is supported at all on protocol fields such as
MAC addresses, devices may restrict the possible bit-masks: they may only
take effect at byte level (0xff), specific bits (e.g. broadcast) may not be
allowed, or there may even be a fixed set of bit-masks to choose from.

[...]

> > I understand, however I think this approach may be too low-level to express
> > all the possible combinations. This graph would have to include possible
> > actions for each possible pattern, all while considering that some actions
> > are not possible with some patterns and that there are exclusive actions.
> >
>
> Really? You have hardware that has dependencies between the parser and
> the supported actions? Ugh...

Not that I know of actually, even though we cannot rule out this
possibility. Here are the possible cases I have in mind with existing HW:

- Too many actions specified for a single rule, even though each of them is
  otherwise supported.

- Performing several encap/decap actions.
  None are defined in the initial specification but these are already
  planned.

- Assuming there is a single table from the application point of view
  (separate discussion for the other thread), some actions may only be
  possible with the right pattern item or meta item. Asking HW to perform
  tunnel decap may only be safe if the pattern specifically matches that
  protocol.

> If the hardware has separate tables then we shouldn't try to have the
> PMD flatten those into a single table because we will have no way of
> knowing how to do that. (I'll respond to the other thread on this in
> an attempt to not get to scattered).

OK, will reply there as well.

> > Also while memory consumption is not really an issue, such a graph may be
> > huge. It could take a while for the PMD to update it when adding a rule
> > impacting capabilities.
>
> Ugh... I wouldn't suggest updating the capabilities at runtime like
> this. But I see your point if the graph has to _guarantee_ correctness
> how does it represent limited number of masks and other strange hw,
> its unfortunate the hardware isn't more regular.
>
> You have convinced me that guaranteed correctness via capabilities
> is going to difficult for many types of devices although not all.

I'll just add that these capabilities also depend on side effects of
configuration performed outside the scope of this API. The way queues are
(re)initialized or offloads configured may affect them. RSS configuration is
the most obvious example.

> [...]
>
> >>
> >> The cost doing all this is some additional overhead at init time. But
> >> building generic function over this and having a set of predefined
> >> uids for well-known protocols such ip, udp, tcp, etc helps. What you
> >> get for the cost is a few things that I think are worth it.
> >> (i) Now new protocols can be added/removed without recompiling DPDK (ii) a
> >> software package can use the capability query to verify the required
> >> protocols are off-loadable vs a possibly large set of test queries and
> >> (iii) when we do the programming of the device we can provide a tuple
> >> (table-uid, header-uid, field-uid, value, mask, priority) and the
> >> middle layer "knowing" the above graph can verify the command so
> >> drivers only ever see "good" commands, (iv) finally it should be
> >> faster in terms of cmds per second because the drivers can map the
> >> tuple (table, header, field, priority) to a slot efficiently vs
> >> parsing.
> >>
> >> IMO point (iii) and (iv) will in practice make the code much simpler
> >> because we can maintain common middle layer and not require parsing
> >> by drivers. Making each driver simpler by abstracting into common
> >> layer.
> >
> > Before answering your points, let's consider how applications are going to
> > be written. Not only devices do not support all possible pattern/actions
> > combinations, they also have memory constraints. Whichever method
> > applications use to determine if a flow rule is supported, at some point
> > they won't be able to add any more due to device limitations.
> >
> > Sane applications designed to work regardless of the underlying device won't
> > simply call abort() at this point but provide a software fallback
> > instead. My bet is that applications will provide one every time a rule
> > cannot be added for any reason, they won't even bother to query capabilities
> > except perhaps for a very small subset, as in "does this device support the
> > ID action at all?".
> >
> > Applications that really want/need to know at init time whether all the
> > rules they may want to possibly create are supported will spend about the
> > same time in both cases (query or graph). For queries, by iterating on a
> > list of typical rules. For a graph, by walking through it.
> > Either way, it won't be done later from the data path.
>
> The queries and graph suffer from the same problems you noted above if
> actually instantiating the rules will impact what rules are allowed. So
> that in both cases we may run into corner cases but it seems that this
> is a result of hardware deficiencies and can't be solved easily at least
> with software.
>
> My concern is this non-determinism will create performance issues in
> the network because when a flow may or may not be offloaded this can
> have a rather significant impact on its performance. This can make
> debugging network wide performance miserable when at time X I get
> performance X and then for whatever reason something degrades to
> software and at time Y I get some performance Y << X. I suspect that
> in general applications will bind tightly with hardware they know
> works.

You are right, performance determinism is not taken into account at all, at
least not yet. It should not be an issue at the beginning as long as the API
has the ability to evolve later for applications that need it.

Just an idea: could some kind of meta pattern item specifying time
constraints for a rule address this issue? Say, how long (cycles/ms) the PMD
may take to query/apply/delete the rule. If that cannot be guaranteed, the
rule cannot be created. Applications could maintain statistics counters about
failed rules to determine whether performance issues are caused by the
inability to create them.

[...]

> > For individual points:
> >
> > (i) should be doable with the query API without recompiling DPDK as well,
> > the fact API/ABI breakage must be avoided being part of the requirements. If
> > you think there is a problem regarding this, can you provide a specific
> > example?
>
> What I was after you noted yourself in the doc here,
>
> "PMDs can rely on this capability to simulate support for protocols with
> fixed headers not directly recognized by hardware."
>
> I was trying to get variable header support with the RAW capabilities. A
> parse graph supports this for example the proposed query API does not.

OK, I see. However, the RAW capability itself may not be supported
everywhere in patterns. What I described is that PMDs, not applications,
could leverage the RAW abilities of underlying devices to implement
otherwise unsupported but fixed patterns.

So basically you would like to expose the ability to describe fixed protocol
definitions following RAW patterns, as in:

 ETH / RAW / IP / UDP / ...

With such a pattern, the current specification makes RAW (4.1.4.2) and IP
start matching from the same offset as two different branches; in effect,
you cannot specify a fixed protocol following a RAW item.

It is defined that way because I do not see how HW could parse higher level
protocols after having given up due to a RAW pattern. However, assuming the
entire stack is described only using RAW patterns, I guess it could be done.

Such a pattern could be generated from a separate function before feeding it
to rte_flow_create(), or translated by the PMD afterwards, assuming a
separate meta item such as RAW_END exists to signal the end of a RAW layer.
Of course, processing this would be more expensive.

[...]

> >>> One strategy I've used in other systems that worked relatively well
> >>> is if the query for the parse graph above returns a key for each node
> >>> in the graph then a single lookup can map the key to a node. Its
> >>> unambiguous and then these operations simply become a table lookup.
> >>> So to be a bit more concrete this changes the pattern structure in
> >>> rte_flow_create() into a tuple where the key is known
> >>> by the initial parse graph query. If you reserve a set of well-defined
> >>> key values for well known protocols like ethernet, ip, etc. then the
> >>> query model also works but the middle layer catches errors in this case
> >>> and again the driver only gets known good flows.
> >>> So something like this,
> >>>
> >>>    struct rte_flow_pattern {
> >>>      uint32_t priority;
> >>>      uint32_t key;
> >>>      uint32_t value_length;
> >>>      u8 *value;
> >>>    }
> >
> > I agree that having an integer representing an entire pattern/actions combo
> > would be great, however how do you tell whether you want matched packets to
> > be duplicated to queue 6 and redirected to queue 3? This method can be used
> > to check if a type of rule is allowed but not whether it is actually
> > applicable. You still need to provide the entire pattern/actions description
> > to create a flow rule.
>
> In reality its almost the same as your proposal it just took me a moment
> to see it. The only difference I can see is adding new headers via RAW
> type only supports fixed length headers.
>
> To answer your question the flow_pattern would have to include a action
> set as well to give a list of actions to perform. I just didn't include
> it here.

OK.

> >>> Also if we have multiple tables what do you think about adding a
> >>> table_id to the signature. Probably not needed in the first generation
> >>> but is likely useful for hardware with multiple tables so that it
> >>> would be,
> >>>
> >>>    rte_flow_create(uint8_t port_id, uint8_t table_id, ...);
> >
> > Not sure if I understand the table ID concept, do you mean in case a device
> > supports entirely different sets of features depending on something? (What?)
> >
>
> In many devices we support multiple tables each with their own size,
> match fields and action set. This is useful for building routers for
> example along with lots of other constructs. The basic idea is
> smashing everything into a single table creates a Cartesian product
> problem.

Right, so I understand we'd need a method to express table capabilities as
well, as you described (a topic for the other thread then).

[...]

> >>> So you can put it after "known"
> >>> variable length headers like IP.
> >>> The limitation is it can't get past
> >>> undefined variable length headers.
> >
> > RTE_FLOW_ITEM_TYPE_ANY is made for that purpose. Is that what you are
> > looking for?
> >
>
> But FLOW_ITEM_TYPE_ANY skips "any" header type is my understanding if
> we have new variable length header in the future we will have to add
> a new type RTE_FLOW_ITEM_TYPE_FOO for example. The RAW type will work
> for fixed headers as noted above.

I'm (slowly) starting to get it. How about the suggestion I made above for
RAW items then?

[...]

> The two open items from me are do we need to support adding new variable
> length headers? And how do we handle multiple tables I'll take that up
> in the other thread.

I think variable length headers may eventually be supported through pattern
tricks or a separate conversion layer.

> >>> I looked at the git repo but I only saw the header definition I guess
> >>> the implementation is TBD after there is enough agreement on the
> >>> interface?
> >
> > Precisely, I intend to update the tree and send a v2 soon (unfortunately did
> > not have much time these past few days to work on this).
> >
> > Now what if, instead of a seemingly complex parse graph and still in
> > addition to the query method, enum values were defined for PMDs to report
> > an array of supported items, typical patterns and actions so applications
> > can get a quick idea of what devices are capable of without being too
> > specific. Something like:
> >
> >  enum rte_flow_capability {
> >      RTE_FLOW_CAPABILITY_ITEM_ETH,
> >      RTE_FLOW_CAPABILITY_PATTERN_ETH_IP_TCP,
> >      RTE_FLOW_CAPABILITY_ACTION_ID,
> >      ...
> >  };
> >
> > Although I'm not convinced about the usefulness of this because it would
> > have to be maintained separately, but that would be easier than building a
> > dummy flow rule for simple query purposes.
>
> I'm not sure its necessary either at first.

Then I'll discard this idea.
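For reference, the capability-array idea would have been cheap for applications to consume. The following is a minimal sketch, assuming a PMD hands back a static, read-only array of enum values. The flow_capability names mirror the rte_flow_capability enum quoted above but are hypothetical. FLOW_CAP_ITEM_IPV4 and FLOW_CAP_ACTION_QUEUE are illustrative additions, and nothing here is part of the rte_flow API as proposed in this RFC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical values modeled on the rte_flow_capability enum from the
 * thread; FLOW_CAP_ITEM_IPV4 and FLOW_CAP_ACTION_QUEUE are illustrative
 * additions. */
enum flow_capability {
	FLOW_CAP_ITEM_ETH,
	FLOW_CAP_ITEM_IPV4,
	FLOW_CAP_PATTERN_ETH_IP_TCP,
	FLOW_CAP_ACTION_ID,
	FLOW_CAP_ACTION_QUEUE,
};

/* What a PMD would hand back: a flat, read-only list of capabilities. */
struct flow_caps_report {
	const enum flow_capability *caps;
	size_t nb_caps;
};

/* Example report for an imaginary device supporting ETH/IPv4 items, the
 * ETH/IP/TCP pattern and the ID action, but not the QUEUE action. */
static const enum flow_capability example_caps[] = {
	FLOW_CAP_ITEM_ETH,
	FLOW_CAP_ITEM_IPV4,
	FLOW_CAP_PATTERN_ETH_IP_TCP,
	FLOW_CAP_ACTION_ID,
};

static const struct flow_caps_report example_report = {
	.caps = example_caps,
	.nb_caps = sizeof(example_caps) / sizeof(example_caps[0]),
};

/* Coarse check: "does this device support X at all?" A true result does
 * not guarantee that a specific rule using X can be created; the complete
 * rule still has to go through validation. */
static bool flow_cap_has(const struct flow_caps_report *rep,
			 enum flow_capability cap)
{
	assert(rep != NULL);
	for (size_t i = 0; i < rep->nb_caps; i++)
		if (rep->caps[i] == cap)
			return true;
	return false;
}
```

The maintenance cost mentioned above shows up directly. Every new item, pattern or action needs a matching enum value kept in sync with the rule definitions, and a positive answer is only a hint, not a guarantee.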

> > The main question I have for you is, do you think the core of the specified
> > API is adequate enough assuming it can be extended later with new methods?
> >
>
> The above two items are my only opens at this point, I agree with your
> summary of my capabilities proposal namely it can be added.

Thanks, see you in the other thread.

-- 
Adrien Mazarguil
6WIND