From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f68.google.com (mail-pa0-f68.google.com [209.85.220.68]) by dpdk.org (Postfix) with ESMTP id 8579A558D for ; Tue, 9 Aug 2016 23:24:53 +0200 (CEST) Received: by mail-pa0-f68.google.com with SMTP id vy10so1530249pac.0 for ; Tue, 09 Aug 2016 14:24:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=subject:to:references:from:message-id:date:user-agent:mime-version :in-reply-to:content-transfer-encoding; bh=m+jEM5YZuMvUhZ4M0px0WdgKSRVQiV0NpxZfoK4gDjA=; b=NVWVQhPUeoyvzNwjZhre1LXfCpAPl61FYF7EfVPTtCXWIEKp1v237FLsZc/L4HAD5K qbTuPvxYco3PiFzmTBxxG1AF4bvhU2eRndYHdOtI09XkthblQTuYj+q+bSfUY/65Pkzp XyXkpS9Cbo1m5bc2EWvw/mTNvttoVWSk/lqeyKtVxzDC5iNBNnoliorUpAxN8zOYMEdJ FOVXyUW4g7B6py31Wydda1iqUb42u+f4Kk3Czzq0vI/dGY5v3c1JoZg4ths6Xl4DpRt+ WI595WdVBIJMad2SrjGo5tDwEV1PrXky/GuV/ycvi7X6ttU+G/5ez96KkaRuk1hfEldu r+kQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:subject:to:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding; bh=m+jEM5YZuMvUhZ4M0px0WdgKSRVQiV0NpxZfoK4gDjA=; b=GX0IeDXKbzuh3KWl4b9WZtARrAPK/2lownZy7yy2NfAbcphgB/Sxt+13Y8aQSggpji 9DC9Cj3QxsDyPR/UZYB8+FmnyWrimdcelkxOS/KgSeWIs5pKwVOK0TJEUa3p7yY+dEDC J4RMWNQ2nSvOJuOWEQp6v9oy5/W29ttTe7Ho9ZVj32e2qQT+Jh2VpPcgAHx2hI3Fbg8c KjMNUsW8v7Lvv9rL+1ZG9AniW+8N1q/pYt5G9Lp2FWia954eVpjcfi9Kq2vdhrXqmNM1 VFick87JTVvgeaGEz5cCTOwQbB67JknC3A8CUwBQjbtJo+CflqT+HyCFmnw1KUmsg5Vl OxTw== X-Gm-Message-State: AEkoouuu2ltXTD3fTtwuJSI99AIVN+1IoJzyBZ5UtaiJx3t1I4DtBuZxmMo2DbjNVDbF3g== X-Received: by 10.66.244.199 with SMTP id xi7mr767590pac.127.1470777892504; Tue, 09 Aug 2016 14:24:52 -0700 (PDT) Received: from [192.168.1.6] ([72.168.144.214]) by smtp.googlemail.com with ESMTPSA id c64sm58226582pfg.35.2016.08.09.14.24.36 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 09 Aug 2016 14:24:51 -0700 (PDT) To: Jerin Jacob , dev@dpdk.org, Thomas Monjalon , Helin Zhang , Jingjing Wu , Rasesh Mody , Ajit Khaparde , Rahul Lakkireddy , Wenzhuo Lu , Jan Medala , John Daley , Jing Chen , Konstantin Ananyev , Matej Vido , Alejandro Lucero , Sony Chacko , Pablo de Lara , Olga Shern References: <20160705181646.GO7621@6wind.com> <20160711104141.GA10172@localhost.localdomain> <20160721192023.GU7621@6wind.com> <5793DD3E.3080605@gmail.com> <57A0E423.2030804@gmail.com> <20160803143049.GF3336@6wind.com> <57A233A9.3000006@gmail.com> <20160804130528.GM3336@6wind.com> From: John Fastabend Message-ID: <57AA4A0A.8060809@gmail.com> Date: Tue, 9 Aug 2016 14:24:26 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0 MIME-Version: 1.0 In-Reply-To: <20160804130528.GM3336@6wind.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Subject: Re: [dpdk-dev] [RFC] Generic flow director/filtering/classification API X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 09 Aug 2016 21:24:54 -0000 [...] >> I'm not sure I understand 'bit granularity' here. I would say we have >> devices now that have rather strange restrictions due to hardware >> implementation. Going forward we should get better hardware and a lot >> of this will go away in my view. Yes this is a long term view and >> doesn't help the current state. The overall point you are making is >> the sum off all these strange/odd bits in the hardware implementation >> means capabilities queries are very difficult to guarantee. On existing >> hardware and I think you've convinced me. Thanks ;) > > Precisely. By "bit granularity" I meant that while it is fairly easy to > report whether bit-masking is supported on protocol fields such as MAC > addresses at all, devices may have restrictions on the possible bit-masks, > like they may only have an effect at byte level (0xff), may not allow > specific bits (broadcast) or there even may be a fixed set of bit-masks to > choose from. Yep lots of strange hardware implementation voodoo here. > > [...] >>> I understand, however I think this approach may be too low-level to express >>> all the possible combinations. This graph would have to include possible >>> actions for each possible pattern, all while considering that some actions >>> are not possible with some patterns and that there are exclusive actions. >>> >> >> Really? You have hardware that has dependencies between the parser and >> the supported actions? Ugh... > > Not that I know of actually, even though we cannot rule out this > possibility. > > Here are the possible cases I have in mind with existing HW: > > - Too many actions specified for a single rule, even though each of them is > otherwise supported. Yep most hardware will have this restriction. > > - Performing several encap/decap actions. None are defined in the initial > specification but these are already planned. > Great this is certainly needed. > - Assuming there is a single table from the application point of view > (separate discussion for the other thread), some actions may only be > possible with the right pattern item or meta item. Asking HW to perform > tunnel decap may only be safe if the pattern specifically matches that > protocol. > Yep continue in other thread. >> If the hardware has separate tables then we shouldn't try to have the >> PMD flatten those into a single table because we will have no way of >> knowing how to do that. (I'll respond to the other thread on this in >> an attempt to not get to scattered). > > OK, will reply there as well. > >>> Also while memory consumption is not really an issue, such a graph may be >>> huge. It could take a while for the PMD to update it when adding a rule >>> impacting capabilities. >> >> Ugh... I wouldn't suggest updating the capabilities at runtime like >> this. But I see your point if the graph has to _guarantee_ correctness >> how does it represent limited number of masks and other strange hw, >> its unfortunate the hardware isn't more regular. >> >> You have convinced me that guaranteed correctness via capabilities >> is going to difficult for many types of devices although not all. > > I'll just add that these capabilities also depend on side effects of > configuration performed outside the scope of this API. The way queues are > (re)initialized or offloads configured may affect them. RSS configuration is > the most obvious example. > OK. [...] >> >> My concern is this non-determinism will create performance issues in >> the network because when a flow may or may not be offloaded this can >> have a rather significant impact on its performance. This can make >> debugging network wide performance miserable when at time X I get >> performance X and then for whatever reason something degrades to >> software and at time Y I get some performance Y << X. I suspect that >> in general applications will bind tightly with hardware they know >> works. > > You are right, performance determinism is not taken into account at all, at > least not yet. It should not be an issue at the beginning as long as the > API has the ability evolve later for applications that need it. > > Just an idea, could some kind of meta pattern items specifying time > constraints for a rule address this issue? Say, how long (cycles/ms) the PMD > may take to query/apply/delete the rule. If it cannot be guaranteed, the > rule cannot be created. Applications could mantain statistic counters about > failed rules to determine if performance issues are caused by the inability > to create them. It seems a bit heavy to me to have each PMD driver implementing something like this. But it would be interesting to explore probably after the basic support is implemented though. > > [...] >>> For individual points: >>> >>> (i) should be doable with the query API without recompiling DPDK as well, >>> the fact API/ABI breakage must be avoided being part of the requirements. If >>> you think there is a problem regarding this, can you provide a specific >>> example? >> >> What I was after you noted yourself in the doc here, >> >> "PMDs can rely on this capability to simulate support for protocols with >> fixed headers not directly recognized by hardware." >> >> I was trying to get variable header support with the RAW capabilities. A >> parse graph supports this for example the proposed query API does not. > > OK, I see, however the RAW capability itself may not be supported everywhere > in patterns. What I described is that PMDs, not applications, could leverage > the RAW abilities of underlying devices to implement otherwise unsupported > but fixed patterns. > > So basically you would like to expose the ability to describe fixed protocol > definitions following RAW patterns, as in: Correct for say some new tunnel metadata or something. > > ETH / RAW / IP / UDP / ... > > While with such a pattern the current specification makes RAW (4.1.4.2) and > IP start matching from the same offset as two different branches, in effect > you cannot specify a fixed protocol following a RAW item. What this means though is for every new protocol we will need to rebuild drivers and dpdk. For a shared lib DPDK environment or a Linux distribution this can be painful. It would be best to avoid this. > > It is defined that way because I do not see how HW could parse higher level > protocols after having given up due to a RAW pattern, however assuming the > entire stack is described only using RAW patterns I guess it could be done. > > Such a pattern could be generated from a separate function before feeding it > to rte_flow_create(), or translated by the PMD afterwards assuming a > separate meta item such as RAW_END exists to signal the end of a RAW layer. > Of course processing this would be more expensive. > Or the supported parse graph could be fetched from the hardware with the values for each protocol so that the programming interface is the same. The well known protocols could keep the 'enum values' in the header rte_flow_item_type enum so that users would not be required to do the parse graph but for new or experimental protocols we could query the parse graph and get the programming pattern matching id for them. The normal flow would be unchanged but we don't get stuck upgrading everything to add our own protocol. So the flow would be, rte_get_parse_graph(graph); flow_item_proto = is_my_proto_supported(graph); pattern = build_flow_match(flow_item_proto, value, mask); action = build_action(); rte_flow_create(my_port, pattern, action); The only change to the API proposed to support this would be to allow unsupported RTE_FLOW_ values to be pushed to the hardware and define a range of values that are reserved for use by the parse graph discover. This would not have to be any more expensive. [...] >>>>> So you can put it after "known" >>>>> variable length headers like IP. The limitation is it can't get past >>>>> undefined variable length headers. >>> >>> RTE_FLOW_ITEM_TYPE_ANY is made for that purpose. Is that what you are >>> looking for? >>> >> >> But FLOW_ITEM_TYPE_ANY skips "any" header type is my understanding if >> we have new variable length header in the future we will have to add >> a new type RTE_FLOW_ITEM_TYPE_FOO for example. The RAW type will work >> for fixed headers as noted above. > > I'm (slowly) starting to get it. How about the suggestion I made above for > RAW items then? hmm for performance reasons building an entire graph up using RAW items seems to be a bit heavy. Another alternative to the above parse graph notion would be to allow users to add RAW node definitions at init time and have the PMD give a ID back for those. Then the new node could be used just like any other RTE_FLOW_ITEM_TYPE in a pattern. Something like, ret_flow_item_type_foo = rte_create_raw_node(foo_raw_pattern) ret_flow_item_type_bar = rte_create_raw_node(bar_raw_pattern) then allow ret_flow_item_type_{foo|bar} to be used in subsequent pattern matching items. And if the hardware can not support this return an error from the initial rte_create_raw_node() API call. Do any either of those proposals sound like reasonable extensions? > > [...] >> The two open items from me are do we need to support adding new variable >> length headers? And how do we handle multiple tables I'll take that up >> in the other thread. > > I think variable length headers may be eventually supported through pattern > tricks or eventually a separate conversion layer. > A parse graph notion would support this naturally though without pattern tricks hence my above suggestions. Also in the current scheme how would I match an ipv6 option or specific nsh option or mpls tag? >>>>> I looked at the git repo but I only saw the header definition I guess >>>>> the implementation is TBD after there is enough agreement on the >>>>> interface? >>> >>> Precisely, I intend to update the tree and send a v2 soon (unfortunately did >>> not have much time these past few days to work on this). >>> >>> Now what if, instead of a seemingly complex parse graph and still in >>> addition to the query method, enum values were defined for PMDs to report >>> an array of supported items, typical patterns and actions so applications >>> can get a quick idea of what devices are capable of without being too >>> specific. Something like: >>> >>> enum rte_flow_capability { >>> RTE_FLOW_CAPABILITY_ITEM_ETH, >>> RTE_FLOW_CAPABILITY_PATTERN_ETH_IP_TCP, >>> RTE_FLOW_CAPABILITY_ACTION_ID, >>> ... >>> }; >>> >>> Although I'm not convinced about the usefulness of this because it would >>> have to be maintained separately, but that would be easier than building a >>> dummy flow rule for simple query purposes. >> >> I'm not sure its necessary either at first. > > Then I'll discard this idea. > >>> The main question I have for you is, do you think the core of the specified >>> API is adequate enough assuming it can be extended later with new methods? >>> >> >> The above two items are my only opens at this point, I agree with your >> summary of my capabilities proposal namely it can be added. > > Thanks, see you in the other thread. > Thanks, John