From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f196.google.com (mail-pf0-f196.google.com [209.85.192.196]) by dpdk.org (Postfix) with ESMTP id 526E837A4 for ; Wed, 3 Aug 2016 21:12:21 +0200 (CEST) Received: by mail-pf0-f196.google.com with SMTP id i6so15148831pfe.0 for ; Wed, 03 Aug 2016 12:12:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=subject:to:references:from:message-id:date:user-agent:mime-version :in-reply-to:content-transfer-encoding; bh=rC1Y0xgFtsmO+d0DV5ZeBscJG3dzgx8h20wG3SoNzRU=; b=wsd3D4y7fFbth2T+/02OGWt3zZZpd1hEYO7zjThJpf7jrD/rhgDrJLWpJktzJYMhuf LkV6EI83xhXHkdGcnKBrUKroGZclEXdKzK2+z05ujTU3QST3HRg/RogvHkvHubYQiBIb Ygj38kZzt57VF50gOXpsvcd1CYl7jqr0UKLwXcNKJtPf9ZTOy67SnDZKgOVE+s634iDt istltdW2T24ihun+9jmTHxVDRTVxfqRzqPSMO9H7Q1h0AUttUsUeY16YXhSr7PLHelAL 9eqGSphCecPIFwXAjwlE2c4vD0fbgJDlFNs1uH/6u4cuQnq90APk3ergXEMwitNX4hNu AWww== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:subject:to:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding; bh=rC1Y0xgFtsmO+d0DV5ZeBscJG3dzgx8h20wG3SoNzRU=; b=hlQNWgzaOswi0RWey2laQt7gZNKzNpbrMnWuXDgPBS8/AEqBHAs1Yh4hLsSYCJoNMp eFpAeY4Qem3Sz6jWZMjKeEC6niExWoppwihZGOKyx2n0Q2xYc5sRCWBJkfTfMizds7VK cUrHb1Y9QyJ3RKlovO+7yjKrSLTfHtLO/pe93JazLBTGLtH0hTzbgi8PuN25txY/n2Cx AlR+kuqpGst+XZmX4aHCJiw5QW5p4OvFbQhjJu+lZOXzJkO94Vpu+3aJXkKrkzdIFHG2 GUvHHcvftsjh+ipsdF6WmooZ0LWwH2EzxkhVUJcyf8f3uozjGgX2ezuztNr7AZZ72aZT Esfw== X-Gm-Message-State: AEkooutjn202QbAvVB2MVF1/l2ehHEP6+jmtEsQNGnGW65oCecn3TyOTob4q8vU/xHKubw== X-Received: by 10.98.15.145 with SMTP id 17mr118543674pfp.40.1470251540372; Wed, 03 Aug 2016 12:12:20 -0700 (PDT) Received: from [192.168.1.6] ([72.168.145.53]) by smtp.googlemail.com with ESMTPSA id g21sm14448963pfj.88.2016.08.03.12.12.02 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 03 Aug 2016 12:12:19 -0700 (PDT) To: Rahul Lakkireddy , dev@dpdk.org, Thomas Monjalon , Helin Zhang , Jingjing Wu , Rasesh Mody , Ajit Khaparde , Wenzhuo Lu , Jan Medala , John Daley , Jing Chen , Konstantin Ananyev , Matej Vido , Alejandro Lucero , Sony Chacko , Jerin Jacob , Pablo de Lara , Olga Shern , Kumar A S , Nirranjan Kirubaharan , Indranil Choudhury References: <20160705181646.GO7621@6wind.com> <20160721081335.GA15856@chelsio.com> <20160721170738.GT7621@6wind.com> <20160725113229.GA24036@chelsio.com> <579640E2.50702@gmail.com> <20160726100731.GA2542@chelsio.com> <20160803164410.GH3336@6wind.com> From: John Fastabend Message-ID: <57A241FC.30508@gmail.com> Date: Wed, 3 Aug 2016 12:11:56 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0 MIME-Version: 1.0 In-Reply-To: <20160803164410.GH3336@6wind.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Subject: Re: [dpdk-dev] [RFC] Generic flow director/filtering/classification API X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Aug 2016 19:12:21 -0000 [...] >>>>>> The proposal looks very good. It satisfies most of the features >>>>>> supported by Chelsio NICs. We are looking for suggestions on exposing >>>>>> more additional features supported by Chelsio NICs via this API. >>>>>> >>>>>> Chelsio NICs have two regions in which filters can be placed - >>>>>> Maskfull and Maskless regions. As their names imply, maskfull region >>>>>> can accept masks to match a range of values; whereas, maskless region >>>>>> don't accept any masks and hence perform a more strict exact-matches. >>>>>> Filters without masks can also be placed in maskfull region. By >>>>>> default, maskless region have higher priority over the maskfull region. >>>>>> However, the priority between the two regions is configurable. >>>>> >>>>> I understand this configuration affects the entire device. Just to be clear, >>>>> assuming some filters are already configured, are they affected by a change >>>>> of region priority later? >>>>> >>>> >>>> Both the regions exist at the same time in the device. Each filter can >>>> either belong to maskfull or the maskless region. >>>> >>>> The priority is configured at time of filter creation for every >>>> individual filter and cannot be changed while the filter is still >>>> active. If priority needs to be changed for a particular filter then, >>>> it needs to be deleted first and re-created. >>> >>> Could you model this as two tables and add a table_id to the API? This >>> way user space could populate the table it chooses. We would have to add >>> some capabilities attributes to "learn" if tables support masks or not >>> though. >>> >> >> This approach sounds interesting. > > Now I understand the idea behind these tables, however from an application > point of view I still think it's better if the PMD could take care of flow > rules optimizations automatically. Think about it, PMDs have exactly a > single kind of device they know perfectly well to manage, while applications > want the best possible performance out of any device in the most generic > fashion. The problem is keeping priorities in order and/or possibly breaking rules apart (e.g. you have an L2 table and an L3 table) becomes very complex to manage at driver level. I think its easier for the application which has some context to do this. The application "knows" if its a router for example will likely be able to pack rules better than a PMD will. > >>> I don't see how the PMD can sort this out in any meaningful way and it >>> has to be exposed to the application that has the intelligence to 'know' >>> priorities between masks and non-masks filters. I'm sure you could come >>> up with something but it would be less than ideal in many cases I would >>> guess and we can't have the driver getting priorities wrong or we may >>> not get the correct behavior. > > It may be solved by having the PMD maintain a SW state to quickly know which > rules are currently created and in what state the device is so basically the > application doesn't have to perform this work. > > This API allows applications to express basic needs such as "redirect > packets matching this pattern to that queue". It must not deal with HW > details and limitations in my opinion. If a request cannot be satisfied, > then the rule cannot be created. No help from the application must be > expected by PMDs, otherwise it opens the door to the same issues as the > legacy filtering APIs. This depends on the application and what/how it wants to manage the device. If the application manages a pipeline with some set of tables, then mapping this down to a single table, which then the PMD has to unwind back to a multi-table topology to me seems like a waste. > > [...] >>>> Unfortunately, our maskfull region is extremely small too compared to >>>> maskless region. >>>> >>> >>> To me this means a userspace application would want to pack it >>> carefully to get the full benefit. So you need some mechanism to specify >>> the "region" hence the above table proposal. >>> >> >> Right. Makes sense. > > I do not agree, applications should not be aware of it. Note this case can > be handled differently, so that rules do not have to be moved back and forth > between both tables. If the first created rule requires a maskfull entry, > then all subsequent rules will be entered into that table. Otherwise no > maskfull entry can be created as long as there is one maskless entry. When > either table is full, no more rules may be added. Would that work for you? > Its not about mask vs no mask. The devices with multiple tables that I have don't have this mask limitations. Its about how to optimally pack the rules and who implements that logic. I think its best done in the application where I have the context. Is there a way to omit the table field if the PMD is expected to do a best effort and add the table field if the user wants explicit control over table mgmt. This would support both models. I at least would like to have explicit control over rule population in my pipeline for use cases where I'm building a pipeline on top of the hardware. >> [...] >>>>> Now about this "promisc" match criteria, it can be added as a new meta >>>>> pattern item (4.1.3 Meta item types). Do you want it to be defined from the >>>>> start or add it later with the related code in your PMD? >>>>> >>>> >>>> It could be added as a meta item. If there are other interested >>>> parties, it can be added now. Otherwise, we'll add it with our filtering >>>> related code. >>>> >>> >>> hmm I guess by "promisc" here you mean match packets received from the >>> wire before they have been switched by the silicon? >>> >> >> Match packets received from wire before they have been switched by >> silicon, and which also includes packets not destined for DUT and were >> still received due to interface being in promisc mode. > > I think it's fine, but we'll have to precisely define what happens when a > packet matched with such pattern is part of a terminating rule. For instance > if it is duplicated by HW, then the rule cannot be terminating. > > [...] >>>> This raises another interesting question. What should the PMD do >>>> if it has support to only a subset of fields in the particular item? >>>> >>>> For example, if a rule has been sent to match IP fragmentation along >>>> with several other IPv4 fields, and if the underlying hardware doesn't >>>> support matching based on IP fragmentation, does the PMD reject the >>>> complete rule although it could have done the matching for rest of the >>>> IPv4 fields? >>> >>> I think it has to fail the command other wise user space will not have >>> any way to understand that the full match criteria can not be met and >>> we will get different behavior for the same applications on different >>> nics depending on hardware feature set. This will most likely break >>> applications so we need the error IMO. >>> >> >> Ok. Makes sense. > > Yes, I fully agree with this. > >>>>>> - Match range of physical ports on the NIC in a single rule via masks. >>>>>> For ex: match all UDP packets coming on ports 3 and 4 out of 4 >>>>>> ports available on the NIC. >>>>> >>>>> Applications create flow rules per port, I'd thus suggest that the PMD >>>>> should detect identical rules created on different ports and aggregate them >>>>> as a single HW rule automatically. >>>>> >>>>> If you think this approach is not right, the alternative is a meta pattern >>>>> item that provides a list of ports. I'm not sure this is the right approach >>>>> considering it would most likely not be supported by most NICs. Applications >>>>> may not request it explicitly. >>>>> >>>> >>>> Aggregating via PMD will be expensive operation since it would involve: >>>> - Search of existing filters. >>>> - Deleting those filters. >>>> - Creating a single combined filter. >>>> >>>> And all of above 3 operations would need to be atomic so as not to >>>> affect existing traffic which is hitting above filters. > > Atomicity may not be a problem if the PMD makes sure the new combined rule > is inserted before the others, so they do not need to be removed either. > >>>> Adding a >>>> meta item would be a simpler solution here. > > Yes, clearly. > >>> For this adding a meta-data item seems simplest to me. And if you want >>> to make the default to be only a single port that would maybe make it >>> easier for existing apps to port from flow director. Then if an >>> application cares it can create a list of ports if needed. >>> >> >> Agreed. > > However although I'm not opposed to adding dedicated meta items, remember > applications will not automatically benefit from the increased performance > if a single PMD implements this feature, their maintainers will probably not > bother with it. > Unless as we noted in other thread the application is closely bound to its hardware for capability reasons. In this case it would make sense to implement. >>>>>> - Match range of Physical Functions (PFs) on the NIC in a single rule >>>>>> via masks. For ex: match all traffic coming on several PFs. >>>>> >>>>> The PF and VF pattern items assume there is a single PF associated with a >>>>> DPDK port. VFs are identified with an ID. I basically took the same >>>>> definitions as the existing filter types, perhaps this is not enough for >>>>> Chelsio adapters. >>>>> >>>>> Do you expose more than one PF for a DPDK port? >>>>> >>>>> Anyway, I'd suggest the same approach as above, automatic aggregation of >>>>> rules for performance reasons, otherwise new or updated PF/VF pattern items, >>>>> in which case it would be great if you could provide ideal structure >>>>> definitions for this use case. >>>>> >>>> >>>> In Chelsio hardware, all the ports of a device are exposed via single >>>> PF4. There could be many VFs attached to a PF. Physical NIC functions >>>> are operational on PF4, while VFs can be attached to PFs 0-3. >>>> So, Chelsio hardware doesn't remain tied on a PF-to-Port, one-to-one >>>> mapping assumption. >>>> >>>> There already seems to be a PF meta-item, but it doesn't seem to accept >>>> any "spec" and "mask" field. Similarly, the VF meta-item doesn't >>>> seem to accept a "mask" field. We could probably enable these fields >>>> in the PF and VF meta-items to allow configuration. >>> >>> Maybe a range field would help here as well? So you could specify a VF >>> range. It might be one of the things to consider adding later though if >>> there is no clear use for it now. >>> >> >> VF-value and VF-mask would help to achieve the desired filter. >> VF-mask would also enable to specify a range of VF values. > > Like John, I think a range or even a list instead of a mask would be better, > the PMD can easily create a mask from that if necessary. Reason is that > we've always had bad experiences with bit-fields, they're always too short > at some point and we would like to avoid having to break the ABI to update > existing pattern items later. Agreed avoiding bit-fields is a good idea. > > Also while I don't think this is the case yet, perhaps it will be a good > idea for PFs/VFs to have global unique IDs, just like DPDK ports. > > Thanks. >