Date: Wed, 10 Aug 2016 15:37:55 +0200
From: Adrien Mazarguil
To: John Fastabend
Cc: Rahul Lakkireddy, dev@dpdk.org, Thomas Monjalon, Helin Zhang, Jingjing Wu, Rasesh Mody, Ajit Khaparde, Wenzhuo Lu, Jan Medala, John Daley, Jing Chen, Konstantin Ananyev, Matej Vido, Alejandro Lucero, Sony Chacko, Jerin Jacob, Pablo de Lara, Olga Shern, Kumar A S, Nirranjan Kirubaharan, Indranil Choudhury
Message-ID: <20160810133755.GB3336@6wind.com>
In-Reply-To: <57AA4F80.6040101@gmail.com>
Subject: Re: [dpdk-dev] [RFC] Generic flow director/filtering/classification API

On Tue, Aug 09, 2016 at 02:47:44PM -0700, John Fastabend wrote:
> On 16-08-04 06:24 AM, Adrien Mazarguil wrote:
> > On Wed, Aug 03, 2016 at 12:11:56PM -0700, John Fastabend wrote:
[...]
> >> The problem is keeping priorities in order and/or possibly breaking
> >> rules apart (e.g. you have an L2 table and an L3 table) becomes very
> >> complex to manage at driver level. I think it's easier for the
> >> application which has some context to do this. The application "knows"
> >> if it's a router for example will likely be able to pack rules better
> >> than a PMD will.
> >
> > I don't think most applications know they are L2 or L3 routers. They may not
> > know more than the pattern provided to the PMD, which may indeed end at a L2
> > or L3 protocol. If the application simply chooses a table based on this
> > information, then the PMD could have easily done the same.
> >
>
> But when we start thinking about encap/decap then it's natural to start
> using this interface to implement various forwarding dataplanes. And one
> common way to organize a switch is into a TEP, router, switch
> (mac/vlan), ACL tables, etc. In fact we see this topology starting to
> show up in the NICs now.
>
> Further each table may be "managed" by a different entity. In which
> case the software will want to manage the physical and virtual networks
> separately.
>
> It doesn't make sense to me to require a software aggregator object to
> marshal the rules into a flat table then for a PMD to split them apart
> again.

OK, my point was mostly about handling basic cases easily and making sure
applications do not have to bother with petty HW details when they do not
want to, yet still get maximum performance by having the PMD make the most
appropriate choices automatically.

You've convinced me that in many cases PMDs won't be able to optimize
efficiently and that conscious applications will know better. The API has to
provide the ability to do so. I think it's fine as long as it is not
mandatory.

> > I understand the issue is what happens when applications really want to
> > define e.g. L2/L3/L2 rules in this specific order (or any ordering that
> > cannot be satisfied by HW due to table constraints).
> >
> > By exposing tables, in such a case applications should move all rules from
> > L2 to a L3 table themselves (assuming this is even supported) to guarantee
> > ordering between rules, or fail to add them. This is basically what the PMD
> > could have done, possibly in a more efficient manner in my opinion.
>
> I disagree with the more efficient comment :)
>
> If the software layer is working on L2/TEP/ACL/router layers merging
> them just to pull them back apart is not going to be more efficient.

Moving flow rules around cannot be efficient by definition; however, I think
that attempting to describe table capabilities may be as complicated as
describing HW bit-masking features. Applications may get it wrong as a
result, while a PMD would not make any mistake.

Your use case is valid though: if the application already groups rules, then
sharing this information with the PMD would make sense from a performance
standpoint.

> > Let's assume two opposite scenarios for this discussion:
> >
> > - App #1 is a command-line interface directly mapped to flow rules, which
> >   basically gets slow random input from users depending on how they want to
> >   configure their traffic. All rules differ considerably (L2, L3, L4, some
> >   with incomplete bit-masks, etc). All in all, few but complex rules with
> >   specific priorities.
> >
>
> Agree with this and in this case the application should be behind any
> network physical/virtual and not giving rules like encap/decap/etc. This
> application either sits on the physical function and "owns" the hardware
> resource or sits behind a virtual switch.
>
> >
> > - App #2 is something like OVS, creating and deleting a large number of very
> >   specific (without incomplete bit-masks) and mostly identical
> >   single-priority rules automatically and very frequently.
> >
>
> Maybe for OVS but not all virtual switches are built with flat tables
> at the bottom like this. Nor is it necessarily optimal.
>
> Another application (the one I'm concerned about :) would be built as
> a pipeline, something like
>
>    ACL -> TEP -> ACL -> VEB -> ACL
>
> If I have hardware that supports a TEP hardware block an ACL hardware
> block and a VEB block for example I don't want to merge my control
> plane into a single table. The merging in this case is just pure
> overhead/complexity for no gain.

It could be done by dedicating priority ranges for each item in the pipeline
but then it would be clunky. OK then, let's discuss the best approach to
implement this.

[...]

> >> It's not about mask vs no mask. The devices with multiple tables that I
> >> have don't have these mask limitations. It's about how to optimally pack
> >> the rules and who implements that logic. I think it's best done in the
> >> application where I have the context.
> >>
> >> Is there a way to omit the table field if the PMD is expected to do
> >> a best effort and add the table field if the user wants explicit
> >> control over table mgmt. This would support both models. I at least
> >> would like to have explicit control over rule population in my pipeline
> >> for use cases where I'm building a pipeline on top of the hardware.
> >
> > Yes that's a possibility. Perhaps the table ID to use could be specified as
> > a meta pattern item? We'd still need methods to report how many tables exist
> > and perhaps some way to report their limitations, these could be added later
> > through a separate set of functions.
>
> Sure I think a meta pattern item would be fine or put it in the API call
> directly, something like
>
>   rte_flow_create(port_id, pattern, actions);
>   rte_flow_create_table(port_id, table_id, pattern, actions);

I suggest using a common method for both cases; either seems fine to me as
long as a default table value (zero) can be provided when applications do
not care.
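
To make the "common method with a default" idea concrete, here is a minimal
sketch of the meta pattern item variant. All names in it (xflow_item,
XFLOW_ITEM_TABLE, xflow_pattern_table_id() and so on) are hypothetical and
only serve the illustration, none of them exist in DPDK. A pattern that
omits the table item simply resolves to table zero:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration only; none of this is DPDK API. */
enum xflow_item_type {
	XFLOW_ITEM_END = 0, /* terminates a pattern */
	XFLOW_ITEM_TABLE,   /* meta item: selects the target table */
	XFLOW_ITEM_ETH,
	XFLOW_ITEM_IPV4,
};

struct xflow_item {
	enum xflow_item_type type;
	uint32_t table_id; /* only meaningful for XFLOW_ITEM_TABLE */
};

#define XFLOW_TABLE_DEFAULT 0u /* used when the application does not care */

/* Return the table requested by a pattern, or the default if none is. */
static uint32_t
xflow_pattern_table_id(const struct xflow_item *pattern)
{
	assert(pattern != NULL);
	for (; pattern->type != XFLOW_ITEM_END; ++pattern)
		if (pattern->type == XFLOW_ITEM_TABLE)
			return pattern->table_id;
	return XFLOW_TABLE_DEFAULT;
}
```

With something along these lines, rte_flow_create(port_id, pattern, actions)
could remain the only entry point: applications wanting explicit control
prepend a table item to their pattern, the others leave it out.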

Now about table management, I think there is no need to expose table
capabilities (in case they have different capabilities) but instead provide
guidelines as part of the specification to encourage application writers to
group similar rules in tables.

As previously discussed, flow rule priorities would be specific to the table
they are assigned to.

Like flow rules, table priorities could be handled through their index with
index 0 having the highest priority. Like flow rule priorities, table
indices wouldn't have to be contiguous.

If this works for you, how about renaming "tables" to "groups"?

-- 
Adrien Mazarguil
6WIND