From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from stargate3.asicdesigners.com (unknown [12.32.117.8]) by dpdk.org (Postfix) with ESMTP id 4137337A6 for ; Mon, 25 Jul 2016 13:35:24 +0200 (CEST) Received: from localhost (scalar.blr.asicdesigners.com [10.193.185.94]) by stargate3.asicdesigners.com (8.13.8/8.13.8) with ESMTP id u6PBXHHm016863; Mon, 25 Jul 2016 04:33:28 -0700 Date: Mon, 25 Jul 2016 17:02:31 +0530 From: Rahul Lakkireddy To: Adrien Mazarguil Cc: dev@dpdk.org, Thomas Monjalon , Helin Zhang , Jingjing Wu , Rasesh Mody , Ajit Khaparde , Wenzhuo Lu , Jan Medala , John Daley , Jing Chen , Konstantin Ananyev , Matej Vido , Alejandro Lucero , Sony Chacko , Jerin Jacob , Pablo de Lara , Olga Shern , Kumar Sanghvi , Nirranjan Kirubaharan , Indranil Choudhury Message-ID: <20160725113229.GA24036@chelsio.com> References: <20160705181646.GO7621@6wind.com> <20160721081335.GA15856@chelsio.com> <20160721170738.GT7621@6wind.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160721170738.GT7621@6wind.com> User-Agent: Mutt/1.5.24 (2015-08-30) Subject: Re: [dpdk-dev] [RFC] Generic flow director/filtering/classification API X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 25 Jul 2016 11:35:30 -0000 Hi Adrien, On Thursday, July 07/21/16, 2016 at 19:07:38 +0200, Adrien Mazarguil wrote: > Hi Rahul, > > Please see below. > > On Thu, Jul 21, 2016 at 01:43:37PM +0530, Rahul Lakkireddy wrote: > > Hi Adrien, > > > > The proposal looks very good. It satisfies most of the features > > supported by Chelsio NICs. We are looking for suggestions on exposing > > more additional features supported by Chelsio NICs via this API. > > > > Chelsio NICs have two regions in which filters can be placed - > > Maskfull and Maskless regions. As their names imply, maskfull region > > can accept masks to match a range of values; whereas, maskless region > > don't accept any masks and hence perform a more strict exact-matches. > > Filters without masks can also be placed in maskfull region. By > > default, maskless region have higher priority over the maskfull region. > > However, the priority between the two regions is configurable. > > I understand this configuration affects the entire device. Just to be clear, > assuming some filters are already configured, are they affected by a change > of region priority later? > Both the regions exist at the same time in the device. Each filter can either belong to maskfull or the maskless region. The priority is configured at time of filter creation for every individual filter and cannot be changed while the filter is still active. If priority needs to be changed for a particular filter then, it needs to be deleted first and re-created. > > Please suggest on how we can let the apps configure in which region > > filters must be placed and set the corresponding priority accordingly > > via this API. > > Okay. Applications, like customers, are always right. > > With this in mind, PMDs are not allowed to break existing flow rules, and > face two options when applications provide a flow specification that would > break an existing rule: > > - Refuse to create it (easiest). > > - Find a workaround to apply it anyway (possibly quite complicated). > > The reason you have these two regions is performance right? Otherwise I'd > just say put everything in the maskfull region. > Unfortunately, our maskfull region is extremely small too compared to maskless region. > PMDs are allowed to rearrange existing rules or change device parameters as > long as existing constraints are satisfied. In my opinion it does not matter > which region has the highest default priority. Applications always want the > best performance so the first created rule should be in the most appropriate > region. > > If a subsequent rule requires it to be in the other region but the > application specified the wrong priority for this to work, then the PMD can > either choose to swap region priorities on the device (assuming it does not > affect other rules), or destroy and recreate the original rule in a way that > satisfies all constraints (i.e. moving conflicting rules from the maskless > region to the maskfull one). > > Going further, when subsequent rules get destroyed the PMD should ideally > move back maskfull rules back into the maskless region for better > performance. > > This is only a suggestion. PMDs have the right to say "no" at some point. > Filter search and deletion are expensive operations and they need to be atomic in order to not affect existing traffic already hitting the filters. Also, Chelsio hardware can support upto ~500,000 maskless region entries. So, the cost of search becomes too high for the PMD when there are large number of filter entries present. In my opinion, rather than PMD deciding on priority by itself, it would be more simpler if the apps have the flexibility to configure the priority between the regions by themselves? > More important in my opinion is to make sure applications can create a given > set of flow rules in any order. If rules a/b/c can be created, then it won't > make sense from an application point of view if c/a/b for some reason cannot > and the PMD maintainers will rightfully get a bug report. > > > More comments below. > > > > On Tuesday, July 07/05/16, 2016 at 20:16:46 +0200, Adrien Mazarguil wrote: > > > Hi All, > > > > > [...] > > > > > > > > ``ETH`` > > > ^^^^^^^ > > > > > > Matches an Ethernet header. > > > > > > - ``dst``: destination MAC. > > > - ``src``: source MAC. > > > - ``type``: EtherType. > > > - ``tags``: number of 802.1Q/ad tags defined. > > > - ``tag[]``: 802.1Q/ad tag definitions, innermost first. For each one: > > > > > > - ``tpid``: Tag protocol identifier. > > > - ``tci``: Tag control information. > > > > > > ``IPV4`` > > > ^^^^^^^^ > > > > > > Matches an IPv4 header. > > > > > > - ``src``: source IP address. > > > - ``dst``: destination IP address. > > > - ``tos``: ToS/DSCP field. > > > - ``ttl``: TTL field. > > > - ``proto``: protocol number for the next layer. > > > > > > ``IPV6`` > > > ^^^^^^^^ > > > > > > Matches an IPv6 header. > > > > > > - ``src``: source IP address. > > > - ``dst``: destination IP address. > > > - ``tc``: traffic class field. > > > - ``nh``: Next header field (protocol). > > > - ``hop_limit``: hop limit field (TTL). > > > > > > ``ICMP`` > > > ^^^^^^^^ > > > > > > Matches an ICMP header. > > > > > > - TBD. > > > > > > ``UDP`` > > > ^^^^^^^ > > > > > > Matches a UDP header. > > > > > > - ``sport``: source port. > > > - ``dport``: destination port. > > > - ``length``: UDP length. > > > - ``checksum``: UDP checksum. > > > > > > .. raw:: pdf > > > > > > PageBreak > > > > > > ``TCP`` > > > ^^^^^^^ > > > > > > Matches a TCP header. > > > > > > - ``sport``: source port. > > > - ``dport``: destination port. > > > - All other TCP fields and bits. > > > > > > ``VXLAN`` > > > ^^^^^^^^^ > > > > > > Matches a VXLAN header. > > > > > > - TBD. > > > > > > > In addition to above matches, Chelsio NICs have some additional > > features: > > > > - Match based on unicast DST-MAC, multicast DST-MAC, broadcast DST-MAC. > > Also, there is a match criteria available called 'promisc' - which > > matches packets that are not destined for the interface, but had > > been received by the hardware due to interface being in promiscuous > > mode. > > There is no unicast/multicast/broadcast distinction in the ETH pattern item, > those are covered by the specified MAC address. > Ok. Makes sense. > Now about this "promisc" match criteria, it can be added as a new meta > pattern item (4.1.3 Meta item types). Do you want it to be defined from the > start or add it later with the related code in your PMD? > It could be added as a meta item. If there are other interested parties, it can be added now. Otherwise, we'll add it with our filtering related code. > > - Match FCoE packets. > > The "oE" part is covered by the ETH item (using the proper Ethertype), but a > new pattern item should probably be added for the FC protocol itself. As > above, I suggest adding it later. > Same here. > > - Match IP Fragmented packets. > > It seems that I missed quite a few fields in the IPv4 item definition > (struct rte_flow_item_ipv4). It should be in there. > This raises another interesting question. What should the PMD do if it has support to only a subset of fields in the particular item? For example, if a rule has been sent to match IP fragmentation along with several other IPv4 fields, and if the underlying hardware doesn't support matching based on IP fragmentation, does the PMD reject the complete rule although it could have done the matching for rest of the IPv4 fields? > > - Match range of physical ports on the NIC in a single rule via masks. > > For ex: match all UDP packets coming on ports 3 and 4 out of 4 > > ports available on the NIC. > > Applications create flow rules per port, I'd thus suggest that the PMD > should detect identical rules created on different ports and aggregate them > as a single HW rule automatically. > > If you think this approach is not right, the alternative is a meta pattern > item that provides a list of ports. I'm not sure this is the right approach > considering it would most likely not be supported by most NICs. Applications > may not request it explicitly. > Aggregating via PMD will be expensive operation since it would involve: - Search of existing filters. - Deleting those filters. - Creating a single combined filter. And all of above 3 operations would need to be atomic so as not to affect existing traffic which is hitting above filters. Adding a meta item would be a simpler solution here. > > - Match range of Physical Functions (PFs) on the NIC in a single rule > > via masks. For ex: match all traffic coming on several PFs. > > The PF and VF pattern items assume there is a single PF associated with a > DPDK port. VFs are identified with an ID. I basically took the same > definitions as the existing filter types, perhaps this is not enough for > Chelsio adapters. > > Do you expose more than one PF for a DPDK port? > > Anyway, I'd suggest the same approach as above, automatic aggregation of > rules for performance reasons, otherwise new or updated PF/VF pattern items, > in which case it would be great if you could provide ideal structure > definitions for this use case. > In Chelsio hardware, all the ports of a device are exposed via single PF4. There could be many VFs attached to a PF. Physical NIC functions are operational on PF4, while VFs can be attached to PFs 0-3. So, Chelsio hardware doesn't remain tied on a PF-to-Port, one-to-one mapping assumption. There already seems to be a PF meta-item, but it doesn't seem to accept any "spec" and "mask" field. Similarly, the VF meta-item doesn't seem to accept a "mask" field. We could probably enable these fields in the PF and VF meta-items to allow configuration. > > Please suggest on how we can expose the above features to DPDK apps via > > this API. > > > > [...] > > > > > > > > Actions > > > ~~~~~~~ > > > > > > Each possible action is represented by a type. Some have associated > > > configuration structures. Several actions combined in a list can be affected > > > to a flow rule. That list is not ordered. > > > > > > At least one action must be defined in a filter rule in order to do > > > something with matched packets. > > > > > > - Actions are defined with ``struct rte_flow_action``. > > > - A list of actions is defined with ``struct rte_flow_actions``. > > > > > > They fall in three categories: > > > > > > - Terminating actions (such as QUEUE, DROP, RSS, PF, VF) that prevent > > > processing matched packets by subsequent flow rules, unless overridden > > > with PASSTHRU. > > > > > > - Non terminating actions (PASSTHRU, DUP) that leave matched packets up for > > > additional processing by subsequent flow rules. > > > > > > - Other non terminating meta actions that do not affect the fate of packets > > > (END, VOID, ID, COUNT). > > > > > > When several actions are combined in a flow rule, they should all have > > > different types (e.g. dropping a packet twice is not possible). However > > > considering the VOID type is an exception to this rule, the defined behavior > > > is for PMDs to only take into account the last action of a given type found > > > in the list. PMDs still perform error checking on the entire list. > > > > > > *Note that PASSTHRU is the only action able to override a terminating rule.* > > > > > > > Chelsio NICs can support an action 'switch' which can re-direct > > matched packets from one port to another port in hardware. In addition, > > it can also optionally: > > > > 1. Perform header rewrites (src-mac/dst-mac rewrite, src-mac/dst-mac > > swap, vlan add/remove/rewrite). > > > > 2. Perform NAT'ing in hardware (4-tuple rewrite). > > > > before sending it out on the wire [1]. > > > > To meet the above requirements, we'd need a way to pass sub-actions > > to action 'switch' and a way to pass extra info (such as new > > src-mac/dst-mac, new vlan, new 4-tuple for NAT) to rewrite > > corresponding fields. > > > > We're looking for suggestions on how we can achieve action 'switch' > > in this new API. > > > > From our understanding of this API, we could just expand > > rte_flow_action_type with an additional action type > > RTE_FLOW_ACTION_TYPE_SWITCH and define several sub-actions such as: > > > > enum rte_flow_action_switch_type { > > RTE_FLOW_ACTION_SWITCH_TYPE_NONE, > > RTE_FLOW_ACTION_SWITCH_TYPE_MAC_REWRITE, > > RTE_FLOW_ACTION_SWITCH_TYPE_MAC_SWAP, > > RTE_FLOW_ACTION_SWITCH_TYPE_VLAN_INSERT, > > RTE_FLOW_ACTION_SWITCH_TYPE_VLAN_DELETE, > > RTE_FLOW_ACTION_SWITCH_TYPE_VLAN_REWRITE, > > RTE_FLOW_ACTION_SWITCH_TYPE_NAT_REWRITE, > > }; > > > > and then define an rte_flow_action_switch as follows: > > > > struct rte_flow_action_switch { > > enum rte_flow_action_switch_type type; /* sub actions to perform */ > > uint16_t port; /* Destination physical port to switch packet */ > > enum rte_flow_item_type type; /* Fields to rewrite */ > > const void *switch_spec; > > /* Holds info to rewrite matched flows */ > > }; > > > > Does the above approach sound right with respect to this new API? > > It does. Not sure I'd go with a sublevel of actions for switching types > though. Think more generic, MAC swapping, MAC rewrite and so on could be > defined as separate actions usable on their own by all PMDs, so you'd > combine a SWITCH action with these. > Ok. Separate actions seem better. > Also be careful with the "port" field definition. I'm sure the Chelsio > switch cannot make a packet leave through a port of a Mellanox device. DPDK > applications are aware of a single "port namespace", so it has to be > translated by the PMD to a physical port on the same Chelio adapter, > otherwise rule creation should fail. > Agreed. > > [...] > > > > > > > > ``COUNT`` > > > ^^^^^^^^^ > > > > > > Enables hits counter for this rule. > > > > > > This counter can be retrieved and reset through ``rte_flow_query()``, see > > > ``struct rte_flow_query_count``. > > > > > > - Counters can be retrieved with ``rte_flow_query()``. > > > - No configurable property. > > > > > > +---------------+ > > > | COUNT | > > > +===============+ > > > | no properties | > > > +---------------+ > > > > > > Query structure to retrieve and reset the flow rule hits counter: > > > > > > +------------------------------------------------+ > > > | COUNT query | > > > +===========+=====+==============================+ > > > | ``reset`` | in | reset counter after query | > > > +-----------+-----+------------------------------+ > > > | ``hits`` | out | number of hits for this flow | > > > +-----------+-----+------------------------------+ > > > > > > > Chelsio NICs can also count the number of bytes that hit the rule. > > So, need a counter "bytes". > > As well as the number of packets, right? Anyway it makes sense, I'll add > a "bytes" field. > Yes. I hope 'hits' is analogous to 'number of packets' right? > > [1] http://www.dpdk.org/ml/archives/dev/2016-February/032605.html > > Wow. I've not paid much attention to this thread at the time. Tweaking FDIR > this much was quite an achievement. Now I feel lazy with my proposal for a > brand new API instead, thanks. Thanks. A generic filtering infrastructure was what I was aiming for and this proposal seems to have hit the mark. Thanks, Rahul