From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f46.google.com (mail-wm0-f46.google.com [74.125.82.46]) by dpdk.org (Postfix) with ESMTP id 5C33D5585 for ; Thu, 21 Jul 2016 19:07:44 +0200 (CEST) Received: by mail-wm0-f46.google.com with SMTP id q128so28726426wma.1 for ; Thu, 21 Jul 2016 10:07:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=6wind-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:mail-followup-to:references :mime-version:content-disposition:in-reply-to; bh=1xeGJOC499Er2+2+8qgibAv90R2br+Zlg7aGjUWWskg=; b=NEcc7RsA3Jf+ydzANsE4JWhR4RGLQi6w7nCzc6aUUs6yr9TN78wv/g+cEQfpz0a+yw 2+mVXpWJR+wncuOiqm+QaO4ltTjvHUo+6Momlvj2MrCrjIy/5/lz5DzLOVgMTBEe4taa 4MSUPwueWhB8xB7paxMZf3moPugN+vNa98tCxWGPIGYl9IJ7XboMkGzbZhssIXZ0Kqwy IvGxNUZ+ih1ps/ZKc3L3SlEHGAT/hAiAVbrXT6lD5vOlwg8Gq1d16lnOTv0xv8TWyyeO ZMoBoRrlCpxr2/aTL6KkpzGQpUCeCJ4s/ngKSd3cKY7aBZMfZ+8Of2NBrpRG7Mt95eeS ErJQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:date:from:to:cc:subject:message-id :mail-followup-to:references:mime-version:content-disposition :in-reply-to; bh=1xeGJOC499Er2+2+8qgibAv90R2br+Zlg7aGjUWWskg=; b=KQkRdeRoOApM+Z4yazsnAod5QQdLkzZqwqB/EQmFDGcw0iDaGdFMw05dmiTPxfQ1JQ ez9mD2qUA/JkjAhcQoih/1mxAzUpOfSAlAVkPWOEkF5ecSVqBzulDDf/kR3HHnSKZ7J0 CFlEhExZypP+8CTKo0O1D9MZnutsgHpP8MmVYIk59iHk1D++1TJ2k7qS9PQZRJHHKQed HqMg9mLpnj9oyj73p0QsKw0ESzD0AzQfuOSz+/x5e5uw13zUlnYm/nBaMsFIIhCLzz12 GRwUndyH2pRg/vYai3kboqP/MnIvjb51jjFLbH43h1Iio0DltKxl9PQHEEv1kK7lcPCP e7Jg== X-Gm-Message-State: ALyK8tKkCN/fIAJm1+oXSiv08bzUIynPR3U128chvuGq30cZtobWYbokAx9q6Awb4nXENvd4 X-Received: by 10.28.234.16 with SMTP id i16mr19690574wmh.76.1469120863958; Thu, 21 Jul 2016 10:07:43 -0700 (PDT) Received: from 6wind.com (guy78-3-82-239-227-177.fbx.proxad.net. [82.239.227.177]) by smtp.gmail.com with ESMTPSA id c131sm3475176wmh.1.2016.07.21.10.07.40 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 21 Jul 2016 10:07:42 -0700 (PDT) Date: Thu, 21 Jul 2016 19:07:38 +0200 From: Adrien Mazarguil To: Rahul Lakkireddy Cc: dev@dpdk.org, Thomas Monjalon , Helin Zhang , Jingjing Wu , Rasesh Mody , Ajit Khaparde , Wenzhuo Lu , Jan Medala , John Daley , Jing Chen , Konstantin Ananyev , Matej Vido , Alejandro Lucero , Sony Chacko , Jerin Jacob , Pablo de Lara , Olga Shern , Kumar Sanghvi , Nirranjan Kirubaharan , Indranil Choudhury Message-ID: <20160721170738.GT7621@6wind.com> Mail-Followup-To: Rahul Lakkireddy , dev@dpdk.org, Thomas Monjalon , Helin Zhang , Jingjing Wu , Rasesh Mody , Ajit Khaparde , Wenzhuo Lu , Jan Medala , John Daley , Jing Chen , Konstantin Ananyev , Matej Vido , Alejandro Lucero , Sony Chacko , Jerin Jacob , Pablo de Lara , Olga Shern , Kumar Sanghvi , Nirranjan Kirubaharan , Indranil Choudhury References: <20160705181646.GO7621@6wind.com> <20160721081335.GA15856@chelsio.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160721081335.GA15856@chelsio.com> Subject: Re: [dpdk-dev] [RFC] Generic flow director/filtering/classification API X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Jul 2016 17:07:44 -0000 Hi Rahul, Please see below. On Thu, Jul 21, 2016 at 01:43:37PM +0530, Rahul Lakkireddy wrote: > Hi Adrien, > > The proposal looks very good. It satisfies most of the features > supported by Chelsio NICs. We are looking for suggestions on exposing > more additional features supported by Chelsio NICs via this API. > > Chelsio NICs have two regions in which filters can be placed - > Maskfull and Maskless regions. As their names imply, maskfull region > can accept masks to match a range of values; whereas, maskless region > don't accept any masks and hence perform a more strict exact-matches. > Filters without masks can also be placed in maskfull region. By > default, maskless region have higher priority over the maskfull region. > However, the priority between the two regions is configurable. I understand this configuration affects the entire device. Just to be clear, assuming some filters are already configured, are they affected by a change of region priority later? > Please suggest on how we can let the apps configure in which region > filters must be placed and set the corresponding priority accordingly > via this API. Okay. Applications, like customers, are always right. With this in mind, PMDs are not allowed to break existing flow rules, and face two options when applications provide a flow specification that would break an existing rule: - Refuse to create it (easiest). - Find a workaround to apply it anyway (possibly quite complicated). The reason you have these two regions is performance right? Otherwise I'd just say put everything in the maskfull region. PMDs are allowed to rearrange existing rules or change device parameters as long as existing constraints are satisfied. In my opinion it does not matter which region has the highest default priority. Applications always want the best performance so the first created rule should be in the most appropriate region. If a subsequent rule requires it to be in the other region but the application specified the wrong priority for this to work, then the PMD can either choose to swap region priorities on the device (assuming it does not affect other rules), or destroy and recreate the original rule in a way that satisfies all constraints (i.e. moving conflicting rules from the maskless region to the maskfull one). Going further, when subsequent rules get destroyed the PMD should ideally move back maskfull rules back into the maskless region for better performance. This is only a suggestion. PMDs have the right to say "no" at some point. More important in my opinion is to make sure applications can create a given set of flow rules in any order. If rules a/b/c can be created, then it won't make sense from an application point of view if c/a/b for some reason cannot and the PMD maintainers will rightfully get a bug report. > More comments below. > > On Tuesday, July 07/05/16, 2016 at 20:16:46 +0200, Adrien Mazarguil wrote: > > Hi All, > > > [...] > > > > > ``ETH`` > > ^^^^^^^ > > > > Matches an Ethernet header. > > > > - ``dst``: destination MAC. > > - ``src``: source MAC. > > - ``type``: EtherType. > > - ``tags``: number of 802.1Q/ad tags defined. > > - ``tag[]``: 802.1Q/ad tag definitions, innermost first. For each one: > > > > - ``tpid``: Tag protocol identifier. > > - ``tci``: Tag control information. > > > > ``IPV4`` > > ^^^^^^^^ > > > > Matches an IPv4 header. > > > > - ``src``: source IP address. > > - ``dst``: destination IP address. > > - ``tos``: ToS/DSCP field. > > - ``ttl``: TTL field. > > - ``proto``: protocol number for the next layer. > > > > ``IPV6`` > > ^^^^^^^^ > > > > Matches an IPv6 header. > > > > - ``src``: source IP address. > > - ``dst``: destination IP address. > > - ``tc``: traffic class field. > > - ``nh``: Next header field (protocol). > > - ``hop_limit``: hop limit field (TTL). > > > > ``ICMP`` > > ^^^^^^^^ > > > > Matches an ICMP header. > > > > - TBD. > > > > ``UDP`` > > ^^^^^^^ > > > > Matches a UDP header. > > > > - ``sport``: source port. > > - ``dport``: destination port. > > - ``length``: UDP length. > > - ``checksum``: UDP checksum. > > > > .. raw:: pdf > > > > PageBreak > > > > ``TCP`` > > ^^^^^^^ > > > > Matches a TCP header. > > > > - ``sport``: source port. > > - ``dport``: destination port. > > - All other TCP fields and bits. > > > > ``VXLAN`` > > ^^^^^^^^^ > > > > Matches a VXLAN header. > > > > - TBD. > > > > In addition to above matches, Chelsio NICs have some additional > features: > > - Match based on unicast DST-MAC, multicast DST-MAC, broadcast DST-MAC. > Also, there is a match criteria available called 'promisc' - which > matches packets that are not destined for the interface, but had > been received by the hardware due to interface being in promiscuous > mode. There is no unicast/multicast/broadcast distinction in the ETH pattern item, those are covered by the specified MAC address. Now about this "promisc" match criteria, it can be added as a new meta pattern item (4.1.3 Meta item types). Do you want it to be defined from the start or add it later with the related code in your PMD? > - Match FCoE packets. The "oE" part is covered by the ETH item (using the proper Ethertype), but a new pattern item should probably be added for the FC protocol itself. As above, I suggest adding it later. > - Match IP Fragmented packets. It seems that I missed quite a few fields in the IPv4 item definition (struct rte_flow_item_ipv4). It should be in there. > - Match range of physical ports on the NIC in a single rule via masks. > For ex: match all UDP packets coming on ports 3 and 4 out of 4 > ports available on the NIC. Applications create flow rules per port, I'd thus suggest that the PMD should detect identical rules created on different ports and aggregate them as a single HW rule automatically. If you think this approach is not right, the alternative is a meta pattern item that provides a list of ports. I'm not sure this is the right approach considering it would most likely not be supported by most NICs. Applications may not request it explicitly. > - Match range of Physical Functions (PFs) on the NIC in a single rule > via masks. For ex: match all traffic coming on several PFs. The PF and VF pattern items assume there is a single PF associated with a DPDK port. VFs are identified with an ID. I basically took the same definitions as the existing filter types, perhaps this is not enough for Chelsio adapters. Do you expose more than one PF for a DPDK port? Anyway, I'd suggest the same approach as above, automatic aggregation of rules for performance reasons, otherwise new or updated PF/VF pattern items, in which case it would be great if you could provide ideal structure definitions for this use case. > Please suggest on how we can expose the above features to DPDK apps via > this API. > > [...] > > > > > Actions > > ~~~~~~~ > > > > Each possible action is represented by a type. Some have associated > > configuration structures. Several actions combined in a list can be affected > > to a flow rule. That list is not ordered. > > > > At least one action must be defined in a filter rule in order to do > > something with matched packets. > > > > - Actions are defined with ``struct rte_flow_action``. > > - A list of actions is defined with ``struct rte_flow_actions``. > > > > They fall in three categories: > > > > - Terminating actions (such as QUEUE, DROP, RSS, PF, VF) that prevent > > processing matched packets by subsequent flow rules, unless overridden > > with PASSTHRU. > > > > - Non terminating actions (PASSTHRU, DUP) that leave matched packets up for > > additional processing by subsequent flow rules. > > > > - Other non terminating meta actions that do not affect the fate of packets > > (END, VOID, ID, COUNT). > > > > When several actions are combined in a flow rule, they should all have > > different types (e.g. dropping a packet twice is not possible). However > > considering the VOID type is an exception to this rule, the defined behavior > > is for PMDs to only take into account the last action of a given type found > > in the list. PMDs still perform error checking on the entire list. > > > > *Note that PASSTHRU is the only action able to override a terminating rule.* > > > > Chelsio NICs can support an action 'switch' which can re-direct > matched packets from one port to another port in hardware. In addition, > it can also optionally: > > 1. Perform header rewrites (src-mac/dst-mac rewrite, src-mac/dst-mac > swap, vlan add/remove/rewrite). > > 2. Perform NAT'ing in hardware (4-tuple rewrite). > > before sending it out on the wire [1]. > > To meet the above requirements, we'd need a way to pass sub-actions > to action 'switch' and a way to pass extra info (such as new > src-mac/dst-mac, new vlan, new 4-tuple for NAT) to rewrite > corresponding fields. > > We're looking for suggestions on how we can achieve action 'switch' > in this new API. > > From our understanding of this API, we could just expand > rte_flow_action_type with an additional action type > RTE_FLOW_ACTION_TYPE_SWITCH and define several sub-actions such as: > > enum rte_flow_action_switch_type { > RTE_FLOW_ACTION_SWITCH_TYPE_NONE, > RTE_FLOW_ACTION_SWITCH_TYPE_MAC_REWRITE, > RTE_FLOW_ACTION_SWITCH_TYPE_MAC_SWAP, > RTE_FLOW_ACTION_SWITCH_TYPE_VLAN_INSERT, > RTE_FLOW_ACTION_SWITCH_TYPE_VLAN_DELETE, > RTE_FLOW_ACTION_SWITCH_TYPE_VLAN_REWRITE, > RTE_FLOW_ACTION_SWITCH_TYPE_NAT_REWRITE, > }; > > and then define an rte_flow_action_switch as follows: > > struct rte_flow_action_switch { > enum rte_flow_action_switch_type type; /* sub actions to perform */ > uint16_t port; /* Destination physical port to switch packet */ > enum rte_flow_item_type type; /* Fields to rewrite */ > const void *switch_spec; > /* Holds info to rewrite matched flows */ > }; > > Does the above approach sound right with respect to this new API? It does. Not sure I'd go with a sublevel of actions for switching types though. Think more generic, MAC swapping, MAC rewrite and so on could be defined as separate actions usable on their own by all PMDs, so you'd combine a SWITCH action with these. Also be careful with the "port" field definition. I'm sure the Chelsio switch cannot make a packet leave through a port of a Mellanox device. DPDK applications are aware of a single "port namespace", so it has to be translated by the PMD to a physical port on the same Chelio adapter, otherwise rule creation should fail. > [...] > > > > > ``COUNT`` > > ^^^^^^^^^ > > > > Enables hits counter for this rule. > > > > This counter can be retrieved and reset through ``rte_flow_query()``, see > > ``struct rte_flow_query_count``. > > > > - Counters can be retrieved with ``rte_flow_query()``. > > - No configurable property. > > > > +---------------+ > > | COUNT | > > +===============+ > > | no properties | > > +---------------+ > > > > Query structure to retrieve and reset the flow rule hits counter: > > > > +------------------------------------------------+ > > | COUNT query | > > +===========+=====+==============================+ > > | ``reset`` | in | reset counter after query | > > +-----------+-----+------------------------------+ > > | ``hits`` | out | number of hits for this flow | > > +-----------+-----+------------------------------+ > > > > Chelsio NICs can also count the number of bytes that hit the rule. > So, need a counter "bytes". As well as the number of packets, right? Anyway it makes sense, I'll add a "bytes" field. > [1] http://www.dpdk.org/ml/archives/dev/2016-February/032605.html Wow. I've not paid much attention to this thread at the time. Tweaking FDIR this much was quite an achievement. Now I feel lazy with my proposal for a brand new API instead, thanks. -- Adrien Mazarguil 6WIND