Date: Thu, 7 Jul 2016 12:26:50 +0200
From: Adrien Mazarguil
To: "Lu, Wenzhuo"
Cc: "dev@dpdk.org", Thomas Monjalon, "Zhang, Helin", "Wu, Jingjing",
 Rasesh Mody, Ajit Khaparde, Rahul Lakkireddy, Jan Medala, John Daley,
 "Chen, Jing D", "Ananyev, Konstantin", Matej Vido, Alejandro Lucero,
 Sony Chacko, Jerin Jacob, "De Lara Guarch, Pablo", Olga Shern
Message-ID: <20160707102650.GU7621@6wind.com>
References: <20160705181646.GO7621@6wind.com>
 <6A0DE07E22DDAD4C9103DF62FEBC09090348E1A7@shsmsx102.ccr.corp.intel.com>
In-Reply-To: <6A0DE07E22DDAD4C9103DF62FEBC09090348E1A7@shsmsx102.ccr.corp.intel.com>
Subject: Re: [dpdk-dev] [RFC] Generic flow director/filtering/classification API

Hi Lu Wenzhuo,

Thanks for your feedback, I'm replying below as well.

On Thu, Jul 07, 2016 at 07:14:18AM +0000, Lu, Wenzhuo wrote:
> Hi Adrien,
> I have some questions, please see inline, thanks.
>
> > -----Original Message-----
> > From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com]
> > Sent: Wednesday, July 6, 2016 2:17 AM
> > To: dev@dpdk.org
> > Cc: Thomas Monjalon; Zhang, Helin; Wu, Jingjing; Rasesh Mody; Ajit Khaparde;
> > Rahul Lakkireddy; Lu, Wenzhuo; Jan Medala; John Daley; Chen, Jing D; Ananyev,
> > Konstantin; Matej Vido; Alejandro Lucero; Sony Chacko; Jerin Jacob; De Lara
> > Guarch, Pablo; Olga Shern
> > Subject: [RFC] Generic flow director/filtering/classification API
> >
> >
> > Requirements for a new API:
> >
> > - Flexible and extensible without causing API/ABI problems for existing
> >   applications.
> > - Should be unambiguous and easy to use.
> > - Support existing filtering features and actions listed in `Filter types`_.
> > - Support packet alteration.
> > - In case of overlapping filters, their priority should be well documented.
> Does that mean we don't guarantee the consistent of priority? The priority can be different on different NICs. So the behavior of the actions can be different. Right?

No, the intent is precisely to define what happens in order to get a
consistent result across different devices, and to document cases with
undefined behavior. There must be no room left for interpretation.

For example, the API must describe what happens when two overlapping filters
(e.g. one matching an Ethernet header, another one matching an IP header)
match a given packet at a given priority level.

It is documented in section 4.1.1 (priorities) as "undefined behavior".
Applications remain free to do this and deal with the consequences; at least
they know they cannot expect a consistent outcome unless they use different
priority levels for both rules. See also 4.4.5 (flow rules priority).

> Seems the users still need to aware the some details of the HW? Do we need to add the negotiation for the priority?

Priorities as defined in this document may not be directly mappable to HW
capabilities (e.g.
HW does not support enough priorities, or some corner case makes them not
work as described), in which case the PMD may choose to simulate priorities
(again 4.4.5), as long as the end result follows the specification.

So users do not need to be aware of HW details; the PMD is, and it must
perform the needed workarounds to suit their expectations. Users may only be
impacted by errors while attempting to create rules that are either
unsupported or would cause them (or existing rules) to diverge from the spec.

> > Flow rules can have several distinct actions (such as counting,
> > encapsulating, decapsulating before redirecting packets to a particular
> > queue, etc.), instead of relying on several rules to achieve this and having
> > applications deal with hardware implementation details regarding their
> > order.
> I think normally HW doesn't support several actions in one rule. If a rule has several actions, seems HW has to split it to several rules. The order can still be a problem.

Note that, except for a very small subset of pattern items and actions,
supporting multiple actions for a given rule is not mandatory. It can be
emulated, as you said, by splitting them into several rules, each with its
own priority, if possible (see 3.3 "high level design").

Also, a rule "action" as defined in this API can be just about anything, for
example combining a queue redirection with 32-bit tagging. FDIR supports many
cases of what can be described as several actions, see 5.7 "FDIR to most item
types → QUEUE, DROP, PASSTHRU".

If you were thinking about having two queue targets for a given rule, then I
agree with you - that is why a rule cannot have more than a single action of
a given type (see 4.1.5 actions), to avoid such abuse from applications.

Applications must use several pass-through rules with different priority
levels if they want to perform a given action several times on a given
packet. Again, PMD support is not mandatory as pass-through is optional.

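To make the "no more than a single action of a given type" constraint a bit
more concrete, here is a rough sketch of the kind of check that could reject
such a rule. Everything in it is an assumption made for illustration: the
``example_`` names, the END-terminated list layout and the choice to let VOID
placeholders repeat are not taken from the draft header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical action types, loosely named after actions in this thread. */
enum example_action_type {
	EXAMPLE_ACTION_END = 0, /* terminates an action list (assumption) */
	EXAMPLE_ACTION_VOID,
	EXAMPLE_ACTION_PASSTHRU,
	EXAMPLE_ACTION_QUEUE,
	EXAMPLE_ACTION_DROP,
	EXAMPLE_ACTION_COUNT,
	EXAMPLE_ACTION_MAX, /* number of types, not an action */
};

/* Hypothetical action entry: a type and its opaque configuration. */
struct example_action {
	enum example_action_type type;
	const void *conf;
};

/*
 * Return 0 if no action type appears more than once in the list,
 * -1 otherwise. VOID entries are skipped since they are placeholders
 * (allowing them to repeat is an assumption of this sketch).
 */
int
example_actions_check(const struct example_action *actions)
{
	uint32_t seen = 0; /* one bit per action type already encountered */

	assert(actions != NULL);
	for (; actions->type != EXAMPLE_ACTION_END; ++actions) {
		uint32_t bit;

		if (actions->type == EXAMPLE_ACTION_VOID)
			continue;
		if ((unsigned int)actions->type >= EXAMPLE_ACTION_MAX)
			return -1; /* unknown type */
		bit = UINT32_C(1) << actions->type;
		if (seen & bit)
			return -1; /* same type requested twice */
		seen |= bit;
	}
	return 0;
}
```

With such a check, a list like COUNT + QUEUE is accepted while QUEUE + QUEUE
is refused, which is where an application would fall back to several
pass-through rules at different priority levels.
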
> > ``ETH``
> > ^^^^^^^
> >
> > Matches an Ethernet header.
> >
> > - ``dst``: destination MAC.
> > - ``src``: source MAC.
> > - ``type``: EtherType.
> > - ``tags``: number of 802.1Q/ad tags defined.
> > - ``tag[]``: 802.1Q/ad tag definitions, innermost first. For each one:
> >
> >   - ``tpid``: Tag protocol identifier.
> >   - ``tci``: Tag control information.
> "ETH" means all the parameters, dst, src, type... need to be matched? The same question for IPv4, IPv6 ...

Yes, it's basically the description of all Ethernet header fields including
VLAN tags (same for other protocols). Please see the linked draft header file
which should make all of this easier to understand:

 https://raw.githubusercontent.com/6WIND/rte_flow/master/rte_flow.h

> > ``UDP``
> > ^^^^^^^
> >
> > Matches a UDP header.
> >
> > - ``sport``: source port.
> > - ``dport``: destination port.
> > - ``length``: UDP length.
> > - ``checksum``: UDP checksum.
> Why checksum? Do we need to filter the packets by checksum value?

Well, I've decided to include all protocol header fields for completeness (so
the ABI does not need to be broken later when they become necessary, or
require another pattern item), not because all of them make sense in a
pattern.

In this specific case, all PMDs I know of must reject a pattern specification
with a nonzero mask for the checksum field, because none of them support it.

> > ``VOID`` (action)
> > ^^^^^^^^^^^^^^^^^
> >
> > Used as a placeholder for convenience. It is ignored and simply discarded by
> > PMDs.
> Don't understand why we need VOID. If it’s about the format. Why not guarantee it in rte layer?

I'm not sure I understand your question about the rte layer, but this type is
fully managed by the PMD and is not supposed to be translated to a hardware
action.

I think it may come in handy in some cases (like the VOID pattern item), so it
is defined just in case. It should be relatively trivial to support.
Applications may find a use for it when they want to statically define
templates for flow rules and need to leave room in them for some reason.

> > Behavior
> > --------
> >
> > - API operations are synchronous and blocking (``EAGAIN`` cannot be
> >   returned).
> >
> > - There is no provision for reentrancy/multi-thread safety, although nothing
> >   should prevent different devices from being configured at the same
> >   time. PMDs may protect their control path functions accordingly.
> >
> > - Stopping the data path (TX/RX) should not be necessary when managing flow
> >   rules. If this cannot be achieved naturally or with workarounds (such as
> >   temporarily replacing the burst function pointers), an appropriate error
> >   code must be returned (``EBUSY``).
> PMD cannot stop the data path without adding lock. So I think if some rules cannot be applied without stopping rx/tx, PMD has to return fail.
> Or let the APP to stop the data path.

Agreed, that is the intent. If the PMD cannot touch flow rules for some
reason even after trying really hard, then it just returns EBUSY.

Perhaps we should write down that applications may get a different outcome
after stopping the data path if they get EBUSY?

> > - PMDs, not applications, are responsible for maintaining flow rules
> >   configuration when stopping and restarting a port or performing other
> >   actions which may affect them. They can only be destroyed explicitly.
> Don’t understand " They can only be destroyed explicitly."

This part says that as long as an application has not called
rte_flow_destroy() on a flow rule, it never disappears, whatever happens to
the port (stopped, restarted). The application is not responsible for
re-creating rules after that.

Note that according to the specification, this may translate to not being
able to stop a port as long as a flow rule is present, depending on how nice
the PMD intends to be with applications. Implementation can be done in small
steps with a minimal amount of code on the PMD side.

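As an illustration of that persistence rule, a PMD could keep a software copy
of every flow rule and only drop it on an explicit destroy request. The
sketch below is not from the draft: the ``example_`` names are made up, and
"hardware" is reduced to an ``in_hw`` flag so the logic stays self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical software copy of a flow rule kept by a PMD. */
struct example_flow {
	struct example_flow *next;
	unsigned int priority;
	bool in_hw; /* stands in for "programmed in hardware" */
};

/* Hypothetical per-port state; the rule list survives stop/start. */
struct example_port {
	struct example_flow *flows;
	bool started;
};

/* Create a rule; it is programmed right away only if the port runs. */
struct example_flow *
example_flow_create(struct example_port *port, unsigned int priority)
{
	struct example_flow *flow;

	assert(port != NULL);
	flow = calloc(1, sizeof(*flow));
	if (flow == NULL)
		return NULL;
	flow->priority = priority;
	flow->in_hw = port->started;
	flow->next = port->flows;
	port->flows = flow;
	return flow;
}

/* The only way a rule goes away: explicit destruction. */
void
example_flow_destroy(struct example_port *port, struct example_flow *flow)
{
	struct example_flow **prev = &port->flows;

	while (*prev != NULL && *prev != flow)
		prev = &(*prev)->next;
	if (*prev == NULL)
		return; /* not a rule of this port */
	*prev = flow->next;
	free(flow);
}

/* Stopping removes rules from hardware but keeps the software copies. */
void
example_port_stop(struct example_port *port)
{
	struct example_flow *flow;

	port->started = false;
	for (flow = port->flows; flow != NULL; flow = flow->next)
		flow->in_hw = false;
}

/* Starting re-creates every remembered rule. */
void
example_port_start(struct example_port *port)
{
	struct example_flow *flow;

	port->started = true;
	for (flow = port->flows; flow != NULL; flow = flow->next)
		flow->in_hw = true;
}

/* Number of rules the application still owns on this port. */
size_t
example_flow_count(const struct example_port *port)
{
	const struct example_flow *flow;
	size_t n = 0;

	for (flow = port->flows; flow != NULL; flow = flow->next)
		++n;
	return n;
}
```

A real PMD would also have to handle re-creation failures in the start path,
which this sketch leaves out on purpose.
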
> If a new rule has conflict with an old one, what should we do? Return fail?

That should not happen. If, say, 100 rules have been created with various
priorities and the port is happily running with them, stopping the port may
require the PMD to destroy them; it then has to automatically re-create all
100 of them exactly as they were when restarting the port.

If re-creating them is not possible for some reason, the port cannot be
restarted as long as rules that cannot be added back haven't been destroyed
by the application. Frankly, this should not happen.

To manage this case, I suggest preventing applications from doing things that
conflict with existing flow rules while the port is stopped (just like when
it is not stopped, as described in 5.7 "FDIR to most item types").

> > ``ANY`` pattern item
> > ~~~~~~~~~~~~~~~~~~~~
> >
> > This pattern item stands for anything, which can be difficult to translate
> > to something hardware would understand, particularly if followed by more
> > specific types.
> >
> > Consider the following pattern:
> >
> > +---+--------------------------------+
> > | 0 | ETHER                          |
> > +---+--------------------------------+
> > | 1 | ANY (``min`` = 1, ``max`` = 1) |
> > +---+--------------------------------+
> > | 2 | TCP                            |
> > +---+--------------------------------+
> >
> > Knowing that TCP does not make sense with something other than IPv4 and IPv6
> > as L3, such a pattern may be translated to two flow rules instead:
> >
> > +---+--------------------+
> > | 0 | ETHER              |
> > +---+--------------------+
> > | 1 | IPV4 (zeroed mask) |
> > +---+--------------------+
> > | 2 | TCP                |
> > +---+--------------------+
> >
> > +---+--------------------+
> > | 0 | ETHER              |
> > +---+--------------------+
> > | 1 | IPV6 (zeroed mask) |
> > +---+--------------------+
> > | 2 | TCP                |
> > +---+--------------------+
> >
> > Note that as soon as a ANY rule covers several layers, this approach may
> > yield a large number of hidden flow rules.
> > It is thus suggested to only support the most common scenarios (anything as
> > L2 and/or L3).
> I think "any" may make things confusing. How about if the NIC doesn't support IPv6? Should we return fail for this rule?

In a sense you are right, ANY relies on HW capabilities so you cannot know
that it won't match unsupported protocols. The above example would be
somewhat useless for a conscientious application, which should really have
created two separate flow rules (and gotten an error on the IPv6 one).

So an ANY flow rule only able to match v4 packets won't return an error.

ANY can be useful to match outer packets when only a tunnel header and the
inner packet are meaningful to the application. HW that does not recognize
the outer packet is not able to recognize the inner one anyway.

This section only says that PMDs should do their best to make HW match what
they can when faced with ANY.

Also once again, ANY support is not mandatory.

> > Flow rules priority
> > ~~~~~~~~~~~~~~~~~~~
> >
> > While it would naturally make sense, flow rules cannot be assumed to be
> > processed by hardware in the same order as their creation for several
> > reasons:
> >
> > - They may be managed internally as a tree or a hash table instead of a
> >   list.
> > - Removing a flow rule before adding another one can either put the new rule
> >   at the end of the list or reuse a freed entry.
> > - Duplication may occur when packets are matched by several rules.
> >
> > For overlapping rules (particularly in order to use the `PASSTHRU`_ action)
> > predictable behavior is only guaranteed by using different priority levels.
> >
> > Priority levels are not necessarily implemented in hardware, or may be
> > severely limited (e.g. a single priority bit).
> >
> > For these reasons, priority levels may be implemented purely in software by
> > PMDs.
> >
> > - For devices expecting flow rules to be added in the correct order, PMDs
> >   may destroy and re-create existing rules after adding a new one with
> >   a higher priority.
> >
> > - A configurable number of dummy or empty rules can be created at
> >   initialization time to save high priority slots for later.
> >
> > - In order to save priority levels, PMDs may evaluate whether rules are
> >   likely to collide and adjust their priority accordingly.
> If there's 3 rules, r1, r2,r3. The rules say the priority is r1 > r2 > r3. If PMD can only support r1 > r3 > r2, or doesn't support r2. Should PMD apply r1 and r3 or totally not support them all?

Remember that the API lets applications create only one rule at a time. If
the 3 rules are supported individually but not together, the answer depends
on what the application does:

1. r1 OK, r2 FAIL => the application chooses to stop here, thus only r1 works
   as expected (it may roll back and remove r1 as a result).

2. r1 OK, r2 FAIL, r3 OK => the application chooses to ignore the fact that r2
   failed and adds r3 anyway, so it should end up with r1 > r3.

Applications should do as described in 1; they need to check for errors if
they want consistency.

This document describes only the basic functions, but may be extended later
with methods to add several flow rules at once, so rules that depend on
others can be added together and a single failure is returned without the
need for a rollback at the application level.

> A generic question, is the parsing supposed to be done by rte or PMD?

Actually, a bit of both. EAL will certainly at least provide helpers to
assist PMDs. This specification defines only the public-facing API for now,
but our goal is really to have something that is not too difficult to
implement both for applications and PMDs.

These helpers can be defined later with the first implementation.

-- 
Adrien Mazarguil
6WIND