Date: Tue, 19 Jul 2016 15:12:19 +0200
From: Adrien Mazarguil
To: "Lu, Wenzhuo"
Cc: "dev@dpdk.org", Thomas Monjalon, "Zhang, Helin", "Wu, Jingjing",
 Rasesh Mody, Ajit Khaparde, Rahul Lakkireddy, Jan Medala, John Daley,
 "Chen, Jing D", "Ananyev, Konstantin", Matej Vido, Alejandro Lucero,
 Sony Chacko, Jerin Jacob, "De Lara Guarch, Pablo", Olga Shern
Message-ID: <20160719131219.GK7621@6wind.com>
In-Reply-To: <6A0DE07E22DDAD4C9103DF62FEBC090903492563@shsmsx102.ccr.corp.intel.com>
Subject: Re: [dpdk-dev] [RFC] Generic flow director/filtering/classification API

On Tue, Jul 19, 2016 at 08:11:48AM +0000, Lu, Wenzhuo wrote:
> Hi Adrien,
> Thanks for your clarification. Most of my questions are clear, but still
> something may need to be discussed, comment below.

Hi Wenzhuo,

Please see below.

[...]

> > > > Requirements for a new API:
> > > >
> > > > - Flexible and extensible without causing API/ABI problems for existing
> > > >   applications.
> > > > - Should be unambiguous and easy to use.
> > > > - Support existing filtering features and actions listed in `Filter types`_.
> > > > - Support packet alteration.
> > > > - In case of overlapping filters, their priority should be well documented.
> > > Does that mean we don't guarantee the consistent of priority? The priority
> > > can be different on different NICs. So the behavior of the actions can be
> > > different. Right?
> >
> > No, the intent is precisely to define what happens in order to get a
> > consistent result across different devices, and document cases with
> > undefined behavior. There must be no room left for interpretation.
> >
> > For example, the API must describe what happens when two overlapping filters
> > (e.g. one matching an Ethernet header, another one matching an IP header)
> > match a given packet at a given priority level.
> >
> > It is documented in section 4.1.1 (priorities) as "undefined behavior".
> > Applications remain free to do it and deal with consequences, at least they
> > know they cannot expect a consistent outcome, unless they use different
> > priority levels for both rules, see also 4.4.5 (flow rules priority).
> >
> > > Seems the users still need to aware the some details of the HW? Do we need
> > > to add the negotiation for the priority?
> >
> > Priorities as defined in this document may not be directly mappable to HW
> > capabilities (e.g. HW does not support enough priorities, or that some corner
> > case make them not work as described), in which case the PMD may choose to
> > simulate priorities (again 4.4.5), as long as the end result follows the
> > specification.
> >
> > So users must not be aware of some HW details, the PMD does and must
> > perform the needed workarounds to suit their expectations. Users may only be
> > impacted by errors while attempting to create rules that are either
> > unsupported or would cause them (or existing rules) to diverge from the spec.

> The problem is sometime the priority of the filters is fixed according to
> HW's implementation. For example, on ixgbe, n-tuple has a higher priority
> than flow director.

As a side note I did not know that N-tuple had a higher priority than flow
director on ixgbe; priorities among filter types do not seem to be documented
at all in DPDK. This is one of the reasons I think we need a generic API to
handle flow configuration.

So, today an application cannot combine N-tuple and FDIR flow rules and get a
reliable outcome, unless it is designed for specific devices with a known
behavior.

> What's the right behavior of PMD if APP want to create a flow director rule
> which has a higher or even equal priority than an existing n-tuple rule?
> Should PMD return fail?

First, remember that applications only deal with the generic API; PMDs are
responsible for choosing the most appropriate HW implementation to use
according to the requested flow rules (FDIR, N-tuple or anything else).

For the specific case of FDIR vs N-tuple, if the underlying HW supports both I
do not see why the PMD would create a N-tuple rule. Doesn't FDIR support
everything N-tuple can do and much more?

Assuming such a thing happened anyway, that the PMD had to create a rule using
a high priority filter type and that the application requests the creation of
a rule that can only be done using a lower priority filter type, but also
requested a higher priority for that rule, then yes, it should obviously fail.
That is, unless the PMD can perform some kind of workaround to have both.

> If so, do we need more fail reasons? According to this RFC, I think we need
> return " EEXIST: collision with an existing rule. ", but it's not very clear,
> APP doesn't know the problem is priority, maybe more detailed reason is
> helpful.

Possibly. I've defined a basic set of errors; there are quite a number of
errno values to choose from. However I think we should not define too many
values. In my opinion the basic set covers every possible failure:

- EINVAL: invalid format, rule is broken or cannot be understood by the PMD
  anyhow.

- ENOTSUP: pattern/actions look fine but something in the requested rule is
  not supported and thus cannot be applied.

- EEXIST: pattern/actions are fine and could have been applied if only some
  other rule did not prevent the PMD from doing it (I see it as the closest
  thing to "ETOOBAD" which unfortunately does not exist).

- ENOMEM: like EEXIST, except it is due to a lack of resources, not to another
  rule. I wasn't sure which of ENOMEM or ENOSPC was better but settled on
  ENOMEM as it is well known. Still open to debate.

Errno values are only useful to get a rough idea of the reason, and another
mechanism is needed to pinpoint the exact problem for debugging/reporting
purposes, something like:

enum rte_flow_error_type {
	RTE_FLOW_ERROR_TYPE_NONE,
	RTE_FLOW_ERROR_TYPE_UNKNOWN,
	RTE_FLOW_ERROR_TYPE_PRIORITY,
	RTE_FLOW_ERROR_TYPE_PATTERN,
	RTE_FLOW_ERROR_TYPE_ACTION,
};

struct rte_flow_error {
	enum rte_flow_error_type type;
	void *offset; /* Points to the exact pattern item or action. */
	const char *message;
};

Then either provide an optional struct rte_flow_error pointer to
rte_flow_validate(), or a separate function (rte_flow_analyze()?), since
processing this may be quite expensive and applications may not care about the
exact reason.
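To make the intent more concrete, here is a rough usage sketch (not part of
the RFC text; it assumes rte_flow_validate() gets the extra struct
rte_flow_error * parameter and returns a negative errno value, both of which
are still open for discussion; port_id, pattern and actions are assumed to be
already set up by the application):

	struct rte_flow_error error = { .type = RTE_FLOW_ERROR_TYPE_NONE };
	int ret = rte_flow_validate(port_id, pattern, actions, &error);

	if (ret) {
		switch (error.type) {
		case RTE_FLOW_ERROR_TYPE_PATTERN:
			/* error.offset points to the offending pattern item. */
			fprintf(stderr, "bad pattern item: %s\n",
				error.message ? error.message : strerror(-ret));
			break;
		case RTE_FLOW_ERROR_TYPE_ACTION:
			/* error.offset points to the offending action. */
			fprintf(stderr, "bad action: %s\n",
				error.message ? error.message : strerror(-ret));
			break;
		default:
			fprintf(stderr, "cannot validate rule: %s\n",
				strerror(-ret));
			break;
		}
	}

Whatever the final form, the idea is that applications get a pointer to the
exact item or action at fault together with an optional human-readable
message, instead of a bare errno value.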
What do you suggest?

> > > > Behavior
> > > > --------
> > > >
> > > > - API operations are synchronous and blocking (``EAGAIN`` cannot be
> > > >   returned).
> > > >
> > > > - There is no provision for reentrancy/multi-thread safety, although nothing
> > > >   should prevent different devices from being configured at the same
> > > >   time. PMDs may protect their control path functions accordingly.
> > > >
> > > > - Stopping the data path (TX/RX) should not be necessary when managing flow
> > > >   rules. If this cannot be achieved naturally or with workarounds (such as
> > > >   temporarily replacing the burst function pointers), an appropriate error
> > > >   code must be returned (``EBUSY``).
> > > PMD cannot stop the data path without adding lock. So I think if some rules
> > > cannot be applied without stopping rx/tx, PMD has to return fail.
> > > Or let the APP to stop the data path.
> >
> > Agreed, that is the intent. If the PMD cannot touch flow rules for some
> > reason even after trying really hard, then it just returns EBUSY.
> >
> > Perhaps we should write down that applications may get a different outcome
> > after stopping the data path if they get EBUSY?
> Agree, it's better to describe more about the APP. BTW, I checked the
> behavior of ixgbe/igb, I think we can add/delete filters during runtime.
> Hopefully we'll not hit too many EBUSY problems on other NICs :)

OK, I will add it.
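For the record, the expected application-side handling would look something
like this (sketch only; the rte_flow_create() signature is simplified and
reporting EBUSY through rte_errno is an assumption at this stage, only
rte_eth_dev_stop()/rte_eth_dev_start() are existing calls):

	struct rte_flow *flow = rte_flow_create(port_id, pattern, actions);

	if (flow == NULL && rte_errno == EBUSY) {
		/* The PMD could not modify rules while the data path was
		 * running; stopping the port may yield a different outcome. */
		rte_eth_dev_stop(port_id);
		flow = rte_flow_create(port_id, pattern, actions);
		rte_eth_dev_start(port_id);
	}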
> > > > - PMDs, not applications, are responsible for maintaining flow rules
> > > >   configuration when stopping and restarting a port or performing other
> > > >   actions which may affect them. They can only be destroyed explicitly.
> > > Don’t understand " They can only be destroyed explicitly."
> >
> > This part says that as long as an application has not called
> > rte_flow_destroy() on a flow rule, it never disappears, whatever happens to
> > the port (stopped, restarted). The application is not responsible for
> > re-creating rules after that.
> >
> > Note that according to the specification, this may translate to not being
> > able to stop a port as long as a flow rule is present, depending on how nice
> > the PMD intends to be with applications. Implementation can be done in small
> > steps with minimal amount of code on the PMD side.
> Does it mean PMD should store and maintain all the rules? Why not let rte do
> that? I think if PMD maintain all the rules, it means every kind of NIC
> should have a copy of code for the rules. But if rte do that, only one copy
> of code need to be maintained, right?

I've considered having rules stored in a common format understood at the RTE
level and not specific to each PMD, and decided that the opaque rte_flow
pointer was a better choice for the following reasons:

- Even though flow rules management is done in the control path, processing
  must be as fast as possible. Letting PMDs store flow rules using their own
  internal representation gives them the chance to achieve better performance.

- An opaque context managed by PMDs would probably have to be stored somewhere
  as well anyway.

- PMDs may not need to allocate/store anything at all if they exclusively rely
  on HW state for everything. In my opinion, the generic API has enough
  constraints for this to work and maintain consistency between flow rules.
  Note this is currently how most PMDs implement FDIR and other filter types.

- RTE can (and will) provide helpers to avoid most of the code redundancy,
  PMDs are free to use them or manage everything by themselves.

- Given that the opaque rte_flow pointer associated with a flow rule is to be
  stored by the application, PMDs do not even have to keep references to them.

- The flow rules format described in this specification (pattern / actions)
  will be used by applications directly, and they will be free to arrange them
  in lists, trees or in any other way if they need to keep flow specifications
  around for further processing.

> When the port is stopped and restarted, rte can reconfigure the rules. Is the
> concern that PMD may adjust the sequence of the rules according to the
> priority, so every NIC has a different list of rules? But PMD can adjust them
> again when rte reconfiguring the rules.

What about PMDs able to stop and restart ports without destroying their own
flow rules? If we assume flow rules must be destroyed when stopping a port,
these PMDs are needlessly penalized with slower stop/start cycles. Think about
it assuming thousands of flow rules.

Thus from an application point of view, whatever happens when stopping and
restarting a port should not matter. If a flow rule was present before, it
must still be present afterwards. If the PMD had to destroy flow rules and
re-create them, it does not actually matter if they differ slightly at the HW
level, as long as:

- Existing opaque flow rule pointers (rte_flow) are still valid to the PMD and
  refer to the same rules.

- The overall behavior of all rules is the same.

The list of rules you think of (patterns / actions) is maintained by
applications (not RTE), and only if they need them. RTE would needlessly
duplicate this.
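To illustrate what this means on the application side (sketch only, with
simplified/assumed signatures for rte_flow_create() and rte_flow_destroy()):

	/* The application keeps only the opaque handles returned at creation
	 * time; how rules are stored internally is up to each PMD. */
	#define MAX_RULES 1024 /* arbitrary, for this sketch */

	struct rte_flow *flows[MAX_RULES];
	unsigned int nb_flows = 0;

	/* Rule creation: only the opaque pointer needs to be kept around. */
	flows[nb_flows] = rte_flow_create(port_id, pattern, actions);
	if (flows[nb_flows] != NULL)
		++nb_flows;

	/* The port may be stopped and restarted here; the handles remain
	 * valid and the rules they refer to are still present. */

	/* Rules only disappear when destroyed explicitly. */
	while (nb_flows)
		rte_flow_destroy(port_id, flows[--nb_flows]);

Whether the PMD keeps an internal copy behind these handles or relies purely
on HW state is invisible to the application.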
-- 
Adrien Mazarguil
6WIND