From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by dpdk.org (Postfix) with ESMTP id 3A662374F for ; Tue, 23 May 2017 18:42:12 +0200 (CEST) Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga105.fm.intel.com with ESMTP; 23 May 2017 09:42:11 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.38,382,1491289200"; d="scan'208";a="105660187" Received: from fyigit-mobl1.ger.corp.intel.com (HELO [10.237.220.81]) ([10.237.220.81]) by fmsmga005.fm.intel.com with ESMTP; 23 May 2017 09:42:10 -0700 To: Adrien Mazarguil Cc: "Iremonger, Bernard" , "dev@dpdk.org" , "Mcnamara, John" , "Tahhan, Maryam" References: <20170420185448.19162-1-ferruh.yigit@intel.com> <20170518181203.15244-1-ferruh.yigit@intel.com> <8CEF83825BEC744B83065625E567D7C224D5F20E@IRSMSX108.ger.corp.intel.com> <7428261c-4276-1d28-63c2-3a51a583cdf9@intel.com> <20170523122612.GC1758@6wind.com> <2742b205-fec9-3414-7693-e0c2369ae80d@intel.com> <20170523133057.GD1758@6wind.com> From: Ferruh Yigit Message-ID: <949610d4-c6fe-057e-f32c-f653ac15f863@intel.com> Date: Tue, 23 May 2017 17:42:09 +0100 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.1.1 MIME-Version: 1.0 In-Reply-To: <20170523133057.GD1758@6wind.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Subject: Re: [dpdk-dev] [RFC v2] Flow classification library X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 23 May 2017 16:42:13 -0000 On 5/23/2017 2:30 PM, Adrien Mazarguil wrote: > On Tue, May 23, 2017 at 01:58:44PM +0100, Ferruh Yigit wrote: >> On 5/23/2017 1:26 PM, Adrien Mazarguil wrote: >>> On Mon, May 22, 2017 at 02:53:28PM +0100, Ferruh Yigit wrote: >>>> On 5/19/2017 5:30 PM, Iremonger, Bernard wrote: >>>>> Hi Ferruh, >>>>> >>>>>> -----Original Message----- >>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ferruh Yigit >>>>>> Sent: Thursday, May 18, 2017 7:12 PM >>>>>> To: dev@dpdk.org >>>>>> Cc: Yigit, Ferruh ; Mcnamara, John >>>>>> ; Tahhan, Maryam >>>>>> >>>>>> Subject: [dpdk-dev] [RFC v2] Flow classification library >>>>>> >>>>>> DPDK works with packets, but some network administration tools works >>>>>> based on flow information. >>>>>> >>>>>> This library is suggested to provide helper API to convert packet based >>>>>> information to the flow records. >>>>>> >>>>>> Basically the library consist of a single API that gets packets, flow definition >>>>>> and action as parameter and provides flow stats based on action. Application >>>>>> should call the API for all received packets. >>>>>> >>>>>> Library header file has more comments on how library works and provided >>>>>> APIs. >>>>>> >>>>>> Packets to flow conversion will cause performance drop, that is why >>>>>> conversion done on demand by an API call provided by this library. >>>>>> >>>>>> Initial implementation in mind is to provide support for IPFIX metering >>>>>> process but library planned to be as generic as possible. And flow information >>>>>> provided by this library is missing to implement full IPFIX features, but this is >>>>>> planned to be initial step. >>>>>> >>>>>> Flows are defined using rte_flow, also measurements (actions) are provided >>>>>> by rte_flow. To support more IPFIX measurements, the implementation may >>>>>> require extending rte_flow addition to implementing this library. >>>>> >>>>> Do you know what extensions are needed to the rte_flow code? >>>> >>>> The extension may be required on two fields: >>>> 1- Defining the flow >>>> 2- Available actions >>>> >>>> For defining the flow, an update may not be required, specially at first >>>> version of the library. >>>> >>>> But for action, there may be some updates. >>>> >>>> IPFIX RFC defines Metering process as [1], (in [2]). This library should >>>> provide helper APIs to metering process. >>>> >>>> Currently only action can be used in rte_flow is COUNT, more actions can >>>> be added to help "packet header capturing, timestamping, sampling, >>>> classifying" tasks of the metering process. >>>> >>>> The exact list depends on the what will be implemented in this release. >>>> >>>> >>>> [1] >>>> Metering Process >>>> >>>> The Metering Process generates Flow Records. Inputs to the >>>> process are packet headers, characteristics, and Packet Treatment >>>> observed at one or more Observation Points. >>>> >>>> The Metering Process consists of a set of functions that includes >>>> packet header capturing, timestamping, sampling, classifying, and >>>> maintaining Flow Records. >>>> >>>> The maintenance of Flow Records may include creating new records, >>>> updating existing ones, computing Flow statistics, deriving >>>> further Flow properties, detecting Flow expiration, passing Flow >>>> Records to the Exporting Process, and deleting Flow Records. >>>> >>>> [2] >>>> https://tools.ietf.org/html/rfc7011 >>> >>> Since I did not take this into account in my previous answer [3], I now >>> understand several of these requirements cannot be met by hardware (at least >>> in the near future). Therefore I think it makes sense to leave IPFIX and >>> more generally the maintenance of software data associated with flows to >>> separate libraries, instead of adding many new rte_flow actions that can >>> only be implemented in software. >>> >>> A hybrid solution as described in [3] is still needed regardless to offload >>> flow recognition, so that only flows of interest are reprocessed in software >>> to compute IPFIX and other data. >>> >>> You suggested at one point to take flow rules in addition to mbufs as input >>> to handle that. Well, that's actually a nice approach. >>> >>> For this to work, rte_flow_classify would have to use opaque handles like >>> rte_flow, provided back by the application when attempting to classify >>> traffic. If the handle is not known (e.g. MARK is unsupported), a separate >>> API function could take a mbuf as input and spit the related >>> rte_flow_classify object if any. >>> >>> To be clear: >>> >>> 1. Create classifier object: >>> >>> classify = rte_flow_classify_create([some rte_flow pattern], >>> [classify-specific actions list, associated resources]); >>> >>> 2. Create some flow rule with a MARK action to identify it uniquely. This >>> step might fail and flow can be NULL, that's not an issue: >>> >>> flow = rte_flow_create([the same pattern], MARK with id 42) >>> >>> 3. For each received packet: >>> >>> /* >>> * Attempt HW and fall back on SW for flow identification in order to >>> * update classifier flow-related data. >>> */ >>> if (flow) { >>> if (mbuf->ol_flags & PKT_RX_FDIR_ID && mbuf->hash.fdir.hi == 42) >>> tmp_classify = classify; >>> } else { >>> tmp_classify = rte_flow_classify_lookup([classifier candidates], mbuf); >>> } >>> if (tmp_classify) >>> rte_flow_classify_update(tmp_classify, mbuf); >>> >>> 4. At some point, retrieve computed data from the classifier object itself: >>> >>> rte_flow_classify_stats_get(classify, [output buffer(s)]) >>> >>> On the RX path, the MARK action can be enough to implement the above. When >>> not supported, it could also be emulated through the "sw_fallback" bit >>> described in [3] however if the above approach is fine, no need for that. >>> >>> It's a bit more complicated to benefit from rte_flow on the TX path since no >>> MARK data can be returned. There is currently no other solution than doing >>> it all in software anyway. >>> >>> [3] http://dpdk.org/ml/archives/dev/2017-May/066177.html >>> >> >> Using MARK action is good idea indeed, but I believe the software >> version needs to be implemented anyway, so the MARK action can be next >> step optimization. > > Does it mean you are you fine with the separation of RFCv2's > rte_flow_classify_stats_get() into a sort of rte_flow_classify_lookup() and > rte_flow_classify_update() as described? No indeed, I was planning to postpone that step and keep continue with single rte_flow_classify_stats_get() > > There is otherwise no change needed to benefit from MARK optimization > beside providing the ability to not perform the lookup step when not > necessary. > >> And another thing is, having 3. and 4. as separate steps means >> rte_flow_classify to maintain the flow_records in the library for an >> unknown time. > > I think it should be an application's problem. The library does not > necessarily allocate anything since the related temporary storage can be > provided as arguments to rte_flow_classify_create(). > > Expiration could be handled by making rte_flow_classify_lookup() return some > error code when the flow has expired. Application would then be free to call > rte_flow_classify_destroy() directly or retrieve stats one last time through > rte_flow_classify_stats_get() before that. > >> Application needs to explicitly do the call for 3., why not return the >> output buffers immediately so the application do the required >> association of data to the ports. > > 3. is typically done in the data path and must be as fast as possible, > right? The way I see it, in many cases if the flow is already identified > (classifier object provided) the mbuf does not even need to be parsed for > associated counters to be incremented. > > Also because rte_flow_classify_stats_get() could perform extra computation > to convert internal data to an application-friendly format. > >> Mainly, I would like to keep same API logic in RFC v2, but agree to add >> two new APIs: >> rte_flow_classify_create() >> rte_flow_classify_destroy() >> >> Create will return and opaque rte_flow object (as done in rte_flow), >> meanwhile it can create some masks in rte_flow object to help parsing mbuf. > > OK, I suggest naming that object "struct rte_flow_classify" to avoid > confusion though. OK, makes sense. > >> And rte_flow_classify_stats_get() will get rte_flow object and stats as >> parameter. Also instead of getting an array of filters, to simplify the >> logic, it will get a rte_flow object at a time. > > Perfect. > >> I am planning to send an updated RFC (v3) as above, if there is no major >> objection, we can continue to discuss on that RFC. > > Except for my above comments, looks like we're converging. >