From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id DD988A055A; Thu, 27 Feb 2020 09:54:28 +0100 (CET) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 414791BFA9; Thu, 27 Feb 2020 09:54:28 +0100 (CET) Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by dpdk.org (Postfix) with ESMTP id 2F2C737AF for ; Thu, 27 Feb 2020 09:54:26 +0100 (CET) X-Amp-Result: UNKNOWN X-Amp-Original-Verdict: FILE UNKNOWN X-Amp-File-Uploaded: False Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 27 Feb 2020 00:54:25 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.70,491,1574150400"; d="scan'208";a="256651492" Received: from hyperscan.sh.intel.com (HELO hyperscan) ([10.67.104.81]) by orsmga002.jf.intel.com with ESMTP; 27 Feb 2020 00:54:16 -0800 Date: Thu, 27 Feb 2020 17:16:43 +0800 From: Wang Xiang To: Ori Kam Cc: Jerin Jacob , Jerin Jacob , dpdk-dev , Pavan Nikhilesh , Shahaf Shuler , Hemant Agrawal , Opher Reviv , Alex Rosenbaum , "dovrat@marvell.com" , Prasun Kapoor , Nipun Gupta , "Richardson, Bruce" , "yang.a.hong@intel.com" , "harry.chang@intel.com" , "gu.jian1@zte.com.cn" , "shanjiangh@chinatelecom.cn" , "zhangy.yun@chinatelecom.cn" , "lixingfu@huachentel.com" , "wushuai@inspur.com" , "yuyingxia@yxlink.com" , "fanchenggang@sunyainfo.com" , "davidfgao@tencent.com" , "liuzhong1@chinaunicom.cn" , "zhaoyong11@huawei.com" , "oc@yunify.com" , "jim@netgate.com" , "hongjun.ni@intel.com" , "j.bromhead@titan-ic.com" , "deri@ntop.org" , "fc@napatech.com" , "arthur.su@lionic.com" , Thomas Monjalon Message-ID: <20200227091643.GA11131@hyperscan> References: <20200226090318.GA85327@hs1> <20200227092539.GA2188@hs1> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) Subject: Re: [dpdk-dev] [PATCH v3] regexdev: introduce regexdev subsystem X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Hi Ori, Look forward to your v4:). Thanks, Xiang On Thu, Feb 27, 2020 at 07:31:48AM +0000, Ori Kam wrote: > Hi Xiang, > > > -----Original Message----- > > From: Wang Xiang > > Sent: Thursday, February 27, 2020 11:26 AM > > To: Ori Kam > > Cc: Jerin Jacob ; Jerin Jacob ; > > dpdk-dev ; Pavan Nikhilesh ; > > Shahaf Shuler ; Hemant Agrawal > > ; Opher Reviv ; Alex > > Rosenbaum ; dovrat@marvell.com; Prasun Kapoor > > ; Nipun Gupta ; Richardson, > > Bruce ; yang.a.hong@intel.com; > > harry.chang@intel.com; gu.jian1@zte.com.cn; shanjiangh@chinatelecom.cn; > > zhangy.yun@chinatelecom.cn; lixingfu@huachentel.com; wushuai@inspur.com; > > yuyingxia@yxlink.com; fanchenggang@sunyainfo.com; > > davidfgao@tencent.com; liuzhong1@chinaunicom.cn; > > zhaoyong11@huawei.com; oc@yunify.com; jim@netgate.com; > > hongjun.ni@intel.com; j.bromhead@titan-ic.com; deri@ntop.org; > > fc@napatech.com; arthur.su@lionic.com; Thomas Monjalon > > > > Subject: Re: [dpdk-dev] [PATCH v3] regexdev: introduce regexdev subsystem > > > > Hi Ori, > > > > Thanks for the comments. > > > > Hyperscan supports both start_offset and end_offset modes with most > > users choosing end_offset for rule coverage and performance reasons. > > I'm OK to have the default behavior with start_offset and len. > > It'll be good to change RTE_REGEX_DEV_CFG_MATCH_AS_START to > > RTE_REGEX_DEV_CFG_MATCH_AS_END. For users who need only end_offset, > > they have to set RTE_REGEX_DEV_CFG_MATCH_AS_END bit. We may also > > remove > > RTE_REGEX_DEV_SUPP_MATCH_AS_START if you like. > > Since you say that Hyperscan can support both modes, > What about changing the cap field to RTE_REGEX_DEV_SUPP_MATCH_AS_END? > Ack. Looks good to me. > > > > One question is related to the consistency of start_offset definition > > among different solutions, does all solutions return the leftmost > > start_offset, i.e. for rule: foo.*bar and input: foofoobar, the returned > > start_offset will be 0 not 3? > > > Yes, you are correct, > In Mellanox case there will be only one result: start_offset 0 > > > Thanks, > > Xiang > > > > On Wed, Feb 26, 2020 at 08:36:51AM +0000, Ori Kam wrote: > > > Hi Xiang, > > > > > > > > > > -----Original Message----- > > > > From: dev On Behalf Of Wang Xiang > > > > Sent: Wednesday, February 26, 2020 11:03 AM > > > > To: Ori Kam > > > > Cc: Jerin Jacob ; Jerin Jacob > > ; > > > > dpdk-dev ; Pavan Nikhilesh ; > > > > Shahaf Shuler ; Hemant Agrawal > > > > ; Opher Reviv ; Alex > > > > Rosenbaum ; dovrat@marvell.com; Prasun Kapoor > > > > ; Nipun Gupta ; > > Richardson, > > > > Bruce ; yang.a.hong@intel.com; > > > > harry.chang@intel.com; gu.jian1@zte.com.cn; > > shanjiangh@chinatelecom.cn; > > > > zhangy.yun@chinatelecom.cn; lixingfu@huachentel.com; > > wushuai@inspur.com; > > > > yuyingxia@yxlink.com; fanchenggang@sunyainfo.com; > > > > davidfgao@tencent.com; liuzhong1@chinaunicom.cn; > > > > zhaoyong11@huawei.com; oc@yunify.com; jim@netgate.com; > > > > hongjun.ni@intel.com; j.bromhead@titan-ic.com; deri@ntop.org; > > > > fc@napatech.com; arthur.su@lionic.com; Thomas Monjalon > > > > > > > > Subject: Re: [dpdk-dev] [PATCH v3] regexdev: introduce regexdev subsystem > > > > > > > > Hi Ori and Jerin, > > > > > > > > One comment regarding my concern with len and end_offset problem. > > > > From open source SW regex library(libpcre, re2 and Hyperscan) and > > > > Intel's perspective, the matching results returned are always start > > > > offset and end offset. More importantly, Hyperscan only reports end offset > > > > most of the time. > > > > > > > > It'll be good to keep this union as an abstraction and enforce the default > > > > behavior for each solution, i.e. HW solutions doesn't support > > MATCH_AS_START > > > > flag at rule compile time. Applications will know the meaning of variable at > > > > rule compile time with the flag so they don't have to do extra check at fast > > path > > > > run-time matching. > > > > Welcome for better abstraction ideas. > > > > > > > > > > I don't mind to keep the union as it was in V3, but I would like to remove the > > > configuration bit (RTE_REGEX_DEV_CFG_MATCH_AS_START). > > > Meaning that if the device reports > > RTE_REGEX_DEV_SUPP_MATCH_AS_START > > > the result will always be with start_offset and len. > > > > > > Best, > > > Ori > > > > > > > Thanks, > > > > Xiang > > > > > > > > > > > > + /**< Starting Byte Position for matched rule. */ > > > > > > > > + RTE_STD_C11 > > > > > > > > + union { > > > > > > > > + uint16_t len; > > > > > > > > + /**< Length of match in bytes */ > > > > > > > > + uint16_t end_offset; > > > > > > > > + /**< The end offset of the match. In case > > > > > > > > + * MATCH_AS_START configuration is disabled. > > > > > > > > + * @see > > RTE_REGEX_DEV_CFG_MATCH_AS_START > > > > > > > > + */ > > > > > > > > > > > > > > We have not concluded on this scheme. Have one field which has > > > > > > > different meaning will be difficult > > > > > > > for application. i.e fast path we need to have a check for this. > > > > > > > > > > > > > > > > > > > This is the time to conclude . at least for the first version. > > > > > > Why do we have one field with different meaning? > > > > > > The result can be ether len or end_offset. > > > > > > > > > > > > > I think, Based on the majority of HW/SW implementation, we need to > > > > > > > either go with len or > > > > > > > end_offset. What Mellanox HW returns? len or end_offset? > > > > > > > > > > > > > > > > > > > From Mellanox perspective we prefer the len approach. We also think > > > > > > it is much more user oriented. > > > > > > > > > > > > > or We can keep it as len or end_offset based on which drivers > > upstream > > > > > first, > > > > > > > other drivers when it comes, we can see how to abstract it? > > > > > > > > > > > > > > > > > > > I can except that assuming we choose the start and len approach > > > > > > > > > > I think, we can have first version with "start and len" by removing > > > > > RTE_REGEX_DEV_CFG_MATCH_AS_START. > > > > > When can think, how to abstract new drivers when it upstream based on > > > > > the overhead. > > > > > > > > > > > > > > > > > On Tue, Feb 25, 2020 at 07:48:54AM +0000, Ori Kam wrote: > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > From: Jerin Jacob > > > > > > Sent: Tuesday, February 25, 2020 7:57 AM > > > > > > To: Ori Kam > > > > > > Cc: Jerin Jacob ; xiang.w.wang@intel.com; dpdk- > > dev > > > > > > ; Pavan Nikhilesh ; > > Shahaf > > > > > > Shuler ; Hemant Agrawal > > > > > > ; Opher Reviv ; > > Alex > > > > > > Rosenbaum ; dovrat@marvell.com; Prasun > > Kapoor > > > > > > ; Nipun Gupta ; > > > > Richardson, > > > > > > Bruce ; yang.a.hong@intel.com; > > > > > > harry.chang@intel.com; gu.jian1@zte.com.cn; > > > > shanjiangh@chinatelecom.cn; > > > > > > zhangy.yun@chinatelecom.cn; lixingfu@huachentel.com; > > > > wushuai@inspur.com; > > > > > > yuyingxia@yxlink.com; fanchenggang@sunyainfo.com; > > > > > > davidfgao@tencent.com; liuzhong1@chinaunicom.cn; > > > > > > zhaoyong11@huawei.com; oc@yunify.com; jim@netgate.com; > > > > > > hongjun.ni@intel.com; j.bromhead@titan-ic.com; deri@ntop.org; > > > > > > fc@napatech.com; arthur.su@lionic.com; Thomas Monjalon > > > > > > > > > > > > Subject: Re: [dpdk-dev] [PATCH v3] regexdev: introduce regexdev > > subsystem > > > > > > > > > > > > > > 4) app/test/test_regexdev.c like app/test/test_eventdev.c > > > > > > > > > > > > > > We started to create a super basic app, after the API will be finalized > > and > > > > we > > > > > > will have HW > > > > > > > we can push it. (if you need it faster than feel free) > > > > > > > > > > > > A simple Unit test case needs to be present for the APIs. On the > > > > > > course of developing common code, > > > > > > it can be developed to test the common code with dummy/skeleton > > driver. > > > > > > > > > > > > > > > > Agree this is what we are currently have. > > > > > > > > > > > > > > > > > > > > 5) Need a maintainer for maintaining the regex subsystem > > > > > > > > > > > > > > > We wish to maintain it if you agree. > > > > > > > > > > > > Yes. Please. > > > > > > > > > > > > > > > > Great. > > > > > > > > > > > > > > > > > > > > > > > One more thing, regarding the ops structure, I think it is better to > > split > > > > it > > > > > > to 2 > > > > > > > > different > > > > > > > > > structures one enque and one for dequeue, since there are no real > > > > shared > > > > > > > > data and we will > > > > > > > > > be able to save memory, what do you think? > > > > > > > > > > > > > > > > Ops are allocated from mempool so it will be overhead to manage > > both. > > > > > > > > moreover, some > > > > > > > > of the fields added in req can be used for resp as info. cryptodev > > > > > > > > follows the similar concept, > > > > > > > > I think, we can have symmetry with cryptodev wherever is possible > > to > > > > avoid > > > > > > > > end-user to learn new API models. > > > > > > > > > > > > > > True that there will be overhead with 2 mempools (small one) > > > > > > > but lets assume 255 results. This means that the buffer should be 255 > > * > > > > > > sizeof(rte_regex_match) = 2K > > > > > > > also this will enable us to replace groupX with group[] which will allow > > > > even > > > > > > more groups. > > > > > > > In addition don't think that crypto is a good example. > > > > > > > The main difference is that in RegEx the output is different format > > then > > > > the > > > > > > input. > > > > > > > > > > > > # IMO, Some of the fields may be useful for a response as well. I > > > > > > think application may be interested in following > > > > > > req filed in the response. > > > > > > a) buf_addr > > > > > > > > > > I don't see how this can be used in the response. since if working in out of > > > > order result. > > > > > you don’t know which result will be returned. > > > > > I also think it is error prone to use the same op for the enqueue and > > dequeue. > > > > > > > > > > > b) scan_size > > > > > > > > > > Please see above. > > > > > > > > > > > c) user_id (This would be main one) > > > > > > > > > > Agree > > > > > > > > > > > > > > > > > # Having two mempools adds overhead per lcore L1 cache usage and > > extra > > > > > > complexity to the application. > > > > > > > > > > > > # IMO, From a performance perspective, one mempool is good due to > > less > > > > > > stress on the cache and it is costly to > > > > > > add new mempool for HW mempool implementations. > > > > > > > > > > > > # I think, group[] use case we can add it when it required by > > > > > > introducing "matches_start_offset" field, which will > > > > > > tell the req, where is the end of group[] and where "matches" start > > > > > > with single mempool scheme also. > > > > > > > > > > > > # I think, one of the other use case for "matches_start_offset" that, > > > > > > It may possible to put vendor-specific > > > > > > opaque data. It will be filled by driver on response. The application > > > > > > can reference the matches as > > > > > > > > > > > > struct rte_regex_match *matches = RTE_PTR_ADD(ops, ops- > > > > > > > > > > > >matches_start_offset); > > > > > > > > > > > > > > > > O.K for now we will keep it as is, and we will see what will be in the > > future. > > > > > > > > > > > > > > > > > > > > I assume you will send the v4 with these comments. I think, with v4 > > we > > > > > > > > can start implementing common library code. > > > > > > > > > > > > > > Just need to agree on the split (one more iteration ) > > > > > > > and I will start working on the common code. > > > > > > > > > > > > Ack. > > > > > > > > > > Great, > > > > > I'm starting to work on V4 with all comments so the RFC will be acked and > > > > then will start > > > > > coding the rest of the common code. > > > > >