From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from dpdk.org (dpdk.org [92.243.14.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 5A126A055E;
	Wed, 26 Feb 2020 02:47:30 +0100 (CET)
Received: from [92.243.14.124] (localhost [127.0.0.1])
	by dpdk.org (Postfix) with ESMTP id 3109C1BFAC;
	Wed, 26 Feb 2020 02:47:29 +0100 (CET)
Received: from mga11.intel.com (mga11.intel.com [192.55.52.93])
 by dpdk.org (Postfix) with ESMTP id E3A7E2C02
 for <dev@dpdk.org>; Wed, 26 Feb 2020 02:47:26 +0100 (CET)
X-Amp-Result: UNKNOWN
X-Amp-Original-Verdict: FILE UNKNOWN
X-Amp-File-Uploaded: False
Received: from fmsmga002.fm.intel.com ([10.253.24.26])
 by fmsmga102.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 25 Feb 2020 17:47:25 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.70,486,1574150400"; d="scan'208";a="271548255"
Received: from hs1.sh.intel.com (HELO hs1) ([10.67.119.44])
 by fmsmga002.fm.intel.com with ESMTP; 25 Feb 2020 17:47:18 -0800
Date: Wed, 26 Feb 2020 04:03:18 -0500
From: Wang Xiang <xiang.w.wang@intel.com>
To: Ori Kam <orika@mellanox.com>
Cc: Jerin Jacob <jerinjacobk@gmail.com>, Jerin Jacob <jerinj@marvell.com>,
 dpdk-dev <dev@dpdk.org>, Pavan Nikhilesh <pbhagavatula@marvell.com>,
 Shahaf Shuler <shahafs@mellanox.com>,
 Hemant Agrawal <hemant.agrawal@nxp.com>, Opher Reviv <opher@mellanox.com>,
 Alex Rosenbaum <alexr@mellanox.com>,
 "dovrat@marvell.com" <dovrat@marvell.com>,
 Prasun Kapoor <pkapoor@marvell.com>, Nipun Gupta <nipun.gupta@nxp.com>,
 "Richardson, Bruce" <bruce.richardson@intel.com>,
 "yang.a.hong@intel.com" <yang.a.hong@intel.com>,
 "harry.chang@intel.com" <harry.chang@intel.com>,
 "gu.jian1@zte.com.cn" <gu.jian1@zte.com.cn>,
 "shanjiangh@chinatelecom.cn" <shanjiangh@chinatelecom.cn>,
 "zhangy.yun@chinatelecom.cn" <zhangy.yun@chinatelecom.cn>,
 "lixingfu@huachentel.com" <lixingfu@huachentel.com>,
 "wushuai@inspur.com" <wushuai@inspur.com>,
 "yuyingxia@yxlink.com" <yuyingxia@yxlink.com>,
 "fanchenggang@sunyainfo.com" <fanchenggang@sunyainfo.com>,
 "davidfgao@tencent.com" <davidfgao@tencent.com>,
 "liuzhong1@chinaunicom.cn" <liuzhong1@chinaunicom.cn>,
 "zhaoyong11@huawei.com" <zhaoyong11@huawei.com>,
 "oc@yunify.com" <oc@yunify.com>, "jim@netgate.com" <jim@netgate.com>,
 "hongjun.ni@intel.com" <hongjun.ni@intel.com>,
 "j.bromhead@titan-ic.com" <j.bromhead@titan-ic.com>,
 "deri@ntop.org" <deri@ntop.org>, "fc@napatech.com" <fc@napatech.com>,
 "arthur.su@lionic.com" <arthur.su@lionic.com>,
 Thomas Monjalon <thomas@monjalon.net>
Message-ID: <20200226090318.GA85327@hs1>
References: <20190627155036.56940-1-jerinj@marvell.com>
 <1580202029-37096-1-git-send-email-orika@mellanox.com>
 <CALBAE1O823JWYtwNPUkLZb=QOxxWTP3ipsUdikaMQ4udgR9mrA@mail.gmail.com>
 <AM6PR05MB5176AC40CD7F5A269F8A4F99DBEF0@AM6PR05MB5176.eurprd05.prod.outlook.com>
 <CALBAE1M_XKjs7W93+VLjiXJm_v5zamoWfnn9aSjG=ArU9LVOPA@mail.gmail.com>
 <AM0PR05MB51728A2770B60D32AFC8AE3DDBEF0@AM0PR05MB5172.eurprd05.prod.outlook.com>
 <CALBAE1NzYnWUXXBmnjFd1=rcV+Rsb4q_koirgEWmC9zA4B=x8A@mail.gmail.com>
 <AM6PR05MB5176621E6EEFB45C88CB4AD3DBED0@AM6PR05MB5176.eurprd05.prod.outlook.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <AM6PR05MB5176621E6EEFB45C88CB4AD3DBED0@AM6PR05MB5176.eurprd05.prod.outlook.com>
User-Agent: Mutt/1.9.4 (2018-02-28)
Subject: Re: [dpdk-dev] [PATCH v3] regexdev: introduce regexdev subsystem
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

Hi Ori and Jerin,

One comment regarding my concern with len and end_offset problem.
>From open source SW regex library(libpcre, re2 and Hyperscan) and 
Intel's perspective, the matching results returned are always start
offset and end offset. More importantly, Hyperscan only reports end offset
most of the time.

It'll be good to keep this union as an abstraction and enforce the default
behavior for each solution, i.e. HW solutions doesn't support MATCH_AS_START
flag at rule compile time. Applications will know the meaning of variable at
rule compile time with the flag so they don't have to do extra check at fast path
run-time matching.
Welcome for better abstraction ideas.

Thanks,
Xiang

> > > > +                       /**< Starting Byte Position for matched rule. */
> > > > +                       RTE_STD_C11
> > > > +                       union {
> > > > +                               uint16_t len;
> > > > +                               /**< Length of match in bytes */
> > > > +                               uint16_t end_offset;
> > > > +                               /**< The end offset of the match. In case
> > > > +                                * MATCH_AS_START configuration is disabled.
> > > > +                                * @see RTE_REGEX_DEV_CFG_MATCH_AS_START
> > > > +                                */
> > >
> > > We have not concluded on this scheme. Have one field which has
> > > different meaning will be difficult
> > > for application. i.e fast path we need to have a check for this.
> > >
> >
> > This is the time to conclude . at least for the first version.
> > Why do we have one field with different meaning?
> > The result can be ether len or end_offset.
> >
> > > I think, Based on the majority of HW/SW implementation, we need to
> > > either go with len or
> > > end_offset. What Mellanox HW returns? len or end_offset?
> > >
> >
> > From Mellanox perspective we prefer the len approach. We also think
> > it is much more user oriented.
> >
> > > or We can keep it as len or end_offset based on which drivers upstream
> first,
> > > other drivers when it comes, we can see how to abstract it?
> > >
> >
> > I can except that assuming we choose the start and len approach
>
> I think, we can have first version with "start and len" by removing
> RTE_REGEX_DEV_CFG_MATCH_AS_START.
> When can think, how to abstract new drivers when it upstream based on
> the overhead.
>


On Tue, Feb 25, 2020 at 07:48:54AM +0000, Ori Kam wrote:
> 
> 
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Tuesday, February 25, 2020 7:57 AM
> > To: Ori Kam <orika@mellanox.com>
> > Cc: Jerin Jacob <jerinj@marvell.com>; xiang.w.wang@intel.com; dpdk-dev
> > <dev@dpdk.org>; Pavan Nikhilesh <pbhagavatula@marvell.com>; Shahaf
> > Shuler <shahafs@mellanox.com>; Hemant Agrawal
> > <hemant.agrawal@nxp.com>; Opher Reviv <opher@mellanox.com>; Alex
> > Rosenbaum <alexr@mellanox.com>; dovrat@marvell.com; Prasun Kapoor
> > <pkapoor@marvell.com>; Nipun Gupta <nipun.gupta@nxp.com>; Richardson,
> > Bruce <bruce.richardson@intel.com>; yang.a.hong@intel.com;
> > harry.chang@intel.com; gu.jian1@zte.com.cn; shanjiangh@chinatelecom.cn;
> > zhangy.yun@chinatelecom.cn; lixingfu@huachentel.com; wushuai@inspur.com;
> > yuyingxia@yxlink.com; fanchenggang@sunyainfo.com;
> > davidfgao@tencent.com; liuzhong1@chinaunicom.cn;
> > zhaoyong11@huawei.com; oc@yunify.com; jim@netgate.com;
> > hongjun.ni@intel.com; j.bromhead@titan-ic.com; deri@ntop.org;
> > fc@napatech.com; arthur.su@lionic.com; Thomas Monjalon
> > <thomas@monjalon.net>
> > Subject: Re: [dpdk-dev] [PATCH v3] regexdev: introduce regexdev subsystem
> > 
> > > > 4) app/test/test_regexdev.c like app/test/test_eventdev.c
> > >
> > > We started to create a super basic app, after the API will be finalized and we
> > will have HW
> > > we can push it. (if you need it faster than feel free)
> > 
> > A simple Unit test case needs to be present for the APIs. On the
> > course of developing common code,
> > it can be developed to test the common code with dummy/skeleton driver.
> > 
> 
> Agree this is what we are currently have.
> 
> > >
> > > > 5) Need a maintainer for maintaining the regex subsystem
> > > >
> > > We wish to maintain it if you agree.
> > 
> > Yes. Please.
> > 
> 
> Great.
> 
> > > > >
> > > > > One more thing, regarding the ops structure, I think it is better to split it
> > to 2
> > > > different
> > > > > structures one enque and one for dequeue, since there are no real shared
> > > > data and we will
> > > > > be able to save memory, what do you think?
> > > >
> > > > Ops are allocated from mempool so it will be overhead to manage both.
> > > > moreover, some
> > > > of the fields added in req can be used for resp as info. cryptodev
> > > > follows the similar concept,
> > > > I think, we can have symmetry with cryptodev wherever is possible to avoid
> > > > end-user to learn new API models.
> > >
> > > True that there will be overhead with 2 mempools (small one)
> > > but lets assume 255 results. This means that the buffer should be 255 *
> > sizeof(rte_regex_match) = 2K
> > > also this will enable us to replace groupX with group[] which will allow even
> > more groups.
> > > In addition don't think that crypto is a good example.
> > > The main difference is that in RegEx the output is different format then the
> > input.
> > 
> > # IMO, Some of the fields may be useful for a response as well. I
> > think application may be interested in following
> > req filed in the response.
> > a) buf_addr
> 
> I don't see how this can be used in the response. since if working in out of order result.
> you don’t know which result will be returned. 
> I also think it is error prone to use the same op for the enqueue and dequeue.
> 
> > b) scan_size
> 
> Please see above.
> 
> > c) user_id (This would be main one)
> 
> Agree
> 
> > 
> > # Having two mempools adds overhead per lcore L1 cache usage and extra
> > complexity to the application.
> > 
> > # IMO, From a performance perspective, one mempool is good due to less
> > stress on the cache and it is costly to
> > add new mempool for HW mempool implementations.
> > 
> > # I think, group[] use case we can add it when it required by
> > introducing "matches_start_offset" field, which will
> > tell the req, where is the end of group[] and where "matches" start
> > with single mempool scheme also.
> > 
> > # I think, one of the other use case for "matches_start_offset" that,
> > It may possible to put vendor-specific
> > opaque data. It will be filled by driver on response. The application
> > can reference the matches as
> > 
> > struct rte_regex_match *matches = RTE_PTR_ADD(ops, ops-
> > 
> >matches_start_offset);
> > 
> 
> O.K for now we will keep  it as is, and we will see what will be in the future.
> 
> > >
> > > > I assume you will send the v4 with these comments. I think, with v4 we
> > > > can start implementing common library code.
> > >
> > > Just need to agree on the split (one more iteration )
> > > and I will start working on the common code.
> > 
> > Ack.
> 
> Great,
> I'm starting to work on V4 with all comments so the RFC will be acked and then will start 
> coding the rest of the common code.
>