From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ori Kam
To: Wang Xiang, Jerin Jacob Kollanukkaran
CC: Thomas Monjalon, "dev@dpdk.org", Pavan Nikhilesh Bhagavatula, Shahaf Shuler, Hemant Agrawal, Opher Reviv, Alex Rosenbaum, Dovrat Zifroni, Prasun Kapoor, Nipun Gupta, "Richardson, Bruce", "Hong, Yang A", "Chang, Harry", "gu.jian1@zte.com.cn", "shanjiangh@chinatelecom.cn", "zhangy.yun@chinatelecom.cn", "lixingfu@huachentel.com", "wushuai@inspur.com", "yuyingxia@yxlink.com", "fanchenggang@sunyainfo.com", "davidfgao@tencent.com", "liuzhong1@chinaunicom.cn", "zhaoyong11@huawei.com", "oc@yunify.com", "jim@netgate.com", "Ni, Hongjun", "j.bromhead@titan-ic.com", "deri@ntop.org", "fc@napatech.com", "arthur.su@lionic.com", Guy Kaneti, Smadar Fuks, Liron Himi, "edwin.verplanke@intel.com", "keith.wiles@intel.com"
Thread-Topic: [dpdk-dev] [RFC PATCH v1] regexdev: introduce regexdev subsystem
Date: Sun, 26 Jan 2020 11:55:02 +0000
References: <20190627155036.56940-1-jerinj@marvell.com> <8285913.8xKIzI91KM@xps> <1922242.dABWq9CbNQ@xps> <20190919135857.GA82263@hs1> <20191014135924.GA50406@hs1>
In-Reply-To: <20191014135924.GA50406@hs1>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding:
quoted-printable
MIME-Version: 1.0
Subject: Re: [dpdk-dev] [RFC PATCH v1] regexdev: introduce regexdev subsystem
List-Id: DPDK patches and discussions
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

Hi Jerin and Xiang,

PSB

> -----Original Message-----
> From: dev On Behalf Of Wang Xiang
> Sent: Monday, October 14, 2019 4:59 PM
> To: Jerin Jacob Kollanukkaran
> Cc: Thomas Monjalon; dev@dpdk.org; Pavan Nikhilesh Bhagavatula;
> Shahaf Shuler; Hemant Agrawal; Opher Reviv; Alex Rosenbaum;
> Dovrat Zifroni; Prasun Kapoor; Nipun Gupta; Richardson, Bruce;
> Hong, Yang A; Chang, Harry; gu.jian1@zte.com.cn;
> shanjiangh@chinatelecom.cn; zhangy.yun@chinatelecom.cn;
> lixingfu@huachentel.com; wushuai@inspur.com; yuyingxia@yxlink.com;
> fanchenggang@sunyainfo.com; davidfgao@tencent.com;
> liuzhong1@chinaunicom.cn; zhaoyong11@huawei.com; oc@yunify.com;
> jim@netgate.com; Ni, Hongjun; j.bromhead@titan-ic.com; deri@ntop.org;
> fc@napatech.com; arthur.su@lionic.com; Guy Kaneti; Smadar Fuks;
> Liron Himi; edwin.verplanke@intel.com; keith.wiles@intel.com
> Subject: Re: [dpdk-dev] [RFC PATCH v1] regexdev: introduce regexdev
> subsystem
>
> On Fri, Sep 27, 2019 at 02:35:00PM +0000, Jerin Jacob Kollanukkaran wrote:
> > > -----Original Message-----
> > > From: Wang Xiang
> > >
> > > Hi Jerin,
> > >
> > > Thanks for your response. More comments below and inline.
> > >
> > > 1) I think the size of some variables (e.g. nb_matches, scan_size, matching
> > > offset, etc) should be increased based on what Hyperscan supports.
> > >
> > > a) struct rte_regex_ops:
> > >
> > > uint16_t scan_size => uint32_t scan_size
> >
> > I think, packet buffers will not be > 64K and getting more than contiguous
> > 64K DMAable memory will be difficult in DPDK.
> > Other than that, rte_regex_match is 64bit now, increasing width of
> > Len could increase the size of "rte_regex_match". i.e Need more
> > Bandwidth for response.
> > Could other HW implementations share the views on max length
> > is supported on their implementation? Based on that we can decide.
> >
> OK, let's gather ideas from HW implementation.

I agree that 16 bits for the buffer length is good, and that the size of
rte_regex_match should stay 64 bits, in order to have better performance
(PCI bandwidth and caching).

> > > uint8_t nb_actual_matches => uint64 nb_actual_matches
> > > uint8_t nb_matches => uint64 nb_matches
> >
> > 2^64 matches will be never possible in practical system. How about 2^16.
> >
> I think the number of matches depends on the number of total rules and
> scan size. Based on the definitions (16-bit nb_rules_per_group,
> 16-bit nb_groups and 16-bit scan size), the maximum possible matches
> could exceed 2^16. Users may get partial matches in this case while
> Hyperscan doesn't make compromises. It'll also be good to check other HW
> implementation.

I think that we can increase the number of matches to 16 bits. But in any
case our HW can't support working on more than 4 groups in a single search,
and we can't support buffers larger than 2^16. Even if we could, saving
that many results in HW is not practical.
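To make the sizing points above concrete, here is a minimal sketch. The
struct layouts and the names sketch_match16, sketch_match32 and
sketch_response_bytes() are assumptions made up for illustration; they are
not the RFC's actual rte_regex_match definition. Under these assumptions,
16-bit offset/len keep the descriptor at 8 bytes, while 32-bit fields push
it to 12 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only (not the RFC's real struct). */
struct sketch_match16 {
	uint16_t rule_id;   /* assumed field */
	uint16_t group_id;  /* assumed field */
	uint16_t offset;    /* start of the match within the buffer */
	uint16_t len;       /* length of the match in bytes */
};

/* Same hypothetical descriptor with offset and len widened to 32 bits. */
struct sketch_match32 {
	uint16_t rule_id;
	uint16_t group_id;
	uint32_t offset;
	uint32_t len;
};

/* Bytes of match results written back for one scan. */
static inline uint64_t
sketch_response_bytes(uint32_t nb_matches, uint32_t desc_size)
{
	return (uint64_t)nb_matches * desc_size;
}

/* The 16-bit variant fits one 64-bit word; the 32-bit variant does not. */
static_assert(sizeof(struct sketch_match16) == 8, "fits in 64 bits");
static_assert(sizeof(struct sketch_match32) == 12, "exceeds 64 bits");
```

With 2^16 - 1 matches at 8 bytes each, sketch_response_bytes() comes to
roughly 512 KB of results for a single scan, which gives a feel for why
storing that many results in HW is hard to justify.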
> > >
> > > b) struct rte_regex_match:
> > > uint16_t offset => uint32_t offset
> > > uint16_t len => uint32_t len
> >
> > See above.
> >
> > >
> > > c) uint16_t
> > > rte_regex_rule_db_update(uint8_t dev_id, const struct rte_regex_rule *rules,
> > >                          uint16_t nb_rules);
> > > =>
> > > uint32_t
> > > rte_regex_rule_db_update(uint8_t dev_id, const struct rte_regex_rule *rules,
> > >                          uint32_t nb_rules);
> >
> > OK. I will change it next version.
> >
> > >
> > > d) int
> > > rte_regex_queue_pair_setup(uint8_t dev_id, uint8_t queue_pair_id,
> > >                            const struct rte_regex_qp_conf *qp_conf);
> > > =>
> > > int
> > > rte_regex_queue_pair_setup(uint8_t dev_id, uint16_t queue_pair_id,
> > >                            const struct rte_regex_qp_conf *qp_conf);
> >
> > OK. I will change it next version.
> >
> > >
> > > e) struct rte_regex_dev_config:
> > > uint8_t nb_max_matches => uint64_t nb_max_matches
> >
> > 2^64 matches will be never possible in practical system. How about 2^16.
> >
> See above.

See above.

> > >
> > > f) struct rte_regex_dev_info:
> > > uint8_t max_matches => uint64_t max_matches
> >
> > 2^64 matches will be never possible in practical system. How about 2^16.
> >
> See above.

See above.

> > >
> > > 2) There are rte_regex_dev_attr_get() and rte_regex_dev_attr_set() defined.
> > > Are all the attributes below could be set by users? Is any of them read-only?
> >
> > See below,
> >
> > > /** Enumerates RegEx device attribute identifier */ enum
> > > rte_regex_dev_attr_id {
> > > RTE_REGEX_DEV_ATTR_SOCKET_ID,
> > > /**< The NUMA socket id to which the device is connected or
> > > * a default of zero if the socket could not be determined.
> > > * datatype: *int*
> > > * operation: *get*
> >
> > *get* means read only. *get* and *set* means it support both operation
> >
> > > */
> > > RTE_REGEX_DEV_ATTR_MAX_MATCHES,
> > > /**< Maximum number of matches per scan.
> > > * datatype: *uint8_t*
> > > * operation: *get* and *set*
> > > *
> > > * @see RTE_REGEX_OPS_RSP_MAX_MATCH_F
> > > */
> > > RTE_REGEX_DEV_ATTR_MAX_SCAN_TIMEOUT,
> > > /**< Upper bound scan time in ns.
> > > * datatype: *uint16_t*
> > > * operation: *get* and *set*
> > > *
> > > * @see RTE_REGEX_OPS_RSP_MAX_SCAN_TIMEOUT_F
> > > */
> > > RTE_REGEX_DEV_ATTR_MAX_PREFIX,
> > > /**< Maximum number of prefix detected per scan.
> > > * This would be useful for denial of service detection.
> > > * datatype: *uint16_t*
> > > * operation: *get* and *set*
> > > *
> > > * @see RTE_REGEX_OPS_RSP_MAX_PREFIX_F
> > > */
> > > };
> > >
> > > 3) Both RTE_REGEX_PCRE_RULE_* and
> > > RTE_REGEX_DEV_PCRE_UNSUP_* can be viewed as device capabilities. Can we
> > > merge them with RTE_REGEX_DEV_CAPA_RUNTIME_COMPILATION_F and have
> > > a unified regex_dev_capa in struct rte_regex_dev_info.
> >
> > Sure. I will fix it next version.
> >
> > >
> > > 4) It'll be good if we can also define synchronous matching API for users who
> > > want to have a one-off scan and wait for the results.
> >
> > Makes sense. I will add synchronous matching API in next version (I
> > understand, it will be useful for SW implementations). Probably expose
> > as INFO flag to expose it as preference.
> >
> > >
> > > On Tue, Sep 10, 2019 at 08:05:39AM +0000, Jerin Jacob Kollanukkaran wrote:
> > > > Hi Xiang,
> > > >
> > > > Sorry for delay in response (was busy with 19.11 proposal deadline).
> > > > Please see inline.
> > > >
> > > > >
> > > > > Reply to Xiang's queries in main thread:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > Some questions regarding APIs. Could you please give more insights?
> > > > >
> > > > > 1) rte_regex_ops
> > > > > a) rsp_flags
> > > > > These two flags RTE_REGEX_OPS_RSP_PMI_SOJ_F and
> > > > > RTE_REGEX_OPS_RSP_PMI_EOJ_F are used for cross buffer scan.
> > > > > RTE_REGEX_OPS_RSP_PMI_EOJ_F tells whether we have a partial
> > > > > match at the end of current buffer after scan.
> > > > > What's the purpose of having RTE_REGEX_OPS_RSP_PMI_SOJ_F?
> > > > >
> > > > > [Jerin] Since we need three states to represent partial match
> > > > > buffer, RTE_REGEX_OPS_RSP_PMI_SOJ_F to represent start of the
> > > > > buffer, intermediate buffers with no flag, and end of the buffer
> > > > > with RTE_REGEX_OPS_RSP_PMI_EOJ
> > > > >
> > > > > [Xiang] How could a user leverage these flags for matching? Suppose
> > > > > a large buffer is divided into multiple chunks. Will
> > > > > RTE_REGEX_OPS_RSP_PMI_SOJ_F cause an early quit once it isn't set
> > > > > after scan the first chunk. Similarly, RTE_REGEX_OPS_RSP_PMI_EOJ
> > > > > tells a user whether to stop matching future buffers after finish
> > > > > the last chunk?
> > > >
> > > > Let me describe with an example,
> > > >
> > > > Assume,
> > > > 1) struct rte_regex_dev_info:: max_payload_size set to 1024
> > > > 2) rte_regex_dev_config:: dev_cfg_flags configured with
> > > > RTE_REGEX_DEV_CFG_CROSS_BUFFER_SCAN_F
> > > > 3) Device programmed with matching "hello\s+world" pattern
> > > > 4) user enqueue struct rte_regex_ops:: buf_addr point following "data"
> > > > and struct rte_regex_op:: scan_size = 1024
> > > >
> > > > data[0..1021] = data don't have hello world pattern
> > > > data[1022] = 'h'
> > > > data[1023] = 'e'
> > > >
> > > > 5) user enqueue struct rte_regex_ops:: buf_addr point following "data"
> > > > and struct rte_regex_op:: scan_size = 9
> > > >
> > > > data[0] = 'l'
> > > > data[1] = 'l'
> > > > data[2] = 'o'
> > > > data[3] = ' '
> > > > data[4] = 'w'
> > > > data[5] = 'o'
> > > > data[6] = 'r'
> > > > data[7] = 'l'
> > > > data[8] = 'd'
> > > >
> > > > If so,
> > > >
> > > > Response to 4) will be RTE_REGEX_OPS_RSP_PMI_SOJ_F in rte_regex_ops::
> > > > rsp_flags on dequeue Where rte_regex_match:: offset is 1022 and len 2
> > > >
> > > > Response to 5) will be RTE_REGEX_OPS_RSP_PMI_EOJ_F in rte_regex_ops::
> > > > rsp_flags on dequeue Where rte_regex_match:: offset is 0 and len 9
> > > >
> > > If the defined pattern is "hello.*world" instead of "hello\s+world", and we
> > > enqueue following struct rte_regex_ops:
> > >
> > > 1) rte_regex_op:: scan_size = 1024
> > >
> > > data[0..1021] = data don't have hello world pattern
> > > data[1022] = 'h'
> > > data[1023] = 'e'
> > >
> > > 2) rte_regex_op:: scan_size = 9
> > > data[0] = 'l'
> > > data[1] = 'l'
> > > data[2] = 'o'
> > > data[3] = ' '
> > > data[4] = 'w'
> > > data[5] = 'o'
> > > data[6] = 'r'
> > > data[7] = 'l'
> > > data[8] = 'd'
> > >
> > > 3) rte_regex_op:: scan_size = 5
> > > data[0] = 'w'
> > > data[1] = 'o'
> > > data[2] = 'r'
> > > data[3] = 'l'
> > > data[4] = 'd'
> > >
> > > Will response to 3) have RTE_REGEX_OPS_RSP_PMI_EOJ_F in rte_regex_ops::
> > > rsp_flags on dequeue
> > > Where rte_regex_match:: offset is 0 and len 4?
> >
> > Yes.
> >
> > >
> > > I am wondering what's your expected behavior for .* or similar syntax and
> > > if there are syntax compatibility issues. We report all matches in
> > > Hyperscan, e.g. report end match offsets 11 and 16 for pattern
> > > "hello.*world" and corpus "hello worldworld".
> > >
> > > BTW, not sure how other hardware devices handle cross buffer scan.
> > > Hyperscan doesn't report matches for start and intermediate buffers but
> > > only reports end offset if a full match is found.
> > >
> > > >
> > > > >
> > > > > RTE_REGEX_OPS_RSP_MAX_PREFIX_F: This looks like a definition
> > > > > for a specific hardware implementation. I am wondering what this
> > > > > PREFIX refers to:)?
> > > > >
> > > > > [Jerin] Yes. Looks like it is for hardware specific implementation.
> > > > > Introduced rte_regex_dev_attr_set/get functions to make it portable
> > > > > and To add new implementation specific fields.
> > > > > For example, if a rule is
> > > > > /ABCDEF.*XYZ/, ABCD is considered the prefix, and EF.*XYZ is
> > > > > considered the factor. The prefix is a literal string, while the
> > > > > factor can contain complex regular expression constructs. As a
> > > > > result, rule matching occurs in two stages: prefix matching and
> > > > > factor matching.
> > > > >
> > > > > b) user_id or user_ptr
> > > > > Under what kind of circumstances should an application pass
> > > > > value into these variables for enqueue and dequeue operations?
> > > > >
> > > > > [Jerin] Just like rte_crypto_ops, struct rte_regex_ops also
> > > > > allocated using mempool normally, on enqueue, user can specify
> > > > > user_id If needed to in order identify the op on dequeue if
> > > > > required. The use case could be to store the sequence number from
> > > > > application POV or storing the mbuf ptr in which pattern is
> > > > > requested etc.
> > > > >
> > > > >
> > > > > 2) rte_regex_match
> > > > > a) offset; /**< Starting Byte Position for matched rule. */
> > > > > and uint16_t len; /**< Length of match in bytes */
> > > > > Looks like the matching offset is defined as *starting
> > > > > matching offset* instead of *end matching offset*, e.g. report the
> > > > > offset of "a" instead of "c"
> > > > > for pattern "abc".
> > > > > If so, this makes it hard to integrate software regex
> > > > > libraries such as Hyperscan and RE2 as they only report *end
> > > > > matching offset* without length of match.
> > > > > Although Hyperscan has API for *starting matching offset*, it
> > > > > only delivers partial syntax support. So I think we have to define
> > > > > *end of matching offset* for software solutions.
> > > > >
> > > > > [Jerin] I understand the hyperscan's HS_FLAG_SOM_LEFTMOST tradeoffs.
> > > > > I thought application would need always the length of the match.
> > > > > Probably we will see how other HW implementation (from Mellanox)
> > > > > etc.
We will try to abstract it, probably we can make it as function
> > > > > of "user requested".
> > > > > [Xiang] Yes, it will be good to make it per user request. At least
> > > > > from Hyperscan user's point of view, start of match and match length
> > > > > are not mandatory.
> > > >
> > > > OK. I think, we can introduce RTE_REGEX_DEV_CFG_MATCH_AS_START In
> > > > device configure.
> > > >
> > > > Since offset+len == end, we can introduce following generic inline
> > > > function.
> > > >
> > > > static inline uint32_t
> > > > rte_regex_match_end(struct rte_regex_match *match) {
> > > >     return match->offset + match->len;
> > > > }
> > > >
> > > > Example: pattern to match is "hello\s+world" and data is following
> > > > data[4] = 'h'
> > > > data[5] = 'e'
> > > > data[6] = 'l'
> > > > data[7] = 'l'
> > > > data[8] = 'o'
> > > > data[9] = ' '
> > > > data[10] = 'w'
> > > > data[11] = 'o'
> > > > data[12] = 'r'
> > > > data[13] = 'l'
> > > > data[14] = 'd'
> > > >
> > > > if device is configured with RTE_REGEX_DEV_CFG_MATCH_AS_START
> > > > match->offset returns 4
> > > > match->len returns 11
> > > >
> > > > if device is NOT configured with RTE_REGEX_DEV_CFG_MATCH_AS_START
> > > > driver MAY return the following(in hyperscan case)
> > > > match->offset returns 0
> > > > match->len returns 11 + 4
> > > >
> > > > In both case(irrespective of flags, to make application life easy)
> > > > rte_regex_match_end() would return 15.
> > > > If application demands for MATCH_AS_START then driver can return
> > > > match->offset returns 4 and match->len returns 11 Aka set
> > > > HS_FLAG_SOM_LEFTMOST in hyperscan driver, But application should use
> > > > rte_regex_match_end() for finding the end of the match. To make, work
> > > > in all cases.
> > > >
> > > > Is it OK?
> > > >
> > > Can we replace len with end offset? So we can change "offset" to
> > > "start_offset" and len to "end_offset" in struct rte_regex_match.
Users interested in len
> > > could take "end_offset - start_offset".
> > > We may also change RTE_REGEX_DEV_CFG_MATCH_AS_START to
> > > RTE_REGEX_DEV_CFG_MATCH_START
> > >
> > > In your example,
> > > if device is configured with RTE_REGEX_DEV_CFG_MATCH_START
> > > match->start_offset returns 4
> > > match->end_offset returns 15
> > >
> > > if device is NOT configured with RTE_REGEX_DEV_CFG_MATCH_START
> > > match->start_offset returns 0
> > > match->end_offset returns 15
> > >
> >
> > This part is little tricky as HW descriptions need to be rewritten on
> > response.
> > This is a one issue, I foresee earlier, to come up with rte_regex_match
> > That's works for all implementation without performance issue.
> >
> > We have two HW implementations, both returns start_off and len.
> > Lets get input from other HW implementation on the semantics of
> > rte_regex_match. Based on that, we can decide how to go about it?
> > Thoughts from Mellanox or other vendors?
> >
> Sure. Let's get more inputs on this.

I think Jerin's approach is the better one, since at least in our case we
see a request to copy the match, so it is more user friendly to give the
offset and len.

> > > > >
> > > > > 3) rte_regex_rule_db_update()
> > > > > Does this mean we can dynamically add or delete rules for an
> > > > > already generated database without recompile from scratch for
> > > > > hardware Regex implementation?
> > > > > If so, this isn't possible for software solutions as they don't
> > > > > support dynamic database update and require recompile.
> > > > >
> > > > > [Jerin] rte_regex_rule_db_update() internally it would call
> > > > > recompile function for both HW and SW.
> > > > > See rte_regex_dev_config::rule_db in rte_regex_dev_configure() for
> > > > > precompiled rule database case.
> > > > > [Xiang] OK, sounds like we have to save the original rule-set for
> > > > > the device in order to do recompile.
I see both ADD and REMOVE
> > > > > operators from rte_regex_rule.
> > > > > For rules with REMOVE operator, what's the expected behavior to
> > > > > handle them for the old rule-set? Do we need to go through the old
> > > > > rule-set and remove corresponding rules before doing recompile?
> > > >
> > > > Yes.
> > > >
> > > I think it'll be better to change rte_regex_rule_db_update() to
> > > rte_regex_rule_compile() and have users to provide a full rule-set.
> > > So we don't have to maintain old rule-set and decide which one to keep and
> > > remove. We can simply recompile new rule-set and get rid of
> > > rte_regex_rule_op in this case.
> > >
> >
> > On virtualized, HW implementations, The RULE database is maintained by
> > single body. So the above scheme, works with SW and HW implementations.
> > And It make user life easy as they don't need to maintain the rules.
> >
> > I don't have preference on the rte_regex_rule_db_update() name, I can
> > change to rte_regex_rule_compile() if required keeping above
> > functionality. Let me know.
> >
> OK, I'm good if you are willing to maintain it for users. Then both
> rte_regex_rule_db_update() and rte_regex_rule_compile() work for me.

Combining with Shahaf's request, my suggestion is:
rte_regex_rule_db_update() - only insert/remove rules from the internal set
rte_regex_rule_db_compile_activate() - compile and activate the new rule set.

Best,
Ori
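
A minimal sketch of the two-step flow suggested above, to show the intended
split between editing the internal rule set and compiling it. Everything
here is hypothetical: the sketch_ prefixed types and functions, the fixed
SKETCH_MAX_RULES capacity, and the "compile" step (which only records a
rule count) are assumptions for illustration, not the proposed regexdev API
or any driver's behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_RULES 8 /* arbitrary capacity for the sketch */

enum sketch_rule_op { SKETCH_RULE_ADD, SKETCH_RULE_REMOVE };

struct sketch_rule {
	enum sketch_rule_op op;
	uint32_t rule_id;
	const char *pcre; /* pattern text; unused for REMOVE */
};

struct sketch_dev {
	struct sketch_rule set[SKETCH_MAX_RULES]; /* internal rule set */
	uint32_t nb_staged; /* rules currently in the internal set */
	uint32_t nb_active; /* rules in the last compiled database */
};

/* Step 1: only insert/remove rules in the internal set, nothing is compiled.
 * Returns the number of rules that were applied. */
static uint32_t
sketch_rule_db_update(struct sketch_dev *dev, const struct sketch_rule *rules,
		      uint32_t nb_rules)
{
	uint32_t applied = 0;

	assert(dev != NULL && rules != NULL);
	for (uint32_t i = 0; i < nb_rules; i++) {
		if (rules[i].op == SKETCH_RULE_ADD) {
			if (dev->nb_staged == SKETCH_MAX_RULES)
				continue; /* set is full, rule is skipped */
			dev->set[dev->nb_staged++] = rules[i];
			applied++;
			continue;
		}
		/* REMOVE: drop the first staged rule with the same id. */
		for (uint32_t j = 0; j < dev->nb_staged; j++) {
			if (dev->set[j].rule_id != rules[i].rule_id)
				continue;
			dev->set[j] = dev->set[--dev->nb_staged];
			applied++;
			break;
		}
	}
	return applied;
}

/* Step 2: compile the internal set and make it the active database.
 * A real driver would call its rule compiler here; the sketch only
 * records how many rules became active. */
static int
sketch_rule_db_compile_activate(struct sketch_dev *dev)
{
	assert(dev != NULL);
	dev->nb_active = dev->nb_staged;
	return 0;
}
```

In this sketch an application can batch several update calls and pay the
compile cost once, and the active database stays unchanged until the
compile/activate call is made.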