From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 4C64443CFE; Wed, 20 Mar 2024 08:39:18 +0100 (CET) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 1111940F1A; Wed, 20 Mar 2024 08:39:18 +0100 (CET) Received: from mx0b-0016f401.pphosted.com (mx0b-0016f401.pphosted.com [67.231.156.173]) by mails.dpdk.org (Postfix) with ESMTP id 6543E402A2 for ; Wed, 20 Mar 2024 08:39:16 +0100 (CET) Received: from pps.filterd (m0045851.ppops.net [127.0.0.1]) by mx0b-0016f401.pphosted.com (8.17.1.24/8.17.1.24) with ESMTP id 42K3HH2w003415; Wed, 20 Mar 2024 00:39:10 -0700 Received: from nam11-bn8-obe.outbound.protection.outlook.com (mail-bn8nam11lp2169.outbound.protection.outlook.com [104.47.58.169]) by mx0b-0016f401.pphosted.com (PPS) with ESMTPS id 3wxka5041b-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 20 Mar 2024 00:39:06 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=KUxuZtbLiqGxFVxe8dQz56JksHwOc1Ixt47ToSLXEC7B2ZW2JfaN5Vn7PsRxu8+1TG/EUO0CQR0y4S+jBNEoPbzVBGrfYyB9aXde+Ir90Fthi4KRC3DNOshRGVGGc+lDtZjOqIBFX/il6W/qvw61iPt4Sb5qt6SFiQpXuyE0z8let6bJDH6ZmtPs1LYdjagjiNWMJWZQ0xVTg79pYV3XaM48ICNQdyRjVaRkmPpHT78OdCRq789RAJHauk1yOD/dfUTiGryvL3FM9m8TtLqeMZAfFlUKx5T5O9eeVelhosyjZXdkF3oWwLHKUBy2yxhUBu2bP7PCnSb5MNrGPSaOBg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=ASgmecz6kYXQ405uI9r2PA4LPJJJ69Mp4aXAT8vigt0=; b=cOkodsfIY/3PrpPCFWXVj923gbz8VE2Yco9aw0QRpx9UqfNp/FaNtBBplbyZCJOai/QSxJQDD3jgzw3zbu1iz27HyFL9O63yTWxD9PJdfPs55SnwXq2ivjvQkazPRJUY2EKCW2aWXA8caIzDTWAAzkAiJk3uXEc7nFZXBouv/ySTHrNgPWWncThVXEybrlJXchTI6Jo1f7nC20P6EoUQL79Dioj5115UNRBO21rHWWXaV5n/P4aA1fKTL6Tr2prSfX//4JqN+usI8pG3pXTNHBo/E/r9EmLIQq2Mnj5T++IODltxN+t1ZedW81Hoc1bWDmN2tU2QGVKiojrQmN44bw== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=marvell.com; dmarc=pass action=none header.from=marvell.com; dkim=pass header.d=marvell.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=marvell.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=ASgmecz6kYXQ405uI9r2PA4LPJJJ69Mp4aXAT8vigt0=; b=ZCG02t2FjZekuv+YcPFjq2Vx3uNEupf32uGftD03wucqeS8PR5oZmLx7y5v3lqPoHNCS0rGTLXQ2QQQcW5zycpW/Z3yVJJnzuZCdKE948nboUMng3e9YAZxJ1HPL6gQgvYmvQJWsign2scaOWNA6jreKfVBo6rCu5PoKDBLC1TU= Received: from PH0PR18MB4086.namprd18.prod.outlook.com (2603:10b6:510:3::9) by CO6PR18MB3857.namprd18.prod.outlook.com (2603:10b6:5:347::13) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7386.31; Wed, 20 Mar 2024 07:37:47 +0000 Received: from PH0PR18MB4086.namprd18.prod.outlook.com ([fe80::a843:2d6f:fc75:edfd]) by PH0PR18MB4086.namprd18.prod.outlook.com ([fe80::a843:2d6f:fc75:edfd%5]) with mapi id 15.20.7386.025; Wed, 20 Mar 2024 07:37:47 +0000 From: Pavan Nikhilesh Bhagavatula To: Yoan Picchi , Yipeng Wang , Sameh Gobriel , Bruce Richardson , Vladimir Medvedkin CC: "dev@dpdk.org" , "nd@arm.com" , Ruifeng Wang , Nathan Brown , Jerin Jacob Subject: RE: [EXTERNAL] [PATCH v7 2/4] hash: optimize compare signature for NEON Thread-Topic: [EXTERNAL] [PATCH v7 2/4] hash: optimize compare signature for NEON Thread-Index: AQHadJPyIwf2h3yxR064Je3Ok+SSxrFASDIA Date: Wed, 20 Mar 2024 07:37:47 +0000 Message-ID: References: <20231020165159.1649282-1-yoan.picchi@arm.com> <20240312154215.802374-1-yoan.picchi@arm.com> <20240312154215.802374-3-yoan.picchi@arm.com> In-Reply-To: <20240312154215.802374-3-yoan.picchi@arm.com> Accept-Language: en-US, en-IN Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-ms-publictraffictype: Email x-ms-traffictypediagnostic: PH0PR18MB4086:EE_|CO6PR18MB3857:EE_ x-ms-office365-filtering-correlation-id: ad52b8d0-cc78-466f-2c4e-08dc48b0a37e x-ms-exchange-senderadcheck: 1 x-ms-exchange-antispam-relay: 0 x-microsoft-antispam: BCL:0; x-microsoft-antispam-message-info: RQL8tq9ZoVNEos+SLDtiLcl6xCVJ4mcWfnZX8w9o7UM1YlRYVuFlFglDCuo2ro2Qzuu/T42Ra4KJ3qqg2I06jdWaYurW61FJCon+Ho6MOi+u+xh28+Ifm49mgXa5hkPgKBqck7Zef/4eq7omHUEi0m1SS3K6302i3DKjPkoSbMHgIk3v21YatfQgxMUxDClxbPRR8H3/EJogzkJeG0QyCh0dOHbJX+M96cE8aVIOZ0H8zo8WCM1HerSNt27i1B4lSLfYUSgHmyMas6mJz2sO7agGIVuCrh/wlAu6/aJKjvDmM5TZ05RIH0b7EcGlgAVIZbMmT8AQCHanCv/smMKY9Al/X2HBEC+KE9w5Iz1OrCsNfizUpBXlO4DDbcpzzrdZk0z8HOZPMxbZtrAGWs4l1+ecvnUNv7tW5TiFo8gxyNwcVGQu9HqPndYXn5CnBRSqhm/J6qo7N1WD9xAuSfHPh3sCkjmTk3jQHPcT113TEyoLunMzmXz85+7LH6+VR3olbJkA0MnP8uOS6XKcTcT5y/YSGTsPCJsMbfZRUihf2AmmYD9j0klILVK+5tXbewY6qr7RINuE9fmCvG8hZtum+fn9OunTXL/GIsgQPUpDgLHmJWO+o99RvCxDf20vp4LtJ09iqDdtK6NmX3PM/hupGPIp0Pr8jCX/ciyh7K/mwwM= x-forefront-antispam-report: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:PH0PR18MB4086.namprd18.prod.outlook.com; PTR:; CAT:NONE; SFS:(13230031)(1800799015)(366007)(376005)(38070700009); DIR:OUT; SFP:1102; x-ms-exchange-antispam-messagedata-chunkcount: 1 x-ms-exchange-antispam-messagedata-0: =?us-ascii?Q?fV98qL1sR5ml24a5z1upfL0fOqLi6ouwlBNDNZT3SLI1fFPMwSa98vxyALWo?= =?us-ascii?Q?jrMo+2ZAgkSHFmoQSbHhx+4TIDXZn960/1KkXKV9BpkGGS/la87beKpiYPmH?= =?us-ascii?Q?3EOdMjiHk7hga+hqMCm/veSpadVgxMNOeczSYpQ3Hh2FRR7BZhC8bqlLkwWi?= =?us-ascii?Q?taD2ycKeTM2QfBeeRvmxlT+LsM5jZsv7PZ2jPSsMAAmUgj6lSz94vQQ2uzmJ?= =?us-ascii?Q?Vl2PgZaR/Saj3VAGmBMlqxYfMpExjTzZr/g0gZzN5eGSlzY6gt/JBi1bSjSu?= =?us-ascii?Q?xCMjaDyOexnH5NJ54ldm4Z4BpLG9ckqkSVDCN/qnOvDab6pGJvMXxUYT1oxf?= =?us-ascii?Q?H0/PXuknscLEiy92JWo1npmos09DmIyIhQkHWqAy0eiKhVTfxEKHDF3Mj7A2?= =?us-ascii?Q?wNrnhq89YtiMDxSn5gKFamCoNIa78TLZA2Sx+s/tk/MeW/m1CqzTWl05L8gv?= =?us-ascii?Q?+TW55YtFhGobWPgIW+naFCxMlLm55Vh7S5TkGihDLPauPg50I5wYX84m4AS2?= =?us-ascii?Q?2QVKWqB3PrE9gacT8FvRabN3W7Ca6hQ+Bz2NHUAeisrC+hblgs/6EBOo3D0g?= =?us-ascii?Q?ZmWzrC/ptL1LkvFFxHvWA1NaZkN4SHHH5iYQ1Z1rd0Qkk3G+YeMd0ePtIhi/?= =?us-ascii?Q?0cLA97unla63le23aTv/Gc10HQL3j2j4VAauUHIFYbWubBHjxZhP9ClzPhcU?= =?us-ascii?Q?dZSXKxdXx59zk3URLzdWM67VcflRoUQfixJ4Nn8khWnAFNvinUXIL+6XrmXI?= =?us-ascii?Q?O/Qdu3Nr1H72kVSjI2NS3JXtWNCALFJhopLwRJ59ikn7HBcLd6rzZVulV6R1?= =?us-ascii?Q?1wxixtgKCAjXpQA3kpb2qBqq7odpavY53rEUaE5z7QFtfO5aWEZ5zXEOCwHQ?= =?us-ascii?Q?TL/IsCbjyGmvHeK4EjHTZN+FX8KsH05HpyeYBa32HDrwaJvCwKbTGZXoGCyY?= =?us-ascii?Q?fFfAxXLHUxTczl2HLRcgJsi7FhtFv/Bh/fqst2K0IkIYASpz0XCwtsMi4alz?= =?us-ascii?Q?rxkJtzloAnuvxEglZx8tEoCsEhzrJq4VtW0yB9VMt9tnIA5eDieWjecBpgl+?= =?us-ascii?Q?FZLs7NlwEMVQnpp1CLHqBUp4daQlZIww3JbzwEBUNgS2V/s9z3xUHReyM9kA?= =?us-ascii?Q?43cAp0Qc9mFP5gzek8/hiZVN+OSGMt2YxCYCoLBk1sEK3rjjAReARdFhgrA7?= =?us-ascii?Q?E64brsOw69D4L3/LbmiOg9EM4SikOV3C/0qd3SodM/7ykbtIZ9rB+3op3R2c?= =?us-ascii?Q?WEwt2UMJpAC1PZF/Ud8HDnMtsTKOn4MBwHynp5LWzTDoS2z1G3rMlwy+bupj?= =?us-ascii?Q?Iy4ANsPoEP3FNnphgyIvFNFgfXqFFMPcpCHI0lJnXE9cN0k8C12ScqqTj+ro?= =?us-ascii?Q?CNND9OKevfg1CQ1Ow1g1pgBW7dbbMHTUO/KxR7miszMz/2bDM2Y3gtxaA3rm?= =?us-ascii?Q?FPuwSI/qE94J/MmFtkQof+wd8ha1G/4RJP7IIszFNnE/2MX1FVTKREpOEkVA?= =?us-ascii?Q?D7H/COBpq/y+CpI78q/sRF9yb54h27zX2l3trz+2I0ToXd5FHIBxSCvONw4V?= =?us-ascii?Q?bcz2p3QjuE+g19UOnh1kwn8bpqzfbEwje8Z1af5w?= Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: marvell.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-AuthSource: PH0PR18MB4086.namprd18.prod.outlook.com X-MS-Exchange-CrossTenant-Network-Message-Id: ad52b8d0-cc78-466f-2c4e-08dc48b0a37e X-MS-Exchange-CrossTenant-originalarrivaltime: 20 Mar 2024 07:37:47.3678 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 70e1fb47-1155-421d-87fc-2e58f638b6e0 X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-CrossTenant-userprincipalname: XziXpae3bPUQMg4K5MFlccthRZtRGSeGORSF1GVdGOjUVKI8vzMR4WtlHkkB75ZWpmwflFePKCUomF8/IQxbF4KJjZ+/bNqj7FcvnucpL4k= X-MS-Exchange-Transport-CrossTenantHeadersStamped: CO6PR18MB3857 X-Proofpoint-GUID: Y4TTlSlRd68Bs1mtW8QTV2ZG5sW1nXFf X-Proofpoint-ORIG-GUID: Y4TTlSlRd68Bs1mtW8QTV2ZG5sW1nXFf X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.272,Aquarius:18.0.1011,Hydra:6.0.619,FMLib:17.11.176.26 definitions=2024-03-20_04,2024-03-18_03,2023-05-22_02 X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org > Upon a successful comparison, NEON sets all the bits in the lane to 1 > We can skip shifting by simply masking with specific masks. >=20 > Signed-off-by: Yoan Picchi > Reviewed-by: Ruifeng Wang > Reviewed-by: Nathan Brown > --- > lib/hash/arch/arm/compare_signatures.h | 24 +++++++++++------------- > 1 file changed, 11 insertions(+), 13 deletions(-) >=20 > diff --git a/lib/hash/arch/arm/compare_signatures.h > b/lib/hash/arch/arm/compare_signatures.h > index 1af6ba8190..b5a457f936 100644 > --- a/lib/hash/arch/arm/compare_signatures.h > +++ b/lib/hash/arch/arm/compare_signatures.h > @@ -30,23 +30,21 @@ compare_signatures_dense(uint16_t > *hitmask_buffer, > switch (sig_cmp_fn) { > #if RTE_HASH_BUCKET_ENTRIES <=3D 8 > case RTE_HASH_COMPARE_NEON: { > - uint16x8_t vmat, vsig, x; > - int16x8_t shift =3D {0, 1, 2, 3, 4, 5, 6, 7}; > - uint16_t low, high; > + uint16x8_t vmat, hit1, hit2; > + const uint16x8_t mask =3D {0x1, 0x2, 0x4, 0x8, 0x10, 0x20, > 0x40, 0x80}; > + const uint16x8_t vsig =3D vld1q_dup_u16((uint16_t const > *)&sig); >=20 > - vsig =3D vld1q_dup_u16((uint16_t const *)&sig); > /* Compare all signatures in the primary bucket */ > - vmat =3D vceqq_u16(vsig, > - vld1q_u16((uint16_t const *)prim_bucket_sigs)); > - x =3D vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), > shift); > - low =3D (uint16_t)(vaddvq_u16(x)); > + vmat =3D vceqq_u16(vsig, vld1q_u16(prim_bucket_sigs)); > + hit1 =3D vandq_u16(vmat, mask); > + > /* Compare all signatures in the secondary bucket */ > - vmat =3D vceqq_u16(vsig, > - vld1q_u16((uint16_t const *)sec_bucket_sigs)); > - x =3D vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), > shift); > - high =3D (uint16_t)(vaddvq_u16(x)); > - *hitmask_buffer =3D low | high << RTE_HASH_BUCKET_ENTRIES; > + vmat =3D vceqq_u16(vsig, vld1q_u16(sec_bucket_sigs)); > + hit2 =3D vandq_u16(vmat, mask); >=20 > + hit2 =3D vshlq_n_u16(hit2, RTE_HASH_BUCKET_ENTRIES); > + hit2 =3D vorrq_u16(hit1, hit2); > + *hitmask_buffer =3D vaddvq_u16(hit2); Since vaddv is expensive could you convert it to vshrn? https://community.arm.com/arm-community-blogs/b/infrastructure-solutions-bl= og/posts/porting-x86-vector-bitmask-optimizations-to-arm-neon https://github.com/DPDK/dpdk/blob/main/examples/l3fwd/l3fwd_neon.h#L226 > } > break; > #endif > -- > 2.25.1