From: Konstantin Ananyev
To: Yoan Picchi, Thomas Monjalon, Yipeng Wang, Sameh Gobriel, Bruce Richardson, Vladimir Medvedkin
CC: "dev@dpdk.org", "nd@arm.com", Ruifeng Wang, Nathan Brown
Subject: RE: [PATCH v7 1/4] hash: pack the hitmask for hash in bulk lookup
Date: Tue, 19 Mar 2024 10:41:51 +0000
Message-ID: <28cff0e5ea404051acdcb71c567d9d7c@huawei.com>
References: <20231020165159.1649282-1-yoan.picchi@arm.com> <20240312154215.802374-1-yoan.picchi@arm.com> <20240312154215.802374-2-yoan.picchi@arm.com>
In-Reply-To: <20240312154215.802374-2-yoan.picchi@arm.com>

Hi,

> Current hitmask includes padding due to Intel's SIMD
> implementation detail. This patch allows non Intel SIMD
> implementations to benefit from a dense hitmask.
> In addition, the new dense hitmask interweave the primary
> and secondary matches which allow a better cache usage and
> enable future improvements for the SIMD implementations
>
> Signed-off-by: Yoan Picchi
> Reviewed-by: Ruifeng Wang
> Reviewed-by: Nathan Brown
> ---
>  .mailmap                                  |   2 +
>  lib/hash/arch/arm/compare_signatures.h    |  61 +++++++
>  lib/hash/arch/common/compare_signatures.h |  38 +++++
>  lib/hash/arch/x86/compare_signatures.h    |  53 ++++++
>  lib/hash/rte_cuckoo_hash.c                | 192 ++++++++++++----------
>  5 files changed, 255 insertions(+), 91 deletions(-)
>  create mode 100644 lib/hash/arch/arm/compare_signatures.h
>  create mode 100644 lib/hash/arch/common/compare_signatures.h
>  create mode 100644 lib/hash/arch/x86/compare_signatures.h
>
> diff --git a/.mailmap b/.mailmap
> index 66ebc20666..00b50414d3 100644
> --- a/.mailmap
> +++ b/.mailmap
> @@ -494,6 +494,7 @@ Hari Kumar Vemula
>  Harini Ramakrishnan
>  Hariprasad Govindharajan
>  Harish Patil
> +Harjot Singh
>  Harman Kalra
>  Harneet Singh
>  Harold Huang
> @@ -1633,6 +1634,7 @@ Yixue Wang
>  Yi Yang
>  Yi Zhang
>  Yoann Desmouceaux
> +Yoan Picchi
>  Yogesh Jangra
>  Yogev Chaimovich
>  Yongjie Gu
> diff --git a/lib/hash/arch/arm/compare_signatures.h b/lib/hash/arch/arm/compare_signatures.h
> new file mode 100644
> index 0000000000..1af6ba8190
> --- /dev/null
> +++ b/lib/hash/arch/arm/compare_signatures.h
> @@ -0,0 +1,61 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2010-2016 Intel Corporation
> + * Copyright(c) 2018-2024 Arm Limited
> + */
> +
> +/*
> + * Arm's version uses a densely packed hitmask buffer:
> + * Every bit is in use.
> + */
> +
> +#include
> +#include
> +#include
> +#include "rte_cuckoo_hash.h"
> +
> +#define DENSE_HASH_BULK_LOOKUP 1
> +
> +static inline void
> +compare_signatures_dense(uint16_t *hitmask_buffer,
> +		const uint16_t *prim_bucket_sigs,
> +		const uint16_t *sec_bucket_sigs,
> +		uint16_t sig,
> +		enum rte_hash_sig_compare_function sig_cmp_fn)
> +{
> +
> +	static_assert(sizeof(*hitmask_buffer) >= 2*(RTE_HASH_BUCKET_ENTRIES/8),
> +	"The hitmask must be exactly wide enough to accept the whole hitmask if it is dense");
> +
> +	/* For match mask every bits indicates the match */
> +	switch (sig_cmp_fn) {
> +#if RTE_HASH_BUCKET_ENTRIES <= 8
> +	case RTE_HASH_COMPARE_NEON: {
> +		uint16x8_t vmat, vsig, x;
> +		int16x8_t shift = {0, 1, 2, 3, 4, 5, 6, 7};
> +		uint16_t low, high;
> +
> +		vsig = vld1q_dup_u16((uint16_t const *)&sig);
> +		/* Compare all signatures in the primary bucket */
> +		vmat = vceqq_u16(vsig,
> +			vld1q_u16((uint16_t const *)prim_bucket_sigs));
> +		x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
> +		low = (uint16_t)(vaddvq_u16(x));
> +		/* Compare all signatures in the secondary bucket */
> +		vmat = vceqq_u16(vsig,
> +			vld1q_u16((uint16_t const *)sec_bucket_sigs));
> +		x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
> +		high = (uint16_t)(vaddvq_u16(x));
> +		*hitmask_buffer = low | high << RTE_HASH_BUCKET_ENTRIES;
> +
> +		}
> +		break;
> +#endif
> +	default:
> +		for (unsigned int i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
> +			*hitmask_buffer |=
> +				((sig == prim_bucket_sigs[i]) << i);
> +			*hitmask_buffer |=
> +				((sig == sec_bucket_sigs[i]) << i) << RTE_HASH_BUCKET_ENTRIES;
> +		}
> +	}
> +}
> diff --git a/lib/hash/arch/common/compare_signatures.h b/lib/hash/arch/common/compare_signatures.h
> new file mode 100644
> index 0000000000..dcf9444032
> --- /dev/null
> +++ b/lib/hash/arch/common/compare_signatures.h
> @@ -0,0 +1,38 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2010-2016 Intel Corporation
> + * Copyright(c) 2018-2024 Arm Limited
> + */
> +
> +/*
> + * The generic version could use either a dense or sparsely packed hitmask buffer,
> + * but the dense one is slightly faster.
> + */
> +
> +#include
> +#include
> +#include
> +#include "rte_cuckoo_hash.h"
> +
> +#define DENSE_HASH_BULK_LOOKUP 1
> +
> +static inline void
> +compare_signatures_dense(uint16_t *hitmask_buffer,
> +		const uint16_t *prim_bucket_sigs,
> +		const uint16_t *sec_bucket_sigs,
> +		uint16_t sig,
> +		enum rte_hash_sig_compare_function sig_cmp_fn)
> +{
> +	(void) sig_cmp_fn;
> +
> +	static_assert(sizeof(*hitmask_buffer) >= 2*(RTE_HASH_BUCKET_ENTRIES/8),
> +	"The hitmask must be exactly wide enough to accept the whole hitmask if it is dense");
> +
> +	/* For match mask every bits indicates the match */
> +	for (unsigned int i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
> +		*hitmask_buffer |=
> +			((sig == prim_bucket_sigs[i]) << i);
> +		*hitmask_buffer |=
> +			((sig == sec_bucket_sigs[i]) << i) << RTE_HASH_BUCKET_ENTRIES;
> +	}
> +
> +}

Thanks for re-factoring compare_signatures_...() code, it looks much cleaner that way.
One question I have - does it mean that now for x86 we always use 'sparse' while for all other ARM and non-ARM platforms we switch to 'dense'?
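
To make the sparse/dense distinction concrete, here is a minimal standalone sketch of the two layouts as I read them from the scalar fallbacks in the patch. It is not the DPDK code: BUCKET_ENTRIES, dense_hitmask() and sparse_hitmask() are hypothetical names used only for illustration, and the sketch assumes 8 entries per bucket.

```c
#include <stdint.h>

/* Assumption: mirrors RTE_HASH_BUCKET_ENTRIES with its usual value of 8. */
#define BUCKET_ENTRIES 8

/*
 * Dense layout: one 16-bit mask covers both buckets.
 * Bit i is primary entry i, bit (i + BUCKET_ENTRIES) is secondary entry i.
 */
static inline uint16_t
dense_hitmask(const uint16_t *prim_sigs, const uint16_t *sec_sigs, uint16_t sig)
{
	uint16_t mask = 0;

	for (unsigned int i = 0; i < BUCKET_ENTRIES; i++) {
		mask |= (uint16_t)((sig == prim_sigs[i]) << i);
		mask |= (uint16_t)((sig == sec_sigs[i]) << (i + BUCKET_ENTRIES));
	}
	return mask;
}

/*
 * Sparse layout: one mask per bucket.
 * Bit 2*i is entry i, every odd bit is padding and stays zero.
 */
static inline uint32_t
sparse_hitmask(const uint16_t *sigs, uint16_t sig)
{
	uint32_t mask = 0;

	for (unsigned int i = 0; i < BUCKET_ENTRIES; i++)
		mask |= (uint32_t)(sig == sigs[i]) << (i << 1);
	return mask;
}
```

For a signature that matches primary entry 2 and secondary entries 1 and 7, the dense mask is 0x8204, while the sparse masks are 0x0010 (primary) and 0x4004 (secondary).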

> diff --git a/lib/hash/arch/x86/compare_signatures.h b/lib/hash/arch/x86/compare_signatures.h
> new file mode 100644
> index 0000000000..7eec499e1f
> --- /dev/null
> +++ b/lib/hash/arch/x86/compare_signatures.h
> @@ -0,0 +1,53 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2010-2016 Intel Corporation
> + * Copyright(c) 2018-2024 Arm Limited
> + */
> +
> +/*
> + * x86's version uses a sparsely packed hitmask buffer:
> + * Every other bit is padding.
> + */
> +
> +#include
> +#include
> +#include
> +#include "rte_cuckoo_hash.h"
> +
> +#define DENSE_HASH_BULK_LOOKUP 0
> +
> +static inline void
> +compare_signatures_sparse(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
> +		const struct rte_hash_bucket *prim_bkt,
> +		const struct rte_hash_bucket *sec_bkt,
> +		uint16_t sig,
> +		enum rte_hash_sig_compare_function sig_cmp_fn)
> +{
> +	/* For match mask the first bit of every two bits indicates the match */
> +	switch (sig_cmp_fn) {
> +#if defined(__SSE2__) && RTE_HASH_BUCKET_ENTRIES <= 8
> +	case RTE_HASH_COMPARE_SSE:
> +		/* Compare all signatures in the bucket */
> +		*prim_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
> +				_mm_load_si128(
> +					(__m128i const *)prim_bkt->sig_current),
> +				_mm_set1_epi16(sig)));
> +		/* Extract the even-index bits only */
> +		*prim_hash_matches &= 0x5555;
> +		/* Compare all signatures in the bucket */
> +		*sec_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
> +				_mm_load_si128(
> +					(__m128i const *)sec_bkt->sig_current),
> +				_mm_set1_epi16(sig)));
> +		/* Extract the even-index bits only */
> +		*sec_hash_matches &= 0x5555;
> +		break;
> +#endif /* defined(__SSE2__) */
> +	default:
> +		for (unsigned int i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
> +			*prim_hash_matches |=
> +				((sig == prim_bkt->sig_current[i]) << (i << 1));
> +			*sec_hash_matches |=
> +				((sig == sec_bkt->sig_current[i]) << (i << 1));
> +		}
> +	}
> +}
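
For reference, the padding in the x86 path comes from _mm_movemask_epi8() returning one bit per byte: a matching 16-bit signature sets two adjacent bits, and the 0x5555 mask keeps only the even one. The consumer then has to halve the bit position to get the entry index, which the dense layout avoids. Below is a rough sketch of that difference, assuming a GCC/Clang toolchain for __builtin_ctz; sparse_hits() and dense_hits() are hypothetical helpers, not functions from rte_cuckoo_hash.c.

```c
#include <stdint.h>

/*
 * Walk a sparse mask (as produced by movemask on 16-bit lanes).
 * Stores the entry index of each hit in idx[] and returns the hit count.
 */
static inline unsigned int
sparse_hits(uint32_t mask, unsigned int *idx)
{
	unsigned int n = 0;

	mask &= 0x5555; /* keep the even bits, drop the padding */
	while (mask != 0) {
		/* bit position 2*i corresponds to entry i */
		idx[n++] = (unsigned int)__builtin_ctz(mask) >> 1;
		mask &= mask - 1; /* clear the lowest set bit */
	}
	return n;
}

/*
 * Walk a dense mask. Stores the raw bit position of each hit:
 * positions 0..7 are primary entries, 8..15 are secondary entries.
 */
static inline unsigned int
dense_hits(uint16_t mask, unsigned int *idx)
{
	unsigned int n = 0;
	uint32_t m = mask;

	while (m != 0) {
		idx[n++] = (unsigned int)__builtin_ctz(m);
		m &= m - 1; /* clear the lowest set bit */
	}
	return n;
}
```

A raw movemask result of 0xC00C (lanes 1 and 7 matching) yields entries 1 and 7 from sparse_hits(); a dense mask of 0x8204 yields positions 2, 9 and 15, i.e. primary entry 2 and secondary entries 1 and 7.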