From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 0ACF445584; Thu, 4 Jul 2024 22:31:18 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id CE007410F1; Thu, 4 Jul 2024 22:31:17 +0200 (CEST) Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by mails.dpdk.org (Postfix) with ESMTP id A346A402A7 for ; Thu, 4 Jul 2024 22:31:16 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1720125076; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=XpNPmirVpRxwT7Zso9Xz0CtMR+ioOg6TpPgm8s9hwJc=; b=c2XbouyJ0GZ14V4dRje9wjl3BtlJsQLryrdAveGh5g10MrinVMzbNcNI2iHMFJhDsw/Dzm sTlSxOOzceHB9ObK4H7bq3Sp6QtHZtsd7S57cV7e5fZqpM9W3AJVm5QNWIfzvHvajSGJ81 9EeypTlt0HWJ/v0rc5G7oG8xcdOqMfs= Received: from mail-lf1-f72.google.com (mail-lf1-f72.google.com [209.85.167.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-288-BmUo2R_OPD2MBDdyKI1YAw-1; Thu, 04 Jul 2024 16:31:14 -0400 X-MC-Unique: BmUo2R_OPD2MBDdyKI1YAw-1 Received: by mail-lf1-f72.google.com with SMTP id 2adb3069b0e04-52e9f195bf6so1156530e87.3 for ; Thu, 04 Jul 2024 13:31:14 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1720125073; x=1720729873; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=XpNPmirVpRxwT7Zso9Xz0CtMR+ioOg6TpPgm8s9hwJc=; b=O23DMTOLHJYxpspFwzsuN/0/9INKwB/I9YWFxtEW0drXXOHmuD/0J/0+WqThgg9XTU 3fqADrUfdzUGAlugin6qZ+cSQuDB8nLD3+MU4zsIwoqgriVBwejWNMdf2S5UFxOO4HEx TquerSobKUl0D58rf45lP2Oo+mdb0mu3RcZZTmDc7QPg2UGICGXCcOwhM+nH3AiQ2lvs Rjjeq7qXR5d8NadEMq4xZDykXES/xQO+L0X6WTRhnhrer3YvGW7GzHigtwmJl6UtTHME PZ8I7jNsbMmTWfrCcAGP1gX8k+eLx6ak5k+09XynJJHtjyD4Gryv8Aye4hyYQ4+7/R7C vSpw== X-Forwarded-Encrypted: i=1; AJvYcCXo96eGRCHpGjevHmUrFFRMssqTIWRN89K0UNpvsJ2Zq/uGp3mtjpcBfBIfzZ4C4QXC+l0m/PxJGtRKV3o= X-Gm-Message-State: AOJu0Yw1krrbdF51CPLRlRoQzBNVEnU2pPmQg0OoHlXvDg1eY/a4kkaG ONMIYKNAkgVEHSuRuVtHIUZNdC4f2L/zGlOsNdZW108cuXkd8DUizB0rvVp26UTKu3JOfpevsrs m836ztuqtTkOMN5J+J0xNUEWsSyb3Zw66W1cEUZkkkcmLLYaoz5E8wdsPoNaIQxtfKmCO1DX9Gf JfC+VqrVxDpoKAtJ0= X-Received: by 2002:a05:6512:2087:b0:52c:dba2:4f1 with SMTP id 2adb3069b0e04-52ea06a6abamr2290131e87.48.1720125073401; Thu, 04 Jul 2024 13:31:13 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFk4mdcD6ah6oF3Fk+UquOrIjox5PnvCEobvnKKDVwa+TnT+iEPc6gFirHiq94MHlPu6k0ACHSCBqXv0hYiXuY= X-Received: by 2002:a05:6512:2087:b0:52c:dba2:4f1 with SMTP id 2adb3069b0e04-52ea06a6abamr2290114e87.48.1720125072892; Thu, 04 Jul 2024 13:31:12 -0700 (PDT) MIME-Version: 1.0 References: <20231020165159.1649282-1-yoan.picchi@arm.com> <20240703171315.1470547-1-yoan.picchi@arm.com> <20240703171315.1470547-2-yoan.picchi@arm.com> In-Reply-To: <20240703171315.1470547-2-yoan.picchi@arm.com> From: David Marchand Date: Thu, 4 Jul 2024 22:31:01 +0200 Message-ID: Subject: Re: [PATCH v10 1/4] hash: pack the hitmask for hash in bulk lookup To: Yoan Picchi Cc: Thomas Monjalon , Yipeng Wang , Sameh Gobriel , Bruce Richardson , Vladimir Medvedkin , dev@dpdk.org, nd@arm.com, Ruifeng Wang , Nathan Brown X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Hello Yoan, On Wed, Jul 3, 2024 at 7:13=E2=80=AFPM Yoan Picchi wr= ote: > > Current hitmask includes padding due to Intel's SIMD > implementation detail. This patch allows non Intel SIMD > implementations to benefit from a dense hitmask. > In addition, the new dense hitmask interweave the primary > and secondary matches which allow a better cache usage and > enable future improvements for the SIMD implementations > The default non SIMD path now use this dense mask. > > Signed-off-by: Yoan Picchi > Reviewed-by: Ruifeng Wang > Reviewed-by: Nathan Brown This patch does too many things at the same time. There is code movement and behavior modifications all mixed in. As there was still no review from the lib maintainer... I am going a bit more in depth this time. Please split this patch to make it less hard to understand. I can see the need for at least one patch for isolating the change on sig_cmp_fn from the exposed API, then one patch for moving the code to per arch headers with *no behavior change*, and one patch for introducing/switching to "dense hitmask". More comments below. > --- > .mailmap | 1 + > lib/hash/compare_signatures_arm_pvt.h | 60 +++++++ > lib/hash/compare_signatures_generic_pvt.h | 37 +++++ > lib/hash/compare_signatures_x86_pvt.h | 49 ++++++ > lib/hash/hash_sig_cmp_func_pvt.h | 20 +++ > lib/hash/rte_cuckoo_hash.c | 190 +++++++++++----------- > lib/hash/rte_cuckoo_hash.h | 10 +- > 7 files changed, 267 insertions(+), 100 deletions(-) > create mode 100644 lib/hash/compare_signatures_arm_pvt.h > create mode 100644 lib/hash/compare_signatures_generic_pvt.h > create mode 100644 lib/hash/compare_signatures_x86_pvt.h > create mode 100644 lib/hash/hash_sig_cmp_func_pvt.h > > diff --git a/.mailmap b/.mailmap > index f76037213d..ec525981fe 100644 > --- a/.mailmap > +++ b/.mailmap > @@ -1661,6 +1661,7 @@ Yixue Wang > Yi Yang > Yi Zhang > Yoann Desmouceaux > +Yoan Picchi > Yogesh Jangra > Yogev Chaimovich > Yongjie Gu > diff --git a/lib/hash/compare_signatures_arm_pvt.h b/lib/hash/compare_sig= natures_arm_pvt.h > new file mode 100644 > index 0000000000..e83bae9912 > --- /dev/null > +++ b/lib/hash/compare_signatures_arm_pvt.h I guess pvt stands for private. No need for such suffix, this header won't be exported in any case. > @@ -0,0 +1,60 @@ > +/* SPDX-License-Identifier: BSD-3-Clause > + * Copyright(c) 2010-2016 Intel Corporation > + * Copyright(c) 2018-2024 Arm Limited > + */ > + > +/* > + * Arm's version uses a densely packed hitmask buffer: > + * Every bit is in use. > + */ Please put a header guard. #ifndef _H #define _H > + > +#include > +#include > +#include > + > +#include "rte_cuckoo_hash.h" > +#include "hash_sig_cmp_func_pvt.h" > + > +#define DENSE_HASH_BULK_LOOKUP 1 > + > +static inline void > +compare_signatures_dense(uint16_t *hitmask_buffer, > + const uint16_t *prim_bucket_sigs, > + const uint16_t *sec_bucket_sigs, > + uint16_t sig, > + enum rte_hash_sig_compare_function sig_cmp_fn) > +{ > + > + static_assert(sizeof(*hitmask_buffer) >=3D 2 * (RTE_HASH_BUCKET_E= NTRIES / 8), > + "hitmask_buffer must be wide enough to fit a dense hitmas= k"); > + > + /* For match mask every bits indicates the match */ > + switch (sig_cmp_fn) { > +#if RTE_HASH_BUCKET_ENTRIES <=3D 8 > + case RTE_HASH_COMPARE_NEON: { > + uint16x8_t vmat, vsig, x; > + int16x8_t shift =3D {0, 1, 2, 3, 4, 5, 6, 7}; > + uint16_t low, high; > + > + vsig =3D vld1q_dup_u16((uint16_t const *)&sig); > + /* Compare all signatures in the primary bucket */ > + vmat =3D vceqq_u16(vsig, vld1q_u16((uint16_t const *)prim= _bucket_sigs)); > + x =3D vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shi= ft); > + low =3D (uint16_t)(vaddvq_u16(x)); > + /* Compare all signatures in the secondary bucket */ > + vmat =3D vceqq_u16(vsig, vld1q_u16((uint16_t const *)sec_= bucket_sigs)); > + x =3D vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shi= ft); > + high =3D (uint16_t)(vaddvq_u16(x)); > + *hitmask_buffer =3D low | high << RTE_HASH_BUCKET_ENTRIES= ; > + > + } > + break; > +#endif > + default: > + for (unsigned int i =3D 0; i < RTE_HASH_BUCKET_ENTRIES; i= ++) { > + *hitmask_buffer |=3D (sig =3D=3D prim_bucket_sigs= [i]) << i; > + *hitmask_buffer |=3D > + ((sig =3D=3D sec_bucket_sigs[i]) << i) <<= RTE_HASH_BUCKET_ENTRIES; > + } > + } > +} IIRC, this code is copied in all three headers. It is a common scalar version, so the ARM code could simply call the "generic" implementation rather than copy/paste. [snip] > diff --git a/lib/hash/compare_signatures_x86_pvt.h b/lib/hash/compare_sig= natures_x86_pvt.h > new file mode 100644 > index 0000000000..932912ba19 > --- /dev/null > +++ b/lib/hash/compare_signatures_x86_pvt.h > @@ -0,0 +1,49 @@ > +/* SPDX-License-Identifier: BSD-3-Clause > + * Copyright(c) 2010-2016 Intel Corporation > + * Copyright(c) 2018-2024 Arm Limited > + */ > + > +/* > + * x86's version uses a sparsely packed hitmask buffer: > + * Every other bit is padding. > + */ > + > +#include > +#include > +#include > + > +#include "rte_cuckoo_hash.h" > +#include "hash_sig_cmp_func_pvt.h" > + > +#define DENSE_HASH_BULK_LOOKUP 0 > + > +static inline void > +compare_signatures_sparse(uint32_t *prim_hash_matches, uint32_t *sec_has= h_matches, > + const struct rte_hash_bucket *prim_bkt, > + const struct rte_hash_bucket *sec_bkt, > + uint16_t sig, > + enum rte_hash_sig_compare_function sig_cmp_fn) > +{ > + /* For match mask the first bit of every two bits indicates the m= atch */ > + switch (sig_cmp_fn) { > +#if defined(__SSE2__) && RTE_HASH_BUCKET_ENTRIES <=3D 8 The check on RTE_HASH_BUCKET_ENTRIES <=3D 8 seems new. It was not present in the previous implementation for SSE2, and this difference is not explained. > + case RTE_HASH_COMPARE_SSE: > + /* Compare all signatures in the bucket */ > + *prim_hash_matches =3D _mm_movemask_epi8(_mm_cmpeq_epi16(= _mm_load_si128( > + (__m128i const *)prim_bkt->sig_current), _mm_set1= _epi16(sig))); > + /* Extract the even-index bits only */ > + *prim_hash_matches &=3D 0x5555; > + /* Compare all signatures in the bucket */ > + *sec_hash_matches =3D _mm_movemask_epi8(_mm_cmpeq_epi16(_= mm_load_si128( > + (__m128i const *)sec_bkt->sig_current), _mm_set1_= epi16(sig))); > + /* Extract the even-index bits only */ > + *sec_hash_matches &=3D 0x5555; > + break; > +#endif /* defined(__SSE2__) */ > + default: > + for (unsigned int i =3D 0; i < RTE_HASH_BUCKET_ENTRIES; i= ++) { > + *prim_hash_matches |=3D (sig =3D=3D prim_bkt->sig= _current[i]) << (i << 1); > + *sec_hash_matches |=3D (sig =3D=3D sec_bkt->sig_c= urrent[i]) << (i << 1); > + } > + } > +} > diff --git a/lib/hash/hash_sig_cmp_func_pvt.h b/lib/hash/hash_sig_cmp_fun= c_pvt.h > new file mode 100644 > index 0000000000..d8d2fbffaf > --- /dev/null > +++ b/lib/hash/hash_sig_cmp_func_pvt.h Please rename as compare_signatures.h or maybe a simpler option is to move this enum declaration in rte_cuckoo_hash.c before including the per arch headers. > @@ -0,0 +1,20 @@ > +/* SPDX-License-Identifier: BSD-3-Clause > + * Copyright(c) 2024 Arm Limited > + */ > + > +#ifndef _SIG_CMP_FUNC_H_ > +#define _SIG_CMP_FUNC_H_ If keeping a header, this guard must reflect the file name. > + > +/** Enum used to select the implementation of the signature comparison f= unction to use /* is enough, doxygen only parses public headers. > + * eg: A system supporting SVE might want to use a NEON implementation. > + * Those may change and are for internal use only > + */ > +enum rte_hash_sig_compare_function { > + RTE_HASH_COMPARE_SCALAR =3D 0, > + RTE_HASH_COMPARE_SSE, > + RTE_HASH_COMPARE_NEON, > + RTE_HASH_COMPARE_SVE, > + RTE_HASH_COMPARE_NUM > +}; > + > +#endif [snip] > diff --git a/lib/hash/rte_cuckoo_hash.h b/lib/hash/rte_cuckoo_hash.h > index a528f1d1a0..26a992419a 100644 > --- a/lib/hash/rte_cuckoo_hash.h > +++ b/lib/hash/rte_cuckoo_hash.h > @@ -134,14 +134,6 @@ struct rte_hash_key { > char key[0]; > }; > > -/* All different signature compare functions */ > -enum rte_hash_sig_compare_function { > - RTE_HASH_COMPARE_SCALAR =3D 0, > - RTE_HASH_COMPARE_SSE, > - RTE_HASH_COMPARE_NEON, > - RTE_HASH_COMPARE_NUM > -}; > - > /** Bucket structure */ > struct __rte_cache_aligned rte_hash_bucket { > uint16_t sig_current[RTE_HASH_BUCKET_ENTRIES]; > @@ -199,7 +191,7 @@ struct __rte_cache_aligned rte_hash { > /**< Custom function used to compare keys. */ > enum cmp_jump_table_case cmp_jump_table_idx; > /**< Indicates which compare function to use. */ > - enum rte_hash_sig_compare_function sig_cmp_fn; > + unsigned int sig_cmp_fn; >From an ABI perspective, it looks ok. We may be breaking users that would inspect this public object, but I think it is ok. In any case, put this change in a separate patch so it is more visible. > /**< Indicates which signature compare function to use. */ > uint32_t bucket_bitmask; > /**< Bitmask for getting bucket index from hash signature. */ > -- > 2.25.1 > --=20 David Marchand