DPDK patches and discussions
 help / color / mirror / Atom feed
From: Konstantin Ananyev <konstantin.ananyev@huawei.com>
To: Yoan Picchi <yoan.picchi@foss.arm.com>,
	Yoan Picchi <yoan.picchi@arm.com>,
	 Thomas Monjalon <thomas@monjalon.net>,
	Yipeng Wang <yipeng1.wang@intel.com>,
	Sameh Gobriel <sameh.gobriel@intel.com>,
	Bruce Richardson <bruce.richardson@intel.com>,
	Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>, "nd@arm.com" <nd@arm.com>,
	Ruifeng Wang <ruifeng.wang@arm.com>,
	Nathan Brown <nathan.brown@arm.com>
Subject: RE: [PATCH v7 1/4] hash: pack the hitmask for hash in bulk lookup
Date: Tue, 19 Mar 2024 13:25:58 +0000	[thread overview]
Message-ID: <3cfce8e3b128473096e1d43683fbe6f0@huawei.com> (raw)
In-Reply-To: <169368f3-c385-4591-ab3c-531c8918eba3@foss.arm.com>



> >
> > Hi,
> >
> >> Current hitmask includes padding due to Intel's SIMD
> >> implementation detail. This patch allows non Intel SIMD
> >> implementations to benefit from a dense hitmask.
> >> In addition, the new dense hitmask interweave the primary
> >> and secondary matches which allow a better cache usage and
> >> enable future improvements for the SIMD implementations
> >>
> >> Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
> >> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> >> Reviewed-by: Nathan Brown <nathan.brown@arm.com>
> >> ---
> >>   .mailmap                                  |   2 +
> >>   lib/hash/arch/arm/compare_signatures.h    |  61 +++++++
> >>   lib/hash/arch/common/compare_signatures.h |  38 +++++
> >>   lib/hash/arch/x86/compare_signatures.h    |  53 ++++++
> >>   lib/hash/rte_cuckoo_hash.c                | 192 ++++++++++++----------
> >>   5 files changed, 255 insertions(+), 91 deletions(-)
> >>   create mode 100644 lib/hash/arch/arm/compare_signatures.h
> >>   create mode 100644 lib/hash/arch/common/compare_signatures.h
> >>   create mode 100644 lib/hash/arch/x86/compare_signatures.h
> >>
> >> diff --git a/.mailmap b/.mailmap
> >> index 66ebc20666..00b50414d3 100644
> >> --- a/.mailmap
> >> +++ b/.mailmap
> >> @@ -494,6 +494,7 @@ Hari Kumar Vemula <hari.kumarx.vemula@intel.com>
> >>   Harini Ramakrishnan <harini.ramakrishnan@microsoft.com>
> >>   Hariprasad Govindharajan <hariprasad.govindharajan@intel.com>
> >>   Harish Patil <harish.patil@cavium.com> <harish.patil@qlogic.com>
> >> +Harjot Singh <harjot.singh@arm.com>
> >>   Harman Kalra <hkalra@marvell.com>
> >>   Harneet Singh <harneet.singh@intel.com>
> >>   Harold Huang <baymaxhuang@gmail.com>
> >> @@ -1633,6 +1634,7 @@ Yixue Wang <yixue.wang@intel.com>
> >>   Yi Yang <yangyi01@inspur.com> <yi.y.yang@intel.com>
> >>   Yi Zhang <zhang.yi75@zte.com.cn>
> >>   Yoann Desmouceaux <ydesmouc@cisco.com>
> >> +Yoan Picchi <yoan.picchi@arm.com>
> >>   Yogesh Jangra <yogesh.jangra@intel.com>
> >>   Yogev Chaimovich <yogev@cgstowernetworks.com>
> >>   Yongjie Gu <yongjiex.gu@intel.com>
> >> diff --git a/lib/hash/arch/arm/compare_signatures.h b/lib/hash/arch/arm/compare_signatures.h
> >> new file mode 100644
> >> index 0000000000..1af6ba8190
> >> --- /dev/null
> >> +++ b/lib/hash/arch/arm/compare_signatures.h
> >> @@ -0,0 +1,61 @@
> >> +/* SPDX-License-Identifier: BSD-3-Clause
> >> + * Copyright(c) 2010-2016 Intel Corporation
> >> + * Copyright(c) 2018-2024 Arm Limited
> >> + */
> >> +
> >> +/*
> >> + * Arm's version uses a densely packed hitmask buffer:
> >> + * Every bit is in use.
> >> + */
> >> +
> >> +#include <inttypes.h>
> >> +#include <rte_common.h>
> >> +#include <rte_vect.h>
> >> +#include "rte_cuckoo_hash.h"
> >> +
> >> +#define DENSE_HASH_BULK_LOOKUP 1
> >> +
> >> +static inline void
> >> +compare_signatures_dense(uint16_t *hitmask_buffer,
> >> +			const uint16_t *prim_bucket_sigs,
> >> +			const uint16_t *sec_bucket_sigs,
> >> +			uint16_t sig,
> >> +			enum rte_hash_sig_compare_function sig_cmp_fn)
> >> +{
> >> +
> >> +	static_assert(sizeof(*hitmask_buffer) >= 2*(RTE_HASH_BUCKET_ENTRIES/8),
> >> +	"The hitmask must be exactly wide enough to accept the whole hitmask if it is dense");
> >> +
> >> +	/* For match mask every bits indicates the match */
> >> +	switch (sig_cmp_fn) {
> >> +#if RTE_HASH_BUCKET_ENTRIES <= 8
> >> +	case RTE_HASH_COMPARE_NEON: {
> >> +		uint16x8_t vmat, vsig, x;
> >> +		int16x8_t shift = {0, 1, 2, 3, 4, 5, 6, 7};
> >> +		uint16_t low, high;
> >> +
> >> +		vsig = vld1q_dup_u16((uint16_t const *)&sig);
> >> +		/* Compare all signatures in the primary bucket */
> >> +		vmat = vceqq_u16(vsig,
> >> +			vld1q_u16((uint16_t const *)prim_bucket_sigs));
> >> +		x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
> >> +		low = (uint16_t)(vaddvq_u16(x));
> >> +		/* Compare all signatures in the secondary bucket */
> >> +		vmat = vceqq_u16(vsig,
> >> +			vld1q_u16((uint16_t const *)sec_bucket_sigs));
> >> +		x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
> >> +		high = (uint16_t)(vaddvq_u16(x));
> >> +		*hitmask_buffer = low | high << RTE_HASH_BUCKET_ENTRIES;
> >> +
> >> +		}
> >> +		break;
> >> +#endif
> >> +	default:
> >> +		for (unsigned int i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
> >> +			*hitmask_buffer |=
> >> +				((sig == prim_bucket_sigs[i]) << i);
> >> +			*hitmask_buffer |=
> >> +				((sig == sec_bucket_sigs[i]) << i) << RTE_HASH_BUCKET_ENTRIES;
> >> +		}
> >> +	}
> >> +}
> >> diff --git a/lib/hash/arch/common/compare_signatures.h b/lib/hash/arch/common/compare_signatures.h
> >> new file mode 100644
> >> index 0000000000..dcf9444032
> >> --- /dev/null
> >> +++ b/lib/hash/arch/common/compare_signatures.h
> >> @@ -0,0 +1,38 @@
> >> +/* SPDX-License-Identifier: BSD-3-Clause
> >> + * Copyright(c) 2010-2016 Intel Corporation
> >> + * Copyright(c) 2018-2024 Arm Limited
> >> + */
> >> +
> >> +/*
> >> + * The generic version could use either a dense or sparsely packed hitmask buffer,
> >> + * but the dense one is slightly faster.
> >> + */
> >> +
> >> +#include <inttypes.h>
> >> +#include <rte_common.h>
> >> +#include <rte_vect.h>
> >> +#include "rte_cuckoo_hash.h"
> >> +
> >> +#define DENSE_HASH_BULK_LOOKUP 1
> >> +
> >> +static inline void
> >> +compare_signatures_dense(uint16_t *hitmask_buffer,
> >> +			const uint16_t *prim_bucket_sigs,
> >> +			const uint16_t *sec_bucket_sigs,
> >> +			uint16_t sig,
> >> +			enum rte_hash_sig_compare_function sig_cmp_fn)
> >> +{
> >> +	(void) sig_cmp_fn;
> >> +
> >> +	static_assert(sizeof(*hitmask_buffer) >= 2*(RTE_HASH_BUCKET_ENTRIES/8),
> >> +	"The hitmask must be exactly wide enough to accept the whole hitmask if it is dense");
> >> +
> >> +	/* For match mask every bits indicates the match */
> >> +	for (unsigned int i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
> >> +		*hitmask_buffer |=
> >> +			((sig == prim_bucket_sigs[i]) << i);
> >> +		*hitmask_buffer |=
> >> +			((sig == sec_bucket_sigs[i]) << i) << RTE_HASH_BUCKET_ENTRIES;
> >> +	}
> >> +
> >> +}
> >
> > Thanks for re-factoring compare_signatures_...() code, it looks much cleaner that way.
> > One question I have - does it mean that now for x86 we always use 'sparse' while for all other
> > ARM and non-ARM platforms we switch to 'dense'?
> 
> Yes it does. x86 support only the sparse method (the legacy one). Arm
> and generic code could support both dense and sparse. The reason I made
> them use the dense method is because it was slightly faster in my tests.

Ok, but before that, a 'generic' one (non-x86 and non-ARM) used 'sparse' one, correct?
If so, then probably need to outline it a bit more in patch comments and might be even release notes.
At least that would be my expectations, probably hash lib maintainers need to say what is the best way here.
The code refactoring itself - LGTM.

> (no need to add padding and shifts amongst other benefit.)
> 
> >
> >> diff --git a/lib/hash/arch/x86/compare_signatures.h b/lib/hash/arch/x86/compare_signatures.h
> >> new file mode 100644
> >> index 0000000000..7eec499e1f
> >> --- /dev/null
> >> +++ b/lib/hash/arch/x86/compare_signatures.h
> >> @@ -0,0 +1,53 @@
> >> +/* SPDX-License-Identifier: BSD-3-Clause
> >> + * Copyright(c) 2010-2016 Intel Corporation
> >> + * Copyright(c) 2018-2024 Arm Limited
> >> + */
> >> +
> >> +/*
> >> + * x86's version uses a sparsely packed hitmask buffer:
> >> + * Every other bit is padding.
> >> + */
> >> +
> >> +#include <inttypes.h>
> >> +#include <rte_common.h>
> >> +#include <rte_vect.h>
> >> +#include "rte_cuckoo_hash.h"
> >> +
> >> +#define DENSE_HASH_BULK_LOOKUP 0
> >> +
> >> +static inline void
> >> +compare_signatures_sparse(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
> >> +			const struct rte_hash_bucket *prim_bkt,
> >> +			const struct rte_hash_bucket *sec_bkt,
> >> +			uint16_t sig,
> >> +			enum rte_hash_sig_compare_function sig_cmp_fn)
> >> +{
> >> +	/* For match mask the first bit of every two bits indicates the match */
> >> +	switch (sig_cmp_fn) {
> >> +#if defined(__SSE2__) && RTE_HASH_BUCKET_ENTRIES <= 8
> >> +	case RTE_HASH_COMPARE_SSE:
> >> +		/* Compare all signatures in the bucket */
> >> +		*prim_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
> >> +				_mm_load_si128(
> >> +					(__m128i const *)prim_bkt->sig_current),
> >> +				_mm_set1_epi16(sig)));
> >> +		/* Extract the even-index bits only */
> >> +		*prim_hash_matches &= 0x5555;
> >> +		/* Compare all signatures in the bucket */
> >> +		*sec_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
> >> +				_mm_load_si128(
> >> +					(__m128i const *)sec_bkt->sig_current),
> >> +				_mm_set1_epi16(sig)));
> >> +		/* Extract the even-index bits only */
> >> +		*sec_hash_matches &= 0x5555;
> >> +		break;
> >> +#endif /* defined(__SSE2__) */
> >> +	default:
> >> +		for (unsigned int i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
> >> +			*prim_hash_matches |=
> >> +				((sig == prim_bkt->sig_current[i]) << (i << 1));
> >> +			*sec_hash_matches |=
> >> +				((sig == sec_bkt->sig_current[i]) << (i << 1));
> >> +		}
> >> +	}
> >> +}


  reply	other threads:[~2024-03-19 13:26 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-20 16:51 [PATCH v2 0/4] hash: add SVE support for bulk key lookup Yoan Picchi
2023-10-20 16:51 ` [PATCH v2 1/4] hash: pack the hitmask for hash in bulk lookup Yoan Picchi
2023-10-20 16:51 ` [PATCH v2 2/4] hash: optimize compare signature for NEON Yoan Picchi
2023-10-20 16:51 ` [PATCH v2 3/4] test/hash: check bulk lookup of keys after collision Yoan Picchi
2023-10-20 16:51 ` [PATCH v2 4/4] hash: add SVE support for bulk key lookup Yoan Picchi
2024-02-27 17:41 ` [PATCH v5 0/4] " Yoan Picchi
2024-02-27 17:42   ` [PATCH v5 1/4] hash: pack the hitmask for hash in bulk lookup Yoan Picchi
2024-02-27 17:42   ` [PATCH v5 2/4] hash: optimize compare signature for NEON Yoan Picchi
2024-02-27 17:42   ` [PATCH v5 3/4] test/hash: check bulk lookup of keys after collision Yoan Picchi
2024-02-27 17:42   ` [PATCH v5 4/4] hash: add SVE support for bulk key lookup Yoan Picchi
2024-02-28 10:56     ` Konstantin Ananyev
2024-02-28 14:48       ` Yoan Picchi
2024-03-04 13:35         ` Konstantin Ananyev
2024-03-05 15:36           ` Yoan Picchi
2024-03-11 23:21 ` [PATCH v6 0/4] " Yoan Picchi
2024-03-11 23:21   ` [PATCH v6 1/4] hash: pack the hitmask for hash in bulk lookup Yoan Picchi
2024-03-11 23:21   ` [PATCH v6 2/4] hash: optimize compare signature for NEON Yoan Picchi
2024-03-11 23:21   ` [PATCH v6 3/4] test/hash: check bulk lookup of keys after collision Yoan Picchi
2024-03-11 23:21   ` [PATCH v6 4/4] hash: add SVE support for bulk key lookup Yoan Picchi
2024-03-12  3:57     ` fengchengwen
2024-03-12 15:08       ` Yoan Picchi
2024-03-12 15:42 ` [PATCH v7 0/4] " Yoan Picchi
2024-03-12 15:42   ` [PATCH v7 1/4] hash: pack the hitmask for hash in bulk lookup Yoan Picchi
2024-03-19 10:41     ` Konstantin Ananyev
2024-03-19 13:09       ` Yoan Picchi
2024-03-19 13:25         ` Konstantin Ananyev [this message]
2024-03-19 16:09     ` Stephen Hemminger
2024-03-12 15:42   ` [PATCH v7 2/4] hash: optimize compare signature for NEON Yoan Picchi
2024-03-20  7:37     ` [EXTERNAL] " Pavan Nikhilesh Bhagavatula
2024-04-11 13:32       ` Yoan Picchi
2024-03-12 15:42   ` [PATCH v7 3/4] test/hash: check bulk lookup of keys after collision Yoan Picchi
2024-03-12 15:42   ` [PATCH v7 4/4] hash: add SVE support for bulk key lookup Yoan Picchi
2024-04-17 16:08 ` [PATCH v8 0/4] " Yoan Picchi
2024-04-17 16:08   ` [PATCH v8 1/4] hash: pack the hitmask for hash in bulk lookup Yoan Picchi
2024-04-17 18:12     ` Stephen Hemminger
2024-04-17 16:08   ` [PATCH v8 2/4] hash: optimize compare signature for NEON Yoan Picchi
2024-04-17 16:08   ` [PATCH v8 3/4] test/hash: check bulk lookup of keys after collision Yoan Picchi
2024-04-17 16:08   ` [PATCH v8 4/4] hash: add SVE support for bulk key lookup Yoan Picchi
2024-04-30 16:27 ` [PATCH v9 0/4] " Yoan Picchi
2024-04-30 16:27   ` [PATCH v9 1/4] hash: pack the hitmask for hash in bulk lookup Yoan Picchi
2024-04-30 16:27   ` [PATCH v9 2/4] hash: optimize compare signature for NEON Yoan Picchi
2024-04-30 16:27   ` [PATCH v9 3/4] test/hash: check bulk lookup of keys after collision Yoan Picchi
2024-04-30 16:27   ` [PATCH v9 4/4] hash: add SVE support for bulk key lookup Yoan Picchi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3cfce8e3b128473096e1d43683fbe6f0@huawei.com \
    --to=konstantin.ananyev@huawei.com \
    --cc=bruce.richardson@intel.com \
    --cc=dev@dpdk.org \
    --cc=nathan.brown@arm.com \
    --cc=nd@arm.com \
    --cc=ruifeng.wang@arm.com \
    --cc=sameh.gobriel@intel.com \
    --cc=thomas@monjalon.net \
    --cc=vladimir.medvedkin@intel.com \
    --cc=yipeng1.wang@intel.com \
    --cc=yoan.picchi@arm.com \
    --cc=yoan.picchi@foss.arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).