DPDK patches and discussions
From: Bruce Richardson <bruce.richardson@intel.com>
To: Jieqiang Wang <jieqiang.wang@arm.com>
Cc: Yipeng Wang <yipeng1.wang@intel.com>,
	Sameh Gobriel <sameh.gobriel@intel.com>,
	Vladimir Medvedkin <vladimir.medvedkin@intel.com>,
	 Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>,
	Dharmik Thakkar <dharmik.thakkar@arm.com>, <dev@dpdk.org>,
	<nd@arm.com>, <stable@dpdk.org>,
	Feifei Wang <feifei.wang2@arm.com>,
	Ruifeng Wang <ruifeng.wang@arm.com>
Subject: Re: [PATCH] hash: fix SSE comparison
Date: Mon, 2 Oct 2023 11:39:54 +0100
Message-ID: <ZRqd+uQnCb2ivanN@bricha3-MOBL.ger.corp.intel.com>
In-Reply-To: <20230906023100.3618303-1-jieqiang.wang@arm.com>

On Wed, Sep 06, 2023 at 10:31:00AM +0800, Jieqiang Wang wrote:
> _mm_cmpeq_epi16 returns 0xFFFF if the corresponding 16-bit elements are
> equal. The original SSE2 implementation of compare_signatures uses
> _mm_movemask_epi8 to build a mask from the MSB of each 8-bit element,
> whereas only the MSB of the lower 8 bits of each 16-bit element should
> be considered.
> For example, if all elements compare equal, the SSE2 path returns
> 0xFFFF while the NEON and default scalar paths return 0x5555.
> Although this bug causes no negative effects, since the caller
> solely examines the trailing zeros of each match mask, we recommend
> this fix to ensure consistency with the NEON and default scalar code
> behavior.
> 
> Fixes: c7d93df552c2 ("hash: use partial-key hashing")
> Cc: yipeng1.wang@intel.com
> Cc: stable@dpdk.org
> 
> Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> Signed-off-by: Jieqiang Wang <jieqiang.wang@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
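
To make the mask values in the commit message concrete, here is a minimal
scalar model of the two conventions. It is only a sketch: the function
names and BUCKET_ENTRIES are made up for illustration, and halving the
trailing-zero count to get an entry index is an assumption about how the
caller uses the mask, not something stated in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define BUCKET_ENTRIES 8 /* assumed: eight 16-bit signatures per bucket */

/*
 * Scalar model of _mm_movemask_epi8(_mm_cmpeq_epi16(sigs, sig)).
 * A matching 16-bit lane compares to 0xFFFF, so both of its bytes have
 * the MSB set and the lane contributes two adjacent mask bits.
 */
static uint32_t
model_unmasked_sse2(const uint16_t sigs[BUCKET_ENTRIES], uint16_t sig)
{
	uint32_t mask = 0;

	for (unsigned int i = 0; i < BUCKET_ENTRIES; i++)
		if (sigs[i] == sig)
			mask |= UINT32_C(3) << (2 * i);
	return mask;
}

/* Model of the NEON/scalar convention: only the low bit of each pair. */
static uint32_t
model_low_bit(const uint16_t sigs[BUCKET_ENTRIES], uint16_t sig)
{
	uint32_t mask = 0;

	for (unsigned int i = 0; i < BUCKET_ENTRIES; i++)
		if (sigs[i] == sig)
			mask |= UINT32_C(1) << (2 * i);
	return mask;
}

/*
 * Index of the first matching entry, derived from the trailing zeros.
 * Halving the count is an assumption about how the caller maps a bit
 * position back to an entry; it gives the same index for both masks.
 */
static unsigned int
first_match_index(uint32_t mask)
{
	unsigned int tz = 0;

	assert(mask != 0); /* no trailing-zero count exists for 0 */
	while ((mask & 1) == 0) {
		mask >>= 1;
		tz++;
	}
	return tz / 2;
}
```

With all eight signatures equal, the first model gives 0xFFFF and the
second 0x5555, yet first_match_index() agrees on both, which is why the
difference is harmless as long as only trailing zeros are examined.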

Fix looks correct, but see comment below. I think we can convert the vector
mask to a simpler - and possibly faster - scalar one.

/Bruce

> ---
>  lib/hash/rte_cuckoo_hash.c | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
> index d92a903bb3..acaa8b74bd 100644
> --- a/lib/hash/rte_cuckoo_hash.c
> +++ b/lib/hash/rte_cuckoo_hash.c
> @@ -1862,17 +1862,19 @@ compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
>  	/* For match mask the first bit of every two bits indicates the match */
>  	switch (sig_cmp_fn) {
>  #if defined(__SSE2__)
> -	case RTE_HASH_COMPARE_SSE:
> +	case RTE_HASH_COMPARE_SSE: {
>  		/* Compare all signatures in the bucket */
> -		*prim_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
> -				_mm_load_si128(
> +		__m128i shift_mask = _mm_set1_epi16(0x0080);

Not sure that this variable name is the most descriptive, as we don't
actually shift anything with it. How about "results_mask"?

> +		__m128i prim_cmp = _mm_cmpeq_epi16(_mm_load_si128(
>  					(__m128i const *)prim_bkt->sig_current),
> -				_mm_set1_epi16(sig)));
> +					_mm_set1_epi16(sig));
> +		*prim_hash_matches = _mm_movemask_epi8(_mm_and_si128(prim_cmp, shift_mask));

While this will work as you describe, I think the simpler solution here
is to apply a scalar mask rather than a vector one. This would also
avoid the extra vector constant load, since the movemask result can just
be ANDed with the compile-time constant 0x5555.
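
A rough sketch of the scalar-mask idea next to the vector-mask version
from the patch follows. The helper names and BUCKET_ENTRIES are
hypothetical, and an unaligned load is used so the sketch stands alone
(the patch itself uses _mm_load_si128 on the bucket data). The sketch
uses 0x5555, the constant that keeps the low bit of each pair and so
reproduces the all-match value quoted in the commit message; 0xAAAA
would keep the high bit of each pair instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

#define BUCKET_ENTRIES 8 /* assumed: eight 16-bit signatures per bucket */

/* Hypothetical helper: scalar AND applied after the movemask. */
static uint32_t
match_mask_scalar_and(const uint16_t sigs[BUCKET_ENTRIES], uint16_t sig)
{
	assert(sigs != NULL);
#if defined(__SSE2__)
	/* Unaligned load so the sketch has no alignment requirement. */
	__m128i v = _mm_loadu_si128((const __m128i *)sigs);
	__m128i eq = _mm_cmpeq_epi16(v, _mm_set1_epi16((short)sig));

	/* movemask yields two bits per matching lane; keep the low one. */
	return (uint32_t)_mm_movemask_epi8(eq) & 0x5555;
#else
	/* Portable fallback producing the same mask layout. */
	uint32_t mask = 0;

	for (unsigned int i = 0; i < BUCKET_ENTRIES; i++)
		if (sigs[i] == sig)
			mask |= UINT32_C(1) << (2 * i);
	return mask;
#endif
}

/* Hypothetical helper: vector AND before the movemask, as in the patch. */
static uint32_t
match_mask_vector_and(const uint16_t sigs[BUCKET_ENTRIES], uint16_t sig)
{
#if defined(__SSE2__)
	/* 0x0080 keeps only the MSB of the low byte of each 16-bit lane. */
	__m128i keep = _mm_set1_epi16(0x0080);
	__m128i v = _mm_loadu_si128((const __m128i *)sigs);
	__m128i eq = _mm_cmpeq_epi16(v, _mm_set1_epi16((short)sig));

	return (uint32_t)_mm_movemask_epi8(_mm_and_si128(eq, keep));
#else
	return match_mask_scalar_and(sigs, sig);
#endif
}
```

Both helpers should return the same value for any input; the scalar
variant just trades the vector constant and _mm_and_si128 for a single
integer AND.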

>  		/* Compare all signatures in the bucket */
> -		*sec_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
> -				_mm_load_si128(
> +		__m128i sec_cmp = _mm_cmpeq_epi16(_mm_load_si128(
>  					(__m128i const *)sec_bkt->sig_current),
> -				_mm_set1_epi16(sig)));
> +					_mm_set1_epi16(sig));
> +		*sec_hash_matches = _mm_movemask_epi8(_mm_and_si128(sec_cmp, shift_mask));
> +		}
>  		break;
>  #elif defined(__ARM_NEON)
>  	case RTE_HASH_COMPARE_NEON: {
> -- 
> 2.25.1
> 

Thread overview: 8+ messages
2023-09-06  2:31 Jieqiang Wang
2023-09-29 15:32 ` David Marchand
2023-10-02 10:39 ` Bruce Richardson [this message]
2023-10-07  6:41   ` Re: " Jieqiang Wang
2023-10-07  7:15 ` [PATCH v2] " Jieqiang Wang
2023-10-07  7:36 ` [PATCH v3] " Jieqiang Wang
2023-10-09 14:33   ` Bruce Richardson
2023-10-10  9:50     ` David Marchand
