patches for DPDK stable branches
 help / color / mirror / Atom feed
* [PATCH] hash: fix SSE comparison
@ 2023-09-06  2:31 Jieqiang Wang
  2023-09-29 15:32 ` David Marchand
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Jieqiang Wang @ 2023-09-06  2:31 UTC (permalink / raw)
  To: Yipeng Wang, Sameh Gobriel, Bruce Richardson, Vladimir Medvedkin,
	Honnappa Nagarahalli, Dharmik Thakkar
  Cc: dev, nd, Jieqiang Wang, stable, Feifei Wang, Ruifeng Wang

__mm_cmpeq_epi16 returns 0xFFFF if the corresponding 16-bit elements are
equal. In original SSE2 implementation for function compare_signatures,
it utilizes _mm_movemask_epi8 to create mask from the MSB of each 8-bit
element, while we should only care about the MSB of lower 8-bit in each
16-bit element.
For example, if the comparison result is all equal, SSE2 path returns
0xFFFF while NEON and default scalar path return 0x5555.
Although this bug is not causing any negative effects since the caller
function solely examines the trailing zeros of each match mask, we
recommend this fix to ensure consistency with NEON and default scalar
code behaviors.

Fixes: c7d93df552c2 ("hash: use partial-key hashing")
Cc: yipeng1.wang@intel.com
Cc: stable@dpdk.org

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Signed-off-by: Jieqiang Wang <jieqiang.wang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/hash/rte_cuckoo_hash.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
index d92a903bb3..acaa8b74bd 100644
--- a/lib/hash/rte_cuckoo_hash.c
+++ b/lib/hash/rte_cuckoo_hash.c
@@ -1862,17 +1862,19 @@ compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
 	/* For match mask the first bit of every two bits indicates the match */
 	switch (sig_cmp_fn) {
 #if defined(__SSE2__)
-	case RTE_HASH_COMPARE_SSE:
+	case RTE_HASH_COMPARE_SSE: {
 		/* Compare all signatures in the bucket */
-		*prim_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
-				_mm_load_si128(
+		__m128i shift_mask = _mm_set1_epi16(0x0080);
+		__m128i prim_cmp = _mm_cmpeq_epi16(_mm_load_si128(
 					(__m128i const *)prim_bkt->sig_current),
-				_mm_set1_epi16(sig)));
+					_mm_set1_epi16(sig));
+		*prim_hash_matches = _mm_movemask_epi8(_mm_and_si128(prim_cmp, shift_mask));
 		/* Compare all signatures in the bucket */
-		*sec_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
-				_mm_load_si128(
+		__m128i sec_cmp = _mm_cmpeq_epi16(_mm_load_si128(
 					(__m128i const *)sec_bkt->sig_current),
-				_mm_set1_epi16(sig)));
+					_mm_set1_epi16(sig));
+		*sec_hash_matches = _mm_movemask_epi8(_mm_and_si128(sec_cmp, shift_mask));
+		}
 		break;
 #elif defined(__ARM_NEON)
 	case RTE_HASH_COMPARE_NEON: {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] hash: fix SSE comparison
  2023-09-06  2:31 [PATCH] hash: fix SSE comparison Jieqiang Wang
@ 2023-09-29 15:32 ` David Marchand
  2023-10-02 10:39 ` Bruce Richardson
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: David Marchand @ 2023-09-29 15:32 UTC (permalink / raw)
  To: Yipeng Wang, Sameh Gobriel, Bruce Richardson, Vladimir Medvedkin
  Cc: Honnappa Nagarahalli, Dharmik Thakkar, dev, nd, Jieqiang Wang,
	stable, Feifei Wang, Ruifeng Wang

On Wed, Sep 6, 2023 at 4:31 AM Jieqiang Wang <jieqiang.wang@arm.com> wrote:
>
> __mm_cmpeq_epi16 returns 0xFFFF if the corresponding 16-bit elements are
> equal. In original SSE2 implementation for function compare_signatures,
> it utilizes _mm_movemask_epi8 to create mask from the MSB of each 8-bit
> element, while we should only care about the MSB of lower 8-bit in each
> 16-bit element.
> For example, if the comparison result is all equal, SSE2 path returns
> 0xFFFF while NEON and default scalar path return 0x5555.
> Although this bug is not causing any negative effects since the caller
> function solely examines the trailing zeros of each match mask, we
> recommend this fix to ensure consistency with NEON and default scalar
> code behaviors.
>
> Fixes: c7d93df552c2 ("hash: use partial-key hashing")
> Cc: yipeng1.wang@intel.com
> Cc: stable@dpdk.org
>
> Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> Signed-off-by: Jieqiang Wang <jieqiang.wang@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>

A review from this library maintainers please?


-- 
David Marchand


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] hash: fix SSE comparison
  2023-09-06  2:31 [PATCH] hash: fix SSE comparison Jieqiang Wang
  2023-09-29 15:32 ` David Marchand
@ 2023-10-02 10:39 ` Bruce Richardson
  2023-10-07  6:41   ` 回复: " Jieqiang Wang
  2023-10-07  7:15 ` [PATCH v2] " Jieqiang Wang
  2023-10-07  7:36 ` [PATCH v3] " Jieqiang Wang
  3 siblings, 1 reply; 8+ messages in thread
From: Bruce Richardson @ 2023-10-02 10:39 UTC (permalink / raw)
  To: Jieqiang Wang
  Cc: Yipeng Wang, Sameh Gobriel, Vladimir Medvedkin,
	Honnappa Nagarahalli, Dharmik Thakkar, dev, nd, stable,
	Feifei Wang, Ruifeng Wang

On Wed, Sep 06, 2023 at 10:31:00AM +0800, Jieqiang Wang wrote:
> __mm_cmpeq_epi16 returns 0xFFFF if the corresponding 16-bit elements are
> equal. In original SSE2 implementation for function compare_signatures,
> it utilizes _mm_movemask_epi8 to create mask from the MSB of each 8-bit
> element, while we should only care about the MSB of lower 8-bit in each
> 16-bit element.
> For example, if the comparison result is all equal, SSE2 path returns
> 0xFFFF while NEON and default scalar path return 0x5555.
> Although this bug is not causing any negative effects since the caller
> function solely examines the trailing zeros of each match mask, we
> recommend this fix to ensure consistency with NEON and default scalar
> code behaviors.
> 
> Fixes: c7d93df552c2 ("hash: use partial-key hashing")
> Cc: yipeng1.wang@intel.com
> Cc: stable@dpdk.org
> 
> Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> Signed-off-by: Jieqiang Wang <jieqiang.wang@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>

Fix looks correct, but see comment below. I think we can convert the vector
mask to a simpler - and possibly faster - scalar one.

/Bruce

> ---
>  lib/hash/rte_cuckoo_hash.c | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
> index d92a903bb3..acaa8b74bd 100644
> --- a/lib/hash/rte_cuckoo_hash.c
> +++ b/lib/hash/rte_cuckoo_hash.c
> @@ -1862,17 +1862,19 @@ compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
>  	/* For match mask the first bit of every two bits indicates the match */
>  	switch (sig_cmp_fn) {
>  #if defined(__SSE2__)
> -	case RTE_HASH_COMPARE_SSE:
> +	case RTE_HASH_COMPARE_SSE: {
>  		/* Compare all signatures in the bucket */
> -		*prim_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
> -				_mm_load_si128(
> +		__m128i shift_mask = _mm_set1_epi16(0x0080);

Not sure that this variable name is the most descriptive, as we don't
actually shift anything using this. How about "results_mask".

> +		__m128i prim_cmp = _mm_cmpeq_epi16(_mm_load_si128(
>  					(__m128i const *)prim_bkt->sig_current),
> -				_mm_set1_epi16(sig)));
> +					_mm_set1_epi16(sig));
> +		*prim_hash_matches = _mm_movemask_epi8(_mm_and_si128(prim_cmp, shift_mask));

While this will work like you describe, I would think the simpler solution
here is not to do a vector mask, but instead to simply do a scalar one.
This would save extra vector loads too, since all values could just be
masked with compile-time constant 0xAAAA.

>  		/* Compare all signatures in the bucket */
> -		*sec_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
> -				_mm_load_si128(
> +		__m128i sec_cmp = _mm_cmpeq_epi16(_mm_load_si128(
>  					(__m128i const *)sec_bkt->sig_current),
> -				_mm_set1_epi16(sig)));
> +					_mm_set1_epi16(sig));
> +		*sec_hash_matches = _mm_movemask_epi8(_mm_and_si128(sec_cmp, shift_mask));
> +		}
>  		break;
>  #elif defined(__ARM_NEON)
>  	case RTE_HASH_COMPARE_NEON: {
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* 回复: [PATCH] hash: fix SSE comparison
  2023-10-02 10:39 ` Bruce Richardson
@ 2023-10-07  6:41   ` Jieqiang Wang
  0 siblings, 0 replies; 8+ messages in thread
From: Jieqiang Wang @ 2023-10-07  6:41 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: Yipeng Wang, Sameh Gobriel, Vladimir Medvedkin,
	Honnappa Nagarahalli, Dharmik Jayesh Thakkar, dev, nd, stable,
	Feifei Wang, Ruifeng Wang

Thanks for your comments, Bruce!
A few comments inline.

BR,
Jieqiang Wang
-----邮件原件-----
发件人: Bruce Richardson <bruce.richardson@intel.com>
发送时间: Monday, October 2, 2023 6:40 PM
收件人: Jieqiang Wang <Jieqiang.Wang@arm.com>
抄送: Yipeng Wang <yipeng1.wang@intel.com>; Sameh Gobriel <sameh.gobriel@intel.com>; Vladimir Medvedkin <vladimir.medvedkin@intel.com>; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Dharmik Jayesh Thakkar <DharmikJayesh.Thakkar@arm.com>; dev@dpdk.org; nd <nd@arm.com>; stable@dpdk.org; Feifei Wang <Feifei.Wang2@arm.com>; Ruifeng Wang <Ruifeng.Wang@arm.com>
主题: Re: [PATCH] hash: fix SSE comparison

On Wed, Sep 06, 2023 at 10:31:00AM +0800, Jieqiang Wang wrote:
> __mm_cmpeq_epi16 returns 0xFFFF if the corresponding 16-bit elements
> are equal. In original SSE2 implementation for function
> compare_signatures, it utilizes _mm_movemask_epi8 to create mask from
> the MSB of each 8-bit element, while we should only care about the MSB
> of lower 8-bit in each 16-bit element.
> For example, if the comparison result is all equal, SSE2 path returns
> 0xFFFF while NEON and default scalar path return 0x5555.
> Although this bug is not causing any negative effects since the caller
> function solely examines the trailing zeros of each match mask, we
> recommend this fix to ensure consistency with NEON and default scalar
> code behaviors.
>
> Fixes: c7d93df552c2 ("hash: use partial-key hashing")
> Cc: yipeng1.wang@intel.com
> Cc: stable@dpdk.org
>
> Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> Signed-off-by: Jieqiang Wang <jieqiang.wang@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>

Fix looks correct, but see comment below. I think we can convert the vector mask to a simpler - and possibly faster - scalar one.

/Bruce

> ---
>  lib/hash/rte_cuckoo_hash.c | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
> index d92a903bb3..acaa8b74bd 100644
> --- a/lib/hash/rte_cuckoo_hash.c
> +++ b/lib/hash/rte_cuckoo_hash.c
> @@ -1862,17 +1862,19 @@ compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
>       /* For match mask the first bit of every two bits indicates the match */
>       switch (sig_cmp_fn) {
>  #if defined(__SSE2__)
> -     case RTE_HASH_COMPARE_SSE:
> +     case RTE_HASH_COMPARE_SSE: {
>               /* Compare all signatures in the bucket */
> -             *prim_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
> -                             _mm_load_si128(
> +             __m128i shift_mask = _mm_set1_epi16(0x0080);

Not sure that this variable name is the most descriptive, as we don't actually shift anything using this. How about "results_mask".

Ack.

> +             __m128i prim_cmp = _mm_cmpeq_epi16(_mm_load_si128(
>                                       (__m128i const *)prim_bkt->sig_current),
> -                             _mm_set1_epi16(sig)));
> +                                     _mm_set1_epi16(sig));
> +             *prim_hash_matches = _mm_movemask_epi8(_mm_and_si128(prim_cmp,
> +shift_mask));

While this will work like you describe, I would think the simpler solution here is not to do a vector mask, but instead to simply do a scalar one.
This would save extra vector loads too, since all values could just be masked with compile-time constant 0xAAAA.

Bingo! That's indeed a better way to fix this issue. Just to confirm my understanding: we don't need to construct a vector mask to execute AND operation with the compared mask. Instead, we can AND the result(prim_hash_matches/sec_hash_matches) with a constant mask in the end. But It appears the correct constant should be 0x5555, not 0xAAAA, because we only care about the even-index bits based on the code logic of the default scalar path.

>               /* Compare all signatures in the bucket */
> -             *sec_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
> -                             _mm_load_si128(
> +             __m128i sec_cmp = _mm_cmpeq_epi16(_mm_load_si128(
>                                       (__m128i const *)sec_bkt->sig_current),
> -                             _mm_set1_epi16(sig)));
> +                                     _mm_set1_epi16(sig));
> +             *sec_hash_matches = _mm_movemask_epi8(_mm_and_si128(sec_cmp, shift_mask));
> +             }
>               break;
>  #elif defined(__ARM_NEON)
>       case RTE_HASH_COMPARE_NEON: {
> --
> 2.25.1
>
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2] hash: fix SSE comparison
  2023-09-06  2:31 [PATCH] hash: fix SSE comparison Jieqiang Wang
  2023-09-29 15:32 ` David Marchand
  2023-10-02 10:39 ` Bruce Richardson
@ 2023-10-07  7:15 ` Jieqiang Wang
  2023-10-07  7:36 ` [PATCH v3] " Jieqiang Wang
  3 siblings, 0 replies; 8+ messages in thread
From: Jieqiang Wang @ 2023-10-07  7:15 UTC (permalink / raw)
  To: Yipeng Wang, Sameh Gobriel, Bruce Richardson, Vladimir Medvedkin,
	Dharmik Thakkar, Honnappa Nagarahalli
  Cc: dev, nd, Jieqiang Wang, stable, Feifei Wang, Ruifeng Wang

__mm_cmpeq_epi16 returns 0xFFFF if the corresponding 16-bit elements are
equal. In original SSE2 implementation for function compare_signatures,
it utilizes _mm_movemask_epi8 to create mask from the MSB of each 8-bit
element, while we should only care about the MSB of lower 8-bit in each
16-bit element.
For example, if the comparison result is all equal, SSE2 path returns
0xFFFF while NEON and default scalar path return 0x5555.
Although this bug is not causing any negative effects since the caller
function solely examines the trailing zeros of each match mask, we
recommend this fix to ensure consistency with NEON and default scalar
code behaviors.

Fixes: c7d93df552c2 ("hash: use partial-key hashing")
Cc: stable@dpdk.org

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Signed-off-by: Jieqiang Wang <jieqiang.wang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/hash/rte_cuckoo_hash.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
index d92a903bb3..d348d45246 100644
--- a/lib/hash/rte_cuckoo_hash.c
+++ b/lib/hash/rte_cuckoo_hash.c
@@ -1868,11 +1868,15 @@ compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
 				_mm_load_si128(
 					(__m128i const *)prim_bkt->sig_current),
 				_mm_set1_epi16(sig)));
+        /* Extract the even-index bits only */
+        *prim_hash_matches &= 0x5555;
 		/* Compare all signatures in the bucket */
 		*sec_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
 				_mm_load_si128(
 					(__m128i const *)sec_bkt->sig_current),
 				_mm_set1_epi16(sig)));
+        /* Extract the even-index bits only */
+        *sec_hash_matches &= 0x5555;
 		break;
 #elif defined(__ARM_NEON)
 	case RTE_HASH_COMPARE_NEON: {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v3] hash: fix SSE comparison
  2023-09-06  2:31 [PATCH] hash: fix SSE comparison Jieqiang Wang
                   ` (2 preceding siblings ...)
  2023-10-07  7:15 ` [PATCH v2] " Jieqiang Wang
@ 2023-10-07  7:36 ` Jieqiang Wang
  2023-10-09 14:33   ` Bruce Richardson
  3 siblings, 1 reply; 8+ messages in thread
From: Jieqiang Wang @ 2023-10-07  7:36 UTC (permalink / raw)
  To: Yipeng Wang, Sameh Gobriel, Bruce Richardson, Vladimir Medvedkin,
	Honnappa Nagarahalli, Dharmik Thakkar
  Cc: dev, nd, Jieqiang Wang, stable, Feifei Wang, Ruifeng Wang

__mm_cmpeq_epi16 returns 0xFFFF if the corresponding 16-bit elements are
equal. In original SSE2 implementation for function compare_signatures,
it utilizes _mm_movemask_epi8 to create mask from the MSB of each 8-bit
element, while we should only care about the MSB of lower 8-bit in each
16-bit element.
For example, if the comparison result is all equal, SSE2 path returns
0xFFFF while NEON and default scalar path return 0x5555.
Although this bug is not causing any negative effects since the caller
function solely examines the trailing zeros of each match mask, we
recommend this fix to ensure consistency with NEON and default scalar
code behaviors.

Fixes: c7d93df552c2 ("hash: use partial-key hashing")
Cc: stable@dpdk.org

v2:
1. Utilize scalar mask instead of vector mask to save extra loads (Bruce)

v3:
1. Fix coding style warnings

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Signed-off-by: Jieqiang Wang <jieqiang.wang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/hash/rte_cuckoo_hash.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
index d92a903bb3..19b23f2a97 100644
--- a/lib/hash/rte_cuckoo_hash.c
+++ b/lib/hash/rte_cuckoo_hash.c
@@ -1868,11 +1868,15 @@ compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
 				_mm_load_si128(
 					(__m128i const *)prim_bkt->sig_current),
 				_mm_set1_epi16(sig)));
+		/* Extract the even-index bits only */
+		*prim_hash_matches &= 0x5555;
 		/* Compare all signatures in the bucket */
 		*sec_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
 				_mm_load_si128(
 					(__m128i const *)sec_bkt->sig_current),
 				_mm_set1_epi16(sig)));
+		/* Extract the even-index bits only */
+		*sec_hash_matches &= 0x5555;
 		break;
 #elif defined(__ARM_NEON)
 	case RTE_HASH_COMPARE_NEON: {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] hash: fix SSE comparison
  2023-10-07  7:36 ` [PATCH v3] " Jieqiang Wang
@ 2023-10-09 14:33   ` Bruce Richardson
  2023-10-10  9:50     ` David Marchand
  0 siblings, 1 reply; 8+ messages in thread
From: Bruce Richardson @ 2023-10-09 14:33 UTC (permalink / raw)
  To: Jieqiang Wang
  Cc: Yipeng Wang, Sameh Gobriel, Vladimir Medvedkin,
	Honnappa Nagarahalli, Dharmik Thakkar, dev, nd, stable,
	Feifei Wang, Ruifeng Wang

On Sat, Oct 07, 2023 at 03:36:34PM +0800, Jieqiang Wang wrote:
> __mm_cmpeq_epi16 returns 0xFFFF if the corresponding 16-bit elements are
> equal. In original SSE2 implementation for function compare_signatures,
> it utilizes _mm_movemask_epi8 to create mask from the MSB of each 8-bit
> element, while we should only care about the MSB of lower 8-bit in each
> 16-bit element.
> For example, if the comparison result is all equal, SSE2 path returns
> 0xFFFF while NEON and default scalar path return 0x5555.
> Although this bug is not causing any negative effects since the caller
> function solely examines the trailing zeros of each match mask, we
> recommend this fix to ensure consistency with NEON and default scalar
> code behaviors.
> 
> Fixes: c7d93df552c2 ("hash: use partial-key hashing")
> Cc: stable@dpdk.org
> 
> v2:
> 1. Utilize scalar mask instead of vector mask to save extra loads (Bruce)
> 
> v3:
> 1. Fix coding style warnings
> 
> Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> Signed-off-by: Jieqiang Wang <jieqiang.wang@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>

Acked-by: Bruce Richardson <bruce.richardson@intel.com>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] hash: fix SSE comparison
  2023-10-09 14:33   ` Bruce Richardson
@ 2023-10-10  9:50     ` David Marchand
  0 siblings, 0 replies; 8+ messages in thread
From: David Marchand @ 2023-10-10  9:50 UTC (permalink / raw)
  To: Jieqiang Wang
  Cc: Bruce Richardson, Yipeng Wang, Sameh Gobriel, Vladimir Medvedkin,
	Honnappa Nagarahalli, Dharmik Thakkar, dev, nd, stable,
	Feifei Wang, Ruifeng Wang

On Mon, Oct 9, 2023 at 4:33 PM Bruce Richardson
<bruce.richardson@intel.com> wrote:
>
> On Sat, Oct 07, 2023 at 03:36:34PM +0800, Jieqiang Wang wrote:
> > __mm_cmpeq_epi16 returns 0xFFFF if the corresponding 16-bit elements are
> > equal. In original SSE2 implementation for function compare_signatures,
> > it utilizes _mm_movemask_epi8 to create mask from the MSB of each 8-bit
> > element, while we should only care about the MSB of lower 8-bit in each
> > 16-bit element.
> > For example, if the comparison result is all equal, SSE2 path returns
> > 0xFFFF while NEON and default scalar path return 0x5555.
> > Although this bug is not causing any negative effects since the caller
> > function solely examines the trailing zeros of each match mask, we
> > recommend this fix to ensure consistency with NEON and default scalar
> > code behaviors.
> >
> > Fixes: c7d93df552c2 ("hash: use partial-key hashing")
> > Cc: stable@dpdk.org
> >
> > v2:
> > 1. Utilize scalar mask instead of vector mask to save extra loads (Bruce)
> >
> > v3:
> > 1. Fix coding style warnings

Changelog and other notes on a patch should not be in the commitlog
itself, but should go after ---.
https://doc.dpdk.org/guides/contributing/patches.html#creating-patches


> >
> > Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> > Signed-off-by: Jieqiang Wang <jieqiang.wang@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>

Applied, thanks Jieqiang and Bruce.


-- 
David Marchand


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-10-10  9:50 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-06  2:31 [PATCH] hash: fix SSE comparison Jieqiang Wang
2023-09-29 15:32 ` David Marchand
2023-10-02 10:39 ` Bruce Richardson
2023-10-07  6:41   ` 回复: " Jieqiang Wang
2023-10-07  7:15 ` [PATCH v2] " Jieqiang Wang
2023-10-07  7:36 ` [PATCH v3] " Jieqiang Wang
2023-10-09 14:33   ` Bruce Richardson
2023-10-10  9:50     ` David Marchand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).