From: Bruce Richardson <bruce.richardson@intel.com>
To: Jieqiang Wang <jieqiang.wang@arm.com>
Cc: Yipeng Wang <yipeng1.wang@intel.com>,
Sameh Gobriel <sameh.gobriel@intel.com>,
Vladimir Medvedkin <vladimir.medvedkin@intel.com>,
Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>,
Dharmik Thakkar <dharmik.thakkar@arm.com>, <dev@dpdk.org>,
<nd@arm.com>, <stable@dpdk.org>,
Feifei Wang <feifei.wang2@arm.com>,
Ruifeng Wang <ruifeng.wang@arm.com>
Subject: Re: [PATCH] hash: fix SSE comparison
Date: Mon, 2 Oct 2023 11:39:54 +0100
Message-ID: <ZRqd+uQnCb2ivanN@bricha3-MOBL.ger.corp.intel.com>
In-Reply-To: <20230906023100.3618303-1-jieqiang.wang@arm.com>
On Wed, Sep 06, 2023 at 10:31:00AM +0800, Jieqiang Wang wrote:
> _mm_cmpeq_epi16 returns 0xFFFF if the corresponding 16-bit elements are
> equal. The original SSE2 implementation of compare_signatures uses
> _mm_movemask_epi8 to create a mask from the MSB of each 8-bit element,
> while we should only care about the MSB of the lower 8 bits of each
> 16-bit element.
> For example, if the comparison result is all equal, SSE2 path returns
> 0xFFFF while NEON and default scalar path return 0x5555.
> Although this bug is not causing any negative effects since the caller
> function solely examines the trailing zeros of each match mask, we
> recommend this fix to ensure consistency with NEON and default scalar
> code behaviors.
>
> Fixes: c7d93df552c2 ("hash: use partial-key hashing")
> Cc: yipeng1.wang@intel.com
> Cc: stable@dpdk.org
>
> Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> Signed-off-by: Jieqiang Wang <jieqiang.wang@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Fix looks correct, but see comment below. I think we can convert the vector
mask to a simpler - and possibly faster - scalar one.
/Bruce
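To illustrate why the commit message can call the old behaviour harmless,
here is a minimal sketch of a hit-mask walk. It assumes the caller takes the
trailing-zero count, halves it to get the entry index, and clears both bits
of the pair before looking for the next hit. The names walk_hits, ctz32 and
check_walk are hypothetical and are not DPDK functions.

```c
#include <assert.h>
#include <stdint.h>

/* Portable count-trailing-zeros; mask must be non-zero. */
static unsigned int ctz32(uint32_t mask)
{
	unsigned int n = 0;

	while ((mask & 1u) == 0) {
		mask >>= 1;
		n++;
	}
	return n;
}

/*
 * Hypothetical caller: collect matching entry indexes from a match mask
 * that uses two bits per entry. Returns the number of hits written to out.
 */
static unsigned int walk_hits(uint32_t mask, unsigned int *out)
{
	unsigned int count = 0;

	while (mask != 0) {
		unsigned int hit_index = ctz32(mask) >> 1;

		out[count++] = hit_index;
		/* Clear both bits of the pair so a 0b11 pair counts once. */
		mask &= ~(UINT32_C(3) << (hit_index << 1));
	}
	return count;
}

/* Self-check: 0xFFFF (old SSE2 result) and 0x5555 yield the same hits. */
static void check_walk(void)
{
	unsigned int a[16], b[16];
	unsigned int na = walk_hits(0xFFFF, a);
	unsigned int nb = walk_hits(0x5555, b);

	assert(na == 8 && nb == 8);
	for (unsigned int i = 0; i < 8; i++)
		assert(a[i] == i && b[i] == i);
}
```

Under that assumption both mask layouts produce the same sequence of entry
indexes, which is why the fix is about consistency rather than behaviour.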
> ---
> lib/hash/rte_cuckoo_hash.c | 16 +++++++++-------
> 1 file changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
> index d92a903bb3..acaa8b74bd 100644
> --- a/lib/hash/rte_cuckoo_hash.c
> +++ b/lib/hash/rte_cuckoo_hash.c
> @@ -1862,17 +1862,19 @@ compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
> /* For match mask the first bit of every two bits indicates the match */
> switch (sig_cmp_fn) {
> #if defined(__SSE2__)
> - case RTE_HASH_COMPARE_SSE:
> + case RTE_HASH_COMPARE_SSE: {
> /* Compare all signatures in the bucket */
> - *prim_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
> - _mm_load_si128(
> + __m128i shift_mask = _mm_set1_epi16(0x0080);
Not sure that this variable name is the most descriptive, as we don't
actually shift anything using it. How about "results_mask"?
> + __m128i prim_cmp = _mm_cmpeq_epi16(_mm_load_si128(
> (__m128i const *)prim_bkt->sig_current),
> - _mm_set1_epi16(sig)));
> + _mm_set1_epi16(sig));
> + *prim_hash_matches = _mm_movemask_epi8(_mm_and_si128(prim_cmp, shift_mask));
While this will work as you describe, I think the simpler solution here is
to do a scalar mask rather than a vector one. This would also save the extra
vector load of the mask constant, since each _mm_movemask_epi8 result could
just be ANDed with the compile-time constant 0x5555, which keeps the bit for
the lower byte of each 16-bit element and matches the NEON and scalar paths.
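As a rough sketch of the two options, the code below compares masking in the
vector domain (as posted) with a single scalar AND after the movemask. It
uses 0x5555, the even-bit pattern that the 0x0080 vector mask selects and
that the commit message gives for the NEON and scalar paths. Assumptions:
eight 16-bit signatures per bucket, an unaligned load so that plain arrays
work, and hypothetical function names. The SSE2 parts are guarded so the
scalar reference still builds on other targets.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Scalar reference: bit 2*i is set when sigs[i] == sig (8 entries assumed). */
static uint32_t match_mask_ref(const uint16_t sigs[8], uint16_t sig)
{
	uint32_t mask = 0;

	for (unsigned int i = 0; i < 8; i++)
		mask |= (uint32_t)(sigs[i] == sig) << (i << 1);
	return mask;
}

#if defined(__SSE2__)
/* As posted: AND with a vector constant before taking the movemask. */
static uint32_t match_mask_vec_and(const uint16_t sigs[8], uint16_t sig)
{
	__m128i results_mask = _mm_set1_epi16(0x0080);
	__m128i cmp = _mm_cmpeq_epi16(
			_mm_loadu_si128((const __m128i *)sigs),
			_mm_set1_epi16((short)sig));

	return (uint32_t)_mm_movemask_epi8(_mm_and_si128(cmp, results_mask));
}

/* Alternative: take the movemask first, then apply one scalar AND. */
static uint32_t match_mask_scalar_and(const uint16_t sigs[8], uint16_t sig)
{
	__m128i cmp = _mm_cmpeq_epi16(
			_mm_loadu_si128((const __m128i *)sigs),
			_mm_set1_epi16((short)sig));

	return (uint32_t)_mm_movemask_epi8(cmp) & 0x5555;
}
#endif

/* Returns 1 when every variant available on this target agrees. */
static int masks_agree(const uint16_t sigs[8], uint16_t sig)
{
	uint32_t ref = match_mask_ref(sigs, sig);

#if defined(__SSE2__)
	if (match_mask_vec_and(sigs, sig) != ref)
		return 0;
	if (match_mask_scalar_and(sigs, sig) != ref)
		return 0;
#endif
	/* The reference itself must only ever use the even bits. */
	return (ref & ~UINT32_C(0x5555)) == 0;
}

/* Self-check with matches at entries 0, 2, 5 and 7. */
static void check_masks(void)
{
	const uint16_t sigs[8] = { 7, 1, 7, 3, 4, 7, 6, 7 };

	assert(match_mask_ref(sigs, 7) == 0x4411);
	assert(masks_agree(sigs, 7));
}
```

The scalar variant needs no extra vector constant, only one integer AND per
mask, which is the saving suggested above.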
> /* Compare all signatures in the bucket */
> - *sec_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
> - _mm_load_si128(
> + __m128i sec_cmp = _mm_cmpeq_epi16(_mm_load_si128(
> (__m128i const *)sec_bkt->sig_current),
> - _mm_set1_epi16(sig)));
> + _mm_set1_epi16(sig));
> + *sec_hash_matches = _mm_movemask_epi8(_mm_and_si128(sec_cmp, shift_mask));
> + }
> break;
> #elif defined(__ARM_NEON)
> case RTE_HASH_COMPARE_NEON: {
> --
> 2.25.1
>
Thread overview: 8+ messages
2023-09-06 2:31 Jieqiang Wang
2023-09-29 15:32 ` David Marchand
2023-10-02 10:39 ` Bruce Richardson [this message]
2023-10-07 6:41 ` Re: " Jieqiang Wang
2023-10-07 7:15 ` [PATCH v2] " Jieqiang Wang
2023-10-07 7:36 ` [PATCH v3] " Jieqiang Wang
2023-10-09 14:33 ` Bruce Richardson
2023-10-10 9:50 ` David Marchand