From: Yoan Picchi
To: Thomas Monjalon, Yipeng Wang, Sameh Gobriel, Bruce Richardson, Vladimir Medvedkin
Cc: dev@dpdk.org, Yoan Picchi
Subject: [PATCH v2 1/4] hash: pack the hitmask for hash in bulk lookup
Date: Fri, 20 Oct 2023 16:51:56 +0000
Message-Id: <20231020165159.1649282-2-yoan.picchi@arm.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20231020165159.1649282-1-yoan.picchi@arm.com>
References: <20231020165159.1649282-1-yoan.picchi@arm.com>

The current hitmask includes padding due to an Intel SIMD implementation detail. This patch allows non-Intel SIMD implementations to benefit from a dense hitmask.
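To illustrate the layout change (example only, not part of the patch): in the existing code each 16-bit signature comparison contributes two bits to the hitmask (hence the 0x5555 masking and the extra right shift by one when decoding a hit), whereas a dense hitmask maps exactly one bit to one bucket entry. A minimal sketch of the two decodings, with hypothetical helper names:

/*
 * Illustrative sketch only. first_hit_padded() decodes the existing
 * padded layout (two bits per bucket entry), first_hit_dense() decodes
 * the dense layout introduced for NEON (one bit per entry).
 */
#include <stdint.h>
#include <stdio.h>

static unsigned int first_hit_padded(uint32_t hitmask)
{
	/* Two bits per entry: halve the bit position to get the entry index. */
	return __builtin_ctzl(hitmask) >> 1;
}

static unsigned int first_hit_dense(uint32_t hitmask)
{
	/* One bit per entry: the bit position is the entry index. */
	return __builtin_ctzl(hitmask);
}

int main(void)
{
	uint32_t padded = 1u << (3 << 1);	/* entry 3 matched: bit 6 set */
	uint32_t dense = 1u << 3;		/* entry 3 matched: bit 3 set */

	printf("padded: %u, dense: %u\n",
			first_hit_padded(padded), first_hit_dense(dense));
	return 0;
}

Both calls report entry 3; the dense form simply drops the extra shift, which the patch below expresses as hitmask_padding.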
Signed-off-by: Yoan Picchi
---
 .mailmap                   |   2 +
 lib/hash/rte_cuckoo_hash.c | 118 ++++++++++++++++++++++++++-----------
 2 files changed, 86 insertions(+), 34 deletions(-)

diff --git a/.mailmap b/.mailmap
index 3f5bab26a8..b9c49aa7f6 100644
--- a/.mailmap
+++ b/.mailmap
@@ -485,6 +485,7 @@ Hari Kumar Vemula
 Harini Ramakrishnan
 Hariprasad Govindharajan
 Harish Patil
+Harjot Singh
 Harman Kalra
 Harneet Singh
 Harold Huang
@@ -1602,6 +1603,7 @@ Yixue Wang
 Yi Yang
 Yi Zhang
 Yoann Desmouceaux
+Yoan Picchi
 Yogesh Jangra
 Yogev Chaimovich
 Yongjie Gu
diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
index 19b23f2a97..2aa96eb862 100644
--- a/lib/hash/rte_cuckoo_hash.c
+++ b/lib/hash/rte_cuckoo_hash.c
@@ -1850,8 +1850,50 @@ rte_hash_free_key_with_position(const struct rte_hash *h,
 }
 
+#if defined(__ARM_NEON)
+
+static inline void
+compare_signatures_dense(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
+		const struct rte_hash_bucket *prim_bkt,
+		const struct rte_hash_bucket *sec_bkt,
+		uint16_t sig,
+		enum rte_hash_sig_compare_function sig_cmp_fn)
+{
+	unsigned int i;
+
+	/* For match mask every bits indicates the match */
+	switch (sig_cmp_fn) {
+	case RTE_HASH_COMPARE_NEON: {
+		uint16x8_t vmat, vsig, x;
+		int16x8_t shift = {0, 1, 2, 3, 4, 5, 6, 7};
+
+		vsig = vld1q_dup_u16((uint16_t const *)&sig);
+		/* Compare all signatures in the primary bucket */
+		vmat = vceqq_u16(vsig,
+			vld1q_u16((uint16_t const *)prim_bkt->sig_current));
+		x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
+		*prim_hash_matches = (uint32_t)(vaddvq_u16(x));
+		/* Compare all signatures in the secondary bucket */
+		vmat = vceqq_u16(vsig,
+			vld1q_u16((uint16_t const *)sec_bkt->sig_current));
+		x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);
+		*sec_hash_matches = (uint32_t)(vaddvq_u16(x));
+	}
+	break;
+	default:
+		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
+			*prim_hash_matches |=
+				((sig == prim_bkt->sig_current[i]) << i);
+			*sec_hash_matches |=
+				((sig == sec_bkt->sig_current[i]) << i);
+		}
+	}
+}
+
+#else
+
 static inline void
-compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
+compare_signatures_sparse(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
 		const struct rte_hash_bucket *prim_bkt,
 		const struct rte_hash_bucket *sec_bkt,
 		uint16_t sig,
 		enum rte_hash_sig_compare_function sig_cmp_fn)
@@ -1878,25 +1920,7 @@ compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
 		/* Extract the even-index bits only */
 		*sec_hash_matches &= 0x5555;
 		break;
-#elif defined(__ARM_NEON)
-	case RTE_HASH_COMPARE_NEON: {
-		uint16x8_t vmat, vsig, x;
-		int16x8_t shift = {-15, -13, -11, -9, -7, -5, -3, -1};
-
-		vsig = vld1q_dup_u16((uint16_t const *)&sig);
-		/* Compare all signatures in the primary bucket */
-		vmat = vceqq_u16(vsig,
-			vld1q_u16((uint16_t const *)prim_bkt->sig_current));
-		x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x8000)), shift);
-		*prim_hash_matches = (uint32_t)(vaddvq_u16(x));
-		/* Compare all signatures in the secondary bucket */
-		vmat = vceqq_u16(vsig,
-			vld1q_u16((uint16_t const *)sec_bkt->sig_current));
-		x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x8000)), shift);
-		*sec_hash_matches = (uint32_t)(vaddvq_u16(x));
-	}
-	break;
-#endif
+#endif /* defined(__SSE2__) */
 	default:
 		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
 			*prim_hash_matches |=
@@ -1907,6 +1931,8 @@ compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
 	}
 }
 
+#endif /* defined(__ARM_NEON) */
+
 static inline void
 __bulk_lookup_l(const struct rte_hash *h, const void **keys,
 		const struct rte_hash_bucket **primary_bkt,
@@ -1921,18 +1947,30 @@ __bulk_lookup_l(const struct rte_hash *h, const void **keys,
 	uint32_t sec_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
 	struct rte_hash_bucket *cur_bkt, *next_bkt;
 
+#if defined(__ARM_NEON)
+	const int hitmask_padding = 0;
+#else
+	const int hitmask_padding = 1;
+#endif
+
 	__hash_rw_reader_lock(h);
 
 	/* Compare signatures and prefetch key slot of first hit */
 	for (i = 0; i < num_keys; i++) {
-		compare_signatures(&prim_hitmask[i], &sec_hitmask[i],
+#if defined(__ARM_NEON)
+		compare_signatures_dense(&prim_hitmask[i], &sec_hitmask[i],
+			primary_bkt[i], secondary_bkt[i],
+			sig[i], h->sig_cmp_fn);
+#else
+		compare_signatures_sparse(&prim_hitmask[i], &sec_hitmask[i],
 			primary_bkt[i], secondary_bkt[i],
 			sig[i], h->sig_cmp_fn);
+#endif
 
 		if (prim_hitmask[i]) {
 			uint32_t first_hit =
 				__builtin_ctzl(prim_hitmask[i])
-				>> 1;
+				>> hitmask_padding;
 			uint32_t key_idx =
 				primary_bkt[i]->key_idx[first_hit];
 			const struct rte_hash_key *key_slot =
@@ -1946,7 +1984,7 @@ __bulk_lookup_l(const struct rte_hash *h, const void **keys,
 		if (sec_hitmask[i]) {
 			uint32_t first_hit =
 				__builtin_ctzl(sec_hitmask[i])
-				>> 1;
+				>> hitmask_padding;
 			uint32_t key_idx =
 				secondary_bkt[i]->key_idx[first_hit];
 			const struct rte_hash_key *key_slot =
@@ -1963,7 +2001,7 @@ __bulk_lookup_l(const struct rte_hash *h, const void **keys,
 		while (prim_hitmask[i]) {
 			uint32_t hit_index =
 				__builtin_ctzl(prim_hitmask[i])
-				>> 1;
+				>> hitmask_padding;
 			uint32_t key_idx =
 				primary_bkt[i]->key_idx[hit_index];
 			const struct rte_hash_key *key_slot =
@@ -1985,13 +2023,13 @@ __bulk_lookup_l(const struct rte_hash *h, const void **keys,
 				positions[i] = key_idx - 1;
 				goto next_key;
 			}
-			prim_hitmask[i] &= ~(3ULL << (hit_index << 1));
+			prim_hitmask[i] &= ~(1 << (hit_index << hitmask_padding));
 		}
 
 		while (sec_hitmask[i]) {
 			uint32_t hit_index =
 				__builtin_ctzl(sec_hitmask[i])
-				>> 1;
+				>> hitmask_padding;
 			uint32_t key_idx =
 				secondary_bkt[i]->key_idx[hit_index];
 			const struct rte_hash_key *key_slot =
@@ -2014,7 +2052,7 @@ __bulk_lookup_l(const struct rte_hash *h, const void **keys,
 				positions[i] = key_idx - 1;
 				goto next_key;
 			}
-			sec_hitmask[i] &= ~(3ULL << (hit_index << 1));
+			sec_hitmask[i] &= ~(1 << (hit_index << hitmask_padding));
 		}
 next_key:
 		continue;
@@ -2069,6 +2107,12 @@ __bulk_lookup_lf(const struct rte_hash *h, const void **keys,
 	struct rte_hash_bucket *cur_bkt, *next_bkt;
 	uint32_t cnt_b, cnt_a;
 
+#if defined(__ARM_NEON)
+	const int hitmask_padding = 0;
+#else
+	const int hitmask_padding = 1;
+#endif
+
 	for (i = 0; i < num_keys; i++)
 		positions[i] = -ENOENT;
 
@@ -2082,14 +2126,20 @@ __bulk_lookup_lf(const struct rte_hash *h, const void **keys,
 
 		/* Compare signatures and prefetch key slot of first hit */
 		for (i = 0; i < num_keys; i++) {
-			compare_signatures(&prim_hitmask[i], &sec_hitmask[i],
+#if defined(__ARM_NEON)
+			compare_signatures_dense(&prim_hitmask[i], &sec_hitmask[i],
 				primary_bkt[i], secondary_bkt[i],
 				sig[i], h->sig_cmp_fn);
+#else
+			compare_signatures_sparse(&prim_hitmask[i], &sec_hitmask[i],
+				primary_bkt[i], secondary_bkt[i],
+				sig[i], h->sig_cmp_fn);
+#endif
 
 			if (prim_hitmask[i]) {
 				uint32_t first_hit =
 					__builtin_ctzl(prim_hitmask[i])
-					>> 1;
+					>> hitmask_padding;
 				uint32_t key_idx =
 					primary_bkt[i]->key_idx[first_hit];
 				const struct rte_hash_key *key_slot =
@@ -2103,7 +2153,7 @@ __bulk_lookup_lf(const struct rte_hash *h, const void **keys,
 			if (sec_hitmask[i]) {
 				uint32_t first_hit =
 					__builtin_ctzl(sec_hitmask[i])
-					>> 1;
+					>> hitmask_padding;
 				uint32_t key_idx =
 					secondary_bkt[i]->key_idx[first_hit];
 				const struct rte_hash_key *key_slot =
@@ -2119,7 +2169,7 @@ __bulk_lookup_lf(const struct rte_hash *h, const void **keys,
 			while (prim_hitmask[i]) {
 				uint32_t hit_index =
 					__builtin_ctzl(prim_hitmask[i])
-					>> 1;
+					>> hitmask_padding;
 				uint32_t key_idx =
 					__atomic_load_n(
 					&primary_bkt[i]->key_idx[hit_index],
@@ -2145,13 +2195,13 @@ __bulk_lookup_lf(const struct rte_hash *h, const void **keys,
 					positions[i] = key_idx - 1;
 					goto next_key;
 				}
-				prim_hitmask[i] &= ~(3ULL << (hit_index << 1));
+				prim_hitmask[i] &= ~(1 << (hit_index << hitmask_padding));
 			}
 
 			while (sec_hitmask[i]) {
 				uint32_t hit_index =
 					__builtin_ctzl(sec_hitmask[i])
-					>> 1;
+					>> hitmask_padding;
 				uint32_t key_idx =
 					__atomic_load_n(
 					&secondary_bkt[i]->key_idx[hit_index],
@@ -2178,7 +2228,7 @@ __bulk_lookup_lf(const struct rte_hash *h, const void **keys,
 					positions[i] = key_idx - 1;
 					goto next_key;
 				}
-				sec_hitmask[i] &= ~(3ULL << (hit_index << 1));
+				sec_hitmask[i] &= ~(1 << (hit_index << hitmask_padding));
 			}
 next_key:
 			continue;
-- 
2.25.1
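The NEON path in compare_signatures_dense() above packs the result as follows: vceqq_u16() sets a matching lane to 0xFFFF, the AND with 0x0001 reduces it to a single 1, vshlq_u16() moves that 1 to the lane's own bit position, and vaddvq_u16() sums the lanes into one dense byte. A self-contained sketch of the same idea for a single bucket of eight signatures (illustrative only; the function name and the scalar fallback are mine, not part of the patch):

#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Return a dense hitmask: bit i is set when sigs[i] equals sig. */
static inline uint32_t
dense_sig_match(const uint16_t sigs[8], uint16_t sig)
{
#if defined(__ARM_NEON)
	/* 0xFFFF per matching lane, masked down to 1, shifted to its lane index. */
	uint16x8_t vmat = vceqq_u16(vdupq_n_u16(sig), vld1q_u16(sigs));
	int16x8_t shift = {0, 1, 2, 3, 4, 5, 6, 7};
	uint16x8_t bits = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift);

	/* Horizontal add packs the eight lanes into one dense byte. */
	return (uint32_t)vaddvq_u16(bits);
#else
	/* Scalar fallback, mirroring the patch's default case. */
	uint32_t mask = 0;
	unsigned int i;

	for (i = 0; i < 8; i++)
		mask |= (uint32_t)(sigs[i] == sig) << i;
	return mask;
#endif
}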