From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 350D143C0D; Tue, 27 Feb 2024 18:42:37 +0100 (CET) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 1CCA542EE6; Tue, 27 Feb 2024 18:42:22 +0100 (CET) Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by mails.dpdk.org (Postfix) with ESMTP id EE55440279 for ; Tue, 27 Feb 2024 18:42:17 +0100 (CET) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id E297E150C; Tue, 27 Feb 2024 09:42:53 -0800 (PST) Received: from octeon10-1.usa.Arm.com (unknown [10.118.91.161]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id D32D63FB3E; Tue, 27 Feb 2024 09:42:14 -0800 (PST) From: Yoan Picchi To: Thomas Monjalon , Yipeng Wang , Sameh Gobriel , Bruce Richardson , Vladimir Medvedkin Cc: dev@dpdk.org, nd@arm.com, Yoan Picchi , Ruifeng Wang , Nathan Brown Subject: [PATCH v5 1/4] hash: pack the hitmask for hash in bulk lookup Date: Tue, 27 Feb 2024 17:42:00 +0000 Message-Id: <20240227174203.2889333-2-yoan.picchi@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20240227174203.2889333-1-yoan.picchi@arm.com> References: <20231020165159.1649282-1-yoan.picchi@arm.com> <20240227174203.2889333-1-yoan.picchi@arm.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Current hitmask includes padding due to Intel's SIMD implementation detail. This patch allows non Intel SIMD implementations to benefit from a dense hitmask. Signed-off-by: Yoan Picchi Reviewed-by: Ruifeng Wang Reviewed-by: Nathan Brown --- .mailmap | 2 + lib/hash/rte_cuckoo_hash.c | 118 ++++++++++++++++++++++++++----------- 2 files changed, 86 insertions(+), 34 deletions(-) diff --git a/.mailmap b/.mailmap index 12d2875641..60500bbe36 100644 --- a/.mailmap +++ b/.mailmap @@ -492,6 +492,7 @@ Hari Kumar Vemula Harini Ramakrishnan Hariprasad Govindharajan Harish Patil +Harjot Singh Harman Kalra Harneet Singh Harold Huang @@ -1625,6 +1626,7 @@ Yixue Wang Yi Yang Yi Zhang Yoann Desmouceaux +Yoan Picchi Yogesh Jangra Yogev Chaimovich Yongjie Gu diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c index 9cf94645f6..0550165584 100644 --- a/lib/hash/rte_cuckoo_hash.c +++ b/lib/hash/rte_cuckoo_hash.c @@ -1857,8 +1857,50 @@ rte_hash_free_key_with_position(const struct rte_hash *h, } +#if defined(__ARM_NEON) + +static inline void +compare_signatures_dense(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches, + const struct rte_hash_bucket *prim_bkt, + const struct rte_hash_bucket *sec_bkt, + uint16_t sig, + enum rte_hash_sig_compare_function sig_cmp_fn) +{ + unsigned int i; + + /* For match mask every bits indicates the match */ + switch (sig_cmp_fn) { + case RTE_HASH_COMPARE_NEON: { + uint16x8_t vmat, vsig, x; + int16x8_t shift = {0, 1, 2, 3, 4, 5, 6, 7}; + + vsig = vld1q_dup_u16((uint16_t const *)&sig); + /* Compare all signatures in the primary bucket */ + vmat = vceqq_u16(vsig, + vld1q_u16((uint16_t const *)prim_bkt->sig_current)); + x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift); + *prim_hash_matches = (uint32_t)(vaddvq_u16(x)); + /* Compare all signatures in the secondary bucket */ + vmat = vceqq_u16(vsig, + vld1q_u16((uint16_t const *)sec_bkt->sig_current)); + x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x0001)), shift); + *sec_hash_matches = (uint32_t)(vaddvq_u16(x)); + } + break; + default: + for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) { + *prim_hash_matches |= + ((sig == prim_bkt->sig_current[i]) << i); + *sec_hash_matches |= + ((sig == sec_bkt->sig_current[i]) << i); + } + } +} + +#else + static inline void -compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches, +compare_signatures_sparse(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches, const struct rte_hash_bucket *prim_bkt, const struct rte_hash_bucket *sec_bkt, uint16_t sig, @@ -1885,25 +1927,7 @@ compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches, /* Extract the even-index bits only */ *sec_hash_matches &= 0x5555; break; -#elif defined(__ARM_NEON) - case RTE_HASH_COMPARE_NEON: { - uint16x8_t vmat, vsig, x; - int16x8_t shift = {-15, -13, -11, -9, -7, -5, -3, -1}; - - vsig = vld1q_dup_u16((uint16_t const *)&sig); - /* Compare all signatures in the primary bucket */ - vmat = vceqq_u16(vsig, - vld1q_u16((uint16_t const *)prim_bkt->sig_current)); - x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x8000)), shift); - *prim_hash_matches = (uint32_t)(vaddvq_u16(x)); - /* Compare all signatures in the secondary bucket */ - vmat = vceqq_u16(vsig, - vld1q_u16((uint16_t const *)sec_bkt->sig_current)); - x = vshlq_u16(vandq_u16(vmat, vdupq_n_u16(0x8000)), shift); - *sec_hash_matches = (uint32_t)(vaddvq_u16(x)); - } - break; -#endif +#endif /* defined(__SSE2__) */ default: for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) { *prim_hash_matches |= @@ -1914,6 +1938,8 @@ compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches, } } +#endif /* defined(__ARM_NEON) */ + static inline void __bulk_lookup_l(const struct rte_hash *h, const void **keys, const struct rte_hash_bucket **primary_bkt, @@ -1928,18 +1954,30 @@ __bulk_lookup_l(const struct rte_hash *h, const void **keys, uint32_t sec_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0}; struct rte_hash_bucket *cur_bkt, *next_bkt; +#if defined(__ARM_NEON) + const int hitmask_padding = 0; +#else + const int hitmask_padding = 1; +#endif + __hash_rw_reader_lock(h); /* Compare signatures and prefetch key slot of first hit */ for (i = 0; i < num_keys; i++) { - compare_signatures(&prim_hitmask[i], &sec_hitmask[i], +#if defined(__ARM_NEON) + compare_signatures_dense(&prim_hitmask[i], &sec_hitmask[i], + primary_bkt[i], secondary_bkt[i], + sig[i], h->sig_cmp_fn); +#else + compare_signatures_sparse(&prim_hitmask[i], &sec_hitmask[i], primary_bkt[i], secondary_bkt[i], sig[i], h->sig_cmp_fn); +#endif if (prim_hitmask[i]) { uint32_t first_hit = rte_ctz32(prim_hitmask[i]) - >> 1; + >> hitmask_padding; uint32_t key_idx = primary_bkt[i]->key_idx[first_hit]; const struct rte_hash_key *key_slot = @@ -1953,7 +1991,7 @@ __bulk_lookup_l(const struct rte_hash *h, const void **keys, if (sec_hitmask[i]) { uint32_t first_hit = rte_ctz32(sec_hitmask[i]) - >> 1; + >> hitmask_padding; uint32_t key_idx = secondary_bkt[i]->key_idx[first_hit]; const struct rte_hash_key *key_slot = @@ -1970,7 +2008,7 @@ __bulk_lookup_l(const struct rte_hash *h, const void **keys, while (prim_hitmask[i]) { uint32_t hit_index = rte_ctz32(prim_hitmask[i]) - >> 1; + >> hitmask_padding; uint32_t key_idx = primary_bkt[i]->key_idx[hit_index]; const struct rte_hash_key *key_slot = @@ -1992,13 +2030,13 @@ __bulk_lookup_l(const struct rte_hash *h, const void **keys, positions[i] = key_idx - 1; goto next_key; } - prim_hitmask[i] &= ~(3ULL << (hit_index << 1)); + prim_hitmask[i] &= ~(1 << (hit_index << hitmask_padding)); } while (sec_hitmask[i]) { uint32_t hit_index = rte_ctz32(sec_hitmask[i]) - >> 1; + >> hitmask_padding; uint32_t key_idx = secondary_bkt[i]->key_idx[hit_index]; const struct rte_hash_key *key_slot = @@ -2021,7 +2059,7 @@ __bulk_lookup_l(const struct rte_hash *h, const void **keys, positions[i] = key_idx - 1; goto next_key; } - sec_hitmask[i] &= ~(3ULL << (hit_index << 1)); + sec_hitmask[i] &= ~(1 << (hit_index << hitmask_padding)); } next_key: continue; @@ -2076,6 +2114,12 @@ __bulk_lookup_lf(const struct rte_hash *h, const void **keys, struct rte_hash_bucket *cur_bkt, *next_bkt; uint32_t cnt_b, cnt_a; +#if defined(__ARM_NEON) + const int hitmask_padding = 0; +#else + const int hitmask_padding = 1; +#endif + for (i = 0; i < num_keys; i++) positions[i] = -ENOENT; @@ -2089,14 +2133,20 @@ __bulk_lookup_lf(const struct rte_hash *h, const void **keys, /* Compare signatures and prefetch key slot of first hit */ for (i = 0; i < num_keys; i++) { - compare_signatures(&prim_hitmask[i], &sec_hitmask[i], +#if defined(__ARM_NEON) + compare_signatures_dense(&prim_hitmask[i], &sec_hitmask[i], primary_bkt[i], secondary_bkt[i], sig[i], h->sig_cmp_fn); +#else + compare_signatures_sparse(&prim_hitmask[i], &sec_hitmask[i], + primary_bkt[i], secondary_bkt[i], + sig[i], h->sig_cmp_fn); +#endif if (prim_hitmask[i]) { uint32_t first_hit = rte_ctz32(prim_hitmask[i]) - >> 1; + >> hitmask_padding; uint32_t key_idx = primary_bkt[i]->key_idx[first_hit]; const struct rte_hash_key *key_slot = @@ -2110,7 +2160,7 @@ __bulk_lookup_lf(const struct rte_hash *h, const void **keys, if (sec_hitmask[i]) { uint32_t first_hit = rte_ctz32(sec_hitmask[i]) - >> 1; + >> hitmask_padding; uint32_t key_idx = secondary_bkt[i]->key_idx[first_hit]; const struct rte_hash_key *key_slot = @@ -2126,7 +2176,7 @@ __bulk_lookup_lf(const struct rte_hash *h, const void **keys, while (prim_hitmask[i]) { uint32_t hit_index = rte_ctz32(prim_hitmask[i]) - >> 1; + >> hitmask_padding; uint32_t key_idx = rte_atomic_load_explicit( &primary_bkt[i]->key_idx[hit_index], @@ -2152,13 +2202,13 @@ __bulk_lookup_lf(const struct rte_hash *h, const void **keys, positions[i] = key_idx - 1; goto next_key; } - prim_hitmask[i] &= ~(3ULL << (hit_index << 1)); + prim_hitmask[i] &= ~(1 << (hit_index << hitmask_padding)); } while (sec_hitmask[i]) { uint32_t hit_index = rte_ctz32(sec_hitmask[i]) - >> 1; + >> hitmask_padding; uint32_t key_idx = rte_atomic_load_explicit( &secondary_bkt[i]->key_idx[hit_index], @@ -2185,7 +2235,7 @@ __bulk_lookup_lf(const struct rte_hash *h, const void **keys, positions[i] = key_idx - 1; goto next_key; } - sec_hitmask[i] &= ~(3ULL << (hit_index << 1)); + sec_hitmask[i] &= ~(1 << (hit_index << hitmask_padding)); } next_key: continue; -- 2.34.1