DPDK patches and discussions
* [dpdk-dev] [PATCH v8 0/4] hash: add extendable bucket and partial key hashing
@ 2018-10-22 18:39 Yipeng Wang
  2018-10-22 18:39 ` [dpdk-dev] [PATCH v8 1/4] hash: fix race condition in iterate Yipeng Wang
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Yipeng Wang @ 2018-10-22 18:39 UTC (permalink / raw)
  To: bruce.richardson; +Cc: dev, yipeng1.wang, honnappa.nagarahalli

This patch set depends on another bug fix patch set:
http://patchwork.dpdk.org/cover/45611/

This patch set makes two major optimizations over the current rte_hash
library.

First, it adds the extendable bucket table feature: a new structure that
can accommodate keys that fail to get inserted into the main hash table
due to the unlikely event of excessive hash collisions. The hash table
buckets are extended with a linked list to host these keys. This new
design guarantees insertion of 100% of the keys for a given hash table
size, with minimal overhead. A new flag value is added for the user to
indicate whether the extendable bucket feature should be enabled. The
linked-list buckets are a similar concept to the extendable bucket hash
table in the packet framework.
In detail, for insertion, the linked buckets are used to store keys that
fit in neither the primary nor the secondary bucket, and for which the
cuckoo path could not find an empty location within the maximum path
length (a small probability). For lookup, the key is checked first in
the primary bucket, then in the secondary; if the secondary bucket is
extended, the linked list is traversed for a possible match.
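
A sketch of that lookup order (illustrative only, not the library code:
search_one_bucket() is the internal helper used by the patches, and
"next" is the pointer added to struct rte_hash_bucket):

	static inline int32_t
	lookup_sketch(const struct rte_hash *h, const void *key,
			hash_sig_t sig, hash_sig_t alt_sig, void **data,
			struct rte_hash_bucket *prim_bkt,
			struct rte_hash_bucket *sec_bkt)
	{
		struct rte_hash_bucket *cur_bkt;
		int32_t ret;

		/* Primary bucket first */
		ret = search_one_bucket(h, key, sig, data, prim_bkt);
		if (ret != -1)
			return ret;
		/* Then the secondary bucket and any ext buckets
		 * linked from it.
		 */
		for (cur_bkt = sec_bkt; cur_bkt != NULL;
				cur_bkt = cur_bkt->next) {
			ret = search_one_bucket(h, key, alt_sig, data,
					cur_bkt);
			if (ret != -1)
				return ret;
		}
		return -ENOENT;
	}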

Second, the patch set changes the current hashing algorithm to
"partial-key hashing". Partial-key hashing is a concept from the paper
by Bin Fan et al., "MemC3: Compact and Concurrent MemCache with Dumber
Caching and Smarter Hashing". Instead of storing both the 32-bit
signature and the alternative signature in the bucket, we store only a
small 16-bit signature and calculate the alternative bucket index by
XORing the signature with the current bucket index. This doubles the
memory efficiency of the hash table, since one bucket now occupies one
cache line instead of two as in the original design.
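
As a sketch of the index arithmetic (mirroring the helpers added in
patch 4; the bucket count is a power of two, so bucket_bitmask is
num_buckets - 1):

	uint16_t sig  = hash >> 16;               /* stored 16-bit signature */
	uint32_t prim = hash & h->bucket_bitmask; /* primary bucket index */
	uint32_t sec  = (prim ^ sig) & h->bucket_bitmask; /* alternative */
	/* XORing again recovers the partner: (sec ^ sig) & bucket_bitmask
	 * equals prim, so no second signature needs to be stored per entry.
	 */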

v7->v8:
Patchwork split the v7 posting into two patch series, and the first one
got merged into v6. v8 reposts the series to resolve this issue.

v6->v7:
Fix a bug accidentally introduced in v5: in the iterate function, the
main-table-completed condition should be
*next == total_entries_main instead of *next == total_entries.

v5->v6:
1. hash: fix a typo in comment, found by Honnappa.
2. Fix typos in commit messages.
3. Add review-by and acked-by.

v4->v5:
1. hash: for the first commit, move back the lock and read "position" in the
while condition as Honnappa suggested.
2. hash: minor coding style change (Honnappa) and commit message typo fix.
3. Add Review-by from Honnappa.

v3->v4:
1. hash: Revise commit message to be clearer about "utilization" (Honnappa)
2. hash: in the delete key function, change the bucket return to use
rte_ring_sp_enqueue instead of rte_ring_mp_enqueue, since it is already
protected by locks.
3. hash: update rte_hash_iterate comments (Honnappa)
4. hash: Add a new commit to fix race condition in the rte_hash_iterate (Honnappa)
5. hash/test: during the utilization test, double-check that
rte_hash_count returns the correct value (Honnappa)
6. hash: for the partial-key-hashing commit, break the get_buckets_index
function into three, which may make future extension easier (Honnappa)
7. hash: change the comment for "typedef uint32_t hash_sig_t" to be
clearer to users (Honnappa)

v2->v3:
The first four commits were separated from this patch set as another
independent patch set:
https://mails.dpdk.org/archives/dev/2018-September/113118.html
1. hash: move snprintf for ext_ring name under the ext_table condition.
2. hash: fix memory leak by freeing ext_buckets in rte_hash_free.
3. hash: after the cuckoo path fails, search not only the ext buckets
but first the secondary bucket as well, in case an empty location has
appeared there in the meantime.
4. hash: completely rewrote the key deletion logic. With the ext table
enabled, if the deleted key was not in the last bucket of the linked
list, the last entry in the linked list is moved into the slot vacated
by the deleted key. The purpose is to compact the entries in the linked
list so they stay close to the main table, making sure that after some
time of running not many extendable buckets are wasted holding only one
or two entries; this also benefits lookup speed.
5. Other minor coding style and comment improvements.

V1->V2:
1. hash: Rewrite rte_hash_get_last_bkt to be more concise.
2. hash: Reorder the rte_hash struct to align cache line better.
3. test: Minor changes in auto test to add key insertion failure check during
iteration test.
4. test: Add a new commit to fix the read-write test non-consecutive
core issue.
5. hash: Add a new commit to remove unnecessary code introduced by
previous patches.
6. hash: Comment and coding style improvements in multiple places.

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>


Yipeng Wang (4):
  hash: fix race condition in iterate
  hash: add extendable bucket feature
  test/hash: implement extendable bucket hash test
  hash: use partial-key hashing

 lib/librte_hash/rte_cuckoo_hash.c | 582 ++++++++++++++++++++++++++++----------
 lib/librte_hash/rte_cuckoo_hash.h |  11 +-
 lib/librte_hash/rte_hash.h        |   8 +-
 test/test/test_hash.c             | 159 ++++++++++-
 test/test/test_hash_perf.c        | 114 ++++++--
 5 files changed, 678 insertions(+), 196 deletions(-)

-- 
2.7.4


* [dpdk-dev] [PATCH v8 1/4] hash: fix race condition in iterate
  2018-10-22 18:39 [dpdk-dev] [PATCH v8 0/4] hash: add extendable bucket and partial key hashing Yipeng Wang
@ 2018-10-22 18:39 ` Yipeng Wang
  2018-10-23  8:53   ` Bruce Richardson
  2018-10-22 18:39 ` [dpdk-dev] [PATCH v8 2/4] hash: add extendable bucket feature Yipeng Wang
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Yipeng Wang @ 2018-10-22 18:39 UTC (permalink / raw)
  To: bruce.richardson; +Cc: dev, yipeng1.wang, honnappa.nagarahalli

In rte_hash_iterate, the reader lock did not protect the
while loop that checks for an empty entry. This created a
race condition: the entry may become empty by the time the
lock is taken, and a wrong key/data value would then be
read out.

This commit reads out the position in the while condition,
which makes sure the position used under the lock is the
one that was already verified to be non-empty.

Fixes: f2e3001b53ec ("hash: support read/write concurrency")
Cc: stable@dpdk.org

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Reported-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
---
 lib/librte_hash/rte_cuckoo_hash.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index f7b86c8..da8ddf4 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -1318,7 +1318,7 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
 	idx = *next % RTE_HASH_BUCKET_ENTRIES;
 
 	/* If current position is empty, go to the next one */
-	while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
+	while ((position = h->buckets[bucket_idx].key_idx[idx]) == EMPTY_SLOT) {
 		(*next)++;
 		/* End of table */
 		if (*next == total_entries)
@@ -1326,9 +1326,8 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
 		bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
 		idx = *next % RTE_HASH_BUCKET_ENTRIES;
 	}
+
 	__hash_rw_reader_lock(h);
-	/* Get position of entry in key table */
-	position = h->buckets[bucket_idx].key_idx[idx];
 	next_key = (struct rte_hash_key *) ((char *)h->key_store +
 				position * h->key_entry_size);
 	/* Return key and data */
-- 
2.7.4


* [dpdk-dev] [PATCH v8 2/4] hash: add extendable bucket feature
  2018-10-22 18:39 [dpdk-dev] [PATCH v8 0/4] hash: add extendable bucket and partial key hashing Yipeng Wang
  2018-10-22 18:39 ` [dpdk-dev] [PATCH v8 1/4] hash: fix race condition in iterate Yipeng Wang
@ 2018-10-22 18:39 ` Yipeng Wang
  2018-10-23  8:55   ` Bruce Richardson
  2018-10-22 18:39 ` [dpdk-dev] [PATCH v8 3/4] test/hash: implement extendable bucket hash test Yipeng Wang
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Yipeng Wang @ 2018-10-22 18:39 UTC (permalink / raw)
  To: bruce.richardson; +Cc: dev, yipeng1.wang, honnappa.nagarahalli

In use cases where hash table capacity needs to be guaranteed,
the extendable bucket feature can be used to hold extra keys
in linked lists when conflicts happen. This is a similar
concept to the extendable bucket hash table in the packet
framework.

This commit adds the extendable bucket feature. The user can
turn it on or off through the extra flag field at table
creation time.

The extendable bucket table is composed of buckets that can be
linked onto the buckets of the main table. When the extendable
bucket feature is enabled, the hash table load can always reach
100%. In other words, the table can always accommodate the same
number of keys as the specified table size. This provides a
100% table capacity guarantee.

Although keys ending up in the ext buckets may have a longer
lookup time, they should be rare thanks to the cuckoo
algorithm.
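
For example, enabling the feature at table creation time (a minimal
usage sketch; every parameter value except .extra_flag is a
placeholder):

	struct rte_hash_parameters params = {
		.name = "flow_table",
		.entries = 1 << 16,
		.key_len = sizeof(struct flow_key),
		.hash_func = rte_jhash,
		.socket_id = rte_socket_id(),
		.extra_flag = RTE_HASH_EXTRA_FLAGS_EXT_TABLE,
	};
	struct rte_hash *h = rte_hash_create(&params);
	/* With the flag set, inserting up to "entries" keys cannot fail
	 * for lack of space; overflowing keys go to linked ext buckets.
	 */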

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
---
 lib/librte_hash/rte_cuckoo_hash.c | 371 ++++++++++++++++++++++++++++++++------
 lib/librte_hash/rte_cuckoo_hash.h |   5 +
 lib/librte_hash/rte_hash.h        |   3 +
 3 files changed, 327 insertions(+), 52 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index da8ddf4..b872caa 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -31,6 +31,10 @@
 #include "rte_hash.h"
 #include "rte_cuckoo_hash.h"
 
+#define FOR_EACH_BUCKET(CURRENT_BKT, START_BUCKET)                            \
+	for (CURRENT_BKT = START_BUCKET;                                      \
+		CURRENT_BKT != NULL;                                          \
+		CURRENT_BKT = CURRENT_BKT->next)
 
 TAILQ_HEAD(rte_hash_list, rte_tailq_entry);
 
@@ -63,6 +67,14 @@ rte_hash_find_existing(const char *name)
 	return h;
 }
 
+static inline struct rte_hash_bucket *
+rte_hash_get_last_bkt(struct rte_hash_bucket *lst_bkt)
+{
+	while (lst_bkt->next != NULL)
+		lst_bkt = lst_bkt->next;
+	return lst_bkt;
+}
+
 void rte_hash_set_cmp_func(struct rte_hash *h, rte_hash_cmp_eq_t func)
 {
 	h->cmp_jump_table_idx = KEY_CUSTOM;
@@ -85,13 +97,17 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	struct rte_tailq_entry *te = NULL;
 	struct rte_hash_list *hash_list;
 	struct rte_ring *r = NULL;
+	struct rte_ring *r_ext = NULL;
 	char hash_name[RTE_HASH_NAMESIZE];
 	void *k = NULL;
 	void *buckets = NULL;
+	void *buckets_ext = NULL;
 	char ring_name[RTE_RING_NAMESIZE];
+	char ext_ring_name[RTE_RING_NAMESIZE];
 	unsigned num_key_slots;
 	unsigned i;
 	unsigned int hw_trans_mem_support = 0, multi_writer_support = 0;
+	unsigned int ext_table_support = 0;
 	unsigned int readwrite_concur_support = 0;
 
 	rte_hash_function default_hash_func = (rte_hash_function)rte_jhash;
@@ -124,6 +140,9 @@ rte_hash_create(const struct rte_hash_parameters *params)
 		multi_writer_support = 1;
 	}
 
+	if (params->extra_flag & RTE_HASH_EXTRA_FLAGS_EXT_TABLE)
+		ext_table_support = 1;
+
 	/* Store all keys and leave the first entry as a dummy entry for lookup_bulk */
 	if (multi_writer_support)
 		/*
@@ -145,6 +164,24 @@ rte_hash_create(const struct rte_hash_parameters *params)
 		goto err;
 	}
 
+	const uint32_t num_buckets = rte_align32pow2(params->entries) /
+						RTE_HASH_BUCKET_ENTRIES;
+
+	/* Create ring for extendable buckets. */
+	if (ext_table_support) {
+		snprintf(ext_ring_name, sizeof(ext_ring_name), "HT_EXT_%s",
+								params->name);
+		r_ext = rte_ring_create(ext_ring_name,
+				rte_align32pow2(num_buckets + 1),
+				params->socket_id, 0);
+
+		if (r_ext == NULL) {
+			RTE_LOG(ERR, HASH, "ext buckets memory allocation "
+								"failed\n");
+			goto err;
+		}
+	}
+
 	snprintf(hash_name, sizeof(hash_name), "HT_%s", params->name);
 
 	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
@@ -177,18 +214,34 @@ rte_hash_create(const struct rte_hash_parameters *params)
 		goto err_unlock;
 	}
 
-	const uint32_t num_buckets = rte_align32pow2(params->entries)
-					/ RTE_HASH_BUCKET_ENTRIES;
-
 	buckets = rte_zmalloc_socket(NULL,
 				num_buckets * sizeof(struct rte_hash_bucket),
 				RTE_CACHE_LINE_SIZE, params->socket_id);
 
 	if (buckets == NULL) {
-		RTE_LOG(ERR, HASH, "memory allocation failed\n");
+		RTE_LOG(ERR, HASH, "buckets memory allocation failed\n");
 		goto err_unlock;
 	}
 
+	/* Allocate same number of extendable buckets */
+	if (ext_table_support) {
+		buckets_ext = rte_zmalloc_socket(NULL,
+				num_buckets * sizeof(struct rte_hash_bucket),
+				RTE_CACHE_LINE_SIZE, params->socket_id);
+		if (buckets_ext == NULL) {
+			RTE_LOG(ERR, HASH, "ext buckets memory allocation "
+							"failed\n");
+			goto err_unlock;
+		}
+		/* Populate ext bkt ring. We reserve 0, similar to the
+		 * key-data slot, in case in the future we want to use
+		 * the bucket index for the linked list, with 0 meaning
+		 * NULL for the next bucket.
+		 */
+		for (i = 1; i <= num_buckets; i++)
+			rte_ring_sp_enqueue(r_ext, (void *)((uintptr_t) i));
+	}
+
 	const uint32_t key_entry_size = sizeof(struct rte_hash_key) + params->key_len;
 	const uint64_t key_tbl_size = (uint64_t) key_entry_size * num_key_slots;
 
@@ -262,6 +315,8 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	h->num_buckets = num_buckets;
 	h->bucket_bitmask = h->num_buckets - 1;
 	h->buckets = buckets;
+	h->buckets_ext = buckets_ext;
+	h->free_ext_bkts = r_ext;
 	h->hash_func = (params->hash_func == NULL) ?
 		default_hash_func : params->hash_func;
 	h->key_store = k;
@@ -269,6 +324,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	h->hw_trans_mem_support = hw_trans_mem_support;
 	h->multi_writer_support = multi_writer_support;
 	h->readwrite_concur_support = readwrite_concur_support;
+	h->ext_table_support = ext_table_support;
 
 #if defined(RTE_ARCH_X86)
 	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
@@ -304,9 +360,11 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
 err:
 	rte_ring_free(r);
+	rte_ring_free(r_ext);
 	rte_free(te);
 	rte_free(h);
 	rte_free(buckets);
+	rte_free(buckets_ext);
 	rte_free(k);
 	return NULL;
 }
@@ -344,8 +402,10 @@ rte_hash_free(struct rte_hash *h)
 		rte_free(h->readwrite_lock);
 	}
 	rte_ring_free(h->free_slots);
+	rte_ring_free(h->free_ext_bkts);
 	rte_free(h->key_store);
 	rte_free(h->buckets);
+	rte_free(h->buckets_ext);
 	rte_free(h);
 	rte_free(te);
 }
@@ -403,7 +463,6 @@ __hash_rw_writer_lock(const struct rte_hash *h)
 		rte_rwlock_write_lock(h->readwrite_lock);
 }
 
-
 static inline void
 __hash_rw_reader_lock(const struct rte_hash *h)
 {
@@ -448,6 +507,14 @@ rte_hash_reset(struct rte_hash *h)
 	while (rte_ring_dequeue(h->free_slots, &ptr) == 0)
 		rte_pause();
 
+	/* clear free extendable bucket ring and memory */
+	if (h->ext_table_support) {
+		memset(h->buckets_ext, 0, h->num_buckets *
+						sizeof(struct rte_hash_bucket));
+		while (rte_ring_dequeue(h->free_ext_bkts, &ptr) == 0)
+			rte_pause();
+	}
+
 	/* Repopulate the free slots ring. Entry zero is reserved for key misses */
 	if (h->multi_writer_support)
 		tot_ring_cnt = h->entries + (RTE_MAX_LCORE - 1) *
@@ -458,6 +525,13 @@ rte_hash_reset(struct rte_hash *h)
 	for (i = 1; i < tot_ring_cnt + 1; i++)
 		rte_ring_sp_enqueue(h->free_slots, (void *)((uintptr_t) i));
 
+	/* Repopulate the free ext bkt ring. */
+	if (h->ext_table_support) {
+		for (i = 1; i <= h->num_buckets; i++)
+			rte_ring_sp_enqueue(h->free_ext_bkts,
+						(void *)((uintptr_t) i));
+	}
+
 	if (h->multi_writer_support) {
 		/* Reset local caches per lcore */
 		for (i = 0; i < RTE_MAX_LCORE; i++)
@@ -524,24 +598,27 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 		int32_t *ret_val)
 {
 	unsigned int i;
-	struct rte_hash_bucket *cur_bkt = prim_bkt;
+	struct rte_hash_bucket *cur_bkt;
 	int32_t ret;
 
 	__hash_rw_writer_lock(h);
 	/* Check if key was inserted after last check but before this
 	 * protected region in case of inserting duplicated keys.
 	 */
-	ret = search_and_update(h, data, key, cur_bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		*ret_val = ret;
 		return 1;
 	}
-	ret = search_and_update(h, data, key, sec_bkt, alt_hash, sig);
-	if (ret != -1) {
-		__hash_rw_writer_unlock(h);
-		*ret_val = ret;
-		return 1;
+
+	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
+		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		if (ret != -1) {
+			__hash_rw_writer_unlock(h);
+			*ret_val = ret;
+			return 1;
+		}
 	}
 
 	/* Insert new entry if there is room in the primary
@@ -580,7 +657,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 			int32_t *ret_val)
 {
 	uint32_t prev_alt_bkt_idx;
-	struct rte_hash_bucket *cur_bkt = bkt;
+	struct rte_hash_bucket *cur_bkt;
 	struct queue_node *prev_node, *curr_node = leaf;
 	struct rte_hash_bucket *prev_bkt, *curr_bkt = leaf->bkt;
 	uint32_t prev_slot, curr_slot = leaf_slot;
@@ -597,18 +674,20 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 	/* Check if key was inserted after last check but before this
 	 * protected region.
 	 */
-	ret = search_and_update(h, data, key, cur_bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, bkt, sig, alt_hash);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		*ret_val = ret;
 		return 1;
 	}
 
-	ret = search_and_update(h, data, key, alt_bkt, alt_hash, sig);
-	if (ret != -1) {
-		__hash_rw_writer_unlock(h);
-		*ret_val = ret;
-		return 1;
+	FOR_EACH_BUCKET(cur_bkt, alt_bkt) {
+		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		if (ret != -1) {
+			__hash_rw_writer_unlock(h);
+			*ret_val = ret;
+			return 1;
+		}
 	}
 
 	while (likely(curr_node->prev != NULL)) {
@@ -711,15 +790,18 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 {
 	hash_sig_t alt_hash;
 	uint32_t prim_bucket_idx, sec_bucket_idx;
-	struct rte_hash_bucket *prim_bkt, *sec_bkt;
+	struct rte_hash_bucket *prim_bkt, *sec_bkt, *cur_bkt;
 	struct rte_hash_key *new_k, *keys = h->key_store;
 	void *slot_id = NULL;
-	uint32_t new_idx;
+	void *ext_bkt_id = NULL;
+	uint32_t new_idx, bkt_id;
 	int ret;
 	unsigned n_slots;
 	unsigned lcore_id;
+	unsigned int i;
 	struct lcore_cache *cached_free_slots = NULL;
 	int32_t ret_val;
+	struct rte_hash_bucket *last;
 
 	prim_bucket_idx = sig & h->bucket_bitmask;
 	prim_bkt = &h->buckets[prim_bucket_idx];
@@ -739,10 +821,12 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	}
 
 	/* Check if key is already inserted in secondary location */
-	ret = search_and_update(h, data, key, sec_bkt, alt_hash, sig);
-	if (ret != -1) {
-		__hash_rw_writer_unlock(h);
-		return ret;
+	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
+		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		if (ret != -1) {
+			__hash_rw_writer_unlock(h);
+			return ret;
+		}
 	}
 	__hash_rw_writer_unlock(h);
 
@@ -808,10 +892,70 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	else if (ret == 1) {
 		enqueue_slot_back(h, cached_free_slots, slot_id);
 		return ret_val;
-	} else {
+	}
+
+	/* if ext table not enabled, we failed the insertion */
+	if (!h->ext_table_support) {
 		enqueue_slot_back(h, cached_free_slots, slot_id);
 		return ret;
 	}
+
+	/* Now we need to go through the extendable buckets. A lock is
+	 * needed to protect all extendable bucket operations.
+	 */
+	__hash_rw_writer_lock(h);
+	/* Re-check for duplicates: key may have been inserted before the lock */
+	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
+	if (ret != -1) {
+		enqueue_slot_back(h, cached_free_slots, slot_id);
+		goto failure;
+	}
+
+	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
+		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		if (ret != -1) {
+			enqueue_slot_back(h, cached_free_slots, slot_id);
+			goto failure;
+		}
+	}
+
+	/* Search sec and ext buckets to find an empty entry to insert. */
+	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
+		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
+			/* Check if slot is available */
+			if (likely(cur_bkt->key_idx[i] == EMPTY_SLOT)) {
+				cur_bkt->sig_current[i] = alt_hash;
+				cur_bkt->sig_alt[i] = sig;
+				cur_bkt->key_idx[i] = new_idx;
+				__hash_rw_writer_unlock(h);
+				return new_idx - 1;
+			}
+		}
+	}
+
+	/* Failed to get an empty entry from extendable buckets. Link a new
+	 * extendable bucket. We first get a free bucket from ring.
+	 */
+	if (rte_ring_sc_dequeue(h->free_ext_bkts, &ext_bkt_id) != 0) {
+		ret = -ENOSPC;
+		goto failure;
+	}
+
+	bkt_id = (uint32_t)((uintptr_t)ext_bkt_id) - 1;
+	/* Use the first location of the new bucket */
+	(h->buckets_ext[bkt_id]).sig_current[0] = alt_hash;
+	(h->buckets_ext[bkt_id]).sig_alt[0] = sig;
+	(h->buckets_ext[bkt_id]).key_idx[0] = new_idx;
+	/* Link the new bucket to sec bucket linked list */
+	last = rte_hash_get_last_bkt(sec_bkt);
+	last->next = &h->buckets_ext[bkt_id];
+	__hash_rw_writer_unlock(h);
+	return new_idx - 1;
+
+failure:
+	__hash_rw_writer_unlock(h);
+	return ret;
+
 }
 
 int32_t
@@ -890,7 +1034,7 @@ __rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
 {
 	uint32_t bucket_idx;
 	hash_sig_t alt_hash;
-	struct rte_hash_bucket *bkt;
+	struct rte_hash_bucket *bkt, *cur_bkt;
 	int ret;
 
 	bucket_idx = sig & h->bucket_bitmask;
@@ -910,10 +1054,12 @@ __rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
 	bkt = &h->buckets[bucket_idx];
 
 	/* Check if key is in secondary location */
-	ret = search_one_bucket(h, key, alt_hash, data, bkt);
-	if (ret != -1) {
-		__hash_rw_reader_unlock(h);
-		return ret;
+	FOR_EACH_BUCKET(cur_bkt, bkt) {
+		ret = search_one_bucket(h, key, alt_hash, data, cur_bkt);
+		if (ret != -1) {
+			__hash_rw_reader_unlock(h);
+			return ret;
+		}
 	}
 	__hash_rw_reader_unlock(h);
 	return -ENOENT;
@@ -978,16 +1124,42 @@ remove_entry(const struct rte_hash *h, struct rte_hash_bucket *bkt, unsigned i)
 	}
 }
 
+/* Compact the linked list by moving key from last entry in linked list to the
+ * empty slot.
+ */
+static inline void
+__rte_hash_compact_ll(struct rte_hash_bucket *cur_bkt, int pos) {
+	int i;
+	struct rte_hash_bucket *last_bkt;
+
+	if (!cur_bkt->next)
+		return;
+
+	last_bkt = rte_hash_get_last_bkt(cur_bkt);
+
+	for (i = RTE_HASH_BUCKET_ENTRIES - 1; i >= 0; i--) {
+		if (last_bkt->key_idx[i] != EMPTY_SLOT) {
+			cur_bkt->key_idx[pos] = last_bkt->key_idx[i];
+			cur_bkt->sig_current[pos] = last_bkt->sig_current[i];
+			cur_bkt->sig_alt[pos] = last_bkt->sig_alt[i];
+			last_bkt->sig_current[i] = NULL_SIGNATURE;
+			last_bkt->sig_alt[i] = NULL_SIGNATURE;
+			last_bkt->key_idx[i] = EMPTY_SLOT;
+			return;
+		}
+	}
+}
+
 /* Search one bucket and remove the matched key */
 static inline int32_t
 search_and_remove(const struct rte_hash *h, const void *key,
-			struct rte_hash_bucket *bkt, hash_sig_t sig)
+			struct rte_hash_bucket *bkt, hash_sig_t sig, int *pos)
 {
 	struct rte_hash_key *k, *keys = h->key_store;
 	unsigned int i;
 	int32_t ret;
 
-	/* Check if key is in primary location */
+	/* Check if key is in bucket */
 	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
 		if (bkt->sig_current[i] == sig &&
 				bkt->key_idx[i] != EMPTY_SLOT) {
@@ -996,12 +1168,12 @@ search_and_remove(const struct rte_hash *h, const void *key,
 			if (rte_hash_cmp_eq(key, k->key, h) == 0) {
 				remove_entry(h, bkt, i);
 
-				/*
-				 * Return index where key is stored,
+				/* Return index where key is stored,
 				 * subtracting the first dummy index
 				 */
 				ret = bkt->key_idx[i] - 1;
 				bkt->key_idx[i] = EMPTY_SLOT;
+				*pos = i;
 				return ret;
 			}
 		}
@@ -1015,34 +1187,66 @@ __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
 {
 	uint32_t bucket_idx;
 	hash_sig_t alt_hash;
-	struct rte_hash_bucket *bkt;
-	int32_t ret;
+	struct rte_hash_bucket *prim_bkt, *sec_bkt, *prev_bkt, *last_bkt;
+	struct rte_hash_bucket *cur_bkt;
+	int pos;
+	int32_t ret, i;
 
 	bucket_idx = sig & h->bucket_bitmask;
-	bkt = &h->buckets[bucket_idx];
+	prim_bkt = &h->buckets[bucket_idx];
 
 	__hash_rw_writer_lock(h);
 	/* look for key in primary bucket */
-	ret = search_and_remove(h, key, bkt, sig);
+	ret = search_and_remove(h, key, prim_bkt, sig, &pos);
 	if (ret != -1) {
-		__hash_rw_writer_unlock(h);
-		return ret;
+		__rte_hash_compact_ll(prim_bkt, pos);
+		last_bkt = prim_bkt->next;
+		prev_bkt = prim_bkt;
+		goto return_bkt;
 	}
 
 	/* Calculate secondary hash */
 	alt_hash = rte_hash_secondary_hash(sig);
 	bucket_idx = alt_hash & h->bucket_bitmask;
-	bkt = &h->buckets[bucket_idx];
+	sec_bkt = &h->buckets[bucket_idx];
+
+	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
+		ret = search_and_remove(h, key, cur_bkt, alt_hash, &pos);
+		if (ret != -1) {
+			__rte_hash_compact_ll(cur_bkt, pos);
+			last_bkt = sec_bkt->next;
+			prev_bkt = sec_bkt;
+			goto return_bkt;
+		}
+	}
 
-	/* look for key in secondary bucket */
-	ret = search_and_remove(h, key, bkt, alt_hash);
-	if (ret != -1) {
+	__hash_rw_writer_unlock(h);
+	return -ENOENT;
+
+/* Search the last bucket to see if it is empty and can be recycled */
+return_bkt:
+	if (!last_bkt) {
 		__hash_rw_writer_unlock(h);
 		return ret;
 	}
+	while (last_bkt->next) {
+		prev_bkt = last_bkt;
+		last_bkt = last_bkt->next;
+	}
+
+	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
+		if (last_bkt->key_idx[i] != EMPTY_SLOT)
+			break;
+	}
+	/* found empty bucket and recycle */
+	if (i == RTE_HASH_BUCKET_ENTRIES) {
+		prev_bkt->next = last_bkt->next = NULL;
+		uint32_t index = last_bkt - h->buckets_ext + 1;
+		rte_ring_sp_enqueue(h->free_ext_bkts, (void *)(uintptr_t)index);
+	}
 
 	__hash_rw_writer_unlock(h);
-	return -ENOENT;
+	return ret;
 }
 
 int32_t
@@ -1143,12 +1347,14 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 {
 	uint64_t hits = 0;
 	int32_t i;
+	int32_t ret;
 	uint32_t prim_hash[RTE_HASH_LOOKUP_BULK_MAX];
 	uint32_t sec_hash[RTE_HASH_LOOKUP_BULK_MAX];
 	const struct rte_hash_bucket *primary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
 	const struct rte_hash_bucket *secondary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
 	uint32_t prim_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
 	uint32_t sec_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
+	struct rte_hash_bucket *cur_bkt, *next_bkt;
 
 	/* Prefetch first keys */
 	for (i = 0; i < PREFETCH_OFFSET && i < num_keys; i++)
@@ -1266,6 +1472,34 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 		continue;
 	}
 
+	/* all found, do not need to go through ext bkt */
+	if ((hits == ((1ULL << num_keys) - 1)) || !h->ext_table_support) {
+		if (hit_mask != NULL)
+			*hit_mask = hits;
+		__hash_rw_reader_unlock(h);
+		return;
+	}
+
+	/* need to check ext buckets for match */
+	for (i = 0; i < num_keys; i++) {
+		if ((hits & (1ULL << i)) != 0)
+			continue;
+		next_bkt = secondary_bkt[i]->next;
+		FOR_EACH_BUCKET(cur_bkt, next_bkt) {
+			if (data != NULL)
+				ret = search_one_bucket(h, keys[i],
+						sec_hash[i], &data[i], cur_bkt);
+			else
+				ret = search_one_bucket(h, keys[i],
+						sec_hash[i], NULL, cur_bkt);
+			if (ret != -1) {
+				positions[i] = ret;
+				hits |= 1ULL << i;
+				break;
+			}
+		}
+	}
+
 	__hash_rw_reader_unlock(h);
 
 	if (hit_mask != NULL)
@@ -1308,10 +1542,13 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
 
 	RETURN_IF_TRUE(((h == NULL) || (next == NULL)), -EINVAL);
 
-	const uint32_t total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
-	/* Out of bounds */
-	if (*next >= total_entries)
-		return -ENOENT;
+	const uint32_t total_entries_main = h->num_buckets *
+							RTE_HASH_BUCKET_ENTRIES;
+	const uint32_t total_entries = total_entries_main << 1;
+
+	/* Out of bounds of all buckets (both main table and ext table) */
+	if (*next >= total_entries_main)
+		goto extend_table;
 
 	/* Calculate bucket and index of current iterator */
 	bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
@@ -1321,8 +1558,8 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
 	while ((position = h->buckets[bucket_idx].key_idx[idx]) == EMPTY_SLOT) {
 		(*next)++;
 		/* End of table */
-		if (*next == total_entries)
-			return -ENOENT;
+		if (*next == total_entries_main)
+			goto extend_table;
 		bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
 		idx = *next % RTE_HASH_BUCKET_ENTRIES;
 	}
@@ -1340,4 +1577,34 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
 	(*next)++;
 
 	return position - 1;
+
+/* Begin to iterate extendable buckets */
+extend_table:
+	/* Out of total bounds or the ext bucket feature is not enabled */
+	if (*next >= total_entries || !h->ext_table_support)
+		return -ENOENT;
+
+	bucket_idx = (*next - total_entries_main) / RTE_HASH_BUCKET_ENTRIES;
+	idx = (*next - total_entries_main) % RTE_HASH_BUCKET_ENTRIES;
+
+	while ((position = h->buckets_ext[bucket_idx].key_idx[idx]) == EMPTY_SLOT) {
+		(*next)++;
+		if (*next == total_entries)
+			return -ENOENT;
+		bucket_idx = (*next - total_entries_main) /
+						RTE_HASH_BUCKET_ENTRIES;
+		idx = (*next - total_entries_main) % RTE_HASH_BUCKET_ENTRIES;
+	}
+	__hash_rw_reader_lock(h);
+	next_key = (struct rte_hash_key *) ((char *)h->key_store +
+				position * h->key_entry_size);
+	/* Return key and data */
+	*key = next_key->key;
+	*data = next_key->pdata;
+
+	__hash_rw_reader_unlock(h);
+
+	/* Increment iterator */
+	(*next)++;
+	return position - 1;
 }
diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
index fc0e5c2..e601520 100644
--- a/lib/librte_hash/rte_cuckoo_hash.h
+++ b/lib/librte_hash/rte_cuckoo_hash.h
@@ -142,6 +142,8 @@ struct rte_hash_bucket {
 	hash_sig_t sig_alt[RTE_HASH_BUCKET_ENTRIES];
 
 	uint8_t flag[RTE_HASH_BUCKET_ENTRIES];
+
+	void *next;
 } __rte_cache_aligned;
 
 /** A hash table structure. */
@@ -166,6 +168,7 @@ struct rte_hash {
 	/**< If multi-writer support is enabled. */
 	uint8_t readwrite_concur_support;
 	/**< If read-write concurrency support is enabled */
+	uint8_t ext_table_support;     /**< Enable extendable bucket table */
 	rte_hash_function hash_func;    /**< Function used to calculate hash. */
 	uint32_t hash_func_init_val;    /**< Init value used by hash_func. */
 	rte_hash_cmp_eq_t rte_hash_custom_cmp_eq;
@@ -184,6 +187,8 @@ struct rte_hash {
 	 * to the key table.
 	 */
 	rte_rwlock_t *readwrite_lock; /**< Read-write lock thread-safety. */
+	struct rte_hash_bucket *buckets_ext; /**< Extra buckets array */
+	struct rte_ring *free_ext_bkts; /**< Ring of indexes of free buckets */
 } __rte_cache_aligned;
 
 struct queue_node {
diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
index 9e7d931..11d8e28 100644
--- a/lib/librte_hash/rte_hash.h
+++ b/lib/librte_hash/rte_hash.h
@@ -37,6 +37,9 @@ extern "C" {
 /** Flag to support reader writer concurrency */
 #define RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY 0x04
 
+/** Flag to indicate the extendable bucket table feature should be used */
+#define RTE_HASH_EXTRA_FLAGS_EXT_TABLE 0x08
+
 /** Signature of key that is stored internally. */
 typedef uint32_t hash_sig_t;
 
-- 
2.7.4


* [dpdk-dev] [PATCH v8 3/4] test/hash: implement extendable bucket hash test
  2018-10-22 18:39 [dpdk-dev] [PATCH v8 0/4] hash: add extendable bucket and partial key hashing Yipeng Wang
  2018-10-22 18:39 ` [dpdk-dev] [PATCH v8 1/4] hash: fix race condition in iterate Yipeng Wang
  2018-10-22 18:39 ` [dpdk-dev] [PATCH v8 2/4] hash: add extendable bucket feature Yipeng Wang
@ 2018-10-22 18:39 ` Yipeng Wang
  2018-10-23  9:04   ` Bruce Richardson
  2018-10-22 18:39 ` [dpdk-dev] [PATCH v8 4/4] hash: use partial-key hashing Yipeng Wang
  2018-10-25 22:35 ` [dpdk-dev] [PATCH v8 0/4] hash: add extendable bucket and partial key hashing Thomas Monjalon
  4 siblings, 1 reply; 10+ messages in thread
From: Yipeng Wang @ 2018-10-22 18:39 UTC (permalink / raw)
  To: bruce.richardson; +Cc: dev, yipeng1.wang, honnappa.nagarahalli

This commit changes the current rte_hash unit test to
test the extendable table feature and performance.

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
---
 test/test/test_hash.c      | 159 +++++++++++++++++++++++++++++++++++++++++++--
 test/test/test_hash_perf.c | 114 +++++++++++++++++++++++---------
 2 files changed, 238 insertions(+), 35 deletions(-)

diff --git a/test/test/test_hash.c b/test/test/test_hash.c
index b3db9fd..815c734 100644
--- a/test/test/test_hash.c
+++ b/test/test/test_hash.c
@@ -660,6 +660,116 @@ static int test_full_bucket(void)
 	return 0;
 }
 
+/*
+ * Similar to the test above (full bucket test), but for extendable buckets.
+ */
+static int test_extendable_bucket(void)
+{
+	struct rte_hash_parameters params_pseudo_hash = {
+		.name = "test5",
+		.entries = 64,
+		.key_len = sizeof(struct flow_key), /* 13 */
+		.hash_func = pseudo_hash,
+		.hash_func_init_val = 0,
+		.socket_id = 0,
+		.extra_flag = RTE_HASH_EXTRA_FLAGS_EXT_TABLE
+	};
+	struct rte_hash *handle;
+	int pos[64];
+	int expected_pos[64];
+	unsigned int i;
+	struct flow_key rand_keys[64];
+
+	for (i = 0; i < 64; i++) {
+		rand_keys[i].port_dst = i;
+		rand_keys[i].port_src = i+1;
+	}
+
+	handle = rte_hash_create(&params_pseudo_hash);
+	RETURN_IF_ERROR(handle == NULL, "hash creation failed");
+
+	/* Fill bucket */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_add_key(handle, &rand_keys[i]);
+		print_key_info("Add", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] < 0,
+			"failed to add key (pos[%u]=%d)", i, pos[i]);
+		expected_pos[i] = pos[i];
+	}
+
+	/* Lookup */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_lookup(handle, &rand_keys[i]);
+		print_key_info("Lkp", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] != expected_pos[i],
+			"failed to find key (pos[%u]=%d)", i, pos[i]);
+	}
+
+	/* Add - update */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_add_key(handle, &rand_keys[i]);
+		print_key_info("Add", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] != expected_pos[i],
+			"failed to add key (pos[%u]=%d)", i, pos[i]);
+	}
+
+	/* Lookup */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_lookup(handle, &rand_keys[i]);
+		print_key_info("Lkp", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] != expected_pos[i],
+			"failed to find key (pos[%u]=%d)", i, pos[i]);
+	}
+
+	/* Delete 1 key, check other keys are still found */
+	pos[35] = rte_hash_del_key(handle, &rand_keys[35]);
+	print_key_info("Del", &rand_keys[35], pos[35]);
+	RETURN_IF_ERROR(pos[35] != expected_pos[35],
+			"failed to delete key (pos[1]=%d)", pos[35]);
+	pos[20] = rte_hash_lookup(handle, &rand_keys[20]);
+	print_key_info("Lkp", &rand_keys[20], pos[20]);
+	RETURN_IF_ERROR(pos[20] != expected_pos[20],
+			"failed lookup after deleting key from same bucket "
+			"(pos[20]=%d)", pos[20]);
+
+	/* Go back to previous state */
+	pos[35] = rte_hash_add_key(handle, &rand_keys[35]);
+	print_key_info("Add", &rand_keys[35], pos[35]);
+	expected_pos[35] = pos[35];
+	RETURN_IF_ERROR(pos[35] < 0, "failed to add key (pos[1]=%d)", pos[35]);
+
+	/* Delete */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_del_key(handle, &rand_keys[i]);
+		print_key_info("Del", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] != expected_pos[i],
+			"failed to delete key (pos[%u]=%d)", i, pos[i]);
+	}
+
+	/* Lookup */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_lookup(handle, &rand_keys[i]);
+		print_key_info("Lkp", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] != -ENOENT,
+			"fail: found non-existent key (pos[%u]=%d)", i, pos[i]);
+	}
+
+	/* Add again */
+	for (i = 0; i < 64; i++) {
+		pos[i] = rte_hash_add_key(handle, &rand_keys[i]);
+		print_key_info("Add", &rand_keys[i], pos[i]);
+		RETURN_IF_ERROR(pos[i] < 0,
+			"failed to add key (pos[%u]=%d)", i, pos[i]);
+		expected_pos[i] = pos[i];
+	}
+
+	rte_hash_free(handle);
+
+	/* Cover the NULL case. */
+	rte_hash_free(0);
+	return 0;
+}
+
 /******************************************************************************/
 static int
 fbk_hash_unit_test(void)
@@ -1096,7 +1206,7 @@ test_hash_creation_with_good_parameters(void)
  * Test to see the average table utilization (entries added/max entries)
  * before hitting a random entry that cannot be added
  */
-static int test_average_table_utilization(void)
+static int test_average_table_utilization(uint32_t ext_table)
 {
 	struct rte_hash *handle;
 	uint8_t simple_key[MAX_KEYSIZE];
@@ -1107,12 +1217,23 @@ static int test_average_table_utilization(void)
 
 	printf("\n# Running test to determine average utilization"
 	       "\n  before adding elements begins to fail\n");
+	if (ext_table)
+		printf("ext table is enabled\n");
+	else
+		printf("ext table is disabled\n");
+
 	printf("Measuring performance, please wait");
 	fflush(stdout);
 	ut_params.entries = 1 << 16;
 	ut_params.name = "test_average_utilization";
 	ut_params.hash_func = rte_jhash;
+	if (ext_table)
+		ut_params.extra_flag |= RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
+	else
+		ut_params.extra_flag &= ~RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
+
 	handle = rte_hash_create(&ut_params);
+
 	RETURN_IF_ERROR(handle == NULL, "hash creation failed");
 
 	for (j = 0; j < ITERATIONS; j++) {
@@ -1139,6 +1260,14 @@ static int test_average_table_utilization(void)
 			rte_hash_free(handle);
 			return -1;
 		}
+		if (ext_table) {
+			if (cnt != ut_params.entries) {
+				printf("rte_hash_count returned wrong value "
+					"%u, %u, %u\n", j, added_keys, cnt);
+				rte_hash_free(handle);
+				return -1;
+			}
+		}
 
 		average_keys_added += added_keys;
 
@@ -1161,7 +1290,7 @@ static int test_average_table_utilization(void)
 }
 
 #define NUM_ENTRIES 256
-static int test_hash_iteration(void)
+static int test_hash_iteration(uint32_t ext_table)
 {
 	struct rte_hash *handle;
 	unsigned i;
@@ -1177,6 +1306,11 @@ static int test_hash_iteration(void)
 	ut_params.name = "test_hash_iteration";
 	ut_params.hash_func = rte_jhash;
 	ut_params.key_len = 16;
+	if (ext_table)
+		ut_params.extra_flag |= RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
+	else
+		ut_params.extra_flag &= ~RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
+
 	handle = rte_hash_create(&ut_params);
 	RETURN_IF_ERROR(handle == NULL, "hash creation failed");
 
@@ -1186,8 +1320,13 @@ static int test_hash_iteration(void)
 		for (i = 0; i < ut_params.key_len; i++)
 			keys[added_keys][i] = rte_rand() % 255;
 		ret = rte_hash_add_key_data(handle, keys[added_keys], data[added_keys]);
-		if (ret < 0)
+		if (ret < 0) {
+			if (ext_table) {
+				printf("Insertion failed for ext table\n");
+				goto err;
+			}
 			break;
+		}
 	}
 
 	/* Iterate through the hash table */
@@ -1474,6 +1613,8 @@ test_hash(void)
 		return -1;
 	if (test_full_bucket() < 0)
 		return -1;
+	if (test_extendable_bucket() < 0)
+		return -1;
 
 	if (test_fbk_hash_find_existing() < 0)
 		return -1;
@@ -1483,9 +1624,17 @@ test_hash(void)
 		return -1;
 	if (test_hash_creation_with_good_parameters() < 0)
 		return -1;
-	if (test_average_table_utilization() < 0)
+
+	/* ext table disabled */
+	if (test_average_table_utilization(0) < 0)
+		return -1;
+	if (test_hash_iteration(0) < 0)
+		return -1;
+
+	/* ext table enabled */
+	if (test_average_table_utilization(1) < 0)
 		return -1;
-	if (test_hash_iteration() < 0)
+	if (test_hash_iteration(1) < 0)
 		return -1;
 
 	run_hash_func_tests();
diff --git a/test/test/test_hash_perf.c b/test/test/test_hash_perf.c
index 0d39e10..5252111 100644
--- a/test/test/test_hash_perf.c
+++ b/test/test/test_hash_perf.c
@@ -18,7 +18,8 @@
 #include "test.h"
 
 #define MAX_ENTRIES (1 << 19)
-#define KEYS_TO_ADD (MAX_ENTRIES * 3 / 4) /* 75% table utilization */
+#define KEYS_TO_ADD (MAX_ENTRIES)
+#define ADD_PERCENT 0.75 /* 75% table utilization */
 #define NUM_LOOKUPS (KEYS_TO_ADD * 5) /* Loop among keys added, several times */
 /* BUCKET_SIZE should be same as RTE_HASH_BUCKET_ENTRIES in rte_hash library */
 #define BUCKET_SIZE 8
@@ -78,7 +79,7 @@ static struct rte_hash_parameters ut_params = {
 
 static int
 create_table(unsigned int with_data, unsigned int table_index,
-		unsigned int with_locks)
+		unsigned int with_locks, unsigned int ext)
 {
 	char name[RTE_HASH_NAMESIZE];
 
@@ -96,6 +97,9 @@ create_table(unsigned int with_data, unsigned int table_index,
 	else
 		ut_params.extra_flag = 0;
 
+	if (ext)
+		ut_params.extra_flag |= RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
+
 	ut_params.name = name;
 	ut_params.key_len = hashtest_key_lens[table_index];
 	ut_params.socket_id = rte_socket_id();
@@ -117,15 +121,21 @@ create_table(unsigned int with_data, unsigned int table_index,
 
 /* Shuffle the keys that have been added, so lookups will be totally random */
 static void
-shuffle_input_keys(unsigned table_index)
+shuffle_input_keys(unsigned int table_index, unsigned int ext)
 {
 	unsigned i;
 	uint32_t swap_idx;
 	uint8_t temp_key[MAX_KEYSIZE];
 	hash_sig_t temp_signature;
 	int32_t temp_position;
+	unsigned int keys_to_add;
+
+	if (!ext)
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+	else
+		keys_to_add = KEYS_TO_ADD;
 
-	for (i = KEYS_TO_ADD - 1; i > 0; i--) {
+	for (i = keys_to_add - 1; i > 0; i--) {
 		swap_idx = rte_rand() % i;
 
 		memcpy(temp_key, keys[i], hashtest_key_lens[table_index]);
@@ -147,14 +157,20 @@ shuffle_input_keys(unsigned table_index)
  * ALL can fit in hash table (no errors)
  */
 static int
-get_input_keys(unsigned with_pushes, unsigned table_index)
+get_input_keys(unsigned int with_pushes, unsigned int table_index,
+							unsigned int ext)
 {
 	unsigned i, j;
 	unsigned bucket_idx, incr, success = 1;
 	uint8_t k = 0;
 	int32_t ret;
 	const uint32_t bucket_bitmask = NUM_BUCKETS - 1;
+	unsigned int keys_to_add;
 
+	if (!ext)
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+	else
+		keys_to_add = KEYS_TO_ADD;
 	/* Reset all arrays */
 	for (i = 0; i < MAX_ENTRIES; i++)
 		slot_taken[i] = 0;
@@ -171,7 +187,7 @@ get_input_keys(unsigned with_pushes, unsigned table_index)
 	 * Regardless a key has been added correctly or not (success),
 	 * the next one to try will be increased by 1.
 	 */
-	for (i = 0; i < KEYS_TO_ADD;) {
+	for (i = 0; i < keys_to_add;) {
 		incr = 0;
 		if (i != 0) {
 			keys[i][0] = ++k;
@@ -235,14 +251,20 @@ get_input_keys(unsigned with_pushes, unsigned table_index)
 }
 
 static int
-timed_adds(unsigned with_hash, unsigned with_data, unsigned table_index)
+timed_adds(unsigned int with_hash, unsigned int with_data,
+				unsigned int table_index, unsigned int ext)
 {
 	unsigned i;
 	const uint64_t start_tsc = rte_rdtsc();
 	void *data;
 	int32_t ret;
+	unsigned int keys_to_add;
+	if (!ext)
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+	else
+		keys_to_add = KEYS_TO_ADD;
 
-	for (i = 0; i < KEYS_TO_ADD; i++) {
+	for (i = 0; i < keys_to_add; i++) {
 		data = (void *) ((uintptr_t) signatures[i]);
 		if (with_hash && with_data) {
 			ret = rte_hash_add_key_with_hash_data(h[table_index],
@@ -284,22 +306,31 @@ timed_adds(unsigned with_hash, unsigned with_data, unsigned table_index)
 	const uint64_t end_tsc = rte_rdtsc();
 	const uint64_t time_taken = end_tsc - start_tsc;
 
-	cycles[table_index][ADD][with_hash][with_data] = time_taken/KEYS_TO_ADD;
+	cycles[table_index][ADD][with_hash][with_data] = time_taken/keys_to_add;
 
 	return 0;
 }
 
 static int
-timed_lookups(unsigned with_hash, unsigned with_data, unsigned table_index)
+timed_lookups(unsigned int with_hash, unsigned int with_data,
+				unsigned int table_index, unsigned int ext)
 {
 	unsigned i, j;
 	const uint64_t start_tsc = rte_rdtsc();
 	void *ret_data;
 	void *expected_data;
 	int32_t ret;
-
-	for (i = 0; i < NUM_LOOKUPS/KEYS_TO_ADD; i++) {
-		for (j = 0; j < KEYS_TO_ADD; j++) {
+	unsigned int keys_to_add, num_lookups;
+
+	if (!ext) {
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+		num_lookups = NUM_LOOKUPS * ADD_PERCENT;
+	} else {
+		keys_to_add = KEYS_TO_ADD;
+		num_lookups = NUM_LOOKUPS;
+	}
+	for (i = 0; i < num_lookups / keys_to_add; i++) {
+		for (j = 0; j < keys_to_add; j++) {
 			if (with_hash && with_data) {
 				ret = rte_hash_lookup_with_hash_data(h[table_index],
 							(const void *) keys[j],
@@ -352,13 +383,14 @@ timed_lookups(unsigned with_hash, unsigned with_data, unsigned table_index)
 	const uint64_t end_tsc = rte_rdtsc();
 	const uint64_t time_taken = end_tsc - start_tsc;
 
-	cycles[table_index][LOOKUP][with_hash][with_data] = time_taken/NUM_LOOKUPS;
+	cycles[table_index][LOOKUP][with_hash][with_data] = time_taken/num_lookups;
 
 	return 0;
 }
 
 static int
-timed_lookups_multi(unsigned with_data, unsigned table_index)
+timed_lookups_multi(unsigned int with_data, unsigned int table_index,
+							unsigned int ext)
 {
 	unsigned i, j, k;
 	int32_t positions_burst[BURST_SIZE];
@@ -367,11 +399,20 @@ timed_lookups_multi(unsigned with_data, unsigned table_index)
 	void *ret_data[BURST_SIZE];
 	uint64_t hit_mask;
 	int ret;
+	unsigned int keys_to_add, num_lookups;
+
+	if (!ext) {
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+		num_lookups = NUM_LOOKUPS * ADD_PERCENT;
+	} else {
+		keys_to_add = KEYS_TO_ADD;
+		num_lookups = NUM_LOOKUPS;
+	}
 
 	const uint64_t start_tsc = rte_rdtsc();
 
-	for (i = 0; i < NUM_LOOKUPS/KEYS_TO_ADD; i++) {
-		for (j = 0; j < KEYS_TO_ADD/BURST_SIZE; j++) {
+	for (i = 0; i < num_lookups/keys_to_add; i++) {
+		for (j = 0; j < keys_to_add/BURST_SIZE; j++) {
 			for (k = 0; k < BURST_SIZE; k++)
 				keys_burst[k] = keys[j * BURST_SIZE + k];
 			if (with_data) {
@@ -419,19 +460,25 @@ timed_lookups_multi(unsigned with_data, unsigned table_index)
 	const uint64_t end_tsc = rte_rdtsc();
 	const uint64_t time_taken = end_tsc - start_tsc;
 
-	cycles[table_index][LOOKUP_MULTI][0][with_data] = time_taken/NUM_LOOKUPS;
+	cycles[table_index][LOOKUP_MULTI][0][with_data] = time_taken/num_lookups;
 
 	return 0;
 }
 
 static int
-timed_deletes(unsigned with_hash, unsigned with_data, unsigned table_index)
+timed_deletes(unsigned int with_hash, unsigned int with_data,
+				unsigned int table_index, unsigned int ext)
 {
 	unsigned i;
 	const uint64_t start_tsc = rte_rdtsc();
 	int32_t ret;
+	unsigned int keys_to_add;
+	if (!ext)
+		keys_to_add = KEYS_TO_ADD * ADD_PERCENT;
+	else
+		keys_to_add = KEYS_TO_ADD;
 
-	for (i = 0; i < KEYS_TO_ADD; i++) {
+	for (i = 0; i < keys_to_add; i++) {
 		/* There are no delete functions with data, so just call two functions */
 		if (with_hash)
 			ret = rte_hash_del_key_with_hash(h[table_index],
@@ -451,7 +498,7 @@ timed_deletes(unsigned with_hash, unsigned with_data, unsigned table_index)
 	const uint64_t end_tsc = rte_rdtsc();
 	const uint64_t time_taken = end_tsc - start_tsc;
 
-	cycles[table_index][DELETE][with_hash][with_data] = time_taken/KEYS_TO_ADD;
+	cycles[table_index][DELETE][with_hash][with_data] = time_taken/keys_to_add;
 
 	return 0;
 }
@@ -469,7 +516,8 @@ reset_table(unsigned table_index)
 }
 
 static int
-run_all_tbl_perf_tests(unsigned int with_pushes, unsigned int with_locks)
+run_all_tbl_perf_tests(unsigned int with_pushes, unsigned int with_locks,
+						unsigned int ext)
 {
 	unsigned i, j, with_data, with_hash;
 
@@ -478,25 +526,25 @@ run_all_tbl_perf_tests(unsigned int with_pushes, unsigned int with_locks)
 
 	for (with_data = 0; with_data <= 1; with_data++) {
 		for (i = 0; i < NUM_KEYSIZES; i++) {
-			if (create_table(with_data, i, with_locks) < 0)
+			if (create_table(with_data, i, with_locks, ext) < 0)
 				return -1;
 
-			if (get_input_keys(with_pushes, i) < 0)
+			if (get_input_keys(with_pushes, i, ext) < 0)
 				return -1;
 			for (with_hash = 0; with_hash <= 1; with_hash++) {
-				if (timed_adds(with_hash, with_data, i) < 0)
+				if (timed_adds(with_hash, with_data, i, ext) < 0)
 					return -1;
 
 				for (j = 0; j < NUM_SHUFFLES; j++)
-					shuffle_input_keys(i);
+					shuffle_input_keys(i, ext);
 
-				if (timed_lookups(with_hash, with_data, i) < 0)
+				if (timed_lookups(with_hash, with_data, i, ext) < 0)
 					return -1;
 
-				if (timed_lookups_multi(with_data, i) < 0)
+				if (timed_lookups_multi(with_data, i, ext) < 0)
 					return -1;
 
-				if (timed_deletes(with_hash, with_data, i) < 0)
+				if (timed_deletes(with_hash, with_data, i, ext) < 0)
 					return -1;
 
 				/* Print a dot to show progress on operations */
@@ -632,10 +680,16 @@ test_hash_perf(void)
 				printf("\nALL ELEMENTS IN PRIMARY LOCATION\n");
 			else
 				printf("\nELEMENTS IN PRIMARY OR SECONDARY LOCATION\n");
-			if (run_all_tbl_perf_tests(with_pushes, with_locks) < 0)
+			if (run_all_tbl_perf_tests(with_pushes, with_locks, 0) < 0)
 				return -1;
 		}
 	}
+
+	printf("\n EXTENDABLE BUCKETS PERFORMANCE\n");
+
+	if (run_all_tbl_perf_tests(1, 0, 1) < 0)
+		return -1;
+
 	if (fbk_hash_perf_test() < 0)
 		return -1;
 
-- 
2.7.4


* [dpdk-dev] [PATCH v8 4/4] hash: use partial-key hashing
  2018-10-22 18:39 [dpdk-dev] [PATCH v8 0/4] hash: add extendable bucket and partial key hashing Yipeng Wang
                   ` (2 preceding siblings ...)
  2018-10-22 18:39 ` [dpdk-dev] [PATCH v8 3/4] test/hash: implement extendable bucket hash test Yipeng Wang
@ 2018-10-22 18:39 ` Yipeng Wang
  2018-10-23  9:07   ` Bruce Richardson
  2018-10-25 22:35 ` [dpdk-dev] [PATCH v8 0/4] hash: add extendable bucket and partial key hashing Thomas Monjalon
  4 siblings, 1 reply; 10+ messages in thread
From: Yipeng Wang @ 2018-10-22 18:39 UTC (permalink / raw)
  To: bruce.richardson; +Cc: dev, yipeng1.wang, honnappa.nagarahalli

This commit changes the hashing mechanism to "partial-key
hashing" to calculate the bucket index and signature of a key.

This was proposed in the paper by Bin Fan et al.,
"MemC3: Compact and Concurrent MemCache with Dumber Caching
and Smarter Hashing". Basically, the idea is to use XOR to
derive the alternative bucket from the current bucket index
and the signature.

With partial-key hashing, the bucket memory requirement drops
from two cache lines to one, which improves memory efficiency
and thus lookup speed.
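
The key property (matching the get_alt_bucket_index() helper below) is
that the XOR mapping is its own inverse, so either bucket can recompute
its partner from the stored 16-bit signature alone, e.g.:

	uint32_t alt  = get_alt_bucket_index(h, prim, sig); /* (prim ^ sig) & mask */
	uint32_t back = get_alt_bucket_index(h, alt, sig);
	/* back == prim: the mapping is an involution, which is why the
	 * per-entry sig_alt[] array can be dropped from the bucket.
	 */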

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
---
 lib/librte_hash/rte_cuckoo_hash.c | 246 +++++++++++++++++++-------------------
 lib/librte_hash/rte_cuckoo_hash.h |   6 +-
 lib/librte_hash/rte_hash.h        |   5 +-
 3 files changed, 131 insertions(+), 126 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index b872caa..9a48934 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -90,6 +90,36 @@ rte_hash_cmp_eq(const void *key1, const void *key2, const struct rte_hash *h)
 		return cmp_jump_table[h->cmp_jump_table_idx](key1, key2, h->key_len);
 }
 
+/*
+ * We use the higher 16 bits of the hash as the signature value stored
+ * in the table. We use the lower bits for the primary bucket
+ * location. Then we XOR the primary bucket location and the signature
+ * to get the secondary bucket location. This is the same as
+ * proposed in the paper by Bin Fan et al.,
+ * "MemC3: Compact and Concurrent MemCache with Dumber Caching and
+ * Smarter Hashing". The benefit of using
+ * XOR is that one can derive the alternative bucket location
+ * using only the current bucket location and the signature.
+ */
+static inline uint16_t
+get_short_sig(const hash_sig_t hash)
+{
+	return hash >> 16;
+}
+
+static inline uint32_t
+get_prim_bucket_index(const struct rte_hash *h, const hash_sig_t hash)
+{
+	return hash & h->bucket_bitmask;
+}
+
+static inline uint32_t
+get_alt_bucket_index(const struct rte_hash *h,
+			uint32_t cur_bkt_idx, uint16_t sig)
+{
+	return (cur_bkt_idx ^ sig) & h->bucket_bitmask;
+}
+
 struct rte_hash *
 rte_hash_create(const struct rte_hash_parameters *params)
 {
@@ -327,9 +357,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	h->ext_table_support = ext_table_support;
 
 #if defined(RTE_ARCH_X86)
-	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
-		h->sig_cmp_fn = RTE_HASH_COMPARE_AVX2;
-	else if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2))
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2))
 		h->sig_cmp_fn = RTE_HASH_COMPARE_SSE;
 	else
 #endif
@@ -417,18 +445,6 @@ rte_hash_hash(const struct rte_hash *h, const void *key)
 	return h->hash_func(key, h->key_len, h->hash_func_init_val);
 }
 
-/* Calc the secondary hash value from the primary hash value of a given key */
-static inline hash_sig_t
-rte_hash_secondary_hash(const hash_sig_t primary_hash)
-{
-	static const unsigned all_bits_shift = 12;
-	static const unsigned alt_bits_xor = 0x5bd1e995;
-
-	uint32_t tag = primary_hash >> all_bits_shift;
-
-	return primary_hash ^ ((tag + 1) * alt_bits_xor);
-}
-
 int32_t
 rte_hash_count(const struct rte_hash *h)
 {
@@ -560,14 +576,13 @@ enqueue_slot_back(const struct rte_hash *h,
 /* Search a key from bucket and update its data */
 static inline int32_t
 search_and_update(const struct rte_hash *h, void *data, const void *key,
-	struct rte_hash_bucket *bkt, hash_sig_t sig, hash_sig_t alt_hash)
+	struct rte_hash_bucket *bkt, uint16_t sig)
 {
 	int i;
 	struct rte_hash_key *k, *keys = h->key_store;
 
 	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
-		if (bkt->sig_current[i] == sig &&
-				bkt->sig_alt[i] == alt_hash) {
+		if (bkt->sig_current[i] == sig) {
 			k = (struct rte_hash_key *) ((char *)keys +
 					bkt->key_idx[i] * h->key_entry_size);
 			if (rte_hash_cmp_eq(key, k->key, h) == 0) {
@@ -594,7 +609,7 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 		struct rte_hash_bucket *prim_bkt,
 		struct rte_hash_bucket *sec_bkt,
 		const struct rte_hash_key *key, void *data,
-		hash_sig_t sig, hash_sig_t alt_hash, uint32_t new_idx,
+		uint16_t sig, uint32_t new_idx,
 		int32_t *ret_val)
 {
 	unsigned int i;
@@ -605,7 +620,7 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 	/* Check if key was inserted after last check but before this
 	 * protected region in case of inserting duplicated keys.
 	 */
-	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, prim_bkt, sig);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		*ret_val = ret;
@@ -613,7 +628,7 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 	}
 
 	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
-		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		ret = search_and_update(h, data, key, cur_bkt, sig);
 		if (ret != -1) {
 			__hash_rw_writer_unlock(h);
 			*ret_val = ret;
@@ -628,7 +643,6 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 		/* Check if slot is available */
 		if (likely(prim_bkt->key_idx[i] == EMPTY_SLOT)) {
 			prim_bkt->sig_current[i] = sig;
-			prim_bkt->sig_alt[i] = alt_hash;
 			prim_bkt->key_idx[i] = new_idx;
 			break;
 		}
@@ -653,7 +667,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 			struct rte_hash_bucket *alt_bkt,
 			const struct rte_hash_key *key, void *data,
 			struct queue_node *leaf, uint32_t leaf_slot,
-			hash_sig_t sig, hash_sig_t alt_hash, uint32_t new_idx,
+			uint16_t sig, uint32_t new_idx,
 			int32_t *ret_val)
 {
 	uint32_t prev_alt_bkt_idx;
@@ -674,7 +688,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 	/* Check if key was inserted after last check but before this
 	 * protected region.
 	 */
-	ret = search_and_update(h, data, key, bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, bkt, sig);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		*ret_val = ret;
@@ -682,7 +696,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 	}
 
 	FOR_EACH_BUCKET(cur_bkt, alt_bkt) {
-		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		ret = search_and_update(h, data, key, cur_bkt, sig);
 		if (ret != -1) {
 			__hash_rw_writer_unlock(h);
 			*ret_val = ret;
@@ -695,8 +709,9 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 		prev_bkt = prev_node->bkt;
 		prev_slot = curr_node->prev_slot;
 
-		prev_alt_bkt_idx =
-			prev_bkt->sig_alt[prev_slot] & h->bucket_bitmask;
+		prev_alt_bkt_idx = get_alt_bucket_index(h,
+					prev_node->cur_bkt_idx,
+					prev_bkt->sig_current[prev_slot]);
 
 		if (unlikely(&h->buckets[prev_alt_bkt_idx]
 				!= curr_bkt)) {
@@ -710,10 +725,8 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 		 * Cuckoo insert to move elements back to its
 		 * primary bucket if available
 		 */
-		curr_bkt->sig_alt[curr_slot] =
-			 prev_bkt->sig_current[prev_slot];
 		curr_bkt->sig_current[curr_slot] =
-			prev_bkt->sig_alt[prev_slot];
+			prev_bkt->sig_current[prev_slot];
 		curr_bkt->key_idx[curr_slot] =
 			prev_bkt->key_idx[prev_slot];
 
@@ -723,7 +736,6 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 	}
 
 	curr_bkt->sig_current[curr_slot] = sig;
-	curr_bkt->sig_alt[curr_slot] = alt_hash;
 	curr_bkt->key_idx[curr_slot] = new_idx;
 
 	__hash_rw_writer_unlock(h);
@@ -741,39 +753,44 @@ rte_hash_cuckoo_make_space_mw(const struct rte_hash *h,
 			struct rte_hash_bucket *bkt,
 			struct rte_hash_bucket *sec_bkt,
 			const struct rte_hash_key *key, void *data,
-			hash_sig_t sig, hash_sig_t alt_hash,
+			uint16_t sig, uint32_t bucket_idx,
 			uint32_t new_idx, int32_t *ret_val)
 {
 	unsigned int i;
 	struct queue_node queue[RTE_HASH_BFS_QUEUE_MAX_LEN];
 	struct queue_node *tail, *head;
 	struct rte_hash_bucket *curr_bkt, *alt_bkt;
+	uint32_t cur_idx, alt_idx;
 
 	tail = queue;
 	head = queue + 1;
 	tail->bkt = bkt;
 	tail->prev = NULL;
 	tail->prev_slot = -1;
+	tail->cur_bkt_idx = bucket_idx;
 
 	/* Cuckoo bfs Search */
 	while (likely(tail != head && head <
 					queue + RTE_HASH_BFS_QUEUE_MAX_LEN -
 					RTE_HASH_BUCKET_ENTRIES)) {
 		curr_bkt = tail->bkt;
+		cur_idx = tail->cur_bkt_idx;
 		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
 			if (curr_bkt->key_idx[i] == EMPTY_SLOT) {
 				int32_t ret = rte_hash_cuckoo_move_insert_mw(h,
 						bkt, sec_bkt, key, data,
-						tail, i, sig, alt_hash,
+						tail, i, sig,
 						new_idx, ret_val);
 				if (likely(ret != -1))
 					return ret;
 			}
 
 			/* Enqueue new node and keep prev node info */
-			alt_bkt = &(h->buckets[curr_bkt->sig_alt[i]
-						    & h->bucket_bitmask]);
+			alt_idx = get_alt_bucket_index(h, cur_idx,
+						curr_bkt->sig_current[i]);
+			alt_bkt = &(h->buckets[alt_idx]);
 			head->bkt = alt_bkt;
+			head->cur_bkt_idx = alt_idx;
 			head->prev = tail;
 			head->prev_slot = i;
 			head++;
@@ -788,7 +805,7 @@ static inline int32_t
 __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 						hash_sig_t sig, void *data)
 {
-	hash_sig_t alt_hash;
+	uint16_t short_sig;
 	uint32_t prim_bucket_idx, sec_bucket_idx;
 	struct rte_hash_bucket *prim_bkt, *sec_bkt, *cur_bkt;
 	struct rte_hash_key *new_k, *keys = h->key_store;
@@ -803,18 +820,17 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	int32_t ret_val;
 	struct rte_hash_bucket *last;
 
-	prim_bucket_idx = sig & h->bucket_bitmask;
+	short_sig = get_short_sig(sig);
+	prim_bucket_idx = get_prim_bucket_index(h, sig);
+	sec_bucket_idx = get_alt_bucket_index(h, prim_bucket_idx, short_sig);
 	prim_bkt = &h->buckets[prim_bucket_idx];
-	rte_prefetch0(prim_bkt);
-
-	alt_hash = rte_hash_secondary_hash(sig);
-	sec_bucket_idx = alt_hash & h->bucket_bitmask;
 	sec_bkt = &h->buckets[sec_bucket_idx];
+	rte_prefetch0(prim_bkt);
 	rte_prefetch0(sec_bkt);
 
 	/* Check if key is already inserted in primary location */
 	__hash_rw_writer_lock(h);
-	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, prim_bkt, short_sig);
 	if (ret != -1) {
 		__hash_rw_writer_unlock(h);
 		return ret;
@@ -822,12 +838,13 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Check if key is already inserted in secondary location */
 	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
-		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		ret = search_and_update(h, data, key, cur_bkt, short_sig);
 		if (ret != -1) {
 			__hash_rw_writer_unlock(h);
 			return ret;
 		}
 	}
+
 	__hash_rw_writer_unlock(h);
 
 	/* Did not find a match, so get a new slot for storing the new key */
@@ -865,7 +882,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Find an empty slot and insert */
 	ret = rte_hash_cuckoo_insert_mw(h, prim_bkt, sec_bkt, key, data,
-					sig, alt_hash, new_idx, &ret_val);
+					short_sig, new_idx, &ret_val);
 	if (ret == 0)
 		return new_idx - 1;
 	else if (ret == 1) {
@@ -875,7 +892,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Primary bucket full, need to make space for new entry */
 	ret = rte_hash_cuckoo_make_space_mw(h, prim_bkt, sec_bkt, key, data,
-					sig, alt_hash, new_idx, &ret_val);
+				short_sig, prim_bucket_idx, new_idx, &ret_val);
 	if (ret == 0)
 		return new_idx - 1;
 	else if (ret == 1) {
@@ -885,7 +902,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Also search secondary bucket to get better occupancy */
 	ret = rte_hash_cuckoo_make_space_mw(h, sec_bkt, prim_bkt, key, data,
-					alt_hash, sig, new_idx, &ret_val);
+				short_sig, sec_bucket_idx, new_idx, &ret_val);
 
 	if (ret == 0)
 		return new_idx - 1;
@@ -905,14 +922,14 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	 */
 	__hash_rw_writer_lock(h);
 	/* We check for duplicates again since could be inserted before the lock */
-	ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
+	ret = search_and_update(h, data, key, prim_bkt, short_sig);
 	if (ret != -1) {
 		enqueue_slot_back(h, cached_free_slots, slot_id);
 		goto failure;
 	}
 
 	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
-		ret = search_and_update(h, data, key, cur_bkt, alt_hash, sig);
+		ret = search_and_update(h, data, key, cur_bkt, short_sig);
 		if (ret != -1) {
 			enqueue_slot_back(h, cached_free_slots, slot_id);
 			goto failure;
@@ -924,8 +941,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
 			/* Check if slot is available */
 			if (likely(cur_bkt->key_idx[i] == EMPTY_SLOT)) {
-				cur_bkt->sig_current[i] = alt_hash;
-				cur_bkt->sig_alt[i] = sig;
+				cur_bkt->sig_current[i] = short_sig;
 				cur_bkt->key_idx[i] = new_idx;
 				__hash_rw_writer_unlock(h);
 				return new_idx - 1;
@@ -943,8 +959,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	bkt_id = (uint32_t)((uintptr_t)ext_bkt_id) - 1;
 	/* Use the first location of the new bucket */
-	(h->buckets_ext[bkt_id]).sig_current[0] = alt_hash;
-	(h->buckets_ext[bkt_id]).sig_alt[0] = sig;
+	(h->buckets_ext[bkt_id]).sig_current[0] = short_sig;
 	(h->buckets_ext[bkt_id]).key_idx[0] = new_idx;
 	/* Link the new bucket to sec bucket linked list */
 	last = rte_hash_get_last_bkt(sec_bkt);
@@ -1003,7 +1018,7 @@ rte_hash_add_key_data(const struct rte_hash *h, const void *key, void *data)
 
 /* Search one bucket to find the match key */
 static inline int32_t
-search_one_bucket(const struct rte_hash *h, const void *key, hash_sig_t sig,
+search_one_bucket(const struct rte_hash *h, const void *key, uint16_t sig,
 			void **data, const struct rte_hash_bucket *bkt)
 {
 	int i;
@@ -1032,30 +1047,30 @@ static inline int32_t
 __rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
 					hash_sig_t sig, void **data)
 {
-	uint32_t bucket_idx;
-	hash_sig_t alt_hash;
+	uint32_t prim_bucket_idx, sec_bucket_idx;
 	struct rte_hash_bucket *bkt, *cur_bkt;
 	int ret;
+	uint16_t short_sig;
 
-	bucket_idx = sig & h->bucket_bitmask;
-	bkt = &h->buckets[bucket_idx];
+	short_sig = get_short_sig(sig);
+	prim_bucket_idx = get_prim_bucket_index(h, sig);
+	sec_bucket_idx = get_alt_bucket_index(h, prim_bucket_idx, short_sig);
+	bkt = &h->buckets[prim_bucket_idx];
 
 	__hash_rw_reader_lock(h);
 
 	/* Check if key is in primary location */
-	ret = search_one_bucket(h, key, sig, data, bkt);
+	ret = search_one_bucket(h, key, short_sig, data, bkt);
 	if (ret != -1) {
 		__hash_rw_reader_unlock(h);
 		return ret;
 	}
 	/* Calculate secondary hash */
-	alt_hash = rte_hash_secondary_hash(sig);
-	bucket_idx = alt_hash & h->bucket_bitmask;
-	bkt = &h->buckets[bucket_idx];
+	bkt = &h->buckets[sec_bucket_idx];
 
 	/* Check if key is in secondary location */
 	FOR_EACH_BUCKET(cur_bkt, bkt) {
-		ret = search_one_bucket(h, key, alt_hash, data, cur_bkt);
+		ret = search_one_bucket(h, key, short_sig, data, cur_bkt);
 		if (ret != -1) {
 			__hash_rw_reader_unlock(h);
 			return ret;
@@ -1102,7 +1117,6 @@ remove_entry(const struct rte_hash *h, struct rte_hash_bucket *bkt, unsigned i)
 	struct lcore_cache *cached_free_slots;
 
 	bkt->sig_current[i] = NULL_SIGNATURE;
-	bkt->sig_alt[i] = NULL_SIGNATURE;
 	if (h->multi_writer_support) {
 		lcore_id = rte_lcore_id();
 		cached_free_slots = &h->local_free_slots[lcore_id];
@@ -1141,9 +1155,7 @@ __rte_hash_compact_ll(struct rte_hash_bucket *cur_bkt, int pos) {
 		if (last_bkt->key_idx[i] != EMPTY_SLOT) {
 			cur_bkt->key_idx[pos] = last_bkt->key_idx[i];
 			cur_bkt->sig_current[pos] = last_bkt->sig_current[i];
-			cur_bkt->sig_alt[pos] = last_bkt->sig_alt[i];
 			last_bkt->sig_current[i] = NULL_SIGNATURE;
-			last_bkt->sig_alt[i] = NULL_SIGNATURE;
 			last_bkt->key_idx[i] = EMPTY_SLOT;
 			return;
 		}
@@ -1153,7 +1165,7 @@ __rte_hash_compact_ll(struct rte_hash_bucket *cur_bkt, int pos) {
 /* Search one bucket and remove the matched key */
 static inline int32_t
 search_and_remove(const struct rte_hash *h, const void *key,
-			struct rte_hash_bucket *bkt, hash_sig_t sig, int *pos)
+			struct rte_hash_bucket *bkt, uint16_t sig, int *pos)
 {
 	struct rte_hash_key *k, *keys = h->key_store;
 	unsigned int i;
@@ -1185,19 +1197,21 @@ static inline int32_t
 __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
 						hash_sig_t sig)
 {
-	uint32_t bucket_idx;
-	hash_sig_t alt_hash;
+	uint32_t prim_bucket_idx, sec_bucket_idx;
 	struct rte_hash_bucket *prim_bkt, *sec_bkt, *prev_bkt, *last_bkt;
 	struct rte_hash_bucket *cur_bkt;
 	int pos;
 	int32_t ret, i;
+	uint16_t short_sig;
 
-	bucket_idx = sig & h->bucket_bitmask;
-	prim_bkt = &h->buckets[bucket_idx];
+	short_sig = get_short_sig(sig);
+	prim_bucket_idx = get_prim_bucket_index(h, sig);
+	sec_bucket_idx = get_alt_bucket_index(h, prim_bucket_idx, short_sig);
+	prim_bkt = &h->buckets[prim_bucket_idx];
 
 	__hash_rw_writer_lock(h);
 	/* look for key in primary bucket */
-	ret = search_and_remove(h, key, prim_bkt, sig, &pos);
+	ret = search_and_remove(h, key, prim_bkt, short_sig, &pos);
 	if (ret != -1) {
 		__rte_hash_compact_ll(prim_bkt, pos);
 		last_bkt = prim_bkt->next;
@@ -1206,12 +1220,10 @@ __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
 	}
 
 	/* Calculate secondary hash */
-	alt_hash = rte_hash_secondary_hash(sig);
-	bucket_idx = alt_hash & h->bucket_bitmask;
-	sec_bkt = &h->buckets[bucket_idx];
+	sec_bkt = &h->buckets[sec_bucket_idx];
 
 	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
-		ret = search_and_remove(h, key, cur_bkt, alt_hash, &pos);
+		ret = search_and_remove(h, key, cur_bkt, short_sig, &pos);
 		if (ret != -1) {
 			__rte_hash_compact_ll(cur_bkt, pos);
 			last_bkt = sec_bkt->next;
@@ -1288,55 +1300,35 @@ static inline void
 compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
 			const struct rte_hash_bucket *prim_bkt,
 			const struct rte_hash_bucket *sec_bkt,
-			hash_sig_t prim_hash, hash_sig_t sec_hash,
+			uint16_t sig,
 			enum rte_hash_sig_compare_function sig_cmp_fn)
 {
 	unsigned int i;
 
+	/* For the match mask, the first bit of every two bits indicates a match */
 	switch (sig_cmp_fn) {
-#ifdef RTE_MACHINE_CPUFLAG_AVX2
-	case RTE_HASH_COMPARE_AVX2:
-		*prim_hash_matches = _mm256_movemask_ps((__m256)_mm256_cmpeq_epi32(
-				_mm256_load_si256(
-					(__m256i const *)prim_bkt->sig_current),
-				_mm256_set1_epi32(prim_hash)));
-		*sec_hash_matches = _mm256_movemask_ps((__m256)_mm256_cmpeq_epi32(
-				_mm256_load_si256(
-					(__m256i const *)sec_bkt->sig_current),
-				_mm256_set1_epi32(sec_hash)));
-		break;
-#endif
 #ifdef RTE_MACHINE_CPUFLAG_SSE2
 	case RTE_HASH_COMPARE_SSE:
-		/* Compare the first 4 signatures in the bucket */
-		*prim_hash_matches = _mm_movemask_ps((__m128)_mm_cmpeq_epi16(
+		/* Compare all signatures in the bucket */
+		*prim_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
 				_mm_load_si128(
 					(__m128i const *)prim_bkt->sig_current),
-				_mm_set1_epi32(prim_hash)));
-		*prim_hash_matches |= (_mm_movemask_ps((__m128)_mm_cmpeq_epi16(
-				_mm_load_si128(
-					(__m128i const *)&prim_bkt->sig_current[4]),
-				_mm_set1_epi32(prim_hash)))) << 4;
-		/* Compare the first 4 signatures in the bucket */
-		*sec_hash_matches = _mm_movemask_ps((__m128)_mm_cmpeq_epi16(
+				_mm_set1_epi16(sig)));
+		/* Compare all signatures in the bucket */
+		*sec_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
 				_mm_load_si128(
 					(__m128i const *)sec_bkt->sig_current),
-				_mm_set1_epi32(sec_hash)));
-		*sec_hash_matches |= (_mm_movemask_ps((__m128)_mm_cmpeq_epi16(
-				_mm_load_si128(
-					(__m128i const *)&sec_bkt->sig_current[4]),
-				_mm_set1_epi32(sec_hash)))) << 4;
+				_mm_set1_epi16(sig)));
 		break;
 #endif
 	default:
 		for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
 			*prim_hash_matches |=
-				((prim_hash == prim_bkt->sig_current[i]) << i);
+				((sig == prim_bkt->sig_current[i]) << (i << 1));
 			*sec_hash_matches |=
-				((sec_hash == sec_bkt->sig_current[i]) << i);
+				((sig == sec_bkt->sig_current[i]) << (i << 1));
 		}
 	}
-
 }
 
 #define PREFETCH_OFFSET 4
@@ -1349,7 +1341,9 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 	int32_t i;
 	int32_t ret;
 	uint32_t prim_hash[RTE_HASH_LOOKUP_BULK_MAX];
-	uint32_t sec_hash[RTE_HASH_LOOKUP_BULK_MAX];
+	uint32_t prim_index[RTE_HASH_LOOKUP_BULK_MAX];
+	uint32_t sec_index[RTE_HASH_LOOKUP_BULK_MAX];
+	uint16_t sig[RTE_HASH_LOOKUP_BULK_MAX];
 	const struct rte_hash_bucket *primary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
 	const struct rte_hash_bucket *secondary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
 	uint32_t prim_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
@@ -1368,10 +1362,13 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 		rte_prefetch0(keys[i + PREFETCH_OFFSET]);
 
 		prim_hash[i] = rte_hash_hash(h, keys[i]);
-		sec_hash[i] = rte_hash_secondary_hash(prim_hash[i]);
 
-		primary_bkt[i] = &h->buckets[prim_hash[i] & h->bucket_bitmask];
-		secondary_bkt[i] = &h->buckets[sec_hash[i] & h->bucket_bitmask];
+		sig[i] = get_short_sig(prim_hash[i]);
+		prim_index[i] = get_prim_bucket_index(h, prim_hash[i]);
+		sec_index[i] = get_alt_bucket_index(h, prim_index[i], sig[i]);
+
+		primary_bkt[i] = &h->buckets[prim_index[i]];
+		secondary_bkt[i] = &h->buckets[sec_index[i]];
 
 		rte_prefetch0(primary_bkt[i]);
 		rte_prefetch0(secondary_bkt[i]);
@@ -1380,10 +1377,13 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 	/* Calculate and prefetch rest of the buckets */
 	for (; i < num_keys; i++) {
 		prim_hash[i] = rte_hash_hash(h, keys[i]);
-		sec_hash[i] = rte_hash_secondary_hash(prim_hash[i]);
 
-		primary_bkt[i] = &h->buckets[prim_hash[i] & h->bucket_bitmask];
-		secondary_bkt[i] = &h->buckets[sec_hash[i] & h->bucket_bitmask];
+		sig[i] = get_short_sig(prim_hash[i]);
+		prim_index[i] = get_prim_bucket_index(h, prim_hash[i]);
+		sec_index[i] = get_alt_bucket_index(h, prim_index[i], sig[i]);
+
+		primary_bkt[i] = &h->buckets[prim_index[i]];
+		secondary_bkt[i] = &h->buckets[sec_index[i]];
 
 		rte_prefetch0(primary_bkt[i]);
 		rte_prefetch0(secondary_bkt[i]);
@@ -1394,10 +1394,11 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 	for (i = 0; i < num_keys; i++) {
 		compare_signatures(&prim_hitmask[i], &sec_hitmask[i],
 				primary_bkt[i], secondary_bkt[i],
-				prim_hash[i], sec_hash[i], h->sig_cmp_fn);
+				sig[i], h->sig_cmp_fn);
 
 		if (prim_hitmask[i]) {
-			uint32_t first_hit = __builtin_ctzl(prim_hitmask[i]);
+			uint32_t first_hit =
+					__builtin_ctzl(prim_hitmask[i]) >> 1;
 			uint32_t key_idx = primary_bkt[i]->key_idx[first_hit];
 			const struct rte_hash_key *key_slot =
 				(const struct rte_hash_key *)(
@@ -1408,7 +1409,8 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 		}
 
 		if (sec_hitmask[i]) {
-			uint32_t first_hit = __builtin_ctzl(sec_hitmask[i]);
+			uint32_t first_hit =
+					__builtin_ctzl(sec_hitmask[i]) >> 1;
 			uint32_t key_idx = secondary_bkt[i]->key_idx[first_hit];
 			const struct rte_hash_key *key_slot =
 				(const struct rte_hash_key *)(
@@ -1422,7 +1424,8 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 	for (i = 0; i < num_keys; i++) {
 		positions[i] = -ENOENT;
 		while (prim_hitmask[i]) {
-			uint32_t hit_index = __builtin_ctzl(prim_hitmask[i]);
+			uint32_t hit_index =
+					__builtin_ctzl(prim_hitmask[i]) >> 1;
 
 			uint32_t key_idx = primary_bkt[i]->key_idx[hit_index];
 			const struct rte_hash_key *key_slot =
@@ -1441,11 +1444,12 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 				positions[i] = key_idx - 1;
 				goto next_key;
 			}
-			prim_hitmask[i] &= ~(1 << (hit_index));
+			prim_hitmask[i] &= ~(3ULL << (hit_index << 1));
 		}
 
 		while (sec_hitmask[i]) {
-			uint32_t hit_index = __builtin_ctzl(sec_hitmask[i]);
+			uint32_t hit_index =
+					__builtin_ctzl(sec_hitmask[i]) >> 1;
 
 			uint32_t key_idx = secondary_bkt[i]->key_idx[hit_index];
 			const struct rte_hash_key *key_slot =
@@ -1465,7 +1469,7 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 				positions[i] = key_idx - 1;
 				goto next_key;
 			}
-			sec_hitmask[i] &= ~(1 << (hit_index));
+			sec_hitmask[i] &= ~(3ULL << (hit_index << 1));
 		}
 
 next_key:
@@ -1488,10 +1492,10 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 		FOR_EACH_BUCKET(cur_bkt, next_bkt) {
 			if (data != NULL)
 				ret = search_one_bucket(h, keys[i],
-						sec_hash[i], &data[i], cur_bkt);
+						sig[i], &data[i], cur_bkt);
 			else
 				ret = search_one_bucket(h, keys[i],
-						sec_hash[i], NULL, cur_bkt);
+						sig[i], NULL, cur_bkt);
 			if (ret != -1) {
 				positions[i] = ret;
 				hits |= 1ULL << i;
diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
index e601520..7753cd8 100644
--- a/lib/librte_hash/rte_cuckoo_hash.h
+++ b/lib/librte_hash/rte_cuckoo_hash.h
@@ -129,18 +129,15 @@ struct rte_hash_key {
 enum rte_hash_sig_compare_function {
 	RTE_HASH_COMPARE_SCALAR = 0,
 	RTE_HASH_COMPARE_SSE,
-	RTE_HASH_COMPARE_AVX2,
 	RTE_HASH_COMPARE_NUM
 };
 
 /** Bucket structure */
 struct rte_hash_bucket {
-	hash_sig_t sig_current[RTE_HASH_BUCKET_ENTRIES];
+	uint16_t sig_current[RTE_HASH_BUCKET_ENTRIES];
 
 	uint32_t key_idx[RTE_HASH_BUCKET_ENTRIES];
 
-	hash_sig_t sig_alt[RTE_HASH_BUCKET_ENTRIES];
-
 	uint8_t flag[RTE_HASH_BUCKET_ENTRIES];
 
 	void *next;
@@ -193,6 +190,7 @@ struct rte_hash {
 
 struct queue_node {
 	struct rte_hash_bucket *bkt; /* Current bucket on the bfs search */
+	uint32_t cur_bkt_idx;
 
 	struct queue_node *prev;     /* Parent(bucket) in search path */
 	int prev_slot;               /* Parent(slot) in search path */
diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
index 11d8e28..6ace64e 100644
--- a/lib/librte_hash/rte_hash.h
+++ b/lib/librte_hash/rte_hash.h
@@ -40,7 +40,10 @@ extern "C" {
 /** Flag to indicate the extendable bucket table feature should be used */
 #define RTE_HASH_EXTRA_FLAGS_EXT_TABLE 0x08
 
-/** Signature of key that is stored internally. */
+/**
+ * The type of the hash value of a key.
+ * It should be a value of at least 32 bits with a fully random pattern.
+ */
 typedef uint32_t hash_sig_t;
 
 /** Type of function that can be used for calculating the hash value. */
-- 
2.7.4

^ permalink raw reply	[flat|nested] 10+ messages in thread
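
The new helpers get_short_sig(), get_prim_bucket_index() and
get_alt_bucket_index() are referenced throughout the hunks above but are
defined in an earlier hunk that is not quoted here. For readability, here
is a minimal sketch of what they compute, reconstructed from the cover
letter's description and the call sites (not a verbatim excerpt from the
patch):

	/* Top 16 bits of the 32-bit hash become the stored signature */
	static inline uint16_t
	get_short_sig(const hash_sig_t hash)
	{
		return hash >> 16;
	}

	/* Primary bucket index still comes from the full hash */
	static inline uint32_t
	get_prim_bucket_index(const struct rte_hash *h,
				const hash_sig_t hash)
	{
		return hash & h->bucket_bitmask;
	}

	/* Partial-key hashing: alternative bucket = current bucket XOR
	 * signature, masked to the table size
	 */
	static inline uint32_t
	get_alt_bucket_index(const struct rte_hash *h,
				uint32_t cur_bkt_idx, uint16_t sig)
	{
		return (cur_bkt_idx ^ sig) & h->bucket_bitmask;
	}

Because XOR is its own inverse, feeding the alternative index back through
get_alt_bucket_index() with the same signature recovers the primary index;
that is what lets rte_hash_cuckoo_move_insert_mw() recompute
prev_alt_bkt_idx without a stored sig_alt. It also explains the hitmask
arithmetic in the bulk-lookup hunks: _mm_movemask_epi8() over 16-bit
compares produces two mask bits per entry, hence the
__builtin_ctzl(mask) >> 1 to obtain a slot index and the
~(3ULL << (hit_index << 1)) to clear a processed hit.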

* Re: [dpdk-dev] [PATCH v8 1/4] hash: fix race condition in iterate
  2018-10-22 18:39 ` [dpdk-dev] [PATCH v8 1/4] hash: fix race condition in iterate Yipeng Wang
@ 2018-10-23  8:53   ` Bruce Richardson
  0 siblings, 0 replies; 10+ messages in thread
From: Bruce Richardson @ 2018-10-23  8:53 UTC (permalink / raw)
  To: Yipeng Wang; +Cc: dev, honnappa.nagarahalli

On Mon, Oct 22, 2018 at 11:39:45AM -0700, Yipeng Wang wrote:
> In rte_hash_iterate, the reader lock did not protect the
> while loop that skips over empty entries. This created a race
> condition: an entry could become empty after the check but
> before the lock was taken, and a wrong key/data value would
> be read out.
> 
> This commit reads out the position in the while condition
> itself, which makes sure that the position cannot be changed
> to empty between the check and entering the lock.
> 
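For illustration, the resulting pattern looks roughly like this (a
minimal sketch, not the exact hunk, which is not quoted in this reply;
'position', 'total_entries' and the surrounding bookkeeping are assumed
from the existing iterate code):

	__hash_rw_reader_lock(h);
	/* Read key_idx in the while condition, under the lock, so the
	 * slot cannot be emptied between the EMPTY_SLOT test and the
	 * use of 'position' below.
	 */
	while ((position = h->buckets[bucket_idx].key_idx[idx]) ==
			EMPTY_SLOT) {
		(*next)++;
		/* End of the table reached? */
		if (*next == total_entries) {
			__hash_rw_reader_unlock(h);
			return -ENOENT;
		}
		idx = *next % RTE_HASH_BUCKET_ENTRIES;
		bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
	}
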
> Fixes: f2e3001b53ec ("hash: support read/write concurrency")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
> Reported-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> ---
Acked-by: Bruce Richardson <bruce.richardson@intel.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [dpdk-dev] [PATCH v8 2/4] hash: add extendable bucket feature
  2018-10-22 18:39 ` [dpdk-dev] [PATCH v8 2/4] hash: add extendable bucket feature Yipeng Wang
@ 2018-10-23  8:55   ` Bruce Richardson
  0 siblings, 0 replies; 10+ messages in thread
From: Bruce Richardson @ 2018-10-23  8:55 UTC (permalink / raw)
  To: Yipeng Wang; +Cc: dev, honnappa.nagarahalli

On Mon, Oct 22, 2018 at 11:39:46AM -0700, Yipeng Wang wrote:
> In use cases where hash table capacity needs to be guaranteed,
> the extendable bucket feature can be used to store extra
> keys in linked lists when collisions happen. This is a similar
> concept to the extendable bucket hash table in the packet
> framework.
> 
> This commit adds the extendable bucket feature. User can turn
> it on or off through the extra flag field during table
> creation time.
> 
> The extendable bucket table is composed of buckets that can be
> chained, as a linked list, onto the buckets of the main table.
> When extendable buckets are enabled, the hash table load can
> always reach 100%. In other words, the table can always
> accommodate as many keys as the specified table size,
> providing a 100% table capacity guarantee.
> 
> Although keys ending up in the ext buckets may have a longer
> lookup time, such keys should be rare thanks to the cuckoo
> algorithm.
> 
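A hypothetical usage sketch of enabling the feature (the table name and
sizes here are invented for illustration; the flag is the one added by
this patch, and the other fields are the usual rte_hash_parameters):

	#include <rte_hash.h>
	#include <rte_hash_crc.h>

	struct rte_hash_parameters params = {
		.name = "ext_tbl_demo",		/* hypothetical name */
		.entries = 1024,		/* capacity to guarantee */
		.key_len = sizeof(uint32_t),
		.hash_func = rte_hash_crc,
		.hash_func_init_val = 0,
		.socket_id = 0,
		.extra_flag = RTE_HASH_EXTRA_FLAGS_EXT_TABLE,
	};
	struct rte_hash *h = rte_hash_create(&params);
	/* With the flag set, inserts should not fail for lack of
	 * space until all 1024 entries are occupied.
	 */
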
> Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
> Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> ---
Acked-by: Bruce Richardson <bruce.richardson@intel.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [dpdk-dev] [PATCH v8 3/4] test/hash: implement extendable bucket hash test
  2018-10-22 18:39 ` [dpdk-dev] [PATCH v8 3/4] test/hash: implement extendable bucket hash test Yipeng Wang
@ 2018-10-23  9:04   ` Bruce Richardson
  0 siblings, 0 replies; 10+ messages in thread
From: Bruce Richardson @ 2018-10-23  9:04 UTC (permalink / raw)
  To: Yipeng Wang; +Cc: dev, honnappa.nagarahalli

On Mon, Oct 22, 2018 at 11:39:47AM -0700, Yipeng Wang wrote:
> This commit changes the current rte_hash unit test to
> test the extendable table feature and performance.
> 
> Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
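The gist of such a capacity test, as a hypothetical sketch (not the
actual test code; TEST_ASSERT is the DPDK unit-test macro, and
'params'/'h' denote a table created with RTE_HASH_EXTRA_FLAGS_EXT_TABLE
as in the sketch above):

	/* Every one of 'entries' distinct keys must insert successfully */
	for (uint32_t key = 0; key < params.entries; key++)
		TEST_ASSERT(rte_hash_add_key(h, &key) >= 0,
				"insert failed at key %u", key);
	TEST_ASSERT(rte_hash_count(h) == (int32_t)params.entries,
			"unexpected entry count");
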
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> ---
Acked-by: Bruce Richardson <bruce.richardson@intel.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [dpdk-dev] [PATCH v8 4/4] hash: use partial-key hashing
  2018-10-22 18:39 ` [dpdk-dev] [PATCH v8 4/4] hash: use partial-key hashing Yipeng Wang
@ 2018-10-23  9:07   ` Bruce Richardson
  0 siblings, 0 replies; 10+ messages in thread
From: Bruce Richardson @ 2018-10-23  9:07 UTC (permalink / raw)
  To: Yipeng Wang; +Cc: dev, honnappa.nagarahalli

On Mon, Oct 22, 2018 at 11:39:48AM -0700, Yipeng Wang wrote:
> This commit changes the hashing mechanism to "partial-key
> hashing" for calculating the bucket index and signature of a
> key.
> 
> This was proposed in Bin Fan, et al.'s paper
> "MemC3: Compact and Concurrent MemCache with Dumber Caching
> and Smarter Hashing". The idea is to use "xor" to derive the
> alternative bucket index from the current bucket index and the
> signature.
> 
> "Partial-key hashing" reduces the bucket memory requirement
> from two cache lines to one, which improves memory efficiency
> and thus lookup speed.
> 
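As a worked illustration of the xor trick (all numbers invented for the
example; bucket_bitmask = 0x3FF corresponds to a 1024-bucket table):

	/* Suppose a key hashes to 0xDEADBEEF.  Then:
	 *   short_sig = 0xDEADBEEF >> 16         = 0xDEAD
	 *   prim_idx  = 0xDEADBEEF & 0x3FF       = 0x2EF
	 *   alt_idx   = (0x2EF ^ 0xDEAD) & 0x3FF = 0x042
	 * and since xor undoes itself:
	 *   (0x042 ^ 0xDEAD) & 0x3FF             = 0x2EF
	 * Either bucket can recover the other from the 16-bit
	 * signature alone, so no per-entry sig_alt is needed.
	 */
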
> Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Acked-by: Dharmik Thakkar <dharmik.thakkar@arm.com>

Acked-by: Bruce Richardson <bruce.richardson@intel.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [dpdk-dev] [PATCH v8 0/4] hash: add extendable bucket and partial key hashing
  2018-10-22 18:39 [dpdk-dev] [PATCH v8 0/4] hash: add extendable bucket and partial key hashing Yipeng Wang
                   ` (3 preceding siblings ...)
  2018-10-22 18:39 ` [dpdk-dev] [PATCH v8 4/4] hash: use partial-key hashing Yipeng Wang
@ 2018-10-25 22:35 ` Thomas Monjalon
  4 siblings, 0 replies; 10+ messages in thread
From: Thomas Monjalon @ 2018-10-25 22:35 UTC (permalink / raw)
  To: Yipeng Wang; +Cc: dev, bruce.richardson, honnappa.nagarahalli

22/10/2018 20:39, Yipeng Wang:
> [...]
> 
> v2->v3:
> The first four commits were separated from this patch set as another
> independent patch set:
> https://mails.dpdk.org/archives/dev/2018-September/113118.html
> 1. hash: move snprintf for ext_ring name under the ext_table condition.
> 2. hash: fix memory leak by freeing ext_buckets in rte_hash_free.
> 3. hash: after failing cuckoo path, search not only ext buckets, but also the
> secondary bucket first to see if there may be an empty location now.
> 4. hash: totally rewrote the key deleting function logic. If the deleted key was
> not in the last bucket of the linked list when ext table enabled, the last
> entry in the linked list will be placed in the vacant slot from the deleted
> key. The purpose is to compact the entries in the linked list to be more close
> to the main table. This is to make sure that not many extendable buckets are
> wasted with only one or two entries after some time of running, also benefit
> lookup speed.
> 5. Other minor coding style/comments improvements.
> 
> V1->V2:
> 1. hash: Rewrite rte_hash_get_last_bkt to be more concise.
> 2. hash: Reorder the rte_hash struct to align cache line better.
> 3. test: Minor changes in auto test to add key insertion failure check during
> iteration test.
> 4. test: Add a new commit to fix the read-write test non-consecutive
> core issue.
> 5. hash: Add a new commit to remove unnecessary code introduced by
> previous patches.
> 6. hash: Comment and coding style improvements in multiple places.
> 
> Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
> 
> 
> Yipeng Wang (4):
>   hash: fix race condition in iterate
>   hash: add extendable bucket feature
>   test/hash: implement extendable bucket hash test
>   hash: use partial-key hashing

Applied, thanks

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, newest: ~2018-10-25 22:35 UTC

Thread overview: 10+ messages
2018-10-22 18:39 [dpdk-dev] [PATCH v8 0/4] hash: add extendable bucket and partial key hashing Yipeng Wang
2018-10-22 18:39 ` [dpdk-dev] [PATCH v8 1/4] hash: fix race condition in iterate Yipeng Wang
2018-10-23  8:53   ` Bruce Richardson
2018-10-22 18:39 ` [dpdk-dev] [PATCH v8 2/4] hash: add extendable bucket feature Yipeng Wang
2018-10-23  8:55   ` Bruce Richardson
2018-10-22 18:39 ` [dpdk-dev] [PATCH v8 3/4] test/hash: implement extendable bucket hash test Yipeng Wang
2018-10-23  9:04   ` Bruce Richardson
2018-10-22 18:39 ` [dpdk-dev] [PATCH v8 4/4] hash: use partial-key hashing Yipeng Wang
2018-10-23  9:07   ` Bruce Richardson
2018-10-25 22:35 ` [dpdk-dev] [PATCH v8 0/4] hash: add extendable bucket and partial key hashing Thomas Monjalon
