DPDK patches and discussions
* [dpdk-dev] [RFC 0/2] hash: add lock free support for ext bkt
@ 2019-03-02  0:23 Dharmik Thakkar
  2019-03-02  0:23 ` [dpdk-dev] [RFC 1/2] hash: add lock free support for extendable bucket Dharmik Thakkar
  2019-03-02  0:23 ` [dpdk-dev] [RFC 2/2] test/hash: lock-free rw concurrency test ext bkt Dharmik Thakkar
  0 siblings, 2 replies; 9+ messages in thread
From: Dharmik Thakkar @ 2019-03-02  0:23 UTC (permalink / raw)
  Cc: dev, Dharmik Thakkar

This patch series:
- Enables lock-free read-write concurrency support for the extendable
bucket feature.
- Adds lock-free read-write concurrency tests for the ext bkt.

A minimal usage sketch follows below.
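
The sketch is illustrative only and not part of the patch; the table name,
sizes and hash function are arbitrary assumptions. It only shows that the two
extra flags can now be combined:

	#include <rte_hash.h>
	#include <rte_hash_crc.h>
	#include <rte_lcore.h>

	static struct rte_hash *
	create_ext_lf_table(void)
	{
		struct rte_hash_parameters params = {
			.name = "ext_lf_table",	/* example name */
			.entries = 1024,	/* example size */
			.key_len = sizeof(uint32_t),
			.hash_func = rte_hash_crc,
			.hash_func_init_val = 0,
			.socket_id = rte_socket_id(),
			/* Combination rejected by rte_hash_create()
			 * before this series.
			 */
			.extra_flag = RTE_HASH_EXTRA_FLAGS_EXT_TABLE |
				RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY_LF,
		};

		/* With RW_CONCURRENCY_LF, deleted keys are not freed on
		 * delete; recycle them later with
		 * rte_hash_free_key_with_position().
		 */
		return rte_hash_create(&params);
	}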

Dharmik Thakkar (2):
  hash: add lock free support for extendable bucket
  test/hash: lock-free rw concurrency test ext bkt

 app/test/test_hash_readwrite_lf.c  | 350 +++++++++++++++++++++++------
 doc/guides/prog_guide/hash_lib.rst |   3 +-
 lib/librte_hash/rte_cuckoo_hash.c  | 150 +++++++++----
 lib/librte_hash/rte_cuckoo_hash.h  |   8 +
 4 files changed, 394 insertions(+), 117 deletions(-)

-- 
2.17.1


* [dpdk-dev] [RFC 1/2] hash: add lock free support for extendable bucket
  2019-03-02  0:23 [dpdk-dev] [RFC 0/2] hash: add lock free support for ext bkt Dharmik Thakkar
@ 2019-03-02  0:23 ` Dharmik Thakkar
  2019-03-07 17:49   ` Wang, Yipeng1
  2019-03-02  0:23 ` [dpdk-dev] [RFC 2/2] test/hash: lock-free rw concurrency test ext bkt Dharmik Thakkar
  1 sibling, 1 reply; 9+ messages in thread
From: Dharmik Thakkar @ 2019-03-02  0:23 UTC (permalink / raw)
  To: Yipeng Wang, Sameh Gobriel, Bruce Richardson, Pablo de Lara,
	John McNamara, Marko Kovacevic
  Cc: dev, Dharmik Thakkar

This patch enables lock-free read-write concurrency support for
extendable bucket feature.

Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
---
 doc/guides/prog_guide/hash_lib.rst |   3 +-
 lib/librte_hash/rte_cuckoo_hash.c  | 150 +++++++++++++++++++----------
 lib/librte_hash/rte_cuckoo_hash.h  |   8 ++
 3 files changed, 110 insertions(+), 51 deletions(-)

diff --git a/doc/guides/prog_guide/hash_lib.rst b/doc/guides/prog_guide/hash_lib.rst
index 85a6edfa8b16..b00446e949ba 100644
--- a/doc/guides/prog_guide/hash_lib.rst
+++ b/doc/guides/prog_guide/hash_lib.rst
@@ -108,8 +108,7 @@ Extendable Bucket Functionality support
 An extra flag is used to enable this functionality (flag is not set by default). When the (RTE_HASH_EXTRA_FLAGS_EXT_TABLE) is set and
 in the very unlikely case due to excessive hash collisions that a key has failed to be inserted, the hash table bucket is extended with a linked
 list to insert these failed keys. This feature is important for the workloads (e.g. telco workloads) that need to insert up to 100% of the
-hash table size and can't tolerate any key insertion failure (even if very few). Currently the extendable bucket is not supported
-with the lock-free concurrency implementation (RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY_LF).
+hash table size and can't tolerate any key insertion failure (even if very few).
 
 
 Implementation Details (non Extendable Bucket Case)
diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index c01489ba5193..54762533ce36 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -170,15 +170,6 @@ rte_hash_create(const struct rte_hash_parameters *params)
 		return NULL;
 	}
 
-	if ((params->extra_flag & RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY_LF) &&
-	    (params->extra_flag & RTE_HASH_EXTRA_FLAGS_EXT_TABLE)) {
-		rte_errno = EINVAL;
-		RTE_LOG(ERR, HASH, "rte_hash_create: extendable bucket "
-			"feature not supported with rw concurrency "
-			"lock free\n");
-		return NULL;
-	}
-
 	/* Check extra flags field to check extra options. */
 	if (params->extra_flag & RTE_HASH_EXTRA_FLAGS_TRANS_MEM_SUPPORT)
 		hw_trans_mem_support = 1;
@@ -1054,7 +1045,15 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 			/* Check if slot is available */
 			if (likely(cur_bkt->key_idx[i] == EMPTY_SLOT)) {
 				cur_bkt->sig_current[i] = short_sig;
-				cur_bkt->key_idx[i] = new_idx;
+				/* Key can be of arbitrary length, so it is
+				 * not possible to store it atomically.
+				 * Hence the new key element's memory stores
+				 * (key as well as data) should be complete
+				 * before it is referenced.
+				 */
+				__atomic_store_n(&cur_bkt->key_idx[i],
+						 new_idx,
+						 __ATOMIC_RELEASE);
 				__hash_rw_writer_unlock(h);
 				return new_idx - 1;
 			}
@@ -1072,10 +1071,23 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	bkt_id = (uint32_t)((uintptr_t)ext_bkt_id) - 1;
 	/* Use the first location of the new bucket */
 	(h->buckets_ext[bkt_id]).sig_current[0] = short_sig;
-	(h->buckets_ext[bkt_id]).key_idx[0] = new_idx;
+	/* Key can be of arbitrary length, so it is
+	 * not possible to store it atomically.
+	 * Hence the new key element's memory stores
+	 * (key as well as data) should be complete
+	 * before it is referenced.
+	 */
+	__atomic_store_n(&(h->buckets_ext[bkt_id]).key_idx[0],
+			 new_idx,
+			 __ATOMIC_RELEASE);
 	/* Link the new bucket to sec bucket linked list */
 	last = rte_hash_get_last_bkt(sec_bkt);
-	last->next = &h->buckets_ext[bkt_id];
+	/* New bucket's memory stores (key as well as data)
+	 * should be complete before it is referenced
+	 */
+	__atomic_store_n(&last->next,
+			 &h->buckets_ext[bkt_id],
+			 __ATOMIC_RELEASE);
 	__hash_rw_writer_unlock(h);
 	return new_idx - 1;
 
@@ -1366,7 +1378,8 @@ remove_entry(const struct rte_hash *h, struct rte_hash_bucket *bkt, unsigned i)
  * empty slot.
  */
 static inline void
-__rte_hash_compact_ll(struct rte_hash_bucket *cur_bkt, int pos) {
+__rte_hash_compact_ll(const struct rte_hash *h,
+			struct rte_hash_bucket *cur_bkt, int pos) {
 	int i;
 	struct rte_hash_bucket *last_bkt;
 
@@ -1377,10 +1390,27 @@ __rte_hash_compact_ll(struct rte_hash_bucket *cur_bkt, int pos) {
 
 	for (i = RTE_HASH_BUCKET_ENTRIES - 1; i >= 0; i--) {
 		if (last_bkt->key_idx[i] != EMPTY_SLOT) {
-			cur_bkt->key_idx[pos] = last_bkt->key_idx[i];
 			cur_bkt->sig_current[pos] = last_bkt->sig_current[i];
+			__atomic_store_n(&cur_bkt->key_idx[pos],
+					 last_bkt->key_idx[i],
+					 __ATOMIC_RELEASE);
+			if (h->readwrite_concur_lf_support) {
+				/* Inform the readers that the table has changed
+				 * Since there is one writer, load acquires on
+				 * tbl_chng_cnt are not required.
+				 */
+				__atomic_store_n(h->tbl_chng_cnt,
+					 *h->tbl_chng_cnt + 1,
+					 __ATOMIC_RELEASE);
+				/* The stores to sig_alt and sig_current should
+				 * not move above the store to tbl_chng_cnt.
+				 */
+				__atomic_thread_fence(__ATOMIC_RELEASE);
+			}
 			last_bkt->sig_current[i] = NULL_SIGNATURE;
-			last_bkt->key_idx[i] = EMPTY_SLOT;
+			__atomic_store_n(&last_bkt->key_idx[i],
+					 EMPTY_SLOT,
+					 __ATOMIC_RELEASE);
 			return;
 		}
 	}
@@ -1449,7 +1479,7 @@ __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
 	/* look for key in primary bucket */
 	ret = search_and_remove(h, key, prim_bkt, short_sig, &pos);
 	if (ret != -1) {
-		__rte_hash_compact_ll(prim_bkt, pos);
+		__rte_hash_compact_ll(h, prim_bkt, pos);
 		last_bkt = prim_bkt->next;
 		prev_bkt = prim_bkt;
 		goto return_bkt;
@@ -1461,7 +1491,7 @@ __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
 	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
 		ret = search_and_remove(h, key, cur_bkt, short_sig, &pos);
 		if (ret != -1) {
-			__rte_hash_compact_ll(cur_bkt, pos);
+			__rte_hash_compact_ll(h, cur_bkt, pos);
 			last_bkt = sec_bkt->next;
 			prev_bkt = sec_bkt;
 			goto return_bkt;
@@ -1488,11 +1518,21 @@ __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
 	}
 	/* found empty bucket and recycle */
 	if (i == RTE_HASH_BUCKET_ENTRIES) {
-		prev_bkt->next = last_bkt->next = NULL;
+		__atomic_store_n(&prev_bkt->next,
+				 NULL,
+				 __ATOMIC_RELEASE);
 		uint32_t index = last_bkt - h->buckets_ext + 1;
-		rte_ring_sp_enqueue(h->free_ext_bkts, (void *)(uintptr_t)index);
-	}
+		if (!h->no_free_on_del)
+			rte_ring_sp_enqueue(h->free_ext_bkts, (void *)(uintptr_t)index);
+		else {
+			struct rte_hash_key *key_slot =
+				(struct rte_hash_key *)(
+				(char *)h->key_store +
+				ret * h->key_entry_size);
+			key_slot->ext_bkt_to_free = index;
 
+		}
+	}
 	__hash_rw_writer_unlock(h);
 	return ret;
 }
@@ -1567,6 +1607,14 @@ rte_hash_free_key_with_position(const struct rte_hash *h,
 				(void *)((uintptr_t)position));
 	}
 
+	const struct rte_hash_key *key_slot = (const struct rte_hash_key *)(
+						(const char *)h->key_store +
+						position * h->key_entry_size);
+	uint32_t index = key_slot->ext_bkt_to_free;
+	if (!index)
+		/* Recycle empty ext bkt to free list. */
+		rte_ring_sp_enqueue(h->free_ext_bkts, (void *)(uintptr_t)index);
+
 	return 0;
 }
 
@@ -1855,6 +1903,9 @@ __rte_hash_lookup_bulk_lf(const struct rte_hash *h, const void **keys,
 		rte_prefetch0(secondary_bkt[i]);
 	}
 
+	for (i = 0; i < num_keys; i++)
+		positions[i] = -ENOENT;
+
 	do {
 		/* Load the table change counter before the lookup
 		 * starts. Acquire semantics will make sure that
@@ -1899,7 +1950,6 @@ __rte_hash_lookup_bulk_lf(const struct rte_hash *h, const void **keys,
 
 		/* Compare keys, first hits in primary first */
 		for (i = 0; i < num_keys; i++) {
-			positions[i] = -ENOENT;
 			while (prim_hitmask[i]) {
 				uint32_t hit_index =
 						__builtin_ctzl(prim_hitmask[i])
@@ -1972,6 +2022,36 @@ __rte_hash_lookup_bulk_lf(const struct rte_hash *h, const void **keys,
 			continue;
 		}
 
+
+		/* all found, do not need to go through ext bkt */
+		if (hits == ((1ULL << num_keys) - 1)) {
+			if (hit_mask != NULL)
+				*hit_mask = hits;
+			return;
+		}
+		/* need to check ext buckets for match */
+		if (h->ext_table_support) {
+			for (i = 0; i < num_keys; i++) {
+				if ((hits & (1ULL << i)) != 0)
+					continue;
+				next_bkt = secondary_bkt[i]->next;
+				FOR_EACH_BUCKET(cur_bkt, next_bkt) {
+					if (data != NULL)
+						ret = search_one_bucket_lf(h,
+							keys[i], sig[i],
+							&data[i], cur_bkt);
+					else
+						ret = search_one_bucket_lf(h,
+								keys[i], sig[i],
+								NULL, cur_bkt);
+					if (ret != -1) {
+						positions[i] = ret;
+						hits |= 1ULL << i;
+						break;
+					}
+				}
+			}
+		}
 		/* The loads of sig_current in compare_signatures
 		 * should not move below the load from tbl_chng_cnt.
 		 */
@@ -1988,34 +2068,6 @@ __rte_hash_lookup_bulk_lf(const struct rte_hash *h, const void **keys,
 					__ATOMIC_ACQUIRE);
 	} while (cnt_b != cnt_a);
 
-	/* all found, do not need to go through ext bkt */
-	if ((hits == ((1ULL << num_keys) - 1)) || !h->ext_table_support) {
-		if (hit_mask != NULL)
-			*hit_mask = hits;
-		__hash_rw_reader_unlock(h);
-		return;
-	}
-
-	/* need to check ext buckets for match */
-	for (i = 0; i < num_keys; i++) {
-		if ((hits & (1ULL << i)) != 0)
-			continue;
-		next_bkt = secondary_bkt[i]->next;
-		FOR_EACH_BUCKET(cur_bkt, next_bkt) {
-			if (data != NULL)
-				ret = search_one_bucket_lf(h, keys[i],
-						sig[i], &data[i], cur_bkt);
-			else
-				ret = search_one_bucket_lf(h, keys[i],
-						sig[i], NULL, cur_bkt);
-			if (ret != -1) {
-				positions[i] = ret;
-				hits |= 1ULL << i;
-				break;
-			}
-		}
-	}
-
 	if (hit_mask != NULL)
 		*hit_mask = hits;
 }
diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
index eacdaa8d4684..062cc2dd0296 100644
--- a/lib/librte_hash/rte_cuckoo_hash.h
+++ b/lib/librte_hash/rte_cuckoo_hash.h
@@ -129,6 +129,14 @@ struct lcore_cache {
 
 /* Structure that stores key-value pair */
 struct rte_hash_key {
+	/* Stores index of an empty ext bkt to be recycled on calling
+	 * rte_hash_del_xxx APIs. When lock free read-wrie concurrency is
+	 * enabled, an empty ext bkt cannot be put into free list immediately
+	 * (as readers might be using it still). Hence freeing of the ext bkt
+	 * is piggy-backed to freeing of the key index.
+	 */
+	uint32_t ext_bkt_to_free;
+
 	union {
 		uintptr_t idata;
 		void *pdata;
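
For reviewers tracing the ordering changes above: the release stores on
key_idx, last->next and tbl_chng_cnt pair with the retry loop on the reader
side of the lock-free path. A rough sketch of that pattern, mirroring
__rte_hash_lookup_bulk_lf() (search_all_buckets() and its arguments are
hypothetical placeholders, not real APIs):

	static inline int
	lookup_lf_retry_sketch(const struct rte_hash *h, const void *key,
			       uint16_t short_sig, void **data)
	{
		uint32_t cnt_a, cnt_b;
		int pos;

		do {
			/* Sample the change counter before the scan starts. */
			cnt_b = __atomic_load_n(h->tbl_chng_cnt,
						__ATOMIC_ACQUIRE);
			/* Scan primary, secondary and linked ext buckets. */
			pos = search_all_buckets(h, key, short_sig, data);
			/* The bucket loads must not move below the re-read of
			 * the counter; a change means a key may have moved
			 * (e.g. in __rte_hash_compact_ll()) during the scan,
			 * so retry to avoid a false miss.
			 */
			__atomic_thread_fence(__ATOMIC_ACQUIRE);
			cnt_a = __atomic_load_n(h->tbl_chng_cnt,
						__ATOMIC_ACQUIRE);
		} while (cnt_b != cnt_a);

		return pos;
	}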
-- 
2.17.1


* [dpdk-dev] [RFC 2/2] test/hash: lock-free rw concurrency test ext bkt
  2019-03-02  0:23 [dpdk-dev] [RFC 0/2] hash: add lock free support for ext bkt Dharmik Thakkar
  2019-03-02  0:23 ` [dpdk-dev] [RFC 1/2] hash: add lock free support for extendable bucket Dharmik Thakkar
@ 2019-03-02  0:23 ` Dharmik Thakkar
  1 sibling, 0 replies; 9+ messages in thread
From: Dharmik Thakkar @ 2019-03-02  0:23 UTC (permalink / raw)
  To: Yipeng Wang, Sameh Gobriel, Bruce Richardson, Pablo de Lara
  Cc: dev, Dharmik Thakkar

Add unit tests to check hash lookup and bulk-lookup performance,
with lock-free support enabled as well as disabled.

Tests include:

- hash lookup on keys in the ext bkt
- hash delete causing key-shifts of keys from the ext bkt to the
secondary bkt

A simplified sketch of this scenario follows below.
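
The sketch is not the actual test code; it is simplified from test_rwc_reader()
and test_hash_add_ks_lookup_hit_extbkt() in this patch, with timing and error
handling omitted:

	/* Reader lcore(s): keys that live in extendable buckets must always
	 * be found, even while the writer shifts them back into the
	 * secondary bucket.
	 */
	for (i = 0; i < tbl_rwc_test_param.count_keys_extbkt; i++)
		if (rte_hash_lookup(tbl_rwc_test_param.h,
				    tbl_rwc_test_param.keys_ext_bkt + i) < 0)
			return -1;

	/* Writer (main lcore), running concurrently: delete keys whose
	 * removal moves ext bkt entries into the secondary bucket.
	 */
	for (i = 0; i < tbl_rwc_test_param.count_keys_ks_extbkt; i++)
		rte_hash_del_key(tbl_rwc_test_param.h,
				 tbl_rwc_test_param.keys_ks_extbkt + i);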

Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
---
 app/test/test_hash_readwrite_lf.c | 350 ++++++++++++++++++++++++------
 1 file changed, 284 insertions(+), 66 deletions(-)

diff --git a/app/test/test_hash_readwrite_lf.c b/app/test/test_hash_readwrite_lf.c
index cbfd9322696b..0bd189c981b2 100644
--- a/app/test/test_hash_readwrite_lf.c
+++ b/app/test/test_hash_readwrite_lf.c
@@ -41,6 +41,12 @@
 #define READ_PASS_SHIFT_PATH 4
 #define READ_PASS_NON_SHIFT_PATH 8
 #define BULK_LOOKUP 16
+#define READ_PASS_KEY_SHIFTS_EXTBKT 32
+
+#define WRITE_NO_KEY_SHIFT 0
+#define WRITE_KEY_SHIFT 1
+#define WRITE_EXT_BKT 2
+
 #define NUM_TEST 3
 unsigned int rwc_core_cnt[NUM_TEST] = {1, 2, 4};
 
@@ -51,6 +57,7 @@ struct rwc_perf {
 	uint32_t w_ks_r_hit_sp[2][NUM_TEST];
 	uint32_t w_ks_r_miss[2][NUM_TEST];
 	uint32_t multi_rw[NUM_TEST - 1][2][NUM_TEST];
+	uint32_t w_ks_r_hit_extbkt[2][NUM_TEST];
 };
 
 static struct rwc_perf rwc_lf_results, rwc_non_lf_results;
@@ -62,11 +69,15 @@ struct {
 	uint32_t *keys_absent;
 	uint32_t *keys_shift_path;
 	uint32_t *keys_non_shift_path;
+	uint32_t *keys_ext_bkt;
+	uint32_t *keys_ks_extbkt;
 	uint32_t count_keys_no_ks;
 	uint32_t count_keys_ks;
 	uint32_t count_keys_absent;
 	uint32_t count_keys_shift_path;
 	uint32_t count_keys_non_shift_path;
+	uint32_t count_keys_extbkt;
+	uint32_t count_keys_ks_extbkt;
 	uint32_t single_insert;
 	struct rte_hash *h;
 } tbl_rwc_test_param;
@@ -81,6 +92,35 @@ uint16_t enabled_core_ids[RTE_MAX_LCORE];
 
 uint8_t *scanned_bkts;
 
+static inline uint16_t
+get_short_sig(const hash_sig_t hash)
+{
+	return hash >> 16;
+}
+
+static inline uint32_t
+get_prim_bucket_index(__attribute__((unused)) const struct rte_hash *h,
+		      const hash_sig_t hash)
+{
+	uint32_t num_buckets;
+	uint32_t bucket_bitmask;
+	num_buckets  = rte_align32pow2(TOTAL_ENTRY) / 8;
+	bucket_bitmask = num_buckets - 1;
+	return hash & bucket_bitmask;
+}
+
+static inline uint32_t
+get_alt_bucket_index(__attribute__((unused)) const struct rte_hash *h,
+			uint32_t cur_bkt_idx, uint16_t sig)
+{
+	uint32_t num_buckets;
+	uint32_t bucket_bitmask;
+	num_buckets  = rte_align32pow2(TOTAL_ENTRY) / 8;
+	bucket_bitmask = num_buckets - 1;
+	return (cur_bkt_idx ^ sig) & bucket_bitmask;
+}
+
+
 static inline int
 get_enabled_cores_list(void)
 {
@@ -168,9 +208,12 @@ generate_keys(void)
 	uint32_t *keys_ks = NULL;
 	uint32_t *keys_absent = NULL;
 	uint32_t *keys_non_shift_path = NULL;
+	uint32_t *keys_ext_bkt = NULL;
+	uint32_t *keys_ks_extbkt = NULL;
 	uint32_t *found = NULL;
 	uint32_t count_keys_no_ks = 0;
 	uint32_t count_keys_ks = 0;
+	uint32_t count_keys_extbkt = 0;
 	uint32_t i;
 
 	/*
@@ -251,14 +294,32 @@ generate_keys(void)
 		goto err;
 	}
 
+	/*
+	 * This consist of keys which will be stored in extended buckets
+	 */
+	keys_ext_bkt = rte_malloc(NULL, sizeof(uint32_t) * TOTAL_INSERT, 0);
+	if (keys_ext_bkt == NULL) {
+		printf("RTE_MALLOC failed\n");
+		goto err;
+	}
+
+	/*
+	 * This consist of keys which when deleted causes shifting of keys
+	 * in extended buckets to respective secondary buckets
+	 */
+	keys_ks_extbkt = rte_malloc(NULL, sizeof(uint32_t) * TOTAL_INSERT, 0);
+	if (keys_ks_extbkt == NULL) {
+		printf("RTE_MALLOC failed\n");
+		goto err;
+	}
 
 	hash_sig_t sig;
 	uint32_t prim_bucket_idx;
-	int ret;
+	uint32_t sec_bucket_idx;
+	uint16_t short_sig;
 	uint32_t num_buckets;
-	uint32_t bucket_bitmask;
 	num_buckets  = rte_align32pow2(TOTAL_ENTRY) / 8;
-	bucket_bitmask = num_buckets - 1;
+	int ret;
 
 	/*
 	 * Used to mark bkts in which at least one key was shifted to its
@@ -275,6 +336,8 @@ generate_keys(void)
 	tbl_rwc_test_param.keys_ks = keys_ks;
 	tbl_rwc_test_param.keys_absent = keys_absent;
 	tbl_rwc_test_param.keys_non_shift_path = keys_non_shift_path;
+	tbl_rwc_test_param.keys_ext_bkt = keys_ext_bkt;
+	tbl_rwc_test_param.keys_ks_extbkt = keys_ks_extbkt;
 	/* Generate keys by adding previous two keys, neglect overflow */
 	printf("Generating keys...\n");
 	keys[0] = 0;
@@ -287,7 +350,8 @@ generate_keys(void)
 		/* Check if primary bucket has space.*/
 		sig = rte_hash_hash(tbl_rwc_test_param.h,
 					tbl_rwc_test_param.keys+i);
-		prim_bucket_idx = sig & bucket_bitmask;
+		prim_bucket_idx = get_prim_bucket_index(tbl_rwc_test_param.h,
+							sig);
 		ret = check_bucket(prim_bucket_idx, keys[i]);
 		if (ret < 0) {
 			/*
@@ -368,6 +432,47 @@ generate_keys(void)
 	tbl_rwc_test_param.count_keys_absent = count_keys_absent;
 	tbl_rwc_test_param.count_keys_non_shift_path = count;
 
+	memset(scanned_bkts, 0, num_buckets);
+	count = 0;
+	/* Find keys that will be in extended buckets */
+	for (i = 0; i < count_keys_ks; i++) {
+		ret = rte_hash_add_key(tbl_rwc_test_param.h, keys_ks + i);
+		if (ret < 0) {
+			/* Key will be added to ext bkt */
+			keys_ext_bkt[count_keys_extbkt++] = keys_ks[i];
+			/* Sec bkt to be added to keys_ks_extbkt */
+			sig = rte_hash_hash(tbl_rwc_test_param.h,
+					tbl_rwc_test_param.keys_ks + i);
+			prim_bucket_idx = get_prim_bucket_index(
+						tbl_rwc_test_param.h, sig);
+			short_sig = get_short_sig(sig);
+			sec_bucket_idx = get_alt_bucket_index(
+						tbl_rwc_test_param.h,
+						prim_bucket_idx, short_sig);
+			if (scanned_bkts[sec_bucket_idx] == 0)
+				scanned_bkts[sec_bucket_idx] = 1;
+		}
+	}
+
+	/* Find keys that will shift keys in ext bucket*/
+	for (i = 0; i < num_buckets; i++) {
+		if (scanned_bkts[i] == 1) {
+			iter = i * 8;
+			while (rte_hash_iterate(tbl_rwc_test_param.h,
+				&next_key, &next_data, &iter) >= 0) {
+				/* Check if key belongs to the current bucket */
+				if (i >= (iter-1)/8)
+					keys_ks_extbkt[count++]
+						= *(const uint32_t *)next_key;
+				else
+					break;
+			}
+		}
+	}
+
+	tbl_rwc_test_param.count_keys_ks_extbkt = count;
+	tbl_rwc_test_param.count_keys_extbkt = count_keys_extbkt;
+
 	printf("\nCount of keys NOT causing shifting of existing keys to "
 	"alternate location: %d\n", tbl_rwc_test_param.count_keys_no_ks);
 	printf("\nCount of keys causing shifting of existing keys to alternate "
@@ -378,6 +483,10 @@ generate_keys(void)
 	       tbl_rwc_test_param.count_keys_shift_path);
 	printf("Count of keys not likely to be on the shift path: %d\n\n",
 	       tbl_rwc_test_param.count_keys_non_shift_path);
+	printf("Count of keys in extended buckets: %d\n\n",
+	       tbl_rwc_test_param.count_keys_extbkt);
+	printf("Count of keys shifting keys in ext buckets: %d\n\n",
+	       tbl_rwc_test_param.count_keys_ks_extbkt);
 
 	rte_free(found);
 	rte_hash_free(tbl_rwc_test_param.h);
@@ -390,12 +499,14 @@ generate_keys(void)
 	rte_free(keys_absent);
 	rte_free(found);
 	rte_free(tbl_rwc_test_param.keys_shift_path);
+	rte_free(keys_ext_bkt);
+	rte_free(keys_ks_extbkt);
 	rte_free(scanned_bkts);
 	return -1;
 }
 
 static int
-init_params(int rwc_lf, int use_jhash, int htm)
+init_params(int rwc_lf, int use_jhash, int htm, int ext_bkt)
 {
 	struct rte_hash *handle;
 
@@ -425,6 +536,9 @@ init_params(int rwc_lf, int use_jhash, int htm)
 			RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY |
 			RTE_HASH_EXTRA_FLAGS_MULTI_WRITER_ADD;
 
+	if (ext_bkt)
+		hash_params.extra_flag |= RTE_HASH_EXTRA_FLAGS_EXT_TABLE;
+
 	hash_params.name = "tests";
 
 	handle = rte_hash_create(&hash_params);
@@ -452,7 +566,7 @@ test_rwc_reader(__attribute__((unused)) void *arg)
 	void *temp_a[BULK_LOOKUP_SIZE];
 
 	/* Used to identify keys not inserted in the hash table */
-	pos = rte_zmalloc(NULL, sizeof(uint32_t) * BULK_LOOKUP_SIZE, 0);
+	pos = rte_malloc(NULL, sizeof(uint32_t) * BULK_LOOKUP_SIZE, 0);
 	if (pos == NULL) {
 		printf("RTE_MALLOC failed\n");
 		return -1;
@@ -467,6 +581,9 @@ test_rwc_reader(__attribute__((unused)) void *arg)
 	} else if (read_type & READ_PASS_SHIFT_PATH) {
 		keys = tbl_rwc_test_param.keys_shift_path;
 		read_cnt = tbl_rwc_test_param.count_keys_shift_path;
+	} else if (read_type & READ_PASS_KEY_SHIFTS_EXTBKT) {
+		keys = tbl_rwc_test_param.keys_ext_bkt;
+		read_cnt = tbl_rwc_test_param.count_keys_extbkt;
 	} else {
 		keys = tbl_rwc_test_param.keys_non_shift_path;
 		read_cnt = tbl_rwc_test_param.count_keys_non_shift_path;
@@ -482,7 +599,6 @@ test_rwc_reader(__attribute__((unused)) void *arg)
 				/* Array of  pointer to the list of keys */
 				for (j = 0; j < BULK_LOOKUP_SIZE; j++)
 					temp_a[j] = keys + i + j;
-
 				rte_hash_lookup_bulk(tbl_rwc_test_param.h,
 						   (const void **)
 						   ((uintptr_t)temp_a),
@@ -539,22 +655,25 @@ test_rwc_reader(__attribute__((unused)) void *arg)
 }
 
 static int
-write_keys(uint8_t key_shift)
+write_keys(uint8_t write_type)
 {
 	uint32_t i;
 	int ret;
 	uint32_t key_cnt;
 	uint32_t *keys;
-	if (key_shift) {
+	if (write_type == WRITE_KEY_SHIFT) {
 		key_cnt = tbl_rwc_test_param.count_keys_ks;
 		keys = tbl_rwc_test_param.keys_ks;
-	} else {
+	} else if (write_type == WRITE_NO_KEY_SHIFT) {
 		key_cnt = tbl_rwc_test_param.count_keys_no_ks;
 		keys = tbl_rwc_test_param.keys_no_ks;
+	} else if (write_type == WRITE_EXT_BKT) {
+		key_cnt = tbl_rwc_test_param.count_keys_extbkt;
+		keys = tbl_rwc_test_param.keys_ext_bkt;
 	}
 	for (i = 0; i < key_cnt; i++) {
 		ret = rte_hash_add_key(tbl_rwc_test_param.h, keys + i);
-		if (!key_shift && ret < 0) {
+		if ((write_type == WRITE_NO_KEY_SHIFT) && ret < 0) {
 			printf("writer failed %"PRIu32"\n", i);
 			return -1;
 		}
@@ -581,18 +700,18 @@ test_rwc_multi_writer(__attribute__((unused)) void *arg)
  */
 static int
 test_hash_add_no_ks_lookup_hit(struct rwc_perf *rwc_perf_results, int rwc_lf,
-				int htm)
+				int htm, int ext_bkt)
 {
 	unsigned int n, m;
 	uint64_t i;
 	int use_jhash = 0;
-	uint8_t key_shift = 0;
+	uint8_t write_type = WRITE_NO_KEY_SHIFT;
 	uint8_t read_type = READ_PASS_NO_KEY_SHIFTS;
 
 	rte_atomic64_init(&greads);
 	rte_atomic64_init(&gread_cycles);
 
-	if (init_params(rwc_lf, use_jhash, htm) != 0)
+	if (init_params(rwc_lf, use_jhash, htm, ext_bkt) != 0)
 		goto err;
 	printf("\nTest: Hash add - no key-shifts, read - hit\n");
 	for (m = 0; m < 2; m++) {
@@ -612,7 +731,7 @@ test_hash_add_no_ks_lookup_hit(struct rwc_perf *rwc_perf_results, int rwc_lf,
 
 			rte_hash_reset(tbl_rwc_test_param.h);
 			writer_done = 0;
-			if (write_keys(key_shift) < 0)
+			if (write_keys(write_type) < 0)
 				goto err;
 			writer_done = 1;
 			for (i = 1; i <= rwc_core_cnt[n]; i++)
@@ -650,19 +769,19 @@ test_hash_add_no_ks_lookup_hit(struct rwc_perf *rwc_perf_results, int rwc_lf,
  */
 static int
 test_hash_add_no_ks_lookup_miss(struct rwc_perf *rwc_perf_results, int rwc_lf,
-				int htm)
+				int htm, int ext_bkt)
 {
 	unsigned int n, m;
 	uint64_t i;
 	int use_jhash = 0;
-	uint8_t key_shift = 0;
+	uint8_t write_type = WRITE_NO_KEY_SHIFT;
 	uint8_t read_type = READ_FAIL;
 	int ret;
 
 	rte_atomic64_init(&greads);
 	rte_atomic64_init(&gread_cycles);
 
-	if (init_params(rwc_lf, use_jhash, htm) != 0)
+	if (init_params(rwc_lf, use_jhash, htm, ext_bkt) != 0)
 		goto err;
 	printf("\nTest: Hash add - no key-shifts, Hash lookup - miss\n");
 	for (m = 0; m < 2; m++) {
@@ -687,7 +806,7 @@ test_hash_add_no_ks_lookup_miss(struct rwc_perf *rwc_perf_results, int rwc_lf,
 				rte_eal_remote_launch(test_rwc_reader,
 						(void *)(uintptr_t)read_type,
 							enabled_core_ids[i]);
-			ret = write_keys(key_shift);
+			ret = write_keys(write_type);
 			writer_done = 1;
 			rte_eal_mp_wait_lcore();
 
@@ -722,19 +841,19 @@ test_hash_add_no_ks_lookup_miss(struct rwc_perf *rwc_perf_results, int rwc_lf,
  */
 static int
 test_hash_add_ks_lookup_hit_non_sp(struct rwc_perf *rwc_perf_results,
-				    int rwc_lf, int htm)
+				    int rwc_lf, int htm, int ext_bkt)
 {
 	unsigned int n, m;
 	uint64_t i;
 	int use_jhash = 0;
 	int ret;
-	uint8_t key_shift;
+	uint8_t write_type;
 	uint8_t read_type = READ_PASS_NON_SHIFT_PATH;
 
 	rte_atomic64_init(&greads);
 	rte_atomic64_init(&gread_cycles);
 
-	if (init_params(rwc_lf, use_jhash, htm) != 0)
+	if (init_params(rwc_lf, use_jhash, htm, ext_bkt) != 0)
 		goto err;
 	printf("\nTest: Hash add - key shift, Hash lookup - hit"
 	       " (non-shift-path)\n");
@@ -755,15 +874,15 @@ test_hash_add_ks_lookup_hit_non_sp(struct rwc_perf *rwc_perf_results,
 
 			rte_hash_reset(tbl_rwc_test_param.h);
 			writer_done = 0;
-			key_shift = 0;
-			if (write_keys(key_shift) < 0)
+			write_type = WRITE_NO_KEY_SHIFT;
+			if (write_keys(write_type) < 0)
 				goto err;
 			for (i = 1; i <= rwc_core_cnt[n]; i++)
 				rte_eal_remote_launch(test_rwc_reader,
 						(void *)(uintptr_t)read_type,
 							enabled_core_ids[i]);
-			key_shift = 1;
-			ret = write_keys(key_shift);
+			write_type = WRITE_KEY_SHIFT;
+			ret = write_keys(write_type);
 			writer_done = 1;
 			rte_eal_mp_wait_lcore();
 
@@ -798,19 +917,19 @@ test_hash_add_ks_lookup_hit_non_sp(struct rwc_perf *rwc_perf_results,
  */
 static int
 test_hash_add_ks_lookup_hit_sp(struct rwc_perf *rwc_perf_results, int rwc_lf,
-				int htm)
+				int htm, int ext_bkt)
 {
 	unsigned int n, m;
 	uint64_t i;
 	int use_jhash = 0;
 	int ret;
-	uint8_t key_shift;
+	uint8_t write_type;
 	uint8_t read_type = READ_PASS_SHIFT_PATH;
 
 	rte_atomic64_init(&greads);
 	rte_atomic64_init(&gread_cycles);
 
-	if (init_params(rwc_lf, use_jhash, htm) != 0)
+	if (init_params(rwc_lf, use_jhash, htm, ext_bkt) != 0)
 		goto err;
 	printf("\nTest: Hash add - key shift, Hash lookup - hit (shift-path)"
 	       "\n");
@@ -831,15 +950,15 @@ test_hash_add_ks_lookup_hit_sp(struct rwc_perf *rwc_perf_results, int rwc_lf,
 
 			rte_hash_reset(tbl_rwc_test_param.h);
 			writer_done = 0;
-			key_shift = 0;
-			if (write_keys(key_shift) < 0)
+			write_type = WRITE_NO_KEY_SHIFT;
+			if (write_keys(write_type) < 0)
 				goto err;
 			for (i = 1; i <= rwc_core_cnt[n]; i++)
 				rte_eal_remote_launch(test_rwc_reader,
 						(void *)(uintptr_t)read_type,
 						enabled_core_ids[i]);
-			key_shift = 1;
-			ret = write_keys(key_shift);
+			write_type = WRITE_KEY_SHIFT;
+			ret = write_keys(write_type);
 			writer_done = 1;
 			rte_eal_mp_wait_lcore();
 
@@ -874,19 +993,19 @@ test_hash_add_ks_lookup_hit_sp(struct rwc_perf *rwc_perf_results, int rwc_lf,
  */
 static int
 test_hash_add_ks_lookup_miss(struct rwc_perf *rwc_perf_results, int rwc_lf, int
-			     htm)
+			     htm, int ext_bkt)
 {
 	unsigned int n, m;
 	uint64_t i;
 	int use_jhash = 0;
 	int ret;
-	uint8_t key_shift;
+	uint8_t write_type;
 	uint8_t read_type = READ_FAIL;
 
 	rte_atomic64_init(&greads);
 	rte_atomic64_init(&gread_cycles);
 
-	if (init_params(rwc_lf, use_jhash, htm) != 0)
+	if (init_params(rwc_lf, use_jhash, htm, ext_bkt) != 0)
 		goto err;
 	printf("\nTest: Hash add - key shift, Hash lookup - miss\n");
 	for (m = 0; m < 2; m++) {
@@ -906,15 +1025,15 @@ test_hash_add_ks_lookup_miss(struct rwc_perf *rwc_perf_results, int rwc_lf, int
 
 			rte_hash_reset(tbl_rwc_test_param.h);
 			writer_done = 0;
-			key_shift = 0;
-			if (write_keys(key_shift) < 0)
+			write_type = WRITE_NO_KEY_SHIFT;
+			if (write_keys(write_type) < 0)
 				goto err;
 			for (i = 1; i <= rwc_core_cnt[n]; i++)
 				rte_eal_remote_launch(test_rwc_reader,
 						(void *)(uintptr_t)read_type,
 							enabled_core_ids[i]);
-			key_shift = 1;
-			ret = write_keys(key_shift);
+			write_type = WRITE_KEY_SHIFT;
+			ret = write_keys(write_type);
 			writer_done = 1;
 			rte_eal_mp_wait_lcore();
 
@@ -949,18 +1068,18 @@ test_hash_add_ks_lookup_miss(struct rwc_perf *rwc_perf_results, int rwc_lf, int
  */
 static int
 test_hash_multi_add_lookup(struct rwc_perf *rwc_perf_results, int rwc_lf,
-			   int htm)
+			   int htm, int ext_bkt)
 {
 	unsigned int n, m, k;
 	uint64_t i;
 	int use_jhash = 0;
-	uint8_t key_shift;
+	uint8_t write_type;
 	uint8_t read_type = READ_PASS_SHIFT_PATH;
 
 	rte_atomic64_init(&greads);
 	rte_atomic64_init(&gread_cycles);
 
-	if (init_params(rwc_lf, use_jhash, htm) != 0)
+	if (init_params(rwc_lf, use_jhash, htm, ext_bkt) != 0)
 		goto err;
 	printf("\nTest: Multi-add-lookup\n");
 	uint8_t pos_core;
@@ -991,8 +1110,8 @@ test_hash_multi_add_lookup(struct rwc_perf *rwc_perf_results, int rwc_lf,
 				writer_done = 0;
 				for (i = 0; i < 4; i++)
 					multi_writer_done[i] = 0;
-				key_shift = 0;
-				if (write_keys(key_shift) < 0)
+				write_type = WRITE_NO_KEY_SHIFT;
+				if (write_keys(write_type) < 0)
 					goto err;
 
 				/* Launch reader(s) */
@@ -1000,7 +1119,7 @@ test_hash_multi_add_lookup(struct rwc_perf *rwc_perf_results, int rwc_lf,
 					rte_eal_remote_launch(test_rwc_reader,
 						(void *)(uintptr_t)read_type,
 						enabled_core_ids[i]);
-				key_shift = 1;
+				write_type = WRITE_KEY_SHIFT;
 				pos_core = 0;
 
 				/* Launch writers */
@@ -1045,6 +1164,88 @@ test_hash_multi_add_lookup(struct rwc_perf *rwc_perf_results, int rwc_lf,
 	return -1;
 }
 
+/*
+ * Test lookup perf:
+ * Reader(s) lookup keys present in the extendable bkt.
+ */
+static int
+test_hash_add_ks_lookup_hit_extbkt(struct rwc_perf *rwc_perf_results,
+				int rwc_lf, int htm, int ext_bkt)
+{
+	unsigned int n, m;
+	uint64_t i;
+	int use_jhash = 0;
+	uint8_t write_type;
+	uint8_t read_type = READ_PASS_KEY_SHIFTS_EXTBKT;
+
+	rte_atomic64_init(&greads);
+	rte_atomic64_init(&gread_cycles);
+
+	if (init_params(rwc_lf, use_jhash, htm, ext_bkt) != 0)
+		goto err;
+	printf("\nTest: Hash add - key-shifts, read - hit (ext_bkt)\n");
+	for (m = 0; m < 2; m++) {
+		if (m == 1) {
+			printf("\n** With bulk-lookup **\n");
+			read_type |= BULK_LOOKUP;
+		}
+		for (n = 0; n < NUM_TEST; n++) {
+			unsigned int tot_lcore = rte_lcore_count();
+			if (tot_lcore < rwc_core_cnt[n] + 1)
+				goto finish;
+
+			printf("\nNumber of readers: %u\n", rwc_core_cnt[n]);
+
+			rte_atomic64_clear(&greads);
+			rte_atomic64_clear(&gread_cycles);
+
+			rte_hash_reset(tbl_rwc_test_param.h);
+			write_type = WRITE_NO_KEY_SHIFT;
+			if (write_keys(write_type) < 0)
+				goto err;
+			write_type = WRITE_KEY_SHIFT;
+			if (write_keys(write_type) < 0)
+				goto err;
+			writer_done = 0;
+			for (i = 1; i <= rwc_core_cnt[n]; i++)
+				rte_eal_remote_launch(test_rwc_reader,
+						(void *)(uintptr_t)read_type,
+							enabled_core_ids[i]);
+			for (i = 0; i < tbl_rwc_test_param.count_keys_ks_extbkt;
+			     i++) {
+				if (rte_hash_del_key(tbl_rwc_test_param.h,
+					tbl_rwc_test_param.keys_ks_extbkt + i)
+							< 0) {
+					printf("Delete Failed: %u\n",
+					tbl_rwc_test_param.keys_ks_extbkt[i]);
+					goto err;
+				}
+			}
+			writer_done = 1;
+			rte_eal_mp_wait_lcore();
+
+			for (i = 1; i <= rwc_core_cnt[n]; i++)
+				if (lcore_config[i].ret < 0)
+					goto err;
+
+			unsigned long long cycles_per_lookup =
+				rte_atomic64_read(&gread_cycles) /
+				rte_atomic64_read(&greads);
+			rwc_perf_results->w_ks_r_hit_extbkt[m][n]
+						= cycles_per_lookup;
+			printf("Cycles per lookup: %llu\n", cycles_per_lookup);
+		}
+	}
+
+finish:
+	rte_hash_free(tbl_rwc_test_param.h);
+	return 0;
+
+err:
+	rte_hash_free(tbl_rwc_test_param.h);
+	return -1;
+}
+
 static int
 test_hash_readwrite_lf_main(void)
 {
@@ -1057,6 +1258,7 @@ test_hash_readwrite_lf_main(void)
 	int rwc_lf = 0;
 	int htm;
 	int use_jhash = 0;
+	int ext_bkt = 0;
 	if (rte_lcore_count() == 1) {
 		printf("More than one lcore is required "
 			"to do read write lock-free concurrency test\n");
@@ -1070,7 +1272,7 @@ test_hash_readwrite_lf_main(void)
 	else
 		htm = 0;
 
-	if (init_params(rwc_lf, use_jhash, htm) != 0)
+	if (init_params(rwc_lf, use_jhash, htm, ext_bkt) != 0)
 		return -1;
 	if (generate_keys() != 0)
 		return -1;
@@ -1079,25 +1281,29 @@ test_hash_readwrite_lf_main(void)
 
 	if (RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY_LF) {
 		rwc_lf = 1;
+		ext_bkt = 1;
 		printf("Test lookup with read-write concurrency lock free support"
 		       " enabled\n");
 		if (test_hash_add_no_ks_lookup_hit(&rwc_lf_results, rwc_lf,
-							htm) < 0)
+							htm, ext_bkt) < 0)
 			return -1;
 		if (test_hash_add_no_ks_lookup_miss(&rwc_lf_results, rwc_lf,
-							htm) < 0)
+							htm, ext_bkt) < 0)
 			return -1;
 		if (test_hash_add_ks_lookup_hit_non_sp(&rwc_lf_results, rwc_lf,
-							htm) < 0)
+							htm, ext_bkt) < 0)
 			return -1;
 		if (test_hash_add_ks_lookup_hit_sp(&rwc_lf_results, rwc_lf,
-							htm) < 0)
+							htm, ext_bkt) < 0)
 			return -1;
-		if (test_hash_add_ks_lookup_miss(&rwc_lf_results, rwc_lf, htm)
-							< 0)
+		if (test_hash_add_ks_lookup_miss(&rwc_lf_results, rwc_lf, htm,
+						 ext_bkt) < 0)
 			return -1;
-		if (test_hash_multi_add_lookup(&rwc_lf_results, rwc_lf, htm)
-							< 0)
+		if (test_hash_multi_add_lookup(&rwc_lf_results, rwc_lf, htm,
+					       ext_bkt) < 0)
+			return -1;
+		if (test_hash_add_ks_lookup_hit_extbkt(&rwc_lf_results, rwc_lf,
+							htm, ext_bkt) < 0)
 			return -1;
 	}
 	printf("\nTest lookup with read-write concurrency lock free support"
@@ -1112,21 +1318,26 @@ test_hash_readwrite_lf_main(void)
 		}
 	} else
 		printf("With HTM Enabled\n");
-	if (test_hash_add_no_ks_lookup_hit(&rwc_non_lf_results, rwc_lf, htm)
-						< 0)
+	if (test_hash_add_no_ks_lookup_hit(&rwc_non_lf_results, rwc_lf, htm,
+					   ext_bkt) < 0)
 		return -1;
-	if (test_hash_add_no_ks_lookup_miss(&rwc_non_lf_results, rwc_lf, htm)
-						< 0)
+	if (test_hash_add_no_ks_lookup_miss(&rwc_non_lf_results, rwc_lf, htm,
+						ext_bkt) < 0)
 		return -1;
 	if (test_hash_add_ks_lookup_hit_non_sp(&rwc_non_lf_results, rwc_lf,
-						htm) < 0)
+						htm, ext_bkt) < 0)
+		return -1;
+	if (test_hash_add_ks_lookup_hit_sp(&rwc_non_lf_results, rwc_lf, htm,
+						ext_bkt) < 0)
 		return -1;
-	if (test_hash_add_ks_lookup_hit_sp(&rwc_non_lf_results, rwc_lf, htm)
-						< 0)
+	if (test_hash_add_ks_lookup_miss(&rwc_non_lf_results, rwc_lf, htm,
+					 ext_bkt) < 0)
 		return -1;
-	if (test_hash_add_ks_lookup_miss(&rwc_non_lf_results, rwc_lf, htm) < 0)
+	if (test_hash_multi_add_lookup(&rwc_non_lf_results, rwc_lf, htm,
+							ext_bkt) < 0)
 		return -1;
-	if (test_hash_multi_add_lookup(&rwc_non_lf_results, rwc_lf, htm) < 0)
+	if (test_hash_add_ks_lookup_hit_extbkt(&rwc_lf_results, rwc_lf,
+						htm, ext_bkt) < 0)
 		return -1;
 results:
 	printf("\n\t\t\t\t\t\t********** Results summary **********\n\n");
@@ -1158,8 +1369,11 @@ test_hash_readwrite_lf_main(void)
 			       "(shift-path)\t\t%u\n\t\t\t\t\t\t\t\t",
 			       rwc_lf_results.w_ks_r_hit_sp[j][i]);
 			printf("Hash add - key-shifts, Hash lookup miss\t\t\t\t"
-				"%u\n\n\t\t\t\t",
+				"%u\n\t\t\t\t\t\t\t\t",
 				rwc_lf_results.w_ks_r_miss[j][i]);
+			printf("Hash add - key-shifts, Hash lookup hit (ext_bkt)\t\t"
+				"%u\n\n\t\t\t\t",
+				rwc_lf_results.w_ks_r_hit_extbkt[j][i]);
 
 			printf("Disabled\t");
 			if (htm)
@@ -1179,7 +1393,11 @@ test_hash_readwrite_lf_main(void)
 			       "(shift-path)\t\t%u\n\t\t\t\t\t\t\t\t",
 			       rwc_non_lf_results.w_ks_r_hit_sp[j][i]);
 			printf("Hash add - key-shifts, Hash lookup miss\t\t\t\t"
-			       "%u\n", rwc_non_lf_results.w_ks_r_miss[j][i]);
+			       "%u\n\t\t\t\t\t\t\t\t",
+			       rwc_non_lf_results.w_ks_r_miss[j][i]);
+			printf("Hash add - key-shifts, Hash lookup hit (ext_bkt)\t\t"
+				"%u\n",
+				rwc_non_lf_results.w_ks_r_hit_extbkt[j][i]);
 
 			printf("_______\t\t_______\t\t_________\t___\t\t"
 			       "_________\t\t\t\t\t\t_________________\n");
-- 
2.17.1


* Re: [dpdk-dev] [RFC 1/2] hash: add lock free support for extendable bucket
  2019-03-02  0:23 ` [dpdk-dev] [RFC 1/2] hash: add lock free support for extendable bucket Dharmik Thakkar
@ 2019-03-07 17:49   ` Wang, Yipeng1
  2019-03-07 22:14     ` Dharmik Thakkar
  0 siblings, 1 reply; 9+ messages in thread
From: Wang, Yipeng1 @ 2019-03-07 17:49 UTC (permalink / raw)
  To: Dharmik Thakkar, Gobriel, Sameh, Richardson, Bruce,
	De Lara Guarch, Pablo, Mcnamara, John, Kovacevic, Marko
  Cc: dev, Tai, Charlie

Thanks for this patch! Some initial comments inlined:

>-----Original Message-----
>From: Dharmik Thakkar [mailto:dharmik.thakkar@arm.com]
>Sent: Friday, March 1, 2019 4:24 PM
>To: Wang, Yipeng1 <yipeng1.wang@intel.com>; Gobriel, Sameh <sameh.gobriel@intel.com>; Richardson, Bruce
><bruce.richardson@intel.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; Mcnamara, John
><john.mcnamara@intel.com>; Kovacevic, Marko <marko.kovacevic@intel.com>
>Cc: dev@dpdk.org; Dharmik Thakkar <dharmik.thakkar@arm.com>
>Subject: [RFC 1/2] hash: add lock free support for extendable bucket
>
>This patch enables lock-free read-write concurrency support for
>extendable bucket feature.
>
>Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
>---
> doc/guides/prog_guide/hash_lib.rst |   3 +-
> lib/librte_hash/rte_cuckoo_hash.c  | 150 +++++++++++++++++++----------
> lib/librte_hash/rte_cuckoo_hash.h  |   8 ++
> 3 files changed, 110 insertions(+), 51 deletions(-)
>
>diff --git a/doc/guides/prog_guide/hash_lib.rst b/doc/guides/prog_guide/hash_lib.rst
>index 85a6edfa8b16..b00446e949ba 100644
>--- a/doc/guides/prog_guide/hash_lib.rst
>+++ b/doc/guides/prog_guide/hash_lib.rst
>@@ -108,8 +108,7 @@ Extendable Bucket Functionality support
> An extra flag is used to enable this functionality (flag is not set by default). When the (RTE_HASH_EXTRA_FLAGS_EXT_TABLE) is set
>and
> in the very unlikely case due to excessive hash collisions that a key has failed to be inserted, the hash table bucket is extended with a
>linked
> list to insert these failed keys. This feature is important for the workloads (e.g. telco workloads) that need to insert up to 100% of the
>-hash table size and can't tolerate any key insertion failure (even if very few). Currently the extendable bucket is not supported
>-with the lock-free concurrency implementation (RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY_LF).
>+hash table size and can't tolerate any key insertion failure (even if very few).
>
>
> Implementation Details (non Extendable Bucket Case)
>diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
>index c01489ba5193..54762533ce36 100644
>--- a/lib/librte_hash/rte_cuckoo_hash.c
>+++ b/lib/librte_hash/rte_cuckoo_hash.c
>@@ -170,15 +170,6 @@ rte_hash_create(const struct rte_hash_parameters *params)
> 		return NULL;
> 	}
>
>-	if ((params->extra_flag & RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY_LF) &&
>-	    (params->extra_flag & RTE_HASH_EXTRA_FLAGS_EXT_TABLE)) {
>-		rte_errno = EINVAL;
>-		RTE_LOG(ERR, HASH, "rte_hash_create: extendable bucket "
>-			"feature not supported with rw concurrency "
>-			"lock free\n");
>-		return NULL;
>-	}
>-
> 	/* Check extra flags field to check extra options. */
> 	if (params->extra_flag & RTE_HASH_EXTRA_FLAGS_TRANS_MEM_SUPPORT)
> 		hw_trans_mem_support = 1;
>@@ -1054,7 +1045,15 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
> 			/* Check if slot is available */
> 			if (likely(cur_bkt->key_idx[i] == EMPTY_SLOT)) {
> 				cur_bkt->sig_current[i] = short_sig;
>-				cur_bkt->key_idx[i] = new_idx;
>+				/* Key can be of arbitrary length, so it is
>+				 * not possible to store it atomically.
>+				 * Hence the new key element's memory stores
>+				 * (key as well as data) should be complete
>+				 * before it is referenced.
>+				 */
>+				__atomic_store_n(&cur_bkt->key_idx[i],
>+						 new_idx,
>+						 __ATOMIC_RELEASE);
> 				__hash_rw_writer_unlock(h);
> 				return new_idx - 1;
> 			}
>@@ -1072,10 +1071,23 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
> 	bkt_id = (uint32_t)((uintptr_t)ext_bkt_id) - 1;
> 	/* Use the first location of the new bucket */
> 	(h->buckets_ext[bkt_id]).sig_current[0] = short_sig;
>-	(h->buckets_ext[bkt_id]).key_idx[0] = new_idx;
>+	/* Key can be of arbitrary length, so it is
>+	 * not possible to store it atomically.
>+	 * Hence the new key element's memory stores
>+	 * (key as well as data) should be complete
>+	 * before it is referenced.
>+	 */
>+	__atomic_store_n(&(h->buckets_ext[bkt_id]).key_idx[0],
>+			 new_idx,
>+			 __ATOMIC_RELEASE);
 [Wang, Yipeng] Since the bucket has not been linked yet and the linking later on is
protected by a release store, do we really need an atomic store here?
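Something like this would be enough, I think (rough sketch, untested):

	/* Plain stores: the new bucket is not visible to readers until the
	 * release store to last->next links it in.
	 */
	(h->buckets_ext[bkt_id]).sig_current[0] = short_sig;
	(h->buckets_ext[bkt_id]).key_idx[0] = new_idx;

	last = rte_hash_get_last_bkt(sec_bkt);
	__atomic_store_n(&last->next, &h->buckets_ext[bkt_id],
			 __ATOMIC_RELEASE);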

> 	/* Link the new bucket to sec bucket linked list */
> 	last = rte_hash_get_last_bkt(sec_bkt);
>-	last->next = &h->buckets_ext[bkt_id];
>+	/* New bucket's memory stores (key as well as data)
>+	 * should be complete before it is referenced
>+	 */
>+	__atomic_store_n(&last->next,
>+			 &h->buckets_ext[bkt_id],
>+			 __ATOMIC_RELEASE);
> 	__hash_rw_writer_unlock(h);
> 	return new_idx - 1;
>
>@@ -1366,7 +1378,8 @@ remove_entry(const struct rte_hash *h, struct rte_hash_bucket *bkt, unsigned i)
>  * empty slot.
>  */
> static inline void
>-__rte_hash_compact_ll(struct rte_hash_bucket *cur_bkt, int pos) {
>+__rte_hash_compact_ll(const struct rte_hash *h,
>+			struct rte_hash_bucket *cur_bkt, int pos) {
> 	int i;
> 	struct rte_hash_bucket *last_bkt;
>
>@@ -1377,10 +1390,27 @@ __rte_hash_compact_ll(struct rte_hash_bucket *cur_bkt, int pos) {
>
> 	for (i = RTE_HASH_BUCKET_ENTRIES - 1; i >= 0; i--) {
> 		if (last_bkt->key_idx[i] != EMPTY_SLOT) {
>-			cur_bkt->key_idx[pos] = last_bkt->key_idx[i];
> 			cur_bkt->sig_current[pos] = last_bkt->sig_current[i];
>+			__atomic_store_n(&cur_bkt->key_idx[pos],
>+					 last_bkt->key_idx[i],
>+					 __ATOMIC_RELEASE);
>+			if (h->readwrite_concur_lf_support) {
>+				/* Inform the readers that the table has changed
>+				 * Since there is one writer, load acquires on
>+				 * tbl_chng_cnt are not required.
>+				 */
>+				__atomic_store_n(h->tbl_chng_cnt,
>+					 *h->tbl_chng_cnt + 1,
>+					 __ATOMIC_RELEASE);
>+				/* The stores to sig_alt and sig_current should
>+				 * not move above the store to tbl_chng_cnt.
>+				 */
>+				__atomic_thread_fence(__ATOMIC_RELEASE);
>+			}
> 			last_bkt->sig_current[i] = NULL_SIGNATURE;
>-			last_bkt->key_idx[i] = EMPTY_SLOT;
>+			__atomic_store_n(&last_bkt->key_idx[i],
>+					 EMPTY_SLOT,
>+					 __ATOMIC_RELEASE);
> 			return;
> 		}
> 	}
>@@ -1449,7 +1479,7 @@ __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
> 	/* look for key in primary bucket */
> 	ret = search_and_remove(h, key, prim_bkt, short_sig, &pos);
> 	if (ret != -1) {
>-		__rte_hash_compact_ll(prim_bkt, pos);
>+		__rte_hash_compact_ll(h, prim_bkt, pos);
> 		last_bkt = prim_bkt->next;
> 		prev_bkt = prim_bkt;
> 		goto return_bkt;
>@@ -1461,7 +1491,7 @@ __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
> 	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
> 		ret = search_and_remove(h, key, cur_bkt, short_sig, &pos);
> 		if (ret != -1) {
>-			__rte_hash_compact_ll(cur_bkt, pos);
>+			__rte_hash_compact_ll(h, cur_bkt, pos);
> 			last_bkt = sec_bkt->next;
> 			prev_bkt = sec_bkt;
> 			goto return_bkt;
>@@ -1488,11 +1518,21 @@ __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
> 	}
> 	/* found empty bucket and recycle */
> 	if (i == RTE_HASH_BUCKET_ENTRIES) {
>-		prev_bkt->next = last_bkt->next = NULL;
>+		__atomic_store_n(&prev_bkt->next,
>+				 NULL,
>+				 __ATOMIC_RELEASE);
> 		uint32_t index = last_bkt - h->buckets_ext + 1;
>-		rte_ring_sp_enqueue(h->free_ext_bkts, (void *)(uintptr_t)index);
>-	}
>+		if (!h->no_free_on_del)
>+			rte_ring_sp_enqueue(h->free_ext_bkts, (void *)(uintptr_t)index);
>+		else {
>+			struct rte_hash_key *key_slot =
>+				(struct rte_hash_key *)(
>+				(char *)h->key_store +
>+				ret * h->key_entry_size);
>+			key_slot->ext_bkt_to_free = index;
[Wang, Yipeng] Is there a chance that a key_slot may already hold a previous ext_bkt
index which now gets overwritten, so that the previous one is lost forever?
>
>+		}
>+	}
> 	__hash_rw_writer_unlock(h);
> 	return ret;
> }
>@@ -1567,6 +1607,14 @@ rte_hash_free_key_with_position(const struct rte_hash *h,
> 				(void *)((uintptr_t)position));
> 	}
>
>+	const struct rte_hash_key *key_slot = (const struct rte_hash_key *)(
>+						(const char *)h->key_store +
>+						position * h->key_entry_size);
>+	uint32_t index = key_slot->ext_bkt_to_free;
>+	if (!index)
>+		/* Recycle empty ext bkt to free list. */
>+		rte_ring_sp_enqueue(h->free_ext_bkts, (void *)(uintptr_t)index);
>+
> 	return 0;
> }
>
>@@ -1855,6 +1903,9 @@ __rte_hash_lookup_bulk_lf(const struct rte_hash *h, const void **keys,
> 		rte_prefetch0(secondary_bkt[i]);
> 	}
>
>+	for (i = 0; i < num_keys; i++)
>+		positions[i] = -ENOENT;
[Wang, Yipeng] So is this for performance reasons?
>+
> 	do {
> 		/* Load the table change counter before the lookup
> 		 * starts. Acquire semantics will make sure that
>@@ -1899,7 +1950,6 @@ __rte_hash_lookup_bulk_lf(const struct rte_hash *h, const void **keys,
>
> 		/* Compare keys, first hits in primary first */
> 		for (i = 0; i < num_keys; i++) {
>-			positions[i] = -ENOENT;
> 			while (prim_hitmask[i]) {
> 				uint32_t hit_index =
> 						__builtin_ctzl(prim_hitmask[i])
>@@ -1972,6 +2022,36 @@ __rte_hash_lookup_bulk_lf(const struct rte_hash *h, const void **keys,
> 			continue;
> 		}
>
>+
>+		/* all found, do not need to go through ext bkt */
>+		if (hits == ((1ULL << num_keys) - 1)) {
>+			if (hit_mask != NULL)
>+				*hit_mask = hits;

[Wang, Yipeng] I think you need to check the version counter before returning,
and how about the fence?
>+			return;
>+		}
>+		/* need to check ext buckets for match */
>+		if (h->ext_table_support) {
>+			for (i = 0; i < num_keys; i++) {
>+				if ((hits & (1ULL << i)) != 0)
>+					continue;
>+				next_bkt = secondary_bkt[i]->next;
>+				FOR_EACH_BUCKET(cur_bkt, next_bkt) {
>+					if (data != NULL)
>+						ret = search_one_bucket_lf(h,
>+							keys[i], sig[i],
>+							&data[i], cur_bkt);
>+					else
>+						ret = search_one_bucket_lf(h,
>+								keys[i], sig[i],
>+								NULL, cur_bkt);
>+					if (ret != -1) {
>+						positions[i] = ret;
>+						hits |= 1ULL << i;
>+						break;
>+					}
>+				}
>+			}
>+		}
> 		/* The loads of sig_current in compare_signatures
> 		 * should not move below the load from tbl_chng_cnt.
> 		 */
>@@ -1988,34 +2068,6 @@ __rte_hash_lookup_bulk_lf(const struct rte_hash *h, const void **keys,
> 					__ATOMIC_ACQUIRE);
> 	} while (cnt_b != cnt_a);
>
>-	/* all found, do not need to go through ext bkt */
>-	if ((hits == ((1ULL << num_keys) - 1)) || !h->ext_table_support) {
>-		if (hit_mask != NULL)
>-			*hit_mask = hits;
>-		__hash_rw_reader_unlock(h);
>-		return;
>-	}
>-
>-	/* need to check ext buckets for match */
>-	for (i = 0; i < num_keys; i++) {
>-		if ((hits & (1ULL << i)) != 0)
>-			continue;
>-		next_bkt = secondary_bkt[i]->next;
>-		FOR_EACH_BUCKET(cur_bkt, next_bkt) {
>-			if (data != NULL)
>-				ret = search_one_bucket_lf(h, keys[i],
>-						sig[i], &data[i], cur_bkt);
>-			else
>-				ret = search_one_bucket_lf(h, keys[i],
>-						sig[i], NULL, cur_bkt);
>-			if (ret != -1) {
>-				positions[i] = ret;
>-				hits |= 1ULL << i;
>-				break;
>-			}
>-		}
>-	}
>-
> 	if (hit_mask != NULL)
> 		*hit_mask = hits;
> }
>diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
>index eacdaa8d4684..062cc2dd0296 100644
>--- a/lib/librte_hash/rte_cuckoo_hash.h
>+++ b/lib/librte_hash/rte_cuckoo_hash.h
>@@ -129,6 +129,14 @@ struct lcore_cache {
>
> /* Structure that stores key-value pair */
> struct rte_hash_key {
>+	/* Stores index of an empty ext bkt to be recycled on calling
>+	 * rte_hash_del_xxx APIs. When lock free read-wrie concurrency is
[Wang, Yipeng] typo
>+	 * enabled, an empty ext bkt cannot be put into free list immediately
>+	 * (as readers might be using it still). Hence freeing of the ext bkt
>+	 * is piggy-backed to freeing of the key index.
>+	 */
[Wang, Yipeng] I am wondering if this breaks the "guarantee" provided by the ext table,
since a whole bucket cannot be reused if even one key in it is not freed. Is there any
fundamental issue with a new API to recycle the ext bucket, or do you just not want to
add a new API?
>+	uint32_t ext_bkt_to_free;
>+
> 	union {
> 		uintptr_t idata;
> 		void *pdata;
>--
>2.17.1


* Re: [dpdk-dev] [RFC 1/2] hash: add lock free support for extendable bucket
  2019-03-07 17:49   ` Wang, Yipeng1
@ 2019-03-07 22:14     ` Dharmik Thakkar
  2019-03-15  0:31       ` Honnappa Nagarahalli
  0 siblings, 1 reply; 9+ messages in thread
From: Dharmik Thakkar @ 2019-03-07 22:14 UTC (permalink / raw)
  To: Wang, Yipeng1
  Cc: Gobriel, Sameh, Richardson, Bruce, De Lara Guarch, Pablo,
	Mcnamara, John, Kovacevic, Marko, dev, Tai, Charlie,
	Honnappa Nagarahalli, nd

+ Honnappa

Hi Yipeng,

Thank you for the review comments!


> On Mar 7, 2019, at 11:49 AM, Wang, Yipeng1 <yipeng1.wang@intel.com> wrote:
> 
> Thanks for this patch! Some initial comments inlined:
> 
>> -----Original Message-----
>> From: Dharmik Thakkar [mailto:dharmik.thakkar@arm.com]
>> Sent: Friday, March 1, 2019 4:24 PM
>> To: Wang, Yipeng1 <yipeng1.wang@intel.com>; Gobriel, Sameh <sameh.gobriel@intel.com>; Richardson, Bruce
>> <bruce.richardson@intel.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; Mcnamara, John
>> <john.mcnamara@intel.com>; Kovacevic, Marko <marko.kovacevic@intel.com>
>> Cc: dev@dpdk.org; Dharmik Thakkar <dharmik.thakkar@arm.com>
>> Subject: [RFC 1/2] hash: add lock free support for extendable bucket
>> 
>> This patch enables lock-free read-write concurrency support for
>> extendable bucket feature.
>> 
>> Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>> Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
>> ---
>> doc/guides/prog_guide/hash_lib.rst |   3 +-
>> lib/librte_hash/rte_cuckoo_hash.c  | 150 +++++++++++++++++++----------
>> lib/librte_hash/rte_cuckoo_hash.h  |   8 ++
>> 3 files changed, 110 insertions(+), 51 deletions(-)
>> 
>> diff --git a/doc/guides/prog_guide/hash_lib.rst b/doc/guides/prog_guide/hash_lib.rst
>> index 85a6edfa8b16..b00446e949ba 100644
>> --- a/doc/guides/prog_guide/hash_lib.rst
>> +++ b/doc/guides/prog_guide/hash_lib.rst
>> @@ -108,8 +108,7 @@ Extendable Bucket Functionality support
>> An extra flag is used to enable this functionality (flag is not set by default). When the (RTE_HASH_EXTRA_FLAGS_EXT_TABLE) is set
>> and
>> in the very unlikely case due to excessive hash collisions that a key has failed to be inserted, the hash table bucket is extended with a
>> linked
>> list to insert these failed keys. This feature is important for the workloads (e.g. telco workloads) that need to insert up to 100% of the
>> -hash table size and can't tolerate any key insertion failure (even if very few). Currently the extendable bucket is not supported
>> -with the lock-free concurrency implementation (RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY_LF).
>> +hash table size and can't tolerate any key insertion failure (even if very few).
>> 
>> 
>> Implementation Details (non Extendable Bucket Case)
>> diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
>> index c01489ba5193..54762533ce36 100644
>> --- a/lib/librte_hash/rte_cuckoo_hash.c
>> +++ b/lib/librte_hash/rte_cuckoo_hash.c
>> @@ -170,15 +170,6 @@ rte_hash_create(const struct rte_hash_parameters *params)
>> 		return NULL;
>> 	}
>> 
>> -	if ((params->extra_flag & RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY_LF) &&
>> -	    (params->extra_flag & RTE_HASH_EXTRA_FLAGS_EXT_TABLE)) {
>> -		rte_errno = EINVAL;
>> -		RTE_LOG(ERR, HASH, "rte_hash_create: extendable bucket "
>> -			"feature not supported with rw concurrency "
>> -			"lock free\n");
>> -		return NULL;
>> -	}
>> -
>> 	/* Check extra flags field to check extra options. */
>> 	if (params->extra_flag & RTE_HASH_EXTRA_FLAGS_TRANS_MEM_SUPPORT)
>> 		hw_trans_mem_support = 1;
>> @@ -1054,7 +1045,15 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
>> 			/* Check if slot is available */
>> 			if (likely(cur_bkt->key_idx[i] == EMPTY_SLOT)) {
>> 				cur_bkt->sig_current[i] = short_sig;
>> -				cur_bkt->key_idx[i] = new_idx;
>> +				/* Key can be of arbitrary length, so it is
>> +				 * not possible to store it atomically.
>> +				 * Hence the new key element's memory stores
>> +				 * (key as well as data) should be complete
>> +				 * before it is referenced.
>> +				 */
>> +				__atomic_store_n(&cur_bkt->key_idx[i],
>> +						 new_idx,
>> +						 __ATOMIC_RELEASE);
>> 				__hash_rw_writer_unlock(h);
>> 				return new_idx - 1;
>> 			}
>> @@ -1072,10 +1071,23 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
>> 	bkt_id = (uint32_t)((uintptr_t)ext_bkt_id) - 1;
>> 	/* Use the first location of the new bucket */
>> 	(h->buckets_ext[bkt_id]).sig_current[0] = short_sig;
>> -	(h->buckets_ext[bkt_id]).key_idx[0] = new_idx;
>> +	/* Key can be of arbitrary length, so it is
>> +	 * not possible to store it atomically.
>> +	 * Hence the new key element's memory stores
>> +	 * (key as well as data) should be complete
>> +	 * before it is referenced.
>> +	 */
>> +	__atomic_store_n(&(h->buckets_ext[bkt_id]).key_idx[0],
>> +			 new_idx,
>> +			 __ATOMIC_RELEASE);
> [Wang, Yipeng] Since the bucket has not been linked yet and the linking later on is
> protected by a release store, do we really need an atomic store here?
Atomic store is used here to maintain code consistency.
> 
>> 	/* Link the new bucket to sec bucket linked list */
>> 	last = rte_hash_get_last_bkt(sec_bkt);
>> -	last->next = &h->buckets_ext[bkt_id];
>> +	/* New bucket's memory stores (key as well as data)
>> +	 * should be complete before it is referenced
>> +	 */
>> +	__atomic_store_n(&last->next,
>> +			 &h->buckets_ext[bkt_id],
>> +			 __ATOMIC_RELEASE);
>> 	__hash_rw_writer_unlock(h);
>> 	return new_idx - 1;
>> 
>> @@ -1366,7 +1378,8 @@ remove_entry(const struct rte_hash *h, struct rte_hash_bucket *bkt, unsigned i)
>> * empty slot.
>> */
>> static inline void
>> -__rte_hash_compact_ll(struct rte_hash_bucket *cur_bkt, int pos) {
>> +__rte_hash_compact_ll(const struct rte_hash *h,
>> +			struct rte_hash_bucket *cur_bkt, int pos) {
>> 	int i;
>> 	struct rte_hash_bucket *last_bkt;
>> 
>> @@ -1377,10 +1390,27 @@ __rte_hash_compact_ll(struct rte_hash_bucket *cur_bkt, int pos) {
>> 
>> 	for (i = RTE_HASH_BUCKET_ENTRIES - 1; i >= 0; i--) {
>> 		if (last_bkt->key_idx[i] != EMPTY_SLOT) {
>> -			cur_bkt->key_idx[pos] = last_bkt->key_idx[i];
>> 			cur_bkt->sig_current[pos] = last_bkt->sig_current[i];
>> +			__atomic_store_n(&cur_bkt->key_idx[pos],
>> +					 last_bkt->key_idx[i],
>> +					 __ATOMIC_RELEASE);
>> +			if (h->readwrite_concur_lf_support) {
>> +				/* Inform the readers that the table has changed
>> +				 * Since there is one writer, load acquires on
>> +				 * tbl_chng_cnt are not required.
>> +				 */
>> +				__atomic_store_n(h->tbl_chng_cnt,
>> +					 *h->tbl_chng_cnt + 1,
>> +					 __ATOMIC_RELEASE);
>> +				/* The stores to sig_alt and sig_current should
>> +				 * not move above the store to tbl_chng_cnt.
>> +				 */
>> +				__atomic_thread_fence(__ATOMIC_RELEASE);
>> +			}
>> 			last_bkt->sig_current[i] = NULL_SIGNATURE;
>> -			last_bkt->key_idx[i] = EMPTY_SLOT;
>> +			__atomic_store_n(&last_bkt->key_idx[i],
>> +					 EMPTY_SLOT,
>> +					 __ATOMIC_RELEASE);
>> 			return;
>> 		}
>> 	}
>> @@ -1449,7 +1479,7 @@ __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
>> 	/* look for key in primary bucket */
>> 	ret = search_and_remove(h, key, prim_bkt, short_sig, &pos);
>> 	if (ret != -1) {
>> -		__rte_hash_compact_ll(prim_bkt, pos);
>> +		__rte_hash_compact_ll(h, prim_bkt, pos);
>> 		last_bkt = prim_bkt->next;
>> 		prev_bkt = prim_bkt;
>> 		goto return_bkt;
>> @@ -1461,7 +1491,7 @@ __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
>> 	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
>> 		ret = search_and_remove(h, key, cur_bkt, short_sig, &pos);
>> 		if (ret != -1) {
>> -			__rte_hash_compact_ll(cur_bkt, pos);
>> +			__rte_hash_compact_ll(h, cur_bkt, pos);
>> 			last_bkt = sec_bkt->next;
>> 			prev_bkt = sec_bkt;
>> 			goto return_bkt;
>> @@ -1488,11 +1518,21 @@ __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
>> 	}
>> 	/* found empty bucket and recycle */
>> 	if (i == RTE_HASH_BUCKET_ENTRIES) {
>> -		prev_bkt->next = last_bkt->next = NULL;
>> +		__atomic_store_n(&prev_bkt->next,
>> +				 NULL,
>> +				 __ATOMIC_RELEASE);
>> 		uint32_t index = last_bkt - h->buckets_ext + 1;
>> -		rte_ring_sp_enqueue(h->free_ext_bkts, (void *)(uintptr_t)index);
>> -	}
>> +		if (!h->no_free_on_del)
>> +			rte_ring_sp_enqueue(h->free_ext_bkts, (void *)(uintptr_t)index);
>> +		else {
>> +			struct rte_hash_key *key_slot =
>> +				(struct rte_hash_key *)(
>> +				(char *)h->key_store +
>> +				ret * h->key_entry_size);
>> +			key_slot->ext_bkt_to_free = index;
> [Wang, Yipeng] Is there chance that a key_slot may already have one previous ext_bkt
> and now got overwritten, so that the previous one gone forever?
No, it is not possible, since the index is stored in a key_slot which is associated with a deleted key.
>> 
>> +		}
>> +	}
>> 	__hash_rw_writer_unlock(h);
>> 	return ret;
>> }
>> @@ -1567,6 +1607,14 @@ rte_hash_free_key_with_position(const struct rte_hash *h,
>> 				(void *)((uintptr_t)position));
>> 	}
>> 
>> +	const struct rte_hash_key *key_slot = (const struct rte_hash_key *)(
>> +						(const char *)h->key_store +
>> +						position * h->key_entry_size);
>> +	uint32_t index = key_slot->ext_bkt_to_free;
>> +	if (!index)
>> +		/* Recycle empty ext bkt to free list. */
>> +		rte_ring_sp_enqueue(h->free_ext_bkts, (void *)(uintptr_t)index);
>> +
>> 	return 0;
>> }
>> 
>> @@ -1855,6 +1903,9 @@ __rte_hash_lookup_bulk_lf(const struct rte_hash *h, const void **keys,
>> 		rte_prefetch0(secondary_bkt[i]);
>> 	}
>> 
>> +	for (i = 0; i < num_keys; i++)
>> +		positions[i] = -ENOENT;
> [Wang, Yipeng] So is this for performance reason?
Resetting positions[] on each iteration of the do…while() loop requires ‘hits’ to be reset as well, which causes a performance hit.
>> +
>> 	do {
>> 		/* Load the table change counter before the lookup
>> 		 * starts. Acquire semantics will make sure that
>> @@ -1899,7 +1950,6 @@ __rte_hash_lookup_bulk_lf(const struct rte_hash *h, const void **keys,
>> 
>> 		/* Compare keys, first hits in primary first */
>> 		for (i = 0; i < num_keys; i++) {
>> -			positions[i] = -ENOENT;
>> 			while (prim_hitmask[i]) {
>> 				uint32_t hit_index =
>> 						__builtin_ctzl(prim_hitmask[i])
>> @@ -1972,6 +2022,36 @@ __rte_hash_lookup_bulk_lf(const struct rte_hash *h, const void **keys,
>> 			continue;
>> 		}
>> 
>> +
>> +		/* all found, do not need to go through ext bkt */
>> +		if (hits == ((1ULL << num_keys) - 1)) {
>> +			if (hit_mask != NULL)
>> +				*hit_mask = hits;
> 
> [Wang, Yipeng] I think you need to check the version counter before return,
> and how about the fence?
If all the keys are found, there is no need to check the counter.
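To illustrate: the retry on tbl_chng_cnt only guards against a false miss while the writer is
relocating a key, so once every key has matched there is nothing left to re-check. A condensed
sketch of the pattern (bucket probing elided; variable names follow the quoted code, this is not
the exact rte_cuckoo_hash.c implementation):

    for (i = 0; i < num_keys; i++)
        positions[i] = -ENOENT;     /* initialized once, outside the retry loop */
    hits = 0;
    do {
        /* snapshot the change counter before probing */
        cnt_b = __atomic_load_n(h->tbl_chng_cnt, __ATOMIC_ACQUIRE);

        /* probe primary/secondary/ext buckets; a match sets positions[i]
         * and bit i of 'hits' (elided in this sketch)
         */

        if (hits == ((1ULL << num_keys) - 1))
            break;                  /* all keys found: no re-check needed */

        /* re-read the counter; retry only if the writer changed the table */
        cnt_a = __atomic_load_n(h->tbl_chng_cnt, __ATOMIC_ACQUIRE);
    } while (cnt_b != cnt_a);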
>> +			return;
>> +		}
>> +		/* need to check ext buckets for match */
>> +		if (h->ext_table_support) {
>> +			for (i = 0; i < num_keys; i++) {
>> +				if ((hits & (1ULL << i)) != 0)
>> +					continue;
>> +				next_bkt = secondary_bkt[i]->next;
>> +				FOR_EACH_BUCKET(cur_bkt, next_bkt) {
>> +					if (data != NULL)
>> +						ret = search_one_bucket_lf(h,
>> +							keys[i], sig[i],
>> +							&data[i], cur_bkt);
>> +					else
>> +						ret = search_one_bucket_lf(h,
>> +								keys[i], sig[i],
>> +								NULL, cur_bkt);
>> +					if (ret != -1) {
>> +						positions[i] = ret;
>> +						hits |= 1ULL << i;
>> +						break;
>> +					}
>> +				}
>> +			}
>> +		}
>> 		/* The loads of sig_current in compare_signatures
>> 		 * should not move below the load from tbl_chng_cnt.
>> 		 */
>> @@ -1988,34 +2068,6 @@ __rte_hash_lookup_bulk_lf(const struct rte_hash *h, const void **keys,
>> 					__ATOMIC_ACQUIRE);
>> 	} while (cnt_b != cnt_a);
>> 
>> -	/* all found, do not need to go through ext bkt */
>> -	if ((hits == ((1ULL << num_keys) - 1)) || !h->ext_table_support) {
>> -		if (hit_mask != NULL)
>> -			*hit_mask = hits;
>> -		__hash_rw_reader_unlock(h);
>> -		return;
>> -	}
>> -
>> -	/* need to check ext buckets for match */
>> -	for (i = 0; i < num_keys; i++) {
>> -		if ((hits & (1ULL << i)) != 0)
>> -			continue;
>> -		next_bkt = secondary_bkt[i]->next;
>> -		FOR_EACH_BUCKET(cur_bkt, next_bkt) {
>> -			if (data != NULL)
>> -				ret = search_one_bucket_lf(h, keys[i],
>> -						sig[i], &data[i], cur_bkt);
>> -			else
>> -				ret = search_one_bucket_lf(h, keys[i],
>> -						sig[i], NULL, cur_bkt);
>> -			if (ret != -1) {
>> -				positions[i] = ret;
>> -				hits |= 1ULL << i;
>> -				break;
>> -			}
>> -		}
>> -	}
>> -
>> 	if (hit_mask != NULL)
>> 		*hit_mask = hits;
>> }
>> diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
>> index eacdaa8d4684..062cc2dd0296 100644
>> --- a/lib/librte_hash/rte_cuckoo_hash.h
>> +++ b/lib/librte_hash/rte_cuckoo_hash.h
>> @@ -129,6 +129,14 @@ struct lcore_cache {
>> 
>> /* Structure that stores key-value pair */
>> struct rte_hash_key {
>> +	/* Stores index of an empty ext bkt to be recycled on calling
>> +	 * rte_hash_del_xxx APIs. When lock free read-wrie concurrency is
> [Wang, Yipeng] typo
Will update it in the next version.
>> +	 * enabled, an empty ext bkt cannot be put into free list immediately
>> +	 * (as readers might be using it still). Hence freeing of the ext bkt
>> +	 * is piggy-backed to freeing of the key index.
>> +	 */
> [Wang, Yipeng] I am thinking if this breaks the "guarantee" provided by ext table, Since
> a whole bucket could not be reused if one key not freed. Is there any fundamental issue with
> a new API to recycle ext bucket or you just do not want to add a new API?
With the lock-free feature, ‘delete’ becomes a two-step process of ‘delete’ and ‘free’. In other
words, it can be viewed by applications as a 'prolonged delete’. I’m not sure how adding a new
API to recycle the ext bucket would help solve the issue.
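To make the two-step flow concrete, a minimal usage sketch from the application's point of view;
the quiescence helper is hypothetical and application-specific, while the two rte_hash calls are
the existing public API:

    #include <rte_hash.h>

    extern void app_wait_for_readers(void); /* hypothetical quiescence helper */

    /* sketch: two-step delete + deferred free with RW_CONCURRENCY_LF enabled */
    static void
    app_delete_entry(struct rte_hash *h, const void *key)
    {
        int32_t pos = rte_hash_del_key(h, key);  /* step 1: 'delete' */

        if (pos < 0)
            return;
        /* readers may still reference the deleted entry (and its parked
         * ext bkt); wait for them to quiesce before freeing
         */
        app_wait_for_readers();
        /* step 2: 'free'; with this patch the parked ext bucket index is
         * recycled here as well
         */
        rte_hash_free_key_with_position(h, pos);
    }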
>> +	uint32_t ext_bkt_to_free;
>> +
>> 	union {
>> 		uintptr_t idata;
>> 		void *pdata;
>> --
>> 2.17.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [dpdk-dev] [RFC 1/2] hash: add lock free support for extendable bucket
  2019-03-07 22:14     ` Dharmik Thakkar
@ 2019-03-15  0:31       ` Honnappa Nagarahalli
  2019-03-15  0:31         ` Honnappa Nagarahalli
  2019-03-25 20:06         ` Dharmik Thakkar
  0 siblings, 2 replies; 9+ messages in thread
From: Honnappa Nagarahalli @ 2019-03-15  0:31 UTC (permalink / raw)
  To: Dharmik Thakkar, Wang, Yipeng1
  Cc: Gobriel, Sameh, Richardson, Bruce, De Lara Guarch, Pablo,
	Mcnamara, John, Kovacevic, Marko, dev, Tai, Charlie, nd, nd

<snip>

> >> @@ -1072,10 +1071,23 @@ __rte_hash_add_key_with_hash(const struct
> rte_hash *h, const void *key,
> >> 	bkt_id = (uint32_t)((uintptr_t)ext_bkt_id) - 1;
> >> 	/* Use the first location of the new bucket */
> >> 	(h->buckets_ext[bkt_id]).sig_current[0] = short_sig;
> >> -	(h->buckets_ext[bkt_id]).key_idx[0] = new_idx;
> >> +	/* Key can be of arbitrary length, so it is
> >> +	 * not possible to store it atomically.
> >> +	 * Hence the new key element's memory stores
> >> +	 * (key as well as data) should be complete
> >> +	 * before it is referenced.
> >> +	 */
> >> +	__atomic_store_n(&(h->buckets_ext[bkt_id]).key_idx[0],
> >> +			 new_idx,
> >> +			 __ATOMIC_RELEASE);
> > [Wang, Yipeng] Since it has not been linked and later on the linking
> > is protected by release, do we really need atomic store here?
> Atomic store is used here to maintain the code consistency.
Agree the release order is not required. Removing it does not help much as it is only for the 1st element of a new bucket.
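A sketch of the relaxation being discussed here (the new bucket is private until the link below
publishes it, so only the link needs release semantics; taken from the quoted code, not the final
patch):

    /* bucket not yet reachable by readers: plain stores are sufficient */
    (h->buckets_ext[bkt_id]).sig_current[0] = short_sig;
    (h->buckets_ext[bkt_id]).key_idx[0] = new_idx;

    /* publication: the release orders all of the above before the bucket
     * becomes visible through the linked list
     */
    last = rte_hash_get_last_bkt(sec_bkt);
    __atomic_store_n(&last->next, &h->buckets_ext[bkt_id], __ATOMIC_RELEASE);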

> >
> >> 	/* Link the new bucket to sec bucket linked list */
> >> 	last = rte_hash_get_last_bkt(sec_bkt);
> >> -	last->next = &h->buckets_ext[bkt_id];
> >> +	/* New bucket's memory stores (key as well as data)
> >> +	 * should be complete before it is referenced
> >> +	 */
> >> +	__atomic_store_n(&last->next,
> >> +			 &h->buckets_ext[bkt_id],
> >> +			 __ATOMIC_RELEASE);
The corresponding load-acquire is missing in the 'FOR_EACH_BUCKET' macro
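If it were added, an acquire-based walk of the ext bucket chain could look roughly like the
following sketch (hypothetical variant; variable names follow the quoted lookup code):

    /* hypothetical acquire-based walk over an ext bucket chain */
    for (cur_bkt = __atomic_load_n(&secondary_bkt[i]->next, __ATOMIC_ACQUIRE);
         cur_bkt != NULL;
         cur_bkt = __atomic_load_n(&cur_bkt->next, __ATOMIC_ACQUIRE)) {
        /* probe cur_bkt for the key ... */
    }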

> >> 	__hash_rw_writer_unlock(h);
> >> 	return new_idx - 1;
> >>
> >> @@ -1366,7 +1378,8 @@ remove_entry(const struct rte_hash *h, struct
> >> rte_hash_bucket *bkt, unsigned i)
> >> * empty slot.
> >> */
> >> static inline void
> >> -__rte_hash_compact_ll(struct rte_hash_bucket *cur_bkt, int pos) {
> >> +__rte_hash_compact_ll(const struct rte_hash *h,
> >> +			struct rte_hash_bucket *cur_bkt, int pos) {
> >> 	int i;
> >> 	struct rte_hash_bucket *last_bkt;
> >>
> >> @@ -1377,10 +1390,27 @@ __rte_hash_compact_ll(struct
> rte_hash_bucket
> >> *cur_bkt, int pos) {
> >>
> >> 	for (i = RTE_HASH_BUCKET_ENTRIES - 1; i >= 0; i--) {
> >> 		if (last_bkt->key_idx[i] != EMPTY_SLOT) {
> >> -			cur_bkt->key_idx[pos] = last_bkt->key_idx[i];
> >> 			cur_bkt->sig_current[pos] = last_bkt->sig_current[i];
> >> +			__atomic_store_n(&cur_bkt->key_idx[pos],
> >> +					 last_bkt->key_idx[i],
> >> +					 __ATOMIC_RELEASE);
> >> +			if (h->readwrite_concur_lf_support) {
> >> +				/* Inform the readers that the table has
> changed
> >> +				 * Since there is one writer, load acquires on
> >> +				 * tbl_chng_cnt are not required.
> >> +				 */
> >> +				__atomic_store_n(h->tbl_chng_cnt,
> >> +					 *h->tbl_chng_cnt + 1,
> >> +					 __ATOMIC_RELEASE);
> >> +				/* The stores to sig_alt and sig_current should
> >> +				 * not move above the store to tbl_chng_cnt.
> >> +				 */
> >> +				__atomic_thread_fence(__ATOMIC_RELEASE);
> >> +			}
> >> 			last_bkt->sig_current[i] = NULL_SIGNATURE;
> >> -			last_bkt->key_idx[i] = EMPTY_SLOT;
> >> +			__atomic_store_n(&last_bkt->key_idx[i],
> >> +					 EMPTY_SLOT,
> >> +					 __ATOMIC_RELEASE);
> >> 			return;
> >> 		}
> >> 	}
> >> @@ -1449,7 +1479,7 @@ __rte_hash_del_key_with_hash(const struct
> rte_hash *h, const void *key,
> >> 	/* look for key in primary bucket */
> >> 	ret = search_and_remove(h, key, prim_bkt, short_sig, &pos);
> >> 	if (ret != -1) {
> >> -		__rte_hash_compact_ll(prim_bkt, pos);
> >> +		__rte_hash_compact_ll(h, prim_bkt, pos);
> >> 		last_bkt = prim_bkt->next;
> >> 		prev_bkt = prim_bkt;
> >> 		goto return_bkt;
> >> @@ -1461,7 +1491,7 @@ __rte_hash_del_key_with_hash(const struct
> rte_hash *h, const void *key,
> >> 	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
> >> 		ret = search_and_remove(h, key, cur_bkt, short_sig, &pos);
> >> 		if (ret != -1) {
> >> -			__rte_hash_compact_ll(cur_bkt, pos);
> >> +			__rte_hash_compact_ll(h, cur_bkt, pos);
> >> 			last_bkt = sec_bkt->next;
> >> 			prev_bkt = sec_bkt;
> >> 			goto return_bkt;
> >> @@ -1488,11 +1518,21 @@ __rte_hash_del_key_with_hash(const struct
> rte_hash *h, const void *key,
> >> 	}
> >> 	/* found empty bucket and recycle */
> >> 	if (i == RTE_HASH_BUCKET_ENTRIES) {
> >> -		prev_bkt->next = last_bkt->next = NULL;
> >> +		__atomic_store_n(&prev_bkt->next,
> >> +				 NULL,
> >> +				 __ATOMIC_RELEASE);
> >> 		uint32_t index = last_bkt - h->buckets_ext + 1;
> >> -		rte_ring_sp_enqueue(h->free_ext_bkts, (void
> *)(uintptr_t)index);
> >> -	}
> >> +		if (!h->no_free_on_del)
> >> +			rte_ring_sp_enqueue(h->free_ext_bkts, (void
> *)(uintptr_t)index);
> >> +		else {
> >> +			struct rte_hash_key *key_slot =
> >> +				(struct rte_hash_key *)(
> >> +				(char *)h->key_store +
> >> +				ret * h->key_entry_size);
> >> +			key_slot->ext_bkt_to_free = index;
> > [Wang, Yipeng] Is there chance that a key_slot may already have one
> > previous ext_bkt and now got overwritten, so that the previous one gone
> forever?
> No, it is not possible. Since, the index is being stored in a key_slot which is
> associated with a deleted key.
> >>
> >> +		}
> >> +	}
> >> 	__hash_rw_writer_unlock(h);
> >> 	return ret;
> >> }
> >> @@ -1567,6 +1607,14 @@ rte_hash_free_key_with_position(const struct
> rte_hash *h,
> >> 				(void *)((uintptr_t)position));
> >> 	}
> >>
> >> +	const struct rte_hash_key *key_slot = (const struct rte_hash_key *)(
> >> +						(const char *)h->key_store +
> >> +						position * h->key_entry_size);
> >> +	uint32_t index = key_slot->ext_bkt_to_free;
> >> +	if (!index)
> >> +		/* Recycle empty ext bkt to free list. */
> >> +		rte_ring_sp_enqueue(h->free_ext_bkts, (void
> *)(uintptr_t)index);
Suggest moving this to before freeing the key_index to avoid race conditions.
key_slot->ext_bkt_to_free needs to be set to 0 after freeing.
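In other words, something along these lines before the key index itself is returned to the free
list (a sketch of the suggestion, not the final patch):

    uint32_t index = key_slot->ext_bkt_to_free;

    if (index) {
        /* recycle the parked ext bucket before the key index can be reused */
        rte_ring_sp_enqueue(h->free_ext_bkts, (void *)(uintptr_t)index);
        key_slot->ext_bkt_to_free = 0;
    }
    /* ... then enqueue 'position' back onto the key free list / lcore cache */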

> >> +
> >> 	return 0;
> >> }
> >>
> >> @@ -1855,6 +1903,9 @@ __rte_hash_lookup_bulk_lf(const struct
> rte_hash *h, const void **keys,
> >> 		rte_prefetch0(secondary_bkt[i]);
> >> 	}
> >>
> >> +	for (i = 0; i < num_keys; i++)
> >> +		positions[i] = -ENOENT;
> > [Wang, Yipeng] So is this for performance reason?
> Resetting positions[] on each iteration of do…while() require ‘hits’ to be reset
> as well, which causes performance hit.
> >> +
> >> 	do {
> >> 		/* Load the table change counter before the lookup
> >> 		 * starts. Acquire semantics will make sure that @@ -1899,7
> +1950,6
> >> @@ __rte_hash_lookup_bulk_lf(const struct rte_hash *h, const void
> >> **keys,
> >>
> >> 		/* Compare keys, first hits in primary first */
> >> 		for (i = 0; i < num_keys; i++) {
> >> -			positions[i] = -ENOENT;
> >> 			while (prim_hitmask[i]) {
> >> 				uint32_t hit_index =
> >> 						__builtin_ctzl(prim_hitmask[i])
> @@ -1972,6 +2022,36 @@
> >> __rte_hash_lookup_bulk_lf(const struct rte_hash *h, const void **keys,
> >> 			continue;
> >> 		}
> >>
> >> +
> >> +		/* all found, do not need to go through ext bkt */
> >> +		if (hits == ((1ULL << num_keys) - 1)) {
> >> +			if (hit_mask != NULL)
> >> +				*hit_mask = hits;
> >
> > [Wang, Yipeng] I think you need to check the version counter before
> > return, and how about the fence?
> If all the keys are found, there is no need to check the counter.
> >> +			return;
> >> +		}
> >> +		/* need to check ext buckets for match */
> >> +		if (h->ext_table_support) {
> >> +			for (i = 0; i < num_keys; i++) {
> >> +				if ((hits & (1ULL << i)) != 0)
> >> +					continue;
> >> +				next_bkt = secondary_bkt[i]->next;
> >> +				FOR_EACH_BUCKET(cur_bkt, next_bkt) {
> >> +					if (data != NULL)
> >> +						ret = search_one_bucket_lf(h,
> >> +							keys[i], sig[i],
> >> +							&data[i], cur_bkt);
> >> +					else
> >> +						ret = search_one_bucket_lf(h,
> >> +								keys[i], sig[i],
> >> +								NULL,
> cur_bkt);
> >> +					if (ret != -1) {
> >> +						positions[i] = ret;
> >> +						hits |= 1ULL << i;
> >> +						break;
> >> +					}
> >> +				}
> >> +			}
> >> +		}
> >> 		/* The loads of sig_current in compare_signatures
> >> 		 * should not move below the load from tbl_chng_cnt.
> >> 		 */
> >> @@ -1988,34 +2068,6 @@ __rte_hash_lookup_bulk_lf(const struct
> rte_hash *h, const void **keys,
> >> 					__ATOMIC_ACQUIRE);
> >> 	} while (cnt_b != cnt_a);
> >>
> >> -	/* all found, do not need to go through ext bkt */
> >> -	if ((hits == ((1ULL << num_keys) - 1)) || !h->ext_table_support) {
> >> -		if (hit_mask != NULL)
> >> -			*hit_mask = hits;
> >> -		__hash_rw_reader_unlock(h);
> >> -		return;
> >> -	}
> >> -
> >> -	/* need to check ext buckets for match */
> >> -	for (i = 0; i < num_keys; i++) {
> >> -		if ((hits & (1ULL << i)) != 0)
> >> -			continue;
> >> -		next_bkt = secondary_bkt[i]->next;
> >> -		FOR_EACH_BUCKET(cur_bkt, next_bkt) {
> >> -			if (data != NULL)
> >> -				ret = search_one_bucket_lf(h, keys[i],
> >> -						sig[i], &data[i], cur_bkt);
> >> -			else
> >> -				ret = search_one_bucket_lf(h, keys[i],
> >> -						sig[i], NULL, cur_bkt);
> >> -			if (ret != -1) {
> >> -				positions[i] = ret;
> >> -				hits |= 1ULL << i;
> >> -				break;
> >> -			}
> >> -		}
> >> -	}
> >> -
> >> 	if (hit_mask != NULL)
> >> 		*hit_mask = hits;
> >> }
> >> diff --git a/lib/librte_hash/rte_cuckoo_hash.h
> >> b/lib/librte_hash/rte_cuckoo_hash.h
> >> index eacdaa8d4684..062cc2dd0296 100644
> >> --- a/lib/librte_hash/rte_cuckoo_hash.h
> >> +++ b/lib/librte_hash/rte_cuckoo_hash.h
> >> @@ -129,6 +129,14 @@ struct lcore_cache {
> >>
> >> /* Structure that stores key-value pair */ struct rte_hash_key {
> >> +	/* Stores index of an empty ext bkt to be recycled on calling
> >> +	 * rte_hash_del_xxx APIs. When lock free read-wrie concurrency is
> > [Wang, Yipeng] typo
> Will update it in the next version.
> >> +	 * enabled, an empty ext bkt cannot be put into free list immediately
> >> +	 * (as readers might be using it still). Hence freeing of the ext bkt
> >> +	 * is piggy-backed to freeing of the key index.
> >> +	 */
> > [Wang, Yipeng] I am thinking if this breaks the "guarantee" provided
> > by ext table, Since a whole bucket could not be reused if one key not
> > freed. Is there any fundamental issue with a new API to recycle ext bucket or
> you just do not want to add a new API?
> With lock-free feature, ‘delete’ becomes a two step process of ‘delete’ and
> ‘free’. In other words, it can be viewed by the applications as a 'prolonged
> delete’. I’m not sure how adding a new API to recycle ext bucket will help
> solving the issue.
> >> +	uint32_t ext_bkt_to_free;
> >> +
> >> 	union {
> >> 		uintptr_t idata;
> >> 		void *pdata;
> >> --
> >> 2.17.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [dpdk-dev] [RFC 1/2] hash: add lock free support for extendable bucket
  2019-03-15  0:31       ` Honnappa Nagarahalli
  2019-03-15  0:31         ` Honnappa Nagarahalli
@ 2019-03-25 20:06         ` Dharmik Thakkar
  2019-03-25 20:06           ` Dharmik Thakkar
  1 sibling, 1 reply; 9+ messages in thread
From: Dharmik Thakkar @ 2019-03-25 20:06 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: Wang, Yipeng1, Gobriel, Sameh, Richardson, Bruce, De Lara Guarch,
	Pablo, Mcnamara, John, Kovacevic, Marko, dev, Tai, Charlie, nd

Hi Honnappa,

Thank you for the review comments!

> On Mar 14, 2019, at 7:31 PM, Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> wrote:
> 
> <snip>
> 
>>>> @@ -1072,10 +1071,23 @@ __rte_hash_add_key_with_hash(const struct
>> rte_hash *h, const void *key,
>>>> 	bkt_id = (uint32_t)((uintptr_t)ext_bkt_id) - 1;
>>>> 	/* Use the first location of the new bucket */
>>>> 	(h->buckets_ext[bkt_id]).sig_current[0] = short_sig;
>>>> -	(h->buckets_ext[bkt_id]).key_idx[0] = new_idx;
>>>> +	/* Key can be of arbitrary length, so it is
>>>> +	 * not possible to store it atomically.
>>>> +	 * Hence the new key element's memory stores
>>>> +	 * (key as well as data) should be complete
>>>> +	 * before it is referenced.
>>>> +	 */
>>>> +	__atomic_store_n(&(h->buckets_ext[bkt_id]).key_idx[0],
>>>> +			 new_idx,
>>>> +			 __ATOMIC_RELEASE);
>>> [Wang, Yipeng] Since it has not been linked and later on the linking
>>> is protected by release, do we really need atomic store here?
>> Atomic store is used here to maintain the code consistency.
> Agree the release order is not required. Removing it does not help much as it is only for the 1st element of a new bucket.
> 
>>> 
>>>> 	/* Link the new bucket to sec bucket linked list */
>>>> 	last = rte_hash_get_last_bkt(sec_bkt);
>>>> -	last->next = &h->buckets_ext[bkt_id];
>>>> +	/* New bucket's memory stores (key as well as data)
>>>> +	 * should be complete before it is referenced
>>>> +	 */
>>>> +	__atomic_store_n(&last->next,
>>>> +			 &h->buckets_ext[bkt_id],
>>>> +			 __ATOMIC_RELEASE);
> The corresponding load-acquire is missing in the 'FOR_EACH_BUCKET' macro
It is not required. Also, this store-release can be removed, as there won’t be any data race conditions: key_idx and sig are stored atomically. I have updated it in the patch.
> 
>>>> 	__hash_rw_writer_unlock(h);
>>>> 	return new_idx - 1;
>>>> 
>>>> @@ -1366,7 +1378,8 @@ remove_entry(const struct rte_hash *h, struct
>>>> rte_hash_bucket *bkt, unsigned i)
>>>> * empty slot.
>>>> */
>>>> static inline void
>>>> -__rte_hash_compact_ll(struct rte_hash_bucket *cur_bkt, int pos) {
>>>> +__rte_hash_compact_ll(const struct rte_hash *h,
>>>> +			struct rte_hash_bucket *cur_bkt, int pos) {
>>>> 	int i;
>>>> 	struct rte_hash_bucket *last_bkt;
>>>> 
>>>> @@ -1377,10 +1390,27 @@ __rte_hash_compact_ll(struct
>> rte_hash_bucket
>>>> *cur_bkt, int pos) {
>>>> 
>>>> 	for (i = RTE_HASH_BUCKET_ENTRIES - 1; i >= 0; i--) {
>>>> 		if (last_bkt->key_idx[i] != EMPTY_SLOT) {
>>>> -			cur_bkt->key_idx[pos] = last_bkt->key_idx[i];
>>>> 			cur_bkt->sig_current[pos] = last_bkt->sig_current[i];
>>>> +			__atomic_store_n(&cur_bkt->key_idx[pos],
>>>> +					 last_bkt->key_idx[i],
>>>> +					 __ATOMIC_RELEASE);
>>>> +			if (h->readwrite_concur_lf_support) {
>>>> +				/* Inform the readers that the table has
>> changed
>>>> +				 * Since there is one writer, load acquires on
>>>> +				 * tbl_chng_cnt are not required.
>>>> +				 */
>>>> +				__atomic_store_n(h->tbl_chng_cnt,
>>>> +					 *h->tbl_chng_cnt + 1,
>>>> +					 __ATOMIC_RELEASE);
>>>> +				/* The stores to sig_alt and sig_current should
>>>> +				 * not move above the store to tbl_chng_cnt.
>>>> +				 */
>>>> +				__atomic_thread_fence(__ATOMIC_RELEASE);
>>>> +			}
>>>> 			last_bkt->sig_current[i] = NULL_SIGNATURE;
>>>> -			last_bkt->key_idx[i] = EMPTY_SLOT;
>>>> +			__atomic_store_n(&last_bkt->key_idx[i],
>>>> +					 EMPTY_SLOT,
>>>> +					 __ATOMIC_RELEASE);
>>>> 			return;
>>>> 		}
>>>> 	}
>>>> @@ -1449,7 +1479,7 @@ __rte_hash_del_key_with_hash(const struct
>> rte_hash *h, const void *key,
>>>> 	/* look for key in primary bucket */
>>>> 	ret = search_and_remove(h, key, prim_bkt, short_sig, &pos);
>>>> 	if (ret != -1) {
>>>> -		__rte_hash_compact_ll(prim_bkt, pos);
>>>> +		__rte_hash_compact_ll(h, prim_bkt, pos);
>>>> 		last_bkt = prim_bkt->next;
>>>> 		prev_bkt = prim_bkt;
>>>> 		goto return_bkt;
>>>> @@ -1461,7 +1491,7 @@ __rte_hash_del_key_with_hash(const struct
>> rte_hash *h, const void *key,
>>>> 	FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
>>>> 		ret = search_and_remove(h, key, cur_bkt, short_sig, &pos);
>>>> 		if (ret != -1) {
>>>> -			__rte_hash_compact_ll(cur_bkt, pos);
>>>> +			__rte_hash_compact_ll(h, cur_bkt, pos);
>>>> 			last_bkt = sec_bkt->next;
>>>> 			prev_bkt = sec_bkt;
>>>> 			goto return_bkt;
>>>> @@ -1488,11 +1518,21 @@ __rte_hash_del_key_with_hash(const struct
>> rte_hash *h, const void *key,
>>>> 	}
>>>> 	/* found empty bucket and recycle */
>>>> 	if (i == RTE_HASH_BUCKET_ENTRIES) {
>>>> -		prev_bkt->next = last_bkt->next = NULL;
>>>> +		__atomic_store_n(&prev_bkt->next,
>>>> +				 NULL,
>>>> +				 __ATOMIC_RELEASE);
>>>> 		uint32_t index = last_bkt - h->buckets_ext + 1;
>>>> -		rte_ring_sp_enqueue(h->free_ext_bkts, (void
>> *)(uintptr_t)index);
>>>> -	}
>>>> +		if (!h->no_free_on_del)
>>>> +			rte_ring_sp_enqueue(h->free_ext_bkts, (void
>> *)(uintptr_t)index);
>>>> +		else {
>>>> +			struct rte_hash_key *key_slot =
>>>> +				(struct rte_hash_key *)(
>>>> +				(char *)h->key_store +
>>>> +				ret * h->key_entry_size);
>>>> +			key_slot->ext_bkt_to_free = index;
>>> [Wang, Yipeng] Is there chance that a key_slot may already have one
>>> previous ext_bkt and now got overwritten, so that the previous one gone
>> forever?
>> No, it is not possible. Since, the index is being stored in a key_slot which is
>> associated with a deleted key.
>>>> 
>>>> +		}
>>>> +	}
>>>> 	__hash_rw_writer_unlock(h);
>>>> 	return ret;
>>>> }
>>>> @@ -1567,6 +1607,14 @@ rte_hash_free_key_with_position(const struct
>> rte_hash *h,
>>>> 				(void *)((uintptr_t)position));
>>>> 	}
>>>> 
>>>> +	const struct rte_hash_key *key_slot = (const struct rte_hash_key *)(
>>>> +						(const char *)h->key_store +
>>>> +						position * h->key_entry_size);
>>>> +	uint32_t index = key_slot->ext_bkt_to_free;
>>>> +	if (!index)
>>>> +		/* Recycle empty ext bkt to free list. */
>>>> +		rte_ring_sp_enqueue(h->free_ext_bkts, (void
>> *)(uintptr_t)index);
> Suggest moving this to before freeing the key_index to avoid race conditions.
> key_slot->ext_bkt_to_free needs to be set to 0 after freeing.
Correct. Updated in the patch.
> 
>>>> +
>>>> 	return 0;
>>>> }
>>>> 
>>>> @@ -1855,6 +1903,9 @@ __rte_hash_lookup_bulk_lf(const struct
>> rte_hash *h, const void **keys,
>>>> 		rte_prefetch0(secondary_bkt[i]);
>>>> 	}
>>>> 
>>>> +	for (i = 0; i < num_keys; i++)
>>>> +		positions[i] = -ENOENT;
>>> [Wang, Yipeng] So is this for performance reason?
>> Resetting positions[] on each iteration of do…while() require ‘hits’ to be reset
>> as well, which causes performance hit.
>>>> +
>>>> 	do {
>>>> 		/* Load the table change counter before the lookup
>>>> 		 * starts. Acquire semantics will make sure that @@ -1899,7
>> +1950,6
>>>> @@ __rte_hash_lookup_bulk_lf(const struct rte_hash *h, const void
>>>> **keys,
>>>> 
>>>> 		/* Compare keys, first hits in primary first */
>>>> 		for (i = 0; i < num_keys; i++) {
>>>> -			positions[i] = -ENOENT;
>>>> 			while (prim_hitmask[i]) {
>>>> 				uint32_t hit_index =
>>>> 						__builtin_ctzl(prim_hitmask[i])
>> @@ -1972,6 +2022,36 @@
>>>> __rte_hash_lookup_bulk_lf(const struct rte_hash *h, const void **keys,
>>>> 			continue;
>>>> 		}
>>>> 
>>>> +
>>>> +		/* all found, do not need to go through ext bkt */
>>>> +		if (hits == ((1ULL << num_keys) - 1)) {
>>>> +			if (hit_mask != NULL)
>>>> +				*hit_mask = hits;
>>> 
>>> [Wang, Yipeng] I think you need to check the version counter before
>>> return, and how about the fence?
>> If all the keys are found, there is no need to check the counter.
>>>> +			return;
>>>> +		}
>>>> +		/* need to check ext buckets for match */
>>>> +		if (h->ext_table_support) {
>>>> +			for (i = 0; i < num_keys; i++) {
>>>> +				if ((hits & (1ULL << i)) != 0)
>>>> +					continue;
>>>> +				next_bkt = secondary_bkt[i]->next;
>>>> +				FOR_EACH_BUCKET(cur_bkt, next_bkt) {
>>>> +					if (data != NULL)
>>>> +						ret = search_one_bucket_lf(h,
>>>> +							keys[i], sig[i],
>>>> +							&data[i], cur_bkt);
>>>> +					else
>>>> +						ret = search_one_bucket_lf(h,
>>>> +								keys[i], sig[i],
>>>> +								NULL,
>> cur_bkt);
>>>> +					if (ret != -1) {
>>>> +						positions[i] = ret;
>>>> +						hits |= 1ULL << i;
>>>> +						break;
>>>> +					}
>>>> +				}
>>>> +			}
>>>> +		}
>>>> 		/* The loads of sig_current in compare_signatures
>>>> 		 * should not move below the load from tbl_chng_cnt.
>>>> 		 */
>>>> @@ -1988,34 +2068,6 @@ __rte_hash_lookup_bulk_lf(const struct
>> rte_hash *h, const void **keys,
>>>> 					__ATOMIC_ACQUIRE);
>>>> 	} while (cnt_b != cnt_a);
>>>> 
>>>> -	/* all found, do not need to go through ext bkt */
>>>> -	if ((hits == ((1ULL << num_keys) - 1)) || !h->ext_table_support) {
>>>> -		if (hit_mask != NULL)
>>>> -			*hit_mask = hits;
>>>> -		__hash_rw_reader_unlock(h);
>>>> -		return;
>>>> -	}
>>>> -
>>>> -	/* need to check ext buckets for match */
>>>> -	for (i = 0; i < num_keys; i++) {
>>>> -		if ((hits & (1ULL << i)) != 0)
>>>> -			continue;
>>>> -		next_bkt = secondary_bkt[i]->next;
>>>> -		FOR_EACH_BUCKET(cur_bkt, next_bkt) {
>>>> -			if (data != NULL)
>>>> -				ret = search_one_bucket_lf(h, keys[i],
>>>> -						sig[i], &data[i], cur_bkt);
>>>> -			else
>>>> -				ret = search_one_bucket_lf(h, keys[i],
>>>> -						sig[i], NULL, cur_bkt);
>>>> -			if (ret != -1) {
>>>> -				positions[i] = ret;
>>>> -				hits |= 1ULL << i;
>>>> -				break;
>>>> -			}
>>>> -		}
>>>> -	}
>>>> -
>>>> 	if (hit_mask != NULL)
>>>> 		*hit_mask = hits;
>>>> }
>>>> diff --git a/lib/librte_hash/rte_cuckoo_hash.h
>>>> b/lib/librte_hash/rte_cuckoo_hash.h
>>>> index eacdaa8d4684..062cc2dd0296 100644
>>>> --- a/lib/librte_hash/rte_cuckoo_hash.h
>>>> +++ b/lib/librte_hash/rte_cuckoo_hash.h
>>>> @@ -129,6 +129,14 @@ struct lcore_cache {
>>>> 
>>>> /* Structure that stores key-value pair */ struct rte_hash_key {
>>>> +	/* Stores index of an empty ext bkt to be recycled on calling
>>>> +	 * rte_hash_del_xxx APIs. When lock free read-wrie concurrency is
>>> [Wang, Yipeng] typo
>> Will update it in the next version.
>>>> +	 * enabled, an empty ext bkt cannot be put into free list immediately
>>>> +	 * (as readers might be using it still). Hence freeing of the ext bkt
>>>> +	 * is piggy-backed to freeing of the key index.
>>>> +	 */
>>> [Wang, Yipeng] I am thinking if this breaks the "guarantee" provided
>>> by ext table, Since a whole bucket could not be reused if one key not
>>> freed. Is there any fundamental issue with a new API to recycle ext bucket or
>> you just do not want to add a new API?
>> With lock-free feature, ‘delete’ becomes a two step process of ‘delete’ and
>> ‘free’. In other words, it can be viewed by the applications as a 'prolonged
>> delete’. I’m not sure how adding a new API to recycle ext bucket will help
>> solving the issue.
>>>> +	uint32_t ext_bkt_to_free;
>>>> +
>>>> 	union {
>>>> 		uintptr_t idata;
>>>> 		void *pdata;
>>>> --
>>>> 2.17.1
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2019-03-25 20:06 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-02  0:23 [dpdk-dev] [RFC 0/2] hash: add lock free support for ext bkt Dharmik Thakkar
2019-03-02  0:23 ` [dpdk-dev] [RFC 1/2] hash: add lock free support for extendable bucket Dharmik Thakkar
2019-03-07 17:49   ` Wang, Yipeng1
2019-03-07 22:14     ` Dharmik Thakkar
2019-03-15  0:31       ` Honnappa Nagarahalli
2019-03-25 20:06         ` Dharmik Thakkar
2019-03-02  0:23 ` [dpdk-dev] [RFC 2/2] test/hash: lock-free rw concurrency test ext bkt Dharmik Thakkar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).