[dpdk-stable] [PATCH 1/3] lib/hash: use ordered loads only if signature matches

patches for DPDK stable branches
 help / color / mirror / Atom feed

* [dpdk-stable] [PATCH 1/3] lib/hash: use ordered loads only if signature matches
       [not found] <20190625211520.43181-1-honnappa.nagarahalli@arm.com>
@ 2019-06-25 21:15 ` Honnappa Nagarahalli
  2019-06-28 18:04   ` Wang, Yipeng1
  2019-06-25 21:15 ` [dpdk-stable] [PATCH 2/3] lib/hash: load pData after full key compare Honnappa Nagarahalli
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Honnappa Nagarahalli @ 2019-06-25 21:15 UTC (permalink / raw)
  To: yipeng1.wang, sameh.gobriel, bruce.richardson,
	pablo.de.lara.guarch, honnappa.nagarahalli
  Cc: gavin.hu, ruifeng.wang, dev, nd, stable

Relaxed signature comparison is done first. Further ordered loads
are done only if the signature matches. Any false positives are
caught by the full key comparison.

Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")
Cc: stable@dpdk.org

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_hash/rte_cuckoo_hash.c | 35 ++++++++++++++++++-------------
 1 file changed, 21 insertions(+), 14 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index 953928f27..f37f6957d 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -1188,22 +1188,29 @@ search_one_bucket_lf(const struct rte_hash *h, const void *key, uint16_t sig,
 	struct rte_hash_key *k, *keys = h->key_store;
 
 	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
-		key_idx = __atomic_load_n(&bkt->key_idx[i],
+		/* Signature comparison is done before the acquire-load
+		 * of the key index to achieve better performance.
+		 * Any false positives will be caught in full comparison
+		 * of the key.
+		 */
+		if (bkt->sig_current[i] == sig) {
+			key_idx = __atomic_load_n(&bkt->key_idx[i],
 					  __ATOMIC_ACQUIRE);
-		if (bkt->sig_current[i] == sig && key_idx != EMPTY_SLOT) {
-			k = (struct rte_hash_key *) ((char *)keys +
-					key_idx * h->key_entry_size);
-			pdata = __atomic_load_n(&k->pdata,
-					__ATOMIC_ACQUIRE);
+			if (key_idx != EMPTY_SLOT) {
+				k = (struct rte_hash_key *) ((char *)keys +
+						key_idx * h->key_entry_size);
+				pdata = __atomic_load_n(&k->pdata,
+						__ATOMIC_ACQUIRE);
 
-			if (rte_hash_cmp_eq(key, k->key, h) == 0) {
-				if (data != NULL)
-					*data = pdata;
-				/*
-				 * Return index where key is stored,
-				 * subtracting the first dummy index
-				 */
-				return key_idx - 1;
+				if (rte_hash_cmp_eq(key, k->key, h) == 0) {
+					if (data != NULL)
+						*data = pdata;
+					/*
+					 * Return index where key is stored,
+					 * subtracting the first dummy index
+					 */
+					return key_idx - 1;
+				}
 			}
 		}
 	}
-- 
2.17.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [dpdk-stable] [PATCH 2/3] lib/hash: load pData after full key compare
       [not found] <20190625211520.43181-1-honnappa.nagarahalli@arm.com>
  2019-06-25 21:15 ` [dpdk-stable] [PATCH 1/3] lib/hash: use ordered loads only if signature matches Honnappa Nagarahalli
@ 2019-06-25 21:15 ` Honnappa Nagarahalli
  2019-06-28 18:40   ` Wang, Yipeng1
  2019-06-25 21:15 ` [dpdk-stable] [PATCH 3/3] lib/hash: adjust tbl_chng_cnt position Honnappa Nagarahalli
       [not found] ` <20190702211634.37940-1-honnappa.nagarahalli@arm.com>
  3 siblings, 1 reply; 14+ messages in thread
From: Honnappa Nagarahalli @ 2019-06-25 21:15 UTC (permalink / raw)
  To: yipeng1.wang, sameh.gobriel, bruce.richardson,
	pablo.de.lara.guarch, honnappa.nagarahalli
  Cc: gavin.hu, ruifeng.wang, dev, nd, stable

When a hash entry is added, there are 2 sets of stores.

1) The application writes its data to memory (whose address
is provided in rte_hash_add_key_with_hash_data API (or NULL))
2) The rte_hash library writes to its own internal data structures;
key store entry and the hash table.

The only ordering requirement between these 2 is that - the store
to the application data must complete before the store to key_index.
There are no ordering requirements between the stores to the
key/signature and store to application data. The synchronization
point for application data can be any point between the 'store to
application data' and 'store to the key_index'. So, pData should not
be a guard variable for the data in hash table. It should be a guard
variable only for the application data written to the memory location
pointed by pData. Hence, pData can be loaded after full key comparison.

Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")
Cc: stable@dpdk.org

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_hash/rte_cuckoo_hash.c | 67 +++++++++++++++----------------
 1 file changed, 32 insertions(+), 35 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index f37f6957d..077328fed 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -649,9 +649,11 @@ search_and_update(const struct rte_hash *h, void *data, const void *key,
 			k = (struct rte_hash_key *) ((char *)keys +
 					bkt->key_idx[i] * h->key_entry_size);
 			if (rte_hash_cmp_eq(key, k->key, h) == 0) {
-				/* 'pdata' acts as the synchronization point
-				 * when an existing hash entry is updated.
-				 * Key is not updated in this case.
+				/* The store to application data at *data
+				 * should not leak after the store to pdata
+				 * in the key store. i.e. pdata is the guard
+				 * variable. Release the application data
+				 * to the readers.
 				 */
 				__atomic_store_n(&k->pdata,
 					data,
@@ -711,11 +713,10 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 		/* Check if slot is available */
 		if (likely(prim_bkt->key_idx[i] == EMPTY_SLOT)) {
 			prim_bkt->sig_current[i] = sig;
-			/* Key can be of arbitrary length, so it is
-			 * not possible to store it atomically.
-			 * Hence the new key element's memory stores
-			 * (key as well as data) should be complete
-			 * before it is referenced.
+			/* Store to signature and key should not
+			 * leak after the store to key_idx. i.e.
+			 * key_idx is the guard variable for signature
+			 * and key.
 			 */
 			__atomic_store_n(&prim_bkt->key_idx[i],
 					 new_idx,
@@ -990,17 +991,15 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	new_k = RTE_PTR_ADD(keys, (uintptr_t)slot_id * h->key_entry_size);
 	new_idx = (uint32_t)((uintptr_t) slot_id);
-	/* Copy key */
-	memcpy(new_k->key, key, h->key_len);
-	/* Key can be of arbitrary length, so it is not possible to store
-	 * it atomically. Hence the new key element's memory stores
-	 * (key as well as data) should be complete before it is referenced.
-	 * 'pdata' acts as the synchronization point when an existing hash
-	 * entry is updated.
+	/* The store to application data (by the application) at *data should
+	 * not leak after the store of pdata in the key store. i.e. pdata is
+	 * the guard variable. Release the application data to the readers.
 	 */
 	__atomic_store_n(&new_k->pdata,
 		data,
 		__ATOMIC_RELEASE);
+	/* Copy key */
+	memcpy(new_k->key, key, h->key_len);
 
 	/* Find an empty slot and insert */
 	ret = rte_hash_cuckoo_insert_mw(h, prim_bkt, sec_bkt, key, data,
@@ -1064,8 +1063,10 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 			/* Check if slot is available */
 			if (likely(cur_bkt->key_idx[i] == EMPTY_SLOT)) {
 				cur_bkt->sig_current[i] = short_sig;
-				/* Store to signature should not leak after
-				 * the store to key_idx
+				/* Store to signature and key should not
+				 * leak after the store to key_idx. i.e.
+				 * key_idx is the guard variable for signature
+				 * and key.
 				 */
 				__atomic_store_n(&cur_bkt->key_idx[i],
 						 new_idx,
@@ -1087,8 +1088,9 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	bkt_id = (uint32_t)((uintptr_t)ext_bkt_id) - 1;
 	/* Use the first location of the new bucket */
 	(h->buckets_ext[bkt_id]).sig_current[0] = short_sig;
-	/* Store to signature should not leak after
-	 * the store to key_idx
+	/* Store to signature and key should not leak after
+	 * the store to key_idx. i.e. key_idx is the guard variable
+	 * for signature and key.
 	 */
 	__atomic_store_n(&(h->buckets_ext[bkt_id]).key_idx[0],
 			 new_idx,
@@ -1184,7 +1186,6 @@ search_one_bucket_lf(const struct rte_hash *h, const void *key, uint16_t sig,
 {
 	int i;
 	uint32_t key_idx;
-	void *pdata;
 	struct rte_hash_key *k, *keys = h->key_store;
 
 	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
@@ -1199,12 +1200,13 @@ search_one_bucket_lf(const struct rte_hash *h, const void *key, uint16_t sig,
 			if (key_idx != EMPTY_SLOT) {
 				k = (struct rte_hash_key *) ((char *)keys +
 						key_idx * h->key_entry_size);
-				pdata = __atomic_load_n(&k->pdata,
-						__ATOMIC_ACQUIRE);
 
 				if (rte_hash_cmp_eq(key, k->key, h) == 0) {
-					if (data != NULL)
-						*data = pdata;
+					if (data != NULL) {
+						*data = __atomic_load_n(
+							&k->pdata,
+							__ATOMIC_ACQUIRE);
+					}
 					/*
 					 * Return index where key is stored,
 					 * subtracting the first dummy index
@@ -1902,7 +1904,6 @@ __rte_hash_lookup_bulk_lf(const struct rte_hash *h, const void **keys,
 	uint32_t prim_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
 	uint32_t sec_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
 	struct rte_hash_bucket *cur_bkt, *next_bkt;
-	void *pdata[RTE_HASH_LOOKUP_BULK_MAX];
 	uint32_t cnt_b, cnt_a;
 
 	/* Prefetch first keys */
@@ -2004,10 +2005,6 @@ __rte_hash_lookup_bulk_lf(const struct rte_hash *h, const void **keys,
 					(const char *)h->key_store +
 					key_idx * h->key_entry_size);
 
-				if (key_idx != EMPTY_SLOT)
-					pdata[i] = __atomic_load_n(
-							&key_slot->pdata,
-							__ATOMIC_ACQUIRE);
 				/*
 				 * If key index is 0, do not compare key,
 				 * as it is checking the dummy slot
@@ -2016,7 +2013,9 @@ __rte_hash_lookup_bulk_lf(const struct rte_hash *h, const void **keys,
 					!rte_hash_cmp_eq(
 						key_slot->key, keys[i], h)) {
 					if (data != NULL)
-						data[i] = pdata[i];
+						data[i] = __atomic_load_n(
+							&key_slot->pdata,
+							__ATOMIC_ACQUIRE);
 
 					hits |= 1ULL << i;
 					positions[i] = key_idx - 1;
@@ -2038,10 +2037,6 @@ __rte_hash_lookup_bulk_lf(const struct rte_hash *h, const void **keys,
 					(const char *)h->key_store +
 					key_idx * h->key_entry_size);
 
-				if (key_idx != EMPTY_SLOT)
-					pdata[i] = __atomic_load_n(
-							&key_slot->pdata,
-							__ATOMIC_ACQUIRE);
 				/*
 				 * If key index is 0, do not compare key,
 				 * as it is checking the dummy slot
@@ -2051,7 +2046,9 @@ __rte_hash_lookup_bulk_lf(const struct rte_hash *h, const void **keys,
 					!rte_hash_cmp_eq(
 						key_slot->key, keys[i], h)) {
 					if (data != NULL)
-						data[i] = pdata[i];
+						data[i] = __atomic_load_n(
+							&key_slot->pdata,
+							__ATOMIC_ACQUIRE);
 
 					hits |= 1ULL << i;
 					positions[i] = key_idx - 1;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [dpdk-stable] [PATCH 3/3] lib/hash: adjust tbl_chng_cnt position
       [not found] <20190625211520.43181-1-honnappa.nagarahalli@arm.com>
  2019-06-25 21:15 ` [dpdk-stable] [PATCH 1/3] lib/hash: use ordered loads only if signature matches Honnappa Nagarahalli
  2019-06-25 21:15 ` [dpdk-stable] [PATCH 2/3] lib/hash: load pData after full key compare Honnappa Nagarahalli
@ 2019-06-25 21:15 ` Honnappa Nagarahalli
  2019-06-28 18:50   ` Wang, Yipeng1
       [not found] ` <20190702211634.37940-1-honnappa.nagarahalli@arm.com>
  3 siblings, 1 reply; 14+ messages in thread
From: Honnappa Nagarahalli @ 2019-06-25 21:15 UTC (permalink / raw)
  To: yipeng1.wang, sameh.gobriel, bruce.richardson,
	pablo.de.lara.guarch, honnappa.nagarahalli
  Cc: gavin.hu, ruifeng.wang, dev, nd, stable

tbl_chng_cnt is one of the first elements of the structure used in
the lookup. Move it to the beginning of the cache line to gain
performance.

Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")
Cc: stable@dpdk.org

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_hash/rte_cuckoo_hash.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
index fb19bb27d..af6451b5c 100644
--- a/lib/librte_hash/rte_cuckoo_hash.h
+++ b/lib/librte_hash/rte_cuckoo_hash.h
@@ -170,7 +170,9 @@ struct rte_hash {
 
 	/* Fields used in lookup */
 
-	uint32_t key_len __rte_cache_aligned;
+	uint32_t *tbl_chng_cnt __rte_cache_aligned;
+	/**< Indicates if the hash table changed from last read. */
+	uint32_t key_len;
 	/**< Length of hash key. */
 	uint8_t hw_trans_mem_support;
 	/**< If hardware transactional memory is used. */
@@ -218,8 +220,6 @@ struct rte_hash {
 	 * is piggy-backed to freeing of the key index.
 	 */
 	uint32_t *ext_bkt_to_free;
-	uint32_t *tbl_chng_cnt;
-	/**< Indicates if the hash table changed from last read. */
 } __rte_cache_aligned;
 
 struct queue_node {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [dpdk-stable] [PATCH 1/3] lib/hash: use ordered loads only if signature matches
  2019-06-25 21:15 ` [dpdk-stable] [PATCH 1/3] lib/hash: use ordered loads only if signature matches Honnappa Nagarahalli
@ 2019-06-28 18:04   ` Wang, Yipeng1
  0 siblings, 0 replies; 14+ messages in thread
From: Wang, Yipeng1 @ 2019-06-28 18:04 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Gobriel, Sameh, Richardson, Bruce,
	De Lara Guarch, Pablo
  Cc: gavin.hu, ruifeng.wang, dev, nd, stable

>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:honnappa.nagarahalli@arm.com]
>Sent: Tuesday, June 25, 2019 2:15 PM
>To: Wang, Yipeng1 <yipeng1.wang@intel.com>; Gobriel, Sameh <sameh.gobriel@intel.com>; Richardson, Bruce
><bruce.richardson@intel.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; honnappa.nagarahalli@arm.com
>Cc: gavin.hu@arm.com; ruifeng.wang@arm.com; dev@dpdk.org; nd@arm.com; stable@dpdk.org
>Subject: [PATCH 1/3] lib/hash: use ordered loads only if signature matches
>
>Relaxed signature comparison is done first. Further ordered loads
>are done only if the signature matches. Any false positives are
>caught by the full key comparison.
[Wang, Yipeng] add: This commit improves lookup performance.
>
>Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")
>Cc: stable@dpdk.org
>
>Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>Reviewed-by: Gavin Hu <gavin.hu@arm.com>
>Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
>---
> lib/librte_hash/rte_cuckoo_hash.c | 35 ++++++++++++++++++-------------
> 1 file changed, 21 insertions(+), 14 deletions(-)
>
>diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
>index 953928f27..f37f6957d 100644
>--- a/lib/librte_hash/rte_cuckoo_hash.c
>+++ b/lib/librte_hash/rte_cuckoo_hash.c
>@@ -1188,22 +1188,29 @@ search_one_bucket_lf(const struct rte_hash *h, const void *key, uint16_t sig,
> 	struct rte_hash_key *k, *keys = h->key_store;
>
> 	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
>-		key_idx = __atomic_load_n(&bkt->key_idx[i],
>+		/* Signature comparison is done before the acquire-load
>+		 * of the key index to achieve better performance.
>+		 * Any false positives will be caught in full comparison
>+		 * of the key.
>+		 */
[Wang, Yipeng] 
A bit more explanation would be helpful for future understanding of the code:
"Signature comparison is done ... performance.
Although this could accidentally cause the reader to read an old signature while the key_idx
is updated to a new key's,  any false positive will be .... of the key."

Otherwise:
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [dpdk-stable] [PATCH 2/3] lib/hash: load pData after full key compare
  2019-06-25 21:15 ` [dpdk-stable] [PATCH 2/3] lib/hash: load pData after full key compare Honnappa Nagarahalli
@ 2019-06-28 18:40   ` Wang, Yipeng1
  2019-07-02  4:35     ` Honnappa Nagarahalli
  0 siblings, 1 reply; 14+ messages in thread
From: Wang, Yipeng1 @ 2019-06-28 18:40 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Gobriel, Sameh, Richardson, Bruce,
	De Lara Guarch, Pablo
  Cc: gavin.hu, ruifeng.wang, dev, nd, stable

>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:honnappa.nagarahalli@arm.com]
>Sent: Tuesday, June 25, 2019 2:15 PM
>To: Wang, Yipeng1 <yipeng1.wang@intel.com>; Gobriel, Sameh <sameh.gobriel@intel.com>; Richardson, Bruce
><bruce.richardson@intel.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; honnappa.nagarahalli@arm.com
>Cc: gavin.hu@arm.com; ruifeng.wang@arm.com; dev@dpdk.org; nd@arm.com; stable@dpdk.org
>Subject: [PATCH 2/3] lib/hash: load pData after full key compare
>
>When a hash entry is added, there are 2 sets of stores.
>
>1) The application writes its data to memory (whose address
>is provided in rte_hash_add_key_with_hash_data API (or NULL))
>2) The rte_hash library writes to its own internal data structures;
>key store entry and the hash table.
>
>The only ordering requirement between these 2 is that - the store
>to the application data must complete before the store to key_index.
>There are no ordering requirements between the stores to the
>key/signature and store to application data. The synchronization
>point for application data can be any point between the 'store to
>application data' and 'store to the key_index'. So, pData should not
>be a guard variable for the data in hash table. It should be a guard
>variable only for the application data written to the memory location
>pointed by pData. Hence, pData can be loaded after full key comparison.
>
>Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")
>Cc: stable@dpdk.org
>
>Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>Reviewed-by: Gavin Hu <gavin.hu@arm.com>
>Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
>---
> lib/librte_hash/rte_cuckoo_hash.c | 67 +++++++++++++++----------------
> 1 file changed, 32 insertions(+), 35 deletions(-)
>
>diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
>index f37f6957d..077328fed 100644
>--- a/lib/librte_hash/rte_cuckoo_hash.c
>+++ b/lib/librte_hash/rte_cuckoo_hash.c
>@@ -649,9 +649,11 @@ search_and_update(const struct rte_hash *h, void *data, const void *key,
> 			k = (struct rte_hash_key *) ((char *)keys +
> 					bkt->key_idx[i] * h->key_entry_size);
> 			if (rte_hash_cmp_eq(key, k->key, h) == 0) {
>-				/* 'pdata' acts as the synchronization point
>-				 * when an existing hash entry is updated.
>-				 * Key is not updated in this case.
>+				/* The store to application data at *data
>+				 * should not leak after the store to pdata
>+				 * in the key store. i.e. pdata is the guard
>+				 * variable. Release the application data
>+				 * to the readers.
> 				 */
> 				__atomic_store_n(&k->pdata,
> 					data,
>@@ -711,11 +713,10 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
> 		/* Check if slot is available */
> 		if (likely(prim_bkt->key_idx[i] == EMPTY_SLOT)) {
> 			prim_bkt->sig_current[i] = sig;
>-			/* Key can be of arbitrary length, so it is
>-			 * not possible to store it atomically.
>-			 * Hence the new key element's memory stores
>-			 * (key as well as data) should be complete
>-			 * before it is referenced.
>+			/* Store to signature and key should not
>+			 * leak after the store to key_idx. i.e.
>+			 * key_idx is the guard variable for signature
>+			 * and key.
> 			 */
> 			__atomic_store_n(&prim_bkt->key_idx[i],
> 					 new_idx,
>@@ -990,17 +991,15 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
>
> 	new_k = RTE_PTR_ADD(keys, (uintptr_t)slot_id * h->key_entry_size);
> 	new_idx = (uint32_t)((uintptr_t) slot_id);
>-	/* Copy key */
>-	memcpy(new_k->key, key, h->key_len);
>-	/* Key can be of arbitrary length, so it is not possible to store
>-	 * it atomically. Hence the new key element's memory stores
>-	 * (key as well as data) should be complete before it is referenced.
>-	 * 'pdata' acts as the synchronization point when an existing hash
>-	 * entry is updated.
>+	/* The store to application data (by the application) at *data should
>+	 * not leak after the store of pdata in the key store. i.e. pdata is
>+	 * the guard variable. Release the application data to the readers.
> 	 */
> 	__atomic_store_n(&new_k->pdata,
> 		data,
> 		__ATOMIC_RELEASE);
[Wang, Yipeng] Actually do we need to guard pdata for the newly inserted key?  I thought the guard of key_idx already can make sure
The order for the application to read data. 
>+	/* Copy key */
>+	memcpy(new_k->key, key, h->key_len);
[Wang, Yipeng] You don't need to do the order change just to show the point of unnecessary ordering I think.
I am afraid it may cause further confusion for future people who read this change, especially it is not in the commit
Message (and it is a bug fix). 
>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [dpdk-stable] [PATCH 3/3] lib/hash: adjust tbl_chng_cnt position
  2019-06-25 21:15 ` [dpdk-stable] [PATCH 3/3] lib/hash: adjust tbl_chng_cnt position Honnappa Nagarahalli
@ 2019-06-28 18:50   ` Wang, Yipeng1
  2019-07-02 17:23     ` Honnappa Nagarahalli
  0 siblings, 1 reply; 14+ messages in thread
From: Wang, Yipeng1 @ 2019-06-28 18:50 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Gobriel, Sameh, Richardson, Bruce,
	De Lara Guarch, Pablo
  Cc: gavin.hu, ruifeng.wang, dev, nd, stable

>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:honnappa.nagarahalli@arm.com]
>Sent: Tuesday, June 25, 2019 2:15 PM
>To: Wang, Yipeng1 <yipeng1.wang@intel.com>; Gobriel, Sameh <sameh.gobriel@intel.com>; Richardson, Bruce
><bruce.richardson@intel.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; honnappa.nagarahalli@arm.com
>Cc: gavin.hu@arm.com; ruifeng.wang@arm.com; dev@dpdk.org; nd@arm.com; stable@dpdk.org
>Subject: [PATCH 3/3] lib/hash: adjust tbl_chng_cnt position
>
>tbl_chng_cnt is one of the first elements of the structure used in
>the lookup. Move it to the beginning of the cache line to gain
>performance.
>
>Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")
>Cc: stable@dpdk.org
>
>Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>Reviewed-by: Gavin Hu <gavin.hu@arm.com>
>Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
>---
> lib/librte_hash/rte_cuckoo_hash.h | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
>diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
>index fb19bb27d..af6451b5c 100644
>--- a/lib/librte_hash/rte_cuckoo_hash.h
>+++ b/lib/librte_hash/rte_cuckoo_hash.h
>@@ -170,7 +170,9 @@ struct rte_hash {
>
> 	/* Fields used in lookup */
>
>-	uint32_t key_len __rte_cache_aligned;
>+	uint32_t *tbl_chng_cnt __rte_cache_aligned;
>+	/**< Indicates if the hash table changed from last read. */
>+	uint32_t key_len;
> 	/**< Length of hash key. */
> 	uint8_t hw_trans_mem_support;
> 	/**< If hardware transactional memory is used. */
>@@ -218,8 +220,6 @@ struct rte_hash {
> 	 * is piggy-backed to freeing of the key index.
> 	 */
> 	uint32_t *ext_bkt_to_free;
>-	uint32_t *tbl_chng_cnt;
>-	/**< Indicates if the hash table changed from last read. */
> } __rte_cache_aligned;
>
> struct queue_node {
>--
>2.17.1

[Wang, Yipeng] 
I am not sure about this change. By moving counter to front, I think you seems push key_store out of the cache line. And key_store
Is also used in lookup (and more commonly).
My tests also show perf drop in many cases.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [dpdk-stable] [PATCH 2/3] lib/hash: load pData after full key compare
  2019-06-28 18:40   ` Wang, Yipeng1
@ 2019-07-02  4:35     ` Honnappa Nagarahalli
  2019-07-04  0:39       ` Wang, Yipeng1
  0 siblings, 1 reply; 14+ messages in thread
From: Honnappa Nagarahalli @ 2019-07-02  4:35 UTC (permalink / raw)
  To: Wang, Yipeng1, Gobriel, Sameh, Richardson, Bruce, De Lara Guarch, Pablo
  Cc: Gavin Hu (Arm Technology China),
	Ruifeng Wang (Arm Technology China),
	dev, Honnappa Nagarahalli, nd, stable, nd

Thank you Yipeng for your comments.

> >
> >When a hash entry is added, there are 2 sets of stores.
> >
> >1) The application writes its data to memory (whose address is provided
> >in rte_hash_add_key_with_hash_data API (or NULL))
> >2) The rte_hash library writes to its own internal data structures; key
> >store entry and the hash table.
> >
> >The only ordering requirement between these 2 is that - the store to
> >the application data must complete before the store to key_index.
> >There are no ordering requirements between the stores to the
> >key/signature and store to application data. The synchronization point
> >for application data can be any point between the 'store to application
> >data' and 'store to the key_index'. So, pData should not be a guard
> >variable for the data in hash table. It should be a guard variable only
> >for the application data written to the memory location pointed by
> >pData. Hence, pData can be loaded after full key comparison.
> >
> >Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")
> >Cc: stable@dpdk.org
> >
> >Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> >Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> >Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
> >---
> > lib/librte_hash/rte_cuckoo_hash.c | 67 +++++++++++++++----------------
> > 1 file changed, 32 insertions(+), 35 deletions(-)
> >
> >diff --git a/lib/librte_hash/rte_cuckoo_hash.c
> >b/lib/librte_hash/rte_cuckoo_hash.c
> >index f37f6957d..077328fed 100644
> >--- a/lib/librte_hash/rte_cuckoo_hash.c
> >+++ b/lib/librte_hash/rte_cuckoo_hash.c
> >@@ -649,9 +649,11 @@ search_and_update(const struct rte_hash *h, void
> *data, const void *key,
> > 			k = (struct rte_hash_key *) ((char *)keys +
> > 					bkt->key_idx[i] * h->key_entry_size);
> > 			if (rte_hash_cmp_eq(key, k->key, h) == 0) {
> >-				/* 'pdata' acts as the synchronization point
> >-				 * when an existing hash entry is updated.
> >-				 * Key is not updated in this case.
> >+				/* The store to application data at *data
> >+				 * should not leak after the store to pdata
> >+				 * in the key store. i.e. pdata is the guard
> >+				 * variable. Release the application data
> >+				 * to the readers.
> > 				 */
> > 				__atomic_store_n(&k->pdata,
> > 					data,
> >@@ -711,11 +713,10 @@ rte_hash_cuckoo_insert_mw(const struct
> rte_hash *h,
> > 		/* Check if slot is available */
> > 		if (likely(prim_bkt->key_idx[i] == EMPTY_SLOT)) {
> > 			prim_bkt->sig_current[i] = sig;
> >-			/* Key can be of arbitrary length, so it is
> >-			 * not possible to store it atomically.
> >-			 * Hence the new key element's memory stores
> >-			 * (key as well as data) should be complete
> >-			 * before it is referenced.
> >+			/* Store to signature and key should not
> >+			 * leak after the store to key_idx. i.e.
> >+			 * key_idx is the guard variable for signature
> >+			 * and key.
> > 			 */
> > 			__atomic_store_n(&prim_bkt->key_idx[i],
> > 					 new_idx,
> >@@ -990,17 +991,15 @@ __rte_hash_add_key_with_hash(const struct
> >rte_hash *h, const void *key,
> >
> > 	new_k = RTE_PTR_ADD(keys, (uintptr_t)slot_id * h->key_entry_size);
> > 	new_idx = (uint32_t)((uintptr_t) slot_id);
> >-	/* Copy key */
> >-	memcpy(new_k->key, key, h->key_len);
> >-	/* Key can be of arbitrary length, so it is not possible to store
> >-	 * it atomically. Hence the new key element's memory stores
> >-	 * (key as well as data) should be complete before it is referenced.
> >-	 * 'pdata' acts as the synchronization point when an existing hash
> >-	 * entry is updated.
> >+	/* The store to application data (by the application) at *data should
> >+	 * not leak after the store of pdata in the key store. i.e. pdata is
> >+	 * the guard variable. Release the application data to the readers.
> > 	 */
> > 	__atomic_store_n(&new_k->pdata,
> > 		data,
> > 		__ATOMIC_RELEASE);
> [Wang, Yipeng] Actually do we need to guard pdata for the newly inserted
> key?  I thought the guard of key_idx already can make sure The order for the
> application to read data.
Yes, you are correct. In the hash_add case, the store-release on key_idx would be sufficient. However, hash_update case requires store-release on pData. This was the reason to keep store-release for pData in hash_add when the lock-free algorithm was introduced.

> >+	/* Copy key */
> >+	memcpy(new_k->key, key, h->key_len);
> [Wang, Yipeng] You don't need to do the order change just to show the point
> of unnecessary ordering I think.
> I am afraid it may cause further confusion for future people who read this
> change, especially it is not in the commit Message (and it is a bug fix).
I made this change to keep it inline with the corresponding change in the lookup function. I can add this explanation to the commit message. Please let me know if this is ok for you.

> >


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [dpdk-stable] [PATCH 3/3] lib/hash: adjust tbl_chng_cnt position
  2019-06-28 18:50   ` Wang, Yipeng1
@ 2019-07-02 17:23     ` Honnappa Nagarahalli
  2019-07-04  0:52       ` Wang, Yipeng1
  0 siblings, 1 reply; 14+ messages in thread
From: Honnappa Nagarahalli @ 2019-07-02 17:23 UTC (permalink / raw)
  To: Wang, Yipeng1, Gobriel, Sameh, Richardson, Bruce, De Lara Guarch, Pablo
  Cc: Gavin Hu (Arm Technology China),
	Ruifeng Wang (Arm Technology China),
	dev, Honnappa Nagarahalli, nd, stable, nd

<snip>

> >
> >tbl_chng_cnt is one of the first elements of the structure used in the
> >lookup. Move it to the beginning of the cache line to gain performance.
> >
> >Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")
> >Cc: stable@dpdk.org
> >
> >Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> >Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> >Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
> >---
> > lib/librte_hash/rte_cuckoo_hash.h | 6 +++---
> > 1 file changed, 3 insertions(+), 3 deletions(-)
> >
> >diff --git a/lib/librte_hash/rte_cuckoo_hash.h
> >b/lib/librte_hash/rte_cuckoo_hash.h
> >index fb19bb27d..af6451b5c 100644
> >--- a/lib/librte_hash/rte_cuckoo_hash.h
> >+++ b/lib/librte_hash/rte_cuckoo_hash.h
> >@@ -170,7 +170,9 @@ struct rte_hash {
> >
> > 	/* Fields used in lookup */
> >
> >-	uint32_t key_len __rte_cache_aligned;
> >+	uint32_t *tbl_chng_cnt __rte_cache_aligned;
> >+	/**< Indicates if the hash table changed from last read. */
> >+	uint32_t key_len;
> > 	/**< Length of hash key. */
> > 	uint8_t hw_trans_mem_support;
> > 	/**< If hardware transactional memory is used. */ @@ -218,8 +220,6
> @@
> >struct rte_hash {
> > 	 * is piggy-backed to freeing of the key index.
> > 	 */
> > 	uint32_t *ext_bkt_to_free;
> >-	uint32_t *tbl_chng_cnt;
> >-	/**< Indicates if the hash table changed from last read. */
> > } __rte_cache_aligned;
> >
> > struct queue_node {
> >--
> >2.17.1
> 
> [Wang, Yipeng]
> I am not sure about this change. By moving counter to front, I think you
> seems push key_store out of the cache line. And key_store Is also used in
> lookup (and more commonly).
> My tests also show perf drop in many cases.
I ran hash_readwrite_lf tests and L3 fwd application. Both of them showed improvements for both lock-free and using locks for Arm platforms (L3 fwd was not run on x86). Which tests are resulting in performance drops for you?

But, I do agree that this work is not complete. We can drop this patch and take this up separately.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [dpdk-stable] [PATCH v2 1/2] lib/hash: use ordered loads only if signature matches
       [not found] ` <20190702211634.37940-1-honnappa.nagarahalli@arm.com>
@ 2019-07-02 21:16   ` Honnappa Nagarahalli
  2019-07-02 21:16   ` [dpdk-stable] [PATCH v2 2/2] lib/hash: load pData after full key compare Honnappa Nagarahalli
  1 sibling, 0 replies; 14+ messages in thread
From: Honnappa Nagarahalli @ 2019-07-02 21:16 UTC (permalink / raw)
  To: yipeng1.wang, sameh.gobriel, bruce.richardson,
	pablo.de.lara.guarch, honnappa.nagarahalli
  Cc: gavin.hu, ruifeng.wang, dev, nd, stable

Relaxed signature comparison is done first. Further ordered loads
are done only if the signature matches. Any false positives are
caught by the full key comparison. This provides performance
benefits as load-acquire is executed only when required.

Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")
Cc: stable@dpdk.org

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
---
 lib/librte_hash/rte_cuckoo_hash.c | 37 +++++++++++++++++++------------
 1 file changed, 23 insertions(+), 14 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index 953928f27..0e042d924 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -1188,22 +1188,31 @@ search_one_bucket_lf(const struct rte_hash *h, const void *key, uint16_t sig,
 	struct rte_hash_key *k, *keys = h->key_store;
 
 	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
-		key_idx = __atomic_load_n(&bkt->key_idx[i],
+		/* Signature comparison is done before the acquire-load
+		 * of the key index to achieve better performance.
+		 * This can result in the reader loading old signature
+		 * (which matches), while the key_idx is updated to a
+		 * value that belongs to a new key. However, the full
+		 * key comparison will ensure that the lookup fails.
+		 */
+		if (bkt->sig_current[i] == sig) {
+			key_idx = __atomic_load_n(&bkt->key_idx[i],
 					  __ATOMIC_ACQUIRE);
-		if (bkt->sig_current[i] == sig && key_idx != EMPTY_SLOT) {
-			k = (struct rte_hash_key *) ((char *)keys +
-					key_idx * h->key_entry_size);
-			pdata = __atomic_load_n(&k->pdata,
-					__ATOMIC_ACQUIRE);
+			if (key_idx != EMPTY_SLOT) {
+				k = (struct rte_hash_key *) ((char *)keys +
+						key_idx * h->key_entry_size);
+				pdata = __atomic_load_n(&k->pdata,
+						__ATOMIC_ACQUIRE);
 
-			if (rte_hash_cmp_eq(key, k->key, h) == 0) {
-				if (data != NULL)
-					*data = pdata;
-				/*
-				 * Return index where key is stored,
-				 * subtracting the first dummy index
-				 */
-				return key_idx - 1;
+				if (rte_hash_cmp_eq(key, k->key, h) == 0) {
+					if (data != NULL)
+						*data = pdata;
+					/*
+					 * Return index where key is stored,
+					 * subtracting the first dummy index
+					 */
+					return key_idx - 1;
+				}
 			}
 		}
 	}
-- 
2.17.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [dpdk-stable] [PATCH v2 2/2] lib/hash: load pData after full key compare
       [not found] ` <20190702211634.37940-1-honnappa.nagarahalli@arm.com>
  2019-07-02 21:16   ` [dpdk-stable] [PATCH v2 1/2] lib/hash: use ordered loads only if signature matches Honnappa Nagarahalli
@ 2019-07-02 21:16   ` Honnappa Nagarahalli
  1 sibling, 0 replies; 14+ messages in thread
From: Honnappa Nagarahalli @ 2019-07-02 21:16 UTC (permalink / raw)
  To: yipeng1.wang, sameh.gobriel, bruce.richardson,
	pablo.de.lara.guarch, honnappa.nagarahalli
  Cc: gavin.hu, ruifeng.wang, dev, nd, stable

When a hash entry is added, there are 2 sets of stores.

1) The application writes its data to memory (whose address
is provided in rte_hash_add_key_with_hash_data API (or NULL))
2) The rte_hash library writes to its own internal data structures;
key store entry and the hash table.

The only ordering requirement between these 2 is that - store
to the application data must complete before the store to key_index.
There are no ordering requirements between the stores to
key/signature and store to application data. The synchronization
point for application data can be any point between the 'store to
application data' and 'store to the key_index'. So, 'pdata' should not
be a guard variable for the data in hash table. It should be a guard
variable only for the application data written to the memory location
pointed by 'pdata'. Hence, in the lookup functions, 'pdata' can be
loaded after full key comparison succeeds.

The synchronization point for the application data (store-release
to 'pdata' in key store) is changed to be consistent with the order
of loads in lookup function. However, this change is cosmetic and
does not affect the functionality.

Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")
Cc: stable@dpdk.org

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_hash/rte_cuckoo_hash.c | 67 +++++++++++++++----------------
 1 file changed, 32 insertions(+), 35 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index 0e042d924..55c5c1b8a 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -649,9 +649,11 @@ search_and_update(const struct rte_hash *h, void *data, const void *key,
 			k = (struct rte_hash_key *) ((char *)keys +
 					bkt->key_idx[i] * h->key_entry_size);
 			if (rte_hash_cmp_eq(key, k->key, h) == 0) {
-				/* 'pdata' acts as the synchronization point
-				 * when an existing hash entry is updated.
-				 * Key is not updated in this case.
+				/* The store to application data at *data
+				 * should not leak after the store to pdata
+				 * in the key store. i.e. pdata is the guard
+				 * variable. Release the application data
+				 * to the readers.
 				 */
 				__atomic_store_n(&k->pdata,
 					data,
@@ -711,11 +713,10 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 		/* Check if slot is available */
 		if (likely(prim_bkt->key_idx[i] == EMPTY_SLOT)) {
 			prim_bkt->sig_current[i] = sig;
-			/* Key can be of arbitrary length, so it is
-			 * not possible to store it atomically.
-			 * Hence the new key element's memory stores
-			 * (key as well as data) should be complete
-			 * before it is referenced.
+			/* Store to signature and key should not
+			 * leak after the store to key_idx. i.e.
+			 * key_idx is the guard variable for signature
+			 * and key.
 			 */
 			__atomic_store_n(&prim_bkt->key_idx[i],
 					 new_idx,
@@ -990,17 +991,15 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	new_k = RTE_PTR_ADD(keys, (uintptr_t)slot_id * h->key_entry_size);
 	new_idx = (uint32_t)((uintptr_t) slot_id);
-	/* Copy key */
-	memcpy(new_k->key, key, h->key_len);
-	/* Key can be of arbitrary length, so it is not possible to store
-	 * it atomically. Hence the new key element's memory stores
-	 * (key as well as data) should be complete before it is referenced.
-	 * 'pdata' acts as the synchronization point when an existing hash
-	 * entry is updated.
+	/* The store to application data (by the application) at *data should
+	 * not leak after the store of pdata in the key store. i.e. pdata is
+	 * the guard variable. Release the application data to the readers.
 	 */
 	__atomic_store_n(&new_k->pdata,
 		data,
 		__ATOMIC_RELEASE);
+	/* Copy key */
+	memcpy(new_k->key, key, h->key_len);
 
 	/* Find an empty slot and insert */
 	ret = rte_hash_cuckoo_insert_mw(h, prim_bkt, sec_bkt, key, data,
@@ -1064,8 +1063,10 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 			/* Check if slot is available */
 			if (likely(cur_bkt->key_idx[i] == EMPTY_SLOT)) {
 				cur_bkt->sig_current[i] = short_sig;
-				/* Store to signature should not leak after
-				 * the store to key_idx
+				/* Store to signature and key should not
+				 * leak after the store to key_idx. i.e.
+				 * key_idx is the guard variable for signature
+				 * and key.
 				 */
 				__atomic_store_n(&cur_bkt->key_idx[i],
 						 new_idx,
@@ -1087,8 +1088,9 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	bkt_id = (uint32_t)((uintptr_t)ext_bkt_id) - 1;
 	/* Use the first location of the new bucket */
 	(h->buckets_ext[bkt_id]).sig_current[0] = short_sig;
-	/* Store to signature should not leak after
-	 * the store to key_idx
+	/* Store to signature and key should not leak after
+	 * the store to key_idx. i.e. key_idx is the guard variable
+	 * for signature and key.
 	 */
 	__atomic_store_n(&(h->buckets_ext[bkt_id]).key_idx[0],
 			 new_idx,
@@ -1184,7 +1186,6 @@ search_one_bucket_lf(const struct rte_hash *h, const void *key, uint16_t sig,
 {
 	int i;
 	uint32_t key_idx;
-	void *pdata;
 	struct rte_hash_key *k, *keys = h->key_store;
 
 	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
@@ -1201,12 +1202,13 @@ search_one_bucket_lf(const struct rte_hash *h, const void *key, uint16_t sig,
 			if (key_idx != EMPTY_SLOT) {
 				k = (struct rte_hash_key *) ((char *)keys +
 						key_idx * h->key_entry_size);
-				pdata = __atomic_load_n(&k->pdata,
-						__ATOMIC_ACQUIRE);
 
 				if (rte_hash_cmp_eq(key, k->key, h) == 0) {
-					if (data != NULL)
-						*data = pdata;
+					if (data != NULL) {
+						*data = __atomic_load_n(
+							&k->pdata,
+							__ATOMIC_ACQUIRE);
+					}
 					/*
 					 * Return index where key is stored,
 					 * subtracting the first dummy index
@@ -1904,7 +1906,6 @@ __rte_hash_lookup_bulk_lf(const struct rte_hash *h, const void **keys,
 	uint32_t prim_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
 	uint32_t sec_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
 	struct rte_hash_bucket *cur_bkt, *next_bkt;
-	void *pdata[RTE_HASH_LOOKUP_BULK_MAX];
 	uint32_t cnt_b, cnt_a;
 
 	/* Prefetch first keys */
@@ -2006,10 +2007,6 @@ __rte_hash_lookup_bulk_lf(const struct rte_hash *h, const void **keys,
 					(const char *)h->key_store +
 					key_idx * h->key_entry_size);
 
-				if (key_idx != EMPTY_SLOT)
-					pdata[i] = __atomic_load_n(
-							&key_slot->pdata,
-							__ATOMIC_ACQUIRE);
 				/*
 				 * If key index is 0, do not compare key,
 				 * as it is checking the dummy slot
@@ -2018,7 +2015,9 @@ __rte_hash_lookup_bulk_lf(const struct rte_hash *h, const void **keys,
 					!rte_hash_cmp_eq(
 						key_slot->key, keys[i], h)) {
 					if (data != NULL)
-						data[i] = pdata[i];
+						data[i] = __atomic_load_n(
+							&key_slot->pdata,
+							__ATOMIC_ACQUIRE);
 
 					hits |= 1ULL << i;
 					positions[i] = key_idx - 1;
@@ -2040,10 +2039,6 @@ __rte_hash_lookup_bulk_lf(const struct rte_hash *h, const void **keys,
 					(const char *)h->key_store +
 					key_idx * h->key_entry_size);
 
-				if (key_idx != EMPTY_SLOT)
-					pdata[i] = __atomic_load_n(
-							&key_slot->pdata,
-							__ATOMIC_ACQUIRE);
 				/*
 				 * If key index is 0, do not compare key,
 				 * as it is checking the dummy slot
@@ -2053,7 +2048,9 @@ __rte_hash_lookup_bulk_lf(const struct rte_hash *h, const void **keys,
 					!rte_hash_cmp_eq(
 						key_slot->key, keys[i], h)) {
 					if (data != NULL)
-						data[i] = pdata[i];
+						data[i] = __atomic_load_n(
+							&key_slot->pdata,
+							__ATOMIC_ACQUIRE);
 
 					hits |= 1ULL << i;
 					positions[i] = key_idx - 1;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [dpdk-stable] [PATCH 2/3] lib/hash: load pData after full key compare
  2019-07-02  4:35     ` Honnappa Nagarahalli
@ 2019-07-04  0:39       ` Wang, Yipeng1
  2019-07-05  5:33         ` Honnappa Nagarahalli
  0 siblings, 1 reply; 14+ messages in thread
From: Wang, Yipeng1 @ 2019-07-04  0:39 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Gobriel, Sameh, Richardson, Bruce,
	De Lara Guarch, Pablo
  Cc: Gavin Hu (Arm Technology China),
	Ruifeng Wang (Arm Technology China),
	dev, nd, stable, nd

>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
>Sent: Monday, July 1, 2019 9:35 PM
>To: Wang, Yipeng1 <yipeng1.wang@intel.com>; Gobriel, Sameh <sameh.gobriel@intel.com>; Richardson, Bruce
><bruce.richardson@intel.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
>Cc: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; Ruifeng Wang (Arm Technology China) <Ruifeng.Wang@arm.com>;
>dev@dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>; stable@dpdk.org; nd <nd@arm.com>
>Subject: RE: [PATCH 2/3] lib/hash: load pData after full key compare
>
>Thank you Yipeng for your comments.
>
>> >
>> >When a hash entry is added, there are 2 sets of stores.
>> >
>> >1) The application writes its data to memory (whose address is provided
>> >in rte_hash_add_key_with_hash_data API (or NULL))
>> >2) The rte_hash library writes to its own internal data structures; key
>> >store entry and the hash table.
>> >
>> >The only ordering requirement between these 2 is that - the store to
>> >the application data must complete before the store to key_index.
>> >There are no ordering requirements between the stores to the
>> >key/signature and store to application data. The synchronization point
>> >for application data can be any point between the 'store to application
>> >data' and 'store to the key_index'. So, pData should not be a guard
>> >variable for the data in hash table. It should be a guard variable only
>> >for the application data written to the memory location pointed by
>> >pData. Hence, pData can be loaded after full key comparison.
>> >
>> >Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")
>> >Cc: stable@dpdk.org
>> >
>> >Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>> >Reviewed-by: Gavin Hu <gavin.hu@arm.com>
>> >Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
>> >---
>> > lib/librte_hash/rte_cuckoo_hash.c | 67 +++++++++++++++----------------
>> > 1 file changed, 32 insertions(+), 35 deletions(-)
>> >
>> >diff --git a/lib/librte_hash/rte_cuckoo_hash.c
>> >b/lib/librte_hash/rte_cuckoo_hash.c
>> >index f37f6957d..077328fed 100644
>> >--- a/lib/librte_hash/rte_cuckoo_hash.c
>> >+++ b/lib/librte_hash/rte_cuckoo_hash.c
>> >@@ -649,9 +649,11 @@ search_and_update(const struct rte_hash *h, void
>> *data, const void *key,
>> > 			k = (struct rte_hash_key *) ((char *)keys +
>> > 					bkt->key_idx[i] * h->key_entry_size);
>> > 			if (rte_hash_cmp_eq(key, k->key, h) == 0) {
>> >-				/* 'pdata' acts as the synchronization point
>> >-				 * when an existing hash entry is updated.
>> >-				 * Key is not updated in this case.
>> >+				/* The store to application data at *data
>> >+				 * should not leak after the store to pdata
>> >+				 * in the key store. i.e. pdata is the guard
>> >+				 * variable. Release the application data
>> >+				 * to the readers.
>> > 				 */
>> > 				__atomic_store_n(&k->pdata,
>> > 					data,
>> >@@ -711,11 +713,10 @@ rte_hash_cuckoo_insert_mw(const struct
>> rte_hash *h,
>> > 		/* Check if slot is available */
>> > 		if (likely(prim_bkt->key_idx[i] == EMPTY_SLOT)) {
>> > 			prim_bkt->sig_current[i] = sig;
>> >-			/* Key can be of arbitrary length, so it is
>> >-			 * not possible to store it atomically.
>> >-			 * Hence the new key element's memory stores
>> >-			 * (key as well as data) should be complete
>> >-			 * before it is referenced.
>> >+			/* Store to signature and key should not
>> >+			 * leak after the store to key_idx. i.e.
>> >+			 * key_idx is the guard variable for signature
>> >+			 * and key.
>> > 			 */
>> > 			__atomic_store_n(&prim_bkt->key_idx[i],
>> > 					 new_idx,
>> >@@ -990,17 +991,15 @@ __rte_hash_add_key_with_hash(const struct
>> >rte_hash *h, const void *key,
>> >
>> > 	new_k = RTE_PTR_ADD(keys, (uintptr_t)slot_id * h->key_entry_size);
>> > 	new_idx = (uint32_t)((uintptr_t) slot_id);
>> >-	/* Copy key */
>> >-	memcpy(new_k->key, key, h->key_len);
>> >-	/* Key can be of arbitrary length, so it is not possible to store
>> >-	 * it atomically. Hence the new key element's memory stores
>> >-	 * (key as well as data) should be complete before it is referenced.
>> >-	 * 'pdata' acts as the synchronization point when an existing hash
>> >-	 * entry is updated.
>> >+	/* The store to application data (by the application) at *data should
>> >+	 * not leak after the store of pdata in the key store. i.e. pdata is
>> >+	 * the guard variable. Release the application data to the readers.
>> > 	 */
>> > 	__atomic_store_n(&new_k->pdata,
>> > 		data,
>> > 		__ATOMIC_RELEASE);
>> [Wang, Yipeng] Actually do we need to guard pdata for the newly inserted
>> key?  I thought the guard of key_idx already can make sure The order for the
>> application to read data.
>Yes, you are correct. In the hash_add case, the store-release on key_idx would be sufficient. However, hash_update case requires
>store-release on pData. This was the reason to keep store-release for pData in hash_add when the lock-free algorithm was
>introduced.

[Wang, Yipeng] Sorry that I am still a bit confused, we already have store release in search_and_update function right? Isn't that enough
for the hash_update case?
>
>> >+	/* Copy key */
>> >+	memcpy(new_k->key, key, h->key_len);
>> [Wang, Yipeng] You don't need to do the order change just to show the point
>> of unnecessary ordering I think.
>> I am afraid it may cause further confusion for future people who read this
>> change, especially it is not in the commit Message (and it is a bug fix).
>I made this change to keep it inline with the corresponding change in the lookup function. I can add this explanation to the commit
>message. Please let me know if this is ok for you.

[Wang, Yipeng] Thanks for the change.
To me it still looks unnecessary but If you think this cosmetic change would help others to understand the code better, I am OK with it.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [dpdk-stable] [PATCH 3/3] lib/hash: adjust tbl_chng_cnt position
  2019-07-02 17:23     ` Honnappa Nagarahalli
@ 2019-07-04  0:52       ` Wang, Yipeng1
  0 siblings, 0 replies; 14+ messages in thread
From: Wang, Yipeng1 @ 2019-07-04  0:52 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Gobriel, Sameh, Richardson, Bruce,
	De Lara Guarch, Pablo
  Cc: Gavin Hu (Arm Technology China),
	Ruifeng Wang (Arm Technology China),
	dev, nd, stable, nd

>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
>Sent: Tuesday, July 2, 2019 10:23 AM
>To: Wang, Yipeng1 <yipeng1.wang@intel.com>; Gobriel, Sameh <sameh.gobriel@intel.com>; Richardson, Bruce
><bruce.richardson@intel.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
>Cc: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; Ruifeng Wang (Arm Technology China) <Ruifeng.Wang@arm.com>;
>dev@dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>; stable@dpdk.org; nd <nd@arm.com>
>Subject: RE: [PATCH 3/3] lib/hash: adjust tbl_chng_cnt position
>
><snip>
>
>> >
>> >tbl_chng_cnt is one of the first elements of the structure used in the
>> >lookup. Move it to the beginning of the cache line to gain performance.
>> >
>> >Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")
>> >Cc: stable@dpdk.org
>> >
>> >Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>> >Reviewed-by: Gavin Hu <gavin.hu@arm.com>
>> >Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
>> >---
>> > lib/librte_hash/rte_cuckoo_hash.h | 6 +++---
>> > 1 file changed, 3 insertions(+), 3 deletions(-)
>> >
>> >diff --git a/lib/librte_hash/rte_cuckoo_hash.h
>> >b/lib/librte_hash/rte_cuckoo_hash.h
>> >index fb19bb27d..af6451b5c 100644
>> >--- a/lib/librte_hash/rte_cuckoo_hash.h
>> >+++ b/lib/librte_hash/rte_cuckoo_hash.h
>> >@@ -170,7 +170,9 @@ struct rte_hash {
>> >
>> > 	/* Fields used in lookup */
>> >
>> >-	uint32_t key_len __rte_cache_aligned;
>> >+	uint32_t *tbl_chng_cnt __rte_cache_aligned;
>> >+	/**< Indicates if the hash table changed from last read. */
>> >+	uint32_t key_len;
>> > 	/**< Length of hash key. */
>> > 	uint8_t hw_trans_mem_support;
>> > 	/**< If hardware transactional memory is used. */ @@ -218,8 +220,6
>> @@
>> >struct rte_hash {
>> > 	 * is piggy-backed to freeing of the key index.
>> > 	 */
>> > 	uint32_t *ext_bkt_to_free;
>> >-	uint32_t *tbl_chng_cnt;
>> >-	/**< Indicates if the hash table changed from last read. */
>> > } __rte_cache_aligned;
>> >
>> > struct queue_node {
>> >--
>> >2.17.1
>>
>> [Wang, Yipeng]
>> I am not sure about this change. By moving counter to front, I think you
>> seems push key_store out of the cache line. And key_store Is also used in
>> lookup (and more commonly).
>> My tests also show perf drop in many cases.
>I ran hash_readwrite_lf tests and L3 fwd application. Both of them showed improvements for both lock-free and using locks for Arm
>platforms (L3 fwd was not run on x86). Which tests are resulting in performance drops for you?
>
>But, I do agree that this work is not complete. We can drop this patch and take this up separately.
[Wang, Yipeng] 
I run the LF test on x86. For 1 reader case it seems a clear improvement for lookup without key-shifts. Other results are a mixed bag even
for lock-free. For most HTM it is a drop.

My opinion is to delay similar fine tuning effort tied to architecture specs. 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [dpdk-stable] [PATCH 2/3] lib/hash: load pData after full key compare
  2019-07-04  0:39       ` Wang, Yipeng1
@ 2019-07-05  5:33         ` Honnappa Nagarahalli
  2019-07-08 16:44           ` Wang, Yipeng1
  0 siblings, 1 reply; 14+ messages in thread
From: Honnappa Nagarahalli @ 2019-07-05  5:33 UTC (permalink / raw)
  To: Wang, Yipeng1, Gobriel, Sameh, Richardson, Bruce, De Lara Guarch, Pablo
  Cc: Gavin Hu (Arm Technology China),
	Ruifeng Wang (Arm Technology China),
	dev, nd, stable, nd, nd

> >Subject: RE: [PATCH 2/3] lib/hash: load pData after full key compare
> >
> >Thank you Yipeng for your comments.
> >
> >> >
> >> >When a hash entry is added, there are 2 sets of stores.
> >> >
> >> >1) The application writes its data to memory (whose address is
> >> >provided in rte_hash_add_key_with_hash_data API (or NULL))
> >> >2) The rte_hash library writes to its own internal data structures;
> >> >key store entry and the hash table.
> >> >
> >> >The only ordering requirement between these 2 is that - the store to
> >> >the application data must complete before the store to key_index.
> >> >There are no ordering requirements between the stores to the
> >> >key/signature and store to application data. The synchronization
> >> >point for application data can be any point between the 'store to
> >> >application data' and 'store to the key_index'. So, pData should not
> >> >be a guard variable for the data in hash table. It should be a guard
> >> >variable only for the application data written to the memory
> >> >location pointed by pData. Hence, pData can be loaded after full key
> comparison.
> >> >
> >> >Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")
> >> >Cc: stable@dpdk.org
> >> >
> >> >Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> >> >Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> >> >Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
> >> >---
> >> > lib/librte_hash/rte_cuckoo_hash.c | 67
> >> >+++++++++++++++----------------
> >> > 1 file changed, 32 insertions(+), 35 deletions(-)
> >> >
> >> >diff --git a/lib/librte_hash/rte_cuckoo_hash.c
> >> >b/lib/librte_hash/rte_cuckoo_hash.c
> >> >index f37f6957d..077328fed 100644
> >> >--- a/lib/librte_hash/rte_cuckoo_hash.c
> >> >+++ b/lib/librte_hash/rte_cuckoo_hash.c
> >> >@@ -649,9 +649,11 @@ search_and_update(const struct rte_hash *h,
> >> >void
> >> *data, const void *key,
> >> > 			k = (struct rte_hash_key *) ((char *)keys +
> >> > 					bkt->key_idx[i] * h->key_entry_size);
> >> > 			if (rte_hash_cmp_eq(key, k->key, h) == 0) {
> >> >-				/* 'pdata' acts as the synchronization point
> >> >-				 * when an existing hash entry is updated.
> >> >-				 * Key is not updated in this case.
> >> >+				/* The store to application data at *data
> >> >+				 * should not leak after the store to pdata
> >> >+				 * in the key store. i.e. pdata is the guard
> >> >+				 * variable. Release the application data
> >> >+				 * to the readers.
> >> > 				 */
> >> > 				__atomic_store_n(&k->pdata,
> >> > 					data,
> >> >@@ -711,11 +713,10 @@ rte_hash_cuckoo_insert_mw(const struct
> >> rte_hash *h,
> >> > 		/* Check if slot is available */
> >> > 		if (likely(prim_bkt->key_idx[i] == EMPTY_SLOT)) {
> >> > 			prim_bkt->sig_current[i] = sig;
> >> >-			/* Key can be of arbitrary length, so it is
> >> >-			 * not possible to store it atomically.
> >> >-			 * Hence the new key element's memory stores
> >> >-			 * (key as well as data) should be complete
> >> >-			 * before it is referenced.
> >> >+			/* Store to signature and key should not
> >> >+			 * leak after the store to key_idx. i.e.
> >> >+			 * key_idx is the guard variable for signature
> >> >+			 * and key.
> >> > 			 */
> >> > 			__atomic_store_n(&prim_bkt->key_idx[i],
> >> > 					 new_idx,
> >> >@@ -990,17 +991,15 @@ __rte_hash_add_key_with_hash(const struct
> >> >rte_hash *h, const void *key,
> >> >
> >> > 	new_k = RTE_PTR_ADD(keys, (uintptr_t)slot_id * h->key_entry_size);
> >> > 	new_idx = (uint32_t)((uintptr_t) slot_id);
> >> >-	/* Copy key */
> >> >-	memcpy(new_k->key, key, h->key_len);
> >> >-	/* Key can be of arbitrary length, so it is not possible to store
> >> >-	 * it atomically. Hence the new key element's memory stores
> >> >-	 * (key as well as data) should be complete before it is referenced.
> >> >-	 * 'pdata' acts as the synchronization point when an existing hash
> >> >-	 * entry is updated.
> >> >+	/* The store to application data (by the application) at *data should
> >> >+	 * not leak after the store of pdata in the key store. i.e. pdata is
> >> >+	 * the guard variable. Release the application data to the readers.
> >> > 	 */
> >> > 	__atomic_store_n(&new_k->pdata,
> >> > 		data,
> >> > 		__ATOMIC_RELEASE);
> >> [Wang, Yipeng] Actually do we need to guard pdata for the newly
> >> inserted key?  I thought the guard of key_idx already can make sure
> >> The order for the application to read data.
> >Yes, you are correct. In the hash_add case, the store-release on
> >key_idx would be sufficient. However, hash_update case requires
> >store-release on pData. This was the reason to keep store-release for pData
> in hash_add when the lock-free algorithm was introduced.
> 
> [Wang, Yipeng] Sorry that I am still a bit confused, we already have store
> release in search_and_update function right? Isn't that enough for the
> hash_update case?
No problem, looks like I did not use the correct terms. We are talking about 2 paths in the code:
1) When a new key is getting inserted, store-release of key_idx acts as the guard variable for the store to application data as well as the stores to internal hash table structures (signature, key, pdata).
2) But when an existing key is updated, there is no store to key_idx. In this case 'pdata' acts as the guard variable for the store to application data. Hence we need a store-release on 'pdata'. Due to this we need a load-acquire on 'pdata' in the lookup function.

Then, why do we need store-release on 'pdata' in code path 1? - store-release on 'pdata' in code path 1 is done for consistency with code path 2 i.e. we want to use store-release on 'pdata' consistently in both the code paths (unless we see performance degradation in path 1). IMO, it is much easier to understand the code this way.

> >
> >> >+	/* Copy key */
> >> >+	memcpy(new_k->key, key, h->key_len);
> >> [Wang, Yipeng] You don't need to do the order change just to show the
> >> point of unnecessary ordering I think.
> >> I am afraid it may cause further confusion for future people who read
> >> this change, especially it is not in the commit Message (and it is a bug fix).
> >I made this change to keep it inline with the corresponding change in
> >the lookup function. I can add this explanation to the commit message.
> Please let me know if this is ok for you.
> 
> [Wang, Yipeng] Thanks for the change.
> To me it still looks unnecessary but If you think this cosmetic change would
> help others to understand the code better, I am OK with it.
I agree this is unnecessary. When the change was being reviewed internally at Arm, I had not made this change initially. It resulted in questions as the new key insert's memory ordering steps did not match that of the lookup function. IMO, if we look at the algorithm as a whole (instead of looking at this commit alone), this code will be easier to understand.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [dpdk-stable] [PATCH 2/3] lib/hash: load pData after full key compare
  2019-07-05  5:33         ` Honnappa Nagarahalli
@ 2019-07-08 16:44           ` Wang, Yipeng1
  0 siblings, 0 replies; 14+ messages in thread
From: Wang, Yipeng1 @ 2019-07-08 16:44 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Gobriel, Sameh, Richardson, Bruce,
	De Lara Guarch, Pablo
  Cc: Gavin Hu (Arm Technology China),
	Ruifeng Wang (Arm Technology China),
	dev, nd, stable, nd, nd

>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
>Sent: Thursday, July 4, 2019 10:33 PM
>To: Wang, Yipeng1 <yipeng1.wang@intel.com>; Gobriel, Sameh <sameh.gobriel@intel.com>; Richardson, Bruce
><bruce.richardson@intel.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
>Cc: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; Ruifeng Wang (Arm Technology China) <Ruifeng.Wang@arm.com>;
>dev@dpdk.org; nd <nd@arm.com>; stable@dpdk.org; nd <nd@arm.com>; nd <nd@arm.com>
>Subject: RE: [PATCH 2/3] lib/hash: load pData after full key compare
>
>> >Subject: RE: [PATCH 2/3] lib/hash: load pData after full key compare
>> >
>> >Thank you Yipeng for your comments.
>> >
>> >> >
>> >> >When a hash entry is added, there are 2 sets of stores.
>> >> >
>> >> >1) The application writes its data to memory (whose address is
>> >> >provided in rte_hash_add_key_with_hash_data API (or NULL))
>> >> >2) The rte_hash library writes to its own internal data structures;
>> >> >key store entry and the hash table.
>> >> >
>> >> >The only ordering requirement between these 2 is that - the store to
>> >> >the application data must complete before the store to key_index.
>> >> >There are no ordering requirements between the stores to the
>> >> >key/signature and store to application data. The synchronization
>> >> >point for application data can be any point between the 'store to
>> >> >application data' and 'store to the key_index'. So, pData should not
>> >> >be a guard variable for the data in hash table. It should be a guard
>> >> >variable only for the application data written to the memory
>> >> >location pointed by pData. Hence, pData can be loaded after full key
>> comparison.
>> >> >
>> >> >Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")
>> >> >Cc: stable@dpdk.org
>> >> >
>> >> >Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>> >> >Reviewed-by: Gavin Hu <gavin.hu@arm.com>
>> >> >Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
>> >> >---
>> >> > lib/librte_hash/rte_cuckoo_hash.c | 67
>> >> >+++++++++++++++----------------
>> >> > 1 file changed, 32 insertions(+), 35 deletions(-)
>> >> >
>> >> >diff --git a/lib/librte_hash/rte_cuckoo_hash.c
>> >> >b/lib/librte_hash/rte_cuckoo_hash.c
>> >> >index f37f6957d..077328fed 100644
>> >> >--- a/lib/librte_hash/rte_cuckoo_hash.c
>> >> >+++ b/lib/librte_hash/rte_cuckoo_hash.c
>> >> >@@ -649,9 +649,11 @@ search_and_update(const struct rte_hash *h,
>> >> >void
>> >> *data, const void *key,
>> >> > 			k = (struct rte_hash_key *) ((char *)keys +
>> >> > 					bkt->key_idx[i] * h->key_entry_size);
>> >> > 			if (rte_hash_cmp_eq(key, k->key, h) == 0) {
>> >> >-				/* 'pdata' acts as the synchronization point
>> >> >-				 * when an existing hash entry is updated.
>> >> >-				 * Key is not updated in this case.
>> >> >+				/* The store to application data at *data
>> >> >+				 * should not leak after the store to pdata
>> >> >+				 * in the key store. i.e. pdata is the guard
>> >> >+				 * variable. Release the application data
>> >> >+				 * to the readers.
>> >> > 				 */
>> >> > 				__atomic_store_n(&k->pdata,
>> >> > 					data,
>> >> >@@ -711,11 +713,10 @@ rte_hash_cuckoo_insert_mw(const struct
>> >> rte_hash *h,
>> >> > 		/* Check if slot is available */
>> >> > 		if (likely(prim_bkt->key_idx[i] == EMPTY_SLOT)) {
>> >> > 			prim_bkt->sig_current[i] = sig;
>> >> >-			/* Key can be of arbitrary length, so it is
>> >> >-			 * not possible to store it atomically.
>> >> >-			 * Hence the new key element's memory stores
>> >> >-			 * (key as well as data) should be complete
>> >> >-			 * before it is referenced.
>> >> >+			/* Store to signature and key should not
>> >> >+			 * leak after the store to key_idx. i.e.
>> >> >+			 * key_idx is the guard variable for signature
>> >> >+			 * and key.
>> >> > 			 */
>> >> > 			__atomic_store_n(&prim_bkt->key_idx[i],
>> >> > 					 new_idx,
>> >> >@@ -990,17 +991,15 @@ __rte_hash_add_key_with_hash(const struct
>> >> >rte_hash *h, const void *key,
>> >> >
>> >> > 	new_k = RTE_PTR_ADD(keys, (uintptr_t)slot_id * h->key_entry_size);
>> >> > 	new_idx = (uint32_t)((uintptr_t) slot_id);
>> >> >-	/* Copy key */
>> >> >-	memcpy(new_k->key, key, h->key_len);
>> >> >-	/* Key can be of arbitrary length, so it is not possible to store
>> >> >-	 * it atomically. Hence the new key element's memory stores
>> >> >-	 * (key as well as data) should be complete before it is referenced.
>> >> >-	 * 'pdata' acts as the synchronization point when an existing hash
>> >> >-	 * entry is updated.
>> >> >+	/* The store to application data (by the application) at *data should
>> >> >+	 * not leak after the store of pdata in the key store. i.e. pdata is
>> >> >+	 * the guard variable. Release the application data to the readers.
>> >> > 	 */
>> >> > 	__atomic_store_n(&new_k->pdata,
>> >> > 		data,
>> >> > 		__ATOMIC_RELEASE);
>> >> [Wang, Yipeng] Actually do we need to guard pdata for the newly
>> >> inserted key?  I thought the guard of key_idx already can make sure
>> >> The order for the application to read data.
>> >Yes, you are correct. In the hash_add case, the store-release on
>> >key_idx would be sufficient. However, hash_update case requires
>> >store-release on pData. This was the reason to keep store-release for pData
>> in hash_add when the lock-free algorithm was introduced.
>>
>> [Wang, Yipeng] Sorry that I am still a bit confused, we already have store
>> release in search_and_update function right? Isn't that enough for the
>> hash_update case?
>No problem, looks like I did not use the correct terms. We are talking about 2 paths in the code:
>1) When a new key is getting inserted, store-release of key_idx acts as the guard variable for the store to application data as well as
>the stores to internal hash table structures (signature, key, pdata).
>2) But when an existing key is updated, there is no store to key_idx. In this case 'pdata' acts as the guard variable for the store to
>application data. Hence we need a store-release on 'pdata'. Due to this we need a load-acquire on 'pdata' in the lookup function.
>
>Then, why do we need store-release on 'pdata' in code path 1? - store-release on 'pdata' in code path 1 is done for consistency with
>code path 2 i.e. we want to use store-release on 'pdata' consistently in both the code paths (unless we see performance degradation
>in path 1). IMO, it is much easier to understand the code this way.
>
>> >
>> >> >+	/* Copy key */
>> >> >+	memcpy(new_k->key, key, h->key_len);
>> >> [Wang, Yipeng] You don't need to do the order change just to show the
>> >> point of unnecessary ordering I think.
>> >> I am afraid it may cause further confusion for future people who read
>> >> this change, especially it is not in the commit Message (and it is a bug fix).
>> >I made this change to keep it inline with the corresponding change in
>> >the lookup function. I can add this explanation to the commit message.
>> Please let me know if this is ok for you.
>>
>> [Wang, Yipeng] Thanks for the change.
>> To me it still looks unnecessary but If you think this cosmetic change would
>> help others to understand the code better, I am OK with it.
>I agree this is unnecessary. When the change was being reviewed internally at Arm, I had not made this change initially. It resulted in
>questions as the new key insert's memory ordering steps did not match that of the lookup function. IMO, if we look at the algorithm
>as a whole (instead of looking at this commit alone), this code will be easier to understand.
[Wang, Yipeng]

Acked-by: Yipeng Wang <yipeng1.wang@intel.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2019-07-08 16:44 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20190625211520.43181-1-honnappa.nagarahalli@arm.com>
2019-06-25 21:15 ` [dpdk-stable] [PATCH 1/3] lib/hash: use ordered loads only if signature matches Honnappa Nagarahalli
2019-06-28 18:04   ` Wang, Yipeng1
2019-06-25 21:15 ` [dpdk-stable] [PATCH 2/3] lib/hash: load pData after full key compare Honnappa Nagarahalli
2019-06-28 18:40   ` Wang, Yipeng1
2019-07-02  4:35     ` Honnappa Nagarahalli
2019-07-04  0:39       ` Wang, Yipeng1
2019-07-05  5:33         ` Honnappa Nagarahalli
2019-07-08 16:44           ` Wang, Yipeng1
2019-06-25 21:15 ` [dpdk-stable] [PATCH 3/3] lib/hash: adjust tbl_chng_cnt position Honnappa Nagarahalli
2019-06-28 18:50   ` Wang, Yipeng1
2019-07-02 17:23     ` Honnappa Nagarahalli
2019-07-04  0:52       ` Wang, Yipeng1
     [not found] ` <20190702211634.37940-1-honnappa.nagarahalli@arm.com>
2019-07-02 21:16   ` [dpdk-stable] [PATCH v2 1/2] lib/hash: use ordered loads only if signature matches Honnappa Nagarahalli
2019-07-02 21:16   ` [dpdk-stable] [PATCH v2 2/2] lib/hash: load pData after full key compare Honnappa Nagarahalli

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).