DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] [PATCH 0/3] lib/hash: perf improvements for lock-free
@ 2019-06-25 21:15 Honnappa Nagarahalli
  2019-06-25 21:15 ` [dpdk-dev] [PATCH 1/3] lib/hash: use ordered loads only if signature matches Honnappa Nagarahalli
                   ` (3 more replies)
  0 siblings, 4 replies; 23+ messages in thread
From: Honnappa Nagarahalli @ 2019-06-25 21:15 UTC (permalink / raw)
  To: yipeng1.wang, sameh.gobriel, bruce.richardson,
	pablo.de.lara.guarch, honnappa.nagarahalli
  Cc: gavin.hu, ruifeng.wang, dev, nd

While using the rte_hash library, there are 2 sets of stores that
happen.

1) The application writes its data to memory (whose address
is provided in rte_hash_add_key_with_hash_data API (or NULL))
2) The rte_hash library writes to its own internal data structures;
key store entry and the hash table.

The data from both of these stores is available to the readers only
after the index of the key store entry (key_index) is written in the
hash buckets by the library. So, key_index can act as the guard variable
for both of these writes.

When rte_hash_add_key_with_hash_data is called to update an existing entry,
the key_index is not written. But, the store to the application data
must complete before the address of the application data (pData) is
updated in the key store entry. So, pData alone acts as the guard variable
for this case.

However, it should be noted that there are no ordering requirements
between 1) and 2), except for one requirement - the store to the
application data must complete before the store to key_index.
In other words, there are no ordering requirements between the stores to
the key store entry/signature and store to application data. So, the
synchronization point for application data can be any point between
the 'store to application data' and 'store to the key_index'.

The first patch in this series moves the signature comparison before the
load-acquire of the key_index. This does not result in any issues because
of the full key comparison which is done after the load-acquire of
the key_index.
Performance improvements:
Lookup Hit: 6.16%
Lookup Miss: 8.54%

The second patch in this series, moves the store-release of pData
before the store to any hash internal data structures. This is not
necessary, but just helps to show the non-dependency between application
data and hash table data. On the reader side, the pData is loaded
only if the keys match, this provides performance benefits.
Performance improvements (with patch 1):
Lookup Hit: 6.25%
Lookup Miss: 13.97%

The third patch moves the table change counter to the beginning of
the cache line as it is one of the first fields accessed in the lookup
function. Further work is required to adjust rte_hash structure
to gain more performance.
Performance improvements (with patch 1 and patch 2):
Lookup Hit: 2.13%
Lookup Miss: 1.56%

Honnappa Nagarahalli (3):
  lib/hash: use ordered loads only if signature matches
  lib/hash: load pData after full key compare
  lib/hash: adjust tbl_chng_cnt position

 lib/librte_hash/rte_cuckoo_hash.c | 96 ++++++++++++++++---------------
 lib/librte_hash/rte_cuckoo_hash.h |  6 +-
 2 files changed, 53 insertions(+), 49 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [dpdk-dev] [PATCH 1/3] lib/hash: use ordered loads only if signature matches
  2019-06-25 21:15 [dpdk-dev] [PATCH 0/3] lib/hash: perf improvements for lock-free Honnappa Nagarahalli
@ 2019-06-25 21:15 ` Honnappa Nagarahalli
  2019-06-28 18:04   ` Wang, Yipeng1
  2019-06-25 21:15 ` [dpdk-dev] [PATCH 2/3] lib/hash: load pData after full key compare Honnappa Nagarahalli
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 23+ messages in thread
From: Honnappa Nagarahalli @ 2019-06-25 21:15 UTC (permalink / raw)
  To: yipeng1.wang, sameh.gobriel, bruce.richardson,
	pablo.de.lara.guarch, honnappa.nagarahalli
  Cc: gavin.hu, ruifeng.wang, dev, nd, stable

Relaxed signature comparison is done first. Further ordered loads
are done only if the signature matches. Any false positives are
caught by the full key comparison.

Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")
Cc: stable@dpdk.org

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_hash/rte_cuckoo_hash.c | 35 ++++++++++++++++++-------------
 1 file changed, 21 insertions(+), 14 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index 953928f27..f37f6957d 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -1188,22 +1188,29 @@ search_one_bucket_lf(const struct rte_hash *h, const void *key, uint16_t sig,
 	struct rte_hash_key *k, *keys = h->key_store;
 
 	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
-		key_idx = __atomic_load_n(&bkt->key_idx[i],
+		/* Signature comparison is done before the acquire-load
+		 * of the key index to achieve better performance.
+		 * Any false positives will be caught in full comparison
+		 * of the key.
+		 */
+		if (bkt->sig_current[i] == sig) {
+			key_idx = __atomic_load_n(&bkt->key_idx[i],
 					  __ATOMIC_ACQUIRE);
-		if (bkt->sig_current[i] == sig && key_idx != EMPTY_SLOT) {
-			k = (struct rte_hash_key *) ((char *)keys +
-					key_idx * h->key_entry_size);
-			pdata = __atomic_load_n(&k->pdata,
-					__ATOMIC_ACQUIRE);
+			if (key_idx != EMPTY_SLOT) {
+				k = (struct rte_hash_key *) ((char *)keys +
+						key_idx * h->key_entry_size);
+				pdata = __atomic_load_n(&k->pdata,
+						__ATOMIC_ACQUIRE);
 
-			if (rte_hash_cmp_eq(key, k->key, h) == 0) {
-				if (data != NULL)
-					*data = pdata;
-				/*
-				 * Return index where key is stored,
-				 * subtracting the first dummy index
-				 */
-				return key_idx - 1;
+				if (rte_hash_cmp_eq(key, k->key, h) == 0) {
+					if (data != NULL)
+						*data = pdata;
+					/*
+					 * Return index where key is stored,
+					 * subtracting the first dummy index
+					 */
+					return key_idx - 1;
+				}
 			}
 		}
 	}
-- 
2.17.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [dpdk-dev] [PATCH 2/3] lib/hash: load pData after full key compare
  2019-06-25 21:15 [dpdk-dev] [PATCH 0/3] lib/hash: perf improvements for lock-free Honnappa Nagarahalli
  2019-06-25 21:15 ` [dpdk-dev] [PATCH 1/3] lib/hash: use ordered loads only if signature matches Honnappa Nagarahalli
@ 2019-06-25 21:15 ` Honnappa Nagarahalli
  2019-06-28 18:40   ` Wang, Yipeng1
  2019-06-25 21:15 ` [dpdk-dev] [PATCH 3/3] lib/hash: adjust tbl_chng_cnt position Honnappa Nagarahalli
  2019-07-02 21:16 ` [dpdk-dev] [PATCH v2 0/2] lib/hash: perf improvements for lock-free Honnappa Nagarahalli
  3 siblings, 1 reply; 23+ messages in thread
From: Honnappa Nagarahalli @ 2019-06-25 21:15 UTC (permalink / raw)
  To: yipeng1.wang, sameh.gobriel, bruce.richardson,
	pablo.de.lara.guarch, honnappa.nagarahalli
  Cc: gavin.hu, ruifeng.wang, dev, nd, stable

When a hash entry is added, there are 2 sets of stores.

1) The application writes its data to memory (whose address
is provided in rte_hash_add_key_with_hash_data API (or NULL))
2) The rte_hash library writes to its own internal data structures;
key store entry and the hash table.

The only ordering requirement between these 2 is that - the store
to the application data must complete before the store to key_index.
There are no ordering requirements between the stores to the
key/signature and store to application data. The synchronization
point for application data can be any point between the 'store to
application data' and 'store to the key_index'. So, pData should not
be a guard variable for the data in hash table. It should be a guard
variable only for the application data written to the memory location
pointed by pData. Hence, pData can be loaded after full key comparison.

Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")
Cc: stable@dpdk.org

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_hash/rte_cuckoo_hash.c | 67 +++++++++++++++----------------
 1 file changed, 32 insertions(+), 35 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index f37f6957d..077328fed 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -649,9 +649,11 @@ search_and_update(const struct rte_hash *h, void *data, const void *key,
 			k = (struct rte_hash_key *) ((char *)keys +
 					bkt->key_idx[i] * h->key_entry_size);
 			if (rte_hash_cmp_eq(key, k->key, h) == 0) {
-				/* 'pdata' acts as the synchronization point
-				 * when an existing hash entry is updated.
-				 * Key is not updated in this case.
+				/* The store to application data at *data
+				 * should not leak after the store to pdata
+				 * in the key store. i.e. pdata is the guard
+				 * variable. Release the application data
+				 * to the readers.
 				 */
 				__atomic_store_n(&k->pdata,
 					data,
@@ -711,11 +713,10 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 		/* Check if slot is available */
 		if (likely(prim_bkt->key_idx[i] == EMPTY_SLOT)) {
 			prim_bkt->sig_current[i] = sig;
-			/* Key can be of arbitrary length, so it is
-			 * not possible to store it atomically.
-			 * Hence the new key element's memory stores
-			 * (key as well as data) should be complete
-			 * before it is referenced.
+			/* Store to signature and key should not
+			 * leak after the store to key_idx. i.e.
+			 * key_idx is the guard variable for signature
+			 * and key.
 			 */
 			__atomic_store_n(&prim_bkt->key_idx[i],
 					 new_idx,
@@ -990,17 +991,15 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	new_k = RTE_PTR_ADD(keys, (uintptr_t)slot_id * h->key_entry_size);
 	new_idx = (uint32_t)((uintptr_t) slot_id);
-	/* Copy key */
-	memcpy(new_k->key, key, h->key_len);
-	/* Key can be of arbitrary length, so it is not possible to store
-	 * it atomically. Hence the new key element's memory stores
-	 * (key as well as data) should be complete before it is referenced.
-	 * 'pdata' acts as the synchronization point when an existing hash
-	 * entry is updated.
+	/* The store to application data (by the application) at *data should
+	 * not leak after the store of pdata in the key store. i.e. pdata is
+	 * the guard variable. Release the application data to the readers.
 	 */
 	__atomic_store_n(&new_k->pdata,
 		data,
 		__ATOMIC_RELEASE);
+	/* Copy key */
+	memcpy(new_k->key, key, h->key_len);
 
 	/* Find an empty slot and insert */
 	ret = rte_hash_cuckoo_insert_mw(h, prim_bkt, sec_bkt, key, data,
@@ -1064,8 +1063,10 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 			/* Check if slot is available */
 			if (likely(cur_bkt->key_idx[i] == EMPTY_SLOT)) {
 				cur_bkt->sig_current[i] = short_sig;
-				/* Store to signature should not leak after
-				 * the store to key_idx
+				/* Store to signature and key should not
+				 * leak after the store to key_idx. i.e.
+				 * key_idx is the guard variable for signature
+				 * and key.
 				 */
 				__atomic_store_n(&cur_bkt->key_idx[i],
 						 new_idx,
@@ -1087,8 +1088,9 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	bkt_id = (uint32_t)((uintptr_t)ext_bkt_id) - 1;
 	/* Use the first location of the new bucket */
 	(h->buckets_ext[bkt_id]).sig_current[0] = short_sig;
-	/* Store to signature should not leak after
-	 * the store to key_idx
+	/* Store to signature and key should not leak after
+	 * the store to key_idx. i.e. key_idx is the guard variable
+	 * for signature and key.
 	 */
 	__atomic_store_n(&(h->buckets_ext[bkt_id]).key_idx[0],
 			 new_idx,
@@ -1184,7 +1186,6 @@ search_one_bucket_lf(const struct rte_hash *h, const void *key, uint16_t sig,
 {
 	int i;
 	uint32_t key_idx;
-	void *pdata;
 	struct rte_hash_key *k, *keys = h->key_store;
 
 	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
@@ -1199,12 +1200,13 @@ search_one_bucket_lf(const struct rte_hash *h, const void *key, uint16_t sig,
 			if (key_idx != EMPTY_SLOT) {
 				k = (struct rte_hash_key *) ((char *)keys +
 						key_idx * h->key_entry_size);
-				pdata = __atomic_load_n(&k->pdata,
-						__ATOMIC_ACQUIRE);
 
 				if (rte_hash_cmp_eq(key, k->key, h) == 0) {
-					if (data != NULL)
-						*data = pdata;
+					if (data != NULL) {
+						*data = __atomic_load_n(
+							&k->pdata,
+							__ATOMIC_ACQUIRE);
+					}
 					/*
 					 * Return index where key is stored,
 					 * subtracting the first dummy index
@@ -1902,7 +1904,6 @@ __rte_hash_lookup_bulk_lf(const struct rte_hash *h, const void **keys,
 	uint32_t prim_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
 	uint32_t sec_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
 	struct rte_hash_bucket *cur_bkt, *next_bkt;
-	void *pdata[RTE_HASH_LOOKUP_BULK_MAX];
 	uint32_t cnt_b, cnt_a;
 
 	/* Prefetch first keys */
@@ -2004,10 +2005,6 @@ __rte_hash_lookup_bulk_lf(const struct rte_hash *h, const void **keys,
 					(const char *)h->key_store +
 					key_idx * h->key_entry_size);
 
-				if (key_idx != EMPTY_SLOT)
-					pdata[i] = __atomic_load_n(
-							&key_slot->pdata,
-							__ATOMIC_ACQUIRE);
 				/*
 				 * If key index is 0, do not compare key,
 				 * as it is checking the dummy slot
@@ -2016,7 +2013,9 @@ __rte_hash_lookup_bulk_lf(const struct rte_hash *h, const void **keys,
 					!rte_hash_cmp_eq(
 						key_slot->key, keys[i], h)) {
 					if (data != NULL)
-						data[i] = pdata[i];
+						data[i] = __atomic_load_n(
+							&key_slot->pdata,
+							__ATOMIC_ACQUIRE);
 
 					hits |= 1ULL << i;
 					positions[i] = key_idx - 1;
@@ -2038,10 +2037,6 @@ __rte_hash_lookup_bulk_lf(const struct rte_hash *h, const void **keys,
 					(const char *)h->key_store +
 					key_idx * h->key_entry_size);
 
-				if (key_idx != EMPTY_SLOT)
-					pdata[i] = __atomic_load_n(
-							&key_slot->pdata,
-							__ATOMIC_ACQUIRE);
 				/*
 				 * If key index is 0, do not compare key,
 				 * as it is checking the dummy slot
@@ -2051,7 +2046,9 @@ __rte_hash_lookup_bulk_lf(const struct rte_hash *h, const void **keys,
 					!rte_hash_cmp_eq(
 						key_slot->key, keys[i], h)) {
 					if (data != NULL)
-						data[i] = pdata[i];
+						data[i] = __atomic_load_n(
+							&key_slot->pdata,
+							__ATOMIC_ACQUIRE);
 
 					hits |= 1ULL << i;
 					positions[i] = key_idx - 1;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [dpdk-dev] [PATCH 3/3] lib/hash: adjust tbl_chng_cnt position
  2019-06-25 21:15 [dpdk-dev] [PATCH 0/3] lib/hash: perf improvements for lock-free Honnappa Nagarahalli
  2019-06-25 21:15 ` [dpdk-dev] [PATCH 1/3] lib/hash: use ordered loads only if signature matches Honnappa Nagarahalli
  2019-06-25 21:15 ` [dpdk-dev] [PATCH 2/3] lib/hash: load pData after full key compare Honnappa Nagarahalli
@ 2019-06-25 21:15 ` Honnappa Nagarahalli
  2019-06-28 18:50   ` Wang, Yipeng1
  2019-07-02 21:16 ` [dpdk-dev] [PATCH v2 0/2] lib/hash: perf improvements for lock-free Honnappa Nagarahalli
  3 siblings, 1 reply; 23+ messages in thread
From: Honnappa Nagarahalli @ 2019-06-25 21:15 UTC (permalink / raw)
  To: yipeng1.wang, sameh.gobriel, bruce.richardson,
	pablo.de.lara.guarch, honnappa.nagarahalli
  Cc: gavin.hu, ruifeng.wang, dev, nd, stable

tbl_chng_cnt is one of the first elements of the structure used in
the lookup. Move it to the beginning of the cache line to gain
performance.

Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")
Cc: stable@dpdk.org

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_hash/rte_cuckoo_hash.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
index fb19bb27d..af6451b5c 100644
--- a/lib/librte_hash/rte_cuckoo_hash.h
+++ b/lib/librte_hash/rte_cuckoo_hash.h
@@ -170,7 +170,9 @@ struct rte_hash {
 
 	/* Fields used in lookup */
 
-	uint32_t key_len __rte_cache_aligned;
+	uint32_t *tbl_chng_cnt __rte_cache_aligned;
+	/**< Indicates if the hash table changed from last read. */
+	uint32_t key_len;
 	/**< Length of hash key. */
 	uint8_t hw_trans_mem_support;
 	/**< If hardware transactional memory is used. */
@@ -218,8 +220,6 @@ struct rte_hash {
 	 * is piggy-backed to freeing of the key index.
 	 */
 	uint32_t *ext_bkt_to_free;
-	uint32_t *tbl_chng_cnt;
-	/**< Indicates if the hash table changed from last read. */
 } __rte_cache_aligned;
 
 struct queue_node {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH 1/3] lib/hash: use ordered loads only if signature matches
  2019-06-25 21:15 ` [dpdk-dev] [PATCH 1/3] lib/hash: use ordered loads only if signature matches Honnappa Nagarahalli
@ 2019-06-28 18:04   ` Wang, Yipeng1
  0 siblings, 0 replies; 23+ messages in thread
From: Wang, Yipeng1 @ 2019-06-28 18:04 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Gobriel, Sameh, Richardson, Bruce,
	De Lara Guarch, Pablo
  Cc: gavin.hu, ruifeng.wang, dev, nd, stable

>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:honnappa.nagarahalli@arm.com]
>Sent: Tuesday, June 25, 2019 2:15 PM
>To: Wang, Yipeng1 <yipeng1.wang@intel.com>; Gobriel, Sameh <sameh.gobriel@intel.com>; Richardson, Bruce
><bruce.richardson@intel.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; honnappa.nagarahalli@arm.com
>Cc: gavin.hu@arm.com; ruifeng.wang@arm.com; dev@dpdk.org; nd@arm.com; stable@dpdk.org
>Subject: [PATCH 1/3] lib/hash: use ordered loads only if signature matches
>
>Relaxed signature comparison is done first. Further ordered loads
>are done only if the signature matches. Any false positives are
>caught by the full key comparison.
[Wang, Yipeng] add: This commit improves lookup performance.
>
>Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")
>Cc: stable@dpdk.org
>
>Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>Reviewed-by: Gavin Hu <gavin.hu@arm.com>
>Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
>---
> lib/librte_hash/rte_cuckoo_hash.c | 35 ++++++++++++++++++-------------
> 1 file changed, 21 insertions(+), 14 deletions(-)
>
>diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
>index 953928f27..f37f6957d 100644
>--- a/lib/librte_hash/rte_cuckoo_hash.c
>+++ b/lib/librte_hash/rte_cuckoo_hash.c
>@@ -1188,22 +1188,29 @@ search_one_bucket_lf(const struct rte_hash *h, const void *key, uint16_t sig,
> 	struct rte_hash_key *k, *keys = h->key_store;
>
> 	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
>-		key_idx = __atomic_load_n(&bkt->key_idx[i],
>+		/* Signature comparison is done before the acquire-load
>+		 * of the key index to achieve better performance.
>+		 * Any false positives will be caught in full comparison
>+		 * of the key.
>+		 */
[Wang, Yipeng] 
A bit more explanation would be helpful for future understanding of the code:
"Signature comparison is done ... performance.
Although this could accidentally cause the reader to read an old signature while the key_idx
is updated to a new key's,  any false positive will be .... of the key."

Otherwise:
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH 2/3] lib/hash: load pData after full key compare
  2019-06-25 21:15 ` [dpdk-dev] [PATCH 2/3] lib/hash: load pData after full key compare Honnappa Nagarahalli
@ 2019-06-28 18:40   ` Wang, Yipeng1
  2019-07-02  4:35     ` Honnappa Nagarahalli
  0 siblings, 1 reply; 23+ messages in thread
From: Wang, Yipeng1 @ 2019-06-28 18:40 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Gobriel, Sameh, Richardson, Bruce,
	De Lara Guarch, Pablo
  Cc: gavin.hu, ruifeng.wang, dev, nd, stable

>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:honnappa.nagarahalli@arm.com]
>Sent: Tuesday, June 25, 2019 2:15 PM
>To: Wang, Yipeng1 <yipeng1.wang@intel.com>; Gobriel, Sameh <sameh.gobriel@intel.com>; Richardson, Bruce
><bruce.richardson@intel.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; honnappa.nagarahalli@arm.com
>Cc: gavin.hu@arm.com; ruifeng.wang@arm.com; dev@dpdk.org; nd@arm.com; stable@dpdk.org
>Subject: [PATCH 2/3] lib/hash: load pData after full key compare
>
>When a hash entry is added, there are 2 sets of stores.
>
>1) The application writes its data to memory (whose address
>is provided in rte_hash_add_key_with_hash_data API (or NULL))
>2) The rte_hash library writes to its own internal data structures;
>key store entry and the hash table.
>
>The only ordering requirement between these 2 is that - the store
>to the application data must complete before the store to key_index.
>There are no ordering requirements between the stores to the
>key/signature and store to application data. The synchronization
>point for application data can be any point between the 'store to
>application data' and 'store to the key_index'. So, pData should not
>be a guard variable for the data in hash table. It should be a guard
>variable only for the application data written to the memory location
>pointed by pData. Hence, pData can be loaded after full key comparison.
>
>Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")
>Cc: stable@dpdk.org
>
>Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>Reviewed-by: Gavin Hu <gavin.hu@arm.com>
>Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
>---
> lib/librte_hash/rte_cuckoo_hash.c | 67 +++++++++++++++----------------
> 1 file changed, 32 insertions(+), 35 deletions(-)
>
>diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
>index f37f6957d..077328fed 100644
>--- a/lib/librte_hash/rte_cuckoo_hash.c
>+++ b/lib/librte_hash/rte_cuckoo_hash.c
>@@ -649,9 +649,11 @@ search_and_update(const struct rte_hash *h, void *data, const void *key,
> 			k = (struct rte_hash_key *) ((char *)keys +
> 					bkt->key_idx[i] * h->key_entry_size);
> 			if (rte_hash_cmp_eq(key, k->key, h) == 0) {
>-				/* 'pdata' acts as the synchronization point
>-				 * when an existing hash entry is updated.
>-				 * Key is not updated in this case.
>+				/* The store to application data at *data
>+				 * should not leak after the store to pdata
>+				 * in the key store. i.e. pdata is the guard
>+				 * variable. Release the application data
>+				 * to the readers.
> 				 */
> 				__atomic_store_n(&k->pdata,
> 					data,
>@@ -711,11 +713,10 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
> 		/* Check if slot is available */
> 		if (likely(prim_bkt->key_idx[i] == EMPTY_SLOT)) {
> 			prim_bkt->sig_current[i] = sig;
>-			/* Key can be of arbitrary length, so it is
>-			 * not possible to store it atomically.
>-			 * Hence the new key element's memory stores
>-			 * (key as well as data) should be complete
>-			 * before it is referenced.
>+			/* Store to signature and key should not
>+			 * leak after the store to key_idx. i.e.
>+			 * key_idx is the guard variable for signature
>+			 * and key.
> 			 */
> 			__atomic_store_n(&prim_bkt->key_idx[i],
> 					 new_idx,
>@@ -990,17 +991,15 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
>
> 	new_k = RTE_PTR_ADD(keys, (uintptr_t)slot_id * h->key_entry_size);
> 	new_idx = (uint32_t)((uintptr_t) slot_id);
>-	/* Copy key */
>-	memcpy(new_k->key, key, h->key_len);
>-	/* Key can be of arbitrary length, so it is not possible to store
>-	 * it atomically. Hence the new key element's memory stores
>-	 * (key as well as data) should be complete before it is referenced.
>-	 * 'pdata' acts as the synchronization point when an existing hash
>-	 * entry is updated.
>+	/* The store to application data (by the application) at *data should
>+	 * not leak after the store of pdata in the key store. i.e. pdata is
>+	 * the guard variable. Release the application data to the readers.
> 	 */
> 	__atomic_store_n(&new_k->pdata,
> 		data,
> 		__ATOMIC_RELEASE);
[Wang, Yipeng] Actually do we need to guard pdata for the newly inserted key?  I thought the guard of key_idx already can make sure
The order for the application to read data. 
>+	/* Copy key */
>+	memcpy(new_k->key, key, h->key_len);
[Wang, Yipeng] You don't need to do the order change just to show the point of unnecessary ordering I think.
I am afraid it may cause further confusion for future people who read this change, especially it is not in the commit
Message (and it is a bug fix). 
>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] lib/hash: adjust tbl_chng_cnt position
  2019-06-25 21:15 ` [dpdk-dev] [PATCH 3/3] lib/hash: adjust tbl_chng_cnt position Honnappa Nagarahalli
@ 2019-06-28 18:50   ` Wang, Yipeng1
  2019-07-02 17:23     ` Honnappa Nagarahalli
  0 siblings, 1 reply; 23+ messages in thread
From: Wang, Yipeng1 @ 2019-06-28 18:50 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Gobriel, Sameh, Richardson, Bruce,
	De Lara Guarch, Pablo
  Cc: gavin.hu, ruifeng.wang, dev, nd, stable

>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:honnappa.nagarahalli@arm.com]
>Sent: Tuesday, June 25, 2019 2:15 PM
>To: Wang, Yipeng1 <yipeng1.wang@intel.com>; Gobriel, Sameh <sameh.gobriel@intel.com>; Richardson, Bruce
><bruce.richardson@intel.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; honnappa.nagarahalli@arm.com
>Cc: gavin.hu@arm.com; ruifeng.wang@arm.com; dev@dpdk.org; nd@arm.com; stable@dpdk.org
>Subject: [PATCH 3/3] lib/hash: adjust tbl_chng_cnt position
>
>tbl_chng_cnt is one of the first elements of the structure used in
>the lookup. Move it to the beginning of the cache line to gain
>performance.
>
>Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")
>Cc: stable@dpdk.org
>
>Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>Reviewed-by: Gavin Hu <gavin.hu@arm.com>
>Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
>---
> lib/librte_hash/rte_cuckoo_hash.h | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
>diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
>index fb19bb27d..af6451b5c 100644
>--- a/lib/librte_hash/rte_cuckoo_hash.h
>+++ b/lib/librte_hash/rte_cuckoo_hash.h
>@@ -170,7 +170,9 @@ struct rte_hash {
>
> 	/* Fields used in lookup */
>
>-	uint32_t key_len __rte_cache_aligned;
>+	uint32_t *tbl_chng_cnt __rte_cache_aligned;
>+	/**< Indicates if the hash table changed from last read. */
>+	uint32_t key_len;
> 	/**< Length of hash key. */
> 	uint8_t hw_trans_mem_support;
> 	/**< If hardware transactional memory is used. */
>@@ -218,8 +220,6 @@ struct rte_hash {
> 	 * is piggy-backed to freeing of the key index.
> 	 */
> 	uint32_t *ext_bkt_to_free;
>-	uint32_t *tbl_chng_cnt;
>-	/**< Indicates if the hash table changed from last read. */
> } __rte_cache_aligned;
>
> struct queue_node {
>--
>2.17.1

[Wang, Yipeng] 
I am not sure about this change. By moving counter to front, I think you seems push key_store out of the cache line. And key_store
Is also used in lookup (and more commonly).
My tests also show perf drop in many cases.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH 2/3] lib/hash: load pData after full key compare
  2019-06-28 18:40   ` Wang, Yipeng1
@ 2019-07-02  4:35     ` Honnappa Nagarahalli
  2019-07-04  0:39       ` Wang, Yipeng1
  0 siblings, 1 reply; 23+ messages in thread
From: Honnappa Nagarahalli @ 2019-07-02  4:35 UTC (permalink / raw)
  To: Wang, Yipeng1, Gobriel, Sameh, Richardson, Bruce, De Lara Guarch, Pablo
  Cc: Gavin Hu (Arm Technology China),
	Ruifeng Wang (Arm Technology China),
	dev, Honnappa Nagarahalli, nd, stable, nd

Thank you Yipeng for your comments.

> >
> >When a hash entry is added, there are 2 sets of stores.
> >
> >1) The application writes its data to memory (whose address is provided
> >in rte_hash_add_key_with_hash_data API (or NULL))
> >2) The rte_hash library writes to its own internal data structures; key
> >store entry and the hash table.
> >
> >The only ordering requirement between these 2 is that - the store to
> >the application data must complete before the store to key_index.
> >There are no ordering requirements between the stores to the
> >key/signature and store to application data. The synchronization point
> >for application data can be any point between the 'store to application
> >data' and 'store to the key_index'. So, pData should not be a guard
> >variable for the data in hash table. It should be a guard variable only
> >for the application data written to the memory location pointed by
> >pData. Hence, pData can be loaded after full key comparison.
> >
> >Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")
> >Cc: stable@dpdk.org
> >
> >Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> >Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> >Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
> >---
> > lib/librte_hash/rte_cuckoo_hash.c | 67 +++++++++++++++----------------
> > 1 file changed, 32 insertions(+), 35 deletions(-)
> >
> >diff --git a/lib/librte_hash/rte_cuckoo_hash.c
> >b/lib/librte_hash/rte_cuckoo_hash.c
> >index f37f6957d..077328fed 100644
> >--- a/lib/librte_hash/rte_cuckoo_hash.c
> >+++ b/lib/librte_hash/rte_cuckoo_hash.c
> >@@ -649,9 +649,11 @@ search_and_update(const struct rte_hash *h, void
> *data, const void *key,
> > 			k = (struct rte_hash_key *) ((char *)keys +
> > 					bkt->key_idx[i] * h->key_entry_size);
> > 			if (rte_hash_cmp_eq(key, k->key, h) == 0) {
> >-				/* 'pdata' acts as the synchronization point
> >-				 * when an existing hash entry is updated.
> >-				 * Key is not updated in this case.
> >+				/* The store to application data at *data
> >+				 * should not leak after the store to pdata
> >+				 * in the key store. i.e. pdata is the guard
> >+				 * variable. Release the application data
> >+				 * to the readers.
> > 				 */
> > 				__atomic_store_n(&k->pdata,
> > 					data,
> >@@ -711,11 +713,10 @@ rte_hash_cuckoo_insert_mw(const struct
> rte_hash *h,
> > 		/* Check if slot is available */
> > 		if (likely(prim_bkt->key_idx[i] == EMPTY_SLOT)) {
> > 			prim_bkt->sig_current[i] = sig;
> >-			/* Key can be of arbitrary length, so it is
> >-			 * not possible to store it atomically.
> >-			 * Hence the new key element's memory stores
> >-			 * (key as well as data) should be complete
> >-			 * before it is referenced.
> >+			/* Store to signature and key should not
> >+			 * leak after the store to key_idx. i.e.
> >+			 * key_idx is the guard variable for signature
> >+			 * and key.
> > 			 */
> > 			__atomic_store_n(&prim_bkt->key_idx[i],
> > 					 new_idx,
> >@@ -990,17 +991,15 @@ __rte_hash_add_key_with_hash(const struct
> >rte_hash *h, const void *key,
> >
> > 	new_k = RTE_PTR_ADD(keys, (uintptr_t)slot_id * h->key_entry_size);
> > 	new_idx = (uint32_t)((uintptr_t) slot_id);
> >-	/* Copy key */
> >-	memcpy(new_k->key, key, h->key_len);
> >-	/* Key can be of arbitrary length, so it is not possible to store
> >-	 * it atomically. Hence the new key element's memory stores
> >-	 * (key as well as data) should be complete before it is referenced.
> >-	 * 'pdata' acts as the synchronization point when an existing hash
> >-	 * entry is updated.
> >+	/* The store to application data (by the application) at *data should
> >+	 * not leak after the store of pdata in the key store. i.e. pdata is
> >+	 * the guard variable. Release the application data to the readers.
> > 	 */
> > 	__atomic_store_n(&new_k->pdata,
> > 		data,
> > 		__ATOMIC_RELEASE);
> [Wang, Yipeng] Actually do we need to guard pdata for the newly inserted
> key?  I thought the guard of key_idx already can make sure The order for the
> application to read data.
Yes, you are correct. In the hash_add case, the store-release on key_idx would be sufficient. However, hash_update case requires store-release on pData. This was the reason to keep store-release for pData in hash_add when the lock-free algorithm was introduced.

> >+	/* Copy key */
> >+	memcpy(new_k->key, key, h->key_len);
> [Wang, Yipeng] You don't need to do the order change just to show the point
> of unnecessary ordering I think.
> I am afraid it may cause further confusion for future people who read this
> change, especially it is not in the commit Message (and it is a bug fix).
I made this change to keep it inline with the corresponding change in the lookup function. I can add this explanation to the commit message. Please let me know if this is ok for you.

> >


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] lib/hash: adjust tbl_chng_cnt position
  2019-06-28 18:50   ` Wang, Yipeng1
@ 2019-07-02 17:23     ` Honnappa Nagarahalli
  2019-07-04  0:52       ` Wang, Yipeng1
  0 siblings, 1 reply; 23+ messages in thread
From: Honnappa Nagarahalli @ 2019-07-02 17:23 UTC (permalink / raw)
  To: Wang, Yipeng1, Gobriel, Sameh, Richardson, Bruce, De Lara Guarch, Pablo
  Cc: Gavin Hu (Arm Technology China),
	Ruifeng Wang (Arm Technology China),
	dev, Honnappa Nagarahalli, nd, stable, nd

<snip>

> >
> >tbl_chng_cnt is one of the first elements of the structure used in the
> >lookup. Move it to the beginning of the cache line to gain performance.
> >
> >Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")
> >Cc: stable@dpdk.org
> >
> >Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> >Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> >Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
> >---
> > lib/librte_hash/rte_cuckoo_hash.h | 6 +++---
> > 1 file changed, 3 insertions(+), 3 deletions(-)
> >
> >diff --git a/lib/librte_hash/rte_cuckoo_hash.h
> >b/lib/librte_hash/rte_cuckoo_hash.h
> >index fb19bb27d..af6451b5c 100644
> >--- a/lib/librte_hash/rte_cuckoo_hash.h
> >+++ b/lib/librte_hash/rte_cuckoo_hash.h
> >@@ -170,7 +170,9 @@ struct rte_hash {
> >
> > 	/* Fields used in lookup */
> >
> >-	uint32_t key_len __rte_cache_aligned;
> >+	uint32_t *tbl_chng_cnt __rte_cache_aligned;
> >+	/**< Indicates if the hash table changed from last read. */
> >+	uint32_t key_len;
> > 	/**< Length of hash key. */
> > 	uint8_t hw_trans_mem_support;
> > 	/**< If hardware transactional memory is used. */ @@ -218,8 +220,6
> @@
> >struct rte_hash {
> > 	 * is piggy-backed to freeing of the key index.
> > 	 */
> > 	uint32_t *ext_bkt_to_free;
> >-	uint32_t *tbl_chng_cnt;
> >-	/**< Indicates if the hash table changed from last read. */
> > } __rte_cache_aligned;
> >
> > struct queue_node {
> >--
> >2.17.1
> 
> [Wang, Yipeng]
> I am not sure about this change. By moving counter to front, I think you
> seems push key_store out of the cache line. And key_store Is also used in
> lookup (and more commonly).
> My tests also show perf drop in many cases.
I ran hash_readwrite_lf tests and L3 fwd application. Both of them showed improvements for both lock-free and using locks for Arm platforms (L3 fwd was not run on x86). Which tests are resulting in performance drops for you?

But, I do agree that this work is not complete. We can drop this patch and take this up separately.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [dpdk-dev] [PATCH v2 0/2] lib/hash: perf improvements for lock-free
  2019-06-25 21:15 [dpdk-dev] [PATCH 0/3] lib/hash: perf improvements for lock-free Honnappa Nagarahalli
                   ` (2 preceding siblings ...)
  2019-06-25 21:15 ` [dpdk-dev] [PATCH 3/3] lib/hash: adjust tbl_chng_cnt position Honnappa Nagarahalli
@ 2019-07-02 21:16 ` Honnappa Nagarahalli
  2019-07-02 21:16   ` [dpdk-dev] [PATCH v2 1/2] lib/hash: use ordered loads only if signature matches Honnappa Nagarahalli
                     ` (3 more replies)
  3 siblings, 4 replies; 23+ messages in thread
From: Honnappa Nagarahalli @ 2019-07-02 21:16 UTC (permalink / raw)
  To: yipeng1.wang, sameh.gobriel, bruce.richardson,
	pablo.de.lara.guarch, honnappa.nagarahalli
  Cc: gavin.hu, ruifeng.wang, dev, nd

While using the rte_hash library, there are 2 sets of stores that
happen.

1) The application writes its data to memory (whose address
is provided in rte_hash_add_key_with_hash_data API (or NULL))
2) The rte_hash library writes to its own internal data structures;
key store entry and the hash table.

The data from both of these stores is available to the readers only
after the index of the key store entry (key_index) is written in the
hash buckets by the library. So, key_index can act as the guard variable
for both of these writes.

When rte_hash_add_key_with_hash_data is called to update an existing entry,
the key_index is not written. But, the store to the application data
must complete before the address of the application data (pData) is
updated in the key store entry. So, pData alone acts as the guard variable
for this case.

However, it should be noted that there are no ordering requirements
between 1) and 2), except for one requirement - the store to the
application data must complete before the store to key_index.
In other words, there are no ordering requirements between the stores to
the key store entry/signature and store to application data. So, the
synchronization point for application data can be any point between
the 'store to application data' and 'store to the key_index'.

The first patch in this series moves the signature comparison before the
load-acquire of the key_index. This does not result in any issues because
of the full key comparison which is done after the load-acquire of
the key_index.
Performance improvements:
Lookup Hit: 6.16%
Lookup Miss: 8.54%

The second patch in this series, moves the store-release of pData
before the store to any hash internal data structures. This is not
necessary, but just helps to show the non-dependency between application
data and hash table data. On the reader side, the pData is loaded
only if the keys match, this provides performance benefits.
Performance improvements (with patch 1):
Lookup Hit: 6.25%
Lookup Miss: 13.97%

v2
 - Dropped moving the tbl_chng_cnt to the beginning of the cache line
   commit
 - Changed the commit log for patch 1 to indicate that it improves
   performance (Yipeng)
 - Changed the comment in the search_one_bucket_lf function (Yipeng)
 - Changed the commit log for patch2 to indicate that changes to
   store-release of 'pdata' is cosmetic (Yipeng)

Honnappa Nagarahalli (2):
  lib/hash: use ordered loads only if signature matches
  lib/hash: load pData after full key compare

 lib/librte_hash/rte_cuckoo_hash.c | 98 ++++++++++++++++---------------
 1 file changed, 52 insertions(+), 46 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [dpdk-dev] [PATCH v2 1/2] lib/hash: use ordered loads only if signature matches
  2019-07-02 21:16 ` [dpdk-dev] [PATCH v2 0/2] lib/hash: perf improvements for lock-free Honnappa Nagarahalli
@ 2019-07-02 21:16   ` Honnappa Nagarahalli
  2019-07-02 21:16   ` [dpdk-dev] [PATCH v2 2/2] lib/hash: load pData after full key compare Honnappa Nagarahalli
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 23+ messages in thread
From: Honnappa Nagarahalli @ 2019-07-02 21:16 UTC (permalink / raw)
  To: yipeng1.wang, sameh.gobriel, bruce.richardson,
	pablo.de.lara.guarch, honnappa.nagarahalli
  Cc: gavin.hu, ruifeng.wang, dev, nd, stable

Relaxed signature comparison is done first. Further ordered loads
are done only if the signature matches. Any false positives are
caught by the full key comparison. This provides performance
benefits as load-acquire is executed only when required.

Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")
Cc: stable@dpdk.org

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
---
 lib/librte_hash/rte_cuckoo_hash.c | 37 +++++++++++++++++++------------
 1 file changed, 23 insertions(+), 14 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index 953928f27..0e042d924 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -1188,22 +1188,31 @@ search_one_bucket_lf(const struct rte_hash *h, const void *key, uint16_t sig,
 	struct rte_hash_key *k, *keys = h->key_store;
 
 	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
-		key_idx = __atomic_load_n(&bkt->key_idx[i],
+		/* Signature comparison is done before the acquire-load
+		 * of the key index to achieve better performance.
+		 * This can result in the reader loading old signature
+		 * (which matches), while the key_idx is updated to a
+		 * value that belongs to a new key. However, the full
+		 * key comparison will ensure that the lookup fails.
+		 */
+		if (bkt->sig_current[i] == sig) {
+			key_idx = __atomic_load_n(&bkt->key_idx[i],
 					  __ATOMIC_ACQUIRE);
-		if (bkt->sig_current[i] == sig && key_idx != EMPTY_SLOT) {
-			k = (struct rte_hash_key *) ((char *)keys +
-					key_idx * h->key_entry_size);
-			pdata = __atomic_load_n(&k->pdata,
-					__ATOMIC_ACQUIRE);
+			if (key_idx != EMPTY_SLOT) {
+				k = (struct rte_hash_key *) ((char *)keys +
+						key_idx * h->key_entry_size);
+				pdata = __atomic_load_n(&k->pdata,
+						__ATOMIC_ACQUIRE);
 
-			if (rte_hash_cmp_eq(key, k->key, h) == 0) {
-				if (data != NULL)
-					*data = pdata;
-				/*
-				 * Return index where key is stored,
-				 * subtracting the first dummy index
-				 */
-				return key_idx - 1;
+				if (rte_hash_cmp_eq(key, k->key, h) == 0) {
+					if (data != NULL)
+						*data = pdata;
+					/*
+					 * Return index where key is stored,
+					 * subtracting the first dummy index
+					 */
+					return key_idx - 1;
+				}
 			}
 		}
 	}
-- 
2.17.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [dpdk-dev] [PATCH v2 2/2] lib/hash: load pData after full key compare
  2019-07-02 21:16 ` [dpdk-dev] [PATCH v2 0/2] lib/hash: perf improvements for lock-free Honnappa Nagarahalli
  2019-07-02 21:16   ` [dpdk-dev] [PATCH v2 1/2] lib/hash: use ordered loads only if signature matches Honnappa Nagarahalli
@ 2019-07-02 21:16   ` Honnappa Nagarahalli
  2019-07-04 11:13   ` [dpdk-dev] [PATCH v2 0/2] lib/hash: perf improvements for lock-free Jerin Jacob Kollanukkaran
  2019-07-04 16:09   ` Thomas Monjalon
  3 siblings, 0 replies; 23+ messages in thread
From: Honnappa Nagarahalli @ 2019-07-02 21:16 UTC (permalink / raw)
  To: yipeng1.wang, sameh.gobriel, bruce.richardson,
	pablo.de.lara.guarch, honnappa.nagarahalli
  Cc: gavin.hu, ruifeng.wang, dev, nd, stable

When a hash entry is added, there are 2 sets of stores.

1) The application writes its data to memory (whose address
is provided in rte_hash_add_key_with_hash_data API (or NULL))
2) The rte_hash library writes to its own internal data structures;
key store entry and the hash table.

The only ordering requirement between these 2 is that - store
to the application data must complete before the store to key_index.
There are no ordering requirements between the stores to
key/signature and store to application data. The synchronization
point for application data can be any point between the 'store to
application data' and 'store to the key_index'. So, 'pdata' should not
be a guard variable for the data in hash table. It should be a guard
variable only for the application data written to the memory location
pointed by 'pdata'. Hence, in the lookup functions, 'pdata' can be
loaded after full key comparison succeeds.

The synchronization point for the application data (store-release
to 'pdata' in key store) is changed to be consistent with the order
of loads in lookup function. However, this change is cosmetic and
does not affect the functionality.

Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")
Cc: stable@dpdk.org

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_hash/rte_cuckoo_hash.c | 67 +++++++++++++++----------------
 1 file changed, 32 insertions(+), 35 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index 0e042d924..55c5c1b8a 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -649,9 +649,11 @@ search_and_update(const struct rte_hash *h, void *data, const void *key,
 			k = (struct rte_hash_key *) ((char *)keys +
 					bkt->key_idx[i] * h->key_entry_size);
 			if (rte_hash_cmp_eq(key, k->key, h) == 0) {
-				/* 'pdata' acts as the synchronization point
-				 * when an existing hash entry is updated.
-				 * Key is not updated in this case.
+				/* The store to application data at *data
+				 * should not leak after the store to pdata
+				 * in the key store. i.e. pdata is the guard
+				 * variable. Release the application data
+				 * to the readers.
 				 */
 				__atomic_store_n(&k->pdata,
 					data,
@@ -711,11 +713,10 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
 		/* Check if slot is available */
 		if (likely(prim_bkt->key_idx[i] == EMPTY_SLOT)) {
 			prim_bkt->sig_current[i] = sig;
-			/* Key can be of arbitrary length, so it is
-			 * not possible to store it atomically.
-			 * Hence the new key element's memory stores
-			 * (key as well as data) should be complete
-			 * before it is referenced.
+			/* Store to signature and key should not
+			 * leak after the store to key_idx. i.e.
+			 * key_idx is the guard variable for signature
+			 * and key.
 			 */
 			__atomic_store_n(&prim_bkt->key_idx[i],
 					 new_idx,
@@ -990,17 +991,15 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	new_k = RTE_PTR_ADD(keys, (uintptr_t)slot_id * h->key_entry_size);
 	new_idx = (uint32_t)((uintptr_t) slot_id);
-	/* Copy key */
-	memcpy(new_k->key, key, h->key_len);
-	/* Key can be of arbitrary length, so it is not possible to store
-	 * it atomically. Hence the new key element's memory stores
-	 * (key as well as data) should be complete before it is referenced.
-	 * 'pdata' acts as the synchronization point when an existing hash
-	 * entry is updated.
+	/* The store to application data (by the application) at *data should
+	 * not leak after the store of pdata in the key store. i.e. pdata is
+	 * the guard variable. Release the application data to the readers.
 	 */
 	__atomic_store_n(&new_k->pdata,
 		data,
 		__ATOMIC_RELEASE);
+	/* Copy key */
+	memcpy(new_k->key, key, h->key_len);
 
 	/* Find an empty slot and insert */
 	ret = rte_hash_cuckoo_insert_mw(h, prim_bkt, sec_bkt, key, data,
@@ -1064,8 +1063,10 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 			/* Check if slot is available */
 			if (likely(cur_bkt->key_idx[i] == EMPTY_SLOT)) {
 				cur_bkt->sig_current[i] = short_sig;
-				/* Store to signature should not leak after
-				 * the store to key_idx
+				/* Store to signature and key should not
+				 * leak after the store to key_idx. i.e.
+				 * key_idx is the guard variable for signature
+				 * and key.
 				 */
 				__atomic_store_n(&cur_bkt->key_idx[i],
 						 new_idx,
@@ -1087,8 +1088,9 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	bkt_id = (uint32_t)((uintptr_t)ext_bkt_id) - 1;
 	/* Use the first location of the new bucket */
 	(h->buckets_ext[bkt_id]).sig_current[0] = short_sig;
-	/* Store to signature should not leak after
-	 * the store to key_idx
+	/* Store to signature and key should not leak after
+	 * the store to key_idx. i.e. key_idx is the guard variable
+	 * for signature and key.
 	 */
 	__atomic_store_n(&(h->buckets_ext[bkt_id]).key_idx[0],
 			 new_idx,
@@ -1184,7 +1186,6 @@ search_one_bucket_lf(const struct rte_hash *h, const void *key, uint16_t sig,
 {
 	int i;
 	uint32_t key_idx;
-	void *pdata;
 	struct rte_hash_key *k, *keys = h->key_store;
 
 	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
@@ -1201,12 +1202,13 @@ search_one_bucket_lf(const struct rte_hash *h, const void *key, uint16_t sig,
 			if (key_idx != EMPTY_SLOT) {
 				k = (struct rte_hash_key *) ((char *)keys +
 						key_idx * h->key_entry_size);
-				pdata = __atomic_load_n(&k->pdata,
-						__ATOMIC_ACQUIRE);
 
 				if (rte_hash_cmp_eq(key, k->key, h) == 0) {
-					if (data != NULL)
-						*data = pdata;
+					if (data != NULL) {
+						*data = __atomic_load_n(
+							&k->pdata,
+							__ATOMIC_ACQUIRE);
+					}
 					/*
 					 * Return index where key is stored,
 					 * subtracting the first dummy index
@@ -1904,7 +1906,6 @@ __rte_hash_lookup_bulk_lf(const struct rte_hash *h, const void **keys,
 	uint32_t prim_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
 	uint32_t sec_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
 	struct rte_hash_bucket *cur_bkt, *next_bkt;
-	void *pdata[RTE_HASH_LOOKUP_BULK_MAX];
 	uint32_t cnt_b, cnt_a;
 
 	/* Prefetch first keys */
@@ -2006,10 +2007,6 @@ __rte_hash_lookup_bulk_lf(const struct rte_hash *h, const void **keys,
 					(const char *)h->key_store +
 					key_idx * h->key_entry_size);
 
-				if (key_idx != EMPTY_SLOT)
-					pdata[i] = __atomic_load_n(
-							&key_slot->pdata,
-							__ATOMIC_ACQUIRE);
 				/*
 				 * If key index is 0, do not compare key,
 				 * as it is checking the dummy slot
@@ -2018,7 +2015,9 @@ __rte_hash_lookup_bulk_lf(const struct rte_hash *h, const void **keys,
 					!rte_hash_cmp_eq(
 						key_slot->key, keys[i], h)) {
 					if (data != NULL)
-						data[i] = pdata[i];
+						data[i] = __atomic_load_n(
+							&key_slot->pdata,
+							__ATOMIC_ACQUIRE);
 
 					hits |= 1ULL << i;
 					positions[i] = key_idx - 1;
@@ -2040,10 +2039,6 @@ __rte_hash_lookup_bulk_lf(const struct rte_hash *h, const void **keys,
 					(const char *)h->key_store +
 					key_idx * h->key_entry_size);
 
-				if (key_idx != EMPTY_SLOT)
-					pdata[i] = __atomic_load_n(
-							&key_slot->pdata,
-							__ATOMIC_ACQUIRE);
 				/*
 				 * If key index is 0, do not compare key,
 				 * as it is checking the dummy slot
@@ -2053,7 +2048,9 @@ __rte_hash_lookup_bulk_lf(const struct rte_hash *h, const void **keys,
 					!rte_hash_cmp_eq(
 						key_slot->key, keys[i], h)) {
 					if (data != NULL)
-						data[i] = pdata[i];
+						data[i] = __atomic_load_n(
+							&key_slot->pdata,
+							__ATOMIC_ACQUIRE);
 
 					hits |= 1ULL << i;
 					positions[i] = key_idx - 1;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH 2/3] lib/hash: load pData after full key compare
  2019-07-02  4:35     ` Honnappa Nagarahalli
@ 2019-07-04  0:39       ` Wang, Yipeng1
  2019-07-05  5:33         ` Honnappa Nagarahalli
  0 siblings, 1 reply; 23+ messages in thread
From: Wang, Yipeng1 @ 2019-07-04  0:39 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Gobriel, Sameh, Richardson, Bruce,
	De Lara Guarch, Pablo
  Cc: Gavin Hu (Arm Technology China),
	Ruifeng Wang (Arm Technology China),
	dev, nd, stable, nd

>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
>Sent: Monday, July 1, 2019 9:35 PM
>To: Wang, Yipeng1 <yipeng1.wang@intel.com>; Gobriel, Sameh <sameh.gobriel@intel.com>; Richardson, Bruce
><bruce.richardson@intel.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
>Cc: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; Ruifeng Wang (Arm Technology China) <Ruifeng.Wang@arm.com>;
>dev@dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>; stable@dpdk.org; nd <nd@arm.com>
>Subject: RE: [PATCH 2/3] lib/hash: load pData after full key compare
>
>Thank you Yipeng for your comments.
>
>> >
>> >When a hash entry is added, there are 2 sets of stores.
>> >
>> >1) The application writes its data to memory (whose address is provided
>> >in rte_hash_add_key_with_hash_data API (or NULL))
>> >2) The rte_hash library writes to its own internal data structures; key
>> >store entry and the hash table.
>> >
>> >The only ordering requirement between these 2 is that - the store to
>> >the application data must complete before the store to key_index.
>> >There are no ordering requirements between the stores to the
>> >key/signature and store to application data. The synchronization point
>> >for application data can be any point between the 'store to application
>> >data' and 'store to the key_index'. So, pData should not be a guard
>> >variable for the data in hash table. It should be a guard variable only
>> >for the application data written to the memory location pointed by
>> >pData. Hence, pData can be loaded after full key comparison.
>> >
>> >Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")
>> >Cc: stable@dpdk.org
>> >
>> >Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>> >Reviewed-by: Gavin Hu <gavin.hu@arm.com>
>> >Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
>> >---
>> > lib/librte_hash/rte_cuckoo_hash.c | 67 +++++++++++++++----------------
>> > 1 file changed, 32 insertions(+), 35 deletions(-)
>> >
>> >diff --git a/lib/librte_hash/rte_cuckoo_hash.c
>> >b/lib/librte_hash/rte_cuckoo_hash.c
>> >index f37f6957d..077328fed 100644
>> >--- a/lib/librte_hash/rte_cuckoo_hash.c
>> >+++ b/lib/librte_hash/rte_cuckoo_hash.c
>> >@@ -649,9 +649,11 @@ search_and_update(const struct rte_hash *h, void
>> *data, const void *key,
>> > 			k = (struct rte_hash_key *) ((char *)keys +
>> > 					bkt->key_idx[i] * h->key_entry_size);
>> > 			if (rte_hash_cmp_eq(key, k->key, h) == 0) {
>> >-				/* 'pdata' acts as the synchronization point
>> >-				 * when an existing hash entry is updated.
>> >-				 * Key is not updated in this case.
>> >+				/* The store to application data at *data
>> >+				 * should not leak after the store to pdata
>> >+				 * in the key store. i.e. pdata is the guard
>> >+				 * variable. Release the application data
>> >+				 * to the readers.
>> > 				 */
>> > 				__atomic_store_n(&k->pdata,
>> > 					data,
>> >@@ -711,11 +713,10 @@ rte_hash_cuckoo_insert_mw(const struct
>> rte_hash *h,
>> > 		/* Check if slot is available */
>> > 		if (likely(prim_bkt->key_idx[i] == EMPTY_SLOT)) {
>> > 			prim_bkt->sig_current[i] = sig;
>> >-			/* Key can be of arbitrary length, so it is
>> >-			 * not possible to store it atomically.
>> >-			 * Hence the new key element's memory stores
>> >-			 * (key as well as data) should be complete
>> >-			 * before it is referenced.
>> >+			/* Store to signature and key should not
>> >+			 * leak after the store to key_idx. i.e.
>> >+			 * key_idx is the guard variable for signature
>> >+			 * and key.
>> > 			 */
>> > 			__atomic_store_n(&prim_bkt->key_idx[i],
>> > 					 new_idx,
>> >@@ -990,17 +991,15 @@ __rte_hash_add_key_with_hash(const struct
>> >rte_hash *h, const void *key,
>> >
>> > 	new_k = RTE_PTR_ADD(keys, (uintptr_t)slot_id * h->key_entry_size);
>> > 	new_idx = (uint32_t)((uintptr_t) slot_id);
>> >-	/* Copy key */
>> >-	memcpy(new_k->key, key, h->key_len);
>> >-	/* Key can be of arbitrary length, so it is not possible to store
>> >-	 * it atomically. Hence the new key element's memory stores
>> >-	 * (key as well as data) should be complete before it is referenced.
>> >-	 * 'pdata' acts as the synchronization point when an existing hash
>> >-	 * entry is updated.
>> >+	/* The store to application data (by the application) at *data should
>> >+	 * not leak after the store of pdata in the key store. i.e. pdata is
>> >+	 * the guard variable. Release the application data to the readers.
>> > 	 */
>> > 	__atomic_store_n(&new_k->pdata,
>> > 		data,
>> > 		__ATOMIC_RELEASE);
>> [Wang, Yipeng] Actually do we need to guard pdata for the newly inserted
>> key?  I thought the guard of key_idx already can make sure The order for the
>> application to read data.
>Yes, you are correct. In the hash_add case, the store-release on key_idx would be sufficient. However, hash_update case requires
>store-release on pData. This was the reason to keep store-release for pData in hash_add when the lock-free algorithm was
>introduced.

[Wang, Yipeng] Sorry that I am still a bit confused, we already have store release in search_and_update function right? Isn't that enough
for the hash_update case?
>
>> >+	/* Copy key */
>> >+	memcpy(new_k->key, key, h->key_len);
>> [Wang, Yipeng] You don't need to do the order change just to show the point
>> of unnecessary ordering I think.
>> I am afraid it may cause further confusion for future people who read this
>> change, especially it is not in the commit Message (and it is a bug fix).
>I made this change to keep it inline with the corresponding change in the lookup function. I can add this explanation to the commit
>message. Please let me know if this is ok for you.

[Wang, Yipeng] Thanks for the change.
To me it still looks unnecessary but If you think this cosmetic change would help others to understand the code better, I am OK with it.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] lib/hash: adjust tbl_chng_cnt position
  2019-07-02 17:23     ` Honnappa Nagarahalli
@ 2019-07-04  0:52       ` Wang, Yipeng1
  0 siblings, 0 replies; 23+ messages in thread
From: Wang, Yipeng1 @ 2019-07-04  0:52 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Gobriel, Sameh, Richardson, Bruce,
	De Lara Guarch, Pablo
  Cc: Gavin Hu (Arm Technology China),
	Ruifeng Wang (Arm Technology China),
	dev, nd, stable, nd

>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
>Sent: Tuesday, July 2, 2019 10:23 AM
>To: Wang, Yipeng1 <yipeng1.wang@intel.com>; Gobriel, Sameh <sameh.gobriel@intel.com>; Richardson, Bruce
><bruce.richardson@intel.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
>Cc: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; Ruifeng Wang (Arm Technology China) <Ruifeng.Wang@arm.com>;
>dev@dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>; stable@dpdk.org; nd <nd@arm.com>
>Subject: RE: [PATCH 3/3] lib/hash: adjust tbl_chng_cnt position
>
><snip>
>
>> >
>> >tbl_chng_cnt is one of the first elements of the structure used in the
>> >lookup. Move it to the beginning of the cache line to gain performance.
>> >
>> >Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")
>> >Cc: stable@dpdk.org
>> >
>> >Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>> >Reviewed-by: Gavin Hu <gavin.hu@arm.com>
>> >Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
>> >---
>> > lib/librte_hash/rte_cuckoo_hash.h | 6 +++---
>> > 1 file changed, 3 insertions(+), 3 deletions(-)
>> >
>> >diff --git a/lib/librte_hash/rte_cuckoo_hash.h
>> >b/lib/librte_hash/rte_cuckoo_hash.h
>> >index fb19bb27d..af6451b5c 100644
>> >--- a/lib/librte_hash/rte_cuckoo_hash.h
>> >+++ b/lib/librte_hash/rte_cuckoo_hash.h
>> >@@ -170,7 +170,9 @@ struct rte_hash {
>> >
>> > 	/* Fields used in lookup */
>> >
>> >-	uint32_t key_len __rte_cache_aligned;
>> >+	uint32_t *tbl_chng_cnt __rte_cache_aligned;
>> >+	/**< Indicates if the hash table changed from last read. */
>> >+	uint32_t key_len;
>> > 	/**< Length of hash key. */
>> > 	uint8_t hw_trans_mem_support;
>> > 	/**< If hardware transactional memory is used. */ @@ -218,8 +220,6
>> @@
>> >struct rte_hash {
>> > 	 * is piggy-backed to freeing of the key index.
>> > 	 */
>> > 	uint32_t *ext_bkt_to_free;
>> >-	uint32_t *tbl_chng_cnt;
>> >-	/**< Indicates if the hash table changed from last read. */
>> > } __rte_cache_aligned;
>> >
>> > struct queue_node {
>> >--
>> >2.17.1
>>
>> [Wang, Yipeng]
>> I am not sure about this change. By moving counter to front, I think you
>> seems push key_store out of the cache line. And key_store Is also used in
>> lookup (and more commonly).
>> My tests also show perf drop in many cases.
>I ran hash_readwrite_lf tests and L3 fwd application. Both of them showed improvements for both lock-free and using locks for Arm
>platforms (L3 fwd was not run on x86). Which tests are resulting in performance drops for you?
>
>But, I do agree that this work is not complete. We can drop this patch and take this up separately.
[Wang, Yipeng] 
I run the LF test on x86. For 1 reader case it seems a clear improvement for lookup without key-shifts. Other results are a mixed bag even
for lock-free. For most HTM it is a drop.

My opinion is to delay similar fine tuning effort tied to architecture specs. 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH v2 0/2] lib/hash: perf improvements for lock-free
  2019-07-02 21:16 ` [dpdk-dev] [PATCH v2 0/2] lib/hash: perf improvements for lock-free Honnappa Nagarahalli
  2019-07-02 21:16   ` [dpdk-dev] [PATCH v2 1/2] lib/hash: use ordered loads only if signature matches Honnappa Nagarahalli
  2019-07-02 21:16   ` [dpdk-dev] [PATCH v2 2/2] lib/hash: load pData after full key compare Honnappa Nagarahalli
@ 2019-07-04 11:13   ` Jerin Jacob Kollanukkaran
  2019-07-05  6:08     ` Honnappa Nagarahalli
  2019-07-04 16:09   ` Thomas Monjalon
  3 siblings, 1 reply; 23+ messages in thread
From: Jerin Jacob Kollanukkaran @ 2019-07-04 11:13 UTC (permalink / raw)
  To: Honnappa Nagarahalli, yipeng1.wang, sameh.gobriel,
	bruce.richardson, pablo.de.lara.guarch
  Cc: gavin.hu, ruifeng.wang, dev, nd

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Honnappa Nagarahalli
> Sent: Wednesday, July 3, 2019 2:47 AM
> To: yipeng1.wang@intel.com; sameh.gobriel@intel.com;
> bruce.richardson@intel.com; pablo.de.lara.guarch@intel.com;
> honnappa.nagarahalli@arm.com
> Cc: gavin.hu@arm.com; ruifeng.wang@arm.com; dev@dpdk.org;
> nd@arm.com
> Subject: [dpdk-dev] [PATCH v2 0/2] lib/hash: perf improvements for lock-
> free

[Snip]

> The first patch in this series moves the signature comparison before the load-
> acquire of the key_index. This does not result in any issues because of the
> full key comparison which is done after the load-acquire of the key_index.
> Performance improvements:
> Lookup Hit: 6.16%
> Lookup Miss: 8.54%
> 
> The second patch in this series, moves the store-release of pData before the
> store to any hash internal data structures. This is not necessary, but just
> helps to show the non-dependency between application data and hash table
> data. On the reader side, the pData is loaded only if the keys match, this
> provides performance benefits.
> Performance improvements (with patch 1):
> Lookup Hit: 6.25%
> Lookup Miss: 13.97%

Could you share the commands/data to test this specific performance measurement?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH v2 0/2] lib/hash: perf improvements for lock-free
  2019-07-02 21:16 ` [dpdk-dev] [PATCH v2 0/2] lib/hash: perf improvements for lock-free Honnappa Nagarahalli
                     ` (2 preceding siblings ...)
  2019-07-04 11:13   ` [dpdk-dev] [PATCH v2 0/2] lib/hash: perf improvements for lock-free Jerin Jacob Kollanukkaran
@ 2019-07-04 16:09   ` Thomas Monjalon
  2019-07-05  6:14     ` Honnappa Nagarahalli
  2019-07-08 16:51     ` Wang, Yipeng1
  3 siblings, 2 replies; 23+ messages in thread
From: Thomas Monjalon @ 2019-07-04 16:09 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: dev, yipeng1.wang, sameh.gobriel, bruce.richardson,
	pablo.de.lara.guarch, gavin.hu, ruifeng.wang, nd

02/07/2019 23:16, Honnappa Nagarahalli:
> v2
>  - Dropped moving the tbl_chng_cnt to the beginning of the cache line
>    commit
>  - Changed the commit log for patch 1 to indicate that it improves
>    performance (Yipeng)
>  - Changed the comment in the search_one_bucket_lf function (Yipeng)
>  - Changed the commit log for patch2 to indicate that changes to
>    store-release of 'pdata' is cosmetic (Yipeng)
> 
> Honnappa Nagarahalli (2):
>   lib/hash: use ordered loads only if signature matches
>   lib/hash: load pData after full key compare

This series is missing 19.08 because of a lack of review.
It was expected because it was sent late in 19.08 cycle,
so this is just to make the status clear: it will be considered for 19.11.

PS: please think about --in-reply-to when sending new versions.




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH 2/3] lib/hash: load pData after full key compare
  2019-07-04  0:39       ` Wang, Yipeng1
@ 2019-07-05  5:33         ` Honnappa Nagarahalli
  2019-07-08 16:44           ` Wang, Yipeng1
  0 siblings, 1 reply; 23+ messages in thread
From: Honnappa Nagarahalli @ 2019-07-05  5:33 UTC (permalink / raw)
  To: Wang, Yipeng1, Gobriel, Sameh, Richardson, Bruce, De Lara Guarch, Pablo
  Cc: Gavin Hu (Arm Technology China),
	Ruifeng Wang (Arm Technology China),
	dev, nd, stable, nd, nd

> >Subject: RE: [PATCH 2/3] lib/hash: load pData after full key compare
> >
> >Thank you Yipeng for your comments.
> >
> >> >
> >> >When a hash entry is added, there are 2 sets of stores.
> >> >
> >> >1) The application writes its data to memory (whose address is
> >> >provided in rte_hash_add_key_with_hash_data API (or NULL))
> >> >2) The rte_hash library writes to its own internal data structures;
> >> >key store entry and the hash table.
> >> >
> >> >The only ordering requirement between these 2 is that - the store to
> >> >the application data must complete before the store to key_index.
> >> >There are no ordering requirements between the stores to the
> >> >key/signature and store to application data. The synchronization
> >> >point for application data can be any point between the 'store to
> >> >application data' and 'store to the key_index'. So, pData should not
> >> >be a guard variable for the data in hash table. It should be a guard
> >> >variable only for the application data written to the memory
> >> >location pointed by pData. Hence, pData can be loaded after full key
> comparison.
> >> >
> >> >Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")
> >> >Cc: stable@dpdk.org
> >> >
> >> >Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> >> >Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> >> >Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
> >> >---
> >> > lib/librte_hash/rte_cuckoo_hash.c | 67
> >> >+++++++++++++++----------------
> >> > 1 file changed, 32 insertions(+), 35 deletions(-)
> >> >
> >> >diff --git a/lib/librte_hash/rte_cuckoo_hash.c
> >> >b/lib/librte_hash/rte_cuckoo_hash.c
> >> >index f37f6957d..077328fed 100644
> >> >--- a/lib/librte_hash/rte_cuckoo_hash.c
> >> >+++ b/lib/librte_hash/rte_cuckoo_hash.c
> >> >@@ -649,9 +649,11 @@ search_and_update(const struct rte_hash *h,
> >> >void
> >> *data, const void *key,
> >> > 			k = (struct rte_hash_key *) ((char *)keys +
> >> > 					bkt->key_idx[i] * h->key_entry_size);
> >> > 			if (rte_hash_cmp_eq(key, k->key, h) == 0) {
> >> >-				/* 'pdata' acts as the synchronization point
> >> >-				 * when an existing hash entry is updated.
> >> >-				 * Key is not updated in this case.
> >> >+				/* The store to application data at *data
> >> >+				 * should not leak after the store to pdata
> >> >+				 * in the key store. i.e. pdata is the guard
> >> >+				 * variable. Release the application data
> >> >+				 * to the readers.
> >> > 				 */
> >> > 				__atomic_store_n(&k->pdata,
> >> > 					data,
> >> >@@ -711,11 +713,10 @@ rte_hash_cuckoo_insert_mw(const struct
> >> rte_hash *h,
> >> > 		/* Check if slot is available */
> >> > 		if (likely(prim_bkt->key_idx[i] == EMPTY_SLOT)) {
> >> > 			prim_bkt->sig_current[i] = sig;
> >> >-			/* Key can be of arbitrary length, so it is
> >> >-			 * not possible to store it atomically.
> >> >-			 * Hence the new key element's memory stores
> >> >-			 * (key as well as data) should be complete
> >> >-			 * before it is referenced.
> >> >+			/* Store to signature and key should not
> >> >+			 * leak after the store to key_idx. i.e.
> >> >+			 * key_idx is the guard variable for signature
> >> >+			 * and key.
> >> > 			 */
> >> > 			__atomic_store_n(&prim_bkt->key_idx[i],
> >> > 					 new_idx,
> >> >@@ -990,17 +991,15 @@ __rte_hash_add_key_with_hash(const struct
> >> >rte_hash *h, const void *key,
> >> >
> >> > 	new_k = RTE_PTR_ADD(keys, (uintptr_t)slot_id * h->key_entry_size);
> >> > 	new_idx = (uint32_t)((uintptr_t) slot_id);
> >> >-	/* Copy key */
> >> >-	memcpy(new_k->key, key, h->key_len);
> >> >-	/* Key can be of arbitrary length, so it is not possible to store
> >> >-	 * it atomically. Hence the new key element's memory stores
> >> >-	 * (key as well as data) should be complete before it is referenced.
> >> >-	 * 'pdata' acts as the synchronization point when an existing hash
> >> >-	 * entry is updated.
> >> >+	/* The store to application data (by the application) at *data should
> >> >+	 * not leak after the store of pdata in the key store. i.e. pdata is
> >> >+	 * the guard variable. Release the application data to the readers.
> >> > 	 */
> >> > 	__atomic_store_n(&new_k->pdata,
> >> > 		data,
> >> > 		__ATOMIC_RELEASE);
> >> [Wang, Yipeng] Actually do we need to guard pdata for the newly
> >> inserted key?  I thought the guard of key_idx already can make sure
> >> The order for the application to read data.
> >Yes, you are correct. In the hash_add case, the store-release on
> >key_idx would be sufficient. However, hash_update case requires
> >store-release on pData. This was the reason to keep store-release for pData
> in hash_add when the lock-free algorithm was introduced.
> 
> [Wang, Yipeng] Sorry that I am still a bit confused, we already have store
> release in search_and_update function right? Isn't that enough for the
> hash_update case?
No problem, looks like I did not use the correct terms. We are talking about 2 paths in the code:
1) When a new key is getting inserted, store-release of key_idx acts as the guard variable for the store to application data as well as the stores to internal hash table structures (signature, key, pdata).
2) But when an existing key is updated, there is no store to key_idx. In this case 'pdata' acts as the guard variable for the store to application data. Hence we need a store-release on 'pdata'. Due to this we need a load-acquire on 'pdata' in the lookup function.

Then, why do we need store-release on 'pdata' in code path 1? - store-release on 'pdata' in code path 1 is done for consistency with code path 2 i.e. we want to use store-release on 'pdata' consistently in both the code paths (unless we see performance degradation in path 1). IMO, it is much easier to understand the code this way.

> >
> >> >+	/* Copy key */
> >> >+	memcpy(new_k->key, key, h->key_len);
> >> [Wang, Yipeng] You don't need to do the order change just to show the
> >> point of unnecessary ordering I think.
> >> I am afraid it may cause further confusion for future people who read
> >> this change, especially it is not in the commit Message (and it is a bug fix).
> >I made this change to keep it inline with the corresponding change in
> >the lookup function. I can add this explanation to the commit message.
> Please let me know if this is ok for you.
> 
> [Wang, Yipeng] Thanks for the change.
> To me it still looks unnecessary but If you think this cosmetic change would
> help others to understand the code better, I am OK with it.
I agree this is unnecessary. When the change was being reviewed internally at Arm, I had not made this change initially. It resulted in questions as the new key insert's memory ordering steps did not match that of the lookup function. IMO, if we look at the algorithm as a whole (instead of looking at this commit alone), this code will be easier to understand.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH v2 0/2] lib/hash: perf improvements for lock-free
  2019-07-04 11:13   ` [dpdk-dev] [PATCH v2 0/2] lib/hash: perf improvements for lock-free Jerin Jacob Kollanukkaran
@ 2019-07-05  6:08     ` Honnappa Nagarahalli
  0 siblings, 0 replies; 23+ messages in thread
From: Honnappa Nagarahalli @ 2019-07-05  6:08 UTC (permalink / raw)
  To: jerinj, yipeng1.wang, sameh.gobriel, bruce.richardson,
	pablo.de.lara.guarch
  Cc: Gavin Hu (Arm Technology China),
	Ruifeng Wang (Arm Technology China),
	dev, Honnappa Nagarahalli, nd, nd

> [Snip]
> 
> > The first patch in this series moves the signature comparison before
> > the load- acquire of the key_index. This does not result in any issues
> > because of the full key comparison which is done after the load-acquire of
> the key_index.
> > Performance improvements:
> > Lookup Hit: 6.16%
> > Lookup Miss: 8.54%
> >
> > The second patch in this series, moves the store-release of pData
> > before the store to any hash internal data structures. This is not
> > necessary, but just helps to show the non-dependency between
> > application data and hash table data. On the reader side, the pData is
> > loaded only if the keys match, this provides performance benefits.
> > Performance improvements (with patch 1):
> > Lookup Hit: 6.25%
> > Lookup Miss: 13.97%
> 
> Could you share the commands/data to test this specific performance
> measurement?
The data given here uses hash_readwrite_lf test with 5M entries. The keys used for hit is again 5M.

We also tested this using L3-fwd (note that L3-fwd app needs to be changed to enable lock-free hash code path).
Command: ./examples/l3fwd/build/l3fwd -c 0xc0 -n 1 -- -E -P -p 0x3 --config="(0,0,7),(1,0,6)"
On A72:
Upstream lock-free:
Hit: 11.117/11.168/11.101
Miss: 6.760/6.763/6.764

Upstream lock-free + patches(1,2)
Hit: 11.310/11.276/11.385
Miss: 11.245/11.293/11.281

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH v2 0/2] lib/hash: perf improvements for lock-free
  2019-07-04 16:09   ` Thomas Monjalon
@ 2019-07-05  6:14     ` Honnappa Nagarahalli
  2019-07-05  6:29       ` Thomas Monjalon
  2019-07-08 16:51     ` Wang, Yipeng1
  1 sibling, 1 reply; 23+ messages in thread
From: Honnappa Nagarahalli @ 2019-07-05  6:14 UTC (permalink / raw)
  To: thomas
  Cc: dev, yipeng1.wang, sameh.gobriel, bruce.richardson,
	pablo.de.lara.guarch, Gavin Hu (Arm Technology China),
	Ruifeng Wang (Arm Technology China),
	Honnappa Nagarahalli, nd, nd

> 02/07/2019 23:16, Honnappa Nagarahalli:
> > v2
> >  - Dropped moving the tbl_chng_cnt to the beginning of the cache line
> >    commit
> >  - Changed the commit log for patch 1 to indicate that it improves
> >    performance (Yipeng)
> >  - Changed the comment in the search_one_bucket_lf function (Yipeng)
> >  - Changed the commit log for patch2 to indicate that changes to
> >    store-release of 'pdata' is cosmetic (Yipeng)
> >
> > Honnappa Nagarahalli (2):
> >   lib/hash: use ordered loads only if signature matches
> >   lib/hash: load pData after full key compare
> 
> This series is missing 19.08 because of a lack of review.
> It was expected because it was sent late in 19.08 cycle, so this is just to make
> the status clear: it will be considered for 19.11.
> 
> PS: please think about --in-reply-to when sending new versions.
I used the following command:

git send-email --to yipeng1.wang@intel.com --to sameh.gobriel@intel.com --to bruce.richardson@intel.com --to pablo.de.lara.guarch@intel.com --to honnappa.nagarahalli@arm.com --cc gavin.hu@arm.com --cc ruifeng.wang@arm.com --cc dev@dpdk.org --cc nd@arm.com --in-reply-to=20190625211520.43181-1-honnappa.nagarahalli@arm.com patches/hash_regression/v2/*.patch

I am not sure what I am missing.
> 
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH v2 0/2] lib/hash: perf improvements for lock-free
  2019-07-05  6:14     ` Honnappa Nagarahalli
@ 2019-07-05  6:29       ` Thomas Monjalon
  0 siblings, 0 replies; 23+ messages in thread
From: Thomas Monjalon @ 2019-07-05  6:29 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: dev, yipeng1.wang, sameh.gobriel, bruce.richardson,
	pablo.de.lara.guarch, Gavin Hu (Arm Technology China),
	Ruifeng Wang (Arm Technology China),
	nd

05/07/2019 08:14, Honnappa Nagarahalli:
> > 02/07/2019 23:16, Honnappa Nagarahalli:
> > > v2
> > >  - Dropped moving the tbl_chng_cnt to the beginning of the cache line
> > >    commit
> > >  - Changed the commit log for patch 1 to indicate that it improves
> > >    performance (Yipeng)
> > >  - Changed the comment in the search_one_bucket_lf function (Yipeng)
> > >  - Changed the commit log for patch2 to indicate that changes to
> > >    store-release of 'pdata' is cosmetic (Yipeng)
> > >
> > > Honnappa Nagarahalli (2):
> > >   lib/hash: use ordered loads only if signature matches
> > >   lib/hash: load pData after full key compare
> > 
> > This series is missing 19.08 because of a lack of review.
> > It was expected because it was sent late in 19.08 cycle, so this is just to make
> > the status clear: it will be considered for 19.11.
> > 
> > PS: please think about --in-reply-to when sending new versions.
> I used the following command:
> 
> git send-email --to yipeng1.wang@intel.com --to sameh.gobriel@intel.com --to bruce.richardson@intel.com --to pablo.de.lara.guarch@intel.com --to honnappa.nagarahalli@arm.com --cc gavin.hu@arm.com --cc ruifeng.wang@arm.com --cc dev@dpdk.org --cc nd@arm.com --in-reply-to=20190625211520.43181-1-honnappa.nagarahalli@arm.com patches/hash_regression/v2/*.patch
> 
> I am not sure what I am missing.

You are missing nothing, sorry.
My email client showed it badly but it's better refreshed today.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH 2/3] lib/hash: load pData after full key compare
  2019-07-05  5:33         ` Honnappa Nagarahalli
@ 2019-07-08 16:44           ` Wang, Yipeng1
  0 siblings, 0 replies; 23+ messages in thread
From: Wang, Yipeng1 @ 2019-07-08 16:44 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Gobriel, Sameh, Richardson, Bruce,
	De Lara Guarch, Pablo
  Cc: Gavin Hu (Arm Technology China),
	Ruifeng Wang (Arm Technology China),
	dev, nd, stable, nd, nd

>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
>Sent: Thursday, July 4, 2019 10:33 PM
>To: Wang, Yipeng1 <yipeng1.wang@intel.com>; Gobriel, Sameh <sameh.gobriel@intel.com>; Richardson, Bruce
><bruce.richardson@intel.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
>Cc: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; Ruifeng Wang (Arm Technology China) <Ruifeng.Wang@arm.com>;
>dev@dpdk.org; nd <nd@arm.com>; stable@dpdk.org; nd <nd@arm.com>; nd <nd@arm.com>
>Subject: RE: [PATCH 2/3] lib/hash: load pData after full key compare
>
>> >Subject: RE: [PATCH 2/3] lib/hash: load pData after full key compare
>> >
>> >Thank you Yipeng for your comments.
>> >
>> >> >
>> >> >When a hash entry is added, there are 2 sets of stores.
>> >> >
>> >> >1) The application writes its data to memory (whose address is
>> >> >provided in rte_hash_add_key_with_hash_data API (or NULL))
>> >> >2) The rte_hash library writes to its own internal data structures;
>> >> >key store entry and the hash table.
>> >> >
>> >> >The only ordering requirement between these 2 is that - the store to
>> >> >the application data must complete before the store to key_index.
>> >> >There are no ordering requirements between the stores to the
>> >> >key/signature and store to application data. The synchronization
>> >> >point for application data can be any point between the 'store to
>> >> >application data' and 'store to the key_index'. So, pData should not
>> >> >be a guard variable for the data in hash table. It should be a guard
>> >> >variable only for the application data written to the memory
>> >> >location pointed by pData. Hence, pData can be loaded after full key
>> comparison.
>> >> >
>> >> >Fixes: e605a1d36 ("hash: add lock-free r/w concurrency")
>> >> >Cc: stable@dpdk.org
>> >> >
>> >> >Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>> >> >Reviewed-by: Gavin Hu <gavin.hu@arm.com>
>> >> >Tested-by: Ruifeng Wang <ruifeng.wang@arm.com>
>> >> >---
>> >> > lib/librte_hash/rte_cuckoo_hash.c | 67
>> >> >+++++++++++++++----------------
>> >> > 1 file changed, 32 insertions(+), 35 deletions(-)
>> >> >
>> >> >diff --git a/lib/librte_hash/rte_cuckoo_hash.c
>> >> >b/lib/librte_hash/rte_cuckoo_hash.c
>> >> >index f37f6957d..077328fed 100644
>> >> >--- a/lib/librte_hash/rte_cuckoo_hash.c
>> >> >+++ b/lib/librte_hash/rte_cuckoo_hash.c
>> >> >@@ -649,9 +649,11 @@ search_and_update(const struct rte_hash *h,
>> >> >void
>> >> *data, const void *key,
>> >> > 			k = (struct rte_hash_key *) ((char *)keys +
>> >> > 					bkt->key_idx[i] * h->key_entry_size);
>> >> > 			if (rte_hash_cmp_eq(key, k->key, h) == 0) {
>> >> >-				/* 'pdata' acts as the synchronization point
>> >> >-				 * when an existing hash entry is updated.
>> >> >-				 * Key is not updated in this case.
>> >> >+				/* The store to application data at *data
>> >> >+				 * should not leak after the store to pdata
>> >> >+				 * in the key store. i.e. pdata is the guard
>> >> >+				 * variable. Release the application data
>> >> >+				 * to the readers.
>> >> > 				 */
>> >> > 				__atomic_store_n(&k->pdata,
>> >> > 					data,
>> >> >@@ -711,11 +713,10 @@ rte_hash_cuckoo_insert_mw(const struct
>> >> rte_hash *h,
>> >> > 		/* Check if slot is available */
>> >> > 		if (likely(prim_bkt->key_idx[i] == EMPTY_SLOT)) {
>> >> > 			prim_bkt->sig_current[i] = sig;
>> >> >-			/* Key can be of arbitrary length, so it is
>> >> >-			 * not possible to store it atomically.
>> >> >-			 * Hence the new key element's memory stores
>> >> >-			 * (key as well as data) should be complete
>> >> >-			 * before it is referenced.
>> >> >+			/* Store to signature and key should not
>> >> >+			 * leak after the store to key_idx. i.e.
>> >> >+			 * key_idx is the guard variable for signature
>> >> >+			 * and key.
>> >> > 			 */
>> >> > 			__atomic_store_n(&prim_bkt->key_idx[i],
>> >> > 					 new_idx,
>> >> >@@ -990,17 +991,15 @@ __rte_hash_add_key_with_hash(const struct
>> >> >rte_hash *h, const void *key,
>> >> >
>> >> > 	new_k = RTE_PTR_ADD(keys, (uintptr_t)slot_id * h->key_entry_size);
>> >> > 	new_idx = (uint32_t)((uintptr_t) slot_id);
>> >> >-	/* Copy key */
>> >> >-	memcpy(new_k->key, key, h->key_len);
>> >> >-	/* Key can be of arbitrary length, so it is not possible to store
>> >> >-	 * it atomically. Hence the new key element's memory stores
>> >> >-	 * (key as well as data) should be complete before it is referenced.
>> >> >-	 * 'pdata' acts as the synchronization point when an existing hash
>> >> >-	 * entry is updated.
>> >> >+	/* The store to application data (by the application) at *data should
>> >> >+	 * not leak after the store of pdata in the key store. i.e. pdata is
>> >> >+	 * the guard variable. Release the application data to the readers.
>> >> > 	 */
>> >> > 	__atomic_store_n(&new_k->pdata,
>> >> > 		data,
>> >> > 		__ATOMIC_RELEASE);
>> >> [Wang, Yipeng] Actually do we need to guard pdata for the newly
>> >> inserted key?  I thought the guard of key_idx already can make sure
>> >> The order for the application to read data.
>> >Yes, you are correct. In the hash_add case, the store-release on
>> >key_idx would be sufficient. However, hash_update case requires
>> >store-release on pData. This was the reason to keep store-release for pData
>> in hash_add when the lock-free algorithm was introduced.
>>
>> [Wang, Yipeng] Sorry that I am still a bit confused, we already have store
>> release in search_and_update function right? Isn't that enough for the
>> hash_update case?
>No problem, looks like I did not use the correct terms. We are talking about 2 paths in the code:
>1) When a new key is getting inserted, store-release of key_idx acts as the guard variable for the store to application data as well as
>the stores to internal hash table structures (signature, key, pdata).
>2) But when an existing key is updated, there is no store to key_idx. In this case 'pdata' acts as the guard variable for the store to
>application data. Hence we need a store-release on 'pdata'. Due to this we need a load-acquire on 'pdata' in the lookup function.
>
>Then, why do we need store-release on 'pdata' in code path 1? - store-release on 'pdata' in code path 1 is done for consistency with
>code path 2 i.e. we want to use store-release on 'pdata' consistently in both the code paths (unless we see performance degradation
>in path 1). IMO, it is much easier to understand the code this way.
>
>> >
>> >> >+	/* Copy key */
>> >> >+	memcpy(new_k->key, key, h->key_len);
>> >> [Wang, Yipeng] You don't need to do the order change just to show the
>> >> point of unnecessary ordering I think.
>> >> I am afraid it may cause further confusion for future people who read
>> >> this change, especially it is not in the commit Message (and it is a bug fix).
>> >I made this change to keep it inline with the corresponding change in
>> >the lookup function. I can add this explanation to the commit message.
>> Please let me know if this is ok for you.
>>
>> [Wang, Yipeng] Thanks for the change.
>> To me it still looks unnecessary but If you think this cosmetic change would
>> help others to understand the code better, I am OK with it.
>I agree this is unnecessary. When the change was being reviewed internally at Arm, I had not made this change initially. It resulted in
>questions as the new key insert's memory ordering steps did not match that of the lookup function. IMO, if we look at the algorithm
>as a whole (instead of looking at this commit alone), this code will be easier to understand.
[Wang, Yipeng]

Acked-by: Yipeng Wang <yipeng1.wang@intel.com>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH v2 0/2] lib/hash: perf improvements for lock-free
  2019-07-04 16:09   ` Thomas Monjalon
  2019-07-05  6:14     ` Honnappa Nagarahalli
@ 2019-07-08 16:51     ` Wang, Yipeng1
  2019-07-08 18:10       ` Thomas Monjalon
  1 sibling, 1 reply; 23+ messages in thread
From: Wang, Yipeng1 @ 2019-07-08 16:51 UTC (permalink / raw)
  To: Thomas Monjalon, Honnappa Nagarahalli
  Cc: dev, Gobriel, Sameh, Richardson, Bruce, De Lara Guarch, Pablo,
	gavin.hu, ruifeng.wang, nd

>-----Original Message-----
>From: Thomas Monjalon [mailto:thomas@monjalon.net]
>Sent: Thursday, July 4, 2019 9:10 AM
>To: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>Cc: dev@dpdk.org; Wang, Yipeng1 <yipeng1.wang@intel.com>; Gobriel, Sameh <sameh.gobriel@intel.com>; Richardson, Bruce
><bruce.richardson@intel.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; gavin.hu@arm.com;
>ruifeng.wang@arm.com; nd@arm.com
>Subject: Re: [dpdk-dev] [PATCH v2 0/2] lib/hash: perf improvements for lock-free
>
>02/07/2019 23:16, Honnappa Nagarahalli:
>> v2
>>  - Dropped moving the tbl_chng_cnt to the beginning of the cache line
>>    commit
>>  - Changed the commit log for patch 1 to indicate that it improves
>>    performance (Yipeng)
>>  - Changed the comment in the search_one_bucket_lf function (Yipeng)
>>  - Changed the commit log for patch2 to indicate that changes to
>>    store-release of 'pdata' is cosmetic (Yipeng)
>>
>> Honnappa Nagarahalli (2):
>>   lib/hash: use ordered loads only if signature matches
>>   lib/hash: load pData after full key compare
>
>This series is missing 19.08 because of a lack of review.
>It was expected because it was sent late in 19.08 cycle,
>so this is just to make the status clear: it will be considered for 19.11.
>
>PS: please think about --in-reply-to when sending new versions.
>
[Wang, Yipeng] Hi, Thomas, Thanks for your work.
I finished the review of this patch, I think it is good to go. Please include it if it is convenient for you.
Honnappa confirmed that my understanding of the code is correct, and it should help performance.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH v2 0/2] lib/hash: perf improvements for lock-free
  2019-07-08 16:51     ` Wang, Yipeng1
@ 2019-07-08 18:10       ` Thomas Monjalon
  0 siblings, 0 replies; 23+ messages in thread
From: Thomas Monjalon @ 2019-07-08 18:10 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: dev, Wang, Yipeng1, Gobriel, Sameh, Richardson, Bruce,
	De Lara Guarch, Pablo, gavin.hu, ruifeng.wang, nd

08/07/2019 18:51, Wang, Yipeng1:
>From: Thomas Monjalon [mailto:thomas@monjalon.net]
> >02/07/2019 23:16, Honnappa Nagarahalli:
> >> v2
> >>  - Dropped moving the tbl_chng_cnt to the beginning of the cache line
> >>    commit
> >>  - Changed the commit log for patch 1 to indicate that it improves
> >>    performance (Yipeng)
> >>  - Changed the comment in the search_one_bucket_lf function (Yipeng)
> >>  - Changed the commit log for patch2 to indicate that changes to
> >>    store-release of 'pdata' is cosmetic (Yipeng)
> >>
> >> Honnappa Nagarahalli (2):
> >>   lib/hash: use ordered loads only if signature matches
> >>   lib/hash: load pData after full key compare
> >
> >This series is missing 19.08 because of a lack of review.
> >It was expected because it was sent late in 19.08 cycle,
> >so this is just to make the status clear: it will be considered for 19.11.
> 
> [Wang, Yipeng] Hi, Thomas, Thanks for your work.
> I finished the review of this patch, I think it is good to go.
> Please include it if it is convenient for you.
> Honnappa confirmed that my understanding of the code is correct, and it should help performance.

Applied, thanks



^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2019-07-08 18:10 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-25 21:15 [dpdk-dev] [PATCH 0/3] lib/hash: perf improvements for lock-free Honnappa Nagarahalli
2019-06-25 21:15 ` [dpdk-dev] [PATCH 1/3] lib/hash: use ordered loads only if signature matches Honnappa Nagarahalli
2019-06-28 18:04   ` Wang, Yipeng1
2019-06-25 21:15 ` [dpdk-dev] [PATCH 2/3] lib/hash: load pData after full key compare Honnappa Nagarahalli
2019-06-28 18:40   ` Wang, Yipeng1
2019-07-02  4:35     ` Honnappa Nagarahalli
2019-07-04  0:39       ` Wang, Yipeng1
2019-07-05  5:33         ` Honnappa Nagarahalli
2019-07-08 16:44           ` Wang, Yipeng1
2019-06-25 21:15 ` [dpdk-dev] [PATCH 3/3] lib/hash: adjust tbl_chng_cnt position Honnappa Nagarahalli
2019-06-28 18:50   ` Wang, Yipeng1
2019-07-02 17:23     ` Honnappa Nagarahalli
2019-07-04  0:52       ` Wang, Yipeng1
2019-07-02 21:16 ` [dpdk-dev] [PATCH v2 0/2] lib/hash: perf improvements for lock-free Honnappa Nagarahalli
2019-07-02 21:16   ` [dpdk-dev] [PATCH v2 1/2] lib/hash: use ordered loads only if signature matches Honnappa Nagarahalli
2019-07-02 21:16   ` [dpdk-dev] [PATCH v2 2/2] lib/hash: load pData after full key compare Honnappa Nagarahalli
2019-07-04 11:13   ` [dpdk-dev] [PATCH v2 0/2] lib/hash: perf improvements for lock-free Jerin Jacob Kollanukkaran
2019-07-05  6:08     ` Honnappa Nagarahalli
2019-07-04 16:09   ` Thomas Monjalon
2019-07-05  6:14     ` Honnappa Nagarahalli
2019-07-05  6:29       ` Thomas Monjalon
2019-07-08 16:51     ` Wang, Yipeng1
2019-07-08 18:10       ` Thomas Monjalon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).