* [dpdk-dev] [PATCH 0/4] Address reader-writer concurrency in rte_hash
@ 2018-09-06 17:12 Honnappa Nagarahalli
2018-09-06 17:12 ` [dpdk-dev] [PATCH 1/4] hash: correct key store element alignment Honnappa Nagarahalli
` (5 more replies)
0 siblings, 6 replies; 36+ messages in thread
From: Honnappa Nagarahalli @ 2018-09-06 17:12 UTC (permalink / raw)
To: bruce.richardson, pablo.de.lara.guarch
Cc: dev, honnappa.nagarahalli, gavin.hu, steve.capper, ola.liljedahl,
nd, Honnappa Nagarahalli
Currently, reader-writer concurrency problems in rte_hash are
addressed using reader-writer locks. Use of reader-writer locks
results in the following issues:
1) In many of the use cases for the hash table, writer threads
are running on the control plane. If the writer is preempted while
holding the lock, it will block the readers for an extended period,
resulting in packet drops. This problem seems to apply to platforms
with transactional memory support as well because of the algorithm
used for rte_rwlock_write_lock_tm:
static inline void
rte_rwlock_write_lock_tm(rte_rwlock_t *rwl)
{
	if (likely(rte_try_tm(&rwl->cnt)))
		return;
	rte_rwlock_write_lock(rwl);
}
i.e. there is a possibility of falling back to rte_rwlock_write_lock
in failure cases.
2) The reader-writer lock based solution does not address the
following issue:
rte_hash_lookup_xxx APIs return the index of the element in
the key store. The application (reader) can use that index to
reference other data structures in its scope. Because of this,
the index should not be freed until the application has finished
using it.
3) Since the writer blocks all readers, the hash lookup
rate drops significantly when there is writer activity,
even for unrelated entries. The performance numbers
given below clearly indicate this.
A lock-free solution is required to solve these problems. This patch
series adds the lock-free capabilities in the following steps:
1) Correct the alignment for the key store entry to prep for
memory ordering.
2) Add memory ordering to prevent race conditions when a new key
is added to the table.
3) The reader-writer concurrency issue, caused by moving the keys
to their alternate locations during key insert, is solved
by introducing an atomic global counter indicating a change
in the table.
4) This solution also has to solve the issue of readers using
a key store element even after the key is deleted from the
control plane.
To solve this issue, the hash_del_key_xxx APIs do not free
the key store element. The key store element has to be freed
using the newly introduced rte_hash_free_key_with_position API.
It needs to be called once all the readers have stopped using
the key store element. How this is determined is outside
the scope of this patch (RCU is one such mechanism that the
application can use). A usage sketch follows this list.
5) Finally, a lock-free reader-writer concurrency flag is added
to enable this feature at run time.
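To illustrate steps 4) and 5), here is a minimal usage sketch of the
lock-free mode and the new free API. 'wait_for_readers()' is a
hypothetical placeholder for an application-provided grace-period
mechanism (e.g. RCU); it is not part of this series:

#include <rte_hash.h>

extern void wait_for_readers(void); /* hypothetical: e.g. RCU grace period */

static void
delete_and_free(struct rte_hash_parameters *params, const void *key)
{
	/* 'params' is assumed to be otherwise filled in */
	params->extra_flag |= RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY_LF;
	struct rte_hash *h = rte_hash_create(params);

	/* ... writer adds keys, readers run rte_hash_lookup_xxx ... */

	/* Delete does not free the key store element in lock-free mode */
	int32_t pos = rte_hash_del_key(h, key);
	if (pos >= 0) {
		/* Wait until no reader still references this element */
		wait_for_readers();
		rte_hash_free_key_with_position(h, pos);
	}
}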
Performance numbers:
Scenario: Equal number of writer/reader threads for concurrent
writers and readers. For the readers-only test, the
entries are added upfront.
Current code:
Cores   Lookup   Lookup with add
2        474      246
4        935      579
6       1387     1048
8       1766     1480
10      2119     1951
12      2546     2441
With this patch:
Cores   Lookup   Lookup with add
2        291      211
4        297      196
6        304      198
8        309      202
10       315      205
12       319      209
Honnappa Nagarahalli (4):
hash: correct key store element alignment
hash: add memory ordering to avoid race conditions
hash: fix rw concurrency while moving keys
hash: enable lock-free reader-writer concurrency
lib/librte_hash/rte_cuckoo_hash.c | 445 +++++++++++++++++++++++++----------
lib/librte_hash/rte_cuckoo_hash.h | 6 +-
lib/librte_hash/rte_hash.h | 63 ++++-
lib/librte_hash/rte_hash_version.map | 7 +
4 files changed, 393 insertions(+), 128 deletions(-)
--
2.7.4
* [dpdk-dev] [PATCH 1/4] hash: correct key store element alignment
2018-09-06 17:12 [dpdk-dev] [PATCH 0/4] Address reader-writer concurrency in rte_hash Honnappa Nagarahalli
@ 2018-09-06 17:12 ` Honnappa Nagarahalli
2018-09-27 23:58 ` Wang, Yipeng1
2018-09-06 17:12 ` [dpdk-dev] [PATCH 2/4] hash: add memory ordering to avoid race conditions Honnappa Nagarahalli
` (4 subsequent siblings)
5 siblings, 1 reply; 36+ messages in thread
From: Honnappa Nagarahalli @ 2018-09-06 17:12 UTC (permalink / raw)
To: bruce.richardson, pablo.de.lara.guarch
Cc: dev, honnappa.nagarahalli, gavin.hu, steve.capper, ola.liljedahl,
nd, Honnappa Nagarahalli
Correct the key store array element alignment. This is required to
make 'pdata' in 'struct rte_hash_key' align on the correct boundary.
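For illustration, assuming KEY_ALIGNMENT is 16 and an 8-byte 'pdata'
pointer: with key_len = 13, the unaligned element size would be
8 + 13 = 21 bytes, so the second element of the key store array would
start at offset 21 and its 'pdata' would be misaligned. Rounding the
element size up to RTE_ALIGN(21, 16) = 32 keeps 'pdata' naturally
aligned in every element.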
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
---
lib/librte_hash/rte_cuckoo_hash.c | 4 +++-
lib/librte_hash/rte_cuckoo_hash.h | 2 +-
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index f7b86c8..33acfc9 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -189,7 +189,9 @@ rte_hash_create(const struct rte_hash_parameters *params)
goto err_unlock;
}
- const uint32_t key_entry_size = sizeof(struct rte_hash_key) + params->key_len;
+ const uint32_t key_entry_size =
+ RTE_ALIGN(sizeof(struct rte_hash_key) + params->key_len,
+ KEY_ALIGNMENT);
const uint64_t key_tbl_size = (uint64_t) key_entry_size * num_key_slots;
k = rte_zmalloc_socket(NULL, key_tbl_size,
diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
index b43f467..b0c7ef9 100644
--- a/lib/librte_hash/rte_cuckoo_hash.h
+++ b/lib/librte_hash/rte_cuckoo_hash.h
@@ -125,7 +125,7 @@ struct rte_hash_key {
};
/* Variable key size */
char key[0];
-} __attribute__((aligned(KEY_ALIGNMENT)));
+};
/* All different signature compare functions */
enum rte_hash_sig_compare_function {
--
2.7.4
* [dpdk-dev] [PATCH 2/4] hash: add memory ordering to avoid race conditions
2018-09-06 17:12 [dpdk-dev] [PATCH 0/4] Address reader-writer concurrency in rte_hash Honnappa Nagarahalli
2018-09-06 17:12 ` [dpdk-dev] [PATCH 1/4] hash: correct key store element alignment Honnappa Nagarahalli
@ 2018-09-06 17:12 ` Honnappa Nagarahalli
2018-09-28 0:43 ` Wang, Yipeng1
2018-09-06 17:12 ` [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys Honnappa Nagarahalli
` (3 subsequent siblings)
5 siblings, 1 reply; 36+ messages in thread
From: Honnappa Nagarahalli @ 2018-09-06 17:12 UTC (permalink / raw)
To: bruce.richardson, pablo.de.lara.guarch
Cc: dev, honnappa.nagarahalli, gavin.hu, steve.capper, ola.liljedahl,
nd, Honnappa Nagarahalli
The only race condition that can occur is using the key store element
before the key write is completed. Hence, the release memory order is
used while inserting the element. Any other race condition is caught
by the key comparison. Memory orderings are added only where needed.
For example, reads in the writer's context do not need memory ordering
as there is a single writer.

key_idx in the bucket entry and pdata in the key store element are
used for synchronisation. key_idx is used to release an inserted
entry in the bucket to the reader. Use of pdata for synchronisation
is required because an existing entry may be updated, in which case
only pdata is updated without updating key_idx.
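The pairing can be illustrated with a self-contained sketch of the
publish/consume pattern (the names and the fixed-size store here are
illustrative, not the library's):

#include <stdint.h>
#include <string.h>

struct elem {
	char key[16];
	void *pdata;
};
static struct elem store[16];
static uint32_t pub_idx; /* 0 == empty, otherwise an index into store[] */

/* Writer: complete all stores to the element first, then publish its
 * index with release semantics.
 */
static void
publish(uint32_t idx, const char *key, void *data)
{
	strncpy(store[idx].key, key, sizeof(store[idx].key) - 1);
	__atomic_store_n(&store[idx].pdata, data, __ATOMIC_RELEASE);
	__atomic_store_n(&pub_idx, idx, __ATOMIC_RELEASE);
}

/* Reader: the acquire loads pair with the releases above, so a
 * non-zero index guarantees the key and pdata stores are visible.
 * The acquire on pdata also covers in-place updates of an existing
 * entry, where only pdata changes.
 */
static void *
lookup(const char *key)
{
	uint32_t idx = __atomic_load_n(&pub_idx, __ATOMIC_ACQUIRE);
	if (idx == 0)
		return NULL;
	void *pdata = __atomic_load_n(&store[idx].pdata, __ATOMIC_ACQUIRE);
	if (strncmp(store[idx].key, key, sizeof(store[idx].key)) == 0)
		return pdata;
	return NULL;
}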
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
---
lib/librte_hash/rte_cuckoo_hash.c | 111 ++++++++++++++++++++++++++++----------
1 file changed, 82 insertions(+), 29 deletions(-)
diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index 33acfc9..2d89158 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -485,7 +485,9 @@ enqueue_slot_back(const struct rte_hash *h,
rte_ring_sp_enqueue(h->free_slots, slot_id);
}
-/* Search a key from bucket and update its data */
+/* Search a key from bucket and update its data.
+ * Writer holds the lock before calling this.
+ */
static inline int32_t
search_and_update(const struct rte_hash *h, void *data, const void *key,
struct rte_hash_bucket *bkt, hash_sig_t sig, hash_sig_t alt_hash)
@@ -499,8 +501,13 @@ search_and_update(const struct rte_hash *h, void *data, const void *key,
k = (struct rte_hash_key *) ((char *)keys +
bkt->key_idx[i] * h->key_entry_size);
if (rte_hash_cmp_eq(key, k->key, h) == 0) {
- /* Update data */
- k->pdata = data;
+ /* 'pdata' acts as the synchronization point
+ * when an existing hash entry is updated.
+ * Key is not updated in this case.
+ */
+ __atomic_store_n(&k->pdata,
+ data,
+ __ATOMIC_RELEASE);
/*
* Return index where key is stored,
* subtracting the first dummy index
@@ -554,7 +561,15 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
if (likely(prim_bkt->key_idx[i] == EMPTY_SLOT)) {
prim_bkt->sig_current[i] = sig;
prim_bkt->sig_alt[i] = alt_hash;
- prim_bkt->key_idx[i] = new_idx;
+ /* Key can be of arbitrary length, so it is
+ * not possible to store it atomically.
+ * Hence the new key element's memory stores
+ * (key as well as data) should be complete
+ * before it is referenced.
+ */
+ __atomic_store_n(&prim_bkt->key_idx[i],
+ new_idx,
+ __ATOMIC_RELEASE);
break;
}
}
@@ -637,8 +652,10 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
prev_bkt->sig_current[prev_slot];
curr_bkt->sig_current[curr_slot] =
prev_bkt->sig_alt[prev_slot];
- curr_bkt->key_idx[curr_slot] =
- prev_bkt->key_idx[prev_slot];
+ /* Release the updated bucket entry */
+ __atomic_store_n(&curr_bkt->key_idx[curr_slot],
+ prev_bkt->key_idx[prev_slot],
+ __ATOMIC_RELEASE);
curr_slot = prev_slot;
curr_node = prev_node;
@@ -647,7 +664,10 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
curr_bkt->sig_current[curr_slot] = sig;
curr_bkt->sig_alt[curr_slot] = alt_hash;
- curr_bkt->key_idx[curr_slot] = new_idx;
+ /* Release the new bucket entry */
+ __atomic_store_n(&curr_bkt->key_idx[curr_slot],
+ new_idx,
+ __ATOMIC_RELEASE);
__hash_rw_writer_unlock(h);
@@ -778,8 +798,15 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
new_idx = (uint32_t)((uintptr_t) slot_id);
/* Copy key */
rte_memcpy(new_k->key, key, h->key_len);
- new_k->pdata = data;
-
+ /* Key can be of arbitrary length, so it is not possible to store
+ * it atomically. Hence the new key element's memory stores
+ * (key as well as data) should be complete before it is referenced.
+ * 'pdata' acts as the synchronization point when an existing hash
+ * entry is updated.
+ */
+ __atomic_store_n(&new_k->pdata,
+ data,
+ __ATOMIC_RELEASE);
/* Find an empty slot and insert */
ret = rte_hash_cuckoo_insert_mw(h, prim_bkt, sec_bkt, key, data,
@@ -865,21 +892,27 @@ search_one_bucket(const struct rte_hash *h, const void *key, hash_sig_t sig,
void **data, const struct rte_hash_bucket *bkt)
{
int i;
+ uint32_t key_idx;
+ void *pdata;
struct rte_hash_key *k, *keys = h->key_store;
for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
- if (bkt->sig_current[i] == sig &&
- bkt->key_idx[i] != EMPTY_SLOT) {
+ key_idx = __atomic_load_n(&bkt->key_idx[i],
+ __ATOMIC_ACQUIRE);
+ if (bkt->sig_current[i] == sig && key_idx != EMPTY_SLOT) {
k = (struct rte_hash_key *) ((char *)keys +
- bkt->key_idx[i] * h->key_entry_size);
+ key_idx * h->key_entry_size);
+ pdata = __atomic_load_n(&k->pdata,
+ __ATOMIC_ACQUIRE);
+
if (rte_hash_cmp_eq(key, k->key, h) == 0) {
if (data != NULL)
- *data = k->pdata;
+ *data = pdata;
/*
* Return index where key is stored,
* subtracting the first dummy index
*/
- return bkt->key_idx[i] - 1;
+ return key_idx - 1;
}
}
}
@@ -980,31 +1013,36 @@ remove_entry(const struct rte_hash *h, struct rte_hash_bucket *bkt, unsigned i)
}
}
-/* Search one bucket and remove the matched key */
+/* Search one bucket and remove the matched key.
+ * Writer is expected to hold the lock while calling this
+ * function.
+ */
static inline int32_t
search_and_remove(const struct rte_hash *h, const void *key,
struct rte_hash_bucket *bkt, hash_sig_t sig)
{
struct rte_hash_key *k, *keys = h->key_store;
unsigned int i;
- int32_t ret;
+ uint32_t key_idx;
/* Check if key is in primary location */
for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
- if (bkt->sig_current[i] == sig &&
- bkt->key_idx[i] != EMPTY_SLOT) {
+ key_idx = __atomic_load_n(&bkt->key_idx[i],
+ __ATOMIC_ACQUIRE);
+ if (bkt->sig_current[i] == sig && key_idx != EMPTY_SLOT) {
k = (struct rte_hash_key *) ((char *)keys +
- bkt->key_idx[i] * h->key_entry_size);
+ key_idx * h->key_entry_size);
if (rte_hash_cmp_eq(key, k->key, h) == 0) {
remove_entry(h, bkt, i);
+ __atomic_store_n(&bkt->key_idx[i],
+ EMPTY_SLOT,
+ __ATOMIC_RELEASE);
/*
* Return index where key is stored,
* subtracting the first dummy index
*/
- ret = bkt->key_idx[i] - 1;
- bkt->key_idx[i] = EMPTY_SLOT;
- return ret;
+ return key_idx - 1;
}
}
}
@@ -1151,6 +1189,7 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
const struct rte_hash_bucket *secondary_bkt[RTE_HASH_LOOKUP_BULK_MAX];
uint32_t prim_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
uint32_t sec_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
+ void *pdata[RTE_HASH_LOOKUP_BULK_MAX];
/* Prefetch first keys */
for (i = 0; i < PREFETCH_OFFSET && i < num_keys; i++)
@@ -1220,18 +1259,25 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
while (prim_hitmask[i]) {
uint32_t hit_index = __builtin_ctzl(prim_hitmask[i]);
- uint32_t key_idx = primary_bkt[i]->key_idx[hit_index];
+ uint32_t key_idx =
+ __atomic_load_n(
+ &primary_bkt[i]->key_idx[hit_index],
+ __ATOMIC_ACQUIRE);
const struct rte_hash_key *key_slot =
(const struct rte_hash_key *)(
(const char *)h->key_store +
key_idx * h->key_entry_size);
+
+ if (key_idx != EMPTY_SLOT)
+ pdata[i] = __atomic_load_n(&key_slot->pdata,
+ __ATOMIC_ACQUIRE);
/*
* If key index is 0, do not compare key,
* as it is checking the dummy slot
*/
if (!!key_idx & !rte_hash_cmp_eq(key_slot->key, keys[i], h)) {
if (data != NULL)
- data[i] = key_slot->pdata;
+ data[i] = pdata[i];
hits |= 1ULL << i;
positions[i] = key_idx - 1;
@@ -1243,11 +1289,19 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
while (sec_hitmask[i]) {
uint32_t hit_index = __builtin_ctzl(sec_hitmask[i]);
- uint32_t key_idx = secondary_bkt[i]->key_idx[hit_index];
+ uint32_t key_idx =
+ __atomic_load_n(
+ &secondary_bkt[i]->key_idx[hit_index],
+ __ATOMIC_ACQUIRE);
const struct rte_hash_key *key_slot =
(const struct rte_hash_key *)(
(const char *)h->key_store +
key_idx * h->key_entry_size);
+
+ if (key_idx != EMPTY_SLOT)
+ pdata[i] = __atomic_load_n(&key_slot->pdata,
+ __ATOMIC_ACQUIRE);
+
/*
* If key index is 0, do not compare key,
* as it is checking the dummy slot
@@ -1255,7 +1309,7 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
if (!!key_idx & !rte_hash_cmp_eq(key_slot->key, keys[i], h)) {
if (data != NULL)
- data[i] = key_slot->pdata;
+ data[i] = pdata[i];
hits |= 1ULL << i;
positions[i] = key_idx - 1;
@@ -1320,7 +1374,8 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
idx = *next % RTE_HASH_BUCKET_ENTRIES;
/* If current position is empty, go to the next one */
- while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
+ while ((position = __atomic_load_n(&h->buckets[bucket_idx].key_idx[idx],
+ __ATOMIC_ACQUIRE)) == EMPTY_SLOT) {
(*next)++;
/* End of table */
if (*next == total_entries)
@@ -1329,8 +1384,6 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
idx = *next % RTE_HASH_BUCKET_ENTRIES;
}
__hash_rw_reader_lock(h);
- /* Get position of entry in key table */
- position = h->buckets[bucket_idx].key_idx[idx];
next_key = (struct rte_hash_key *) ((char *)h->key_store +
position * h->key_entry_size);
/* Return key and data */
--
2.7.4
* [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
2018-09-06 17:12 [dpdk-dev] [PATCH 0/4] Address reader-writer concurrency in rte_hash Honnappa Nagarahalli
2018-09-06 17:12 ` [dpdk-dev] [PATCH 1/4] hash: correct key store element alignment Honnappa Nagarahalli
2018-09-06 17:12 ` [dpdk-dev] [PATCH 2/4] hash: add memory ordering to avoid race conditions Honnappa Nagarahalli
@ 2018-09-06 17:12 ` Honnappa Nagarahalli
2018-09-28 1:00 ` Wang, Yipeng1
2018-09-06 17:12 ` [dpdk-dev] [PATCH 4/4] hash: enable lock-free reader-writer concurrency Honnappa Nagarahalli
` (2 subsequent siblings)
5 siblings, 1 reply; 36+ messages in thread
From: Honnappa Nagarahalli @ 2018-09-06 17:12 UTC (permalink / raw)
To: bruce.richardson, pablo.de.lara.guarch
Cc: dev, honnappa.nagarahalli, gavin.hu, steve.capper, ola.liljedahl,
nd, Honnappa Nagarahalli
The reader-writer concurrency issue, caused by moving the keys
to their alternative locations during key insert, is solved
by introducing a global counter (tbl_chng_cnt) indicating a
change in the table.
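The reader side can be illustrated with a self-contained sketch of the
counter-based retry loop; search_buckets() is a hypothetical
placeholder for the primary/secondary bucket probes:

#include <stdbool.h>
#include <stdint.h>

static uint32_t tbl_chng_cnt; /* writer bumps this before moving a key */

extern bool search_buckets(const void *key, void **data); /* placeholder */

static bool
lookup(const void *key, void **data)
{
	uint32_t cnt_b, cnt_a;

	do {
		/* Snapshot the counter; acquire keeps the probe loads
		 * from being hoisted above this point.
		 */
		cnt_b = __atomic_load_n(&tbl_chng_cnt, __ATOMIC_ACQUIRE);
		if (search_buckets(key, data))
			return true;
		/* Probe loads must complete before the counter re-read */
		__atomic_thread_fence(__ATOMIC_ACQUIRE);
		cnt_a = __atomic_load_n(&tbl_chng_cnt, __ATOMIC_ACQUIRE);
		/* If the counter changed, an in-flight key move may have
		 * hidden the key from both probes; redo the search.
		 */
	} while (cnt_b != cnt_a);

	return false;
}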
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
---
lib/librte_hash/rte_cuckoo_hash.c | 307 +++++++++++++++++++++++++-------------
lib/librte_hash/rte_cuckoo_hash.h | 2 +
lib/librte_hash/rte_hash.h | 8 +-
3 files changed, 206 insertions(+), 111 deletions(-)
diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index 2d89158..1e4a8d4 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -256,6 +256,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
#endif
/* Setup hash context */
snprintf(h->name, sizeof(h->name), "%s", params->name);
+ h->tbl_chng_cnt = 0;
h->entries = params->entries;
h->key_len = params->key_len;
h->key_entry_size = key_entry_size;
@@ -588,7 +589,7 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
* return 0 if succeeds.
*/
static inline int
-rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
+rte_hash_cuckoo_move_insert_mw(struct rte_hash *h,
struct rte_hash_bucket *bkt,
struct rte_hash_bucket *alt_bkt,
const struct rte_hash_key *key, void *data,
@@ -639,11 +640,27 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
if (unlikely(&h->buckets[prev_alt_bkt_idx]
!= curr_bkt)) {
/* revert it to empty, otherwise duplicated keys */
- curr_bkt->key_idx[curr_slot] = EMPTY_SLOT;
+ __atomic_store_n(&curr_bkt->key_idx[curr_slot],
+ EMPTY_SLOT,
+ __ATOMIC_RELEASE);
__hash_rw_writer_unlock(h);
return -1;
}
+ /* Inform the previous move. The current move need
+ * not be informed now as the current bucket entry
+ * is present in both primary and secondary.
+ * Since there is one writer, load acquires on
+ * tbl_chng_cnt are not required.
+ */
+ __atomic_store_n(&h->tbl_chng_cnt,
+ h->tbl_chng_cnt + 1,
+ __ATOMIC_RELEASE);
+ /* The stores to sig_alt and sig_current should not
+ * move above the store to tbl_chng_cnt.
+ */
+ __atomic_thread_fence(__ATOMIC_RELEASE);
+
/* Need to swap current/alt sig to allow later
* Cuckoo insert to move elements back to its
* primary bucket if available
@@ -662,6 +679,20 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
curr_bkt = curr_node->bkt;
}
+ /* Inform the previous move. The current move need
+ * not be informed now as the current bucket entry
+ * is present in both primary and secondary.
+ * Since there is one writer, load acquires on
+ * tbl_chng_cnt are not required.
+ */
+ __atomic_store_n(&h->tbl_chng_cnt,
+ h->tbl_chng_cnt + 1,
+ __ATOMIC_RELEASE);
+ /* The stores to sig_alt and sig_current should not
+ * move above the store to tbl_chng_cnt.
+ */
+ __atomic_thread_fence(__ATOMIC_RELEASE);
+
curr_bkt->sig_current[curr_slot] = sig;
curr_bkt->sig_alt[curr_slot] = alt_hash;
/* Release the new bucket entry */
@@ -680,7 +711,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
* Cuckoo
*/
static inline int
-rte_hash_cuckoo_make_space_mw(const struct rte_hash *h,
+rte_hash_cuckoo_make_space_mw(struct rte_hash *h,
struct rte_hash_bucket *bkt,
struct rte_hash_bucket *sec_bkt,
const struct rte_hash_key *key, void *data,
@@ -728,7 +759,7 @@ rte_hash_cuckoo_make_space_mw(const struct rte_hash *h,
}
static inline int32_t
-__rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
+__rte_hash_add_key_with_hash(struct rte_hash *h, const void *key,
hash_sig_t sig, void *data)
{
hash_sig_t alt_hash;
@@ -844,7 +875,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
}
int32_t
-rte_hash_add_key_with_hash(const struct rte_hash *h,
+rte_hash_add_key_with_hash(struct rte_hash *h,
const void *key, hash_sig_t sig)
{
RETURN_IF_TRUE(((h == NULL) || (key == NULL)), -EINVAL);
@@ -852,14 +883,14 @@ rte_hash_add_key_with_hash(const struct rte_hash *h,
}
int32_t
-rte_hash_add_key(const struct rte_hash *h, const void *key)
+rte_hash_add_key(struct rte_hash *h, const void *key)
{
RETURN_IF_TRUE(((h == NULL) || (key == NULL)), -EINVAL);
return __rte_hash_add_key_with_hash(h, key, rte_hash_hash(h, key), 0);
}
int
-rte_hash_add_key_with_hash_data(const struct rte_hash *h,
+rte_hash_add_key_with_hash_data(struct rte_hash *h,
const void *key, hash_sig_t sig, void *data)
{
int ret;
@@ -873,7 +904,7 @@ rte_hash_add_key_with_hash_data(const struct rte_hash *h,
}
int
-rte_hash_add_key_data(const struct rte_hash *h, const void *key, void *data)
+rte_hash_add_key_data(struct rte_hash *h, const void *key, void *data)
{
int ret;
@@ -926,30 +957,56 @@ __rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
uint32_t bucket_idx;
hash_sig_t alt_hash;
struct rte_hash_bucket *bkt;
+ uint32_t cnt_b, cnt_a;
int ret;
- bucket_idx = sig & h->bucket_bitmask;
- bkt = &h->buckets[bucket_idx];
-
__hash_rw_reader_lock(h);
- /* Check if key is in primary location */
- ret = search_one_bucket(h, key, sig, data, bkt);
- if (ret != -1) {
- __hash_rw_reader_unlock(h);
- return ret;
- }
- /* Calculate secondary hash */
- alt_hash = rte_hash_secondary_hash(sig);
- bucket_idx = alt_hash & h->bucket_bitmask;
- bkt = &h->buckets[bucket_idx];
+ do {
+ /* Load the table change counter before the lookup
+ * starts. Acquire semantics will make sure that
+ * loads in search_one_bucket are not hoisted.
+ */
+ cnt_b = __atomic_load_n(&h->tbl_chng_cnt,
+ __ATOMIC_ACQUIRE);
+
+ bucket_idx = sig & h->bucket_bitmask;
+ bkt = &h->buckets[bucket_idx];
+
+ /* Check if key is in primary location */
+ ret = search_one_bucket(h, key, sig, data, bkt);
+ if (ret != -1) {
+ __hash_rw_reader_unlock(h);
+ return ret;
+ }
+ /* Calculate secondary hash */
+ alt_hash = rte_hash_secondary_hash(sig);
+ bucket_idx = alt_hash & h->bucket_bitmask;
+ bkt = &h->buckets[bucket_idx];
+
+ /* Check if key is in secondary location */
+ ret = search_one_bucket(h, key, alt_hash, data, bkt);
+ if (ret != -1) {
+ __hash_rw_reader_unlock(h);
+ return ret;
+ }
+
+ /* The loads of sig_current in search_one_bucket
+ * should not move below the load from tbl_chng_cnt.
+ */
+ __atomic_thread_fence(__ATOMIC_ACQUIRE);
+ /* Re-read the table change counter to check if the
+ * table has changed during search. If yes, re-do
+ * the search.
+ * This load should not get hoisted. The load
+ * acquires on cnt_b, key index in primary bucket
+ * and key index in secondary bucket will make sure
+ * that it does not get hoisted.
+ */
+ cnt_a = __atomic_load_n(&h->tbl_chng_cnt,
+ __ATOMIC_ACQUIRE);
+ } while (cnt_b != cnt_a);
- /* Check if key is in secondary location */
- ret = search_one_bucket(h, key, alt_hash, data, bkt);
- if (ret != -1) {
- __hash_rw_reader_unlock(h);
- return ret;
- }
__hash_rw_reader_unlock(h);
return -ENOENT;
}
@@ -1190,6 +1247,7 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
uint32_t prim_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
uint32_t sec_hitmask[RTE_HASH_LOOKUP_BULK_MAX] = {0};
void *pdata[RTE_HASH_LOOKUP_BULK_MAX];
+ uint32_t cnt_b, cnt_a;
/* Prefetch first keys */
for (i = 0; i < PREFETCH_OFFSET && i < num_keys; i++)
@@ -1225,102 +1283,137 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
}
__hash_rw_reader_lock(h);
- /* Compare signatures and prefetch key slot of first hit */
- for (i = 0; i < num_keys; i++) {
- compare_signatures(&prim_hitmask[i], &sec_hitmask[i],
+ do {
+ /* Load the table change counter before the lookup
+ * starts. Acquire semantics will make sure that
+ * loads in compare_signatures are not hoisted.
+ */
+ cnt_b = __atomic_load_n(&h->tbl_chng_cnt,
+ __ATOMIC_ACQUIRE);
+
+ /* Compare signatures and prefetch key slot of first hit */
+ for (i = 0; i < num_keys; i++) {
+ compare_signatures(&prim_hitmask[i], &sec_hitmask[i],
primary_bkt[i], secondary_bkt[i],
prim_hash[i], sec_hash[i], h->sig_cmp_fn);
- if (prim_hitmask[i]) {
- uint32_t first_hit = __builtin_ctzl(prim_hitmask[i]);
- uint32_t key_idx = primary_bkt[i]->key_idx[first_hit];
- const struct rte_hash_key *key_slot =
- (const struct rte_hash_key *)(
- (const char *)h->key_store +
- key_idx * h->key_entry_size);
- rte_prefetch0(key_slot);
- continue;
- }
+ if (prim_hitmask[i]) {
+ uint32_t first_hit =
+ __builtin_ctzl(prim_hitmask[i]);
+ uint32_t key_idx =
+ primary_bkt[i]->key_idx[first_hit];
+ const struct rte_hash_key *key_slot =
+ (const struct rte_hash_key *)(
+ (const char *)h->key_store +
+ key_idx * h->key_entry_size);
+ rte_prefetch0(key_slot);
+ continue;
+ }
- if (sec_hitmask[i]) {
- uint32_t first_hit = __builtin_ctzl(sec_hitmask[i]);
- uint32_t key_idx = secondary_bkt[i]->key_idx[first_hit];
- const struct rte_hash_key *key_slot =
- (const struct rte_hash_key *)(
- (const char *)h->key_store +
- key_idx * h->key_entry_size);
- rte_prefetch0(key_slot);
+ if (sec_hitmask[i]) {
+ uint32_t first_hit =
+ __builtin_ctzl(sec_hitmask[i]);
+ uint32_t key_idx =
+ secondary_bkt[i]->key_idx[first_hit];
+ const struct rte_hash_key *key_slot =
+ (const struct rte_hash_key *)(
+ (const char *)h->key_store +
+ key_idx * h->key_entry_size);
+ rte_prefetch0(key_slot);
+ }
}
- }
- /* Compare keys, first hits in primary first */
- for (i = 0; i < num_keys; i++) {
- positions[i] = -ENOENT;
- while (prim_hitmask[i]) {
- uint32_t hit_index = __builtin_ctzl(prim_hitmask[i]);
+ /* Compare keys, first hits in primary first */
+ for (i = 0; i < num_keys; i++) {
+ positions[i] = -ENOENT;
+ while (prim_hitmask[i]) {
+ uint32_t hit_index =
+ __builtin_ctzl(prim_hitmask[i]);
- uint32_t key_idx =
- __atomic_load_n(
- &primary_bkt[i]->key_idx[hit_index],
- __ATOMIC_ACQUIRE);
- const struct rte_hash_key *key_slot =
- (const struct rte_hash_key *)(
- (const char *)h->key_store +
- key_idx * h->key_entry_size);
-
- if (key_idx != EMPTY_SLOT)
- pdata[i] = __atomic_load_n(&key_slot->pdata,
- __ATOMIC_ACQUIRE);
- /*
- * If key index is 0, do not compare key,
- * as it is checking the dummy slot
- */
- if (!!key_idx & !rte_hash_cmp_eq(key_slot->key, keys[i], h)) {
- if (data != NULL)
- data[i] = pdata[i];
+ uint32_t key_idx =
+ __atomic_load_n(
+ &primary_bkt[i]->key_idx[hit_index],
+ __ATOMIC_ACQUIRE);
+ const struct rte_hash_key *key_slot =
+ (const struct rte_hash_key *)(
+ (const char *)h->key_store +
+ key_idx * h->key_entry_size);
- hits |= 1ULL << i;
- positions[i] = key_idx - 1;
- goto next_key;
+ if (key_idx != EMPTY_SLOT)
+ pdata[i] = __atomic_load_n(
+ &key_slot->pdata,
+ __ATOMIC_ACQUIRE);
+ /*
+ * If key index is 0, do not compare key,
+ * as it is checking the dummy slot
+ */
+ if (!!key_idx &
+ !rte_hash_cmp_eq(
+ key_slot->key, keys[i], h)) {
+ if (data != NULL)
+ data[i] = pdata[i];
+
+ hits |= 1ULL << i;
+ positions[i] = key_idx - 1;
+ goto next_key;
+ }
+ prim_hitmask[i] &= ~(1 << (hit_index));
}
- prim_hitmask[i] &= ~(1 << (hit_index));
- }
- while (sec_hitmask[i]) {
- uint32_t hit_index = __builtin_ctzl(sec_hitmask[i]);
+ while (sec_hitmask[i]) {
+ uint32_t hit_index =
+ __builtin_ctzl(sec_hitmask[i]);
- uint32_t key_idx =
- __atomic_load_n(
- &secondary_bkt[i]->key_idx[hit_index],
- __ATOMIC_ACQUIRE);
- const struct rte_hash_key *key_slot =
- (const struct rte_hash_key *)(
- (const char *)h->key_store +
- key_idx * h->key_entry_size);
-
- if (key_idx != EMPTY_SLOT)
- pdata[i] = __atomic_load_n(&key_slot->pdata,
- __ATOMIC_ACQUIRE);
-
- /*
- * If key index is 0, do not compare key,
- * as it is checking the dummy slot
- */
+ uint32_t key_idx =
+ __atomic_load_n(
+ &secondary_bkt[i]->key_idx[hit_index],
+ __ATOMIC_ACQUIRE);
+ const struct rte_hash_key *key_slot =
+ (const struct rte_hash_key *)(
+ (const char *)h->key_store +
+ key_idx * h->key_entry_size);
- if (!!key_idx & !rte_hash_cmp_eq(key_slot->key, keys[i], h)) {
- if (data != NULL)
- data[i] = pdata[i];
+ if (key_idx != EMPTY_SLOT)
+ pdata[i] = __atomic_load_n(
+ &key_slot->pdata,
+ __ATOMIC_ACQUIRE);
+ /*
+ * If key index is 0, do not compare key,
+ * as it is checking the dummy slot
+ */
- hits |= 1ULL << i;
- positions[i] = key_idx - 1;
- goto next_key;
+ if (!!key_idx &
+ !rte_hash_cmp_eq(
+ key_slot->key, keys[i], h)) {
+ if (data != NULL)
+ data[i] = pdata[i];
+
+ hits |= 1ULL << i;
+ positions[i] = key_idx - 1;
+ goto next_key;
+ }
+ sec_hitmask[i] &= ~(1 << (hit_index));
}
- sec_hitmask[i] &= ~(1 << (hit_index));
- }
next_key:
- continue;
- }
+ continue;
+ }
+
+ /* The loads of sig_current in compare_signatures
+ * should not move below the load from tbl_chng_cnt.
+ */
+ __atomic_thread_fence(__ATOMIC_ACQUIRE);
+ /* Re-read the table change counter to check if the
+ * table has changed during search. If yes, re-do
+ * the search.
+ * This load should not get hoisted. The load
+ * acquires on cnt_b, primary key index and secondary
+ * key index will make sure that it does not get
+ * hoisted.
+ */
+ cnt_a = __atomic_load_n(&h->tbl_chng_cnt,
+ __ATOMIC_ACQUIRE);
+ } while (cnt_b != cnt_a);
__hash_rw_reader_unlock(h);
diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
index b0c7ef9..5ce864c 100644
--- a/lib/librte_hash/rte_cuckoo_hash.h
+++ b/lib/librte_hash/rte_cuckoo_hash.h
@@ -186,6 +186,8 @@ struct rte_hash {
* to the key table.
*/
rte_rwlock_t *readwrite_lock; /**< Read-write lock thread-safety. */
+ uint32_t tbl_chng_cnt;
+ /**< Indicates if the hash table changed from last read. */
} __rte_cache_aligned;
struct queue_node {
diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
index 9e7d931..05e024b 100644
--- a/lib/librte_hash/rte_hash.h
+++ b/lib/librte_hash/rte_hash.h
@@ -156,7 +156,7 @@ rte_hash_count(const struct rte_hash *h);
* - -ENOSPC if there is no space in the hash for this key.
*/
int
-rte_hash_add_key_data(const struct rte_hash *h, const void *key, void *data);
+rte_hash_add_key_data(struct rte_hash *h, const void *key, void *data);
/**
* Add a key-value pair with a pre-computed hash value
@@ -180,7 +180,7 @@ rte_hash_add_key_data(const struct rte_hash *h, const void *key, void *data);
* - -ENOSPC if there is no space in the hash for this key.
*/
int32_t
-rte_hash_add_key_with_hash_data(const struct rte_hash *h, const void *key,
+rte_hash_add_key_with_hash_data(struct rte_hash *h, const void *key,
hash_sig_t sig, void *data);
/**
@@ -200,7 +200,7 @@ rte_hash_add_key_with_hash_data(const struct rte_hash *h, const void *key,
* array of user data. This value is unique for this key.
*/
int32_t
-rte_hash_add_key(const struct rte_hash *h, const void *key);
+rte_hash_add_key(struct rte_hash *h, const void *key);
/**
* Add a key to an existing hash table.
@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash *h, const void *key);
* array of user data. This value is unique for this key.
*/
int32_t
-rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key, hash_sig_t sig);
+rte_hash_add_key_with_hash(struct rte_hash *h, const void *key, hash_sig_t sig);
/**
* Remove a key from an existing hash table.
--
2.7.4
* [dpdk-dev] [PATCH 4/4] hash: enable lock-free reader-writer concurrency
2018-09-06 17:12 [dpdk-dev] [PATCH 0/4] Address reader-writer concurrency in rte_hash Honnappa Nagarahalli
` (2 preceding siblings ...)
2018-09-06 17:12 ` [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys Honnappa Nagarahalli
@ 2018-09-06 17:12 ` Honnappa Nagarahalli
2018-09-28 1:33 ` Wang, Yipeng1
2018-09-14 21:18 ` [dpdk-dev] [PATCH 0/4] Address reader-writer concurrency in rte_hash Honnappa Nagarahalli
2018-09-27 23:45 ` Wang, Yipeng1
5 siblings, 1 reply; 36+ messages in thread
From: Honnappa Nagarahalli @ 2018-09-06 17:12 UTC (permalink / raw)
To: bruce.richardson, pablo.de.lara.guarch
Cc: dev, honnappa.nagarahalli, gavin.hu, steve.capper, ola.liljedahl,
nd, Honnappa Nagarahalli
Add a flag to enable lock-free reader-writer concurrency at
run time. The rte_hash_del_xxx APIs do not free the key store
element when this flag is enabled. Hence, a new API,
rte_hash_free_key_with_position, is added to free the key store
element.
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
---
lib/librte_hash/rte_cuckoo_hash.c | 105 ++++++++++++++++++++++++++---------
lib/librte_hash/rte_cuckoo_hash.h | 2 +
lib/librte_hash/rte_hash.h | 55 ++++++++++++++++++
lib/librte_hash/rte_hash_version.map | 7 +++
4 files changed, 142 insertions(+), 27 deletions(-)
diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index 1e4a8d4..bf51a73 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -93,6 +93,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
unsigned i;
unsigned int hw_trans_mem_support = 0, multi_writer_support = 0;
unsigned int readwrite_concur_support = 0;
+ unsigned int readwrite_concur_lf_support = 0;
rte_hash_function default_hash_func = (rte_hash_function)rte_jhash;
@@ -124,6 +125,9 @@ rte_hash_create(const struct rte_hash_parameters *params)
multi_writer_support = 1;
}
+ if (params->extra_flag & RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY_LF)
+ readwrite_concur_lf_support = 1;
+
/* Store all keys and leave the first entry as a dummy entry for lookup_bulk */
if (multi_writer_support)
/*
@@ -272,6 +276,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
h->hw_trans_mem_support = hw_trans_mem_support;
h->multi_writer_support = multi_writer_support;
h->readwrite_concur_support = readwrite_concur_support;
+ h->readwrite_concur_lf_support = readwrite_concur_lf_support;
#if defined(RTE_ARCH_X86)
if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
@@ -647,19 +652,21 @@ rte_hash_cuckoo_move_insert_mw(struct rte_hash *h,
return -1;
}
- /* Inform the previous move. The current move need
- * not be informed now as the current bucket entry
- * is present in both primary and secondary.
- * Since there is one writer, load acquires on
- * tbl_chng_cnt are not required.
- */
- __atomic_store_n(&h->tbl_chng_cnt,
- h->tbl_chng_cnt + 1,
- __ATOMIC_RELEASE);
- /* The stores to sig_alt and sig_current should not
- * move above the store to tbl_chng_cnt.
- */
- __atomic_thread_fence(__ATOMIC_RELEASE);
+ if (h->readwrite_concur_lf_support) {
+ /* Inform the previous move. The current move need
+ * not be informed now as the current bucket entry
+ * is present in both primary and secondary.
+ * Since there is one writer, load acquires on
+ * tbl_chng_cnt are not required.
+ */
+ __atomic_store_n(&h->tbl_chng_cnt,
+ h->tbl_chng_cnt + 1,
+ __ATOMIC_RELEASE);
+ /* The stores to sig_alt and sig_current should not
+ * move above the store to tbl_chng_cnt.
+ */
+ __atomic_thread_fence(__ATOMIC_RELEASE);
+ }
/* Need to swap current/alt sig to allow later
* Cuckoo insert to move elements back to its
@@ -679,19 +686,21 @@ rte_hash_cuckoo_move_insert_mw(struct rte_hash *h,
curr_bkt = curr_node->bkt;
}
- /* Inform the previous move. The current move need
- * not be informed now as the current bucket entry
- * is present in both primary and secondary.
- * Since there is one writer, load acquires on
- * tbl_chng_cnt are not required.
- */
- __atomic_store_n(&h->tbl_chng_cnt,
- h->tbl_chng_cnt + 1,
- __ATOMIC_RELEASE);
- /* The stores to sig_alt and sig_current should not
- * move above the store to tbl_chng_cnt.
- */
- __atomic_thread_fence(__ATOMIC_RELEASE);
+ if (h->readwrite_concur_lf_support) {
+ /* Inform the previous move. The current move need
+ * not be informed now as the current bucket entry
+ * is present in both primary and secondary.
+ * Since there is one writer, load acquires on
+ * tbl_chng_cnt are not required.
+ */
+ __atomic_store_n(&h->tbl_chng_cnt,
+ h->tbl_chng_cnt + 1,
+ __ATOMIC_RELEASE);
+ /* The stores to sig_alt and sig_current should not
+ * move above the store to tbl_chng_cnt.
+ */
+ __atomic_thread_fence(__ATOMIC_RELEASE);
+ }
curr_bkt->sig_current[curr_slot] = sig;
curr_bkt->sig_alt[curr_slot] = alt_hash;
@@ -1090,7 +1099,12 @@ search_and_remove(const struct rte_hash *h, const void *key,
k = (struct rte_hash_key *) ((char *)keys +
key_idx * h->key_entry_size);
if (rte_hash_cmp_eq(key, k->key, h) == 0) {
- remove_entry(h, bkt, i);
+ /* Do not free the key store element if
+ * lock-free concurrency is enabled as
+ * readers might still be using it.
+ */
+ if (!h->readwrite_concur_lf_support)
+ remove_entry(h, bkt, i);
__atomic_store_n(&bkt->key_idx[i],
EMPTY_SLOT,
@@ -1177,6 +1191,43 @@ rte_hash_get_key_with_position(const struct rte_hash *h, const int32_t position,
return 0;
}
+int __rte_experimental
+rte_hash_free_key_with_position(const struct rte_hash *h,
+ const int32_t position)
+{
+ RETURN_IF_TRUE(((h == NULL) || (position == EMPTY_SLOT)), -EINVAL);
+
+ unsigned int lcore_id, n_slots;
+ struct lcore_cache *cached_free_slots;
+ const int32_t total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
+
+ /* Out of bounds */
+ if (position >= total_entries)
+ return -EINVAL;
+
+ if (h->multi_writer_support) {
+ lcore_id = rte_lcore_id();
+ cached_free_slots = &h->local_free_slots[lcore_id];
+ /* Cache full, need to free it. */
+ if (cached_free_slots->len == LCORE_CACHE_SIZE) {
+ /* Need to enqueue the free slots in global ring. */
+ n_slots = rte_ring_mp_enqueue_burst(h->free_slots,
+ cached_free_slots->objs,
+ LCORE_CACHE_SIZE, NULL);
+ cached_free_slots->len -= n_slots;
+ }
+ /* Put index of new free slot in cache. */
+ cached_free_slots->objs[cached_free_slots->len] =
+ (void *)((uintptr_t)position);
+ cached_free_slots->len++;
+ } else {
+ rte_ring_sp_enqueue(h->free_slots,
+ (void *)((uintptr_t)position));
+ }
+
+ return 0;
+}
+
static inline void
compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
const struct rte_hash_bucket *prim_bkt,
diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
index 5ce864c..08d67b8 100644
--- a/lib/librte_hash/rte_cuckoo_hash.h
+++ b/lib/librte_hash/rte_cuckoo_hash.h
@@ -168,6 +168,8 @@ struct rte_hash {
/**< If multi-writer support is enabled. */
uint8_t readwrite_concur_support;
/**< If read-write concurrency support is enabled */
+ uint8_t readwrite_concur_lf_support;
+ /**< If read-write concurrency lock free support is enabled */
rte_hash_function hash_func; /**< Function used to calculate hash. */
uint32_t hash_func_init_val; /**< Init value used by hash_func. */
rte_hash_cmp_eq_t rte_hash_custom_cmp_eq;
diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
index 05e024b..7d7372e 100644
--- a/lib/librte_hash/rte_hash.h
+++ b/lib/librte_hash/rte_hash.h
@@ -14,6 +14,8 @@
#include <stdint.h>
#include <stddef.h>
+#include <rte_compat.h>
+
#ifdef __cplusplus
extern "C" {
#endif
@@ -37,6 +39,9 @@ extern "C" {
/** Flag to support reader writer concurrency */
#define RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY 0x04
+/** Flag to support lock free reader writer concurrency */
+#define RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY_LF 0x08
+
/** Signature of key that is stored internally. */
typedef uint32_t hash_sig_t;
@@ -143,6 +148,11 @@ rte_hash_count(const struct rte_hash *h);
* and should only be called from one thread by default.
* Thread safety can be enabled by setting flag during
* table creation.
+ * When lock free reader writer concurrency is enabled,
+ * if this API is called to update an existing entry,
+ * the application should free any memory allocated for
+ * previous 'data' only after all the readers have stopped
+ * using previous 'data'.
*
* @param h
* Hash table to add the key to.
@@ -165,6 +175,11 @@ rte_hash_add_key_data(struct rte_hash *h, const void *key, void *data);
* and should only be called from one thread by default.
* Thread safety can be enabled by setting flag during
* table creation.
+ * When lock free reader writer concurrency is enabled,
+ * if this API is called to update an existing entry,
+ * the application should free any memory allocated for
+ * previous 'data' only after all the readers have stopped
+ * using previous 'data'.
*
* @param h
* Hash table to add the key to.
@@ -230,6 +245,12 @@ rte_hash_add_key_with_hash(struct rte_hash *h, const void *key, hash_sig_t sig);
* and should only be called from one thread by default.
* Thread safety can be enabled by setting flag during
* table creation.
+ * If lock free reader writer concurrency is enabled,
+ * the hash library's internal memory for the deleted
+ * key is not freed. It should be freed by calling
+ * rte_hash_free_key_with_position API after all
+ * the readers have stopped using the hash entry
+ * corresponding to this key.
*
* @param h
* Hash table to remove the key from.
@@ -241,6 +262,8 @@ rte_hash_add_key_with_hash(struct rte_hash *h, const void *key, hash_sig_t sig);
* - A positive value that can be used by the caller as an offset into an
* array of user data. This value is unique for this key, and is the same
* value that was returned when the key was added.
+ * When lock free concurrency is enabled, this value should be used
+ * while calling the rte_hash_free_key_with_position API.
*/
int32_t
rte_hash_del_key(const struct rte_hash *h, const void *key);
@@ -251,6 +274,12 @@ rte_hash_del_key(const struct rte_hash *h, const void *key);
* and should only be called from one thread by default.
* Thread safety can be enabled by setting flag during
* table creation.
+ * If lock free reader writer concurrency is enabled,
+ * the hash library's internal memory for the deleted
+ * key is not freed. It should be freed by calling
+ * rte_hash_free_key_with_position API after all
+ * the readers have stopped using the hash entry
+ * corresponding to this key.
*
* @param h
* Hash table to remove the key from.
@@ -264,6 +293,8 @@ rte_hash_del_key(const struct rte_hash *h, const void *key);
* - A positive value that can be used by the caller as an offset into an
* array of user data. This value is unique for this key, and is the same
* value that was returned when the key was added.
+ * When lock free concurrency is enabled, this value should be used
+ * while calling the rte_hash_free_key_with_position API.
*/
int32_t
rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key, hash_sig_t sig);
@@ -290,6 +321,30 @@ rte_hash_get_key_with_position(const struct rte_hash *h, const int32_t position,
void **key);
/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Free hash library's internal memory given the position
+ * of the key. This operation is not multi-thread safe and should
+ * only be called from one thread by default. Thread safety
+ * can be enabled by setting flag during table creation.
+ * If lock free reader writer concurrency is enabled,
+ * the hash library's internal memory must be freed using this API
+ * after the key is deleted using rte_hash_del_key_xxx API.
+ *
+ * @param h
+ * Hash table to get the key from.
+ * @param position
+ * Position returned when the key was deleted.
+ * @return
+ * - 0 if freed successfully
+ * - -EINVAL if the parameters are invalid.
+ */
+int __rte_experimental
+rte_hash_free_key_with_position(const struct rte_hash *h,
+ const int32_t position);
+
+/**
* Find a key-value pair in the hash table.
* This operation is multi-thread safe with regarding to other lookup threads.
* Read-write concurrency can be enabled by setting flag during
diff --git a/lib/librte_hash/rte_hash_version.map b/lib/librte_hash/rte_hash_version.map
index e216ac8..734ae28 100644
--- a/lib/librte_hash/rte_hash_version.map
+++ b/lib/librte_hash/rte_hash_version.map
@@ -53,3 +53,10 @@ DPDK_18.08 {
rte_hash_count;
} DPDK_16.07;
+
+EXPERIMENTAL {
+ global:
+
+ rte_hash_free_key_with_position;
+
+};
--
2.7.4
* Re: [dpdk-dev] [PATCH 0/4] Address reader-writer concurrency in rte_hash
2018-09-06 17:12 [dpdk-dev] [PATCH 0/4] Address reader-writer concurrency in rte_hash Honnappa Nagarahalli
` (3 preceding siblings ...)
2018-09-06 17:12 ` [dpdk-dev] [PATCH 4/4] hash: enable lock-free reader-writer concurrency Honnappa Nagarahalli
@ 2018-09-14 21:18 ` Honnappa Nagarahalli
2018-09-26 14:36 ` Honnappa Nagarahalli
2018-09-27 23:45 ` Wang, Yipeng1
5 siblings, 1 reply; 36+ messages in thread
From: Honnappa Nagarahalli @ 2018-09-14 21:18 UTC (permalink / raw)
To: Honnappa Nagarahalli, bruce.richardson, pablo.de.lara.guarch
Cc: dev, honnappa.nagarahalli, Gavin Hu (Arm Technology China),
Steve Capper, Ola Liljedahl, nd, yipeng1.wang, Michel Machado,
sameh.gobriel
I have added the memory ordering ladder diagrams to the DPDK summit slides to help understand the changes. The slides are available at: https://dpdkuserspace2018.sched.com/event/G44w/lock-free-read-write-concurrency-in-rtehash. Please look at the backup slides.
Thank you,
Honnappa
-----Original Message-----
From: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Sent: Thursday, September 6, 2018 12:12 PM
To: bruce.richardson@intel.com; pablo.de.lara.guarch@intel.com
Cc: dev@dpdk.org; honnappa.nagarahalli; Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; Steve Capper <Steve.Capper@arm.com>; Ola Liljedahl <Ola.Liljedahl@arm.com>; nd <nd@arm.com>; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
Subject: [PATCH 0/4] Address reader-writer concurrency in rte_hash
Currently, reader-writer concurrency problems in rte_hash are
addressed using reader-writer locks. Use of reader-writer locks
results in the following issues:
1) In many of the use cases for the hash table, writer threads
are running on the control plane. If the writer is preempted while
holding the lock, it will block the readers for an extended period,
resulting in packet drops. This problem seems to apply to platforms
with transactional memory support as well because of the algorithm
used for rte_rwlock_write_lock_tm:
static inline void
rte_rwlock_write_lock_tm(rte_rwlock_t *rwl)
{
	if (likely(rte_try_tm(&rwl->cnt)))
		return;
	rte_rwlock_write_lock(rwl);
}
i.e. there is a possibility of falling back to rte_rwlock_write_lock
in failure cases.
2) The reader-writer lock based solution does not address the
following issue:
rte_hash_lookup_xxx APIs return the index of the element in
the key store. The application (reader) can use that index to
reference other data structures in its scope. Because of this,
the index should not be freed until the application has finished
using it.
3) Since the writer blocks all readers, the hash lookup
rate drops significantly when there is writer activity,
even for unrelated entries. The performance numbers
given below clearly indicate this.
A lock-free solution is required to solve these problems. This patch
series adds the lock-free capabilities in the following steps:
1) Correct the alignment for the key store entry to prep for
memory ordering.
2) Add memory ordering to prevent race conditions when a new key
is added to the table.
3) The reader-writer concurrency issue, caused by moving the keys
to their alternate locations during key insert, is solved
by introducing an atomic global counter indicating a change
in the table.
4) This solution also has to solve the issue of readers using
a key store element even after the key is deleted from the
control plane.
To solve this issue, the hash_del_key_xxx APIs do not free
the key store element. The key store element has to be freed
using the newly introduced rte_hash_free_key_with_position API.
It needs to be called once all the readers have stopped using
the key store element. How this is determined is outside
the scope of this patch (RCU is one such mechanism that the
application can use).
5) Finally, a lock-free reader-writer concurrency flag is added
to enable this feature at run time.
Performance numbers:
Scenario: Equal number of writer/reader threads for concurrent
writers and readers. For the readers-only test, the
entries are added upfront.
Current code:
Cores   Lookup   Lookup with add
2        474      246
4        935      579
6       1387     1048
8       1766     1480
10      2119     1951
12      2546     2441
With this patch:
Cores   Lookup   Lookup with add
2        291      211
4        297      196
6        304      198
8        309      202
10       315      205
12       319      209
Honnappa Nagarahalli (4):
hash: correct key store element alignment
hash: add memory ordering to avoid race conditions
hash: fix rw concurrency while moving keys
hash: enable lock-free reader-writer concurrency
lib/librte_hash/rte_cuckoo_hash.c | 445 +++++++++++++++++++++++++----------
lib/librte_hash/rte_cuckoo_hash.h | 6 +-
lib/librte_hash/rte_hash.h | 63 ++++-
lib/librte_hash/rte_hash_version.map | 7 +
4 files changed, 393 insertions(+), 128 deletions(-)
--
2.7.4
* Re: [dpdk-dev] [PATCH 0/4] Address reader-writer concurrency in rte_hash
2018-09-14 21:18 ` [dpdk-dev] [PATCH 0/4] Address reader-writer concurrency in rte_hash Honnappa Nagarahalli
@ 2018-09-26 14:36 ` Honnappa Nagarahalli
0 siblings, 0 replies; 36+ messages in thread
From: Honnappa Nagarahalli @ 2018-09-26 14:36 UTC (permalink / raw)
To: Honnappa Nagarahalli, bruce.richardson, pablo.de.lara.guarch
Cc: dev, honnappa.nagarahalli, Gavin Hu (Arm Technology China),
Steve Capper, Ola Liljedahl, nd, yipeng1.wang, Michel Machado,
sameh.gobriel
Hi Bruce/Pablo,
I need to get this into 18.11; I would appreciate any review/feedback soon.
Thank you,
Honnappa
> -----Original Message-----
> From: Honnappa Nagarahalli
> Sent: Friday, September 14, 2018 4:19 PM
> To: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>;
> bruce.richardson@intel.com; pablo.de.lara.guarch@intel.com
> Cc: dev@dpdk.org; honnappa.nagarahalli; Gavin Hu (Arm Technology China)
> <Gavin.Hu@arm.com>; Steve Capper <Steve.Capper@arm.com>; Ola Liljedahl
> <Ola.Liljedahl@arm.com>; nd <nd@arm.com>; yipeng1.wang@intel.com;
> Michel Machado <michel@digirati.com.br>; sameh.gobriel@intel.com
> Subject: RE: [PATCH 0/4] Address reader-writer concurrency in rte_hash
>
> I have added the memory ordering ladder diagrams to the DPDK summit slides
> to help understand the changes. The slides are available at:
> https://dpdkuserspace2018.sched.com/event/G44w/lock-free-read-write-
> concurrency-in-rtehash. Please look at the backup slides.
>
> Thank you,
> Honnappa
>
> -----Original Message-----
> From: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Sent: Thursday, September 6, 2018 12:12 PM
> To: bruce.richardson@intel.com; pablo.de.lara.guarch@intel.com
> Cc: dev@dpdk.org; honnappa.nagarahalli; Gavin Hu (Arm Technology China)
> <Gavin.Hu@arm.com>; Steve Capper <Steve.Capper@arm.com>; Ola Liljedahl
> <Ola.Liljedahl@arm.com>; nd <nd@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>
> Subject: [PATCH 0/4] Address reader-writer concurrency in rte_hash
>
> Currently, reader-writer concurrency problems in rte_hash are
> addressed using reader-writer locks. Use of reader-writer locks
> results in the following issues:
>
> 1) In many of the use cases for the hash table, writer threads
> are running on the control plane. If the writer is preempted while
> holding the lock, it will block the readers for an extended period,
> resulting in packet drops. This problem seems to apply to platforms
> with transactional memory support as well because of the algorithm
> used for rte_rwlock_write_lock_tm:
>
> static inline void
> rte_rwlock_write_lock_tm(rte_rwlock_t *rwl)
> {
> 	if (likely(rte_try_tm(&rwl->cnt)))
> 		return;
> 	rte_rwlock_write_lock(rwl);
> }
>
> i.e. there is a possibility of falling back to rte_rwlock_write_lock
> in failure cases.
> 2) The reader-writer lock based solution does not address the
> following issue:
> rte_hash_lookup_xxx APIs return the index of the element in
> the key store. The application (reader) can use that index to
> reference other data structures in its scope. Because of this,
> the index should not be freed until the application has finished
> using it.
> 3) Since the writer blocks all readers, the hash lookup
> rate drops significantly when there is writer activity,
> even for unrelated entries. The performance numbers
> given below clearly indicate this.
>
> A lock-free solution is required to solve these problems. This patch
> series adds the lock-free capabilities in the following steps:
>
> 1) Correct the alignment for the key store entry to prep for
> memory ordering.
> 2) Add memory ordering to prevent race conditions when a new key
> is added to the table.
>
> 3) The reader-writer concurrency issue, caused by moving the keys
> to their alternate locations during key insert, is solved
> by introducing an atomic global counter indicating a change
> in the table.
>
> 4) This solution also has to solve the issue of readers using
> a key store element even after the key is deleted from the
> control plane.
> To solve this issue, the hash_del_key_xxx APIs do not free
> the key store element. The key store element has to be freed
> using the newly introduced rte_hash_free_key_with_position API.
> It needs to be called once all the readers have stopped using
> the key store element. How this is determined is outside
> the scope of this patch (RCU is one such mechanism that the
> application can use).
>
> 5) Finally, a lock-free reader-writer concurrency flag is added
> to enable this feature at run time.
>
> Performance numbers:
> Scenario: Equal number of writer/reader threads for concurrent
> writers and readers. For the readers-only test, the
> entries are added upfront.
>
> Current code:
> Cores   Lookup   Lookup with add
> 2        474      246
> 4        935      579
> 6       1387     1048
> 8       1766     1480
> 10      2119     1951
> 12      2546     2441
>
> With this patch:
> Cores   Lookup   Lookup with add
> 2        291      211
> 4        297      196
> 6        304      198
> 8        309      202
> 10       315      205
> 12       319      209
>
> Honnappa Nagarahalli (4):
> hash: correct key store element alignment
> hash: add memory ordering to avoid race conditions
> hash: fix rw concurrency while moving keys
> hash: enable lock-free reader-writer concurrency
>
> lib/librte_hash/rte_cuckoo_hash.c | 445 +++++++++++++++++++++++++-------
> ---
> lib/librte_hash/rte_cuckoo_hash.h | 6 +-
> lib/librte_hash/rte_hash.h | 63 ++++-
> lib/librte_hash/rte_hash_version.map | 7 +
> 4 files changed, 393 insertions(+), 128 deletions(-)
>
> --
> 2.7.4
* Re: [dpdk-dev] [PATCH 0/4] Address reader-writer concurrency in rte_hash
2018-09-06 17:12 [dpdk-dev] [PATCH 0/4] Address reader-writer concurrency in rte_hash Honnappa Nagarahalli
` (4 preceding siblings ...)
2018-09-14 21:18 ` [dpdk-dev] [PATCH 0/4] Address reader-writer concurrency in rte_hash Honnappa Nagarahalli
@ 2018-09-27 23:45 ` Wang, Yipeng1
2018-09-28 21:11 ` Honnappa Nagarahalli
5 siblings, 1 reply; 36+ messages in thread
From: Wang, Yipeng1 @ 2018-09-27 23:45 UTC (permalink / raw)
To: Honnappa Nagarahalli, Richardson, Bruce, De Lara Guarch, Pablo
Cc: dev, honnappa.nagarahalli, gavin.hu, steve.capper, ola.liljedahl,
nd, Gobriel, Sameh
Hi Honnappa,
Reply inlined:
>-----Original Message-----
>
> Currently, reader-writer concurrency problems in rte_hash are
> addressed using reader-writer locks. Use of reader-writer locks
> results in the following issues:
>
> 1) In many of the use cases for the hash table, writer threads
> are running on the control plane. If the writer is preempted while
> holding the lock, it will block the readers for an extended period,
> resulting in packet drops. This problem seems to apply to platforms
> with transactional memory support as well because of the algorithm
> used for rte_rwlock_write_lock_tm:
>
> static inline void
> rte_rwlock_write_lock_tm(rte_rwlock_t *rwl)
> {
> 	if (likely(rte_try_tm(&rwl->cnt)))
> 		return;
> 	rte_rwlock_write_lock(rwl);
> }
>
> i.e. there is a possibility of falling back to rte_rwlock_write_lock
> in failure cases.
[Wang, Yipeng] In our test, TSX failure happens very rarely on a TSX platform. But we agree
that without TSX, the current rte_rwlock implementation may make the writer
hold a lock for a period of time.
> 2) The reader-writer lock based solution does not address the
> following issue:
> rte_hash_lookup_xxx APIs return the index of the element in
> the key store. The application (reader) can use that index to
> reference other data structures in its scope. Because of this,
> the index should not be freed until the application has finished
> using it.
[Wang, Yipeng] I agree on this use case. But I think we should provide new API functions
for deletion to users who want this behavior,
without changing the meaning of the current API, if that is possible.
> Current code:
> Cores Lookup Lookup
> with add
> 2 474 246
> 4 935 579
> 6 1387 1048
> 8 1766 1480
> 10 2119 1951
> 12 2546 2441
>
> With this patch:
> Cores Lookup Lookup
> with add
> 2 291 211
> 4 297 196
> 6 304 198
> 8 309 202
> 10 315 205
> 12 319 209
>
[Wang, Yipeng] It would be good if you could provide the platform information on these results.
Another comment: as you know, we also have a new extension to rte_hash that enables
extendable buckets and partial-key hashing. Thanks for the comments, btw. Could you
check whether your lockless scheme also applies to those extensions?
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [dpdk-dev] [PATCH 1/4] hash: correct key store element alignment
2018-09-06 17:12 ` [dpdk-dev] [PATCH 1/4] hash: correct key store element alignment Honnappa Nagarahalli
@ 2018-09-27 23:58 ` Wang, Yipeng1
0 siblings, 0 replies; 36+ messages in thread
From: Wang, Yipeng1 @ 2018-09-27 23:58 UTC (permalink / raw)
To: Honnappa Nagarahalli, Richardson, Bruce, De Lara Guarch, Pablo
Cc: dev, honnappa.nagarahalli, gavin.hu, steve.capper, ola.liljedahl,
nd, Gobriel, Sameh
>-----Original Message-----
>From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Honnappa Nagarahalli
>Sent: Thursday, September 6, 2018 10:12 AM
>To: Richardson, Bruce <bruce.richardson@intel.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
>Cc: dev@dpdk.org; honnappa.nagarahalli@dpdk.org; gavin.hu@arm.com; steve.capper@arm.com; ola.liljedahl@arm.com;
>nd@arm.com; Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>Subject: [dpdk-dev] [PATCH 1/4] hash: correct key store element alignment
>
>Correct the key store array element alignment. This is required to
>make 'pdata' in 'struct rte_hash_key' align on the correct boundary.
>
>Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>Reviewed-by: Gavin Hu <gavin.hu@arm.com>
>Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
>Reviewed-by: Steve Capper <steve.capper@arm.com>
[Wang, Yipeng] Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
We agree on this change.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [dpdk-dev] [PATCH 2/4] hash: add memory ordering to avoid race conditions
2018-09-06 17:12 ` [dpdk-dev] [PATCH 2/4] hash: add memory ordering to avoid race conditions Honnappa Nagarahalli
@ 2018-09-28 0:43 ` Wang, Yipeng1
2018-09-30 22:20 ` Honnappa Nagarahalli
2018-10-01 10:42 ` Ola Liljedahl
0 siblings, 2 replies; 36+ messages in thread
From: Wang, Yipeng1 @ 2018-09-28 0:43 UTC (permalink / raw)
To: Honnappa Nagarahalli, Richardson, Bruce, De Lara Guarch, Pablo
Cc: dev, gavin.hu, steve.capper, ola.liljedahl, nd
Some general comments on the various __atomic_store/load calls added:
1. Although it passes the compiler check, I just want to confirm whether we should use the
GCC/clang builtins, or whether there are higher-level APIs in DPDK to do atomic operations.
2. We believe the compiler will translate the atomic store/load to a regular MOV instruction on
a Total Store Order architecture (e.g. x86_64). But we ran the perf test on x86 and here is the
relative slowdown on lookup compared to master head. I am not sure if the performance drop
comes from the atomic builtins.
Keysize | single lookup | bulk lookup
4 | 0.93 | 0.95
8 | 0.95 | 0.96
16 | 0.97 | 0.96
32 | 0.97 | 1.00
48 | 1.03 | 0.99
64 | 1.04 | 0.98
9 | 0.91 | 0.96
13 | 0.97 | 0.98
37 | 1.04 | 1.03
40 | 1.02 | 0.98
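For reference, here is a minimal sketch of the release/acquire pairing the series relies on. The structures and names below are illustrative stand-ins, not the actual patch code; on x86 (TSO) both builtins compile to plain MOVs, but they still constrain the compiler:

#include <stdint.h>
#include <string.h>

/* Illustrative stand-ins for the hash structures (not the real layout). */
struct slot { char key[16]; };
static struct slot key_store[64];
static uint32_t published_idx;          /* 0 means empty, like EMPTY_SLOT */

/* Writer: fill in the key, then publish the index. The release store
 * keeps the key bytes from sinking below the store that exposes them.
 */
static void publish(uint32_t idx, const char *key)
{
        memcpy(key_store[idx].key, key, sizeof(key_store[idx].key));
        __atomic_store_n(&published_idx, idx, __ATOMIC_RELEASE);
}

/* Reader: acquire the index, then read the key. The acquire load keeps
 * the key reads from being hoisted above the index load.
 */
static const char *lookup(void)
{
        uint32_t idx = __atomic_load_n(&published_idx, __ATOMIC_ACQUIRE);
        return idx != 0 ? key_store[idx].key : NULL;
}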
Other comments inlined.
>-----Original Message-----
>Only race condition that can occur is - using the key store element
>before the key write is completed. Hence, while inserting the element
>the release memory order is used. Any other race condition is caught
>by the key comparison. Memory orderings are added only where needed.
>For ex: reads in the writer's context do not need memory ordering
>as there is a single writer.
>
[Wang, Yipeng] I assume this commit works together with the next one to enable
read-write concurrency on relaxed memory ordering machines. This commit by itself does not do that, right?
If this is the case, should we merge the two commits,
or maybe just explain it a little bit more in the commit message?
>key_idx in the bucket entry and pdata in the key store element are
>used for synchronisation. key_idx is used to release an inserted
>entry in the bucket to the reader. Use of pdata for synchronisation
>is required due to the update of an existing entry, wherein only
>the pdata is updated without updating key_idx.
>
>
>-/* Search a key from bucket and update its data */
>+/* Search a key from bucket and update its data.
>+ * Writer holds the lock before calling this.
>+ */
> static inline int32_t
> search_and_update(const struct rte_hash *h, void *data, const void *key,
> struct rte_hash_bucket *bkt, hash_sig_t sig, hash_sig_t alt_hash)
>@@ -499,8 +501,13 @@ search_and_update(const struct rte_hash *h, void *data, const void *key,
> k = (struct rte_hash_key *) ((char *)keys +
> bkt->key_idx[i] * h->key_entry_size);
> if (rte_hash_cmp_eq(key, k->key, h) == 0) {
>- /* Update data */
>- k->pdata = data;
>+ /* 'pdata' acts as the synchronization point
>+ * when an existing hash entry is updated.
>+ * Key is not updated in this case.
>+ */
[Wang, Yipeng] I did not quite understand why we need synchronization for the hash data update.
Since the pdata write is already atomic, the lookup will read out either the stale
data or the new data, which should be fine without synchronization.
Is it to ensure the order of multiple reads in the lookup threads?
>@@ -1243,11 +1289,19 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
> while (sec_hitmask[i]) {
> uint32_t hit_index = __builtin_ctzl(sec_hitmask[i]);
>
>- uint32_t key_idx = secondary_bkt[i]->key_idx[hit_index];
>+ uint32_t key_idx =
>+ __atomic_load_n(
>+ &secondary_bkt[i]->key_idx[hit_index],
>+ __ATOMIC_ACQUIRE);
> const struct rte_hash_key *key_slot =
> (const struct rte_hash_key *)(
> (const char *)h->key_store +
> key_idx * h->key_entry_size);
>+
>+ if (key_idx != EMPTY_SLOT)
[Wang, Yipeng] I think even for the current code, we need to check for EMPTY_SLOT.
Could you export this as a bug-fix commit?
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
2018-09-06 17:12 ` [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys Honnappa Nagarahalli
@ 2018-09-28 1:00 ` Wang, Yipeng1
2018-09-28 8:26 ` Bruce Richardson
2018-09-30 23:05 ` Honnappa Nagarahalli
0 siblings, 2 replies; 36+ messages in thread
From: Wang, Yipeng1 @ 2018-09-28 1:00 UTC (permalink / raw)
To: Honnappa Nagarahalli, Richardson, Bruce, De Lara Guarch, Pablo
Cc: dev, gavin.hu, steve.capper, ola.liljedahl, nd
Reply inlined:
>-----Original Message-----
>From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Honnappa Nagarahalli
>Sent: Thursday, September 6, 2018 10:12 AM
>To: Richardson, Bruce <bruce.richardson@intel.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
>Cc: dev@dpdk.org; honnappa.nagarahalli@dpdk.org; gavin.hu@arm.com; steve.capper@arm.com; ola.liljedahl@arm.com;
>nd@arm.com; Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>Subject: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
>
>Reader-writer concurrency issue, caused by moving the keys
>to their alternative locations during key insert, is solved
>by introducing a global counter(tbl_chng_cnt) indicating a
>change in table.
>
>@@ -662,6 +679,20 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
> curr_bkt = curr_node->bkt;
> }
>
>+ /* Inform the previous move. The current move need
>+ * not be informed now as the current bucket entry
>+ * is present in both primary and secondary.
>+ * Since there is one writer, load acquires on
>+ * tbl_chng_cnt are not required.
>+ */
>+ __atomic_store_n(&h->tbl_chng_cnt,
>+ h->tbl_chng_cnt + 1,
>+ __ATOMIC_RELEASE);
>+ /* The stores to sig_alt and sig_current should not
>+ * move above the store to tbl_chng_cnt.
>+ */
>+ __atomic_thread_fence(__ATOMIC_RELEASE);
>+
[Wang, Yipeng] I believe on x86 this fence should not compile to any code; otherwise
we need macros for a compile-time check.
>@@ -926,30 +957,56 @@ __rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
> uint32_t bucket_idx;
> hash_sig_t alt_hash;
> struct rte_hash_bucket *bkt;
>+ uint32_t cnt_b, cnt_a;
> int ret;
>
>- bucket_idx = sig & h->bucket_bitmask;
>- bkt = &h->buckets[bucket_idx];
>-
> __hash_rw_reader_lock(h);
>
>- /* Check if key is in primary location */
>- ret = search_one_bucket(h, key, sig, data, bkt);
>- if (ret != -1) {
>- __hash_rw_reader_unlock(h);
>- return ret;
>- }
>- /* Calculate secondary hash */
>- alt_hash = rte_hash_secondary_hash(sig);
>- bucket_idx = alt_hash & h->bucket_bitmask;
>- bkt = &h->buckets[bucket_idx];
>+ do {
[Wang, Yipeng] As far as I know, the MemC3 paper "MemC3: Compact and Concurrent
MemCache with Dumber Caching and Smarter Hashing"
as well as OvS cmap use a similar version counter to implement read-write concurrency for a hash table,
but one difference is that the reader checks whether the version counter is even or odd to make sure
there is no concurrent writer. Could you double-check and confirm that this is not needed for your implementation?
>--- a/lib/librte_hash/rte_hash.h
>+++ b/lib/librte_hash/rte_hash.h
>@@ -156,7 +156,7 @@ rte_hash_count(const struct rte_hash *h);
> * - -ENOSPC if there is no space in the hash for this key.
> */
> int
>-rte_hash_add_key_data(const struct rte_hash *h, const void *key, void *data);
>+rte_hash_add_key_data(struct rte_hash *h, const void *key, void *data);
>
> /**
> * Add a key-value pair with a pre-computed hash value
>@@ -180,7 +180,7 @@ rte_hash_add_key_data(const struct rte_hash *h, const void *key, void *data);
> * - -ENOSPC if there is no space in the hash for this key.
> */
> int32_t
>-rte_hash_add_key_with_hash_data(const struct rte_hash *h, const void *key,
>+rte_hash_add_key_with_hash_data(struct rte_hash *h, const void *key,
> hash_sig_t sig, void *data);
>
> /**
>@@ -200,7 +200,7 @@ rte_hash_add_key_with_hash_data(const struct rte_hash *h, const void *key,
> * array of user data. This value is unique for this key.
> */
> int32_t
>-rte_hash_add_key(const struct rte_hash *h, const void *key);
>+rte_hash_add_key(struct rte_hash *h, const void *key);
>
> /**
> * Add a key to an existing hash table.
>@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash *h, const void *key);
> * array of user data. This value is unique for this key.
> */
> int32_t
>-rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key, hash_sig_t sig);
>+rte_hash_add_key_with_hash(struct rte_hash *h, const void *key, hash_sig_t sig);
>
> /
I think the above changes will break ABI by changing the parameter type? Other people may know better on this.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [dpdk-dev] [PATCH 4/4] hash: enable lock-free reader-writer concurrency
2018-09-06 17:12 ` [dpdk-dev] [PATCH 4/4] hash: enable lock-free reader-writer concurrency Honnappa Nagarahalli
@ 2018-09-28 1:33 ` Wang, Yipeng1
2018-10-01 4:11 ` Honnappa Nagarahalli
0 siblings, 1 reply; 36+ messages in thread
From: Wang, Yipeng1 @ 2018-09-28 1:33 UTC (permalink / raw)
To: Honnappa Nagarahalli, Richardson, Bruce, De Lara Guarch, Pablo
Cc: dev, gavin.hu, steve.capper, ola.liljedahl, nd, Gobriel, Sameh
Reply inlined:
>-----Original Message-----
>From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Honnappa Nagarahalli
>Sent: Thursday, September 6, 2018 10:12 AM
>To: Richardson, Bruce <bruce.richardson@intel.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
>Cc: dev@dpdk.org; honnappa.nagarahalli@dpdk.org; gavin.hu@arm.com; steve.capper@arm.com; ola.liljedahl@arm.com;
>nd@arm.com; Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>Subject: [dpdk-dev] [PATCH 4/4] hash: enable lock-free reader-writer concurrency
>
>Add the flag to enable reader-writer concurrency during
>run time. The rte_hash_del_xxx APIs do not free the keystore
>element when this flag is enabled. Hence a new API,
>rte_hash_free_key_with_position, to free the key store element
>is added.
>
>+/** Flag to support lock free reader writer concurrency */
>+#define RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY_LF 0x08
[Wang, Yipeng] It would be good to indicate that the lockless implementation works for a single writer with multiple readers.
Also, if people use a mix of the flags, for example setting both the multi-writer and LF flags, then I guess we either need to return an error or
maybe multi-writer should have higher priority. Currently RW_CONCURRENCY will assume MULTI_WRITER_ADD, I think.
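For illustration, the "return an error" option could look like the hypothetical check below, placed early in rte_hash_create(). This is only a sketch of the suggestion, not code from the patch (and the reply further down in the thread notes the multi-writer + LF combination is in fact supported):

/* Hypothetical parameter check: reject the ambiguous flag combination
 * up front instead of silently giving one flag priority.
 * Assumes <rte_errno.h> is included.
 */
if ((params->extra_flag & RTE_HASH_EXTRA_FLAGS_MULTI_WRITER_ADD) &&
    (params->extra_flag & RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY_LF)) {
        rte_errno = EINVAL;
        return NULL;
}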
>+
> /** Signature of key that is stored internally. */
> typedef uint32_t hash_sig_t;
>
>@@ -143,6 +148,11 @@ rte_hash_count(const struct rte_hash *h);
> * and should only be called from one thread by default.
> * Thread safety can be enabled by setting flag during
> * table creation.
>+ * When lock free reader writer concurrency is enabled,
>+ * if this API is called to update an existing entry,
>+ * the application should free any memory allocated for
>+ * previous 'data' only after all the readers have stopped
>+ * using previous 'data'.
[Wang, Yipeng] Could you be more specific in this description?
When the add_key API is called, users do not know whether it will update
an existing entry or insert a new one, do they?
> *
> * @param h
> * Hash table to add the key to.
>@@ -165,6 +175,11 @@ rte_hash_add_key_data(struct rte_hash *h, const void *key, void *data);
> * and should only be called from one thread by default.
> * Thread safety can be enabled by setting flag during
> * table creation.
>+ * When lock free reader writer concurrency is enabled,
>+ * if this API is called to update an existing entry,
>+ * the application should free any memory allocated for
>+ * previous 'data' only after all the readers have stopped
>+ * using previous 'data'.
> *
> * @param h
> * Hash table to add the key to.
>@@ -230,6 +245,12 @@ rte_hash_add_key_with_hash(struct rte_hash *h, const void *key, hash_sig_t sig);
> * and should only be called from one thread by default.
> * Thread safety can be enabled by setting flag during
> * table creation.
>+ * If lock free reader writer concurrency is enabled,
>+ * the hash library's internal memory for the deleted
>+ * key is not freed. It should be freed by calling
>+ * rte_hash_free_key_with_position API after all
>+ * the readers have stopped using the hash entry
>+ * corresponding to this key.
> *
> * @param h
> * Hash table to remove the key from.
>@@ -241,6 +262,8 @@ rte_hash_add_key_with_hash(struct rte_hash *h, const void *key, hash_sig_t sig);
> * - A positive value that can be used by the caller as an offset into an
> * array of user data. This value is unique for this key, and is the same
> * value that was returned when the key was added.
>+ * When lock free concurrency is enabled, this value should be used
>+ * while calling the rte_hash_free_key_with_position API.
> */
> int32_t
> rte_hash_del_key(const struct rte_hash *h, const void *key);
>@@ -251,6 +274,12 @@ rte_hash_del_key(const struct rte_hash *h, const void *key);
> * and should only be called from one thread by default.
> * Thread safety can be enabled by setting flag during
> * table creation.
>+ * If lock free reader writer concurrency is enabled,
>+ * the hash library's internal memory for the deleted
>+ * key is not freed. It should be freed by calling
>+ * rte_hash_free_key_with_position API after all
>+ * the readers have stopped using the hash entry
>+ * corresponding to this key.
> *
> * @param h
> * Hash table to remove the key from.
>@@ -264,6 +293,8 @@ rte_hash_del_key(const struct rte_hash *h, const void *key);
> * - A positive value that can be used by the caller as an offset into an
> * array of user data. This value is unique for this key, and is the same
> * value that was returned when the key was added.
>+ * When lock free concurrency is enabled, this value should be used
>+ * while calling the rte_hash_free_key_with_position API.
> */
> int32_t
> rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key, hash_sig_t sig);
>@@ -290,6 +321,30 @@ rte_hash_get_key_with_position(const struct rte_hash *h, const int32_t position,
> void **key);
>
[Wang, Yipeng] If possible, how about having a new delete function instead of modifying the current one?
I think it does not need to be tied to the lockless implementation; it is orthogonal to the multi-threading implementation.
People using locks may still want this new deletion behavior.
If people want the old behavior, they can call the current API; otherwise they can call the new deletion function, followed by
rte_hash_free_key_with_position later.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
2018-09-28 1:00 ` Wang, Yipeng1
@ 2018-09-28 8:26 ` Bruce Richardson
2018-09-28 8:55 ` Van Haaren, Harry
2018-09-30 23:05 ` Honnappa Nagarahalli
1 sibling, 1 reply; 36+ messages in thread
From: Bruce Richardson @ 2018-09-28 8:26 UTC (permalink / raw)
To: Wang, Yipeng1
Cc: Honnappa Nagarahalli, De Lara Guarch, Pablo, dev, gavin.hu,
steve.capper, ola.liljedahl, nd
On Fri, Sep 28, 2018 at 02:00:00AM +0100, Wang, Yipeng1 wrote:
> Reply inlined:
>
> >-----Original Message-----
> >From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Honnappa Nagarahalli
> >Sent: Thursday, September 6, 2018 10:12 AM
> >To: Richardson, Bruce <bruce.richardson@intel.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
> >Cc: dev@dpdk.org; honnappa.nagarahalli@dpdk.org; gavin.hu@arm.com; steve.capper@arm.com; ola.liljedahl@arm.com;
> >nd@arm.com; Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> >Subject: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
> >
> >Reader-writer concurrency issue, caused by moving the keys
> >to their alternative locations during key insert, is solved
> >by introducing a global counter(tbl_chng_cnt) indicating a
> >change in table.
> >
> >@@ -662,6 +679,20 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
> > curr_bkt = curr_node->bkt;
> > }
> >
> >+ /* Inform the previous move. The current move need
> >+ * not be informed now as the current bucket entry
> >+ * is present in both primary and secondary.
> >+ * Since there is one writer, load acquires on
> >+ * tbl_chng_cnt are not required.
> >+ */
> >+ __atomic_store_n(&h->tbl_chng_cnt,
> >+ h->tbl_chng_cnt + 1,
> >+ __ATOMIC_RELEASE);
> >+ /* The stores to sig_alt and sig_current should not
> >+ * move above the store to tbl_chng_cnt.
> >+ */
> >+ __atomic_thread_fence(__ATOMIC_RELEASE);
> >+
> [Wang, Yipeng] I believe on x86 this fence should not compile to any code; otherwise
> we need macros for a compile-time check.
>
> >@@ -926,30 +957,56 @@ __rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
> > uint32_t bucket_idx;
> > hash_sig_t alt_hash;
> > struct rte_hash_bucket *bkt;
> >+ uint32_t cnt_b, cnt_a;
> > int ret;
> >
> >- bucket_idx = sig & h->bucket_bitmask;
> >- bkt = &h->buckets[bucket_idx];
> >-
> > __hash_rw_reader_lock(h);
> >
> >- /* Check if key is in primary location */
> >- ret = search_one_bucket(h, key, sig, data, bkt);
> >- if (ret != -1) {
> >- __hash_rw_reader_unlock(h);
> >- return ret;
> >- }
> >- /* Calculate secondary hash */
> >- alt_hash = rte_hash_secondary_hash(sig);
> >- bucket_idx = alt_hash & h->bucket_bitmask;
> >- bkt = &h->buckets[bucket_idx];
> >+ do {
> [Wang, Yipeng] As far as I know, the MemC3 paper "MemC3: Compact and Concurrent
> MemCache with Dumber Caching and Smarter Hashing"
> as well as OvS cmap use a similar version counter to implement read-write concurrency for a hash table,
> but one difference is that the reader checks whether the version counter is even or odd to make sure
> there is no concurrent writer. Could you double-check and confirm that this is not needed for your implementation?
>
> >--- a/lib/librte_hash/rte_hash.h
> >+++ b/lib/librte_hash/rte_hash.h
> >@@ -156,7 +156,7 @@ rte_hash_count(const struct rte_hash *h);
> > * - -ENOSPC if there is no space in the hash for this key.
> > */
> > int
> >-rte_hash_add_key_data(const struct rte_hash *h, const void *key, void *data);
> >+rte_hash_add_key_data(struct rte_hash *h, const void *key, void *data);
> >
> > /**
> > * Add a key-value pair with a pre-computed hash value
> >@@ -180,7 +180,7 @@ rte_hash_add_key_data(const struct rte_hash *h, const void *key, void *data);
> > * - -ENOSPC if there is no space in the hash for this key.
> > */
> > int32_t
> >-rte_hash_add_key_with_hash_data(const struct rte_hash *h, const void *key,
> >+rte_hash_add_key_with_hash_data(struct rte_hash *h, const void *key,
> > hash_sig_t sig, void *data);
> >
> > /**
> >@@ -200,7 +200,7 @@ rte_hash_add_key_with_hash_data(const struct rte_hash *h, const void *key,
> > * array of user data. This value is unique for this key.
> > */
> > int32_t
> >-rte_hash_add_key(const struct rte_hash *h, const void *key);
> >+rte_hash_add_key(struct rte_hash *h, const void *key);
> >
> > /**
> > * Add a key to an existing hash table.
> >@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash *h, const void *key);
> > * array of user data. This value is unique for this key.
> > */
> > int32_t
> >-rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key, hash_sig_t sig);
> >+rte_hash_add_key_with_hash(struct rte_hash *h, const void *key, hash_sig_t sig);
> >
> > /
>
> I think the above changes will break ABI by changing the parameter type? Other people may know better on this.
Just removing a const should not change the ABI, I believe, since the const
is just an advisory hint to the compiler. The actual parameter size and count
remain unchanged, so I don't believe there is an issue.
[ABI experts, please correct me if I'm wrong on this]
/Bruce
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
2018-09-28 8:26 ` Bruce Richardson
@ 2018-09-28 8:55 ` Van Haaren, Harry
2018-09-30 22:33 ` Honnappa Nagarahalli
0 siblings, 1 reply; 36+ messages in thread
From: Van Haaren, Harry @ 2018-09-28 8:55 UTC (permalink / raw)
To: Richardson, Bruce, Wang, Yipeng1
Cc: Honnappa Nagarahalli, De Lara Guarch, Pablo, dev, gavin.hu,
steve.capper, ola.liljedahl, nd
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
> Sent: Friday, September 28, 2018 9:26 AM
> To: Wang, Yipeng1 <yipeng1.wang@intel.com>
> Cc: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>; De Lara Guarch,
> Pablo <pablo.de.lara.guarch@intel.com>; dev@dpdk.org; gavin.hu@arm.com;
> steve.capper@arm.com; ola.liljedahl@arm.com; nd@arm.com
> Subject: Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving
> keys
>
> On Fri, Sep 28, 2018 at 02:00:00AM +0100, Wang, Yipeng1 wrote:
> > Reply inlined:
> >
> > >-----Original Message-----
> > >From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Honnappa Nagarahalli
> > >Sent: Thursday, September 6, 2018 10:12 AM
> > >To: Richardson, Bruce <bruce.richardson@intel.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>
> > >Cc: dev@dpdk.org; honnappa.nagarahalli@dpdk.org; gavin.hu@arm.com;
> steve.capper@arm.com; ola.liljedahl@arm.com;
> > >nd@arm.com; Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > >Subject: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving
> keys
> > >
> > >Reader-writer concurrency issue, caused by moving the keys
> > >to their alternative locations during key insert, is solved
> > >by introducing a global counter(tbl_chng_cnt) indicating a
> > >change in table.
<snip>
> > > /**
> > >@@ -200,7 +200,7 @@ rte_hash_add_key_with_hash_data(const struct rte_hash
> *h, const void *key,
> > > * array of user data. This value is unique for this key.
> > > */
> > > int32_t
> > >-rte_hash_add_key(const struct rte_hash *h, const void *key);
> > >+rte_hash_add_key(struct rte_hash *h, const void *key);
> > >
> > > /**
> > > * Add a key to an existing hash table.
> > >@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash *h, const void
> *key);
> > > * array of user data. This value is unique for this key.
> > > */
> > > int32_t
> > >-rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
> hash_sig_t sig);
> > >+rte_hash_add_key_with_hash(struct rte_hash *h, const void *key,
> hash_sig_t sig);
> > >
> > > /
> >
> > I think the above changes will break ABI by changing the parameter type?
> Other people may know better on this.
>
> Just removing a const should not change the ABI, I believe, since the const
> is just advisory hint to the compiler. Actual parameter size and count
> remains unchanged so I don't believe there is an issue.
> [ABI experts, please correct me if I'm wrong on this]
[Certainly no ABI expert, but...]
I think this is an API break, not an ABI break.
Given application code as follows, it will fail to compile, even though
running the new code as a .so wouldn't cause any issues (AFAIK).
void do_hash_stuff(const struct rte_hash *h, ...)
{
        /* parameter passed in is const, but updated function prototype is non-const */
        rte_hash_add_key_with_hash(h, ...);
}
This means that we can't recompile apps against the latest patch without application
code changes, if the app was passing a const rte_hash struct as the first parameter.
-Harry
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [dpdk-dev] [PATCH 0/4] Address reader-writer concurrency in rte_hash
2018-09-27 23:45 ` Wang, Yipeng1
@ 2018-09-28 21:11 ` Honnappa Nagarahalli
2018-10-02 0:30 ` Wang, Yipeng1
0 siblings, 1 reply; 36+ messages in thread
From: Honnappa Nagarahalli @ 2018-09-28 21:11 UTC (permalink / raw)
To: Wang, Yipeng1, Richardson, Bruce, De Lara Guarch, Pablo
Cc: dev, honnappa.nagarahalli, Gavin Hu (Arm Technology China),
Steve Capper, Ola Liljedahl, nd, Gobriel, Sameh
>
> Hi Honnappa,
>
> Reply inlined:
Hi Yipeng,
Thank you so much for reviewing.
>
> >-----Original Message-----
> >
> > Currently, reader-writer concurrency problems in rte_hash are
> > addressed using reader-writer locks. Use of reader-writer locks
> > results in following issues:
> >
> > 1) In many of the use cases for the hash table, writer threads
> > are running on control plane. If the writer is preempted while
> > holding the lock, it will block the readers for an extended period
> > resulting in packet drops. This problem seems to apply for platforms
> > with transactional memory support as well because of the algorithm
> > used for rte_rwlock_write_lock_tm:
> >
> > static inline void
> > rte_rwlock_write_lock_tm(rte_rwlock_t *rwl)
> > {
> > if (likely(rte_try_tm(&rwl->cnt)))
> > return;
> > rte_rwlock_write_lock(rwl);
> > }
> >
> > i.e. there is a possibility of using rte_rwlock_write_lock in
> > failure cases.
> [Wang, Yipeng] In our test, TSX failures happen very rarely on a TSX
> platform. But we agree that without TSX, the current rte_rwlock
> implementation may cause the writer to hold a lock for an extended period.
>
> > 2) Reader-writer lock based solution does not address the following
> > issue.
> > rte_hash_lookup_xxx APIs return the index of the element in
> > the key store. Application(reader) can use that index to reference
> > other data structures in its scope. Because of this, the
> > index should not be freed till the application completes
> > using the index.
> [Wang, Yipeng] I agree on this use case. But I think we should provide new
> API functions for deletion to users who want this behavior, without
> changing the meaning of the current API, if that is possible.
In the lock-free algorithm, the rte_hash_del_xxx APIs will not free the index; the new rte_hash_free_key_with_position API will free it. The solution for the rw-lock based algorithm still needs to be thought through.
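To make the intended flow concrete, a minimal sketch follows. rte_hash_del_key and rte_hash_free_key_with_position are the APIs from this series; readers_quiesced() is a hypothetical placeholder for whatever grace-period mechanism (e.g. RCU) the application uses:

int32_t pos = rte_hash_del_key(h, key);
if (pos >= 0) {
        /* The entry is no longer visible to new lookups, but in-flight
         * readers may still be using the key-store slot at 'pos'.
         */
        while (!readers_quiesced())
                rte_pause();            /* application-specific wait */
        /* Now the slot can be returned for reuse. */
        rte_hash_free_key_with_position(h, pos);
}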
>
> > Current code:
> > Cores Lookup Lookup
> > with add
> > 2 474 246
> > 4 935 579
> > 6 1387 1048
> > 8 1766 1480
> > 10 2119 1951
> > 12 2546 2441
> >
> > With this patch:
> > Cores Lookup Lookup
> > with add
> > 2 291 211
> > 4 297 196
> > 6 304 198
> > 8 309 202
> > 10 315 205
> > 12 319 209
> >
> [Wang, Yipeng] It would be good if you could provide the platform
> information on these results.
Apologies, I should have done that. The machine I am using is: Intel(R) Xeon(R) CPU E5-2660 v4 @ 2.00GHz with 64 GB of memory. This is a hacked test case which is not upstreamed. In the 'Lookup with add' case, I had an equal number of threads calling 'rte_hash_add' and 'rte_hash_lookup'. In the 'Lookup' case, a set of entries was added and all the threads called 'rte_hash_lookup'. Note that these are the numbers without HTM. We have created another test case which I will upstream in the next version of this patch, and I will publish the numbers with that test case, so you should be able to reproduce them.
>
> Another comment: as you know, we also have a new extension to rte_hash
> that enables extendable buckets and partial-key hashing. Thanks for the
> comments, btw. Could you check whether your lockless scheme also applies to
> those extensions?
Thank you for reminding me about this. I thought I had covered everything, but on a second look I have missed a few key issues. I will reply on the other email thread.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [dpdk-dev] [PATCH 2/4] hash: add memory ordering to avoid race conditions
2018-09-28 0:43 ` Wang, Yipeng1
@ 2018-09-30 22:20 ` Honnappa Nagarahalli
2018-10-01 22:41 ` Wang, Yipeng1
2018-10-01 10:42 ` Ola Liljedahl
1 sibling, 1 reply; 36+ messages in thread
From: Honnappa Nagarahalli @ 2018-09-30 22:20 UTC (permalink / raw)
To: Wang, Yipeng1, Richardson, Bruce, De Lara Guarch, Pablo
Cc: dev, Gavin Hu (Arm Technology China), Steve Capper, Ola Liljedahl, nd
>
> Some general comments on the various __atomic_store/load calls added:
>
> 1. Although it passes the compiler check, I just want to confirm whether we
> should use the GCC/clang builtins, or whether there are higher-level APIs in DPDK to do
> atomic operations.
>
I have used the GCC builtins (just like rte_ring does).
> 2. We believe the compiler will translate the atomic store/load to a regular MOV
> instruction on a Total Store Order architecture (e.g. x86_64). But we ran the
> perf test on x86 and here is the relative slowdown on lookup compared to
> master head. I am not sure if the performance drop comes from the atomic
> builtins.
>
C11 atomics also block compiler reordering. Other than this, the retry loop is an addition to lookup.
The patch also has the alignment corrected. I am not sure how that is affecting the perf numbers.
> Keysize | single lookup | bulk lookup
> 4 | 0.93 | 0.95
> 8 | 0.95 | 0.96
> 16 | 0.97 | 0.96
> 32 | 0.97 | 1.00
> 48 | 1.03 | 0.99
> 64 | 1.04 | 0.98
> 9 | 0.91 | 0.96
> 13 | 0.97 | 0.98
> 37 | 1.04 | 1.03
> 40 | 1.02 | 0.98
>
I assume this is the data from the test cases in the test_hash_perf.c file. I tried to reproduce this data, but my data is worse. Can you specify the actual test from test_hash_perf.c you are using (With locks/Pre-computed hash/With data/Elements in primary)?
IMO, the differences you have provided are not high.
> Other comments inlined.
>
> >-----Original Message-----
> >Only race condition that can occur is - using the key store element
> >before the key write is completed. Hence, while inserting the element
> >the release memory order is used. Any other race condition is caught by
> >the key comparison. Memory orderings are added only where needed.
> >For ex: reads in the writer's context do not need memory ordering as
> >there is a single writer.
> >
> [Wang, Yipeng] I assume this commit works together with the next one to
> enable read-write concurrency on relaxed memory ordering machines. This
> commit by itself does not do that, right?
> If this is the case, should we merge the two commits, or maybe just explain it
> a little bit more in the commit message?
This commit works only with the next commit. I kept it separate to make review easier.
I will merge the two in the final patch.
>
> >key_idx in the bucket entry and pdata in the key store element are used
> >for synchronisation. key_idx is used to release an inserted entry in
> >the bucket to the reader. Use of pdata for synchronisation is required
> >due to the update of an existing entry, wherein only the pdata is updated
> >without updating key_idx.
> >
> >
> >-/* Search a key from bucket and update its data */
> >+/* Search a key from bucket and update its data.
> >+ * Writer holds the lock before calling this.
> >+ */
> > static inline int32_t
> > search_and_update(const struct rte_hash *h, void *data, const void *key,
> > struct rte_hash_bucket *bkt, hash_sig_t sig, hash_sig_t alt_hash) @@
> >-499,8 +501,13 @@ search_and_update(const struct rte_hash *h, void *data,
> const void *key,
> > k = (struct rte_hash_key *) ((char *)keys +
> > bkt->key_idx[i] * h->key_entry_size);
> > if (rte_hash_cmp_eq(key, k->key, h) == 0) {
> >- /* Update data */
> >- k->pdata = data;
> >+ /* 'pdata' acts as the synchronization point
> >+ * when an existing hash entry is updated.
> >+ * Key is not updated in this case.
> >+ */
> [Wang, Yipeng] I did not quite understand why we need synchronization
> for the hash data update.
> Since the pdata write is already atomic, the lookup will read out either the stale
> data or the new data, which should be fine without synchronization.
> Is it to ensure the order of multiple reads in the lookup threads?
Yes, it is to ensure the order of reads in the lookup threads. There might be reads in the application that go through 'pdata', and we want to make sure they do not get reordered before the load of 'pdata'.
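To illustrate the hazard with a sketch (struct flow_stats and make_stats() are made up for the example, and pdata is shown as a typed pointer for clarity):

/* Writer, updating an existing entry: initialize the new object fully,
 * then publish it. The release store keeps the initialization from
 * sinking below the pointer update.
 */
struct flow_stats *ns = make_stats();
ns->pkt_count = 0;
__atomic_store_n(&k->pdata, ns, __ATOMIC_RELEASE);

/* Reader: the acquire load keeps the dereference from being reordered
 * before the pointer load, so if the new pointer is observed, the
 * initialized contents behind it are observed too.
 */
struct flow_stats *s = __atomic_load_n(&k->pdata, __ATOMIC_ACQUIRE);
uint64_t n = s->pkt_count;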
>
> >@@ -1243,11 +1289,19 @@ __rte_hash_lookup_bulk(const struct rte_hash
> *h, const void **keys,
> > while (sec_hitmask[i]) {
> > uint32_t hit_index = __builtin_ctzl(sec_hitmask[i]);
> >
> >- uint32_t key_idx = secondary_bkt[i]-
> >key_idx[hit_index];
> >+ uint32_t key_idx =
> >+ __atomic_load_n(
> >+ &secondary_bkt[i]->key_idx[hit_index],
> >+ __ATOMIC_ACQUIRE);
> > const struct rte_hash_key *key_slot =
> > (const struct rte_hash_key *)(
> > (const char *)h->key_store +
> > key_idx * h->key_entry_size);
> >+
> >+ if (key_idx != EMPTY_SLOT)
> [Wang, Yipeng] I think even for the current code, we need to check for EMPTY_SLOT.
> Could you export this as a bug-fix commit?
>
In the existing code, there is a check 'if (!!key_idx & !rte_hash....)'. Are you referring to '!!key_idx'? I think it should be changed to '(key_idx != EMPTY_SLOT)'.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
2018-09-28 8:55 ` Van Haaren, Harry
@ 2018-09-30 22:33 ` Honnappa Nagarahalli
2018-10-02 13:17 ` Van Haaren, Harry
0 siblings, 1 reply; 36+ messages in thread
From: Honnappa Nagarahalli @ 2018-09-30 22:33 UTC (permalink / raw)
To: Van Haaren, Harry, Richardson, Bruce, Wang, Yipeng1
Cc: De Lara Guarch, Pablo, dev, Gavin Hu (Arm Technology China),
Steve Capper, Ola Liljedahl, nd
> > > >
> > > >Reader-writer concurrency issue, caused by moving the keys to their
> > > >alternative locations during key insert, is solved by introducing a
> > > >global counter(tbl_chng_cnt) indicating a change in table.
>
> <snip>
>
> > > > /**
> > > >@@ -200,7 +200,7 @@ rte_hash_add_key_with_hash_data(const struct
> > > >rte_hash
> > *h, const void *key,
> > > > * array of user data. This value is unique for this key.
> > > > */
> > > > int32_t
> > > >-rte_hash_add_key(const struct rte_hash *h, const void *key);
> > > >+rte_hash_add_key(struct rte_hash *h, const void *key);
> > > >
> > > > /**
> > > > * Add a key to an existing hash table.
> > > >@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash *h,
> > > >const void
> > *key);
> > > > * array of user data. This value is unique for this key.
> > > > */
> > > > int32_t
> > > >-rte_hash_add_key_with_hash(const struct rte_hash *h, const void
> > > >*key,
> > hash_sig_t sig);
> > > >+rte_hash_add_key_with_hash(struct rte_hash *h, const void *key,
> > hash_sig_t sig);
> > > >
> > > > /
> > >
> > > I think the above changes will break ABI by changing the parameter type?
> > Other people may know better on this.
> >
> > Just removing a const should not change the ABI, I believe, since the
> > const is just advisory hint to the compiler. Actual parameter size and
> > count remains unchanged so I don't believe there is an issue.
> > [ABI experts, please correct me if I'm wrong on this]
>
>
> [Certainly no ABI expert, but...]
>
> I think this is an API break, not ABI break.
>
> Given application code as follows, it will fail to compile - even though running
> the new code as a .so wouldn't cause any issues (AFAIK).
>
> void do_hash_stuff(const struct rte_hash *h, ...)
> {
>         /* parameter passed in is const, but updated function prototype
>          * is non-const
>          */
>         rte_hash_add_key_with_hash(h, ...);
> }
>
> This means that we can't recompile apps against latest patch without
> application code changes, if the app was passing a const rte_hash struct as
> the first parameter.
>
Agree. Do we need to do anything for this?
>
> -Harry
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
2018-09-28 1:00 ` Wang, Yipeng1
2018-09-28 8:26 ` Bruce Richardson
@ 2018-09-30 23:05 ` Honnappa Nagarahalli
2018-10-01 22:56 ` Wang, Yipeng1
2018-10-03 0:16 ` Wang, Yipeng1
1 sibling, 2 replies; 36+ messages in thread
From: Honnappa Nagarahalli @ 2018-09-30 23:05 UTC (permalink / raw)
To: Wang, Yipeng1, Richardson, Bruce, De Lara Guarch, Pablo
Cc: dev, Gavin Hu (Arm Technology China), Steve Capper, Ola Liljedahl, nd
> >
> >Reader-writer concurrency issue, caused by moving the keys to their
> >alternative locations during key insert, is solved by introducing a
> >global counter(tbl_chng_cnt) indicating a change in table.
> >
> >@@ -662,6 +679,20 @@ rte_hash_cuckoo_move_insert_mw(const struct
> rte_hash *h,
> > curr_bkt = curr_node->bkt;
> > }
> >
> >+ /* Inform the previous move. The current move need
> >+ * not be informed now as the current bucket entry
> >+ * is present in both primary and secondary.
> >+ * Since there is one writer, load acquires on
> >+ * tbl_chng_cnt are not required.
> >+ */
> >+ __atomic_store_n(&h->tbl_chng_cnt,
> >+ h->tbl_chng_cnt + 1,
> >+ __ATOMIC_RELEASE);
> >+ /* The stores to sig_alt and sig_current should not
> >+ * move above the store to tbl_chng_cnt.
> >+ */
> >+ __atomic_thread_fence(__ATOMIC_RELEASE);
> >+
> [Wang, Yipeng] I believe for X86 this fence should not be compiled to any
> code, otherwise we need macros for the compile time check.
'__atomic_thread_fence(__ATOMIC_RELEASE)' provides store-store and load-store ordering [1]. Hence, it should not add any barriers on x86.
[1] https://preshing.com/20130922/acquire-and-release-fences/
>
> >@@ -926,30 +957,56 @@ __rte_hash_lookup_with_hash(const struct
> rte_hash *h, const void *key,
> > uint32_t bucket_idx;
> > hash_sig_t alt_hash;
> > struct rte_hash_bucket *bkt;
> >+ uint32_t cnt_b, cnt_a;
> > int ret;
> >
> >- bucket_idx = sig & h->bucket_bitmask;
> >- bkt = &h->buckets[bucket_idx];
> >-
> > __hash_rw_reader_lock(h);
> >
> >- /* Check if key is in primary location */
> >- ret = search_one_bucket(h, key, sig, data, bkt);
> >- if (ret != -1) {
> >- __hash_rw_reader_unlock(h);
> >- return ret;
> >- }
> >- /* Calculate secondary hash */
> >- alt_hash = rte_hash_secondary_hash(sig);
> >- bucket_idx = alt_hash & h->bucket_bitmask;
> >- bkt = &h->buckets[bucket_idx];
> >+ do {
> [Wang, Yipeng] As far as I know, the MemC3 paper "MemC3: Compact and
> Concurrent MemCache with Dumber Caching and Smarter Hashing"
> as well as OvS cmap use a similar version counter to implement read-write
> concurrency for a hash table, but one difference is that the reader checks
> whether the version counter is even or odd to make sure there is no concurrent
> writer. Could you double-check and confirm that this is not needed for your implementation?
>
I re-read this paper. My patch makes use of the fact that, during the process of shifting, the key will be present in both the primary and secondary buckets. The check for an odd version counter is not required, as the full key comparison would have identified any false signature matches.
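For context, here is a condensed sketch of the reader-side loop the patch adds; search_both_buckets() is a stand-in for the primary/secondary bucket search in the real code:

uint32_t cnt_b, cnt_a;
int ret;

do {
        /* Snapshot the change counter before searching. */
        cnt_b = __atomic_load_n(&h->tbl_chng_cnt, __ATOMIC_ACQUIRE);

        ret = search_both_buckets(h, key, sig, data);

        /* Re-read after searching. If the counter changed, the writer
         * may have moved a key between buckets mid-search, so a miss
         * is inconclusive and the search is retried.
         */
        cnt_a = __atomic_load_n(&h->tbl_chng_cnt, __ATOMIC_ACQUIRE);
} while (ret == -1 && cnt_b != cnt_a);

return ret;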
> >--- a/lib/librte_hash/rte_hash.h
> >+++ b/lib/librte_hash/rte_hash.h
> >@@ -156,7 +156,7 @@ rte_hash_count(const struct rte_hash *h);
> > * - -ENOSPC if there is no space in the hash for this key.
> > */
> > int
> >-rte_hash_add_key_data(const struct rte_hash *h, const void *key, void
> >*data);
> >+rte_hash_add_key_data(struct rte_hash *h, const void *key, void
> >+*data);
> >
> > /**
> > * Add a key-value pair with a pre-computed hash value @@ -180,7
> >+180,7 @@ rte_hash_add_key_data(const struct rte_hash *h, const void
> *key, void *data);
> > * - -ENOSPC if there is no space in the hash for this key.
> > */
> > int32_t
> >-rte_hash_add_key_with_hash_data(const struct rte_hash *h, const void
> >*key,
> >+rte_hash_add_key_with_hash_data(struct rte_hash *h, const void *key,
> > hash_sig_t sig, void *data);
> >
> > /**
> >@@ -200,7 +200,7 @@ rte_hash_add_key_with_hash_data(const struct
> rte_hash *h, const void *key,
> > * array of user data. This value is unique for this key.
> > */
> > int32_t
> >-rte_hash_add_key(const struct rte_hash *h, const void *key);
> >+rte_hash_add_key(struct rte_hash *h, const void *key);
> >
> > /**
> > * Add a key to an existing hash table.
> >@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash *h, const
> void *key);
> > * array of user data. This value is unique for this key.
> > */
> > int32_t
> >-rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
> >hash_sig_t sig);
> >+rte_hash_add_key_with_hash(struct rte_hash *h, const void *key,
> >+hash_sig_t sig);
> >
> > /
>
> I think the above changes will break ABI by changing the parameter type?
> Other people may know better on this.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [dpdk-dev] [PATCH 4/4] hash: enable lock-free reader-writer concurrency
2018-09-28 1:33 ` Wang, Yipeng1
@ 2018-10-01 4:11 ` Honnappa Nagarahalli
2018-10-01 23:54 ` Wang, Yipeng1
0 siblings, 1 reply; 36+ messages in thread
From: Honnappa Nagarahalli @ 2018-10-01 4:11 UTC (permalink / raw)
To: Wang, Yipeng1, Richardson, Bruce, De Lara Guarch, Pablo
Cc: dev, Gavin Hu (Arm Technology China),
Steve Capper, Ola Liljedahl, nd, Gobriel, Sameh
> >
> >Add the flag to enable reader-writer concurrency during run time. The
> >rte_hash_del_xxx APIs do not free the keystore element when this flag
> >is enabled. Hence a new API, rte_hash_free_key_with_position, to free
> >the key store element is added.
> >
> >+/** Flag to support lock free reader writer concurrency */ #define
> >+RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY_LF 0x08
> [Wang, Yipeng] It would be good to indicate that the lockless implementation
> works for a single writer with multiple readers.
Multiple writers are supported by using the rw-lock or transactional memory; essentially, we still have a single writer at a time. This patch works fine with multi-writer as defined by the 'MULTI_WRITER_ADD' flag. I have tested it as well. I will enable this test case in V2.
> Also, if people use a mix of the flags, for example setting both the multi-writer
> and LF flags, then I guess we either need to return an error or maybe multi-writer
> should have higher priority. Currently RW_CONCURRENCY will assume
> MULTI_WRITER_ADD, I think.
As mentioned above, the multi-writer and LF combination is supported. Yes, RW_CONCURRENCY currently assumes MULTI_WRITER_ADD; I think we should separate them.
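As a usage sketch, creating a table with the new flag could look like this (the snippet is assumed to run inside the application's init function; parameter values are illustrative):

#include <rte_hash.h>
#include <rte_jhash.h>
#include <rte_lcore.h>

struct rte_hash_parameters params = {
        .name = "lf_hash",
        .entries = 1024,
        .key_len = sizeof(uint32_t),
        .hash_func = rte_jhash,
        .hash_func_init_val = 0,
        .socket_id = rte_socket_id(),
        /* single writer thread, lock-free readers */
        .extra_flag = RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY_LF,
};
struct rte_hash *h = rte_hash_create(&params);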
> >+
> > /** Signature of key that is stored internally. */ typedef uint32_t
> > hash_sig_t;
> >
> >@@ -143,6 +148,11 @@ rte_hash_count(const struct rte_hash *h);
> > * and should only be called from one thread by default.
> > * Thread safety can be enabled by setting flag during
> > * table creation.
> >+ * When lock free reader writer concurrency is enabled,
> >+ * if this API is called to update an existing entry,
> >+ * the application should free any memory allocated for
> >+ * previous 'data' only after all the readers have stopped
> >+ * using previous 'data'.
> [Wang, Yipeng] Could you be more specific in this description?
> When the add_key API is called, users do not know whether it will update an existing
> entry or insert a new one, do they?
I think it will depend on the application. The applications I have worked on so far added a hash entry as a result of receiving an event and updated it on receiving another event. I can change the comments to indicate that applications need to be aware of add/update operations.
>
> > *
> > * @param h
> > * Hash table to add the key to.
> >@@ -165,6 +175,11 @@ rte_hash_add_key_data(struct rte_hash *h, const
> >void *key, void *data);
> > * and should only be called from one thread by default.
> > * Thread safety can be enabled by setting flag during
> > * table creation.
> >+ * When lock free reader writer concurrency is enabled,
> >+ * if this API is called to update an existing entry,
> >+ * the application should free any memory allocated for
> >+ * previous 'data' only after all the readers have stopped
> >+ * using previous 'data'.
> > *
> > * @param h
> > * Hash table to add the key to.
> >@@ -230,6 +245,12 @@ rte_hash_add_key_with_hash(struct rte_hash *h,
> >const void *key, hash_sig_t sig);
> > * and should only be called from one thread by default.
> > * Thread safety can be enabled by setting flag during
> > * table creation.
> >+ * If lock free reader writer concurrency is enabled,
> >+ * the hash library's internal memory for the deleted
> >+ * key is not freed. It should be freed by calling
> >+ * rte_hash_free_key_with_position API after all
> >+ * the readers have stopped using the hash entry
> >+ * corresponding to this key.
> > *
> > * @param h
> > * Hash table to remove the key from.
> >@@ -241,6 +262,8 @@ rte_hash_add_key_with_hash(struct rte_hash *h,
> const void *key, hash_sig_t sig);
> > * - A positive value that can be used by the caller as an offset into an
> > * array of user data. This value is unique for this key, and is the same
> > * value that was returned when the key was added.
> >+ * When lock free concurrency is enabled, this value should be used
> >+ * while calling the rte_hash_free_key_with_position API.
> > */
> > int32_t
> > rte_hash_del_key(const struct rte_hash *h, const void *key); @@ -251,6
> >+274,12 @@ rte_hash_del_key(const struct rte_hash *h, const void *key);
> > * and should only be called from one thread by default.
> > * Thread safety can be enabled by setting flag during
> > * table creation.
> >+ * If lock free reader writer concurrency is enabled,
> >+ * the hash library's internal memory for the deleted
> >+ * key is not freed. It should be freed by calling
> >+ * rte_hash_free_key_with_position API after all
> >+ * the readers have stopped using the hash entry
> >+ * corresponding to this key.
> > *
> > * @param h
> > * Hash table to remove the key from.
> >@@ -264,6 +293,8 @@ rte_hash_del_key(const struct rte_hash *h, const
> void *key);
> > * - A positive value that can be used by the caller as an offset into an
> > * array of user data. This value is unique for this key, and is the same
> > * value that was returned when the key was added.
> >+ * When lock free concurrency is enabled, this value should be used
> >+ * while calling the rte_hash_free_key_with_position API.
> > */
> > int32_t
> > rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
> >hash_sig_t sig); @@ -290,6 +321,30 @@
> rte_hash_get_key_with_position(const struct rte_hash *h, const int32_t
> position,
> > void **key);
> >
> [Wang, Yipeng] If possible, how about having a new delete function instead of
> modifying the current one?
> I think it does not need to be tied to the lockless implementation; it is
> orthogonal to the multi-threading implementation.
> People using locks may still want this new deletion behavior.
> If people want the old behavior, they can call the current API; otherwise they can call
> the new deletion function, followed by rte_hash_free_key_with_position later.
I like the terms 'delete' and 'free', but I am finding it hard to come up with a good name for the API. It would be along the lines of 'rte_hash_del_key_with_hash_no_free', and I do not like that name much.
Instead, we could have a configuration flag for the hash table, 'RTE_HASH_EXTRA_FLAGS_FREE_MEM_ON_DEL'. If this is enabled, the 'rte_hash_del_...' APIs will free the key store index and any internal memory. Enabling lock-free RW concurrency will disable this flag. Users can also disable this flag explicitly while not using lock-free RW concurrency.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [dpdk-dev] [PATCH 2/4] hash: add memory ordering to avoid race conditions
2018-09-28 0:43 ` Wang, Yipeng1
2018-09-30 22:20 ` Honnappa Nagarahalli
@ 2018-10-01 10:42 ` Ola Liljedahl
2018-10-02 1:52 ` Wang, Yipeng1
1 sibling, 1 reply; 36+ messages in thread
From: Ola Liljedahl @ 2018-10-01 10:42 UTC (permalink / raw)
To: Wang, Yipeng1, Honnappa Nagarahalli, Richardson, Bruce,
De Lara Guarch, Pablo
Cc: dev, Gavin Hu (Arm Technology China), Steve Capper, nd
On 28/09/2018, 02:43, "Wang, Yipeng1" <yipeng1.wang@intel.com> wrote:
Some general comments on the various __atomic_store/load calls added:
1. Although it passes the compiler check, I just want to confirm whether we should use the GCC/clang
builtins, or whether there are higher-level APIs in DPDK to do atomic operations.
[Ola] Adding "higher level" APIs on top of the basic language/compiler support is not a good idea.
There is an infinite number of base types for the atomic operations; multiply that by all the different kinds of atomic operations (e.g. load, store, fetch_add, add, cas, etc.) and the different memory orderings, and you create a very large API (of which likely only a small but irregular subset would be used). So it is a lot of work for little gain, and it is difficult to test every single item in the API.
For a compiler that does not support the __atomic builtins, one could write an __atomic emulation layer. But I think GCC's __atomic is already the ideal source code abstraction.
2. We believe the compiler will translate the atomic store/load to a regular MOV instruction on
a Total Store Order architecture (e.g. x86_64). But we ran the perf test on x86 and here is the relative slowdown on
lookup compared to master head. I am not sure if the performance drop comes from the atomic builtins.
[Ola] Any performance difference is most likely not from the use of the atomic builtins because, as you write, on x86 they should translate to normal loads and stores in most situations. But the code and data structures have changed, so there is some difference in e.g. memory accesses; couldn't this explain the performance difference?
Keysize | single lookup | bulk lookup
4 | 0.93 | 0.95
8 | 0.95 | 0.96
16 | 0.97 | 0.96
32 | 0.97 | 1.00
48 | 1.03 | 0.99
64 | 1.04 | 0.98
9 | 0.91 | 0.96
13 | 0.97 | 0.98
37 | 1.04 | 1.03
40 | 1.02 | 0.98
Other comments inlined.
>-----Original Message-----
>Only race condition that can occur is - using the key store element
>before the key write is completed. Hence, while inserting the element
>the release memory order is used. Any other race condition is caught
>by the key comparison. Memory orderings are added only where needed.
>For ex: reads in the writer's context do not need memory ordering
>as there is a single writer.
>
[Wang, Yipeng] I assume this commit works together with the next one to enable
read-write concurrency on relaxed memory ordering machine. This commit by itself does not do that, right?
If this is the case, should we merge the two commits,
or maybe just explain it a little bit more in the commit message?
>key_idx in the bucket entry and pdata in the key store element are
>used for synchronisation. key_idx is used to release an inserted
>entry in the bucket to the reader. Use of pdata for synchronisation
>is required due to the update of an existing entry, wherein only
>the pdata is updated without updating key_idx.
>
>
>-/* Search a key from bucket and update its data */
>+/* Search a key from bucket and update its data.
>+ * Writer holds the lock before calling this.
>+ */
> static inline int32_t
> search_and_update(const struct rte_hash *h, void *data, const void *key,
> struct rte_hash_bucket *bkt, hash_sig_t sig, hash_sig_t alt_hash)
>@@ -499,8 +501,13 @@ search_and_update(const struct rte_hash *h, void *data, const void *key,
> k = (struct rte_hash_key *) ((char *)keys +
> bkt->key_idx[i] * h->key_entry_size);
> if (rte_hash_cmp_eq(key, k->key, h) == 0) {
>- /* Update data */
>- k->pdata = data;
>+ /* 'pdata' acts as the synchronization point
>+ * when an existing hash entry is updated.
>+ * Key is not updated in this case.
>+ */
[Wang, Yipeng] I did not quite understand why we need synchronization for the hash data update.
Since the pdata write is already atomic, the lookup will read out either the stale data or the new data,
which should be fine without synchronization.
Is it to ensure the order of multiple reads in the lookup threads?
[Ola] If pdata is used as a reference to access other shared data, you need to ensure that the access of pdata and accesses to other data are ordered appropriately (e.g. with acquire/release). I think reading a new pdata but stale associated data is a bad thing.
>@@ -1243,11 +1289,19 @@ __rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
> while (sec_hitmask[i]) {
> uint32_t hit_index = __builtin_ctzl(sec_hitmask[i]);
>
>- uint32_t key_idx = secondary_bkt[i]->key_idx[hit_index];
>+ uint32_t key_idx =
>+ __atomic_load_n(
>+ &secondary_bkt[i]->key_idx[hit_index],
>+ __ATOMIC_ACQUIRE);
> const struct rte_hash_key *key_slot =
> (const struct rte_hash_key *)(
> (const char *)h->key_store +
> key_idx * h->key_entry_size);
>+
>+ if (key_idx != EMPTY_SLOT)
[Wang, Yipeng] I think even for the current code, we need to check for EMPTY_SLOT.
Could you export this as a bug-fix commit?
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [dpdk-dev] [PATCH 2/4] hash: add memory ordering to avoid race conditions
2018-09-30 22:20 ` Honnappa Nagarahalli
@ 2018-10-01 22:41 ` Wang, Yipeng1
0 siblings, 0 replies; 36+ messages in thread
From: Wang, Yipeng1 @ 2018-10-01 22:41 UTC (permalink / raw)
To: Honnappa Nagarahalli, Richardson, Bruce, De Lara Guarch, Pablo
Cc: dev, Gavin Hu (Arm Technology China),
Steve Capper, Ola Liljedahl, nd, Gobriel, Sameh
>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
>Sent: Sunday, September 30, 2018 3:21 PM
>To: Wang, Yipeng1 <yipeng1.wang@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>; De Lara Guarch, Pablo
><pablo.de.lara.guarch@intel.com>
>Cc: dev@dpdk.org; Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; Steve Capper <Steve.Capper@arm.com>; Ola Liljedahl
><Ola.Liljedahl@arm.com>; nd <nd@arm.com>
>Subject: RE: [dpdk-dev] [PATCH 2/4] hash: add memory ordering to avoid race conditions
>
>>
>> Some general comments on the various __atomic_store/load calls added:
>>
>> 1. Although it passes the compiler check, I just want to confirm whether we
>> should use the GCC/clang builtins, or whether there are higher-level APIs in DPDK to do
>> atomic operations.
>>
>I have used the GCC builtins (just like rte_ring does).
[Wang, Yipeng] I checked rte_ring; it also has a specific header for C11. Since these are
C11-standard builtins, do we need something similar here?
>
>> 2. We believe the compiler will translate the atomic store/load to a regular MOV
>> instruction on a Total Store Order architecture (e.g. x86_64). But we ran the
>> perf test on x86 and here is the relative slowdown on lookup compared to
>> master head. I am not sure if the performance drop comes from the atomic
>> builtins.
>>
>C11 atomics also block compiler reordering. Other than this, the retry loop is an addition to lookup.
>The patch also has the alignment corrected. I am not sure how that is affecting the perf numbers.
>
>> Keysize | single lookup | bulk lookup
>> 4 | 0.93 | 0.95
>> 8 | 0.95 | 0.96
>> 16 | 0.97 | 0.96
>> 32 | 0.97 | 1.00
>> 48 | 1.03 | 0.99
>> 64 | 1.04 | 0.98
>> 9 | 0.91 | 0.96
>> 13 | 0.97 | 0.98
>> 37 | 1.04 | 1.03
>> 40 | 1.02 | 0.98
>>
>I assume this is the data from the test cases in the test_hash_perf.c file. I tried to reproduce this data, but my data is worse. Can you
>specify the actual test from test_hash_perf.c you are using (With locks/Pre-computed hash/With data/Elements in primary)?
>IMO, the differences you have provided are not high.
[Wang, Yipeng] I remember the performance data I used was the no-lock case, without pre-computed hash, with 8-byte data, and with elements in both primary and secondary buckets.
I compared master head to the tree with your first two commits.
>
>> [Wang, Yipeng] I think even for current code, we need to check empty_slot.
>> Could you export this as a bug fix commit?
>>
>In the existing code, there is a check 'if (!!key_idx & !rte_hash....)'. Are you referring to '!!key_idx'? I think this should be changed to
>'(key_idx != EMPTY_SLOT)'.
[Wang, Yipeng] Yeah, I guess I did not see that part. Then there is no need to export it as a bug fix for now, since it is not a functional issue.
Your change is good.
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
2018-09-30 23:05 ` Honnappa Nagarahalli
@ 2018-10-01 22:56 ` Wang, Yipeng1
2018-10-03 0:16 ` Wang, Yipeng1
1 sibling, 0 replies; 36+ messages in thread
From: Wang, Yipeng1 @ 2018-10-01 22:56 UTC (permalink / raw)
To: Honnappa Nagarahalli, Richardson, Bruce, De Lara Guarch, Pablo
Cc: dev, Gavin Hu (Arm Technology China),
Steve Capper, Ola Liljedahl, nd, Gobriel, Sameh
>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
>Sent: Sunday, September 30, 2018 4:06 PM
>To: Wang, Yipeng1 <yipeng1.wang@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>; De Lara Guarch, Pablo
><pablo.de.lara.guarch@intel.com>
>Cc: dev@dpdk.org; Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; Steve Capper <Steve.Capper@arm.com>; Ola Liljedahl
><Ola.Liljedahl@arm.com>; nd <nd@arm.com>
>Subject: RE: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
>
>> >+ __atomic_store_n(&h->tbl_chng_cnt,
>> >+ h->tbl_chng_cnt + 1,
>> >+ __ATOMIC_RELEASE);
>> >+ /* The stores to sig_alt and sig_current should not
>> >+ * move above the store to tbl_chng_cnt.
>> >+ */
>> >+ __atomic_thread_fence(__ATOMIC_RELEASE);
>> >+
>> [Wang, Yipeng] I believe for x86 this fence should not compile to any
>> code; otherwise we need macros for a compile-time check.
>'__atomic_thread_fence(__ATOMIC_RELEASE)' provides load-store and store-store ordering [1]. Hence, it should not add any barriers for
>x86.
>
>[1] https://preshing.com/20130922/acquire-and-release-fences/
>
[Wang, Yipeng] Thanks for the link, it is very informative!
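As a standalone sketch of the pattern under discussion (variable names simplified from the patch):

static unsigned int tbl_chng_cnt;
static unsigned int sig_current;

void writer_move_key(unsigned int new_sig)
{
        /* make the counter change visible before the key movement */
        __atomic_store_n(&tbl_chng_cnt, tbl_chng_cnt + 1, __ATOMIC_RELEASE);
        /* store-store ordering: the plain store below must not be
         * hoisted above the counter store; on x86 (TSO) this emits no
         * fence instruction, only a compiler barrier
         */
        __atomic_thread_fence(__ATOMIC_RELEASE);
        sig_current = new_sig;
}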
>>
>> >@@ -926,30 +957,56 @@ __rte_hash_lookup_with_hash(const struct
>> rte_hash *h, const void *key,
>> > uint32_t bucket_idx;
>> > hash_sig_t alt_hash;
>> > struct rte_hash_bucket *bkt;
>> >+ uint32_t cnt_b, cnt_a;
>> > int ret;
>> >
>> >- bucket_idx = sig & h->bucket_bitmask;
>> >- bkt = &h->buckets[bucket_idx];
>> >-
>> > __hash_rw_reader_lock(h);
>> >
>> >- /* Check if key is in primary location */
>> >- ret = search_one_bucket(h, key, sig, data, bkt);
>> >- if (ret != -1) {
>> >- __hash_rw_reader_unlock(h);
>> >- return ret;
>> >- }
>> >- /* Calculate secondary hash */
>> >- alt_hash = rte_hash_secondary_hash(sig);
>> >- bucket_idx = alt_hash & h->bucket_bitmask;
>> >- bkt = &h->buckets[bucket_idx];
>> >+ do {
>> [Wang, Yipeng] As far as I know, the MemC3 paper "MemC3: Compact and
>> Concurrent MemCache with Dumber Caching and Smarter Hashing"
>> as well as OvS cmap use a similar version counter to implement read-write
>> concurrency for the hash table, but one difference is that the reader checks
>> the even/odd state of the version counter to make sure there is no concurrent
>> writer. Could you double-check and confirm that this is not needed for your implementation?
>>
>I re-read this paper. My patch makes use of the fact that, during the process of shifting, the key will be present in both primary and
>secondary buckets. The check for an odd version counter is not required, as the full key comparison would identify any false
>signature matches.
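Schematically, the reader side pairs with this as follows (a condensed sketch of the retry loop, reusing names from the patch context; not the exact patch code):

        uint32_t cnt_b, cnt_a;
        int ret;

        do {
                /* snapshot the change counter before probing */
                cnt_b = __atomic_load_n(&h->tbl_chng_cnt, __ATOMIC_ACQUIRE);
                /* probe the primary, then the secondary bucket */
                ret = search_one_bucket(h, key, sig, data, prim_bkt);
                if (ret == -1)
                        ret = search_one_bucket(h, key, alt_hash, data, sec_bkt);
                /* a key in flight between buckets may be missed in both
                 * probes; if the counter moved, a writer was shifting
                 * keys, so search again
                 */
                cnt_a = __atomic_load_n(&h->tbl_chng_cnt, __ATOMIC_ACQUIRE);
        } while (ret == -1 && cnt_b != cnt_a);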
[Wang, Yipeng] I was thinking about another corner case and wondering if the version counter needs to be bumped on key deletion as well.
For example, a reader reads out the index and falsely matches a signature; before it reads out the data and the key, the key-data pair gets
deleted, removed, and recycled by a writer thread. This writer partially overwrites the key (because the key store entry is too large to be written atomically).
Now the reader begins to compare the key and accidentally matches it (because the partially written key happens to match). Will
the reader read the wrong data out (when this should have been a lookup miss)?
* Re: [dpdk-dev] [PATCH 4/4] hash: enable lock-free reader-writer concurrency
2018-10-01 4:11 ` Honnappa Nagarahalli
@ 2018-10-01 23:54 ` Wang, Yipeng1
2018-10-11 5:24 ` Honnappa Nagarahalli
0 siblings, 1 reply; 36+ messages in thread
From: Wang, Yipeng1 @ 2018-10-01 23:54 UTC (permalink / raw)
To: Honnappa Nagarahalli, Richardson, Bruce, De Lara Guarch, Pablo
Cc: dev, Gavin Hu (Arm Technology China),
Steve Capper, Ola Liljedahl, nd, Gobriel, Sameh
>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
>Sent: Sunday, September 30, 2018 9:11 PM
>To: Wang, Yipeng1 <yipeng1.wang@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>; De Lara Guarch, Pablo
><pablo.de.lara.guarch@intel.com>
>Cc: dev@dpdk.org; Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; Steve Capper <Steve.Capper@arm.com>; Ola Liljedahl
><Ola.Liljedahl@arm.com>; nd <nd@arm.com>; Gobriel, Sameh <sameh.gobriel@intel.com>
>Subject: RE: [dpdk-dev] [PATCH 4/4] hash: enable lock-free reader-writer concurrency
>
>> >
>> >Add the flag to enable reader-writer concurrency during run time. The
>> >rte_hash_del_xxx APIs do not free the keystore element when this flag
>> >is enabled. Hence a new API, rte_hash_free_key_with_position, to free
>> >the key store element is added.
>> >
>> >+/** Flag to support lock free reader writer concurrency */
>> >+#define RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY_LF 0x08
>> [Wang, Yipeng] It would be good to indicate that the lockless implementation
>> works for single writer multiple readers.
>Multi-writers are supported by using the rw-lock or transactional memory; essentially, we still have a single writer. This patch works
>fine with multi-writer as defined by the 'MULTI_WRITER_ADD' flag. I have tested it as well. I will enable this test case in V2.
>
>> Also, if people use a mix of the flags, for example setting both the
>> multi-writer and LF flags, then I guess we either need to return an error or
>> multi-writer should have higher priority. Currently RW_CONCURRENCY will
>> assume MULTI_WRITER_ADD, I think.
>As mentioned above, multi-writer and LF combination is supported. Yes, RW_CONCURRENCY currently assumes MULTI_WRITER_ADD.
>I think we should separate them.
[Wang, Yipeng] It would be great if you could add a little more commentary to both of the flags to be more specific about what
read-write concurrency means in each case, just in case users get confused.
You may also want to update the documentation later (https://doc.dpdk.org/guides/prog_guide/hash_lib.html).
>
>> >+
>> > /** Signature of key that is stored internally. */
>> > typedef uint32_t hash_sig_t;
>> >
>> >@@ -143,6 +148,11 @@ rte_hash_count(const struct rte_hash *h);
>> > * and should only be called from one thread by default.
>> > * Thread safety can be enabled by setting flag during
>> > * table creation.
>> >+ * When lock free reader writer concurrency is enabled,
>> >+ * if this API is called to update an existing entry,
>> >+ * the application should free any memory allocated for
>> >+ * previous 'data' only after all the readers have stopped
>> >+ * using previous 'data'.
>> [Wang, Yipeng] Could you be more specific on this description?
>> When add_key API is called, the users do not know if it will update an existing
>> entry or insert a new one, do they?
>I think it will depend on the application. The applications I have worked on so far added a hash entry as a result of receiving one event
>and updated it on receiving another. I can change the comments to indicate that applications need to be aware of
>add/update operations.
[Wang, Yipeng] Even for the current rte_hash, after an update the application may still use the old data; it is the upper-level application's
responsibility. How is this specific to the lock-free implementation?
>
>> > rte_hash_del_key(const struct rte_hash *h, const void *key);
>> >@@ -251,6 +274,12 @@ rte_hash_del_key(const struct rte_hash *h, const void *key);
>> > * and should only be called from one thread by default.
>> > * Thread safety can be enabled by setting flag during
>> > * table creation.
>> >+ * If lock free reader writer concurrency is enabled,
>> >+ * the hash library's internal memory for the deleted
>> >+ * key is not freed. It should be freed by calling
>> >+ * rte_hash_free_key_with_position API after all
>> >+ * the readers have stopped using the hash entry
>> >+ * corresponding to this key.
>> > *
>> > * @param h
>> > * Hash table to remove the key from.
>> >@@ -264,6 +293,8 @@ rte_hash_del_key(const struct rte_hash *h, const
>> void *key);
>> > * - A positive value that can be used by the caller as an offset into an
>> > * array of user data. This value is unique for this key, and is the same
>> > * value that was returned when the key was added.
>> >+ * When lock free concurrency is enabled, this value should be used
>> >+ * while calling the rte_hash_free_key_with_position API.
>> > */
>> > int32_t
>> > rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key, hash_sig_t sig);
>> >@@ -290,6 +321,30 @@
>> rte_hash_get_key_with_position(const struct rte_hash *h, const int32_t
>> position,
>> > void **key);
>> >
>> [Wang, Yipeng] If possible, how about having a new delete function instead of
>> modifying the current one?
>> I think it does not need to be tied to the lockless implementation; it is
>> orthogonal to the multi-threading implementation.
>> People using locks may still want this new deletion behavior.
>> If people want the old behavior, they can call the current API; otherwise they
>> can call the new deletion function, followed by rte_hash_free_key_with_position later.
>I like the terms 'delete' and 'free'. I am finding it hard to come up with a good name for the API. It will be along the lines of
>'rte_hash_del_key_with_hash_no_free' - I do not like the name much.
>Instead, we could have a configuration flag for the hash table, 'RTE_HASH_EXTRA_FLAGS_FREE_MEM_ON_DEL'. If this is enabled,
>'rte_hash_del_...' APIs will free the key store index and any internal memory. Enabling lock-free RW concurrency will enable this flag.
>The user can enable this flag explicitly while not using lock-free RW concurrency as well.
[Wang, Yipeng] I am OK with either way. For the flag, maybe we should call it RTE_HASH_EXTRA_FLAGS_RECYCLE_ON_DEL, to be more
specific that the key-data pair index is recycled. The user should know that the index might be re-used by another write.
BTW, the current flags field is only 8 bits; as we specify more and more flags, maybe we should announce an API change to make it
32 bits for the next release.
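As a usage sketch of the proposed delete/free split (wait_for_readers_to_quiesce() is a hypothetical application hook, e.g. an RCU grace period; rte_hash_free_key_with_position is the new API from this series):

        int32_t pos = rte_hash_del_key(h, &key);
        if (pos >= 0) {
                /* the entry is unreachable for new lookups, but in-flight
                 * readers may still be using the key store element
                 */
                wait_for_readers_to_quiesce();  /* hypothetical, e.g. RCU */
                /* now it is safe to recycle the key store index */
                rte_hash_free_key_with_position(h, pos);
        }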
* Re: [dpdk-dev] [PATCH 0/4] Address reader-writer concurrency in rte_hash
2018-09-28 21:11 ` Honnappa Nagarahalli
@ 2018-10-02 0:30 ` Wang, Yipeng1
0 siblings, 0 replies; 36+ messages in thread
From: Wang, Yipeng1 @ 2018-10-02 0:30 UTC (permalink / raw)
To: Honnappa Nagarahalli, Richardson, Bruce, De Lara Guarch, Pablo
Cc: dev, honnappa.nagarahalli, Gavin Hu (Arm Technology China),
Steve Capper, Ola Liljedahl, nd, Gobriel, Sameh
>>
>> Another comment: as you know, we also have a new extension to rte_hash
>> to enable extendable buckets and partial-key hashing. Thanks for the
>> comments, btw. Could you check if your lockless scheme also applies to
>> those extensions?
>Thank you for reminding me of this. I thought I had covered everything, but on a re-look I have missed a few key issues. I will reply on the
>other email thread.
[Wang, Yipeng] Hi Honnappa, would you like to rebase your V2 on the V4 patch I submitted for the ext table and partial-key hashing (http://patchwork.dpdk.org/cover/45620/)? I will have V5 sent soon, so if it is in time, please rebase on V5. If you have any difficulty doing so, we can work together on resolving the conflicts, so that the final merge will be easier.
* Re: [dpdk-dev] [PATCH 2/4] hash: add memory ordering to avoid race conditions
2018-10-01 10:42 ` Ola Liljedahl
@ 2018-10-02 1:52 ` Wang, Yipeng1
0 siblings, 0 replies; 36+ messages in thread
From: Wang, Yipeng1 @ 2018-10-02 1:52 UTC (permalink / raw)
To: Ola Liljedahl, Honnappa Nagarahalli, Richardson, Bruce,
De Lara Guarch, Pablo
Cc: dev, Gavin Hu (Arm Technology China), Steve Capper, nd, Gobriel, Sameh
>-----Original Message-----
>From: Ola Liljedahl [mailto:Ola.Liljedahl@arm.com]
>On 28/09/2018, 02:43, "Wang, Yipeng1" <yipeng1.wang@intel.com> wrote:
>
> Some general comments for the various __atomic_store/load added,
>
> 1. Although it passes the compiler check, I just want to confirm whether we should use GCC/clang builtins, or whether
> there are higher-level APIs in DPDK to do atomic operations?
>[Ola] Adding "higher level" APIs on top of the basic language/compiler support is not a good idea.
>There is an essentially unbounded number of base types for the atomic operations; multiply that by all the different types of atomic operations (e.g.
>load, store, fetch_add, add, cas, etc.) and the different memory orderings, and you create a very large API (of which likely only a small but
>irregular subset will be used). So it is lots of work for little gain, and difficult to test every single item in the API.
>
>For a compiler that does not support the __atomic builtins, one could write an __atomic emulation layer. But I think GCC __atomic is
>already the ideal source-code abstraction.
[Wang, Yipeng] Thanks for the explanation. I think OVS does something similar, using macros to abstract the various atomic
functions across different compilers/architectures. But anyway,
since rte_ring is using the builtins as well and the compiler check passed, I am OK with the implementation.
Another comment I made earlier is that rte_ring seems to have a C11-specific header for these. Should we
assume something similar here?
>
>
> 2. We believe the compiler will translate the atomic_store/load to regular MOV instructions on
> Total Store Order architectures (e.g. x86_64). But we ran the perf test on x86 and here is the relative slowdown on
> lookup compared to master head. I am not sure if the performance drop comes from the atomic builtins.
>[Ola] Any performance difference is most likely not from the use of atomic builtins because, as you write, on x86 they should translate
>to normal loads and stores in most situations. But the code and data structures have changed, so there is some difference in e.g.
>memory accesses; couldn't this explain the performance difference?
[Wang, Yipeng] Yes it might be.
> [Wang, Yipeng] I did not quite understand why we need synchronization for the hash data update.
> Since the pdata write is already atomic, the lookup will read out either the stale data or the new data,
> which should be fine without synchronization.
> Is it to ensure the order of multiple reads in lookup threads?
>[Ola] If pdata is used as a reference to access other shared data, you need to ensure that the access of pdata and accesses to other
>data are ordered appropriately (e.g. with acquire/release). I think reading a new pdata but stale associated data is a bad thing.
>
[Wang, Yipeng] Thanks for the explanation. I got it now!
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
2018-09-30 22:33 ` Honnappa Nagarahalli
@ 2018-10-02 13:17 ` Van Haaren, Harry
2018-10-02 23:58 ` Wang, Yipeng1
0 siblings, 1 reply; 36+ messages in thread
From: Van Haaren, Harry @ 2018-10-02 13:17 UTC (permalink / raw)
To: Honnappa Nagarahalli, Richardson, Bruce, Wang, Yipeng1
Cc: De Lara Guarch, Pablo, dev, Gavin Hu (Arm Technology China),
Steve Capper, Ola Liljedahl, nd
> -----Original Message-----
> From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> Sent: Sunday, September 30, 2018 11:33 PM
> To: Van Haaren, Harry <harry.van.haaren@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; Wang, Yipeng1 <yipeng1.wang@intel.com>
> Cc: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; dev@dpdk.org;
> Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; Steve Capper
> <Steve.Capper@arm.com>; Ola Liljedahl <Ola.Liljedahl@arm.com>; nd
> <nd@arm.com>
> Subject: RE: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving
> keys
>
>
> > > > >
> > > > >Reader-writer concurrency issue, caused by moving the keys to their
> > > > >alternative locations during key insert, is solved by introducing a
> > > > >global counter (tbl_chng_cnt) indicating a change in the table.
> >
> > <snip>
> >
> > > > > /**
> > > > >@@ -200,7 +200,7 @@ rte_hash_add_key_with_hash_data(const struct
> > > > >rte_hash
> > > *h, const void *key,
> > > > > * array of user data. This value is unique for this key.
> > > > > */
> > > > > int32_t
> > > > >-rte_hash_add_key(const struct rte_hash *h, const void *key);
> > > > >+rte_hash_add_key(struct rte_hash *h, const void *key);
> > > > >
> > > > > /**
> > > > > * Add a key to an existing hash table.
> > > > >@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash *h,
> > > > >const void
> > > *key);
> > > > > * array of user data. This value is unique for this key.
> > > > > */
> > > > > int32_t
> > > > >-rte_hash_add_key_with_hash(const struct rte_hash *h, const void
> > > > >*key,
> > > hash_sig_t sig);
> > > > >+rte_hash_add_key_with_hash(struct rte_hash *h, const void *key,
> > > hash_sig_t sig);
> > > > >
> > > > > /
> > > >
> > > > > I think the above changes will break ABI by changing the parameter type?
> > > Other people may know better on this.
> > >
> > > Just removing a const should not change the ABI, I believe, since the
> > > const is just an advisory hint to the compiler. Actual parameter size and
> > > count remain unchanged, so I don't believe there is an issue.
> > > [ABI experts, please correct me if I'm wrong on this]
> >
> >
> > [Certainly no ABI expert, but...]
> >
> > I think this is an API break, not ABI break.
> >
> > Given application code as follows, it will fail to compile - even though
> > running the new code as a .so wouldn't cause any issues (AFAIK).
> >
> > void do_hash_stuff(const struct rte_hash *h, ...) {
> >         /* parameter passed in is const, but updated
> >          * function prototype is non-const
> >          */
> >         rte_hash_add_key_with_hash(h, ...);
> > }
> >
> > This means that we can't recompile apps against latest patch without
> > application code changes, if the app was passing a const rte_hash struct
> > as the first parameter.
> >
> Agree. Do we need to do anything for this?
I think we should try to avoid breaking API wherever possible.
If we must, then I suppose we could follow the ABI process of
a deprecation notice.
From my reading of the versioning docs, it doesn't document this case:
https://doc.dpdk.org/guides/contributing/versioning.html
I don't recall a similar situation in DPDK previously - so I suggest
you ask the Tech Board for input here.
Hope that helps! -Harry
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
2018-10-02 13:17 ` Van Haaren, Harry
@ 2018-10-02 23:58 ` Wang, Yipeng1
2018-10-03 17:32 ` Honnappa Nagarahalli
0 siblings, 1 reply; 36+ messages in thread
From: Wang, Yipeng1 @ 2018-10-02 23:58 UTC (permalink / raw)
To: Van Haaren, Harry, Honnappa Nagarahalli, Richardson, Bruce
Cc: De Lara Guarch, Pablo, dev, Gavin Hu (Arm Technology China),
Steve Capper, Ola Liljedahl, nd, Gobriel, Sameh
>-----Original Message-----
>From: Van Haaren, Harry
>> > > > > /**
>> > > > > * Add a key to an existing hash table.
>> > > > >@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash *h,
>> > > > >const void
>> > > *key);
>> > > > > * array of user data. This value is unique for this key.
>> > > > > */
>> > > > > int32_t
>> > > > >-rte_hash_add_key_with_hash(const struct rte_hash *h, const void
>> > > > >*key,
>> > > hash_sig_t sig);
>> > > > >+rte_hash_add_key_with_hash(struct rte_hash *h, const void *key,
>> > > hash_sig_t sig);
>> > > > >
>> > > > > /
>> > > >
>> > > > I think the above changes will break ABI by changing the parameter type?
>> > > Other people may know better on this.
>> > >
>> > > Just removing a const should not change the ABI, I believe, since the
>> > > const is just advisory hint to the compiler. Actual parameter size and
>> > > count remains unchanged so I don't believe there is an issue.
>> > > [ABI experts, please correct me if I'm wrong on this]
>> >
>> >
>> > [Certainly no ABI expert, but...]
>> >
>> > I think this is an API break, not ABI break.
>> >
>> > Given application code as follows, it will fail to compile - even though
>> > running the new code as a .so wouldn't cause any issues (AFAIK).
>> >
>> > void do_hash_stuff(const struct rte_hash *h, ...) {
>> >         /* parameter passed in is const, but updated
>> >          * function prototype is non-const
>> >          */
>> >         rte_hash_add_key_with_hash(h, ...);
>> > }
>> >
>> > This means that we can't recompile apps against latest patch without
>> > application code changes, if the app was passing a const rte_hash struct
>> > as the first parameter.
>> >
>> Agree. Do we need to do anything for this?
>
>I think we should try to avoid breaking API wherever possible.
>If we must, then I suppose we could follow the ABI process of
>a deprecation notice.
>
>From my reading of the versioning docs, it doesn't document this case:
>https://doc.dpdk.org/guides/contributing/versioning.html
>
>I don't recall a similar situation in DPDK previously - so I suggest
>you ask Tech board for input here.
>
>Hope that helps! -Harry
[Wang, Yipeng]
Honnappa, how about using a pointer to the counter in the rte_hash struct instead of the counter itself? Will this avoid
the API change?
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
2018-09-30 23:05 ` Honnappa Nagarahalli
2018-10-01 22:56 ` Wang, Yipeng1
@ 2018-10-03 0:16 ` Wang, Yipeng1
2018-10-03 17:39 ` Honnappa Nagarahalli
1 sibling, 1 reply; 36+ messages in thread
From: Wang, Yipeng1 @ 2018-10-03 0:16 UTC (permalink / raw)
To: Honnappa Nagarahalli, Richardson, Bruce, De Lara Guarch, Pablo
Cc: dev, Gavin Hu (Arm Technology China),
Steve Capper, Ola Liljedahl, nd, Gobriel, Sameh
>-----Original Message-----
>[Wang, Yipeng] I was thinking about another corner case and wondering if the version counter needs to be bumped on key deletion
>as well.
>For example, a reader reads out the index and falsely matches a signature; before it reads out the data and the key, the key-data
>pair gets deleted, removed, and recycled by a writer thread. This writer partially overwrites the key (because the key store entry is
>too large to be written atomically).
>Now the reader begins to compare the key and accidentally matches it (because the partially written key happens to match). Will
>the reader read the wrong data out (when this should have been a lookup miss)?
>
[Wang, Yipeng] On second thought, after reading your slides, I think the scenario I described above should be handled by the upper-level
application using RCU-like mechanisms to recycle key-data only after the grace period. So I think it is OK; please ignore my last comment.
But if convenient, please add more comments in the API doc and in the user guide as well, just in case users might not understand the restriction.
Thanks
Yipeng
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
2018-10-02 23:58 ` Wang, Yipeng1
@ 2018-10-03 17:32 ` Honnappa Nagarahalli
2018-10-03 17:56 ` Wang, Yipeng1
2018-10-04 3:54 ` Honnappa Nagarahalli
0 siblings, 2 replies; 36+ messages in thread
From: Honnappa Nagarahalli @ 2018-10-03 17:32 UTC (permalink / raw)
To: Wang, Yipeng1, Van Haaren, Harry, Richardson, Bruce
Cc: De Lara Guarch, Pablo, dev, Gavin Hu (Arm Technology China),
Steve Capper, Ola Liljedahl, nd, Gobriel, Sameh
> >-----Original Message-----
> >From: Van Haaren, Harry
> >> > > > > /**
> >> > > > > * Add a key to an existing hash table.
> >> > > > >@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash *h,
> >> > > > >const void
> >> > > *key);
> >> > > > > * array of user data. This value is unique for this key.
> >> > > > > */
> >> > > > > int32_t
> >> > > > >-rte_hash_add_key_with_hash(const struct rte_hash *h, const
> >> > > > >void *key,
> >> > > hash_sig_t sig);
> >> > > > >+rte_hash_add_key_with_hash(struct rte_hash *h, const void
> >> > > > >+*key,
> >> > > hash_sig_t sig);
> >> > > > >
> >> > > > > /
> >> > > >
> >> > > > I think the above changes will break ABI by changing the parameter type?
> >> > > Other people may know better on this.
> >> > >
> >> > > Just removing a const should not change the ABI, I believe, since
> >> > > the const is just advisory hint to the compiler. Actual parameter
> >> > > size and count remains unchanged so I don't believe there is an issue.
> >> > > [ABI experts, please correct me if I'm wrong on this]
> >> >
> >> >
> >> > [Certainly no ABI expert, but...]
> >> >
> >> > I think this is an API break, not ABI break.
> >> >
> >> > Given application code as follows, it will fail to compile - even though
> >> > running the new code as a .so wouldn't cause any issues (AFAIK).
> >> >
> >> > void do_hash_stuff(const struct rte_hash *h, ...) {
> >> >         /* parameter passed in is const, but updated
> >> >          * function prototype is non-const
> >> >          */
> >> >         rte_hash_add_key_with_hash(h, ...);
> >> > }
> >> >
> >> > This means that we can't recompile apps against latest patch without
> >> > application code changes, if the app was passing a const rte_hash struct
> >> > as the first parameter.
> >> >
> >> Agree. Do we need to do anything for this?
> >
> >I think we should try to avoid breaking API wherever possible.
> >If we must, then I suppose we could follow the ABI process of a
> >deprecation notice.
> >
> >From my reading of the versioning docs, it doesn't document this case:
> >https://doc.dpdk.org/guides/contributing/versioning.html
> >
> >I don't recall a similar situation in DPDK previously - so I suggest
> >you ask Tech board for input here.
> >
> >Hope that helps! -Harry
> [Wang, Yipeng]
> Honnappa, how about use a pointer to the counter in the rte_hash struct
> instead of the counter? Will this avoid API change?
I think it defeats the purpose of the 'const' parameter to the API and provides incorrect information to the user.
IMO, DPDK should have guidelines on how to handle API compatibility breaks; I will send an email to the tech board on this.
We can also solve this by having counters on the bucket. I was planning to do this a little bit later; I will look at the effort involved and maybe do it now.
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
2018-10-03 0:16 ` Wang, Yipeng1
@ 2018-10-03 17:39 ` Honnappa Nagarahalli
0 siblings, 0 replies; 36+ messages in thread
From: Honnappa Nagarahalli @ 2018-10-03 17:39 UTC (permalink / raw)
To: Wang, Yipeng1, Richardson, Bruce, De Lara Guarch, Pablo
Cc: dev, Gavin Hu (Arm Technology China),
Steve Capper, Ola Liljedahl, nd, Gobriel, Sameh
> >-----Original Message-----
> >[Wang, Yipeng] I was thinking about another corner case and wondering
> >if the version counter needs to be done on key deletion as well.
> >For example, a reader reads out the index and falsely matches a
> >signature, before it reads out the data and the key, the key-data pair
> >got deleted, removed, and recycled by another writer thread. This
> >writer partially overwrote the key (because key store is too large to be
> atomic).
> >Now, the reader begins to compare the key, and accidentally matches the
> >key (because the key is partially written and accidentally matches),
> >will The reader read the wrong data out (which should have been a lookup
> miss)?
> >
> [Wang, Yipeng] On second thought, after reading your slides, I think the
> scenario I described above should be handled by the upper-level application
> using RCU-like mechanisms to recycle key-data only after the grace period.
> So I think it is OK; please ignore my last comment.
>
> But if convenient, please add more comment in the API doc and in user
> guide as well just in case users might not understand the restriction.
>
Agree, it needs careful documentation. I plan to change the documentation after the patch is completed.
> Thanks
> Yipeng
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
2018-10-03 17:32 ` Honnappa Nagarahalli
@ 2018-10-03 17:56 ` Wang, Yipeng1
2018-10-03 23:05 ` Ola Liljedahl
2018-10-04 3:32 ` Honnappa Nagarahalli
2018-10-04 3:54 ` Honnappa Nagarahalli
1 sibling, 2 replies; 36+ messages in thread
From: Wang, Yipeng1 @ 2018-10-03 17:56 UTC (permalink / raw)
To: Honnappa Nagarahalli, Van Haaren, Harry, Richardson, Bruce
Cc: De Lara Guarch, Pablo, dev, Gavin Hu (Arm Technology China),
Steve Capper, Ola Liljedahl, nd, Gobriel, Sameh
>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
>Sent: Wednesday, October 3, 2018 10:33 AM
>To: Wang, Yipeng1 <yipeng1.wang@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>; Richardson, Bruce
><bruce.richardson@intel.com>
>Cc: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; dev@dpdk.org; Gavin Hu (Arm Technology China)
><Gavin.Hu@arm.com>; Steve Capper <Steve.Capper@arm.com>; Ola Liljedahl <Ola.Liljedahl@arm.com>; nd <nd@arm.com>; Gobriel,
>Sameh <sameh.gobriel@intel.com>
>Subject: RE: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
>
>> >-----Original Message-----
>> >From: Van Haaren, Harry
>> >> > > > > /**
>> >> > > > > * Add a key to an existing hash table.
>> >> > > > >@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash *h,
>> >> > > > >const void
>> >> > > *key);
>> >> > > > > * array of user data. This value is unique for this key.
>> >> > > > > */
>> >> > > > > int32_t
>> >> > > > >-rte_hash_add_key_with_hash(const struct rte_hash *h, const
>> >> > > > >void *key,
>> >> > > hash_sig_t sig);
>> >> > > > >+rte_hash_add_key_with_hash(struct rte_hash *h, const void
>> >> > > > >+*key,
>> >> > > hash_sig_t sig);
>> >> > > > >
>> >> > > > > /
>> >> > > >
>> >> > > > I think the above changes will break ABI by changing the parameter type?
>> >> > > Other people may know better on this.
>> >> > >
>> >> > > Just removing a const should not change the ABI, I believe, since
>> >> > > the const is just advisory hint to the compiler. Actual parameter
>> >> > > size and count remains unchanged so I don't believe there is an issue.
>> >> > > [ABI experts, please correct me if I'm wrong on this]
>> >> >
>> >> >
>> >> > [Certainly no ABI expert, but...]
>> >> >
>> >> > I think this is an API break, not ABI break.
>> >> >
>> >> > Given application code as follows, it will fail to compile - even though
>> >> > running the new code as a .so wouldn't cause any issues (AFAIK).
>> >> >
>> >> > void do_hash_stuff(const struct rte_hash *h, ...) {
>> >> >         /* parameter passed in is const, but updated
>> >> >          * function prototype is non-const
>> >> >          */
>> >> >         rte_hash_add_key_with_hash(h, ...);
>> >> > }
>> >> >
>> >> > This means that we can't recompile apps against latest patch without
>> >> > application code changes, if the app was passing a const rte_hash struct
>> >> > as the first parameter.
>> >> >
>> >> Agree. Do we need to do anything for this?
>> >
>> >I think we should try to avoid breaking API wherever possible.
>> >If we must, then I suppose we could follow the ABI process of a
>> >deprecation notice.
>> >
>> >From my reading of the versioning docs, it doesn't document this case:
>> >https://doc.dpdk.org/guides/contributing/versioning.html
>> >
>> >I don't recall a similar situation in DPDK previously - so I suggest
>> >you ask Tech board for input here.
>> >
>> >Hope that helps! -Harry
>> [Wang, Yipeng]
>> Honnappa, how about use a pointer to the counter in the rte_hash struct
>> instead of the counter? Will this avoid API change?
>I think it defeats the purpose of 'const' parameter to the API and provides incorrect information to the user.
>IMO, DPDK should have guidelines on how to handle the API compatibility breaks. I will send an email to tech board on this.
>We can also solve this by having counters on the bucket. I was planning to do this little bit later. I will look at the effort involved and
>may be do it now.
[Wang, Yipeng]
I think with an ABI/API change, you might need to announce it one release cycle ahead.
In the CuckooSwitch paper ("Scalable, High Performance Ethernet Forwarding with
CUCKOOSWITCH"), the version counter array is kept separate from the hash table.
You can strike a balance between the granularity of the version counters and the
cache/memory requirement. Is that a better way?
Another consideration is that the current bucket is exactly 64 bytes with the
partial-key hashing. To add another counter, we need to think about changing
certain variables to keep the bucket cache-line aligned.
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
2018-10-03 17:56 ` Wang, Yipeng1
@ 2018-10-03 23:05 ` Ola Liljedahl
2018-10-04 3:32 ` Honnappa Nagarahalli
1 sibling, 0 replies; 36+ messages in thread
From: Ola Liljedahl @ 2018-10-03 23:05 UTC (permalink / raw)
To: Wang, Yipeng1, Honnappa Nagarahalli, Van Haaren, Harry,
Richardson, Bruce
Cc: De Lara Guarch, Pablo, dev, Gavin Hu (Arm Technology China),
Steve Capper, nd, Gobriel, Sameh
On 03/10/2018, 20:00, "Wang, Yipeng1" <yipeng1.wang@intel.com> wrote:
>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
>Sent: Wednesday, October 3, 2018 10:33 AM
>To: Wang, Yipeng1 <yipeng1.wang@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>; Richardson, Bruce
><bruce.richardson@intel.com>
>Cc: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; dev@dpdk.org; Gavin Hu (Arm Technology China)
><Gavin.Hu@arm.com>; Steve Capper <Steve.Capper@arm.com>; Ola Liljedahl <Ola.Liljedahl@arm.com>; nd <nd@arm.com>; Gobriel,
>Sameh <sameh.gobriel@intel.com>
>Subject: RE: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
>
>> >-----Original Message-----
>> >From: Van Haaren, Harry
>> >> > > > > /**
>> >> > > > > * Add a key to an existing hash table.
>> >> > > > >@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash *h,
>> >> > > > >const void
>> >> > > *key);
>> >> > > > > * array of user data. This value is unique for this key.
>> >> > > > > */
>> >> > > > > int32_t
>> >> > > > >-rte_hash_add_key_with_hash(const struct rte_hash *h, const
>> >> > > > >void *key,
>> >> > > hash_sig_t sig);
>> >> > > > >+rte_hash_add_key_with_hash(struct rte_hash *h, const void
>> >> > > > >+*key,
>> >> > > hash_sig_t sig);
>> >> > > > >
>> >> > > > > /
>> >> > > >
>> >> > > > I think the above changes will break ABI by changing the parameter type?
>> >> > > Other people may know better on this.
>> >> > >
>> >> > > Just removing a const should not change the ABI, I believe, since
>> >> > > the const is just advisory hint to the compiler. Actual parameter
>> >> > > size and count remains unchanged so I don't believe there is an issue.
>> >> > > [ABI experts, please correct me if I'm wrong on this]
>> >> >
>> >> >
>> >> > [Certainly no ABI expert, but...]
>> >> >
>> >> > I think this is an API break, not ABI break.
>> >> >
>> >> > Given application code as follows, it will fail to compile - even though
>> >> > running the new code as a .so wouldn't cause any issues (AFAIK).
>> >> >
>> >> > void do_hash_stuff(const struct rte_hash *h, ...) {
>> >> >         /* parameter passed in is const, but updated
>> >> >          * function prototype is non-const
>> >> >          */
>> >> >         rte_hash_add_key_with_hash(h, ...);
>> >> > }
>> >> >
>> >> > This means that we can't recompile apps against latest patch without
>> >> > application code changes, if the app was passing a const rte_hash struct
>> >> > as the first parameter.
>> >> >
>> >> Agree. Do we need to do anything for this?
>> >
>> >I think we should try to avoid breaking API wherever possible.
>> >If we must, then I suppose we could follow the ABI process of a
>> >deprecation notice.
>> >
>> >From my reading of the versioning docs, it doesn't document this case:
>> >https://doc.dpdk.org/guides/contributing/versioning.html
>> >
>> >I don't recall a similar situation in DPDK previously - so I suggest
>> >you ask Tech board for input here.
>> >
>> >Hope that helps! -Harry
>> [Wang, Yipeng]
>> Honnappa, how about use a pointer to the counter in the rte_hash struct
>> instead of the counter? Will this avoid API change?
>I think it defeats the purpose of 'const' parameter to the API and provides incorrect information to the user.
>IMO, DPDK should have guidelines on how to handle the API compatibility breaks. I will send an email to tech board on this.
>We can also solve this by having counters on the bucket. I was planning to do this little bit later. I will look at the effort involved and
>may be do it now.
[Wang, Yipeng]
I think with an ABI/API change, you might need to announce it one release cycle ahead.
In the CuckooSwitch paper ("Scalable, High Performance Ethernet Forwarding with
CUCKOOSWITCH"), the version counter array is kept separate from the hash table.
You can strike a balance between the granularity of the version counters and the
cache/memory requirement. Is that a better way?
[Ola] Having only a single generation counter can easily become a scalability bottleneck due to write contention on the cache line.
Ideally you want each gen counter to be located in its own cache line (multiple gen counters in the same cache line will experience the same write contention), but that seems a bit wasteful of memory.
Ideally (in order to avoid accessing more cache lines), the gen counter should be located in the hash bucket, but as you write below, this would create problems for the layout of the hash bucket; there isn't any room for another field.
So I propose an array of gen counters, indexed by a hash (of some kind) of the primary and alternate hashes (signatures) of the element (modulo the array size). So, another hash table.
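A rough sketch of that proposal (the array size, alignment, and mixing function are all illustrative):

#include <stdint.h>

#define GEN_CNT_ARRAY_SIZE 256          /* power of two, tunable */

struct gen_cnt {
        uint32_t cnt;
} __attribute__((aligned(64)));         /* one counter per cache line */

static struct gen_cnt gen_cnts[GEN_CNT_ARRAY_SIZE];

/* map an element's two candidate buckets to one counter slot */
static inline uint32_t
gen_cnt_index(uint32_t prim_bkt_idx, uint32_t sec_bkt_idx)
{
        return (prim_bkt_idx ^ (sec_bkt_idx * 0x9e3779b9u)) &
               (GEN_CNT_ARRAY_SIZE - 1);
}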
Another consideration is that the current bucket is exactly 64 bytes with the
partial-key hashing. To add another counter, we need to think about changing
certain variables to keep the bucket cache-line aligned.
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
2018-10-03 17:56 ` Wang, Yipeng1
2018-10-03 23:05 ` Ola Liljedahl
@ 2018-10-04 3:32 ` Honnappa Nagarahalli
1 sibling, 0 replies; 36+ messages in thread
From: Honnappa Nagarahalli @ 2018-10-04 3:32 UTC (permalink / raw)
To: Wang, Yipeng1, Van Haaren, Harry, Richardson, Bruce
Cc: De Lara Guarch, Pablo, dev, Gavin Hu (Arm Technology China),
Steve Capper, Ola Liljedahl, nd, Gobriel, Sameh,
Honnappa Nagarahalli
> >
> >> >-----Original Message-----
> >> >From: Van Haaren, Harry
> >> >> > > > > /**
> >> >> > > > > * Add a key to an existing hash table.
> >> >> > > > >@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash
> >> >> > > > >*h, const void
> >> >> > > *key);
> >> >> > > > > * array of user data. This value is unique for this key.
> >> >> > > > > */
> >> >> > > > > int32_t
> >> >> > > > >-rte_hash_add_key_with_hash(const struct rte_hash *h, const
> >> >> > > > >void *key,
> >> >> > > hash_sig_t sig);
> >> >> > > > >+rte_hash_add_key_with_hash(struct rte_hash *h, const void
> >> >> > > > >+*key,
> >> >> > > hash_sig_t sig);
> >> >> > > > >
> >> >> > > > > /
> >> >> > > >
> >> >> > > > I think the above changes will break ABI by changing the parameter type?
> >> >> > > Other people may know better on this.
> >> >> > >
> >> >> > > Just removing a const should not change the ABI, I believe,
> >> >> > > since the const is just advisory hint to the compiler. Actual
> >> >> > > parameter size and count remains unchanged so I don't believe
> there is an issue.
> >> >> > > [ABI experts, please correct me if I'm wrong on this]
> >> >> >
> >> >> >
> >> >> > [Certainly no ABI expert, but...]
> >> >> >
> >> >> > I think this is an API break, not ABI break.
> >> >> >
> >> >> > Given application code as follows, it will fail to compile -
> >> >> > even though
> >> >> running
> >> >> > the new code as a .so wouldn't cause any issues (AFAIK).
> >> >> >
> >> >> > void do_hash_stuff(const struct rte_hash *h, ...) {
> >> >> > /* parameter passed in is const, but updated function
> >> >> > prototype is
> >> >> non-
> >> >> > const */
> >> >> > rte_hash_add_key_with_hash(h, ...); }
> >> >> >
> >> >> > This means that we can't recompile apps against latest patch
> >> >> > without application code changes, if the app was passing a const
> >> >> > rte_hash struct
> >> >> as
> >> >> > the first parameter.
> >> >> >
> >> >> Agree. Do we need to do anything for this?
> >> >
> >> >I think we should try to avoid breaking API wherever possible.
> >> >If we must, then I suppose we could follow the ABI process of a
> >> >deprecation notice.
> >> >
> >> >From my reading of the versioning docs, it doesn't document this case:
> >> >https://doc.dpdk.org/guides/contributing/versioning.html
> >> >
> >> >I don't recall a similar situation in DPDK previously - so I suggest
> >> >you ask Tech board for input here.
> >> >
> >> >Hope that helps! -Harry
> >> [Wang, Yipeng]
> >> Honnappa, how about use a pointer to the counter in the rte_hash
> >> struct instead of the counter? Will this avoid API change?
> >I think it defeats the purpose of 'const' parameter to the API and provides
> incorrect information to the user.
> >IMO, DPDK should have guidelines on how to handle the API compatibility
> breaks. I will send an email to tech board on this.
> >We can also solve this by having counters on the bucket. I was planning
> >to do this little bit later. I will look at the effort involved and may be do it
> now.
> [Wang, Yipeng]
> I think with ABI/API change, you might need to announce it one release cycle
> ahead.
>
> In the cuckoo switch paper: Scalable, High Performance Ethernet Forwarding
> with CUCKOOSWITCH it separates the version counter array and the hash
> table. You can strike a balance between granularity of the version counter and
> the cache/memory requirement.
> Is it a better way?
This will introduce another cache line access. It would be good to stay within a single cache line.
>
> Another consideration is current bucket is 64-byte exactly with the partial-
> key-hashing.
> To add another counter, we need to think about changing certain variables to
> still align cache line.
The 'flags' structure member is not being used; I plan to remove it. That will give us 8 bytes, and I will use 4 bytes of it for the counter.
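Roughly the layout being discussed, assuming RTE_HASH_BUCKET_ENTRIES is 8 and the post-partial-key bucket fields (an illustrative sketch, not a committed definition):

#include <stdint.h>

struct rte_hash_bucket_sketch {
        uint16_t sig_current[8];        /* 16 B: short signatures  */
        uint32_t key_idx[8];            /* 32 B: key store indexes */
        uint8_t flag[8];                /*  8 B: currently unused  */
        /* dropping the unused member frees 8 B; a 4 B per-bucket
         * change counter would then still fit within one 64 B
         * cache line
         */
} __attribute__((aligned(64)));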
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
2018-10-03 17:32 ` Honnappa Nagarahalli
2018-10-03 17:56 ` Wang, Yipeng1
@ 2018-10-04 3:54 ` Honnappa Nagarahalli
2018-10-04 19:16 ` Wang, Yipeng1
1 sibling, 1 reply; 36+ messages in thread
From: Honnappa Nagarahalli @ 2018-10-04 3:54 UTC (permalink / raw)
To: Honnappa Nagarahalli, Wang, Yipeng1, Van Haaren, Harry,
Richardson, Bruce
Cc: De Lara Guarch, Pablo, dev, Gavin Hu (Arm Technology China),
Steve Capper, Ola Liljedahl, nd, Gobriel, Sameh
>
> > >-----Original Message-----
> > >From: Van Haaren, Harry
> > >> > > > > /**
> > >> > > > > * Add a key to an existing hash table.
> > >> > > > >@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash
> > >> > > > >*h, const void
> > >> > > *key);
> > >> > > > > * array of user data. This value is unique for this key.
> > >> > > > > */
> > >> > > > > int32_t
> > >> > > > >-rte_hash_add_key_with_hash(const struct rte_hash *h, const
> > >> > > > >void *key,
> > >> > > hash_sig_t sig);
> > >> > > > >+rte_hash_add_key_with_hash(struct rte_hash *h, const void
> > >> > > > >+*key,
> > >> > > hash_sig_t sig);
> > >> > > > >
> > >> > > > > /
> > >> > > >
> > >> > > > I think the above changes will break ABI by changing the parameter type?
> > >> > > Other people may know better on this.
> > >> > >
> > >> > > Just removing a const should not change the ABI, I believe,
> > >> > > since the const is just advisory hint to the compiler. Actual
> > >> > > parameter size and count remains unchanged so I don't believe there
> is an issue.
> > >> > > [ABI experts, please correct me if I'm wrong on this]
> > >> >
> > >> >
> > >> > [Certainly no ABI expert, but...]
> > >> >
> > >> > I think this is an API break, not ABI break.
> > >> >
> > >> > Given application code as follows, it will fail to compile - even though
> > >> > running the new code as a .so wouldn't cause any issues (AFAIK).
> > >> >
> > >> > void do_hash_stuff(const struct rte_hash *h, ...) {
> > >> >         /* parameter passed in is const, but updated
> > >> >          * function prototype is non-const
> > >> >          */
> > >> >         rte_hash_add_key_with_hash(h, ...);
> > >> > }
> > >> >
> > >> > This means that we can't recompile apps against latest patch without
> > >> > application code changes, if the app was passing a const rte_hash struct
> > >> > as the first parameter.
> > >> >
> > >> Agree. Do we need to do anything for this?
> > >
> > >I think we should try to avoid breaking API wherever possible.
> > >If we must, then I suppose we could follow the ABI process of a
> > >deprecation notice.
> > >
> > >From my reading of the versioning docs, it doesn't document this case:
> > >https://doc.dpdk.org/guides/contributing/versioning.html
> > >
> > >I don't recall a similar situation in DPDK previously - so I suggest
> > >you ask Tech board for input here.
> > >
> > >Hope that helps! -Harry
> > [Wang, Yipeng]
> > Honnappa, how about use a pointer to the counter in the rte_hash
> > struct instead of the counter? Will this avoid API change?
> I think it defeats the purpose of 'const' parameter to the API and provides
> incorrect information to the user.
Yipeng, I think I have misunderstood your comment. I believe you meant that we could allocate memory for the counter and store the pointer in the structure; please correct me if I am wrong.
This could be a solution, though it will be another cache line access. It might be OK, given that it is a single cache line for the entire hash table.
> IMO, DPDK should have guidelines on how to handle the API compatibility
> breaks. I will send an email to tech board on this.
> We can also solve this by having counters on the bucket. I was planning to do
> this little bit later. I will look at the effort involved and may be do it now.
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
2018-10-04 3:54 ` Honnappa Nagarahalli
@ 2018-10-04 19:16 ` Wang, Yipeng1
0 siblings, 0 replies; 36+ messages in thread
From: Wang, Yipeng1 @ 2018-10-04 19:16 UTC (permalink / raw)
To: Honnappa Nagarahalli, Van Haaren, Harry, Richardson, Bruce
Cc: De Lara Guarch, Pablo, dev, Gavin Hu (Arm Technology China),
Steve Capper, Ola Liljedahl, nd, Gobriel, Sameh
>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
>Sent: Wednesday, October 3, 2018 8:54 PM
>To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Wang, Yipeng1 <yipeng1.wang@intel.com>; Van Haaren, Harry
><harry.van.haaren@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>
>Cc: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; dev@dpdk.org; Gavin Hu (Arm Technology China)
><Gavin.Hu@arm.com>; Steve Capper <Steve.Capper@arm.com>; Ola Liljedahl <Ola.Liljedahl@arm.com>; nd <nd@arm.com>; Gobriel,
>Sameh <sameh.gobriel@intel.com>
>Subject: RE: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
>
>>
>> > >-----Original Message-----
>> > >From: Van Haaren, Harry
>> > >> > > > > /**
>> > >> > > > > * Add a key to an existing hash table.
>> > >> > > > >@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash
>> > >> > > > >*h, const void
>> > >> > > *key);
>> > >> > > > > * array of user data. This value is unique for this key.
>> > >> > > > > */
>> > >> > > > > int32_t
>> > >> > > > >-rte_hash_add_key_with_hash(const struct rte_hash *h, const
>> > >> > > > >void *key,
>> > >> > > hash_sig_t sig);
>> > >> > > > >+rte_hash_add_key_with_hash(struct rte_hash *h, const void
>> > >> > > > >+*key,
>> > >> > > hash_sig_t sig);
>> > >> > > > >
>> > >> > > > > /
>> > >> > > >
>> > >> > > > I think the above changes will break ABI by changing the parameter type?
>> > >> > > Other people may know better on this.
>> > >> > >
>> > >> > > Just removing a const should not change the ABI, I believe,
>> > >> > > since the const is just advisory hint to the compiler. Actual
>> > >> > > parameter size and count remains unchanged so I don't believe there
>> is an issue.
>> > >> > > [ABI experts, please correct me if I'm wrong on this]
>> > >> >
>> > >> >
>> > >> > [Certainly no ABI expert, but...]
>> > >> >
>> > >> > I think this is an API break, not ABI break.
>> > >> >
>> > >> > Given application code as follows, it will fail to compile - even though
>> > >> > running the new code as a .so wouldn't cause any issues (AFAIK).
>> > >> >
>> > >> > void do_hash_stuff(const struct rte_hash *h, ...) {
>> > >> >         /* parameter passed in is const, but updated
>> > >> >          * function prototype is non-const
>> > >> >          */
>> > >> >         rte_hash_add_key_with_hash(h, ...);
>> > >> > }
>> > >> >
>> > >> > This means that we can't recompile apps against latest patch without
>> > >> > application code changes, if the app was passing a const rte_hash struct
>> > >> > as the first parameter.
>> > >> >
>> > >> Agree. Do we need to do anything for this?
>> > >
>> > >I think we should try to avoid breaking API wherever possible.
>> > >If we must, then I suppose we could follow the ABI process of a
>> > >deprecation notice.
>> > >
>> > >From my reading of the versioning docs, it doesn't document this case:
>> > >https://doc.dpdk.org/guides/contributing/versioning.html
>> > >
>> > >I don't recall a similar situation in DPDK previously - so I suggest
>> > >you ask Tech board for input here.
>> > >
>> > >Hope that helps! -Harry
>> > [Wang, Yipeng]
>> > Honnappa, how about use a pointer to the counter in the rte_hash
>> > struct instead of the counter? Will this avoid API change?
>> I think it defeats the purpose of 'const' parameter to the API and provides
>> incorrect information to the user.
>Yipeng, I think I have misunderstood your comment. I believe you meant that we could allocate memory for the counter and store the
>pointer in the structure; please correct me if I am wrong.
>This could be a solution, though it will be another cache line access. It might be OK, given that it is a single cache line for the entire
>hash table.
[Wang, Yipeng] Yeah, that is what I meant. It is an additional memory access, but it will probably be in the local cache.
Since time is tight, it could be a simple workaround for this version; in the future you can extend this pointer-based counter to a
counter array, as Ola suggested and as the CuckooSwitch paper did, to address the scaling issue.
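To make the 'const' point concrete, a minimal sketch of the workaround (names are illustrative, not the actual rte_hash definition):

#include <stdint.h>

struct rte_hash_sketch {
        /* ... existing members ... */
        uint32_t *tbl_chng_cnt;         /* allocated separately at create time */
};

static inline void
signal_table_change(const struct rte_hash_sketch *h)
{
        /* h itself stays const; only the memory the pointer refers to
         * is written, so the lookup/add prototypes can keep 'const'
         */
        __atomic_store_n(h->tbl_chng_cnt, *h->tbl_chng_cnt + 1,
                         __ATOMIC_RELEASE);
}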
>
>> IMO, DPDK should have guidelines on how to handle the API compatibility
>> breaks. I will send an email to tech board on this.
>> We can also solve this by having counters on the bucket. I was planning to do
>> this little bit later. I will look at the effort involved and may be do it now.
* Re: [dpdk-dev] [PATCH 4/4] hash: enable lock-free reader-writer concurrency
2018-10-01 23:54 ` Wang, Yipeng1
@ 2018-10-11 5:24 ` Honnappa Nagarahalli
0 siblings, 0 replies; 36+ messages in thread
From: Honnappa Nagarahalli @ 2018-10-11 5:24 UTC (permalink / raw)
To: Wang, Yipeng1, Richardson, Bruce, De Lara Guarch, Pablo
Cc: dev, Gavin Hu (Arm Technology China),
Steve Capper, Ola Liljedahl, nd, Gobriel, Sameh
> >> >
> >> >Add the flag to enable reader-writer concurrency during run time.
> >> >The rte_hash_del_xxx APIs do not free the keystore element when this
> >> >flag is enabled. Hence a new API, rte_hash_free_key_with_position,
> >> >to free the key store element is added.
> >> >
> >> >+/** Flag to support lock free reader writer concurrency */
> >> >+#define RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY_LF 0x08
> >> [Wang, Yipeng] It would be good to indicate that the lockless
> >> implementation works for single writer multiple readers.
> >Multi-writers are supported by using the rw-lock or transactional memory;
> >essentially, we still have a single writer. This patch works fine with
> >multi-writer as defined by the 'MULTI_WRITER_ADD' flag. I have tested it as
> >well. I will enable this test case in V2.
> >
> >> Also, if people use a mix of the flags for example set both
> >> multiwriter and LF flags, then I guess either we need to return an
> >> error or maybe multiwriter should have higher priority. Currently the
> >> RW_CONCURRENCY will assume MULTI_WRITER_ADD I think.
> >As mentioned above, multi-writer and LF combination is supported. Yes,
> RW_CONCURRENCY currently assumes MULTI_WRITER_ADD.
> >I think we should separate them.
> [Wang, Yipeng] It would be great if you could add a little more commentary
> to both of the flags to be more specific about what read-write concurrency
> means in each case, just in case users get confused.
> You may also want to update the documentation later
> (https://doc.dpdk.org/guides/prog_guide/hash_lib.html).
I will add the documentation once the patch is accepted.
>
> >
> >> >+
> >> > /** Signature of key that is stored internally. */
> >> > typedef uint32_t hash_sig_t;
> >> >
> >> >@@ -143,6 +148,11 @@ rte_hash_count(const struct rte_hash *h);
> >> > * and should only be called from one thread by default.
> >> > * Thread safety can be enabled by setting flag during
> >> > * table creation.
> >> >+ * When lock free reader writer concurrency is enabled,
> >> >+ * if this API is called to update an existing entry,
> >> >+ * the application should free any memory allocated for
> >> >+ * previous 'data' only after all the readers have stopped
> >> >+ * using previous 'data'.
> >> [Wang, Yipeng] Could you be more specific on this description?
> >> When add_key API is called, the users do not know if it will update
> >> an existing entry or insert a new one, do they?
> >I think, it will depend on the application. The applications I have
> >worked on so far, added a hash entry as a result of receiving an event
> >and updated it on receiving another event. I can change the comments to
> indicate that the applications need to be aware of add/update operations.
> [Wang, Yipeng] Even for the current rte_hash, after an update the application
> may still use the old data; it is the upper-level application's responsibility.
> How is this specific to the lock-free implementation?
I agree. I think it makes sense to keep this warning, but make it not specific to lock-free algorithm. I will make this change in V3.
> >
> >> > rte_hash_del_key(const struct rte_hash *h, const void *key);
> >> >@@ -251,6 +274,12 @@ rte_hash_del_key(const struct rte_hash *h, const void *key);
> >> > * and should only be called from one thread by default.
> >> > * Thread safety can be enabled by setting flag during
> >> > * table creation.
> >> >+ * If lock free reader writer concurrency is enabled,
> >> >+ * the hash library's internal memory for the deleted
> >> >+ * key is not freed. It should be freed by calling
> >> >+ * rte_hash_free_key_with_position API after all
> >> >+ * the readers have stopped using the hash entry
> >> >+ * corresponding to this key.
> >> > *
> >> > * @param h
> >> > * Hash table to remove the key from.
> >> >@@ -264,6 +293,8 @@ rte_hash_del_key(const struct rte_hash *h, const
> >> void *key);
> >> > * - A positive value that can be used by the caller as an offset into an
> >> > * array of user data. This value is unique for this key, and is the same
> >> > * value that was returned when the key was added.
> >> >+ * When lock free concurrency is enabled, this value should be used
> >> >+ * while calling the rte_hash_free_key_with_position API.
> >> > */
> >> > int32_t
> >> > rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key, hash_sig_t sig);
> >> >@@ -290,6 +321,30 @@
> >> rte_hash_get_key_with_position(const struct rte_hash *h, const
> >> int32_t position,
> >> > void **key);
> >> >
> >> [Wang, Yipeng] If possible, how about having a new delete function
> >> instead of modifying the current one?
> >> I think it does not need to be tied to the lockless implementation;
> >> it is orthogonal to the multi-threading implementation.
> >> People using locks may still want this new deletion behavior.
> >> If people want the old behavior, they can call the current API; otherwise
> >> they can call the new deletion function, followed by
> >> rte_hash_free_key_with_position later.
> >I like the terms 'delete' and 'free'. I am finding it hard to come up
> >with a good name for the API. It would be along the lines of
> >'rte_hash_del_key_with_hash_no_free' - I do not like the name much.
> >Instead, we could have a configuration flag for the hash table,
> >'RTE_HASH_EXTRA_FLAGS_FREE_MEM_ON_DEL'. If this is enabled, the
> >'rte_hash_del_...' APIs will free the key store index and any internal memory.
> >Enabling lock-free RW concurrency will enable this flag.
> >The user can enable this flag explicitly while not using lock-free RW
> >concurrency as well.
> [Wang, Yipeng] I am OK with either way. For the flag, maybe we should call it
> RTE_HASH_EXTRA_FLAGS_RECYCLE_ON_DEL, to be more specific, since the
> key-data pair index is recycled. The user should know that the index might be
> reused by another write.
> BTW, the current flag is only 8 bits; as we specify more and more flags, maybe
> we should announce an API change to make it 32 bits for the next release.
I agree. Do you know how to do this? Would you like to take care of it?
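Tying the delete/free discussion together, a sketch of the two-step flow
with the rte_hash_free_key_with_position API introduced in this series;
wait_for_readers_to_quiesce() is again a placeholder for an
application-chosen mechanism such as RCU, and delete_entry is a made-up
helper name:
#include <stdint.h>
#include <rte_hash.h>

extern void wait_for_readers_to_quiesce(void); /* placeholder, e.g. RCU */

static void
delete_entry(struct rte_hash *h, const void *key)
{
	/* Delete the key; under lock-free concurrency the key store
	 * element is not freed here, and readers may still be
	 * referencing it via the returned position. */
	int32_t pos = rte_hash_del_key(h, key);

	if (pos >= 0) {
		wait_for_readers_to_quiesce();
		/* Return the key store element to the table. */
		rte_hash_free_key_with_position(h, pos);
	}
}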
Thread overview: 36+ messages in thread
2018-09-06 17:12 [dpdk-dev] [PATCH 0/4] Address reader-writer concurrency in rte_hash Honnappa Nagarahalli
2018-09-06 17:12 ` [dpdk-dev] [PATCH 1/4] hash: correct key store element alignment Honnappa Nagarahalli
2018-09-27 23:58 ` Wang, Yipeng1
2018-09-06 17:12 ` [dpdk-dev] [PATCH 2/4] hash: add memory ordering to avoid race conditions Honnappa Nagarahalli
2018-09-28 0:43 ` Wang, Yipeng1
2018-09-30 22:20 ` Honnappa Nagarahalli
2018-10-01 22:41 ` Wang, Yipeng1
2018-10-01 10:42 ` Ola Liljedahl
2018-10-02 1:52 ` Wang, Yipeng1
2018-09-06 17:12 ` [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys Honnappa Nagarahalli
2018-09-28 1:00 ` Wang, Yipeng1
2018-09-28 8:26 ` Bruce Richardson
2018-09-28 8:55 ` Van Haaren, Harry
2018-09-30 22:33 ` Honnappa Nagarahalli
2018-10-02 13:17 ` Van Haaren, Harry
2018-10-02 23:58 ` Wang, Yipeng1
2018-10-03 17:32 ` Honnappa Nagarahalli
2018-10-03 17:56 ` Wang, Yipeng1
2018-10-03 23:05 ` Ola Liljedahl
2018-10-04 3:32 ` Honnappa Nagarahalli
2018-10-04 3:54 ` Honnappa Nagarahalli
2018-10-04 19:16 ` Wang, Yipeng1
2018-09-30 23:05 ` Honnappa Nagarahalli
2018-10-01 22:56 ` Wang, Yipeng1
2018-10-03 0:16 ` Wang, Yipeng1
2018-10-03 17:39 ` Honnappa Nagarahalli
2018-09-06 17:12 ` [dpdk-dev] [PATCH 4/4] hash: enable lock-free reader-writer concurrency Honnappa Nagarahalli
2018-09-28 1:33 ` Wang, Yipeng1
2018-10-01 4:11 ` Honnappa Nagarahalli
2018-10-01 23:54 ` Wang, Yipeng1
2018-10-11 5:24 ` Honnappa Nagarahalli
2018-09-14 21:18 ` [dpdk-dev] [PATCH 0/4] Address reader-writer concurrency in rte_hash Honnappa Nagarahalli
2018-09-26 14:36 ` Honnappa Nagarahalli
2018-09-27 23:45 ` Wang, Yipeng1
2018-09-28 21:11 ` Honnappa Nagarahalli
2018-10-02 0:30 ` Wang, Yipeng1