DPDK patches and discussions
* [dpdk-dev] [PATCH] hash: document breakage with multi-writer thread
@ 2020-06-04 17:17 Stephen Hemminger
  2020-06-04 17:51 ` Honnappa Nagarahalli
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Stephen Hemminger @ 2020-06-04 17:17 UTC (permalink / raw)
  To: Yipeng Wang, Sameh Gobriel, Bruce Richardson
  Cc: dev, Stephen Hemminger, honnappa.nagarahalli, pablo.de.lara.guarch

The code in rte_cuckoo_hash multi-writer support is broken if write
operations are called from a non-EAL thread.

rte_lcore_id() will return LCORE_ID_ANY (UINT32_MAX) for non EAL
thread and that leads to using wrong local cache.

Add error checks and document the restriction.

Fixes: 9d033dac7d7c ("hash: support no free on delete")
Fixes: 5915699153d7 ("hash: fix scaling by reducing contention")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Cc: honnappa.nagarahalli@arm.com
Cc: pablo.de.lara.guarch@intel.com
---
 doc/guides/prog_guide/hash_lib.rst | 1 +
 lib/librte_hash/rte_cuckoo_hash.c  | 9 +++++++++
 2 files changed, 10 insertions(+)

diff --git a/doc/guides/prog_guide/hash_lib.rst b/doc/guides/prog_guide/hash_lib.rst
index d06c7de2ead1..29b41a425a43 100644
--- a/doc/guides/prog_guide/hash_lib.rst
+++ b/doc/guides/prog_guide/hash_lib.rst
@@ -85,6 +85,7 @@ For concurrent writes, and concurrent reads and writes the following flag values
 
 *  If the multi-writer flag (RTE_HASH_EXTRA_FLAGS_MULTI_WRITER_ADD) is set, multiple threads writing to the table is allowed.
    Key add, delete, and table reset are protected from other writer threads. With only this flag set, readers are not protected from ongoing writes.
+   The writer threads must be EAL threads; it is not safe to write to a multi-writer hash table from an interrupt, control, or other non-EAL thread.
 
 *  If the read/write concurrency (RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY) is set, multithread read/write operation is safe
    (i.e., application does not need to stop the readers from accessing the hash table until writers finish their updates. Readers and writers can operate on the table concurrently).
diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index 90cb99b0eef8..79c94107a582 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -979,6 +979,9 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	/* Did not find a match, so get a new slot for storing the new key */
 	if (h->use_local_cache) {
 		lcore_id = rte_lcore_id();
+		if (lcore_id == LCORE_ID_ANY)
+			return -EINVAL;
+
 		cached_free_slots = &h->local_free_slots[lcore_id];
 		/* Try to get a free slot from the local cache */
 		if (cached_free_slots->len == 0) {
@@ -1382,6 +1385,10 @@ remove_entry(const struct rte_hash *h, struct rte_hash_bucket *bkt, unsigned i)
 
 	if (h->use_local_cache) {
 		lcore_id = rte_lcore_id();
+		ERR_IF_TRUE((lcore_id == LCORE_ID_ANY),
+			    "%s: attempt to remove entry from non EAL thread\n",
+			    __func__);
+
 		cached_free_slots = &h->local_free_slots[lcore_id];
 		/* Cache full, need to free it. */
 		if (cached_free_slots->len == LCORE_CACHE_SIZE) {
@@ -1637,6 +1644,8 @@ rte_hash_free_key_with_position(const struct rte_hash *h,
 
 	if (h->use_local_cache) {
 		lcore_id = rte_lcore_id();
+		RETURN_IF_TRUE((lcore_id == LCORE_ID_ANY), -EINVAL);
+
 		cached_free_slots = &h->local_free_slots[lcore_id];
 		/* Cache full, need to free it. */
 		if (cached_free_slots->len == LCORE_CACHE_SIZE) {
-- 
2.26.2
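
For illustration only, and not part of the patch: a minimal sketch of the failure mode and of the guard added above. It assumes the per-lcore free-slot cache is an array with one entry per possible lcore. The names sketch_hash, sketch_lcore_cache, sketch_get_local_cache() and SKETCH_MAX_LCORE (an arbitrary stand-in for RTE_MAX_LCORE) are hypothetical; only the LCORE_ID_ANY value (UINT32_MAX) is taken from the description above. Indexing the array with UINT32_MAX would land far outside it, so the helper refuses any id that is out of range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_LCORE    128u        /* arbitrary stand-in for RTE_MAX_LCORE */
#define SKETCH_LCORE_ID_ANY UINT32_MAX  /* value stated in the patch description */

/* Hypothetical per-lcore cache; the real structure holds free slot pointers. */
struct sketch_lcore_cache {
	unsigned int len;
};

/* Hypothetical table with one cache entry per possible lcore. */
struct sketch_hash {
	int use_local_cache;
	struct sketch_lcore_cache local_free_slots[SKETCH_MAX_LCORE];
};

/*
 * Return the caller's cache, or NULL when the id cannot index the array
 * (which includes SKETCH_LCORE_ID_ANY, the id of a non-EAL thread).
 */
static struct sketch_lcore_cache *
sketch_get_local_cache(struct sketch_hash *h, uint32_t lcore_id)
{
	assert(h != NULL);
	if (lcore_id >= SKETCH_MAX_LCORE)
		return NULL;
	return &h->local_free_slots[lcore_id];
}
```

A caller would translate the NULL result into an error such as -EINVAL, which is the behaviour the patch chooses for the add and free paths.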



* Re: [dpdk-dev] [PATCH] hash: document breakage with multi-writer thread
  2020-06-04 17:17 [dpdk-dev] [PATCH] hash: document breakage with multi-writer thread Stephen Hemminger
@ 2020-06-04 17:51 ` Honnappa Nagarahalli
  2020-06-04 17:58   ` Stephen Hemminger
  2020-06-04 21:32 ` Wang, Yipeng1
  2020-06-05 18:35 ` Stephen Hemminger
  2 siblings, 1 reply; 12+ messages in thread
From: Honnappa Nagarahalli @ 2020-06-04 17:51 UTC (permalink / raw)
  To: Stephen Hemminger, Yipeng Wang, Sameh Gobriel, Bruce Richardson
  Cc: dev, pablo.de.lara.guarch, nd, nd

<snip>

> Subject: [PATCH] hash: document breakage with multi-writer thread
> 
> The code in rte_cuckoo_hash multi-writer support is broken if write
> operations are called from a non-EAL thread.
> 
> rte_lcore_id() will return LCORE_ID_ANY (UINT32_MAX) for non EAL thread
> and that leads to using wrong local cache.
> 
> Add error checks and document the restriction.
Having multiple non-EAL writer threads is a valid use case. Should we fix the issue instead?

> 
> Fixes: 9d033dac7d7c ("hash: support no free on delete")
> Fixes: 5915699153d7 ("hash: fix scaling by reducing contention")
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> Cc: honnappa.nagarahalli@arm.com
> Cc: pablo.de.lara.guarch@intel.com
> ---
>  doc/guides/prog_guide/hash_lib.rst | 1 +
>  lib/librte_hash/rte_cuckoo_hash.c  | 9 +++++++++
>  2 files changed, 10 insertions(+)
> 
> diff --git a/doc/guides/prog_guide/hash_lib.rst
> b/doc/guides/prog_guide/hash_lib.rst
> index d06c7de2ead1..29b41a425a43 100644
> --- a/doc/guides/prog_guide/hash_lib.rst
> +++ b/doc/guides/prog_guide/hash_lib.rst
> @@ -85,6 +85,7 @@ For concurrent writes, and concurrent reads and writes
> the following flag values
> 
>  *  If the multi-writer flag (RTE_HASH_EXTRA_FLAGS_MULTI_WRITER_ADD) is
> set, multiple threads writing to the table is allowed.
>     Key add, delete, and table reset are protected from other writer threads.
> With only this flag set, readers are not protected from ongoing writes.
> +   The writer threads must be EAL threads, it is not safe to write to a multi-
> writer hash table from an interrupt, control or other threads.
> 
>  *  If the read/write concurrency
> (RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY) is set, multithread
> read/write operation is safe
>     (i.e., application does not need to stop the readers from accessing the hash
> table until writers finish their updates. Readers and writers can operate on
> the table concurrently).
> diff --git a/lib/librte_hash/rte_cuckoo_hash.c
> b/lib/librte_hash/rte_cuckoo_hash.c
> index 90cb99b0eef8..79c94107a582 100644
> --- a/lib/librte_hash/rte_cuckoo_hash.c
> +++ b/lib/librte_hash/rte_cuckoo_hash.c
> @@ -979,6 +979,9 @@ __rte_hash_add_key_with_hash(const struct
> rte_hash *h, const void *key,
>  	/* Did not find a match, so get a new slot for storing the new key */
>  	if (h->use_local_cache) {
>  		lcore_id = rte_lcore_id();
> +		if (lcore_id == LCORE_ID_ANY)
> +			return -EINVAL;
> +
>  		cached_free_slots = &h->local_free_slots[lcore_id];
>  		/* Try to get a free slot from the local cache */
>  		if (cached_free_slots->len == 0) {
> @@ -1382,6 +1385,10 @@ remove_entry(const struct rte_hash *h, struct
> rte_hash_bucket *bkt, unsigned i)
> 
>  	if (h->use_local_cache) {
>  		lcore_id = rte_lcore_id();
> +		ERR_IF_TRUE((lcore_id == LCORE_ID_ANY),
> +			    "%s: attempt to remove entry from non EAL
> thread\n",
> +			    __func__);
> +
>  		cached_free_slots = &h->local_free_slots[lcore_id];
>  		/* Cache full, need to free it. */
>  		if (cached_free_slots->len == LCORE_CACHE_SIZE) {
> @@ -1637,6 +1644,8 @@ rte_hash_free_key_with_position(const struct rte_hash *h,
> 
>  	if (h->use_local_cache) {
>  		lcore_id = rte_lcore_id();
> +		RETURN_IF_TRUE((lcore_id == LCORE_ID_ANY), -EINVAL);
> +
>  		cached_free_slots = &h->local_free_slots[lcore_id];
>  		/* Cache full, need to free it. */
>  		if (cached_free_slots->len == LCORE_CACHE_SIZE) {
> --
> 2.26.2



* Re: [dpdk-dev] [PATCH] hash: document breakage with multi-writer thread
  2020-06-04 17:51 ` Honnappa Nagarahalli
@ 2020-06-04 17:58   ` Stephen Hemminger
  2020-06-04 18:43     ` Honnappa Nagarahalli
  0 siblings, 1 reply; 12+ messages in thread
From: Stephen Hemminger @ 2020-06-04 17:58 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: Yipeng Wang, Sameh Gobriel, Bruce Richardson, dev,
	pablo.de.lara.guarch, nd

On Thu, 4 Jun 2020 17:51:43 +0000
Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> wrote:

> <snip>
> 
> > Subject: [PATCH] hash: document breakage with multi-writer thread
> > 
> > The code in rte_cuckoo_hash multi-writer support is broken if write
> > operations are called from a non-EAL thread.
> > 
> > rte_lcore_id() will return LCORE_ID_ANY (UINT32_MAX) for non EAL thread
> > and that leads to using wrong local cache.
> > 
> > Add error checks and document the restriction.  
> Having multiple non-EAL writer threads is a valid use case. Should we fix the issue instead?

Discovered this the hard way...

Fixing is non-trivial. Basically, the local cache has to be taken out, and
that leads to having to do real locking or atomic operations.


* Re: [dpdk-dev] [PATCH] hash: document breakage with multi-writer thread
  2020-06-04 17:58   ` Stephen Hemminger
@ 2020-06-04 18:43     ` Honnappa Nagarahalli
  2020-06-04 19:10       ` Wang, Yipeng1
  0 siblings, 1 reply; 12+ messages in thread
From: Honnappa Nagarahalli @ 2020-06-04 18:43 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Yipeng Wang, Sameh Gobriel, Bruce Richardson, dev,
	pablo.de.lara.guarch, nd, Honnappa Nagarahalli, nd

> > <snip>
> >
> > > Subject: [PATCH] hash: document breakage with multi-writer thread
> > >
> > > The code in rte_cuckoo_hash multi-writer support is broken if write
> > > operations are called from a non-EAL thread.
> > >
> > > rte_lcore_id() will return LCORE_ID_ANY (UINT32_MAX) for non EAL
> > > thread and that leads to using wrong local cache.
> > >
> > > Add error checks and document the restriction.
> > Having multiple non-EAL writer threads is a valid use case. Should we fix the
> issue instead?
> 
> Discovered this the hard way...
> 
> Fixing is non-trivial. Basically, the local cache has to be taken out and that
> leads to having to do real locking or atomic operations.
Looking at rte_hash_create function:

        if (params->extra_flag & RTE_HASH_EXTRA_FLAGS_MULTI_WRITER_ADD) {
                use_local_cache = 1;
                writer_takes_lock = 1;
        }

The writer locks are already in place. The code to handle the case where the local cache is not used is also there.
What we need is another input flag meaning 'multi-writer + non-EAL threads', which would set 'use_local_cache = 0' and 'writer_takes_lock = 1'.
I am not sure it would be a valuable addition, but it looks like this is what you were expecting when you enabled 'RTE_HASH_EXTRA_FLAGS_MULTI_WRITER_ADD'. Many other APIs in DPDK do not provide this kind of MT safety.
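
As a rough sketch of that idea, assuming such a flag existed: the flag name SKETCH_FLAG_MULTI_WRITER_NON_EAL, both flag values, and the sketch_cfg struct below are hypothetical and are not part of the rte_hash API. Only the two settings (use_local_cache, writer_takes_lock) are taken from the snippet above.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only; they do not match the real rte_hash flags. */
#define SKETCH_FLAG_MULTI_WRITER_ADD     0x1u
#define SKETCH_FLAG_MULTI_WRITER_NON_EAL 0x2u /* hypothetical new flag */

/* Hypothetical holder for the two settings chosen at table creation. */
struct sketch_cfg {
	int use_local_cache;
	int writer_takes_lock;
};

/* Map creation flags to settings, as rte_hash_create() does for its flags. */
static struct sketch_cfg
sketch_configure(uint32_t extra_flag)
{
	struct sketch_cfg cfg = { 0, 0 };

	if (extra_flag & SKETCH_FLAG_MULTI_WRITER_ADD) {
		cfg.use_local_cache = 1;
		cfg.writer_takes_lock = 1;
	}
	/* Non-EAL writers: keep the lock, but never use the per-lcore cache. */
	if (extra_flag & SKETCH_FLAG_MULTI_WRITER_NON_EAL) {
		cfg.use_local_cache = 0;
		cfg.writer_takes_lock = 1;
	}
	/* In this sketch the local cache is only ever enabled with the lock. */
	assert(!cfg.use_local_cache || cfg.writer_takes_lock);
	return cfg;
}
```

In this sketch the hypothetical flag wins when both are set, so a table created for non-EAL writers never touches the per-lcore array.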


* Re: [dpdk-dev] [PATCH] hash: document breakage with multi-writer thread
  2020-06-04 18:43     ` Honnappa Nagarahalli
@ 2020-06-04 19:10       ` Wang, Yipeng1
  2020-06-04 19:34         ` Honnappa Nagarahalli
  0 siblings, 1 reply; 12+ messages in thread
From: Wang, Yipeng1 @ 2020-06-04 19:10 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Stephen Hemminger
  Cc: Gobriel, Sameh, Richardson, Bruce, dev, De Lara Guarch, Pablo, nd, nd

> > > <snip>
> > >
> > > > Subject: [PATCH] hash: document breakage with multi-writer thread
> > > >
> > > > The code in rte_cuckoo_hash multi-writer support is broken if
> > > > write operations are called from a non-EAL thread.
> > > >
> > > > rte_lcore_id() will return LCORE_ID_ANY (UINT32_MAX) for non EAL
> > > > thread and that leads to using wrong local cache.
> > > >
> > > > Add error checks and document the restriction.
> > > Having multiple non-EAL writer threads is a valid use case. Should
> > > we fix the
> > issue instead?
> >
> > Discovered this the hard way...
> >
> > Fixing is non-trivial. Basically, the local cache has to be taken out
> > and that leads to having to do real locking or atomic operations.
> Looking at rte_hash_create function:
> 
>         if (params->extra_flag &
> RTE_HASH_EXTRA_FLAGS_MULTI_WRITER_ADD) {
>                 use_local_cache = 1;
>                 writer_takes_lock = 1;
>         }
> 
> The writer locks are in place already. The code to handle the case when local
> cache is taken out is also there.
> What we need is another input flag that says 'multi writer + non-eal threads'
> which would set 'use_local_cache = 0' and 'writer_takes_lock = 1'.
> Not sure, it would be valuable addition. But looks like this is what you were
> expecting when you had enabled
> 'RTE_HASH_EXTRA_FLAGS_MULTI_WRITER_ADD'. Many other APIs in DPDK
> do not provide this kind of MT safety.

[Wang, Yipeng]
If possible, we should try not to add new flags, because there are already a lot of flag options.
How about checking in the code whether the writer is a non-EAL thread, using rte_lcore_id(), and operating directly on the global queue in that case?
Could this work?
if (h->use_local_cache) {
	lcore_id = rte_lcore_id();
	if (lcore_id == LCORE_ID_ANY) {	/* non-EAL thread */
		<call rte_ring_mp/mc_* to directly operate on global queue>
	} else {
		<original path>
	}
}
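
A self-contained sketch of that dispatch, for illustration only. It does not use DPDK: the global queue is a plain array rather than an rte_ring, so it is not thread-safe, and a real version would need the multi-producer/multi-consumer ring operations mentioned above. All names (sketch_table, sketch_cache, sketch_free_slot, the SKETCH_* constants) are hypothetical. A full local cache simply spills to the global queue here, which is a simplification of the real flush logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_LCORE    4u          /* arbitrary stand-in for RTE_MAX_LCORE */
#define SKETCH_LCORE_ID_ANY UINT32_MAX  /* id reported for a non-EAL thread */
#define SKETCH_CACHE_SIZE   8u
#define SKETCH_GLOBAL_SIZE  32u

/* Hypothetical per-lcore cache of free slot indexes. */
struct sketch_cache {
	unsigned int len;
	uint32_t slots[SKETCH_CACHE_SIZE];
};

/* Hypothetical table: per-lcore caches plus one shared free-slot queue. */
struct sketch_table {
	struct sketch_cache local[SKETCH_MAX_LCORE];
	unsigned int global_len;
	uint32_t global[SKETCH_GLOBAL_SIZE];
};

/*
 * Give a free slot back. EAL threads (valid lcore id) use their own cache;
 * any other caller, or a caller whose cache is full, goes to the global queue.
 * Returns 0 on success, -1 if the global queue has no room.
 */
static int
sketch_free_slot(struct sketch_table *t, uint32_t lcore_id, uint32_t slot)
{
	assert(t != NULL);

	if (lcore_id < SKETCH_MAX_LCORE) {
		struct sketch_cache *c = &t->local[lcore_id];

		if (c->len < SKETCH_CACHE_SIZE) {
			c->slots[c->len++] = slot;
			return 0;
		}
		/* Cache full: fall through to the global queue. */
	}

	if (t->global_len == SKETCH_GLOBAL_SIZE)
		return -1;
	t->global[t->global_len++] = slot;
	return 0;
}
```

The per-lcore array is never indexed with SKETCH_LCORE_ID_ANY, which is the property the thread is after; the cost is that non-EAL writers always pay for the shared queue.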


* Re: [dpdk-dev] [PATCH] hash: document breakage with multi-writer thread
  2020-06-04 19:10       ` Wang, Yipeng1
@ 2020-06-04 19:34         ` Honnappa Nagarahalli
  2020-06-04 20:22           ` Wang, Yipeng1
  0 siblings, 1 reply; 12+ messages in thread
From: Honnappa Nagarahalli @ 2020-06-04 19:34 UTC (permalink / raw)
  To: Wang, Yipeng1, Stephen Hemminger
  Cc: Gobriel, Sameh, Richardson, Bruce, dev, De Lara Guarch,  Pablo,
	nd, Honnappa Nagarahalli, nd

<snip>

> > > >
> > > > > Subject: [PATCH] hash: document breakage with multi-writer
> > > > > thread
> > > > >
> > > > > The code in rte_cuckoo_hash multi-writer support is broken if
> > > > > write operations are called from a non-EAL thread.
> > > > >
> > > > > rte_lcore_id() will return LCORE_ID_ANY (UINT32_MAX) for non EAL
> > > > > thread and that leads to using wrong local cache.
> > > > >
> > > > > Add error checks and document the restriction.
> > > > Having multiple non-EAL writer threads is a valid use case. Should
> > > > we fix the
> > > issue instead?
> > >
> > > Discovered this the hard way...
> > >
> > > Fixing is non-trivial. Basically, the local cache has to be taken out
> > > and that leads to having to do real locking or atomic operations.
> > Looking at rte_hash_create function:
> >
> >         if (params->extra_flag &
> > RTE_HASH_EXTRA_FLAGS_MULTI_WRITER_ADD) {
> >                 use_local_cache = 1;
> >                 writer_takes_lock = 1;
> >         }
> >
> > The writer locks are in place already. The code to handle the case
> > when local cache is taken out is also there.
> > What we need is another input flag that says 'multi writer + non-eal
> threads'
> > which would set 'use_local_cache = 0' and 'writer_takes_lock = 1'.
> > Not sure, it would be valuable addition. But looks like this is what
> > you were expecting when you had enabled
> > 'RTE_HASH_EXTRA_FLAGS_MULTI_WRITER_ADD'. Many other APIs in DPDK
> do
> > not provide this kind of MT safety.
> 
> [Wang, Yipeng]
> If possible, we can try to not add new flags, because there are already a lot of
> flag options.
> How about in the code, we check if the writer is a non-eal or not by checking
> the rte_lcore_id, and operate on the global queue?
> Could this work?
> If(h->use_local_cache) {
> 	lcore_id = rte_lcore_id();
> 	if(lcore_id == LCORE_ID_ANY) {   // this is non-eal threads
> 		<call rte_ring_mp/mc_* to directly operate on global queue>
> 	}
> 	Else {
> 		<original path>
> 	}
> }
The other thing I wanted to do was to save the memory allocated for the local cache when the writers are non-EAL threads. Without knowing the kind of threads up front, we might have to create the local cache when a writer adds an entry for the first time.


* Re: [dpdk-dev] [PATCH] hash: document breakage with multi-writer thread
  2020-06-04 19:34         ` Honnappa Nagarahalli
@ 2020-06-04 20:22           ` Wang, Yipeng1
  2020-06-04 21:06             ` Stephen Hemminger
  0 siblings, 1 reply; 12+ messages in thread
From: Wang, Yipeng1 @ 2020-06-04 20:22 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Stephen Hemminger
  Cc: Gobriel, Sameh, Richardson, Bruce, dev, De Lara Guarch, Pablo, nd, nd

> -----Original Message-----
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Sent: Thursday, June 4, 2020 12:34 PM
> To: Wang, Yipeng1 <yipeng1.wang@intel.com>; Stephen Hemminger
> <stephen@networkplumber.org>
> Cc: Gobriel, Sameh <sameh.gobriel@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; dev@dpdk.org; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>; nd <nd@arm.com>; Honnappa
> Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>
> Subject: RE: [PATCH] hash: document breakage with multi-writer thread
> 
> <snip>
> 
> > > > >
> > > > > > Subject: [PATCH] hash: document breakage with multi-writer
> > > > > > thread
> > > > > >
> > > > > > The code in rte_cuckoo_hash multi-writer support is broken if
> > > > > > write operations are called from a non-EAL thread.
> > > > > >
> > > > > > rte_lcore_id() will return LCORE_ID_ANY (UINT32_MAX) for non
> > > > > > EAL thread and that leads to using wrong local cache.
> > > > > >
> > > > > > Add error checks and document the restriction.
> > > > > Having multiple non-EAL writer threads is a valid use case.
> > > > > Should we fix the
> > > > issue instead?
> > > >
> > > > Discovered this the hard way...
> > > >
> > > > Fixing is non-trivial. Basically, the local cache has to be taken
> > > > out and that leads to having to do real locking or atomic operations.
> > > Looking at rte_hash_create function:
> > >
> > >         if (params->extra_flag &
> > > RTE_HASH_EXTRA_FLAGS_MULTI_WRITER_ADD) {
> > >                 use_local_cache = 1;
> > >                 writer_takes_lock = 1;
> > >         }
> > >
> > > The writer locks are in place already. The code to handle the case
> > > when local cache is taken out is also there.
> > > What we need is another input flag that says 'multi writer + non-eal
> > threads'
> > > which would set 'use_local_cache = 0' and 'writer_takes_lock = 1'.
> > > Not sure, it would be valuable addition. But looks like this is what
> > > you were expecting when you had enabled
> > > 'RTE_HASH_EXTRA_FLAGS_MULTI_WRITER_ADD'. Many other APIs in
> DPDK
> > do
> > > not provide this kind of MT safety.
> >
> > [Wang, Yipeng]
> > If possible, we can try to not add new flags, because there are
> > already a lot of flag options.
> > How about in the code, we check if the writer is a non-eal or not by
> > checking the rte_lcore_id, and operate on the global queue?
> > Could this work?
> > If(h->use_local_cache) {
> > 	lcore_id = rte_lcore_id();
> > 	if(lcore_id == LCORE_ID_ANY) {   // this is non-eal threads
> > 		<call rte_ring_mp/mc_* to directly operate on global queue>
> > 	}
> > 	Else {
> > 		<original path>
> > 	}
> > }
> The other thing I wanted to do was saving on the memory allocated for the
> local cache when the writers are non-eal threads. Without knowing the kind
> of threads upfront, we might have to create the local cache when a writer
> adds the entry first time.

I see what you mean. If people only use non-EAL threads, we could save the space of the local cache completely.
Creating the local cache during the first write is one solution. But the current rte_hash always allocates everything at
table creation time. This guarantees that the program won't fail midway due to a memory allocation failure.

Meanwhile, I would rather waste some space than add another option flag related to multi-threading.
In my opinion, all those flags are already confusing enough. It would also be harder to maintain in the future.

 


* Re: [dpdk-dev] [PATCH] hash: document breakage with multi-writer thread
  2020-06-04 20:22           ` Wang, Yipeng1
@ 2020-06-04 21:06             ` Stephen Hemminger
  0 siblings, 0 replies; 12+ messages in thread
From: Stephen Hemminger @ 2020-06-04 21:06 UTC (permalink / raw)
  To: Wang, Yipeng1
  Cc: Honnappa Nagarahalli, Gobriel, Sameh, Richardson, Bruce, dev,
	De Lara Guarch, Pablo, nd

On Thu, 4 Jun 2020 20:22:06 +0000
"Wang, Yipeng1" <yipeng1.wang@intel.com> wrote:

> > -----Original Message-----
> > From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > Sent: Thursday, June 4, 2020 12:34 PM
> > To: Wang, Yipeng1 <yipeng1.wang@intel.com>; Stephen Hemminger
> > <stephen@networkplumber.org>
> > Cc: Gobriel, Sameh <sameh.gobriel@intel.com>; Richardson, Bruce
> > <bruce.richardson@intel.com>; dev@dpdk.org; De Lara Guarch, Pablo
> > <pablo.de.lara.guarch@intel.com>; nd <nd@arm.com>; Honnappa
> > Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>
> > Subject: RE: [PATCH] hash: document breakage with multi-writer thread
> > 
> > <snip>
> >   
> > > > > >  
> > > > > > > Subject: [PATCH] hash: document breakage with multi-writer
> > > > > > > thread
> > > > > > >
> > > > > > > The code in rte_cuckoo_hash multi-writer support is broken if
> > > > > > > write operations are called from a non-EAL thread.
> > > > > > >
> > > > > > > rte_lcore_id() will return LCORE_ID_ANY (UINT32_MAX) for non
> > > > > > > EAL thread and that leads to using wrong local cache.
> > > > > > >
> > > > > > > Add error checks and document the restriction.  
> > > > > > Having multiple non-EAL writer threads is a valid use case.
> > > > > > Should we fix the  
> > > > > issue instead?
> > > > >
> > > > > Discovered this the hard way...
> > > > >
> > > > > Fixing is non-trivial. Basically, the local cache has to be taken
> > > > > out and that leads to having to do real locking or atomic operations.  
> > > > Looking at rte_hash_create function:
> > > >
> > > >         if (params->extra_flag &
> > > > RTE_HASH_EXTRA_FLAGS_MULTI_WRITER_ADD) {
> > > >                 use_local_cache = 1;
> > > >                 writer_takes_lock = 1;
> > > >         }
> > > >
> > > > The writer locks are in place already. The code to handle the case
> > > > when local cache is taken out is also there.
> > > > What we need is another input flag that says 'multi writer + non-eal  
> > > threads'  
> > > > which would set 'use_local_cache = 0' and 'writer_takes_lock = 1'.
> > > > Not sure, it would be valuable addition. But looks like this is what
> > > > you were expecting when you had enabled
> > > > 'RTE_HASH_EXTRA_FLAGS_MULTI_WRITER_ADD'. Many other APIs in  
> > DPDK  
> > > do  
> > > > not provide this kind of MT safety.  
> > >
> > > [Wang, Yipeng]
> > > If possible, we can try to not add new flags, because there are
> > > already a lot of flag options.
> > > How about in the code, we check if the writer is a non-eal or not by
> > > checking the rte_lcore_id, and operate on the global queue?
> > > Could this work?
> > > If(h->use_local_cache) {
> > > 	lcore_id = rte_lcore_id();
> > > 	if(lcore_id == LCORE_ID_ANY) {   // this is non-eal threads
> > > 		<call rte_ring_mp/mc_* to directly operate on global queue>
> > > 	}
> > > 	Else {
> > > 		<original path>
> > > 	}
> > > }  
> > The other thing I wanted to do was saving on the memory allocated for the
> > local cache when the writers are non-eal threads. Without knowing the kind
> > of threads upfront, we might have to create the local cache when a writer
> > adds the entry first time.  
> 
> I got what you mean.  If people only use non-eal threads, we could save the space of local cache completely. 
> Creating local cache during the first write is one solution. But the current rte_hash always allocate things during
> table creation time. This provides guarantee that the program won't fail in the middle due to memory allocation issue.
> 
> Meanwhile I would rather be wasting some space than adding another option flag related to multi-threading.
> In my opinion, all those flags are already confusing enough. It would also be harder to maintain in future.

I don't care about the exact fix; I just don't want to randomly corrupt memory.
Having it work would be better, but can we just error out for now? The existing code is broken.


* Re: [dpdk-dev] [PATCH] hash: document breakage with multi-writer thread
  2020-06-04 17:17 [dpdk-dev] [PATCH] hash: document breakage with multi-writer thread Stephen Hemminger
  2020-06-04 17:51 ` Honnappa Nagarahalli
@ 2020-06-04 21:32 ` Wang, Yipeng1
  2020-06-05 18:35 ` Stephen Hemminger
  2 siblings, 0 replies; 12+ messages in thread
From: Wang, Yipeng1 @ 2020-06-04 21:32 UTC (permalink / raw)
  To: Stephen Hemminger, Gobriel, Sameh, Richardson, Bruce
  Cc: dev, honnappa.nagarahalli, De Lara Guarch, Pablo

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Thursday, June 4, 2020 10:18 AM
> To: Wang, Yipeng1 <yipeng1.wang@intel.com>; Gobriel, Sameh
> <sameh.gobriel@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>
> Cc: dev@dpdk.org; Stephen Hemminger <stephen@networkplumber.org>;
> honnappa.nagarahalli@arm.com; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>
> Subject: [PATCH] hash: document breakage with multi-writer thread
> 
> The code in rte_cuckoo_hash multi-writer support is broken if write
> operations are called from a non-EAL thread.
> 
> rte_lcore_id() will return LCORE_ID_ANY (UINT32_MAX) for non EAL thread
> and that leads to using wrong local cache.
> 
> Add error checks and document the restriction.
> 
> Fixes: 9d033dac7d7c ("hash: support no free on delete")
> Fixes: 5915699153d7 ("hash: fix scaling by reducing contention")
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> Cc: honnappa.nagarahalli@arm.com
> Cc: pablo.de.lara.guarch@intel.com
> ---
>  doc/guides/prog_guide/hash_lib.rst | 1 +
> lib/librte_hash/rte_cuckoo_hash.c  | 9 +++++++++
>  2 files changed, 10 insertions(+)
> 
> diff --git a/doc/guides/prog_guide/hash_lib.rst
> b/doc/guides/prog_guide/hash_lib.rst
> index d06c7de2ead1..29b41a425a43 100644
> --- a/doc/guides/prog_guide/hash_lib.rst
> +++ b/doc/guides/prog_guide/hash_lib.rst
> @@ -85,6 +85,7 @@ For concurrent writes, and concurrent reads and writes
> the following flag values
> 
>  *  If the multi-writer flag (RTE_HASH_EXTRA_FLAGS_MULTI_WRITER_ADD)
> is set, multiple threads writing to the table is allowed.
>     Key add, delete, and table reset are protected from other writer threads.
> With only this flag set, readers are not protected from ongoing writes.
> +   The writer threads must be EAL threads, it is not safe to write to a multi-
> writer hash table from an interrupt, control or other threads.
> 
>  *  If the read/write concurrency
> (RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY) is set, multithread
> read/write operation is safe
>     (i.e., application does not need to stop the readers from accessing the hash
> table until writers finish their updates. Readers and writers can operate on
> the table concurrently).
> diff --git a/lib/librte_hash/rte_cuckoo_hash.c
> b/lib/librte_hash/rte_cuckoo_hash.c
> index 90cb99b0eef8..79c94107a582 100644
> --- a/lib/librte_hash/rte_cuckoo_hash.c
> +++ b/lib/librte_hash/rte_cuckoo_hash.c
> @@ -979,6 +979,9 @@ __rte_hash_add_key_with_hash(const struct
> rte_hash *h, const void *key,
>  	/* Did not find a match, so get a new slot for storing the new key */
>  	if (h->use_local_cache) {
>  		lcore_id = rte_lcore_id();
> +		if (lcore_id == LCORE_ID_ANY)
> +			return -EINVAL;
> +
>  		cached_free_slots = &h->local_free_slots[lcore_id];
>  		/* Try to get a free slot from the local cache */
>  		if (cached_free_slots->len == 0) {
> @@ -1382,6 +1385,10 @@ remove_entry(const struct rte_hash *h, struct
> rte_hash_bucket *bkt, unsigned i)
> 
>  	if (h->use_local_cache) {
>  		lcore_id = rte_lcore_id();
> +		ERR_IF_TRUE((lcore_id == LCORE_ID_ANY),
> +			    "%s: attempt to remove entry from non EAL
> thread\n",
> +			    __func__);
> +
>  		cached_free_slots = &h->local_free_slots[lcore_id];
>  		/* Cache full, need to free it. */
>  		if (cached_free_slots->len == LCORE_CACHE_SIZE) {
> @@ -1637,6 +1644,8 @@ rte_hash_free_key_with_position(const struct rte_hash *h,
> 
>  	if (h->use_local_cache) {
>  		lcore_id = rte_lcore_id();
> +		RETURN_IF_TRUE((lcore_id == LCORE_ID_ANY), -EINVAL);
> +
>  		cached_free_slots = &h->local_free_slots[lcore_id];
>  		/* Cache full, need to free it. */
>  		if (cached_free_slots->len == LCORE_CACHE_SIZE) {
> --
> 2.26.2
[Wang, Yipeng] 
Since there is no conclusion on a better fix yet, I am acking this fix.
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>



* Re: [dpdk-dev] [PATCH] hash: document breakage with multi-writer thread
  2020-06-04 17:17 [dpdk-dev] [PATCH] hash: document breakage with multi-writer thread Stephen Hemminger
  2020-06-04 17:51 ` Honnappa Nagarahalli
  2020-06-04 21:32 ` Wang, Yipeng1
@ 2020-06-05 18:35 ` Stephen Hemminger
  2020-06-16 16:12   ` Thomas Monjalon
  2 siblings, 1 reply; 12+ messages in thread
From: Stephen Hemminger @ 2020-06-05 18:35 UTC (permalink / raw)
  To: Yipeng Wang, Sameh Gobriel, Bruce Richardson
  Cc: dev, honnappa.nagarahalli, pablo.de.lara.guarch

On Thu,  4 Jun 2020 10:17:31 -0700
Stephen Hemminger <stephen@networkplumber.org> wrote:

> The code in rte_cuckoo_hash multi-writer support is broken if write
> operations are called from a non-EAL thread.
> 
> rte_lcore_id() will return LCORE_ID_ANY (UINT32_MAX) for non EAL
> thread and that leads to using wrong local cache.
> 
> Add error checks and document the restriction.
> 
> Fixes: 9d033dac7d7c ("hash: support no free on delete")
> Fixes: 5915699153d7 ("hash: fix scaling by reducing contention")
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> Cc: honnappa.nagarahalli@arm.com
> Cc: pablo.de.lara.guarch@intel.com

This restriction also needs to be added to the known issues
section of EAL.


* Re: [dpdk-dev] [PATCH] hash: document breakage with multi-writer thread
  2020-06-05 18:35 ` Stephen Hemminger
@ 2020-06-16 16:12   ` Thomas Monjalon
  2020-11-26 17:56     ` Thomas Monjalon
  0 siblings, 1 reply; 12+ messages in thread
From: Thomas Monjalon @ 2020-06-16 16:12 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Yipeng Wang, Sameh Gobriel, Bruce Richardson, dev,
	honnappa.nagarahalli, pablo.de.lara.guarch

05/06/2020 20:35, Stephen Hemminger:
> On Thu,  4 Jun 2020 10:17:31 -0700
> Stephen Hemminger <stephen@networkplumber.org> wrote:
> 
> > The code in rte_cuckoo_hash multi-writer support is broken if write
> > operations are called from a non-EAL thread.
> > 
> > rte_lcore_id() will return LCORE_ID_ANY (UINT32_MAX) for non EAL
> > thread and that leads to using wrong local cache.
> > 
> > Add error checks and document the restriction.
> > 
> > Fixes: 9d033dac7d7c ("hash: support no free on delete")
> > Fixes: 5915699153d7 ("hash: fix scaling by reducing contention")
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > Cc: honnappa.nagarahalli@arm.com
> > Cc: pablo.de.lara.guarch@intel.com
> 
> This restriction also needs to be added to the known issues
> section of EAL

Are you going to send a v2 adding doc?




* Re: [dpdk-dev] [PATCH] hash: document breakage with multi-writer thread
  2020-06-16 16:12   ` Thomas Monjalon
@ 2020-11-26 17:56     ` Thomas Monjalon
  0 siblings, 0 replies; 12+ messages in thread
From: Thomas Monjalon @ 2020-11-26 17:56 UTC (permalink / raw)
  To: Stephen Hemminger, Yipeng Wang, honnappa.nagarahalli
  Cc: dev, Sameh Gobriel, Bruce Richardson, pablo.de.lara.guarch

16/06/2020 18:12, Thomas Monjalon:
> 05/06/2020 20:35, Stephen Hemminger:
> > On Thu,  4 Jun 2020 10:17:31 -0700
> > Stephen Hemminger <stephen@networkplumber.org> wrote:
> > 
> > > The code in rte_cuckoo_hash multi-writer support is broken if write
> > > operations are called from a non-EAL thread.
> > > 
> > > rte_lcore_id() will return LCORE_ID_ANY (UINT32_MAX) for non EAL
> > > thread and that leads to using wrong local cache.
> > > 
> > > Add error checks and document the restriction.
> > > 
> > > Fixes: 9d033dac7d7c ("hash: support no free on delete")
> > > Fixes: 5915699153d7 ("hash: fix scaling by reducing contention")
> > > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > > Cc: honnappa.nagarahalli@arm.com
> > > Cc: pablo.de.lara.guarch@intel.com
> > 
> > This restriction also needs to be added to the known issues
> > section of EAL
> 
> Are you going to send a v2 adding doc?

Any follow-up?




