* [PATCH] eal: fix memory initialization deadlock
@ 2023-08-30 10:33 Artemy Kovalyov
2023-08-30 19:13 ` Dmitry Kozlyuk
` (3 more replies)
0 siblings, 4 replies; 12+ messages in thread
From: Artemy Kovalyov @ 2023-08-30 10:33 UTC (permalink / raw)
To: dev
Cc: Thomas Monjalon, Ophir Munk, stable, Anatoly Burakov,
Stephen Hemminger, Morten Brørup
The issue arose from a change in the DPDK read-write lock
implementation. That change added a new flag, RTE_RWLOCK_WAIT, designed
to prevent new read locks from being taken while a write lock is
queued. As a side effect, a recursive read lock (one acquired twice by
the same execution thread) can now set off a sequence of events
resulting in a deadlock:
Process 1 takes the first read lock.
Process 2 attempts to take a write lock and, because a read lock is
held, sets RTE_RWLOCK_WAIT. Process 2 then enters a wait loop until the
read lock is released.
Process 1 tries to take a second read lock. Because a write lock is
waiting (RTE_RWLOCK_WAIT is set), it also enters a wait loop until the
write lock is acquired and then released.
Both processes end up blocked, unable to proceed: a deadlock.
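For illustration, a minimal sketch of the interleaving using the public
rwlock API (illustrative only, not part of the patch; RTE_RWLOCK_WAIT
is internal to the lock implementation):

    #include <rte_rwlock.h>

    static rte_rwlock_t lock = RTE_RWLOCK_INITIALIZER;

    /* Process 1 */
    static void recursive_reader(void)
    {
        rte_rwlock_read_lock(&lock);  /* 1st read lock: granted */
        /* ... process 2 calls rte_rwlock_write_lock(&lock) here:
         * it sets RTE_RWLOCK_WAIT and spins until readers leave ... */
        rte_rwlock_read_lock(&lock);  /* 2nd read lock: spins because
                                       * RTE_RWLOCK_WAIT is set
                                       * -> deadlock */
        rte_rwlock_read_unlock(&lock);
        rte_rwlock_read_unlock(&lock);
    }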
With this change, the RW-lock no longer supports recursion: a single
thread must not take a read lock it already holds. The problem arises
during initialization: the rte_eal_init() function acquires
memory_hotplug_lock, and later the call sequence rte_eal_memory_init()
-> eal_memalloc_init() -> rte_memseg_list_walk() acquires it again
without releasing it. This risks a deadlock whenever a concurrent write
lock is requested on the same memory_hotplug_lock. Fix this by
replacing rte_memseg_list_walk() with
rte_memseg_list_walk_thread_unsafe().
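A simplified sketch of the recursive acquisition (the rte_mcfg_mem_*()
wrappers guard memory_hotplug_lock; call sites are illustrative, and
the exact acquisition point is discussed later in this thread):

    rte_eal_init()
        rte_mcfg_mem_read_lock();       /* 1st read lock */
        rte_eal_memory_init()
            eal_memalloc_init()
                rte_memseg_list_walk()
                    rte_mcfg_mem_read_lock();  /* 2nd, recursive read
                                                * lock: blocks if a
                                                * writer is waiting */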
Bugzilla ID: 1277
Fixes: 832cecc03d77 ("rwlock: prevent readers from starving writers")
Cc: stable@dpdk.org
Signed-off-by: Artemy Kovalyov <artemyko@nvidia.com>
---
lib/eal/include/generic/rte_rwlock.h | 4 ++++
lib/eal/linux/eal_memalloc.c | 7 +++++--
2 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/lib/eal/include/generic/rte_rwlock.h b/lib/eal/include/generic/rte_rwlock.h
index 9e083bbc61..c98fc7d083 100644
--- a/lib/eal/include/generic/rte_rwlock.h
+++ b/lib/eal/include/generic/rte_rwlock.h
@@ -80,6 +80,10 @@ rte_rwlock_init(rte_rwlock_t *rwl)
/**
* Take a read lock. Loop until the lock is held.
*
+ * @note The RW lock isn't recursive, so calling this function on the same
+ * lock twice without releasing it could potentially result in a deadlock
+ * scenario when a write lock is involved.
+ *
* @param rwl
* A pointer to a rwlock structure.
*/
diff --git a/lib/eal/linux/eal_memalloc.c b/lib/eal/linux/eal_memalloc.c
index f8b1588cae..3705b41f5f 100644
--- a/lib/eal/linux/eal_memalloc.c
+++ b/lib/eal/linux/eal_memalloc.c
@@ -1740,7 +1740,10 @@ eal_memalloc_init(void)
eal_get_internal_configuration();
if (rte_eal_process_type() == RTE_PROC_SECONDARY)
- if (rte_memseg_list_walk(secondary_msl_create_walk, NULL) < 0)
+ /* memory_hotplug_lock is taken in rte_eal_init(), so it's
+ * safe to call thread-unsafe version.
+ */
+ if (rte_memseg_list_walk_thread_unsafe(secondary_msl_create_walk, NULL) < 0)
return -1;
if (rte_eal_process_type() == RTE_PROC_PRIMARY &&
internal_conf->in_memory) {
@@ -1778,7 +1781,7 @@ eal_memalloc_init(void)
}
/* initialize all of the fd lists */
- if (rte_memseg_list_walk(fd_list_create_walk, NULL))
+ if (rte_memseg_list_walk_thread_unsafe(fd_list_create_walk, NULL))
return -1;
return 0;
}
--
2.25.1
* Re: [PATCH] eal: fix memory initialization deadlock
2023-08-30 10:33 [PATCH] eal: fix memory initialization deadlock Artemy Kovalyov
@ 2023-08-30 19:13 ` Dmitry Kozlyuk
2023-09-04 8:24 ` [PATCH v2] " Artemy Kovalyov
` (2 subsequent siblings)
3 siblings, 0 replies; 12+ messages in thread
From: Dmitry Kozlyuk @ 2023-08-30 19:13 UTC (permalink / raw)
To: Artemy Kovalyov
Cc: dev, Thomas Monjalon, Ophir Munk, stable, Anatoly Burakov,
Stephen Hemminger, Morten Brørup
2023-08-30 13:33 (UTC+0300), Artemy Kovalyov:
> With this change, the RW-lock no longer supports recursion: a single
> thread must not take a read lock it already holds. The problem arises
> during initialization: the rte_eal_init() function acquires
> memory_hotplug_lock, and later the call sequence rte_eal_memory_init()
> -> eal_memalloc_init() -> rte_memseg_list_walk() acquires it again
> without releasing it. This risks a deadlock whenever a concurrent write
> lock is requested on the same memory_hotplug_lock. Fix this by
> replacing rte_memseg_list_walk() with
> rte_memseg_list_walk_thread_unsafe().
There is another call to rte_memseg_list_walk() during initialization,
from eal_dynmem_hugepage_init(); please address it too.
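That is, the following nesting ends up taking the lock a second time:

    rte_eal_memory_init()
        rte_eal_hugepage_init()
            eal_dynmem_hugepage_init()
                rte_memseg_list_walk()  /* takes memory_hotplug_lock */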
* [PATCH v2] eal: fix memory initialization deadlock
2023-08-30 10:33 [PATCH] eal: fix memory initialization deadlock Artemy Kovalyov
2023-08-30 19:13 ` Dmitry Kozlyuk
@ 2023-09-04 8:24 ` Artemy Kovalyov
2023-09-05 7:05 ` Dmitry Kozlyuk
2023-09-06 9:52 ` [PATCH v3] " Artemy Kovalyov
2023-09-08 13:17 ` [PATCH v4 0/2] " Artemy Kovalyov
3 siblings, 1 reply; 12+ messages in thread
From: Artemy Kovalyov @ 2023-09-04 8:24 UTC (permalink / raw)
To: dev
Cc: Thomas Monjalon, Ophir Munk, stable, Anatoly Burakov,
Morten Brørup, Stephen Hemminger
The issue arose from a change in the DPDK read-write lock
implementation. That change added a new flag, RTE_RWLOCK_WAIT, designed
to prevent new read locks from being taken while a write lock is
queued. As a side effect, a recursive read lock (one acquired twice by
the same execution thread) can now set off a sequence of events
resulting in a deadlock:
Process 1 takes the first read lock.
Process 2 attempts to take a write lock and, because a read lock is
held, sets RTE_RWLOCK_WAIT. Process 2 then enters a wait loop until the
read lock is released.
Process 1 tries to take a second read lock. Because a write lock is
waiting (RTE_RWLOCK_WAIT is set), it also enters a wait loop until the
write lock is acquired and then released.
Both processes end up blocked, unable to proceed: a deadlock.
With this change, the RW-lock no longer supports recursion: a single
thread must not take a read lock it already holds. The problem arises
during initialization: the rte_eal_init() function acquires
memory_hotplug_lock, and later the call sequence rte_eal_memory_init()
-> eal_memalloc_init() -> rte_memseg_list_walk() acquires it again
without releasing it. This risks a deadlock whenever a concurrent write
lock is requested on the same memory_hotplug_lock. Fix this by
replacing rte_memseg_list_walk() with
rte_memseg_list_walk_thread_unsafe().
Bugzilla ID: 1277
Fixes: 832cecc03d77 ("rwlock: prevent readers from starving writers")
Cc: stable@dpdk.org
Signed-off-by: Artemy Kovalyov <artemyko@nvidia.com>
---
v2:
changed walk to thread-unsafe version in eal_dynmem_hugepage_init() 32-bit flow
---
lib/eal/common/eal_common_dynmem.c | 5 ++++-
lib/eal/include/generic/rte_rwlock.h | 4 ++++
lib/eal/linux/eal_memalloc.c | 7 +++++--
3 files changed, 13 insertions(+), 3 deletions(-)
diff --git a/lib/eal/common/eal_common_dynmem.c b/lib/eal/common/eal_common_dynmem.c
index bdbbe233a0..0d5da40096 100644
--- a/lib/eal/common/eal_common_dynmem.c
+++ b/lib/eal/common/eal_common_dynmem.c
@@ -251,7 +251,10 @@ eal_dynmem_hugepage_init(void)
*/
memset(&dummy, 0, sizeof(dummy));
dummy.hugepage_sz = hpi->hugepage_sz;
- if (rte_memseg_list_walk(hugepage_count_walk, &dummy) < 0)
+ /* memory_hotplug_lock is taken in rte_eal_init(), so it's
+ * safe to call thread-unsafe version.
+ */
+ if (rte_memseg_list_walk_thread_unsafe(hugepage_count_walk, &dummy) < 0)
return -1;
for (i = 0; i < RTE_DIM(dummy.num_pages); i++) {
diff --git a/lib/eal/include/generic/rte_rwlock.h b/lib/eal/include/generic/rte_rwlock.h
index 9e083bbc61..c98fc7d083 100644
--- a/lib/eal/include/generic/rte_rwlock.h
+++ b/lib/eal/include/generic/rte_rwlock.h
@@ -80,6 +80,10 @@ rte_rwlock_init(rte_rwlock_t *rwl)
/**
* Take a read lock. Loop until the lock is held.
*
+ * @note The RW lock isn't recursive, so calling this function on the same
+ * lock twice without releasing it could potentially result in a deadlock
+ * scenario when a write lock is involved.
+ *
* @param rwl
* A pointer to a rwlock structure.
*/
diff --git a/lib/eal/linux/eal_memalloc.c b/lib/eal/linux/eal_memalloc.c
index f8b1588cae..3705b41f5f 100644
--- a/lib/eal/linux/eal_memalloc.c
+++ b/lib/eal/linux/eal_memalloc.c
@@ -1740,7 +1740,10 @@ eal_memalloc_init(void)
eal_get_internal_configuration();
if (rte_eal_process_type() == RTE_PROC_SECONDARY)
- if (rte_memseg_list_walk(secondary_msl_create_walk, NULL) < 0)
+ /* memory_hotplug_lock is taken in rte_eal_init(), so it's
+ * safe to call thread-unsafe version.
+ */
+ if (rte_memseg_list_walk_thread_unsafe(secondary_msl_create_walk, NULL) < 0)
return -1;
if (rte_eal_process_type() == RTE_PROC_PRIMARY &&
internal_conf->in_memory) {
@@ -1778,7 +1781,7 @@ eal_memalloc_init(void)
}
/* initialize all of the fd lists */
- if (rte_memseg_list_walk(fd_list_create_walk, NULL))
+ if (rte_memseg_list_walk_thread_unsafe(fd_list_create_walk, NULL))
return -1;
return 0;
}
--
2.25.1
* Re: [PATCH v2] eal: fix memory initialization deadlock
2023-09-04 8:24 ` [PATCH v2] " Artemy Kovalyov
@ 2023-09-05 7:05 ` Dmitry Kozlyuk
2023-09-05 9:05 ` Artemy Kovalyov
0 siblings, 1 reply; 12+ messages in thread
From: Dmitry Kozlyuk @ 2023-09-05 7:05 UTC (permalink / raw)
To: Artemy Kovalyov
Cc: dev, Thomas Monjalon, Ophir Munk, stable, Anatoly Burakov,
Morten Brørup, Stephen Hemminger
2023-09-04 11:24 (UTC+0300), Artemy Kovalyov:
> diff --git a/lib/eal/common/eal_common_dynmem.c b/lib/eal/common/eal_common_dynmem.c
> index bdbbe233a0..0d5da40096 100644
> --- a/lib/eal/common/eal_common_dynmem.c
> +++ b/lib/eal/common/eal_common_dynmem.c
> @@ -251,7 +251,10 @@ eal_dynmem_hugepage_init(void)
> */
> memset(&dummy, 0, sizeof(dummy));
> dummy.hugepage_sz = hpi->hugepage_sz;
> - if (rte_memseg_list_walk(hugepage_count_walk, &dummy) < 0)
> + /* memory_hotplug_lock is taken in rte_eal_init(), so it's
> + * safe to call thread-unsafe version.
> + */
Nit: the lock is really taken in rte_eal_memory_init().
Probably "The lock is held during initialization, so..."
would be more robust against code changes and differences between platforms.
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
* RE: [PATCH v2] eal: fix memory initialization deadlock
2023-09-05 7:05 ` Dmitry Kozlyuk
@ 2023-09-05 9:05 ` Artemy Kovalyov
2023-09-05 10:15 ` David Marchand
0 siblings, 1 reply; 12+ messages in thread
From: Artemy Kovalyov @ 2023-09-05 9:05 UTC (permalink / raw)
To: Dmitry Kozlyuk
Cc: dev, NBU-Contact-Thomas Monjalon (EXTERNAL),
Ophir Munk, stable, Anatoly Burakov, Morten Brørup,
Stephen Hemminger
> > + /* memory_hotplug_lock is taken in rte_eal_init(), so it's
> > + * safe to call thread-unsafe version.
> > + */
>
> Nit: the lock is really taken in rte_eal_memory_init().
> Probably "The lock is held during initialization, so..."
> would be more robust against code changes and differences between platforms.
The lock acquisition was previously located elsewhere, but in the current version it has moved to rte_eal_init(). It might be worth noting this explicitly so that, if the code changes again in the future, the locking problem remains apparent. We discussed this in the bug report.
* Re: [PATCH v2] eal: fix memory initialization deadlock
2023-09-05 9:05 ` Artemy Kovalyov
@ 2023-09-05 10:15 ` David Marchand
0 siblings, 0 replies; 12+ messages in thread
From: David Marchand @ 2023-09-05 10:15 UTC (permalink / raw)
To: Artemy Kovalyov
Cc: Dmitry Kozlyuk, dev, NBU-Contact-Thomas Monjalon (EXTERNAL),
Ophir Munk, stable, Anatoly Burakov, Morten Brørup,
Stephen Hemminger
On Tue, Sep 5, 2023 at 11:05 AM Artemy Kovalyov <artemyko@nvidia.com> wrote:
>
> > > + /* memory_hotplug_lock is taken in rte_eal_init(), so it's
> > > + * safe to call thread-unsafe version.
> > > + */
> >
> > Nit: the lock is really taken in rte_eal_memory_init().
> > Probably "The lock is held during initialization, so..."
> > would be more robust against code changes and differences between platforms.
>
> The lock acquisition was previously located elsewhere, but in the current version it has moved to rte_eal_init(). It might be worth noting this explicitly so that, if the code changes again in the future, the locking problem remains apparent. We discussed this in the bug report.
One option to explore is lock annotations.
One note though: those annotations do not get inherited in called code.
So some special care is needed to maintain/annotate all code leading
to the locations where the locks do matter.
Quick example with rte_memseg_list_walk:
https://github.com/david-marchand/dpdk/commit/mem_annotations
And clang catches a deadlock:
https://github.com/david-marchand/dpdk/actions/runs/6082842080/job/16501450978#step:19:816
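For reference, a minimal sketch of the annotation approach (the names
below match what is merged later in this thread; treat it as
illustrative):

    /* The locking walk declares that it must not be entered with the
     * lock already held... */
    int
    rte_memseg_list_walk(rte_memseg_list_walk_t func, void *arg)
        __rte_locks_excluded(rte_mcfg_mem_get_lock());

    /* ...and init paths running under the lock say so, letting clang's
     * -Wthread-safety report a walk-under-lock at compile time. */
    int
    eal_memalloc_init(void)
        __rte_shared_locks_required(rte_mcfg_mem_get_lock());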
--
David Marchand
* [PATCH v3] eal: fix memory initialization deadlock
2023-08-30 10:33 [PATCH] eal: fix memory initialization deadlock Artemy Kovalyov
2023-08-30 19:13 ` Dmitry Kozlyuk
2023-09-04 8:24 ` [PATCH v2] " Artemy Kovalyov
@ 2023-09-06 9:52 ` Artemy Kovalyov
2023-09-06 12:52 ` David Marchand
2023-09-08 13:17 ` [PATCH v4 0/2] " Artemy Kovalyov
3 siblings, 1 reply; 12+ messages in thread
From: Artemy Kovalyov @ 2023-09-06 9:52 UTC (permalink / raw)
To: dev
Cc: Thomas Monjalon, Ophir Munk, stable, Anatoly Burakov,
Morten Brørup, Stephen Hemminger
The issue arose from a change in the DPDK read-write lock
implementation. That change added a new flag, RTE_RWLOCK_WAIT, designed
to prevent new read locks from being taken while a write lock is
queued. As a side effect, a recursive read lock (one acquired twice by
the same execution thread) can now set off a sequence of events
resulting in a deadlock:
Process 1 takes the first read lock.
Process 2 attempts to take a write lock and, because a read lock is
held, sets RTE_RWLOCK_WAIT. Process 2 then enters a wait loop until the
read lock is released.
Process 1 tries to take a second read lock. Because a write lock is
waiting (RTE_RWLOCK_WAIT is set), it also enters a wait loop until the
write lock is acquired and then released.
Both processes end up blocked, unable to proceed: a deadlock.
With this change, the RW-lock no longer supports recursion: a single
thread must not take a read lock it already holds. The problem arises
during initialization: the rte_eal_init() function acquires
memory_hotplug_lock, and later the call sequence rte_eal_memory_init()
-> eal_memalloc_init() -> rte_memseg_list_walk() acquires it again
without releasing it. This risks a deadlock whenever a concurrent write
lock is requested on the same memory_hotplug_lock. Fix this by
replacing rte_memseg_list_walk() with
rte_memseg_list_walk_thread_unsafe().
Also implement a lock annotation for rte_memseg_list_walk() to
proactively identify similar bugs at compile time.
Bugzilla ID: 1277
Fixes: 832cecc03d77 ("rwlock: prevent readers from starving writers")
Cc: stable@dpdk.org
Signed-off-by: Artemy Kovalyov <artemyko@nvidia.com>
---
v2:
changed walk to thread-unsafe version in eal_dynmem_hugepage_init() 32-bit flow
v3:
added lock annotation for the flow
---
lib/eal/common/eal_common_dynmem.c | 5 ++++-
lib/eal/common/eal_memalloc.h | 3 ++-
lib/eal/common/eal_private.h | 3 ++-
lib/eal/include/generic/rte_rwlock.h | 4 ++++
lib/eal/include/rte_lock_annotations.h | 5 +++++
lib/eal/include/rte_memory.h | 4 +++-
lib/eal/linux/eal_memalloc.c | 7 +++++--
7 files changed, 25 insertions(+), 6 deletions(-)
diff --git a/lib/eal/common/eal_common_dynmem.c b/lib/eal/common/eal_common_dynmem.c
index bdbbe233a0..95da55d9b0 100644
--- a/lib/eal/common/eal_common_dynmem.c
+++ b/lib/eal/common/eal_common_dynmem.c
@@ -251,7 +251,10 @@ eal_dynmem_hugepage_init(void)
*/
memset(&dummy, 0, sizeof(dummy));
dummy.hugepage_sz = hpi->hugepage_sz;
- if (rte_memseg_list_walk(hugepage_count_walk, &dummy) < 0)
+ /* memory_hotplug_lock is held during initialization, so it's
+ * safe to call thread-unsafe version.
+ */
+ if (rte_memseg_list_walk_thread_unsafe(hugepage_count_walk, &dummy) < 0)
return -1;
for (i = 0; i < RTE_DIM(dummy.num_pages); i++) {
diff --git a/lib/eal/common/eal_memalloc.h b/lib/eal/common/eal_memalloc.h
index ebc3a6f6c1..286ffb7633 100644
--- a/lib/eal/common/eal_memalloc.h
+++ b/lib/eal/common/eal_memalloc.h
@@ -91,7 +91,8 @@ int
eal_memalloc_get_seg_fd_offset(int list_idx, int seg_idx, size_t *offset);
int
-eal_memalloc_init(void);
+eal_memalloc_init(void)
+ __rte_shared_locks_required(rte_mcfg_mem_get_lock());
int
eal_memalloc_cleanup(void);
diff --git a/lib/eal/common/eal_private.h b/lib/eal/common/eal_private.h
index 5eadba4902..ebd496b537 100644
--- a/lib/eal/common/eal_private.h
+++ b/lib/eal/common/eal_private.h
@@ -115,7 +115,8 @@ int rte_eal_memseg_init(void);
* @return
* 0 on success, negative on error
*/
-int rte_eal_memory_init(void);
+int rte_eal_memory_init(void)
+ __rte_shared_locks_required(rte_mcfg_mem_get_lock());
/**
* Configure timers
diff --git a/lib/eal/include/generic/rte_rwlock.h b/lib/eal/include/generic/rte_rwlock.h
index 9e083bbc61..c98fc7d083 100644
--- a/lib/eal/include/generic/rte_rwlock.h
+++ b/lib/eal/include/generic/rte_rwlock.h
@@ -80,6 +80,10 @@ rte_rwlock_init(rte_rwlock_t *rwl)
/**
* Take a read lock. Loop until the lock is held.
*
+ * @note The RW lock isn't recursive, so calling this function on the same
+ * lock twice without releasing it could potentially result in a deadlock
+ * scenario when a write lock is involved.
+ *
* @param rwl
* A pointer to a rwlock structure.
*/
diff --git a/lib/eal/include/rte_lock_annotations.h b/lib/eal/include/rte_lock_annotations.h
index 9fc50082d6..2456a69352 100644
--- a/lib/eal/include/rte_lock_annotations.h
+++ b/lib/eal/include/rte_lock_annotations.h
@@ -40,6 +40,9 @@ extern "C" {
#define __rte_unlock_function(...) \
__attribute__((unlock_function(__VA_ARGS__)))
+#define __rte_locks_excluded(...) \
+ __attribute__((locks_excluded(__VA_ARGS__)))
+
#define __rte_no_thread_safety_analysis \
__attribute__((no_thread_safety_analysis))
@@ -62,6 +65,8 @@ extern "C" {
#define __rte_unlock_function(...)
+#define __rte_locks_excluded(...)
+
#define __rte_no_thread_safety_analysis
#endif /* RTE_ANNOTATE_LOCKS */
diff --git a/lib/eal/include/rte_memory.h b/lib/eal/include/rte_memory.h
index 3a1c607228..842362d527 100644
--- a/lib/eal/include/rte_memory.h
+++ b/lib/eal/include/rte_memory.h
@@ -22,6 +22,7 @@ extern "C" {
#include <rte_bitops.h>
#include <rte_common.h>
#include <rte_config.h>
+#include <rte_eal_memconfig.h>
#include <rte_fbarray.h>
#define RTE_PGSIZE_4K (1ULL << 12)
@@ -250,7 +251,8 @@ rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg);
* -1 if user function reported error
*/
int
-rte_memseg_list_walk(rte_memseg_list_walk_t func, void *arg);
+rte_memseg_list_walk(rte_memseg_list_walk_t func, void *arg)
+ __rte_locks_excluded(rte_mcfg_mem_get_lock());
/**
* Walk list of all memsegs without performing any locking.
diff --git a/lib/eal/linux/eal_memalloc.c b/lib/eal/linux/eal_memalloc.c
index f8b1588cae..9853ec78a2 100644
--- a/lib/eal/linux/eal_memalloc.c
+++ b/lib/eal/linux/eal_memalloc.c
@@ -1740,7 +1740,10 @@ eal_memalloc_init(void)
eal_get_internal_configuration();
if (rte_eal_process_type() == RTE_PROC_SECONDARY)
- if (rte_memseg_list_walk(secondary_msl_create_walk, NULL) < 0)
+ /* memory_hotplug_lock is held during initialization, so it's
+ * safe to call thread-unsafe version.
+ */
+ if (rte_memseg_list_walk_thread_unsafe(secondary_msl_create_walk, NULL) < 0)
return -1;
if (rte_eal_process_type() == RTE_PROC_PRIMARY &&
internal_conf->in_memory) {
@@ -1778,7 +1781,7 @@ eal_memalloc_init(void)
}
/* initialize all of the fd lists */
- if (rte_memseg_list_walk(fd_list_create_walk, NULL))
+ if (rte_memseg_list_walk_thread_unsafe(fd_list_create_walk, NULL))
return -1;
return 0;
}
--
2.25.1
* Re: [PATCH v3] eal: fix memory initialization deadlock
2023-09-06 9:52 ` [PATCH v3] " Artemy Kovalyov
@ 2023-09-06 12:52 ` David Marchand
0 siblings, 0 replies; 12+ messages in thread
From: David Marchand @ 2023-09-06 12:52 UTC (permalink / raw)
To: Artemy Kovalyov
Cc: dev, Thomas Monjalon, Ophir Munk, stable, Anatoly Burakov,
Morten Brørup, Stephen Hemminger
On Wed, Sep 6, 2023 at 11:53 AM Artemy Kovalyov <artemyko@nvidia.com> wrote:
>
> The issue arose from a change in the DPDK read-write lock
> implementation. That change added a new flag, RTE_RWLOCK_WAIT, designed
> to prevent new read locks from being taken while a write lock is
> queued. As a side effect, a recursive read lock (one acquired twice by
> the same execution thread) can now set off a sequence of events
> resulting in a deadlock:
>
> Process 1 takes the first read lock.
> Process 2 attempts to take a write lock and, because a read lock is
> held, sets RTE_RWLOCK_WAIT. Process 2 then enters a wait loop until the
> read lock is released.
> Process 1 tries to take a second read lock. Because a write lock is
> waiting (RTE_RWLOCK_WAIT is set), it also enters a wait loop until the
> write lock is acquired and then released.
>
> Both processes end up blocked, unable to proceed: a deadlock.
>
> With this change, the RW-lock no longer supports recursion: a single
> thread must not take a read lock it already holds. The problem arises
> during initialization: the rte_eal_init() function acquires
> memory_hotplug_lock, and later the call sequence rte_eal_memory_init()
> -> eal_memalloc_init() -> rte_memseg_list_walk() acquires it again
> without releasing it. This risks a deadlock whenever a concurrent write
> lock is requested on the same memory_hotplug_lock. Fix this by
> replacing rte_memseg_list_walk() with
> rte_memseg_list_walk_thread_unsafe().
>
> Also implement a lock annotation for rte_memseg_list_walk() to
> proactively identify similar bugs at compile time.
The annotations are not necessary to the fix (which we will likely
backport to LTS versions).
Please split this change into two patches, to separate the annotations
from the fix.
>
> Bugzilla ID: 1277
> Fixes: 832cecc03d77 ("rwlock: prevent readers from starving writers")
> Cc: stable@dpdk.org
>
> Signed-off-by: Artemy Kovalyov <artemyko@nvidia.com>
--
David Marchand
* [PATCH v4 0/2] fix memory initialization deadlock
2023-08-30 10:33 [PATCH] eal: fix memory initialization deadlock Artemy Kovalyov
` (2 preceding siblings ...)
2023-09-06 9:52 ` [PATCH v3] " Artemy Kovalyov
@ 2023-09-08 13:17 ` Artemy Kovalyov
2023-09-08 13:17 ` [PATCH v4 1/2] eal: " Artemy Kovalyov
2023-09-08 13:17 ` [PATCH v4 2/2] eal: annotate rte_memseg_list_walk() Artemy Kovalyov
3 siblings, 2 replies; 12+ messages in thread
From: Artemy Kovalyov @ 2023-09-08 13:17 UTC (permalink / raw)
To: dev; +Cc: Thomas Monjalon, Ophir Munk
The issue arose from a change in the DPDK read-write lock
implementation. That change added a new flag, RTE_RWLOCK_WAIT, designed
to prevent new read locks from being taken while a write lock is
queued. As a side effect, a recursive read lock (one acquired twice by
the same execution thread) can now set off a sequence of events
resulting in a deadlock:
Process 1 takes the first read lock.
Process 2 attempts to take a write lock and, because a read lock is
held, sets RTE_RWLOCK_WAIT. Process 2 then enters a wait loop until the
read lock is released.
Process 1 tries to take a second read lock. Because a write lock is
waiting (RTE_RWLOCK_WAIT is set), it also enters a wait loop until the
write lock is acquired and then released.
Both processes end up blocked, unable to proceed: a deadlock.
With this change, the RW-lock no longer supports recursion: a single
thread must not take a read lock it already holds. The problem arises
during initialization: the rte_eal_init() function acquires
memory_hotplug_lock, and later several call sequences lead to
rte_memseg_list_walk(), which acquires it again without releasing it.
This risks a deadlock whenever a concurrent write lock is requested on
the same memory_hotplug_lock. Fix this by replacing
rte_memseg_list_walk() with rte_memseg_list_walk_thread_unsafe().
Also implement a lock annotation for rte_memseg_list_walk() to
proactively identify similar bugs at compile time.
Artemy Kovalyov (2):
eal: fix memory initialization deadlock
eal: annotate rte_memseg_list_walk()
lib/eal/common/eal_common_dynmem.c | 5 ++++-
lib/eal/common/eal_memalloc.h | 3 ++-
lib/eal/common/eal_private.h | 3 ++-
lib/eal/include/generic/rte_rwlock.h | 4 ++++
lib/eal/include/rte_lock_annotations.h | 5 +++++
lib/eal/include/rte_memory.h | 4 +++-
lib/eal/linux/eal_memalloc.c | 7 +++++--
7 files changed, 25 insertions(+), 6 deletions(-)
--
1.8.3.1
* [PATCH v4 1/2] eal: fix memory initialization deadlock
2023-09-08 13:17 ` [PATCH v4 0/2] " Artemy Kovalyov
@ 2023-09-08 13:17 ` Artemy Kovalyov
2023-09-08 13:17 ` [PATCH v4 2/2] eal: annotate rte_memseg_list_walk() Artemy Kovalyov
1 sibling, 0 replies; 12+ messages in thread
From: Artemy Kovalyov @ 2023-09-08 13:17 UTC (permalink / raw)
To: dev
Cc: Thomas Monjalon, Ophir Munk, stable, Anatoly Burakov,
Morten Brørup, Stephen Hemminger
The issue arose from changes in the DPDK read-write lock
implementation. The RW-lock no longer supports recursion: a single
thread must not take a read lock it already holds. The problem arises
during initialization: the rte_eal_init() function acquires
memory_hotplug_lock, and later the following call sequences acquire it
again without releasing it:
* rte_eal_memory_init() -> eal_memalloc_init() -> rte_memseg_list_walk()
* rte_eal_memory_init() -> rte_eal_hugepage_init() ->
eal_dynmem_hugepage_init() -> rte_memseg_list_walk()
This risks a deadlock whenever a concurrent write lock is requested on
the same memory_hotplug_lock. Fix this locally by replacing
rte_memseg_list_walk() with rte_memseg_list_walk_thread_unsafe().
Bugzilla ID: 1277
Fixes: 832cecc03d77 ("rwlock: prevent readers from starving writers")
Cc: stable@dpdk.org
Signed-off-by: Artemy Kovalyov <artemyko@nvidia.com>
---
lib/eal/common/eal_common_dynmem.c | 5 ++++-
lib/eal/include/generic/rte_rwlock.h | 4 ++++
lib/eal/linux/eal_memalloc.c | 7 +++++--
3 files changed, 13 insertions(+), 3 deletions(-)
diff --git a/lib/eal/common/eal_common_dynmem.c b/lib/eal/common/eal_common_dynmem.c
index bdbbe23..95da55d 100644
--- a/lib/eal/common/eal_common_dynmem.c
+++ b/lib/eal/common/eal_common_dynmem.c
@@ -251,7 +251,10 @@
*/
memset(&dummy, 0, sizeof(dummy));
dummy.hugepage_sz = hpi->hugepage_sz;
- if (rte_memseg_list_walk(hugepage_count_walk, &dummy) < 0)
+ /* memory_hotplug_lock is held during initialization, so it's
+ * safe to call thread-unsafe version.
+ */
+ if (rte_memseg_list_walk_thread_unsafe(hugepage_count_walk, &dummy) < 0)
return -1;
for (i = 0; i < RTE_DIM(dummy.num_pages); i++) {
diff --git a/lib/eal/include/generic/rte_rwlock.h b/lib/eal/include/generic/rte_rwlock.h
index 9e083bb..c98fc7d 100644
--- a/lib/eal/include/generic/rte_rwlock.h
+++ b/lib/eal/include/generic/rte_rwlock.h
@@ -80,6 +80,10 @@
/**
* Take a read lock. Loop until the lock is held.
*
+ * @note The RW lock isn't recursive, so calling this function on the same
+ * lock twice without releasing it could potentially result in a deadlock
+ * scenario when a write lock is involved.
+ *
* @param rwl
* A pointer to a rwlock structure.
*/
diff --git a/lib/eal/linux/eal_memalloc.c b/lib/eal/linux/eal_memalloc.c
index f8b1588..9853ec7 100644
--- a/lib/eal/linux/eal_memalloc.c
+++ b/lib/eal/linux/eal_memalloc.c
@@ -1740,7 +1740,10 @@ struct rte_memseg *
eal_get_internal_configuration();
if (rte_eal_process_type() == RTE_PROC_SECONDARY)
- if (rte_memseg_list_walk(secondary_msl_create_walk, NULL) < 0)
+ /* memory_hotplug_lock is held during initialization, so it's
+ * safe to call thread-unsafe version.
+ */
+ if (rte_memseg_list_walk_thread_unsafe(secondary_msl_create_walk, NULL) < 0)
return -1;
if (rte_eal_process_type() == RTE_PROC_PRIMARY &&
internal_conf->in_memory) {
@@ -1778,7 +1781,7 @@ struct rte_memseg *
}
/* initialize all of the fd lists */
- if (rte_memseg_list_walk(fd_list_create_walk, NULL))
+ if (rte_memseg_list_walk_thread_unsafe(fd_list_create_walk, NULL))
return -1;
return 0;
}
--
1.8.3.1
* [PATCH v4 2/2] eal: annotate rte_memseg_list_walk()
2023-09-08 13:17 ` [PATCH v4 0/2] " Artemy Kovalyov
2023-09-08 13:17 ` [PATCH v4 1/2] eal: " Artemy Kovalyov
@ 2023-09-08 13:17 ` Artemy Kovalyov
2023-10-06 10:12 ` David Marchand
1 sibling, 1 reply; 12+ messages in thread
From: Artemy Kovalyov @ 2023-09-08 13:17 UTC (permalink / raw)
To: dev; +Cc: Thomas Monjalon, Ophir Munk, Anatoly Burakov
Implement a lock annotation for rte_memseg_list_walk() to catch, at
compile time, bugs similar to the memory_hotplug_lock deadlock hit
during initialization.
Bugzilla ID: 1277
Signed-off-by: Artemy Kovalyov <artemyko@nvidia.com>
---
lib/eal/common/eal_memalloc.h | 3 ++-
lib/eal/common/eal_private.h | 3 ++-
lib/eal/include/rte_lock_annotations.h | 5 +++++
lib/eal/include/rte_memory.h | 4 +++-
4 files changed, 12 insertions(+), 3 deletions(-)
diff --git a/lib/eal/common/eal_memalloc.h b/lib/eal/common/eal_memalloc.h
index ebc3a6f..286ffb7 100644
--- a/lib/eal/common/eal_memalloc.h
+++ b/lib/eal/common/eal_memalloc.h
@@ -91,7 +91,8 @@ struct rte_memseg *
eal_memalloc_get_seg_fd_offset(int list_idx, int seg_idx, size_t *offset);
int
-eal_memalloc_init(void);
+eal_memalloc_init(void)
+ __rte_shared_locks_required(rte_mcfg_mem_get_lock());
int
eal_memalloc_cleanup(void);
diff --git a/lib/eal/common/eal_private.h b/lib/eal/common/eal_private.h
index 5eadba4..ebd496b 100644
--- a/lib/eal/common/eal_private.h
+++ b/lib/eal/common/eal_private.h
@@ -115,7 +115,8 @@ struct rte_config {
* @return
* 0 on success, negative on error
*/
-int rte_eal_memory_init(void);
+int rte_eal_memory_init(void)
+ __rte_shared_locks_required(rte_mcfg_mem_get_lock());
/**
* Configure timers
diff --git a/lib/eal/include/rte_lock_annotations.h b/lib/eal/include/rte_lock_annotations.h
index 9fc5008..2456a69 100644
--- a/lib/eal/include/rte_lock_annotations.h
+++ b/lib/eal/include/rte_lock_annotations.h
@@ -40,6 +40,9 @@
#define __rte_unlock_function(...) \
__attribute__((unlock_function(__VA_ARGS__)))
+#define __rte_locks_excluded(...) \
+ __attribute__((locks_excluded(__VA_ARGS__)))
+
#define __rte_no_thread_safety_analysis \
__attribute__((no_thread_safety_analysis))
@@ -62,6 +65,8 @@
#define __rte_unlock_function(...)
+#define __rte_locks_excluded(...)
+
#define __rte_no_thread_safety_analysis
#endif /* RTE_ANNOTATE_LOCKS */
diff --git a/lib/eal/include/rte_memory.h b/lib/eal/include/rte_memory.h
index 3a1c607..842362d 100644
--- a/lib/eal/include/rte_memory.h
+++ b/lib/eal/include/rte_memory.h
@@ -22,6 +22,7 @@
#include <rte_bitops.h>
#include <rte_common.h>
#include <rte_config.h>
+#include <rte_eal_memconfig.h>
#include <rte_fbarray.h>
#define RTE_PGSIZE_4K (1ULL << 12)
@@ -250,7 +251,8 @@ typedef int (*rte_memseg_list_walk_t)(const struct rte_memseg_list *msl,
* -1 if user function reported error
*/
int
-rte_memseg_list_walk(rte_memseg_list_walk_t func, void *arg);
+rte_memseg_list_walk(rte_memseg_list_walk_t func, void *arg)
+ __rte_locks_excluded(rte_mcfg_mem_get_lock());
/**
* Walk list of all memsegs without performing any locking.
--
1.8.3.1
* Re: [PATCH v4 2/2] eal: annotate rte_memseg_list_walk()
2023-09-08 13:17 ` [PATCH v4 2/2] eal: annotate rte_memseg_list_walk() Artemy Kovalyov
@ 2023-10-06 10:12 ` David Marchand
0 siblings, 0 replies; 12+ messages in thread
From: David Marchand @ 2023-10-06 10:12 UTC (permalink / raw)
To: Artemy Kovalyov
Cc: dev, Thomas Monjalon, Ophir Munk, Anatoly Burakov, Jonathan Erb,
Dmitry Kozlyuk
On Fri, Sep 8, 2023 at 3:18 PM Artemy Kovalyov <artemyko@nvidia.com> wrote:
>
> Implementing a lock annotation for rte_memseg_list_walk() to
> proactively identify bugs similar to memory_hotplug_lock deadlock during
> initialization during compile time.
>
> Signed-off-by: Artemy Kovalyov <artemyko@nvidia.com>
Series applied, thanks Artemy.
--
David Marchand
Thread overview: 12+ messages
2023-08-30 10:33 [PATCH] eal: fix memory initialization deadlock Artemy Kovalyov
2023-08-30 19:13 ` Dmitry Kozlyuk
2023-09-04 8:24 ` [PATCH v2] " Artemy Kovalyov
2023-09-05 7:05 ` Dmitry Kozlyuk
2023-09-05 9:05 ` Artemy Kovalyov
2023-09-05 10:15 ` David Marchand
2023-09-06 9:52 ` [PATCH v3] " Artemy Kovalyov
2023-09-06 12:52 ` David Marchand
2023-09-08 13:17 ` [PATCH v4 0/2] " Artemy Kovalyov
2023-09-08 13:17 ` [PATCH v4 1/2] eal: " Artemy Kovalyov
2023-09-08 13:17 ` [PATCH v4 2/2] eal: annotate rte_memseg_list_walk() Artemy Kovalyov
2023-10-06 10:12 ` David Marchand