* [PATCH] Revert "eal/unix: fix thread creation"
@ 2024-10-30 19:08 luca.boccassi
2024-10-30 19:52 ` Stephen Hemminger
2024-10-30 20:30 ` [PATCH v2] " luca.boccassi
0 siblings, 2 replies; 12+ messages in thread
From: luca.boccassi @ 2024-10-30 19:08 UTC
To: dev; +Cc: david.marchand, roretzla
From: Luca Boccassi <luca.boccassi@gmail.com>
This commit introduced a regression on arm64, causing a deadlock.
lcores_autotest gets stuck and never terminates:
[ 1077s] EAL: Detected CPU lcores: 4
[ 1077s] EAL: Detected NUMA nodes: 1
[ 1077s] EAL: Detected shared linkage of DPDK
[ 1077s] EAL: Multi-process socket /tmp/dpdk/rte/mp_socket
[ 1077s] EAL: Selected IOVA mode 'VA'
[ 1077s] APP: HPET is not enabled, using TSC as default timer
[ 1077s] RTE>>lcores_autotest
[ 1127s] DPDK:fast-tests / lcores_autotest time out (After 50.0 seconds)
This is 100% reproducible when running the fast tests suite
after a package build on OBS. Reverting it reliably fixes the
issue.
This reverts commit b28c6196b132d1f25cb8c1bf781520fc41556b3a.
---
I have bisected this long standing issue and identified the commit
that introduced it. If anybody can provide a different fix that would
be better, but if it's not possible to find another solution, it would
be good to revert it until it can be found, to resolve the regression.
lib/eal/unix/rte_thread.c | 73 +++++++++++++++------------------------
1 file changed, 28 insertions(+), 45 deletions(-)
diff --git a/lib/eal/unix/rte_thread.c b/lib/eal/unix/rte_thread.c
index 1b4c73f58e..9a39ba3bb3 100644
--- a/lib/eal/unix/rte_thread.c
+++ b/lib/eal/unix/rte_thread.c
@@ -5,7 +5,6 @@
#include <errno.h>
#include <pthread.h>
-#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
@@ -19,14 +18,9 @@ struct eal_tls_key {
pthread_key_t thread_index;
};
-struct thread_start_context {
+struct thread_routine_ctx {
rte_thread_func thread_func;
- void *thread_args;
- const rte_thread_attr_t *thread_attr;
- pthread_mutex_t wrapper_mutex;
- pthread_cond_t wrapper_cond;
- int wrapper_ret;
- bool wrapper_done;
+ void *routine_args;
};
static int
@@ -89,29 +83,13 @@ thread_map_os_priority_to_eal_priority(int policy, int os_pri,
}
static void *
-thread_start_wrapper(void *arg)
+thread_func_wrapper(void *arg)
{
- struct thread_start_context *ctx = (struct thread_start_context *)arg;
- rte_thread_func thread_func = ctx->thread_func;
- void *thread_args = ctx->thread_args;
- int ret = 0;
+ struct thread_routine_ctx ctx = *(struct thread_routine_ctx *)arg;
- if (ctx->thread_attr != NULL && CPU_COUNT(&ctx->thread_attr->cpuset) > 0) {
- ret = rte_thread_set_affinity_by_id(rte_thread_self(), &ctx->thread_attr->cpuset);
- if (ret != 0)
- EAL_LOG(DEBUG, "rte_thread_set_affinity_by_id failed");
- }
+ free(arg);
- pthread_mutex_lock(&ctx->wrapper_mutex);
- ctx->wrapper_ret = ret;
- ctx->wrapper_done = true;
- pthread_cond_signal(&ctx->wrapper_cond);
- pthread_mutex_unlock(&ctx->wrapper_mutex);
-
- if (ret != 0)
- return NULL;
-
- return (void *)(uintptr_t)thread_func(thread_args);
+ return (void *)(uintptr_t)ctx.thread_func(ctx.routine_args);
}
int
@@ -122,18 +100,20 @@ rte_thread_create(rte_thread_t *thread_id,
int ret = 0;
pthread_attr_t attr;
pthread_attr_t *attrp = NULL;
+ struct thread_routine_ctx *ctx;
struct sched_param param = {
.sched_priority = 0,
};
int policy = SCHED_OTHER;
- struct thread_start_context ctx = {
- .thread_func = thread_func,
- .thread_args = args,
- .thread_attr = thread_attr,
- .wrapper_done = false,
- .wrapper_mutex = PTHREAD_MUTEX_INITIALIZER,
- .wrapper_cond = PTHREAD_COND_INITIALIZER,
- };
+
+ ctx = calloc(1, sizeof(*ctx));
+ if (ctx == NULL) {
+ RTE_LOG(DEBUG, EAL, "Insufficient memory for thread context allocations\n");
+ ret = ENOMEM;
+ goto cleanup;
+ }
+ ctx->routine_args = args;
+ ctx->thread_func = thread_func;
if (thread_attr != NULL) {
ret = pthread_attr_init(&attr);
@@ -155,6 +135,7 @@ rte_thread_create(rte_thread_t *thread_id,
goto cleanup;
}
+
if (thread_attr->priority ==
RTE_THREAD_PRIORITY_REALTIME_CRITICAL) {
ret = ENOTSUP;
@@ -179,22 +160,24 @@ rte_thread_create(rte_thread_t *thread_id,
}
ret = pthread_create((pthread_t *)&thread_id->opaque_id, attrp,
- thread_start_wrapper, &ctx);
+ thread_func_wrapper, ctx);
if (ret != 0) {
EAL_LOG(DEBUG, "pthread_create failed");
goto cleanup;
}
- pthread_mutex_lock(&ctx.wrapper_mutex);
- while (!ctx.wrapper_done)
- pthread_cond_wait(&ctx.wrapper_cond, &ctx.wrapper_mutex);
- ret = ctx.wrapper_ret;
- pthread_mutex_unlock(&ctx.wrapper_mutex);
-
- if (ret != 0)
- rte_thread_join(*thread_id, NULL);
+ if (thread_attr != NULL && CPU_COUNT(&thread_attr->cpuset) > 0) {
+ ret = rte_thread_set_affinity_by_id(*thread_id,
+ &thread_attr->cpuset);
+ if (ret != 0) {
+ EAL_LOG(DEBUG, "rte_thread_set_affinity_by_id failed");
+ goto cleanup;
+ }
+ }
+ ctx = NULL;
cleanup:
+ free(ctx);
if (attrp != NULL)
pthread_attr_destroy(&attr);
--
2.45.2
* Re: [PATCH] Revert "eal/unix: fix thread creation"
2024-10-30 19:08 [PATCH] Revert "eal/unix: fix thread creation" luca.boccassi
@ 2024-10-30 19:52 ` Stephen Hemminger
2024-10-30 20:31 ` Luca Boccassi
2024-10-30 20:30 ` [PATCH v2] " luca.boccassi
1 sibling, 1 reply; 12+ messages in thread
From: Stephen Hemminger @ 2024-10-30 19:52 UTC
To: luca.boccassi; +Cc: dev, david.marchand, roretzla
On Wed, 30 Oct 2024 19:08:41 +0000
luca.boccassi@gmail.com wrote:
> From: Luca Boccassi <luca.boccassi@gmail.com>
>
> This commit introduced a regression on arm64, causing a deadlock.
> lcores_autotest gets stuck and never terminates:
>
> [ 1077s] EAL: Detected CPU lcores: 4
> [ 1077s] EAL: Detected NUMA nodes: 1
> [ 1077s] EAL: Detected shared linkage of DPDK
> [ 1077s] EAL: Multi-process socket /tmp/dpdk/rte/mp_socket
> [ 1077s] EAL: Selected IOVA mode 'VA'
> [ 1077s] APP: HPET is not enabled, using TSC as default timer
> [ 1077s] RTE>>lcores_autotest
> [ 1127s] DPDK:fast-tests / lcores_autotest time out (After 50.0 seconds)
>
> This is 100% reproducible when running the fast tests suite
> after a package build on OBS. Reverting it reliably fixes the
> issue.
>
> This reverts commit b28c6196b132d1f25cb8c1bf781520fc41556b3a.
> ---
> I have bisected this long standing issue and identified the commit
> that introduced it. If anybody can provide a different fix that would
> be better, but if it's not possible to find another solution, it would
> be good to revert it until it can be found, to resolve the regression.
>
> lib/eal/unix/rte_thread.c | 73 +++++++++++++++------------------------
> 1 file changed, 28 insertions(+), 45 deletions(-)
Missing DCO (no Signed-off-by) which is required even for a Revert.
Also, Luca usually uses either a Debian or a Microsoft email address;
a gmail address is unusual and is not in mailmap.
* [PATCH v2] Revert "eal/unix: fix thread creation"
2024-10-30 19:08 [PATCH] Revert "eal/unix: fix thread creation" luca.boccassi
2024-10-30 19:52 ` Stephen Hemminger
@ 2024-10-30 20:30 ` luca.boccassi
2024-10-31 12:47 ` David Marchand
1 sibling, 1 reply; 12+ messages in thread
From: luca.boccassi @ 2024-10-30 20:30 UTC
To: dev; +Cc: david.marchand, roretzla
From: Luca Boccassi <luca.boccassi@gmail.com>
This commit introduced a regression on arm64, causing a deadlock.
lcores_autotest gets stuck and never terminates:
[ 1077s] EAL: Detected CPU lcores: 4
[ 1077s] EAL: Detected NUMA nodes: 1
[ 1077s] EAL: Detected shared linkage of DPDK
[ 1077s] EAL: Multi-process socket /tmp/dpdk/rte/mp_socket
[ 1077s] EAL: Selected IOVA mode 'VA'
[ 1077s] APP: HPET is not enabled, using TSC as default timer
[ 1077s] RTE>>lcores_autotest
[ 1127s] DPDK:fast-tests / lcores_autotest time out (After 50.0 seconds)
This is 100% reproducible when running the fast tests suite
after a package build on OBS. Reverting it reliably fixes the
issue.
This reverts commit b28c6196b132d1f25cb8c1bf781520fc41556b3a.
Signed-off-by: Luca Boccassi <luca.boccassi@gmail.com>
---
v2: add forgotten signed-off-by
I have bisected this long standing issue and identified the commit
that introduced it. If anybody can provide a different fix that would
be better, but if it's not possible to find another solution, it would
be good to revert it until it can be found, to resolve the regression.
lib/eal/unix/rte_thread.c | 73 +++++++++++++++------------------------
1 file changed, 28 insertions(+), 45 deletions(-)
diff --git a/lib/eal/unix/rte_thread.c b/lib/eal/unix/rte_thread.c
index 1b4c73f58e..9a39ba3bb3 100644
--- a/lib/eal/unix/rte_thread.c
+++ b/lib/eal/unix/rte_thread.c
@@ -5,7 +5,6 @@
#include <errno.h>
#include <pthread.h>
-#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
@@ -19,14 +18,9 @@ struct eal_tls_key {
pthread_key_t thread_index;
};
-struct thread_start_context {
+struct thread_routine_ctx {
rte_thread_func thread_func;
- void *thread_args;
- const rte_thread_attr_t *thread_attr;
- pthread_mutex_t wrapper_mutex;
- pthread_cond_t wrapper_cond;
- int wrapper_ret;
- bool wrapper_done;
+ void *routine_args;
};
static int
@@ -89,29 +83,13 @@ thread_map_os_priority_to_eal_priority(int policy, int os_pri,
}
static void *
-thread_start_wrapper(void *arg)
+thread_func_wrapper(void *arg)
{
- struct thread_start_context *ctx = (struct thread_start_context *)arg;
- rte_thread_func thread_func = ctx->thread_func;
- void *thread_args = ctx->thread_args;
- int ret = 0;
+ struct thread_routine_ctx ctx = *(struct thread_routine_ctx *)arg;
- if (ctx->thread_attr != NULL && CPU_COUNT(&ctx->thread_attr->cpuset) > 0) {
- ret = rte_thread_set_affinity_by_id(rte_thread_self(), &ctx->thread_attr->cpuset);
- if (ret != 0)
- EAL_LOG(DEBUG, "rte_thread_set_affinity_by_id failed");
- }
+ free(arg);
- pthread_mutex_lock(&ctx->wrapper_mutex);
- ctx->wrapper_ret = ret;
- ctx->wrapper_done = true;
- pthread_cond_signal(&ctx->wrapper_cond);
- pthread_mutex_unlock(&ctx->wrapper_mutex);
-
- if (ret != 0)
- return NULL;
-
- return (void *)(uintptr_t)thread_func(thread_args);
+ return (void *)(uintptr_t)ctx.thread_func(ctx.routine_args);
}
int
@@ -122,18 +100,20 @@ rte_thread_create(rte_thread_t *thread_id,
int ret = 0;
pthread_attr_t attr;
pthread_attr_t *attrp = NULL;
+ struct thread_routine_ctx *ctx;
struct sched_param param = {
.sched_priority = 0,
};
int policy = SCHED_OTHER;
- struct thread_start_context ctx = {
- .thread_func = thread_func,
- .thread_args = args,
- .thread_attr = thread_attr,
- .wrapper_done = false,
- .wrapper_mutex = PTHREAD_MUTEX_INITIALIZER,
- .wrapper_cond = PTHREAD_COND_INITIALIZER,
- };
+
+ ctx = calloc(1, sizeof(*ctx));
+ if (ctx == NULL) {
+ RTE_LOG(DEBUG, EAL, "Insufficient memory for thread context allocations\n");
+ ret = ENOMEM;
+ goto cleanup;
+ }
+ ctx->routine_args = args;
+ ctx->thread_func = thread_func;
if (thread_attr != NULL) {
ret = pthread_attr_init(&attr);
@@ -155,6 +135,7 @@ rte_thread_create(rte_thread_t *thread_id,
goto cleanup;
}
+
if (thread_attr->priority ==
RTE_THREAD_PRIORITY_REALTIME_CRITICAL) {
ret = ENOTSUP;
@@ -179,22 +160,24 @@ rte_thread_create(rte_thread_t *thread_id,
}
ret = pthread_create((pthread_t *)&thread_id->opaque_id, attrp,
- thread_start_wrapper, &ctx);
+ thread_func_wrapper, ctx);
if (ret != 0) {
EAL_LOG(DEBUG, "pthread_create failed");
goto cleanup;
}
- pthread_mutex_lock(&ctx.wrapper_mutex);
- while (!ctx.wrapper_done)
- pthread_cond_wait(&ctx.wrapper_cond, &ctx.wrapper_mutex);
- ret = ctx.wrapper_ret;
- pthread_mutex_unlock(&ctx.wrapper_mutex);
-
- if (ret != 0)
- rte_thread_join(*thread_id, NULL);
+ if (thread_attr != NULL && CPU_COUNT(&thread_attr->cpuset) > 0) {
+ ret = rte_thread_set_affinity_by_id(*thread_id,
+ &thread_attr->cpuset);
+ if (ret != 0) {
+ EAL_LOG(DEBUG, "rte_thread_set_affinity_by_id failed");
+ goto cleanup;
+ }
+ }
+ ctx = NULL;
cleanup:
+ free(ctx);
if (attrp != NULL)
pthread_attr_destroy(&attr);
--
2.45.2
* Re: [PATCH] Revert "eal/unix: fix thread creation"
2024-10-30 19:52 ` Stephen Hemminger
@ 2024-10-30 20:31 ` Luca Boccassi
0 siblings, 0 replies; 12+ messages in thread
From: Luca Boccassi @ 2024-10-30 20:31 UTC
To: Stephen Hemminger; +Cc: dev, david.marchand, roretzla
On Wed, 30 Oct 2024 at 19:52, Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Wed, 30 Oct 2024 19:08:41 +0000
> luca.boccassi@gmail.com wrote:
>
> > From: Luca Boccassi <luca.boccassi@gmail.com>
> >
> > This commit introduced a regression on arm64, causing a deadlock.
> > lcores_autotest gets stuck and never terminates:
> >
> > [ 1077s] EAL: Detected CPU lcores: 4
> > [ 1077s] EAL: Detected NUMA nodes: 1
> > [ 1077s] EAL: Detected shared linkage of DPDK
> > [ 1077s] EAL: Multi-process socket /tmp/dpdk/rte/mp_socket
> > [ 1077s] EAL: Selected IOVA mode 'VA'
> > [ 1077s] APP: HPET is not enabled, using TSC as default timer
> > [ 1077s] RTE>>lcores_autotest
> > [ 1127s] DPDK:fast-tests / lcores_autotest time out (After 50.0 seconds)
> >
> > This is 100% reproducible when running the fast tests suite
> > after a package build on OBS. Reverting it reliably fixes the
> > issue.
> >
> > This reverts commit b28c6196b132d1f25cb8c1bf781520fc41556b3a.
> > ---
> > I have bisected this long standing issue and identified the commit
> > that introduced it. If anybody can provide a different fix that would
> > be better, but if it's not possible to find another solution, it would
> > be good to revert it until it can be found, to resolve the regression.
> >
> > lib/eal/unix/rte_thread.c | 73 +++++++++++++++------------------------
> > 1 file changed, 28 insertions(+), 45 deletions(-)
>
> Missing DCO (no Signed-off-by) which is required even for a Revert.
>
> Also, Luca usually uses either a Debian or a Microsoft email address;
> a gmail address is unusual and is not in mailmap.
Yeah forgot the signed-off, sent v2, thanks
* Re: [PATCH v2] Revert "eal/unix: fix thread creation"
2024-10-30 20:30 ` [PATCH v2] " luca.boccassi
@ 2024-10-31 12:47 ` David Marchand
2024-10-31 12:52 ` David Marchand
0 siblings, 1 reply; 12+ messages in thread
From: David Marchand @ 2024-10-31 12:47 UTC
To: luca.boccassi; +Cc: dev, roretzla
Hello Luca,
>
> This commit introduced a regression on arm64, causing a deadlock.
> lcores_autotest gets stuck and never terminates:
>
> [ 1077s] EAL: Detected CPU lcores: 4
> [ 1077s] EAL: Detected NUMA nodes: 1
> [ 1077s] EAL: Detected shared linkage of DPDK
> [ 1077s] EAL: Multi-process socket /tmp/dpdk/rte/mp_socket
> [ 1077s] EAL: Selected IOVA mode 'VA'
> [ 1077s] APP: HPET is not enabled, using TSC as default timer
> [ 1077s] RTE>>lcores_autotest
> [ 1127s] DPDK:fast-tests / lcores_autotest time out (After 50.0 seconds)
>
> This is 100% reproducible when running the fast tests suite
> after a package build on OBS. Reverting it reliably fixes the
> issue.
>
> This reverts commit b28c6196b132d1f25cb8c1bf781520fc41556b3a.
>
> Signed-off-by: Luca Boccassi <luca.boccassi@gmail.com>
> ---
> v2: add forgotten signed-off-by
>
> I have bisected this long standing issue and identified the commit
> that introduced it. If anybody can provide a different fix that would
> be better, but if it's not possible to find another solution, it would
> be good to revert it until it can be found, to resolve the regression.
Thanks for tracking this down.
There is one issue with reverting: iirc, it reintroduces a race / double-free.
Could you share a backtrace when hitting this deadlock?
On my side, I am not able to catch it on either x86 or an ARM VM I borrowed.
I built DPDK manually in a Debian 12 container, trying to mimic OBS
cflags & friends.
# rm -rf build-debian; CC='ccache gcc' meson setup build-debian \
    -Dmachine=default -Dbuildtype=plain -Ddefault_library=shared \
    -Dc_args='-O2 -fstack-protector-strong -Wformat -Werror=format-security -Werror -Wdate-time -D_FORTIFY_SOURCE=2' && \
    ninja -C build-debian && \
    meson test -C build-debian --suite fast-tests --verbose -t 5
...
36/81 DPDK:fast-tests / lcores_autotest RUNNING
>>> LD_LIBRARY_PATH=/root/dpdk/build-debian/lib:/root/dpdk/build-debian/drivers MALLOC_PERTURB_=90 DPDK_TEST=lcores_autotest /root/dpdk/build-debian/app/dpdk-test --no-huge -m 2048 -d /root/dpdk/build-debian/drivers
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
✀ ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
EAL: Detected CPU lcores: 3
EAL: Detected NUMA nodes: 1
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
VIRTIO_INIT: eth_virtio_pci_init(): Failed to init PCI device
PCI_BUS: Requested device 0000:01:00.0 cannot be used
APP: HPET is not enabled, using TSC as default timer
RTE>>lcores_autotest
EAL threads count: 3, RTE_MAX_LCORE=256
lcore 0, socket 0, role RTE, cpuset 0
lcore 1, socket 0, role RTE, cpuset 1
lcore 2, socket 0, role RTE, cpuset 2
non-EAL threads count: 253
Warning: could not register new thread (this might be expected during
this test), reason Cannot allocate memory
non-EAL threads count: 254
Warning: could not register new thread (this might be expected during
this test), reason Cannot allocate memory
lcore 0, socket 0, role RTE, cpuset 0
lcore 1, socket 0, role RTE, cpuset 1
lcore 2, socket 0, role RTE, cpuset 2
lcore 3, socket 0, role NON_EAL, cpuset 0
lcore 0, socket 0, role RTE, cpuset 0
lcore 1, socket 0, role RTE, cpuset 1
lcore 2, socket 0, role RTE, cpuset 2
Control thread running successfully
Test OK
RTE>>――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
36/81 DPDK:fast-tests / lcores_autotest OK 1.87s
This VM runs on:
# lscpu
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 3
On-line CPU(s) list: 0-2
Vendor ID: ARM
BIOS Vendor ID: QEMU
Model name: Neoverse-N1
BIOS Model name: virt-rhel8.6.0 CPU @ 2.0GHz
...
--
David Marchand
* Re: [PATCH v2] Revert "eal/unix: fix thread creation"
2024-10-31 12:47 ` David Marchand
@ 2024-10-31 12:52 ` David Marchand
2024-10-31 12:58 ` Luca Boccassi
0 siblings, 1 reply; 12+ messages in thread
From: David Marchand @ 2024-10-31 12:52 UTC
To: luca.boccassi; +Cc: dev, roretzla
On Thu, Oct 31, 2024 at 1:47 PM David Marchand
<david.marchand@redhat.com> wrote:
> Could you share a backtrace when hitting this deadlock?
If the backtrace is not possible, running with
--log-level=lib.eal:debug may help.
--
David Marchand
* Re: [PATCH v2] Revert "eal/unix: fix thread creation"
2024-10-31 12:52 ` David Marchand
@ 2024-10-31 12:58 ` Luca Boccassi
2024-10-31 13:03 ` David Marchand
0 siblings, 1 reply; 12+ messages in thread
From: Luca Boccassi @ 2024-10-31 12:58 UTC
To: David Marchand; +Cc: dev, roretzla
On Thu, 31 Oct 2024 at 12:52, David Marchand <david.marchand@redhat.com> wrote:
>
> On Thu, Oct 31, 2024 at 1:47 PM David Marchand
> <david.marchand@redhat.com> wrote:
> > Could you share a backtrace when hitting this deadlock?
>
> If the backtrace is not possible, running with
> --log-level=lib.eal:debug may help.
I cannot get backtraces. This runs via "meson test", how can that
option be passed in?
* Re: [PATCH v2] Revert "eal/unix: fix thread creation"
2024-10-31 12:58 ` Luca Boccassi
@ 2024-10-31 13:03 ` David Marchand
2024-10-31 14:05 ` Luca Boccassi
0 siblings, 1 reply; 12+ messages in thread
From: David Marchand @ 2024-10-31 13:03 UTC
To: Luca Boccassi; +Cc: dev, roretzla
On Thu, Oct 31, 2024 at 1:58 PM Luca Boccassi <luca.boccassi@gmail.com> wrote:
>
> On Thu, 31 Oct 2024 at 12:52, David Marchand <david.marchand@redhat.com> wrote:
> >
> > On Thu, Oct 31, 2024 at 1:47 PM David Marchand
> > <david.marchand@redhat.com> wrote:
> > > Could you share a backtrace when hitting this deadlock?
> >
> > If the backtrace is not possible, running with
> > --log-level=lib.eal:debug may help.
>
> I cannot get backtraces. This runs via "meson test", how can that
> option be passed in?
# meson test -C build-debian --suite fast-tests --verbose -t 5 \
    --test-args=--log-level=lib.eal:debug
--
David Marchand
* Re: [PATCH v2] Revert "eal/unix: fix thread creation"
2024-10-31 13:03 ` David Marchand
@ 2024-10-31 14:05 ` Luca Boccassi
2024-10-31 20:46 ` Stephen Hemminger
0 siblings, 1 reply; 12+ messages in thread
From: Luca Boccassi @ 2024-10-31 14:05 UTC
To: David Marchand; +Cc: dev, roretzla
On Thu, 31 Oct 2024 at 13:04, David Marchand <david.marchand@redhat.com> wrote:
>
> On Thu, Oct 31, 2024 at 1:58 PM Luca Boccassi <luca.boccassi@gmail.com> wrote:
> >
> > On Thu, 31 Oct 2024 at 12:52, David Marchand <david.marchand@redhat.com> wrote:
> > >
> > > On Thu, Oct 31, 2024 at 1:47 PM David Marchand
> > > <david.marchand@redhat.com> wrote:
> > > > Could you share a backtrace when hitting this deadlock?
> > >
> > > If the backtrace is not possible, running with
> > > --log-level=lib.eal:debug may help.
> >
> > I cannot get backtraces. This runs via "meson test", how can that
> > option be passed in?
>
> # meson test -C build-debian --suite fast-tests --verbose -t 5
> --test-args=--log-level=lib.eal:debug
https://paste.debian.net/1334095/
* Re: [PATCH v2] Revert "eal/unix: fix thread creation"
2024-10-31 14:05 ` Luca Boccassi
@ 2024-10-31 20:46 ` Stephen Hemminger
2024-11-02 10:09 ` David Marchand
0 siblings, 1 reply; 12+ messages in thread
From: Stephen Hemminger @ 2024-10-31 20:46 UTC
To: Luca Boccassi; +Cc: David Marchand, dev, roretzla
On Thu, 31 Oct 2024 14:05:16 +0000
Luca Boccassi <luca.boccassi@gmail.com> wrote:
> On Thu, 31 Oct 2024 at 13:04, David Marchand <david.marchand@redhat.com> wrote:
> >
> > On Thu, Oct 31, 2024 at 1:58 PM Luca Boccassi <luca.boccassi@gmail.com> wrote:
> > >
> > > On Thu, 31 Oct 2024 at 12:52, David Marchand <david.marchand@redhat.com> wrote:
> > > >
> > > > On Thu, Oct 31, 2024 at 1:47 PM David Marchand
> > > > <david.marchand@redhat.com> wrote:
> > > > > Could you share a backtrace when hitting this deadlock?
> > > >
> > > > If the backtrace is not possible, running with
> > > > --log-level=lib.eal:debug may help.
> > >
> > > I cannot get backtraces. This runs via "meson test", how can that
> > > option be passed in?
> >
> > # meson test -C build-debian --suite fast-tests --verbose -t 5
> > --test-args=--log-level=lib.eal:debug
>
> https://paste.debian.net/1334095/
Could not repro this on Raspberry Pi 5. Main branch builds and runs
* Re: [PATCH v2] Revert "eal/unix: fix thread creation"
2024-10-31 20:46 ` Stephen Hemminger
@ 2024-11-02 10:09 ` David Marchand
2024-11-04 17:59 ` David Marchand
0 siblings, 1 reply; 12+ messages in thread
From: David Marchand @ 2024-11-02 10:09 UTC
To: Stephen Hemminger; +Cc: Luca Boccassi, dev, roretzla
On Thu, Oct 31, 2024 at 9:46 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Thu, 31 Oct 2024 14:05:16 +0000
> Luca Boccassi <luca.boccassi@gmail.com> wrote:
>
> > On Thu, 31 Oct 2024 at 13:04, David Marchand <david.marchand@redhat.com> wrote:
> > >
> > > On Thu, Oct 31, 2024 at 1:58 PM Luca Boccassi <luca.boccassi@gmail.com> wrote:
> > > >
> > > > On Thu, 31 Oct 2024 at 12:52, David Marchand <david.marchand@redhat.com> wrote:
> > > > >
> > > > > On Thu, Oct 31, 2024 at 1:47 PM David Marchand
> > > > > <david.marchand@redhat.com> wrote:
> > > > > > Could you share a backtrace when hitting this deadlock?
> > > > >
> > > > > If the backtrace is not possible, running with
> > > > > --log-level=lib.eal:debug may help.
> > > >
> > > > I cannot get backtraces. This runs via "meson test", how can that
> > > > option be passed in?
> > >
> > > # meson test -C build-debian --suite fast-tests --verbose -t 5
> > > --test-args=--log-level=lib.eal:debug
> >
> > https://paste.debian.net/1334095/
>
> Could not repro this on Raspberry Pi 5. Main branch builds and runs
There is no deadlock at play, as far as I can see.
The mentioned commit slowed down thread instantiation (a lot, judging from
the timestamps in the paste link).
The exact reason is not entirely clear to me, but the commit forces a thread
that creates child threads to wait for each of them to start running.
Reverting this commit improves the situation but reintroduces a double
free (if setting the affinity of the created thread fails, both the parent
and the created thread will free ctx).
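The double free described here can be made concrete with a minimal sketch. This is not the DPDK code: `struct start_ctx`, `trampoline`, `spawn` and `store_42` are hypothetical names used only for illustration. The rule it tries to show is that once pthread_create() succeeds, the heap-allocated context belongs to the new thread, so the parent may free it only when the thread never started.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical context, loosely modelled on thread_routine_ctx. */
struct start_ctx {
	int (*fn)(void *);
	void *arg;
};

static void *
trampoline(void *p)
{
	/* Copy the context, then release it: the child owns the allocation. */
	struct start_ctx ctx = *(struct start_ctx *)p;

	free(p);
	ctx.fn(ctx.arg);
	return NULL;
}

static int
spawn(pthread_t *tid, int (*fn)(void *), void *arg)
{
	struct start_ctx *ctx = malloc(sizeof(*ctx));
	int ret;

	if (ctx == NULL)
		return ENOMEM;
	ctx->fn = fn;
	ctx->arg = arg;

	ret = pthread_create(tid, NULL, trampoline, ctx);
	if (ret != 0)
		free(ctx); /* the thread never started, so ctx is still ours */

	/*
	 * On success ctx belongs to the child. Any later failure path in the
	 * parent that frees ctx again would be a double free.
	 */
	return ret;
}

/* Example thread body: writes 42 through its argument. */
static int
store_42(void *arg)
{
	*(int *)arg = 42;
	return 0;
}
```

A caller would use `spawn(&tid, store_42, &out)` followed by `pthread_join(tid, NULL)`. The sketch deliberately has no step that can fail after pthread_create(), which is the property the reverted code lacks.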
I sent a patch reintroducing the use of pthread_attr_setaffinity_np, as
Tyler had first proposed.
Let's see what the CI thinks of this.
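For readers unfamiliar with the approach, the following is a rough sketch of setting affinity through the creation attributes, assuming Linux with glibc. It is not the merged DPDK patch: `spawn_pinned` and `report_cpu` are hypothetical helpers. Because the CPU set travels with the pthread attributes, the parent has nothing to do after pthread_create() returns, so it neither waits for the child nor has a failure path after the thread exists.

```c
#define _GNU_SOURCE /* cpu_set_t, CPU_* macros, pthread_attr_setaffinity_np */
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical helper: start fn(arg) on a thread pinned to a single CPU. */
static int
spawn_pinned(pthread_t *tid, int cpu, void *(*fn)(void *), void *arg)
{
	pthread_attr_t attr;
	cpu_set_t set;
	int ret;

	ret = pthread_attr_init(&attr);
	if (ret != 0)
		return ret;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);

	/* Record the affinity in the attributes before the thread exists. */
	ret = pthread_attr_setaffinity_np(&attr, sizeof(set), &set);
	if (ret == 0)
		ret = pthread_create(tid, &attr, fn, arg);

	pthread_attr_destroy(&attr);
	return ret;
}

/* Example thread body: stores the CPU it is running on into *arg. */
static void *
report_cpu(void *arg)
{
	*(int *)arg = sched_getcpu();
	return NULL;
}
```

A caller would pick a CPU from its own allowed set (for example via sched_getaffinity()), call `spawn_pinned(&tid, cpu, report_cpu, &seen)`, and join the thread. An invalid CPU set is reported as an error from pthread_create() itself, before any thread is running.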
--
David Marchand
* Re: [PATCH v2] Revert "eal/unix: fix thread creation"
2024-11-02 10:09 ` David Marchand
@ 2024-11-04 17:59 ` David Marchand
0 siblings, 0 replies; 12+ messages in thread
From: David Marchand @ 2024-11-04 17:59 UTC
To: Luca Boccassi; +Cc: dev, roretzla, Stephen Hemminger
Hello Luca,
On Sat, Nov 2, 2024 at 11:09 AM David Marchand
<david.marchand@redhat.com> wrote:
>
> On Thu, Oct 31, 2024 at 9:46 PM Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> >
> > On Thu, 31 Oct 2024 14:05:16 +0000
> > Luca Boccassi <luca.boccassi@gmail.com> wrote:
> >
> > > On Thu, 31 Oct 2024 at 13:04, David Marchand <david.marchand@redhat.com> wrote:
> > > >
> > > > On Thu, Oct 31, 2024 at 1:58 PM Luca Boccassi <luca.boccassi@gmail.com> wrote:
> > > > >
> > > > > On Thu, 31 Oct 2024 at 12:52, David Marchand <david.marchand@redhat.com> wrote:
> > > > > >
> > > > > > On Thu, Oct 31, 2024 at 1:47 PM David Marchand
> > > > > > <david.marchand@redhat.com> wrote:
> > > > > > > Could you share a backtrace when hitting this deadlock?
> > > > > >
> > > > > > If the backtrace is not possible, running with
> > > > > > --log-level=lib.eal:debug may help.
> > > > >
> > > > > I cannot get backtraces. This runs via "meson test", how can that
> > > > > option be passed in?
> > > >
> > > > # meson test -C build-debian --suite fast-tests --verbose -t 5
> > > > --test-args=--log-level=lib.eal:debug
> > >
> > > https://paste.debian.net/1334095/
> >
> > Could not repro this on Raspberry Pi 5. Main branch builds and runs
>
> There is no deadlock at play, as far as I can see.
>
> The mentioned commit slowed down thread instantiation (a lot, judging from
> the timestamps in the paste link).
> The exact reason is not entirely clear to me, but the commit forces a thread
> that creates child threads to wait for each of them to start running.
> Reverting this commit improves the situation but reintroduces a double
> free (if setting the affinity of the created thread fails, both the parent
> and the created thread will free ctx).
>
> I sent a patch reintroducing the use of pthread_attr_setaffinity_np, as
> Tyler had first proposed.
> Let's see what the CI thinks of this.
The change has been merged and OBS is now all green.
You can disable the debug logs in your OBS package.
Thanks.
--
David Marchand
end of thread, other threads:[~2024-11-04 17:59 UTC | newest]
Thread overview: 12+ messages
2024-10-30 19:08 [PATCH] Revert "eal/unix: fix thread creation" luca.boccassi
2024-10-30 19:52 ` Stephen Hemminger
2024-10-30 20:31 ` Luca Boccassi
2024-10-30 20:30 ` [PATCH v2] " luca.boccassi
2024-10-31 12:47 ` David Marchand
2024-10-31 12:52 ` David Marchand
2024-10-31 12:58 ` Luca Boccassi
2024-10-31 13:03 ` David Marchand
2024-10-31 14:05 ` Luca Boccassi
2024-10-31 20:46 ` Stephen Hemminger
2024-11-02 10:09 ` David Marchand
2024-11-04 17:59 ` David Marchand