* [PATCH] service: avoid worker lcore exit deadlock
@ 2024-10-01 16:26 Mattias Rönnblom
2024-10-02 14:02 ` Tyler Retzlaff
2024-10-03 6:57 ` [PATCH v2] service: fix deadlock on worker lcore exit David Marchand
0 siblings, 2 replies; 6+ messages in thread
From: Mattias Rönnblom @ 2024-10-01 16:26 UTC (permalink / raw)
To: Harry van Haaren, Stephen Hemminger
Cc: hofors, dev, Suanming Mou, thomas, david.marchand,
Mattias Rönnblom, stable
Calling rte_exit() from a worker lcore thread causes a deadlock in
rte_service_finalize().

This patch makes rte_service_finalize() deadlock-free by avoiding the
need to synchronize with service lcore threads, which in turn is
achieved by moving service and per-lcore state from the heap to
static allocation.

The BSS segment grows by ~156 kB (on x86_64 with default
RTE_MAX_LCORE and RTE_SERVICE_NUM_MAX).

According to the service perf autotest, this change also results in a
slight reduction of service framework overhead.

Fixes: 33666b448f15 ("service: fix crash on exit")
Cc: harry.van.haaren@intel.com
Cc: stable@dpdk.org

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
lib/eal/common/rte_service.c | 28 ++--------------------------
1 file changed, 2 insertions(+), 26 deletions(-)
diff --git a/lib/eal/common/rte_service.c b/lib/eal/common/rte_service.c
index 94e872a08a..2ea0c0c0d9 100644
--- a/lib/eal/common/rte_service.c
+++ b/lib/eal/common/rte_service.c
@@ -15,7 +15,6 @@
#include <rte_common.h>
#include <rte_cycles.h>
#include <rte_atomic.h>
-#include <rte_malloc.h>
#include <rte_spinlock.h>
#include <rte_trace_point.h>
@@ -74,8 +73,8 @@ struct core_state {
} __rte_cache_aligned;
static uint32_t rte_service_count;
-static struct rte_service_spec_impl *rte_services;
-static struct core_state *lcore_states;
+static struct rte_service_spec_impl rte_services[RTE_SERVICE_NUM_MAX];
+static struct core_state lcore_states[RTE_MAX_LCORE];
static uint32_t rte_service_library_initialized;
int32_t
@@ -93,21 +92,6 @@ rte_service_init(void)
return -EALREADY;
}
- rte_services = rte_calloc("rte_services", RTE_SERVICE_NUM_MAX,
- sizeof(struct rte_service_spec_impl),
- RTE_CACHE_LINE_SIZE);
- if (!rte_services) {
- RTE_LOG(ERR, EAL, "error allocating rte services array\n");
- goto fail_mem;
- }
-
- lcore_states = rte_calloc("rte_service_core_states", RTE_MAX_LCORE,
- sizeof(struct core_state), RTE_CACHE_LINE_SIZE);
- if (!lcore_states) {
- RTE_LOG(ERR, EAL, "error allocating core states array\n");
- goto fail_mem;
- }
-
int i;
struct rte_config *cfg = rte_eal_get_configuration();
for (i = 0; i < RTE_MAX_LCORE; i++) {
@@ -120,10 +104,6 @@ rte_service_init(void)
rte_service_library_initialized = 1;
return 0;
-fail_mem:
- rte_free(rte_services);
- rte_free(lcore_states);
- return -ENOMEM;
}
void
@@ -133,10 +113,6 @@ rte_service_finalize(void)
return;
rte_service_lcore_reset_all();
- rte_eal_mp_wait_lcore();
-
- rte_free(rte_services);
- rte_free(lcore_states);
rte_service_library_initialized = 0;
}
--
2.34.1
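
The deadlock described in the commit message can be modelled in a few
lines of C. This is only a sketch under an assumption: that the
rte_eal_mp_wait_lcore() call removed above waits for every worker lcore
to go idle, so a worker that reaches it through rte_exit() ends up
waiting on itself. The names lcore_model, wait_all_workers_can_finish()
and MAX_LCORE are hypothetical and are not EAL identifiers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are not the EAL's types or names. */
enum lcore_state { LCORE_WAIT, LCORE_RUNNING };

#define MAX_LCORE 4

struct lcore_model {
	enum lcore_state state[MAX_LCORE];
	unsigned int main_lcore;
};

/*
 * Model of "wait until every worker lcore is idle", called from lcore
 * `caller`. Returns true if the wait can complete, false if the caller
 * is a running worker waiting on itself, in which case a real spin-wait
 * would never return. Other running workers are assumed to finish.
 */
static bool
wait_all_workers_can_finish(const struct lcore_model *m, unsigned int caller)
{
	for (unsigned int i = 0; i < MAX_LCORE; i++) {
		if (i == m->main_lcore)
			continue; /* only workers are waited for */
		if (i == caller && m->state[i] == LCORE_RUNNING)
			return false; /* caller cannot go idle while it waits */
	}
	return true;
}

/* Small usage example: lcore 0 is main, lcore 1 is a busy worker. */
static void
demo_deadlock_model(void)
{
	struct lcore_model m = {
		.state = { LCORE_RUNNING, LCORE_RUNNING, LCORE_WAIT, LCORE_WAIT },
		.main_lcore = 0,
	};

	assert(wait_all_workers_can_finish(&m, 0));  /* exit from main lcore */
	assert(!wait_all_workers_can_finish(&m, 1)); /* exit from worker 1 */
}
```

In this model, dropping the wait from the finalize path is what makes
the worker case safe, which matches the direction the patch takes.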
* Re: [PATCH] service: avoid worker lcore exit deadlock
2024-10-01 16:26 [PATCH] service: avoid worker lcore exit deadlock Mattias Rönnblom
@ 2024-10-02 14:02 ` Tyler Retzlaff
2024-10-03 6:57 ` [PATCH v2] service: fix deadlock on worker lcore exit David Marchand
1 sibling, 0 replies; 6+ messages in thread
From: Tyler Retzlaff @ 2024-10-02 14:02 UTC (permalink / raw)
To: 20230629203720.682f90c3
Cc: Harry van Haaren, Stephen Hemminger, hofors, dev, Suanming Mou,
thomas, david.marchand, Mattias Rönnblom, stable
On Tue, Oct 01, 2024 at 06:26:03PM +0200, Mattias Rönnblom wrote:
> Calling rte_exit() from a worker lcore thread causes a deadlock in
> rte_service_finalize().
>
> This patch makes rte_service_finalize() deadlock-free by avoiding the
> need to synchronize with service lcore threads, which in turn is
> achieved by moving service and per-lcore state from the heap to being
> statically allocated.
>
> The BSS segment increases with ~156 kB (on x86_64 with default
> RTE_MAX_LCORE and RTE_SERVICE_NUM_MAX).
>
> According to the service perf autotest, this change also results in a
> slight reduction of service framework overhead.
>
> Fixes: 33666b448f15 ("service: fix crash on exit")
> Cc: harry.van.haaren@intel.com
> Cc: stable@dpdk.org
>
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> ---
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
* [PATCH v2] service: fix deadlock on worker lcore exit
2024-10-01 16:26 [PATCH] service: avoid worker lcore exit deadlock Mattias Rönnblom
2024-10-02 14:02 ` Tyler Retzlaff
@ 2024-10-03 6:57 ` David Marchand
2024-10-03 9:13 ` David Marchand
1 sibling, 1 reply; 6+ messages in thread
From: David Marchand @ 2024-10-03 6:57 UTC (permalink / raw)
To: dev
Cc: stephen, suanmingm, thomas, Mattias Rönnblom, stable,
Tyler Retzlaff, Harry van Haaren, Aaron Conole
From: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Calling rte_exit() from a worker lcore thread causes a deadlock in
rte_service_finalize().

This patch makes rte_service_finalize() deadlock-free by avoiding the
need to synchronize with service lcore threads, which in turn is
achieved by moving service and per-lcore state from the heap to
static allocation.

The BSS segment grows by ~156 kB (on x86_64 with default
RTE_MAX_LCORE and RTE_SERVICE_NUM_MAX).

According to the service perf autotest, this change also results in a
slight reduction of service framework overhead.

Fixes: 33666b448f15 ("service: fix crash on exit")
Cc: stable@dpdk.org

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
Changes since v1:
- rebased.
---
lib/eal/common/rte_service.c | 28 ++--------------------------
1 file changed, 2 insertions(+), 26 deletions(-)
diff --git a/lib/eal/common/rte_service.c b/lib/eal/common/rte_service.c
index a38c594ce4..5b6805b8d8 100644
--- a/lib/eal/common/rte_service.c
+++ b/lib/eal/common/rte_service.c
@@ -15,7 +15,6 @@
#include <rte_common.h>
#include <rte_cycles.h>
#include <rte_atomic.h>
-#include <rte_malloc.h>
#include <rte_spinlock.h>
#include <rte_trace_point.h>
@@ -76,8 +75,8 @@ struct __rte_cache_aligned core_state {
};
static uint32_t rte_service_count;
-static struct rte_service_spec_impl *rte_services;
-static struct core_state *lcore_states;
+static struct rte_service_spec_impl rte_services[RTE_SERVICE_NUM_MAX];
+static struct core_state lcore_states[RTE_MAX_LCORE];
static uint32_t rte_service_library_initialized;
int32_t
@@ -95,21 +94,6 @@ rte_service_init(void)
return -EALREADY;
}
- rte_services = rte_calloc("rte_services", RTE_SERVICE_NUM_MAX,
- sizeof(struct rte_service_spec_impl),
- RTE_CACHE_LINE_SIZE);
- if (!rte_services) {
- EAL_LOG(ERR, "error allocating rte services array");
- goto fail_mem;
- }
-
- lcore_states = rte_calloc("rte_service_core_states", RTE_MAX_LCORE,
- sizeof(struct core_state), RTE_CACHE_LINE_SIZE);
- if (!lcore_states) {
- EAL_LOG(ERR, "error allocating core states array");
- goto fail_mem;
- }
-
int i;
struct rte_config *cfg = rte_eal_get_configuration();
for (i = 0; i < RTE_MAX_LCORE; i++) {
@@ -122,10 +106,6 @@ rte_service_init(void)
rte_service_library_initialized = 1;
return 0;
-fail_mem:
- rte_free(rte_services);
- rte_free(lcore_states);
- return -ENOMEM;
}
void
@@ -135,10 +115,6 @@ rte_service_finalize(void)
return;
rte_service_lcore_reset_all();
- rte_eal_mp_wait_lcore();
-
- rte_free(rte_services);
- rte_free(lcore_states);
rte_service_library_initialized = 0;
}
--
2.46.2
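
Why static allocation removes the need to synchronize can be
illustrated with a small stand-alone sketch. It is not the rte_service
code: svc_state, services, svc_init(), svc_finalize() and svc_touch()
are hypothetical names, SVC_MAX stands in for RTE_SERVICE_NUM_MAX, and
the memset() on init is an assumption of this sketch. The point is only
that finalize has nothing to free, so a thread that is still running
afterwards touches valid memory.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define SVC_MAX 8 /* hypothetical stand-in for RTE_SERVICE_NUM_MAX */

struct svc_state {
	uint32_t runstate;
	uint64_t calls;
};

/* Static storage: valid for the whole process lifetime, zeroed (BSS). */
static struct svc_state services[SVC_MAX];
static uint32_t initialized;

static int
svc_init(void)
{
	if (initialized)
		return -EALREADY;
	/* Assumption of this sketch: start from a clean state on re-init. */
	memset(services, 0, sizeof(services));
	initialized = 1;
	return 0;
}

static void
svc_finalize(void)
{
	if (!initialized)
		return;
	/* Nothing to free, so there is no thread to wait for here. */
	initialized = 0;
}

/* A straggler thread may still call this after finalize; no use-after-free. */
static uint64_t
svc_touch(unsigned int id)
{
	return ++services[id].calls;
}

/* Small usage example: touching state after finalize is still safe. */
static void
demo_static_state(void)
{
	assert(svc_init() == 0);
	assert(svc_touch(3) == 1);
	svc_finalize();
	assert(svc_touch(3) == 2); /* memory is still valid */
}
```

With heap-allocated arrays, the equivalent of svc_finalize() would have
to free the memory, and that is only safe once every reader has
stopped; this is the synchronization the commit message says it avoids.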
* Re: [PATCH v2] service: fix deadlock on worker lcore exit
2024-10-03 6:57 ` [PATCH v2] service: fix deadlock on worker lcore exit David Marchand
@ 2024-10-03 9:13 ` David Marchand
2024-10-03 15:50 ` Van Haaren, Harry
0 siblings, 1 reply; 6+ messages in thread
From: David Marchand @ 2024-10-03 9:13 UTC (permalink / raw)
To: Mattias Rönnblom, Harry van Haaren
Cc: dev, stephen, suanmingm, thomas, stable, Tyler Retzlaff, Aaron Conole
On Thu, Oct 3, 2024 at 8:57 AM David Marchand <david.marchand@redhat.com> wrote:
>
> From: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>
> Calling rte_exit() from a worker lcore thread causes a deadlock in
> rte_service_finalize().
>
> This patch makes rte_service_finalize() deadlock-free by avoiding the
> need to synchronize with service lcore threads, which in turn is
> achieved by moving service and per-lcore state from the heap to being
> statically allocated.
>
> The BSS segment increases with ~156 kB (on x86_64 with default
> RTE_MAX_LCORE and RTE_SERVICE_NUM_MAX).
>
> According to the service perf autotest, this change also results in a
> slight reduction of service framework overhead.
>
> Fixes: 33666b448f15 ("service: fix crash on exit")
> Cc: stable@dpdk.org
>
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> ---
> Changes since v1:
> - rebased,
I can't merge this patch in its current state.

At the moment, two CI systems report a problem with the
eal_flags_file_prefix_autotest unit test.
-------------------------------------stdout-------------------------------------
RTE>>eal_flags_file_prefix_autotest
Running binary with argv[]:'/home/zhoumin/gh_dpdk/build/app/dpdk-test'
'--proc-type=secondary' '-m' '18' '--file-prefix=memtest'
Running binary with argv[]:'/home/zhoumin/gh_dpdk/build/app/dpdk-test'
'-m' '18' '--file-prefix=memtest1'
Error - hugepage files for memtest1 were not deleted!
Test Failed
RTE>>
Can you have a look?
Thanks.
--
David Marchand
* Re: [PATCH v2] service: fix deadlock on worker lcore exit
2024-10-03 9:13 ` David Marchand
@ 2024-10-03 15:50 ` Van Haaren, Harry
2024-10-11 8:50 ` David Marchand
0 siblings, 1 reply; 6+ messages in thread
From: Van Haaren, Harry @ 2024-10-03 15:50 UTC (permalink / raw)
To: Marchand, David, Mattias Rönnblom
Cc: dev, stephen, suanmingm, thomas, stable, Tyler Retzlaff, Aaron Conole
> From: David Marchand <david.marchand@redhat.com>
> Sent: Thursday, October 3, 2024 10:13 AM
> To: Mattias Rönnblom <mattias.ronnblom@ericsson.com>; Van Haaren, Harry <harry.van.haaren@intel.com>
> Cc: dev@dpdk.org <dev@dpdk.org>; stephen@networkplumber.org <stephen@networkplumber.org>; suanmingm@nvidia.com <suanmingm@nvidia.com>; thomas@monjalon.net <thomas@monjalon.net>; stable@dpdk.org <stable@dpdk.org>; Tyler Retzlaff <roretzla@linux.microsoft.com>; Aaron Conole <aconole@redhat.com>
> Subject: Re: [PATCH v2] service: fix deadlock on worker lcore exit
>
> On Thu, Oct 3, 2024 at 8:57 AM David Marchand <david.marchand@redhat.com> wrote:
> >
> > From: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> >
> > Calling rte_exit() from a worker lcore thread causes a deadlock in
> > rte_service_finalize().
> >
> > This patch makes rte_service_finalize() deadlock-free by avoiding the
> > need to synchronize with service lcore threads, which in turn is
> > achieved by moving service and per-lcore state from the heap to being
> > statically allocated.
> >
> > The BSS segment increases with ~156 kB (on x86_64 with default
> > RTE_MAX_LCORE and RTE_SERVICE_NUM_MAX).
> >
> > According to the service perf autotest, this change also results in a
> > slight reduction of service framework overhead.
> >
> > Fixes: 33666b448f15 ("service: fix crash on exit")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> > Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > ---
> > Changes since v1:
> > - rebased,
>
> I can't merge this patch in its current state.
>
> At the moment, two CI report a problem with the
> eal_flags_file_prefix_autotest unit test.
>
> -------------------------------------stdout-------------------------------------
> RTE>>eal_flags_file_prefix_autotest
> Running binary with argv[]:'/home/zhoumin/gh_dpdk/build/app/dpdk-test'
> '--proc-type=secondary' '-m' '18' '--file-prefix=memtest'
> Running binary with argv[]:'/home/zhoumin/gh_dpdk/build/app/dpdk-test'
> '-m' '18' '--file-prefix=memtest1'
> Error - hugepage files for memtest1 were not deleted!
> Test Failed
> RTE>>
>
> Can you have a look?
Not sure how the code change in question relates to the eal-flags failure, but I can reproduce the failure here.
I can reproduce the issue on *all* of the tags below; this indicates it's likely a board-config issue, and not a true issue (unless it's been there since 23.11??).
Tested commits were all bad:
b3485f4293 (HEAD, tag: v24.07) version: 24.07.0
a9778aad62 (HEAD, tag: v24.03) version: 24.03.0
eeb0605f11 (HEAD, tag: v23.11) version: 23.11.0
So I'm pretty sure this is a board/runner config issue, with the error output as follows here:
RTE>>eal_flags_file_prefix_autotest
Running binary with argv[]:'./app/test/dpdk-test' '--proc-type=secondary' '-m' '18' '--file-prefix=memtest'
EAL: Detected CPU lcores: 64
EAL: Detected NUMA nodes: 2
EAL: Detected static linkage of DPDK
EAL: Cannot open '/var/run/dpdk/memtest/config' for rte_mem_config
EAL: FATAL: Cannot init config
EAL: Cannot init config
FAIL:
DPDK_TEST=eal_flags_file_prefix_autotest ./app/test/dpdk-test --no-pci
PASS:
DPDK_TEST=eal_flags_file_prefix_autotest ./app/test/dpdk-test
So it seems the eal-flags test is NOT able to handle args like "--no-pci"? I tend to run tests in no-PCI mode to speed things up :)
In short, this service-cores patch is not the root cause. Perhaps some of the CI folks can confirm if there's extra args passed to the runner?
Regards, -Harry
* Re: [PATCH v2] service: fix deadlock on worker lcore exit
2024-10-03 15:50 ` Van Haaren, Harry
@ 2024-10-11 8:50 ` David Marchand
0 siblings, 0 replies; 6+ messages in thread
From: David Marchand @ 2024-10-11 8:50 UTC (permalink / raw)
To: Van Haaren, Harry, ci
Cc: Mattias Rönnblom, dev, stephen, suanmingm, thomas, stable,
Tyler Retzlaff, Aaron Conole
On Thu, Oct 3, 2024 at 5:50 PM Van Haaren, Harry
<harry.van.haaren@intel.com> wrote:
> > From: David Marchand <david.marchand@redhat.com>
> > Sent: Thursday, October 3, 2024 10:13 AM
> > To: Mattias Rönnblom <mattias.ronnblom@ericsson.com>; Van Haaren, Harry <harry.van.haaren@intel.com>
> > Cc: dev@dpdk.org <dev@dpdk.org>; stephen@networkplumber.org <stephen@networkplumber.org>; suanmingm@nvidia.com <suanmingm@nvidia.com>; thomas@monjalon.net <thomas@monjalon.net>; stable@dpdk.org <stable@dpdk.org>; Tyler Retzlaff <roretzla@linux.microsoft.com>; Aaron Conole <aconole@redhat.com>
> > Subject: Re: [PATCH v2] service: fix deadlock on worker lcore exit
> >
> > On Thu, Oct 3, 2024 at 8:57 AM David Marchand <david.marchand@redhat.com> wrote:
> > >
> > > From: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> > >
> > > Calling rte_exit() from a worker lcore thread causes a deadlock in
> > > rte_service_finalize().
> > >
> > > This patch makes rte_service_finalize() deadlock-free by avoiding the
> > > need to synchronize with service lcore threads, which in turn is
> > > achieved by moving service and per-lcore state from the heap to being
> > > statically allocated.
> > >
> > > The BSS segment increases with ~156 kB (on x86_64 with default
> > > RTE_MAX_LCORE and RTE_SERVICE_NUM_MAX).
> > >
> > > According to the service perf autotest, this change also results in a
> > > slight reduction of service framework overhead.
> > >
> > > Fixes: 33666b448f15 ("service: fix crash on exit")
> > > Cc: stable@dpdk.org
> > >
> > > Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> > > Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > > ---
> > > Changes since v1:
> > > - rebased,
> >
> > I can't merge this patch in its current state.
> >
> > At the moment, two CI report a problem with the
> > eal_flags_file_prefix_autotest unit test.
> >
> > -------------------------------------stdout-------------------------------------
> > RTE>>eal_flags_file_prefix_autotest
> > Running binary with argv[]:'/home/zhoumin/gh_dpdk/build/app/dpdk-test'
> > '--proc-type=secondary' '-m' '18' '--file-prefix=memtest'
> > Running binary with argv[]:'/home/zhoumin/gh_dpdk/build/app/dpdk-test'
> > '-m' '18' '--file-prefix=memtest1'
> > Error - hugepage files for memtest1 were not deleted!
> > Test Failed
> > RTE>>
> >
> > Can you have a look?
>
> Not sure how the code change in question is relating to the eal-flags failure, but I can reproduce the failure here.
> Reproducing issue on *all* of the below tags; this indicates its likely a board-config issue, and not a true issue (unless its been there since 23.11??).
>
> Tested commits were all bad:
> b3485f4293 (HEAD, tag: v24.07) version: 24.07.0
> a9778aad62 (HEAD, tag: v24.03) version: 24.03.0
> eeb0605f11 (HEAD, tag: v23.11) version: 23.11.0
>
> So I'm pretty sure this is a board/runner config issue, with the error output as follows here:
> RTE>>eal_flags_file_prefix_autotest
> Running binary with argv[]:'./app/test/dpdk-test' '--proc-type=secondary' '-m' '18' '--file-prefix=memtest'
> EAL: Detected CPU lcores: 64
> EAL: Detected NUMA nodes: 2
> EAL: Detected static linkage of DPDK
> EAL: Cannot open '/var/run/dpdk/memtest/config' for rte_mem_config
> EAL: FATAL: Cannot init config
> EAL: Cannot init config
>
> FAIL:
> DPDK_TEST=eal_flags_file_prefix_autotest ./app/test/dpdk-test --no-pci
>
> PASS:
> DPDK_TEST=eal_flags_file_prefix_autotest ./app/test/dpdk-test
>
> So seems like the eal-flags test is NOT able to handle args like "--no-pci"? I tend to run tests in no PCI mode to speed up things :)
Well, speeding up, or hiding the issue, I guess.
> In short, this service-cores patch is not the root cause. Perhaps some of the CI folks can confirm if there's extra args passed to the runner?
To be clear, I can't merge this patch because of this (systematic)
failure in many CI environments (GHA, LoongArch, UNH).
Adding the CI mailing list to the loop.
--
David Marchand