From: "Van Haaren, Harry" <harry.van.haaren@intel.com>
To: "Marchand, David" <david.marchand@redhat.com>,
"Mattias Rönnblom" <mattias.ronnblom@ericsson.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
"stephen@networkplumber.org" <stephen@networkplumber.org>,
"suanmingm@nvidia.com" <suanmingm@nvidia.com>,
"thomas@monjalon.net" <thomas@monjalon.net>,
"stable@dpdk.org" <stable@dpdk.org>,
Tyler Retzlaff <roretzla@linux.microsoft.com>,
"Aaron Conole" <aconole@redhat.com>
Subject: Re: [PATCH v2] service: fix deadlock on worker lcore exit
Date: Thu, 3 Oct 2024 15:50:27 +0000
Message-ID: <PH8PR11MB6803B474AE398A725C2A3583D7712@PH8PR11MB6803.namprd11.prod.outlook.com>
In-Reply-To: <CAJFAV8zgd_EJwbshjAWNMe4m2s=btu+6cY9ToA0vmudz89svaw@mail.gmail.com>
> From: David Marchand <david.marchand@redhat.com>
> Sent: Thursday, October 3, 2024 10:13 AM
> To: Mattias Rönnblom <mattias.ronnblom@ericsson.com>; Van Haaren, Harry <harry.van.haaren@intel.com>
> Cc: dev@dpdk.org <dev@dpdk.org>; stephen@networkplumber.org <stephen@networkplumber.org>; suanmingm@nvidia.com <suanmingm@nvidia.com>; thomas@monjalon.net <thomas@monjalon.net>; stable@dpdk.org <stable@dpdk.org>; Tyler Retzlaff <roretzla@linux.microsoft.com>; Aaron Conole <aconole@redhat.com>
> Subject: Re: [PATCH v2] service: fix deadlock on worker lcore exit
>
> On Thu, Oct 3, 2024 at 8:57 AM David Marchand <david.marchand@redhat.com> wrote:
> >
> > From: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> >
> > Calling rte_exit() from a worker lcore thread causes a deadlock in
> > rte_service_finalize().
> >
> > This patch makes rte_service_finalize() deadlock-free by avoiding the
> > need to synchronize with service lcore threads, which in turn is
> > achieved by moving service and per-lcore state from the heap to being
> > statically allocated.
> >
> > The BSS segment increases with ~156 kB (on x86_64 with default
> > RTE_MAX_LCORE and RTE_SERVICE_NUM_MAX).
> >
> > According to the service perf autotest, this change also results in a
> > slight reduction of service framework overhead.
> >
> > Fixes: 33666b448f15 ("service: fix crash on exit")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> > Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > ---
> > Changes since v1:
> > - rebased,
>
> I can't merge this patch in its current state.
>
> At the moment, two CI report a problem with the
> eal_flags_file_prefix_autotest unit test.
>
> -------------------------------------stdout-------------------------------------
> RTE>>eal_flags_file_prefix_autotest
> Running binary with argv[]:'/home/zhoumin/gh_dpdk/build/app/dpdk-test'
> '--proc-type=secondary' '-m' '18' '--file-prefix=memtest'
> Running binary with argv[]:'/home/zhoumin/gh_dpdk/build/app/dpdk-test'
> '-m' '18' '--file-prefix=memtest1'
> Error - hugepage files for memtest1 were not deleted!
> Test Failed
> RTE>>
>
> Can you have a look?
I'm not sure how the code change in question relates to the eal-flags failure, but I can reproduce the failure here.
The issue reproduces on *all* of the tags below, which indicates it's likely a board-config issue and not a true issue (unless it's been there since 23.11??).
All tested commits were bad:
b3485f4293 (HEAD, tag: v24.07) version: 24.07.0
a9778aad62 (HEAD, tag: v24.03) version: 24.03.0
eeb0605f11 (HEAD, tag: v23.11) version: 23.11.0
So I'm pretty sure this is a board/runner config issue. The error output here is as follows:
RTE>>eal_flags_file_prefix_autotest
Running binary with argv[]:'./app/test/dpdk-test' '--proc-type=secondary' '-m' '18' '--file-prefix=memtest'
EAL: Detected CPU lcores: 64
EAL: Detected NUMA nodes: 2
EAL: Detected static linkage of DPDK
EAL: Cannot open '/var/run/dpdk/memtest/config' for rte_mem_config
EAL: FATAL: Cannot init config
EAL: Cannot init config
FAIL:
DPDK_TEST=eal_flags_file_prefix_autotest ./app/test/dpdk-test --no-pci
PASS:
DPDK_TEST=eal_flags_file_prefix_autotest ./app/test/dpdk-test
So it seems the eal-flags test is NOT able to handle args like "--no-pci"? I tend to run tests in no-PCI mode to speed things up :)
In short, this service-cores patch is not the root cause. Perhaps some of the CI folks can confirm whether extra args are passed to the runner?
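For anyone wanting to repeat the FAIL/PASS comparison on their own runner, a minimal sketch is below. It only wraps the two invocations shown above. DPDK_TEST_BIN and run_case are hypothetical names introduced for illustration, and the sketch assumes the exit status of dpdk-test reflects the test result when DPDK_TEST is set.

```shell
#!/bin/sh
# Sketch: run the same autotest with and without extra EAL arguments.
# DPDK_TEST_BIN is a hypothetical override; the default path is the one
# used in the commands above.
DPDK_TEST_BIN="${DPDK_TEST_BIN:-./app/test/dpdk-test}"

run_case() {
    # $1 is a free-form label; any remaining arguments are passed
    # unchanged to the test binary.
    label="$1"
    shift
    # Assumption: a non-zero exit status means the test failed.
    if DPDK_TEST=eal_flags_file_prefix_autotest "$DPDK_TEST_BIN" "$@" >/dev/null 2>&1; then
        echo "$label: PASS"
    else
        echo "$label: FAIL"
    fi
}

run_case "default args"
run_case "with --no-pci" --no-pci
```

If both cases report the same result on a given runner, then "--no-pci" is probably not what triggers the failure there, and other runner arguments would be worth checking.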
Regards, -Harry
Thread overview: 6+ messages
2024-10-01 16:26 [PATCH] service: avoid worker lcore exit deadlock Mattias Rönnblom
2024-10-02 14:02 ` Tyler Retzlaff
2024-10-03 6:57 ` [PATCH v2] service: fix deadlock on worker lcore exit David Marchand
2024-10-03 9:13 ` David Marchand
2024-10-03 15:50 ` Van Haaren, Harry [this message]
2024-10-11 8:50 ` David Marchand