From: David Marchand <david.marchand@redhat.com>
To: luca.boccassi@gmail.com
Cc: dev@dpdk.org, roretzla@linux.microsoft.com
Subject: Re: [PATCH v2] Revert "eal/unix: fix thread creation"
Date: Thu, 31 Oct 2024 13:47:36 +0100
Message-ID: <CAJFAV8zZuO6BYmkvDnsx5G14PFZGxSUEkvSX64ve3qW1yn+mmQ@mail.gmail.com>
In-Reply-To: <20241030203122.416198-1-luca.boccassi@gmail.com>
Hello Luca,
>
> This commit introduced a regression on arm64, causing a deadlock.
> lcores_autotest gets stuck and never terminates:
>
> [ 1077s] EAL: Detected CPU lcores: 4
> [ 1077s] EAL: Detected NUMA nodes: 1
> [ 1077s] EAL: Detected shared linkage of DPDK
> [ 1077s] EAL: Multi-process socket /tmp/dpdk/rte/mp_socket
> [ 1077s] EAL: Selected IOVA mode 'VA'
> [ 1077s] APP: HPET is not enabled, using TSC as default timer
> [ 1077s] RTE>>lcores_autotest
> [ 1127s] DPDK:fast-tests / lcores_autotest time out (After 50.0 seconds)
>
> This is 100% reproducible when running the fast tests suite
> after a package build on OBS. Reverting it reliably fixes the
> issue.
>
> This reverts commit b28c6196b132d1f25cb8c1bf781520fc41556b3a.
>
> Signed-off-by: Luca Boccassi <luca.boccassi@gmail.com>
> ---
> v2: add forgotten signed-off-by
>
> I have bisected this long standing issue and identified the commit
> that introduced it. If anybody can provide a different fix that would
> be better, but if it's not possible to find another solution, it would
> be good to revert it until it can be found, to resolve the regression.
Thanks for tracking this down.
There is one issue with reverting: iirc, it reintroduces a race / double-free.
Could you share a backtrace when hitting this deadlock?
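For reference, here is a minimal sketch of one way to grab such a backtrace while the test is hanging. It assumes gdb and pgrep are installed on the build worker, that ptrace is permitted there, and that the stuck process is the dpdk-test binary; the helper name dump_bt is hypothetical and not part of DPDK.

```shell
# Hypothetical helper (not part of DPDK): dump backtraces of every thread
# in a running process. Assumes gdb is installed and ptrace is permitted.
dump_bt() {
    pid="$1"
    if [ -z "$pid" ]; then
        echo "usage: dump_bt <pid>" >&2
        return 1
    fi
    # -batch makes gdb exit after running the given commands; the target
    # is detached on exit and is left as it was (still stuck).
    gdb -p "$pid" -batch -ex 'thread apply all bt'
}

# Example, while lcores_autotest is hanging:
#   dump_bt "$(pgrep -o dpdk-test)"
```

Having debug symbols for dpdk installed should make the resulting frames much easier to read.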
On my side, I am not able to reproduce it, neither on x86 nor in an ARM VM I borrowed.
I built dpdk manually in a Debian 12 container, trying to mimic OBS
cflags & friends.
# rm -rf build-debian; CC='ccache gcc' meson setup build-debian
-Dmachine=default -Dbuildtype=plain -Ddefault_library=shared
-Dc_args='-O2 -fstack-protector-strong -Wformat
-Werror=format-security -Werror -Wdate-time -D_FORTIFY_SOURCE=2' &&
ninja -C build-debian && meson test -C build-debian --suite fast-tests
--verbose -t 5
...
36/81 DPDK:fast-tests / lcores_autotest RUNNING
>>> LD_LIBRARY_PATH=/root/dpdk/build-debian/lib:/root/dpdk/build-debian/drivers MALLOC_PERTURB_=90 DPDK_TEST=lcores_autotest /root/dpdk/build-debian/app/dpdk-test --no-huge -m 2048 -d /root/dpdk/build-debian/drivers
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
✀ ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
EAL: Detected CPU lcores: 3
EAL: Detected NUMA nodes: 1
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
VIRTIO_INIT: eth_virtio_pci_init(): Failed to init PCI device
PCI_BUS: Requested device 0000:01:00.0 cannot be used
APP: HPET is not enabled, using TSC as default timer
RTE>>lcores_autotest
EAL threads count: 3, RTE_MAX_LCORE=256
lcore 0, socket 0, role RTE, cpuset 0
lcore 1, socket 0, role RTE, cpuset 1
lcore 2, socket 0, role RTE, cpuset 2
non-EAL threads count: 253
Warning: could not register new thread (this might be expected during
this test), reason Cannot allocate memory
non-EAL threads count: 254
Warning: could not register new thread (this might be expected during
this test), reason Cannot allocate memory
lcore 0, socket 0, role RTE, cpuset 0
lcore 1, socket 0, role RTE, cpuset 1
lcore 2, socket 0, role RTE, cpuset 2
lcore 3, socket 0, role NON_EAL, cpuset 0
lcore 0, socket 0, role RTE, cpuset 0
lcore 1, socket 0, role RTE, cpuset 1
lcore 2, socket 0, role RTE, cpuset 2
Control thread running successfully
Test OK
RTE>>――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
36/81 DPDK:fast-tests / lcores_autotest OK 1.87s
This vm runs on:
# lscpu
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 3
On-line CPU(s) list: 0-2
Vendor ID: ARM
BIOS Vendor ID: QEMU
Model name: Neoverse-N1
BIOS Model name: virt-rhel8.6.0 CPU @ 2.0GHz
...
--
David Marchand