* [Bug 1277] memory_hotplug_lock deadlock during initialization
@ 2023-08-23 14:02 bugzilla
2023-08-23 14:56 ` Stephen Hemminger
0 siblings, 1 reply; 2+ messages in thread
From: bugzilla @ 2023-08-23 14:02 UTC (permalink / raw)
To: dev
https://bugs.dpdk.org/show_bug.cgi?id=1277
Bug ID: 1277
Summary: memory_hotplug_lock deadlock during initialization
Product: DPDK
Version: unspecified
Hardware: All
OS: All
Status: UNCONFIRMED
Severity: normal
Priority: Normal
Component: core
Assignee: dev@dpdk.org
Reporter: artemyko@nvidia.com
Target Milestone: ---
It seems the issue arose from changes in the DPDK read-write lock
implementation. Since those changes, the RW-lock no longer supports
recursion: a single thread must not take a read lock it already holds.
The problem occurs during initialization: rte_eal_memory_init() acquires
the memory_hotplug_lock, and later the call sequence eal_memalloc_init() ->
rte_memseg_list_walk() acquires it again without releasing it first. This
can deadlock because a pending write-lock request blocks new readers: the
second read acquisition waits behind the queued writer, while the writer
waits for the first read lock to be released. We worked around the issue
locally by replacing rte_memseg_list_walk() with
rte_memseg_list_walk_thread_unsafe().
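For illustration, a minimal sketch of the internal call pattern (not the
actual EAL source, and not runnable outside EAL initialization; count_msl
is a hypothetical callback shown only for the rte_memseg_list_walk_t
signature):

#include <rte_eal_memconfig.h>
#include <rte_memory.h>

/* Hypothetical callback, shown only to illustrate the API shape. */
static int
count_msl(const struct rte_memseg_list *msl, void *arg)
{
	(void)msl;
	(*(int *)arg)++;
	return 0;
}

static void
init_path_sketch(void)
{
	int n = 0;

	rte_mcfg_mem_read_lock();	/* first read acquisition (init path) */

	/* BUG: rte_memseg_list_walk() takes the same read lock again.
	 * With the non-recursive rwlock, this blocks forever once a
	 * writer has queued up in between. */
	rte_memseg_list_walk(count_msl, &n);

	/* Local fix: the lock is already held here, so walk without
	 * re-taking it. */
	rte_memseg_list_walk_thread_unsafe(count_msl, &n);

	rte_mcfg_mem_read_unlock();
}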
Reproduction:
Create an mp_deadlock directory under dpdk/examples/, then add the following main.c:
/* SPDX-License-Identifier: BSD-3-Clause
 * Copyright(c) 2010-2014 Intel Corporation
 */

#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <errno.h>
#include <sys/queue.h>

#include <rte_memory.h>
#include <rte_malloc.h>
#include <rte_launch.h>
#include <rte_eal.h>
#include <rte_per_lcore.h>
#include <rte_lcore.h>
#include <rte_debug.h>

/* Initialization of Environment Abstraction Layer (EAL). 8< */
int
main(int argc, char **argv)
{
	int ret;

	ret = rte_eal_init(argc, argv);
	if (ret < 0)
		rte_panic("Cannot init EAL\n");
	/* >8 End of initialization of Environment Abstraction Layer */

	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
		/* Primary: block on stdin so the process stays alive. */
		getchar();
	} else if (rte_lcore_id() <= 1) {
		/* Secondary on lcore 1: hammer the allocator so the
		 * memory_hotplug_lock is repeatedly write-locked. */
		int i = 0;
		void *p;

		while (1) {
			p = rte_malloc_socket(NULL, 0x1000000, 0x1000, -1);
			rte_free(p);
			printf("malloc %d times\n", i++);
		}
	}
	/* Secondary on any other lcore: init and exit immediately; the
	 * relaunch loop below re-runs EAL initialization until it hits
	 * the deadlock. */

	/* clean up the EAL */
	rte_eal_cleanup();
	return 0;
}
Compile: I followed https://doc.dpdk.org/guides/prog_guide/build_app.html and
some tips from related pages.
Run primary: ./examples/mp_deadlock/build/mp_deadlock -l 0 --file-prefix=dpdk1
--proc-type=primary
Run secondary 1: ./examples/mp_deadlock/build/mp_deadlock -l 1
--file-prefix=dpdk1 --proc-type=secondary
Run secondary 2:
while true
do
./examples/mp_deadlock/build/mp_deadlock -l 2 --file-prefix=dpdk1
--proc-type=secondary
done
Stack traces: the deadlock shows up in the following frames.
#0 0x00007f850e97a3f2 in rte_mcfg_mem_write_lock () from
/usr/local/lib64/librte_eal.so.23
And
#0 0x00007f3f591b5362 in rte_mcfg_mem_read_lock () from
/usr/local/lib64/librte_eal.so.23
[root@fedora dpdk]# ps -ef | grep deadlock
root 7328 1004 0 20:47 pts/0 00:00:00 bash ./mp_deadlock1.sh
root 7329 7328 4 20:47 pts/0 00:00:01
./examples/mp_deadlock/build/mp_deadlock -l 0 --file-prefix=dpdk1
--proc-type=primary
root 7333 5693 0 20:47 pts/4 00:00:00 bash ./mp_deadlock2.sh
root 7334 7333 94 20:47 pts/4 00:00:31
./examples/mp_deadlock/build/mp_deadlock -l 1 --file-prefix=dpdk1
--proc-type=secondary
root 7337 5267 0 20:47 pts/1 00:00:00 bash ./mp_deadlock.sh
root 7338 7337 98 20:47 pts/1 00:00:29
./examples/mp_deadlock/build/mp_deadlock -l 2 --file-prefix=dpdk1
--proc-type=secondary
root 7342 5480 0 20:47 pts/2 00:00:00 grep --color=auto deadlock
[root@fedora dpdk]# pstack 7329
Thread 4 (Thread 0x7f20ae487640 (LWP 7332) "telemetry-v2"):
#0 0x00007f20b200ae6f in accept () from /lib64/libc.so.6
#1 0x00007f20b1e004a3 in socket_listener () from
/usr/local/lib64/librte_telemetry.so.23
#2 0x00007f20b1f85b17 in start_thread () from /lib64/libc.so.6
#3 0x00007f20b200a6a0 in clone3 () from /lib64/libc.so.6
Thread 3 (Thread 0x7f20aec88640 (LWP 7331) "rte_mp_handle"):
#0 0x00007f20b200b23d in recvmsg () from /lib64/libc.so.6
#1 0x00007f20b2137ecf in mp_handle () from /usr/local/lib64/librte_eal.so.23
#2 0x00007f20b1f85b17 in start_thread () from /lib64/libc.so.6
#3 0x00007f20b200a6a0 in clone3 () from /lib64/libc.so.6
Thread 2 (Thread 0x7f20af489640 (LWP 7330) "eal-intr-thread"):
#0 0x00007f20b2009c7e in epoll_wait () from /lib64/libc.so.6
#1 0x00007f20b2141c54 in eal_intr_thread_main () from
/usr/local/lib64/librte_eal.so.23
#2 0x00007f20b1f85b17 in start_thread () from /lib64/libc.so.6
#3 0x00007f20b200a6a0 in clone3 () from /lib64/libc.so.6
Thread 1 (Thread 0x7f20b1df9900 (LWP 7329) "mp_deadlock"):
#0 0x00007f20b1ff984c in read () from /lib64/libc.so.6
#1 0x00007f20b1f7e914 in __GI__IO_file_underflow () from /lib64/libc.so.6
#2 0x00007f20b1f7f946 in _IO_default_uflow () from /lib64/libc.so.6
#3 0x00007f20b1f7a328 in getc () from /lib64/libc.so.6
#4 0x000000000040113e in main ()
[root@fedora dpdk]# pstack 7334
Thread 3 (Thread 0x7f850b4da640 (LWP 7336) "rte_mp_handle"):
#0 0x00007f850e85d23d in recvmsg () from /lib64/libc.so.6
#1 0x00007f850e989ecf in mp_handle () from /usr/local/lib64/librte_eal.so.23
#2 0x00007f850e7d7b17 in start_thread () from /lib64/libc.so.6
#3 0x00007f850e85c6a0 in clone3 () from /lib64/libc.so.6
Thread 2 (Thread 0x7f850bcdb640 (LWP 7335) "eal-intr-thread"):
#0 0x00007f850e85bc7e in epoll_wait () from /lib64/libc.so.6
#1 0x00007f850e993c54 in eal_intr_thread_main () from
/usr/local/lib64/librte_eal.so.23
#2 0x00007f850e7d7b17 in start_thread () from /lib64/libc.so.6
#3 0x00007f850e85c6a0 in clone3 () from /lib64/libc.so.6
Thread 1 (Thread 0x7f850e64b900 (LWP 7334) "mp_deadlock"):
#0 0x00007f850e97a3f2 in rte_mcfg_mem_write_lock () from
/usr/local/lib64/librte_eal.so.23
#1 0x00007f850e984509 in malloc_heap_free () from
/usr/local/lib64/librte_eal.so.23
#2 0x00007f850e98508f in rte_free () from /usr/local/lib64/librte_eal.so.23
#3 0x0000000000401126 in main ()
[root@fedora dpdk]# pstack 7338
Thread 3 (Thread 0x7f3f55d15640 (LWP 7340) "rte_mp_handle"):
#0 0x00007f3f5909823d in recvmsg () from /lib64/libc.so.6
#1 0x00007f3f591c4ecf in mp_handle () from /usr/local/lib64/librte_eal.so.23
#2 0x00007f3f59012b17 in start_thread () from /lib64/libc.so.6
#3 0x00007f3f590976a0 in clone3 () from /lib64/libc.so.6
Thread 2 (Thread 0x7f3f56516640 (LWP 7339) "eal-intr-thread"):
#0 0x00007f3f59096c7e in epoll_wait () from /lib64/libc.so.6
#1 0x00007f3f591cec54 in eal_intr_thread_main () from
/usr/local/lib64/librte_eal.so.23
#2 0x00007f3f59012b17 in start_thread () from /lib64/libc.so.6
#3 0x00007f3f590976a0 in clone3 () from /lib64/libc.so.6
Thread 1 (Thread 0x7f3f58e86900 (LWP 7338) "mp_deadlock"):
#0 0x00007f3f591b5362 in rte_mcfg_mem_read_lock () from
/usr/local/lib64/librte_eal.so.23
#1 0x00007f3f591b6bf2 in rte_memseg_list_walk () from
/usr/local/lib64/librte_eal.so.23
#2 0x00007f3f591d2f65 in eal_memalloc_init () from
/usr/local/lib64/librte_eal.so.23
#3 0x00007f3f591b741b in rte_eal_memory_init () from
/usr/local/lib64/librte_eal.so.23
#4 0x00007f3f591aab64 in rte_eal_init.cold () from
/usr/local/lib64/librte_eal.so.23
#5 0x00000000004010d9 in main ()
* Re: [Bug 1277] memory_hotplug_lock deadlock during initialization
2023-08-23 14:02 [Bug 1277] memory_hotplug_lock deadlock during initialization bugzilla
@ 2023-08-23 14:56 ` Stephen Hemminger
0 siblings, 0 replies; 2+ messages in thread
From: Stephen Hemminger @ 2023-08-23 14:56 UTC (permalink / raw)
To: bugzilla; +Cc: dev
> It seems the issue arose from changes in the DPDK read-write lock
> implementation. Since those changes, the RW-lock no longer supports
> recursion: a single thread must not take a read lock it already holds.
> The problem occurs during initialization: rte_eal_memory_init() acquires
> the memory_hotplug_lock, and later the call sequence eal_memalloc_init() ->
> rte_memseg_list_walk() acquires it again without releasing it first. This
> can deadlock because a pending write-lock request blocks new readers: the
> second read acquisition waits behind the queued writer, while the writer
> waits for the first read lock to be released. We worked around the issue
> locally by replacing rte_memseg_list_walk() with
> rte_memseg_list_walk_thread_unsafe().
Recursive read locks are a bad idea; the fact that this worked before was
an accident.

Please send a patch for review.
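For illustration, a minimal standalone sketch of why the recursion
deadlocks (an assumption-laden example, assuming the current non-recursive,
writer-preferring rte_rwlock): the second read acquisition below never
returns, because the queued writer blocks new readers while itself waiting
for the outer read lock to be released.

#include <pthread.h>
#include <unistd.h>

#include <rte_rwlock.h>

static rte_rwlock_t lock = RTE_RWLOCK_INITIALIZER;

static void *
writer(void *arg)
{
	(void)arg;
	rte_rwlock_write_lock(&lock);	/* queues behind the reader in main() */
	rte_rwlock_write_unlock(&lock);
	return NULL;
}

int
main(void)
{
	pthread_t t;

	rte_rwlock_read_lock(&lock);		/* outer read lock */
	pthread_create(&t, NULL, writer, NULL);
	sleep(1);				/* give the writer time to queue */
	rte_rwlock_read_lock(&lock);		/* recursive read: waits behind
						 * the queued writer -> hangs */
	return 0;
}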