From: Linzhe Lee
Date: Wed, 27 Dec 2023 11:13:56 +0800
Subject: memory_hotplug_lock deadlock during initialization in Multi-process Mode on DPDK Version 22.11.3 LTS
To: dev@dpdk.org

Dear Team,

I hope this message finds you well.

We have encountered a recurring deadlock within the function rte_rwlock_write_lock in DPDK 22.11.3 LTS. It appears to be related to the known issue tracked at https://bugs.dpdk.org/show_bug.cgi?id=1277, which was subsequently resolved in version 23.11. I would like to propose backporting this fix to the 22.11 branch, given its status as a long-term support (LTS) release.

The deadlock prevents the secondary process from completing initialization, so it cannot function correctly.

Here is the secondary process's call stack during initialization:

```
#0  0x00000000013dd604 in rte_mcfg_mem_read_lock ()
#1  0x00000000013def02 in rte_memseg_list_walk ()
#2  0x00000000013fbc85 in eal_memalloc_init ()
#3  0x00000000013df73b in rte_eal_memory_init ()
#4  0x0000000000889cf5 in rte_eal_init.cold ()
#5  0x000000000088d094 in main () at ../app/status_server/main.cc:96
#6  0x00007ffff678e555 in __libc_start_main () from /lib64/libc.so.6
#7  0x00000000009ca80d in _start () at /opt/rh/devtoolset-9/root/usr/include/c++/9/bits/shared_ptr_base.h:1169
```

The main (primary) program's threads during this deadlock are as follows:

```
(gdb) thread 1
[Switching to thread 1 (Thread 0x7ffff7fdec00 (LWP 20071))]
#0  0x00007ffff6b1d85d in nanosleep () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff6b1d85d in nanosleep () from /lib64/libc.so.6
#1  0x00007ffff6b1d6f4 in sleep () from /lib64/libc.so.6
#2  0x00000000006e1f24 in lcore_main (pInfo=) at ../app/main/main.c:682
#3  main () at ../app/main/main.c:1174
#4  0x00007ffff6a7a555 in __libc_start_main () from /lib64/libc.so.6
#5  0x000000000081f57d in _start ()
(gdb) thread 2
[Switching to thread 2 (Thread 0x7ffff3c50700 (LWP 20166))]
#0  0x00007ffff6e349dd in accept () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007ffff6e349dd in accept () from /lib64/libpthread.so.0
#1  0x0000000001172b23 in socket_listener ()
#2  0x00007ffff6e2dea5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007ffff6b568dd in clone () from /lib64/libc.so.6
(gdb) thread 3
[Switching to thread 3 (Thread 0x7ffff4451700 (LWP 20157))]
#0  0x00007ffff6e34bad in recvmsg () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007ffff6e34bad in recvmsg () from /lib64/libpthread.so.0
#1  0x000000000115fce7 in mp_handle ()
#2  0x00007ffff6e2dea5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007ffff6b568dd in clone () from /lib64/libc.so.6
(gdb) thread 4
[Switching to thread 4 (Thread 0x7ffff4c52700 (LWP 20156))]
#0  0x00007ffff6b56eb3 in epoll_wait () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff6b56eb3 in epoll_wait () from /lib64/libc.so.6
#1  0x0000000001169be4 in eal_intr_thread_main ()
#2  0x00007ffff6e2dea5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007ffff6b568dd in clone () from /lib64/libc.so.6
```

Your assistance in resolving this matter, or guidance on a workaround, would be greatly appreciated.

Thank you for your attention to this issue.