https://bugs.dpdk.org/show_bug.cgi?id=1668

Bug ID: 1668
Summary: EAL: rte_eal_mp_remote_launch is able to launch an lcore which is running
Product: DPDK
Version: 25.03
Hardware: x86
OS: Linux
Status: UNCONFIRMED
Severity: normal
Priority: Normal
Component: core
Assignee: dev@dpdk.org
Reporter: probb@iol.unh.edu
Target Milestone: ---

Created attachment 303
  --> https://bugs.dpdk.org/attachment.cgi?id=303&action=edit
Testlog

Branch: main

Environment info:

$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 45 bits physical, 48 bits virtual
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 16
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6246 CPU @ 3.30GHz

# gcc --version
gcc (Debian 10.2.1-6) 10.2.1 20210110

cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"

In UNH CI, dpdk-test per_lcore_autotest failed because rte_eal_mp_remote_launch launched an lcore which was already running. I have attached the test log, and the relevant excerpts are below.

78/119 DPDK:fast-tests / per_lcore_autotest FAIL 1.14s (exit status 255 or signal 127 SIGinvalid)

16:49:56 DPDK_TEST=per_lcore_autotest MALLOC_PERTURB_=178 /root/workspace/Generic-Unit-Test-DPDK/dpdk/build/app/dpdk-test --no-huge -m 2048
----------------------------------- output -----------------------------------
stdout:
RTE>>per_lcore_autotest
on socket 0, on core 1, variable is 1
wait 100ms on lcore 1
It does remote launch successfully but it should not at this time
Test Failed
RTE>>wait 100ms on lcore 1
stderr:
EAL: Detected CPU lcores: 16
EAL: Detected NUMA nodes: 2
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: VFIO support initialized
APP: HPET is not enabled, using TSC as default timer
------------------------------------------------------------------------------

The test:

```
static int
test_per_lcore(void)
{
	unsigned lcore_id;
	int ret;

	rte_eal_mp_remote_launch(assign_vars, NULL, SKIP_MAIN);
	RTE_LCORE_FOREACH_WORKER(lcore_id) {
		if (rte_eal_wait_lcore(lcore_id) < 0)
			return -1;
	}

	rte_eal_mp_remote_launch(display_vars, NULL, SKIP_MAIN);
	RTE_LCORE_FOREACH_WORKER(lcore_id) {
		if (rte_eal_wait_lcore(lcore_id) < 0)
			return -1;
	}

	/* test if it could do remote launch twice at the same time or not */
	ret = rte_eal_mp_remote_launch(test_per_lcore_delay, NULL, SKIP_MAIN);
	if (ret < 0) {
		printf("It fails to do remote launch but it should able to do\n");
		return -1;
	}
	/* it should not be able to launch a lcore which is running */
	ret = rte_eal_mp_remote_launch(test_per_lcore_delay, NULL, SKIP_MAIN);
	if (ret == 0) {
		printf("It does remote launch successfully but it should not at this time\n");
		return -1;
	}
	RTE_LCORE_FOREACH_WORKER(lcore_id) {
		if (rte_eal_wait_lcore(lcore_id) < 0)
			return -1;
	}

	return 0;
}
```

The patch that triggered the failing run is unrelated to the failure. Its CI results are here:

Patch: https://lab.dpdk.org/results/dashboard/patchsets/32742/

I will also note that Debian 11 is EOL and should be removed from our CI. We will do this, but I felt that I should file this bug anyhow since the failure is unlikely to be caused by the OS version.
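
For reference, here is a minimal standalone model of the behaviour the second rte_eal_mp_remote_launch call in the test is expected to show. It does not use DPDK and is not the EAL implementation: the names launch_all, release_and_wait_all, NUM_WORKERS and gate_open are hypothetical, and the -EBUSY return value is my reading of the documented API (a launch is refused when at least one worker is not in the WAIT state). Unlike the test, which keeps workers busy for 100 ms, the model holds workers on an explicit gate so that the result does not depend on timing.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_WORKERS 2	/* hypothetical worker count for the model */

enum worker_state { WAIT, RUNNING };

static atomic_int state[NUM_WORKERS];	/* zero-initialised, so all WAIT */
static atomic_bool gate_open;		/* workers stay busy until this is set */
static pthread_t threads[NUM_WORKERS];

/* Worker body: stay busy until the gate opens, then go back to WAIT. */
static void *
worker_main(void *arg)
{
	int id = (int)(intptr_t)arg;

	while (!atomic_load(&gate_open))
		sched_yield();
	atomic_store(&state[id], WAIT);
	return NULL;
}

/* Model of a "launch on all workers" call: refuse if any worker is busy. */
static int
launch_all(void)
{
	int i;

	for (i = 0; i < NUM_WORKERS; i++) {
		if (atomic_load(&state[i]) != WAIT)
			return -EBUSY;
	}

	atomic_store(&gate_open, false);
	for (i = 0; i < NUM_WORKERS; i++) {
		int rc;

		/* Mark the worker busy before its thread starts running. */
		atomic_store(&state[i], RUNNING);
		rc = pthread_create(&threads[i], NULL, worker_main,
				    (void *)(intptr_t)i);
		assert(rc == 0);
	}
	return 0;
}

/* Let the workers finish and wait for them.
 * Only valid after a launch_all() call that returned 0.
 */
static void
release_and_wait_all(void)
{
	int i;

	atomic_store(&gate_open, true);
	for (i = 0; i < NUM_WORKERS; i++)
		pthread_join(threads[i], NULL);
}
```

In this model a first launch_all() returns 0, a second call made while the workers are still held returns -EBUSY, and a call made after release_and_wait_all() returns 0 again. This matches the sequence the test asserts. Build with -pthread.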