Bug ID 1668
Summary EAL: rte_eal_mp_remote_launch is able to launch an lcore which is running
Product DPDK
Version 25.03
Hardware x86
OS Linux
Status UNCONFIRMED
Severity normal
Priority Normal
Component core
Assignee dev@dpdk.org
Reporter probb@iol.unh.edu
Target Milestone ---

Created attachment 303 [details]
Testlog

Branch: main

Environment info:

$ lscpu
Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Byte Order:                         Little Endian
Address sizes:                      45 bits physical, 48 bits virtual
CPU(s):                             16
On-line CPU(s) list:                0-15
Thread(s) per core:                 1
Core(s) per socket:                 1
Socket(s):                          16
NUMA node(s):                       2
Vendor ID:                          GenuineIntel
CPU family:                         6
Model:                              85
Model name:                         Intel(R) Xeon(R) Gold 6246 CPU @ 3.30GHz

# gcc --version
gcc (Debian 10.2.1-6) 10.2.1 20210110

$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"

From UNH CI, we had a failure in dpdk-test per_lcore_autotest:
rte_eal_mp_remote_launch launched an lcore which was already running. I have
attached the test logs, and the relevant excerpts are below.

78/119 DPDK:fast-tests / per_lcore_autotest             FAIL             1.14s (exit status 255 or signal 127 SIGinvalid)
16:49:56 DPDK_TEST=per_lcore_autotest MALLOC_PERTURB_=178 /root/workspace/Generic-Unit-Test-DPDK/dpdk/build/app/dpdk-test --no-huge -m 2048
----------------------------------- output -----------------------------------
stdout:
RTE>>per_lcore_autotest
on socket 0, on core 1, variable is 1
wait 100ms on lcore 1
It does remote launch successfully but it should not at this time
Test Failed
RTE>>wait 100ms on lcore 1
stderr:
EAL: Detected CPU lcores: 16
EAL: Detected NUMA nodes: 2
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: VFIO support initialized
APP: HPET is not enabled, using TSC as default timer
------------------------------------------------------------------------------

The test:

```
static int
test_per_lcore(void)
{
	unsigned lcore_id;
	int ret;

	rte_eal_mp_remote_launch(assign_vars, NULL, SKIP_MAIN);
	RTE_LCORE_FOREACH_WORKER(lcore_id) {
		if (rte_eal_wait_lcore(lcore_id) < 0)
			return -1;
	}

	rte_eal_mp_remote_launch(display_vars, NULL, SKIP_MAIN);
	RTE_LCORE_FOREACH_WORKER(lcore_id) {
		if (rte_eal_wait_lcore(lcore_id) < 0)
			return -1;
	}

	/* test if it could do remote launch twice at the same time or not */
	ret = rte_eal_mp_remote_launch(test_per_lcore_delay, NULL, SKIP_MAIN);
	if (ret < 0) {
		printf("It fails to do remote launch but it should able to do\n");
		return -1;
	}
	/* it should not be able to launch a lcore which is running */
	ret = rte_eal_mp_remote_launch(test_per_lcore_delay, NULL, SKIP_MAIN);
	if (ret == 0) {
		printf("It does remote launch successfully but it should not at this time\n");
		return -1;
	}
	RTE_LCORE_FOREACH_WORKER(lcore_id) {
		if (rte_eal_wait_lcore(lcore_id) < 0)
			return -1;
	}

	return 0;
}
```

The patch that triggered this CI run is unrelated to the failure. Its CI
results are here:
https://lab.dpdk.org/results/dashboard/patchsets/32742/

I will also note that Debian 11 is EOL and should be removed from our CI. We
will do this, but I felt that I should add this bug anyhow since it is unlikely
to be caused by the OS version.
          

