Hello,
This is an update on my research into this bug, now that I have had time to look at it again.
I wrote a minimal example program (code below) and ran it with debug enabled and with rte_malloc_debug, on both DPDK 20.11 and 22.07.
The results are the same on both versions and are also included below.
I now suspect a bug in DPDK's dynamic memory mode (the problem does not occur in legacy memory mode), possibly related to a long allocation time that causes a request to time out.
The application code is very minimal; at worst I would expect `rte_mempool_create` to return an error, not the process to crash.
I can provide more information about the system, firmware, and DPDK build configuration if it might be relevant.
The primary process code:
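```cpp
#include <rte_eal.h>
#include <rte_errno.h>
#include <cstdio>
#include <unistd.h>
int main(void) {
    // rte_eal_init treats argv[0] as the program name, so it must not be an EAL option
    const char* flags[] = {"primary", "-l", "1", "--no-pci"};
    if (rte_eal_init(sizeof(flags) / sizeof(char*), const_cast<char **>(flags)) < 0) {
        fprintf(stderr, "rte_eal_init failed: %s\n", rte_strerror(rte_errno));
        return 1;
    }
    printf("primary started\n");
    fflush(stdout);
    // Sleep until a signal arrives instead of busy-spinning a core with while (true) {}
    while (true) pause();
}
```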
The secondary process code:
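```cpp
#include <rte_eal.h>
#include <rte_errno.h>
#include <rte_mempool.h>
#include <cstdio>
int main(void) {
    // argv[0] is the program name; the remaining entries are EAL options
    const char* flags[] = {"secondary", "-l", "1", "--no-pci", "--proc-type", "secondary"};
    if (rte_eal_init(sizeof(flags) / sizeof(char*), const_cast<char **>(flags)) < 0) {
        fprintf(stderr, "rte_eal_init failed: %s\n", rte_strerror(rte_errno));
        return 1;
    }
    // 150M elements * 40 B = 6 GB of payload; per-object headers add more on top
    rte_mempool* pool = rte_mempool_create("my_pool", 150000000, 40, 0, 0, NULL,
                                           NULL, NULL, NULL, 0 /* socket_id */, 0 /* flags */);
    if (pool) {
        printf("allocation success\n");
    } else {
        // rte_mempool_create sets rte_errno on failure
        printf("allocation failure: %s\n", rte_strerror(rte_errno));
    }
    fflush(stdout);
    return 0;
}
```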
Output of the primary process:
```
EAL: Detected CPU lcores: 96
EAL: Detected NUMA nodes: 2
EAL: Detected shared linkage of DPDK
EAL: Multi-Process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
TELEMETRY: No legacy callbacks, legacy socket not created
primary started
```
Output of the secondary process:
```
EAL: Detected CPU lcores: 96
EAL: Detected NUMA nodes: 2
EAL: Detected shared linkage of DPDK
EAL: Multi-Process socket /var/run/dpdk/rte/mp_socket_.......
EAL: Selected IOVA mode 'PA'
EAL: Request timed out    <-- printed during rte_mempool_create
EAL: Request timed out
EAL: Request timed out
```
The secondary process then crashes with exit code 139.
The primary process still looks fine from the CLI, but the secondary process cannot be started again afterwards: it hangs at `EAL: Selected IOVA mode 'PA'`.
What should my next steps be for debugging, fixing, or reporting this?