Using the lslocks command on Linux, I see that the primary holds a lock on /mnt/huge2M and the secondary is waiting for a lock on the same directory.

SECONDARY  2416270 FLOCK      WRITE* 0     0          0 /mnt/huge2M...    2416174
PRIMARY         2416174 FLOCK      WRITE  0     0          0 /mnt/huge2M...

Is the primary supposed to hold a permanent lock on /mnt/huge2M?




On Wed, Aug 24, 2022 at 11:18 AM Anna Tauzzi <admin@argonnetech.net> wrote:
I already tried the first suggestion with no luck; the secondary always gets stuck:

#0  0x00007fc6d3eb05ab in flock () at ../sysdeps/unix/syscall-template.S:78
#1  0x00007fc6d3ba1343 in sync_walk () from /usr/local/lib/librte_eal.so.22
#2  0x00007fc6d3b8402b in rte_memseg_list_walk_thread_unsafe () from /usr/local/lib/librte_eal.so.22
#3  0x00007fc6d3ba18bf in eal_memalloc_sync_with_primary () from /usr/local/lib/librte_eal.so.22
#4  0x00007fc6d3ba24b5 in rte_eal_hugepage_attach () from /usr/local/lib/librte_eal.so.22
#5  0x00007fc6d3b848f1 in rte_eal_memory_init () from /usr/local/lib/librte_eal.so.22
#6  0x00007fc6d3b782aa in rte_eal_init.cold () from /usr/local/lib/librte_eal.so.22

As for the second question:
if I prevent the primary from allocating on the NUMA node where the secondary is running, the secondary doesn't get stuck.




On Wed, Aug 24, 2022 at 11:14 AM Antonio Di Bacco <a.dibacco.ks@gmail.com> wrote:
Can you try launching the secondary with some delay in order not to
overlap with memory allocations done in the primary?
Is your primary allocating memory on NUMA 0 where the secondary is running?

On Tue, Aug 23, 2022 at 4:54 PM Anna Tauzzi <admin@argonnetech.net> wrote:
>
> I have a primary process that spawns a secondary process. The primary is on NUMA node 1, while the secondary is on NUMA node 0.
> The secondary process starts up, but when it calls rte_eal_init it gets stuck with this backtrace:
>
> flock()
> sync_walk()
> rte_memseg_list_walk_thread_unsafe()
> eal_memalloc_sync_with_primary()
> rte_eal_hugepage_attach()
> rte_eal_memory_init()
> rte_eal_init.cold()
>
> While the secondary is starting, the primary may be allocating memory on different NUMA nodes. I say this because if I replace the DPDK memory allocation functions (rte_zalloc...) in the primary with a plain memalign, I don't get this problem.
>
>
>