* Secondary process stuck in rte_eal_memory_init
@ 2022-08-23 14:54 Anna Tauzzi
2022-08-24 9:14 ` Antonio Di Bacco
0 siblings, 1 reply; 4+ messages in thread
From: Anna Tauzzi @ 2022-08-23 14:54 UTC (permalink / raw)
To: users
I have a primary process that spawns a secondary process. The primary runs
on NUMA node 1, while the secondary runs on NUMA node 0.
The secondary process starts up, but when it calls rte_eal_init it gets
stuck with this backtrace:
flock()
sync_walk()
rte_memseg_list_walk_thread_unsafe()
eal_memalloc_sync_with_primary()
rte_eal_hugepage_attach()
rte_eal_memory_init()
rte_eal_init.cold()
While the secondary is starting, the primary may be allocating memory on
different NUMA nodes. I suspect this is related because if I replace the
DPDK memory allocation functions (rte_zalloc...) in the primary with a plain
memalign, the problem does not occur.
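The backtrace above ends in flock(), which blocks until any conflicting lock
is released. As a minimal sketch of that behavior (this is not DPDK code; the
temporary directory merely stands in for a hugepage mount, and lock_is_held is
a hypothetical helper name), the following Python snippet shows a second open
file description failing to take an exclusive lock while a first one holds it:

```python
import fcntl
import os
import tempfile


def lock_is_held(path):
    """Return True if some other open file description holds a flock on path."""
    fd = os.open(path, os.O_RDONLY)
    try:
        try:
            # LOCK_NB makes flock fail immediately instead of blocking.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return True
        fcntl.flock(fd, fcntl.LOCK_UN)
        return False
    finally:
        os.close(fd)


# Assumption: a temporary directory plays the role of the hugepage directory.
demo_dir = tempfile.mkdtemp()

# The "primary" side takes an exclusive lock and keeps it.
holder_fd = os.open(demo_dir, os.O_RDONLY)
fcntl.flock(holder_fd, fcntl.LOCK_EX)
locked_while_held = lock_is_held(demo_dir)

# Once the holder releases the lock, a new request succeeds.
fcntl.flock(holder_fd, fcntl.LOCK_UN)
os.close(holder_fd)
locked_after_release = lock_is_held(demo_dir)

os.rmdir(demo_dir)
```

Without LOCK_NB, the second request would simply sleep inside flock(), which
is what the secondary's backtrace looks like.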
* Re: Secondary process stuck in rte_eal_memory_init
From: Antonio Di Bacco @ 2022-08-24 9:14 UTC (permalink / raw)
To: Anna Tauzzi; +Cc: users
Can you try launching the secondary after a delay, so that it does not
overlap with the memory allocations done in the primary?
Is your primary allocating memory on NUMA node 0, where the secondary is
running?
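A fixed delay only helps if the contention is transient. One way to tell the
difference, sketched here under the assumption that the contended lock is a
flock() on a directory (as the flock() frame in the backtrace suggests), is to
poll with a timeout instead of blocking forever. acquire_with_timeout and its
parameters are hypothetical names, not part of DPDK:

```python
import fcntl
import os
import tempfile
import time


def acquire_with_timeout(path, timeout_s, poll_s=0.05):
    """Poll for an exclusive flock on path.

    Returns the locked file descriptor on success, or None on timeout.
    """
    fd = os.open(path, os.O_RDONLY)
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return fd
        except BlockingIOError:
            if time.monotonic() >= deadline:
                os.close(fd)
                return None
            time.sleep(poll_s)


# Assumption: a temporary directory stands in for the hugepage directory.
demo_dir = tempfile.mkdtemp()

# Simulate a holder that never lets go during the probe window.
holder_fd = os.open(demo_dir, os.O_RDONLY)
fcntl.flock(holder_fd, fcntl.LOCK_EX)
fd_while_held = acquire_with_timeout(demo_dir, timeout_s=0.2)

# After release, the same probe succeeds immediately.
fcntl.flock(holder_fd, fcntl.LOCK_UN)
os.close(holder_fd)
fd_after_release = acquire_with_timeout(demo_dir, timeout_s=0.2)
got_lock_after_release = fd_after_release is not None
if got_lock_after_release:
    os.close(fd_after_release)

os.rmdir(demo_dir)
```

If a probe like this still times out long after the primary has finished its
allocations, the lock is being held persistently rather than briefly.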
* Re: Secondary process stuck in rte_eal_memory_init
From: Anna Tauzzi @ 2022-08-24 9:18 UTC (permalink / raw)
To: Antonio Di Bacco; +Cc: users
I already tried the first suggestion with no luck; the secondary always
gets stuck:
#0 0x00007fc6d3eb05ab in flock () at ../sysdeps/unix/syscall-template.S:78
#1 0x00007fc6d3ba1343 in sync_walk () from /usr/local/lib/librte_eal.so.22
#2 0x00007fc6d3b8402b in rte_memseg_list_walk_thread_unsafe () from /usr/local/lib/librte_eal.so.22
#3 0x00007fc6d3ba18bf in eal_memalloc_sync_with_primary () from /usr/local/lib/librte_eal.so.22
#4 0x00007fc6d3ba24b5 in rte_eal_hugepage_attach () from /usr/local/lib/librte_eal.so.22
#5 0x00007fc6d3b848f1 in rte_eal_memory_init () from /usr/local/lib/librte_eal.so.22
#6 0x00007fc6d3b782aa in rte_eal_init.cold () from /usr/local/lib/librte_eal.so.22
As for the second question: if I prevent the primary from allocating on
the NUMA node where the secondary is running, the secondary doesn't get
stuck.
* Re: Secondary process stuck in rte_eal_memory_init
From: Anna Tauzzi @ 2022-08-24 10:11 UTC (permalink / raw)
To: Antonio Di Bacco; +Cc: users
Using the lslocks command on Linux, I see that the primary holds a lock on
/mnt/huge2M and the secondary is waiting for a lock on the same directory:
SECONDARY 2416270 FLOCK WRITE* 0 0 0 /mnt/huge2M... 2416174
PRIMARY   2416174 FLOCK WRITE  0 0 0 /mnt/huge2M...
Is the primary supposed to hold a permanent lock on /mnt/huge2M?
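In lslocks output, an asterisk after the mode marks a request that is blocked
waiting for the lock. Below is a small sketch for pulling the waiter and
blocker PIDs out of lines like the ones in this message, with the blocker PID
joined onto the waiter's line. The column layout (COMMAND PID TYPE MODE M
START END PATH, then an optional blocker PID, with an empty SIZE column) is an
assumption based on the excerpt here, and find_waiters is a hypothetical
helper:

```python
def find_waiters(lslocks_lines):
    """Return (waiting_pid, blocker_pid, path) for each blocked lock request.

    Assumes whitespace-separated fields in the order
    COMMAND PID TYPE MODE M START END PATH [BLOCKER].
    """
    waiters = []
    for line in lslocks_lines:
        fields = line.split()
        if len(fields) < 8:
            continue  # not a lock row in the assumed layout
        pid, mode, path = int(fields[1]), fields[3], fields[7]
        # A trailing '*' on the mode means the process is blocked on the lock.
        if mode.endswith("*"):
            blocker = int(fields[8]) if len(fields) > 8 else None
            waiters.append((pid, blocker, path))
    return waiters


# Sample rows copied from this message (the path is truncated in the original).
sample = [
    "SECONDARY 2416270 FLOCK WRITE* 0 0 0 /mnt/huge2M... 2416174",
    "PRIMARY 2416174 FLOCK WRITE 0 0 0 /mnt/huge2M...",
]
result = find_waiters(sample)
```

Here result pairs the secondary's PID with the primary's PID as its blocker,
which matches the reading of the output given above.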