From mboxrd@z Thu Jan 1 00:00:00 1970
From: Anna Tauzzi
Date: Wed, 24 Aug 2022 11:18:38 +0200
Subject: Re: Secondary process stuck in rte_eal_memory_init
To: Antonio Di Bacco
Cc: users@dpdk.org
List-Id: DPDK usage discussions
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="0000000000002f7d6805e6f92b81"

--0000000000002f7d6805e6f92b81
Content-Type: text/plain; charset="UTF-8"

Already tried the first suggestion with no luck; the secondary always gets stuck:

#0  0x00007fc6d3eb05ab in flock () at ../sysdeps/unix/syscall-template.S:78
#1  0x00007fc6d3ba1343 in sync_walk () from /usr/local/lib/librte_eal.so.22
#2  0x00007fc6d3b8402b in rte_memseg_list_walk_thread_unsafe () from /usr/local/lib/librte_eal.so.22
#3  0x00007fc6d3ba18bf in eal_memalloc_sync_with_primary () from /usr/local/lib/librte_eal.so.22
#4  0x00007fc6d3ba24b5 in rte_eal_hugepage_attach () from /usr/local/lib/librte_eal.so.22
#5  0x00007fc6d3b848f1 in rte_eal_memory_init () from /usr/local/lib/librte_eal.so.22
#6  0x00007fc6d3b782aa in rte_eal_init.cold () from /usr/local/lib/librte_eal.so.22

As for the second question: if I prevent the primary from allocating on the
NUMA node where the secondary is running, then the secondary doesn't get stuck.

On Wed, Aug 24, 2022 at 11:14 AM Antonio Di Bacco <a.dibacco.ks@gmail.com> wrote:

> Can you try launching the secondary with some delay so that it does not
> overlap with memory allocations done in the primary?
> Is your primary allocating memory on NUMA 0, where the secondary is running?
>
> On Tue, Aug 23, 2022 at 4:54 PM Anna Tauzzi wrote:
> >
> > I have a primary process that spawns a secondary process. The primary is
> > on NUMA 1 while the secondary is on NUMA 0.
> > The secondary process starts up, but when calling rte_eal_init it gets
> > stuck with this backtrace:
> >
> > flock()
> > sync_walk()
> > rte_memseg_list_walk_thread_unsafe()
> > eal_memalloc_sync_with_primary()
> > rte_eal_hugepage_attach()
> > rte_eal_memory_init()
> > rte_eal_init.cold()
> >
> > While the secondary is starting, it is possible that the primary is
> > allocating memory on different NUMA nodes. I'm saying this because if, in
> > the primary, I replace the DPDK memory allocation function (rte_zalloc...)
> > with a plain memalign, I don't get this problem.

--0000000000002f7d6805e6f92b81
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--0000000000002f7d6805e6f92b81--
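
Frame #0 of the backtrace above is a blocking flock() call, so the secondary appears to be waiting for a lock that some other process holds. The sketch below is not DPDK code; it only illustrates, with plain flock(), how to ask without blocking whether an exclusive lock on a given path could be taken right now. The function name lock_is_free and the path passed to it are hypothetical, and which file or directory the EAL actually locks is not stated in this thread.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical helper: probe whether an exclusive flock() on `path`
 * would succeed immediately.
 *
 * Returns  1 if the lock could be taken (it is released again at once),
 *          0 if another open file description holds a conflicting lock,
 *         -1 on any other error (for example, the path does not exist).
 *
 * The path is opened read-only, which is enough for flock() and also
 * works when the path is a directory.
 */
int lock_is_free(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* LOCK_NB makes flock() fail with EWOULDBLOCK instead of waiting. */
    if (flock(fd, LOCK_EX | LOCK_NB) == 0) {
        flock(fd, LOCK_UN);
        close(fd);
        return 1;
    }

    /* Read errno before close() has a chance to overwrite it. */
    int busy = (errno == EWOULDBLOCK);
    close(fd);
    return busy ? 0 : -1;
}
```

A return value of 0 while the secondary is hanging would suggest that the lock is still held elsewhere at that moment; it does not by itself say which process holds it or why.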