From mboxrd@z Thu Jan 1 00:00:00 1970
From: Anna Tauzzi
Date: Wed, 24 Aug 2022 12:11:28 +0200
Subject: Re: Secondary process stuck in rte_eal_memory_init
To: Antonio Di Bacco
Cc: users@dpdk.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="00000000000020c20605e6f9e8e2"
List-Id: DPDK usage discussions

--00000000000020c20605e6f9e8e2
Content-Type: text/plain; charset="UTF-8"

Using the lslocks command on Linux, I see that the primary holds a lock on
/mnt/huge2M and the secondary is waiting for a lock on the same directory
(WRITE* means the process is blocked; the trailing PID on that row is the
blocker, which here is the primary):

SECONDARY  2416270  FLOCK  WRITE*  0  0  0  /mnt/huge2M...  2416174
PRIMARY    2416174  FLOCK  WRITE   0  0  0  /mnt/huge2M...

Is a primary process supposed to hold a permanent lock on /mnt/huge2M?

On Wed, 24 Aug 2022 at 11:18, Anna Tauzzi wrote:

> I already tried the first suggestion with no luck; the secondary always
> gets stuck:
>
> #0 0x00007fc6d3eb05ab in flock () at ../sysdeps/unix/syscall-template.S:78
> #1 0x00007fc6d3ba1343 in sync_walk () from /usr/local/lib/librte_eal.so.22
> #2 0x00007fc6d3b8402b in rte_memseg_list_walk_thread_unsafe () from /usr/local/lib/librte_eal.so.22
> #3 0x00007fc6d3ba18bf in eal_memalloc_sync_with_primary () from /usr/local/lib/librte_eal.so.22
> #4 0x00007fc6d3ba24b5 in rte_eal_hugepage_attach () from /usr/local/lib/librte_eal.so.22
> #5 0x00007fc6d3b848f1 in rte_eal_memory_init () from /usr/local/lib/librte_eal.so.22
> #6 0x00007fc6d3b782aa in rte_eal_init.cold () from /usr/local/lib/librte_eal.so.22
>
> As for the second question: if I prevent the primary from allocating on
> the NUMA node where the secondary is running, the secondary doesn't get
> stuck.
>
> On Wed, 24 Aug 2022 at 11:14, Antonio Di Bacco <a.dibacco.ks@gmail.com> wrote:
>
>> Can you try launching the secondary with some delay in order not to
>> overlap with memory allocations done in the primary?
>> Is your primary allocating memory on NUMA 0 where the secondary is
>> running?
>>
>> On Tue, Aug 23, 2022 at 4:54 PM Anna Tauzzi wrote:
>> >
>> > I have a primary process that spawns a secondary process. The primary
>> > is on NUMA 1 while the secondary is on NUMA 0.
>> > The secondary process starts up, but when calling rte_eal_init it gets
>> > stuck with this backtrace:
>> >
>> > flock()
>> > sync_walk()
>> > rte_memseg_list_walk_thread_unsafe()
>> > eal_memalloc_sync_with_primary()
>> > rte_eal_hugepage_attach()
>> > rte_eal_memory_init()
>> > rte_eal_init.cold()
>> >
>> > While the secondary is starting, it is possible that the primary is
>> > allocating memory on different NUMA nodes. I'm saying this because if,
>> > in the primary, I replace the DPDK memory allocation functions
>> > (rte_zalloc...) with a plain memalign, I don't get this problem.
--00000000000020c20605e6f9e8e2--