MIME-Version: 1.0
References: <20220727153055.0907ea35@sovereign> <20220728161020.179f1fc7@sovereign>
From: MOD <sdk.register@gmail.com>
Date: Wed, 7 Sep 2022 14:00:30 +0300
Subject: Re: Mempool bigger than 1 page causes segmentation fault
To: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Cc: users@dpdk.org
Content-Type: multipart/alternative; boundary="0000000000009a71bf05e8143bc2"
X-BeenThere: users@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK usage discussions
Errors-To: users-bounces@dpdk.org

--0000000000009a71bf05e8143bc2
Content-Type: text/plain; charset="UTF-8"

Hello,

This is an update on this bug investigation, now that I have had time to look at it again.

I have created an example program (code below) and tried it with a debug build and RTE_MALLOC_DEBUG, using DPDK 20.11 and 22.07. The results are the same for both versions and are included below.

I now suspect it could be a bug in DPDK's dynamic memory mode (it doesn't happen in legacy mode), and that it may be related to a long allocation time causing a timeout.

The application code is very minimal and should, at most, get an error from `rte_mempool_create`.

More information about the system, firmware and DPDK compilation can be provided if it may be related to that.

The primary process code:

    #include <cstdio>
    #include <rte_eal.h>

    int main(void)
    {
        // NOTE: EAL treats argv[0] as the program name and starts parsing
        // options at argv[1], so "-l" is taken as the name here.
        const char* flags[] = {"-l", "1", "--no-pci"};
        // const_cast is a keyword, not a member of namespace std
        rte_eal_init(sizeof(flags) / sizeof(char*), const_cast<char**>(flags));
        printf("primary started");
        while (true) {}  // keep the primary alive for the secondary
        return 0;
    }

The secondary process code:

    #include <cstdio>
    #include <rte_eal.h>
    #include <rte_mempool.h>

    int main(void)
    {
        // Same argv[0] caveat as in the primary process.
        const char* flags[] = {"-l", "1", "--no-pci", "--proc-type", "secondary"};
        rte_eal_init(sizeof(flags) / sizeof(char*), const_cast<char**>(flags));
        // 150M elements * 40B = 6GB mempool
        rte_mempool* pool = rte_mempool_create("my_pool", 150000000, 40, 0, 0,
                                               NULL, NULL, NULL, NULL, 0, 0);
        if (pool) {
            printf("allocation success");
        } else {
            printf("allocation failure");
        }
        fflush(stdout);
        return 0;
    }

The result in the primary process:

    EAL: Detected CPU lcores: 96
    EAL: Detected NUMA nodes: 2
    EAL: Detected shared linkage of DPDK
    EAL: Multi-Process socket /var/run/dpdk/rte/mp_socket
    EAL: Selected IOVA mode 'PA'
    TELEMETRY: No legacy callbacks, legacy socket not created
    primary started

The results in the secondary process:

    EAL: Detected CPU lcores: 96
    EAL: Detected NUMA nodes: 2
    EAL: Detected shared linkage of DPDK
    EAL: Multi-Process socket /var/run/dpdk/rte/mp_socket_.......
    EAL: Selected IOVA mode 'PA'
    EAL: Request timed out   // <--------------- This is the rte_mempool_create
    EAL: Request timed out
    EAL: Request timed out
    *** crashes with retcode 139

The main process looks fine from the CLI, but the secondary will not be able to start again (it gets stuck at EAL: Selected IOVA mode 'PA').

What should my next step be, as far as debugging / solving / reporting this?

On Sun, Jul 31, 2022 at 2:32 PM MOD <sdk.register@gmail.com> wrote:

> Hi,
> The issue is probably not with my code but with the compilation of DPDK,
> because I got it to repeat in a separate program,
> where I set up an EAL with the flags `-l 1 --no-pci`
> (just rte_eal_init and rte_mempool_create)
>
> this seems to be a memseg_list issue
> When running the program above, and requesting large amounts of memory
> (200M elements of 8 bytes each)
> I don't crash, but get `couldnt find suitable memseg_list` error
> This also happens when trying to allocate from the main process
>
> This error is probably related to these parameters from rte_config.h:
> /* EAL defines */
> #define RTE_MAX_HEAPS 32
> #define RTE_MAX_MEMSEG_LISTS 128
> #define RTE_MAX_MEMSEG_PER_LIST 8192
> #define RTE_MAX_MEM_MB_PER_LIST 32768
> #define RTE_MAX_MEMSEG_PER_TYPE 32768
> #define RTE_MAX_MEM_MB_PER_TYPE 65536
> #define RTE_MAX_MEMZONE 2560
> #define RTE_MAX_TAILQ 32
>
> I could not find good documentation on how to calculate the proper
> values for these parameters
>
> On Thu, Jul 28, 2022 at 4:10 PM Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
> wrote:
>
>> 2022-07-28 15:05 (UTC+0300), MOD:
>> > Hi, Thanks for the response!
>> > the DPDK version is 20.11.4
>> >
>> > the stack trace is:
>> > malloc_elem_can_hold() // librte_eal.so.21
>> > find_suitable_element() // librte_eal.so.21
>> > malloc_heap_alloc() // librte_eal.so.21
>> > rte_memzone_reserve_thread_safe() // librte_eal.so.21
>> > rte_mempool_populate_default() // librte_mempool.so.21
>> > rte_mempool_create() // librte_mempool.so.21
>>
>> Is this all the info---no arguments, no lines?
>> You're using a debug build of DPDK, right?
>>
>> > RTE_MALLOC_DEBUG doesn't seem to change anything,
>> > but I noticed that I have been wrong about the allocation succeeding
>> > (not because of RTE_MALLOC_DEBUG)
>> >
>> > the error happens right on the first attempt.
>>
>> Did you try running with ASAN (meson -Db_sanitize=address)?
>>
>> Can you provide a short code to reproduce
>> or does it happen only in a larger program?
>>
>> Please keep Cc: users@dpdk.org so that more people can join if they want.
>>
>

--0000000000009a71bf05e8143bc2
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--0000000000009a71bf05e8143bc2--