From: Lucas
Date: Thu, 20 Feb 2025 15:19:11 +0100
Subject: Re: Increase DPDK's virtual memory to more than 512 GiB
To: users@dpdk.org

Hi Dmitry,

My apologies for the private reply; I am quite new to the mailing list.
I will profile the allocation later.
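For the profile, I plan to start by simply wall-clocking the pool creation
as a whole. A minimal sketch of what I have in mind (NUM_MBUFS and
PRIV_SIZE are placeholders for my actual values):

#include <stdio.h>
#include <time.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>

#define NUM_MBUFS 229779411u    /* placeholder: my ~500 GB mbuf count */
#define PRIV_SIZE 48            /* placeholder: my private data size  */

static double now_sec(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
}

static struct rte_mempool *create_pool_timed(void)
{
        double t0 = now_sec();
        /* Default buffer size (2176 bytes), pool on the local socket. */
        struct rte_mempool *mp = rte_pktmbuf_pool_create("big_pool",
                        NUM_MBUFS, 512, PRIV_SIZE,
                        RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());

        printf("pool creation took %.1f s\n", now_sec() - t0);
        return mp;
}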
As for the issue: at 550 GB, i.e. 252'757'352 mbufs (using the default
mbuf buffer size), it now works, including the ring allocation. However,
physical memory usage now goes up a lot and I end up swapping. It appears
that not all memory is in hugepages (not all pages are filled) and that
the kernel perhaps allocates additional memory as well. I have 755 GiB of
RAM available, so a 600 GB mempool is pushing it.

I realise now that I also keep some private data in the mempool, so the
550 GB figure is plainly wrong. In reality, one object is:

   - rte_mempool_calc_obj_size gives: total_size = 2240 bytes
   - private data per mbuf (alignment included): 48 bytes

So the actual memory consumption is 252'757'352 mbufs × (2240 + 48) bytes
= 578'308'821'376 bytes ≈ 578 GB, at least 28 GB more than intended.

I have now fixed my program so that a request for 500 GB takes the
private data and headroom into account.

I will follow up later with some memory statistics and a profile.
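Concretely, the sizing logic I switched to looks roughly like this (a
minimal sketch; MEM_BUDGET stands in for the requested 500 GB, and the
element size is built the same way rte_pktmbuf_pool_create() builds it):

#include <rte_mbuf.h>
#include <rte_mempool.h>

#define PRIV_SIZE  48                               /* my private area */
#define MEM_BUDGET (500ULL * 1000 * 1000 * 1000)    /* 500 GB target   */

static unsigned int mbufs_for_budget(void)
{
        struct rte_mempool_objsz sz;
        /* Element size as rte_pktmbuf_pool_create() builds it:
         * mbuf struct + private area + buffer (headroom included). */
        uint32_t elt_size = sizeof(struct rte_mbuf) + PRIV_SIZE +
                        RTE_MBUF_DEFAULT_BUF_SIZE;
        /* total_size adds the per-object mempool header, trailer and
         * padding on top of elt_size. */
        uint32_t total_size = rte_mempool_calc_obj_size(elt_size, 0, &sz);

        return (unsigned int)(MEM_BUDGET / total_size);
}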
On Thu, Feb 20, 2025 at 12:21 PM Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> wrote:
> Hi Lucas,
>
> Please don't send private replies to discussions on the mailing list.
>
> 2025-02-20 12:00 (UTC+0100), Lucas:
> > Hi Dmitry,
> >
> > Thank you for your detailed instructions.
> > I have followed them in the following way:
> >
> >    - In config/rte_config.h, set RTE_MAX_MEM_MB_PER_LIST to 524288.
> >    - In config/rte_config.h, set RTE_MAX_MEM_MB_PER_TYPE to 524288.
> >    - To change RTE_MAX_MEM_MB, I have to change the setting in the
> >      meson build system. To do so, I edited
> >      https://elixir.bootlin.com/dpdk/v24.11.1/source/config/meson.build#L362
> >      and replaced "dpdk_conf.set('RTE_MAX_MEM_MB', 524288)" with
> >      "dpdk_conf.set('RTE_MAX_MEM_MB', 8388608)", as I have 8 NUMA nodes
> >      with 2 hugepage sizes: 8 NUMA nodes × 2 hugepage sizes × 512 GiB
> >      = 8192 GiB (8388608 MB).
> >
> > With these changes, my program can create a mempool with 275'735'294
> > mbufs: 2176 bytes (mbuf size) × 275'735'294 = 599'999'999'744 bytes
> > ≈ 600 GB. It fails later, though, as I also need an extra rte_ring to
> > hold pointers to the mbufs. htop reports a virtual memory size of
> > 4097G.
> > However, with a smaller amount, 229'779'411 mbufs (i.e. 500 GB), it
> > works, and I can also allocate a ring of the same size (I use
> > RING_F_EXACT_SZ, so in reality it is more).
> >
> > I have tried increasing the limits further to be able to allocate more:
> >
> >    - In config/rte_config.h, set RTE_MAX_MEM_MB_PER_LIST to 1048576.
> >    - In config/rte_config.h, set RTE_MAX_MEM_MB_PER_TYPE to 1048576.
> >    - Set RTE_MAX_MEM_MB = 16'777'216.
> >
> > The virtual memory allocated is now 8192G, and I can allocate 600 GB
> > for the mempool (275'735'294 mbufs), but the ring allocation fails
> > with 'Cannot allocate memory'. Allocating the mempool now takes seven
> > minutes.
>
> It would be interesting to profile this once your issue is resolved.
> I expect that hugepage allocation takes only about 10 seconds of this,
> while the rest is mempool initialization.
>
> > How would I be able to also allocate a ring of the same size?
> > I verified that the number of mbufs I need, rounded up to the next
> > power of 2, is still smaller than the maximum value of an unsigned int
> > on my platform (x86_64, so a 32-bit unsigned int).
> > I have 670 hugepages of 1 GB available. Is this too little? In
> > principle the ring takes 64 bits × number of entries of memory. In
> > this case, that would be: next power of 2 is 536'870'912 entries × 8
> > bytes (64 bits) = 4.3 GB.
> > With 670 GB available, and roughly 600 GB for the mempool, this should
> > fit. Could it be that supporting structures take up the rest of the
> > memory?
>
> A mempool adds headers and padding to the objects within,
> so it probably takes more memory than calculated.
> You can use rte_mempool_mem_iter() and rte_mempool_calc_obj_size() to
> check.
> You can check exact memory usage with rte_malloc_get_socket_stats().
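Following the suggestion at the end, this is roughly the check I intend to
run (a sketch; mp and socket are assumed to be my already-created pool and
NUMA node):

#include <inttypes.h>
#include <stdio.h>
#include <rte_malloc.h>
#include <rte_mempool.h>

static void chunk_cb(struct rte_mempool *mp, void *opaque,
                struct rte_mempool_memhdr *memhdr, unsigned mem_idx)
{
        uint64_t *total = opaque;

        (void)mp;
        *total += memhdr->len;
        printf("chunk %u: %zu bytes at %p\n", mem_idx, memhdr->len,
                        memhdr->addr);
}

static void report_usage(struct rte_mempool *mp, int socket)
{
        uint64_t total = 0;
        struct rte_malloc_socket_stats st;

        /* Sum the memory chunks actually backing the mempool. */
        rte_mempool_mem_iter(mp, chunk_cb, &total);
        printf("mempool backing memory: %" PRIu64 " bytes\n", total);

        /* Exact heap usage on one socket, as suggested. */
        if (rte_malloc_get_socket_stats(socket, &st) == 0)
                printf("socket %d heap: %zu of %zu bytes allocated\n",
                                socket, st.heap_allocsz_bytes,
                                st.heap_totalsz_bytes);
}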