Hi Dmitry,

My apologies for the private reply. I am quite new to the mailing list.
I will profile the allocation later.

As for the issue: at 550 GB, i.e. 252'757'352 MBUFs (using the default mbuf buf size), it now works, including the ring allocation. However, physical memory usage now goes up a lot and I end up swapping. It appears that not all memory is in hugepages (not all pages are filled) and that the kernel perhaps also allocates more memory. I have 755 GiB of RAM available, so 600 GB of mempool is pushing it.
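
For reference, the allocation that now succeeds looks roughly like the sketch below (the pool/ring names, the 64-byte private size and the socket argument are placeholders of mine, not my exact code):

#include <rte_mbuf.h>
#include <rte_ring.h>

/* Sketch: create the big mbuf pool and a ring that can hold a pointer
 * to every mbuf. Assumes rte_eal_init() has already run; "big_pool",
 * "mbuf_ring" and the 64-byte private size are placeholders. */
static int
create_pool_and_ring(unsigned int nb_mbufs, int socket)
{
    struct rte_mempool *pool;
    struct rte_ring *ring;

    pool = rte_pktmbuf_pool_create("big_pool", nb_mbufs,
            0 /* cache */, 64 /* private data */,
            RTE_MBUF_DEFAULT_BUF_SIZE, socket);
    if (pool == NULL)
        return -1;

    /* RING_F_EXACT_SZ: usable capacity is exactly nb_mbufs; the
     * internal array is still rounded up to a power of two. */
    ring = rte_ring_create("mbuf_ring", nb_mbufs, socket, RING_F_EXACT_SZ);
    if (ring == NULL)
        return -1;
    return 0;
}
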
I realise now that I also have some private data in the mempool, so the figure of 550 GB is plainly wrong. In reality, one object is 48 + 2'240 bytes, so the actual memory consumption is: 252'757'352 MBUFs × (48 + 2'240) bytes = 578'308'821'376 bytes ~ 578 GB.
That is at least 28 GB more.
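
Your suggestion below of rte_mempool_calc_obj_size() makes this easy to check up front; a minimal sketch (the 64-byte private size is again my assumption):

#include <stdio.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

/* Sketch: compute the true per-object footprint (mempool header +
 * element + trailer/padding) before creating the pool. This is pure
 * arithmetic, so it can run before any allocation happens. */
int
main(void)
{
    struct rte_mempool_objsz sz;
    uint32_t elt_size = sizeof(struct rte_mbuf)
            + 64 /* assumed private size */
            + RTE_MBUF_DEFAULT_BUF_SIZE; /* 2'176 by default */
    uint32_t total = rte_mempool_calc_obj_size(elt_size, 0, &sz);

    printf("elt=%u header=%u trailer=%u total=%u bytes/object\n",
            sz.elt_size, sz.header_size, sz.trailer_size, total);
    return 0;
}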

I have now fixed my program to address this issue: when requesting 500 GB, it takes the private data and headroom into account.
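
Concretely, the fix is along these lines (a sketch under the same assumptions as above): derive the element count from the byte budget using the true per-object size, instead of dividing the budget by the raw buffer size.

#include <stdint.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

/* Sketch: how many mbufs fit in a given byte budget, accounting for
 * the mempool header, private data and padding. */
static unsigned int
mbufs_for_budget(uint64_t budget_bytes, uint16_t priv_size)
{
    struct rte_mempool_objsz sz;
    uint32_t elt_size = sizeof(struct rte_mbuf) + priv_size
            + RTE_MBUF_DEFAULT_BUF_SIZE;

    rte_mempool_calc_obj_size(elt_size, 0, &sz);
    return (unsigned int)(budget_bytes / sz.total_size);
}

For 500 GB that would be mbufs_for_budget(500ULL * 1000 * 1000 * 1000, 64).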

I will update later with some memory statistics and a profile.
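
For the statistics, I plan to use something along the lines of your pointers below; a sketch (error handling mostly omitted):

#include <stdio.h>
#include <rte_malloc.h>
#include <rte_mempool.h>

/* Callback for rte_mempool_mem_iter(): sum the length of every memory
 * chunk backing the mempool. */
static void
chunk_cb(struct rte_mempool *mp, void *opaque,
        struct rte_mempool_memhdr *memhdr, unsigned int mem_idx)
{
    size_t *total = opaque;

    *total += memhdr->len;
    (void)mp;
    (void)mem_idx;
}

/* Sketch: report how much memory actually backs the pool, plus the
 * heap statistics for one socket. */
static void
report_mem_usage(struct rte_mempool *mp, int socket)
{
    size_t total = 0;
    struct rte_malloc_socket_stats stats;

    rte_mempool_mem_iter(mp, chunk_cb, &total);
    printf("mempool chunks: %zu bytes\n", total);

    if (rte_malloc_get_socket_stats(socket, &stats) == 0)
        printf("socket %d heap: total=%zu allocated=%zu free=%zu\n",
                socket, stats.heap_totalsz_bytes,
                stats.heap_allocsz_bytes, stats.heap_freesz_bytes);
}
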


On Thu, Feb 20, 2025 at 12:21 PM Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> wrote:
Hi Lucas,

Please don't send private replies to discussions in the mailing list.

2025-02-20 12:00 (UTC+0100), Lucas:
> Hi Dmitry,
>
> Thank you for your detailed instructions.
> I have followed them in the following way:
>
>    - In config/rte_config.h set RTE_MAX_MEM_MB_PER_LIST to 524288
>    - In config/rte_config.h set RTE_MAX_MEM_MB_PER_TYPE to 524288
>    - To change RTE_MAX_MEM_MB, I have to change the value set in the
>    meson build system. To do so, I changed
>    https://elixir.bootlin.com/dpdk/v24.11.1/source/config/meson.build#L362
>    I replaced "dpdk_conf.set('RTE_MAX_MEM_MB', 524288)" with
>    "dpdk_conf.set('RTE_MAX_MEM_MB', 8388608)". As I have 8 NUMA nodes with 2
>    hugepage sizes: 8 NUMA nodes × 2 hugepage sizes × 512 GiB = 8'192 GiB
>    (8'388'608 MB)
>
> With these changes, my program can create a mempool with 275'735'294
> MBUFs, i.e. 2'176 bytes (MBUF buf size) × 275'735'294 =
> 599'999'999'744 bytes ~ 600 GB, but it fails later, as I also need an extra
> rte_ring to hold pointers to the MBUFs. In htop, a virtual memory size of
> 4'097G is reported.
> However, with a smaller amount, 229'779'411 MBUFs (i.e. 500 GB), it works,
> and I can also allocate a ring of the same size (I use RING_F_EXACT_SZ, so
> in reality it is more).
>
> I have tried increasing the limits further to be able to allocate more:
>
>    - In config/rte_config.h set RTE_MAX_MEM_MB_PER_LIST to 1048576
>    - In config/rte_config.h set RTE_MAX_MEM_MB_PER_TYPE to 1048576
>    - Set RTE_MAX_MEM_MB = 16'777'216
>
> The virtual memory allocated is now 8'192G, and I can allocate 600 GB
> for the mempool (275'735'294 MBUFs), but the ring allocation fails with
> 'Cannot allocate memory'. Allocation of the mempool now takes seven
> minutes.

It would be interesting to profile this once your issue is resolved.
I expect that hugepage allocation takes only about 10 seconds of this,
while the rest is mempool initialization.

> How would I be able to also allocate a ring of the same size?
> I verified that the number of MBUFs I need rounded up to the next power of
> 2 is still smaller than the max size of an unsigned int on my platform
> (x86_64, so 32 bit unsigned int).
> I have 670 hugepages of 1 GB available. Is this too little? In principle
> the ring takes 8 bytes (64 bits) per entry. In this case, the next power
> of 2 is 536'870'912 entries, so 536'870'912 × 8 bytes ≈ 4.3 GB.
> With 670 GB available, and roughly 600 GB for the mempool, this should fit.
> Could it be that supporting structures take the rest of the memory?

Mempool adds headers and padding to objects within,
so it probably takes more memory than calculated.
You can use rte_mempool_mem_iter() and rte_mempool_calc_obj_size() to check.
You can check exact memory usage with rte_malloc_get_socket_stats().