DPDK patches and discussions
From: mannywang(王永峰) <mannywang@tencent.com>
To: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>,
	Konstantin Ananyev <konstantin.ananyev@huawei.com>
Cc: dev@dpdk.org
Subject: Re: [PATCH v3] acl: support custom memory allocator
Date: Wed, 26 Nov 2025 16:09:20 +0800	[thread overview]
Message-ID: <08881270F044B8AD+5e6e521c-8430-4b66-a44c-9b1b8f8f297a@tencent.com> (raw)
In-Reply-To: <cb7c53de-bc26-42c1-8695-00a4a335ebef@gmail.com>

Thanks for the follow-up question.

> I don't understand the build stage issue and why it needs a custom allocator.

The fragmentation concern does not come from the amount of address space,
but from how the underlying heap allocator manages **large / mid-sized
temporary buffers** that are repeatedly allocated and freed during ACL 
build.

ACL build allocates many temporary arrays, tables, and sorted structures,
some of them several MB in size. When these allocations are made via
malloc/calloc, they typically end up in the general heap. Every build
iteration produces a different allocation pattern and size distribution.
Even if all the allocations are freed at the end, the internal heap layout
is not restored to a “flat” state: small holes remain, and a future
allocation of a large contiguous block may fail even though the total free
memory is sufficient.

This becomes a real operational issue in long-running processes.

> What exactly gets fragmented? Is it the entire process address space,
> which is practically unlimited?

It is not the address space that is the limiting factor.
It is the **allocator's internal arena**.

Most allocators (glibc malloc, jemalloc, tcmalloc, etc.) retain internal
metadata, bins, and split blocks, and their fragmentation accumulates
over time. The process may still have hundreds of MB of “free” memory, but
not in **contiguous regions** that can satisfy the next large request.
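
This retention can be observed directly. The stand-alone harness below is
only an illustrative sketch (glibc-specific: mallinfo2() requires
glibc >= 2.33, and the exact numbers depend on allocator tunables such as
the mmap threshold); it cycles a build-like mix of mid-sized buffers and
prints how much free space the arena still holds after everything has
been freed:

#include <malloc.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	for (int iter = 0; iter < 8; iter++) {
		void *bufs[512];

		/* Varying mid-sized buffers, similar in spirit to the
		 * temporary tables of one ACL build iteration. Sizes stay
		 * below the default 128 KiB mmap threshold so they are
		 * served from the main arena, not from mmap. */
		for (int i = 0; i < 512; i++)
			bufs[i] = calloc(1, 4096 + (size_t)rand() % (96 * 1024));
		for (int i = 0; i < 512; i++)
			free(bufs[i]);

		/* arena: total main-arena size; fordblks: free space kept
		 * inside it. Both staying large after all frees is memory
		 * that is free in total yet still held inside the
		 * allocator's arena. */
		struct mallinfo2 mi = mallinfo2();
		printf("iter %d: arena=%zu fordblks=%zu\n",
		       iter, mi.arena, mi.fordblks);
	}
	return 0;
}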

> How does malloc/free overhead compare to the overall ACL build time?

The cost of malloc/free calls themselves is not the core problem.
The overhead is small relative to the total build time.

The risk is that allocator fragmentation increases unpredictably over a long
deployment, until a large block allocation fails in the data plane.

Our team has seen this exact behavior in production environments.
Because we cannot fully control the allocator state, we prefer a model
with zero dynamic allocation after init:

* persistent runtime structures → pre-allocated static region
* temporary build data → resettable memory pool

This avoids failure modes caused by allocator history and guarantees stable
latency regardless of system uptime or build frequency.
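
To make the second bullet concrete, a resettable pool can be as simple as
a bump allocator over one region reserved at init. This is only a sketch
of the model we have in mind, under the assumption that the library routes
temporary build allocations through user callbacks; the structure and
function names here are illustrative, not the patch's API:

#include <stddef.h>
#include <stdint.h>

struct build_pool {
	uint8_t *base;  /* one region reserved once, at init */
	size_t size;
	size_t off;     /* bump pointer */
};

static void *
pool_alloc(struct build_pool *p, size_t n)
{
	/* Align to a cache line, as rte_malloc-style allocators do. */
	size_t off = (p->off + 63) & ~(size_t)63;

	if (off > p->size || n > p->size - off)
		return NULL;    /* pool exhausted: build fails cleanly */
	p->off = off + n;
	return p->base + off;
}

/* Individual frees are no-ops during build; once rte_acl_build()
 * returns, the whole pool is recycled in O(1). */
static void
pool_reset(struct build_pool *p)
{
	p->off = 0;
}

Whatever the final callback signature looks like, the property that
matters is that the heap state after N builds is identical to the state
after one: no allocator history can accumulate.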

On 11/26/2025 3:57 PM, Dmitry Kozlyuk wrote:
> On 11/26/25 05:44, mannywang(王永峰) wrote:
>> Thanks for sharing this suggestion.
>>
>> We actually evaluated the heap-based approach before implementing this
>> patch. It can help in some scenarios, but unfortunately it does not
>> fully solve our use cases. Specifically:
>>
>> 1. **Heap count / scalability**
>>    Our application maintains at least ~200 rte_acl_ctx instances (due
>>    to the total rule count and multi-tenant isolation). Allowing a
>>    dedicated heap per context would exceed the practical limits of the
>>    current rte_malloc heap model. The number of heaps that can be
>>    created is not unlimited, and maintaining hundreds of separate
>>    heaps would introduce considerable management overhead.
> This is a valid point against heaps, thanks.
>> 2. **Temporary allocations in build stage**
>>    During `rte_acl_build`, a significant portion of memory is
>>    allocated through `calloc()` for internal temporary structures.
>>    These allocations are freed right after the build completes. Even
>>    if runtime memory could come from a custom heap, these temporary
>>    allocations would still need an independent allocator or callback
>>    mechanism to avoid fragmentation and repeated malloc/free cycles.
> I don't understand the build stage issue and why it needs a custom
> allocator.
> What exactly gets fragmented?
> Is it the entire process address space, which is practically unlimited?
> How does malloc/free overhead compare to the overall ACL build time?



Thread overview: 13+ messages
2025-11-14  2:51 [RFC] rte_acl_build memory fragmentation concern and proposal for external memory support mannywang(王永峰)
2025-11-17 12:51 ` Konstantin Ananyev
2025-11-25  9:40   ` [PATCH] acl: support custom memory allocator mannywang(王永峰)
2025-11-25 12:06   ` [PATCH v2] " mannywang(王永峰)
2025-11-25 12:14   ` [PATCH v3] " mannywang(王永峰)
2025-11-25 14:59     ` Stephen Hemminger
2025-11-26  2:37       ` [Internet]Re: " mannywang(王永峰)
2025-11-25 18:01     ` Dmitry Kozlyuk
2025-11-26  2:44       ` [Internet]Re: " mannywang(王永峰)
2025-11-26  7:57         ` Dmitry Kozlyuk
2025-11-26  8:09           ` mannywang(王永峰) [this message]
2025-11-26 21:28             ` Stephen Hemminger
2025-11-27  2:05               ` [Internet]Re: " mannywang(王永峰)
