DPDK patches and discussions
* [RFC] rte_acl_build memory fragmentation concern and proposal for external memory support
@ 2025-11-14  2:51 mannywang(王永峰)
  2025-11-17 12:51 ` Konstantin Ananyev
  0 siblings, 1 reply; 20+ messages in thread
From: mannywang(王永峰) @ 2025-11-14  2:51 UTC (permalink / raw)
  To: dev; +Cc: konstantin.v.ananyev

[-- Attachment #1: Type: text/plain, Size: 1589 bytes --]

Problem Statement

Hello DPDK community,

We've been using DPDK's ACL library in a long-running network application and have observed a concerning memory fragmentation issue. The rte_acl_build() function internally uses rte_zmalloc_socket() for dynamic memory allocation during ACL rule compilation.

While this design works well for short-lived processes, it creates challenges for long-running applications (days/weeks) where memory fragmentation becomes a critical concern. Each ACL rebuild (due to rule updates, configuration changes, etc.) allocates variable-sized memory blocks for the trie structures and transition tables, which are then freed when the ACL context is destroyed or rebuilt.

This pattern leads to:
1. Memory fragmentation over time, as the heap accumulates "holes" of varying sizes
2. Potential allocation failures even when total free memory appears sufficient
3. Difficulty in memory budgeting for deterministic systems
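
For illustration, the rebuild cycle described above boils down to the
following pattern (wait_for_rule_update(), add_current_rules() and
handle_build_failure() are hypothetical application hooks, not DPDK APIs):

    /*
     * Sketch only: each rebuild allocates fresh trie and transition
     * tables via rte_zmalloc_socket() inside rte_acl_build() and frees
     * the previous ones, so a long-running process gradually accumulates
     * variable-sized holes in the rte_malloc heap.
     */
    for (;;) {
        wait_for_rule_update();             /* hypothetical hook */
        rte_acl_reset_rules(ctx);           /* drop the old rule set */
        add_current_rules(ctx);             /* hypothetical wrapper around rte_acl_add_rules() */
        if (rte_acl_build(ctx, &cfg) != 0)  /* can fail once the heap is fragmented */
            handle_build_failure();
    }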


Current Limitation

The current API doesn't allow applications to provide pre-allocated memory for ACL construction, forcing dynamic allocations that contribute to heap fragmentation. This is particularly problematic for applications that prioritize long-term stability and predictable memory usage.

Proposed Solution

We propose extending the ACL API to support external memory buffers for the build process. This would allow applications to manage memory allocation strategies according to their specific requirements (pool-based, static allocation, etc.).
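
As a rough illustration of the application side we have in mind, the build
could draw its memory from a region reserved once at startup instead of the
rte_malloc heap (a sketch only; the names and signatures below are
illustrative, not a proposed API):

    struct acl_build_arena {
        uint8_t *base;   /* region reserved once at startup */
        size_t   size;
        size_t   used;
    };

    static void *
    arena_alloc(struct acl_build_arena *a, size_t len, size_t align)
    {
        size_t off = RTE_ALIGN_CEIL(a->used, align);

        if (off + len > a->size)
            return NULL;     /* the build fails, but the heap layout is untouched */
        a->used = off + len;
        return a->base + off;
    }

    static void
    arena_reset(struct acl_build_arena *a)
    {
        a->used = 0;         /* all build-time memory released in one step */
    }

With something along these lines every rebuild reuses the same region, so
repeated rule updates no longer change the heap layout.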






Sent from my WeCom

[-- Attachment #2: Type: text/html, Size: 15755 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [RFC] rte_acl_build memory fragmentation concern and proposal for external memory support
  2025-11-14  2:51 [RFC] rte_acl_build memory fragmentation concern and proposal for external memory support mannywang(王永峰)
@ 2025-11-17 12:51 ` Konstantin Ananyev
  2025-11-25  9:40   ` [PATCH] acl: support custom memory allocator mannywang(王永峰)
                     ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Konstantin Ananyev @ 2025-11-17 12:51 UTC (permalink / raw)
  To: mannywang(王永峰), dev; +Cc: konstantin.v.ananyev

[-- Attachment #1: Type: text/plain, Size: 2275 bytes --]

Hi,
I suppose you mean some sort of user-provided alloc()/free() callbacks that
will be used by the ACL lib at the generate phase?
And I presume that if none are provided, it will by default fall back to rte_malloc/rte_free?
If so, then yes, sounds reasonable.
Patches are welcomed :)
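
In other words, the allocation sites inside the library would dispatch
roughly as follows (a sketch only; the field names are illustrative, not a
final API):

    mem = (cfg->alloc != NULL) ?
        cfg->alloc(total_size, RTE_CACHE_LINE_SIZE, cfg->alloc_ctx) :
        rte_zmalloc_socket(ctx->name, total_size,
            RTE_CACHE_LINE_SIZE, ctx->socket_id);
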
Konstantin

From: mannywang(王永峰) <mannywang@tencent.com>
Sent: Friday, November 14, 2025 2:52 AM
To: dev <dev@dpdk.org>
Cc: konstantin.v.ananyev <konstantin.v.ananyev@yandex.ru>
Subject: [RFC] rte_acl_build memory fragmentation concern and proposal for external memory support

Problem Statement

Hello DPDK community,

We've been using DPDK's ACL library in a long-running network application and have observed a concerning memory fragmentation issue. The rte_acl_build() function internally uses rte_zmalloc_socket() for dynamic memory allocation during ACL rule compilation.

While this design works well for short-lived processes, it creates challenges for long-running applications (days/weeks) where memory fragmentation becomes a critical concern. Each ACL rebuild (due to rule updates, configuration changes, etc.) allocates variable-sized memory blocks for the trie structures and transition tables, which are then freed when the ACL context is destroyed or rebuilt.

This pattern leads to:
1.  Memory fragmentation over time, as the heap accumulates "holes" of varying sizes
2.  Potential allocation failures even when total free memory appears sufficient
3.  Difficulty in memory budgeting for deterministic systems
Current Limitation

The current API doesn't allow applications to provide pre-allocated memory for ACL construction, forcing dynamic allocations that contribute to heap fragmentation. This is particularly problematic for applications that prioritize long-term stability and predictable memory usage.

Proposed Solution

We propose extending the ACL API to support external memory buffers for the build process. This would allow applications to manage memory allocation strategies according to their specific requirements (pool-based, static allocation, etc.).

________________________________
Sent from my WeCom<https://work.weixin.qq.com/wework_admin/user/h5/qqmail_user_card/vc879305fe5b6b5af3?from=myprofile>



[-- Attachment #2: Type: text/html, Size: 58241 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH] acl: support custom memory allocator
  2025-11-17 12:51 ` Konstantin Ananyev
@ 2025-11-25  9:40   ` mannywang(王永峰)
  2025-11-25 12:06   ` [PATCH v2] " mannywang(王永峰)
  2025-11-25 12:14   ` [PATCH v3] " mannywang(王永峰)
  2 siblings, 0 replies; 20+ messages in thread
From: mannywang(王永峰) @ 2025-11-25  9:40 UTC (permalink / raw)
  To: Konstantin Ananyev; +Cc: dev, YongFeng Wang

Reduce memory fragmentation caused by dynamic memory allocations
by allowing users to provide a custom memory allocator.

Add new members to struct rte_acl_config to allow passing custom
allocator callbacks to rte_acl_build:

- running_alloc: allocator callback for run-time internal memory
- running_free: free callback for run-time internal memory
- running_cb_ctx: user-defined context passed to running_alloc/free

- temp_alloc: allocator callback for temporary memory during ACL build
- temp_reset: reset callback for temporary allocator
- temp_cb_ctx: user-defined context passed to temp_alloc/reset

These callbacks allow users to provide their own memory pools or
allocators for both persistent runtime structures and temporary
build-time data.

A typical approach is to pre-allocate a static memory region
for rte_acl_ctx, and to provide a global temporary memory manager
that supports multiple allocations and a single reset during ACL build.

Since tb_mem_pool handles allocation failures using siglongjmp,
temp_alloc follows the same approach for failure handling.
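
For reference, wiring the new fields into an existing build configuration
looks roughly like this (a minimal sketch; the my_* callbacks and contexts
are application-provided, see the test_acl.c changes below for a complete
example):

    struct rte_acl_config cfg;

    /* fill num_categories, num_fields, defs[] as before ... */
    cfg.running_alloc  = my_running_alloc;   /* persistent run-time tables */
    cfg.running_free   = my_running_free;
    cfg.running_cb_ctx = &my_running_pool;
    cfg.temp_alloc     = my_temp_alloc;      /* temporary build-time memory */
    cfg.temp_reset     = my_temp_reset;
    cfg.temp_cb_ctx    = &my_temp_pool;

    ret = rte_acl_build(ctx, &cfg);
    /* with the callbacks left NULL the library keeps using
     * rte_zmalloc_socket()/rte_free(), as before */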

Signed-off-by: YongFeng Wang <mannywang@tencent.com>
---
 app/test/test_acl.c | 181 +++++++++++++++++++++++++++++++++++++++++++-
 lib/acl/acl.h       |   3 +-
 lib/acl/acl_bld.c   |  14 +++-
 lib/acl/acl_gen.c   |   8 +-
 lib/acl/rte_acl.c   |   5 +-
 lib/acl/rte_acl.h   |  20 +++++
 lib/acl/tb_mem.c    |   8 ++
 lib/acl/tb_mem.h    |   6 ++
 8 files changed, 236 insertions(+), 9 deletions(-)

diff --git a/app/test/test_acl.c b/app/test/test_acl.c
index 43d13b5b0f..0f20967df2 100644
--- a/app/test/test_acl.c
+++ b/app/test/test_acl.c
@@ -1721,6 +1721,184 @@ test_u32_range(void)
 	return rc;
 }
 
+struct acl_ctx_wrapper_t {
+	struct rte_acl_ctx *ctx;
+	void *running_buf;
+	bool running_buf_using;
+};
+
+struct acl_temp_mem_mgr_t {
+	void *buf;
+	uint32_t buf_used;
+	sigjmp_buf fail;
+};
+
+struct acl_ctx_wrapper_t g_acl_ctx_wrapper;
+struct acl_temp_mem_mgr_t g_temp_mem_mgr;
+
+#define ACL_RUNNING_BUF_SIZE (10 * 1024 * 1024)
+#define ACL_TEMP_BUF_SIZE (10 * 1024 * 1024)
+
+static void *running_alloc(size_t size, unsigned int align, void *cb_data)
+{
+	(void)align;
+	struct acl_ctx_wrapper_t *gwlb_acl_ctx = (struct acl_ctx_wrapper_t *)cb_data;
+	if (gwlb_acl_ctx->running_buf_using)
+		return NULL;
+	printf("running memory alloc for acl context, size=%" PRId64 ", pointer=%p\n",
+		size,
+		gwlb_acl_ctx->running_buf);
+	gwlb_acl_ctx->running_buf_using = true;
+	return gwlb_acl_ctx->running_buf;
+}
+
+static void running_free(void *buf, void *cb_data)
+{
+	if (!buf)
+		return;
+	struct acl_ctx_wrapper_t *gwlb_acl_ctx = (struct acl_ctx_wrapper_t *)cb_data;
+	printf("running memory free pointer=%p\n", buf);
+	gwlb_acl_ctx->running_buf_using = false;
+}
+
+static void *temp_alloc(size_t size, sigjmp_buf fail, void *cb_data)
+{
+	struct acl_temp_mem_mgr_t *gwlb_acl_build = (struct acl_temp_mem_mgr_t *)cb_data;
+	if (ACL_TEMP_BUF_SIZE - gwlb_acl_build->buf_used < size) {
+		printf("Line %i: alloc temp memory fail, size=%" PRId64 ", used=%d\n",
+			__LINE__,
+			size,
+			gwlb_acl_build->buf_used);
+		siglongjmp(fail, -ENOMEM);
+		return NULL;
+	}
+	void *ret = (char *)gwlb_acl_build->buf + gwlb_acl_build->buf_used;
+	gwlb_acl_build->buf_used += size;
+	return ret;
+}
+
+static void temp_reset(void *cb_data)
+{
+	struct acl_temp_mem_mgr_t *gwlb_acl_build = (struct acl_temp_mem_mgr_t *)cb_data;
+	memset(gwlb_acl_build->buf, 0, ACL_TEMP_BUF_SIZE);
+	printf("temp memory reset, used total=%d\n", gwlb_acl_build->buf_used);
+	gwlb_acl_build->buf_used = 0;
+}
+
+static int
+rte_acl_ipv4vlan_build_wich_mem_cb(struct rte_acl_ctx *ctx,
+	const uint32_t layout[RTE_ACL_IPV4VLAN_NUM],
+	uint32_t num_categories)
+{
+	struct rte_acl_config cfg;
+
+	if (ctx == NULL || layout == NULL)
+		return -EINVAL;
+
+	memset(&cfg, 0, sizeof(cfg));
+	acl_ipv4vlan_config(&cfg, layout, num_categories);
+	cfg.running_alloc = running_alloc;
+	cfg.running_free = running_free;
+	cfg.running_cb_ctx = &g_acl_ctx_wrapper;
+	cfg.temp_alloc = temp_alloc;
+	cfg.temp_reset = temp_reset;
+	cfg.temp_cb_ctx = &g_temp_mem_mgr;
+	return rte_acl_build(ctx, &cfg);
+}
+
+static int
+test_classify_buid_wich_mem_cb(struct rte_acl_ctx *acx,
+	const struct rte_acl_ipv4vlan_rule *rules, uint32_t num)
+{
+	int ret;
+
+	/* add rules to the context */
+	ret = rte_acl_ipv4vlan_add_rules(acx, rules, num);
+	if (ret != 0) {
+		printf("Line %i: Adding rules to ACL context failed!\n",
+			__LINE__);
+		return ret;
+	}
+
+	/* try building the context */
+	ret = rte_acl_ipv4vlan_build_wich_mem_cb(acx, ipv4_7tuple_layout,
+		RTE_ACL_MAX_CATEGORIES);
+	if (ret != 0) {
+		printf("Line %i: Building ACL context failed!\n", __LINE__);
+		return ret;
+	}
+
+	return 0;
+}
+
+static int
+test_mem_cb(void)
+{
+	int i, ret;
+	g_acl_ctx_wrapper.ctx = rte_acl_create(&acl_param);
+	if (g_acl_ctx_wrapper.ctx == NULL) {
+		printf("Line %i: Error creating ACL context!\n", __LINE__);
+		return -1;
+	}
+	g_acl_ctx_wrapper.running_buf = rte_zmalloc_socket(
+		"test_acl",
+		ACL_RUNNING_BUF_SIZE,
+		RTE_CACHE_LINE_SIZE,
+		SOCKET_ID_ANY);
+	if (!g_acl_ctx_wrapper.running_buf) {
+		printf("Line %i: Error allocing running buf for acl context!\n", __LINE__);
+		return 1;
+	}
+	g_acl_ctx_wrapper.running_buf_using = false;
+
+	g_temp_mem_mgr.buf = malloc(ACL_TEMP_BUF_SIZE);
+	if (!g_temp_mem_mgr.buf)
+		printf("Line %i: Error allocing teem buf for acl build!\n", __LINE__);
+	memset(g_temp_mem_mgr.buf, 0, ACL_TEMP_BUF_SIZE);
+	g_temp_mem_mgr.buf_used = 0;
+
+	ret = 0;
+	for (i = 0; i != TEST_CLASSIFY_ITER; i++) {
+
+		if ((i & 1) == 0)
+			rte_acl_reset(g_acl_ctx_wrapper.ctx);
+		else
+			rte_acl_reset_rules(g_acl_ctx_wrapper.ctx);
+
+		ret = test_classify_buid_wich_mem_cb(g_acl_ctx_wrapper.ctx, acl_test_rules,
+			RTE_DIM(acl_test_rules));
+		if (ret != 0) {
+			printf("Line %i, iter: %d: "
+				"Adding rules to ACL context failed!\n",
+				__LINE__, i);
+			break;
+		}
+
+		ret = test_classify_run(g_acl_ctx_wrapper.ctx, acl_test_data,
+			RTE_DIM(acl_test_data));
+		if (ret != 0) {
+			printf("Line %i, iter: %d: %s failed!\n",
+				__LINE__, i, __func__);
+			break;
+		}
+
+		/* reset rules and make sure that classify still works ok. */
+		rte_acl_reset_rules(g_acl_ctx_wrapper.ctx);
+		ret = test_classify_run(g_acl_ctx_wrapper.ctx, acl_test_data,
+			RTE_DIM(acl_test_data));
+		if (ret != 0) {
+			printf("Line %i, iter: %d: %s failed!\n",
+				__LINE__, i, __func__);
+			break;
+		}
+	}
+
+	rte_acl_free(g_acl_ctx_wrapper.ctx);
+	free(g_temp_mem_mgr.buf);
+	rte_free(g_acl_ctx_wrapper.running_buf);
+	return ret;
+}
+
 static int
 test_acl(void)
 {
@@ -1742,7 +1920,8 @@ test_acl(void)
 		return -1;
 	if (test_u32_range() < 0)
 		return -1;
-
+	if (test_mem_cb() < 0)
+		return -1;
 	return 0;
 }
 
diff --git a/lib/acl/acl.h b/lib/acl/acl.h
index c8e4e72fab..7080fff64d 100644
--- a/lib/acl/acl.h
+++ b/lib/acl/acl.h
@@ -189,7 +189,8 @@ struct rte_acl_ctx {
 
 int rte_acl_gen(struct rte_acl_ctx *ctx, struct rte_acl_trie *trie,
 	struct rte_acl_bld_trie *node_bld_trie, uint32_t num_tries,
-	uint32_t num_categories, uint32_t data_index_sz, size_t max_size);
+	uint32_t num_categories, uint32_t data_index_sz, size_t max_size,
+	const struct rte_acl_config *cfg);
 
 typedef int (*rte_acl_classify_t)
 (const struct rte_acl_ctx *, const uint8_t **, uint32_t *, uint32_t, uint32_t);
diff --git a/lib/acl/acl_bld.c b/lib/acl/acl_bld.c
index 7056b1c117..1fd0ee3aa5 100644
--- a/lib/acl/acl_bld.c
+++ b/lib/acl/acl_bld.c
@@ -777,9 +777,12 @@ acl_merge_trie(struct acl_build_context *context,
  *  - reset all RT related fields to zero.
  */
 static void
-acl_build_reset(struct rte_acl_ctx *ctx)
+acl_build_reset(struct rte_acl_ctx *ctx, const struct rte_acl_config *cfg)
 {
-	rte_free(ctx->mem);
+	if (cfg->running_free)
+		cfg->running_free(ctx->mem, cfg->running_cb_ctx);
+	else
+		rte_free(ctx->mem);
 	memset(&ctx->num_categories, 0,
 		sizeof(*ctx) - offsetof(struct rte_acl_ctx, num_categories));
 }
@@ -1518,6 +1521,9 @@ acl_bld(struct acl_build_context *bcx, struct rte_acl_ctx *ctx,
 	bcx->acx = ctx;
 	bcx->pool.alignment = ACL_POOL_ALIGN;
 	bcx->pool.min_alloc = ACL_POOL_ALLOC_MIN;
+	bcx->pool.alloc_cb = cfg->temp_alloc;
+	bcx->pool.reset_cb = cfg->temp_reset;
+	bcx->pool.cb_ctx = cfg->temp_cb_ctx;
 	bcx->cfg = *cfg;
 	bcx->category_mask = RTE_LEN2MASK(bcx->cfg.num_categories,
 		typeof(bcx->category_mask));
@@ -1635,7 +1641,7 @@ rte_acl_build(struct rte_acl_ctx *ctx, const struct rte_acl_config *cfg)
 	if (rc != 0)
 		return rc;
 
-	acl_build_reset(ctx);
+	acl_build_reset(ctx, cfg);
 
 	if (cfg->max_size == 0) {
 		n = NODE_MIN;
@@ -1655,7 +1661,7 @@ rte_acl_build(struct rte_acl_ctx *ctx, const struct rte_acl_config *cfg)
 			rc = rte_acl_gen(ctx, bcx.tries, bcx.bld_tries,
 				bcx.num_tries, bcx.cfg.num_categories,
 				ACL_MAX_INDEXES * RTE_DIM(bcx.tries) *
-				sizeof(ctx->data_indexes[0]), max_size);
+				sizeof(ctx->data_indexes[0]), max_size, cfg);
 			if (rc == 0) {
 				/* set data indexes. */
 				acl_set_data_indexes(ctx);
diff --git a/lib/acl/acl_gen.c b/lib/acl/acl_gen.c
index 3c53d24056..6aa7d74635 100644
--- a/lib/acl/acl_gen.c
+++ b/lib/acl/acl_gen.c
@@ -448,7 +448,8 @@ acl_calc_counts_indices(struct acl_node_counters *counts,
 int
 rte_acl_gen(struct rte_acl_ctx *ctx, struct rte_acl_trie *trie,
 	struct rte_acl_bld_trie *node_bld_trie, uint32_t num_tries,
-	uint32_t num_categories, uint32_t data_index_sz, size_t max_size)
+	uint32_t num_categories, uint32_t data_index_sz, size_t max_size,
+	const struct rte_acl_config *cfg)
 {
 	void *mem;
 	size_t total_size;
@@ -478,7 +479,10 @@ rte_acl_gen(struct rte_acl_ctx *ctx, struct rte_acl_trie *trie,
 		return -ERANGE;
 	}
 
-	mem = rte_zmalloc_socket(ctx->name, total_size, RTE_CACHE_LINE_SIZE,
+	if (cfg->running_alloc)
+		mem = cfg->running_alloc(total_size, RTE_CACHE_LINE_SIZE, cfg->running_cb_ctx);
+	else
+		mem = rte_zmalloc_socket(ctx->name, total_size, RTE_CACHE_LINE_SIZE,
 			ctx->socket_id);
 	if (mem == NULL) {
 		ACL_LOG(ERR,
diff --git a/lib/acl/rte_acl.c b/lib/acl/rte_acl.c
index 8c0ca29618..e765c40f4f 100644
--- a/lib/acl/rte_acl.c
+++ b/lib/acl/rte_acl.c
@@ -362,7 +362,10 @@ rte_acl_free(struct rte_acl_ctx *ctx)
 
 	rte_mcfg_tailq_write_unlock();
 
-	rte_free(ctx->mem);
+	if (ctx->config.running_free)
+		ctx->config.running_free(ctx->mem, ctx->config.running_cb_ctx);
+	else
+		rte_free(ctx->mem);
 	rte_free(ctx);
 	rte_free(te);
 }
diff --git a/lib/acl/rte_acl.h b/lib/acl/rte_acl.h
index 95354cabb8..c675c9ff81 100644
--- a/lib/acl/rte_acl.h
+++ b/lib/acl/rte_acl.h
@@ -13,6 +13,7 @@
 
 #include <rte_common.h>
 #include <rte_acl_osdep.h>
+#include <setjmp.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -61,6 +62,11 @@ struct rte_acl_field_def {
  * ACL build configuration.
  * Defines the fields of an ACL trie and number of categories to build with.
  */
+typedef void *(*rte_acl_running_alloc_t)(size_t, unsigned int, void *);
+typedef void  (*rte_acl_running_free_t)(void *, void *);
+typedef void *(*rte_acl_temp_alloc_t)(size_t, sigjmp_buf, void *);
+typedef void  (*rte_acl_temp_reset_t)(void *);
+
 struct rte_acl_config {
 	uint32_t num_categories; /**< Number of categories to build with. */
 	uint32_t num_fields;     /**< Number of field definitions. */
@@ -68,6 +74,20 @@ struct rte_acl_config {
 	/**< array of field definitions. */
 	size_t max_size;
 	/**< max memory limit for internal run-time structures. */
+
+	/**< Allocator callback for run-time internal memory. */
+	rte_acl_running_alloc_t  running_alloc;
+	/**< Free callback for run-time internal memory. */
+	rte_acl_running_free_t   running_free;
+	/**< User context passed to running_alloc/free. */
+	void                     *running_cb_ctx;
+
+	/**< Allocator callback for temporary memory used during build. */
+	rte_acl_temp_alloc_t     temp_alloc;
+	/**< Reset callback for temporary allocator. */
+	rte_acl_temp_reset_t     temp_reset;
+	/**< User context passed to temp_alloc/reset. */
+	void                     *temp_cb_ctx;
 };
 
 /**
diff --git a/lib/acl/tb_mem.c b/lib/acl/tb_mem.c
index 9264433422..b9c69b563e 100644
--- a/lib/acl/tb_mem.c
+++ b/lib/acl/tb_mem.c
@@ -55,6 +55,9 @@ tb_alloc(struct tb_mem_pool *pool, size_t size)
 
 	size = RTE_ALIGN_CEIL(size, pool->alignment);
 
+	if (pool->alloc_cb)
+		return pool->alloc_cb(size, pool->fail, pool->cb_ctx);
+
 	block = pool->block;
 	if (block == NULL || block->size < size) {
 		new_sz = (size > pool->min_alloc) ? size : pool->min_alloc;
@@ -71,6 +74,11 @@ tb_free_pool(struct tb_mem_pool *pool)
 {
 	struct tb_mem_block *next, *block;
 
+	if (pool->reset_cb) {
+		pool->reset_cb(pool->cb_ctx);
+		return;
+	}
+
 	for (block = pool->block; block != NULL; block = next) {
 		next = block->next;
 		free(block);
diff --git a/lib/acl/tb_mem.h b/lib/acl/tb_mem.h
index 2093744a6d..2fdebefc31 100644
--- a/lib/acl/tb_mem.h
+++ b/lib/acl/tb_mem.h
@@ -24,11 +24,17 @@ struct tb_mem_block {
 	uint8_t             *mem;
 };
 
+typedef void *(*rte_tb_alloc_t)(size_t, sigjmp_buf, void *);
+typedef void (*rte_tb_reset_t)(void *);
+
 struct tb_mem_pool {
 	struct tb_mem_block *block;
 	size_t               alignment;
 	size_t               min_alloc;
 	size_t               alloc;
+	rte_tb_alloc_t       alloc_cb;
+	rte_tb_reset_t       reset_cb;
+	void                 *cb_ctx;
 	/* jump target in case of memory allocation failure. */
 	sigjmp_buf           fail;
 };
-- 
2.43.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2] acl: support custom memory allocator
  2025-11-17 12:51 ` Konstantin Ananyev
  2025-11-25  9:40   ` [PATCH] acl: support custom memory allocator mannywang(王永峰)
@ 2025-11-25 12:06   ` mannywang(王永峰)
  2025-11-25 12:14   ` [PATCH v3] " mannywang(王永峰)
  2 siblings, 0 replies; 20+ messages in thread
From: mannywang(王永峰) @ 2025-11-25 12:06 UTC (permalink / raw)
  To: Konstantin Ananyev; +Cc: dev, YongFeng Wang

Reduce memory fragmentation caused by dynamic memory allocations
by allowing users to provide a custom memory allocator.

Add new members to struct rte_acl_config to allow passing custom
allocator callbacks to rte_acl_build:

- running_alloc: allocator callback for run-time internal memory
- running_free: free callback for run-time internal memory
- running_cb_ctx: user-defined context passed to running_alloc/free

- temp_alloc: allocator callback for temporary memory during ACL build
- temp_reset: reset callback for temporary allocator
- temp_cb_ctx: user-defined context passed to temp_alloc/reset

These callbacks allow users to provide their own memory pools or
allocators for both persistent runtime structures and temporary
build-time data.

A typical approach is to pre-allocate a static memory region
for rte_acl_ctx, and to provide a global temporary memory manager
that supports multiple allocations and a single reset during ACL build.

Since tb_mem_pool handles allocation failures using siglongjmp,
temp_alloc follows the same approach for failure handling.

v2:
Fix build warning of test code on 32-bit systems

Signed-off-by: YongFeng Wang <mannywang@tencent.com>
---
 ...-acl-support-custom-memory-allocator.patch | 440 ++++++++++++++++++
 app/test/test_acl.c                           | 181 ++++++-
 lib/acl/acl.h                                 |   3 +-
 lib/acl/acl_bld.c                             |  14 +-
 lib/acl/acl_gen.c                             |   8 +-
 lib/acl/rte_acl.c                             |   5 +-
 lib/acl/rte_acl.h                             |  20 +
 lib/acl/tb_mem.c                              |   8 +
 lib/acl/tb_mem.h                              |   6 +
 9 files changed, 676 insertions(+), 9 deletions(-)
 create mode 100644 0001-acl-support-custom-memory-allocator.patch

diff --git a/0001-acl-support-custom-memory-allocator.patch b/0001-acl-support-custom-memory-allocator.patch
new file mode 100644
index 0000000000..648c7b96b2
--- /dev/null
+++ b/0001-acl-support-custom-memory-allocator.patch
@@ -0,0 +1,440 @@
+From 248773f961d66ebf1a0812822e9251cf4b0f9d5e Mon Sep 17 00:00:00 2001
+From: YongFeng Wang <mannywang@tencent.com>
+Date: Mon, 24 Nov 2025 19:10:38 +0800
+Subject: [PATCH] acl: support custom memory allocator
+
+Reduce memory fragmentation caused by dynamic memory allocations
+by allowing users to provide a custom memory allocator.
+
+Add new members to struct rte_acl_config to allow passing custom
+allocator callbacks to rte_acl_build:
+
+- running_alloc: allocator callback for run-time internal memory
+- running_free: free callback for run-time internal memory
+- running_cb_ctx: user-defined context passed to running_alloc/free
+
+- temp_alloc: allocator callback for temporary memory during ACL build
+- temp_reset: reset callback for temporary allocator
+- temp_cb_ctx: user-defined context passed to temp_alloc/reset
+
+These callbacks allow users to provide their own memory pools or
+allocators for both persistent runtime structures and temporary
+build-time data.
+
+A typical approach is to pre-allocate a static memory region
+for rte_acl_ctx, and to provide a global temporary memory manager
+that supports multiple allocations and a single reset during ACL build.
+
+Since tb_mem_pool handles allocation failures using siglongjmp,
+temp_alloc follows the same approach for failure handling.
+
+Signed-off-by: YongFeng Wang <mannywang@tencent.com>
+---
+ app/test/test_acl.c | 181 +++++++++++++++++++++++++++++++++++++++++++-
+ lib/acl/acl.h       |   3 +-
+ lib/acl/acl_bld.c   |  14 +++-
+ lib/acl/acl_gen.c   |   8 +-
+ lib/acl/rte_acl.c   |   5 +-
+ lib/acl/rte_acl.h   |  20 +++++
+ lib/acl/tb_mem.c    |   8 ++
+ lib/acl/tb_mem.h    |   6 ++
+ 8 files changed, 236 insertions(+), 9 deletions(-)
+
+diff --git a/app/test/test_acl.c b/app/test/test_acl.c
+index 43d13b5b0f..0f20967df2 100644
+--- a/app/test/test_acl.c
++++ b/app/test/test_acl.c
+@@ -1721,6 +1721,184 @@ test_u32_range(void)
+ 	return rc;
+ }
+ 
++struct acl_ctx_wrapper_t {
++	struct rte_acl_ctx *ctx;
++	void *running_buf;
++	bool running_buf_using;
++};
++
++struct acl_temp_mem_mgr_t {
++	void *buf;
++	uint32_t buf_used;
++	sigjmp_buf fail;
++};
++
++struct acl_ctx_wrapper_t g_acl_ctx_wrapper;
++struct acl_temp_mem_mgr_t g_temp_mem_mgr;
++
++#define ACL_RUNNING_BUF_SIZE (10 * 1024 * 1024)
++#define ACL_TEMP_BUF_SIZE (10 * 1024 * 1024)
++
++static void *running_alloc(size_t size, unsigned int align, void *cb_data)
++{
++	(void)align;
++	struct acl_ctx_wrapper_t *gwlb_acl_ctx = (struct acl_ctx_wrapper_t *)cb_data;
++	if (gwlb_acl_ctx->running_buf_using)
++		return NULL;
++	printf("running memory alloc for acl context, size=%" PRId64 ", pointer=%p\n",
++		size,
++		gwlb_acl_ctx->running_buf);
++	gwlb_acl_ctx->running_buf_using = true;
++	return gwlb_acl_ctx->running_buf;
++}
++
++static void running_free(void *buf, void *cb_data)
++{
++	if (!buf)
++		return;
++	struct acl_ctx_wrapper_t *gwlb_acl_ctx = (struct acl_ctx_wrapper_t *)cb_data;
++	printf("running memory free pointer=%p\n", buf);
++	gwlb_acl_ctx->running_buf_using = false;
++}
++
++static void *temp_alloc(size_t size, sigjmp_buf fail, void *cb_data)
++{
++	struct acl_temp_mem_mgr_t *gwlb_acl_build = (struct acl_temp_mem_mgr_t *)cb_data;
++	if (ACL_TEMP_BUF_SIZE - gwlb_acl_build->buf_used < size) {
++		printf("Line %i: alloc temp memory fail, size=%" PRId64 ", used=%d\n",
++			__LINE__,
++			size,
++			gwlb_acl_build->buf_used);
++		siglongjmp(fail, -ENOMEM);
++		return NULL;
++	}
++	void *ret = (char *)gwlb_acl_build->buf + gwlb_acl_build->buf_used;
++	gwlb_acl_build->buf_used += size;
++	return ret;
++}
++
++static void temp_reset(void *cb_data)
++{
++	struct acl_temp_mem_mgr_t *gwlb_acl_build = (struct acl_temp_mem_mgr_t *)cb_data;
++	memset(gwlb_acl_build->buf, 0, ACL_TEMP_BUF_SIZE);
++	printf("temp memory reset, used total=%d\n", gwlb_acl_build->buf_used);
++	gwlb_acl_build->buf_used = 0;
++}
++
++static int
++rte_acl_ipv4vlan_build_wich_mem_cb(struct rte_acl_ctx *ctx,
++	const uint32_t layout[RTE_ACL_IPV4VLAN_NUM],
++	uint32_t num_categories)
++{
++	struct rte_acl_config cfg;
++
++	if (ctx == NULL || layout == NULL)
++		return -EINVAL;
++
++	memset(&cfg, 0, sizeof(cfg));
++	acl_ipv4vlan_config(&cfg, layout, num_categories);
++	cfg.running_alloc = running_alloc;
++	cfg.running_free = running_free;
++	cfg.running_cb_ctx = &g_acl_ctx_wrapper;
++	cfg.temp_alloc = temp_alloc;
++	cfg.temp_reset = temp_reset;
++	cfg.temp_cb_ctx = &g_temp_mem_mgr;
++	return rte_acl_build(ctx, &cfg);
++}
++
++static int
++test_classify_buid_wich_mem_cb(struct rte_acl_ctx *acx,
++	const struct rte_acl_ipv4vlan_rule *rules, uint32_t num)
++{
++	int ret;
++
++	/* add rules to the context */
++	ret = rte_acl_ipv4vlan_add_rules(acx, rules, num);
++	if (ret != 0) {
++		printf("Line %i: Adding rules to ACL context failed!\n",
++			__LINE__);
++		return ret;
++	}
++
++	/* try building the context */
++	ret = rte_acl_ipv4vlan_build_wich_mem_cb(acx, ipv4_7tuple_layout,
++		RTE_ACL_MAX_CATEGORIES);
++	if (ret != 0) {
++		printf("Line %i: Building ACL context failed!\n", __LINE__);
++		return ret;
++	}
++
++	return 0;
++}
++
++static int
++test_mem_cb(void)
++{
++	int i, ret;
++	g_acl_ctx_wrapper.ctx = rte_acl_create(&acl_param);
++	if (g_acl_ctx_wrapper.ctx == NULL) {
++		printf("Line %i: Error creating ACL context!\n", __LINE__);
++		return -1;
++	}
++	g_acl_ctx_wrapper.running_buf = rte_zmalloc_socket(
++		"test_acl",
++		ACL_RUNNING_BUF_SIZE,
++		RTE_CACHE_LINE_SIZE,
++		SOCKET_ID_ANY);
++	if (!g_acl_ctx_wrapper.running_buf) {
++		printf("Line %i: Error allocing running buf for acl context!\n", __LINE__);
++		return 1;
++	}
++	g_acl_ctx_wrapper.running_buf_using = false;
++
++	g_temp_mem_mgr.buf = malloc(ACL_TEMP_BUF_SIZE);
++	if (!g_temp_mem_mgr.buf)
++		printf("Line %i: Error allocing teem buf for acl build!\n", __LINE__);
++	memset(g_temp_mem_mgr.buf, 0, ACL_TEMP_BUF_SIZE);
++	g_temp_mem_mgr.buf_used = 0;
++
++	ret = 0;
++	for (i = 0; i != TEST_CLASSIFY_ITER; i++) {
++
++		if ((i & 1) == 0)
++			rte_acl_reset(g_acl_ctx_wrapper.ctx);
++		else
++			rte_acl_reset_rules(g_acl_ctx_wrapper.ctx);
++
++		ret = test_classify_buid_wich_mem_cb(g_acl_ctx_wrapper.ctx, acl_test_rules,
++			RTE_DIM(acl_test_rules));
++		if (ret != 0) {
++			printf("Line %i, iter: %d: "
++				"Adding rules to ACL context failed!\n",
++				__LINE__, i);
++			break;
++		}
++
++		ret = test_classify_run(g_acl_ctx_wrapper.ctx, acl_test_data,
++			RTE_DIM(acl_test_data));
++		if (ret != 0) {
++			printf("Line %i, iter: %d: %s failed!\n",
++				__LINE__, i, __func__);
++			break;
++		}
++
++		/* reset rules and make sure that classify still works ok. */
++		rte_acl_reset_rules(g_acl_ctx_wrapper.ctx);
++		ret = test_classify_run(g_acl_ctx_wrapper.ctx, acl_test_data,
++			RTE_DIM(acl_test_data));
++		if (ret != 0) {
++			printf("Line %i, iter: %d: %s failed!\n",
++				__LINE__, i, __func__);
++			break;
++		}
++	}
++
++	rte_acl_free(g_acl_ctx_wrapper.ctx);
++	free(g_temp_mem_mgr.buf);
++	rte_free(g_acl_ctx_wrapper.running_buf);
++	return ret;
++}
++
+ static int
+ test_acl(void)
+ {
+@@ -1742,7 +1920,8 @@ test_acl(void)
+ 		return -1;
+ 	if (test_u32_range() < 0)
+ 		return -1;
+-
++	if (test_mem_cb() < 0)
++		return -1;
+ 	return 0;
+ }
+ 
+diff --git a/lib/acl/acl.h b/lib/acl/acl.h
+index c8e4e72fab..7080fff64d 100644
+--- a/lib/acl/acl.h
++++ b/lib/acl/acl.h
+@@ -189,7 +189,8 @@ struct rte_acl_ctx {
+ 
+ int rte_acl_gen(struct rte_acl_ctx *ctx, struct rte_acl_trie *trie,
+ 	struct rte_acl_bld_trie *node_bld_trie, uint32_t num_tries,
+-	uint32_t num_categories, uint32_t data_index_sz, size_t max_size);
++	uint32_t num_categories, uint32_t data_index_sz, size_t max_size,
++	const struct rte_acl_config *cfg);
+ 
+ typedef int (*rte_acl_classify_t)
+ (const struct rte_acl_ctx *, const uint8_t **, uint32_t *, uint32_t, uint32_t);
+diff --git a/lib/acl/acl_bld.c b/lib/acl/acl_bld.c
+index 7056b1c117..1fd0ee3aa5 100644
+--- a/lib/acl/acl_bld.c
++++ b/lib/acl/acl_bld.c
+@@ -777,9 +777,12 @@ acl_merge_trie(struct acl_build_context *context,
+  *  - reset all RT related fields to zero.
+  */
+ static void
+-acl_build_reset(struct rte_acl_ctx *ctx)
++acl_build_reset(struct rte_acl_ctx *ctx, const struct rte_acl_config *cfg)
+ {
+-	rte_free(ctx->mem);
++	if (cfg->running_free)
++		cfg->running_free(ctx->mem, cfg->running_cb_ctx);
++	else
++		rte_free(ctx->mem);
+ 	memset(&ctx->num_categories, 0,
+ 		sizeof(*ctx) - offsetof(struct rte_acl_ctx, num_categories));
+ }
+@@ -1518,6 +1521,9 @@ acl_bld(struct acl_build_context *bcx, struct rte_acl_ctx *ctx,
+ 	bcx->acx = ctx;
+ 	bcx->pool.alignment = ACL_POOL_ALIGN;
+ 	bcx->pool.min_alloc = ACL_POOL_ALLOC_MIN;
++	bcx->pool.alloc_cb = cfg->temp_alloc;
++	bcx->pool.reset_cb = cfg->temp_reset;
++	bcx->pool.cb_ctx = cfg->temp_cb_ctx;
+ 	bcx->cfg = *cfg;
+ 	bcx->category_mask = RTE_LEN2MASK(bcx->cfg.num_categories,
+ 		typeof(bcx->category_mask));
+@@ -1635,7 +1641,7 @@ rte_acl_build(struct rte_acl_ctx *ctx, const struct rte_acl_config *cfg)
+ 	if (rc != 0)
+ 		return rc;
+ 
+-	acl_build_reset(ctx);
++	acl_build_reset(ctx, cfg);
+ 
+ 	if (cfg->max_size == 0) {
+ 		n = NODE_MIN;
+@@ -1655,7 +1661,7 @@ rte_acl_build(struct rte_acl_ctx *ctx, const struct rte_acl_config *cfg)
+ 			rc = rte_acl_gen(ctx, bcx.tries, bcx.bld_tries,
+ 				bcx.num_tries, bcx.cfg.num_categories,
+ 				ACL_MAX_INDEXES * RTE_DIM(bcx.tries) *
+-				sizeof(ctx->data_indexes[0]), max_size);
++				sizeof(ctx->data_indexes[0]), max_size, cfg);
+ 			if (rc == 0) {
+ 				/* set data indexes. */
+ 				acl_set_data_indexes(ctx);
+diff --git a/lib/acl/acl_gen.c b/lib/acl/acl_gen.c
+index 3c53d24056..6aa7d74635 100644
+--- a/lib/acl/acl_gen.c
++++ b/lib/acl/acl_gen.c
+@@ -448,7 +448,8 @@ acl_calc_counts_indices(struct acl_node_counters *counts,
+ int
+ rte_acl_gen(struct rte_acl_ctx *ctx, struct rte_acl_trie *trie,
+ 	struct rte_acl_bld_trie *node_bld_trie, uint32_t num_tries,
+-	uint32_t num_categories, uint32_t data_index_sz, size_t max_size)
++	uint32_t num_categories, uint32_t data_index_sz, size_t max_size,
++	const struct rte_acl_config *cfg)
+ {
+ 	void *mem;
+ 	size_t total_size;
+@@ -478,7 +479,10 @@ rte_acl_gen(struct rte_acl_ctx *ctx, struct rte_acl_trie *trie,
+ 		return -ERANGE;
+ 	}
+ 
+-	mem = rte_zmalloc_socket(ctx->name, total_size, RTE_CACHE_LINE_SIZE,
++	if (cfg->running_alloc)
++		mem = cfg->running_alloc(total_size, RTE_CACHE_LINE_SIZE, cfg->running_cb_ctx);
++	else
++		mem = rte_zmalloc_socket(ctx->name, total_size, RTE_CACHE_LINE_SIZE,
+ 			ctx->socket_id);
+ 	if (mem == NULL) {
+ 		ACL_LOG(ERR,
+diff --git a/lib/acl/rte_acl.c b/lib/acl/rte_acl.c
+index 8c0ca29618..e765c40f4f 100644
+--- a/lib/acl/rte_acl.c
++++ b/lib/acl/rte_acl.c
+@@ -362,7 +362,10 @@ rte_acl_free(struct rte_acl_ctx *ctx)
+ 
+ 	rte_mcfg_tailq_write_unlock();
+ 
+-	rte_free(ctx->mem);
++	if (ctx->config.running_free)
++		ctx->config.running_free(ctx->mem, ctx->config.running_cb_ctx);
++	else
++		rte_free(ctx->mem);
+ 	rte_free(ctx);
+ 	rte_free(te);
+ }
+diff --git a/lib/acl/rte_acl.h b/lib/acl/rte_acl.h
+index 95354cabb8..c675c9ff81 100644
+--- a/lib/acl/rte_acl.h
++++ b/lib/acl/rte_acl.h
+@@ -13,6 +13,7 @@
+ 
+ #include <rte_common.h>
+ #include <rte_acl_osdep.h>
++#include <setjmp.h>
+ 
+ #ifdef __cplusplus
+ extern "C" {
+@@ -61,6 +62,11 @@ struct rte_acl_field_def {
+  * ACL build configuration.
+  * Defines the fields of an ACL trie and number of categories to build with.
+  */
++typedef void *(*rte_acl_running_alloc_t)(size_t, unsigned int, void *);
++typedef void  (*rte_acl_running_free_t)(void *, void *);
++typedef void *(*rte_acl_temp_alloc_t)(size_t, sigjmp_buf, void *);
++typedef void  (*rte_acl_temp_reset_t)(void *);
++
+ struct rte_acl_config {
+ 	uint32_t num_categories; /**< Number of categories to build with. */
+ 	uint32_t num_fields;     /**< Number of field definitions. */
+@@ -68,6 +74,20 @@ struct rte_acl_config {
+ 	/**< array of field definitions. */
+ 	size_t max_size;
+ 	/**< max memory limit for internal run-time structures. */
++
++	/**< Allocator callback for run-time internal memory. */
++	rte_acl_running_alloc_t  running_alloc;
++	/**< Free callback for run-time internal memory. */
++	rte_acl_running_free_t   running_free;
++	/**< User context passed to running_alloc/free. */
++	void                     *running_cb_ctx;
++
++	/**< Allocator callback for temporary memory used during build. */
++	rte_acl_temp_alloc_t     temp_alloc;
++	/**< Reset callback for temporary allocator. */
++	rte_acl_temp_reset_t     temp_reset;
++	/**< User context passed to temp_alloc/reset. */
++	void                     *temp_cb_ctx;
+ };
+ 
+ /**
+diff --git a/lib/acl/tb_mem.c b/lib/acl/tb_mem.c
+index 9264433422..b9c69b563e 100644
+--- a/lib/acl/tb_mem.c
++++ b/lib/acl/tb_mem.c
+@@ -55,6 +55,9 @@ tb_alloc(struct tb_mem_pool *pool, size_t size)
+ 
+ 	size = RTE_ALIGN_CEIL(size, pool->alignment);
+ 
++	if (pool->alloc_cb)
++		return pool->alloc_cb(size, pool->fail, pool->cb_ctx);
++
+ 	block = pool->block;
+ 	if (block == NULL || block->size < size) {
+ 		new_sz = (size > pool->min_alloc) ? size : pool->min_alloc;
+@@ -71,6 +74,11 @@ tb_free_pool(struct tb_mem_pool *pool)
+ {
+ 	struct tb_mem_block *next, *block;
+ 
++	if (pool->reset_cb) {
++		pool->reset_cb(pool->cb_ctx);
++		return;
++	}
++
+ 	for (block = pool->block; block != NULL; block = next) {
+ 		next = block->next;
+ 		free(block);
+diff --git a/lib/acl/tb_mem.h b/lib/acl/tb_mem.h
+index 2093744a6d..2fdebefc31 100644
+--- a/lib/acl/tb_mem.h
++++ b/lib/acl/tb_mem.h
+@@ -24,11 +24,17 @@ struct tb_mem_block {
+ 	uint8_t             *mem;
+ };
+ 
++typedef void *(*rte_tb_alloc_t)(size_t, sigjmp_buf, void *);
++typedef void (*rte_tb_reset_t)(void *);
++
+ struct tb_mem_pool {
+ 	struct tb_mem_block *block;
+ 	size_t               alignment;
+ 	size_t               min_alloc;
+ 	size_t               alloc;
++	rte_tb_alloc_t       alloc_cb;
++	rte_tb_reset_t       reset_cb;
++	void                 *cb_ctx;
+ 	/* jump target in case of memory allocation failure. */
+ 	sigjmp_buf           fail;
+ };
+-- 
+2.43.0
+
diff --git a/app/test/test_acl.c b/app/test/test_acl.c
index 43d13b5b0f..9c6ed34f0c 100644
--- a/app/test/test_acl.c
+++ b/app/test/test_acl.c
@@ -1721,6 +1721,184 @@ test_u32_range(void)
 	return rc;
 }
 
+struct acl_ctx_wrapper_t {
+	struct rte_acl_ctx *ctx;
+	void *running_buf;
+	bool running_buf_using;
+};
+
+struct acl_temp_mem_mgr_t {
+	void *buf;
+	uint32_t buf_used;
+	sigjmp_buf fail;
+};
+
+struct acl_ctx_wrapper_t g_acl_ctx_wrapper;
+struct acl_temp_mem_mgr_t g_temp_mem_mgr;
+
+#define ACL_RUNNING_BUF_SIZE (10 * 1024 * 1024)
+#define ACL_TEMP_BUF_SIZE (10 * 1024 * 1024)
+
+static void *running_alloc(size_t size, unsigned int align, void *cb_data)
+{
+	(void)align;
+	struct acl_ctx_wrapper_t *gwlb_acl_ctx = (struct acl_ctx_wrapper_t *)cb_data;
+	if (gwlb_acl_ctx->running_buf_using)
+		return NULL;
+	printf("running memory alloc for acl context, size=%zu, pointer=%p\n",
+		size,
+		gwlb_acl_ctx->running_buf);
+	gwlb_acl_ctx->running_buf_using = true;
+	return gwlb_acl_ctx->running_buf;
+}
+
+static void running_free(void *buf, void *cb_data)
+{
+	if (!buf)
+		return;
+	struct acl_ctx_wrapper_t *gwlb_acl_ctx = (struct acl_ctx_wrapper_t *)cb_data;
+	printf("running memory free pointer=%p\n", buf);
+	gwlb_acl_ctx->running_buf_using = false;
+}
+
+static void *temp_alloc(size_t size, sigjmp_buf fail, void *cb_data)
+{
+	struct acl_temp_mem_mgr_t *gwlb_acl_build = (struct acl_temp_mem_mgr_t *)cb_data;
+	if (ACL_TEMP_BUF_SIZE - gwlb_acl_build->buf_used < size) {
+		printf("Line %i: alloc temp memory fail, size=%zu, used=%d\n",
+			__LINE__,
+			size,
+			gwlb_acl_build->buf_used);
+		siglongjmp(fail, -ENOMEM);
+		return NULL;
+	}
+	void *ret = (char *)gwlb_acl_build->buf + gwlb_acl_build->buf_used;
+	gwlb_acl_build->buf_used += size;
+	return ret;
+}
+
+static void temp_reset(void *cb_data)
+{
+	struct acl_temp_mem_mgr_t *gwlb_acl_build = (struct acl_temp_mem_mgr_t *)cb_data;
+	memset(gwlb_acl_build->buf, 0, ACL_TEMP_BUF_SIZE);
+	printf("temp memory reset, used total=%u\n", gwlb_acl_build->buf_used);
+	gwlb_acl_build->buf_used = 0;
+}
+
+static int
+rte_acl_ipv4vlan_build_wich_mem_cb(struct rte_acl_ctx *ctx,
+	const uint32_t layout[RTE_ACL_IPV4VLAN_NUM],
+	uint32_t num_categories)
+{
+	struct rte_acl_config cfg;
+
+	if (ctx == NULL || layout == NULL)
+		return -EINVAL;
+
+	memset(&cfg, 0, sizeof(cfg));
+	acl_ipv4vlan_config(&cfg, layout, num_categories);
+	cfg.running_alloc = running_alloc;
+	cfg.running_free = running_free;
+	cfg.running_cb_ctx = &g_acl_ctx_wrapper;
+	cfg.temp_alloc = temp_alloc;
+	cfg.temp_reset = temp_reset;
+	cfg.temp_cb_ctx = &g_temp_mem_mgr;
+	return rte_acl_build(ctx, &cfg);
+}
+
+static int
+test_classify_buid_wich_mem_cb(struct rte_acl_ctx *acx,
+	const struct rte_acl_ipv4vlan_rule *rules, uint32_t num)
+{
+	int ret;
+
+	/* add rules to the context */
+	ret = rte_acl_ipv4vlan_add_rules(acx, rules, num);
+	if (ret != 0) {
+		printf("Line %i: Adding rules to ACL context failed!\n",
+			__LINE__);
+		return ret;
+	}
+
+	/* try building the context */
+	ret = rte_acl_ipv4vlan_build_wich_mem_cb(acx, ipv4_7tuple_layout,
+		RTE_ACL_MAX_CATEGORIES);
+	if (ret != 0) {
+		printf("Line %i: Building ACL context failed!\n", __LINE__);
+		return ret;
+	}
+
+	return 0;
+}
+
+static int
+test_mem_cb(void)
+{
+	int i, ret;
+	g_acl_ctx_wrapper.ctx = rte_acl_create(&acl_param);
+	if (g_acl_ctx_wrapper.ctx == NULL) {
+		printf("Line %i: Error creating ACL context!\n", __LINE__);
+		return -1;
+	}
+	g_acl_ctx_wrapper.running_buf = rte_zmalloc_socket(
+		"test_acl",
+		ACL_RUNNING_BUF_SIZE,
+		RTE_CACHE_LINE_SIZE,
+		SOCKET_ID_ANY);
+	if (!g_acl_ctx_wrapper.running_buf) {
+		printf("Line %i: Error allocing running buf for acl context!\n", __LINE__);
+		return 1;
+	}
+	g_acl_ctx_wrapper.running_buf_using = false;
+
+	g_temp_mem_mgr.buf = malloc(ACL_TEMP_BUF_SIZE);
+	if (!g_temp_mem_mgr.buf)
+		printf("Line %i: Error allocing teem buf for acl build!\n", __LINE__);
+	memset(g_temp_mem_mgr.buf, 0, ACL_TEMP_BUF_SIZE);
+	g_temp_mem_mgr.buf_used = 0;
+
+	ret = 0;
+	for (i = 0; i != TEST_CLASSIFY_ITER; i++) {
+
+		if ((i & 1) == 0)
+			rte_acl_reset(g_acl_ctx_wrapper.ctx);
+		else
+			rte_acl_reset_rules(g_acl_ctx_wrapper.ctx);
+
+		ret = test_classify_buid_wich_mem_cb(g_acl_ctx_wrapper.ctx, acl_test_rules,
+			RTE_DIM(acl_test_rules));
+		if (ret != 0) {
+			printf("Line %i, iter: %d: "
+				"Adding rules to ACL context failed!\n",
+				__LINE__, i);
+			break;
+		}
+
+		ret = test_classify_run(g_acl_ctx_wrapper.ctx, acl_test_data,
+			RTE_DIM(acl_test_data));
+		if (ret != 0) {
+			printf("Line %i, iter: %d: %s failed!\n",
+				__LINE__, i, __func__);
+			break;
+		}
+
+		/* reset rules and make sure that classify still works ok. */
+		rte_acl_reset_rules(g_acl_ctx_wrapper.ctx);
+		ret = test_classify_run(g_acl_ctx_wrapper.ctx, acl_test_data,
+			RTE_DIM(acl_test_data));
+		if (ret != 0) {
+			printf("Line %i, iter: %d: %s failed!\n",
+				__LINE__, i, __func__);
+			break;
+		}
+	}
+
+	rte_acl_free(g_acl_ctx_wrapper.ctx);
+	free(g_temp_mem_mgr.buf);
+	rte_free(g_acl_ctx_wrapper.running_buf);
+	return ret;
+}
+
 static int
 test_acl(void)
 {
@@ -1742,7 +1920,8 @@ test_acl(void)
 		return -1;
 	if (test_u32_range() < 0)
 		return -1;
-
+	if (test_mem_cb() < 0)
+		return -1;
 	return 0;
 }
 
diff --git a/lib/acl/acl.h b/lib/acl/acl.h
index c8e4e72fab..7080fff64d 100644
--- a/lib/acl/acl.h
+++ b/lib/acl/acl.h
@@ -189,7 +189,8 @@ struct rte_acl_ctx {
 
 int rte_acl_gen(struct rte_acl_ctx *ctx, struct rte_acl_trie *trie,
 	struct rte_acl_bld_trie *node_bld_trie, uint32_t num_tries,
-	uint32_t num_categories, uint32_t data_index_sz, size_t max_size);
+	uint32_t num_categories, uint32_t data_index_sz, size_t max_size,
+	const struct rte_acl_config *cfg);
 
 typedef int (*rte_acl_classify_t)
 (const struct rte_acl_ctx *, const uint8_t **, uint32_t *, uint32_t, uint32_t);
diff --git a/lib/acl/acl_bld.c b/lib/acl/acl_bld.c
index 7056b1c117..1fd0ee3aa5 100644
--- a/lib/acl/acl_bld.c
+++ b/lib/acl/acl_bld.c
@@ -777,9 +777,12 @@ acl_merge_trie(struct acl_build_context *context,
  *  - reset all RT related fields to zero.
  */
 static void
-acl_build_reset(struct rte_acl_ctx *ctx)
+acl_build_reset(struct rte_acl_ctx *ctx, const struct rte_acl_config *cfg)
 {
-	rte_free(ctx->mem);
+	if (cfg->running_free)
+		cfg->running_free(ctx->mem, cfg->running_cb_ctx);
+	else
+		rte_free(ctx->mem);
 	memset(&ctx->num_categories, 0,
 		sizeof(*ctx) - offsetof(struct rte_acl_ctx, num_categories));
 }
@@ -1518,6 +1521,9 @@ acl_bld(struct acl_build_context *bcx, struct rte_acl_ctx *ctx,
 	bcx->acx = ctx;
 	bcx->pool.alignment = ACL_POOL_ALIGN;
 	bcx->pool.min_alloc = ACL_POOL_ALLOC_MIN;
+	bcx->pool.alloc_cb = cfg->temp_alloc;
+	bcx->pool.reset_cb = cfg->temp_reset;
+	bcx->pool.cb_ctx = cfg->temp_cb_ctx;
 	bcx->cfg = *cfg;
 	bcx->category_mask = RTE_LEN2MASK(bcx->cfg.num_categories,
 		typeof(bcx->category_mask));
@@ -1635,7 +1641,7 @@ rte_acl_build(struct rte_acl_ctx *ctx, const struct rte_acl_config *cfg)
 	if (rc != 0)
 		return rc;
 
-	acl_build_reset(ctx);
+	acl_build_reset(ctx, cfg);
 
 	if (cfg->max_size == 0) {
 		n = NODE_MIN;
@@ -1655,7 +1661,7 @@ rte_acl_build(struct rte_acl_ctx *ctx, const struct rte_acl_config *cfg)
 			rc = rte_acl_gen(ctx, bcx.tries, bcx.bld_tries,
 				bcx.num_tries, bcx.cfg.num_categories,
 				ACL_MAX_INDEXES * RTE_DIM(bcx.tries) *
-				sizeof(ctx->data_indexes[0]), max_size);
+				sizeof(ctx->data_indexes[0]), max_size, cfg);
 			if (rc == 0) {
 				/* set data indexes. */
 				acl_set_data_indexes(ctx);
diff --git a/lib/acl/acl_gen.c b/lib/acl/acl_gen.c
index 3c53d24056..6aa7d74635 100644
--- a/lib/acl/acl_gen.c
+++ b/lib/acl/acl_gen.c
@@ -448,7 +448,8 @@ acl_calc_counts_indices(struct acl_node_counters *counts,
 int
 rte_acl_gen(struct rte_acl_ctx *ctx, struct rte_acl_trie *trie,
 	struct rte_acl_bld_trie *node_bld_trie, uint32_t num_tries,
-	uint32_t num_categories, uint32_t data_index_sz, size_t max_size)
+	uint32_t num_categories, uint32_t data_index_sz, size_t max_size,
+	const struct rte_acl_config *cfg)
 {
 	void *mem;
 	size_t total_size;
@@ -478,7 +479,10 @@ rte_acl_gen(struct rte_acl_ctx *ctx, struct rte_acl_trie *trie,
 		return -ERANGE;
 	}
 
-	mem = rte_zmalloc_socket(ctx->name, total_size, RTE_CACHE_LINE_SIZE,
+	if (cfg->running_alloc)
+		mem = cfg->running_alloc(total_size, RTE_CACHE_LINE_SIZE, cfg->running_cb_ctx);
+	else
+		mem = rte_zmalloc_socket(ctx->name, total_size, RTE_CACHE_LINE_SIZE,
 			ctx->socket_id);
 	if (mem == NULL) {
 		ACL_LOG(ERR,
diff --git a/lib/acl/rte_acl.c b/lib/acl/rte_acl.c
index 8c0ca29618..e765c40f4f 100644
--- a/lib/acl/rte_acl.c
+++ b/lib/acl/rte_acl.c
@@ -362,7 +362,10 @@ rte_acl_free(struct rte_acl_ctx *ctx)
 
 	rte_mcfg_tailq_write_unlock();
 
-	rte_free(ctx->mem);
+	if (ctx->config.running_free)
+		ctx->config.running_free(ctx->mem, ctx->config.running_cb_ctx);
+	else
+		rte_free(ctx->mem);
 	rte_free(ctx);
 	rte_free(te);
 }
diff --git a/lib/acl/rte_acl.h b/lib/acl/rte_acl.h
index 95354cabb8..c675c9ff81 100644
--- a/lib/acl/rte_acl.h
+++ b/lib/acl/rte_acl.h
@@ -13,6 +13,7 @@
 
 #include <rte_common.h>
 #include <rte_acl_osdep.h>
+#include <setjmp.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -61,6 +62,11 @@ struct rte_acl_field_def {
  * ACL build configuration.
  * Defines the fields of an ACL trie and number of categories to build with.
  */
+typedef void *(*rte_acl_running_alloc_t)(size_t, unsigned int, void *);
+typedef void  (*rte_acl_running_free_t)(void *, void *);
+typedef void *(*rte_acl_temp_alloc_t)(size_t, sigjmp_buf, void *);
+typedef void  (*rte_acl_temp_reset_t)(void *);
+
 struct rte_acl_config {
 	uint32_t num_categories; /**< Number of categories to build with. */
 	uint32_t num_fields;     /**< Number of field definitions. */
@@ -68,6 +74,20 @@ struct rte_acl_config {
 	/**< array of field definitions. */
 	size_t max_size;
 	/**< max memory limit for internal run-time structures. */
+
+	/**< Allocator callback for run-time internal memory. */
+	rte_acl_running_alloc_t  running_alloc;
+	/**< Free callback for run-time internal memory. */
+	rte_acl_running_free_t   running_free;
+	/**< User context passed to running_alloc/free. */
+	void                     *running_cb_ctx;
+
+	/**< Allocator callback for temporary memory used during build. */
+	rte_acl_temp_alloc_t     temp_alloc;
+	/**< Reset callback for temporary allocator. */
+	rte_acl_temp_reset_t     temp_reset;
+	/**< User context passed to temp_alloc/reset. */
+	void                     *temp_cb_ctx;
 };
 
 /**
diff --git a/lib/acl/tb_mem.c b/lib/acl/tb_mem.c
index 9264433422..b9c69b563e 100644
--- a/lib/acl/tb_mem.c
+++ b/lib/acl/tb_mem.c
@@ -55,6 +55,9 @@ tb_alloc(struct tb_mem_pool *pool, size_t size)
 
 	size = RTE_ALIGN_CEIL(size, pool->alignment);
 
+	if (pool->alloc_cb)
+		return pool->alloc_cb(size, pool->fail, pool->cb_ctx);
+
 	block = pool->block;
 	if (block == NULL || block->size < size) {
 		new_sz = (size > pool->min_alloc) ? size : pool->min_alloc;
@@ -71,6 +74,11 @@ tb_free_pool(struct tb_mem_pool *pool)
 {
 	struct tb_mem_block *next, *block;
 
+	if (pool->reset_cb) {
+		pool->reset_cb(pool->cb_ctx);
+		return;
+	}
+
 	for (block = pool->block; block != NULL; block = next) {
 		next = block->next;
 		free(block);
diff --git a/lib/acl/tb_mem.h b/lib/acl/tb_mem.h
index 2093744a6d..2fdebefc31 100644
--- a/lib/acl/tb_mem.h
+++ b/lib/acl/tb_mem.h
@@ -24,11 +24,17 @@ struct tb_mem_block {
 	uint8_t             *mem;
 };
 
+typedef void *(*rte_tb_alloc_t)(size_t, sigjmp_buf, void *);
+typedef void (*rte_tb_reset_t)(void *);
+
 struct tb_mem_pool {
 	struct tb_mem_block *block;
 	size_t               alignment;
 	size_t               min_alloc;
 	size_t               alloc;
+	rte_tb_alloc_t       alloc_cb;
+	rte_tb_reset_t       reset_cb;
+	void                 *cb_ctx;
 	/* jump target in case of memory allocation failure. */
 	sigjmp_buf           fail;
 };
-- 
2.43.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v3] acl: support custom memory allocator
  2025-11-17 12:51 ` Konstantin Ananyev
  2025-11-25  9:40   ` [PATCH] acl: support custom memory allocator mannywang(王永峰)
  2025-11-25 12:06   ` [PATCH v2] " mannywang(王永峰)
@ 2025-11-25 12:14   ` mannywang(王永峰)
  2025-11-25 14:59     ` Stephen Hemminger
                       ` (2 more replies)
  2 siblings, 3 replies; 20+ messages in thread
From: mannywang(王永峰) @ 2025-11-25 12:14 UTC (permalink / raw)
  To: Konstantin Ananyev; +Cc: dev, YongFeng Wang

Reduce memory fragmentation caused by dynamic memory allocations
by allowing users to provide a custom memory allocator.

Add new members to struct rte_acl_config to allow passing custom
allocator callbacks to rte_acl_build:

- running_alloc: allocator callback for run-time internal memory
- running_free: free callback for run-time internal memory
- running_cb_ctx: user-defined context passed to running_alloc/free

- temp_alloc: allocator callback for temporary memory during ACL build
- temp_reset: reset callback for temporary allocator
- temp_cb_ctx: user-defined context passed to temp_alloc/reset

These callbacks allow users to provide their own memory pools or
allocators for both persistent runtime structures and temporary
build-time data.

A typical approach is to pre-allocate a static memory region
for rte_acl_ctx, and to provide a global temporary memory manager
that supports multiple allocations and a single reset during ACL build.

Since tb_mem_pool handles allocation failures using siglongjmp,
temp_alloc follows the same approach for failure handling.

Signed-off-by: YongFeng Wang <mannywang@tencent.com>
---
 app/test/test_acl.c | 181 +++++++++++++++++++++++++++++++++++++++++++-
 lib/acl/acl.h       |   3 +-
 lib/acl/acl_bld.c   |  14 +++-
 lib/acl/acl_gen.c   |   8 +-
 lib/acl/rte_acl.c   |   5 +-
 lib/acl/rte_acl.h   |  20 +++++
 lib/acl/tb_mem.c    |   8 ++
 lib/acl/tb_mem.h    |   6 ++
 8 files changed, 236 insertions(+), 9 deletions(-)

diff --git a/app/test/test_acl.c b/app/test/test_acl.c
index 43d13b5b0f..9c6ed34f0c 100644
--- a/app/test/test_acl.c
+++ b/app/test/test_acl.c
@@ -1721,6 +1721,184 @@ test_u32_range(void)
 	return rc;
 }
 
+struct acl_ctx_wrapper_t {
+	struct rte_acl_ctx *ctx;
+	void *running_buf;
+	bool running_buf_using;
+};
+
+struct acl_temp_mem_mgr_t {
+	void *buf;
+	uint32_t buf_used;
+	sigjmp_buf fail;
+};
+
+struct acl_ctx_wrapper_t g_acl_ctx_wrapper;
+struct acl_temp_mem_mgr_t g_temp_mem_mgr;
+
+#define ACL_RUNNING_BUF_SIZE (10 * 1024 * 1024)
+#define ACL_TEMP_BUF_SIZE (10 * 1024 * 1024)
+
+static void *running_alloc(size_t size, unsigned int align, void *cb_data)
+{
+	(void)align;
+	struct acl_ctx_wrapper_t *gwlb_acl_ctx = (struct acl_ctx_wrapper_t *)cb_data;
+	if (gwlb_acl_ctx->running_buf_using)
+		return NULL;
+	printf("running memory alloc for acl context, size=%zu, pointer=%p\n",
+		size,
+		gwlb_acl_ctx->running_buf);
+	gwlb_acl_ctx->running_buf_using = true;
+	return gwlb_acl_ctx->running_buf;
+}
+
+static void running_free(void *buf, void *cb_data)
+{
+	if (!buf)
+		return;
+	struct acl_ctx_wrapper_t *gwlb_acl_ctx = (struct acl_ctx_wrapper_t *)cb_data;
+	printf("running memory free pointer=%p\n", buf);
+	gwlb_acl_ctx->running_buf_using = false;
+}
+
+static void *temp_alloc(size_t size, sigjmp_buf fail, void *cb_data)
+{
+	struct acl_temp_mem_mgr_t *gwlb_acl_build = (struct acl_temp_mem_mgr_t *)cb_data;
+	if (ACL_TEMP_BUF_SIZE - gwlb_acl_build->buf_used < size) {
+		printf("Line %i: alloc temp memory fail, size=%zu, used=%d\n",
+			__LINE__,
+			size,
+			gwlb_acl_build->buf_used);
+		siglongjmp(fail, -ENOMEM);
+		return NULL;
+	}
+	void *ret = (char *)gwlb_acl_build->buf + gwlb_acl_build->buf_used;
+	gwlb_acl_build->buf_used += size;
+	return ret;
+}
+
+static void temp_reset(void *cb_data)
+{
+	struct acl_temp_mem_mgr_t *gwlb_acl_build = (struct acl_temp_mem_mgr_t *)cb_data;
+	memset(gwlb_acl_build->buf, 0, ACL_TEMP_BUF_SIZE);
+	printf("temp memory reset, used total=%u\n", gwlb_acl_build->buf_used);
+	gwlb_acl_build->buf_used = 0;
+}
+
+static int
+rte_acl_ipv4vlan_build_wich_mem_cb(struct rte_acl_ctx *ctx,
+	const uint32_t layout[RTE_ACL_IPV4VLAN_NUM],
+	uint32_t num_categories)
+{
+	struct rte_acl_config cfg;
+
+	if (ctx == NULL || layout == NULL)
+		return -EINVAL;
+
+	memset(&cfg, 0, sizeof(cfg));
+	acl_ipv4vlan_config(&cfg, layout, num_categories);
+	cfg.running_alloc = running_alloc;
+	cfg.running_free = running_free;
+	cfg.running_cb_ctx = &g_acl_ctx_wrapper;
+	cfg.temp_alloc = temp_alloc;
+	cfg.temp_reset = temp_reset;
+	cfg.temp_cb_ctx = &g_temp_mem_mgr;
+	return rte_acl_build(ctx, &cfg);
+}
+
+static int
+test_classify_buid_wich_mem_cb(struct rte_acl_ctx *acx,
+	const struct rte_acl_ipv4vlan_rule *rules, uint32_t num)
+{
+	int ret;
+
+	/* add rules to the context */
+	ret = rte_acl_ipv4vlan_add_rules(acx, rules, num);
+	if (ret != 0) {
+		printf("Line %i: Adding rules to ACL context failed!\n",
+			__LINE__);
+		return ret;
+	}
+
+	/* try building the context */
+	ret = rte_acl_ipv4vlan_build_wich_mem_cb(acx, ipv4_7tuple_layout,
+		RTE_ACL_MAX_CATEGORIES);
+	if (ret != 0) {
+		printf("Line %i: Building ACL context failed!\n", __LINE__);
+		return ret;
+	}
+
+	return 0;
+}
+
+static int
+test_mem_cb(void)
+{
+	int i, ret;
+	g_acl_ctx_wrapper.ctx = rte_acl_create(&acl_param);
+	if (g_acl_ctx_wrapper.ctx == NULL) {
+		printf("Line %i: Error creating ACL context!\n", __LINE__);
+		return -1;
+	}
+	g_acl_ctx_wrapper.running_buf = rte_zmalloc_socket(
+		"test_acl",
+		ACL_RUNNING_BUF_SIZE,
+		RTE_CACHE_LINE_SIZE,
+		SOCKET_ID_ANY);
+	if (!g_acl_ctx_wrapper.running_buf) {
+		printf("Line %i: Error allocing running buf for acl context!\n", __LINE__);
+		return 1;
+	}
+	g_acl_ctx_wrapper.running_buf_using = false;
+
+	g_temp_mem_mgr.buf = malloc(ACL_TEMP_BUF_SIZE);
+	if (!g_temp_mem_mgr.buf)
+		printf("Line %i: Error allocing teem buf for acl build!\n", __LINE__);
+	memset(g_temp_mem_mgr.buf, 0, ACL_TEMP_BUF_SIZE);
+	g_temp_mem_mgr.buf_used = 0;
+
+	ret = 0;
+	for (i = 0; i != TEST_CLASSIFY_ITER; i++) {
+
+		if ((i & 1) == 0)
+			rte_acl_reset(g_acl_ctx_wrapper.ctx);
+		else
+			rte_acl_reset_rules(g_acl_ctx_wrapper.ctx);
+
+		ret = test_classify_buid_wich_mem_cb(g_acl_ctx_wrapper.ctx, acl_test_rules,
+			RTE_DIM(acl_test_rules));
+		if (ret != 0) {
+			printf("Line %i, iter: %d: "
+				"Adding rules to ACL context failed!\n",
+				__LINE__, i);
+			break;
+		}
+
+		ret = test_classify_run(g_acl_ctx_wrapper.ctx, acl_test_data,
+			RTE_DIM(acl_test_data));
+		if (ret != 0) {
+			printf("Line %i, iter: %d: %s failed!\n",
+				__LINE__, i, __func__);
+			break;
+		}
+
+		/* reset rules and make sure that classify still works ok. */
+		rte_acl_reset_rules(g_acl_ctx_wrapper.ctx);
+		ret = test_classify_run(g_acl_ctx_wrapper.ctx, acl_test_data,
+			RTE_DIM(acl_test_data));
+		if (ret != 0) {
+			printf("Line %i, iter: %d: %s failed!\n",
+				__LINE__, i, __func__);
+			break;
+		}
+	}
+
+	rte_acl_free(g_acl_ctx_wrapper.ctx);
+	free(g_temp_mem_mgr.buf);
+	rte_free(g_acl_ctx_wrapper.running_buf);
+	return ret;
+}
+
 static int
 test_acl(void)
 {
@@ -1742,7 +1920,8 @@ test_acl(void)
 		return -1;
 	if (test_u32_range() < 0)
 		return -1;
-
+	if (test_mem_cb() < 0)
+		return -1;
 	return 0;
 }
 
diff --git a/lib/acl/acl.h b/lib/acl/acl.h
index c8e4e72fab..7080fff64d 100644
--- a/lib/acl/acl.h
+++ b/lib/acl/acl.h
@@ -189,7 +189,8 @@ struct rte_acl_ctx {
 
 int rte_acl_gen(struct rte_acl_ctx *ctx, struct rte_acl_trie *trie,
 	struct rte_acl_bld_trie *node_bld_trie, uint32_t num_tries,
-	uint32_t num_categories, uint32_t data_index_sz, size_t max_size);
+	uint32_t num_categories, uint32_t data_index_sz, size_t max_size,
+	const struct rte_acl_config *cfg);
 
 typedef int (*rte_acl_classify_t)
 (const struct rte_acl_ctx *, const uint8_t **, uint32_t *, uint32_t, uint32_t);
diff --git a/lib/acl/acl_bld.c b/lib/acl/acl_bld.c
index 7056b1c117..1fd0ee3aa5 100644
--- a/lib/acl/acl_bld.c
+++ b/lib/acl/acl_bld.c
@@ -777,9 +777,12 @@ acl_merge_trie(struct acl_build_context *context,
  *  - reset all RT related fields to zero.
  */
 static void
-acl_build_reset(struct rte_acl_ctx *ctx)
+acl_build_reset(struct rte_acl_ctx *ctx, const struct rte_acl_config *cfg)
 {
-	rte_free(ctx->mem);
+	if (cfg->running_free)
+		cfg->running_free(ctx->mem, cfg->running_cb_ctx);
+	else
+		rte_free(ctx->mem);
 	memset(&ctx->num_categories, 0,
 		sizeof(*ctx) - offsetof(struct rte_acl_ctx, num_categories));
 }
@@ -1518,6 +1521,9 @@ acl_bld(struct acl_build_context *bcx, struct rte_acl_ctx *ctx,
 	bcx->acx = ctx;
 	bcx->pool.alignment = ACL_POOL_ALIGN;
 	bcx->pool.min_alloc = ACL_POOL_ALLOC_MIN;
+	bcx->pool.alloc_cb = cfg->temp_alloc;
+	bcx->pool.reset_cb = cfg->temp_reset;
+	bcx->pool.cb_ctx = cfg->temp_cb_ctx;
 	bcx->cfg = *cfg;
 	bcx->category_mask = RTE_LEN2MASK(bcx->cfg.num_categories,
 		typeof(bcx->category_mask));
@@ -1635,7 +1641,7 @@ rte_acl_build(struct rte_acl_ctx *ctx, const struct rte_acl_config *cfg)
 	if (rc != 0)
 		return rc;
 
-	acl_build_reset(ctx);
+	acl_build_reset(ctx, cfg);
 
 	if (cfg->max_size == 0) {
 		n = NODE_MIN;
@@ -1655,7 +1661,7 @@ rte_acl_build(struct rte_acl_ctx *ctx, const struct rte_acl_config *cfg)
 			rc = rte_acl_gen(ctx, bcx.tries, bcx.bld_tries,
 				bcx.num_tries, bcx.cfg.num_categories,
 				ACL_MAX_INDEXES * RTE_DIM(bcx.tries) *
-				sizeof(ctx->data_indexes[0]), max_size);
+				sizeof(ctx->data_indexes[0]), max_size, cfg);
 			if (rc == 0) {
 				/* set data indexes. */
 				acl_set_data_indexes(ctx);
diff --git a/lib/acl/acl_gen.c b/lib/acl/acl_gen.c
index 3c53d24056..6aa7d74635 100644
--- a/lib/acl/acl_gen.c
+++ b/lib/acl/acl_gen.c
@@ -448,7 +448,8 @@ acl_calc_counts_indices(struct acl_node_counters *counts,
 int
 rte_acl_gen(struct rte_acl_ctx *ctx, struct rte_acl_trie *trie,
 	struct rte_acl_bld_trie *node_bld_trie, uint32_t num_tries,
-	uint32_t num_categories, uint32_t data_index_sz, size_t max_size)
+	uint32_t num_categories, uint32_t data_index_sz, size_t max_size,
+	const struct rte_acl_config *cfg)
 {
 	void *mem;
 	size_t total_size;
@@ -478,7 +479,10 @@ rte_acl_gen(struct rte_acl_ctx *ctx, struct rte_acl_trie *trie,
 		return -ERANGE;
 	}
 
-	mem = rte_zmalloc_socket(ctx->name, total_size, RTE_CACHE_LINE_SIZE,
+	if (cfg->running_alloc)
+		mem = cfg->running_alloc(total_size, RTE_CACHE_LINE_SIZE, cfg->running_cb_ctx);
+	else
+		mem = rte_zmalloc_socket(ctx->name, total_size, RTE_CACHE_LINE_SIZE,
 			ctx->socket_id);
 	if (mem == NULL) {
 		ACL_LOG(ERR,
diff --git a/lib/acl/rte_acl.c b/lib/acl/rte_acl.c
index 8c0ca29618..e765c40f4f 100644
--- a/lib/acl/rte_acl.c
+++ b/lib/acl/rte_acl.c
@@ -362,7 +362,10 @@ rte_acl_free(struct rte_acl_ctx *ctx)
 
 	rte_mcfg_tailq_write_unlock();
 
-	rte_free(ctx->mem);
+	if (ctx->config.running_free)
+		ctx->config.running_free(ctx->mem, ctx->config.running_cb_ctx);
+	else
+		rte_free(ctx->mem);
 	rte_free(ctx);
 	rte_free(te);
 }
diff --git a/lib/acl/rte_acl.h b/lib/acl/rte_acl.h
index 95354cabb8..c675c9ff81 100644
--- a/lib/acl/rte_acl.h
+++ b/lib/acl/rte_acl.h
@@ -13,6 +13,7 @@
 
 #include <rte_common.h>
 #include <rte_acl_osdep.h>
+#include <setjmp.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -61,6 +62,11 @@ struct rte_acl_field_def {
  * ACL build configuration.
  * Defines the fields of an ACL trie and number of categories to build with.
  */
+typedef void *(*rte_acl_running_alloc_t)(size_t, unsigned int, void *);
+typedef void  (*rte_acl_running_free_t)(void *, void *);
+typedef void *(*rte_acl_temp_alloc_t)(size_t, sigjmp_buf, void *);
+typedef void  (*rte_acl_temp_reset_t)(void *);
+
 struct rte_acl_config {
 	uint32_t num_categories; /**< Number of categories to build with. */
 	uint32_t num_fields;     /**< Number of field definitions. */
@@ -68,6 +74,20 @@ struct rte_acl_config {
 	/**< array of field definitions. */
 	size_t max_size;
 	/**< max memory limit for internal run-time structures. */
+
+	/**< Allocator callback for run-time internal memory. */
+	rte_acl_running_alloc_t  running_alloc;
+	/**< Free callback for run-time internal memory. */
+	rte_acl_running_free_t   running_free;
+	/**< User context passed to running_alloc/free. */
+	void                     *running_cb_ctx;
+
+	/**< Allocator callback for temporary memory used during build. */
+	rte_acl_temp_alloc_t     temp_alloc;
+	/**< Reset callback for temporary allocator. */
+	rte_acl_temp_reset_t     temp_reset;
+	/**< User context passed to temp_alloc/reset. */
+	void                     *temp_cb_ctx;
 };
 
 /**
diff --git a/lib/acl/tb_mem.c b/lib/acl/tb_mem.c
index 9264433422..b9c69b563e 100644
--- a/lib/acl/tb_mem.c
+++ b/lib/acl/tb_mem.c
@@ -55,6 +55,9 @@ tb_alloc(struct tb_mem_pool *pool, size_t size)
 
 	size = RTE_ALIGN_CEIL(size, pool->alignment);
 
+	if (pool->alloc_cb)
+		return pool->alloc_cb(size, pool->fail, pool->cb_ctx);
+
 	block = pool->block;
 	if (block == NULL || block->size < size) {
 		new_sz = (size > pool->min_alloc) ? size : pool->min_alloc;
@@ -71,6 +74,11 @@ tb_free_pool(struct tb_mem_pool *pool)
 {
 	struct tb_mem_block *next, *block;
 
+	if (pool->reset_cb) {
+		pool->reset_cb(pool->cb_ctx);
+		return;
+	}
+
 	for (block = pool->block; block != NULL; block = next) {
 		next = block->next;
 		free(block);
diff --git a/lib/acl/tb_mem.h b/lib/acl/tb_mem.h
index 2093744a6d..2fdebefc31 100644
--- a/lib/acl/tb_mem.h
+++ b/lib/acl/tb_mem.h
@@ -24,11 +24,17 @@ struct tb_mem_block {
 	uint8_t             *mem;
 };
 
+typedef void *(*rte_tb_alloc_t)(size_t, sigjmp_buf, void *);
+typedef void (*rte_tb_reset_t)(void *);
+
 struct tb_mem_pool {
 	struct tb_mem_block *block;
 	size_t               alignment;
 	size_t               min_alloc;
 	size_t               alloc;
+	rte_tb_alloc_t       alloc_cb;
+	rte_tb_reset_t       reset_cb;
+	void                 *cb_ctx;
 	/* jump target in case of memory allocation failure. */
 	sigjmp_buf           fail;
 };
-- 
2.43.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3] acl: support custom memory allocator
  2025-11-25 12:14   ` [PATCH v3] " mannywang(王永峰)
@ 2025-11-25 14:59     ` Stephen Hemminger
  2025-11-26  2:37       ` [Internet]Re: " mannywang(王永峰)
  2025-11-25 18:01     ` Dmitry Kozlyuk
  2025-11-28 13:26     ` Konstantin Ananyev
  2 siblings, 1 reply; 20+ messages in thread
From: Stephen Hemminger @ 2025-11-25 14:59 UTC (permalink / raw)
  To: mannywang(王永峰); +Cc: Konstantin Ananyev, dev

On Tue, 25 Nov 2025 12:14:46 +0000
"mannywang(王永峰)" <mannywang@tencent.com> wrote:

> Reduce memory fragmentation caused by dynamic memory allocations
> by allowing users to provide custom memory allocator.
> 
> Add new members to struct rte_acl_config to allow passing custom
> allocator callbacks to rte_acl_build:
> 
> - running_alloc: allocator callback for run-time internal memory
> - running_free: free callback for run-time internal memory
> - running_ctx: user-defined context passed to running_alloc/free
> 
> - temp_alloc: allocator callback for temporary memory during ACL build
> - temp_reset: reset callback for temporary allocator
> - temp_ctx: user-defined context passed to temp_alloc/reset
> 
> These callbacks allow users to provide their own memory pools or
> allocators for both persistent runtime structures and temporary
> build-time data.
> 
> A typical approach is to pre-allocate a static memory region
> for rte_acl_ctx, and to provide a global temporary memory manager
> that supports multipleallocations and a single reset during ACL build.
> 
> Since tb_mem_pool handles allocation failures using siglongjmp,
> temp_alloc follows the same approach for failure handling.
> 
> Signed-off-by: YongFeng Wang <mannywang@tencent.com>

Rather than introduce an API change which can have impacts in many places,
would it be better to fix the underlying rte_malloc implementation?
The allocator in rte_malloc() is simplistic compared to glibc and
other malloc libraries. The other libraries provide better density,
statistics and performance.

Improving rte_malloc() would help all use cases, not just the special
case of busy ACL usage.

The other question is: does the ACL library really need to be storing
this data in huge pages at all? If all it needed was an allocator
for single-process usage, then just using regular malloc would
avoid the whole mess.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3] acl: support custom memory allocator
  2025-11-25 12:14   ` [PATCH v3] " mannywang(王永峰)
  2025-11-25 14:59     ` Stephen Hemminger
@ 2025-11-25 18:01     ` Dmitry Kozlyuk
  2025-11-26  2:44       ` [Internet]Re: " mannywang(王永峰)
  2025-11-28 13:26     ` Konstantin Ananyev
  2 siblings, 1 reply; 20+ messages in thread
From: Dmitry Kozlyuk @ 2025-11-25 18:01 UTC (permalink / raw)
  To: mannywang(王永峰), Konstantin Ananyev; +Cc: dev

On 11/25/25 15:14, mannywang(王永峰) wrote:
> Reduce memory fragmentation caused by dynamic memory allocations
> by allowing users to provide custom memory allocator.
>
> Add new members to struct rte_acl_config to allow passing custom
> allocator callbacks to rte_acl_build:
>
> - running_alloc: allocator callback for run-time internal memory
> - running_free: free callback for run-time internal memory
> - running_ctx: user-defined context passed to running_alloc/free
>
> - temp_alloc: allocator callback for temporary memory during ACL build
> - temp_reset: reset callback for temporary allocator
> - temp_ctx: user-defined context passed to temp_alloc/reset
>
> These callbacks allow users to provide their own memory pools or
> allocators for both persistent runtime structures and temporary
> build-time data.
>
> A typical approach is to pre-allocate a static memory region
> for rte_acl_ctx, and to provide a global temporary memory manager
> that supports multipleallocations and a single reset during ACL build.
>
> Since tb_mem_pool handles allocation failures using siglongjmp,
> temp_alloc follows the same approach for failure handling.

If a static memory region would suffice for runtime memory,
could you have solved the issue using existing API as follows?
1. Allocate memory in any way, may even use `rte_malloc_*()`.
2. Create a new heap using `rte_malloc_heap_create()`.
3. Attach the memory to the heap using `rte_malloc_heap_memory_add()`.
4. Get the heap "socket ID" using `rte_malloc_heap_get_socket()`.
5. Pass the heap "socket ID" to `rte_acl_create()`.
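
For illustration, a rough sketch of those five steps (error handling omitted;
the heap name, rule parameters and sizes are placeholders, and the memory
handed to rte_malloc_heap_memory_add() must meet its page-size/alignment
requirements):

#include <rte_malloc.h>
#include <rte_acl.h>

static struct rte_acl_ctx *
acl_ctx_on_private_heap(void *buf, size_t len, size_t pg_sz)
{
	struct rte_acl_param param = {
		.name = "acl_on_private_heap",
		.rule_size = RTE_ACL_RULE_SZ(RTE_ACL_IPV4VLAN_NUM_FIELDS),
		.max_rule_num = 1024,
	};

	/* step 1: 'buf' was allocated by the caller in any way */
	rte_malloc_heap_create("acl_heap");                        /* step 2 */
	rte_malloc_heap_memory_add("acl_heap", buf, len,
			NULL, 0, pg_sz);                           /* step 3 */
	param.socket_id = rte_malloc_heap_get_socket("acl_heap");  /* step 4 */
	return rte_acl_create(&param);                             /* step 5 */
}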

In https://inbox.dpdk.org/dev/tencent_4125B1322F9238892BFA5F38@qq.com/
you said that the issue is runtime memory fragmentation,
but also did "propose extending the ACL API to support
external memory buffers for the build process".
What is the issue with build-time allocations?


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Internet]Re: [PATCH v3] acl: support custom memory allocator
  2025-11-25 14:59     ` Stephen Hemminger
@ 2025-11-26  2:37       ` mannywang(王永峰)
  0 siblings, 0 replies; 20+ messages in thread
From: mannywang(王永峰) @ 2025-11-26  2:37 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Konstantin Ananyev, dev

Thanks for the review and questions.

In our deployment scenario, long-running services must avoid runtime
allocation/deallocation to ensure stability. We have observed memory
fragmentation in practice when frequent small allocations happen on the
data path. Even with optimized allocators, this behavior accumulates over
time and can lead to unexpected latency spikes.

To address this, our project adopts a pre-allocation model: each acl_ctx
is associated with a sufficiently large memory block during initialization,
and no allocations occur afterwards. This approach has been effective in
eliminating runtime uncertainty in our use case.

The proposed patch enables applications with similar requirements to plug
their memory management strategy into the ACL layer without changing the
core logic. The default behavior remains unchanged.
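
To make that concrete, here is a minimal sketch of the running-memory side
built on the running_alloc/running_free callbacks proposed in this patch
(the names and the CTX_BUF_SIZE bound are ours, not part of the API):

#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CTX_BUF_SIZE (8 * 1024 * 1024)  /* app-chosen worst-case bound */

struct ctx_mem {
	void *buf;      /* reserved once at init, CTX_BUF_SIZE bytes */
	bool  in_use;
};

static void *
running_alloc(size_t size, unsigned int align, void *cb_ctx)
{
	struct ctx_mem *m = cb_ctx;

	(void)align;
	if (m->in_use || size > CTX_BUF_SIZE)
		return NULL;
	m->in_use = true;
	/* default path uses rte_zmalloc_socket(), so hand out zeroed memory too */
	memset(m->buf, 0, size);
	return m->buf;
}

static void
running_free(void *buf, void *cb_ctx)
{
	struct ctx_mem *m = cb_ctx;

	if (buf != NULL)
		m->in_use = false;
}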

On 11/25/2025 10:59 PM, Stephen Hemminger wrote:
> On Tue, 25 Nov 2025 12:14:46 +0000
> "mannywang(王永峰)" <mannywang@tencent.com> wrote:
> 
>> Reduce memory fragmentation caused by dynamic memory allocations
>> by allowing users to provide custom memory allocator.
>>
>> Add new members to struct rte_acl_config to allow passing custom
>> allocator callbacks to rte_acl_build:
>>
>> - running_alloc: allocator callback for run-time internal memory
>> - running_free: free callback for run-time internal memory
>> - running_ctx: user-defined context passed to running_alloc/free
>>
>> - temp_alloc: allocator callback for temporary memory during ACL build
>> - temp_reset: reset callback for temporary allocator
>> - temp_ctx: user-defined context passed to temp_alloc/reset
>>
>> These callbacks allow users to provide their own memory pools or
>> allocators for both persistent runtime structures and temporary
>> build-time data.
>>
>> A typical approach is to pre-allocate a static memory region
>> for rte_acl_ctx, and to provide a global temporary memory manager
>> that supports multipleallocations and a single reset during ACL build.
>>
>> Since tb_mem_pool handles allocation failures using siglongjmp,
>> temp_alloc follows the same approach for failure handling.
>>
>> Signed-off-by: YongFeng Wang <mannywang@tencent.com>
> 
> Rather than introduce an API change which can have impacts in many places;
> would it be better to fix the underlying rte_malloc implementation.
> The allocator in rte_malloc() is simplistic compared to glibc and
> other malloc libraries. The other libraries provide better density,
> statistics and performance.
> 
> Improving rte_malloc() would help all use cases not just the special
> case of busy ACL usage.
> 
> The other question is does ACL library really need to be storing
> this data in huge pages at all? If all it needed was an allocator
> for single process usage, than just using regular malloc would
> avoid the whole mess.
> 



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Internet]Re: [PATCH v3] acl: support custom memory allocator
  2025-11-25 18:01     ` Dmitry Kozlyuk
@ 2025-11-26  2:44       ` =?gb18030?B?bWFubnl3YW5nKM3108C35Sk=?=
  2025-11-26  7:57         ` Dmitry Kozlyuk
  0 siblings, 1 reply; 20+ messages in thread
From: =?gb18030?B?bWFubnl3YW5nKM3108C35Sk=?= @ 2025-11-26  2:44 UTC (permalink / raw)
  To: Dmitry Kozlyuk, Konstantin Ananyev; +Cc: dev

Thanks for sharing this suggestion.

We actually evaluated the heap-based approach before implementing this patch.
It can help in some scenarios, but unfortunately it does not fully solve our
use cases. Specifically:

1. **Heap count / scalability**
    Our application maintains at least ~200 rte_acl_ctx instances (due to the
    total rule count and multi-tenant isolation). Allowing a dedicated heap per
    context would exceed the practical limits of the current rte_malloc heap
    model. The number of heaps that can be created is not unlimited, and
    maintaining hundreds of separate heaps would introduce considerable
    management overhead.

2. **Temporary allocations in build stage**
    During `rte_acl_build`, a significant portion of memory is allocated through
    `calloc()` for internal temporary structures. These allocations are freed
    right after the build completes. Even if runtime memory could come from a
    custom heap, these temporary allocations would still need an independent
    allocator or callback mechanism to avoid fragmentation and repeated
    malloc/free cycles.

The goal of this patch is to provide allocator callbacks so that applications
can apply their own memory model consistently — static region for runtime, and
resettable pool for build — without relying on uncontrolled internal
allocations.

On 11/26/2025 2:01 AM, Dmitry Kozlyuk wrote:
> On 11/25/25 15:14, mannywang(王永峰) wrote:
>> Reduce memory fragmentation caused by dynamic memory allocations
>> by allowing users to provide custom memory allocator.
>>
>> Add new members to struct rte_acl_config to allow passing custom
>> allocator callbacks to rte_acl_build:
>>
>> - running_alloc: allocator callback for run-time internal memory
>> - running_free: free callback for run-time internal memory
>> - running_ctx: user-defined context passed to running_alloc/free
>>
>> - temp_alloc: allocator callback for temporary memory during ACL build
>> - temp_reset: reset callback for temporary allocator
>> - temp_ctx: user-defined context passed to temp_alloc/reset
>>
>> These callbacks allow users to provide their own memory pools or
>> allocators for both persistent runtime structures and temporary
>> build-time data.
>>
>> A typical approach is to pre-allocate a static memory region
>> for rte_acl_ctx, and to provide a global temporary memory manager
>> that supports multipleallocations and a single reset during ACL build.
>>
>> Since tb_mem_pool handles allocation failures using siglongjmp,
>> temp_alloc follows the same approach for failure handling.
> 
> If a static memory region would suffice for runtime memory,
> could you have solved the issue using existing API as follows?
> 1. Allocate memory in any way, may even use `rte_malloc_*()`.
> 2. Create a new heap using `rte_malloc_heap_create()`.
> 3. Attach the memory to the heap using `rte_malloc_heap_memory_add()`.
> 4. Get the heap "socket ID" using `rte_malloc_heap_get_socket()`.
> 5. Pass the heap "socket ID" to `rte_acl_create()`.
> 
> In https://inbox.dpdk.org/dev/tencent_4125B1322F9238892BFA5F38@qq.com/
> you said that the issue is runtime memory fragmentation,
> but also did "propose extending the ACL API to support
> external memory buffers for the build process".
> What is the issue with build-time allocations?
> 
> 



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Internet]Re: [PATCH v3] acl: support custom memory allocator
  2025-11-26  2:44       ` [Internet]Re: " =?gb18030?B?bWFubnl3YW5nKM3108C35Sk=?=
@ 2025-11-26  7:57         ` Dmitry Kozlyuk
  2025-11-26  8:09           ` mannywang(王永峰)
  0 siblings, 1 reply; 20+ messages in thread
From: Dmitry Kozlyuk @ 2025-11-26  7:57 UTC (permalink / raw)
  To: mannywang(王永峰), Konstantin Ananyev; +Cc: dev

On 11/26/25 05:44, mannywang(王永峰) wrote:
> Thanks for sharing this suggestion.
>
> We actually evaluated the heap-based approach before implementing this 
> patch.
> It can help in some scenarios, but unfortunately it does not fully 
> solve our
> use cases. Specifically:
>
> 1. **Heap count / scalability**
>    Our application maintains at least ~200 rte_acl_ctx instances (due 
> to the
>    total rule count and multi-tenant isolation). Allowing a dedicated 
> heap per
>    context would exceed the practical limits of the current rte_malloc 
> heap
>    model. The number of heaps that can be created is not unlimited, and
>    maintaining hundreds of separate heaps would introduce considerable
>    management overhead.
This is a valid point against heaps, thanks.
> 2. **Temporary allocations in build stage**
>    During `rte_acl_build`, a significant portion of memory is 
> allocated through
>    `calloc()` for internal temporary structures. These allocations are 
> freed
>    right after the build completes. Even if runtime memory could come 
> from a
>    custom heap, these temporary allocations would still need an 
> independent
>    allocator or callback mechanism to avoid fragmentation and repeated
>    malloc/free cycles.
I don't understand the build stage issue and why it needs a custom allocator.
What exactly gets fragmented?
Is it the entire process address space, which is practically unlimited?
How does the malloc/free overhead compare to the overall ACL build time?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3] acl: support custom memory allocator
  2025-11-26  7:57         ` Dmitry Kozlyuk
@ 2025-11-26  8:09           ` mannywang(王永峰)
  2025-11-26 21:28             ` Stephen Hemminger
  0 siblings, 1 reply; 20+ messages in thread
From: mannywang(王永峰) @ 2025-11-26  8:09 UTC (permalink / raw)
  To: Dmitry Kozlyuk, Konstantin Ananyev; +Cc: dev

Thanks for the follow-up question.

 > I don't understand the build stage issue and why it needs a custom allocator.

The fragmentation concern does not come from the amount of address space,
but from how the underlying heap allocator manages **large / mid-sized
temporary buffers** that are repeatedly allocated and freed during ACL build.

ACL build allocates many temporary arrays, tables and sorted structures.
Some of them are several MB in size. When these allocations are done via
malloc/calloc, they typically end up in the general heap. Every build
iteration produces a different allocation pattern and size distribution.
Even if the allocations are freed at the end, the internal heap layout is
not restored to a “flat” state. Small holes remain, and future allocation
of large contiguous blocks may fail even if the total free memory is
sufficient.

This becomes a real operational issue in long-running processes.

 > What exactly gets fragmented? Is it the entire process address space, which is practically unlimited?

It is not the address space that is the limiting factor.
It is the **allocator's internal arena**.

Most allocators (glibc malloc, jemalloc, tcmalloc, etc) retain internal
metadata, bins, and split blocks. Their fragmentation behavior accumulates
over time. The process may still have hundreds of MB of “free memory”, but
not in **contiguous regions** that satisfy the next large request.

 > How does malloc/free overhead compare to overall ACL build time?

The cost of malloc/free calls themselves is not the core problem.
The overhead is small relative to the total build time.

The risk is that allocator fragmentation increases unpredictably over a long
deployment, until a large block allocation fails in the data plane.

Our team has seen this exact behavior in production environments.
Because we cannot fully control the allocator state, we prefer a model
with zero dynamic allocation after init:

* persistent runtime structures → pre-allocated static region
* temporary build data → resettable memory pool

This avoids failure modes caused by allocator history and guarantees stable
latency regardless of system uptime or build frequency.
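
As a rough illustration of the resettable pool side (names are placeholders;
this is essentially what the temp_alloc/temp_reset callbacks in the patch's
unit test boil down to):

#include <stdint.h>
#include <stddef.h>

/* arena reserved once at init, grown during one build, reset afterwards */
struct build_arena {
	uint8_t *base;
	size_t   cap;
	size_t   used;
};

static void *
arena_alloc(struct build_arena *a, size_t size)
{
	if (a->cap - a->used < size)
		return NULL;    /* build fails; arena state stays valid */
	void *p = a->base + a->used;
	a->used += size;
	return p;
}

static void
arena_reset(struct build_arena *a)
{
	a->used = 0;            /* one cheap reset per rte_acl_build() */
}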

On 11/26/2025 3:57 PM, Dmitry Kozlyuk wrote:
> On 11/26/25 05:44, mannywang(王永峰) wrote:
>> Thanks for sharing this suggestion.
>>
>> We actually evaluated the heap-based approach before implementing this 
>> patch.
>> It can help in some scenarios, but unfortunately it does not fully 
>> solve our
>> use cases. Specifically:
>>
>> 1. **Heap count / scalability**
>>    Our application maintains at least ~200 rte_acl_ctx instances (due 
>> to the
>>    total rule count and multi-tenant isolation). Allowing a dedicated 
>> heap per
>>    context would exceed the practical limits of the current rte_malloc 
>> heap
>>    model. The number of heaps that can be created is not unlimited, and
>>    maintaining hundreds of separate heaps would introduce considerable
>>    management overhead.
> This is a valid point against heaps, thanks.
>> 2. **Temporary allocations in build stage**
>>    During `rte_acl_build`, a significant portion of memory is 
>> allocated through
>>    `calloc()` for internal temporary structures. These allocations are 
>> freed
>>    right after the build completes. Even if runtime memory could come 
>> from a
>>    custom heap, these temporary allocations would still need an 
>> independent
>>    allocator or callback mechanism to avoid fragmentation and repeated
>>    malloc/free cycles.
> I don't understand the build stage issue and why it needs a custom 
> allocator.
> What exactly gets fragmented?
> It is the entire process address space which is practically unlimited?
> How does is malloc/free overhead compare to the overall ACL build time?
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3] acl: support custom memory allocator
  2025-11-26  8:09           ` mannywang(王永峰)
@ 2025-11-26 21:28             ` Stephen Hemminger
  2025-11-27  2:05               ` [Internet]Re: " =?gb18030?B?bWFubnl3YW5nKM3108C35Sk=?=
  0 siblings, 1 reply; 20+ messages in thread
From: Stephen Hemminger @ 2025-11-26 21:28 UTC (permalink / raw)
  To: mannywang(王永峰)
  Cc: Dmitry Kozlyuk, Konstantin Ananyev, dev

On Wed, 26 Nov 2025 16:09:20 +0800
"mannywang(王永峰)" <mannywang@tencent.com> wrote:

> Thanks for the follow-up question.
> 
>  > I don't understand the build stage issue and why it needs a custom   
> allocator.
> 
> The fragmentation concern does not come from the amount of address space,
> but from how the underlying heap allocator manages **large / mid-sized
> temporary buffers** that are repeatedly allocated and freed during ACL 
> build.
> 
> ACL build allocates many temporary arrays, tables and sorted structures.
> Some of them are several MB in size. When these allocations are done via
> malloc/calloc, they typically end up in the general heap. Every build
> iteration produces a different allocation pattern and size distribution.
> Even if the allocations are freed at the end, the internal heap layout is
> not restored to a “flat” state. Small holes remain, and future allocation of
> large contiguous blocks may fail even if the total free memory is 
> sufficient.
> 
> This becomes a real operational issue in long-running processes.
> 
>  > What exactly gets fragmented? It is the entire process address space   
> which is practically unlimited?
> 
> It is not the address space that is the limiting factor.
> It is the **allocator's internal arena**.
> 
> Most allocators (glibc malloc, jemalloc, tcmalloc, etc) retain internal
> metadata, bins, and split blocks. Their fragmentation behavior accumulates
> over time. The process may still have hundreds of MB of “free memory”, but
> not in **contiguous regions** that satisfy the next large request.
> 
>  > How does malloc/free overhead compare to overall ACL build time?  
> 
> The cost of malloc/free calls themselves is not the core problem.
> The overhead is small relative to the total build time.
> 
> The risk is that allocator fragmentation increases unpredictably over a long
> deployment, until a large block allocation fails in the data plane.
> 
> Our team has seen this exact behavior in production environments.
> Because we cannot fully control the allocator state, we prefer a model
> with zero dynamic allocation after init:
> 
> * persistent runtime structures → pre-allocated static region
> * temporary build data → resettable memory pool
> 
> This avoids failure modes caused by allocator history and guarantees stable
> latency regardless of system uptime or build frequency.
> 
> On 11/26/2025 3:57 PM, Dmitry Kozlyuk wrote:
> > On 11/26/25 05:44, mannywang(王永峰) wrote:  
> >> Thanks for sharing this suggestion.
> >>
> >> We actually evaluated the heap-based approach before implementing this 
> >> patch.
> >> It can help in some scenarios, but unfortunately it does not fully 
> >> solve our
> >> use cases. Specifically:
> >>
> >> 1. **Heap count / scalability**
> >>    Our application maintains at least ~200 rte_acl_ctx instances (due 
> >> to the
> >>    total rule count and multi-tenant isolation). Allowing a dedicated 
> >> heap per
> >>    context would exceed the practical limits of the current rte_malloc 
> >> heap
> >>    model. The number of heaps that can be created is not unlimited, and
> >>    maintaining hundreds of separate heaps would introduce considerable
> >>    management overhead.  
> > This is a valid point against heaps, thanks.  
> >> 2. **Temporary allocations in build stage**
> >>    During `rte_acl_build`, a significant portion of memory is 
> >> allocated through
> >>    `calloc()` for internal temporary structures. These allocations are 
> >> freed
> >>    right after the build completes. Even if runtime memory could come 
> >> from a
> >>    custom heap, these temporary allocations would still need an 
> >> independent
> >>    allocator or callback mechanism to avoid fragmentation and repeated
> >>    malloc/free cycles.  
> > I don't understand the build stage issue and why it needs a custom 
> > allocator.
> > What exactly gets fragmented?
> > It is the entire process address space which is practically unlimited?
> > How does is malloc/free overhead compare to the overall ACL build time?
> >   

I have seen similar issues in other networking software; mostly it is because
glibc wants to avoid expensive compaction. See https://sourceware.org/glibc/wiki/MallocInternals

The solution was to call malloc_trim() at the end of the control transaction.
If the ACL library is doing lots of small allocations, then adding it there
would help.

The effect can also be mitigated by using mallopt() to adjust M_TRIM_THRESHOLD.
There is lots of documentation on the Internet on this.
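
A minimal sketch of that approach (glibc-specific; the threshold value below
is only an example):

#include <malloc.h>

/* after each control-plane transaction, e.g. an ACL rebuild */
static void
control_transaction_done(void)
{
	malloc_trim(0);         /* return free heap pages to the kernel */
}

/* or tune trimming once at startup */
static void
tune_allocator(void)
{
	mallopt(M_TRIM_THRESHOLD, 1024 * 1024);
}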

Another option for some workloads is using an alternative library for malloc.
There are lots of benchmarks on glibc vs tcmalloc vs jemalloc.




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Internet]Re: [PATCH v3] acl: support custom memory allocator
  2025-11-26 21:28             ` Stephen Hemminger
@ 2025-11-27  2:05               ` mannywang(王永峰)
  0 siblings, 0 replies; 20+ messages in thread
From: mannywang(王永峰) @ 2025-11-27  2:05 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Dmitry Kozlyuk, Konstantin Ananyev, dev

Thank you very much for your suggestions. Yes, we could indeed do better
on the dynamic memory management side, and some other team members also
share similar views.

On the other hand, this patch gives users an option: to completely avoid
dynamic memory management, or, to put it more directly, to trade a
sufficiently large (and possibly wasteful) amount of memory for higher
determinism.

On 11/27/2025 5:28 AM, Stephen Hemminger wrote:
> On Wed, 26 Nov 2025 16:09:20 +0800
> "mannywang(王永峰)" <mannywang@tencent.com> wrote:
> 
>> Thanks for the follow-up question.
>>
>>   > I don't understand the build stage issue and why it needs a custom
>> allocator.
>>
>> The fragmentation concern does not come from the amount of address space,
>> but from how the underlying heap allocator manages **large / mid-sized
>> temporary buffers** that are repeatedly allocated and freed during ACL
>> build.
>>
>> ACL build allocates many temporary arrays, tables and sorted structures.
>> Some of them are several MB in size. When these allocations are done via
>> malloc/calloc, they typically end up in the general heap. Every build
>> iteration produces a different allocation pattern and size distribution.
>> Even if the allocations are freed at the end, the internal heap layout is
>> not restored to a “flat” state. Small holes remain, and future allocation of
>> large contiguous blocks may fail even if the total free memory is
>> sufficient.
>>
>> This becomes a real operational issue in long-running processes.
>>
>>   > What exactly gets fragmented? It is the entire process address space
>> which is practically unlimited?
>>
>> It is not the address space that is the limiting factor.
>> It is the **allocator's internal arena**.
>>
>> Most allocators (glibc malloc, jemalloc, tcmalloc, etc) retain internal
>> metadata, bins, and split blocks. Their fragmentation behavior accumulates
>> over time. The process may still have hundreds of MB of “free memory”, but
>> not in **contiguous regions** that satisfy the next large request.
>>
>>   > How does malloc/free overhead compare to overall ACL build time?
>>
>> The cost of malloc/free calls themselves is not the core problem.
>> The overhead is small relative to the total build time.
>>
>> The risk is that allocator fragmentation increases unpredictably over a long
>> deployment, until a large block allocation fails in the data plane.
>>
>> Our team has seen this exact behavior in production environments.
>> Because we cannot fully control the allocator state, we prefer a model
>> with zero dynamic allocation after init:
>>
>> * persistent runtime structures → pre-allocated static region
>> * temporary build data → resettable memory pool
>>
>> This avoids failure modes caused by allocator history and guarantees stable
>> latency regardless of system uptime or build frequency.
>>
>> On 11/26/2025 3:57 PM, Dmitry Kozlyuk wrote:
>>> On 11/26/25 05:44, mannywang(王永峰) wrote:
>>>> Thanks for sharing this suggestion.
>>>>
>>>> We actually evaluated the heap-based approach before implementing this
>>>> patch.
>>>> It can help in some scenarios, but unfortunately it does not fully
>>>> solve our
>>>> use cases. Specifically:
>>>>
>>>> 1. **Heap count / scalability**
>>>>     Our application maintains at least ~200 rte_acl_ctx instances (due
>>>> to the
>>>>     total rule count and multi-tenant isolation). Allowing a dedicated
>>>> heap per
>>>>     context would exceed the practical limits of the current rte_malloc
>>>> heap
>>>>     model. The number of heaps that can be created is not unlimited, and
>>>>     maintaining hundreds of separate heaps would introduce considerable
>>>>     management overhead.
>>> This is a valid point against heaps, thanks.
>>>> 2. **Temporary allocations in build stage**
>>>>     During `rte_acl_build`, a significant portion of memory is
>>>> allocated through
>>>>     `calloc()` for internal temporary structures. These allocations are
>>>> freed
>>>>     right after the build completes. Even if runtime memory could come
>>>> from a
>>>>     custom heap, these temporary allocations would still need an
>>>> independent
>>>>     allocator or callback mechanism to avoid fragmentation and repeated
>>>>     malloc/free cycles.
>>> I don't understand the build stage issue and why it needs a custom
>>> allocator.
>>> What exactly gets fragmented?
>>> It is the entire process address space which is practically unlimited?
>>> How does is malloc/free overhead compare to the overall ACL build time?
>>>    
> 
> I have seen similar issues in other networking software, mostly it is because
> glibc wants to avoid expensive compaction. See https://sourceware.org/glibc/wiki/MallocInternals
> 
> The solution was to call malloc_trim() at the end of control transaction.
> If ACL library is doing lots of small allocations, then adding it there
> would help.
> 
> The effect can also be mitigated by using mallopt to adjust MALLOC_TRIM_THRESHOLD.
> There is lots of documentation on the Internet on this.
> 
> Another option for some workloads is using an alternative library for malloc.
> There are lots of benchmarks on glibc vs tcmalloc vs jemalloc.
> 
> 
> 
> 



^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v3] acl: support custom memory allocator
  2025-11-25 12:14   ` [PATCH v3] " mannywang(王永峰)
  2025-11-25 14:59     ` Stephen Hemminger
  2025-11-25 18:01     ` Dmitry Kozlyuk
@ 2025-11-28 13:26     ` Konstantin Ananyev
  2025-11-28 15:07       ` mannywang(王永峰)
                         ` (2 more replies)
  2 siblings, 3 replies; 20+ messages in thread
From: Konstantin Ananyev @ 2025-11-28 13:26 UTC (permalink / raw)
  To: mannywang(王永峰); +Cc: dev



> Reduce memory fragmentation caused by dynamic memory allocations
> by allowing users to provide custom memory allocator.
> 
> Add new members to struct rte_acl_config to allow passing custom
> allocator callbacks to rte_acl_build:
> 
> - running_alloc: allocator callback for run-time internal memory
> - running_free: free callback for run-time internal memory
> - running_ctx: user-defined context passed to running_alloc/free
> 
> - temp_alloc: allocator callback for temporary memory during ACL build
> - temp_reset: reset callback for temporary allocator
> - temp_ctx: user-defined context passed to temp_alloc/reset
> 
> These callbacks allow users to provide their own memory pools or
> allocators for both persistent runtime structures and temporary
> build-time data.
> 
> A typical approach is to pre-allocate a static memory region
> for rte_acl_ctx, and to provide a global temporary memory manager
> that supports multipleallocations and a single reset during ACL build.
> 
> Since tb_mem_pool handles allocation failures using siglongjmp,
> temp_alloc follows the same approach for failure handling.

Thank you for the patch, though the overall approach looks
a bit overcomplicated to me: in particular I am still not convinced
that we do need a special allocator for temporary build buffers.
Another concern is that 'struct rte_acl_config' is part of the public
API and can't be changed at will: only at the next API/ABI breakage point.
Can I suggest something simpler:

1. Add new public API:
struct rte_acl_mem_cb {
	void *(*zalloc)(void *udata, size_t size, size_t align, int32_t numa_socket);
	void (*free)(void *udata, void *ptr2free);
	void *udata;
};

int rte_acl_set_mem_cb(struct rte_acl_ctx *acl, const struct rte_acl_mem_cb *mcb);
int rte_acl_get_mem_cb(const struct rte_acl_ctx *acl, struct rte_acl_mem_cb *mcb);

and add a 'struct rte_acl_mem_cb' instance into struct rte_acl_ctx.
At rte_acl_create() initialize them into some default functions that will be just stubs
around calling rte_zmalloc_socket()/rte_free().
At acl_gen.c we will have:
-    mem = rte_zmalloc_socket(ctx->name, total_size, RTE_CACHE_LINE_SIZE,
+    mem = ctx->mcb.zalloc(ctx->mcb.udata, total_size, RTE_CACHE_LINE_SIZE,
                        ctx->socket_id);
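
For illustration only, the default stubs could be as trivial as (names here
are placeholders, assuming <rte_malloc.h> and the struct above):

static void *
acl_def_zalloc(void *udata, size_t size, size_t align, int32_t numa_socket)
{
	(void)udata;
	return rte_zmalloc_socket(NULL, size, align, numa_socket);
}

static void
acl_def_free(void *udata, void *ptr2free)
{
	(void)udata;
	rte_free(ptr2free);
}

with rte_acl_create() doing something like:
	ctx->mcb = (struct rte_acl_mem_cb){acl_def_zalloc, acl_def_free, NULL};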

Does it make sense to you?
 
> Signed-off-by: YongFeng Wang <mannywang@tencent.com>
> ---
>  app/test/test_acl.c | 181
> +++++++++++++++++++++++++++++++++++++++++++-
>  lib/acl/acl.h       |   3 +-
>  lib/acl/acl_bld.c   |  14 +++-
>  lib/acl/acl_gen.c   |   8 +-
>  lib/acl/rte_acl.c   |   5 +-
>  lib/acl/rte_acl.h   |  20 +++++
>  lib/acl/tb_mem.c    |   8 ++
>  lib/acl/tb_mem.h    |   6 ++
>  8 files changed, 236 insertions(+), 9 deletions(-)
> 
> diff --git a/app/test/test_acl.c b/app/test/test_acl.c
> index 43d13b5b0f..9c6ed34f0c 100644
> --- a/app/test/test_acl.c
> +++ b/app/test/test_acl.c
> @@ -1721,6 +1721,184 @@ test_u32_range(void)
>  	return rc;
>  }
> 
> +struct acl_ctx_wrapper_t {
> +	struct rte_acl_ctx *ctx;
> +	void *running_buf;
> +	bool running_buf_using;
> +};
> +
> +struct acl_temp_mem_mgr_t {
> +	void *buf;
> +	uint32_t buf_used;
> +	sigjmp_buf fail;
> +};
> +
> +struct acl_ctx_wrapper_t g_acl_ctx_wrapper;
> +struct acl_temp_mem_mgr_t g_temp_mem_mgr;
> +
> +#define ACL_RUNNING_BUF_SIZE (10 * 1024 * 1024)
> +#define ACL_TEMP_BUF_SIZE (10 * 1024 * 1024)
> +
> +static void *running_alloc(size_t size, unsigned int align, void *cb_data)
> +{
> +	(void)align;
> +	struct acl_ctx_wrapper_t *gwlb_acl_ctx = (struct acl_ctx_wrapper_t
> *)cb_data;
> +	if (gwlb_acl_ctx->running_buf_using)
> +		return NULL;
> +	printf("running memory alloc for acl context, size=%zu, pointer=%p\n",
> +		size,
> +		gwlb_acl_ctx->running_buf);
> +	gwlb_acl_ctx->running_buf_using = true;
> +	return gwlb_acl_ctx->running_buf;
> +}
> +
> +static void running_free(void *buf, void *cb_data)
> +{
> +	if (!buf)
> +		return;
> +	struct acl_ctx_wrapper_t *gwlb_acl_ctx = (struct acl_ctx_wrapper_t
> *)cb_data;
> +	printf("running memory free pointer=%p\n", buf);
> +	gwlb_acl_ctx->running_buf_using = false;
> +}
> +
> +static void *temp_alloc(size_t size, sigjmp_buf fail, void *cb_data)
> +{
> +	struct acl_temp_mem_mgr_t *gwlb_acl_build = (struct
> acl_temp_mem_mgr_t *)cb_data;
> +	if (ACL_TEMP_BUF_SIZE - gwlb_acl_build->buf_used < size) {
> +		printf("Line %i: alloc temp memory fail, size=%zu, used=%d\n",
> +			__LINE__,
> +			size,
> +			gwlb_acl_build->buf_used);
> +		siglongjmp(fail, -ENOMEM);
> +		return NULL;
> +	}
> +	void *ret = (char *)gwlb_acl_build->buf + gwlb_acl_build->buf_used;
> +	gwlb_acl_build->buf_used += size;
> +	return ret;
> +}
> +
> +static void temp_reset(void *cb_data)
> +{
> +	struct acl_temp_mem_mgr_t *gwlb_acl_build = (struct
> acl_temp_mem_mgr_t *)cb_data;
> +	memset(gwlb_acl_build->buf, 0, ACL_TEMP_BUF_SIZE);
> +	printf("temp memory reset, used total=%u\n", gwlb_acl_build-
> >buf_used);
> +	gwlb_acl_build->buf_used = 0;
> +}
> +
> +static int
> +rte_acl_ipv4vlan_build_wich_mem_cb(struct rte_acl_ctx *ctx,
> +	const uint32_t layout[RTE_ACL_IPV4VLAN_NUM],
> +	uint32_t num_categories)
> +{
> +	struct rte_acl_config cfg;
> +
> +	if (ctx == NULL || layout == NULL)
> +		return -EINVAL;
> +
> +	memset(&cfg, 0, sizeof(cfg));
> +	acl_ipv4vlan_config(&cfg, layout, num_categories);
> +	cfg.running_alloc = running_alloc;
> +	cfg.running_free = running_free;
> +	cfg.running_cb_ctx = &g_acl_ctx_wrapper;
> +	cfg.temp_alloc = temp_alloc;
> +	cfg.temp_reset = temp_reset;
> +	cfg.temp_cb_ctx = &g_temp_mem_mgr;
> +	return rte_acl_build(ctx, &cfg);
> +}
> +
> +static int
> +test_classify_buid_wich_mem_cb(struct rte_acl_ctx *acx,
> +	const struct rte_acl_ipv4vlan_rule *rules, uint32_t num)
> +{
> +	int ret;
> +
> +	/* add rules to the context */
> +	ret = rte_acl_ipv4vlan_add_rules(acx, rules, num);
> +	if (ret != 0) {
> +		printf("Line %i: Adding rules to ACL context failed!\n",
> +			__LINE__);
> +		return ret;
> +	}
> +
> +	/* try building the context */
> +	ret = rte_acl_ipv4vlan_build_wich_mem_cb(acx, ipv4_7tuple_layout,
> +		RTE_ACL_MAX_CATEGORIES);
> +	if (ret != 0) {
> +		printf("Line %i: Building ACL context failed!\n", __LINE__);
> +		return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +static int
> +test_mem_cb(void)
> +{
> +	int i, ret;
> +	g_acl_ctx_wrapper.ctx = rte_acl_create(&acl_param);
> +	if (g_acl_ctx_wrapper.ctx == NULL) {
> +		printf("Line %i: Error creating ACL context!\n", __LINE__);
> +		return -1;
> +	}
> +	g_acl_ctx_wrapper.running_buf = rte_zmalloc_socket(
> +		"test_acl",
> +		ACL_RUNNING_BUF_SIZE,
> +		RTE_CACHE_LINE_SIZE,
> +		SOCKET_ID_ANY);
> +	if (!g_acl_ctx_wrapper.running_buf) {
> +		printf("Line %i: Error allocing running buf for acl context!\n",
> __LINE__);
> +		return 1;
> +	}
> +	g_acl_ctx_wrapper.running_buf_using = false;
> +
> +	g_temp_mem_mgr.buf = malloc(ACL_TEMP_BUF_SIZE);
> +	if (!g_temp_mem_mgr.buf)
> +		printf("Line %i: Error allocing teem buf for acl build!\n",
> __LINE__);
> +	memset(g_temp_mem_mgr.buf, 0, ACL_TEMP_BUF_SIZE);
> +	g_temp_mem_mgr.buf_used = 0;
> +
> +	ret = 0;
> +	for (i = 0; i != TEST_CLASSIFY_ITER; i++) {
> +
> +		if ((i & 1) == 0)
> +			rte_acl_reset(g_acl_ctx_wrapper.ctx);
> +		else
> +			rte_acl_reset_rules(g_acl_ctx_wrapper.ctx);
> +
> +		ret = test_classify_buid_wich_mem_cb(g_acl_ctx_wrapper.ctx,
> acl_test_rules,
> +			RTE_DIM(acl_test_rules));
> +		if (ret != 0) {
> +			printf("Line %i, iter: %d: "
> +				"Adding rules to ACL context failed!\n",
> +				__LINE__, i);
> +			break;
> +		}
> +
> +		ret = test_classify_run(g_acl_ctx_wrapper.ctx, acl_test_data,
> +			RTE_DIM(acl_test_data));
> +		if (ret != 0) {
> +			printf("Line %i, iter: %d: %s failed!\n",
> +				__LINE__, i, __func__);
> +			break;
> +		}
> +
> +		/* reset rules and make sure that classify still works ok. */
> +		rte_acl_reset_rules(g_acl_ctx_wrapper.ctx);
> +		ret = test_classify_run(g_acl_ctx_wrapper.ctx, acl_test_data,
> +			RTE_DIM(acl_test_data));
> +		if (ret != 0) {
> +			printf("Line %i, iter: %d: %s failed!\n",
> +				__LINE__, i, __func__);
> +			break;
> +		}
> +	}
> +
> +	rte_acl_free(g_acl_ctx_wrapper.ctx);
> +	free(g_temp_mem_mgr.buf);
> +	rte_free(g_acl_ctx_wrapper.running_buf);
> +	return ret;
> +}
> +
>  static int
>  test_acl(void)
>  {
> @@ -1742,7 +1920,8 @@ test_acl(void)
>  		return -1;
>  	if (test_u32_range() < 0)
>  		return -1;
> -
> +	if (test_mem_cb() < 0)
> +		return -1;
>  	return 0;
>  }
> 
> diff --git a/lib/acl/acl.h b/lib/acl/acl.h
> index c8e4e72fab..7080fff64d 100644
> --- a/lib/acl/acl.h
> +++ b/lib/acl/acl.h
> @@ -189,7 +189,8 @@ struct rte_acl_ctx {
> 
>  int rte_acl_gen(struct rte_acl_ctx *ctx, struct rte_acl_trie *trie,
>  	struct rte_acl_bld_trie *node_bld_trie, uint32_t num_tries,
> -	uint32_t num_categories, uint32_t data_index_sz, size_t max_size);
> +	uint32_t num_categories, uint32_t data_index_sz, size_t max_size,
> +	const struct rte_acl_config *cfg);
> 
>  typedef int (*rte_acl_classify_t)
>  (const struct rte_acl_ctx *, const uint8_t **, uint32_t *, uint32_t, uint32_t);
> diff --git a/lib/acl/acl_bld.c b/lib/acl/acl_bld.c
> index 7056b1c117..1fd0ee3aa5 100644
> --- a/lib/acl/acl_bld.c
> +++ b/lib/acl/acl_bld.c
> @@ -777,9 +777,12 @@ acl_merge_trie(struct acl_build_context *context,
>   *  - reset all RT related fields to zero.
>   */
>  static void
> -acl_build_reset(struct rte_acl_ctx *ctx)
> +acl_build_reset(struct rte_acl_ctx *ctx, const struct rte_acl_config *cfg)
>  {
> -	rte_free(ctx->mem);
> +	if (cfg->running_free)
> +		cfg->running_free(ctx->mem, cfg->running_cb_ctx);
> +	else
> +		rte_free(ctx->mem);
>  	memset(&ctx->num_categories, 0,
>  		sizeof(*ctx) - offsetof(struct rte_acl_ctx, num_categories));
>  }
> @@ -1518,6 +1521,9 @@ acl_bld(struct acl_build_context *bcx, struct rte_acl_ctx
> *ctx,
>  	bcx->acx = ctx;
>  	bcx->pool.alignment = ACL_POOL_ALIGN;
>  	bcx->pool.min_alloc = ACL_POOL_ALLOC_MIN;
> +	bcx->pool.alloc_cb = cfg->temp_alloc;
> +	bcx->pool.reset_cb = cfg->temp_reset;
> +	bcx->pool.cb_ctx = cfg->temp_cb_ctx;
>  	bcx->cfg = *cfg;
>  	bcx->category_mask = RTE_LEN2MASK(bcx->cfg.num_categories,
>  		typeof(bcx->category_mask));
> @@ -1635,7 +1641,7 @@ rte_acl_build(struct rte_acl_ctx *ctx, const struct
> rte_acl_config *cfg)
>  	if (rc != 0)
>  		return rc;
> 
> -	acl_build_reset(ctx);
> +	acl_build_reset(ctx, cfg);
> 
>  	if (cfg->max_size == 0) {
>  		n = NODE_MIN;
> @@ -1655,7 +1661,7 @@ rte_acl_build(struct rte_acl_ctx *ctx, const struct
> rte_acl_config *cfg)
>  			rc = rte_acl_gen(ctx, bcx.tries, bcx.bld_tries,
>  				bcx.num_tries, bcx.cfg.num_categories,
>  				ACL_MAX_INDEXES * RTE_DIM(bcx.tries) *
> -				sizeof(ctx->data_indexes[0]), max_size);
> +				sizeof(ctx->data_indexes[0]), max_size, cfg);
>  			if (rc == 0) {
>  				/* set data indexes. */
>  				acl_set_data_indexes(ctx);
> diff --git a/lib/acl/acl_gen.c b/lib/acl/acl_gen.c
> index 3c53d24056..6aa7d74635 100644
> --- a/lib/acl/acl_gen.c
> +++ b/lib/acl/acl_gen.c
> @@ -448,7 +448,8 @@ acl_calc_counts_indices(struct acl_node_counters
> *counts,
>  int
>  rte_acl_gen(struct rte_acl_ctx *ctx, struct rte_acl_trie *trie,
>  	struct rte_acl_bld_trie *node_bld_trie, uint32_t num_tries,
> -	uint32_t num_categories, uint32_t data_index_sz, size_t max_size)
> +	uint32_t num_categories, uint32_t data_index_sz, size_t max_size,
> +	const struct rte_acl_config *cfg)
>  {
>  	void *mem;
>  	size_t total_size;
> @@ -478,7 +479,10 @@ rte_acl_gen(struct rte_acl_ctx *ctx, struct rte_acl_trie
> *trie,
>  		return -ERANGE;
>  	}
> 
> -	mem = rte_zmalloc_socket(ctx->name, total_size,
> RTE_CACHE_LINE_SIZE,
> +	if (cfg->running_alloc)
> +		mem = cfg->running_alloc(total_size, RTE_CACHE_LINE_SIZE, cfg-
> >running_cb_ctx);
> +	else
> +		mem = rte_zmalloc_socket(ctx->name, total_size,
> RTE_CACHE_LINE_SIZE,
>  			ctx->socket_id);
>  	if (mem == NULL) {
>  		ACL_LOG(ERR,
> diff --git a/lib/acl/rte_acl.c b/lib/acl/rte_acl.c
> index 8c0ca29618..e765c40f4f 100644
> --- a/lib/acl/rte_acl.c
> +++ b/lib/acl/rte_acl.c
> @@ -362,7 +362,10 @@ rte_acl_free(struct rte_acl_ctx *ctx)
> 
>  	rte_mcfg_tailq_write_unlock();
> 
> -	rte_free(ctx->mem);
> +	if (ctx->config.running_free)
> +		ctx->config.running_free(ctx->mem, ctx-
> >config.running_cb_ctx);
> +	else
> +		rte_free(ctx->mem);
>  	rte_free(ctx);
>  	rte_free(te);
>  }
> diff --git a/lib/acl/rte_acl.h b/lib/acl/rte_acl.h
> index 95354cabb8..c675c9ff81 100644
> --- a/lib/acl/rte_acl.h
> +++ b/lib/acl/rte_acl.h
> @@ -13,6 +13,7 @@
> 
>  #include <rte_common.h>
>  #include <rte_acl_osdep.h>
> +#include <setjmp.h>
> 
>  #ifdef __cplusplus
>  extern "C" {
> @@ -61,6 +62,11 @@ struct rte_acl_field_def {
>   * ACL build configuration.
>   * Defines the fields of an ACL trie and number of categories to build with.
>   */
> +typedef void *(*rte_acl_running_alloc_t)(size_t, unsigned int, void *);
> +typedef void  (*rte_acl_running_free_t)(void *, void *);
> +typedef void *(*rte_acl_temp_alloc_t)(size_t, sigjmp_buf, void *);
> +typedef void  (*rte_acl_temp_reset_t)(void *);
> +
>  struct rte_acl_config {
>  	uint32_t num_categories; /**< Number of categories to build with. */
>  	uint32_t num_fields;     /**< Number of field definitions. */
> @@ -68,6 +74,20 @@ struct rte_acl_config {
>  	/**< array of field definitions. */
>  	size_t max_size;
>  	/**< max memory limit for internal run-time structures. */
> +
> +	/**< Allocator callback for run-time internal memory. */
> +	rte_acl_running_alloc_t  running_alloc;
> +	/**< Free callback for run-time internal memory. */
> +	rte_acl_running_free_t   running_free;
> +	/**< User context passed to running_alloc/free. */
> +	void                     *running_cb_ctx;
> +
> +	/**< Allocator callback for temporary memory used during build. */
> +	rte_acl_temp_alloc_t     temp_alloc;
> +	/**< Reset callback for temporary allocator. */
> +	rte_acl_temp_reset_t     temp_reset;
> +	/**< User context passed to temp_alloc/reset. */
> +	void                     *temp_cb_ctx;
>  };
> 
>  /**
> diff --git a/lib/acl/tb_mem.c b/lib/acl/tb_mem.c
> index 9264433422..b9c69b563e 100644
> --- a/lib/acl/tb_mem.c
> +++ b/lib/acl/tb_mem.c
> @@ -55,6 +55,9 @@ tb_alloc(struct tb_mem_pool *pool, size_t size)
> 
>  	size = RTE_ALIGN_CEIL(size, pool->alignment);
> 
> +	if (pool->alloc_cb)
> +		return pool->alloc_cb(size, pool->fail, pool->cb_ctx);
> +
>  	block = pool->block;
>  	if (block == NULL || block->size < size) {
>  		new_sz = (size > pool->min_alloc) ? size : pool->min_alloc;
> @@ -71,6 +74,11 @@ tb_free_pool(struct tb_mem_pool *pool)
>  {
>  	struct tb_mem_block *next, *block;
> 
> +	if (pool->reset_cb) {
> +		pool->reset_cb(pool->cb_ctx);
> +		return;
> +	}
> +
>  	for (block = pool->block; block != NULL; block = next) {
>  		next = block->next;
>  		free(block);
> diff --git a/lib/acl/tb_mem.h b/lib/acl/tb_mem.h
> index 2093744a6d..2fdebefc31 100644
> --- a/lib/acl/tb_mem.h
> +++ b/lib/acl/tb_mem.h
> @@ -24,11 +24,17 @@ struct tb_mem_block {
>  	uint8_t             *mem;
>  };
> 
> +typedef void *(*rte_tb_alloc_t)(size_t, sigjmp_buf, void *);
> +typedef void (*rte_tb_reset_t)(void *);
> +
>  struct tb_mem_pool {
>  	struct tb_mem_block *block;
>  	size_t               alignment;
>  	size_t               min_alloc;
>  	size_t               alloc;
> +	rte_tb_alloc_t       alloc_cb;
> +	rte_tb_reset_t       reset_cb;
> +	void                 *cb_ctx;
>  	/* jump target in case of memory allocation failure. */
>  	sigjmp_buf           fail;
>  };
> --
> 2.43.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v3] acl: support custom memory allocator
  2025-11-28 13:26     ` Konstantin Ananyev
@ 2025-11-28 15:07       ` mannywang(王永峰)
  2025-12-01 12:05       ` [PATCH v4] acl: support custom memory allocators in rte_acl_build mannywang(王永峰)
  2025-12-01 12:45       ` [PATCH v5] " mannywang(王永峰)
  2 siblings, 0 replies; 20+ messages in thread
From: mannywang(王永峰) @ 2025-11-28 15:07 UTC (permalink / raw)
  To: Konstantin Ananyev; +Cc: dev


Yes, I will update the patch accordingly and send the next version.


On 11/28/2025 9:26 PM, Konstantin Ananyev wrote:
> 
> 
>> Reduce memory fragmentation caused by dynamic memory allocations
>> by allowing users to provide custom memory allocator.
>>
>> Add new members to struct rte_acl_config to allow passing custom
>> allocator callbacks to rte_acl_build:
>>
>> - running_alloc: allocator callback for run-time internal memory
>> - running_free: free callback for run-time internal memory
>> - running_ctx: user-defined context passed to running_alloc/free
>>
>> - temp_alloc: allocator callback for temporary memory during ACL build
>> - temp_reset: reset callback for temporary allocator
>> - temp_ctx: user-defined context passed to temp_alloc/reset
>>
>> These callbacks allow users to provide their own memory pools or
>> allocators for both persistent runtime structures and temporary
>> build-time data.
>>
>> A typical approach is to pre-allocate a static memory region
>> for rte_acl_ctx, and to provide a global temporary memory manager
>> that supports multipleallocations and a single reset during ACL build.
>>
>> Since tb_mem_pool handles allocation failures using siglongjmp,
>> temp_alloc follows the same approach for failure handling.
> 
> Thank you for the patch, though overall approach looks
> a bit overcomplicated to me: in particular I am still not convinced
> that we do need a special allocator for temporary build buffers.
> Another concern, is that 'struct rte_acl_config' is part of public
> API and can't be changed at will: only at next API/ABI breakage point.
> Can I suggest something more simpler:
> 
> 1. Add new pubic API:
> struct rte_acl_mem_cb {
>      void (*zalloc)(void *udata, size_t size, size_t align, int32_t numa_socket);
>     void (*free)( void *udata, void *ptr2free);
>     void *udata;
> };
>      
> int rte_acl_set_mem_cb(struct rte_acl_ctx *acl, const struct struct rte_acl_mem_ctx *mcb);
> int rte_acl_get_mem_cb(const struct rte_acl_ctx *acl, struct struct rte_acl_mem_ctx *mcb);
> 
> and add ' struct rte_acl_mem_cb' instance into struct rte_acl_ctx.
> At  rte_acl_create() initialize them into some default functions that will be just a stubs
> around calling rte_zmallo_socket()/rte_free().
> At acl_gen.c we will have:
> -    mem = rte_zmalloc_socket(ctx->name, total_size, RTE_CACHE_LINE_SIZE,
> +   mem = ctx->mcb.zmalloc(ctx->mcb.udata, total_size, RTE_CACHE_LINE_SIZE,
>                          ctx->socket_id);
> 
> Does it make sense to you?
>   
>> Signed-off-by: YongFeng Wang <mannywang@tencent.com>
>> ---
>>   app/test/test_acl.c | 181
>> +++++++++++++++++++++++++++++++++++++++++++-
>>   lib/acl/acl.h       |   3 +-
>>   lib/acl/acl_bld.c   |  14 +++-
>>   lib/acl/acl_gen.c   |   8 +-
>>   lib/acl/rte_acl.c   |   5 +-
>>   lib/acl/rte_acl.h   |  20 +++++
>>   lib/acl/tb_mem.c    |   8 ++
>>   lib/acl/tb_mem.h    |   6 ++
>>   8 files changed, 236 insertions(+), 9 deletions(-)
>>
>> diff --git a/app/test/test_acl.c b/app/test/test_acl.c
>> index 43d13b5b0f..9c6ed34f0c 100644
>> --- a/app/test/test_acl.c
>> +++ b/app/test/test_acl.c
>> @@ -1721,6 +1721,184 @@ test_u32_range(void)
>>   	return rc;
>>   }
>>
>> +struct acl_ctx_wrapper_t {
>> +	struct rte_acl_ctx *ctx;
>> +	void *running_buf;
>> +	bool running_buf_using;
>> +};
>> +
>> +struct acl_temp_mem_mgr_t {
>> +	void *buf;
>> +	uint32_t buf_used;
>> +	sigjmp_buf fail;
>> +};
>> +
>> +struct acl_ctx_wrapper_t g_acl_ctx_wrapper;
>> +struct acl_temp_mem_mgr_t g_temp_mem_mgr;
>> +
>> +#define ACL_RUNNING_BUF_SIZE (10 * 1024 * 1024)
>> +#define ACL_TEMP_BUF_SIZE (10 * 1024 * 1024)
>> +
>> +static void *running_alloc(size_t size, unsigned int align, void *cb_data)
>> +{
>> +	(void)align;
>> +	struct acl_ctx_wrapper_t *gwlb_acl_ctx = (struct acl_ctx_wrapper_t
>> *)cb_data;
>> +	if (gwlb_acl_ctx->running_buf_using)
>> +		return NULL;
>> +	printf("running memory alloc for acl context, size=%zu, pointer=%p\n",
>> +		size,
>> +		gwlb_acl_ctx->running_buf);
>> +	gwlb_acl_ctx->running_buf_using = true;
>> +	return gwlb_acl_ctx->running_buf;
>> +}
>> +
>> +static void running_free(void *buf, void *cb_data)
>> +{
>> +	if (!buf)
>> +		return;
>> +	struct acl_ctx_wrapper_t *gwlb_acl_ctx = (struct acl_ctx_wrapper_t
>> *)cb_data;
>> +	printf("running memory free pointer=%p\n", buf);
>> +	gwlb_acl_ctx->running_buf_using = false;
>> +}
>> +
>> +static void *temp_alloc(size_t size, sigjmp_buf fail, void *cb_data)
>> +{
>> +	struct acl_temp_mem_mgr_t *gwlb_acl_build = (struct
>> acl_temp_mem_mgr_t *)cb_data;
>> +	if (ACL_TEMP_BUF_SIZE - gwlb_acl_build->buf_used < size) {
>> +		printf("Line %i: alloc temp memory fail, size=%zu, used=%d\n",
>> +			__LINE__,
>> +			size,
>> +			gwlb_acl_build->buf_used);
>> +		siglongjmp(fail, -ENOMEM);
>> +		return NULL;
>> +	}
>> +	void *ret = (char *)gwlb_acl_build->buf + gwlb_acl_build->buf_used;
>> +	gwlb_acl_build->buf_used += size;
>> +	return ret;
>> +}
>> +
>> +static void temp_reset(void *cb_data)
>> +{
>> +	struct acl_temp_mem_mgr_t *gwlb_acl_build = (struct
>> acl_temp_mem_mgr_t *)cb_data;
>> +	memset(gwlb_acl_build->buf, 0, ACL_TEMP_BUF_SIZE);
>> +	printf("temp memory reset, used total=%u\n", gwlb_acl_build-
>>> buf_used);
>> +	gwlb_acl_build->buf_used = 0;
>> +}
>> +
>> +static int
>> +rte_acl_ipv4vlan_build_wich_mem_cb(struct rte_acl_ctx *ctx,
>> +	const uint32_t layout[RTE_ACL_IPV4VLAN_NUM],
>> +	uint32_t num_categories)
>> +{
>> +	struct rte_acl_config cfg;
>> +
>> +	if (ctx == NULL || layout == NULL)
>> +		return -EINVAL;
>> +
>> +	memset(&cfg, 0, sizeof(cfg));
>> +	acl_ipv4vlan_config(&cfg, layout, num_categories);
>> +	cfg.running_alloc = running_alloc;
>> +	cfg.running_free = running_free;
>> +	cfg.running_cb_ctx = &g_acl_ctx_wrapper;
>> +	cfg.temp_alloc = temp_alloc;
>> +	cfg.temp_reset = temp_reset;
>> +	cfg.temp_cb_ctx = &g_temp_mem_mgr;
>> +	return rte_acl_build(ctx, &cfg);
>> +}
>> +
>> +static int
>> +test_classify_buid_wich_mem_cb(struct rte_acl_ctx *acx,
>> +	const struct rte_acl_ipv4vlan_rule *rules, uint32_t num)
>> +{
>> +	int ret;
>> +
>> +	/* add rules to the context */
>> +	ret = rte_acl_ipv4vlan_add_rules(acx, rules, num);
>> +	if (ret != 0) {
>> +		printf("Line %i: Adding rules to ACL context failed!\n",
>> +			__LINE__);
>> +		return ret;
>> +	}
>> +
>> +	/* try building the context */
>> +	ret = rte_acl_ipv4vlan_build_wich_mem_cb(acx, ipv4_7tuple_layout,
>> +		RTE_ACL_MAX_CATEGORIES);
>> +	if (ret != 0) {
>> +		printf("Line %i: Building ACL context failed!\n", __LINE__);
>> +		return ret;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int
>> +test_mem_cb(void)
>> +{
>> +	int i, ret;
>> +	g_acl_ctx_wrapper.ctx = rte_acl_create(&acl_param);
>> +	if (g_acl_ctx_wrapper.ctx == NULL) {
>> +		printf("Line %i: Error creating ACL context!\n", __LINE__);
>> +		return -1;
>> +	}
>> +	g_acl_ctx_wrapper.running_buf = rte_zmalloc_socket(
>> +		"test_acl",
>> +		ACL_RUNNING_BUF_SIZE,
>> +		RTE_CACHE_LINE_SIZE,
>> +		SOCKET_ID_ANY);
>> +	if (!g_acl_ctx_wrapper.running_buf) {
>> +		printf("Line %i: Error allocing running buf for acl context!\n", __LINE__);
>> +		return 1;
>> +	}
>> +	g_acl_ctx_wrapper.running_buf_using = false;
>> +
>> +	g_temp_mem_mgr.buf = malloc(ACL_TEMP_BUF_SIZE);
>> +	if (!g_temp_mem_mgr.buf)
>> +		printf("Line %i: Error allocing temp buf for acl build!\n", __LINE__);
>> +	memset(g_temp_mem_mgr.buf, 0, ACL_TEMP_BUF_SIZE);
>> +	g_temp_mem_mgr.buf_used = 0;
>> +
>> +	ret = 0;
>> +	for (i = 0; i != TEST_CLASSIFY_ITER; i++) {
>> +
>> +		if ((i & 1) == 0)
>> +			rte_acl_reset(g_acl_ctx_wrapper.ctx);
>> +		else
>> +			rte_acl_reset_rules(g_acl_ctx_wrapper.ctx);
>> +
>> +		ret = test_classify_buid_wich_mem_cb(g_acl_ctx_wrapper.ctx, acl_test_rules,
>> +			RTE_DIM(acl_test_rules));
>> +		if (ret != 0) {
>> +			printf("Line %i, iter: %d: "
>> +				"Adding rules to ACL context failed!\n",
>> +				__LINE__, i);
>> +			break;
>> +		}
>> +
>> +		ret = test_classify_run(g_acl_ctx_wrapper.ctx, acl_test_data,
>> +			RTE_DIM(acl_test_data));
>> +		if (ret != 0) {
>> +			printf("Line %i, iter: %d: %s failed!\n",
>> +				__LINE__, i, __func__);
>> +			break;
>> +		}
>> +
>> +		/* reset rules and make sure that classify still works ok. */
>> +		rte_acl_reset_rules(g_acl_ctx_wrapper.ctx);
>> +		ret = test_classify_run(g_acl_ctx_wrapper.ctx, acl_test_data,
>> +			RTE_DIM(acl_test_data));
>> +		if (ret != 0) {
>> +			printf("Line %i, iter: %d: %s failed!\n",
>> +				__LINE__, i, __func__);
>> +			break;
>> +		}
>> +	}
>> +
>> +	rte_acl_free(g_acl_ctx_wrapper.ctx);
>> +	free(g_temp_mem_mgr.buf);
>> +	rte_free(g_acl_ctx_wrapper.running_buf);
>> +	return ret;
>> +}
>> +
>>   static int
>>   test_acl(void)
>>   {
>> @@ -1742,7 +1920,8 @@ test_acl(void)
>>   		return -1;
>>   	if (test_u32_range() < 0)
>>   		return -1;
>> -
>> +	if (test_mem_cb() < 0)
>> +		return -1;
>>   	return 0;
>>   }
>>
>> diff --git a/lib/acl/acl.h b/lib/acl/acl.h
>> index c8e4e72fab..7080fff64d 100644
>> --- a/lib/acl/acl.h
>> +++ b/lib/acl/acl.h
>> @@ -189,7 +189,8 @@ struct rte_acl_ctx {
>>
>>   int rte_acl_gen(struct rte_acl_ctx *ctx, struct rte_acl_trie *trie,
>>   	struct rte_acl_bld_trie *node_bld_trie, uint32_t num_tries,
>> -	uint32_t num_categories, uint32_t data_index_sz, size_t max_size);
>> +	uint32_t num_categories, uint32_t data_index_sz, size_t max_size,
>> +	const struct rte_acl_config *cfg);
>>
>>   typedef int (*rte_acl_classify_t)
>>   (const struct rte_acl_ctx *, const uint8_t **, uint32_t *, uint32_t, uint32_t);
>> diff --git a/lib/acl/acl_bld.c b/lib/acl/acl_bld.c
>> index 7056b1c117..1fd0ee3aa5 100644
>> --- a/lib/acl/acl_bld.c
>> +++ b/lib/acl/acl_bld.c
>> @@ -777,9 +777,12 @@ acl_merge_trie(struct acl_build_context *context,
>>    *  - reset all RT related fields to zero.
>>    */
>>   static void
>> -acl_build_reset(struct rte_acl_ctx *ctx)
>> +acl_build_reset(struct rte_acl_ctx *ctx, const struct rte_acl_config *cfg)
>>   {
>> -	rte_free(ctx->mem);
>> +	if (cfg->running_free)
>> +		cfg->running_free(ctx->mem, cfg->running_cb_ctx);
>> +	else
>> +		rte_free(ctx->mem);
>>   	memset(&ctx->num_categories, 0,
>>   		sizeof(*ctx) - offsetof(struct rte_acl_ctx, num_categories));
>>   }
>> @@ -1518,6 +1521,9 @@ acl_bld(struct acl_build_context *bcx, struct rte_acl_ctx *ctx,
>>   	bcx->acx = ctx;
>>   	bcx->pool.alignment = ACL_POOL_ALIGN;
>>   	bcx->pool.min_alloc = ACL_POOL_ALLOC_MIN;
>> +	bcx->pool.alloc_cb = cfg->temp_alloc;
>> +	bcx->pool.reset_cb = cfg->temp_reset;
>> +	bcx->pool.cb_ctx = cfg->temp_cb_ctx;
>>   	bcx->cfg = *cfg;
>>   	bcx->category_mask = RTE_LEN2MASK(bcx->cfg.num_categories,
>>   		typeof(bcx->category_mask));
>> @@ -1635,7 +1641,7 @@ rte_acl_build(struct rte_acl_ctx *ctx, const struct rte_acl_config *cfg)
>>   	if (rc != 0)
>>   		return rc;
>>
>> -	acl_build_reset(ctx);
>> +	acl_build_reset(ctx, cfg);
>>
>>   	if (cfg->max_size == 0) {
>>   		n = NODE_MIN;
>> @@ -1655,7 +1661,7 @@ rte_acl_build(struct rte_acl_ctx *ctx, const struct rte_acl_config *cfg)
>>   			rc = rte_acl_gen(ctx, bcx.tries, bcx.bld_tries,
>>   				bcx.num_tries, bcx.cfg.num_categories,
>>   				ACL_MAX_INDEXES * RTE_DIM(bcx.tries) *
>> -				sizeof(ctx->data_indexes[0]), max_size);
>> +				sizeof(ctx->data_indexes[0]), max_size, cfg);
>>   			if (rc == 0) {
>>   				/* set data indexes. */
>>   				acl_set_data_indexes(ctx);
>> diff --git a/lib/acl/acl_gen.c b/lib/acl/acl_gen.c
>> index 3c53d24056..6aa7d74635 100644
>> --- a/lib/acl/acl_gen.c
>> +++ b/lib/acl/acl_gen.c
>> @@ -448,7 +448,8 @@ acl_calc_counts_indices(struct acl_node_counters *counts,
>>   int
>>   rte_acl_gen(struct rte_acl_ctx *ctx, struct rte_acl_trie *trie,
>>   	struct rte_acl_bld_trie *node_bld_trie, uint32_t num_tries,
>> -	uint32_t num_categories, uint32_t data_index_sz, size_t max_size)
>> +	uint32_t num_categories, uint32_t data_index_sz, size_t max_size,
>> +	const struct rte_acl_config *cfg)
>>   {
>>   	void *mem;
>>   	size_t total_size;
>> @@ -478,7 +479,10 @@ rte_acl_gen(struct rte_acl_ctx *ctx, struct rte_acl_trie *trie,
>>   		return -ERANGE;
>>   	}
>>
>> -	mem = rte_zmalloc_socket(ctx->name, total_size, RTE_CACHE_LINE_SIZE,
>> +	if (cfg->running_alloc)
>> +		mem = cfg->running_alloc(total_size, RTE_CACHE_LINE_SIZE, cfg->running_cb_ctx);
>> +	else
>> +		mem = rte_zmalloc_socket(ctx->name, total_size, RTE_CACHE_LINE_SIZE,
>>   			ctx->socket_id);
>>   	if (mem == NULL) {
>>   		ACL_LOG(ERR,
>> diff --git a/lib/acl/rte_acl.c b/lib/acl/rte_acl.c
>> index 8c0ca29618..e765c40f4f 100644
>> --- a/lib/acl/rte_acl.c
>> +++ b/lib/acl/rte_acl.c
>> @@ -362,7 +362,10 @@ rte_acl_free(struct rte_acl_ctx *ctx)
>>
>>   	rte_mcfg_tailq_write_unlock();
>>
>> -	rte_free(ctx->mem);
>> +	if (ctx->config.running_free)
>> +		ctx->config.running_free(ctx->mem, ctx->config.running_cb_ctx);
>> +	else
>> +		rte_free(ctx->mem);
>>   	rte_free(ctx);
>>   	rte_free(te);
>>   }
>> diff --git a/lib/acl/rte_acl.h b/lib/acl/rte_acl.h
>> index 95354cabb8..c675c9ff81 100644
>> --- a/lib/acl/rte_acl.h
>> +++ b/lib/acl/rte_acl.h
>> @@ -13,6 +13,7 @@
>>
>>   #include <rte_common.h>
>>   #include <rte_acl_osdep.h>
>> +#include <setjmp.h>
>>
>>   #ifdef __cplusplus
>>   extern "C" {
>> @@ -61,6 +62,11 @@ struct rte_acl_field_def {
>>    * ACL build configuration.
>>    * Defines the fields of an ACL trie and number of categories to build with.
>>    */
>> +typedef void *(*rte_acl_running_alloc_t)(size_t, unsigned int, void *);
>> +typedef void  (*rte_acl_running_free_t)(void *, void *);
>> +typedef void *(*rte_acl_temp_alloc_t)(size_t, sigjmp_buf, void *);
>> +typedef void  (*rte_acl_temp_reset_t)(void *);
>> +
>>   struct rte_acl_config {
>>   	uint32_t num_categories; /**< Number of categories to build with. */
>>   	uint32_t num_fields;     /**< Number of field definitions. */
>> @@ -68,6 +74,20 @@ struct rte_acl_config {
>>   	/**< array of field definitions. */
>>   	size_t max_size;
>>   	/**< max memory limit for internal run-time structures. */
>> +
>> +	/**< Allocator callback for run-time internal memory. */
>> +	rte_acl_running_alloc_t  running_alloc;
>> +	/**< Free callback for run-time internal memory. */
>> +	rte_acl_running_free_t   running_free;
>> +	/**< User context passed to running_alloc/free. */
>> +	void                     *running_cb_ctx;
>> +
>> +	/**< Allocator callback for temporary memory used during build. */
>> +	rte_acl_temp_alloc_t     temp_alloc;
>> +	/**< Reset callback for temporary allocator. */
>> +	rte_acl_temp_reset_t     temp_reset;
>> +	/**< User context passed to temp_alloc/reset. */
>> +	void                     *temp_cb_ctx;
>>   };
>>
>>   /**
>> diff --git a/lib/acl/tb_mem.c b/lib/acl/tb_mem.c
>> index 9264433422..b9c69b563e 100644
>> --- a/lib/acl/tb_mem.c
>> +++ b/lib/acl/tb_mem.c
>> @@ -55,6 +55,9 @@ tb_alloc(struct tb_mem_pool *pool, size_t size)
>>
>>   	size = RTE_ALIGN_CEIL(size, pool->alignment);
>>
>> +	if (pool->alloc_cb)
>> +		return pool->alloc_cb(size, pool->fail, pool->cb_ctx);
>> +
>>   	block = pool->block;
>>   	if (block == NULL || block->size < size) {
>>   		new_sz = (size > pool->min_alloc) ? size : pool->min_alloc;
>> @@ -71,6 +74,11 @@ tb_free_pool(struct tb_mem_pool *pool)
>>   {
>>   	struct tb_mem_block *next, *block;
>>
>> +	if (pool->reset_cb) {
>> +		pool->reset_cb(pool->cb_ctx);
>> +		return;
>> +	}
>> +
>>   	for (block = pool->block; block != NULL; block = next) {
>>   		next = block->next;
>>   		free(block);
>> diff --git a/lib/acl/tb_mem.h b/lib/acl/tb_mem.h
>> index 2093744a6d..2fdebefc31 100644
>> --- a/lib/acl/tb_mem.h
>> +++ b/lib/acl/tb_mem.h
>> @@ -24,11 +24,17 @@ struct tb_mem_block {
>>   	uint8_t             *mem;
>>   };
>>
>> +typedef void *(*rte_tb_alloc_t)(size_t, sigjmp_buf, void *);
>> +typedef void (*rte_tb_reset_t)(void *);
>> +
>>   struct tb_mem_pool {
>>   	struct tb_mem_block *block;
>>   	size_t               alignment;
>>   	size_t               min_alloc;
>>   	size_t               alloc;
>> +	rte_tb_alloc_t       alloc_cb;
>> +	rte_tb_reset_t       reset_cb;
>> +	void                 *cb_ctx;
>>   	/* jump target in case of memory allocation failure. */
>>   	sigjmp_buf           fail;
>>   };
>> --
>> 2.43.0
> 
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v4] acl: support custom memory allocators in rte_acl_build
  2025-11-28 13:26     ` Konstantin Ananyev
  2025-11-28 15:07       ` mannywang(王永峰)
@ 2025-12-01 12:05       ` mannywang(王永峰)
  2025-12-01 15:59         ` Patrick Robb
  2025-12-01 16:42         ` Stephen Hemminger
  2025-12-01 12:45       ` [PATCH v5] " mannywang(王永峰)
  2 siblings, 2 replies; 20+ messages in thread
From: mannywang(王永峰) @ 2025-12-01 12:05 UTC (permalink / raw)
  To: Konstantin Ananyev; +Cc: dev, YongFeng Wang

Allow users to provide custom
memory allocation callbacks for runtime memory in rte_acl_ctx, via
struct rte_acl_mem_cb.

Key changes:
- Added struct rte_acl_mem_cb with zalloc, free, and udata.
- Added rte_acl_set_mem_cb() / rte_acl_get_mem_cb() to set/get callbacks.
- Default allocation uses existing rte_zmalloc_socket/rte_free.
- Modified ACL code to call callbacks for runtime allocations instead
  of rte_zmalloc_socket/rte_free directly.

Signed-off-by: YongFeng Wang <mannywang@tencent.com>
---
 app/test/test_acl.c | 121 ++++++++++++++++++++++++++++++++++++++++++++
 lib/acl/acl.h       |   1 +
 lib/acl/acl_bld.c   |   2 +-
 lib/acl/acl_gen.c   |   4 +-
 lib/acl/rte_acl.c   |  35 ++++++++++++-
 lib/acl/rte_acl.h   |  42 +++++++++++++++
 6 files changed, 201 insertions(+), 4 deletions(-)

diff --git a/app/test/test_acl.c b/app/test/test_acl.c
index 43d13b5b0f..930fdf8362 100644
--- a/app/test/test_acl.c
+++ b/app/test/test_acl.c
@@ -1721,6 +1721,125 @@ test_u32_range(void)
 	return rc;
 }
 
+struct acl_ctx_wrapper_t {
+	struct rte_acl_ctx *ctx;
+	void *running_buf;
+	bool running_buf_using;
+};
+
+#define ACL_RUNNING_BUF_SIZE (10 * 1024 * 1024)
+
+static void *running_alloc(void *udata, char *name, size_t size,
+	size_t align, int32_t socket_id)
+{
+	(void)align;
+	(void)name;
+	(void)socket_id;
+	if (size > ACL_RUNNING_BUF_SIZE)
+		return NULL;
+	struct acl_ctx_wrapper_t *gwlb_acl_ctx = (struct acl_ctx_wrapper_t *)udata;
+	if (gwlb_acl_ctx->running_buf_using)
+		return NULL;
+	printf("running memory alloc for acl context, size=%zu, pointer=%p\n",
+		size,
+		gwlb_acl_ctx->running_buf);
+	memset(gwlb_acl_ctx->running_buf, 0, size);
+	gwlb_acl_ctx->running_buf_using = true;
+	return gwlb_acl_ctx->running_buf;
+}
+
+static void running_free(void *udata, void *ptr)
+{
+	if (!ptr)
+		return;
+	struct acl_ctx_wrapper_t *gwlb_acl_ctx = (struct acl_ctx_wrapper_t *)udata;
+	printf("running memory free, pointer=%p\n", ptr);
+	gwlb_acl_ctx->running_buf_using = false;
+}
+
+static int
+test_mem_cb(void)
+{
+	int i, ret;
+	struct acl_ctx_wrapper_t g_acl_ctx_wrapper;
+	g_acl_ctx_wrapper.ctx = rte_acl_create(&acl_param);
+	if (g_acl_ctx_wrapper.ctx == NULL) {
+		printf("Line %i: Error creating ACL context!\n", __LINE__);
+		return -1;
+	}
+	g_acl_ctx_wrapper.running_buf = rte_zmalloc_socket(
+		"test_acl",
+		ACL_RUNNING_BUF_SIZE,
+		RTE_CACHE_LINE_SIZE,
+		SOCKET_ID_ANY);
+	if (!g_acl_ctx_wrapper.running_buf) {
+		printf("Line %i: Error allocing running buf for acl context!\n", __LINE__);
+		return 1;
+	}
+	g_acl_ctx_wrapper.running_buf_using = false;
+
+	struct rte_acl_mem_cb mcb = {
+		.zalloc = running_alloc,
+		.free = running_free,
+		.udata = &g_acl_ctx_wrapper
+	};
+	ret = rte_acl_set_mem_cb(g_acl_ctx_wrapper.ctx, &mcb);
+	if (ret) {
+		printf("Line %i: Error set mem cb for acl context!\n", __LINE__);
+		return 1;
+	}
+	struct rte_acl_mem_cb new_mcb;
+	memset(&new_mcb, 0, sizeof(struct rte_acl_mem_cb));
+	ret = rte_acl_get_mem_cb(g_acl_ctx_wrapper.ctx, &new_mcb);
+	if (ret) {
+		printf("Line %i: Error get mem cb for acl context!\n", __LINE__);
+		return 1;
+	}
+	if (memcmp(&mcb, &new_mcb, sizeof(struct rte_acl_mem_cb)) != 0) {
+		printf("Line %i: Error get mem cb for acl context!\n", __LINE__);
+		return 1;
+	}
+	ret = 0;
+	for (i = 0; i != TEST_CLASSIFY_ITER; i++) {
+
+		if ((i & 1) == 0)
+			rte_acl_reset(g_acl_ctx_wrapper.ctx);
+		else
+			rte_acl_reset_rules(g_acl_ctx_wrapper.ctx);
+
+		ret = test_classify_buid(g_acl_ctx_wrapper.ctx, acl_test_rules,
+			RTE_DIM(acl_test_rules));
+		if (ret != 0) {
+			printf("Line %i, iter: %d: "
+				"Adding rules to ACL context failed!\n",
+				__LINE__, i);
+			break;
+		}
+
+		ret = test_classify_run(g_acl_ctx_wrapper.ctx, acl_test_data,
+			RTE_DIM(acl_test_data));
+		if (ret != 0) {
+			printf("Line %i, iter: %d: %s failed!\n",
+				__LINE__, i, __func__);
+			break;
+		}
+
+		/* reset rules and make sure that classify still works ok. */
+		rte_acl_reset_rules(g_acl_ctx_wrapper.ctx);
+		ret = test_classify_run(g_acl_ctx_wrapper.ctx, acl_test_data,
+			RTE_DIM(acl_test_data));
+		if (ret != 0) {
+			printf("Line %i, iter: %d: %s failed!\n",
+				__LINE__, i, __func__);
+			break;
+		}
+	}
+
+	rte_acl_free(g_acl_ctx_wrapper.ctx);
+	rte_free(g_acl_ctx_wrapper.running_buf);
+	return ret;
+}
+
 static int
 test_acl(void)
 {
@@ -1742,6 +1861,8 @@ test_acl(void)
 		return -1;
 	if (test_u32_range() < 0)
 		return -1;
+	if (test_mem_cb() < 0)
+		return -1;
 
 	return 0;
 }
diff --git a/lib/acl/acl.h b/lib/acl/acl.h
index c8e4e72fab..3acfc0cb9f 100644
--- a/lib/acl/acl.h
+++ b/lib/acl/acl.h
@@ -174,6 +174,7 @@ struct rte_acl_ctx {
 	uint32_t            max_rules;
 	uint32_t            rule_sz;
 	uint32_t            num_rules;
+	struct rte_acl_mem_cb mem_cb;
 	uint32_t            num_categories;
 	uint32_t            num_tries;
 	uint32_t            match_index;
diff --git a/lib/acl/acl_bld.c b/lib/acl/acl_bld.c
index 7056b1c117..e3d342bd79 100644
--- a/lib/acl/acl_bld.c
+++ b/lib/acl/acl_bld.c
@@ -779,7 +779,7 @@ acl_merge_trie(struct acl_build_context *context,
 static void
 acl_build_reset(struct rte_acl_ctx *ctx)
 {
-	rte_free(ctx->mem);
+	ctx->mem_cb.free(ctx->mem_cb.udata, ctx->mem);
 	memset(&ctx->num_categories, 0,
 		sizeof(*ctx) - offsetof(struct rte_acl_ctx, num_categories));
 }
diff --git a/lib/acl/acl_gen.c b/lib/acl/acl_gen.c
index 3c53d24056..f482317884 100644
--- a/lib/acl/acl_gen.c
+++ b/lib/acl/acl_gen.c
@@ -478,8 +478,8 @@ rte_acl_gen(struct rte_acl_ctx *ctx, struct rte_acl_trie *trie,
 		return -ERANGE;
 	}
 
-	mem = rte_zmalloc_socket(ctx->name, total_size, RTE_CACHE_LINE_SIZE,
-			ctx->socket_id);
+	mem = ctx->mem_cb.zalloc(ctx->mem_cb.udata, ctx->name, total_size,
+			RTE_CACHE_LINE_SIZE, ctx->socket_id);
 	if (mem == NULL) {
 		ACL_LOG(ERR,
 			"allocation of %zu bytes on socket %d for %s failed",
diff --git a/lib/acl/rte_acl.c b/lib/acl/rte_acl.c
index 8c0ca29618..0e1fda321a 100644
--- a/lib/acl/rte_acl.c
+++ b/lib/acl/rte_acl.c
@@ -264,6 +264,20 @@ acl_get_best_alg(void)
 	return alg[i];
 }
 
+static void *
+zalloc_dft(void *udata, char *name, size_t size, size_t align, int32_t socket_id)
+{
+	(void)udata;
+	return rte_zmalloc_socket(name, size, align, socket_id);
+}
+
+static void
+free_dft(void *udata, void *ptr)
+{
+	(void)udata;
+	rte_free(ptr);
+}
+
 RTE_EXPORT_SYMBOL(rte_acl_set_ctx_classify)
 extern int
 rte_acl_set_ctx_classify(struct rte_acl_ctx *ctx, enum rte_acl_classify_alg alg)
@@ -362,7 +376,7 @@ rte_acl_free(struct rte_acl_ctx *ctx)
 
 	rte_mcfg_tailq_write_unlock();
 
-	rte_free(ctx->mem);
+	ctx->mem_cb.free(ctx->mem_cb.udata, ctx->mem);
 	rte_free(ctx);
 	rte_free(te);
 }
@@ -425,6 +439,9 @@ rte_acl_create(const struct rte_acl_param *param)
 		ctx->rule_sz = param->rule_size;
 		ctx->socket_id = param->socket_id;
 		ctx->alg = acl_get_best_alg();
+		ctx->mem_cb.zalloc = zalloc_dft;
+		ctx->mem_cb.free = free_dft;
+		ctx->mem_cb.udata = NULL;
 		strlcpy(ctx->name, param->name, sizeof(ctx->name));
 
 		te->data = (void *) ctx;
@@ -555,3 +572,19 @@ rte_acl_list_dump(void)
 	}
 	rte_mcfg_tailq_read_unlock();
 }
+
+int rte_acl_set_mem_cb(struct rte_acl_ctx *acl, const struct rte_acl_mem_cb *mcb)
+{
+	if (!acl || !mcb || !mcb->zalloc || !mcb->free)
+		return -EINVAL;
+	memcpy(&acl->mem_cb, mcb, sizeof(struct rte_acl_mem_cb));
+	return 0;
+}
+
+int rte_acl_get_mem_cb(const struct rte_acl_ctx *acl, struct rte_acl_mem_cb *mcb)
+{
+	if (!acl || !mcb)
+		return -EINVAL;
+	memcpy(mcb, &acl->mem_cb, sizeof(struct rte_acl_mem_cb));
+	return 0;
+}
diff --git a/lib/acl/rte_acl.h b/lib/acl/rte_acl.h
index 95354cabb8..e3273556e2 100644
--- a/lib/acl/rte_acl.h
+++ b/lib/acl/rte_acl.h
@@ -136,6 +136,48 @@ struct rte_acl_param {
 /** @internal opaque ACL handle */
 struct rte_acl_ctx;
 
+/**
+ * Memory allocation callbacks for ACL runtime.
+ */
+struct rte_acl_mem_cb {
+	/** Allocate zero-initialized memory used during runtime. */
+	void *(*zalloc)(void *udata, char *name, size_t size, size_t align, int32_t socket_id);
+
+	/** Free memory previously allocated by zalloc(). */
+	void (*free)(void *udata, void *ptr);
+
+	/** User-provided context passed to allocation/free callbacks. */
+	void *udata;
+};
+
+/**
+ * Set memory allocation callbacks for a given ACL context.
+ *
+ * @param acl
+ *   The ACL context.
+ * @param mcb
+ *   Pointer to the memory callback structure
+ *
+ * @return
+ *   0 on success.
+ *   -EINVAL if parameters are invalid.
+ */
+int rte_acl_set_mem_cb(struct rte_acl_ctx *acl, const struct rte_acl_mem_cb *mcb);
+
+/**
+ * Retrieve the memory allocation callbacks assigned to the ACL context.
+ *
+ * @param acl
+ *   The ACL context.
+ * @param mcb
+ *   Output location for the current memory callback structure
+ *
+ * @return
+ *   0 on success.
+ *   -EINVAL if parameters are invalid.
+ */
+int rte_acl_get_mem_cb(const struct rte_acl_ctx *acl, struct rte_acl_mem_cb *mcb);
+
 /**
  * De-allocate all memory used by ACL context.
  *
-- 
2.43.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v5] acl: support custom memory allocators in rte_acl_build
  2025-11-28 13:26     ` Konstantin Ananyev
  2025-11-28 15:07       ` mannywang(王永峰)
  2025-12-01 12:05       ` [PATCH v4] acl: support custom memory allocators in rte_acl_build mannywang(王永峰)
@ 2025-12-01 12:45       ` mannywang(王永峰)
  2025-12-02  2:47         ` fengchengwen
  2 siblings, 1 reply; 20+ messages in thread
From: mannywang(王永峰) @ 2025-12-01 12:45 UTC (permalink / raw)
  To: Konstantin Ananyev; +Cc: dev, YongFeng Wang

Allow users to provide custom
memory allocation callbacks for runtime memory in rte_acl_ctx, via
struct rte_acl_mem_cb.

Key changes:
- Added struct rte_acl_mem_cb with zalloc, free, and udata.
- Added rte_acl_set_mem_cb() / rte_acl_get_mem_cb() to set/get callbacks.
- Default allocation uses existing rte_zmalloc_socket/rte_free.
- Modified ACL code to call callbacks for runtime allocations instead
  of rte_zmalloc_socket/rte_free directly.

v5:
- Remove temporary memory allocation callback for build stage.
- Introduce new API (rte_acl_set_mem_cb / rte_acl_get_mem_cb) instead of
  modifying existing rte_acl_config to preserve ABI compatibility.
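
A minimal usage sketch of the new callbacks (function names are illustrative,
not part of this patch; for illustration the zalloc callback simply falls back
to rte_zmalloc(), while a real application would return memory from its own
pool there):

    static void *my_zalloc(void *udata, char *name, size_t size,
        size_t align, int32_t socket_id)
    {
        RTE_SET_USED(udata);
        RTE_SET_USED(name);
        RTE_SET_USED(socket_id);
        /* illustrative fallback; replace with an application-managed pool */
        return rte_zmalloc(NULL, size, align);
    }

    static void my_free(void *udata, void *ptr)
    {
        RTE_SET_USED(udata);
        rte_free(ptr);
    }

    struct rte_acl_mem_cb mcb = {
        .zalloc = my_zalloc,
        .free   = my_free,
        .udata  = NULL,
    };
    /* install right after rte_acl_create(), before any rte_acl_build() */
    if (rte_acl_set_mem_cb(ctx, &mcb) != 0)
        rte_exit(EXIT_FAILURE, "cannot set ACL memory callbacks\n");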

Signed-off-by: YongFeng Wang <mannywang@tencent.com>
---
 app/test/test_acl.c | 121 ++++++++++++++++++++++++++++++++++++++++++++
 lib/acl/acl.h       |   1 +
 lib/acl/acl_bld.c   |   2 +-
 lib/acl/acl_gen.c   |   4 +-
 lib/acl/rte_acl.c   |  45 +++++++++++++++-
 lib/acl/rte_acl.h   |  42 +++++++++++++++
 6 files changed, 211 insertions(+), 4 deletions(-)

diff --git a/app/test/test_acl.c b/app/test/test_acl.c
index 43d13b5b0f..930fdf8362 100644
--- a/app/test/test_acl.c
+++ b/app/test/test_acl.c
@@ -1721,6 +1721,125 @@ test_u32_range(void)
 	return rc;
 }
 
+struct acl_ctx_wrapper_t {
+	struct rte_acl_ctx *ctx;
+	void *running_buf;
+	bool running_buf_using;
+};
+
+#define ACL_RUNNING_BUF_SIZE (10 * 1024 * 1024)
+
+static void *running_alloc(void *udata, char *name, size_t size,
+	size_t align, int32_t socket_id)
+{
+	(void)align;
+	(void)name;
+	(void)socket_id;
+	if (size > ACL_RUNNING_BUF_SIZE)
+		return NULL;
+	struct acl_ctx_wrapper_t *gwlb_acl_ctx = (struct acl_ctx_wrapper_t *)udata;
+	if (gwlb_acl_ctx->running_buf_using)
+		return NULL;
+	printf("running memory alloc for acl context, size=%zu, pointer=%p\n",
+		size,
+		gwlb_acl_ctx->running_buf);
+	memset(gwlb_acl_ctx->running_buf, 0, size);
+	gwlb_acl_ctx->running_buf_using = true;
+	return gwlb_acl_ctx->running_buf;
+}
+
+static void running_free(void *udata, void *ptr)
+{
+	if (!ptr)
+		return;
+	struct acl_ctx_wrapper_t *gwlb_acl_ctx = (struct acl_ctx_wrapper_t *)udata;
+	printf("running memory free, pointer=%p\n", ptr);
+	gwlb_acl_ctx->running_buf_using = false;
+}
+
+static int
+test_mem_cb(void)
+{
+	int i, ret;
+	struct acl_ctx_wrapper_t g_acl_ctx_wrapper;
+	g_acl_ctx_wrapper.ctx = rte_acl_create(&acl_param);
+	if (g_acl_ctx_wrapper.ctx == NULL) {
+		printf("Line %i: Error creating ACL context!\n", __LINE__);
+		return -1;
+	}
+	g_acl_ctx_wrapper.running_buf = rte_zmalloc_socket(
+		"test_acl",
+		ACL_RUNNING_BUF_SIZE,
+		RTE_CACHE_LINE_SIZE,
+		SOCKET_ID_ANY);
+	if (!g_acl_ctx_wrapper.running_buf) {
+		printf("Line %i: Error allocing running buf for acl context!\n", __LINE__);
+		return 1;
+	}
+	g_acl_ctx_wrapper.running_buf_using = false;
+
+	struct rte_acl_mem_cb mcb = {
+		.zalloc = running_alloc,
+		.free = running_free,
+		.udata = &g_acl_ctx_wrapper
+	};
+	ret = rte_acl_set_mem_cb(g_acl_ctx_wrapper.ctx, &mcb);
+	if (ret) {
+		printf("Line %i: Error set mem cb for acl context!\n", __LINE__);
+		return 1;
+	}
+	struct rte_acl_mem_cb new_mcb;
+	memset(&new_mcb, 0, sizeof(struct rte_acl_mem_cb));
+	ret = rte_acl_get_mem_cb(g_acl_ctx_wrapper.ctx, &new_mcb);
+	if (ret) {
+		printf("Line %i: Error get mem cb for acl context!\n", __LINE__);
+		return 1;
+	}
+	if (memcmp(&mcb, &new_mcb, sizeof(struct rte_acl_mem_cb)) != 0) {
+		printf("Line %i: Error get mem cb for acl context!\n", __LINE__);
+		return 1;
+	}
+	ret = 0;
+	for (i = 0; i != TEST_CLASSIFY_ITER; i++) {
+
+		if ((i & 1) == 0)
+			rte_acl_reset(g_acl_ctx_wrapper.ctx);
+		else
+			rte_acl_reset_rules(g_acl_ctx_wrapper.ctx);
+
+		ret = test_classify_buid(g_acl_ctx_wrapper.ctx, acl_test_rules,
+			RTE_DIM(acl_test_rules));
+		if (ret != 0) {
+			printf("Line %i, iter: %d: "
+				"Adding rules to ACL context failed!\n",
+				__LINE__, i);
+			break;
+		}
+
+		ret = test_classify_run(g_acl_ctx_wrapper.ctx, acl_test_data,
+			RTE_DIM(acl_test_data));
+		if (ret != 0) {
+			printf("Line %i, iter: %d: %s failed!\n",
+				__LINE__, i, __func__);
+			break;
+		}
+
+		/* reset rules and make sure that classify still works ok. */
+		rte_acl_reset_rules(g_acl_ctx_wrapper.ctx);
+		ret = test_classify_run(g_acl_ctx_wrapper.ctx, acl_test_data,
+			RTE_DIM(acl_test_data));
+		if (ret != 0) {
+			printf("Line %i, iter: %d: %s failed!\n",
+				__LINE__, i, __func__);
+			break;
+		}
+	}
+
+	rte_acl_free(g_acl_ctx_wrapper.ctx);
+	rte_free(g_acl_ctx_wrapper.running_buf);
+	return ret;
+}
+
 static int
 test_acl(void)
 {
@@ -1742,6 +1861,8 @@ test_acl(void)
 		return -1;
 	if (test_u32_range() < 0)
 		return -1;
+	if (test_mem_cb() < 0)
+		return -1;
 
 	return 0;
 }
diff --git a/lib/acl/acl.h b/lib/acl/acl.h
index c8e4e72fab..3acfc0cb9f 100644
--- a/lib/acl/acl.h
+++ b/lib/acl/acl.h
@@ -174,6 +174,7 @@ struct rte_acl_ctx {
 	uint32_t            max_rules;
 	uint32_t            rule_sz;
 	uint32_t            num_rules;
+	struct rte_acl_mem_cb mem_cb;
 	uint32_t            num_categories;
 	uint32_t            num_tries;
 	uint32_t            match_index;
diff --git a/lib/acl/acl_bld.c b/lib/acl/acl_bld.c
index 7056b1c117..e3d342bd79 100644
--- a/lib/acl/acl_bld.c
+++ b/lib/acl/acl_bld.c
@@ -779,7 +779,7 @@ acl_merge_trie(struct acl_build_context *context,
 static void
 acl_build_reset(struct rte_acl_ctx *ctx)
 {
-	rte_free(ctx->mem);
+	ctx->mem_cb.free(ctx->mem_cb.udata, ctx->mem);
 	memset(&ctx->num_categories, 0,
 		sizeof(*ctx) - offsetof(struct rte_acl_ctx, num_categories));
 }
diff --git a/lib/acl/acl_gen.c b/lib/acl/acl_gen.c
index 3c53d24056..f482317884 100644
--- a/lib/acl/acl_gen.c
+++ b/lib/acl/acl_gen.c
@@ -478,8 +478,8 @@ rte_acl_gen(struct rte_acl_ctx *ctx, struct rte_acl_trie *trie,
 		return -ERANGE;
 	}
 
-	mem = rte_zmalloc_socket(ctx->name, total_size, RTE_CACHE_LINE_SIZE,
-			ctx->socket_id);
+	mem = ctx->mem_cb.zalloc(ctx->mem_cb.udata, ctx->name, total_size,
+			RTE_CACHE_LINE_SIZE, ctx->socket_id);
 	if (mem == NULL) {
 		ACL_LOG(ERR,
 			"allocation of %zu bytes on socket %d for %s failed",
diff --git a/lib/acl/rte_acl.c b/lib/acl/rte_acl.c
index 8c0ca29618..3c1a118c46 100644
--- a/lib/acl/rte_acl.c
+++ b/lib/acl/rte_acl.c
@@ -264,6 +264,20 @@ acl_get_best_alg(void)
 	return alg[i];
 }
 
+static void *
+zalloc_dft(void *udata, char *name, size_t size, size_t align, int32_t socket_id)
+{
+	(void)udata;
+	return rte_zmalloc_socket(name, size, align, socket_id);
+}
+
+static void
+free_dft(void *udata, void *ptr)
+{
+	(void)udata;
+	rte_free(ptr);
+}
+
 RTE_EXPORT_SYMBOL(rte_acl_set_ctx_classify)
 extern int
 rte_acl_set_ctx_classify(struct rte_acl_ctx *ctx, enum rte_acl_classify_alg alg)
@@ -362,7 +376,7 @@ rte_acl_free(struct rte_acl_ctx *ctx)
 
 	rte_mcfg_tailq_write_unlock();
 
-	rte_free(ctx->mem);
+	ctx->mem_cb.free(ctx->mem_cb.udata, ctx->mem);
 	rte_free(ctx);
 	rte_free(te);
 }
@@ -425,6 +439,9 @@ rte_acl_create(const struct rte_acl_param *param)
 		ctx->rule_sz = param->rule_size;
 		ctx->socket_id = param->socket_id;
 		ctx->alg = acl_get_best_alg();
+		ctx->mem_cb.zalloc = zalloc_dft;
+		ctx->mem_cb.free = free_dft;
+		ctx->mem_cb.udata = NULL;
 		strlcpy(ctx->name, param->name, sizeof(ctx->name));
 
 		te->data = (void *) ctx;
@@ -555,3 +572,29 @@ rte_acl_list_dump(void)
 	}
 	rte_mcfg_tailq_read_unlock();
 }
+
+/*
+ * Set memory allocation callbacks for a given ACL context.
+ */
+RTE_EXPORT_SYMBOL(rte_acl_set_mem_cb)
+int
+rte_acl_set_mem_cb(struct rte_acl_ctx *acl, const struct rte_acl_mem_cb *mcb)
+{
+	if (!acl || !mcb || !mcb->zalloc || !mcb->free)
+		return -EINVAL;
+	memcpy(&acl->mem_cb, mcb, sizeof(struct rte_acl_mem_cb));
+	return 0;
+}
+
+/*
+ * Retrieve the memory allocation callbacks assigned to the ACL context.
+ */
+RTE_EXPORT_SYMBOL(rte_acl_get_mem_cb)
+int
+rte_acl_get_mem_cb(const struct rte_acl_ctx *acl, struct rte_acl_mem_cb *mcb)
+{
+	if (!acl || !mcb)
+		return -EINVAL;
+	memcpy(mcb, &acl->mem_cb, sizeof(struct rte_acl_mem_cb));
+	return 0;
+}
diff --git a/lib/acl/rte_acl.h b/lib/acl/rte_acl.h
index 95354cabb8..e3273556e2 100644
--- a/lib/acl/rte_acl.h
+++ b/lib/acl/rte_acl.h
@@ -136,6 +136,48 @@ struct rte_acl_param {
 /** @internal opaque ACL handle */
 struct rte_acl_ctx;
 
+/**
+ * Memory allocation callbacks for ACL runtime.
+ */
+struct rte_acl_mem_cb {
+	/** Allocate zero-initialized memory used during runtime. */
+	void *(*zalloc)(void *udata, char *name, size_t size, size_t align, int32_t socket_id);
+
+	/** Free memory previously allocated by zalloc(). */
+	void (*free)(void *udata, void *ptr);
+
+	/** User-provided context passed to allocation/free callbacks. */
+	void *udata;
+};
+
+/**
+ * Set memory allocation callbacks for a given ACL context.
+ *
+ * @param acl
+ *   The ACL context.
+ * @param mcb
+ *   Pointer to the memory callback structure
+ *
+ * @return
+ *   0 on success.
+ *   -EINVAL if parameters are invalid.
+ */
+int rte_acl_set_mem_cb(struct rte_acl_ctx *acl, const struct rte_acl_mem_cb *mcb);
+
+/**
+ * Retrieve the memory allocation callbacks assigned to the ACL context.
+ *
+ * @param acl
+ *   The ACL context.
+ * @param mcb
+ *   Output location for the current memory callback structure
+ *
+ * @return
+ *   0 on success.
+ *   -EINVAL if parameters are invalid.
+ */
+int rte_acl_get_mem_cb(const struct rte_acl_ctx *acl, struct rte_acl_mem_cb *mcb);
+
 /**
  * De-allocate all memory used by ACL context.
  *
-- 
2.43.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4] acl: support custom memory allocators in rte_acl_build
  2025-12-01 12:05       ` [PATCH v4] acl: support custom memory allocators in rte_acl_build =?gb18030?B?bWFubnl3YW5nKM3108C35Sk=?=
@ 2025-12-01 15:59         ` Patrick Robb
  2025-12-01 16:42         ` Stephen Hemminger
  1 sibling, 0 replies; 20+ messages in thread
From: Patrick Robb @ 2025-12-01 15:59 UTC (permalink / raw)
  To: mannywang(王永峰); +Cc: Konstantin Ananyev, dev

[-- Attachment #1: Type: text/plain, Size: 122 bytes --]

Recheck-request: iol-intel-Performance

Putting in a retest due to an infra fail at UNH. No action on your end is
needed.

[-- Attachment #2: Type: text/html, Size: 149 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4] acl: support custom memory allocators in rte_acl_build
  2025-12-01 12:05       ` [PATCH v4] acl: support custom memory allocators in rte_acl_build =?gb18030?B?bWFubnl3YW5nKM3108C35Sk=?=
  2025-12-01 15:59         ` Patrick Robb
@ 2025-12-01 16:42         ` Stephen Hemminger
  1 sibling, 0 replies; 20+ messages in thread
From: Stephen Hemminger @ 2025-12-01 16:42 UTC (permalink / raw)
  To: mannywang(王永峰); +Cc: Konstantin Ananyev, dev

On Mon,  1 Dec 2025 12:05:07 +0000
"mannywang(王永峰)" <mannywang@tencent.com> wrote:

> +static void *running_alloc(void *udata, char *name, size_t size,
> +	size_t align, int32_t socket_id)
> +{
> +	(void)align;
> +	(void)name;
> +	(void)socket_id;

Please use __rte_unused attribute or RTE_SET_USED() macro for this.
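
For example, something like (sketch of the same function signature with the
attribute):

	static void *running_alloc(void *udata, char *name __rte_unused,
		size_t size, size_t align __rte_unused,
		int32_t socket_id __rte_unused)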

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5] acl: support custom memory allocators in rte_acl_build
  2025-12-01 12:45       ` [PATCH v5] " =?gb18030?B?bWFubnl3YW5nKM3108C35Sk=?=
@ 2025-12-02  2:47         ` fengchengwen
  0 siblings, 0 replies; 20+ messages in thread
From: fengchengwen @ 2025-12-02  2:47 UTC (permalink / raw)
  To: mannywang(王永峰), Konstantin Ananyev
  Cc: dev, Dmitry Kozlyuk

the commit header could be simplified to: acl: support custom memory allocator

On 12/1/2025 8:45 PM, mannywang(王永峰) wrote:
> Allow users to provide custom
> memory allocation callbacks for runtime memory in rte_acl_ctx, via
> struct rte_acl_mem_cb.
> 
> Key changes:
> - Added struct rte_acl_mem_cb with zalloc, free, and udata.
> - Added rte_acl_set_mem_cb() / rte_acl_get_mem_cb() to set/get callbacks.
> - Default allocation uses existing rte_zmalloc_socket/rte_free.
> - Modified ACL code to call callbacks for runtime allocations instead
>   of rte_zmalloc_socket/rte_free directly.
> 
> v5:
> - Remove temporary memory allocation callback for build stage.
> - Introduce new API (rte_acl_set_mem_cb / rte_acl_get_mem_cb) instead of
>   modifying existing rte_acl_config to preserve ABI compatibility.
> 
> Signed-off-by: YongFeng Wang <mannywang@tencent.com>
> ---
>  app/test/test_acl.c | 121 ++++++++++++++++++++++++++++++++++++++++++++
>  lib/acl/acl.h       |   1 +
>  lib/acl/acl_bld.c   |   2 +-
>  lib/acl/acl_gen.c   |   4 +-
>  lib/acl/rte_acl.c   |  45 +++++++++++++++-
>  lib/acl/rte_acl.h   |  42 +++++++++++++++
>  6 files changed, 211 insertions(+), 4 deletions(-)
> 
> diff --git a/app/test/test_acl.c b/app/test/test_acl.c
> index 43d13b5b0f..930fdf8362 100644
> --- a/app/test/test_acl.c
> +++ b/app/test/test_acl.c
> @@ -1721,6 +1721,125 @@ test_u32_range(void)
>  	return rc;
>  }
>  
> +struct acl_ctx_wrapper_t {

no need to add the _t suffix

> +	struct rte_acl_ctx *ctx;
> +	void *running_buf;
> +	bool running_buf_using;
> +};
> +
> +#define ACL_RUNNING_BUF_SIZE (10 * 1024 * 1024)
> +
> +static void *running_alloc(void *udata, char *name, size_t size,
> +	size_t align, int32_t socket_id)
> +{
> +	(void)align;
> +	(void)name;
> +	(void)socket_id;
> +	if (size > ACL_RUNNING_BUF_SIZE)
> +		return NULL;
> +	struct acl_ctx_wrapper_t *gwlb_acl_ctx = (struct acl_ctx_wrapper_t *)udata;

gwlb - what does this prefix mean? suggest simply acl_ctx or just ctx.

> +	if (gwlb_acl_ctx->running_buf_using)
> +		return NULL;
> +	printf("running memory alloc for acl context, size=%zu, pointer=%p\n",
> +		size,
> +		gwlb_acl_ctx->running_buf);
> +	memset(gwlb_acl_ctx->running_buf, 0, size);
> +	gwlb_acl_ctx->running_buf_using = true;

this allocator can only satisfy one allocation at a time, so it silently depends on
the acl library's internal allocation pattern; how about following Dmitry Kozlyuk's
comment on [PATCH v3], i.e. using rte_malloc_heap_create
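
A rough sketch of that alternative (the heap name and the va/len/n_pages/pg_sz
values are placeholders for application-managed, page-aligned memory):

	if (rte_malloc_heap_create("acl_heap") != 0)
		rte_exit(EXIT_FAILURE, "cannot create external heap\n");
	if (rte_malloc_heap_memory_add("acl_heap", va, len, NULL, n_pages, pg_sz) != 0)
		rte_exit(EXIT_FAILURE, "cannot add memory to external heap\n");
	/* use the heap's socket id in rte_acl_param.socket_id so the existing
	 * rte_zmalloc_socket() path draws run-time memory from that heap */
	acl_param.socket_id = rte_malloc_heap_get_socket("acl_heap");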

> +	return gwlb_acl_ctx->running_buf;
> +}
> +
> +static void running_free(void *udata, void *ptr)
> +{
> +	if (!ptr)
> +		return;
> +	struct acl_ctx_wrapper_t *gwlb_acl_ctx = (struct acl_ctx_wrapper_t *)udata;
> +	printf("running memory free, pointer=%p\n", ptr);
> +	gwlb_acl_ctx->running_buf_using = false;
> +}
> +
> +static int
> +test_mem_cb(void)
> +{
> +	int i, ret;
> +	struct acl_ctx_wrapper_t g_acl_ctx_wrapper;

no need for the g_acl_ prefix, and suggest zero-initializing it (struct acl_ctx_wrapper ctx_wrap = {0})

> +	g_acl_ctx_wrapper.ctx = rte_acl_create(&acl_param);
> +	if (g_acl_ctx_wrapper.ctx == NULL) {
> +		printf("Line %i: Error creating ACL context!\n", __LINE__);
> +		return -1;
> +	}
> +	g_acl_ctx_wrapper.running_buf = rte_zmalloc_socket(
> +		"test_acl",
> +		ACL_RUNNING_BUF_SIZE,
> +		RTE_CACHE_LINE_SIZE,
> +		SOCKET_ID_ANY);
> +	if (!g_acl_ctx_wrapper.running_buf) {
> +		printf("Line %i: Error allocing running buf for acl context!\n", __LINE__);

please add resource rollback here, and the same for the later error paths.
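
e.g. something like (sketch, reusing the names from this patch):

	if (g_acl_ctx_wrapper.running_buf == NULL) {
		printf("Line %i: Error allocing running buf for acl context!\n", __LINE__);
		rte_acl_free(g_acl_ctx_wrapper.ctx);	/* roll back the created context */
		return 1;
	}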

> +		return 1;
> +	}
> +	g_acl_ctx_wrapper.running_buf_using = false;
> +
> +	struct rte_acl_mem_cb mcb = {
> +		.zalloc = running_alloc,
> +		.free = running_free,
> +		.udata = &g_acl_ctx_wrapper
> +	};
> +	ret = rte_acl_set_mem_cb(g_acl_ctx_wrapper.ctx, &mcb);
> +	if (ret) {

please use an explicit comparison, e.g. if (ret != 0), and the same in the other places.

> +		printf("Line %i: Error set mem cb for acl context!\n", __LINE__);
> +		return 1;
> +	}
> +	struct rte_acl_mem_cb new_mcb;
> +	memset(&new_mcb, 0, sizeof(struct rte_acl_mem_cb));
> +	ret = rte_acl_get_mem_cb(g_acl_ctx_wrapper.ctx, &new_mcb);
> +	if (ret) {
> +		printf("Line %i: Error get mem cb for acl context!\n", __LINE__);
> +		return 1;
> +	}
> +	if (memcmp(&mcb, &new_mcb, sizeof(struct rte_acl_mem_cb)) != 0) {
> +		printf("Line %i: Error get mem cb for acl context!\n", __LINE__);
> +		return 1;
> +	}
> +	ret = 0;
> +	for (i = 0; i != TEST_CLASSIFY_ITER; i++) {

why not i < TEST_CLASSIFY_ITER ?

> +
> +		if ((i & 1) == 0)
> +			rte_acl_reset(g_acl_ctx_wrapper.ctx);
> +		else
> +			rte_acl_reset_rules(g_acl_ctx_wrapper.ctx);
> +
> +		ret = test_classify_buid(g_acl_ctx_wrapper.ctx, acl_test_rules,
> +			RTE_DIM(acl_test_rules));
> +		if (ret != 0) {
> +			printf("Line %i, iter: %d: "
> +				"Adding rules to ACL context failed!\n",

no need to split the log string across an extra line.

> +				__LINE__, i);
> +			break;
> +		}
> +
> +		ret = test_classify_run(g_acl_ctx_wrapper.ctx, acl_test_data,
> +			RTE_DIM(acl_test_data));
> +		if (ret != 0) {
> +			printf("Line %i, iter: %d: %s failed!\n",
> +				__LINE__, i, __func__);
> +			break;
> +		}
> +
> +		/* reset rules and make sure that classify still works ok. */
> +		rte_acl_reset_rules(g_acl_ctx_wrapper.ctx);
> +		ret = test_classify_run(g_acl_ctx_wrapper.ctx, acl_test_data,
> +			RTE_DIM(acl_test_data));
> +		if (ret != 0) {
> +			printf("Line %i, iter: %d: %s failed!\n",
> +				__LINE__, i, __func__);
> +			break;
> +		}
> +	}
> +
> +	rte_acl_free(g_acl_ctx_wrapper.ctx);
> +	rte_free(g_acl_ctx_wrapper.running_buf);
> +	return ret;
> +}
> +
>  static int
>  test_acl(void)
>  {
> @@ -1742,6 +1861,8 @@ test_acl(void)
>  		return -1;
>  	if (test_u32_range() < 0)
>  		return -1;
> +	if (test_mem_cb() < 0)
> +		return -1;
>  
>  	return 0;
>  }
> diff --git a/lib/acl/acl.h b/lib/acl/acl.h
> index c8e4e72fab..3acfc0cb9f 100644
> --- a/lib/acl/acl.h
> +++ b/lib/acl/acl.h
> @@ -174,6 +174,7 @@ struct rte_acl_ctx {
>  	uint32_t            max_rules;
>  	uint32_t            rule_sz;
>  	uint32_t            num_rules;
> +	struct rte_acl_mem_cb mem_cb;

why not place it near the *mem/mem_sz fields?

>  	uint32_t            num_categories;
>  	uint32_t            num_tries;
>  	uint32_t            match_index;
> diff --git a/lib/acl/acl_bld.c b/lib/acl/acl_bld.c
> index 7056b1c117..e3d342bd79 100644
> --- a/lib/acl/acl_bld.c
> +++ b/lib/acl/acl_bld.c
> @@ -779,7 +779,7 @@ acl_merge_trie(struct acl_build_context *context,
>  static void
>  acl_build_reset(struct rte_acl_ctx *ctx)
>  {
> -	rte_free(ctx->mem);
> +	ctx->mem_cb.free(ctx->mem_cb.udata, ctx->mem);
>  	memset(&ctx->num_categories, 0,
>  		sizeof(*ctx) - offsetof(struct rte_acl_ctx, num_categories));
>  }
> diff --git a/lib/acl/acl_gen.c b/lib/acl/acl_gen.c
> index 3c53d24056..f482317884 100644
> --- a/lib/acl/acl_gen.c
> +++ b/lib/acl/acl_gen.c
> @@ -478,8 +478,8 @@ rte_acl_gen(struct rte_acl_ctx *ctx, struct rte_acl_trie *trie,
>  		return -ERANGE;
>  	}
>  
> -	mem = rte_zmalloc_socket(ctx->name, total_size, RTE_CACHE_LINE_SIZE,
> -			ctx->socket_id);
> +	mem = ctx->mem_cb.zalloc(ctx->mem_cb.udata, ctx->name, total_size,
> +			RTE_CACHE_LINE_SIZE, ctx->socket_id);
>  	if (mem == NULL) {
>  		ACL_LOG(ERR,
>  			"allocation of %zu bytes on socket %d for %s failed",
> diff --git a/lib/acl/rte_acl.c b/lib/acl/rte_acl.c
> index 8c0ca29618..3c1a118c46 100644
> --- a/lib/acl/rte_acl.c
> +++ b/lib/acl/rte_acl.c
> @@ -264,6 +264,20 @@ acl_get_best_alg(void)
>  	return alg[i];
>  }
>  
> +static void *
> +zalloc_dft(void *udata, char *name, size_t size, size_t align, int32_t socket_id)

how about acl_mem_default_zalloc?

> +{
> +	(void)udata;
> +	return rte_zmalloc_socket(name, size, align, socket_id);
> +}
> +
> +static void
> +free_dft(void *udata, void *ptr)

how about acl_mem_default_free?

> +{
> +	(void)udata;
> +	rte_free(ptr);
> +}
> +
>  RTE_EXPORT_SYMBOL(rte_acl_set_ctx_classify)
>  extern int
>  rte_acl_set_ctx_classify(struct rte_acl_ctx *ctx, enum rte_acl_classify_alg alg)
> @@ -362,7 +376,7 @@ rte_acl_free(struct rte_acl_ctx *ctx)
>  
>  	rte_mcfg_tailq_write_unlock();
>  
> -	rte_free(ctx->mem);
> +	ctx->mem_cb.free(ctx->mem_cb.udata, ctx->mem);
>  	rte_free(ctx);
>  	rte_free(te);
>  }
> @@ -425,6 +439,9 @@ rte_acl_create(const struct rte_acl_param *param)
>  		ctx->rule_sz = param->rule_size;
>  		ctx->socket_id = param->socket_id;
>  		ctx->alg = acl_get_best_alg();
> +		ctx->mem_cb.zalloc = zalloc_dft;
> +		ctx->mem_cb.free = free_dft;
> +		ctx->mem_cb.udata = NULL;
>  		strlcpy(ctx->name, param->name, sizeof(ctx->name));
>  
>  		te->data = (void *) ctx;
> @@ -555,3 +572,29 @@ rte_acl_list_dump(void)
>  	}
>  	rte_mcfg_tailq_read_unlock();
>  }
> +
> +/*
> + * Set memory allocation callbacks for a given ACL context.
> + */
> +RTE_EXPORT_SYMBOL(rte_acl_set_mem_cb)

Should be RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_acl_set_mem_cb, 26.03)

> +int
> +rte_acl_set_mem_cb(struct rte_acl_ctx *acl, const struct rte_acl_mem_cb *mcb)
> +{
> +	if (!acl || !mcb || !mcb->zalloc || !mcb->free)

suggest acl == NULL || mcb == NULL and ...

> +		return -EINVAL;
> +	memcpy(&acl->mem_cb, mcb, sizeof(struct rte_acl_mem_cb));
> +	return 0;
> +}
> +
> +/*
> + * Retrieve the memory allocation callbacks assigned to the ACL context.
> + */
> +RTE_EXPORT_SYMBOL(rte_acl_get_mem_cb)

Should be RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_acl_get_mem_cb, 26.03)

> +int
> +rte_acl_get_mem_cb(const struct rte_acl_ctx *acl, struct rte_acl_mem_cb *mcb)
> +{
> +	if (!acl || !mcb)

suggest acl == NULL || mcb == NULL

> +		return -EINVAL;
> +	memcpy(mcb, &acl->mem_cb, sizeof(struct rte_acl_mem_cb));
> +	return 0;
> +}
> diff --git a/lib/acl/rte_acl.h b/lib/acl/rte_acl.h
> index 95354cabb8..e3273556e2 100644
> --- a/lib/acl/rte_acl.h
> +++ b/lib/acl/rte_acl.h
> @@ -136,6 +136,48 @@ struct rte_acl_param {
>  /** @internal opaque ACL handle */
>  struct rte_acl_ctx;
>  
> +/**
> + * Memory allocation callbacks for ACL runtime.
> + */
> +struct rte_acl_mem_cb {

How about rte_acl_mem_hook?
I think "hook" is better suited to these semantics.

> +	/** Allocate zero-initialized memory used during runtime. */
> +	void *(*zalloc)(void *udata, char *name, size_t size, size_t align, int32_t socket_id);

Suggest moving udata to be the last parameter.

> +
> +	/** Free memory previously allocated by zalloc(). */
> +	void (*free)(void *udata, void *ptr);

Suggest moving udata to be the last parameter.

> +
> +	/** User-provided context passed to allocation/free callbacks. */
> +	void *udata;
> +};
> +
> +/**
> + * Set memory allocation callbacks for a given ACL context.
> + *
> + * @param acl
> + *   The ACL context.
> + * @param mcb
> + *   Pointer to the memory callback structure

Suggest adding a note about when this API can be invoked.

> + *
> + * @return
> + *   0 on success.
> + *   -EINVAL if parameters are invalid.
> + */

should add __rte_experimental
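
i.e. something like:

	__rte_experimental
	int rte_acl_set_mem_cb(struct rte_acl_ctx *acl, const struct rte_acl_mem_cb *mcb);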

> +int rte_acl_set_mem_cb(struct rte_acl_ctx *acl, const struct rte_acl_mem_cb *mcb);
> +
> +/**
> + * Retrieve the memory allocation callbacks assigned to the ACL context.
> + *
> + * @param acl
> + *   The ACL context.
> + * @param mcb
> + *   Output location for the current memory callback structure
> + *
> + * @return
> + *   0 on success.
> + *   -EINVAL if parameters are invalid.
> + */

should add __rte_experimental

> +int rte_acl_get_mem_cb(const struct rte_acl_ctx *acl, struct rte_acl_mem_cb *mcb);
> +
>  /**
>   * De-allocate all memory used by ACL context.
>   *

please add a description section to packet_classif_access_ctrl.rst and a release note entry.



^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2025-12-02  2:47 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-11-14  2:51 [RFC] rte_acl_build memory fragmentation concern and proposal for external memory support mannywang(王永峰)
2025-11-17 12:51 ` Konstantin Ananyev
2025-11-25  9:40   ` [PATCH] acl: support custom memory allocator mannywang(王永峰)
2025-11-25 12:06   ` [PATCH v2] " mannywang(王永峰)
2025-11-25 12:14   ` [PATCH v3] " mannywang(王永峰)
2025-11-25 14:59     ` Stephen Hemminger
2025-11-26  2:37       ` [Internet]Re: " mannywang(王永峰)
2025-11-25 18:01     ` Dmitry Kozlyuk
2025-11-26  2:44       ` [Internet]Re: " mannywang(王永峰)
2025-11-26  7:57         ` Dmitry Kozlyuk
2025-11-26  8:09           ` mannywang(王永峰)
2025-11-26 21:28             ` Stephen Hemminger
2025-11-27  2:05               ` [Internet]Re: " mannywang(王永峰)
2025-11-28 13:26     ` Konstantin Ananyev
2025-11-28 15:07       ` mannywang(王永峰)
2025-12-01 12:05       ` [PATCH v4] acl: support custom memory allocators in rte_acl_build mannywang(王永峰)
2025-12-01 15:59         ` Patrick Robb
2025-12-01 16:42         ` Stephen Hemminger
2025-12-01 12:45       ` [PATCH v5] " mannywang(王永峰)
2025-12-02  2:47         ` fengchengwen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).