From: Bing Zhao <bingz@nvidia.com>
To: <viacheslavo@nvidia.com>, <matan@nvidia.com>
Cc: <dev@dpdk.org>, <thomas@monjalon.net>, <dsosnowski@nvidia.com>,
	<suanmingm@nvidia.com>, <rasland@nvidia.com>
Subject: [PATCH v3 1/5] net/mlx5: add new devarg for Tx queue consecutive memory
Date: Fri, 27 Jun 2025 19:37:25 +0300
Message-ID: <20250627163729.50460-2-bingz@nvidia.com>
In-Reply-To: <20250627163729.50460-1-bingz@nvidia.com>

With this commit, a new device argument is introduced to control
the memory allocation for Tx queues.

By default, when no value is specified, an alignment to the system
page size is used. The SQ / CQ memory of all Tx queues is allocated
once and a single umem & MR is used.

When set to 0, the legacy way of per-queue umem allocation is
selected; that path is implemented in a following commit.

If the value is smaller than the system page size, the starting
address alignment is rounded up to the page size.

The value is a logarithm to base 2.

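For example (an illustrative invocation, with a placeholder PCI
address), "txq_mem_algn=12" requests a 4KB (2^12) starting address
alignment for each Tx queue:

    dpdk-testpmd -a <PCI_BDF>,txq_mem_algn=12 -- -i

and "txq_mem_algn=0" selects the legacy per-queue allocation.
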
Signed-off-by: Bing Zhao <bingz@nvidia.com>
---
 doc/guides/nics/mlx5.rst | 18 ++++++++++++++++++
 drivers/net/mlx5/mlx5.c  | 36 ++++++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5.h  |  7 ++++---
 3 files changed, 58 insertions(+), 3 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index c1dcb9ca68..82cb06909d 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -1682,6 +1682,24 @@ for an additional list of options shared with other mlx5 drivers.
 
   By default, the PMD will set this value to 1.
 
+- ``txq_mem_algn`` parameter [int]
+
+  The logarithm to base 2 of the starting address alignment for the
+  memory of Tx queues' WQ and associated CQ. Different CPU architectures
+  and generations may have different cache systems, and the memory
+  access pattern may impact the cache miss rate on different CPUs.
+  This devarg gives the ability to control the alignment and the gaps
+  between Tx queues without rebuilding the application binary. Users
+  can tune the software performance by specifying this devarg after
+  benchmarking on their own servers and systems.
+
+  By default, the PMD sets this value to log2(4096), or log2(64 * 1024)
+  on some OS distributions, based on the system page size configuration.
+  All Tx queues use one single memory region and umem area, and each
+  queue starts at an address aligned to 4K / 64K (the default system
+  page size). If the input value is smaller than the page size, it is
+  rounded up. If it is bigger than the maximal queue size, a warning
+  message is shown and some memory space is wasted. The value 0 means
+  that the legacy per-queue memory allocation and separate MRs are used
+  as before.
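+
+  For example, to request a 64KB (2^16) starting address alignment for
+  each Tx queue (an illustrative value, not a tuning recommendation,
+  with a placeholder PCI address), the devarg can be passed as::
+
+     dpdk-testpmd -a <PCI_BDF>,txq_mem_algn=16 -- -i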
+
 
 Multiport E-Switch
 ------------------
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 1bad8a9e90..a167d75aeb 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -185,6 +185,14 @@
 /* Device parameter to control representor matching in ingress/egress flows with HWS. */
 #define MLX5_REPR_MATCHING_EN "repr_matching_en"
 
+/*
+ * Alignment of the Tx queue starting address.
+ * If set to 0, a separate umem and MR is used for each TxQ.
+ * Otherwise, consecutive memory and a single MR are used for all
+ * Tx queues, and each TxQ starts at the specified alignment.
+ */
+#define MLX5_TXQ_MEM_ALGN "txq_mem_algn"
+
 /* Shared memory between primary and secondary processes. */
 struct mlx5_shared_data *mlx5_shared_data;
 
@@ -1447,6 +1455,8 @@ mlx5_dev_args_check_handler(const char *key, const char *val, void *opaque)
 		config->cnt_svc.cycle_time = tmp;
 	} else if (strcmp(MLX5_REPR_MATCHING_EN, key) == 0) {
 		config->repr_matching = !!tmp;
+	} else if (strcmp(MLX5_TXQ_MEM_ALGN, key) == 0) {
+		config->txq_mem_algn = (uint32_t)tmp;
 	}
 	return 0;
 }
@@ -1486,9 +1496,17 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 		MLX5_HWS_CNT_SERVICE_CORE,
 		MLX5_HWS_CNT_CYCLE_TIME,
 		MLX5_REPR_MATCHING_EN,
+		MLX5_TXQ_MEM_ALGN,
 		NULL,
 	};
 	int ret = 0;
+	size_t alignment = rte_mem_page_size();
+	uint32_t max_queue_umem_size = MLX5_WQE_SIZE * mlx5_dev_get_max_wq_size(sh);
+
+	if (alignment == (size_t)-1) {
+		DRV_LOG(WARNING, "Failed to get page_size, using default 4K size.");
+		alignment = 4 * 1024;
+	}
 
 	/* Default configuration. */
 	memset(config, 0, sizeof(*config));
@@ -1501,6 +1519,7 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 	config->cnt_svc.cycle_time = MLX5_CNT_SVC_CYCLE_TIME_DEFAULT;
 	config->cnt_svc.service_core = rte_get_main_lcore();
 	config->repr_matching = 1;
+	config->txq_mem_algn = log2above(alignment);
 	if (mkvlist != NULL) {
 		/* Process parameters. */
 		ret = mlx5_kvargs_process(mkvlist, params,
@@ -1567,6 +1586,16 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 		config->hw_fcs_strip = 0;
 	else
 		config->hw_fcs_strip = sh->dev_cap.hw_fcs_strip;
+	if (config->txq_mem_algn != 0 && config->txq_mem_algn < log2above(alignment)) {
+		DRV_LOG(WARNING,
+			"\"txq_mem_algn\" %u is too small, rounding up to %u.",
+			config->txq_mem_algn, log2above(alignment));
+		config->txq_mem_algn = log2above(alignment);
+	} else if (config->txq_mem_algn > log2above(max_queue_umem_size)) {
+		DRV_LOG(WARNING,
+			"\"txq_mem_algn\" %u is bigger than the maximum %u, some memory may be wasted.",
+			config->txq_mem_algn, log2above(max_queue_umem_size));
+	}
 	DRV_LOG(DEBUG, "FCS stripping configuration is %ssupported",
 		(config->hw_fcs_strip ? "" : "not "));
 	DRV_LOG(DEBUG, "\"tx_pp\" is %d.", config->tx_pp);
@@ -1584,6 +1613,7 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 		config->allow_duplicate_pattern);
 	DRV_LOG(DEBUG, "\"fdb_def_rule_en\" is %u.", config->fdb_def_rule);
 	DRV_LOG(DEBUG, "\"repr_matching_en\" is %u.", config->repr_matching);
+	DRV_LOG(DEBUG, "\"txq_mem_algn\" is %u.", config->txq_mem_algn);
 	return 0;
 }
 
@@ -3151,6 +3181,12 @@ mlx5_probe_again_args_validate(struct mlx5_common_device *cdev,
 			sh->ibdev_name);
 		goto error;
 	}
+	if (sh->config.txq_mem_algn != config->txq_mem_algn) {
+		DRV_LOG(ERR, "\"txq_mem_algn\" configuration mismatch "
+			"for shared %s context: %u vs %u.",
+			sh->ibdev_name, sh->config.txq_mem_algn, config->txq_mem_algn);
+		goto error;
+	}
 	mlx5_free(config);
 	return 0;
 error:
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index f085656196..6b8d29a2bf 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -386,13 +386,14 @@ struct mlx5_sh_config {
 	uint32_t hw_fcs_strip:1; /* FCS stripping is supported. */
 	uint32_t allow_duplicate_pattern:1;
 	uint32_t lro_allowed:1; /* Whether LRO is allowed. */
+	/* Allow/Prevent the duplicate rules pattern. */
+	uint32_t fdb_def_rule:1; /* Create FDB default jump rule */
+	uint32_t repr_matching:1; /* Enable implicit vport matching in HWS FDB. */
+	uint32_t txq_mem_algn; /* Log2 of the TxQ starting address alignment. */
 	struct {
 		uint16_t service_core;
 		uint32_t cycle_time; /* query cycle time in milli-second. */
 	} cnt_svc; /* configure for HW steering's counter's service. */
-	/* Allow/Prevent the duplicate rules pattern. */
-	uint32_t fdb_def_rule:1; /* Create FDB default jump rule */
-	uint32_t repr_matching:1; /* Enable implicit vport matching in HWS FDB. */
 };
 
 /* Structure for VF VLAN workaround. */
-- 
2.34.1

