DPDK patches and discussions
* [PATCH 0/7] ethdev: introduce hairpin memory capabilities
@ 2022-09-19 16:37 Dariusz Sosnowski
  2022-09-19 16:37 ` [PATCH 1/7] " Dariusz Sosnowski
                   ` (8 more replies)
  0 siblings, 9 replies; 30+ messages in thread
From: Dariusz Sosnowski @ 2022-09-19 16:37 UTC (permalink / raw)
  To: Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
  Cc: dev, Viacheslav Ovsiienko, Matan Azrad, Ori Kam, Wisam Jaddo,
	Aman Singh, Yuying Zhang

This patch series introduces the hairpin memory configuration options proposed in
http://patches.dpdk.org/project/dpdk/patch/20220811120530.191683-1-dsosnowski@nvidia.com/
for Rx and Tx hairpin queues. It also implements handling of these options in the mlx5 PMD
and allows the new hairpin options to be used in testpmd (through the `--hairpin-mode` option)
and in flow-perf (through the `--hairpin-conf` option).

Dariusz Sosnowski (7):
  ethdev: introduce hairpin memory capabilities
  common/mlx5: add hairpin SQ buffer type capabilities
  common/mlx5: add hairpin RQ buffer type capabilities
  net/mlx5: allow hairpin Tx queue in RTE memory
  net/mlx5: allow hairpin Rx queue in locked memory
  app/testpmd: add hairpin queues memory modes
  app/flow-perf: add hairpin queue memory config

 app/test-flow-perf/main.c             |  32 +++++
 app/test-pmd/parameters.c             |   2 +-
 app/test-pmd/testpmd.c                |  24 +++-
 app/test-pmd/testpmd.h                |   2 +-
 doc/guides/platform/mlx5.rst          |   5 +
 doc/guides/testpmd_app_ug/run_app.rst |  10 +-
 drivers/common/mlx5/mlx5_devx_cmds.c  |   8 ++
 drivers/common/mlx5/mlx5_devx_cmds.h  |   5 +
 drivers/common/mlx5/mlx5_prm.h        |  25 +++-
 drivers/net/mlx5/mlx5.h               |   2 +
 drivers/net/mlx5/mlx5_devx.c          | 170 +++++++++++++++++++++++---
 drivers/net/mlx5/mlx5_ethdev.c        |   6 +
 lib/ethdev/rte_ethdev.c               |  44 +++++++
 lib/ethdev/rte_ethdev.h               |  65 +++++++++-
 14 files changed, 373 insertions(+), 27 deletions(-)

-- 
2.25.1



* [PATCH 1/7] ethdev: introduce hairpin memory capabilities
  2022-09-19 16:37 [PATCH 0/7] ethdev: introduce hairpin memory capabilities Dariusz Sosnowski
@ 2022-09-19 16:37 ` Dariusz Sosnowski
  2022-10-04 16:50   ` Thomas Monjalon
  2022-09-19 16:37 ` [PATCH 2/7] common/mlx5: add hairpin SQ buffer type capabilities Dariusz Sosnowski
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 30+ messages in thread
From: Dariusz Sosnowski @ 2022-09-19 16:37 UTC (permalink / raw)
  To: Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko; +Cc: dev

This patch introduces new hairpin queue configuration options through the
rte_eth_hairpin_conf struct, allowing the memory configuration of Rx and Tx
hairpin queues to be tuned. The hairpin configuration is extended with the
following fields:

- use_locked_device_memory - If set, the PMD will use specialized on-device
  memory to store Rx or Tx hairpin queue data.
- use_rte_memory - If set, the PMD will use DPDK-managed memory to store Rx
  or Tx hairpin queue data.
- force_memory - If set, the PMD will be forced to use the provided memory
  settings. If no appropriate resources are available, then device start
  will fail. If unset and no resources are available, the PMD will fall back
  to using the default type of resource for the given queue.

Hairpin capabilities are also extended, so that support for a given hairpin
memory configuration can be verified. Struct rte_eth_hairpin_cap is
extended with two additional fields of type rte_eth_hairpin_queue_cap:

- rx_cap - memory capabilities of hairpin Rx queues.
- tx_cap - memory capabilities of hairpin Tx queues.

Struct rte_eth_hairpin_queue_cap exposes whether a given queue type
supports the use_locked_device_memory and use_rte_memory flags.
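
For illustration only, a minimal sketch of an application using the new
fields is shown below (the helper name and the port/queue identifiers are
placeholders, not part of this patch):

    #include <rte_ethdev.h>

    /* Prefer locked device memory for an Rx hairpin queue when the PMD
     * reports support for it. force_memory is left unset, so the PMD may
     * fall back to its default memory type if the allocation fails.
     */
    static int
    setup_rx_hairpin_queue(uint16_t port_id, uint16_t queue_id,
                           uint16_t nb_desc, uint16_t peer_port,
                           uint16_t peer_queue)
    {
        struct rte_eth_hairpin_cap cap;
        struct rte_eth_hairpin_conf conf = { .peer_count = 1 };
        int ret;

        ret = rte_eth_dev_hairpin_capability_get(port_id, &cap);
        if (ret != 0)
            return ret;
        if (cap.rx_cap.locked_device_memory)
            conf.use_locked_device_memory = 1;
        conf.peers[0].port = peer_port;
        conf.peers[0].queue = peer_queue;
        return rte_eth_rx_hairpin_queue_setup(port_id, queue_id, nb_desc,
                                              &conf);
    }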

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 lib/ethdev/rte_ethdev.c | 44 ++++++++++++++++++++++++++++
 lib/ethdev/rte_ethdev.h | 65 ++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 108 insertions(+), 1 deletion(-)

diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 1979dc0850..edcec08231 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1945,6 +1945,28 @@ rte_eth_rx_hairpin_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 			conf->peer_count, cap.max_rx_2_tx);
 		return -EINVAL;
 	}
+	if (conf->use_locked_device_memory && !cap.rx_cap.locked_device_memory) {
+		RTE_ETHDEV_LOG(ERR,
+			"Attempt to use locked device memory for Rx queue, which is not supported");
+		return -EINVAL;
+	}
+	if (conf->use_rte_memory && !cap.rx_cap.rte_memory) {
+		RTE_ETHDEV_LOG(ERR,
+			"Attempt to use DPDK memory for Rx queue, which is not supported");
+		return -EINVAL;
+	}
+	if (conf->use_locked_device_memory && conf->use_rte_memory) {
+		RTE_ETHDEV_LOG(ERR,
+			"Attempt to use mutually exclusive memory settings for Rx queue");
+		return -EINVAL;
+	}
+	if (conf->force_memory &&
+	    !conf->use_locked_device_memory &&
+	    !conf->use_rte_memory) {
+		RTE_ETHDEV_LOG(ERR,
+			"Attempt to force Rx queue memory settings, but none is set");
+		return -EINVAL;
+	}
 	if (conf->peer_count == 0) {
 		RTE_ETHDEV_LOG(ERR,
 			"Invalid value for number of peers for Rx queue(=%u), should be: > 0",
@@ -2111,6 +2133,28 @@ rte_eth_tx_hairpin_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
 			conf->peer_count, cap.max_tx_2_rx);
 		return -EINVAL;
 	}
+	if (conf->use_locked_device_memory && !cap.tx_cap.locked_device_memory) {
+		RTE_ETHDEV_LOG(ERR,
+			"Attempt to use locked device memory for Tx queue, which is not supported");
+		return -EINVAL;
+	}
+	if (conf->use_rte_memory && !cap.tx_cap.rte_memory) {
+		RTE_ETHDEV_LOG(ERR,
+			"Attempt to use DPDK memory for Tx queue, which is not supported");
+		return -EINVAL;
+	}
+	if (conf->use_locked_device_memory && conf->use_rte_memory) {
+		RTE_ETHDEV_LOG(ERR,
+			"Attempt to use mutually exclusive memory settings for Tx queue");
+		return -EINVAL;
+	}
+	if (conf->force_memory &&
+	    !conf->use_locked_device_memory &&
+	    !conf->use_rte_memory) {
+		RTE_ETHDEV_LOG(ERR,
+			"Attempt to force Tx queue memory settings, but none is set");
+		return -EINVAL;
+	}
 	if (conf->peer_count == 0) {
 		RTE_ETHDEV_LOG(ERR,
 			"Invalid value for number of peers for Tx queue(=%u), should be: > 0",
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index de9e970d4d..e179b0e79b 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1273,6 +1273,28 @@ struct rte_eth_txconf {
 	void *reserved_ptrs[2];   /**< Reserved for future fields */
 };
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * A structure used to return the Tx or Rx hairpin queue capabilities that are supported.
+ */
+struct rte_eth_hairpin_queue_cap {
+	/**
+	 * When set, a specialized on-device memory type can be used as a backing
+	 * storage for a given hairpin queue type.
+	 */
+	uint32_t locked_device_memory:1;
+
+	/**
+	 * When set, memory managed by DPDK can be used as a backing storage
+	 * for a given hairpin queue type.
+	 */
+	uint32_t rte_memory:1;
+
+	uint32_t reserved:30; /**< Reserved for future fields */
+};
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
@@ -1287,6 +1309,8 @@ struct rte_eth_hairpin_cap {
 	/** Max number of Tx queues to be connected to one Rx queue. */
 	uint16_t max_tx_2_rx;
 	uint16_t max_nb_desc; /**< The max num of descriptors. */
+	struct rte_eth_hairpin_queue_cap rx_cap; /**< Rx hairpin queue capabilities. */
+	struct rte_eth_hairpin_queue_cap tx_cap; /**< Tx hairpin queue capabilities. */
 };
 
 #define RTE_ETH_MAX_HAIRPIN_PEERS 32
@@ -1334,7 +1358,46 @@ struct rte_eth_hairpin_conf {
 	 *   configured automatically during port start.
 	 */
 	uint32_t manual_bind:1;
-	uint32_t reserved:14; /**< Reserved bits. */
+
+	/**
+	 * Use locked device memory as a backing storage.
+	 *
+	 * - When set, PMD will attempt to use on-device memory as a backing storage for descriptors
+	 *   and/or data in hairpin queue.
+	 * - When set, PMD will use detault memory type as a backing storage. Please refer to PMD
+	 *   documentation for details.
+	 *
+	 * API user should check if PMD supports this configuration flag using
+	 * @see rte_eth_dev_hairpin_capability_get.
+	 */
+	uint32_t use_locked_device_memory:1;
+
+	/**
+	 * Use DPDK memory as backing storage.
+	 *
+	 * - When set, PMD will attempt to use memory managed by DPDK as a backing storage
+	 *   for descriptors and/or data in hairpin queue.
+	 * - When clear, PMD will use default memory type as a backing storage. Please refer
+	 *   to PMD documentation for details.
+	 *
+	 * API user should check if PMD supports this configuration flag using
+	 * @see rte_eth_dev_hairpin_capability_get.
+	 */
+	uint32_t use_rte_memory:1;
+
+	/**
+	 * Force usage of hairpin memory configuration.
+	 *
+	 * - When set, PMD will attempt to use specified memory settings and
+	 *   if resource allocation fails, then hairpin queue setup will result in an
+	 *   error.
+	 * - When clear, PMD will attempt to use specified memory settings and
+	 *   if resource allocation fails, then PMD will retry allocation with default
+	 *   configuration.
+	 */
+	uint32_t force_memory:1;
+
+	uint32_t reserved:11; /**< Reserved bits. */
 	struct rte_eth_hairpin_peer peers[RTE_ETH_MAX_HAIRPIN_PEERS];
 };
 
-- 
2.25.1



* [PATCH 2/7] common/mlx5: add hairpin SQ buffer type capabilities
  2022-09-19 16:37 [PATCH 0/7] ethdev: introduce hairpin memory capabilities Dariusz Sosnowski
  2022-09-19 16:37 ` [PATCH 1/7] " Dariusz Sosnowski
@ 2022-09-19 16:37 ` Dariusz Sosnowski
  2022-09-27 13:03   ` Slava Ovsiienko
  2022-09-19 16:37 ` [PATCH 3/7] common/mlx5: add hairpin RQ " Dariusz Sosnowski
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 30+ messages in thread
From: Dariusz Sosnowski @ 2022-09-19 16:37 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev

This patch extends the HCA_CAP and SQ Context structs available in PRM.
These fields allow checking whether the NIC supports storing the hairpin
SQ's WQ buffer in host memory, and configuring such memory placement.

HCA capabilities are extended with the following fields:

- hairpin_sq_wq_in_host_mem - If set, then the NIC supports using host
memory as backing storage for the hairpin SQ's WQ buffer.
- hairpin_sq_wqe_bb_size - Indicates the required size of the SQ WQE basic
block.

The SQ Context is extended with hairpin_wq_buffer_type, which informs the
NIC where the SQ's WQ buffer will be stored. This field can take the
following values:

- MLX5_SQC_HAIRPIN_WQ_BUFFER_TYPE_INTERNAL_BUFFER - WQ buffer will be
  stored in unlocked device memory.
- MLX5_SQC_HAIRPIN_WQ_BUFFER_TYPE_HOST_MEMORY - WQ buffer will be stored
  in host memory. Buffer is provided by PMD.
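
For illustration only (not part of this patch), a caller could select the
host-memory placement roughly as follows; hca_attr and ctx stand for the
queried HCA capabilities and the DevX context, and only the relevant
fields are shown:

    /* Pick the WQ buffer type for a hairpin SQ based on the new
     * capability bit, then create the SQ through DevX.
     */
    struct mlx5_devx_create_sq_attr sq_attr = {
        .hairpin = 1,
        .hairpin_wq_buffer_type =
            MLX5_SQC_HAIRPIN_WQ_BUFFER_TYPE_INTERNAL_BUFFER,
    };
    struct mlx5_devx_obj *sq;

    if (hca_attr->hairpin_sq_wq_in_host_mem)
        sq_attr.hairpin_wq_buffer_type =
            MLX5_SQC_HAIRPIN_WQ_BUFFER_TYPE_HOST_MEMORY;
    sq = mlx5_devx_cmd_create_sq(ctx, &sq_attr);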

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c |  5 +++++
 drivers/common/mlx5/mlx5_devx_cmds.h |  3 +++
 drivers/common/mlx5/mlx5_prm.h       | 15 +++++++++++++--
 3 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 8880a9f3b5..2b12ce0d4c 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -981,6 +981,10 @@ mlx5_devx_cmd_query_hca_attr(void *ctx,
 		}
 		attr->log_min_stride_wqe_sz = MLX5_GET(cmd_hca_cap_2, hcattr,
 						       log_min_stride_wqe_sz);
+		attr->hairpin_sq_wqe_bb_size = MLX5_GET(cmd_hca_cap_2, hcattr,
+							hairpin_sq_wqe_bb_size);
+		attr->hairpin_sq_wq_in_host_mem = MLX5_GET(cmd_hca_cap_2, hcattr,
+							   hairpin_sq_wq_in_host_mem);
 	}
 	if (attr->log_min_stride_wqe_sz == 0)
 		attr->log_min_stride_wqe_sz = MLX5_MPRQ_LOG_MIN_STRIDE_WQE_SIZE;
@@ -1698,6 +1702,7 @@ mlx5_devx_cmd_create_sq(void *ctx,
 	MLX5_SET(sqc, sq_ctx, hairpin, sq_attr->hairpin);
 	MLX5_SET(sqc, sq_ctx, non_wire, sq_attr->non_wire);
 	MLX5_SET(sqc, sq_ctx, static_sq_wq, sq_attr->static_sq_wq);
+	MLX5_SET(sqc, sq_ctx, hairpin_wq_buffer_type, sq_attr->hairpin_wq_buffer_type);
 	MLX5_SET(sqc, sq_ctx, user_index, sq_attr->user_index);
 	MLX5_SET(sqc, sq_ctx, cqn, sq_attr->cqn);
 	MLX5_SET(sqc, sq_ctx, packet_pacing_rate_limit_index,
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index af6053a788..9ac2d75df4 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -191,6 +191,8 @@ struct mlx5_hca_attr {
 	uint32_t log_max_hairpin_queues:5;
 	uint32_t log_max_hairpin_wq_data_sz:5;
 	uint32_t log_max_hairpin_num_packets:5;
+	uint32_t hairpin_sq_wqe_bb_size:4;
+	uint32_t hairpin_sq_wq_in_host_mem:1;
 	uint32_t vhca_id:16;
 	uint32_t relaxed_ordering_write:1;
 	uint32_t relaxed_ordering_read:1;
@@ -407,6 +409,7 @@ struct mlx5_devx_create_sq_attr {
 	uint32_t non_wire:1;
 	uint32_t static_sq_wq:1;
 	uint32_t ts_format:2;
+	uint32_t hairpin_wq_buffer_type:3;
 	uint32_t user_index:24;
 	uint32_t cqn:24;
 	uint32_t packet_pacing_rate_limit_index:16;
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 4346279c81..04d35ca845 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -2020,7 +2020,11 @@ struct mlx5_ifc_cmd_hca_cap_2_bits {
 	u8 reserved_at_d8[0x3];
 	u8 log_max_conn_track_offload[0x5];
 	u8 reserved_at_e0[0x20]; /* End of DW7. */
-	u8 reserved_at_100[0x700];
+	u8 reserved_at_100[0x60];
+	u8 reserved_at_160[0x3];
+	u8 hairpin_sq_wqe_bb_size[0x5];
+	u8 hairpin_sq_wq_in_host_mem[0x1];
+	u8 reserved_at_169[0x697];
 };
 
 struct mlx5_ifc_esw_cap_bits {
@@ -2673,6 +2677,11 @@ enum {
 	MLX5_SQC_STATE_ERR  = 0x3,
 };
 
+enum {
+	MLX5_SQC_HAIRPIN_WQ_BUFFER_TYPE_INTERNAL_BUFFER = 0x0,
+	MLX5_SQC_HAIRPIN_WQ_BUFFER_TYPE_HOST_MEMORY = 0x1,
+};
+
 struct mlx5_ifc_sqc_bits {
 	u8 rlky[0x1];
 	u8 cd_master[0x1];
@@ -2686,7 +2695,9 @@ struct mlx5_ifc_sqc_bits {
 	u8 hairpin[0x1];
 	u8 non_wire[0x1];
 	u8 static_sq_wq[0x1];
-	u8 reserved_at_11[0x9];
+	u8 reserved_at_11[0x4];
+	u8 hairpin_wq_buffer_type[0x3];
+	u8 reserved_at_18[0x2];
 	u8 ts_format[0x02];
 	u8 reserved_at_1c[0x4];
 	u8 reserved_at_20[0x8];
-- 
2.25.1



* [PATCH 3/7] common/mlx5: add hairpin RQ buffer type capabilities
  2022-09-19 16:37 [PATCH 0/7] ethdev: introduce hairpin memory capabilities Dariusz Sosnowski
  2022-09-19 16:37 ` [PATCH 1/7] " Dariusz Sosnowski
  2022-09-19 16:37 ` [PATCH 2/7] common/mlx5: add hairpin SQ buffer type capabilities Dariusz Sosnowski
@ 2022-09-19 16:37 ` Dariusz Sosnowski
  2022-09-27 13:04   ` Slava Ovsiienko
  2022-09-19 16:37 ` [PATCH 4/7] net/mlx5: allow hairpin Tx queue in RTE memory Dariusz Sosnowski
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 30+ messages in thread
From: Dariusz Sosnowski @ 2022-09-19 16:37 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev

This patch adds a new HCA capability related to hairpin RQs. This new
capability, hairpin_data_buffer_locked, indicates whether the HCA supports
locking the data buffer of a hairpin RQ in ICMC (Interconnect Context
Memory Cache).

The struct used to define the RQ configuration (RQ context) is extended
with a hairpin_data_buffer_type field, which configures the data buffer of
a hairpin RQ. It can take the following values:

- MLX5_RQC_HAIRPIN_DATA_BUFFER_TYPE_UNLOCKED_INTERNAL_BUFFER - hairpin
  RQ's data buffer is stored in unlocked memory in ICMC.
- MLX5_RQC_HAIRPIN_DATA_BUFFER_TYPE_LOCKED_INTERNAL_BUFFER - hairpin
  RQ's data buffer is stored in locked memory in ICMC.
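
For illustration only (not part of this patch), the RQ side can be driven
in a similar way; hca_attr, ctx and socket are placeholders for the
queried HCA capabilities, the DevX context and the NUMA socket:

    /* Pick the data buffer type for a hairpin RQ based on the new
     * capability bit, then create the RQ through DevX.
     */
    struct mlx5_devx_create_rq_attr rq_attr = {
        .hairpin = 1,
        .hairpin_data_buffer_type =
            MLX5_RQC_HAIRPIN_DATA_BUFFER_TYPE_UNLOCKED_INTERNAL_BUFFER,
    };
    struct mlx5_devx_obj *rq;

    if (hca_attr->hairpin_data_buffer_locked)
        rq_attr.hairpin_data_buffer_type =
            MLX5_RQC_HAIRPIN_DATA_BUFFER_TYPE_LOCKED_INTERNAL_BUFFER;
    rq = mlx5_devx_cmd_create_rq(ctx, &rq_attr, socket);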

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c |  3 +++
 drivers/common/mlx5/mlx5_devx_cmds.h |  2 ++
 drivers/common/mlx5/mlx5_prm.h       | 12 ++++++++++--
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 2b12ce0d4c..95b38783dc 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -985,6 +985,8 @@ mlx5_devx_cmd_query_hca_attr(void *ctx,
 							hairpin_sq_wqe_bb_size);
 		attr->hairpin_sq_wq_in_host_mem = MLX5_GET(cmd_hca_cap_2, hcattr,
 							   hairpin_sq_wq_in_host_mem);
+		attr->hairpin_data_buffer_locked = MLX5_GET(cmd_hca_cap_2, hcattr,
+							    hairpin_data_buffer_locked);
 	}
 	if (attr->log_min_stride_wqe_sz == 0)
 		attr->log_min_stride_wqe_sz = MLX5_MPRQ_LOG_MIN_STRIDE_WQE_SIZE;
@@ -1285,6 +1287,7 @@ mlx5_devx_cmd_create_rq(void *ctx,
 	MLX5_SET(rqc, rq_ctx, state, rq_attr->state);
 	MLX5_SET(rqc, rq_ctx, flush_in_error_en, rq_attr->flush_in_error_en);
 	MLX5_SET(rqc, rq_ctx, hairpin, rq_attr->hairpin);
+	MLX5_SET(rqc, rq_ctx, hairpin_data_buffer_type, rq_attr->hairpin_data_buffer_type);
 	MLX5_SET(rqc, rq_ctx, user_index, rq_attr->user_index);
 	MLX5_SET(rqc, rq_ctx, cqn, rq_attr->cqn);
 	MLX5_SET(rqc, rq_ctx, counter_set_id, rq_attr->counter_set_id);
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index 9ac2d75df4..cceaf3411d 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -193,6 +193,7 @@ struct mlx5_hca_attr {
 	uint32_t log_max_hairpin_num_packets:5;
 	uint32_t hairpin_sq_wqe_bb_size:4;
 	uint32_t hairpin_sq_wq_in_host_mem:1;
+	uint32_t hairpin_data_buffer_locked:1;
 	uint32_t vhca_id:16;
 	uint32_t relaxed_ordering_write:1;
 	uint32_t relaxed_ordering_read:1;
@@ -313,6 +314,7 @@ struct mlx5_devx_create_rq_attr {
 	uint32_t state:4;
 	uint32_t flush_in_error_en:1;
 	uint32_t hairpin:1;
+	uint32_t hairpin_data_buffer_type:3;
 	uint32_t ts_format:2;
 	uint32_t user_index:24;
 	uint32_t cqn:24;
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 04d35ca845..9c1c93f916 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -2024,7 +2024,8 @@ struct mlx5_ifc_cmd_hca_cap_2_bits {
 	u8 reserved_at_160[0x3];
 	u8 hairpin_sq_wqe_bb_size[0x5];
 	u8 hairpin_sq_wq_in_host_mem[0x1];
-	u8 reserved_at_169[0x697];
+	u8 hairpin_data_buffer_locked[0x1];
+	u8 reserved_at_16a[0x696];
 };
 
 struct mlx5_ifc_esw_cap_bits {
@@ -2304,7 +2305,9 @@ struct mlx5_ifc_rqc_bits {
 	u8 reserved_at_c[0x1];
 	u8 flush_in_error_en[0x1];
 	u8 hairpin[0x1];
-	u8 reserved_at_f[0xB];
+	u8 reserved_at_f[0x6];
+	u8 hairpin_data_buffer_type[0x3];
+	u8 reserved_at_a8[0x2];
 	u8 ts_format[0x02];
 	u8 reserved_at_1c[0x4];
 	u8 reserved_at_20[0x8];
@@ -2813,6 +2816,11 @@ enum {
 	MLX5_CQE_SIZE_128B = 0x1,
 };
 
+enum {
+	MLX5_RQC_HAIRPIN_DATA_BUFFER_TYPE_UNLOCKED_INTERNAL_BUFFER = 0x0,
+	MLX5_RQC_HAIRPIN_DATA_BUFFER_TYPE_LOCKED_INTERNAL_BUFFER = 0x1,
+};
+
 struct mlx5_ifc_cqc_bits {
 	u8 status[0x4];
 	u8 as_notify[0x1];
-- 
2.25.1



* [PATCH 4/7] net/mlx5: allow hairpin Tx queue in RTE memory
  2022-09-19 16:37 [PATCH 0/7] ethdev: introduce hairpin memory capabilities Dariusz Sosnowski
                   ` (2 preceding siblings ...)
  2022-09-19 16:37 ` [PATCH 3/7] common/mlx5: add hairpin RQ " Dariusz Sosnowski
@ 2022-09-19 16:37 ` Dariusz Sosnowski
  2022-09-27 13:05   ` Slava Ovsiienko
  2022-09-19 16:37 ` [PATCH 5/7] net/mlx5: allow hairpin Rx queue in locked memory Dariusz Sosnowski
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 30+ messages in thread
From: Dariusz Sosnowski @ 2022-09-19 16:37 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev

This patch adds a capability to place a hairpin Tx queue in host memory
managed by DPDK. This capability is equivalent to storing the hairpin SQ's
WQ buffer in host memory.

Hairpin Tx queue creation is extended with allocating a memory buffer of
the proper size (calculated from the required number of packets and the
WQE BB size advertised in the HCA capabilities).

The force_memory flag of the hairpin queue configuration is also supported.
If it is set and:

- allocation of the memory buffer fails,
- or hairpin SQ creation fails,

then device start will fail. If it is unset, the PMD will fall back to
creating the hairpin SQ with its WQ buffer located in unlocked device
memory.
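
As a rough illustration of the sizing (example numbers, not taken from this
patch): assuming the PMD's 64-byte WQE size and log_hairpin_num_packets
equal to 12 (4096 WQEs), the host buffer spans 64 B * 4096 = 256 KiB, plus
one doorbell record appended after rounding the buffer size up to the
doorbell record size.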

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 drivers/net/mlx5/mlx5.h        |   2 +
 drivers/net/mlx5/mlx5_devx.c   | 119 ++++++++++++++++++++++++++++++---
 drivers/net/mlx5/mlx5_ethdev.c |   4 ++
 3 files changed, 116 insertions(+), 9 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 8af84aef50..f564d4b771 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1384,6 +1384,8 @@ struct mlx5_txq_obj {
 			struct mlx5_devx_obj *sq;
 			/* DevX object for Sx queue. */
 			struct mlx5_devx_obj *tis; /* The TIS object. */
+			void *umem_buf_wq_buffer;
+			struct mlx5dv_devx_umem *umem_obj_wq_buffer;
 		};
 		struct {
 			struct rte_eth_dev *dev;
diff --git a/drivers/net/mlx5/mlx5_devx.c b/drivers/net/mlx5/mlx5_devx.c
index 6886ae1f22..a81b1bae47 100644
--- a/drivers/net/mlx5/mlx5_devx.c
+++ b/drivers/net/mlx5/mlx5_devx.c
@@ -1185,18 +1185,23 @@ static int
 mlx5_txq_obj_hairpin_new(struct rte_eth_dev *dev, uint16_t idx)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hca_attr *hca_attr = &priv->sh->cdev->config.hca_attr;
 	struct mlx5_txq_data *txq_data = (*priv->txqs)[idx];
 	struct mlx5_txq_ctrl *txq_ctrl =
 		container_of(txq_data, struct mlx5_txq_ctrl, txq);
-	struct mlx5_devx_create_sq_attr attr = { 0 };
+	struct mlx5_devx_create_sq_attr dev_mem_attr = { 0 };
+	struct mlx5_devx_create_sq_attr host_mem_attr = { 0 };
 	struct mlx5_txq_obj *tmpl = txq_ctrl->obj;
+	struct mlx5dv_devx_umem *umem_obj = NULL;
+	void *umem_buf = NULL;
 	uint32_t max_wq_data;
 
 	MLX5_ASSERT(txq_data);
 	MLX5_ASSERT(tmpl);
 	tmpl->txq_ctrl = txq_ctrl;
-	attr.hairpin = 1;
-	attr.tis_lst_sz = 1;
+	dev_mem_attr.hairpin = 1;
+	dev_mem_attr.tis_lst_sz = 1;
+	dev_mem_attr.tis_num = mlx5_get_txq_tis_num(dev, idx);
 	max_wq_data =
 		priv->sh->cdev->config.hca_attr.log_max_hairpin_wq_data_sz;
 	/* Jumbo frames > 9KB should be supported, and more packets. */
@@ -1208,19 +1213,103 @@ mlx5_txq_obj_hairpin_new(struct rte_eth_dev *dev, uint16_t idx)
 			rte_errno = ERANGE;
 			return -rte_errno;
 		}
-		attr.wq_attr.log_hairpin_data_sz = priv->config.log_hp_size;
+		dev_mem_attr.wq_attr.log_hairpin_data_sz = priv->config.log_hp_size;
 	} else {
-		attr.wq_attr.log_hairpin_data_sz =
+		dev_mem_attr.wq_attr.log_hairpin_data_sz =
 				(max_wq_data < MLX5_HAIRPIN_JUMBO_LOG_SIZE) ?
 				 max_wq_data : MLX5_HAIRPIN_JUMBO_LOG_SIZE;
 	}
 	/* Set the packets number to the maximum value for performance. */
-	attr.wq_attr.log_hairpin_num_packets =
-			attr.wq_attr.log_hairpin_data_sz -
+	dev_mem_attr.wq_attr.log_hairpin_num_packets =
+			dev_mem_attr.wq_attr.log_hairpin_data_sz -
 			MLX5_HAIRPIN_QUEUE_STRIDE;
+	dev_mem_attr.hairpin_wq_buffer_type = MLX5_SQC_HAIRPIN_WQ_BUFFER_TYPE_INTERNAL_BUFFER;
+	if (txq_ctrl->hairpin_conf.use_rte_memory) {
+		uint32_t umem_size;
+		uint32_t umem_dbrec;
+		size_t alignment = MLX5_WQE_BUF_ALIGNMENT;
 
-	attr.tis_num = mlx5_get_txq_tis_num(dev, idx);
-	tmpl->sq = mlx5_devx_cmd_create_sq(priv->sh->cdev->ctx, &attr);
+		if (alignment == (size_t)-1) {
+			DRV_LOG(ERR, "Failed to get WQE buf alignment.");
+			rte_errno = ENOMEM;
+			return -rte_errno;
+		}
+		/*
+		 * It is assumed that configuration is verified against capabilities
+		 * during queue setup.
+		 */
+		MLX5_ASSERT(hca_attr->hairpin_sq_wq_in_host_mem);
+		MLX5_ASSERT(hca_attr->hairpin_sq_wqe_bb_size > 0);
+		rte_memcpy(&host_mem_attr, &dev_mem_attr, sizeof(host_mem_attr));
+		umem_size = MLX5_WQE_SIZE *
+			RTE_BIT32(host_mem_attr.wq_attr.log_hairpin_num_packets);
+		umem_dbrec = RTE_ALIGN(umem_size, MLX5_DBR_SIZE);
+		umem_size += MLX5_DBR_SIZE;
+		umem_buf = mlx5_malloc(MLX5_MEM_RTE | MLX5_MEM_ZERO, umem_size,
+				       alignment, priv->sh->numa_node);
+		if (umem_buf == NULL && txq_ctrl->hairpin_conf.force_memory) {
+			DRV_LOG(ERR, "Failed to allocate memory for hairpin TX queue");
+			rte_errno = ENOMEM;
+			return -rte_errno;
+		} else if (umem_buf == NULL && !txq_ctrl->hairpin_conf.force_memory) {
+			DRV_LOG(WARNING, "Failed to allocate memory for hairpin TX queue."
+					 " Falling back to TX queue located on the device.");
+			goto create_sq_on_device;
+		}
+		umem_obj = mlx5_glue->devx_umem_reg(priv->sh->cdev->ctx,
+						    (void *)(uintptr_t)umem_buf,
+						    umem_size,
+						    IBV_ACCESS_LOCAL_WRITE);
+		if (umem_obj == NULL && txq_ctrl->hairpin_conf.force_memory) {
+			DRV_LOG(ERR, "Failed to register UMEM for hairpin TX queue");
+			mlx5_free(umem_buf);
+			return -rte_errno;
+		} else if (umem_obj == NULL && !txq_ctrl->hairpin_conf.force_memory) {
+			DRV_LOG(WARNING, "Failed to register UMEM for hairpin TX queue."
+					 " Falling back to TX queue located on the device.");
+			rte_errno = 0;
+			mlx5_free(umem_buf);
+			goto create_sq_on_device;
+		}
+		host_mem_attr.wq_attr.wq_type = MLX5_WQ_TYPE_CYCLIC;
+		host_mem_attr.wq_attr.wq_umem_valid = 1;
+		host_mem_attr.wq_attr.wq_umem_id = mlx5_os_get_umem_id(umem_obj);
+		host_mem_attr.wq_attr.wq_umem_offset = 0;
+		host_mem_attr.wq_attr.dbr_umem_valid = 1;
+		host_mem_attr.wq_attr.dbr_umem_id = host_mem_attr.wq_attr.wq_umem_id;
+		host_mem_attr.wq_attr.dbr_addr = umem_dbrec;
+		host_mem_attr.wq_attr.log_wq_stride = rte_log2_u32(MLX5_WQE_SIZE);
+		host_mem_attr.wq_attr.log_wq_sz =
+				host_mem_attr.wq_attr.log_hairpin_num_packets *
+				hca_attr->hairpin_sq_wqe_bb_size;
+		host_mem_attr.wq_attr.log_wq_pg_sz = MLX5_LOG_PAGE_SIZE;
+		host_mem_attr.hairpin_wq_buffer_type = MLX5_SQC_HAIRPIN_WQ_BUFFER_TYPE_HOST_MEMORY;
+		tmpl->sq = mlx5_devx_cmd_create_sq(priv->sh->cdev->ctx, &host_mem_attr);
+		if (!tmpl->sq && txq_ctrl->hairpin_conf.force_memory) {
+			DRV_LOG(ERR,
+				"Port %u tx hairpin queue %u can't create SQ object.",
+				dev->data->port_id, idx);
+			claim_zero(mlx5_glue->devx_umem_dereg(umem_obj));
+			mlx5_free(umem_buf);
+			return -rte_errno;
+		} else if (!tmpl->sq && !txq_ctrl->hairpin_conf.force_memory) {
+			DRV_LOG(WARNING,
+				"Port %u tx hairpin queue %u failed to allocate SQ object"
+				" using host memory. Falling back to TX queue located"
+				" on the device",
+				dev->data->port_id, idx);
+			rte_errno = 0;
+			claim_zero(mlx5_glue->devx_umem_dereg(umem_obj));
+			mlx5_free(umem_buf);
+			goto create_sq_on_device;
+		}
+		tmpl->umem_buf_wq_buffer = umem_buf;
+		tmpl->umem_obj_wq_buffer = umem_obj;
+		return 0;
+	}
+
+create_sq_on_device:
+	tmpl->sq = mlx5_devx_cmd_create_sq(priv->sh->cdev->ctx, &dev_mem_attr);
 	if (!tmpl->sq) {
 		DRV_LOG(ERR,
 			"Port %u tx hairpin queue %u can't create SQ object.",
@@ -1452,8 +1541,20 @@ mlx5_txq_devx_obj_release(struct mlx5_txq_obj *txq_obj)
 {
 	MLX5_ASSERT(txq_obj);
 	if (txq_obj->txq_ctrl->is_hairpin) {
+		if (txq_obj->sq) {
+			claim_zero(mlx5_devx_cmd_destroy(txq_obj->sq));
+			txq_obj->sq = NULL;
+		}
 		if (txq_obj->tis)
 			claim_zero(mlx5_devx_cmd_destroy(txq_obj->tis));
+		if (txq_obj->umem_obj_wq_buffer) {
+			claim_zero(mlx5_glue->devx_umem_dereg(txq_obj->umem_obj_wq_buffer));
+			txq_obj->umem_obj_wq_buffer = NULL;
+		}
+		if (txq_obj->umem_buf_wq_buffer) {
+			mlx5_free(txq_obj->umem_buf_wq_buffer);
+			txq_obj->umem_buf_wq_buffer = NULL;
+		}
 #if defined(HAVE_MLX5DV_DEVX_UAR_OFFSET) || !defined(HAVE_INFINIBAND_VERBS_H)
 	} else {
 		mlx5_txq_release_devx_resources(txq_obj);
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 05c919ed39..7f5b01ac74 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -729,6 +729,7 @@ int
 mlx5_hairpin_cap_get(struct rte_eth_dev *dev, struct rte_eth_hairpin_cap *cap)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hca_attr *hca_attr;
 
 	if (!mlx5_devx_obj_ops_en(priv->sh)) {
 		rte_errno = ENOTSUP;
@@ -738,5 +739,8 @@ mlx5_hairpin_cap_get(struct rte_eth_dev *dev, struct rte_eth_hairpin_cap *cap)
 	cap->max_rx_2_tx = 1;
 	cap->max_tx_2_rx = 1;
 	cap->max_nb_desc = 8192;
+	hca_attr = &priv->sh->cdev->config.hca_attr;
+	cap->tx_cap.locked_device_memory = 0;
+	cap->tx_cap.rte_memory = hca_attr->hairpin_sq_wq_in_host_mem;
 	return 0;
 }
-- 
2.25.1



* [PATCH 5/7] net/mlx5: allow hairpin Rx queue in locked memory
  2022-09-19 16:37 [PATCH 0/7] ethdev: introduce hairpin memory capabilities Dariusz Sosnowski
                   ` (3 preceding siblings ...)
  2022-09-19 16:37 ` [PATCH 4/7] net/mlx5: allow hairpin Tx queue in RTE memory Dariusz Sosnowski
@ 2022-09-19 16:37 ` Dariusz Sosnowski
  2022-09-27 13:04   ` Slava Ovsiienko
  2022-11-25 14:06   ` Kenneth Klette Jonassen
  2022-09-19 16:37 ` [PATCH 6/7] app/testpmd: add hairpin queues memory modes Dariusz Sosnowski
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 30+ messages in thread
From: Dariusz Sosnowski @ 2022-09-19 16:37 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev

This patch adds a capability to place a hairpin Rx queue in locked device
memory. This capability is equivalent to storing the hairpin RQ's data
buffers in locked internal device memory.

Hairpin Rx queue creation is extended with requesting that the RQ is
allocated in locked internal device memory. If the allocation fails and
the force_memory hairpin configuration is set, then hairpin queue creation
(and, as a result, device start) fails. If force_memory is unset, then the
PMD will fall back to allocating memory for the hairpin RQ in unlocked
internal device memory.

To allow such allocation, the user must set the HAIRPIN_DATA_BUFFER_LOCK
flag in FW using the mlxconfig tool.

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 doc/guides/platform/mlx5.rst   |  5 ++++
 drivers/net/mlx5/mlx5_devx.c   | 51 ++++++++++++++++++++++++++++------
 drivers/net/mlx5/mlx5_ethdev.c |  2 ++
 3 files changed, 49 insertions(+), 9 deletions(-)

diff --git a/doc/guides/platform/mlx5.rst b/doc/guides/platform/mlx5.rst
index 38c1fdce4c..88a2961bb4 100644
--- a/doc/guides/platform/mlx5.rst
+++ b/doc/guides/platform/mlx5.rst
@@ -548,6 +548,11 @@ Below are some firmware configurations listed.
 
    REAL_TIME_CLOCK_ENABLE=1
 
+- allow locking hairpin RQ data buffer in device memory::
+
+   HAIRPIN_DATA_BUFFER_LOCK=1
+   MEMIC_SIZE_LIMIT=0
+
 
 .. _mlx5_common_driver_options:
 
diff --git a/drivers/net/mlx5/mlx5_devx.c b/drivers/net/mlx5/mlx5_devx.c
index a81b1bae47..e65350bd7c 100644
--- a/drivers/net/mlx5/mlx5_devx.c
+++ b/drivers/net/mlx5/mlx5_devx.c
@@ -468,14 +468,16 @@ mlx5_rxq_obj_hairpin_new(struct mlx5_rxq_priv *rxq)
 {
 	uint16_t idx = rxq->idx;
 	struct mlx5_priv *priv = rxq->priv;
+	struct mlx5_hca_attr *hca_attr __rte_unused = &priv->sh->cdev->config.hca_attr;
 	struct mlx5_rxq_ctrl *rxq_ctrl = rxq->ctrl;
-	struct mlx5_devx_create_rq_attr attr = { 0 };
+	struct mlx5_devx_create_rq_attr unlocked_attr = { 0 };
+	struct mlx5_devx_create_rq_attr locked_attr = { 0 };
 	struct mlx5_rxq_obj *tmpl = rxq_ctrl->obj;
 	uint32_t max_wq_data;
 
 	MLX5_ASSERT(rxq != NULL && rxq->ctrl != NULL && tmpl != NULL);
 	tmpl->rxq_ctrl = rxq_ctrl;
-	attr.hairpin = 1;
+	unlocked_attr.hairpin = 1;
 	max_wq_data =
 		priv->sh->cdev->config.hca_attr.log_max_hairpin_wq_data_sz;
 	/* Jumbo frames > 9KB should be supported, and more packets. */
@@ -487,20 +489,50 @@ mlx5_rxq_obj_hairpin_new(struct mlx5_rxq_priv *rxq)
 			rte_errno = ERANGE;
 			return -rte_errno;
 		}
-		attr.wq_attr.log_hairpin_data_sz = priv->config.log_hp_size;
+		unlocked_attr.wq_attr.log_hairpin_data_sz = priv->config.log_hp_size;
 	} else {
-		attr.wq_attr.log_hairpin_data_sz =
+		unlocked_attr.wq_attr.log_hairpin_data_sz =
 				(max_wq_data < MLX5_HAIRPIN_JUMBO_LOG_SIZE) ?
 				 max_wq_data : MLX5_HAIRPIN_JUMBO_LOG_SIZE;
 	}
 	/* Set the packets number to the maximum value for performance. */
-	attr.wq_attr.log_hairpin_num_packets =
-			attr.wq_attr.log_hairpin_data_sz -
+	unlocked_attr.wq_attr.log_hairpin_num_packets =
+			unlocked_attr.wq_attr.log_hairpin_data_sz -
 			MLX5_HAIRPIN_QUEUE_STRIDE;
-	attr.counter_set_id = priv->counter_set_id;
+	unlocked_attr.counter_set_id = priv->counter_set_id;
 	rxq_ctrl->rxq.delay_drop = priv->config.hp_delay_drop;
-	attr.delay_drop_en = priv->config.hp_delay_drop;
-	tmpl->rq = mlx5_devx_cmd_create_rq(priv->sh->cdev->ctx, &attr,
+	unlocked_attr.delay_drop_en = priv->config.hp_delay_drop;
+	unlocked_attr.hairpin_data_buffer_type =
+			MLX5_RQC_HAIRPIN_DATA_BUFFER_TYPE_UNLOCKED_INTERNAL_BUFFER;
+	if (rxq->hairpin_conf.use_locked_device_memory) {
+		/*
+		 * It is assumed that configuration is verified against capabilities
+		 * during queue setup.
+		 */
+		MLX5_ASSERT(hca_attr->hairpin_data_buffer_locked);
+		rte_memcpy(&locked_attr, &unlocked_attr, sizeof(locked_attr));
+		locked_attr.hairpin_data_buffer_type =
+				MLX5_RQC_HAIRPIN_DATA_BUFFER_TYPE_LOCKED_INTERNAL_BUFFER;
+		tmpl->rq = mlx5_devx_cmd_create_rq(priv->sh->cdev->ctx, &locked_attr,
+						   rxq_ctrl->socket);
+		if (!tmpl->rq && rxq->hairpin_conf.force_memory) {
+			DRV_LOG(ERR, "Port %u Rx hairpin queue %u can't create RQ object"
+				     " with locked memory buffer",
+				     priv->dev_data->port_id, idx);
+			return -rte_errno;
+		} else if (!tmpl->rq && !rxq->hairpin_conf.force_memory) {
+			DRV_LOG(WARNING, "Port %u Rx hairpin queue %u can't create RQ object"
+					 " with locked memory buffer. Falling back to unlocked"
+					 " device memory.",
+					 priv->dev_data->port_id, idx);
+			rte_errno = 0;
+			goto create_rq_unlocked;
+		}
+		goto create_rq_set_state;
+	}
+
+create_rq_unlocked:
+	tmpl->rq = mlx5_devx_cmd_create_rq(priv->sh->cdev->ctx, &unlocked_attr,
 					   rxq_ctrl->socket);
 	if (!tmpl->rq) {
 		DRV_LOG(ERR,
@@ -509,6 +541,7 @@ mlx5_rxq_obj_hairpin_new(struct mlx5_rxq_priv *rxq)
 		rte_errno = errno;
 		return -rte_errno;
 	}
+create_rq_set_state:
 	priv->dev_data->rx_queue_state[idx] = RTE_ETH_QUEUE_STATE_HAIRPIN;
 	return 0;
 }
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 7f5b01ac74..7f400da103 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -740,6 +740,8 @@ mlx5_hairpin_cap_get(struct rte_eth_dev *dev, struct rte_eth_hairpin_cap *cap)
 	cap->max_tx_2_rx = 1;
 	cap->max_nb_desc = 8192;
 	hca_attr = &priv->sh->cdev->config.hca_attr;
+	cap->rx_cap.locked_device_memory = hca_attr->hairpin_data_buffer_locked;
+	cap->rx_cap.rte_memory = 0;
 	cap->tx_cap.locked_device_memory = 0;
 	cap->tx_cap.rte_memory = hca_attr->hairpin_sq_wq_in_host_mem;
 	return 0;
-- 
2.25.1



* [PATCH 6/7] app/testpmd: add hairpin queues memory modes
  2022-09-19 16:37 [PATCH 0/7] ethdev: introduce hairpin memory capabilities Dariusz Sosnowski
                   ` (4 preceding siblings ...)
  2022-09-19 16:37 ` [PATCH 5/7] net/mlx5: allow hairpin Rx queue in locked memory Dariusz Sosnowski
@ 2022-09-19 16:37 ` Dariusz Sosnowski
  2022-09-19 16:37 ` [PATCH 7/7] app/flow-perf: add hairpin queue memory config Dariusz Sosnowski
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 30+ messages in thread
From: Dariusz Sosnowski @ 2022-09-19 16:37 UTC (permalink / raw)
  To: Aman Singh, Yuying Zhang; +Cc: dev

This patch extends the hairpin-mode command line option of the testpmd
application with the ability to configure whether Rx/Tx hairpin queues
should use locked device memory or RTE memory.

For the purposes of this configuration, the following bits of the 32-bit
hairpin-mode value are reserved:

- Bit 8 - If set, then force_memory flag will be set for hairpin RX
  queue.
- Bit 9 - If set, then force_memory flag will be set for hairpin TX
  queue.
- Bits 12-15 - Memory options for hairpin Rx queue:
    - Bit 12 - If set, then use_locked_device_memory will be set.
    - Bit 13 - If set, then use_rte_memory will be set.
    - Bit 14 - Reserved for future use.
    - Bit 15 - Reserved for future use.
- Bits 16-19 - Memory options for hairpin Tx queue:
    - Bit 16 - If set, then use_locked_device_memory will be set.
    - Bit 17 - If set, then use_rte_memory will be set.
    - Bit 18 - Reserved for future use.
    - Bit 19 - Reserved for future use.
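
For example (illustrative value, not taken from this patch),
--hairpin-mode=0x21312 combines bit 1 (two hairpin ports paired), bit 4
(explicit Tx flow rule), bits 8 and 12 (hairpin RX queues forced to use
locked device memory) and bits 9 and 17 (hairpin TX queues forced to use
RTE memory).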

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 app/test-pmd/parameters.c             |  2 +-
 app/test-pmd/testpmd.c                | 24 +++++++++++++++++++++++-
 app/test-pmd/testpmd.h                |  2 +-
 doc/guides/testpmd_app_ug/run_app.rst | 10 ++++++++--
 4 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index e3c9757f3f..662e6e4a36 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -1162,7 +1162,7 @@ launch_args_parse(int argc, char** argv)
 				if (errno != 0 || end == optarg)
 					rte_exit(EXIT_FAILURE, "hairpin mode invalid\n");
 				else
-					hairpin_mode = (uint16_t)n;
+					hairpin_mode = (uint32_t)n;
 			}
 			if (!strcmp(lgopts[opt_idx].name, "burst")) {
 				n = atoi(optarg);
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index addcbcac85..2fbd546073 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -409,7 +409,7 @@ bool setup_on_probe_event = true;
 uint8_t clear_ptypes = true;
 
 /* Hairpin ports configuration mode. */
-uint16_t hairpin_mode;
+uint32_t hairpin_mode;
 
 /* Pretty printing of ethdev events */
 static const char * const eth_event_desc[] = {
@@ -2552,6 +2552,16 @@ port_is_started(portid_t port_id)
 	return 1;
 }
 
+#define HAIRPIN_MODE_RX_FORCE_MEMORY RTE_BIT32(8)
+#define HAIRPIN_MODE_TX_FORCE_MEMORY RTE_BIT32(9)
+
+#define HAIRPIN_MODE_RX_LOCKED_MEMORY RTE_BIT32(12)
+#define HAIRPIN_MODE_RX_RTE_MEMORY RTE_BIT32(13)
+
+#define HAIRPIN_MODE_TX_LOCKED_MEMORY RTE_BIT32(16)
+#define HAIRPIN_MODE_TX_RTE_MEMORY RTE_BIT32(17)
+
+
 /* Configure the Rx and Tx hairpin queues for the selected port. */
 static int
 setup_hairpin_queues(portid_t pi, portid_t p_pi, uint16_t cnt_pi)
@@ -2567,6 +2577,12 @@ setup_hairpin_queues(portid_t pi, portid_t p_pi, uint16_t cnt_pi)
 	uint16_t peer_tx_port = pi;
 	uint32_t manual = 1;
 	uint32_t tx_exp = hairpin_mode & 0x10;
+	uint32_t rx_force_memory = hairpin_mode & HAIRPIN_MODE_RX_FORCE_MEMORY;
+	uint32_t rx_locked_memory = hairpin_mode & HAIRPIN_MODE_RX_LOCKED_MEMORY;
+	uint32_t rx_rte_memory = hairpin_mode & HAIRPIN_MODE_RX_RTE_MEMORY;
+	uint32_t tx_force_memory = hairpin_mode & HAIRPIN_MODE_TX_FORCE_MEMORY;
+	uint32_t tx_locked_memory = hairpin_mode & HAIRPIN_MODE_TX_LOCKED_MEMORY;
+	uint32_t tx_rte_memory = hairpin_mode & HAIRPIN_MODE_TX_RTE_MEMORY;
 
 	if (!(hairpin_mode & 0xf)) {
 		peer_rx_port = pi;
@@ -2606,6 +2622,9 @@ setup_hairpin_queues(portid_t pi, portid_t p_pi, uint16_t cnt_pi)
 		hairpin_conf.peers[0].queue = i + nb_rxq;
 		hairpin_conf.manual_bind = !!manual;
 		hairpin_conf.tx_explicit = !!tx_exp;
+		hairpin_conf.force_memory = !!tx_force_memory;
+		hairpin_conf.use_locked_device_memory = !!tx_locked_memory;
+		hairpin_conf.use_rte_memory = !!tx_rte_memory;
 		diag = rte_eth_tx_hairpin_queue_setup
 			(pi, qi, nb_txd, &hairpin_conf);
 		i++;
@@ -2629,6 +2648,9 @@ setup_hairpin_queues(portid_t pi, portid_t p_pi, uint16_t cnt_pi)
 		hairpin_conf.peers[0].queue = i + nb_txq;
 		hairpin_conf.manual_bind = !!manual;
 		hairpin_conf.tx_explicit = !!tx_exp;
+		hairpin_conf.force_memory = !!rx_force_memory;
+		hairpin_conf.use_locked_device_memory = !!rx_locked_memory;
+		hairpin_conf.use_rte_memory = !!rx_rte_memory;
 		diag = rte_eth_rx_hairpin_queue_setup
 			(pi, qi, nb_rxd, &hairpin_conf);
 		i++;
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index fb2f5195d3..bc4d9788fa 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -542,7 +542,7 @@ extern uint16_t stats_period;
 extern struct rte_eth_xstat_name *xstats_display;
 extern unsigned int xstats_display_num;
 
-extern uint16_t hairpin_mode;
+extern uint32_t hairpin_mode;
 
 #ifdef RTE_LIB_LATENCYSTATS
 extern uint8_t latencystats_enabled;
diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst
index 30edef07ea..c91c231094 100644
--- a/doc/guides/testpmd_app_ug/run_app.rst
+++ b/doc/guides/testpmd_app_ug/run_app.rst
@@ -556,10 +556,16 @@ The command line options are:
 
     Enable display of RX and TX burst stats.
 
-*   ``--hairpin-mode=0xXX``
+*   ``--hairpin-mode=0xXXXX``
 
-    Set the hairpin port mode with bitmask, only valid when hairpin queues number is set::
+    Set the hairpin port configuration with bitmask, only valid when hairpin queues number is set::
 
+	bit 17 - hairpin TX queues will use RTE memory
+	bit 16 - hairpin TX queues will use locked device memory
+	bit 13 - hairpin RX queues will use RTE memory
+	bit 12 - hairpin RX queues will use locked device memory
+	bit 9 - force memory settings of hairpin TX queue
+	bit 8 - force memory settings of hairpin RX queue
 	bit 4 - explicit Tx flow rule
 	bit 1 - two hairpin ports paired
 	bit 0 - two hairpin ports loop
-- 
2.25.1



* [PATCH 7/7] app/flow-perf: add hairpin queue memory config
  2022-09-19 16:37 [PATCH 0/7] ethdev: introduce hairpin memory capabilities Dariusz Sosnowski
                   ` (5 preceding siblings ...)
  2022-09-19 16:37 ` [PATCH 6/7] app/testpmd: add hairpin queues memory modes Dariusz Sosnowski
@ 2022-09-19 16:37 ` Dariusz Sosnowski
  2022-10-04 12:24   ` Wisam Monther
  2022-10-04 16:44 ` [PATCH 0/7] ethdev: introduce hairpin memory capabilities Thomas Monjalon
  2022-10-06 11:00 ` [PATCH v2 0/8] " Dariusz Sosnowski
  8 siblings, 1 reply; 30+ messages in thread
From: Dariusz Sosnowski @ 2022-09-19 16:37 UTC (permalink / raw)
  To: Wisam Jaddo; +Cc: dev

This patch adds the hairpin-conf command line parameter to the flow-perf
application. The hairpin-conf parameter takes a hexadecimal bitmask whose
bits have the following meaning:

- Bit 0 - Force memory settings of hairpin RX queue.
- Bit 1 - Force memory settings of hairpin TX queue.
- Bit 4 - Use locked device memory for hairpin RX queue.
- Bit 5 - Use RTE memory for hairpin RX queue.
- Bit 8 - Use locked device memory for hairpin TX queue.
- Bit 9 - Use RTE memory for hairpin TX queue.
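
For example (illustrative value, not taken from this patch),
--hairpin-conf=0x223 sets bits 0 and 1 (force memory settings of both
hairpin queue types), bit 5 (RTE memory for hairpin RX queues) and bit 9
(RTE memory for hairpin TX queues).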

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 app/test-flow-perf/main.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/app/test-flow-perf/main.c b/app/test-flow-perf/main.c
index f375097028..4a9206803a 100644
--- a/app/test-flow-perf/main.c
+++ b/app/test-flow-perf/main.c
@@ -46,6 +46,15 @@
 #define DEFAULT_RULES_BATCH     100000
 #define DEFAULT_GROUP                0
 
+#define HAIRPIN_RX_CONF_FORCE_MEMORY  (0x0001)
+#define HAIRPIN_TX_CONF_FORCE_MEMORY  (0x0002)
+
+#define HAIRPIN_RX_CONF_LOCKED_MEMORY (0x0010)
+#define HAIRPIN_RX_CONF_RTE_MEMORY    (0x0020)
+
+#define HAIRPIN_TX_CONF_LOCKED_MEMORY (0x0100)
+#define HAIRPIN_TX_CONF_RTE_MEMORY    (0x0200)
+
 struct rte_flow *flow;
 static uint8_t flow_group;
 
@@ -61,6 +70,7 @@ static uint32_t policy_id[MAX_PORTS];
 static uint8_t items_idx, actions_idx, attrs_idx;
 
 static uint64_t ports_mask;
+static uint64_t hairpin_conf_mask;
 static uint16_t dst_ports[RTE_MAX_ETHPORTS];
 static volatile bool force_quit;
 static bool dump_iterations;
@@ -482,6 +492,7 @@ usage(char *progname)
 	printf("  --enable-fwd: To enable packets forwarding"
 		" after insertion\n");
 	printf("  --portmask=N: hexadecimal bitmask of ports used\n");
+	printf("  --hairpin-conf=0xXXXX: hexadecimal bitmask of hairpin queue configuration\n");
 	printf("  --random-priority=N,S: use random priority levels "
 		"from 0 to (N - 1) for flows "
 		"and S as seed for pseudo-random number generator\n");
@@ -629,6 +640,7 @@ static void
 args_parse(int argc, char **argv)
 {
 	uint64_t pm, seed;
+	uint64_t hp_conf;
 	char **argvopt;
 	uint32_t prio;
 	char *token;
@@ -648,6 +660,7 @@ args_parse(int argc, char **argv)
 		{ "enable-fwd",                 0, 0, 0 },
 		{ "unique-data",                0, 0, 0 },
 		{ "portmask",                   1, 0, 0 },
+		{ "hairpin-conf",               1, 0, 0 },
 		{ "cores",                      1, 0, 0 },
 		{ "random-priority",            1, 0, 0 },
 		{ "meter-profile-alg",          1, 0, 0 },
@@ -880,6 +893,13 @@ args_parse(int argc, char **argv)
 					rte_exit(EXIT_FAILURE, "Invalid fwd port mask\n");
 				ports_mask = pm;
 			}
+			if (strcmp(lgopts[opt_idx].name, "hairpin-conf") == 0) {
+				end = NULL;
+				hp_conf = strtoull(optarg, &end, 16);
+				if ((optarg[0] == '\0') || (end == NULL) || (*end != '\0'))
+					rte_exit(EXIT_FAILURE, "Invalid hairpin config mask\n");
+				hairpin_conf_mask = hp_conf;
+			}
 			if (strcmp(lgopts[opt_idx].name,
 					"port-id") == 0) {
 				uint16_t port_idx = 0;
@@ -2035,6 +2055,12 @@ init_port(void)
 				hairpin_conf.peers[0].port = port_id;
 				hairpin_conf.peers[0].queue =
 					std_queue + tx_queues_count;
+				hairpin_conf.use_locked_device_memory =
+					!!(hairpin_conf_mask & HAIRPIN_RX_CONF_LOCKED_MEMORY);
+				hairpin_conf.use_rte_memory =
+					!!(hairpin_conf_mask & HAIRPIN_RX_CONF_RTE_MEMORY);
+				hairpin_conf.force_memory =
+					!!(hairpin_conf_mask & HAIRPIN_RX_CONF_FORCE_MEMORY);
 				ret = rte_eth_rx_hairpin_queue_setup(
 						port_id, hairpin_queue,
 						rxd_count, &hairpin_conf);
@@ -2050,6 +2076,12 @@ init_port(void)
 				hairpin_conf.peers[0].port = port_id;
 				hairpin_conf.peers[0].queue =
 					std_queue + rx_queues_count;
+				hairpin_conf.use_locked_device_memory =
+					!!(hairpin_conf_mask & HAIRPIN_TX_CONF_LOCKED_MEMORY);
+				hairpin_conf.use_rte_memory =
+					!!(hairpin_conf_mask & HAIRPIN_TX_CONF_RTE_MEMORY);
+				hairpin_conf.force_memory =
+					!!(hairpin_conf_mask & HAIRPIN_TX_CONF_FORCE_MEMORY);
 				ret = rte_eth_tx_hairpin_queue_setup(
 						port_id, hairpin_queue,
 						txd_count, &hairpin_conf);
-- 
2.25.1



* RE: [PATCH 2/7] common/mlx5: add hairpin SQ buffer type capabilities
  2022-09-19 16:37 ` [PATCH 2/7] common/mlx5: add hairpin SQ buffer type capabilities Dariusz Sosnowski
@ 2022-09-27 13:03   ` Slava Ovsiienko
  0 siblings, 0 replies; 30+ messages in thread
From: Slava Ovsiienko @ 2022-09-27 13:03 UTC (permalink / raw)
  To: Dariusz Sosnowski, Matan Azrad; +Cc: dev

> -----Original Message-----
> From: Dariusz Sosnowski <dsosnowski@nvidia.com>
> Sent: Monday, September 19, 2022 19:37
> To: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org
> Subject: [PATCH 2/7] common/mlx5: add hairpin SQ buffer type capabilities
> 
> This patch extends HCA_CAP and SQ Context structs available in PRM. This
> fields allow checking if NIC supports storing hairpin SQ's WQ buffer in host
> memory and configuring such memory placement.
> 
> HCA capabilities are extended with the following fields:
> 
> - hairpin_sq_wq_in_host_mem - If set, then NIC supports using host memory as
> a backing storage for hairpin SQ's WQ buffer.
> - hairpin_sq_wqe_bb_size - Indicates the required size of SQ WQE basic
> block.
> 
> SQ Context is extended with hairpin_wq_buffer_type which informs NIC where
> SQ's WQ buffer will be stored. This field can take the following values:
> 
> - MLX5_SQC_HAIRPIN_WQ_BUFFER_TYPE_INTERNAL_BUFFER - WQ buffer will be
>   stored in unlocked device memory.
> - MLX5_SQC_HAIRPIN_WQ_BUFFER_TYPE_HOST_MEMORY - WQ buffer will be stored
>   in host memory. Buffer is provided by PMD.
> 
> Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>



* RE: [PATCH 3/7] common/mlx5: add hairpin RQ buffer type capabilities
  2022-09-19 16:37 ` [PATCH 3/7] common/mlx5: add hairpin RQ " Dariusz Sosnowski
@ 2022-09-27 13:04   ` Slava Ovsiienko
  0 siblings, 0 replies; 30+ messages in thread
From: Slava Ovsiienko @ 2022-09-27 13:04 UTC (permalink / raw)
  To: Dariusz Sosnowski, Matan Azrad; +Cc: dev

> -----Original Message-----
> From: Dariusz Sosnowski <dsosnowski@nvidia.com>
> Sent: Monday, September 19, 2022 19:37
> To: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org
> Subject: [PATCH 3/7] common/mlx5: add hairpin RQ buffer type capabilities
> 
> This patch adds new HCA capability related to hairpin RQs. This new
> capability, hairpin_data_buffer_locked, indicates whether HCA supports
> locking data buffer of hairpin RQ in ICMC (Interconnect Context Memory
> Cache).
> 
> Struct used to define RQ configuration (RQ context) is extended with
> hairpin_data_buffer_type field, which configures data buffer for hairpin RQ.
> It can take the following values:
> 
> - MLX5_RQC_HAIRPIN_DATA_BUFFER_TYPE_UNLOCKED_INTERNAL_BUFFER - hairpin
>   RQ's data buffer is stored in unlocked memory in ICMC.
> - MLX5_RQC_HAIRPIN_DATA_BUFFER_TYPE_LOCKED_INTERNAL_BUFFER - hairpin
>   RQ's data buffer is stored in locked memory in ICMC.
> 
> Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>


* RE: [PATCH 5/7] net/mlx5: allow hairpin Rx queue in locked memory
  2022-09-19 16:37 ` [PATCH 5/7] net/mlx5: allow hairpin Rx queue in locked memory Dariusz Sosnowski
@ 2022-09-27 13:04   ` Slava Ovsiienko
  2022-11-25 14:06   ` Kenneth Klette Jonassen
  1 sibling, 0 replies; 30+ messages in thread
From: Slava Ovsiienko @ 2022-09-27 13:04 UTC (permalink / raw)
  To: Dariusz Sosnowski, Matan Azrad; +Cc: dev

> -----Original Message-----
> From: Dariusz Sosnowski <dsosnowski@nvidia.com>
> Sent: Monday, September 19, 2022 19:37
> To: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org
> Subject: [PATCH 5/7] net/mlx5: allow hairpin Rx queue in locked memory
> 
> This patch adds a capability to place hairpin Rx queue in locked device
> memory. This capability is equivalent to storing hairpin RQ's data buffers
> in locked internal device memory.
> 
> Hairpin Rx queue creation is extended with requesting that RQ is allocated
> in locked internal device memory. If allocation fails and force_memory
> hairpin configuration is set, then hairpin queue creation (and, as a result,
> device start) fails. If force_memory is unset, then PMD will fallback to
> allocating memory for hairpin RQ in unlocked internal device memory.
> 
> To allow such allocation, the user must set HAIRPIN_DATA_BUFFER_LOCK flag in
> FW using mlxconfig tool.
> 
> Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>



* RE: [PATCH 4/7] net/mlx5: allow hairpin Tx queue in RTE memory
  2022-09-19 16:37 ` [PATCH 4/7] net/mlx5: allow hairpin Tx queue in RTE memory Dariusz Sosnowski
@ 2022-09-27 13:05   ` Slava Ovsiienko
  0 siblings, 0 replies; 30+ messages in thread
From: Slava Ovsiienko @ 2022-09-27 13:05 UTC (permalink / raw)
  To: Dariusz Sosnowski, Matan Azrad; +Cc: dev

> -----Original Message-----
> From: Dariusz Sosnowski <dsosnowski@nvidia.com>
> Sent: Monday, September 19, 2022 19:37
> To: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org
> Subject: [PATCH 4/7] net/mlx5: allow hairpin Tx queue in RTE memory
> 
> This patch adds a capability to place hairpin Tx queue in host memory
> managed by DPDK. This capability is equivalent to storing hairpin SQ's WQ
> buffer in host memory.
> 
> Hairpin Tx queue creation is extended with allocating a memory buffer of
> proper size (calculated from required number of packets and WQE BB size
> advertised in HCA capabilities).
> 
> force_memory flag of hairpin queue configuration is also supported.
> If it is set and:
> 
> - allocation of memory buffer fails,
> - or hairpin SQ creation fails,
> 
> then device start will fail. If it is unset, PMD will fallback to creating
> the hairpin SQ with WQ buffer located in unlocked device memory.
> 
> Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>


* RE: [PATCH 7/7] app/flow-perf: add hairpin queue memory config
  2022-09-19 16:37 ` [PATCH 7/7] app/flow-perf: add hairpin queue memory config Dariusz Sosnowski
@ 2022-10-04 12:24   ` Wisam Monther
  2022-10-06 11:06     ` Dariusz Sosnowski
  0 siblings, 1 reply; 30+ messages in thread
From: Wisam Monther @ 2022-10-04 12:24 UTC (permalink / raw)
  To: Dariusz Sosnowski; +Cc: dev

Hi Dariusz,

> -----Original Message-----
> From: Dariusz Sosnowski <dsosnowski@nvidia.com>
> Sent: Monday, September 19, 2022 7:38 PM
> To: Wisam Monther <wisamm@nvidia.com>
> Cc: dev@dpdk.org
> Subject: [PATCH 7/7] app/flow-perf: add hairpin queue memory config
> 
> This patch adds the hairpin-conf command line parameter to flow-perf
> application. hairpin-conf parameter takes a hexadecimal bitmask with bits
> having the following meaning:
> 
> - Bit 0 - Force memory settings of hairpin RX queue.
> - Bit 1 - Force memory settings of hairpin TX queue.
> - Bit 4 - Use locked device memory for hairpin RX queue.
> - Bit 5 - Use RTE memory for hairpin RX queue.
> - Bit 8 - Use locked device memory for hairpin TX queue.
> - Bit 9 - Use RTE memory for hairpin TX queue.
> 
> Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
> ---

You have some check issues; can you please kindly look into them?

BRs,
Wisam Jaddo



* Re: [PATCH 0/7] ethdev: introduce hairpin memory capabilities
  2022-09-19 16:37 [PATCH 0/7] ethdev: introduce hairpin memory capabilities Dariusz Sosnowski
                   ` (6 preceding siblings ...)
  2022-09-19 16:37 ` [PATCH 7/7] app/flow-perf: add hairpin queue memory config Dariusz Sosnowski
@ 2022-10-04 16:44 ` Thomas Monjalon
  2022-10-06 11:08   ` Dariusz Sosnowski
  2022-10-06 11:00 ` [PATCH v2 0/8] " Dariusz Sosnowski
  8 siblings, 1 reply; 30+ messages in thread
From: Thomas Monjalon @ 2022-10-04 16:44 UTC (permalink / raw)
  To: Dariusz Sosnowski
  Cc: Ferruh Yigit, Andrew Rybchenko, dev, Viacheslav Ovsiienko,
	Matan Azrad, Ori Kam, Wisam Jaddo, Aman Singh, Yuying Zhang

19/09/2022 18:37, Dariusz Sosnowski:
> This patch series introduces hairpin memory configuration options proposed in
> http://patches.dpdk.org/project/dpdk/patch/20220811120530.191683-1-dsosnowski@nvidia.com/
> for Rx and Tx hairpin queues. It also implements handling of these options in mlx5 PMD
> and allows to use new hairpin options in testpmd (through `--hairpin-mode` option) and
> flow-perf (through `--hairpin-conf` option).

2 things are missing in this series:

1/ motivation (why is this needed)
2/ compilation on Windows
	looks like devx_umem_reg has 5 parameters in Windows glue!





* Re: [PATCH 1/7] ethdev: introduce hairpin memory capabilities
  2022-09-19 16:37 ` [PATCH 1/7] " Dariusz Sosnowski
@ 2022-10-04 16:50   ` Thomas Monjalon
  2022-10-06 11:21     ` Dariusz Sosnowski
  0 siblings, 1 reply; 30+ messages in thread
From: Thomas Monjalon @ 2022-10-04 16:50 UTC (permalink / raw)
  To: Dariusz Sosnowski; +Cc: Ferruh Yigit, Andrew Rybchenko, dev

19/09/2022 18:37, Dariusz Sosnowski:
> This patch introduces new hairpin queue configuration options through
> rte_eth_hairpin_conf struct, allowing to tune Rx and Tx hairpin queues
> memory configuration. Hairpin configuration is extended with the
> following fields:

What is the benefit?
How does the user know what to use?
Isn't it too low-level for a user?
Why is it not automatic in the driver?

[...]
> +	/**
> +	 * Use locked device memory as a backing storage.
> +	 *
> +	 * - When set, PMD will attempt to use on-device memory as a backing storage for descriptors
> +	 *   and/or data in hairpin queue.
> +	 * - When set, PMD will use detault memory type as a backing storage. Please refer to PMD

You probably mean "clear".
Please make lines shorter.
You should split lines logically, after a dot or at the end of a part.

> +	 *   documentation for details.
> +	 *
> +	 * API user should check if PMD supports this configuration flag using
> +	 * @see rte_eth_dev_hairpin_capability_get.
> +	 */
> +	uint32_t use_locked_device_memory:1;
> +
> +	/**
> +	 * Use DPDK memory as backing storage.
> +	 *
> +	 * - When set, PMD will attempt to use memory managed by DPDK as a backing storage
> +	 *   for descriptors and/or data in hairpin queue.
> +	 * - When clear, PMD will use default memory type as a backing storage. Please refer
> +	 *   to PMD documentation for details.
> +	 *
> +	 * API user should check if PMD supports this configuration flag using
> +	 * @see rte_eth_dev_hairpin_capability_get.
> +	 */
> +	uint32_t use_rte_memory:1;
> +
> +	/**
> +	 * Force usage of hairpin memory configuration.
> +	 *
> +	 * - When set, PMD will attempt to use specified memory settings and
> +	 *   if resource allocation fails, then hairpin queue setup will result in an
> +	 *   error.
> +	 * - When clear, PMD will attempt to use specified memory settings and
> +	 *   if resource allocation fails, then PMD will retry allocation with default
> +	 *   configuration.
> +	 */
> +	uint32_t force_memory:1;
> +
> +	uint32_t reserved:11; /**< Reserved bits. */

You can insert a blank line here.

>  	struct rte_eth_hairpin_peer peers[RTE_ETH_MAX_HAIRPIN_PEERS];
>  };



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v2 0/8] ethdev: introduce hairpin memory capabilities
  2022-09-19 16:37 [PATCH 0/7] ethdev: introduce hairpin memory capabilities Dariusz Sosnowski
                   ` (7 preceding siblings ...)
  2022-10-04 16:44 ` [PATCH 0/7] ethdev: introduce hairpin memory capabilities Thomas Monjalon
@ 2022-10-06 11:00 ` Dariusz Sosnowski
  2022-10-06 11:00   ` [PATCH v2 1/8] " Dariusz Sosnowski
                     ` (8 more replies)
  8 siblings, 9 replies; 30+ messages in thread
From: Dariusz Sosnowski @ 2022-10-06 11:00 UTC (permalink / raw)
  To: Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
  Cc: dev, Viacheslav Ovsiienko, Matan Azrad, Ori Kam, Wisam Jaddo,
	Aman Singh, Yuying Zhang

The hairpin queues are used to transmit packets received on the wire, back to the wire.
How hairpin queues are implemented and configured is decided internally by the PMD and
applications have no control over the configuration of Rx and Tx hairpin queues.
This patchset addresses that by:

- Extending hairpin queue capabilities reported by PMDs.
- Exposing new configuration options for Rx and Tx hairpin queues.

Main goal of this patchset is to allow applications to provide configuration hints
regarding memory placement of hairpin queues.
These hints specify whether buffers of hairpin queues should be placed in host memory
or in dedicated device memory.

For example, in context of NVIDIA Connect-X and BlueField devices,
this distinction is important for several reasons:

- By default, data buffers and packet descriptors are placed in device memory region
  which is shared with other resources (e.g. flow rules).
  This results in memory contention on the device,
  which may lead to degraded performance under heavy load.
- Placing hairpin queues in dedicated device memory can decrease latency of hairpinned traffic,
  since hairpin queue processing will not be memory starved by other operations.
  Side effect of this memory configuration is that it leaves less memory for other resources,
  possibly causing memory contention in non-hairpin traffic.
- Placing hairpin queues in host memory can increase throughput of hairpinned
  traffic at the cost of increasing latency.
  Each packet processed by hairpin queues will incur additional PCI transactions (increase in latency),
  but memory contention on the device is avoided.

Depending on the workload and whether throughput or latency has a higher priority for developers,
it would be beneficial if developers could choose the best hairpin configuration for their use case.

To address that, this patchset adds the following configuration options (in rte_eth_hairpin_conf struct):

- use_locked_device_memory - If set, PMD will allocate specialized on-device memory for the queue.
- use_rte_memory - If set, PMD will use DPDK-managed memory for the queue.
- force_memory - If set, PMD will be forced to use provided memory configuration.
  If no appropriate resources are available, the queue allocation will fail.
  If unset and no appropriate resources are available, PMD will fallback to its default behavior.

Implementing support for these flags is optional and applications should be allowed to not set any of these new flags.
This will result in default memory configuration provided by the PMD.
Application developers should consult the PMD documentation in that case.
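
For illustration, a minimal sketch of the intended usage from an application's point of view
(port, peer queue and descriptor variables below are placeholders; error handling and the
usual port/queue configuration steps are omitted):

  struct rte_eth_hairpin_cap cap;
  struct rte_eth_hairpin_conf conf = { .peer_count = 1 };
  int ret;

  conf.peers[0].port = port_id;        /* placeholder: peer Tx port */
  conf.peers[0].queue = peer_tx_queue; /* placeholder: peer Tx queue */
  if (rte_eth_dev_hairpin_capability_get(port_id, &cap) == 0 &&
      cap.rx_cap.locked_device_memory) {
          conf.use_locked_device_memory = 1; /* request dedicated device memory */
          conf.force_memory = 0;             /* allow PMD fallback to its default */
  }
  ret = rte_eth_rx_hairpin_queue_setup(port_id, rx_queue_id, nb_desc, &conf);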

These changes were originally proposed in http://patches.dpdk.org/project/dpdk/patch/20220811120530.191683-1-dsosnowski@nvidia.com/.

Dariusz Sosnowski (8):
  ethdev: introduce hairpin memory capabilities
  common/mlx5: add hairpin SQ buffer type capabilities
  common/mlx5: add hairpin RQ buffer type capabilities
  net/mlx5: allow hairpin Tx queue in RTE memory
  net/mlx5: allow hairpin Rx queue in locked memory
  doc: add notes for hairpin to mlx5 documentation
  app/testpmd: add hairpin queues memory modes
  app/flow-perf: add hairpin queue memory config

 app/test-flow-perf/main.c              |  32 +++++
 app/test-pmd/parameters.c              |   2 +-
 app/test-pmd/testpmd.c                 |  24 +++-
 app/test-pmd/testpmd.h                 |   2 +-
 doc/guides/nics/mlx5.rst               |  37 ++++++
 doc/guides/platform/mlx5.rst           |   5 +
 doc/guides/rel_notes/release_22_11.rst |  10 ++
 doc/guides/testpmd_app_ug/run_app.rst  |  10 +-
 drivers/common/mlx5/mlx5_devx_cmds.c   |   8 ++
 drivers/common/mlx5/mlx5_devx_cmds.h   |   5 +
 drivers/common/mlx5/mlx5_prm.h         |  25 +++-
 drivers/net/mlx5/mlx5.h                |   2 +
 drivers/net/mlx5/mlx5_devx.c           | 170 ++++++++++++++++++++++---
 drivers/net/mlx5/mlx5_ethdev.c         |   6 +
 lib/ethdev/rte_ethdev.c                |  44 +++++++
 lib/ethdev/rte_ethdev.h                |  68 +++++++++-
 16 files changed, 422 insertions(+), 28 deletions(-)

-- 
v2:
* Fix Windows build by using mlx5_os_umem_dereg defined on both platforms to allocate memory for Tx hairpin queue.
* Added hairpin section to mlx5 PMD documentation.
* Added info about new hairpin configuration options to DPDK release notes.

2.25.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v2 1/8] ethdev: introduce hairpin memory capabilities
  2022-10-06 11:00 ` [PATCH v2 0/8] " Dariusz Sosnowski
@ 2022-10-06 11:00   ` Dariusz Sosnowski
  2022-10-06 11:00   ` [PATCH v2 2/8] common/mlx5: add hairpin SQ buffer type capabilities Dariusz Sosnowski
                     ` (7 subsequent siblings)
  8 siblings, 0 replies; 30+ messages in thread
From: Dariusz Sosnowski @ 2022-10-06 11:00 UTC (permalink / raw)
  To: Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko; +Cc: dev

Before this patch, implementation details and configuration of hairpin
queues were decided internally by the PMD. Applications had no control
over the configuration of Rx and Tx hairpin queues, apart from the number
of descriptors, explicit Tx flow mode and disabling automatic binding.
This patch addresses that by adding:

- Hairpin queue capabilities reported by PMDs.
- New configuration options for Rx and Tx hairpin queues.

Main goal of this patch is to allow applications to provide
configuration hints regarding placement of hairpin queues.
These hints specify whether buffers of hairpin queues should be placed
in host memory or in dedicated device memory. Different memory options
may have different performance characteristics and hairpin configuration
should be fine-tuned to the specific application and use case.

This patch introduces new hairpin queue configuration options through
rte_eth_hairpin_conf struct, allowing to tune Rx and Tx hairpin queues
memory configuration. Hairpin configuration is extended with the
following fields:

- use_locked_device_memory - If set, PMD will use specialized on-device
  memory to store RX or TX hairpin queue data.
- use_rte_memory - If set, PMD will use DPDK-managed memory to store RX
  or TX hairpin queue data.
- force_memory - If set, PMD will be forced to use provided memory
  settings. If no appropriate resources are available, then device start
  will fail. If unset and no resources are available, PMD will fallback
  to using default type of resource for given queue.

If application chooses to use PMD default memory configuration, all of
these flags should remain unset.

Hairpin capabilities are also extended, to allow verification of support
of given hairpin memory configurations. Struct rte_eth_hairpin_cap is
extended with two additional fields of type rte_eth_hairpin_queue_cap:

- rx_cap - memory capabilities of hairpin RX queues.
- tx_cap - memory capabilities of hairpin TX queues.

Struct rte_eth_hairpin_queue_cap exposes whether given queue type
supports use_locked_device_memory and use_rte_memory flags.
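
For illustration, the new capability fields can be probed roughly as follows
(assuming a valid port_id; rte_ethdev.h and stdio.h includes omitted):

  struct rte_eth_hairpin_cap cap;

  if (rte_eth_dev_hairpin_capability_get(port_id, &cap) == 0) {
          /* Non-zero fields mean the matching rte_eth_hairpin_conf flag is accepted. */
          printf("Rx hairpin: locked_device_memory=%d rte_memory=%d\n",
                 cap.rx_cap.locked_device_memory, cap.rx_cap.rte_memory);
          printf("Tx hairpin: locked_device_memory=%d rte_memory=%d\n",
                 cap.tx_cap.locked_device_memory, cap.tx_cap.rte_memory);
  }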

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 doc/guides/rel_notes/release_22_11.rst | 10 ++++
 lib/ethdev/rte_ethdev.c                | 44 +++++++++++++++++
 lib/ethdev/rte_ethdev.h                | 68 +++++++++++++++++++++++++-
 3 files changed, 120 insertions(+), 2 deletions(-)

diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index ac67e7e710..e5c48c6b18 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -66,6 +66,16 @@ New Features
   Added new function ``rte_flow_async_action_handle_query()``,
   to query the action asynchronously.
 
+* **Added hairpin memory configurations options in ethdev API.**
+
+  Added new configuration flags for hairpin queues in ``rte_eth_hairpin_conf``:
+
+  * ``use_locked_device_memory``
+  * ``use_rte_memory``
+  * ``force_memory``
+
+  Each flag has a corresponding capability flag in ``rte_eth_hairpin_queue_cap`` struct.
+
 * **Updated Intel iavf driver.**
 
   * Added flow subscription support.
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 2821770e2d..bece83eb91 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1961,6 +1961,28 @@ rte_eth_rx_hairpin_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 			conf->peer_count, cap.max_rx_2_tx);
 		return -EINVAL;
 	}
+	if (conf->use_locked_device_memory && !cap.rx_cap.locked_device_memory) {
+		RTE_ETHDEV_LOG(ERR,
+			"Attempt to use locked device memory for Rx queue, which is not supported");
+		return -EINVAL;
+	}
+	if (conf->use_rte_memory && !cap.rx_cap.rte_memory) {
+		RTE_ETHDEV_LOG(ERR,
+			"Attempt to use DPDK memory for Rx queue, which is not supported");
+		return -EINVAL;
+	}
+	if (conf->use_locked_device_memory && conf->use_rte_memory) {
+		RTE_ETHDEV_LOG(ERR,
+			"Attempt to use mutually exclusive memory settings for Rx queue");
+		return -EINVAL;
+	}
+	if (conf->force_memory &&
+	    !conf->use_locked_device_memory &&
+	    !conf->use_rte_memory) {
+		RTE_ETHDEV_LOG(ERR,
+			"Attempt to force Rx queue memory settings, but none is set");
+		return -EINVAL;
+	}
 	if (conf->peer_count == 0) {
 		RTE_ETHDEV_LOG(ERR,
 			"Invalid value for number of peers for Rx queue(=%u), should be: > 0",
@@ -2128,6 +2150,28 @@ rte_eth_tx_hairpin_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
 			conf->peer_count, cap.max_tx_2_rx);
 		return -EINVAL;
 	}
+	if (conf->use_locked_device_memory && !cap.tx_cap.locked_device_memory) {
+		RTE_ETHDEV_LOG(ERR,
+			"Attempt to use locked device memory for Tx queue, which is not supported");
+		return -EINVAL;
+	}
+	if (conf->use_rte_memory && !cap.tx_cap.rte_memory) {
+		RTE_ETHDEV_LOG(ERR,
+			"Attempt to use DPDK memory for Tx queue, which is not supported");
+		return -EINVAL;
+	}
+	if (conf->use_locked_device_memory && conf->use_rte_memory) {
+		RTE_ETHDEV_LOG(ERR,
+			"Attempt to use mutually exclusive memory settings for Tx queue");
+		return -EINVAL;
+	}
+	if (conf->force_memory &&
+	    !conf->use_locked_device_memory &&
+	    !conf->use_rte_memory) {
+		RTE_ETHDEV_LOG(ERR,
+			"Attempt to force Tx queue memory settings, but none is set");
+		return -EINVAL;
+	}
 	if (conf->peer_count == 0) {
 		RTE_ETHDEV_LOG(ERR,
 			"Invalid value for number of peers for Tx queue(=%u), should be: > 0",
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index a21f58b9cd..eab931d3b2 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1092,6 +1092,28 @@ struct rte_eth_txconf {
 	void *reserved_ptrs[2];   /**< Reserved for future fields */
 };
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * A structure used to return the Tx or Rx hairpin queue capabilities that are supported.
+ */
+struct rte_eth_hairpin_queue_cap {
+	/**
+	 * When set, PMD supports placing descriptors and/or data buffers
+	 * in dedicated device memory.
+	 */
+	uint32_t locked_device_memory:1;
+
+	/**
+	 * When set, PMD supports placing descriptors and/or data buffers
+	 * in host memory managed by DPDK.
+	 */
+	uint32_t rte_memory:1;
+
+	uint32_t reserved:30; /**< Reserved for future fields */
+};
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
@@ -1106,6 +1128,8 @@ struct rte_eth_hairpin_cap {
 	/** Max number of Tx queues to be connected to one Rx queue. */
 	uint16_t max_tx_2_rx;
 	uint16_t max_nb_desc; /**< The max num of descriptors. */
+	struct rte_eth_hairpin_queue_cap rx_cap; /**< Rx hairpin queue capabilities. */
+	struct rte_eth_hairpin_queue_cap tx_cap; /**< Tx hairpin queue capabilities. */
 };
 
 #define RTE_ETH_MAX_HAIRPIN_PEERS 32
@@ -1149,11 +1173,51 @@ struct rte_eth_hairpin_conf {
 	 *   function after all the queues are set up properly and the ports are
 	 *   started. Also, the hairpin unbind function should be called
 	 *   accordingly before stopping a port that with hairpin configured.
-	 * - When clear, the PMD will try to enable the hairpin with the queues
+	 * - When cleared, the PMD will try to enable the hairpin with the queues
 	 *   configured automatically during port start.
 	 */
 	uint32_t manual_bind:1;
-	uint32_t reserved:14; /**< Reserved bits. */
+
+	/**
+	 * Use locked device memory as a backing storage.
+	 *
+	 * - When set, PMD will attempt place descriptors and/or data buffers
+	 *   in dedicated device memory.
+	 * - When cleared, PMD will use default memory type as a backing storage.
+	 *   Please refer to PMD documentation for details.
+	 *
+	 * API user should check if PMD supports this configuration flag using
+	 * @see rte_eth_dev_hairpin_capability_get.
+	 */
+	uint32_t use_locked_device_memory:1;
+
+	/**
+	 * Use DPDK memory as backing storage.
+	 *
+	 * - When set, PMD will attempt place descriptors and/or data buffers
+	 *   in host memory managed by DPDK.
+	 * - When cleared, PMD will use default memory type as a backing storage.
+	 *   Please refer to PMD documentation for details.
+	 *
+	 * API user should check if PMD supports this configuration flag using
+	 * @see rte_eth_dev_hairpin_capability_get.
+	 */
+	uint32_t use_rte_memory:1;
+
+	/**
+	 * Force usage of hairpin memory configuration.
+	 *
+	 * - When set, PMD will attempt to use specified memory settings.
+	 *   If resource allocation fails, then hairpin queue allocation
+	 *   will result in an error.
+	 * - When clear, PMD will attempt to use specified memory settings.
+	 *   If resource allocation fails, then PMD will retry
+	 *   allocation with default configuration.
+	 */
+	uint32_t force_memory:1;
+
+	uint32_t reserved:11; /**< Reserved bits. */
+
 	struct rte_eth_hairpin_peer peers[RTE_ETH_MAX_HAIRPIN_PEERS];
 };
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v2 2/8] common/mlx5: add hairpin SQ buffer type capabilities
  2022-10-06 11:00 ` [PATCH v2 0/8] " Dariusz Sosnowski
  2022-10-06 11:00   ` [PATCH v2 1/8] " Dariusz Sosnowski
@ 2022-10-06 11:00   ` Dariusz Sosnowski
  2022-10-06 11:01   ` [PATCH v2 3/8] common/mlx5: add hairpin RQ " Dariusz Sosnowski
                     ` (6 subsequent siblings)
  8 siblings, 0 replies; 30+ messages in thread
From: Dariusz Sosnowski @ 2022-10-06 11:00 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev

This patch extends the HCA_CAP and SQ Context structs available in PRM.
These fields allow checking whether the NIC supports storing a hairpin
SQ's WQ buffer in host memory and configuring such memory placement.

HCA capabilities are extended with the following fields:

- hairpin_sq_wq_in_host_mem - If set, then NIC supports using host
memory as a backing storage for hairpin SQ's WQ buffer.
- hairpin_sq_wqe_bb_size - Indicates the required size of SQ WQE basic
block.

SQ Context is extended with hairpin_wq_buffer_type which informs
NIC where SQ's WQ buffer will be stored. This field can take the
following values:

- MLX5_SQC_HAIRPIN_WQ_BUFFER_TYPE_INTERNAL_BUFFER - WQ buffer will be
  stored in unlocked device memory.
- MLX5_SQC_HAIRPIN_WQ_BUFFER_TYPE_HOST_MEMORY - WQ buffer will be stored
  in host memory. Buffer is provided by PMD.

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c |  5 +++++
 drivers/common/mlx5/mlx5_devx_cmds.h |  3 +++
 drivers/common/mlx5/mlx5_prm.h       | 15 +++++++++++++--
 3 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index fb33023138..a1e8179568 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -989,6 +989,10 @@ mlx5_devx_cmd_query_hca_attr(void *ctx,
 		}
 		attr->log_min_stride_wqe_sz = MLX5_GET(cmd_hca_cap_2, hcattr,
 						       log_min_stride_wqe_sz);
+		attr->hairpin_sq_wqe_bb_size = MLX5_GET(cmd_hca_cap_2, hcattr,
+							hairpin_sq_wqe_bb_size);
+		attr->hairpin_sq_wq_in_host_mem = MLX5_GET(cmd_hca_cap_2, hcattr,
+							   hairpin_sq_wq_in_host_mem);
 	}
 	if (attr->log_min_stride_wqe_sz == 0)
 		attr->log_min_stride_wqe_sz = MLX5_MPRQ_LOG_MIN_STRIDE_WQE_SIZE;
@@ -1706,6 +1710,7 @@ mlx5_devx_cmd_create_sq(void *ctx,
 	MLX5_SET(sqc, sq_ctx, hairpin, sq_attr->hairpin);
 	MLX5_SET(sqc, sq_ctx, non_wire, sq_attr->non_wire);
 	MLX5_SET(sqc, sq_ctx, static_sq_wq, sq_attr->static_sq_wq);
+	MLX5_SET(sqc, sq_ctx, hairpin_wq_buffer_type, sq_attr->hairpin_wq_buffer_type);
 	MLX5_SET(sqc, sq_ctx, user_index, sq_attr->user_index);
 	MLX5_SET(sqc, sq_ctx, cqn, sq_attr->cqn);
 	MLX5_SET(sqc, sq_ctx, packet_pacing_rate_limit_index,
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index af6053a788..9ac2d75df4 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -191,6 +191,8 @@ struct mlx5_hca_attr {
 	uint32_t log_max_hairpin_queues:5;
 	uint32_t log_max_hairpin_wq_data_sz:5;
 	uint32_t log_max_hairpin_num_packets:5;
+	uint32_t hairpin_sq_wqe_bb_size:4;
+	uint32_t hairpin_sq_wq_in_host_mem:1;
 	uint32_t vhca_id:16;
 	uint32_t relaxed_ordering_write:1;
 	uint32_t relaxed_ordering_read:1;
@@ -407,6 +409,7 @@ struct mlx5_devx_create_sq_attr {
 	uint32_t non_wire:1;
 	uint32_t static_sq_wq:1;
 	uint32_t ts_format:2;
+	uint32_t hairpin_wq_buffer_type:3;
 	uint32_t user_index:24;
 	uint32_t cqn:24;
 	uint32_t packet_pacing_rate_limit_index:16;
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 4346279c81..04d35ca845 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -2020,7 +2020,11 @@ struct mlx5_ifc_cmd_hca_cap_2_bits {
 	u8 reserved_at_d8[0x3];
 	u8 log_max_conn_track_offload[0x5];
 	u8 reserved_at_e0[0x20]; /* End of DW7. */
-	u8 reserved_at_100[0x700];
+	u8 reserved_at_100[0x60];
+	u8 reserved_at_160[0x3];
+	u8 hairpin_sq_wqe_bb_size[0x5];
+	u8 hairpin_sq_wq_in_host_mem[0x1];
+	u8 reserved_at_169[0x697];
 };
 
 struct mlx5_ifc_esw_cap_bits {
@@ -2673,6 +2677,11 @@ enum {
 	MLX5_SQC_STATE_ERR  = 0x3,
 };
 
+enum {
+	MLX5_SQC_HAIRPIN_WQ_BUFFER_TYPE_INTERNAL_BUFFER = 0x0,
+	MLX5_SQC_HAIRPIN_WQ_BUFFER_TYPE_HOST_MEMORY = 0x1,
+};
+
 struct mlx5_ifc_sqc_bits {
 	u8 rlky[0x1];
 	u8 cd_master[0x1];
@@ -2686,7 +2695,9 @@ struct mlx5_ifc_sqc_bits {
 	u8 hairpin[0x1];
 	u8 non_wire[0x1];
 	u8 static_sq_wq[0x1];
-	u8 reserved_at_11[0x9];
+	u8 reserved_at_11[0x4];
+	u8 hairpin_wq_buffer_type[0x3];
+	u8 reserved_at_18[0x2];
 	u8 ts_format[0x02];
 	u8 reserved_at_1c[0x4];
 	u8 reserved_at_20[0x8];
-- 
2.25.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v2 3/8] common/mlx5: add hairpin RQ buffer type capabilities
  2022-10-06 11:00 ` [PATCH v2 0/8] " Dariusz Sosnowski
  2022-10-06 11:00   ` [PATCH v2 1/8] " Dariusz Sosnowski
  2022-10-06 11:00   ` [PATCH v2 2/8] common/mlx5: add hairpin SQ buffer type capabilities Dariusz Sosnowski
@ 2022-10-06 11:01   ` Dariusz Sosnowski
  2022-10-06 11:01   ` [PATCH v2 4/8] net/mlx5: allow hairpin Tx queue in RTE memory Dariusz Sosnowski
                     ` (5 subsequent siblings)
  8 siblings, 0 replies; 30+ messages in thread
From: Dariusz Sosnowski @ 2022-10-06 11:01 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev

This patch adds new HCA capability related to hairpin RQs. This new
capability, hairpin_data_buffer_locked, indicates whether HCA supports
locking data buffer of hairpin RQ in ICMC (Interconnect Context Memory
Cache).

Struct used to define RQ configuration (RQ context) is extended with
hairpin_data_buffer_type field, which configures data buffer for hairpin
RQ. It can take the following values:

- MLX5_RQC_HAIRPIN_DATA_BUFFER_TYPE_UNLOCKED_INTERNAL_BUFFER - hairpin
  RQ's data buffer is stored in unlocked memory in ICMC.
- MLX5_RQC_HAIRPIN_DATA_BUFFER_TYPE_LOCKED_INTERNAL_BUFFER - hairpin
  RQ's data buffer is stored in locked memory in ICMC.

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c |  3 +++
 drivers/common/mlx5/mlx5_devx_cmds.h |  2 ++
 drivers/common/mlx5/mlx5_prm.h       | 12 ++++++++++--
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index a1e8179568..76f0b6724f 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -993,6 +993,8 @@ mlx5_devx_cmd_query_hca_attr(void *ctx,
 							hairpin_sq_wqe_bb_size);
 		attr->hairpin_sq_wq_in_host_mem = MLX5_GET(cmd_hca_cap_2, hcattr,
 							   hairpin_sq_wq_in_host_mem);
+		attr->hairpin_data_buffer_locked = MLX5_GET(cmd_hca_cap_2, hcattr,
+							    hairpin_data_buffer_locked);
 	}
 	if (attr->log_min_stride_wqe_sz == 0)
 		attr->log_min_stride_wqe_sz = MLX5_MPRQ_LOG_MIN_STRIDE_WQE_SIZE;
@@ -1293,6 +1295,7 @@ mlx5_devx_cmd_create_rq(void *ctx,
 	MLX5_SET(rqc, rq_ctx, state, rq_attr->state);
 	MLX5_SET(rqc, rq_ctx, flush_in_error_en, rq_attr->flush_in_error_en);
 	MLX5_SET(rqc, rq_ctx, hairpin, rq_attr->hairpin);
+	MLX5_SET(rqc, rq_ctx, hairpin_data_buffer_type, rq_attr->hairpin_data_buffer_type);
 	MLX5_SET(rqc, rq_ctx, user_index, rq_attr->user_index);
 	MLX5_SET(rqc, rq_ctx, cqn, rq_attr->cqn);
 	MLX5_SET(rqc, rq_ctx, counter_set_id, rq_attr->counter_set_id);
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index 9ac2d75df4..cceaf3411d 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -193,6 +193,7 @@ struct mlx5_hca_attr {
 	uint32_t log_max_hairpin_num_packets:5;
 	uint32_t hairpin_sq_wqe_bb_size:4;
 	uint32_t hairpin_sq_wq_in_host_mem:1;
+	uint32_t hairpin_data_buffer_locked:1;
 	uint32_t vhca_id:16;
 	uint32_t relaxed_ordering_write:1;
 	uint32_t relaxed_ordering_read:1;
@@ -313,6 +314,7 @@ struct mlx5_devx_create_rq_attr {
 	uint32_t state:4;
 	uint32_t flush_in_error_en:1;
 	uint32_t hairpin:1;
+	uint32_t hairpin_data_buffer_type:3;
 	uint32_t ts_format:2;
 	uint32_t user_index:24;
 	uint32_t cqn:24;
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 04d35ca845..9c1c93f916 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -2024,7 +2024,8 @@ struct mlx5_ifc_cmd_hca_cap_2_bits {
 	u8 reserved_at_160[0x3];
 	u8 hairpin_sq_wqe_bb_size[0x5];
 	u8 hairpin_sq_wq_in_host_mem[0x1];
-	u8 reserved_at_169[0x697];
+	u8 hairpin_data_buffer_locked[0x1];
+	u8 reserved_at_16a[0x696];
 };
 
 struct mlx5_ifc_esw_cap_bits {
@@ -2304,7 +2305,9 @@ struct mlx5_ifc_rqc_bits {
 	u8 reserved_at_c[0x1];
 	u8 flush_in_error_en[0x1];
 	u8 hairpin[0x1];
-	u8 reserved_at_f[0xB];
+	u8 reserved_at_f[0x6];
+	u8 hairpin_data_buffer_type[0x3];
+	u8 reserved_at_a8[0x2];
 	u8 ts_format[0x02];
 	u8 reserved_at_1c[0x4];
 	u8 reserved_at_20[0x8];
@@ -2813,6 +2816,11 @@ enum {
 	MLX5_CQE_SIZE_128B = 0x1,
 };
 
+enum {
+	MLX5_RQC_HAIRPIN_DATA_BUFFER_TYPE_UNLOCKED_INTERNAL_BUFFER = 0x0,
+	MLX5_RQC_HAIRPIN_DATA_BUFFER_TYPE_LOCKED_INTERNAL_BUFFER = 0x1,
+};
+
 struct mlx5_ifc_cqc_bits {
 	u8 status[0x4];
 	u8 as_notify[0x1];
-- 
2.25.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v2 4/8] net/mlx5: allow hairpin Tx queue in RTE memory
  2022-10-06 11:00 ` [PATCH v2 0/8] " Dariusz Sosnowski
                     ` (2 preceding siblings ...)
  2022-10-06 11:01   ` [PATCH v2 3/8] common/mlx5: add hairpin RQ " Dariusz Sosnowski
@ 2022-10-06 11:01   ` Dariusz Sosnowski
  2022-10-06 11:01   ` [PATCH v2 5/8] net/mlx5: allow hairpin Rx queue in locked memory Dariusz Sosnowski
                     ` (4 subsequent siblings)
  8 siblings, 0 replies; 30+ messages in thread
From: Dariusz Sosnowski @ 2022-10-06 11:01 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev

This patch adds a capability to place hairpin Tx queue in host memory
managed by DPDK. This capability is equivalent to storing hairpin SQ's
WQ buffer in host memory.

Hairpin Tx queue creation is extended with allocating a memory buffer of
proper size (calculated from required number of packets and WQE BB size
advertised in HCA capabilities).

force_memory flag of hairpin queue configuration is also supported.
If it is set and:

- allocation of memory buffer fails,
- or hairpin SQ creation fails,

then device start will fail. If it is unset, PMD will fallback to
creating the hairpin SQ with WQ buffer located in unlocked device
memory.

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/net/mlx5/mlx5.h        |   2 +
 drivers/net/mlx5/mlx5_devx.c   | 119 ++++++++++++++++++++++++++++++---
 drivers/net/mlx5/mlx5_ethdev.c |   4 ++
 3 files changed, 116 insertions(+), 9 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 95ecbea39e..3c9e6bad53 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1386,6 +1386,8 @@ struct mlx5_txq_obj {
 			struct mlx5_devx_obj *sq;
 			/* DevX object for Sx queue. */
 			struct mlx5_devx_obj *tis; /* The TIS object. */
+			void *umem_buf_wq_buffer;
+			void *umem_obj_wq_buffer;
 		};
 		struct {
 			struct rte_eth_dev *dev;
diff --git a/drivers/net/mlx5/mlx5_devx.c b/drivers/net/mlx5/mlx5_devx.c
index 943aa8ef57..c61c34bd99 100644
--- a/drivers/net/mlx5/mlx5_devx.c
+++ b/drivers/net/mlx5/mlx5_devx.c
@@ -1185,18 +1185,23 @@ static int
 mlx5_txq_obj_hairpin_new(struct rte_eth_dev *dev, uint16_t idx)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hca_attr *hca_attr = &priv->sh->cdev->config.hca_attr;
 	struct mlx5_txq_data *txq_data = (*priv->txqs)[idx];
 	struct mlx5_txq_ctrl *txq_ctrl =
 		container_of(txq_data, struct mlx5_txq_ctrl, txq);
-	struct mlx5_devx_create_sq_attr attr = { 0 };
+	struct mlx5_devx_create_sq_attr dev_mem_attr = { 0 };
+	struct mlx5_devx_create_sq_attr host_mem_attr = { 0 };
 	struct mlx5_txq_obj *tmpl = txq_ctrl->obj;
+	void *umem_buf = NULL;
+	void *umem_obj = NULL;
 	uint32_t max_wq_data;
 
 	MLX5_ASSERT(txq_data);
 	MLX5_ASSERT(tmpl);
 	tmpl->txq_ctrl = txq_ctrl;
-	attr.hairpin = 1;
-	attr.tis_lst_sz = 1;
+	dev_mem_attr.hairpin = 1;
+	dev_mem_attr.tis_lst_sz = 1;
+	dev_mem_attr.tis_num = mlx5_get_txq_tis_num(dev, idx);
 	max_wq_data =
 		priv->sh->cdev->config.hca_attr.log_max_hairpin_wq_data_sz;
 	/* Jumbo frames > 9KB should be supported, and more packets. */
@@ -1208,19 +1213,103 @@ mlx5_txq_obj_hairpin_new(struct rte_eth_dev *dev, uint16_t idx)
 			rte_errno = ERANGE;
 			return -rte_errno;
 		}
-		attr.wq_attr.log_hairpin_data_sz = priv->config.log_hp_size;
+		dev_mem_attr.wq_attr.log_hairpin_data_sz = priv->config.log_hp_size;
 	} else {
-		attr.wq_attr.log_hairpin_data_sz =
+		dev_mem_attr.wq_attr.log_hairpin_data_sz =
 				(max_wq_data < MLX5_HAIRPIN_JUMBO_LOG_SIZE) ?
 				 max_wq_data : MLX5_HAIRPIN_JUMBO_LOG_SIZE;
 	}
 	/* Set the packets number to the maximum value for performance. */
-	attr.wq_attr.log_hairpin_num_packets =
-			attr.wq_attr.log_hairpin_data_sz -
+	dev_mem_attr.wq_attr.log_hairpin_num_packets =
+			dev_mem_attr.wq_attr.log_hairpin_data_sz -
 			MLX5_HAIRPIN_QUEUE_STRIDE;
+	dev_mem_attr.hairpin_wq_buffer_type = MLX5_SQC_HAIRPIN_WQ_BUFFER_TYPE_INTERNAL_BUFFER;
+	if (txq_ctrl->hairpin_conf.use_rte_memory) {
+		uint32_t umem_size;
+		uint32_t umem_dbrec;
+		size_t alignment = MLX5_WQE_BUF_ALIGNMENT;
 
-	attr.tis_num = mlx5_get_txq_tis_num(dev, idx);
-	tmpl->sq = mlx5_devx_cmd_create_sq(priv->sh->cdev->ctx, &attr);
+		if (alignment == (size_t)-1) {
+			DRV_LOG(ERR, "Failed to get WQE buf alignment.");
+			rte_errno = ENOMEM;
+			return -rte_errno;
+		}
+		/*
+		 * It is assumed that configuration is verified against capabilities
+		 * during queue setup.
+		 */
+		MLX5_ASSERT(hca_attr->hairpin_sq_wq_in_host_mem);
+		MLX5_ASSERT(hca_attr->hairpin_sq_wqe_bb_size > 0);
+		rte_memcpy(&host_mem_attr, &dev_mem_attr, sizeof(host_mem_attr));
+		umem_size = MLX5_WQE_SIZE *
+			RTE_BIT32(host_mem_attr.wq_attr.log_hairpin_num_packets);
+		umem_dbrec = RTE_ALIGN(umem_size, MLX5_DBR_SIZE);
+		umem_size += MLX5_DBR_SIZE;
+		umem_buf = mlx5_malloc(MLX5_MEM_RTE | MLX5_MEM_ZERO, umem_size,
+				       alignment, priv->sh->numa_node);
+		if (umem_buf == NULL && txq_ctrl->hairpin_conf.force_memory) {
+			DRV_LOG(ERR, "Failed to allocate memory for hairpin TX queue");
+			rte_errno = ENOMEM;
+			return -rte_errno;
+		} else if (umem_buf == NULL && !txq_ctrl->hairpin_conf.force_memory) {
+			DRV_LOG(WARNING, "Failed to allocate memory for hairpin TX queue."
+					 " Falling back to TX queue located on the device.");
+			goto create_sq_on_device;
+		}
+		umem_obj = mlx5_os_umem_reg(priv->sh->cdev->ctx,
+					    (void *)(uintptr_t)umem_buf,
+					    umem_size,
+					    IBV_ACCESS_LOCAL_WRITE);
+		if (umem_obj == NULL && txq_ctrl->hairpin_conf.force_memory) {
+			DRV_LOG(ERR, "Failed to register UMEM for hairpin TX queue");
+			mlx5_free(umem_buf);
+			return -rte_errno;
+		} else if (umem_obj == NULL && !txq_ctrl->hairpin_conf.force_memory) {
+			DRV_LOG(WARNING, "Failed to register UMEM for hairpin TX queue."
+					 " Falling back to TX queue located on the device.");
+			rte_errno = 0;
+			mlx5_free(umem_buf);
+			goto create_sq_on_device;
+		}
+		host_mem_attr.wq_attr.wq_type = MLX5_WQ_TYPE_CYCLIC;
+		host_mem_attr.wq_attr.wq_umem_valid = 1;
+		host_mem_attr.wq_attr.wq_umem_id = mlx5_os_get_umem_id(umem_obj);
+		host_mem_attr.wq_attr.wq_umem_offset = 0;
+		host_mem_attr.wq_attr.dbr_umem_valid = 1;
+		host_mem_attr.wq_attr.dbr_umem_id = host_mem_attr.wq_attr.wq_umem_id;
+		host_mem_attr.wq_attr.dbr_addr = umem_dbrec;
+		host_mem_attr.wq_attr.log_wq_stride = rte_log2_u32(MLX5_WQE_SIZE);
+		host_mem_attr.wq_attr.log_wq_sz =
+				host_mem_attr.wq_attr.log_hairpin_num_packets *
+				hca_attr->hairpin_sq_wqe_bb_size;
+		host_mem_attr.wq_attr.log_wq_pg_sz = MLX5_LOG_PAGE_SIZE;
+		host_mem_attr.hairpin_wq_buffer_type = MLX5_SQC_HAIRPIN_WQ_BUFFER_TYPE_HOST_MEMORY;
+		tmpl->sq = mlx5_devx_cmd_create_sq(priv->sh->cdev->ctx, &host_mem_attr);
+		if (!tmpl->sq && txq_ctrl->hairpin_conf.force_memory) {
+			DRV_LOG(ERR,
+				"Port %u tx hairpin queue %u can't create SQ object.",
+				dev->data->port_id, idx);
+			claim_zero(mlx5_os_umem_dereg(umem_obj));
+			mlx5_free(umem_buf);
+			return -rte_errno;
+		} else if (!tmpl->sq && !txq_ctrl->hairpin_conf.force_memory) {
+			DRV_LOG(WARNING,
+				"Port %u tx hairpin queue %u failed to allocate SQ object"
+				" using host memory. Falling back to TX queue located"
+				" on the device",
+				dev->data->port_id, idx);
+			rte_errno = 0;
+			claim_zero(mlx5_os_umem_dereg(umem_obj));
+			mlx5_free(umem_buf);
+			goto create_sq_on_device;
+		}
+		tmpl->umem_buf_wq_buffer = umem_buf;
+		tmpl->umem_obj_wq_buffer = umem_obj;
+		return 0;
+	}
+
+create_sq_on_device:
+	tmpl->sq = mlx5_devx_cmd_create_sq(priv->sh->cdev->ctx, &dev_mem_attr);
 	if (!tmpl->sq) {
 		DRV_LOG(ERR,
 			"Port %u tx hairpin queue %u can't create SQ object.",
@@ -1452,8 +1541,20 @@ mlx5_txq_devx_obj_release(struct mlx5_txq_obj *txq_obj)
 {
 	MLX5_ASSERT(txq_obj);
 	if (txq_obj->txq_ctrl->is_hairpin) {
+		if (txq_obj->sq) {
+			claim_zero(mlx5_devx_cmd_destroy(txq_obj->sq));
+			txq_obj->sq = NULL;
+		}
 		if (txq_obj->tis)
 			claim_zero(mlx5_devx_cmd_destroy(txq_obj->tis));
+		if (txq_obj->umem_obj_wq_buffer) {
+			claim_zero(mlx5_os_umem_dereg(txq_obj->umem_obj_wq_buffer));
+			txq_obj->umem_obj_wq_buffer = NULL;
+		}
+		if (txq_obj->umem_buf_wq_buffer) {
+			mlx5_free(txq_obj->umem_buf_wq_buffer);
+			txq_obj->umem_buf_wq_buffer = NULL;
+		}
 #if defined(HAVE_MLX5DV_DEVX_UAR_OFFSET) || !defined(HAVE_INFINIBAND_VERBS_H)
 	} else {
 		mlx5_txq_release_devx_resources(txq_obj);
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index a5c7ca8c52..c59005ea2b 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -729,6 +729,7 @@ int
 mlx5_hairpin_cap_get(struct rte_eth_dev *dev, struct rte_eth_hairpin_cap *cap)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_hca_attr *hca_attr;
 
 	if (!mlx5_devx_obj_ops_en(priv->sh)) {
 		rte_errno = ENOTSUP;
@@ -738,5 +739,8 @@ mlx5_hairpin_cap_get(struct rte_eth_dev *dev, struct rte_eth_hairpin_cap *cap)
 	cap->max_rx_2_tx = 1;
 	cap->max_tx_2_rx = 1;
 	cap->max_nb_desc = 8192;
+	hca_attr = &priv->sh->cdev->config.hca_attr;
+	cap->tx_cap.locked_device_memory = 0;
+	cap->tx_cap.rte_memory = hca_attr->hairpin_sq_wq_in_host_mem;
 	return 0;
 }
-- 
2.25.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v2 5/8] net/mlx5: allow hairpin Rx queue in locked memory
  2022-10-06 11:00 ` [PATCH v2 0/8] " Dariusz Sosnowski
                     ` (3 preceding siblings ...)
  2022-10-06 11:01   ` [PATCH v2 4/8] net/mlx5: allow hairpin Tx queue in RTE memory Dariusz Sosnowski
@ 2022-10-06 11:01   ` Dariusz Sosnowski
  2022-10-06 11:01   ` [PATCH v2 6/8] doc: add notes for hairpin to mlx5 documentation Dariusz Sosnowski
                     ` (3 subsequent siblings)
  8 siblings, 0 replies; 30+ messages in thread
From: Dariusz Sosnowski @ 2022-10-06 11:01 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev

This patch adds a capability to place hairpin Rx queue in locked device
memory. This capability is equivalent to storing hairpin RQ's data
buffers in locked internal device memory.

Hairpin Rx queue creation is extended with requesting that RQ is
allocated in locked internal device memory. If allocation fails and
force_memory hairpin configuration is set, then hairpin queue creation
(and, as a result, device start) fails. If force_memory is unset, then
PMD will fallback to allocating memory for hairpin RQ in unlocked
internal device memory.

To allow such allocation, the user must set HAIRPIN_DATA_BUFFER_LOCK
flag in FW using mlxconfig tool.
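
For illustration, assuming the mlxconfig tool is available and using a placeholder
MST device path, the flag can be set with something like:

  mlxconfig -d /dev/mst/mt4125_pciconf0 set HAIRPIN_DATA_BUFFER_LOCK=1

followed by a firmware reset or a cold reboot for the new configuration to take effect.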

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 doc/guides/platform/mlx5.rst   |  5 ++++
 drivers/net/mlx5/mlx5_devx.c   | 51 ++++++++++++++++++++++++++++------
 drivers/net/mlx5/mlx5_ethdev.c |  2 ++
 3 files changed, 49 insertions(+), 9 deletions(-)

diff --git a/doc/guides/platform/mlx5.rst b/doc/guides/platform/mlx5.rst
index 46b394c4ee..3cc1dd29e2 100644
--- a/doc/guides/platform/mlx5.rst
+++ b/doc/guides/platform/mlx5.rst
@@ -555,6 +555,11 @@ Below are some firmware configurations listed.
 
    REAL_TIME_CLOCK_ENABLE=1
 
+- allow locking hairpin RQ data buffer in device memory::
+
+   HAIRPIN_DATA_BUFFER_LOCK=1
+   MEMIC_SIZE_LIMIT=0
+
 
 .. _mlx5_common_driver_options:
 
diff --git a/drivers/net/mlx5/mlx5_devx.c b/drivers/net/mlx5/mlx5_devx.c
index c61c34bd99..fe303a73bb 100644
--- a/drivers/net/mlx5/mlx5_devx.c
+++ b/drivers/net/mlx5/mlx5_devx.c
@@ -468,14 +468,16 @@ mlx5_rxq_obj_hairpin_new(struct mlx5_rxq_priv *rxq)
 {
 	uint16_t idx = rxq->idx;
 	struct mlx5_priv *priv = rxq->priv;
+	struct mlx5_hca_attr *hca_attr __rte_unused = &priv->sh->cdev->config.hca_attr;
 	struct mlx5_rxq_ctrl *rxq_ctrl = rxq->ctrl;
-	struct mlx5_devx_create_rq_attr attr = { 0 };
+	struct mlx5_devx_create_rq_attr unlocked_attr = { 0 };
+	struct mlx5_devx_create_rq_attr locked_attr = { 0 };
 	struct mlx5_rxq_obj *tmpl = rxq_ctrl->obj;
 	uint32_t max_wq_data;
 
 	MLX5_ASSERT(rxq != NULL && rxq->ctrl != NULL && tmpl != NULL);
 	tmpl->rxq_ctrl = rxq_ctrl;
-	attr.hairpin = 1;
+	unlocked_attr.hairpin = 1;
 	max_wq_data =
 		priv->sh->cdev->config.hca_attr.log_max_hairpin_wq_data_sz;
 	/* Jumbo frames > 9KB should be supported, and more packets. */
@@ -487,20 +489,50 @@ mlx5_rxq_obj_hairpin_new(struct mlx5_rxq_priv *rxq)
 			rte_errno = ERANGE;
 			return -rte_errno;
 		}
-		attr.wq_attr.log_hairpin_data_sz = priv->config.log_hp_size;
+		unlocked_attr.wq_attr.log_hairpin_data_sz = priv->config.log_hp_size;
 	} else {
-		attr.wq_attr.log_hairpin_data_sz =
+		unlocked_attr.wq_attr.log_hairpin_data_sz =
 				(max_wq_data < MLX5_HAIRPIN_JUMBO_LOG_SIZE) ?
 				 max_wq_data : MLX5_HAIRPIN_JUMBO_LOG_SIZE;
 	}
 	/* Set the packets number to the maximum value for performance. */
-	attr.wq_attr.log_hairpin_num_packets =
-			attr.wq_attr.log_hairpin_data_sz -
+	unlocked_attr.wq_attr.log_hairpin_num_packets =
+			unlocked_attr.wq_attr.log_hairpin_data_sz -
 			MLX5_HAIRPIN_QUEUE_STRIDE;
-	attr.counter_set_id = priv->counter_set_id;
+	unlocked_attr.counter_set_id = priv->counter_set_id;
 	rxq_ctrl->rxq.delay_drop = priv->config.hp_delay_drop;
-	attr.delay_drop_en = priv->config.hp_delay_drop;
-	tmpl->rq = mlx5_devx_cmd_create_rq(priv->sh->cdev->ctx, &attr,
+	unlocked_attr.delay_drop_en = priv->config.hp_delay_drop;
+	unlocked_attr.hairpin_data_buffer_type =
+			MLX5_RQC_HAIRPIN_DATA_BUFFER_TYPE_UNLOCKED_INTERNAL_BUFFER;
+	if (rxq->hairpin_conf.use_locked_device_memory) {
+		/*
+		 * It is assumed that configuration is verified against capabilities
+		 * during queue setup.
+		 */
+		MLX5_ASSERT(hca_attr->hairpin_data_buffer_locked);
+		rte_memcpy(&locked_attr, &unlocked_attr, sizeof(locked_attr));
+		locked_attr.hairpin_data_buffer_type =
+				MLX5_RQC_HAIRPIN_DATA_BUFFER_TYPE_LOCKED_INTERNAL_BUFFER;
+		tmpl->rq = mlx5_devx_cmd_create_rq(priv->sh->cdev->ctx, &locked_attr,
+						   rxq_ctrl->socket);
+		if (!tmpl->rq && rxq->hairpin_conf.force_memory) {
+			DRV_LOG(ERR, "Port %u Rx hairpin queue %u can't create RQ object"
+				     " with locked memory buffer",
+				     priv->dev_data->port_id, idx);
+			return -rte_errno;
+		} else if (!tmpl->rq && !rxq->hairpin_conf.force_memory) {
+			DRV_LOG(WARNING, "Port %u Rx hairpin queue %u can't create RQ object"
+					 " with locked memory buffer. Falling back to unlocked"
+					 " device memory.",
+					 priv->dev_data->port_id, idx);
+			rte_errno = 0;
+			goto create_rq_unlocked;
+		}
+		goto create_rq_set_state;
+	}
+
+create_rq_unlocked:
+	tmpl->rq = mlx5_devx_cmd_create_rq(priv->sh->cdev->ctx, &unlocked_attr,
 					   rxq_ctrl->socket);
 	if (!tmpl->rq) {
 		DRV_LOG(ERR,
@@ -509,6 +541,7 @@ mlx5_rxq_obj_hairpin_new(struct mlx5_rxq_priv *rxq)
 		rte_errno = errno;
 		return -rte_errno;
 	}
+create_rq_set_state:
 	priv->dev_data->rx_queue_state[idx] = RTE_ETH_QUEUE_STATE_HAIRPIN;
 	return 0;
 }
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index c59005ea2b..4a85415ff3 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -740,6 +740,8 @@ mlx5_hairpin_cap_get(struct rte_eth_dev *dev, struct rte_eth_hairpin_cap *cap)
 	cap->max_tx_2_rx = 1;
 	cap->max_nb_desc = 8192;
 	hca_attr = &priv->sh->cdev->config.hca_attr;
+	cap->rx_cap.locked_device_memory = hca_attr->hairpin_data_buffer_locked;
+	cap->rx_cap.rte_memory = 0;
 	cap->tx_cap.locked_device_memory = 0;
 	cap->tx_cap.rte_memory = hca_attr->hairpin_sq_wq_in_host_mem;
 	return 0;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v2 6/8] doc: add notes for hairpin to mlx5 documentation
  2022-10-06 11:00 ` [PATCH v2 0/8] " Dariusz Sosnowski
                     ` (4 preceding siblings ...)
  2022-10-06 11:01   ` [PATCH v2 5/8] net/mlx5: allow hairpin Rx queue in locked memory Dariusz Sosnowski
@ 2022-10-06 11:01   ` Dariusz Sosnowski
  2022-10-06 11:01   ` [PATCH v2 7/8] app/testpmd: add hairpin queues memory modes Dariusz Sosnowski
                     ` (2 subsequent siblings)
  8 siblings, 0 replies; 30+ messages in thread
From: Dariusz Sosnowski @ 2022-10-06 11:01 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev

This patch extends mlx5 PMD documentation with more information
regarding hairpin support.

The following is added to mlx5 PMD documentation:

- description of the default behavior of hairpin queues,
- description of use_locked_device_memory effect on hairpin queue
  configuration,
- description of use_rte_memory effect on hairpin queue configuration,
- DPDK and OFED requirements for new memory options for hairpin.

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 doc/guides/nics/mlx5.rst | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 3d4ee31f8d..997cb19ba2 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -1517,6 +1517,43 @@ behavior as librte_net_mlx4::
    > port config all rss all
    > port start all
 
+Notes for hairpin
+-----------------
+
+NVIDIA Connect-X and BlueField devices support specifying memory
+placement for hairpin Rx and Tx queues. This feature requires OFED 5.8.
+
+By default, data buffers and packet descriptors for hairpin queues are placed
+in device memory which is shared with other resources (e.g. flow rules).
+
+Starting with DPDK 22.11 and OFED 5.8 applications are allowed to:
+
+#. Place data buffers and Rx packet descriptors in dedicated device memory.
+   Application can request that configuration through ``use_locked_device_memory``
+   configuration option.
+
+   Placing data buffers and Rx packet descriptors in dedicated device memory
+   can decrease latency on hairpinned traffic, since traffic processing
+   for the hairpin queue will not be memory starved.
+
+   However, reserving device memory for hairpin Rx queues may decrease throughput
+   under heavy load, since less resources will be available on device.
+
+   This option is supported only for Rx hairpin queues.
+
+#. Place Tx packet descriptors in host memory.
+   Application can request that configuration through ``use_rte_memory``
+   configuration option.
+
+   Placing Tx packet descriptors in host memory can increase traffic throughput.
+   This results in more resources available on the device for other purposes,
+   which reduces memory contention on device.
+   Side effect of this option is visible increase in latency, since each packet
+   incurs additional PCI transactions.
+
+   This option is supported only for Tx hairpin queues.
+
+
 Usage example
 -------------
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v2 7/8] app/testpmd: add hairpin queues memory modes
  2022-10-06 11:00 ` [PATCH v2 0/8] " Dariusz Sosnowski
                     ` (5 preceding siblings ...)
  2022-10-06 11:01   ` [PATCH v2 6/8] doc: add notes for hairpin to mlx5 documentation Dariusz Sosnowski
@ 2022-10-06 11:01   ` Dariusz Sosnowski
  2022-10-06 11:01   ` [PATCH v2 8/8] app/flow-perf: add hairpin queue memory config Dariusz Sosnowski
  2022-10-08 16:31   ` [PATCH v2 0/8] ethdev: introduce hairpin memory capabilities Thomas Monjalon
  8 siblings, 0 replies; 30+ messages in thread
From: Dariusz Sosnowski @ 2022-10-06 11:01 UTC (permalink / raw)
  To: Aman Singh, Yuying Zhang; +Cc: dev

This patch extends hairpin-mode command line option of test-pmd
application with an ability to configure whether Rx/Tx hairpin queue
should use locked device memory or RTE memory.

For the purposes of this configuration, the following bits of the 32-bit
hairpin-mode value are reserved (an example combining them follows the list):

- Bit 8 - If set, then force_memory flag will be set for hairpin RX
  queue.
- Bit 9 - If set, then force_memory flag will be set for hairpin TX
  queue.
- Bits 12-15 - Memory options for hairpin Rx queue:
    - Bit 12 - If set, then use_locked_device_memory will be set.
    - Bit 13 - If set, then use_rte_memory will be set.
    - Bit 14 - Reserved for future use.
    - Bit 15 - Reserved for future use.
- Bits 16-19 - Memory options for hairpin Tx queue:
    - Bit 16 - If set, then use_locked_device_memory will be set.
    - Bit 17 - If set, then use_rte_memory will be set.
    - Bit 18 - Reserved for future use.
    - Bit 19 - Reserved for future use.
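
As a hypothetical example (EAL and the remaining testpmd options shown are placeholders),
forcing hairpin RX queues into locked device memory and hairpin TX queues into RTE memory
combines bits 8, 9, 12 and 17, i.e. 0x100 | 0x200 | 0x1000 | 0x20000 = 0x21300:

  dpdk-testpmd <EAL options> -- --rxq=2 --txq=2 --hairpinq=2 --hairpin-mode=0x21300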

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 app/test-pmd/parameters.c             |  2 +-
 app/test-pmd/testpmd.c                | 24 +++++++++++++++++++++++-
 app/test-pmd/testpmd.h                |  2 +-
 doc/guides/testpmd_app_ug/run_app.rst | 10 ++++++++--
 4 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 1024b5419c..14752f9571 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -1085,7 +1085,7 @@ launch_args_parse(int argc, char** argv)
 				if (errno != 0 || end == optarg)
 					rte_exit(EXIT_FAILURE, "hairpin mode invalid\n");
 				else
-					hairpin_mode = (uint16_t)n;
+					hairpin_mode = (uint32_t)n;
 			}
 			if (!strcmp(lgopts[opt_idx].name, "burst")) {
 				n = atoi(optarg);
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 39ee3d331d..bb1c901742 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -409,7 +409,7 @@ bool setup_on_probe_event = true;
 uint8_t clear_ptypes = true;
 
 /* Hairpin ports configuration mode. */
-uint16_t hairpin_mode;
+uint32_t hairpin_mode;
 
 /* Pretty printing of ethdev events */
 static const char * const eth_event_desc[] = {
@@ -2519,6 +2519,16 @@ port_is_started(portid_t port_id)
 	return 1;
 }
 
+#define HAIRPIN_MODE_RX_FORCE_MEMORY RTE_BIT32(8)
+#define HAIRPIN_MODE_TX_FORCE_MEMORY RTE_BIT32(9)
+
+#define HAIRPIN_MODE_RX_LOCKED_MEMORY RTE_BIT32(12)
+#define HAIRPIN_MODE_RX_RTE_MEMORY RTE_BIT32(13)
+
+#define HAIRPIN_MODE_TX_LOCKED_MEMORY RTE_BIT32(16)
+#define HAIRPIN_MODE_TX_RTE_MEMORY RTE_BIT32(17)
+
+
 /* Configure the Rx and Tx hairpin queues for the selected port. */
 static int
 setup_hairpin_queues(portid_t pi, portid_t p_pi, uint16_t cnt_pi)
@@ -2534,6 +2544,12 @@ setup_hairpin_queues(portid_t pi, portid_t p_pi, uint16_t cnt_pi)
 	uint16_t peer_tx_port = pi;
 	uint32_t manual = 1;
 	uint32_t tx_exp = hairpin_mode & 0x10;
+	uint32_t rx_force_memory = hairpin_mode & HAIRPIN_MODE_RX_FORCE_MEMORY;
+	uint32_t rx_locked_memory = hairpin_mode & HAIRPIN_MODE_RX_LOCKED_MEMORY;
+	uint32_t rx_rte_memory = hairpin_mode & HAIRPIN_MODE_RX_RTE_MEMORY;
+	uint32_t tx_force_memory = hairpin_mode & HAIRPIN_MODE_TX_FORCE_MEMORY;
+	uint32_t tx_locked_memory = hairpin_mode & HAIRPIN_MODE_TX_LOCKED_MEMORY;
+	uint32_t tx_rte_memory = hairpin_mode & HAIRPIN_MODE_TX_RTE_MEMORY;
 
 	if (!(hairpin_mode & 0xf)) {
 		peer_rx_port = pi;
@@ -2573,6 +2589,9 @@ setup_hairpin_queues(portid_t pi, portid_t p_pi, uint16_t cnt_pi)
 		hairpin_conf.peers[0].queue = i + nb_rxq;
 		hairpin_conf.manual_bind = !!manual;
 		hairpin_conf.tx_explicit = !!tx_exp;
+		hairpin_conf.force_memory = !!tx_force_memory;
+		hairpin_conf.use_locked_device_memory = !!tx_locked_memory;
+		hairpin_conf.use_rte_memory = !!tx_rte_memory;
 		diag = rte_eth_tx_hairpin_queue_setup
 			(pi, qi, nb_txd, &hairpin_conf);
 		i++;
@@ -2596,6 +2615,9 @@ setup_hairpin_queues(portid_t pi, portid_t p_pi, uint16_t cnt_pi)
 		hairpin_conf.peers[0].queue = i + nb_txq;
 		hairpin_conf.manual_bind = !!manual;
 		hairpin_conf.tx_explicit = !!tx_exp;
+		hairpin_conf.force_memory = !!rx_force_memory;
+		hairpin_conf.use_locked_device_memory = !!rx_locked_memory;
+		hairpin_conf.use_rte_memory = !!rx_rte_memory;
 		diag = rte_eth_rx_hairpin_queue_setup
 			(pi, qi, nb_rxd, &hairpin_conf);
 		i++;
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 627a42ce3b..2244c25e97 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -562,7 +562,7 @@ extern uint16_t stats_period;
 extern struct rte_eth_xstat_name *xstats_display;
 extern unsigned int xstats_display_num;
 
-extern uint16_t hairpin_mode;
+extern uint32_t hairpin_mode;
 
 #ifdef RTE_LIB_LATENCYSTATS
 extern uint8_t latencystats_enabled;
diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst
index 8b41b960c8..abc3ec10a0 100644
--- a/doc/guides/testpmd_app_ug/run_app.rst
+++ b/doc/guides/testpmd_app_ug/run_app.rst
@@ -529,10 +529,16 @@ The command line options are:
 
     Enable display of RX and TX burst stats.
 
-*   ``--hairpin-mode=0xXX``
+*   ``--hairpin-mode=0xXXXX``
 
-    Set the hairpin port mode with bitmask, only valid when hairpin queues number is set::
+    Set the hairpin port configuration with bitmask, only valid when hairpin queues number is set::
 
+	bit 18 - hairpin TX queues will use RTE memory
+	bit 16 - hairpin TX queues will use locked device memory
+	bit 13 - hairpin RX queues will use RTE memory
+	bit 12 - hairpin RX queues will use locked device memory
+	bit 9 - force memory settings of hairpin TX queue
+	bit 8 - force memory settings of hairpin RX queue
 	bit 4 - explicit Tx flow rule
 	bit 1 - two hairpin ports paired
 	bit 0 - two hairpin ports loop
-- 
2.25.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v2 8/8] app/flow-perf: add hairpin queue memory config
  2022-10-06 11:00 ` [PATCH v2 0/8] " Dariusz Sosnowski
                     ` (6 preceding siblings ...)
  2022-10-06 11:01   ` [PATCH v2 7/8] app/testpmd: add hairpin queues memory modes Dariusz Sosnowski
@ 2022-10-06 11:01   ` Dariusz Sosnowski
  2022-10-15 16:30     ` Wisam Monther
  2022-10-08 16:31   ` [PATCH v2 0/8] ethdev: introduce hairpin memory capabilities Thomas Monjalon
  8 siblings, 1 reply; 30+ messages in thread
From: Dariusz Sosnowski @ 2022-10-06 11:01 UTC (permalink / raw)
  To: Wisam Jaddo; +Cc: dev

This patch adds the hairpin-conf command line parameter to the flow-perf
application. The hairpin-conf parameter takes a hexadecimal bitmask with
bits having the following meaning (an example value follows the list):

- Bit 0 - Force memory settings of hairpin RX queue.
- Bit 1 - Force memory settings of hairpin TX queue.
- Bit 4 - Use locked device memory for hairpin RX queue.
- Bit 5 - Use RTE memory for hairpin RX queue.
- Bit 8 - Use locked device memory for hairpin TX queue.
- Bit 9 - Use RTE memory for hairpin TX queue.
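
As a hypothetical example (EAL and the remaining flow-perf options are placeholders),
forcing hairpin RX queues into locked device memory and hairpin TX queues into RTE memory
combines bits 0, 1, 4 and 9, i.e. 0x1 | 0x2 | 0x10 | 0x200 = 0x213:

  dpdk-test-flow-perf <EAL options> -- <flow options> --hairpin-conf=0x213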

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 app/test-flow-perf/main.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/app/test-flow-perf/main.c b/app/test-flow-perf/main.c
index f375097028..4a9206803a 100644
--- a/app/test-flow-perf/main.c
+++ b/app/test-flow-perf/main.c
@@ -46,6 +46,15 @@
 #define DEFAULT_RULES_BATCH     100000
 #define DEFAULT_GROUP                0
 
+#define HAIRPIN_RX_CONF_FORCE_MEMORY  (0x0001)
+#define HAIRPIN_TX_CONF_FORCE_MEMORY  (0x0002)
+
+#define HAIRPIN_RX_CONF_LOCKED_MEMORY (0x0010)
+#define HAIRPIN_RX_CONF_RTE_MEMORY    (0x0020)
+
+#define HAIRPIN_TX_CONF_LOCKED_MEMORY (0x0100)
+#define HAIRPIN_TX_CONF_RTE_MEMORY    (0x0200)
+
 struct rte_flow *flow;
 static uint8_t flow_group;
 
@@ -61,6 +70,7 @@ static uint32_t policy_id[MAX_PORTS];
 static uint8_t items_idx, actions_idx, attrs_idx;
 
 static uint64_t ports_mask;
+static uint64_t hairpin_conf_mask;
 static uint16_t dst_ports[RTE_MAX_ETHPORTS];
 static volatile bool force_quit;
 static bool dump_iterations;
@@ -482,6 +492,7 @@ usage(char *progname)
 	printf("  --enable-fwd: To enable packets forwarding"
 		" after insertion\n");
 	printf("  --portmask=N: hexadecimal bitmask of ports used\n");
+	printf("  --hairpin-conf=0xXXXX: hexadecimal bitmask of hairpin queue configuration\n");
 	printf("  --random-priority=N,S: use random priority levels "
 		"from 0 to (N - 1) for flows "
 		"and S as seed for pseudo-random number generator\n");
@@ -629,6 +640,7 @@ static void
 args_parse(int argc, char **argv)
 {
 	uint64_t pm, seed;
+	uint64_t hp_conf;
 	char **argvopt;
 	uint32_t prio;
 	char *token;
@@ -648,6 +660,7 @@ args_parse(int argc, char **argv)
 		{ "enable-fwd",                 0, 0, 0 },
 		{ "unique-data",                0, 0, 0 },
 		{ "portmask",                   1, 0, 0 },
+		{ "hairpin-conf",               1, 0, 0 },
 		{ "cores",                      1, 0, 0 },
 		{ "random-priority",            1, 0, 0 },
 		{ "meter-profile-alg",          1, 0, 0 },
@@ -880,6 +893,13 @@ args_parse(int argc, char **argv)
 					rte_exit(EXIT_FAILURE, "Invalid fwd port mask\n");
 				ports_mask = pm;
 			}
+			if (strcmp(lgopts[opt_idx].name, "hairpin-conf") == 0) {
+				end = NULL;
+				hp_conf = strtoull(optarg, &end, 16);
+				if ((optarg[0] == '\0') || (end == NULL) || (*end != '\0'))
+					rte_exit(EXIT_FAILURE, "Invalid hairpin config mask\n");
+				hairpin_conf_mask = hp_conf;
+			}
 			if (strcmp(lgopts[opt_idx].name,
 					"port-id") == 0) {
 				uint16_t port_idx = 0;
@@ -2035,6 +2055,12 @@ init_port(void)
 				hairpin_conf.peers[0].port = port_id;
 				hairpin_conf.peers[0].queue =
 					std_queue + tx_queues_count;
+				hairpin_conf.use_locked_device_memory =
+					!!(hairpin_conf_mask & HAIRPIN_RX_CONF_LOCKED_MEMORY);
+				hairpin_conf.use_rte_memory =
+					!!(hairpin_conf_mask & HAIRPIN_RX_CONF_RTE_MEMORY);
+				hairpin_conf.force_memory =
+					!!(hairpin_conf_mask & HAIRPIN_RX_CONF_FORCE_MEMORY);
 				ret = rte_eth_rx_hairpin_queue_setup(
 						port_id, hairpin_queue,
 						rxd_count, &hairpin_conf);
@@ -2050,6 +2076,12 @@ init_port(void)
 				hairpin_conf.peers[0].port = port_id;
 				hairpin_conf.peers[0].queue =
 					std_queue + rx_queues_count;
+				hairpin_conf.use_locked_device_memory =
+					!!(hairpin_conf_mask & HAIRPIN_TX_CONF_LOCKED_MEMORY);
+				hairpin_conf.use_rte_memory =
+					!!(hairpin_conf_mask & HAIRPIN_TX_CONF_RTE_MEMORY);
+				hairpin_conf.force_memory =
+					!!(hairpin_conf_mask & HAIRPIN_TX_CONF_FORCE_MEMORY);
 				ret = rte_eth_tx_hairpin_queue_setup(
 						port_id, hairpin_queue,
 						txd_count, &hairpin_conf);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH 7/7] app/flow-perf: add hairpin queue memory config
  2022-10-04 12:24   ` Wisam Monther
@ 2022-10-06 11:06     ` Dariusz Sosnowski
  0 siblings, 0 replies; 30+ messages in thread
From: Dariusz Sosnowski @ 2022-10-06 11:06 UTC (permalink / raw)
  To: Wisam Monther; +Cc: dev

Hi,

> -----Original Message-----
> From: Wisam Monther <wisamm@nvidia.com>
> Sent: Tuesday, October 4, 2022 14:25
> To: Dariusz Sosnowski <dsosnowski@nvidia.com>
> Cc: dev@dpdk.org
> Subject: RE: [PATCH 7/7] app/flow-perf: add hairpin queue memory config
> 
> Hi Dariusz,
> 
> > -----Original Message-----
> > From: Dariusz Sosnowski <dsosnowski@nvidia.com>
> > Sent: Monday, September 19, 2022 7:38 PM
> > To: Wisam Monther <wisamm@nvidia.com>
> > Cc: dev@dpdk.org
> > Subject: [PATCH 7/7] app/flow-perf: add hairpin queue memory config
> >
> > This patch adds the hairpin-conf command line parameter to flow-perf
> > application. hairpin-conf parameter takes a hexadecimal bitmask with
> > bits having the following meaning:
> >
> > - Bit 0 - Force memory settings of hairpin RX queue.
> > - Bit 1 - Force memory settings of hairpin TX queue.
> > - Bit 4 - Use locked device memory for hairpin RX queue.
> > - Bit 5 - Use RTE memory for hairpin RX queue.
> > - Bit 8 - Use locked device memory for hairpin TX queue.
> > - Bit 9 - Use RTE memory for hairpin TX queue.
> >
> > Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
> > ---
> 
> You have some checks issues; can you please kindly check them?
> 
> BRs,
> Wisam Jaddo

I sent the v2. The checks issue should be fixed in that version.

Thank you.

Best regards,
Dariusz Sosnowski

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH 0/7] ethdev: introduce hairpin memory capabilities
  2022-10-04 16:44 ` [PATCH 0/7] ethdev: introduce hairpin memory capabilities Thomas Monjalon
@ 2022-10-06 11:08   ` Dariusz Sosnowski
  0 siblings, 0 replies; 30+ messages in thread
From: Dariusz Sosnowski @ 2022-10-06 11:08 UTC (permalink / raw)
  To: NBU-Contact-Thomas Monjalon (EXTERNAL)
  Cc: Ferruh Yigit, Andrew Rybchenko, dev, Slava Ovsiienko,
	Matan Azrad, Ori Kam, Wisam Monther, Aman Singh, Yuying Zhang

Hi Thomas,

I sent the v2 of the patches.

> 1/ motivation (why is this needed)
I added the feature motivation in the cover letter of v2.

> 2/ compilation on Windows
>         looks like devx_umem_reg has 5 parameters in Windows glue!
Fix for that is included in v2.

Best regards,
Dariusz Sosnowski

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH 1/7] ethdev: introduce hairpin memory capabilities
  2022-10-04 16:50   ` Thomas Monjalon
@ 2022-10-06 11:21     ` Dariusz Sosnowski
  0 siblings, 0 replies; 30+ messages in thread
From: Dariusz Sosnowski @ 2022-10-06 11:21 UTC (permalink / raw)
  To: NBU-Contact-Thomas Monjalon (EXTERNAL)
  Cc: Ferruh Yigit, Andrew Rybchenko, dev

Hi Thomas,

> What is the benefit?

The goal of this patchset is to give application developers more options to fine-tune the hairpin configuration for their use case.
I added more details regarding the possible benefits and motivation to the cover letter of v2.

> How the user knows what to use?
> Why it is not automatic in the driver?

The basic assumption is that the default behavior of the PMD (mlx5 in this specific case) is the baseline for hairpin performance.
If that default performance is enough, the user will not have to change anything in the hairpin queue configuration.
If performance is not satisfactory, the user will have to experiment with different configurations; e.g., if decreasing traffic latency is a priority,
the user can check how their specific application behaves when `use_locked_device_memory` is set.

The specific performance gain of each hairpin option is not a given,
since hairpin performance is a function of both the hairpin configuration and the flow configuration (number of flows, types of flows, etc.).
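
For illustration, a minimal sketch (not part of the series) of how an application could gate the new flag on the reported capability before experimenting with it. The rx_cap.locked_device_memory field name follows the capability extension proposed here; check rte_ethdev.h of the release actually used.

#include <rte_ethdev.h>

/*
 * Sketch only: prefer locked device memory for a hairpin Rx queue when the
 * PMD advertises support for it, otherwise keep the PMD default placement.
 */
static int
hairpin_rxq_setup(uint16_t port_id, uint16_t queue_id, uint16_t nb_desc,
		  struct rte_eth_hairpin_conf *conf)
{
	struct rte_eth_hairpin_cap cap;

	if (rte_eth_dev_hairpin_capability_get(port_id, &cap) == 0 &&
	    cap.rx_cap.locked_device_memory)
		conf->use_locked_device_memory = 1;
	return rte_eth_rx_hairpin_queue_setup(port_id, queue_id, nb_desc, conf);
}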

> Isn't it too much low level for a user?

In my opinion, it's not too low level, since the purpose of these new options is fine-tuning hairpin performance for the specific use case of the application.

> > +      * - When set, PMD will use detault memory type as a backing
> > + storage. Please refer to PMD
> 
> You probably mean "clear".
> Please make lines shorter.
> You should split lines logically, after a dot or at the end of a part.

Fixed in v2.

> > +
> > +     uint32_t reserved:11; /**< Reserved bits. */
> 
> You can insert a blank line here.
> 
> >       struct rte_eth_hairpin_peer peers[RTE_ETH_MAX_HAIRPIN_PEERS];
> > };

Added in v2.

Best regards,
Dariusz Sosnowski


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 0/8] ethdev: introduce hairpin memory capabilities
  2022-10-06 11:00 ` [PATCH v2 0/8] " Dariusz Sosnowski
                     ` (7 preceding siblings ...)
  2022-10-06 11:01   ` [PATCH v2 8/8] app/flow-perf: add hairpin queue memory config Dariusz Sosnowski
@ 2022-10-08 16:31   ` Thomas Monjalon
  8 siblings, 0 replies; 30+ messages in thread
From: Thomas Monjalon @ 2022-10-08 16:31 UTC (permalink / raw)
  To: Dariusz Sosnowski
  Cc: Ferruh Yigit, Andrew Rybchenko, dev, Viacheslav Ovsiienko,
	Matan Azrad, Ori Kam, Wisam Jaddo, Aman Singh, Yuying Zhang

06/10/2022 13:00, Dariusz Sosnowski:
> The hairpin queues are used to transmit packets received on the wire, back to the wire.
> How hairpin queues are implemented and configured is decided internally by the PMD and
> applications have no control over the configuration of Rx and Tx hairpin queues.
> This patchset addresses that by:
> 
> - Extending hairpin queue capabilities reported by PMDs.
> - Exposing new configuration options for Rx and Tx hairpin queues.
> 
> Main goal of this patchset is to allow applications to provide configuration hints
> regarding memory placement of hairpin queues.
> These hints specify whether buffers of hairpin queues should be placed in host memory
> or in dedicated device memory.
> 
> For example, in context of NVIDIA Connect-X and BlueField devices,
> this distinction is important for several reasons:
> 
> - By default, data buffers and packet descriptors are placed in device memory region
>   which is shared with other resources (e.g. flow rules).
>   This results in memory contention on the device,
>   which may lead to degraded performance under heavy load.
> - Placing hairpin queues in dedicated device memory can decrease latency of hairpinned traffic,
>   since hairpin queue processing will not be memory starved by other operations.
>   Side effect of this memory configuration is that it leaves less memory for other resources,
>   possibly causing memory contention in non-hairpin traffic.
> - Placing hairpin queues in host memory can increase throughput of hairpinned
>   traffic at the cost of increasing latency.
>   Each packet processed by hairpin queues will incur additional PCI transactions (increase in latency),
>   but memory contention on the device is avoided.
> 
> Depending on the workload and whether throughput or latency has a higher priority for developers,
> it would be beneficial if developers could choose the best hairpin configuration for their use case.
> 
> To address that, this patchset adds the following configuration options (in rte_eth_hairpin_conf struct):
> 
> - use_locked_device_memory - If set, PMD will allocate specialized on-device memory for the queue.
> - use_rte_memory - If set, PMD will use DPDK-managed memory for the queue.
> - force_memory - If set, PMD will be forced to use provided memory configuration.
>   If no appropriate resources are available, the queue allocation will fail.
>   If unset and no appropriate resources are available, PMD will fallback to its default behavior.
> 
> Implementing support for these flags is optional and applications should be allowed to not set any of these new flags.
> This will result in default memory configuration provided by the PMD.
> Application developers should consult the PMD documentation in that case.
> 
> These changes were originally proposed in http://patches.dpdk.org/project/dpdk/patch/20220811120530.191683-1-dsosnowski@nvidia.com/.
> 
> Dariusz Sosnowski (8):
>   ethdev: introduce hairpin memory capabilities
>   common/mlx5: add hairpin SQ buffer type capabilities
>   common/mlx5: add hairpin RQ buffer type capabilities
>   net/mlx5: allow hairpin Tx queue in RTE memory
>   net/mlx5: allow hairpin Rx queue in locked memory
>   doc: add notes for hairpin to mlx5 documentation
>   app/testpmd: add hairpin queues memory modes
>   app/flow-perf: add hairpin queue memory config

Doc squashed in mlx5 commits.
Applied, thanks.
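
To illustrate the force_memory semantics described in the cover letter above, a minimal sketch (not taken from the series) of an application that requests locked device memory, insists on it, and retries with the PMD default if the request cannot be satisfied. port_id, queue_id, nb_desc and peer_queue are assumed to be set up as for a regular hairpin queue.

/*
 * Sketch only: insist on locked device memory (force_memory) and fall back
 * to the PMD default placement if the PMD cannot satisfy the request.
 */
struct rte_eth_hairpin_conf conf = {
	.peer_count = 1,
	.use_locked_device_memory = 1,
	.force_memory = 1,
};
int ret;

conf.peers[0].port = port_id;
conf.peers[0].queue = peer_queue;
ret = rte_eth_rx_hairpin_queue_setup(port_id, queue_id, nb_desc, &conf);
if (ret != 0) {
	/* Forced placement is not available: retry with the PMD default. */
	conf.use_locked_device_memory = 0;
	conf.force_memory = 0;
	ret = rte_eth_rx_hairpin_queue_setup(port_id, queue_id, nb_desc, &conf);
}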



^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v2 8/8] app/flow-perf: add hairpin queue memory config
  2022-10-06 11:01   ` [PATCH v2 8/8] app/flow-perf: add hairpin queue memory config Dariusz Sosnowski
@ 2022-10-15 16:30     ` Wisam Monther
  0 siblings, 0 replies; 30+ messages in thread
From: Wisam Monther @ 2022-10-15 16:30 UTC (permalink / raw)
  To: Dariusz Sosnowski; +Cc: dev

Hi,

> -----Original Message-----
> From: Dariusz Sosnowski <dsosnowski@nvidia.com>
> Sent: Thursday, October 6, 2022 2:01 PM
> To: Wisam Monther <wisamm@nvidia.com>
> Cc: dev@dpdk.org
> Subject: [PATCH v2 8/8] app/flow-perf: add hairpin queue memory config
> 
> This patch adds the hairpin-conf command line parameter to flow-perf
> application. hairpin-conf parameter takes a hexadecimal bitmask with bits
> having the following meaning:
> 
> - Bit 0 - Force memory settings of hairpin RX queue.
> - Bit 1 - Force memory settings of hairpin TX queue.
> - Bit 4 - Use locked device memory for hairpin RX queue.
> - Bit 5 - Use RTE memory for hairpin RX queue.
> - Bit 8 - Use locked device memory for hairpin TX queue.
> - Bit 9 - Use RTE memory for hairpin TX queue.
> 
> Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>

Acked-by: Wisam Jaddo <wisamm@nvidia.com>

BRs,
Wisam Jaddo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/7] net/mlx5: allow hairpin Rx queue in locked memory
  2022-09-19 16:37 ` [PATCH 5/7] net/mlx5: allow hairpin Rx queue in locked memory Dariusz Sosnowski
  2022-09-27 13:04   ` Slava Ovsiienko
@ 2022-11-25 14:06   ` Kenneth Klette Jonassen
  1 sibling, 0 replies; 30+ messages in thread
From: Kenneth Klette Jonassen @ 2022-11-25 14:06 UTC (permalink / raw)
  To: Dariusz Sosnowski; +Cc: Matan Azrad, Viacheslav Ovsiienko, dev

This series adds support for using device-managed MEMIC buffers on
hairpin RQ instead of NIM. Was it ever considered as an alternative
that the UMEM interface be extended to support MEMIC buffers instead?

I'm thinking that could simplify the hairpin-specific firmware bits
being added in this series, e.g. no new HAIRPIN_DATA_BUFFER_LOCK TLV,
and the MEMIC-backed UMEM can be passed to RQ using existing PRM bits.

I'm planning to file a feature request adding MEMIC support to UMEM,
so I'd be interested in knowing if that's somehow not possible. My
current use case is allocating 64 bytes of MEMIC for a collapsed CQE
for something similar to the mlx5 packet send scheduling in DPDK.

Best regards,
Kenneth Jonassen

> On 19 Sep 2022, at 18:37, Dariusz Sosnowski <dsosnowski@nvidia.com> wrote:
> 
> This patch adds a capability to place hairpin Rx queue in locked device
> memory. This capability is equivalent to storing hairpin RQ's data
> buffers in locked internal device memory.
> 
> Hairpin Rx queue creation is extended with requesting that RQ is
> allocated in locked internal device memory. If allocation fails and
> force_memory hairpin configuration is set, then hairpin queue creation
> (and, as a result, device start) fails. If force_memory is unset, then
> PMD will fallback to allocating memory for hairpin RQ in unlocked
> internal device memory.
> 
> To allow such allocation, the user must set HAIRPIN_DATA_BUFFER_LOCK
> flag in FW using mlxconfig tool.
> 
> Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
> ---
> doc/guides/platform/mlx5.rst   |  5 ++++
> drivers/net/mlx5/mlx5_devx.c   | 51 ++++++++++++++++++++++++++++------
> drivers/net/mlx5/mlx5_ethdev.c |  2 ++
> 3 files changed, 49 insertions(+), 9 deletions(-)
> 
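
For reference, enabling that firmware option is done with the mlxconfig tool, along the lines of the command below (the exact device argument and reset procedure depend on the setup; see the mlx5 platform guide updated in this series). A firmware reset or cold reboot is typically required afterwards for the new NV configuration to take effect.

    mlxconfig -d <PCI_BDF> set HAIRPIN_DATA_BUFFER_LOCK=1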


^ permalink raw reply	[flat|nested] 30+ messages in thread

Thread overview: 30+ messages
2022-09-19 16:37 [PATCH 0/7] ethdev: introduce hairpin memory capabilities Dariusz Sosnowski
2022-09-19 16:37 ` [PATCH 1/7] " Dariusz Sosnowski
2022-10-04 16:50   ` Thomas Monjalon
2022-10-06 11:21     ` Dariusz Sosnowski
2022-09-19 16:37 ` [PATCH 2/7] common/mlx5: add hairpin SQ buffer type capabilities Dariusz Sosnowski
2022-09-27 13:03   ` Slava Ovsiienko
2022-09-19 16:37 ` [PATCH 3/7] common/mlx5: add hairpin RQ " Dariusz Sosnowski
2022-09-27 13:04   ` Slava Ovsiienko
2022-09-19 16:37 ` [PATCH 4/7] net/mlx5: allow hairpin Tx queue in RTE memory Dariusz Sosnowski
2022-09-27 13:05   ` Slava Ovsiienko
2022-09-19 16:37 ` [PATCH 5/7] net/mlx5: allow hairpin Rx queue in locked memory Dariusz Sosnowski
2022-09-27 13:04   ` Slava Ovsiienko
2022-11-25 14:06   ` Kenneth Klette Jonassen
2022-09-19 16:37 ` [PATCH 6/7] app/testpmd: add hairpin queues memory modes Dariusz Sosnowski
2022-09-19 16:37 ` [PATCH 7/7] app/flow-perf: add hairpin queue memory config Dariusz Sosnowski
2022-10-04 12:24   ` Wisam Monther
2022-10-06 11:06     ` Dariusz Sosnowski
2022-10-04 16:44 ` [PATCH 0/7] ethdev: introduce hairpin memory capabilities Thomas Monjalon
2022-10-06 11:08   ` Dariusz Sosnowski
2022-10-06 11:00 ` [PATCH v2 0/8] " Dariusz Sosnowski
2022-10-06 11:00   ` [PATCH v2 1/8] " Dariusz Sosnowski
2022-10-06 11:00   ` [PATCH v2 2/8] common/mlx5: add hairpin SQ buffer type capabilities Dariusz Sosnowski
2022-10-06 11:01   ` [PATCH v2 3/8] common/mlx5: add hairpin RQ " Dariusz Sosnowski
2022-10-06 11:01   ` [PATCH v2 4/8] net/mlx5: allow hairpin Tx queue in RTE memory Dariusz Sosnowski
2022-10-06 11:01   ` [PATCH v2 5/8] net/mlx5: allow hairpin Rx queue in locked memory Dariusz Sosnowski
2022-10-06 11:01   ` [PATCH v2 6/8] doc: add notes for hairpin to mlx5 documentation Dariusz Sosnowski
2022-10-06 11:01   ` [PATCH v2 7/8] app/testpmd: add hairpin queues memory modes Dariusz Sosnowski
2022-10-06 11:01   ` [PATCH v2 8/8] app/flow-perf: add hairpin queue memory config Dariusz Sosnowski
2022-10-15 16:30     ` Wisam Monther
2022-10-08 16:31   ` [PATCH v2 0/8] ethdev: introduce hairpin memory capabilities Thomas Monjalon
