DPDK patches and discussions
* [RFC 00/15] Add vDPA multi-threads optimization
@ 2022-04-08  7:55 Li Zhang
  2022-04-08  7:55 ` [RFC 01/15] examples/vdpa: fix vDPA device remove Li Zhang
                   ` (19 more replies)
  0 siblings, 20 replies; 137+ messages in thread
From: Li Zhang @ 2022-04-08  7:55 UTC (permalink / raw)
  To: matan, viacheslavo, orika, thomas; +Cc: dev, rasland

Allow the driver to use internal threads to
speed up configuration.
All the threads are opened on the same core as
the event completion queue scheduling thread.

Add a max_conf_threads parameter to configure
the maximum number of internal threads in addition to
the caller thread (8 is suggested).
These internal threads pipeline the handling of vDPA tasks
in the system and are shared by all vDPA devices.
The default is 0: do not use internal threads for configuration.
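
As an illustration only (the PCI address and application name are
placeholders; the exact devarg syntax follows the mlx5 vDPA guide),
the new parameter would be passed as a device argument, e.g.:

    dpdk-vdpa -a 0000:08:00.2,class=vdpa,max_conf_threads=8 ...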

Depends-on: series=21868 ("vdpa/mlx5: improve device shutdown time")
http://patchwork.dpdk.org/project/dpdk/list/?series=21868

Li Zhang (11):
  common/mlx5: extend virtq modifiable fields
  vdpa/mlx5: pre-create virtq in the probe
  vdpa/mlx5: optimize datapath-control synchronization
  vdpa/mlx5: add multi-thread management for configuration
  vdpa/mlx5: add task ring for MT management
  vdpa/mlx5: add MT task for VM memory registration
  vdpa/mlx5: add virtq creation task for MT management
  vdpa/mlx5: add virtq LM log task
  vdpa/mlx5: add device close task
  vdpa/mlx5: add virtq sub-resources creation
  vdpa/mlx5: prepare virtqueue resource creation

Yajun Wu (4):
  examples/vdpa: fix vDPA device remove
  vdpa/mlx5: support pre create virtq resource
  common/mlx5: add DevX API to move QP to reset state
  vdpa/mlx5: support event qp reuse

 doc/guides/vdpadevs/mlx5.rst          |  25 ++
 drivers/common/mlx5/mlx5_devx_cmds.c  |  77 +++-
 drivers/common/mlx5/mlx5_devx_cmds.h  |   6 +-
 drivers/common/mlx5/mlx5_prm.h        |  30 +-
 drivers/vdpa/mlx5/meson.build         |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.c         | 227 +++++++++-
 drivers/vdpa/mlx5/mlx5_vdpa.h         | 147 ++++++-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 362 ++++++++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_event.c   | 160 +++++--
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c      | 133 ++++--
 drivers/vdpa/mlx5/mlx5_vdpa_mem.c     | 268 ++++++++----
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c   |  20 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 582 ++++++++++++++++++--------
 examples/vdpa/main.c                  |   4 +
 14 files changed, 1674 insertions(+), 368 deletions(-)
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c

-- 
2.27.0


* [RFC 01/15] examples/vdpa: fix vDPA device remove
  2022-04-08  7:55 [RFC 00/15] Add vDPA multi-threads optimization Li Zhang
@ 2022-04-08  7:55 ` Li Zhang
  2022-04-08  7:55 ` [RFC 02/15] vdpa/mlx5: support pre create virtq resource Li Zhang
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-04-08  7:55 UTC (permalink / raw)
  To: matan, viacheslavo, orika, thomas, Maxime Coquelin, Chenbo Xia,
	Xiaolong Ye, Xiao Wang
  Cc: dev, rasland, Yajun Wu, stable

From: Yajun Wu <yajunw@nvidia.com>

Call rte_dev_remove on vDPA example application exit. Otherwise,
rte_dev_remove never gets called.

Fixes: edbed86d1cc ("examples/vdpa: introduce a new sample for vDPA")
Cc: stable@dpdk.org

Signed-off-by: Yajun Wu <yajunw@nvidia.com>
---
 examples/vdpa/main.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/examples/vdpa/main.c b/examples/vdpa/main.c
index bd66deca85..19753f6e09 100644
--- a/examples/vdpa/main.c
+++ b/examples/vdpa/main.c
@@ -593,6 +593,10 @@ main(int argc, char *argv[])
 		vdpa_sample_quit();
 	}
 
+	RTE_DEV_FOREACH(dev, "class=vdpa", &dev_iter) {
+		rte_dev_remove(dev);
+	}
+
 	/* clean up the EAL */
 	rte_eal_cleanup();
 
-- 
2.27.0


* [RFC 02/15] vdpa/mlx5: support pre create virtq resource
  2022-04-08  7:55 [RFC 00/15] Add vDPA multi-threads optimization Li Zhang
  2022-04-08  7:55 ` [RFC 01/15] examples/vdpa: fix vDPA device remove Li Zhang
@ 2022-04-08  7:55 ` Li Zhang
  2022-04-08  7:55 ` [RFC 03/15] common/mlx5: add DevX API to move QP to reset state Li Zhang
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-04-08  7:55 UTC (permalink / raw)
  To: matan, viacheslavo, orika, thomas; +Cc: dev, rasland, Yajun Wu

From: Yajun Wu <yajunw@nvidia.com>

The motivation of this change is to reduce vDPA device queue creation
time by creating some queue resources at the vDPA device probe stage.

In the VM live migration scenario, this saves 0.8 ms for each queue
creation and thus reduces LM network downtime.

To create queue resources (umem/counter) in advance, we need to know
the virtio queue depth and the maximum number of queues the VM will use.

Introduce two new devargs: queues (maximum number of queue pairs) and
queue_size (queue depth). Both arguments must be provided; if only one
is provided, it is ignored and no resources are pre-created.

The queues and queue_size values must also match the vhost configuration
the driver receives later. Otherwise the pre-created resources are either
wasted or insufficient, or they must be destroyed and recreated (in case
of a queue_size mismatch).

The pre-created umem/counter resources are kept alive until vDPA device
removal.
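
As an illustration only (the PCI address is a placeholder), both
devargs are supplied together so the driver can size the pre-created
umem/counter resources, e.g.:

    -a 0000:08:00.2,class=vdpa,queues=8,queue_size=256

The values must stay within the documented ranges (1-128 queue pairs,
1-1024 descriptors) and should match what vhost negotiates later.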

Signed-off-by: Yajun Wu <yajunw@nvidia.com>
---
 doc/guides/vdpadevs/mlx5.rst  | 14 +++++++
 drivers/vdpa/mlx5/mlx5_vdpa.c | 75 ++++++++++++++++++++++++++++++++++-
 drivers/vdpa/mlx5/mlx5_vdpa.h |  2 +
 3 files changed, 89 insertions(+), 2 deletions(-)

diff --git a/doc/guides/vdpadevs/mlx5.rst b/doc/guides/vdpadevs/mlx5.rst
index 3ded142311..0ad77bf535 100644
--- a/doc/guides/vdpadevs/mlx5.rst
+++ b/doc/guides/vdpadevs/mlx5.rst
@@ -101,6 +101,20 @@ for an additional list of options shared with other mlx5 drivers.
 
   - 0, HW default.
 
+- ``queue_size`` parameter [int]
+
+  - 1 - 1024, Virio Queue depth for pre-creating queue resource to speed up
+    first time queue creation. Set it together with queues devarg.
+
+  - 0, default value, no pre-create virtq resource.
+
+- ``queues`` parameter [int]
+
+  - 1 - 128, Max number of virio queue pair(including 1 rx queue and 1 tx queue)
+    for pre-create queue resource to speed up first time queue creation. Set it
+    together with queue_size devarg.
+
+  - 0, default value, no pre-create virtq resource.
 
 Error handling
 ^^^^^^^^^^^^^^
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 534ba64b02..57f9b05e35 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -244,7 +244,9 @@ mlx5_vdpa_mtu_set(struct mlx5_vdpa_priv *priv)
 static void
 mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv)
 {
-	mlx5_vdpa_virtqs_cleanup(priv);
+	/* Clean pre-created resource in dev removal only. */
+	if (!priv->queues)
+		mlx5_vdpa_virtqs_cleanup(priv);
 	mlx5_vdpa_mem_dereg(priv);
 }
 
@@ -494,6 +496,12 @@ mlx5_vdpa_args_check_handler(const char *key, const char *val, void *opaque)
 		priv->hw_max_latency_us = (uint32_t)tmp;
 	} else if (strcmp(key, "hw_max_pending_comp") == 0) {
 		priv->hw_max_pending_comp = (uint32_t)tmp;
+	} else if (strcmp(key, "queue_size") == 0) {
+		priv->queue_size = (uint16_t)tmp;
+	} else if (strcmp(key, "queues") == 0) {
+		priv->queues = (uint16_t)tmp;
+	} else {
+		DRV_LOG(WARNING, "Invalid key %s.", key);
 	}
 	return 0;
 }
@@ -524,9 +532,68 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 	if (!priv->event_us &&
 	    priv->event_mode == MLX5_VDPA_EVENT_MODE_DYNAMIC_TIMER)
 		priv->event_us = MLX5_VDPA_DEFAULT_TIMER_STEP_US;
+	if ((priv->queue_size && !priv->queues) ||
+		(!priv->queue_size && priv->queues)) {
+		priv->queue_size = 0;
+		priv->queues = 0;
+		DRV_LOG(WARNING, "Please provide both queue_size and queues.");
+	}
 	DRV_LOG(DEBUG, "event mode is %d.", priv->event_mode);
 	DRV_LOG(DEBUG, "event_us is %u us.", priv->event_us);
 	DRV_LOG(DEBUG, "no traffic max is %u.", priv->no_traffic_max);
+	DRV_LOG(DEBUG, "queues is %u, queue_size is %u.", priv->queues,
+		priv->queue_size);
+}
+
+static int
+mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
+{
+	uint32_t index;
+	uint32_t i;
+
+	if (!priv->queues)
+		return 0;
+	for (index = 0; index < (priv->queues * 2); ++index) {
+		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
+
+		if (priv->caps.queue_counters_valid) {
+			if (!virtq->counters)
+				virtq->counters =
+					mlx5_devx_cmd_create_virtio_q_counters
+						(priv->cdev->ctx);
+			if (!virtq->counters) {
+				DRV_LOG(ERR, "Failed to create virtq couners for virtq"
+					" %d.", index);
+				return -1;
+			}
+		}
+		for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
+			uint32_t size;
+			void *buf;
+			struct mlx5dv_devx_umem *obj;
+
+			size = priv->caps.umems[i].a * priv->queue_size +
+					priv->caps.umems[i].b;
+			buf = rte_zmalloc(__func__, size, 4096);
+			if (buf == NULL) {
+				DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
+						" %u.", i, index);
+				return -1;
+			}
+			obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx, buf,
+					size, IBV_ACCESS_LOCAL_WRITE);
+			if (obj == NULL) {
+				rte_free(buf);
+				DRV_LOG(ERR, "Failed to register umem %d for virtq %u.",
+						i, index);
+				return -1;
+			}
+			virtq->umems[i].size = size;
+			virtq->umems[i].buf = buf;
+			virtq->umems[i].obj = obj;
+		}
+	}
+	return 0;
 }
 
 static int
@@ -604,6 +671,8 @@ mlx5_vdpa_create_dev_resources(struct mlx5_vdpa_priv *priv)
 		return -rte_errno;
 	if (mlx5_vdpa_event_qp_global_prepare(priv))
 		return -rte_errno;
+	if (mlx5_vdpa_virtq_resource_prepare(priv))
+		return -rte_errno;
 	return 0;
 }
 
@@ -638,6 +707,7 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 		priv->num_lag_ports = 1;
 	pthread_mutex_init(&priv->vq_config_lock, NULL);
 	priv->cdev = cdev;
+	mlx5_vdpa_config_get(mkvlist, priv);
 	if (mlx5_vdpa_create_dev_resources(priv))
 		goto error;
 	priv->vdev = rte_vdpa_register_device(cdev->dev, &mlx5_vdpa_ops);
@@ -646,7 +716,6 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 		rte_errno = rte_errno ? rte_errno : EINVAL;
 		goto error;
 	}
-	mlx5_vdpa_config_get(mkvlist, priv);
 	SLIST_INIT(&priv->mr_list);
 	pthread_mutex_lock(&priv_list_lock);
 	TAILQ_INSERT_TAIL(&priv_list, priv, next);
@@ -684,6 +753,8 @@ mlx5_vdpa_release_dev_resources(struct mlx5_vdpa_priv *priv)
 {
 	uint32_t i;
 
+	if (priv->queues)
+		mlx5_vdpa_virtqs_cleanup(priv);
 	mlx5_vdpa_dev_cache_clean(priv);
 	for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
 		if (!priv->virtqs[i].counters)
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index e7f3319f89..f6719a3c60 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -135,6 +135,8 @@ struct mlx5_vdpa_priv {
 	uint8_t hw_latency_mode; /* Hardware CQ moderation mode. */
 	uint16_t hw_max_latency_us; /* Hardware CQ moderation period in usec. */
 	uint16_t hw_max_pending_comp; /* Hardware CQ moderation counter. */
+	uint16_t queue_size; /* virtq depth for pre-creating virtq resource */
+	uint16_t queues; /* Max virtq pair for pre-creating virtq resource */
 	struct rte_vdpa_device *vdev; /* vDPA device. */
 	struct mlx5_common_device *cdev; /* Backend mlx5 device. */
 	int vid; /* vhost device id. */
-- 
2.27.0


* [RFC 03/15] common/mlx5: add DevX API to move QP to reset state
  2022-04-08  7:55 [RFC 00/15] Add vDPA multi-threads optimization Li Zhang
  2022-04-08  7:55 ` [RFC 01/15] examples/vdpa: fix vDPA device remove Li Zhang
  2022-04-08  7:55 ` [RFC 02/15] vdpa/mlx5: support pre create virtq resource Li Zhang
@ 2022-04-08  7:55 ` Li Zhang
  2022-04-08  7:55 ` [RFC 04/15] vdpa/mlx5: support event qp reuse Li Zhang
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-04-08  7:55 UTC (permalink / raw)
  To: matan, viacheslavo, orika, thomas; +Cc: dev, rasland, Yajun Wu

From: Yajun Wu <yajunw@nvidia.com>

Support setting the QP to the RESET state.
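
A minimal caller-side sketch, mirroring how the vDPA driver uses the
new operation later in this series (error handling trimmed):

    /* Move both sides of an event QP pair back to RESET before reuse. */
    if (mlx5_devx_cmd_modify_qp_state(eqp->fw_qp, MLX5_CMD_OP_QP_2RST,
                                      eqp->sw_qp.qp->id))
            DRV_LOG(ERR, "Failed to modify FW QP to RST state.");
    if (mlx5_devx_cmd_modify_qp_state(eqp->sw_qp.qp, MLX5_CMD_OP_QP_2RST,
                                      eqp->fw_qp->id))
            DRV_LOG(ERR, "Failed to modify SW QP to RST state.");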

Signed-off-by: Yajun Wu <yajunw@nvidia.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c |  7 +++++++
 drivers/common/mlx5/mlx5_prm.h       | 17 +++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index d02ac2a678..a2943c9a58 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -2255,11 +2255,13 @@ mlx5_devx_cmd_modify_qp_state(struct mlx5_devx_obj *qp, uint32_t qp_st_mod_op,
 		uint32_t rst2init[MLX5_ST_SZ_DW(rst2init_qp_in)];
 		uint32_t init2rtr[MLX5_ST_SZ_DW(init2rtr_qp_in)];
 		uint32_t rtr2rts[MLX5_ST_SZ_DW(rtr2rts_qp_in)];
+		uint32_t qp2rst[MLX5_ST_SZ_DW(2rst_qp_in)];
 	} in;
 	union {
 		uint32_t rst2init[MLX5_ST_SZ_DW(rst2init_qp_out)];
 		uint32_t init2rtr[MLX5_ST_SZ_DW(init2rtr_qp_out)];
 		uint32_t rtr2rts[MLX5_ST_SZ_DW(rtr2rts_qp_out)];
+		uint32_t qp2rst[MLX5_ST_SZ_DW(2rst_qp_out)];
 	} out;
 	void *qpc;
 	int ret;
@@ -2302,6 +2304,11 @@ mlx5_devx_cmd_modify_qp_state(struct mlx5_devx_obj *qp, uint32_t qp_st_mod_op,
 		inlen = sizeof(in.rtr2rts);
 		outlen = sizeof(out.rtr2rts);
 		break;
+	case MLX5_CMD_OP_QP_2RST:
+		MLX5_SET(2rst_qp_in, &in, qpn, qp->id);
+		inlen = sizeof(in.qp2rst);
+		outlen = sizeof(out.qp2rst);
+		break;
 	default:
 		DRV_LOG(ERR, "Invalid or unsupported QP modify op %u.",
 			qp_st_mod_op);
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 44b18225f6..cca6bfc6d4 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -3653,6 +3653,23 @@ struct mlx5_ifc_init2init_qp_in_bits {
 	u8 reserved_at_800[0x80];
 };
 
+struct mlx5_ifc_2rst_qp_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_2rst_qp_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 vhca_tunnel_id[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_80[0x8];
+	u8 qpn[0x18];
+	u8 reserved_at_a0[0x20];
+};
+
 struct mlx5_ifc_dealloc_pd_out_bits {
 	u8 status[0x8];
 	u8 reserved_0[0x18];
-- 
2.27.0


* [RFC 04/15] vdpa/mlx5: support event qp reuse
  2022-04-08  7:55 [RFC 00/15] Add vDPA multi-threads optimization Li Zhang
                   ` (2 preceding siblings ...)
  2022-04-08  7:55 ` [RFC 03/15] common/mlx5: add DevX API to move QP to reset state Li Zhang
@ 2022-04-08  7:55 ` Li Zhang
  2022-04-08  7:55 ` [RFC 05/15] common/mlx5: extend virtq modifiable fields Li Zhang
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-04-08  7:55 UTC (permalink / raw)
  To: matan, viacheslavo, orika, thomas; +Cc: dev, rasland, Yajun Wu

From: Yajun Wu <yajunw@nvidia.com>

To speed up queue creation time, the event QP and CQ are created only
once. Each virtq creation reuses the same event QP and CQ.

Because FW sets the event QP to the error state during virtq destroy,
the event QP must be modified to the RESET state and then back to the
RTS state as usual. This saves about 1.5 ms for each virtq creation.

After a SW QP reset, the QP pi/ci both become 0 while the CQ pi/ci keep
their previous values. Add a new variable, qp_pi, to track the SW QP
index so the QP pi moves independently of the CQ ci.

Add a new function, mlx5_vdpa_drain_cq, to drain the CQ CQEs after virtq
release.
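
A condensed view of the index handling this patch introduces (taken
from the diff below, not a separate API): the SW QP doorbell used to be
derived from the CQ consumer index, but after a QP reset the QP indexes
restart at 0 while the CQ index does not, so a dedicated counter is
kept per event QP:

    /* Before: eqp->sw_qp.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci + cq_size); */
    eqp->qp_pi += comp;   /* advance by the number of polled completions */
    eqp->sw_qp.db_rec[0] = rte_cpu_to_be_32(eqp->qp_pi + cq_size);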

Signed-off-by: Yajun Wu <yajunw@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c       |  8 ++++
 drivers/vdpa/mlx5/mlx5_vdpa.h       | 12 +++++-
 drivers/vdpa/mlx5/mlx5_vdpa_event.c | 60 +++++++++++++++++++++++++++--
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c |  6 +--
 4 files changed, 78 insertions(+), 8 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 57f9b05e35..03ad01c156 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -269,6 +269,7 @@ mlx5_vdpa_dev_close(int vid)
 	}
 	mlx5_vdpa_steer_unset(priv);
 	mlx5_vdpa_virtqs_release(priv);
+	mlx5_vdpa_drain_cq(priv);
 	if (priv->lm_mr.addr)
 		mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
 	priv->state = MLX5_VDPA_STATE_PROBED;
@@ -555,7 +556,14 @@ mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 		return 0;
 	for (index = 0; index < (priv->queues * 2); ++index) {
 		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
+		int ret = mlx5_vdpa_event_qp_prepare(priv, priv->queue_size,
+					-1, &virtq->eqp);
 
+		if (ret) {
+			DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
+				index);
+			return -1;
+		}
 		if (priv->caps.queue_counters_valid) {
 			if (!virtq->counters)
 				virtq->counters =
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index f6719a3c60..bf82026e37 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -55,6 +55,7 @@ struct mlx5_vdpa_event_qp {
 	struct mlx5_vdpa_cq cq;
 	struct mlx5_devx_obj *fw_qp;
 	struct mlx5_devx_qp sw_qp;
+	uint16_t qp_pi;
 };
 
 struct mlx5_vdpa_query_mr {
@@ -226,7 +227,7 @@ int mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv);
  * @return
  *   0 on success, -1 otherwise and rte_errno is set.
  */
-int mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
+int mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 			      int callfd, struct mlx5_vdpa_event_qp *eqp);
 
 /**
@@ -479,4 +480,13 @@ mlx5_vdpa_virtq_stats_get(struct mlx5_vdpa_priv *priv, int qid,
  */
 int
 mlx5_vdpa_virtq_stats_reset(struct mlx5_vdpa_priv *priv, int qid);
+
+/**
+ * Drain virtq CQ CQE.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ */
+void
+mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index 7167a98db0..b43dca9255 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -137,7 +137,7 @@ mlx5_vdpa_cq_poll(struct mlx5_vdpa_cq *cq)
 		};
 		uint32_t word;
 	} last_word;
-	uint16_t next_wqe_counter = cq->cq_ci;
+	uint16_t next_wqe_counter = eqp->qp_pi;
 	uint16_t cur_wqe_counter;
 	uint16_t comp;
 
@@ -156,9 +156,10 @@ mlx5_vdpa_cq_poll(struct mlx5_vdpa_cq *cq)
 		rte_io_wmb();
 		/* Ring CQ doorbell record. */
 		cq->cq_obj.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci);
+		eqp->qp_pi += comp;
 		rte_io_wmb();
 		/* Ring SW QP doorbell record. */
-		eqp->sw_qp.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci + cq_size);
+		eqp->sw_qp.db_rec[0] = rte_cpu_to_be_32(eqp->qp_pi + cq_size);
 	}
 	return comp;
 }
@@ -232,6 +233,25 @@ mlx5_vdpa_queues_complete(struct mlx5_vdpa_priv *priv)
 	return max;
 }
 
+void
+mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv)
+{
+	unsigned int i;
+
+	for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
+		struct mlx5_vdpa_cq *cq = &priv->virtqs[i].eqp.cq;
+
+		mlx5_vdpa_queue_complete(cq);
+		if (cq->cq_obj.cq) {
+			cq->cq_obj.cqes[0].wqe_counter =
+				rte_cpu_to_be_16(UINT16_MAX);
+			priv->virtqs[i].eqp.qp_pi = 0;
+			if (!cq->armed)
+				mlx5_vdpa_cq_arm(priv, cq);
+		}
+	}
+}
+
 /* Wait on all CQs channel for completion event. */
 static struct mlx5_vdpa_cq *
 mlx5_vdpa_event_wait(struct mlx5_vdpa_priv *priv __rte_unused)
@@ -574,14 +594,44 @@ mlx5_vdpa_qps2rts(struct mlx5_vdpa_event_qp *eqp)
 	return 0;
 }
 
+static int
+mlx5_vdpa_qps2rst2rts(struct mlx5_vdpa_event_qp *eqp)
+{
+	if (mlx5_devx_cmd_modify_qp_state(eqp->fw_qp, MLX5_CMD_OP_QP_2RST,
+					  eqp->sw_qp.qp->id)) {
+		DRV_LOG(ERR, "Failed to modify FW QP to RST state(%u).",
+			rte_errno);
+		return -1;
+	}
+	if (mlx5_devx_cmd_modify_qp_state(eqp->sw_qp.qp,
+			MLX5_CMD_OP_QP_2RST, eqp->fw_qp->id)) {
+		DRV_LOG(ERR, "Failed to modify SW QP to RST state(%u).",
+			rte_errno);
+		return -1;
+	}
+	return mlx5_vdpa_qps2rts(eqp);
+}
+
 int
-mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
+mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 			  int callfd, struct mlx5_vdpa_event_qp *eqp)
 {
 	struct mlx5_devx_qp_attr attr = {0};
 	uint16_t log_desc_n = rte_log2_u32(desc_n);
 	uint32_t ret;
 
+	if (eqp->cq.cq_obj.cq != NULL && log_desc_n == eqp->cq.log_desc_n) {
+		/* Reuse existing resources. */
+		eqp->cq.callfd = callfd;
+		/* FW will set event qp to error state in q destroy. */
+		if (!mlx5_vdpa_qps2rst2rts(eqp)) {
+			rte_write32(rte_cpu_to_be_32(RTE_BIT32(log_desc_n)),
+					&eqp->sw_qp.db_rec[0]);
+			return 0;
+		}
+	}
+	if (eqp->fw_qp)
+		mlx5_vdpa_event_qp_destroy(eqp);
 	if (mlx5_vdpa_cq_create(priv, log_desc_n, callfd, &eqp->cq))
 		return -1;
 	attr.pd = priv->cdev->pdn;
@@ -608,8 +658,10 @@ mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 	}
 	if (mlx5_vdpa_qps2rts(eqp))
 		goto error;
+	eqp->qp_pi = 0;
 	/* First ringing. */
-	rte_write32(rte_cpu_to_be_32(RTE_BIT32(log_desc_n)),
+	if (eqp->sw_qp.db_rec)
+		rte_write32(rte_cpu_to_be_32(RTE_BIT32(log_desc_n)),
 			&eqp->sw_qp.db_rec[0]);
 	return 0;
 error:
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index e025be47d2..28cef69a58 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -87,6 +87,8 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 			}
 			virtq->umems[j].size = 0;
 		}
+		if (virtq->eqp.fw_qp)
+			mlx5_vdpa_event_qp_destroy(&virtq->eqp);
 	}
 }
 
@@ -117,8 +119,6 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 		claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
 	}
 	virtq->virtq = NULL;
-	if (virtq->eqp.fw_qp)
-		mlx5_vdpa_event_qp_destroy(&virtq->eqp);
 	virtq->notifier_state = MLX5_VDPA_NOTIFIER_STATE_DISABLED;
 	return 0;
 }
@@ -246,7 +246,7 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 						      MLX5_VIRTQ_EVENT_MODE_QP :
 						  MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
 	if (attr.event_mode == MLX5_VIRTQ_EVENT_MODE_QP) {
-		ret = mlx5_vdpa_event_qp_create(priv, vq.size, vq.callfd,
+		ret = mlx5_vdpa_event_qp_prepare(priv, vq.size, vq.callfd,
 						&virtq->eqp);
 		if (ret) {
 			DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
-- 
2.27.0


* [RFC 05/15] common/mlx5: extend virtq modifiable fields
  2022-04-08  7:55 [RFC 00/15] Add vDPA multi-threads optimization Li Zhang
                   ` (3 preceding siblings ...)
  2022-04-08  7:55 ` [RFC 04/15] vdpa/mlx5: support event qp reuse Li Zhang
@ 2022-04-08  7:55 ` Li Zhang
  2022-04-08  7:55 ` [RFC 06/15] vdpa/mlx5: pre-create virtq in the probe Li Zhang
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-04-08  7:55 UTC (permalink / raw)
  To: matan, viacheslavo, orika, thomas; +Cc: dev, rasland

A virtq configuration can be modified after the virtq creation.
Add the following modifiable fields (a combined-usage sketch is shown
after this list):
1. Address fields: desc_addr/used_addr/available_addr
2. hw_available_index
3. hw_used_index
4. virtio_q_type
5. version type
6. queue mkey
7. feature bit mask: tso_ipv4/tso_ipv6/tx_csum/rx_csum
8. event mode: event_mode/event_qpn_or_msix
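
A caller-side sketch combining several of the new flags in one modify
command (the local variables desc_gpa, used_gpa, avail_gpa, index,
last_avail_idx and last_used_idx are placeholders, not part of this
patch):

    struct mlx5_devx_virtq_attr attr = {
            .queue_index = index,
            .mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_ADDR |
                                 MLX5_VIRTQ_MODIFY_TYPE_HW_AVAILABLE_INDEX |
                                 MLX5_VIRTQ_MODIFY_TYPE_HW_USED_INDEX,
            .desc_addr = desc_gpa,
            .used_addr = used_gpa,
            .available_addr = avail_gpa,
            .hw_available_index = last_avail_idx,
            .hw_used_index = last_used_idx,
    };

    if (mlx5_devx_cmd_modify_virtq(virtq->virtq, &attr))
            DRV_LOG(ERR, "Failed to modify virtq %d.", index);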

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c | 70 +++++++++++++++++++++++-----
 drivers/common/mlx5/mlx5_devx_cmds.h |  6 ++-
 drivers/common/mlx5/mlx5_prm.h       | 13 +++++-
 3 files changed, 76 insertions(+), 13 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index a2943c9a58..fd5b5dd378 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -545,6 +545,15 @@ mlx5_devx_cmd_query_hca_vdpa_attr(void *ctx,
 		vdpa_attr->log_doorbell_stride =
 			MLX5_GET(virtio_emulation_cap, hcattr,
 				 log_doorbell_stride);
+		vdpa_attr->vnet_modify_ext =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 vnet_modify_ext);
+		vdpa_attr->virtio_net_q_addr_modify =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 virtio_net_q_addr_modify);
+		vdpa_attr->virtio_q_index_modify =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 virtio_q_index_modify);
 		vdpa_attr->log_doorbell_bar_size =
 			MLX5_GET(virtio_emulation_cap, hcattr,
 				 log_doorbell_bar_size);
@@ -2065,27 +2074,66 @@ mlx5_devx_cmd_modify_virtq(struct mlx5_devx_obj *virtq_obj,
 	MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_type,
 		 MLX5_GENERAL_OBJ_TYPE_VIRTQ);
 	MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_id, virtq_obj->id);
-	MLX5_SET64(virtio_net_q, virtq, modify_field_select, attr->type);
+	MLX5_SET64(virtio_net_q, virtq, modify_field_select,
+		attr->mod_fields_bitmap);
 	MLX5_SET16(virtio_q, virtctx, queue_index, attr->queue_index);
-	switch (attr->type) {
-	case MLX5_VIRTQ_MODIFY_TYPE_STATE:
+	if (!attr->mod_fields_bitmap) {
+		DRV_LOG(ERR, "Failed to modify VIRTQ for no type set.");
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_STATE)
 		MLX5_SET16(virtio_net_q, virtq, state, attr->state);
-		break;
-	case MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS:
+	if (attr->mod_fields_bitmap &
+	    MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS) {
 		MLX5_SET(virtio_net_q, virtq, dirty_bitmap_mkey,
 			 attr->dirty_bitmap_mkey);
 		MLX5_SET64(virtio_net_q, virtq, dirty_bitmap_addr,
 			 attr->dirty_bitmap_addr);
 		MLX5_SET(virtio_net_q, virtq, dirty_bitmap_size,
 			 attr->dirty_bitmap_size);
-		break;
-	case MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE:
+	}
+	if (attr->mod_fields_bitmap &
+	    MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE)
 		MLX5_SET(virtio_net_q, virtq, dirty_bitmap_dump_enable,
 			 attr->dirty_bitmap_dump_enable);
-		break;
-	default:
-		rte_errno = EINVAL;
-		return -rte_errno;
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_QUEUE_PERIOD) {
+		MLX5_SET(virtio_q, virtctx, queue_period_mode,
+			attr->hw_latency_mode);
+		MLX5_SET(virtio_q, virtctx, queue_period_us,
+			attr->hw_max_latency_us);
+		MLX5_SET(virtio_q, virtctx, queue_max_count,
+			attr->hw_max_pending_comp);
+	}
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_ADDR) {
+		MLX5_SET64(virtio_q, virtctx, desc_addr, attr->desc_addr);
+		MLX5_SET64(virtio_q, virtctx, used_addr, attr->used_addr);
+		MLX5_SET64(virtio_q, virtctx, available_addr,
+			attr->available_addr);
+	}
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_HW_AVAILABLE_INDEX)
+		MLX5_SET16(virtio_net_q, virtq, hw_available_index,
+		   attr->hw_available_index);
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_HW_USED_INDEX)
+		MLX5_SET16(virtio_net_q, virtq, hw_used_index,
+			attr->hw_used_index);
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_Q_TYPE)
+		MLX5_SET16(virtio_q, virtctx, virtio_q_type, attr->q_type);
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_VERSION_1_0)
+		MLX5_SET16(virtio_q, virtctx, virtio_version_1_0,
+		   attr->virtio_version_1_0);
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_Q_MKEY)
+		MLX5_SET(virtio_q, virtctx, virtio_q_mkey, attr->mkey);
+	if (attr->mod_fields_bitmap &
+		MLX5_VIRTQ_MODIFY_TYPE_QUEUE_FEATURE_BIT_MASK) {
+		MLX5_SET16(virtio_net_q, virtq, tso_ipv4, attr->tso_ipv4);
+		MLX5_SET16(virtio_net_q, virtq, tso_ipv6, attr->tso_ipv6);
+		MLX5_SET16(virtio_net_q, virtq, tx_csum, attr->tx_csum);
+		MLX5_SET16(virtio_net_q, virtq, rx_csum, attr->rx_csum);
+	}
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_EVENT_MODE) {
+		MLX5_SET16(virtio_q, virtctx, event_mode, attr->event_mode);
+		MLX5_SET(virtio_q, virtctx, event_qpn_or_msix, attr->qp_id);
 	}
 	ret = mlx5_glue->devx_obj_modify(virtq_obj->obj, in, sizeof(in),
 					 out, sizeof(out));
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index 1bac18c59d..d93be8fe2c 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -74,6 +74,9 @@ struct mlx5_hca_vdpa_attr {
 	uint32_t log_doorbell_stride:5;
 	uint32_t log_doorbell_bar_size:5;
 	uint32_t queue_counters_valid:1;
+	uint32_t vnet_modify_ext:1;
+	uint32_t virtio_net_q_addr_modify:1;
+	uint32_t virtio_q_index_modify:1;
 	uint32_t max_num_virtio_queues;
 	struct {
 		uint32_t a;
@@ -464,7 +467,7 @@ struct mlx5_devx_virtq_attr {
 	uint32_t tis_id;
 	uint32_t counters_obj_id;
 	uint64_t dirty_bitmap_addr;
-	uint64_t type;
+	uint64_t mod_fields_bitmap;
 	uint64_t desc_addr;
 	uint64_t used_addr;
 	uint64_t available_addr;
@@ -474,6 +477,7 @@ struct mlx5_devx_virtq_attr {
 		uint64_t offset;
 	} umems[3];
 	uint8_t error_type;
+	uint8_t q_type;
 };
 
 
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index cca6bfc6d4..4cc1427b9b 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -1798,7 +1798,9 @@ struct mlx5_ifc_virtio_emulation_cap_bits {
 	u8 virtio_queue_type[0x8];
 	u8 reserved_at_20[0x13];
 	u8 log_doorbell_stride[0x5];
-	u8 reserved_at_3b[0x3];
+	u8 vnet_modify_ext[0x1];
+	u8 virtio_net_q_addr_modify[0x1];
+	u8 virtio_q_index_modify[0x1];
 	u8 log_doorbell_bar_size[0x5];
 	u8 doorbell_bar_offset[0x40];
 	u8 reserved_at_80[0x8];
@@ -3020,6 +3022,15 @@ enum {
 	MLX5_VIRTQ_MODIFY_TYPE_STATE = (1UL << 0),
 	MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS = (1UL << 3),
 	MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE = (1UL << 4),
+	MLX5_VIRTQ_MODIFY_TYPE_QUEUE_PERIOD = (1UL << 5),
+	MLX5_VIRTQ_MODIFY_TYPE_ADDR = (1UL << 6),
+	MLX5_VIRTQ_MODIFY_TYPE_HW_AVAILABLE_INDEX = (1UL << 7),
+	MLX5_VIRTQ_MODIFY_TYPE_HW_USED_INDEX = (1UL << 8),
+	MLX5_VIRTQ_MODIFY_TYPE_Q_TYPE = (1UL << 9),
+	MLX5_VIRTQ_MODIFY_TYPE_VERSION_1_0 = (1UL << 10),
+	MLX5_VIRTQ_MODIFY_TYPE_Q_MKEY = (1UL << 11),
+	MLX5_VIRTQ_MODIFY_TYPE_QUEUE_FEATURE_BIT_MASK = (1UL << 12),
+	MLX5_VIRTQ_MODIFY_TYPE_EVENT_MODE = (1UL << 13),
 };
 
 struct mlx5_ifc_virtio_q_bits {
-- 
2.27.0


* [RFC 06/15] vdpa/mlx5: pre-create virtq in the probe
  2022-04-08  7:55 [RFC 00/15] Add vDPA multi-threads optimization Li Zhang
                   ` (4 preceding siblings ...)
  2022-04-08  7:55 ` [RFC 05/15] common/mlx5: extend virtq modifiable fields Li Zhang
@ 2022-04-08  7:55 ` Li Zhang
  2022-04-08  7:55 ` [RFC 07/15] vdpa/mlx5: optimize datapath-control synchronization Li Zhang
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-04-08  7:55 UTC (permalink / raw)
  To: matan, viacheslavo, orika, thomas; +Cc: dev, rasland

The dev_config operation is called during the LM process.
LM time is very critical because all
the VM packets are dropped directly at that time.

Move the virtq creation to probe time and
only modify the configuration later in
the dev_config stage, using the new ability
to modify the virtq.

This optimization accelerates the LM process and
reduces its time by 70%.
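
Schematically, the dev_config flow becomes (a condensed sketch of what
the diff below implements, not a separate API):

    if (!virtq->virtq) {
            /* No pre-created object: create the virtq now, as before. */
            virtq->virtq = mlx5_devx_cmd_create_virtq(priv->cdev->ctx, &attr);
            attr.mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_STATE;
    }
    /* Pre-created at probe time: only modify state/addresses/indexes. */
    attr.state = MLX5_VIRTQ_STATE_RDY;
    ret = mlx5_devx_cmd_modify_virtq(virtq->virtq, &attr);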

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.h       |   4 +
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c    |  13 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 257 +++++++++++++++++-----------
 3 files changed, 170 insertions(+), 104 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index bf82026e37..e5553079fe 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -80,6 +80,7 @@ struct mlx5_vdpa_virtq {
 	uint16_t vq_size;
 	uint8_t notifier_state;
 	bool stopped;
+	uint32_t configured:1;
 	uint32_t version;
 	struct mlx5_vdpa_priv *priv;
 	struct mlx5_devx_obj *virtq;
@@ -489,4 +490,7 @@ mlx5_vdpa_virtq_stats_reset(struct mlx5_vdpa_priv *priv, int qid);
  */
 void
 mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv);
+
+bool
+mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
index 43a2b98255..a8faf0c116 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
@@ -12,14 +12,17 @@ int
 mlx5_vdpa_logging_enable(struct mlx5_vdpa_priv *priv, int enable)
 {
 	struct mlx5_devx_virtq_attr attr = {
-		.type = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE,
+		.mod_fields_bitmap =
+			MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE,
 		.dirty_bitmap_dump_enable = enable,
 	};
+	struct mlx5_vdpa_virtq *virtq;
 	int i;
 
 	for (i = 0; i < priv->nr_virtqs; ++i) {
 		attr.queue_index = i;
-		if (!priv->virtqs[i].virtq) {
+		virtq = &priv->virtqs[i];
+		if (!virtq->configured) {
 			DRV_LOG(DEBUG, "virtq %d is invalid for dirty bitmap "
 				"enabling.", i);
 		} else if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
@@ -37,10 +40,11 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
 			   uint64_t log_size)
 {
 	struct mlx5_devx_virtq_attr attr = {
-		.type = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS,
+		.mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS,
 		.dirty_bitmap_addr = log_base,
 		.dirty_bitmap_size = log_size,
 	};
+	struct mlx5_vdpa_virtq *virtq;
 	int i;
 	int ret = mlx5_os_wrapped_mkey_create(priv->cdev->ctx, priv->cdev->pd,
 					      priv->cdev->pdn,
@@ -54,7 +58,8 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
 	attr.dirty_bitmap_mkey = priv->lm_mr.lkey;
 	for (i = 0; i < priv->nr_virtqs; ++i) {
 		attr.queue_index = i;
-		if (!priv->virtqs[i].virtq) {
+		virtq = &priv->virtqs[i];
+		if (!virtq->configured) {
 			DRV_LOG(DEBUG, "virtq %d is invalid for LM.", i);
 		} else if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
 						      &attr)) {
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 28cef69a58..ef5bf1ef01 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -75,6 +75,7 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 	for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
 		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
+		virtq->configured = 0;
 		for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
 			if (virtq->umems[j].obj) {
 				claim_zero(mlx5_glue->devx_umem_dereg
@@ -111,11 +112,12 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 		rte_intr_fd_set(virtq->intr_handle, -1);
 	}
 	rte_intr_instance_free(virtq->intr_handle);
-	if (virtq->virtq) {
+	if (virtq->configured) {
 		ret = mlx5_vdpa_virtq_stop(virtq->priv, virtq->index);
 		if (ret)
 			DRV_LOG(WARNING, "Failed to stop virtq %d.",
 				virtq->index);
+		virtq->configured = 0;
 		claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
 	}
 	virtq->virtq = NULL;
@@ -138,7 +140,7 @@ int
 mlx5_vdpa_virtq_modify(struct mlx5_vdpa_virtq *virtq, int state)
 {
 	struct mlx5_devx_virtq_attr attr = {
-			.type = MLX5_VIRTQ_MODIFY_TYPE_STATE,
+			.mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_STATE,
 			.state = state ? MLX5_VIRTQ_STATE_RDY :
 					 MLX5_VIRTQ_STATE_SUSPEND,
 			.queue_index = virtq->index,
@@ -153,7 +155,7 @@ mlx5_vdpa_virtq_stop(struct mlx5_vdpa_priv *priv, int index)
 	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
 	int ret;
 
-	if (virtq->stopped)
+	if (virtq->stopped || !virtq->configured)
 		return 0;
 	ret = mlx5_vdpa_virtq_modify(virtq, 0);
 	if (ret)
@@ -209,51 +211,54 @@ mlx5_vdpa_hva_to_gpa(struct rte_vhost_memory *mem, uint64_t hva)
 }
 
 static int
-mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
+mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
+		struct mlx5_devx_virtq_attr *attr,
+		struct rte_vhost_vring *vq, int index)
 {
 	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
-	struct rte_vhost_vring vq;
-	struct mlx5_devx_virtq_attr attr = {0};
 	uint64_t gpa;
 	int ret;
 	unsigned int i;
-	uint16_t last_avail_idx;
-	uint16_t last_used_idx;
-	uint16_t event_num = MLX5_EVENT_TYPE_OBJECT_CHANGE;
-	uint64_t cookie;
-
-	ret = rte_vhost_get_vhost_vring(priv->vid, index, &vq);
-	if (ret)
-		return -1;
-	if (vq.size == 0)
-		return 0;
-	virtq->index = index;
-	virtq->vq_size = vq.size;
-	attr.tso_ipv4 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO4));
-	attr.tso_ipv6 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO6));
-	attr.tx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_CSUM));
-	attr.rx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_GUEST_CSUM));
-	attr.virtio_version_1_0 = !!(priv->features & (1ULL <<
-							VIRTIO_F_VERSION_1));
-	attr.type = (priv->features & (1ULL << VIRTIO_F_RING_PACKED)) ?
+	uint16_t last_avail_idx = 0;
+	uint16_t last_used_idx = 0;
+
+	if (virtq->virtq)
+		attr->mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_STATE |
+			MLX5_VIRTQ_MODIFY_TYPE_ADDR |
+			MLX5_VIRTQ_MODIFY_TYPE_HW_AVAILABLE_INDEX |
+			MLX5_VIRTQ_MODIFY_TYPE_HW_USED_INDEX |
+			MLX5_VIRTQ_MODIFY_TYPE_VERSION_1_0 |
+			MLX5_VIRTQ_MODIFY_TYPE_Q_TYPE |
+			MLX5_VIRTQ_MODIFY_TYPE_Q_MKEY |
+			MLX5_VIRTQ_MODIFY_TYPE_QUEUE_FEATURE_BIT_MASK |
+			MLX5_VIRTQ_MODIFY_TYPE_EVENT_MODE;
+	attr->tso_ipv4 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO4));
+	attr->tso_ipv6 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO6));
+	attr->tx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_CSUM));
+	attr->rx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_GUEST_CSUM));
+	attr->virtio_version_1_0 =
+		!!(priv->features & (1ULL << VIRTIO_F_VERSION_1));
+	attr->q_type =
+		(priv->features & (1ULL << VIRTIO_F_RING_PACKED)) ?
 			MLX5_VIRTQ_TYPE_PACKED : MLX5_VIRTQ_TYPE_SPLIT;
 	/*
 	 * No need event QPs creation when the guest in poll mode or when the
 	 * capability allows it.
 	 */
-	attr.event_mode = vq.callfd != -1 || !(priv->caps.event_mode & (1 <<
-					       MLX5_VIRTQ_EVENT_MODE_NO_MSIX)) ?
-						      MLX5_VIRTQ_EVENT_MODE_QP :
-						  MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
-	if (attr.event_mode == MLX5_VIRTQ_EVENT_MODE_QP) {
-		ret = mlx5_vdpa_event_qp_prepare(priv, vq.size, vq.callfd,
-						&virtq->eqp);
+	attr->event_mode = vq->callfd != -1 ||
+	!(priv->caps.event_mode & (1 << MLX5_VIRTQ_EVENT_MODE_NO_MSIX)) ?
+	MLX5_VIRTQ_EVENT_MODE_QP : MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
+	if (attr->event_mode == MLX5_VIRTQ_EVENT_MODE_QP) {
+		ret = mlx5_vdpa_event_qp_prepare(priv,
+				vq->size, vq->callfd, &virtq->eqp);
 		if (ret) {
-			DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
+			DRV_LOG(ERR,
+				"Failed to create event QPs for virtq %d.",
 				index);
 			return -1;
 		}
-		attr.qp_id = virtq->eqp.fw_qp->id;
+		attr->mod_fields_bitmap |= MLX5_VIRTQ_MODIFY_TYPE_EVENT_MODE;
+		attr->qp_id = virtq->eqp.fw_qp->id;
 	} else {
 		DRV_LOG(INFO, "Virtq %d is, for sure, working by poll mode, no"
 			" need event QPs and event mechanism.", index);
@@ -265,77 +270,82 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 		if (!virtq->counters) {
 			DRV_LOG(ERR, "Failed to create virtq couners for virtq"
 				" %d.", index);
-			goto error;
+			return -1;
 		}
-		attr.counters_obj_id = virtq->counters->id;
+		attr->counters_obj_id = virtq->counters->id;
 	}
 	/* Setup 3 UMEMs for each virtq. */
-	for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
-		uint32_t size;
-		void *buf;
-		struct mlx5dv_devx_umem *obj;
-
-		size = priv->caps.umems[i].a * vq.size + priv->caps.umems[i].b;
-		if (virtq->umems[i].size == size &&
-		    virtq->umems[i].obj != NULL) {
-			/* Reuse registered memory. */
-			memset(virtq->umems[i].buf, 0, size);
-			goto reuse;
-		}
-		if (virtq->umems[i].obj)
-			claim_zero(mlx5_glue->devx_umem_dereg
+	if (virtq->virtq) {
+		for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
+			uint32_t size;
+			void *buf;
+			struct mlx5dv_devx_umem *obj;
+
+			size =
+		priv->caps.umems[i].a * vq->size + priv->caps.umems[i].b;
+			if (virtq->umems[i].size == size &&
+				virtq->umems[i].obj != NULL) {
+				/* Reuse registered memory. */
+				memset(virtq->umems[i].buf, 0, size);
+				goto reuse;
+			}
+			if (virtq->umems[i].obj)
+				claim_zero(mlx5_glue->devx_umem_dereg
 				   (virtq->umems[i].obj));
-		if (virtq->umems[i].buf)
-			rte_free(virtq->umems[i].buf);
-		virtq->umems[i].size = 0;
-		virtq->umems[i].obj = NULL;
-		virtq->umems[i].buf = NULL;
-		buf = rte_zmalloc(__func__, size, 4096);
-		if (buf == NULL) {
-			DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
+			if (virtq->umems[i].buf)
+				rte_free(virtq->umems[i].buf);
+			virtq->umems[i].size = 0;
+			virtq->umems[i].obj = NULL;
+			virtq->umems[i].buf = NULL;
+			buf = rte_zmalloc(__func__,
+				size, 4096);
+			if (buf == NULL) {
+				DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
 				" %u.", i, index);
-			goto error;
-		}
-		obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx, buf, size,
-					       IBV_ACCESS_LOCAL_WRITE);
-		if (obj == NULL) {
-			DRV_LOG(ERR, "Failed to register umem %d for virtq %u.",
+				return -1;
+			}
+			obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx,
+				buf, size, IBV_ACCESS_LOCAL_WRITE);
+			if (obj == NULL) {
+				DRV_LOG(ERR, "Failed to register umem %d for virtq %u.",
 				i, index);
-			goto error;
-		}
-		virtq->umems[i].size = size;
-		virtq->umems[i].buf = buf;
-		virtq->umems[i].obj = obj;
+				rte_free(buf);
+				return -1;
+			}
+			virtq->umems[i].size = size;
+			virtq->umems[i].buf = buf;
+			virtq->umems[i].obj = obj;
 reuse:
-		attr.umems[i].id = virtq->umems[i].obj->umem_id;
-		attr.umems[i].offset = 0;
-		attr.umems[i].size = virtq->umems[i].size;
+			attr->umems[i].id = virtq->umems[i].obj->umem_id;
+			attr->umems[i].offset = 0;
+			attr->umems[i].size = virtq->umems[i].size;
+		}
 	}
-	if (attr.type == MLX5_VIRTQ_TYPE_SPLIT) {
+	if (attr->q_type == MLX5_VIRTQ_TYPE_SPLIT) {
 		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
-					   (uint64_t)(uintptr_t)vq.desc);
+					   (uint64_t)(uintptr_t)vq->desc);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get descriptor ring GPA.");
-			goto error;
+			return -1;
 		}
-		attr.desc_addr = gpa;
+		attr->desc_addr = gpa;
 		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
-					   (uint64_t)(uintptr_t)vq.used);
+					   (uint64_t)(uintptr_t)vq->used);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get GPA for used ring.");
-			goto error;
+			return -1;
 		}
-		attr.used_addr = gpa;
+		attr->used_addr = gpa;
 		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
-					   (uint64_t)(uintptr_t)vq.avail);
+					   (uint64_t)(uintptr_t)vq->avail);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get GPA for available ring.");
-			goto error;
+			return -1;
 		}
-		attr.available_addr = gpa;
+		attr->available_addr = gpa;
 	}
-	ret = rte_vhost_get_vring_base(priv->vid, index, &last_avail_idx,
-				 &last_used_idx);
+	ret = rte_vhost_get_vring_base(priv->vid,
+			index, &last_avail_idx, &last_used_idx);
 	if (ret) {
 		last_avail_idx = 0;
 		last_used_idx = 0;
@@ -345,24 +355,71 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 				"virtq %d.", priv->vid, last_avail_idx,
 				last_used_idx, index);
 	}
-	attr.hw_available_index = last_avail_idx;
-	attr.hw_used_index = last_used_idx;
-	attr.q_size = vq.size;
-	attr.mkey = priv->gpa_mkey_index;
-	attr.tis_id = priv->tiss[(index / 2) % priv->num_lag_ports]->id;
-	attr.queue_index = index;
-	attr.pd = priv->cdev->pdn;
-	attr.hw_latency_mode = priv->hw_latency_mode;
-	attr.hw_max_latency_us = priv->hw_max_latency_us;
-	attr.hw_max_pending_comp = priv->hw_max_pending_comp;
-	virtq->virtq = mlx5_devx_cmd_create_virtq(priv->cdev->ctx, &attr);
+	attr->hw_available_index = last_avail_idx;
+	attr->hw_used_index = last_used_idx;
+	attr->q_size = vq->size;
+	attr->mkey = priv->gpa_mkey_index;
+	attr->tis_id = priv->tiss[(index / 2) % priv->num_lag_ports]->id;
+	attr->queue_index = index;
+	attr->pd = priv->cdev->pdn;
+	attr->hw_latency_mode = priv->hw_latency_mode;
+	attr->hw_max_latency_us = priv->hw_max_latency_us;
+	attr->hw_max_pending_comp = priv->hw_max_pending_comp;
+	if (attr->hw_latency_mode || attr->hw_max_latency_us ||
+		attr->hw_max_pending_comp)
+		attr->mod_fields_bitmap |= MLX5_VIRTQ_MODIFY_TYPE_QUEUE_PERIOD;
+	return 0;
+}
+
+bool
+mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv)
+{
+	return (priv->caps.vnet_modify_ext &&
+			priv->caps.virtio_net_q_addr_modify &&
+			priv->caps.virtio_q_index_modify) ? true : false;
+}
+
+static int
+mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
+{
+	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
+	struct rte_vhost_vring vq;
+	struct mlx5_devx_virtq_attr attr = {0};
+	int ret;
+	uint16_t event_num = MLX5_EVENT_TYPE_OBJECT_CHANGE;
+	uint64_t cookie;
+
+	ret = rte_vhost_get_vhost_vring(priv->vid, index, &vq);
+	if (ret)
+		return -1;
+	if (vq.size == 0)
+		return 0;
 	virtq->priv = priv;
-	if (!virtq->virtq)
+	virtq->stopped = 0;
+	ret = mlx5_vdpa_virtq_sub_objs_prepare(priv, &attr,
+				&vq, index);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to setup update virtq attr"
+			" %d.", index);
 		goto error;
-	claim_zero(rte_vhost_enable_guest_notification(priv->vid, index, 1));
-	if (mlx5_vdpa_virtq_modify(virtq, 1))
+	}
+	if (!virtq->virtq) {
+		virtq->index = index;
+		virtq->vq_size = vq.size;
+		virtq->virtq = mlx5_devx_cmd_create_virtq(priv->cdev->ctx,
+			&attr);
+		if (!virtq->virtq)
+			goto error;
+		attr.mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_STATE;
+	}
+	attr.state = MLX5_VIRTQ_STATE_RDY;
+	ret = mlx5_devx_cmd_modify_virtq(virtq->virtq, &attr);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to modify virtq %d.", index);
 		goto error;
-	virtq->priv = priv;
+	}
+	claim_zero(rte_vhost_enable_guest_notification(priv->vid, index, 1));
+	virtq->configured = 1;
 	rte_write32(virtq->index, priv->virtq_db_addr);
 	/* Setup doorbell mapping. */
 	virtq->intr_handle =
@@ -553,7 +610,7 @@ mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable)
 			return 0;
 		DRV_LOG(INFO, "Virtq %d was modified, recreate it.", index);
 	}
-	if (virtq->virtq) {
+	if (virtq->configured) {
 		virtq->enable = 0;
 		if (is_virtq_recvq(virtq->index, priv->nr_virtqs)) {
 			ret = mlx5_vdpa_steer_update(priv);
-- 
2.27.0


* [RFC 07/15] vdpa/mlx5: optimize datapath-control synchronization
  2022-04-08  7:55 [RFC 00/15] Add vDPA multi-threads optimization Li Zhang
                   ` (5 preceding siblings ...)
  2022-04-08  7:55 ` [RFC 06/15] vdpa/mlx5: pre-create virtq in the probe Li Zhang
@ 2022-04-08  7:55 ` Li Zhang
  2022-04-08  7:55 ` [RFC 08/15] vdpa/mlx5: add multi-thread management for configuration Li Zhang
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-04-08  7:55 UTC (permalink / raw)
  To: matan, viacheslavo, orika, thomas; +Cc: dev, rasland

The driver used a single global lock for any synchronization
needed for the datapath and the control path.
It is better to group each critical section only with the other
sections that really need to be synchronized with it.

Replace the global lock with the following locks:

1. Virtq locks (per virtq) synchronize datapath polling and
   parallel configurations on the same virtq (see the sketch
   after this list).
2. A doorbell lock synchronizes doorbell updates; the doorbell
   is shared by all the virtqs in the device.
3. A steering lock for updates of the shared steering objects.
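
For example, the per-virtq lock scopes the control-path configuration
of a single ring; a sketch of the pattern used in set_vring_state in
the diff below:

    struct mlx5_vdpa_virtq *virtq = &priv->virtqs[vring];

    pthread_mutex_lock(&virtq->virtq_lock);
    ret = mlx5_vdpa_virtq_enable(priv, vring, state);
    pthread_mutex_unlock(&virtq->virtq_lock);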

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 24 ++++---
 drivers/vdpa/mlx5/mlx5_vdpa.h       | 13 ++--
 drivers/vdpa/mlx5/mlx5_vdpa_event.c | 97 ++++++++++++++++++-----------
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c    | 34 +++++++---
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c |  7 ++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 83 +++++++++++++++++-------
 6 files changed, 180 insertions(+), 78 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 03ad01c156..e99c86b3d6 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -135,6 +135,7 @@ mlx5_vdpa_set_vring_state(int vid, int vring, int state)
 	struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid);
 	struct mlx5_vdpa_priv *priv =
 		mlx5_vdpa_find_priv_resource_by_vdev(vdev);
+	struct mlx5_vdpa_virtq *virtq;
 	int ret;
 
 	if (priv == NULL) {
@@ -145,9 +146,10 @@ mlx5_vdpa_set_vring_state(int vid, int vring, int state)
 		DRV_LOG(ERR, "Too big vring id: %d.", vring);
 		return -E2BIG;
 	}
-	pthread_mutex_lock(&priv->vq_config_lock);
+	virtq = &priv->virtqs[vring];
+	pthread_mutex_lock(&virtq->virtq_lock);
 	ret = mlx5_vdpa_virtq_enable(priv, vring, state);
-	pthread_mutex_unlock(&priv->vq_config_lock);
+	pthread_mutex_unlock(&virtq->virtq_lock);
 	return ret;
 }
 
@@ -267,7 +269,9 @@ mlx5_vdpa_dev_close(int vid)
 		ret |= mlx5_vdpa_lm_log(priv);
 		priv->state = MLX5_VDPA_STATE_IN_PROGRESS;
 	}
+	pthread_mutex_lock(&priv->steer_update_lock);
 	mlx5_vdpa_steer_unset(priv);
+	pthread_mutex_unlock(&priv->steer_update_lock);
 	mlx5_vdpa_virtqs_release(priv);
 	mlx5_vdpa_drain_cq(priv);
 	if (priv->lm_mr.addr)
@@ -276,8 +280,6 @@ mlx5_vdpa_dev_close(int vid)
 	if (!priv->connected)
 		mlx5_vdpa_dev_cache_clean(priv);
 	priv->vid = 0;
-	/* The mutex may stay locked after event thread cancel - initiate it. */
-	pthread_mutex_init(&priv->vq_config_lock, NULL);
 	DRV_LOG(INFO, "vDPA device %d was closed.", vid);
 	return ret;
 }
@@ -549,15 +551,21 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 static int
 mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_virtq *virtq;
 	uint32_t index;
 	uint32_t i;
 
+	for (index = 0; index < priv->caps.max_num_virtio_queues * 2;
+		index++) {
+		virtq = &priv->virtqs[index];
+		pthread_mutex_init(&virtq->virtq_lock, NULL);
+	}
 	if (!priv->queues)
 		return 0;
 	for (index = 0; index < (priv->queues * 2); ++index) {
-		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
+		virtq = &priv->virtqs[index];
 		int ret = mlx5_vdpa_event_qp_prepare(priv, priv->queue_size,
-					-1, &virtq->eqp);
+					-1, virtq);
 
 		if (ret) {
 			DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
@@ -713,7 +721,8 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 	priv->num_lag_ports = attr->num_lag_ports;
 	if (attr->num_lag_ports == 0)
 		priv->num_lag_ports = 1;
-	pthread_mutex_init(&priv->vq_config_lock, NULL);
+	rte_spinlock_init(&priv->db_lock);
+	pthread_mutex_init(&priv->steer_update_lock, NULL);
 	priv->cdev = cdev;
 	mlx5_vdpa_config_get(mkvlist, priv);
 	if (mlx5_vdpa_create_dev_resources(priv))
@@ -797,7 +806,6 @@ mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv)
 	mlx5_vdpa_release_dev_resources(priv);
 	if (priv->vdev)
 		rte_vdpa_unregister_device(priv->vdev);
-	pthread_mutex_destroy(&priv->vq_config_lock);
 	rte_free(priv);
 }
 
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index e5553079fe..3fd5eefc5e 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -82,6 +82,7 @@ struct mlx5_vdpa_virtq {
 	bool stopped;
 	uint32_t configured:1;
 	uint32_t version;
+	pthread_mutex_t virtq_lock;
 	struct mlx5_vdpa_priv *priv;
 	struct mlx5_devx_obj *virtq;
 	struct mlx5_devx_obj *counters;
@@ -126,7 +127,8 @@ struct mlx5_vdpa_priv {
 	TAILQ_ENTRY(mlx5_vdpa_priv) next;
 	bool connected;
 	enum mlx5_dev_state state;
-	pthread_mutex_t vq_config_lock;
+	rte_spinlock_t db_lock;
+	pthread_mutex_t steer_update_lock;
 	uint64_t no_traffic_counter;
 	pthread_t timer_tid;
 	int event_mode;
@@ -222,14 +224,15 @@ int mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv);
  *   Number of descriptors.
  * @param[in] callfd
  *   The guest notification file descriptor.
- * @param[in/out] eqp
- *   Pointer to the event QP structure.
+ * @param[in/out] virtq
+ *   Pointer to the virt-queue structure.
  *
  * @return
  *   0 on success, -1 otherwise and rte_errno is set.
  */
-int mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
-			      int callfd, struct mlx5_vdpa_event_qp *eqp);
+int
+mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
+	int callfd, struct mlx5_vdpa_virtq *virtq);
 
 /**
  * Destroy an event QP and all its related resources.
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index b43dca9255..2b0f5936d1 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -85,12 +85,13 @@ mlx5_vdpa_cq_arm(struct mlx5_vdpa_priv *priv, struct mlx5_vdpa_cq *cq)
 
 static int
 mlx5_vdpa_cq_create(struct mlx5_vdpa_priv *priv, uint16_t log_desc_n,
-		    int callfd, struct mlx5_vdpa_cq *cq)
+		int callfd, struct mlx5_vdpa_virtq *virtq)
 {
 	struct mlx5_devx_cq_attr attr = {
 		.use_first_only = 1,
 		.uar_page_id = mlx5_os_get_devx_uar_page_id(priv->uar.obj),
 	};
+	struct mlx5_vdpa_cq *cq = &virtq->eqp.cq;
 	uint16_t event_nums[1] = {0};
 	int ret;
 
@@ -102,10 +103,11 @@ mlx5_vdpa_cq_create(struct mlx5_vdpa_priv *priv, uint16_t log_desc_n,
 	cq->log_desc_n = log_desc_n;
 	rte_spinlock_init(&cq->sl);
 	/* Subscribe CQ event to the event channel controlled by the driver. */
-	ret = mlx5_os_devx_subscribe_devx_event(priv->eventc,
-						cq->cq_obj.cq->obj,
-						sizeof(event_nums), event_nums,
-						(uint64_t)(uintptr_t)cq);
+	ret = mlx5_glue->devx_subscribe_devx_event(priv->eventc,
+							cq->cq_obj.cq->obj,
+						   sizeof(event_nums),
+						   event_nums,
+						   (uint64_t)(uintptr_t)virtq);
 	if (ret) {
 		DRV_LOG(ERR, "Failed to subscribe CQE event.");
 		rte_errno = errno;
@@ -167,13 +169,17 @@ mlx5_vdpa_cq_poll(struct mlx5_vdpa_cq *cq)
 static void
 mlx5_vdpa_arm_all_cqs(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_virtq *virtq;
 	struct mlx5_vdpa_cq *cq;
 	int i;
 
 	for (i = 0; i < priv->nr_virtqs; i++) {
+		virtq = &priv->virtqs[i];
+		pthread_mutex_lock(&virtq->virtq_lock);
 		cq = &priv->virtqs[i].eqp.cq;
 		if (cq->cq_obj.cq && !cq->armed)
 			mlx5_vdpa_cq_arm(priv, cq);
+		pthread_mutex_unlock(&virtq->virtq_lock);
 	}
 }
 
@@ -220,13 +226,18 @@ mlx5_vdpa_queue_complete(struct mlx5_vdpa_cq *cq)
 static uint32_t
 mlx5_vdpa_queues_complete(struct mlx5_vdpa_priv *priv)
 {
-	int i;
+	struct mlx5_vdpa_virtq *virtq;
+	struct mlx5_vdpa_cq *cq;
 	uint32_t max = 0;
+	uint32_t comp;
+	int i;
 
 	for (i = 0; i < priv->nr_virtqs; i++) {
-		struct mlx5_vdpa_cq *cq = &priv->virtqs[i].eqp.cq;
-		uint32_t comp = mlx5_vdpa_queue_complete(cq);
-
+		virtq = &priv->virtqs[i];
+		pthread_mutex_lock(&virtq->virtq_lock);
+		cq = &virtq->eqp.cq;
+		comp = mlx5_vdpa_queue_complete(cq);
+		pthread_mutex_unlock(&virtq->virtq_lock);
 		if (comp > max)
 			max = comp;
 	}
@@ -253,7 +264,7 @@ mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv)
 }
 
 /* Wait on all CQs channel for completion event. */
-static struct mlx5_vdpa_cq *
+static struct mlx5_vdpa_virtq *
 mlx5_vdpa_event_wait(struct mlx5_vdpa_priv *priv __rte_unused)
 {
 #ifdef HAVE_IBV_DEVX_EVENT
@@ -265,7 +276,8 @@ mlx5_vdpa_event_wait(struct mlx5_vdpa_priv *priv __rte_unused)
 					    sizeof(out.buf));
 
 	if (ret >= 0)
-		return (struct mlx5_vdpa_cq *)(uintptr_t)out.event_resp.cookie;
+		return (struct mlx5_vdpa_virtq *)
+				(uintptr_t)out.event_resp.cookie;
 	DRV_LOG(INFO, "Got error in devx_get_event, ret = %d, errno = %d.",
 		ret, errno);
 #endif
@@ -276,7 +288,7 @@ static void *
 mlx5_vdpa_event_handle(void *arg)
 {
 	struct mlx5_vdpa_priv *priv = arg;
-	struct mlx5_vdpa_cq *cq;
+	struct mlx5_vdpa_virtq *virtq;
 	uint32_t max;
 
 	switch (priv->event_mode) {
@@ -284,7 +296,6 @@ mlx5_vdpa_event_handle(void *arg)
 	case MLX5_VDPA_EVENT_MODE_FIXED_TIMER:
 		priv->timer_delay_us = priv->event_us;
 		while (1) {
-			pthread_mutex_lock(&priv->vq_config_lock);
 			max = mlx5_vdpa_queues_complete(priv);
 			if (max == 0 && priv->no_traffic_counter++ >=
 			    priv->no_traffic_max) {
@@ -292,32 +303,37 @@ mlx5_vdpa_event_handle(void *arg)
 					priv->vdev->device->name);
 				mlx5_vdpa_arm_all_cqs(priv);
 				do {
-					pthread_mutex_unlock
-							(&priv->vq_config_lock);
-					cq = mlx5_vdpa_event_wait(priv);
-					pthread_mutex_lock
-							(&priv->vq_config_lock);
-					if (cq == NULL ||
-					       mlx5_vdpa_queue_complete(cq) > 0)
+					virtq = mlx5_vdpa_event_wait(priv);
+					if (virtq == NULL)
 						break;
+					pthread_mutex_lock(
+						&virtq->virtq_lock);
+					if (mlx5_vdpa_queue_complete(
+						&virtq->eqp.cq) > 0) {
+						pthread_mutex_unlock(
+							&virtq->virtq_lock);
+						break;
+					}
+					pthread_mutex_unlock(
+						&virtq->virtq_lock);
 				} while (1);
 				priv->timer_delay_us = priv->event_us;
 				priv->no_traffic_counter = 0;
 			} else if (max != 0) {
 				priv->no_traffic_counter = 0;
 			}
-			pthread_mutex_unlock(&priv->vq_config_lock);
 			mlx5_vdpa_timer_sleep(priv, max);
 		}
 		return NULL;
 	case MLX5_VDPA_EVENT_MODE_ONLY_INTERRUPT:
 		do {
-			cq = mlx5_vdpa_event_wait(priv);
-			if (cq != NULL) {
-				pthread_mutex_lock(&priv->vq_config_lock);
-				if (mlx5_vdpa_queue_complete(cq) > 0)
-					mlx5_vdpa_cq_arm(priv, cq);
-				pthread_mutex_unlock(&priv->vq_config_lock);
+			virtq = mlx5_vdpa_event_wait(priv);
+			if (virtq != NULL) {
+				pthread_mutex_lock(&virtq->virtq_lock);
+				if (mlx5_vdpa_queue_complete(
+					&virtq->eqp.cq) > 0)
+					mlx5_vdpa_cq_arm(priv, &virtq->eqp.cq);
+				pthread_mutex_unlock(&virtq->virtq_lock);
 			}
 		} while (1);
 		return NULL;
@@ -339,7 +355,6 @@ mlx5_vdpa_err_interrupt_handler(void *cb_arg __rte_unused)
 	struct mlx5_vdpa_virtq *virtq;
 	uint64_t sec;
 
-	pthread_mutex_lock(&priv->vq_config_lock);
 	while (mlx5_glue->devx_get_event(priv->err_chnl, &out.event_resp,
 					 sizeof(out.buf)) >=
 				       (ssize_t)sizeof(out.event_resp.cookie)) {
@@ -351,10 +366,11 @@ mlx5_vdpa_err_interrupt_handler(void *cb_arg __rte_unused)
 			continue;
 		}
 		virtq = &priv->virtqs[vq_index];
+		pthread_mutex_lock(&virtq->virtq_lock);
 		if (!virtq->enable || virtq->version != version)
-			continue;
+			goto unlock;
 		if (rte_rdtsc() / rte_get_tsc_hz() < MLX5_VDPA_ERROR_TIME_SEC)
-			continue;
+			goto unlock;
 		virtq->stopped = true;
 		/* Query error info. */
 		if (mlx5_vdpa_virtq_query(priv, vq_index))
@@ -384,8 +400,9 @@ mlx5_vdpa_err_interrupt_handler(void *cb_arg __rte_unused)
 		for (i = 1; i < RTE_DIM(virtq->err_time); i++)
 			virtq->err_time[i - 1] = virtq->err_time[i];
 		virtq->err_time[RTE_DIM(virtq->err_time) - 1] = rte_rdtsc();
+unlock:
+		pthread_mutex_unlock(&virtq->virtq_lock);
 	}
-	pthread_mutex_unlock(&priv->vq_config_lock);
 #endif
 }
 
@@ -533,11 +550,18 @@ mlx5_vdpa_cqe_event_setup(struct mlx5_vdpa_priv *priv)
 void
 mlx5_vdpa_cqe_event_unset(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_virtq *virtq;
 	void *status;
+	int i;
 
 	if (priv->timer_tid) {
 		pthread_cancel(priv->timer_tid);
 		pthread_join(priv->timer_tid, &status);
+		/* The mutex may stay locked after event thread cancel, re-initialize it. */
+		for (i = 0; i < priv->nr_virtqs; i++) {
+			virtq = &priv->virtqs[i];
+			pthread_mutex_init(&virtq->virtq_lock, NULL);
+		}
 	}
 	priv->timer_tid = 0;
 }
@@ -614,8 +638,9 @@ mlx5_vdpa_qps2rst2rts(struct mlx5_vdpa_event_qp *eqp)
 
 int
 mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
-			  int callfd, struct mlx5_vdpa_event_qp *eqp)
+	int callfd, struct mlx5_vdpa_virtq *virtq)
 {
+	struct mlx5_vdpa_event_qp *eqp = &virtq->eqp;
 	struct mlx5_devx_qp_attr attr = {0};
 	uint16_t log_desc_n = rte_log2_u32(desc_n);
 	uint32_t ret;
@@ -632,7 +657,8 @@ mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 	}
 	if (eqp->fw_qp)
 		mlx5_vdpa_event_qp_destroy(eqp);
-	if (mlx5_vdpa_cq_create(priv, log_desc_n, callfd, &eqp->cq))
+	if (mlx5_vdpa_cq_create(priv, log_desc_n, callfd, virtq) ||
+		!eqp->cq.cq_obj.cq)
 		return -1;
 	attr.pd = priv->cdev->pdn;
 	attr.ts_format =
@@ -650,8 +676,8 @@ mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 	attr.ts_format =
 		mlx5_ts_format_conv(priv->cdev->config.hca_attr.qp_ts_format);
 	ret = mlx5_devx_qp_create(priv->cdev->ctx, &(eqp->sw_qp),
-					attr.num_of_receive_wqes *
-					MLX5_WSEG_SIZE, &attr, SOCKET_ID_ANY);
+				  attr.num_of_receive_wqes * MLX5_WSEG_SIZE,
+				  &attr, SOCKET_ID_ANY);
 	if (ret) {
 		DRV_LOG(ERR, "Failed to create SW QP(%u).", rte_errno);
 		goto error;
@@ -668,3 +694,4 @@ mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 	mlx5_vdpa_event_qp_destroy(eqp);
 	return -1;
 }
+
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
index a8faf0c116..efebf364d0 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
@@ -25,11 +25,18 @@ mlx5_vdpa_logging_enable(struct mlx5_vdpa_priv *priv, int enable)
 		if (!virtq->configured) {
 			DRV_LOG(DEBUG, "virtq %d is invalid for dirty bitmap "
 				"enabling.", i);
-		} else if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
+		} else {
+			struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
+
+			pthread_mutex_lock(&virtq->virtq_lock);
+			if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
 			   &attr)) {
-			DRV_LOG(ERR, "Failed to modify virtq %d for dirty "
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				DRV_LOG(ERR, "Failed to modify virtq %d for dirty "
 				"bitmap enabling.", i);
-			return -1;
+				return -1;
+			}
+			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
 	}
 	return 0;
@@ -61,10 +68,19 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
 		virtq = &priv->virtqs[i];
 		if (!virtq->configured) {
 			DRV_LOG(DEBUG, "virtq %d is invalid for LM.", i);
-		} else if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
-						      &attr)) {
-			DRV_LOG(ERR, "Failed to modify virtq %d for LM.", i);
-			goto err;
+		} else {
+			struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
+
+			pthread_mutex_lock(&virtq->virtq_lock);
+			if (mlx5_devx_cmd_modify_virtq(
+					priv->virtqs[i].virtq,
+					&attr)) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				DRV_LOG(ERR,
+				"Failed to modify virtq %d for LM.", i);
+				goto err;
+			}
+			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
 	}
 	return 0;
@@ -79,6 +95,7 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
 int
 mlx5_vdpa_lm_log(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_virtq *virtq;
 	uint64_t features;
 	int ret = rte_vhost_get_negotiated_features(priv->vid, &features);
 	int i;
@@ -90,10 +107,13 @@ mlx5_vdpa_lm_log(struct mlx5_vdpa_priv *priv)
 	if (!RTE_VHOST_NEED_LOG(features))
 		return 0;
 	for (i = 0; i < priv->nr_virtqs; ++i) {
+		virtq = &priv->virtqs[i];
 		if (!priv->virtqs[i].virtq) {
 			DRV_LOG(DEBUG, "virtq %d is invalid for LM log.", i);
 		} else {
+			pthread_mutex_lock(&virtq->virtq_lock);
 			ret = mlx5_vdpa_virtq_stop(priv, i);
+			pthread_mutex_unlock(&virtq->virtq_lock);
 			if (ret) {
 				DRV_LOG(ERR, "Failed to stop virtq %d for LM "
 					"log.", i);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
index d4b4375c88..4cbf09784e 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
@@ -237,19 +237,24 @@ mlx5_vdpa_rss_flows_create(struct mlx5_vdpa_priv *priv)
 int
 mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv)
 {
-	int ret = mlx5_vdpa_rqt_prepare(priv);
+	int ret;
 
+	pthread_mutex_lock(&priv->steer_update_lock);
+	ret = mlx5_vdpa_rqt_prepare(priv);
 	if (ret == 0) {
 		mlx5_vdpa_steer_unset(priv);
 	} else if (ret < 0) {
+		pthread_mutex_unlock(&priv->steer_update_lock);
 		return ret;
 	} else if (!priv->steer.rss[0].flow) {
 		ret = mlx5_vdpa_rss_flows_create(priv);
 		if (ret) {
 			DRV_LOG(ERR, "Cannot create RSS flows.");
+			pthread_mutex_unlock(&priv->steer_update_lock);
 			return -1;
 		}
 	}
+	pthread_mutex_unlock(&priv->steer_update_lock);
 	return 0;
 }
 
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index ef5bf1ef01..c2c5386075 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -24,13 +24,17 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 	int nbytes;
 	int retry;
 
+	pthread_mutex_lock(&virtq->virtq_lock);
 	if (priv->state != MLX5_VDPA_STATE_CONFIGURED && !virtq->enable) {
+		pthread_mutex_unlock(&virtq->virtq_lock);
 		DRV_LOG(ERR,  "device %d queue %d down, skip kick handling",
 			priv->vid, virtq->index);
 		return;
 	}
-	if (rte_intr_fd_get(virtq->intr_handle) < 0)
+	if (rte_intr_fd_get(virtq->intr_handle) < 0) {
+		pthread_mutex_unlock(&virtq->virtq_lock);
 		return;
+	}
 	for (retry = 0; retry < 3; ++retry) {
 		nbytes = read(rte_intr_fd_get(virtq->intr_handle), &buf,
 			      8);
@@ -44,9 +48,14 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 		}
 		break;
 	}
-	if (nbytes < 0)
+	if (nbytes < 0) {
+		pthread_mutex_unlock(&virtq->virtq_lock);
 		return;
+	}
+	rte_spinlock_lock(&priv->db_lock);
 	rte_write32(virtq->index, priv->virtq_db_addr);
+	rte_spinlock_unlock(&priv->db_lock);
+	pthread_mutex_unlock(&virtq->virtq_lock);
 	if (priv->state != MLX5_VDPA_STATE_CONFIGURED && !virtq->enable) {
 		DRV_LOG(ERR,  "device %d queue %d down, skip kick handling",
 			priv->vid, virtq->index);
@@ -66,6 +75,30 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 	DRV_LOG(DEBUG, "Ring virtq %u doorbell.", virtq->index);
 }
 
+/* Virtq must be locked before calling this function. */
+static void
+mlx5_vdpa_virtq_unregister_intr_handle(struct mlx5_vdpa_virtq *virtq)
+{
+	int ret = -EAGAIN;
+
+	if (rte_intr_fd_get(virtq->intr_handle) >= 0) {
+		while (ret == -EAGAIN) {
+			ret = rte_intr_callback_unregister(virtq->intr_handle,
+					mlx5_vdpa_virtq_kick_handler, virtq);
+			if (ret == -EAGAIN) {
+				DRV_LOG(DEBUG, "Try again to unregister fd %d of virtq %hu interrupt",
+					rte_intr_fd_get(virtq->intr_handle),
+					virtq->index);
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				usleep(MLX5_VDPA_INTR_RETRIES_USEC);
+				pthread_mutex_lock(&virtq->virtq_lock);
+			}
+		}
+		rte_intr_fd_set(virtq->intr_handle, -1);
+	}
+	rte_intr_instance_free(virtq->intr_handle);
+}
+
 /* Release cached VQ resources. */
 void
 mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
@@ -75,6 +108,7 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 	for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
 		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
+		pthread_mutex_lock(&virtq->virtq_lock);
 		virtq->configured = 0;
 		for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
 			if (virtq->umems[j].obj) {
@@ -90,28 +124,17 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 		}
 		if (virtq->eqp.fw_qp)
 			mlx5_vdpa_event_qp_destroy(&virtq->eqp);
+		pthread_mutex_unlock(&virtq->virtq_lock);
 	}
 }
 
+
 static int
 mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 {
 	int ret = -EAGAIN;
 
-	if (rte_intr_fd_get(virtq->intr_handle) >= 0) {
-		while (ret == -EAGAIN) {
-			ret = rte_intr_callback_unregister(virtq->intr_handle,
-					mlx5_vdpa_virtq_kick_handler, virtq);
-			if (ret == -EAGAIN) {
-				DRV_LOG(DEBUG, "Try again to unregister fd %d of virtq %hu interrupt",
-					rte_intr_fd_get(virtq->intr_handle),
-					virtq->index);
-				usleep(MLX5_VDPA_INTR_RETRIES_USEC);
-			}
-		}
-		rte_intr_fd_set(virtq->intr_handle, -1);
-	}
-	rte_intr_instance_free(virtq->intr_handle);
+	mlx5_vdpa_virtq_unregister_intr_handle(virtq);
 	if (virtq->configured) {
 		ret = mlx5_vdpa_virtq_stop(virtq->priv, virtq->index);
 		if (ret)
@@ -128,10 +151,15 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 void
 mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_virtq *virtq;
 	int i;
 
-	for (i = 0; i < priv->nr_virtqs; i++)
-		mlx5_vdpa_virtq_unset(&priv->virtqs[i]);
+	for (i = 0; i < priv->nr_virtqs; i++) {
+		virtq = &priv->virtqs[i];
+		pthread_mutex_lock(&virtq->virtq_lock);
+		mlx5_vdpa_virtq_unset(virtq);
+		pthread_mutex_unlock(&virtq->virtq_lock);
+	}
 	priv->features = 0;
 	priv->nr_virtqs = 0;
 }
@@ -250,7 +278,7 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 	MLX5_VIRTQ_EVENT_MODE_QP : MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
 	if (attr->event_mode == MLX5_VIRTQ_EVENT_MODE_QP) {
 		ret = mlx5_vdpa_event_qp_prepare(priv,
-				vq->size, vq->callfd, &virtq->eqp);
+				vq->size, vq->callfd, virtq);
 		if (ret) {
 			DRV_LOG(ERR,
 				"Failed to create event QPs for virtq %d.",
@@ -420,7 +448,9 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 	}
 	claim_zero(rte_vhost_enable_guest_notification(priv->vid, index, 1));
 	virtq->configured = 1;
+	rte_spinlock_lock(&priv->db_lock);
 	rte_write32(virtq->index, priv->virtq_db_addr);
+	rte_spinlock_unlock(&priv->db_lock);
 	/* Setup doorbell mapping. */
 	virtq->intr_handle =
 		rte_intr_instance_alloc(RTE_INTR_INSTANCE_F_SHARED);
@@ -537,6 +567,7 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 	uint32_t i;
 	uint16_t nr_vring = rte_vhost_get_vring_num(priv->vid);
 	int ret = rte_vhost_get_negotiated_features(priv->vid, &priv->features);
+	struct mlx5_vdpa_virtq *virtq;
 
 	if (ret || mlx5_vdpa_features_validate(priv)) {
 		DRV_LOG(ERR, "Failed to configure negotiated features.");
@@ -556,9 +587,17 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 		return -1;
 	}
 	priv->nr_virtqs = nr_vring;
-	for (i = 0; i < nr_vring; i++)
-		if (priv->virtqs[i].enable && mlx5_vdpa_virtq_setup(priv, i))
-			goto error;
+	for (i = 0; i < nr_vring; i++) {
+		virtq = &priv->virtqs[i];
+		if (virtq->enable) {
+			pthread_mutex_lock(&virtq->virtq_lock);
+			if (mlx5_vdpa_virtq_setup(priv, i)) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				goto error;
+			}
+			pthread_mutex_unlock(&virtq->virtq_lock);
+		}
+	}
 	return 0;
 error:
 	mlx5_vdpa_virtqs_release(priv);
-- 
2.27.0


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [RFC 08/15] vdpa/mlx5: add multi-thread management for configuration
  2022-04-08  7:55 [RFC 00/15] Add vDPA multi-threads optiomization Li Zhang
                   ` (6 preceding siblings ...)
  2022-04-08  7:55 ` [RFC 07/15] vdpa/mlx5: optimize datapath-control synchronization Li Zhang
@ 2022-04-08  7:55 ` Li Zhang
  2022-04-08  7:55 ` [RFC 09/15] vdpa/mlx5: add task ring for MT management Li Zhang
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-04-08  7:55 UTC (permalink / raw)
  To: matan, viacheslavo, orika, thomas; +Cc: dev, rasland

The LM process includes many object creations and
destructions on both the source and the destination servers.
The longer the LM takes, the more packets the VM drops.
To shorten the LM time, the mlx5 FW configurations must be done in parallel.
Add internal multi-thread management in the driver for this.

A new devarg defines the number of threads; their CPU affinity
is taken from the event_core devarg.
The management is shared between all the devices of the driver.
Since the event_core also affects the datapath event thread,
reduce the priority of the datapath event thread to
allow fast configuration of the devices doing the LM.
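
For reference, a minimal hot-plug snippet showing how the new devarg could
be combined with event_core; the PCI address, core number and thread count
below are made up for illustration, and EAL is assumed to be initialized
already:

#include <rte_dev.h>

/* Hypothetical device address and values, adjust to the real setup. */
static const char *vdpa_devargs =
	"0000:08:00.2,class=vdpa,event_core=2,max_conf_threads=8";

static int
attach_vdpa_device(void)
{
	/* The first probed device fixes max_conf_threads for the driver. */
	return rte_dev_probe(vdpa_devargs);
}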

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 doc/guides/vdpadevs/mlx5.rst          |  11 +++
 drivers/vdpa/mlx5/meson.build         |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.c         |  41 ++++++++
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  36 +++++++
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 129 ++++++++++++++++++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_event.c   |   2 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   |   8 +-
 7 files changed, 223 insertions(+), 5 deletions(-)
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c

diff --git a/doc/guides/vdpadevs/mlx5.rst b/doc/guides/vdpadevs/mlx5.rst
index 0ad77bf535..b75a01688d 100644
--- a/doc/guides/vdpadevs/mlx5.rst
+++ b/doc/guides/vdpadevs/mlx5.rst
@@ -78,6 +78,17 @@ for an additional list of options shared with other mlx5 drivers.
   CPU core number to set polling thread affinity to, default to control plane
   cpu.
 
+- ``max_conf_threads`` parameter [int]
+
+  Allow the driver to use internal threads to obtain fast configuration.
+  All the threads will be opened on the same core as the event completion queue scheduling thread.
+
+  - 0, default, don't use internal threads for configuration.
+
+  - 1 - 256, number of internal threads in addition to the caller thread (8 is suggested).
+    This value, if not 0, should be the same for all the devices;
+    the first probed device will take it, together with the event_core, for all the multi-thread configurations in the driver.
+
 - ``hw_latency_mode`` parameter [int]
 
   The completion queue moderation mode:
diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build
index 0fa82ad257..9d8dbb1a82 100644
--- a/drivers/vdpa/mlx5/meson.build
+++ b/drivers/vdpa/mlx5/meson.build
@@ -15,6 +15,7 @@ sources = files(
         'mlx5_vdpa_virtq.c',
         'mlx5_vdpa_steer.c',
         'mlx5_vdpa_lm.c',
+        'mlx5_vdpa_cthread.c',
 )
 cflags_options = [
         '-std=c11',
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index e99c86b3d6..eace0e4c9e 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -50,6 +50,8 @@ TAILQ_HEAD(mlx5_vdpa_privs, mlx5_vdpa_priv) priv_list =
 					      TAILQ_HEAD_INITIALIZER(priv_list);
 static pthread_mutex_t priv_list_lock = PTHREAD_MUTEX_INITIALIZER;
 
+struct mlx5_vdpa_conf_thread_mng conf_thread_mng;
+
 static void mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv);
 
 static struct mlx5_vdpa_priv *
@@ -493,6 +495,29 @@ mlx5_vdpa_args_check_handler(const char *key, const char *val, void *opaque)
 			DRV_LOG(WARNING, "Invalid event_core %s.", val);
 		else
 			priv->event_core = tmp;
+	} else if (strcmp(key, "max_conf_threads") == 0) {
+		if (tmp) {
+			priv->use_c_thread = true;
+			if (!conf_thread_mng.initializer_priv) {
+				conf_thread_mng.initializer_priv = priv;
+				if (tmp > MLX5_VDPA_MAX_C_THRD) {
+					DRV_LOG(WARNING,
+				"Invalid max_conf_threads %s "
+				"and set max_conf_threads to %d",
+				val, MLX5_VDPA_MAX_C_THRD);
+					tmp = MLX5_VDPA_MAX_C_THRD;
+				}
+				conf_thread_mng.max_thrds = tmp;
+			} else if (tmp != conf_thread_mng.max_thrds) {
+				DRV_LOG(WARNING,
+	"max_conf_threads is PMD argument and not per device, "
+	"only the first device configuration set it, current value is %d "
+	"and will not be changed to %d.",
+				conf_thread_mng.max_thrds, (int)tmp);
+			}
+		} else {
+			priv->use_c_thread = false;
+		}
 	} else if (strcmp(key, "hw_latency_mode") == 0) {
 		priv->hw_latency_mode = (uint32_t)tmp;
 	} else if (strcmp(key, "hw_max_latency_us") == 0) {
@@ -521,6 +546,9 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 		"hw_max_latency_us",
 		"hw_max_pending_comp",
 		"no_traffic_time",
+		"queue_size",
+		"queues",
+		"max_conf_threads",
 		NULL,
 	};
 
@@ -725,6 +753,13 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 	pthread_mutex_init(&priv->steer_update_lock, NULL);
 	priv->cdev = cdev;
 	mlx5_vdpa_config_get(mkvlist, priv);
+	if (priv->use_c_thread) {
+		if (conf_thread_mng.initializer_priv == priv)
+			if (mlx5_vdpa_mult_threads_create(priv->event_core))
+				goto error;
+		__atomic_fetch_add(&conf_thread_mng.refcnt, 1,
+			__ATOMIC_RELAXED);
+	}
 	if (mlx5_vdpa_create_dev_resources(priv))
 		goto error;
 	priv->vdev = rte_vdpa_register_device(cdev->dev, &mlx5_vdpa_ops);
@@ -739,6 +774,8 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 	pthread_mutex_unlock(&priv_list_lock);
 	return 0;
 error:
+	if (conf_thread_mng.initializer_priv == priv)
+		mlx5_vdpa_mult_threads_destroy(false);
 	if (priv)
 		mlx5_vdpa_dev_release(priv);
 	return -rte_errno;
@@ -806,6 +843,10 @@ mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv)
 	mlx5_vdpa_release_dev_resources(priv);
 	if (priv->vdev)
 		rte_vdpa_unregister_device(priv->vdev);
+	if (priv->use_c_thread)
+		if (__atomic_fetch_sub(&conf_thread_mng.refcnt,
+			1, __ATOMIC_RELAXED) == 1)
+			mlx5_vdpa_mult_threads_destroy(true);
 	rte_free(priv);
 }
 
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 3fd5eefc5e..4e7c2557b7 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -73,6 +73,22 @@ enum {
 	MLX5_VDPA_NOTIFIER_STATE_ERR
 };
 
+#define MLX5_VDPA_MAX_C_THRD 256
+
+/* Generic mlx5_vdpa_c_thread information. */
+struct mlx5_vdpa_c_thread {
+	pthread_t tid;
+};
+
+struct mlx5_vdpa_conf_thread_mng {
+	void *initializer_priv;
+	uint32_t refcnt;
+	uint32_t max_thrds;
+	pthread_mutex_t cthrd_lock;
+	struct mlx5_vdpa_c_thread cthrd[MLX5_VDPA_MAX_C_THRD];
+};
+extern struct mlx5_vdpa_conf_thread_mng conf_thread_mng;
+
 struct mlx5_vdpa_virtq {
 	SLIST_ENTRY(mlx5_vdpa_virtq) next;
 	uint8_t enable;
@@ -126,6 +142,7 @@ enum mlx5_dev_state {
 struct mlx5_vdpa_priv {
 	TAILQ_ENTRY(mlx5_vdpa_priv) next;
 	bool connected;
+	bool use_c_thread;
 	enum mlx5_dev_state state;
 	rte_spinlock_t db_lock;
 	pthread_mutex_t steer_update_lock;
@@ -496,4 +513,23 @@ mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv);
 
 bool
 mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv);
+
+/**
+ * Create configuration multi-threads resource
+ *
+ * @param[in] cpu_core
+ *   CPU core number to set configuration threads affinity to.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int
+mlx5_vdpa_mult_threads_create(int cpu_core);
+
+/**
+ * Destroy configuration multi-threads resource
+ *
+ */
+void
+mlx5_vdpa_mult_threads_destroy(bool need_unlock);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
new file mode 100644
index 0000000000..ba7d8b63b3
--- /dev/null
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -0,0 +1,129 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 NVIDIA Corporation & Affiliates
+ */
+#include <string.h>
+#include <unistd.h>
+#include <sys/eventfd.h>
+
+#include <rte_malloc.h>
+#include <rte_errno.h>
+#include <rte_io.h>
+#include <rte_alarm.h>
+#include <rte_tailq.h>
+#include <rte_ring_elem.h>
+
+#include <mlx5_common.h>
+
+#include "mlx5_vdpa_utils.h"
+#include "mlx5_vdpa.h"
+
+static void *
+mlx5_vdpa_c_thread_handle(void *arg)
+{
+	/* To be added later. */
+	return arg;
+}
+
+static void
+mlx5_vdpa_c_thread_destroy(uint32_t thrd_idx, bool need_unlock)
+{
+	if (conf_thread_mng.cthrd[thrd_idx].tid) {
+		pthread_cancel(conf_thread_mng.cthrd[thrd_idx].tid);
+		pthread_join(conf_thread_mng.cthrd[thrd_idx].tid, NULL);
+		conf_thread_mng.cthrd[thrd_idx].tid = 0;
+		if (need_unlock)
+			pthread_mutex_init(&conf_thread_mng.cthrd_lock, NULL);
+	}
+}
+
+static int
+mlx5_vdpa_c_thread_create(int cpu_core)
+{
+	const struct sched_param sp = {
+		.sched_priority = sched_get_priority_max(SCHED_RR),
+	};
+	rte_cpuset_t cpuset;
+	pthread_attr_t attr;
+	uint32_t thrd_idx;
+	char name[32];
+	int ret;
+
+	pthread_mutex_lock(&conf_thread_mng.cthrd_lock);
+	pthread_attr_init(&attr);
+	ret = pthread_attr_setschedpolicy(&attr, SCHED_RR);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to set thread sched policy = RR.");
+		goto c_thread_err;
+	}
+	ret = pthread_attr_setschedparam(&attr, &sp);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to set thread priority.");
+		goto c_thread_err;
+	}
+	for (thrd_idx = 0; thrd_idx < conf_thread_mng.max_thrds;
+		thrd_idx++) {
+		ret = pthread_create(&conf_thread_mng.cthrd[thrd_idx].tid,
+				&attr, mlx5_vdpa_c_thread_handle,
+				(void *)&conf_thread_mng);
+		if (ret) {
+			DRV_LOG(ERR, "Failed to create vdpa multi-threads %d.",
+					thrd_idx);
+			goto c_thread_err;
+		}
+		CPU_ZERO(&cpuset);
+		if (cpu_core != -1)
+			CPU_SET(cpu_core, &cpuset);
+		else
+			cpuset = rte_lcore_cpuset(rte_get_main_lcore());
+		ret = pthread_setaffinity_np(
+				conf_thread_mng.cthrd[thrd_idx].tid,
+				sizeof(cpuset), &cpuset);
+		if (ret) {
+			DRV_LOG(ERR, "Failed to set thread affinity for "
+			"vdpa multi-threads %d.", thrd_idx);
+			goto c_thread_err;
+		}
+		snprintf(name, sizeof(name), "vDPA-mthread-%d", thrd_idx);
+		ret = pthread_setname_np(
+				conf_thread_mng.cthrd[thrd_idx].tid, name);
+		if (ret)
+			DRV_LOG(ERR, "Failed to set vdpa multi-threads name %s.",
+					name);
+		else
+			DRV_LOG(DEBUG, "Thread name: %s.", name);
+	}
+	pthread_mutex_unlock(&conf_thread_mng.cthrd_lock);
+	return 0;
+c_thread_err:
+	for (thrd_idx = 0; thrd_idx < conf_thread_mng.max_thrds;
+		thrd_idx++)
+		mlx5_vdpa_c_thread_destroy(thrd_idx, false);
+	pthread_mutex_unlock(&conf_thread_mng.cthrd_lock);
+	return -1;
+}
+
+int
+mlx5_vdpa_mult_threads_create(int cpu_core)
+{
+	pthread_mutex_init(&conf_thread_mng.cthrd_lock, NULL);
+	if (mlx5_vdpa_c_thread_create(cpu_core)) {
+		DRV_LOG(ERR, "Cannot create vDPA configuration threads.");
+		mlx5_vdpa_mult_threads_destroy(false);
+		return -1;
+	}
+	return 0;
+}
+
+void
+mlx5_vdpa_mult_threads_destroy(bool need_unlock)
+{
+	uint32_t thrd_idx;
+
+	if (!conf_thread_mng.initializer_priv)
+		return;
+	for (thrd_idx = 0; thrd_idx < conf_thread_mng.max_thrds;
+		thrd_idx++)
+		mlx5_vdpa_c_thread_destroy(thrd_idx, need_unlock);
+	pthread_mutex_destroy(&conf_thread_mng.cthrd_lock);
+	memset(&conf_thread_mng, 0, sizeof(struct mlx5_vdpa_conf_thread_mng));
+}
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index 2b0f5936d1..b45fbac146 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -507,7 +507,7 @@ mlx5_vdpa_cqe_event_setup(struct mlx5_vdpa_priv *priv)
 	pthread_attr_t attr;
 	char name[16];
 	const struct sched_param sp = {
-		.sched_priority = sched_get_priority_max(SCHED_RR),
+		.sched_priority = sched_get_priority_max(SCHED_RR) - 1,
 	};
 
 	if (!priv->eventc)
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index c2c5386075..b884da4ded 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -43,7 +43,7 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 			    errno == EWOULDBLOCK ||
 			    errno == EAGAIN)
 				continue;
-			DRV_LOG(ERR,  "Failed to read kickfd of virtq %d: %s",
+			DRV_LOG(ERR,  "Failed to read kickfd of virtq %d: %s.",
 				virtq->index, strerror(errno));
 		}
 		break;
@@ -57,7 +57,7 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 	rte_spinlock_unlock(&priv->db_lock);
 	pthread_mutex_unlock(&virtq->virtq_lock);
 	if (priv->state != MLX5_VDPA_STATE_CONFIGURED && !virtq->enable) {
-		DRV_LOG(ERR,  "device %d queue %d down, skip kick handling",
+		DRV_LOG(ERR,  "device %d queue %d down, skip kick handling.",
 			priv->vid, virtq->index);
 		return;
 	}
@@ -215,7 +215,7 @@ mlx5_vdpa_virtq_query(struct mlx5_vdpa_priv *priv, int index)
 		return -1;
 	}
 	if (attr.state == MLX5_VIRTQ_STATE_ERROR)
-		DRV_LOG(WARNING, "vid %d vring %d hw error=%hhu",
+		DRV_LOG(WARNING, "vid %d vring %d hw error=%hhu.",
 			priv->vid, index, attr.error_type);
 	return 0;
 }
@@ -377,7 +377,7 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 	if (ret) {
 		last_avail_idx = 0;
 		last_used_idx = 0;
-		DRV_LOG(WARNING, "Couldn't get vring base, idx are set to 0");
+		DRV_LOG(WARNING, "Couldn't get vring base, idx are set to 0.");
 	} else {
 		DRV_LOG(INFO, "vid %d: Init last_avail_idx=%d, last_used_idx=%d for "
 				"virtq %d.", priv->vid, last_avail_idx,
-- 
2.27.0


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [RFC 09/15] vdpa/mlx5: add task ring for MT management
  2022-04-08  7:55 [RFC 00/15] Add vDPA multi-threads optiomization Li Zhang
                   ` (7 preceding siblings ...)
  2022-04-08  7:55 ` [RFC 08/15] vdpa/mlx5: add multi-thread management for configuration Li Zhang
@ 2022-04-08  7:55 ` Li Zhang
  2022-04-08  7:56 ` [RFC 10/15] vdpa/mlx5: add MT task for VM memory registration Li Zhang
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-04-08  7:55 UTC (permalink / raw)
  To: matan, viacheslavo, orika, thomas; +Cc: dev, rasland

The configuration thread tasks need a container that
supports multiple tasks assigned to a thread in parallel.
Use an rte_ring container per thread to manage
the thread tasks without locks.
The caller thread, from the user context, opens a task to
a thread and enqueues it to the thread ring.
The thread polls its ring and dequeues tasks.
That is why the ring should be in multi-producer
and single-consumer mode.
An atomic counter manages the task completion notification.
The threads report errors to the caller through
a dedicated error counter per task.
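
For clarity, a small self-contained sketch of the completion protocol only
(the names are illustrative, not the driver API): the caller bumps a shared
remaining counter per queued task, each worker decrements it when the task
is done and bumps an error counter on failure, and the caller sleeps in
short steps until the remaining counter returns to zero.

#include <stdint.h>
#include <rte_cycles.h> /* rte_delay_us_sleep() */

struct task_batch {
	uint32_t remaining; /* tasks still owned by worker threads */
	uint32_t errors;    /* tasks that completed with an error */
};

/* Caller side: account for one task handed to a worker. */
static inline void
batch_task_queued(struct task_batch *b)
{
	__atomic_fetch_add(&b->remaining, 1, __ATOMIC_RELAXED);
}

/* Worker side: report one task completion, successful or not. */
static inline void
batch_task_done(struct task_batch *b, int err)
{
	if (err)
		__atomic_fetch_add(&b->errors, 1, __ATOMIC_RELAXED);
	__atomic_fetch_sub(&b->remaining, 1, __ATOMIC_RELAXED);
}

/* Caller side: wait for the whole batch, return -1 if any task failed. */
static inline int
batch_wait(struct task_batch *b, unsigned int poll_us)
{
	while (__atomic_load_n(&b->remaining, __ATOMIC_RELAXED) != 0)
		rte_delay_us_sleep(poll_us);
	return __atomic_load_n(&b->errors, __ATOMIC_RELAXED) ? -1 : 0;
}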

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  17 ++++
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 118 +++++++++++++++++++++++++-
 2 files changed, 133 insertions(+), 2 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 4e7c2557b7..2bbb868ec6 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -74,10 +74,22 @@ enum {
 };
 
 #define MLX5_VDPA_MAX_C_THRD 256
+#define MLX5_VDPA_MAX_TASKS_PER_THRD 4096
+#define MLX5_VDPA_TASKS_PER_DEV 64
+
+/* Generic task information and size must be multiple of 4B. */
+struct mlx5_vdpa_task {
+	struct mlx5_vdpa_priv *priv;
+	uint32_t *remaining_cnt;
+	uint32_t *err_cnt;
+	uint32_t idx;
+} __rte_packed __rte_aligned(4);
 
 /* Generic mlx5_vdpa_c_thread information. */
 struct mlx5_vdpa_c_thread {
 	pthread_t tid;
+	struct rte_ring *rng;
+	pthread_cond_t c_cond;
 };
 
 struct mlx5_vdpa_conf_thread_mng {
@@ -532,4 +544,9 @@ mlx5_vdpa_mult_threads_create(int cpu_core);
  */
 void
 mlx5_vdpa_mult_threads_destroy(bool need_unlock);
+
+bool
+mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
+		uint32_t thrd_idx,
+		uint32_t num);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index ba7d8b63b3..8475d7788a 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -11,17 +11,106 @@
 #include <rte_alarm.h>
 #include <rte_tailq.h>
 #include <rte_ring_elem.h>
+#include <rte_ring_peek.h>
 
 #include <mlx5_common.h>
 
 #include "mlx5_vdpa_utils.h"
 #include "mlx5_vdpa.h"
 
+static inline uint32_t
+mlx5_vdpa_c_thrd_ring_dequeue_bulk(struct rte_ring *r,
+	void **obj, uint32_t n, uint32_t *avail)
+{
+	uint32_t m;
+
+	m = rte_ring_dequeue_bulk_elem_start(r, obj,
+		sizeof(struct mlx5_vdpa_task), n, avail);
+	n = (m == n) ? n : 0;
+	rte_ring_dequeue_elem_finish(r, n);
+	return n;
+}
+
+static inline uint32_t
+mlx5_vdpa_c_thrd_ring_enqueue_bulk(struct rte_ring *r,
+	void * const *obj, uint32_t n, uint32_t *free)
+{
+	uint32_t m;
+
+	m = rte_ring_enqueue_bulk_elem_start(r, n, free);
+	n = (m == n) ? n : 0;
+	rte_ring_enqueue_elem_finish(r, obj,
+		sizeof(struct mlx5_vdpa_task), n);
+	return n;
+}
+
+bool
+mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
+		uint32_t thrd_idx,
+		uint32_t num)
+{
+	struct rte_ring *rng = conf_thread_mng.cthrd[thrd_idx].rng;
+	struct mlx5_vdpa_task task[MLX5_VDPA_TASKS_PER_DEV];
+	uint32_t i;
+
+	MLX5_ASSERT(num <= MLX5_VDPA_TASKS_PER_DEV);
+	for (i = 0 ; i < num; i++) {
+		task[i].priv = priv;
+		/* To be added later. */
+	}
+	if (!mlx5_vdpa_c_thrd_ring_enqueue_bulk(rng, (void **)&task, num, NULL))
+		return -1;
+	for (i = 0 ; i < num; i++)
+		if (task[i].remaining_cnt)
+			__atomic_fetch_add(task[i].remaining_cnt, 1,
+				__ATOMIC_RELAXED);
+	/* wake up conf thread. */
+	pthread_mutex_lock(&conf_thread_mng.cthrd_lock);
+	pthread_cond_signal(&conf_thread_mng.cthrd[thrd_idx].c_cond);
+	pthread_mutex_unlock(&conf_thread_mng.cthrd_lock);
+	return 0;
+}
+
 static void *
 mlx5_vdpa_c_thread_handle(void *arg)
 {
-	/* To be added later. */
-	return arg;
+	struct mlx5_vdpa_conf_thread_mng *multhrd = arg;
+	pthread_t thread_id = pthread_self();
+	struct mlx5_vdpa_priv *priv;
+	struct mlx5_vdpa_task task;
+	struct rte_ring *rng;
+	uint32_t thrd_idx;
+	uint32_t task_num;
+
+	for (thrd_idx = 0; thrd_idx < multhrd->max_thrds;
+		thrd_idx++)
+		if (multhrd->cthrd[thrd_idx].tid == thread_id)
+			break;
+	if (thrd_idx >= multhrd->max_thrds) {
+		DRV_LOG(ERR, "Invalid thread_id 0x%lx in vdpa multi-thread",
+			thread_id);
+		return NULL;
+	}
+	rng = multhrd->cthrd[thrd_idx].rng;
+	while (1) {
+		task_num = mlx5_vdpa_c_thrd_ring_dequeue_bulk(rng,
+			(void **)&task, 1, NULL);
+		if (!task_num) {
+			/* No task and condition wait. */
+			pthread_mutex_lock(&multhrd->cthrd_lock);
+			pthread_cond_wait(
+				&multhrd->cthrd[thrd_idx].c_cond,
+				&multhrd->cthrd_lock);
+			pthread_mutex_unlock(&multhrd->cthrd_lock);
+		}
+		priv = task.priv;
+		if (priv == NULL)
+			continue;
+		__atomic_fetch_sub(task.remaining_cnt,
+			1, __ATOMIC_RELAXED);
+		/* To be added later. */
+	}
+	return NULL;
 }
 
 static void
@@ -34,6 +123,10 @@ mlx5_vdpa_c_thread_destroy(uint32_t thrd_idx, bool need_unlock)
 		if (need_unlock)
 			pthread_mutex_init(&conf_thread_mng.cthrd_lock, NULL);
 	}
+	if (conf_thread_mng.cthrd[thrd_idx].rng) {
+		rte_ring_free(conf_thread_mng.cthrd[thrd_idx].rng);
+		conf_thread_mng.cthrd[thrd_idx].rng = NULL;
+	}
 }
 
 static int
@@ -45,6 +138,7 @@ mlx5_vdpa_c_thread_create(int cpu_core)
 	rte_cpuset_t cpuset;
 	pthread_attr_t attr;
 	uint32_t thrd_idx;
+	uint32_t ring_num;
 	char name[32];
 	int ret;
 
@@ -60,8 +154,26 @@ mlx5_vdpa_c_thread_create(int cpu_core)
 		DRV_LOG(ERR, "Failed to set thread priority.");
 		goto c_thread_err;
 	}
+	ring_num = MLX5_VDPA_MAX_TASKS_PER_THRD / conf_thread_mng.max_thrds;
+	if (!ring_num) {
+		DRV_LOG(ERR, "Invaild ring number for thread.");
+		goto c_thread_err;
+	}
 	for (thrd_idx = 0; thrd_idx < conf_thread_mng.max_thrds;
 		thrd_idx++) {
+		snprintf(name, sizeof(name), "vDPA-mthread-ring-%d",
+			thrd_idx);
+		conf_thread_mng.cthrd[thrd_idx].rng = rte_ring_create_elem(name,
+			sizeof(struct mlx5_vdpa_task), ring_num,
+			rte_socket_id(),
+			RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ |
+			RING_F_EXACT_SZ);
+		if (!conf_thread_mng.cthrd[thrd_idx].rng) {
+			DRV_LOG(ERR,
+			"Failed to create vdpa multi-threads %d ring.",
+			thrd_idx);
+			goto c_thread_err;
+		}
 		ret = pthread_create(&conf_thread_mng.cthrd[thrd_idx].tid,
 				&attr, mlx5_vdpa_c_thread_handle,
 				(void *)&conf_thread_mng);
@@ -91,6 +203,8 @@ mlx5_vdpa_c_thread_create(int cpu_core)
 					name);
 		else
 			DRV_LOG(DEBUG, "Thread name: %s.", name);
+		pthread_cond_init(&conf_thread_mng.cthrd[thrd_idx].c_cond,
+			NULL);
 	}
 	pthread_mutex_unlock(&conf_thread_mng.cthrd_lock);
 	return 0;
-- 
2.27.0


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [RFC 10/15] vdpa/mlx5: add MT task for VM memory registration
  2022-04-08  7:55 [RFC 00/15] Add vDPA multi-threads optiomization Li Zhang
                   ` (8 preceding siblings ...)
  2022-04-08  7:55 ` [RFC 09/15] vdpa/mlx5: add task ring for MT management Li Zhang
@ 2022-04-08  7:56 ` Li Zhang
  2022-04-08  7:56 ` [RFC 11/15] vdpa/mlx5: add virtq creation task for MT management Li Zhang
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-04-08  7:56 UTC (permalink / raw)
  To: matan, viacheslavo, orika, thomas; +Cc: dev, rasland

The driver creates a direct MR object of
the HW for each VM memory region,
which maps the VM physical address to
the actual physical address.

Later, after all the MRs are ready,
the driver creates an indirect MR to group all the direct MRs
into one virtual space from the HW perspective.

Create direct MRs in parallel using the MT mechanism.
After completion, the master thread creates the indirect MR
needed for the subsequent virtq configurations.

This optimization accelerates the LM process and
reduces its time by 5%.
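
A self-contained sketch of the distribution policy (helper names are
invented for illustration): every (nb_thrds + 1)-th region stays on the
caller thread so it shares the work, the other regions are handed
round-robin to the worker threads, and a region whose task cannot be
queued falls back to the caller:

#include <stdint.h>
#include <stdio.h>

/* Stand-ins for the real per-region registration and the task ring. */
static int
register_region_locally(uint32_t idx)
{
	printf("caller thread: register region %u\n", idx);
	return 0;
}

static int
queue_region_to_worker(uint32_t thrd, uint32_t idx)
{
	printf("worker %u: register region %u\n", thrd, idx);
	return 0;
}

static int
spread_regions(uint32_t nb_regions, uint32_t nb_thrds)
{
	uint32_t i, thrd = 0;

	for (i = 0; i < nb_regions; i++) {
		/* Keep a share on the caller; also used as a fallback. */
		if (i % (nb_thrds + 1) == 0 ||
		    queue_region_to_worker(thrd, i) != 0) {
			if (register_region_locally(i) != 0)
				return -1;
			continue;
		}
		thrd = (thrd + 1) % nb_thrds;
	}
	return 0;
}

int
main(void)
{
	return spread_regions(8, 3);
}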

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c         |   1 -
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  31 ++-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c |  47 ++++-
 drivers/vdpa/mlx5/mlx5_vdpa_mem.c     | 268 +++++++++++++++++---------
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   |   6 +-
 5 files changed, 256 insertions(+), 97 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index eace0e4c9e..8dd8e6a2a0 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -768,7 +768,6 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 		rte_errno = rte_errno ? rte_errno : EINVAL;
 		goto error;
 	}
-	SLIST_INIT(&priv->mr_list);
 	pthread_mutex_lock(&priv_list_lock);
 	TAILQ_INSERT_TAIL(&priv_list, priv, next);
 	pthread_mutex_unlock(&priv_list_lock);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 2bbb868ec6..3316ce42be 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -59,7 +59,6 @@ struct mlx5_vdpa_event_qp {
 };
 
 struct mlx5_vdpa_query_mr {
-	SLIST_ENTRY(mlx5_vdpa_query_mr) next;
 	union {
 		struct ibv_mr *mr;
 		struct mlx5_devx_obj *mkey;
@@ -76,10 +75,17 @@ enum {
 #define MLX5_VDPA_MAX_C_THRD 256
 #define MLX5_VDPA_MAX_TASKS_PER_THRD 4096
 #define MLX5_VDPA_TASKS_PER_DEV 64
+#define MLX5_VDPA_MAX_MRS 0xFFFF
+
+/* Vdpa task types. */
+enum mlx5_vdpa_task_type {
+	MLX5_VDPA_TASK_REG_MR = 1,
+};
 
 /* Generic task information and size must be multiple of 4B. */
 struct mlx5_vdpa_task {
 	struct mlx5_vdpa_priv *priv;
+	enum mlx5_vdpa_task_type type;
 	uint32_t *remaining_cnt;
 	uint32_t *err_cnt;
 	uint32_t idx;
@@ -101,6 +107,14 @@ struct mlx5_vdpa_conf_thread_mng {
 };
 extern struct mlx5_vdpa_conf_thread_mng conf_thread_mng;
 
+struct mlx5_vdpa_vmem_info {
+	struct rte_vhost_memory *vmem;
+	uint32_t entries_num;
+	uint64_t gcd;
+	uint64_t size;
+	uint8_t mode;
+};
+
 struct mlx5_vdpa_virtq {
 	SLIST_ENTRY(mlx5_vdpa_virtq) next;
 	uint8_t enable;
@@ -176,7 +190,7 @@ struct mlx5_vdpa_priv {
 	struct mlx5_hca_vdpa_attr caps;
 	uint32_t gpa_mkey_index;
 	struct ibv_mr *null_mr;
-	struct rte_vhost_memory *vmem;
+	struct mlx5_vdpa_vmem_info vmem_info;
 	struct mlx5dv_devx_event_channel *eventc;
 	struct mlx5dv_devx_event_channel *err_chnl;
 	struct mlx5_uar uar;
@@ -187,11 +201,13 @@ struct mlx5_vdpa_priv {
 	uint8_t num_lag_ports;
 	uint64_t features; /* Negotiated features. */
 	uint16_t log_max_rqt_size;
+	uint16_t last_c_thrd_idx;
+	uint16_t num_mrs; /* Number of memory regions. */
 	struct mlx5_vdpa_steer steer;
 	struct mlx5dv_var *var;
 	void *virtq_db_addr;
 	struct mlx5_pmd_wrapped_mr lm_mr;
-	SLIST_HEAD(mr_list, mlx5_vdpa_query_mr) mr_list;
+	struct mlx5_vdpa_query_mr **mrs;
 	struct mlx5_vdpa_virtq virtqs[];
 };
 
@@ -548,5 +564,12 @@ mlx5_vdpa_mult_threads_destroy(bool need_unlock);
 bool
 mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
 		uint32_t thrd_idx,
-		uint32_t num);
+		enum mlx5_vdpa_task_type task_type,
+		uint32_t *bulk_refcnt, uint32_t *bulk_err_cnt,
+		void **task_data, uint32_t num);
+int
+mlx5_vdpa_register_mr(struct mlx5_vdpa_priv *priv, uint32_t idx);
+bool
+mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
+		uint32_t *err_cnt, uint32_t sleep_time);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index 8475d7788a..22e24f7e75 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -47,16 +47,23 @@ mlx5_vdpa_c_thrd_ring_enqueue_bulk(struct rte_ring *r,
 bool
 mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
 		uint32_t thrd_idx,
-		uint32_t num)
+		enum mlx5_vdpa_task_type task_type,
+		uint32_t *remaining_cnt, uint32_t *err_cnt,
+		void **task_data, uint32_t num)
 {
 	struct rte_ring *rng = conf_thread_mng.cthrd[thrd_idx].rng;
 	struct mlx5_vdpa_task task[MLX5_VDPA_TASKS_PER_DEV];
+	uint32_t *data = (uint32_t *)task_data;
 	uint32_t i;
 
 	MLX5_ASSERT(num <= MLX5_VDPA_TASKS_PER_DEV);
 	for (i = 0 ; i < num; i++) {
 		task[i].priv = priv;
 		/* To be added later. */
+		task[i].type = task_type;
+		task[i].remaining_cnt = remaining_cnt;
+		task[i].err_cnt = err_cnt;
+		task[i].idx = data[i];
 	}
 	if (!mlx5_vdpa_c_thrd_ring_enqueue_bulk(rng, (void **)&task, num, NULL))
 		return -1;
@@ -71,6 +78,23 @@ mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
 	return 0;
 }
 
+bool
+mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
+		uint32_t *err_cnt, uint32_t sleep_time)
+{
+	/* Check and wait all tasks done. */
+	while (__atomic_load_n(remaining_cnt,
+		__ATOMIC_RELAXED) != 0) {
+		rte_delay_us_sleep(sleep_time);
+	}
+	if (__atomic_load_n(err_cnt,
+		__ATOMIC_RELAXED)) {
+		DRV_LOG(ERR, "Tasks done with error.");
+		return true;
+	}
+	return false;
+}
+
 static void *
 mlx5_vdpa_c_thread_handle(void *arg)
 {
@@ -81,6 +105,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
 	struct rte_ring *rng;
 	uint32_t thrd_idx;
 	uint32_t task_num;
+	int ret;
 
 	for (thrd_idx = 0; thrd_idx < multhrd->max_thrds;
 		thrd_idx++)
@@ -102,13 +127,29 @@ mlx5_vdpa_c_thread_handle(void *arg)
 				&multhrd->cthrd[thrd_idx].c_cond,
 				&multhrd->cthrd_lock);
 			pthread_mutex_unlock(&multhrd->cthrd_lock);
+			continue;
 		}
 		priv = task.priv;
 		if (priv == NULL)
 			continue;
-		__atomic_fetch_sub(task.remaining_cnt,
+		switch (task.type) {
+		case MLX5_VDPA_TASK_REG_MR:
+			ret = mlx5_vdpa_register_mr(priv, task.idx);
+			if (ret) {
+				DRV_LOG(ERR,
+				"Failed to register mr %d.", task.idx);
+				__atomic_fetch_add(task.err_cnt, 1,
+				__ATOMIC_RELAXED);
+			}
+			break;
+		default:
+			DRV_LOG(ERR, "Invalid vdpa task type %d.",
+			task.type);
+			break;
+		}
+		if (task.remaining_cnt)
+			__atomic_fetch_sub(task.remaining_cnt,
 			1, __ATOMIC_RELAXED);
-		/* To be added later. */
 	}
 	return NULL;
 }
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_mem.c b/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
index d6e3dd664b..3d17ca88af 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
@@ -18,24 +18,30 @@ void
 mlx5_vdpa_mem_dereg(struct mlx5_vdpa_priv *priv)
 {
 	struct mlx5_vdpa_query_mr *entry;
-	struct mlx5_vdpa_query_mr *next;
+	uint32_t i;
 
-	entry = SLIST_FIRST(&priv->mr_list);
-	while (entry) {
-		next = SLIST_NEXT(entry, next);
-		if (entry->is_indirect)
-			claim_zero(mlx5_devx_cmd_destroy(entry->mkey));
-		else
-			claim_zero(mlx5_glue->dereg_mr(entry->mr));
-		SLIST_REMOVE(&priv->mr_list, entry, mlx5_vdpa_query_mr, next);
-		rte_free(entry);
-		entry = next;
+	if (priv->mrs) {
+		for (i = 0; i < priv->num_mrs; i++) {
+			entry = (struct mlx5_vdpa_query_mr *)&priv->mrs[i];
+			if (entry->is_indirect) {
+				if (entry->mkey)
+					claim_zero(
+					mlx5_devx_cmd_destroy(entry->mkey));
+			} else {
+				if (entry->mr)
+					claim_zero(
+					mlx5_glue->dereg_mr(entry->mr));
+			}
+		}
+		rte_free(priv->mrs);
+		priv->mrs = NULL;
+		priv->num_mrs = 0;
 	}
-	SLIST_INIT(&priv->mr_list);
-	if (priv->vmem) {
-		free(priv->vmem);
-		priv->vmem = NULL;
+	if (priv->vmem_info.vmem) {
+		free(priv->vmem_info.vmem);
+		priv->vmem_info.vmem = NULL;
 	}
+	priv->gpa_mkey_index = 0;
 }
 
 static int
@@ -167,72 +173,29 @@ mlx5_vdpa_mem_cmp(struct rte_vhost_memory *mem1, struct rte_vhost_memory *mem2)
 #define KLM_SIZE_MAX_ALIGN(sz) ((sz) > MLX5_MAX_KLM_BYTE_COUNT ? \
 				MLX5_MAX_KLM_BYTE_COUNT : (sz))
 
-/*
- * The target here is to group all the physical memory regions of the
- * virtio device in one indirect mkey.
- * For KLM Fixed Buffer Size mode (HW find the translation entry in one
- * read according to the guest physical address):
- * All the sub-direct mkeys of it must be in the same size, hence, each
- * one of them should be in the GCD size of all the virtio memory
- * regions and the holes between them.
- * For KLM mode (each entry may be in different size so HW must iterate
- * the entries):
- * Each virtio memory region and each hole between them have one entry,
- * just need to cover the maximum allowed size(2G) by splitting entries
- * which their associated memory regions are bigger than 2G.
- * It means that each virtio memory region may be mapped to more than
- * one direct mkey in the 2 modes.
- * All the holes of invalid memory between the virtio memory regions
- * will be mapped to the null memory region for security.
- */
-int
-mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
+static int
+mlx5_vdpa_create_indirect_mkey(struct mlx5_vdpa_priv *priv)
 {
 	struct mlx5_devx_mkey_attr mkey_attr;
-	struct mlx5_vdpa_query_mr *entry = NULL;
-	struct rte_vhost_mem_region *reg = NULL;
-	uint8_t mode = 0;
-	uint32_t entries_num = 0;
-	uint32_t i;
-	uint64_t gcd = 0;
+	struct mlx5_vdpa_query_mr *mrs =
+		(struct mlx5_vdpa_query_mr *)priv->mrs;
+	struct mlx5_vdpa_query_mr *entry;
+	struct rte_vhost_mem_region *reg;
+	uint8_t mode = priv->vmem_info.mode;
+	uint32_t entries_num = priv->vmem_info.entries_num;
+	struct rte_vhost_memory *mem = priv->vmem_info.vmem;
+	struct mlx5_klm klm_array[entries_num];
+	uint64_t gcd = priv->vmem_info.gcd;
+	int ret = -rte_errno;
 	uint64_t klm_size;
-	uint64_t mem_size;
-	uint64_t k;
 	int klm_index = 0;
-	int ret;
-	struct rte_vhost_memory *mem = mlx5_vdpa_vhost_mem_regions_prepare
-			      (priv->vid, &mode, &mem_size, &gcd, &entries_num);
-	struct mlx5_klm klm_array[entries_num];
+	uint64_t k;
+	uint32_t i;
 
-	if (!mem)
-		return -rte_errno;
-	if (priv->vmem != NULL) {
-		if (mlx5_vdpa_mem_cmp(mem, priv->vmem) == 0) {
-			/* VM memory not changed, reuse resources. */
-			free(mem);
-			return 0;
-		}
-		mlx5_vdpa_mem_dereg(priv);
-	}
-	priv->vmem = mem;
+	/* If it is the last entry, create indirect mkey. */
 	for (i = 0; i < mem->nregions; i++) {
+		entry = &mrs[i];
 		reg = &mem->regions[i];
-		entry = rte_zmalloc(__func__, sizeof(*entry), 0);
-		if (!entry) {
-			ret = -ENOMEM;
-			DRV_LOG(ERR, "Failed to allocate mem entry memory.");
-			goto error;
-		}
-		entry->mr = mlx5_glue->reg_mr_iova(priv->cdev->pd,
-				       (void *)(uintptr_t)(reg->host_user_addr),
-				       reg->size, reg->guest_phys_addr,
-				       IBV_ACCESS_LOCAL_WRITE);
-		if (!entry->mr) {
-			DRV_LOG(ERR, "Failed to create direct Mkey.");
-			ret = -rte_errno;
-			goto error;
-		}
-		entry->is_indirect = 0;
 		if (i > 0) {
 			uint64_t sadd;
 			uint64_t empty_region_sz = reg->guest_phys_addr -
@@ -265,11 +228,10 @@ mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
 			klm_array[klm_index].address = reg->guest_phys_addr + k;
 			klm_index++;
 		}
-		SLIST_INSERT_HEAD(&priv->mr_list, entry, next);
 	}
 	memset(&mkey_attr, 0, sizeof(mkey_attr));
 	mkey_attr.addr = (uintptr_t)(mem->regions[0].guest_phys_addr);
-	mkey_attr.size = mem_size;
+	mkey_attr.size = priv->vmem_info.size;
 	mkey_attr.pd = priv->cdev->pdn;
 	mkey_attr.umem_id = 0;
 	/* Must be zero for KLM mode. */
@@ -278,25 +240,159 @@ mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
 	mkey_attr.pg_access = 0;
 	mkey_attr.klm_array = klm_array;
 	mkey_attr.klm_num = klm_index;
-	entry = rte_zmalloc(__func__, sizeof(*entry), 0);
-	if (!entry) {
-		DRV_LOG(ERR, "Failed to allocate memory for indirect entry.");
-		ret = -ENOMEM;
-		goto error;
-	}
+	entry = &mrs[mem->nregions];
 	entry->mkey = mlx5_devx_cmd_mkey_create(priv->cdev->ctx, &mkey_attr);
 	if (!entry->mkey) {
 		DRV_LOG(ERR, "Failed to create indirect Mkey.");
-		ret = -rte_errno;
-		goto error;
+		rte_errno = -ret;
+		return ret;
 	}
 	entry->is_indirect = 1;
-	SLIST_INSERT_HEAD(&priv->mr_list, entry, next);
 	priv->gpa_mkey_index = entry->mkey->id;
 	return 0;
+}
+
+/*
+ * The target here is to group all the physical memory regions of the
+ * virtio device in one indirect mkey.
+ * For KLM Fixed Buffer Size mode (HW find the translation entry in one
+ * read according to the guest physical address):
+ * All the sub-direct mkeys of it must be in the same size, hence, each
+ * one of them should be in the GCD size of all the virtio memory
+ * regions and the holes between them.
+ * For KLM mode (each entry may be in different size so HW must iterate
+ * the entries):
+ * Each virtio memory region and each hole between them have one entry,
+ * just need to cover the maximum allowed size(2G) by splitting entries
+ * which their associated memory regions are bigger than 2G.
+ * It means that each virtio memory region may be mapped to more than
+ * one direct mkey in the 2 modes.
+ * All the holes of invalid memory between the virtio memory regions
+ * will be mapped to the null memory region for security.
+ */
+int
+mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
+{
+	void *mrs;
+	uint8_t mode = 0;
+	int ret = -rte_errno;
+	uint32_t i, thrd_idx, data[1];
+	uint32_t remaining_cnt = 0, err_cnt = 0, task_num = 0;
+	struct rte_vhost_memory *mem = mlx5_vdpa_vhost_mem_regions_prepare
+			(priv->vid, &mode, &priv->vmem_info.size,
+			&priv->vmem_info.gcd, &priv->vmem_info.entries_num);
+
+	if (!mem)
+		return -rte_errno;
+	if (priv->vmem_info.vmem != NULL) {
+		if (mlx5_vdpa_mem_cmp(mem, priv->vmem_info.vmem) == 0) {
+			/* VM memory not changed, reuse resources. */
+			free(mem);
+			return 0;
+		}
+		mlx5_vdpa_mem_dereg(priv);
+	}
+	priv->vmem_info.vmem = mem;
+	priv->vmem_info.mode = mode;
+	priv->num_mrs = mem->nregions;
+	if (!priv->num_mrs || priv->num_mrs >= MLX5_VDPA_MAX_MRS) {
+		DRV_LOG(ERR,
+		"Invaild number of memory regions.");
+		goto error;
+	}
+	/* The last one is indirect mkey entry. */
+	priv->num_mrs++;
+	mrs = rte_zmalloc("mlx5 vDPA memory regions",
+		sizeof(struct mlx5_vdpa_query_mr) * priv->num_mrs, 0);
+	priv->mrs = mrs;
+	if (!priv->mrs) {
+		DRV_LOG(ERR, "Failed to allocate private memory regions.");
+		goto error;
+	}
+	if (priv->use_c_thread) {
+		uint32_t main_task_idx[mem->nregions];
+
+		for (i = 0; i < mem->nregions; i++) {
+			thrd_idx = i % (conf_thread_mng.max_thrds + 1);
+			if (!thrd_idx) {
+				main_task_idx[task_num] = i;
+				task_num++;
+				continue;
+			}
+			thrd_idx = priv->last_c_thrd_idx + 1;
+			if (thrd_idx >= conf_thread_mng.max_thrds)
+				thrd_idx = 0;
+			priv->last_c_thrd_idx = thrd_idx;
+			data[0] = i;
+			if (mlx5_vdpa_task_add(priv, thrd_idx,
+				MLX5_VDPA_TASK_REG_MR,
+				&remaining_cnt, &err_cnt,
+				(void **)&data, 1)) {
+				DRV_LOG(ERR,
+				"Fail to add task mem region (%d)", i);
+				main_task_idx[task_num] = i;
+				task_num++;
+			}
+		}
+		for (i = 0; i < task_num; i++) {
+			ret = mlx5_vdpa_register_mr(priv,
+					main_task_idx[i]);
+			if (ret) {
+				DRV_LOG(ERR,
+				"Failed to register mem region %d.", i);
+				goto error;
+			}
+		}
+		if (mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining_cnt,
+			&err_cnt, 100)) {
+			DRV_LOG(ERR,
+			"Failed to wait register mem region tasks ready.");
+			goto error;
+		}
+	} else {
+		for (i = 0; i < mem->nregions; i++) {
+			ret = mlx5_vdpa_register_mr(priv, i);
+			if (ret) {
+				DRV_LOG(ERR,
+				"Failed to register mem region %d.", i);
+				goto error;
+			}
+		}
+	}
+	ret = mlx5_vdpa_create_indirect_mkey(priv);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to create indirect mkey .");
+		goto error;
+	}
+	return 0;
 error:
-	rte_free(entry);
 	mlx5_vdpa_mem_dereg(priv);
 	rte_errno = -ret;
 	return ret;
 }
+
+int
+mlx5_vdpa_register_mr(struct mlx5_vdpa_priv *priv, uint32_t idx)
+{
+	struct rte_vhost_memory *mem = priv->vmem_info.vmem;
+	struct mlx5_vdpa_query_mr *mrs =
+		(struct mlx5_vdpa_query_mr *)priv->mrs;
+	struct mlx5_vdpa_query_mr *entry;
+	struct rte_vhost_mem_region *reg;
+	int ret;
+
+	reg = &mem->regions[idx];
+	entry = &mrs[idx];
+	entry->mr = mlx5_glue->reg_mr_iova
+				      (priv->cdev->pd,
+				       (void *)(uintptr_t)(reg->host_user_addr),
+				       reg->size, reg->guest_phys_addr,
+				       IBV_ACCESS_LOCAL_WRITE);
+	if (!entry->mr) {
+		DRV_LOG(ERR, "Failed to create direct Mkey.");
+		ret = -rte_errno;
+		return ret;
+	}
+	entry->is_indirect = 0;
+	return 0;
+}
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index b884da4ded..3be09f218f 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -350,21 +350,21 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 		}
 	}
 	if (attr->q_type == MLX5_VIRTQ_TYPE_SPLIT) {
-		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
+		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem_info.vmem,
 					   (uint64_t)(uintptr_t)vq->desc);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get descriptor ring GPA.");
 			return -1;
 		}
 		attr->desc_addr = gpa;
-		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
+		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem_info.vmem,
 					   (uint64_t)(uintptr_t)vq->used);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get GPA for used ring.");
 			return -1;
 		}
 		attr->used_addr = gpa;
-		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
+		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem_info.vmem,
 					   (uint64_t)(uintptr_t)vq->avail);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get GPA for available ring.");
-- 
2.27.0


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [RFC 11/15] vdpa/mlx5: add virtq creation task for MT management
  2022-04-08  7:55 [RFC 00/15] Add vDPA multi-threads optiomization Li Zhang
                   ` (9 preceding siblings ...)
  2022-04-08  7:56 ` [RFC 10/15] vdpa/mlx5: add MT task for VM memory registration Li Zhang
@ 2022-04-08  7:56 ` Li Zhang
  2022-04-08  7:56 ` [RFC 12/15] vdpa/mlx5: add virtq LM log task Li Zhang
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-04-08  7:56 UTC (permalink / raw)
  To: matan, viacheslavo, orika, thomas; +Cc: dev, rasland

The virtq object and all its sub-resources require many
FW commands and can be accelerated by the MT management.
Split the virtq creation between the configuration threads.
This accelerates the LM process and reduces its time by 20%.
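
A rough sketch of the resulting two-phase flow, under the assumption that
the kick-fd interrupt handlers are registered by the caller thread once
all worker tasks complete (the worker path calls the virtq setup with
reg_kick unset); every helper name below is invented for illustration:

#include <stdint.h>

static int dispatch_virtq_setup(uint16_t vq) { (void)vq; return 0; }
static int setup_virtq_locally(uint16_t vq) { (void)vq; return 0; }
static int register_virtq_kickfd(uint16_t vq) { (void)vq; return 0; }
static void wait_all_setup_tasks(void) { }

static int
prepare_all_virtqs(uint16_t nr_vring)
{
	uint16_t i;

	/* Phase 1: spread the FW-heavy creation over the workers. */
	for (i = 0; i < nr_vring; i++)
		if (dispatch_virtq_setup(i) != 0 &&
		    setup_virtq_locally(i) != 0)
			return -1;
	wait_all_setup_tasks();
	/* Phase 2: single-threaded interrupt registration by the caller. */
	for (i = 0; i < nr_vring; i++)
		if (register_virtq_kickfd(i) != 0)
			return -1;
	return 0;
}

int
main(void)
{
	return prepare_all_virtqs(4);
}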

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.h         |   9 +-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c |  14 +++
 drivers/vdpa/mlx5/mlx5_vdpa_event.c   |   2 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 148 +++++++++++++++++++-------
 4 files changed, 133 insertions(+), 40 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 3316ce42be..35221f5ddc 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -80,6 +80,7 @@ enum {
 /* Vdpa task types. */
 enum mlx5_vdpa_task_type {
 	MLX5_VDPA_TASK_REG_MR = 1,
+	MLX5_VDPA_TASK_SETUP_VIRTQ,
 };
 
 /* Generic task information and size must be multiple of 4B. */
@@ -117,12 +118,12 @@ struct mlx5_vdpa_vmem_info {
 
 struct mlx5_vdpa_virtq {
 	SLIST_ENTRY(mlx5_vdpa_virtq) next;
-	uint8_t enable;
 	uint16_t index;
 	uint16_t vq_size;
 	uint8_t notifier_state;
-	bool stopped;
 	uint32_t configured:1;
+	uint32_t enable:1;
+	uint32_t stopped:1;
 	uint32_t version;
 	pthread_mutex_t virtq_lock;
 	struct mlx5_vdpa_priv *priv;
@@ -565,11 +566,13 @@ bool
 mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
 		uint32_t thrd_idx,
 		enum mlx5_vdpa_task_type task_type,
-		uint32_t *bulk_refcnt, uint32_t *bulk_err_cnt,
+		uint32_t *remaining_cnt, uint32_t *err_cnt,
 		void **task_data, uint32_t num);
 int
 mlx5_vdpa_register_mr(struct mlx5_vdpa_priv *priv, uint32_t idx);
 bool
 mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
 		uint32_t *err_cnt, uint32_t sleep_time);
+int
+mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index 22e24f7e75..a2d1ddb1e1 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -100,6 +100,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
 {
 	struct mlx5_vdpa_conf_thread_mng *multhrd = arg;
 	pthread_t thread_id = pthread_self();
+	struct mlx5_vdpa_virtq *virtq;
 	struct mlx5_vdpa_priv *priv;
 	struct mlx5_vdpa_task task;
 	struct rte_ring *rng;
@@ -142,6 +143,19 @@ mlx5_vdpa_c_thread_handle(void *arg)
 				__ATOMIC_RELAXED);
 			}
 			break;
+		case MLX5_VDPA_TASK_SETUP_VIRTQ:
+			virtq = &priv->virtqs[task.idx];
+			pthread_mutex_lock(&virtq->virtq_lock);
+			ret = mlx5_vdpa_virtq_setup(priv,
+				task.idx, false);
+			if (ret) {
+				DRV_LOG(ERR,
+					"Failed to setup virtq %d.", task.idx);
+				__atomic_fetch_add(
+					task.err_cnt, 1, __ATOMIC_RELAXED);
+			}
+			pthread_mutex_unlock(&virtq->virtq_lock);
+			break;
 		default:
 			DRV_LOG(ERR, "Invalid vdpa task type %d.",
 			task.type);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index b45fbac146..f782b6b832 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -371,7 +371,7 @@ mlx5_vdpa_err_interrupt_handler(void *cb_arg __rte_unused)
 			goto unlock;
 		if (rte_rdtsc() / rte_get_tsc_hz() < MLX5_VDPA_ERROR_TIME_SEC)
 			goto unlock;
-		virtq->stopped = true;
+		virtq->stopped = 1;
 		/* Query error info. */
 		if (mlx5_vdpa_virtq_query(priv, vq_index))
 			goto log;
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 3be09f218f..127b1cee7f 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -108,8 +108,9 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 	for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
 		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
+		if (virtq->index != i)
+			continue;
 		pthread_mutex_lock(&virtq->virtq_lock);
-		virtq->configured = 0;
 		for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
 			if (virtq->umems[j].obj) {
 				claim_zero(mlx5_glue->devx_umem_dereg
@@ -128,7 +129,6 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 	}
 }
 
-
 static int
 mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 {
@@ -188,7 +188,7 @@ mlx5_vdpa_virtq_stop(struct mlx5_vdpa_priv *priv, int index)
 	ret = mlx5_vdpa_virtq_modify(virtq, 0);
 	if (ret)
 		return -1;
-	virtq->stopped = true;
+	virtq->stopped = 1;
 	DRV_LOG(DEBUG, "vid %u virtq %u was stopped.", priv->vid, index);
 	return mlx5_vdpa_virtq_query(priv, index);
 }
@@ -408,7 +408,38 @@ mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv)
 }
 
 static int
-mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
+mlx5_vdpa_virtq_doorbell_setup(struct mlx5_vdpa_virtq *virtq,
+		struct rte_vhost_vring *vq, int index)
+{
+	virtq->intr_handle =
+		rte_intr_instance_alloc(RTE_INTR_INSTANCE_F_SHARED);
+	if (virtq->intr_handle == NULL) {
+		DRV_LOG(ERR, "Fail to allocate intr_handle");
+		return -1;
+	}
+	if (rte_intr_fd_set(virtq->intr_handle, vq->kickfd))
+		return -1;
+	if (rte_intr_fd_get(virtq->intr_handle) == -1) {
+		DRV_LOG(WARNING, "Virtq %d kickfd is invalid.", index);
+	} else {
+		if (rte_intr_type_set(virtq->intr_handle,
+			RTE_INTR_HANDLE_EXT))
+			return -1;
+		if (rte_intr_callback_register(virtq->intr_handle,
+			mlx5_vdpa_virtq_kick_handler, virtq)) {
+			rte_intr_fd_set(virtq->intr_handle, -1);
+			DRV_LOG(ERR, "Failed to register virtq %d interrupt.",
+				index);
+			return -1;
+		}
+		DRV_LOG(DEBUG, "Register fd %d interrupt for virtq %d.",
+			rte_intr_fd_get(virtq->intr_handle), index);
+	}
+	return 0;
+}
+
+int
+mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick)
 {
 	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
 	struct rte_vhost_vring vq;
@@ -452,33 +483,11 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 	rte_write32(virtq->index, priv->virtq_db_addr);
 	rte_spinlock_unlock(&priv->db_lock);
 	/* Setup doorbell mapping. */
-	virtq->intr_handle =
-		rte_intr_instance_alloc(RTE_INTR_INSTANCE_F_SHARED);
-	if (virtq->intr_handle == NULL) {
-		DRV_LOG(ERR, "Fail to allocate intr_handle");
-		goto error;
-	}
-
-	if (rte_intr_fd_set(virtq->intr_handle, vq.kickfd))
-		goto error;
-
-	if (rte_intr_fd_get(virtq->intr_handle) == -1) {
-		DRV_LOG(WARNING, "Virtq %d kickfd is invalid.", index);
-	} else {
-		if (rte_intr_type_set(virtq->intr_handle, RTE_INTR_HANDLE_EXT))
-			goto error;
-
-		if (rte_intr_callback_register(virtq->intr_handle,
-					       mlx5_vdpa_virtq_kick_handler,
-					       virtq)) {
-			rte_intr_fd_set(virtq->intr_handle, -1);
+	if (reg_kick) {
+		if (mlx5_vdpa_virtq_doorbell_setup(virtq, &vq, index)) {
 			DRV_LOG(ERR, "Failed to register virtq %d interrupt.",
 				index);
 			goto error;
-		} else {
-			DRV_LOG(DEBUG, "Register fd %d interrupt for virtq %d.",
-				rte_intr_fd_get(virtq->intr_handle),
-				index);
 		}
 	}
 	/* Subscribe virtq error event. */
@@ -494,7 +503,6 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 		rte_errno = errno;
 		goto error;
 	}
-	virtq->stopped = false;
 	/* Initial notification to ask Qemu handling completed buffers. */
 	if (virtq->eqp.cq.callfd != -1)
 		eventfd_write(virtq->eqp.cq.callfd, (eventfd_t)1);
@@ -564,10 +572,12 @@ mlx5_vdpa_features_validate(struct mlx5_vdpa_priv *priv)
 int
 mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 {
-	uint32_t i;
-	uint16_t nr_vring = rte_vhost_get_vring_num(priv->vid);
 	int ret = rte_vhost_get_negotiated_features(priv->vid, &priv->features);
+	uint16_t nr_vring = rte_vhost_get_vring_num(priv->vid);
+	uint32_t remaining_cnt = 0, err_cnt = 0, task_num = 0;
+	uint32_t i, thrd_idx, data[1];
 	struct mlx5_vdpa_virtq *virtq;
+	struct rte_vhost_vring vq;
 
 	if (ret || mlx5_vdpa_features_validate(priv)) {
 		DRV_LOG(ERR, "Failed to configure negotiated features.");
@@ -587,16 +597,82 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 		return -1;
 	}
 	priv->nr_virtqs = nr_vring;
-	for (i = 0; i < nr_vring; i++) {
-		virtq = &priv->virtqs[i];
-		if (virtq->enable) {
+	if (priv->use_c_thread) {
+		uint32_t main_task_idx[nr_vring];
+
+		for (i = 0; i < nr_vring; i++) {
+			virtq = &priv->virtqs[i];
+			if (!virtq->enable)
+				continue;
+			thrd_idx = i % (conf_thread_mng.max_thrds + 1);
+			if (!thrd_idx) {
+				main_task_idx[task_num] = i;
+				task_num++;
+				continue;
+			}
+			thrd_idx = priv->last_c_thrd_idx + 1;
+			if (thrd_idx >= conf_thread_mng.max_thrds)
+				thrd_idx = 0;
+			priv->last_c_thrd_idx = thrd_idx;
+			data[0] = i;
+			if (mlx5_vdpa_task_add(priv, thrd_idx,
+				MLX5_VDPA_TASK_SETUP_VIRTQ,
+				&remaining_cnt, &err_cnt,
+				(void **)&data, 1)) {
+				DRV_LOG(ERR, "Fail to add "
+						"task setup virtq (%d).", i);
+				main_task_idx[task_num] = i;
+				task_num++;
+			}
+		}
+		for (i = 0; i < task_num; i++) {
+			virtq = &priv->virtqs[main_task_idx[i]];
 			pthread_mutex_lock(&virtq->virtq_lock);
-			if (mlx5_vdpa_virtq_setup(priv, i)) {
+			if (mlx5_vdpa_virtq_setup(priv,
+				main_task_idx[i], false)) {
 				pthread_mutex_unlock(&virtq->virtq_lock);
 				goto error;
 			}
 			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
+		if (mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining_cnt,
+			&err_cnt, 2000)) {
+			DRV_LOG(ERR,
+			"Failed to wait virt-queue setup tasks ready.");
+			goto error;
+		}
+		for (i = 0; i < nr_vring; i++) {
+			/* Setup doorbell mapping in order for Qemu. */
+			virtq = &priv->virtqs[i];
+			pthread_mutex_lock(&virtq->virtq_lock);
+			if (!virtq->enable || !virtq->configured) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				continue;
+			}
+			if (rte_vhost_get_vhost_vring(priv->vid, i, &vq)) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				goto error;
+			}
+			if (mlx5_vdpa_virtq_doorbell_setup(virtq, &vq, i)) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				DRV_LOG(ERR,
+				"Failed to register virtq %d interrupt.", i);
+				goto error;
+			}
+		}
+	} else {
+		for (i = 0; i < nr_vring; i++) {
+			virtq = &priv->virtqs[i];
+			if (virtq->enable) {
+				pthread_mutex_lock(&virtq->virtq_lock);
+				if (mlx5_vdpa_virtq_setup(priv, i, true)) {
+					pthread_mutex_unlock(
+						&virtq->virtq_lock);
+					goto error;
+				}
+				pthread_mutex_unlock(&virtq->virtq_lock);
+			}
+		}
 	}
 	return 0;
 error:
@@ -660,7 +736,7 @@ mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable)
 		mlx5_vdpa_virtq_unset(virtq);
 	}
 	if (enable) {
-		ret = mlx5_vdpa_virtq_setup(priv, index);
+		ret = mlx5_vdpa_virtq_setup(priv, index, true);
 		if (ret) {
 			DRV_LOG(ERR, "Failed to setup virtq %d.", index);
 			return ret;
-- 
2.27.0


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [RFC 12/15] vdpa/mlx5: add virtq LM log task
  2022-04-08  7:55 [RFC 00/15] Add vDPA multi-threads optiomization Li Zhang
                   ` (10 preceding siblings ...)
  2022-04-08  7:56 ` [RFC 11/15] vdpa/mlx5: add virtq creation task for MT management Li Zhang
@ 2022-04-08  7:56 ` Li Zhang
  2022-04-08  7:56 ` [RFC 13/15] vdpa/mlx5: add device close task Li Zhang
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-04-08  7:56 UTC (permalink / raw)
  To: matan, viacheslavo, orika, thomas; +Cc: dev, rasland

Split the virtqs' LM used-ring logging between the configuration threads.
This accelerates the LM process and reduces its time by 20%.
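
For reference, the fan-out follows the same dispatch pattern used by the other
multi-thread patches in this series: index slot 0 stays on the caller thread,
the rest of the virtqs are handed round-robin to the configuration threads.
A minimal sketch of that dispatch loop, assuming the conf_thread_mng state and
the mlx5_vdpa_task_add() helper introduced by the earlier patches:

	uint32_t i, thrd_idx, task_num = 0;
	uint32_t remaining_cnt = 0, err_cnt = 0, data[1];
	uint32_t main_task_idx[priv->nr_virtqs];

	for (i = 0; i < priv->nr_virtqs; i++) {
		thrd_idx = i % (conf_thread_mng.max_thrds + 1);
		if (!thrd_idx) {
			/* Slot 0: this virtq is handled by the caller thread. */
			main_task_idx[task_num++] = i;
			continue;
		}
		/* Round-robin over the configuration threads. */
		thrd_idx = priv->last_c_thrd_idx + 1;
		if (thrd_idx >= conf_thread_mng.max_thrds)
			thrd_idx = 0;
		priv->last_c_thrd_idx = thrd_idx;
		data[0] = i;
		if (mlx5_vdpa_task_add(priv, thrd_idx, MLX5_VDPA_TASK_STOP_VIRTQ,
				&remaining_cnt, &err_cnt, (void **)&data, 1))
			/* On enqueue failure, fall back to the caller thread. */
			main_task_idx[task_num++] = i;
	}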

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  3 +
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 34 ++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c      | 90 ++++++++++++++++++++++-----
 3 files changed, 110 insertions(+), 17 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 35221f5ddc..e08931719f 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -72,6 +72,8 @@ enum {
 	MLX5_VDPA_NOTIFIER_STATE_ERR
 };
 
+#define MLX5_VDPA_USED_RING_LEN(size) \
+	((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3)
 #define MLX5_VDPA_MAX_C_THRD 256
 #define MLX5_VDPA_MAX_TASKS_PER_THRD 4096
 #define MLX5_VDPA_TASKS_PER_DEV 64
@@ -81,6 +83,7 @@ enum {
 enum mlx5_vdpa_task_type {
 	MLX5_VDPA_TASK_REG_MR = 1,
 	MLX5_VDPA_TASK_SETUP_VIRTQ,
+	MLX5_VDPA_TASK_STOP_VIRTQ,
 };
 
 /* Generic task information and size must be multiple of 4B. */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index a2d1ddb1e1..0e54226a90 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -104,6 +104,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
 	struct mlx5_vdpa_priv *priv;
 	struct mlx5_vdpa_task task;
 	struct rte_ring *rng;
+	uint64_t features;
 	uint32_t thrd_idx;
 	uint32_t task_num;
 	int ret;
@@ -156,6 +157,39 @@ mlx5_vdpa_c_thread_handle(void *arg)
 			}
 			pthread_mutex_unlock(&virtq->virtq_lock);
 			break;
+		case MLX5_VDPA_TASK_STOP_VIRTQ:
+			virtq = &priv->virtqs[task.idx];
+			pthread_mutex_lock(&virtq->virtq_lock);
+			ret = mlx5_vdpa_virtq_stop(priv,
+					task.idx);
+			if (ret) {
+				DRV_LOG(ERR,
+				"Failed to stop virtq %d.",
+				task.idx);
+				__atomic_fetch_add(
+					task.err_cnt, 1,
+					__ATOMIC_RELAXED);
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				break;
+			}
+			ret = rte_vhost_get_negotiated_features(
+				priv->vid, &features);
+			if (ret) {
+				DRV_LOG(ERR,
+		"Failed to get negotiated features virtq %d.",
+				task.idx);
+				__atomic_fetch_add(
+					task.err_cnt, 1,
+					__ATOMIC_RELAXED);
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				break;
+			}
+			if (RTE_VHOST_NEED_LOG(features))
+				rte_vhost_log_used_vring(
+				priv->vid, task.idx, 0,
+			    MLX5_VDPA_USED_RING_LEN(virtq->vq_size));
+			pthread_mutex_unlock(&virtq->virtq_lock);
+			break;
 		default:
 			DRV_LOG(ERR, "Invalid vdpa task type %d.",
 			task.type);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
index efebf364d0..07575ea8a9 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
@@ -89,39 +89,95 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
 	return -1;
 }
 
-#define MLX5_VDPA_USED_RING_LEN(size) \
-	((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3)
-
 int
 mlx5_vdpa_lm_log(struct mlx5_vdpa_priv *priv)
 {
+	uint32_t remaining_cnt = 0, err_cnt = 0, task_num = 0;
+	uint32_t i, thrd_idx, data[1];
 	struct mlx5_vdpa_virtq *virtq;
 	uint64_t features;
-	int ret = rte_vhost_get_negotiated_features(priv->vid, &features);
-	int i;
+	int ret;
 
+	ret = rte_vhost_get_negotiated_features(priv->vid, &features);
 	if (ret) {
 		DRV_LOG(ERR, "Failed to get negotiated features.");
 		return -1;
 	}
-	if (!RTE_VHOST_NEED_LOG(features))
-		return 0;
-	for (i = 0; i < priv->nr_virtqs; ++i) {
-		virtq = &priv->virtqs[i];
-		if (!priv->virtqs[i].virtq) {
-			DRV_LOG(DEBUG, "virtq %d is invalid for LM log.", i);
-		} else {
+	if (priv->use_c_thread && priv->nr_virtqs) {
+		uint32_t main_task_idx[priv->nr_virtqs];
+
+		for (i = 0; i < priv->nr_virtqs; i++) {
+			virtq = &priv->virtqs[i];
+			if (!virtq->configured) {
+				DRV_LOG(DEBUG,
+				"virtq %d is invalid for LM log.", i);
+				continue;
+			}
+			thrd_idx = i % (conf_thread_mng.max_thrds + 1);
+			if (!thrd_idx) {
+				main_task_idx[task_num] = i;
+				task_num++;
+				continue;
+			}
+			thrd_idx = priv->last_c_thrd_idx + 1;
+			if (thrd_idx >= conf_thread_mng.max_thrds)
+				thrd_idx = 0;
+			priv->last_c_thrd_idx = thrd_idx;
+			data[0] = i;
+			if (mlx5_vdpa_task_add(priv, thrd_idx,
+				MLX5_VDPA_TASK_STOP_VIRTQ,
+				&remaining_cnt, &err_cnt,
+				(void **)&data, 1)) {
+				DRV_LOG(ERR, "Fail to add "
+					"task stop virtq (%d).", i);
+				main_task_idx[task_num] = i;
+				task_num++;
+			}
+		}
+		for (i = 0; i < task_num; i++) {
+			virtq = &priv->virtqs[main_task_idx[i]];
 			pthread_mutex_lock(&virtq->virtq_lock);
-			ret = mlx5_vdpa_virtq_stop(priv, i);
+			ret = mlx5_vdpa_virtq_stop(priv,
+					main_task_idx[i]);
+			if (ret) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				DRV_LOG(ERR,
+				"Failed to stop virtq %d.", i);
+				return -1;
+			}
+			if (RTE_VHOST_NEED_LOG(features))
+				rte_vhost_log_used_vring(priv->vid, i, 0,
+				MLX5_VDPA_USED_RING_LEN(virtq->vq_size));
 			pthread_mutex_unlock(&virtq->virtq_lock);
+		}
+		if (mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining_cnt,
+			&err_cnt, 2000)) {
+			DRV_LOG(ERR,
+			"Failed to wait virt-queue setup tasks ready.");
+			return -1;
+		}
+	} else {
+		for (i = 0; i < priv->nr_virtqs; i++) {
+			virtq = &priv->virtqs[i];
+			pthread_mutex_lock(&virtq->virtq_lock);
+			if (!virtq->configured) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				DRV_LOG(DEBUG,
+				"virtq %d is invalid for LM log.", i);
+				continue;
+			}
+			ret = mlx5_vdpa_virtq_stop(priv, i);
 			if (ret) {
-				DRV_LOG(ERR, "Failed to stop virtq %d for LM "
-					"log.", i);
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				DRV_LOG(ERR,
+				"Failed to stop virtq %d for LM log.", i);
 				return -1;
 			}
+			if (RTE_VHOST_NEED_LOG(features))
+				rte_vhost_log_used_vring(priv->vid, i, 0,
+				MLX5_VDPA_USED_RING_LEN(virtq->vq_size));
+			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
-		rte_vhost_log_used_vring(priv->vid, i, 0,
-			      MLX5_VDPA_USED_RING_LEN(priv->virtqs[i].vq_size));
 	}
 	return 0;
 }
-- 
2.27.0


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [RFC 13/15] vdpa/mlx5: add device close task
  2022-04-08  7:55 [RFC 00/15] Add vDPA multi-threads optiomization Li Zhang
                   ` (11 preceding siblings ...)
  2022-04-08  7:56 ` [RFC 12/15] vdpa/mlx5: add virtq LM log task Li Zhang
@ 2022-04-08  7:56 ` Li Zhang
  2022-04-08  7:56 ` [RFC 14/15] vdpa/mlx5: add virtq sub-resources creation Li Zhang
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-04-08  7:56 UTC (permalink / raw)
  To: matan, viacheslavo, orika, thomas; +Cc: dev, rasland

Split the device close tasks, run after stopping the virt-queues,
between the configuration threads.
This accelerates the LM process and reduces its time by 50%.
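
The close path becomes asynchronous: the caller marks the device as closing,
queues a single MLX5_VDPA_TASK_DEV_CLOSE_NOWAIT task and returns, while a
later reconfigure or release polls the flag until the worker clears it.
A condensed sketch of that hand-off, taken from the dev_close_progress field
and the functions added in this patch:

	/* Caller side (mlx5_vdpa_dev_close): defer the heavy teardown. */
	__atomic_store_n(&priv->dev_close_progress, 1, __ATOMIC_RELAXED);
	if (mlx5_vdpa_task_add(priv, priv->last_c_thrd_idx,
			MLX5_VDPA_TASK_DEV_CLOSE_NOWAIT, NULL, NULL, NULL, 1)) {
		/* Enqueue failed: fall through to the synchronous close path. */
	} else {
		priv->state = MLX5_VDPA_STATE_PROBED;
		return ret;	/* the worker clears dev_close_progress when done */
	}

	/* Waiter side (dev config/release): poll the flag, up to ~10 seconds. */
	uint32_t timeout = 0;

	while (__atomic_load_n(&priv->dev_close_progress,
			__ATOMIC_RELAXED) != 0 && timeout < 1000) {
		rte_delay_us_sleep(10000);
		timeout++;
	}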

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c         | 51 +++++++++++++++++++++++++--
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  8 +++++
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 20 ++++++++++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 14 ++++++++
 4 files changed, 90 insertions(+), 3 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 8dd8e6a2a0..d349682a83 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -245,7 +245,7 @@ mlx5_vdpa_mtu_set(struct mlx5_vdpa_priv *priv)
 	return kern_mtu == vhost_mtu ? 0 : -1;
 }
 
-static void
+void
 mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv)
 {
 	/* Clean pre-created resource in dev removal only. */
@@ -254,6 +254,26 @@ mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv)
 	mlx5_vdpa_mem_dereg(priv);
 }
 
+static bool
+mlx5_vdpa_wait_dev_close_tasks_done(struct mlx5_vdpa_priv *priv)
+{
+	uint32_t timeout = 0;
+
+	/* Check and wait all close tasks done. */
+	while (__atomic_load_n(&priv->dev_close_progress,
+		__ATOMIC_RELAXED) != 0 && timeout < 1000) {
+		rte_delay_us_sleep(10000);
+		timeout++;
+	}
+	if (priv->dev_close_progress) {
+		DRV_LOG(ERR,
+		"Failed to wait close device tasks done vid %d.",
+		priv->vid);
+		return true;
+	}
+	return false;
+}
+
 static int
 mlx5_vdpa_dev_close(int vid)
 {
@@ -271,6 +291,27 @@ mlx5_vdpa_dev_close(int vid)
 		ret |= mlx5_vdpa_lm_log(priv);
 		priv->state = MLX5_VDPA_STATE_IN_PROGRESS;
 	}
+	if (priv->use_c_thread) {
+		if (priv->last_c_thrd_idx >=
+			(conf_thread_mng.max_thrds - 1))
+			priv->last_c_thrd_idx = 0;
+		else
+			priv->last_c_thrd_idx++;
+		__atomic_store_n(&priv->dev_close_progress,
+			1, __ATOMIC_RELAXED);
+		if (mlx5_vdpa_task_add(priv,
+			priv->last_c_thrd_idx,
+			MLX5_VDPA_TASK_DEV_CLOSE_NOWAIT,
+			NULL, NULL, NULL, 1)) {
+			DRV_LOG(ERR,
+			"Fail to add dev close task. ");
+			goto single_thrd;
+		}
+		priv->state = MLX5_VDPA_STATE_PROBED;
+		DRV_LOG(INFO, "vDPA device %d was closed.", vid);
+		return ret;
+	}
+single_thrd:
 	pthread_mutex_lock(&priv->steer_update_lock);
 	mlx5_vdpa_steer_unset(priv);
 	pthread_mutex_unlock(&priv->steer_update_lock);
@@ -278,10 +319,12 @@ mlx5_vdpa_dev_close(int vid)
 	mlx5_vdpa_drain_cq(priv);
 	if (priv->lm_mr.addr)
 		mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
-	priv->state = MLX5_VDPA_STATE_PROBED;
 	if (!priv->connected)
 		mlx5_vdpa_dev_cache_clean(priv);
 	priv->vid = 0;
+	__atomic_store_n(&priv->dev_close_progress, 0,
+		__ATOMIC_RELAXED);
+	priv->state = MLX5_VDPA_STATE_PROBED;
 	DRV_LOG(INFO, "vDPA device %d was closed.", vid);
 	return ret;
 }
@@ -302,6 +345,8 @@ mlx5_vdpa_dev_config(int vid)
 		DRV_LOG(ERR, "Failed to reconfigure vid %d.", vid);
 		return -1;
 	}
+	if (mlx5_vdpa_wait_dev_close_tasks_done(priv))
+		return -1;
 	priv->vid = vid;
 	priv->connected = true;
 	if (mlx5_vdpa_mtu_set(priv))
@@ -839,6 +884,8 @@ mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv)
 {
 	if (priv->state == MLX5_VDPA_STATE_CONFIGURED)
 		mlx5_vdpa_dev_close(priv->vid);
+	if (priv->use_c_thread)
+		mlx5_vdpa_wait_dev_close_tasks_done(priv);
 	mlx5_vdpa_release_dev_resources(priv);
 	if (priv->vdev)
 		rte_vdpa_unregister_device(priv->vdev);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index e08931719f..b6392b9d66 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -84,6 +84,7 @@ enum mlx5_vdpa_task_type {
 	MLX5_VDPA_TASK_REG_MR = 1,
 	MLX5_VDPA_TASK_SETUP_VIRTQ,
 	MLX5_VDPA_TASK_STOP_VIRTQ,
+	MLX5_VDPA_TASK_DEV_CLOSE_NOWAIT,
 };
 
 /* Generic task information and size must be multiple of 4B. */
@@ -206,6 +207,7 @@ struct mlx5_vdpa_priv {
 	uint64_t features; /* Negotiated features. */
 	uint16_t log_max_rqt_size;
 	uint16_t last_c_thrd_idx;
+	uint16_t dev_close_progress;
 	uint16_t num_mrs; /* Number of memory regions. */
 	struct mlx5_vdpa_steer steer;
 	struct mlx5dv_var *var;
@@ -578,4 +580,10 @@ mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
 		uint32_t *err_cnt, uint32_t sleep_time);
 int
 mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick);
+void
+mlx5_vdpa_vq_destroy(struct mlx5_vdpa_virtq *virtq);
+void
+mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv);
+void
+mlx5_vdpa_virtq_unreg_intr_handle_all(struct mlx5_vdpa_priv *priv);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index 0e54226a90..07efa0cb16 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -63,7 +63,8 @@ mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
 		task[i].type = task_type;
 		task[i].remaining_cnt = remaining_cnt;
 		task[i].err_cnt = err_cnt;
-		task[i].idx = data[i];
+		if (data)
+			task[i].idx = data[i];
 	}
 	if (!mlx5_vdpa_c_thrd_ring_enqueue_bulk(rng, (void **)&task, num, NULL))
 		return -1;
@@ -190,6 +191,23 @@ mlx5_vdpa_c_thread_handle(void *arg)
 			    MLX5_VDPA_USED_RING_LEN(virtq->vq_size));
 			pthread_mutex_unlock(&virtq->virtq_lock);
 			break;
+		case MLX5_VDPA_TASK_DEV_CLOSE_NOWAIT:
+			mlx5_vdpa_virtq_unreg_intr_handle_all(priv);
+			pthread_mutex_lock(&priv->steer_update_lock);
+			mlx5_vdpa_steer_unset(priv);
+			pthread_mutex_unlock(&priv->steer_update_lock);
+			mlx5_vdpa_virtqs_release(priv);
+			mlx5_vdpa_drain_cq(priv);
+			if (priv->lm_mr.addr)
+				mlx5_os_wrapped_mkey_destroy(
+					&priv->lm_mr);
+			if (!priv->connected)
+				mlx5_vdpa_dev_cache_clean(priv);
+			priv->vid = 0;
+			__atomic_store_n(
+				&priv->dev_close_progress, 0,
+				__ATOMIC_RELAXED);
+			break;
 		default:
 			DRV_LOG(ERR, "Invalid vdpa task type %d.",
 			task.type);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 127b1cee7f..c1281be5f2 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -99,6 +99,20 @@ mlx5_vdpa_virtq_unregister_intr_handle(struct mlx5_vdpa_virtq *virtq)
 	rte_intr_instance_free(virtq->intr_handle);
 }
 
+void
+mlx5_vdpa_virtq_unreg_intr_handle_all(struct mlx5_vdpa_priv *priv)
+{
+	uint32_t i;
+	struct mlx5_vdpa_virtq *virtq;
+
+	for (i = 0; i < priv->nr_virtqs; i++) {
+		virtq = &priv->virtqs[i];
+		pthread_mutex_lock(&virtq->virtq_lock);
+		mlx5_vdpa_virtq_unregister_intr_handle(virtq);
+		pthread_mutex_unlock(&virtq->virtq_lock);
+	}
+}
+
 /* Release cached VQ resources. */
 void
 mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
-- 
2.27.0


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [RFC 14/15] vdpa/mlx5: add virtq sub-resources creation
  2022-04-08  7:55 [RFC 00/15] Add vDPA multi-threads optiomization Li Zhang
                   ` (12 preceding siblings ...)
  2022-04-08  7:56 ` [RFC 13/15] vdpa/mlx5: add device close task Li Zhang
@ 2022-04-08  7:56 ` Li Zhang
  2022-04-08  7:56 ` [RFC 15/15] vdpa/mlx5: prepare virtqueue resource creation Li Zhang
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-04-08  7:56 UTC (permalink / raw)
  To: matan, viacheslavo, orika, thomas; +Cc: dev, rasland, Yajun Wu

Pre-create the virt-queue sub-resources in the device probe stage
and then only modify the virtqueue in the device config stage.
The steering table also needs to support dummy virt-queues.
This accelerates the LM process and reduces its time by 40%.
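
A rough sketch of the probe-time side, based on the
mlx5_vdpa_virtq_single_resource_prepare() helper added below: the virtq
object is created once with dummy attributes (is_prepare == true), so the
config stage only needs to modify it with the real guest addresses and the
negotiated features.

	struct mlx5_devx_virtq_attr attr = {0};
	struct rte_vhost_vring vq = {
		.size = priv->queue_size,	/* no guest ring yet */
		.callfd = -1,
	};

	/* Fill attr with dummy values (features on, mkey 0, no ring GPAs). */
	if (mlx5_vdpa_virtq_sub_objs_prepare(priv, &attr, &vq, index, true))
		return true;
	/* Pre-create the virtq object only when it can later be modified. */
	if (mlx5_vdpa_is_modify_virtq_supported(priv))
		virtq->virtq = mlx5_devx_cmd_create_virtq(priv->cdev->ctx, &attr);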

Signed-off-by: Li Zhang <lizh@nvidia.com>
Signed-off-by: Yajun Wu <yajunw@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 68 ++++++--------------
 drivers/vdpa/mlx5/mlx5_vdpa.h       | 17 +++--
 drivers/vdpa/mlx5/mlx5_vdpa_event.c |  9 ++-
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c | 15 +++--
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 99 +++++++++++++++++++++--------
 5 files changed, 117 insertions(+), 91 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index d349682a83..eaca571e3e 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -624,65 +624,37 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 static int
 mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 {
-	struct mlx5_vdpa_virtq *virtq;
+	uint32_t max_queues = priv->queues * 2;
 	uint32_t index;
-	uint32_t i;
+	struct mlx5_vdpa_virtq *virtq;
 
 	for (index = 0; index < priv->caps.max_num_virtio_queues * 2;
 		index++) {
 		virtq = &priv->virtqs[index];
 		pthread_mutex_init(&virtq->virtq_lock, NULL);
 	}
-	if (!priv->queues)
+	if (!priv->queues || !priv->queue_size)
 		return 0;
-	for (index = 0; index < (priv->queues * 2); ++index) {
+	for (index = 0; index < max_queues; ++index)
+		if (mlx5_vdpa_virtq_single_resource_prepare(priv,
+			index))
+			goto error;
+	if (mlx5_vdpa_is_modify_virtq_supported(priv))
+		if (mlx5_vdpa_steer_update(priv, true))
+			goto error;
+	return 0;
+error:
+	for (index = 0; index < max_queues; ++index) {
 		virtq = &priv->virtqs[index];
-		int ret = mlx5_vdpa_event_qp_prepare(priv, priv->queue_size,
-					-1, virtq);
-
-		if (ret) {
-			DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
-				index);
-			return -1;
-		}
-		if (priv->caps.queue_counters_valid) {
-			if (!virtq->counters)
-				virtq->counters =
-					mlx5_devx_cmd_create_virtio_q_counters
-						(priv->cdev->ctx);
-			if (!virtq->counters) {
-				DRV_LOG(ERR, "Failed to create virtq couners for virtq"
-					" %d.", index);
-				return -1;
-			}
-		}
-		for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
-			uint32_t size;
-			void *buf;
-			struct mlx5dv_devx_umem *obj;
-
-			size = priv->caps.umems[i].a * priv->queue_size +
-					priv->caps.umems[i].b;
-			buf = rte_zmalloc(__func__, size, 4096);
-			if (buf == NULL) {
-				DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
-						" %u.", i, index);
-				return -1;
-			}
-			obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx, buf,
-					size, IBV_ACCESS_LOCAL_WRITE);
-			if (obj == NULL) {
-				rte_free(buf);
-				DRV_LOG(ERR, "Failed to register umem %d for virtq %u.",
-						i, index);
-				return -1;
-			}
-			virtq->umems[i].size = size;
-			virtq->umems[i].buf = buf;
-			virtq->umems[i].obj = obj;
+		if (virtq->virtq) {
+			pthread_mutex_lock(&virtq->virtq_lock);
+			mlx5_vdpa_virtq_unset(virtq);
+			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
 	}
-	return 0;
+	if (mlx5_vdpa_is_modify_virtq_supported(priv))
+		mlx5_vdpa_steer_unset(priv);
+	return -1;
 }
 
 static int
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index b6392b9d66..00700261ec 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -277,13 +277,15 @@ int mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv);
  *   The guest notification file descriptor.
  * @param[in/out] virtq
  *   Pointer to the virt-queue structure.
+ * @param[in] reset
+ *   If true, the event QP will be reset.
  *
  * @return
  *   0 on success, -1 otherwise and rte_errno is set.
  */
 int
 mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
-	int callfd, struct mlx5_vdpa_virtq *virtq);
+	int callfd, struct mlx5_vdpa_virtq *virtq, bool reset);
 
 /**
  * Destroy an event QP and all its related resources.
@@ -403,11 +405,13 @@ void mlx5_vdpa_steer_unset(struct mlx5_vdpa_priv *priv);
  *
  * @param[in] priv
  *   The vdpa driver private structure.
+ * @param[in] is_dummy
+ *   If set, the steering is updated with dummy queues for resource preparation.
  *
  * @return
  *   0 on success, a negative value otherwise.
  */
-int mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv);
+int mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv, bool is_dummy);
 
 /**
  * Setup steering and all its related resources to enable RSS traffic from the
@@ -581,9 +585,14 @@ mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
 int
 mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick);
 void
-mlx5_vdpa_vq_destroy(struct mlx5_vdpa_virtq *virtq);
-void
 mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv);
 void
 mlx5_vdpa_virtq_unreg_intr_handle_all(struct mlx5_vdpa_priv *priv);
+bool
+mlx5_vdpa_virtq_single_resource_prepare(struct mlx5_vdpa_priv *priv,
+		int index);
+int
+mlx5_vdpa_qps2rst2rts(struct mlx5_vdpa_event_qp *eqp);
+void
+mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index f782b6b832..c7be9d5f38 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -618,7 +618,7 @@ mlx5_vdpa_qps2rts(struct mlx5_vdpa_event_qp *eqp)
 	return 0;
 }
 
-static int
+int
 mlx5_vdpa_qps2rst2rts(struct mlx5_vdpa_event_qp *eqp)
 {
 	if (mlx5_devx_cmd_modify_qp_state(eqp->fw_qp, MLX5_CMD_OP_QP_2RST,
@@ -638,7 +638,7 @@ mlx5_vdpa_qps2rst2rts(struct mlx5_vdpa_event_qp *eqp)
 
 int
 mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
-	int callfd, struct mlx5_vdpa_virtq *virtq)
+	int callfd, struct mlx5_vdpa_virtq *virtq, bool reset)
 {
 	struct mlx5_vdpa_event_qp *eqp = &virtq->eqp;
 	struct mlx5_devx_qp_attr attr = {0};
@@ -649,11 +649,10 @@ mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 		/* Reuse existing resources. */
 		eqp->cq.callfd = callfd;
 		/* FW will set event qp to error state in q destroy. */
-		if (!mlx5_vdpa_qps2rst2rts(eqp)) {
+		if (reset && !mlx5_vdpa_qps2rst2rts(eqp))
 			rte_write32(rte_cpu_to_be_32(RTE_BIT32(log_desc_n)),
 					&eqp->sw_qp.db_rec[0]);
-			return 0;
-		}
+		return 0;
 	}
 	if (eqp->fw_qp)
 		mlx5_vdpa_event_qp_destroy(eqp);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
index 4cbf09784e..f7f6dce45c 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
@@ -57,7 +57,7 @@ mlx5_vdpa_steer_unset(struct mlx5_vdpa_priv *priv)
  * -1 on error.
  */
 static int
-mlx5_vdpa_rqt_prepare(struct mlx5_vdpa_priv *priv)
+mlx5_vdpa_rqt_prepare(struct mlx5_vdpa_priv *priv, bool is_dummy)
 {
 	int i;
 	uint32_t rqt_n = RTE_MIN(MLX5_VDPA_DEFAULT_RQT_SIZE,
@@ -67,15 +67,18 @@ mlx5_vdpa_rqt_prepare(struct mlx5_vdpa_priv *priv)
 						      sizeof(uint32_t), 0);
 	uint32_t k = 0, j;
 	int ret = 0, num;
+	uint16_t nr_vring = is_dummy ? priv->queues * 2 : priv->nr_virtqs;
 
 	if (!attr) {
 		DRV_LOG(ERR, "Failed to allocate RQT attributes memory.");
 		rte_errno = ENOMEM;
 		return -ENOMEM;
 	}
-	for (i = 0; i < priv->nr_virtqs; i++) {
+	for (i = 0; i < nr_vring; i++) {
 		if (is_virtq_recvq(i, priv->nr_virtqs) &&
-		    priv->virtqs[i].enable && priv->virtqs[i].virtq) {
+			(is_dummy || (priv->virtqs[i].enable &&
+			priv->virtqs[i].configured)) &&
+			priv->virtqs[i].virtq) {
 			attr->rq_list[k] = priv->virtqs[i].virtq->id;
 			k++;
 		}
@@ -235,12 +238,12 @@ mlx5_vdpa_rss_flows_create(struct mlx5_vdpa_priv *priv)
 }
 
 int
-mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv)
+mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv, bool is_dummy)
 {
 	int ret;
 
 	pthread_mutex_lock(&priv->steer_update_lock);
-	ret = mlx5_vdpa_rqt_prepare(priv);
+	ret = mlx5_vdpa_rqt_prepare(priv, is_dummy);
 	if (ret == 0) {
 		mlx5_vdpa_steer_unset(priv);
 	} else if (ret < 0) {
@@ -261,7 +264,7 @@ mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv)
 int
 mlx5_vdpa_steer_setup(struct mlx5_vdpa_priv *priv)
 {
-	if (mlx5_vdpa_steer_update(priv))
+	if (mlx5_vdpa_steer_update(priv, false))
 		goto error;
 	return 0;
 error:
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index c1281be5f2..4a74738d9c 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -143,10 +143,10 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 	}
 }
 
-static int
+void
 mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 {
-	int ret = -EAGAIN;
+	int ret;
 
 	mlx5_vdpa_virtq_unregister_intr_handle(virtq);
 	if (virtq->configured) {
@@ -154,12 +154,12 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 		if (ret)
 			DRV_LOG(WARNING, "Failed to stop virtq %d.",
 				virtq->index);
-		virtq->configured = 0;
 		claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
+		virtq->index = 0;
+		virtq->virtq = NULL;
+		virtq->configured = 0;
 	}
-	virtq->virtq = NULL;
 	virtq->notifier_state = MLX5_VDPA_NOTIFIER_STATE_DISABLED;
-	return 0;
 }
 
 void
@@ -172,6 +172,9 @@ mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv)
 		virtq = &priv->virtqs[i];
 		pthread_mutex_lock(&virtq->virtq_lock);
 		mlx5_vdpa_virtq_unset(virtq);
+		if (i < (priv->queues * 2))
+			mlx5_vdpa_virtq_single_resource_prepare(
+					priv, i);
 		pthread_mutex_unlock(&virtq->virtq_lock);
 	}
 	priv->features = 0;
@@ -255,7 +258,8 @@ mlx5_vdpa_hva_to_gpa(struct rte_vhost_memory *mem, uint64_t hva)
 static int
 mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 		struct mlx5_devx_virtq_attr *attr,
-		struct rte_vhost_vring *vq, int index)
+		struct rte_vhost_vring *vq,
+		int index, bool is_prepare)
 {
 	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
 	uint64_t gpa;
@@ -274,11 +278,15 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 			MLX5_VIRTQ_MODIFY_TYPE_Q_MKEY |
 			MLX5_VIRTQ_MODIFY_TYPE_QUEUE_FEATURE_BIT_MASK |
 			MLX5_VIRTQ_MODIFY_TYPE_EVENT_MODE;
-	attr->tso_ipv4 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO4));
-	attr->tso_ipv6 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO6));
-	attr->tx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_CSUM));
-	attr->rx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_GUEST_CSUM));
-	attr->virtio_version_1_0 =
+	attr->tso_ipv4 = is_prepare ? 1 :
+		!!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO4));
+	attr->tso_ipv6 = is_prepare ? 1 :
+		!!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO6));
+	attr->tx_csum = is_prepare ? 1 :
+		!!(priv->features & (1ULL << VIRTIO_NET_F_CSUM));
+	attr->rx_csum = is_prepare ? 1 :
+		!!(priv->features & (1ULL << VIRTIO_NET_F_GUEST_CSUM));
+	attr->virtio_version_1_0 = is_prepare ? 1 :
 		!!(priv->features & (1ULL << VIRTIO_F_VERSION_1));
 	attr->q_type =
 		(priv->features & (1ULL << VIRTIO_F_RING_PACKED)) ?
@@ -287,12 +295,12 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 	 * No need event QPs creation when the guest in poll mode or when the
 	 * capability allows it.
 	 */
-	attr->event_mode = vq->callfd != -1 ||
+	attr->event_mode = is_prepare || vq->callfd != -1 ||
 	!(priv->caps.event_mode & (1 << MLX5_VIRTQ_EVENT_MODE_NO_MSIX)) ?
 	MLX5_VIRTQ_EVENT_MODE_QP : MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
 	if (attr->event_mode == MLX5_VIRTQ_EVENT_MODE_QP) {
-		ret = mlx5_vdpa_event_qp_prepare(priv,
-				vq->size, vq->callfd, virtq);
+		ret = mlx5_vdpa_event_qp_prepare(priv, vq->size,
+				vq->callfd, virtq, !virtq->virtq);
 		if (ret) {
 			DRV_LOG(ERR,
 				"Failed to create event QPs for virtq %d.",
@@ -317,7 +325,7 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 		attr->counters_obj_id = virtq->counters->id;
 	}
 	/* Setup 3 UMEMs for each virtq. */
-	if (virtq->virtq) {
+	if (!virtq->virtq) {
 		for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
 			uint32_t size;
 			void *buf;
@@ -342,7 +350,7 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 			buf = rte_zmalloc(__func__,
 				size, 4096);
 			if (buf == NULL) {
-				DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
+				DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq."
 				" %u.", i, index);
 				return -1;
 			}
@@ -363,7 +371,7 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 			attr->umems[i].size = virtq->umems[i].size;
 		}
 	}
-	if (attr->q_type == MLX5_VIRTQ_TYPE_SPLIT) {
+	if (!is_prepare && attr->q_type == MLX5_VIRTQ_TYPE_SPLIT) {
 		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem_info.vmem,
 					   (uint64_t)(uintptr_t)vq->desc);
 		if (!gpa) {
@@ -386,21 +394,23 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 		}
 		attr->available_addr = gpa;
 	}
-	ret = rte_vhost_get_vring_base(priv->vid,
+	if (!is_prepare) {
+		ret = rte_vhost_get_vring_base(priv->vid,
 			index, &last_avail_idx, &last_used_idx);
-	if (ret) {
-		last_avail_idx = 0;
-		last_used_idx = 0;
-		DRV_LOG(WARNING, "Couldn't get vring base, idx are set to 0.");
-	} else {
-		DRV_LOG(INFO, "vid %d: Init last_avail_idx=%d, last_used_idx=%d for "
+		if (ret) {
+			last_avail_idx = 0;
+			last_used_idx = 0;
+			DRV_LOG(WARNING, "Couldn't get vring base, idx are set to 0.");
+		} else {
+			DRV_LOG(INFO, "vid %d: Init last_avail_idx=%d, last_used_idx=%d for "
 				"virtq %d.", priv->vid, last_avail_idx,
 				last_used_idx, index);
+		}
 	}
 	attr->hw_available_index = last_avail_idx;
 	attr->hw_used_index = last_used_idx;
 	attr->q_size = vq->size;
-	attr->mkey = priv->gpa_mkey_index;
+	attr->mkey = is_prepare ? 0 : priv->gpa_mkey_index;
 	attr->tis_id = priv->tiss[(index / 2) % priv->num_lag_ports]->id;
 	attr->queue_index = index;
 	attr->pd = priv->cdev->pdn;
@@ -413,6 +423,39 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 	return 0;
 }
 
+bool
+mlx5_vdpa_virtq_single_resource_prepare(struct mlx5_vdpa_priv *priv,
+		int index)
+{
+	struct mlx5_devx_virtq_attr attr = {0};
+	struct mlx5_vdpa_virtq *virtq;
+	struct rte_vhost_vring vq = {
+		.size = priv->queue_size,
+		.callfd = -1,
+	};
+	int ret;
+
+	virtq = &priv->virtqs[index];
+	virtq->index = index;
+	virtq->vq_size = vq.size;
+	virtq->configured = 0;
+	virtq->virtq = NULL;
+	ret = mlx5_vdpa_virtq_sub_objs_prepare(priv, &attr, &vq, index, true);
+	if (ret) {
+		DRV_LOG(ERR,
+		"Cannot prepare setup resource for virtq %d.", index);
+		return true;
+	}
+	if (mlx5_vdpa_is_modify_virtq_supported(priv)) {
+		virtq->virtq =
+		mlx5_devx_cmd_create_virtq(priv->cdev->ctx, &attr);
+		virtq->priv = priv;
+		if (!virtq->virtq)
+			return true;
+	}
+	return false;
+}
+
 bool
 mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv)
 {
@@ -470,7 +513,7 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick)
 	virtq->priv = priv;
 	virtq->stopped = 0;
 	ret = mlx5_vdpa_virtq_sub_objs_prepare(priv, &attr,
-				&vq, index);
+				&vq, index, false);
 	if (ret) {
 		DRV_LOG(ERR, "Failed to setup update virtq attr"
 			" %d.", index);
@@ -742,7 +785,7 @@ mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable)
 	if (virtq->configured) {
 		virtq->enable = 0;
 		if (is_virtq_recvq(virtq->index, priv->nr_virtqs)) {
-			ret = mlx5_vdpa_steer_update(priv);
+			ret = mlx5_vdpa_steer_update(priv, false);
 			if (ret)
 				DRV_LOG(WARNING, "Failed to disable steering "
 					"for virtq %d.", index);
@@ -757,7 +800,7 @@ mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable)
 		}
 		virtq->enable = 1;
 		if (is_virtq_recvq(virtq->index, priv->nr_virtqs)) {
-			ret = mlx5_vdpa_steer_update(priv);
+			ret = mlx5_vdpa_steer_update(priv, false);
 			if (ret)
 				DRV_LOG(WARNING, "Failed to enable steering "
 					"for virtq %d.", index);
-- 
2.27.0


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [RFC 15/15] vdpa/mlx5: prepare virtqueue resource creation
  2022-04-08  7:55 [RFC 00/15] Add vDPA multi-threads optiomization Li Zhang
                   ` (13 preceding siblings ...)
  2022-04-08  7:56 ` [RFC 14/15] vdpa/mlx5: add virtq sub-resources creation Li Zhang
@ 2022-04-08  7:56 ` Li Zhang
  2022-06-06 11:20 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-04-08  7:56 UTC (permalink / raw)
  To: matan, viacheslavo, orika, thomas; +Cc: dev, rasland

Split the virt-queue resource creation between
the configuration threads.
The pre-created virt-queue resources also need to be
restored after virtq destruction.
This accelerates the LM process and reduces its time by 30%.
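
The caller/worker split is completed with the bulk-wait helper from the
earlier patches: after queueing the per-virtq prepare tasks (same dispatch
loop as before, with remaining_cnt/err_cnt/main_task_idx filled in), the
caller handles its own share and then waits for the workers. A short sketch
mirroring the hunks below:

	/* Caller thread prepares the virtqs that stayed in slot 0 ... */
	for (index = 0; index < task_num; ++index)
		if (mlx5_vdpa_virtq_single_resource_prepare(priv,
				main_task_idx[index]))
			goto error;
	/* ... then waits for the worker tasks, bounded by a 2s sleep budget. */
	if (mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining_cnt,
			&err_cnt, 2000))
		goto error;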

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c         | 69 +++++++++++++++++++++++----
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  7 ++-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 14 +++++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 35 ++++++++++----
 4 files changed, 104 insertions(+), 21 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index eaca571e3e..15ce30bc49 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -275,13 +275,17 @@ mlx5_vdpa_wait_dev_close_tasks_done(struct mlx5_vdpa_priv *priv)
 }
 
 static int
-mlx5_vdpa_dev_close(int vid)
+_internal_mlx5_vdpa_dev_close(int vid, bool release_resource)
 {
 	struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid);
-	struct mlx5_vdpa_priv *priv =
-		mlx5_vdpa_find_priv_resource_by_vdev(vdev);
+	struct mlx5_vdpa_priv *priv;
 	int ret = 0;
 
+	if (!vdev) {
+		DRV_LOG(ERR, "Invalid vDPA device.");
+		return -1;
+	}
+	priv = mlx5_vdpa_find_priv_resource_by_vdev(vdev);
 	if (priv == NULL) {
 		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
 		return -1;
@@ -291,7 +295,7 @@ mlx5_vdpa_dev_close(int vid)
 		ret |= mlx5_vdpa_lm_log(priv);
 		priv->state = MLX5_VDPA_STATE_IN_PROGRESS;
 	}
-	if (priv->use_c_thread) {
+	if (priv->use_c_thread && !release_resource) {
 		if (priv->last_c_thrd_idx >=
 			(conf_thread_mng.max_thrds - 1))
 			priv->last_c_thrd_idx = 0;
@@ -315,7 +319,7 @@ mlx5_vdpa_dev_close(int vid)
 	pthread_mutex_lock(&priv->steer_update_lock);
 	mlx5_vdpa_steer_unset(priv);
 	pthread_mutex_unlock(&priv->steer_update_lock);
-	mlx5_vdpa_virtqs_release(priv);
+	mlx5_vdpa_virtqs_release(priv, release_resource);
 	mlx5_vdpa_drain_cq(priv);
 	if (priv->lm_mr.addr)
 		mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
@@ -329,6 +333,12 @@ mlx5_vdpa_dev_close(int vid)
 	return ret;
 }
 
+static int
+mlx5_vdpa_dev_close(int vid)
+{
+	return _internal_mlx5_vdpa_dev_close(vid, false);
+}
+
 static int
 mlx5_vdpa_dev_config(int vid)
 {
@@ -624,8 +634,9 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 static int
 mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 {
+	uint32_t remaining_cnt = 0, err_cnt = 0, task_num = 0;
 	uint32_t max_queues = priv->queues * 2;
-	uint32_t index;
+	uint32_t index, thrd_idx, data[1];
 	struct mlx5_vdpa_virtq *virtq;
 
 	for (index = 0; index < priv->caps.max_num_virtio_queues * 2;
@@ -635,10 +646,48 @@ mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 	}
 	if (!priv->queues || !priv->queue_size)
 		return 0;
-	for (index = 0; index < max_queues; ++index)
-		if (mlx5_vdpa_virtq_single_resource_prepare(priv,
-			index))
+	if (priv->use_c_thread) {
+		uint32_t main_task_idx[max_queues];
+
+		for (index = 0; index < max_queues; ++index) {
+			virtq = &priv->virtqs[index];
+			thrd_idx = index % (conf_thread_mng.max_thrds + 1);
+			if (!thrd_idx) {
+				main_task_idx[task_num] = index;
+				task_num++;
+				continue;
+			}
+			thrd_idx = priv->last_c_thrd_idx + 1;
+			if (thrd_idx >= conf_thread_mng.max_thrds)
+				thrd_idx = 0;
+			priv->last_c_thrd_idx = thrd_idx;
+			data[0] = index;
+			if (mlx5_vdpa_task_add(priv, thrd_idx,
+				MLX5_VDPA_TASK_PREPARE_VIRTQ,
+				&remaining_cnt, &err_cnt,
+				(void **)&data, 1)) {
+				DRV_LOG(ERR, "Fail to add "
+				"task prepare virtq (%d).", index);
+				main_task_idx[task_num] = index;
+				task_num++;
+			}
+		}
+		for (index = 0; index < task_num; ++index)
+			if (mlx5_vdpa_virtq_single_resource_prepare(priv,
+				main_task_idx[index]))
+				goto error;
+		if (mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining_cnt,
+			&err_cnt, 2000)) {
+			DRV_LOG(ERR,
+			"Failed to wait virt-queue prepare tasks ready.");
 			goto error;
+		}
+	} else {
+		for (index = 0; index < max_queues; ++index)
+			if (mlx5_vdpa_virtq_single_resource_prepare(priv,
+				index))
+				goto error;
+	}
 	if (mlx5_vdpa_is_modify_virtq_supported(priv))
 		if (mlx5_vdpa_steer_update(priv, true))
 			goto error;
@@ -855,7 +904,7 @@ static void
 mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv)
 {
 	if (priv->state == MLX5_VDPA_STATE_CONFIGURED)
-		mlx5_vdpa_dev_close(priv->vid);
+		_internal_mlx5_vdpa_dev_close(priv->vid, true);
 	if (priv->use_c_thread)
 		mlx5_vdpa_wait_dev_close_tasks_done(priv);
 	mlx5_vdpa_release_dev_resources(priv);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 00700261ec..477f2fdde0 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -85,6 +85,7 @@ enum mlx5_vdpa_task_type {
 	MLX5_VDPA_TASK_SETUP_VIRTQ,
 	MLX5_VDPA_TASK_STOP_VIRTQ,
 	MLX5_VDPA_TASK_DEV_CLOSE_NOWAIT,
+	MLX5_VDPA_TASK_PREPARE_VIRTQ,
 };
 
 /* Generic task information and size must be multiple of 4B. */
@@ -355,8 +356,12 @@ void mlx5_vdpa_err_event_unset(struct mlx5_vdpa_priv *priv);
  *
  * @param[in] priv
  *   The vdpa driver private structure.
+ * @param[in] release_resource
+ *   If set, the vdpa driver releases the resources without re-preparing them.
  */
-void mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv);
+void
+mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv,
+		bool release_resource);
 
 /**
  * Cleanup cached resources of all virtqs.
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index 07efa0cb16..97109206d2 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -196,7 +196,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
 			pthread_mutex_lock(&priv->steer_update_lock);
 			mlx5_vdpa_steer_unset(priv);
 			pthread_mutex_unlock(&priv->steer_update_lock);
-			mlx5_vdpa_virtqs_release(priv);
+			mlx5_vdpa_virtqs_release(priv, false);
 			mlx5_vdpa_drain_cq(priv);
 			if (priv->lm_mr.addr)
 				mlx5_os_wrapped_mkey_destroy(
@@ -208,6 +208,18 @@ mlx5_vdpa_c_thread_handle(void *arg)
 				&priv->dev_close_progress, 0,
 				__ATOMIC_RELAXED);
 			break;
+		case MLX5_VDPA_TASK_PREPARE_VIRTQ:
+			ret = mlx5_vdpa_virtq_single_resource_prepare(
+					priv, task.idx);
+			if (ret) {
+				DRV_LOG(ERR,
+				"Failed to prepare virtq %d.",
+				task.idx);
+				__atomic_fetch_add(
+				task.err_cnt, 1,
+				__ATOMIC_RELAXED);
+			}
+			break;
 		default:
 			DRV_LOG(ERR, "Invalid vdpa task type %d.",
 			task.type);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 4a74738d9c..de6eab9bc6 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -113,6 +113,16 @@ mlx5_vdpa_virtq_unreg_intr_handle_all(struct mlx5_vdpa_priv *priv)
 	}
 }
 
+static void
+mlx5_vdpa_vq_destroy(struct mlx5_vdpa_virtq *virtq)
+{
+	/* Clean pre-created resource in dev removal only */
+	claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
+	virtq->index = 0;
+	virtq->virtq = NULL;
+	virtq->configured = 0;
+}
+
 /* Release cached VQ resources. */
 void
 mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
@@ -125,6 +135,8 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 		if (virtq->index != i)
 			continue;
 		pthread_mutex_lock(&virtq->virtq_lock);
+		if (virtq->virtq)
+			mlx5_vdpa_vq_destroy(virtq);
 		for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
 			if (virtq->umems[j].obj) {
 				claim_zero(mlx5_glue->devx_umem_dereg
@@ -154,29 +166,34 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 		if (ret)
 			DRV_LOG(WARNING, "Failed to stop virtq %d.",
 				virtq->index);
-		claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
-		virtq->index = 0;
-		virtq->virtq = NULL;
-		virtq->configured = 0;
+		mlx5_vdpa_vq_destroy(virtq);
 	}
 	virtq->notifier_state = MLX5_VDPA_NOTIFIER_STATE_DISABLED;
 }
 
 void
-mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv)
+mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv,
+	bool release_resource)
 {
 	struct mlx5_vdpa_virtq *virtq;
-	int i;
+	uint32_t i, max_virtq;
 
-	for (i = 0; i < priv->nr_virtqs; i++) {
+	max_virtq = (release_resource &&
+		(priv->queues * 2) > priv->nr_virtqs) ?
+		(priv->queues * 2) : priv->nr_virtqs;
+	for (i = 0; i < max_virtq; i++) {
 		virtq = &priv->virtqs[i];
 		pthread_mutex_lock(&virtq->virtq_lock);
 		mlx5_vdpa_virtq_unset(virtq);
-		if (i < (priv->queues * 2))
+		if (!release_resource && i < (priv->queues * 2))
 			mlx5_vdpa_virtq_single_resource_prepare(
 					priv, i);
 		pthread_mutex_unlock(&virtq->virtq_lock);
 	}
+	if (!release_resource && priv->queues &&
+		mlx5_vdpa_is_modify_virtq_supported(priv))
+		if (mlx5_vdpa_steer_update(priv, true))
+			mlx5_vdpa_steer_unset(priv);
 	priv->features = 0;
 	priv->nr_virtqs = 0;
 }
@@ -733,7 +750,7 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 	}
 	return 0;
 error:
-	mlx5_vdpa_virtqs_release(priv);
+	mlx5_vdpa_virtqs_release(priv, true);
 	return -1;
 }
 
-- 
2.27.0


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v1 00/17] Add vDPA multi-threads optiomization
  2022-04-08  7:55 [RFC 00/15] Add vDPA multi-threads optiomization Li Zhang
                   ` (14 preceding siblings ...)
  2022-04-08  7:56 ` [RFC 15/15] vdpa/mlx5: prepare virtqueue resource creation Li Zhang
@ 2022-06-06 11:20 ` Li Zhang
  2022-06-06 11:20   ` [PATCH v1 01/17] vdpa/mlx5: fix usage of capability for max number of virtqs Li Zhang
                     ` (31 more replies)
  2022-06-06 11:46 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                   ` (3 subsequent siblings)
  19 siblings, 32 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:20 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

Allow the driver to use internal threads to
obtain fast configuration.
All the threads will be open on the same core of
the event completion queue scheduling thread.

Add max_conf_threads parameter to configure
the maximum number of internal threads in addition to
the caller thread (8 is suggested).
These internal threads to pipeline handle VDPA tasks
in system and shared with all VDPA devices.
Default is 0, don't use internal threads for configuration.

Depends-on: series=21868 ("vdpa/mlx5: improve device shutdown time")
http://patchwork.dpdk.org/project/dpdk/list/?series=21868

RFC ("Add vDPA multi-threads optiomization")
https://patchwork.dpdk.org/project/dpdk/cover/20220408075606.33056-1-lizh@nvidia.com/

Li Zhang (12):
  vdpa/mlx5: fix usage of capability for max number of virtqs
  common/mlx5: extend virtq modifiable fields
  vdpa/mlx5: pre-create virtq in the prob
  vdpa/mlx5: optimize datapath-control synchronization
  vdpa/mlx5: add multi-thread management for configuration
  vdpa/mlx5: add task ring for MT management
  vdpa/mlx5: add MT task for VM memory registration
  vdpa/mlx5: add virtq creation task for MT management
  vdpa/mlx5: add virtq LM log task
  vdpa/mlx5: add device close task
  vdpa/mlx5: add virtq sub-resources creation
  vdpa/mlx5: prepare virtqueue resource creation

Yajun Wu (5):
  eal: add device removal in rte cleanup
  examples/vdpa: fix devices cleanup
  vdpa/mlx5: support pre create virtq resource
  common/mlx5: add DevX API to move QP to reset state
  vdpa/mlx5: support event qp reuse

 doc/guides/vdpadevs/mlx5.rst          |  25 +
 drivers/common/mlx5/mlx5_devx_cmds.c  |  77 ++-
 drivers/common/mlx5/mlx5_devx_cmds.h  |   6 +-
 drivers/common/mlx5/mlx5_prm.h        |  30 +-
 drivers/vdpa/mlx5/meson.build         |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.c         | 270 +++++++++--
 drivers/vdpa/mlx5/mlx5_vdpa.h         | 152 +++++-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 360 ++++++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_event.c   | 160 +++++--
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c      | 128 ++++-
 drivers/vdpa/mlx5/mlx5_vdpa_mem.c     | 270 +++++++----
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c   |  22 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 654 +++++++++++++++++++-------
 examples/vdpa/main.c                  |   5 +-
 lib/eal/freebsd/eal.c                 |  33 ++
 lib/eal/include/rte_dev.h             |   6 +
 lib/eal/linux/eal.c                   |  33 ++
 lib/eal/windows/eal.c                 |  33 ++
 18 files changed, 1878 insertions(+), 387 deletions(-)
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c

-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v1 01/17] vdpa/mlx5: fix usage of capability for max number of virtqs
  2022-06-06 11:20 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
@ 2022-06-06 11:20   ` Li Zhang
  2022-06-06 11:20   ` [PATCH v1 02/17] eal: add device removal in rte cleanup Li Zhang
                     ` (30 subsequent siblings)
  31 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:20 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs, Maxime Coquelin
  Cc: dev, thomas, rasland, roniba, stable

The driver wrongly interprets the capability value as
the number of virtq pairs instead of the number of virtqs.

Adjust all its usages to treat it as the number of virtqs.

Fixes: c2eb33a ("vdpa/mlx5: manage virtqs by array")
Cc: stable@dpdk.org

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 12 ++++++------
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c |  6 +++---
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 76fa5d4299..ee71339b78 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -84,7 +84,7 @@ mlx5_vdpa_get_queue_num(struct rte_vdpa_device *vdev, uint32_t *queue_num)
 		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
 		return -1;
 	}
-	*queue_num = priv->caps.max_num_virtio_queues;
+	*queue_num = priv->caps.max_num_virtio_queues / 2;
 	return 0;
 }
 
@@ -141,7 +141,7 @@ mlx5_vdpa_set_vring_state(int vid, int vring, int state)
 		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
 		return -EINVAL;
 	}
-	if (vring >= (int)priv->caps.max_num_virtio_queues * 2) {
+	if (vring >= (int)priv->caps.max_num_virtio_queues) {
 		DRV_LOG(ERR, "Too big vring id: %d.", vring);
 		return -E2BIG;
 	}
@@ -388,7 +388,7 @@ mlx5_vdpa_get_stats(struct rte_vdpa_device *vdev, int qid,
 		DRV_LOG(ERR, "Invalid device: %s.", vdev->device->name);
 		return -ENODEV;
 	}
-	if (qid >= (int)priv->caps.max_num_virtio_queues * 2) {
+	if (qid >= (int)priv->caps.max_num_virtio_queues) {
 		DRV_LOG(ERR, "Too big vring id: %d for device %s.", qid,
 				vdev->device->name);
 		return -E2BIG;
@@ -411,7 +411,7 @@ mlx5_vdpa_reset_stats(struct rte_vdpa_device *vdev, int qid)
 		DRV_LOG(ERR, "Invalid device: %s.", vdev->device->name);
 		return -ENODEV;
 	}
-	if (qid >= (int)priv->caps.max_num_virtio_queues * 2) {
+	if (qid >= (int)priv->caps.max_num_virtio_queues) {
 		DRV_LOG(ERR, "Too big vring id: %d for device %s.", qid,
 				vdev->device->name);
 		return -E2BIG;
@@ -624,7 +624,7 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 		DRV_LOG(DEBUG, "No capability to support virtq statistics.");
 	priv = rte_zmalloc("mlx5 vDPA device private", sizeof(*priv) +
 			   sizeof(struct mlx5_vdpa_virtq) *
-			   attr->vdpa.max_num_virtio_queues * 2,
+			   attr->vdpa.max_num_virtio_queues,
 			   RTE_CACHE_LINE_SIZE);
 	if (!priv) {
 		DRV_LOG(ERR, "Failed to allocate private memory.");
@@ -685,7 +685,7 @@ mlx5_vdpa_release_dev_resources(struct mlx5_vdpa_priv *priv)
 	uint32_t i;
 
 	mlx5_vdpa_dev_cache_clean(priv);
-	for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
+	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		if (!priv->virtqs[i].counters)
 			continue;
 		claim_zero(mlx5_devx_cmd_destroy(priv->virtqs[i].counters));
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index e025be47d2..c258eb3024 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -72,7 +72,7 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 {
 	unsigned int i, j;
 
-	for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
+	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
 		for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
@@ -492,9 +492,9 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 		DRV_LOG(INFO, "TSO is enabled without CSUM, force CSUM.");
 		priv->features |= (1ULL << VIRTIO_NET_F_CSUM);
 	}
-	if (nr_vring > priv->caps.max_num_virtio_queues * 2) {
+	if (nr_vring > priv->caps.max_num_virtio_queues) {
 		DRV_LOG(ERR, "Do not support more than %d virtqs(%d).",
-			(int)priv->caps.max_num_virtio_queues * 2,
+			(int)priv->caps.max_num_virtio_queues,
 			(int)nr_vring);
 		return -1;
 	}
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v1 02/17] eal: add device removal in rte cleanup
  2022-06-06 11:20 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
  2022-06-06 11:20   ` [PATCH v1 01/17] vdpa/mlx5: fix usage of capability for max number of virtqs Li Zhang
@ 2022-06-06 11:20   ` Li Zhang
  2022-06-06 11:20   ` [PATCH 02/16] examples/vdpa: fix vDPA device remove Li Zhang
                     ` (29 subsequent siblings)
  31 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:20 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs, Bruce Richardson,
	Dmitry Kozlyuk, Narcisa Ana Maria Vasile, Dmitry Malloy,
	Pallavi Kadam
  Cc: dev, thomas, rasland, roniba, Yajun Wu, stable

From: Yajun Wu <yajunw@nvidia.com>

Add device removal in function rte_eal_cleanup. This is the last chance
for device remove to get called, as a sanity cleanup. Loop over the vdev
bus first and then over all buses for all devices, calling rte_dev_remove.
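
The loop relies on the RTE_DEV_FOREACH_SAFE macro added below, which fetches
the next device before the current one is removed so iteration stays valid.
A minimal usage sketch, mirroring the hunks in this patch:

	struct rte_dev_iterator it = {0};
	struct rte_device *dev, *tdev;

	/* Remove every device on the vdev bus, safely under removal. */
	RTE_DEV_FOREACH_SAFE(dev, "bus=vdev", &it, tdev)
		(void)rte_dev_remove(dev);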

Cc: stable@dpdk.org

Signed-off-by: Yajun Wu <yajunw@nvidia.com>
---
 lib/eal/freebsd/eal.c     | 33 +++++++++++++++++++++++++++++++++
 lib/eal/include/rte_dev.h |  6 ++++++
 lib/eal/linux/eal.c       | 33 +++++++++++++++++++++++++++++++++
 lib/eal/windows/eal.c     | 33 +++++++++++++++++++++++++++++++++
 4 files changed, 105 insertions(+)

diff --git a/lib/eal/freebsd/eal.c b/lib/eal/freebsd/eal.c
index a6b20960f2..5ffd9146b6 100644
--- a/lib/eal/freebsd/eal.c
+++ b/lib/eal/freebsd/eal.c
@@ -886,11 +886,44 @@ rte_eal_init(int argc, char **argv)
 	return fctret;
 }
 
+static int
+bus_match_all(const struct rte_bus *bus, const void *data)
+{
+	RTE_SET_USED(bus);
+	RTE_SET_USED(data);
+	return 0;
+}
+
+static void
+remove_all_device(void)
+{
+	struct rte_bus *start = NULL, *next;
+	struct rte_dev_iterator dev_iter = {0};
+	struct rte_device *dev = NULL;
+	struct rte_device *tdev = NULL;
+	char devstr[128];
+
+	RTE_DEV_FOREACH_SAFE(dev, "bus=vdev", &dev_iter, tdev) {
+		(void)rte_dev_remove(dev);
+	}
+	while ((next = rte_bus_find(start, bus_match_all, NULL)) != NULL) {
+		start = next;
+		/* Skip buses that don't have iterate method */
+		if (!next->dev_iterate || !next->name)
+			continue;
+		snprintf(devstr, sizeof(devstr), "bus=%s", next->name);
+		RTE_DEV_FOREACH_SAFE(dev, devstr, &dev_iter, tdev) {
+			(void)rte_dev_remove(dev);
+		}
+	};
+}
+
 int
 rte_eal_cleanup(void)
 {
 	struct internal_config *internal_conf =
 		eal_get_internal_configuration();
+	remove_all_device();
 	rte_service_finalize();
 	rte_mp_channel_cleanup();
 	/* after this point, any DPDK pointers will become dangling */
diff --git a/lib/eal/include/rte_dev.h b/lib/eal/include/rte_dev.h
index e6ff1218f9..382d548ea3 100644
--- a/lib/eal/include/rte_dev.h
+++ b/lib/eal/include/rte_dev.h
@@ -492,6 +492,12 @@ int
 rte_dev_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova,
 		  size_t len);
 
+#define RTE_DEV_FOREACH_SAFE(dev, devstr, it, tdev) \
+	for (rte_dev_iterator_init(it, devstr), \
+		(dev) = rte_dev_iterator_next(it); \
+		(dev) && ((tdev) = rte_dev_iterator_next(it), 1); \
+		(dev) = (tdev))
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
index 1ef263434a..30b295916e 100644
--- a/lib/eal/linux/eal.c
+++ b/lib/eal/linux/eal.c
@@ -1248,6 +1248,38 @@ mark_freeable(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
 	return 0;
 }
 
+static int
+bus_match_all(const struct rte_bus *bus, const void *data)
+{
+	RTE_SET_USED(bus);
+	RTE_SET_USED(data);
+	return 0;
+}
+
+static void
+remove_all_device(void)
+{
+	struct rte_bus *start = NULL, *next;
+	struct rte_dev_iterator dev_iter = {0};
+	struct rte_device *dev = NULL;
+	struct rte_device *tdev = NULL;
+	char devstr[128];
+
+	RTE_DEV_FOREACH_SAFE(dev, "bus=vdev", &dev_iter, tdev) {
+		(void)rte_dev_remove(dev);
+	}
+	while ((next = rte_bus_find(start, bus_match_all, NULL)) != NULL) {
+		start = next;
+		/* Skip buses that don't have iterate method */
+		if (!next->dev_iterate || !next->name)
+			continue;
+		snprintf(devstr, sizeof(devstr), "bus=%s", next->name);
+		RTE_DEV_FOREACH_SAFE(dev, devstr, &dev_iter, tdev) {
+			(void)rte_dev_remove(dev);
+		}
+	};
+}
+
 int
 rte_eal_cleanup(void)
 {
@@ -1257,6 +1289,7 @@ rte_eal_cleanup(void)
 	struct internal_config *internal_conf =
 		eal_get_internal_configuration();
 
+	remove_all_device();
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY &&
 			internal_conf->hugepage_file.unlink_existing)
 		rte_memseg_walk(mark_freeable, NULL);
diff --git a/lib/eal/windows/eal.c b/lib/eal/windows/eal.c
index 122de2a319..3d7d411293 100644
--- a/lib/eal/windows/eal.c
+++ b/lib/eal/windows/eal.c
@@ -254,12 +254,45 @@ __rte_trace_point_register(rte_trace_point_t *trace, const char *name,
 	return -ENOTSUP;
 }
 
+static int
+bus_match_all(const struct rte_bus *bus, const void *data)
+{
+	RTE_SET_USED(bus);
+	RTE_SET_USED(data);
+	return 0;
+}
+
+static void
+remove_all_device(void)
+{
+	struct rte_bus *start = NULL, *next;
+	struct rte_dev_iterator dev_iter = {0};
+	struct rte_device *dev = NULL;
+	struct rte_device *tdev = NULL;
+	char devstr[128];
+
+	RTE_DEV_FOREACH_SAFE(dev, "bus=vdev", &dev_iter, tdev) {
+		(void)rte_dev_remove(dev);
+	}
+	while ((next = rte_bus_find(start, bus_match_all, NULL)) != NULL) {
+		start = next;
+		/* Skip buses that don't have iterate method */
+		if (!next->dev_iterate || !next->name)
+			continue;
+		snprintf(devstr, sizeof(devstr), "bus=%s", next->name);
+		RTE_DEV_FOREACH_SAFE(dev, devstr, &dev_iter, tdev) {
+			(void)rte_dev_remove(dev);
+		}
+	};
+}
+
 int
 rte_eal_cleanup(void)
 {
 	struct internal_config *internal_conf =
 		eal_get_internal_configuration();
 
+	remove_all_device();
 	eal_intr_thread_cancel();
 	eal_mem_virt2iova_cleanup();
 	/* after this point, any DPDK pointers will become dangling */
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH 02/16] examples/vdpa: fix vDPA device remove
  2022-06-06 11:20 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
  2022-06-06 11:20   ` [PATCH v1 01/17] vdpa/mlx5: fix usage of capability for max number of virtqs Li Zhang
  2022-06-06 11:20   ` [PATCH v1 02/17] eal: add device removal in rte cleanup Li Zhang
@ 2022-06-06 11:20   ` Li Zhang
  2022-06-06 11:20   ` [PATCH v1 03/17] examples/vdpa: fix devices cleanup Li Zhang
                     ` (28 subsequent siblings)
  31 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:20 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs, Maxime Coquelin, Chenbo Xia,
	Xiaolong Ye, Xiao Wang
  Cc: dev, thomas, rasland, roniba, Yajun Wu, stable

From: Yajun Wu <yajunw@nvidia.com>

Call rte_dev_remove() on vDPA example application exit. Otherwise
rte_dev_remove() never gets called.

Fixes: edbed86d1cc ("examples/vdpa: introduce a new sample for vDPA")
Cc: stable@dpdk.org

Signed-off-by: Yajun Wu <yajunw@nvidia.com>
---
 examples/vdpa/main.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/examples/vdpa/main.c b/examples/vdpa/main.c
index 7e11ef4e26..534f1e9715 100644
--- a/examples/vdpa/main.c
+++ b/examples/vdpa/main.c
@@ -632,6 +632,10 @@ main(int argc, char *argv[])
 		vdpa_sample_quit();
 	}
 
+	RTE_DEV_FOREACH(dev, "class=vdpa", &dev_iter) {
+		rte_dev_remove(dev);
+	}
+
 	/* clean up the EAL */
 	rte_eal_cleanup();
 
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v1 03/17] examples/vdpa: fix devices cleanup
  2022-06-06 11:20 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                     ` (2 preceding siblings ...)
  2022-06-06 11:20   ` [PATCH 02/16] examples/vdpa: fix vDPA device remove Li Zhang
@ 2022-06-06 11:20   ` Li Zhang
  2022-06-06 11:20   ` [PATCH 03/16] vdpa/mlx5: support pre create virtq resource Li Zhang
                     ` (27 subsequent siblings)
  31 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:20 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs, Maxime Coquelin, Chenbo Xia,
	Chengchang Tang
  Cc: dev, thomas, rasland, roniba, Yajun Wu, stable

From: Yajun Wu <yajunw@nvidia.com>

Move rte_eal_cleanup() into the vdpa_sample_quit() function, which handles
all example application exit paths.
Otherwise rte_eal_cleanup() is not called when a signal such as
SIGINT (Ctrl+C) is received.

Fixes: 10aa3757 ("examples: add eal cleanup to examples")
Cc: stable@dpdk.org

Signed-off-by: Yajun Wu <yajunw@nvidia.com>
---
 examples/vdpa/main.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/examples/vdpa/main.c b/examples/vdpa/main.c
index 7e11ef4e26..62e32b633d 100644
--- a/examples/vdpa/main.c
+++ b/examples/vdpa/main.c
@@ -286,6 +286,8 @@ vdpa_sample_quit(void)
 		if (vports[i].ifname[0] != '\0')
 			close_vdpa(&vports[i]);
 	}
+	/* clean up the EAL */
+	rte_eal_cleanup();
 }
 
 static void
@@ -632,8 +634,5 @@ main(int argc, char *argv[])
 		vdpa_sample_quit();
 	}
 
-	/* clean up the EAL */
-	rte_eal_cleanup();
-
 	return 0;
 }
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH 03/16] vdpa/mlx5: support pre create virtq resource
  2022-06-06 11:20 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                     ` (3 preceding siblings ...)
  2022-06-06 11:20   ` [PATCH v1 03/17] examples/vdpa: fix devices cleanup Li Zhang
@ 2022-06-06 11:20   ` Li Zhang
  2022-06-06 11:20   ` [PATCH 04/16] common/mlx5: add DevX API to move QP to reset state Li Zhang
                     ` (26 subsequent siblings)
  31 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:20 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba, Yajun Wu

From: Yajun Wu <yajunw@nvidia.com>

The motivation of this change is to reduce the vDPA device queue creation
time by creating some queue resources in the vDPA device probe stage.

In the VM live migration scenario, this saves about 0.8 ms per queue
creation and thus reduces the LM network downtime.

To create queue resources (umem/counter) in advance, the driver needs to
know the virtio queue depth and the maximum number of queues the VM will
use.

Introduce two new devargs: queues (max queue pair number) and queue_size
(queue depth). Both arguments must be provided; if only one of them is
given, it is ignored and no pre-creation is done.

The queues and queue_size values must also match the vhost configuration
the driver receives later. Otherwise the pre-created resources are either
wasted or insufficient, or they must be destroyed and recreated (in case
of a queue_size mismatch).

Pre-created umem/counter resources are kept alive until vDPA device
removal.
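
As a minimal usage sketch (the PCI address and the numbers below are
placeholders, not part of this patch), the new devargs are passed on the
EAL device argument together with the existing class=vdpa devarg:

    -a 0000:01:00.2,class=vdpa,queues=8,queue_size=256

With such a setting, umem and counter resources for 16 virtqs (8 queue
pairs) of depth 256 would already be created at probe time.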

Signed-off-by: Yajun Wu <yajunw@nvidia.com>
---
 doc/guides/vdpadevs/mlx5.rst  | 14 +++++++
 drivers/vdpa/mlx5/mlx5_vdpa.c | 75 ++++++++++++++++++++++++++++++++++-
 drivers/vdpa/mlx5/mlx5_vdpa.h |  2 +
 3 files changed, 89 insertions(+), 2 deletions(-)

diff --git a/doc/guides/vdpadevs/mlx5.rst b/doc/guides/vdpadevs/mlx5.rst
index 3ded142311..0ad77bf535 100644
--- a/doc/guides/vdpadevs/mlx5.rst
+++ b/doc/guides/vdpadevs/mlx5.rst
@@ -101,6 +101,20 @@ for an additional list of options shared with other mlx5 drivers.
 
   - 0, HW default.
 
+- ``queue_size`` parameter [int]
+
+  - 1 - 1024, Virtio queue depth for pre-creating queue resource to speed up
+    first time queue creation. Set it together with queues devarg.
+
+  - 0, default value, no pre-create virtq resource.
+
+- ``queues`` parameter [int]
+
+  - 1 - 128, Max number of virtio queue pairs (each including 1 Rx and 1 Tx queue)
+    for pre-create queue resource to speed up first time queue creation. Set it
+    together with queue_size devarg.
+
+  - 0, default value, no pre-create virtq resource.
 
 Error handling
 ^^^^^^^^^^^^^^
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index ee71339b78..faf833ee2f 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -244,7 +244,9 @@ mlx5_vdpa_mtu_set(struct mlx5_vdpa_priv *priv)
 static void
 mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv)
 {
-	mlx5_vdpa_virtqs_cleanup(priv);
+	/* Clean pre-created resource in dev removal only. */
+	if (!priv->queues)
+		mlx5_vdpa_virtqs_cleanup(priv);
 	mlx5_vdpa_mem_dereg(priv);
 }
 
@@ -494,6 +496,12 @@ mlx5_vdpa_args_check_handler(const char *key, const char *val, void *opaque)
 		priv->hw_max_latency_us = (uint32_t)tmp;
 	} else if (strcmp(key, "hw_max_pending_comp") == 0) {
 		priv->hw_max_pending_comp = (uint32_t)tmp;
+	} else if (strcmp(key, "queue_size") == 0) {
+		priv->queue_size = (uint16_t)tmp;
+	} else if (strcmp(key, "queues") == 0) {
+		priv->queues = (uint16_t)tmp;
+	} else {
+		DRV_LOG(WARNING, "Invalid key %s.", key);
 	}
 	return 0;
 }
@@ -524,9 +532,68 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 	if (!priv->event_us &&
 	    priv->event_mode == MLX5_VDPA_EVENT_MODE_DYNAMIC_TIMER)
 		priv->event_us = MLX5_VDPA_DEFAULT_TIMER_STEP_US;
+	if ((priv->queue_size && !priv->queues) ||
+		(!priv->queue_size && priv->queues)) {
+		priv->queue_size = 0;
+		priv->queues = 0;
+		DRV_LOG(WARNING, "Please provide both queue_size and queues.");
+	}
 	DRV_LOG(DEBUG, "event mode is %d.", priv->event_mode);
 	DRV_LOG(DEBUG, "event_us is %u us.", priv->event_us);
 	DRV_LOG(DEBUG, "no traffic max is %u.", priv->no_traffic_max);
+	DRV_LOG(DEBUG, "queues is %u, queue_size is %u.", priv->queues,
+		priv->queue_size);
+}
+
+static int
+mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
+{
+	uint32_t index;
+	uint32_t i;
+
+	if (!priv->queues)
+		return 0;
+	for (index = 0; index < (priv->queues * 2); ++index) {
+		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
+
+		if (priv->caps.queue_counters_valid) {
+			if (!virtq->counters)
+				virtq->counters =
+					mlx5_devx_cmd_create_virtio_q_counters
+						(priv->cdev->ctx);
+			if (!virtq->counters) {
+				DRV_LOG(ERR, "Failed to create virtq counters for virtq"
+					" %d.", index);
+				return -1;
+			}
+		}
+		for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
+			uint32_t size;
+			void *buf;
+			struct mlx5dv_devx_umem *obj;
+
+			size = priv->caps.umems[i].a * priv->queue_size +
+					priv->caps.umems[i].b;
+			buf = rte_zmalloc(__func__, size, 4096);
+			if (buf == NULL) {
+				DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
+						" %u.", i, index);
+				return -1;
+			}
+			obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx, buf,
+					size, IBV_ACCESS_LOCAL_WRITE);
+			if (obj == NULL) {
+				rte_free(buf);
+				DRV_LOG(ERR, "Failed to register umem %d for virtq %u.",
+						i, index);
+				return -1;
+			}
+			virtq->umems[i].size = size;
+			virtq->umems[i].buf = buf;
+			virtq->umems[i].obj = obj;
+		}
+	}
+	return 0;
 }
 
 static int
@@ -604,6 +671,8 @@ mlx5_vdpa_create_dev_resources(struct mlx5_vdpa_priv *priv)
 		return -rte_errno;
 	if (mlx5_vdpa_event_qp_global_prepare(priv))
 		return -rte_errno;
+	if (mlx5_vdpa_virtq_resource_prepare(priv))
+		return -rte_errno;
 	return 0;
 }
 
@@ -638,6 +707,7 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 		priv->num_lag_ports = 1;
 	pthread_mutex_init(&priv->vq_config_lock, NULL);
 	priv->cdev = cdev;
+	mlx5_vdpa_config_get(mkvlist, priv);
 	if (mlx5_vdpa_create_dev_resources(priv))
 		goto error;
 	priv->vdev = rte_vdpa_register_device(cdev->dev, &mlx5_vdpa_ops);
@@ -646,7 +716,6 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 		rte_errno = rte_errno ? rte_errno : EINVAL;
 		goto error;
 	}
-	mlx5_vdpa_config_get(mkvlist, priv);
 	SLIST_INIT(&priv->mr_list);
 	pthread_mutex_lock(&priv_list_lock);
 	TAILQ_INSERT_TAIL(&priv_list, priv, next);
@@ -684,6 +753,8 @@ mlx5_vdpa_release_dev_resources(struct mlx5_vdpa_priv *priv)
 {
 	uint32_t i;
 
+	if (priv->queues)
+		mlx5_vdpa_virtqs_cleanup(priv);
 	mlx5_vdpa_dev_cache_clean(priv);
 	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		if (!priv->virtqs[i].counters)
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index e7f3319f89..f6719a3c60 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -135,6 +135,8 @@ struct mlx5_vdpa_priv {
 	uint8_t hw_latency_mode; /* Hardware CQ moderation mode. */
 	uint16_t hw_max_latency_us; /* Hardware CQ moderation period in usec. */
 	uint16_t hw_max_pending_comp; /* Hardware CQ moderation counter. */
+	uint16_t queue_size; /* virtq depth for pre-creating virtq resource */
+	uint16_t queues; /* Max virtq pair for pre-creating virtq resource */
 	struct rte_vdpa_device *vdev; /* vDPA device. */
 	struct mlx5_common_device *cdev; /* Backend mlx5 device. */
 	int vid; /* vhost device id. */
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH 04/16] common/mlx5: add DevX API to move QP to reset state
  2022-06-06 11:20 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                     ` (4 preceding siblings ...)
  2022-06-06 11:20   ` [PATCH 03/16] vdpa/mlx5: support pre create virtq resource Li Zhang
@ 2022-06-06 11:20   ` Li Zhang
  2022-06-06 11:20   ` [PATCH v1 04/17] vdpa/mlx5: support pre create virtq resource Li Zhang
                     ` (25 subsequent siblings)
  31 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:20 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba, Yajun Wu

From: Yajun Wu <yajunw@nvidia.com>

Support setting a QP to the RESET state.
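
A minimal sketch of the intended call sequence (mirroring how the event QP
reuse patch later in this series uses it; eqp is assumed to hold the FW/SW
event QP pair and error handling is trimmed):

	/* Move both the FW and the SW QP back to RESET before reuse. */
	if (mlx5_devx_cmd_modify_qp_state(eqp->fw_qp, MLX5_CMD_OP_QP_2RST,
					  eqp->sw_qp.qp->id) ||
	    mlx5_devx_cmd_modify_qp_state(eqp->sw_qp.qp, MLX5_CMD_OP_QP_2RST,
					  eqp->fw_qp->id))
		DRV_LOG(ERR, "Failed to modify event QP pair to RST state.");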

Signed-off-by: Yajun Wu <yajunw@nvidia.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c |  7 +++++++
 drivers/common/mlx5/mlx5_prm.h       | 17 +++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index c6bdbc12bb..1d6d6578d6 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -2264,11 +2264,13 @@ mlx5_devx_cmd_modify_qp_state(struct mlx5_devx_obj *qp, uint32_t qp_st_mod_op,
 		uint32_t rst2init[MLX5_ST_SZ_DW(rst2init_qp_in)];
 		uint32_t init2rtr[MLX5_ST_SZ_DW(init2rtr_qp_in)];
 		uint32_t rtr2rts[MLX5_ST_SZ_DW(rtr2rts_qp_in)];
+		uint32_t qp2rst[MLX5_ST_SZ_DW(2rst_qp_in)];
 	} in;
 	union {
 		uint32_t rst2init[MLX5_ST_SZ_DW(rst2init_qp_out)];
 		uint32_t init2rtr[MLX5_ST_SZ_DW(init2rtr_qp_out)];
 		uint32_t rtr2rts[MLX5_ST_SZ_DW(rtr2rts_qp_out)];
+		uint32_t qp2rst[MLX5_ST_SZ_DW(2rst_qp_out)];
 	} out;
 	void *qpc;
 	int ret;
@@ -2311,6 +2313,11 @@ mlx5_devx_cmd_modify_qp_state(struct mlx5_devx_obj *qp, uint32_t qp_st_mod_op,
 		inlen = sizeof(in.rtr2rts);
 		outlen = sizeof(out.rtr2rts);
 		break;
+	case MLX5_CMD_OP_QP_2RST:
+		MLX5_SET(2rst_qp_in, &in, qpn, qp->id);
+		inlen = sizeof(in.qp2rst);
+		outlen = sizeof(out.qp2rst);
+		break;
 	default:
 		DRV_LOG(ERR, "Invalid or unsupported QP modify op %u.",
 			qp_st_mod_op);
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index bc3e70a1d1..8a2f55c33e 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -3657,6 +3657,23 @@ struct mlx5_ifc_init2init_qp_in_bits {
 	u8 reserved_at_800[0x80];
 };
 
+struct mlx5_ifc_2rst_qp_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_2rst_qp_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 vhca_tunnel_id[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_80[0x8];
+	u8 qpn[0x18];
+	u8 reserved_at_a0[0x20];
+};
+
 struct mlx5_ifc_dealloc_pd_out_bits {
 	u8 status[0x8];
 	u8 reserved_0[0x18];
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v1 04/17] vdpa/mlx5: support pre create virtq resource
  2022-06-06 11:20 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                     ` (5 preceding siblings ...)
  2022-06-06 11:20   ` [PATCH 04/16] common/mlx5: add DevX API to move QP to reset state Li Zhang
@ 2022-06-06 11:20   ` Li Zhang
  2022-06-06 11:20   ` [PATCH v1 05/17] common/mlx5: add DevX API to move QP to reset state Li Zhang
                     ` (24 subsequent siblings)
  31 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:20 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba, Yajun Wu

From: Yajun Wu <yajunw@nvidia.com>

The motivation of this change is to reduce the vDPA device queue creation
time by creating some queue resources in the vDPA device probe stage.

In the VM live migration scenario, this saves about 0.8 ms per queue
creation and thus reduces the LM network downtime.

To create queue resources (umem/counter) in advance, the driver needs to
know the virtio queue depth and the maximum number of queues the VM will
use.

Introduce two new devargs: queues (max queue pair number) and queue_size
(queue depth). Both arguments must be provided; if only one of them is
given, it is ignored and no pre-creation is done.

The queues and queue_size values must also match the vhost configuration
the driver receives later. Otherwise the pre-created resources are either
wasted or insufficient, or they must be destroyed and recreated (in case
of a queue_size mismatch).

Pre-created umem/counter resources are kept alive until vDPA device
removal.

Signed-off-by: Yajun Wu <yajunw@nvidia.com>
---
 doc/guides/vdpadevs/mlx5.rst  | 14 +++++++
 drivers/vdpa/mlx5/mlx5_vdpa.c | 75 ++++++++++++++++++++++++++++++++++-
 drivers/vdpa/mlx5/mlx5_vdpa.h |  2 +
 3 files changed, 89 insertions(+), 2 deletions(-)

diff --git a/doc/guides/vdpadevs/mlx5.rst b/doc/guides/vdpadevs/mlx5.rst
index 3ded142311..0ad77bf535 100644
--- a/doc/guides/vdpadevs/mlx5.rst
+++ b/doc/guides/vdpadevs/mlx5.rst
@@ -101,6 +101,20 @@ for an additional list of options shared with other mlx5 drivers.
 
   - 0, HW default.
 
+- ``queue_size`` parameter [int]
+
+  - 1 - 1024, Virtio queue depth for pre-creating queue resource to speed up
+    first time queue creation. Set it together with queues devarg.
+
+  - 0, default value, no pre-create virtq resource.
+
+- ``queues`` parameter [int]
+
+  - 1 - 128, Max number of virtio queue pairs (each including 1 Rx and 1 Tx queue)
+    for pre-create queue resource to speed up first time queue creation. Set it
+    together with queue_size devarg.
+
+  - 0, default value, no pre-create virtq resource.
 
 Error handling
 ^^^^^^^^^^^^^^
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index ee71339b78..faf833ee2f 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -244,7 +244,9 @@ mlx5_vdpa_mtu_set(struct mlx5_vdpa_priv *priv)
 static void
 mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv)
 {
-	mlx5_vdpa_virtqs_cleanup(priv);
+	/* Clean pre-created resource in dev removal only. */
+	if (!priv->queues)
+		mlx5_vdpa_virtqs_cleanup(priv);
 	mlx5_vdpa_mem_dereg(priv);
 }
 
@@ -494,6 +496,12 @@ mlx5_vdpa_args_check_handler(const char *key, const char *val, void *opaque)
 		priv->hw_max_latency_us = (uint32_t)tmp;
 	} else if (strcmp(key, "hw_max_pending_comp") == 0) {
 		priv->hw_max_pending_comp = (uint32_t)tmp;
+	} else if (strcmp(key, "queue_size") == 0) {
+		priv->queue_size = (uint16_t)tmp;
+	} else if (strcmp(key, "queues") == 0) {
+		priv->queues = (uint16_t)tmp;
+	} else {
+		DRV_LOG(WARNING, "Invalid key %s.", key);
 	}
 	return 0;
 }
@@ -524,9 +532,68 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 	if (!priv->event_us &&
 	    priv->event_mode == MLX5_VDPA_EVENT_MODE_DYNAMIC_TIMER)
 		priv->event_us = MLX5_VDPA_DEFAULT_TIMER_STEP_US;
+	if ((priv->queue_size && !priv->queues) ||
+		(!priv->queue_size && priv->queues)) {
+		priv->queue_size = 0;
+		priv->queues = 0;
+		DRV_LOG(WARNING, "Please provide both queue_size and queues.");
+	}
 	DRV_LOG(DEBUG, "event mode is %d.", priv->event_mode);
 	DRV_LOG(DEBUG, "event_us is %u us.", priv->event_us);
 	DRV_LOG(DEBUG, "no traffic max is %u.", priv->no_traffic_max);
+	DRV_LOG(DEBUG, "queues is %u, queue_size is %u.", priv->queues,
+		priv->queue_size);
+}
+
+static int
+mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
+{
+	uint32_t index;
+	uint32_t i;
+
+	if (!priv->queues)
+		return 0;
+	for (index = 0; index < (priv->queues * 2); ++index) {
+		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
+
+		if (priv->caps.queue_counters_valid) {
+			if (!virtq->counters)
+				virtq->counters =
+					mlx5_devx_cmd_create_virtio_q_counters
+						(priv->cdev->ctx);
+			if (!virtq->counters) {
+				DRV_LOG(ERR, "Failed to create virtq counters for virtq"
+					" %d.", index);
+				return -1;
+			}
+		}
+		for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
+			uint32_t size;
+			void *buf;
+			struct mlx5dv_devx_umem *obj;
+
+			size = priv->caps.umems[i].a * priv->queue_size +
+					priv->caps.umems[i].b;
+			buf = rte_zmalloc(__func__, size, 4096);
+			if (buf == NULL) {
+				DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
+						" %u.", i, index);
+				return -1;
+			}
+			obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx, buf,
+					size, IBV_ACCESS_LOCAL_WRITE);
+			if (obj == NULL) {
+				rte_free(buf);
+				DRV_LOG(ERR, "Failed to register umem %d for virtq %u.",
+						i, index);
+				return -1;
+			}
+			virtq->umems[i].size = size;
+			virtq->umems[i].buf = buf;
+			virtq->umems[i].obj = obj;
+		}
+	}
+	return 0;
 }
 
 static int
@@ -604,6 +671,8 @@ mlx5_vdpa_create_dev_resources(struct mlx5_vdpa_priv *priv)
 		return -rte_errno;
 	if (mlx5_vdpa_event_qp_global_prepare(priv))
 		return -rte_errno;
+	if (mlx5_vdpa_virtq_resource_prepare(priv))
+		return -rte_errno;
 	return 0;
 }
 
@@ -638,6 +707,7 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 		priv->num_lag_ports = 1;
 	pthread_mutex_init(&priv->vq_config_lock, NULL);
 	priv->cdev = cdev;
+	mlx5_vdpa_config_get(mkvlist, priv);
 	if (mlx5_vdpa_create_dev_resources(priv))
 		goto error;
 	priv->vdev = rte_vdpa_register_device(cdev->dev, &mlx5_vdpa_ops);
@@ -646,7 +716,6 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 		rte_errno = rte_errno ? rte_errno : EINVAL;
 		goto error;
 	}
-	mlx5_vdpa_config_get(mkvlist, priv);
 	SLIST_INIT(&priv->mr_list);
 	pthread_mutex_lock(&priv_list_lock);
 	TAILQ_INSERT_TAIL(&priv_list, priv, next);
@@ -684,6 +753,8 @@ mlx5_vdpa_release_dev_resources(struct mlx5_vdpa_priv *priv)
 {
 	uint32_t i;
 
+	if (priv->queues)
+		mlx5_vdpa_virtqs_cleanup(priv);
 	mlx5_vdpa_dev_cache_clean(priv);
 	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		if (!priv->virtqs[i].counters)
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index e7f3319f89..f6719a3c60 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -135,6 +135,8 @@ struct mlx5_vdpa_priv {
 	uint8_t hw_latency_mode; /* Hardware CQ moderation mode. */
 	uint16_t hw_max_latency_us; /* Hardware CQ moderation period in usec. */
 	uint16_t hw_max_pending_comp; /* Hardware CQ moderation counter. */
+	uint16_t queue_size; /* virtq depth for pre-creating virtq resource */
+	uint16_t queues; /* Max virtq pair for pre-creating virtq resource */
 	struct rte_vdpa_device *vdev; /* vDPA device. */
 	struct mlx5_common_device *cdev; /* Backend mlx5 device. */
 	int vid; /* vhost device id. */
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v1 05/17] common/mlx5: add DevX API to move QP to reset state
  2022-06-06 11:20 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                     ` (6 preceding siblings ...)
  2022-06-06 11:20   ` [PATCH v1 04/17] vdpa/mlx5: support pre create virtq resource Li Zhang
@ 2022-06-06 11:20   ` Li Zhang
  2022-06-06 11:20   ` [PATCH 05/16] vdpa/mlx5: support event qp reuse Li Zhang
                     ` (23 subsequent siblings)
  31 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:20 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba, Yajun Wu

From: Yajun Wu <yajunw@nvidia.com>

Support setting a QP to the RESET state.

Signed-off-by: Yajun Wu <yajunw@nvidia.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c |  7 +++++++
 drivers/common/mlx5/mlx5_prm.h       | 17 +++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index c6bdbc12bb..1d6d6578d6 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -2264,11 +2264,13 @@ mlx5_devx_cmd_modify_qp_state(struct mlx5_devx_obj *qp, uint32_t qp_st_mod_op,
 		uint32_t rst2init[MLX5_ST_SZ_DW(rst2init_qp_in)];
 		uint32_t init2rtr[MLX5_ST_SZ_DW(init2rtr_qp_in)];
 		uint32_t rtr2rts[MLX5_ST_SZ_DW(rtr2rts_qp_in)];
+		uint32_t qp2rst[MLX5_ST_SZ_DW(2rst_qp_in)];
 	} in;
 	union {
 		uint32_t rst2init[MLX5_ST_SZ_DW(rst2init_qp_out)];
 		uint32_t init2rtr[MLX5_ST_SZ_DW(init2rtr_qp_out)];
 		uint32_t rtr2rts[MLX5_ST_SZ_DW(rtr2rts_qp_out)];
+		uint32_t qp2rst[MLX5_ST_SZ_DW(2rst_qp_out)];
 	} out;
 	void *qpc;
 	int ret;
@@ -2311,6 +2313,11 @@ mlx5_devx_cmd_modify_qp_state(struct mlx5_devx_obj *qp, uint32_t qp_st_mod_op,
 		inlen = sizeof(in.rtr2rts);
 		outlen = sizeof(out.rtr2rts);
 		break;
+	case MLX5_CMD_OP_QP_2RST:
+		MLX5_SET(2rst_qp_in, &in, qpn, qp->id);
+		inlen = sizeof(in.qp2rst);
+		outlen = sizeof(out.qp2rst);
+		break;
 	default:
 		DRV_LOG(ERR, "Invalid or unsupported QP modify op %u.",
 			qp_st_mod_op);
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index bc3e70a1d1..8a2f55c33e 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -3657,6 +3657,23 @@ struct mlx5_ifc_init2init_qp_in_bits {
 	u8 reserved_at_800[0x80];
 };
 
+struct mlx5_ifc_2rst_qp_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_2rst_qp_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 vhca_tunnel_id[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_80[0x8];
+	u8 qpn[0x18];
+	u8 reserved_at_a0[0x20];
+};
+
 struct mlx5_ifc_dealloc_pd_out_bits {
 	u8 status[0x8];
 	u8 reserved_0[0x18];
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH 05/16] vdpa/mlx5: support event qp reuse
  2022-06-06 11:20 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                     ` (7 preceding siblings ...)
  2022-06-06 11:20   ` [PATCH v1 05/17] common/mlx5: add DevX API to move QP to reset state Li Zhang
@ 2022-06-06 11:20   ` Li Zhang
  2022-06-06 11:20   ` [PATCH 06/16] common/mlx5: extend virtq modifiable fields Li Zhang
                     ` (22 subsequent siblings)
  31 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:20 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba, Yajun Wu

From: Yajun Wu <yajunw@nvidia.com>

To speed up queue creation time, the event QP and CQ are created only
once. Each virtq creation reuses the same event QP and CQ.

Because FW sets the event QP to the error state during virtq destroy, the
event QP must be modified to the RESET state and then to the RTS state as
usual. This saves about 1.5 ms for each virtq creation.

After a SW QP reset, the QP PI/CI both become 0 while the CQ PI/CI keep
their previous values. Add a new variable qp_pi to save the SW QP index
and move the QP PI independently of the CQ CI.

Add a new function mlx5_vdpa_drain_cq() to drain the CQ CQEs after virtq
release.

Signed-off-by: Yajun Wu <yajunw@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c       |  8 ++++
 drivers/vdpa/mlx5/mlx5_vdpa.h       | 12 +++++-
 drivers/vdpa/mlx5/mlx5_vdpa_event.c | 60 +++++++++++++++++++++++++++--
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c |  6 +--
 4 files changed, 78 insertions(+), 8 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index faf833ee2f..ee99952e11 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -269,6 +269,7 @@ mlx5_vdpa_dev_close(int vid)
 	}
 	mlx5_vdpa_steer_unset(priv);
 	mlx5_vdpa_virtqs_release(priv);
+	mlx5_vdpa_drain_cq(priv);
 	if (priv->lm_mr.addr)
 		mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
 	priv->state = MLX5_VDPA_STATE_PROBED;
@@ -555,7 +556,14 @@ mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 		return 0;
 	for (index = 0; index < (priv->queues * 2); ++index) {
 		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
+		int ret = mlx5_vdpa_event_qp_prepare(priv, priv->queue_size,
+					-1, &virtq->eqp);
 
+		if (ret) {
+			DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
+				index);
+			return -1;
+		}
 		if (priv->caps.queue_counters_valid) {
 			if (!virtq->counters)
 				virtq->counters =
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index f6719a3c60..bf82026e37 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -55,6 +55,7 @@ struct mlx5_vdpa_event_qp {
 	struct mlx5_vdpa_cq cq;
 	struct mlx5_devx_obj *fw_qp;
 	struct mlx5_devx_qp sw_qp;
+	uint16_t qp_pi;
 };
 
 struct mlx5_vdpa_query_mr {
@@ -226,7 +227,7 @@ int mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv);
  * @return
  *   0 on success, -1 otherwise and rte_errno is set.
  */
-int mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
+int mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 			      int callfd, struct mlx5_vdpa_event_qp *eqp);
 
 /**
@@ -479,4 +480,13 @@ mlx5_vdpa_virtq_stats_get(struct mlx5_vdpa_priv *priv, int qid,
  */
 int
 mlx5_vdpa_virtq_stats_reset(struct mlx5_vdpa_priv *priv, int qid);
+
+/**
+ * Drain virtq CQ CQE.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ */
+void
+mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index 7167a98db0..b43dca9255 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -137,7 +137,7 @@ mlx5_vdpa_cq_poll(struct mlx5_vdpa_cq *cq)
 		};
 		uint32_t word;
 	} last_word;
-	uint16_t next_wqe_counter = cq->cq_ci;
+	uint16_t next_wqe_counter = eqp->qp_pi;
 	uint16_t cur_wqe_counter;
 	uint16_t comp;
 
@@ -156,9 +156,10 @@ mlx5_vdpa_cq_poll(struct mlx5_vdpa_cq *cq)
 		rte_io_wmb();
 		/* Ring CQ doorbell record. */
 		cq->cq_obj.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci);
+		eqp->qp_pi += comp;
 		rte_io_wmb();
 		/* Ring SW QP doorbell record. */
-		eqp->sw_qp.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci + cq_size);
+		eqp->sw_qp.db_rec[0] = rte_cpu_to_be_32(eqp->qp_pi + cq_size);
 	}
 	return comp;
 }
@@ -232,6 +233,25 @@ mlx5_vdpa_queues_complete(struct mlx5_vdpa_priv *priv)
 	return max;
 }
 
+void
+mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv)
+{
+	unsigned int i;
+
+	for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
+		struct mlx5_vdpa_cq *cq = &priv->virtqs[i].eqp.cq;
+
+		mlx5_vdpa_queue_complete(cq);
+		if (cq->cq_obj.cq) {
+			cq->cq_obj.cqes[0].wqe_counter =
+				rte_cpu_to_be_16(UINT16_MAX);
+			priv->virtqs[i].eqp.qp_pi = 0;
+			if (!cq->armed)
+				mlx5_vdpa_cq_arm(priv, cq);
+		}
+	}
+}
+
 /* Wait on all CQs channel for completion event. */
 static struct mlx5_vdpa_cq *
 mlx5_vdpa_event_wait(struct mlx5_vdpa_priv *priv __rte_unused)
@@ -574,14 +594,44 @@ mlx5_vdpa_qps2rts(struct mlx5_vdpa_event_qp *eqp)
 	return 0;
 }
 
+static int
+mlx5_vdpa_qps2rst2rts(struct mlx5_vdpa_event_qp *eqp)
+{
+	if (mlx5_devx_cmd_modify_qp_state(eqp->fw_qp, MLX5_CMD_OP_QP_2RST,
+					  eqp->sw_qp.qp->id)) {
+		DRV_LOG(ERR, "Failed to modify FW QP to RST state(%u).",
+			rte_errno);
+		return -1;
+	}
+	if (mlx5_devx_cmd_modify_qp_state(eqp->sw_qp.qp,
+			MLX5_CMD_OP_QP_2RST, eqp->fw_qp->id)) {
+		DRV_LOG(ERR, "Failed to modify SW QP to RST state(%u).",
+			rte_errno);
+		return -1;
+	}
+	return mlx5_vdpa_qps2rts(eqp);
+}
+
 int
-mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
+mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 			  int callfd, struct mlx5_vdpa_event_qp *eqp)
 {
 	struct mlx5_devx_qp_attr attr = {0};
 	uint16_t log_desc_n = rte_log2_u32(desc_n);
 	uint32_t ret;
 
+	if (eqp->cq.cq_obj.cq != NULL && log_desc_n == eqp->cq.log_desc_n) {
+		/* Reuse existing resources. */
+		eqp->cq.callfd = callfd;
+		/* FW will set event qp to error state in q destroy. */
+		if (!mlx5_vdpa_qps2rst2rts(eqp)) {
+			rte_write32(rte_cpu_to_be_32(RTE_BIT32(log_desc_n)),
+					&eqp->sw_qp.db_rec[0]);
+			return 0;
+		}
+	}
+	if (eqp->fw_qp)
+		mlx5_vdpa_event_qp_destroy(eqp);
 	if (mlx5_vdpa_cq_create(priv, log_desc_n, callfd, &eqp->cq))
 		return -1;
 	attr.pd = priv->cdev->pdn;
@@ -608,8 +658,10 @@ mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 	}
 	if (mlx5_vdpa_qps2rts(eqp))
 		goto error;
+	eqp->qp_pi = 0;
 	/* First ringing. */
-	rte_write32(rte_cpu_to_be_32(RTE_BIT32(log_desc_n)),
+	if (eqp->sw_qp.db_rec)
+		rte_write32(rte_cpu_to_be_32(RTE_BIT32(log_desc_n)),
 			&eqp->sw_qp.db_rec[0]);
 	return 0;
 error:
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index c258eb3024..6637ba1503 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -87,6 +87,8 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 			}
 			virtq->umems[j].size = 0;
 		}
+		if (virtq->eqp.fw_qp)
+			mlx5_vdpa_event_qp_destroy(&virtq->eqp);
 	}
 }
 
@@ -117,8 +119,6 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 		claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
 	}
 	virtq->virtq = NULL;
-	if (virtq->eqp.fw_qp)
-		mlx5_vdpa_event_qp_destroy(&virtq->eqp);
 	virtq->notifier_state = MLX5_VDPA_NOTIFIER_STATE_DISABLED;
 	return 0;
 }
@@ -246,7 +246,7 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 						      MLX5_VIRTQ_EVENT_MODE_QP :
 						  MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
 	if (attr.event_mode == MLX5_VIRTQ_EVENT_MODE_QP) {
-		ret = mlx5_vdpa_event_qp_create(priv, vq.size, vq.callfd,
+		ret = mlx5_vdpa_event_qp_prepare(priv, vq.size, vq.callfd,
 						&virtq->eqp);
 		if (ret) {
 			DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH 06/16] common/mlx5: extend virtq modifiable fields
  2022-06-06 11:20 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                     ` (8 preceding siblings ...)
  2022-06-06 11:20   ` [PATCH 05/16] vdpa/mlx5: support event qp reuse Li Zhang
@ 2022-06-06 11:20   ` Li Zhang
  2022-06-06 11:20   ` [PATCH v1 06/17] vdpa/mlx5: support event qp reuse Li Zhang
                     ` (21 subsequent siblings)
  31 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:20 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

A virtq configuration can be modified after the virtq creation (see the
usage sketch after the list below).
Add the following modifiable fields:
1. address fields: desc_addr/used_addr/available_addr
2. hw_available_index
3. hw_used_index
4. virtio_q_type
5. version type (virtio_version_1_0)
6. queue mkey
7. feature bit mask: tso_ipv4/tso_ipv6/tx_csum/rx_csum
8. event mode: event_mode/event_qpn_or_msix
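
A minimal modification sketch (the virtq_obj handle, vq_index and the IOVA
variables are placeholders; error handling is omitted; address modification
also requires the virtio_net_q_addr_modify capability added by this patch):

	struct mlx5_devx_virtq_attr attr = {
		.queue_index = vq_index,
		.mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_ADDR |
				     MLX5_VIRTQ_MODIFY_TYPE_STATE,
		.desc_addr = desc_iova,
		.used_addr = used_iova,
		.available_addr = avail_iova,
		.state = MLX5_VIRTQ_STATE_RDY,
	};

	if (mlx5_devx_cmd_modify_virtq(virtq_obj, &attr))
		DRV_LOG(ERR, "Failed to modify virtq %u.", vq_index);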

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c | 70 +++++++++++++++++++++++-----
 drivers/common/mlx5/mlx5_devx_cmds.h |  6 ++-
 drivers/common/mlx5/mlx5_prm.h       | 13 +++++-
 3 files changed, 76 insertions(+), 13 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 1d6d6578d6..1b68c37092 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -545,6 +545,15 @@ mlx5_devx_cmd_query_hca_vdpa_attr(void *ctx,
 		vdpa_attr->log_doorbell_stride =
 			MLX5_GET(virtio_emulation_cap, hcattr,
 				 log_doorbell_stride);
+		vdpa_attr->vnet_modify_ext =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 vnet_modify_ext);
+		vdpa_attr->virtio_net_q_addr_modify =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 virtio_net_q_addr_modify);
+		vdpa_attr->virtio_q_index_modify =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 virtio_q_index_modify);
 		vdpa_attr->log_doorbell_bar_size =
 			MLX5_GET(virtio_emulation_cap, hcattr,
 				 log_doorbell_bar_size);
@@ -2074,27 +2083,66 @@ mlx5_devx_cmd_modify_virtq(struct mlx5_devx_obj *virtq_obj,
 	MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_type,
 		 MLX5_GENERAL_OBJ_TYPE_VIRTQ);
 	MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_id, virtq_obj->id);
-	MLX5_SET64(virtio_net_q, virtq, modify_field_select, attr->type);
+	MLX5_SET64(virtio_net_q, virtq, modify_field_select,
+		attr->mod_fields_bitmap);
 	MLX5_SET16(virtio_q, virtctx, queue_index, attr->queue_index);
-	switch (attr->type) {
-	case MLX5_VIRTQ_MODIFY_TYPE_STATE:
+	if (!attr->mod_fields_bitmap) {
+		DRV_LOG(ERR, "Failed to modify VIRTQ for no type set.");
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_STATE)
 		MLX5_SET16(virtio_net_q, virtq, state, attr->state);
-		break;
-	case MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS:
+	if (attr->mod_fields_bitmap &
+	    MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS) {
 		MLX5_SET(virtio_net_q, virtq, dirty_bitmap_mkey,
 			 attr->dirty_bitmap_mkey);
 		MLX5_SET64(virtio_net_q, virtq, dirty_bitmap_addr,
 			 attr->dirty_bitmap_addr);
 		MLX5_SET(virtio_net_q, virtq, dirty_bitmap_size,
 			 attr->dirty_bitmap_size);
-		break;
-	case MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE:
+	}
+	if (attr->mod_fields_bitmap &
+	    MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE)
 		MLX5_SET(virtio_net_q, virtq, dirty_bitmap_dump_enable,
 			 attr->dirty_bitmap_dump_enable);
-		break;
-	default:
-		rte_errno = EINVAL;
-		return -rte_errno;
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_QUEUE_PERIOD) {
+		MLX5_SET(virtio_q, virtctx, queue_period_mode,
+			attr->hw_latency_mode);
+		MLX5_SET(virtio_q, virtctx, queue_period_us,
+			attr->hw_max_latency_us);
+		MLX5_SET(virtio_q, virtctx, queue_max_count,
+			attr->hw_max_pending_comp);
+	}
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_ADDR) {
+		MLX5_SET64(virtio_q, virtctx, desc_addr, attr->desc_addr);
+		MLX5_SET64(virtio_q, virtctx, used_addr, attr->used_addr);
+		MLX5_SET64(virtio_q, virtctx, available_addr,
+			attr->available_addr);
+	}
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_HW_AVAILABLE_INDEX)
+		MLX5_SET16(virtio_net_q, virtq, hw_available_index,
+		   attr->hw_available_index);
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_HW_USED_INDEX)
+		MLX5_SET16(virtio_net_q, virtq, hw_used_index,
+			attr->hw_used_index);
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_Q_TYPE)
+		MLX5_SET16(virtio_q, virtctx, virtio_q_type, attr->q_type);
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_VERSION_1_0)
+		MLX5_SET16(virtio_q, virtctx, virtio_version_1_0,
+		   attr->virtio_version_1_0);
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_Q_MKEY)
+		MLX5_SET(virtio_q, virtctx, virtio_q_mkey, attr->mkey);
+	if (attr->mod_fields_bitmap &
+		MLX5_VIRTQ_MODIFY_TYPE_QUEUE_FEATURE_BIT_MASK) {
+		MLX5_SET16(virtio_net_q, virtq, tso_ipv4, attr->tso_ipv4);
+		MLX5_SET16(virtio_net_q, virtq, tso_ipv6, attr->tso_ipv6);
+		MLX5_SET16(virtio_net_q, virtq, tx_csum, attr->tx_csum);
+		MLX5_SET16(virtio_net_q, virtq, rx_csum, attr->rx_csum);
+	}
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_EVENT_MODE) {
+		MLX5_SET16(virtio_q, virtctx, event_mode, attr->event_mode);
+		MLX5_SET(virtio_q, virtctx, event_qpn_or_msix, attr->qp_id);
 	}
 	ret = mlx5_glue->devx_obj_modify(virtq_obj->obj, in, sizeof(in),
 					 out, sizeof(out));
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index 3747ef9e33..ec6467d927 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -74,6 +74,9 @@ struct mlx5_hca_vdpa_attr {
 	uint32_t log_doorbell_stride:5;
 	uint32_t log_doorbell_bar_size:5;
 	uint32_t queue_counters_valid:1;
+	uint32_t vnet_modify_ext:1;
+	uint32_t virtio_net_q_addr_modify:1;
+	uint32_t virtio_q_index_modify:1;
 	uint32_t max_num_virtio_queues;
 	struct {
 		uint32_t a;
@@ -465,7 +468,7 @@ struct mlx5_devx_virtq_attr {
 	uint32_t tis_id;
 	uint32_t counters_obj_id;
 	uint64_t dirty_bitmap_addr;
-	uint64_t type;
+	uint64_t mod_fields_bitmap;
 	uint64_t desc_addr;
 	uint64_t used_addr;
 	uint64_t available_addr;
@@ -475,6 +478,7 @@ struct mlx5_devx_virtq_attr {
 		uint64_t offset;
 	} umems[3];
 	uint8_t error_type;
+	uint8_t q_type;
 };
 
 
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 8a2f55c33e..5f58a6ee1d 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -1802,7 +1802,9 @@ struct mlx5_ifc_virtio_emulation_cap_bits {
 	u8 virtio_queue_type[0x8];
 	u8 reserved_at_20[0x13];
 	u8 log_doorbell_stride[0x5];
-	u8 reserved_at_3b[0x3];
+	u8 vnet_modify_ext[0x1];
+	u8 virtio_net_q_addr_modify[0x1];
+	u8 virtio_q_index_modify[0x1];
 	u8 log_doorbell_bar_size[0x5];
 	u8 doorbell_bar_offset[0x40];
 	u8 reserved_at_80[0x8];
@@ -3024,6 +3026,15 @@ enum {
 	MLX5_VIRTQ_MODIFY_TYPE_STATE = (1UL << 0),
 	MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS = (1UL << 3),
 	MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE = (1UL << 4),
+	MLX5_VIRTQ_MODIFY_TYPE_QUEUE_PERIOD = (1UL << 5),
+	MLX5_VIRTQ_MODIFY_TYPE_ADDR = (1UL << 6),
+	MLX5_VIRTQ_MODIFY_TYPE_HW_AVAILABLE_INDEX = (1UL << 7),
+	MLX5_VIRTQ_MODIFY_TYPE_HW_USED_INDEX = (1UL << 8),
+	MLX5_VIRTQ_MODIFY_TYPE_Q_TYPE = (1UL << 9),
+	MLX5_VIRTQ_MODIFY_TYPE_VERSION_1_0 = (1UL << 10),
+	MLX5_VIRTQ_MODIFY_TYPE_Q_MKEY = (1UL << 11),
+	MLX5_VIRTQ_MODIFY_TYPE_QUEUE_FEATURE_BIT_MASK = (1UL << 12),
+	MLX5_VIRTQ_MODIFY_TYPE_EVENT_MODE = (1UL << 13),
 };
 
 struct mlx5_ifc_virtio_q_bits {
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v1 06/17] vdpa/mlx5: support event qp reuse
  2022-06-06 11:20 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                     ` (9 preceding siblings ...)
  2022-06-06 11:20   ` [PATCH 06/16] common/mlx5: extend virtq modifiable fields Li Zhang
@ 2022-06-06 11:20   ` Li Zhang
  2022-06-06 11:20   ` [PATCH v1 07/17] common/mlx5: extend virtq modifiable fields Li Zhang
                     ` (20 subsequent siblings)
  31 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:20 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba, Yajun Wu

From: Yajun Wu <yajunw@nvidia.com>

To speed up queue creation time, the event QP and CQ are created only
once. Each virtq creation reuses the same event QP and CQ.

Because FW sets the event QP to the error state during virtq destroy, the
event QP must be modified to the RESET state and then to the RTS state as
usual. This saves about 1.5 ms for each virtq creation.

After a SW QP reset, the QP PI/CI both become 0 while the CQ PI/CI keep
their previous values. Add a new variable qp_pi to save the SW QP index
and move the QP PI independently of the CQ CI.

Add a new function mlx5_vdpa_drain_cq() to drain the CQ CQEs after virtq
release.

Signed-off-by: Yajun Wu <yajunw@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c       |  8 ++++
 drivers/vdpa/mlx5/mlx5_vdpa.h       | 12 +++++-
 drivers/vdpa/mlx5/mlx5_vdpa_event.c | 60 +++++++++++++++++++++++++++--
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c |  6 +--
 4 files changed, 78 insertions(+), 8 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index faf833ee2f..ee99952e11 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -269,6 +269,7 @@ mlx5_vdpa_dev_close(int vid)
 	}
 	mlx5_vdpa_steer_unset(priv);
 	mlx5_vdpa_virtqs_release(priv);
+	mlx5_vdpa_drain_cq(priv);
 	if (priv->lm_mr.addr)
 		mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
 	priv->state = MLX5_VDPA_STATE_PROBED;
@@ -555,7 +556,14 @@ mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 		return 0;
 	for (index = 0; index < (priv->queues * 2); ++index) {
 		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
+		int ret = mlx5_vdpa_event_qp_prepare(priv, priv->queue_size,
+					-1, &virtq->eqp);
 
+		if (ret) {
+			DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
+				index);
+			return -1;
+		}
 		if (priv->caps.queue_counters_valid) {
 			if (!virtq->counters)
 				virtq->counters =
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index f6719a3c60..bf82026e37 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -55,6 +55,7 @@ struct mlx5_vdpa_event_qp {
 	struct mlx5_vdpa_cq cq;
 	struct mlx5_devx_obj *fw_qp;
 	struct mlx5_devx_qp sw_qp;
+	uint16_t qp_pi;
 };
 
 struct mlx5_vdpa_query_mr {
@@ -226,7 +227,7 @@ int mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv);
  * @return
  *   0 on success, -1 otherwise and rte_errno is set.
  */
-int mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
+int mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 			      int callfd, struct mlx5_vdpa_event_qp *eqp);
 
 /**
@@ -479,4 +480,13 @@ mlx5_vdpa_virtq_stats_get(struct mlx5_vdpa_priv *priv, int qid,
  */
 int
 mlx5_vdpa_virtq_stats_reset(struct mlx5_vdpa_priv *priv, int qid);
+
+/**
+ * Drain virtq CQ CQE.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ */
+void
+mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index 7167a98db0..b43dca9255 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -137,7 +137,7 @@ mlx5_vdpa_cq_poll(struct mlx5_vdpa_cq *cq)
 		};
 		uint32_t word;
 	} last_word;
-	uint16_t next_wqe_counter = cq->cq_ci;
+	uint16_t next_wqe_counter = eqp->qp_pi;
 	uint16_t cur_wqe_counter;
 	uint16_t comp;
 
@@ -156,9 +156,10 @@ mlx5_vdpa_cq_poll(struct mlx5_vdpa_cq *cq)
 		rte_io_wmb();
 		/* Ring CQ doorbell record. */
 		cq->cq_obj.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci);
+		eqp->qp_pi += comp;
 		rte_io_wmb();
 		/* Ring SW QP doorbell record. */
-		eqp->sw_qp.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci + cq_size);
+		eqp->sw_qp.db_rec[0] = rte_cpu_to_be_32(eqp->qp_pi + cq_size);
 	}
 	return comp;
 }
@@ -232,6 +233,25 @@ mlx5_vdpa_queues_complete(struct mlx5_vdpa_priv *priv)
 	return max;
 }
 
+void
+mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv)
+{
+	unsigned int i;
+
+	for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
+		struct mlx5_vdpa_cq *cq = &priv->virtqs[i].eqp.cq;
+
+		mlx5_vdpa_queue_complete(cq);
+		if (cq->cq_obj.cq) {
+			cq->cq_obj.cqes[0].wqe_counter =
+				rte_cpu_to_be_16(UINT16_MAX);
+			priv->virtqs[i].eqp.qp_pi = 0;
+			if (!cq->armed)
+				mlx5_vdpa_cq_arm(priv, cq);
+		}
+	}
+}
+
 /* Wait on all CQs channel for completion event. */
 static struct mlx5_vdpa_cq *
 mlx5_vdpa_event_wait(struct mlx5_vdpa_priv *priv __rte_unused)
@@ -574,14 +594,44 @@ mlx5_vdpa_qps2rts(struct mlx5_vdpa_event_qp *eqp)
 	return 0;
 }
 
+static int
+mlx5_vdpa_qps2rst2rts(struct mlx5_vdpa_event_qp *eqp)
+{
+	if (mlx5_devx_cmd_modify_qp_state(eqp->fw_qp, MLX5_CMD_OP_QP_2RST,
+					  eqp->sw_qp.qp->id)) {
+		DRV_LOG(ERR, "Failed to modify FW QP to RST state(%u).",
+			rte_errno);
+		return -1;
+	}
+	if (mlx5_devx_cmd_modify_qp_state(eqp->sw_qp.qp,
+			MLX5_CMD_OP_QP_2RST, eqp->fw_qp->id)) {
+		DRV_LOG(ERR, "Failed to modify SW QP to RST state(%u).",
+			rte_errno);
+		return -1;
+	}
+	return mlx5_vdpa_qps2rts(eqp);
+}
+
 int
-mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
+mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 			  int callfd, struct mlx5_vdpa_event_qp *eqp)
 {
 	struct mlx5_devx_qp_attr attr = {0};
 	uint16_t log_desc_n = rte_log2_u32(desc_n);
 	uint32_t ret;
 
+	if (eqp->cq.cq_obj.cq != NULL && log_desc_n == eqp->cq.log_desc_n) {
+		/* Reuse existing resources. */
+		eqp->cq.callfd = callfd;
+		/* FW will set event qp to error state in q destroy. */
+		if (!mlx5_vdpa_qps2rst2rts(eqp)) {
+			rte_write32(rte_cpu_to_be_32(RTE_BIT32(log_desc_n)),
+					&eqp->sw_qp.db_rec[0]);
+			return 0;
+		}
+	}
+	if (eqp->fw_qp)
+		mlx5_vdpa_event_qp_destroy(eqp);
 	if (mlx5_vdpa_cq_create(priv, log_desc_n, callfd, &eqp->cq))
 		return -1;
 	attr.pd = priv->cdev->pdn;
@@ -608,8 +658,10 @@ mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 	}
 	if (mlx5_vdpa_qps2rts(eqp))
 		goto error;
+	eqp->qp_pi = 0;
 	/* First ringing. */
-	rte_write32(rte_cpu_to_be_32(RTE_BIT32(log_desc_n)),
+	if (eqp->sw_qp.db_rec)
+		rte_write32(rte_cpu_to_be_32(RTE_BIT32(log_desc_n)),
 			&eqp->sw_qp.db_rec[0]);
 	return 0;
 error:
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index c258eb3024..6637ba1503 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -87,6 +87,8 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 			}
 			virtq->umems[j].size = 0;
 		}
+		if (virtq->eqp.fw_qp)
+			mlx5_vdpa_event_qp_destroy(&virtq->eqp);
 	}
 }
 
@@ -117,8 +119,6 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 		claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
 	}
 	virtq->virtq = NULL;
-	if (virtq->eqp.fw_qp)
-		mlx5_vdpa_event_qp_destroy(&virtq->eqp);
 	virtq->notifier_state = MLX5_VDPA_NOTIFIER_STATE_DISABLED;
 	return 0;
 }
@@ -246,7 +246,7 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 						      MLX5_VIRTQ_EVENT_MODE_QP :
 						  MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
 	if (attr.event_mode == MLX5_VIRTQ_EVENT_MODE_QP) {
-		ret = mlx5_vdpa_event_qp_create(priv, vq.size, vq.callfd,
+		ret = mlx5_vdpa_event_qp_prepare(priv, vq.size, vq.callfd,
 						&virtq->eqp);
 		if (ret) {
 			DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v1 07/17] common/mlx5: extend virtq modifiable fields
  2022-06-06 11:20 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                     ` (10 preceding siblings ...)
  2022-06-06 11:20   ` [PATCH v1 06/17] vdpa/mlx5: support event qp reuse Li Zhang
@ 2022-06-06 11:20   ` Li Zhang
  2022-06-06 11:20   ` [PATCH 07/16] vdpa/mlx5: pre-create virtq in the prob Li Zhang
                     ` (19 subsequent siblings)
  31 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:20 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

A virtq configuration can be modified after the virtq creation.
Add the following modifiable fields:
1. address fields: desc_addr/used_addr/available_addr
2. hw_available_index
3. hw_used_index
4. virtio_q_type
5. version type (virtio_version_1_0)
6. queue mkey
7. feature bit mask: tso_ipv4/tso_ipv6/tx_csum/rx_csum
8. event mode: event_mode/event_qpn_or_msix

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c | 70 +++++++++++++++++++++++-----
 drivers/common/mlx5/mlx5_devx_cmds.h |  6 ++-
 drivers/common/mlx5/mlx5_prm.h       | 13 +++++-
 3 files changed, 76 insertions(+), 13 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 1d6d6578d6..1b68c37092 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -545,6 +545,15 @@ mlx5_devx_cmd_query_hca_vdpa_attr(void *ctx,
 		vdpa_attr->log_doorbell_stride =
 			MLX5_GET(virtio_emulation_cap, hcattr,
 				 log_doorbell_stride);
+		vdpa_attr->vnet_modify_ext =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 vnet_modify_ext);
+		vdpa_attr->virtio_net_q_addr_modify =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 virtio_net_q_addr_modify);
+		vdpa_attr->virtio_q_index_modify =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 virtio_q_index_modify);
 		vdpa_attr->log_doorbell_bar_size =
 			MLX5_GET(virtio_emulation_cap, hcattr,
 				 log_doorbell_bar_size);
@@ -2074,27 +2083,66 @@ mlx5_devx_cmd_modify_virtq(struct mlx5_devx_obj *virtq_obj,
 	MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_type,
 		 MLX5_GENERAL_OBJ_TYPE_VIRTQ);
 	MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_id, virtq_obj->id);
-	MLX5_SET64(virtio_net_q, virtq, modify_field_select, attr->type);
+	MLX5_SET64(virtio_net_q, virtq, modify_field_select,
+		attr->mod_fields_bitmap);
 	MLX5_SET16(virtio_q, virtctx, queue_index, attr->queue_index);
-	switch (attr->type) {
-	case MLX5_VIRTQ_MODIFY_TYPE_STATE:
+	if (!attr->mod_fields_bitmap) {
+		DRV_LOG(ERR, "Failed to modify VIRTQ for no type set.");
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_STATE)
 		MLX5_SET16(virtio_net_q, virtq, state, attr->state);
-		break;
-	case MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS:
+	if (attr->mod_fields_bitmap &
+	    MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS) {
 		MLX5_SET(virtio_net_q, virtq, dirty_bitmap_mkey,
 			 attr->dirty_bitmap_mkey);
 		MLX5_SET64(virtio_net_q, virtq, dirty_bitmap_addr,
 			 attr->dirty_bitmap_addr);
 		MLX5_SET(virtio_net_q, virtq, dirty_bitmap_size,
 			 attr->dirty_bitmap_size);
-		break;
-	case MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE:
+	}
+	if (attr->mod_fields_bitmap &
+	    MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE)
 		MLX5_SET(virtio_net_q, virtq, dirty_bitmap_dump_enable,
 			 attr->dirty_bitmap_dump_enable);
-		break;
-	default:
-		rte_errno = EINVAL;
-		return -rte_errno;
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_QUEUE_PERIOD) {
+		MLX5_SET(virtio_q, virtctx, queue_period_mode,
+			attr->hw_latency_mode);
+		MLX5_SET(virtio_q, virtctx, queue_period_us,
+			attr->hw_max_latency_us);
+		MLX5_SET(virtio_q, virtctx, queue_max_count,
+			attr->hw_max_pending_comp);
+	}
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_ADDR) {
+		MLX5_SET64(virtio_q, virtctx, desc_addr, attr->desc_addr);
+		MLX5_SET64(virtio_q, virtctx, used_addr, attr->used_addr);
+		MLX5_SET64(virtio_q, virtctx, available_addr,
+			attr->available_addr);
+	}
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_HW_AVAILABLE_INDEX)
+		MLX5_SET16(virtio_net_q, virtq, hw_available_index,
+		   attr->hw_available_index);
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_HW_USED_INDEX)
+		MLX5_SET16(virtio_net_q, virtq, hw_used_index,
+			attr->hw_used_index);
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_Q_TYPE)
+		MLX5_SET16(virtio_q, virtctx, virtio_q_type, attr->q_type);
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_VERSION_1_0)
+		MLX5_SET16(virtio_q, virtctx, virtio_version_1_0,
+		   attr->virtio_version_1_0);
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_Q_MKEY)
+		MLX5_SET(virtio_q, virtctx, virtio_q_mkey, attr->mkey);
+	if (attr->mod_fields_bitmap &
+		MLX5_VIRTQ_MODIFY_TYPE_QUEUE_FEATURE_BIT_MASK) {
+		MLX5_SET16(virtio_net_q, virtq, tso_ipv4, attr->tso_ipv4);
+		MLX5_SET16(virtio_net_q, virtq, tso_ipv6, attr->tso_ipv6);
+		MLX5_SET16(virtio_net_q, virtq, tx_csum, attr->tx_csum);
+		MLX5_SET16(virtio_net_q, virtq, rx_csum, attr->rx_csum);
+	}
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_EVENT_MODE) {
+		MLX5_SET16(virtio_q, virtctx, event_mode, attr->event_mode);
+		MLX5_SET(virtio_q, virtctx, event_qpn_or_msix, attr->qp_id);
 	}
 	ret = mlx5_glue->devx_obj_modify(virtq_obj->obj, in, sizeof(in),
 					 out, sizeof(out));
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index 3747ef9e33..ec6467d927 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -74,6 +74,9 @@ struct mlx5_hca_vdpa_attr {
 	uint32_t log_doorbell_stride:5;
 	uint32_t log_doorbell_bar_size:5;
 	uint32_t queue_counters_valid:1;
+	uint32_t vnet_modify_ext:1;
+	uint32_t virtio_net_q_addr_modify:1;
+	uint32_t virtio_q_index_modify:1;
 	uint32_t max_num_virtio_queues;
 	struct {
 		uint32_t a;
@@ -465,7 +468,7 @@ struct mlx5_devx_virtq_attr {
 	uint32_t tis_id;
 	uint32_t counters_obj_id;
 	uint64_t dirty_bitmap_addr;
-	uint64_t type;
+	uint64_t mod_fields_bitmap;
 	uint64_t desc_addr;
 	uint64_t used_addr;
 	uint64_t available_addr;
@@ -475,6 +478,7 @@ struct mlx5_devx_virtq_attr {
 		uint64_t offset;
 	} umems[3];
 	uint8_t error_type;
+	uint8_t q_type;
 };
 
 
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 8a2f55c33e..5f58a6ee1d 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -1802,7 +1802,9 @@ struct mlx5_ifc_virtio_emulation_cap_bits {
 	u8 virtio_queue_type[0x8];
 	u8 reserved_at_20[0x13];
 	u8 log_doorbell_stride[0x5];
-	u8 reserved_at_3b[0x3];
+	u8 vnet_modify_ext[0x1];
+	u8 virtio_net_q_addr_modify[0x1];
+	u8 virtio_q_index_modify[0x1];
 	u8 log_doorbell_bar_size[0x5];
 	u8 doorbell_bar_offset[0x40];
 	u8 reserved_at_80[0x8];
@@ -3024,6 +3026,15 @@ enum {
 	MLX5_VIRTQ_MODIFY_TYPE_STATE = (1UL << 0),
 	MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS = (1UL << 3),
 	MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE = (1UL << 4),
+	MLX5_VIRTQ_MODIFY_TYPE_QUEUE_PERIOD = (1UL << 5),
+	MLX5_VIRTQ_MODIFY_TYPE_ADDR = (1UL << 6),
+	MLX5_VIRTQ_MODIFY_TYPE_HW_AVAILABLE_INDEX = (1UL << 7),
+	MLX5_VIRTQ_MODIFY_TYPE_HW_USED_INDEX = (1UL << 8),
+	MLX5_VIRTQ_MODIFY_TYPE_Q_TYPE = (1UL << 9),
+	MLX5_VIRTQ_MODIFY_TYPE_VERSION_1_0 = (1UL << 10),
+	MLX5_VIRTQ_MODIFY_TYPE_Q_MKEY = (1UL << 11),
+	MLX5_VIRTQ_MODIFY_TYPE_QUEUE_FEATURE_BIT_MASK = (1UL << 12),
+	MLX5_VIRTQ_MODIFY_TYPE_EVENT_MODE = (1UL << 13),
 };
 
 struct mlx5_ifc_virtio_q_bits {
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH 07/16] vdpa/mlx5: pre-create virtq in the prob
  2022-06-06 11:20 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                     ` (11 preceding siblings ...)
  2022-06-06 11:20   ` [PATCH v1 07/17] common/mlx5: extend virtq modifiable fields Li Zhang
@ 2022-06-06 11:20   ` Li Zhang
  2022-06-06 11:20   ` [PATCH 08/16] vdpa/mlx5: optimize datapath-control synchronization Li Zhang
                     ` (18 subsequent siblings)
  31 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:20 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

The dev_config operation is called during the LM (live migration)
process. LM time is critical because all the VM packets
are dropped while it is in progress.

Move the virtq creation to probe time and
only modify the configuration later, in
the dev_config stage, using the new ability
to modify a virtq.

This optimization accelerates the LM process and
reduces its time by 70%.
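
A condensed view of the intended flow (simplified from the diff below,
not a standalone implementation):

	/* Probe / first enable: create the virtq object once. */
	virtq->virtq = mlx5_devx_cmd_create_virtq(priv->cdev->ctx, &attr);

	/* dev_config (e.g. on the LM destination): only modify it. */
	attr.mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_STATE |
				 MLX5_VIRTQ_MODIFY_TYPE_ADDR |
				 MLX5_VIRTQ_MODIFY_TYPE_HW_AVAILABLE_INDEX |
				 MLX5_VIRTQ_MODIFY_TYPE_HW_USED_INDEX;
	attr.state = MLX5_VIRTQ_STATE_RDY;
	ret = mlx5_devx_cmd_modify_virtq(virtq->virtq, &attr);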

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.h       |   4 +
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c    |  13 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 257 +++++++++++++++++-----------
 3 files changed, 170 insertions(+), 104 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index bf82026e37..e5553079fe 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -80,6 +80,7 @@ struct mlx5_vdpa_virtq {
 	uint16_t vq_size;
 	uint8_t notifier_state;
 	bool stopped;
+	uint32_t configured:1;
 	uint32_t version;
 	struct mlx5_vdpa_priv *priv;
 	struct mlx5_devx_obj *virtq;
@@ -489,4 +490,7 @@ mlx5_vdpa_virtq_stats_reset(struct mlx5_vdpa_priv *priv, int qid);
  */
 void
 mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv);
+
+bool
+mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
index 43a2b98255..a8faf0c116 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
@@ -12,14 +12,17 @@ int
 mlx5_vdpa_logging_enable(struct mlx5_vdpa_priv *priv, int enable)
 {
 	struct mlx5_devx_virtq_attr attr = {
-		.type = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE,
+		.mod_fields_bitmap =
+			MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE,
 		.dirty_bitmap_dump_enable = enable,
 	};
+	struct mlx5_vdpa_virtq *virtq;
 	int i;
 
 	for (i = 0; i < priv->nr_virtqs; ++i) {
 		attr.queue_index = i;
-		if (!priv->virtqs[i].virtq) {
+		virtq = &priv->virtqs[i];
+		if (!virtq->configured) {
 			DRV_LOG(DEBUG, "virtq %d is invalid for dirty bitmap "
 				"enabling.", i);
 		} else if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
@@ -37,10 +40,11 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
 			   uint64_t log_size)
 {
 	struct mlx5_devx_virtq_attr attr = {
-		.type = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS,
+		.mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS,
 		.dirty_bitmap_addr = log_base,
 		.dirty_bitmap_size = log_size,
 	};
+	struct mlx5_vdpa_virtq *virtq;
 	int i;
 	int ret = mlx5_os_wrapped_mkey_create(priv->cdev->ctx, priv->cdev->pd,
 					      priv->cdev->pdn,
@@ -54,7 +58,8 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
 	attr.dirty_bitmap_mkey = priv->lm_mr.lkey;
 	for (i = 0; i < priv->nr_virtqs; ++i) {
 		attr.queue_index = i;
-		if (!priv->virtqs[i].virtq) {
+		virtq = &priv->virtqs[i];
+		if (!virtq->configured) {
 			DRV_LOG(DEBUG, "virtq %d is invalid for LM.", i);
 		} else if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
 						      &attr)) {
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 6637ba1503..55cbc9fad2 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -75,6 +75,7 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
+		virtq->configured = 0;
 		for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
 			if (virtq->umems[j].obj) {
 				claim_zero(mlx5_glue->devx_umem_dereg
@@ -111,11 +112,12 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 		rte_intr_fd_set(virtq->intr_handle, -1);
 	}
 	rte_intr_instance_free(virtq->intr_handle);
-	if (virtq->virtq) {
+	if (virtq->configured) {
 		ret = mlx5_vdpa_virtq_stop(virtq->priv, virtq->index);
 		if (ret)
 			DRV_LOG(WARNING, "Failed to stop virtq %d.",
 				virtq->index);
+		virtq->configured = 0;
 		claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
 	}
 	virtq->virtq = NULL;
@@ -138,7 +140,7 @@ int
 mlx5_vdpa_virtq_modify(struct mlx5_vdpa_virtq *virtq, int state)
 {
 	struct mlx5_devx_virtq_attr attr = {
-			.type = MLX5_VIRTQ_MODIFY_TYPE_STATE,
+			.mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_STATE,
 			.state = state ? MLX5_VIRTQ_STATE_RDY :
 					 MLX5_VIRTQ_STATE_SUSPEND,
 			.queue_index = virtq->index,
@@ -153,7 +155,7 @@ mlx5_vdpa_virtq_stop(struct mlx5_vdpa_priv *priv, int index)
 	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
 	int ret;
 
-	if (virtq->stopped)
+	if (virtq->stopped || !virtq->configured)
 		return 0;
 	ret = mlx5_vdpa_virtq_modify(virtq, 0);
 	if (ret)
@@ -209,51 +211,54 @@ mlx5_vdpa_hva_to_gpa(struct rte_vhost_memory *mem, uint64_t hva)
 }
 
 static int
-mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
+mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
+		struct mlx5_devx_virtq_attr *attr,
+		struct rte_vhost_vring *vq, int index)
 {
 	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
-	struct rte_vhost_vring vq;
-	struct mlx5_devx_virtq_attr attr = {0};
 	uint64_t gpa;
 	int ret;
 	unsigned int i;
-	uint16_t last_avail_idx;
-	uint16_t last_used_idx;
-	uint16_t event_num = MLX5_EVENT_TYPE_OBJECT_CHANGE;
-	uint64_t cookie;
-
-	ret = rte_vhost_get_vhost_vring(priv->vid, index, &vq);
-	if (ret)
-		return -1;
-	if (vq.size == 0)
-		return 0;
-	virtq->index = index;
-	virtq->vq_size = vq.size;
-	attr.tso_ipv4 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO4));
-	attr.tso_ipv6 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO6));
-	attr.tx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_CSUM));
-	attr.rx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_GUEST_CSUM));
-	attr.virtio_version_1_0 = !!(priv->features & (1ULL <<
-							VIRTIO_F_VERSION_1));
-	attr.type = (priv->features & (1ULL << VIRTIO_F_RING_PACKED)) ?
+	uint16_t last_avail_idx = 0;
+	uint16_t last_used_idx = 0;
+
+	if (virtq->virtq)
+		attr->mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_STATE |
+			MLX5_VIRTQ_MODIFY_TYPE_ADDR |
+			MLX5_VIRTQ_MODIFY_TYPE_HW_AVAILABLE_INDEX |
+			MLX5_VIRTQ_MODIFY_TYPE_HW_USED_INDEX |
+			MLX5_VIRTQ_MODIFY_TYPE_VERSION_1_0 |
+			MLX5_VIRTQ_MODIFY_TYPE_Q_TYPE |
+			MLX5_VIRTQ_MODIFY_TYPE_Q_MKEY |
+			MLX5_VIRTQ_MODIFY_TYPE_QUEUE_FEATURE_BIT_MASK |
+			MLX5_VIRTQ_MODIFY_TYPE_EVENT_MODE;
+	attr->tso_ipv4 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO4));
+	attr->tso_ipv6 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO6));
+	attr->tx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_CSUM));
+	attr->rx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_GUEST_CSUM));
+	attr->virtio_version_1_0 =
+		!!(priv->features & (1ULL << VIRTIO_F_VERSION_1));
+	attr->q_type =
+		(priv->features & (1ULL << VIRTIO_F_RING_PACKED)) ?
 			MLX5_VIRTQ_TYPE_PACKED : MLX5_VIRTQ_TYPE_SPLIT;
 	/*
 	 * No need event QPs creation when the guest in poll mode or when the
 	 * capability allows it.
 	 */
-	attr.event_mode = vq.callfd != -1 || !(priv->caps.event_mode & (1 <<
-					       MLX5_VIRTQ_EVENT_MODE_NO_MSIX)) ?
-						      MLX5_VIRTQ_EVENT_MODE_QP :
-						  MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
-	if (attr.event_mode == MLX5_VIRTQ_EVENT_MODE_QP) {
-		ret = mlx5_vdpa_event_qp_prepare(priv, vq.size, vq.callfd,
-						&virtq->eqp);
+	attr->event_mode = vq->callfd != -1 ||
+	!(priv->caps.event_mode & (1 << MLX5_VIRTQ_EVENT_MODE_NO_MSIX)) ?
+	MLX5_VIRTQ_EVENT_MODE_QP : MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
+	if (attr->event_mode == MLX5_VIRTQ_EVENT_MODE_QP) {
+		ret = mlx5_vdpa_event_qp_prepare(priv,
+				vq->size, vq->callfd, &virtq->eqp);
 		if (ret) {
-			DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
+			DRV_LOG(ERR,
+				"Failed to create event QPs for virtq %d.",
 				index);
 			return -1;
 		}
-		attr.qp_id = virtq->eqp.fw_qp->id;
+		attr->mod_fields_bitmap |= MLX5_VIRTQ_MODIFY_TYPE_EVENT_MODE;
+		attr->qp_id = virtq->eqp.fw_qp->id;
 	} else {
 		DRV_LOG(INFO, "Virtq %d is, for sure, working by poll mode, no"
 			" need event QPs and event mechanism.", index);
@@ -265,77 +270,82 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 		if (!virtq->counters) {
 			DRV_LOG(ERR, "Failed to create virtq couners for virtq"
 				" %d.", index);
-			goto error;
+			return -1;
 		}
-		attr.counters_obj_id = virtq->counters->id;
+		attr->counters_obj_id = virtq->counters->id;
 	}
 	/* Setup 3 UMEMs for each virtq. */
-	for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
-		uint32_t size;
-		void *buf;
-		struct mlx5dv_devx_umem *obj;
-
-		size = priv->caps.umems[i].a * vq.size + priv->caps.umems[i].b;
-		if (virtq->umems[i].size == size &&
-		    virtq->umems[i].obj != NULL) {
-			/* Reuse registered memory. */
-			memset(virtq->umems[i].buf, 0, size);
-			goto reuse;
-		}
-		if (virtq->umems[i].obj)
-			claim_zero(mlx5_glue->devx_umem_dereg
+	if (virtq->virtq) {
+		for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
+			uint32_t size;
+			void *buf;
+			struct mlx5dv_devx_umem *obj;
+
+			size =
+		priv->caps.umems[i].a * vq->size + priv->caps.umems[i].b;
+			if (virtq->umems[i].size == size &&
+				virtq->umems[i].obj != NULL) {
+				/* Reuse registered memory. */
+				memset(virtq->umems[i].buf, 0, size);
+				goto reuse;
+			}
+			if (virtq->umems[i].obj)
+				claim_zero(mlx5_glue->devx_umem_dereg
 				   (virtq->umems[i].obj));
-		if (virtq->umems[i].buf)
-			rte_free(virtq->umems[i].buf);
-		virtq->umems[i].size = 0;
-		virtq->umems[i].obj = NULL;
-		virtq->umems[i].buf = NULL;
-		buf = rte_zmalloc(__func__, size, 4096);
-		if (buf == NULL) {
-			DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
+			if (virtq->umems[i].buf)
+				rte_free(virtq->umems[i].buf);
+			virtq->umems[i].size = 0;
+			virtq->umems[i].obj = NULL;
+			virtq->umems[i].buf = NULL;
+			buf = rte_zmalloc(__func__,
+				size, 4096);
+			if (buf == NULL) {
+				DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
 				" %u.", i, index);
-			goto error;
-		}
-		obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx, buf, size,
-					       IBV_ACCESS_LOCAL_WRITE);
-		if (obj == NULL) {
-			DRV_LOG(ERR, "Failed to register umem %d for virtq %u.",
+				return -1;
+			}
+			obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx,
+				buf, size, IBV_ACCESS_LOCAL_WRITE);
+			if (obj == NULL) {
+				DRV_LOG(ERR, "Failed to register umem %d for virtq %u.",
 				i, index);
-			goto error;
-		}
-		virtq->umems[i].size = size;
-		virtq->umems[i].buf = buf;
-		virtq->umems[i].obj = obj;
+				rte_free(buf);
+				return -1;
+			}
+			virtq->umems[i].size = size;
+			virtq->umems[i].buf = buf;
+			virtq->umems[i].obj = obj;
 reuse:
-		attr.umems[i].id = virtq->umems[i].obj->umem_id;
-		attr.umems[i].offset = 0;
-		attr.umems[i].size = virtq->umems[i].size;
+			attr->umems[i].id = virtq->umems[i].obj->umem_id;
+			attr->umems[i].offset = 0;
+			attr->umems[i].size = virtq->umems[i].size;
+		}
 	}
-	if (attr.type == MLX5_VIRTQ_TYPE_SPLIT) {
+	if (attr->q_type == MLX5_VIRTQ_TYPE_SPLIT) {
 		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
-					   (uint64_t)(uintptr_t)vq.desc);
+					   (uint64_t)(uintptr_t)vq->desc);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get descriptor ring GPA.");
-			goto error;
+			return -1;
 		}
-		attr.desc_addr = gpa;
+		attr->desc_addr = gpa;
 		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
-					   (uint64_t)(uintptr_t)vq.used);
+					   (uint64_t)(uintptr_t)vq->used);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get GPA for used ring.");
-			goto error;
+			return -1;
 		}
-		attr.used_addr = gpa;
+		attr->used_addr = gpa;
 		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
-					   (uint64_t)(uintptr_t)vq.avail);
+					   (uint64_t)(uintptr_t)vq->avail);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get GPA for available ring.");
-			goto error;
+			return -1;
 		}
-		attr.available_addr = gpa;
+		attr->available_addr = gpa;
 	}
-	ret = rte_vhost_get_vring_base(priv->vid, index, &last_avail_idx,
-				 &last_used_idx);
+	ret = rte_vhost_get_vring_base(priv->vid,
+			index, &last_avail_idx, &last_used_idx);
 	if (ret) {
 		last_avail_idx = 0;
 		last_used_idx = 0;
@@ -345,24 +355,71 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 				"virtq %d.", priv->vid, last_avail_idx,
 				last_used_idx, index);
 	}
-	attr.hw_available_index = last_avail_idx;
-	attr.hw_used_index = last_used_idx;
-	attr.q_size = vq.size;
-	attr.mkey = priv->gpa_mkey_index;
-	attr.tis_id = priv->tiss[(index / 2) % priv->num_lag_ports]->id;
-	attr.queue_index = index;
-	attr.pd = priv->cdev->pdn;
-	attr.hw_latency_mode = priv->hw_latency_mode;
-	attr.hw_max_latency_us = priv->hw_max_latency_us;
-	attr.hw_max_pending_comp = priv->hw_max_pending_comp;
-	virtq->virtq = mlx5_devx_cmd_create_virtq(priv->cdev->ctx, &attr);
+	attr->hw_available_index = last_avail_idx;
+	attr->hw_used_index = last_used_idx;
+	attr->q_size = vq->size;
+	attr->mkey = priv->gpa_mkey_index;
+	attr->tis_id = priv->tiss[(index / 2) % priv->num_lag_ports]->id;
+	attr->queue_index = index;
+	attr->pd = priv->cdev->pdn;
+	attr->hw_latency_mode = priv->hw_latency_mode;
+	attr->hw_max_latency_us = priv->hw_max_latency_us;
+	attr->hw_max_pending_comp = priv->hw_max_pending_comp;
+	if (attr->hw_latency_mode || attr->hw_max_latency_us ||
+		attr->hw_max_pending_comp)
+		attr->mod_fields_bitmap |= MLX5_VIRTQ_MODIFY_TYPE_QUEUE_PERIOD;
+	return 0;
+}
+
+bool
+mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv)
+{
+	return (priv->caps.vnet_modify_ext &&
+			priv->caps.virtio_net_q_addr_modify &&
+			priv->caps.virtio_q_index_modify) ? true : false;
+}
+
+static int
+mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
+{
+	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
+	struct rte_vhost_vring vq;
+	struct mlx5_devx_virtq_attr attr = {0};
+	int ret;
+	uint16_t event_num = MLX5_EVENT_TYPE_OBJECT_CHANGE;
+	uint64_t cookie;
+
+	ret = rte_vhost_get_vhost_vring(priv->vid, index, &vq);
+	if (ret)
+		return -1;
+	if (vq.size == 0)
+		return 0;
 	virtq->priv = priv;
-	if (!virtq->virtq)
+	virtq->stopped = 0;
+	ret = mlx5_vdpa_virtq_sub_objs_prepare(priv, &attr,
+				&vq, index);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to setup update virtq attr"
+			" %d.", index);
 		goto error;
-	claim_zero(rte_vhost_enable_guest_notification(priv->vid, index, 1));
-	if (mlx5_vdpa_virtq_modify(virtq, 1))
+	}
+	if (!virtq->virtq) {
+		virtq->index = index;
+		virtq->vq_size = vq.size;
+		virtq->virtq = mlx5_devx_cmd_create_virtq(priv->cdev->ctx,
+			&attr);
+		if (!virtq->virtq)
+			goto error;
+		attr.mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_STATE;
+	}
+	attr.state = MLX5_VIRTQ_STATE_RDY;
+	ret = mlx5_devx_cmd_modify_virtq(virtq->virtq, &attr);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to modify virtq %d.", index);
 		goto error;
-	virtq->priv = priv;
+	}
+	claim_zero(rte_vhost_enable_guest_notification(priv->vid, index, 1));
+	virtq->configured = 1;
 	rte_write32(virtq->index, priv->virtq_db_addr);
 	/* Setup doorbell mapping. */
 	virtq->intr_handle =
@@ -553,7 +610,7 @@ mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable)
 			return 0;
 		DRV_LOG(INFO, "Virtq %d was modified, recreate it.", index);
 	}
-	if (virtq->virtq) {
+	if (virtq->configured) {
 		virtq->enable = 0;
 		if (is_virtq_recvq(virtq->index, priv->nr_virtqs)) {
 			ret = mlx5_vdpa_steer_update(priv);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH 08/16] vdpa/mlx5: optimize datapath-control synchronization
  2022-06-06 11:20 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                     ` (12 preceding siblings ...)
  2022-06-06 11:20   ` [PATCH 07/16] vdpa/mlx5: pre-create virtq in the prob Li Zhang
@ 2022-06-06 11:20   ` Li Zhang
  2022-06-06 11:20   ` [PATCH v1 08/17] vdpa/mlx5: pre-create virtq in the prob Li Zhang
                     ` (17 subsequent siblings)
  31 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:20 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

The driver used a single global lock for all synchronization
between the datapath and the control path.
It is better to group each critical section only with
the other sections it actually needs to be synchronized with.

Replace the global lock with the following locks:

1. Per-virtq locks synchronize datapath polling and
   parallel configurations on the same virtq.
2. A doorbell lock synchronizes doorbell updates,
   which are shared by all the virtqs in the device.
3. A steering lock protects updates of the shared steering objects.
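
A simplified sketch of the resulting datapath pattern (reduced from the
kick handler in the diff below; error handling and state checks omitted):

	static void
	kick_handler_sketch(struct mlx5_vdpa_virtq *virtq)
	{
		struct mlx5_vdpa_priv *priv = virtq->priv;

		/* Serialize with configuration of this virtq only. */
		pthread_mutex_lock(&virtq->virtq_lock);
		/* The doorbell register is shared by all virtqs. */
		rte_spinlock_lock(&priv->db_lock);
		rte_write32(virtq->index, priv->virtq_db_addr);
		rte_spinlock_unlock(&priv->db_lock);
		pthread_mutex_unlock(&virtq->virtq_lock);
	}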

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 24 ++++---
 drivers/vdpa/mlx5/mlx5_vdpa.h       | 13 ++--
 drivers/vdpa/mlx5/mlx5_vdpa_event.c | 97 ++++++++++++++++++-----------
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c    | 34 +++++++---
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c |  7 ++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 88 +++++++++++++++++++-------
 6 files changed, 184 insertions(+), 79 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index ee99952e11..e5a11f72fd 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -135,6 +135,7 @@ mlx5_vdpa_set_vring_state(int vid, int vring, int state)
 	struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid);
 	struct mlx5_vdpa_priv *priv =
 		mlx5_vdpa_find_priv_resource_by_vdev(vdev);
+	struct mlx5_vdpa_virtq *virtq;
 	int ret;
 
 	if (priv == NULL) {
@@ -145,9 +146,10 @@ mlx5_vdpa_set_vring_state(int vid, int vring, int state)
 		DRV_LOG(ERR, "Too big vring id: %d.", vring);
 		return -E2BIG;
 	}
-	pthread_mutex_lock(&priv->vq_config_lock);
+	virtq = &priv->virtqs[vring];
+	pthread_mutex_lock(&virtq->virtq_lock);
 	ret = mlx5_vdpa_virtq_enable(priv, vring, state);
-	pthread_mutex_unlock(&priv->vq_config_lock);
+	pthread_mutex_unlock(&virtq->virtq_lock);
 	return ret;
 }
 
@@ -267,7 +269,9 @@ mlx5_vdpa_dev_close(int vid)
 		ret |= mlx5_vdpa_lm_log(priv);
 		priv->state = MLX5_VDPA_STATE_IN_PROGRESS;
 	}
+	pthread_mutex_lock(&priv->steer_update_lock);
 	mlx5_vdpa_steer_unset(priv);
+	pthread_mutex_unlock(&priv->steer_update_lock);
 	mlx5_vdpa_virtqs_release(priv);
 	mlx5_vdpa_drain_cq(priv);
 	if (priv->lm_mr.addr)
@@ -276,8 +280,6 @@ mlx5_vdpa_dev_close(int vid)
 	if (!priv->connected)
 		mlx5_vdpa_dev_cache_clean(priv);
 	priv->vid = 0;
-	/* The mutex may stay locked after event thread cancel - initiate it. */
-	pthread_mutex_init(&priv->vq_config_lock, NULL);
 	DRV_LOG(INFO, "vDPA device %d was closed.", vid);
 	return ret;
 }
@@ -549,15 +551,21 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 static int
 mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_virtq *virtq;
 	uint32_t index;
 	uint32_t i;
 
+	for (index = 0; index < priv->caps.max_num_virtio_queues * 2;
+		index++) {
+		virtq = &priv->virtqs[index];
+		pthread_mutex_init(&virtq->virtq_lock, NULL);
+	}
 	if (!priv->queues)
 		return 0;
 	for (index = 0; index < (priv->queues * 2); ++index) {
-		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
+		virtq = &priv->virtqs[index];
 		int ret = mlx5_vdpa_event_qp_prepare(priv, priv->queue_size,
-					-1, &virtq->eqp);
+					-1, virtq);
 
 		if (ret) {
 			DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
@@ -713,7 +721,8 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 	priv->num_lag_ports = attr->num_lag_ports;
 	if (attr->num_lag_ports == 0)
 		priv->num_lag_ports = 1;
-	pthread_mutex_init(&priv->vq_config_lock, NULL);
+	rte_spinlock_init(&priv->db_lock);
+	pthread_mutex_init(&priv->steer_update_lock, NULL);
 	priv->cdev = cdev;
 	mlx5_vdpa_config_get(mkvlist, priv);
 	if (mlx5_vdpa_create_dev_resources(priv))
@@ -797,7 +806,6 @@ mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv)
 	mlx5_vdpa_release_dev_resources(priv);
 	if (priv->vdev)
 		rte_vdpa_unregister_device(priv->vdev);
-	pthread_mutex_destroy(&priv->vq_config_lock);
 	rte_free(priv);
 }
 
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index e5553079fe..3fd5eefc5e 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -82,6 +82,7 @@ struct mlx5_vdpa_virtq {
 	bool stopped;
 	uint32_t configured:1;
 	uint32_t version;
+	pthread_mutex_t virtq_lock;
 	struct mlx5_vdpa_priv *priv;
 	struct mlx5_devx_obj *virtq;
 	struct mlx5_devx_obj *counters;
@@ -126,7 +127,8 @@ struct mlx5_vdpa_priv {
 	TAILQ_ENTRY(mlx5_vdpa_priv) next;
 	bool connected;
 	enum mlx5_dev_state state;
-	pthread_mutex_t vq_config_lock;
+	rte_spinlock_t db_lock;
+	pthread_mutex_t steer_update_lock;
 	uint64_t no_traffic_counter;
 	pthread_t timer_tid;
 	int event_mode;
@@ -222,14 +224,15 @@ int mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv);
  *   Number of descriptors.
  * @param[in] callfd
  *   The guest notification file descriptor.
- * @param[in/out] eqp
- *   Pointer to the event QP structure.
+ * @param[in/out] virtq
+ *   Pointer to the virt-queue structure.
  *
  * @return
  *   0 on success, -1 otherwise and rte_errno is set.
  */
-int mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
-			      int callfd, struct mlx5_vdpa_event_qp *eqp);
+int
+mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
+	int callfd, struct mlx5_vdpa_virtq *virtq);
 
 /**
  * Destroy an event QP and all its related resources.
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index b43dca9255..2b0f5936d1 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -85,12 +85,13 @@ mlx5_vdpa_cq_arm(struct mlx5_vdpa_priv *priv, struct mlx5_vdpa_cq *cq)
 
 static int
 mlx5_vdpa_cq_create(struct mlx5_vdpa_priv *priv, uint16_t log_desc_n,
-		    int callfd, struct mlx5_vdpa_cq *cq)
+		int callfd, struct mlx5_vdpa_virtq *virtq)
 {
 	struct mlx5_devx_cq_attr attr = {
 		.use_first_only = 1,
 		.uar_page_id = mlx5_os_get_devx_uar_page_id(priv->uar.obj),
 	};
+	struct mlx5_vdpa_cq *cq = &virtq->eqp.cq;
 	uint16_t event_nums[1] = {0};
 	int ret;
 
@@ -102,10 +103,11 @@ mlx5_vdpa_cq_create(struct mlx5_vdpa_priv *priv, uint16_t log_desc_n,
 	cq->log_desc_n = log_desc_n;
 	rte_spinlock_init(&cq->sl);
 	/* Subscribe CQ event to the event channel controlled by the driver. */
-	ret = mlx5_os_devx_subscribe_devx_event(priv->eventc,
-						cq->cq_obj.cq->obj,
-						sizeof(event_nums), event_nums,
-						(uint64_t)(uintptr_t)cq);
+	ret = mlx5_glue->devx_subscribe_devx_event(priv->eventc,
+							cq->cq_obj.cq->obj,
+						   sizeof(event_nums),
+						   event_nums,
+						   (uint64_t)(uintptr_t)virtq);
 	if (ret) {
 		DRV_LOG(ERR, "Failed to subscribe CQE event.");
 		rte_errno = errno;
@@ -167,13 +169,17 @@ mlx5_vdpa_cq_poll(struct mlx5_vdpa_cq *cq)
 static void
 mlx5_vdpa_arm_all_cqs(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_virtq *virtq;
 	struct mlx5_vdpa_cq *cq;
 	int i;
 
 	for (i = 0; i < priv->nr_virtqs; i++) {
+		virtq = &priv->virtqs[i];
+		pthread_mutex_lock(&virtq->virtq_lock);
 		cq = &priv->virtqs[i].eqp.cq;
 		if (cq->cq_obj.cq && !cq->armed)
 			mlx5_vdpa_cq_arm(priv, cq);
+		pthread_mutex_unlock(&virtq->virtq_lock);
 	}
 }
 
@@ -220,13 +226,18 @@ mlx5_vdpa_queue_complete(struct mlx5_vdpa_cq *cq)
 static uint32_t
 mlx5_vdpa_queues_complete(struct mlx5_vdpa_priv *priv)
 {
-	int i;
+	struct mlx5_vdpa_virtq *virtq;
+	struct mlx5_vdpa_cq *cq;
 	uint32_t max = 0;
+	uint32_t comp;
+	int i;
 
 	for (i = 0; i < priv->nr_virtqs; i++) {
-		struct mlx5_vdpa_cq *cq = &priv->virtqs[i].eqp.cq;
-		uint32_t comp = mlx5_vdpa_queue_complete(cq);
-
+		virtq = &priv->virtqs[i];
+		pthread_mutex_lock(&virtq->virtq_lock);
+		cq = &virtq->eqp.cq;
+		comp = mlx5_vdpa_queue_complete(cq);
+		pthread_mutex_unlock(&virtq->virtq_lock);
 		if (comp > max)
 			max = comp;
 	}
@@ -253,7 +264,7 @@ mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv)
 }
 
 /* Wait on all CQs channel for completion event. */
-static struct mlx5_vdpa_cq *
+static struct mlx5_vdpa_virtq *
 mlx5_vdpa_event_wait(struct mlx5_vdpa_priv *priv __rte_unused)
 {
 #ifdef HAVE_IBV_DEVX_EVENT
@@ -265,7 +276,8 @@ mlx5_vdpa_event_wait(struct mlx5_vdpa_priv *priv __rte_unused)
 					    sizeof(out.buf));
 
 	if (ret >= 0)
-		return (struct mlx5_vdpa_cq *)(uintptr_t)out.event_resp.cookie;
+		return (struct mlx5_vdpa_virtq *)
+				(uintptr_t)out.event_resp.cookie;
 	DRV_LOG(INFO, "Got error in devx_get_event, ret = %d, errno = %d.",
 		ret, errno);
 #endif
@@ -276,7 +288,7 @@ static void *
 mlx5_vdpa_event_handle(void *arg)
 {
 	struct mlx5_vdpa_priv *priv = arg;
-	struct mlx5_vdpa_cq *cq;
+	struct mlx5_vdpa_virtq *virtq;
 	uint32_t max;
 
 	switch (priv->event_mode) {
@@ -284,7 +296,6 @@ mlx5_vdpa_event_handle(void *arg)
 	case MLX5_VDPA_EVENT_MODE_FIXED_TIMER:
 		priv->timer_delay_us = priv->event_us;
 		while (1) {
-			pthread_mutex_lock(&priv->vq_config_lock);
 			max = mlx5_vdpa_queues_complete(priv);
 			if (max == 0 && priv->no_traffic_counter++ >=
 			    priv->no_traffic_max) {
@@ -292,32 +303,37 @@ mlx5_vdpa_event_handle(void *arg)
 					priv->vdev->device->name);
 				mlx5_vdpa_arm_all_cqs(priv);
 				do {
-					pthread_mutex_unlock
-							(&priv->vq_config_lock);
-					cq = mlx5_vdpa_event_wait(priv);
-					pthread_mutex_lock
-							(&priv->vq_config_lock);
-					if (cq == NULL ||
-					       mlx5_vdpa_queue_complete(cq) > 0)
+					virtq = mlx5_vdpa_event_wait(priv);
+					if (virtq == NULL)
 						break;
+					pthread_mutex_lock(
+						&virtq->virtq_lock);
+					if (mlx5_vdpa_queue_complete(
+						&virtq->eqp.cq) > 0) {
+						pthread_mutex_unlock(
+							&virtq->virtq_lock);
+						break;
+					}
+					pthread_mutex_unlock(
+						&virtq->virtq_lock);
 				} while (1);
 				priv->timer_delay_us = priv->event_us;
 				priv->no_traffic_counter = 0;
 			} else if (max != 0) {
 				priv->no_traffic_counter = 0;
 			}
-			pthread_mutex_unlock(&priv->vq_config_lock);
 			mlx5_vdpa_timer_sleep(priv, max);
 		}
 		return NULL;
 	case MLX5_VDPA_EVENT_MODE_ONLY_INTERRUPT:
 		do {
-			cq = mlx5_vdpa_event_wait(priv);
-			if (cq != NULL) {
-				pthread_mutex_lock(&priv->vq_config_lock);
-				if (mlx5_vdpa_queue_complete(cq) > 0)
-					mlx5_vdpa_cq_arm(priv, cq);
-				pthread_mutex_unlock(&priv->vq_config_lock);
+			virtq = mlx5_vdpa_event_wait(priv);
+			if (virtq != NULL) {
+				pthread_mutex_lock(&virtq->virtq_lock);
+				if (mlx5_vdpa_queue_complete(
+					&virtq->eqp.cq) > 0)
+					mlx5_vdpa_cq_arm(priv, &virtq->eqp.cq);
+				pthread_mutex_unlock(&virtq->virtq_lock);
 			}
 		} while (1);
 		return NULL;
@@ -339,7 +355,6 @@ mlx5_vdpa_err_interrupt_handler(void *cb_arg __rte_unused)
 	struct mlx5_vdpa_virtq *virtq;
 	uint64_t sec;
 
-	pthread_mutex_lock(&priv->vq_config_lock);
 	while (mlx5_glue->devx_get_event(priv->err_chnl, &out.event_resp,
 					 sizeof(out.buf)) >=
 				       (ssize_t)sizeof(out.event_resp.cookie)) {
@@ -351,10 +366,11 @@ mlx5_vdpa_err_interrupt_handler(void *cb_arg __rte_unused)
 			continue;
 		}
 		virtq = &priv->virtqs[vq_index];
+		pthread_mutex_lock(&virtq->virtq_lock);
 		if (!virtq->enable || virtq->version != version)
-			continue;
+			goto unlock;
 		if (rte_rdtsc() / rte_get_tsc_hz() < MLX5_VDPA_ERROR_TIME_SEC)
-			continue;
+			goto unlock;
 		virtq->stopped = true;
 		/* Query error info. */
 		if (mlx5_vdpa_virtq_query(priv, vq_index))
@@ -384,8 +400,9 @@ mlx5_vdpa_err_interrupt_handler(void *cb_arg __rte_unused)
 		for (i = 1; i < RTE_DIM(virtq->err_time); i++)
 			virtq->err_time[i - 1] = virtq->err_time[i];
 		virtq->err_time[RTE_DIM(virtq->err_time) - 1] = rte_rdtsc();
+unlock:
+		pthread_mutex_unlock(&virtq->virtq_lock);
 	}
-	pthread_mutex_unlock(&priv->vq_config_lock);
 #endif
 }
 
@@ -533,11 +550,18 @@ mlx5_vdpa_cqe_event_setup(struct mlx5_vdpa_priv *priv)
 void
 mlx5_vdpa_cqe_event_unset(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_virtq *virtq;
 	void *status;
+	int i;
 
 	if (priv->timer_tid) {
 		pthread_cancel(priv->timer_tid);
 		pthread_join(priv->timer_tid, &status);
+		/* The mutex may stay locked after event thread cancel, initiate it. */
+		for (i = 0; i < priv->nr_virtqs; i++) {
+			virtq = &priv->virtqs[i];
+			pthread_mutex_init(&virtq->virtq_lock, NULL);
+		}
 	}
 	priv->timer_tid = 0;
 }
@@ -614,8 +638,9 @@ mlx5_vdpa_qps2rst2rts(struct mlx5_vdpa_event_qp *eqp)
 
 int
 mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
-			  int callfd, struct mlx5_vdpa_event_qp *eqp)
+	int callfd, struct mlx5_vdpa_virtq *virtq)
 {
+	struct mlx5_vdpa_event_qp *eqp = &virtq->eqp;
 	struct mlx5_devx_qp_attr attr = {0};
 	uint16_t log_desc_n = rte_log2_u32(desc_n);
 	uint32_t ret;
@@ -632,7 +657,8 @@ mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 	}
 	if (eqp->fw_qp)
 		mlx5_vdpa_event_qp_destroy(eqp);
-	if (mlx5_vdpa_cq_create(priv, log_desc_n, callfd, &eqp->cq))
+	if (mlx5_vdpa_cq_create(priv, log_desc_n, callfd, virtq) ||
+		!eqp->cq.cq_obj.cq)
 		return -1;
 	attr.pd = priv->cdev->pdn;
 	attr.ts_format =
@@ -650,8 +676,8 @@ mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 	attr.ts_format =
 		mlx5_ts_format_conv(priv->cdev->config.hca_attr.qp_ts_format);
 	ret = mlx5_devx_qp_create(priv->cdev->ctx, &(eqp->sw_qp),
-					attr.num_of_receive_wqes *
-					MLX5_WSEG_SIZE, &attr, SOCKET_ID_ANY);
+				  attr.num_of_receive_wqes * MLX5_WSEG_SIZE,
+				  &attr, SOCKET_ID_ANY);
 	if (ret) {
 		DRV_LOG(ERR, "Failed to create SW QP(%u).", rte_errno);
 		goto error;
@@ -668,3 +694,4 @@ mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 	mlx5_vdpa_event_qp_destroy(eqp);
 	return -1;
 }
+
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
index a8faf0c116..efebf364d0 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
@@ -25,11 +25,18 @@ mlx5_vdpa_logging_enable(struct mlx5_vdpa_priv *priv, int enable)
 		if (!virtq->configured) {
 			DRV_LOG(DEBUG, "virtq %d is invalid for dirty bitmap "
 				"enabling.", i);
-		} else if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
+		} else {
+			struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
+
+			pthread_mutex_lock(&virtq->virtq_lock);
+			if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
 			   &attr)) {
-			DRV_LOG(ERR, "Failed to modify virtq %d for dirty "
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				DRV_LOG(ERR, "Failed to modify virtq %d for dirty "
 				"bitmap enabling.", i);
-			return -1;
+				return -1;
+			}
+			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
 	}
 	return 0;
@@ -61,10 +68,19 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
 		virtq = &priv->virtqs[i];
 		if (!virtq->configured) {
 			DRV_LOG(DEBUG, "virtq %d is invalid for LM.", i);
-		} else if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
-						      &attr)) {
-			DRV_LOG(ERR, "Failed to modify virtq %d for LM.", i);
-			goto err;
+		} else {
+			struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
+
+			pthread_mutex_lock(&virtq->virtq_lock);
+			if (mlx5_devx_cmd_modify_virtq(
+					priv->virtqs[i].virtq,
+					&attr)) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				DRV_LOG(ERR,
+				"Failed to modify virtq %d for LM.", i);
+				goto err;
+			}
+			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
 	}
 	return 0;
@@ -79,6 +95,7 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
 int
 mlx5_vdpa_lm_log(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_virtq *virtq;
 	uint64_t features;
 	int ret = rte_vhost_get_negotiated_features(priv->vid, &features);
 	int i;
@@ -90,10 +107,13 @@ mlx5_vdpa_lm_log(struct mlx5_vdpa_priv *priv)
 	if (!RTE_VHOST_NEED_LOG(features))
 		return 0;
 	for (i = 0; i < priv->nr_virtqs; ++i) {
+		virtq = &priv->virtqs[i];
 		if (!priv->virtqs[i].virtq) {
 			DRV_LOG(DEBUG, "virtq %d is invalid for LM log.", i);
 		} else {
+			pthread_mutex_lock(&virtq->virtq_lock);
 			ret = mlx5_vdpa_virtq_stop(priv, i);
+			pthread_mutex_unlock(&virtq->virtq_lock);
 			if (ret) {
 				DRV_LOG(ERR, "Failed to stop virtq %d for LM "
 					"log.", i);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
index d4b4375c88..4cbf09784e 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
@@ -237,19 +237,24 @@ mlx5_vdpa_rss_flows_create(struct mlx5_vdpa_priv *priv)
 int
 mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv)
 {
-	int ret = mlx5_vdpa_rqt_prepare(priv);
+	int ret;
 
+	pthread_mutex_lock(&priv->steer_update_lock);
+	ret = mlx5_vdpa_rqt_prepare(priv);
 	if (ret == 0) {
 		mlx5_vdpa_steer_unset(priv);
 	} else if (ret < 0) {
+		pthread_mutex_unlock(&priv->steer_update_lock);
 		return ret;
 	} else if (!priv->steer.rss[0].flow) {
 		ret = mlx5_vdpa_rss_flows_create(priv);
 		if (ret) {
 			DRV_LOG(ERR, "Cannot create RSS flows.");
+			pthread_mutex_unlock(&priv->steer_update_lock);
 			return -1;
 		}
 	}
+	pthread_mutex_unlock(&priv->steer_update_lock);
 	return 0;
 }
 
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 55cbc9fad2..138b7bdbc5 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -24,13 +24,17 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 	int nbytes;
 	int retry;
 
+	pthread_mutex_lock(&virtq->virtq_lock);
 	if (priv->state != MLX5_VDPA_STATE_CONFIGURED && !virtq->enable) {
+		pthread_mutex_unlock(&virtq->virtq_lock);
 		DRV_LOG(ERR,  "device %d queue %d down, skip kick handling",
 			priv->vid, virtq->index);
 		return;
 	}
-	if (rte_intr_fd_get(virtq->intr_handle) < 0)
+	if (rte_intr_fd_get(virtq->intr_handle) < 0) {
+		pthread_mutex_unlock(&virtq->virtq_lock);
 		return;
+	}
 	for (retry = 0; retry < 3; ++retry) {
 		nbytes = read(rte_intr_fd_get(virtq->intr_handle), &buf,
 			      8);
@@ -44,9 +48,14 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 		}
 		break;
 	}
-	if (nbytes < 0)
+	if (nbytes < 0) {
+		pthread_mutex_unlock(&virtq->virtq_lock);
 		return;
+	}
+	rte_spinlock_lock(&priv->db_lock);
 	rte_write32(virtq->index, priv->virtq_db_addr);
+	rte_spinlock_unlock(&priv->db_lock);
+	pthread_mutex_unlock(&virtq->virtq_lock);
 	if (priv->state != MLX5_VDPA_STATE_CONFIGURED && !virtq->enable) {
 		DRV_LOG(ERR,  "device %d queue %d down, skip kick handling",
 			priv->vid, virtq->index);
@@ -66,6 +75,33 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 	DRV_LOG(DEBUG, "Ring virtq %u doorbell.", virtq->index);
 }
 
+/* Virtq must be locked before calling this function. */
+static void
+mlx5_vdpa_virtq_unregister_intr_handle(struct mlx5_vdpa_virtq *virtq)
+{
+	int ret = -EAGAIN;
+
+	if (!virtq->intr_handle)
+		return;
+	if (rte_intr_fd_get(virtq->intr_handle) >= 0) {
+		while (ret == -EAGAIN) {
+			ret = rte_intr_callback_unregister(virtq->intr_handle,
+					mlx5_vdpa_virtq_kick_handler, virtq);
+			if (ret == -EAGAIN) {
+				DRV_LOG(DEBUG, "Try again to unregister fd %d of virtq %hu interrupt",
+					rte_intr_fd_get(virtq->intr_handle),
+					virtq->index);
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				usleep(MLX5_VDPA_INTR_RETRIES_USEC);
+				pthread_mutex_lock(&virtq->virtq_lock);
+			}
+		}
+		(void)rte_intr_fd_set(virtq->intr_handle, -1);
+	}
+	rte_intr_instance_free(virtq->intr_handle);
+	virtq->intr_handle = NULL;
+}
+
 /* Release cached VQ resources. */
 void
 mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
@@ -75,6 +111,7 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
+		pthread_mutex_lock(&virtq->virtq_lock);
 		virtq->configured = 0;
 		for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
 			if (virtq->umems[j].obj) {
@@ -90,28 +127,17 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 		}
 		if (virtq->eqp.fw_qp)
 			mlx5_vdpa_event_qp_destroy(&virtq->eqp);
+		pthread_mutex_unlock(&virtq->virtq_lock);
 	}
 }
 
+
 static int
 mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 {
 	int ret = -EAGAIN;
 
-	if (rte_intr_fd_get(virtq->intr_handle) >= 0) {
-		while (ret == -EAGAIN) {
-			ret = rte_intr_callback_unregister(virtq->intr_handle,
-					mlx5_vdpa_virtq_kick_handler, virtq);
-			if (ret == -EAGAIN) {
-				DRV_LOG(DEBUG, "Try again to unregister fd %d of virtq %hu interrupt",
-					rte_intr_fd_get(virtq->intr_handle),
-					virtq->index);
-				usleep(MLX5_VDPA_INTR_RETRIES_USEC);
-			}
-		}
-		rte_intr_fd_set(virtq->intr_handle, -1);
-	}
-	rte_intr_instance_free(virtq->intr_handle);
+	mlx5_vdpa_virtq_unregister_intr_handle(virtq);
 	if (virtq->configured) {
 		ret = mlx5_vdpa_virtq_stop(virtq->priv, virtq->index);
 		if (ret)
@@ -128,10 +154,15 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 void
 mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_virtq *virtq;
 	int i;
 
-	for (i = 0; i < priv->nr_virtqs; i++)
-		mlx5_vdpa_virtq_unset(&priv->virtqs[i]);
+	for (i = 0; i < priv->nr_virtqs; i++) {
+		virtq = &priv->virtqs[i];
+		pthread_mutex_lock(&virtq->virtq_lock);
+		mlx5_vdpa_virtq_unset(virtq);
+		pthread_mutex_unlock(&virtq->virtq_lock);
+	}
 	priv->features = 0;
 	priv->nr_virtqs = 0;
 }
@@ -250,7 +281,7 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 	MLX5_VIRTQ_EVENT_MODE_QP : MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
 	if (attr->event_mode == MLX5_VIRTQ_EVENT_MODE_QP) {
 		ret = mlx5_vdpa_event_qp_prepare(priv,
-				vq->size, vq->callfd, &virtq->eqp);
+				vq->size, vq->callfd, virtq);
 		if (ret) {
 			DRV_LOG(ERR,
 				"Failed to create event QPs for virtq %d.",
@@ -420,7 +451,9 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 	}
 	claim_zero(rte_vhost_enable_guest_notification(priv->vid, index, 1));
 	virtq->configured = 1;
+	rte_spinlock_lock(&priv->db_lock);
 	rte_write32(virtq->index, priv->virtq_db_addr);
+	rte_spinlock_unlock(&priv->db_lock);
 	/* Setup doorbell mapping. */
 	virtq->intr_handle =
 		rte_intr_instance_alloc(RTE_INTR_INSTANCE_F_SHARED);
@@ -441,7 +474,7 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 		if (rte_intr_callback_register(virtq->intr_handle,
 					       mlx5_vdpa_virtq_kick_handler,
 					       virtq)) {
-			rte_intr_fd_set(virtq->intr_handle, -1);
+			(void)rte_intr_fd_set(virtq->intr_handle, -1);
 			DRV_LOG(ERR, "Failed to register virtq %d interrupt.",
 				index);
 			goto error;
@@ -537,6 +570,7 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 	uint32_t i;
 	uint16_t nr_vring = rte_vhost_get_vring_num(priv->vid);
 	int ret = rte_vhost_get_negotiated_features(priv->vid, &priv->features);
+	struct mlx5_vdpa_virtq *virtq;
 
 	if (ret || mlx5_vdpa_features_validate(priv)) {
 		DRV_LOG(ERR, "Failed to configure negotiated features.");
@@ -556,9 +590,17 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 		return -1;
 	}
 	priv->nr_virtqs = nr_vring;
-	for (i = 0; i < nr_vring; i++)
-		if (priv->virtqs[i].enable && mlx5_vdpa_virtq_setup(priv, i))
-			goto error;
+	for (i = 0; i < nr_vring; i++) {
+		virtq = &priv->virtqs[i];
+		if (virtq->enable) {
+			pthread_mutex_lock(&virtq->virtq_lock);
+			if (mlx5_vdpa_virtq_setup(priv, i)) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				goto error;
+			}
+			pthread_mutex_unlock(&virtq->virtq_lock);
+		}
+	}
 	return 0;
 error:
 	mlx5_vdpa_virtqs_release(priv);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v1 08/17] vdpa/mlx5: pre-create virtq in the prob
  2022-06-06 11:20 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                     ` (13 preceding siblings ...)
  2022-06-06 11:20   ` [PATCH 08/16] vdpa/mlx5: optimize datapath-control synchronization Li Zhang
@ 2022-06-06 11:20   ` Li Zhang
  2022-06-06 11:20   ` [PATCH 09/16] vdpa/mlx5: add multi-thread management for configuration Li Zhang
                     ` (16 subsequent siblings)
  31 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:20 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

The dev_config operation is called during the LM (live migration)
process. LM time is critical because all the VM packets
are dropped while it is in progress.

Move the virtq creation to probe time and
only modify the configuration later, in
the dev_config stage, using the new ability
to modify a virtq.

This optimization accelerates the LM process and
reduces its time by 70%.

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.h       |   4 +
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c    |  13 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 257 +++++++++++++++++-----------
 3 files changed, 170 insertions(+), 104 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index bf82026e37..e5553079fe 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -80,6 +80,7 @@ struct mlx5_vdpa_virtq {
 	uint16_t vq_size;
 	uint8_t notifier_state;
 	bool stopped;
+	uint32_t configured:1;
 	uint32_t version;
 	struct mlx5_vdpa_priv *priv;
 	struct mlx5_devx_obj *virtq;
@@ -489,4 +490,7 @@ mlx5_vdpa_virtq_stats_reset(struct mlx5_vdpa_priv *priv, int qid);
  */
 void
 mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv);
+
+bool
+mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
index 43a2b98255..a8faf0c116 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
@@ -12,14 +12,17 @@ int
 mlx5_vdpa_logging_enable(struct mlx5_vdpa_priv *priv, int enable)
 {
 	struct mlx5_devx_virtq_attr attr = {
-		.type = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE,
+		.mod_fields_bitmap =
+			MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE,
 		.dirty_bitmap_dump_enable = enable,
 	};
+	struct mlx5_vdpa_virtq *virtq;
 	int i;
 
 	for (i = 0; i < priv->nr_virtqs; ++i) {
 		attr.queue_index = i;
-		if (!priv->virtqs[i].virtq) {
+		virtq = &priv->virtqs[i];
+		if (!virtq->configured) {
 			DRV_LOG(DEBUG, "virtq %d is invalid for dirty bitmap "
 				"enabling.", i);
 		} else if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
@@ -37,10 +40,11 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
 			   uint64_t log_size)
 {
 	struct mlx5_devx_virtq_attr attr = {
-		.type = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS,
+		.mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS,
 		.dirty_bitmap_addr = log_base,
 		.dirty_bitmap_size = log_size,
 	};
+	struct mlx5_vdpa_virtq *virtq;
 	int i;
 	int ret = mlx5_os_wrapped_mkey_create(priv->cdev->ctx, priv->cdev->pd,
 					      priv->cdev->pdn,
@@ -54,7 +58,8 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
 	attr.dirty_bitmap_mkey = priv->lm_mr.lkey;
 	for (i = 0; i < priv->nr_virtqs; ++i) {
 		attr.queue_index = i;
-		if (!priv->virtqs[i].virtq) {
+		virtq = &priv->virtqs[i];
+		if (!virtq->configured) {
 			DRV_LOG(DEBUG, "virtq %d is invalid for LM.", i);
 		} else if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
 						      &attr)) {
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 6637ba1503..55cbc9fad2 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -75,6 +75,7 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
+		virtq->configured = 0;
 		for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
 			if (virtq->umems[j].obj) {
 				claim_zero(mlx5_glue->devx_umem_dereg
@@ -111,11 +112,12 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 		rte_intr_fd_set(virtq->intr_handle, -1);
 	}
 	rte_intr_instance_free(virtq->intr_handle);
-	if (virtq->virtq) {
+	if (virtq->configured) {
 		ret = mlx5_vdpa_virtq_stop(virtq->priv, virtq->index);
 		if (ret)
 			DRV_LOG(WARNING, "Failed to stop virtq %d.",
 				virtq->index);
+		virtq->configured = 0;
 		claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
 	}
 	virtq->virtq = NULL;
@@ -138,7 +140,7 @@ int
 mlx5_vdpa_virtq_modify(struct mlx5_vdpa_virtq *virtq, int state)
 {
 	struct mlx5_devx_virtq_attr attr = {
-			.type = MLX5_VIRTQ_MODIFY_TYPE_STATE,
+			.mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_STATE,
 			.state = state ? MLX5_VIRTQ_STATE_RDY :
 					 MLX5_VIRTQ_STATE_SUSPEND,
 			.queue_index = virtq->index,
@@ -153,7 +155,7 @@ mlx5_vdpa_virtq_stop(struct mlx5_vdpa_priv *priv, int index)
 	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
 	int ret;
 
-	if (virtq->stopped)
+	if (virtq->stopped || !virtq->configured)
 		return 0;
 	ret = mlx5_vdpa_virtq_modify(virtq, 0);
 	if (ret)
@@ -209,51 +211,54 @@ mlx5_vdpa_hva_to_gpa(struct rte_vhost_memory *mem, uint64_t hva)
 }
 
 static int
-mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
+mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
+		struct mlx5_devx_virtq_attr *attr,
+		struct rte_vhost_vring *vq, int index)
 {
 	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
-	struct rte_vhost_vring vq;
-	struct mlx5_devx_virtq_attr attr = {0};
 	uint64_t gpa;
 	int ret;
 	unsigned int i;
-	uint16_t last_avail_idx;
-	uint16_t last_used_idx;
-	uint16_t event_num = MLX5_EVENT_TYPE_OBJECT_CHANGE;
-	uint64_t cookie;
-
-	ret = rte_vhost_get_vhost_vring(priv->vid, index, &vq);
-	if (ret)
-		return -1;
-	if (vq.size == 0)
-		return 0;
-	virtq->index = index;
-	virtq->vq_size = vq.size;
-	attr.tso_ipv4 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO4));
-	attr.tso_ipv6 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO6));
-	attr.tx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_CSUM));
-	attr.rx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_GUEST_CSUM));
-	attr.virtio_version_1_0 = !!(priv->features & (1ULL <<
-							VIRTIO_F_VERSION_1));
-	attr.type = (priv->features & (1ULL << VIRTIO_F_RING_PACKED)) ?
+	uint16_t last_avail_idx = 0;
+	uint16_t last_used_idx = 0;
+
+	if (virtq->virtq)
+		attr->mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_STATE |
+			MLX5_VIRTQ_MODIFY_TYPE_ADDR |
+			MLX5_VIRTQ_MODIFY_TYPE_HW_AVAILABLE_INDEX |
+			MLX5_VIRTQ_MODIFY_TYPE_HW_USED_INDEX |
+			MLX5_VIRTQ_MODIFY_TYPE_VERSION_1_0 |
+			MLX5_VIRTQ_MODIFY_TYPE_Q_TYPE |
+			MLX5_VIRTQ_MODIFY_TYPE_Q_MKEY |
+			MLX5_VIRTQ_MODIFY_TYPE_QUEUE_FEATURE_BIT_MASK |
+			MLX5_VIRTQ_MODIFY_TYPE_EVENT_MODE;
+	attr->tso_ipv4 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO4));
+	attr->tso_ipv6 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO6));
+	attr->tx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_CSUM));
+	attr->rx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_GUEST_CSUM));
+	attr->virtio_version_1_0 =
+		!!(priv->features & (1ULL << VIRTIO_F_VERSION_1));
+	attr->q_type =
+		(priv->features & (1ULL << VIRTIO_F_RING_PACKED)) ?
 			MLX5_VIRTQ_TYPE_PACKED : MLX5_VIRTQ_TYPE_SPLIT;
 	/*
 	 * No need event QPs creation when the guest in poll mode or when the
 	 * capability allows it.
 	 */
-	attr.event_mode = vq.callfd != -1 || !(priv->caps.event_mode & (1 <<
-					       MLX5_VIRTQ_EVENT_MODE_NO_MSIX)) ?
-						      MLX5_VIRTQ_EVENT_MODE_QP :
-						  MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
-	if (attr.event_mode == MLX5_VIRTQ_EVENT_MODE_QP) {
-		ret = mlx5_vdpa_event_qp_prepare(priv, vq.size, vq.callfd,
-						&virtq->eqp);
+	attr->event_mode = vq->callfd != -1 ||
+	!(priv->caps.event_mode & (1 << MLX5_VIRTQ_EVENT_MODE_NO_MSIX)) ?
+	MLX5_VIRTQ_EVENT_MODE_QP : MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
+	if (attr->event_mode == MLX5_VIRTQ_EVENT_MODE_QP) {
+		ret = mlx5_vdpa_event_qp_prepare(priv,
+				vq->size, vq->callfd, &virtq->eqp);
 		if (ret) {
-			DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
+			DRV_LOG(ERR,
+				"Failed to create event QPs for virtq %d.",
 				index);
 			return -1;
 		}
-		attr.qp_id = virtq->eqp.fw_qp->id;
+		attr->mod_fields_bitmap |= MLX5_VIRTQ_MODIFY_TYPE_EVENT_MODE;
+		attr->qp_id = virtq->eqp.fw_qp->id;
 	} else {
 		DRV_LOG(INFO, "Virtq %d is, for sure, working by poll mode, no"
 			" need event QPs and event mechanism.", index);
@@ -265,77 +270,82 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 		if (!virtq->counters) {
 			DRV_LOG(ERR, "Failed to create virtq couners for virtq"
 				" %d.", index);
-			goto error;
+			return -1;
 		}
-		attr.counters_obj_id = virtq->counters->id;
+		attr->counters_obj_id = virtq->counters->id;
 	}
 	/* Setup 3 UMEMs for each virtq. */
-	for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
-		uint32_t size;
-		void *buf;
-		struct mlx5dv_devx_umem *obj;
-
-		size = priv->caps.umems[i].a * vq.size + priv->caps.umems[i].b;
-		if (virtq->umems[i].size == size &&
-		    virtq->umems[i].obj != NULL) {
-			/* Reuse registered memory. */
-			memset(virtq->umems[i].buf, 0, size);
-			goto reuse;
-		}
-		if (virtq->umems[i].obj)
-			claim_zero(mlx5_glue->devx_umem_dereg
+	if (virtq->virtq) {
+		for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
+			uint32_t size;
+			void *buf;
+			struct mlx5dv_devx_umem *obj;
+
+			size =
+		priv->caps.umems[i].a * vq->size + priv->caps.umems[i].b;
+			if (virtq->umems[i].size == size &&
+				virtq->umems[i].obj != NULL) {
+				/* Reuse registered memory. */
+				memset(virtq->umems[i].buf, 0, size);
+				goto reuse;
+			}
+			if (virtq->umems[i].obj)
+				claim_zero(mlx5_glue->devx_umem_dereg
 				   (virtq->umems[i].obj));
-		if (virtq->umems[i].buf)
-			rte_free(virtq->umems[i].buf);
-		virtq->umems[i].size = 0;
-		virtq->umems[i].obj = NULL;
-		virtq->umems[i].buf = NULL;
-		buf = rte_zmalloc(__func__, size, 4096);
-		if (buf == NULL) {
-			DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
+			if (virtq->umems[i].buf)
+				rte_free(virtq->umems[i].buf);
+			virtq->umems[i].size = 0;
+			virtq->umems[i].obj = NULL;
+			virtq->umems[i].buf = NULL;
+			buf = rte_zmalloc(__func__,
+				size, 4096);
+			if (buf == NULL) {
+				DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
 				" %u.", i, index);
-			goto error;
-		}
-		obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx, buf, size,
-					       IBV_ACCESS_LOCAL_WRITE);
-		if (obj == NULL) {
-			DRV_LOG(ERR, "Failed to register umem %d for virtq %u.",
+				return -1;
+			}
+			obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx,
+				buf, size, IBV_ACCESS_LOCAL_WRITE);
+			if (obj == NULL) {
+				DRV_LOG(ERR, "Failed to register umem %d for virtq %u.",
 				i, index);
-			goto error;
-		}
-		virtq->umems[i].size = size;
-		virtq->umems[i].buf = buf;
-		virtq->umems[i].obj = obj;
+				rte_free(buf);
+				return -1;
+			}
+			virtq->umems[i].size = size;
+			virtq->umems[i].buf = buf;
+			virtq->umems[i].obj = obj;
 reuse:
-		attr.umems[i].id = virtq->umems[i].obj->umem_id;
-		attr.umems[i].offset = 0;
-		attr.umems[i].size = virtq->umems[i].size;
+			attr->umems[i].id = virtq->umems[i].obj->umem_id;
+			attr->umems[i].offset = 0;
+			attr->umems[i].size = virtq->umems[i].size;
+		}
 	}
-	if (attr.type == MLX5_VIRTQ_TYPE_SPLIT) {
+	if (attr->q_type == MLX5_VIRTQ_TYPE_SPLIT) {
 		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
-					   (uint64_t)(uintptr_t)vq.desc);
+					   (uint64_t)(uintptr_t)vq->desc);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get descriptor ring GPA.");
-			goto error;
+			return -1;
 		}
-		attr.desc_addr = gpa;
+		attr->desc_addr = gpa;
 		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
-					   (uint64_t)(uintptr_t)vq.used);
+					   (uint64_t)(uintptr_t)vq->used);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get GPA for used ring.");
-			goto error;
+			return -1;
 		}
-		attr.used_addr = gpa;
+		attr->used_addr = gpa;
 		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
-					   (uint64_t)(uintptr_t)vq.avail);
+					   (uint64_t)(uintptr_t)vq->avail);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get GPA for available ring.");
-			goto error;
+			return -1;
 		}
-		attr.available_addr = gpa;
+		attr->available_addr = gpa;
 	}
-	ret = rte_vhost_get_vring_base(priv->vid, index, &last_avail_idx,
-				 &last_used_idx);
+	ret = rte_vhost_get_vring_base(priv->vid,
+			index, &last_avail_idx, &last_used_idx);
 	if (ret) {
 		last_avail_idx = 0;
 		last_used_idx = 0;
@@ -345,24 +355,71 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 				"virtq %d.", priv->vid, last_avail_idx,
 				last_used_idx, index);
 	}
-	attr.hw_available_index = last_avail_idx;
-	attr.hw_used_index = last_used_idx;
-	attr.q_size = vq.size;
-	attr.mkey = priv->gpa_mkey_index;
-	attr.tis_id = priv->tiss[(index / 2) % priv->num_lag_ports]->id;
-	attr.queue_index = index;
-	attr.pd = priv->cdev->pdn;
-	attr.hw_latency_mode = priv->hw_latency_mode;
-	attr.hw_max_latency_us = priv->hw_max_latency_us;
-	attr.hw_max_pending_comp = priv->hw_max_pending_comp;
-	virtq->virtq = mlx5_devx_cmd_create_virtq(priv->cdev->ctx, &attr);
+	attr->hw_available_index = last_avail_idx;
+	attr->hw_used_index = last_used_idx;
+	attr->q_size = vq->size;
+	attr->mkey = priv->gpa_mkey_index;
+	attr->tis_id = priv->tiss[(index / 2) % priv->num_lag_ports]->id;
+	attr->queue_index = index;
+	attr->pd = priv->cdev->pdn;
+	attr->hw_latency_mode = priv->hw_latency_mode;
+	attr->hw_max_latency_us = priv->hw_max_latency_us;
+	attr->hw_max_pending_comp = priv->hw_max_pending_comp;
+	if (attr->hw_latency_mode || attr->hw_max_latency_us ||
+		attr->hw_max_pending_comp)
+		attr->mod_fields_bitmap |= MLX5_VIRTQ_MODIFY_TYPE_QUEUE_PERIOD;
+	return 0;
+}
+
+bool
+mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv)
+{
+	return (priv->caps.vnet_modify_ext &&
+			priv->caps.virtio_net_q_addr_modify &&
+			priv->caps.virtio_q_index_modify) ? true : false;
+}
+
+static int
+mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
+{
+	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
+	struct rte_vhost_vring vq;
+	struct mlx5_devx_virtq_attr attr = {0};
+	int ret;
+	uint16_t event_num = MLX5_EVENT_TYPE_OBJECT_CHANGE;
+	uint64_t cookie;
+
+	ret = rte_vhost_get_vhost_vring(priv->vid, index, &vq);
+	if (ret)
+		return -1;
+	if (vq.size == 0)
+		return 0;
 	virtq->priv = priv;
-	if (!virtq->virtq)
+	virtq->stopped = 0;
+	ret = mlx5_vdpa_virtq_sub_objs_prepare(priv, &attr,
+				&vq, index);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to setup update virtq attr"
+			" %d.", index);
 		goto error;
-	claim_zero(rte_vhost_enable_guest_notification(priv->vid, index, 1));
-	if (mlx5_vdpa_virtq_modify(virtq, 1))
+	}
+	if (!virtq->virtq) {
+		virtq->index = index;
+		virtq->vq_size = vq.size;
+		virtq->virtq = mlx5_devx_cmd_create_virtq(priv->cdev->ctx,
+			&attr);
+		if (!virtq->virtq)
+			goto error;
+		attr.mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_STATE;
+	}
+	attr.state = MLX5_VIRTQ_STATE_RDY;
+	ret = mlx5_devx_cmd_modify_virtq(virtq->virtq, &attr);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to modify virtq %d.", index);
 		goto error;
-	virtq->priv = priv;
+	}
+	claim_zero(rte_vhost_enable_guest_notification(priv->vid, index, 1));
+	virtq->configured = 1;
 	rte_write32(virtq->index, priv->virtq_db_addr);
 	/* Setup doorbell mapping. */
 	virtq->intr_handle =
@@ -553,7 +610,7 @@ mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable)
 			return 0;
 		DRV_LOG(INFO, "Virtq %d was modified, recreate it.", index);
 	}
-	if (virtq->virtq) {
+	if (virtq->configured) {
 		virtq->enable = 0;
 		if (is_virtq_recvq(virtq->index, priv->nr_virtqs)) {
 			ret = mlx5_vdpa_steer_update(priv);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH 09/16] vdpa/mlx5: add multi-thread management for configuration
  2022-06-06 11:20 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                     ` (14 preceding siblings ...)
  2022-06-06 11:20   ` [PATCH v1 08/17] vdpa/mlx5: pre-create virtq in the prob Li Zhang
@ 2022-06-06 11:20   ` Li Zhang
  2022-06-06 11:20   ` [PATCH v1 09/17] vdpa/mlx5: optimize datapath-control synchronization Li Zhang
                     ` (15 subsequent siblings)
  31 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:20 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

The live migration (LM) process includes many object creations and
destructions on the source and the destination servers.
The longer the LM takes, the more packets the VM drops.
To shorten the LM time, the mlx5 FW configurations need to be done in
parallel. Add internal multi-thread management in the driver for this.

A new devarg defines the number of threads; their CPU affinity is taken
from the event_core devarg. The management is shared between all the
devices of the driver. Since the event_core also serves the datapath
event thread, reduce the priority of the datapath event thread to allow
fast configuration of the devices doing the LM.
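
A minimal, self-contained sketch of the thread style described above
(SCHED_RR priority, pinned to a single core) using plain pthreads. It is
illustrative only and is not the driver code; the actual implementation is
in mlx5_vdpa_cthread.c below, and the helper name and "core" parameter here
are assumptions.

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Illustrative helper: create a SCHED_RR thread pinned to "core" so it
 * shares the core of the event completion queue scheduling thread.
 */
static int
create_pinned_rr_thread(pthread_t *tid, int core,
                        void *(*fn)(void *), void *arg)
{
        struct sched_param sp = {
                .sched_priority = sched_get_priority_max(SCHED_RR),
        };
        pthread_attr_t attr;
        cpu_set_t cpus;

        pthread_attr_init(&attr);
        pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
        pthread_attr_setschedpolicy(&attr, SCHED_RR);
        pthread_attr_setschedparam(&attr, &sp);
        CPU_ZERO(&cpus);
        CPU_SET(core, &cpus);
        pthread_attr_setaffinity_np(&attr, sizeof(cpus), &cpus);
        return pthread_create(tid, &attr, fn, arg);
}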

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 doc/guides/vdpadevs/mlx5.rst          |  11 +++
 drivers/vdpa/mlx5/meson.build         |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.c         |  41 ++++++++
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  36 +++++++
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 129 ++++++++++++++++++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_event.c   |   2 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   |   8 +-
 7 files changed, 223 insertions(+), 5 deletions(-)
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c

diff --git a/doc/guides/vdpadevs/mlx5.rst b/doc/guides/vdpadevs/mlx5.rst
index 0ad77bf535..b75a01688d 100644
--- a/doc/guides/vdpadevs/mlx5.rst
+++ b/doc/guides/vdpadevs/mlx5.rst
@@ -78,6 +78,17 @@ for an additional list of options shared with other mlx5 drivers.
   CPU core number to set polling thread affinity to, default to control plane
   cpu.
 
+- ``max_conf_threads`` parameter [int]
+
+  Allow the driver to use internal threads to obtain fast configuration.
+  All the threads will be opened on the same core as the event completion queue scheduling thread.
+
+  - 0, default, don't use internal threads for configuration.
+
+  - 1 - 256, number of internal threads in addition to the caller thread (8 is suggested).
+    This value, if not 0, should be the same for all the devices;
+    the first probe will take it, together with the event_core, for all the multi-thread configurations in the driver.
+
 - ``hw_latency_mode`` parameter [int]
 
   The completion queue moderation mode:
diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build
index 0fa82ad257..9d8dbb1a82 100644
--- a/drivers/vdpa/mlx5/meson.build
+++ b/drivers/vdpa/mlx5/meson.build
@@ -15,6 +15,7 @@ sources = files(
         'mlx5_vdpa_virtq.c',
         'mlx5_vdpa_steer.c',
         'mlx5_vdpa_lm.c',
+        'mlx5_vdpa_cthread.c',
 )
 cflags_options = [
         '-std=c11',
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index e5a11f72fd..a9d023ed08 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -50,6 +50,8 @@ TAILQ_HEAD(mlx5_vdpa_privs, mlx5_vdpa_priv) priv_list =
 					      TAILQ_HEAD_INITIALIZER(priv_list);
 static pthread_mutex_t priv_list_lock = PTHREAD_MUTEX_INITIALIZER;
 
+struct mlx5_vdpa_conf_thread_mng conf_thread_mng;
+
 static void mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv);
 
 static struct mlx5_vdpa_priv *
@@ -493,6 +495,29 @@ mlx5_vdpa_args_check_handler(const char *key, const char *val, void *opaque)
 			DRV_LOG(WARNING, "Invalid event_core %s.", val);
 		else
 			priv->event_core = tmp;
+	} else if (strcmp(key, "max_conf_threads") == 0) {
+		if (tmp) {
+			priv->use_c_thread = true;
+			if (!conf_thread_mng.initializer_priv) {
+				conf_thread_mng.initializer_priv = priv;
+				if (tmp > MLX5_VDPA_MAX_C_THRD) {
+					DRV_LOG(WARNING,
+				"Invalid max_conf_threads %s "
+				"and set max_conf_threads to %d",
+				val, MLX5_VDPA_MAX_C_THRD);
+					tmp = MLX5_VDPA_MAX_C_THRD;
+				}
+				conf_thread_mng.max_thrds = tmp;
+			} else if (tmp != conf_thread_mng.max_thrds) {
+				DRV_LOG(WARNING,
+	"max_conf_threads is PMD argument and not per device, "
+	"only the first device configuration set it, current value is %d "
+	"and will not be changed to %d.",
+				conf_thread_mng.max_thrds, (int)tmp);
+			}
+		} else {
+			priv->use_c_thread = false;
+		}
 	} else if (strcmp(key, "hw_latency_mode") == 0) {
 		priv->hw_latency_mode = (uint32_t)tmp;
 	} else if (strcmp(key, "hw_max_latency_us") == 0) {
@@ -521,6 +546,9 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 		"hw_max_latency_us",
 		"hw_max_pending_comp",
 		"no_traffic_time",
+		"queue_size",
+		"queues",
+		"max_conf_threads",
 		NULL,
 	};
 
@@ -725,6 +753,13 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 	pthread_mutex_init(&priv->steer_update_lock, NULL);
 	priv->cdev = cdev;
 	mlx5_vdpa_config_get(mkvlist, priv);
+	if (priv->use_c_thread) {
+		if (conf_thread_mng.initializer_priv == priv)
+			if (mlx5_vdpa_mult_threads_create(priv->event_core))
+				goto error;
+		__atomic_fetch_add(&conf_thread_mng.refcnt, 1,
+			__ATOMIC_RELAXED);
+	}
 	if (mlx5_vdpa_create_dev_resources(priv))
 		goto error;
 	priv->vdev = rte_vdpa_register_device(cdev->dev, &mlx5_vdpa_ops);
@@ -739,6 +774,8 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 	pthread_mutex_unlock(&priv_list_lock);
 	return 0;
 error:
+	if (conf_thread_mng.initializer_priv == priv)
+		mlx5_vdpa_mult_threads_destroy(false);
 	if (priv)
 		mlx5_vdpa_dev_release(priv);
 	return -rte_errno;
@@ -806,6 +843,10 @@ mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv)
 	mlx5_vdpa_release_dev_resources(priv);
 	if (priv->vdev)
 		rte_vdpa_unregister_device(priv->vdev);
+	if (priv->use_c_thread)
+		if (__atomic_fetch_sub(&conf_thread_mng.refcnt,
+			1, __ATOMIC_RELAXED) == 1)
+			mlx5_vdpa_mult_threads_destroy(true);
 	rte_free(priv);
 }
 
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 3fd5eefc5e..4e7c2557b7 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -73,6 +73,22 @@ enum {
 	MLX5_VDPA_NOTIFIER_STATE_ERR
 };
 
+#define MLX5_VDPA_MAX_C_THRD 256
+
+/* Generic mlx5_vdpa_c_thread information. */
+struct mlx5_vdpa_c_thread {
+	pthread_t tid;
+};
+
+struct mlx5_vdpa_conf_thread_mng {
+	void *initializer_priv;
+	uint32_t refcnt;
+	uint32_t max_thrds;
+	pthread_mutex_t cthrd_lock;
+	struct mlx5_vdpa_c_thread cthrd[MLX5_VDPA_MAX_C_THRD];
+};
+extern struct mlx5_vdpa_conf_thread_mng conf_thread_mng;
+
 struct mlx5_vdpa_virtq {
 	SLIST_ENTRY(mlx5_vdpa_virtq) next;
 	uint8_t enable;
@@ -126,6 +142,7 @@ enum mlx5_dev_state {
 struct mlx5_vdpa_priv {
 	TAILQ_ENTRY(mlx5_vdpa_priv) next;
 	bool connected;
+	bool use_c_thread;
 	enum mlx5_dev_state state;
 	rte_spinlock_t db_lock;
 	pthread_mutex_t steer_update_lock;
@@ -496,4 +513,23 @@ mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv);
 
 bool
 mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv);
+
+/**
+ * Create configuration multi-threads resource
+ *
+ * @param[in] cpu_core
+ *   CPU core number to set configuration threads affinity to.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int
+mlx5_vdpa_mult_threads_create(int cpu_core);
+
+/**
+ * Destroy configuration multi-threads resource
+ *
+ */
+void
+mlx5_vdpa_mult_threads_destroy(bool need_unlock);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
new file mode 100644
index 0000000000..ba7d8b63b3
--- /dev/null
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -0,0 +1,129 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 NVIDIA Corporation & Affiliates
+ */
+#include <string.h>
+#include <unistd.h>
+#include <sys/eventfd.h>
+
+#include <rte_malloc.h>
+#include <rte_errno.h>
+#include <rte_io.h>
+#include <rte_alarm.h>
+#include <rte_tailq.h>
+#include <rte_ring_elem.h>
+
+#include <mlx5_common.h>
+
+#include "mlx5_vdpa_utils.h"
+#include "mlx5_vdpa.h"
+
+static void *
+mlx5_vdpa_c_thread_handle(void *arg)
+{
+	/* To be added later. */
+	return arg;
+}
+
+static void
+mlx5_vdpa_c_thread_destroy(uint32_t thrd_idx, bool need_unlock)
+{
+	if (conf_thread_mng.cthrd[thrd_idx].tid) {
+		pthread_cancel(conf_thread_mng.cthrd[thrd_idx].tid);
+		pthread_join(conf_thread_mng.cthrd[thrd_idx].tid, NULL);
+		conf_thread_mng.cthrd[thrd_idx].tid = 0;
+		if (need_unlock)
+			pthread_mutex_init(&conf_thread_mng.cthrd_lock, NULL);
+	}
+}
+
+static int
+mlx5_vdpa_c_thread_create(int cpu_core)
+{
+	const struct sched_param sp = {
+		.sched_priority = sched_get_priority_max(SCHED_RR),
+	};
+	rte_cpuset_t cpuset;
+	pthread_attr_t attr;
+	uint32_t thrd_idx;
+	char name[32];
+	int ret;
+
+	pthread_mutex_lock(&conf_thread_mng.cthrd_lock);
+	pthread_attr_init(&attr);
+	ret = pthread_attr_setschedpolicy(&attr, SCHED_RR);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to set thread sched policy = RR.");
+		goto c_thread_err;
+	}
+	ret = pthread_attr_setschedparam(&attr, &sp);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to set thread priority.");
+		goto c_thread_err;
+	}
+	for (thrd_idx = 0; thrd_idx < conf_thread_mng.max_thrds;
+		thrd_idx++) {
+		ret = pthread_create(&conf_thread_mng.cthrd[thrd_idx].tid,
+				&attr, mlx5_vdpa_c_thread_handle,
+				(void *)&conf_thread_mng);
+		if (ret) {
+			DRV_LOG(ERR, "Failed to create vdpa multi-threads %d.",
+					thrd_idx);
+			goto c_thread_err;
+		}
+		CPU_ZERO(&cpuset);
+		if (cpu_core != -1)
+			CPU_SET(cpu_core, &cpuset);
+		else
+			cpuset = rte_lcore_cpuset(rte_get_main_lcore());
+		ret = pthread_setaffinity_np(
+				conf_thread_mng.cthrd[thrd_idx].tid,
+				sizeof(cpuset), &cpuset);
+		if (ret) {
+			DRV_LOG(ERR, "Failed to set thread affinity for "
+			"vdpa multi-threads %d.", thrd_idx);
+			goto c_thread_err;
+		}
+		snprintf(name, sizeof(name), "vDPA-mthread-%d", thrd_idx);
+		ret = pthread_setname_np(
+				conf_thread_mng.cthrd[thrd_idx].tid, name);
+		if (ret)
+			DRV_LOG(ERR, "Failed to set vdpa multi-threads name %s.",
+					name);
+		else
+			DRV_LOG(DEBUG, "Thread name: %s.", name);
+	}
+	pthread_mutex_unlock(&conf_thread_mng.cthrd_lock);
+	return 0;
+c_thread_err:
+	for (thrd_idx = 0; thrd_idx < conf_thread_mng.max_thrds;
+		thrd_idx++)
+		mlx5_vdpa_c_thread_destroy(thrd_idx, false);
+	pthread_mutex_unlock(&conf_thread_mng.cthrd_lock);
+	return -1;
+}
+
+int
+mlx5_vdpa_mult_threads_create(int cpu_core)
+{
+	pthread_mutex_init(&conf_thread_mng.cthrd_lock, NULL);
+	if (mlx5_vdpa_c_thread_create(cpu_core)) {
+		DRV_LOG(ERR, "Cannot create vDPA configuration threads.");
+		mlx5_vdpa_mult_threads_destroy(false);
+		return -1;
+	}
+	return 0;
+}
+
+void
+mlx5_vdpa_mult_threads_destroy(bool need_unlock)
+{
+	uint32_t thrd_idx;
+
+	if (!conf_thread_mng.initializer_priv)
+		return;
+	for (thrd_idx = 0; thrd_idx < conf_thread_mng.max_thrds;
+		thrd_idx++)
+		mlx5_vdpa_c_thread_destroy(thrd_idx, need_unlock);
+	pthread_mutex_destroy(&conf_thread_mng.cthrd_lock);
+	memset(&conf_thread_mng, 0, sizeof(struct mlx5_vdpa_conf_thread_mng));
+}
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index 2b0f5936d1..b45fbac146 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -507,7 +507,7 @@ mlx5_vdpa_cqe_event_setup(struct mlx5_vdpa_priv *priv)
 	pthread_attr_t attr;
 	char name[16];
 	const struct sched_param sp = {
-		.sched_priority = sched_get_priority_max(SCHED_RR),
+		.sched_priority = sched_get_priority_max(SCHED_RR) - 1,
 	};
 
 	if (!priv->eventc)
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 138b7bdbc5..599809b09b 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -43,7 +43,7 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 			    errno == EWOULDBLOCK ||
 			    errno == EAGAIN)
 				continue;
-			DRV_LOG(ERR,  "Failed to read kickfd of virtq %d: %s",
+			DRV_LOG(ERR,  "Failed to read kickfd of virtq %d: %s.",
 				virtq->index, strerror(errno));
 		}
 		break;
@@ -57,7 +57,7 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 	rte_spinlock_unlock(&priv->db_lock);
 	pthread_mutex_unlock(&virtq->virtq_lock);
 	if (priv->state != MLX5_VDPA_STATE_CONFIGURED && !virtq->enable) {
-		DRV_LOG(ERR,  "device %d queue %d down, skip kick handling",
+		DRV_LOG(ERR,  "device %d queue %d down, skip kick handling.",
 			priv->vid, virtq->index);
 		return;
 	}
@@ -218,7 +218,7 @@ mlx5_vdpa_virtq_query(struct mlx5_vdpa_priv *priv, int index)
 		return -1;
 	}
 	if (attr.state == MLX5_VIRTQ_STATE_ERROR)
-		DRV_LOG(WARNING, "vid %d vring %d hw error=%hhu",
+		DRV_LOG(WARNING, "vid %d vring %d hw error=%hhu.",
 			priv->vid, index, attr.error_type);
 	return 0;
 }
@@ -380,7 +380,7 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 	if (ret) {
 		last_avail_idx = 0;
 		last_used_idx = 0;
-		DRV_LOG(WARNING, "Couldn't get vring base, idx are set to 0");
+		DRV_LOG(WARNING, "Couldn't get vring base, idx are set to 0.");
 	} else {
 		DRV_LOG(INFO, "vid %d: Init last_avail_idx=%d, last_used_idx=%d for "
 				"virtq %d.", priv->vid, last_avail_idx,
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v1 09/17] vdpa/mlx5: optimize datapath-control synchronization
  2022-06-06 11:20 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                     ` (15 preceding siblings ...)
  2022-06-06 11:20   ` [PATCH 09/16] vdpa/mlx5: add multi-thread management for configuration Li Zhang
@ 2022-06-06 11:20   ` Li Zhang
  2022-06-06 11:20   ` [PATCH v1 10/17] vdpa/mlx5: add multi-thread management for configuration Li Zhang
                     ` (14 subsequent siblings)
  31 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:20 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

The driver used a single global lock for any synchronization
needed for the datapath and control path.
It is better to group together only the critical sections
that actually need to be synchronized with each other.

Replace the global lock with the following locks:

1. Virtq locks (per virtq) synchronize datapath polling and
   parallel configurations on the same virtq.
2. A doorbell lock synchronizes the doorbell updates; it is
   shared by all the virtqs in the device.
3. A steering lock for updates of the shared steering objects
   (see the locking-order sketch after this list).
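
A condensed, illustrative sketch of that locking order on the kick path,
adapted from the kick-handler hunk below (not a complete function):

        pthread_mutex_lock(&virtq->virtq_lock);        /* 1. per-virtq lock */
        /* ... validate the queue state and read the kick fd ... */
        rte_spinlock_lock(&priv->db_lock);             /* 2. doorbell lock */
        rte_write32(virtq->index, priv->virtq_db_addr);
        rte_spinlock_unlock(&priv->db_lock);
        pthread_mutex_unlock(&virtq->virtq_lock);
        /* 3. The steering lock is taken separately in mlx5_vdpa_steer_update(). */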

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 24 ++++---
 drivers/vdpa/mlx5/mlx5_vdpa.h       | 13 ++--
 drivers/vdpa/mlx5/mlx5_vdpa_event.c | 97 ++++++++++++++++++-----------
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c    | 34 +++++++---
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c |  7 ++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 88 +++++++++++++++++++-------
 6 files changed, 184 insertions(+), 79 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index ee99952e11..e5a11f72fd 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -135,6 +135,7 @@ mlx5_vdpa_set_vring_state(int vid, int vring, int state)
 	struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid);
 	struct mlx5_vdpa_priv *priv =
 		mlx5_vdpa_find_priv_resource_by_vdev(vdev);
+	struct mlx5_vdpa_virtq *virtq;
 	int ret;
 
 	if (priv == NULL) {
@@ -145,9 +146,10 @@ mlx5_vdpa_set_vring_state(int vid, int vring, int state)
 		DRV_LOG(ERR, "Too big vring id: %d.", vring);
 		return -E2BIG;
 	}
-	pthread_mutex_lock(&priv->vq_config_lock);
+	virtq = &priv->virtqs[vring];
+	pthread_mutex_lock(&virtq->virtq_lock);
 	ret = mlx5_vdpa_virtq_enable(priv, vring, state);
-	pthread_mutex_unlock(&priv->vq_config_lock);
+	pthread_mutex_unlock(&virtq->virtq_lock);
 	return ret;
 }
 
@@ -267,7 +269,9 @@ mlx5_vdpa_dev_close(int vid)
 		ret |= mlx5_vdpa_lm_log(priv);
 		priv->state = MLX5_VDPA_STATE_IN_PROGRESS;
 	}
+	pthread_mutex_lock(&priv->steer_update_lock);
 	mlx5_vdpa_steer_unset(priv);
+	pthread_mutex_unlock(&priv->steer_update_lock);
 	mlx5_vdpa_virtqs_release(priv);
 	mlx5_vdpa_drain_cq(priv);
 	if (priv->lm_mr.addr)
@@ -276,8 +280,6 @@ mlx5_vdpa_dev_close(int vid)
 	if (!priv->connected)
 		mlx5_vdpa_dev_cache_clean(priv);
 	priv->vid = 0;
-	/* The mutex may stay locked after event thread cancel - initiate it. */
-	pthread_mutex_init(&priv->vq_config_lock, NULL);
 	DRV_LOG(INFO, "vDPA device %d was closed.", vid);
 	return ret;
 }
@@ -549,15 +551,21 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 static int
 mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_virtq *virtq;
 	uint32_t index;
 	uint32_t i;
 
+	for (index = 0; index < priv->caps.max_num_virtio_queues * 2;
+		index++) {
+		virtq = &priv->virtqs[index];
+		pthread_mutex_init(&virtq->virtq_lock, NULL);
+	}
 	if (!priv->queues)
 		return 0;
 	for (index = 0; index < (priv->queues * 2); ++index) {
-		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
+		virtq = &priv->virtqs[index];
 		int ret = mlx5_vdpa_event_qp_prepare(priv, priv->queue_size,
-					-1, &virtq->eqp);
+					-1, virtq);
 
 		if (ret) {
 			DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
@@ -713,7 +721,8 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 	priv->num_lag_ports = attr->num_lag_ports;
 	if (attr->num_lag_ports == 0)
 		priv->num_lag_ports = 1;
-	pthread_mutex_init(&priv->vq_config_lock, NULL);
+	rte_spinlock_init(&priv->db_lock);
+	pthread_mutex_init(&priv->steer_update_lock, NULL);
 	priv->cdev = cdev;
 	mlx5_vdpa_config_get(mkvlist, priv);
 	if (mlx5_vdpa_create_dev_resources(priv))
@@ -797,7 +806,6 @@ mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv)
 	mlx5_vdpa_release_dev_resources(priv);
 	if (priv->vdev)
 		rte_vdpa_unregister_device(priv->vdev);
-	pthread_mutex_destroy(&priv->vq_config_lock);
 	rte_free(priv);
 }
 
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index e5553079fe..3fd5eefc5e 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -82,6 +82,7 @@ struct mlx5_vdpa_virtq {
 	bool stopped;
 	uint32_t configured:1;
 	uint32_t version;
+	pthread_mutex_t virtq_lock;
 	struct mlx5_vdpa_priv *priv;
 	struct mlx5_devx_obj *virtq;
 	struct mlx5_devx_obj *counters;
@@ -126,7 +127,8 @@ struct mlx5_vdpa_priv {
 	TAILQ_ENTRY(mlx5_vdpa_priv) next;
 	bool connected;
 	enum mlx5_dev_state state;
-	pthread_mutex_t vq_config_lock;
+	rte_spinlock_t db_lock;
+	pthread_mutex_t steer_update_lock;
 	uint64_t no_traffic_counter;
 	pthread_t timer_tid;
 	int event_mode;
@@ -222,14 +224,15 @@ int mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv);
  *   Number of descriptors.
  * @param[in] callfd
  *   The guest notification file descriptor.
- * @param[in/out] eqp
- *   Pointer to the event QP structure.
+ * @param[in/out] virtq
+ *   Pointer to the virt-queue structure.
  *
  * @return
  *   0 on success, -1 otherwise and rte_errno is set.
  */
-int mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
-			      int callfd, struct mlx5_vdpa_event_qp *eqp);
+int
+mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
+	int callfd, struct mlx5_vdpa_virtq *virtq);
 
 /**
  * Destroy an event QP and all its related resources.
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index b43dca9255..2b0f5936d1 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -85,12 +85,13 @@ mlx5_vdpa_cq_arm(struct mlx5_vdpa_priv *priv, struct mlx5_vdpa_cq *cq)
 
 static int
 mlx5_vdpa_cq_create(struct mlx5_vdpa_priv *priv, uint16_t log_desc_n,
-		    int callfd, struct mlx5_vdpa_cq *cq)
+		int callfd, struct mlx5_vdpa_virtq *virtq)
 {
 	struct mlx5_devx_cq_attr attr = {
 		.use_first_only = 1,
 		.uar_page_id = mlx5_os_get_devx_uar_page_id(priv->uar.obj),
 	};
+	struct mlx5_vdpa_cq *cq = &virtq->eqp.cq;
 	uint16_t event_nums[1] = {0};
 	int ret;
 
@@ -102,10 +103,11 @@ mlx5_vdpa_cq_create(struct mlx5_vdpa_priv *priv, uint16_t log_desc_n,
 	cq->log_desc_n = log_desc_n;
 	rte_spinlock_init(&cq->sl);
 	/* Subscribe CQ event to the event channel controlled by the driver. */
-	ret = mlx5_os_devx_subscribe_devx_event(priv->eventc,
-						cq->cq_obj.cq->obj,
-						sizeof(event_nums), event_nums,
-						(uint64_t)(uintptr_t)cq);
+	ret = mlx5_glue->devx_subscribe_devx_event(priv->eventc,
+							cq->cq_obj.cq->obj,
+						   sizeof(event_nums),
+						   event_nums,
+						   (uint64_t)(uintptr_t)virtq);
 	if (ret) {
 		DRV_LOG(ERR, "Failed to subscribe CQE event.");
 		rte_errno = errno;
@@ -167,13 +169,17 @@ mlx5_vdpa_cq_poll(struct mlx5_vdpa_cq *cq)
 static void
 mlx5_vdpa_arm_all_cqs(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_virtq *virtq;
 	struct mlx5_vdpa_cq *cq;
 	int i;
 
 	for (i = 0; i < priv->nr_virtqs; i++) {
+		virtq = &priv->virtqs[i];
+		pthread_mutex_lock(&virtq->virtq_lock);
 		cq = &priv->virtqs[i].eqp.cq;
 		if (cq->cq_obj.cq && !cq->armed)
 			mlx5_vdpa_cq_arm(priv, cq);
+		pthread_mutex_unlock(&virtq->virtq_lock);
 	}
 }
 
@@ -220,13 +226,18 @@ mlx5_vdpa_queue_complete(struct mlx5_vdpa_cq *cq)
 static uint32_t
 mlx5_vdpa_queues_complete(struct mlx5_vdpa_priv *priv)
 {
-	int i;
+	struct mlx5_vdpa_virtq *virtq;
+	struct mlx5_vdpa_cq *cq;
 	uint32_t max = 0;
+	uint32_t comp;
+	int i;
 
 	for (i = 0; i < priv->nr_virtqs; i++) {
-		struct mlx5_vdpa_cq *cq = &priv->virtqs[i].eqp.cq;
-		uint32_t comp = mlx5_vdpa_queue_complete(cq);
-
+		virtq = &priv->virtqs[i];
+		pthread_mutex_lock(&virtq->virtq_lock);
+		cq = &virtq->eqp.cq;
+		comp = mlx5_vdpa_queue_complete(cq);
+		pthread_mutex_unlock(&virtq->virtq_lock);
 		if (comp > max)
 			max = comp;
 	}
@@ -253,7 +264,7 @@ mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv)
 }
 
 /* Wait on all CQs channel for completion event. */
-static struct mlx5_vdpa_cq *
+static struct mlx5_vdpa_virtq *
 mlx5_vdpa_event_wait(struct mlx5_vdpa_priv *priv __rte_unused)
 {
 #ifdef HAVE_IBV_DEVX_EVENT
@@ -265,7 +276,8 @@ mlx5_vdpa_event_wait(struct mlx5_vdpa_priv *priv __rte_unused)
 					    sizeof(out.buf));
 
 	if (ret >= 0)
-		return (struct mlx5_vdpa_cq *)(uintptr_t)out.event_resp.cookie;
+		return (struct mlx5_vdpa_virtq *)
+				(uintptr_t)out.event_resp.cookie;
 	DRV_LOG(INFO, "Got error in devx_get_event, ret = %d, errno = %d.",
 		ret, errno);
 #endif
@@ -276,7 +288,7 @@ static void *
 mlx5_vdpa_event_handle(void *arg)
 {
 	struct mlx5_vdpa_priv *priv = arg;
-	struct mlx5_vdpa_cq *cq;
+	struct mlx5_vdpa_virtq *virtq;
 	uint32_t max;
 
 	switch (priv->event_mode) {
@@ -284,7 +296,6 @@ mlx5_vdpa_event_handle(void *arg)
 	case MLX5_VDPA_EVENT_MODE_FIXED_TIMER:
 		priv->timer_delay_us = priv->event_us;
 		while (1) {
-			pthread_mutex_lock(&priv->vq_config_lock);
 			max = mlx5_vdpa_queues_complete(priv);
 			if (max == 0 && priv->no_traffic_counter++ >=
 			    priv->no_traffic_max) {
@@ -292,32 +303,37 @@ mlx5_vdpa_event_handle(void *arg)
 					priv->vdev->device->name);
 				mlx5_vdpa_arm_all_cqs(priv);
 				do {
-					pthread_mutex_unlock
-							(&priv->vq_config_lock);
-					cq = mlx5_vdpa_event_wait(priv);
-					pthread_mutex_lock
-							(&priv->vq_config_lock);
-					if (cq == NULL ||
-					       mlx5_vdpa_queue_complete(cq) > 0)
+					virtq = mlx5_vdpa_event_wait(priv);
+					if (virtq == NULL)
 						break;
+					pthread_mutex_lock(
+						&virtq->virtq_lock);
+					if (mlx5_vdpa_queue_complete(
+						&virtq->eqp.cq) > 0) {
+						pthread_mutex_unlock(
+							&virtq->virtq_lock);
+						break;
+					}
+					pthread_mutex_unlock(
+						&virtq->virtq_lock);
 				} while (1);
 				priv->timer_delay_us = priv->event_us;
 				priv->no_traffic_counter = 0;
 			} else if (max != 0) {
 				priv->no_traffic_counter = 0;
 			}
-			pthread_mutex_unlock(&priv->vq_config_lock);
 			mlx5_vdpa_timer_sleep(priv, max);
 		}
 		return NULL;
 	case MLX5_VDPA_EVENT_MODE_ONLY_INTERRUPT:
 		do {
-			cq = mlx5_vdpa_event_wait(priv);
-			if (cq != NULL) {
-				pthread_mutex_lock(&priv->vq_config_lock);
-				if (mlx5_vdpa_queue_complete(cq) > 0)
-					mlx5_vdpa_cq_arm(priv, cq);
-				pthread_mutex_unlock(&priv->vq_config_lock);
+			virtq = mlx5_vdpa_event_wait(priv);
+			if (virtq != NULL) {
+				pthread_mutex_lock(&virtq->virtq_lock);
+				if (mlx5_vdpa_queue_complete(
+					&virtq->eqp.cq) > 0)
+					mlx5_vdpa_cq_arm(priv, &virtq->eqp.cq);
+				pthread_mutex_unlock(&virtq->virtq_lock);
 			}
 		} while (1);
 		return NULL;
@@ -339,7 +355,6 @@ mlx5_vdpa_err_interrupt_handler(void *cb_arg __rte_unused)
 	struct mlx5_vdpa_virtq *virtq;
 	uint64_t sec;
 
-	pthread_mutex_lock(&priv->vq_config_lock);
 	while (mlx5_glue->devx_get_event(priv->err_chnl, &out.event_resp,
 					 sizeof(out.buf)) >=
 				       (ssize_t)sizeof(out.event_resp.cookie)) {
@@ -351,10 +366,11 @@ mlx5_vdpa_err_interrupt_handler(void *cb_arg __rte_unused)
 			continue;
 		}
 		virtq = &priv->virtqs[vq_index];
+		pthread_mutex_lock(&virtq->virtq_lock);
 		if (!virtq->enable || virtq->version != version)
-			continue;
+			goto unlock;
 		if (rte_rdtsc() / rte_get_tsc_hz() < MLX5_VDPA_ERROR_TIME_SEC)
-			continue;
+			goto unlock;
 		virtq->stopped = true;
 		/* Query error info. */
 		if (mlx5_vdpa_virtq_query(priv, vq_index))
@@ -384,8 +400,9 @@ mlx5_vdpa_err_interrupt_handler(void *cb_arg __rte_unused)
 		for (i = 1; i < RTE_DIM(virtq->err_time); i++)
 			virtq->err_time[i - 1] = virtq->err_time[i];
 		virtq->err_time[RTE_DIM(virtq->err_time) - 1] = rte_rdtsc();
+unlock:
+		pthread_mutex_unlock(&virtq->virtq_lock);
 	}
-	pthread_mutex_unlock(&priv->vq_config_lock);
 #endif
 }
 
@@ -533,11 +550,18 @@ mlx5_vdpa_cqe_event_setup(struct mlx5_vdpa_priv *priv)
 void
 mlx5_vdpa_cqe_event_unset(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_virtq *virtq;
 	void *status;
+	int i;
 
 	if (priv->timer_tid) {
 		pthread_cancel(priv->timer_tid);
 		pthread_join(priv->timer_tid, &status);
+		/* The mutex may stay locked after event thread cancel, initiate it. */
+		for (i = 0; i < priv->nr_virtqs; i++) {
+			virtq = &priv->virtqs[i];
+			pthread_mutex_init(&virtq->virtq_lock, NULL);
+		}
 	}
 	priv->timer_tid = 0;
 }
@@ -614,8 +638,9 @@ mlx5_vdpa_qps2rst2rts(struct mlx5_vdpa_event_qp *eqp)
 
 int
 mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
-			  int callfd, struct mlx5_vdpa_event_qp *eqp)
+	int callfd, struct mlx5_vdpa_virtq *virtq)
 {
+	struct mlx5_vdpa_event_qp *eqp = &virtq->eqp;
 	struct mlx5_devx_qp_attr attr = {0};
 	uint16_t log_desc_n = rte_log2_u32(desc_n);
 	uint32_t ret;
@@ -632,7 +657,8 @@ mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 	}
 	if (eqp->fw_qp)
 		mlx5_vdpa_event_qp_destroy(eqp);
-	if (mlx5_vdpa_cq_create(priv, log_desc_n, callfd, &eqp->cq))
+	if (mlx5_vdpa_cq_create(priv, log_desc_n, callfd, virtq) ||
+		!eqp->cq.cq_obj.cq)
 		return -1;
 	attr.pd = priv->cdev->pdn;
 	attr.ts_format =
@@ -650,8 +676,8 @@ mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 	attr.ts_format =
 		mlx5_ts_format_conv(priv->cdev->config.hca_attr.qp_ts_format);
 	ret = mlx5_devx_qp_create(priv->cdev->ctx, &(eqp->sw_qp),
-					attr.num_of_receive_wqes *
-					MLX5_WSEG_SIZE, &attr, SOCKET_ID_ANY);
+				  attr.num_of_receive_wqes * MLX5_WSEG_SIZE,
+				  &attr, SOCKET_ID_ANY);
 	if (ret) {
 		DRV_LOG(ERR, "Failed to create SW QP(%u).", rte_errno);
 		goto error;
@@ -668,3 +694,4 @@ mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 	mlx5_vdpa_event_qp_destroy(eqp);
 	return -1;
 }
+
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
index a8faf0c116..efebf364d0 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
@@ -25,11 +25,18 @@ mlx5_vdpa_logging_enable(struct mlx5_vdpa_priv *priv, int enable)
 		if (!virtq->configured) {
 			DRV_LOG(DEBUG, "virtq %d is invalid for dirty bitmap "
 				"enabling.", i);
-		} else if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
+		} else {
+			struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
+
+			pthread_mutex_lock(&virtq->virtq_lock);
+			if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
 			   &attr)) {
-			DRV_LOG(ERR, "Failed to modify virtq %d for dirty "
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				DRV_LOG(ERR, "Failed to modify virtq %d for dirty "
 				"bitmap enabling.", i);
-			return -1;
+				return -1;
+			}
+			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
 	}
 	return 0;
@@ -61,10 +68,19 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
 		virtq = &priv->virtqs[i];
 		if (!virtq->configured) {
 			DRV_LOG(DEBUG, "virtq %d is invalid for LM.", i);
-		} else if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
-						      &attr)) {
-			DRV_LOG(ERR, "Failed to modify virtq %d for LM.", i);
-			goto err;
+		} else {
+			struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
+
+			pthread_mutex_lock(&virtq->virtq_lock);
+			if (mlx5_devx_cmd_modify_virtq(
+					priv->virtqs[i].virtq,
+					&attr)) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				DRV_LOG(ERR,
+				"Failed to modify virtq %d for LM.", i);
+				goto err;
+			}
+			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
 	}
 	return 0;
@@ -79,6 +95,7 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
 int
 mlx5_vdpa_lm_log(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_virtq *virtq;
 	uint64_t features;
 	int ret = rte_vhost_get_negotiated_features(priv->vid, &features);
 	int i;
@@ -90,10 +107,13 @@ mlx5_vdpa_lm_log(struct mlx5_vdpa_priv *priv)
 	if (!RTE_VHOST_NEED_LOG(features))
 		return 0;
 	for (i = 0; i < priv->nr_virtqs; ++i) {
+		virtq = &priv->virtqs[i];
 		if (!priv->virtqs[i].virtq) {
 			DRV_LOG(DEBUG, "virtq %d is invalid for LM log.", i);
 		} else {
+			pthread_mutex_lock(&virtq->virtq_lock);
 			ret = mlx5_vdpa_virtq_stop(priv, i);
+			pthread_mutex_unlock(&virtq->virtq_lock);
 			if (ret) {
 				DRV_LOG(ERR, "Failed to stop virtq %d for LM "
 					"log.", i);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
index d4b4375c88..4cbf09784e 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
@@ -237,19 +237,24 @@ mlx5_vdpa_rss_flows_create(struct mlx5_vdpa_priv *priv)
 int
 mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv)
 {
-	int ret = mlx5_vdpa_rqt_prepare(priv);
+	int ret;
 
+	pthread_mutex_lock(&priv->steer_update_lock);
+	ret = mlx5_vdpa_rqt_prepare(priv);
 	if (ret == 0) {
 		mlx5_vdpa_steer_unset(priv);
 	} else if (ret < 0) {
+		pthread_mutex_unlock(&priv->steer_update_lock);
 		return ret;
 	} else if (!priv->steer.rss[0].flow) {
 		ret = mlx5_vdpa_rss_flows_create(priv);
 		if (ret) {
 			DRV_LOG(ERR, "Cannot create RSS flows.");
+			pthread_mutex_unlock(&priv->steer_update_lock);
 			return -1;
 		}
 	}
+	pthread_mutex_unlock(&priv->steer_update_lock);
 	return 0;
 }
 
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 55cbc9fad2..138b7bdbc5 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -24,13 +24,17 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 	int nbytes;
 	int retry;
 
+	pthread_mutex_lock(&virtq->virtq_lock);
 	if (priv->state != MLX5_VDPA_STATE_CONFIGURED && !virtq->enable) {
+		pthread_mutex_unlock(&virtq->virtq_lock);
 		DRV_LOG(ERR,  "device %d queue %d down, skip kick handling",
 			priv->vid, virtq->index);
 		return;
 	}
-	if (rte_intr_fd_get(virtq->intr_handle) < 0)
+	if (rte_intr_fd_get(virtq->intr_handle) < 0) {
+		pthread_mutex_unlock(&virtq->virtq_lock);
 		return;
+	}
 	for (retry = 0; retry < 3; ++retry) {
 		nbytes = read(rte_intr_fd_get(virtq->intr_handle), &buf,
 			      8);
@@ -44,9 +48,14 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 		}
 		break;
 	}
-	if (nbytes < 0)
+	if (nbytes < 0) {
+		pthread_mutex_unlock(&virtq->virtq_lock);
 		return;
+	}
+	rte_spinlock_lock(&priv->db_lock);
 	rte_write32(virtq->index, priv->virtq_db_addr);
+	rte_spinlock_unlock(&priv->db_lock);
+	pthread_mutex_unlock(&virtq->virtq_lock);
 	if (priv->state != MLX5_VDPA_STATE_CONFIGURED && !virtq->enable) {
 		DRV_LOG(ERR,  "device %d queue %d down, skip kick handling",
 			priv->vid, virtq->index);
@@ -66,6 +75,33 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 	DRV_LOG(DEBUG, "Ring virtq %u doorbell.", virtq->index);
 }
 
+/* Virtq must be locked before calling this function. */
+static void
+mlx5_vdpa_virtq_unregister_intr_handle(struct mlx5_vdpa_virtq *virtq)
+{
+	int ret = -EAGAIN;
+
+	if (!virtq->intr_handle)
+		return;
+	if (rte_intr_fd_get(virtq->intr_handle) >= 0) {
+		while (ret == -EAGAIN) {
+			ret = rte_intr_callback_unregister(virtq->intr_handle,
+					mlx5_vdpa_virtq_kick_handler, virtq);
+			if (ret == -EAGAIN) {
+				DRV_LOG(DEBUG, "Try again to unregister fd %d of virtq %hu interrupt",
+					rte_intr_fd_get(virtq->intr_handle),
+					virtq->index);
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				usleep(MLX5_VDPA_INTR_RETRIES_USEC);
+				pthread_mutex_lock(&virtq->virtq_lock);
+			}
+		}
+		(void)rte_intr_fd_set(virtq->intr_handle, -1);
+	}
+	rte_intr_instance_free(virtq->intr_handle);
+	virtq->intr_handle = NULL;
+}
+
 /* Release cached VQ resources. */
 void
 mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
@@ -75,6 +111,7 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
+		pthread_mutex_lock(&virtq->virtq_lock);
 		virtq->configured = 0;
 		for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
 			if (virtq->umems[j].obj) {
@@ -90,28 +127,17 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 		}
 		if (virtq->eqp.fw_qp)
 			mlx5_vdpa_event_qp_destroy(&virtq->eqp);
+		pthread_mutex_unlock(&virtq->virtq_lock);
 	}
 }
 
+
 static int
 mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 {
 	int ret = -EAGAIN;
 
-	if (rte_intr_fd_get(virtq->intr_handle) >= 0) {
-		while (ret == -EAGAIN) {
-			ret = rte_intr_callback_unregister(virtq->intr_handle,
-					mlx5_vdpa_virtq_kick_handler, virtq);
-			if (ret == -EAGAIN) {
-				DRV_LOG(DEBUG, "Try again to unregister fd %d of virtq %hu interrupt",
-					rte_intr_fd_get(virtq->intr_handle),
-					virtq->index);
-				usleep(MLX5_VDPA_INTR_RETRIES_USEC);
-			}
-		}
-		rte_intr_fd_set(virtq->intr_handle, -1);
-	}
-	rte_intr_instance_free(virtq->intr_handle);
+	mlx5_vdpa_virtq_unregister_intr_handle(virtq);
 	if (virtq->configured) {
 		ret = mlx5_vdpa_virtq_stop(virtq->priv, virtq->index);
 		if (ret)
@@ -128,10 +154,15 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 void
 mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_virtq *virtq;
 	int i;
 
-	for (i = 0; i < priv->nr_virtqs; i++)
-		mlx5_vdpa_virtq_unset(&priv->virtqs[i]);
+	for (i = 0; i < priv->nr_virtqs; i++) {
+		virtq = &priv->virtqs[i];
+		pthread_mutex_lock(&virtq->virtq_lock);
+		mlx5_vdpa_virtq_unset(virtq);
+		pthread_mutex_unlock(&virtq->virtq_lock);
+	}
 	priv->features = 0;
 	priv->nr_virtqs = 0;
 }
@@ -250,7 +281,7 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 	MLX5_VIRTQ_EVENT_MODE_QP : MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
 	if (attr->event_mode == MLX5_VIRTQ_EVENT_MODE_QP) {
 		ret = mlx5_vdpa_event_qp_prepare(priv,
-				vq->size, vq->callfd, &virtq->eqp);
+				vq->size, vq->callfd, virtq);
 		if (ret) {
 			DRV_LOG(ERR,
 				"Failed to create event QPs for virtq %d.",
@@ -420,7 +451,9 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 	}
 	claim_zero(rte_vhost_enable_guest_notification(priv->vid, index, 1));
 	virtq->configured = 1;
+	rte_spinlock_lock(&priv->db_lock);
 	rte_write32(virtq->index, priv->virtq_db_addr);
+	rte_spinlock_unlock(&priv->db_lock);
 	/* Setup doorbell mapping. */
 	virtq->intr_handle =
 		rte_intr_instance_alloc(RTE_INTR_INSTANCE_F_SHARED);
@@ -441,7 +474,7 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 		if (rte_intr_callback_register(virtq->intr_handle,
 					       mlx5_vdpa_virtq_kick_handler,
 					       virtq)) {
-			rte_intr_fd_set(virtq->intr_handle, -1);
+			(void)rte_intr_fd_set(virtq->intr_handle, -1);
 			DRV_LOG(ERR, "Failed to register virtq %d interrupt.",
 				index);
 			goto error;
@@ -537,6 +570,7 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 	uint32_t i;
 	uint16_t nr_vring = rte_vhost_get_vring_num(priv->vid);
 	int ret = rte_vhost_get_negotiated_features(priv->vid, &priv->features);
+	struct mlx5_vdpa_virtq *virtq;
 
 	if (ret || mlx5_vdpa_features_validate(priv)) {
 		DRV_LOG(ERR, "Failed to configure negotiated features.");
@@ -556,9 +590,17 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 		return -1;
 	}
 	priv->nr_virtqs = nr_vring;
-	for (i = 0; i < nr_vring; i++)
-		if (priv->virtqs[i].enable && mlx5_vdpa_virtq_setup(priv, i))
-			goto error;
+	for (i = 0; i < nr_vring; i++) {
+		virtq = &priv->virtqs[i];
+		if (virtq->enable) {
+			pthread_mutex_lock(&virtq->virtq_lock);
+			if (mlx5_vdpa_virtq_setup(priv, i)) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				goto error;
+			}
+			pthread_mutex_unlock(&virtq->virtq_lock);
+		}
+	}
 	return 0;
 error:
 	mlx5_vdpa_virtqs_release(priv);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v1 10/17] vdpa/mlx5: add multi-thread management for configuration
  2022-06-06 11:20 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                     ` (16 preceding siblings ...)
  2022-06-06 11:20   ` [PATCH v1 09/17] vdpa/mlx5: optimize datapath-control synchronization Li Zhang
@ 2022-06-06 11:20   ` Li Zhang
  2022-06-06 11:20   ` [PATCH 10/16] vdpa/mlx5: add task ring for MT management Li Zhang
                     ` (13 subsequent siblings)
  31 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:20 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

The live migration (LM) process includes many object creations and
destructions on the source and the destination servers.
The longer the LM takes, the more packets the VM drops.
To shorten the LM time, the mlx5 FW configurations need to be done in
parallel. Add internal multi-thread management in the driver for this.

A new devarg defines the number of threads; their CPU affinity is taken
from the event_core devarg. The management is shared between all the
devices of the driver. Since the event_core also serves the datapath
event thread, reduce the priority of the datapath event thread to allow
fast configuration of the devices doing the LM.
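
A condensed sketch of the shared-pool lifecycle described above, taken from
the probe/release hunks below with error handling trimmed: the first probed
device creates the pool, a reference count tracks its users, and the last
released device destroys it.

        /* Probe path. */
        if (priv->use_c_thread) {
                if (conf_thread_mng.initializer_priv == priv &&
                    mlx5_vdpa_mult_threads_create(priv->event_core))
                        goto error;
                __atomic_fetch_add(&conf_thread_mng.refcnt, 1, __ATOMIC_RELAXED);
        }

        /* Release path. */
        if (priv->use_c_thread &&
            __atomic_fetch_sub(&conf_thread_mng.refcnt, 1, __ATOMIC_RELAXED) == 1)
                mlx5_vdpa_mult_threads_destroy(true);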

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 doc/guides/vdpadevs/mlx5.rst          |  11 +++
 drivers/vdpa/mlx5/meson.build         |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.c         |  41 ++++++++
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  36 +++++++
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 129 ++++++++++++++++++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_event.c   |   2 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   |   8 +-
 7 files changed, 223 insertions(+), 5 deletions(-)
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c

diff --git a/doc/guides/vdpadevs/mlx5.rst b/doc/guides/vdpadevs/mlx5.rst
index 0ad77bf535..b75a01688d 100644
--- a/doc/guides/vdpadevs/mlx5.rst
+++ b/doc/guides/vdpadevs/mlx5.rst
@@ -78,6 +78,17 @@ for an additional list of options shared with other mlx5 drivers.
   CPU core number to set polling thread affinity to, default to control plane
   cpu.
 
+- ``max_conf_threads`` parameter [int]
+
+  Allow the driver to use internal threads to obtain fast configuration.
+  All the threads will be opened on the same core as the event completion queue scheduling thread.
+
+  - 0, default, don't use internal threads for configuration.
+
+  - 1 - 256, number of internal threads in addition to the caller thread (8 is suggested).
+    This value, if not 0, should be the same for all the devices;
+    the first probe will take it, together with the event_core, for all the multi-thread configurations in the driver.
+
 - ``hw_latency_mode`` parameter [int]
 
   The completion queue moderation mode:
diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build
index 0fa82ad257..9d8dbb1a82 100644
--- a/drivers/vdpa/mlx5/meson.build
+++ b/drivers/vdpa/mlx5/meson.build
@@ -15,6 +15,7 @@ sources = files(
         'mlx5_vdpa_virtq.c',
         'mlx5_vdpa_steer.c',
         'mlx5_vdpa_lm.c',
+        'mlx5_vdpa_cthread.c',
 )
 cflags_options = [
         '-std=c11',
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index e5a11f72fd..a9d023ed08 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -50,6 +50,8 @@ TAILQ_HEAD(mlx5_vdpa_privs, mlx5_vdpa_priv) priv_list =
 					      TAILQ_HEAD_INITIALIZER(priv_list);
 static pthread_mutex_t priv_list_lock = PTHREAD_MUTEX_INITIALIZER;
 
+struct mlx5_vdpa_conf_thread_mng conf_thread_mng;
+
 static void mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv);
 
 static struct mlx5_vdpa_priv *
@@ -493,6 +495,29 @@ mlx5_vdpa_args_check_handler(const char *key, const char *val, void *opaque)
 			DRV_LOG(WARNING, "Invalid event_core %s.", val);
 		else
 			priv->event_core = tmp;
+	} else if (strcmp(key, "max_conf_threads") == 0) {
+		if (tmp) {
+			priv->use_c_thread = true;
+			if (!conf_thread_mng.initializer_priv) {
+				conf_thread_mng.initializer_priv = priv;
+				if (tmp > MLX5_VDPA_MAX_C_THRD) {
+					DRV_LOG(WARNING,
+				"Invalid max_conf_threads %s "
+				"and set max_conf_threads to %d",
+				val, MLX5_VDPA_MAX_C_THRD);
+					tmp = MLX5_VDPA_MAX_C_THRD;
+				}
+				conf_thread_mng.max_thrds = tmp;
+			} else if (tmp != conf_thread_mng.max_thrds) {
+				DRV_LOG(WARNING,
+	"max_conf_threads is PMD argument and not per device, "
+	"only the first device configuration set it, current value is %d "
+	"and will not be changed to %d.",
+				conf_thread_mng.max_thrds, (int)tmp);
+			}
+		} else {
+			priv->use_c_thread = false;
+		}
 	} else if (strcmp(key, "hw_latency_mode") == 0) {
 		priv->hw_latency_mode = (uint32_t)tmp;
 	} else if (strcmp(key, "hw_max_latency_us") == 0) {
@@ -521,6 +546,9 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 		"hw_max_latency_us",
 		"hw_max_pending_comp",
 		"no_traffic_time",
+		"queue_size",
+		"queues",
+		"max_conf_threads",
 		NULL,
 	};
 
@@ -725,6 +753,13 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 	pthread_mutex_init(&priv->steer_update_lock, NULL);
 	priv->cdev = cdev;
 	mlx5_vdpa_config_get(mkvlist, priv);
+	if (priv->use_c_thread) {
+		if (conf_thread_mng.initializer_priv == priv)
+			if (mlx5_vdpa_mult_threads_create(priv->event_core))
+				goto error;
+		__atomic_fetch_add(&conf_thread_mng.refcnt, 1,
+			__ATOMIC_RELAXED);
+	}
 	if (mlx5_vdpa_create_dev_resources(priv))
 		goto error;
 	priv->vdev = rte_vdpa_register_device(cdev->dev, &mlx5_vdpa_ops);
@@ -739,6 +774,8 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 	pthread_mutex_unlock(&priv_list_lock);
 	return 0;
 error:
+	if (conf_thread_mng.initializer_priv == priv)
+		mlx5_vdpa_mult_threads_destroy(false);
 	if (priv)
 		mlx5_vdpa_dev_release(priv);
 	return -rte_errno;
@@ -806,6 +843,10 @@ mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv)
 	mlx5_vdpa_release_dev_resources(priv);
 	if (priv->vdev)
 		rte_vdpa_unregister_device(priv->vdev);
+	if (priv->use_c_thread)
+		if (__atomic_fetch_sub(&conf_thread_mng.refcnt,
+			1, __ATOMIC_RELAXED) == 1)
+			mlx5_vdpa_mult_threads_destroy(true);
 	rte_free(priv);
 }
 
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 3fd5eefc5e..4e7c2557b7 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -73,6 +73,22 @@ enum {
 	MLX5_VDPA_NOTIFIER_STATE_ERR
 };
 
+#define MLX5_VDPA_MAX_C_THRD 256
+
+/* Generic mlx5_vdpa_c_thread information. */
+struct mlx5_vdpa_c_thread {
+	pthread_t tid;
+};
+
+struct mlx5_vdpa_conf_thread_mng {
+	void *initializer_priv;
+	uint32_t refcnt;
+	uint32_t max_thrds;
+	pthread_mutex_t cthrd_lock;
+	struct mlx5_vdpa_c_thread cthrd[MLX5_VDPA_MAX_C_THRD];
+};
+extern struct mlx5_vdpa_conf_thread_mng conf_thread_mng;
+
 struct mlx5_vdpa_virtq {
 	SLIST_ENTRY(mlx5_vdpa_virtq) next;
 	uint8_t enable;
@@ -126,6 +142,7 @@ enum mlx5_dev_state {
 struct mlx5_vdpa_priv {
 	TAILQ_ENTRY(mlx5_vdpa_priv) next;
 	bool connected;
+	bool use_c_thread;
 	enum mlx5_dev_state state;
 	rte_spinlock_t db_lock;
 	pthread_mutex_t steer_update_lock;
@@ -496,4 +513,23 @@ mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv);
 
 bool
 mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv);
+
+/**
+ * Create configuration multi-threads resource
+ *
+ * @param[in] cpu_core
+ *   CPU core number to set configuration threads affinity to.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int
+mlx5_vdpa_mult_threads_create(int cpu_core);
+
+/**
+ * Destroy configuration multi-threads resource
+ *
+ */
+void
+mlx5_vdpa_mult_threads_destroy(bool need_unlock);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
new file mode 100644
index 0000000000..ba7d8b63b3
--- /dev/null
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -0,0 +1,129 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 NVIDIA Corporation & Affiliates
+ */
+#include <string.h>
+#include <unistd.h>
+#include <sys/eventfd.h>
+
+#include <rte_malloc.h>
+#include <rte_errno.h>
+#include <rte_io.h>
+#include <rte_alarm.h>
+#include <rte_tailq.h>
+#include <rte_ring_elem.h>
+
+#include <mlx5_common.h>
+
+#include "mlx5_vdpa_utils.h"
+#include "mlx5_vdpa.h"
+
+static void *
+mlx5_vdpa_c_thread_handle(void *arg)
+{
+	/* To be added later. */
+	return arg;
+}
+
+static void
+mlx5_vdpa_c_thread_destroy(uint32_t thrd_idx, bool need_unlock)
+{
+	if (conf_thread_mng.cthrd[thrd_idx].tid) {
+		pthread_cancel(conf_thread_mng.cthrd[thrd_idx].tid);
+		pthread_join(conf_thread_mng.cthrd[thrd_idx].tid, NULL);
+		conf_thread_mng.cthrd[thrd_idx].tid = 0;
+		if (need_unlock)
+			pthread_mutex_init(&conf_thread_mng.cthrd_lock, NULL);
+	}
+}
+
+static int
+mlx5_vdpa_c_thread_create(int cpu_core)
+{
+	const struct sched_param sp = {
+		.sched_priority = sched_get_priority_max(SCHED_RR),
+	};
+	rte_cpuset_t cpuset;
+	pthread_attr_t attr;
+	uint32_t thrd_idx;
+	char name[32];
+	int ret;
+
+	pthread_mutex_lock(&conf_thread_mng.cthrd_lock);
+	pthread_attr_init(&attr);
+	ret = pthread_attr_setschedpolicy(&attr, SCHED_RR);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to set thread sched policy = RR.");
+		goto c_thread_err;
+	}
+	ret = pthread_attr_setschedparam(&attr, &sp);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to set thread priority.");
+		goto c_thread_err;
+	}
+	for (thrd_idx = 0; thrd_idx < conf_thread_mng.max_thrds;
+		thrd_idx++) {
+		ret = pthread_create(&conf_thread_mng.cthrd[thrd_idx].tid,
+				&attr, mlx5_vdpa_c_thread_handle,
+				(void *)&conf_thread_mng);
+		if (ret) {
+			DRV_LOG(ERR, "Failed to create vdpa multi-threads %d.",
+					thrd_idx);
+			goto c_thread_err;
+		}
+		CPU_ZERO(&cpuset);
+		if (cpu_core != -1)
+			CPU_SET(cpu_core, &cpuset);
+		else
+			cpuset = rte_lcore_cpuset(rte_get_main_lcore());
+		ret = pthread_setaffinity_np(
+				conf_thread_mng.cthrd[thrd_idx].tid,
+				sizeof(cpuset), &cpuset);
+		if (ret) {
+			DRV_LOG(ERR, "Failed to set thread affinity for "
+			"vdpa multi-threads %d.", thrd_idx);
+			goto c_thread_err;
+		}
+		snprintf(name, sizeof(name), "vDPA-mthread-%d", thrd_idx);
+		ret = pthread_setname_np(
+				conf_thread_mng.cthrd[thrd_idx].tid, name);
+		if (ret)
+			DRV_LOG(ERR, "Failed to set vdpa multi-threads name %s.",
+					name);
+		else
+			DRV_LOG(DEBUG, "Thread name: %s.", name);
+	}
+	pthread_mutex_unlock(&conf_thread_mng.cthrd_lock);
+	return 0;
+c_thread_err:
+	for (thrd_idx = 0; thrd_idx < conf_thread_mng.max_thrds;
+		thrd_idx++)
+		mlx5_vdpa_c_thread_destroy(thrd_idx, false);
+	pthread_mutex_unlock(&conf_thread_mng.cthrd_lock);
+	return -1;
+}
+
+int
+mlx5_vdpa_mult_threads_create(int cpu_core)
+{
+	pthread_mutex_init(&conf_thread_mng.cthrd_lock, NULL);
+	if (mlx5_vdpa_c_thread_create(cpu_core)) {
+		DRV_LOG(ERR, "Cannot create vDPA configuration threads.");
+		mlx5_vdpa_mult_threads_destroy(false);
+		return -1;
+	}
+	return 0;
+}
+
+void
+mlx5_vdpa_mult_threads_destroy(bool need_unlock)
+{
+	uint32_t thrd_idx;
+
+	if (!conf_thread_mng.initializer_priv)
+		return;
+	for (thrd_idx = 0; thrd_idx < conf_thread_mng.max_thrds;
+		thrd_idx++)
+		mlx5_vdpa_c_thread_destroy(thrd_idx, need_unlock);
+	pthread_mutex_destroy(&conf_thread_mng.cthrd_lock);
+	memset(&conf_thread_mng, 0, sizeof(struct mlx5_vdpa_conf_thread_mng));
+}
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index 2b0f5936d1..b45fbac146 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -507,7 +507,7 @@ mlx5_vdpa_cqe_event_setup(struct mlx5_vdpa_priv *priv)
 	pthread_attr_t attr;
 	char name[16];
 	const struct sched_param sp = {
-		.sched_priority = sched_get_priority_max(SCHED_RR),
+		.sched_priority = sched_get_priority_max(SCHED_RR) - 1,
 	};
 
 	if (!priv->eventc)
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 138b7bdbc5..599809b09b 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -43,7 +43,7 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 			    errno == EWOULDBLOCK ||
 			    errno == EAGAIN)
 				continue;
-			DRV_LOG(ERR,  "Failed to read kickfd of virtq %d: %s",
+			DRV_LOG(ERR,  "Failed to read kickfd of virtq %d: %s.",
 				virtq->index, strerror(errno));
 		}
 		break;
@@ -57,7 +57,7 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 	rte_spinlock_unlock(&priv->db_lock);
 	pthread_mutex_unlock(&virtq->virtq_lock);
 	if (priv->state != MLX5_VDPA_STATE_CONFIGURED && !virtq->enable) {
-		DRV_LOG(ERR,  "device %d queue %d down, skip kick handling",
+		DRV_LOG(ERR,  "device %d queue %d down, skip kick handling.",
 			priv->vid, virtq->index);
 		return;
 	}
@@ -218,7 +218,7 @@ mlx5_vdpa_virtq_query(struct mlx5_vdpa_priv *priv, int index)
 		return -1;
 	}
 	if (attr.state == MLX5_VIRTQ_STATE_ERROR)
-		DRV_LOG(WARNING, "vid %d vring %d hw error=%hhu",
+		DRV_LOG(WARNING, "vid %d vring %d hw error=%hhu.",
 			priv->vid, index, attr.error_type);
 	return 0;
 }
@@ -380,7 +380,7 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 	if (ret) {
 		last_avail_idx = 0;
 		last_used_idx = 0;
-		DRV_LOG(WARNING, "Couldn't get vring base, idx are set to 0");
+		DRV_LOG(WARNING, "Couldn't get vring base, idx are set to 0.");
 	} else {
 		DRV_LOG(INFO, "vid %d: Init last_avail_idx=%d, last_used_idx=%d for "
 				"virtq %d.", priv->vid, last_avail_idx,
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH 10/16] vdpa/mlx5: add task ring for MT management
  2022-06-06 11:20 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                     ` (17 preceding siblings ...)
  2022-06-06 11:20   ` [PATCH v1 10/17] vdpa/mlx5: add multi-thread management for configuration Li Zhang
@ 2022-06-06 11:20   ` Li Zhang
  2022-06-06 11:20   ` [PATCH 11/16] vdpa/mlx5: add MT task for VM memory registration Li Zhang
                     ` (12 subsequent siblings)
  31 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:20 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

The configuration threads' tasks need a container that
supports multiple tasks assigned to a thread in parallel.
Use an rte_ring container per thread to manage
the thread tasks without locks.
A caller thread from the user context opens a task for
a thread and enqueues it to that thread's ring.
The thread polls its ring and dequeues the tasks.
That is why the ring should be in multi-producer,
single-consumer mode.
An atomic counter manages the task completion notification.
The threads report errors to the caller by
a dedicated error counter per task.
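
A minimal, self-contained sketch of the caller-side completion protocol
described above; the helper name is illustrative and not part of the patch.
Each worker decrements the shared remaining counter once per finished task
and bumps the error counter on failure, so the caller only polls two atomics:

#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

/* Illustrative only: wait until every enqueued task completed, then report
 * whether all of them succeeded.
 */
static bool
tasks_all_succeeded(uint32_t *remaining_cnt, uint32_t *err_cnt)
{
        while (__atomic_load_n(remaining_cnt, __ATOMIC_RELAXED) != 0)
                usleep(100);
        return __atomic_load_n(err_cnt, __ATOMIC_RELAXED) == 0;
}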

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  17 ++++
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 115 +++++++++++++++++++++++++-
 2 files changed, 130 insertions(+), 2 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 4e7c2557b7..2bbb868ec6 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -74,10 +74,22 @@ enum {
 };
 
 #define MLX5_VDPA_MAX_C_THRD 256
+#define MLX5_VDPA_MAX_TASKS_PER_THRD 4096
+#define MLX5_VDPA_TASKS_PER_DEV 64
+
+/* Generic task information and size must be multiple of 4B. */
+struct mlx5_vdpa_task {
+	struct mlx5_vdpa_priv *priv;
+	uint32_t *remaining_cnt;
+	uint32_t *err_cnt;
+	uint32_t idx;
+} __rte_packed __rte_aligned(4);
 
 /* Generic mlx5_vdpa_c_thread information. */
 struct mlx5_vdpa_c_thread {
 	pthread_t tid;
+	struct rte_ring *rng;
+	pthread_cond_t c_cond;
 };
 
 struct mlx5_vdpa_conf_thread_mng {
@@ -532,4 +544,9 @@ mlx5_vdpa_mult_threads_create(int cpu_core);
  */
 void
 mlx5_vdpa_mult_threads_destroy(bool need_unlock);
+
+bool
+mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
+		uint32_t thrd_idx,
+		uint32_t num);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index ba7d8b63b3..1fdc92d3ad 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -11,17 +11,103 @@
 #include <rte_alarm.h>
 #include <rte_tailq.h>
 #include <rte_ring_elem.h>
+#include <rte_ring_peek.h>
 
 #include <mlx5_common.h>
 
 #include "mlx5_vdpa_utils.h"
 #include "mlx5_vdpa.h"
 
+static inline uint32_t
+mlx5_vdpa_c_thrd_ring_dequeue_bulk(struct rte_ring *r,
+	void **obj, uint32_t n, uint32_t *avail)
+{
+	uint32_t m;
+
+	m = rte_ring_dequeue_bulk_elem_start(r, obj,
+		sizeof(struct mlx5_vdpa_task), n, avail);
+	n = (m == n) ? n : 0;
+	rte_ring_dequeue_elem_finish(r, n);
+	return n;
+}
+
+static inline uint32_t
+mlx5_vdpa_c_thrd_ring_enqueue_bulk(struct rte_ring *r,
+	void * const *obj, uint32_t n, uint32_t *free)
+{
+	uint32_t m;
+
+	m = rte_ring_enqueue_bulk_elem_start(r, n, free);
+	n = (m == n) ? n : 0;
+	rte_ring_enqueue_elem_finish(r, obj,
+		sizeof(struct mlx5_vdpa_task), n);
+	return n;
+}
+
+bool
+mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
+		uint32_t thrd_idx,
+		uint32_t num)
+{
+	struct rte_ring *rng = conf_thread_mng.cthrd[thrd_idx].rng;
+	struct mlx5_vdpa_task task[MLX5_VDPA_TASKS_PER_DEV];
+	uint32_t i;
+
+	MLX5_ASSERT(num <= MLX5_VDPA_TASKS_PER_DEV);
+	for (i = 0 ; i < num; i++) {
+		task[i].priv = priv;
+		/* To be added later. */
+	}
+	if (!mlx5_vdpa_c_thrd_ring_enqueue_bulk(rng, (void **)&task, num, NULL))
+		return -1;
+	for (i = 0 ; i < num; i++)
+		if (task[i].remaining_cnt)
+			__atomic_fetch_add(task[i].remaining_cnt, 1,
+				__ATOMIC_RELAXED);
+	/* wake up conf thread. */
+	pthread_mutex_lock(&conf_thread_mng.cthrd_lock);
+	pthread_cond_signal(&conf_thread_mng.cthrd[thrd_idx].c_cond);
+	pthread_mutex_unlock(&conf_thread_mng.cthrd_lock);
+	return 0;
+}
+
 static void *
 mlx5_vdpa_c_thread_handle(void *arg)
 {
-	/* To be added later. */
-	return arg;
+	struct mlx5_vdpa_conf_thread_mng *multhrd = arg;
+	pthread_t thread_id = pthread_self();
+	struct mlx5_vdpa_priv *priv;
+	struct mlx5_vdpa_task task;
+	struct rte_ring *rng;
+	uint32_t thrd_idx;
+	uint32_t task_num;
+
+	for (thrd_idx = 0; thrd_idx < multhrd->max_thrds;
+		thrd_idx++)
+		if (multhrd->cthrd[thrd_idx].tid == thread_id)
+			break;
+	if (thrd_idx >= multhrd->max_thrds)
+		return NULL;
+	rng = multhrd->cthrd[thrd_idx].rng;
+	while (1) {
+		task_num = mlx5_vdpa_c_thrd_ring_dequeue_bulk(rng,
+			(void **)&task, 1, NULL);
+		if (!task_num) {
+			/* No task and condition wait. */
+			pthread_mutex_lock(&multhrd->cthrd_lock);
+			pthread_cond_wait(
+				&multhrd->cthrd[thrd_idx].c_cond,
+				&multhrd->cthrd_lock);
+			pthread_mutex_unlock(&multhrd->cthrd_lock);
+		}
+		priv = task.priv;
+		if (priv == NULL)
+			continue;
+		__atomic_fetch_sub(task.remaining_cnt,
+			1, __ATOMIC_RELAXED);
+		/* To be added later. */
+	}
+	return NULL;
 }
 
 static void
@@ -34,6 +120,10 @@ mlx5_vdpa_c_thread_destroy(uint32_t thrd_idx, bool need_unlock)
 		if (need_unlock)
 			pthread_mutex_init(&conf_thread_mng.cthrd_lock, NULL);
 	}
+	if (conf_thread_mng.cthrd[thrd_idx].rng) {
+		rte_ring_free(conf_thread_mng.cthrd[thrd_idx].rng);
+		conf_thread_mng.cthrd[thrd_idx].rng = NULL;
+	}
 }
 
 static int
@@ -45,6 +135,7 @@ mlx5_vdpa_c_thread_create(int cpu_core)
 	rte_cpuset_t cpuset;
 	pthread_attr_t attr;
 	uint32_t thrd_idx;
+	uint32_t ring_num;
 	char name[32];
 	int ret;
 
@@ -60,8 +151,26 @@ mlx5_vdpa_c_thread_create(int cpu_core)
 		DRV_LOG(ERR, "Failed to set thread priority.");
 		goto c_thread_err;
 	}
+	ring_num = MLX5_VDPA_MAX_TASKS_PER_THRD / conf_thread_mng.max_thrds;
+	if (!ring_num) {
+		DRV_LOG(ERR, "Invalid ring number for thread.");
+		goto c_thread_err;
+	}
 	for (thrd_idx = 0; thrd_idx < conf_thread_mng.max_thrds;
 		thrd_idx++) {
+		snprintf(name, sizeof(name), "vDPA-mthread-ring-%d",
+			thrd_idx);
+		conf_thread_mng.cthrd[thrd_idx].rng = rte_ring_create_elem(name,
+			sizeof(struct mlx5_vdpa_task), ring_num,
+			rte_socket_id(),
+			RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ |
+			RING_F_EXACT_SZ);
+		if (!conf_thread_mng.cthrd[thrd_idx].rng) {
+			DRV_LOG(ERR,
+			"Failed to create vdpa multi-threads %d ring.",
+			thrd_idx);
+			goto c_thread_err;
+		}
 		ret = pthread_create(&conf_thread_mng.cthrd[thrd_idx].tid,
 				&attr, mlx5_vdpa_c_thread_handle,
 				(void *)&conf_thread_mng);
@@ -91,6 +200,8 @@ mlx5_vdpa_c_thread_create(int cpu_core)
 					name);
 		else
 			DRV_LOG(DEBUG, "Thread name: %s.", name);
+		pthread_cond_init(&conf_thread_mng.cthrd[thrd_idx].c_cond,
+			NULL);
 	}
 	pthread_mutex_unlock(&conf_thread_mng.cthrd_lock);
 	return 0;
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH 11/16] vdpa/mlx5: add MT task for VM memory registration
  2022-06-06 11:20 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                     ` (18 preceding siblings ...)
  2022-06-06 11:20   ` [PATCH 10/16] vdpa/mlx5: add task ring for MT management Li Zhang
@ 2022-06-06 11:20   ` Li Zhang
  2022-06-06 11:20   ` [PATCH v1 11/17] vdpa/mlx5: add task ring for MT management Li Zhang
                     ` (11 subsequent siblings)
  31 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:20 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

The driver creates a direct HW MR object
for each VM memory region,
which maps the VM physical address to
the actual physical address.

Later, after all the MRs are ready,
the driver creates an indirect MR to group all the direct MRs
into one virtual space from the HW perspective.

Create direct MRs in parallel using the MT mechanism.
After completion, the primary thread creates the indirect MR
needed for the following virtq configurations.

This optimization accelerates the LM process and
reduces its time by 5%.
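
A condensed sketch of the work-splitting policy used here, with all
names hypothetical (the real code dispatches MLX5_VDPA_TASK_REG_MR
tasks with mlx5_vdpa_task_add() and runs mlx5_vdpa_register_mr() for
the caller's own share):

#include <stdbool.h>
#include <stdint.h>

typedef bool (*dispatch_fn)(uint32_t worker, uint32_t item); /* false: ring full */
typedef int (*exec_fn)(uint32_t item); /* run the item on the caller thread */

/*
 * Keep one item out of every (n_workers + 1) for the caller thread,
 * spread the rest round-robin over the workers, and fall back to the
 * caller whenever a worker ring is full.
 */
static int
split_items(uint32_t n_items, uint32_t n_workers,
	    dispatch_fn dispatch, exec_fn exec_local)
{
	uint32_t caller_items[n_items];
	uint32_t i, caller_cnt = 0, worker = 0;
	int ret = 0;

	if (n_items == 0)
		return 0;
	for (i = 0; i < n_items; i++) {
		if (i % (n_workers + 1) == 0 || !dispatch(worker, i)) {
			caller_items[caller_cnt++] = i;
			continue;
		}
		worker = (worker + 1) % n_workers;
	}
	/* The workers run in parallel while the caller handles its share. */
	for (i = 0; i < caller_cnt; i++)
		ret |= exec_local(caller_items[i]);
	return ret;
}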

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c         |   1 -
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  31 ++-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c |  47 ++++-
 drivers/vdpa/mlx5/mlx5_vdpa_mem.c     | 270 ++++++++++++++++++--------
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   |   6 +-
 5 files changed, 258 insertions(+), 97 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index a9d023ed08..e3b32fa087 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -768,7 +768,6 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 		rte_errno = rte_errno ? rte_errno : EINVAL;
 		goto error;
 	}
-	SLIST_INIT(&priv->mr_list);
 	pthread_mutex_lock(&priv_list_lock);
 	TAILQ_INSERT_TAIL(&priv_list, priv, next);
 	pthread_mutex_unlock(&priv_list_lock);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 2bbb868ec6..3316ce42be 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -59,7 +59,6 @@ struct mlx5_vdpa_event_qp {
 };
 
 struct mlx5_vdpa_query_mr {
-	SLIST_ENTRY(mlx5_vdpa_query_mr) next;
 	union {
 		struct ibv_mr *mr;
 		struct mlx5_devx_obj *mkey;
@@ -76,10 +75,17 @@ enum {
 #define MLX5_VDPA_MAX_C_THRD 256
 #define MLX5_VDPA_MAX_TASKS_PER_THRD 4096
 #define MLX5_VDPA_TASKS_PER_DEV 64
+#define MLX5_VDPA_MAX_MRS 0xFFFF
+
+/* Vdpa task types. */
+enum mlx5_vdpa_task_type {
+	MLX5_VDPA_TASK_REG_MR = 1,
+};
 
 /* Generic task information and size must be multiple of 4B. */
 struct mlx5_vdpa_task {
 	struct mlx5_vdpa_priv *priv;
+	enum mlx5_vdpa_task_type type;
 	uint32_t *remaining_cnt;
 	uint32_t *err_cnt;
 	uint32_t idx;
@@ -101,6 +107,14 @@ struct mlx5_vdpa_conf_thread_mng {
 };
 extern struct mlx5_vdpa_conf_thread_mng conf_thread_mng;
 
+struct mlx5_vdpa_vmem_info {
+	struct rte_vhost_memory *vmem;
+	uint32_t entries_num;
+	uint64_t gcd;
+	uint64_t size;
+	uint8_t mode;
+};
+
 struct mlx5_vdpa_virtq {
 	SLIST_ENTRY(mlx5_vdpa_virtq) next;
 	uint8_t enable;
@@ -176,7 +190,7 @@ struct mlx5_vdpa_priv {
 	struct mlx5_hca_vdpa_attr caps;
 	uint32_t gpa_mkey_index;
 	struct ibv_mr *null_mr;
-	struct rte_vhost_memory *vmem;
+	struct mlx5_vdpa_vmem_info vmem_info;
 	struct mlx5dv_devx_event_channel *eventc;
 	struct mlx5dv_devx_event_channel *err_chnl;
 	struct mlx5_uar uar;
@@ -187,11 +201,13 @@ struct mlx5_vdpa_priv {
 	uint8_t num_lag_ports;
 	uint64_t features; /* Negotiated features. */
 	uint16_t log_max_rqt_size;
+	uint16_t last_c_thrd_idx;
+	uint16_t num_mrs; /* Number of memory regions. */
 	struct mlx5_vdpa_steer steer;
 	struct mlx5dv_var *var;
 	void *virtq_db_addr;
 	struct mlx5_pmd_wrapped_mr lm_mr;
-	SLIST_HEAD(mr_list, mlx5_vdpa_query_mr) mr_list;
+	struct mlx5_vdpa_query_mr **mrs;
 	struct mlx5_vdpa_virtq virtqs[];
 };
 
@@ -548,5 +564,12 @@ mlx5_vdpa_mult_threads_destroy(bool need_unlock);
 bool
 mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
 		uint32_t thrd_idx,
-		uint32_t num);
+		enum mlx5_vdpa_task_type task_type,
+		uint32_t *bulk_refcnt, uint32_t *bulk_err_cnt,
+		void **task_data, uint32_t num);
+int
+mlx5_vdpa_register_mr(struct mlx5_vdpa_priv *priv, uint32_t idx);
+bool
+mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
+		uint32_t *err_cnt, uint32_t sleep_time);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index 1fdc92d3ad..10391931ae 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -47,16 +47,23 @@ mlx5_vdpa_c_thrd_ring_enqueue_bulk(struct rte_ring *r,
 bool
 mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
 		uint32_t thrd_idx,
-		uint32_t num)
+		enum mlx5_vdpa_task_type task_type,
+		uint32_t *remaining_cnt, uint32_t *err_cnt,
+		void **task_data, uint32_t num)
 {
 	struct rte_ring *rng = conf_thread_mng.cthrd[thrd_idx].rng;
 	struct mlx5_vdpa_task task[MLX5_VDPA_TASKS_PER_DEV];
+	uint32_t *data = (uint32_t *)task_data;
 	uint32_t i;
 
 	MLX5_ASSERT(num <= MLX5_VDPA_TASKS_PER_DEV);
 	for (i = 0 ; i < num; i++) {
 		task[i].priv = priv;
 		/* To be added later. */
+		task[i].type = task_type;
+		task[i].remaining_cnt = remaining_cnt;
+		task[i].err_cnt = err_cnt;
+		task[i].idx = data[i];
 	}
 	if (!mlx5_vdpa_c_thrd_ring_enqueue_bulk(rng, (void **)&task, num, NULL))
 		return -1;
@@ -71,6 +78,23 @@ mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
 	return 0;
 }
 
+bool
+mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
+		uint32_t *err_cnt, uint32_t sleep_time)
+{
+	/* Check and wait all tasks done. */
+	while (__atomic_load_n(remaining_cnt,
+		__ATOMIC_RELAXED) != 0) {
+		rte_delay_us_sleep(sleep_time);
+	}
+	if (__atomic_load_n(err_cnt,
+		__ATOMIC_RELAXED)) {
+		DRV_LOG(ERR, "Tasks done with error.");
+		return true;
+	}
+	return false;
+}
+
 static void *
 mlx5_vdpa_c_thread_handle(void *arg)
 {
@@ -81,6 +105,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
 	struct rte_ring *rng;
 	uint32_t thrd_idx;
 	uint32_t task_num;
+	int ret;
 
 	for (thrd_idx = 0; thrd_idx < multhrd->max_thrds;
 		thrd_idx++)
@@ -99,13 +124,29 @@ mlx5_vdpa_c_thread_handle(void *arg)
 				&multhrd->cthrd[thrd_idx].c_cond,
 				&multhrd->cthrd_lock);
 			pthread_mutex_unlock(&multhrd->cthrd_lock);
+			continue;
 		}
 		priv = task.priv;
 		if (priv == NULL)
 			continue;
-		__atomic_fetch_sub(task.remaining_cnt,
+		switch (task.type) {
+		case MLX5_VDPA_TASK_REG_MR:
+			ret = mlx5_vdpa_register_mr(priv, task.idx);
+			if (ret) {
+				DRV_LOG(ERR,
+				"Failed to register mr %d.", task.idx);
+				__atomic_fetch_add(task.err_cnt, 1,
+				__ATOMIC_RELAXED);
+			}
+			break;
+		default:
+			DRV_LOG(ERR, "Invalid vdpa task type %d.",
+			task.type);
+			break;
+		}
+		if (task.remaining_cnt)
+			__atomic_fetch_sub(task.remaining_cnt,
 			1, __ATOMIC_RELAXED);
-		/* To be added later. */
 	}
 	return NULL;
 }
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_mem.c b/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
index d6e3dd664b..e333f0bca6 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
@@ -17,25 +17,33 @@
 void
 mlx5_vdpa_mem_dereg(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_query_mr *mrs =
+		(struct mlx5_vdpa_query_mr *)priv->mrs;
 	struct mlx5_vdpa_query_mr *entry;
-	struct mlx5_vdpa_query_mr *next;
+	int i;
 
-	entry = SLIST_FIRST(&priv->mr_list);
-	while (entry) {
-		next = SLIST_NEXT(entry, next);
-		if (entry->is_indirect)
-			claim_zero(mlx5_devx_cmd_destroy(entry->mkey));
-		else
-			claim_zero(mlx5_glue->dereg_mr(entry->mr));
-		SLIST_REMOVE(&priv->mr_list, entry, mlx5_vdpa_query_mr, next);
-		rte_free(entry);
-		entry = next;
+	if (priv->mrs) {
+		for (i = priv->num_mrs - 1; i >= 0; i--) {
+			entry = &mrs[i];
+			if (entry->is_indirect) {
+				if (entry->mkey)
+					claim_zero(
+					mlx5_devx_cmd_destroy(entry->mkey));
+			} else {
+				if (entry->mr)
+					claim_zero(
+					mlx5_glue->dereg_mr(entry->mr));
+			}
+		}
+		rte_free(priv->mrs);
+		priv->mrs = NULL;
+		priv->num_mrs = 0;
 	}
-	SLIST_INIT(&priv->mr_list);
-	if (priv->vmem) {
-		free(priv->vmem);
-		priv->vmem = NULL;
+	if (priv->vmem_info.vmem) {
+		free(priv->vmem_info.vmem);
+		priv->vmem_info.vmem = NULL;
 	}
+	priv->gpa_mkey_index = 0;
 }
 
 static int
@@ -167,72 +175,29 @@ mlx5_vdpa_mem_cmp(struct rte_vhost_memory *mem1, struct rte_vhost_memory *mem2)
 #define KLM_SIZE_MAX_ALIGN(sz) ((sz) > MLX5_MAX_KLM_BYTE_COUNT ? \
 				MLX5_MAX_KLM_BYTE_COUNT : (sz))
 
-/*
- * The target here is to group all the physical memory regions of the
- * virtio device in one indirect mkey.
- * For KLM Fixed Buffer Size mode (HW find the translation entry in one
- * read according to the guest physical address):
- * All the sub-direct mkeys of it must be in the same size, hence, each
- * one of them should be in the GCD size of all the virtio memory
- * regions and the holes between them.
- * For KLM mode (each entry may be in different size so HW must iterate
- * the entries):
- * Each virtio memory region and each hole between them have one entry,
- * just need to cover the maximum allowed size(2G) by splitting entries
- * which their associated memory regions are bigger than 2G.
- * It means that each virtio memory region may be mapped to more than
- * one direct mkey in the 2 modes.
- * All the holes of invalid memory between the virtio memory regions
- * will be mapped to the null memory region for security.
- */
-int
-mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
+static int
+mlx5_vdpa_create_indirect_mkey(struct mlx5_vdpa_priv *priv)
 {
 	struct mlx5_devx_mkey_attr mkey_attr;
-	struct mlx5_vdpa_query_mr *entry = NULL;
-	struct rte_vhost_mem_region *reg = NULL;
-	uint8_t mode = 0;
-	uint32_t entries_num = 0;
-	uint32_t i;
-	uint64_t gcd = 0;
+	struct mlx5_vdpa_query_mr *mrs =
+		(struct mlx5_vdpa_query_mr *)priv->mrs;
+	struct mlx5_vdpa_query_mr *entry;
+	struct rte_vhost_mem_region *reg;
+	uint8_t mode = priv->vmem_info.mode;
+	uint32_t entries_num = priv->vmem_info.entries_num;
+	struct rte_vhost_memory *mem = priv->vmem_info.vmem;
+	struct mlx5_klm klm_array[entries_num];
+	uint64_t gcd = priv->vmem_info.gcd;
+	int ret = -rte_errno;
 	uint64_t klm_size;
-	uint64_t mem_size;
-	uint64_t k;
 	int klm_index = 0;
-	int ret;
-	struct rte_vhost_memory *mem = mlx5_vdpa_vhost_mem_regions_prepare
-			      (priv->vid, &mode, &mem_size, &gcd, &entries_num);
-	struct mlx5_klm klm_array[entries_num];
+	uint64_t k;
+	uint32_t i;
 
-	if (!mem)
-		return -rte_errno;
-	if (priv->vmem != NULL) {
-		if (mlx5_vdpa_mem_cmp(mem, priv->vmem) == 0) {
-			/* VM memory not changed, reuse resources. */
-			free(mem);
-			return 0;
-		}
-		mlx5_vdpa_mem_dereg(priv);
-	}
-	priv->vmem = mem;
+	/* If it is the last entry, create indirect mkey. */
 	for (i = 0; i < mem->nregions; i++) {
+		entry = &mrs[i];
 		reg = &mem->regions[i];
-		entry = rte_zmalloc(__func__, sizeof(*entry), 0);
-		if (!entry) {
-			ret = -ENOMEM;
-			DRV_LOG(ERR, "Failed to allocate mem entry memory.");
-			goto error;
-		}
-		entry->mr = mlx5_glue->reg_mr_iova(priv->cdev->pd,
-				       (void *)(uintptr_t)(reg->host_user_addr),
-				       reg->size, reg->guest_phys_addr,
-				       IBV_ACCESS_LOCAL_WRITE);
-		if (!entry->mr) {
-			DRV_LOG(ERR, "Failed to create direct Mkey.");
-			ret = -rte_errno;
-			goto error;
-		}
-		entry->is_indirect = 0;
 		if (i > 0) {
 			uint64_t sadd;
 			uint64_t empty_region_sz = reg->guest_phys_addr -
@@ -265,11 +230,10 @@ mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
 			klm_array[klm_index].address = reg->guest_phys_addr + k;
 			klm_index++;
 		}
-		SLIST_INSERT_HEAD(&priv->mr_list, entry, next);
 	}
 	memset(&mkey_attr, 0, sizeof(mkey_attr));
 	mkey_attr.addr = (uintptr_t)(mem->regions[0].guest_phys_addr);
-	mkey_attr.size = mem_size;
+	mkey_attr.size = priv->vmem_info.size;
 	mkey_attr.pd = priv->cdev->pdn;
 	mkey_attr.umem_id = 0;
 	/* Must be zero for KLM mode. */
@@ -278,25 +242,159 @@ mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
 	mkey_attr.pg_access = 0;
 	mkey_attr.klm_array = klm_array;
 	mkey_attr.klm_num = klm_index;
-	entry = rte_zmalloc(__func__, sizeof(*entry), 0);
-	if (!entry) {
-		DRV_LOG(ERR, "Failed to allocate memory for indirect entry.");
-		ret = -ENOMEM;
-		goto error;
-	}
+	entry = &mrs[mem->nregions];
 	entry->mkey = mlx5_devx_cmd_mkey_create(priv->cdev->ctx, &mkey_attr);
 	if (!entry->mkey) {
 		DRV_LOG(ERR, "Failed to create indirect Mkey.");
-		ret = -rte_errno;
-		goto error;
+		rte_errno = -ret;
+		return ret;
 	}
 	entry->is_indirect = 1;
-	SLIST_INSERT_HEAD(&priv->mr_list, entry, next);
 	priv->gpa_mkey_index = entry->mkey->id;
 	return 0;
+}
+
+/*
+ * The target here is to group all the physical memory regions of the
+ * virtio device in one indirect mkey.
+ * For KLM Fixed Buffer Size mode (HW find the translation entry in one
+ * read according to the guest phisical address):
+ * All the sub-direct mkeys of it must be in the same size, hence, each
+ * one of them should be in the GCD size of all the virtio memory
+ * regions and the holes between them.
+ * For KLM mode (each entry may be in different size so HW must iterate
+ * the entries):
+ * Each virtio memory region and each hole between them have one entry,
+ * just need to cover the maximum allowed size(2G) by splitting entries
+ * which their associated memory regions are bigger than 2G.
+ * It means that each virtio memory region may be mapped to more than
+ * one direct mkey in the 2 modes.
+ * All the holes of invalid memory between the virtio memory regions
+ * will be mapped to the null memory region for security.
+ */
+int
+mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
+{
+	void *mrs;
+	uint8_t mode = 0;
+	int ret = -rte_errno;
+	uint32_t i, thrd_idx, data[1];
+	uint32_t remaining_cnt = 0, err_cnt = 0, task_num = 0;
+	struct rte_vhost_memory *mem = mlx5_vdpa_vhost_mem_regions_prepare
+			(priv->vid, &mode, &priv->vmem_info.size,
+			&priv->vmem_info.gcd, &priv->vmem_info.entries_num);
+
+	if (!mem)
+		return -rte_errno;
+	if (priv->vmem_info.vmem != NULL) {
+		if (mlx5_vdpa_mem_cmp(mem, priv->vmem_info.vmem) == 0) {
+			/* VM memory not changed, reuse resources. */
+			free(mem);
+			return 0;
+		}
+		mlx5_vdpa_mem_dereg(priv);
+	}
+	priv->vmem_info.vmem = mem;
+	priv->vmem_info.mode = mode;
+	priv->num_mrs = mem->nregions;
+	if (!priv->num_mrs || priv->num_mrs >= MLX5_VDPA_MAX_MRS) {
+		DRV_LOG(ERR,
+		"Invalid number of memory regions.");
+		goto error;
+	}
+	/* The last one is indirect mkey entry. */
+	priv->num_mrs++;
+	mrs = rte_zmalloc("mlx5 vDPA memory regions",
+		sizeof(struct mlx5_vdpa_query_mr) * priv->num_mrs, 0);
+	priv->mrs = mrs;
+	if (!priv->mrs) {
+		DRV_LOG(ERR, "Failed to allocate private memory regions.");
+		goto error;
+	}
+	if (priv->use_c_thread) {
+		uint32_t main_task_idx[mem->nregions];
+
+		for (i = 0; i < mem->nregions; i++) {
+			thrd_idx = i % (conf_thread_mng.max_thrds + 1);
+			if (!thrd_idx) {
+				main_task_idx[task_num] = i;
+				task_num++;
+				continue;
+			}
+			thrd_idx = priv->last_c_thrd_idx + 1;
+			if (thrd_idx >= conf_thread_mng.max_thrds)
+				thrd_idx = 0;
+			priv->last_c_thrd_idx = thrd_idx;
+			data[0] = i;
+			if (mlx5_vdpa_task_add(priv, thrd_idx,
+				MLX5_VDPA_TASK_REG_MR,
+				&remaining_cnt, &err_cnt,
+				(void **)&data, 1)) {
+				DRV_LOG(ERR,
+				"Fail to add task mem region (%d)", i);
+				main_task_idx[task_num] = i;
+				task_num++;
+			}
+		}
+		for (i = 0; i < task_num; i++) {
+			ret = mlx5_vdpa_register_mr(priv,
+					main_task_idx[i]);
+			if (ret) {
+				DRV_LOG(ERR,
+				"Failed to register mem region %d.", i);
+				goto error;
+			}
+		}
+		if (mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining_cnt,
+			&err_cnt, 100)) {
+			DRV_LOG(ERR,
+			"Failed to wait register mem region tasks ready.");
+			goto error;
+		}
+	} else {
+		for (i = 0; i < mem->nregions; i++) {
+			ret = mlx5_vdpa_register_mr(priv, i);
+			if (ret) {
+				DRV_LOG(ERR,
+				"Failed to register mem region %d.", i);
+				goto error;
+			}
+		}
+	}
+	ret = mlx5_vdpa_create_indirect_mkey(priv);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to create indirect mkey .");
+		goto error;
+	}
+	return 0;
 error:
-	rte_free(entry);
 	mlx5_vdpa_mem_dereg(priv);
 	rte_errno = -ret;
 	return ret;
 }
+
+int
+mlx5_vdpa_register_mr(struct mlx5_vdpa_priv *priv, uint32_t idx)
+{
+	struct rte_vhost_memory *mem = priv->vmem_info.vmem;
+	struct mlx5_vdpa_query_mr *mrs =
+		(struct mlx5_vdpa_query_mr *)priv->mrs;
+	struct mlx5_vdpa_query_mr *entry;
+	struct rte_vhost_mem_region *reg;
+	int ret;
+
+	reg = &mem->regions[idx];
+	entry = &mrs[idx];
+	entry->mr = mlx5_glue->reg_mr_iova
+				      (priv->cdev->pd,
+				       (void *)(uintptr_t)(reg->host_user_addr),
+				       reg->size, reg->guest_phys_addr,
+				       IBV_ACCESS_LOCAL_WRITE);
+	if (!entry->mr) {
+		DRV_LOG(ERR, "Failed to create direct Mkey.");
+		ret = -rte_errno;
+		return ret;
+	}
+	entry->is_indirect = 0;
+	return 0;
+}
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 599809b09b..0b317655db 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -353,21 +353,21 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 		}
 	}
 	if (attr->q_type == MLX5_VIRTQ_TYPE_SPLIT) {
-		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
+		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem_info.vmem,
 					   (uint64_t)(uintptr_t)vq->desc);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get descriptor ring GPA.");
 			return -1;
 		}
 		attr->desc_addr = gpa;
-		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
+		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem_info.vmem,
 					   (uint64_t)(uintptr_t)vq->used);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get GPA for used ring.");
 			return -1;
 		}
 		attr->used_addr = gpa;
-		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
+		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem_info.vmem,
 					   (uint64_t)(uintptr_t)vq->avail);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get GPA for available ring.");
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v1 11/17] vdpa/mlx5: add task ring for MT management
  2022-06-06 11:20 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                     ` (19 preceding siblings ...)
  2022-06-06 11:20   ` [PATCH 11/16] vdpa/mlx5: add MT task for VM memory registration Li Zhang
@ 2022-06-06 11:20   ` Li Zhang
  2022-06-06 11:20   ` [PATCH v1 12/17] vdpa/mlx5: add MT task for VM memory registration Li Zhang
                     ` (10 subsequent siblings)
  31 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:20 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

The configuration threads' tasks need a container to
support multiple tasks assigned to a thread in parallel.
Use an rte_ring container per thread to manage
the thread tasks without locks.
The caller thread from the user context opens a task to
a thread and enqueues it to the thread's ring.
The thread polls its ring and dequeues tasks.
That's why the ring should be in multi-producer
and single-consumer mode.
An atomic counter manages the task completion notification.
The threads report errors to the caller through
a dedicated error counter per task.
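
A minimal, self-contained sketch of that producer/consumer scheme with
DPDK APIs (names and the task layout are illustrative and EAL is
assumed to be initialized; the patch's real element type is
struct mlx5_vdpa_task):

#include <pthread.h>
#include <stdint.h>
#include <rte_common.h>
#include <rte_lcore.h>
#include <rte_ring.h>
#include <rte_ring_elem.h>

struct demo_task {	/* element size must be a multiple of 4B */
	uint32_t type;
	uint32_t idx;
};

static struct rte_ring *task_ring;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

static int
ring_init(void)
{
	/* HTS enqueue/dequeue flags, as used by the patch. */
	task_ring = rte_ring_create_elem("demo-task-ring",
			sizeof(struct demo_task), 4096, rte_socket_id(),
			RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ |
			RING_F_EXACT_SZ);
	return task_ring == NULL ? -1 : 0;
}

/* Caller (producer) side: enqueue one task, then wake the worker up. */
static int
post_task(const struct demo_task *t)
{
	if (rte_ring_enqueue_bulk_elem(task_ring, t, sizeof(*t), 1, NULL) != 1)
		return -1;
	pthread_mutex_lock(&lock);
	pthread_cond_signal(&cond);
	pthread_mutex_unlock(&lock);
	return 0;
}

/* Worker (consumer) side: drain the ring, sleep while it is empty. */
static void *
worker(void *arg __rte_unused)
{
	struct demo_task t;

	while (1) {
		if (rte_ring_dequeue_bulk_elem(task_ring, &t, sizeof(t),
				1, NULL) != 1) {
			pthread_mutex_lock(&lock);
			pthread_cond_wait(&cond, &lock);
			pthread_mutex_unlock(&lock);
			continue;
		}
		/* ...handle the task according to t.type and t.idx... */
	}
	return NULL;
}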

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  17 ++++
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 115 +++++++++++++++++++++++++-
 2 files changed, 130 insertions(+), 2 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 4e7c2557b7..2bbb868ec6 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -74,10 +74,22 @@ enum {
 };
 
 #define MLX5_VDPA_MAX_C_THRD 256
+#define MLX5_VDPA_MAX_TASKS_PER_THRD 4096
+#define MLX5_VDPA_TASKS_PER_DEV 64
+
+/* Generic task information and size must be multiple of 4B. */
+struct mlx5_vdpa_task {
+	struct mlx5_vdpa_priv *priv;
+	uint32_t *remaining_cnt;
+	uint32_t *err_cnt;
+	uint32_t idx;
+} __rte_packed __rte_aligned(4);
 
 /* Generic mlx5_vdpa_c_thread information. */
 struct mlx5_vdpa_c_thread {
 	pthread_t tid;
+	struct rte_ring *rng;
+	pthread_cond_t c_cond;
 };
 
 struct mlx5_vdpa_conf_thread_mng {
@@ -532,4 +544,9 @@ mlx5_vdpa_mult_threads_create(int cpu_core);
  */
 void
 mlx5_vdpa_mult_threads_destroy(bool need_unlock);
+
+bool
+mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
+		uint32_t thrd_idx,
+		uint32_t num);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index ba7d8b63b3..1fdc92d3ad 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -11,17 +11,103 @@
 #include <rte_alarm.h>
 #include <rte_tailq.h>
 #include <rte_ring_elem.h>
+#include <rte_ring_peek.h>
 
 #include <mlx5_common.h>
 
 #include "mlx5_vdpa_utils.h"
 #include "mlx5_vdpa.h"
 
+static inline uint32_t
+mlx5_vdpa_c_thrd_ring_dequeue_bulk(struct rte_ring *r,
+	void **obj, uint32_t n, uint32_t *avail)
+{
+	uint32_t m;
+
+	m = rte_ring_dequeue_bulk_elem_start(r, obj,
+		sizeof(struct mlx5_vdpa_task), n, avail);
+	n = (m == n) ? n : 0;
+	rte_ring_dequeue_elem_finish(r, n);
+	return n;
+}
+
+static inline uint32_t
+mlx5_vdpa_c_thrd_ring_enqueue_bulk(struct rte_ring *r,
+	void * const *obj, uint32_t n, uint32_t *free)
+{
+	uint32_t m;
+
+	m = rte_ring_enqueue_bulk_elem_start(r, n, free);
+	n = (m == n) ? n : 0;
+	rte_ring_enqueue_elem_finish(r, obj,
+		sizeof(struct mlx5_vdpa_task), n);
+	return n;
+}
+
+bool
+mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
+		uint32_t thrd_idx,
+		uint32_t num)
+{
+	struct rte_ring *rng = conf_thread_mng.cthrd[thrd_idx].rng;
+	struct mlx5_vdpa_task task[MLX5_VDPA_TASKS_PER_DEV];
+	uint32_t i;
+
+	MLX5_ASSERT(num <= MLX5_VDPA_TASKS_PER_DEV);
+	for (i = 0 ; i < num; i++) {
+		task[i].priv = priv;
+		/* To be added later. */
+	}
+	if (!mlx5_vdpa_c_thrd_ring_enqueue_bulk(rng, (void **)&task, num, NULL))
+		return -1;
+	for (i = 0 ; i < num; i++)
+		if (task[i].remaining_cnt)
+			__atomic_fetch_add(task[i].remaining_cnt, 1,
+				__ATOMIC_RELAXED);
+	/* wake up conf thread. */
+	pthread_mutex_lock(&conf_thread_mng.cthrd_lock);
+	pthread_cond_signal(&conf_thread_mng.cthrd[thrd_idx].c_cond);
+	pthread_mutex_unlock(&conf_thread_mng.cthrd_lock);
+	return 0;
+}
+
 static void *
 mlx5_vdpa_c_thread_handle(void *arg)
 {
-	/* To be added later. */
-	return arg;
+	struct mlx5_vdpa_conf_thread_mng *multhrd = arg;
+	pthread_t thread_id = pthread_self();
+	struct mlx5_vdpa_priv *priv;
+	struct mlx5_vdpa_task task;
+	struct rte_ring *rng;
+	uint32_t thrd_idx;
+	uint32_t task_num;
+
+	for (thrd_idx = 0; thrd_idx < multhrd->max_thrds;
+		thrd_idx++)
+		if (multhrd->cthrd[thrd_idx].tid == thread_id)
+			break;
+	if (thrd_idx >= multhrd->max_thrds)
+		return NULL;
+	rng = multhrd->cthrd[thrd_idx].rng;
+	while (1) {
+		task_num = mlx5_vdpa_c_thrd_ring_dequeue_bulk(rng,
+			(void **)&task, 1, NULL);
+		if (!task_num) {
+			/* No task and condition wait. */
+			pthread_mutex_lock(&multhrd->cthrd_lock);
+			pthread_cond_wait(
+				&multhrd->cthrd[thrd_idx].c_cond,
+				&multhrd->cthrd_lock);
+			pthread_mutex_unlock(&multhrd->cthrd_lock);
+		}
+		priv = task.priv;
+		if (priv == NULL)
+			continue;
+		__atomic_fetch_sub(task.remaining_cnt,
+			1, __ATOMIC_RELAXED);
+		/* To be added later. */
+	}
+	return NULL;
 }
 
 static void
@@ -34,6 +120,10 @@ mlx5_vdpa_c_thread_destroy(uint32_t thrd_idx, bool need_unlock)
 		if (need_unlock)
 			pthread_mutex_init(&conf_thread_mng.cthrd_lock, NULL);
 	}
+	if (conf_thread_mng.cthrd[thrd_idx].rng) {
+		rte_ring_free(conf_thread_mng.cthrd[thrd_idx].rng);
+		conf_thread_mng.cthrd[thrd_idx].rng = NULL;
+	}
 }
 
 static int
@@ -45,6 +135,7 @@ mlx5_vdpa_c_thread_create(int cpu_core)
 	rte_cpuset_t cpuset;
 	pthread_attr_t attr;
 	uint32_t thrd_idx;
+	uint32_t ring_num;
 	char name[32];
 	int ret;
 
@@ -60,8 +151,26 @@ mlx5_vdpa_c_thread_create(int cpu_core)
 		DRV_LOG(ERR, "Failed to set thread priority.");
 		goto c_thread_err;
 	}
+	ring_num = MLX5_VDPA_MAX_TASKS_PER_THRD / conf_thread_mng.max_thrds;
+	if (!ring_num) {
+		DRV_LOG(ERR, "Invalid ring number for thread.");
+		goto c_thread_err;
+	}
 	for (thrd_idx = 0; thrd_idx < conf_thread_mng.max_thrds;
 		thrd_idx++) {
+		snprintf(name, sizeof(name), "vDPA-mthread-ring-%d",
+			thrd_idx);
+		conf_thread_mng.cthrd[thrd_idx].rng = rte_ring_create_elem(name,
+			sizeof(struct mlx5_vdpa_task), ring_num,
+			rte_socket_id(),
+			RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ |
+			RING_F_EXACT_SZ);
+		if (!conf_thread_mng.cthrd[thrd_idx].rng) {
+			DRV_LOG(ERR,
+			"Failed to create vdpa multi-threads %d ring.",
+			thrd_idx);
+			goto c_thread_err;
+		}
 		ret = pthread_create(&conf_thread_mng.cthrd[thrd_idx].tid,
 				&attr, mlx5_vdpa_c_thread_handle,
 				(void *)&conf_thread_mng);
@@ -91,6 +200,8 @@ mlx5_vdpa_c_thread_create(int cpu_core)
 					name);
 		else
 			DRV_LOG(DEBUG, "Thread name: %s.", name);
+		pthread_cond_init(&conf_thread_mng.cthrd[thrd_idx].c_cond,
+			NULL);
 	}
 	pthread_mutex_unlock(&conf_thread_mng.cthrd_lock);
 	return 0;
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v1 12/17] vdpa/mlx5: add MT task for VM memory registration
  2022-06-06 11:20 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                     ` (20 preceding siblings ...)
  2022-06-06 11:20   ` [PATCH v1 11/17] vdpa/mlx5: add task ring for MT management Li Zhang
@ 2022-06-06 11:20   ` Li Zhang
  2022-06-06 11:21   ` [PATCH 12/16] vdpa/mlx5: add virtq creation task for MT management Li Zhang
                     ` (9 subsequent siblings)
  31 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:20 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

The driver creates a direct HW MR object
for each VM memory region,
which maps the VM physical address to
the actual physical address.

Later, after all the MRs are ready,
the driver creates an indirect MR to group all the direct MRs
into one virtual space from the HW perspective.

Create direct MRs in parallel using the MT mechanism.
After completion, the primary thread creates the indirect MR
needed for the following virtq configurations.

This optimization accelerates the LM process and
reduces its time by 5%.
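
A side note on the grouping step: in the KLM "Fixed Buffer Size" mode
every direct entry must share one size, so the driver works with the
GCD of the region sizes and the inter-region hole sizes. A tiny
illustration of that sizing rule (helper name and values are made up):

#include <stdint.h>

static uint64_t
gcd64(uint64_t a, uint64_t b)
{
	while (b != 0) {
		uint64_t t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/*
 * Two regions of 1 GiB and 256 MiB separated by a 128 MiB hole give a
 * 128 MiB fixed entry size, so the 1 GiB region is covered by eight
 * equally sized direct entries.
 */
uint64_t klm_size = gcd64(gcd64(1ULL << 30, 1ULL << 28), 1ULL << 27);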

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c         |   1 -
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  31 ++-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c |  47 ++++-
 drivers/vdpa/mlx5/mlx5_vdpa_mem.c     | 270 ++++++++++++++++++--------
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   |   6 +-
 5 files changed, 258 insertions(+), 97 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index a9d023ed08..e3b32fa087 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -768,7 +768,6 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 		rte_errno = rte_errno ? rte_errno : EINVAL;
 		goto error;
 	}
-	SLIST_INIT(&priv->mr_list);
 	pthread_mutex_lock(&priv_list_lock);
 	TAILQ_INSERT_TAIL(&priv_list, priv, next);
 	pthread_mutex_unlock(&priv_list_lock);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 2bbb868ec6..3316ce42be 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -59,7 +59,6 @@ struct mlx5_vdpa_event_qp {
 };
 
 struct mlx5_vdpa_query_mr {
-	SLIST_ENTRY(mlx5_vdpa_query_mr) next;
 	union {
 		struct ibv_mr *mr;
 		struct mlx5_devx_obj *mkey;
@@ -76,10 +75,17 @@ enum {
 #define MLX5_VDPA_MAX_C_THRD 256
 #define MLX5_VDPA_MAX_TASKS_PER_THRD 4096
 #define MLX5_VDPA_TASKS_PER_DEV 64
+#define MLX5_VDPA_MAX_MRS 0xFFFF
+
+/* Vdpa task types. */
+enum mlx5_vdpa_task_type {
+	MLX5_VDPA_TASK_REG_MR = 1,
+};
 
 /* Generic task information and size must be multiple of 4B. */
 struct mlx5_vdpa_task {
 	struct mlx5_vdpa_priv *priv;
+	enum mlx5_vdpa_task_type type;
 	uint32_t *remaining_cnt;
 	uint32_t *err_cnt;
 	uint32_t idx;
@@ -101,6 +107,14 @@ struct mlx5_vdpa_conf_thread_mng {
 };
 extern struct mlx5_vdpa_conf_thread_mng conf_thread_mng;
 
+struct mlx5_vdpa_vmem_info {
+	struct rte_vhost_memory *vmem;
+	uint32_t entries_num;
+	uint64_t gcd;
+	uint64_t size;
+	uint8_t mode;
+};
+
 struct mlx5_vdpa_virtq {
 	SLIST_ENTRY(mlx5_vdpa_virtq) next;
 	uint8_t enable;
@@ -176,7 +190,7 @@ struct mlx5_vdpa_priv {
 	struct mlx5_hca_vdpa_attr caps;
 	uint32_t gpa_mkey_index;
 	struct ibv_mr *null_mr;
-	struct rte_vhost_memory *vmem;
+	struct mlx5_vdpa_vmem_info vmem_info;
 	struct mlx5dv_devx_event_channel *eventc;
 	struct mlx5dv_devx_event_channel *err_chnl;
 	struct mlx5_uar uar;
@@ -187,11 +201,13 @@ struct mlx5_vdpa_priv {
 	uint8_t num_lag_ports;
 	uint64_t features; /* Negotiated features. */
 	uint16_t log_max_rqt_size;
+	uint16_t last_c_thrd_idx;
+	uint16_t num_mrs; /* Number of memory regions. */
 	struct mlx5_vdpa_steer steer;
 	struct mlx5dv_var *var;
 	void *virtq_db_addr;
 	struct mlx5_pmd_wrapped_mr lm_mr;
-	SLIST_HEAD(mr_list, mlx5_vdpa_query_mr) mr_list;
+	struct mlx5_vdpa_query_mr **mrs;
 	struct mlx5_vdpa_virtq virtqs[];
 };
 
@@ -548,5 +564,12 @@ mlx5_vdpa_mult_threads_destroy(bool need_unlock);
 bool
 mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
 		uint32_t thrd_idx,
-		uint32_t num);
+		enum mlx5_vdpa_task_type task_type,
+		uint32_t *bulk_refcnt, uint32_t *bulk_err_cnt,
+		void **task_data, uint32_t num);
+int
+mlx5_vdpa_register_mr(struct mlx5_vdpa_priv *priv, uint32_t idx);
+bool
+mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
+		uint32_t *err_cnt, uint32_t sleep_time);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index 1fdc92d3ad..10391931ae 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -47,16 +47,23 @@ mlx5_vdpa_c_thrd_ring_enqueue_bulk(struct rte_ring *r,
 bool
 mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
 		uint32_t thrd_idx,
-		uint32_t num)
+		enum mlx5_vdpa_task_type task_type,
+		uint32_t *remaining_cnt, uint32_t *err_cnt,
+		void **task_data, uint32_t num)
 {
 	struct rte_ring *rng = conf_thread_mng.cthrd[thrd_idx].rng;
 	struct mlx5_vdpa_task task[MLX5_VDPA_TASKS_PER_DEV];
+	uint32_t *data = (uint32_t *)task_data;
 	uint32_t i;
 
 	MLX5_ASSERT(num <= MLX5_VDPA_TASKS_PER_DEV);
 	for (i = 0 ; i < num; i++) {
 		task[i].priv = priv;
 		/* To be added later. */
+		task[i].type = task_type;
+		task[i].remaining_cnt = remaining_cnt;
+		task[i].err_cnt = err_cnt;
+		task[i].idx = data[i];
 	}
 	if (!mlx5_vdpa_c_thrd_ring_enqueue_bulk(rng, (void **)&task, num, NULL))
 		return -1;
@@ -71,6 +78,23 @@ mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
 	return 0;
 }
 
+bool
+mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
+		uint32_t *err_cnt, uint32_t sleep_time)
+{
+	/* Check and wait all tasks done. */
+	while (__atomic_load_n(remaining_cnt,
+		__ATOMIC_RELAXED) != 0) {
+		rte_delay_us_sleep(sleep_time);
+	}
+	if (__atomic_load_n(err_cnt,
+		__ATOMIC_RELAXED)) {
+		DRV_LOG(ERR, "Tasks done with error.");
+		return true;
+	}
+	return false;
+}
+
 static void *
 mlx5_vdpa_c_thread_handle(void *arg)
 {
@@ -81,6 +105,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
 	struct rte_ring *rng;
 	uint32_t thrd_idx;
 	uint32_t task_num;
+	int ret;
 
 	for (thrd_idx = 0; thrd_idx < multhrd->max_thrds;
 		thrd_idx++)
@@ -99,13 +124,29 @@ mlx5_vdpa_c_thread_handle(void *arg)
 				&multhrd->cthrd[thrd_idx].c_cond,
 				&multhrd->cthrd_lock);
 			pthread_mutex_unlock(&multhrd->cthrd_lock);
+			continue;
 		}
 		priv = task.priv;
 		if (priv == NULL)
 			continue;
-		__atomic_fetch_sub(task.remaining_cnt,
+		switch (task.type) {
+		case MLX5_VDPA_TASK_REG_MR:
+			ret = mlx5_vdpa_register_mr(priv, task.idx);
+			if (ret) {
+				DRV_LOG(ERR,
+				"Failed to register mr %d.", task.idx);
+				__atomic_fetch_add(task.err_cnt, 1,
+				__ATOMIC_RELAXED);
+			}
+			break;
+		default:
+			DRV_LOG(ERR, "Invalid vdpa task type %d.",
+			task.type);
+			break;
+		}
+		if (task.remaining_cnt)
+			__atomic_fetch_sub(task.remaining_cnt,
 			1, __ATOMIC_RELAXED);
-		/* To be added later. */
 	}
 	return NULL;
 }
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_mem.c b/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
index d6e3dd664b..e333f0bca6 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
@@ -17,25 +17,33 @@
 void
 mlx5_vdpa_mem_dereg(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_query_mr *mrs =
+		(struct mlx5_vdpa_query_mr *)priv->mrs;
 	struct mlx5_vdpa_query_mr *entry;
-	struct mlx5_vdpa_query_mr *next;
+	int i;
 
-	entry = SLIST_FIRST(&priv->mr_list);
-	while (entry) {
-		next = SLIST_NEXT(entry, next);
-		if (entry->is_indirect)
-			claim_zero(mlx5_devx_cmd_destroy(entry->mkey));
-		else
-			claim_zero(mlx5_glue->dereg_mr(entry->mr));
-		SLIST_REMOVE(&priv->mr_list, entry, mlx5_vdpa_query_mr, next);
-		rte_free(entry);
-		entry = next;
+	if (priv->mrs) {
+		for (i = priv->num_mrs - 1; i >= 0; i--) {
+			entry = &mrs[i];
+			if (entry->is_indirect) {
+				if (entry->mkey)
+					claim_zero(
+					mlx5_devx_cmd_destroy(entry->mkey));
+			} else {
+				if (entry->mr)
+					claim_zero(
+					mlx5_glue->dereg_mr(entry->mr));
+			}
+		}
+		rte_free(priv->mrs);
+		priv->mrs = NULL;
+		priv->num_mrs = 0;
 	}
-	SLIST_INIT(&priv->mr_list);
-	if (priv->vmem) {
-		free(priv->vmem);
-		priv->vmem = NULL;
+	if (priv->vmem_info.vmem) {
+		free(priv->vmem_info.vmem);
+		priv->vmem_info.vmem = NULL;
 	}
+	priv->gpa_mkey_index = 0;
 }
 
 static int
@@ -167,72 +175,29 @@ mlx5_vdpa_mem_cmp(struct rte_vhost_memory *mem1, struct rte_vhost_memory *mem2)
 #define KLM_SIZE_MAX_ALIGN(sz) ((sz) > MLX5_MAX_KLM_BYTE_COUNT ? \
 				MLX5_MAX_KLM_BYTE_COUNT : (sz))
 
-/*
- * The target here is to group all the physical memory regions of the
- * virtio device in one indirect mkey.
- * For KLM Fixed Buffer Size mode (HW find the translation entry in one
- * read according to the guest physical address):
- * All the sub-direct mkeys of it must be in the same size, hence, each
- * one of them should be in the GCD size of all the virtio memory
- * regions and the holes between them.
- * For KLM mode (each entry may be in different size so HW must iterate
- * the entries):
- * Each virtio memory region and each hole between them have one entry,
- * just need to cover the maximum allowed size(2G) by splitting entries
- * which their associated memory regions are bigger than 2G.
- * It means that each virtio memory region may be mapped to more than
- * one direct mkey in the 2 modes.
- * All the holes of invalid memory between the virtio memory regions
- * will be mapped to the null memory region for security.
- */
-int
-mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
+static int
+mlx5_vdpa_create_indirect_mkey(struct mlx5_vdpa_priv *priv)
 {
 	struct mlx5_devx_mkey_attr mkey_attr;
-	struct mlx5_vdpa_query_mr *entry = NULL;
-	struct rte_vhost_mem_region *reg = NULL;
-	uint8_t mode = 0;
-	uint32_t entries_num = 0;
-	uint32_t i;
-	uint64_t gcd = 0;
+	struct mlx5_vdpa_query_mr *mrs =
+		(struct mlx5_vdpa_query_mr *)priv->mrs;
+	struct mlx5_vdpa_query_mr *entry;
+	struct rte_vhost_mem_region *reg;
+	uint8_t mode = priv->vmem_info.mode;
+	uint32_t entries_num = priv->vmem_info.entries_num;
+	struct rte_vhost_memory *mem = priv->vmem_info.vmem;
+	struct mlx5_klm klm_array[entries_num];
+	uint64_t gcd = priv->vmem_info.gcd;
+	int ret = -rte_errno;
 	uint64_t klm_size;
-	uint64_t mem_size;
-	uint64_t k;
 	int klm_index = 0;
-	int ret;
-	struct rte_vhost_memory *mem = mlx5_vdpa_vhost_mem_regions_prepare
-			      (priv->vid, &mode, &mem_size, &gcd, &entries_num);
-	struct mlx5_klm klm_array[entries_num];
+	uint64_t k;
+	uint32_t i;
 
-	if (!mem)
-		return -rte_errno;
-	if (priv->vmem != NULL) {
-		if (mlx5_vdpa_mem_cmp(mem, priv->vmem) == 0) {
-			/* VM memory not changed, reuse resources. */
-			free(mem);
-			return 0;
-		}
-		mlx5_vdpa_mem_dereg(priv);
-	}
-	priv->vmem = mem;
+	/* If it is the last entry, create indirect mkey. */
 	for (i = 0; i < mem->nregions; i++) {
+		entry = &mrs[i];
 		reg = &mem->regions[i];
-		entry = rte_zmalloc(__func__, sizeof(*entry), 0);
-		if (!entry) {
-			ret = -ENOMEM;
-			DRV_LOG(ERR, "Failed to allocate mem entry memory.");
-			goto error;
-		}
-		entry->mr = mlx5_glue->reg_mr_iova(priv->cdev->pd,
-				       (void *)(uintptr_t)(reg->host_user_addr),
-				       reg->size, reg->guest_phys_addr,
-				       IBV_ACCESS_LOCAL_WRITE);
-		if (!entry->mr) {
-			DRV_LOG(ERR, "Failed to create direct Mkey.");
-			ret = -rte_errno;
-			goto error;
-		}
-		entry->is_indirect = 0;
 		if (i > 0) {
 			uint64_t sadd;
 			uint64_t empty_region_sz = reg->guest_phys_addr -
@@ -265,11 +230,10 @@ mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
 			klm_array[klm_index].address = reg->guest_phys_addr + k;
 			klm_index++;
 		}
-		SLIST_INSERT_HEAD(&priv->mr_list, entry, next);
 	}
 	memset(&mkey_attr, 0, sizeof(mkey_attr));
 	mkey_attr.addr = (uintptr_t)(mem->regions[0].guest_phys_addr);
-	mkey_attr.size = mem_size;
+	mkey_attr.size = priv->vmem_info.size;
 	mkey_attr.pd = priv->cdev->pdn;
 	mkey_attr.umem_id = 0;
 	/* Must be zero for KLM mode. */
@@ -278,25 +242,159 @@ mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
 	mkey_attr.pg_access = 0;
 	mkey_attr.klm_array = klm_array;
 	mkey_attr.klm_num = klm_index;
-	entry = rte_zmalloc(__func__, sizeof(*entry), 0);
-	if (!entry) {
-		DRV_LOG(ERR, "Failed to allocate memory for indirect entry.");
-		ret = -ENOMEM;
-		goto error;
-	}
+	entry = &mrs[mem->nregions];
 	entry->mkey = mlx5_devx_cmd_mkey_create(priv->cdev->ctx, &mkey_attr);
 	if (!entry->mkey) {
 		DRV_LOG(ERR, "Failed to create indirect Mkey.");
-		ret = -rte_errno;
-		goto error;
+		rte_errno = -ret;
+		return ret;
 	}
 	entry->is_indirect = 1;
-	SLIST_INSERT_HEAD(&priv->mr_list, entry, next);
 	priv->gpa_mkey_index = entry->mkey->id;
 	return 0;
+}
+
+/*
+ * The target here is to group all the physical memory regions of the
+ * virtio device in one indirect mkey.
+ * For KLM Fixed Buffer Size mode (HW find the translation entry in one
+ * read according to the guest phisical address):
+ * All the sub-direct mkeys of it must be in the same size, hence, each
+ * one of them should be in the GCD size of all the virtio memory
+ * regions and the holes between them.
+ * For KLM mode (each entry may be in different size so HW must iterate
+ * the entries):
+ * Each virtio memory region and each hole between them have one entry,
+ * just need to cover the maximum allowed size(2G) by splitting entries
+ * which their associated memory regions are bigger than 2G.
+ * It means that each virtio memory region may be mapped to more than
+ * one direct mkey in the 2 modes.
+ * All the holes of invalid memory between the virtio memory regions
+ * will be mapped to the null memory region for security.
+ */
+int
+mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
+{
+	void *mrs;
+	uint8_t mode = 0;
+	int ret = -rte_errno;
+	uint32_t i, thrd_idx, data[1];
+	uint32_t remaining_cnt = 0, err_cnt = 0, task_num = 0;
+	struct rte_vhost_memory *mem = mlx5_vdpa_vhost_mem_regions_prepare
+			(priv->vid, &mode, &priv->vmem_info.size,
+			&priv->vmem_info.gcd, &priv->vmem_info.entries_num);
+
+	if (!mem)
+		return -rte_errno;
+	if (priv->vmem_info.vmem != NULL) {
+		if (mlx5_vdpa_mem_cmp(mem, priv->vmem_info.vmem) == 0) {
+			/* VM memory not changed, reuse resources. */
+			free(mem);
+			return 0;
+		}
+		mlx5_vdpa_mem_dereg(priv);
+	}
+	priv->vmem_info.vmem = mem;
+	priv->vmem_info.mode = mode;
+	priv->num_mrs = mem->nregions;
+	if (!priv->num_mrs || priv->num_mrs >= MLX5_VDPA_MAX_MRS) {
+		DRV_LOG(ERR,
+		"Invalid number of memory regions.");
+		goto error;
+	}
+	/* The last one is indirect mkey entry. */
+	priv->num_mrs++;
+	mrs = rte_zmalloc("mlx5 vDPA memory regions",
+		sizeof(struct mlx5_vdpa_query_mr) * priv->num_mrs, 0);
+	priv->mrs = mrs;
+	if (!priv->mrs) {
+		DRV_LOG(ERR, "Failed to allocate private memory regions.");
+		goto error;
+	}
+	if (priv->use_c_thread) {
+		uint32_t main_task_idx[mem->nregions];
+
+		for (i = 0; i < mem->nregions; i++) {
+			thrd_idx = i % (conf_thread_mng.max_thrds + 1);
+			if (!thrd_idx) {
+				main_task_idx[task_num] = i;
+				task_num++;
+				continue;
+			}
+			thrd_idx = priv->last_c_thrd_idx + 1;
+			if (thrd_idx >= conf_thread_mng.max_thrds)
+				thrd_idx = 0;
+			priv->last_c_thrd_idx = thrd_idx;
+			data[0] = i;
+			if (mlx5_vdpa_task_add(priv, thrd_idx,
+				MLX5_VDPA_TASK_REG_MR,
+				&remaining_cnt, &err_cnt,
+				(void **)&data, 1)) {
+				DRV_LOG(ERR,
+				"Fail to add task mem region (%d)", i);
+				main_task_idx[task_num] = i;
+				task_num++;
+			}
+		}
+		for (i = 0; i < task_num; i++) {
+			ret = mlx5_vdpa_register_mr(priv,
+					main_task_idx[i]);
+			if (ret) {
+				DRV_LOG(ERR,
+				"Failed to register mem region %d.", i);
+				goto error;
+			}
+		}
+		if (mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining_cnt,
+			&err_cnt, 100)) {
+			DRV_LOG(ERR,
+			"Failed to wait register mem region tasks ready.");
+			goto error;
+		}
+	} else {
+		for (i = 0; i < mem->nregions; i++) {
+			ret = mlx5_vdpa_register_mr(priv, i);
+			if (ret) {
+				DRV_LOG(ERR,
+				"Failed to register mem region %d.", i);
+				goto error;
+			}
+		}
+	}
+	ret = mlx5_vdpa_create_indirect_mkey(priv);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to create indirect mkey .");
+		goto error;
+	}
+	return 0;
 error:
-	rte_free(entry);
 	mlx5_vdpa_mem_dereg(priv);
 	rte_errno = -ret;
 	return ret;
 }
+
+int
+mlx5_vdpa_register_mr(struct mlx5_vdpa_priv *priv, uint32_t idx)
+{
+	struct rte_vhost_memory *mem = priv->vmem_info.vmem;
+	struct mlx5_vdpa_query_mr *mrs =
+		(struct mlx5_vdpa_query_mr *)priv->mrs;
+	struct mlx5_vdpa_query_mr *entry;
+	struct rte_vhost_mem_region *reg;
+	int ret;
+
+	reg = &mem->regions[idx];
+	entry = &mrs[idx];
+	entry->mr = mlx5_glue->reg_mr_iova
+				      (priv->cdev->pd,
+				       (void *)(uintptr_t)(reg->host_user_addr),
+				       reg->size, reg->guest_phys_addr,
+				       IBV_ACCESS_LOCAL_WRITE);
+	if (!entry->mr) {
+		DRV_LOG(ERR, "Failed to create direct Mkey.");
+		ret = -rte_errno;
+		return ret;
+	}
+	entry->is_indirect = 0;
+	return 0;
+}
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 599809b09b..0b317655db 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -353,21 +353,21 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 		}
 	}
 	if (attr->q_type == MLX5_VIRTQ_TYPE_SPLIT) {
-		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
+		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem_info.vmem,
 					   (uint64_t)(uintptr_t)vq->desc);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get descriptor ring GPA.");
 			return -1;
 		}
 		attr->desc_addr = gpa;
-		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
+		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem_info.vmem,
 					   (uint64_t)(uintptr_t)vq->used);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get GPA for used ring.");
 			return -1;
 		}
 		attr->used_addr = gpa;
-		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
+		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem_info.vmem,
 					   (uint64_t)(uintptr_t)vq->avail);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get GPA for available ring.");
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH 12/16] vdpa/mlx5: add virtq creation task for MT management
  2022-06-06 11:20 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                     ` (21 preceding siblings ...)
  2022-06-06 11:20   ` [PATCH v1 12/17] vdpa/mlx5: add MT task for VM memory registration Li Zhang
@ 2022-06-06 11:21   ` Li Zhang
  2022-06-06 11:21   ` [PATCH 13/16] vdpa/mlx5: add virtq LM log task Li Zhang
                     ` (8 subsequent siblings)
  31 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:21 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

The virtq object and all its sub-resources require many
FW commands and can be accelerated by the MT management.
Split the virtq creation between the configuration threads.
This accelerates the LM process and reduces its time by 20%.
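
For instance, once all the worker SETUP_VIRTQ tasks have completed,
only the caller thread maps the guest kick doorbells; a condensed,
hedged view of that second phase (locking and error logging trimmed,
see the diff below for the full flow):

/*
 * Phase 2 of mlx5_vdpa_virtqs_prepare(): the parallel setups ran with
 * reg_kick == false, so the kick-fd interrupt registration is done
 * here, serially, by the caller thread after
 * mlx5_vdpa_c_thread_wait_bulk_tasks_done() returned success.
 */
for (i = 0; i < nr_vring; i++) {
	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];

	if (!virtq->enable || !virtq->configured)
		continue;
	if (rte_vhost_get_vhost_vring(priv->vid, i, &vq) ||
	    mlx5_vdpa_virtq_doorbell_setup(virtq, &vq, i))
		goto error;
}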

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.h         |   9 +-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c |  14 +++
 drivers/vdpa/mlx5/mlx5_vdpa_event.c   |   2 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 149 +++++++++++++++++++-------
 4 files changed, 134 insertions(+), 40 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 3316ce42be..35221f5ddc 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -80,6 +80,7 @@ enum {
 /* Vdpa task types. */
 enum mlx5_vdpa_task_type {
 	MLX5_VDPA_TASK_REG_MR = 1,
+	MLX5_VDPA_TASK_SETUP_VIRTQ,
 };
 
 /* Generic task information and size must be multiple of 4B. */
@@ -117,12 +118,12 @@ struct mlx5_vdpa_vmem_info {
 
 struct mlx5_vdpa_virtq {
 	SLIST_ENTRY(mlx5_vdpa_virtq) next;
-	uint8_t enable;
 	uint16_t index;
 	uint16_t vq_size;
 	uint8_t notifier_state;
-	bool stopped;
 	uint32_t configured:1;
+	uint32_t enable:1;
+	uint32_t stopped:1;
 	uint32_t version;
 	pthread_mutex_t virtq_lock;
 	struct mlx5_vdpa_priv *priv;
@@ -565,11 +566,13 @@ bool
 mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
 		uint32_t thrd_idx,
 		enum mlx5_vdpa_task_type task_type,
-		uint32_t *bulk_refcnt, uint32_t *bulk_err_cnt,
+		uint32_t *remaining_cnt, uint32_t *err_cnt,
 		void **task_data, uint32_t num);
 int
 mlx5_vdpa_register_mr(struct mlx5_vdpa_priv *priv, uint32_t idx);
 bool
 mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
 		uint32_t *err_cnt, uint32_t sleep_time);
+int
+mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index 10391931ae..1389d369ae 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -100,6 +100,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
 {
 	struct mlx5_vdpa_conf_thread_mng *multhrd = arg;
 	pthread_t thread_id = pthread_self();
+	struct mlx5_vdpa_virtq *virtq;
 	struct mlx5_vdpa_priv *priv;
 	struct mlx5_vdpa_task task;
 	struct rte_ring *rng;
@@ -139,6 +140,19 @@ mlx5_vdpa_c_thread_handle(void *arg)
 				__ATOMIC_RELAXED);
 			}
 			break;
+		case MLX5_VDPA_TASK_SETUP_VIRTQ:
+			virtq = &priv->virtqs[task.idx];
+			pthread_mutex_lock(&virtq->virtq_lock);
+			ret = mlx5_vdpa_virtq_setup(priv,
+				task.idx, false);
+			if (ret) {
+				DRV_LOG(ERR,
+					"Failed to setup virtq %d.", task.idx);
+				__atomic_fetch_add(
+					task.err_cnt, 1, __ATOMIC_RELAXED);
+			}
+			pthread_mutex_unlock(&virtq->virtq_lock);
+			break;
 		default:
 			DRV_LOG(ERR, "Invalid vdpa task type %d.",
 			task.type);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index b45fbac146..f782b6b832 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -371,7 +371,7 @@ mlx5_vdpa_err_interrupt_handler(void *cb_arg __rte_unused)
 			goto unlock;
 		if (rte_rdtsc() / rte_get_tsc_hz() < MLX5_VDPA_ERROR_TIME_SEC)
 			goto unlock;
-		virtq->stopped = true;
+		virtq->stopped = 1;
 		/* Query error info. */
 		if (mlx5_vdpa_virtq_query(priv, vq_index))
 			goto log;
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 0b317655db..db05220e76 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -111,8 +111,9 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
+		if (virtq->index != i)
+			continue;
 		pthread_mutex_lock(&virtq->virtq_lock);
-		virtq->configured = 0;
 		for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
 			if (virtq->umems[j].obj) {
 				claim_zero(mlx5_glue->devx_umem_dereg
@@ -131,7 +132,6 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 	}
 }
 
-
 static int
 mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 {
@@ -191,7 +191,7 @@ mlx5_vdpa_virtq_stop(struct mlx5_vdpa_priv *priv, int index)
 	ret = mlx5_vdpa_virtq_modify(virtq, 0);
 	if (ret)
 		return -1;
-	virtq->stopped = true;
+	virtq->stopped = 1;
 	DRV_LOG(DEBUG, "vid %u virtq %u was stopped.", priv->vid, index);
 	return mlx5_vdpa_virtq_query(priv, index);
 }
@@ -411,7 +411,38 @@ mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv)
 }
 
 static int
-mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
+mlx5_vdpa_virtq_doorbell_setup(struct mlx5_vdpa_virtq *virtq,
+		struct rte_vhost_vring *vq, int index)
+{
+	virtq->intr_handle =
+		rte_intr_instance_alloc(RTE_INTR_INSTANCE_F_SHARED);
+	if (virtq->intr_handle == NULL) {
+		DRV_LOG(ERR, "Fail to allocate intr_handle");
+		return -1;
+	}
+	if (rte_intr_fd_set(virtq->intr_handle, vq->kickfd))
+		return -1;
+	if (rte_intr_fd_get(virtq->intr_handle) == -1) {
+		DRV_LOG(WARNING, "Virtq %d kickfd is invalid.", index);
+	} else {
+		if (rte_intr_type_set(virtq->intr_handle,
+			RTE_INTR_HANDLE_EXT))
+			return -1;
+		if (rte_intr_callback_register(virtq->intr_handle,
+			mlx5_vdpa_virtq_kick_handler, virtq)) {
+			(void)rte_intr_fd_set(virtq->intr_handle, -1);
+			DRV_LOG(ERR, "Failed to register virtq %d interrupt.",
+				index);
+			return -1;
+		}
+		DRV_LOG(DEBUG, "Register fd %d interrupt for virtq %d.",
+			rte_intr_fd_get(virtq->intr_handle), index);
+	}
+	return 0;
+}
+
+int
+mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick)
 {
 	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
 	struct rte_vhost_vring vq;
@@ -455,33 +486,11 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 	rte_write32(virtq->index, priv->virtq_db_addr);
 	rte_spinlock_unlock(&priv->db_lock);
 	/* Setup doorbell mapping. */
-	virtq->intr_handle =
-		rte_intr_instance_alloc(RTE_INTR_INSTANCE_F_SHARED);
-	if (virtq->intr_handle == NULL) {
-		DRV_LOG(ERR, "Fail to allocate intr_handle");
-		goto error;
-	}
-
-	if (rte_intr_fd_set(virtq->intr_handle, vq.kickfd))
-		goto error;
-
-	if (rte_intr_fd_get(virtq->intr_handle) == -1) {
-		DRV_LOG(WARNING, "Virtq %d kickfd is invalid.", index);
-	} else {
-		if (rte_intr_type_set(virtq->intr_handle, RTE_INTR_HANDLE_EXT))
-			goto error;
-
-		if (rte_intr_callback_register(virtq->intr_handle,
-					       mlx5_vdpa_virtq_kick_handler,
-					       virtq)) {
-			(void)rte_intr_fd_set(virtq->intr_handle, -1);
+	if (reg_kick) {
+		if (mlx5_vdpa_virtq_doorbell_setup(virtq, &vq, index)) {
 			DRV_LOG(ERR, "Failed to register virtq %d interrupt.",
 				index);
 			goto error;
-		} else {
-			DRV_LOG(DEBUG, "Register fd %d interrupt for virtq %d.",
-				rte_intr_fd_get(virtq->intr_handle),
-				index);
 		}
 	}
 	/* Subscribe virtq error event. */
@@ -497,7 +506,6 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 		rte_errno = errno;
 		goto error;
 	}
-	virtq->stopped = false;
 	/* Initial notification to ask Qemu handling completed buffers. */
 	if (virtq->eqp.cq.callfd != -1)
 		eventfd_write(virtq->eqp.cq.callfd, (eventfd_t)1);
@@ -567,10 +575,12 @@ mlx5_vdpa_features_validate(struct mlx5_vdpa_priv *priv)
 int
 mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 {
-	uint32_t i;
-	uint16_t nr_vring = rte_vhost_get_vring_num(priv->vid);
 	int ret = rte_vhost_get_negotiated_features(priv->vid, &priv->features);
+	uint16_t nr_vring = rte_vhost_get_vring_num(priv->vid);
+	uint32_t remaining_cnt = 0, err_cnt = 0, task_num = 0;
+	uint32_t i, thrd_idx, data[1];
 	struct mlx5_vdpa_virtq *virtq;
+	struct rte_vhost_vring vq;
 
 	if (ret || mlx5_vdpa_features_validate(priv)) {
 		DRV_LOG(ERR, "Failed to configure negotiated features.");
@@ -590,16 +600,83 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 		return -1;
 	}
 	priv->nr_virtqs = nr_vring;
-	for (i = 0; i < nr_vring; i++) {
-		virtq = &priv->virtqs[i];
-		if (virtq->enable) {
+	if (priv->use_c_thread) {
+		uint32_t main_task_idx[nr_vring];
+
+		for (i = 0; i < nr_vring; i++) {
+			virtq = &priv->virtqs[i];
+			if (!virtq->enable)
+				continue;
+			thrd_idx = i % (conf_thread_mng.max_thrds + 1);
+			if (!thrd_idx) {
+				main_task_idx[task_num] = i;
+				task_num++;
+				continue;
+			}
+			thrd_idx = priv->last_c_thrd_idx + 1;
+			if (thrd_idx >= conf_thread_mng.max_thrds)
+				thrd_idx = 0;
+			priv->last_c_thrd_idx = thrd_idx;
+			data[0] = i;
+			if (mlx5_vdpa_task_add(priv, thrd_idx,
+				MLX5_VDPA_TASK_SETUP_VIRTQ,
+				&remaining_cnt, &err_cnt,
+				(void **)&data, 1)) {
+				DRV_LOG(ERR, "Fail to add "
+						"task setup virtq (%d).", i);
+				main_task_idx[task_num] = i;
+				task_num++;
+			}
+		}
+		for (i = 0; i < task_num; i++) {
+			virtq = &priv->virtqs[main_task_idx[i]];
 			pthread_mutex_lock(&virtq->virtq_lock);
-			if (mlx5_vdpa_virtq_setup(priv, i)) {
+			if (mlx5_vdpa_virtq_setup(priv,
+				main_task_idx[i], false)) {
 				pthread_mutex_unlock(&virtq->virtq_lock);
 				goto error;
 			}
 			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
+		if (mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining_cnt,
+			&err_cnt, 2000)) {
+			DRV_LOG(ERR,
+			"Failed to wait virt-queue setup tasks ready.");
+			goto error;
+		}
+		for (i = 0; i < nr_vring; i++) {
+			/* Setup doorbell mapping in order for Qume. */
+			virtq = &priv->virtqs[i];
+			pthread_mutex_lock(&virtq->virtq_lock);
+			if (!virtq->enable || !virtq->configured) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				continue;
+			}
+			if (rte_vhost_get_vhost_vring(priv->vid, i, &vq)) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				goto error;
+			}
+			if (mlx5_vdpa_virtq_doorbell_setup(virtq, &vq, i)) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				DRV_LOG(ERR,
+				"Failed to register virtq %d interrupt.", i);
+				goto error;
+			}
+			pthread_mutex_unlock(&virtq->virtq_lock);
+		}
+	} else {
+		for (i = 0; i < nr_vring; i++) {
+			virtq = &priv->virtqs[i];
+			pthread_mutex_lock(&virtq->virtq_lock);
+			if (virtq->enable) {
+				if (mlx5_vdpa_virtq_setup(priv, i, true)) {
+					pthread_mutex_unlock(
+						&virtq->virtq_lock);
+					goto error;
+				}
+			}
+			pthread_mutex_unlock(&virtq->virtq_lock);
+		}
 	}
 	return 0;
 error:
@@ -663,7 +740,7 @@ mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable)
 		mlx5_vdpa_virtq_unset(virtq);
 	}
 	if (enable) {
-		ret = mlx5_vdpa_virtq_setup(priv, index);
+		ret = mlx5_vdpa_virtq_setup(priv, index, true);
 		if (ret) {
 			DRV_LOG(ERR, "Failed to setup virtq %d.", index);
 			return ret;
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH 13/16] vdpa/mlx5: add virtq LM log task
  2022-06-06 11:20 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                     ` (22 preceding siblings ...)
  2022-06-06 11:21   ` [PATCH 12/16] vdpa/mlx5: add virtq creation task for MT management Li Zhang
@ 2022-06-06 11:21   ` Li Zhang
  2022-06-06 11:21   ` [PATCH v1 13/17] vdpa/mlx5: add virtq creation task for MT management Li Zhang
                     ` (7 subsequent siblings)
  31 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:21 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

Split the virtq LM logging between the configuration threads.
This accelerates the LM process and reduces its time by 20%.
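
As a rough illustration of the split described above, the sketch below
(plain C, not the driver code; dispatch_task() and run_locally() are
hypothetical stand-ins for the task-ring enqueue and for handling one
queue directly on the caller thread) spreads one task per queue
round-robin over the worker threads while every (N + 1)-th queue stays
with the caller:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the driver's task-ring enqueue. */
static bool
dispatch_task(uint32_t thrd, uint32_t qid)
{
	printf("queue %u -> worker thread %u\n", qid, thrd);
	return true;
}

/* Hypothetical stand-in for handling one queue on the caller thread. */
static void
run_locally(uint32_t qid)
{
	printf("queue %u handled by the caller\n", qid);
}

static void
split_queue_work(uint32_t nr_queues, uint32_t max_thrds)
{
	uint32_t local[nr_queues];	/* queues kept for the caller thread */
	uint32_t local_num = 0;
	uint32_t last_thrd = 0;
	uint32_t i;

	for (i = 0; i < nr_queues; i++) {
		/* Every (max_thrds + 1)-th queue stays on the caller thread. */
		if (i % (max_thrds + 1) == 0) {
			local[local_num++] = i;
			continue;
		}
		/* Round-robin over the worker threads. */
		last_thrd = (last_thrd + 1) % max_thrds;
		if (!dispatch_task(last_thrd, i))
			local[local_num++] = i;	/* enqueue failed, keep it local */
	}
	/* The caller handles its share while the workers run in parallel. */
	for (i = 0; i < local_num; i++)
		run_locally(local[i]);
}

int
main(void)
{
	split_queue_work(8, 3);
	return 0;
}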

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  3 +
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 34 +++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c      | 85 +++++++++++++++++++++------
 3 files changed, 105 insertions(+), 17 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 35221f5ddc..e08931719f 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -72,6 +72,8 @@ enum {
 	MLX5_VDPA_NOTIFIER_STATE_ERR
 };
 
+#define MLX5_VDPA_USED_RING_LEN(size) \
+	((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3)
 #define MLX5_VDPA_MAX_C_THRD 256
 #define MLX5_VDPA_MAX_TASKS_PER_THRD 4096
 #define MLX5_VDPA_TASKS_PER_DEV 64
@@ -81,6 +83,7 @@ enum {
 enum mlx5_vdpa_task_type {
 	MLX5_VDPA_TASK_REG_MR = 1,
 	MLX5_VDPA_TASK_SETUP_VIRTQ,
+	MLX5_VDPA_TASK_STOP_VIRTQ,
 };
 
 /* Generic task information and size must be multiple of 4B. */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index 1389d369ae..98369f0887 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -104,6 +104,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
 	struct mlx5_vdpa_priv *priv;
 	struct mlx5_vdpa_task task;
 	struct rte_ring *rng;
+	uint64_t features;
 	uint32_t thrd_idx;
 	uint32_t task_num;
 	int ret;
@@ -153,6 +154,39 @@ mlx5_vdpa_c_thread_handle(void *arg)
 			}
 			pthread_mutex_unlock(&virtq->virtq_lock);
 			break;
+		case MLX5_VDPA_TASK_STOP_VIRTQ:
+			virtq = &priv->virtqs[task.idx];
+			pthread_mutex_lock(&virtq->virtq_lock);
+			ret = mlx5_vdpa_virtq_stop(priv,
+					task.idx);
+			if (ret) {
+				DRV_LOG(ERR,
+				"Failed to stop virtq %d.",
+				task.idx);
+				__atomic_fetch_add(
+					task.err_cnt, 1,
+					__ATOMIC_RELAXED);
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				break;
+			}
+			ret = rte_vhost_get_negotiated_features(
+				priv->vid, &features);
+			if (ret) {
+				DRV_LOG(ERR,
+		"Failed to get negotiated features virtq %d.",
+				task.idx);
+				__atomic_fetch_add(
+					task.err_cnt, 1,
+					__ATOMIC_RELAXED);
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				break;
+			}
+			if (RTE_VHOST_NEED_LOG(features))
+				rte_vhost_log_used_vring(
+				priv->vid, task.idx, 0,
+			    MLX5_VDPA_USED_RING_LEN(virtq->vq_size));
+			pthread_mutex_unlock(&virtq->virtq_lock);
+			break;
 		default:
 			DRV_LOG(ERR, "Invalid vdpa task type %d.",
 			task.type);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
index efebf364d0..c2e78218ca 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
@@ -89,39 +89,90 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
 	return -1;
 }
 
-#define MLX5_VDPA_USED_RING_LEN(size) \
-	((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3)
-
 int
 mlx5_vdpa_lm_log(struct mlx5_vdpa_priv *priv)
 {
+	uint32_t remaining_cnt = 0, err_cnt = 0, task_num = 0;
+	uint32_t i, thrd_idx, data[1];
 	struct mlx5_vdpa_virtq *virtq;
 	uint64_t features;
-	int ret = rte_vhost_get_negotiated_features(priv->vid, &features);
-	int i;
+	int ret;
 
+	ret = rte_vhost_get_negotiated_features(priv->vid, &features);
 	if (ret) {
 		DRV_LOG(ERR, "Failed to get negotiated features.");
 		return -1;
 	}
-	if (!RTE_VHOST_NEED_LOG(features))
-		return 0;
-	for (i = 0; i < priv->nr_virtqs; ++i) {
-		virtq = &priv->virtqs[i];
-		if (!priv->virtqs[i].virtq) {
-			DRV_LOG(DEBUG, "virtq %d is invalid for LM log.", i);
-		} else {
+	if (priv->use_c_thread && priv->nr_virtqs) {
+		uint32_t main_task_idx[priv->nr_virtqs];
+
+		for (i = 0; i < priv->nr_virtqs; i++) {
+			virtq = &priv->virtqs[i];
+			if (!virtq->configured)
+				continue;
+			thrd_idx = i % (conf_thread_mng.max_thrds + 1);
+			if (!thrd_idx) {
+				main_task_idx[task_num] = i;
+				task_num++;
+				continue;
+			}
+			thrd_idx = priv->last_c_thrd_idx + 1;
+			if (thrd_idx >= conf_thread_mng.max_thrds)
+				thrd_idx = 0;
+			priv->last_c_thrd_idx = thrd_idx;
+			data[0] = i;
+			if (mlx5_vdpa_task_add(priv, thrd_idx,
+				MLX5_VDPA_TASK_STOP_VIRTQ,
+				&remaining_cnt, &err_cnt,
+				(void **)&data, 1)) {
+				DRV_LOG(ERR, "Fail to add "
+					"task stop virtq (%d).", i);
+				main_task_idx[task_num] = i;
+				task_num++;
+			}
+		}
+		for (i = 0; i < task_num; i++) {
+			virtq = &priv->virtqs[main_task_idx[i]];
 			pthread_mutex_lock(&virtq->virtq_lock);
-			ret = mlx5_vdpa_virtq_stop(priv, i);
+			ret = mlx5_vdpa_virtq_stop(priv,
+					main_task_idx[i]);
+			if (ret) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				DRV_LOG(ERR,
+				"Failed to stop virtq %d.", i);
+				return -1;
+			}
+			if (RTE_VHOST_NEED_LOG(features))
+				rte_vhost_log_used_vring(priv->vid, i, 0,
+				MLX5_VDPA_USED_RING_LEN(virtq->vq_size));
 			pthread_mutex_unlock(&virtq->virtq_lock);
+		}
+		if (mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining_cnt,
+			&err_cnt, 2000)) {
+			DRV_LOG(ERR,
+			"Failed to wait virt-queue setup tasks ready.");
+			return -1;
+		}
+	} else {
+		for (i = 0; i < priv->nr_virtqs; i++) {
+			virtq = &priv->virtqs[i];
+			pthread_mutex_lock(&virtq->virtq_lock);
+			if (!virtq->configured) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				continue;
+			}
+			ret = mlx5_vdpa_virtq_stop(priv, i);
 			if (ret) {
-				DRV_LOG(ERR, "Failed to stop virtq %d for LM "
-					"log.", i);
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				DRV_LOG(ERR,
+				"Failed to stop virtq %d for LM log.", i);
 				return -1;
 			}
+			if (RTE_VHOST_NEED_LOG(features))
+				rte_vhost_log_used_vring(priv->vid, i, 0,
+				MLX5_VDPA_USED_RING_LEN(virtq->vq_size));
+			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
-		rte_vhost_log_used_vring(priv->vid, i, 0,
-			      MLX5_VDPA_USED_RING_LEN(priv->virtqs[i].vq_size));
 	}
 	return 0;
 }
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v1 13/17] vdpa/mlx5: add virtq creation task for MT management
  2022-06-06 11:20 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                     ` (23 preceding siblings ...)
  2022-06-06 11:21   ` [PATCH 13/16] vdpa/mlx5: add virtq LM log task Li Zhang
@ 2022-06-06 11:21   ` Li Zhang
  2022-06-06 11:21   ` [PATCH 14/16] vdpa/mlx5: add device close task Li Zhang
                     ` (6 subsequent siblings)
  31 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:21 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

The virtq object and all its sub-resources require many
FW commands and can be accelerated by the MT management.
Split the virtq creation between the configuration threads.
This accelerates the LM process and reduces its time by 20%.
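
The caller still has to know when all the queued setup tasks are done
before it maps the doorbells. A minimal sketch of that completion
wait, assuming an atomic "remaining" counter that every worker
decrements plus an error counter checked at the end (the names and the
2 ms poll period are illustrative, not the driver's):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

/* Return true only when all tasks finished and none of them failed. */
static bool
wait_bulk_tasks_done(atomic_uint *remaining, atomic_uint *err_cnt,
		     uint32_t max_wait_ms)
{
	uint32_t waited_ms = 0;

	while (atomic_load_explicit(remaining, memory_order_relaxed) != 0 &&
	       waited_ms < max_wait_ms) {
		usleep(2000);		/* poll every 2 ms */
		waited_ms += 2;
	}
	if (atomic_load_explicit(remaining, memory_order_relaxed) != 0)
		return false;		/* timed out, tasks still pending */
	return atomic_load_explicit(err_cnt, memory_order_relaxed) == 0;
}

int
main(void)
{
	atomic_uint remaining = 0, errors = 0;

	/* Nothing was queued here, so the wait returns immediately. */
	return wait_bulk_tasks_done(&remaining, &errors, 2000) ? 0 : 1;
}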

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.h         |   9 +-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c |  14 +++
 drivers/vdpa/mlx5/mlx5_vdpa_event.c   |   2 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 149 +++++++++++++++++++-------
 4 files changed, 134 insertions(+), 40 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 3316ce42be..35221f5ddc 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -80,6 +80,7 @@ enum {
 /* Vdpa task types. */
 enum mlx5_vdpa_task_type {
 	MLX5_VDPA_TASK_REG_MR = 1,
+	MLX5_VDPA_TASK_SETUP_VIRTQ,
 };
 
 /* Generic task information and size must be multiple of 4B. */
@@ -117,12 +118,12 @@ struct mlx5_vdpa_vmem_info {
 
 struct mlx5_vdpa_virtq {
 	SLIST_ENTRY(mlx5_vdpa_virtq) next;
-	uint8_t enable;
 	uint16_t index;
 	uint16_t vq_size;
 	uint8_t notifier_state;
-	bool stopped;
 	uint32_t configured:1;
+	uint32_t enable:1;
+	uint32_t stopped:1;
 	uint32_t version;
 	pthread_mutex_t virtq_lock;
 	struct mlx5_vdpa_priv *priv;
@@ -565,11 +566,13 @@ bool
 mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
 		uint32_t thrd_idx,
 		enum mlx5_vdpa_task_type task_type,
-		uint32_t *bulk_refcnt, uint32_t *bulk_err_cnt,
+		uint32_t *remaining_cnt, uint32_t *err_cnt,
 		void **task_data, uint32_t num);
 int
 mlx5_vdpa_register_mr(struct mlx5_vdpa_priv *priv, uint32_t idx);
 bool
 mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
 		uint32_t *err_cnt, uint32_t sleep_time);
+int
+mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index 10391931ae..1389d369ae 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -100,6 +100,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
 {
 	struct mlx5_vdpa_conf_thread_mng *multhrd = arg;
 	pthread_t thread_id = pthread_self();
+	struct mlx5_vdpa_virtq *virtq;
 	struct mlx5_vdpa_priv *priv;
 	struct mlx5_vdpa_task task;
 	struct rte_ring *rng;
@@ -139,6 +140,19 @@ mlx5_vdpa_c_thread_handle(void *arg)
 				__ATOMIC_RELAXED);
 			}
 			break;
+		case MLX5_VDPA_TASK_SETUP_VIRTQ:
+			virtq = &priv->virtqs[task.idx];
+			pthread_mutex_lock(&virtq->virtq_lock);
+			ret = mlx5_vdpa_virtq_setup(priv,
+				task.idx, false);
+			if (ret) {
+				DRV_LOG(ERR,
+					"Failed to setup virtq %d.", task.idx);
+				__atomic_fetch_add(
+					task.err_cnt, 1, __ATOMIC_RELAXED);
+			}
+			pthread_mutex_unlock(&virtq->virtq_lock);
+			break;
 		default:
 			DRV_LOG(ERR, "Invalid vdpa task type %d.",
 			task.type);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index b45fbac146..f782b6b832 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -371,7 +371,7 @@ mlx5_vdpa_err_interrupt_handler(void *cb_arg __rte_unused)
 			goto unlock;
 		if (rte_rdtsc() / rte_get_tsc_hz() < MLX5_VDPA_ERROR_TIME_SEC)
 			goto unlock;
-		virtq->stopped = true;
+		virtq->stopped = 1;
 		/* Query error info. */
 		if (mlx5_vdpa_virtq_query(priv, vq_index))
 			goto log;
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 0b317655db..db05220e76 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -111,8 +111,9 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
+		if (virtq->index != i)
+			continue;
 		pthread_mutex_lock(&virtq->virtq_lock);
-		virtq->configured = 0;
 		for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
 			if (virtq->umems[j].obj) {
 				claim_zero(mlx5_glue->devx_umem_dereg
@@ -131,7 +132,6 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 	}
 }
 
-
 static int
 mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 {
@@ -191,7 +191,7 @@ mlx5_vdpa_virtq_stop(struct mlx5_vdpa_priv *priv, int index)
 	ret = mlx5_vdpa_virtq_modify(virtq, 0);
 	if (ret)
 		return -1;
-	virtq->stopped = true;
+	virtq->stopped = 1;
 	DRV_LOG(DEBUG, "vid %u virtq %u was stopped.", priv->vid, index);
 	return mlx5_vdpa_virtq_query(priv, index);
 }
@@ -411,7 +411,38 @@ mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv)
 }
 
 static int
-mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
+mlx5_vdpa_virtq_doorbell_setup(struct mlx5_vdpa_virtq *virtq,
+		struct rte_vhost_vring *vq, int index)
+{
+	virtq->intr_handle =
+		rte_intr_instance_alloc(RTE_INTR_INSTANCE_F_SHARED);
+	if (virtq->intr_handle == NULL) {
+		DRV_LOG(ERR, "Fail to allocate intr_handle");
+		return -1;
+	}
+	if (rte_intr_fd_set(virtq->intr_handle, vq->kickfd))
+		return -1;
+	if (rte_intr_fd_get(virtq->intr_handle) == -1) {
+		DRV_LOG(WARNING, "Virtq %d kickfd is invalid.", index);
+	} else {
+		if (rte_intr_type_set(virtq->intr_handle,
+			RTE_INTR_HANDLE_EXT))
+			return -1;
+		if (rte_intr_callback_register(virtq->intr_handle,
+			mlx5_vdpa_virtq_kick_handler, virtq)) {
+			(void)rte_intr_fd_set(virtq->intr_handle, -1);
+			DRV_LOG(ERR, "Failed to register virtq %d interrupt.",
+				index);
+			return -1;
+		}
+		DRV_LOG(DEBUG, "Register fd %d interrupt for virtq %d.",
+			rte_intr_fd_get(virtq->intr_handle), index);
+	}
+	return 0;
+}
+
+int
+mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick)
 {
 	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
 	struct rte_vhost_vring vq;
@@ -455,33 +486,11 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 	rte_write32(virtq->index, priv->virtq_db_addr);
 	rte_spinlock_unlock(&priv->db_lock);
 	/* Setup doorbell mapping. */
-	virtq->intr_handle =
-		rte_intr_instance_alloc(RTE_INTR_INSTANCE_F_SHARED);
-	if (virtq->intr_handle == NULL) {
-		DRV_LOG(ERR, "Fail to allocate intr_handle");
-		goto error;
-	}
-
-	if (rte_intr_fd_set(virtq->intr_handle, vq.kickfd))
-		goto error;
-
-	if (rte_intr_fd_get(virtq->intr_handle) == -1) {
-		DRV_LOG(WARNING, "Virtq %d kickfd is invalid.", index);
-	} else {
-		if (rte_intr_type_set(virtq->intr_handle, RTE_INTR_HANDLE_EXT))
-			goto error;
-
-		if (rte_intr_callback_register(virtq->intr_handle,
-					       mlx5_vdpa_virtq_kick_handler,
-					       virtq)) {
-			(void)rte_intr_fd_set(virtq->intr_handle, -1);
+	if (reg_kick) {
+		if (mlx5_vdpa_virtq_doorbell_setup(virtq, &vq, index)) {
 			DRV_LOG(ERR, "Failed to register virtq %d interrupt.",
 				index);
 			goto error;
-		} else {
-			DRV_LOG(DEBUG, "Register fd %d interrupt for virtq %d.",
-				rte_intr_fd_get(virtq->intr_handle),
-				index);
 		}
 	}
 	/* Subscribe virtq error event. */
@@ -497,7 +506,6 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 		rte_errno = errno;
 		goto error;
 	}
-	virtq->stopped = false;
 	/* Initial notification to ask Qemu handling completed buffers. */
 	if (virtq->eqp.cq.callfd != -1)
 		eventfd_write(virtq->eqp.cq.callfd, (eventfd_t)1);
@@ -567,10 +575,12 @@ mlx5_vdpa_features_validate(struct mlx5_vdpa_priv *priv)
 int
 mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 {
-	uint32_t i;
-	uint16_t nr_vring = rte_vhost_get_vring_num(priv->vid);
 	int ret = rte_vhost_get_negotiated_features(priv->vid, &priv->features);
+	uint16_t nr_vring = rte_vhost_get_vring_num(priv->vid);
+	uint32_t remaining_cnt = 0, err_cnt = 0, task_num = 0;
+	uint32_t i, thrd_idx, data[1];
 	struct mlx5_vdpa_virtq *virtq;
+	struct rte_vhost_vring vq;
 
 	if (ret || mlx5_vdpa_features_validate(priv)) {
 		DRV_LOG(ERR, "Failed to configure negotiated features.");
@@ -590,16 +600,83 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 		return -1;
 	}
 	priv->nr_virtqs = nr_vring;
-	for (i = 0; i < nr_vring; i++) {
-		virtq = &priv->virtqs[i];
-		if (virtq->enable) {
+	if (priv->use_c_thread) {
+		uint32_t main_task_idx[nr_vring];
+
+		for (i = 0; i < nr_vring; i++) {
+			virtq = &priv->virtqs[i];
+			if (!virtq->enable)
+				continue;
+			thrd_idx = i % (conf_thread_mng.max_thrds + 1);
+			if (!thrd_idx) {
+				main_task_idx[task_num] = i;
+				task_num++;
+				continue;
+			}
+			thrd_idx = priv->last_c_thrd_idx + 1;
+			if (thrd_idx >= conf_thread_mng.max_thrds)
+				thrd_idx = 0;
+			priv->last_c_thrd_idx = thrd_idx;
+			data[0] = i;
+			if (mlx5_vdpa_task_add(priv, thrd_idx,
+				MLX5_VDPA_TASK_SETUP_VIRTQ,
+				&remaining_cnt, &err_cnt,
+				(void **)&data, 1)) {
+				DRV_LOG(ERR, "Fail to add "
+						"task setup virtq (%d).", i);
+				main_task_idx[task_num] = i;
+				task_num++;
+			}
+		}
+		for (i = 0; i < task_num; i++) {
+			virtq = &priv->virtqs[main_task_idx[i]];
 			pthread_mutex_lock(&virtq->virtq_lock);
-			if (mlx5_vdpa_virtq_setup(priv, i)) {
+			if (mlx5_vdpa_virtq_setup(priv,
+				main_task_idx[i], false)) {
 				pthread_mutex_unlock(&virtq->virtq_lock);
 				goto error;
 			}
 			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
+		if (mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining_cnt,
+			&err_cnt, 2000)) {
+			DRV_LOG(ERR,
+			"Failed to wait virt-queue setup tasks ready.");
+			goto error;
+		}
+		for (i = 0; i < nr_vring; i++) {
+			/* Setup doorbell mapping in order for Qemu. */
+			virtq = &priv->virtqs[i];
+			pthread_mutex_lock(&virtq->virtq_lock);
+			if (!virtq->enable || !virtq->configured) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				continue;
+			}
+			if (rte_vhost_get_vhost_vring(priv->vid, i, &vq)) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				goto error;
+			}
+			if (mlx5_vdpa_virtq_doorbell_setup(virtq, &vq, i)) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				DRV_LOG(ERR,
+				"Failed to register virtq %d interrupt.", i);
+				goto error;
+			}
+			pthread_mutex_unlock(&virtq->virtq_lock);
+		}
+	} else {
+		for (i = 0; i < nr_vring; i++) {
+			virtq = &priv->virtqs[i];
+			pthread_mutex_lock(&virtq->virtq_lock);
+			if (virtq->enable) {
+				if (mlx5_vdpa_virtq_setup(priv, i, true)) {
+					pthread_mutex_unlock(
+						&virtq->virtq_lock);
+					goto error;
+				}
+			}
+			pthread_mutex_unlock(&virtq->virtq_lock);
+		}
 	}
 	return 0;
 error:
@@ -663,7 +740,7 @@ mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable)
 		mlx5_vdpa_virtq_unset(virtq);
 	}
 	if (enable) {
-		ret = mlx5_vdpa_virtq_setup(priv, index);
+		ret = mlx5_vdpa_virtq_setup(priv, index, true);
 		if (ret) {
 			DRV_LOG(ERR, "Failed to setup virtq %d.", index);
 			return ret;
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH 14/16] vdpa/mlx5: add device close task
  2022-06-06 11:20 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                     ` (24 preceding siblings ...)
  2022-06-06 11:21   ` [PATCH v1 13/17] vdpa/mlx5: add virtq creation task for MT management Li Zhang
@ 2022-06-06 11:21   ` Li Zhang
  2022-06-06 11:21   ` [PATCH v1 14/17] vdpa/mlx5: add virtq LM log task Li Zhang
                     ` (5 subsequent siblings)
  31 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:21 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

Split the virtq device close tasks, run after the
virt-queues are stopped, between the configuration threads.
This accelerates the LM process and
reduces its time by 50%.
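
A minimal sketch of the non-blocking close pattern, with hypothetical
names: the caller raises a "close in progress" flag, hands the heavy
teardown to a worker thread, and any later reconfiguration first waits
for the flag to drop:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

struct dev_state {
	atomic_uint close_in_progress;
};

/* Worker side: do the teardown, then clear the flag last. */
static void
worker_handle_close(struct dev_state *dev)
{
	/* ... release queues, steering and CQ resources here ... */
	atomic_store_explicit(&dev->close_in_progress, 0,
			      memory_order_release);
}

/* Caller side: mark the close as pending before queuing the task. */
static void
request_async_close(struct dev_state *dev)
{
	atomic_store_explicit(&dev->close_in_progress, 1,
			      memory_order_relaxed);
	/* Enqueue worker_handle_close() to a configuration thread here. */
}

/* Reconfiguration path: wait (bounded) until the close has finished. */
static bool
wait_close_done(struct dev_state *dev, uint32_t max_polls)
{
	uint32_t polls = 0;

	while (atomic_load_explicit(&dev->close_in_progress,
				    memory_order_acquire) != 0 &&
	       polls < max_polls) {
		usleep(10000);	/* 10 ms between polls */
		polls++;
	}
	return atomic_load_explicit(&dev->close_in_progress,
				    memory_order_acquire) == 0;
}

int
main(void)
{
	struct dev_state dev = { .close_in_progress = 0 };

	request_async_close(&dev);
	worker_handle_close(&dev);	/* normally runs on the worker thread */
	return wait_close_done(&dev, 100) ? 0 : 1;
}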

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c         | 56 +++++++++++++++++++++++++--
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  8 ++++
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 20 +++++++++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 14 +++++++
 4 files changed, 94 insertions(+), 4 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index e3b32fa087..d000854c08 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -245,7 +245,7 @@ mlx5_vdpa_mtu_set(struct mlx5_vdpa_priv *priv)
 	return kern_mtu == vhost_mtu ? 0 : -1;
 }
 
-static void
+void
 mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv)
 {
 	/* Clean pre-created resource in dev removal only. */
@@ -254,6 +254,26 @@ mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv)
 	mlx5_vdpa_mem_dereg(priv);
 }
 
+static bool
+mlx5_vdpa_wait_dev_close_tasks_done(struct mlx5_vdpa_priv *priv)
+{
+	uint32_t timeout = 0;
+
+	/* Check and wait all close tasks done. */
+	while (__atomic_load_n(&priv->dev_close_progress,
+		__ATOMIC_RELAXED) != 0 && timeout < 1000) {
+		rte_delay_us_sleep(10000);
+		timeout++;
+	}
+	if (priv->dev_close_progress) {
+		DRV_LOG(ERR,
+		"Failed to wait close device tasks done vid %d.",
+		priv->vid);
+		return true;
+	}
+	return false;
+}
+
 static int
 mlx5_vdpa_dev_close(int vid)
 {
@@ -271,6 +291,27 @@ mlx5_vdpa_dev_close(int vid)
 		ret |= mlx5_vdpa_lm_log(priv);
 		priv->state = MLX5_VDPA_STATE_IN_PROGRESS;
 	}
+	if (priv->use_c_thread) {
+		if (priv->last_c_thrd_idx >=
+			(conf_thread_mng.max_thrds - 1))
+			priv->last_c_thrd_idx = 0;
+		else
+			priv->last_c_thrd_idx++;
+		__atomic_store_n(&priv->dev_close_progress,
+			1, __ATOMIC_RELAXED);
+		if (mlx5_vdpa_task_add(priv,
+			priv->last_c_thrd_idx,
+			MLX5_VDPA_TASK_DEV_CLOSE_NOWAIT,
+			NULL, NULL, NULL, 1)) {
+			DRV_LOG(ERR,
+			"Fail to add dev close task. ");
+			goto single_thrd;
+		}
+		priv->state = MLX5_VDPA_STATE_PROBED;
+		DRV_LOG(INFO, "vDPA device %d was closed.", vid);
+		return ret;
+	}
+single_thrd:
 	pthread_mutex_lock(&priv->steer_update_lock);
 	mlx5_vdpa_steer_unset(priv);
 	pthread_mutex_unlock(&priv->steer_update_lock);
@@ -278,10 +319,12 @@ mlx5_vdpa_dev_close(int vid)
 	mlx5_vdpa_drain_cq(priv);
 	if (priv->lm_mr.addr)
 		mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
-	priv->state = MLX5_VDPA_STATE_PROBED;
 	if (!priv->connected)
 		mlx5_vdpa_dev_cache_clean(priv);
 	priv->vid = 0;
+	__atomic_store_n(&priv->dev_close_progress, 0,
+		__ATOMIC_RELAXED);
+	priv->state = MLX5_VDPA_STATE_PROBED;
 	DRV_LOG(INFO, "vDPA device %d was closed.", vid);
 	return ret;
 }
@@ -302,6 +345,8 @@ mlx5_vdpa_dev_config(int vid)
 		DRV_LOG(ERR, "Failed to reconfigure vid %d.", vid);
 		return -1;
 	}
+	if (mlx5_vdpa_wait_dev_close_tasks_done(priv))
+		return -1;
 	priv->vid = vid;
 	priv->connected = true;
 	if (mlx5_vdpa_mtu_set(priv))
@@ -444,8 +489,11 @@ mlx5_vdpa_dev_cleanup(int vid)
 		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
 		return -1;
 	}
-	if (priv->state == MLX5_VDPA_STATE_PROBED)
+	if (priv->state == MLX5_VDPA_STATE_PROBED) {
+		if (priv->use_c_thread)
+			mlx5_vdpa_wait_dev_close_tasks_done(priv);
 		mlx5_vdpa_dev_cache_clean(priv);
+	}
 	priv->connected = false;
 	return 0;
 }
@@ -839,6 +887,8 @@ mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv)
 {
 	if (priv->state == MLX5_VDPA_STATE_CONFIGURED)
 		mlx5_vdpa_dev_close(priv->vid);
+	if (priv->use_c_thread)
+		mlx5_vdpa_wait_dev_close_tasks_done(priv);
 	mlx5_vdpa_release_dev_resources(priv);
 	if (priv->vdev)
 		rte_vdpa_unregister_device(priv->vdev);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index e08931719f..b6392b9d66 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -84,6 +84,7 @@ enum mlx5_vdpa_task_type {
 	MLX5_VDPA_TASK_REG_MR = 1,
 	MLX5_VDPA_TASK_SETUP_VIRTQ,
 	MLX5_VDPA_TASK_STOP_VIRTQ,
+	MLX5_VDPA_TASK_DEV_CLOSE_NOWAIT,
 };
 
 /* Generic task information and size must be multiple of 4B. */
@@ -206,6 +207,7 @@ struct mlx5_vdpa_priv {
 	uint64_t features; /* Negotiated features. */
 	uint16_t log_max_rqt_size;
 	uint16_t last_c_thrd_idx;
+	uint16_t dev_close_progress;
 	uint16_t num_mrs; /* Number of memory regions. */
 	struct mlx5_vdpa_steer steer;
 	struct mlx5dv_var *var;
@@ -578,4 +580,10 @@ mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
 		uint32_t *err_cnt, uint32_t sleep_time);
 int
 mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick);
+void
+mlx5_vdpa_vq_destroy(struct mlx5_vdpa_virtq *virtq);
+void
+mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv);
+void
+mlx5_vdpa_virtq_unreg_intr_handle_all(struct mlx5_vdpa_priv *priv);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index 98369f0887..bb2279440b 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -63,7 +63,8 @@ mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
 		task[i].type = task_type;
 		task[i].remaining_cnt = remaining_cnt;
 		task[i].err_cnt = err_cnt;
-		task[i].idx = data[i];
+		if (data)
+			task[i].idx = data[i];
 	}
 	if (!mlx5_vdpa_c_thrd_ring_enqueue_bulk(rng, (void **)&task, num, NULL))
 		return -1;
@@ -187,6 +188,23 @@ mlx5_vdpa_c_thread_handle(void *arg)
 			    MLX5_VDPA_USED_RING_LEN(virtq->vq_size));
 			pthread_mutex_unlock(&virtq->virtq_lock);
 			break;
+		case MLX5_VDPA_TASK_DEV_CLOSE_NOWAIT:
+			mlx5_vdpa_virtq_unreg_intr_handle_all(priv);
+			pthread_mutex_lock(&priv->steer_update_lock);
+			mlx5_vdpa_steer_unset(priv);
+			pthread_mutex_unlock(&priv->steer_update_lock);
+			mlx5_vdpa_virtqs_release(priv);
+			mlx5_vdpa_drain_cq(priv);
+			if (priv->lm_mr.addr)
+				mlx5_os_wrapped_mkey_destroy(
+					&priv->lm_mr);
+			if (!priv->connected)
+				mlx5_vdpa_dev_cache_clean(priv);
+			priv->vid = 0;
+			__atomic_store_n(
+				&priv->dev_close_progress, 0,
+				__ATOMIC_RELAXED);
+			break;
 		default:
 			DRV_LOG(ERR, "Invalid vdpa task type %d.",
 			task.type);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index db05220e76..a08c854b14 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -102,6 +102,20 @@ mlx5_vdpa_virtq_unregister_intr_handle(struct mlx5_vdpa_virtq *virtq)
 	virtq->intr_handle = NULL;
 }
 
+void
+mlx5_vdpa_virtq_unreg_intr_handle_all(struct mlx5_vdpa_priv *priv)
+{
+	uint32_t i;
+	struct mlx5_vdpa_virtq *virtq;
+
+	for (i = 0; i < priv->nr_virtqs; i++) {
+		virtq = &priv->virtqs[i];
+		pthread_mutex_lock(&virtq->virtq_lock);
+		mlx5_vdpa_virtq_unregister_intr_handle(virtq);
+		pthread_mutex_unlock(&virtq->virtq_lock);
+	}
+}
+
 /* Release cached VQ resources. */
 void
 mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v1 14/17] vdpa/mlx5: add virtq LM log task
  2022-06-06 11:20 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                     ` (25 preceding siblings ...)
  2022-06-06 11:21   ` [PATCH 14/16] vdpa/mlx5: add device close task Li Zhang
@ 2022-06-06 11:21   ` Li Zhang
  2022-06-06 11:21   ` [PATCH v1 15/17] vdpa/mlx5: add device close task Li Zhang
                     ` (4 subsequent siblings)
  31 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:21 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

Split the virtq LM logging between the configuration threads.
This accelerates the LM process and reduces its time by 20%.

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  3 +
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 34 +++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c      | 85 +++++++++++++++++++++------
 3 files changed, 105 insertions(+), 17 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 35221f5ddc..e08931719f 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -72,6 +72,8 @@ enum {
 	MLX5_VDPA_NOTIFIER_STATE_ERR
 };
 
+#define MLX5_VDPA_USED_RING_LEN(size) \
+	((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3)
 #define MLX5_VDPA_MAX_C_THRD 256
 #define MLX5_VDPA_MAX_TASKS_PER_THRD 4096
 #define MLX5_VDPA_TASKS_PER_DEV 64
@@ -81,6 +83,7 @@ enum {
 enum mlx5_vdpa_task_type {
 	MLX5_VDPA_TASK_REG_MR = 1,
 	MLX5_VDPA_TASK_SETUP_VIRTQ,
+	MLX5_VDPA_TASK_STOP_VIRTQ,
 };
 
 /* Generic task information and size must be multiple of 4B. */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index 1389d369ae..98369f0887 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -104,6 +104,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
 	struct mlx5_vdpa_priv *priv;
 	struct mlx5_vdpa_task task;
 	struct rte_ring *rng;
+	uint64_t features;
 	uint32_t thrd_idx;
 	uint32_t task_num;
 	int ret;
@@ -153,6 +154,39 @@ mlx5_vdpa_c_thread_handle(void *arg)
 			}
 			pthread_mutex_unlock(&virtq->virtq_lock);
 			break;
+		case MLX5_VDPA_TASK_STOP_VIRTQ:
+			virtq = &priv->virtqs[task.idx];
+			pthread_mutex_lock(&virtq->virtq_lock);
+			ret = mlx5_vdpa_virtq_stop(priv,
+					task.idx);
+			if (ret) {
+				DRV_LOG(ERR,
+				"Failed to stop virtq %d.",
+				task.idx);
+				__atomic_fetch_add(
+					task.err_cnt, 1,
+					__ATOMIC_RELAXED);
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				break;
+			}
+			ret = rte_vhost_get_negotiated_features(
+				priv->vid, &features);
+			if (ret) {
+				DRV_LOG(ERR,
+		"Failed to get negotiated features virtq %d.",
+				task.idx);
+				__atomic_fetch_add(
+					task.err_cnt, 1,
+					__ATOMIC_RELAXED);
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				break;
+			}
+			if (RTE_VHOST_NEED_LOG(features))
+				rte_vhost_log_used_vring(
+				priv->vid, task.idx, 0,
+			    MLX5_VDPA_USED_RING_LEN(virtq->vq_size));
+			pthread_mutex_unlock(&virtq->virtq_lock);
+			break;
 		default:
 			DRV_LOG(ERR, "Invalid vdpa task type %d.",
 			task.type);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
index efebf364d0..c2e78218ca 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
@@ -89,39 +89,90 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
 	return -1;
 }
 
-#define MLX5_VDPA_USED_RING_LEN(size) \
-	((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3)
-
 int
 mlx5_vdpa_lm_log(struct mlx5_vdpa_priv *priv)
 {
+	uint32_t remaining_cnt = 0, err_cnt = 0, task_num = 0;
+	uint32_t i, thrd_idx, data[1];
 	struct mlx5_vdpa_virtq *virtq;
 	uint64_t features;
-	int ret = rte_vhost_get_negotiated_features(priv->vid, &features);
-	int i;
+	int ret;
 
+	ret = rte_vhost_get_negotiated_features(priv->vid, &features);
 	if (ret) {
 		DRV_LOG(ERR, "Failed to get negotiated features.");
 		return -1;
 	}
-	if (!RTE_VHOST_NEED_LOG(features))
-		return 0;
-	for (i = 0; i < priv->nr_virtqs; ++i) {
-		virtq = &priv->virtqs[i];
-		if (!priv->virtqs[i].virtq) {
-			DRV_LOG(DEBUG, "virtq %d is invalid for LM log.", i);
-		} else {
+	if (priv->use_c_thread && priv->nr_virtqs) {
+		uint32_t main_task_idx[priv->nr_virtqs];
+
+		for (i = 0; i < priv->nr_virtqs; i++) {
+			virtq = &priv->virtqs[i];
+			if (!virtq->configured)
+				continue;
+			thrd_idx = i % (conf_thread_mng.max_thrds + 1);
+			if (!thrd_idx) {
+				main_task_idx[task_num] = i;
+				task_num++;
+				continue;
+			}
+			thrd_idx = priv->last_c_thrd_idx + 1;
+			if (thrd_idx >= conf_thread_mng.max_thrds)
+				thrd_idx = 0;
+			priv->last_c_thrd_idx = thrd_idx;
+			data[0] = i;
+			if (mlx5_vdpa_task_add(priv, thrd_idx,
+				MLX5_VDPA_TASK_STOP_VIRTQ,
+				&remaining_cnt, &err_cnt,
+				(void **)&data, 1)) {
+				DRV_LOG(ERR, "Fail to add "
+					"task stop virtq (%d).", i);
+				main_task_idx[task_num] = i;
+				task_num++;
+			}
+		}
+		for (i = 0; i < task_num; i++) {
+			virtq = &priv->virtqs[main_task_idx[i]];
 			pthread_mutex_lock(&virtq->virtq_lock);
-			ret = mlx5_vdpa_virtq_stop(priv, i);
+			ret = mlx5_vdpa_virtq_stop(priv,
+					main_task_idx[i]);
+			if (ret) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				DRV_LOG(ERR,
+				"Failed to stop virtq %d.", i);
+				return -1;
+			}
+			if (RTE_VHOST_NEED_LOG(features))
+				rte_vhost_log_used_vring(priv->vid, i, 0,
+				MLX5_VDPA_USED_RING_LEN(virtq->vq_size));
 			pthread_mutex_unlock(&virtq->virtq_lock);
+		}
+		if (mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining_cnt,
+			&err_cnt, 2000)) {
+			DRV_LOG(ERR,
+			"Failed to wait virt-queue setup tasks ready.");
+			return -1;
+		}
+	} else {
+		for (i = 0; i < priv->nr_virtqs; i++) {
+			virtq = &priv->virtqs[i];
+			pthread_mutex_lock(&virtq->virtq_lock);
+			if (!virtq->configured) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				continue;
+			}
+			ret = mlx5_vdpa_virtq_stop(priv, i);
 			if (ret) {
-				DRV_LOG(ERR, "Failed to stop virtq %d for LM "
-					"log.", i);
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				DRV_LOG(ERR,
+				"Failed to stop virtq %d for LM log.", i);
 				return -1;
 			}
+			if (RTE_VHOST_NEED_LOG(features))
+				rte_vhost_log_used_vring(priv->vid, i, 0,
+				MLX5_VDPA_USED_RING_LEN(virtq->vq_size));
+			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
-		rte_vhost_log_used_vring(priv->vid, i, 0,
-			      MLX5_VDPA_USED_RING_LEN(priv->virtqs[i].vq_size));
 	}
 	return 0;
 }
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v1 15/17] vdpa/mlx5: add device close task
  2022-06-06 11:20 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                     ` (26 preceding siblings ...)
  2022-06-06 11:21   ` [PATCH v1 14/17] vdpa/mlx5: add virtq LM log task Li Zhang
@ 2022-06-06 11:21   ` Li Zhang
  2022-06-06 11:21   ` [PATCH 15/16] vdpa/mlx5: add virtq sub-resources creation Li Zhang
                     ` (3 subsequent siblings)
  31 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:21 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

Split the virtq device close tasks, run after the
virt-queues are stopped, between the configuration threads.
This accelerates the LM process and
reduces its time by 50%.

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c         | 56 +++++++++++++++++++++++++--
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  8 ++++
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 20 +++++++++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 14 +++++++
 4 files changed, 94 insertions(+), 4 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index e3b32fa087..d000854c08 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -245,7 +245,7 @@ mlx5_vdpa_mtu_set(struct mlx5_vdpa_priv *priv)
 	return kern_mtu == vhost_mtu ? 0 : -1;
 }
 
-static void
+void
 mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv)
 {
 	/* Clean pre-created resource in dev removal only. */
@@ -254,6 +254,26 @@ mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv)
 	mlx5_vdpa_mem_dereg(priv);
 }
 
+static bool
+mlx5_vdpa_wait_dev_close_tasks_done(struct mlx5_vdpa_priv *priv)
+{
+	uint32_t timeout = 0;
+
+	/* Check and wait all close tasks done. */
+	while (__atomic_load_n(&priv->dev_close_progress,
+		__ATOMIC_RELAXED) != 0 && timeout < 1000) {
+		rte_delay_us_sleep(10000);
+		timeout++;
+	}
+	if (priv->dev_close_progress) {
+		DRV_LOG(ERR,
+		"Failed to wait close device tasks done vid %d.",
+		priv->vid);
+		return true;
+	}
+	return false;
+}
+
 static int
 mlx5_vdpa_dev_close(int vid)
 {
@@ -271,6 +291,27 @@ mlx5_vdpa_dev_close(int vid)
 		ret |= mlx5_vdpa_lm_log(priv);
 		priv->state = MLX5_VDPA_STATE_IN_PROGRESS;
 	}
+	if (priv->use_c_thread) {
+		if (priv->last_c_thrd_idx >=
+			(conf_thread_mng.max_thrds - 1))
+			priv->last_c_thrd_idx = 0;
+		else
+			priv->last_c_thrd_idx++;
+		__atomic_store_n(&priv->dev_close_progress,
+			1, __ATOMIC_RELAXED);
+		if (mlx5_vdpa_task_add(priv,
+			priv->last_c_thrd_idx,
+			MLX5_VDPA_TASK_DEV_CLOSE_NOWAIT,
+			NULL, NULL, NULL, 1)) {
+			DRV_LOG(ERR,
+			"Fail to add dev close task. ");
+			goto single_thrd;
+		}
+		priv->state = MLX5_VDPA_STATE_PROBED;
+		DRV_LOG(INFO, "vDPA device %d was closed.", vid);
+		return ret;
+	}
+single_thrd:
 	pthread_mutex_lock(&priv->steer_update_lock);
 	mlx5_vdpa_steer_unset(priv);
 	pthread_mutex_unlock(&priv->steer_update_lock);
@@ -278,10 +319,12 @@ mlx5_vdpa_dev_close(int vid)
 	mlx5_vdpa_drain_cq(priv);
 	if (priv->lm_mr.addr)
 		mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
-	priv->state = MLX5_VDPA_STATE_PROBED;
 	if (!priv->connected)
 		mlx5_vdpa_dev_cache_clean(priv);
 	priv->vid = 0;
+	__atomic_store_n(&priv->dev_close_progress, 0,
+		__ATOMIC_RELAXED);
+	priv->state = MLX5_VDPA_STATE_PROBED;
 	DRV_LOG(INFO, "vDPA device %d was closed.", vid);
 	return ret;
 }
@@ -302,6 +345,8 @@ mlx5_vdpa_dev_config(int vid)
 		DRV_LOG(ERR, "Failed to reconfigure vid %d.", vid);
 		return -1;
 	}
+	if (mlx5_vdpa_wait_dev_close_tasks_done(priv))
+		return -1;
 	priv->vid = vid;
 	priv->connected = true;
 	if (mlx5_vdpa_mtu_set(priv))
@@ -444,8 +489,11 @@ mlx5_vdpa_dev_cleanup(int vid)
 		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
 		return -1;
 	}
-	if (priv->state == MLX5_VDPA_STATE_PROBED)
+	if (priv->state == MLX5_VDPA_STATE_PROBED) {
+		if (priv->use_c_thread)
+			mlx5_vdpa_wait_dev_close_tasks_done(priv);
 		mlx5_vdpa_dev_cache_clean(priv);
+	}
 	priv->connected = false;
 	return 0;
 }
@@ -839,6 +887,8 @@ mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv)
 {
 	if (priv->state == MLX5_VDPA_STATE_CONFIGURED)
 		mlx5_vdpa_dev_close(priv->vid);
+	if (priv->use_c_thread)
+		mlx5_vdpa_wait_dev_close_tasks_done(priv);
 	mlx5_vdpa_release_dev_resources(priv);
 	if (priv->vdev)
 		rte_vdpa_unregister_device(priv->vdev);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index e08931719f..b6392b9d66 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -84,6 +84,7 @@ enum mlx5_vdpa_task_type {
 	MLX5_VDPA_TASK_REG_MR = 1,
 	MLX5_VDPA_TASK_SETUP_VIRTQ,
 	MLX5_VDPA_TASK_STOP_VIRTQ,
+	MLX5_VDPA_TASK_DEV_CLOSE_NOWAIT,
 };
 
 /* Generic task information and size must be multiple of 4B. */
@@ -206,6 +207,7 @@ struct mlx5_vdpa_priv {
 	uint64_t features; /* Negotiated features. */
 	uint16_t log_max_rqt_size;
 	uint16_t last_c_thrd_idx;
+	uint16_t dev_close_progress;
 	uint16_t num_mrs; /* Number of memory regions. */
 	struct mlx5_vdpa_steer steer;
 	struct mlx5dv_var *var;
@@ -578,4 +580,10 @@ mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
 		uint32_t *err_cnt, uint32_t sleep_time);
 int
 mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick);
+void
+mlx5_vdpa_vq_destroy(struct mlx5_vdpa_virtq *virtq);
+void
+mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv);
+void
+mlx5_vdpa_virtq_unreg_intr_handle_all(struct mlx5_vdpa_priv *priv);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index 98369f0887..bb2279440b 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -63,7 +63,8 @@ mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
 		task[i].type = task_type;
 		task[i].remaining_cnt = remaining_cnt;
 		task[i].err_cnt = err_cnt;
-		task[i].idx = data[i];
+		if (data)
+			task[i].idx = data[i];
 	}
 	if (!mlx5_vdpa_c_thrd_ring_enqueue_bulk(rng, (void **)&task, num, NULL))
 		return -1;
@@ -187,6 +188,23 @@ mlx5_vdpa_c_thread_handle(void *arg)
 			    MLX5_VDPA_USED_RING_LEN(virtq->vq_size));
 			pthread_mutex_unlock(&virtq->virtq_lock);
 			break;
+		case MLX5_VDPA_TASK_DEV_CLOSE_NOWAIT:
+			mlx5_vdpa_virtq_unreg_intr_handle_all(priv);
+			pthread_mutex_lock(&priv->steer_update_lock);
+			mlx5_vdpa_steer_unset(priv);
+			pthread_mutex_unlock(&priv->steer_update_lock);
+			mlx5_vdpa_virtqs_release(priv);
+			mlx5_vdpa_drain_cq(priv);
+			if (priv->lm_mr.addr)
+				mlx5_os_wrapped_mkey_destroy(
+					&priv->lm_mr);
+			if (!priv->connected)
+				mlx5_vdpa_dev_cache_clean(priv);
+			priv->vid = 0;
+			__atomic_store_n(
+				&priv->dev_close_progress, 0,
+				__ATOMIC_RELAXED);
+			break;
 		default:
 			DRV_LOG(ERR, "Invalid vdpa task type %d.",
 			task.type);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index db05220e76..a08c854b14 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -102,6 +102,20 @@ mlx5_vdpa_virtq_unregister_intr_handle(struct mlx5_vdpa_virtq *virtq)
 	virtq->intr_handle = NULL;
 }
 
+void
+mlx5_vdpa_virtq_unreg_intr_handle_all(struct mlx5_vdpa_priv *priv)
+{
+	uint32_t i;
+	struct mlx5_vdpa_virtq *virtq;
+
+	for (i = 0; i < priv->nr_virtqs; i++) {
+		virtq = &priv->virtqs[i];
+		pthread_mutex_lock(&virtq->virtq_lock);
+		mlx5_vdpa_virtq_unregister_intr_handle(virtq);
+		pthread_mutex_unlock(&virtq->virtq_lock);
+	}
+}
+
 /* Release cached VQ resources. */
 void
 mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH 15/16] vdpa/mlx5: add virtq sub-resources creation
  2022-06-06 11:20 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                     ` (27 preceding siblings ...)
  2022-06-06 11:21   ` [PATCH v1 15/17] vdpa/mlx5: add device close task Li Zhang
@ 2022-06-06 11:21   ` Li Zhang
  2022-06-06 11:21   ` [PATCH v1 16/17] " Li Zhang
                     ` (2 subsequent siblings)
  31 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:21 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba, Yajun Wu

Pre-create the virt-queue sub-resources in the device probe stage
and then only modify the virtqueue in the device config stage.
The steering table also needs to support the dummy virt-queue.
This accelerates the LM process and reduces its time by 40%.
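
A simplified sketch of the prepare/modify split, assuming the
underlying queue object can be created once with placeholder (dummy)
attributes and later modified with the guest-provided values; all the
names below are hypothetical, not the DevX command flow itself:

#include <stdbool.h>
#include <stdint.h>

struct queue_attr {
	uint16_t size;
	uint64_t desc_addr;	/* 0 while the queue is still a dummy */
	bool     ready;
};

struct queue {
	struct queue_attr attr;
	bool created;
};

/* Probe stage: create the queue with safe placeholder attributes. */
static bool
queue_prepare(struct queue *q, uint16_t default_size)
{
	q->attr = (struct queue_attr){ .size = default_size };
	q->created = true;	/* stands in for the FW create command */
	return true;
}

/* Config stage: only modify the object that already exists. */
static bool
queue_configure(struct queue *q, uint64_t desc_addr, uint16_t size)
{
	if (!q->created)
		return false;	/* would need a full create instead */
	q->attr.desc_addr = desc_addr;
	q->attr.size = size;
	q->attr.ready = true;	/* stands in for the FW modify command */
	return true;
}

int
main(void)
{
	struct queue q = { .created = false };

	if (!queue_prepare(&q, 256))
		return 1;
	/* Much later, at device configuration time: */
	return queue_configure(&q, 0x1000, 256) ? 0 : 1;
}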

Signed-off-by: Li Zhang <lizh@nvidia.com>
Signed-off-by: Yajun Wu <yajunw@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 72 +++++++--------------
 drivers/vdpa/mlx5/mlx5_vdpa.h       | 17 +++--
 drivers/vdpa/mlx5/mlx5_vdpa_event.c | 11 ++--
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c | 17 +++--
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 99 +++++++++++++++++++++--------
 5 files changed, 123 insertions(+), 93 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index d000854c08..f006a9cd3f 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -627,65 +627,39 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 static int
 mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 {
-	struct mlx5_vdpa_virtq *virtq;
+	uint32_t max_queues;
 	uint32_t index;
-	uint32_t i;
+	struct mlx5_vdpa_virtq *virtq;
 
-	for (index = 0; index < priv->caps.max_num_virtio_queues * 2;
+	for (index = 0; index < priv->caps.max_num_virtio_queues;
 		index++) {
 		virtq = &priv->virtqs[index];
 		pthread_mutex_init(&virtq->virtq_lock, NULL);
 	}
-	if (!priv->queues)
+	if (!priv->queues || !priv->queue_size)
 		return 0;
-	for (index = 0; index < (priv->queues * 2); ++index) {
+	max_queues = (priv->queues < priv->caps.max_num_virtio_queues) ?
+		(priv->queues * 2) : (priv->caps.max_num_virtio_queues);
+	for (index = 0; index < max_queues; ++index)
+		if (mlx5_vdpa_virtq_single_resource_prepare(priv,
+			index))
+			goto error;
+	if (mlx5_vdpa_is_modify_virtq_supported(priv))
+		if (mlx5_vdpa_steer_update(priv, true))
+			goto error;
+	return 0;
+error:
+	for (index = 0; index < max_queues; ++index) {
 		virtq = &priv->virtqs[index];
-		int ret = mlx5_vdpa_event_qp_prepare(priv, priv->queue_size,
-					-1, virtq);
-
-		if (ret) {
-			DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
-				index);
-			return -1;
-		}
-		if (priv->caps.queue_counters_valid) {
-			if (!virtq->counters)
-				virtq->counters =
-					mlx5_devx_cmd_create_virtio_q_counters
-						(priv->cdev->ctx);
-			if (!virtq->counters) {
-				DRV_LOG(ERR, "Failed to create virtq couners for virtq"
-					" %d.", index);
-				return -1;
-			}
-		}
-		for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
-			uint32_t size;
-			void *buf;
-			struct mlx5dv_devx_umem *obj;
-
-			size = priv->caps.umems[i].a * priv->queue_size +
-					priv->caps.umems[i].b;
-			buf = rte_zmalloc(__func__, size, 4096);
-			if (buf == NULL) {
-				DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
-						" %u.", i, index);
-				return -1;
-			}
-			obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx, buf,
-					size, IBV_ACCESS_LOCAL_WRITE);
-			if (obj == NULL) {
-				rte_free(buf);
-				DRV_LOG(ERR, "Failed to register umem %d for virtq %u.",
-						i, index);
-				return -1;
-			}
-			virtq->umems[i].size = size;
-			virtq->umems[i].buf = buf;
-			virtq->umems[i].obj = obj;
+		if (virtq->virtq) {
+			pthread_mutex_lock(&virtq->virtq_lock);
+			mlx5_vdpa_virtq_unset(virtq);
+			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
 	}
-	return 0;
+	if (mlx5_vdpa_is_modify_virtq_supported(priv))
+		mlx5_vdpa_steer_unset(priv);
+	return -1;
 }
 
 static int
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index b6392b9d66..f353db62ac 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -277,13 +277,15 @@ int mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv);
  *   The guest notification file descriptor.
  * @param[in/out] virtq
  *   Pointer to the virt-queue structure.
+ * @param[in] reset
+ *   If true, it will reset event qp.
  *
  * @return
  *   0 on success, -1 otherwise and rte_errno is set.
  */
 int
 mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
-	int callfd, struct mlx5_vdpa_virtq *virtq);
+	int callfd, struct mlx5_vdpa_virtq *virtq, bool reset);
 
 /**
  * Destroy an event QP and all its related resources.
@@ -403,11 +405,13 @@ void mlx5_vdpa_steer_unset(struct mlx5_vdpa_priv *priv);
  *
  * @param[in] priv
  *   The vdpa driver private structure.
+ * @param[in] is_dummy
+ *   If set, it is updated with dummy queue for prepare resource.
  *
  * @return
  *   0 on success, a negative value otherwise.
  */
-int mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv);
+int mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv, bool is_dummy);
 
 /**
  * Setup steering and all its related resources to enable RSS traffic from the
@@ -581,9 +585,14 @@ mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
 int
 mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick);
 void
-mlx5_vdpa_vq_destroy(struct mlx5_vdpa_virtq *virtq);
-void
 mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv);
 void
 mlx5_vdpa_virtq_unreg_intr_handle_all(struct mlx5_vdpa_priv *priv);
+bool
+mlx5_vdpa_virtq_single_resource_prepare(struct mlx5_vdpa_priv *priv,
+		int index);
+int
+mlx5_vdpa_qps2rst2rts(struct mlx5_vdpa_event_qp *eqp);
+void
+mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index f782b6b832..22f0920c88 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -249,7 +249,7 @@ mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv)
 {
 	unsigned int i;
 
-	for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
+	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		struct mlx5_vdpa_cq *cq = &priv->virtqs[i].eqp.cq;
 
 		mlx5_vdpa_queue_complete(cq);
@@ -618,7 +618,7 @@ mlx5_vdpa_qps2rts(struct mlx5_vdpa_event_qp *eqp)
 	return 0;
 }
 
-static int
+int
 mlx5_vdpa_qps2rst2rts(struct mlx5_vdpa_event_qp *eqp)
 {
 	if (mlx5_devx_cmd_modify_qp_state(eqp->fw_qp, MLX5_CMD_OP_QP_2RST,
@@ -638,7 +638,7 @@ mlx5_vdpa_qps2rst2rts(struct mlx5_vdpa_event_qp *eqp)
 
 int
 mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
-	int callfd, struct mlx5_vdpa_virtq *virtq)
+	int callfd, struct mlx5_vdpa_virtq *virtq, bool reset)
 {
 	struct mlx5_vdpa_event_qp *eqp = &virtq->eqp;
 	struct mlx5_devx_qp_attr attr = {0};
@@ -649,11 +649,10 @@ mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 		/* Reuse existing resources. */
 		eqp->cq.callfd = callfd;
 		/* FW will set event qp to error state in q destroy. */
-		if (!mlx5_vdpa_qps2rst2rts(eqp)) {
+		if (reset && !mlx5_vdpa_qps2rst2rts(eqp))
 			rte_write32(rte_cpu_to_be_32(RTE_BIT32(log_desc_n)),
 					&eqp->sw_qp.db_rec[0]);
-			return 0;
-		}
+		return 0;
 	}
 	if (eqp->fw_qp)
 		mlx5_vdpa_event_qp_destroy(eqp);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
index 4cbf09784e..c2e0a17ace 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
@@ -57,7 +57,7 @@ mlx5_vdpa_steer_unset(struct mlx5_vdpa_priv *priv)
  * -1 on error.
  */
 static int
-mlx5_vdpa_rqt_prepare(struct mlx5_vdpa_priv *priv)
+mlx5_vdpa_rqt_prepare(struct mlx5_vdpa_priv *priv, bool is_dummy)
 {
 	int i;
 	uint32_t rqt_n = RTE_MIN(MLX5_VDPA_DEFAULT_RQT_SIZE,
@@ -67,15 +67,20 @@ mlx5_vdpa_rqt_prepare(struct mlx5_vdpa_priv *priv)
 						      sizeof(uint32_t), 0);
 	uint32_t k = 0, j;
 	int ret = 0, num;
+	uint16_t nr_vring = is_dummy ?
+	(((priv->queues * 2) < priv->caps.max_num_virtio_queues) ?
+	(priv->queues * 2) : priv->caps.max_num_virtio_queues) : priv->nr_virtqs;
 
 	if (!attr) {
 		DRV_LOG(ERR, "Failed to allocate RQT attributes memory.");
 		rte_errno = ENOMEM;
 		return -ENOMEM;
 	}
-	for (i = 0; i < priv->nr_virtqs; i++) {
+	for (i = 0; i < nr_vring; i++) {
 		if (is_virtq_recvq(i, priv->nr_virtqs) &&
-		    priv->virtqs[i].enable && priv->virtqs[i].virtq) {
+			(is_dummy || (priv->virtqs[i].enable &&
+			priv->virtqs[i].configured)) &&
+			priv->virtqs[i].virtq) {
 			attr->rq_list[k] = priv->virtqs[i].virtq->id;
 			k++;
 		}
@@ -235,12 +240,12 @@ mlx5_vdpa_rss_flows_create(struct mlx5_vdpa_priv *priv)
 }
 
 int
-mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv)
+mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv, bool is_dummy)
 {
 	int ret;
 
 	pthread_mutex_lock(&priv->steer_update_lock);
-	ret = mlx5_vdpa_rqt_prepare(priv);
+	ret = mlx5_vdpa_rqt_prepare(priv, is_dummy);
 	if (ret == 0) {
 		mlx5_vdpa_steer_unset(priv);
 	} else if (ret < 0) {
@@ -261,7 +266,7 @@ mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv)
 int
 mlx5_vdpa_steer_setup(struct mlx5_vdpa_priv *priv)
 {
-	if (mlx5_vdpa_steer_update(priv))
+	if (mlx5_vdpa_steer_update(priv, false))
 		goto error;
 	return 0;
 error:
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index a08c854b14..20ce382487 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -146,10 +146,10 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 	}
 }
 
-static int
+void
 mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 {
-	int ret = -EAGAIN;
+	int ret;
 
 	mlx5_vdpa_virtq_unregister_intr_handle(virtq);
 	if (virtq->configured) {
@@ -157,12 +157,12 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 		if (ret)
 			DRV_LOG(WARNING, "Failed to stop virtq %d.",
 				virtq->index);
-		virtq->configured = 0;
 		claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
+		virtq->index = 0;
+		virtq->virtq = NULL;
+		virtq->configured = 0;
 	}
-	virtq->virtq = NULL;
 	virtq->notifier_state = MLX5_VDPA_NOTIFIER_STATE_DISABLED;
-	return 0;
 }
 
 void
@@ -175,6 +175,9 @@ mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv)
 		virtq = &priv->virtqs[i];
 		pthread_mutex_lock(&virtq->virtq_lock);
 		mlx5_vdpa_virtq_unset(virtq);
+		if (i < (priv->queues * 2))
+			mlx5_vdpa_virtq_single_resource_prepare(
+					priv, i);
 		pthread_mutex_unlock(&virtq->virtq_lock);
 	}
 	priv->features = 0;
@@ -258,7 +261,8 @@ mlx5_vdpa_hva_to_gpa(struct rte_vhost_memory *mem, uint64_t hva)
 static int
 mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 		struct mlx5_devx_virtq_attr *attr,
-		struct rte_vhost_vring *vq, int index)
+		struct rte_vhost_vring *vq,
+		int index, bool is_prepare)
 {
 	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
 	uint64_t gpa;
@@ -277,11 +281,15 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 			MLX5_VIRTQ_MODIFY_TYPE_Q_MKEY |
 			MLX5_VIRTQ_MODIFY_TYPE_QUEUE_FEATURE_BIT_MASK |
 			MLX5_VIRTQ_MODIFY_TYPE_EVENT_MODE;
-	attr->tso_ipv4 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO4));
-	attr->tso_ipv6 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO6));
-	attr->tx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_CSUM));
-	attr->rx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_GUEST_CSUM));
-	attr->virtio_version_1_0 =
+	attr->tso_ipv4 = is_prepare ? 1 :
+		!!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO4));
+	attr->tso_ipv6 = is_prepare ? 1 :
+		!!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO6));
+	attr->tx_csum = is_prepare ? 1 :
+		!!(priv->features & (1ULL << VIRTIO_NET_F_CSUM));
+	attr->rx_csum = is_prepare ? 1 :
+		!!(priv->features & (1ULL << VIRTIO_NET_F_GUEST_CSUM));
+	attr->virtio_version_1_0 = is_prepare ? 1 :
 		!!(priv->features & (1ULL << VIRTIO_F_VERSION_1));
 	attr->q_type =
 		(priv->features & (1ULL << VIRTIO_F_RING_PACKED)) ?
@@ -290,12 +298,12 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 	 * No need event QPs creation when the guest in poll mode or when the
 	 * capability allows it.
 	 */
-	attr->event_mode = vq->callfd != -1 ||
+	attr->event_mode = is_prepare || vq->callfd != -1 ||
 	!(priv->caps.event_mode & (1 << MLX5_VIRTQ_EVENT_MODE_NO_MSIX)) ?
 	MLX5_VIRTQ_EVENT_MODE_QP : MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
 	if (attr->event_mode == MLX5_VIRTQ_EVENT_MODE_QP) {
-		ret = mlx5_vdpa_event_qp_prepare(priv,
-				vq->size, vq->callfd, virtq);
+		ret = mlx5_vdpa_event_qp_prepare(priv, vq->size,
+				vq->callfd, virtq, !virtq->virtq);
 		if (ret) {
 			DRV_LOG(ERR,
 				"Failed to create event QPs for virtq %d.",
@@ -320,7 +328,7 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 		attr->counters_obj_id = virtq->counters->id;
 	}
 	/* Setup 3 UMEMs for each virtq. */
-	if (virtq->virtq) {
+	if (!virtq->virtq) {
 		for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
 			uint32_t size;
 			void *buf;
@@ -345,7 +353,7 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 			buf = rte_zmalloc(__func__,
 				size, 4096);
 			if (buf == NULL) {
-				DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
+				DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq."
 				" %u.", i, index);
 				return -1;
 			}
@@ -366,7 +374,7 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 			attr->umems[i].size = virtq->umems[i].size;
 		}
 	}
-	if (attr->q_type == MLX5_VIRTQ_TYPE_SPLIT) {
+	if (!is_prepare && attr->q_type == MLX5_VIRTQ_TYPE_SPLIT) {
 		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem_info.vmem,
 					   (uint64_t)(uintptr_t)vq->desc);
 		if (!gpa) {
@@ -389,21 +397,23 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 		}
 		attr->available_addr = gpa;
 	}
-	ret = rte_vhost_get_vring_base(priv->vid,
+	if (!is_prepare) {
+		ret = rte_vhost_get_vring_base(priv->vid,
 			index, &last_avail_idx, &last_used_idx);
-	if (ret) {
-		last_avail_idx = 0;
-		last_used_idx = 0;
-		DRV_LOG(WARNING, "Couldn't get vring base, idx are set to 0.");
-	} else {
-		DRV_LOG(INFO, "vid %d: Init last_avail_idx=%d, last_used_idx=%d for "
+		if (ret) {
+			last_avail_idx = 0;
+			last_used_idx = 0;
+			DRV_LOG(WARNING, "Couldn't get vring base, idx are set to 0.");
+		} else {
+			DRV_LOG(INFO, "vid %d: Init last_avail_idx=%d, last_used_idx=%d for "
 				"virtq %d.", priv->vid, last_avail_idx,
 				last_used_idx, index);
+		}
 	}
 	attr->hw_available_index = last_avail_idx;
 	attr->hw_used_index = last_used_idx;
 	attr->q_size = vq->size;
-	attr->mkey = priv->gpa_mkey_index;
+	attr->mkey = is_prepare ? 0 : priv->gpa_mkey_index;
 	attr->tis_id = priv->tiss[(index / 2) % priv->num_lag_ports]->id;
 	attr->queue_index = index;
 	attr->pd = priv->cdev->pdn;
@@ -416,6 +426,39 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 	return 0;
 }
 
+bool
+mlx5_vdpa_virtq_single_resource_prepare(struct mlx5_vdpa_priv *priv,
+		int index)
+{
+	struct mlx5_devx_virtq_attr attr = {0};
+	struct mlx5_vdpa_virtq *virtq;
+	struct rte_vhost_vring vq = {
+		.size = priv->queue_size,
+		.callfd = -1,
+	};
+	int ret;
+
+	virtq = &priv->virtqs[index];
+	virtq->index = index;
+	virtq->vq_size = vq.size;
+	virtq->configured = 0;
+	virtq->virtq = NULL;
+	ret = mlx5_vdpa_virtq_sub_objs_prepare(priv, &attr, &vq, index, true);
+	if (ret) {
+		DRV_LOG(ERR,
+		"Cannot prepare setup resource for virtq %d.", index);
+		return true;
+	}
+	if (mlx5_vdpa_is_modify_virtq_supported(priv)) {
+		virtq->virtq =
+		mlx5_devx_cmd_create_virtq(priv->cdev->ctx, &attr);
+		virtq->priv = priv;
+		if (!virtq->virtq)
+			return true;
+	}
+	return false;
+}
+
 bool
 mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv)
 {
@@ -473,7 +516,7 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick)
 	virtq->priv = priv;
 	virtq->stopped = 0;
 	ret = mlx5_vdpa_virtq_sub_objs_prepare(priv, &attr,
-				&vq, index);
+				&vq, index, false);
 	if (ret) {
 		DRV_LOG(ERR, "Failed to setup update virtq attr"
 			" %d.", index);
@@ -746,7 +789,7 @@ mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable)
 	if (virtq->configured) {
 		virtq->enable = 0;
 		if (is_virtq_recvq(virtq->index, priv->nr_virtqs)) {
-			ret = mlx5_vdpa_steer_update(priv);
+			ret = mlx5_vdpa_steer_update(priv, false);
 			if (ret)
 				DRV_LOG(WARNING, "Failed to disable steering "
 					"for virtq %d.", index);
@@ -761,7 +804,7 @@ mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable)
 		}
 		virtq->enable = 1;
 		if (is_virtq_recvq(virtq->index, priv->nr_virtqs)) {
-			ret = mlx5_vdpa_steer_update(priv);
+			ret = mlx5_vdpa_steer_update(priv, false);
 			if (ret)
 				DRV_LOG(WARNING, "Failed to enable steering "
 					"for virtq %d.", index);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v1 16/17] vdpa/mlx5: add virtq sub-resources creation
  2022-06-06 11:20 ` [PATCH v1 00/17] Add vDPA multi-threads optimization Li Zhang
                     ` (28 preceding siblings ...)
  2022-06-06 11:21   ` [PATCH 15/16] vdpa/mlx5: add virtq sub-resources creation Li Zhang
@ 2022-06-06 11:21   ` Li Zhang
  2022-06-06 11:21   ` [PATCH 16/16] vdpa/mlx5: prepare virtqueue resource creation Li Zhang
  2022-06-06 11:21   ` [PATCH v1 17/17] " Li Zhang
  31 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:21 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba, Yajun Wu

Pre-create the virt-queue sub-resources in the device probe stage
and then only modify the virtqueue in the device config stage.
The steering table also needs to support dummy virt-queues.
This accelerates the LM process and reduces its time by 40%.
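
For illustration only, here is a minimal, self-contained C sketch of the
probe/config split described above. It is not the driver code: the types
and functions are hypothetical stand-ins for the DevX virtq object, its
create command, and its modify command.

#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct toy_virtq {
	bool hw_created;     /* HW object already exists (dummy attributes). */
	uint16_t size;       /* Queue depth, known from devargs at probe time. */
	uint64_t desc_addr;  /* Guest address, known only at config time. */
};

/* Probe stage: create the HW object early with placeholder attributes. */
static int prepare_at_probe(struct toy_virtq *vq, uint16_t queue_size)
{
	vq->size = queue_size;
	vq->desc_addr = 0;      /* Dummy address, no guest memory mapped yet. */
	vq->hw_created = true;  /* Stands in for a DevX create command. */
	return 0;
}

/* Config stage: only modify the already-created object (the fast path). */
static int configure_at_dev_config(struct toy_virtq *vq, uint64_t desc_addr)
{
	if (!vq->hw_created)
		return -1;          /* Would need the slow create path instead. */
	vq->desc_addr = desc_addr;  /* Stands in for a DevX modify command. */
	return 0;
}

int main(void)
{
	struct toy_virtq vq;

	prepare_at_probe(&vq, 256);
	configure_at_dev_config(&vq, 0x1000);
	printf("virtq size %u desc 0x%" PRIx64 "\n", (unsigned)vq.size, vq.desc_addr);
	return 0;
}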

Signed-off-by: Li Zhang <lizh@nvidia.com>
Signed-off-by: Yajun Wu <yajunw@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 72 +++++++--------------
 drivers/vdpa/mlx5/mlx5_vdpa.h       | 17 +++--
 drivers/vdpa/mlx5/mlx5_vdpa_event.c | 11 ++--
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c | 17 +++--
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 99 +++++++++++++++++++++--------
 5 files changed, 123 insertions(+), 93 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index d000854c08..f006a9cd3f 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -627,65 +627,39 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 static int
 mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 {
-	struct mlx5_vdpa_virtq *virtq;
+	uint32_t max_queues;
 	uint32_t index;
-	uint32_t i;
+	struct mlx5_vdpa_virtq *virtq;
 
-	for (index = 0; index < priv->caps.max_num_virtio_queues * 2;
+	for (index = 0; index < priv->caps.max_num_virtio_queues;
 		index++) {
 		virtq = &priv->virtqs[index];
 		pthread_mutex_init(&virtq->virtq_lock, NULL);
 	}
-	if (!priv->queues)
+	if (!priv->queues || !priv->queue_size)
 		return 0;
-	for (index = 0; index < (priv->queues * 2); ++index) {
+	max_queues = (priv->queues < priv->caps.max_num_virtio_queues) ?
+		(priv->queues * 2) : (priv->caps.max_num_virtio_queues);
+	for (index = 0; index < max_queues; ++index)
+		if (mlx5_vdpa_virtq_single_resource_prepare(priv,
+			index))
+			goto error;
+	if (mlx5_vdpa_is_modify_virtq_supported(priv))
+		if (mlx5_vdpa_steer_update(priv, true))
+			goto error;
+	return 0;
+error:
+	for (index = 0; index < max_queues; ++index) {
 		virtq = &priv->virtqs[index];
-		int ret = mlx5_vdpa_event_qp_prepare(priv, priv->queue_size,
-					-1, virtq);
-
-		if (ret) {
-			DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
-				index);
-			return -1;
-		}
-		if (priv->caps.queue_counters_valid) {
-			if (!virtq->counters)
-				virtq->counters =
-					mlx5_devx_cmd_create_virtio_q_counters
-						(priv->cdev->ctx);
-			if (!virtq->counters) {
-				DRV_LOG(ERR, "Failed to create virtq couners for virtq"
-					" %d.", index);
-				return -1;
-			}
-		}
-		for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
-			uint32_t size;
-			void *buf;
-			struct mlx5dv_devx_umem *obj;
-
-			size = priv->caps.umems[i].a * priv->queue_size +
-					priv->caps.umems[i].b;
-			buf = rte_zmalloc(__func__, size, 4096);
-			if (buf == NULL) {
-				DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
-						" %u.", i, index);
-				return -1;
-			}
-			obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx, buf,
-					size, IBV_ACCESS_LOCAL_WRITE);
-			if (obj == NULL) {
-				rte_free(buf);
-				DRV_LOG(ERR, "Failed to register umem %d for virtq %u.",
-						i, index);
-				return -1;
-			}
-			virtq->umems[i].size = size;
-			virtq->umems[i].buf = buf;
-			virtq->umems[i].obj = obj;
+		if (virtq->virtq) {
+			pthread_mutex_lock(&virtq->virtq_lock);
+			mlx5_vdpa_virtq_unset(virtq);
+			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
 	}
-	return 0;
+	if (mlx5_vdpa_is_modify_virtq_supported(priv))
+		mlx5_vdpa_steer_unset(priv);
+	return -1;
 }
 
 static int
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index b6392b9d66..f353db62ac 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -277,13 +277,15 @@ int mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv);
  *   The guest notification file descriptor.
  * @param[in/out] virtq
  *   Pointer to the virt-queue structure.
+ * @param[in] reset
+ *   If true, it will reset event qp.
  *
  * @return
  *   0 on success, -1 otherwise and rte_errno is set.
  */
 int
 mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
-	int callfd, struct mlx5_vdpa_virtq *virtq);
+	int callfd, struct mlx5_vdpa_virtq *virtq, bool reset);
 
 /**
  * Destroy an event QP and all its related resources.
@@ -403,11 +405,13 @@ void mlx5_vdpa_steer_unset(struct mlx5_vdpa_priv *priv);
  *
  * @param[in] priv
  *   The vdpa driver private structure.
+ * @param[in] is_dummy
+ *   If set, it is updated with dummy queue for prepare resource.
  *
  * @return
  *   0 on success, a negative value otherwise.
  */
-int mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv);
+int mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv, bool is_dummy);
 
 /**
  * Setup steering and all its related resources to enable RSS traffic from the
@@ -581,9 +585,14 @@ mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
 int
 mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick);
 void
-mlx5_vdpa_vq_destroy(struct mlx5_vdpa_virtq *virtq);
-void
 mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv);
 void
 mlx5_vdpa_virtq_unreg_intr_handle_all(struct mlx5_vdpa_priv *priv);
+bool
+mlx5_vdpa_virtq_single_resource_prepare(struct mlx5_vdpa_priv *priv,
+		int index);
+int
+mlx5_vdpa_qps2rst2rts(struct mlx5_vdpa_event_qp *eqp);
+void
+mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index f782b6b832..22f0920c88 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -249,7 +249,7 @@ mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv)
 {
 	unsigned int i;
 
-	for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
+	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		struct mlx5_vdpa_cq *cq = &priv->virtqs[i].eqp.cq;
 
 		mlx5_vdpa_queue_complete(cq);
@@ -618,7 +618,7 @@ mlx5_vdpa_qps2rts(struct mlx5_vdpa_event_qp *eqp)
 	return 0;
 }
 
-static int
+int
 mlx5_vdpa_qps2rst2rts(struct mlx5_vdpa_event_qp *eqp)
 {
 	if (mlx5_devx_cmd_modify_qp_state(eqp->fw_qp, MLX5_CMD_OP_QP_2RST,
@@ -638,7 +638,7 @@ mlx5_vdpa_qps2rst2rts(struct mlx5_vdpa_event_qp *eqp)
 
 int
 mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
-	int callfd, struct mlx5_vdpa_virtq *virtq)
+	int callfd, struct mlx5_vdpa_virtq *virtq, bool reset)
 {
 	struct mlx5_vdpa_event_qp *eqp = &virtq->eqp;
 	struct mlx5_devx_qp_attr attr = {0};
@@ -649,11 +649,10 @@ mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 		/* Reuse existing resources. */
 		eqp->cq.callfd = callfd;
 		/* FW will set event qp to error state in q destroy. */
-		if (!mlx5_vdpa_qps2rst2rts(eqp)) {
+		if (reset && !mlx5_vdpa_qps2rst2rts(eqp))
 			rte_write32(rte_cpu_to_be_32(RTE_BIT32(log_desc_n)),
 					&eqp->sw_qp.db_rec[0]);
-			return 0;
-		}
+		return 0;
 	}
 	if (eqp->fw_qp)
 		mlx5_vdpa_event_qp_destroy(eqp);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
index 4cbf09784e..c2e0a17ace 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
@@ -57,7 +57,7 @@ mlx5_vdpa_steer_unset(struct mlx5_vdpa_priv *priv)
  * -1 on error.
  */
 static int
-mlx5_vdpa_rqt_prepare(struct mlx5_vdpa_priv *priv)
+mlx5_vdpa_rqt_prepare(struct mlx5_vdpa_priv *priv, bool is_dummy)
 {
 	int i;
 	uint32_t rqt_n = RTE_MIN(MLX5_VDPA_DEFAULT_RQT_SIZE,
@@ -67,15 +67,20 @@ mlx5_vdpa_rqt_prepare(struct mlx5_vdpa_priv *priv)
 						      sizeof(uint32_t), 0);
 	uint32_t k = 0, j;
 	int ret = 0, num;
+	uint16_t nr_vring = is_dummy ?
+	(((priv->queues * 2) < priv->caps.max_num_virtio_queues) ?
+	(priv->queues * 2) : priv->caps.max_num_virtio_queues) : priv->nr_virtqs;
 
 	if (!attr) {
 		DRV_LOG(ERR, "Failed to allocate RQT attributes memory.");
 		rte_errno = ENOMEM;
 		return -ENOMEM;
 	}
-	for (i = 0; i < priv->nr_virtqs; i++) {
+	for (i = 0; i < nr_vring; i++) {
 		if (is_virtq_recvq(i, priv->nr_virtqs) &&
-		    priv->virtqs[i].enable && priv->virtqs[i].virtq) {
+			(is_dummy || (priv->virtqs[i].enable &&
+			priv->virtqs[i].configured)) &&
+			priv->virtqs[i].virtq) {
 			attr->rq_list[k] = priv->virtqs[i].virtq->id;
 			k++;
 		}
@@ -235,12 +240,12 @@ mlx5_vdpa_rss_flows_create(struct mlx5_vdpa_priv *priv)
 }
 
 int
-mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv)
+mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv, bool is_dummy)
 {
 	int ret;
 
 	pthread_mutex_lock(&priv->steer_update_lock);
-	ret = mlx5_vdpa_rqt_prepare(priv);
+	ret = mlx5_vdpa_rqt_prepare(priv, is_dummy);
 	if (ret == 0) {
 		mlx5_vdpa_steer_unset(priv);
 	} else if (ret < 0) {
@@ -261,7 +266,7 @@ mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv)
 int
 mlx5_vdpa_steer_setup(struct mlx5_vdpa_priv *priv)
 {
-	if (mlx5_vdpa_steer_update(priv))
+	if (mlx5_vdpa_steer_update(priv, false))
 		goto error;
 	return 0;
 error:
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index a08c854b14..20ce382487 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -146,10 +146,10 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 	}
 }
 
-static int
+void
 mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 {
-	int ret = -EAGAIN;
+	int ret;
 
 	mlx5_vdpa_virtq_unregister_intr_handle(virtq);
 	if (virtq->configured) {
@@ -157,12 +157,12 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 		if (ret)
 			DRV_LOG(WARNING, "Failed to stop virtq %d.",
 				virtq->index);
-		virtq->configured = 0;
 		claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
+		virtq->index = 0;
+		virtq->virtq = NULL;
+		virtq->configured = 0;
 	}
-	virtq->virtq = NULL;
 	virtq->notifier_state = MLX5_VDPA_NOTIFIER_STATE_DISABLED;
-	return 0;
 }
 
 void
@@ -175,6 +175,9 @@ mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv)
 		virtq = &priv->virtqs[i];
 		pthread_mutex_lock(&virtq->virtq_lock);
 		mlx5_vdpa_virtq_unset(virtq);
+		if (i < (priv->queues * 2))
+			mlx5_vdpa_virtq_single_resource_prepare(
+					priv, i);
 		pthread_mutex_unlock(&virtq->virtq_lock);
 	}
 	priv->features = 0;
@@ -258,7 +261,8 @@ mlx5_vdpa_hva_to_gpa(struct rte_vhost_memory *mem, uint64_t hva)
 static int
 mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 		struct mlx5_devx_virtq_attr *attr,
-		struct rte_vhost_vring *vq, int index)
+		struct rte_vhost_vring *vq,
+		int index, bool is_prepare)
 {
 	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
 	uint64_t gpa;
@@ -277,11 +281,15 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 			MLX5_VIRTQ_MODIFY_TYPE_Q_MKEY |
 			MLX5_VIRTQ_MODIFY_TYPE_QUEUE_FEATURE_BIT_MASK |
 			MLX5_VIRTQ_MODIFY_TYPE_EVENT_MODE;
-	attr->tso_ipv4 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO4));
-	attr->tso_ipv6 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO6));
-	attr->tx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_CSUM));
-	attr->rx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_GUEST_CSUM));
-	attr->virtio_version_1_0 =
+	attr->tso_ipv4 = is_prepare ? 1 :
+		!!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO4));
+	attr->tso_ipv6 = is_prepare ? 1 :
+		!!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO6));
+	attr->tx_csum = is_prepare ? 1 :
+		!!(priv->features & (1ULL << VIRTIO_NET_F_CSUM));
+	attr->rx_csum = is_prepare ? 1 :
+		!!(priv->features & (1ULL << VIRTIO_NET_F_GUEST_CSUM));
+	attr->virtio_version_1_0 = is_prepare ? 1 :
 		!!(priv->features & (1ULL << VIRTIO_F_VERSION_1));
 	attr->q_type =
 		(priv->features & (1ULL << VIRTIO_F_RING_PACKED)) ?
@@ -290,12 +298,12 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 	 * No need event QPs creation when the guest in poll mode or when the
 	 * capability allows it.
 	 */
-	attr->event_mode = vq->callfd != -1 ||
+	attr->event_mode = is_prepare || vq->callfd != -1 ||
 	!(priv->caps.event_mode & (1 << MLX5_VIRTQ_EVENT_MODE_NO_MSIX)) ?
 	MLX5_VIRTQ_EVENT_MODE_QP : MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
 	if (attr->event_mode == MLX5_VIRTQ_EVENT_MODE_QP) {
-		ret = mlx5_vdpa_event_qp_prepare(priv,
-				vq->size, vq->callfd, virtq);
+		ret = mlx5_vdpa_event_qp_prepare(priv, vq->size,
+				vq->callfd, virtq, !virtq->virtq);
 		if (ret) {
 			DRV_LOG(ERR,
 				"Failed to create event QPs for virtq %d.",
@@ -320,7 +328,7 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 		attr->counters_obj_id = virtq->counters->id;
 	}
 	/* Setup 3 UMEMs for each virtq. */
-	if (virtq->virtq) {
+	if (!virtq->virtq) {
 		for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
 			uint32_t size;
 			void *buf;
@@ -345,7 +353,7 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 			buf = rte_zmalloc(__func__,
 				size, 4096);
 			if (buf == NULL) {
-				DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
+				DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq."
 				" %u.", i, index);
 				return -1;
 			}
@@ -366,7 +374,7 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 			attr->umems[i].size = virtq->umems[i].size;
 		}
 	}
-	if (attr->q_type == MLX5_VIRTQ_TYPE_SPLIT) {
+	if (!is_prepare && attr->q_type == MLX5_VIRTQ_TYPE_SPLIT) {
 		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem_info.vmem,
 					   (uint64_t)(uintptr_t)vq->desc);
 		if (!gpa) {
@@ -389,21 +397,23 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 		}
 		attr->available_addr = gpa;
 	}
-	ret = rte_vhost_get_vring_base(priv->vid,
+	if (!is_prepare) {
+		ret = rte_vhost_get_vring_base(priv->vid,
 			index, &last_avail_idx, &last_used_idx);
-	if (ret) {
-		last_avail_idx = 0;
-		last_used_idx = 0;
-		DRV_LOG(WARNING, "Couldn't get vring base, idx are set to 0.");
-	} else {
-		DRV_LOG(INFO, "vid %d: Init last_avail_idx=%d, last_used_idx=%d for "
+		if (ret) {
+			last_avail_idx = 0;
+			last_used_idx = 0;
+			DRV_LOG(WARNING, "Couldn't get vring base, idx are set to 0.");
+		} else {
+			DRV_LOG(INFO, "vid %d: Init last_avail_idx=%d, last_used_idx=%d for "
 				"virtq %d.", priv->vid, last_avail_idx,
 				last_used_idx, index);
+		}
 	}
 	attr->hw_available_index = last_avail_idx;
 	attr->hw_used_index = last_used_idx;
 	attr->q_size = vq->size;
-	attr->mkey = priv->gpa_mkey_index;
+	attr->mkey = is_prepare ? 0 : priv->gpa_mkey_index;
 	attr->tis_id = priv->tiss[(index / 2) % priv->num_lag_ports]->id;
 	attr->queue_index = index;
 	attr->pd = priv->cdev->pdn;
@@ -416,6 +426,39 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 	return 0;
 }
 
+bool
+mlx5_vdpa_virtq_single_resource_prepare(struct mlx5_vdpa_priv *priv,
+		int index)
+{
+	struct mlx5_devx_virtq_attr attr = {0};
+	struct mlx5_vdpa_virtq *virtq;
+	struct rte_vhost_vring vq = {
+		.size = priv->queue_size,
+		.callfd = -1,
+	};
+	int ret;
+
+	virtq = &priv->virtqs[index];
+	virtq->index = index;
+	virtq->vq_size = vq.size;
+	virtq->configured = 0;
+	virtq->virtq = NULL;
+	ret = mlx5_vdpa_virtq_sub_objs_prepare(priv, &attr, &vq, index, true);
+	if (ret) {
+		DRV_LOG(ERR,
+		"Cannot prepare setup resource for virtq %d.", index);
+		return true;
+	}
+	if (mlx5_vdpa_is_modify_virtq_supported(priv)) {
+		virtq->virtq =
+		mlx5_devx_cmd_create_virtq(priv->cdev->ctx, &attr);
+		virtq->priv = priv;
+		if (!virtq->virtq)
+			return true;
+	}
+	return false;
+}
+
 bool
 mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv)
 {
@@ -473,7 +516,7 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick)
 	virtq->priv = priv;
 	virtq->stopped = 0;
 	ret = mlx5_vdpa_virtq_sub_objs_prepare(priv, &attr,
-				&vq, index);
+				&vq, index, false);
 	if (ret) {
 		DRV_LOG(ERR, "Failed to setup update virtq attr"
 			" %d.", index);
@@ -746,7 +789,7 @@ mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable)
 	if (virtq->configured) {
 		virtq->enable = 0;
 		if (is_virtq_recvq(virtq->index, priv->nr_virtqs)) {
-			ret = mlx5_vdpa_steer_update(priv);
+			ret = mlx5_vdpa_steer_update(priv, false);
 			if (ret)
 				DRV_LOG(WARNING, "Failed to disable steering "
 					"for virtq %d.", index);
@@ -761,7 +804,7 @@ mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable)
 		}
 		virtq->enable = 1;
 		if (is_virtq_recvq(virtq->index, priv->nr_virtqs)) {
-			ret = mlx5_vdpa_steer_update(priv);
+			ret = mlx5_vdpa_steer_update(priv, false);
 			if (ret)
 				DRV_LOG(WARNING, "Failed to enable steering "
 					"for virtq %d.", index);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH 16/16] vdpa/mlx5: prepare virtqueue resource creation
  2022-06-06 11:20 ` [PATCH v1 00/17] Add vDPA multi-threads optimization Li Zhang
                     ` (29 preceding siblings ...)
  2022-06-06 11:21   ` [PATCH v1 16/17] " Li Zhang
@ 2022-06-06 11:21   ` Li Zhang
  2022-06-06 11:21   ` [PATCH v1 17/17] " Li Zhang
  31 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:21 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

Split the virtq virt-queue resource preparation between
the configuration threads.
The virt-queue resources also need to be pre-created again
after virtq destruction.
This accelerates the LM process and reduces its time by 30%.
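
The following is a simplified, self-contained pthread sketch of splitting
the per-queue preparation between the caller thread and a small pool of
configuration threads. It is only an assumption-level illustration, not
the driver code: it uses a static modulo split, whereas the driver rotates
the worker index and falls back to the caller thread whenever a task
cannot be queued.

#include <pthread.h>
#include <stdio.h>

#define MAX_THRDS 2
#define NUM_QUEUES 8

struct worker_arg {
	int thrd_idx;  /* 1..MAX_THRDS; index 0 is reserved for the caller. */
};

static void prepare_one_virtq(int index, int owner)
{
	/* Stand-in for a single virt-queue resource preparation step. */
	printf("virtq %d prepared by thread %d\n", index, owner);
}

static void *worker(void *p)
{
	struct worker_arg *arg = p;
	int i;

	for (i = 0; i < NUM_QUEUES; i++)
		if (i % (MAX_THRDS + 1) == arg->thrd_idx)
			prepare_one_virtq(i, arg->thrd_idx);
	return NULL;
}

int main(void)
{
	pthread_t tids[MAX_THRDS];
	struct worker_arg args[MAX_THRDS];
	int i;

	for (i = 0; i < MAX_THRDS; i++) {
		args[i].thrd_idx = i + 1;
		pthread_create(&tids[i], NULL, worker, &args[i]);
	}
	/* The caller thread handles the queues it kept for itself. */
	for (i = 0; i < NUM_QUEUES; i++)
		if (i % (MAX_THRDS + 1) == 0)
			prepare_one_virtq(i, 0);
	for (i = 0; i < MAX_THRDS; i++)
		pthread_join(tids[i], NULL);
	return 0;
}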

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c         | 115 ++++++++++++++++++++------
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  12 ++-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c |  15 +++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 111 +++++++++++++++++++++----
 4 files changed, 208 insertions(+), 45 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index f006a9cd3f..c5d82872c7 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -275,23 +275,18 @@ mlx5_vdpa_wait_dev_close_tasks_done(struct mlx5_vdpa_priv *priv)
 }
 
 static int
-mlx5_vdpa_dev_close(int vid)
+_internal_mlx5_vdpa_dev_close(struct mlx5_vdpa_priv *priv,
+		bool release_resource)
 {
-	struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid);
-	struct mlx5_vdpa_priv *priv =
-		mlx5_vdpa_find_priv_resource_by_vdev(vdev);
 	int ret = 0;
+	int vid = priv->vid;
 
-	if (priv == NULL) {
-		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
-		return -1;
-	}
 	mlx5_vdpa_cqe_event_unset(priv);
 	if (priv->state == MLX5_VDPA_STATE_CONFIGURED) {
 		ret |= mlx5_vdpa_lm_log(priv);
 		priv->state = MLX5_VDPA_STATE_IN_PROGRESS;
 	}
-	if (priv->use_c_thread) {
+	if (priv->use_c_thread && !release_resource) {
 		if (priv->last_c_thrd_idx >=
 			(conf_thread_mng.max_thrds - 1))
 			priv->last_c_thrd_idx = 0;
@@ -315,7 +310,7 @@ mlx5_vdpa_dev_close(int vid)
 	pthread_mutex_lock(&priv->steer_update_lock);
 	mlx5_vdpa_steer_unset(priv);
 	pthread_mutex_unlock(&priv->steer_update_lock);
-	mlx5_vdpa_virtqs_release(priv);
+	mlx5_vdpa_virtqs_release(priv, release_resource);
 	mlx5_vdpa_drain_cq(priv);
 	if (priv->lm_mr.addr)
 		mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
@@ -329,6 +324,24 @@ mlx5_vdpa_dev_close(int vid)
 	return ret;
 }
 
+static int
+mlx5_vdpa_dev_close(int vid)
+{
+	struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid);
+	struct mlx5_vdpa_priv *priv;
+
+	if (!vdev) {
+		DRV_LOG(ERR, "Invalid vDPA device.");
+		return -1;
+	}
+	priv = mlx5_vdpa_find_priv_resource_by_vdev(vdev);
+	if (priv == NULL) {
+		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
+		return -1;
+	}
+	return _internal_mlx5_vdpa_dev_close(priv, false);
+}
+
 static int
 mlx5_vdpa_dev_config(int vid)
 {
@@ -624,11 +637,33 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 		priv->queue_size);
 }
 
+void
+mlx5_vdpa_prepare_virtq_destroy(struct mlx5_vdpa_priv *priv)
+{
+	uint32_t max_queues, index;
+	struct mlx5_vdpa_virtq *virtq;
+
+	if (!priv->queues || !priv->queue_size)
+		return;
+	max_queues = ((priv->queues * 2) < priv->caps.max_num_virtio_queues) ?
+		(priv->queues * 2) : (priv->caps.max_num_virtio_queues);
+	if (mlx5_vdpa_is_modify_virtq_supported(priv))
+		mlx5_vdpa_steer_unset(priv);
+	for (index = 0; index < max_queues; ++index) {
+		virtq = &priv->virtqs[index];
+		if (virtq->virtq) {
+			pthread_mutex_lock(&virtq->virtq_lock);
+			mlx5_vdpa_virtq_unset(virtq);
+			pthread_mutex_unlock(&virtq->virtq_lock);
+		}
+	}
+}
+
 static int
 mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 {
-	uint32_t max_queues;
-	uint32_t index;
+	uint32_t remaining_cnt = 0, err_cnt = 0, task_num = 0;
+	uint32_t max_queues, index, thrd_idx, data[1];
 	struct mlx5_vdpa_virtq *virtq;
 
 	for (index = 0; index < priv->caps.max_num_virtio_queues;
@@ -640,25 +675,53 @@ mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 		return 0;
 	max_queues = (priv->queues < priv->caps.max_num_virtio_queues) ?
 		(priv->queues * 2) : (priv->caps.max_num_virtio_queues);
-	for (index = 0; index < max_queues; ++index)
-		if (mlx5_vdpa_virtq_single_resource_prepare(priv,
-			index))
+	if (priv->use_c_thread) {
+		uint32_t main_task_idx[max_queues];
+
+		for (index = 0; index < max_queues; ++index) {
+			thrd_idx = index % (conf_thread_mng.max_thrds + 1);
+			if (!thrd_idx) {
+				main_task_idx[task_num] = index;
+				task_num++;
+				continue;
+			}
+			thrd_idx = priv->last_c_thrd_idx + 1;
+			if (thrd_idx >= conf_thread_mng.max_thrds)
+				thrd_idx = 0;
+			priv->last_c_thrd_idx = thrd_idx;
+			data[0] = index;
+			if (mlx5_vdpa_task_add(priv, thrd_idx,
+				MLX5_VDPA_TASK_PREPARE_VIRTQ,
+				&remaining_cnt, &err_cnt,
+				(void **)&data, 1)) {
+				DRV_LOG(ERR, "Fail to add "
+				"task prepare virtq (%d).", index);
+				main_task_idx[task_num] = index;
+				task_num++;
+			}
+		}
+		for (index = 0; index < task_num; ++index)
+			if (mlx5_vdpa_virtq_single_resource_prepare(priv,
+				main_task_idx[index]))
+				goto error;
+		if (mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining_cnt,
+			&err_cnt, 2000)) {
+			DRV_LOG(ERR,
+			"Failed to wait virt-queue prepare tasks ready.");
 			goto error;
+		}
+	} else {
+		for (index = 0; index < max_queues; ++index)
+			if (mlx5_vdpa_virtq_single_resource_prepare(priv,
+				index))
+				goto error;
+	}
 	if (mlx5_vdpa_is_modify_virtq_supported(priv))
 		if (mlx5_vdpa_steer_update(priv, true))
 			goto error;
 	return 0;
 error:
-	for (index = 0; index < max_queues; ++index) {
-		virtq = &priv->virtqs[index];
-		if (virtq->virtq) {
-			pthread_mutex_lock(&virtq->virtq_lock);
-			mlx5_vdpa_virtq_unset(virtq);
-			pthread_mutex_unlock(&virtq->virtq_lock);
-		}
-	}
-	if (mlx5_vdpa_is_modify_virtq_supported(priv))
-		mlx5_vdpa_steer_unset(priv);
+	mlx5_vdpa_prepare_virtq_destroy(priv);
 	return -1;
 }
 
@@ -860,7 +923,7 @@ static void
 mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv)
 {
 	if (priv->state == MLX5_VDPA_STATE_CONFIGURED)
-		mlx5_vdpa_dev_close(priv->vid);
+		_internal_mlx5_vdpa_dev_close(priv, true);
 	if (priv->use_c_thread)
 		mlx5_vdpa_wait_dev_close_tasks_done(priv);
 	mlx5_vdpa_release_dev_resources(priv);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index f353db62ac..dc4dfba5ed 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -85,6 +85,7 @@ enum mlx5_vdpa_task_type {
 	MLX5_VDPA_TASK_SETUP_VIRTQ,
 	MLX5_VDPA_TASK_STOP_VIRTQ,
 	MLX5_VDPA_TASK_DEV_CLOSE_NOWAIT,
+	MLX5_VDPA_TASK_PREPARE_VIRTQ,
 };
 
 /* Generic task information and size must be multiple of 4B. */
@@ -128,6 +129,9 @@ struct mlx5_vdpa_virtq {
 	uint32_t configured:1;
 	uint32_t enable:1;
 	uint32_t stopped:1;
+	uint32_t rx_csum:1;
+	uint32_t virtio_version_1_0:1;
+	uint32_t event_mode:3;
 	uint32_t version;
 	pthread_mutex_t virtq_lock;
 	struct mlx5_vdpa_priv *priv;
@@ -355,8 +359,12 @@ void mlx5_vdpa_err_event_unset(struct mlx5_vdpa_priv *priv);
  *
  * @param[in] priv
  *   The vdpa driver private structure.
+ * @param[in] release_resource
+ *   The vdpa driver release resource without prepare resource.
  */
-void mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv);
+void
+mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv,
+		bool release_resource);
 
 /**
  * Cleanup cached resources of all virtqs.
@@ -595,4 +603,6 @@ int
 mlx5_vdpa_qps2rst2rts(struct mlx5_vdpa_event_qp *eqp);
 void
 mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq);
+void
+mlx5_vdpa_prepare_virtq_destroy(struct mlx5_vdpa_priv *priv);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index bb2279440b..6e6624e5a3 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -153,6 +153,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
 				__atomic_fetch_add(
 					task.err_cnt, 1, __ATOMIC_RELAXED);
 			}
+			virtq->enable = 1;
 			pthread_mutex_unlock(&virtq->virtq_lock);
 			break;
 		case MLX5_VDPA_TASK_STOP_VIRTQ:
@@ -193,7 +194,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
 			pthread_mutex_lock(&priv->steer_update_lock);
 			mlx5_vdpa_steer_unset(priv);
 			pthread_mutex_unlock(&priv->steer_update_lock);
-			mlx5_vdpa_virtqs_release(priv);
+			mlx5_vdpa_virtqs_release(priv, false);
 			mlx5_vdpa_drain_cq(priv);
 			if (priv->lm_mr.addr)
 				mlx5_os_wrapped_mkey_destroy(
@@ -205,6 +206,18 @@ mlx5_vdpa_c_thread_handle(void *arg)
 				&priv->dev_close_progress, 0,
 				__ATOMIC_RELAXED);
 			break;
+		case MLX5_VDPA_TASK_PREPARE_VIRTQ:
+			ret = mlx5_vdpa_virtq_single_resource_prepare(
+					priv, task.idx);
+			if (ret) {
+				DRV_LOG(ERR,
+				"Failed to prepare virtq %d.",
+				task.idx);
+				__atomic_fetch_add(
+				task.err_cnt, 1,
+				__ATOMIC_RELAXED);
+			}
+			break;
 		default:
 			DRV_LOG(ERR, "Invalid vdpa task type %d.",
 			task.type);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 20ce382487..d4dd73f861 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -116,18 +116,29 @@ mlx5_vdpa_virtq_unreg_intr_handle_all(struct mlx5_vdpa_priv *priv)
 	}
 }
 
+static void
+mlx5_vdpa_vq_destroy(struct mlx5_vdpa_virtq *virtq)
+{
+	/* Clean pre-created resource in dev removal only */
+	claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
+	virtq->index = 0;
+	virtq->virtq = NULL;
+	virtq->configured = 0;
+}
+
 /* Release cached VQ resources. */
 void
 mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 {
 	unsigned int i, j;
 
+	mlx5_vdpa_steer_unset(priv);
 	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
-		if (virtq->index != i)
-			continue;
 		pthread_mutex_lock(&virtq->virtq_lock);
+		if (virtq->virtq)
+			mlx5_vdpa_vq_destroy(virtq);
 		for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
 			if (virtq->umems[j].obj) {
 				claim_zero(mlx5_glue->devx_umem_dereg
@@ -157,29 +168,37 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 		if (ret)
 			DRV_LOG(WARNING, "Failed to stop virtq %d.",
 				virtq->index);
-		claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
-		virtq->index = 0;
-		virtq->virtq = NULL;
-		virtq->configured = 0;
 	}
+	mlx5_vdpa_vq_destroy(virtq);
 	virtq->notifier_state = MLX5_VDPA_NOTIFIER_STATE_DISABLED;
 }
 
 void
-mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv)
+mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv,
+	bool release_resource)
 {
 	struct mlx5_vdpa_virtq *virtq;
-	int i;
-
-	for (i = 0; i < priv->nr_virtqs; i++) {
+	uint32_t i, max_virtq, valid_vq_num;
+
+	valid_vq_num = ((priv->queues * 2) < priv->caps.max_num_virtio_queues) ?
+		(priv->queues * 2) : priv->caps.max_num_virtio_queues;
+	max_virtq = (release_resource &&
+		(valid_vq_num) > priv->nr_virtqs) ?
+		(valid_vq_num) : priv->nr_virtqs;
+	for (i = 0; i < max_virtq; i++) {
 		virtq = &priv->virtqs[i];
 		pthread_mutex_lock(&virtq->virtq_lock);
 		mlx5_vdpa_virtq_unset(virtq);
-		if (i < (priv->queues * 2))
+		virtq->enable = 0;
+		if (!release_resource && i < valid_vq_num)
 			mlx5_vdpa_virtq_single_resource_prepare(
 					priv, i);
 		pthread_mutex_unlock(&virtq->virtq_lock);
 	}
+	if (!release_resource && priv->queues &&
+		mlx5_vdpa_is_modify_virtq_supported(priv))
+		if (mlx5_vdpa_steer_update(priv, true))
+			mlx5_vdpa_steer_unset(priv);
 	priv->features = 0;
 	priv->nr_virtqs = 0;
 }
@@ -455,6 +474,9 @@ mlx5_vdpa_virtq_single_resource_prepare(struct mlx5_vdpa_priv *priv,
 		virtq->priv = priv;
 		if (!virtq->virtq)
 			return true;
+		virtq->rx_csum = attr.rx_csum;
+		virtq->virtio_version_1_0 = attr.virtio_version_1_0;
+		virtq->event_mode = attr.event_mode;
 	}
 	return false;
 }
@@ -538,6 +560,9 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick)
 		goto error;
 	}
 	claim_zero(rte_vhost_enable_guest_notification(priv->vid, index, 1));
+	virtq->rx_csum = attr.rx_csum;
+	virtq->virtio_version_1_0 = attr.virtio_version_1_0;
+	virtq->event_mode = attr.event_mode;
 	virtq->configured = 1;
 	rte_spinlock_lock(&priv->db_lock);
 	rte_write32(virtq->index, priv->virtq_db_addr);
@@ -629,6 +654,31 @@ mlx5_vdpa_features_validate(struct mlx5_vdpa_priv *priv)
 	return 0;
 }
 
+static bool
+mlx5_vdpa_is_pre_created_vq_mismatch(struct mlx5_vdpa_priv *priv,
+		struct mlx5_vdpa_virtq *virtq)
+{
+	struct rte_vhost_vring vq;
+	uint32_t event_mode;
+
+	if (virtq->rx_csum !=
+		!!(priv->features & (1ULL << VIRTIO_NET_F_GUEST_CSUM)))
+		return true;
+	if (virtq->virtio_version_1_0 !=
+		!!(priv->features & (1ULL << VIRTIO_F_VERSION_1)))
+		return true;
+	if (rte_vhost_get_vhost_vring(priv->vid, virtq->index, &vq))
+		return true;
+	if (vq.size != virtq->vq_size)
+		return true;
+	event_mode = vq.callfd != -1 || !(priv->caps.event_mode &
+		(1 << MLX5_VIRTQ_EVENT_MODE_NO_MSIX)) ?
+		MLX5_VIRTQ_EVENT_MODE_QP : MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
+	if (virtq->event_mode != event_mode)
+		return true;
+	return false;
+}
+
 int
 mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 {
@@ -664,6 +714,15 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 			virtq = &priv->virtqs[i];
 			if (!virtq->enable)
 				continue;
+			if (priv->queues && virtq->virtq) {
+				if (mlx5_vdpa_is_pre_created_vq_mismatch(priv, virtq)) {
+					mlx5_vdpa_prepare_virtq_destroy(priv);
+					i = 0;
+					virtq = &priv->virtqs[i];
+					if (!virtq->enable)
+						continue;
+				}
+			}
 			thrd_idx = i % (conf_thread_mng.max_thrds + 1);
 			if (!thrd_idx) {
 				main_task_idx[task_num] = i;
@@ -693,6 +752,7 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 				pthread_mutex_unlock(&virtq->virtq_lock);
 				goto error;
 			}
+			virtq->enable = 1;
 			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
 		if (mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining_cnt,
@@ -724,20 +784,32 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 	} else {
 		for (i = 0; i < nr_vring; i++) {
 			virtq = &priv->virtqs[i];
+			if (!virtq->enable)
+				continue;
+			if (priv->queues && virtq->virtq) {
+				if (mlx5_vdpa_is_pre_created_vq_mismatch(priv,
+					virtq)) {
+					mlx5_vdpa_prepare_virtq_destroy(
+					priv);
+					i = 0;
+					virtq = &priv->virtqs[i];
+					if (!virtq->enable)
+						continue;
+				}
+			}
 			pthread_mutex_lock(&virtq->virtq_lock);
-			if (virtq->enable) {
-				if (mlx5_vdpa_virtq_setup(priv, i, true)) {
-					pthread_mutex_unlock(
+			if (mlx5_vdpa_virtq_setup(priv, i, true)) {
+				pthread_mutex_unlock(
 						&virtq->virtq_lock);
-					goto error;
-				}
+				goto error;
 			}
+			virtq->enable = 1;
 			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
 	}
 	return 0;
 error:
-	mlx5_vdpa_virtqs_release(priv);
+	mlx5_vdpa_virtqs_release(priv, true);
 	return -1;
 }
 
@@ -795,6 +867,11 @@ mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable)
 					"for virtq %d.", index);
 		}
 		mlx5_vdpa_virtq_unset(virtq);
+	} else {
+		if (virtq->virtq &&
+			mlx5_vdpa_is_pre_created_vq_mismatch(priv, virtq))
+			DRV_LOG(WARNING,
+			"Configuration mismatch dummy virtq %d.", index);
 	}
 	if (enable) {
 		ret = mlx5_vdpa_virtq_setup(priv, index, true);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v1 17/17] vdpa/mlx5: prepare virtqueue resource creation
  2022-06-06 11:20 ` [PATCH v1 00/17] Add vDPA multi-threads optimization Li Zhang
                     ` (30 preceding siblings ...)
  2022-06-06 11:21   ` [PATCH 16/16] vdpa/mlx5: prepare virtqueue resource creation Li Zhang
@ 2022-06-06 11:21   ` Li Zhang
  31 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:21 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

Split the virtq virt-queue resource preparation between
the configuration threads.
The virt-queue resources also need to be pre-created again
after virtq destruction.
This accelerates the LM process and reduces its time by 30%.
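
A pre-created (dummy) queue can only be reused if its unmodifiable
attributes still match what the guest negotiated. The short sketch below
illustrates that reuse check with hypothetical structures; it is only an
analogy to the mismatch helper added in this patch, not the driver code.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct precreated_vq {
	bool rx_csum;
	bool version_1_0;
	uint16_t size;
};

struct negotiated_cfg {
	bool rx_csum;
	bool version_1_0;
	uint16_t size;
};

/* Return true when the pre-created queue cannot be reused as-is and
 * must be destroyed and re-created with the negotiated attributes. */
static bool precreated_vq_mismatch(const struct precreated_vq *vq,
				   const struct negotiated_cfg *cfg)
{
	return vq->rx_csum != cfg->rx_csum ||
	       vq->version_1_0 != cfg->version_1_0 ||
	       vq->size != cfg->size;
}

int main(void)
{
	struct precreated_vq vq = { .rx_csum = true, .version_1_0 = true, .size = 256 };
	struct negotiated_cfg cfg = { .rx_csum = true, .version_1_0 = true, .size = 512 };

	printf("reuse possible: %s\n",
	       precreated_vq_mismatch(&vq, &cfg) ? "no" : "yes");
	return 0;
}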

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c         | 115 ++++++++++++++++++++------
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  12 ++-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c |  15 +++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 111 +++++++++++++++++++++----
 4 files changed, 208 insertions(+), 45 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index f006a9cd3f..c5d82872c7 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -275,23 +275,18 @@ mlx5_vdpa_wait_dev_close_tasks_done(struct mlx5_vdpa_priv *priv)
 }
 
 static int
-mlx5_vdpa_dev_close(int vid)
+_internal_mlx5_vdpa_dev_close(struct mlx5_vdpa_priv *priv,
+		bool release_resource)
 {
-	struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid);
-	struct mlx5_vdpa_priv *priv =
-		mlx5_vdpa_find_priv_resource_by_vdev(vdev);
 	int ret = 0;
+	int vid = priv->vid;
 
-	if (priv == NULL) {
-		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
-		return -1;
-	}
 	mlx5_vdpa_cqe_event_unset(priv);
 	if (priv->state == MLX5_VDPA_STATE_CONFIGURED) {
 		ret |= mlx5_vdpa_lm_log(priv);
 		priv->state = MLX5_VDPA_STATE_IN_PROGRESS;
 	}
-	if (priv->use_c_thread) {
+	if (priv->use_c_thread && !release_resource) {
 		if (priv->last_c_thrd_idx >=
 			(conf_thread_mng.max_thrds - 1))
 			priv->last_c_thrd_idx = 0;
@@ -315,7 +310,7 @@ mlx5_vdpa_dev_close(int vid)
 	pthread_mutex_lock(&priv->steer_update_lock);
 	mlx5_vdpa_steer_unset(priv);
 	pthread_mutex_unlock(&priv->steer_update_lock);
-	mlx5_vdpa_virtqs_release(priv);
+	mlx5_vdpa_virtqs_release(priv, release_resource);
 	mlx5_vdpa_drain_cq(priv);
 	if (priv->lm_mr.addr)
 		mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
@@ -329,6 +324,24 @@ mlx5_vdpa_dev_close(int vid)
 	return ret;
 }
 
+static int
+mlx5_vdpa_dev_close(int vid)
+{
+	struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid);
+	struct mlx5_vdpa_priv *priv;
+
+	if (!vdev) {
+		DRV_LOG(ERR, "Invalid vDPA device.");
+		return -1;
+	}
+	priv = mlx5_vdpa_find_priv_resource_by_vdev(vdev);
+	if (priv == NULL) {
+		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
+		return -1;
+	}
+	return _internal_mlx5_vdpa_dev_close(priv, false);
+}
+
 static int
 mlx5_vdpa_dev_config(int vid)
 {
@@ -624,11 +637,33 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 		priv->queue_size);
 }
 
+void
+mlx5_vdpa_prepare_virtq_destroy(struct mlx5_vdpa_priv *priv)
+{
+	uint32_t max_queues, index;
+	struct mlx5_vdpa_virtq *virtq;
+
+	if (!priv->queues || !priv->queue_size)
+		return;
+	max_queues = ((priv->queues * 2) < priv->caps.max_num_virtio_queues) ?
+		(priv->queues * 2) : (priv->caps.max_num_virtio_queues);
+	if (mlx5_vdpa_is_modify_virtq_supported(priv))
+		mlx5_vdpa_steer_unset(priv);
+	for (index = 0; index < max_queues; ++index) {
+		virtq = &priv->virtqs[index];
+		if (virtq->virtq) {
+			pthread_mutex_lock(&virtq->virtq_lock);
+			mlx5_vdpa_virtq_unset(virtq);
+			pthread_mutex_unlock(&virtq->virtq_lock);
+		}
+	}
+}
+
 static int
 mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 {
-	uint32_t max_queues;
-	uint32_t index;
+	uint32_t remaining_cnt = 0, err_cnt = 0, task_num = 0;
+	uint32_t max_queues, index, thrd_idx, data[1];
 	struct mlx5_vdpa_virtq *virtq;
 
 	for (index = 0; index < priv->caps.max_num_virtio_queues;
@@ -640,25 +675,53 @@ mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 		return 0;
 	max_queues = (priv->queues < priv->caps.max_num_virtio_queues) ?
 		(priv->queues * 2) : (priv->caps.max_num_virtio_queues);
-	for (index = 0; index < max_queues; ++index)
-		if (mlx5_vdpa_virtq_single_resource_prepare(priv,
-			index))
+	if (priv->use_c_thread) {
+		uint32_t main_task_idx[max_queues];
+
+		for (index = 0; index < max_queues; ++index) {
+			thrd_idx = index % (conf_thread_mng.max_thrds + 1);
+			if (!thrd_idx) {
+				main_task_idx[task_num] = index;
+				task_num++;
+				continue;
+			}
+			thrd_idx = priv->last_c_thrd_idx + 1;
+			if (thrd_idx >= conf_thread_mng.max_thrds)
+				thrd_idx = 0;
+			priv->last_c_thrd_idx = thrd_idx;
+			data[0] = index;
+			if (mlx5_vdpa_task_add(priv, thrd_idx,
+				MLX5_VDPA_TASK_PREPARE_VIRTQ,
+				&remaining_cnt, &err_cnt,
+				(void **)&data, 1)) {
+				DRV_LOG(ERR, "Fail to add "
+				"task prepare virtq (%d).", index);
+				main_task_idx[task_num] = index;
+				task_num++;
+			}
+		}
+		for (index = 0; index < task_num; ++index)
+			if (mlx5_vdpa_virtq_single_resource_prepare(priv,
+				main_task_idx[index]))
+				goto error;
+		if (mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining_cnt,
+			&err_cnt, 2000)) {
+			DRV_LOG(ERR,
+			"Failed to wait virt-queue prepare tasks ready.");
 			goto error;
+		}
+	} else {
+		for (index = 0; index < max_queues; ++index)
+			if (mlx5_vdpa_virtq_single_resource_prepare(priv,
+				index))
+				goto error;
+	}
 	if (mlx5_vdpa_is_modify_virtq_supported(priv))
 		if (mlx5_vdpa_steer_update(priv, true))
 			goto error;
 	return 0;
 error:
-	for (index = 0; index < max_queues; ++index) {
-		virtq = &priv->virtqs[index];
-		if (virtq->virtq) {
-			pthread_mutex_lock(&virtq->virtq_lock);
-			mlx5_vdpa_virtq_unset(virtq);
-			pthread_mutex_unlock(&virtq->virtq_lock);
-		}
-	}
-	if (mlx5_vdpa_is_modify_virtq_supported(priv))
-		mlx5_vdpa_steer_unset(priv);
+	mlx5_vdpa_prepare_virtq_destroy(priv);
 	return -1;
 }
 
@@ -860,7 +923,7 @@ static void
 mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv)
 {
 	if (priv->state == MLX5_VDPA_STATE_CONFIGURED)
-		mlx5_vdpa_dev_close(priv->vid);
+		_internal_mlx5_vdpa_dev_close(priv, true);
 	if (priv->use_c_thread)
 		mlx5_vdpa_wait_dev_close_tasks_done(priv);
 	mlx5_vdpa_release_dev_resources(priv);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index f353db62ac..dc4dfba5ed 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -85,6 +85,7 @@ enum mlx5_vdpa_task_type {
 	MLX5_VDPA_TASK_SETUP_VIRTQ,
 	MLX5_VDPA_TASK_STOP_VIRTQ,
 	MLX5_VDPA_TASK_DEV_CLOSE_NOWAIT,
+	MLX5_VDPA_TASK_PREPARE_VIRTQ,
 };
 
 /* Generic task information and size must be multiple of 4B. */
@@ -128,6 +129,9 @@ struct mlx5_vdpa_virtq {
 	uint32_t configured:1;
 	uint32_t enable:1;
 	uint32_t stopped:1;
+	uint32_t rx_csum:1;
+	uint32_t virtio_version_1_0:1;
+	uint32_t event_mode:3;
 	uint32_t version;
 	pthread_mutex_t virtq_lock;
 	struct mlx5_vdpa_priv *priv;
@@ -355,8 +359,12 @@ void mlx5_vdpa_err_event_unset(struct mlx5_vdpa_priv *priv);
  *
  * @param[in] priv
  *   The vdpa driver private structure.
+ * @param[in] release_resource
+ *   The vdpa driver release resource without prepare resource.
  */
-void mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv);
+void
+mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv,
+		bool release_resource);
 
 /**
  * Cleanup cached resources of all virtqs.
@@ -595,4 +603,6 @@ int
 mlx5_vdpa_qps2rst2rts(struct mlx5_vdpa_event_qp *eqp);
 void
 mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq);
+void
+mlx5_vdpa_prepare_virtq_destroy(struct mlx5_vdpa_priv *priv);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index bb2279440b..6e6624e5a3 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -153,6 +153,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
 				__atomic_fetch_add(
 					task.err_cnt, 1, __ATOMIC_RELAXED);
 			}
+			virtq->enable = 1;
 			pthread_mutex_unlock(&virtq->virtq_lock);
 			break;
 		case MLX5_VDPA_TASK_STOP_VIRTQ:
@@ -193,7 +194,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
 			pthread_mutex_lock(&priv->steer_update_lock);
 			mlx5_vdpa_steer_unset(priv);
 			pthread_mutex_unlock(&priv->steer_update_lock);
-			mlx5_vdpa_virtqs_release(priv);
+			mlx5_vdpa_virtqs_release(priv, false);
 			mlx5_vdpa_drain_cq(priv);
 			if (priv->lm_mr.addr)
 				mlx5_os_wrapped_mkey_destroy(
@@ -205,6 +206,18 @@ mlx5_vdpa_c_thread_handle(void *arg)
 				&priv->dev_close_progress, 0,
 				__ATOMIC_RELAXED);
 			break;
+		case MLX5_VDPA_TASK_PREPARE_VIRTQ:
+			ret = mlx5_vdpa_virtq_single_resource_prepare(
+					priv, task.idx);
+			if (ret) {
+				DRV_LOG(ERR,
+				"Failed to prepare virtq %d.",
+				task.idx);
+				__atomic_fetch_add(
+				task.err_cnt, 1,
+				__ATOMIC_RELAXED);
+			}
+			break;
 		default:
 			DRV_LOG(ERR, "Invalid vdpa task type %d.",
 			task.type);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 20ce382487..d4dd73f861 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -116,18 +116,29 @@ mlx5_vdpa_virtq_unreg_intr_handle_all(struct mlx5_vdpa_priv *priv)
 	}
 }
 
+static void
+mlx5_vdpa_vq_destroy(struct mlx5_vdpa_virtq *virtq)
+{
+	/* Clean pre-created resource in dev removal only */
+	claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
+	virtq->index = 0;
+	virtq->virtq = NULL;
+	virtq->configured = 0;
+}
+
 /* Release cached VQ resources. */
 void
 mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 {
 	unsigned int i, j;
 
+	mlx5_vdpa_steer_unset(priv);
 	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
-		if (virtq->index != i)
-			continue;
 		pthread_mutex_lock(&virtq->virtq_lock);
+		if (virtq->virtq)
+			mlx5_vdpa_vq_destroy(virtq);
 		for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
 			if (virtq->umems[j].obj) {
 				claim_zero(mlx5_glue->devx_umem_dereg
@@ -157,29 +168,37 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 		if (ret)
 			DRV_LOG(WARNING, "Failed to stop virtq %d.",
 				virtq->index);
-		claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
-		virtq->index = 0;
-		virtq->virtq = NULL;
-		virtq->configured = 0;
 	}
+	mlx5_vdpa_vq_destroy(virtq);
 	virtq->notifier_state = MLX5_VDPA_NOTIFIER_STATE_DISABLED;
 }
 
 void
-mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv)
+mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv,
+	bool release_resource)
 {
 	struct mlx5_vdpa_virtq *virtq;
-	int i;
-
-	for (i = 0; i < priv->nr_virtqs; i++) {
+	uint32_t i, max_virtq, valid_vq_num;
+
+	valid_vq_num = ((priv->queues * 2) < priv->caps.max_num_virtio_queues) ?
+		(priv->queues * 2) : priv->caps.max_num_virtio_queues;
+	max_virtq = (release_resource &&
+		(valid_vq_num) > priv->nr_virtqs) ?
+		(valid_vq_num) : priv->nr_virtqs;
+	for (i = 0; i < max_virtq; i++) {
 		virtq = &priv->virtqs[i];
 		pthread_mutex_lock(&virtq->virtq_lock);
 		mlx5_vdpa_virtq_unset(virtq);
-		if (i < (priv->queues * 2))
+		virtq->enable = 0;
+		if (!release_resource && i < valid_vq_num)
 			mlx5_vdpa_virtq_single_resource_prepare(
 					priv, i);
 		pthread_mutex_unlock(&virtq->virtq_lock);
 	}
+	if (!release_resource && priv->queues &&
+		mlx5_vdpa_is_modify_virtq_supported(priv))
+		if (mlx5_vdpa_steer_update(priv, true))
+			mlx5_vdpa_steer_unset(priv);
 	priv->features = 0;
 	priv->nr_virtqs = 0;
 }
@@ -455,6 +474,9 @@ mlx5_vdpa_virtq_single_resource_prepare(struct mlx5_vdpa_priv *priv,
 		virtq->priv = priv;
 		if (!virtq->virtq)
 			return true;
+		virtq->rx_csum = attr.rx_csum;
+		virtq->virtio_version_1_0 = attr.virtio_version_1_0;
+		virtq->event_mode = attr.event_mode;
 	}
 	return false;
 }
@@ -538,6 +560,9 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick)
 		goto error;
 	}
 	claim_zero(rte_vhost_enable_guest_notification(priv->vid, index, 1));
+	virtq->rx_csum = attr.rx_csum;
+	virtq->virtio_version_1_0 = attr.virtio_version_1_0;
+	virtq->event_mode = attr.event_mode;
 	virtq->configured = 1;
 	rte_spinlock_lock(&priv->db_lock);
 	rte_write32(virtq->index, priv->virtq_db_addr);
@@ -629,6 +654,31 @@ mlx5_vdpa_features_validate(struct mlx5_vdpa_priv *priv)
 	return 0;
 }
 
+static bool
+mlx5_vdpa_is_pre_created_vq_mismatch(struct mlx5_vdpa_priv *priv,
+		struct mlx5_vdpa_virtq *virtq)
+{
+	struct rte_vhost_vring vq;
+	uint32_t event_mode;
+
+	if (virtq->rx_csum !=
+		!!(priv->features & (1ULL << VIRTIO_NET_F_GUEST_CSUM)))
+		return true;
+	if (virtq->virtio_version_1_0 !=
+		!!(priv->features & (1ULL << VIRTIO_F_VERSION_1)))
+		return true;
+	if (rte_vhost_get_vhost_vring(priv->vid, virtq->index, &vq))
+		return true;
+	if (vq.size != virtq->vq_size)
+		return true;
+	event_mode = vq.callfd != -1 || !(priv->caps.event_mode &
+		(1 << MLX5_VIRTQ_EVENT_MODE_NO_MSIX)) ?
+		MLX5_VIRTQ_EVENT_MODE_QP : MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
+	if (virtq->event_mode != event_mode)
+		return true;
+	return false;
+}
+
 int
 mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 {
@@ -664,6 +714,15 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 			virtq = &priv->virtqs[i];
 			if (!virtq->enable)
 				continue;
+			if (priv->queues && virtq->virtq) {
+				if (mlx5_vdpa_is_pre_created_vq_mismatch(priv, virtq)) {
+					mlx5_vdpa_prepare_virtq_destroy(priv);
+					i = 0;
+					virtq = &priv->virtqs[i];
+					if (!virtq->enable)
+						continue;
+				}
+			}
 			thrd_idx = i % (conf_thread_mng.max_thrds + 1);
 			if (!thrd_idx) {
 				main_task_idx[task_num] = i;
@@ -693,6 +752,7 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 				pthread_mutex_unlock(&virtq->virtq_lock);
 				goto error;
 			}
+			virtq->enable = 1;
 			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
 		if (mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining_cnt,
@@ -724,20 +784,32 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 	} else {
 		for (i = 0; i < nr_vring; i++) {
 			virtq = &priv->virtqs[i];
+			if (!virtq->enable)
+				continue;
+			if (priv->queues && virtq->virtq) {
+				if (mlx5_vdpa_is_pre_created_vq_mismatch(priv,
+					virtq)) {
+					mlx5_vdpa_prepare_virtq_destroy(
+					priv);
+					i = 0;
+					virtq = &priv->virtqs[i];
+					if (!virtq->enable)
+						continue;
+				}
+			}
 			pthread_mutex_lock(&virtq->virtq_lock);
-			if (virtq->enable) {
-				if (mlx5_vdpa_virtq_setup(priv, i, true)) {
-					pthread_mutex_unlock(
+			if (mlx5_vdpa_virtq_setup(priv, i, true)) {
+				pthread_mutex_unlock(
 						&virtq->virtq_lock);
-					goto error;
-				}
+				goto error;
 			}
+			virtq->enable = 1;
 			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
 	}
 	return 0;
 error:
-	mlx5_vdpa_virtqs_release(priv);
+	mlx5_vdpa_virtqs_release(priv, true);
 	return -1;
 }
 
@@ -795,6 +867,11 @@ mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable)
 					"for virtq %d.", index);
 		}
 		mlx5_vdpa_virtq_unset(virtq);
+	} else {
+		if (virtq->virtq &&
+			mlx5_vdpa_is_pre_created_vq_mismatch(priv, virtq))
+			DRV_LOG(WARNING,
+			"Configuration mismatch dummy virtq %d.", index);
 	}
 	if (enable) {
 		ret = mlx5_vdpa_virtq_setup(priv, index, true);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v1 00/17] Add vDPA multi-threads optimization
  2022-04-08  7:55 [RFC 00/15] Add vDPA multi-threads optimization Li Zhang
                   ` (15 preceding siblings ...)
  2022-06-06 11:20 ` [PATCH v1 00/17] Add vDPA multi-threads optimization Li Zhang
@ 2022-06-06 11:46 ` Li Zhang
  2022-06-06 11:46   ` [PATCH v1 01/17] vdpa/mlx5: fix usage of capability for max number of virtqs Li Zhang
                     ` (16 more replies)
  2022-06-16  2:29 ` [PATCH v2 00/15] mlx5/vdpa: optimize live migration time Li Zhang
                   ` (2 subsequent siblings)
  19 siblings, 17 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:46 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

Allow the driver to use internal threads to
obtain fast configuration.
All the threads will be opened on the same core as
the event completion queue scheduling thread.

Add max_conf_threads parameter to configure
the maximum number of internal threads in addition to
the caller thread (8 is suggested).
These internal threads pipeline the handling of vDPA tasks
in the system and are shared among all vDPA devices.
The default is 0: do not use internal threads for configuration.
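
To make the internal-thread idea concrete, here is a deliberately
simplified, self-contained pthread sketch: a small pool of configuration
threads pulls tasks, and a "remaining" counter lets the caller know when
all of them completed. This is only an assumption-level illustration; the
driver itself uses a ring of task descriptors and a dedicated wait helper,
as introduced by the later patches in this series.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define NUM_THREADS 3
#define NUM_TASKS 12

static atomic_int next_task;   /* Next task index to pick up. */
static atomic_int remaining;   /* Tasks not yet completed. */

static void do_config_task(int idx)
{
	/* Stand-in for one virtq configuration step. */
	printf("task %d done\n", idx);
}

static void *config_thread(void *arg)
{
	(void)arg;
	for (;;) {
		int idx = atomic_fetch_add(&next_task, 1);

		if (idx >= NUM_TASKS)
			return NULL;
		do_config_task(idx);
		atomic_fetch_sub(&remaining, 1);
	}
}

int main(void)
{
	pthread_t tid[NUM_THREADS];
	int i;

	atomic_store(&remaining, NUM_TASKS);
	for (i = 0; i < NUM_THREADS; i++)
		pthread_create(&tid[i], NULL, config_thread, NULL);
	/* The caller thread also helps, then waits for everything to drain. */
	config_thread(NULL);
	for (i = 0; i < NUM_THREADS; i++)
		pthread_join(tid[i], NULL);
	printf("remaining = %d\n", atomic_load(&remaining));
	return 0;
}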

Depends-on: series=21868 ("vdpa/mlx5: improve device shutdown time")
http://patchwork.dpdk.org/project/dpdk/list/?series=21868

RFC ("Add vDPA multi-threads optimization")
https://patchwork.dpdk.org/project/dpdk/cover/20220408075606.33056-1-lizh@nvidia.com/

Li Zhang (12):
  vdpa/mlx5: fix usage of capability for max number of virtqs
  common/mlx5: extend virtq modifiable fields
  vdpa/mlx5: pre-create virtq in the prob
  vdpa/mlx5: optimize datapath-control synchronization
  vdpa/mlx5: add multi-thread management for configuration
  vdpa/mlx5: add task ring for MT management
  vdpa/mlx5: add MT task for VM memory registration
  vdpa/mlx5: add virtq creation task for MT management
  vdpa/mlx5: add virtq LM log task
  vdpa/mlx5: add device close task
  vdpa/mlx5: add virtq sub-resources creation
  vdpa/mlx5: prepare virtqueue resource creation

Yajun Wu (5):
  eal: add device removal in rte cleanup
  examples/vdpa: fix devices cleanup
  vdpa/mlx5: support pre create virtq resource
  common/mlx5: add DevX API to move QP to reset state
  vdpa/mlx5: support event qp reuse

 doc/guides/vdpadevs/mlx5.rst          |  25 +
 drivers/common/mlx5/mlx5_devx_cmds.c  |  77 ++-
 drivers/common/mlx5/mlx5_devx_cmds.h  |   6 +-
 drivers/common/mlx5/mlx5_prm.h        |  30 +-
 drivers/vdpa/mlx5/meson.build         |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.c         | 270 +++++++++--
 drivers/vdpa/mlx5/mlx5_vdpa.h         | 152 +++++-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 360 ++++++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_event.c   | 160 +++++--
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c      | 128 ++++-
 drivers/vdpa/mlx5/mlx5_vdpa_mem.c     | 270 +++++++----
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c   |  22 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 654 +++++++++++++++++++-------
 examples/vdpa/main.c                  |   5 +-
 lib/eal/freebsd/eal.c                 |  33 ++
 lib/eal/include/rte_dev.h             |   6 +
 lib/eal/linux/eal.c                   |  33 ++
 lib/eal/windows/eal.c                 |  33 ++
 18 files changed, 1878 insertions(+), 387 deletions(-)
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c

-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v1 01/17] vdpa/mlx5: fix usage of capability for max number of virtqs
  2022-06-06 11:46 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
@ 2022-06-06 11:46   ` Li Zhang
  2022-06-06 11:46   ` [PATCH v1 02/17] eal: add device removal in rte cleanup Li Zhang
                     ` (15 subsequent siblings)
  16 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:46 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs, Maxime Coquelin
  Cc: dev, thomas, rasland, roniba, stable

The driver wrongly treats the capability value as the number of
virtq pairs instead of as the number of virtqs.

Adjust all of its usages to be the number of virtqs.
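
A tiny worked illustration of the corrected semantics (hypothetical
values, mirroring the division by two added in mlx5_vdpa_get_queue_num):

#include <stdio.h>

int main(void)
{
	unsigned int max_num_virtio_queues = 256;            /* HW capability: virtqs. */
	unsigned int queue_num = max_num_virtio_queues / 2;  /* Reported queue pairs. */

	printf("%u virtqs -> %u queue pairs\n", max_num_virtio_queues, queue_num);
	return 0;
}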

Fixes: c2eb33a ("vdpa/mlx5: manage virtqs by array")
Cc: stable@dpdk.org

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 12 ++++++------
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c |  6 +++---
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 76fa5d4299..ee71339b78 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -84,7 +84,7 @@ mlx5_vdpa_get_queue_num(struct rte_vdpa_device *vdev, uint32_t *queue_num)
 		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
 		return -1;
 	}
-	*queue_num = priv->caps.max_num_virtio_queues;
+	*queue_num = priv->caps.max_num_virtio_queues / 2;
 	return 0;
 }
 
@@ -141,7 +141,7 @@ mlx5_vdpa_set_vring_state(int vid, int vring, int state)
 		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
 		return -EINVAL;
 	}
-	if (vring >= (int)priv->caps.max_num_virtio_queues * 2) {
+	if (vring >= (int)priv->caps.max_num_virtio_queues) {
 		DRV_LOG(ERR, "Too big vring id: %d.", vring);
 		return -E2BIG;
 	}
@@ -388,7 +388,7 @@ mlx5_vdpa_get_stats(struct rte_vdpa_device *vdev, int qid,
 		DRV_LOG(ERR, "Invalid device: %s.", vdev->device->name);
 		return -ENODEV;
 	}
-	if (qid >= (int)priv->caps.max_num_virtio_queues * 2) {
+	if (qid >= (int)priv->caps.max_num_virtio_queues) {
 		DRV_LOG(ERR, "Too big vring id: %d for device %s.", qid,
 				vdev->device->name);
 		return -E2BIG;
@@ -411,7 +411,7 @@ mlx5_vdpa_reset_stats(struct rte_vdpa_device *vdev, int qid)
 		DRV_LOG(ERR, "Invalid device: %s.", vdev->device->name);
 		return -ENODEV;
 	}
-	if (qid >= (int)priv->caps.max_num_virtio_queues * 2) {
+	if (qid >= (int)priv->caps.max_num_virtio_queues) {
 		DRV_LOG(ERR, "Too big vring id: %d for device %s.", qid,
 				vdev->device->name);
 		return -E2BIG;
@@ -624,7 +624,7 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 		DRV_LOG(DEBUG, "No capability to support virtq statistics.");
 	priv = rte_zmalloc("mlx5 vDPA device private", sizeof(*priv) +
 			   sizeof(struct mlx5_vdpa_virtq) *
-			   attr->vdpa.max_num_virtio_queues * 2,
+			   attr->vdpa.max_num_virtio_queues,
 			   RTE_CACHE_LINE_SIZE);
 	if (!priv) {
 		DRV_LOG(ERR, "Failed to allocate private memory.");
@@ -685,7 +685,7 @@ mlx5_vdpa_release_dev_resources(struct mlx5_vdpa_priv *priv)
 	uint32_t i;
 
 	mlx5_vdpa_dev_cache_clean(priv);
-	for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
+	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		if (!priv->virtqs[i].counters)
 			continue;
 		claim_zero(mlx5_devx_cmd_destroy(priv->virtqs[i].counters));
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index e025be47d2..c258eb3024 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -72,7 +72,7 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 {
 	unsigned int i, j;
 
-	for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
+	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
 		for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
@@ -492,9 +492,9 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 		DRV_LOG(INFO, "TSO is enabled without CSUM, force CSUM.");
 		priv->features |= (1ULL << VIRTIO_NET_F_CSUM);
 	}
-	if (nr_vring > priv->caps.max_num_virtio_queues * 2) {
+	if (nr_vring > priv->caps.max_num_virtio_queues) {
 		DRV_LOG(ERR, "Do not support more than %d virtqs(%d).",
-			(int)priv->caps.max_num_virtio_queues * 2,
+			(int)priv->caps.max_num_virtio_queues,
 			(int)nr_vring);
 		return -1;
 	}
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v1 02/17] eal: add device removal in rte cleanup
  2022-06-06 11:46 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
  2022-06-06 11:46   ` [PATCH v1 01/17] vdpa/mlx5: fix usage of capability for max number of virtqs Li Zhang
@ 2022-06-06 11:46   ` Li Zhang
  2022-06-06 11:46   ` [PATCH v1 03/17] examples/vdpa: fix devices cleanup Li Zhang
                     ` (14 subsequent siblings)
  16 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:46 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs, Bruce Richardson,
	Dmitry Kozlyuk, Narcisa Ana Maria Vasile, Dmitry Malloy,
	Pallavi Kadam
  Cc: dev, thomas, rasland, roniba, Yajun Wu, stable

From: Yajun Wu <yajunw@nvidia.com>

Add device removal to rte_eal_cleanup. This is the last chance for
device remove to be called, as a sanity measure. Loop over the vdev
bus first and then over all buses, calling rte_dev_remove for every
device.
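
As a rough, self-contained illustration of why the loop needs a "safe"
iterator (the list, device names and remove_dev() below are invented
for the sketch; the real code uses rte_dev_iterator and
rte_dev_remove): the next cursor is fetched before the current device
is removed, which is what the extra tdev argument of
RTE_DEV_FOREACH_SAFE provides.

#include <stdio.h>
#include <stdlib.h>

/* Minimal device list; stands in for a bus' device list. */
struct dev {
	const char *name;
	struct dev *next;
};

/* Stand-in for rte_dev_remove(): unlinks and frees the device. */
static void remove_dev(struct dev **head, struct dev *d)
{
	for (struct dev **p = head; *p != NULL; p = &(*p)->next) {
		if (*p == d) {
			*p = d->next;
			printf("removed %s\n", d->name);
			free(d);
			return;
		}
	}
}

int main(void)
{
	struct dev *head = NULL;
	const char *names[] = {"net_vdev0", "0000:03:00.2", "0000:03:00.3"};

	for (int i = 2; i >= 0; i--) {
		struct dev *d = malloc(sizeof(*d));

		if (d == NULL)
			return -1;
		d->name = names[i];
		d->next = head;
		head = d;
	}
	/*
	 * "Safe" traversal: fetch the next element before removing the
	 * current one, exactly what RTE_DEV_FOREACH_SAFE does with its
	 * extra tdev cursor.
	 */
	struct dev *d, *tdev;

	for (d = head; d != NULL; d = tdev) {
		tdev = d->next;
		remove_dev(&head, d);
	}
	return 0;
}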

Cc: stable@dpdk.org

Signed-off-by: Yajun Wu <yajunw@nvidia.com>
---
 lib/eal/freebsd/eal.c     | 33 +++++++++++++++++++++++++++++++++
 lib/eal/include/rte_dev.h |  6 ++++++
 lib/eal/linux/eal.c       | 33 +++++++++++++++++++++++++++++++++
 lib/eal/windows/eal.c     | 33 +++++++++++++++++++++++++++++++++
 4 files changed, 105 insertions(+)

diff --git a/lib/eal/freebsd/eal.c b/lib/eal/freebsd/eal.c
index a6b20960f2..5ffd9146b6 100644
--- a/lib/eal/freebsd/eal.c
+++ b/lib/eal/freebsd/eal.c
@@ -886,11 +886,44 @@ rte_eal_init(int argc, char **argv)
 	return fctret;
 }
 
+static int
+bus_match_all(const struct rte_bus *bus, const void *data)
+{
+	RTE_SET_USED(bus);
+	RTE_SET_USED(data);
+	return 0;
+}
+
+static void
+remove_all_device(void)
+{
+	struct rte_bus *start = NULL, *next;
+	struct rte_dev_iterator dev_iter = {0};
+	struct rte_device *dev = NULL;
+	struct rte_device *tdev = NULL;
+	char devstr[128];
+
+	RTE_DEV_FOREACH_SAFE(dev, "bus=vdev", &dev_iter, tdev) {
+		(void)rte_dev_remove(dev);
+	}
+	while ((next = rte_bus_find(start, bus_match_all, NULL)) != NULL) {
+		start = next;
+		/* Skip buses that don't have iterate method */
+		if (!next->dev_iterate || !next->name)
+			continue;
+		snprintf(devstr, sizeof(devstr), "bus=%s", next->name);
+		RTE_DEV_FOREACH_SAFE(dev, devstr, &dev_iter, tdev) {
+			(void)rte_dev_remove(dev);
+		}
+	};
+}
+
 int
 rte_eal_cleanup(void)
 {
 	struct internal_config *internal_conf =
 		eal_get_internal_configuration();
+	remove_all_device();
 	rte_service_finalize();
 	rte_mp_channel_cleanup();
 	/* after this point, any DPDK pointers will become dangling */
diff --git a/lib/eal/include/rte_dev.h b/lib/eal/include/rte_dev.h
index e6ff1218f9..382d548ea3 100644
--- a/lib/eal/include/rte_dev.h
+++ b/lib/eal/include/rte_dev.h
@@ -492,6 +492,12 @@ int
 rte_dev_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova,
 		  size_t len);
 
+#define RTE_DEV_FOREACH_SAFE(dev, devstr, it, tdev) \
+	for (rte_dev_iterator_init(it, devstr), \
+		(dev) = rte_dev_iterator_next(it); \
+		(dev) && ((tdev) = rte_dev_iterator_next(it), 1); \
+		(dev) = (tdev))
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
index 1ef263434a..30b295916e 100644
--- a/lib/eal/linux/eal.c
+++ b/lib/eal/linux/eal.c
@@ -1248,6 +1248,38 @@ mark_freeable(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
 	return 0;
 }
 
+static int
+bus_match_all(const struct rte_bus *bus, const void *data)
+{
+	RTE_SET_USED(bus);
+	RTE_SET_USED(data);
+	return 0;
+}
+
+static void
+remove_all_device(void)
+{
+	struct rte_bus *start = NULL, *next;
+	struct rte_dev_iterator dev_iter = {0};
+	struct rte_device *dev = NULL;
+	struct rte_device *tdev = NULL;
+	char devstr[128];
+
+	RTE_DEV_FOREACH_SAFE(dev, "bus=vdev", &dev_iter, tdev) {
+		(void)rte_dev_remove(dev);
+	}
+	while ((next = rte_bus_find(start, bus_match_all, NULL)) != NULL) {
+		start = next;
+		/* Skip buses that don't have iterate method */
+		if (!next->dev_iterate || !next->name)
+			continue;
+		snprintf(devstr, sizeof(devstr), "bus=%s", next->name);
+		RTE_DEV_FOREACH_SAFE(dev, devstr, &dev_iter, tdev) {
+			(void)rte_dev_remove(dev);
+		}
+	};
+}
+
 int
 rte_eal_cleanup(void)
 {
@@ -1257,6 +1289,7 @@ rte_eal_cleanup(void)
 	struct internal_config *internal_conf =
 		eal_get_internal_configuration();
 
+	remove_all_device();
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY &&
 			internal_conf->hugepage_file.unlink_existing)
 		rte_memseg_walk(mark_freeable, NULL);
diff --git a/lib/eal/windows/eal.c b/lib/eal/windows/eal.c
index 122de2a319..3d7d411293 100644
--- a/lib/eal/windows/eal.c
+++ b/lib/eal/windows/eal.c
@@ -254,12 +254,45 @@ __rte_trace_point_register(rte_trace_point_t *trace, const char *name,
 	return -ENOTSUP;
 }
 
+static int
+bus_match_all(const struct rte_bus *bus, const void *data)
+{
+	RTE_SET_USED(bus);
+	RTE_SET_USED(data);
+	return 0;
+}
+
+static void
+remove_all_device(void)
+{
+	struct rte_bus *start = NULL, *next;
+	struct rte_dev_iterator dev_iter = {0};
+	struct rte_device *dev = NULL;
+	struct rte_device *tdev = NULL;
+	char devstr[128];
+
+	RTE_DEV_FOREACH_SAFE(dev, "bus=vdev", &dev_iter, tdev) {
+		(void)rte_dev_remove(dev);
+	}
+	while ((next = rte_bus_find(start, bus_match_all, NULL)) != NULL) {
+		start = next;
+		/* Skip buses that don't have iterate method */
+		if (!next->dev_iterate || !next->name)
+			continue;
+		snprintf(devstr, sizeof(devstr), "bus=%s", next->name);
+		RTE_DEV_FOREACH_SAFE(dev, devstr, &dev_iter, tdev) {
+			(void)rte_dev_remove(dev);
+		}
+	};
+}
+
 int
 rte_eal_cleanup(void)
 {
 	struct internal_config *internal_conf =
 		eal_get_internal_configuration();
 
+	remove_all_device();
 	eal_intr_thread_cancel();
 	eal_mem_virt2iova_cleanup();
 	/* after this point, any DPDK pointers will become dangling */
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v1 03/17] examples/vdpa: fix devices cleanup
  2022-06-06 11:46 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
  2022-06-06 11:46   ` [PATCH v1 01/17] vdpa/mlx5: fix usage of capability for max number of virtqs Li Zhang
  2022-06-06 11:46   ` [PATCH v1 02/17] eal: add device removal in rte cleanup Li Zhang
@ 2022-06-06 11:46   ` Li Zhang
  2022-06-06 11:46   ` [PATCH v1 04/17] vdpa/mlx5: support pre create virtq resource Li Zhang
                     ` (13 subsequent siblings)
  16 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:46 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs, Maxime Coquelin, Chenbo Xia,
	Chengchang Tang
  Cc: dev, thomas, rasland, roniba, Yajun Wu, stable

From: Yajun Wu <yajunw@nvidia.com>

Move rte_eal_cleanup into the vdpa_sample_quit function, which handles
all the example application exit paths.
Otherwise rte_eal_cleanup is not called when a signal such as
SIGINT (Ctrl+C) is received.
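
A small, self-contained sketch of the resulting flow, with stub
functions standing in for the example's real ones (the signal_handler
name here is only illustrative): both the signal path and the normal
exit path now end in the same quit function, which is where the EAL
cleanup lives.

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

/* Stub for rte_eal_cleanup(); the real call is in the DPDK EAL. */
static void eal_cleanup(void)
{
	printf("EAL cleaned up\n");
}

/* Models vdpa_sample_quit(): close the vDPA ports, then clean the EAL. */
static void vdpa_sample_quit(void)
{
	printf("closing vDPA ports\n");
	eal_cleanup();
}

/* Illustrative handler (kept simple, not async-signal-safe): the signal
 * path and the normal exit path share the same quit function. */
static void signal_handler(int signum)
{
	(void)signum;
	vdpa_sample_quit();
	exit(0);
}

int main(void)
{
	signal(SIGINT, signal_handler);
	/* ... interactive loop would run here; Ctrl+C now reaches cleanup. */
	vdpa_sample_quit();
	return 0;
}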

Fixes: 10aa3757 ("examples: add eal cleanup to examples")
Cc: stable@dpdk.org

Signed-off-by: Yajun Wu <yajunw@nvidia.com>
---
 examples/vdpa/main.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/examples/vdpa/main.c b/examples/vdpa/main.c
index 7e11ef4e26..62e32b633d 100644
--- a/examples/vdpa/main.c
+++ b/examples/vdpa/main.c
@@ -286,6 +286,8 @@ vdpa_sample_quit(void)
 		if (vports[i].ifname[0] != '\0')
 			close_vdpa(&vports[i]);
 	}
+	/* clean up the EAL */
+	rte_eal_cleanup();
 }
 
 static void
@@ -632,8 +634,5 @@ main(int argc, char *argv[])
 		vdpa_sample_quit();
 	}
 
-	/* clean up the EAL */
-	rte_eal_cleanup();
-
 	return 0;
 }
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v1 04/17] vdpa/mlx5: support pre create virtq resource
  2022-06-06 11:46 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                     ` (2 preceding siblings ...)
  2022-06-06 11:46   ` [PATCH v1 03/17] examples/vdpa: fix devices cleanup Li Zhang
@ 2022-06-06 11:46   ` Li Zhang
  2022-06-06 11:46   ` [PATCH v1 05/17] common/mlx5: add DevX API to move QP to reset state Li Zhang
                     ` (12 subsequent siblings)
  16 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:46 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba, Yajun Wu

From: Yajun Wu <yajunw@nvidia.com>

The motivation of this change is to reduce the vDPA device queue
creation time by creating some queue resources in the vDPA device
probe stage.

In a VM live migration scenario, this saves about 0.8 ms per queue
creation and thus reduces the LM network downtime.

To create the queue resources (umem/counter) in advance, the virtio
queue depth and the maximum number of queues the VM will use must be
known.

Introduce two new devargs: queues (maximum number of queue pairs) and
queue_size (queue depth). Both arguments must be provided; if only one
of them is given, it is ignored and no pre-creation is done.

The queues and queue_size values must also match the vhost
configuration the driver receives later. Otherwise the pre-created
resources are either wasted or insufficient, or they must be destroyed
and recreated (in case of a queue_size mismatch).

The pre-created umem/counter resources are kept alive until the vDPA
device is removed.
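
A simplified, self-contained sketch of the pre-creation sizing
described above (the capability coefficients and queue counts below
are placeholders; the real values come from the HCA capabilities and
the two devargs): each virtq gets three umems whose sizes are linear
in the queue depth, allocated and registered once at probe time.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for priv->caps.umems[i]: per-umem linear size coefficients. */
struct umem_cap { uint32_t a; uint32_t b; };

int main(void)
{
	/* Placeholder capability values; the real ones come from the HCA. */
	const struct umem_cap caps[3] = { {128, 4096}, {64, 4096}, {16, 4096} };
	const uint16_t queue_size = 256;	/* queue_size devarg */
	const uint16_t queues = 4;		/* queues devarg */

	/* Each queue pair has one Rx and one Tx virtq, each with 3 umems. */
	for (uint32_t index = 0; index < (uint32_t)queues * 2; index++) {
		for (int i = 0; i < 3; i++) {
			uint32_t size = caps[i].a * queue_size + caps[i].b;
			/* The driver allocates this with 4KB alignment and
			 * registers it as a DevX umem kept until removal. */
			void *buf = calloc(1, size);

			if (buf == NULL)
				return -1;
			printf("virtq %u umem %d: %u bytes\n", index, i, size);
			free(buf);
		}
	}
	return 0;
}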

Signed-off-by: Yajun Wu <yajunw@nvidia.com>
---
 doc/guides/vdpadevs/mlx5.rst  | 14 +++++++
 drivers/vdpa/mlx5/mlx5_vdpa.c | 75 ++++++++++++++++++++++++++++++++++-
 drivers/vdpa/mlx5/mlx5_vdpa.h |  2 +
 3 files changed, 89 insertions(+), 2 deletions(-)

diff --git a/doc/guides/vdpadevs/mlx5.rst b/doc/guides/vdpadevs/mlx5.rst
index 3ded142311..0ad77bf535 100644
--- a/doc/guides/vdpadevs/mlx5.rst
+++ b/doc/guides/vdpadevs/mlx5.rst
@@ -101,6 +101,20 @@ for an additional list of options shared with other mlx5 drivers.
 
   - 0, HW default.
 
+- ``queue_size`` parameter [int]
+
+  - 1 - 1024, Virtio queue depth for pre-creating queue resources to speed up
+    first-time queue creation. Set it together with the queues devarg.
+
+  - 0, default value, no pre-create virtq resource.
+
+- ``queues`` parameter [int]
+
+  - 1 - 128, Max number of virtio queue pairs (each including 1 Rx queue and
+    1 Tx queue) for pre-creating queue resources to speed up first-time queue
+    creation. Set it together with the queue_size devarg.
+
+  - 0, default value, no pre-create virtq resource.
 
 Error handling
 ^^^^^^^^^^^^^^
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index ee71339b78..faf833ee2f 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -244,7 +244,9 @@ mlx5_vdpa_mtu_set(struct mlx5_vdpa_priv *priv)
 static void
 mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv)
 {
-	mlx5_vdpa_virtqs_cleanup(priv);
+	/* Clean pre-created resource in dev removal only. */
+	if (!priv->queues)
+		mlx5_vdpa_virtqs_cleanup(priv);
 	mlx5_vdpa_mem_dereg(priv);
 }
 
@@ -494,6 +496,12 @@ mlx5_vdpa_args_check_handler(const char *key, const char *val, void *opaque)
 		priv->hw_max_latency_us = (uint32_t)tmp;
 	} else if (strcmp(key, "hw_max_pending_comp") == 0) {
 		priv->hw_max_pending_comp = (uint32_t)tmp;
+	} else if (strcmp(key, "queue_size") == 0) {
+		priv->queue_size = (uint16_t)tmp;
+	} else if (strcmp(key, "queues") == 0) {
+		priv->queues = (uint16_t)tmp;
+	} else {
+		DRV_LOG(WARNING, "Invalid key %s.", key);
 	}
 	return 0;
 }
@@ -524,9 +532,68 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 	if (!priv->event_us &&
 	    priv->event_mode == MLX5_VDPA_EVENT_MODE_DYNAMIC_TIMER)
 		priv->event_us = MLX5_VDPA_DEFAULT_TIMER_STEP_US;
+	if ((priv->queue_size && !priv->queues) ||
+		(!priv->queue_size && priv->queues)) {
+		priv->queue_size = 0;
+		priv->queues = 0;
+		DRV_LOG(WARNING, "Please provide both queue_size and queues.");
+	}
 	DRV_LOG(DEBUG, "event mode is %d.", priv->event_mode);
 	DRV_LOG(DEBUG, "event_us is %u us.", priv->event_us);
 	DRV_LOG(DEBUG, "no traffic max is %u.", priv->no_traffic_max);
+	DRV_LOG(DEBUG, "queues is %u, queue_size is %u.", priv->queues,
+		priv->queue_size);
+}
+
+static int
+mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
+{
+	uint32_t index;
+	uint32_t i;
+
+	if (!priv->queues)
+		return 0;
+	for (index = 0; index < (priv->queues * 2); ++index) {
+		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
+
+		if (priv->caps.queue_counters_valid) {
+			if (!virtq->counters)
+				virtq->counters =
+					mlx5_devx_cmd_create_virtio_q_counters
+						(priv->cdev->ctx);
+			if (!virtq->counters) {
+				DRV_LOG(ERR, "Failed to create virtq couners for virtq"
+					" %d.", index);
+				return -1;
+			}
+		}
+		for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
+			uint32_t size;
+			void *buf;
+			struct mlx5dv_devx_umem *obj;
+
+			size = priv->caps.umems[i].a * priv->queue_size +
+					priv->caps.umems[i].b;
+			buf = rte_zmalloc(__func__, size, 4096);
+			if (buf == NULL) {
+				DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
+						" %u.", i, index);
+				return -1;
+			}
+			obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx, buf,
+					size, IBV_ACCESS_LOCAL_WRITE);
+			if (obj == NULL) {
+				rte_free(buf);
+				DRV_LOG(ERR, "Failed to register umem %d for virtq %u.",
+						i, index);
+				return -1;
+			}
+			virtq->umems[i].size = size;
+			virtq->umems[i].buf = buf;
+			virtq->umems[i].obj = obj;
+		}
+	}
+	return 0;
 }
 
 static int
@@ -604,6 +671,8 @@ mlx5_vdpa_create_dev_resources(struct mlx5_vdpa_priv *priv)
 		return -rte_errno;
 	if (mlx5_vdpa_event_qp_global_prepare(priv))
 		return -rte_errno;
+	if (mlx5_vdpa_virtq_resource_prepare(priv))
+		return -rte_errno;
 	return 0;
 }
 
@@ -638,6 +707,7 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 		priv->num_lag_ports = 1;
 	pthread_mutex_init(&priv->vq_config_lock, NULL);
 	priv->cdev = cdev;
+	mlx5_vdpa_config_get(mkvlist, priv);
 	if (mlx5_vdpa_create_dev_resources(priv))
 		goto error;
 	priv->vdev = rte_vdpa_register_device(cdev->dev, &mlx5_vdpa_ops);
@@ -646,7 +716,6 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 		rte_errno = rte_errno ? rte_errno : EINVAL;
 		goto error;
 	}
-	mlx5_vdpa_config_get(mkvlist, priv);
 	SLIST_INIT(&priv->mr_list);
 	pthread_mutex_lock(&priv_list_lock);
 	TAILQ_INSERT_TAIL(&priv_list, priv, next);
@@ -684,6 +753,8 @@ mlx5_vdpa_release_dev_resources(struct mlx5_vdpa_priv *priv)
 {
 	uint32_t i;
 
+	if (priv->queues)
+		mlx5_vdpa_virtqs_cleanup(priv);
 	mlx5_vdpa_dev_cache_clean(priv);
 	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		if (!priv->virtqs[i].counters)
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index e7f3319f89..f6719a3c60 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -135,6 +135,8 @@ struct mlx5_vdpa_priv {
 	uint8_t hw_latency_mode; /* Hardware CQ moderation mode. */
 	uint16_t hw_max_latency_us; /* Hardware CQ moderation period in usec. */
 	uint16_t hw_max_pending_comp; /* Hardware CQ moderation counter. */
+	uint16_t queue_size; /* virtq depth for pre-creating virtq resource */
+	uint16_t queues; /* Max virtq pair for pre-creating virtq resource */
 	struct rte_vdpa_device *vdev; /* vDPA device. */
 	struct mlx5_common_device *cdev; /* Backend mlx5 device. */
 	int vid; /* vhost device id. */
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v1 05/17] common/mlx5: add DevX API to move QP to reset state
  2022-06-06 11:46 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                     ` (3 preceding siblings ...)
  2022-06-06 11:46   ` [PATCH v1 04/17] vdpa/mlx5: support pre create virtq resource Li Zhang
@ 2022-06-06 11:46   ` Li Zhang
  2022-06-06 11:46   ` [PATCH v1 06/17] vdpa/mlx5: support event qp reuse Li Zhang
                     ` (11 subsequent siblings)
  16 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:46 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba, Yajun Wu

From: Yajun Wu <yajunw@nvidia.com>

Support setting the QP to the RESET state.

Signed-off-by: Yajun Wu <yajunw@nvidia.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c |  7 +++++++
 drivers/common/mlx5/mlx5_prm.h       | 17 +++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index c6bdbc12bb..1d6d6578d6 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -2264,11 +2264,13 @@ mlx5_devx_cmd_modify_qp_state(struct mlx5_devx_obj *qp, uint32_t qp_st_mod_op,
 		uint32_t rst2init[MLX5_ST_SZ_DW(rst2init_qp_in)];
 		uint32_t init2rtr[MLX5_ST_SZ_DW(init2rtr_qp_in)];
 		uint32_t rtr2rts[MLX5_ST_SZ_DW(rtr2rts_qp_in)];
+		uint32_t qp2rst[MLX5_ST_SZ_DW(2rst_qp_in)];
 	} in;
 	union {
 		uint32_t rst2init[MLX5_ST_SZ_DW(rst2init_qp_out)];
 		uint32_t init2rtr[MLX5_ST_SZ_DW(init2rtr_qp_out)];
 		uint32_t rtr2rts[MLX5_ST_SZ_DW(rtr2rts_qp_out)];
+		uint32_t qp2rst[MLX5_ST_SZ_DW(2rst_qp_out)];
 	} out;
 	void *qpc;
 	int ret;
@@ -2311,6 +2313,11 @@ mlx5_devx_cmd_modify_qp_state(struct mlx5_devx_obj *qp, uint32_t qp_st_mod_op,
 		inlen = sizeof(in.rtr2rts);
 		outlen = sizeof(out.rtr2rts);
 		break;
+	case MLX5_CMD_OP_QP_2RST:
+		MLX5_SET(2rst_qp_in, &in, qpn, qp->id);
+		inlen = sizeof(in.qp2rst);
+		outlen = sizeof(out.qp2rst);
+		break;
 	default:
 		DRV_LOG(ERR, "Invalid or unsupported QP modify op %u.",
 			qp_st_mod_op);
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index bc3e70a1d1..8a2f55c33e 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -3657,6 +3657,23 @@ struct mlx5_ifc_init2init_qp_in_bits {
 	u8 reserved_at_800[0x80];
 };
 
+struct mlx5_ifc_2rst_qp_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_2rst_qp_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 vhca_tunnel_id[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_80[0x8];
+	u8 qpn[0x18];
+	u8 reserved_at_a0[0x20];
+};
+
 struct mlx5_ifc_dealloc_pd_out_bits {
 	u8 status[0x8];
 	u8 reserved_0[0x18];
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v1 06/17] vdpa/mlx5: support event qp reuse
  2022-06-06 11:46 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                     ` (4 preceding siblings ...)
  2022-06-06 11:46   ` [PATCH v1 05/17] common/mlx5: add DevX API to move QP to reset state Li Zhang
@ 2022-06-06 11:46   ` Li Zhang
  2022-06-06 11:46   ` [PATCH v1 07/17] common/mlx5: extend virtq modifiable fields Li Zhang
                     ` (10 subsequent siblings)
  16 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:46 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba, Yajun Wu

From: Yajun Wu <yajunw@nvidia.com>

To speed up queue creation time, the event QP and CQ are created only
once. Each virtq creation reuses the same event QP and CQ.

Because FW sets the event QP to the error state during virtq destroy,
the event QP must be modified to the RESET state and then to the RTS
state as usual. This saves about 1.5 ms for each virtq creation.

After an SW QP reset, the QP pi/ci both become 0 while the CQ pi/ci
keep their previous values. Add a new variable qp_pi to track the SW
QP producer index and move it independently of the CQ ci.

Add a new function mlx5_vdpa_drain_cq to drain the CQ CQEs after virtq
release.
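
A compact, self-contained model of the reuse flow (state names only;
in the driver the transitions are DevX modify-QP commands, each FW/SW
transition carries the peer QP id, and the real functions are
mlx5_vdpa_qps2rst2rts() and mlx5_vdpa_qps2rts()): when the CQ already
exists with the right depth, the QPs are moved back to RESET and then
walked through INIT/RTR/RTS again instead of being destroyed and
recreated.

#include <stdbool.h>
#include <stdio.h>

enum qp_state { QP_RST, QP_INIT, QP_RTR, QP_RTS, QP_ERR };

/* Stand-in for the FW/SW QP pair of one event QP. */
struct event_qp { enum qp_state fw; enum qp_state sw; bool cq_exists; };

static void set_both(struct event_qp *eqp, enum qp_state s, const char *name)
{
	eqp->fw = s;
	eqp->sw = s;
	printf("both QPs -> %s\n", name);
}

/* Models mlx5_vdpa_qps2rts(): walk both QPs to RTS as on first creation. */
static void qps2rts(struct event_qp *eqp)
{
	set_both(eqp, QP_INIT, "INIT");
	set_both(eqp, QP_RTR, "RTR");
	set_both(eqp, QP_RTS, "RTS");
}

/* Models mlx5_vdpa_qps2rst2rts(): reset first, then the usual walk. */
static void qps2rst2rts(struct event_qp *eqp)
{
	set_both(eqp, QP_RST, "RST");
	qps2rts(eqp);
}

int main(void)
{
	struct event_qp eqp = { .fw = QP_RTS, .sw = QP_RTS, .cq_exists = true };

	/* Virtq destroy: FW pushes the event QP into the error state. */
	set_both(&eqp, QP_ERR, "ERR");
	/* Next virtq creation: reuse the CQ/QPs instead of recreating them. */
	if (eqp.cq_exists)
		qps2rst2rts(&eqp);
	return 0;
}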

Signed-off-by: Yajun Wu <yajunw@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c       |  8 ++++
 drivers/vdpa/mlx5/mlx5_vdpa.h       | 12 +++++-
 drivers/vdpa/mlx5/mlx5_vdpa_event.c | 60 +++++++++++++++++++++++++++--
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c |  6 +--
 4 files changed, 78 insertions(+), 8 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index faf833ee2f..ee99952e11 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -269,6 +269,7 @@ mlx5_vdpa_dev_close(int vid)
 	}
 	mlx5_vdpa_steer_unset(priv);
 	mlx5_vdpa_virtqs_release(priv);
+	mlx5_vdpa_drain_cq(priv);
 	if (priv->lm_mr.addr)
 		mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
 	priv->state = MLX5_VDPA_STATE_PROBED;
@@ -555,7 +556,14 @@ mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 		return 0;
 	for (index = 0; index < (priv->queues * 2); ++index) {
 		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
+		int ret = mlx5_vdpa_event_qp_prepare(priv, priv->queue_size,
+					-1, &virtq->eqp);
 
+		if (ret) {
+			DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
+				index);
+			return -1;
+		}
 		if (priv->caps.queue_counters_valid) {
 			if (!virtq->counters)
 				virtq->counters =
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index f6719a3c60..bf82026e37 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -55,6 +55,7 @@ struct mlx5_vdpa_event_qp {
 	struct mlx5_vdpa_cq cq;
 	struct mlx5_devx_obj *fw_qp;
 	struct mlx5_devx_qp sw_qp;
+	uint16_t qp_pi;
 };
 
 struct mlx5_vdpa_query_mr {
@@ -226,7 +227,7 @@ int mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv);
  * @return
  *   0 on success, -1 otherwise and rte_errno is set.
  */
-int mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
+int mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 			      int callfd, struct mlx5_vdpa_event_qp *eqp);
 
 /**
@@ -479,4 +480,13 @@ mlx5_vdpa_virtq_stats_get(struct mlx5_vdpa_priv *priv, int qid,
  */
 int
 mlx5_vdpa_virtq_stats_reset(struct mlx5_vdpa_priv *priv, int qid);
+
+/**
+ * Drain virtq CQ CQE.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ */
+void
+mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index 7167a98db0..b43dca9255 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -137,7 +137,7 @@ mlx5_vdpa_cq_poll(struct mlx5_vdpa_cq *cq)
 		};
 		uint32_t word;
 	} last_word;
-	uint16_t next_wqe_counter = cq->cq_ci;
+	uint16_t next_wqe_counter = eqp->qp_pi;
 	uint16_t cur_wqe_counter;
 	uint16_t comp;
 
@@ -156,9 +156,10 @@ mlx5_vdpa_cq_poll(struct mlx5_vdpa_cq *cq)
 		rte_io_wmb();
 		/* Ring CQ doorbell record. */
 		cq->cq_obj.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci);
+		eqp->qp_pi += comp;
 		rte_io_wmb();
 		/* Ring SW QP doorbell record. */
-		eqp->sw_qp.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci + cq_size);
+		eqp->sw_qp.db_rec[0] = rte_cpu_to_be_32(eqp->qp_pi + cq_size);
 	}
 	return comp;
 }
@@ -232,6 +233,25 @@ mlx5_vdpa_queues_complete(struct mlx5_vdpa_priv *priv)
 	return max;
 }
 
+void
+mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv)
+{
+	unsigned int i;
+
+	for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
+		struct mlx5_vdpa_cq *cq = &priv->virtqs[i].eqp.cq;
+
+		mlx5_vdpa_queue_complete(cq);
+		if (cq->cq_obj.cq) {
+			cq->cq_obj.cqes[0].wqe_counter =
+				rte_cpu_to_be_16(UINT16_MAX);
+			priv->virtqs[i].eqp.qp_pi = 0;
+			if (!cq->armed)
+				mlx5_vdpa_cq_arm(priv, cq);
+		}
+	}
+}
+
 /* Wait on all CQs channel for completion event. */
 static struct mlx5_vdpa_cq *
 mlx5_vdpa_event_wait(struct mlx5_vdpa_priv *priv __rte_unused)
@@ -574,14 +594,44 @@ mlx5_vdpa_qps2rts(struct mlx5_vdpa_event_qp *eqp)
 	return 0;
 }
 
+static int
+mlx5_vdpa_qps2rst2rts(struct mlx5_vdpa_event_qp *eqp)
+{
+	if (mlx5_devx_cmd_modify_qp_state(eqp->fw_qp, MLX5_CMD_OP_QP_2RST,
+					  eqp->sw_qp.qp->id)) {
+		DRV_LOG(ERR, "Failed to modify FW QP to RST state(%u).",
+			rte_errno);
+		return -1;
+	}
+	if (mlx5_devx_cmd_modify_qp_state(eqp->sw_qp.qp,
+			MLX5_CMD_OP_QP_2RST, eqp->fw_qp->id)) {
+		DRV_LOG(ERR, "Failed to modify SW QP to RST state(%u).",
+			rte_errno);
+		return -1;
+	}
+	return mlx5_vdpa_qps2rts(eqp);
+}
+
 int
-mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
+mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 			  int callfd, struct mlx5_vdpa_event_qp *eqp)
 {
 	struct mlx5_devx_qp_attr attr = {0};
 	uint16_t log_desc_n = rte_log2_u32(desc_n);
 	uint32_t ret;
 
+	if (eqp->cq.cq_obj.cq != NULL && log_desc_n == eqp->cq.log_desc_n) {
+		/* Reuse existing resources. */
+		eqp->cq.callfd = callfd;
+		/* FW will set event qp to error state in q destroy. */
+		if (!mlx5_vdpa_qps2rst2rts(eqp)) {
+			rte_write32(rte_cpu_to_be_32(RTE_BIT32(log_desc_n)),
+					&eqp->sw_qp.db_rec[0]);
+			return 0;
+		}
+	}
+	if (eqp->fw_qp)
+		mlx5_vdpa_event_qp_destroy(eqp);
 	if (mlx5_vdpa_cq_create(priv, log_desc_n, callfd, &eqp->cq))
 		return -1;
 	attr.pd = priv->cdev->pdn;
@@ -608,8 +658,10 @@ mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 	}
 	if (mlx5_vdpa_qps2rts(eqp))
 		goto error;
+	eqp->qp_pi = 0;
 	/* First ringing. */
-	rte_write32(rte_cpu_to_be_32(RTE_BIT32(log_desc_n)),
+	if (eqp->sw_qp.db_rec)
+		rte_write32(rte_cpu_to_be_32(RTE_BIT32(log_desc_n)),
 			&eqp->sw_qp.db_rec[0]);
 	return 0;
 error:
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index c258eb3024..6637ba1503 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -87,6 +87,8 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 			}
 			virtq->umems[j].size = 0;
 		}
+		if (virtq->eqp.fw_qp)
+			mlx5_vdpa_event_qp_destroy(&virtq->eqp);
 	}
 }
 
@@ -117,8 +119,6 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 		claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
 	}
 	virtq->virtq = NULL;
-	if (virtq->eqp.fw_qp)
-		mlx5_vdpa_event_qp_destroy(&virtq->eqp);
 	virtq->notifier_state = MLX5_VDPA_NOTIFIER_STATE_DISABLED;
 	return 0;
 }
@@ -246,7 +246,7 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 						      MLX5_VIRTQ_EVENT_MODE_QP :
 						  MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
 	if (attr.event_mode == MLX5_VIRTQ_EVENT_MODE_QP) {
-		ret = mlx5_vdpa_event_qp_create(priv, vq.size, vq.callfd,
+		ret = mlx5_vdpa_event_qp_prepare(priv, vq.size, vq.callfd,
 						&virtq->eqp);
 		if (ret) {
 			DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v1 07/17] common/mlx5: extend virtq modifiable fields
  2022-06-06 11:46 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                     ` (5 preceding siblings ...)
  2022-06-06 11:46   ` [PATCH v1 06/17] vdpa/mlx5: support event qp reuse Li Zhang
@ 2022-06-06 11:46   ` Li Zhang
  2022-06-06 11:46   ` [PATCH v1 08/17] vdpa/mlx5: pre-create virtq in the prob Li Zhang
                     ` (9 subsequent siblings)
  16 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:46 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

A virtq configuration can be modified after the virtq creation.
Add the following modifiable fields (a sketch of the new calling
convention follows the list):
1. address fields: desc_addr/used_addr/available_addr
2. hw_available_index
3. hw_used_index
4. virtio_q_type
5. version type
6. queue mkey
7. feature bit mask: tso_ipv4/tso_ipv6/tx_csum/rx_csum
8. event mode: event_mode/event_qpn_or_msix
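
A minimal, self-contained sketch of the calling convention this
enables (the flag values follow the PRM bits added here, and the
struct below is a cut-down stand-in for mlx5_devx_virtq_attr): the
caller ORs the needed MLX5_VIRTQ_MODIFY_TYPE_* flags into
mod_fields_bitmap, and the modify command only touches the flagged
fields, rejecting an empty bitmap.

#include <stdint.h>
#include <stdio.h>

/* Local copies of a subset of the MLX5_VIRTQ_MODIFY_TYPE_* values. */
#define VIRTQ_MODIFY_TYPE_STATE              (1ULL << 0)
#define VIRTQ_MODIFY_TYPE_ADDR               (1ULL << 6)
#define VIRTQ_MODIFY_TYPE_HW_AVAILABLE_INDEX (1ULL << 7)
#define VIRTQ_MODIFY_TYPE_HW_USED_INDEX      (1ULL << 8)

/* Simplified stand-in for struct mlx5_devx_virtq_attr. */
struct virtq_attr {
	uint64_t mod_fields_bitmap;
	uint16_t state;
	uint64_t desc_addr, used_addr, available_addr;
	uint16_t hw_available_index, hw_used_index;
};

/* Models the new modify flow: only fields flagged in the bitmap are set. */
static int modify_virtq(const struct virtq_attr *attr)
{
	if (attr->mod_fields_bitmap == 0)
		return -1;	/* the driver now rejects an empty bitmap */
	if (attr->mod_fields_bitmap & VIRTQ_MODIFY_TYPE_STATE)
		printf("set state=%u\n", attr->state);
	if (attr->mod_fields_bitmap & VIRTQ_MODIFY_TYPE_ADDR)
		printf("set desc/used/available addresses\n");
	if (attr->mod_fields_bitmap & VIRTQ_MODIFY_TYPE_HW_AVAILABLE_INDEX)
		printf("set hw_available_index=%u\n", attr->hw_available_index);
	if (attr->mod_fields_bitmap & VIRTQ_MODIFY_TYPE_HW_USED_INDEX)
		printf("set hw_used_index=%u\n", attr->hw_used_index);
	return 0;
}

int main(void)
{
	struct virtq_attr attr = {
		.mod_fields_bitmap = VIRTQ_MODIFY_TYPE_STATE |
				     VIRTQ_MODIFY_TYPE_HW_AVAILABLE_INDEX,
		.state = 1,			/* e.g. ready */
		.hw_available_index = 42,	/* placeholder */
	};

	return modify_virtq(&attr);
}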

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c | 70 +++++++++++++++++++++++-----
 drivers/common/mlx5/mlx5_devx_cmds.h |  6 ++-
 drivers/common/mlx5/mlx5_prm.h       | 13 +++++-
 3 files changed, 76 insertions(+), 13 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 1d6d6578d6..1b68c37092 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -545,6 +545,15 @@ mlx5_devx_cmd_query_hca_vdpa_attr(void *ctx,
 		vdpa_attr->log_doorbell_stride =
 			MLX5_GET(virtio_emulation_cap, hcattr,
 				 log_doorbell_stride);
+		vdpa_attr->vnet_modify_ext =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 vnet_modify_ext);
+		vdpa_attr->virtio_net_q_addr_modify =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 virtio_net_q_addr_modify);
+		vdpa_attr->virtio_q_index_modify =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 virtio_q_index_modify);
 		vdpa_attr->log_doorbell_bar_size =
 			MLX5_GET(virtio_emulation_cap, hcattr,
 				 log_doorbell_bar_size);
@@ -2074,27 +2083,66 @@ mlx5_devx_cmd_modify_virtq(struct mlx5_devx_obj *virtq_obj,
 	MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_type,
 		 MLX5_GENERAL_OBJ_TYPE_VIRTQ);
 	MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_id, virtq_obj->id);
-	MLX5_SET64(virtio_net_q, virtq, modify_field_select, attr->type);
+	MLX5_SET64(virtio_net_q, virtq, modify_field_select,
+		attr->mod_fields_bitmap);
 	MLX5_SET16(virtio_q, virtctx, queue_index, attr->queue_index);
-	switch (attr->type) {
-	case MLX5_VIRTQ_MODIFY_TYPE_STATE:
+	if (!attr->mod_fields_bitmap) {
+		DRV_LOG(ERR, "Failed to modify VIRTQ for no type set.");
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_STATE)
 		MLX5_SET16(virtio_net_q, virtq, state, attr->state);
-		break;
-	case MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS:
+	if (attr->mod_fields_bitmap &
+	    MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS) {
 		MLX5_SET(virtio_net_q, virtq, dirty_bitmap_mkey,
 			 attr->dirty_bitmap_mkey);
 		MLX5_SET64(virtio_net_q, virtq, dirty_bitmap_addr,
 			 attr->dirty_bitmap_addr);
 		MLX5_SET(virtio_net_q, virtq, dirty_bitmap_size,
 			 attr->dirty_bitmap_size);
-		break;
-	case MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE:
+	}
+	if (attr->mod_fields_bitmap &
+	    MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE)
 		MLX5_SET(virtio_net_q, virtq, dirty_bitmap_dump_enable,
 			 attr->dirty_bitmap_dump_enable);
-		break;
-	default:
-		rte_errno = EINVAL;
-		return -rte_errno;
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_QUEUE_PERIOD) {
+		MLX5_SET(virtio_q, virtctx, queue_period_mode,
+			attr->hw_latency_mode);
+		MLX5_SET(virtio_q, virtctx, queue_period_us,
+			attr->hw_max_latency_us);
+		MLX5_SET(virtio_q, virtctx, queue_max_count,
+			attr->hw_max_pending_comp);
+	}
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_ADDR) {
+		MLX5_SET64(virtio_q, virtctx, desc_addr, attr->desc_addr);
+		MLX5_SET64(virtio_q, virtctx, used_addr, attr->used_addr);
+		MLX5_SET64(virtio_q, virtctx, available_addr,
+			attr->available_addr);
+	}
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_HW_AVAILABLE_INDEX)
+		MLX5_SET16(virtio_net_q, virtq, hw_available_index,
+		   attr->hw_available_index);
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_HW_USED_INDEX)
+		MLX5_SET16(virtio_net_q, virtq, hw_used_index,
+			attr->hw_used_index);
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_Q_TYPE)
+		MLX5_SET16(virtio_q, virtctx, virtio_q_type, attr->q_type);
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_VERSION_1_0)
+		MLX5_SET16(virtio_q, virtctx, virtio_version_1_0,
+		   attr->virtio_version_1_0);
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_Q_MKEY)
+		MLX5_SET(virtio_q, virtctx, virtio_q_mkey, attr->mkey);
+	if (attr->mod_fields_bitmap &
+		MLX5_VIRTQ_MODIFY_TYPE_QUEUE_FEATURE_BIT_MASK) {
+		MLX5_SET16(virtio_net_q, virtq, tso_ipv4, attr->tso_ipv4);
+		MLX5_SET16(virtio_net_q, virtq, tso_ipv6, attr->tso_ipv6);
+		MLX5_SET16(virtio_net_q, virtq, tx_csum, attr->tx_csum);
+		MLX5_SET16(virtio_net_q, virtq, rx_csum, attr->rx_csum);
+	}
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_EVENT_MODE) {
+		MLX5_SET16(virtio_q, virtctx, event_mode, attr->event_mode);
+		MLX5_SET(virtio_q, virtctx, event_qpn_or_msix, attr->qp_id);
 	}
 	ret = mlx5_glue->devx_obj_modify(virtq_obj->obj, in, sizeof(in),
 					 out, sizeof(out));
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index 3747ef9e33..ec6467d927 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -74,6 +74,9 @@ struct mlx5_hca_vdpa_attr {
 	uint32_t log_doorbell_stride:5;
 	uint32_t log_doorbell_bar_size:5;
 	uint32_t queue_counters_valid:1;
+	uint32_t vnet_modify_ext:1;
+	uint32_t virtio_net_q_addr_modify:1;
+	uint32_t virtio_q_index_modify:1;
 	uint32_t max_num_virtio_queues;
 	struct {
 		uint32_t a;
@@ -465,7 +468,7 @@ struct mlx5_devx_virtq_attr {
 	uint32_t tis_id;
 	uint32_t counters_obj_id;
 	uint64_t dirty_bitmap_addr;
-	uint64_t type;
+	uint64_t mod_fields_bitmap;
 	uint64_t desc_addr;
 	uint64_t used_addr;
 	uint64_t available_addr;
@@ -475,6 +478,7 @@ struct mlx5_devx_virtq_attr {
 		uint64_t offset;
 	} umems[3];
 	uint8_t error_type;
+	uint8_t q_type;
 };
 
 
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 8a2f55c33e..5f58a6ee1d 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -1802,7 +1802,9 @@ struct mlx5_ifc_virtio_emulation_cap_bits {
 	u8 virtio_queue_type[0x8];
 	u8 reserved_at_20[0x13];
 	u8 log_doorbell_stride[0x5];
-	u8 reserved_at_3b[0x3];
+	u8 vnet_modify_ext[0x1];
+	u8 virtio_net_q_addr_modify[0x1];
+	u8 virtio_q_index_modify[0x1];
 	u8 log_doorbell_bar_size[0x5];
 	u8 doorbell_bar_offset[0x40];
 	u8 reserved_at_80[0x8];
@@ -3024,6 +3026,15 @@ enum {
 	MLX5_VIRTQ_MODIFY_TYPE_STATE = (1UL << 0),
 	MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS = (1UL << 3),
 	MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE = (1UL << 4),
+	MLX5_VIRTQ_MODIFY_TYPE_QUEUE_PERIOD = (1UL << 5),
+	MLX5_VIRTQ_MODIFY_TYPE_ADDR = (1UL << 6),
+	MLX5_VIRTQ_MODIFY_TYPE_HW_AVAILABLE_INDEX = (1UL << 7),
+	MLX5_VIRTQ_MODIFY_TYPE_HW_USED_INDEX = (1UL << 8),
+	MLX5_VIRTQ_MODIFY_TYPE_Q_TYPE = (1UL << 9),
+	MLX5_VIRTQ_MODIFY_TYPE_VERSION_1_0 = (1UL << 10),
+	MLX5_VIRTQ_MODIFY_TYPE_Q_MKEY = (1UL << 11),
+	MLX5_VIRTQ_MODIFY_TYPE_QUEUE_FEATURE_BIT_MASK = (1UL << 12),
+	MLX5_VIRTQ_MODIFY_TYPE_EVENT_MODE = (1UL << 13),
 };
 
 struct mlx5_ifc_virtio_q_bits {
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v1 08/17] vdpa/mlx5: pre-create virtq in the prob
  2022-06-06 11:46 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                     ` (6 preceding siblings ...)
  2022-06-06 11:46   ` [PATCH v1 07/17] common/mlx5: extend virtq modifiable fields Li Zhang
@ 2022-06-06 11:46   ` Li Zhang
  2022-06-06 11:46   ` [PATCH v1 09/17] vdpa/mlx5: optimize datapath-control synchronization Li Zhang
                     ` (8 subsequent siblings)
  16 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:46 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

The dev_config operation is called during the LM process.
The LM time is very critical because all the VM packets are dropped
directly at that time.

Move the virtq creation to probe time and only modify the
configuration later, in the dev_config stage, using the new ability to
modify a virtq.

This optimization accelerates the LM process and reduces its time
by 70%.
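
A rough, self-contained sketch of the resulting split between probe
and dev_config (the two functions below are stand-ins, not driver
entry points): the expensive DevX virtq creation is paid once at
probe, and the LM-critical dev_config path only issues a modify with
the address/index/feature fields plus the ready state.

#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-in for one driver virtq slot. */
struct virtq { bool created; bool configured; };

/* Probe time: create the DevX virtq object once, with default settings. */
static void probe_create(struct virtq *vq)
{
	if (!vq->created) {
		printf("create virtq object\n");
		vq->created = true;
	}
}

/* dev_config (LM) time: only modify the already-created object. */
static void dev_config(struct virtq *vq)
{
	if (vq->created)
		printf("modify virtq: addrs, indices, mkey, features, RDY\n");
	else
		printf("create virtq object (fallback)\n");
	vq->configured = true;
}

int main(void)
{
	struct virtq vq = {0};

	probe_create(&vq);	/* paid once, outside the LM window */
	dev_config(&vq);	/* cheap modify inside the LM window */
	return 0;
}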

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.h       |   4 +
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c    |  13 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 257 +++++++++++++++++-----------
 3 files changed, 170 insertions(+), 104 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index bf82026e37..e5553079fe 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -80,6 +80,7 @@ struct mlx5_vdpa_virtq {
 	uint16_t vq_size;
 	uint8_t notifier_state;
 	bool stopped;
+	uint32_t configured:1;
 	uint32_t version;
 	struct mlx5_vdpa_priv *priv;
 	struct mlx5_devx_obj *virtq;
@@ -489,4 +490,7 @@ mlx5_vdpa_virtq_stats_reset(struct mlx5_vdpa_priv *priv, int qid);
  */
 void
 mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv);
+
+bool
+mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
index 43a2b98255..a8faf0c116 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
@@ -12,14 +12,17 @@ int
 mlx5_vdpa_logging_enable(struct mlx5_vdpa_priv *priv, int enable)
 {
 	struct mlx5_devx_virtq_attr attr = {
-		.type = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE,
+		.mod_fields_bitmap =
+			MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE,
 		.dirty_bitmap_dump_enable = enable,
 	};
+	struct mlx5_vdpa_virtq *virtq;
 	int i;
 
 	for (i = 0; i < priv->nr_virtqs; ++i) {
 		attr.queue_index = i;
-		if (!priv->virtqs[i].virtq) {
+		virtq = &priv->virtqs[i];
+		if (!virtq->configured) {
 			DRV_LOG(DEBUG, "virtq %d is invalid for dirty bitmap "
 				"enabling.", i);
 		} else if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
@@ -37,10 +40,11 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
 			   uint64_t log_size)
 {
 	struct mlx5_devx_virtq_attr attr = {
-		.type = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS,
+		.mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS,
 		.dirty_bitmap_addr = log_base,
 		.dirty_bitmap_size = log_size,
 	};
+	struct mlx5_vdpa_virtq *virtq;
 	int i;
 	int ret = mlx5_os_wrapped_mkey_create(priv->cdev->ctx, priv->cdev->pd,
 					      priv->cdev->pdn,
@@ -54,7 +58,8 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
 	attr.dirty_bitmap_mkey = priv->lm_mr.lkey;
 	for (i = 0; i < priv->nr_virtqs; ++i) {
 		attr.queue_index = i;
-		if (!priv->virtqs[i].virtq) {
+		virtq = &priv->virtqs[i];
+		if (!virtq->configured) {
 			DRV_LOG(DEBUG, "virtq %d is invalid for LM.", i);
 		} else if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
 						      &attr)) {
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 6637ba1503..55cbc9fad2 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -75,6 +75,7 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
+		virtq->configured = 0;
 		for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
 			if (virtq->umems[j].obj) {
 				claim_zero(mlx5_glue->devx_umem_dereg
@@ -111,11 +112,12 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 		rte_intr_fd_set(virtq->intr_handle, -1);
 	}
 	rte_intr_instance_free(virtq->intr_handle);
-	if (virtq->virtq) {
+	if (virtq->configured) {
 		ret = mlx5_vdpa_virtq_stop(virtq->priv, virtq->index);
 		if (ret)
 			DRV_LOG(WARNING, "Failed to stop virtq %d.",
 				virtq->index);
+		virtq->configured = 0;
 		claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
 	}
 	virtq->virtq = NULL;
@@ -138,7 +140,7 @@ int
 mlx5_vdpa_virtq_modify(struct mlx5_vdpa_virtq *virtq, int state)
 {
 	struct mlx5_devx_virtq_attr attr = {
-			.type = MLX5_VIRTQ_MODIFY_TYPE_STATE,
+			.mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_STATE,
 			.state = state ? MLX5_VIRTQ_STATE_RDY :
 					 MLX5_VIRTQ_STATE_SUSPEND,
 			.queue_index = virtq->index,
@@ -153,7 +155,7 @@ mlx5_vdpa_virtq_stop(struct mlx5_vdpa_priv *priv, int index)
 	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
 	int ret;
 
-	if (virtq->stopped)
+	if (virtq->stopped || !virtq->configured)
 		return 0;
 	ret = mlx5_vdpa_virtq_modify(virtq, 0);
 	if (ret)
@@ -209,51 +211,54 @@ mlx5_vdpa_hva_to_gpa(struct rte_vhost_memory *mem, uint64_t hva)
 }
 
 static int
-mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
+mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
+		struct mlx5_devx_virtq_attr *attr,
+		struct rte_vhost_vring *vq, int index)
 {
 	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
-	struct rte_vhost_vring vq;
-	struct mlx5_devx_virtq_attr attr = {0};
 	uint64_t gpa;
 	int ret;
 	unsigned int i;
-	uint16_t last_avail_idx;
-	uint16_t last_used_idx;
-	uint16_t event_num = MLX5_EVENT_TYPE_OBJECT_CHANGE;
-	uint64_t cookie;
-
-	ret = rte_vhost_get_vhost_vring(priv->vid, index, &vq);
-	if (ret)
-		return -1;
-	if (vq.size == 0)
-		return 0;
-	virtq->index = index;
-	virtq->vq_size = vq.size;
-	attr.tso_ipv4 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO4));
-	attr.tso_ipv6 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO6));
-	attr.tx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_CSUM));
-	attr.rx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_GUEST_CSUM));
-	attr.virtio_version_1_0 = !!(priv->features & (1ULL <<
-							VIRTIO_F_VERSION_1));
-	attr.type = (priv->features & (1ULL << VIRTIO_F_RING_PACKED)) ?
+	uint16_t last_avail_idx = 0;
+	uint16_t last_used_idx = 0;
+
+	if (virtq->virtq)
+		attr->mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_STATE |
+			MLX5_VIRTQ_MODIFY_TYPE_ADDR |
+			MLX5_VIRTQ_MODIFY_TYPE_HW_AVAILABLE_INDEX |
+			MLX5_VIRTQ_MODIFY_TYPE_HW_USED_INDEX |
+			MLX5_VIRTQ_MODIFY_TYPE_VERSION_1_0 |
+			MLX5_VIRTQ_MODIFY_TYPE_Q_TYPE |
+			MLX5_VIRTQ_MODIFY_TYPE_Q_MKEY |
+			MLX5_VIRTQ_MODIFY_TYPE_QUEUE_FEATURE_BIT_MASK |
+			MLX5_VIRTQ_MODIFY_TYPE_EVENT_MODE;
+	attr->tso_ipv4 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO4));
+	attr->tso_ipv6 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO6));
+	attr->tx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_CSUM));
+	attr->rx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_GUEST_CSUM));
+	attr->virtio_version_1_0 =
+		!!(priv->features & (1ULL << VIRTIO_F_VERSION_1));
+	attr->q_type =
+		(priv->features & (1ULL << VIRTIO_F_RING_PACKED)) ?
 			MLX5_VIRTQ_TYPE_PACKED : MLX5_VIRTQ_TYPE_SPLIT;
 	/*
 	 * No need event QPs creation when the guest in poll mode or when the
 	 * capability allows it.
 	 */
-	attr.event_mode = vq.callfd != -1 || !(priv->caps.event_mode & (1 <<
-					       MLX5_VIRTQ_EVENT_MODE_NO_MSIX)) ?
-						      MLX5_VIRTQ_EVENT_MODE_QP :
-						  MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
-	if (attr.event_mode == MLX5_VIRTQ_EVENT_MODE_QP) {
-		ret = mlx5_vdpa_event_qp_prepare(priv, vq.size, vq.callfd,
-						&virtq->eqp);
+	attr->event_mode = vq->callfd != -1 ||
+	!(priv->caps.event_mode & (1 << MLX5_VIRTQ_EVENT_MODE_NO_MSIX)) ?
+	MLX5_VIRTQ_EVENT_MODE_QP : MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
+	if (attr->event_mode == MLX5_VIRTQ_EVENT_MODE_QP) {
+		ret = mlx5_vdpa_event_qp_prepare(priv,
+				vq->size, vq->callfd, &virtq->eqp);
 		if (ret) {
-			DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
+			DRV_LOG(ERR,
+				"Failed to create event QPs for virtq %d.",
 				index);
 			return -1;
 		}
-		attr.qp_id = virtq->eqp.fw_qp->id;
+		attr->mod_fields_bitmap |= MLX5_VIRTQ_MODIFY_TYPE_EVENT_MODE;
+		attr->qp_id = virtq->eqp.fw_qp->id;
 	} else {
 		DRV_LOG(INFO, "Virtq %d is, for sure, working by poll mode, no"
 			" need event QPs and event mechanism.", index);
@@ -265,77 +270,82 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 		if (!virtq->counters) {
 			DRV_LOG(ERR, "Failed to create virtq couners for virtq"
 				" %d.", index);
-			goto error;
+			return -1;
 		}
-		attr.counters_obj_id = virtq->counters->id;
+		attr->counters_obj_id = virtq->counters->id;
 	}
 	/* Setup 3 UMEMs for each virtq. */
-	for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
-		uint32_t size;
-		void *buf;
-		struct mlx5dv_devx_umem *obj;
-
-		size = priv->caps.umems[i].a * vq.size + priv->caps.umems[i].b;
-		if (virtq->umems[i].size == size &&
-		    virtq->umems[i].obj != NULL) {
-			/* Reuse registered memory. */
-			memset(virtq->umems[i].buf, 0, size);
-			goto reuse;
-		}
-		if (virtq->umems[i].obj)
-			claim_zero(mlx5_glue->devx_umem_dereg
+	if (virtq->virtq) {
+		for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
+			uint32_t size;
+			void *buf;
+			struct mlx5dv_devx_umem *obj;
+
+			size =
+		priv->caps.umems[i].a * vq->size + priv->caps.umems[i].b;
+			if (virtq->umems[i].size == size &&
+				virtq->umems[i].obj != NULL) {
+				/* Reuse registered memory. */
+				memset(virtq->umems[i].buf, 0, size);
+				goto reuse;
+			}
+			if (virtq->umems[i].obj)
+				claim_zero(mlx5_glue->devx_umem_dereg
 				   (virtq->umems[i].obj));
-		if (virtq->umems[i].buf)
-			rte_free(virtq->umems[i].buf);
-		virtq->umems[i].size = 0;
-		virtq->umems[i].obj = NULL;
-		virtq->umems[i].buf = NULL;
-		buf = rte_zmalloc(__func__, size, 4096);
-		if (buf == NULL) {
-			DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
+			if (virtq->umems[i].buf)
+				rte_free(virtq->umems[i].buf);
+			virtq->umems[i].size = 0;
+			virtq->umems[i].obj = NULL;
+			virtq->umems[i].buf = NULL;
+			buf = rte_zmalloc(__func__,
+				size, 4096);
+			if (buf == NULL) {
+				DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
 				" %u.", i, index);
-			goto error;
-		}
-		obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx, buf, size,
-					       IBV_ACCESS_LOCAL_WRITE);
-		if (obj == NULL) {
-			DRV_LOG(ERR, "Failed to register umem %d for virtq %u.",
+				return -1;
+			}
+			obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx,
+				buf, size, IBV_ACCESS_LOCAL_WRITE);
+			if (obj == NULL) {
+				DRV_LOG(ERR, "Failed to register umem %d for virtq %u.",
 				i, index);
-			goto error;
-		}
-		virtq->umems[i].size = size;
-		virtq->umems[i].buf = buf;
-		virtq->umems[i].obj = obj;
+				rte_free(buf);
+				return -1;
+			}
+			virtq->umems[i].size = size;
+			virtq->umems[i].buf = buf;
+			virtq->umems[i].obj = obj;
 reuse:
-		attr.umems[i].id = virtq->umems[i].obj->umem_id;
-		attr.umems[i].offset = 0;
-		attr.umems[i].size = virtq->umems[i].size;
+			attr->umems[i].id = virtq->umems[i].obj->umem_id;
+			attr->umems[i].offset = 0;
+			attr->umems[i].size = virtq->umems[i].size;
+		}
 	}
-	if (attr.type == MLX5_VIRTQ_TYPE_SPLIT) {
+	if (attr->q_type == MLX5_VIRTQ_TYPE_SPLIT) {
 		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
-					   (uint64_t)(uintptr_t)vq.desc);
+					   (uint64_t)(uintptr_t)vq->desc);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get descriptor ring GPA.");
-			goto error;
+			return -1;
 		}
-		attr.desc_addr = gpa;
+		attr->desc_addr = gpa;
 		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
-					   (uint64_t)(uintptr_t)vq.used);
+					   (uint64_t)(uintptr_t)vq->used);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get GPA for used ring.");
-			goto error;
+			return -1;
 		}
-		attr.used_addr = gpa;
+		attr->used_addr = gpa;
 		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
-					   (uint64_t)(uintptr_t)vq.avail);
+					   (uint64_t)(uintptr_t)vq->avail);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get GPA for available ring.");
-			goto error;
+			return -1;
 		}
-		attr.available_addr = gpa;
+		attr->available_addr = gpa;
 	}
-	ret = rte_vhost_get_vring_base(priv->vid, index, &last_avail_idx,
-				 &last_used_idx);
+	ret = rte_vhost_get_vring_base(priv->vid,
+			index, &last_avail_idx, &last_used_idx);
 	if (ret) {
 		last_avail_idx = 0;
 		last_used_idx = 0;
@@ -345,24 +355,71 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 				"virtq %d.", priv->vid, last_avail_idx,
 				last_used_idx, index);
 	}
-	attr.hw_available_index = last_avail_idx;
-	attr.hw_used_index = last_used_idx;
-	attr.q_size = vq.size;
-	attr.mkey = priv->gpa_mkey_index;
-	attr.tis_id = priv->tiss[(index / 2) % priv->num_lag_ports]->id;
-	attr.queue_index = index;
-	attr.pd = priv->cdev->pdn;
-	attr.hw_latency_mode = priv->hw_latency_mode;
-	attr.hw_max_latency_us = priv->hw_max_latency_us;
-	attr.hw_max_pending_comp = priv->hw_max_pending_comp;
-	virtq->virtq = mlx5_devx_cmd_create_virtq(priv->cdev->ctx, &attr);
+	attr->hw_available_index = last_avail_idx;
+	attr->hw_used_index = last_used_idx;
+	attr->q_size = vq->size;
+	attr->mkey = priv->gpa_mkey_index;
+	attr->tis_id = priv->tiss[(index / 2) % priv->num_lag_ports]->id;
+	attr->queue_index = index;
+	attr->pd = priv->cdev->pdn;
+	attr->hw_latency_mode = priv->hw_latency_mode;
+	attr->hw_max_latency_us = priv->hw_max_latency_us;
+	attr->hw_max_pending_comp = priv->hw_max_pending_comp;
+	if (attr->hw_latency_mode || attr->hw_max_latency_us ||
+		attr->hw_max_pending_comp)
+		attr->mod_fields_bitmap |= MLX5_VIRTQ_MODIFY_TYPE_QUEUE_PERIOD;
+	return 0;
+}
+
+bool
+mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv)
+{
+	return (priv->caps.vnet_modify_ext &&
+			priv->caps.virtio_net_q_addr_modify &&
+			priv->caps.virtio_q_index_modify) ? true : false;
+}
+
+static int
+mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
+{
+	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
+	struct rte_vhost_vring vq;
+	struct mlx5_devx_virtq_attr attr = {0};
+	int ret;
+	uint16_t event_num = MLX5_EVENT_TYPE_OBJECT_CHANGE;
+	uint64_t cookie;
+
+	ret = rte_vhost_get_vhost_vring(priv->vid, index, &vq);
+	if (ret)
+		return -1;
+	if (vq.size == 0)
+		return 0;
 	virtq->priv = priv;
-	if (!virtq->virtq)
+	virtq->stopped = 0;
+	ret = mlx5_vdpa_virtq_sub_objs_prepare(priv, &attr,
+				&vq, index);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to setup update virtq attr"
+			" %d.", index);
 		goto error;
-	claim_zero(rte_vhost_enable_guest_notification(priv->vid, index, 1));
-	if (mlx5_vdpa_virtq_modify(virtq, 1))
+	}
+	if (!virtq->virtq) {
+		virtq->index = index;
+		virtq->vq_size = vq.size;
+		virtq->virtq = mlx5_devx_cmd_create_virtq(priv->cdev->ctx,
+			&attr);
+		if (!virtq->virtq)
+			goto error;
+		attr.mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_STATE;
+	}
+	attr.state = MLX5_VIRTQ_STATE_RDY;
+	ret = mlx5_devx_cmd_modify_virtq(virtq->virtq, &attr);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to modify virtq %d.", index);
 		goto error;
-	virtq->priv = priv;
+	}
+	claim_zero(rte_vhost_enable_guest_notification(priv->vid, index, 1));
+	virtq->configured = 1;
 	rte_write32(virtq->index, priv->virtq_db_addr);
 	/* Setup doorbell mapping. */
 	virtq->intr_handle =
@@ -553,7 +610,7 @@ mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable)
 			return 0;
 		DRV_LOG(INFO, "Virtq %d was modified, recreate it.", index);
 	}
-	if (virtq->virtq) {
+	if (virtq->configured) {
 		virtq->enable = 0;
 		if (is_virtq_recvq(virtq->index, priv->nr_virtqs)) {
 			ret = mlx5_vdpa_steer_update(priv);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v1 09/17] vdpa/mlx5: optimize datapath-control synchronization
  2022-06-06 11:46 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                     ` (7 preceding siblings ...)
  2022-06-06 11:46   ` [PATCH v1 08/17] vdpa/mlx5: pre-create virtq in the prob Li Zhang
@ 2022-06-06 11:46   ` Li Zhang
  2022-06-06 11:46   ` [PATCH v1 10/17] vdpa/mlx5: add multi-thread management for configuration Li Zhang
                     ` (7 subsequent siblings)
  16 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:46 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

The driver used a single global lock for any synchronization needed
for the datapath and the control path.
It is better to group each critical section only with the other
sections that really need to be synchronized with it.

Replace the global lock with the following locks (a sketch of the
per-virtq pattern follows the list):

1. Virtq locks (per virtq) synchronize datapath polling and parallel
   configurations on the same virtq.
2. A doorbell lock synchronizes doorbell updates, which are shared by
   all the virtqs in the device.
3. A steering lock for updates of the shared steering objects.
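
A minimal, self-contained sketch of the per-virtq lock pattern (the
struct and functions below are simplified stand-ins for
mlx5_vdpa_virtq and the real configuration/polling paths): each virtq
carries its own mutex, so configuring or polling one virtq no longer
serializes against the others.

#include <pthread.h>
#include <stdio.h>

/* Simplified stand-in for struct mlx5_vdpa_virtq with its own lock. */
struct virtq {
	int index;
	int enabled;
	pthread_mutex_t virtq_lock;
};

/* Control path: configure one virtq under its own lock only. */
static void set_vring_state(struct virtq *vq, int state)
{
	pthread_mutex_lock(&vq->virtq_lock);
	vq->enabled = state;
	pthread_mutex_unlock(&vq->virtq_lock);
}

/* Event/datapath path: poll a single virtq's CQ under the same lock,
 * without serializing against the other virtqs. */
static void poll_one(struct virtq *vq)
{
	pthread_mutex_lock(&vq->virtq_lock);
	printf("poll virtq %d (enabled=%d)\n", vq->index, vq->enabled);
	pthread_mutex_unlock(&vq->virtq_lock);
}

int main(void)
{
	struct virtq vqs[2] = {
		{ .index = 0, .virtq_lock = PTHREAD_MUTEX_INITIALIZER },
		{ .index = 1, .virtq_lock = PTHREAD_MUTEX_INITIALIZER },
	};

	set_vring_state(&vqs[0], 1);
	poll_one(&vqs[0]);
	poll_one(&vqs[1]);	/* not blocked by virtq 0's configuration */
	return 0;
}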

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 24 ++++---
 drivers/vdpa/mlx5/mlx5_vdpa.h       | 13 ++--
 drivers/vdpa/mlx5/mlx5_vdpa_event.c | 97 ++++++++++++++++++-----------
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c    | 34 +++++++---
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c |  7 ++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 88 +++++++++++++++++++-------
 6 files changed, 184 insertions(+), 79 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index ee99952e11..e5a11f72fd 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -135,6 +135,7 @@ mlx5_vdpa_set_vring_state(int vid, int vring, int state)
 	struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid);
 	struct mlx5_vdpa_priv *priv =
 		mlx5_vdpa_find_priv_resource_by_vdev(vdev);
+	struct mlx5_vdpa_virtq *virtq;
 	int ret;
 
 	if (priv == NULL) {
@@ -145,9 +146,10 @@ mlx5_vdpa_set_vring_state(int vid, int vring, int state)
 		DRV_LOG(ERR, "Too big vring id: %d.", vring);
 		return -E2BIG;
 	}
-	pthread_mutex_lock(&priv->vq_config_lock);
+	virtq = &priv->virtqs[vring];
+	pthread_mutex_lock(&virtq->virtq_lock);
 	ret = mlx5_vdpa_virtq_enable(priv, vring, state);
-	pthread_mutex_unlock(&priv->vq_config_lock);
+	pthread_mutex_unlock(&virtq->virtq_lock);
 	return ret;
 }
 
@@ -267,7 +269,9 @@ mlx5_vdpa_dev_close(int vid)
 		ret |= mlx5_vdpa_lm_log(priv);
 		priv->state = MLX5_VDPA_STATE_IN_PROGRESS;
 	}
+	pthread_mutex_lock(&priv->steer_update_lock);
 	mlx5_vdpa_steer_unset(priv);
+	pthread_mutex_unlock(&priv->steer_update_lock);
 	mlx5_vdpa_virtqs_release(priv);
 	mlx5_vdpa_drain_cq(priv);
 	if (priv->lm_mr.addr)
@@ -276,8 +280,6 @@ mlx5_vdpa_dev_close(int vid)
 	if (!priv->connected)
 		mlx5_vdpa_dev_cache_clean(priv);
 	priv->vid = 0;
-	/* The mutex may stay locked after event thread cancel - initiate it. */
-	pthread_mutex_init(&priv->vq_config_lock, NULL);
 	DRV_LOG(INFO, "vDPA device %d was closed.", vid);
 	return ret;
 }
@@ -549,15 +551,21 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 static int
 mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_virtq *virtq;
 	uint32_t index;
 	uint32_t i;
 
+	for (index = 0; index < priv->caps.max_num_virtio_queues * 2;
+		index++) {
+		virtq = &priv->virtqs[index];
+		pthread_mutex_init(&virtq->virtq_lock, NULL);
+	}
 	if (!priv->queues)
 		return 0;
 	for (index = 0; index < (priv->queues * 2); ++index) {
-		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
+		virtq = &priv->virtqs[index];
 		int ret = mlx5_vdpa_event_qp_prepare(priv, priv->queue_size,
-					-1, &virtq->eqp);
+					-1, virtq);
 
 		if (ret) {
 			DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
@@ -713,7 +721,8 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 	priv->num_lag_ports = attr->num_lag_ports;
 	if (attr->num_lag_ports == 0)
 		priv->num_lag_ports = 1;
-	pthread_mutex_init(&priv->vq_config_lock, NULL);
+	rte_spinlock_init(&priv->db_lock);
+	pthread_mutex_init(&priv->steer_update_lock, NULL);
 	priv->cdev = cdev;
 	mlx5_vdpa_config_get(mkvlist, priv);
 	if (mlx5_vdpa_create_dev_resources(priv))
@@ -797,7 +806,6 @@ mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv)
 	mlx5_vdpa_release_dev_resources(priv);
 	if (priv->vdev)
 		rte_vdpa_unregister_device(priv->vdev);
-	pthread_mutex_destroy(&priv->vq_config_lock);
 	rte_free(priv);
 }
 
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index e5553079fe..3fd5eefc5e 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -82,6 +82,7 @@ struct mlx5_vdpa_virtq {
 	bool stopped;
 	uint32_t configured:1;
 	uint32_t version;
+	pthread_mutex_t virtq_lock;
 	struct mlx5_vdpa_priv *priv;
 	struct mlx5_devx_obj *virtq;
 	struct mlx5_devx_obj *counters;
@@ -126,7 +127,8 @@ struct mlx5_vdpa_priv {
 	TAILQ_ENTRY(mlx5_vdpa_priv) next;
 	bool connected;
 	enum mlx5_dev_state state;
-	pthread_mutex_t vq_config_lock;
+	rte_spinlock_t db_lock;
+	pthread_mutex_t steer_update_lock;
 	uint64_t no_traffic_counter;
 	pthread_t timer_tid;
 	int event_mode;
@@ -222,14 +224,15 @@ int mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv);
  *   Number of descriptors.
  * @param[in] callfd
  *   The guest notification file descriptor.
- * @param[in/out] eqp
- *   Pointer to the event QP structure.
+ * @param[in/out] virtq
+ *   Pointer to the virt-queue structure.
  *
  * @return
  *   0 on success, -1 otherwise and rte_errno is set.
  */
-int mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
-			      int callfd, struct mlx5_vdpa_event_qp *eqp);
+int
+mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
+	int callfd, struct mlx5_vdpa_virtq *virtq);
 
 /**
  * Destroy an event QP and all its related resources.
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index b43dca9255..2b0f5936d1 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -85,12 +85,13 @@ mlx5_vdpa_cq_arm(struct mlx5_vdpa_priv *priv, struct mlx5_vdpa_cq *cq)
 
 static int
 mlx5_vdpa_cq_create(struct mlx5_vdpa_priv *priv, uint16_t log_desc_n,
-		    int callfd, struct mlx5_vdpa_cq *cq)
+		int callfd, struct mlx5_vdpa_virtq *virtq)
 {
 	struct mlx5_devx_cq_attr attr = {
 		.use_first_only = 1,
 		.uar_page_id = mlx5_os_get_devx_uar_page_id(priv->uar.obj),
 	};
+	struct mlx5_vdpa_cq *cq = &virtq->eqp.cq;
 	uint16_t event_nums[1] = {0};
 	int ret;
 
@@ -102,10 +103,11 @@ mlx5_vdpa_cq_create(struct mlx5_vdpa_priv *priv, uint16_t log_desc_n,
 	cq->log_desc_n = log_desc_n;
 	rte_spinlock_init(&cq->sl);
 	/* Subscribe CQ event to the event channel controlled by the driver. */
-	ret = mlx5_os_devx_subscribe_devx_event(priv->eventc,
-						cq->cq_obj.cq->obj,
-						sizeof(event_nums), event_nums,
-						(uint64_t)(uintptr_t)cq);
+	ret = mlx5_glue->devx_subscribe_devx_event(priv->eventc,
+							cq->cq_obj.cq->obj,
+						   sizeof(event_nums),
+						   event_nums,
+						   (uint64_t)(uintptr_t)virtq);
 	if (ret) {
 		DRV_LOG(ERR, "Failed to subscribe CQE event.");
 		rte_errno = errno;
@@ -167,13 +169,17 @@ mlx5_vdpa_cq_poll(struct mlx5_vdpa_cq *cq)
 static void
 mlx5_vdpa_arm_all_cqs(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_virtq *virtq;
 	struct mlx5_vdpa_cq *cq;
 	int i;
 
 	for (i = 0; i < priv->nr_virtqs; i++) {
+		virtq = &priv->virtqs[i];
+		pthread_mutex_lock(&virtq->virtq_lock);
 		cq = &priv->virtqs[i].eqp.cq;
 		if (cq->cq_obj.cq && !cq->armed)
 			mlx5_vdpa_cq_arm(priv, cq);
+		pthread_mutex_unlock(&virtq->virtq_lock);
 	}
 }
 
@@ -220,13 +226,18 @@ mlx5_vdpa_queue_complete(struct mlx5_vdpa_cq *cq)
 static uint32_t
 mlx5_vdpa_queues_complete(struct mlx5_vdpa_priv *priv)
 {
-	int i;
+	struct mlx5_vdpa_virtq *virtq;
+	struct mlx5_vdpa_cq *cq;
 	uint32_t max = 0;
+	uint32_t comp;
+	int i;
 
 	for (i = 0; i < priv->nr_virtqs; i++) {
-		struct mlx5_vdpa_cq *cq = &priv->virtqs[i].eqp.cq;
-		uint32_t comp = mlx5_vdpa_queue_complete(cq);
-
+		virtq = &priv->virtqs[i];
+		pthread_mutex_lock(&virtq->virtq_lock);
+		cq = &virtq->eqp.cq;
+		comp = mlx5_vdpa_queue_complete(cq);
+		pthread_mutex_unlock(&virtq->virtq_lock);
 		if (comp > max)
 			max = comp;
 	}
@@ -253,7 +264,7 @@ mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv)
 }
 
 /* Wait on all CQs channel for completion event. */
-static struct mlx5_vdpa_cq *
+static struct mlx5_vdpa_virtq *
 mlx5_vdpa_event_wait(struct mlx5_vdpa_priv *priv __rte_unused)
 {
 #ifdef HAVE_IBV_DEVX_EVENT
@@ -265,7 +276,8 @@ mlx5_vdpa_event_wait(struct mlx5_vdpa_priv *priv __rte_unused)
 					    sizeof(out.buf));
 
 	if (ret >= 0)
-		return (struct mlx5_vdpa_cq *)(uintptr_t)out.event_resp.cookie;
+		return (struct mlx5_vdpa_virtq *)
+				(uintptr_t)out.event_resp.cookie;
 	DRV_LOG(INFO, "Got error in devx_get_event, ret = %d, errno = %d.",
 		ret, errno);
 #endif
@@ -276,7 +288,7 @@ static void *
 mlx5_vdpa_event_handle(void *arg)
 {
 	struct mlx5_vdpa_priv *priv = arg;
-	struct mlx5_vdpa_cq *cq;
+	struct mlx5_vdpa_virtq *virtq;
 	uint32_t max;
 
 	switch (priv->event_mode) {
@@ -284,7 +296,6 @@ mlx5_vdpa_event_handle(void *arg)
 	case MLX5_VDPA_EVENT_MODE_FIXED_TIMER:
 		priv->timer_delay_us = priv->event_us;
 		while (1) {
-			pthread_mutex_lock(&priv->vq_config_lock);
 			max = mlx5_vdpa_queues_complete(priv);
 			if (max == 0 && priv->no_traffic_counter++ >=
 			    priv->no_traffic_max) {
@@ -292,32 +303,37 @@ mlx5_vdpa_event_handle(void *arg)
 					priv->vdev->device->name);
 				mlx5_vdpa_arm_all_cqs(priv);
 				do {
-					pthread_mutex_unlock
-							(&priv->vq_config_lock);
-					cq = mlx5_vdpa_event_wait(priv);
-					pthread_mutex_lock
-							(&priv->vq_config_lock);
-					if (cq == NULL ||
-					       mlx5_vdpa_queue_complete(cq) > 0)
+					virtq = mlx5_vdpa_event_wait(priv);
+					if (virtq == NULL)
 						break;
+					pthread_mutex_lock(
+						&virtq->virtq_lock);
+					if (mlx5_vdpa_queue_complete(
+						&virtq->eqp.cq) > 0) {
+						pthread_mutex_unlock(
+							&virtq->virtq_lock);
+						break;
+					}
+					pthread_mutex_unlock(
+						&virtq->virtq_lock);
 				} while (1);
 				priv->timer_delay_us = priv->event_us;
 				priv->no_traffic_counter = 0;
 			} else if (max != 0) {
 				priv->no_traffic_counter = 0;
 			}
-			pthread_mutex_unlock(&priv->vq_config_lock);
 			mlx5_vdpa_timer_sleep(priv, max);
 		}
 		return NULL;
 	case MLX5_VDPA_EVENT_MODE_ONLY_INTERRUPT:
 		do {
-			cq = mlx5_vdpa_event_wait(priv);
-			if (cq != NULL) {
-				pthread_mutex_lock(&priv->vq_config_lock);
-				if (mlx5_vdpa_queue_complete(cq) > 0)
-					mlx5_vdpa_cq_arm(priv, cq);
-				pthread_mutex_unlock(&priv->vq_config_lock);
+			virtq = mlx5_vdpa_event_wait(priv);
+			if (virtq != NULL) {
+				pthread_mutex_lock(&virtq->virtq_lock);
+				if (mlx5_vdpa_queue_complete(
+					&virtq->eqp.cq) > 0)
+					mlx5_vdpa_cq_arm(priv, &virtq->eqp.cq);
+				pthread_mutex_unlock(&virtq->virtq_lock);
 			}
 		} while (1);
 		return NULL;
@@ -339,7 +355,6 @@ mlx5_vdpa_err_interrupt_handler(void *cb_arg __rte_unused)
 	struct mlx5_vdpa_virtq *virtq;
 	uint64_t sec;
 
-	pthread_mutex_lock(&priv->vq_config_lock);
 	while (mlx5_glue->devx_get_event(priv->err_chnl, &out.event_resp,
 					 sizeof(out.buf)) >=
 				       (ssize_t)sizeof(out.event_resp.cookie)) {
@@ -351,10 +366,11 @@ mlx5_vdpa_err_interrupt_handler(void *cb_arg __rte_unused)
 			continue;
 		}
 		virtq = &priv->virtqs[vq_index];
+		pthread_mutex_lock(&virtq->virtq_lock);
 		if (!virtq->enable || virtq->version != version)
-			continue;
+			goto unlock;
 		if (rte_rdtsc() / rte_get_tsc_hz() < MLX5_VDPA_ERROR_TIME_SEC)
-			continue;
+			goto unlock;
 		virtq->stopped = true;
 		/* Query error info. */
 		if (mlx5_vdpa_virtq_query(priv, vq_index))
@@ -384,8 +400,9 @@ mlx5_vdpa_err_interrupt_handler(void *cb_arg __rte_unused)
 		for (i = 1; i < RTE_DIM(virtq->err_time); i++)
 			virtq->err_time[i - 1] = virtq->err_time[i];
 		virtq->err_time[RTE_DIM(virtq->err_time) - 1] = rte_rdtsc();
+unlock:
+		pthread_mutex_unlock(&virtq->virtq_lock);
 	}
-	pthread_mutex_unlock(&priv->vq_config_lock);
 #endif
 }
 
@@ -533,11 +550,18 @@ mlx5_vdpa_cqe_event_setup(struct mlx5_vdpa_priv *priv)
 void
 mlx5_vdpa_cqe_event_unset(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_virtq *virtq;
 	void *status;
+	int i;
 
 	if (priv->timer_tid) {
 		pthread_cancel(priv->timer_tid);
 		pthread_join(priv->timer_tid, &status);
+		/* The mutex may stay locked after the event thread cancel, reinitialize it. */
+		for (i = 0; i < priv->nr_virtqs; i++) {
+			virtq = &priv->virtqs[i];
+			pthread_mutex_init(&virtq->virtq_lock, NULL);
+		}
 	}
 	priv->timer_tid = 0;
 }
@@ -614,8 +638,9 @@ mlx5_vdpa_qps2rst2rts(struct mlx5_vdpa_event_qp *eqp)
 
 int
 mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
-			  int callfd, struct mlx5_vdpa_event_qp *eqp)
+	int callfd, struct mlx5_vdpa_virtq *virtq)
 {
+	struct mlx5_vdpa_event_qp *eqp = &virtq->eqp;
 	struct mlx5_devx_qp_attr attr = {0};
 	uint16_t log_desc_n = rte_log2_u32(desc_n);
 	uint32_t ret;
@@ -632,7 +657,8 @@ mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 	}
 	if (eqp->fw_qp)
 		mlx5_vdpa_event_qp_destroy(eqp);
-	if (mlx5_vdpa_cq_create(priv, log_desc_n, callfd, &eqp->cq))
+	if (mlx5_vdpa_cq_create(priv, log_desc_n, callfd, virtq) ||
+		!eqp->cq.cq_obj.cq)
 		return -1;
 	attr.pd = priv->cdev->pdn;
 	attr.ts_format =
@@ -650,8 +676,8 @@ mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 	attr.ts_format =
 		mlx5_ts_format_conv(priv->cdev->config.hca_attr.qp_ts_format);
 	ret = mlx5_devx_qp_create(priv->cdev->ctx, &(eqp->sw_qp),
-					attr.num_of_receive_wqes *
-					MLX5_WSEG_SIZE, &attr, SOCKET_ID_ANY);
+				  attr.num_of_receive_wqes * MLX5_WSEG_SIZE,
+				  &attr, SOCKET_ID_ANY);
 	if (ret) {
 		DRV_LOG(ERR, "Failed to create SW QP(%u).", rte_errno);
 		goto error;
@@ -668,3 +694,4 @@ mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 	mlx5_vdpa_event_qp_destroy(eqp);
 	return -1;
 }
+
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
index a8faf0c116..efebf364d0 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
@@ -25,11 +25,18 @@ mlx5_vdpa_logging_enable(struct mlx5_vdpa_priv *priv, int enable)
 		if (!virtq->configured) {
 			DRV_LOG(DEBUG, "virtq %d is invalid for dirty bitmap "
 				"enabling.", i);
-		} else if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
+		} else {
+			struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
+
+			pthread_mutex_lock(&virtq->virtq_lock);
+			if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
 			   &attr)) {
-			DRV_LOG(ERR, "Failed to modify virtq %d for dirty "
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				DRV_LOG(ERR, "Failed to modify virtq %d for dirty "
 				"bitmap enabling.", i);
-			return -1;
+				return -1;
+			}
+			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
 	}
 	return 0;
@@ -61,10 +68,19 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
 		virtq = &priv->virtqs[i];
 		if (!virtq->configured) {
 			DRV_LOG(DEBUG, "virtq %d is invalid for LM.", i);
-		} else if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
-						      &attr)) {
-			DRV_LOG(ERR, "Failed to modify virtq %d for LM.", i);
-			goto err;
+		} else {
+			struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
+
+			pthread_mutex_lock(&virtq->virtq_lock);
+			if (mlx5_devx_cmd_modify_virtq(
+					priv->virtqs[i].virtq,
+					&attr)) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				DRV_LOG(ERR,
+				"Failed to modify virtq %d for LM.", i);
+				goto err;
+			}
+			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
 	}
 	return 0;
@@ -79,6 +95,7 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
 int
 mlx5_vdpa_lm_log(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_virtq *virtq;
 	uint64_t features;
 	int ret = rte_vhost_get_negotiated_features(priv->vid, &features);
 	int i;
@@ -90,10 +107,13 @@ mlx5_vdpa_lm_log(struct mlx5_vdpa_priv *priv)
 	if (!RTE_VHOST_NEED_LOG(features))
 		return 0;
 	for (i = 0; i < priv->nr_virtqs; ++i) {
+		virtq = &priv->virtqs[i];
 		if (!priv->virtqs[i].virtq) {
 			DRV_LOG(DEBUG, "virtq %d is invalid for LM log.", i);
 		} else {
+			pthread_mutex_lock(&virtq->virtq_lock);
 			ret = mlx5_vdpa_virtq_stop(priv, i);
+			pthread_mutex_unlock(&virtq->virtq_lock);
 			if (ret) {
 				DRV_LOG(ERR, "Failed to stop virtq %d for LM "
 					"log.", i);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
index d4b4375c88..4cbf09784e 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
@@ -237,19 +237,24 @@ mlx5_vdpa_rss_flows_create(struct mlx5_vdpa_priv *priv)
 int
 mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv)
 {
-	int ret = mlx5_vdpa_rqt_prepare(priv);
+	int ret;
 
+	pthread_mutex_lock(&priv->steer_update_lock);
+	ret = mlx5_vdpa_rqt_prepare(priv);
 	if (ret == 0) {
 		mlx5_vdpa_steer_unset(priv);
 	} else if (ret < 0) {
+		pthread_mutex_unlock(&priv->steer_update_lock);
 		return ret;
 	} else if (!priv->steer.rss[0].flow) {
 		ret = mlx5_vdpa_rss_flows_create(priv);
 		if (ret) {
 			DRV_LOG(ERR, "Cannot create RSS flows.");
+			pthread_mutex_unlock(&priv->steer_update_lock);
 			return -1;
 		}
 	}
+	pthread_mutex_unlock(&priv->steer_update_lock);
 	return 0;
 }
 
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 55cbc9fad2..138b7bdbc5 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -24,13 +24,17 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 	int nbytes;
 	int retry;
 
+	pthread_mutex_lock(&virtq->virtq_lock);
 	if (priv->state != MLX5_VDPA_STATE_CONFIGURED && !virtq->enable) {
+		pthread_mutex_unlock(&virtq->virtq_lock);
 		DRV_LOG(ERR,  "device %d queue %d down, skip kick handling",
 			priv->vid, virtq->index);
 		return;
 	}
-	if (rte_intr_fd_get(virtq->intr_handle) < 0)
+	if (rte_intr_fd_get(virtq->intr_handle) < 0) {
+		pthread_mutex_unlock(&virtq->virtq_lock);
 		return;
+	}
 	for (retry = 0; retry < 3; ++retry) {
 		nbytes = read(rte_intr_fd_get(virtq->intr_handle), &buf,
 			      8);
@@ -44,9 +48,14 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 		}
 		break;
 	}
-	if (nbytes < 0)
+	if (nbytes < 0) {
+		pthread_mutex_unlock(&virtq->virtq_lock);
 		return;
+	}
+	rte_spinlock_lock(&priv->db_lock);
 	rte_write32(virtq->index, priv->virtq_db_addr);
+	rte_spinlock_unlock(&priv->db_lock);
+	pthread_mutex_unlock(&virtq->virtq_lock);
 	if (priv->state != MLX5_VDPA_STATE_CONFIGURED && !virtq->enable) {
 		DRV_LOG(ERR,  "device %d queue %d down, skip kick handling",
 			priv->vid, virtq->index);
@@ -66,6 +75,33 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 	DRV_LOG(DEBUG, "Ring virtq %u doorbell.", virtq->index);
 }
 
+/* Virtq must be locked before calling this function. */
+static void
+mlx5_vdpa_virtq_unregister_intr_handle(struct mlx5_vdpa_virtq *virtq)
+{
+	int ret = -EAGAIN;
+
+	if (!virtq->intr_handle)
+		return;
+	if (rte_intr_fd_get(virtq->intr_handle) >= 0) {
+		while (ret == -EAGAIN) {
+			ret = rte_intr_callback_unregister(virtq->intr_handle,
+					mlx5_vdpa_virtq_kick_handler, virtq);
+			if (ret == -EAGAIN) {
+				DRV_LOG(DEBUG, "Try again to unregister fd %d of virtq %hu interrupt",
+					rte_intr_fd_get(virtq->intr_handle),
+					virtq->index);
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				usleep(MLX5_VDPA_INTR_RETRIES_USEC);
+				pthread_mutex_lock(&virtq->virtq_lock);
+			}
+		}
+		(void)rte_intr_fd_set(virtq->intr_handle, -1);
+	}
+	rte_intr_instance_free(virtq->intr_handle);
+	virtq->intr_handle = NULL;
+}
+
 /* Release cached VQ resources. */
 void
 mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
@@ -75,6 +111,7 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
+		pthread_mutex_lock(&virtq->virtq_lock);
 		virtq->configured = 0;
 		for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
 			if (virtq->umems[j].obj) {
@@ -90,28 +127,17 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 		}
 		if (virtq->eqp.fw_qp)
 			mlx5_vdpa_event_qp_destroy(&virtq->eqp);
+		pthread_mutex_unlock(&virtq->virtq_lock);
 	}
 }
 
+
 static int
 mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 {
 	int ret = -EAGAIN;
 
-	if (rte_intr_fd_get(virtq->intr_handle) >= 0) {
-		while (ret == -EAGAIN) {
-			ret = rte_intr_callback_unregister(virtq->intr_handle,
-					mlx5_vdpa_virtq_kick_handler, virtq);
-			if (ret == -EAGAIN) {
-				DRV_LOG(DEBUG, "Try again to unregister fd %d of virtq %hu interrupt",
-					rte_intr_fd_get(virtq->intr_handle),
-					virtq->index);
-				usleep(MLX5_VDPA_INTR_RETRIES_USEC);
-			}
-		}
-		rte_intr_fd_set(virtq->intr_handle, -1);
-	}
-	rte_intr_instance_free(virtq->intr_handle);
+	mlx5_vdpa_virtq_unregister_intr_handle(virtq);
 	if (virtq->configured) {
 		ret = mlx5_vdpa_virtq_stop(virtq->priv, virtq->index);
 		if (ret)
@@ -128,10 +154,15 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 void
 mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_virtq *virtq;
 	int i;
 
-	for (i = 0; i < priv->nr_virtqs; i++)
-		mlx5_vdpa_virtq_unset(&priv->virtqs[i]);
+	for (i = 0; i < priv->nr_virtqs; i++) {
+		virtq = &priv->virtqs[i];
+		pthread_mutex_lock(&virtq->virtq_lock);
+		mlx5_vdpa_virtq_unset(virtq);
+		pthread_mutex_unlock(&virtq->virtq_lock);
+	}
 	priv->features = 0;
 	priv->nr_virtqs = 0;
 }
@@ -250,7 +281,7 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 	MLX5_VIRTQ_EVENT_MODE_QP : MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
 	if (attr->event_mode == MLX5_VIRTQ_EVENT_MODE_QP) {
 		ret = mlx5_vdpa_event_qp_prepare(priv,
-				vq->size, vq->callfd, &virtq->eqp);
+				vq->size, vq->callfd, virtq);
 		if (ret) {
 			DRV_LOG(ERR,
 				"Failed to create event QPs for virtq %d.",
@@ -420,7 +451,9 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 	}
 	claim_zero(rte_vhost_enable_guest_notification(priv->vid, index, 1));
 	virtq->configured = 1;
+	rte_spinlock_lock(&priv->db_lock);
 	rte_write32(virtq->index, priv->virtq_db_addr);
+	rte_spinlock_unlock(&priv->db_lock);
 	/* Setup doorbell mapping. */
 	virtq->intr_handle =
 		rte_intr_instance_alloc(RTE_INTR_INSTANCE_F_SHARED);
@@ -441,7 +474,7 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 		if (rte_intr_callback_register(virtq->intr_handle,
 					       mlx5_vdpa_virtq_kick_handler,
 					       virtq)) {
-			rte_intr_fd_set(virtq->intr_handle, -1);
+			(void)rte_intr_fd_set(virtq->intr_handle, -1);
 			DRV_LOG(ERR, "Failed to register virtq %d interrupt.",
 				index);
 			goto error;
@@ -537,6 +570,7 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 	uint32_t i;
 	uint16_t nr_vring = rte_vhost_get_vring_num(priv->vid);
 	int ret = rte_vhost_get_negotiated_features(priv->vid, &priv->features);
+	struct mlx5_vdpa_virtq *virtq;
 
 	if (ret || mlx5_vdpa_features_validate(priv)) {
 		DRV_LOG(ERR, "Failed to configure negotiated features.");
@@ -556,9 +590,17 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 		return -1;
 	}
 	priv->nr_virtqs = nr_vring;
-	for (i = 0; i < nr_vring; i++)
-		if (priv->virtqs[i].enable && mlx5_vdpa_virtq_setup(priv, i))
-			goto error;
+	for (i = 0; i < nr_vring; i++) {
+		virtq = &priv->virtqs[i];
+		if (virtq->enable) {
+			pthread_mutex_lock(&virtq->virtq_lock);
+			if (mlx5_vdpa_virtq_setup(priv, i)) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				goto error;
+			}
+			pthread_mutex_unlock(&virtq->virtq_lock);
+		}
+	}
 	return 0;
 error:
 	mlx5_vdpa_virtqs_release(priv);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v1 10/17] vdpa/mlx5: add multi-thread management for configuration
  2022-06-06 11:46 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                     ` (8 preceding siblings ...)
  2022-06-06 11:46   ` [PATCH v1 09/17] vdpa/mlx5: optimize datapath-control synchronization Li Zhang
@ 2022-06-06 11:46   ` Li Zhang
  2022-06-06 11:46   ` [PATCH v1 11/17] vdpa/mlx5: add task ring for MT management Li Zhang
                     ` (6 subsequent siblings)
  16 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:46 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

The LM process includes many object creations and
destructions in the source and the destination servers.
The longer the LM takes, the more packets the VM drops.
To reduce the LM time, parallelize the mlx5 FW configurations.
Add internal multi-thread management in the driver for this.

A new devarg defines the number of threads and their CPU affinity.
The management is shared between all the devices of the driver.
Since the event_core also affects the datapath event thread,
reduce the priority of the datapath event thread to
allow fast configuration of the devices doing the LM.
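
For illustration only (the PCI address and core number below are
placeholders, not taken from this series), the feature is enabled
through the usual mlx5 devargs string, e.g.:

    -a 0000:08:00.2,class=vdpa,event_core=2,max_conf_threads=8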

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 doc/guides/vdpadevs/mlx5.rst          |  11 +++
 drivers/vdpa/mlx5/meson.build         |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.c         |  41 ++++++++
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  36 +++++++
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 129 ++++++++++++++++++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_event.c   |   2 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   |   8 +-
 7 files changed, 223 insertions(+), 5 deletions(-)
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c

diff --git a/doc/guides/vdpadevs/mlx5.rst b/doc/guides/vdpadevs/mlx5.rst
index 0ad77bf535..b75a01688d 100644
--- a/doc/guides/vdpadevs/mlx5.rst
+++ b/doc/guides/vdpadevs/mlx5.rst
@@ -78,6 +78,17 @@ for an additional list of options shared with other mlx5 drivers.
   CPU core number to set polling thread affinity to, default to control plane
   cpu.
 
+- ``max_conf_threads`` parameter [int]
+
+  Allow the driver to use internal threads to obtain fast configuration.
+  All the threads will be opened on the same core as the event completion queue scheduling thread.
+
+  - 0, default, don't use internal threads for configuration.
+
+  - 1 - 256, number of internal threads in addition to the caller thread (8 is suggested).
+    This value, if not 0, should be the same for all the devices;
+    the first probe will take it with the event_core for all the multi-thread configurations in the driver.
+
 - ``hw_latency_mode`` parameter [int]
 
   The completion queue moderation mode:
diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build
index 0fa82ad257..9d8dbb1a82 100644
--- a/drivers/vdpa/mlx5/meson.build
+++ b/drivers/vdpa/mlx5/meson.build
@@ -15,6 +15,7 @@ sources = files(
         'mlx5_vdpa_virtq.c',
         'mlx5_vdpa_steer.c',
         'mlx5_vdpa_lm.c',
+        'mlx5_vdpa_cthread.c',
 )
 cflags_options = [
         '-std=c11',
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index e5a11f72fd..a9d023ed08 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -50,6 +50,8 @@ TAILQ_HEAD(mlx5_vdpa_privs, mlx5_vdpa_priv) priv_list =
 					      TAILQ_HEAD_INITIALIZER(priv_list);
 static pthread_mutex_t priv_list_lock = PTHREAD_MUTEX_INITIALIZER;
 
+struct mlx5_vdpa_conf_thread_mng conf_thread_mng;
+
 static void mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv);
 
 static struct mlx5_vdpa_priv *
@@ -493,6 +495,29 @@ mlx5_vdpa_args_check_handler(const char *key, const char *val, void *opaque)
 			DRV_LOG(WARNING, "Invalid event_core %s.", val);
 		else
 			priv->event_core = tmp;
+	} else if (strcmp(key, "max_conf_threads") == 0) {
+		if (tmp) {
+			priv->use_c_thread = true;
+			if (!conf_thread_mng.initializer_priv) {
+				conf_thread_mng.initializer_priv = priv;
+				if (tmp > MLX5_VDPA_MAX_C_THRD) {
+					DRV_LOG(WARNING,
+				"Invalid max_conf_threads %s "
+				"and set max_conf_threads to %d",
+				val, MLX5_VDPA_MAX_C_THRD);
+					tmp = MLX5_VDPA_MAX_C_THRD;
+				}
+				conf_thread_mng.max_thrds = tmp;
+			} else if (tmp != conf_thread_mng.max_thrds) {
+				DRV_LOG(WARNING,
+	"max_conf_threads is PMD argument and not per device, "
+	"only the first device configuration set it, current value is %d "
+	"and will not be changed to %d.",
+				conf_thread_mng.max_thrds, (int)tmp);
+			}
+		} else {
+			priv->use_c_thread = false;
+		}
 	} else if (strcmp(key, "hw_latency_mode") == 0) {
 		priv->hw_latency_mode = (uint32_t)tmp;
 	} else if (strcmp(key, "hw_max_latency_us") == 0) {
@@ -521,6 +546,9 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 		"hw_max_latency_us",
 		"hw_max_pending_comp",
 		"no_traffic_time",
+		"queue_size",
+		"queues",
+		"max_conf_threads",
 		NULL,
 	};
 
@@ -725,6 +753,13 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 	pthread_mutex_init(&priv->steer_update_lock, NULL);
 	priv->cdev = cdev;
 	mlx5_vdpa_config_get(mkvlist, priv);
+	if (priv->use_c_thread) {
+		if (conf_thread_mng.initializer_priv == priv)
+			if (mlx5_vdpa_mult_threads_create(priv->event_core))
+				goto error;
+		__atomic_fetch_add(&conf_thread_mng.refcnt, 1,
+			__ATOMIC_RELAXED);
+	}
 	if (mlx5_vdpa_create_dev_resources(priv))
 		goto error;
 	priv->vdev = rte_vdpa_register_device(cdev->dev, &mlx5_vdpa_ops);
@@ -739,6 +774,8 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 	pthread_mutex_unlock(&priv_list_lock);
 	return 0;
 error:
+	if (conf_thread_mng.initializer_priv == priv)
+		mlx5_vdpa_mult_threads_destroy(false);
 	if (priv)
 		mlx5_vdpa_dev_release(priv);
 	return -rte_errno;
@@ -806,6 +843,10 @@ mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv)
 	mlx5_vdpa_release_dev_resources(priv);
 	if (priv->vdev)
 		rte_vdpa_unregister_device(priv->vdev);
+	if (priv->use_c_thread)
+		if (__atomic_fetch_sub(&conf_thread_mng.refcnt,
+			1, __ATOMIC_RELAXED) == 1)
+			mlx5_vdpa_mult_threads_destroy(true);
 	rte_free(priv);
 }
 
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 3fd5eefc5e..4e7c2557b7 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -73,6 +73,22 @@ enum {
 	MLX5_VDPA_NOTIFIER_STATE_ERR
 };
 
+#define MLX5_VDPA_MAX_C_THRD 256
+
+/* Generic mlx5_vdpa_c_thread information. */
+struct mlx5_vdpa_c_thread {
+	pthread_t tid;
+};
+
+struct mlx5_vdpa_conf_thread_mng {
+	void *initializer_priv;
+	uint32_t refcnt;
+	uint32_t max_thrds;
+	pthread_mutex_t cthrd_lock;
+	struct mlx5_vdpa_c_thread cthrd[MLX5_VDPA_MAX_C_THRD];
+};
+extern struct mlx5_vdpa_conf_thread_mng conf_thread_mng;
+
 struct mlx5_vdpa_virtq {
 	SLIST_ENTRY(mlx5_vdpa_virtq) next;
 	uint8_t enable;
@@ -126,6 +142,7 @@ enum mlx5_dev_state {
 struct mlx5_vdpa_priv {
 	TAILQ_ENTRY(mlx5_vdpa_priv) next;
 	bool connected;
+	bool use_c_thread;
 	enum mlx5_dev_state state;
 	rte_spinlock_t db_lock;
 	pthread_mutex_t steer_update_lock;
@@ -496,4 +513,23 @@ mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv);
 
 bool
 mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv);
+
+/**
+ * Create configuration multi-threads resource
+ *
+ * @param[in] cpu_core
+ *   CPU core number to set configuration threads affinity to.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int
+mlx5_vdpa_mult_threads_create(int cpu_core);
+
+/**
+ * Destroy configuration multi-threads resource
+ *
+ */
+void
+mlx5_vdpa_mult_threads_destroy(bool need_unlock);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
new file mode 100644
index 0000000000..ba7d8b63b3
--- /dev/null
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -0,0 +1,129 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 NVIDIA Corporation & Affiliates
+ */
+#include <string.h>
+#include <unistd.h>
+#include <sys/eventfd.h>
+
+#include <rte_malloc.h>
+#include <rte_errno.h>
+#include <rte_io.h>
+#include <rte_alarm.h>
+#include <rte_tailq.h>
+#include <rte_ring_elem.h>
+
+#include <mlx5_common.h>
+
+#include "mlx5_vdpa_utils.h"
+#include "mlx5_vdpa.h"
+
+static void *
+mlx5_vdpa_c_thread_handle(void *arg)
+{
+	/* To be added later. */
+	return arg;
+}
+
+static void
+mlx5_vdpa_c_thread_destroy(uint32_t thrd_idx, bool need_unlock)
+{
+	if (conf_thread_mng.cthrd[thrd_idx].tid) {
+		pthread_cancel(conf_thread_mng.cthrd[thrd_idx].tid);
+		pthread_join(conf_thread_mng.cthrd[thrd_idx].tid, NULL);
+		conf_thread_mng.cthrd[thrd_idx].tid = 0;
+		if (need_unlock)
+			pthread_mutex_init(&conf_thread_mng.cthrd_lock, NULL);
+	}
+}
+
+static int
+mlx5_vdpa_c_thread_create(int cpu_core)
+{
+	const struct sched_param sp = {
+		.sched_priority = sched_get_priority_max(SCHED_RR),
+	};
+	rte_cpuset_t cpuset;
+	pthread_attr_t attr;
+	uint32_t thrd_idx;
+	char name[32];
+	int ret;
+
+	pthread_mutex_lock(&conf_thread_mng.cthrd_lock);
+	pthread_attr_init(&attr);
+	ret = pthread_attr_setschedpolicy(&attr, SCHED_RR);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to set thread sched policy = RR.");
+		goto c_thread_err;
+	}
+	ret = pthread_attr_setschedparam(&attr, &sp);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to set thread priority.");
+		goto c_thread_err;
+	}
+	for (thrd_idx = 0; thrd_idx < conf_thread_mng.max_thrds;
+		thrd_idx++) {
+		ret = pthread_create(&conf_thread_mng.cthrd[thrd_idx].tid,
+				&attr, mlx5_vdpa_c_thread_handle,
+				(void *)&conf_thread_mng);
+		if (ret) {
+			DRV_LOG(ERR, "Failed to create vdpa multi-threads %d.",
+					thrd_idx);
+			goto c_thread_err;
+		}
+		CPU_ZERO(&cpuset);
+		if (cpu_core != -1)
+			CPU_SET(cpu_core, &cpuset);
+		else
+			cpuset = rte_lcore_cpuset(rte_get_main_lcore());
+		ret = pthread_setaffinity_np(
+				conf_thread_mng.cthrd[thrd_idx].tid,
+				sizeof(cpuset), &cpuset);
+		if (ret) {
+			DRV_LOG(ERR, "Failed to set thread affinity for "
+			"vdpa multi-threads %d.", thrd_idx);
+			goto c_thread_err;
+		}
+		snprintf(name, sizeof(name), "vDPA-mthread-%d", thrd_idx);
+		ret = pthread_setname_np(
+				conf_thread_mng.cthrd[thrd_idx].tid, name);
+		if (ret)
+			DRV_LOG(ERR, "Failed to set vdpa multi-threads name %s.",
+					name);
+		else
+			DRV_LOG(DEBUG, "Thread name: %s.", name);
+	}
+	pthread_mutex_unlock(&conf_thread_mng.cthrd_lock);
+	return 0;
+c_thread_err:
+	for (thrd_idx = 0; thrd_idx < conf_thread_mng.max_thrds;
+		thrd_idx++)
+		mlx5_vdpa_c_thread_destroy(thrd_idx, false);
+	pthread_mutex_unlock(&conf_thread_mng.cthrd_lock);
+	return -1;
+}
+
+int
+mlx5_vdpa_mult_threads_create(int cpu_core)
+{
+	pthread_mutex_init(&conf_thread_mng.cthrd_lock, NULL);
+	if (mlx5_vdpa_c_thread_create(cpu_core)) {
+		DRV_LOG(ERR, "Cannot create vDPA configuration threads.");
+		mlx5_vdpa_mult_threads_destroy(false);
+		return -1;
+	}
+	return 0;
+}
+
+void
+mlx5_vdpa_mult_threads_destroy(bool need_unlock)
+{
+	uint32_t thrd_idx;
+
+	if (!conf_thread_mng.initializer_priv)
+		return;
+	for (thrd_idx = 0; thrd_idx < conf_thread_mng.max_thrds;
+		thrd_idx++)
+		mlx5_vdpa_c_thread_destroy(thrd_idx, need_unlock);
+	pthread_mutex_destroy(&conf_thread_mng.cthrd_lock);
+	memset(&conf_thread_mng, 0, sizeof(struct mlx5_vdpa_conf_thread_mng));
+}
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index 2b0f5936d1..b45fbac146 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -507,7 +507,7 @@ mlx5_vdpa_cqe_event_setup(struct mlx5_vdpa_priv *priv)
 	pthread_attr_t attr;
 	char name[16];
 	const struct sched_param sp = {
-		.sched_priority = sched_get_priority_max(SCHED_RR),
+		.sched_priority = sched_get_priority_max(SCHED_RR) - 1,
 	};
 
 	if (!priv->eventc)
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 138b7bdbc5..599809b09b 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -43,7 +43,7 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 			    errno == EWOULDBLOCK ||
 			    errno == EAGAIN)
 				continue;
-			DRV_LOG(ERR,  "Failed to read kickfd of virtq %d: %s",
+			DRV_LOG(ERR,  "Failed to read kickfd of virtq %d: %s.",
 				virtq->index, strerror(errno));
 		}
 		break;
@@ -57,7 +57,7 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 	rte_spinlock_unlock(&priv->db_lock);
 	pthread_mutex_unlock(&virtq->virtq_lock);
 	if (priv->state != MLX5_VDPA_STATE_CONFIGURED && !virtq->enable) {
-		DRV_LOG(ERR,  "device %d queue %d down, skip kick handling",
+		DRV_LOG(ERR,  "device %d queue %d down, skip kick handling.",
 			priv->vid, virtq->index);
 		return;
 	}
@@ -218,7 +218,7 @@ mlx5_vdpa_virtq_query(struct mlx5_vdpa_priv *priv, int index)
 		return -1;
 	}
 	if (attr.state == MLX5_VIRTQ_STATE_ERROR)
-		DRV_LOG(WARNING, "vid %d vring %d hw error=%hhu",
+		DRV_LOG(WARNING, "vid %d vring %d hw error=%hhu.",
 			priv->vid, index, attr.error_type);
 	return 0;
 }
@@ -380,7 +380,7 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 	if (ret) {
 		last_avail_idx = 0;
 		last_used_idx = 0;
-		DRV_LOG(WARNING, "Couldn't get vring base, idx are set to 0");
+		DRV_LOG(WARNING, "Couldn't get vring base, idx are set to 0.");
 	} else {
 		DRV_LOG(INFO, "vid %d: Init last_avail_idx=%d, last_used_idx=%d for "
 				"virtq %d.", priv->vid, last_avail_idx,
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v1 11/17] vdpa/mlx5: add task ring for MT management
  2022-06-06 11:46 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                     ` (9 preceding siblings ...)
  2022-06-06 11:46   ` [PATCH v1 10/17] vdpa/mlx5: add multi-thread management for configuration Li Zhang
@ 2022-06-06 11:46   ` Li Zhang
  2022-06-06 11:46   ` [PATCH v1 12/17] vdpa/mlx5: add MT task for VM memory registration Li Zhang
                     ` (5 subsequent siblings)
  16 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:46 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

The configuration thread tasks need a container to
support multiple tasks assigned to a thread in parallel.
Use an rte_ring container per thread to manage
the thread tasks without locks.
The caller thread from the user context opens a task for
a thread and enqueues it to that thread's ring.
The thread polls its ring and dequeues tasks.
That is why the ring should be in multi-producer
and single-consumer mode.
An atomic counter manages the task completion notification.
The threads report errors to the caller through
a dedicated error counter per task.
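
Below is a minimal, self-contained model of that synchronization scheme,
for illustration only: it replaces the per-thread rte_ring with a trivial
mutex-protected queue and keeps only the atomic "remaining" counter idea,
so it is not the driver code and none of its names come from this series.

/*
 * Producers hand tasks to one consumer; an atomic counter tells the
 * producers when every task has been processed.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define N_TASKS 8

static int queue[N_TASKS];
static int q_head, q_tail;
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_cond = PTHREAD_COND_INITIALIZER;
static atomic_uint remaining;

static void *worker(void *arg)
{
	(void)arg;
	for (;;) {
		int task;

		pthread_mutex_lock(&q_lock);
		while (q_head == q_tail)
			pthread_cond_wait(&q_cond, &q_lock);
		task = queue[q_head++ % N_TASKS];
		pthread_mutex_unlock(&q_lock);
		if (task < 0)
			break;				/* poison value: stop */
		/* ... do the configuration work for "task" here ... */
		atomic_fetch_sub(&remaining, 1);	/* completion report */
	}
	return NULL;
}

int main(void)
{
	pthread_t tid;
	int i;

	pthread_create(&tid, NULL, worker, NULL);
	for (i = 0; i < N_TASKS - 1; i++) {
		atomic_fetch_add(&remaining, 1);	/* count before enqueue */
		pthread_mutex_lock(&q_lock);
		queue[q_tail++ % N_TASKS] = i;
		pthread_cond_signal(&q_cond);		/* wake the worker */
		pthread_mutex_unlock(&q_lock);
	}
	while (atomic_load(&remaining) != 0)		/* caller waits for all */
		;
	pthread_mutex_lock(&q_lock);
	queue[q_tail++ % N_TASKS] = -1;			/* stop the worker */
	pthread_cond_signal(&q_cond);
	pthread_mutex_unlock(&q_lock);
	pthread_join(tid, NULL);
	printf("all tasks done\n");
	return 0;
}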

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  17 ++++
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 115 +++++++++++++++++++++++++-
 2 files changed, 130 insertions(+), 2 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 4e7c2557b7..2bbb868ec6 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -74,10 +74,22 @@ enum {
 };
 
 #define MLX5_VDPA_MAX_C_THRD 256
+#define MLX5_VDPA_MAX_TASKS_PER_THRD 4096
+#define MLX5_VDPA_TASKS_PER_DEV 64
+
+/* Generic task information and size must be multiple of 4B. */
+struct mlx5_vdpa_task {
+	struct mlx5_vdpa_priv *priv;
+	uint32_t *remaining_cnt;
+	uint32_t *err_cnt;
+	uint32_t idx;
+} __rte_packed __rte_aligned(4);
 
 /* Generic mlx5_vdpa_c_thread information. */
 struct mlx5_vdpa_c_thread {
 	pthread_t tid;
+	struct rte_ring *rng;
+	pthread_cond_t c_cond;
 };
 
 struct mlx5_vdpa_conf_thread_mng {
@@ -532,4 +544,9 @@ mlx5_vdpa_mult_threads_create(int cpu_core);
  */
 void
 mlx5_vdpa_mult_threads_destroy(bool need_unlock);
+
+bool
+mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
+		uint32_t thrd_idx,
+		uint32_t num);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index ba7d8b63b3..1fdc92d3ad 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -11,17 +11,103 @@
 #include <rte_alarm.h>
 #include <rte_tailq.h>
 #include <rte_ring_elem.h>
+#include <rte_ring_peek.h>
 
 #include <mlx5_common.h>
 
 #include "mlx5_vdpa_utils.h"
 #include "mlx5_vdpa.h"
 
+static inline uint32_t
+mlx5_vdpa_c_thrd_ring_dequeue_bulk(struct rte_ring *r,
+	void **obj, uint32_t n, uint32_t *avail)
+{
+	uint32_t m;
+
+	m = rte_ring_dequeue_bulk_elem_start(r, obj,
+		sizeof(struct mlx5_vdpa_task), n, avail);
+	n = (m == n) ? n : 0;
+	rte_ring_dequeue_elem_finish(r, n);
+	return n;
+}
+
+static inline uint32_t
+mlx5_vdpa_c_thrd_ring_enqueue_bulk(struct rte_ring *r,
+	void * const *obj, uint32_t n, uint32_t *free)
+{
+	uint32_t m;
+
+	m = rte_ring_enqueue_bulk_elem_start(r, n, free);
+	n = (m == n) ? n : 0;
+	rte_ring_enqueue_elem_finish(r, obj,
+		sizeof(struct mlx5_vdpa_task), n);
+	return n;
+}
+
+bool
+mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
+		uint32_t thrd_idx,
+		uint32_t num)
+{
+	struct rte_ring *rng = conf_thread_mng.cthrd[thrd_idx].rng;
+	struct mlx5_vdpa_task task[MLX5_VDPA_TASKS_PER_DEV];
+	uint32_t i;
+
+	MLX5_ASSERT(num <= MLX5_VDPA_TASKS_PER_DEV);
+	for (i = 0 ; i < num; i++) {
+		task[i].priv = priv;
+		/* To be added later. */
+	}
+	if (!mlx5_vdpa_c_thrd_ring_enqueue_bulk(rng, (void **)&task, num, NULL))
+		return -1;
+	for (i = 0 ; i < num; i++)
+		if (task[i].remaining_cnt)
+			__atomic_fetch_add(task[i].remaining_cnt, 1,
+				__ATOMIC_RELAXED);
+	/* wake up conf thread. */
+	pthread_mutex_lock(&conf_thread_mng.cthrd_lock);
+	pthread_cond_signal(&conf_thread_mng.cthrd[thrd_idx].c_cond);
+	pthread_mutex_unlock(&conf_thread_mng.cthrd_lock);
+	return 0;
+}
+
 static void *
 mlx5_vdpa_c_thread_handle(void *arg)
 {
-	/* To be added later. */
-	return arg;
+	struct mlx5_vdpa_conf_thread_mng *multhrd = arg;
+	pthread_t thread_id = pthread_self();
+	struct mlx5_vdpa_priv *priv;
+	struct mlx5_vdpa_task task;
+	struct rte_ring *rng;
+	uint32_t thrd_idx;
+	uint32_t task_num;
+
+	for (thrd_idx = 0; thrd_idx < multhrd->max_thrds;
+		thrd_idx++)
+		if (multhrd->cthrd[thrd_idx].tid == thread_id)
+			break;
+	if (thrd_idx >= multhrd->max_thrds)
+		return NULL;
+	rng = multhrd->cthrd[thrd_idx].rng;
+	while (1) {
+		task_num = mlx5_vdpa_c_thrd_ring_dequeue_bulk(rng,
+			(void **)&task, 1, NULL);
+		if (!task_num) {
+			/* No task and condition wait. */
+			pthread_mutex_lock(&multhrd->cthrd_lock);
+			pthread_cond_wait(
+				&multhrd->cthrd[thrd_idx].c_cond,
+				&multhrd->cthrd_lock);
+			pthread_mutex_unlock(&multhrd->cthrd_lock);
+		}
+		priv = task.priv;
+		if (priv == NULL)
+			continue;
+		__atomic_fetch_sub(task.remaining_cnt,
+			1, __ATOMIC_RELAXED);
+		/* To be added later. */
+	}
+	return NULL;
 }
 
 static void
@@ -34,6 +120,10 @@ mlx5_vdpa_c_thread_destroy(uint32_t thrd_idx, bool need_unlock)
 		if (need_unlock)
 			pthread_mutex_init(&conf_thread_mng.cthrd_lock, NULL);
 	}
+	if (conf_thread_mng.cthrd[thrd_idx].rng) {
+		rte_ring_free(conf_thread_mng.cthrd[thrd_idx].rng);
+		conf_thread_mng.cthrd[thrd_idx].rng = NULL;
+	}
 }
 
 static int
@@ -45,6 +135,7 @@ mlx5_vdpa_c_thread_create(int cpu_core)
 	rte_cpuset_t cpuset;
 	pthread_attr_t attr;
 	uint32_t thrd_idx;
+	uint32_t ring_num;
 	char name[32];
 	int ret;
 
@@ -60,8 +151,26 @@ mlx5_vdpa_c_thread_create(int cpu_core)
 		DRV_LOG(ERR, "Failed to set thread priority.");
 		goto c_thread_err;
 	}
+	ring_num = MLX5_VDPA_MAX_TASKS_PER_THRD / conf_thread_mng.max_thrds;
+	if (!ring_num) {
+		DRV_LOG(ERR, "Invalid ring number for thread.");
+		goto c_thread_err;
+	}
 	for (thrd_idx = 0; thrd_idx < conf_thread_mng.max_thrds;
 		thrd_idx++) {
+		snprintf(name, sizeof(name), "vDPA-mthread-ring-%d",
+			thrd_idx);
+		conf_thread_mng.cthrd[thrd_idx].rng = rte_ring_create_elem(name,
+			sizeof(struct mlx5_vdpa_task), ring_num,
+			rte_socket_id(),
+			RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ |
+			RING_F_EXACT_SZ);
+		if (!conf_thread_mng.cthrd[thrd_idx].rng) {
+			DRV_LOG(ERR,
+			"Failed to create vdpa multi-threads %d ring.",
+			thrd_idx);
+			goto c_thread_err;
+		}
 		ret = pthread_create(&conf_thread_mng.cthrd[thrd_idx].tid,
 				&attr, mlx5_vdpa_c_thread_handle,
 				(void *)&conf_thread_mng);
@@ -91,6 +200,8 @@ mlx5_vdpa_c_thread_create(int cpu_core)
 					name);
 		else
 			DRV_LOG(DEBUG, "Thread name: %s.", name);
+		pthread_cond_init(&conf_thread_mng.cthrd[thrd_idx].c_cond,
+			NULL);
 	}
 	pthread_mutex_unlock(&conf_thread_mng.cthrd_lock);
 	return 0;
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v1 12/17] vdpa/mlx5: add MT task for VM memory registration
  2022-06-06 11:46 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                     ` (10 preceding siblings ...)
  2022-06-06 11:46   ` [PATCH v1 11/17] vdpa/mlx5: add task ring for MT management Li Zhang
@ 2022-06-06 11:46   ` Li Zhang
  2022-06-06 11:46   ` [PATCH v1 13/17] vdpa/mlx5: add virtq creation task for MT management Li Zhang
                     ` (4 subsequent siblings)
  16 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:46 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

The driver creates a direct HW MR object
for each VM memory region,
which maps the VM physical address to
the actual physical address.

Later, after all the MRs are ready,
the driver creates an indirect MR to group all the direct MRs
into one virtual space from the HW perspective.

Create the direct MRs in parallel using the MT mechanism.
After completion, the primary thread creates the indirect MR
needed for the subsequent virtq configurations.

This optimization accelerates the LM process and
reduces its time by 5%.
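
As a worked example (the numbers are illustrative only): with
max_conf_threads=8 and a VM exposing 16 memory regions, the regions
whose index satisfies i % 9 == 0 (regions 0 and 9) are registered by
the caller thread itself, while the other 14 regions are spread
round-robin over the 8 configuration threads; the caller then polls the
shared "remaining" counter until every direct MR is ready and only then
builds the indirect mkey.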

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c         |   1 -
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  31 ++-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c |  47 ++++-
 drivers/vdpa/mlx5/mlx5_vdpa_mem.c     | 270 ++++++++++++++++++--------
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   |   6 +-
 5 files changed, 258 insertions(+), 97 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index a9d023ed08..e3b32fa087 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -768,7 +768,6 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 		rte_errno = rte_errno ? rte_errno : EINVAL;
 		goto error;
 	}
-	SLIST_INIT(&priv->mr_list);
 	pthread_mutex_lock(&priv_list_lock);
 	TAILQ_INSERT_TAIL(&priv_list, priv, next);
 	pthread_mutex_unlock(&priv_list_lock);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 2bbb868ec6..3316ce42be 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -59,7 +59,6 @@ struct mlx5_vdpa_event_qp {
 };
 
 struct mlx5_vdpa_query_mr {
-	SLIST_ENTRY(mlx5_vdpa_query_mr) next;
 	union {
 		struct ibv_mr *mr;
 		struct mlx5_devx_obj *mkey;
@@ -76,10 +75,17 @@ enum {
 #define MLX5_VDPA_MAX_C_THRD 256
 #define MLX5_VDPA_MAX_TASKS_PER_THRD 4096
 #define MLX5_VDPA_TASKS_PER_DEV 64
+#define MLX5_VDPA_MAX_MRS 0xFFFF
+
+/* Vdpa task types. */
+enum mlx5_vdpa_task_type {
+	MLX5_VDPA_TASK_REG_MR = 1,
+};
 
 /* Generic task information and size must be multiple of 4B. */
 struct mlx5_vdpa_task {
 	struct mlx5_vdpa_priv *priv;
+	enum mlx5_vdpa_task_type type;
 	uint32_t *remaining_cnt;
 	uint32_t *err_cnt;
 	uint32_t idx;
@@ -101,6 +107,14 @@ struct mlx5_vdpa_conf_thread_mng {
 };
 extern struct mlx5_vdpa_conf_thread_mng conf_thread_mng;
 
+struct mlx5_vdpa_vmem_info {
+	struct rte_vhost_memory *vmem;
+	uint32_t entries_num;
+	uint64_t gcd;
+	uint64_t size;
+	uint8_t mode;
+};
+
 struct mlx5_vdpa_virtq {
 	SLIST_ENTRY(mlx5_vdpa_virtq) next;
 	uint8_t enable;
@@ -176,7 +190,7 @@ struct mlx5_vdpa_priv {
 	struct mlx5_hca_vdpa_attr caps;
 	uint32_t gpa_mkey_index;
 	struct ibv_mr *null_mr;
-	struct rte_vhost_memory *vmem;
+	struct mlx5_vdpa_vmem_info vmem_info;
 	struct mlx5dv_devx_event_channel *eventc;
 	struct mlx5dv_devx_event_channel *err_chnl;
 	struct mlx5_uar uar;
@@ -187,11 +201,13 @@ struct mlx5_vdpa_priv {
 	uint8_t num_lag_ports;
 	uint64_t features; /* Negotiated features. */
 	uint16_t log_max_rqt_size;
+	uint16_t last_c_thrd_idx;
+	uint16_t num_mrs; /* Number of memory regions. */
 	struct mlx5_vdpa_steer steer;
 	struct mlx5dv_var *var;
 	void *virtq_db_addr;
 	struct mlx5_pmd_wrapped_mr lm_mr;
-	SLIST_HEAD(mr_list, mlx5_vdpa_query_mr) mr_list;
+	struct mlx5_vdpa_query_mr **mrs;
 	struct mlx5_vdpa_virtq virtqs[];
 };
 
@@ -548,5 +564,12 @@ mlx5_vdpa_mult_threads_destroy(bool need_unlock);
 bool
 mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
 		uint32_t thrd_idx,
-		uint32_t num);
+		enum mlx5_vdpa_task_type task_type,
+		uint32_t *bulk_refcnt, uint32_t *bulk_err_cnt,
+		void **task_data, uint32_t num);
+int
+mlx5_vdpa_register_mr(struct mlx5_vdpa_priv *priv, uint32_t idx);
+bool
+mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
+		uint32_t *err_cnt, uint32_t sleep_time);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index 1fdc92d3ad..10391931ae 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -47,16 +47,23 @@ mlx5_vdpa_c_thrd_ring_enqueue_bulk(struct rte_ring *r,
 bool
 mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
 		uint32_t thrd_idx,
-		uint32_t num)
+		enum mlx5_vdpa_task_type task_type,
+		uint32_t *remaining_cnt, uint32_t *err_cnt,
+		void **task_data, uint32_t num)
 {
 	struct rte_ring *rng = conf_thread_mng.cthrd[thrd_idx].rng;
 	struct mlx5_vdpa_task task[MLX5_VDPA_TASKS_PER_DEV];
+	uint32_t *data = (uint32_t *)task_data;
 	uint32_t i;
 
 	MLX5_ASSERT(num <= MLX5_VDPA_TASKS_PER_DEV);
 	for (i = 0 ; i < num; i++) {
 		task[i].priv = priv;
 		/* To be added later. */
+		task[i].type = task_type;
+		task[i].remaining_cnt = remaining_cnt;
+		task[i].err_cnt = err_cnt;
+		task[i].idx = data[i];
 	}
 	if (!mlx5_vdpa_c_thrd_ring_enqueue_bulk(rng, (void **)&task, num, NULL))
 		return -1;
@@ -71,6 +78,23 @@ mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
 	return 0;
 }
 
+bool
+mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
+		uint32_t *err_cnt, uint32_t sleep_time)
+{
+	/* Check and wait all tasks done. */
+	while (__atomic_load_n(remaining_cnt,
+		__ATOMIC_RELAXED) != 0) {
+		rte_delay_us_sleep(sleep_time);
+	}
+	if (__atomic_load_n(err_cnt,
+		__ATOMIC_RELAXED)) {
+		DRV_LOG(ERR, "Tasks done with error.");
+		return true;
+	}
+	return false;
+}
+
 static void *
 mlx5_vdpa_c_thread_handle(void *arg)
 {
@@ -81,6 +105,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
 	struct rte_ring *rng;
 	uint32_t thrd_idx;
 	uint32_t task_num;
+	int ret;
 
 	for (thrd_idx = 0; thrd_idx < multhrd->max_thrds;
 		thrd_idx++)
@@ -99,13 +124,29 @@ mlx5_vdpa_c_thread_handle(void *arg)
 				&multhrd->cthrd[thrd_idx].c_cond,
 				&multhrd->cthrd_lock);
 			pthread_mutex_unlock(&multhrd->cthrd_lock);
+			continue;
 		}
 		priv = task.priv;
 		if (priv == NULL)
 			continue;
-		__atomic_fetch_sub(task.remaining_cnt,
+		switch (task.type) {
+		case MLX5_VDPA_TASK_REG_MR:
+			ret = mlx5_vdpa_register_mr(priv, task.idx);
+			if (ret) {
+				DRV_LOG(ERR,
+				"Failed to register mr %d.", task.idx);
+				__atomic_fetch_add(task.err_cnt, 1,
+				__ATOMIC_RELAXED);
+			}
+			break;
+		default:
+			DRV_LOG(ERR, "Invalid vdpa task type %d.",
+			task.type);
+			break;
+		}
+		if (task.remaining_cnt)
+			__atomic_fetch_sub(task.remaining_cnt,
 			1, __ATOMIC_RELAXED);
-		/* To be added later. */
 	}
 	return NULL;
 }
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_mem.c b/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
index d6e3dd664b..e333f0bca6 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
@@ -17,25 +17,33 @@
 void
 mlx5_vdpa_mem_dereg(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_query_mr *mrs =
+		(struct mlx5_vdpa_query_mr *)priv->mrs;
 	struct mlx5_vdpa_query_mr *entry;
-	struct mlx5_vdpa_query_mr *next;
+	int i;
 
-	entry = SLIST_FIRST(&priv->mr_list);
-	while (entry) {
-		next = SLIST_NEXT(entry, next);
-		if (entry->is_indirect)
-			claim_zero(mlx5_devx_cmd_destroy(entry->mkey));
-		else
-			claim_zero(mlx5_glue->dereg_mr(entry->mr));
-		SLIST_REMOVE(&priv->mr_list, entry, mlx5_vdpa_query_mr, next);
-		rte_free(entry);
-		entry = next;
+	if (priv->mrs) {
+		for (i = priv->num_mrs - 1; i >= 0; i--) {
+			entry = &mrs[i];
+			if (entry->is_indirect) {
+				if (entry->mkey)
+					claim_zero(
+					mlx5_devx_cmd_destroy(entry->mkey));
+			} else {
+				if (entry->mr)
+					claim_zero(
+					mlx5_glue->dereg_mr(entry->mr));
+			}
+		}
+		rte_free(priv->mrs);
+		priv->mrs = NULL;
+		priv->num_mrs = 0;
 	}
-	SLIST_INIT(&priv->mr_list);
-	if (priv->vmem) {
-		free(priv->vmem);
-		priv->vmem = NULL;
+	if (priv->vmem_info.vmem) {
+		free(priv->vmem_info.vmem);
+		priv->vmem_info.vmem = NULL;
 	}
+	priv->gpa_mkey_index = 0;
 }
 
 static int
@@ -167,72 +175,29 @@ mlx5_vdpa_mem_cmp(struct rte_vhost_memory *mem1, struct rte_vhost_memory *mem2)
 #define KLM_SIZE_MAX_ALIGN(sz) ((sz) > MLX5_MAX_KLM_BYTE_COUNT ? \
 				MLX5_MAX_KLM_BYTE_COUNT : (sz))
 
-/*
- * The target here is to group all the physical memory regions of the
- * virtio device in one indirect mkey.
- * For KLM Fixed Buffer Size mode (HW find the translation entry in one
- * read according to the guest physical address):
- * All the sub-direct mkeys of it must be in the same size, hence, each
- * one of them should be in the GCD size of all the virtio memory
- * regions and the holes between them.
- * For KLM mode (each entry may be in different size so HW must iterate
- * the entries):
- * Each virtio memory region and each hole between them have one entry,
- * just need to cover the maximum allowed size(2G) by splitting entries
- * which their associated memory regions are bigger than 2G.
- * It means that each virtio memory region may be mapped to more than
- * one direct mkey in the 2 modes.
- * All the holes of invalid memory between the virtio memory regions
- * will be mapped to the null memory region for security.
- */
-int
-mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
+static int
+mlx5_vdpa_create_indirect_mkey(struct mlx5_vdpa_priv *priv)
 {
 	struct mlx5_devx_mkey_attr mkey_attr;
-	struct mlx5_vdpa_query_mr *entry = NULL;
-	struct rte_vhost_mem_region *reg = NULL;
-	uint8_t mode = 0;
-	uint32_t entries_num = 0;
-	uint32_t i;
-	uint64_t gcd = 0;
+	struct mlx5_vdpa_query_mr *mrs =
+		(struct mlx5_vdpa_query_mr *)priv->mrs;
+	struct mlx5_vdpa_query_mr *entry;
+	struct rte_vhost_mem_region *reg;
+	uint8_t mode = priv->vmem_info.mode;
+	uint32_t entries_num = priv->vmem_info.entries_num;
+	struct rte_vhost_memory *mem = priv->vmem_info.vmem;
+	struct mlx5_klm klm_array[entries_num];
+	uint64_t gcd = priv->vmem_info.gcd;
+	int ret = -rte_errno;
 	uint64_t klm_size;
-	uint64_t mem_size;
-	uint64_t k;
 	int klm_index = 0;
-	int ret;
-	struct rte_vhost_memory *mem = mlx5_vdpa_vhost_mem_regions_prepare
-			      (priv->vid, &mode, &mem_size, &gcd, &entries_num);
-	struct mlx5_klm klm_array[entries_num];
+	uint64_t k;
+	uint32_t i;
 
-	if (!mem)
-		return -rte_errno;
-	if (priv->vmem != NULL) {
-		if (mlx5_vdpa_mem_cmp(mem, priv->vmem) == 0) {
-			/* VM memory not changed, reuse resources. */
-			free(mem);
-			return 0;
-		}
-		mlx5_vdpa_mem_dereg(priv);
-	}
-	priv->vmem = mem;
+	/* If it is the last entry, create indirect mkey. */
 	for (i = 0; i < mem->nregions; i++) {
+		entry = &mrs[i];
 		reg = &mem->regions[i];
-		entry = rte_zmalloc(__func__, sizeof(*entry), 0);
-		if (!entry) {
-			ret = -ENOMEM;
-			DRV_LOG(ERR, "Failed to allocate mem entry memory.");
-			goto error;
-		}
-		entry->mr = mlx5_glue->reg_mr_iova(priv->cdev->pd,
-				       (void *)(uintptr_t)(reg->host_user_addr),
-				       reg->size, reg->guest_phys_addr,
-				       IBV_ACCESS_LOCAL_WRITE);
-		if (!entry->mr) {
-			DRV_LOG(ERR, "Failed to create direct Mkey.");
-			ret = -rte_errno;
-			goto error;
-		}
-		entry->is_indirect = 0;
 		if (i > 0) {
 			uint64_t sadd;
 			uint64_t empty_region_sz = reg->guest_phys_addr -
@@ -265,11 +230,10 @@ mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
 			klm_array[klm_index].address = reg->guest_phys_addr + k;
 			klm_index++;
 		}
-		SLIST_INSERT_HEAD(&priv->mr_list, entry, next);
 	}
 	memset(&mkey_attr, 0, sizeof(mkey_attr));
 	mkey_attr.addr = (uintptr_t)(mem->regions[0].guest_phys_addr);
-	mkey_attr.size = mem_size;
+	mkey_attr.size = priv->vmem_info.size;
 	mkey_attr.pd = priv->cdev->pdn;
 	mkey_attr.umem_id = 0;
 	/* Must be zero for KLM mode. */
@@ -278,25 +242,159 @@ mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
 	mkey_attr.pg_access = 0;
 	mkey_attr.klm_array = klm_array;
 	mkey_attr.klm_num = klm_index;
-	entry = rte_zmalloc(__func__, sizeof(*entry), 0);
-	if (!entry) {
-		DRV_LOG(ERR, "Failed to allocate memory for indirect entry.");
-		ret = -ENOMEM;
-		goto error;
-	}
+	entry = &mrs[mem->nregions];
 	entry->mkey = mlx5_devx_cmd_mkey_create(priv->cdev->ctx, &mkey_attr);
 	if (!entry->mkey) {
 		DRV_LOG(ERR, "Failed to create indirect Mkey.");
-		ret = -rte_errno;
-		goto error;
+		rte_errno = -ret;
+		return ret;
 	}
 	entry->is_indirect = 1;
-	SLIST_INSERT_HEAD(&priv->mr_list, entry, next);
 	priv->gpa_mkey_index = entry->mkey->id;
 	return 0;
+}
+
+/*
+ * The target here is to group all the physical memory regions of the
+ * virtio device in one indirect mkey.
+ * For KLM Fixed Buffer Size mode (HW find the translation entry in one
+ * read according to the guest physical address):
+ * All the sub-direct mkeys of it must be in the same size, hence, each
+ * one of them should be in the GCD size of all the virtio memory
+ * regions and the holes between them.
+ * For KLM mode (each entry may be in different size so HW must iterate
+ * the entries):
+ * Each virtio memory region and each hole between them have one entry,
+ * just need to cover the maximum allowed size(2G) by splitting entries
+ * which their associated memory regions are bigger than 2G.
+ * It means that each virtio memory region may be mapped to more than
+ * one direct mkey in the 2 modes.
+ * All the holes of invalid memory between the virtio memory regions
+ * will be mapped to the null memory region for security.
+ */
+int
+mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
+{
+	void *mrs;
+	uint8_t mode = 0;
+	int ret = -rte_errno;
+	uint32_t i, thrd_idx, data[1];
+	uint32_t remaining_cnt = 0, err_cnt = 0, task_num = 0;
+	struct rte_vhost_memory *mem = mlx5_vdpa_vhost_mem_regions_prepare
+			(priv->vid, &mode, &priv->vmem_info.size,
+			&priv->vmem_info.gcd, &priv->vmem_info.entries_num);
+
+	if (!mem)
+		return -rte_errno;
+	if (priv->vmem_info.vmem != NULL) {
+		if (mlx5_vdpa_mem_cmp(mem, priv->vmem_info.vmem) == 0) {
+			/* VM memory not changed, reuse resources. */
+			free(mem);
+			return 0;
+		}
+		mlx5_vdpa_mem_dereg(priv);
+	}
+	priv->vmem_info.vmem = mem;
+	priv->vmem_info.mode = mode;
+	priv->num_mrs = mem->nregions;
+	if (!priv->num_mrs || priv->num_mrs >= MLX5_VDPA_MAX_MRS) {
+		DRV_LOG(ERR,
+		"Invalid number of memory regions.");
+		goto error;
+	}
+	/* The last one is indirect mkey entry. */
+	priv->num_mrs++;
+	mrs = rte_zmalloc("mlx5 vDPA memory regions",
+		sizeof(struct mlx5_vdpa_query_mr) * priv->num_mrs, 0);
+	priv->mrs = mrs;
+	if (!priv->mrs) {
+		DRV_LOG(ERR, "Failed to allocate private memory regions.");
+		goto error;
+	}
+	if (priv->use_c_thread) {
+		uint32_t main_task_idx[mem->nregions];
+
+		for (i = 0; i < mem->nregions; i++) {
+			thrd_idx = i % (conf_thread_mng.max_thrds + 1);
+			if (!thrd_idx) {
+				main_task_idx[task_num] = i;
+				task_num++;
+				continue;
+			}
+			thrd_idx = priv->last_c_thrd_idx + 1;
+			if (thrd_idx >= conf_thread_mng.max_thrds)
+				thrd_idx = 0;
+			priv->last_c_thrd_idx = thrd_idx;
+			data[0] = i;
+			if (mlx5_vdpa_task_add(priv, thrd_idx,
+				MLX5_VDPA_TASK_REG_MR,
+				&remaining_cnt, &err_cnt,
+				(void **)&data, 1)) {
+				DRV_LOG(ERR,
+				"Fail to add task mem region (%d)", i);
+				main_task_idx[task_num] = i;
+				task_num++;
+			}
+		}
+		for (i = 0; i < task_num; i++) {
+			ret = mlx5_vdpa_register_mr(priv,
+					main_task_idx[i]);
+			if (ret) {
+				DRV_LOG(ERR,
+				"Failed to register mem region %d.", i);
+				goto error;
+			}
+		}
+		if (mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining_cnt,
+			&err_cnt, 100)) {
+			DRV_LOG(ERR,
+			"Failed to wait register mem region tasks ready.");
+			goto error;
+		}
+	} else {
+		for (i = 0; i < mem->nregions; i++) {
+			ret = mlx5_vdpa_register_mr(priv, i);
+			if (ret) {
+				DRV_LOG(ERR,
+				"Failed to register mem region %d.", i);
+				goto error;
+			}
+		}
+	}
+	ret = mlx5_vdpa_create_indirect_mkey(priv);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to create indirect mkey .");
+		goto error;
+	}
+	return 0;
 error:
-	rte_free(entry);
 	mlx5_vdpa_mem_dereg(priv);
 	rte_errno = -ret;
 	return ret;
 }
+
+int
+mlx5_vdpa_register_mr(struct mlx5_vdpa_priv *priv, uint32_t idx)
+{
+	struct rte_vhost_memory *mem = priv->vmem_info.vmem;
+	struct mlx5_vdpa_query_mr *mrs =
+		(struct mlx5_vdpa_query_mr *)priv->mrs;
+	struct mlx5_vdpa_query_mr *entry;
+	struct rte_vhost_mem_region *reg;
+	int ret;
+
+	reg = &mem->regions[idx];
+	entry = &mrs[idx];
+	entry->mr = mlx5_glue->reg_mr_iova
+				      (priv->cdev->pd,
+				       (void *)(uintptr_t)(reg->host_user_addr),
+				       reg->size, reg->guest_phys_addr,
+				       IBV_ACCESS_LOCAL_WRITE);
+	if (!entry->mr) {
+		DRV_LOG(ERR, "Failed to create direct Mkey.");
+		ret = -rte_errno;
+		return ret;
+	}
+	entry->is_indirect = 0;
+	return 0;
+}
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 599809b09b..0b317655db 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -353,21 +353,21 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 		}
 	}
 	if (attr->q_type == MLX5_VIRTQ_TYPE_SPLIT) {
-		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
+		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem_info.vmem,
 					   (uint64_t)(uintptr_t)vq->desc);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get descriptor ring GPA.");
 			return -1;
 		}
 		attr->desc_addr = gpa;
-		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
+		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem_info.vmem,
 					   (uint64_t)(uintptr_t)vq->used);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get GPA for used ring.");
 			return -1;
 		}
 		attr->used_addr = gpa;
-		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
+		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem_info.vmem,
 					   (uint64_t)(uintptr_t)vq->avail);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get GPA for available ring.");
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v1 13/17] vdpa/mlx5: add virtq creation task for MT management
  2022-06-06 11:46 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                     ` (11 preceding siblings ...)
  2022-06-06 11:46   ` [PATCH v1 12/17] vdpa/mlx5: add MT task for VM memory registration Li Zhang
@ 2022-06-06 11:46   ` Li Zhang
  2022-06-06 11:46   ` [PATCH v1 14/17] vdpa/mlx5: add virtq LM log task Li Zhang
                     ` (3 subsequent siblings)
  16 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:46 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

Creating the virtq object and all its sub-resources requires many
FW commands and can be accelerated by the MT management.
Split the virtq creation between the configuration threads.
This accelerates the LM process and reduces its time by 20%.

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.h         |   9 +-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c |  14 +++
 drivers/vdpa/mlx5/mlx5_vdpa_event.c   |   2 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 149 +++++++++++++++++++-------
 4 files changed, 134 insertions(+), 40 deletions(-)
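
For orientation, a minimal stand-alone sketch (not part of the patch;
the thread and queue counts below are made up) of the scheduling policy
used here: every (max_thrds + 1)-th queue stays on the caller thread,
the remaining queues rotate over the internal configuration threads in
round-robin order.

/*
 * Toy illustration of the queue-to-thread distribution; it only prints
 * which thread would handle each virtq.
 */
#include <stdio.h>

int
main(void)
{
	int max_thrds = 3;	/* assumed number of internal threads */
	int nr_vring = 8;	/* assumed number of virtqs */
	int last_thrd = 0;
	int i;

	for (i = 0; i < nr_vring; i++) {
		if (i % (max_thrds + 1) == 0) {
			printf("virtq %d -> caller thread\n", i);
			continue;
		}
		last_thrd = (last_thrd + 1) % max_thrds;
		printf("virtq %d -> internal thread %d\n", i, last_thrd);
	}
	return 0;
}

Keeping a share of the queues on the caller thread lets it contribute
work instead of only waiting for the internal threads to drain their
task rings.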

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 3316ce42be..35221f5ddc 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -80,6 +80,7 @@ enum {
 /* Vdpa task types. */
 enum mlx5_vdpa_task_type {
 	MLX5_VDPA_TASK_REG_MR = 1,
+	MLX5_VDPA_TASK_SETUP_VIRTQ,
 };
 
 /* Generic task information and size must be multiple of 4B. */
@@ -117,12 +118,12 @@ struct mlx5_vdpa_vmem_info {
 
 struct mlx5_vdpa_virtq {
 	SLIST_ENTRY(mlx5_vdpa_virtq) next;
-	uint8_t enable;
 	uint16_t index;
 	uint16_t vq_size;
 	uint8_t notifier_state;
-	bool stopped;
 	uint32_t configured:1;
+	uint32_t enable:1;
+	uint32_t stopped:1;
 	uint32_t version;
 	pthread_mutex_t virtq_lock;
 	struct mlx5_vdpa_priv *priv;
@@ -565,11 +566,13 @@ bool
 mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
 		uint32_t thrd_idx,
 		enum mlx5_vdpa_task_type task_type,
-		uint32_t *bulk_refcnt, uint32_t *bulk_err_cnt,
+		uint32_t *remaining_cnt, uint32_t *err_cnt,
 		void **task_data, uint32_t num);
 int
 mlx5_vdpa_register_mr(struct mlx5_vdpa_priv *priv, uint32_t idx);
 bool
 mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
 		uint32_t *err_cnt, uint32_t sleep_time);
+int
+mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index 10391931ae..1389d369ae 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -100,6 +100,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
 {
 	struct mlx5_vdpa_conf_thread_mng *multhrd = arg;
 	pthread_t thread_id = pthread_self();
+	struct mlx5_vdpa_virtq *virtq;
 	struct mlx5_vdpa_priv *priv;
 	struct mlx5_vdpa_task task;
 	struct rte_ring *rng;
@@ -139,6 +140,19 @@ mlx5_vdpa_c_thread_handle(void *arg)
 				__ATOMIC_RELAXED);
 			}
 			break;
+		case MLX5_VDPA_TASK_SETUP_VIRTQ:
+			virtq = &priv->virtqs[task.idx];
+			pthread_mutex_lock(&virtq->virtq_lock);
+			ret = mlx5_vdpa_virtq_setup(priv,
+				task.idx, false);
+			if (ret) {
+				DRV_LOG(ERR,
+					"Failed to setup virtq %d.", task.idx);
+				__atomic_fetch_add(
+					task.err_cnt, 1, __ATOMIC_RELAXED);
+			}
+			pthread_mutex_unlock(&virtq->virtq_lock);
+			break;
 		default:
 			DRV_LOG(ERR, "Invalid vdpa task type %d.",
 			task.type);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index b45fbac146..f782b6b832 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -371,7 +371,7 @@ mlx5_vdpa_err_interrupt_handler(void *cb_arg __rte_unused)
 			goto unlock;
 		if (rte_rdtsc() / rte_get_tsc_hz() < MLX5_VDPA_ERROR_TIME_SEC)
 			goto unlock;
-		virtq->stopped = true;
+		virtq->stopped = 1;
 		/* Query error info. */
 		if (mlx5_vdpa_virtq_query(priv, vq_index))
 			goto log;
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 0b317655db..db05220e76 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -111,8 +111,9 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
+		if (virtq->index != i)
+			continue;
 		pthread_mutex_lock(&virtq->virtq_lock);
-		virtq->configured = 0;
 		for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
 			if (virtq->umems[j].obj) {
 				claim_zero(mlx5_glue->devx_umem_dereg
@@ -131,7 +132,6 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 	}
 }
 
-
 static int
 mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 {
@@ -191,7 +191,7 @@ mlx5_vdpa_virtq_stop(struct mlx5_vdpa_priv *priv, int index)
 	ret = mlx5_vdpa_virtq_modify(virtq, 0);
 	if (ret)
 		return -1;
-	virtq->stopped = true;
+	virtq->stopped = 1;
 	DRV_LOG(DEBUG, "vid %u virtq %u was stopped.", priv->vid, index);
 	return mlx5_vdpa_virtq_query(priv, index);
 }
@@ -411,7 +411,38 @@ mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv)
 }
 
 static int
-mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
+mlx5_vdpa_virtq_doorbell_setup(struct mlx5_vdpa_virtq *virtq,
+		struct rte_vhost_vring *vq, int index)
+{
+	virtq->intr_handle =
+		rte_intr_instance_alloc(RTE_INTR_INSTANCE_F_SHARED);
+	if (virtq->intr_handle == NULL) {
+		DRV_LOG(ERR, "Fail to allocate intr_handle");
+		return -1;
+	}
+	if (rte_intr_fd_set(virtq->intr_handle, vq->kickfd))
+		return -1;
+	if (rte_intr_fd_get(virtq->intr_handle) == -1) {
+		DRV_LOG(WARNING, "Virtq %d kickfd is invalid.", index);
+	} else {
+		if (rte_intr_type_set(virtq->intr_handle,
+			RTE_INTR_HANDLE_EXT))
+			return -1;
+		if (rte_intr_callback_register(virtq->intr_handle,
+			mlx5_vdpa_virtq_kick_handler, virtq)) {
+			(void)rte_intr_fd_set(virtq->intr_handle, -1);
+			DRV_LOG(ERR, "Failed to register virtq %d interrupt.",
+				index);
+			return -1;
+		}
+		DRV_LOG(DEBUG, "Register fd %d interrupt for virtq %d.",
+			rte_intr_fd_get(virtq->intr_handle), index);
+	}
+	return 0;
+}
+
+int
+mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick)
 {
 	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
 	struct rte_vhost_vring vq;
@@ -455,33 +486,11 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 	rte_write32(virtq->index, priv->virtq_db_addr);
 	rte_spinlock_unlock(&priv->db_lock);
 	/* Setup doorbell mapping. */
-	virtq->intr_handle =
-		rte_intr_instance_alloc(RTE_INTR_INSTANCE_F_SHARED);
-	if (virtq->intr_handle == NULL) {
-		DRV_LOG(ERR, "Fail to allocate intr_handle");
-		goto error;
-	}
-
-	if (rte_intr_fd_set(virtq->intr_handle, vq.kickfd))
-		goto error;
-
-	if (rte_intr_fd_get(virtq->intr_handle) == -1) {
-		DRV_LOG(WARNING, "Virtq %d kickfd is invalid.", index);
-	} else {
-		if (rte_intr_type_set(virtq->intr_handle, RTE_INTR_HANDLE_EXT))
-			goto error;
-
-		if (rte_intr_callback_register(virtq->intr_handle,
-					       mlx5_vdpa_virtq_kick_handler,
-					       virtq)) {
-			(void)rte_intr_fd_set(virtq->intr_handle, -1);
+	if (reg_kick) {
+		if (mlx5_vdpa_virtq_doorbell_setup(virtq, &vq, index)) {
 			DRV_LOG(ERR, "Failed to register virtq %d interrupt.",
 				index);
 			goto error;
-		} else {
-			DRV_LOG(DEBUG, "Register fd %d interrupt for virtq %d.",
-				rte_intr_fd_get(virtq->intr_handle),
-				index);
 		}
 	}
 	/* Subscribe virtq error event. */
@@ -497,7 +506,6 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 		rte_errno = errno;
 		goto error;
 	}
-	virtq->stopped = false;
 	/* Initial notification to ask Qemu handling completed buffers. */
 	if (virtq->eqp.cq.callfd != -1)
 		eventfd_write(virtq->eqp.cq.callfd, (eventfd_t)1);
@@ -567,10 +575,12 @@ mlx5_vdpa_features_validate(struct mlx5_vdpa_priv *priv)
 int
 mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 {
-	uint32_t i;
-	uint16_t nr_vring = rte_vhost_get_vring_num(priv->vid);
 	int ret = rte_vhost_get_negotiated_features(priv->vid, &priv->features);
+	uint16_t nr_vring = rte_vhost_get_vring_num(priv->vid);
+	uint32_t remaining_cnt = 0, err_cnt = 0, task_num = 0;
+	uint32_t i, thrd_idx, data[1];
 	struct mlx5_vdpa_virtq *virtq;
+	struct rte_vhost_vring vq;
 
 	if (ret || mlx5_vdpa_features_validate(priv)) {
 		DRV_LOG(ERR, "Failed to configure negotiated features.");
@@ -590,16 +600,83 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 		return -1;
 	}
 	priv->nr_virtqs = nr_vring;
-	for (i = 0; i < nr_vring; i++) {
-		virtq = &priv->virtqs[i];
-		if (virtq->enable) {
+	if (priv->use_c_thread) {
+		uint32_t main_task_idx[nr_vring];
+
+		for (i = 0; i < nr_vring; i++) {
+			virtq = &priv->virtqs[i];
+			if (!virtq->enable)
+				continue;
+			thrd_idx = i % (conf_thread_mng.max_thrds + 1);
+			if (!thrd_idx) {
+				main_task_idx[task_num] = i;
+				task_num++;
+				continue;
+			}
+			thrd_idx = priv->last_c_thrd_idx + 1;
+			if (thrd_idx >= conf_thread_mng.max_thrds)
+				thrd_idx = 0;
+			priv->last_c_thrd_idx = thrd_idx;
+			data[0] = i;
+			if (mlx5_vdpa_task_add(priv, thrd_idx,
+				MLX5_VDPA_TASK_SETUP_VIRTQ,
+				&remaining_cnt, &err_cnt,
+				(void **)&data, 1)) {
+				DRV_LOG(ERR, "Fail to add "
+						"task setup virtq (%d).", i);
+				main_task_idx[task_num] = i;
+				task_num++;
+			}
+		}
+		for (i = 0; i < task_num; i++) {
+			virtq = &priv->virtqs[main_task_idx[i]];
 			pthread_mutex_lock(&virtq->virtq_lock);
-			if (mlx5_vdpa_virtq_setup(priv, i)) {
+			if (mlx5_vdpa_virtq_setup(priv,
+				main_task_idx[i], false)) {
 				pthread_mutex_unlock(&virtq->virtq_lock);
 				goto error;
 			}
 			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
+		if (mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining_cnt,
+			&err_cnt, 2000)) {
+			DRV_LOG(ERR,
+			"Failed to wait virt-queue setup tasks ready.");
+			goto error;
+		}
+		for (i = 0; i < nr_vring; i++) {
+			/* Setup doorbell mapping in order for Qume. */
+			virtq = &priv->virtqs[i];
+			pthread_mutex_lock(&virtq->virtq_lock);
+			if (!virtq->enable || !virtq->configured) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				continue;
+			}
+			if (rte_vhost_get_vhost_vring(priv->vid, i, &vq)) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				goto error;
+			}
+			if (mlx5_vdpa_virtq_doorbell_setup(virtq, &vq, i)) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				DRV_LOG(ERR,
+				"Failed to register virtq %d interrupt.", i);
+				goto error;
+			}
+			pthread_mutex_unlock(&virtq->virtq_lock);
+		}
+	} else {
+		for (i = 0; i < nr_vring; i++) {
+			virtq = &priv->virtqs[i];
+			pthread_mutex_lock(&virtq->virtq_lock);
+			if (virtq->enable) {
+				if (mlx5_vdpa_virtq_setup(priv, i, true)) {
+					pthread_mutex_unlock(
+						&virtq->virtq_lock);
+					goto error;
+				}
+			}
+			pthread_mutex_unlock(&virtq->virtq_lock);
+		}
 	}
 	return 0;
 error:
@@ -663,7 +740,7 @@ mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable)
 		mlx5_vdpa_virtq_unset(virtq);
 	}
 	if (enable) {
-		ret = mlx5_vdpa_virtq_setup(priv, index);
+		ret = mlx5_vdpa_virtq_setup(priv, index, true);
 		if (ret) {
 			DRV_LOG(ERR, "Failed to setup virtq %d.", index);
 			return ret;
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v1 14/17] vdpa/mlx5: add virtq LM log task
  2022-06-06 11:46 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                     ` (12 preceding siblings ...)
  2022-06-06 11:46   ` [PATCH v1 13/17] vdpa/mlx5: add virtq creation task for MT management Li Zhang
@ 2022-06-06 11:46   ` Li Zhang
  2022-06-06 11:46   ` [PATCH v1 15/17] vdpa/mlx5: add device close task Li Zhang
                     ` (2 subsequent siblings)
  16 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:46 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

Split the virtq LM logging between the configuration threads.
This accelerates the LM process and reduces its time by 20%.

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  3 +
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 34 +++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c      | 85 +++++++++++++++++++++------
 3 files changed, 105 insertions(+), 17 deletions(-)
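
For reference, a stand-alone sketch (not part of the patch) of the
length computed by the MLX5_VDPA_USED_RING_LEN() macro moved into the
header below; it covers the split used ring: flags, idx, the used
elements and the trailing avail event.

/* Bytes of the used ring that get logged per virtq. */
#include <stdint.h>
#include <stdio.h>

struct vring_used_elem { uint32_t id; uint32_t len; };

#define USED_RING_LEN(size) \
	((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3)

int
main(void)
{
	/* A 256-entry queue: 256 * 8 + 3 * 2 = 2054 bytes. */
	printf("%zu\n", USED_RING_LEN(256));
	return 0;
}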

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 35221f5ddc..e08931719f 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -72,6 +72,8 @@ enum {
 	MLX5_VDPA_NOTIFIER_STATE_ERR
 };
 
+#define MLX5_VDPA_USED_RING_LEN(size) \
+	((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3)
 #define MLX5_VDPA_MAX_C_THRD 256
 #define MLX5_VDPA_MAX_TASKS_PER_THRD 4096
 #define MLX5_VDPA_TASKS_PER_DEV 64
@@ -81,6 +83,7 @@ enum {
 enum mlx5_vdpa_task_type {
 	MLX5_VDPA_TASK_REG_MR = 1,
 	MLX5_VDPA_TASK_SETUP_VIRTQ,
+	MLX5_VDPA_TASK_STOP_VIRTQ,
 };
 
 /* Generic task information and size must be multiple of 4B. */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index 1389d369ae..98369f0887 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -104,6 +104,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
 	struct mlx5_vdpa_priv *priv;
 	struct mlx5_vdpa_task task;
 	struct rte_ring *rng;
+	uint64_t features;
 	uint32_t thrd_idx;
 	uint32_t task_num;
 	int ret;
@@ -153,6 +154,39 @@ mlx5_vdpa_c_thread_handle(void *arg)
 			}
 			pthread_mutex_unlock(&virtq->virtq_lock);
 			break;
+		case MLX5_VDPA_TASK_STOP_VIRTQ:
+			virtq = &priv->virtqs[task.idx];
+			pthread_mutex_lock(&virtq->virtq_lock);
+			ret = mlx5_vdpa_virtq_stop(priv,
+					task.idx);
+			if (ret) {
+				DRV_LOG(ERR,
+				"Failed to stop virtq %d.",
+				task.idx);
+				__atomic_fetch_add(
+					task.err_cnt, 1,
+					__ATOMIC_RELAXED);
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				break;
+			}
+			ret = rte_vhost_get_negotiated_features(
+				priv->vid, &features);
+			if (ret) {
+				DRV_LOG(ERR,
+		"Failed to get negotiated features virtq %d.",
+				task.idx);
+				__atomic_fetch_add(
+					task.err_cnt, 1,
+					__ATOMIC_RELAXED);
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				break;
+			}
+			if (RTE_VHOST_NEED_LOG(features))
+				rte_vhost_log_used_vring(
+				priv->vid, task.idx, 0,
+			    MLX5_VDPA_USED_RING_LEN(virtq->vq_size));
+			pthread_mutex_unlock(&virtq->virtq_lock);
+			break;
 		default:
 			DRV_LOG(ERR, "Invalid vdpa task type %d.",
 			task.type);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
index efebf364d0..c2e78218ca 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
@@ -89,39 +89,90 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
 	return -1;
 }
 
-#define MLX5_VDPA_USED_RING_LEN(size) \
-	((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3)
-
 int
 mlx5_vdpa_lm_log(struct mlx5_vdpa_priv *priv)
 {
+	uint32_t remaining_cnt = 0, err_cnt = 0, task_num = 0;
+	uint32_t i, thrd_idx, data[1];
 	struct mlx5_vdpa_virtq *virtq;
 	uint64_t features;
-	int ret = rte_vhost_get_negotiated_features(priv->vid, &features);
-	int i;
+	int ret;
 
+	ret = rte_vhost_get_negotiated_features(priv->vid, &features);
 	if (ret) {
 		DRV_LOG(ERR, "Failed to get negotiated features.");
 		return -1;
 	}
-	if (!RTE_VHOST_NEED_LOG(features))
-		return 0;
-	for (i = 0; i < priv->nr_virtqs; ++i) {
-		virtq = &priv->virtqs[i];
-		if (!priv->virtqs[i].virtq) {
-			DRV_LOG(DEBUG, "virtq %d is invalid for LM log.", i);
-		} else {
+	if (priv->use_c_thread && priv->nr_virtqs) {
+		uint32_t main_task_idx[priv->nr_virtqs];
+
+		for (i = 0; i < priv->nr_virtqs; i++) {
+			virtq = &priv->virtqs[i];
+			if (!virtq->configured)
+				continue;
+			thrd_idx = i % (conf_thread_mng.max_thrds + 1);
+			if (!thrd_idx) {
+				main_task_idx[task_num] = i;
+				task_num++;
+				continue;
+			}
+			thrd_idx = priv->last_c_thrd_idx + 1;
+			if (thrd_idx >= conf_thread_mng.max_thrds)
+				thrd_idx = 0;
+			priv->last_c_thrd_idx = thrd_idx;
+			data[0] = i;
+			if (mlx5_vdpa_task_add(priv, thrd_idx,
+				MLX5_VDPA_TASK_STOP_VIRTQ,
+				&remaining_cnt, &err_cnt,
+				(void **)&data, 1)) {
+				DRV_LOG(ERR, "Fail to add "
+					"task stop virtq (%d).", i);
+				main_task_idx[task_num] = i;
+				task_num++;
+			}
+		}
+		for (i = 0; i < task_num; i++) {
+			virtq = &priv->virtqs[main_task_idx[i]];
 			pthread_mutex_lock(&virtq->virtq_lock);
-			ret = mlx5_vdpa_virtq_stop(priv, i);
+			ret = mlx5_vdpa_virtq_stop(priv,
+					main_task_idx[i]);
+			if (ret) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				DRV_LOG(ERR,
+				"Failed to stop virtq %d.", i);
+				return -1;
+			}
+			if (RTE_VHOST_NEED_LOG(features))
+				rte_vhost_log_used_vring(priv->vid, i, 0,
+				MLX5_VDPA_USED_RING_LEN(virtq->vq_size));
 			pthread_mutex_unlock(&virtq->virtq_lock);
+		}
+		if (mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining_cnt,
+			&err_cnt, 2000)) {
+			DRV_LOG(ERR,
+			"Failed to wait virt-queue setup tasks ready.");
+			return -1;
+		}
+	} else {
+		for (i = 0; i < priv->nr_virtqs; i++) {
+			virtq = &priv->virtqs[i];
+			pthread_mutex_lock(&virtq->virtq_lock);
+			if (!virtq->configured) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				continue;
+			}
+			ret = mlx5_vdpa_virtq_stop(priv, i);
 			if (ret) {
-				DRV_LOG(ERR, "Failed to stop virtq %d for LM "
-					"log.", i);
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				DRV_LOG(ERR,
+				"Failed to stop virtq %d for LM log.", i);
 				return -1;
 			}
+			if (RTE_VHOST_NEED_LOG(features))
+				rte_vhost_log_used_vring(priv->vid, i, 0,
+				MLX5_VDPA_USED_RING_LEN(virtq->vq_size));
+			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
-		rte_vhost_log_used_vring(priv->vid, i, 0,
-			      MLX5_VDPA_USED_RING_LEN(priv->virtqs[i].vq_size));
 	}
 	return 0;
 }
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v1 15/17] vdpa/mlx5: add device close task
  2022-06-06 11:46 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                     ` (13 preceding siblings ...)
  2022-06-06 11:46   ` [PATCH v1 14/17] vdpa/mlx5: add virtq LM log task Li Zhang
@ 2022-06-06 11:46   ` Li Zhang
  2022-06-06 11:46   ` [PATCH v1 16/17] vdpa/mlx5: add virtq sub-resources creation Li Zhang
  2022-06-06 11:46   ` [PATCH v1 17/17] vdpa/mlx5: prepare virtqueue resource creation Li Zhang
  16 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:46 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

Split the device close tasks, run after the virt-queues are
stopped, between the configuration threads.
This accelerates the LM process and
reduces its time by 50%.

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c         | 56 +++++++++++++++++++++++++--
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  8 ++++
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 20 +++++++++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 14 +++++++
 4 files changed, 94 insertions(+), 4 deletions(-)
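
A reduced sketch (not part of the patch; the function name is
illustrative) of the synchronization this change relies on: the close
work is queued to a configuration thread as
MLX5_VDPA_TASK_DEV_CLOSE_NOWAIT, and later entry points poll the
dev_close_progress flag with a bounded wait, roughly:

/* Bounded poll on an atomic "device close in progress" flag. */
#include <stdatomic.h>
#include <stdbool.h>
#include <unistd.h>

static bool
wait_close_done(atomic_uint *progress)
{
	unsigned int timeout = 0;

	/* Up to 1000 * 10 ms = 10 seconds. */
	while (atomic_load_explicit(progress, memory_order_relaxed) != 0 &&
	       timeout < 1000) {
		usleep(10000);
		timeout++;
	}
	return atomic_load_explicit(progress, memory_order_relaxed) == 0;
}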

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index e3b32fa087..d000854c08 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -245,7 +245,7 @@ mlx5_vdpa_mtu_set(struct mlx5_vdpa_priv *priv)
 	return kern_mtu == vhost_mtu ? 0 : -1;
 }
 
-static void
+void
 mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv)
 {
 	/* Clean pre-created resource in dev removal only. */
@@ -254,6 +254,26 @@ mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv)
 	mlx5_vdpa_mem_dereg(priv);
 }
 
+static bool
+mlx5_vdpa_wait_dev_close_tasks_done(struct mlx5_vdpa_priv *priv)
+{
+	uint32_t timeout = 0;
+
+	/* Check and wait all close tasks done. */
+	while (__atomic_load_n(&priv->dev_close_progress,
+		__ATOMIC_RELAXED) != 0 && timeout < 1000) {
+		rte_delay_us_sleep(10000);
+		timeout++;
+	}
+	if (priv->dev_close_progress) {
+		DRV_LOG(ERR,
+		"Failed to wait close device tasks done vid %d.",
+		priv->vid);
+		return true;
+	}
+	return false;
+}
+
 static int
 mlx5_vdpa_dev_close(int vid)
 {
@@ -271,6 +291,27 @@ mlx5_vdpa_dev_close(int vid)
 		ret |= mlx5_vdpa_lm_log(priv);
 		priv->state = MLX5_VDPA_STATE_IN_PROGRESS;
 	}
+	if (priv->use_c_thread) {
+		if (priv->last_c_thrd_idx >=
+			(conf_thread_mng.max_thrds - 1))
+			priv->last_c_thrd_idx = 0;
+		else
+			priv->last_c_thrd_idx++;
+		__atomic_store_n(&priv->dev_close_progress,
+			1, __ATOMIC_RELAXED);
+		if (mlx5_vdpa_task_add(priv,
+			priv->last_c_thrd_idx,
+			MLX5_VDPA_TASK_DEV_CLOSE_NOWAIT,
+			NULL, NULL, NULL, 1)) {
+			DRV_LOG(ERR,
+			"Fail to add dev close task. ");
+			goto single_thrd;
+		}
+		priv->state = MLX5_VDPA_STATE_PROBED;
+		DRV_LOG(INFO, "vDPA device %d was closed.", vid);
+		return ret;
+	}
+single_thrd:
 	pthread_mutex_lock(&priv->steer_update_lock);
 	mlx5_vdpa_steer_unset(priv);
 	pthread_mutex_unlock(&priv->steer_update_lock);
@@ -278,10 +319,12 @@ mlx5_vdpa_dev_close(int vid)
 	mlx5_vdpa_drain_cq(priv);
 	if (priv->lm_mr.addr)
 		mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
-	priv->state = MLX5_VDPA_STATE_PROBED;
 	if (!priv->connected)
 		mlx5_vdpa_dev_cache_clean(priv);
 	priv->vid = 0;
+	__atomic_store_n(&priv->dev_close_progress, 0,
+		__ATOMIC_RELAXED);
+	priv->state = MLX5_VDPA_STATE_PROBED;
 	DRV_LOG(INFO, "vDPA device %d was closed.", vid);
 	return ret;
 }
@@ -302,6 +345,8 @@ mlx5_vdpa_dev_config(int vid)
 		DRV_LOG(ERR, "Failed to reconfigure vid %d.", vid);
 		return -1;
 	}
+	if (mlx5_vdpa_wait_dev_close_tasks_done(priv))
+		return -1;
 	priv->vid = vid;
 	priv->connected = true;
 	if (mlx5_vdpa_mtu_set(priv))
@@ -444,8 +489,11 @@ mlx5_vdpa_dev_cleanup(int vid)
 		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
 		return -1;
 	}
-	if (priv->state == MLX5_VDPA_STATE_PROBED)
+	if (priv->state == MLX5_VDPA_STATE_PROBED) {
+		if (priv->use_c_thread)
+			mlx5_vdpa_wait_dev_close_tasks_done(priv);
 		mlx5_vdpa_dev_cache_clean(priv);
+	}
 	priv->connected = false;
 	return 0;
 }
@@ -839,6 +887,8 @@ mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv)
 {
 	if (priv->state == MLX5_VDPA_STATE_CONFIGURED)
 		mlx5_vdpa_dev_close(priv->vid);
+	if (priv->use_c_thread)
+		mlx5_vdpa_wait_dev_close_tasks_done(priv);
 	mlx5_vdpa_release_dev_resources(priv);
 	if (priv->vdev)
 		rte_vdpa_unregister_device(priv->vdev);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index e08931719f..b6392b9d66 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -84,6 +84,7 @@ enum mlx5_vdpa_task_type {
 	MLX5_VDPA_TASK_REG_MR = 1,
 	MLX5_VDPA_TASK_SETUP_VIRTQ,
 	MLX5_VDPA_TASK_STOP_VIRTQ,
+	MLX5_VDPA_TASK_DEV_CLOSE_NOWAIT,
 };
 
 /* Generic task information and size must be multiple of 4B. */
@@ -206,6 +207,7 @@ struct mlx5_vdpa_priv {
 	uint64_t features; /* Negotiated features. */
 	uint16_t log_max_rqt_size;
 	uint16_t last_c_thrd_idx;
+	uint16_t dev_close_progress;
 	uint16_t num_mrs; /* Number of memory regions. */
 	struct mlx5_vdpa_steer steer;
 	struct mlx5dv_var *var;
@@ -578,4 +580,10 @@ mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
 		uint32_t *err_cnt, uint32_t sleep_time);
 int
 mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick);
+void
+mlx5_vdpa_vq_destroy(struct mlx5_vdpa_virtq *virtq);
+void
+mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv);
+void
+mlx5_vdpa_virtq_unreg_intr_handle_all(struct mlx5_vdpa_priv *priv);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index 98369f0887..bb2279440b 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -63,7 +63,8 @@ mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
 		task[i].type = task_type;
 		task[i].remaining_cnt = remaining_cnt;
 		task[i].err_cnt = err_cnt;
-		task[i].idx = data[i];
+		if (data)
+			task[i].idx = data[i];
 	}
 	if (!mlx5_vdpa_c_thrd_ring_enqueue_bulk(rng, (void **)&task, num, NULL))
 		return -1;
@@ -187,6 +188,23 @@ mlx5_vdpa_c_thread_handle(void *arg)
 			    MLX5_VDPA_USED_RING_LEN(virtq->vq_size));
 			pthread_mutex_unlock(&virtq->virtq_lock);
 			break;
+		case MLX5_VDPA_TASK_DEV_CLOSE_NOWAIT:
+			mlx5_vdpa_virtq_unreg_intr_handle_all(priv);
+			pthread_mutex_lock(&priv->steer_update_lock);
+			mlx5_vdpa_steer_unset(priv);
+			pthread_mutex_unlock(&priv->steer_update_lock);
+			mlx5_vdpa_virtqs_release(priv);
+			mlx5_vdpa_drain_cq(priv);
+			if (priv->lm_mr.addr)
+				mlx5_os_wrapped_mkey_destroy(
+					&priv->lm_mr);
+			if (!priv->connected)
+				mlx5_vdpa_dev_cache_clean(priv);
+			priv->vid = 0;
+			__atomic_store_n(
+				&priv->dev_close_progress, 0,
+				__ATOMIC_RELAXED);
+			break;
 		default:
 			DRV_LOG(ERR, "Invalid vdpa task type %d.",
 			task.type);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index db05220e76..a08c854b14 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -102,6 +102,20 @@ mlx5_vdpa_virtq_unregister_intr_handle(struct mlx5_vdpa_virtq *virtq)
 	virtq->intr_handle = NULL;
 }
 
+void
+mlx5_vdpa_virtq_unreg_intr_handle_all(struct mlx5_vdpa_priv *priv)
+{
+	uint32_t i;
+	struct mlx5_vdpa_virtq *virtq;
+
+	for (i = 0; i < priv->nr_virtqs; i++) {
+		virtq = &priv->virtqs[i];
+		pthread_mutex_lock(&virtq->virtq_lock);
+		mlx5_vdpa_virtq_unregister_intr_handle(virtq);
+		pthread_mutex_unlock(&virtq->virtq_lock);
+	}
+}
+
 /* Release cached VQ resources. */
 void
 mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v1 16/17] vdpa/mlx5: add virtq sub-resources creation
  2022-06-06 11:46 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                     ` (14 preceding siblings ...)
  2022-06-06 11:46   ` [PATCH v1 15/17] vdpa/mlx5: add device close task Li Zhang
@ 2022-06-06 11:46   ` Li Zhang
  2022-06-06 11:46   ` [PATCH v1 17/17] vdpa/mlx5: prepare virtqueue resource creation Li Zhang
  16 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:46 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba, Yajun Wu

Pre-create the virt-queue sub-resources in the device probe stage
and then modify the virtqueue in the device config stage.
The steering table also needs to support the dummy virt-queues.
This accelerates the LM process and reduces its time by 40%.

Signed-off-by: Li Zhang <lizh@nvidia.com>
Signed-off-by: Yajun Wu <yajunw@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 72 +++++++--------------
 drivers/vdpa/mlx5/mlx5_vdpa.h       | 17 +++--
 drivers/vdpa/mlx5/mlx5_vdpa_event.c | 11 ++--
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c | 17 +++--
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 99 +++++++++++++++++++++--------
 5 files changed, 123 insertions(+), 93 deletions(-)
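
A small stand-alone sketch (not part of the patch; the helper name is
illustrative) of how the RQT queue count is chosen once dummy,
pre-created virt-queues are supported: the preparation path lists every
pre-created queue pair, while the regular configuration path keeps
using the real nr_virtqs.

/* Queue count used to populate the RQT. */
#include <stdbool.h>

static unsigned int
rqt_nr_vring(bool is_dummy, unsigned int queues,
	     unsigned int max_num_virtio_queues, unsigned int nr_virtqs)
{
	if (!is_dummy)
		return nr_virtqs;	/* Real, configured queues. */
	/* Preparation: cover every pre-created (dummy) queue pair. */
	return (queues * 2 < max_num_virtio_queues) ?
	       queues * 2 : max_num_virtio_queues;
}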

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index d000854c08..f006a9cd3f 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -627,65 +627,39 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 static int
 mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 {
-	struct mlx5_vdpa_virtq *virtq;
+	uint32_t max_queues;
 	uint32_t index;
-	uint32_t i;
+	struct mlx5_vdpa_virtq *virtq;
 
-	for (index = 0; index < priv->caps.max_num_virtio_queues * 2;
+	for (index = 0; index < priv->caps.max_num_virtio_queues;
 		index++) {
 		virtq = &priv->virtqs[index];
 		pthread_mutex_init(&virtq->virtq_lock, NULL);
 	}
-	if (!priv->queues)
+	if (!priv->queues || !priv->queue_size)
 		return 0;
-	for (index = 0; index < (priv->queues * 2); ++index) {
+	max_queues = (priv->queues < priv->caps.max_num_virtio_queues) ?
+		(priv->queues * 2) : (priv->caps.max_num_virtio_queues);
+	for (index = 0; index < max_queues; ++index)
+		if (mlx5_vdpa_virtq_single_resource_prepare(priv,
+			index))
+			goto error;
+	if (mlx5_vdpa_is_modify_virtq_supported(priv))
+		if (mlx5_vdpa_steer_update(priv, true))
+			goto error;
+	return 0;
+error:
+	for (index = 0; index < max_queues; ++index) {
 		virtq = &priv->virtqs[index];
-		int ret = mlx5_vdpa_event_qp_prepare(priv, priv->queue_size,
-					-1, virtq);
-
-		if (ret) {
-			DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
-				index);
-			return -1;
-		}
-		if (priv->caps.queue_counters_valid) {
-			if (!virtq->counters)
-				virtq->counters =
-					mlx5_devx_cmd_create_virtio_q_counters
-						(priv->cdev->ctx);
-			if (!virtq->counters) {
-				DRV_LOG(ERR, "Failed to create virtq couners for virtq"
-					" %d.", index);
-				return -1;
-			}
-		}
-		for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
-			uint32_t size;
-			void *buf;
-			struct mlx5dv_devx_umem *obj;
-
-			size = priv->caps.umems[i].a * priv->queue_size +
-					priv->caps.umems[i].b;
-			buf = rte_zmalloc(__func__, size, 4096);
-			if (buf == NULL) {
-				DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
-						" %u.", i, index);
-				return -1;
-			}
-			obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx, buf,
-					size, IBV_ACCESS_LOCAL_WRITE);
-			if (obj == NULL) {
-				rte_free(buf);
-				DRV_LOG(ERR, "Failed to register umem %d for virtq %u.",
-						i, index);
-				return -1;
-			}
-			virtq->umems[i].size = size;
-			virtq->umems[i].buf = buf;
-			virtq->umems[i].obj = obj;
+		if (virtq->virtq) {
+			pthread_mutex_lock(&virtq->virtq_lock);
+			mlx5_vdpa_virtq_unset(virtq);
+			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
 	}
-	return 0;
+	if (mlx5_vdpa_is_modify_virtq_supported(priv))
+		mlx5_vdpa_steer_unset(priv);
+	return -1;
 }
 
 static int
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index b6392b9d66..f353db62ac 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -277,13 +277,15 @@ int mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv);
  *   The guest notification file descriptor.
  * @param[in/out] virtq
  *   Pointer to the virt-queue structure.
+ * @param[in] reset
+ *   If true, it will reset event qp.
  *
  * @return
  *   0 on success, -1 otherwise and rte_errno is set.
  */
 int
 mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
-	int callfd, struct mlx5_vdpa_virtq *virtq);
+	int callfd, struct mlx5_vdpa_virtq *virtq, bool reset);
 
 /**
  * Destroy an event QP and all its related resources.
@@ -403,11 +405,13 @@ void mlx5_vdpa_steer_unset(struct mlx5_vdpa_priv *priv);
  *
  * @param[in] priv
  *   The vdpa driver private structure.
+ * @param[in] is_dummy
+ *   If set, it is updated with dummy queue for prepare resource.
  *
  * @return
  *   0 on success, a negative value otherwise.
  */
-int mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv);
+int mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv, bool is_dummy);
 
 /**
  * Setup steering and all its related resources to enable RSS traffic from the
@@ -581,9 +585,14 @@ mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
 int
 mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick);
 void
-mlx5_vdpa_vq_destroy(struct mlx5_vdpa_virtq *virtq);
-void
 mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv);
 void
 mlx5_vdpa_virtq_unreg_intr_handle_all(struct mlx5_vdpa_priv *priv);
+bool
+mlx5_vdpa_virtq_single_resource_prepare(struct mlx5_vdpa_priv *priv,
+		int index);
+int
+mlx5_vdpa_qps2rst2rts(struct mlx5_vdpa_event_qp *eqp);
+void
+mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index f782b6b832..22f0920c88 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -249,7 +249,7 @@ mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv)
 {
 	unsigned int i;
 
-	for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
+	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		struct mlx5_vdpa_cq *cq = &priv->virtqs[i].eqp.cq;
 
 		mlx5_vdpa_queue_complete(cq);
@@ -618,7 +618,7 @@ mlx5_vdpa_qps2rts(struct mlx5_vdpa_event_qp *eqp)
 	return 0;
 }
 
-static int
+int
 mlx5_vdpa_qps2rst2rts(struct mlx5_vdpa_event_qp *eqp)
 {
 	if (mlx5_devx_cmd_modify_qp_state(eqp->fw_qp, MLX5_CMD_OP_QP_2RST,
@@ -638,7 +638,7 @@ mlx5_vdpa_qps2rst2rts(struct mlx5_vdpa_event_qp *eqp)
 
 int
 mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
-	int callfd, struct mlx5_vdpa_virtq *virtq)
+	int callfd, struct mlx5_vdpa_virtq *virtq, bool reset)
 {
 	struct mlx5_vdpa_event_qp *eqp = &virtq->eqp;
 	struct mlx5_devx_qp_attr attr = {0};
@@ -649,11 +649,10 @@ mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 		/* Reuse existing resources. */
 		eqp->cq.callfd = callfd;
 		/* FW will set event qp to error state in q destroy. */
-		if (!mlx5_vdpa_qps2rst2rts(eqp)) {
+		if (reset && !mlx5_vdpa_qps2rst2rts(eqp))
 			rte_write32(rte_cpu_to_be_32(RTE_BIT32(log_desc_n)),
 					&eqp->sw_qp.db_rec[0]);
-			return 0;
-		}
+		return 0;
 	}
 	if (eqp->fw_qp)
 		mlx5_vdpa_event_qp_destroy(eqp);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
index 4cbf09784e..c2e0a17ace 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
@@ -57,7 +57,7 @@ mlx5_vdpa_steer_unset(struct mlx5_vdpa_priv *priv)
  * -1 on error.
  */
 static int
-mlx5_vdpa_rqt_prepare(struct mlx5_vdpa_priv *priv)
+mlx5_vdpa_rqt_prepare(struct mlx5_vdpa_priv *priv, bool is_dummy)
 {
 	int i;
 	uint32_t rqt_n = RTE_MIN(MLX5_VDPA_DEFAULT_RQT_SIZE,
@@ -67,15 +67,20 @@ mlx5_vdpa_rqt_prepare(struct mlx5_vdpa_priv *priv)
 						      sizeof(uint32_t), 0);
 	uint32_t k = 0, j;
 	int ret = 0, num;
+	uint16_t nr_vring = is_dummy ?
+	(((priv->queues * 2) < priv->caps.max_num_virtio_queues) ?
+	(priv->queues * 2) : priv->caps.max_num_virtio_queues) : priv->nr_virtqs;
 
 	if (!attr) {
 		DRV_LOG(ERR, "Failed to allocate RQT attributes memory.");
 		rte_errno = ENOMEM;
 		return -ENOMEM;
 	}
-	for (i = 0; i < priv->nr_virtqs; i++) {
+	for (i = 0; i < nr_vring; i++) {
 		if (is_virtq_recvq(i, priv->nr_virtqs) &&
-		    priv->virtqs[i].enable && priv->virtqs[i].virtq) {
+			(is_dummy || (priv->virtqs[i].enable &&
+			priv->virtqs[i].configured)) &&
+			priv->virtqs[i].virtq) {
 			attr->rq_list[k] = priv->virtqs[i].virtq->id;
 			k++;
 		}
@@ -235,12 +240,12 @@ mlx5_vdpa_rss_flows_create(struct mlx5_vdpa_priv *priv)
 }
 
 int
-mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv)
+mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv, bool is_dummy)
 {
 	int ret;
 
 	pthread_mutex_lock(&priv->steer_update_lock);
-	ret = mlx5_vdpa_rqt_prepare(priv);
+	ret = mlx5_vdpa_rqt_prepare(priv, is_dummy);
 	if (ret == 0) {
 		mlx5_vdpa_steer_unset(priv);
 	} else if (ret < 0) {
@@ -261,7 +266,7 @@ mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv)
 int
 mlx5_vdpa_steer_setup(struct mlx5_vdpa_priv *priv)
 {
-	if (mlx5_vdpa_steer_update(priv))
+	if (mlx5_vdpa_steer_update(priv, false))
 		goto error;
 	return 0;
 error:
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index a08c854b14..20ce382487 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -146,10 +146,10 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 	}
 }
 
-static int
+void
 mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 {
-	int ret = -EAGAIN;
+	int ret;
 
 	mlx5_vdpa_virtq_unregister_intr_handle(virtq);
 	if (virtq->configured) {
@@ -157,12 +157,12 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 		if (ret)
 			DRV_LOG(WARNING, "Failed to stop virtq %d.",
 				virtq->index);
-		virtq->configured = 0;
 		claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
+		virtq->index = 0;
+		virtq->virtq = NULL;
+		virtq->configured = 0;
 	}
-	virtq->virtq = NULL;
 	virtq->notifier_state = MLX5_VDPA_NOTIFIER_STATE_DISABLED;
-	return 0;
 }
 
 void
@@ -175,6 +175,9 @@ mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv)
 		virtq = &priv->virtqs[i];
 		pthread_mutex_lock(&virtq->virtq_lock);
 		mlx5_vdpa_virtq_unset(virtq);
+		if (i < (priv->queues * 2))
+			mlx5_vdpa_virtq_single_resource_prepare(
+					priv, i);
 		pthread_mutex_unlock(&virtq->virtq_lock);
 	}
 	priv->features = 0;
@@ -258,7 +261,8 @@ mlx5_vdpa_hva_to_gpa(struct rte_vhost_memory *mem, uint64_t hva)
 static int
 mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 		struct mlx5_devx_virtq_attr *attr,
-		struct rte_vhost_vring *vq, int index)
+		struct rte_vhost_vring *vq,
+		int index, bool is_prepare)
 {
 	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
 	uint64_t gpa;
@@ -277,11 +281,15 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 			MLX5_VIRTQ_MODIFY_TYPE_Q_MKEY |
 			MLX5_VIRTQ_MODIFY_TYPE_QUEUE_FEATURE_BIT_MASK |
 			MLX5_VIRTQ_MODIFY_TYPE_EVENT_MODE;
-	attr->tso_ipv4 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO4));
-	attr->tso_ipv6 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO6));
-	attr->tx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_CSUM));
-	attr->rx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_GUEST_CSUM));
-	attr->virtio_version_1_0 =
+	attr->tso_ipv4 = is_prepare ? 1 :
+		!!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO4));
+	attr->tso_ipv6 = is_prepare ? 1 :
+		!!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO6));
+	attr->tx_csum = is_prepare ? 1 :
+		!!(priv->features & (1ULL << VIRTIO_NET_F_CSUM));
+	attr->rx_csum = is_prepare ? 1 :
+		!!(priv->features & (1ULL << VIRTIO_NET_F_GUEST_CSUM));
+	attr->virtio_version_1_0 = is_prepare ? 1 :
 		!!(priv->features & (1ULL << VIRTIO_F_VERSION_1));
 	attr->q_type =
 		(priv->features & (1ULL << VIRTIO_F_RING_PACKED)) ?
@@ -290,12 +298,12 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 	 * No need event QPs creation when the guest in poll mode or when the
 	 * capability allows it.
 	 */
-	attr->event_mode = vq->callfd != -1 ||
+	attr->event_mode = is_prepare || vq->callfd != -1 ||
 	!(priv->caps.event_mode & (1 << MLX5_VIRTQ_EVENT_MODE_NO_MSIX)) ?
 	MLX5_VIRTQ_EVENT_MODE_QP : MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
 	if (attr->event_mode == MLX5_VIRTQ_EVENT_MODE_QP) {
-		ret = mlx5_vdpa_event_qp_prepare(priv,
-				vq->size, vq->callfd, virtq);
+		ret = mlx5_vdpa_event_qp_prepare(priv, vq->size,
+				vq->callfd, virtq, !virtq->virtq);
 		if (ret) {
 			DRV_LOG(ERR,
 				"Failed to create event QPs for virtq %d.",
@@ -320,7 +328,7 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 		attr->counters_obj_id = virtq->counters->id;
 	}
 	/* Setup 3 UMEMs for each virtq. */
-	if (virtq->virtq) {
+	if (!virtq->virtq) {
 		for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
 			uint32_t size;
 			void *buf;
@@ -345,7 +353,7 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 			buf = rte_zmalloc(__func__,
 				size, 4096);
 			if (buf == NULL) {
-				DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
+				DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq."
 				" %u.", i, index);
 				return -1;
 			}
@@ -366,7 +374,7 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 			attr->umems[i].size = virtq->umems[i].size;
 		}
 	}
-	if (attr->q_type == MLX5_VIRTQ_TYPE_SPLIT) {
+	if (!is_prepare && attr->q_type == MLX5_VIRTQ_TYPE_SPLIT) {
 		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem_info.vmem,
 					   (uint64_t)(uintptr_t)vq->desc);
 		if (!gpa) {
@@ -389,21 +397,23 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 		}
 		attr->available_addr = gpa;
 	}
-	ret = rte_vhost_get_vring_base(priv->vid,
+	if (!is_prepare) {
+		ret = rte_vhost_get_vring_base(priv->vid,
 			index, &last_avail_idx, &last_used_idx);
-	if (ret) {
-		last_avail_idx = 0;
-		last_used_idx = 0;
-		DRV_LOG(WARNING, "Couldn't get vring base, idx are set to 0.");
-	} else {
-		DRV_LOG(INFO, "vid %d: Init last_avail_idx=%d, last_used_idx=%d for "
+		if (ret) {
+			last_avail_idx = 0;
+			last_used_idx = 0;
+			DRV_LOG(WARNING, "Couldn't get vring base, idx are set to 0.");
+		} else {
+			DRV_LOG(INFO, "vid %d: Init last_avail_idx=%d, last_used_idx=%d for "
 				"virtq %d.", priv->vid, last_avail_idx,
 				last_used_idx, index);
+		}
 	}
 	attr->hw_available_index = last_avail_idx;
 	attr->hw_used_index = last_used_idx;
 	attr->q_size = vq->size;
-	attr->mkey = priv->gpa_mkey_index;
+	attr->mkey = is_prepare ? 0 : priv->gpa_mkey_index;
 	attr->tis_id = priv->tiss[(index / 2) % priv->num_lag_ports]->id;
 	attr->queue_index = index;
 	attr->pd = priv->cdev->pdn;
@@ -416,6 +426,39 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 	return 0;
 }
 
+bool
+mlx5_vdpa_virtq_single_resource_prepare(struct mlx5_vdpa_priv *priv,
+		int index)
+{
+	struct mlx5_devx_virtq_attr attr = {0};
+	struct mlx5_vdpa_virtq *virtq;
+	struct rte_vhost_vring vq = {
+		.size = priv->queue_size,
+		.callfd = -1,
+	};
+	int ret;
+
+	virtq = &priv->virtqs[index];
+	virtq->index = index;
+	virtq->vq_size = vq.size;
+	virtq->configured = 0;
+	virtq->virtq = NULL;
+	ret = mlx5_vdpa_virtq_sub_objs_prepare(priv, &attr, &vq, index, true);
+	if (ret) {
+		DRV_LOG(ERR,
+		"Cannot prepare setup resource for virtq %d.", index);
+		return true;
+	}
+	if (mlx5_vdpa_is_modify_virtq_supported(priv)) {
+		virtq->virtq =
+		mlx5_devx_cmd_create_virtq(priv->cdev->ctx, &attr);
+		virtq->priv = priv;
+		if (!virtq->virtq)
+			return true;
+	}
+	return false;
+}
+
 bool
 mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv)
 {
@@ -473,7 +516,7 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick)
 	virtq->priv = priv;
 	virtq->stopped = 0;
 	ret = mlx5_vdpa_virtq_sub_objs_prepare(priv, &attr,
-				&vq, index);
+				&vq, index, false);
 	if (ret) {
 		DRV_LOG(ERR, "Failed to setup update virtq attr"
 			" %d.", index);
@@ -746,7 +789,7 @@ mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable)
 	if (virtq->configured) {
 		virtq->enable = 0;
 		if (is_virtq_recvq(virtq->index, priv->nr_virtqs)) {
-			ret = mlx5_vdpa_steer_update(priv);
+			ret = mlx5_vdpa_steer_update(priv, false);
 			if (ret)
 				DRV_LOG(WARNING, "Failed to disable steering "
 					"for virtq %d.", index);
@@ -761,7 +804,7 @@ mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable)
 		}
 		virtq->enable = 1;
 		if (is_virtq_recvq(virtq->index, priv->nr_virtqs)) {
-			ret = mlx5_vdpa_steer_update(priv);
+			ret = mlx5_vdpa_steer_update(priv, false);
 			if (ret)
 				DRV_LOG(WARNING, "Failed to enable steering "
 					"for virtq %d.", index);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v1 17/17] vdpa/mlx5: prepare virtqueue resource creation
  2022-06-06 11:46 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
                     ` (15 preceding siblings ...)
  2022-06-06 11:46   ` [PATCH v1 16/17] vdpa/mlx5: add virtq sub-resources creation Li Zhang
@ 2022-06-06 11:46   ` Li Zhang
  16 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-06 11:46 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

Split the virt-queue resource creation between
the configuration threads.
The virt-queue resources also need to be pre-created again
after virtq destruction.
This accelerates the LM process and reduces its time by 30%.

Signed-off-by: Li Zhang <lizh@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c         | 115 ++++++++++++++++++++------
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  12 ++-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c |  15 +++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 111 +++++++++++++++++++++----
 4 files changed, 208 insertions(+), 45 deletions(-)
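
A reduced sketch (not part of the patch; the type and parameter names
are illustrative) of the compatibility check introduced here: a
pre-created virtq is reused only while the negotiated features, ring
size and event mode still match the values it was created with,
otherwise the pre-created resources are destroyed and prepared again.

/* Does the pre-created (dummy) virtq still fit the negotiation? */
#include <stdbool.h>
#include <stdint.h>

struct precreated_vq {
	bool rx_csum;
	bool virtio_version_1_0;
	uint16_t vq_size;
	uint32_t event_mode;
};

static bool
precreated_vq_mismatch(const struct precreated_vq *vq, bool want_rx_csum,
		       bool want_v1_0, uint16_t want_size,
		       uint32_t want_event_mode)
{
	return vq->rx_csum != want_rx_csum ||
	       vq->virtio_version_1_0 != want_v1_0 ||
	       vq->vq_size != want_size ||
	       vq->event_mode != want_event_mode;
}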

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index f006a9cd3f..c5d82872c7 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -275,23 +275,18 @@ mlx5_vdpa_wait_dev_close_tasks_done(struct mlx5_vdpa_priv *priv)
 }
 
 static int
-mlx5_vdpa_dev_close(int vid)
+_internal_mlx5_vdpa_dev_close(struct mlx5_vdpa_priv *priv,
+		bool release_resource)
 {
-	struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid);
-	struct mlx5_vdpa_priv *priv =
-		mlx5_vdpa_find_priv_resource_by_vdev(vdev);
 	int ret = 0;
+	int vid = priv->vid;
 
-	if (priv == NULL) {
-		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
-		return -1;
-	}
 	mlx5_vdpa_cqe_event_unset(priv);
 	if (priv->state == MLX5_VDPA_STATE_CONFIGURED) {
 		ret |= mlx5_vdpa_lm_log(priv);
 		priv->state = MLX5_VDPA_STATE_IN_PROGRESS;
 	}
-	if (priv->use_c_thread) {
+	if (priv->use_c_thread && !release_resource) {
 		if (priv->last_c_thrd_idx >=
 			(conf_thread_mng.max_thrds - 1))
 			priv->last_c_thrd_idx = 0;
@@ -315,7 +310,7 @@ mlx5_vdpa_dev_close(int vid)
 	pthread_mutex_lock(&priv->steer_update_lock);
 	mlx5_vdpa_steer_unset(priv);
 	pthread_mutex_unlock(&priv->steer_update_lock);
-	mlx5_vdpa_virtqs_release(priv);
+	mlx5_vdpa_virtqs_release(priv, release_resource);
 	mlx5_vdpa_drain_cq(priv);
 	if (priv->lm_mr.addr)
 		mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
@@ -329,6 +324,24 @@ mlx5_vdpa_dev_close(int vid)
 	return ret;
 }
 
+static int
+mlx5_vdpa_dev_close(int vid)
+{
+	struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid);
+	struct mlx5_vdpa_priv *priv;
+
+	if (!vdev) {
+		DRV_LOG(ERR, "Invalid vDPA device.");
+		return -1;
+	}
+	priv = mlx5_vdpa_find_priv_resource_by_vdev(vdev);
+	if (priv == NULL) {
+		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
+		return -1;
+	}
+	return _internal_mlx5_vdpa_dev_close(priv, false);
+}
+
 static int
 mlx5_vdpa_dev_config(int vid)
 {
@@ -624,11 +637,33 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 		priv->queue_size);
 }
 
+void
+mlx5_vdpa_prepare_virtq_destroy(struct mlx5_vdpa_priv *priv)
+{
+	uint32_t max_queues, index;
+	struct mlx5_vdpa_virtq *virtq;
+
+	if (!priv->queues || !priv->queue_size)
+		return;
+	max_queues = ((priv->queues * 2) < priv->caps.max_num_virtio_queues) ?
+		(priv->queues * 2) : (priv->caps.max_num_virtio_queues);
+	if (mlx5_vdpa_is_modify_virtq_supported(priv))
+		mlx5_vdpa_steer_unset(priv);
+	for (index = 0; index < max_queues; ++index) {
+		virtq = &priv->virtqs[index];
+		if (virtq->virtq) {
+			pthread_mutex_lock(&virtq->virtq_lock);
+			mlx5_vdpa_virtq_unset(virtq);
+			pthread_mutex_unlock(&virtq->virtq_lock);
+		}
+	}
+}
+
 static int
 mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 {
-	uint32_t max_queues;
-	uint32_t index;
+	uint32_t remaining_cnt = 0, err_cnt = 0, task_num = 0;
+	uint32_t max_queues, index, thrd_idx, data[1];
 	struct mlx5_vdpa_virtq *virtq;
 
 	for (index = 0; index < priv->caps.max_num_virtio_queues;
@@ -640,25 +675,53 @@ mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 		return 0;
 	max_queues = (priv->queues < priv->caps.max_num_virtio_queues) ?
 		(priv->queues * 2) : (priv->caps.max_num_virtio_queues);
-	for (index = 0; index < max_queues; ++index)
-		if (mlx5_vdpa_virtq_single_resource_prepare(priv,
-			index))
+	if (priv->use_c_thread) {
+		uint32_t main_task_idx[max_queues];
+
+		for (index = 0; index < max_queues; ++index) {
+			thrd_idx = index % (conf_thread_mng.max_thrds + 1);
+			if (!thrd_idx) {
+				main_task_idx[task_num] = index;
+				task_num++;
+				continue;
+			}
+			thrd_idx = priv->last_c_thrd_idx + 1;
+			if (thrd_idx >= conf_thread_mng.max_thrds)
+				thrd_idx = 0;
+			priv->last_c_thrd_idx = thrd_idx;
+			data[0] = index;
+			if (mlx5_vdpa_task_add(priv, thrd_idx,
+				MLX5_VDPA_TASK_PREPARE_VIRTQ,
+				&remaining_cnt, &err_cnt,
+				(void **)&data, 1)) {
+				DRV_LOG(ERR, "Fail to add "
+				"task prepare virtq (%d).", index);
+				main_task_idx[task_num] = index;
+				task_num++;
+			}
+		}
+		for (index = 0; index < task_num; ++index)
+			if (mlx5_vdpa_virtq_single_resource_prepare(priv,
+				main_task_idx[index]))
+				goto error;
+		if (mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining_cnt,
+			&err_cnt, 2000)) {
+			DRV_LOG(ERR,
+			"Failed to wait virt-queue prepare tasks ready.");
 			goto error;
+		}
+	} else {
+		for (index = 0; index < max_queues; ++index)
+			if (mlx5_vdpa_virtq_single_resource_prepare(priv,
+				index))
+				goto error;
+	}
 	if (mlx5_vdpa_is_modify_virtq_supported(priv))
 		if (mlx5_vdpa_steer_update(priv, true))
 			goto error;
 	return 0;
 error:
-	for (index = 0; index < max_queues; ++index) {
-		virtq = &priv->virtqs[index];
-		if (virtq->virtq) {
-			pthread_mutex_lock(&virtq->virtq_lock);
-			mlx5_vdpa_virtq_unset(virtq);
-			pthread_mutex_unlock(&virtq->virtq_lock);
-		}
-	}
-	if (mlx5_vdpa_is_modify_virtq_supported(priv))
-		mlx5_vdpa_steer_unset(priv);
+	mlx5_vdpa_prepare_virtq_destroy(priv);
 	return -1;
 }
 
@@ -860,7 +923,7 @@ static void
 mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv)
 {
 	if (priv->state == MLX5_VDPA_STATE_CONFIGURED)
-		mlx5_vdpa_dev_close(priv->vid);
+		_internal_mlx5_vdpa_dev_close(priv, true);
 	if (priv->use_c_thread)
 		mlx5_vdpa_wait_dev_close_tasks_done(priv);
 	mlx5_vdpa_release_dev_resources(priv);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index f353db62ac..dc4dfba5ed 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -85,6 +85,7 @@ enum mlx5_vdpa_task_type {
 	MLX5_VDPA_TASK_SETUP_VIRTQ,
 	MLX5_VDPA_TASK_STOP_VIRTQ,
 	MLX5_VDPA_TASK_DEV_CLOSE_NOWAIT,
+	MLX5_VDPA_TASK_PREPARE_VIRTQ,
 };
 
 /* Generic task information and size must be multiple of 4B. */
@@ -128,6 +129,9 @@ struct mlx5_vdpa_virtq {
 	uint32_t configured:1;
 	uint32_t enable:1;
 	uint32_t stopped:1;
+	uint32_t rx_csum:1;
+	uint32_t virtio_version_1_0:1;
+	uint32_t event_mode:3;
 	uint32_t version;
 	pthread_mutex_t virtq_lock;
 	struct mlx5_vdpa_priv *priv;
@@ -355,8 +359,12 @@ void mlx5_vdpa_err_event_unset(struct mlx5_vdpa_priv *priv);
  *
  * @param[in] priv
  *   The vdpa driver private structure.
+ * @param[in] release_resource
+ *   The vdpa driver release resource without prepare resource.
  */
-void mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv);
+void
+mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv,
+		bool release_resource);
 
 /**
  * Cleanup cached resources of all virtqs.
@@ -595,4 +603,6 @@ int
 mlx5_vdpa_qps2rst2rts(struct mlx5_vdpa_event_qp *eqp);
 void
 mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq);
+void
+mlx5_vdpa_prepare_virtq_destroy(struct mlx5_vdpa_priv *priv);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index bb2279440b..6e6624e5a3 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -153,6 +153,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
 				__atomic_fetch_add(
 					task.err_cnt, 1, __ATOMIC_RELAXED);
 			}
+			virtq->enable = 1;
 			pthread_mutex_unlock(&virtq->virtq_lock);
 			break;
 		case MLX5_VDPA_TASK_STOP_VIRTQ:
@@ -193,7 +194,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
 			pthread_mutex_lock(&priv->steer_update_lock);
 			mlx5_vdpa_steer_unset(priv);
 			pthread_mutex_unlock(&priv->steer_update_lock);
-			mlx5_vdpa_virtqs_release(priv);
+			mlx5_vdpa_virtqs_release(priv, false);
 			mlx5_vdpa_drain_cq(priv);
 			if (priv->lm_mr.addr)
 				mlx5_os_wrapped_mkey_destroy(
@@ -205,6 +206,18 @@ mlx5_vdpa_c_thread_handle(void *arg)
 				&priv->dev_close_progress, 0,
 				__ATOMIC_RELAXED);
 			break;
+		case MLX5_VDPA_TASK_PREPARE_VIRTQ:
+			ret = mlx5_vdpa_virtq_single_resource_prepare(
+					priv, task.idx);
+			if (ret) {
+				DRV_LOG(ERR,
+				"Failed to prepare virtq %d.",
+				task.idx);
+				__atomic_fetch_add(
+				task.err_cnt, 1,
+				__ATOMIC_RELAXED);
+			}
+			break;
 		default:
 			DRV_LOG(ERR, "Invalid vdpa task type %d.",
 			task.type);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 20ce382487..d4dd73f861 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -116,18 +116,29 @@ mlx5_vdpa_virtq_unreg_intr_handle_all(struct mlx5_vdpa_priv *priv)
 	}
 }
 
+static void
+mlx5_vdpa_vq_destroy(struct mlx5_vdpa_virtq *virtq)
+{
+	/* Clean pre-created resource in dev removal only */
+	claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
+	virtq->index = 0;
+	virtq->virtq = NULL;
+	virtq->configured = 0;
+}
+
 /* Release cached VQ resources. */
 void
 mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 {
 	unsigned int i, j;
 
+	mlx5_vdpa_steer_unset(priv);
 	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
-		if (virtq->index != i)
-			continue;
 		pthread_mutex_lock(&virtq->virtq_lock);
+		if (virtq->virtq)
+			mlx5_vdpa_vq_destroy(virtq);
 		for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
 			if (virtq->umems[j].obj) {
 				claim_zero(mlx5_glue->devx_umem_dereg
@@ -157,29 +168,37 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 		if (ret)
 			DRV_LOG(WARNING, "Failed to stop virtq %d.",
 				virtq->index);
-		claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
-		virtq->index = 0;
-		virtq->virtq = NULL;
-		virtq->configured = 0;
 	}
+	mlx5_vdpa_vq_destroy(virtq);
 	virtq->notifier_state = MLX5_VDPA_NOTIFIER_STATE_DISABLED;
 }
 
 void
-mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv)
+mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv,
+	bool release_resource)
 {
 	struct mlx5_vdpa_virtq *virtq;
-	int i;
-
-	for (i = 0; i < priv->nr_virtqs; i++) {
+	uint32_t i, max_virtq, valid_vq_num;
+
+	valid_vq_num = ((priv->queues * 2) < priv->caps.max_num_virtio_queues) ?
+		(priv->queues * 2) : priv->caps.max_num_virtio_queues;
+	max_virtq = (release_resource &&
+		(valid_vq_num) > priv->nr_virtqs) ?
+		(valid_vq_num) : priv->nr_virtqs;
+	for (i = 0; i < max_virtq; i++) {
 		virtq = &priv->virtqs[i];
 		pthread_mutex_lock(&virtq->virtq_lock);
 		mlx5_vdpa_virtq_unset(virtq);
-		if (i < (priv->queues * 2))
+		virtq->enable = 0;
+		if (!release_resource && i < valid_vq_num)
 			mlx5_vdpa_virtq_single_resource_prepare(
 					priv, i);
 		pthread_mutex_unlock(&virtq->virtq_lock);
 	}
+	if (!release_resource && priv->queues &&
+		mlx5_vdpa_is_modify_virtq_supported(priv))
+		if (mlx5_vdpa_steer_update(priv, true))
+			mlx5_vdpa_steer_unset(priv);
 	priv->features = 0;
 	priv->nr_virtqs = 0;
 }
@@ -455,6 +474,9 @@ mlx5_vdpa_virtq_single_resource_prepare(struct mlx5_vdpa_priv *priv,
 		virtq->priv = priv;
 		if (!virtq->virtq)
 			return true;
+		virtq->rx_csum = attr.rx_csum;
+		virtq->virtio_version_1_0 = attr.virtio_version_1_0;
+		virtq->event_mode = attr.event_mode;
 	}
 	return false;
 }
@@ -538,6 +560,9 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick)
 		goto error;
 	}
 	claim_zero(rte_vhost_enable_guest_notification(priv->vid, index, 1));
+	virtq->rx_csum = attr.rx_csum;
+	virtq->virtio_version_1_0 = attr.virtio_version_1_0;
+	virtq->event_mode = attr.event_mode;
 	virtq->configured = 1;
 	rte_spinlock_lock(&priv->db_lock);
 	rte_write32(virtq->index, priv->virtq_db_addr);
@@ -629,6 +654,31 @@ mlx5_vdpa_features_validate(struct mlx5_vdpa_priv *priv)
 	return 0;
 }
 
+static bool
+mlx5_vdpa_is_pre_created_vq_mismatch(struct mlx5_vdpa_priv *priv,
+		struct mlx5_vdpa_virtq *virtq)
+{
+	struct rte_vhost_vring vq;
+	uint32_t event_mode;
+
+	if (virtq->rx_csum !=
+		!!(priv->features & (1ULL << VIRTIO_NET_F_GUEST_CSUM)))
+		return true;
+	if (virtq->virtio_version_1_0 !=
+		!!(priv->features & (1ULL << VIRTIO_F_VERSION_1)))
+		return true;
+	if (rte_vhost_get_vhost_vring(priv->vid, virtq->index, &vq))
+		return true;
+	if (vq.size != virtq->vq_size)
+		return true;
+	event_mode = vq.callfd != -1 || !(priv->caps.event_mode &
+		(1 << MLX5_VIRTQ_EVENT_MODE_NO_MSIX)) ?
+		MLX5_VIRTQ_EVENT_MODE_QP : MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
+	if (virtq->event_mode != event_mode)
+		return true;
+	return false;
+}
+
 int
 mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 {
@@ -664,6 +714,15 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 			virtq = &priv->virtqs[i];
 			if (!virtq->enable)
 				continue;
+			if (priv->queues && virtq->virtq) {
+				if (mlx5_vdpa_is_pre_created_vq_mismatch(priv, virtq)) {
+					mlx5_vdpa_prepare_virtq_destroy(priv);
+					i = 0;
+					virtq = &priv->virtqs[i];
+					if (!virtq->enable)
+						continue;
+				}
+			}
 			thrd_idx = i % (conf_thread_mng.max_thrds + 1);
 			if (!thrd_idx) {
 				main_task_idx[task_num] = i;
@@ -693,6 +752,7 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 				pthread_mutex_unlock(&virtq->virtq_lock);
 				goto error;
 			}
+			virtq->enable = 1;
 			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
 		if (mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining_cnt,
@@ -724,20 +784,32 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 	} else {
 		for (i = 0; i < nr_vring; i++) {
 			virtq = &priv->virtqs[i];
+			if (!virtq->enable)
+				continue;
+			if (priv->queues && virtq->virtq) {
+				if (mlx5_vdpa_is_pre_created_vq_mismatch(priv,
+					virtq)) {
+					mlx5_vdpa_prepare_virtq_destroy(
+					priv);
+					i = 0;
+					virtq = &priv->virtqs[i];
+					if (!virtq->enable)
+						continue;
+				}
+			}
 			pthread_mutex_lock(&virtq->virtq_lock);
-			if (virtq->enable) {
-				if (mlx5_vdpa_virtq_setup(priv, i, true)) {
-					pthread_mutex_unlock(
+			if (mlx5_vdpa_virtq_setup(priv, i, true)) {
+				pthread_mutex_unlock(
 						&virtq->virtq_lock);
-					goto error;
-				}
+				goto error;
 			}
+			virtq->enable = 1;
 			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
 	}
 	return 0;
 error:
-	mlx5_vdpa_virtqs_release(priv);
+	mlx5_vdpa_virtqs_release(priv, true);
 	return -1;
 }
 
@@ -795,6 +867,11 @@ mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable)
 					"for virtq %d.", index);
 		}
 		mlx5_vdpa_virtq_unset(virtq);
+	} else {
+		if (virtq->virtq &&
+			mlx5_vdpa_is_pre_created_vq_mismatch(priv, virtq))
+			DRV_LOG(WARNING,
+			"Configuration mismatch dummy virtq %d.", index);
 	}
 	if (enable) {
 		ret = mlx5_vdpa_virtq_setup(priv, index, true);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v2 00/15] mlx5/vdpa: optimize live migration time
  2022-04-08  7:55 [RFC 00/15] Add vDPA multi-threads optiomization Li Zhang
                   ` (16 preceding siblings ...)
  2022-06-06 11:46 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
@ 2022-06-16  2:29 ` Li Zhang
  2022-06-16  2:29   ` [PATCH v2 01/15] vdpa/mlx5: fix usage of capability for max number of virtqs Li Zhang
                     ` (15 more replies)
  2022-06-18  8:47 ` [PATCH v3 " Li Zhang
  2022-06-18  9:02 ` [PATCH v4 00/15] mlx5/vdpa: optimize live migration time Li Zhang
  19 siblings, 16 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-16  2:29 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

Allow the driver to use internal threads to
obtain fast configuration.
All the threads will be open on the same core of
the event completion queue scheduling thread.

Add max_conf_threads parameter to configure
the maximum number of internal threads in addition to
the caller thread (8 is suggested).
These internal threads to pipeline handle VDPA tasks
in system and shared with all VDPA devices.
Default is 0, don't use internal threads for configuration.

Depends-on: series=21868 ("vdpa/mlx5: improve device shutdown time")
http://patchwork.dpdk.org/project/dpdk/list/?series=21868

RFC ("Add vDPA multi-threads optiomization")
https://patchwork.dpdk.org/project/dpdk/cover/20220408075606.33056-1-lizh@nvidia.com/

V2:
* Drop eal device removal patch in series.
* Add release note in release_22_07.rst.

Li Zhang (12):
  vdpa/mlx5: fix usage of capability for max number of virtqs
  common/mlx5: extend virtq modifiable fields
  vdpa/mlx5: pre-create virtq in the prob
  vdpa/mlx5: optimize datapath-control synchronization
  vdpa/mlx5: add multi-thread management for configuration
  vdpa/mlx5: add task ring for MT management
  vdpa/mlx5: add MT task for VM memory registration
  vdpa/mlx5: add virtq creation task for MT management
  vdpa/mlx5: add virtq LM log task
  vdpa/mlx5: add device close task
  vdpa/mlx5: add virtq sub-resources creation
  vdpa/mlx5: prepare virtqueue resource creation

Yajun Wu (3):
  vdpa/mlx5: support pre create virtq resource
  common/mlx5: add DevX API to move QP to reset state
  vdpa/mlx5: support event qp reuse

 doc/guides/rel_notes/release_22_07.rst |   5 +
 doc/guides/vdpadevs/mlx5.rst           |  25 +
 drivers/common/mlx5/mlx5_devx_cmds.c   |  77 ++-
 drivers/common/mlx5/mlx5_devx_cmds.h   |   6 +-
 drivers/common/mlx5/mlx5_prm.h         |  30 +-
 drivers/vdpa/mlx5/meson.build          |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.c          | 270 ++++++++--
 drivers/vdpa/mlx5/mlx5_vdpa.h          | 152 +++++-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c  | 360 ++++++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_event.c    | 160 ++++--
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c       | 128 ++++-
 drivers/vdpa/mlx5/mlx5_vdpa_mem.c      | 270 ++++++----
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c    |  22 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c    | 654 ++++++++++++++++++-------
 14 files changed, 1776 insertions(+), 384 deletions(-)
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c

-- 
2.30.2


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v2 01/15] vdpa/mlx5: fix usage of capability for max number of virtqs
  2022-06-16  2:29 ` [PATCH v2 00/15] mlx5/vdpa: optimize live migration time Li Zhang
@ 2022-06-16  2:29   ` Li Zhang
  2022-06-17 14:27     ` Maxime Coquelin
  2022-06-16  2:29   ` [PATCH v2 02/15] vdpa/mlx5: support pre create virtq resource Li Zhang
                     ` (14 subsequent siblings)
  15 siblings, 1 reply; 137+ messages in thread
From: Li Zhang @ 2022-06-16  2:29 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs, Maxime Coquelin
  Cc: dev, thomas, rasland, roniba, stable

The driver wrongly treated the capability value as the number of
virtq pairs instead of the number of virtqs; for example, a capability
of 256 virtqs was reported to vhost as 256 queue pairs instead of 128.

Adjust all of its usages to treat it as the number of virtqs.

Fixes: c2eb33a ("vdpa/mlx5: manage virtqs by array")
Cc: stable@dpdk.org

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 12 ++++++------
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c |  6 +++---
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 76fa5d4299..ee71339b78 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -84,7 +84,7 @@ mlx5_vdpa_get_queue_num(struct rte_vdpa_device *vdev, uint32_t *queue_num)
 		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
 		return -1;
 	}
-	*queue_num = priv->caps.max_num_virtio_queues;
+	*queue_num = priv->caps.max_num_virtio_queues / 2;
 	return 0;
 }
 
@@ -141,7 +141,7 @@ mlx5_vdpa_set_vring_state(int vid, int vring, int state)
 		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
 		return -EINVAL;
 	}
-	if (vring >= (int)priv->caps.max_num_virtio_queues * 2) {
+	if (vring >= (int)priv->caps.max_num_virtio_queues) {
 		DRV_LOG(ERR, "Too big vring id: %d.", vring);
 		return -E2BIG;
 	}
@@ -388,7 +388,7 @@ mlx5_vdpa_get_stats(struct rte_vdpa_device *vdev, int qid,
 		DRV_LOG(ERR, "Invalid device: %s.", vdev->device->name);
 		return -ENODEV;
 	}
-	if (qid >= (int)priv->caps.max_num_virtio_queues * 2) {
+	if (qid >= (int)priv->caps.max_num_virtio_queues) {
 		DRV_LOG(ERR, "Too big vring id: %d for device %s.", qid,
 				vdev->device->name);
 		return -E2BIG;
@@ -411,7 +411,7 @@ mlx5_vdpa_reset_stats(struct rte_vdpa_device *vdev, int qid)
 		DRV_LOG(ERR, "Invalid device: %s.", vdev->device->name);
 		return -ENODEV;
 	}
-	if (qid >= (int)priv->caps.max_num_virtio_queues * 2) {
+	if (qid >= (int)priv->caps.max_num_virtio_queues) {
 		DRV_LOG(ERR, "Too big vring id: %d for device %s.", qid,
 				vdev->device->name);
 		return -E2BIG;
@@ -624,7 +624,7 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 		DRV_LOG(DEBUG, "No capability to support virtq statistics.");
 	priv = rte_zmalloc("mlx5 vDPA device private", sizeof(*priv) +
 			   sizeof(struct mlx5_vdpa_virtq) *
-			   attr->vdpa.max_num_virtio_queues * 2,
+			   attr->vdpa.max_num_virtio_queues,
 			   RTE_CACHE_LINE_SIZE);
 	if (!priv) {
 		DRV_LOG(ERR, "Failed to allocate private memory.");
@@ -685,7 +685,7 @@ mlx5_vdpa_release_dev_resources(struct mlx5_vdpa_priv *priv)
 	uint32_t i;
 
 	mlx5_vdpa_dev_cache_clean(priv);
-	for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
+	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		if (!priv->virtqs[i].counters)
 			continue;
 		claim_zero(mlx5_devx_cmd_destroy(priv->virtqs[i].counters));
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index e025be47d2..c258eb3024 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -72,7 +72,7 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 {
 	unsigned int i, j;
 
-	for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
+	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
 		for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
@@ -492,9 +492,9 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 		DRV_LOG(INFO, "TSO is enabled without CSUM, force CSUM.");
 		priv->features |= (1ULL << VIRTIO_NET_F_CSUM);
 	}
-	if (nr_vring > priv->caps.max_num_virtio_queues * 2) {
+	if (nr_vring > priv->caps.max_num_virtio_queues) {
 		DRV_LOG(ERR, "Do not support more than %d virtqs(%d).",
-			(int)priv->caps.max_num_virtio_queues * 2,
+			(int)priv->caps.max_num_virtio_queues,
 			(int)nr_vring);
 		return -1;
 	}
-- 
2.30.2


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v2 02/15] vdpa/mlx5: support pre create virtq resource
  2022-06-16  2:29 ` [PATCH v2 00/15] mlx5/vdpa: optimize live migration time Li Zhang
  2022-06-16  2:29   ` [PATCH v2 01/15] vdpa/mlx5: fix usage of capability for max number of virtqs Li Zhang
@ 2022-06-16  2:29   ` Li Zhang
  2022-06-17 15:36     ` Maxime Coquelin
  2022-06-16  2:30   ` [PATCH v2 03/15] common/mlx5: add DevX API to move QP to reset state Li Zhang
                     ` (13 subsequent siblings)
  15 siblings, 1 reply; 137+ messages in thread
From: Li Zhang @ 2022-06-16  2:29 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba, Yajun Wu

From: Yajun Wu <yajunw@nvidia.com>

The motivation of this change is to reduce vDPA device queue creation
time by creating some queue resources in the vDPA device probe stage.

In a VM live migration scenario, this saves about 0.8 ms per queue
creation and thus reduces LM network downtime.

To create queue resources (umem/counter) in advance, we need to know
the virtio queue depth and the maximum number of queues the VM will use.

Introduce two new devargs: queues (max queue pair number) and queue_size
(queue depth). Both arguments must be provided together; if only one is
given, it is ignored and no pre-creation is done.

The queues and queue_size must also match the vhost configuration the
driver later receives. Otherwise the pre-created resources are either
wasted or insufficient, or they need to be destroyed and re-created (in
case of a queue_size mismatch).

Pre-created umem/counter resources are kept alive until vDPA device removal.
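
As a rough illustration of why both devargs are needed (not part of the
patch): the pre-allocation loop is bounded by "queues" while the umem
size formula depends on "queue_size". The struct, helper name and values
below are made up for the example; only the size formula follows the
driver code in the diff.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for the capability fields the driver
 * reads (caps.umems[i].a/b); the numbers are made up.
 */
struct umem_cap { uint32_t a, b; };

static uint32_t
umem_size(struct umem_cap cap, uint16_t queue_size)
{
	/* Same formula the driver uses: a * queue_size + b. */
	return cap.a * queue_size + cap.b;
}

int
main(void)
{
	struct umem_cap cap = { .a = 16, .b = 4096 };	/* made-up numbers */
	uint16_t queues = 8, queue_size = 256;	/* e.g. devargs queues=8,queue_size=256 */
	uint32_t vq;

	/* One resource set is pre-created per virtq, i.e. 2 * queues. */
	for (vq = 0; vq < (uint32_t)queues * 2; vq++)
		printf("virtq %u: umem bytes %u\n", vq,
		       umem_size(cap, queue_size));
	return 0;
}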

Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 doc/guides/vdpadevs/mlx5.rst  | 14 +++++++
 drivers/vdpa/mlx5/mlx5_vdpa.c | 75 ++++++++++++++++++++++++++++++++++-
 drivers/vdpa/mlx5/mlx5_vdpa.h |  2 +
 3 files changed, 89 insertions(+), 2 deletions(-)

diff --git a/doc/guides/vdpadevs/mlx5.rst b/doc/guides/vdpadevs/mlx5.rst
index 3ded142311..0ad77bf535 100644
--- a/doc/guides/vdpadevs/mlx5.rst
+++ b/doc/guides/vdpadevs/mlx5.rst
@@ -101,6 +101,20 @@ for an additional list of options shared with other mlx5 drivers.
 
   - 0, HW default.
 
+- ``queue_size`` parameter [int]
+
+  - 1 - 1024, Virio Queue depth for pre-creating queue resource to speed up
+    first time queue creation. Set it together with queues devarg.
+
+  - 0, default value, no pre-create virtq resource.
+
+- ``queues`` parameter [int]
+
+  - 1 - 128, Max number of virio queue pair(including 1 rx queue and 1 tx queue)
+    for pre-create queue resource to speed up first time queue creation. Set it
+    together with queue_size devarg.
+
+  - 0, default value, no pre-create virtq resource.
 
 Error handling
 ^^^^^^^^^^^^^^
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index ee71339b78..faf833ee2f 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -244,7 +244,9 @@ mlx5_vdpa_mtu_set(struct mlx5_vdpa_priv *priv)
 static void
 mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv)
 {
-	mlx5_vdpa_virtqs_cleanup(priv);
+	/* Clean pre-created resource in dev removal only. */
+	if (!priv->queues)
+		mlx5_vdpa_virtqs_cleanup(priv);
 	mlx5_vdpa_mem_dereg(priv);
 }
 
@@ -494,6 +496,12 @@ mlx5_vdpa_args_check_handler(const char *key, const char *val, void *opaque)
 		priv->hw_max_latency_us = (uint32_t)tmp;
 	} else if (strcmp(key, "hw_max_pending_comp") == 0) {
 		priv->hw_max_pending_comp = (uint32_t)tmp;
+	} else if (strcmp(key, "queue_size") == 0) {
+		priv->queue_size = (uint16_t)tmp;
+	} else if (strcmp(key, "queues") == 0) {
+		priv->queues = (uint16_t)tmp;
+	} else {
+		DRV_LOG(WARNING, "Invalid key %s.", key);
 	}
 	return 0;
 }
@@ -524,9 +532,68 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 	if (!priv->event_us &&
 	    priv->event_mode == MLX5_VDPA_EVENT_MODE_DYNAMIC_TIMER)
 		priv->event_us = MLX5_VDPA_DEFAULT_TIMER_STEP_US;
+	if ((priv->queue_size && !priv->queues) ||
+		(!priv->queue_size && priv->queues)) {
+		priv->queue_size = 0;
+		priv->queues = 0;
+		DRV_LOG(WARNING, "Please provide both queue_size and queues.");
+	}
 	DRV_LOG(DEBUG, "event mode is %d.", priv->event_mode);
 	DRV_LOG(DEBUG, "event_us is %u us.", priv->event_us);
 	DRV_LOG(DEBUG, "no traffic max is %u.", priv->no_traffic_max);
+	DRV_LOG(DEBUG, "queues is %u, queue_size is %u.", priv->queues,
+		priv->queue_size);
+}
+
+static int
+mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
+{
+	uint32_t index;
+	uint32_t i;
+
+	if (!priv->queues)
+		return 0;
+	for (index = 0; index < (priv->queues * 2); ++index) {
+		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
+
+		if (priv->caps.queue_counters_valid) {
+			if (!virtq->counters)
+				virtq->counters =
+					mlx5_devx_cmd_create_virtio_q_counters
+						(priv->cdev->ctx);
+			if (!virtq->counters) {
+				DRV_LOG(ERR, "Failed to create virtq couners for virtq"
+					" %d.", index);
+				return -1;
+			}
+		}
+		for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
+			uint32_t size;
+			void *buf;
+			struct mlx5dv_devx_umem *obj;
+
+			size = priv->caps.umems[i].a * priv->queue_size +
+					priv->caps.umems[i].b;
+			buf = rte_zmalloc(__func__, size, 4096);
+			if (buf == NULL) {
+				DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
+						" %u.", i, index);
+				return -1;
+			}
+			obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx, buf,
+					size, IBV_ACCESS_LOCAL_WRITE);
+			if (obj == NULL) {
+				rte_free(buf);
+				DRV_LOG(ERR, "Failed to register umem %d for virtq %u.",
+						i, index);
+				return -1;
+			}
+			virtq->umems[i].size = size;
+			virtq->umems[i].buf = buf;
+			virtq->umems[i].obj = obj;
+		}
+	}
+	return 0;
 }
 
 static int
@@ -604,6 +671,8 @@ mlx5_vdpa_create_dev_resources(struct mlx5_vdpa_priv *priv)
 		return -rte_errno;
 	if (mlx5_vdpa_event_qp_global_prepare(priv))
 		return -rte_errno;
+	if (mlx5_vdpa_virtq_resource_prepare(priv))
+		return -rte_errno;
 	return 0;
 }
 
@@ -638,6 +707,7 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 		priv->num_lag_ports = 1;
 	pthread_mutex_init(&priv->vq_config_lock, NULL);
 	priv->cdev = cdev;
+	mlx5_vdpa_config_get(mkvlist, priv);
 	if (mlx5_vdpa_create_dev_resources(priv))
 		goto error;
 	priv->vdev = rte_vdpa_register_device(cdev->dev, &mlx5_vdpa_ops);
@@ -646,7 +716,6 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 		rte_errno = rte_errno ? rte_errno : EINVAL;
 		goto error;
 	}
-	mlx5_vdpa_config_get(mkvlist, priv);
 	SLIST_INIT(&priv->mr_list);
 	pthread_mutex_lock(&priv_list_lock);
 	TAILQ_INSERT_TAIL(&priv_list, priv, next);
@@ -684,6 +753,8 @@ mlx5_vdpa_release_dev_resources(struct mlx5_vdpa_priv *priv)
 {
 	uint32_t i;
 
+	if (priv->queues)
+		mlx5_vdpa_virtqs_cleanup(priv);
 	mlx5_vdpa_dev_cache_clean(priv);
 	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		if (!priv->virtqs[i].counters)
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index e7f3319f89..f6719a3c60 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -135,6 +135,8 @@ struct mlx5_vdpa_priv {
 	uint8_t hw_latency_mode; /* Hardware CQ moderation mode. */
 	uint16_t hw_max_latency_us; /* Hardware CQ moderation period in usec. */
 	uint16_t hw_max_pending_comp; /* Hardware CQ moderation counter. */
+	uint16_t queue_size; /* virtq depth for pre-creating virtq resource */
+	uint16_t queues; /* Max virtq pair for pre-creating virtq resource */
 	struct rte_vdpa_device *vdev; /* vDPA device. */
 	struct mlx5_common_device *cdev; /* Backend mlx5 device. */
 	int vid; /* vhost device id. */
-- 
2.30.2


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v2 03/15] common/mlx5: add DevX API to move QP to reset state
  2022-06-16  2:29 ` [PATCH v2 00/15] mlx5/vdpa: optimize live migration time Li Zhang
  2022-06-16  2:29   ` [PATCH v2 01/15] vdpa/mlx5: fix usage of capability for max number of virtqs Li Zhang
  2022-06-16  2:29   ` [PATCH v2 02/15] vdpa/mlx5: support pre create virtq resource Li Zhang
@ 2022-06-16  2:30   ` Li Zhang
  2022-06-17 15:41     ` Maxime Coquelin
  2022-06-16  2:30   ` [PATCH v2 04/15] vdpa/mlx5: support event qp reuse Li Zhang
                     ` (12 subsequent siblings)
  15 siblings, 1 reply; 137+ messages in thread
From: Li Zhang @ 2022-06-16  2:30 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba, Yajun Wu

From: Yajun Wu <yajunw@nvidia.com>

Support setting a QP to the RESET state.
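
For illustration, a minimal sketch of how a connected QP pair could be
moved back to RESET with the new opcode (assuming the mlx5 common DevX
headers; the helper name is made up, the called function and opcode are
the ones this patch extends):

/* Move both sides of a loopback QP pair to RESET (illustrative only). */
static int
qp_pair_to_reset(struct mlx5_devx_obj *fw_qp, struct mlx5_devx_qp *sw_qp)
{
	if (mlx5_devx_cmd_modify_qp_state(fw_qp, MLX5_CMD_OP_QP_2RST,
					  sw_qp->qp->id))
		return -1;
	return mlx5_devx_cmd_modify_qp_state(sw_qp->qp, MLX5_CMD_OP_QP_2RST,
					     fw_qp->id);
}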

Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c |  7 +++++++
 drivers/common/mlx5/mlx5_prm.h       | 17 +++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index c6bdbc12bb..1d6d6578d6 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -2264,11 +2264,13 @@ mlx5_devx_cmd_modify_qp_state(struct mlx5_devx_obj *qp, uint32_t qp_st_mod_op,
 		uint32_t rst2init[MLX5_ST_SZ_DW(rst2init_qp_in)];
 		uint32_t init2rtr[MLX5_ST_SZ_DW(init2rtr_qp_in)];
 		uint32_t rtr2rts[MLX5_ST_SZ_DW(rtr2rts_qp_in)];
+		uint32_t qp2rst[MLX5_ST_SZ_DW(2rst_qp_in)];
 	} in;
 	union {
 		uint32_t rst2init[MLX5_ST_SZ_DW(rst2init_qp_out)];
 		uint32_t init2rtr[MLX5_ST_SZ_DW(init2rtr_qp_out)];
 		uint32_t rtr2rts[MLX5_ST_SZ_DW(rtr2rts_qp_out)];
+		uint32_t qp2rst[MLX5_ST_SZ_DW(2rst_qp_out)];
 	} out;
 	void *qpc;
 	int ret;
@@ -2311,6 +2313,11 @@ mlx5_devx_cmd_modify_qp_state(struct mlx5_devx_obj *qp, uint32_t qp_st_mod_op,
 		inlen = sizeof(in.rtr2rts);
 		outlen = sizeof(out.rtr2rts);
 		break;
+	case MLX5_CMD_OP_QP_2RST:
+		MLX5_SET(2rst_qp_in, &in, qpn, qp->id);
+		inlen = sizeof(in.qp2rst);
+		outlen = sizeof(out.qp2rst);
+		break;
 	default:
 		DRV_LOG(ERR, "Invalid or unsupported QP modify op %u.",
 			qp_st_mod_op);
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index bc3e70a1d1..8a2f55c33e 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -3657,6 +3657,23 @@ struct mlx5_ifc_init2init_qp_in_bits {
 	u8 reserved_at_800[0x80];
 };
 
+struct mlx5_ifc_2rst_qp_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_2rst_qp_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 vhca_tunnel_id[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_80[0x8];
+	u8 qpn[0x18];
+	u8 reserved_at_a0[0x20];
+};
+
 struct mlx5_ifc_dealloc_pd_out_bits {
 	u8 status[0x8];
 	u8 reserved_0[0x18];
-- 
2.30.2


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v2 04/15] vdpa/mlx5: support event qp reuse
  2022-06-16  2:29 ` [PATCH v2 00/15] mlx5/vdpa: optimize live migration time Li Zhang
                     ` (2 preceding siblings ...)
  2022-06-16  2:30   ` [PATCH v2 03/15] common/mlx5: add DevX API to move QP to reset state Li Zhang
@ 2022-06-16  2:30   ` Li Zhang
  2022-06-16  2:30   ` [PATCH v2 05/15] common/mlx5: extend virtq modifiable fields Li Zhang
                     ` (11 subsequent siblings)
  15 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-16  2:30 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba, Yajun Wu

From: Yajun Wu <yajunw@nvidia.com>

To speed up queue creation time, the event QP and CQ are created only
once. Each virtq creation reuses the same event QP and CQ.

Because FW sets the event QP to the error state during virtq destroy,
the event QP must be modified to the RESET state first and then to the
RTS state as usual. This saves about 1.5 ms for each virtq creation.

After the SW QP reset, the QP PI/CI both become 0 while the CQ PI/CI
keep their previous values. Add a new field qp_pi to track the SW QP
index and move it independently of the CQ CI.

Add a new function, mlx5_vdpa_drain_cq, to drain CQ CQEs after virtq
release.
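
A toy illustration (plain C, not driver code, made-up values) of why the
SW QP producer index has to be tracked separately once the QP can be
reset and reused while its CQ lives on:

#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	uint16_t cq_ci = 37;	/* CQ consumer index survives the virtq destroy */
	uint16_t qp_pi = 37;	/* before this patch, derived from cq_ci */

	/* Virtq destroyed; event QP moved RST->RTS for reuse: */
	qp_pi = 0;		/* QP indexes restart from zero */
	/* cq_ci intentionally keeps its old value */
	printf("cq_ci=%u qp_pi=%u -> SW QP doorbells must use qp_pi\n",
	       (unsigned int)cq_ci, (unsigned int)qp_pi);
	return 0;
}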

Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c       |  8 ++++
 drivers/vdpa/mlx5/mlx5_vdpa.h       | 12 +++++-
 drivers/vdpa/mlx5/mlx5_vdpa_event.c | 60 +++++++++++++++++++++++++++--
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c |  6 +--
 4 files changed, 78 insertions(+), 8 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index faf833ee2f..ee99952e11 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -269,6 +269,7 @@ mlx5_vdpa_dev_close(int vid)
 	}
 	mlx5_vdpa_steer_unset(priv);
 	mlx5_vdpa_virtqs_release(priv);
+	mlx5_vdpa_drain_cq(priv);
 	if (priv->lm_mr.addr)
 		mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
 	priv->state = MLX5_VDPA_STATE_PROBED;
@@ -555,7 +556,14 @@ mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 		return 0;
 	for (index = 0; index < (priv->queues * 2); ++index) {
 		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
+		int ret = mlx5_vdpa_event_qp_prepare(priv, priv->queue_size,
+					-1, &virtq->eqp);
 
+		if (ret) {
+			DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
+				index);
+			return -1;
+		}
 		if (priv->caps.queue_counters_valid) {
 			if (!virtq->counters)
 				virtq->counters =
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index f6719a3c60..bf82026e37 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -55,6 +55,7 @@ struct mlx5_vdpa_event_qp {
 	struct mlx5_vdpa_cq cq;
 	struct mlx5_devx_obj *fw_qp;
 	struct mlx5_devx_qp sw_qp;
+	uint16_t qp_pi;
 };
 
 struct mlx5_vdpa_query_mr {
@@ -226,7 +227,7 @@ int mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv);
  * @return
  *   0 on success, -1 otherwise and rte_errno is set.
  */
-int mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
+int mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 			      int callfd, struct mlx5_vdpa_event_qp *eqp);
 
 /**
@@ -479,4 +480,13 @@ mlx5_vdpa_virtq_stats_get(struct mlx5_vdpa_priv *priv, int qid,
  */
 int
 mlx5_vdpa_virtq_stats_reset(struct mlx5_vdpa_priv *priv, int qid);
+
+/**
+ * Drain virtq CQ CQE.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ */
+void
+mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index 7167a98db0..b43dca9255 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -137,7 +137,7 @@ mlx5_vdpa_cq_poll(struct mlx5_vdpa_cq *cq)
 		};
 		uint32_t word;
 	} last_word;
-	uint16_t next_wqe_counter = cq->cq_ci;
+	uint16_t next_wqe_counter = eqp->qp_pi;
 	uint16_t cur_wqe_counter;
 	uint16_t comp;
 
@@ -156,9 +156,10 @@ mlx5_vdpa_cq_poll(struct mlx5_vdpa_cq *cq)
 		rte_io_wmb();
 		/* Ring CQ doorbell record. */
 		cq->cq_obj.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci);
+		eqp->qp_pi += comp;
 		rte_io_wmb();
 		/* Ring SW QP doorbell record. */
-		eqp->sw_qp.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci + cq_size);
+		eqp->sw_qp.db_rec[0] = rte_cpu_to_be_32(eqp->qp_pi + cq_size);
 	}
 	return comp;
 }
@@ -232,6 +233,25 @@ mlx5_vdpa_queues_complete(struct mlx5_vdpa_priv *priv)
 	return max;
 }
 
+void
+mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv)
+{
+	unsigned int i;
+
+	for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
+		struct mlx5_vdpa_cq *cq = &priv->virtqs[i].eqp.cq;
+
+		mlx5_vdpa_queue_complete(cq);
+		if (cq->cq_obj.cq) {
+			cq->cq_obj.cqes[0].wqe_counter =
+				rte_cpu_to_be_16(UINT16_MAX);
+			priv->virtqs[i].eqp.qp_pi = 0;
+			if (!cq->armed)
+				mlx5_vdpa_cq_arm(priv, cq);
+		}
+	}
+}
+
 /* Wait on all CQs channel for completion event. */
 static struct mlx5_vdpa_cq *
 mlx5_vdpa_event_wait(struct mlx5_vdpa_priv *priv __rte_unused)
@@ -574,14 +594,44 @@ mlx5_vdpa_qps2rts(struct mlx5_vdpa_event_qp *eqp)
 	return 0;
 }
 
+static int
+mlx5_vdpa_qps2rst2rts(struct mlx5_vdpa_event_qp *eqp)
+{
+	if (mlx5_devx_cmd_modify_qp_state(eqp->fw_qp, MLX5_CMD_OP_QP_2RST,
+					  eqp->sw_qp.qp->id)) {
+		DRV_LOG(ERR, "Failed to modify FW QP to RST state(%u).",
+			rte_errno);
+		return -1;
+	}
+	if (mlx5_devx_cmd_modify_qp_state(eqp->sw_qp.qp,
+			MLX5_CMD_OP_QP_2RST, eqp->fw_qp->id)) {
+		DRV_LOG(ERR, "Failed to modify SW QP to RST state(%u).",
+			rte_errno);
+		return -1;
+	}
+	return mlx5_vdpa_qps2rts(eqp);
+}
+
 int
-mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
+mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 			  int callfd, struct mlx5_vdpa_event_qp *eqp)
 {
 	struct mlx5_devx_qp_attr attr = {0};
 	uint16_t log_desc_n = rte_log2_u32(desc_n);
 	uint32_t ret;
 
+	if (eqp->cq.cq_obj.cq != NULL && log_desc_n == eqp->cq.log_desc_n) {
+		/* Reuse existing resources. */
+		eqp->cq.callfd = callfd;
+		/* FW will set event qp to error state in q destroy. */
+		if (!mlx5_vdpa_qps2rst2rts(eqp)) {
+			rte_write32(rte_cpu_to_be_32(RTE_BIT32(log_desc_n)),
+					&eqp->sw_qp.db_rec[0]);
+			return 0;
+		}
+	}
+	if (eqp->fw_qp)
+		mlx5_vdpa_event_qp_destroy(eqp);
 	if (mlx5_vdpa_cq_create(priv, log_desc_n, callfd, &eqp->cq))
 		return -1;
 	attr.pd = priv->cdev->pdn;
@@ -608,8 +658,10 @@ mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 	}
 	if (mlx5_vdpa_qps2rts(eqp))
 		goto error;
+	eqp->qp_pi = 0;
 	/* First ringing. */
-	rte_write32(rte_cpu_to_be_32(RTE_BIT32(log_desc_n)),
+	if (eqp->sw_qp.db_rec)
+		rte_write32(rte_cpu_to_be_32(RTE_BIT32(log_desc_n)),
 			&eqp->sw_qp.db_rec[0]);
 	return 0;
 error:
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index c258eb3024..6637ba1503 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -87,6 +87,8 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 			}
 			virtq->umems[j].size = 0;
 		}
+		if (virtq->eqp.fw_qp)
+			mlx5_vdpa_event_qp_destroy(&virtq->eqp);
 	}
 }
 
@@ -117,8 +119,6 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 		claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
 	}
 	virtq->virtq = NULL;
-	if (virtq->eqp.fw_qp)
-		mlx5_vdpa_event_qp_destroy(&virtq->eqp);
 	virtq->notifier_state = MLX5_VDPA_NOTIFIER_STATE_DISABLED;
 	return 0;
 }
@@ -246,7 +246,7 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 						      MLX5_VIRTQ_EVENT_MODE_QP :
 						  MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
 	if (attr.event_mode == MLX5_VIRTQ_EVENT_MODE_QP) {
-		ret = mlx5_vdpa_event_qp_create(priv, vq.size, vq.callfd,
+		ret = mlx5_vdpa_event_qp_prepare(priv, vq.size, vq.callfd,
 						&virtq->eqp);
 		if (ret) {
 			DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
-- 
2.30.2


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v2 05/15] common/mlx5: extend virtq modifiable fields
  2022-06-16  2:29 ` [PATCH v2 00/15] mlx5/vdpa: optimize live migration time Li Zhang
                     ` (3 preceding siblings ...)
  2022-06-16  2:30   ` [PATCH v2 04/15] vdpa/mlx5: support event qp reuse Li Zhang
@ 2022-06-16  2:30   ` Li Zhang
  2022-06-17 15:45     ` Maxime Coquelin
  2022-06-16  2:30   ` [PATCH v2 06/15] vdpa/mlx5: pre-create virtq in the prob Li Zhang
                     ` (10 subsequent siblings)
  15 siblings, 1 reply; 137+ messages in thread
From: Li Zhang @ 2022-06-16  2:30 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

A virtq configuration can be modified after the virtq creation.
Added the following modifiable fields (a short usage sketch follows
the list):
1.address fields: desc_addr/used_addr/available_addr
2.hw_available_index
3.hw_used_index
4.virtio_q_type
5.version type
6.queue mkey
7.feature bit mask: tso_ipv4/tso_ipv6/tx_csum/rx_csum
8.event mode: event_mode/event_qpn_or_msix
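
For illustration only, a sketch of driving the extended interface
(assumes the mlx5 common DevX headers; the helper name is made up, the
attribute fields and flags are the ones added by this patch), updating
the state and the ring addresses of an existing virtq in one modify
command:

/* Modify state + ring addresses of an already-created virtq; the caller
 * is assumed to have checked the new modify capability bits first.
 */
static int
virtq_modify_state_and_addr(struct mlx5_devx_obj *virtq_obj, uint16_t index,
			    uint64_t desc, uint64_t used, uint64_t avail)
{
	struct mlx5_devx_virtq_attr attr = {
		.queue_index = index,
		.mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_STATE |
				     MLX5_VIRTQ_MODIFY_TYPE_ADDR,
		.state = MLX5_VIRTQ_STATE_RDY,
		.desc_addr = desc,
		.used_addr = used,
		.available_addr = avail,
	};

	return mlx5_devx_cmd_modify_virtq(virtq_obj, &attr);
}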

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c | 70 +++++++++++++++++++++++-----
 drivers/common/mlx5/mlx5_devx_cmds.h |  6 ++-
 drivers/common/mlx5/mlx5_prm.h       | 13 +++++-
 3 files changed, 76 insertions(+), 13 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 1d6d6578d6..1b68c37092 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -545,6 +545,15 @@ mlx5_devx_cmd_query_hca_vdpa_attr(void *ctx,
 		vdpa_attr->log_doorbell_stride =
 			MLX5_GET(virtio_emulation_cap, hcattr,
 				 log_doorbell_stride);
+		vdpa_attr->vnet_modify_ext =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 vnet_modify_ext);
+		vdpa_attr->virtio_net_q_addr_modify =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 virtio_net_q_addr_modify);
+		vdpa_attr->virtio_q_index_modify =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 virtio_q_index_modify);
 		vdpa_attr->log_doorbell_bar_size =
 			MLX5_GET(virtio_emulation_cap, hcattr,
 				 log_doorbell_bar_size);
@@ -2074,27 +2083,66 @@ mlx5_devx_cmd_modify_virtq(struct mlx5_devx_obj *virtq_obj,
 	MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_type,
 		 MLX5_GENERAL_OBJ_TYPE_VIRTQ);
 	MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_id, virtq_obj->id);
-	MLX5_SET64(virtio_net_q, virtq, modify_field_select, attr->type);
+	MLX5_SET64(virtio_net_q, virtq, modify_field_select,
+		attr->mod_fields_bitmap);
 	MLX5_SET16(virtio_q, virtctx, queue_index, attr->queue_index);
-	switch (attr->type) {
-	case MLX5_VIRTQ_MODIFY_TYPE_STATE:
+	if (!attr->mod_fields_bitmap) {
+		DRV_LOG(ERR, "Failed to modify VIRTQ for no type set.");
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_STATE)
 		MLX5_SET16(virtio_net_q, virtq, state, attr->state);
-		break;
-	case MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS:
+	if (attr->mod_fields_bitmap &
+	    MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS) {
 		MLX5_SET(virtio_net_q, virtq, dirty_bitmap_mkey,
 			 attr->dirty_bitmap_mkey);
 		MLX5_SET64(virtio_net_q, virtq, dirty_bitmap_addr,
 			 attr->dirty_bitmap_addr);
 		MLX5_SET(virtio_net_q, virtq, dirty_bitmap_size,
 			 attr->dirty_bitmap_size);
-		break;
-	case MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE:
+	}
+	if (attr->mod_fields_bitmap &
+	    MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE)
 		MLX5_SET(virtio_net_q, virtq, dirty_bitmap_dump_enable,
 			 attr->dirty_bitmap_dump_enable);
-		break;
-	default:
-		rte_errno = EINVAL;
-		return -rte_errno;
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_QUEUE_PERIOD) {
+		MLX5_SET(virtio_q, virtctx, queue_period_mode,
+			attr->hw_latency_mode);
+		MLX5_SET(virtio_q, virtctx, queue_period_us,
+			attr->hw_max_latency_us);
+		MLX5_SET(virtio_q, virtctx, queue_max_count,
+			attr->hw_max_pending_comp);
+	}
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_ADDR) {
+		MLX5_SET64(virtio_q, virtctx, desc_addr, attr->desc_addr);
+		MLX5_SET64(virtio_q, virtctx, used_addr, attr->used_addr);
+		MLX5_SET64(virtio_q, virtctx, available_addr,
+			attr->available_addr);
+	}
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_HW_AVAILABLE_INDEX)
+		MLX5_SET16(virtio_net_q, virtq, hw_available_index,
+		   attr->hw_available_index);
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_HW_USED_INDEX)
+		MLX5_SET16(virtio_net_q, virtq, hw_used_index,
+			attr->hw_used_index);
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_Q_TYPE)
+		MLX5_SET16(virtio_q, virtctx, virtio_q_type, attr->q_type);
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_VERSION_1_0)
+		MLX5_SET16(virtio_q, virtctx, virtio_version_1_0,
+		   attr->virtio_version_1_0);
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_Q_MKEY)
+		MLX5_SET(virtio_q, virtctx, virtio_q_mkey, attr->mkey);
+	if (attr->mod_fields_bitmap &
+		MLX5_VIRTQ_MODIFY_TYPE_QUEUE_FEATURE_BIT_MASK) {
+		MLX5_SET16(virtio_net_q, virtq, tso_ipv4, attr->tso_ipv4);
+		MLX5_SET16(virtio_net_q, virtq, tso_ipv6, attr->tso_ipv6);
+		MLX5_SET16(virtio_net_q, virtq, tx_csum, attr->tx_csum);
+		MLX5_SET16(virtio_net_q, virtq, rx_csum, attr->rx_csum);
+	}
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_EVENT_MODE) {
+		MLX5_SET16(virtio_q, virtctx, event_mode, attr->event_mode);
+		MLX5_SET(virtio_q, virtctx, event_qpn_or_msix, attr->qp_id);
 	}
 	ret = mlx5_glue->devx_obj_modify(virtq_obj->obj, in, sizeof(in),
 					 out, sizeof(out));
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index 3747ef9e33..ec6467d927 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -74,6 +74,9 @@ struct mlx5_hca_vdpa_attr {
 	uint32_t log_doorbell_stride:5;
 	uint32_t log_doorbell_bar_size:5;
 	uint32_t queue_counters_valid:1;
+	uint32_t vnet_modify_ext:1;
+	uint32_t virtio_net_q_addr_modify:1;
+	uint32_t virtio_q_index_modify:1;
 	uint32_t max_num_virtio_queues;
 	struct {
 		uint32_t a;
@@ -465,7 +468,7 @@ struct mlx5_devx_virtq_attr {
 	uint32_t tis_id;
 	uint32_t counters_obj_id;
 	uint64_t dirty_bitmap_addr;
-	uint64_t type;
+	uint64_t mod_fields_bitmap;
 	uint64_t desc_addr;
 	uint64_t used_addr;
 	uint64_t available_addr;
@@ -475,6 +478,7 @@ struct mlx5_devx_virtq_attr {
 		uint64_t offset;
 	} umems[3];
 	uint8_t error_type;
+	uint8_t q_type;
 };
 
 
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 8a2f55c33e..5f58a6ee1d 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -1802,7 +1802,9 @@ struct mlx5_ifc_virtio_emulation_cap_bits {
 	u8 virtio_queue_type[0x8];
 	u8 reserved_at_20[0x13];
 	u8 log_doorbell_stride[0x5];
-	u8 reserved_at_3b[0x3];
+	u8 vnet_modify_ext[0x1];
+	u8 virtio_net_q_addr_modify[0x1];
+	u8 virtio_q_index_modify[0x1];
 	u8 log_doorbell_bar_size[0x5];
 	u8 doorbell_bar_offset[0x40];
 	u8 reserved_at_80[0x8];
@@ -3024,6 +3026,15 @@ enum {
 	MLX5_VIRTQ_MODIFY_TYPE_STATE = (1UL << 0),
 	MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS = (1UL << 3),
 	MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE = (1UL << 4),
+	MLX5_VIRTQ_MODIFY_TYPE_QUEUE_PERIOD = (1UL << 5),
+	MLX5_VIRTQ_MODIFY_TYPE_ADDR = (1UL << 6),
+	MLX5_VIRTQ_MODIFY_TYPE_HW_AVAILABLE_INDEX = (1UL << 7),
+	MLX5_VIRTQ_MODIFY_TYPE_HW_USED_INDEX = (1UL << 8),
+	MLX5_VIRTQ_MODIFY_TYPE_Q_TYPE = (1UL << 9),
+	MLX5_VIRTQ_MODIFY_TYPE_VERSION_1_0 = (1UL << 10),
+	MLX5_VIRTQ_MODIFY_TYPE_Q_MKEY = (1UL << 11),
+	MLX5_VIRTQ_MODIFY_TYPE_QUEUE_FEATURE_BIT_MASK = (1UL << 12),
+	MLX5_VIRTQ_MODIFY_TYPE_EVENT_MODE = (1UL << 13),
 };
 
 struct mlx5_ifc_virtio_q_bits {
-- 
2.30.2


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v2 06/15] vdpa/mlx5: pre-create virtq in the prob
  2022-06-16  2:29 ` [PATCH v2 00/15] mlx5/vdpa: optimize live migration time Li Zhang
                     ` (4 preceding siblings ...)
  2022-06-16  2:30   ` [PATCH v2 05/15] common/mlx5: extend virtq modifiable fields Li Zhang
@ 2022-06-16  2:30   ` Li Zhang
  2022-06-17 15:53     ` Maxime Coquelin
  2022-06-16  2:30   ` [PATCH v2 07/15] vdpa/mlx5: optimize datapath-control synchronization Li Zhang
                     ` (9 subsequent siblings)
  15 siblings, 1 reply; 137+ messages in thread
From: Li Zhang @ 2022-06-16  2:30 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

The dev_config operation is called during the LM process.
LM time is very critical because all
the VM packets are dropped directly at that time.

Move the virtq creation to probe time and
only modify the configuration later in
the dev_config stage, using the new ability
to modify virtq fields.

This optimization accelerates the LM process and
reduces its time by 70%.
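
A simplified sketch of the resulting create-once/modify-later flow
(assumes the driver headers; the helper name is made up, condensed from
the mlx5_vdpa_virtq_setup() changes below, error handling trimmed):

static int
virtq_config_sketch(struct mlx5_vdpa_priv *priv, struct mlx5_vdpa_virtq *virtq,
		    struct mlx5_devx_virtq_attr *attr)
{
	if (virtq->virtq == NULL) {
		/* First time (probe path): create the virtq object. */
		virtq->virtq = mlx5_devx_cmd_create_virtq(priv->cdev->ctx, attr);
		if (virtq->virtq == NULL)
			return -1;
		attr->mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_STATE;
	}
	/* dev_config path: only a cheap modify command is needed. */
	attr->state = MLX5_VIRTQ_STATE_RDY;
	return mlx5_devx_cmd_modify_virtq(virtq->virtq, attr);
}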

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 doc/guides/rel_notes/release_22_07.rst |   4 +
 drivers/vdpa/mlx5/mlx5_vdpa.h          |   4 +
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c       |  13 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c    | 257 +++++++++++++++----------
 4 files changed, 174 insertions(+), 104 deletions(-)

diff --git a/doc/guides/rel_notes/release_22_07.rst b/doc/guides/rel_notes/release_22_07.rst
index f2cf41def9..2056cd9ee7 100644
--- a/doc/guides/rel_notes/release_22_07.rst
+++ b/doc/guides/rel_notes/release_22_07.rst
@@ -175,6 +175,10 @@ New Features
   This is a fall-back implementation for platforms that
   don't support vector operations.
 
+* **Updated Nvidia mlx5 vDPA driver.**
+
+  * Added new devargs ``queue_size`` and ``queues`` to allow prior creation of virtq resources.
+
 
 Removed Items
 -------------
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index bf82026e37..e5553079fe 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -80,6 +80,7 @@ struct mlx5_vdpa_virtq {
 	uint16_t vq_size;
 	uint8_t notifier_state;
 	bool stopped;
+	uint32_t configured:1;
 	uint32_t version;
 	struct mlx5_vdpa_priv *priv;
 	struct mlx5_devx_obj *virtq;
@@ -489,4 +490,7 @@ mlx5_vdpa_virtq_stats_reset(struct mlx5_vdpa_priv *priv, int qid);
  */
 void
 mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv);
+
+bool
+mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
index 43a2b98255..a8faf0c116 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
@@ -12,14 +12,17 @@ int
 mlx5_vdpa_logging_enable(struct mlx5_vdpa_priv *priv, int enable)
 {
 	struct mlx5_devx_virtq_attr attr = {
-		.type = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE,
+		.mod_fields_bitmap =
+			MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE,
 		.dirty_bitmap_dump_enable = enable,
 	};
+	struct mlx5_vdpa_virtq *virtq;
 	int i;
 
 	for (i = 0; i < priv->nr_virtqs; ++i) {
 		attr.queue_index = i;
-		if (!priv->virtqs[i].virtq) {
+		virtq = &priv->virtqs[i];
+		if (!virtq->configured) {
 			DRV_LOG(DEBUG, "virtq %d is invalid for dirty bitmap "
 				"enabling.", i);
 		} else if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
@@ -37,10 +40,11 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
 			   uint64_t log_size)
 {
 	struct mlx5_devx_virtq_attr attr = {
-		.type = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS,
+		.mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS,
 		.dirty_bitmap_addr = log_base,
 		.dirty_bitmap_size = log_size,
 	};
+	struct mlx5_vdpa_virtq *virtq;
 	int i;
 	int ret = mlx5_os_wrapped_mkey_create(priv->cdev->ctx, priv->cdev->pd,
 					      priv->cdev->pdn,
@@ -54,7 +58,8 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
 	attr.dirty_bitmap_mkey = priv->lm_mr.lkey;
 	for (i = 0; i < priv->nr_virtqs; ++i) {
 		attr.queue_index = i;
-		if (!priv->virtqs[i].virtq) {
+		virtq = &priv->virtqs[i];
+		if (!virtq->configured) {
 			DRV_LOG(DEBUG, "virtq %d is invalid for LM.", i);
 		} else if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
 						      &attr)) {
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 6637ba1503..55cbc9fad2 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -75,6 +75,7 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
+		virtq->configured = 0;
 		for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
 			if (virtq->umems[j].obj) {
 				claim_zero(mlx5_glue->devx_umem_dereg
@@ -111,11 +112,12 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 		rte_intr_fd_set(virtq->intr_handle, -1);
 	}
 	rte_intr_instance_free(virtq->intr_handle);
-	if (virtq->virtq) {
+	if (virtq->configured) {
 		ret = mlx5_vdpa_virtq_stop(virtq->priv, virtq->index);
 		if (ret)
 			DRV_LOG(WARNING, "Failed to stop virtq %d.",
 				virtq->index);
+		virtq->configured = 0;
 		claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
 	}
 	virtq->virtq = NULL;
@@ -138,7 +140,7 @@ int
 mlx5_vdpa_virtq_modify(struct mlx5_vdpa_virtq *virtq, int state)
 {
 	struct mlx5_devx_virtq_attr attr = {
-			.type = MLX5_VIRTQ_MODIFY_TYPE_STATE,
+			.mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_STATE,
 			.state = state ? MLX5_VIRTQ_STATE_RDY :
 					 MLX5_VIRTQ_STATE_SUSPEND,
 			.queue_index = virtq->index,
@@ -153,7 +155,7 @@ mlx5_vdpa_virtq_stop(struct mlx5_vdpa_priv *priv, int index)
 	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
 	int ret;
 
-	if (virtq->stopped)
+	if (virtq->stopped || !virtq->configured)
 		return 0;
 	ret = mlx5_vdpa_virtq_modify(virtq, 0);
 	if (ret)
@@ -209,51 +211,54 @@ mlx5_vdpa_hva_to_gpa(struct rte_vhost_memory *mem, uint64_t hva)
 }
 
 static int
-mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
+mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
+		struct mlx5_devx_virtq_attr *attr,
+		struct rte_vhost_vring *vq, int index)
 {
 	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
-	struct rte_vhost_vring vq;
-	struct mlx5_devx_virtq_attr attr = {0};
 	uint64_t gpa;
 	int ret;
 	unsigned int i;
-	uint16_t last_avail_idx;
-	uint16_t last_used_idx;
-	uint16_t event_num = MLX5_EVENT_TYPE_OBJECT_CHANGE;
-	uint64_t cookie;
-
-	ret = rte_vhost_get_vhost_vring(priv->vid, index, &vq);
-	if (ret)
-		return -1;
-	if (vq.size == 0)
-		return 0;
-	virtq->index = index;
-	virtq->vq_size = vq.size;
-	attr.tso_ipv4 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO4));
-	attr.tso_ipv6 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO6));
-	attr.tx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_CSUM));
-	attr.rx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_GUEST_CSUM));
-	attr.virtio_version_1_0 = !!(priv->features & (1ULL <<
-							VIRTIO_F_VERSION_1));
-	attr.type = (priv->features & (1ULL << VIRTIO_F_RING_PACKED)) ?
+	uint16_t last_avail_idx = 0;
+	uint16_t last_used_idx = 0;
+
+	if (virtq->virtq)
+		attr->mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_STATE |
+			MLX5_VIRTQ_MODIFY_TYPE_ADDR |
+			MLX5_VIRTQ_MODIFY_TYPE_HW_AVAILABLE_INDEX |
+			MLX5_VIRTQ_MODIFY_TYPE_HW_USED_INDEX |
+			MLX5_VIRTQ_MODIFY_TYPE_VERSION_1_0 |
+			MLX5_VIRTQ_MODIFY_TYPE_Q_TYPE |
+			MLX5_VIRTQ_MODIFY_TYPE_Q_MKEY |
+			MLX5_VIRTQ_MODIFY_TYPE_QUEUE_FEATURE_BIT_MASK |
+			MLX5_VIRTQ_MODIFY_TYPE_EVENT_MODE;
+	attr->tso_ipv4 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO4));
+	attr->tso_ipv6 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO6));
+	attr->tx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_CSUM));
+	attr->rx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_GUEST_CSUM));
+	attr->virtio_version_1_0 =
+		!!(priv->features & (1ULL << VIRTIO_F_VERSION_1));
+	attr->q_type =
+		(priv->features & (1ULL << VIRTIO_F_RING_PACKED)) ?
 			MLX5_VIRTQ_TYPE_PACKED : MLX5_VIRTQ_TYPE_SPLIT;
 	/*
 	 * No need event QPs creation when the guest in poll mode or when the
 	 * capability allows it.
 	 */
-	attr.event_mode = vq.callfd != -1 || !(priv->caps.event_mode & (1 <<
-					       MLX5_VIRTQ_EVENT_MODE_NO_MSIX)) ?
-						      MLX5_VIRTQ_EVENT_MODE_QP :
-						  MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
-	if (attr.event_mode == MLX5_VIRTQ_EVENT_MODE_QP) {
-		ret = mlx5_vdpa_event_qp_prepare(priv, vq.size, vq.callfd,
-						&virtq->eqp);
+	attr->event_mode = vq->callfd != -1 ||
+	!(priv->caps.event_mode & (1 << MLX5_VIRTQ_EVENT_MODE_NO_MSIX)) ?
+	MLX5_VIRTQ_EVENT_MODE_QP : MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
+	if (attr->event_mode == MLX5_VIRTQ_EVENT_MODE_QP) {
+		ret = mlx5_vdpa_event_qp_prepare(priv,
+				vq->size, vq->callfd, &virtq->eqp);
 		if (ret) {
-			DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
+			DRV_LOG(ERR,
+				"Failed to create event QPs for virtq %d.",
 				index);
 			return -1;
 		}
-		attr.qp_id = virtq->eqp.fw_qp->id;
+		attr->mod_fields_bitmap |= MLX5_VIRTQ_MODIFY_TYPE_EVENT_MODE;
+		attr->qp_id = virtq->eqp.fw_qp->id;
 	} else {
 		DRV_LOG(INFO, "Virtq %d is, for sure, working by poll mode, no"
 			" need event QPs and event mechanism.", index);
@@ -265,77 +270,82 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 		if (!virtq->counters) {
 			DRV_LOG(ERR, "Failed to create virtq couners for virtq"
 				" %d.", index);
-			goto error;
+			return -1;
 		}
-		attr.counters_obj_id = virtq->counters->id;
+		attr->counters_obj_id = virtq->counters->id;
 	}
 	/* Setup 3 UMEMs for each virtq. */
-	for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
-		uint32_t size;
-		void *buf;
-		struct mlx5dv_devx_umem *obj;
-
-		size = priv->caps.umems[i].a * vq.size + priv->caps.umems[i].b;
-		if (virtq->umems[i].size == size &&
-		    virtq->umems[i].obj != NULL) {
-			/* Reuse registered memory. */
-			memset(virtq->umems[i].buf, 0, size);
-			goto reuse;
-		}
-		if (virtq->umems[i].obj)
-			claim_zero(mlx5_glue->devx_umem_dereg
+	if (virtq->virtq) {
+		for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
+			uint32_t size;
+			void *buf;
+			struct mlx5dv_devx_umem *obj;
+
+			size =
+		priv->caps.umems[i].a * vq->size + priv->caps.umems[i].b;
+			if (virtq->umems[i].size == size &&
+				virtq->umems[i].obj != NULL) {
+				/* Reuse registered memory. */
+				memset(virtq->umems[i].buf, 0, size);
+				goto reuse;
+			}
+			if (virtq->umems[i].obj)
+				claim_zero(mlx5_glue->devx_umem_dereg
 				   (virtq->umems[i].obj));
-		if (virtq->umems[i].buf)
-			rte_free(virtq->umems[i].buf);
-		virtq->umems[i].size = 0;
-		virtq->umems[i].obj = NULL;
-		virtq->umems[i].buf = NULL;
-		buf = rte_zmalloc(__func__, size, 4096);
-		if (buf == NULL) {
-			DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
+			if (virtq->umems[i].buf)
+				rte_free(virtq->umems[i].buf);
+			virtq->umems[i].size = 0;
+			virtq->umems[i].obj = NULL;
+			virtq->umems[i].buf = NULL;
+			buf = rte_zmalloc(__func__,
+				size, 4096);
+			if (buf == NULL) {
+				DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
 				" %u.", i, index);
-			goto error;
-		}
-		obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx, buf, size,
-					       IBV_ACCESS_LOCAL_WRITE);
-		if (obj == NULL) {
-			DRV_LOG(ERR, "Failed to register umem %d for virtq %u.",
+				return -1;
+			}
+			obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx,
+				buf, size, IBV_ACCESS_LOCAL_WRITE);
+			if (obj == NULL) {
+				DRV_LOG(ERR, "Failed to register umem %d for virtq %u.",
 				i, index);
-			goto error;
-		}
-		virtq->umems[i].size = size;
-		virtq->umems[i].buf = buf;
-		virtq->umems[i].obj = obj;
+				rte_free(buf);
+				return -1;
+			}
+			virtq->umems[i].size = size;
+			virtq->umems[i].buf = buf;
+			virtq->umems[i].obj = obj;
 reuse:
-		attr.umems[i].id = virtq->umems[i].obj->umem_id;
-		attr.umems[i].offset = 0;
-		attr.umems[i].size = virtq->umems[i].size;
+			attr->umems[i].id = virtq->umems[i].obj->umem_id;
+			attr->umems[i].offset = 0;
+			attr->umems[i].size = virtq->umems[i].size;
+		}
 	}
-	if (attr.type == MLX5_VIRTQ_TYPE_SPLIT) {
+	if (attr->q_type == MLX5_VIRTQ_TYPE_SPLIT) {
 		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
-					   (uint64_t)(uintptr_t)vq.desc);
+					   (uint64_t)(uintptr_t)vq->desc);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get descriptor ring GPA.");
-			goto error;
+			return -1;
 		}
-		attr.desc_addr = gpa;
+		attr->desc_addr = gpa;
 		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
-					   (uint64_t)(uintptr_t)vq.used);
+					   (uint64_t)(uintptr_t)vq->used);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get GPA for used ring.");
-			goto error;
+			return -1;
 		}
-		attr.used_addr = gpa;
+		attr->used_addr = gpa;
 		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
-					   (uint64_t)(uintptr_t)vq.avail);
+					   (uint64_t)(uintptr_t)vq->avail);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get GPA for available ring.");
-			goto error;
+			return -1;
 		}
-		attr.available_addr = gpa;
+		attr->available_addr = gpa;
 	}
-	ret = rte_vhost_get_vring_base(priv->vid, index, &last_avail_idx,
-				 &last_used_idx);
+	ret = rte_vhost_get_vring_base(priv->vid,
+			index, &last_avail_idx, &last_used_idx);
 	if (ret) {
 		last_avail_idx = 0;
 		last_used_idx = 0;
@@ -345,24 +355,71 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 				"virtq %d.", priv->vid, last_avail_idx,
 				last_used_idx, index);
 	}
-	attr.hw_available_index = last_avail_idx;
-	attr.hw_used_index = last_used_idx;
-	attr.q_size = vq.size;
-	attr.mkey = priv->gpa_mkey_index;
-	attr.tis_id = priv->tiss[(index / 2) % priv->num_lag_ports]->id;
-	attr.queue_index = index;
-	attr.pd = priv->cdev->pdn;
-	attr.hw_latency_mode = priv->hw_latency_mode;
-	attr.hw_max_latency_us = priv->hw_max_latency_us;
-	attr.hw_max_pending_comp = priv->hw_max_pending_comp;
-	virtq->virtq = mlx5_devx_cmd_create_virtq(priv->cdev->ctx, &attr);
+	attr->hw_available_index = last_avail_idx;
+	attr->hw_used_index = last_used_idx;
+	attr->q_size = vq->size;
+	attr->mkey = priv->gpa_mkey_index;
+	attr->tis_id = priv->tiss[(index / 2) % priv->num_lag_ports]->id;
+	attr->queue_index = index;
+	attr->pd = priv->cdev->pdn;
+	attr->hw_latency_mode = priv->hw_latency_mode;
+	attr->hw_max_latency_us = priv->hw_max_latency_us;
+	attr->hw_max_pending_comp = priv->hw_max_pending_comp;
+	if (attr->hw_latency_mode || attr->hw_max_latency_us ||
+		attr->hw_max_pending_comp)
+		attr->mod_fields_bitmap |= MLX5_VIRTQ_MODIFY_TYPE_QUEUE_PERIOD;
+	return 0;
+}
+
+bool
+mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv)
+{
+	return (priv->caps.vnet_modify_ext &&
+			priv->caps.virtio_net_q_addr_modify &&
+			priv->caps.virtio_q_index_modify) ? true : false;
+}
+
+static int
+mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
+{
+	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
+	struct rte_vhost_vring vq;
+	struct mlx5_devx_virtq_attr attr = {0};
+	int ret;
+	uint16_t event_num = MLX5_EVENT_TYPE_OBJECT_CHANGE;
+	uint64_t cookie;
+
+	ret = rte_vhost_get_vhost_vring(priv->vid, index, &vq);
+	if (ret)
+		return -1;
+	if (vq.size == 0)
+		return 0;
 	virtq->priv = priv;
-	if (!virtq->virtq)
+	virtq->stopped = 0;
+	ret = mlx5_vdpa_virtq_sub_objs_prepare(priv, &attr,
+				&vq, index);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to setup update virtq attr"
+			" %d.", index);
 		goto error;
-	claim_zero(rte_vhost_enable_guest_notification(priv->vid, index, 1));
-	if (mlx5_vdpa_virtq_modify(virtq, 1))
+	}
+	if (!virtq->virtq) {
+		virtq->index = index;
+		virtq->vq_size = vq.size;
+		virtq->virtq = mlx5_devx_cmd_create_virtq(priv->cdev->ctx,
+			&attr);
+		if (!virtq->virtq)
+			goto error;
+		attr.mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_STATE;
+	}
+	attr.state = MLX5_VIRTQ_STATE_RDY;
+	ret = mlx5_devx_cmd_modify_virtq(virtq->virtq, &attr);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to modify virtq %d.", index);
 		goto error;
-	virtq->priv = priv;
+	}
+	claim_zero(rte_vhost_enable_guest_notification(priv->vid, index, 1));
+	virtq->configured = 1;
 	rte_write32(virtq->index, priv->virtq_db_addr);
 	/* Setup doorbell mapping. */
 	virtq->intr_handle =
@@ -553,7 +610,7 @@ mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable)
 			return 0;
 		DRV_LOG(INFO, "Virtq %d was modified, recreate it.", index);
 	}
-	if (virtq->virtq) {
+	if (virtq->configured) {
 		virtq->enable = 0;
 		if (is_virtq_recvq(virtq->index, priv->nr_virtqs)) {
 			ret = mlx5_vdpa_steer_update(priv);
-- 
2.30.2


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v2 07/15] vdpa/mlx5: optimize datapath-control synchronization
  2022-06-16  2:29 ` [PATCH v2 00/15] mlx5/vdpa: optimize live migration time Li Zhang
                     ` (5 preceding siblings ...)
  2022-06-16  2:30   ` [PATCH v2 06/15] vdpa/mlx5: pre-create virtq in the prob Li Zhang
@ 2022-06-16  2:30   ` Li Zhang
  2022-06-16  2:30   ` [PATCH v2 08/15] vdpa/mlx5: add multi-thread management for configuration Li Zhang
                     ` (8 subsequent siblings)
  15 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-16  2:30 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

The driver used a single global lock for any synchronization
needed between the datapath and the control path.
It is better to group each critical section only with
the other sections that actually need to be synchronized with it.

Replace the global lock with the following locks (a short usage sketch
follows the list):

1. Virtq locks (per virtq) synchronize datapath polling and parallel
   configurations on the same virtq.
2. A doorbell lock synchronizes doorbell updates; the doorbell is shared
   by all the virtqs of the device.
3. A steering lock for updates of the shared steering objects.
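
A short usage sketch of the new locks (lock names as introduced by this
patch, driver headers assumed, the helper name is made up):

static void
vdpa_locking_sketch(struct mlx5_vdpa_priv *priv, struct mlx5_vdpa_virtq *virtq)
{
	/* 1. Per-virtq lock: datapath polling vs. parallel configuration. */
	pthread_mutex_lock(&virtq->virtq_lock);
	/* ... enable/disable, poll or reconfigure this virtq ... */
	pthread_mutex_unlock(&virtq->virtq_lock);

	/* 2. Doorbell lock: the doorbell register is shared by all virtqs. */
	rte_spinlock_lock(&priv->db_lock);
	rte_write32(virtq->index, priv->virtq_db_addr);
	rte_spinlock_unlock(&priv->db_lock);

	/* 3. Steering lock: updates of the shared steering objects. */
	pthread_mutex_lock(&priv->steer_update_lock);
	/* ... mlx5_vdpa_steer_update() / mlx5_vdpa_steer_unset() ... */
	pthread_mutex_unlock(&priv->steer_update_lock);
}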

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 24 ++++---
 drivers/vdpa/mlx5/mlx5_vdpa.h       | 13 ++--
 drivers/vdpa/mlx5/mlx5_vdpa_event.c | 97 ++++++++++++++++++-----------
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c    | 34 +++++++---
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c |  7 ++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 88 +++++++++++++++++++-------
 6 files changed, 184 insertions(+), 79 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index ee99952e11..e5a11f72fd 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -135,6 +135,7 @@ mlx5_vdpa_set_vring_state(int vid, int vring, int state)
 	struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid);
 	struct mlx5_vdpa_priv *priv =
 		mlx5_vdpa_find_priv_resource_by_vdev(vdev);
+	struct mlx5_vdpa_virtq *virtq;
 	int ret;
 
 	if (priv == NULL) {
@@ -145,9 +146,10 @@ mlx5_vdpa_set_vring_state(int vid, int vring, int state)
 		DRV_LOG(ERR, "Too big vring id: %d.", vring);
 		return -E2BIG;
 	}
-	pthread_mutex_lock(&priv->vq_config_lock);
+	virtq = &priv->virtqs[vring];
+	pthread_mutex_lock(&virtq->virtq_lock);
 	ret = mlx5_vdpa_virtq_enable(priv, vring, state);
-	pthread_mutex_unlock(&priv->vq_config_lock);
+	pthread_mutex_unlock(&virtq->virtq_lock);
 	return ret;
 }
 
@@ -267,7 +269,9 @@ mlx5_vdpa_dev_close(int vid)
 		ret |= mlx5_vdpa_lm_log(priv);
 		priv->state = MLX5_VDPA_STATE_IN_PROGRESS;
 	}
+	pthread_mutex_lock(&priv->steer_update_lock);
 	mlx5_vdpa_steer_unset(priv);
+	pthread_mutex_unlock(&priv->steer_update_lock);
 	mlx5_vdpa_virtqs_release(priv);
 	mlx5_vdpa_drain_cq(priv);
 	if (priv->lm_mr.addr)
@@ -276,8 +280,6 @@ mlx5_vdpa_dev_close(int vid)
 	if (!priv->connected)
 		mlx5_vdpa_dev_cache_clean(priv);
 	priv->vid = 0;
-	/* The mutex may stay locked after event thread cancel - initiate it. */
-	pthread_mutex_init(&priv->vq_config_lock, NULL);
 	DRV_LOG(INFO, "vDPA device %d was closed.", vid);
 	return ret;
 }
@@ -549,15 +551,21 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 static int
 mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_virtq *virtq;
 	uint32_t index;
 	uint32_t i;
 
+	for (index = 0; index < priv->caps.max_num_virtio_queues * 2;
+		index++) {
+		virtq = &priv->virtqs[index];
+		pthread_mutex_init(&virtq->virtq_lock, NULL);
+	}
 	if (!priv->queues)
 		return 0;
 	for (index = 0; index < (priv->queues * 2); ++index) {
-		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
+		virtq = &priv->virtqs[index];
 		int ret = mlx5_vdpa_event_qp_prepare(priv, priv->queue_size,
-					-1, &virtq->eqp);
+					-1, virtq);
 
 		if (ret) {
 			DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
@@ -713,7 +721,8 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 	priv->num_lag_ports = attr->num_lag_ports;
 	if (attr->num_lag_ports == 0)
 		priv->num_lag_ports = 1;
-	pthread_mutex_init(&priv->vq_config_lock, NULL);
+	rte_spinlock_init(&priv->db_lock);
+	pthread_mutex_init(&priv->steer_update_lock, NULL);
 	priv->cdev = cdev;
 	mlx5_vdpa_config_get(mkvlist, priv);
 	if (mlx5_vdpa_create_dev_resources(priv))
@@ -797,7 +806,6 @@ mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv)
 	mlx5_vdpa_release_dev_resources(priv);
 	if (priv->vdev)
 		rte_vdpa_unregister_device(priv->vdev);
-	pthread_mutex_destroy(&priv->vq_config_lock);
 	rte_free(priv);
 }
 
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index e5553079fe..3fd5eefc5e 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -82,6 +82,7 @@ struct mlx5_vdpa_virtq {
 	bool stopped;
 	uint32_t configured:1;
 	uint32_t version;
+	pthread_mutex_t virtq_lock;
 	struct mlx5_vdpa_priv *priv;
 	struct mlx5_devx_obj *virtq;
 	struct mlx5_devx_obj *counters;
@@ -126,7 +127,8 @@ struct mlx5_vdpa_priv {
 	TAILQ_ENTRY(mlx5_vdpa_priv) next;
 	bool connected;
 	enum mlx5_dev_state state;
-	pthread_mutex_t vq_config_lock;
+	rte_spinlock_t db_lock;
+	pthread_mutex_t steer_update_lock;
 	uint64_t no_traffic_counter;
 	pthread_t timer_tid;
 	int event_mode;
@@ -222,14 +224,15 @@ int mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv);
  *   Number of descriptors.
  * @param[in] callfd
  *   The guest notification file descriptor.
- * @param[in/out] eqp
- *   Pointer to the event QP structure.
+ * @param[in/out] virtq
+ *   Pointer to the virt-queue structure.
  *
  * @return
  *   0 on success, -1 otherwise and rte_errno is set.
  */
-int mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
-			      int callfd, struct mlx5_vdpa_event_qp *eqp);
+int
+mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
+	int callfd, struct mlx5_vdpa_virtq *virtq);
 
 /**
  * Destroy an event QP and all its related resources.
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index b43dca9255..2b0f5936d1 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -85,12 +85,13 @@ mlx5_vdpa_cq_arm(struct mlx5_vdpa_priv *priv, struct mlx5_vdpa_cq *cq)
 
 static int
 mlx5_vdpa_cq_create(struct mlx5_vdpa_priv *priv, uint16_t log_desc_n,
-		    int callfd, struct mlx5_vdpa_cq *cq)
+		int callfd, struct mlx5_vdpa_virtq *virtq)
 {
 	struct mlx5_devx_cq_attr attr = {
 		.use_first_only = 1,
 		.uar_page_id = mlx5_os_get_devx_uar_page_id(priv->uar.obj),
 	};
+	struct mlx5_vdpa_cq *cq = &virtq->eqp.cq;
 	uint16_t event_nums[1] = {0};
 	int ret;
 
@@ -102,10 +103,11 @@ mlx5_vdpa_cq_create(struct mlx5_vdpa_priv *priv, uint16_t log_desc_n,
 	cq->log_desc_n = log_desc_n;
 	rte_spinlock_init(&cq->sl);
 	/* Subscribe CQ event to the event channel controlled by the driver. */
-	ret = mlx5_os_devx_subscribe_devx_event(priv->eventc,
-						cq->cq_obj.cq->obj,
-						sizeof(event_nums), event_nums,
-						(uint64_t)(uintptr_t)cq);
+	ret = mlx5_glue->devx_subscribe_devx_event(priv->eventc,
+							cq->cq_obj.cq->obj,
+						   sizeof(event_nums),
+						   event_nums,
+						   (uint64_t)(uintptr_t)virtq);
 	if (ret) {
 		DRV_LOG(ERR, "Failed to subscribe CQE event.");
 		rte_errno = errno;
@@ -167,13 +169,17 @@ mlx5_vdpa_cq_poll(struct mlx5_vdpa_cq *cq)
 static void
 mlx5_vdpa_arm_all_cqs(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_virtq *virtq;
 	struct mlx5_vdpa_cq *cq;
 	int i;
 
 	for (i = 0; i < priv->nr_virtqs; i++) {
+		virtq = &priv->virtqs[i];
+		pthread_mutex_lock(&virtq->virtq_lock);
 		cq = &priv->virtqs[i].eqp.cq;
 		if (cq->cq_obj.cq && !cq->armed)
 			mlx5_vdpa_cq_arm(priv, cq);
+		pthread_mutex_unlock(&virtq->virtq_lock);
 	}
 }
 
@@ -220,13 +226,18 @@ mlx5_vdpa_queue_complete(struct mlx5_vdpa_cq *cq)
 static uint32_t
 mlx5_vdpa_queues_complete(struct mlx5_vdpa_priv *priv)
 {
-	int i;
+	struct mlx5_vdpa_virtq *virtq;
+	struct mlx5_vdpa_cq *cq;
 	uint32_t max = 0;
+	uint32_t comp;
+	int i;
 
 	for (i = 0; i < priv->nr_virtqs; i++) {
-		struct mlx5_vdpa_cq *cq = &priv->virtqs[i].eqp.cq;
-		uint32_t comp = mlx5_vdpa_queue_complete(cq);
-
+		virtq = &priv->virtqs[i];
+		pthread_mutex_lock(&virtq->virtq_lock);
+		cq = &virtq->eqp.cq;
+		comp = mlx5_vdpa_queue_complete(cq);
+		pthread_mutex_unlock(&virtq->virtq_lock);
 		if (comp > max)
 			max = comp;
 	}
@@ -253,7 +264,7 @@ mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv)
 }
 
 /* Wait on all CQs channel for completion event. */
-static struct mlx5_vdpa_cq *
+static struct mlx5_vdpa_virtq *
 mlx5_vdpa_event_wait(struct mlx5_vdpa_priv *priv __rte_unused)
 {
 #ifdef HAVE_IBV_DEVX_EVENT
@@ -265,7 +276,8 @@ mlx5_vdpa_event_wait(struct mlx5_vdpa_priv *priv __rte_unused)
 					    sizeof(out.buf));
 
 	if (ret >= 0)
-		return (struct mlx5_vdpa_cq *)(uintptr_t)out.event_resp.cookie;
+		return (struct mlx5_vdpa_virtq *)
+				(uintptr_t)out.event_resp.cookie;
 	DRV_LOG(INFO, "Got error in devx_get_event, ret = %d, errno = %d.",
 		ret, errno);
 #endif
@@ -276,7 +288,7 @@ static void *
 mlx5_vdpa_event_handle(void *arg)
 {
 	struct mlx5_vdpa_priv *priv = arg;
-	struct mlx5_vdpa_cq *cq;
+	struct mlx5_vdpa_virtq *virtq;
 	uint32_t max;
 
 	switch (priv->event_mode) {
@@ -284,7 +296,6 @@ mlx5_vdpa_event_handle(void *arg)
 	case MLX5_VDPA_EVENT_MODE_FIXED_TIMER:
 		priv->timer_delay_us = priv->event_us;
 		while (1) {
-			pthread_mutex_lock(&priv->vq_config_lock);
 			max = mlx5_vdpa_queues_complete(priv);
 			if (max == 0 && priv->no_traffic_counter++ >=
 			    priv->no_traffic_max) {
@@ -292,32 +303,37 @@ mlx5_vdpa_event_handle(void *arg)
 					priv->vdev->device->name);
 				mlx5_vdpa_arm_all_cqs(priv);
 				do {
-					pthread_mutex_unlock
-							(&priv->vq_config_lock);
-					cq = mlx5_vdpa_event_wait(priv);
-					pthread_mutex_lock
-							(&priv->vq_config_lock);
-					if (cq == NULL ||
-					       mlx5_vdpa_queue_complete(cq) > 0)
+					virtq = mlx5_vdpa_event_wait(priv);
+					if (virtq == NULL)
 						break;
+					pthread_mutex_lock(
+						&virtq->virtq_lock);
+					if (mlx5_vdpa_queue_complete(
+						&virtq->eqp.cq) > 0) {
+						pthread_mutex_unlock(
+							&virtq->virtq_lock);
+						break;
+					}
+					pthread_mutex_unlock(
+						&virtq->virtq_lock);
 				} while (1);
 				priv->timer_delay_us = priv->event_us;
 				priv->no_traffic_counter = 0;
 			} else if (max != 0) {
 				priv->no_traffic_counter = 0;
 			}
-			pthread_mutex_unlock(&priv->vq_config_lock);
 			mlx5_vdpa_timer_sleep(priv, max);
 		}
 		return NULL;
 	case MLX5_VDPA_EVENT_MODE_ONLY_INTERRUPT:
 		do {
-			cq = mlx5_vdpa_event_wait(priv);
-			if (cq != NULL) {
-				pthread_mutex_lock(&priv->vq_config_lock);
-				if (mlx5_vdpa_queue_complete(cq) > 0)
-					mlx5_vdpa_cq_arm(priv, cq);
-				pthread_mutex_unlock(&priv->vq_config_lock);
+			virtq = mlx5_vdpa_event_wait(priv);
+			if (virtq != NULL) {
+				pthread_mutex_lock(&virtq->virtq_lock);
+				if (mlx5_vdpa_queue_complete(
+					&virtq->eqp.cq) > 0)
+					mlx5_vdpa_cq_arm(priv, &virtq->eqp.cq);
+				pthread_mutex_unlock(&virtq->virtq_lock);
 			}
 		} while (1);
 		return NULL;
@@ -339,7 +355,6 @@ mlx5_vdpa_err_interrupt_handler(void *cb_arg __rte_unused)
 	struct mlx5_vdpa_virtq *virtq;
 	uint64_t sec;
 
-	pthread_mutex_lock(&priv->vq_config_lock);
 	while (mlx5_glue->devx_get_event(priv->err_chnl, &out.event_resp,
 					 sizeof(out.buf)) >=
 				       (ssize_t)sizeof(out.event_resp.cookie)) {
@@ -351,10 +366,11 @@ mlx5_vdpa_err_interrupt_handler(void *cb_arg __rte_unused)
 			continue;
 		}
 		virtq = &priv->virtqs[vq_index];
+		pthread_mutex_lock(&virtq->virtq_lock);
 		if (!virtq->enable || virtq->version != version)
-			continue;
+			goto unlock;
 		if (rte_rdtsc() / rte_get_tsc_hz() < MLX5_VDPA_ERROR_TIME_SEC)
-			continue;
+			goto unlock;
 		virtq->stopped = true;
 		/* Query error info. */
 		if (mlx5_vdpa_virtq_query(priv, vq_index))
@@ -384,8 +400,9 @@ mlx5_vdpa_err_interrupt_handler(void *cb_arg __rte_unused)
 		for (i = 1; i < RTE_DIM(virtq->err_time); i++)
 			virtq->err_time[i - 1] = virtq->err_time[i];
 		virtq->err_time[RTE_DIM(virtq->err_time) - 1] = rte_rdtsc();
+unlock:
+		pthread_mutex_unlock(&virtq->virtq_lock);
 	}
-	pthread_mutex_unlock(&priv->vq_config_lock);
 #endif
 }
 
@@ -533,11 +550,18 @@ mlx5_vdpa_cqe_event_setup(struct mlx5_vdpa_priv *priv)
 void
 mlx5_vdpa_cqe_event_unset(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_virtq *virtq;
 	void *status;
+	int i;
 
 	if (priv->timer_tid) {
 		pthread_cancel(priv->timer_tid);
 		pthread_join(priv->timer_tid, &status);
+		/* The mutex may stay locked after event thread cancel, initiate it. */
+		for (i = 0; i < priv->nr_virtqs; i++) {
+			virtq = &priv->virtqs[i];
+			pthread_mutex_init(&virtq->virtq_lock, NULL);
+		}
 	}
 	priv->timer_tid = 0;
 }
@@ -614,8 +638,9 @@ mlx5_vdpa_qps2rst2rts(struct mlx5_vdpa_event_qp *eqp)
 
 int
 mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
-			  int callfd, struct mlx5_vdpa_event_qp *eqp)
+	int callfd, struct mlx5_vdpa_virtq *virtq)
 {
+	struct mlx5_vdpa_event_qp *eqp = &virtq->eqp;
 	struct mlx5_devx_qp_attr attr = {0};
 	uint16_t log_desc_n = rte_log2_u32(desc_n);
 	uint32_t ret;
@@ -632,7 +657,8 @@ mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 	}
 	if (eqp->fw_qp)
 		mlx5_vdpa_event_qp_destroy(eqp);
-	if (mlx5_vdpa_cq_create(priv, log_desc_n, callfd, &eqp->cq))
+	if (mlx5_vdpa_cq_create(priv, log_desc_n, callfd, virtq) ||
+		!eqp->cq.cq_obj.cq)
 		return -1;
 	attr.pd = priv->cdev->pdn;
 	attr.ts_format =
@@ -650,8 +676,8 @@ mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 	attr.ts_format =
 		mlx5_ts_format_conv(priv->cdev->config.hca_attr.qp_ts_format);
 	ret = mlx5_devx_qp_create(priv->cdev->ctx, &(eqp->sw_qp),
-					attr.num_of_receive_wqes *
-					MLX5_WSEG_SIZE, &attr, SOCKET_ID_ANY);
+				  attr.num_of_receive_wqes * MLX5_WSEG_SIZE,
+				  &attr, SOCKET_ID_ANY);
 	if (ret) {
 		DRV_LOG(ERR, "Failed to create SW QP(%u).", rte_errno);
 		goto error;
@@ -668,3 +694,4 @@ mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 	mlx5_vdpa_event_qp_destroy(eqp);
 	return -1;
 }
+
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
index a8faf0c116..efebf364d0 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
@@ -25,11 +25,18 @@ mlx5_vdpa_logging_enable(struct mlx5_vdpa_priv *priv, int enable)
 		if (!virtq->configured) {
 			DRV_LOG(DEBUG, "virtq %d is invalid for dirty bitmap "
 				"enabling.", i);
-		} else if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
+		} else {
+			struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
+
+			pthread_mutex_lock(&virtq->virtq_lock);
+			if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
 			   &attr)) {
-			DRV_LOG(ERR, "Failed to modify virtq %d for dirty "
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				DRV_LOG(ERR, "Failed to modify virtq %d for dirty "
 				"bitmap enabling.", i);
-			return -1;
+				return -1;
+			}
+			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
 	}
 	return 0;
@@ -61,10 +68,19 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
 		virtq = &priv->virtqs[i];
 		if (!virtq->configured) {
 			DRV_LOG(DEBUG, "virtq %d is invalid for LM.", i);
-		} else if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
-						      &attr)) {
-			DRV_LOG(ERR, "Failed to modify virtq %d for LM.", i);
-			goto err;
+		} else {
+			struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
+
+			pthread_mutex_lock(&virtq->virtq_lock);
+			if (mlx5_devx_cmd_modify_virtq(
+					priv->virtqs[i].virtq,
+					&attr)) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				DRV_LOG(ERR,
+				"Failed to modify virtq %d for LM.", i);
+				goto err;
+			}
+			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
 	}
 	return 0;
@@ -79,6 +95,7 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
 int
 mlx5_vdpa_lm_log(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_virtq *virtq;
 	uint64_t features;
 	int ret = rte_vhost_get_negotiated_features(priv->vid, &features);
 	int i;
@@ -90,10 +107,13 @@ mlx5_vdpa_lm_log(struct mlx5_vdpa_priv *priv)
 	if (!RTE_VHOST_NEED_LOG(features))
 		return 0;
 	for (i = 0; i < priv->nr_virtqs; ++i) {
+		virtq = &priv->virtqs[i];
 		if (!priv->virtqs[i].virtq) {
 			DRV_LOG(DEBUG, "virtq %d is invalid for LM log.", i);
 		} else {
+			pthread_mutex_lock(&virtq->virtq_lock);
 			ret = mlx5_vdpa_virtq_stop(priv, i);
+			pthread_mutex_unlock(&virtq->virtq_lock);
 			if (ret) {
 				DRV_LOG(ERR, "Failed to stop virtq %d for LM "
 					"log.", i);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
index d4b4375c88..4cbf09784e 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
@@ -237,19 +237,24 @@ mlx5_vdpa_rss_flows_create(struct mlx5_vdpa_priv *priv)
 int
 mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv)
 {
-	int ret = mlx5_vdpa_rqt_prepare(priv);
+	int ret;
 
+	pthread_mutex_lock(&priv->steer_update_lock);
+	ret = mlx5_vdpa_rqt_prepare(priv);
 	if (ret == 0) {
 		mlx5_vdpa_steer_unset(priv);
 	} else if (ret < 0) {
+		pthread_mutex_unlock(&priv->steer_update_lock);
 		return ret;
 	} else if (!priv->steer.rss[0].flow) {
 		ret = mlx5_vdpa_rss_flows_create(priv);
 		if (ret) {
 			DRV_LOG(ERR, "Cannot create RSS flows.");
+			pthread_mutex_unlock(&priv->steer_update_lock);
 			return -1;
 		}
 	}
+	pthread_mutex_unlock(&priv->steer_update_lock);
 	return 0;
 }
 
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 55cbc9fad2..138b7bdbc5 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -24,13 +24,17 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 	int nbytes;
 	int retry;
 
+	pthread_mutex_lock(&virtq->virtq_lock);
 	if (priv->state != MLX5_VDPA_STATE_CONFIGURED && !virtq->enable) {
+		pthread_mutex_unlock(&virtq->virtq_lock);
 		DRV_LOG(ERR,  "device %d queue %d down, skip kick handling",
 			priv->vid, virtq->index);
 		return;
 	}
-	if (rte_intr_fd_get(virtq->intr_handle) < 0)
+	if (rte_intr_fd_get(virtq->intr_handle) < 0) {
+		pthread_mutex_unlock(&virtq->virtq_lock);
 		return;
+	}
 	for (retry = 0; retry < 3; ++retry) {
 		nbytes = read(rte_intr_fd_get(virtq->intr_handle), &buf,
 			      8);
@@ -44,9 +48,14 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 		}
 		break;
 	}
-	if (nbytes < 0)
+	if (nbytes < 0) {
+		pthread_mutex_unlock(&virtq->virtq_lock);
 		return;
+	}
+	rte_spinlock_lock(&priv->db_lock);
 	rte_write32(virtq->index, priv->virtq_db_addr);
+	rte_spinlock_unlock(&priv->db_lock);
+	pthread_mutex_unlock(&virtq->virtq_lock);
 	if (priv->state != MLX5_VDPA_STATE_CONFIGURED && !virtq->enable) {
 		DRV_LOG(ERR,  "device %d queue %d down, skip kick handling",
 			priv->vid, virtq->index);
@@ -66,6 +75,33 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 	DRV_LOG(DEBUG, "Ring virtq %u doorbell.", virtq->index);
 }
 
+/* Virtq must be locked before calling this function. */
+static void
+mlx5_vdpa_virtq_unregister_intr_handle(struct mlx5_vdpa_virtq *virtq)
+{
+	int ret = -EAGAIN;
+
+	if (!virtq->intr_handle)
+		return;
+	if (rte_intr_fd_get(virtq->intr_handle) >= 0) {
+		while (ret == -EAGAIN) {
+			ret = rte_intr_callback_unregister(virtq->intr_handle,
+					mlx5_vdpa_virtq_kick_handler, virtq);
+			if (ret == -EAGAIN) {
+				DRV_LOG(DEBUG, "Try again to unregister fd %d of virtq %hu interrupt",
+					rte_intr_fd_get(virtq->intr_handle),
+					virtq->index);
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				usleep(MLX5_VDPA_INTR_RETRIES_USEC);
+				pthread_mutex_lock(&virtq->virtq_lock);
+			}
+		}
+		(void)rte_intr_fd_set(virtq->intr_handle, -1);
+	}
+	rte_intr_instance_free(virtq->intr_handle);
+	virtq->intr_handle = NULL;
+}
+
 /* Release cached VQ resources. */
 void
 mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
@@ -75,6 +111,7 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
+		pthread_mutex_lock(&virtq->virtq_lock);
 		virtq->configured = 0;
 		for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
 			if (virtq->umems[j].obj) {
@@ -90,28 +127,17 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 		}
 		if (virtq->eqp.fw_qp)
 			mlx5_vdpa_event_qp_destroy(&virtq->eqp);
+		pthread_mutex_unlock(&virtq->virtq_lock);
 	}
 }
 
+
 static int
 mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 {
 	int ret = -EAGAIN;
 
-	if (rte_intr_fd_get(virtq->intr_handle) >= 0) {
-		while (ret == -EAGAIN) {
-			ret = rte_intr_callback_unregister(virtq->intr_handle,
-					mlx5_vdpa_virtq_kick_handler, virtq);
-			if (ret == -EAGAIN) {
-				DRV_LOG(DEBUG, "Try again to unregister fd %d of virtq %hu interrupt",
-					rte_intr_fd_get(virtq->intr_handle),
-					virtq->index);
-				usleep(MLX5_VDPA_INTR_RETRIES_USEC);
-			}
-		}
-		rte_intr_fd_set(virtq->intr_handle, -1);
-	}
-	rte_intr_instance_free(virtq->intr_handle);
+	mlx5_vdpa_virtq_unregister_intr_handle(virtq);
 	if (virtq->configured) {
 		ret = mlx5_vdpa_virtq_stop(virtq->priv, virtq->index);
 		if (ret)
@@ -128,10 +154,15 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 void
 mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_virtq *virtq;
 	int i;
 
-	for (i = 0; i < priv->nr_virtqs; i++)
-		mlx5_vdpa_virtq_unset(&priv->virtqs[i]);
+	for (i = 0; i < priv->nr_virtqs; i++) {
+		virtq = &priv->virtqs[i];
+		pthread_mutex_lock(&virtq->virtq_lock);
+		mlx5_vdpa_virtq_unset(virtq);
+		pthread_mutex_unlock(&virtq->virtq_lock);
+	}
 	priv->features = 0;
 	priv->nr_virtqs = 0;
 }
@@ -250,7 +281,7 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 	MLX5_VIRTQ_EVENT_MODE_QP : MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
 	if (attr->event_mode == MLX5_VIRTQ_EVENT_MODE_QP) {
 		ret = mlx5_vdpa_event_qp_prepare(priv,
-				vq->size, vq->callfd, &virtq->eqp);
+				vq->size, vq->callfd, virtq);
 		if (ret) {
 			DRV_LOG(ERR,
 				"Failed to create event QPs for virtq %d.",
@@ -420,7 +451,9 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 	}
 	claim_zero(rte_vhost_enable_guest_notification(priv->vid, index, 1));
 	virtq->configured = 1;
+	rte_spinlock_lock(&priv->db_lock);
 	rte_write32(virtq->index, priv->virtq_db_addr);
+	rte_spinlock_unlock(&priv->db_lock);
 	/* Setup doorbell mapping. */
 	virtq->intr_handle =
 		rte_intr_instance_alloc(RTE_INTR_INSTANCE_F_SHARED);
@@ -441,7 +474,7 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 		if (rte_intr_callback_register(virtq->intr_handle,
 					       mlx5_vdpa_virtq_kick_handler,
 					       virtq)) {
-			rte_intr_fd_set(virtq->intr_handle, -1);
+			(void)rte_intr_fd_set(virtq->intr_handle, -1);
 			DRV_LOG(ERR, "Failed to register virtq %d interrupt.",
 				index);
 			goto error;
@@ -537,6 +570,7 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 	uint32_t i;
 	uint16_t nr_vring = rte_vhost_get_vring_num(priv->vid);
 	int ret = rte_vhost_get_negotiated_features(priv->vid, &priv->features);
+	struct mlx5_vdpa_virtq *virtq;
 
 	if (ret || mlx5_vdpa_features_validate(priv)) {
 		DRV_LOG(ERR, "Failed to configure negotiated features.");
@@ -556,9 +590,17 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 		return -1;
 	}
 	priv->nr_virtqs = nr_vring;
-	for (i = 0; i < nr_vring; i++)
-		if (priv->virtqs[i].enable && mlx5_vdpa_virtq_setup(priv, i))
-			goto error;
+	for (i = 0; i < nr_vring; i++) {
+		virtq = &priv->virtqs[i];
+		if (virtq->enable) {
+			pthread_mutex_lock(&virtq->virtq_lock);
+			if (mlx5_vdpa_virtq_setup(priv, i)) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				goto error;
+			}
+			pthread_mutex_unlock(&virtq->virtq_lock);
+		}
+	}
 	return 0;
 error:
 	mlx5_vdpa_virtqs_release(priv);
-- 
2.30.2


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v2 08/15] vdpa/mlx5: add multi-thread management for configuration
  2022-06-16  2:29 ` [PATCH v2 00/15] mlx5/vdpa: optimize live migration time Li Zhang
                     ` (6 preceding siblings ...)
  2022-06-16  2:30   ` [PATCH v2 07/15] vdpa/mlx5: optimize datapath-control synchronization Li Zhang
@ 2022-06-16  2:30   ` Li Zhang
  2022-06-16  2:30   ` [PATCH v2 09/15] vdpa/mlx5: add task ring for MT management Li Zhang
                     ` (7 subsequent siblings)
  15 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-16  2:30 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

The LM process includes many object creations and destructions in
the source and the destination servers. The longer the LM takes,
the more packets the VM drops. To shorten the LM time, the mlx5 FW
configurations must be issued in parallel, so add internal
multi-thread management to the driver for this purpose.

A new devarg defines the number of threads and their CPU core.
The thread management is shared among all the devices of the driver.
Since the event_core devarg also pins the datapath event thread,
reduce the priority of the datapath event thread so that the devices
doing the LM can be configured quickly.
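
For illustration only, a minimal standalone sketch of this thread setup
(not the driver code; spawn_workers() and worker_main() are made-up
names): every configuration worker runs with SCHED_RR at the maximum
priority and is pinned to one chosen core, while a datapath event
thread on the same core would keep one priority level lower.

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <unistd.h>

/* Placeholder body; the real workers would poll their task rings. */
static void *
worker_main(void *arg)
{
	(void)arg;
	for (;;)
		sleep(1);
	return NULL;
}

static int
spawn_workers(pthread_t *tids, int nb_thrds, int cpu_core)
{
	struct sched_param sp = {
		.sched_priority = sched_get_priority_max(SCHED_RR),
	};
	pthread_attr_t attr;
	cpu_set_t cpuset;
	int i;

	/* SCHED_RR at the top priority normally needs CAP_SYS_NICE. */
	pthread_attr_init(&attr);
	/* Without EXPLICIT_SCHED the attr policy would be ignored. */
	pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
	pthread_attr_setschedpolicy(&attr, SCHED_RR);
	pthread_attr_setschedparam(&attr, &sp);
	CPU_ZERO(&cpuset);
	CPU_SET(cpu_core, &cpuset);
	for (i = 0; i < nb_thrds; i++) {
		if (pthread_create(&tids[i], &attr, worker_main, NULL))
			return -1;
		/* All workers share the core of the event thread. */
		if (pthread_setaffinity_np(tids[i], sizeof(cpuset), &cpuset))
			return -1;
	}
	return 0;
}

int
main(void)
{
	pthread_t tids[4];

	return spawn_workers(tids, 4, 0);
}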

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 doc/guides/vdpadevs/mlx5.rst          |  11 +++
 drivers/vdpa/mlx5/meson.build         |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.c         |  41 ++++++++
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  36 +++++++
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 129 ++++++++++++++++++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_event.c   |   2 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   |   8 +-
 7 files changed, 223 insertions(+), 5 deletions(-)
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c

diff --git a/doc/guides/vdpadevs/mlx5.rst b/doc/guides/vdpadevs/mlx5.rst
index 0ad77bf535..b75a01688d 100644
--- a/doc/guides/vdpadevs/mlx5.rst
+++ b/doc/guides/vdpadevs/mlx5.rst
@@ -78,6 +78,17 @@ for an additional list of options shared with other mlx5 drivers.
   CPU core number to set polling thread affinity to, default to control plane
   cpu.
 
+- ``max_conf_threads`` parameter [int]
+
+  Allow the driver to use internal threads to obtain fast configuration.
+  All the threads will be opened on the same core as the event completion queue scheduling thread.
+
+  - 0, default, don't use internal threads for configuration.
+
+  - 1 - 256, number of internal threads in addition to the caller thread (8 is suggested).
+    This value, if not 0, should be the same for all the devices;
+    the first probe will take it with the event_core for all the multi-thread configurations in the driver.
+
 - ``hw_latency_mode`` parameter [int]
 
   The completion queue moderation mode:
diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build
index 0fa82ad257..9d8dbb1a82 100644
--- a/drivers/vdpa/mlx5/meson.build
+++ b/drivers/vdpa/mlx5/meson.build
@@ -15,6 +15,7 @@ sources = files(
         'mlx5_vdpa_virtq.c',
         'mlx5_vdpa_steer.c',
         'mlx5_vdpa_lm.c',
+        'mlx5_vdpa_cthread.c',
 )
 cflags_options = [
         '-std=c11',
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index e5a11f72fd..a9d023ed08 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -50,6 +50,8 @@ TAILQ_HEAD(mlx5_vdpa_privs, mlx5_vdpa_priv) priv_list =
 					      TAILQ_HEAD_INITIALIZER(priv_list);
 static pthread_mutex_t priv_list_lock = PTHREAD_MUTEX_INITIALIZER;
 
+struct mlx5_vdpa_conf_thread_mng conf_thread_mng;
+
 static void mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv);
 
 static struct mlx5_vdpa_priv *
@@ -493,6 +495,29 @@ mlx5_vdpa_args_check_handler(const char *key, const char *val, void *opaque)
 			DRV_LOG(WARNING, "Invalid event_core %s.", val);
 		else
 			priv->event_core = tmp;
+	} else if (strcmp(key, "max_conf_threads") == 0) {
+		if (tmp) {
+			priv->use_c_thread = true;
+			if (!conf_thread_mng.initializer_priv) {
+				conf_thread_mng.initializer_priv = priv;
+				if (tmp > MLX5_VDPA_MAX_C_THRD) {
+					DRV_LOG(WARNING,
+				"Invalid max_conf_threads %s "
+				"and set max_conf_threads to %d",
+				val, MLX5_VDPA_MAX_C_THRD);
+					tmp = MLX5_VDPA_MAX_C_THRD;
+				}
+				conf_thread_mng.max_thrds = tmp;
+			} else if (tmp != conf_thread_mng.max_thrds) {
+				DRV_LOG(WARNING,
+	"max_conf_threads is PMD argument and not per device, "
+	"only the first device configuration set it, current value is %d "
+	"and will not be changed to %d.",
+				conf_thread_mng.max_thrds, (int)tmp);
+			}
+		} else {
+			priv->use_c_thread = false;
+		}
 	} else if (strcmp(key, "hw_latency_mode") == 0) {
 		priv->hw_latency_mode = (uint32_t)tmp;
 	} else if (strcmp(key, "hw_max_latency_us") == 0) {
@@ -521,6 +546,9 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 		"hw_max_latency_us",
 		"hw_max_pending_comp",
 		"no_traffic_time",
+		"queue_size",
+		"queues",
+		"max_conf_threads",
 		NULL,
 	};
 
@@ -725,6 +753,13 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 	pthread_mutex_init(&priv->steer_update_lock, NULL);
 	priv->cdev = cdev;
 	mlx5_vdpa_config_get(mkvlist, priv);
+	if (priv->use_c_thread) {
+		if (conf_thread_mng.initializer_priv == priv)
+			if (mlx5_vdpa_mult_threads_create(priv->event_core))
+				goto error;
+		__atomic_fetch_add(&conf_thread_mng.refcnt, 1,
+			__ATOMIC_RELAXED);
+	}
 	if (mlx5_vdpa_create_dev_resources(priv))
 		goto error;
 	priv->vdev = rte_vdpa_register_device(cdev->dev, &mlx5_vdpa_ops);
@@ -739,6 +774,8 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 	pthread_mutex_unlock(&priv_list_lock);
 	return 0;
 error:
+	if (conf_thread_mng.initializer_priv == priv)
+		mlx5_vdpa_mult_threads_destroy(false);
 	if (priv)
 		mlx5_vdpa_dev_release(priv);
 	return -rte_errno;
@@ -806,6 +843,10 @@ mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv)
 	mlx5_vdpa_release_dev_resources(priv);
 	if (priv->vdev)
 		rte_vdpa_unregister_device(priv->vdev);
+	if (priv->use_c_thread)
+		if (__atomic_fetch_sub(&conf_thread_mng.refcnt,
+			1, __ATOMIC_RELAXED) == 1)
+			mlx5_vdpa_mult_threads_destroy(true);
 	rte_free(priv);
 }
 
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 3fd5eefc5e..4e7c2557b7 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -73,6 +73,22 @@ enum {
 	MLX5_VDPA_NOTIFIER_STATE_ERR
 };
 
+#define MLX5_VDPA_MAX_C_THRD 256
+
+/* Generic mlx5_vdpa_c_thread information. */
+struct mlx5_vdpa_c_thread {
+	pthread_t tid;
+};
+
+struct mlx5_vdpa_conf_thread_mng {
+	void *initializer_priv;
+	uint32_t refcnt;
+	uint32_t max_thrds;
+	pthread_mutex_t cthrd_lock;
+	struct mlx5_vdpa_c_thread cthrd[MLX5_VDPA_MAX_C_THRD];
+};
+extern struct mlx5_vdpa_conf_thread_mng conf_thread_mng;
+
 struct mlx5_vdpa_virtq {
 	SLIST_ENTRY(mlx5_vdpa_virtq) next;
 	uint8_t enable;
@@ -126,6 +142,7 @@ enum mlx5_dev_state {
 struct mlx5_vdpa_priv {
 	TAILQ_ENTRY(mlx5_vdpa_priv) next;
 	bool connected;
+	bool use_c_thread;
 	enum mlx5_dev_state state;
 	rte_spinlock_t db_lock;
 	pthread_mutex_t steer_update_lock;
@@ -496,4 +513,23 @@ mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv);
 
 bool
 mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv);
+
+/**
+ * Create configuration multi-threads resource
+ *
+ * @param[in] cpu_core
+ *   CPU core number to set configuration threads affinity to.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int
+mlx5_vdpa_mult_threads_create(int cpu_core);
+
+/**
+ * Destroy configuration multi-threads resource
+ *
+ */
+void
+mlx5_vdpa_mult_threads_destroy(bool need_unlock);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
new file mode 100644
index 0000000000..ba7d8b63b3
--- /dev/null
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -0,0 +1,129 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 NVIDIA Corporation & Affiliates
+ */
+#include <string.h>
+#include <unistd.h>
+#include <sys/eventfd.h>
+
+#include <rte_malloc.h>
+#include <rte_errno.h>
+#include <rte_io.h>
+#include <rte_alarm.h>
+#include <rte_tailq.h>
+#include <rte_ring_elem.h>
+
+#include <mlx5_common.h>
+
+#include "mlx5_vdpa_utils.h"
+#include "mlx5_vdpa.h"
+
+static void *
+mlx5_vdpa_c_thread_handle(void *arg)
+{
+	/* To be added later. */
+	return arg;
+}
+
+static void
+mlx5_vdpa_c_thread_destroy(uint32_t thrd_idx, bool need_unlock)
+{
+	if (conf_thread_mng.cthrd[thrd_idx].tid) {
+		pthread_cancel(conf_thread_mng.cthrd[thrd_idx].tid);
+		pthread_join(conf_thread_mng.cthrd[thrd_idx].tid, NULL);
+		conf_thread_mng.cthrd[thrd_idx].tid = 0;
+		if (need_unlock)
+			pthread_mutex_init(&conf_thread_mng.cthrd_lock, NULL);
+	}
+}
+
+static int
+mlx5_vdpa_c_thread_create(int cpu_core)
+{
+	const struct sched_param sp = {
+		.sched_priority = sched_get_priority_max(SCHED_RR),
+	};
+	rte_cpuset_t cpuset;
+	pthread_attr_t attr;
+	uint32_t thrd_idx;
+	char name[32];
+	int ret;
+
+	pthread_mutex_lock(&conf_thread_mng.cthrd_lock);
+	pthread_attr_init(&attr);
+	ret = pthread_attr_setschedpolicy(&attr, SCHED_RR);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to set thread sched policy = RR.");
+		goto c_thread_err;
+	}
+	ret = pthread_attr_setschedparam(&attr, &sp);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to set thread priority.");
+		goto c_thread_err;
+	}
+	for (thrd_idx = 0; thrd_idx < conf_thread_mng.max_thrds;
+		thrd_idx++) {
+		ret = pthread_create(&conf_thread_mng.cthrd[thrd_idx].tid,
+				&attr, mlx5_vdpa_c_thread_handle,
+				(void *)&conf_thread_mng);
+		if (ret) {
+			DRV_LOG(ERR, "Failed to create vdpa multi-threads %d.",
+					thrd_idx);
+			goto c_thread_err;
+		}
+		CPU_ZERO(&cpuset);
+		if (cpu_core != -1)
+			CPU_SET(cpu_core, &cpuset);
+		else
+			cpuset = rte_lcore_cpuset(rte_get_main_lcore());
+		ret = pthread_setaffinity_np(
+				conf_thread_mng.cthrd[thrd_idx].tid,
+				sizeof(cpuset), &cpuset);
+		if (ret) {
+			DRV_LOG(ERR, "Failed to set thread affinity for "
+			"vdpa multi-threads %d.", thrd_idx);
+			goto c_thread_err;
+		}
+		snprintf(name, sizeof(name), "vDPA-mthread-%d", thrd_idx);
+		ret = pthread_setname_np(
+				conf_thread_mng.cthrd[thrd_idx].tid, name);
+		if (ret)
+			DRV_LOG(ERR, "Failed to set vdpa multi-threads name %s.",
+					name);
+		else
+			DRV_LOG(DEBUG, "Thread name: %s.", name);
+	}
+	pthread_mutex_unlock(&conf_thread_mng.cthrd_lock);
+	return 0;
+c_thread_err:
+	for (thrd_idx = 0; thrd_idx < conf_thread_mng.max_thrds;
+		thrd_idx++)
+		mlx5_vdpa_c_thread_destroy(thrd_idx, false);
+	pthread_mutex_unlock(&conf_thread_mng.cthrd_lock);
+	return -1;
+}
+
+int
+mlx5_vdpa_mult_threads_create(int cpu_core)
+{
+	pthread_mutex_init(&conf_thread_mng.cthrd_lock, NULL);
+	if (mlx5_vdpa_c_thread_create(cpu_core)) {
+		DRV_LOG(ERR, "Cannot create vDPA configuration threads.");
+		mlx5_vdpa_mult_threads_destroy(false);
+		return -1;
+	}
+	return 0;
+}
+
+void
+mlx5_vdpa_mult_threads_destroy(bool need_unlock)
+{
+	uint32_t thrd_idx;
+
+	if (!conf_thread_mng.initializer_priv)
+		return;
+	for (thrd_idx = 0; thrd_idx < conf_thread_mng.max_thrds;
+		thrd_idx++)
+		mlx5_vdpa_c_thread_destroy(thrd_idx, need_unlock);
+	pthread_mutex_destroy(&conf_thread_mng.cthrd_lock);
+	memset(&conf_thread_mng, 0, sizeof(struct mlx5_vdpa_conf_thread_mng));
+}
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index 2b0f5936d1..b45fbac146 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -507,7 +507,7 @@ mlx5_vdpa_cqe_event_setup(struct mlx5_vdpa_priv *priv)
 	pthread_attr_t attr;
 	char name[16];
 	const struct sched_param sp = {
-		.sched_priority = sched_get_priority_max(SCHED_RR),
+		.sched_priority = sched_get_priority_max(SCHED_RR) - 1,
 	};
 
 	if (!priv->eventc)
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 138b7bdbc5..599809b09b 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -43,7 +43,7 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 			    errno == EWOULDBLOCK ||
 			    errno == EAGAIN)
 				continue;
-			DRV_LOG(ERR,  "Failed to read kickfd of virtq %d: %s",
+			DRV_LOG(ERR,  "Failed to read kickfd of virtq %d: %s.",
 				virtq->index, strerror(errno));
 		}
 		break;
@@ -57,7 +57,7 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 	rte_spinlock_unlock(&priv->db_lock);
 	pthread_mutex_unlock(&virtq->virtq_lock);
 	if (priv->state != MLX5_VDPA_STATE_CONFIGURED && !virtq->enable) {
-		DRV_LOG(ERR,  "device %d queue %d down, skip kick handling",
+		DRV_LOG(ERR,  "device %d queue %d down, skip kick handling.",
 			priv->vid, virtq->index);
 		return;
 	}
@@ -218,7 +218,7 @@ mlx5_vdpa_virtq_query(struct mlx5_vdpa_priv *priv, int index)
 		return -1;
 	}
 	if (attr.state == MLX5_VIRTQ_STATE_ERROR)
-		DRV_LOG(WARNING, "vid %d vring %d hw error=%hhu",
+		DRV_LOG(WARNING, "vid %d vring %d hw error=%hhu.",
 			priv->vid, index, attr.error_type);
 	return 0;
 }
@@ -380,7 +380,7 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 	if (ret) {
 		last_avail_idx = 0;
 		last_used_idx = 0;
-		DRV_LOG(WARNING, "Couldn't get vring base, idx are set to 0");
+		DRV_LOG(WARNING, "Couldn't get vring base, idx are set to 0.");
 	} else {
 		DRV_LOG(INFO, "vid %d: Init last_avail_idx=%d, last_used_idx=%d for "
 				"virtq %d.", priv->vid, last_avail_idx,
-- 
2.30.2


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v2 09/15] vdpa/mlx5: add task ring for MT management
  2022-06-16  2:29 ` [PATCH v2 00/15] mlx5/vdpa: optimize live migration time Li Zhang
                     ` (7 preceding siblings ...)
  2022-06-16  2:30   ` [PATCH v2 08/15] vdpa/mlx5: add multi-thread management for configuration Li Zhang
@ 2022-06-16  2:30   ` Li Zhang
  2022-06-16  2:30   ` [PATCH v2 10/15] vdpa/mlx5: add MT task for VM memory registration Li Zhang
                     ` (6 subsequent siblings)
  15 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-16  2:30 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

The configuration thread tasks need a container that allows
multiple tasks to be assigned to a thread in parallel.
Use an rte_ring per thread to manage the thread tasks without locks.
The caller thread, from the user context, opens a task for a thread
and enqueues it to that thread's ring, while the thread polls its
ring and dequeues the tasks. That is why the ring operates in
multi-producer, single-consumer mode.
An atomic counter manages the task completion notification.
The threads report errors to the caller through a dedicated
error counter per task.
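
As a rough illustration of this pattern (a standalone sketch, not the
driver code; struct task, post_tasks() and worker_poll_once() are
made-up names, the rte_ring element API calls are the regular DPDK
ones), each worker owns one ring of fixed-size task records:

#include <stdint.h>

#include <rte_eal.h>
#include <rte_lcore.h>
#include <rte_ring.h>
#include <rte_ring_elem.h>

/* One record per task; must stay small and fixed-size for the ring. */
struct task {
	uint32_t idx;        /* object index the worker should handle */
	uint32_t *remaining; /* caller waits for this to drop to 0 */
	uint32_t *err;       /* workers increment this on failure */
};

static struct rte_ring *task_ring;

/* Caller side: may run from several device contexts, hence MP ring. */
static int
post_tasks(struct task *tasks, unsigned int n, uint32_t *remaining)
{
	/* Count first so a fast worker cannot underflow the counter. */
	__atomic_fetch_add(remaining, n, __ATOMIC_RELAXED);
	if (rte_ring_enqueue_bulk_elem(task_ring, tasks, sizeof(*tasks),
				       n, NULL) != n) {
		__atomic_fetch_sub(remaining, n, __ATOMIC_RELAXED);
		return -1;
	}
	return 0;
}

/* Worker side: the only consumer of its own ring, so no lock needed. */
static void
worker_poll_once(void)
{
	struct task t;

	if (rte_ring_dequeue_bulk_elem(task_ring, &t, sizeof(t), 1,
				       NULL) != 1)
		return;
	/* ...execute the FW configuration for t.idx here... */
	__atomic_fetch_sub(t.remaining, 1, __ATOMIC_RELAXED);
}

int
main(int argc, char **argv)
{
	uint32_t remaining = 0, err = 0;
	struct task t = { .idx = 0, .remaining = &remaining, .err = &err };

	if (rte_eal_init(argc, argv) < 0)
		return -1;
	task_ring = rte_ring_create_elem("cfg-tasks", sizeof(struct task),
					 1024, rte_socket_id(),
					 RING_F_MP_HTS_ENQ | RING_F_SC_DEQ);
	if (task_ring == NULL || post_tasks(&t, 1, &remaining))
		return -1;
	worker_poll_once();
	return 0;
}

Bulk enqueue/dequeue keeps the multi-producer path short, and the
shared counters let the caller block only once per bulk of tasks
rather than once per task.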

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  17 ++++
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 115 +++++++++++++++++++++++++-
 2 files changed, 130 insertions(+), 2 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 4e7c2557b7..2bbb868ec6 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -74,10 +74,22 @@ enum {
 };
 
 #define MLX5_VDPA_MAX_C_THRD 256
+#define MLX5_VDPA_MAX_TASKS_PER_THRD 4096
+#define MLX5_VDPA_TASKS_PER_DEV 64
+
+/* Generic task information and size must be multiple of 4B. */
+struct mlx5_vdpa_task {
+	struct mlx5_vdpa_priv *priv;
+	uint32_t *remaining_cnt;
+	uint32_t *err_cnt;
+	uint32_t idx;
+} __rte_packed __rte_aligned(4);
 
 /* Generic mlx5_vdpa_c_thread information. */
 struct mlx5_vdpa_c_thread {
 	pthread_t tid;
+	struct rte_ring *rng;
+	pthread_cond_t c_cond;
 };
 
 struct mlx5_vdpa_conf_thread_mng {
@@ -532,4 +544,9 @@ mlx5_vdpa_mult_threads_create(int cpu_core);
  */
 void
 mlx5_vdpa_mult_threads_destroy(bool need_unlock);
+
+bool
+mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
+		uint32_t thrd_idx,
+		uint32_t num);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index ba7d8b63b3..1fdc92d3ad 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -11,17 +11,103 @@
 #include <rte_alarm.h>
 #include <rte_tailq.h>
 #include <rte_ring_elem.h>
+#include <rte_ring_peek.h>
 
 #include <mlx5_common.h>
 
 #include "mlx5_vdpa_utils.h"
 #include "mlx5_vdpa.h"
 
+static inline uint32_t
+mlx5_vdpa_c_thrd_ring_dequeue_bulk(struct rte_ring *r,
+	void **obj, uint32_t n, uint32_t *avail)
+{
+	uint32_t m;
+
+	m = rte_ring_dequeue_bulk_elem_start(r, obj,
+		sizeof(struct mlx5_vdpa_task), n, avail);
+	n = (m == n) ? n : 0;
+	rte_ring_dequeue_elem_finish(r, n);
+	return n;
+}
+
+static inline uint32_t
+mlx5_vdpa_c_thrd_ring_enqueue_bulk(struct rte_ring *r,
+	void * const *obj, uint32_t n, uint32_t *free)
+{
+	uint32_t m;
+
+	m = rte_ring_enqueue_bulk_elem_start(r, n, free);
+	n = (m == n) ? n : 0;
+	rte_ring_enqueue_elem_finish(r, obj,
+		sizeof(struct mlx5_vdpa_task), n);
+	return n;
+}
+
+bool
+mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
+		uint32_t thrd_idx,
+		uint32_t num)
+{
+	struct rte_ring *rng = conf_thread_mng.cthrd[thrd_idx].rng;
+	struct mlx5_vdpa_task task[MLX5_VDPA_TASKS_PER_DEV];
+	uint32_t i;
+
+	MLX5_ASSERT(num <= MLX5_VDPA_TASKS_PER_DEV);
+	for (i = 0 ; i < num; i++) {
+		task[i].priv = priv;
+		/* To be added later. */
+	}
+	if (!mlx5_vdpa_c_thrd_ring_enqueue_bulk(rng, (void **)&task, num, NULL))
+		return -1;
+	for (i = 0 ; i < num; i++)
+		if (task[i].remaining_cnt)
+			__atomic_fetch_add(task[i].remaining_cnt, 1,
+				__ATOMIC_RELAXED);
+	/* wake up conf thread. */
+	pthread_mutex_lock(&conf_thread_mng.cthrd_lock);
+	pthread_cond_signal(&conf_thread_mng.cthrd[thrd_idx].c_cond);
+	pthread_mutex_unlock(&conf_thread_mng.cthrd_lock);
+	return 0;
+}
+
 static void *
 mlx5_vdpa_c_thread_handle(void *arg)
 {
-	/* To be added later. */
-	return arg;
+	struct mlx5_vdpa_conf_thread_mng *multhrd = arg;
+	pthread_t thread_id = pthread_self();
+	struct mlx5_vdpa_priv *priv;
+	struct mlx5_vdpa_task task;
+	struct rte_ring *rng;
+	uint32_t thrd_idx;
+	uint32_t task_num;
+
+	for (thrd_idx = 0; thrd_idx < multhrd->max_thrds;
+		thrd_idx++)
+		if (multhrd->cthrd[thrd_idx].tid == thread_id)
+			break;
+	if (thrd_idx >= multhrd->max_thrds)
+		return NULL;
+	rng = multhrd->cthrd[thrd_idx].rng;
+	while (1) {
+		task_num = mlx5_vdpa_c_thrd_ring_dequeue_bulk(rng,
+			(void **)&task, 1, NULL);
+		if (!task_num) {
+			/* No task and condition wait. */
+			pthread_mutex_lock(&multhrd->cthrd_lock);
+			pthread_cond_wait(
+				&multhrd->cthrd[thrd_idx].c_cond,
+				&multhrd->cthrd_lock);
+			pthread_mutex_unlock(&multhrd->cthrd_lock);
+		}
+		priv = task.priv;
+		if (priv == NULL)
+			continue;
+		__atomic_fetch_sub(task.remaining_cnt,
+			1, __ATOMIC_RELAXED);
+		/* To be added later. */
+	}
+	return NULL;
 }
 
 static void
@@ -34,6 +120,10 @@ mlx5_vdpa_c_thread_destroy(uint32_t thrd_idx, bool need_unlock)
 		if (need_unlock)
 			pthread_mutex_init(&conf_thread_mng.cthrd_lock, NULL);
 	}
+	if (conf_thread_mng.cthrd[thrd_idx].rng) {
+		rte_ring_free(conf_thread_mng.cthrd[thrd_idx].rng);
+		conf_thread_mng.cthrd[thrd_idx].rng = NULL;
+	}
 }
 
 static int
@@ -45,6 +135,7 @@ mlx5_vdpa_c_thread_create(int cpu_core)
 	rte_cpuset_t cpuset;
 	pthread_attr_t attr;
 	uint32_t thrd_idx;
+	uint32_t ring_num;
 	char name[32];
 	int ret;
 
@@ -60,8 +151,26 @@ mlx5_vdpa_c_thread_create(int cpu_core)
 		DRV_LOG(ERR, "Failed to set thread priority.");
 		goto c_thread_err;
 	}
+	ring_num = MLX5_VDPA_MAX_TASKS_PER_THRD / conf_thread_mng.max_thrds;
+	if (!ring_num) {
+		DRV_LOG(ERR, "Invalid ring number for thread.");
+		goto c_thread_err;
+	}
 	for (thrd_idx = 0; thrd_idx < conf_thread_mng.max_thrds;
 		thrd_idx++) {
+		snprintf(name, sizeof(name), "vDPA-mthread-ring-%d",
+			thrd_idx);
+		conf_thread_mng.cthrd[thrd_idx].rng = rte_ring_create_elem(name,
+			sizeof(struct mlx5_vdpa_task), ring_num,
+			rte_socket_id(),
+			RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ |
+			RING_F_EXACT_SZ);
+		if (!conf_thread_mng.cthrd[thrd_idx].rng) {
+			DRV_LOG(ERR,
+			"Failed to create vdpa multi-threads %d ring.",
+			thrd_idx);
+			goto c_thread_err;
+		}
 		ret = pthread_create(&conf_thread_mng.cthrd[thrd_idx].tid,
 				&attr, mlx5_vdpa_c_thread_handle,
 				(void *)&conf_thread_mng);
@@ -91,6 +200,8 @@ mlx5_vdpa_c_thread_create(int cpu_core)
 					name);
 		else
 			DRV_LOG(DEBUG, "Thread name: %s.", name);
+		pthread_cond_init(&conf_thread_mng.cthrd[thrd_idx].c_cond,
+			NULL);
 	}
 	pthread_mutex_unlock(&conf_thread_mng.cthrd_lock);
 	return 0;
-- 
2.30.2


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v2 10/15] vdpa/mlx5: add MT task for VM memory registration
  2022-06-16  2:29 ` [PATCH v2 00/15] mlx5/vdpa: optimize live migration time Li Zhang
                     ` (8 preceding siblings ...)
  2022-06-16  2:30   ` [PATCH v2 09/15] vdpa/mlx5: add task ring for MT management Li Zhang
@ 2022-06-16  2:30   ` Li Zhang
  2022-06-16  2:30   ` [PATCH v2 11/15] vdpa/mlx5: add virtq creation task for MT management Li Zhang
                     ` (5 subsequent siblings)
  15 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-16  2:30 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

The driver creates a HW direct MR object for each VM memory region,
mapping the VM physical address to the actual physical address.

Later, after all the direct MRs are ready, the driver creates an
indirect MR that groups them into one virtual space from the HW
perspective.

Create the direct MRs in parallel using the MT mechanism. After
completion, the primary thread creates the indirect MR needed for
the following virtq configurations.

This optimization accelerates the LM process and reduces its time
by 5%.
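
A rough sketch of the fan-out/join used here (standalone C, not the
driver code; register_direct_mr(), post_to_worker() and
create_indirect_mr() are hypothetical stand-ins): region i goes to
worker (i % nb_workers), every (nb_workers + 1)-th region is kept for
the caller, and the caller joins on an atomic counter before building
the indirect mkey.

#include <stdint.h>
#include <unistd.h>

/* Stand-ins for the real registration calls (hypothetical). */
static int register_direct_mr(uint32_t region) { (void)region; return 0; }
static int create_indirect_mr(void) { return 0; }

/* Stand-in for enqueueing a task to a worker ring; here it runs the
 * work inline so the sketch stays self-contained. */
static int
post_to_worker(uint32_t worker, uint32_t region,
	       uint32_t *remaining, uint32_t *err)
{
	(void)worker;
	if (register_direct_mr(region))
		__atomic_fetch_add(err, 1, __ATOMIC_RELAXED);
	__atomic_fetch_sub(remaining, 1, __ATOMIC_RELAXED);
	return 0;
}

static int
register_all_regions(uint32_t nb_regions, uint32_t nb_workers)
{
	uint32_t remaining = 0, err = 0, i;

	for (i = 0; i < nb_regions; i++) {
		if (i % (nb_workers + 1) == 0) {
			/* Keep a share of the work for the caller itself. */
			if (register_direct_mr(i))
				return -1;
			continue;
		}
		__atomic_fetch_add(&remaining, 1, __ATOMIC_RELAXED);
		if (post_to_worker(i % nb_workers, i, &remaining, &err)) {
			/* Fall back to the caller on enqueue failure. */
			__atomic_fetch_sub(&remaining, 1, __ATOMIC_RELAXED);
			if (register_direct_mr(i))
				return -1;
		}
	}
	/* Join: every direct MR must exist before the indirect mkey. */
	while (__atomic_load_n(&remaining, __ATOMIC_RELAXED) != 0)
		usleep(100);
	if (__atomic_load_n(&err, __ATOMIC_RELAXED))
		return -1;
	return create_indirect_mr();
}

int
main(void)
{
	return register_all_regions(8, 3);
}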

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c         |   1 -
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  31 ++-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c |  47 ++++-
 drivers/vdpa/mlx5/mlx5_vdpa_mem.c     | 270 ++++++++++++++++++--------
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   |   6 +-
 5 files changed, 258 insertions(+), 97 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index a9d023ed08..e3b32fa087 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -768,7 +768,6 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 		rte_errno = rte_errno ? rte_errno : EINVAL;
 		goto error;
 	}
-	SLIST_INIT(&priv->mr_list);
 	pthread_mutex_lock(&priv_list_lock);
 	TAILQ_INSERT_TAIL(&priv_list, priv, next);
 	pthread_mutex_unlock(&priv_list_lock);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 2bbb868ec6..3316ce42be 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -59,7 +59,6 @@ struct mlx5_vdpa_event_qp {
 };
 
 struct mlx5_vdpa_query_mr {
-	SLIST_ENTRY(mlx5_vdpa_query_mr) next;
 	union {
 		struct ibv_mr *mr;
 		struct mlx5_devx_obj *mkey;
@@ -76,10 +75,17 @@ enum {
 #define MLX5_VDPA_MAX_C_THRD 256
 #define MLX5_VDPA_MAX_TASKS_PER_THRD 4096
 #define MLX5_VDPA_TASKS_PER_DEV 64
+#define MLX5_VDPA_MAX_MRS 0xFFFF
+
+/* Vdpa task types. */
+enum mlx5_vdpa_task_type {
+	MLX5_VDPA_TASK_REG_MR = 1,
+};
 
 /* Generic task information and size must be multiple of 4B. */
 struct mlx5_vdpa_task {
 	struct mlx5_vdpa_priv *priv;
+	enum mlx5_vdpa_task_type type;
 	uint32_t *remaining_cnt;
 	uint32_t *err_cnt;
 	uint32_t idx;
@@ -101,6 +107,14 @@ struct mlx5_vdpa_conf_thread_mng {
 };
 extern struct mlx5_vdpa_conf_thread_mng conf_thread_mng;
 
+struct mlx5_vdpa_vmem_info {
+	struct rte_vhost_memory *vmem;
+	uint32_t entries_num;
+	uint64_t gcd;
+	uint64_t size;
+	uint8_t mode;
+};
+
 struct mlx5_vdpa_virtq {
 	SLIST_ENTRY(mlx5_vdpa_virtq) next;
 	uint8_t enable;
@@ -176,7 +190,7 @@ struct mlx5_vdpa_priv {
 	struct mlx5_hca_vdpa_attr caps;
 	uint32_t gpa_mkey_index;
 	struct ibv_mr *null_mr;
-	struct rte_vhost_memory *vmem;
+	struct mlx5_vdpa_vmem_info vmem_info;
 	struct mlx5dv_devx_event_channel *eventc;
 	struct mlx5dv_devx_event_channel *err_chnl;
 	struct mlx5_uar uar;
@@ -187,11 +201,13 @@ struct mlx5_vdpa_priv {
 	uint8_t num_lag_ports;
 	uint64_t features; /* Negotiated features. */
 	uint16_t log_max_rqt_size;
+	uint16_t last_c_thrd_idx;
+	uint16_t num_mrs; /* Number of memory regions. */
 	struct mlx5_vdpa_steer steer;
 	struct mlx5dv_var *var;
 	void *virtq_db_addr;
 	struct mlx5_pmd_wrapped_mr lm_mr;
-	SLIST_HEAD(mr_list, mlx5_vdpa_query_mr) mr_list;
+	struct mlx5_vdpa_query_mr **mrs;
 	struct mlx5_vdpa_virtq virtqs[];
 };
 
@@ -548,5 +564,12 @@ mlx5_vdpa_mult_threads_destroy(bool need_unlock);
 bool
 mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
 		uint32_t thrd_idx,
-		uint32_t num);
+		enum mlx5_vdpa_task_type task_type,
+		uint32_t *bulk_refcnt, uint32_t *bulk_err_cnt,
+		void **task_data, uint32_t num);
+int
+mlx5_vdpa_register_mr(struct mlx5_vdpa_priv *priv, uint32_t idx);
+bool
+mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
+		uint32_t *err_cnt, uint32_t sleep_time);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index 1fdc92d3ad..10391931ae 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -47,16 +47,23 @@ mlx5_vdpa_c_thrd_ring_enqueue_bulk(struct rte_ring *r,
 bool
 mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
 		uint32_t thrd_idx,
-		uint32_t num)
+		enum mlx5_vdpa_task_type task_type,
+		uint32_t *remaining_cnt, uint32_t *err_cnt,
+		void **task_data, uint32_t num)
 {
 	struct rte_ring *rng = conf_thread_mng.cthrd[thrd_idx].rng;
 	struct mlx5_vdpa_task task[MLX5_VDPA_TASKS_PER_DEV];
+	uint32_t *data = (uint32_t *)task_data;
 	uint32_t i;
 
 	MLX5_ASSERT(num <= MLX5_VDPA_TASKS_PER_DEV);
 	for (i = 0 ; i < num; i++) {
 		task[i].priv = priv;
 		/* To be added later. */
+		task[i].type = task_type;
+		task[i].remaining_cnt = remaining_cnt;
+		task[i].err_cnt = err_cnt;
+		task[i].idx = data[i];
 	}
 	if (!mlx5_vdpa_c_thrd_ring_enqueue_bulk(rng, (void **)&task, num, NULL))
 		return -1;
@@ -71,6 +78,23 @@ mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
 	return 0;
 }
 
+bool
+mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
+		uint32_t *err_cnt, uint32_t sleep_time)
+{
+	/* Check and wait all tasks done. */
+	while (__atomic_load_n(remaining_cnt,
+		__ATOMIC_RELAXED) != 0) {
+		rte_delay_us_sleep(sleep_time);
+	}
+	if (__atomic_load_n(err_cnt,
+		__ATOMIC_RELAXED)) {
+		DRV_LOG(ERR, "Tasks done with error.");
+		return true;
+	}
+	return false;
+}
+
 static void *
 mlx5_vdpa_c_thread_handle(void *arg)
 {
@@ -81,6 +105,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
 	struct rte_ring *rng;
 	uint32_t thrd_idx;
 	uint32_t task_num;
+	int ret;
 
 	for (thrd_idx = 0; thrd_idx < multhrd->max_thrds;
 		thrd_idx++)
@@ -99,13 +124,29 @@ mlx5_vdpa_c_thread_handle(void *arg)
 				&multhrd->cthrd[thrd_idx].c_cond,
 				&multhrd->cthrd_lock);
 			pthread_mutex_unlock(&multhrd->cthrd_lock);
+			continue;
 		}
 		priv = task.priv;
 		if (priv == NULL)
 			continue;
-		__atomic_fetch_sub(task.remaining_cnt,
+		switch (task.type) {
+		case MLX5_VDPA_TASK_REG_MR:
+			ret = mlx5_vdpa_register_mr(priv, task.idx);
+			if (ret) {
+				DRV_LOG(ERR,
+				"Failed to register mr %d.", task.idx);
+				__atomic_fetch_add(task.err_cnt, 1,
+				__ATOMIC_RELAXED);
+			}
+			break;
+		default:
+			DRV_LOG(ERR, "Invalid vdpa task type %d.",
+			task.type);
+			break;
+		}
+		if (task.remaining_cnt)
+			__atomic_fetch_sub(task.remaining_cnt,
 			1, __ATOMIC_RELAXED);
-		/* To be added later. */
 	}
 	return NULL;
 }
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_mem.c b/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
index d6e3dd664b..e333f0bca6 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
@@ -17,25 +17,33 @@
 void
 mlx5_vdpa_mem_dereg(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_query_mr *mrs =
+		(struct mlx5_vdpa_query_mr *)priv->mrs;
 	struct mlx5_vdpa_query_mr *entry;
-	struct mlx5_vdpa_query_mr *next;
+	int i;
 
-	entry = SLIST_FIRST(&priv->mr_list);
-	while (entry) {
-		next = SLIST_NEXT(entry, next);
-		if (entry->is_indirect)
-			claim_zero(mlx5_devx_cmd_destroy(entry->mkey));
-		else
-			claim_zero(mlx5_glue->dereg_mr(entry->mr));
-		SLIST_REMOVE(&priv->mr_list, entry, mlx5_vdpa_query_mr, next);
-		rte_free(entry);
-		entry = next;
+	if (priv->mrs) {
+		for (i = priv->num_mrs - 1; i >= 0; i--) {
+			entry = &mrs[i];
+			if (entry->is_indirect) {
+				if (entry->mkey)
+					claim_zero(
+					mlx5_devx_cmd_destroy(entry->mkey));
+			} else {
+				if (entry->mr)
+					claim_zero(
+					mlx5_glue->dereg_mr(entry->mr));
+			}
+		}
+		rte_free(priv->mrs);
+		priv->mrs = NULL;
+		priv->num_mrs = 0;
 	}
-	SLIST_INIT(&priv->mr_list);
-	if (priv->vmem) {
-		free(priv->vmem);
-		priv->vmem = NULL;
+	if (priv->vmem_info.vmem) {
+		free(priv->vmem_info.vmem);
+		priv->vmem_info.vmem = NULL;
 	}
+	priv->gpa_mkey_index = 0;
 }
 
 static int
@@ -167,72 +175,29 @@ mlx5_vdpa_mem_cmp(struct rte_vhost_memory *mem1, struct rte_vhost_memory *mem2)
 #define KLM_SIZE_MAX_ALIGN(sz) ((sz) > MLX5_MAX_KLM_BYTE_COUNT ? \
 				MLX5_MAX_KLM_BYTE_COUNT : (sz))
 
-/*
- * The target here is to group all the physical memory regions of the
- * virtio device in one indirect mkey.
- * For KLM Fixed Buffer Size mode (HW find the translation entry in one
- * read according to the guest physical address):
- * All the sub-direct mkeys of it must be in the same size, hence, each
- * one of them should be in the GCD size of all the virtio memory
- * regions and the holes between them.
- * For KLM mode (each entry may be in different size so HW must iterate
- * the entries):
- * Each virtio memory region and each hole between them have one entry,
- * just need to cover the maximum allowed size(2G) by splitting entries
- * which their associated memory regions are bigger than 2G.
- * It means that each virtio memory region may be mapped to more than
- * one direct mkey in the 2 modes.
- * All the holes of invalid memory between the virtio memory regions
- * will be mapped to the null memory region for security.
- */
-int
-mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
+static int
+mlx5_vdpa_create_indirect_mkey(struct mlx5_vdpa_priv *priv)
 {
 	struct mlx5_devx_mkey_attr mkey_attr;
-	struct mlx5_vdpa_query_mr *entry = NULL;
-	struct rte_vhost_mem_region *reg = NULL;
-	uint8_t mode = 0;
-	uint32_t entries_num = 0;
-	uint32_t i;
-	uint64_t gcd = 0;
+	struct mlx5_vdpa_query_mr *mrs =
+		(struct mlx5_vdpa_query_mr *)priv->mrs;
+	struct mlx5_vdpa_query_mr *entry;
+	struct rte_vhost_mem_region *reg;
+	uint8_t mode = priv->vmem_info.mode;
+	uint32_t entries_num = priv->vmem_info.entries_num;
+	struct rte_vhost_memory *mem = priv->vmem_info.vmem;
+	struct mlx5_klm klm_array[entries_num];
+	uint64_t gcd = priv->vmem_info.gcd;
+	int ret = -rte_errno;
 	uint64_t klm_size;
-	uint64_t mem_size;
-	uint64_t k;
 	int klm_index = 0;
-	int ret;
-	struct rte_vhost_memory *mem = mlx5_vdpa_vhost_mem_regions_prepare
-			      (priv->vid, &mode, &mem_size, &gcd, &entries_num);
-	struct mlx5_klm klm_array[entries_num];
+	uint64_t k;
+	uint32_t i;
 
-	if (!mem)
-		return -rte_errno;
-	if (priv->vmem != NULL) {
-		if (mlx5_vdpa_mem_cmp(mem, priv->vmem) == 0) {
-			/* VM memory not changed, reuse resources. */
-			free(mem);
-			return 0;
-		}
-		mlx5_vdpa_mem_dereg(priv);
-	}
-	priv->vmem = mem;
+	/* If it is the last entry, create indirect mkey. */
 	for (i = 0; i < mem->nregions; i++) {
+		entry = &mrs[i];
 		reg = &mem->regions[i];
-		entry = rte_zmalloc(__func__, sizeof(*entry), 0);
-		if (!entry) {
-			ret = -ENOMEM;
-			DRV_LOG(ERR, "Failed to allocate mem entry memory.");
-			goto error;
-		}
-		entry->mr = mlx5_glue->reg_mr_iova(priv->cdev->pd,
-				       (void *)(uintptr_t)(reg->host_user_addr),
-				       reg->size, reg->guest_phys_addr,
-				       IBV_ACCESS_LOCAL_WRITE);
-		if (!entry->mr) {
-			DRV_LOG(ERR, "Failed to create direct Mkey.");
-			ret = -rte_errno;
-			goto error;
-		}
-		entry->is_indirect = 0;
 		if (i > 0) {
 			uint64_t sadd;
 			uint64_t empty_region_sz = reg->guest_phys_addr -
@@ -265,11 +230,10 @@ mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
 			klm_array[klm_index].address = reg->guest_phys_addr + k;
 			klm_index++;
 		}
-		SLIST_INSERT_HEAD(&priv->mr_list, entry, next);
 	}
 	memset(&mkey_attr, 0, sizeof(mkey_attr));
 	mkey_attr.addr = (uintptr_t)(mem->regions[0].guest_phys_addr);
-	mkey_attr.size = mem_size;
+	mkey_attr.size = priv->vmem_info.size;
 	mkey_attr.pd = priv->cdev->pdn;
 	mkey_attr.umem_id = 0;
 	/* Must be zero for KLM mode. */
@@ -278,25 +242,159 @@ mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
 	mkey_attr.pg_access = 0;
 	mkey_attr.klm_array = klm_array;
 	mkey_attr.klm_num = klm_index;
-	entry = rte_zmalloc(__func__, sizeof(*entry), 0);
-	if (!entry) {
-		DRV_LOG(ERR, "Failed to allocate memory for indirect entry.");
-		ret = -ENOMEM;
-		goto error;
-	}
+	entry = &mrs[mem->nregions];
 	entry->mkey = mlx5_devx_cmd_mkey_create(priv->cdev->ctx, &mkey_attr);
 	if (!entry->mkey) {
 		DRV_LOG(ERR, "Failed to create indirect Mkey.");
-		ret = -rte_errno;
-		goto error;
+		rte_errno = -ret;
+		return ret;
 	}
 	entry->is_indirect = 1;
-	SLIST_INSERT_HEAD(&priv->mr_list, entry, next);
 	priv->gpa_mkey_index = entry->mkey->id;
 	return 0;
+}
+
+/*
+ * The target here is to group all the physical memory regions of the
+ * virtio device in one indirect mkey.
+ * For KLM Fixed Buffer Size mode (HW find the translation entry in one
+ * read according to the guest physical address):
+ * All the sub-direct mkeys of it must be in the same size, hence, each
+ * one of them should be in the GCD size of all the virtio memory
+ * regions and the holes between them.
+ * For KLM mode (each entry may be in different size so HW must iterate
+ * the entries):
+ * Each virtio memory region and each hole between them have one entry,
+ * just need to cover the maximum allowed size(2G) by splitting entries
+ * which their associated memory regions are bigger than 2G.
+ * It means that each virtio memory region may be mapped to more than
+ * one direct mkey in the 2 modes.
+ * All the holes of invalid memory between the virtio memory regions
+ * will be mapped to the null memory region for security.
+ */
+int
+mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
+{
+	void *mrs;
+	uint8_t mode = 0;
+	int ret = -rte_errno;
+	uint32_t i, thrd_idx, data[1];
+	uint32_t remaining_cnt = 0, err_cnt = 0, task_num = 0;
+	struct rte_vhost_memory *mem = mlx5_vdpa_vhost_mem_regions_prepare
+			(priv->vid, &mode, &priv->vmem_info.size,
+			&priv->vmem_info.gcd, &priv->vmem_info.entries_num);
+
+	if (!mem)
+		return -rte_errno;
+	if (priv->vmem_info.vmem != NULL) {
+		if (mlx5_vdpa_mem_cmp(mem, priv->vmem_info.vmem) == 0) {
+			/* VM memory not changed, reuse resources. */
+			free(mem);
+			return 0;
+		}
+		mlx5_vdpa_mem_dereg(priv);
+	}
+	priv->vmem_info.vmem = mem;
+	priv->vmem_info.mode = mode;
+	priv->num_mrs = mem->nregions;
+	if (!priv->num_mrs || priv->num_mrs >= MLX5_VDPA_MAX_MRS) {
+		DRV_LOG(ERR,
+		"Invalid number of memory regions.");
+		goto error;
+	}
+	/* The last one is indirect mkey entry. */
+	priv->num_mrs++;
+	mrs = rte_zmalloc("mlx5 vDPA memory regions",
+		sizeof(struct mlx5_vdpa_query_mr) * priv->num_mrs, 0);
+	priv->mrs = mrs;
+	if (!priv->mrs) {
+		DRV_LOG(ERR, "Failed to allocate private memory regions.");
+		goto error;
+	}
+	if (priv->use_c_thread) {
+		uint32_t main_task_idx[mem->nregions];
+
+		for (i = 0; i < mem->nregions; i++) {
+			thrd_idx = i % (conf_thread_mng.max_thrds + 1);
+			if (!thrd_idx) {
+				main_task_idx[task_num] = i;
+				task_num++;
+				continue;
+			}
+			thrd_idx = priv->last_c_thrd_idx + 1;
+			if (thrd_idx >= conf_thread_mng.max_thrds)
+				thrd_idx = 0;
+			priv->last_c_thrd_idx = thrd_idx;
+			data[0] = i;
+			if (mlx5_vdpa_task_add(priv, thrd_idx,
+				MLX5_VDPA_TASK_REG_MR,
+				&remaining_cnt, &err_cnt,
+				(void **)&data, 1)) {
+				DRV_LOG(ERR,
+				"Fail to add task mem region (%d)", i);
+				main_task_idx[task_num] = i;
+				task_num++;
+			}
+		}
+		for (i = 0; i < task_num; i++) {
+			ret = mlx5_vdpa_register_mr(priv,
+					main_task_idx[i]);
+			if (ret) {
+				DRV_LOG(ERR,
+				"Failed to register mem region %d.", i);
+				goto error;
+			}
+		}
+		if (mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining_cnt,
+			&err_cnt, 100)) {
+			DRV_LOG(ERR,
+			"Failed to wait register mem region tasks ready.");
+			goto error;
+		}
+	} else {
+		for (i = 0; i < mem->nregions; i++) {
+			ret = mlx5_vdpa_register_mr(priv, i);
+			if (ret) {
+				DRV_LOG(ERR,
+				"Failed to register mem region %d.", i);
+				goto error;
+			}
+		}
+	}
+	ret = mlx5_vdpa_create_indirect_mkey(priv);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to create indirect mkey.");
+		goto error;
+	}
+	return 0;
 error:
-	rte_free(entry);
 	mlx5_vdpa_mem_dereg(priv);
 	rte_errno = -ret;
 	return ret;
 }
+
+int
+mlx5_vdpa_register_mr(struct mlx5_vdpa_priv *priv, uint32_t idx)
+{
+	struct rte_vhost_memory *mem = priv->vmem_info.vmem;
+	struct mlx5_vdpa_query_mr *mrs =
+		(struct mlx5_vdpa_query_mr *)priv->mrs;
+	struct mlx5_vdpa_query_mr *entry;
+	struct rte_vhost_mem_region *reg;
+	int ret;
+
+	reg = &mem->regions[idx];
+	entry = &mrs[idx];
+	entry->mr = mlx5_glue->reg_mr_iova
+				      (priv->cdev->pd,
+				       (void *)(uintptr_t)(reg->host_user_addr),
+				       reg->size, reg->guest_phys_addr,
+				       IBV_ACCESS_LOCAL_WRITE);
+	if (!entry->mr) {
+		DRV_LOG(ERR, "Failed to create direct Mkey.");
+		ret = -rte_errno;
+		return ret;
+	}
+	entry->is_indirect = 0;
+	return 0;
+}
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 599809b09b..0b317655db 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -353,21 +353,21 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 		}
 	}
 	if (attr->q_type == MLX5_VIRTQ_TYPE_SPLIT) {
-		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
+		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem_info.vmem,
 					   (uint64_t)(uintptr_t)vq->desc);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get descriptor ring GPA.");
 			return -1;
 		}
 		attr->desc_addr = gpa;
-		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
+		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem_info.vmem,
 					   (uint64_t)(uintptr_t)vq->used);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get GPA for used ring.");
 			return -1;
 		}
 		attr->used_addr = gpa;
-		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
+		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem_info.vmem,
 					   (uint64_t)(uintptr_t)vq->avail);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get GPA for available ring.");
-- 
2.30.2
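
As an editorial aside on the memory-registration comment at the top of this
patch, the following minimal standalone sketch (not driver code) shows how a
guest memory region larger than the 2G direct-mkey limit ends up as several
entries; split_region() and the printed layout are assumptions made only for
illustration.

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

#define MAX_DIRECT_MKEY_SIZE (2ULL << 30)	/* 2G covered per direct mkey */

static unsigned int
split_region(uint64_t gpa, uint64_t size)
{
	unsigned int entries = 0;

	while (size != 0) {
		uint64_t chunk = size < MAX_DIRECT_MKEY_SIZE ?
				 size : MAX_DIRECT_MKEY_SIZE;

		printf("entry %u: gpa 0x%" PRIx64 " size 0x%" PRIx64 "\n",
		       entries, gpa, chunk);
		gpa += chunk;
		size -= chunk;
		entries++;
	}
	return entries;
}

int
main(void)
{
	/* A 5G region needs three direct-mkey entries (2G + 2G + 1G). */
	printf("%u entries\n", split_region(0x100000000ULL, 5ULL << 30));
	return 0;
}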


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v2 11/15] vdpa/mlx5: add virtq creation task for MT management
  2022-06-16  2:29 ` [PATCH v2 00/15] mlx5/vdpa: optimize live migration time Li Zhang
                     ` (9 preceding siblings ...)
  2022-06-16  2:30   ` [PATCH v2 10/15] vdpa/mlx5: add MT task for VM memory registration Li Zhang
@ 2022-06-16  2:30   ` Li Zhang
  2022-06-16  2:30   ` [PATCH v2 12/15] vdpa/mlx5: add virtq LM log task Li Zhang
                     ` (4 subsequent siblings)
  15 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-16  2:30 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

Creating the virtq object and all its sub-resources requires many
FW commands and can be accelerated by the MT management.
Split the virtq creation between the configuration threads.
This accelerates the LM process and reduces its time by 20%.
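
As an editorial illustration of the distribution rule used in the hunks below
(a minimal standalone sketch, not driver code): every (max_thrds + 1)-th virtq
index stays on the caller thread and the remaining indexes are handed
round-robin to the worker threads. MAX_THRDS and dispatch_to_worker() are
names made up for the sketch.

#include <stdio.h>
#include <stdint.h>

#define MAX_THRDS 3	/* assumed number of configuration worker threads */

static void
dispatch_to_worker(uint32_t thrd_idx, uint32_t vq_idx)
{
	printf("virtq %u -> worker thread %u\n", vq_idx, thrd_idx);
}

int
main(void)
{
	uint32_t nr_vring = 8, last_thrd = 0, i;

	for (i = 0; i < nr_vring; i++) {
		if (i % (MAX_THRDS + 1) == 0) {
			/* Every (MAX_THRDS + 1)-th queue is kept on the caller. */
			printf("virtq %u -> caller thread\n", i);
			continue;
		}
		/* Round-robin over the worker threads for the rest. */
		last_thrd = (last_thrd + 1) % MAX_THRDS;
		dispatch_to_worker(last_thrd, i);
	}
	return 0;
}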

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.h         |   9 +-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c |  14 +++
 drivers/vdpa/mlx5/mlx5_vdpa_event.c   |   2 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 149 +++++++++++++++++++-------
 4 files changed, 134 insertions(+), 40 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 3316ce42be..35221f5ddc 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -80,6 +80,7 @@ enum {
 /* Vdpa task types. */
 enum mlx5_vdpa_task_type {
 	MLX5_VDPA_TASK_REG_MR = 1,
+	MLX5_VDPA_TASK_SETUP_VIRTQ,
 };
 
 /* Generic task information and size must be multiple of 4B. */
@@ -117,12 +118,12 @@ struct mlx5_vdpa_vmem_info {
 
 struct mlx5_vdpa_virtq {
 	SLIST_ENTRY(mlx5_vdpa_virtq) next;
-	uint8_t enable;
 	uint16_t index;
 	uint16_t vq_size;
 	uint8_t notifier_state;
-	bool stopped;
 	uint32_t configured:1;
+	uint32_t enable:1;
+	uint32_t stopped:1;
 	uint32_t version;
 	pthread_mutex_t virtq_lock;
 	struct mlx5_vdpa_priv *priv;
@@ -565,11 +566,13 @@ bool
 mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
 		uint32_t thrd_idx,
 		enum mlx5_vdpa_task_type task_type,
-		uint32_t *bulk_refcnt, uint32_t *bulk_err_cnt,
+		uint32_t *remaining_cnt, uint32_t *err_cnt,
 		void **task_data, uint32_t num);
 int
 mlx5_vdpa_register_mr(struct mlx5_vdpa_priv *priv, uint32_t idx);
 bool
 mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
 		uint32_t *err_cnt, uint32_t sleep_time);
+int
+mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index 10391931ae..1389d369ae 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -100,6 +100,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
 {
 	struct mlx5_vdpa_conf_thread_mng *multhrd = arg;
 	pthread_t thread_id = pthread_self();
+	struct mlx5_vdpa_virtq *virtq;
 	struct mlx5_vdpa_priv *priv;
 	struct mlx5_vdpa_task task;
 	struct rte_ring *rng;
@@ -139,6 +140,19 @@ mlx5_vdpa_c_thread_handle(void *arg)
 				__ATOMIC_RELAXED);
 			}
 			break;
+		case MLX5_VDPA_TASK_SETUP_VIRTQ:
+			virtq = &priv->virtqs[task.idx];
+			pthread_mutex_lock(&virtq->virtq_lock);
+			ret = mlx5_vdpa_virtq_setup(priv,
+				task.idx, false);
+			if (ret) {
+				DRV_LOG(ERR,
+					"Failed to setup virtq %d.", task.idx);
+				__atomic_fetch_add(
+					task.err_cnt, 1, __ATOMIC_RELAXED);
+			}
+			pthread_mutex_unlock(&virtq->virtq_lock);
+			break;
 		default:
 			DRV_LOG(ERR, "Invalid vdpa task type %d.",
 			task.type);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index b45fbac146..f782b6b832 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -371,7 +371,7 @@ mlx5_vdpa_err_interrupt_handler(void *cb_arg __rte_unused)
 			goto unlock;
 		if (rte_rdtsc() / rte_get_tsc_hz() < MLX5_VDPA_ERROR_TIME_SEC)
 			goto unlock;
-		virtq->stopped = true;
+		virtq->stopped = 1;
 		/* Query error info. */
 		if (mlx5_vdpa_virtq_query(priv, vq_index))
 			goto log;
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 0b317655db..db05220e76 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -111,8 +111,9 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
+		if (virtq->index != i)
+			continue;
 		pthread_mutex_lock(&virtq->virtq_lock);
-		virtq->configured = 0;
 		for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
 			if (virtq->umems[j].obj) {
 				claim_zero(mlx5_glue->devx_umem_dereg
@@ -131,7 +132,6 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 	}
 }
 
-
 static int
 mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 {
@@ -191,7 +191,7 @@ mlx5_vdpa_virtq_stop(struct mlx5_vdpa_priv *priv, int index)
 	ret = mlx5_vdpa_virtq_modify(virtq, 0);
 	if (ret)
 		return -1;
-	virtq->stopped = true;
+	virtq->stopped = 1;
 	DRV_LOG(DEBUG, "vid %u virtq %u was stopped.", priv->vid, index);
 	return mlx5_vdpa_virtq_query(priv, index);
 }
@@ -411,7 +411,38 @@ mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv)
 }
 
 static int
-mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
+mlx5_vdpa_virtq_doorbell_setup(struct mlx5_vdpa_virtq *virtq,
+		struct rte_vhost_vring *vq, int index)
+{
+	virtq->intr_handle =
+		rte_intr_instance_alloc(RTE_INTR_INSTANCE_F_SHARED);
+	if (virtq->intr_handle == NULL) {
+		DRV_LOG(ERR, "Fail to allocate intr_handle");
+		return -1;
+	}
+	if (rte_intr_fd_set(virtq->intr_handle, vq->kickfd))
+		return -1;
+	if (rte_intr_fd_get(virtq->intr_handle) == -1) {
+		DRV_LOG(WARNING, "Virtq %d kickfd is invalid.", index);
+	} else {
+		if (rte_intr_type_set(virtq->intr_handle,
+			RTE_INTR_HANDLE_EXT))
+			return -1;
+		if (rte_intr_callback_register(virtq->intr_handle,
+			mlx5_vdpa_virtq_kick_handler, virtq)) {
+			(void)rte_intr_fd_set(virtq->intr_handle, -1);
+			DRV_LOG(ERR, "Failed to register virtq %d interrupt.",
+				index);
+			return -1;
+		}
+		DRV_LOG(DEBUG, "Register fd %d interrupt for virtq %d.",
+			rte_intr_fd_get(virtq->intr_handle), index);
+	}
+	return 0;
+}
+
+int
+mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick)
 {
 	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
 	struct rte_vhost_vring vq;
@@ -455,33 +486,11 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 	rte_write32(virtq->index, priv->virtq_db_addr);
 	rte_spinlock_unlock(&priv->db_lock);
 	/* Setup doorbell mapping. */
-	virtq->intr_handle =
-		rte_intr_instance_alloc(RTE_INTR_INSTANCE_F_SHARED);
-	if (virtq->intr_handle == NULL) {
-		DRV_LOG(ERR, "Fail to allocate intr_handle");
-		goto error;
-	}
-
-	if (rte_intr_fd_set(virtq->intr_handle, vq.kickfd))
-		goto error;
-
-	if (rte_intr_fd_get(virtq->intr_handle) == -1) {
-		DRV_LOG(WARNING, "Virtq %d kickfd is invalid.", index);
-	} else {
-		if (rte_intr_type_set(virtq->intr_handle, RTE_INTR_HANDLE_EXT))
-			goto error;
-
-		if (rte_intr_callback_register(virtq->intr_handle,
-					       mlx5_vdpa_virtq_kick_handler,
-					       virtq)) {
-			(void)rte_intr_fd_set(virtq->intr_handle, -1);
+	if (reg_kick) {
+		if (mlx5_vdpa_virtq_doorbell_setup(virtq, &vq, index)) {
 			DRV_LOG(ERR, "Failed to register virtq %d interrupt.",
 				index);
 			goto error;
-		} else {
-			DRV_LOG(DEBUG, "Register fd %d interrupt for virtq %d.",
-				rte_intr_fd_get(virtq->intr_handle),
-				index);
 		}
 	}
 	/* Subscribe virtq error event. */
@@ -497,7 +506,6 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 		rte_errno = errno;
 		goto error;
 	}
-	virtq->stopped = false;
 	/* Initial notification to ask Qemu handling completed buffers. */
 	if (virtq->eqp.cq.callfd != -1)
 		eventfd_write(virtq->eqp.cq.callfd, (eventfd_t)1);
@@ -567,10 +575,12 @@ mlx5_vdpa_features_validate(struct mlx5_vdpa_priv *priv)
 int
 mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 {
-	uint32_t i;
-	uint16_t nr_vring = rte_vhost_get_vring_num(priv->vid);
 	int ret = rte_vhost_get_negotiated_features(priv->vid, &priv->features);
+	uint16_t nr_vring = rte_vhost_get_vring_num(priv->vid);
+	uint32_t remaining_cnt = 0, err_cnt = 0, task_num = 0;
+	uint32_t i, thrd_idx, data[1];
 	struct mlx5_vdpa_virtq *virtq;
+	struct rte_vhost_vring vq;
 
 	if (ret || mlx5_vdpa_features_validate(priv)) {
 		DRV_LOG(ERR, "Failed to configure negotiated features.");
@@ -590,16 +600,83 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 		return -1;
 	}
 	priv->nr_virtqs = nr_vring;
-	for (i = 0; i < nr_vring; i++) {
-		virtq = &priv->virtqs[i];
-		if (virtq->enable) {
+	if (priv->use_c_thread) {
+		uint32_t main_task_idx[nr_vring];
+
+		for (i = 0; i < nr_vring; i++) {
+			virtq = &priv->virtqs[i];
+			if (!virtq->enable)
+				continue;
+			thrd_idx = i % (conf_thread_mng.max_thrds + 1);
+			if (!thrd_idx) {
+				main_task_idx[task_num] = i;
+				task_num++;
+				continue;
+			}
+			thrd_idx = priv->last_c_thrd_idx + 1;
+			if (thrd_idx >= conf_thread_mng.max_thrds)
+				thrd_idx = 0;
+			priv->last_c_thrd_idx = thrd_idx;
+			data[0] = i;
+			if (mlx5_vdpa_task_add(priv, thrd_idx,
+				MLX5_VDPA_TASK_SETUP_VIRTQ,
+				&remaining_cnt, &err_cnt,
+				(void **)&data, 1)) {
+				DRV_LOG(ERR, "Fail to add "
+						"task setup virtq (%d).", i);
+				main_task_idx[task_num] = i;
+				task_num++;
+			}
+		}
+		for (i = 0; i < task_num; i++) {
+			virtq = &priv->virtqs[main_task_idx[i]];
 			pthread_mutex_lock(&virtq->virtq_lock);
-			if (mlx5_vdpa_virtq_setup(priv, i)) {
+			if (mlx5_vdpa_virtq_setup(priv,
+				main_task_idx[i], false)) {
 				pthread_mutex_unlock(&virtq->virtq_lock);
 				goto error;
 			}
 			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
+		if (mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining_cnt,
+			&err_cnt, 2000)) {
+			DRV_LOG(ERR,
+			"Failed to wait virt-queue setup tasks ready.");
+			goto error;
+		}
+		for (i = 0; i < nr_vring; i++) {
+			/* Set up the doorbell mapping for QEMU. */
+			virtq = &priv->virtqs[i];
+			pthread_mutex_lock(&virtq->virtq_lock);
+			if (!virtq->enable || !virtq->configured) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				continue;
+			}
+			if (rte_vhost_get_vhost_vring(priv->vid, i, &vq)) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				goto error;
+			}
+			if (mlx5_vdpa_virtq_doorbell_setup(virtq, &vq, i)) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				DRV_LOG(ERR,
+				"Failed to register virtq %d interrupt.", i);
+				goto error;
+			}
+			pthread_mutex_unlock(&virtq->virtq_lock);
+		}
+	} else {
+		for (i = 0; i < nr_vring; i++) {
+			virtq = &priv->virtqs[i];
+			pthread_mutex_lock(&virtq->virtq_lock);
+			if (virtq->enable) {
+				if (mlx5_vdpa_virtq_setup(priv, i, true)) {
+					pthread_mutex_unlock(
+						&virtq->virtq_lock);
+					goto error;
+				}
+			}
+			pthread_mutex_unlock(&virtq->virtq_lock);
+		}
 	}
 	return 0;
 error:
@@ -663,7 +740,7 @@ mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable)
 		mlx5_vdpa_virtq_unset(virtq);
 	}
 	if (enable) {
-		ret = mlx5_vdpa_virtq_setup(priv, index);
+		ret = mlx5_vdpa_virtq_setup(priv, index, true);
 		if (ret) {
 			DRV_LOG(ERR, "Failed to setup virtq %d.", index);
 			return ret;
-- 
2.30.2


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v2 12/15] vdpa/mlx5: add virtq LM log task
  2022-06-16  2:29 ` [PATCH v2 00/15] mlx5/vdpa: optimize live migration time Li Zhang
                     ` (10 preceding siblings ...)
  2022-06-16  2:30   ` [PATCH v2 11/15] vdpa/mlx5: add virtq creation task for MT management Li Zhang
@ 2022-06-16  2:30   ` Li Zhang
  2022-06-16  2:30   ` [PATCH v2 13/15] vdpa/mlx5: add device close task Li Zhang
                     ` (3 subsequent siblings)
  15 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-16  2:30 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

Split the virtq LM logging between the configuration threads.
This accelerates the LM process and reduces its time by 20%.
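
A minimal standalone sketch of the used-ring length that the logging path
marks dirty per virtq; it mirrors the MLX5_VDPA_USED_RING_LEN() macro moved
into the header by this patch. The local vring_used_elem definition and the
demo queue size are assumptions of the sketch; the three extra uint16_t stand
for the used-ring header and the trailing event index.

#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

struct vring_used_elem {	/* id/len pair, as in the virtio ring layout */
	uint32_t id;
	uint32_t len;
};

static size_t
used_ring_len(uint16_t q_size)
{
	/* Same arithmetic as MLX5_VDPA_USED_RING_LEN(q_size). */
	return (size_t)q_size * sizeof(struct vring_used_elem) +
	       sizeof(uint16_t) * 3;
}

int
main(void)
{
	printf("used ring length for 256 descriptors: %zu bytes\n",
	       used_ring_len(256));
	return 0;
}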

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  3 +
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 34 +++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c      | 85 +++++++++++++++++++++------
 3 files changed, 105 insertions(+), 17 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 35221f5ddc..e08931719f 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -72,6 +72,8 @@ enum {
 	MLX5_VDPA_NOTIFIER_STATE_ERR
 };
 
+#define MLX5_VDPA_USED_RING_LEN(size) \
+	((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3)
 #define MLX5_VDPA_MAX_C_THRD 256
 #define MLX5_VDPA_MAX_TASKS_PER_THRD 4096
 #define MLX5_VDPA_TASKS_PER_DEV 64
@@ -81,6 +83,7 @@ enum {
 enum mlx5_vdpa_task_type {
 	MLX5_VDPA_TASK_REG_MR = 1,
 	MLX5_VDPA_TASK_SETUP_VIRTQ,
+	MLX5_VDPA_TASK_STOP_VIRTQ,
 };
 
 /* Generic task information and size must be multiple of 4B. */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index 1389d369ae..98369f0887 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -104,6 +104,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
 	struct mlx5_vdpa_priv *priv;
 	struct mlx5_vdpa_task task;
 	struct rte_ring *rng;
+	uint64_t features;
 	uint32_t thrd_idx;
 	uint32_t task_num;
 	int ret;
@@ -153,6 +154,39 @@ mlx5_vdpa_c_thread_handle(void *arg)
 			}
 			pthread_mutex_unlock(&virtq->virtq_lock);
 			break;
+		case MLX5_VDPA_TASK_STOP_VIRTQ:
+			virtq = &priv->virtqs[task.idx];
+			pthread_mutex_lock(&virtq->virtq_lock);
+			ret = mlx5_vdpa_virtq_stop(priv,
+					task.idx);
+			if (ret) {
+				DRV_LOG(ERR,
+				"Failed to stop virtq %d.",
+				task.idx);
+				__atomic_fetch_add(
+					task.err_cnt, 1,
+					__ATOMIC_RELAXED);
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				break;
+			}
+			ret = rte_vhost_get_negotiated_features(
+				priv->vid, &features);
+			if (ret) {
+				DRV_LOG(ERR,
+		"Failed to get negotiated features virtq %d.",
+				task.idx);
+				__atomic_fetch_add(
+					task.err_cnt, 1,
+					__ATOMIC_RELAXED);
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				break;
+			}
+			if (RTE_VHOST_NEED_LOG(features))
+				rte_vhost_log_used_vring(
+				priv->vid, task.idx, 0,
+			    MLX5_VDPA_USED_RING_LEN(virtq->vq_size));
+			pthread_mutex_unlock(&virtq->virtq_lock);
+			break;
 		default:
 			DRV_LOG(ERR, "Invalid vdpa task type %d.",
 			task.type);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
index efebf364d0..c2e78218ca 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
@@ -89,39 +89,90 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
 	return -1;
 }
 
-#define MLX5_VDPA_USED_RING_LEN(size) \
-	((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3)
-
 int
 mlx5_vdpa_lm_log(struct mlx5_vdpa_priv *priv)
 {
+	uint32_t remaining_cnt = 0, err_cnt = 0, task_num = 0;
+	uint32_t i, thrd_idx, data[1];
 	struct mlx5_vdpa_virtq *virtq;
 	uint64_t features;
-	int ret = rte_vhost_get_negotiated_features(priv->vid, &features);
-	int i;
+	int ret;
 
+	ret = rte_vhost_get_negotiated_features(priv->vid, &features);
 	if (ret) {
 		DRV_LOG(ERR, "Failed to get negotiated features.");
 		return -1;
 	}
-	if (!RTE_VHOST_NEED_LOG(features))
-		return 0;
-	for (i = 0; i < priv->nr_virtqs; ++i) {
-		virtq = &priv->virtqs[i];
-		if (!priv->virtqs[i].virtq) {
-			DRV_LOG(DEBUG, "virtq %d is invalid for LM log.", i);
-		} else {
+	if (priv->use_c_thread && priv->nr_virtqs) {
+		uint32_t main_task_idx[priv->nr_virtqs];
+
+		for (i = 0; i < priv->nr_virtqs; i++) {
+			virtq = &priv->virtqs[i];
+			if (!virtq->configured)
+				continue;
+			thrd_idx = i % (conf_thread_mng.max_thrds + 1);
+			if (!thrd_idx) {
+				main_task_idx[task_num] = i;
+				task_num++;
+				continue;
+			}
+			thrd_idx = priv->last_c_thrd_idx + 1;
+			if (thrd_idx >= conf_thread_mng.max_thrds)
+				thrd_idx = 0;
+			priv->last_c_thrd_idx = thrd_idx;
+			data[0] = i;
+			if (mlx5_vdpa_task_add(priv, thrd_idx,
+				MLX5_VDPA_TASK_STOP_VIRTQ,
+				&remaining_cnt, &err_cnt,
+				(void **)&data, 1)) {
+				DRV_LOG(ERR, "Fail to add "
+					"task stop virtq (%d).", i);
+				main_task_idx[task_num] = i;
+				task_num++;
+			}
+		}
+		for (i = 0; i < task_num; i++) {
+			virtq = &priv->virtqs[main_task_idx[i]];
 			pthread_mutex_lock(&virtq->virtq_lock);
-			ret = mlx5_vdpa_virtq_stop(priv, i);
+			ret = mlx5_vdpa_virtq_stop(priv,
+					main_task_idx[i]);
+			if (ret) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				DRV_LOG(ERR,
+				"Failed to stop virtq %d.", i);
+				return -1;
+			}
+			if (RTE_VHOST_NEED_LOG(features))
+				rte_vhost_log_used_vring(priv->vid, i, 0,
+				MLX5_VDPA_USED_RING_LEN(virtq->vq_size));
 			pthread_mutex_unlock(&virtq->virtq_lock);
+		}
+		if (mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining_cnt,
+			&err_cnt, 2000)) {
+			DRV_LOG(ERR,
+			"Failed to wait for virt-queue stop tasks to be done.");
+			return -1;
+		}
+	} else {
+		for (i = 0; i < priv->nr_virtqs; i++) {
+			virtq = &priv->virtqs[i];
+			pthread_mutex_lock(&virtq->virtq_lock);
+			if (!virtq->configured) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				continue;
+			}
+			ret = mlx5_vdpa_virtq_stop(priv, i);
 			if (ret) {
-				DRV_LOG(ERR, "Failed to stop virtq %d for LM "
-					"log.", i);
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				DRV_LOG(ERR,
+				"Failed to stop virtq %d for LM log.", i);
 				return -1;
 			}
+			if (RTE_VHOST_NEED_LOG(features))
+				rte_vhost_log_used_vring(priv->vid, i, 0,
+				MLX5_VDPA_USED_RING_LEN(virtq->vq_size));
+			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
-		rte_vhost_log_used_vring(priv->vid, i, 0,
-			      MLX5_VDPA_USED_RING_LEN(priv->virtqs[i].vq_size));
 	}
 	return 0;
 }
-- 
2.30.2


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v2 13/15] vdpa/mlx5: add device close task
  2022-06-16  2:29 ` [PATCH v2 00/15] mlx5/vdpa: optimize live migration time Li Zhang
                     ` (11 preceding siblings ...)
  2022-06-16  2:30   ` [PATCH v2 12/15] vdpa/mlx5: add virtq LM log task Li Zhang
@ 2022-06-16  2:30   ` Li Zhang
  2022-06-16  2:30   ` [PATCH v2 14/15] vdpa/mlx5: add virtq sub-resources creation Li Zhang
                     ` (2 subsequent siblings)
  15 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-16  2:30 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

Split the device close tasks that run after the virt-queues
are stopped between the configuration threads.
This accelerates the LM process and
reduces its time by 50%.
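
A minimal standalone sketch of the bounded polling wait this patch adds before
the device can be reconfigured or removed; progress_flag and usleep() stand in
for the driver's dev_close_progress counter and rte_delay_us_sleep(), and the
1000 iterations of 10 ms copy the budget used in the patch.

#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Set to 1 when a close task is queued, cleared by the worker when done. */
static int progress_flag;

static bool
wait_close_done(void)
{
	unsigned int timeout = 0;

	while (__atomic_load_n(&progress_flag, __ATOMIC_RELAXED) != 0 &&
	       timeout < 1000) {
		usleep(10000);	/* 10 ms per iteration, ~10 s budget */
		timeout++;
	}
	return __atomic_load_n(&progress_flag, __ATOMIC_RELAXED) == 0;
}

int
main(void)
{
	printf("close tasks done: %s\n", wait_close_done() ? "yes" : "no");
	return 0;
}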

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c         | 56 +++++++++++++++++++++++++--
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  8 ++++
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 20 +++++++++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 14 +++++++
 4 files changed, 94 insertions(+), 4 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index e3b32fa087..d000854c08 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -245,7 +245,7 @@ mlx5_vdpa_mtu_set(struct mlx5_vdpa_priv *priv)
 	return kern_mtu == vhost_mtu ? 0 : -1;
 }
 
-static void
+void
 mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv)
 {
 	/* Clean pre-created resource in dev removal only. */
@@ -254,6 +254,26 @@ mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv)
 	mlx5_vdpa_mem_dereg(priv);
 }
 
+static bool
+mlx5_vdpa_wait_dev_close_tasks_done(struct mlx5_vdpa_priv *priv)
+{
+	uint32_t timeout = 0;
+
+	/* Check and wait all close tasks done. */
+	while (__atomic_load_n(&priv->dev_close_progress,
+		__ATOMIC_RELAXED) != 0 && timeout < 1000) {
+		rte_delay_us_sleep(10000);
+		timeout++;
+	}
+	if (priv->dev_close_progress) {
+		DRV_LOG(ERR,
+		"Failed to wait close device tasks done vid %d.",
+		priv->vid);
+		return true;
+	}
+	return false;
+}
+
 static int
 mlx5_vdpa_dev_close(int vid)
 {
@@ -271,6 +291,27 @@ mlx5_vdpa_dev_close(int vid)
 		ret |= mlx5_vdpa_lm_log(priv);
 		priv->state = MLX5_VDPA_STATE_IN_PROGRESS;
 	}
+	if (priv->use_c_thread) {
+		if (priv->last_c_thrd_idx >=
+			(conf_thread_mng.max_thrds - 1))
+			priv->last_c_thrd_idx = 0;
+		else
+			priv->last_c_thrd_idx++;
+		__atomic_store_n(&priv->dev_close_progress,
+			1, __ATOMIC_RELAXED);
+		if (mlx5_vdpa_task_add(priv,
+			priv->last_c_thrd_idx,
+			MLX5_VDPA_TASK_DEV_CLOSE_NOWAIT,
+			NULL, NULL, NULL, 1)) {
+			DRV_LOG(ERR,
+			"Fail to add dev close task. ");
+			goto single_thrd;
+		}
+		priv->state = MLX5_VDPA_STATE_PROBED;
+		DRV_LOG(INFO, "vDPA device %d was closed.", vid);
+		return ret;
+	}
+single_thrd:
 	pthread_mutex_lock(&priv->steer_update_lock);
 	mlx5_vdpa_steer_unset(priv);
 	pthread_mutex_unlock(&priv->steer_update_lock);
@@ -278,10 +319,12 @@ mlx5_vdpa_dev_close(int vid)
 	mlx5_vdpa_drain_cq(priv);
 	if (priv->lm_mr.addr)
 		mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
-	priv->state = MLX5_VDPA_STATE_PROBED;
 	if (!priv->connected)
 		mlx5_vdpa_dev_cache_clean(priv);
 	priv->vid = 0;
+	__atomic_store_n(&priv->dev_close_progress, 0,
+		__ATOMIC_RELAXED);
+	priv->state = MLX5_VDPA_STATE_PROBED;
 	DRV_LOG(INFO, "vDPA device %d was closed.", vid);
 	return ret;
 }
@@ -302,6 +345,8 @@ mlx5_vdpa_dev_config(int vid)
 		DRV_LOG(ERR, "Failed to reconfigure vid %d.", vid);
 		return -1;
 	}
+	if (mlx5_vdpa_wait_dev_close_tasks_done(priv))
+		return -1;
 	priv->vid = vid;
 	priv->connected = true;
 	if (mlx5_vdpa_mtu_set(priv))
@@ -444,8 +489,11 @@ mlx5_vdpa_dev_cleanup(int vid)
 		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
 		return -1;
 	}
-	if (priv->state == MLX5_VDPA_STATE_PROBED)
+	if (priv->state == MLX5_VDPA_STATE_PROBED) {
+		if (priv->use_c_thread)
+			mlx5_vdpa_wait_dev_close_tasks_done(priv);
 		mlx5_vdpa_dev_cache_clean(priv);
+	}
 	priv->connected = false;
 	return 0;
 }
@@ -839,6 +887,8 @@ mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv)
 {
 	if (priv->state == MLX5_VDPA_STATE_CONFIGURED)
 		mlx5_vdpa_dev_close(priv->vid);
+	if (priv->use_c_thread)
+		mlx5_vdpa_wait_dev_close_tasks_done(priv);
 	mlx5_vdpa_release_dev_resources(priv);
 	if (priv->vdev)
 		rte_vdpa_unregister_device(priv->vdev);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index e08931719f..b6392b9d66 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -84,6 +84,7 @@ enum mlx5_vdpa_task_type {
 	MLX5_VDPA_TASK_REG_MR = 1,
 	MLX5_VDPA_TASK_SETUP_VIRTQ,
 	MLX5_VDPA_TASK_STOP_VIRTQ,
+	MLX5_VDPA_TASK_DEV_CLOSE_NOWAIT,
 };
 
 /* Generic task information and size must be multiple of 4B. */
@@ -206,6 +207,7 @@ struct mlx5_vdpa_priv {
 	uint64_t features; /* Negotiated features. */
 	uint16_t log_max_rqt_size;
 	uint16_t last_c_thrd_idx;
+	uint16_t dev_close_progress;
 	uint16_t num_mrs; /* Number of memory regions. */
 	struct mlx5_vdpa_steer steer;
 	struct mlx5dv_var *var;
@@ -578,4 +580,10 @@ mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
 		uint32_t *err_cnt, uint32_t sleep_time);
 int
 mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick);
+void
+mlx5_vdpa_vq_destroy(struct mlx5_vdpa_virtq *virtq);
+void
+mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv);
+void
+mlx5_vdpa_virtq_unreg_intr_handle_all(struct mlx5_vdpa_priv *priv);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index 98369f0887..bb2279440b 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -63,7 +63,8 @@ mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
 		task[i].type = task_type;
 		task[i].remaining_cnt = remaining_cnt;
 		task[i].err_cnt = err_cnt;
-		task[i].idx = data[i];
+		if (data)
+			task[i].idx = data[i];
 	}
 	if (!mlx5_vdpa_c_thrd_ring_enqueue_bulk(rng, (void **)&task, num, NULL))
 		return -1;
@@ -187,6 +188,23 @@ mlx5_vdpa_c_thread_handle(void *arg)
 			    MLX5_VDPA_USED_RING_LEN(virtq->vq_size));
 			pthread_mutex_unlock(&virtq->virtq_lock);
 			break;
+		case MLX5_VDPA_TASK_DEV_CLOSE_NOWAIT:
+			mlx5_vdpa_virtq_unreg_intr_handle_all(priv);
+			pthread_mutex_lock(&priv->steer_update_lock);
+			mlx5_vdpa_steer_unset(priv);
+			pthread_mutex_unlock(&priv->steer_update_lock);
+			mlx5_vdpa_virtqs_release(priv);
+			mlx5_vdpa_drain_cq(priv);
+			if (priv->lm_mr.addr)
+				mlx5_os_wrapped_mkey_destroy(
+					&priv->lm_mr);
+			if (!priv->connected)
+				mlx5_vdpa_dev_cache_clean(priv);
+			priv->vid = 0;
+			__atomic_store_n(
+				&priv->dev_close_progress, 0,
+				__ATOMIC_RELAXED);
+			break;
 		default:
 			DRV_LOG(ERR, "Invalid vdpa task type %d.",
 			task.type);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index db05220e76..a08c854b14 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -102,6 +102,20 @@ mlx5_vdpa_virtq_unregister_intr_handle(struct mlx5_vdpa_virtq *virtq)
 	virtq->intr_handle = NULL;
 }
 
+void
+mlx5_vdpa_virtq_unreg_intr_handle_all(struct mlx5_vdpa_priv *priv)
+{
+	uint32_t i;
+	struct mlx5_vdpa_virtq *virtq;
+
+	for (i = 0; i < priv->nr_virtqs; i++) {
+		virtq = &priv->virtqs[i];
+		pthread_mutex_lock(&virtq->virtq_lock);
+		mlx5_vdpa_virtq_unregister_intr_handle(virtq);
+		pthread_mutex_unlock(&virtq->virtq_lock);
+	}
+}
+
 /* Release cached VQ resources. */
 void
 mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
-- 
2.30.2


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v2 14/15] vdpa/mlx5: add virtq sub-resources creation
  2022-06-16  2:29 ` [PATCH v2 00/15] mlx5/vdpa: optimize live migration time Li Zhang
                     ` (12 preceding siblings ...)
  2022-06-16  2:30   ` [PATCH v2 13/15] vdpa/mlx5: add device close task Li Zhang
@ 2022-06-16  2:30   ` Li Zhang
  2022-06-16  2:30   ` [PATCH v2 15/15] vdpa/mlx5: prepare virtqueue resource creation Li Zhang
  2022-06-16  7:24   ` [PATCH v2 00/15] mlx5/vdpa: optimize live migration time Maxime Coquelin
  15 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-16  2:30 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba, Yajun Wu

Pre-create the virt-queue sub-resources in the device probe stage
and then modify the virtqueue in the device config stage.
The steering table also needs to support dummy virt-queues.
This accelerates the LM process and reduces its time by 40%.
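
A minimal standalone sketch of the queue-count clamping applied when
pre-creating virtq resources in the probe stage: the queues devarg counts
queue pairs, so twice that many virtqs are prepared, capped by the device
capability. Function and parameter names below are assumptions of the sketch.

#include <stdio.h>
#include <stdint.h>

static uint32_t
precreate_virtq_num(uint32_t queues_devarg, uint32_t cap_max_virtqs)
{
	uint32_t wanted = queues_devarg * 2;	/* RX + TX virtq per pair */

	return wanted < cap_max_virtqs ? wanted : cap_max_virtqs;
}

int
main(void)
{
	printf("%u virtqs pre-created\n", precreate_virtq_num(8, 256));
	return 0;
}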

Signed-off-by: Li Zhang <lizh@nvidia.com>
Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 72 +++++++--------------
 drivers/vdpa/mlx5/mlx5_vdpa.h       | 17 +++--
 drivers/vdpa/mlx5/mlx5_vdpa_event.c | 11 ++--
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c | 17 +++--
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 99 +++++++++++++++++++++--------
 5 files changed, 123 insertions(+), 93 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index d000854c08..f006a9cd3f 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -627,65 +627,39 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 static int
 mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 {
-	struct mlx5_vdpa_virtq *virtq;
+	uint32_t max_queues;
 	uint32_t index;
-	uint32_t i;
+	struct mlx5_vdpa_virtq *virtq;
 
-	for (index = 0; index < priv->caps.max_num_virtio_queues * 2;
+	for (index = 0; index < priv->caps.max_num_virtio_queues;
 		index++) {
 		virtq = &priv->virtqs[index];
 		pthread_mutex_init(&virtq->virtq_lock, NULL);
 	}
-	if (!priv->queues)
+	if (!priv->queues || !priv->queue_size)
 		return 0;
-	for (index = 0; index < (priv->queues * 2); ++index) {
+	max_queues = (priv->queues < priv->caps.max_num_virtio_queues) ?
+		(priv->queues * 2) : (priv->caps.max_num_virtio_queues);
+	for (index = 0; index < max_queues; ++index)
+		if (mlx5_vdpa_virtq_single_resource_prepare(priv,
+			index))
+			goto error;
+	if (mlx5_vdpa_is_modify_virtq_supported(priv))
+		if (mlx5_vdpa_steer_update(priv, true))
+			goto error;
+	return 0;
+error:
+	for (index = 0; index < max_queues; ++index) {
 		virtq = &priv->virtqs[index];
-		int ret = mlx5_vdpa_event_qp_prepare(priv, priv->queue_size,
-					-1, virtq);
-
-		if (ret) {
-			DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
-				index);
-			return -1;
-		}
-		if (priv->caps.queue_counters_valid) {
-			if (!virtq->counters)
-				virtq->counters =
-					mlx5_devx_cmd_create_virtio_q_counters
-						(priv->cdev->ctx);
-			if (!virtq->counters) {
-				DRV_LOG(ERR, "Failed to create virtq couners for virtq"
-					" %d.", index);
-				return -1;
-			}
-		}
-		for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
-			uint32_t size;
-			void *buf;
-			struct mlx5dv_devx_umem *obj;
-
-			size = priv->caps.umems[i].a * priv->queue_size +
-					priv->caps.umems[i].b;
-			buf = rte_zmalloc(__func__, size, 4096);
-			if (buf == NULL) {
-				DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
-						" %u.", i, index);
-				return -1;
-			}
-			obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx, buf,
-					size, IBV_ACCESS_LOCAL_WRITE);
-			if (obj == NULL) {
-				rte_free(buf);
-				DRV_LOG(ERR, "Failed to register umem %d for virtq %u.",
-						i, index);
-				return -1;
-			}
-			virtq->umems[i].size = size;
-			virtq->umems[i].buf = buf;
-			virtq->umems[i].obj = obj;
+		if (virtq->virtq) {
+			pthread_mutex_lock(&virtq->virtq_lock);
+			mlx5_vdpa_virtq_unset(virtq);
+			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
 	}
-	return 0;
+	if (mlx5_vdpa_is_modify_virtq_supported(priv))
+		mlx5_vdpa_steer_unset(priv);
+	return -1;
 }
 
 static int
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index b6392b9d66..f353db62ac 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -277,13 +277,15 @@ int mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv);
  *   The guest notification file descriptor.
  * @param[in/out] virtq
  *   Pointer to the virt-queue structure.
+ * @param[in] reset
+ *   If true, it will reset event qp.
  *
  * @return
  *   0 on success, -1 otherwise and rte_errno is set.
  */
 int
 mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
-	int callfd, struct mlx5_vdpa_virtq *virtq);
+	int callfd, struct mlx5_vdpa_virtq *virtq, bool reset);
 
 /**
  * Destroy an event QP and all its related resources.
@@ -403,11 +405,13 @@ void mlx5_vdpa_steer_unset(struct mlx5_vdpa_priv *priv);
  *
  * @param[in] priv
  *   The vdpa driver private structure.
+ * @param[in] is_dummy
+ *   If set, the steering is updated with dummy queues in order to prepare resources.
  *
  * @return
  *   0 on success, a negative value otherwise.
  */
-int mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv);
+int mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv, bool is_dummy);
 
 /**
  * Setup steering and all its related resources to enable RSS traffic from the
@@ -581,9 +585,14 @@ mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
 int
 mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick);
 void
-mlx5_vdpa_vq_destroy(struct mlx5_vdpa_virtq *virtq);
-void
 mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv);
 void
 mlx5_vdpa_virtq_unreg_intr_handle_all(struct mlx5_vdpa_priv *priv);
+bool
+mlx5_vdpa_virtq_single_resource_prepare(struct mlx5_vdpa_priv *priv,
+		int index);
+int
+mlx5_vdpa_qps2rst2rts(struct mlx5_vdpa_event_qp *eqp);
+void
+mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index f782b6b832..22f0920c88 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -249,7 +249,7 @@ mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv)
 {
 	unsigned int i;
 
-	for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
+	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		struct mlx5_vdpa_cq *cq = &priv->virtqs[i].eqp.cq;
 
 		mlx5_vdpa_queue_complete(cq);
@@ -618,7 +618,7 @@ mlx5_vdpa_qps2rts(struct mlx5_vdpa_event_qp *eqp)
 	return 0;
 }
 
-static int
+int
 mlx5_vdpa_qps2rst2rts(struct mlx5_vdpa_event_qp *eqp)
 {
 	if (mlx5_devx_cmd_modify_qp_state(eqp->fw_qp, MLX5_CMD_OP_QP_2RST,
@@ -638,7 +638,7 @@ mlx5_vdpa_qps2rst2rts(struct mlx5_vdpa_event_qp *eqp)
 
 int
 mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
-	int callfd, struct mlx5_vdpa_virtq *virtq)
+	int callfd, struct mlx5_vdpa_virtq *virtq, bool reset)
 {
 	struct mlx5_vdpa_event_qp *eqp = &virtq->eqp;
 	struct mlx5_devx_qp_attr attr = {0};
@@ -649,11 +649,10 @@ mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 		/* Reuse existing resources. */
 		eqp->cq.callfd = callfd;
 		/* FW will set event qp to error state in q destroy. */
-		if (!mlx5_vdpa_qps2rst2rts(eqp)) {
+		if (reset && !mlx5_vdpa_qps2rst2rts(eqp))
 			rte_write32(rte_cpu_to_be_32(RTE_BIT32(log_desc_n)),
 					&eqp->sw_qp.db_rec[0]);
-			return 0;
-		}
+		return 0;
 	}
 	if (eqp->fw_qp)
 		mlx5_vdpa_event_qp_destroy(eqp);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
index 4cbf09784e..c2e0a17ace 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
@@ -57,7 +57,7 @@ mlx5_vdpa_steer_unset(struct mlx5_vdpa_priv *priv)
  * -1 on error.
  */
 static int
-mlx5_vdpa_rqt_prepare(struct mlx5_vdpa_priv *priv)
+mlx5_vdpa_rqt_prepare(struct mlx5_vdpa_priv *priv, bool is_dummy)
 {
 	int i;
 	uint32_t rqt_n = RTE_MIN(MLX5_VDPA_DEFAULT_RQT_SIZE,
@@ -67,15 +67,20 @@ mlx5_vdpa_rqt_prepare(struct mlx5_vdpa_priv *priv)
 						      sizeof(uint32_t), 0);
 	uint32_t k = 0, j;
 	int ret = 0, num;
+	uint16_t nr_vring = is_dummy ?
+	(((priv->queues * 2) < priv->caps.max_num_virtio_queues) ?
+	(priv->queues * 2) : priv->caps.max_num_virtio_queues) : priv->nr_virtqs;
 
 	if (!attr) {
 		DRV_LOG(ERR, "Failed to allocate RQT attributes memory.");
 		rte_errno = ENOMEM;
 		return -ENOMEM;
 	}
-	for (i = 0; i < priv->nr_virtqs; i++) {
+	for (i = 0; i < nr_vring; i++) {
 		if (is_virtq_recvq(i, priv->nr_virtqs) &&
-		    priv->virtqs[i].enable && priv->virtqs[i].virtq) {
+			(is_dummy || (priv->virtqs[i].enable &&
+			priv->virtqs[i].configured)) &&
+			priv->virtqs[i].virtq) {
 			attr->rq_list[k] = priv->virtqs[i].virtq->id;
 			k++;
 		}
@@ -235,12 +240,12 @@ mlx5_vdpa_rss_flows_create(struct mlx5_vdpa_priv *priv)
 }
 
 int
-mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv)
+mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv, bool is_dummy)
 {
 	int ret;
 
 	pthread_mutex_lock(&priv->steer_update_lock);
-	ret = mlx5_vdpa_rqt_prepare(priv);
+	ret = mlx5_vdpa_rqt_prepare(priv, is_dummy);
 	if (ret == 0) {
 		mlx5_vdpa_steer_unset(priv);
 	} else if (ret < 0) {
@@ -261,7 +266,7 @@ mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv)
 int
 mlx5_vdpa_steer_setup(struct mlx5_vdpa_priv *priv)
 {
-	if (mlx5_vdpa_steer_update(priv))
+	if (mlx5_vdpa_steer_update(priv, false))
 		goto error;
 	return 0;
 error:
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index a08c854b14..20ce382487 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -146,10 +146,10 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 	}
 }
 
-static int
+void
 mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 {
-	int ret = -EAGAIN;
+	int ret;
 
 	mlx5_vdpa_virtq_unregister_intr_handle(virtq);
 	if (virtq->configured) {
@@ -157,12 +157,12 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 		if (ret)
 			DRV_LOG(WARNING, "Failed to stop virtq %d.",
 				virtq->index);
-		virtq->configured = 0;
 		claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
+		virtq->index = 0;
+		virtq->virtq = NULL;
+		virtq->configured = 0;
 	}
-	virtq->virtq = NULL;
 	virtq->notifier_state = MLX5_VDPA_NOTIFIER_STATE_DISABLED;
-	return 0;
 }
 
 void
@@ -175,6 +175,9 @@ mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv)
 		virtq = &priv->virtqs[i];
 		pthread_mutex_lock(&virtq->virtq_lock);
 		mlx5_vdpa_virtq_unset(virtq);
+		if (i < (priv->queues * 2))
+			mlx5_vdpa_virtq_single_resource_prepare(
+					priv, i);
 		pthread_mutex_unlock(&virtq->virtq_lock);
 	}
 	priv->features = 0;
@@ -258,7 +261,8 @@ mlx5_vdpa_hva_to_gpa(struct rte_vhost_memory *mem, uint64_t hva)
 static int
 mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 		struct mlx5_devx_virtq_attr *attr,
-		struct rte_vhost_vring *vq, int index)
+		struct rte_vhost_vring *vq,
+		int index, bool is_prepare)
 {
 	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
 	uint64_t gpa;
@@ -277,11 +281,15 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 			MLX5_VIRTQ_MODIFY_TYPE_Q_MKEY |
 			MLX5_VIRTQ_MODIFY_TYPE_QUEUE_FEATURE_BIT_MASK |
 			MLX5_VIRTQ_MODIFY_TYPE_EVENT_MODE;
-	attr->tso_ipv4 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO4));
-	attr->tso_ipv6 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO6));
-	attr->tx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_CSUM));
-	attr->rx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_GUEST_CSUM));
-	attr->virtio_version_1_0 =
+	attr->tso_ipv4 = is_prepare ? 1 :
+		!!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO4));
+	attr->tso_ipv6 = is_prepare ? 1 :
+		!!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO6));
+	attr->tx_csum = is_prepare ? 1 :
+		!!(priv->features & (1ULL << VIRTIO_NET_F_CSUM));
+	attr->rx_csum = is_prepare ? 1 :
+		!!(priv->features & (1ULL << VIRTIO_NET_F_GUEST_CSUM));
+	attr->virtio_version_1_0 = is_prepare ? 1 :
 		!!(priv->features & (1ULL << VIRTIO_F_VERSION_1));
 	attr->q_type =
 		(priv->features & (1ULL << VIRTIO_F_RING_PACKED)) ?
@@ -290,12 +298,12 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 	 * No need event QPs creation when the guest in poll mode or when the
 	 * capability allows it.
 	 */
-	attr->event_mode = vq->callfd != -1 ||
+	attr->event_mode = is_prepare || vq->callfd != -1 ||
 	!(priv->caps.event_mode & (1 << MLX5_VIRTQ_EVENT_MODE_NO_MSIX)) ?
 	MLX5_VIRTQ_EVENT_MODE_QP : MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
 	if (attr->event_mode == MLX5_VIRTQ_EVENT_MODE_QP) {
-		ret = mlx5_vdpa_event_qp_prepare(priv,
-				vq->size, vq->callfd, virtq);
+		ret = mlx5_vdpa_event_qp_prepare(priv, vq->size,
+				vq->callfd, virtq, !virtq->virtq);
 		if (ret) {
 			DRV_LOG(ERR,
 				"Failed to create event QPs for virtq %d.",
@@ -320,7 +328,7 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 		attr->counters_obj_id = virtq->counters->id;
 	}
 	/* Setup 3 UMEMs for each virtq. */
-	if (virtq->virtq) {
+	if (!virtq->virtq) {
 		for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
 			uint32_t size;
 			void *buf;
@@ -345,7 +353,7 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 			buf = rte_zmalloc(__func__,
 				size, 4096);
 			if (buf == NULL) {
-				DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
+				DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq."
 				" %u.", i, index);
 				return -1;
 			}
@@ -366,7 +374,7 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 			attr->umems[i].size = virtq->umems[i].size;
 		}
 	}
-	if (attr->q_type == MLX5_VIRTQ_TYPE_SPLIT) {
+	if (!is_prepare && attr->q_type == MLX5_VIRTQ_TYPE_SPLIT) {
 		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem_info.vmem,
 					   (uint64_t)(uintptr_t)vq->desc);
 		if (!gpa) {
@@ -389,21 +397,23 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 		}
 		attr->available_addr = gpa;
 	}
-	ret = rte_vhost_get_vring_base(priv->vid,
+	if (!is_prepare) {
+		ret = rte_vhost_get_vring_base(priv->vid,
 			index, &last_avail_idx, &last_used_idx);
-	if (ret) {
-		last_avail_idx = 0;
-		last_used_idx = 0;
-		DRV_LOG(WARNING, "Couldn't get vring base, idx are set to 0.");
-	} else {
-		DRV_LOG(INFO, "vid %d: Init last_avail_idx=%d, last_used_idx=%d for "
+		if (ret) {
+			last_avail_idx = 0;
+			last_used_idx = 0;
+			DRV_LOG(WARNING, "Couldn't get vring base, idx are set to 0.");
+		} else {
+			DRV_LOG(INFO, "vid %d: Init last_avail_idx=%d, last_used_idx=%d for "
 				"virtq %d.", priv->vid, last_avail_idx,
 				last_used_idx, index);
+		}
 	}
 	attr->hw_available_index = last_avail_idx;
 	attr->hw_used_index = last_used_idx;
 	attr->q_size = vq->size;
-	attr->mkey = priv->gpa_mkey_index;
+	attr->mkey = is_prepare ? 0 : priv->gpa_mkey_index;
 	attr->tis_id = priv->tiss[(index / 2) % priv->num_lag_ports]->id;
 	attr->queue_index = index;
 	attr->pd = priv->cdev->pdn;
@@ -416,6 +426,39 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 	return 0;
 }
 
+bool
+mlx5_vdpa_virtq_single_resource_prepare(struct mlx5_vdpa_priv *priv,
+		int index)
+{
+	struct mlx5_devx_virtq_attr attr = {0};
+	struct mlx5_vdpa_virtq *virtq;
+	struct rte_vhost_vring vq = {
+		.size = priv->queue_size,
+		.callfd = -1,
+	};
+	int ret;
+
+	virtq = &priv->virtqs[index];
+	virtq->index = index;
+	virtq->vq_size = vq.size;
+	virtq->configured = 0;
+	virtq->virtq = NULL;
+	ret = mlx5_vdpa_virtq_sub_objs_prepare(priv, &attr, &vq, index, true);
+	if (ret) {
+		DRV_LOG(ERR,
+		"Cannot prepare setup resource for virtq %d.", index);
+		return true;
+	}
+	if (mlx5_vdpa_is_modify_virtq_supported(priv)) {
+		virtq->virtq =
+		mlx5_devx_cmd_create_virtq(priv->cdev->ctx, &attr);
+		virtq->priv = priv;
+		if (!virtq->virtq)
+			return true;
+	}
+	return false;
+}
+
 bool
 mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv)
 {
@@ -473,7 +516,7 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick)
 	virtq->priv = priv;
 	virtq->stopped = 0;
 	ret = mlx5_vdpa_virtq_sub_objs_prepare(priv, &attr,
-				&vq, index);
+				&vq, index, false);
 	if (ret) {
 		DRV_LOG(ERR, "Failed to setup update virtq attr"
 			" %d.", index);
@@ -746,7 +789,7 @@ mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable)
 	if (virtq->configured) {
 		virtq->enable = 0;
 		if (is_virtq_recvq(virtq->index, priv->nr_virtqs)) {
-			ret = mlx5_vdpa_steer_update(priv);
+			ret = mlx5_vdpa_steer_update(priv, false);
 			if (ret)
 				DRV_LOG(WARNING, "Failed to disable steering "
 					"for virtq %d.", index);
@@ -761,7 +804,7 @@ mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable)
 		}
 		virtq->enable = 1;
 		if (is_virtq_recvq(virtq->index, priv->nr_virtqs)) {
-			ret = mlx5_vdpa_steer_update(priv);
+			ret = mlx5_vdpa_steer_update(priv, false);
 			if (ret)
 				DRV_LOG(WARNING, "Failed to enable steering "
 					"for virtq %d.", index);
-- 
2.30.2


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v2 15/15] vdpa/mlx5: prepare virtqueue resource creation
  2022-06-16  2:29 ` [PATCH v2 00/15] mlx5/vdpa: optimize live migration time Li Zhang
                     ` (13 preceding siblings ...)
  2022-06-16  2:30   ` [PATCH v2 14/15] vdpa/mlx5: add virtq sub-resources creation Li Zhang
@ 2022-06-16  2:30   ` Li Zhang
  2022-06-16  7:24   ` [PATCH v2 00/15] mlx5/vdpa: optimize live migration time Maxime Coquelin
  15 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-16  2:30 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

Split the virt-queue resource creation between
the configuration threads.
The virt-queue resources also need to be pre-created
again after virtq destruction.
This accelerates the LM process and reduces its time by 30%.
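
A minimal standalone sketch of the reuse check this patch introduces: a
pre-created virtq is kept after feature negotiation only if the attributes
frozen at creation time (rx_csum, virtio 1.0 layout, queue size) still match
what the guest negotiated, otherwise the pre-created resources are destroyed
and rebuilt. The structures below are illustrative assumptions and omit the
event-mode comparison done in the patch.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct precreated_vq {
	uint16_t vq_size;
	uint8_t rx_csum;
	uint8_t virtio_version_1_0;
};

struct negotiated_cfg {
	uint16_t vq_size;
	bool guest_csum;
	bool version_1;
};

/* Return true when the pre-created virtq cannot be reused as-is. */
static bool
vq_mismatch(const struct precreated_vq *vq, const struct negotiated_cfg *n)
{
	if (vq->rx_csum != (uint8_t)n->guest_csum)
		return true;
	if (vq->virtio_version_1_0 != (uint8_t)n->version_1)
		return true;
	if (vq->vq_size != n->vq_size)
		return true;
	return false;
}

int
main(void)
{
	struct precreated_vq vq = { .vq_size = 256, .rx_csum = 1,
				    .virtio_version_1_0 = 1 };
	struct negotiated_cfg n = { .vq_size = 256, .guest_csum = true,
				    .version_1 = true };

	printf("mismatch: %s\n", vq_mismatch(&vq, &n) ? "yes" : "no");
	return 0;
}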

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 doc/guides/rel_notes/release_22_07.rst |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.c          | 115 +++++++++++++++++++------
 drivers/vdpa/mlx5/mlx5_vdpa.h          |  12 ++-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c  |  15 +++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c    | 111 ++++++++++++++++++++----
 5 files changed, 209 insertions(+), 45 deletions(-)

diff --git a/doc/guides/rel_notes/release_22_07.rst b/doc/guides/rel_notes/release_22_07.rst
index 2056cd9ee7..e1a9796e5c 100644
--- a/doc/guides/rel_notes/release_22_07.rst
+++ b/doc/guides/rel_notes/release_22_07.rst
@@ -178,6 +178,7 @@ New Features
 * **Updated Nvidia mlx5 vDPA driver.**
 
   * Added new devargs ``queue_size`` and ``queues`` to allow prior creation of virtq resources.
+  * Added new devarg ``max_conf_threads`` to define the number of management threads used to parallelize the configuration.
 
 
 Removed Items
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index f006a9cd3f..c5d82872c7 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -275,23 +275,18 @@ mlx5_vdpa_wait_dev_close_tasks_done(struct mlx5_vdpa_priv *priv)
 }
 
 static int
-mlx5_vdpa_dev_close(int vid)
+_internal_mlx5_vdpa_dev_close(struct mlx5_vdpa_priv *priv,
+		bool release_resource)
 {
-	struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid);
-	struct mlx5_vdpa_priv *priv =
-		mlx5_vdpa_find_priv_resource_by_vdev(vdev);
 	int ret = 0;
+	int vid = priv->vid;
 
-	if (priv == NULL) {
-		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
-		return -1;
-	}
 	mlx5_vdpa_cqe_event_unset(priv);
 	if (priv->state == MLX5_VDPA_STATE_CONFIGURED) {
 		ret |= mlx5_vdpa_lm_log(priv);
 		priv->state = MLX5_VDPA_STATE_IN_PROGRESS;
 	}
-	if (priv->use_c_thread) {
+	if (priv->use_c_thread && !release_resource) {
 		if (priv->last_c_thrd_idx >=
 			(conf_thread_mng.max_thrds - 1))
 			priv->last_c_thrd_idx = 0;
@@ -315,7 +310,7 @@ mlx5_vdpa_dev_close(int vid)
 	pthread_mutex_lock(&priv->steer_update_lock);
 	mlx5_vdpa_steer_unset(priv);
 	pthread_mutex_unlock(&priv->steer_update_lock);
-	mlx5_vdpa_virtqs_release(priv);
+	mlx5_vdpa_virtqs_release(priv, release_resource);
 	mlx5_vdpa_drain_cq(priv);
 	if (priv->lm_mr.addr)
 		mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
@@ -329,6 +324,24 @@ mlx5_vdpa_dev_close(int vid)
 	return ret;
 }
 
+static int
+mlx5_vdpa_dev_close(int vid)
+{
+	struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid);
+	struct mlx5_vdpa_priv *priv;
+
+	if (!vdev) {
+		DRV_LOG(ERR, "Invalid vDPA device.");
+		return -1;
+	}
+	priv = mlx5_vdpa_find_priv_resource_by_vdev(vdev);
+	if (priv == NULL) {
+		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
+		return -1;
+	}
+	return _internal_mlx5_vdpa_dev_close(priv, false);
+}
+
 static int
 mlx5_vdpa_dev_config(int vid)
 {
@@ -624,11 +637,33 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 		priv->queue_size);
 }
 
+void
+mlx5_vdpa_prepare_virtq_destroy(struct mlx5_vdpa_priv *priv)
+{
+	uint32_t max_queues, index;
+	struct mlx5_vdpa_virtq *virtq;
+
+	if (!priv->queues || !priv->queue_size)
+		return;
+	max_queues = ((priv->queues * 2) < priv->caps.max_num_virtio_queues) ?
+		(priv->queues * 2) : (priv->caps.max_num_virtio_queues);
+	if (mlx5_vdpa_is_modify_virtq_supported(priv))
+		mlx5_vdpa_steer_unset(priv);
+	for (index = 0; index < max_queues; ++index) {
+		virtq = &priv->virtqs[index];
+		if (virtq->virtq) {
+			pthread_mutex_lock(&virtq->virtq_lock);
+			mlx5_vdpa_virtq_unset(virtq);
+			pthread_mutex_unlock(&virtq->virtq_lock);
+		}
+	}
+}
+
 static int
 mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 {
-	uint32_t max_queues;
-	uint32_t index;
+	uint32_t remaining_cnt = 0, err_cnt = 0, task_num = 0;
+	uint32_t max_queues, index, thrd_idx, data[1];
 	struct mlx5_vdpa_virtq *virtq;
 
 	for (index = 0; index < priv->caps.max_num_virtio_queues;
@@ -640,25 +675,53 @@ mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 		return 0;
 	max_queues = (priv->queues < priv->caps.max_num_virtio_queues) ?
 		(priv->queues * 2) : (priv->caps.max_num_virtio_queues);
-	for (index = 0; index < max_queues; ++index)
-		if (mlx5_vdpa_virtq_single_resource_prepare(priv,
-			index))
+	if (priv->use_c_thread) {
+		uint32_t main_task_idx[max_queues];
+
+		for (index = 0; index < max_queues; ++index) {
+			thrd_idx = index % (conf_thread_mng.max_thrds + 1);
+			if (!thrd_idx) {
+				main_task_idx[task_num] = index;
+				task_num++;
+				continue;
+			}
+			thrd_idx = priv->last_c_thrd_idx + 1;
+			if (thrd_idx >= conf_thread_mng.max_thrds)
+				thrd_idx = 0;
+			priv->last_c_thrd_idx = thrd_idx;
+			data[0] = index;
+			if (mlx5_vdpa_task_add(priv, thrd_idx,
+				MLX5_VDPA_TASK_PREPARE_VIRTQ,
+				&remaining_cnt, &err_cnt,
+				(void **)&data, 1)) {
+				DRV_LOG(ERR, "Fail to add "
+				"task prepare virtq (%d).", index);
+				main_task_idx[task_num] = index;
+				task_num++;
+			}
+		}
+		for (index = 0; index < task_num; ++index)
+			if (mlx5_vdpa_virtq_single_resource_prepare(priv,
+				main_task_idx[index]))
+				goto error;
+		if (mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining_cnt,
+			&err_cnt, 2000)) {
+			DRV_LOG(ERR,
+			"Failed to wait virt-queue prepare tasks ready.");
 			goto error;
+		}
+	} else {
+		for (index = 0; index < max_queues; ++index)
+			if (mlx5_vdpa_virtq_single_resource_prepare(priv,
+				index))
+				goto error;
+	}
 	if (mlx5_vdpa_is_modify_virtq_supported(priv))
 		if (mlx5_vdpa_steer_update(priv, true))
 			goto error;
 	return 0;
 error:
-	for (index = 0; index < max_queues; ++index) {
-		virtq = &priv->virtqs[index];
-		if (virtq->virtq) {
-			pthread_mutex_lock(&virtq->virtq_lock);
-			mlx5_vdpa_virtq_unset(virtq);
-			pthread_mutex_unlock(&virtq->virtq_lock);
-		}
-	}
-	if (mlx5_vdpa_is_modify_virtq_supported(priv))
-		mlx5_vdpa_steer_unset(priv);
+	mlx5_vdpa_prepare_virtq_destroy(priv);
 	return -1;
 }
 
@@ -860,7 +923,7 @@ static void
 mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv)
 {
 	if (priv->state == MLX5_VDPA_STATE_CONFIGURED)
-		mlx5_vdpa_dev_close(priv->vid);
+		_internal_mlx5_vdpa_dev_close(priv, true);
 	if (priv->use_c_thread)
 		mlx5_vdpa_wait_dev_close_tasks_done(priv);
 	mlx5_vdpa_release_dev_resources(priv);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index f353db62ac..dc4dfba5ed 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -85,6 +85,7 @@ enum mlx5_vdpa_task_type {
 	MLX5_VDPA_TASK_SETUP_VIRTQ,
 	MLX5_VDPA_TASK_STOP_VIRTQ,
 	MLX5_VDPA_TASK_DEV_CLOSE_NOWAIT,
+	MLX5_VDPA_TASK_PREPARE_VIRTQ,
 };
 
 /* Generic task information and size must be multiple of 4B. */
@@ -128,6 +129,9 @@ struct mlx5_vdpa_virtq {
 	uint32_t configured:1;
 	uint32_t enable:1;
 	uint32_t stopped:1;
+	uint32_t rx_csum:1;
+	uint32_t virtio_version_1_0:1;
+	uint32_t event_mode:3;
 	uint32_t version;
 	pthread_mutex_t virtq_lock;
 	struct mlx5_vdpa_priv *priv;
@@ -355,8 +359,12 @@ void mlx5_vdpa_err_event_unset(struct mlx5_vdpa_priv *priv);
  *
  * @param[in] priv
  *   The vdpa driver private structure.
+ * @param[in] release_resource
+ *   The vdpa driver release resource without prepare resource.
  */
-void mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv);
+void
+mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv,
+		bool release_resource);
 
 /**
  * Cleanup cached resources of all virtqs.
@@ -595,4 +603,6 @@ int
 mlx5_vdpa_qps2rst2rts(struct mlx5_vdpa_event_qp *eqp);
 void
 mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq);
+void
+mlx5_vdpa_prepare_virtq_destroy(struct mlx5_vdpa_priv *priv);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index bb2279440b..6e6624e5a3 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -153,6 +153,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
 				__atomic_fetch_add(
 					task.err_cnt, 1, __ATOMIC_RELAXED);
 			}
+			virtq->enable = 1;
 			pthread_mutex_unlock(&virtq->virtq_lock);
 			break;
 		case MLX5_VDPA_TASK_STOP_VIRTQ:
@@ -193,7 +194,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
 			pthread_mutex_lock(&priv->steer_update_lock);
 			mlx5_vdpa_steer_unset(priv);
 			pthread_mutex_unlock(&priv->steer_update_lock);
-			mlx5_vdpa_virtqs_release(priv);
+			mlx5_vdpa_virtqs_release(priv, false);
 			mlx5_vdpa_drain_cq(priv);
 			if (priv->lm_mr.addr)
 				mlx5_os_wrapped_mkey_destroy(
@@ -205,6 +206,18 @@ mlx5_vdpa_c_thread_handle(void *arg)
 				&priv->dev_close_progress, 0,
 				__ATOMIC_RELAXED);
 			break;
+		case MLX5_VDPA_TASK_PREPARE_VIRTQ:
+			ret = mlx5_vdpa_virtq_single_resource_prepare(
+					priv, task.idx);
+			if (ret) {
+				DRV_LOG(ERR,
+				"Failed to prepare virtq %d.",
+				task.idx);
+				__atomic_fetch_add(
+				task.err_cnt, 1,
+				__ATOMIC_RELAXED);
+			}
+			break;
 		default:
 			DRV_LOG(ERR, "Invalid vdpa task type %d.",
 			task.type);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 20ce382487..d4dd73f861 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -116,18 +116,29 @@ mlx5_vdpa_virtq_unreg_intr_handle_all(struct mlx5_vdpa_priv *priv)
 	}
 }
 
+static void
+mlx5_vdpa_vq_destroy(struct mlx5_vdpa_virtq *virtq)
+{
+	/* Clean pre-created resource in dev removal only */
+	claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
+	virtq->index = 0;
+	virtq->virtq = NULL;
+	virtq->configured = 0;
+}
+
 /* Release cached VQ resources. */
 void
 mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 {
 	unsigned int i, j;
 
+	mlx5_vdpa_steer_unset(priv);
 	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
-		if (virtq->index != i)
-			continue;
 		pthread_mutex_lock(&virtq->virtq_lock);
+		if (virtq->virtq)
+			mlx5_vdpa_vq_destroy(virtq);
 		for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
 			if (virtq->umems[j].obj) {
 				claim_zero(mlx5_glue->devx_umem_dereg
@@ -157,29 +168,37 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 		if (ret)
 			DRV_LOG(WARNING, "Failed to stop virtq %d.",
 				virtq->index);
-		claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
-		virtq->index = 0;
-		virtq->virtq = NULL;
-		virtq->configured = 0;
 	}
+	mlx5_vdpa_vq_destroy(virtq);
 	virtq->notifier_state = MLX5_VDPA_NOTIFIER_STATE_DISABLED;
 }
 
 void
-mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv)
+mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv,
+	bool release_resource)
 {
 	struct mlx5_vdpa_virtq *virtq;
-	int i;
-
-	for (i = 0; i < priv->nr_virtqs; i++) {
+	uint32_t i, max_virtq, valid_vq_num;
+
+	valid_vq_num = ((priv->queues * 2) < priv->caps.max_num_virtio_queues) ?
+		(priv->queues * 2) : priv->caps.max_num_virtio_queues;
+	max_virtq = (release_resource &&
+		(valid_vq_num) > priv->nr_virtqs) ?
+		(valid_vq_num) : priv->nr_virtqs;
+	for (i = 0; i < max_virtq; i++) {
 		virtq = &priv->virtqs[i];
 		pthread_mutex_lock(&virtq->virtq_lock);
 		mlx5_vdpa_virtq_unset(virtq);
-		if (i < (priv->queues * 2))
+		virtq->enable = 0;
+		if (!release_resource && i < valid_vq_num)
 			mlx5_vdpa_virtq_single_resource_prepare(
 					priv, i);
 		pthread_mutex_unlock(&virtq->virtq_lock);
 	}
+	if (!release_resource && priv->queues &&
+		mlx5_vdpa_is_modify_virtq_supported(priv))
+		if (mlx5_vdpa_steer_update(priv, true))
+			mlx5_vdpa_steer_unset(priv);
 	priv->features = 0;
 	priv->nr_virtqs = 0;
 }
@@ -455,6 +474,9 @@ mlx5_vdpa_virtq_single_resource_prepare(struct mlx5_vdpa_priv *priv,
 		virtq->priv = priv;
 		if (!virtq->virtq)
 			return true;
+		virtq->rx_csum = attr.rx_csum;
+		virtq->virtio_version_1_0 = attr.virtio_version_1_0;
+		virtq->event_mode = attr.event_mode;
 	}
 	return false;
 }
@@ -538,6 +560,9 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick)
 		goto error;
 	}
 	claim_zero(rte_vhost_enable_guest_notification(priv->vid, index, 1));
+	virtq->rx_csum = attr.rx_csum;
+	virtq->virtio_version_1_0 = attr.virtio_version_1_0;
+	virtq->event_mode = attr.event_mode;
 	virtq->configured = 1;
 	rte_spinlock_lock(&priv->db_lock);
 	rte_write32(virtq->index, priv->virtq_db_addr);
@@ -629,6 +654,31 @@ mlx5_vdpa_features_validate(struct mlx5_vdpa_priv *priv)
 	return 0;
 }
 
+static bool
+mlx5_vdpa_is_pre_created_vq_mismatch(struct mlx5_vdpa_priv *priv,
+		struct mlx5_vdpa_virtq *virtq)
+{
+	struct rte_vhost_vring vq;
+	uint32_t event_mode;
+
+	if (virtq->rx_csum !=
+		!!(priv->features & (1ULL << VIRTIO_NET_F_GUEST_CSUM)))
+		return true;
+	if (virtq->virtio_version_1_0 !=
+		!!(priv->features & (1ULL << VIRTIO_F_VERSION_1)))
+		return true;
+	if (rte_vhost_get_vhost_vring(priv->vid, virtq->index, &vq))
+		return true;
+	if (vq.size != virtq->vq_size)
+		return true;
+	event_mode = vq.callfd != -1 || !(priv->caps.event_mode &
+		(1 << MLX5_VIRTQ_EVENT_MODE_NO_MSIX)) ?
+		MLX5_VIRTQ_EVENT_MODE_QP : MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
+	if (virtq->event_mode != event_mode)
+		return true;
+	return false;
+}
+
 int
 mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 {
@@ -664,6 +714,15 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 			virtq = &priv->virtqs[i];
 			if (!virtq->enable)
 				continue;
+			if (priv->queues && virtq->virtq) {
+				if (mlx5_vdpa_is_pre_created_vq_mismatch(priv, virtq)) {
+					mlx5_vdpa_prepare_virtq_destroy(priv);
+					i = 0;
+					virtq = &priv->virtqs[i];
+					if (!virtq->enable)
+						continue;
+				}
+			}
 			thrd_idx = i % (conf_thread_mng.max_thrds + 1);
 			if (!thrd_idx) {
 				main_task_idx[task_num] = i;
@@ -693,6 +752,7 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 				pthread_mutex_unlock(&virtq->virtq_lock);
 				goto error;
 			}
+			virtq->enable = 1;
 			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
 		if (mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining_cnt,
@@ -724,20 +784,32 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 	} else {
 		for (i = 0; i < nr_vring; i++) {
 			virtq = &priv->virtqs[i];
+			if (!virtq->enable)
+				continue;
+			if (priv->queues && virtq->virtq) {
+				if (mlx5_vdpa_is_pre_created_vq_mismatch(priv,
+					virtq)) {
+					mlx5_vdpa_prepare_virtq_destroy(
+					priv);
+					i = 0;
+					virtq = &priv->virtqs[i];
+					if (!virtq->enable)
+						continue;
+				}
+			}
 			pthread_mutex_lock(&virtq->virtq_lock);
-			if (virtq->enable) {
-				if (mlx5_vdpa_virtq_setup(priv, i, true)) {
-					pthread_mutex_unlock(
+			if (mlx5_vdpa_virtq_setup(priv, i, true)) {
+				pthread_mutex_unlock(
 						&virtq->virtq_lock);
-					goto error;
-				}
+				goto error;
 			}
+			virtq->enable = 1;
 			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
 	}
 	return 0;
 error:
-	mlx5_vdpa_virtqs_release(priv);
+	mlx5_vdpa_virtqs_release(priv, true);
 	return -1;
 }
 
@@ -795,6 +867,11 @@ mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable)
 					"for virtq %d.", index);
 		}
 		mlx5_vdpa_virtq_unset(virtq);
+	} else {
+		if (virtq->virtq &&
+			mlx5_vdpa_is_pre_created_vq_mismatch(priv, virtq))
+			DRV_LOG(WARNING,
+			"Configuration mismatch dummy virtq %d.", index);
 	}
 	if (enable) {
 		ret = mlx5_vdpa_virtq_setup(priv, index, true);
-- 
2.30.2


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH v2 00/15] mlx5/vdpa: optimize live migration time
  2022-06-16  2:29 ` [PATCH v2 00/15] mlx5/vdpa: optimize live migration time Li Zhang
                     ` (14 preceding siblings ...)
  2022-06-16  2:30   ` [PATCH v2 15/15] vdpa/mlx5: prepare virtqueue resource creation Li Zhang
@ 2022-06-16  7:24   ` Maxime Coquelin
  2022-06-16  9:02     ` Maxime Coquelin
  15 siblings, 1 reply; 137+ messages in thread
From: Maxime Coquelin @ 2022-06-16  7:24 UTC (permalink / raw)
  To: Li Zhang, orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

Hi Li,

On 6/16/22 04:29, Li Zhang wrote:
> Allow the driver to use internal threads to
> obtain fast configuration.
> All the threads will be open on the same core of
> the event completion queue scheduling thread.
> 
> Add max_conf_threads parameter to configure
> the maximum number of internal threads in addition to
> the caller thread (8 is suggested).
> These internal threads to pipeline handle VDPA tasks
> in system and shared with all VDPA devices.
> Default is 0, don't use internal threads for configuration.
> 
> Depends-on: series=21868 ("vdpa/mlx5: improve device shutdown time")
> http://patchwork.dpdk.org/project/dpdk/list/?series=21868
> 
> RFC ("Add vDPA multi-threads optiomization")
> https://patchwork.dpdk.org/project/dpdk/cover/20220408075606.33056-1-lizh@nvidia.com/

I just noticed there was an RFC that was sent on time, but I missed it
because I was not cc'ed. I thought V1, which arrived on June 6th, was
targeting v22.11.

Given how late we are in the schedule for v22.07, this series will be
postponed to v22.11.

Regards,
Maxime

> V2:
> * Drop eal device removal patch in series.
> * Add release note in release_22_07.rst.
> 
> Li Zhang (12):
>    vdpa/mlx5: fix usage of capability for max number of virtqs
>    common/mlx5: extend virtq modifiable fields
>    vdpa/mlx5: pre-create virtq in the prob
>    vdpa/mlx5: optimize datapath-control synchronization
>    vdpa/mlx5: add multi-thread management for configuration
>    vdpa/mlx5: add task ring for MT management
>    vdpa/mlx5: add MT task for VM memory registration
>    vdpa/mlx5: add virtq creation task for MT management
>    vdpa/mlx5: add virtq LM log task
>    vdpa/mlx5: add device close task
>    vdpa/mlx5: add virtq sub-resources creation
>    vdpa/mlx5: prepare virtqueue resource creation
> 
> Yajun Wu (3):
>    vdpa/mlx5: support pre create virtq resource
>    common/mlx5: add DevX API to move QP to reset state
>    vdpa/mlx5: support event qp reuse
> 
>   doc/guides/rel_notes/release_22_07.rst |   5 +
>   doc/guides/vdpadevs/mlx5.rst           |  25 +
>   drivers/common/mlx5/mlx5_devx_cmds.c   |  77 ++-
>   drivers/common/mlx5/mlx5_devx_cmds.h   |   6 +-
>   drivers/common/mlx5/mlx5_prm.h         |  30 +-
>   drivers/vdpa/mlx5/meson.build          |   1 +
>   drivers/vdpa/mlx5/mlx5_vdpa.c          | 270 ++++++++--
>   drivers/vdpa/mlx5/mlx5_vdpa.h          | 152 +++++-
>   drivers/vdpa/mlx5/mlx5_vdpa_cthread.c  | 360 ++++++++++++++
>   drivers/vdpa/mlx5/mlx5_vdpa_event.c    | 160 ++++--
>   drivers/vdpa/mlx5/mlx5_vdpa_lm.c       | 128 ++++-
>   drivers/vdpa/mlx5/mlx5_vdpa_mem.c      | 270 ++++++----
>   drivers/vdpa/mlx5/mlx5_vdpa_steer.c    |  22 +-
>   drivers/vdpa/mlx5/mlx5_vdpa_virtq.c    | 654 ++++++++++++++++++-------
>   14 files changed, 1776 insertions(+), 384 deletions(-)
>   create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
> 


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH v2 00/15] mlx5/vdpa: optimize live migration time
  2022-06-16  7:24   ` [PATCH v2 00/15] mlx5/vdpa: optimize live migration time Maxime Coquelin
@ 2022-06-16  9:02     ` Maxime Coquelin
  2022-06-17  1:49       ` Li Zhang
  0 siblings, 1 reply; 137+ messages in thread
From: Maxime Coquelin @ 2022-06-16  9:02 UTC (permalink / raw)
  To: Li Zhang, orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba



On 6/16/22 09:24, Maxime Coquelin wrote:
> Hi Li,
> 
> On 6/16/22 04:29, Li Zhang wrote:
>> Allow the driver to use internal threads to
>> obtain fast configuration.
>> All the threads will be open on the same core of
>> the event completion queue scheduling thread.
>>
>> Add max_conf_threads parameter to configure
>> the maximum number of internal threads in addition to
>> the caller thread (8 is suggested).
>> These internal threads to pipeline handle VDPA tasks
>> in system and shared with all VDPA devices.
>> Default is 0, don't use internal threads for configuration.
>>
>> Depends-on: series=21868 ("vdpa/mlx5: improve device shutdown time")
>> http://patchwork.dpdk.org/project/dpdk/list/?series=21868
>>
>> RFC ("Add vDPA multi-threads optiomization")
>> https://patchwork.dpdk.org/project/dpdk/cover/20220408075606.33056-1-lizh@nvidia.com/ 
>>
> 
> I just noticed there was an RFC that was sent on time, but I missed it
> because I was not cc'ed. I thought V1, which arrived on June 6th, was
> targeting v22.11.

Ok, so checking with Thomas, the get_maintainer.pl script does not return
me for vDPA driver patches, so that explains why I'm not cc'ed
automatically.

Also, the auto-delegation script in patchwork seems to assign it to
Andrew, which is why I did not see it.

I'll try to review it tomorrow.

> Given how late we are in the schedule for v22.07, this series will be
> postponed to v22.11.
> 
> Regards,
> Maxime
> 
>> V2:
>> * Drop eal device removal patch in series.
>> * Add release note in release_22_07.rst.
>>
>> Li Zhang (12):
>>    vdpa/mlx5: fix usage of capability for max number of virtqs
>>    common/mlx5: extend virtq modifiable fields
>>    vdpa/mlx5: pre-create virtq in the prob
>>    vdpa/mlx5: optimize datapath-control synchronization
>>    vdpa/mlx5: add multi-thread management for configuration
>>    vdpa/mlx5: add task ring for MT management
>>    vdpa/mlx5: add MT task for VM memory registration
>>    vdpa/mlx5: add virtq creation task for MT management
>>    vdpa/mlx5: add virtq LM log task
>>    vdpa/mlx5: add device close task
>>    vdpa/mlx5: add virtq sub-resources creation
>>    vdpa/mlx5: prepare virtqueue resource creation
>>
>> Yajun Wu (3):
>>    vdpa/mlx5: support pre create virtq resource
>>    common/mlx5: add DevX API to move QP to reset state
>>    vdpa/mlx5: support event qp reuse
>>
>>   doc/guides/rel_notes/release_22_07.rst |   5 +
>>   doc/guides/vdpadevs/mlx5.rst           |  25 +
>>   drivers/common/mlx5/mlx5_devx_cmds.c   |  77 ++-
>>   drivers/common/mlx5/mlx5_devx_cmds.h   |   6 +-
>>   drivers/common/mlx5/mlx5_prm.h         |  30 +-
>>   drivers/vdpa/mlx5/meson.build          |   1 +
>>   drivers/vdpa/mlx5/mlx5_vdpa.c          | 270 ++++++++--
>>   drivers/vdpa/mlx5/mlx5_vdpa.h          | 152 +++++-
>>   drivers/vdpa/mlx5/mlx5_vdpa_cthread.c  | 360 ++++++++++++++
>>   drivers/vdpa/mlx5/mlx5_vdpa_event.c    | 160 ++++--
>>   drivers/vdpa/mlx5/mlx5_vdpa_lm.c       | 128 ++++-
>>   drivers/vdpa/mlx5/mlx5_vdpa_mem.c      | 270 ++++++----
>>   drivers/vdpa/mlx5/mlx5_vdpa_steer.c    |  22 +-
>>   drivers/vdpa/mlx5/mlx5_vdpa_virtq.c    | 654 ++++++++++++++++++-------
>>   14 files changed, 1776 insertions(+), 384 deletions(-)
>>   create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
>>


^ permalink raw reply	[flat|nested] 137+ messages in thread

* RE: [PATCH v2 00/15] mlx5/vdpa: optimize live migration time
  2022-06-16  9:02     ` Maxime Coquelin
@ 2022-06-17  1:49       ` Li Zhang
  0 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-17  1:49 UTC (permalink / raw)
  To: Maxime Coquelin, Ori Kam, Slava Ovsiienko, Matan Azrad, Shahaf Shuler
  Cc: dev, NBU-Contact-Thomas Monjalon (EXTERNAL),
	Raslan Darawsheh, Roni Bar Yanai

Hi Maxime,

Are there any comments about the patch?
Please let me know, and thanks for helping to review it.

Regards,
Li Zhang

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Thursday, June 16, 2022 5:02 PM
> To: Li Zhang <lizh@nvidia.com>; Ori Kam <orika@nvidia.com>; Slava
> Ovsiienko <viacheslavo@nvidia.com>; Matan Azrad <matan@nvidia.com>;
> Shahaf Shuler <shahafs@nvidia.com>
> Cc: dev@dpdk.org; NBU-Contact-Thomas Monjalon (EXTERNAL)
> <thomas@monjalon.net>; Raslan Darawsheh <rasland@nvidia.com>; Roni
> Bar Yanai <roniba@nvidia.com>
> Subject: Re: [PATCH v2 00/15] mlx5/vdpa: optimize live migration time
> 
> External email: Use caution opening links or attachments
> 
> 
> On 6/16/22 09:24, Maxime Coquelin wrote:
> > Hi Li,
> >
> > On 6/16/22 04:29, Li Zhang wrote:
> >> Allow the driver to use internal threads to obtain fast
> >> configuration.
> >> All the threads will be open on the same core of the event completion
> >> queue scheduling thread.
> >>
> >> Add max_conf_threads parameter to configure the maximum number of
> >> internal threads in addition to the caller thread (8 is suggested).
> >> These internal threads to pipeline handle VDPA tasks in system and
> >> shared with all VDPA devices.
> >> Default is 0, don't use internal threads for configuration.
> >>
> >> Depends-on: series=21868 ("vdpa/mlx5: improve device shutdown time")
> >> http://patchwork.dpdk.org/project/dpdk/list/?series=21868
> >>
> >> RFC ("Add vDPA multi-threads optiomization")
> >> https://patchwork.dpdk.org/project/dpdk/cover/20220408075606.33056-
> 1-
> >> lizh@nvidia.com/
> >>
> >
> > I just noticed there was an RFC that was sent on time, but I missed it
> > because I was not cc'ed. I thought V1, which arrived on June 6th, was
> > targeting v22.11.
> 
> Ok, so checking with Thomas, the get_maintainer.pl script does not return me
> for vDPA driver patches, so that explains why I'm not cc'ed automatically.
>
> Also, the auto-delegation script in patchwork seems to assign it to Andrew,
> which is why I did not see it.
> 
> I'll try to review it tomorrow.
> 
> > Given how late we are in the schedule for v22.07, this series will be
> > postponed to v22.11.
> >
> > Regards,
> > Maxime
> >
> >> V2:
> >> * Drop eal device removal patch in series.
> >> * Add release note in release_22_07.rst.
> >>
> >> Li Zhang (12):
> >>    vdpa/mlx5: fix usage of capability for max number of virtqs
> >>    common/mlx5: extend virtq modifiable fields
> >>    vdpa/mlx5: pre-create virtq in the prob
> >>    vdpa/mlx5: optimize datapath-control synchronization
> >>    vdpa/mlx5: add multi-thread management for configuration
> >>    vdpa/mlx5: add task ring for MT management
> >>    vdpa/mlx5: add MT task for VM memory registration
> >>    vdpa/mlx5: add virtq creation task for MT management
> >>    vdpa/mlx5: add virtq LM log task
> >>    vdpa/mlx5: add device close task
> >>    vdpa/mlx5: add virtq sub-resources creation
> >>    vdpa/mlx5: prepare virtqueue resource creation
> >>
> >> Yajun Wu (3):
> >>    vdpa/mlx5: support pre create virtq resource
> >>    common/mlx5: add DevX API to move QP to reset state
> >>    vdpa/mlx5: support event qp reuse
> >>
> >>   doc/guides/rel_notes/release_22_07.rst |   5 +
> >>   doc/guides/vdpadevs/mlx5.rst           |  25 +
> >>   drivers/common/mlx5/mlx5_devx_cmds.c   |  77 ++-
> >>   drivers/common/mlx5/mlx5_devx_cmds.h   |   6 +-
> >>   drivers/common/mlx5/mlx5_prm.h         |  30 +-
> >>   drivers/vdpa/mlx5/meson.build          |   1 +
> >>   drivers/vdpa/mlx5/mlx5_vdpa.c          | 270 ++++++++--
> >>   drivers/vdpa/mlx5/mlx5_vdpa.h          | 152 +++++-
> >>   drivers/vdpa/mlx5/mlx5_vdpa_cthread.c  | 360 ++++++++++++++
> >>   drivers/vdpa/mlx5/mlx5_vdpa_event.c    | 160 ++++--
> >>   drivers/vdpa/mlx5/mlx5_vdpa_lm.c       | 128 ++++-
> >>   drivers/vdpa/mlx5/mlx5_vdpa_mem.c      | 270 ++++++----
> >>   drivers/vdpa/mlx5/mlx5_vdpa_steer.c    |  22 +-
> >>   drivers/vdpa/mlx5/mlx5_vdpa_virtq.c    | 654 ++++++++++++++++++-------
> >>   14 files changed, 1776 insertions(+), 384 deletions(-)
> >>   create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
> >>


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH v2 01/15] vdpa/mlx5: fix usage of capability for max number of virtqs
  2022-06-16  2:29   ` [PATCH v2 01/15] vdpa/mlx5: fix usage of capability for max number of virtqs Li Zhang
@ 2022-06-17 14:27     ` Maxime Coquelin
  0 siblings, 0 replies; 137+ messages in thread
From: Maxime Coquelin @ 2022-06-17 14:27 UTC (permalink / raw)
  To: Li Zhang, orika, viacheslavo, matan, shahafs
  Cc: dev, thomas, rasland, roniba, stable



On 6/16/22 04:29, Li Zhang wrote:
> The driver wrongly takes the capability value for
> the number of virtq pairs instead of just the number of virtqs.
> 
> Adjust all the usages of it to be the number of virtqs.
> 
> Fixes: c2eb33a ("vdpa/mlx5: manage virtqs by array")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Li Zhang <lizh@nvidia.com>
> Acked-by: Matan Azrad <matan@nvidia.com>
> ---
>   drivers/vdpa/mlx5/mlx5_vdpa.c       | 12 ++++++------
>   drivers/vdpa/mlx5/mlx5_vdpa_virtq.c |  6 +++---
>   2 files changed, 9 insertions(+), 9 deletions(-)
> 

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH v2 02/15] vdpa/mlx5: support pre create virtq resource
  2022-06-16  2:29   ` [PATCH v2 02/15] vdpa/mlx5: support pre create virtq resource Li Zhang
@ 2022-06-17 15:36     ` Maxime Coquelin
  2022-06-18  8:04       ` Li Zhang
  0 siblings, 1 reply; 137+ messages in thread
From: Maxime Coquelin @ 2022-06-17 15:36 UTC (permalink / raw)
  To: Li Zhang, orika, viacheslavo, matan, shahafs
  Cc: dev, thomas, rasland, roniba, Yajun Wu



On 6/16/22 04:29, Li Zhang wrote:
> From: Yajun Wu <yajunw@nvidia.com>
> 
> The motivation of this change is to reduce vDPA device queue creation
> time by create some queue resource in vDPA device probe stage.

s/create/creating/

> 
> In VM live migration scenario, this can reduce 0.8ms for each queue
> creation, thus reduce LM network downtime.
> 
> To create queue resource(umem/counter) in advance, we need to know
> virtio queue depth and max number of queue VM will use.
> 
> Introduce two new devargs: queues(max queue pair number) and queue_size
> (queue depth). Two args must be both provided, if only one argument
> provided, the argument will be ignored and no pre-creation.
> 
> The queues and queue_size must also be identical to vhost configuration
> driver later receive. Otherwise either the pre-create resource is wasted
> or missing or the resource need destroy and recreate(in case queue_size
> mismatch).
> 
> Pre-create umem/counter will keep alive until vDPA device removal.
> 
> Signed-off-by: Yajun Wu <yajunw@nvidia.com>
> Acked-by: Matan Azrad <matan@nvidia.com>
> ---
>   doc/guides/vdpadevs/mlx5.rst  | 14 +++++++
>   drivers/vdpa/mlx5/mlx5_vdpa.c | 75 ++++++++++++++++++++++++++++++++++-
>   drivers/vdpa/mlx5/mlx5_vdpa.h |  2 +
>   3 files changed, 89 insertions(+), 2 deletions(-)
> 

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH v2 03/15] common/mlx5: add DevX API to move QP to reset state
  2022-06-16  2:30   ` [PATCH v2 03/15] common/mlx5: add DevX API to move QP to reset state Li Zhang
@ 2022-06-17 15:41     ` Maxime Coquelin
  0 siblings, 0 replies; 137+ messages in thread
From: Maxime Coquelin @ 2022-06-17 15:41 UTC (permalink / raw)
  To: Li Zhang, orika, viacheslavo, matan, shahafs
  Cc: dev, thomas, rasland, roniba, Yajun Wu



On 6/16/22 04:30, Li Zhang wrote:
> From: Yajun Wu <yajunw@nvidia.com>
> 
> Support set QP to RESET state.
> 
> Signed-off-by: Yajun Wu <yajunw@nvidia.com>
> Acked-by: Matan Azrad <matan@nvidia.com>
> ---
>   drivers/common/mlx5/mlx5_devx_cmds.c |  7 +++++++
>   drivers/common/mlx5/mlx5_prm.h       | 17 +++++++++++++++++
>   2 files changed, 24 insertions(+)
> 

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH v2 05/15] common/mlx5: extend virtq modifiable fields
  2022-06-16  2:30   ` [PATCH v2 05/15] common/mlx5: extend virtq modifiable fields Li Zhang
@ 2022-06-17 15:45     ` Maxime Coquelin
  0 siblings, 0 replies; 137+ messages in thread
From: Maxime Coquelin @ 2022-06-17 15:45 UTC (permalink / raw)
  To: Li Zhang, orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba



On 6/16/22 04:30, Li Zhang wrote:
> A virtq configuration can be modified after the virtq creation.
> Added the following modifiable fields:
> 1.address fields: desc_addr/used_addr/available_addr
> 2.hw_available_index
> 3.hw_used_index
> 4.virtio_q_type
> 5.version type
> 6.queue mkey
> 7.feature bit mask: tso_ipv4/tso_ipv6/tx_csum/rx_csum
> 8.event mode: event_mode/event_qpn_or_msix
> 
> Signed-off-by: Li Zhang <lizh@nvidia.com>
> Acked-by: Matan Azrad <matan@nvidia.com>
> ---
>   drivers/common/mlx5/mlx5_devx_cmds.c | 70 +++++++++++++++++++++++-----
>   drivers/common/mlx5/mlx5_devx_cmds.h |  6 ++-
>   drivers/common/mlx5/mlx5_prm.h       | 13 +++++-
>   3 files changed, 76 insertions(+), 13 deletions(-)
> 

Applied to dpdk-next-virtio/main.

Thanks,
Maxime


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH v2 06/15] vdpa/mlx5: pre-create virtq in the prob
  2022-06-16  2:30   ` [PATCH v2 06/15] vdpa/mlx5: pre-create virtq in the prob Li Zhang
@ 2022-06-17 15:53     ` Maxime Coquelin
  2022-06-18  7:54       ` Li Zhang
  0 siblings, 1 reply; 137+ messages in thread
From: Maxime Coquelin @ 2022-06-17 15:53 UTC (permalink / raw)
  To: Li Zhang, orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba

I would rename the title to something like:

"vdpa/mlx5: pre-create virtq at probe time"

On 6/16/22 04:30, Li Zhang wrote:
> dev_config operation is called in LM progress.
> LM time is very critical because all
> the VM packets are dropped directly at that time.
> 
> Move the virtq creation to probe time and
> only modify the configuration later in
> the dev_config stage using the new ability
> to modify virtq.
> 
> This optimization accelerates the LM process and
> reduces its time by 70%.

Nice.

> Signed-off-by: Li Zhang <lizh@nvidia.com>
> Acked-by: Matan Azrad <matan@nvidia.com>
> ---
>   doc/guides/rel_notes/release_22_07.rst |   4 +
>   drivers/vdpa/mlx5/mlx5_vdpa.h          |   4 +
>   drivers/vdpa/mlx5/mlx5_vdpa_lm.c       |  13 +-
>   drivers/vdpa/mlx5/mlx5_vdpa_virtq.c    | 257 +++++++++++++++----------
>   4 files changed, 174 insertions(+), 104 deletions(-)
> 
> diff --git a/doc/guides/rel_notes/release_22_07.rst b/doc/guides/rel_notes/release_22_07.rst
> index f2cf41def9..2056cd9ee7 100644
> --- a/doc/guides/rel_notes/release_22_07.rst
> +++ b/doc/guides/rel_notes/release_22_07.rst
> @@ -175,6 +175,10 @@ New Features
>     This is a fall-back implementation for platforms that
>     don't support vector operations.
>   
> +* **Updated Nvidia mlx5 vDPA driver.**
> +
> +  * Added new devargs ``queue_size`` and ``queues`` to allow prior creation of virtq resources.
> +
>   
>   Removed Items
>   -------------
> diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
> index bf82026e37..e5553079fe 100644
> --- a/drivers/vdpa/mlx5/mlx5_vdpa.h
> +++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
> @@ -80,6 +80,7 @@ struct mlx5_vdpa_virtq {
>   	uint16_t vq_size;
>   	uint8_t notifier_state;
>   	bool stopped;
> +	uint32_t configured:1;
>   	uint32_t version;
>   	struct mlx5_vdpa_priv *priv;
>   	struct mlx5_devx_obj *virtq;
> @@ -489,4 +490,7 @@ mlx5_vdpa_virtq_stats_reset(struct mlx5_vdpa_priv *priv, int qid);
>    */
>   void
>   mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv);
> +
> +bool
> +mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv);
>   #endif /* RTE_PMD_MLX5_VDPA_H_ */
> diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
> index 43a2b98255..a8faf0c116 100644
> --- a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
> +++ b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
> @@ -12,14 +12,17 @@ int
>   mlx5_vdpa_logging_enable(struct mlx5_vdpa_priv *priv, int enable)
>   {
>   	struct mlx5_devx_virtq_attr attr = {
> -		.type = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE,
> +		.mod_fields_bitmap =
> +			MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE,
>   		.dirty_bitmap_dump_enable = enable,
>   	};
> +	struct mlx5_vdpa_virtq *virtq;
>   	int i;
>   
>   	for (i = 0; i < priv->nr_virtqs; ++i) {
>   		attr.queue_index = i;
> -		if (!priv->virtqs[i].virtq) {
> +		virtq = &priv->virtqs[i];
> +		if (!virtq->configured) {
>   			DRV_LOG(DEBUG, "virtq %d is invalid for dirty bitmap "
>   				"enabling.", i);

Please avoid cutting logs, it makes it harder to grep in the code.
Also, now we can have up to 100 chars, so maybe it would fit anyway.
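
To illustrate (a sketch only, reusing the DRV_LOG call from the hunk quoted
above), the uncut form of that message would be:

	DRV_LOG(DEBUG, "virtq %d is invalid for dirty bitmap enabling.", i);

rather than splitting the string constant across two source lines.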


Other than that:

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime


^ permalink raw reply	[flat|nested] 137+ messages in thread

* RE: [PATCH v2 06/15] vdpa/mlx5: pre-create virtq in the prob
  2022-06-17 15:53     ` Maxime Coquelin
@ 2022-06-18  7:54       ` Li Zhang
  0 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-18  7:54 UTC (permalink / raw)
  To: Maxime Coquelin, Ori Kam, Slava Ovsiienko, Matan Azrad, Shahaf Shuler
  Cc: dev, NBU-Contact-Thomas Monjalon (EXTERNAL),
	Raslan Darawsheh, Roni Bar Yanai

Thanks for your comments; I will fix them in V3.

Regards,
Li Zhang

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, June 17, 2022 11:54 PM
> To: Li Zhang <lizh@nvidia.com>; Ori Kam <orika@nvidia.com>; Slava
> Ovsiienko <viacheslavo@nvidia.com>; Matan Azrad <matan@nvidia.com>;
> Shahaf Shuler <shahafs@nvidia.com>
> Cc: dev@dpdk.org; NBU-Contact-Thomas Monjalon (EXTERNAL)
> <thomas@monjalon.net>; Raslan Darawsheh <rasland@nvidia.com>; Roni
> Bar Yanai <roniba@nvidia.com>
> Subject: Re: [PATCH v2 06/15] vdpa/mlx5: pre-create virtq in the prob
> 
> External email: Use caution opening links or attachments
> 
> 
> I would rename the title to something like:
> 
> "vdpa/mlx5: pre-create virtq at probe time"
> 
> On 6/16/22 04:30, Li Zhang wrote:
> > dev_config operation is called in LM progress.
> > LM time is very critical because all
> > the VM packets are dropped directly at that time.
> >
> > Move the virtq creation to probe time and only modify the
> > configuration later in the dev_config stage using the new ability to
> > modify virtq.
> >
> > This optimization accelerates the LM process and reduces its time by
> > 70%.
> 
> Nice.
> 
> > Signed-off-by: Li Zhang <lizh@nvidia.com>
> > Acked-by: Matan Azrad <matan@nvidia.com>
> > ---
> >   doc/guides/rel_notes/release_22_07.rst |   4 +
> >   drivers/vdpa/mlx5/mlx5_vdpa.h          |   4 +
> >   drivers/vdpa/mlx5/mlx5_vdpa_lm.c       |  13 +-
> >   drivers/vdpa/mlx5/mlx5_vdpa_virtq.c    | 257 +++++++++++++++----------
> >   4 files changed, 174 insertions(+), 104 deletions(-)
> >
> > diff --git a/doc/guides/rel_notes/release_22_07.rst
> > b/doc/guides/rel_notes/release_22_07.rst
> > index f2cf41def9..2056cd9ee7 100644
> > --- a/doc/guides/rel_notes/release_22_07.rst
> > +++ b/doc/guides/rel_notes/release_22_07.rst
> > @@ -175,6 +175,10 @@ New Features
> >     This is a fall-back implementation for platforms that
> >     don't support vector operations.
> >
> > +* **Updated Nvidia mlx5 vDPA driver.**
> > +
> > +  * Added new devargs ``queue_size`` and ``queues`` to allow prior
> creation of virtq resources.
> > +
> >
> >   Removed Items
> >   -------------
> > diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h
> > b/drivers/vdpa/mlx5/mlx5_vdpa.h index bf82026e37..e5553079fe 100644
> > --- a/drivers/vdpa/mlx5/mlx5_vdpa.h
> > +++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
> > @@ -80,6 +80,7 @@ struct mlx5_vdpa_virtq {
> >       uint16_t vq_size;
> >       uint8_t notifier_state;
> >       bool stopped;
> > +     uint32_t configured:1;
> >       uint32_t version;
> >       struct mlx5_vdpa_priv *priv;
> >       struct mlx5_devx_obj *virtq;
> > @@ -489,4 +490,7 @@ mlx5_vdpa_virtq_stats_reset(struct
> mlx5_vdpa_priv *priv, int qid);
> >    */
> >   void
> >   mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv);
> > +
> > +bool
> > +mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv);
> >   #endif /* RTE_PMD_MLX5_VDPA_H_ */
> > diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
> > b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
> > index 43a2b98255..a8faf0c116 100644
> > --- a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
> > +++ b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
> > @@ -12,14 +12,17 @@ int
> >   mlx5_vdpa_logging_enable(struct mlx5_vdpa_priv *priv, int enable)
> >   {
> >       struct mlx5_devx_virtq_attr attr = {
> > -             .type =
> MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE,
> > +             .mod_fields_bitmap =
> > +                     MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE,
> >               .dirty_bitmap_dump_enable = enable,
> >       };
> > +     struct mlx5_vdpa_virtq *virtq;
> >       int i;
> >
> >       for (i = 0; i < priv->nr_virtqs; ++i) {
> >               attr.queue_index = i;
> > -             if (!priv->virtqs[i].virtq) {
> > +             virtq = &priv->virtqs[i];
> > +             if (!virtq->configured) {
> >                       DRV_LOG(DEBUG, "virtq %d is invalid for dirty bitmap "
> >                               "enabling.", i);
> 
> Please avoid cutting logs, it makes it harder to grep in the code.
> Also, now we can have up to 100 chars, so maybe it would fit anyway.
> 
> 
> Other than that:
> 
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> 
> Thanks,
> Maxime


^ permalink raw reply	[flat|nested] 137+ messages in thread

* RE: [PATCH v2 02/15] vdpa/mlx5: support pre create virtq resource
  2022-06-17 15:36     ` Maxime Coquelin
@ 2022-06-18  8:04       ` Li Zhang
  0 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-18  8:04 UTC (permalink / raw)
  To: Maxime Coquelin, Ori Kam, Slava Ovsiienko, Matan Azrad, Shahaf Shuler
  Cc: dev, NBU-Contact-Thomas Monjalon (EXTERNAL),
	Raslan Darawsheh, Roni Bar Yanai, Yajun Wu

Thanks for your comment; I will fix it in V3.

Regards,
Li Zhang

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, June 17, 2022 11:37 PM
> To: Li Zhang <lizh@nvidia.com>; Ori Kam <orika@nvidia.com>; Slava
> Ovsiienko <viacheslavo@nvidia.com>; Matan Azrad <matan@nvidia.com>;
> Shahaf Shuler <shahafs@nvidia.com>
> Cc: dev@dpdk.org; NBU-Contact-Thomas Monjalon (EXTERNAL)
> <thomas@monjalon.net>; Raslan Darawsheh <rasland@nvidia.com>; Roni
> Bar Yanai <roniba@nvidia.com>; Yajun Wu <yajunw@nvidia.com>
> Subject: Re: [PATCH v2 02/15] vdpa/mlx5: support pre create virtq resource
> 
> External email: Use caution opening links or attachments
> 
> 
> On 6/16/22 04:29, Li Zhang wrote:
> > From: Yajun Wu <yajunw@nvidia.com>
> >
> > The motivation of this change is to reduce vDPA device queue creation
> > time by create some queue resource in vDPA device probe stage.
> 
> s/create/creating/
> 
> >
> > In VM live migration scenario, this can reduce 0.8ms for each queue
> > creation, thus reduce LM network downtime.
> >
> > To create queue resource(umem/counter) in advance, we need to know
> > virtio queue depth and max number of queue VM will use.
> >
> > Introduce two new devargs: queues(max queue pair number) and
> > queue_size (queue depth). Two args must be both provided, if only one
> > argument provided, the argument will be ignored and no pre-creation.
> >
> > The queues and queue_size must also be identical to vhost
> > configuration driver later receive. Otherwise either the pre-create
> > resource is wasted or missing or the resource need destroy and
> > recreate(in case queue_size mismatch).
> >
> > Pre-create umem/counter will keep alive until vDPA device removal.
> >
> > Signed-off-by: Yajun Wu <yajunw@nvidia.com>
> > Acked-by: Matan Azrad <matan@nvidia.com>
> > ---
> >   doc/guides/vdpadevs/mlx5.rst  | 14 +++++++
> >   drivers/vdpa/mlx5/mlx5_vdpa.c | 75
> ++++++++++++++++++++++++++++++++++-
> >   drivers/vdpa/mlx5/mlx5_vdpa.h |  2 +
> >   3 files changed, 89 insertions(+), 2 deletions(-)
> >
> 
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> 
> Thanks,
> Maxime


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v3 00/15] mlx5/vdpa: optimize live migration time
  2022-04-08  7:55 [RFC 00/15] Add vDPA multi-threads optiomization Li Zhang
                   ` (17 preceding siblings ...)
  2022-06-16  2:29 ` [PATCH v2 00/15] mlx5/vdpa: optimize live migration time Li Zhang
@ 2022-06-18  8:47 ` Li Zhang
  2022-06-18  8:47   ` [PATCH v3 01/15] vdpa/mlx5: fix usage of capability for max number of virtqs Li Zhang
                     ` (14 more replies)
  2022-06-18  9:02 ` [PATCH v4 00/15] mlx5/vdpa: optimize live migration time Li Zhang
  19 siblings, 15 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-18  8:47 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs
  Cc: dev, thomas, rasland, roniba, maxime.coquelin

Allow the driver to use internal threads to
obtain fast configuration.
All the threads will be open on the same core of
the event completion queue scheduling thread.

Add max_conf_threads parameter to configure
the maximum number of internal threads in addition to
the caller thread (8 is suggested).
These internal threads to pipeline handle VDPA tasks
in system and shared with all VDPA devices.
Default is 0, don't use internal threads for configuration.
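
As a usage sketch only (the PCI address below is a placeholder and the
usual mlx5 "class=vdpa" devargs syntax is assumed), the internal threads
would be enabled with a devarg such as:

    -a 0000:01:00.2,class=vdpa,max_conf_threads=8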

Depends-on: series=21868 ("vdpa/mlx5: improve device shutdown time")
http://patchwork.dpdk.org/project/dpdk/list/?series=21868

RFC ("Add vDPA multi-threads optiomization")
https://patchwork.dpdk.org/project/dpdk/cover/20220408075606.33056-1-lizh@nvidia.com/

V2:
* Drop eal device removal patch in series.
* Add release note in release_22_07.rst.

V3:
* Fix commit log issues raised in review comments.
* Avoid cutting logs.

Li Zhang (12):
  vdpa/mlx5: fix usage of capability for max number of virtqs
  common/mlx5: extend virtq modifiable fields
  vdpa/mlx5: pre-create virtq at probe time
  vdpa/mlx5: optimize datapath-control synchronization
  vdpa/mlx5: add multi-thread management for configuration
  vdpa/mlx5: add task ring for MT management
  vdpa/mlx5: add MT task for VM memory registration
  vdpa/mlx5: add virtq creation task for MT management
  vdpa/mlx5: add virtq LM log task
  vdpa/mlx5: add device close task
  vdpa/mlx5: add virtq sub-resources creation
  vdpa/mlx5: prepare virtqueue resource creation

Yajun Wu (3):
  vdpa/mlx5: support pre create virtq resource
  common/mlx5: add DevX API to move QP to reset state
  vdpa/mlx5: support event qp reuse

 doc/guides/rel_notes/release_22_07.rst |   5 +
 doc/guides/vdpadevs/mlx5.rst           |  25 +
 drivers/common/mlx5/mlx5_devx_cmds.c   |  77 ++-
 drivers/common/mlx5/mlx5_devx_cmds.h   |   6 +-
 drivers/common/mlx5/mlx5_prm.h         |  30 +-
 drivers/vdpa/mlx5/meson.build          |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.c          | 270 ++++++++--
 drivers/vdpa/mlx5/mlx5_vdpa.h          | 152 +++++-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c  | 360 ++++++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_event.c    | 160 ++++--
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c       | 132 +++--
 drivers/vdpa/mlx5/mlx5_vdpa_mem.c      | 270 ++++++----
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c    |  22 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c    | 654 ++++++++++++++++++-------
 14 files changed, 1777 insertions(+), 387 deletions(-)
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c

-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v3 01/15] vdpa/mlx5: fix usage of capability for max number of virtqs
  2022-06-18  8:47 ` [PATCH v3 " Li Zhang
@ 2022-06-18  8:47   ` Li Zhang
  2022-06-18  8:47   ` [PATCH v3 02/15] vdpa/mlx5: support pre create virtq resource Li Zhang
                     ` (13 subsequent siblings)
  14 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-18  8:47 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs, Maxime Coquelin
  Cc: dev, thomas, rasland, roniba, stable

The driver wrongly takes the capability value for
the number of virtq pairs instead of just the number of virtqs.

Adjust all the usages of it to be the number of virtqs.

Fixes: c2eb33a ("vdpa/mlx5: manage virtqs by array")
Cc: stable@dpdk.org

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 12 ++++++------
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c |  6 +++---
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 76fa5d4299..ee71339b78 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -84,7 +84,7 @@ mlx5_vdpa_get_queue_num(struct rte_vdpa_device *vdev, uint32_t *queue_num)
 		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
 		return -1;
 	}
-	*queue_num = priv->caps.max_num_virtio_queues;
+	*queue_num = priv->caps.max_num_virtio_queues / 2;
 	return 0;
 }
 
@@ -141,7 +141,7 @@ mlx5_vdpa_set_vring_state(int vid, int vring, int state)
 		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
 		return -EINVAL;
 	}
-	if (vring >= (int)priv->caps.max_num_virtio_queues * 2) {
+	if (vring >= (int)priv->caps.max_num_virtio_queues) {
 		DRV_LOG(ERR, "Too big vring id: %d.", vring);
 		return -E2BIG;
 	}
@@ -388,7 +388,7 @@ mlx5_vdpa_get_stats(struct rte_vdpa_device *vdev, int qid,
 		DRV_LOG(ERR, "Invalid device: %s.", vdev->device->name);
 		return -ENODEV;
 	}
-	if (qid >= (int)priv->caps.max_num_virtio_queues * 2) {
+	if (qid >= (int)priv->caps.max_num_virtio_queues) {
 		DRV_LOG(ERR, "Too big vring id: %d for device %s.", qid,
 				vdev->device->name);
 		return -E2BIG;
@@ -411,7 +411,7 @@ mlx5_vdpa_reset_stats(struct rte_vdpa_device *vdev, int qid)
 		DRV_LOG(ERR, "Invalid device: %s.", vdev->device->name);
 		return -ENODEV;
 	}
-	if (qid >= (int)priv->caps.max_num_virtio_queues * 2) {
+	if (qid >= (int)priv->caps.max_num_virtio_queues) {
 		DRV_LOG(ERR, "Too big vring id: %d for device %s.", qid,
 				vdev->device->name);
 		return -E2BIG;
@@ -624,7 +624,7 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 		DRV_LOG(DEBUG, "No capability to support virtq statistics.");
 	priv = rte_zmalloc("mlx5 vDPA device private", sizeof(*priv) +
 			   sizeof(struct mlx5_vdpa_virtq) *
-			   attr->vdpa.max_num_virtio_queues * 2,
+			   attr->vdpa.max_num_virtio_queues,
 			   RTE_CACHE_LINE_SIZE);
 	if (!priv) {
 		DRV_LOG(ERR, "Failed to allocate private memory.");
@@ -685,7 +685,7 @@ mlx5_vdpa_release_dev_resources(struct mlx5_vdpa_priv *priv)
 	uint32_t i;
 
 	mlx5_vdpa_dev_cache_clean(priv);
-	for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
+	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		if (!priv->virtqs[i].counters)
 			continue;
 		claim_zero(mlx5_devx_cmd_destroy(priv->virtqs[i].counters));
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index e025be47d2..c258eb3024 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -72,7 +72,7 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 {
 	unsigned int i, j;
 
-	for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
+	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
 		for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
@@ -492,9 +492,9 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 		DRV_LOG(INFO, "TSO is enabled without CSUM, force CSUM.");
 		priv->features |= (1ULL << VIRTIO_NET_F_CSUM);
 	}
-	if (nr_vring > priv->caps.max_num_virtio_queues * 2) {
+	if (nr_vring > priv->caps.max_num_virtio_queues) {
 		DRV_LOG(ERR, "Do not support more than %d virtqs(%d).",
-			(int)priv->caps.max_num_virtio_queues * 2,
+			(int)priv->caps.max_num_virtio_queues,
 			(int)nr_vring);
 		return -1;
 	}
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v3 02/15] vdpa/mlx5: support pre create virtq resource
  2022-06-18  8:47 ` [PATCH v3 " Li Zhang
  2022-06-18  8:47   ` [PATCH v3 01/15] vdpa/mlx5: fix usage of capability for max number of virtqs Li Zhang
@ 2022-06-18  8:47   ` Li Zhang
  2022-06-18  8:47   ` [PATCH v3 03/15] common/mlx5: add DevX API to move QP to reset state Li Zhang
                     ` (12 subsequent siblings)
  14 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-18  8:47 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs
  Cc: dev, thomas, rasland, roniba, maxime.coquelin, Yajun Wu

From: Yajun Wu <yajunw@nvidia.com>

The motivation of this change is to reduce vDPA device queue creation
time by creating some queue resource in vDPA device probe stage.

In a VM live migration scenario, this can save 0.8 ms per queue
creation, thus reducing the LM network downtime.

To create queue resources (umem/counter) in advance, we need to know
the virtio queue depth and the maximum number of queues the VM will use.

Introduce two new devargs: queues (max queue pair number) and queue_size
(queue depth). Both arguments must be provided; if only one of them is
provided, it is ignored and no pre-creation is done.

The queues and queue_size must also be identical to the vhost
configuration the driver later receives. Otherwise the pre-created
resources are either wasted or missing, or they need to be destroyed and
recreated (in case of a queue_size mismatch).

The pre-created umem/counter will stay alive until vDPA device removal.
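
As a rough illustration only (the PCI address below is a placeholder and
the usual mlx5 "class=vdpa" devargs syntax is assumed), pre-creating
resources for 8 queue pairs of depth 256 would be requested with a devarg
like:

    -a 0000:01:00.2,class=vdpa,queues=8,queue_size=256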

Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 doc/guides/vdpadevs/mlx5.rst  | 14 +++++++
 drivers/vdpa/mlx5/mlx5_vdpa.c | 75 ++++++++++++++++++++++++++++++++++-
 drivers/vdpa/mlx5/mlx5_vdpa.h |  2 +
 3 files changed, 89 insertions(+), 2 deletions(-)

diff --git a/doc/guides/vdpadevs/mlx5.rst b/doc/guides/vdpadevs/mlx5.rst
index 3ded142311..0ad77bf535 100644
--- a/doc/guides/vdpadevs/mlx5.rst
+++ b/doc/guides/vdpadevs/mlx5.rst
@@ -101,6 +101,20 @@ for an additional list of options shared with other mlx5 drivers.
 
   - 0, HW default.
 
+- ``queue_size`` parameter [int]
+
+  - 1 - 1024, Virtio queue depth for pre-creating queue resource to speed up
+    first time queue creation. Set it together with queues devarg.
+
+  - 0, default value, no pre-create virtq resource.
+
+- ``queues`` parameter [int]
+
+  - 1 - 128, Max number of virtio queue pairs (including 1 rx queue and 1 tx queue)
+    for pre-create queue resource to speed up first time queue creation. Set it
+    together with queue_size devarg.
+
+  - 0, default value, no pre-create virtq resource.
 
 Error handling
 ^^^^^^^^^^^^^^
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index ee71339b78..faf833ee2f 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -244,7 +244,9 @@ mlx5_vdpa_mtu_set(struct mlx5_vdpa_priv *priv)
 static void
 mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv)
 {
-	mlx5_vdpa_virtqs_cleanup(priv);
+	/* Clean pre-created resource in dev removal only. */
+	if (!priv->queues)
+		mlx5_vdpa_virtqs_cleanup(priv);
 	mlx5_vdpa_mem_dereg(priv);
 }
 
@@ -494,6 +496,12 @@ mlx5_vdpa_args_check_handler(const char *key, const char *val, void *opaque)
 		priv->hw_max_latency_us = (uint32_t)tmp;
 	} else if (strcmp(key, "hw_max_pending_comp") == 0) {
 		priv->hw_max_pending_comp = (uint32_t)tmp;
+	} else if (strcmp(key, "queue_size") == 0) {
+		priv->queue_size = (uint16_t)tmp;
+	} else if (strcmp(key, "queues") == 0) {
+		priv->queues = (uint16_t)tmp;
+	} else {
+		DRV_LOG(WARNING, "Invalid key %s.", key);
 	}
 	return 0;
 }
@@ -524,9 +532,68 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 	if (!priv->event_us &&
 	    priv->event_mode == MLX5_VDPA_EVENT_MODE_DYNAMIC_TIMER)
 		priv->event_us = MLX5_VDPA_DEFAULT_TIMER_STEP_US;
+	if ((priv->queue_size && !priv->queues) ||
+		(!priv->queue_size && priv->queues)) {
+		priv->queue_size = 0;
+		priv->queues = 0;
+		DRV_LOG(WARNING, "Please provide both queue_size and queues.");
+	}
 	DRV_LOG(DEBUG, "event mode is %d.", priv->event_mode);
 	DRV_LOG(DEBUG, "event_us is %u us.", priv->event_us);
 	DRV_LOG(DEBUG, "no traffic max is %u.", priv->no_traffic_max);
+	DRV_LOG(DEBUG, "queues is %u, queue_size is %u.", priv->queues,
+		priv->queue_size);
+}
+
+static int
+mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
+{
+	uint32_t index;
+	uint32_t i;
+
+	if (!priv->queues)
+		return 0;
+	for (index = 0; index < (priv->queues * 2); ++index) {
+		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
+
+		if (priv->caps.queue_counters_valid) {
+			if (!virtq->counters)
+				virtq->counters =
+					mlx5_devx_cmd_create_virtio_q_counters
+						(priv->cdev->ctx);
+			if (!virtq->counters) {
+				DRV_LOG(ERR, "Failed to create virtq counters for virtq"
+					" %d.", index);
+				return -1;
+			}
+		}
+		for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
+			uint32_t size;
+			void *buf;
+			struct mlx5dv_devx_umem *obj;
+
+			size = priv->caps.umems[i].a * priv->queue_size +
+					priv->caps.umems[i].b;
+			buf = rte_zmalloc(__func__, size, 4096);
+			if (buf == NULL) {
+				DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
+						" %u.", i, index);
+				return -1;
+			}
+			obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx, buf,
+					size, IBV_ACCESS_LOCAL_WRITE);
+			if (obj == NULL) {
+				rte_free(buf);
+				DRV_LOG(ERR, "Failed to register umem %d for virtq %u.",
+						i, index);
+				return -1;
+			}
+			virtq->umems[i].size = size;
+			virtq->umems[i].buf = buf;
+			virtq->umems[i].obj = obj;
+		}
+	}
+	return 0;
 }
 
 static int
@@ -604,6 +671,8 @@ mlx5_vdpa_create_dev_resources(struct mlx5_vdpa_priv *priv)
 		return -rte_errno;
 	if (mlx5_vdpa_event_qp_global_prepare(priv))
 		return -rte_errno;
+	if (mlx5_vdpa_virtq_resource_prepare(priv))
+		return -rte_errno;
 	return 0;
 }
 
@@ -638,6 +707,7 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 		priv->num_lag_ports = 1;
 	pthread_mutex_init(&priv->vq_config_lock, NULL);
 	priv->cdev = cdev;
+	mlx5_vdpa_config_get(mkvlist, priv);
 	if (mlx5_vdpa_create_dev_resources(priv))
 		goto error;
 	priv->vdev = rte_vdpa_register_device(cdev->dev, &mlx5_vdpa_ops);
@@ -646,7 +716,6 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 		rte_errno = rte_errno ? rte_errno : EINVAL;
 		goto error;
 	}
-	mlx5_vdpa_config_get(mkvlist, priv);
 	SLIST_INIT(&priv->mr_list);
 	pthread_mutex_lock(&priv_list_lock);
 	TAILQ_INSERT_TAIL(&priv_list, priv, next);
@@ -684,6 +753,8 @@ mlx5_vdpa_release_dev_resources(struct mlx5_vdpa_priv *priv)
 {
 	uint32_t i;
 
+	if (priv->queues)
+		mlx5_vdpa_virtqs_cleanup(priv);
 	mlx5_vdpa_dev_cache_clean(priv);
 	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		if (!priv->virtqs[i].counters)
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index e7f3319f89..f6719a3c60 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -135,6 +135,8 @@ struct mlx5_vdpa_priv {
 	uint8_t hw_latency_mode; /* Hardware CQ moderation mode. */
 	uint16_t hw_max_latency_us; /* Hardware CQ moderation period in usec. */
 	uint16_t hw_max_pending_comp; /* Hardware CQ moderation counter. */
+	uint16_t queue_size; /* virtq depth for pre-creating virtq resource */
+	uint16_t queues; /* Max virtq pair for pre-creating virtq resource */
 	struct rte_vdpa_device *vdev; /* vDPA device. */
 	struct mlx5_common_device *cdev; /* Backend mlx5 device. */
 	int vid; /* vhost device id. */
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v3 03/15] common/mlx5: add DevX API to move QP to reset state
  2022-06-18  8:47 ` [PATCH v3 " Li Zhang
  2022-06-18  8:47   ` [PATCH v3 01/15] vdpa/mlx5: fix usage of capability for max number of virtqs Li Zhang
  2022-06-18  8:47   ` [PATCH v3 02/15] vdpa/mlx5: support pre create virtq resource Li Zhang
@ 2022-06-18  8:47   ` Li Zhang
  2022-06-18  8:47   ` [PATCH v3 04/15] vdpa/mlx5: support event qp reuse Li Zhang
                     ` (11 subsequent siblings)
  14 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-18  8:47 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs
  Cc: dev, thomas, rasland, roniba, maxime.coquelin, Yajun Wu

From: Yajun Wu <yajunw@nvidia.com>

Support set QP to RESET state.

Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c |  7 +++++++
 drivers/common/mlx5/mlx5_prm.h       | 17 +++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index c6bdbc12bb..1d6d6578d6 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -2264,11 +2264,13 @@ mlx5_devx_cmd_modify_qp_state(struct mlx5_devx_obj *qp, uint32_t qp_st_mod_op,
 		uint32_t rst2init[MLX5_ST_SZ_DW(rst2init_qp_in)];
 		uint32_t init2rtr[MLX5_ST_SZ_DW(init2rtr_qp_in)];
 		uint32_t rtr2rts[MLX5_ST_SZ_DW(rtr2rts_qp_in)];
+		uint32_t qp2rst[MLX5_ST_SZ_DW(2rst_qp_in)];
 	} in;
 	union {
 		uint32_t rst2init[MLX5_ST_SZ_DW(rst2init_qp_out)];
 		uint32_t init2rtr[MLX5_ST_SZ_DW(init2rtr_qp_out)];
 		uint32_t rtr2rts[MLX5_ST_SZ_DW(rtr2rts_qp_out)];
+		uint32_t qp2rst[MLX5_ST_SZ_DW(2rst_qp_out)];
 	} out;
 	void *qpc;
 	int ret;
@@ -2311,6 +2313,11 @@ mlx5_devx_cmd_modify_qp_state(struct mlx5_devx_obj *qp, uint32_t qp_st_mod_op,
 		inlen = sizeof(in.rtr2rts);
 		outlen = sizeof(out.rtr2rts);
 		break;
+	case MLX5_CMD_OP_QP_2RST:
+		MLX5_SET(2rst_qp_in, &in, qpn, qp->id);
+		inlen = sizeof(in.qp2rst);
+		outlen = sizeof(out.qp2rst);
+		break;
 	default:
 		DRV_LOG(ERR, "Invalid or unsupported QP modify op %u.",
 			qp_st_mod_op);
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index bc3e70a1d1..8a2f55c33e 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -3657,6 +3657,23 @@ struct mlx5_ifc_init2init_qp_in_bits {
 	u8 reserved_at_800[0x80];
 };
 
+struct mlx5_ifc_2rst_qp_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_2rst_qp_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 vhca_tunnel_id[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_80[0x8];
+	u8 qpn[0x18];
+	u8 reserved_at_a0[0x20];
+};
+
 struct mlx5_ifc_dealloc_pd_out_bits {
 	u8 status[0x8];
 	u8 reserved_0[0x18];
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v3 04/15] vdpa/mlx5: support event qp reuse
  2022-06-18  8:47 ` [PATCH v3 " Li Zhang
                     ` (2 preceding siblings ...)
  2022-06-18  8:47   ` [PATCH v3 03/15] common/mlx5: add DevX API to move QP to reset state Li Zhang
@ 2022-06-18  8:47   ` Li Zhang
  2022-06-18  8:47   ` [PATCH v3 05/15] common/mlx5: extend virtq modifiable fields Li Zhang
                     ` (10 subsequent siblings)
  14 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-18  8:47 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs
  Cc: dev, thomas, rasland, roniba, maxime.coquelin, Yajun Wu

From: Yajun Wu <yajunw@nvidia.com>

To speed up queue creation time, the event qp and cq are created only
once. Each virtq creation then reuses the same event qp and cq.

Because FW sets the event qp to the error state during virtq destroy,
the event qp needs to be modified to the RESET state, then to the RTS
state as usual. This can save about 1.5 ms for each virtq creation.

After a SW qp reset, the qp pi/ci both become 0 while the cq pi/ci keep
their previous values. Add a new variable qp_pi to save the SW qp ci and
move the qp pi independently of the cq ci.

Add a new function, mlx5_vdpa_drain_cq, to drain the cq CQEs after virtq
release.

Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c       |  8 ++++
 drivers/vdpa/mlx5/mlx5_vdpa.h       | 12 +++++-
 drivers/vdpa/mlx5/mlx5_vdpa_event.c | 60 +++++++++++++++++++++++++++--
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c |  6 +--
 4 files changed, 78 insertions(+), 8 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index faf833ee2f..ee99952e11 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -269,6 +269,7 @@ mlx5_vdpa_dev_close(int vid)
 	}
 	mlx5_vdpa_steer_unset(priv);
 	mlx5_vdpa_virtqs_release(priv);
+	mlx5_vdpa_drain_cq(priv);
 	if (priv->lm_mr.addr)
 		mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
 	priv->state = MLX5_VDPA_STATE_PROBED;
@@ -555,7 +556,14 @@ mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 		return 0;
 	for (index = 0; index < (priv->queues * 2); ++index) {
 		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
+		int ret = mlx5_vdpa_event_qp_prepare(priv, priv->queue_size,
+					-1, &virtq->eqp);
 
+		if (ret) {
+			DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
+				index);
+			return -1;
+		}
 		if (priv->caps.queue_counters_valid) {
 			if (!virtq->counters)
 				virtq->counters =
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index f6719a3c60..bf82026e37 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -55,6 +55,7 @@ struct mlx5_vdpa_event_qp {
 	struct mlx5_vdpa_cq cq;
 	struct mlx5_devx_obj *fw_qp;
 	struct mlx5_devx_qp sw_qp;
+	uint16_t qp_pi;
 };
 
 struct mlx5_vdpa_query_mr {
@@ -226,7 +227,7 @@ int mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv);
  * @return
  *   0 on success, -1 otherwise and rte_errno is set.
  */
-int mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
+int mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 			      int callfd, struct mlx5_vdpa_event_qp *eqp);
 
 /**
@@ -479,4 +480,13 @@ mlx5_vdpa_virtq_stats_get(struct mlx5_vdpa_priv *priv, int qid,
  */
 int
 mlx5_vdpa_virtq_stats_reset(struct mlx5_vdpa_priv *priv, int qid);
+
+/**
+ * Drain virtq CQ CQE.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ */
+void
+mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index 7167a98db0..b43dca9255 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -137,7 +137,7 @@ mlx5_vdpa_cq_poll(struct mlx5_vdpa_cq *cq)
 		};
 		uint32_t word;
 	} last_word;
-	uint16_t next_wqe_counter = cq->cq_ci;
+	uint16_t next_wqe_counter = eqp->qp_pi;
 	uint16_t cur_wqe_counter;
 	uint16_t comp;
 
@@ -156,9 +156,10 @@ mlx5_vdpa_cq_poll(struct mlx5_vdpa_cq *cq)
 		rte_io_wmb();
 		/* Ring CQ doorbell record. */
 		cq->cq_obj.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci);
+		eqp->qp_pi += comp;
 		rte_io_wmb();
 		/* Ring SW QP doorbell record. */
-		eqp->sw_qp.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci + cq_size);
+		eqp->sw_qp.db_rec[0] = rte_cpu_to_be_32(eqp->qp_pi + cq_size);
 	}
 	return comp;
 }
@@ -232,6 +233,25 @@ mlx5_vdpa_queues_complete(struct mlx5_vdpa_priv *priv)
 	return max;
 }
 
+void
+mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv)
+{
+	unsigned int i;
+
+	for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
+		struct mlx5_vdpa_cq *cq = &priv->virtqs[i].eqp.cq;
+
+		mlx5_vdpa_queue_complete(cq);
+		if (cq->cq_obj.cq) {
+			cq->cq_obj.cqes[0].wqe_counter =
+				rte_cpu_to_be_16(UINT16_MAX);
+			priv->virtqs[i].eqp.qp_pi = 0;
+			if (!cq->armed)
+				mlx5_vdpa_cq_arm(priv, cq);
+		}
+	}
+}
+
 /* Wait on all CQs channel for completion event. */
 static struct mlx5_vdpa_cq *
 mlx5_vdpa_event_wait(struct mlx5_vdpa_priv *priv __rte_unused)
@@ -574,14 +594,44 @@ mlx5_vdpa_qps2rts(struct mlx5_vdpa_event_qp *eqp)
 	return 0;
 }
 
+static int
+mlx5_vdpa_qps2rst2rts(struct mlx5_vdpa_event_qp *eqp)
+{
+	if (mlx5_devx_cmd_modify_qp_state(eqp->fw_qp, MLX5_CMD_OP_QP_2RST,
+					  eqp->sw_qp.qp->id)) {
+		DRV_LOG(ERR, "Failed to modify FW QP to RST state(%u).",
+			rte_errno);
+		return -1;
+	}
+	if (mlx5_devx_cmd_modify_qp_state(eqp->sw_qp.qp,
+			MLX5_CMD_OP_QP_2RST, eqp->fw_qp->id)) {
+		DRV_LOG(ERR, "Failed to modify SW QP to RST state(%u).",
+			rte_errno);
+		return -1;
+	}
+	return mlx5_vdpa_qps2rts(eqp);
+}
+
 int
-mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
+mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 			  int callfd, struct mlx5_vdpa_event_qp *eqp)
 {
 	struct mlx5_devx_qp_attr attr = {0};
 	uint16_t log_desc_n = rte_log2_u32(desc_n);
 	uint32_t ret;
 
+	if (eqp->cq.cq_obj.cq != NULL && log_desc_n == eqp->cq.log_desc_n) {
+		/* Reuse existing resources. */
+		eqp->cq.callfd = callfd;
+		/* FW will set event qp to error state in q destroy. */
+		if (!mlx5_vdpa_qps2rst2rts(eqp)) {
+			rte_write32(rte_cpu_to_be_32(RTE_BIT32(log_desc_n)),
+					&eqp->sw_qp.db_rec[0]);
+			return 0;
+		}
+	}
+	if (eqp->fw_qp)
+		mlx5_vdpa_event_qp_destroy(eqp);
 	if (mlx5_vdpa_cq_create(priv, log_desc_n, callfd, &eqp->cq))
 		return -1;
 	attr.pd = priv->cdev->pdn;
@@ -608,8 +658,10 @@ mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 	}
 	if (mlx5_vdpa_qps2rts(eqp))
 		goto error;
+	eqp->qp_pi = 0;
 	/* First ringing. */
-	rte_write32(rte_cpu_to_be_32(RTE_BIT32(log_desc_n)),
+	if (eqp->sw_qp.db_rec)
+		rte_write32(rte_cpu_to_be_32(RTE_BIT32(log_desc_n)),
 			&eqp->sw_qp.db_rec[0]);
 	return 0;
 error:
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index c258eb3024..6637ba1503 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -87,6 +87,8 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 			}
 			virtq->umems[j].size = 0;
 		}
+		if (virtq->eqp.fw_qp)
+			mlx5_vdpa_event_qp_destroy(&virtq->eqp);
 	}
 }
 
@@ -117,8 +119,6 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 		claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
 	}
 	virtq->virtq = NULL;
-	if (virtq->eqp.fw_qp)
-		mlx5_vdpa_event_qp_destroy(&virtq->eqp);
 	virtq->notifier_state = MLX5_VDPA_NOTIFIER_STATE_DISABLED;
 	return 0;
 }
@@ -246,7 +246,7 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 						      MLX5_VIRTQ_EVENT_MODE_QP :
 						  MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
 	if (attr.event_mode == MLX5_VIRTQ_EVENT_MODE_QP) {
-		ret = mlx5_vdpa_event_qp_create(priv, vq.size, vq.callfd,
+		ret = mlx5_vdpa_event_qp_prepare(priv, vq.size, vq.callfd,
 						&virtq->eqp);
 		if (ret) {
 			DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v3 05/15] common/mlx5: extend virtq modifiable fields
  2022-06-18  8:47 ` [PATCH v3 " Li Zhang
                     ` (3 preceding siblings ...)
  2022-06-18  8:47   ` [PATCH v3 04/15] vdpa/mlx5: support event qp reuse Li Zhang
@ 2022-06-18  8:47   ` Li Zhang
  2022-06-18  8:47   ` [PATCH v3 06/15] vdpa/mlx5: pre-create virtq at probe time Li Zhang
                     ` (9 subsequent siblings)
  14 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-18  8:47 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs
  Cc: dev, thomas, rasland, roniba, maxime.coquelin

A virtq configuration can be modified after the virtq is created.
Add the following modifiable fields:
1. Address fields: desc_addr/used_addr/available_addr
2. hw_available_index
3. hw_used_index
4. virtio_q_type
5. version type
6. queue mkey
7. feature bit mask: tso_ipv4/tso_ipv6/tx_csum/rx_csum
8. event mode: event_mode/event_qpn_or_msix
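
Below is a minimal illustrative sketch (not part of the patch) of how a
caller could combine several of the new modify flags in a single command.
It assumes the driver-internal headers that declare mlx5_devx_virtq_attr
and mlx5_devx_cmd_modify_virtq are available; the helper name and its
parameters are hypothetical.

/*
 * Hypothetical helper (sketch only): re-point an existing virtq to new
 * ring addresses and set it ready again in one modify command, instead
 * of destroying and re-creating the object.
 */
static int
example_virtq_readdress(struct mlx5_devx_obj *virtq_obj, uint16_t index,
			uint64_t desc, uint64_t used, uint64_t avail,
			uint16_t avail_idx, uint16_t used_idx)
{
	struct mlx5_devx_virtq_attr attr = {
		.queue_index = index,
		/* Each bit selects one group of fields to modify. */
		.mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_STATE |
				     MLX5_VIRTQ_MODIFY_TYPE_ADDR |
				     MLX5_VIRTQ_MODIFY_TYPE_HW_AVAILABLE_INDEX |
				     MLX5_VIRTQ_MODIFY_TYPE_HW_USED_INDEX,
		.state = MLX5_VIRTQ_STATE_RDY,
		.desc_addr = desc,
		.used_addr = used,
		.available_addr = avail,
		.hw_available_index = avail_idx,
		.hw_used_index = used_idx,
	};

	/* One FW command covers all the selected fields. */
	return mlx5_devx_cmd_modify_virtq(virtq_obj, &attr);
}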

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c | 70 +++++++++++++++++++++++-----
 drivers/common/mlx5/mlx5_devx_cmds.h |  6 ++-
 drivers/common/mlx5/mlx5_prm.h       | 13 +++++-
 3 files changed, 76 insertions(+), 13 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 1d6d6578d6..1b68c37092 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -545,6 +545,15 @@ mlx5_devx_cmd_query_hca_vdpa_attr(void *ctx,
 		vdpa_attr->log_doorbell_stride =
 			MLX5_GET(virtio_emulation_cap, hcattr,
 				 log_doorbell_stride);
+		vdpa_attr->vnet_modify_ext =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 vnet_modify_ext);
+		vdpa_attr->virtio_net_q_addr_modify =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 virtio_net_q_addr_modify);
+		vdpa_attr->virtio_q_index_modify =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 virtio_q_index_modify);
 		vdpa_attr->log_doorbell_bar_size =
 			MLX5_GET(virtio_emulation_cap, hcattr,
 				 log_doorbell_bar_size);
@@ -2074,27 +2083,66 @@ mlx5_devx_cmd_modify_virtq(struct mlx5_devx_obj *virtq_obj,
 	MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_type,
 		 MLX5_GENERAL_OBJ_TYPE_VIRTQ);
 	MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_id, virtq_obj->id);
-	MLX5_SET64(virtio_net_q, virtq, modify_field_select, attr->type);
+	MLX5_SET64(virtio_net_q, virtq, modify_field_select,
+		attr->mod_fields_bitmap);
 	MLX5_SET16(virtio_q, virtctx, queue_index, attr->queue_index);
-	switch (attr->type) {
-	case MLX5_VIRTQ_MODIFY_TYPE_STATE:
+	if (!attr->mod_fields_bitmap) {
+		DRV_LOG(ERR, "Failed to modify VIRTQ for no type set.");
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_STATE)
 		MLX5_SET16(virtio_net_q, virtq, state, attr->state);
-		break;
-	case MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS:
+	if (attr->mod_fields_bitmap &
+	    MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS) {
 		MLX5_SET(virtio_net_q, virtq, dirty_bitmap_mkey,
 			 attr->dirty_bitmap_mkey);
 		MLX5_SET64(virtio_net_q, virtq, dirty_bitmap_addr,
 			 attr->dirty_bitmap_addr);
 		MLX5_SET(virtio_net_q, virtq, dirty_bitmap_size,
 			 attr->dirty_bitmap_size);
-		break;
-	case MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE:
+	}
+	if (attr->mod_fields_bitmap &
+	    MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE)
 		MLX5_SET(virtio_net_q, virtq, dirty_bitmap_dump_enable,
 			 attr->dirty_bitmap_dump_enable);
-		break;
-	default:
-		rte_errno = EINVAL;
-		return -rte_errno;
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_QUEUE_PERIOD) {
+		MLX5_SET(virtio_q, virtctx, queue_period_mode,
+			attr->hw_latency_mode);
+		MLX5_SET(virtio_q, virtctx, queue_period_us,
+			attr->hw_max_latency_us);
+		MLX5_SET(virtio_q, virtctx, queue_max_count,
+			attr->hw_max_pending_comp);
+	}
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_ADDR) {
+		MLX5_SET64(virtio_q, virtctx, desc_addr, attr->desc_addr);
+		MLX5_SET64(virtio_q, virtctx, used_addr, attr->used_addr);
+		MLX5_SET64(virtio_q, virtctx, available_addr,
+			attr->available_addr);
+	}
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_HW_AVAILABLE_INDEX)
+		MLX5_SET16(virtio_net_q, virtq, hw_available_index,
+		   attr->hw_available_index);
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_HW_USED_INDEX)
+		MLX5_SET16(virtio_net_q, virtq, hw_used_index,
+			attr->hw_used_index);
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_Q_TYPE)
+		MLX5_SET16(virtio_q, virtctx, virtio_q_type, attr->q_type);
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_VERSION_1_0)
+		MLX5_SET16(virtio_q, virtctx, virtio_version_1_0,
+		   attr->virtio_version_1_0);
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_Q_MKEY)
+		MLX5_SET(virtio_q, virtctx, virtio_q_mkey, attr->mkey);
+	if (attr->mod_fields_bitmap &
+		MLX5_VIRTQ_MODIFY_TYPE_QUEUE_FEATURE_BIT_MASK) {
+		MLX5_SET16(virtio_net_q, virtq, tso_ipv4, attr->tso_ipv4);
+		MLX5_SET16(virtio_net_q, virtq, tso_ipv6, attr->tso_ipv6);
+		MLX5_SET16(virtio_net_q, virtq, tx_csum, attr->tx_csum);
+		MLX5_SET16(virtio_net_q, virtq, rx_csum, attr->rx_csum);
+	}
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_EVENT_MODE) {
+		MLX5_SET16(virtio_q, virtctx, event_mode, attr->event_mode);
+		MLX5_SET(virtio_q, virtctx, event_qpn_or_msix, attr->qp_id);
 	}
 	ret = mlx5_glue->devx_obj_modify(virtq_obj->obj, in, sizeof(in),
 					 out, sizeof(out));
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index 3747ef9e33..ec6467d927 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -74,6 +74,9 @@ struct mlx5_hca_vdpa_attr {
 	uint32_t log_doorbell_stride:5;
 	uint32_t log_doorbell_bar_size:5;
 	uint32_t queue_counters_valid:1;
+	uint32_t vnet_modify_ext:1;
+	uint32_t virtio_net_q_addr_modify:1;
+	uint32_t virtio_q_index_modify:1;
 	uint32_t max_num_virtio_queues;
 	struct {
 		uint32_t a;
@@ -465,7 +468,7 @@ struct mlx5_devx_virtq_attr {
 	uint32_t tis_id;
 	uint32_t counters_obj_id;
 	uint64_t dirty_bitmap_addr;
-	uint64_t type;
+	uint64_t mod_fields_bitmap;
 	uint64_t desc_addr;
 	uint64_t used_addr;
 	uint64_t available_addr;
@@ -475,6 +478,7 @@ struct mlx5_devx_virtq_attr {
 		uint64_t offset;
 	} umems[3];
 	uint8_t error_type;
+	uint8_t q_type;
 };
 
 
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 8a2f55c33e..5f58a6ee1d 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -1802,7 +1802,9 @@ struct mlx5_ifc_virtio_emulation_cap_bits {
 	u8 virtio_queue_type[0x8];
 	u8 reserved_at_20[0x13];
 	u8 log_doorbell_stride[0x5];
-	u8 reserved_at_3b[0x3];
+	u8 vnet_modify_ext[0x1];
+	u8 virtio_net_q_addr_modify[0x1];
+	u8 virtio_q_index_modify[0x1];
 	u8 log_doorbell_bar_size[0x5];
 	u8 doorbell_bar_offset[0x40];
 	u8 reserved_at_80[0x8];
@@ -3024,6 +3026,15 @@ enum {
 	MLX5_VIRTQ_MODIFY_TYPE_STATE = (1UL << 0),
 	MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS = (1UL << 3),
 	MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE = (1UL << 4),
+	MLX5_VIRTQ_MODIFY_TYPE_QUEUE_PERIOD = (1UL << 5),
+	MLX5_VIRTQ_MODIFY_TYPE_ADDR = (1UL << 6),
+	MLX5_VIRTQ_MODIFY_TYPE_HW_AVAILABLE_INDEX = (1UL << 7),
+	MLX5_VIRTQ_MODIFY_TYPE_HW_USED_INDEX = (1UL << 8),
+	MLX5_VIRTQ_MODIFY_TYPE_Q_TYPE = (1UL << 9),
+	MLX5_VIRTQ_MODIFY_TYPE_VERSION_1_0 = (1UL << 10),
+	MLX5_VIRTQ_MODIFY_TYPE_Q_MKEY = (1UL << 11),
+	MLX5_VIRTQ_MODIFY_TYPE_QUEUE_FEATURE_BIT_MASK = (1UL << 12),
+	MLX5_VIRTQ_MODIFY_TYPE_EVENT_MODE = (1UL << 13),
 };
 
 struct mlx5_ifc_virtio_q_bits {
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v3 06/15] vdpa/mlx5: pre-create virtq at probe time
  2022-06-18  8:47 ` [PATCH v3 " Li Zhang
                     ` (4 preceding siblings ...)
  2022-06-18  8:47   ` [PATCH v3 05/15] common/mlx5: extend virtq modifiable fields Li Zhang
@ 2022-06-18  8:47   ` Li Zhang
  2022-06-18  8:47   ` [PATCH v3 07/15] vdpa/mlx5: optimize datapath-control synchronization Li Zhang
                     ` (8 subsequent siblings)
  14 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-18  8:47 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs
  Cc: dev, thomas, rasland, roniba, maxime.coquelin

The dev_config operation is called during the live migration (LM)
process. LM time is critical because all the VM packets are
dropped directly during that period.

Move the virtq creation to probe time and only modify the
configuration later, in the dev_config stage, using the new
ability to modify virtq fields.

This optimization accelerates the LM process and reduces its time
by 70%.
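
As a rough sketch of the resulting flow (illustrative only, based on the
names added in this series; the helper below is hypothetical and omits
error-handling details):

/*
 * Hypothetical outline of the dev_config-time path after this change:
 * the DevX virtq object may already exist from probe time, in which
 * case it is only modified instead of being re-created.
 */
static int
example_virtq_configure(struct mlx5_vdpa_priv *priv,
			struct mlx5_vdpa_virtq *virtq,
			struct mlx5_devx_virtq_attr *attr)
{
	if (virtq->virtq == NULL) {
		/* First use: create the DevX virtq object. */
		virtq->virtq = mlx5_devx_cmd_create_virtq(priv->cdev->ctx,
							  attr);
		if (virtq->virtq == NULL)
			return -1;
		attr->mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_STATE;
	}
	/* LM fast path: only modify the pre-created object. */
	attr->state = MLX5_VIRTQ_STATE_RDY;
	if (mlx5_devx_cmd_modify_virtq(virtq->virtq, attr))
		return -1;
	virtq->configured = 1;
	return 0;
}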

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 doc/guides/rel_notes/release_22_07.rst |   4 +
 drivers/vdpa/mlx5/mlx5_vdpa.h          |   4 +
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c       |  19 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c    | 257 +++++++++++++++----------
 4 files changed, 176 insertions(+), 108 deletions(-)

diff --git a/doc/guides/rel_notes/release_22_07.rst b/doc/guides/rel_notes/release_22_07.rst
index f2cf41def9..2056cd9ee7 100644
--- a/doc/guides/rel_notes/release_22_07.rst
+++ b/doc/guides/rel_notes/release_22_07.rst
@@ -175,6 +175,10 @@ New Features
   This is a fall-back implementation for platforms that
   don't support vector operations.
 
+* **Updated Nvidia mlx5 vDPA driver.**
+
+  * Added new devargs ``queue_size`` and ``queues`` to allow prior creation of virtq resources.
+
 
 Removed Items
 -------------
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index bf82026e37..e5553079fe 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -80,6 +80,7 @@ struct mlx5_vdpa_virtq {
 	uint16_t vq_size;
 	uint8_t notifier_state;
 	bool stopped;
+	uint32_t configured:1;
 	uint32_t version;
 	struct mlx5_vdpa_priv *priv;
 	struct mlx5_devx_obj *virtq;
@@ -489,4 +490,7 @@ mlx5_vdpa_virtq_stats_reset(struct mlx5_vdpa_priv *priv, int qid);
  */
 void
 mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv);
+
+bool
+mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
index 43a2b98255..284758ad56 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
@@ -12,20 +12,21 @@ int
 mlx5_vdpa_logging_enable(struct mlx5_vdpa_priv *priv, int enable)
 {
 	struct mlx5_devx_virtq_attr attr = {
-		.type = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE,
+		.mod_fields_bitmap =
+			MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE,
 		.dirty_bitmap_dump_enable = enable,
 	};
+	struct mlx5_vdpa_virtq *virtq;
 	int i;
 
 	for (i = 0; i < priv->nr_virtqs; ++i) {
 		attr.queue_index = i;
-		if (!priv->virtqs[i].virtq) {
-			DRV_LOG(DEBUG, "virtq %d is invalid for dirty bitmap "
-				"enabling.", i);
+		virtq = &priv->virtqs[i];
+		if (!virtq->configured) {
+			DRV_LOG(DEBUG, "virtq %d is invalid for dirty bitmap enabling.", i);
 		} else if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
 			   &attr)) {
-			DRV_LOG(ERR, "Failed to modify virtq %d for dirty "
-				"bitmap enabling.", i);
+			DRV_LOG(ERR, "Failed to modify virtq %d for dirty bitmap enabling.", i);
 			return -1;
 		}
 	}
@@ -37,10 +38,11 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
 			   uint64_t log_size)
 {
 	struct mlx5_devx_virtq_attr attr = {
-		.type = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS,
+		.mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS,
 		.dirty_bitmap_addr = log_base,
 		.dirty_bitmap_size = log_size,
 	};
+	struct mlx5_vdpa_virtq *virtq;
 	int i;
 	int ret = mlx5_os_wrapped_mkey_create(priv->cdev->ctx, priv->cdev->pd,
 					      priv->cdev->pdn,
@@ -54,7 +56,8 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
 	attr.dirty_bitmap_mkey = priv->lm_mr.lkey;
 	for (i = 0; i < priv->nr_virtqs; ++i) {
 		attr.queue_index = i;
-		if (!priv->virtqs[i].virtq) {
+		virtq = &priv->virtqs[i];
+		if (!virtq->configured) {
 			DRV_LOG(DEBUG, "virtq %d is invalid for LM.", i);
 		} else if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
 						      &attr)) {
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 6637ba1503..6e08d619e4 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -75,6 +75,7 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
+		virtq->configured = 0;
 		for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
 			if (virtq->umems[j].obj) {
 				claim_zero(mlx5_glue->devx_umem_dereg
@@ -111,11 +112,12 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 		rte_intr_fd_set(virtq->intr_handle, -1);
 	}
 	rte_intr_instance_free(virtq->intr_handle);
-	if (virtq->virtq) {
+	if (virtq->configured) {
 		ret = mlx5_vdpa_virtq_stop(virtq->priv, virtq->index);
 		if (ret)
 			DRV_LOG(WARNING, "Failed to stop virtq %d.",
 				virtq->index);
+		virtq->configured = 0;
 		claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
 	}
 	virtq->virtq = NULL;
@@ -138,7 +140,7 @@ int
 mlx5_vdpa_virtq_modify(struct mlx5_vdpa_virtq *virtq, int state)
 {
 	struct mlx5_devx_virtq_attr attr = {
-			.type = MLX5_VIRTQ_MODIFY_TYPE_STATE,
+			.mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_STATE,
 			.state = state ? MLX5_VIRTQ_STATE_RDY :
 					 MLX5_VIRTQ_STATE_SUSPEND,
 			.queue_index = virtq->index,
@@ -153,7 +155,7 @@ mlx5_vdpa_virtq_stop(struct mlx5_vdpa_priv *priv, int index)
 	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
 	int ret;
 
-	if (virtq->stopped)
+	if (virtq->stopped || !virtq->configured)
 		return 0;
 	ret = mlx5_vdpa_virtq_modify(virtq, 0);
 	if (ret)
@@ -209,51 +211,54 @@ mlx5_vdpa_hva_to_gpa(struct rte_vhost_memory *mem, uint64_t hva)
 }
 
 static int
-mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
+mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
+		struct mlx5_devx_virtq_attr *attr,
+		struct rte_vhost_vring *vq, int index)
 {
 	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
-	struct rte_vhost_vring vq;
-	struct mlx5_devx_virtq_attr attr = {0};
 	uint64_t gpa;
 	int ret;
 	unsigned int i;
-	uint16_t last_avail_idx;
-	uint16_t last_used_idx;
-	uint16_t event_num = MLX5_EVENT_TYPE_OBJECT_CHANGE;
-	uint64_t cookie;
-
-	ret = rte_vhost_get_vhost_vring(priv->vid, index, &vq);
-	if (ret)
-		return -1;
-	if (vq.size == 0)
-		return 0;
-	virtq->index = index;
-	virtq->vq_size = vq.size;
-	attr.tso_ipv4 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO4));
-	attr.tso_ipv6 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO6));
-	attr.tx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_CSUM));
-	attr.rx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_GUEST_CSUM));
-	attr.virtio_version_1_0 = !!(priv->features & (1ULL <<
-							VIRTIO_F_VERSION_1));
-	attr.type = (priv->features & (1ULL << VIRTIO_F_RING_PACKED)) ?
+	uint16_t last_avail_idx = 0;
+	uint16_t last_used_idx = 0;
+
+	if (virtq->virtq)
+		attr->mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_STATE |
+			MLX5_VIRTQ_MODIFY_TYPE_ADDR |
+			MLX5_VIRTQ_MODIFY_TYPE_HW_AVAILABLE_INDEX |
+			MLX5_VIRTQ_MODIFY_TYPE_HW_USED_INDEX |
+			MLX5_VIRTQ_MODIFY_TYPE_VERSION_1_0 |
+			MLX5_VIRTQ_MODIFY_TYPE_Q_TYPE |
+			MLX5_VIRTQ_MODIFY_TYPE_Q_MKEY |
+			MLX5_VIRTQ_MODIFY_TYPE_QUEUE_FEATURE_BIT_MASK |
+			MLX5_VIRTQ_MODIFY_TYPE_EVENT_MODE;
+	attr->tso_ipv4 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO4));
+	attr->tso_ipv6 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO6));
+	attr->tx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_CSUM));
+	attr->rx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_GUEST_CSUM));
+	attr->virtio_version_1_0 =
+		!!(priv->features & (1ULL << VIRTIO_F_VERSION_1));
+	attr->q_type =
+		(priv->features & (1ULL << VIRTIO_F_RING_PACKED)) ?
 			MLX5_VIRTQ_TYPE_PACKED : MLX5_VIRTQ_TYPE_SPLIT;
 	/*
 	 * No need event QPs creation when the guest in poll mode or when the
 	 * capability allows it.
 	 */
-	attr.event_mode = vq.callfd != -1 || !(priv->caps.event_mode & (1 <<
-					       MLX5_VIRTQ_EVENT_MODE_NO_MSIX)) ?
-						      MLX5_VIRTQ_EVENT_MODE_QP :
-						  MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
-	if (attr.event_mode == MLX5_VIRTQ_EVENT_MODE_QP) {
-		ret = mlx5_vdpa_event_qp_prepare(priv, vq.size, vq.callfd,
-						&virtq->eqp);
+	attr->event_mode = vq->callfd != -1 ||
+	!(priv->caps.event_mode & (1 << MLX5_VIRTQ_EVENT_MODE_NO_MSIX)) ?
+	MLX5_VIRTQ_EVENT_MODE_QP : MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
+	if (attr->event_mode == MLX5_VIRTQ_EVENT_MODE_QP) {
+		ret = mlx5_vdpa_event_qp_prepare(priv,
+				vq->size, vq->callfd, &virtq->eqp);
 		if (ret) {
-			DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
+			DRV_LOG(ERR,
+				"Failed to create event QPs for virtq %d.",
 				index);
 			return -1;
 		}
-		attr.qp_id = virtq->eqp.fw_qp->id;
+		attr->mod_fields_bitmap |= MLX5_VIRTQ_MODIFY_TYPE_EVENT_MODE;
+		attr->qp_id = virtq->eqp.fw_qp->id;
 	} else {
 		DRV_LOG(INFO, "Virtq %d is, for sure, working by poll mode, no"
 			" need event QPs and event mechanism.", index);
@@ -265,77 +270,82 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 		if (!virtq->counters) {
 			DRV_LOG(ERR, "Failed to create virtq couners for virtq"
 				" %d.", index);
-			goto error;
+			return -1;
 		}
-		attr.counters_obj_id = virtq->counters->id;
+		attr->counters_obj_id = virtq->counters->id;
 	}
 	/* Setup 3 UMEMs for each virtq. */
-	for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
-		uint32_t size;
-		void *buf;
-		struct mlx5dv_devx_umem *obj;
-
-		size = priv->caps.umems[i].a * vq.size + priv->caps.umems[i].b;
-		if (virtq->umems[i].size == size &&
-		    virtq->umems[i].obj != NULL) {
-			/* Reuse registered memory. */
-			memset(virtq->umems[i].buf, 0, size);
-			goto reuse;
-		}
-		if (virtq->umems[i].obj)
-			claim_zero(mlx5_glue->devx_umem_dereg
+	if (virtq->virtq) {
+		for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
+			uint32_t size;
+			void *buf;
+			struct mlx5dv_devx_umem *obj;
+
+			size =
+		priv->caps.umems[i].a * vq->size + priv->caps.umems[i].b;
+			if (virtq->umems[i].size == size &&
+				virtq->umems[i].obj != NULL) {
+				/* Reuse registered memory. */
+				memset(virtq->umems[i].buf, 0, size);
+				goto reuse;
+			}
+			if (virtq->umems[i].obj)
+				claim_zero(mlx5_glue->devx_umem_dereg
 				   (virtq->umems[i].obj));
-		if (virtq->umems[i].buf)
-			rte_free(virtq->umems[i].buf);
-		virtq->umems[i].size = 0;
-		virtq->umems[i].obj = NULL;
-		virtq->umems[i].buf = NULL;
-		buf = rte_zmalloc(__func__, size, 4096);
-		if (buf == NULL) {
-			DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
+			if (virtq->umems[i].buf)
+				rte_free(virtq->umems[i].buf);
+			virtq->umems[i].size = 0;
+			virtq->umems[i].obj = NULL;
+			virtq->umems[i].buf = NULL;
+			buf = rte_zmalloc(__func__,
+				size, 4096);
+			if (buf == NULL) {
+				DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
 				" %u.", i, index);
-			goto error;
-		}
-		obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx, buf, size,
-					       IBV_ACCESS_LOCAL_WRITE);
-		if (obj == NULL) {
-			DRV_LOG(ERR, "Failed to register umem %d for virtq %u.",
+				return -1;
+			}
+			obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx,
+				buf, size, IBV_ACCESS_LOCAL_WRITE);
+			if (obj == NULL) {
+				DRV_LOG(ERR, "Failed to register umem %d for virtq %u.",
 				i, index);
-			goto error;
-		}
-		virtq->umems[i].size = size;
-		virtq->umems[i].buf = buf;
-		virtq->umems[i].obj = obj;
+				rte_free(buf);
+				return -1;
+			}
+			virtq->umems[i].size = size;
+			virtq->umems[i].buf = buf;
+			virtq->umems[i].obj = obj;
 reuse:
-		attr.umems[i].id = virtq->umems[i].obj->umem_id;
-		attr.umems[i].offset = 0;
-		attr.umems[i].size = virtq->umems[i].size;
+			attr->umems[i].id = virtq->umems[i].obj->umem_id;
+			attr->umems[i].offset = 0;
+			attr->umems[i].size = virtq->umems[i].size;
+		}
 	}
-	if (attr.type == MLX5_VIRTQ_TYPE_SPLIT) {
+	if (attr->q_type == MLX5_VIRTQ_TYPE_SPLIT) {
 		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
-					   (uint64_t)(uintptr_t)vq.desc);
+					   (uint64_t)(uintptr_t)vq->desc);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get descriptor ring GPA.");
-			goto error;
+			return -1;
 		}
-		attr.desc_addr = gpa;
+		attr->desc_addr = gpa;
 		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
-					   (uint64_t)(uintptr_t)vq.used);
+					   (uint64_t)(uintptr_t)vq->used);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get GPA for used ring.");
-			goto error;
+			return -1;
 		}
-		attr.used_addr = gpa;
+		attr->used_addr = gpa;
 		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
-					   (uint64_t)(uintptr_t)vq.avail);
+					   (uint64_t)(uintptr_t)vq->avail);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get GPA for available ring.");
-			goto error;
+			return -1;
 		}
-		attr.available_addr = gpa;
+		attr->available_addr = gpa;
 	}
-	ret = rte_vhost_get_vring_base(priv->vid, index, &last_avail_idx,
-				 &last_used_idx);
+	ret = rte_vhost_get_vring_base(priv->vid,
+			index, &last_avail_idx, &last_used_idx);
 	if (ret) {
 		last_avail_idx = 0;
 		last_used_idx = 0;
@@ -345,24 +355,71 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 				"virtq %d.", priv->vid, last_avail_idx,
 				last_used_idx, index);
 	}
-	attr.hw_available_index = last_avail_idx;
-	attr.hw_used_index = last_used_idx;
-	attr.q_size = vq.size;
-	attr.mkey = priv->gpa_mkey_index;
-	attr.tis_id = priv->tiss[(index / 2) % priv->num_lag_ports]->id;
-	attr.queue_index = index;
-	attr.pd = priv->cdev->pdn;
-	attr.hw_latency_mode = priv->hw_latency_mode;
-	attr.hw_max_latency_us = priv->hw_max_latency_us;
-	attr.hw_max_pending_comp = priv->hw_max_pending_comp;
-	virtq->virtq = mlx5_devx_cmd_create_virtq(priv->cdev->ctx, &attr);
+	attr->hw_available_index = last_avail_idx;
+	attr->hw_used_index = last_used_idx;
+	attr->q_size = vq->size;
+	attr->mkey = priv->gpa_mkey_index;
+	attr->tis_id = priv->tiss[(index / 2) % priv->num_lag_ports]->id;
+	attr->queue_index = index;
+	attr->pd = priv->cdev->pdn;
+	attr->hw_latency_mode = priv->hw_latency_mode;
+	attr->hw_max_latency_us = priv->hw_max_latency_us;
+	attr->hw_max_pending_comp = priv->hw_max_pending_comp;
+	if (attr->hw_latency_mode || attr->hw_max_latency_us ||
+		attr->hw_max_pending_comp)
+		attr->mod_fields_bitmap |= MLX5_VIRTQ_MODIFY_TYPE_QUEUE_PERIOD;
+	return 0;
+}
+
+bool
+mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv)
+{
+	return (priv->caps.vnet_modify_ext &&
+			priv->caps.virtio_net_q_addr_modify &&
+			priv->caps.virtio_q_index_modify) ? true : false;
+}
+
+static int
+mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
+{
+	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
+	struct rte_vhost_vring vq;
+	struct mlx5_devx_virtq_attr attr = {0};
+	int ret;
+	uint16_t event_num = MLX5_EVENT_TYPE_OBJECT_CHANGE;
+	uint64_t cookie;
+
+	ret = rte_vhost_get_vhost_vring(priv->vid, index, &vq);
+	if (ret)
+		return -1;
+	if (vq.size == 0)
+		return 0;
 	virtq->priv = priv;
-	if (!virtq->virtq)
+	virtq->stopped = 0;
+	ret = mlx5_vdpa_virtq_sub_objs_prepare(priv, &attr,
+				&vq, index);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to setup update virtq attr %d.",
+			index);
 		goto error;
-	claim_zero(rte_vhost_enable_guest_notification(priv->vid, index, 1));
-	if (mlx5_vdpa_virtq_modify(virtq, 1))
+	}
+	if (!virtq->virtq) {
+		virtq->index = index;
+		virtq->vq_size = vq.size;
+		virtq->virtq = mlx5_devx_cmd_create_virtq(priv->cdev->ctx,
+			&attr);
+		if (!virtq->virtq)
+			goto error;
+		attr.mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_STATE;
+	}
+	attr.state = MLX5_VIRTQ_STATE_RDY;
+	ret = mlx5_devx_cmd_modify_virtq(virtq->virtq, &attr);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to modify virtq %d.", index);
 		goto error;
-	virtq->priv = priv;
+	}
+	claim_zero(rte_vhost_enable_guest_notification(priv->vid, index, 1));
+	virtq->configured = 1;
 	rte_write32(virtq->index, priv->virtq_db_addr);
 	/* Setup doorbell mapping. */
 	virtq->intr_handle =
@@ -553,7 +610,7 @@ mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable)
 			return 0;
 		DRV_LOG(INFO, "Virtq %d was modified, recreate it.", index);
 	}
-	if (virtq->virtq) {
+	if (virtq->configured) {
 		virtq->enable = 0;
 		if (is_virtq_recvq(virtq->index, priv->nr_virtqs)) {
 			ret = mlx5_vdpa_steer_update(priv);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v3 07/15] vdpa/mlx5: optimize datapath-control synchronization
  2022-06-18  8:47 ` [PATCH v3 " Li Zhang
                     ` (5 preceding siblings ...)
  2022-06-18  8:47   ` [PATCH v3 06/15] vdpa/mlx5: pre-create virtq at probe time Li Zhang
@ 2022-06-18  8:47   ` Li Zhang
  2022-06-18  8:47   ` [PATCH v3 08/15] vdpa/mlx5: add multi-thread management for configuration Li Zhang
                     ` (7 subsequent siblings)
  14 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-18  8:47 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs
  Cc: dev, thomas, rasland, roniba, maxime.coquelin

The driver used a single global lock for any synchronization
needed for the datapath and control path.
It is better to group each critical section only with the
operations that actually need to be synchronized with it.

Replace the global lock with the following locks:

1. Virtq locks (per virtq) synchronize datapath polling and
   parallel configurations on the same virtq.
2. A doorbell lock synchronizes doorbell updates, since the
   doorbell register is shared by all the virtqs of the device.
3. A steering lock protects updates of the shared steering objects.
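
The following standalone sketch models the new lock split; it is only an
illustration, using plain pthread mutexes in place of the driver's
rte_spinlock, and all names in it are made up for the example.

#include <pthread.h>
#include <stdint.h>

/* Per-virtq state: polled by the datapath, configured by the control path. */
struct example_virtq {
	pthread_mutex_t virtq_lock;
	int enable;
};

/* Device-wide shared resources keep their own locks. */
struct example_dev {
	pthread_mutex_t db_lock;	/* doorbell shared by all virtqs */
	pthread_mutex_t steer_lock;	/* shared steering objects */
	struct example_virtq vq[16];
	volatile uint32_t *db_addr;
};

/* A kick touches only its own virtq plus the shared doorbell. */
static void
example_kick(struct example_dev *dev, int index)
{
	struct example_virtq *vq = &dev->vq[index];

	pthread_mutex_lock(&vq->virtq_lock);
	if (vq->enable) {
		pthread_mutex_lock(&dev->db_lock);
		*dev->db_addr = (uint32_t)index;	/* ring the doorbell */
		pthread_mutex_unlock(&dev->db_lock);
	}
	pthread_mutex_unlock(&vq->virtq_lock);
}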

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 24 ++++---
 drivers/vdpa/mlx5/mlx5_vdpa.h       | 13 ++--
 drivers/vdpa/mlx5/mlx5_vdpa_event.c | 97 ++++++++++++++++++-----------
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c    | 34 +++++++---
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c |  7 ++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 88 +++++++++++++++++++-------
 6 files changed, 184 insertions(+), 79 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index ee99952e11..e5a11f72fd 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -135,6 +135,7 @@ mlx5_vdpa_set_vring_state(int vid, int vring, int state)
 	struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid);
 	struct mlx5_vdpa_priv *priv =
 		mlx5_vdpa_find_priv_resource_by_vdev(vdev);
+	struct mlx5_vdpa_virtq *virtq;
 	int ret;
 
 	if (priv == NULL) {
@@ -145,9 +146,10 @@ mlx5_vdpa_set_vring_state(int vid, int vring, int state)
 		DRV_LOG(ERR, "Too big vring id: %d.", vring);
 		return -E2BIG;
 	}
-	pthread_mutex_lock(&priv->vq_config_lock);
+	virtq = &priv->virtqs[vring];
+	pthread_mutex_lock(&virtq->virtq_lock);
 	ret = mlx5_vdpa_virtq_enable(priv, vring, state);
-	pthread_mutex_unlock(&priv->vq_config_lock);
+	pthread_mutex_unlock(&virtq->virtq_lock);
 	return ret;
 }
 
@@ -267,7 +269,9 @@ mlx5_vdpa_dev_close(int vid)
 		ret |= mlx5_vdpa_lm_log(priv);
 		priv->state = MLX5_VDPA_STATE_IN_PROGRESS;
 	}
+	pthread_mutex_lock(&priv->steer_update_lock);
 	mlx5_vdpa_steer_unset(priv);
+	pthread_mutex_unlock(&priv->steer_update_lock);
 	mlx5_vdpa_virtqs_release(priv);
 	mlx5_vdpa_drain_cq(priv);
 	if (priv->lm_mr.addr)
@@ -276,8 +280,6 @@ mlx5_vdpa_dev_close(int vid)
 	if (!priv->connected)
 		mlx5_vdpa_dev_cache_clean(priv);
 	priv->vid = 0;
-	/* The mutex may stay locked after event thread cancel - initiate it. */
-	pthread_mutex_init(&priv->vq_config_lock, NULL);
 	DRV_LOG(INFO, "vDPA device %d was closed.", vid);
 	return ret;
 }
@@ -549,15 +551,21 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 static int
 mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_virtq *virtq;
 	uint32_t index;
 	uint32_t i;
 
+	for (index = 0; index < priv->caps.max_num_virtio_queues * 2;
+		index++) {
+		virtq = &priv->virtqs[index];
+		pthread_mutex_init(&virtq->virtq_lock, NULL);
+	}
 	if (!priv->queues)
 		return 0;
 	for (index = 0; index < (priv->queues * 2); ++index) {
-		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
+		virtq = &priv->virtqs[index];
 		int ret = mlx5_vdpa_event_qp_prepare(priv, priv->queue_size,
-					-1, &virtq->eqp);
+					-1, virtq);
 
 		if (ret) {
 			DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
@@ -713,7 +721,8 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 	priv->num_lag_ports = attr->num_lag_ports;
 	if (attr->num_lag_ports == 0)
 		priv->num_lag_ports = 1;
-	pthread_mutex_init(&priv->vq_config_lock, NULL);
+	rte_spinlock_init(&priv->db_lock);
+	pthread_mutex_init(&priv->steer_update_lock, NULL);
 	priv->cdev = cdev;
 	mlx5_vdpa_config_get(mkvlist, priv);
 	if (mlx5_vdpa_create_dev_resources(priv))
@@ -797,7 +806,6 @@ mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv)
 	mlx5_vdpa_release_dev_resources(priv);
 	if (priv->vdev)
 		rte_vdpa_unregister_device(priv->vdev);
-	pthread_mutex_destroy(&priv->vq_config_lock);
 	rte_free(priv);
 }
 
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index e5553079fe..3fd5eefc5e 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -82,6 +82,7 @@ struct mlx5_vdpa_virtq {
 	bool stopped;
 	uint32_t configured:1;
 	uint32_t version;
+	pthread_mutex_t virtq_lock;
 	struct mlx5_vdpa_priv *priv;
 	struct mlx5_devx_obj *virtq;
 	struct mlx5_devx_obj *counters;
@@ -126,7 +127,8 @@ struct mlx5_vdpa_priv {
 	TAILQ_ENTRY(mlx5_vdpa_priv) next;
 	bool connected;
 	enum mlx5_dev_state state;
-	pthread_mutex_t vq_config_lock;
+	rte_spinlock_t db_lock;
+	pthread_mutex_t steer_update_lock;
 	uint64_t no_traffic_counter;
 	pthread_t timer_tid;
 	int event_mode;
@@ -222,14 +224,15 @@ int mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv);
  *   Number of descriptors.
  * @param[in] callfd
  *   The guest notification file descriptor.
- * @param[in/out] eqp
- *   Pointer to the event QP structure.
+ * @param[in/out] virtq
+ *   Pointer to the virt-queue structure.
  *
  * @return
  *   0 on success, -1 otherwise and rte_errno is set.
  */
-int mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
-			      int callfd, struct mlx5_vdpa_event_qp *eqp);
+int
+mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
+	int callfd, struct mlx5_vdpa_virtq *virtq);
 
 /**
  * Destroy an event QP and all its related resources.
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index b43dca9255..2b0f5936d1 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -85,12 +85,13 @@ mlx5_vdpa_cq_arm(struct mlx5_vdpa_priv *priv, struct mlx5_vdpa_cq *cq)
 
 static int
 mlx5_vdpa_cq_create(struct mlx5_vdpa_priv *priv, uint16_t log_desc_n,
-		    int callfd, struct mlx5_vdpa_cq *cq)
+		int callfd, struct mlx5_vdpa_virtq *virtq)
 {
 	struct mlx5_devx_cq_attr attr = {
 		.use_first_only = 1,
 		.uar_page_id = mlx5_os_get_devx_uar_page_id(priv->uar.obj),
 	};
+	struct mlx5_vdpa_cq *cq = &virtq->eqp.cq;
 	uint16_t event_nums[1] = {0};
 	int ret;
 
@@ -102,10 +103,11 @@ mlx5_vdpa_cq_create(struct mlx5_vdpa_priv *priv, uint16_t log_desc_n,
 	cq->log_desc_n = log_desc_n;
 	rte_spinlock_init(&cq->sl);
 	/* Subscribe CQ event to the event channel controlled by the driver. */
-	ret = mlx5_os_devx_subscribe_devx_event(priv->eventc,
-						cq->cq_obj.cq->obj,
-						sizeof(event_nums), event_nums,
-						(uint64_t)(uintptr_t)cq);
+	ret = mlx5_glue->devx_subscribe_devx_event(priv->eventc,
+							cq->cq_obj.cq->obj,
+						   sizeof(event_nums),
+						   event_nums,
+						   (uint64_t)(uintptr_t)virtq);
 	if (ret) {
 		DRV_LOG(ERR, "Failed to subscribe CQE event.");
 		rte_errno = errno;
@@ -167,13 +169,17 @@ mlx5_vdpa_cq_poll(struct mlx5_vdpa_cq *cq)
 static void
 mlx5_vdpa_arm_all_cqs(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_virtq *virtq;
 	struct mlx5_vdpa_cq *cq;
 	int i;
 
 	for (i = 0; i < priv->nr_virtqs; i++) {
+		virtq = &priv->virtqs[i];
+		pthread_mutex_lock(&virtq->virtq_lock);
 		cq = &priv->virtqs[i].eqp.cq;
 		if (cq->cq_obj.cq && !cq->armed)
 			mlx5_vdpa_cq_arm(priv, cq);
+		pthread_mutex_unlock(&virtq->virtq_lock);
 	}
 }
 
@@ -220,13 +226,18 @@ mlx5_vdpa_queue_complete(struct mlx5_vdpa_cq *cq)
 static uint32_t
 mlx5_vdpa_queues_complete(struct mlx5_vdpa_priv *priv)
 {
-	int i;
+	struct mlx5_vdpa_virtq *virtq;
+	struct mlx5_vdpa_cq *cq;
 	uint32_t max = 0;
+	uint32_t comp;
+	int i;
 
 	for (i = 0; i < priv->nr_virtqs; i++) {
-		struct mlx5_vdpa_cq *cq = &priv->virtqs[i].eqp.cq;
-		uint32_t comp = mlx5_vdpa_queue_complete(cq);
-
+		virtq = &priv->virtqs[i];
+		pthread_mutex_lock(&virtq->virtq_lock);
+		cq = &virtq->eqp.cq;
+		comp = mlx5_vdpa_queue_complete(cq);
+		pthread_mutex_unlock(&virtq->virtq_lock);
 		if (comp > max)
 			max = comp;
 	}
@@ -253,7 +264,7 @@ mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv)
 }
 
 /* Wait on all CQs channel for completion event. */
-static struct mlx5_vdpa_cq *
+static struct mlx5_vdpa_virtq *
 mlx5_vdpa_event_wait(struct mlx5_vdpa_priv *priv __rte_unused)
 {
 #ifdef HAVE_IBV_DEVX_EVENT
@@ -265,7 +276,8 @@ mlx5_vdpa_event_wait(struct mlx5_vdpa_priv *priv __rte_unused)
 					    sizeof(out.buf));
 
 	if (ret >= 0)
-		return (struct mlx5_vdpa_cq *)(uintptr_t)out.event_resp.cookie;
+		return (struct mlx5_vdpa_virtq *)
+				(uintptr_t)out.event_resp.cookie;
 	DRV_LOG(INFO, "Got error in devx_get_event, ret = %d, errno = %d.",
 		ret, errno);
 #endif
@@ -276,7 +288,7 @@ static void *
 mlx5_vdpa_event_handle(void *arg)
 {
 	struct mlx5_vdpa_priv *priv = arg;
-	struct mlx5_vdpa_cq *cq;
+	struct mlx5_vdpa_virtq *virtq;
 	uint32_t max;
 
 	switch (priv->event_mode) {
@@ -284,7 +296,6 @@ mlx5_vdpa_event_handle(void *arg)
 	case MLX5_VDPA_EVENT_MODE_FIXED_TIMER:
 		priv->timer_delay_us = priv->event_us;
 		while (1) {
-			pthread_mutex_lock(&priv->vq_config_lock);
 			max = mlx5_vdpa_queues_complete(priv);
 			if (max == 0 && priv->no_traffic_counter++ >=
 			    priv->no_traffic_max) {
@@ -292,32 +303,37 @@ mlx5_vdpa_event_handle(void *arg)
 					priv->vdev->device->name);
 				mlx5_vdpa_arm_all_cqs(priv);
 				do {
-					pthread_mutex_unlock
-							(&priv->vq_config_lock);
-					cq = mlx5_vdpa_event_wait(priv);
-					pthread_mutex_lock
-							(&priv->vq_config_lock);
-					if (cq == NULL ||
-					       mlx5_vdpa_queue_complete(cq) > 0)
+					virtq = mlx5_vdpa_event_wait(priv);
+					if (virtq == NULL)
 						break;
+					pthread_mutex_lock(
+						&virtq->virtq_lock);
+					if (mlx5_vdpa_queue_complete(
+						&virtq->eqp.cq) > 0) {
+						pthread_mutex_unlock(
+							&virtq->virtq_lock);
+						break;
+					}
+					pthread_mutex_unlock(
+						&virtq->virtq_lock);
 				} while (1);
 				priv->timer_delay_us = priv->event_us;
 				priv->no_traffic_counter = 0;
 			} else if (max != 0) {
 				priv->no_traffic_counter = 0;
 			}
-			pthread_mutex_unlock(&priv->vq_config_lock);
 			mlx5_vdpa_timer_sleep(priv, max);
 		}
 		return NULL;
 	case MLX5_VDPA_EVENT_MODE_ONLY_INTERRUPT:
 		do {
-			cq = mlx5_vdpa_event_wait(priv);
-			if (cq != NULL) {
-				pthread_mutex_lock(&priv->vq_config_lock);
-				if (mlx5_vdpa_queue_complete(cq) > 0)
-					mlx5_vdpa_cq_arm(priv, cq);
-				pthread_mutex_unlock(&priv->vq_config_lock);
+			virtq = mlx5_vdpa_event_wait(priv);
+			if (virtq != NULL) {
+				pthread_mutex_lock(&virtq->virtq_lock);
+				if (mlx5_vdpa_queue_complete(
+					&virtq->eqp.cq) > 0)
+					mlx5_vdpa_cq_arm(priv, &virtq->eqp.cq);
+				pthread_mutex_unlock(&virtq->virtq_lock);
 			}
 		} while (1);
 		return NULL;
@@ -339,7 +355,6 @@ mlx5_vdpa_err_interrupt_handler(void *cb_arg __rte_unused)
 	struct mlx5_vdpa_virtq *virtq;
 	uint64_t sec;
 
-	pthread_mutex_lock(&priv->vq_config_lock);
 	while (mlx5_glue->devx_get_event(priv->err_chnl, &out.event_resp,
 					 sizeof(out.buf)) >=
 				       (ssize_t)sizeof(out.event_resp.cookie)) {
@@ -351,10 +366,11 @@ mlx5_vdpa_err_interrupt_handler(void *cb_arg __rte_unused)
 			continue;
 		}
 		virtq = &priv->virtqs[vq_index];
+		pthread_mutex_lock(&virtq->virtq_lock);
 		if (!virtq->enable || virtq->version != version)
-			continue;
+			goto unlock;
 		if (rte_rdtsc() / rte_get_tsc_hz() < MLX5_VDPA_ERROR_TIME_SEC)
-			continue;
+			goto unlock;
 		virtq->stopped = true;
 		/* Query error info. */
 		if (mlx5_vdpa_virtq_query(priv, vq_index))
@@ -384,8 +400,9 @@ mlx5_vdpa_err_interrupt_handler(void *cb_arg __rte_unused)
 		for (i = 1; i < RTE_DIM(virtq->err_time); i++)
 			virtq->err_time[i - 1] = virtq->err_time[i];
 		virtq->err_time[RTE_DIM(virtq->err_time) - 1] = rte_rdtsc();
+unlock:
+		pthread_mutex_unlock(&virtq->virtq_lock);
 	}
-	pthread_mutex_unlock(&priv->vq_config_lock);
 #endif
 }
 
@@ -533,11 +550,18 @@ mlx5_vdpa_cqe_event_setup(struct mlx5_vdpa_priv *priv)
 void
 mlx5_vdpa_cqe_event_unset(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_virtq *virtq;
 	void *status;
+	int i;
 
 	if (priv->timer_tid) {
 		pthread_cancel(priv->timer_tid);
 		pthread_join(priv->timer_tid, &status);
+		/* The mutex may stay locked after event thread cancel, initiate it. */
+		for (i = 0; i < priv->nr_virtqs; i++) {
+			virtq = &priv->virtqs[i];
+			pthread_mutex_init(&virtq->virtq_lock, NULL);
+		}
 	}
 	priv->timer_tid = 0;
 }
@@ -614,8 +638,9 @@ mlx5_vdpa_qps2rst2rts(struct mlx5_vdpa_event_qp *eqp)
 
 int
 mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
-			  int callfd, struct mlx5_vdpa_event_qp *eqp)
+	int callfd, struct mlx5_vdpa_virtq *virtq)
 {
+	struct mlx5_vdpa_event_qp *eqp = &virtq->eqp;
 	struct mlx5_devx_qp_attr attr = {0};
 	uint16_t log_desc_n = rte_log2_u32(desc_n);
 	uint32_t ret;
@@ -632,7 +657,8 @@ mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 	}
 	if (eqp->fw_qp)
 		mlx5_vdpa_event_qp_destroy(eqp);
-	if (mlx5_vdpa_cq_create(priv, log_desc_n, callfd, &eqp->cq))
+	if (mlx5_vdpa_cq_create(priv, log_desc_n, callfd, virtq) ||
+		!eqp->cq.cq_obj.cq)
 		return -1;
 	attr.pd = priv->cdev->pdn;
 	attr.ts_format =
@@ -650,8 +676,8 @@ mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 	attr.ts_format =
 		mlx5_ts_format_conv(priv->cdev->config.hca_attr.qp_ts_format);
 	ret = mlx5_devx_qp_create(priv->cdev->ctx, &(eqp->sw_qp),
-					attr.num_of_receive_wqes *
-					MLX5_WSEG_SIZE, &attr, SOCKET_ID_ANY);
+				  attr.num_of_receive_wqes * MLX5_WSEG_SIZE,
+				  &attr, SOCKET_ID_ANY);
 	if (ret) {
 		DRV_LOG(ERR, "Failed to create SW QP(%u).", rte_errno);
 		goto error;
@@ -668,3 +694,4 @@ mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 	mlx5_vdpa_event_qp_destroy(eqp);
 	return -1;
 }
+
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
index 284758ad56..ae495a35f3 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
@@ -24,10 +24,17 @@ mlx5_vdpa_logging_enable(struct mlx5_vdpa_priv *priv, int enable)
 		virtq = &priv->virtqs[i];
 		if (!virtq->configured) {
 			DRV_LOG(DEBUG, "virtq %d is invalid for dirty bitmap enabling.", i);
-		} else if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
+		} else {
+			struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
+
+			pthread_mutex_lock(&virtq->virtq_lock);
+			if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
 			   &attr)) {
-			DRV_LOG(ERR, "Failed to modify virtq %d for dirty bitmap enabling.", i);
-			return -1;
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				DRV_LOG(ERR, "Failed to modify virtq %d for dirty bitmap enabling.", i);
+				return -1;
+			}
+			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
 	}
 	return 0;
@@ -59,10 +66,19 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
 		virtq = &priv->virtqs[i];
 		if (!virtq->configured) {
 			DRV_LOG(DEBUG, "virtq %d is invalid for LM.", i);
-		} else if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
-						      &attr)) {
-			DRV_LOG(ERR, "Failed to modify virtq %d for LM.", i);
-			goto err;
+		} else {
+			struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
+
+			pthread_mutex_lock(&virtq->virtq_lock);
+			if (mlx5_devx_cmd_modify_virtq(
+					priv->virtqs[i].virtq,
+					&attr)) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				DRV_LOG(ERR,
+				"Failed to modify virtq %d for LM.", i);
+				goto err;
+			}
+			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
 	}
 	return 0;
@@ -77,6 +93,7 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
 int
 mlx5_vdpa_lm_log(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_virtq *virtq;
 	uint64_t features;
 	int ret = rte_vhost_get_negotiated_features(priv->vid, &features);
 	int i;
@@ -88,10 +105,13 @@ mlx5_vdpa_lm_log(struct mlx5_vdpa_priv *priv)
 	if (!RTE_VHOST_NEED_LOG(features))
 		return 0;
 	for (i = 0; i < priv->nr_virtqs; ++i) {
+		virtq = &priv->virtqs[i];
 		if (!priv->virtqs[i].virtq) {
 			DRV_LOG(DEBUG, "virtq %d is invalid for LM log.", i);
 		} else {
+			pthread_mutex_lock(&virtq->virtq_lock);
 			ret = mlx5_vdpa_virtq_stop(priv, i);
+			pthread_mutex_unlock(&virtq->virtq_lock);
 			if (ret) {
 				DRV_LOG(ERR, "Failed to stop virtq %d for LM "
 					"log.", i);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
index d4b4375c88..4cbf09784e 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
@@ -237,19 +237,24 @@ mlx5_vdpa_rss_flows_create(struct mlx5_vdpa_priv *priv)
 int
 mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv)
 {
-	int ret = mlx5_vdpa_rqt_prepare(priv);
+	int ret;
 
+	pthread_mutex_lock(&priv->steer_update_lock);
+	ret = mlx5_vdpa_rqt_prepare(priv);
 	if (ret == 0) {
 		mlx5_vdpa_steer_unset(priv);
 	} else if (ret < 0) {
+		pthread_mutex_unlock(&priv->steer_update_lock);
 		return ret;
 	} else if (!priv->steer.rss[0].flow) {
 		ret = mlx5_vdpa_rss_flows_create(priv);
 		if (ret) {
 			DRV_LOG(ERR, "Cannot create RSS flows.");
+			pthread_mutex_unlock(&priv->steer_update_lock);
 			return -1;
 		}
 	}
+	pthread_mutex_unlock(&priv->steer_update_lock);
 	return 0;
 }
 
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 6e08d619e4..63b6f44725 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -24,13 +24,17 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 	int nbytes;
 	int retry;
 
+	pthread_mutex_lock(&virtq->virtq_lock);
 	if (priv->state != MLX5_VDPA_STATE_CONFIGURED && !virtq->enable) {
+		pthread_mutex_unlock(&virtq->virtq_lock);
 		DRV_LOG(ERR,  "device %d queue %d down, skip kick handling",
 			priv->vid, virtq->index);
 		return;
 	}
-	if (rte_intr_fd_get(virtq->intr_handle) < 0)
+	if (rte_intr_fd_get(virtq->intr_handle) < 0) {
+		pthread_mutex_unlock(&virtq->virtq_lock);
 		return;
+	}
 	for (retry = 0; retry < 3; ++retry) {
 		nbytes = read(rte_intr_fd_get(virtq->intr_handle), &buf,
 			      8);
@@ -44,9 +48,14 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 		}
 		break;
 	}
-	if (nbytes < 0)
+	if (nbytes < 0) {
+		pthread_mutex_unlock(&virtq->virtq_lock);
 		return;
+	}
+	rte_spinlock_lock(&priv->db_lock);
 	rte_write32(virtq->index, priv->virtq_db_addr);
+	rte_spinlock_unlock(&priv->db_lock);
+	pthread_mutex_unlock(&virtq->virtq_lock);
 	if (priv->state != MLX5_VDPA_STATE_CONFIGURED && !virtq->enable) {
 		DRV_LOG(ERR,  "device %d queue %d down, skip kick handling",
 			priv->vid, virtq->index);
@@ -66,6 +75,33 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 	DRV_LOG(DEBUG, "Ring virtq %u doorbell.", virtq->index);
 }
 
+/* Virtq must be locked before calling this function. */
+static void
+mlx5_vdpa_virtq_unregister_intr_handle(struct mlx5_vdpa_virtq *virtq)
+{
+	int ret = -EAGAIN;
+
+	if (!virtq->intr_handle)
+		return;
+	if (rte_intr_fd_get(virtq->intr_handle) >= 0) {
+		while (ret == -EAGAIN) {
+			ret = rte_intr_callback_unregister(virtq->intr_handle,
+					mlx5_vdpa_virtq_kick_handler, virtq);
+			if (ret == -EAGAIN) {
+				DRV_LOG(DEBUG, "Try again to unregister fd %d of virtq %hu interrupt",
+					rte_intr_fd_get(virtq->intr_handle),
+					virtq->index);
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				usleep(MLX5_VDPA_INTR_RETRIES_USEC);
+				pthread_mutex_lock(&virtq->virtq_lock);
+			}
+		}
+		(void)rte_intr_fd_set(virtq->intr_handle, -1);
+	}
+	rte_intr_instance_free(virtq->intr_handle);
+	virtq->intr_handle = NULL;
+}
+
 /* Release cached VQ resources. */
 void
 mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
@@ -75,6 +111,7 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
+		pthread_mutex_lock(&virtq->virtq_lock);
 		virtq->configured = 0;
 		for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
 			if (virtq->umems[j].obj) {
@@ -90,28 +127,17 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 		}
 		if (virtq->eqp.fw_qp)
 			mlx5_vdpa_event_qp_destroy(&virtq->eqp);
+		pthread_mutex_unlock(&virtq->virtq_lock);
 	}
 }
 
+
 static int
 mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 {
 	int ret = -EAGAIN;
 
-	if (rte_intr_fd_get(virtq->intr_handle) >= 0) {
-		while (ret == -EAGAIN) {
-			ret = rte_intr_callback_unregister(virtq->intr_handle,
-					mlx5_vdpa_virtq_kick_handler, virtq);
-			if (ret == -EAGAIN) {
-				DRV_LOG(DEBUG, "Try again to unregister fd %d of virtq %hu interrupt",
-					rte_intr_fd_get(virtq->intr_handle),
-					virtq->index);
-				usleep(MLX5_VDPA_INTR_RETRIES_USEC);
-			}
-		}
-		rte_intr_fd_set(virtq->intr_handle, -1);
-	}
-	rte_intr_instance_free(virtq->intr_handle);
+	mlx5_vdpa_virtq_unregister_intr_handle(virtq);
 	if (virtq->configured) {
 		ret = mlx5_vdpa_virtq_stop(virtq->priv, virtq->index);
 		if (ret)
@@ -128,10 +154,15 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 void
 mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_virtq *virtq;
 	int i;
 
-	for (i = 0; i < priv->nr_virtqs; i++)
-		mlx5_vdpa_virtq_unset(&priv->virtqs[i]);
+	for (i = 0; i < priv->nr_virtqs; i++) {
+		virtq = &priv->virtqs[i];
+		pthread_mutex_lock(&virtq->virtq_lock);
+		mlx5_vdpa_virtq_unset(virtq);
+		pthread_mutex_unlock(&virtq->virtq_lock);
+	}
 	priv->features = 0;
 	priv->nr_virtqs = 0;
 }
@@ -250,7 +281,7 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 	MLX5_VIRTQ_EVENT_MODE_QP : MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
 	if (attr->event_mode == MLX5_VIRTQ_EVENT_MODE_QP) {
 		ret = mlx5_vdpa_event_qp_prepare(priv,
-				vq->size, vq->callfd, &virtq->eqp);
+				vq->size, vq->callfd, virtq);
 		if (ret) {
 			DRV_LOG(ERR,
 				"Failed to create event QPs for virtq %d.",
@@ -420,7 +451,9 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 	}
 	claim_zero(rte_vhost_enable_guest_notification(priv->vid, index, 1));
 	virtq->configured = 1;
+	rte_spinlock_lock(&priv->db_lock);
 	rte_write32(virtq->index, priv->virtq_db_addr);
+	rte_spinlock_unlock(&priv->db_lock);
 	/* Setup doorbell mapping. */
 	virtq->intr_handle =
 		rte_intr_instance_alloc(RTE_INTR_INSTANCE_F_SHARED);
@@ -441,7 +474,7 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 		if (rte_intr_callback_register(virtq->intr_handle,
 					       mlx5_vdpa_virtq_kick_handler,
 					       virtq)) {
-			rte_intr_fd_set(virtq->intr_handle, -1);
+			(void)rte_intr_fd_set(virtq->intr_handle, -1);
 			DRV_LOG(ERR, "Failed to register virtq %d interrupt.",
 				index);
 			goto error;
@@ -537,6 +570,7 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 	uint32_t i;
 	uint16_t nr_vring = rte_vhost_get_vring_num(priv->vid);
 	int ret = rte_vhost_get_negotiated_features(priv->vid, &priv->features);
+	struct mlx5_vdpa_virtq *virtq;
 
 	if (ret || mlx5_vdpa_features_validate(priv)) {
 		DRV_LOG(ERR, "Failed to configure negotiated features.");
@@ -556,9 +590,17 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 		return -1;
 	}
 	priv->nr_virtqs = nr_vring;
-	for (i = 0; i < nr_vring; i++)
-		if (priv->virtqs[i].enable && mlx5_vdpa_virtq_setup(priv, i))
-			goto error;
+	for (i = 0; i < nr_vring; i++) {
+		virtq = &priv->virtqs[i];
+		if (virtq->enable) {
+			pthread_mutex_lock(&virtq->virtq_lock);
+			if (mlx5_vdpa_virtq_setup(priv, i)) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				goto error;
+			}
+			pthread_mutex_unlock(&virtq->virtq_lock);
+		}
+	}
 	return 0;
 error:
 	mlx5_vdpa_virtqs_release(priv);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v3 08/15] vdpa/mlx5: add multi-thread management for configuration
  2022-06-18  8:47 ` [PATCH v3 " Li Zhang
                     ` (6 preceding siblings ...)
  2022-06-18  8:47   ` [PATCH v3 07/15] vdpa/mlx5: optimize datapath-control synchronization Li Zhang
@ 2022-06-18  8:47   ` Li Zhang
  2022-06-18  8:47   ` [PATCH v3 09/15] vdpa/mlx5: add task ring for MT management Li Zhang
                     ` (6 subsequent siblings)
  14 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-18  8:47 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs
  Cc: dev, thomas, rasland, roniba, maxime.coquelin

The LM process includes many object creations and destructions on
both the source and the destination servers.
The longer LM takes, the more VM packets are dropped.
To reduce LM time, the mlx5 FW configurations need to be issued in
parallel, so add internal multi-thread management to the driver.

A new devarg defines the number of threads and their CPU core.
The management is shared between all the devices of the driver.
Since event_core also affects the datapath event thread, reduce
the priority of the datapath event thread to allow fast
configuration of the devices doing the LM.
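
A standalone sketch of the thread-pool creation follows. It is only an
illustration of the approach: plain pthread calls stand in for the driver
code, real-time SCHED_RR priority usually requires privileges, and the
devargs string in the comment is an assumed example of how
max_conf_threads could be passed together with event_core.

/*
 * Example devargs (illustrative only):
 *   -a <PCI_BDF>,class=vdpa,event_core=2,max_conf_threads=8
 */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

#define EXAMPLE_MAX_THREADS 8

static void *
example_conf_worker(void *arg)
{
	/* Configuration tasks would be pulled from a shared ring here. */
	return arg;
}

/* Create workers pinned to one core with round-robin real-time priority. */
static int
example_threads_create(int cpu_core, pthread_t tids[EXAMPLE_MAX_THREADS])
{
	const struct sched_param sp = {
		.sched_priority = sched_get_priority_max(SCHED_RR),
	};
	pthread_attr_t attr;
	cpu_set_t cpuset;
	int i;

	pthread_attr_init(&attr);
	/* Without explicit scheduling the policy/priority below are ignored. */
	pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
	pthread_attr_setschedpolicy(&attr, SCHED_RR);
	pthread_attr_setschedparam(&attr, &sp);
	CPU_ZERO(&cpuset);
	CPU_SET(cpu_core, &cpuset);
	pthread_attr_setaffinity_np(&attr, sizeof(cpuset), &cpuset);
	for (i = 0; i < EXAMPLE_MAX_THREADS; i++) {
		if (pthread_create(&tids[i], &attr, example_conf_worker,
				   NULL)) {
			fprintf(stderr, "worker %d creation failed\n", i);
			return -1;
		}
	}
	pthread_attr_destroy(&attr);
	return 0;
}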

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 doc/guides/vdpadevs/mlx5.rst          |  11 +++
 drivers/vdpa/mlx5/meson.build         |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.c         |  41 ++++++++
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  36 +++++++
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 129 ++++++++++++++++++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_event.c   |   2 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   |   8 +-
 7 files changed, 223 insertions(+), 5 deletions(-)
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c

diff --git a/doc/guides/vdpadevs/mlx5.rst b/doc/guides/vdpadevs/mlx5.rst
index 0ad77bf535..b75a01688d 100644
--- a/doc/guides/vdpadevs/mlx5.rst
+++ b/doc/guides/vdpadevs/mlx5.rst
@@ -78,6 +78,17 @@ for an additional list of options shared with other mlx5 drivers.
   CPU core number to set polling thread affinity to, default to control plane
   cpu.
 
+- ``max_conf_threads`` parameter [int]
+
+  Allow the driver to use internal threads to obtain fast configuration.
+  All the threads will be open on the same core of the event completion queue scheduling thread.
+
+  - 0, default, don't use internal threads for configuration.
+
+  - 1 - 256, number of internal threads in addition to the caller thread (8 is suggested).
+    This value, if not 0, should be the same for all the devices;
+    the first probe will take it with the event_core for all the multi-thread configurations in the driver.
+
 - ``hw_latency_mode`` parameter [int]
 
   The completion queue moderation mode:
diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build
index 0fa82ad257..9d8dbb1a82 100644
--- a/drivers/vdpa/mlx5/meson.build
+++ b/drivers/vdpa/mlx5/meson.build
@@ -15,6 +15,7 @@ sources = files(
         'mlx5_vdpa_virtq.c',
         'mlx5_vdpa_steer.c',
         'mlx5_vdpa_lm.c',
+        'mlx5_vdpa_cthread.c',
 )
 cflags_options = [
         '-std=c11',
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index e5a11f72fd..a9d023ed08 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -50,6 +50,8 @@ TAILQ_HEAD(mlx5_vdpa_privs, mlx5_vdpa_priv) priv_list =
 					      TAILQ_HEAD_INITIALIZER(priv_list);
 static pthread_mutex_t priv_list_lock = PTHREAD_MUTEX_INITIALIZER;
 
+struct mlx5_vdpa_conf_thread_mng conf_thread_mng;
+
 static void mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv);
 
 static struct mlx5_vdpa_priv *
@@ -493,6 +495,29 @@ mlx5_vdpa_args_check_handler(const char *key, const char *val, void *opaque)
 			DRV_LOG(WARNING, "Invalid event_core %s.", val);
 		else
 			priv->event_core = tmp;
+	} else if (strcmp(key, "max_conf_threads") == 0) {
+		if (tmp) {
+			priv->use_c_thread = true;
+			if (!conf_thread_mng.initializer_priv) {
+				conf_thread_mng.initializer_priv = priv;
+				if (tmp > MLX5_VDPA_MAX_C_THRD) {
+					DRV_LOG(WARNING,
+				"Invalid max_conf_threads %s "
+				"and set max_conf_threads to %d",
+				val, MLX5_VDPA_MAX_C_THRD);
+					tmp = MLX5_VDPA_MAX_C_THRD;
+				}
+				conf_thread_mng.max_thrds = tmp;
+			} else if (tmp != conf_thread_mng.max_thrds) {
+				DRV_LOG(WARNING,
+	"max_conf_threads is PMD argument and not per device, "
+	"only the first device configuration set it, current value is %d "
+	"and will not be changed to %d.",
+				conf_thread_mng.max_thrds, (int)tmp);
+			}
+		} else {
+			priv->use_c_thread = false;
+		}
 	} else if (strcmp(key, "hw_latency_mode") == 0) {
 		priv->hw_latency_mode = (uint32_t)tmp;
 	} else if (strcmp(key, "hw_max_latency_us") == 0) {
@@ -521,6 +546,9 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 		"hw_max_latency_us",
 		"hw_max_pending_comp",
 		"no_traffic_time",
+		"queue_size",
+		"queues",
+		"max_conf_threads",
 		NULL,
 	};
 
@@ -725,6 +753,13 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 	pthread_mutex_init(&priv->steer_update_lock, NULL);
 	priv->cdev = cdev;
 	mlx5_vdpa_config_get(mkvlist, priv);
+	if (priv->use_c_thread) {
+		if (conf_thread_mng.initializer_priv == priv)
+			if (mlx5_vdpa_mult_threads_create(priv->event_core))
+				goto error;
+		__atomic_fetch_add(&conf_thread_mng.refcnt, 1,
+			__ATOMIC_RELAXED);
+	}
 	if (mlx5_vdpa_create_dev_resources(priv))
 		goto error;
 	priv->vdev = rte_vdpa_register_device(cdev->dev, &mlx5_vdpa_ops);
@@ -739,6 +774,8 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 	pthread_mutex_unlock(&priv_list_lock);
 	return 0;
 error:
+	if (conf_thread_mng.initializer_priv == priv)
+		mlx5_vdpa_mult_threads_destroy(false);
 	if (priv)
 		mlx5_vdpa_dev_release(priv);
 	return -rte_errno;
@@ -806,6 +843,10 @@ mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv)
 	mlx5_vdpa_release_dev_resources(priv);
 	if (priv->vdev)
 		rte_vdpa_unregister_device(priv->vdev);
+	if (priv->use_c_thread)
+		if (__atomic_fetch_sub(&conf_thread_mng.refcnt,
+			1, __ATOMIC_RELAXED) == 1)
+			mlx5_vdpa_mult_threads_destroy(true);
 	rte_free(priv);
 }
 
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 3fd5eefc5e..4e7c2557b7 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -73,6 +73,22 @@ enum {
 	MLX5_VDPA_NOTIFIER_STATE_ERR
 };
 
+#define MLX5_VDPA_MAX_C_THRD 256
+
+/* Generic mlx5_vdpa_c_thread information. */
+struct mlx5_vdpa_c_thread {
+	pthread_t tid;
+};
+
+struct mlx5_vdpa_conf_thread_mng {
+	void *initializer_priv;
+	uint32_t refcnt;
+	uint32_t max_thrds;
+	pthread_mutex_t cthrd_lock;
+	struct mlx5_vdpa_c_thread cthrd[MLX5_VDPA_MAX_C_THRD];
+};
+extern struct mlx5_vdpa_conf_thread_mng conf_thread_mng;
+
 struct mlx5_vdpa_virtq {
 	SLIST_ENTRY(mlx5_vdpa_virtq) next;
 	uint8_t enable;
@@ -126,6 +142,7 @@ enum mlx5_dev_state {
 struct mlx5_vdpa_priv {
 	TAILQ_ENTRY(mlx5_vdpa_priv) next;
 	bool connected;
+	bool use_c_thread;
 	enum mlx5_dev_state state;
 	rte_spinlock_t db_lock;
 	pthread_mutex_t steer_update_lock;
@@ -496,4 +513,23 @@ mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv);
 
 bool
 mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv);
+
+/**
+ * Create configuration multi-threads resource
+ *
+ * @param[in] cpu_core
+ *   CPU core number to set configuration threads affinity to.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int
+mlx5_vdpa_mult_threads_create(int cpu_core);
+
+/**
+ * Destroy configuration multi-threads resource
+ *
+ */
+void
+mlx5_vdpa_mult_threads_destroy(bool need_unlock);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
new file mode 100644
index 0000000000..ba7d8b63b3
--- /dev/null
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -0,0 +1,129 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 NVIDIA Corporation & Affiliates
+ */
+#include <string.h>
+#include <unistd.h>
+#include <sys/eventfd.h>
+
+#include <rte_malloc.h>
+#include <rte_errno.h>
+#include <rte_io.h>
+#include <rte_alarm.h>
+#include <rte_tailq.h>
+#include <rte_ring_elem.h>
+
+#include <mlx5_common.h>
+
+#include "mlx5_vdpa_utils.h"
+#include "mlx5_vdpa.h"
+
+static void *
+mlx5_vdpa_c_thread_handle(void *arg)
+{
+	/* To be added later. */
+	return arg;
+}
+
+static void
+mlx5_vdpa_c_thread_destroy(uint32_t thrd_idx, bool need_unlock)
+{
+	if (conf_thread_mng.cthrd[thrd_idx].tid) {
+		pthread_cancel(conf_thread_mng.cthrd[thrd_idx].tid);
+		pthread_join(conf_thread_mng.cthrd[thrd_idx].tid, NULL);
+		conf_thread_mng.cthrd[thrd_idx].tid = 0;
+		if (need_unlock)
+			pthread_mutex_init(&conf_thread_mng.cthrd_lock, NULL);
+	}
+}
+
+static int
+mlx5_vdpa_c_thread_create(int cpu_core)
+{
+	const struct sched_param sp = {
+		.sched_priority = sched_get_priority_max(SCHED_RR),
+	};
+	rte_cpuset_t cpuset;
+	pthread_attr_t attr;
+	uint32_t thrd_idx;
+	char name[32];
+	int ret;
+
+	pthread_mutex_lock(&conf_thread_mng.cthrd_lock);
+	pthread_attr_init(&attr);
+	ret = pthread_attr_setschedpolicy(&attr, SCHED_RR);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to set thread sched policy = RR.");
+		goto c_thread_err;
+	}
+	ret = pthread_attr_setschedparam(&attr, &sp);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to set thread priority.");
+		goto c_thread_err;
+	}
+	for (thrd_idx = 0; thrd_idx < conf_thread_mng.max_thrds;
+		thrd_idx++) {
+		ret = pthread_create(&conf_thread_mng.cthrd[thrd_idx].tid,
+				&attr, mlx5_vdpa_c_thread_handle,
+				(void *)&conf_thread_mng);
+		if (ret) {
+			DRV_LOG(ERR, "Failed to create vdpa multi-threads %d.",
+					thrd_idx);
+			goto c_thread_err;
+		}
+		CPU_ZERO(&cpuset);
+		if (cpu_core != -1)
+			CPU_SET(cpu_core, &cpuset);
+		else
+			cpuset = rte_lcore_cpuset(rte_get_main_lcore());
+		ret = pthread_setaffinity_np(
+				conf_thread_mng.cthrd[thrd_idx].tid,
+				sizeof(cpuset), &cpuset);
+		if (ret) {
+			DRV_LOG(ERR, "Failed to set thread affinity for "
+			"vdpa multi-threads %d.", thrd_idx);
+			goto c_thread_err;
+		}
+		snprintf(name, sizeof(name), "vDPA-mthread-%d", thrd_idx);
+		ret = pthread_setname_np(
+				conf_thread_mng.cthrd[thrd_idx].tid, name);
+		if (ret)
+			DRV_LOG(ERR, "Failed to set vdpa multi-threads name %s.",
+					name);
+		else
+			DRV_LOG(DEBUG, "Thread name: %s.", name);
+	}
+	pthread_mutex_unlock(&conf_thread_mng.cthrd_lock);
+	return 0;
+c_thread_err:
+	for (thrd_idx = 0; thrd_idx < conf_thread_mng.max_thrds;
+		thrd_idx++)
+		mlx5_vdpa_c_thread_destroy(thrd_idx, false);
+	pthread_mutex_unlock(&conf_thread_mng.cthrd_lock);
+	return -1;
+}
+
+int
+mlx5_vdpa_mult_threads_create(int cpu_core)
+{
+	pthread_mutex_init(&conf_thread_mng.cthrd_lock, NULL);
+	if (mlx5_vdpa_c_thread_create(cpu_core)) {
+		DRV_LOG(ERR, "Cannot create vDPA configuration threads.");
+		mlx5_vdpa_mult_threads_destroy(false);
+		return -1;
+	}
+	return 0;
+}
+
+void
+mlx5_vdpa_mult_threads_destroy(bool need_unlock)
+{
+	uint32_t thrd_idx;
+
+	if (!conf_thread_mng.initializer_priv)
+		return;
+	for (thrd_idx = 0; thrd_idx < conf_thread_mng.max_thrds;
+		thrd_idx++)
+		mlx5_vdpa_c_thread_destroy(thrd_idx, need_unlock);
+	pthread_mutex_destroy(&conf_thread_mng.cthrd_lock);
+	memset(&conf_thread_mng, 0, sizeof(struct mlx5_vdpa_conf_thread_mng));
+}
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index 2b0f5936d1..b45fbac146 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -507,7 +507,7 @@ mlx5_vdpa_cqe_event_setup(struct mlx5_vdpa_priv *priv)
 	pthread_attr_t attr;
 	char name[16];
 	const struct sched_param sp = {
-		.sched_priority = sched_get_priority_max(SCHED_RR),
+		.sched_priority = sched_get_priority_max(SCHED_RR) - 1,
 	};
 
 	if (!priv->eventc)
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 63b6f44725..ce3f524fdb 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -43,7 +43,7 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 			    errno == EWOULDBLOCK ||
 			    errno == EAGAIN)
 				continue;
-			DRV_LOG(ERR,  "Failed to read kickfd of virtq %d: %s",
+			DRV_LOG(ERR,  "Failed to read kickfd of virtq %d: %s.",
 				virtq->index, strerror(errno));
 		}
 		break;
@@ -57,7 +57,7 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 	rte_spinlock_unlock(&priv->db_lock);
 	pthread_mutex_unlock(&virtq->virtq_lock);
 	if (priv->state != MLX5_VDPA_STATE_CONFIGURED && !virtq->enable) {
-		DRV_LOG(ERR,  "device %d queue %d down, skip kick handling",
+		DRV_LOG(ERR,  "device %d queue %d down, skip kick handling.",
 			priv->vid, virtq->index);
 		return;
 	}
@@ -218,7 +218,7 @@ mlx5_vdpa_virtq_query(struct mlx5_vdpa_priv *priv, int index)
 		return -1;
 	}
 	if (attr.state == MLX5_VIRTQ_STATE_ERROR)
-		DRV_LOG(WARNING, "vid %d vring %d hw error=%hhu",
+		DRV_LOG(WARNING, "vid %d vring %d hw error=%hhu.",
 			priv->vid, index, attr.error_type);
 	return 0;
 }
@@ -380,7 +380,7 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 	if (ret) {
 		last_avail_idx = 0;
 		last_used_idx = 0;
-		DRV_LOG(WARNING, "Couldn't get vring base, idx are set to 0");
+		DRV_LOG(WARNING, "Couldn't get vring base, idx are set to 0.");
 	} else {
 		DRV_LOG(INFO, "vid %d: Init last_avail_idx=%d, last_used_idx=%d for "
 				"virtq %d.", priv->vid, last_avail_idx,
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v3 09/15] vdpa/mlx5: add task ring for MT management
  2022-06-18  8:47 ` [PATCH v3 " Li Zhang
                     ` (7 preceding siblings ...)
  2022-06-18  8:47   ` [PATCH v3 08/15] vdpa/mlx5: add multi-thread management for configuration Li Zhang
@ 2022-06-18  8:47   ` Li Zhang
  2022-06-18  8:48   ` [PATCH v3 10/15] vdpa/mlx5: add MT task for VM memory registration Li Zhang
                     ` (5 subsequent siblings)
  14 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-18  8:47 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs
  Cc: dev, thomas, rasland, roniba, maxime.coquelin

The configuration threads need a task container that supports
multiple tasks assigned to a thread in parallel.
Use an rte_ring container per thread to manage
the thread tasks without locks.
The caller thread from the user context opens a task to
a thread and enqueues it to the thread ring.
The thread polls its ring and dequeues tasks.
That is why the ring should be in multi-producer
and single-consumer mode.
An atomic counter manages the task completion notification.
The threads report errors to the caller through
a dedicated error counter per task.
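
The completion handshake described above can be shown with a small
standalone sketch. It uses plain C11 atomics instead of the driver's
__atomic builtins, and all names below are invented for illustration
only; it is not the driver code.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* One queued task: the batch-wide counters are shared by reference so a
 * worker can report completion and errors back to the caller.
 */
struct task {
	atomic_uint *remaining_cnt;
	atomic_uint *err_cnt;
	unsigned int idx;
};

/* Worker side: do the work, then decrement the batch counter. */
static void
run_task(struct task *t, bool failed)
{
	if (failed)
		atomic_fetch_add_explicit(t->err_cnt, 1, memory_order_relaxed);
	atomic_fetch_sub_explicit(t->remaining_cnt, 1, memory_order_relaxed);
}

/* Caller side: sleep-poll until the whole batch completed, then check
 * whether any task reported an error.
 */
static bool
batch_had_errors(atomic_uint *remaining_cnt, atomic_uint *err_cnt)
{
	while (atomic_load_explicit(remaining_cnt, memory_order_relaxed) != 0)
		usleep(100);
	return atomic_load_explicit(err_cnt, memory_order_relaxed) != 0;
}

int
main(void)
{
	atomic_uint remaining = 2, errors = 0;
	struct task t0 = { &remaining, &errors, 0 };
	struct task t1 = { &remaining, &errors, 1 };

	run_task(&t0, false);	/* Normally executed by a worker thread. */
	run_task(&t1, true);	/* Simulate one failing task. */
	printf("batch errors: %s\n",
	       batch_had_errors(&remaining, &errors) ? "yes" : "no");
	return 0;
}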

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  17 ++++
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 115 +++++++++++++++++++++++++-
 2 files changed, 130 insertions(+), 2 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 4e7c2557b7..2bbb868ec6 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -74,10 +74,22 @@ enum {
 };
 
 #define MLX5_VDPA_MAX_C_THRD 256
+#define MLX5_VDPA_MAX_TASKS_PER_THRD 4096
+#define MLX5_VDPA_TASKS_PER_DEV 64
+
+/* Generic task information and size must be multiple of 4B. */
+struct mlx5_vdpa_task {
+	struct mlx5_vdpa_priv *priv;
+	uint32_t *remaining_cnt;
+	uint32_t *err_cnt;
+	uint32_t idx;
+} __rte_packed __rte_aligned(4);
 
 /* Generic mlx5_vdpa_c_thread information. */
 struct mlx5_vdpa_c_thread {
 	pthread_t tid;
+	struct rte_ring *rng;
+	pthread_cond_t c_cond;
 };
 
 struct mlx5_vdpa_conf_thread_mng {
@@ -532,4 +544,9 @@ mlx5_vdpa_mult_threads_create(int cpu_core);
  */
 void
 mlx5_vdpa_mult_threads_destroy(bool need_unlock);
+
+bool
+mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
+		uint32_t thrd_idx,
+		uint32_t num);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index ba7d8b63b3..1fdc92d3ad 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -11,17 +11,103 @@
 #include <rte_alarm.h>
 #include <rte_tailq.h>
 #include <rte_ring_elem.h>
+#include <rte_ring_peek.h>
 
 #include <mlx5_common.h>
 
 #include "mlx5_vdpa_utils.h"
 #include "mlx5_vdpa.h"
 
+static inline uint32_t
+mlx5_vdpa_c_thrd_ring_dequeue_bulk(struct rte_ring *r,
+	void **obj, uint32_t n, uint32_t *avail)
+{
+	uint32_t m;
+
+	m = rte_ring_dequeue_bulk_elem_start(r, obj,
+		sizeof(struct mlx5_vdpa_task), n, avail);
+	n = (m == n) ? n : 0;
+	rte_ring_dequeue_elem_finish(r, n);
+	return n;
+}
+
+static inline uint32_t
+mlx5_vdpa_c_thrd_ring_enqueue_bulk(struct rte_ring *r,
+	void * const *obj, uint32_t n, uint32_t *free)
+{
+	uint32_t m;
+
+	m = rte_ring_enqueue_bulk_elem_start(r, n, free);
+	n = (m == n) ? n : 0;
+	rte_ring_enqueue_elem_finish(r, obj,
+		sizeof(struct mlx5_vdpa_task), n);
+	return n;
+}
+
+bool
+mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
+		uint32_t thrd_idx,
+		uint32_t num)
+{
+	struct rte_ring *rng = conf_thread_mng.cthrd[thrd_idx].rng;
+	struct mlx5_vdpa_task task[MLX5_VDPA_TASKS_PER_DEV];
+	uint32_t i;
+
+	MLX5_ASSERT(num <= MLX5_VDPA_TASKS_PER_DEV);
+	for (i = 0 ; i < num; i++) {
+		task[i].priv = priv;
+		/* To be added later. */
+	}
+	if (!mlx5_vdpa_c_thrd_ring_enqueue_bulk(rng, (void **)&task, num, NULL))
+		return -1;
+	for (i = 0 ; i < num; i++)
+		if (task[i].remaining_cnt)
+			__atomic_fetch_add(task[i].remaining_cnt, 1,
+				__ATOMIC_RELAXED);
+	/* wake up conf thread. */
+	pthread_mutex_lock(&conf_thread_mng.cthrd_lock);
+	pthread_cond_signal(&conf_thread_mng.cthrd[thrd_idx].c_cond);
+	pthread_mutex_unlock(&conf_thread_mng.cthrd_lock);
+	return 0;
+}
+
 static void *
 mlx5_vdpa_c_thread_handle(void *arg)
 {
-	/* To be added later. */
-	return arg;
+	struct mlx5_vdpa_conf_thread_mng *multhrd = arg;
+	pthread_t thread_id = pthread_self();
+	struct mlx5_vdpa_priv *priv;
+	struct mlx5_vdpa_task task;
+	struct rte_ring *rng;
+	uint32_t thrd_idx;
+	uint32_t task_num;
+
+	for (thrd_idx = 0; thrd_idx < multhrd->max_thrds;
+		thrd_idx++)
+		if (multhrd->cthrd[thrd_idx].tid == thread_id)
+			break;
+	if (thrd_idx >= multhrd->max_thrds)
+		return NULL;
+	rng = multhrd->cthrd[thrd_idx].rng;
+	while (1) {
+		task_num = mlx5_vdpa_c_thrd_ring_dequeue_bulk(rng,
+			(void **)&task, 1, NULL);
+		if (!task_num) {
+			/* No task and condition wait. */
+			pthread_mutex_lock(&multhrd->cthrd_lock);
+			pthread_cond_wait(
+				&multhrd->cthrd[thrd_idx].c_cond,
+				&multhrd->cthrd_lock);
+			pthread_mutex_unlock(&multhrd->cthrd_lock);
+		}
+		priv = task.priv;
+		if (priv == NULL)
+			continue;
+		__atomic_fetch_sub(task.remaining_cnt,
+			1, __ATOMIC_RELAXED);
+		/* To be added later. */
+	}
+	return NULL;
 }
 
 static void
@@ -34,6 +120,10 @@ mlx5_vdpa_c_thread_destroy(uint32_t thrd_idx, bool need_unlock)
 		if (need_unlock)
 			pthread_mutex_init(&conf_thread_mng.cthrd_lock, NULL);
 	}
+	if (conf_thread_mng.cthrd[thrd_idx].rng) {
+		rte_ring_free(conf_thread_mng.cthrd[thrd_idx].rng);
+		conf_thread_mng.cthrd[thrd_idx].rng = NULL;
+	}
 }
 
 static int
@@ -45,6 +135,7 @@ mlx5_vdpa_c_thread_create(int cpu_core)
 	rte_cpuset_t cpuset;
 	pthread_attr_t attr;
 	uint32_t thrd_idx;
+	uint32_t ring_num;
 	char name[32];
 	int ret;
 
@@ -60,8 +151,26 @@ mlx5_vdpa_c_thread_create(int cpu_core)
 		DRV_LOG(ERR, "Failed to set thread priority.");
 		goto c_thread_err;
 	}
+	ring_num = MLX5_VDPA_MAX_TASKS_PER_THRD / conf_thread_mng.max_thrds;
+	if (!ring_num) {
+		DRV_LOG(ERR, "Invalid ring number for thread.");
+		goto c_thread_err;
+	}
 	for (thrd_idx = 0; thrd_idx < conf_thread_mng.max_thrds;
 		thrd_idx++) {
+		snprintf(name, sizeof(name), "vDPA-mthread-ring-%d",
+			thrd_idx);
+		conf_thread_mng.cthrd[thrd_idx].rng = rte_ring_create_elem(name,
+			sizeof(struct mlx5_vdpa_task), ring_num,
+			rte_socket_id(),
+			RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ |
+			RING_F_EXACT_SZ);
+		if (!conf_thread_mng.cthrd[thrd_idx].rng) {
+			DRV_LOG(ERR,
+			"Failed to create vdpa multi-threads %d ring.",
+			thrd_idx);
+			goto c_thread_err;
+		}
 		ret = pthread_create(&conf_thread_mng.cthrd[thrd_idx].tid,
 				&attr, mlx5_vdpa_c_thread_handle,
 				(void *)&conf_thread_mng);
@@ -91,6 +200,8 @@ mlx5_vdpa_c_thread_create(int cpu_core)
 					name);
 		else
 			DRV_LOG(DEBUG, "Thread name: %s.", name);
+		pthread_cond_init(&conf_thread_mng.cthrd[thrd_idx].c_cond,
+			NULL);
 	}
 	pthread_mutex_unlock(&conf_thread_mng.cthrd_lock);
 	return 0;
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v3 10/15] vdpa/mlx5: add MT task for VM memory registration
  2022-06-18  8:47 ` [PATCH v3 " Li Zhang
                     ` (8 preceding siblings ...)
  2022-06-18  8:47   ` [PATCH v3 09/15] vdpa/mlx5: add task ring for MT management Li Zhang
@ 2022-06-18  8:48   ` Li Zhang
  2022-06-18  8:48   ` [PATCH v3 11/15] vdpa/mlx5: add virtq creation task for MT management Li Zhang
                     ` (4 subsequent siblings)
  14 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-18  8:48 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs
  Cc: dev, thomas, rasland, roniba, maxime.coquelin

The driver creates a direct HW MR object for
each VM memory region, which maps the VM
physical address to the actual physical address.

Later, after all the MRs are ready,
the driver creates an indirect MR to group all the direct MRs
into one virtual space from the HW perspective.

Create direct MRs in parallel using the MT mechanism.
After completion, the primary thread creates the indirect MR
needed for the following virtq configurations.

This optimization accelerates the LM process and
reduces its time by 5%.
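
The work split between the caller and the configuration threads can be
sketched as a standalone program. Every (max_thrds + 1)-th region stays
on the caller and the rest are spread round-robin; the 0/1-based
executor numbering below is only for the printout and the helper is an
illustration, not the driver code.

#include <stdio.h>

/* Which executor handles memory region i: 0 means the caller registers
 * it itself, 1..max_thrds means it is queued to that config thread.
 */
static unsigned int
assign_region(unsigned int i, unsigned int max_thrds, unsigned int *last_thrd)
{
	if (i % (max_thrds + 1) == 0)
		return 0;
	*last_thrd = (*last_thrd + 1) % max_thrds;
	return *last_thrd + 1;
}

int
main(void)
{
	unsigned int last = 0, i;

	/* With 3 configuration threads, every 4th region stays on the
	 * caller and the rest rotate over the workers.
	 */
	for (i = 0; i < 8; i++)
		printf("region %u -> executor %u\n", i,
		       assign_region(i, 3, &last));
	return 0;
}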

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c         |   1 -
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  31 ++-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c |  47 ++++-
 drivers/vdpa/mlx5/mlx5_vdpa_mem.c     | 270 ++++++++++++++++++--------
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   |   6 +-
 5 files changed, 258 insertions(+), 97 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index a9d023ed08..e3b32fa087 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -768,7 +768,6 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 		rte_errno = rte_errno ? rte_errno : EINVAL;
 		goto error;
 	}
-	SLIST_INIT(&priv->mr_list);
 	pthread_mutex_lock(&priv_list_lock);
 	TAILQ_INSERT_TAIL(&priv_list, priv, next);
 	pthread_mutex_unlock(&priv_list_lock);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 2bbb868ec6..3316ce42be 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -59,7 +59,6 @@ struct mlx5_vdpa_event_qp {
 };
 
 struct mlx5_vdpa_query_mr {
-	SLIST_ENTRY(mlx5_vdpa_query_mr) next;
 	union {
 		struct ibv_mr *mr;
 		struct mlx5_devx_obj *mkey;
@@ -76,10 +75,17 @@ enum {
 #define MLX5_VDPA_MAX_C_THRD 256
 #define MLX5_VDPA_MAX_TASKS_PER_THRD 4096
 #define MLX5_VDPA_TASKS_PER_DEV 64
+#define MLX5_VDPA_MAX_MRS 0xFFFF
+
+/* Vdpa task types. */
+enum mlx5_vdpa_task_type {
+	MLX5_VDPA_TASK_REG_MR = 1,
+};
 
 /* Generic task information and size must be multiple of 4B. */
 struct mlx5_vdpa_task {
 	struct mlx5_vdpa_priv *priv;
+	enum mlx5_vdpa_task_type type;
 	uint32_t *remaining_cnt;
 	uint32_t *err_cnt;
 	uint32_t idx;
@@ -101,6 +107,14 @@ struct mlx5_vdpa_conf_thread_mng {
 };
 extern struct mlx5_vdpa_conf_thread_mng conf_thread_mng;
 
+struct mlx5_vdpa_vmem_info {
+	struct rte_vhost_memory *vmem;
+	uint32_t entries_num;
+	uint64_t gcd;
+	uint64_t size;
+	uint8_t mode;
+};
+
 struct mlx5_vdpa_virtq {
 	SLIST_ENTRY(mlx5_vdpa_virtq) next;
 	uint8_t enable;
@@ -176,7 +190,7 @@ struct mlx5_vdpa_priv {
 	struct mlx5_hca_vdpa_attr caps;
 	uint32_t gpa_mkey_index;
 	struct ibv_mr *null_mr;
-	struct rte_vhost_memory *vmem;
+	struct mlx5_vdpa_vmem_info vmem_info;
 	struct mlx5dv_devx_event_channel *eventc;
 	struct mlx5dv_devx_event_channel *err_chnl;
 	struct mlx5_uar uar;
@@ -187,11 +201,13 @@ struct mlx5_vdpa_priv {
 	uint8_t num_lag_ports;
 	uint64_t features; /* Negotiated features. */
 	uint16_t log_max_rqt_size;
+	uint16_t last_c_thrd_idx;
+	uint16_t num_mrs; /* Number of memory regions. */
 	struct mlx5_vdpa_steer steer;
 	struct mlx5dv_var *var;
 	void *virtq_db_addr;
 	struct mlx5_pmd_wrapped_mr lm_mr;
-	SLIST_HEAD(mr_list, mlx5_vdpa_query_mr) mr_list;
+	struct mlx5_vdpa_query_mr **mrs;
 	struct mlx5_vdpa_virtq virtqs[];
 };
 
@@ -548,5 +564,12 @@ mlx5_vdpa_mult_threads_destroy(bool need_unlock);
 bool
 mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
 		uint32_t thrd_idx,
-		uint32_t num);
+		enum mlx5_vdpa_task_type task_type,
+		uint32_t *bulk_refcnt, uint32_t *bulk_err_cnt,
+		void **task_data, uint32_t num);
+int
+mlx5_vdpa_register_mr(struct mlx5_vdpa_priv *priv, uint32_t idx);
+bool
+mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
+		uint32_t *err_cnt, uint32_t sleep_time);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index 1fdc92d3ad..10391931ae 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -47,16 +47,23 @@ mlx5_vdpa_c_thrd_ring_enqueue_bulk(struct rte_ring *r,
 bool
 mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
 		uint32_t thrd_idx,
-		uint32_t num)
+		enum mlx5_vdpa_task_type task_type,
+		uint32_t *remaining_cnt, uint32_t *err_cnt,
+		void **task_data, uint32_t num)
 {
 	struct rte_ring *rng = conf_thread_mng.cthrd[thrd_idx].rng;
 	struct mlx5_vdpa_task task[MLX5_VDPA_TASKS_PER_DEV];
+	uint32_t *data = (uint32_t *)task_data;
 	uint32_t i;
 
 	MLX5_ASSERT(num <= MLX5_VDPA_TASKS_PER_DEV);
 	for (i = 0 ; i < num; i++) {
 		task[i].priv = priv;
 		/* To be added later. */
+		task[i].type = task_type;
+		task[i].remaining_cnt = remaining_cnt;
+		task[i].err_cnt = err_cnt;
+		task[i].idx = data[i];
 	}
 	if (!mlx5_vdpa_c_thrd_ring_enqueue_bulk(rng, (void **)&task, num, NULL))
 		return -1;
@@ -71,6 +78,23 @@ mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
 	return 0;
 }
 
+bool
+mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
+		uint32_t *err_cnt, uint32_t sleep_time)
+{
+	/* Check and wait all tasks done. */
+	while (__atomic_load_n(remaining_cnt,
+		__ATOMIC_RELAXED) != 0) {
+		rte_delay_us_sleep(sleep_time);
+	}
+	if (__atomic_load_n(err_cnt,
+		__ATOMIC_RELAXED)) {
+		DRV_LOG(ERR, "Tasks done with error.");
+		return true;
+	}
+	return false;
+}
+
 static void *
 mlx5_vdpa_c_thread_handle(void *arg)
 {
@@ -81,6 +105,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
 	struct rte_ring *rng;
 	uint32_t thrd_idx;
 	uint32_t task_num;
+	int ret;
 
 	for (thrd_idx = 0; thrd_idx < multhrd->max_thrds;
 		thrd_idx++)
@@ -99,13 +124,29 @@ mlx5_vdpa_c_thread_handle(void *arg)
 				&multhrd->cthrd[thrd_idx].c_cond,
 				&multhrd->cthrd_lock);
 			pthread_mutex_unlock(&multhrd->cthrd_lock);
+			continue;
 		}
 		priv = task.priv;
 		if (priv == NULL)
 			continue;
-		__atomic_fetch_sub(task.remaining_cnt,
+		switch (task.type) {
+		case MLX5_VDPA_TASK_REG_MR:
+			ret = mlx5_vdpa_register_mr(priv, task.idx);
+			if (ret) {
+				DRV_LOG(ERR,
+				"Failed to register mr %d.", task.idx);
+				__atomic_fetch_add(task.err_cnt, 1,
+				__ATOMIC_RELAXED);
+			}
+			break;
+		default:
+			DRV_LOG(ERR, "Invalid vdpa task type %d.",
+			task.type);
+			break;
+		}
+		if (task.remaining_cnt)
+			__atomic_fetch_sub(task.remaining_cnt,
 			1, __ATOMIC_RELAXED);
-		/* To be added later. */
 	}
 	return NULL;
 }
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_mem.c b/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
index d6e3dd664b..e333f0bca6 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
@@ -17,25 +17,33 @@
 void
 mlx5_vdpa_mem_dereg(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_query_mr *mrs =
+		(struct mlx5_vdpa_query_mr *)priv->mrs;
 	struct mlx5_vdpa_query_mr *entry;
-	struct mlx5_vdpa_query_mr *next;
+	int i;
 
-	entry = SLIST_FIRST(&priv->mr_list);
-	while (entry) {
-		next = SLIST_NEXT(entry, next);
-		if (entry->is_indirect)
-			claim_zero(mlx5_devx_cmd_destroy(entry->mkey));
-		else
-			claim_zero(mlx5_glue->dereg_mr(entry->mr));
-		SLIST_REMOVE(&priv->mr_list, entry, mlx5_vdpa_query_mr, next);
-		rte_free(entry);
-		entry = next;
+	if (priv->mrs) {
+		for (i = priv->num_mrs - 1; i >= 0; i--) {
+			entry = &mrs[i];
+			if (entry->is_indirect) {
+				if (entry->mkey)
+					claim_zero(
+					mlx5_devx_cmd_destroy(entry->mkey));
+			} else {
+				if (entry->mr)
+					claim_zero(
+					mlx5_glue->dereg_mr(entry->mr));
+			}
+		}
+		rte_free(priv->mrs);
+		priv->mrs = NULL;
+		priv->num_mrs = 0;
 	}
-	SLIST_INIT(&priv->mr_list);
-	if (priv->vmem) {
-		free(priv->vmem);
-		priv->vmem = NULL;
+	if (priv->vmem_info.vmem) {
+		free(priv->vmem_info.vmem);
+		priv->vmem_info.vmem = NULL;
 	}
+	priv->gpa_mkey_index = 0;
 }
 
 static int
@@ -167,72 +175,29 @@ mlx5_vdpa_mem_cmp(struct rte_vhost_memory *mem1, struct rte_vhost_memory *mem2)
 #define KLM_SIZE_MAX_ALIGN(sz) ((sz) > MLX5_MAX_KLM_BYTE_COUNT ? \
 				MLX5_MAX_KLM_BYTE_COUNT : (sz))
 
-/*
- * The target here is to group all the physical memory regions of the
- * virtio device in one indirect mkey.
- * For KLM Fixed Buffer Size mode (HW find the translation entry in one
- * read according to the guest physical address):
- * All the sub-direct mkeys of it must be in the same size, hence, each
- * one of them should be in the GCD size of all the virtio memory
- * regions and the holes between them.
- * For KLM mode (each entry may be in different size so HW must iterate
- * the entries):
- * Each virtio memory region and each hole between them have one entry,
- * just need to cover the maximum allowed size(2G) by splitting entries
- * which their associated memory regions are bigger than 2G.
- * It means that each virtio memory region may be mapped to more than
- * one direct mkey in the 2 modes.
- * All the holes of invalid memory between the virtio memory regions
- * will be mapped to the null memory region for security.
- */
-int
-mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
+static int
+mlx5_vdpa_create_indirect_mkey(struct mlx5_vdpa_priv *priv)
 {
 	struct mlx5_devx_mkey_attr mkey_attr;
-	struct mlx5_vdpa_query_mr *entry = NULL;
-	struct rte_vhost_mem_region *reg = NULL;
-	uint8_t mode = 0;
-	uint32_t entries_num = 0;
-	uint32_t i;
-	uint64_t gcd = 0;
+	struct mlx5_vdpa_query_mr *mrs =
+		(struct mlx5_vdpa_query_mr *)priv->mrs;
+	struct mlx5_vdpa_query_mr *entry;
+	struct rte_vhost_mem_region *reg;
+	uint8_t mode = priv->vmem_info.mode;
+	uint32_t entries_num = priv->vmem_info.entries_num;
+	struct rte_vhost_memory *mem = priv->vmem_info.vmem;
+	struct mlx5_klm klm_array[entries_num];
+	uint64_t gcd = priv->vmem_info.gcd;
+	int ret = -rte_errno;
 	uint64_t klm_size;
-	uint64_t mem_size;
-	uint64_t k;
 	int klm_index = 0;
-	int ret;
-	struct rte_vhost_memory *mem = mlx5_vdpa_vhost_mem_regions_prepare
-			      (priv->vid, &mode, &mem_size, &gcd, &entries_num);
-	struct mlx5_klm klm_array[entries_num];
+	uint64_t k;
+	uint32_t i;
 
-	if (!mem)
-		return -rte_errno;
-	if (priv->vmem != NULL) {
-		if (mlx5_vdpa_mem_cmp(mem, priv->vmem) == 0) {
-			/* VM memory not changed, reuse resources. */
-			free(mem);
-			return 0;
-		}
-		mlx5_vdpa_mem_dereg(priv);
-	}
-	priv->vmem = mem;
+	/* If it is the last entry, create indirect mkey. */
 	for (i = 0; i < mem->nregions; i++) {
+		entry = &mrs[i];
 		reg = &mem->regions[i];
-		entry = rte_zmalloc(__func__, sizeof(*entry), 0);
-		if (!entry) {
-			ret = -ENOMEM;
-			DRV_LOG(ERR, "Failed to allocate mem entry memory.");
-			goto error;
-		}
-		entry->mr = mlx5_glue->reg_mr_iova(priv->cdev->pd,
-				       (void *)(uintptr_t)(reg->host_user_addr),
-				       reg->size, reg->guest_phys_addr,
-				       IBV_ACCESS_LOCAL_WRITE);
-		if (!entry->mr) {
-			DRV_LOG(ERR, "Failed to create direct Mkey.");
-			ret = -rte_errno;
-			goto error;
-		}
-		entry->is_indirect = 0;
 		if (i > 0) {
 			uint64_t sadd;
 			uint64_t empty_region_sz = reg->guest_phys_addr -
@@ -265,11 +230,10 @@ mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
 			klm_array[klm_index].address = reg->guest_phys_addr + k;
 			klm_index++;
 		}
-		SLIST_INSERT_HEAD(&priv->mr_list, entry, next);
 	}
 	memset(&mkey_attr, 0, sizeof(mkey_attr));
 	mkey_attr.addr = (uintptr_t)(mem->regions[0].guest_phys_addr);
-	mkey_attr.size = mem_size;
+	mkey_attr.size = priv->vmem_info.size;
 	mkey_attr.pd = priv->cdev->pdn;
 	mkey_attr.umem_id = 0;
 	/* Must be zero for KLM mode. */
@@ -278,25 +242,159 @@ mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
 	mkey_attr.pg_access = 0;
 	mkey_attr.klm_array = klm_array;
 	mkey_attr.klm_num = klm_index;
-	entry = rte_zmalloc(__func__, sizeof(*entry), 0);
-	if (!entry) {
-		DRV_LOG(ERR, "Failed to allocate memory for indirect entry.");
-		ret = -ENOMEM;
-		goto error;
-	}
+	entry = &mrs[mem->nregions];
 	entry->mkey = mlx5_devx_cmd_mkey_create(priv->cdev->ctx, &mkey_attr);
 	if (!entry->mkey) {
 		DRV_LOG(ERR, "Failed to create indirect Mkey.");
-		ret = -rte_errno;
-		goto error;
+		rte_errno = -ret;
+		return ret;
 	}
 	entry->is_indirect = 1;
-	SLIST_INSERT_HEAD(&priv->mr_list, entry, next);
 	priv->gpa_mkey_index = entry->mkey->id;
 	return 0;
+}
+
+/*
+ * The target here is to group all the physical memory regions of the
+ * virtio device in one indirect mkey.
+ * For KLM Fixed Buffer Size mode (HW find the translation entry in one
+ * read according to the guest physical address):
+ * All the sub-direct mkeys of it must be in the same size, hence, each
+ * one of them should be in the GCD size of all the virtio memory
+ * regions and the holes between them.
+ * For KLM mode (each entry may be in different size so HW must iterate
+ * the entries):
+ * Each virtio memory region and each hole between them have one entry,
+ * just need to cover the maximum allowed size(2G) by splitting entries
+ * which their associated memory regions are bigger than 2G.
+ * It means that each virtio memory region may be mapped to more than
+ * one direct mkey in the 2 modes.
+ * All the holes of invalid memory between the virtio memory regions
+ * will be mapped to the null memory region for security.
+ */
+int
+mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
+{
+	void *mrs;
+	uint8_t mode = 0;
+	int ret = -rte_errno;
+	uint32_t i, thrd_idx, data[1];
+	uint32_t remaining_cnt = 0, err_cnt = 0, task_num = 0;
+	struct rte_vhost_memory *mem = mlx5_vdpa_vhost_mem_regions_prepare
+			(priv->vid, &mode, &priv->vmem_info.size,
+			&priv->vmem_info.gcd, &priv->vmem_info.entries_num);
+
+	if (!mem)
+		return -rte_errno;
+	if (priv->vmem_info.vmem != NULL) {
+		if (mlx5_vdpa_mem_cmp(mem, priv->vmem_info.vmem) == 0) {
+			/* VM memory not changed, reuse resources. */
+			free(mem);
+			return 0;
+		}
+		mlx5_vdpa_mem_dereg(priv);
+	}
+	priv->vmem_info.vmem = mem;
+	priv->vmem_info.mode = mode;
+	priv->num_mrs = mem->nregions;
+	if (!priv->num_mrs || priv->num_mrs >= MLX5_VDPA_MAX_MRS) {
+		DRV_LOG(ERR,
+		"Invalid number of memory regions.");
+		goto error;
+	}
+	/* The last one is indirect mkey entry. */
+	priv->num_mrs++;
+	mrs = rte_zmalloc("mlx5 vDPA memory regions",
+		sizeof(struct mlx5_vdpa_query_mr) * priv->num_mrs, 0);
+	priv->mrs = mrs;
+	if (!priv->mrs) {
+		DRV_LOG(ERR, "Failed to allocate private memory regions.");
+		goto error;
+	}
+	if (priv->use_c_thread) {
+		uint32_t main_task_idx[mem->nregions];
+
+		for (i = 0; i < mem->nregions; i++) {
+			thrd_idx = i % (conf_thread_mng.max_thrds + 1);
+			if (!thrd_idx) {
+				main_task_idx[task_num] = i;
+				task_num++;
+				continue;
+			}
+			thrd_idx = priv->last_c_thrd_idx + 1;
+			if (thrd_idx >= conf_thread_mng.max_thrds)
+				thrd_idx = 0;
+			priv->last_c_thrd_idx = thrd_idx;
+			data[0] = i;
+			if (mlx5_vdpa_task_add(priv, thrd_idx,
+				MLX5_VDPA_TASK_REG_MR,
+				&remaining_cnt, &err_cnt,
+				(void **)&data, 1)) {
+				DRV_LOG(ERR,
+				"Fail to add task mem region (%d)", i);
+				main_task_idx[task_num] = i;
+				task_num++;
+			}
+		}
+		for (i = 0; i < task_num; i++) {
+			ret = mlx5_vdpa_register_mr(priv,
+					main_task_idx[i]);
+			if (ret) {
+				DRV_LOG(ERR,
+				"Failed to register mem region %d.", i);
+				goto error;
+			}
+		}
+		if (mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining_cnt,
+			&err_cnt, 100)) {
+			DRV_LOG(ERR,
+			"Failed to wait register mem region tasks ready.");
+			goto error;
+		}
+	} else {
+		for (i = 0; i < mem->nregions; i++) {
+			ret = mlx5_vdpa_register_mr(priv, i);
+			if (ret) {
+				DRV_LOG(ERR,
+				"Failed to register mem region %d.", i);
+				goto error;
+			}
+		}
+	}
+	ret = mlx5_vdpa_create_indirect_mkey(priv);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to create indirect mkey .");
+		goto error;
+	}
+	return 0;
 error:
-	rte_free(entry);
 	mlx5_vdpa_mem_dereg(priv);
 	rte_errno = -ret;
 	return ret;
 }
+
+int
+mlx5_vdpa_register_mr(struct mlx5_vdpa_priv *priv, uint32_t idx)
+{
+	struct rte_vhost_memory *mem = priv->vmem_info.vmem;
+	struct mlx5_vdpa_query_mr *mrs =
+		(struct mlx5_vdpa_query_mr *)priv->mrs;
+	struct mlx5_vdpa_query_mr *entry;
+	struct rte_vhost_mem_region *reg;
+	int ret;
+
+	reg = &mem->regions[idx];
+	entry = &mrs[idx];
+	entry->mr = mlx5_glue->reg_mr_iova
+				      (priv->cdev->pd,
+				       (void *)(uintptr_t)(reg->host_user_addr),
+				       reg->size, reg->guest_phys_addr,
+				       IBV_ACCESS_LOCAL_WRITE);
+	if (!entry->mr) {
+		DRV_LOG(ERR, "Failed to create direct Mkey.");
+		ret = -rte_errno;
+		return ret;
+	}
+	entry->is_indirect = 0;
+	return 0;
+}
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index ce3f524fdb..1f81fb8723 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -353,21 +353,21 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 		}
 	}
 	if (attr->q_type == MLX5_VIRTQ_TYPE_SPLIT) {
-		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
+		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem_info.vmem,
 					   (uint64_t)(uintptr_t)vq->desc);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get descriptor ring GPA.");
 			return -1;
 		}
 		attr->desc_addr = gpa;
-		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
+		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem_info.vmem,
 					   (uint64_t)(uintptr_t)vq->used);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get GPA for used ring.");
 			return -1;
 		}
 		attr->used_addr = gpa;
-		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
+		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem_info.vmem,
 					   (uint64_t)(uintptr_t)vq->avail);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get GPA for available ring.");
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v3 11/15] vdpa/mlx5: add virtq creation task for MT management
  2022-06-18  8:47 ` [PATCH v3 " Li Zhang
                     ` (9 preceding siblings ...)
  2022-06-18  8:48   ` [PATCH v3 10/15] vdpa/mlx5: add MT task for VM memory registration Li Zhang
@ 2022-06-18  8:48   ` Li Zhang
  2022-06-18  8:48   ` [PATCH v3 12/15] vdpa/mlx5: add virtq LM log task Li Zhang
                     ` (3 subsequent siblings)
  14 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-18  8:48 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs
  Cc: dev, thomas, rasland, roniba, maxime.coquelin

The virtq object and all its sub-resources use a lot of
FW commands and can be accelerated by the MT management.
Split the virtq creation between the configuration threads.
This accelerates the LM process and reduces its time by 20%.
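
The flow has two phases: the heavy FW object creation runs in parallel
on the configuration threads, and only afterwards does the caller map
the guest kick doorbells sequentially (the diff below does this by
passing reg_kick == false to the worker tasks). A minimal standalone
sketch of that two-phase shape, with invented helper names, four queues
and no error handling:

#include <pthread.h>
#include <stdio.h>

#define NQ 4

/* Hypothetical stand-ins for the real per-queue operations. */
static int setup_hw_queue(long i)    { printf("setup virtq %ld\n", i); return 0; }
static int register_doorbell(long i) { printf("doorbell virtq %ld\n", i); return 0; }

static void *
worker(void *arg)
{
	/* Phase 1: the expensive queue creation runs in parallel. */
	setup_hw_queue((long)arg);
	return NULL;
}

int
main(void)
{
	pthread_t tid[NQ];
	long i;

	for (i = 0; i < NQ; i++)
		pthread_create(&tid[i], NULL, worker, (void *)i);
	for (i = 0; i < NQ; i++)
		pthread_join(tid[i], NULL);
	/* Phase 2: only once every queue exists, the caller registers
	 * the guest kick doorbells one by one.
	 */
	for (i = 0; i < NQ; i++)
		register_doorbell(i);
	return 0;
}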

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.h         |   9 +-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c |  14 +++
 drivers/vdpa/mlx5/mlx5_vdpa_event.c   |   2 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 149 +++++++++++++++++++-------
 4 files changed, 134 insertions(+), 40 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 3316ce42be..35221f5ddc 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -80,6 +80,7 @@ enum {
 /* Vdpa task types. */
 enum mlx5_vdpa_task_type {
 	MLX5_VDPA_TASK_REG_MR = 1,
+	MLX5_VDPA_TASK_SETUP_VIRTQ,
 };
 
 /* Generic task information and size must be multiple of 4B. */
@@ -117,12 +118,12 @@ struct mlx5_vdpa_vmem_info {
 
 struct mlx5_vdpa_virtq {
 	SLIST_ENTRY(mlx5_vdpa_virtq) next;
-	uint8_t enable;
 	uint16_t index;
 	uint16_t vq_size;
 	uint8_t notifier_state;
-	bool stopped;
 	uint32_t configured:1;
+	uint32_t enable:1;
+	uint32_t stopped:1;
 	uint32_t version;
 	pthread_mutex_t virtq_lock;
 	struct mlx5_vdpa_priv *priv;
@@ -565,11 +566,13 @@ bool
 mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
 		uint32_t thrd_idx,
 		enum mlx5_vdpa_task_type task_type,
-		uint32_t *bulk_refcnt, uint32_t *bulk_err_cnt,
+		uint32_t *remaining_cnt, uint32_t *err_cnt,
 		void **task_data, uint32_t num);
 int
 mlx5_vdpa_register_mr(struct mlx5_vdpa_priv *priv, uint32_t idx);
 bool
 mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
 		uint32_t *err_cnt, uint32_t sleep_time);
+int
+mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index 10391931ae..1389d369ae 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -100,6 +100,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
 {
 	struct mlx5_vdpa_conf_thread_mng *multhrd = arg;
 	pthread_t thread_id = pthread_self();
+	struct mlx5_vdpa_virtq *virtq;
 	struct mlx5_vdpa_priv *priv;
 	struct mlx5_vdpa_task task;
 	struct rte_ring *rng;
@@ -139,6 +140,19 @@ mlx5_vdpa_c_thread_handle(void *arg)
 				__ATOMIC_RELAXED);
 			}
 			break;
+		case MLX5_VDPA_TASK_SETUP_VIRTQ:
+			virtq = &priv->virtqs[task.idx];
+			pthread_mutex_lock(&virtq->virtq_lock);
+			ret = mlx5_vdpa_virtq_setup(priv,
+				task.idx, false);
+			if (ret) {
+				DRV_LOG(ERR,
+					"Failed to setup virtq %d.", task.idx);
+				__atomic_fetch_add(
+					task.err_cnt, 1, __ATOMIC_RELAXED);
+			}
+			pthread_mutex_unlock(&virtq->virtq_lock);
+			break;
 		default:
 			DRV_LOG(ERR, "Invalid vdpa task type %d.",
 			task.type);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index b45fbac146..f782b6b832 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -371,7 +371,7 @@ mlx5_vdpa_err_interrupt_handler(void *cb_arg __rte_unused)
 			goto unlock;
 		if (rte_rdtsc() / rte_get_tsc_hz() < MLX5_VDPA_ERROR_TIME_SEC)
 			goto unlock;
-		virtq->stopped = true;
+		virtq->stopped = 1;
 		/* Query error info. */
 		if (mlx5_vdpa_virtq_query(priv, vq_index))
 			goto log;
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 1f81fb8723..50d59a8394 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -111,8 +111,9 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
+		if (virtq->index != i)
+			continue;
 		pthread_mutex_lock(&virtq->virtq_lock);
-		virtq->configured = 0;
 		for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
 			if (virtq->umems[j].obj) {
 				claim_zero(mlx5_glue->devx_umem_dereg
@@ -131,7 +132,6 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 	}
 }
 
-
 static int
 mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 {
@@ -191,7 +191,7 @@ mlx5_vdpa_virtq_stop(struct mlx5_vdpa_priv *priv, int index)
 	ret = mlx5_vdpa_virtq_modify(virtq, 0);
 	if (ret)
 		return -1;
-	virtq->stopped = true;
+	virtq->stopped = 1;
 	DRV_LOG(DEBUG, "vid %u virtq %u was stopped.", priv->vid, index);
 	return mlx5_vdpa_virtq_query(priv, index);
 }
@@ -411,7 +411,38 @@ mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv)
 }
 
 static int
-mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
+mlx5_vdpa_virtq_doorbell_setup(struct mlx5_vdpa_virtq *virtq,
+		struct rte_vhost_vring *vq, int index)
+{
+	virtq->intr_handle =
+		rte_intr_instance_alloc(RTE_INTR_INSTANCE_F_SHARED);
+	if (virtq->intr_handle == NULL) {
+		DRV_LOG(ERR, "Fail to allocate intr_handle");
+		return -1;
+	}
+	if (rte_intr_fd_set(virtq->intr_handle, vq->kickfd))
+		return -1;
+	if (rte_intr_fd_get(virtq->intr_handle) == -1) {
+		DRV_LOG(WARNING, "Virtq %d kickfd is invalid.", index);
+	} else {
+		if (rte_intr_type_set(virtq->intr_handle,
+			RTE_INTR_HANDLE_EXT))
+			return -1;
+		if (rte_intr_callback_register(virtq->intr_handle,
+			mlx5_vdpa_virtq_kick_handler, virtq)) {
+			(void)rte_intr_fd_set(virtq->intr_handle, -1);
+			DRV_LOG(ERR, "Failed to register virtq %d interrupt.",
+				index);
+			return -1;
+		}
+		DRV_LOG(DEBUG, "Register fd %d interrupt for virtq %d.",
+			rte_intr_fd_get(virtq->intr_handle), index);
+	}
+	return 0;
+}
+
+int
+mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick)
 {
 	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
 	struct rte_vhost_vring vq;
@@ -455,33 +486,11 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 	rte_write32(virtq->index, priv->virtq_db_addr);
 	rte_spinlock_unlock(&priv->db_lock);
 	/* Setup doorbell mapping. */
-	virtq->intr_handle =
-		rte_intr_instance_alloc(RTE_INTR_INSTANCE_F_SHARED);
-	if (virtq->intr_handle == NULL) {
-		DRV_LOG(ERR, "Fail to allocate intr_handle");
-		goto error;
-	}
-
-	if (rte_intr_fd_set(virtq->intr_handle, vq.kickfd))
-		goto error;
-
-	if (rte_intr_fd_get(virtq->intr_handle) == -1) {
-		DRV_LOG(WARNING, "Virtq %d kickfd is invalid.", index);
-	} else {
-		if (rte_intr_type_set(virtq->intr_handle, RTE_INTR_HANDLE_EXT))
-			goto error;
-
-		if (rte_intr_callback_register(virtq->intr_handle,
-					       mlx5_vdpa_virtq_kick_handler,
-					       virtq)) {
-			(void)rte_intr_fd_set(virtq->intr_handle, -1);
+	if (reg_kick) {
+		if (mlx5_vdpa_virtq_doorbell_setup(virtq, &vq, index)) {
 			DRV_LOG(ERR, "Failed to register virtq %d interrupt.",
 				index);
 			goto error;
-		} else {
-			DRV_LOG(DEBUG, "Register fd %d interrupt for virtq %d.",
-				rte_intr_fd_get(virtq->intr_handle),
-				index);
 		}
 	}
 	/* Subscribe virtq error event. */
@@ -497,7 +506,6 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 		rte_errno = errno;
 		goto error;
 	}
-	virtq->stopped = false;
 	/* Initial notification to ask Qemu handling completed buffers. */
 	if (virtq->eqp.cq.callfd != -1)
 		eventfd_write(virtq->eqp.cq.callfd, (eventfd_t)1);
@@ -567,10 +575,12 @@ mlx5_vdpa_features_validate(struct mlx5_vdpa_priv *priv)
 int
 mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 {
-	uint32_t i;
-	uint16_t nr_vring = rte_vhost_get_vring_num(priv->vid);
 	int ret = rte_vhost_get_negotiated_features(priv->vid, &priv->features);
+	uint16_t nr_vring = rte_vhost_get_vring_num(priv->vid);
+	uint32_t remaining_cnt = 0, err_cnt = 0, task_num = 0;
+	uint32_t i, thrd_idx, data[1];
 	struct mlx5_vdpa_virtq *virtq;
+	struct rte_vhost_vring vq;
 
 	if (ret || mlx5_vdpa_features_validate(priv)) {
 		DRV_LOG(ERR, "Failed to configure negotiated features.");
@@ -590,16 +600,83 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 		return -1;
 	}
 	priv->nr_virtqs = nr_vring;
-	for (i = 0; i < nr_vring; i++) {
-		virtq = &priv->virtqs[i];
-		if (virtq->enable) {
+	if (priv->use_c_thread) {
+		uint32_t main_task_idx[nr_vring];
+
+		for (i = 0; i < nr_vring; i++) {
+			virtq = &priv->virtqs[i];
+			if (!virtq->enable)
+				continue;
+			thrd_idx = i % (conf_thread_mng.max_thrds + 1);
+			if (!thrd_idx) {
+				main_task_idx[task_num] = i;
+				task_num++;
+				continue;
+			}
+			thrd_idx = priv->last_c_thrd_idx + 1;
+			if (thrd_idx >= conf_thread_mng.max_thrds)
+				thrd_idx = 0;
+			priv->last_c_thrd_idx = thrd_idx;
+			data[0] = i;
+			if (mlx5_vdpa_task_add(priv, thrd_idx,
+				MLX5_VDPA_TASK_SETUP_VIRTQ,
+				&remaining_cnt, &err_cnt,
+				(void **)&data, 1)) {
+				DRV_LOG(ERR, "Fail to add "
+						"task setup virtq (%d).", i);
+				main_task_idx[task_num] = i;
+				task_num++;
+			}
+		}
+		for (i = 0; i < task_num; i++) {
+			virtq = &priv->virtqs[main_task_idx[i]];
 			pthread_mutex_lock(&virtq->virtq_lock);
-			if (mlx5_vdpa_virtq_setup(priv, i)) {
+			if (mlx5_vdpa_virtq_setup(priv,
+				main_task_idx[i], false)) {
 				pthread_mutex_unlock(&virtq->virtq_lock);
 				goto error;
 			}
 			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
+		if (mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining_cnt,
+			&err_cnt, 2000)) {
+			DRV_LOG(ERR,
+			"Failed to wait virt-queue setup tasks ready.");
+			goto error;
+		}
+		for (i = 0; i < nr_vring; i++) {
+			/* Set up the doorbell mapping for Qemu. */
+			virtq = &priv->virtqs[i];
+			pthread_mutex_lock(&virtq->virtq_lock);
+			if (!virtq->enable || !virtq->configured) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				continue;
+			}
+			if (rte_vhost_get_vhost_vring(priv->vid, i, &vq)) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				goto error;
+			}
+			if (mlx5_vdpa_virtq_doorbell_setup(virtq, &vq, i)) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				DRV_LOG(ERR,
+				"Failed to register virtq %d interrupt.", i);
+				goto error;
+			}
+			pthread_mutex_unlock(&virtq->virtq_lock);
+		}
+	} else {
+		for (i = 0; i < nr_vring; i++) {
+			virtq = &priv->virtqs[i];
+			pthread_mutex_lock(&virtq->virtq_lock);
+			if (virtq->enable) {
+				if (mlx5_vdpa_virtq_setup(priv, i, true)) {
+					pthread_mutex_unlock(
+						&virtq->virtq_lock);
+					goto error;
+				}
+			}
+			pthread_mutex_unlock(&virtq->virtq_lock);
+		}
 	}
 	return 0;
 error:
@@ -663,7 +740,7 @@ mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable)
 		mlx5_vdpa_virtq_unset(virtq);
 	}
 	if (enable) {
-		ret = mlx5_vdpa_virtq_setup(priv, index);
+		ret = mlx5_vdpa_virtq_setup(priv, index, true);
 		if (ret) {
 			DRV_LOG(ERR, "Failed to setup virtq %d.", index);
 			return ret;
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v3 12/15] vdpa/mlx5: add virtq LM log task
  2022-06-18  8:47 ` [PATCH v3 " Li Zhang
                     ` (10 preceding siblings ...)
  2022-06-18  8:48   ` [PATCH v3 11/15] vdpa/mlx5: add virtq creation task for MT management Li Zhang
@ 2022-06-18  8:48   ` Li Zhang
  2022-06-18  8:48   ` [PATCH v3 13/15] vdpa/mlx5: add device close task Li Zhang
                     ` (2 subsequent siblings)
  14 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-18  8:48 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs
  Cc: dev, thomas, rasland, roniba, maxime.coquelin

Split the virtq LM logging between the configuration threads.
This accelerates the LM process and reduces its time by 20%.
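
For reference, the amount of used-ring memory each queue marks dirty is
computed by MLX5_VDPA_USED_RING_LEN(), which this patch moves into the
header. A standalone check of that arithmetic, assuming the split-ring
layout from the virtio spec (this is only an illustration of the macro,
not driver code):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Split-ring used element: id + len, 8 bytes. */
struct vring_used_elem { uint32_t id; uint32_t len; };

/* Same arithmetic as MLX5_VDPA_USED_RING_LEN(): all used elements plus
 * the ring's flags, idx and avail_event fields (three uint16_t).
 */
static uint64_t
used_ring_len(uint64_t size)
{
	return size * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3;
}

int
main(void)
{
	/* A 256-entry queue logs 256 * 8 + 6 = 2054 bytes per flush. */
	printf("%" PRIu64 " bytes\n", used_ring_len(256));
	return 0;
}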

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  3 +
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 34 +++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c      | 85 +++++++++++++++++++++------
 3 files changed, 105 insertions(+), 17 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 35221f5ddc..e08931719f 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -72,6 +72,8 @@ enum {
 	MLX5_VDPA_NOTIFIER_STATE_ERR
 };
 
+#define MLX5_VDPA_USED_RING_LEN(size) \
+	((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3)
 #define MLX5_VDPA_MAX_C_THRD 256
 #define MLX5_VDPA_MAX_TASKS_PER_THRD 4096
 #define MLX5_VDPA_TASKS_PER_DEV 64
@@ -81,6 +83,7 @@ enum {
 enum mlx5_vdpa_task_type {
 	MLX5_VDPA_TASK_REG_MR = 1,
 	MLX5_VDPA_TASK_SETUP_VIRTQ,
+	MLX5_VDPA_TASK_STOP_VIRTQ,
 };
 
 /* Generic task information and size must be multiple of 4B. */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index 1389d369ae..98369f0887 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -104,6 +104,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
 	struct mlx5_vdpa_priv *priv;
 	struct mlx5_vdpa_task task;
 	struct rte_ring *rng;
+	uint64_t features;
 	uint32_t thrd_idx;
 	uint32_t task_num;
 	int ret;
@@ -153,6 +154,39 @@ mlx5_vdpa_c_thread_handle(void *arg)
 			}
 			pthread_mutex_unlock(&virtq->virtq_lock);
 			break;
+		case MLX5_VDPA_TASK_STOP_VIRTQ:
+			virtq = &priv->virtqs[task.idx];
+			pthread_mutex_lock(&virtq->virtq_lock);
+			ret = mlx5_vdpa_virtq_stop(priv,
+					task.idx);
+			if (ret) {
+				DRV_LOG(ERR,
+				"Failed to stop virtq %d.",
+				task.idx);
+				__atomic_fetch_add(
+					task.err_cnt, 1,
+					__ATOMIC_RELAXED);
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				break;
+			}
+			ret = rte_vhost_get_negotiated_features(
+				priv->vid, &features);
+			if (ret) {
+				DRV_LOG(ERR,
+		"Failed to get negotiated features virtq %d.",
+				task.idx);
+				__atomic_fetch_add(
+					task.err_cnt, 1,
+					__ATOMIC_RELAXED);
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				break;
+			}
+			if (RTE_VHOST_NEED_LOG(features))
+				rte_vhost_log_used_vring(
+				priv->vid, task.idx, 0,
+			    MLX5_VDPA_USED_RING_LEN(virtq->vq_size));
+			pthread_mutex_unlock(&virtq->virtq_lock);
+			break;
 		default:
 			DRV_LOG(ERR, "Invalid vdpa task type %d.",
 			task.type);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
index ae495a35f3..016e2a097b 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
@@ -87,39 +87,90 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
 	return -1;
 }
 
-#define MLX5_VDPA_USED_RING_LEN(size) \
-	((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3)
-
 int
 mlx5_vdpa_lm_log(struct mlx5_vdpa_priv *priv)
 {
+	uint32_t remaining_cnt = 0, err_cnt = 0, task_num = 0;
+	uint32_t i, thrd_idx, data[1];
 	struct mlx5_vdpa_virtq *virtq;
 	uint64_t features;
-	int ret = rte_vhost_get_negotiated_features(priv->vid, &features);
-	int i;
+	int ret;
 
+	ret = rte_vhost_get_negotiated_features(priv->vid, &features);
 	if (ret) {
 		DRV_LOG(ERR, "Failed to get negotiated features.");
 		return -1;
 	}
-	if (!RTE_VHOST_NEED_LOG(features))
-		return 0;
-	for (i = 0; i < priv->nr_virtqs; ++i) {
-		virtq = &priv->virtqs[i];
-		if (!priv->virtqs[i].virtq) {
-			DRV_LOG(DEBUG, "virtq %d is invalid for LM log.", i);
-		} else {
+	if (priv->use_c_thread && priv->nr_virtqs) {
+		uint32_t main_task_idx[priv->nr_virtqs];
+
+		for (i = 0; i < priv->nr_virtqs; i++) {
+			virtq = &priv->virtqs[i];
+			if (!virtq->configured)
+				continue;
+			thrd_idx = i % (conf_thread_mng.max_thrds + 1);
+			if (!thrd_idx) {
+				main_task_idx[task_num] = i;
+				task_num++;
+				continue;
+			}
+			thrd_idx = priv->last_c_thrd_idx + 1;
+			if (thrd_idx >= conf_thread_mng.max_thrds)
+				thrd_idx = 0;
+			priv->last_c_thrd_idx = thrd_idx;
+			data[0] = i;
+			if (mlx5_vdpa_task_add(priv, thrd_idx,
+				MLX5_VDPA_TASK_STOP_VIRTQ,
+				&remaining_cnt, &err_cnt,
+				(void **)&data, 1)) {
+				DRV_LOG(ERR, "Fail to add "
+					"task stop virtq (%d).", i);
+				main_task_idx[task_num] = i;
+				task_num++;
+			}
+		}
+		for (i = 0; i < task_num; i++) {
+			virtq = &priv->virtqs[main_task_idx[i]];
 			pthread_mutex_lock(&virtq->virtq_lock);
-			ret = mlx5_vdpa_virtq_stop(priv, i);
+			ret = mlx5_vdpa_virtq_stop(priv,
+					main_task_idx[i]);
+			if (ret) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				DRV_LOG(ERR,
+				"Failed to stop virtq %d.", i);
+				return -1;
+			}
+			if (RTE_VHOST_NEED_LOG(features))
+				rte_vhost_log_used_vring(priv->vid, i, 0,
+				MLX5_VDPA_USED_RING_LEN(virtq->vq_size));
 			pthread_mutex_unlock(&virtq->virtq_lock);
+		}
+		if (mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining_cnt,
+			&err_cnt, 2000)) {
+			DRV_LOG(ERR,
+			"Failed to wait virt-queue setup tasks ready.");
+			return -1;
+		}
+	} else {
+		for (i = 0; i < priv->nr_virtqs; i++) {
+			virtq = &priv->virtqs[i];
+			pthread_mutex_lock(&virtq->virtq_lock);
+			if (!virtq->configured) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				continue;
+			}
+			ret = mlx5_vdpa_virtq_stop(priv, i);
 			if (ret) {
-				DRV_LOG(ERR, "Failed to stop virtq %d for LM "
-					"log.", i);
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				DRV_LOG(ERR,
+				"Failed to stop virtq %d for LM log.", i);
 				return -1;
 			}
+			if (RTE_VHOST_NEED_LOG(features))
+				rte_vhost_log_used_vring(priv->vid, i, 0,
+				MLX5_VDPA_USED_RING_LEN(virtq->vq_size));
+			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
-		rte_vhost_log_used_vring(priv->vid, i, 0,
-			      MLX5_VDPA_USED_RING_LEN(priv->virtqs[i].vq_size));
 	}
 	return 0;
 }
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v3 13/15] vdpa/mlx5: add device close task
  2022-06-18  8:47 ` [PATCH v3 " Li Zhang
                     ` (11 preceding siblings ...)
  2022-06-18  8:48   ` [PATCH v3 12/15] vdpa/mlx5: add virtq LM log task Li Zhang
@ 2022-06-18  8:48   ` Li Zhang
  2022-06-18  8:48   ` [PATCH v3 14/15] vdpa/mlx5: add virtq sub-resources creation Li Zhang
  2022-06-18  8:48   ` [PATCH v3 15/15] vdpa/mlx5: prepare virtqueue resource creation Li Zhang
  14 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-18  8:48 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs
  Cc: dev, thomas, rasland, roniba, maxime.coquelin

Split the device close work done after stopping
the virt-queues between the configuration threads.
This accelerates the LM process and
reduces its time by 50%.
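
The close itself becomes a "no-wait" task: the caller sets a progress
flag, enqueues the task and returns, while later entry points poll the
flag with a bounded timeout until the worker clears it. A small
standalone sketch of that wait loop (invented names, C11 atomics, not
the driver code):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Poll the progress flag the worker clears once the deferred close
 * finishes; give up after roughly 10 seconds.
 */
static bool
wait_close_done(atomic_uint *progress)
{
	unsigned int timeout = 0;

	while (atomic_load_explicit(progress, memory_order_relaxed) != 0 &&
	       timeout < 1000) {
		usleep(10000);	/* 10 ms per retry. */
		timeout++;
	}
	/* True only when the flag was cleared, i.e. the close completed. */
	return atomic_load_explicit(progress, memory_order_relaxed) == 0;
}

int
main(void)
{
	atomic_uint progress = 0;	/* Worker already finished. */

	printf("close done: %s\n", wait_close_done(&progress) ? "yes" : "no");
	return 0;
}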

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c         | 56 +++++++++++++++++++++++++--
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  8 ++++
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 20 +++++++++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 14 +++++++
 4 files changed, 94 insertions(+), 4 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index e3b32fa087..d000854c08 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -245,7 +245,7 @@ mlx5_vdpa_mtu_set(struct mlx5_vdpa_priv *priv)
 	return kern_mtu == vhost_mtu ? 0 : -1;
 }
 
-static void
+void
 mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv)
 {
 	/* Clean pre-created resource in dev removal only. */
@@ -254,6 +254,26 @@ mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv)
 	mlx5_vdpa_mem_dereg(priv);
 }
 
+static bool
+mlx5_vdpa_wait_dev_close_tasks_done(struct mlx5_vdpa_priv *priv)
+{
+	uint32_t timeout = 0;
+
+	/* Check and wait all close tasks done. */
+	while (__atomic_load_n(&priv->dev_close_progress,
+		__ATOMIC_RELAXED) != 0 && timeout < 1000) {
+		rte_delay_us_sleep(10000);
+		timeout++;
+	}
+	if (priv->dev_close_progress) {
+		DRV_LOG(ERR,
+		"Failed to wait close device tasks done vid %d.",
+		priv->vid);
+		return true;
+	}
+	return false;
+}
+
 static int
 mlx5_vdpa_dev_close(int vid)
 {
@@ -271,6 +291,27 @@ mlx5_vdpa_dev_close(int vid)
 		ret |= mlx5_vdpa_lm_log(priv);
 		priv->state = MLX5_VDPA_STATE_IN_PROGRESS;
 	}
+	if (priv->use_c_thread) {
+		if (priv->last_c_thrd_idx >=
+			(conf_thread_mng.max_thrds - 1))
+			priv->last_c_thrd_idx = 0;
+		else
+			priv->last_c_thrd_idx++;
+		__atomic_store_n(&priv->dev_close_progress,
+			1, __ATOMIC_RELAXED);
+		if (mlx5_vdpa_task_add(priv,
+			priv->last_c_thrd_idx,
+			MLX5_VDPA_TASK_DEV_CLOSE_NOWAIT,
+			NULL, NULL, NULL, 1)) {
+			DRV_LOG(ERR,
+			"Fail to add dev close task. ");
+			goto single_thrd;
+		}
+		priv->state = MLX5_VDPA_STATE_PROBED;
+		DRV_LOG(INFO, "vDPA device %d was closed.", vid);
+		return ret;
+	}
+single_thrd:
 	pthread_mutex_lock(&priv->steer_update_lock);
 	mlx5_vdpa_steer_unset(priv);
 	pthread_mutex_unlock(&priv->steer_update_lock);
@@ -278,10 +319,12 @@ mlx5_vdpa_dev_close(int vid)
 	mlx5_vdpa_drain_cq(priv);
 	if (priv->lm_mr.addr)
 		mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
-	priv->state = MLX5_VDPA_STATE_PROBED;
 	if (!priv->connected)
 		mlx5_vdpa_dev_cache_clean(priv);
 	priv->vid = 0;
+	__atomic_store_n(&priv->dev_close_progress, 0,
+		__ATOMIC_RELAXED);
+	priv->state = MLX5_VDPA_STATE_PROBED;
 	DRV_LOG(INFO, "vDPA device %d was closed.", vid);
 	return ret;
 }
@@ -302,6 +345,8 @@ mlx5_vdpa_dev_config(int vid)
 		DRV_LOG(ERR, "Failed to reconfigure vid %d.", vid);
 		return -1;
 	}
+	if (mlx5_vdpa_wait_dev_close_tasks_done(priv))
+		return -1;
 	priv->vid = vid;
 	priv->connected = true;
 	if (mlx5_vdpa_mtu_set(priv))
@@ -444,8 +489,11 @@ mlx5_vdpa_dev_cleanup(int vid)
 		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
 		return -1;
 	}
-	if (priv->state == MLX5_VDPA_STATE_PROBED)
+	if (priv->state == MLX5_VDPA_STATE_PROBED) {
+		if (priv->use_c_thread)
+			mlx5_vdpa_wait_dev_close_tasks_done(priv);
 		mlx5_vdpa_dev_cache_clean(priv);
+	}
 	priv->connected = false;
 	return 0;
 }
@@ -839,6 +887,8 @@ mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv)
 {
 	if (priv->state == MLX5_VDPA_STATE_CONFIGURED)
 		mlx5_vdpa_dev_close(priv->vid);
+	if (priv->use_c_thread)
+		mlx5_vdpa_wait_dev_close_tasks_done(priv);
 	mlx5_vdpa_release_dev_resources(priv);
 	if (priv->vdev)
 		rte_vdpa_unregister_device(priv->vdev);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index e08931719f..b6392b9d66 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -84,6 +84,7 @@ enum mlx5_vdpa_task_type {
 	MLX5_VDPA_TASK_REG_MR = 1,
 	MLX5_VDPA_TASK_SETUP_VIRTQ,
 	MLX5_VDPA_TASK_STOP_VIRTQ,
+	MLX5_VDPA_TASK_DEV_CLOSE_NOWAIT,
 };
 
 /* Generic task information and size must be multiple of 4B. */
@@ -206,6 +207,7 @@ struct mlx5_vdpa_priv {
 	uint64_t features; /* Negotiated features. */
 	uint16_t log_max_rqt_size;
 	uint16_t last_c_thrd_idx;
+	uint16_t dev_close_progress;
 	uint16_t num_mrs; /* Number of memory regions. */
 	struct mlx5_vdpa_steer steer;
 	struct mlx5dv_var *var;
@@ -578,4 +580,10 @@ mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
 		uint32_t *err_cnt, uint32_t sleep_time);
 int
 mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick);
+void
+mlx5_vdpa_vq_destroy(struct mlx5_vdpa_virtq *virtq);
+void
+mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv);
+void
+mlx5_vdpa_virtq_unreg_intr_handle_all(struct mlx5_vdpa_priv *priv);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index 98369f0887..bb2279440b 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -63,7 +63,8 @@ mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
 		task[i].type = task_type;
 		task[i].remaining_cnt = remaining_cnt;
 		task[i].err_cnt = err_cnt;
-		task[i].idx = data[i];
+		if (data)
+			task[i].idx = data[i];
 	}
 	if (!mlx5_vdpa_c_thrd_ring_enqueue_bulk(rng, (void **)&task, num, NULL))
 		return -1;
@@ -187,6 +188,23 @@ mlx5_vdpa_c_thread_handle(void *arg)
 			    MLX5_VDPA_USED_RING_LEN(virtq->vq_size));
 			pthread_mutex_unlock(&virtq->virtq_lock);
 			break;
+		case MLX5_VDPA_TASK_DEV_CLOSE_NOWAIT:
+			mlx5_vdpa_virtq_unreg_intr_handle_all(priv);
+			pthread_mutex_lock(&priv->steer_update_lock);
+			mlx5_vdpa_steer_unset(priv);
+			pthread_mutex_unlock(&priv->steer_update_lock);
+			mlx5_vdpa_virtqs_release(priv);
+			mlx5_vdpa_drain_cq(priv);
+			if (priv->lm_mr.addr)
+				mlx5_os_wrapped_mkey_destroy(
+					&priv->lm_mr);
+			if (!priv->connected)
+				mlx5_vdpa_dev_cache_clean(priv);
+			priv->vid = 0;
+			__atomic_store_n(
+				&priv->dev_close_progress, 0,
+				__ATOMIC_RELAXED);
+			break;
 		default:
 			DRV_LOG(ERR, "Invalid vdpa task type %d.",
 			task.type);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 50d59a8394..79d48a6569 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -102,6 +102,20 @@ mlx5_vdpa_virtq_unregister_intr_handle(struct mlx5_vdpa_virtq *virtq)
 	virtq->intr_handle = NULL;
 }
 
+void
+mlx5_vdpa_virtq_unreg_intr_handle_all(struct mlx5_vdpa_priv *priv)
+{
+	uint32_t i;
+	struct mlx5_vdpa_virtq *virtq;
+
+	for (i = 0; i < priv->nr_virtqs; i++) {
+		virtq = &priv->virtqs[i];
+		pthread_mutex_lock(&virtq->virtq_lock);
+		mlx5_vdpa_virtq_unregister_intr_handle(virtq);
+		pthread_mutex_unlock(&virtq->virtq_lock);
+	}
+}
+
 /* Release cached VQ resources. */
 void
 mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v3 14/15] vdpa/mlx5: add virtq sub-resources creation
  2022-06-18  8:47 ` [PATCH v3 " Li Zhang
                     ` (12 preceding siblings ...)
  2022-06-18  8:48   ` [PATCH v3 13/15] vdpa/mlx5: add device close task Li Zhang
@ 2022-06-18  8:48   ` Li Zhang
  2022-06-18  8:48   ` [PATCH v3 15/15] vdpa/mlx5: prepare virtqueue resource creation Li Zhang
  14 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-18  8:48 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs
  Cc: dev, thomas, rasland, roniba, maxime.coquelin, Yajun Wu

Pre-create the virt-queue sub-resources in the device probe stage
and then only modify the virtqueue in the device config stage.
The steering table also needs to support dummy virt-queues.
This accelerates the LM process and reduces its time by 40%.

Signed-off-by: Li Zhang <lizh@nvidia.com>
Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 72 +++++++--------------
 drivers/vdpa/mlx5/mlx5_vdpa.h       | 17 +++--
 drivers/vdpa/mlx5/mlx5_vdpa_event.c | 11 ++--
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c | 17 +++--
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 99 +++++++++++++++++++++--------
 5 files changed, 123 insertions(+), 93 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index d000854c08..f006a9cd3f 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -627,65 +627,39 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 static int
 mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 {
-	struct mlx5_vdpa_virtq *virtq;
+	uint32_t max_queues;
 	uint32_t index;
-	uint32_t i;
+	struct mlx5_vdpa_virtq *virtq;
 
-	for (index = 0; index < priv->caps.max_num_virtio_queues * 2;
+	for (index = 0; index < priv->caps.max_num_virtio_queues;
 		index++) {
 		virtq = &priv->virtqs[index];
 		pthread_mutex_init(&virtq->virtq_lock, NULL);
 	}
-	if (!priv->queues)
+	if (!priv->queues || !priv->queue_size)
 		return 0;
-	for (index = 0; index < (priv->queues * 2); ++index) {
+	max_queues = (priv->queues < priv->caps.max_num_virtio_queues) ?
+		(priv->queues * 2) : (priv->caps.max_num_virtio_queues);
+	for (index = 0; index < max_queues; ++index)
+		if (mlx5_vdpa_virtq_single_resource_prepare(priv,
+			index))
+			goto error;
+	if (mlx5_vdpa_is_modify_virtq_supported(priv))
+		if (mlx5_vdpa_steer_update(priv, true))
+			goto error;
+	return 0;
+error:
+	for (index = 0; index < max_queues; ++index) {
 		virtq = &priv->virtqs[index];
-		int ret = mlx5_vdpa_event_qp_prepare(priv, priv->queue_size,
-					-1, virtq);
-
-		if (ret) {
-			DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
-				index);
-			return -1;
-		}
-		if (priv->caps.queue_counters_valid) {
-			if (!virtq->counters)
-				virtq->counters =
-					mlx5_devx_cmd_create_virtio_q_counters
-						(priv->cdev->ctx);
-			if (!virtq->counters) {
-				DRV_LOG(ERR, "Failed to create virtq couners for virtq"
-					" %d.", index);
-				return -1;
-			}
-		}
-		for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
-			uint32_t size;
-			void *buf;
-			struct mlx5dv_devx_umem *obj;
-
-			size = priv->caps.umems[i].a * priv->queue_size +
-					priv->caps.umems[i].b;
-			buf = rte_zmalloc(__func__, size, 4096);
-			if (buf == NULL) {
-				DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
-						" %u.", i, index);
-				return -1;
-			}
-			obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx, buf,
-					size, IBV_ACCESS_LOCAL_WRITE);
-			if (obj == NULL) {
-				rte_free(buf);
-				DRV_LOG(ERR, "Failed to register umem %d for virtq %u.",
-						i, index);
-				return -1;
-			}
-			virtq->umems[i].size = size;
-			virtq->umems[i].buf = buf;
-			virtq->umems[i].obj = obj;
+		if (virtq->virtq) {
+			pthread_mutex_lock(&virtq->virtq_lock);
+			mlx5_vdpa_virtq_unset(virtq);
+			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
 	}
-	return 0;
+	if (mlx5_vdpa_is_modify_virtq_supported(priv))
+		mlx5_vdpa_steer_unset(priv);
+	return -1;
 }
 
 static int
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index b6392b9d66..f353db62ac 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -277,13 +277,15 @@ int mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv);
  *   The guest notification file descriptor.
  * @param[in/out] virtq
  *   Pointer to the virt-queue structure.
+ * @param[in] reset
+ *   If true, it will reset event qp.
  *
  * @return
  *   0 on success, -1 otherwise and rte_errno is set.
  */
 int
 mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
-	int callfd, struct mlx5_vdpa_virtq *virtq);
+	int callfd, struct mlx5_vdpa_virtq *virtq, bool reset);
 
 /**
  * Destroy an event QP and all its related resources.
@@ -403,11 +405,13 @@ void mlx5_vdpa_steer_unset(struct mlx5_vdpa_priv *priv);
  *
  * @param[in] priv
  *   The vdpa driver private structure.
+ * @param[in] is_dummy
+ *   If set, it is updated with dummy queue for prepare resource.
  *
  * @return
  *   0 on success, a negative value otherwise.
  */
-int mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv);
+int mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv, bool is_dummy);
 
 /**
  * Setup steering and all its related resources to enable RSS traffic from the
@@ -581,9 +585,14 @@ mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
 int
 mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick);
 void
-mlx5_vdpa_vq_destroy(struct mlx5_vdpa_virtq *virtq);
-void
 mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv);
 void
 mlx5_vdpa_virtq_unreg_intr_handle_all(struct mlx5_vdpa_priv *priv);
+bool
+mlx5_vdpa_virtq_single_resource_prepare(struct mlx5_vdpa_priv *priv,
+		int index);
+int
+mlx5_vdpa_qps2rst2rts(struct mlx5_vdpa_event_qp *eqp);
+void
+mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index f782b6b832..22f0920c88 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -249,7 +249,7 @@ mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv)
 {
 	unsigned int i;
 
-	for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
+	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		struct mlx5_vdpa_cq *cq = &priv->virtqs[i].eqp.cq;
 
 		mlx5_vdpa_queue_complete(cq);
@@ -618,7 +618,7 @@ mlx5_vdpa_qps2rts(struct mlx5_vdpa_event_qp *eqp)
 	return 0;
 }
 
-static int
+int
 mlx5_vdpa_qps2rst2rts(struct mlx5_vdpa_event_qp *eqp)
 {
 	if (mlx5_devx_cmd_modify_qp_state(eqp->fw_qp, MLX5_CMD_OP_QP_2RST,
@@ -638,7 +638,7 @@ mlx5_vdpa_qps2rst2rts(struct mlx5_vdpa_event_qp *eqp)
 
 int
 mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
-	int callfd, struct mlx5_vdpa_virtq *virtq)
+	int callfd, struct mlx5_vdpa_virtq *virtq, bool reset)
 {
 	struct mlx5_vdpa_event_qp *eqp = &virtq->eqp;
 	struct mlx5_devx_qp_attr attr = {0};
@@ -649,11 +649,10 @@ mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 		/* Reuse existing resources. */
 		eqp->cq.callfd = callfd;
 		/* FW will set event qp to error state in q destroy. */
-		if (!mlx5_vdpa_qps2rst2rts(eqp)) {
+		if (reset && !mlx5_vdpa_qps2rst2rts(eqp))
 			rte_write32(rte_cpu_to_be_32(RTE_BIT32(log_desc_n)),
 					&eqp->sw_qp.db_rec[0]);
-			return 0;
-		}
+		return 0;
 	}
 	if (eqp->fw_qp)
 		mlx5_vdpa_event_qp_destroy(eqp);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
index 4cbf09784e..c2e0a17ace 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
@@ -57,7 +57,7 @@ mlx5_vdpa_steer_unset(struct mlx5_vdpa_priv *priv)
  * -1 on error.
  */
 static int
-mlx5_vdpa_rqt_prepare(struct mlx5_vdpa_priv *priv)
+mlx5_vdpa_rqt_prepare(struct mlx5_vdpa_priv *priv, bool is_dummy)
 {
 	int i;
 	uint32_t rqt_n = RTE_MIN(MLX5_VDPA_DEFAULT_RQT_SIZE,
@@ -67,15 +67,20 @@ mlx5_vdpa_rqt_prepare(struct mlx5_vdpa_priv *priv)
 						      sizeof(uint32_t), 0);
 	uint32_t k = 0, j;
 	int ret = 0, num;
+	uint16_t nr_vring = is_dummy ?
+	(((priv->queues * 2) < priv->caps.max_num_virtio_queues) ?
+	(priv->queues * 2) : priv->caps.max_num_virtio_queues) : priv->nr_virtqs;
 
 	if (!attr) {
 		DRV_LOG(ERR, "Failed to allocate RQT attributes memory.");
 		rte_errno = ENOMEM;
 		return -ENOMEM;
 	}
-	for (i = 0; i < priv->nr_virtqs; i++) {
+	for (i = 0; i < nr_vring; i++) {
 		if (is_virtq_recvq(i, priv->nr_virtqs) &&
-		    priv->virtqs[i].enable && priv->virtqs[i].virtq) {
+			(is_dummy || (priv->virtqs[i].enable &&
+			priv->virtqs[i].configured)) &&
+			priv->virtqs[i].virtq) {
 			attr->rq_list[k] = priv->virtqs[i].virtq->id;
 			k++;
 		}
@@ -235,12 +240,12 @@ mlx5_vdpa_rss_flows_create(struct mlx5_vdpa_priv *priv)
 }
 
 int
-mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv)
+mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv, bool is_dummy)
 {
 	int ret;
 
 	pthread_mutex_lock(&priv->steer_update_lock);
-	ret = mlx5_vdpa_rqt_prepare(priv);
+	ret = mlx5_vdpa_rqt_prepare(priv, is_dummy);
 	if (ret == 0) {
 		mlx5_vdpa_steer_unset(priv);
 	} else if (ret < 0) {
@@ -261,7 +266,7 @@ mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv)
 int
 mlx5_vdpa_steer_setup(struct mlx5_vdpa_priv *priv)
 {
-	if (mlx5_vdpa_steer_update(priv))
+	if (mlx5_vdpa_steer_update(priv, false))
 		goto error;
 	return 0;
 error:
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 79d48a6569..58466b3c0b 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -146,10 +146,10 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 	}
 }
 
-static int
+void
 mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 {
-	int ret = -EAGAIN;
+	int ret;
 
 	mlx5_vdpa_virtq_unregister_intr_handle(virtq);
 	if (virtq->configured) {
@@ -157,12 +157,12 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 		if (ret)
 			DRV_LOG(WARNING, "Failed to stop virtq %d.",
 				virtq->index);
-		virtq->configured = 0;
 		claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
+		virtq->index = 0;
+		virtq->virtq = NULL;
+		virtq->configured = 0;
 	}
-	virtq->virtq = NULL;
 	virtq->notifier_state = MLX5_VDPA_NOTIFIER_STATE_DISABLED;
-	return 0;
 }
 
 void
@@ -175,6 +175,9 @@ mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv)
 		virtq = &priv->virtqs[i];
 		pthread_mutex_lock(&virtq->virtq_lock);
 		mlx5_vdpa_virtq_unset(virtq);
+		if (i < (priv->queues * 2))
+			mlx5_vdpa_virtq_single_resource_prepare(
+					priv, i);
 		pthread_mutex_unlock(&virtq->virtq_lock);
 	}
 	priv->features = 0;
@@ -258,7 +261,8 @@ mlx5_vdpa_hva_to_gpa(struct rte_vhost_memory *mem, uint64_t hva)
 static int
 mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 		struct mlx5_devx_virtq_attr *attr,
-		struct rte_vhost_vring *vq, int index)
+		struct rte_vhost_vring *vq,
+		int index, bool is_prepare)
 {
 	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
 	uint64_t gpa;
@@ -277,11 +281,15 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 			MLX5_VIRTQ_MODIFY_TYPE_Q_MKEY |
 			MLX5_VIRTQ_MODIFY_TYPE_QUEUE_FEATURE_BIT_MASK |
 			MLX5_VIRTQ_MODIFY_TYPE_EVENT_MODE;
-	attr->tso_ipv4 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO4));
-	attr->tso_ipv6 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO6));
-	attr->tx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_CSUM));
-	attr->rx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_GUEST_CSUM));
-	attr->virtio_version_1_0 =
+	attr->tso_ipv4 = is_prepare ? 1 :
+		!!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO4));
+	attr->tso_ipv6 = is_prepare ? 1 :
+		!!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO6));
+	attr->tx_csum = is_prepare ? 1 :
+		!!(priv->features & (1ULL << VIRTIO_NET_F_CSUM));
+	attr->rx_csum = is_prepare ? 1 :
+		!!(priv->features & (1ULL << VIRTIO_NET_F_GUEST_CSUM));
+	attr->virtio_version_1_0 = is_prepare ? 1 :
 		!!(priv->features & (1ULL << VIRTIO_F_VERSION_1));
 	attr->q_type =
 		(priv->features & (1ULL << VIRTIO_F_RING_PACKED)) ?
@@ -290,12 +298,12 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 	 * No need event QPs creation when the guest in poll mode or when the
 	 * capability allows it.
 	 */
-	attr->event_mode = vq->callfd != -1 ||
+	attr->event_mode = is_prepare || vq->callfd != -1 ||
 	!(priv->caps.event_mode & (1 << MLX5_VIRTQ_EVENT_MODE_NO_MSIX)) ?
 	MLX5_VIRTQ_EVENT_MODE_QP : MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
 	if (attr->event_mode == MLX5_VIRTQ_EVENT_MODE_QP) {
-		ret = mlx5_vdpa_event_qp_prepare(priv,
-				vq->size, vq->callfd, virtq);
+		ret = mlx5_vdpa_event_qp_prepare(priv, vq->size,
+				vq->callfd, virtq, !virtq->virtq);
 		if (ret) {
 			DRV_LOG(ERR,
 				"Failed to create event QPs for virtq %d.",
@@ -320,7 +328,7 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 		attr->counters_obj_id = virtq->counters->id;
 	}
 	/* Setup 3 UMEMs for each virtq. */
-	if (virtq->virtq) {
+	if (!virtq->virtq) {
 		for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
 			uint32_t size;
 			void *buf;
@@ -345,7 +353,7 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 			buf = rte_zmalloc(__func__,
 				size, 4096);
 			if (buf == NULL) {
-				DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
+				DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq."
 				" %u.", i, index);
 				return -1;
 			}
@@ -366,7 +374,7 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 			attr->umems[i].size = virtq->umems[i].size;
 		}
 	}
-	if (attr->q_type == MLX5_VIRTQ_TYPE_SPLIT) {
+	if (!is_prepare && attr->q_type == MLX5_VIRTQ_TYPE_SPLIT) {
 		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem_info.vmem,
 					   (uint64_t)(uintptr_t)vq->desc);
 		if (!gpa) {
@@ -389,21 +397,23 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 		}
 		attr->available_addr = gpa;
 	}
-	ret = rte_vhost_get_vring_base(priv->vid,
+	if (!is_prepare) {
+		ret = rte_vhost_get_vring_base(priv->vid,
 			index, &last_avail_idx, &last_used_idx);
-	if (ret) {
-		last_avail_idx = 0;
-		last_used_idx = 0;
-		DRV_LOG(WARNING, "Couldn't get vring base, idx are set to 0.");
-	} else {
-		DRV_LOG(INFO, "vid %d: Init last_avail_idx=%d, last_used_idx=%d for "
+		if (ret) {
+			last_avail_idx = 0;
+			last_used_idx = 0;
+			DRV_LOG(WARNING, "Couldn't get vring base, idx are set to 0.");
+		} else {
+			DRV_LOG(INFO, "vid %d: Init last_avail_idx=%d, last_used_idx=%d for "
 				"virtq %d.", priv->vid, last_avail_idx,
 				last_used_idx, index);
+		}
 	}
 	attr->hw_available_index = last_avail_idx;
 	attr->hw_used_index = last_used_idx;
 	attr->q_size = vq->size;
-	attr->mkey = priv->gpa_mkey_index;
+	attr->mkey = is_prepare ? 0 : priv->gpa_mkey_index;
 	attr->tis_id = priv->tiss[(index / 2) % priv->num_lag_ports]->id;
 	attr->queue_index = index;
 	attr->pd = priv->cdev->pdn;
@@ -416,6 +426,39 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 	return 0;
 }
 
+bool
+mlx5_vdpa_virtq_single_resource_prepare(struct mlx5_vdpa_priv *priv,
+		int index)
+{
+	struct mlx5_devx_virtq_attr attr = {0};
+	struct mlx5_vdpa_virtq *virtq;
+	struct rte_vhost_vring vq = {
+		.size = priv->queue_size,
+		.callfd = -1,
+	};
+	int ret;
+
+	virtq = &priv->virtqs[index];
+	virtq->index = index;
+	virtq->vq_size = vq.size;
+	virtq->configured = 0;
+	virtq->virtq = NULL;
+	ret = mlx5_vdpa_virtq_sub_objs_prepare(priv, &attr, &vq, index, true);
+	if (ret) {
+		DRV_LOG(ERR,
+		"Cannot prepare setup resource for virtq %d.", index);
+		return true;
+	}
+	if (mlx5_vdpa_is_modify_virtq_supported(priv)) {
+		virtq->virtq =
+		mlx5_devx_cmd_create_virtq(priv->cdev->ctx, &attr);
+		virtq->priv = priv;
+		if (!virtq->virtq)
+			return true;
+	}
+	return false;
+}
+
 bool
 mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv)
 {
@@ -473,7 +516,7 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick)
 	virtq->priv = priv;
 	virtq->stopped = 0;
 	ret = mlx5_vdpa_virtq_sub_objs_prepare(priv, &attr,
-				&vq, index);
+				&vq, index, false);
 	if (ret) {
 		DRV_LOG(ERR, "Failed to setup update virtq attr %d.",
 			index);
@@ -746,7 +789,7 @@ mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable)
 	if (virtq->configured) {
 		virtq->enable = 0;
 		if (is_virtq_recvq(virtq->index, priv->nr_virtqs)) {
-			ret = mlx5_vdpa_steer_update(priv);
+			ret = mlx5_vdpa_steer_update(priv, false);
 			if (ret)
 				DRV_LOG(WARNING, "Failed to disable steering "
 					"for virtq %d.", index);
@@ -761,7 +804,7 @@ mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable)
 		}
 		virtq->enable = 1;
 		if (is_virtq_recvq(virtq->index, priv->nr_virtqs)) {
-			ret = mlx5_vdpa_steer_update(priv);
+			ret = mlx5_vdpa_steer_update(priv, false);
 			if (ret)
 				DRV_LOG(WARNING, "Failed to enable steering "
 					"for virtq %d.", index);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v3 15/15] vdpa/mlx5: prepare virtqueue resource creation
  2022-06-18  8:47 ` [PATCH v3 " Li Zhang
                     ` (13 preceding siblings ...)
  2022-06-18  8:48   ` [PATCH v3 14/15] vdpa/mlx5: add virtq sub-resources creation Li Zhang
@ 2022-06-18  8:48   ` Li Zhang
  14 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-18  8:48 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs
  Cc: dev, thomas, rasland, roniba, maxime.coquelin

Split the creation of the virtqs' virt-queue resources between
the configuration threads.
The virt-queue resources also need to be pre-created again
after a virtq is destroyed.
This accelerates the LM process and reduces its time by 30%.
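
A minimal, self-contained sketch of the distribution scheme used below
(illustrative numbers only, not driver code): every (max_thrds + 1)-th
virtq index stays on the caller thread, the rest are handed to the
internal configuration threads in round-robin order.

#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	const uint32_t max_thrds = 3;	/* internal configuration threads */
	const uint32_t max_queues = 16;	/* virtq resources to prepare */
	uint32_t i, thrd = 0;

	for (i = 0; i < max_queues; i++) {
		if (i % (max_thrds + 1) == 0) {
			/* This share of the work stays on the caller. */
			printf("virtq %2u -> caller thread\n", i);
			continue;
		}
		/* Pick the next internal thread, wrapping around. */
		thrd = (thrd + 1) % max_thrds;
		printf("virtq %2u -> internal thread %u\n", i, thrd);
	}
	return 0;
}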

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 doc/guides/rel_notes/release_22_07.rst |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.c          | 115 +++++++++++++++++++------
 drivers/vdpa/mlx5/mlx5_vdpa.h          |  12 ++-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c  |  15 +++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c    | 111 ++++++++++++++++++++----
 5 files changed, 209 insertions(+), 45 deletions(-)

diff --git a/doc/guides/rel_notes/release_22_07.rst b/doc/guides/rel_notes/release_22_07.rst
index 2056cd9ee7..e1a9796e5c 100644
--- a/doc/guides/rel_notes/release_22_07.rst
+++ b/doc/guides/rel_notes/release_22_07.rst
@@ -178,6 +178,7 @@ New Features
 * **Updated Nvidia mlx5 vDPA driver.**
 
   * Added new devargs ``queue_size`` and ``queues`` to allow prior creation of virtq resources.
+  * Added new devarg ``max_conf_threads`` to define the number of management threads used to parallelize device configuration.
 
 
 Removed Items
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index f006a9cd3f..c5d82872c7 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -275,23 +275,18 @@ mlx5_vdpa_wait_dev_close_tasks_done(struct mlx5_vdpa_priv *priv)
 }
 
 static int
-mlx5_vdpa_dev_close(int vid)
+_internal_mlx5_vdpa_dev_close(struct mlx5_vdpa_priv *priv,
+		bool release_resource)
 {
-	struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid);
-	struct mlx5_vdpa_priv *priv =
-		mlx5_vdpa_find_priv_resource_by_vdev(vdev);
 	int ret = 0;
+	int vid = priv->vid;
 
-	if (priv == NULL) {
-		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
-		return -1;
-	}
 	mlx5_vdpa_cqe_event_unset(priv);
 	if (priv->state == MLX5_VDPA_STATE_CONFIGURED) {
 		ret |= mlx5_vdpa_lm_log(priv);
 		priv->state = MLX5_VDPA_STATE_IN_PROGRESS;
 	}
-	if (priv->use_c_thread) {
+	if (priv->use_c_thread && !release_resource) {
 		if (priv->last_c_thrd_idx >=
 			(conf_thread_mng.max_thrds - 1))
 			priv->last_c_thrd_idx = 0;
@@ -315,7 +310,7 @@ mlx5_vdpa_dev_close(int vid)
 	pthread_mutex_lock(&priv->steer_update_lock);
 	mlx5_vdpa_steer_unset(priv);
 	pthread_mutex_unlock(&priv->steer_update_lock);
-	mlx5_vdpa_virtqs_release(priv);
+	mlx5_vdpa_virtqs_release(priv, release_resource);
 	mlx5_vdpa_drain_cq(priv);
 	if (priv->lm_mr.addr)
 		mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
@@ -329,6 +324,24 @@ mlx5_vdpa_dev_close(int vid)
 	return ret;
 }
 
+static int
+mlx5_vdpa_dev_close(int vid)
+{
+	struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid);
+	struct mlx5_vdpa_priv *priv;
+
+	if (!vdev) {
+		DRV_LOG(ERR, "Invalid vDPA device.");
+		return -1;
+	}
+	priv = mlx5_vdpa_find_priv_resource_by_vdev(vdev);
+	if (priv == NULL) {
+		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
+		return -1;
+	}
+	return _internal_mlx5_vdpa_dev_close(priv, false);
+}
+
 static int
 mlx5_vdpa_dev_config(int vid)
 {
@@ -624,11 +637,33 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 		priv->queue_size);
 }
 
+void
+mlx5_vdpa_prepare_virtq_destroy(struct mlx5_vdpa_priv *priv)
+{
+	uint32_t max_queues, index;
+	struct mlx5_vdpa_virtq *virtq;
+
+	if (!priv->queues || !priv->queue_size)
+		return;
+	max_queues = ((priv->queues * 2) < priv->caps.max_num_virtio_queues) ?
+		(priv->queues * 2) : (priv->caps.max_num_virtio_queues);
+	if (mlx5_vdpa_is_modify_virtq_supported(priv))
+		mlx5_vdpa_steer_unset(priv);
+	for (index = 0; index < max_queues; ++index) {
+		virtq = &priv->virtqs[index];
+		if (virtq->virtq) {
+			pthread_mutex_lock(&virtq->virtq_lock);
+			mlx5_vdpa_virtq_unset(virtq);
+			pthread_mutex_unlock(&virtq->virtq_lock);
+		}
+	}
+}
+
 static int
 mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 {
-	uint32_t max_queues;
-	uint32_t index;
+	uint32_t remaining_cnt = 0, err_cnt = 0, task_num = 0;
+	uint32_t max_queues, index, thrd_idx, data[1];
 	struct mlx5_vdpa_virtq *virtq;
 
 	for (index = 0; index < priv->caps.max_num_virtio_queues;
@@ -640,25 +675,53 @@ mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 		return 0;
 	max_queues = (priv->queues < priv->caps.max_num_virtio_queues) ?
 		(priv->queues * 2) : (priv->caps.max_num_virtio_queues);
-	for (index = 0; index < max_queues; ++index)
-		if (mlx5_vdpa_virtq_single_resource_prepare(priv,
-			index))
+	if (priv->use_c_thread) {
+		uint32_t main_task_idx[max_queues];
+
+		for (index = 0; index < max_queues; ++index) {
+			thrd_idx = index % (conf_thread_mng.max_thrds + 1);
+			if (!thrd_idx) {
+				main_task_idx[task_num] = index;
+				task_num++;
+				continue;
+			}
+			thrd_idx = priv->last_c_thrd_idx + 1;
+			if (thrd_idx >= conf_thread_mng.max_thrds)
+				thrd_idx = 0;
+			priv->last_c_thrd_idx = thrd_idx;
+			data[0] = index;
+			if (mlx5_vdpa_task_add(priv, thrd_idx,
+				MLX5_VDPA_TASK_PREPARE_VIRTQ,
+				&remaining_cnt, &err_cnt,
+				(void **)&data, 1)) {
+				DRV_LOG(ERR, "Fail to add "
+				"task prepare virtq (%d).", index);
+				main_task_idx[task_num] = index;
+				task_num++;
+			}
+		}
+		for (index = 0; index < task_num; ++index)
+			if (mlx5_vdpa_virtq_single_resource_prepare(priv,
+				main_task_idx[index]))
+				goto error;
+		if (mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining_cnt,
+			&err_cnt, 2000)) {
+			DRV_LOG(ERR,
+			"Failed to wait virt-queue prepare tasks ready.");
 			goto error;
+		}
+	} else {
+		for (index = 0; index < max_queues; ++index)
+			if (mlx5_vdpa_virtq_single_resource_prepare(priv,
+				index))
+				goto error;
+	}
 	if (mlx5_vdpa_is_modify_virtq_supported(priv))
 		if (mlx5_vdpa_steer_update(priv, true))
 			goto error;
 	return 0;
 error:
-	for (index = 0; index < max_queues; ++index) {
-		virtq = &priv->virtqs[index];
-		if (virtq->virtq) {
-			pthread_mutex_lock(&virtq->virtq_lock);
-			mlx5_vdpa_virtq_unset(virtq);
-			pthread_mutex_unlock(&virtq->virtq_lock);
-		}
-	}
-	if (mlx5_vdpa_is_modify_virtq_supported(priv))
-		mlx5_vdpa_steer_unset(priv);
+	mlx5_vdpa_prepare_virtq_destroy(priv);
 	return -1;
 }
 
@@ -860,7 +923,7 @@ static void
 mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv)
 {
 	if (priv->state == MLX5_VDPA_STATE_CONFIGURED)
-		mlx5_vdpa_dev_close(priv->vid);
+		_internal_mlx5_vdpa_dev_close(priv, true);
 	if (priv->use_c_thread)
 		mlx5_vdpa_wait_dev_close_tasks_done(priv);
 	mlx5_vdpa_release_dev_resources(priv);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index f353db62ac..dc4dfba5ed 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -85,6 +85,7 @@ enum mlx5_vdpa_task_type {
 	MLX5_VDPA_TASK_SETUP_VIRTQ,
 	MLX5_VDPA_TASK_STOP_VIRTQ,
 	MLX5_VDPA_TASK_DEV_CLOSE_NOWAIT,
+	MLX5_VDPA_TASK_PREPARE_VIRTQ,
 };
 
 /* Generic task information and size must be multiple of 4B. */
@@ -128,6 +129,9 @@ struct mlx5_vdpa_virtq {
 	uint32_t configured:1;
 	uint32_t enable:1;
 	uint32_t stopped:1;
+	uint32_t rx_csum:1;
+	uint32_t virtio_version_1_0:1;
+	uint32_t event_mode:3;
 	uint32_t version;
 	pthread_mutex_t virtq_lock;
 	struct mlx5_vdpa_priv *priv;
@@ -355,8 +359,12 @@ void mlx5_vdpa_err_event_unset(struct mlx5_vdpa_priv *priv);
  *
  * @param[in] priv
  *   The vdpa driver private structure.
+ * @param[in] release_resource
+ *   The vdpa driver release resource without prepare resource.
  */
-void mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv);
+void
+mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv,
+		bool release_resource);
 
 /**
  * Cleanup cached resources of all virtqs.
@@ -595,4 +603,6 @@ int
 mlx5_vdpa_qps2rst2rts(struct mlx5_vdpa_event_qp *eqp);
 void
 mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq);
+void
+mlx5_vdpa_prepare_virtq_destroy(struct mlx5_vdpa_priv *priv);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index bb2279440b..6e6624e5a3 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -153,6 +153,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
 				__atomic_fetch_add(
 					task.err_cnt, 1, __ATOMIC_RELAXED);
 			}
+			virtq->enable = 1;
 			pthread_mutex_unlock(&virtq->virtq_lock);
 			break;
 		case MLX5_VDPA_TASK_STOP_VIRTQ:
@@ -193,7 +194,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
 			pthread_mutex_lock(&priv->steer_update_lock);
 			mlx5_vdpa_steer_unset(priv);
 			pthread_mutex_unlock(&priv->steer_update_lock);
-			mlx5_vdpa_virtqs_release(priv);
+			mlx5_vdpa_virtqs_release(priv, false);
 			mlx5_vdpa_drain_cq(priv);
 			if (priv->lm_mr.addr)
 				mlx5_os_wrapped_mkey_destroy(
@@ -205,6 +206,18 @@ mlx5_vdpa_c_thread_handle(void *arg)
 				&priv->dev_close_progress, 0,
 				__ATOMIC_RELAXED);
 			break;
+		case MLX5_VDPA_TASK_PREPARE_VIRTQ:
+			ret = mlx5_vdpa_virtq_single_resource_prepare(
+					priv, task.idx);
+			if (ret) {
+				DRV_LOG(ERR,
+				"Failed to prepare virtq %d.",
+				task.idx);
+				__atomic_fetch_add(
+				task.err_cnt, 1,
+				__ATOMIC_RELAXED);
+			}
+			break;
 		default:
 			DRV_LOG(ERR, "Invalid vdpa task type %d.",
 			task.type);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 58466b3c0b..06a5c26947 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -116,18 +116,29 @@ mlx5_vdpa_virtq_unreg_intr_handle_all(struct mlx5_vdpa_priv *priv)
 	}
 }
 
+static void
+mlx5_vdpa_vq_destroy(struct mlx5_vdpa_virtq *virtq)
+{
+	/* Clean pre-created resource in dev removal only */
+	claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
+	virtq->index = 0;
+	virtq->virtq = NULL;
+	virtq->configured = 0;
+}
+
 /* Release cached VQ resources. */
 void
 mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 {
 	unsigned int i, j;
 
+	mlx5_vdpa_steer_unset(priv);
 	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
-		if (virtq->index != i)
-			continue;
 		pthread_mutex_lock(&virtq->virtq_lock);
+		if (virtq->virtq)
+			mlx5_vdpa_vq_destroy(virtq);
 		for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
 			if (virtq->umems[j].obj) {
 				claim_zero(mlx5_glue->devx_umem_dereg
@@ -157,29 +168,37 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 		if (ret)
 			DRV_LOG(WARNING, "Failed to stop virtq %d.",
 				virtq->index);
-		claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
-		virtq->index = 0;
-		virtq->virtq = NULL;
-		virtq->configured = 0;
 	}
+	mlx5_vdpa_vq_destroy(virtq);
 	virtq->notifier_state = MLX5_VDPA_NOTIFIER_STATE_DISABLED;
 }
 
 void
-mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv)
+mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv,
+	bool release_resource)
 {
 	struct mlx5_vdpa_virtq *virtq;
-	int i;
-
-	for (i = 0; i < priv->nr_virtqs; i++) {
+	uint32_t i, max_virtq, valid_vq_num;
+
+	valid_vq_num = ((priv->queues * 2) < priv->caps.max_num_virtio_queues) ?
+		(priv->queues * 2) : priv->caps.max_num_virtio_queues;
+	max_virtq = (release_resource &&
+		(valid_vq_num) > priv->nr_virtqs) ?
+		(valid_vq_num) : priv->nr_virtqs;
+	for (i = 0; i < max_virtq; i++) {
 		virtq = &priv->virtqs[i];
 		pthread_mutex_lock(&virtq->virtq_lock);
 		mlx5_vdpa_virtq_unset(virtq);
-		if (i < (priv->queues * 2))
+		virtq->enable = 0;
+		if (!release_resource && i < valid_vq_num)
 			mlx5_vdpa_virtq_single_resource_prepare(
 					priv, i);
 		pthread_mutex_unlock(&virtq->virtq_lock);
 	}
+	if (!release_resource && priv->queues &&
+		mlx5_vdpa_is_modify_virtq_supported(priv))
+		if (mlx5_vdpa_steer_update(priv, true))
+			mlx5_vdpa_steer_unset(priv);
 	priv->features = 0;
 	priv->nr_virtqs = 0;
 }
@@ -455,6 +474,9 @@ mlx5_vdpa_virtq_single_resource_prepare(struct mlx5_vdpa_priv *priv,
 		virtq->priv = priv;
 		if (!virtq->virtq)
 			return true;
+		virtq->rx_csum = attr.rx_csum;
+		virtq->virtio_version_1_0 = attr.virtio_version_1_0;
+		virtq->event_mode = attr.event_mode;
 	}
 	return false;
 }
@@ -538,6 +560,9 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick)
 		goto error;
 	}
 	claim_zero(rte_vhost_enable_guest_notification(priv->vid, index, 1));
+	virtq->rx_csum = attr.rx_csum;
+	virtq->virtio_version_1_0 = attr.virtio_version_1_0;
+	virtq->event_mode = attr.event_mode;
 	virtq->configured = 1;
 	rte_spinlock_lock(&priv->db_lock);
 	rte_write32(virtq->index, priv->virtq_db_addr);
@@ -629,6 +654,31 @@ mlx5_vdpa_features_validate(struct mlx5_vdpa_priv *priv)
 	return 0;
 }
 
+static bool
+mlx5_vdpa_is_pre_created_vq_mismatch(struct mlx5_vdpa_priv *priv,
+		struct mlx5_vdpa_virtq *virtq)
+{
+	struct rte_vhost_vring vq;
+	uint32_t event_mode;
+
+	if (virtq->rx_csum !=
+		!!(priv->features & (1ULL << VIRTIO_NET_F_GUEST_CSUM)))
+		return true;
+	if (virtq->virtio_version_1_0 !=
+		!!(priv->features & (1ULL << VIRTIO_F_VERSION_1)))
+		return true;
+	if (rte_vhost_get_vhost_vring(priv->vid, virtq->index, &vq))
+		return true;
+	if (vq.size != virtq->vq_size)
+		return true;
+	event_mode = vq.callfd != -1 || !(priv->caps.event_mode &
+		(1 << MLX5_VIRTQ_EVENT_MODE_NO_MSIX)) ?
+		MLX5_VIRTQ_EVENT_MODE_QP : MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
+	if (virtq->event_mode != event_mode)
+		return true;
+	return false;
+}
+
 int
 mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 {
@@ -664,6 +714,15 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 			virtq = &priv->virtqs[i];
 			if (!virtq->enable)
 				continue;
+			if (priv->queues && virtq->virtq) {
+				if (mlx5_vdpa_is_pre_created_vq_mismatch(priv, virtq)) {
+					mlx5_vdpa_prepare_virtq_destroy(priv);
+					i = 0;
+					virtq = &priv->virtqs[i];
+					if (!virtq->enable)
+						continue;
+				}
+			}
 			thrd_idx = i % (conf_thread_mng.max_thrds + 1);
 			if (!thrd_idx) {
 				main_task_idx[task_num] = i;
@@ -693,6 +752,7 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 				pthread_mutex_unlock(&virtq->virtq_lock);
 				goto error;
 			}
+			virtq->enable = 1;
 			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
 		if (mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining_cnt,
@@ -724,20 +784,32 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 	} else {
 		for (i = 0; i < nr_vring; i++) {
 			virtq = &priv->virtqs[i];
+			if (!virtq->enable)
+				continue;
+			if (priv->queues && virtq->virtq) {
+				if (mlx5_vdpa_is_pre_created_vq_mismatch(priv,
+					virtq)) {
+					mlx5_vdpa_prepare_virtq_destroy(
+					priv);
+					i = 0;
+					virtq = &priv->virtqs[i];
+					if (!virtq->enable)
+						continue;
+				}
+			}
 			pthread_mutex_lock(&virtq->virtq_lock);
-			if (virtq->enable) {
-				if (mlx5_vdpa_virtq_setup(priv, i, true)) {
-					pthread_mutex_unlock(
+			if (mlx5_vdpa_virtq_setup(priv, i, true)) {
+				pthread_mutex_unlock(
 						&virtq->virtq_lock);
-					goto error;
-				}
+				goto error;
 			}
+			virtq->enable = 1;
 			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
 	}
 	return 0;
 error:
-	mlx5_vdpa_virtqs_release(priv);
+	mlx5_vdpa_virtqs_release(priv, true);
 	return -1;
 }
 
@@ -795,6 +867,11 @@ mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable)
 					"for virtq %d.", index);
 		}
 		mlx5_vdpa_virtq_unset(virtq);
+	} else {
+		if (virtq->virtq &&
+			mlx5_vdpa_is_pre_created_vq_mismatch(priv, virtq))
+			DRV_LOG(WARNING,
+			"Configuration mismatch dummy virtq %d.", index);
 	}
 	if (enable) {
 		ret = mlx5_vdpa_virtq_setup(priv, index, true);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v4 00/15] mlx5/vdpa: optimize live migration time
  2022-04-08  7:55 [RFC 00/15] Add vDPA multi-threads optiomization Li Zhang
                   ` (18 preceding siblings ...)
  2022-06-18  8:47 ` [PATCH v3 " Li Zhang
@ 2022-06-18  9:02 ` Li Zhang
  2022-06-18  9:02   ` [PATCH v4 01/15] vdpa/mlx5: fix usage of capability for max number of virtqs Li Zhang
                     ` (15 more replies)
  19 siblings, 16 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-18  9:02 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs
  Cc: dev, thomas, rasland, roniba, maxime.coquelin

Allow the driver to use internal threads to
speed up device configuration.
All the threads are opened on the same core as
the event completion queue scheduling thread.

Add a max_conf_threads parameter to configure
the maximum number of internal threads in addition to
the caller thread (8 is suggested).
These internal threads pipeline the handling of vDPA tasks
and are shared by all vDPA devices in the system.
The default is 0, meaning no internal threads are used for
configuration.
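
For illustration only (not part of this series), a minimal sketch of
how an application could attach the device with the new devargs,
assuming the usual mlx5 "class=vdpa" selector, a placeholder PCI
address, example values for queues/queue_size/max_conf_threads, and
that rte_eal_init() has already been called:

#include <rte_dev.h>

static int
attach_mlx5_vdpa(void)
{
	/*
	 * queues/queue_size pre-create virtq resources at probe time;
	 * max_conf_threads enables the internal configuration threads.
	 */
	return rte_eal_hotplug_add("pci", "0000:08:00.2",
		"class=vdpa,queues=8,queue_size=256,max_conf_threads=8");
}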

Depends-on: series=21868 ("vdpa/mlx5: improve device shutdown time")
http://patchwork.dpdk.org/project/dpdk/list/?series=21868

RFC ("Add vDPA multi-threads optiomization")
https://patchwork.dpdk.org/project/dpdk/cover/20220408075606.33056-1-lizh@nvidia.com/

V2:
* Drop eal device removal patch in series.
* Add release note in release_22_07.rst.

V3:
* Fix comments about commit log issue.
* Avoid cutting logs.

V4:
* Fix coding style issue

Li Zhang (12):
  vdpa/mlx5: fix usage of capability for max number of virtqs
  common/mlx5: extend virtq modifiable fields
  vdpa/mlx5: pre-create virtq at probe time
  vdpa/mlx5: optimize datapath-control synchronization
  vdpa/mlx5: add multi-thread management for configuration
  vdpa/mlx5: add task ring for MT management
  vdpa/mlx5: add MT task for VM memory registration
  vdpa/mlx5: add virtq creation task for MT management
  vdpa/mlx5: add virtq LM log task
  vdpa/mlx5: add device close task
  vdpa/mlx5: add virtq sub-resources creation
  vdpa/mlx5: prepare virtqueue resource creation

Yajun Wu (3):
  vdpa/mlx5: support pre create virtq resource
  common/mlx5: add DevX API to move QP to reset state
  vdpa/mlx5: support event qp reuse

 doc/guides/rel_notes/release_22_07.rst |   5 +
 doc/guides/vdpadevs/mlx5.rst           |  25 +
 drivers/common/mlx5/mlx5_devx_cmds.c   |  77 ++-
 drivers/common/mlx5/mlx5_devx_cmds.h   |   6 +-
 drivers/common/mlx5/mlx5_prm.h         |  30 +-
 drivers/vdpa/mlx5/meson.build          |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.c          | 270 ++++++++--
 drivers/vdpa/mlx5/mlx5_vdpa.h          | 152 +++++-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c  | 360 ++++++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_event.c    | 160 ++++--
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c       | 134 +++--
 drivers/vdpa/mlx5/mlx5_vdpa_mem.c      | 270 ++++++----
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c    |  22 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c    | 654 ++++++++++++++++++-------
 14 files changed, 1779 insertions(+), 387 deletions(-)
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c

-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v4 01/15] vdpa/mlx5: fix usage of capability for max number of virtqs
  2022-06-18  9:02 ` [PATCH v4 00/15] mlx5/vdpa: optimize live migration time Li Zhang
@ 2022-06-18  9:02   ` Li Zhang
  2022-06-18  9:02   ` [PATCH v4 02/15] vdpa/mlx5: support pre create virtq resource Li Zhang
                     ` (14 subsequent siblings)
  15 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-18  9:02 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs, Maxime Coquelin
  Cc: dev, thomas, rasland, roniba, stable

The driver wrongly interpreted the capability value as the number of
virtq pairs while it actually reports the number of virtqs.

Adjust all usages of this capability to treat it as the number of
virtqs.

Fixes: c2eb33a ("vdpa/mlx5: manage virtqs by array")
Cc: stable@dpdk.org

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 12 ++++++------
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c |  6 +++---
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 76fa5d4299..ee71339b78 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -84,7 +84,7 @@ mlx5_vdpa_get_queue_num(struct rte_vdpa_device *vdev, uint32_t *queue_num)
 		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
 		return -1;
 	}
-	*queue_num = priv->caps.max_num_virtio_queues;
+	*queue_num = priv->caps.max_num_virtio_queues / 2;
 	return 0;
 }
 
@@ -141,7 +141,7 @@ mlx5_vdpa_set_vring_state(int vid, int vring, int state)
 		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
 		return -EINVAL;
 	}
-	if (vring >= (int)priv->caps.max_num_virtio_queues * 2) {
+	if (vring >= (int)priv->caps.max_num_virtio_queues) {
 		DRV_LOG(ERR, "Too big vring id: %d.", vring);
 		return -E2BIG;
 	}
@@ -388,7 +388,7 @@ mlx5_vdpa_get_stats(struct rte_vdpa_device *vdev, int qid,
 		DRV_LOG(ERR, "Invalid device: %s.", vdev->device->name);
 		return -ENODEV;
 	}
-	if (qid >= (int)priv->caps.max_num_virtio_queues * 2) {
+	if (qid >= (int)priv->caps.max_num_virtio_queues) {
 		DRV_LOG(ERR, "Too big vring id: %d for device %s.", qid,
 				vdev->device->name);
 		return -E2BIG;
@@ -411,7 +411,7 @@ mlx5_vdpa_reset_stats(struct rte_vdpa_device *vdev, int qid)
 		DRV_LOG(ERR, "Invalid device: %s.", vdev->device->name);
 		return -ENODEV;
 	}
-	if (qid >= (int)priv->caps.max_num_virtio_queues * 2) {
+	if (qid >= (int)priv->caps.max_num_virtio_queues) {
 		DRV_LOG(ERR, "Too big vring id: %d for device %s.", qid,
 				vdev->device->name);
 		return -E2BIG;
@@ -624,7 +624,7 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 		DRV_LOG(DEBUG, "No capability to support virtq statistics.");
 	priv = rte_zmalloc("mlx5 vDPA device private", sizeof(*priv) +
 			   sizeof(struct mlx5_vdpa_virtq) *
-			   attr->vdpa.max_num_virtio_queues * 2,
+			   attr->vdpa.max_num_virtio_queues,
 			   RTE_CACHE_LINE_SIZE);
 	if (!priv) {
 		DRV_LOG(ERR, "Failed to allocate private memory.");
@@ -685,7 +685,7 @@ mlx5_vdpa_release_dev_resources(struct mlx5_vdpa_priv *priv)
 	uint32_t i;
 
 	mlx5_vdpa_dev_cache_clean(priv);
-	for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
+	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		if (!priv->virtqs[i].counters)
 			continue;
 		claim_zero(mlx5_devx_cmd_destroy(priv->virtqs[i].counters));
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index e025be47d2..c258eb3024 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -72,7 +72,7 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 {
 	unsigned int i, j;
 
-	for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
+	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
 		for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
@@ -492,9 +492,9 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 		DRV_LOG(INFO, "TSO is enabled without CSUM, force CSUM.");
 		priv->features |= (1ULL << VIRTIO_NET_F_CSUM);
 	}
-	if (nr_vring > priv->caps.max_num_virtio_queues * 2) {
+	if (nr_vring > priv->caps.max_num_virtio_queues) {
 		DRV_LOG(ERR, "Do not support more than %d virtqs(%d).",
-			(int)priv->caps.max_num_virtio_queues * 2,
+			(int)priv->caps.max_num_virtio_queues,
 			(int)nr_vring);
 		return -1;
 	}
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v4 02/15] vdpa/mlx5: support pre create virtq resource
  2022-06-18  9:02 ` [PATCH v4 00/15] mlx5/vdpa: optimize live migration time Li Zhang
  2022-06-18  9:02   ` [PATCH v4 01/15] vdpa/mlx5: fix usage of capability for max number of virtqs Li Zhang
@ 2022-06-18  9:02   ` Li Zhang
  2022-06-18  9:02   ` [PATCH v4 03/15] common/mlx5: add DevX API to move QP to reset state Li Zhang
                     ` (13 subsequent siblings)
  15 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-18  9:02 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs
  Cc: dev, thomas, rasland, roniba, maxime.coquelin, Yajun Wu

From: Yajun Wu <yajunw@nvidia.com>

The motivation of this change is to reduce vDPA device queue creation
time by creating some queue resources in the vDPA device probe stage.

In a VM live migration scenario, this saves 0.8 ms per queue creation
and thus reduces the LM network downtime.

To create the queue resources (umem/counter) in advance, the driver
needs to know the virtio queue depth and the maximum number of queues
the VM will use.

Introduce two new devargs: queues (max queue pair number) and
queue_size (queue depth). Both arguments must be provided; if only one
of them is given, it is ignored and no pre-creation is done.

The queues and queue_size values must also match the vhost
configuration the driver receives later. Otherwise the pre-created
resources are either wasted or missing, or they have to be destroyed
and recreated (in case of a queue_size mismatch).

Pre-created umems/counters are kept alive until vDPA device removal.
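
Why the depth must be known at probe time can be seen from the UMEM
sizing: each virtq uses three UMEM buffers whose sizes are a linear
function of the queue depth, with coefficients taken from the device
capabilities. A minimal, self-contained sketch follows; the capability
values are made up for illustration only.

#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	/* Hypothetical caps.umems[i].a / caps.umems[i].b values. */
	const uint32_t a[3] = {128, 64, 32};
	const uint32_t b[3] = {4096, 4096, 4096};
	const uint32_t queue_size = 256; /* the "queue_size" devarg */
	unsigned int i;

	for (i = 0; i < 3; i++)
		printf("umem[%u] size = %u bytes\n", i,
			a[i] * queue_size + b[i]);
	return 0;
}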

Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 doc/guides/vdpadevs/mlx5.rst  | 14 +++++++
 drivers/vdpa/mlx5/mlx5_vdpa.c | 75 ++++++++++++++++++++++++++++++++++-
 drivers/vdpa/mlx5/mlx5_vdpa.h |  2 +
 3 files changed, 89 insertions(+), 2 deletions(-)

diff --git a/doc/guides/vdpadevs/mlx5.rst b/doc/guides/vdpadevs/mlx5.rst
index 3ded142311..0ad77bf535 100644
--- a/doc/guides/vdpadevs/mlx5.rst
+++ b/doc/guides/vdpadevs/mlx5.rst
@@ -101,6 +101,20 @@ for an additional list of options shared with other mlx5 drivers.
 
   - 0, HW default.
 
+- ``queue_size`` parameter [int]
+
+  - 1 - 1024, Virtio Queue depth for pre-creating queue resource to speed up
+    first time queue creation. Set it together with queues devarg.
+
+  - 0, default value, no pre-create virtq resource.
+
+- ``queues`` parameter [int]
+
+  - 1 - 128, Max number of virtio queue pairs (including 1 rx queue and 1 tx queue)
+    for pre-create queue resource to speed up first time queue creation. Set it
+    together with queue_size devarg.
+
+  - 0, default value, no pre-create virtq resource.
 
 Error handling
 ^^^^^^^^^^^^^^
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index ee71339b78..faf833ee2f 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -244,7 +244,9 @@ mlx5_vdpa_mtu_set(struct mlx5_vdpa_priv *priv)
 static void
 mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv)
 {
-	mlx5_vdpa_virtqs_cleanup(priv);
+	/* Clean pre-created resource in dev removal only. */
+	if (!priv->queues)
+		mlx5_vdpa_virtqs_cleanup(priv);
 	mlx5_vdpa_mem_dereg(priv);
 }
 
@@ -494,6 +496,12 @@ mlx5_vdpa_args_check_handler(const char *key, const char *val, void *opaque)
 		priv->hw_max_latency_us = (uint32_t)tmp;
 	} else if (strcmp(key, "hw_max_pending_comp") == 0) {
 		priv->hw_max_pending_comp = (uint32_t)tmp;
+	} else if (strcmp(key, "queue_size") == 0) {
+		priv->queue_size = (uint16_t)tmp;
+	} else if (strcmp(key, "queues") == 0) {
+		priv->queues = (uint16_t)tmp;
+	} else {
+		DRV_LOG(WARNING, "Invalid key %s.", key);
 	}
 	return 0;
 }
@@ -524,9 +532,68 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 	if (!priv->event_us &&
 	    priv->event_mode == MLX5_VDPA_EVENT_MODE_DYNAMIC_TIMER)
 		priv->event_us = MLX5_VDPA_DEFAULT_TIMER_STEP_US;
+	if ((priv->queue_size && !priv->queues) ||
+		(!priv->queue_size && priv->queues)) {
+		priv->queue_size = 0;
+		priv->queues = 0;
+		DRV_LOG(WARNING, "Please provide both queue_size and queues.");
+	}
 	DRV_LOG(DEBUG, "event mode is %d.", priv->event_mode);
 	DRV_LOG(DEBUG, "event_us is %u us.", priv->event_us);
 	DRV_LOG(DEBUG, "no traffic max is %u.", priv->no_traffic_max);
+	DRV_LOG(DEBUG, "queues is %u, queue_size is %u.", priv->queues,
+		priv->queue_size);
+}
+
+static int
+mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
+{
+	uint32_t index;
+	uint32_t i;
+
+	if (!priv->queues)
+		return 0;
+	for (index = 0; index < (priv->queues * 2); ++index) {
+		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
+
+		if (priv->caps.queue_counters_valid) {
+			if (!virtq->counters)
+				virtq->counters =
+					mlx5_devx_cmd_create_virtio_q_counters
+						(priv->cdev->ctx);
+			if (!virtq->counters) {
+				DRV_LOG(ERR, "Failed to create virtq couners for virtq"
+					" %d.", index);
+				return -1;
+			}
+		}
+		for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
+			uint32_t size;
+			void *buf;
+			struct mlx5dv_devx_umem *obj;
+
+			size = priv->caps.umems[i].a * priv->queue_size +
+					priv->caps.umems[i].b;
+			buf = rte_zmalloc(__func__, size, 4096);
+			if (buf == NULL) {
+				DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
+						" %u.", i, index);
+				return -1;
+			}
+			obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx, buf,
+					size, IBV_ACCESS_LOCAL_WRITE);
+			if (obj == NULL) {
+				rte_free(buf);
+				DRV_LOG(ERR, "Failed to register umem %d for virtq %u.",
+						i, index);
+				return -1;
+			}
+			virtq->umems[i].size = size;
+			virtq->umems[i].buf = buf;
+			virtq->umems[i].obj = obj;
+		}
+	}
+	return 0;
 }
 
 static int
@@ -604,6 +671,8 @@ mlx5_vdpa_create_dev_resources(struct mlx5_vdpa_priv *priv)
 		return -rte_errno;
 	if (mlx5_vdpa_event_qp_global_prepare(priv))
 		return -rte_errno;
+	if (mlx5_vdpa_virtq_resource_prepare(priv))
+		return -rte_errno;
 	return 0;
 }
 
@@ -638,6 +707,7 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 		priv->num_lag_ports = 1;
 	pthread_mutex_init(&priv->vq_config_lock, NULL);
 	priv->cdev = cdev;
+	mlx5_vdpa_config_get(mkvlist, priv);
 	if (mlx5_vdpa_create_dev_resources(priv))
 		goto error;
 	priv->vdev = rte_vdpa_register_device(cdev->dev, &mlx5_vdpa_ops);
@@ -646,7 +716,6 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 		rte_errno = rte_errno ? rte_errno : EINVAL;
 		goto error;
 	}
-	mlx5_vdpa_config_get(mkvlist, priv);
 	SLIST_INIT(&priv->mr_list);
 	pthread_mutex_lock(&priv_list_lock);
 	TAILQ_INSERT_TAIL(&priv_list, priv, next);
@@ -684,6 +753,8 @@ mlx5_vdpa_release_dev_resources(struct mlx5_vdpa_priv *priv)
 {
 	uint32_t i;
 
+	if (priv->queues)
+		mlx5_vdpa_virtqs_cleanup(priv);
 	mlx5_vdpa_dev_cache_clean(priv);
 	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		if (!priv->virtqs[i].counters)
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index e7f3319f89..f6719a3c60 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -135,6 +135,8 @@ struct mlx5_vdpa_priv {
 	uint8_t hw_latency_mode; /* Hardware CQ moderation mode. */
 	uint16_t hw_max_latency_us; /* Hardware CQ moderation period in usec. */
 	uint16_t hw_max_pending_comp; /* Hardware CQ moderation counter. */
+	uint16_t queue_size; /* virtq depth for pre-creating virtq resource */
+	uint16_t queues; /* Max virtq pair for pre-creating virtq resource */
 	struct rte_vdpa_device *vdev; /* vDPA device. */
 	struct mlx5_common_device *cdev; /* Backend mlx5 device. */
 	int vid; /* vhost device id. */
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v4 03/15] common/mlx5: add DevX API to move QP to reset state
  2022-06-18  9:02 ` [PATCH v4 00/15] mlx5/vdpa: optimize live migration time Li Zhang
  2022-06-18  9:02   ` [PATCH v4 01/15] vdpa/mlx5: fix usage of capability for max number of virtqs Li Zhang
  2022-06-18  9:02   ` [PATCH v4 02/15] vdpa/mlx5: support pre create virtq resource Li Zhang
@ 2022-06-18  9:02   ` Li Zhang
  2022-06-18  9:02   ` [PATCH v4 04/15] vdpa/mlx5: support event qp reuse Li Zhang
                     ` (12 subsequent siblings)
  15 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-18  9:02 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs
  Cc: dev, thomas, rasland, roniba, maxime.coquelin, Yajun Wu

From: Yajun Wu <yajunw@nvidia.com>

Support setting the QP to the RESET state.

Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c |  7 +++++++
 drivers/common/mlx5/mlx5_prm.h       | 17 +++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index c6bdbc12bb..1d6d6578d6 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -2264,11 +2264,13 @@ mlx5_devx_cmd_modify_qp_state(struct mlx5_devx_obj *qp, uint32_t qp_st_mod_op,
 		uint32_t rst2init[MLX5_ST_SZ_DW(rst2init_qp_in)];
 		uint32_t init2rtr[MLX5_ST_SZ_DW(init2rtr_qp_in)];
 		uint32_t rtr2rts[MLX5_ST_SZ_DW(rtr2rts_qp_in)];
+		uint32_t qp2rst[MLX5_ST_SZ_DW(2rst_qp_in)];
 	} in;
 	union {
 		uint32_t rst2init[MLX5_ST_SZ_DW(rst2init_qp_out)];
 		uint32_t init2rtr[MLX5_ST_SZ_DW(init2rtr_qp_out)];
 		uint32_t rtr2rts[MLX5_ST_SZ_DW(rtr2rts_qp_out)];
+		uint32_t qp2rst[MLX5_ST_SZ_DW(2rst_qp_out)];
 	} out;
 	void *qpc;
 	int ret;
@@ -2311,6 +2313,11 @@ mlx5_devx_cmd_modify_qp_state(struct mlx5_devx_obj *qp, uint32_t qp_st_mod_op,
 		inlen = sizeof(in.rtr2rts);
 		outlen = sizeof(out.rtr2rts);
 		break;
+	case MLX5_CMD_OP_QP_2RST:
+		MLX5_SET(2rst_qp_in, &in, qpn, qp->id);
+		inlen = sizeof(in.qp2rst);
+		outlen = sizeof(out.qp2rst);
+		break;
 	default:
 		DRV_LOG(ERR, "Invalid or unsupported QP modify op %u.",
 			qp_st_mod_op);
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index bc3e70a1d1..8a2f55c33e 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -3657,6 +3657,23 @@ struct mlx5_ifc_init2init_qp_in_bits {
 	u8 reserved_at_800[0x80];
 };
 
+struct mlx5_ifc_2rst_qp_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_2rst_qp_in_bits {
+	u8 opcode[0x10];
+	u8 uid[0x10];
+	u8 vhca_tunnel_id[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_80[0x8];
+	u8 qpn[0x18];
+	u8 reserved_at_a0[0x20];
+};
+
 struct mlx5_ifc_dealloc_pd_out_bits {
 	u8 status[0x8];
 	u8 reserved_0[0x18];
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v4 04/15] vdpa/mlx5: support event qp reuse
  2022-06-18  9:02 ` [PATCH v4 00/15] mlx5/vdpa: optimize live migration time Li Zhang
                     ` (2 preceding siblings ...)
  2022-06-18  9:02   ` [PATCH v4 03/15] common/mlx5: add DevX API to move QP to reset state Li Zhang
@ 2022-06-18  9:02   ` Li Zhang
  2022-06-20  8:27     ` Maxime Coquelin
  2022-06-18  9:02   ` [PATCH v4 05/15] common/mlx5: extend virtq modifiable fields Li Zhang
                     ` (11 subsequent siblings)
  15 siblings, 1 reply; 137+ messages in thread
From: Li Zhang @ 2022-06-18  9:02 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs
  Cc: dev, thomas, rasland, roniba, maxime.coquelin, Yajun Wu

From: Yajun Wu <yajunw@nvidia.com>

To speed up queue creation time, the event QP and CQ are created only
once. Each virtq creation reuses the same event QP and CQ.

Because FW sets the event QP to the error state during virtq destroy,
the event QP needs to be modified to the RESET state and then moved to
the RTS state as usual. This saves about 1.5 ms for each virtq
creation.

After the SW QP reset, the QP pi/ci both become 0 while the CQ pi/ci
keep their previous values. Add a new variable qp_pi to save the SW QP
pi and move the QP pi independently of the CQ ci.

Add a new function mlx5_vdpa_drain_cq to drain the CQ CQEs after virtq
release.
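
To make the index handling above concrete, here is a tiny
self-contained illustration (plain counters only, no hardware): once
the event QP is reused, its producer index restarts from zero while
the CQ consumer index keeps its old value, so the two must be tracked
separately.

#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	uint16_t cq_ci = 0;	/* CQ consumer index, CQ survives reuse */
	uint16_t qp_pi = 0;	/* SW QP producer index */

	/* First virtq lifetime: e.g. 100 completions processed. */
	cq_ci += 100;
	qp_pi += 100;
	/* Virtq destroyed, event QP moved RST->RTS for reuse. */
	qp_pi = 0;
	printf("after reuse: cq_ci=%u qp_pi=%u\n",
		(unsigned int)cq_ci, (unsigned int)qp_pi);
	return 0;
}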

Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c       |  8 ++++
 drivers/vdpa/mlx5/mlx5_vdpa.h       | 12 +++++-
 drivers/vdpa/mlx5/mlx5_vdpa_event.c | 60 +++++++++++++++++++++++++++--
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c |  6 +--
 4 files changed, 78 insertions(+), 8 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index faf833ee2f..ee99952e11 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -269,6 +269,7 @@ mlx5_vdpa_dev_close(int vid)
 	}
 	mlx5_vdpa_steer_unset(priv);
 	mlx5_vdpa_virtqs_release(priv);
+	mlx5_vdpa_drain_cq(priv);
 	if (priv->lm_mr.addr)
 		mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
 	priv->state = MLX5_VDPA_STATE_PROBED;
@@ -555,7 +556,14 @@ mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 		return 0;
 	for (index = 0; index < (priv->queues * 2); ++index) {
 		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
+		int ret = mlx5_vdpa_event_qp_prepare(priv, priv->queue_size,
+					-1, &virtq->eqp);
 
+		if (ret) {
+			DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
+				index);
+			return -1;
+		}
 		if (priv->caps.queue_counters_valid) {
 			if (!virtq->counters)
 				virtq->counters =
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index f6719a3c60..bf82026e37 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -55,6 +55,7 @@ struct mlx5_vdpa_event_qp {
 	struct mlx5_vdpa_cq cq;
 	struct mlx5_devx_obj *fw_qp;
 	struct mlx5_devx_qp sw_qp;
+	uint16_t qp_pi;
 };
 
 struct mlx5_vdpa_query_mr {
@@ -226,7 +227,7 @@ int mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv);
  * @return
  *   0 on success, -1 otherwise and rte_errno is set.
  */
-int mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
+int mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 			      int callfd, struct mlx5_vdpa_event_qp *eqp);
 
 /**
@@ -479,4 +480,13 @@ mlx5_vdpa_virtq_stats_get(struct mlx5_vdpa_priv *priv, int qid,
  */
 int
 mlx5_vdpa_virtq_stats_reset(struct mlx5_vdpa_priv *priv, int qid);
+
+/**
+ * Drain virtq CQ CQE.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ */
+void
+mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index 7167a98db0..b43dca9255 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -137,7 +137,7 @@ mlx5_vdpa_cq_poll(struct mlx5_vdpa_cq *cq)
 		};
 		uint32_t word;
 	} last_word;
-	uint16_t next_wqe_counter = cq->cq_ci;
+	uint16_t next_wqe_counter = eqp->qp_pi;
 	uint16_t cur_wqe_counter;
 	uint16_t comp;
 
@@ -156,9 +156,10 @@ mlx5_vdpa_cq_poll(struct mlx5_vdpa_cq *cq)
 		rte_io_wmb();
 		/* Ring CQ doorbell record. */
 		cq->cq_obj.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci);
+		eqp->qp_pi += comp;
 		rte_io_wmb();
 		/* Ring SW QP doorbell record. */
-		eqp->sw_qp.db_rec[0] = rte_cpu_to_be_32(cq->cq_ci + cq_size);
+		eqp->sw_qp.db_rec[0] = rte_cpu_to_be_32(eqp->qp_pi + cq_size);
 	}
 	return comp;
 }
@@ -232,6 +233,25 @@ mlx5_vdpa_queues_complete(struct mlx5_vdpa_priv *priv)
 	return max;
 }
 
+void
+mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv)
+{
+	unsigned int i;
+
+	for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
+		struct mlx5_vdpa_cq *cq = &priv->virtqs[i].eqp.cq;
+
+		mlx5_vdpa_queue_complete(cq);
+		if (cq->cq_obj.cq) {
+			cq->cq_obj.cqes[0].wqe_counter =
+				rte_cpu_to_be_16(UINT16_MAX);
+			priv->virtqs[i].eqp.qp_pi = 0;
+			if (!cq->armed)
+				mlx5_vdpa_cq_arm(priv, cq);
+		}
+	}
+}
+
 /* Wait on all CQs channel for completion event. */
 static struct mlx5_vdpa_cq *
 mlx5_vdpa_event_wait(struct mlx5_vdpa_priv *priv __rte_unused)
@@ -574,14 +594,44 @@ mlx5_vdpa_qps2rts(struct mlx5_vdpa_event_qp *eqp)
 	return 0;
 }
 
+static int
+mlx5_vdpa_qps2rst2rts(struct mlx5_vdpa_event_qp *eqp)
+{
+	if (mlx5_devx_cmd_modify_qp_state(eqp->fw_qp, MLX5_CMD_OP_QP_2RST,
+					  eqp->sw_qp.qp->id)) {
+		DRV_LOG(ERR, "Failed to modify FW QP to RST state(%u).",
+			rte_errno);
+		return -1;
+	}
+	if (mlx5_devx_cmd_modify_qp_state(eqp->sw_qp.qp,
+			MLX5_CMD_OP_QP_2RST, eqp->fw_qp->id)) {
+		DRV_LOG(ERR, "Failed to modify SW QP to RST state(%u).",
+			rte_errno);
+		return -1;
+	}
+	return mlx5_vdpa_qps2rts(eqp);
+}
+
 int
-mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
+mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 			  int callfd, struct mlx5_vdpa_event_qp *eqp)
 {
 	struct mlx5_devx_qp_attr attr = {0};
 	uint16_t log_desc_n = rte_log2_u32(desc_n);
 	uint32_t ret;
 
+	if (eqp->cq.cq_obj.cq != NULL && log_desc_n == eqp->cq.log_desc_n) {
+		/* Reuse existing resources. */
+		eqp->cq.callfd = callfd;
+		/* FW will set event qp to error state in q destroy. */
+		if (!mlx5_vdpa_qps2rst2rts(eqp)) {
+			rte_write32(rte_cpu_to_be_32(RTE_BIT32(log_desc_n)),
+					&eqp->sw_qp.db_rec[0]);
+			return 0;
+		}
+	}
+	if (eqp->fw_qp)
+		mlx5_vdpa_event_qp_destroy(eqp);
 	if (mlx5_vdpa_cq_create(priv, log_desc_n, callfd, &eqp->cq))
 		return -1;
 	attr.pd = priv->cdev->pdn;
@@ -608,8 +658,10 @@ mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 	}
 	if (mlx5_vdpa_qps2rts(eqp))
 		goto error;
+	eqp->qp_pi = 0;
 	/* First ringing. */
-	rte_write32(rte_cpu_to_be_32(RTE_BIT32(log_desc_n)),
+	if (eqp->sw_qp.db_rec)
+		rte_write32(rte_cpu_to_be_32(RTE_BIT32(log_desc_n)),
 			&eqp->sw_qp.db_rec[0]);
 	return 0;
 error:
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index c258eb3024..6637ba1503 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -87,6 +87,8 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 			}
 			virtq->umems[j].size = 0;
 		}
+		if (virtq->eqp.fw_qp)
+			mlx5_vdpa_event_qp_destroy(&virtq->eqp);
 	}
 }
 
@@ -117,8 +119,6 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 		claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
 	}
 	virtq->virtq = NULL;
-	if (virtq->eqp.fw_qp)
-		mlx5_vdpa_event_qp_destroy(&virtq->eqp);
 	virtq->notifier_state = MLX5_VDPA_NOTIFIER_STATE_DISABLED;
 	return 0;
 }
@@ -246,7 +246,7 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 						      MLX5_VIRTQ_EVENT_MODE_QP :
 						  MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
 	if (attr.event_mode == MLX5_VIRTQ_EVENT_MODE_QP) {
-		ret = mlx5_vdpa_event_qp_create(priv, vq.size, vq.callfd,
+		ret = mlx5_vdpa_event_qp_prepare(priv, vq.size, vq.callfd,
 						&virtq->eqp);
 		if (ret) {
 			DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v4 05/15] common/mlx5: extend virtq modifiable fields
  2022-06-18  9:02 ` [PATCH v4 00/15] mlx5/vdpa: optimize live migration time Li Zhang
                     ` (3 preceding siblings ...)
  2022-06-18  9:02   ` [PATCH v4 04/15] vdpa/mlx5: support event qp reuse Li Zhang
@ 2022-06-18  9:02   ` Li Zhang
  2022-06-20  9:01     ` Maxime Coquelin
  2022-06-18  9:02   ` [PATCH v4 06/15] vdpa/mlx5: pre-create virtq at probe time Li Zhang
                     ` (10 subsequent siblings)
  15 siblings, 1 reply; 137+ messages in thread
From: Li Zhang @ 2022-06-18  9:02 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs
  Cc: dev, thomas, rasland, roniba, maxime.coquelin

A virtq configuration can be modified after the virtq creation.
Added the following modifiable fields (a usage sketch follows the list):
1. address fields: desc_addr/used_addr/available_addr
2. hw_available_index
3. hw_used_index
4. virtio_q_type
5. version type
6. queue mkey
7. feature bit mask: tso_ipv4/tso_ipv6/tx_csum/rx_csum
8. event mode: event_mode/event_qpn_or_msix
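
As an illustration only (a minimal sketch built from the names added
in this patch, not a complete configuration; desc_gpa, used_gpa,
avail_gpa and the index variables are placeholders), a caller can now
combine several of these fields in a single modify command:

    struct mlx5_devx_virtq_attr attr = {
        .queue_index = index,
        .mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_STATE |
                             MLX5_VIRTQ_MODIFY_TYPE_ADDR |
                             MLX5_VIRTQ_MODIFY_TYPE_HW_AVAILABLE_INDEX |
                             MLX5_VIRTQ_MODIFY_TYPE_HW_USED_INDEX,
        .state = MLX5_VIRTQ_STATE_RDY,
        .desc_addr = desc_gpa,
        .used_addr = used_gpa,
        .available_addr = avail_gpa,
        .hw_available_index = last_avail_idx,
        .hw_used_index = last_used_idx,
    };

    if (mlx5_devx_cmd_modify_virtq(virtq->virtq, &attr))
        DRV_LOG(ERR, "Failed to modify virtq %d.", index);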

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c | 70 +++++++++++++++++++++++-----
 drivers/common/mlx5/mlx5_devx_cmds.h |  6 ++-
 drivers/common/mlx5/mlx5_prm.h       | 13 +++++-
 3 files changed, 76 insertions(+), 13 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 1d6d6578d6..1b68c37092 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -545,6 +545,15 @@ mlx5_devx_cmd_query_hca_vdpa_attr(void *ctx,
 		vdpa_attr->log_doorbell_stride =
 			MLX5_GET(virtio_emulation_cap, hcattr,
 				 log_doorbell_stride);
+		vdpa_attr->vnet_modify_ext =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 vnet_modify_ext);
+		vdpa_attr->virtio_net_q_addr_modify =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 virtio_net_q_addr_modify);
+		vdpa_attr->virtio_q_index_modify =
+			MLX5_GET(virtio_emulation_cap, hcattr,
+				 virtio_q_index_modify);
 		vdpa_attr->log_doorbell_bar_size =
 			MLX5_GET(virtio_emulation_cap, hcattr,
 				 log_doorbell_bar_size);
@@ -2074,27 +2083,66 @@ mlx5_devx_cmd_modify_virtq(struct mlx5_devx_obj *virtq_obj,
 	MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_type,
 		 MLX5_GENERAL_OBJ_TYPE_VIRTQ);
 	MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_id, virtq_obj->id);
-	MLX5_SET64(virtio_net_q, virtq, modify_field_select, attr->type);
+	MLX5_SET64(virtio_net_q, virtq, modify_field_select,
+		attr->mod_fields_bitmap);
 	MLX5_SET16(virtio_q, virtctx, queue_index, attr->queue_index);
-	switch (attr->type) {
-	case MLX5_VIRTQ_MODIFY_TYPE_STATE:
+	if (!attr->mod_fields_bitmap) {
+		DRV_LOG(ERR, "Failed to modify VIRTQ for no type set.");
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_STATE)
 		MLX5_SET16(virtio_net_q, virtq, state, attr->state);
-		break;
-	case MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS:
+	if (attr->mod_fields_bitmap &
+	    MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS) {
 		MLX5_SET(virtio_net_q, virtq, dirty_bitmap_mkey,
 			 attr->dirty_bitmap_mkey);
 		MLX5_SET64(virtio_net_q, virtq, dirty_bitmap_addr,
 			 attr->dirty_bitmap_addr);
 		MLX5_SET(virtio_net_q, virtq, dirty_bitmap_size,
 			 attr->dirty_bitmap_size);
-		break;
-	case MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE:
+	}
+	if (attr->mod_fields_bitmap &
+	    MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE)
 		MLX5_SET(virtio_net_q, virtq, dirty_bitmap_dump_enable,
 			 attr->dirty_bitmap_dump_enable);
-		break;
-	default:
-		rte_errno = EINVAL;
-		return -rte_errno;
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_QUEUE_PERIOD) {
+		MLX5_SET(virtio_q, virtctx, queue_period_mode,
+			attr->hw_latency_mode);
+		MLX5_SET(virtio_q, virtctx, queue_period_us,
+			attr->hw_max_latency_us);
+		MLX5_SET(virtio_q, virtctx, queue_max_count,
+			attr->hw_max_pending_comp);
+	}
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_ADDR) {
+		MLX5_SET64(virtio_q, virtctx, desc_addr, attr->desc_addr);
+		MLX5_SET64(virtio_q, virtctx, used_addr, attr->used_addr);
+		MLX5_SET64(virtio_q, virtctx, available_addr,
+			attr->available_addr);
+	}
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_HW_AVAILABLE_INDEX)
+		MLX5_SET16(virtio_net_q, virtq, hw_available_index,
+		   attr->hw_available_index);
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_HW_USED_INDEX)
+		MLX5_SET16(virtio_net_q, virtq, hw_used_index,
+			attr->hw_used_index);
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_Q_TYPE)
+		MLX5_SET16(virtio_q, virtctx, virtio_q_type, attr->q_type);
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_VERSION_1_0)
+		MLX5_SET16(virtio_q, virtctx, virtio_version_1_0,
+		   attr->virtio_version_1_0);
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_Q_MKEY)
+		MLX5_SET(virtio_q, virtctx, virtio_q_mkey, attr->mkey);
+	if (attr->mod_fields_bitmap &
+		MLX5_VIRTQ_MODIFY_TYPE_QUEUE_FEATURE_BIT_MASK) {
+		MLX5_SET16(virtio_net_q, virtq, tso_ipv4, attr->tso_ipv4);
+		MLX5_SET16(virtio_net_q, virtq, tso_ipv6, attr->tso_ipv6);
+		MLX5_SET16(virtio_net_q, virtq, tx_csum, attr->tx_csum);
+		MLX5_SET16(virtio_net_q, virtq, rx_csum, attr->rx_csum);
+	}
+	if (attr->mod_fields_bitmap & MLX5_VIRTQ_MODIFY_TYPE_EVENT_MODE) {
+		MLX5_SET16(virtio_q, virtctx, event_mode, attr->event_mode);
+		MLX5_SET(virtio_q, virtctx, event_qpn_or_msix, attr->qp_id);
 	}
 	ret = mlx5_glue->devx_obj_modify(virtq_obj->obj, in, sizeof(in),
 					 out, sizeof(out));
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index 3747ef9e33..ec6467d927 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -74,6 +74,9 @@ struct mlx5_hca_vdpa_attr {
 	uint32_t log_doorbell_stride:5;
 	uint32_t log_doorbell_bar_size:5;
 	uint32_t queue_counters_valid:1;
+	uint32_t vnet_modify_ext:1;
+	uint32_t virtio_net_q_addr_modify:1;
+	uint32_t virtio_q_index_modify:1;
 	uint32_t max_num_virtio_queues;
 	struct {
 		uint32_t a;
@@ -465,7 +468,7 @@ struct mlx5_devx_virtq_attr {
 	uint32_t tis_id;
 	uint32_t counters_obj_id;
 	uint64_t dirty_bitmap_addr;
-	uint64_t type;
+	uint64_t mod_fields_bitmap;
 	uint64_t desc_addr;
 	uint64_t used_addr;
 	uint64_t available_addr;
@@ -475,6 +478,7 @@ struct mlx5_devx_virtq_attr {
 		uint64_t offset;
 	} umems[3];
 	uint8_t error_type;
+	uint8_t q_type;
 };
 
 
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 8a2f55c33e..5f58a6ee1d 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -1802,7 +1802,9 @@ struct mlx5_ifc_virtio_emulation_cap_bits {
 	u8 virtio_queue_type[0x8];
 	u8 reserved_at_20[0x13];
 	u8 log_doorbell_stride[0x5];
-	u8 reserved_at_3b[0x3];
+	u8 vnet_modify_ext[0x1];
+	u8 virtio_net_q_addr_modify[0x1];
+	u8 virtio_q_index_modify[0x1];
 	u8 log_doorbell_bar_size[0x5];
 	u8 doorbell_bar_offset[0x40];
 	u8 reserved_at_80[0x8];
@@ -3024,6 +3026,15 @@ enum {
 	MLX5_VIRTQ_MODIFY_TYPE_STATE = (1UL << 0),
 	MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS = (1UL << 3),
 	MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE = (1UL << 4),
+	MLX5_VIRTQ_MODIFY_TYPE_QUEUE_PERIOD = (1UL << 5),
+	MLX5_VIRTQ_MODIFY_TYPE_ADDR = (1UL << 6),
+	MLX5_VIRTQ_MODIFY_TYPE_HW_AVAILABLE_INDEX = (1UL << 7),
+	MLX5_VIRTQ_MODIFY_TYPE_HW_USED_INDEX = (1UL << 8),
+	MLX5_VIRTQ_MODIFY_TYPE_Q_TYPE = (1UL << 9),
+	MLX5_VIRTQ_MODIFY_TYPE_VERSION_1_0 = (1UL << 10),
+	MLX5_VIRTQ_MODIFY_TYPE_Q_MKEY = (1UL << 11),
+	MLX5_VIRTQ_MODIFY_TYPE_QUEUE_FEATURE_BIT_MASK = (1UL << 12),
+	MLX5_VIRTQ_MODIFY_TYPE_EVENT_MODE = (1UL << 13),
 };
 
 struct mlx5_ifc_virtio_q_bits {
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v4 06/15] vdpa/mlx5: pre-create virtq at probe time
  2022-06-18  9:02 ` [PATCH v4 00/15] mlx5/vdpa: optimize live migration time Li Zhang
                     ` (4 preceding siblings ...)
  2022-06-18  9:02   ` [PATCH v4 05/15] common/mlx5: extend virtq modifiable fields Li Zhang
@ 2022-06-18  9:02   ` Li Zhang
  2022-06-18  9:02   ` [PATCH v4 07/15] vdpa/mlx5: optimize datapath-control synchronization Li Zhang
                     ` (9 subsequent siblings)
  15 siblings, 0 replies; 137+ messages in thread
From: Li Zhang @ 2022-06-18  9:02 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs
  Cc: dev, thomas, rasland, roniba, maxime.coquelin

The dev_config operation is called during the LM process.
The LM time is very critical because all the VM packets are dropped
during that time.

Move the virtq creation to probe time and only modify the
configuration later, in the dev_config stage, using the new ability to
modify a virtq.

This optimization accelerates the LM process and reduces its time
by 70%.
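
A condensed sketch of the resulting dev_config flow (taken from the
mlx5_vdpa_virtq_setup() hunks below; attribute preparation and error
handling are omitted):

    if (!virtq->virtq) {
        /* First time: create the virtq object with the full attributes. */
        virtq->virtq = mlx5_devx_cmd_create_virtq(priv->cdev->ctx, &attr);
        /* A freshly created virtq only needs its state set. */
        attr.mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_STATE;
    }
    /* Pre-created virtq: apply only the fields selected in the bitmap. */
    attr.state = MLX5_VIRTQ_STATE_RDY;
    if (mlx5_devx_cmd_modify_virtq(virtq->virtq, &attr))
        goto error;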

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 doc/guides/rel_notes/release_22_07.rst |   4 +
 drivers/vdpa/mlx5/mlx5_vdpa.h          |   4 +
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c       |  19 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c    | 257 +++++++++++++++----------
 4 files changed, 176 insertions(+), 108 deletions(-)

diff --git a/doc/guides/rel_notes/release_22_07.rst b/doc/guides/rel_notes/release_22_07.rst
index f2cf41def9..2056cd9ee7 100644
--- a/doc/guides/rel_notes/release_22_07.rst
+++ b/doc/guides/rel_notes/release_22_07.rst
@@ -175,6 +175,10 @@ New Features
   This is a fall-back implementation for platforms that
   don't support vector operations.
 
+* **Updated Nvidia mlx5 vDPA driver.**
+
+  * Added new devargs ``queue_size`` and ``queues`` to allow prior creation of virtq resources.
+
 
 Removed Items
 -------------
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index bf82026e37..e5553079fe 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -80,6 +80,7 @@ struct mlx5_vdpa_virtq {
 	uint16_t vq_size;
 	uint8_t notifier_state;
 	bool stopped;
+	uint32_t configured:1;
 	uint32_t version;
 	struct mlx5_vdpa_priv *priv;
 	struct mlx5_devx_obj *virtq;
@@ -489,4 +490,7 @@ mlx5_vdpa_virtq_stats_reset(struct mlx5_vdpa_priv *priv, int qid);
  */
 void
 mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv);
+
+bool
+mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
index 43a2b98255..284758ad56 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
@@ -12,20 +12,21 @@ int
 mlx5_vdpa_logging_enable(struct mlx5_vdpa_priv *priv, int enable)
 {
 	struct mlx5_devx_virtq_attr attr = {
-		.type = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE,
+		.mod_fields_bitmap =
+			MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_DUMP_ENABLE,
 		.dirty_bitmap_dump_enable = enable,
 	};
+	struct mlx5_vdpa_virtq *virtq;
 	int i;
 
 	for (i = 0; i < priv->nr_virtqs; ++i) {
 		attr.queue_index = i;
-		if (!priv->virtqs[i].virtq) {
-			DRV_LOG(DEBUG, "virtq %d is invalid for dirty bitmap "
-				"enabling.", i);
+		virtq = &priv->virtqs[i];
+		if (!virtq->configured) {
+			DRV_LOG(DEBUG, "virtq %d is invalid for dirty bitmap enabling.", i);
 		} else if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
 			   &attr)) {
-			DRV_LOG(ERR, "Failed to modify virtq %d for dirty "
-				"bitmap enabling.", i);
+			DRV_LOG(ERR, "Failed to modify virtq %d for dirty bitmap enabling.", i);
 			return -1;
 		}
 	}
@@ -37,10 +38,11 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
 			   uint64_t log_size)
 {
 	struct mlx5_devx_virtq_attr attr = {
-		.type = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS,
+		.mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_DIRTY_BITMAP_PARAMS,
 		.dirty_bitmap_addr = log_base,
 		.dirty_bitmap_size = log_size,
 	};
+	struct mlx5_vdpa_virtq *virtq;
 	int i;
 	int ret = mlx5_os_wrapped_mkey_create(priv->cdev->ctx, priv->cdev->pd,
 					      priv->cdev->pdn,
@@ -54,7 +56,8 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
 	attr.dirty_bitmap_mkey = priv->lm_mr.lkey;
 	for (i = 0; i < priv->nr_virtqs; ++i) {
 		attr.queue_index = i;
-		if (!priv->virtqs[i].virtq) {
+		virtq = &priv->virtqs[i];
+		if (!virtq->configured) {
 			DRV_LOG(DEBUG, "virtq %d is invalid for LM.", i);
 		} else if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
 						      &attr)) {
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 6637ba1503..6e08d619e4 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -75,6 +75,7 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
+		virtq->configured = 0;
 		for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
 			if (virtq->umems[j].obj) {
 				claim_zero(mlx5_glue->devx_umem_dereg
@@ -111,11 +112,12 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 		rte_intr_fd_set(virtq->intr_handle, -1);
 	}
 	rte_intr_instance_free(virtq->intr_handle);
-	if (virtq->virtq) {
+	if (virtq->configured) {
 		ret = mlx5_vdpa_virtq_stop(virtq->priv, virtq->index);
 		if (ret)
 			DRV_LOG(WARNING, "Failed to stop virtq %d.",
 				virtq->index);
+		virtq->configured = 0;
 		claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
 	}
 	virtq->virtq = NULL;
@@ -138,7 +140,7 @@ int
 mlx5_vdpa_virtq_modify(struct mlx5_vdpa_virtq *virtq, int state)
 {
 	struct mlx5_devx_virtq_attr attr = {
-			.type = MLX5_VIRTQ_MODIFY_TYPE_STATE,
+			.mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_STATE,
 			.state = state ? MLX5_VIRTQ_STATE_RDY :
 					 MLX5_VIRTQ_STATE_SUSPEND,
 			.queue_index = virtq->index,
@@ -153,7 +155,7 @@ mlx5_vdpa_virtq_stop(struct mlx5_vdpa_priv *priv, int index)
 	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
 	int ret;
 
-	if (virtq->stopped)
+	if (virtq->stopped || !virtq->configured)
 		return 0;
 	ret = mlx5_vdpa_virtq_modify(virtq, 0);
 	if (ret)
@@ -209,51 +211,54 @@ mlx5_vdpa_hva_to_gpa(struct rte_vhost_memory *mem, uint64_t hva)
 }
 
 static int
-mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
+mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
+		struct mlx5_devx_virtq_attr *attr,
+		struct rte_vhost_vring *vq, int index)
 {
 	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
-	struct rte_vhost_vring vq;
-	struct mlx5_devx_virtq_attr attr = {0};
 	uint64_t gpa;
 	int ret;
 	unsigned int i;
-	uint16_t last_avail_idx;
-	uint16_t last_used_idx;
-	uint16_t event_num = MLX5_EVENT_TYPE_OBJECT_CHANGE;
-	uint64_t cookie;
-
-	ret = rte_vhost_get_vhost_vring(priv->vid, index, &vq);
-	if (ret)
-		return -1;
-	if (vq.size == 0)
-		return 0;
-	virtq->index = index;
-	virtq->vq_size = vq.size;
-	attr.tso_ipv4 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO4));
-	attr.tso_ipv6 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO6));
-	attr.tx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_CSUM));
-	attr.rx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_GUEST_CSUM));
-	attr.virtio_version_1_0 = !!(priv->features & (1ULL <<
-							VIRTIO_F_VERSION_1));
-	attr.type = (priv->features & (1ULL << VIRTIO_F_RING_PACKED)) ?
+	uint16_t last_avail_idx = 0;
+	uint16_t last_used_idx = 0;
+
+	if (virtq->virtq)
+		attr->mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_STATE |
+			MLX5_VIRTQ_MODIFY_TYPE_ADDR |
+			MLX5_VIRTQ_MODIFY_TYPE_HW_AVAILABLE_INDEX |
+			MLX5_VIRTQ_MODIFY_TYPE_HW_USED_INDEX |
+			MLX5_VIRTQ_MODIFY_TYPE_VERSION_1_0 |
+			MLX5_VIRTQ_MODIFY_TYPE_Q_TYPE |
+			MLX5_VIRTQ_MODIFY_TYPE_Q_MKEY |
+			MLX5_VIRTQ_MODIFY_TYPE_QUEUE_FEATURE_BIT_MASK |
+			MLX5_VIRTQ_MODIFY_TYPE_EVENT_MODE;
+	attr->tso_ipv4 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO4));
+	attr->tso_ipv6 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO6));
+	attr->tx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_CSUM));
+	attr->rx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_GUEST_CSUM));
+	attr->virtio_version_1_0 =
+		!!(priv->features & (1ULL << VIRTIO_F_VERSION_1));
+	attr->q_type =
+		(priv->features & (1ULL << VIRTIO_F_RING_PACKED)) ?
 			MLX5_VIRTQ_TYPE_PACKED : MLX5_VIRTQ_TYPE_SPLIT;
 	/*
 	 * No need event QPs creation when the guest in poll mode or when the
 	 * capability allows it.
 	 */
-	attr.event_mode = vq.callfd != -1 || !(priv->caps.event_mode & (1 <<
-					       MLX5_VIRTQ_EVENT_MODE_NO_MSIX)) ?
-						      MLX5_VIRTQ_EVENT_MODE_QP :
-						  MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
-	if (attr.event_mode == MLX5_VIRTQ_EVENT_MODE_QP) {
-		ret = mlx5_vdpa_event_qp_prepare(priv, vq.size, vq.callfd,
-						&virtq->eqp);
+	attr->event_mode = vq->callfd != -1 ||
+	!(priv->caps.event_mode & (1 << MLX5_VIRTQ_EVENT_MODE_NO_MSIX)) ?
+	MLX5_VIRTQ_EVENT_MODE_QP : MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
+	if (attr->event_mode == MLX5_VIRTQ_EVENT_MODE_QP) {
+		ret = mlx5_vdpa_event_qp_prepare(priv,
+				vq->size, vq->callfd, &virtq->eqp);
 		if (ret) {
-			DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
+			DRV_LOG(ERR,
+				"Failed to create event QPs for virtq %d.",
 				index);
 			return -1;
 		}
-		attr.qp_id = virtq->eqp.fw_qp->id;
+		attr->mod_fields_bitmap |= MLX5_VIRTQ_MODIFY_TYPE_EVENT_MODE;
+		attr->qp_id = virtq->eqp.fw_qp->id;
 	} else {
 		DRV_LOG(INFO, "Virtq %d is, for sure, working by poll mode, no"
 			" need event QPs and event mechanism.", index);
@@ -265,77 +270,82 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 		if (!virtq->counters) {
 			DRV_LOG(ERR, "Failed to create virtq couners for virtq"
 				" %d.", index);
-			goto error;
+			return -1;
 		}
-		attr.counters_obj_id = virtq->counters->id;
+		attr->counters_obj_id = virtq->counters->id;
 	}
 	/* Setup 3 UMEMs for each virtq. */
-	for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
-		uint32_t size;
-		void *buf;
-		struct mlx5dv_devx_umem *obj;
-
-		size = priv->caps.umems[i].a * vq.size + priv->caps.umems[i].b;
-		if (virtq->umems[i].size == size &&
-		    virtq->umems[i].obj != NULL) {
-			/* Reuse registered memory. */
-			memset(virtq->umems[i].buf, 0, size);
-			goto reuse;
-		}
-		if (virtq->umems[i].obj)
-			claim_zero(mlx5_glue->devx_umem_dereg
+	if (virtq->virtq) {
+		for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
+			uint32_t size;
+			void *buf;
+			struct mlx5dv_devx_umem *obj;
+
+			size =
+		priv->caps.umems[i].a * vq->size + priv->caps.umems[i].b;
+			if (virtq->umems[i].size == size &&
+				virtq->umems[i].obj != NULL) {
+				/* Reuse registered memory. */
+				memset(virtq->umems[i].buf, 0, size);
+				goto reuse;
+			}
+			if (virtq->umems[i].obj)
+				claim_zero(mlx5_glue->devx_umem_dereg
 				   (virtq->umems[i].obj));
-		if (virtq->umems[i].buf)
-			rte_free(virtq->umems[i].buf);
-		virtq->umems[i].size = 0;
-		virtq->umems[i].obj = NULL;
-		virtq->umems[i].buf = NULL;
-		buf = rte_zmalloc(__func__, size, 4096);
-		if (buf == NULL) {
-			DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
+			if (virtq->umems[i].buf)
+				rte_free(virtq->umems[i].buf);
+			virtq->umems[i].size = 0;
+			virtq->umems[i].obj = NULL;
+			virtq->umems[i].buf = NULL;
+			buf = rte_zmalloc(__func__,
+				size, 4096);
+			if (buf == NULL) {
+				DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
 				" %u.", i, index);
-			goto error;
-		}
-		obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx, buf, size,
-					       IBV_ACCESS_LOCAL_WRITE);
-		if (obj == NULL) {
-			DRV_LOG(ERR, "Failed to register umem %d for virtq %u.",
+				return -1;
+			}
+			obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx,
+				buf, size, IBV_ACCESS_LOCAL_WRITE);
+			if (obj == NULL) {
+				DRV_LOG(ERR, "Failed to register umem %d for virtq %u.",
 				i, index);
-			goto error;
-		}
-		virtq->umems[i].size = size;
-		virtq->umems[i].buf = buf;
-		virtq->umems[i].obj = obj;
+				rte_free(buf);
+				return -1;
+			}
+			virtq->umems[i].size = size;
+			virtq->umems[i].buf = buf;
+			virtq->umems[i].obj = obj;
 reuse:
-		attr.umems[i].id = virtq->umems[i].obj->umem_id;
-		attr.umems[i].offset = 0;
-		attr.umems[i].size = virtq->umems[i].size;
+			attr->umems[i].id = virtq->umems[i].obj->umem_id;
+			attr->umems[i].offset = 0;
+			attr->umems[i].size = virtq->umems[i].size;
+		}
 	}
-	if (attr.type == MLX5_VIRTQ_TYPE_SPLIT) {
+	if (attr->q_type == MLX5_VIRTQ_TYPE_SPLIT) {
 		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
-					   (uint64_t)(uintptr_t)vq.desc);
+					   (uint64_t)(uintptr_t)vq->desc);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get descriptor ring GPA.");
-			goto error;
+			return -1;
 		}
-		attr.desc_addr = gpa;
+		attr->desc_addr = gpa;
 		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
-					   (uint64_t)(uintptr_t)vq.used);
+					   (uint64_t)(uintptr_t)vq->used);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get GPA for used ring.");
-			goto error;
+			return -1;
 		}
-		attr.used_addr = gpa;
+		attr->used_addr = gpa;
 		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
-					   (uint64_t)(uintptr_t)vq.avail);
+					   (uint64_t)(uintptr_t)vq->avail);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get GPA for available ring.");
-			goto error;
+			return -1;
 		}
-		attr.available_addr = gpa;
+		attr->available_addr = gpa;
 	}
-	ret = rte_vhost_get_vring_base(priv->vid, index, &last_avail_idx,
-				 &last_used_idx);
+	ret = rte_vhost_get_vring_base(priv->vid,
+			index, &last_avail_idx, &last_used_idx);
 	if (ret) {
 		last_avail_idx = 0;
 		last_used_idx = 0;
@@ -345,24 +355,71 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 				"virtq %d.", priv->vid, last_avail_idx,
 				last_used_idx, index);
 	}
-	attr.hw_available_index = last_avail_idx;
-	attr.hw_used_index = last_used_idx;
-	attr.q_size = vq.size;
-	attr.mkey = priv->gpa_mkey_index;
-	attr.tis_id = priv->tiss[(index / 2) % priv->num_lag_ports]->id;
-	attr.queue_index = index;
-	attr.pd = priv->cdev->pdn;
-	attr.hw_latency_mode = priv->hw_latency_mode;
-	attr.hw_max_latency_us = priv->hw_max_latency_us;
-	attr.hw_max_pending_comp = priv->hw_max_pending_comp;
-	virtq->virtq = mlx5_devx_cmd_create_virtq(priv->cdev->ctx, &attr);
+	attr->hw_available_index = last_avail_idx;
+	attr->hw_used_index = last_used_idx;
+	attr->q_size = vq->size;
+	attr->mkey = priv->gpa_mkey_index;
+	attr->tis_id = priv->tiss[(index / 2) % priv->num_lag_ports]->id;
+	attr->queue_index = index;
+	attr->pd = priv->cdev->pdn;
+	attr->hw_latency_mode = priv->hw_latency_mode;
+	attr->hw_max_latency_us = priv->hw_max_latency_us;
+	attr->hw_max_pending_comp = priv->hw_max_pending_comp;
+	if (attr->hw_latency_mode || attr->hw_max_latency_us ||
+		attr->hw_max_pending_comp)
+		attr->mod_fields_bitmap |= MLX5_VIRTQ_MODIFY_TYPE_QUEUE_PERIOD;
+	return 0;
+}
+
+bool
+mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv)
+{
+	return (priv->caps.vnet_modify_ext &&
+			priv->caps.virtio_net_q_addr_modify &&
+			priv->caps.virtio_q_index_modify) ? true : false;
+}
+
+static int
+mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
+{
+	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
+	struct rte_vhost_vring vq;
+	struct mlx5_devx_virtq_attr attr = {0};
+	int ret;
+	uint16_t event_num = MLX5_EVENT_TYPE_OBJECT_CHANGE;
+	uint64_t cookie;
+
+	ret = rte_vhost_get_vhost_vring(priv->vid, index, &vq);
+	if (ret)
+		return -1;
+	if (vq.size == 0)
+		return 0;
 	virtq->priv = priv;
-	if (!virtq->virtq)
+	virtq->stopped = 0;
+	ret = mlx5_vdpa_virtq_sub_objs_prepare(priv, &attr,
+				&vq, index);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to setup update virtq attr %d.",
+			index);
 		goto error;
-	claim_zero(rte_vhost_enable_guest_notification(priv->vid, index, 1));
-	if (mlx5_vdpa_virtq_modify(virtq, 1))
+	}
+	if (!virtq->virtq) {
+		virtq->index = index;
+		virtq->vq_size = vq.size;
+		virtq->virtq = mlx5_devx_cmd_create_virtq(priv->cdev->ctx,
+			&attr);
+		if (!virtq->virtq)
+			goto error;
+		attr.mod_fields_bitmap = MLX5_VIRTQ_MODIFY_TYPE_STATE;
+	}
+	attr.state = MLX5_VIRTQ_STATE_RDY;
+	ret = mlx5_devx_cmd_modify_virtq(virtq->virtq, &attr);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to modify virtq %d.", index);
 		goto error;
-	virtq->priv = priv;
+	}
+	claim_zero(rte_vhost_enable_guest_notification(priv->vid, index, 1));
+	virtq->configured = 1;
 	rte_write32(virtq->index, priv->virtq_db_addr);
 	/* Setup doorbell mapping. */
 	virtq->intr_handle =
@@ -553,7 +610,7 @@ mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable)
 			return 0;
 		DRV_LOG(INFO, "Virtq %d was modified, recreate it.", index);
 	}
-	if (virtq->virtq) {
+	if (virtq->configured) {
 		virtq->enable = 0;
 		if (is_virtq_recvq(virtq->index, priv->nr_virtqs)) {
 			ret = mlx5_vdpa_steer_update(priv);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v4 07/15] vdpa/mlx5: optimize datapath-control synchronization
  2022-06-18  9:02 ` [PATCH v4 00/15] mlx5/vdpa: optimize live migration time Li Zhang
                     ` (5 preceding siblings ...)
  2022-06-18  9:02   ` [PATCH v4 06/15] vdpa/mlx5: pre-create virtq at probe time Li Zhang
@ 2022-06-18  9:02   ` Li Zhang
  2022-06-20  9:25     ` Maxime Coquelin
  2022-06-18  9:02   ` [PATCH v4 08/15] vdpa/mlx5: add multi-thread management for configuration Li Zhang
                     ` (8 subsequent siblings)
  15 siblings, 1 reply; 137+ messages in thread
From: Li Zhang @ 2022-06-18  9:02 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs
  Cc: dev, thomas, rasland, roniba, maxime.coquelin

The driver used a single global lock for any synchronization needed
for the datapath and control path.
It is better to group each critical section only with the other
critical sections that really need to be synchronized with it.

Replace the global lock with the following locks (see the sketch after
this list):

1. Virtq locks (per virtq) synchronize datapath polling and parallel
   configurations on the same virtq.
2. A doorbell lock synchronizes doorbell updates; the doorbell is
   shared by all the virtqs in the device.
3. A steering lock for the shared steering objects updates.
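
A condensed sketch of how the new locks nest in the virtq kick
handler (taken from the patch; FD reading and error paths are
omitted):

    /* Per-virtq lock: datapath polling vs. parallel configuration. */
    pthread_mutex_lock(&virtq->virtq_lock);
    /* ... read the guest kick FD ... */
    /* Doorbell lock: the register is shared by all virtqs of the device. */
    rte_spinlock_lock(&priv->db_lock);
    rte_write32(virtq->index, priv->virtq_db_addr);
    rte_spinlock_unlock(&priv->db_lock);
    pthread_mutex_unlock(&virtq->virtq_lock);

    /* The steering lock wraps the mlx5_vdpa_steer_update()/unset() calls. */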

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 24 ++++---
 drivers/vdpa/mlx5/mlx5_vdpa.h       | 13 ++--
 drivers/vdpa/mlx5/mlx5_vdpa_event.c | 97 ++++++++++++++++++-----------
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c    | 36 ++++++++---
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c |  7 ++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 88 +++++++++++++++++++-------
 6 files changed, 186 insertions(+), 79 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index ee99952e11..e5a11f72fd 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -135,6 +135,7 @@ mlx5_vdpa_set_vring_state(int vid, int vring, int state)
 	struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid);
 	struct mlx5_vdpa_priv *priv =
 		mlx5_vdpa_find_priv_resource_by_vdev(vdev);
+	struct mlx5_vdpa_virtq *virtq;
 	int ret;
 
 	if (priv == NULL) {
@@ -145,9 +146,10 @@ mlx5_vdpa_set_vring_state(int vid, int vring, int state)
 		DRV_LOG(ERR, "Too big vring id: %d.", vring);
 		return -E2BIG;
 	}
-	pthread_mutex_lock(&priv->vq_config_lock);
+	virtq = &priv->virtqs[vring];
+	pthread_mutex_lock(&virtq->virtq_lock);
 	ret = mlx5_vdpa_virtq_enable(priv, vring, state);
-	pthread_mutex_unlock(&priv->vq_config_lock);
+	pthread_mutex_unlock(&virtq->virtq_lock);
 	return ret;
 }
 
@@ -267,7 +269,9 @@ mlx5_vdpa_dev_close(int vid)
 		ret |= mlx5_vdpa_lm_log(priv);
 		priv->state = MLX5_VDPA_STATE_IN_PROGRESS;
 	}
+	pthread_mutex_lock(&priv->steer_update_lock);
 	mlx5_vdpa_steer_unset(priv);
+	pthread_mutex_unlock(&priv->steer_update_lock);
 	mlx5_vdpa_virtqs_release(priv);
 	mlx5_vdpa_drain_cq(priv);
 	if (priv->lm_mr.addr)
@@ -276,8 +280,6 @@ mlx5_vdpa_dev_close(int vid)
 	if (!priv->connected)
 		mlx5_vdpa_dev_cache_clean(priv);
 	priv->vid = 0;
-	/* The mutex may stay locked after event thread cancel - initiate it. */
-	pthread_mutex_init(&priv->vq_config_lock, NULL);
 	DRV_LOG(INFO, "vDPA device %d was closed.", vid);
 	return ret;
 }
@@ -549,15 +551,21 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 static int
 mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_virtq *virtq;
 	uint32_t index;
 	uint32_t i;
 
+	for (index = 0; index < priv->caps.max_num_virtio_queues * 2;
+		index++) {
+		virtq = &priv->virtqs[index];
+		pthread_mutex_init(&virtq->virtq_lock, NULL);
+	}
 	if (!priv->queues)
 		return 0;
 	for (index = 0; index < (priv->queues * 2); ++index) {
-		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
+		virtq = &priv->virtqs[index];
 		int ret = mlx5_vdpa_event_qp_prepare(priv, priv->queue_size,
-					-1, &virtq->eqp);
+					-1, virtq);
 
 		if (ret) {
 			DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
@@ -713,7 +721,8 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 	priv->num_lag_ports = attr->num_lag_ports;
 	if (attr->num_lag_ports == 0)
 		priv->num_lag_ports = 1;
-	pthread_mutex_init(&priv->vq_config_lock, NULL);
+	rte_spinlock_init(&priv->db_lock);
+	pthread_mutex_init(&priv->steer_update_lock, NULL);
 	priv->cdev = cdev;
 	mlx5_vdpa_config_get(mkvlist, priv);
 	if (mlx5_vdpa_create_dev_resources(priv))
@@ -797,7 +806,6 @@ mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv)
 	mlx5_vdpa_release_dev_resources(priv);
 	if (priv->vdev)
 		rte_vdpa_unregister_device(priv->vdev);
-	pthread_mutex_destroy(&priv->vq_config_lock);
 	rte_free(priv);
 }
 
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index e5553079fe..3fd5eefc5e 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -82,6 +82,7 @@ struct mlx5_vdpa_virtq {
 	bool stopped;
 	uint32_t configured:1;
 	uint32_t version;
+	pthread_mutex_t virtq_lock;
 	struct mlx5_vdpa_priv *priv;
 	struct mlx5_devx_obj *virtq;
 	struct mlx5_devx_obj *counters;
@@ -126,7 +127,8 @@ struct mlx5_vdpa_priv {
 	TAILQ_ENTRY(mlx5_vdpa_priv) next;
 	bool connected;
 	enum mlx5_dev_state state;
-	pthread_mutex_t vq_config_lock;
+	rte_spinlock_t db_lock;
+	pthread_mutex_t steer_update_lock;
 	uint64_t no_traffic_counter;
 	pthread_t timer_tid;
 	int event_mode;
@@ -222,14 +224,15 @@ int mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv);
  *   Number of descriptors.
  * @param[in] callfd
  *   The guest notification file descriptor.
- * @param[in/out] eqp
- *   Pointer to the event QP structure.
+ * @param[in/out] virtq
+ *   Pointer to the virt-queue structure.
  *
  * @return
  *   0 on success, -1 otherwise and rte_errno is set.
  */
-int mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
-			      int callfd, struct mlx5_vdpa_event_qp *eqp);
+int
+mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
+	int callfd, struct mlx5_vdpa_virtq *virtq);
 
 /**
  * Destroy an event QP and all its related resources.
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index b43dca9255..2b0f5936d1 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -85,12 +85,13 @@ mlx5_vdpa_cq_arm(struct mlx5_vdpa_priv *priv, struct mlx5_vdpa_cq *cq)
 
 static int
 mlx5_vdpa_cq_create(struct mlx5_vdpa_priv *priv, uint16_t log_desc_n,
-		    int callfd, struct mlx5_vdpa_cq *cq)
+		int callfd, struct mlx5_vdpa_virtq *virtq)
 {
 	struct mlx5_devx_cq_attr attr = {
 		.use_first_only = 1,
 		.uar_page_id = mlx5_os_get_devx_uar_page_id(priv->uar.obj),
 	};
+	struct mlx5_vdpa_cq *cq = &virtq->eqp.cq;
 	uint16_t event_nums[1] = {0};
 	int ret;
 
@@ -102,10 +103,11 @@ mlx5_vdpa_cq_create(struct mlx5_vdpa_priv *priv, uint16_t log_desc_n,
 	cq->log_desc_n = log_desc_n;
 	rte_spinlock_init(&cq->sl);
 	/* Subscribe CQ event to the event channel controlled by the driver. */
-	ret = mlx5_os_devx_subscribe_devx_event(priv->eventc,
-						cq->cq_obj.cq->obj,
-						sizeof(event_nums), event_nums,
-						(uint64_t)(uintptr_t)cq);
+	ret = mlx5_glue->devx_subscribe_devx_event(priv->eventc,
+							cq->cq_obj.cq->obj,
+						   sizeof(event_nums),
+						   event_nums,
+						   (uint64_t)(uintptr_t)virtq);
 	if (ret) {
 		DRV_LOG(ERR, "Failed to subscribe CQE event.");
 		rte_errno = errno;
@@ -167,13 +169,17 @@ mlx5_vdpa_cq_poll(struct mlx5_vdpa_cq *cq)
 static void
 mlx5_vdpa_arm_all_cqs(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_virtq *virtq;
 	struct mlx5_vdpa_cq *cq;
 	int i;
 
 	for (i = 0; i < priv->nr_virtqs; i++) {
+		virtq = &priv->virtqs[i];
+		pthread_mutex_lock(&virtq->virtq_lock);
 		cq = &priv->virtqs[i].eqp.cq;
 		if (cq->cq_obj.cq && !cq->armed)
 			mlx5_vdpa_cq_arm(priv, cq);
+		pthread_mutex_unlock(&virtq->virtq_lock);
 	}
 }
 
@@ -220,13 +226,18 @@ mlx5_vdpa_queue_complete(struct mlx5_vdpa_cq *cq)
 static uint32_t
 mlx5_vdpa_queues_complete(struct mlx5_vdpa_priv *priv)
 {
-	int i;
+	struct mlx5_vdpa_virtq *virtq;
+	struct mlx5_vdpa_cq *cq;
 	uint32_t max = 0;
+	uint32_t comp;
+	int i;
 
 	for (i = 0; i < priv->nr_virtqs; i++) {
-		struct mlx5_vdpa_cq *cq = &priv->virtqs[i].eqp.cq;
-		uint32_t comp = mlx5_vdpa_queue_complete(cq);
-
+		virtq = &priv->virtqs[i];
+		pthread_mutex_lock(&virtq->virtq_lock);
+		cq = &virtq->eqp.cq;
+		comp = mlx5_vdpa_queue_complete(cq);
+		pthread_mutex_unlock(&virtq->virtq_lock);
 		if (comp > max)
 			max = comp;
 	}
@@ -253,7 +264,7 @@ mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv)
 }
 
 /* Wait on all CQs channel for completion event. */
-static struct mlx5_vdpa_cq *
+static struct mlx5_vdpa_virtq *
 mlx5_vdpa_event_wait(struct mlx5_vdpa_priv *priv __rte_unused)
 {
 #ifdef HAVE_IBV_DEVX_EVENT
@@ -265,7 +276,8 @@ mlx5_vdpa_event_wait(struct mlx5_vdpa_priv *priv __rte_unused)
 					    sizeof(out.buf));
 
 	if (ret >= 0)
-		return (struct mlx5_vdpa_cq *)(uintptr_t)out.event_resp.cookie;
+		return (struct mlx5_vdpa_virtq *)
+				(uintptr_t)out.event_resp.cookie;
 	DRV_LOG(INFO, "Got error in devx_get_event, ret = %d, errno = %d.",
 		ret, errno);
 #endif
@@ -276,7 +288,7 @@ static void *
 mlx5_vdpa_event_handle(void *arg)
 {
 	struct mlx5_vdpa_priv *priv = arg;
-	struct mlx5_vdpa_cq *cq;
+	struct mlx5_vdpa_virtq *virtq;
 	uint32_t max;
 
 	switch (priv->event_mode) {
@@ -284,7 +296,6 @@ mlx5_vdpa_event_handle(void *arg)
 	case MLX5_VDPA_EVENT_MODE_FIXED_TIMER:
 		priv->timer_delay_us = priv->event_us;
 		while (1) {
-			pthread_mutex_lock(&priv->vq_config_lock);
 			max = mlx5_vdpa_queues_complete(priv);
 			if (max == 0 && priv->no_traffic_counter++ >=
 			    priv->no_traffic_max) {
@@ -292,32 +303,37 @@ mlx5_vdpa_event_handle(void *arg)
 					priv->vdev->device->name);
 				mlx5_vdpa_arm_all_cqs(priv);
 				do {
-					pthread_mutex_unlock
-							(&priv->vq_config_lock);
-					cq = mlx5_vdpa_event_wait(priv);
-					pthread_mutex_lock
-							(&priv->vq_config_lock);
-					if (cq == NULL ||
-					       mlx5_vdpa_queue_complete(cq) > 0)
+					virtq = mlx5_vdpa_event_wait(priv);
+					if (virtq == NULL)
 						break;
+					pthread_mutex_lock(
+						&virtq->virtq_lock);
+					if (mlx5_vdpa_queue_complete(
+						&virtq->eqp.cq) > 0) {
+						pthread_mutex_unlock(
+							&virtq->virtq_lock);
+						break;
+					}
+					pthread_mutex_unlock(
+						&virtq->virtq_lock);
 				} while (1);
 				priv->timer_delay_us = priv->event_us;
 				priv->no_traffic_counter = 0;
 			} else if (max != 0) {
 				priv->no_traffic_counter = 0;
 			}
-			pthread_mutex_unlock(&priv->vq_config_lock);
 			mlx5_vdpa_timer_sleep(priv, max);
 		}
 		return NULL;
 	case MLX5_VDPA_EVENT_MODE_ONLY_INTERRUPT:
 		do {
-			cq = mlx5_vdpa_event_wait(priv);
-			if (cq != NULL) {
-				pthread_mutex_lock(&priv->vq_config_lock);
-				if (mlx5_vdpa_queue_complete(cq) > 0)
-					mlx5_vdpa_cq_arm(priv, cq);
-				pthread_mutex_unlock(&priv->vq_config_lock);
+			virtq = mlx5_vdpa_event_wait(priv);
+			if (virtq != NULL) {
+				pthread_mutex_lock(&virtq->virtq_lock);
+				if (mlx5_vdpa_queue_complete(
+					&virtq->eqp.cq) > 0)
+					mlx5_vdpa_cq_arm(priv, &virtq->eqp.cq);
+				pthread_mutex_unlock(&virtq->virtq_lock);
 			}
 		} while (1);
 		return NULL;
@@ -339,7 +355,6 @@ mlx5_vdpa_err_interrupt_handler(void *cb_arg __rte_unused)
 	struct mlx5_vdpa_virtq *virtq;
 	uint64_t sec;
 
-	pthread_mutex_lock(&priv->vq_config_lock);
 	while (mlx5_glue->devx_get_event(priv->err_chnl, &out.event_resp,
 					 sizeof(out.buf)) >=
 				       (ssize_t)sizeof(out.event_resp.cookie)) {
@@ -351,10 +366,11 @@ mlx5_vdpa_err_interrupt_handler(void *cb_arg __rte_unused)
 			continue;
 		}
 		virtq = &priv->virtqs[vq_index];
+		pthread_mutex_lock(&virtq->virtq_lock);
 		if (!virtq->enable || virtq->version != version)
-			continue;
+			goto unlock;
 		if (rte_rdtsc() / rte_get_tsc_hz() < MLX5_VDPA_ERROR_TIME_SEC)
-			continue;
+			goto unlock;
 		virtq->stopped = true;
 		/* Query error info. */
 		if (mlx5_vdpa_virtq_query(priv, vq_index))
@@ -384,8 +400,9 @@ mlx5_vdpa_err_interrupt_handler(void *cb_arg __rte_unused)
 		for (i = 1; i < RTE_DIM(virtq->err_time); i++)
 			virtq->err_time[i - 1] = virtq->err_time[i];
 		virtq->err_time[RTE_DIM(virtq->err_time) - 1] = rte_rdtsc();
+unlock:
+		pthread_mutex_unlock(&virtq->virtq_lock);
 	}
-	pthread_mutex_unlock(&priv->vq_config_lock);
 #endif
 }
 
@@ -533,11 +550,18 @@ mlx5_vdpa_cqe_event_setup(struct mlx5_vdpa_priv *priv)
 void
 mlx5_vdpa_cqe_event_unset(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_virtq *virtq;
 	void *status;
+	int i;
 
 	if (priv->timer_tid) {
 		pthread_cancel(priv->timer_tid);
 		pthread_join(priv->timer_tid, &status);
+		/* The mutex may stay locked after event thread cancel, initiate it. */
+		for (i = 0; i < priv->nr_virtqs; i++) {
+			virtq = &priv->virtqs[i];
+			pthread_mutex_init(&virtq->virtq_lock, NULL);
+		}
 	}
 	priv->timer_tid = 0;
 }
@@ -614,8 +638,9 @@ mlx5_vdpa_qps2rst2rts(struct mlx5_vdpa_event_qp *eqp)
 
 int
 mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
-			  int callfd, struct mlx5_vdpa_event_qp *eqp)
+	int callfd, struct mlx5_vdpa_virtq *virtq)
 {
+	struct mlx5_vdpa_event_qp *eqp = &virtq->eqp;
 	struct mlx5_devx_qp_attr attr = {0};
 	uint16_t log_desc_n = rte_log2_u32(desc_n);
 	uint32_t ret;
@@ -632,7 +657,8 @@ mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 	}
 	if (eqp->fw_qp)
 		mlx5_vdpa_event_qp_destroy(eqp);
-	if (mlx5_vdpa_cq_create(priv, log_desc_n, callfd, &eqp->cq))
+	if (mlx5_vdpa_cq_create(priv, log_desc_n, callfd, virtq) ||
+		!eqp->cq.cq_obj.cq)
 		return -1;
 	attr.pd = priv->cdev->pdn;
 	attr.ts_format =
@@ -650,8 +676,8 @@ mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 	attr.ts_format =
 		mlx5_ts_format_conv(priv->cdev->config.hca_attr.qp_ts_format);
 	ret = mlx5_devx_qp_create(priv->cdev->ctx, &(eqp->sw_qp),
-					attr.num_of_receive_wqes *
-					MLX5_WSEG_SIZE, &attr, SOCKET_ID_ANY);
+				  attr.num_of_receive_wqes * MLX5_WSEG_SIZE,
+				  &attr, SOCKET_ID_ANY);
 	if (ret) {
 		DRV_LOG(ERR, "Failed to create SW QP(%u).", rte_errno);
 		goto error;
@@ -668,3 +694,4 @@ mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 	mlx5_vdpa_event_qp_destroy(eqp);
 	return -1;
 }
+
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
index 284758ad56..bfa5d4d571 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
@@ -24,10 +24,19 @@ mlx5_vdpa_logging_enable(struct mlx5_vdpa_priv *priv, int enable)
 		virtq = &priv->virtqs[i];
 		if (!virtq->configured) {
 			DRV_LOG(DEBUG, "virtq %d is invalid for dirty bitmap enabling.", i);
-		} else if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
+		} else {
+			struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
+
+			pthread_mutex_lock(&virtq->virtq_lock);
+			if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
 			   &attr)) {
-			DRV_LOG(ERR, "Failed to modify virtq %d for dirty bitmap enabling.", i);
-			return -1;
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				DRV_LOG(ERR,
+					"Failed to modify virtq %d for dirty bitmap enabling.",
+					i);
+				return -1;
+			}
+			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
 	}
 	return 0;
@@ -59,10 +68,19 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
 		virtq = &priv->virtqs[i];
 		if (!virtq->configured) {
 			DRV_LOG(DEBUG, "virtq %d is invalid for LM.", i);
-		} else if (mlx5_devx_cmd_modify_virtq(priv->virtqs[i].virtq,
-						      &attr)) {
-			DRV_LOG(ERR, "Failed to modify virtq %d for LM.", i);
-			goto err;
+		} else {
+			struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
+
+			pthread_mutex_lock(&virtq->virtq_lock);
+			if (mlx5_devx_cmd_modify_virtq(
+					priv->virtqs[i].virtq,
+					&attr)) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				DRV_LOG(ERR,
+				"Failed to modify virtq %d for LM.", i);
+				goto err;
+			}
+			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
 	}
 	return 0;
@@ -77,6 +95,7 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
 int
 mlx5_vdpa_lm_log(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_virtq *virtq;
 	uint64_t features;
 	int ret = rte_vhost_get_negotiated_features(priv->vid, &features);
 	int i;
@@ -88,10 +107,13 @@ mlx5_vdpa_lm_log(struct mlx5_vdpa_priv *priv)
 	if (!RTE_VHOST_NEED_LOG(features))
 		return 0;
 	for (i = 0; i < priv->nr_virtqs; ++i) {
+		virtq = &priv->virtqs[i];
 		if (!priv->virtqs[i].virtq) {
 			DRV_LOG(DEBUG, "virtq %d is invalid for LM log.", i);
 		} else {
+			pthread_mutex_lock(&virtq->virtq_lock);
 			ret = mlx5_vdpa_virtq_stop(priv, i);
+			pthread_mutex_unlock(&virtq->virtq_lock);
 			if (ret) {
 				DRV_LOG(ERR, "Failed to stop virtq %d for LM "
 					"log.", i);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
index d4b4375c88..4cbf09784e 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
@@ -237,19 +237,24 @@ mlx5_vdpa_rss_flows_create(struct mlx5_vdpa_priv *priv)
 int
 mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv)
 {
-	int ret = mlx5_vdpa_rqt_prepare(priv);
+	int ret;
 
+	pthread_mutex_lock(&priv->steer_update_lock);
+	ret = mlx5_vdpa_rqt_prepare(priv);
 	if (ret == 0) {
 		mlx5_vdpa_steer_unset(priv);
 	} else if (ret < 0) {
+		pthread_mutex_unlock(&priv->steer_update_lock);
 		return ret;
 	} else if (!priv->steer.rss[0].flow) {
 		ret = mlx5_vdpa_rss_flows_create(priv);
 		if (ret) {
 			DRV_LOG(ERR, "Cannot create RSS flows.");
+			pthread_mutex_unlock(&priv->steer_update_lock);
 			return -1;
 		}
 	}
+	pthread_mutex_unlock(&priv->steer_update_lock);
 	return 0;
 }
 
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 6e08d619e4..63b6f44725 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -24,13 +24,17 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 	int nbytes;
 	int retry;
 
+	pthread_mutex_lock(&virtq->virtq_lock);
 	if (priv->state != MLX5_VDPA_STATE_CONFIGURED && !virtq->enable) {
+		pthread_mutex_unlock(&virtq->virtq_lock);
 		DRV_LOG(ERR,  "device %d queue %d down, skip kick handling",
 			priv->vid, virtq->index);
 		return;
 	}
-	if (rte_intr_fd_get(virtq->intr_handle) < 0)
+	if (rte_intr_fd_get(virtq->intr_handle) < 0) {
+		pthread_mutex_unlock(&virtq->virtq_lock);
 		return;
+	}
 	for (retry = 0; retry < 3; ++retry) {
 		nbytes = read(rte_intr_fd_get(virtq->intr_handle), &buf,
 			      8);
@@ -44,9 +48,14 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 		}
 		break;
 	}
-	if (nbytes < 0)
+	if (nbytes < 0) {
+		pthread_mutex_unlock(&virtq->virtq_lock);
 		return;
+	}
+	rte_spinlock_lock(&priv->db_lock);
 	rte_write32(virtq->index, priv->virtq_db_addr);
+	rte_spinlock_unlock(&priv->db_lock);
+	pthread_mutex_unlock(&virtq->virtq_lock);
 	if (priv->state != MLX5_VDPA_STATE_CONFIGURED && !virtq->enable) {
 		DRV_LOG(ERR,  "device %d queue %d down, skip kick handling",
 			priv->vid, virtq->index);
@@ -66,6 +75,33 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 	DRV_LOG(DEBUG, "Ring virtq %u doorbell.", virtq->index);
 }
 
+/* Virtq must be locked before calling this function. */
+static void
+mlx5_vdpa_virtq_unregister_intr_handle(struct mlx5_vdpa_virtq *virtq)
+{
+	int ret = -EAGAIN;
+
+	if (!virtq->intr_handle)
+		return;
+	if (rte_intr_fd_get(virtq->intr_handle) >= 0) {
+		while (ret == -EAGAIN) {
+			ret = rte_intr_callback_unregister(virtq->intr_handle,
+					mlx5_vdpa_virtq_kick_handler, virtq);
+			if (ret == -EAGAIN) {
+				DRV_LOG(DEBUG, "Try again to unregister fd %d of virtq %hu interrupt",
+					rte_intr_fd_get(virtq->intr_handle),
+					virtq->index);
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				usleep(MLX5_VDPA_INTR_RETRIES_USEC);
+				pthread_mutex_lock(&virtq->virtq_lock);
+			}
+		}
+		(void)rte_intr_fd_set(virtq->intr_handle, -1);
+	}
+	rte_intr_instance_free(virtq->intr_handle);
+	virtq->intr_handle = NULL;
+}
+
 /* Release cached VQ resources. */
 void
 mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
@@ -75,6 +111,7 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
+		pthread_mutex_lock(&virtq->virtq_lock);
 		virtq->configured = 0;
 		for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
 			if (virtq->umems[j].obj) {
@@ -90,28 +127,17 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 		}
 		if (virtq->eqp.fw_qp)
 			mlx5_vdpa_event_qp_destroy(&virtq->eqp);
+		pthread_mutex_unlock(&virtq->virtq_lock);
 	}
 }
 
+
 static int
 mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 {
 	int ret = -EAGAIN;
 
-	if (rte_intr_fd_get(virtq->intr_handle) >= 0) {
-		while (ret == -EAGAIN) {
-			ret = rte_intr_callback_unregister(virtq->intr_handle,
-					mlx5_vdpa_virtq_kick_handler, virtq);
-			if (ret == -EAGAIN) {
-				DRV_LOG(DEBUG, "Try again to unregister fd %d of virtq %hu interrupt",
-					rte_intr_fd_get(virtq->intr_handle),
-					virtq->index);
-				usleep(MLX5_VDPA_INTR_RETRIES_USEC);
-			}
-		}
-		rte_intr_fd_set(virtq->intr_handle, -1);
-	}
-	rte_intr_instance_free(virtq->intr_handle);
+	mlx5_vdpa_virtq_unregister_intr_handle(virtq);
 	if (virtq->configured) {
 		ret = mlx5_vdpa_virtq_stop(virtq->priv, virtq->index);
 		if (ret)
@@ -128,10 +154,15 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 void
 mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_virtq *virtq;
 	int i;
 
-	for (i = 0; i < priv->nr_virtqs; i++)
-		mlx5_vdpa_virtq_unset(&priv->virtqs[i]);
+	for (i = 0; i < priv->nr_virtqs; i++) {
+		virtq = &priv->virtqs[i];
+		pthread_mutex_lock(&virtq->virtq_lock);
+		mlx5_vdpa_virtq_unset(virtq);
+		pthread_mutex_unlock(&virtq->virtq_lock);
+	}
 	priv->features = 0;
 	priv->nr_virtqs = 0;
 }
@@ -250,7 +281,7 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 	MLX5_VIRTQ_EVENT_MODE_QP : MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
 	if (attr->event_mode == MLX5_VIRTQ_EVENT_MODE_QP) {
 		ret = mlx5_vdpa_event_qp_prepare(priv,
-				vq->size, vq->callfd, &virtq->eqp);
+				vq->size, vq->callfd, virtq);
 		if (ret) {
 			DRV_LOG(ERR,
 				"Failed to create event QPs for virtq %d.",
@@ -420,7 +451,9 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 	}
 	claim_zero(rte_vhost_enable_guest_notification(priv->vid, index, 1));
 	virtq->configured = 1;
+	rte_spinlock_lock(&priv->db_lock);
 	rte_write32(virtq->index, priv->virtq_db_addr);
+	rte_spinlock_unlock(&priv->db_lock);
 	/* Setup doorbell mapping. */
 	virtq->intr_handle =
 		rte_intr_instance_alloc(RTE_INTR_INSTANCE_F_SHARED);
@@ -441,7 +474,7 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 		if (rte_intr_callback_register(virtq->intr_handle,
 					       mlx5_vdpa_virtq_kick_handler,
 					       virtq)) {
-			rte_intr_fd_set(virtq->intr_handle, -1);
+			(void)rte_intr_fd_set(virtq->intr_handle, -1);
 			DRV_LOG(ERR, "Failed to register virtq %d interrupt.",
 				index);
 			goto error;
@@ -537,6 +570,7 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 	uint32_t i;
 	uint16_t nr_vring = rte_vhost_get_vring_num(priv->vid);
 	int ret = rte_vhost_get_negotiated_features(priv->vid, &priv->features);
+	struct mlx5_vdpa_virtq *virtq;
 
 	if (ret || mlx5_vdpa_features_validate(priv)) {
 		DRV_LOG(ERR, "Failed to configure negotiated features.");
@@ -556,9 +590,17 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 		return -1;
 	}
 	priv->nr_virtqs = nr_vring;
-	for (i = 0; i < nr_vring; i++)
-		if (priv->virtqs[i].enable && mlx5_vdpa_virtq_setup(priv, i))
-			goto error;
+	for (i = 0; i < nr_vring; i++) {
+		virtq = &priv->virtqs[i];
+		if (virtq->enable) {
+			pthread_mutex_lock(&virtq->virtq_lock);
+			if (mlx5_vdpa_virtq_setup(priv, i)) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				goto error;
+			}
+			pthread_mutex_unlock(&virtq->virtq_lock);
+		}
+	}
 	return 0;
 error:
 	mlx5_vdpa_virtqs_release(priv);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v4 08/15] vdpa/mlx5: add multi-thread management for configuration
  2022-06-18  9:02 ` [PATCH v4 00/15] mlx5/vdpa: optimize live migration time Li Zhang
                     ` (6 preceding siblings ...)
  2022-06-18  9:02   ` [PATCH v4 07/15] vdpa/mlx5: optimize datapath-control synchronization Li Zhang
@ 2022-06-18  9:02   ` Li Zhang
  2022-06-20 10:57     ` Maxime Coquelin
  2022-06-18  9:02   ` [PATCH v4 09/15] vdpa/mlx5: add task ring for MT management Li Zhang
                     ` (7 subsequent siblings)
  15 siblings, 1 reply; 137+ messages in thread
From: Li Zhang @ 2022-06-18  9:02 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs
  Cc: dev, thomas, rasland, roniba, maxime.coquelin

The LM process includes many object creations and destructions on the
source and the destination servers.
The longer the LM takes, the more VM packets are dropped.
To improve the LM time, the configurations toward the mlx5 FW need to
be parallelized, so add internal multi-thread management in the driver
for it.

A new devarg defines the number of threads and their CPU.
The management is shared between all the devices of the driver.
Since the event_core also affects the datapath event thread, reduce
the priority of the datapath event thread to allow fast configuration
of the devices doing the LM.
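
The devarg is passed together with the other device arguments, for
example -a <BDF>,class=vdpa,event_core=2,max_conf_threads=8 (the
probing string here is only illustrative). A condensed sketch of how
the shared thread pool is tied to the device lifetime, taken from the
probe/release hunks below (error handling omitted):

    /* Probe: the first device asking for threads creates the pool. */
    if (priv->use_c_thread) {
        if (conf_thread_mng.initializer_priv == priv)
            if (mlx5_vdpa_mult_threads_create(priv->event_core))
                goto error;
        __atomic_fetch_add(&conf_thread_mng.refcnt, 1, __ATOMIC_RELAXED);
    }

    /* Release: the last device referencing the pool tears it down. */
    if (priv->use_c_thread)
        if (__atomic_fetch_sub(&conf_thread_mng.refcnt, 1,
                               __ATOMIC_RELAXED) == 1)
            mlx5_vdpa_mult_threads_destroy(true);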

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 doc/guides/vdpadevs/mlx5.rst          |  11 +++
 drivers/vdpa/mlx5/meson.build         |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.c         |  41 ++++++++
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  36 +++++++
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 129 ++++++++++++++++++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_event.c   |   2 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   |   8 +-
 7 files changed, 223 insertions(+), 5 deletions(-)
 create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c

diff --git a/doc/guides/vdpadevs/mlx5.rst b/doc/guides/vdpadevs/mlx5.rst
index 0ad77bf535..b75a01688d 100644
--- a/doc/guides/vdpadevs/mlx5.rst
+++ b/doc/guides/vdpadevs/mlx5.rst
@@ -78,6 +78,17 @@ for an additional list of options shared with other mlx5 drivers.
   CPU core number to set polling thread affinity to, default to control plane
   cpu.
 
+- ``max_conf_threads`` parameter [int]
+
+  Allow the driver to use internal threads to obtain fast configuration.
+  All the threads will be open on the same core of the event completion queue scheduling thread.
+
+  - 0, default, don't use internal threads for configuration.
+
+  - 1 - 256, number of internal threads in addition to the caller thread (8 is suggested).
+    This value, if not 0, should be the same for all the devices;
+    the first prob will take it with the event_core for all the multi-thread configurations in the driver.
+
 - ``hw_latency_mode`` parameter [int]
 
   The completion queue moderation mode:
diff --git a/drivers/vdpa/mlx5/meson.build b/drivers/vdpa/mlx5/meson.build
index 0fa82ad257..9d8dbb1a82 100644
--- a/drivers/vdpa/mlx5/meson.build
+++ b/drivers/vdpa/mlx5/meson.build
@@ -15,6 +15,7 @@ sources = files(
         'mlx5_vdpa_virtq.c',
         'mlx5_vdpa_steer.c',
         'mlx5_vdpa_lm.c',
+        'mlx5_vdpa_cthread.c',
 )
 cflags_options = [
         '-std=c11',
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index e5a11f72fd..a9d023ed08 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -50,6 +50,8 @@ TAILQ_HEAD(mlx5_vdpa_privs, mlx5_vdpa_priv) priv_list =
 					      TAILQ_HEAD_INITIALIZER(priv_list);
 static pthread_mutex_t priv_list_lock = PTHREAD_MUTEX_INITIALIZER;
 
+struct mlx5_vdpa_conf_thread_mng conf_thread_mng;
+
 static void mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv);
 
 static struct mlx5_vdpa_priv *
@@ -493,6 +495,29 @@ mlx5_vdpa_args_check_handler(const char *key, const char *val, void *opaque)
 			DRV_LOG(WARNING, "Invalid event_core %s.", val);
 		else
 			priv->event_core = tmp;
+	} else if (strcmp(key, "max_conf_threads") == 0) {
+		if (tmp) {
+			priv->use_c_thread = true;
+			if (!conf_thread_mng.initializer_priv) {
+				conf_thread_mng.initializer_priv = priv;
+				if (tmp > MLX5_VDPA_MAX_C_THRD) {
+					DRV_LOG(WARNING,
+				"Invalid max_conf_threads %s "
+				"and set max_conf_threads to %d",
+				val, MLX5_VDPA_MAX_C_THRD);
+					tmp = MLX5_VDPA_MAX_C_THRD;
+				}
+				conf_thread_mng.max_thrds = tmp;
+			} else if (tmp != conf_thread_mng.max_thrds) {
+				DRV_LOG(WARNING,
+	"max_conf_threads is PMD argument and not per device, "
+	"only the first device configuration set it, current value is %d "
+	"and will not be changed to %d.",
+				conf_thread_mng.max_thrds, (int)tmp);
+			}
+		} else {
+			priv->use_c_thread = false;
+		}
 	} else if (strcmp(key, "hw_latency_mode") == 0) {
 		priv->hw_latency_mode = (uint32_t)tmp;
 	} else if (strcmp(key, "hw_max_latency_us") == 0) {
@@ -521,6 +546,9 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 		"hw_max_latency_us",
 		"hw_max_pending_comp",
 		"no_traffic_time",
+		"queue_size",
+		"queues",
+		"max_conf_threads",
 		NULL,
 	};
 
@@ -725,6 +753,13 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 	pthread_mutex_init(&priv->steer_update_lock, NULL);
 	priv->cdev = cdev;
 	mlx5_vdpa_config_get(mkvlist, priv);
+	if (priv->use_c_thread) {
+		if (conf_thread_mng.initializer_priv == priv)
+			if (mlx5_vdpa_mult_threads_create(priv->event_core))
+				goto error;
+		__atomic_fetch_add(&conf_thread_mng.refcnt, 1,
+			__ATOMIC_RELAXED);
+	}
 	if (mlx5_vdpa_create_dev_resources(priv))
 		goto error;
 	priv->vdev = rte_vdpa_register_device(cdev->dev, &mlx5_vdpa_ops);
@@ -739,6 +774,8 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 	pthread_mutex_unlock(&priv_list_lock);
 	return 0;
 error:
+	if (conf_thread_mng.initializer_priv == priv)
+		mlx5_vdpa_mult_threads_destroy(false);
 	if (priv)
 		mlx5_vdpa_dev_release(priv);
 	return -rte_errno;
@@ -806,6 +843,10 @@ mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv)
 	mlx5_vdpa_release_dev_resources(priv);
 	if (priv->vdev)
 		rte_vdpa_unregister_device(priv->vdev);
+	if (priv->use_c_thread)
+		if (__atomic_fetch_sub(&conf_thread_mng.refcnt,
+			1, __ATOMIC_RELAXED) == 1)
+			mlx5_vdpa_mult_threads_destroy(true);
 	rte_free(priv);
 }
 
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 3fd5eefc5e..4e7c2557b7 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -73,6 +73,22 @@ enum {
 	MLX5_VDPA_NOTIFIER_STATE_ERR
 };
 
+#define MLX5_VDPA_MAX_C_THRD 256
+
+/* Generic mlx5_vdpa_c_thread information. */
+struct mlx5_vdpa_c_thread {
+	pthread_t tid;
+};
+
+struct mlx5_vdpa_conf_thread_mng {
+	void *initializer_priv;
+	uint32_t refcnt;
+	uint32_t max_thrds;
+	pthread_mutex_t cthrd_lock;
+	struct mlx5_vdpa_c_thread cthrd[MLX5_VDPA_MAX_C_THRD];
+};
+extern struct mlx5_vdpa_conf_thread_mng conf_thread_mng;
+
 struct mlx5_vdpa_virtq {
 	SLIST_ENTRY(mlx5_vdpa_virtq) next;
 	uint8_t enable;
@@ -126,6 +142,7 @@ enum mlx5_dev_state {
 struct mlx5_vdpa_priv {
 	TAILQ_ENTRY(mlx5_vdpa_priv) next;
 	bool connected;
+	bool use_c_thread;
 	enum mlx5_dev_state state;
 	rte_spinlock_t db_lock;
 	pthread_mutex_t steer_update_lock;
@@ -496,4 +513,23 @@ mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv);
 
 bool
 mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv);
+
+/**
+ * Create configuration multi-threads resource
+ *
+ * @param[in] cpu_core
+ *   CPU core number to set configuration threads affinity to.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int
+mlx5_vdpa_mult_threads_create(int cpu_core);
+
+/**
+ * Destroy configuration multi-threads resource
+ *
+ */
+void
+mlx5_vdpa_mult_threads_destroy(bool need_unlock);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
new file mode 100644
index 0000000000..ba7d8b63b3
--- /dev/null
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -0,0 +1,129 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 NVIDIA Corporation & Affiliates
+ */
+#include <string.h>
+#include <unistd.h>
+#include <sys/eventfd.h>
+
+#include <rte_malloc.h>
+#include <rte_errno.h>
+#include <rte_io.h>
+#include <rte_alarm.h>
+#include <rte_tailq.h>
+#include <rte_ring_elem.h>
+
+#include <mlx5_common.h>
+
+#include "mlx5_vdpa_utils.h"
+#include "mlx5_vdpa.h"
+
+static void *
+mlx5_vdpa_c_thread_handle(void *arg)
+{
+	/* To be added later. */
+	return arg;
+}
+
+static void
+mlx5_vdpa_c_thread_destroy(uint32_t thrd_idx, bool need_unlock)
+{
+	if (conf_thread_mng.cthrd[thrd_idx].tid) {
+		pthread_cancel(conf_thread_mng.cthrd[thrd_idx].tid);
+		pthread_join(conf_thread_mng.cthrd[thrd_idx].tid, NULL);
+		conf_thread_mng.cthrd[thrd_idx].tid = 0;
+		if (need_unlock)
+			pthread_mutex_init(&conf_thread_mng.cthrd_lock, NULL);
+	}
+}
+
+static int
+mlx5_vdpa_c_thread_create(int cpu_core)
+{
+	const struct sched_param sp = {
+		.sched_priority = sched_get_priority_max(SCHED_RR),
+	};
+	rte_cpuset_t cpuset;
+	pthread_attr_t attr;
+	uint32_t thrd_idx;
+	char name[32];
+	int ret;
+
+	pthread_mutex_lock(&conf_thread_mng.cthrd_lock);
+	pthread_attr_init(&attr);
+	ret = pthread_attr_setschedpolicy(&attr, SCHED_RR);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to set thread sched policy = RR.");
+		goto c_thread_err;
+	}
+	ret = pthread_attr_setschedparam(&attr, &sp);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to set thread priority.");
+		goto c_thread_err;
+	}
+	for (thrd_idx = 0; thrd_idx < conf_thread_mng.max_thrds;
+		thrd_idx++) {
+		ret = pthread_create(&conf_thread_mng.cthrd[thrd_idx].tid,
+				&attr, mlx5_vdpa_c_thread_handle,
+				(void *)&conf_thread_mng);
+		if (ret) {
+			DRV_LOG(ERR, "Failed to create vdpa multi-threads %d.",
+					thrd_idx);
+			goto c_thread_err;
+		}
+		CPU_ZERO(&cpuset);
+		if (cpu_core != -1)
+			CPU_SET(cpu_core, &cpuset);
+		else
+			cpuset = rte_lcore_cpuset(rte_get_main_lcore());
+		ret = pthread_setaffinity_np(
+				conf_thread_mng.cthrd[thrd_idx].tid,
+				sizeof(cpuset), &cpuset);
+		if (ret) {
+			DRV_LOG(ERR, "Failed to set thread affinity for "
+			"vdpa multi-threads %d.", thrd_idx);
+			goto c_thread_err;
+		}
+		snprintf(name, sizeof(name), "vDPA-mthread-%d", thrd_idx);
+		ret = pthread_setname_np(
+				conf_thread_mng.cthrd[thrd_idx].tid, name);
+		if (ret)
+			DRV_LOG(ERR, "Failed to set vdpa multi-threads name %s.",
+					name);
+		else
+			DRV_LOG(DEBUG, "Thread name: %s.", name);
+	}
+	pthread_mutex_unlock(&conf_thread_mng.cthrd_lock);
+	return 0;
+c_thread_err:
+	for (thrd_idx = 0; thrd_idx < conf_thread_mng.max_thrds;
+		thrd_idx++)
+		mlx5_vdpa_c_thread_destroy(thrd_idx, false);
+	pthread_mutex_unlock(&conf_thread_mng.cthrd_lock);
+	return -1;
+}
+
+int
+mlx5_vdpa_mult_threads_create(int cpu_core)
+{
+	pthread_mutex_init(&conf_thread_mng.cthrd_lock, NULL);
+	if (mlx5_vdpa_c_thread_create(cpu_core)) {
+		DRV_LOG(ERR, "Cannot create vDPA configuration threads.");
+		mlx5_vdpa_mult_threads_destroy(false);
+		return -1;
+	}
+	return 0;
+}
+
+void
+mlx5_vdpa_mult_threads_destroy(bool need_unlock)
+{
+	uint32_t thrd_idx;
+
+	if (!conf_thread_mng.initializer_priv)
+		return;
+	for (thrd_idx = 0; thrd_idx < conf_thread_mng.max_thrds;
+		thrd_idx++)
+		mlx5_vdpa_c_thread_destroy(thrd_idx, need_unlock);
+	pthread_mutex_destroy(&conf_thread_mng.cthrd_lock);
+	memset(&conf_thread_mng, 0, sizeof(struct mlx5_vdpa_conf_thread_mng));
+}
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index 2b0f5936d1..b45fbac146 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -507,7 +507,7 @@ mlx5_vdpa_cqe_event_setup(struct mlx5_vdpa_priv *priv)
 	pthread_attr_t attr;
 	char name[16];
 	const struct sched_param sp = {
-		.sched_priority = sched_get_priority_max(SCHED_RR),
+		.sched_priority = sched_get_priority_max(SCHED_RR) - 1,
 	};
 
 	if (!priv->eventc)
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 63b6f44725..ce3f524fdb 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -43,7 +43,7 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 			    errno == EWOULDBLOCK ||
 			    errno == EAGAIN)
 				continue;
-			DRV_LOG(ERR,  "Failed to read kickfd of virtq %d: %s",
+			DRV_LOG(ERR,  "Failed to read kickfd of virtq %d: %s.",
 				virtq->index, strerror(errno));
 		}
 		break;
@@ -57,7 +57,7 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 	rte_spinlock_unlock(&priv->db_lock);
 	pthread_mutex_unlock(&virtq->virtq_lock);
 	if (priv->state != MLX5_VDPA_STATE_CONFIGURED && !virtq->enable) {
-		DRV_LOG(ERR,  "device %d queue %d down, skip kick handling",
+		DRV_LOG(ERR,  "device %d queue %d down, skip kick handling.",
 			priv->vid, virtq->index);
 		return;
 	}
@@ -218,7 +218,7 @@ mlx5_vdpa_virtq_query(struct mlx5_vdpa_priv *priv, int index)
 		return -1;
 	}
 	if (attr.state == MLX5_VIRTQ_STATE_ERROR)
-		DRV_LOG(WARNING, "vid %d vring %d hw error=%hhu",
+		DRV_LOG(WARNING, "vid %d vring %d hw error=%hhu.",
 			priv->vid, index, attr.error_type);
 	return 0;
 }
@@ -380,7 +380,7 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 	if (ret) {
 		last_avail_idx = 0;
 		last_used_idx = 0;
-		DRV_LOG(WARNING, "Couldn't get vring base, idx are set to 0");
+		DRV_LOG(WARNING, "Couldn't get vring base, idx are set to 0.");
 	} else {
 		DRV_LOG(INFO, "vid %d: Init last_avail_idx=%d, last_used_idx=%d for "
 				"virtq %d.", priv->vid, last_avail_idx,
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v4 09/15] vdpa/mlx5: add task ring for MT management
  2022-06-18  9:02 ` [PATCH v4 00/15] mlx5/vdpa: optimize live migration time Li Zhang
                     ` (7 preceding siblings ...)
  2022-06-18  9:02   ` [PATCH v4 08/15] vdpa/mlx5: add multi-thread management for configuration Li Zhang
@ 2022-06-18  9:02   ` Li Zhang
  2022-06-20 15:05     ` Maxime Coquelin
  2022-06-18  9:02   ` [PATCH v4 10/15] vdpa/mlx5: add MT task for VM memory registration Li Zhang
                     ` (6 subsequent siblings)
  15 siblings, 1 reply; 137+ messages in thread
From: Li Zhang @ 2022-06-18  9:02 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs
  Cc: dev, thomas, rasland, roniba, maxime.coquelin

The configuration thread tasks need a container to
support multiple tasks assigned to a thread in parallel.
Use an rte_ring container per thread to manage
the thread tasks without locks.
The caller thread from the user context opens a task to
a thread and enqueues it to the thread's ring.
The thread polls its ring and dequeues tasks.
That's why the ring should be in multi-producer
and single-consumer mode.
An atomic counter manages the task completion notification.
The threads report errors to the caller through
a dedicated error counter per task.
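
A minimal sketch of that completion handshake, using plain C11 atomics
and a single worker instead of the driver's per-thread rte_ring (the
structure and function names below are illustrative only, not the
driver's code):

/*
 * Sketch: the caller bumps "remaining" once per task, the worker
 * decrements it when a task is done and bumps "err" on failure,
 * and the caller spins until everything has completed.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

#define NUM_TASKS 4

struct task {
	int idx;
	atomic_uint *remaining_cnt;
	atomic_uint *err_cnt;
};

static struct task tasks[NUM_TASKS];

static void *
worker(void *arg)
{
	struct task *t = arg;
	int i;

	for (i = 0; i < NUM_TASKS; i++) {
		/* Do the work for t[i]; report failures via err_cnt. */
		if (t[i].idx < 0)
			atomic_fetch_add(t[i].err_cnt, 1);
		/* Notify completion of this task. */
		atomic_fetch_sub(t[i].remaining_cnt, 1);
	}
	return NULL;
}

int
main(void)
{
	atomic_uint remaining_cnt = 0, err_cnt = 0;
	pthread_t tid;
	int i;

	for (i = 0; i < NUM_TASKS; i++) {
		tasks[i] = (struct task){ i, &remaining_cnt, &err_cnt };
		atomic_fetch_add(&remaining_cnt, 1);
	}
	pthread_create(&tid, NULL, worker, tasks);
	/* Caller waits for all tasks, then checks the error counter. */
	while (atomic_load(&remaining_cnt) != 0)
		usleep(100);
	printf("done, errors=%u\n", atomic_load(&err_cnt));
	pthread_join(tid, NULL);
	return 0;
}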

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  17 ++++
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 115 +++++++++++++++++++++++++-
 2 files changed, 130 insertions(+), 2 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 4e7c2557b7..2bbb868ec6 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -74,10 +74,22 @@ enum {
 };
 
 #define MLX5_VDPA_MAX_C_THRD 256
+#define MLX5_VDPA_MAX_TASKS_PER_THRD 4096
+#define MLX5_VDPA_TASKS_PER_DEV 64
+
+/* Generic task information and size must be multiple of 4B. */
+struct mlx5_vdpa_task {
+	struct mlx5_vdpa_priv *priv;
+	uint32_t *remaining_cnt;
+	uint32_t *err_cnt;
+	uint32_t idx;
+} __rte_packed __rte_aligned(4);
 
 /* Generic mlx5_vdpa_c_thread information. */
 struct mlx5_vdpa_c_thread {
 	pthread_t tid;
+	struct rte_ring *rng;
+	pthread_cond_t c_cond;
 };
 
 struct mlx5_vdpa_conf_thread_mng {
@@ -532,4 +544,9 @@ mlx5_vdpa_mult_threads_create(int cpu_core);
  */
 void
 mlx5_vdpa_mult_threads_destroy(bool need_unlock);
+
+bool
+mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
+		uint32_t thrd_idx,
+		uint32_t num);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index ba7d8b63b3..1fdc92d3ad 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -11,17 +11,103 @@
 #include <rte_alarm.h>
 #include <rte_tailq.h>
 #include <rte_ring_elem.h>
+#include <rte_ring_peek.h>
 
 #include <mlx5_common.h>
 
 #include "mlx5_vdpa_utils.h"
 #include "mlx5_vdpa.h"
 
+static inline uint32_t
+mlx5_vdpa_c_thrd_ring_dequeue_bulk(struct rte_ring *r,
+	void **obj, uint32_t n, uint32_t *avail)
+{
+	uint32_t m;
+
+	m = rte_ring_dequeue_bulk_elem_start(r, obj,
+		sizeof(struct mlx5_vdpa_task), n, avail);
+	n = (m == n) ? n : 0;
+	rte_ring_dequeue_elem_finish(r, n);
+	return n;
+}
+
+static inline uint32_t
+mlx5_vdpa_c_thrd_ring_enqueue_bulk(struct rte_ring *r,
+	void * const *obj, uint32_t n, uint32_t *free)
+{
+	uint32_t m;
+
+	m = rte_ring_enqueue_bulk_elem_start(r, n, free);
+	n = (m == n) ? n : 0;
+	rte_ring_enqueue_elem_finish(r, obj,
+		sizeof(struct mlx5_vdpa_task), n);
+	return n;
+}
+
+bool
+mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
+		uint32_t thrd_idx,
+		uint32_t num)
+{
+	struct rte_ring *rng = conf_thread_mng.cthrd[thrd_idx].rng;
+	struct mlx5_vdpa_task task[MLX5_VDPA_TASKS_PER_DEV];
+	uint32_t i;
+
+	MLX5_ASSERT(num <= MLX5_VDPA_TASKS_PER_DEV);
+	for (i = 0 ; i < num; i++) {
+		task[i].priv = priv;
+		/* To be added later. */
+	}
+	if (!mlx5_vdpa_c_thrd_ring_enqueue_bulk(rng, (void **)&task, num, NULL))
+		return -1;
+	for (i = 0 ; i < num; i++)
+		if (task[i].remaining_cnt)
+			__atomic_fetch_add(task[i].remaining_cnt, 1,
+				__ATOMIC_RELAXED);
+	/* wake up conf thread. */
+	pthread_mutex_lock(&conf_thread_mng.cthrd_lock);
+	pthread_cond_signal(&conf_thread_mng.cthrd[thrd_idx].c_cond);
+	pthread_mutex_unlock(&conf_thread_mng.cthrd_lock);
+	return 0;
+}
+
 static void *
 mlx5_vdpa_c_thread_handle(void *arg)
 {
-	/* To be added later. */
-	return arg;
+	struct mlx5_vdpa_conf_thread_mng *multhrd = arg;
+	pthread_t thread_id = pthread_self();
+	struct mlx5_vdpa_priv *priv;
+	struct mlx5_vdpa_task task;
+	struct rte_ring *rng;
+	uint32_t thrd_idx;
+	uint32_t task_num;
+
+	for (thrd_idx = 0; thrd_idx < multhrd->max_thrds;
+		thrd_idx++)
+		if (multhrd->cthrd[thrd_idx].tid == thread_id)
+			break;
+	if (thrd_idx >= multhrd->max_thrds)
+		return NULL;
+	rng = multhrd->cthrd[thrd_idx].rng;
+	while (1) {
+		task_num = mlx5_vdpa_c_thrd_ring_dequeue_bulk(rng,
+			(void **)&task, 1, NULL);
+		if (!task_num) {
+			/* No task and condition wait. */
+			pthread_mutex_lock(&multhrd->cthrd_lock);
+			pthread_cond_wait(
+				&multhrd->cthrd[thrd_idx].c_cond,
+				&multhrd->cthrd_lock);
+			pthread_mutex_unlock(&multhrd->cthrd_lock);
+		}
+		priv = task.priv;
+		if (priv == NULL)
+			continue;
+		__atomic_fetch_sub(task.remaining_cnt,
+			1, __ATOMIC_RELAXED);
+		/* To be added later. */
+	}
+	return NULL;
 }
 
 static void
@@ -34,6 +120,10 @@ mlx5_vdpa_c_thread_destroy(uint32_t thrd_idx, bool need_unlock)
 		if (need_unlock)
 			pthread_mutex_init(&conf_thread_mng.cthrd_lock, NULL);
 	}
+	if (conf_thread_mng.cthrd[thrd_idx].rng) {
+		rte_ring_free(conf_thread_mng.cthrd[thrd_idx].rng);
+		conf_thread_mng.cthrd[thrd_idx].rng = NULL;
+	}
 }
 
 static int
@@ -45,6 +135,7 @@ mlx5_vdpa_c_thread_create(int cpu_core)
 	rte_cpuset_t cpuset;
 	pthread_attr_t attr;
 	uint32_t thrd_idx;
+	uint32_t ring_num;
 	char name[32];
 	int ret;
 
@@ -60,8 +151,26 @@ mlx5_vdpa_c_thread_create(int cpu_core)
 		DRV_LOG(ERR, "Failed to set thread priority.");
 		goto c_thread_err;
 	}
+	ring_num = MLX5_VDPA_MAX_TASKS_PER_THRD / conf_thread_mng.max_thrds;
+	if (!ring_num) {
+		DRV_LOG(ERR, "Invalid ring number for thread.");
+		goto c_thread_err;
+	}
 	for (thrd_idx = 0; thrd_idx < conf_thread_mng.max_thrds;
 		thrd_idx++) {
+		snprintf(name, sizeof(name), "vDPA-mthread-ring-%d",
+			thrd_idx);
+		conf_thread_mng.cthrd[thrd_idx].rng = rte_ring_create_elem(name,
+			sizeof(struct mlx5_vdpa_task), ring_num,
+			rte_socket_id(),
+			RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ |
+			RING_F_EXACT_SZ);
+		if (!conf_thread_mng.cthrd[thrd_idx].rng) {
+			DRV_LOG(ERR,
+			"Failed to create vdpa multi-threads %d ring.",
+			thrd_idx);
+			goto c_thread_err;
+		}
 		ret = pthread_create(&conf_thread_mng.cthrd[thrd_idx].tid,
 				&attr, mlx5_vdpa_c_thread_handle,
 				(void *)&conf_thread_mng);
@@ -91,6 +200,8 @@ mlx5_vdpa_c_thread_create(int cpu_core)
 					name);
 		else
 			DRV_LOG(DEBUG, "Thread name: %s.", name);
+		pthread_cond_init(&conf_thread_mng.cthrd[thrd_idx].c_cond,
+			NULL);
 	}
 	pthread_mutex_unlock(&conf_thread_mng.cthrd_lock);
 	return 0;
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v4 10/15] vdpa/mlx5: add MT task for VM memory registration
  2022-06-18  9:02 ` [PATCH v4 00/15] mlx5/vdpa: optimize live migration time Li Zhang
                     ` (8 preceding siblings ...)
  2022-06-18  9:02   ` [PATCH v4 09/15] vdpa/mlx5: add task ring for MT management Li Zhang
@ 2022-06-18  9:02   ` Li Zhang
  2022-06-20 15:12     ` Maxime Coquelin
  2022-06-18  9:02   ` [PATCH v4 11/15] vdpa/mlx5: add virtq creation task for MT management Li Zhang
                     ` (5 subsequent siblings)
  15 siblings, 1 reply; 137+ messages in thread
From: Li Zhang @ 2022-06-18  9:02 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs
  Cc: dev, thomas, rasland, roniba, maxime.coquelin

The driver creates a direct MR object in
the HW for each VM memory region,
which maps the VM physical address to
the actual physical address.

Later, after all the MRs are ready,
the driver creates an indirect MR to group all the direct MRs
into one virtual space from the HW perspective.

Create direct MRs in parallel using the MT mechanism.
After completion, the primary thread creates the indirect MR
needed for the following virtq configurations.

This optimization accelerates the LM process and
reduces its time by 5%.
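
The split between the caller thread and the workers follows a simple
round-robin policy, sketched below under assumed names (split_work(),
task_add() and do_work() are illustrative placeholders, not the
driver's functions): every (max_thrds + 1)-th item stays with the
caller, the rest are spread round-robin over the worker threads, and
anything that fails to enqueue falls back to the caller as well.

static unsigned int last_thrd;

static int
split_work(unsigned int n, unsigned int max_thrds,
	   int (*task_add)(unsigned int thrd, unsigned int item),
	   int (*do_work)(unsigned int item))
{
	unsigned int own[n];	/* Items the caller handles itself. */
	unsigned int own_num = 0, i, thrd;

	for (i = 0; i < n; i++) {
		/* Every (max_thrds + 1)-th item stays with the caller. */
		if (i % (max_thrds + 1) == 0) {
			own[own_num++] = i;
			continue;
		}
		thrd = (last_thrd + 1) % max_thrds;
		last_thrd = thrd;
		/* Enqueue to worker "thrd"; fall back to caller on failure. */
		if (task_add(thrd, i))
			own[own_num++] = i;
	}
	for (i = 0; i < own_num; i++)
		if (do_work(own[i]))
			return -1;
	return 0;
}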

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c         |   1 -
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  31 ++-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c |  47 ++++-
 drivers/vdpa/mlx5/mlx5_vdpa_mem.c     | 270 ++++++++++++++++++--------
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   |   6 +-
 5 files changed, 258 insertions(+), 97 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index a9d023ed08..e3b32fa087 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -768,7 +768,6 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 		rte_errno = rte_errno ? rte_errno : EINVAL;
 		goto error;
 	}
-	SLIST_INIT(&priv->mr_list);
 	pthread_mutex_lock(&priv_list_lock);
 	TAILQ_INSERT_TAIL(&priv_list, priv, next);
 	pthread_mutex_unlock(&priv_list_lock);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 2bbb868ec6..3316ce42be 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -59,7 +59,6 @@ struct mlx5_vdpa_event_qp {
 };
 
 struct mlx5_vdpa_query_mr {
-	SLIST_ENTRY(mlx5_vdpa_query_mr) next;
 	union {
 		struct ibv_mr *mr;
 		struct mlx5_devx_obj *mkey;
@@ -76,10 +75,17 @@ enum {
 #define MLX5_VDPA_MAX_C_THRD 256
 #define MLX5_VDPA_MAX_TASKS_PER_THRD 4096
 #define MLX5_VDPA_TASKS_PER_DEV 64
+#define MLX5_VDPA_MAX_MRS 0xFFFF
+
+/* Vdpa task types. */
+enum mlx5_vdpa_task_type {
+	MLX5_VDPA_TASK_REG_MR = 1,
+};
 
 /* Generic task information and size must be multiple of 4B. */
 struct mlx5_vdpa_task {
 	struct mlx5_vdpa_priv *priv;
+	enum mlx5_vdpa_task_type type;
 	uint32_t *remaining_cnt;
 	uint32_t *err_cnt;
 	uint32_t idx;
@@ -101,6 +107,14 @@ struct mlx5_vdpa_conf_thread_mng {
 };
 extern struct mlx5_vdpa_conf_thread_mng conf_thread_mng;
 
+struct mlx5_vdpa_vmem_info {
+	struct rte_vhost_memory *vmem;
+	uint32_t entries_num;
+	uint64_t gcd;
+	uint64_t size;
+	uint8_t mode;
+};
+
 struct mlx5_vdpa_virtq {
 	SLIST_ENTRY(mlx5_vdpa_virtq) next;
 	uint8_t enable;
@@ -176,7 +190,7 @@ struct mlx5_vdpa_priv {
 	struct mlx5_hca_vdpa_attr caps;
 	uint32_t gpa_mkey_index;
 	struct ibv_mr *null_mr;
-	struct rte_vhost_memory *vmem;
+	struct mlx5_vdpa_vmem_info vmem_info;
 	struct mlx5dv_devx_event_channel *eventc;
 	struct mlx5dv_devx_event_channel *err_chnl;
 	struct mlx5_uar uar;
@@ -187,11 +201,13 @@ struct mlx5_vdpa_priv {
 	uint8_t num_lag_ports;
 	uint64_t features; /* Negotiated features. */
 	uint16_t log_max_rqt_size;
+	uint16_t last_c_thrd_idx;
+	uint16_t num_mrs; /* Number of memory regions. */
 	struct mlx5_vdpa_steer steer;
 	struct mlx5dv_var *var;
 	void *virtq_db_addr;
 	struct mlx5_pmd_wrapped_mr lm_mr;
-	SLIST_HEAD(mr_list, mlx5_vdpa_query_mr) mr_list;
+	struct mlx5_vdpa_query_mr **mrs;
 	struct mlx5_vdpa_virtq virtqs[];
 };
 
@@ -548,5 +564,12 @@ mlx5_vdpa_mult_threads_destroy(bool need_unlock);
 bool
 mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
 		uint32_t thrd_idx,
-		uint32_t num);
+		enum mlx5_vdpa_task_type task_type,
+		uint32_t *bulk_refcnt, uint32_t *bulk_err_cnt,
+		void **task_data, uint32_t num);
+int
+mlx5_vdpa_register_mr(struct mlx5_vdpa_priv *priv, uint32_t idx);
+bool
+mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
+		uint32_t *err_cnt, uint32_t sleep_time);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index 1fdc92d3ad..10391931ae 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -47,16 +47,23 @@ mlx5_vdpa_c_thrd_ring_enqueue_bulk(struct rte_ring *r,
 bool
 mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
 		uint32_t thrd_idx,
-		uint32_t num)
+		enum mlx5_vdpa_task_type task_type,
+		uint32_t *remaining_cnt, uint32_t *err_cnt,
+		void **task_data, uint32_t num)
 {
 	struct rte_ring *rng = conf_thread_mng.cthrd[thrd_idx].rng;
 	struct mlx5_vdpa_task task[MLX5_VDPA_TASKS_PER_DEV];
+	uint32_t *data = (uint32_t *)task_data;
 	uint32_t i;
 
 	MLX5_ASSERT(num <= MLX5_VDPA_TASKS_PER_DEV);
 	for (i = 0 ; i < num; i++) {
 		task[i].priv = priv;
 		/* To be added later. */
+		task[i].type = task_type;
+		task[i].remaining_cnt = remaining_cnt;
+		task[i].err_cnt = err_cnt;
+		task[i].idx = data[i];
 	}
 	if (!mlx5_vdpa_c_thrd_ring_enqueue_bulk(rng, (void **)&task, num, NULL))
 		return -1;
@@ -71,6 +78,23 @@ mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
 	return 0;
 }
 
+bool
+mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
+		uint32_t *err_cnt, uint32_t sleep_time)
+{
+	/* Check and wait all tasks done. */
+	while (__atomic_load_n(remaining_cnt,
+		__ATOMIC_RELAXED) != 0) {
+		rte_delay_us_sleep(sleep_time);
+	}
+	if (__atomic_load_n(err_cnt,
+		__ATOMIC_RELAXED)) {
+		DRV_LOG(ERR, "Tasks done with error.");
+		return true;
+	}
+	return false;
+}
+
 static void *
 mlx5_vdpa_c_thread_handle(void *arg)
 {
@@ -81,6 +105,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
 	struct rte_ring *rng;
 	uint32_t thrd_idx;
 	uint32_t task_num;
+	int ret;
 
 	for (thrd_idx = 0; thrd_idx < multhrd->max_thrds;
 		thrd_idx++)
@@ -99,13 +124,29 @@ mlx5_vdpa_c_thread_handle(void *arg)
 				&multhrd->cthrd[thrd_idx].c_cond,
 				&multhrd->cthrd_lock);
 			pthread_mutex_unlock(&multhrd->cthrd_lock);
+			continue;
 		}
 		priv = task.priv;
 		if (priv == NULL)
 			continue;
-		__atomic_fetch_sub(task.remaining_cnt,
+		switch (task.type) {
+		case MLX5_VDPA_TASK_REG_MR:
+			ret = mlx5_vdpa_register_mr(priv, task.idx);
+			if (ret) {
+				DRV_LOG(ERR,
+				"Failed to register mr %d.", task.idx);
+				__atomic_fetch_add(task.err_cnt, 1,
+				__ATOMIC_RELAXED);
+			}
+			break;
+		default:
+			DRV_LOG(ERR, "Invalid vdpa task type %d.",
+			task.type);
+			break;
+		}
+		if (task.remaining_cnt)
+			__atomic_fetch_sub(task.remaining_cnt,
 			1, __ATOMIC_RELAXED);
-		/* To be added later. */
 	}
 	return NULL;
 }
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_mem.c b/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
index d6e3dd664b..e333f0bca6 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
@@ -17,25 +17,33 @@
 void
 mlx5_vdpa_mem_dereg(struct mlx5_vdpa_priv *priv)
 {
+	struct mlx5_vdpa_query_mr *mrs =
+		(struct mlx5_vdpa_query_mr *)priv->mrs;
 	struct mlx5_vdpa_query_mr *entry;
-	struct mlx5_vdpa_query_mr *next;
+	int i;
 
-	entry = SLIST_FIRST(&priv->mr_list);
-	while (entry) {
-		next = SLIST_NEXT(entry, next);
-		if (entry->is_indirect)
-			claim_zero(mlx5_devx_cmd_destroy(entry->mkey));
-		else
-			claim_zero(mlx5_glue->dereg_mr(entry->mr));
-		SLIST_REMOVE(&priv->mr_list, entry, mlx5_vdpa_query_mr, next);
-		rte_free(entry);
-		entry = next;
+	if (priv->mrs) {
+		for (i = priv->num_mrs - 1; i >= 0; i--) {
+			entry = &mrs[i];
+			if (entry->is_indirect) {
+				if (entry->mkey)
+					claim_zero(
+					mlx5_devx_cmd_destroy(entry->mkey));
+			} else {
+				if (entry->mr)
+					claim_zero(
+					mlx5_glue->dereg_mr(entry->mr));
+			}
+		}
+		rte_free(priv->mrs);
+		priv->mrs = NULL;
+		priv->num_mrs = 0;
 	}
-	SLIST_INIT(&priv->mr_list);
-	if (priv->vmem) {
-		free(priv->vmem);
-		priv->vmem = NULL;
+	if (priv->vmem_info.vmem) {
+		free(priv->vmem_info.vmem);
+		priv->vmem_info.vmem = NULL;
 	}
+	priv->gpa_mkey_index = 0;
 }
 
 static int
@@ -167,72 +175,29 @@ mlx5_vdpa_mem_cmp(struct rte_vhost_memory *mem1, struct rte_vhost_memory *mem2)
 #define KLM_SIZE_MAX_ALIGN(sz) ((sz) > MLX5_MAX_KLM_BYTE_COUNT ? \
 				MLX5_MAX_KLM_BYTE_COUNT : (sz))
 
-/*
- * The target here is to group all the physical memory regions of the
- * virtio device in one indirect mkey.
- * For KLM Fixed Buffer Size mode (HW find the translation entry in one
- * read according to the guest physical address):
- * All the sub-direct mkeys of it must be in the same size, hence, each
- * one of them should be in the GCD size of all the virtio memory
- * regions and the holes between them.
- * For KLM mode (each entry may be in different size so HW must iterate
- * the entries):
- * Each virtio memory region and each hole between them have one entry,
- * just need to cover the maximum allowed size(2G) by splitting entries
- * which their associated memory regions are bigger than 2G.
- * It means that each virtio memory region may be mapped to more than
- * one direct mkey in the 2 modes.
- * All the holes of invalid memory between the virtio memory regions
- * will be mapped to the null memory region for security.
- */
-int
-mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
+static int
+mlx5_vdpa_create_indirect_mkey(struct mlx5_vdpa_priv *priv)
 {
 	struct mlx5_devx_mkey_attr mkey_attr;
-	struct mlx5_vdpa_query_mr *entry = NULL;
-	struct rte_vhost_mem_region *reg = NULL;
-	uint8_t mode = 0;
-	uint32_t entries_num = 0;
-	uint32_t i;
-	uint64_t gcd = 0;
+	struct mlx5_vdpa_query_mr *mrs =
+		(struct mlx5_vdpa_query_mr *)priv->mrs;
+	struct mlx5_vdpa_query_mr *entry;
+	struct rte_vhost_mem_region *reg;
+	uint8_t mode = priv->vmem_info.mode;
+	uint32_t entries_num = priv->vmem_info.entries_num;
+	struct rte_vhost_memory *mem = priv->vmem_info.vmem;
+	struct mlx5_klm klm_array[entries_num];
+	uint64_t gcd = priv->vmem_info.gcd;
+	int ret = -rte_errno;
 	uint64_t klm_size;
-	uint64_t mem_size;
-	uint64_t k;
 	int klm_index = 0;
-	int ret;
-	struct rte_vhost_memory *mem = mlx5_vdpa_vhost_mem_regions_prepare
-			      (priv->vid, &mode, &mem_size, &gcd, &entries_num);
-	struct mlx5_klm klm_array[entries_num];
+	uint64_t k;
+	uint32_t i;
 
-	if (!mem)
-		return -rte_errno;
-	if (priv->vmem != NULL) {
-		if (mlx5_vdpa_mem_cmp(mem, priv->vmem) == 0) {
-			/* VM memory not changed, reuse resources. */
-			free(mem);
-			return 0;
-		}
-		mlx5_vdpa_mem_dereg(priv);
-	}
-	priv->vmem = mem;
+	/* If it is the last entry, create indirect mkey. */
 	for (i = 0; i < mem->nregions; i++) {
+		entry = &mrs[i];
 		reg = &mem->regions[i];
-		entry = rte_zmalloc(__func__, sizeof(*entry), 0);
-		if (!entry) {
-			ret = -ENOMEM;
-			DRV_LOG(ERR, "Failed to allocate mem entry memory.");
-			goto error;
-		}
-		entry->mr = mlx5_glue->reg_mr_iova(priv->cdev->pd,
-				       (void *)(uintptr_t)(reg->host_user_addr),
-				       reg->size, reg->guest_phys_addr,
-				       IBV_ACCESS_LOCAL_WRITE);
-		if (!entry->mr) {
-			DRV_LOG(ERR, "Failed to create direct Mkey.");
-			ret = -rte_errno;
-			goto error;
-		}
-		entry->is_indirect = 0;
 		if (i > 0) {
 			uint64_t sadd;
 			uint64_t empty_region_sz = reg->guest_phys_addr -
@@ -265,11 +230,10 @@ mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
 			klm_array[klm_index].address = reg->guest_phys_addr + k;
 			klm_index++;
 		}
-		SLIST_INSERT_HEAD(&priv->mr_list, entry, next);
 	}
 	memset(&mkey_attr, 0, sizeof(mkey_attr));
 	mkey_attr.addr = (uintptr_t)(mem->regions[0].guest_phys_addr);
-	mkey_attr.size = mem_size;
+	mkey_attr.size = priv->vmem_info.size;
 	mkey_attr.pd = priv->cdev->pdn;
 	mkey_attr.umem_id = 0;
 	/* Must be zero for KLM mode. */
@@ -278,25 +242,159 @@ mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
 	mkey_attr.pg_access = 0;
 	mkey_attr.klm_array = klm_array;
 	mkey_attr.klm_num = klm_index;
-	entry = rte_zmalloc(__func__, sizeof(*entry), 0);
-	if (!entry) {
-		DRV_LOG(ERR, "Failed to allocate memory for indirect entry.");
-		ret = -ENOMEM;
-		goto error;
-	}
+	entry = &mrs[mem->nregions];
 	entry->mkey = mlx5_devx_cmd_mkey_create(priv->cdev->ctx, &mkey_attr);
 	if (!entry->mkey) {
 		DRV_LOG(ERR, "Failed to create indirect Mkey.");
-		ret = -rte_errno;
-		goto error;
+		rte_errno = -ret;
+		return ret;
 	}
 	entry->is_indirect = 1;
-	SLIST_INSERT_HEAD(&priv->mr_list, entry, next);
 	priv->gpa_mkey_index = entry->mkey->id;
 	return 0;
+}
+
+/*
+ * The target here is to group all the physical memory regions of the
+ * virtio device in one indirect mkey.
+ * For KLM Fixed Buffer Size mode (HW find the translation entry in one
+ * read according to the guest physical address):
+ * All the sub-direct mkeys of it must be in the same size, hence, each
+ * one of them should be in the GCD size of all the virtio memory
+ * regions and the holes between them.
+ * For KLM mode (each entry may be in different size so HW must iterate
+ * the entries):
+ * Each virtio memory region and each hole between them have one entry,
+ * just need to cover the maximum allowed size(2G) by splitting entries
+ * which their associated memory regions are bigger than 2G.
+ * It means that each virtio memory region may be mapped to more than
+ * one direct mkey in the 2 modes.
+ * All the holes of invalid memory between the virtio memory regions
+ * will be mapped to the null memory region for security.
+ */
+int
+mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
+{
+	void *mrs;
+	uint8_t mode = 0;
+	int ret = -rte_errno;
+	uint32_t i, thrd_idx, data[1];
+	uint32_t remaining_cnt = 0, err_cnt = 0, task_num = 0;
+	struct rte_vhost_memory *mem = mlx5_vdpa_vhost_mem_regions_prepare
+			(priv->vid, &mode, &priv->vmem_info.size,
+			&priv->vmem_info.gcd, &priv->vmem_info.entries_num);
+
+	if (!mem)
+		return -rte_errno;
+	if (priv->vmem_info.vmem != NULL) {
+		if (mlx5_vdpa_mem_cmp(mem, priv->vmem_info.vmem) == 0) {
+			/* VM memory not changed, reuse resources. */
+			free(mem);
+			return 0;
+		}
+		mlx5_vdpa_mem_dereg(priv);
+	}
+	priv->vmem_info.vmem = mem;
+	priv->vmem_info.mode = mode;
+	priv->num_mrs = mem->nregions;
+	if (!priv->num_mrs || priv->num_mrs >= MLX5_VDPA_MAX_MRS) {
+		DRV_LOG(ERR,
+		"Invalid number of memory regions.");
+		goto error;
+	}
+	/* The last one is indirect mkey entry. */
+	priv->num_mrs++;
+	mrs = rte_zmalloc("mlx5 vDPA memory regions",
+		sizeof(struct mlx5_vdpa_query_mr) * priv->num_mrs, 0);
+	priv->mrs = mrs;
+	if (!priv->mrs) {
+		DRV_LOG(ERR, "Failed to allocate private memory regions.");
+		goto error;
+	}
+	if (priv->use_c_thread) {
+		uint32_t main_task_idx[mem->nregions];
+
+		for (i = 0; i < mem->nregions; i++) {
+			thrd_idx = i % (conf_thread_mng.max_thrds + 1);
+			if (!thrd_idx) {
+				main_task_idx[task_num] = i;
+				task_num++;
+				continue;
+			}
+			thrd_idx = priv->last_c_thrd_idx + 1;
+			if (thrd_idx >= conf_thread_mng.max_thrds)
+				thrd_idx = 0;
+			priv->last_c_thrd_idx = thrd_idx;
+			data[0] = i;
+			if (mlx5_vdpa_task_add(priv, thrd_idx,
+				MLX5_VDPA_TASK_REG_MR,
+				&remaining_cnt, &err_cnt,
+				(void **)&data, 1)) {
+				DRV_LOG(ERR,
+				"Fail to add task mem region (%d)", i);
+				main_task_idx[task_num] = i;
+				task_num++;
+			}
+		}
+		for (i = 0; i < task_num; i++) {
+			ret = mlx5_vdpa_register_mr(priv,
+					main_task_idx[i]);
+			if (ret) {
+				DRV_LOG(ERR,
+				"Failed to register mem region %d.", i);
+				goto error;
+			}
+		}
+		if (mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining_cnt,
+			&err_cnt, 100)) {
+			DRV_LOG(ERR,
+			"Failed to wait register mem region tasks ready.");
+			goto error;
+		}
+	} else {
+		for (i = 0; i < mem->nregions; i++) {
+			ret = mlx5_vdpa_register_mr(priv, i);
+			if (ret) {
+				DRV_LOG(ERR,
+				"Failed to register mem region %d.", i);
+				goto error;
+			}
+		}
+	}
+	ret = mlx5_vdpa_create_indirect_mkey(priv);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to create indirect mkey .");
+		goto error;
+	}
+	return 0;
 error:
-	rte_free(entry);
 	mlx5_vdpa_mem_dereg(priv);
 	rte_errno = -ret;
 	return ret;
 }
+
+int
+mlx5_vdpa_register_mr(struct mlx5_vdpa_priv *priv, uint32_t idx)
+{
+	struct rte_vhost_memory *mem = priv->vmem_info.vmem;
+	struct mlx5_vdpa_query_mr *mrs =
+		(struct mlx5_vdpa_query_mr *)priv->mrs;
+	struct mlx5_vdpa_query_mr *entry;
+	struct rte_vhost_mem_region *reg;
+	int ret;
+
+	reg = &mem->regions[idx];
+	entry = &mrs[idx];
+	entry->mr = mlx5_glue->reg_mr_iova
+				      (priv->cdev->pd,
+				       (void *)(uintptr_t)(reg->host_user_addr),
+				       reg->size, reg->guest_phys_addr,
+				       IBV_ACCESS_LOCAL_WRITE);
+	if (!entry->mr) {
+		DRV_LOG(ERR, "Failed to create direct Mkey.");
+		ret = -rte_errno;
+		return ret;
+	}
+	entry->is_indirect = 0;
+	return 0;
+}
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index ce3f524fdb..1f81fb8723 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -353,21 +353,21 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 		}
 	}
 	if (attr->q_type == MLX5_VIRTQ_TYPE_SPLIT) {
-		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
+		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem_info.vmem,
 					   (uint64_t)(uintptr_t)vq->desc);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get descriptor ring GPA.");
 			return -1;
 		}
 		attr->desc_addr = gpa;
-		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
+		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem_info.vmem,
 					   (uint64_t)(uintptr_t)vq->used);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get GPA for used ring.");
 			return -1;
 		}
 		attr->used_addr = gpa;
-		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem,
+		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem_info.vmem,
 					   (uint64_t)(uintptr_t)vq->avail);
 		if (!gpa) {
 			DRV_LOG(ERR, "Failed to get GPA for available ring.");
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v4 11/15] vdpa/mlx5: add virtq creation task for MT management
  2022-06-18  9:02 ` [PATCH v4 00/15] mlx5/vdpa: optimize live migration time Li Zhang
                     ` (9 preceding siblings ...)
  2022-06-18  9:02   ` [PATCH v4 10/15] vdpa/mlx5: add MT task for VM memory registration Li Zhang
@ 2022-06-18  9:02   ` Li Zhang
  2022-06-20 15:19     ` Maxime Coquelin
  2022-06-18  9:02   ` [PATCH v4 12/15] vdpa/mlx5: add virtq LM log task Li Zhang
                     ` (4 subsequent siblings)
  15 siblings, 1 reply; 137+ messages in thread
From: Li Zhang @ 2022-06-18  9:02 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs
  Cc: dev, thomas, rasland, roniba, maxime.coquelin

The virtq object and all its sub-resources require a lot of
FW commands, so their creation can be accelerated by the MT management.
Split the virtq creation between the configuration threads.
This accelerates the LM process and reduces its time by 20%.
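
One detail of this split: when setup runs on the worker threads, the
guest kick-fd (doorbell) interrupt registration is deferred and done
by the caller thread only after all setup tasks have finished.  A
hedged outline of that ordering, with placeholder helper names (not
the driver's functions):

/*
 * Sketch only: workers create the HW virtqs first, then the caller
 * registers the guest kick fds for the queues that were configured.
 */
#include <stdbool.h>

extern void enqueue_setup_task(int vq);	/* Worker creates the HW virtq. */
extern void wait_all_tasks_done(void);
extern bool virtq_configured(int vq);
extern void register_kick_fd(int vq);	/* Doorbell mapping, caller only. */

void
configure_virtqs(int nr_vring)
{
	int i;

	for (i = 0; i < nr_vring; i++)
		enqueue_setup_task(i);
	wait_all_tasks_done();
	for (i = 0; i < nr_vring; i++)
		if (virtq_configured(i))
			register_kick_fd(i);
}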

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.h         |   9 +-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c |  14 +++
 drivers/vdpa/mlx5/mlx5_vdpa_event.c   |   2 +-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 149 +++++++++++++++++++-------
 4 files changed, 134 insertions(+), 40 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 3316ce42be..35221f5ddc 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -80,6 +80,7 @@ enum {
 /* Vdpa task types. */
 enum mlx5_vdpa_task_type {
 	MLX5_VDPA_TASK_REG_MR = 1,
+	MLX5_VDPA_TASK_SETUP_VIRTQ,
 };
 
 /* Generic task information and size must be multiple of 4B. */
@@ -117,12 +118,12 @@ struct mlx5_vdpa_vmem_info {
 
 struct mlx5_vdpa_virtq {
 	SLIST_ENTRY(mlx5_vdpa_virtq) next;
-	uint8_t enable;
 	uint16_t index;
 	uint16_t vq_size;
 	uint8_t notifier_state;
-	bool stopped;
 	uint32_t configured:1;
+	uint32_t enable:1;
+	uint32_t stopped:1;
 	uint32_t version;
 	pthread_mutex_t virtq_lock;
 	struct mlx5_vdpa_priv *priv;
@@ -565,11 +566,13 @@ bool
 mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
 		uint32_t thrd_idx,
 		enum mlx5_vdpa_task_type task_type,
-		uint32_t *bulk_refcnt, uint32_t *bulk_err_cnt,
+		uint32_t *remaining_cnt, uint32_t *err_cnt,
 		void **task_data, uint32_t num);
 int
 mlx5_vdpa_register_mr(struct mlx5_vdpa_priv *priv, uint32_t idx);
 bool
 mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
 		uint32_t *err_cnt, uint32_t sleep_time);
+int
+mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index 10391931ae..1389d369ae 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -100,6 +100,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
 {
 	struct mlx5_vdpa_conf_thread_mng *multhrd = arg;
 	pthread_t thread_id = pthread_self();
+	struct mlx5_vdpa_virtq *virtq;
 	struct mlx5_vdpa_priv *priv;
 	struct mlx5_vdpa_task task;
 	struct rte_ring *rng;
@@ -139,6 +140,19 @@ mlx5_vdpa_c_thread_handle(void *arg)
 				__ATOMIC_RELAXED);
 			}
 			break;
+		case MLX5_VDPA_TASK_SETUP_VIRTQ:
+			virtq = &priv->virtqs[task.idx];
+			pthread_mutex_lock(&virtq->virtq_lock);
+			ret = mlx5_vdpa_virtq_setup(priv,
+				task.idx, false);
+			if (ret) {
+				DRV_LOG(ERR,
+					"Failed to setup virtq %d.", task.idx);
+				__atomic_fetch_add(
+					task.err_cnt, 1, __ATOMIC_RELAXED);
+			}
+			pthread_mutex_unlock(&virtq->virtq_lock);
+			break;
 		default:
 			DRV_LOG(ERR, "Invalid vdpa task type %d.",
 			task.type);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index b45fbac146..f782b6b832 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -371,7 +371,7 @@ mlx5_vdpa_err_interrupt_handler(void *cb_arg __rte_unused)
 			goto unlock;
 		if (rte_rdtsc() / rte_get_tsc_hz() < MLX5_VDPA_ERROR_TIME_SEC)
 			goto unlock;
-		virtq->stopped = true;
+		virtq->stopped = 1;
 		/* Query error info. */
 		if (mlx5_vdpa_virtq_query(priv, vq_index))
 			goto log;
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 1f81fb8723..50d59a8394 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -111,8 +111,9 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
+		if (virtq->index != i)
+			continue;
 		pthread_mutex_lock(&virtq->virtq_lock);
-		virtq->configured = 0;
 		for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
 			if (virtq->umems[j].obj) {
 				claim_zero(mlx5_glue->devx_umem_dereg
@@ -131,7 +132,6 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 	}
 }
 
-
 static int
 mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 {
@@ -191,7 +191,7 @@ mlx5_vdpa_virtq_stop(struct mlx5_vdpa_priv *priv, int index)
 	ret = mlx5_vdpa_virtq_modify(virtq, 0);
 	if (ret)
 		return -1;
-	virtq->stopped = true;
+	virtq->stopped = 1;
 	DRV_LOG(DEBUG, "vid %u virtq %u was stopped.", priv->vid, index);
 	return mlx5_vdpa_virtq_query(priv, index);
 }
@@ -411,7 +411,38 @@ mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv)
 }
 
 static int
-mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
+mlx5_vdpa_virtq_doorbell_setup(struct mlx5_vdpa_virtq *virtq,
+		struct rte_vhost_vring *vq, int index)
+{
+	virtq->intr_handle =
+		rte_intr_instance_alloc(RTE_INTR_INSTANCE_F_SHARED);
+	if (virtq->intr_handle == NULL) {
+		DRV_LOG(ERR, "Fail to allocate intr_handle");
+		return -1;
+	}
+	if (rte_intr_fd_set(virtq->intr_handle, vq->kickfd))
+		return -1;
+	if (rte_intr_fd_get(virtq->intr_handle) == -1) {
+		DRV_LOG(WARNING, "Virtq %d kickfd is invalid.", index);
+	} else {
+		if (rte_intr_type_set(virtq->intr_handle,
+			RTE_INTR_HANDLE_EXT))
+			return -1;
+		if (rte_intr_callback_register(virtq->intr_handle,
+			mlx5_vdpa_virtq_kick_handler, virtq)) {
+			(void)rte_intr_fd_set(virtq->intr_handle, -1);
+			DRV_LOG(ERR, "Failed to register virtq %d interrupt.",
+				index);
+			return -1;
+		}
+		DRV_LOG(DEBUG, "Register fd %d interrupt for virtq %d.",
+			rte_intr_fd_get(virtq->intr_handle), index);
+	}
+	return 0;
+}
+
+int
+mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick)
 {
 	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
 	struct rte_vhost_vring vq;
@@ -455,33 +486,11 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 	rte_write32(virtq->index, priv->virtq_db_addr);
 	rte_spinlock_unlock(&priv->db_lock);
 	/* Setup doorbell mapping. */
-	virtq->intr_handle =
-		rte_intr_instance_alloc(RTE_INTR_INSTANCE_F_SHARED);
-	if (virtq->intr_handle == NULL) {
-		DRV_LOG(ERR, "Fail to allocate intr_handle");
-		goto error;
-	}
-
-	if (rte_intr_fd_set(virtq->intr_handle, vq.kickfd))
-		goto error;
-
-	if (rte_intr_fd_get(virtq->intr_handle) == -1) {
-		DRV_LOG(WARNING, "Virtq %d kickfd is invalid.", index);
-	} else {
-		if (rte_intr_type_set(virtq->intr_handle, RTE_INTR_HANDLE_EXT))
-			goto error;
-
-		if (rte_intr_callback_register(virtq->intr_handle,
-					       mlx5_vdpa_virtq_kick_handler,
-					       virtq)) {
-			(void)rte_intr_fd_set(virtq->intr_handle, -1);
+	if (reg_kick) {
+		if (mlx5_vdpa_virtq_doorbell_setup(virtq, &vq, index)) {
 			DRV_LOG(ERR, "Failed to register virtq %d interrupt.",
 				index);
 			goto error;
-		} else {
-			DRV_LOG(DEBUG, "Register fd %d interrupt for virtq %d.",
-				rte_intr_fd_get(virtq->intr_handle),
-				index);
 		}
 	}
 	/* Subscribe virtq error event. */
@@ -497,7 +506,6 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 		rte_errno = errno;
 		goto error;
 	}
-	virtq->stopped = false;
 	/* Initial notification to ask Qemu handling completed buffers. */
 	if (virtq->eqp.cq.callfd != -1)
 		eventfd_write(virtq->eqp.cq.callfd, (eventfd_t)1);
@@ -567,10 +575,12 @@ mlx5_vdpa_features_validate(struct mlx5_vdpa_priv *priv)
 int
 mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 {
-	uint32_t i;
-	uint16_t nr_vring = rte_vhost_get_vring_num(priv->vid);
 	int ret = rte_vhost_get_negotiated_features(priv->vid, &priv->features);
+	uint16_t nr_vring = rte_vhost_get_vring_num(priv->vid);
+	uint32_t remaining_cnt = 0, err_cnt = 0, task_num = 0;
+	uint32_t i, thrd_idx, data[1];
 	struct mlx5_vdpa_virtq *virtq;
+	struct rte_vhost_vring vq;
 
 	if (ret || mlx5_vdpa_features_validate(priv)) {
 		DRV_LOG(ERR, "Failed to configure negotiated features.");
@@ -590,16 +600,83 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 		return -1;
 	}
 	priv->nr_virtqs = nr_vring;
-	for (i = 0; i < nr_vring; i++) {
-		virtq = &priv->virtqs[i];
-		if (virtq->enable) {
+	if (priv->use_c_thread) {
+		uint32_t main_task_idx[nr_vring];
+
+		for (i = 0; i < nr_vring; i++) {
+			virtq = &priv->virtqs[i];
+			if (!virtq->enable)
+				continue;
+			thrd_idx = i % (conf_thread_mng.max_thrds + 1);
+			if (!thrd_idx) {
+				main_task_idx[task_num] = i;
+				task_num++;
+				continue;
+			}
+			thrd_idx = priv->last_c_thrd_idx + 1;
+			if (thrd_idx >= conf_thread_mng.max_thrds)
+				thrd_idx = 0;
+			priv->last_c_thrd_idx = thrd_idx;
+			data[0] = i;
+			if (mlx5_vdpa_task_add(priv, thrd_idx,
+				MLX5_VDPA_TASK_SETUP_VIRTQ,
+				&remaining_cnt, &err_cnt,
+				(void **)&data, 1)) {
+				DRV_LOG(ERR, "Fail to add "
+						"task setup virtq (%d).", i);
+				main_task_idx[task_num] = i;
+				task_num++;
+			}
+		}
+		for (i = 0; i < task_num; i++) {
+			virtq = &priv->virtqs[main_task_idx[i]];
 			pthread_mutex_lock(&virtq->virtq_lock);
-			if (mlx5_vdpa_virtq_setup(priv, i)) {
+			if (mlx5_vdpa_virtq_setup(priv,
+				main_task_idx[i], false)) {
 				pthread_mutex_unlock(&virtq->virtq_lock);
 				goto error;
 			}
 			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
+		if (mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining_cnt,
+			&err_cnt, 2000)) {
+			DRV_LOG(ERR,
+			"Failed to wait virt-queue setup tasks ready.");
+			goto error;
+		}
+		for (i = 0; i < nr_vring; i++) {
+			/* Setup doorbell mapping in order for Qemu. */
+			virtq = &priv->virtqs[i];
+			pthread_mutex_lock(&virtq->virtq_lock);
+			if (!virtq->enable || !virtq->configured) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				continue;
+			}
+			if (rte_vhost_get_vhost_vring(priv->vid, i, &vq)) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				goto error;
+			}
+			if (mlx5_vdpa_virtq_doorbell_setup(virtq, &vq, i)) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				DRV_LOG(ERR,
+				"Failed to register virtq %d interrupt.", i);
+				goto error;
+			}
+			pthread_mutex_unlock(&virtq->virtq_lock);
+		}
+	} else {
+		for (i = 0; i < nr_vring; i++) {
+			virtq = &priv->virtqs[i];
+			pthread_mutex_lock(&virtq->virtq_lock);
+			if (virtq->enable) {
+				if (mlx5_vdpa_virtq_setup(priv, i, true)) {
+					pthread_mutex_unlock(
+						&virtq->virtq_lock);
+					goto error;
+				}
+			}
+			pthread_mutex_unlock(&virtq->virtq_lock);
+		}
 	}
 	return 0;
 error:
@@ -663,7 +740,7 @@ mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable)
 		mlx5_vdpa_virtq_unset(virtq);
 	}
 	if (enable) {
-		ret = mlx5_vdpa_virtq_setup(priv, index);
+		ret = mlx5_vdpa_virtq_setup(priv, index, true);
 		if (ret) {
 			DRV_LOG(ERR, "Failed to setup virtq %d.", index);
 			return ret;
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v4 12/15] vdpa/mlx5: add virtq LM log task
  2022-06-18  9:02 ` [PATCH v4 00/15] mlx5/vdpa: optimize live migration time Li Zhang
                     ` (10 preceding siblings ...)
  2022-06-18  9:02   ` [PATCH v4 11/15] vdpa/mlx5: add virtq creation task for MT management Li Zhang
@ 2022-06-18  9:02   ` Li Zhang
  2022-06-20 15:42     ` Maxime Coquelin
  2022-06-18  9:02   ` [PATCH v4 13/15] vdpa/mlx5: add device close task Li Zhang
                     ` (3 subsequent siblings)
  15 siblings, 1 reply; 137+ messages in thread
From: Li Zhang @ 2022-06-18  9:02 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs
  Cc: dev, thomas, rasland, roniba, maxime.coquelin

Split the virtq LM logging between the configuration threads.
This accelerates the LM process and reduces its time by 20%.
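
For reference, the amount of used-ring memory logged per queue is the
MLX5_VDPA_USED_RING_LEN() value moved into the header by this patch.
A quick worked example, assuming the standard 8-byte
struct vring_used_elem and the three 16-bit fields of a split used
ring (flags, idx and the trailing event index):

    used_ring_len(size) = size * sizeof(struct vring_used_elem)
                        + 3 * sizeof(uint16_t)
    e.g. a 256-entry queue logs 256 * 8 + 3 * 2 = 2054 bytes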

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  3 +
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 34 +++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa_lm.c      | 85 +++++++++++++++++++++------
 3 files changed, 105 insertions(+), 17 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 35221f5ddc..e08931719f 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -72,6 +72,8 @@ enum {
 	MLX5_VDPA_NOTIFIER_STATE_ERR
 };
 
+#define MLX5_VDPA_USED_RING_LEN(size) \
+	((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3)
 #define MLX5_VDPA_MAX_C_THRD 256
 #define MLX5_VDPA_MAX_TASKS_PER_THRD 4096
 #define MLX5_VDPA_TASKS_PER_DEV 64
@@ -81,6 +83,7 @@ enum {
 enum mlx5_vdpa_task_type {
 	MLX5_VDPA_TASK_REG_MR = 1,
 	MLX5_VDPA_TASK_SETUP_VIRTQ,
+	MLX5_VDPA_TASK_STOP_VIRTQ,
 };
 
 /* Generic task information and size must be multiple of 4B. */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index 1389d369ae..98369f0887 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -104,6 +104,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
 	struct mlx5_vdpa_priv *priv;
 	struct mlx5_vdpa_task task;
 	struct rte_ring *rng;
+	uint64_t features;
 	uint32_t thrd_idx;
 	uint32_t task_num;
 	int ret;
@@ -153,6 +154,39 @@ mlx5_vdpa_c_thread_handle(void *arg)
 			}
 			pthread_mutex_unlock(&virtq->virtq_lock);
 			break;
+		case MLX5_VDPA_TASK_STOP_VIRTQ:
+			virtq = &priv->virtqs[task.idx];
+			pthread_mutex_lock(&virtq->virtq_lock);
+			ret = mlx5_vdpa_virtq_stop(priv,
+					task.idx);
+			if (ret) {
+				DRV_LOG(ERR,
+				"Failed to stop virtq %d.",
+				task.idx);
+				__atomic_fetch_add(
+					task.err_cnt, 1,
+					__ATOMIC_RELAXED);
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				break;
+			}
+			ret = rte_vhost_get_negotiated_features(
+				priv->vid, &features);
+			if (ret) {
+				DRV_LOG(ERR,
+		"Failed to get negotiated features virtq %d.",
+				task.idx);
+				__atomic_fetch_add(
+					task.err_cnt, 1,
+					__ATOMIC_RELAXED);
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				break;
+			}
+			if (RTE_VHOST_NEED_LOG(features))
+				rte_vhost_log_used_vring(
+				priv->vid, task.idx, 0,
+			    MLX5_VDPA_USED_RING_LEN(virtq->vq_size));
+			pthread_mutex_unlock(&virtq->virtq_lock);
+			break;
 		default:
 			DRV_LOG(ERR, "Invalid vdpa task type %d.",
 			task.type);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
index bfa5d4d571..0fa671fc7c 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_lm.c
@@ -89,39 +89,90 @@ mlx5_vdpa_dirty_bitmap_set(struct mlx5_vdpa_priv *priv, uint64_t log_base,
 	return -1;
 }
 
-#define MLX5_VDPA_USED_RING_LEN(size) \
-	((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3)
-
 int
 mlx5_vdpa_lm_log(struct mlx5_vdpa_priv *priv)
 {
+	uint32_t remaining_cnt = 0, err_cnt = 0, task_num = 0;
+	uint32_t i, thrd_idx, data[1];
 	struct mlx5_vdpa_virtq *virtq;
 	uint64_t features;
-	int ret = rte_vhost_get_negotiated_features(priv->vid, &features);
-	int i;
+	int ret;
 
+	ret = rte_vhost_get_negotiated_features(priv->vid, &features);
 	if (ret) {
 		DRV_LOG(ERR, "Failed to get negotiated features.");
 		return -1;
 	}
-	if (!RTE_VHOST_NEED_LOG(features))
-		return 0;
-	for (i = 0; i < priv->nr_virtqs; ++i) {
-		virtq = &priv->virtqs[i];
-		if (!priv->virtqs[i].virtq) {
-			DRV_LOG(DEBUG, "virtq %d is invalid for LM log.", i);
-		} else {
+	if (priv->use_c_thread && priv->nr_virtqs) {
+		uint32_t main_task_idx[priv->nr_virtqs];
+
+		for (i = 0; i < priv->nr_virtqs; i++) {
+			virtq = &priv->virtqs[i];
+			if (!virtq->configured)
+				continue;
+			thrd_idx = i % (conf_thread_mng.max_thrds + 1);
+			if (!thrd_idx) {
+				main_task_idx[task_num] = i;
+				task_num++;
+				continue;
+			}
+			thrd_idx = priv->last_c_thrd_idx + 1;
+			if (thrd_idx >= conf_thread_mng.max_thrds)
+				thrd_idx = 0;
+			priv->last_c_thrd_idx = thrd_idx;
+			data[0] = i;
+			if (mlx5_vdpa_task_add(priv, thrd_idx,
+				MLX5_VDPA_TASK_STOP_VIRTQ,
+				&remaining_cnt, &err_cnt,
+				(void **)&data, 1)) {
+				DRV_LOG(ERR, "Fail to add "
+					"task stop virtq (%d).", i);
+				main_task_idx[task_num] = i;
+				task_num++;
+			}
+		}
+		for (i = 0; i < task_num; i++) {
+			virtq = &priv->virtqs[main_task_idx[i]];
 			pthread_mutex_lock(&virtq->virtq_lock);
-			ret = mlx5_vdpa_virtq_stop(priv, i);
+			ret = mlx5_vdpa_virtq_stop(priv,
+					main_task_idx[i]);
+			if (ret) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				DRV_LOG(ERR,
+				"Failed to stop virtq %d.", i);
+				return -1;
+			}
+			if (RTE_VHOST_NEED_LOG(features))
+				rte_vhost_log_used_vring(priv->vid, i, 0,
+				MLX5_VDPA_USED_RING_LEN(virtq->vq_size));
 			pthread_mutex_unlock(&virtq->virtq_lock);
+		}
+		if (mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining_cnt,
+			&err_cnt, 2000)) {
+			DRV_LOG(ERR,
+			"Failed to wait virt-queue setup tasks ready.");
+			return -1;
+		}
+	} else {
+		for (i = 0; i < priv->nr_virtqs; i++) {
+			virtq = &priv->virtqs[i];
+			pthread_mutex_lock(&virtq->virtq_lock);
+			if (!virtq->configured) {
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				continue;
+			}
+			ret = mlx5_vdpa_virtq_stop(priv, i);
 			if (ret) {
-				DRV_LOG(ERR, "Failed to stop virtq %d for LM "
-					"log.", i);
+				pthread_mutex_unlock(&virtq->virtq_lock);
+				DRV_LOG(ERR,
+				"Failed to stop virtq %d for LM log.", i);
 				return -1;
 			}
+			if (RTE_VHOST_NEED_LOG(features))
+				rte_vhost_log_used_vring(priv->vid, i, 0,
+				MLX5_VDPA_USED_RING_LEN(virtq->vq_size));
+			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
-		rte_vhost_log_used_vring(priv->vid, i, 0,
-			      MLX5_VDPA_USED_RING_LEN(priv->virtqs[i].vq_size));
 	}
 	return 0;
 }
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v4 13/15] vdpa/mlx5: add device close task
  2022-06-18  9:02 ` [PATCH v4 00/15] mlx5/vdpa: optimize live migration time Li Zhang
                     ` (11 preceding siblings ...)
  2022-06-18  9:02   ` [PATCH v4 12/15] vdpa/mlx5: add virtq LM log task Li Zhang
@ 2022-06-18  9:02   ` Li Zhang
  2022-06-20 15:54     ` Maxime Coquelin
  2022-06-18  9:02   ` [PATCH v4 14/15] vdpa/mlx5: add virtq sub-resources creation Li Zhang
                     ` (2 subsequent siblings)
  15 siblings, 1 reply; 137+ messages in thread
From: Li Zhang @ 2022-06-18  9:02 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs
  Cc: dev, thomas, rasland, roniba, maxime.coquelin

Split the device close tasks, done after stopping the virt-queues,
between the configuration threads.
This accelerates the LM process and
reduces its time by 50%.
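
The close path becomes a "nowait" operation: the caller marks the
close as in progress, offloads the teardown to a worker thread and
returns immediately; later reconfiguration or removal waits on that
flag.  A minimal sketch of the flag handshake under assumed helper
names (enqueue_close_task() and do_close_teardown() are placeholders,
not the driver's functions):

/* Sketch only: asynchronous close guarded by a progress flag. */
#include <stdint.h>

extern int enqueue_close_task(void);	/* Offload teardown to a worker. */
extern void do_close_teardown(void);	/* Steering unset, virtq release... */

static uint32_t close_progress;

/* Caller context (vhost dev_close callback). */
int
dev_close_nowait(void)
{
	__atomic_store_n(&close_progress, 1, __ATOMIC_RELAXED);
	if (enqueue_close_task())
		return -1;	/* Caller falls back to closing in-line. */
	return 0;	/* Teardown continues in the background. */
}

/* Worker context. */
void
close_task(void)
{
	do_close_teardown();
	__atomic_store_n(&close_progress, 0, __ATOMIC_RELAXED);
}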

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c         | 56 +++++++++++++++++++++++++--
 drivers/vdpa/mlx5/mlx5_vdpa.h         |  8 ++++
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 20 +++++++++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 14 +++++++
 4 files changed, 94 insertions(+), 4 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index e3b32fa087..d000854c08 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -245,7 +245,7 @@ mlx5_vdpa_mtu_set(struct mlx5_vdpa_priv *priv)
 	return kern_mtu == vhost_mtu ? 0 : -1;
 }
 
-static void
+void
 mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv)
 {
 	/* Clean pre-created resource in dev removal only. */
@@ -254,6 +254,26 @@ mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv)
 	mlx5_vdpa_mem_dereg(priv);
 }
 
+static bool
+mlx5_vdpa_wait_dev_close_tasks_done(struct mlx5_vdpa_priv *priv)
+{
+	uint32_t timeout = 0;
+
+	/* Check and wait all close tasks done. */
+	while (__atomic_load_n(&priv->dev_close_progress,
+		__ATOMIC_RELAXED) != 0 && timeout < 1000) {
+		rte_delay_us_sleep(10000);
+		timeout++;
+	}
+	if (priv->dev_close_progress) {
+		DRV_LOG(ERR,
+		"Failed to wait close device tasks done vid %d.",
+		priv->vid);
+		return true;
+	}
+	return false;
+}
+
 static int
 mlx5_vdpa_dev_close(int vid)
 {
@@ -271,6 +291,27 @@ mlx5_vdpa_dev_close(int vid)
 		ret |= mlx5_vdpa_lm_log(priv);
 		priv->state = MLX5_VDPA_STATE_IN_PROGRESS;
 	}
+	if (priv->use_c_thread) {
+		if (priv->last_c_thrd_idx >=
+			(conf_thread_mng.max_thrds - 1))
+			priv->last_c_thrd_idx = 0;
+		else
+			priv->last_c_thrd_idx++;
+		__atomic_store_n(&priv->dev_close_progress,
+			1, __ATOMIC_RELAXED);
+		if (mlx5_vdpa_task_add(priv,
+			priv->last_c_thrd_idx,
+			MLX5_VDPA_TASK_DEV_CLOSE_NOWAIT,
+			NULL, NULL, NULL, 1)) {
+			DRV_LOG(ERR,
+			"Fail to add dev close task. ");
+			goto single_thrd;
+		}
+		priv->state = MLX5_VDPA_STATE_PROBED;
+		DRV_LOG(INFO, "vDPA device %d was closed.", vid);
+		return ret;
+	}
+single_thrd:
 	pthread_mutex_lock(&priv->steer_update_lock);
 	mlx5_vdpa_steer_unset(priv);
 	pthread_mutex_unlock(&priv->steer_update_lock);
@@ -278,10 +319,12 @@ mlx5_vdpa_dev_close(int vid)
 	mlx5_vdpa_drain_cq(priv);
 	if (priv->lm_mr.addr)
 		mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
-	priv->state = MLX5_VDPA_STATE_PROBED;
 	if (!priv->connected)
 		mlx5_vdpa_dev_cache_clean(priv);
 	priv->vid = 0;
+	__atomic_store_n(&priv->dev_close_progress, 0,
+		__ATOMIC_RELAXED);
+	priv->state = MLX5_VDPA_STATE_PROBED;
 	DRV_LOG(INFO, "vDPA device %d was closed.", vid);
 	return ret;
 }
@@ -302,6 +345,8 @@ mlx5_vdpa_dev_config(int vid)
 		DRV_LOG(ERR, "Failed to reconfigure vid %d.", vid);
 		return -1;
 	}
+	if (mlx5_vdpa_wait_dev_close_tasks_done(priv))
+		return -1;
 	priv->vid = vid;
 	priv->connected = true;
 	if (mlx5_vdpa_mtu_set(priv))
@@ -444,8 +489,11 @@ mlx5_vdpa_dev_cleanup(int vid)
 		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
 		return -1;
 	}
-	if (priv->state == MLX5_VDPA_STATE_PROBED)
+	if (priv->state == MLX5_VDPA_STATE_PROBED) {
+		if (priv->use_c_thread)
+			mlx5_vdpa_wait_dev_close_tasks_done(priv);
 		mlx5_vdpa_dev_cache_clean(priv);
+	}
 	priv->connected = false;
 	return 0;
 }
@@ -839,6 +887,8 @@ mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv)
 {
 	if (priv->state == MLX5_VDPA_STATE_CONFIGURED)
 		mlx5_vdpa_dev_close(priv->vid);
+	if (priv->use_c_thread)
+		mlx5_vdpa_wait_dev_close_tasks_done(priv);
 	mlx5_vdpa_release_dev_resources(priv);
 	if (priv->vdev)
 		rte_vdpa_unregister_device(priv->vdev);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index e08931719f..b6392b9d66 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -84,6 +84,7 @@ enum mlx5_vdpa_task_type {
 	MLX5_VDPA_TASK_REG_MR = 1,
 	MLX5_VDPA_TASK_SETUP_VIRTQ,
 	MLX5_VDPA_TASK_STOP_VIRTQ,
+	MLX5_VDPA_TASK_DEV_CLOSE_NOWAIT,
 };
 
 /* Generic task information and size must be multiple of 4B. */
@@ -206,6 +207,7 @@ struct mlx5_vdpa_priv {
 	uint64_t features; /* Negotiated features. */
 	uint16_t log_max_rqt_size;
 	uint16_t last_c_thrd_idx;
+	uint16_t dev_close_progress;
 	uint16_t num_mrs; /* Number of memory regions. */
 	struct mlx5_vdpa_steer steer;
 	struct mlx5dv_var *var;
@@ -578,4 +580,10 @@ mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
 		uint32_t *err_cnt, uint32_t sleep_time);
 int
 mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick);
+void
+mlx5_vdpa_vq_destroy(struct mlx5_vdpa_virtq *virtq);
+void
+mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv);
+void
+mlx5_vdpa_virtq_unreg_intr_handle_all(struct mlx5_vdpa_priv *priv);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index 98369f0887..bb2279440b 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -63,7 +63,8 @@ mlx5_vdpa_task_add(struct mlx5_vdpa_priv *priv,
 		task[i].type = task_type;
 		task[i].remaining_cnt = remaining_cnt;
 		task[i].err_cnt = err_cnt;
-		task[i].idx = data[i];
+		if (data)
+			task[i].idx = data[i];
 	}
 	if (!mlx5_vdpa_c_thrd_ring_enqueue_bulk(rng, (void **)&task, num, NULL))
 		return -1;
@@ -187,6 +188,23 @@ mlx5_vdpa_c_thread_handle(void *arg)
 			    MLX5_VDPA_USED_RING_LEN(virtq->vq_size));
 			pthread_mutex_unlock(&virtq->virtq_lock);
 			break;
+		case MLX5_VDPA_TASK_DEV_CLOSE_NOWAIT:
+			mlx5_vdpa_virtq_unreg_intr_handle_all(priv);
+			pthread_mutex_lock(&priv->steer_update_lock);
+			mlx5_vdpa_steer_unset(priv);
+			pthread_mutex_unlock(&priv->steer_update_lock);
+			mlx5_vdpa_virtqs_release(priv);
+			mlx5_vdpa_drain_cq(priv);
+			if (priv->lm_mr.addr)
+				mlx5_os_wrapped_mkey_destroy(
+					&priv->lm_mr);
+			if (!priv->connected)
+				mlx5_vdpa_dev_cache_clean(priv);
+			priv->vid = 0;
+			__atomic_store_n(
+				&priv->dev_close_progress, 0,
+				__ATOMIC_RELAXED);
+			break;
 		default:
 			DRV_LOG(ERR, "Invalid vdpa task type %d.",
 			task.type);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 50d59a8394..79d48a6569 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -102,6 +102,20 @@ mlx5_vdpa_virtq_unregister_intr_handle(struct mlx5_vdpa_virtq *virtq)
 	virtq->intr_handle = NULL;
 }
 
+void
+mlx5_vdpa_virtq_unreg_intr_handle_all(struct mlx5_vdpa_priv *priv)
+{
+	uint32_t i;
+	struct mlx5_vdpa_virtq *virtq;
+
+	for (i = 0; i < priv->nr_virtqs; i++) {
+		virtq = &priv->virtqs[i];
+		pthread_mutex_lock(&virtq->virtq_lock);
+		mlx5_vdpa_virtq_unregister_intr_handle(virtq);
+		pthread_mutex_unlock(&virtq->virtq_lock);
+	}
+}
+
 /* Release cached VQ resources. */
 void
 mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v4 14/15] vdpa/mlx5: add virtq sub-resources creation
  2022-06-18  9:02 ` [PATCH v4 00/15] mlx5/vdpa: optimize live migration time Li Zhang
                     ` (12 preceding siblings ...)
  2022-06-18  9:02   ` [PATCH v4 13/15] vdpa/mlx5: add device close task Li Zhang
@ 2022-06-18  9:02   ` Li Zhang
  2022-06-20 16:01     ` Maxime Coquelin
  2022-06-18  9:02   ` [PATCH v4 15/15] vdpa/mlx5: prepare virtqueue resource creation Li Zhang
  2022-06-21  9:29   ` [PATCH v4 00/15] mlx5/vdpa: optimize live migration time Maxime Coquelin
  15 siblings, 1 reply; 137+ messages in thread
From: Li Zhang @ 2022-06-18  9:02 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs
  Cc: dev, thomas, rasland, roniba, maxime.coquelin, Yajun Wu

Pre-create the virt-queue sub-resources in the device probe stage
and then only modify the virtqueue in the device config stage.
The steering table also needs to support dummy virt-queues.
This accelerates the LM process and reduces its time by 40%.
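
Conceptually, the probe-time preparation is a loop over the configured
queues that creates a dummy virtq per index and then plugs them into the
steering table. A simplified, illustrative sketch (driver context assumed,
error unwinding omitted):

    /*
     * Illustrative sketch only: pre-create dummy virtqs at probe time so
     * that device config later only has to modify them.
     */
    static int
    prepare_dummy_virtqs(struct mlx5_vdpa_priv *priv)
    {
        uint32_t max_queues = RTE_MIN(priv->queues * 2,
                priv->caps.max_num_virtio_queues);
        uint32_t i;

        if (!priv->queues || !priv->queue_size)
            return 0; /* No pre-creation requested via devargs. */
        for (i = 0; i < max_queues; i++)
            if (mlx5_vdpa_virtq_single_resource_prepare(priv, i))
                return -1; /* The real code also unsets what was created. */
        /* The dummy virtqs must be reflected in the steering RQT. */
        if (mlx5_vdpa_is_modify_virtq_supported(priv))
            return mlx5_vdpa_steer_update(priv, true) ? -1 : 0;
        return 0;
    }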

Signed-off-by: Li Zhang <lizh@nvidia.com>
Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 72 +++++++--------------
 drivers/vdpa/mlx5/mlx5_vdpa.h       | 17 +++--
 drivers/vdpa/mlx5/mlx5_vdpa_event.c | 11 ++--
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c | 17 +++--
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 99 +++++++++++++++++++++--------
 5 files changed, 123 insertions(+), 93 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index d000854c08..f006a9cd3f 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -627,65 +627,39 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 static int
 mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 {
-	struct mlx5_vdpa_virtq *virtq;
+	uint32_t max_queues;
 	uint32_t index;
-	uint32_t i;
+	struct mlx5_vdpa_virtq *virtq;
 
-	for (index = 0; index < priv->caps.max_num_virtio_queues * 2;
+	for (index = 0; index < priv->caps.max_num_virtio_queues;
 		index++) {
 		virtq = &priv->virtqs[index];
 		pthread_mutex_init(&virtq->virtq_lock, NULL);
 	}
-	if (!priv->queues)
+	if (!priv->queues || !priv->queue_size)
 		return 0;
-	for (index = 0; index < (priv->queues * 2); ++index) {
+	max_queues = (priv->queues < priv->caps.max_num_virtio_queues) ?
+		(priv->queues * 2) : (priv->caps.max_num_virtio_queues);
+	for (index = 0; index < max_queues; ++index)
+		if (mlx5_vdpa_virtq_single_resource_prepare(priv,
+			index))
+			goto error;
+	if (mlx5_vdpa_is_modify_virtq_supported(priv))
+		if (mlx5_vdpa_steer_update(priv, true))
+			goto error;
+	return 0;
+error:
+	for (index = 0; index < max_queues; ++index) {
 		virtq = &priv->virtqs[index];
-		int ret = mlx5_vdpa_event_qp_prepare(priv, priv->queue_size,
-					-1, virtq);
-
-		if (ret) {
-			DRV_LOG(ERR, "Failed to create event QPs for virtq %d.",
-				index);
-			return -1;
-		}
-		if (priv->caps.queue_counters_valid) {
-			if (!virtq->counters)
-				virtq->counters =
-					mlx5_devx_cmd_create_virtio_q_counters
-						(priv->cdev->ctx);
-			if (!virtq->counters) {
-				DRV_LOG(ERR, "Failed to create virtq couners for virtq"
-					" %d.", index);
-				return -1;
-			}
-		}
-		for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
-			uint32_t size;
-			void *buf;
-			struct mlx5dv_devx_umem *obj;
-
-			size = priv->caps.umems[i].a * priv->queue_size +
-					priv->caps.umems[i].b;
-			buf = rte_zmalloc(__func__, size, 4096);
-			if (buf == NULL) {
-				DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
-						" %u.", i, index);
-				return -1;
-			}
-			obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx, buf,
-					size, IBV_ACCESS_LOCAL_WRITE);
-			if (obj == NULL) {
-				rte_free(buf);
-				DRV_LOG(ERR, "Failed to register umem %d for virtq %u.",
-						i, index);
-				return -1;
-			}
-			virtq->umems[i].size = size;
-			virtq->umems[i].buf = buf;
-			virtq->umems[i].obj = obj;
+		if (virtq->virtq) {
+			pthread_mutex_lock(&virtq->virtq_lock);
+			mlx5_vdpa_virtq_unset(virtq);
+			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
 	}
-	return 0;
+	if (mlx5_vdpa_is_modify_virtq_supported(priv))
+		mlx5_vdpa_steer_unset(priv);
+	return -1;
 }
 
 static int
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index b6392b9d66..f353db62ac 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -277,13 +277,15 @@ int mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv);
  *   The guest notification file descriptor.
  * @param[in/out] virtq
  *   Pointer to the virt-queue structure.
+ * @param[in] reset
+ *   If true, it will reset event qp.
  *
  * @return
  *   0 on success, -1 otherwise and rte_errno is set.
  */
 int
 mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
-	int callfd, struct mlx5_vdpa_virtq *virtq);
+	int callfd, struct mlx5_vdpa_virtq *virtq, bool reset);
 
 /**
  * Destroy an event QP and all its related resources.
@@ -403,11 +405,13 @@ void mlx5_vdpa_steer_unset(struct mlx5_vdpa_priv *priv);
  *
  * @param[in] priv
  *   The vdpa driver private structure.
+ * @param[in] is_dummy
+ *   If set, it is updated with dummy queue for prepare resource.
  *
  * @return
  *   0 on success, a negative value otherwise.
  */
-int mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv);
+int mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv, bool is_dummy);
 
 /**
  * Setup steering and all its related resources to enable RSS traffic from the
@@ -581,9 +585,14 @@ mlx5_vdpa_c_thread_wait_bulk_tasks_done(uint32_t *remaining_cnt,
 int
 mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick);
 void
-mlx5_vdpa_vq_destroy(struct mlx5_vdpa_virtq *virtq);
-void
 mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv);
 void
 mlx5_vdpa_virtq_unreg_intr_handle_all(struct mlx5_vdpa_priv *priv);
+bool
+mlx5_vdpa_virtq_single_resource_prepare(struct mlx5_vdpa_priv *priv,
+		int index);
+int
+mlx5_vdpa_qps2rst2rts(struct mlx5_vdpa_event_qp *eqp);
+void
+mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index f782b6b832..22f0920c88 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -249,7 +249,7 @@ mlx5_vdpa_drain_cq(struct mlx5_vdpa_priv *priv)
 {
 	unsigned int i;
 
-	for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
+	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		struct mlx5_vdpa_cq *cq = &priv->virtqs[i].eqp.cq;
 
 		mlx5_vdpa_queue_complete(cq);
@@ -618,7 +618,7 @@ mlx5_vdpa_qps2rts(struct mlx5_vdpa_event_qp *eqp)
 	return 0;
 }
 
-static int
+int
 mlx5_vdpa_qps2rst2rts(struct mlx5_vdpa_event_qp *eqp)
 {
 	if (mlx5_devx_cmd_modify_qp_state(eqp->fw_qp, MLX5_CMD_OP_QP_2RST,
@@ -638,7 +638,7 @@ mlx5_vdpa_qps2rst2rts(struct mlx5_vdpa_event_qp *eqp)
 
 int
 mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
-	int callfd, struct mlx5_vdpa_virtq *virtq)
+	int callfd, struct mlx5_vdpa_virtq *virtq, bool reset)
 {
 	struct mlx5_vdpa_event_qp *eqp = &virtq->eqp;
 	struct mlx5_devx_qp_attr attr = {0};
@@ -649,11 +649,10 @@ mlx5_vdpa_event_qp_prepare(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 		/* Reuse existing resources. */
 		eqp->cq.callfd = callfd;
 		/* FW will set event qp to error state in q destroy. */
-		if (!mlx5_vdpa_qps2rst2rts(eqp)) {
+		if (reset && !mlx5_vdpa_qps2rst2rts(eqp))
 			rte_write32(rte_cpu_to_be_32(RTE_BIT32(log_desc_n)),
 					&eqp->sw_qp.db_rec[0]);
-			return 0;
-		}
+		return 0;
 	}
 	if (eqp->fw_qp)
 		mlx5_vdpa_event_qp_destroy(eqp);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
index 4cbf09784e..c2e0a17ace 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
@@ -57,7 +57,7 @@ mlx5_vdpa_steer_unset(struct mlx5_vdpa_priv *priv)
  * -1 on error.
  */
 static int
-mlx5_vdpa_rqt_prepare(struct mlx5_vdpa_priv *priv)
+mlx5_vdpa_rqt_prepare(struct mlx5_vdpa_priv *priv, bool is_dummy)
 {
 	int i;
 	uint32_t rqt_n = RTE_MIN(MLX5_VDPA_DEFAULT_RQT_SIZE,
@@ -67,15 +67,20 @@ mlx5_vdpa_rqt_prepare(struct mlx5_vdpa_priv *priv)
 						      sizeof(uint32_t), 0);
 	uint32_t k = 0, j;
 	int ret = 0, num;
+	uint16_t nr_vring = is_dummy ?
+	(((priv->queues * 2) < priv->caps.max_num_virtio_queues) ?
+	(priv->queues * 2) : priv->caps.max_num_virtio_queues) : priv->nr_virtqs;
 
 	if (!attr) {
 		DRV_LOG(ERR, "Failed to allocate RQT attributes memory.");
 		rte_errno = ENOMEM;
 		return -ENOMEM;
 	}
-	for (i = 0; i < priv->nr_virtqs; i++) {
+	for (i = 0; i < nr_vring; i++) {
 		if (is_virtq_recvq(i, priv->nr_virtqs) &&
-		    priv->virtqs[i].enable && priv->virtqs[i].virtq) {
+			(is_dummy || (priv->virtqs[i].enable &&
+			priv->virtqs[i].configured)) &&
+			priv->virtqs[i].virtq) {
 			attr->rq_list[k] = priv->virtqs[i].virtq->id;
 			k++;
 		}
@@ -235,12 +240,12 @@ mlx5_vdpa_rss_flows_create(struct mlx5_vdpa_priv *priv)
 }
 
 int
-mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv)
+mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv, bool is_dummy)
 {
 	int ret;
 
 	pthread_mutex_lock(&priv->steer_update_lock);
-	ret = mlx5_vdpa_rqt_prepare(priv);
+	ret = mlx5_vdpa_rqt_prepare(priv, is_dummy);
 	if (ret == 0) {
 		mlx5_vdpa_steer_unset(priv);
 	} else if (ret < 0) {
@@ -261,7 +266,7 @@ mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv)
 int
 mlx5_vdpa_steer_setup(struct mlx5_vdpa_priv *priv)
 {
-	if (mlx5_vdpa_steer_update(priv))
+	if (mlx5_vdpa_steer_update(priv, false))
 		goto error;
 	return 0;
 error:
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 79d48a6569..58466b3c0b 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -146,10 +146,10 @@ mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 	}
 }
 
-static int
+void
 mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 {
-	int ret = -EAGAIN;
+	int ret;
 
 	mlx5_vdpa_virtq_unregister_intr_handle(virtq);
 	if (virtq->configured) {
@@ -157,12 +157,12 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 		if (ret)
 			DRV_LOG(WARNING, "Failed to stop virtq %d.",
 				virtq->index);
-		virtq->configured = 0;
 		claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
+		virtq->index = 0;
+		virtq->virtq = NULL;
+		virtq->configured = 0;
 	}
-	virtq->virtq = NULL;
 	virtq->notifier_state = MLX5_VDPA_NOTIFIER_STATE_DISABLED;
-	return 0;
 }
 
 void
@@ -175,6 +175,9 @@ mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv)
 		virtq = &priv->virtqs[i];
 		pthread_mutex_lock(&virtq->virtq_lock);
 		mlx5_vdpa_virtq_unset(virtq);
+		if (i < (priv->queues * 2))
+			mlx5_vdpa_virtq_single_resource_prepare(
+					priv, i);
 		pthread_mutex_unlock(&virtq->virtq_lock);
 	}
 	priv->features = 0;
@@ -258,7 +261,8 @@ mlx5_vdpa_hva_to_gpa(struct rte_vhost_memory *mem, uint64_t hva)
 static int
 mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 		struct mlx5_devx_virtq_attr *attr,
-		struct rte_vhost_vring *vq, int index)
+		struct rte_vhost_vring *vq,
+		int index, bool is_prepare)
 {
 	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[index];
 	uint64_t gpa;
@@ -277,11 +281,15 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 			MLX5_VIRTQ_MODIFY_TYPE_Q_MKEY |
 			MLX5_VIRTQ_MODIFY_TYPE_QUEUE_FEATURE_BIT_MASK |
 			MLX5_VIRTQ_MODIFY_TYPE_EVENT_MODE;
-	attr->tso_ipv4 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO4));
-	attr->tso_ipv6 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO6));
-	attr->tx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_CSUM));
-	attr->rx_csum = !!(priv->features & (1ULL << VIRTIO_NET_F_GUEST_CSUM));
-	attr->virtio_version_1_0 =
+	attr->tso_ipv4 = is_prepare ? 1 :
+		!!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO4));
+	attr->tso_ipv6 = is_prepare ? 1 :
+		!!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO6));
+	attr->tx_csum = is_prepare ? 1 :
+		!!(priv->features & (1ULL << VIRTIO_NET_F_CSUM));
+	attr->rx_csum = is_prepare ? 1 :
+		!!(priv->features & (1ULL << VIRTIO_NET_F_GUEST_CSUM));
+	attr->virtio_version_1_0 = is_prepare ? 1 :
 		!!(priv->features & (1ULL << VIRTIO_F_VERSION_1));
 	attr->q_type =
 		(priv->features & (1ULL << VIRTIO_F_RING_PACKED)) ?
@@ -290,12 +298,12 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 	 * No need event QPs creation when the guest in poll mode or when the
 	 * capability allows it.
 	 */
-	attr->event_mode = vq->callfd != -1 ||
+	attr->event_mode = is_prepare || vq->callfd != -1 ||
 	!(priv->caps.event_mode & (1 << MLX5_VIRTQ_EVENT_MODE_NO_MSIX)) ?
 	MLX5_VIRTQ_EVENT_MODE_QP : MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
 	if (attr->event_mode == MLX5_VIRTQ_EVENT_MODE_QP) {
-		ret = mlx5_vdpa_event_qp_prepare(priv,
-				vq->size, vq->callfd, virtq);
+		ret = mlx5_vdpa_event_qp_prepare(priv, vq->size,
+				vq->callfd, virtq, !virtq->virtq);
 		if (ret) {
 			DRV_LOG(ERR,
 				"Failed to create event QPs for virtq %d.",
@@ -320,7 +328,7 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 		attr->counters_obj_id = virtq->counters->id;
 	}
 	/* Setup 3 UMEMs for each virtq. */
-	if (virtq->virtq) {
+	if (!virtq->virtq) {
 		for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
 			uint32_t size;
 			void *buf;
@@ -345,7 +353,7 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 			buf = rte_zmalloc(__func__,
 				size, 4096);
 			if (buf == NULL) {
-				DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
+				DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq."
 				" %u.", i, index);
 				return -1;
 			}
@@ -366,7 +374,7 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 			attr->umems[i].size = virtq->umems[i].size;
 		}
 	}
-	if (attr->q_type == MLX5_VIRTQ_TYPE_SPLIT) {
+	if (!is_prepare && attr->q_type == MLX5_VIRTQ_TYPE_SPLIT) {
 		gpa = mlx5_vdpa_hva_to_gpa(priv->vmem_info.vmem,
 					   (uint64_t)(uintptr_t)vq->desc);
 		if (!gpa) {
@@ -389,21 +397,23 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 		}
 		attr->available_addr = gpa;
 	}
-	ret = rte_vhost_get_vring_base(priv->vid,
+	if (!is_prepare) {
+		ret = rte_vhost_get_vring_base(priv->vid,
 			index, &last_avail_idx, &last_used_idx);
-	if (ret) {
-		last_avail_idx = 0;
-		last_used_idx = 0;
-		DRV_LOG(WARNING, "Couldn't get vring base, idx are set to 0.");
-	} else {
-		DRV_LOG(INFO, "vid %d: Init last_avail_idx=%d, last_used_idx=%d for "
+		if (ret) {
+			last_avail_idx = 0;
+			last_used_idx = 0;
+			DRV_LOG(WARNING, "Couldn't get vring base, idx are set to 0.");
+		} else {
+			DRV_LOG(INFO, "vid %d: Init last_avail_idx=%d, last_used_idx=%d for "
 				"virtq %d.", priv->vid, last_avail_idx,
 				last_used_idx, index);
+		}
 	}
 	attr->hw_available_index = last_avail_idx;
 	attr->hw_used_index = last_used_idx;
 	attr->q_size = vq->size;
-	attr->mkey = priv->gpa_mkey_index;
+	attr->mkey = is_prepare ? 0 : priv->gpa_mkey_index;
 	attr->tis_id = priv->tiss[(index / 2) % priv->num_lag_ports]->id;
 	attr->queue_index = index;
 	attr->pd = priv->cdev->pdn;
@@ -416,6 +426,39 @@ mlx5_vdpa_virtq_sub_objs_prepare(struct mlx5_vdpa_priv *priv,
 	return 0;
 }
 
+bool
+mlx5_vdpa_virtq_single_resource_prepare(struct mlx5_vdpa_priv *priv,
+		int index)
+{
+	struct mlx5_devx_virtq_attr attr = {0};
+	struct mlx5_vdpa_virtq *virtq;
+	struct rte_vhost_vring vq = {
+		.size = priv->queue_size,
+		.callfd = -1,
+	};
+	int ret;
+
+	virtq = &priv->virtqs[index];
+	virtq->index = index;
+	virtq->vq_size = vq.size;
+	virtq->configured = 0;
+	virtq->virtq = NULL;
+	ret = mlx5_vdpa_virtq_sub_objs_prepare(priv, &attr, &vq, index, true);
+	if (ret) {
+		DRV_LOG(ERR,
+		"Cannot prepare setup resource for virtq %d.", index);
+		return true;
+	}
+	if (mlx5_vdpa_is_modify_virtq_supported(priv)) {
+		virtq->virtq =
+		mlx5_devx_cmd_create_virtq(priv->cdev->ctx, &attr);
+		virtq->priv = priv;
+		if (!virtq->virtq)
+			return true;
+	}
+	return false;
+}
+
 bool
 mlx5_vdpa_is_modify_virtq_supported(struct mlx5_vdpa_priv *priv)
 {
@@ -473,7 +516,7 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick)
 	virtq->priv = priv;
 	virtq->stopped = 0;
 	ret = mlx5_vdpa_virtq_sub_objs_prepare(priv, &attr,
-				&vq, index);
+				&vq, index, false);
 	if (ret) {
 		DRV_LOG(ERR, "Failed to setup update virtq attr %d.",
 			index);
@@ -746,7 +789,7 @@ mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable)
 	if (virtq->configured) {
 		virtq->enable = 0;
 		if (is_virtq_recvq(virtq->index, priv->nr_virtqs)) {
-			ret = mlx5_vdpa_steer_update(priv);
+			ret = mlx5_vdpa_steer_update(priv, false);
 			if (ret)
 				DRV_LOG(WARNING, "Failed to disable steering "
 					"for virtq %d.", index);
@@ -761,7 +804,7 @@ mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable)
 		}
 		virtq->enable = 1;
 		if (is_virtq_recvq(virtq->index, priv->nr_virtqs)) {
-			ret = mlx5_vdpa_steer_update(priv);
+			ret = mlx5_vdpa_steer_update(priv, false);
 			if (ret)
 				DRV_LOG(WARNING, "Failed to enable steering "
 					"for virtq %d.", index);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH v4 15/15] vdpa/mlx5: prepare virtqueue resource creation
  2022-06-18  9:02 ` [PATCH v4 00/15] mlx5/vdpa: optimize live migration time Li Zhang
                     ` (13 preceding siblings ...)
  2022-06-18  9:02   ` [PATCH v4 14/15] vdpa/mlx5: add virtq sub-resources creation Li Zhang
@ 2022-06-18  9:02   ` Li Zhang
  2022-06-20 16:30     ` Maxime Coquelin
  2022-06-21  9:29   ` [PATCH v4 00/15] mlx5/vdpa: optimize live migration time Maxime Coquelin
  15 siblings, 1 reply; 137+ messages in thread
From: Li Zhang @ 2022-06-18  9:02 UTC (permalink / raw)
  To: orika, viacheslavo, matan, shahafs
  Cc: dev, thomas, rasland, roniba, maxime.coquelin

Split the virt-queue resource creation between
the configuration threads.
The virt-queue resources also need to be pre-created again
after virtq destruction.
This accelerates the LM process and reduces its time by 30%.
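
The preparation work is spread over the configuration threads using the
same round-robin task dispatch as the other MT tasks in this series. A
simplified, illustrative sketch is below (driver context assumed); unlike
the diff, the queues assigned to the caller thread are handled inline here
and the retry-on-the-caller-thread fallback is omitted.

    /*
     * Illustrative sketch only: spread the virtq preparation over the
     * configuration threads and wait for all tasks to complete.
     */
    static int
    prepare_virtqs_mt(struct mlx5_vdpa_priv *priv, uint32_t max_queues)
    {
        uint32_t remaining = 0, errors = 0, data[1];
        uint32_t i, thrd;

        for (i = 0; i < max_queues; i++) {
            thrd = i % (conf_thread_mng.max_thrds + 1);
            if (thrd == 0) {
                /* Every (N+1)-th queue is prepared by the caller thread. */
                if (mlx5_vdpa_virtq_single_resource_prepare(priv, i))
                    return -1;
                continue;
            }
            data[0] = i;
            if (mlx5_vdpa_task_add(priv, thrd - 1,
                    MLX5_VDPA_TASK_PREPARE_VIRTQ,
                    &remaining, &errors, (void **)&data, 1))
                return -1;
        }
        /* Wait until the worker threads have drained their task rings. */
        if (mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining, &errors, 2000))
            return -1;
        return errors ? -1 : 0;
    }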

Signed-off-by: Li Zhang <lizh@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 doc/guides/rel_notes/release_22_07.rst |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa.c          | 115 +++++++++++++++++++------
 drivers/vdpa/mlx5/mlx5_vdpa.h          |  12 ++-
 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c  |  15 +++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c    | 111 ++++++++++++++++++++----
 5 files changed, 209 insertions(+), 45 deletions(-)

diff --git a/doc/guides/rel_notes/release_22_07.rst b/doc/guides/rel_notes/release_22_07.rst
index 2056cd9ee7..e1a9796e5c 100644
--- a/doc/guides/rel_notes/release_22_07.rst
+++ b/doc/guides/rel_notes/release_22_07.rst
@@ -178,6 +178,7 @@ New Features
 * **Updated Nvidia mlx5 vDPA driver.**
 
   * Added new devargs ``queue_size`` and ``queues`` to allow prior creation of virtq resources.
+  * Added new devarg ``max_conf_threads`` to define the number of internal management threads used to parallelize the device configuration.
 
 
 Removed Items
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index f006a9cd3f..c5d82872c7 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -275,23 +275,18 @@ mlx5_vdpa_wait_dev_close_tasks_done(struct mlx5_vdpa_priv *priv)
 }
 
 static int
-mlx5_vdpa_dev_close(int vid)
+_internal_mlx5_vdpa_dev_close(struct mlx5_vdpa_priv *priv,
+		bool release_resource)
 {
-	struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid);
-	struct mlx5_vdpa_priv *priv =
-		mlx5_vdpa_find_priv_resource_by_vdev(vdev);
 	int ret = 0;
+	int vid = priv->vid;
 
-	if (priv == NULL) {
-		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
-		return -1;
-	}
 	mlx5_vdpa_cqe_event_unset(priv);
 	if (priv->state == MLX5_VDPA_STATE_CONFIGURED) {
 		ret |= mlx5_vdpa_lm_log(priv);
 		priv->state = MLX5_VDPA_STATE_IN_PROGRESS;
 	}
-	if (priv->use_c_thread) {
+	if (priv->use_c_thread && !release_resource) {
 		if (priv->last_c_thrd_idx >=
 			(conf_thread_mng.max_thrds - 1))
 			priv->last_c_thrd_idx = 0;
@@ -315,7 +310,7 @@ mlx5_vdpa_dev_close(int vid)
 	pthread_mutex_lock(&priv->steer_update_lock);
 	mlx5_vdpa_steer_unset(priv);
 	pthread_mutex_unlock(&priv->steer_update_lock);
-	mlx5_vdpa_virtqs_release(priv);
+	mlx5_vdpa_virtqs_release(priv, release_resource);
 	mlx5_vdpa_drain_cq(priv);
 	if (priv->lm_mr.addr)
 		mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
@@ -329,6 +324,24 @@ mlx5_vdpa_dev_close(int vid)
 	return ret;
 }
 
+static int
+mlx5_vdpa_dev_close(int vid)
+{
+	struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid);
+	struct mlx5_vdpa_priv *priv;
+
+	if (!vdev) {
+		DRV_LOG(ERR, "Invalid vDPA device.");
+		return -1;
+	}
+	priv = mlx5_vdpa_find_priv_resource_by_vdev(vdev);
+	if (priv == NULL) {
+		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
+		return -1;
+	}
+	return _internal_mlx5_vdpa_dev_close(priv, false);
+}
+
 static int
 mlx5_vdpa_dev_config(int vid)
 {
@@ -624,11 +637,33 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 		priv->queue_size);
 }
 
+void
+mlx5_vdpa_prepare_virtq_destroy(struct mlx5_vdpa_priv *priv)
+{
+	uint32_t max_queues, index;
+	struct mlx5_vdpa_virtq *virtq;
+
+	if (!priv->queues || !priv->queue_size)
+		return;
+	max_queues = ((priv->queues * 2) < priv->caps.max_num_virtio_queues) ?
+		(priv->queues * 2) : (priv->caps.max_num_virtio_queues);
+	if (mlx5_vdpa_is_modify_virtq_supported(priv))
+		mlx5_vdpa_steer_unset(priv);
+	for (index = 0; index < max_queues; ++index) {
+		virtq = &priv->virtqs[index];
+		if (virtq->virtq) {
+			pthread_mutex_lock(&virtq->virtq_lock);
+			mlx5_vdpa_virtq_unset(virtq);
+			pthread_mutex_unlock(&virtq->virtq_lock);
+		}
+	}
+}
+
 static int
 mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 {
-	uint32_t max_queues;
-	uint32_t index;
+	uint32_t remaining_cnt = 0, err_cnt = 0, task_num = 0;
+	uint32_t max_queues, index, thrd_idx, data[1];
 	struct mlx5_vdpa_virtq *virtq;
 
 	for (index = 0; index < priv->caps.max_num_virtio_queues;
@@ -640,25 +675,53 @@ mlx5_vdpa_virtq_resource_prepare(struct mlx5_vdpa_priv *priv)
 		return 0;
 	max_queues = (priv->queues < priv->caps.max_num_virtio_queues) ?
 		(priv->queues * 2) : (priv->caps.max_num_virtio_queues);
-	for (index = 0; index < max_queues; ++index)
-		if (mlx5_vdpa_virtq_single_resource_prepare(priv,
-			index))
+	if (priv->use_c_thread) {
+		uint32_t main_task_idx[max_queues];
+
+		for (index = 0; index < max_queues; ++index) {
+			thrd_idx = index % (conf_thread_mng.max_thrds + 1);
+			if (!thrd_idx) {
+				main_task_idx[task_num] = index;
+				task_num++;
+				continue;
+			}
+			thrd_idx = priv->last_c_thrd_idx + 1;
+			if (thrd_idx >= conf_thread_mng.max_thrds)
+				thrd_idx = 0;
+			priv->last_c_thrd_idx = thrd_idx;
+			data[0] = index;
+			if (mlx5_vdpa_task_add(priv, thrd_idx,
+				MLX5_VDPA_TASK_PREPARE_VIRTQ,
+				&remaining_cnt, &err_cnt,
+				(void **)&data, 1)) {
+				DRV_LOG(ERR, "Fail to add "
+				"task prepare virtq (%d).", index);
+				main_task_idx[task_num] = index;
+				task_num++;
+			}
+		}
+		for (index = 0; index < task_num; ++index)
+			if (mlx5_vdpa_virtq_single_resource_prepare(priv,
+				main_task_idx[index]))
+				goto error;
+		if (mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining_cnt,
+			&err_cnt, 2000)) {
+			DRV_LOG(ERR,
+			"Failed to wait virt-queue prepare tasks ready.");
 			goto error;
+		}
+	} else {
+		for (index = 0; index < max_queues; ++index)
+			if (mlx5_vdpa_virtq_single_resource_prepare(priv,
+				index))
+				goto error;
+	}
 	if (mlx5_vdpa_is_modify_virtq_supported(priv))
 		if (mlx5_vdpa_steer_update(priv, true))
 			goto error;
 	return 0;
 error:
-	for (index = 0; index < max_queues; ++index) {
-		virtq = &priv->virtqs[index];
-		if (virtq->virtq) {
-			pthread_mutex_lock(&virtq->virtq_lock);
-			mlx5_vdpa_virtq_unset(virtq);
-			pthread_mutex_unlock(&virtq->virtq_lock);
-		}
-	}
-	if (mlx5_vdpa_is_modify_virtq_supported(priv))
-		mlx5_vdpa_steer_unset(priv);
+	mlx5_vdpa_prepare_virtq_destroy(priv);
 	return -1;
 }
 
@@ -860,7 +923,7 @@ static void
 mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv)
 {
 	if (priv->state == MLX5_VDPA_STATE_CONFIGURED)
-		mlx5_vdpa_dev_close(priv->vid);
+		_internal_mlx5_vdpa_dev_close(priv, true);
 	if (priv->use_c_thread)
 		mlx5_vdpa_wait_dev_close_tasks_done(priv);
 	mlx5_vdpa_release_dev_resources(priv);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index f353db62ac..dc4dfba5ed 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -85,6 +85,7 @@ enum mlx5_vdpa_task_type {
 	MLX5_VDPA_TASK_SETUP_VIRTQ,
 	MLX5_VDPA_TASK_STOP_VIRTQ,
 	MLX5_VDPA_TASK_DEV_CLOSE_NOWAIT,
+	MLX5_VDPA_TASK_PREPARE_VIRTQ,
 };
 
 /* Generic task information and size must be multiple of 4B. */
@@ -128,6 +129,9 @@ struct mlx5_vdpa_virtq {
 	uint32_t configured:1;
 	uint32_t enable:1;
 	uint32_t stopped:1;
+	uint32_t rx_csum:1;
+	uint32_t virtio_version_1_0:1;
+	uint32_t event_mode:3;
 	uint32_t version;
 	pthread_mutex_t virtq_lock;
 	struct mlx5_vdpa_priv *priv;
@@ -355,8 +359,12 @@ void mlx5_vdpa_err_event_unset(struct mlx5_vdpa_priv *priv);
  *
  * @param[in] priv
  *   The vdpa driver private structure.
+ * @param[in] release_resource
+ *   The vdpa driver release resource without prepare resource.
  */
-void mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv);
+void
+mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv,
+		bool release_resource);
 
 /**
  * Cleanup cached resources of all virtqs.
@@ -595,4 +603,6 @@ int
 mlx5_vdpa_qps2rst2rts(struct mlx5_vdpa_event_qp *eqp);
 void
 mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq);
+void
+mlx5_vdpa_prepare_virtq_destroy(struct mlx5_vdpa_priv *priv);
 #endif /* RTE_PMD_MLX5_VDPA_H_ */
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
index bb2279440b..6e6624e5a3 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
@@ -153,6 +153,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
 				__atomic_fetch_add(
 					task.err_cnt, 1, __ATOMIC_RELAXED);
 			}
+			virtq->enable = 1;
 			pthread_mutex_unlock(&virtq->virtq_lock);
 			break;
 		case MLX5_VDPA_TASK_STOP_VIRTQ:
@@ -193,7 +194,7 @@ mlx5_vdpa_c_thread_handle(void *arg)
 			pthread_mutex_lock(&priv->steer_update_lock);
 			mlx5_vdpa_steer_unset(priv);
 			pthread_mutex_unlock(&priv->steer_update_lock);
-			mlx5_vdpa_virtqs_release(priv);
+			mlx5_vdpa_virtqs_release(priv, false);
 			mlx5_vdpa_drain_cq(priv);
 			if (priv->lm_mr.addr)
 				mlx5_os_wrapped_mkey_destroy(
@@ -205,6 +206,18 @@ mlx5_vdpa_c_thread_handle(void *arg)
 				&priv->dev_close_progress, 0,
 				__ATOMIC_RELAXED);
 			break;
+		case MLX5_VDPA_TASK_PREPARE_VIRTQ:
+			ret = mlx5_vdpa_virtq_single_resource_prepare(
+					priv, task.idx);
+			if (ret) {
+				DRV_LOG(ERR,
+				"Failed to prepare virtq %d.",
+				task.idx);
+				__atomic_fetch_add(
+				task.err_cnt, 1,
+				__ATOMIC_RELAXED);
+			}
+			break;
 		default:
 			DRV_LOG(ERR, "Invalid vdpa task type %d.",
 			task.type);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 58466b3c0b..06a5c26947 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -116,18 +116,29 @@ mlx5_vdpa_virtq_unreg_intr_handle_all(struct mlx5_vdpa_priv *priv)
 	}
 }
 
+static void
+mlx5_vdpa_vq_destroy(struct mlx5_vdpa_virtq *virtq)
+{
+	/* Clean pre-created resource in dev removal only */
+	claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
+	virtq->index = 0;
+	virtq->virtq = NULL;
+	virtq->configured = 0;
+}
+
 /* Release cached VQ resources. */
 void
 mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
 {
 	unsigned int i, j;
 
+	mlx5_vdpa_steer_unset(priv);
 	for (i = 0; i < priv->caps.max_num_virtio_queues; i++) {
 		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
 
-		if (virtq->index != i)
-			continue;
 		pthread_mutex_lock(&virtq->virtq_lock);
+		if (virtq->virtq)
+			mlx5_vdpa_vq_destroy(virtq);
 		for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
 			if (virtq->umems[j].obj) {
 				claim_zero(mlx5_glue->devx_umem_dereg
@@ -157,29 +168,37 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 		if (ret)
 			DRV_LOG(WARNING, "Failed to stop virtq %d.",
 				virtq->index);
-		claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
-		virtq->index = 0;
-		virtq->virtq = NULL;
-		virtq->configured = 0;
 	}
+	mlx5_vdpa_vq_destroy(virtq);
 	virtq->notifier_state = MLX5_VDPA_NOTIFIER_STATE_DISABLED;
 }
 
 void
-mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv)
+mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv,
+	bool release_resource)
 {
 	struct mlx5_vdpa_virtq *virtq;
-	int i;
-
-	for (i = 0; i < priv->nr_virtqs; i++) {
+	uint32_t i, max_virtq, valid_vq_num;
+
+	valid_vq_num = ((priv->queues * 2) < priv->caps.max_num_virtio_queues) ?
+		(priv->queues * 2) : priv->caps.max_num_virtio_queues;
+	max_virtq = (release_resource &&
+		(valid_vq_num) > priv->nr_virtqs) ?
+		(valid_vq_num) : priv->nr_virtqs;
+	for (i = 0; i < max_virtq; i++) {
 		virtq = &priv->virtqs[i];
 		pthread_mutex_lock(&virtq->virtq_lock);
 		mlx5_vdpa_virtq_unset(virtq);
-		if (i < (priv->queues * 2))
+		virtq->enable = 0;
+		if (!release_resource && i < valid_vq_num)
 			mlx5_vdpa_virtq_single_resource_prepare(
 					priv, i);
 		pthread_mutex_unlock(&virtq->virtq_lock);
 	}
+	if (!release_resource && priv->queues &&
+		mlx5_vdpa_is_modify_virtq_supported(priv))
+		if (mlx5_vdpa_steer_update(priv, true))
+			mlx5_vdpa_steer_unset(priv);
 	priv->features = 0;
 	priv->nr_virtqs = 0;
 }
@@ -455,6 +474,9 @@ mlx5_vdpa_virtq_single_resource_prepare(struct mlx5_vdpa_priv *priv,
 		virtq->priv = priv;
 		if (!virtq->virtq)
 			return true;
+		virtq->rx_csum = attr.rx_csum;
+		virtq->virtio_version_1_0 = attr.virtio_version_1_0;
+		virtq->event_mode = attr.event_mode;
 	}
 	return false;
 }
@@ -538,6 +560,9 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index, bool reg_kick)
 		goto error;
 	}
 	claim_zero(rte_vhost_enable_guest_notification(priv->vid, index, 1));
+	virtq->rx_csum = attr.rx_csum;
+	virtq->virtio_version_1_0 = attr.virtio_version_1_0;
+	virtq->event_mode = attr.event_mode;
 	virtq->configured = 1;
 	rte_spinlock_lock(&priv->db_lock);
 	rte_write32(virtq->index, priv->virtq_db_addr);
@@ -629,6 +654,31 @@ mlx5_vdpa_features_validate(struct mlx5_vdpa_priv *priv)
 	return 0;
 }
 
+static bool
+mlx5_vdpa_is_pre_created_vq_mismatch(struct mlx5_vdpa_priv *priv,
+		struct mlx5_vdpa_virtq *virtq)
+{
+	struct rte_vhost_vring vq;
+	uint32_t event_mode;
+
+	if (virtq->rx_csum !=
+		!!(priv->features & (1ULL << VIRTIO_NET_F_GUEST_CSUM)))
+		return true;
+	if (virtq->virtio_version_1_0 !=
+		!!(priv->features & (1ULL << VIRTIO_F_VERSION_1)))
+		return true;
+	if (rte_vhost_get_vhost_vring(priv->vid, virtq->index, &vq))
+		return true;
+	if (vq.size != virtq->vq_size)
+		return true;
+	event_mode = vq.callfd != -1 || !(priv->caps.event_mode &
+		(1 << MLX5_VIRTQ_EVENT_MODE_NO_MSIX)) ?
+		MLX5_VIRTQ_EVENT_MODE_QP : MLX5_VIRTQ_EVENT_MODE_NO_MSIX;
+	if (virtq->event_mode != event_mode)
+		return true;
+	return false;
+}
+
 int
 mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 {
@@ -664,6 +714,15 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 			virtq = &priv->virtqs[i];
 			if (!virtq->enable)
 				continue;
+			if (priv->queues && virtq->virtq) {
+				if (mlx5_vdpa_is_pre_created_vq_mismatch(priv, virtq)) {
+					mlx5_vdpa_prepare_virtq_destroy(priv);
+					i = 0;
+					virtq = &priv->virtqs[i];
+					if (!virtq->enable)
+						continue;
+				}
+			}
 			thrd_idx = i % (conf_thread_mng.max_thrds + 1);
 			if (!thrd_idx) {
 				main_task_idx[task_num] = i;
@@ -693,6 +752,7 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 				pthread_mutex_unlock(&virtq->virtq_lock);
 				goto error;
 			}
+			virtq->enable = 1;
 			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
 		if (mlx5_vdpa_c_thread_wait_bulk_tasks_done(&remaining_cnt,
@@ -724,20 +784,32 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 	} else {
 		for (i = 0; i < nr_vring; i++) {
 			virtq = &priv->virtqs[i];
+			if (!virtq->enable)
+				continue;
+			if (priv->queues && virtq->virtq) {
+				if (mlx5_vdpa_is_pre_created_vq_mismatch(priv,
+					virtq)) {
+					mlx5_vdpa_prepare_virtq_destroy(
+					priv);
+					i = 0;
+					virtq = &priv->virtqs[i];
+					if (!virtq->enable)
+						continue;
+				}
+			}
 			pthread_mutex_lock(&virtq->virtq_lock);
-			if (virtq->enable) {
-				if (mlx5_vdpa_virtq_setup(priv, i, true)) {
-					pthread_mutex_unlock(
+			if (mlx5_vdpa_virtq_setup(priv, i, true)) {
+				pthread_mutex_unlock(
 						&virtq->virtq_lock);
-					goto error;
-				}
+				goto error;
 			}
+			virtq->enable = 1;
 			pthread_mutex_unlock(&virtq->virtq_lock);
 		}
 	}
 	return 0;
 error:
-	mlx5_vdpa_virtqs_release(priv);
+	mlx5_vdpa_virtqs_release(priv, true);
 	return -1;
 }
 
@@ -795,6 +867,11 @@ mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable)
 					"for virtq %d.", index);
 		}
 		mlx5_vdpa_virtq_unset(virtq);
+	} else {
+		if (virtq->virtq &&
+			mlx5_vdpa_is_pre_created_vq_mismatch(priv, virtq))
+			DRV_LOG(WARNING,
+			"Configuration mismatch dummy virtq %d.", index);
 	}
 	if (enable) {
 		ret = mlx5_vdpa_virtq_setup(priv, index, true);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH v4 04/15] vdpa/mlx5: support event qp reuse
  2022-06-18  9:02   ` [PATCH v4 04/15] vdpa/mlx5: support event qp reuse Li Zhang
@ 2022-06-20  8:27     ` Maxime Coquelin
  0 siblings, 0 replies; 137+ messages in thread
From: Maxime Coquelin @ 2022-06-20  8:27 UTC (permalink / raw)
  To: Li Zhang, orika, viacheslavo, matan, shahafs
  Cc: dev, thomas, rasland, roniba, Yajun Wu



On 6/18/22 11:02, Li Zhang wrote:
> From: Yajun Wu <yajunw@nvidia.com>
> 
> To speed up queue creation time, the event qp and cq are created only once.
> Each virtq creation reuses the same event qp and cq.
> 
> Because FW sets the event qp to error state during virtq destroy, the
> event qp needs to be modified to RESET state and then to RTS state as
> usual. This can save about 1.5ms for each virtq creation.
> 
> After a SW qp reset, the qp pi/ci both become 0 while the cq pi/ci keep
> their previous values. Add a new variable qp_ci to save the SW qp ci and
> move the qp pi independently of the cq ci.
> 
> Add a new function mlx5_vdpa_drain_cq to drain the cq CQEs after virtq
> release.
> 
> Signed-off-by: Yajun Wu <yajunw@nvidia.com>
> Acked-by: Matan Azrad <matan@nvidia.com>
> ---
>   drivers/vdpa/mlx5/mlx5_vdpa.c       |  8 ++++
>   drivers/vdpa/mlx5/mlx5_vdpa.h       | 12 +++++-
>   drivers/vdpa/mlx5/mlx5_vdpa_event.c | 60 +++++++++++++++++++++++++++--
>   drivers/vdpa/mlx5/mlx5_vdpa_virtq.c |  6 +--
>   4 files changed, 78 insertions(+), 8 deletions(-)
> 

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH v4 05/15] common/mlx5: extend virtq modifiable fields
  2022-06-18  9:02   ` [PATCH v4 05/15] common/mlx5: extend virtq modifiable fields Li Zhang
@ 2022-06-20  9:01     ` Maxime Coquelin
  0 siblings, 0 replies; 137+ messages in thread
From: Maxime Coquelin @ 2022-06-20  9:01 UTC (permalink / raw)
  To: Li Zhang, orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba



On 6/18/22 11:02, Li Zhang wrote:
> A virtq configuration can be modified after the virtq creation.
> Added the following modifiable fields:
> 1.address fields: desc_addr/used_addr/available_addr
> 2.hw_available_index
> 3.hw_used_index
> 4.virtio_q_type
> 5.version type
> 6.queue mkey
> 7.feature bit mask: tso_ipv4/tso_ipv6/tx_csum/rx_csum
> 8.event mode: event_mode/event_qpn_or_msix
> 
> Signed-off-by: Li Zhang <lizh@nvidia.com>
> Acked-by: Matan Azrad <matan@nvidia.com>
> ---
>   drivers/common/mlx5/mlx5_devx_cmds.c | 70 +++++++++++++++++++++++-----
>   drivers/common/mlx5/mlx5_devx_cmds.h |  6 ++-
>   drivers/common/mlx5/mlx5_prm.h       | 13 +++++-
>   3 files changed, 76 insertions(+), 13 deletions(-)
> 

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH v4 07/15] vdpa/mlx5: optimize datapath-control synchronization
  2022-06-18  9:02   ` [PATCH v4 07/15] vdpa/mlx5: optimize datapath-control synchronization Li Zhang
@ 2022-06-20  9:25     ` Maxime Coquelin
  0 siblings, 0 replies; 137+ messages in thread
From: Maxime Coquelin @ 2022-06-20  9:25 UTC (permalink / raw)
  To: Li Zhang, orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba



On 6/18/22 11:02, Li Zhang wrote:
> The driver used a single global lock for any synchronization
> needed for the datapath and control path.
> It is better to group the critical sections with
> the other ones that should be synchronized.
> 
> Replace the global lock with the following locks:
> 
> 1.virtq locks(per virtq) synchronize datapath polling and
>    parallel configurations on the same virtq.
> 2.A doorbell lock synchronizes doorbell update,
>    which is shared for all the virtqs in the device.
> 3.A steering lock for the shared steering objects updates.
> 
> Signed-off-by: Li Zhang <lizh@nvidia.com>
> Acked-by: Matan Azrad <matan@nvidia.com>
> ---
>   drivers/vdpa/mlx5/mlx5_vdpa.c       | 24 ++++---
>   drivers/vdpa/mlx5/mlx5_vdpa.h       | 13 ++--
>   drivers/vdpa/mlx5/mlx5_vdpa_event.c | 97 ++++++++++++++++++-----------
>   drivers/vdpa/mlx5/mlx5_vdpa_lm.c    | 36 ++++++++---
>   drivers/vdpa/mlx5/mlx5_vdpa_steer.c |  7 ++-
>   drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 88 +++++++++++++++++++-------
>   6 files changed, 186 insertions(+), 79 deletions(-)
> 

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH v4 08/15] vdpa/mlx5: add multi-thread management for configuration
  2022-06-18  9:02   ` [PATCH v4 08/15] vdpa/mlx5: add multi-thread management for configuration Li Zhang
@ 2022-06-20 10:57     ` Maxime Coquelin
  0 siblings, 0 replies; 137+ messages in thread
From: Maxime Coquelin @ 2022-06-20 10:57 UTC (permalink / raw)
  To: Li Zhang, orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba



On 6/18/22 11:02, Li Zhang wrote:
> The LM process includes a lot of object creations and
> destructions in the source and the destination servers.
> As the LM time increases, the packet drop of the VM increases.
> To improve the LM time, the mlx5 FW configurations need to be done in parallel.
> Add internal multi-thread management in the driver for it.
> 
> A new devarg defines the number of threads and their CPU.
> The management is shared between all the devices of the driver.
> Since the event_core also affects the datapath events thread,
> reduce the priority of the datapath event thread to
> allow fast configuration of the devices doing the LM.
> 
> Signed-off-by: Li Zhang <lizh@nvidia.com>
> Acked-by: Matan Azrad <matan@nvidia.com>
> ---
>   doc/guides/vdpadevs/mlx5.rst          |  11 +++
>   drivers/vdpa/mlx5/meson.build         |   1 +
>   drivers/vdpa/mlx5/mlx5_vdpa.c         |  41 ++++++++
>   drivers/vdpa/mlx5/mlx5_vdpa.h         |  36 +++++++
>   drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 129 ++++++++++++++++++++++++++
>   drivers/vdpa/mlx5/mlx5_vdpa_event.c   |   2 +-
>   drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   |   8 +-
>   7 files changed, 223 insertions(+), 5 deletions(-)
>   create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
> 

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH v4 09/15] vdpa/mlx5: add task ring for MT management
  2022-06-18  9:02   ` [PATCH v4 09/15] vdpa/mlx5: add task ring for MT management Li Zhang
@ 2022-06-20 15:05     ` Maxime Coquelin
  0 siblings, 0 replies; 137+ messages in thread
From: Maxime Coquelin @ 2022-06-20 15:05 UTC (permalink / raw)
  To: Li Zhang, orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba



On 6/18/22 11:02, Li Zhang wrote:
> The configuration thread tasks need a container to
> support multiple tasks assigned to a thread in parallel.
> Use an rte_ring container per thread to manage
> the thread tasks without locks.
> The caller thread from the user context opens a task to
> a thread and enqueues it to the thread ring.
> The thread polls its ring and dequeues tasks.
> That’s why the ring should be in multi-producer
> and single consumer mode.
> An atomic counter manages the task completion notifications.
> The threads report errors to the caller by
> a dedicated error counter per task.
> 
> Signed-off-by: Li Zhang <lizh@nvidia.com>
> Acked-by: Matan Azrad <matan@nvidia.com>
> ---
>   drivers/vdpa/mlx5/mlx5_vdpa.h         |  17 ++++
>   drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 115 +++++++++++++++++++++++++-
>   2 files changed, 130 insertions(+), 2 deletions(-)
> 

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH v4 10/15] vdpa/mlx5: add MT task for VM memory registration
  2022-06-18  9:02   ` [PATCH v4 10/15] vdpa/mlx5: add MT task for VM memory registration Li Zhang
@ 2022-06-20 15:12     ` Maxime Coquelin
  0 siblings, 0 replies; 137+ messages in thread
From: Maxime Coquelin @ 2022-06-20 15:12 UTC (permalink / raw)
  To: Li Zhang, orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba



On 6/18/22 11:02, Li Zhang wrote:
> The driver creates a direct MR object of
> the HW for each VM memory region,
> which maps the VM physical address to
> the actual physical address.
> 
> Later, after all the MRs are ready,
> the driver creates an indirect MR to group all the direct MRs
> into one virtual space from the HW perspective.
> 
> Create direct MRs in parallel using the MT mechanism.
> After completion, the primary thread creates the indirect MR
> needed for the following virtqs configurations.
> 
> This optimization accelerates the LM process and
> reduces its time by 5%.
> 
> Signed-off-by: Li Zhang <lizh@nvidia.com>
> Acked-by: Matan Azrad <matan@nvidia.com>
> ---
>   drivers/vdpa/mlx5/mlx5_vdpa.c         |   1 -
>   drivers/vdpa/mlx5/mlx5_vdpa.h         |  31 ++-
>   drivers/vdpa/mlx5/mlx5_vdpa_cthread.c |  47 ++++-
>   drivers/vdpa/mlx5/mlx5_vdpa_mem.c     | 270 ++++++++++++++++++--------
>   drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   |   6 +-
>   5 files changed, 258 insertions(+), 97 deletions(-)
> 

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH v4 11/15] vdpa/mlx5: add virtq creation task for MT management
  2022-06-18  9:02   ` [PATCH v4 11/15] vdpa/mlx5: add virtq creation task for MT management Li Zhang
@ 2022-06-20 15:19     ` Maxime Coquelin
  0 siblings, 0 replies; 137+ messages in thread
From: Maxime Coquelin @ 2022-06-20 15:19 UTC (permalink / raw)
  To: Li Zhang, orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba



On 6/18/22 11:02, Li Zhang wrote:
> The virtq object and all its sub-resources use a lot of
> FW commands and can be accelerated by the MT management.
> Split the virtqs creation between the configuration threads.
> This accelerates the LM process and reduces its time by 20%.
> 
> Signed-off-by: Li Zhang <lizh@nvidia.com>
> Acked-by: Matan Azrad <matan@nvidia.com>
> ---
>   drivers/vdpa/mlx5/mlx5_vdpa.h         |   9 +-
>   drivers/vdpa/mlx5/mlx5_vdpa_cthread.c |  14 +++
>   drivers/vdpa/mlx5/mlx5_vdpa_event.c   |   2 +-
>   drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 149 +++++++++++++++++++-------
>   4 files changed, 134 insertions(+), 40 deletions(-)
> 

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH v4 12/15] vdpa/mlx5: add virtq LM log task
  2022-06-18  9:02   ` [PATCH v4 12/15] vdpa/mlx5: add virtq LM log task Li Zhang
@ 2022-06-20 15:42     ` Maxime Coquelin
  0 siblings, 0 replies; 137+ messages in thread
From: Maxime Coquelin @ 2022-06-20 15:42 UTC (permalink / raw)
  To: Li Zhang, orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba



On 6/18/22 11:02, Li Zhang wrote:
> Split the virtqs LM log between the configuration threads.
> This accelerates the LM process and reduces its time by 20%.
> 
> Signed-off-by: Li Zhang <lizh@nvidia.com>
> Acked-by: Matan Azrad <matan@nvidia.com>
> ---
>   drivers/vdpa/mlx5/mlx5_vdpa.h         |  3 +
>   drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 34 +++++++++++
>   drivers/vdpa/mlx5/mlx5_vdpa_lm.c      | 85 +++++++++++++++++++++------
>   3 files changed, 105 insertions(+), 17 deletions(-)
> 

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH v4 13/15] vdpa/mlx5: add device close task
  2022-06-18  9:02   ` [PATCH v4 13/15] vdpa/mlx5: add device close task Li Zhang
@ 2022-06-20 15:54     ` Maxime Coquelin
  0 siblings, 0 replies; 137+ messages in thread
From: Maxime Coquelin @ 2022-06-20 15:54 UTC (permalink / raw)
  To: Li Zhang, orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba



On 6/18/22 11:02, Li Zhang wrote:
> Split the virtq device close tasks, performed after
> the virt-queues are stopped, between the configuration threads.
> This accelerates the LM process and
> reduces its time by 50%.
> 
> Signed-off-by: Li Zhang <lizh@nvidia.com>
> Acked-by: Matan Azrad <matan@nvidia.com>
> ---
>   drivers/vdpa/mlx5/mlx5_vdpa.c         | 56 +++++++++++++++++++++++++--
>   drivers/vdpa/mlx5/mlx5_vdpa.h         |  8 ++++
>   drivers/vdpa/mlx5/mlx5_vdpa_cthread.c | 20 +++++++++-
>   drivers/vdpa/mlx5/mlx5_vdpa_virtq.c   | 14 +++++++
>   4 files changed, 94 insertions(+), 4 deletions(-)
> 

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH v4 14/15] vdpa/mlx5: add virtq sub-resources creation
  2022-06-18  9:02   ` [PATCH v4 14/15] vdpa/mlx5: add virtq sub-resources creation Li Zhang
@ 2022-06-20 16:01     ` Maxime Coquelin
  0 siblings, 0 replies; 137+ messages in thread
From: Maxime Coquelin @ 2022-06-20 16:01 UTC (permalink / raw)
  To: Li Zhang, orika, viacheslavo, matan, shahafs
  Cc: dev, thomas, rasland, roniba, Yajun Wu



On 6/18/22 11:02, Li Zhang wrote:
> Pre-create the virt-queue sub-resources in the device probe stage
> and then only modify the virtqueue in the device config stage.
> The steering table also needs to support dummy virt-queues.
> This accelerates the LM process and reduces its time by 40%.
> 
> Signed-off-by: Li Zhang <lizh@nvidia.com>
> Signed-off-by: Yajun Wu <yajunw@nvidia.com>
> Acked-by: Matan Azrad <matan@nvidia.com>
> ---
>   drivers/vdpa/mlx5/mlx5_vdpa.c       | 72 +++++++--------------
>   drivers/vdpa/mlx5/mlx5_vdpa.h       | 17 +++--
>   drivers/vdpa/mlx5/mlx5_vdpa_event.c | 11 ++--
>   drivers/vdpa/mlx5/mlx5_vdpa_steer.c | 17 +++--
>   drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 99 +++++++++++++++++++++--------
>   5 files changed, 123 insertions(+), 93 deletions(-)
> 

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH v4 15/15] vdpa/mlx5: prepare virtqueue resource creation
  2022-06-18  9:02   ` [PATCH v4 15/15] vdpa/mlx5: prepare virtqueue resource creation Li Zhang
@ 2022-06-20 16:30     ` Maxime Coquelin
  0 siblings, 0 replies; 137+ messages in thread
From: Maxime Coquelin @ 2022-06-20 16:30 UTC (permalink / raw)
  To: Li Zhang, orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba



On 6/18/22 11:02, Li Zhang wrote:
> Split the virt-queue resource creation between
> the configuration threads.
> The virt-queue resources also need to be pre-created again
> after virtq destruction.
> This accelerates the LM process and reduces its time by 30%.
> 
> Signed-off-by: Li Zhang <lizh@nvidia.com>
> Acked-by: Matan Azrad <matan@nvidia.com>
> ---
>   doc/guides/rel_notes/release_22_07.rst |   1 +
>   drivers/vdpa/mlx5/mlx5_vdpa.c          | 115 +++++++++++++++++++------
>   drivers/vdpa/mlx5/mlx5_vdpa.h          |  12 ++-
>   drivers/vdpa/mlx5/mlx5_vdpa_cthread.c  |  15 +++-
>   drivers/vdpa/mlx5/mlx5_vdpa_virtq.c    | 111 ++++++++++++++++++++----
>   5 files changed, 209 insertions(+), 45 deletions(-)
> 

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH v4 00/15] mlx5/vdpa: optimize live migration time
  2022-06-18  9:02 ` [PATCH v4 00/15] mlx5/vdpa: optimize live migration time Li Zhang
                     ` (14 preceding siblings ...)
  2022-06-18  9:02   ` [PATCH v4 15/15] vdpa/mlx5: prepare virtqueue resource creation Li Zhang
@ 2022-06-21  9:29   ` Maxime Coquelin
  15 siblings, 0 replies; 137+ messages in thread
From: Maxime Coquelin @ 2022-06-21  9:29 UTC (permalink / raw)
  To: Li Zhang, orika, viacheslavo, matan, shahafs; +Cc: dev, thomas, rasland, roniba



On 6/18/22 11:02, Li Zhang wrote:
> Allow the driver to use internal threads to
> obtain fast configuration.
> All the threads will be open on the same core of
> the event completion queue scheduling thread.
> 
> Add max_conf_threads parameter to configure
> the maximum number of internal threads in addition to
> the caller thread (8 is suggested).
> These internal threads to pipeline handle VDPA tasks
> in system and shared with all VDPA devices.
> Default is 0, don't use internal threads for configuration.
> 
> Depends-on: series=21868 ("vdpa/mlx5: improve device shutdown time")
> http://patchwork.dpdk.org/project/dpdk/list/?series=21868
> 
> RFC ("Add vDPA multi-threads optiomization")
> https://patchwork.dpdk.org/project/dpdk/cover/20220408075606.33056-1-lizh@nvidia.com/
> 
> V2:
> * Drop eal device removal patch in series.
> * Add release note in release_22_07.rst.
> 
> V3:
> * Fix comments about commit log issue.
> * Avoid cutting logs.
> 
> V4:
> * Fix coding style issue
> 
> Li Zhang (12):
>    vdpa/mlx5: fix usage of capability for max number of virtqs
>    common/mlx5: extend virtq modifiable fields
>    vdpa/mlx5: pre-create virtq at probe time
>    vdpa/mlx5: optimize datapath-control synchronization
>    vdpa/mlx5: add multi-thread management for configuration
>    vdpa/mlx5: add task ring for MT management
>    vdpa/mlx5: add MT task for VM memory registration
>    vdpa/mlx5: add virtq creation task for MT management
>    vdpa/mlx5: add virtq LM log task
>    vdpa/mlx5: add device close task
>    vdpa/mlx5: add virtq sub-resources creation
>    vdpa/mlx5: prepare virtqueue resource creation
> 
> Yajun Wu (3):
>    vdpa/mlx5: support pre create virtq resource
>    common/mlx5: add DevX API to move QP to reset state
>    vdpa/mlx5: support event qp reuse
> 
>   doc/guides/rel_notes/release_22_07.rst |   5 +
>   doc/guides/vdpadevs/mlx5.rst           |  25 +
>   drivers/common/mlx5/mlx5_devx_cmds.c   |  77 ++-
>   drivers/common/mlx5/mlx5_devx_cmds.h   |   6 +-
>   drivers/common/mlx5/mlx5_prm.h         |  30 +-
>   drivers/vdpa/mlx5/meson.build          |   1 +
>   drivers/vdpa/mlx5/mlx5_vdpa.c          | 270 ++++++++--
>   drivers/vdpa/mlx5/mlx5_vdpa.h          | 152 +++++-
>   drivers/vdpa/mlx5/mlx5_vdpa_cthread.c  | 360 ++++++++++++++
>   drivers/vdpa/mlx5/mlx5_vdpa_event.c    | 160 ++++--
>   drivers/vdpa/mlx5/mlx5_vdpa_lm.c       | 134 +++--
>   drivers/vdpa/mlx5/mlx5_vdpa_mem.c      | 270 ++++++----
>   drivers/vdpa/mlx5/mlx5_vdpa_steer.c    |  22 +-
>   drivers/vdpa/mlx5/mlx5_vdpa_virtq.c    | 654 ++++++++++++++++++-------
>   14 files changed, 1779 insertions(+), 387 deletions(-)
>   create mode 100644 drivers/vdpa/mlx5/mlx5_vdpa_cthread.c
> 

Applied to dpdk-next-virtio/main.

Thanks,
Maxime
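
As a rough illustration of the scheme in the cover letter quoted above, the
following pthread sketch shows a pool of internal configuration threads
consuming tasks from a shared ring, with 0 threads meaning the caller does all
the work itself. It is not the driver's implementation (the real one has its
own task ring and pins the threads to the event thread's core, both omitted
here), and all identifiers below are hypothetical.

/* Hypothetical sketch of a configuration-task ring served by internal threads. */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

#define RING_SZ 64

struct conf_task { int dev_id; int vq_idx; };

static struct conf_task ring[RING_SZ];
static unsigned int head, tail;          /* produce at head, consume at tail */
static bool stop;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

/* Queue one task; this toy ring assumes it never holds more than RING_SZ. */
static void task_submit(struct conf_task t)
{
        pthread_mutex_lock(&lock);
        ring[head++ % RING_SZ] = t;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
}

static void *conf_thread(void *arg)
{
        (void)arg;
        for (;;) {
                struct conf_task t;

                pthread_mutex_lock(&lock);
                while (head == tail && !stop)
                        pthread_cond_wait(&cond, &lock);
                if (head == tail) {      /* ring drained and stop requested */
                        pthread_mutex_unlock(&lock);
                        return NULL;
                }
                t = ring[tail++ % RING_SZ];
                pthread_mutex_unlock(&lock);
                /* Handle one configuration task, e.g. create one virtq. */
                printf("dev %d: configure virtq %d\n", t.dev_id, t.vq_idx);
        }
}

int main(void)
{
        int max_conf_threads = 8;        /* 0 would mean "caller thread only" */
        pthread_t th[8];
        int i, vq;

        for (i = 0; i < max_conf_threads; i++)
                pthread_create(&th[i], NULL, conf_thread, NULL);
        for (vq = 0; vq < 16; vq++)
                task_submit((struct conf_task){ .dev_id = 0, .vq_idx = vq });

        pthread_mutex_lock(&lock);       /* signal shutdown after all tasks */
        stop = true;
        pthread_cond_broadcast(&cond);
        pthread_mutex_unlock(&lock);
        for (i = 0; i < max_conf_threads; i++)
                pthread_join(th[i], NULL);
        return 0;
}

A real max_conf_threads value would come from the device arguments at probe
time; here it is simply hard-coded to the suggested value of 8.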


^ permalink raw reply	[flat|nested] 137+ messages in thread

end of thread

Thread overview: 137+ messages
2022-04-08  7:55 [RFC 00/15] Add vDPA multi-threads optiomization Li Zhang
2022-04-08  7:55 ` [RFC 01/15] examples/vdpa: fix vDPA device remove Li Zhang
2022-04-08  7:55 ` [RFC 02/15] vdpa/mlx5: support pre create virtq resource Li Zhang
2022-04-08  7:55 ` [RFC 03/15] common/mlx5: add DevX API to move QP to reset state Li Zhang
2022-04-08  7:55 ` [RFC 04/15] vdpa/mlx5: support event qp reuse Li Zhang
2022-04-08  7:55 ` [RFC 05/15] common/mlx5: extend virtq modifiable fields Li Zhang
2022-04-08  7:55 ` [RFC 06/15] vdpa/mlx5: pre-create virtq in the prob Li Zhang
2022-04-08  7:55 ` [RFC 07/15] vdpa/mlx5: optimize datapath-control synchronization Li Zhang
2022-04-08  7:55 ` [RFC 08/15] vdpa/mlx5: add multi-thread management for configuration Li Zhang
2022-04-08  7:55 ` [RFC 09/15] vdpa/mlx5: add task ring for MT management Li Zhang
2022-04-08  7:56 ` [RFC 10/15] vdpa/mlx5: add MT task for VM memory registration Li Zhang
2022-04-08  7:56 ` [RFC 11/15] vdpa/mlx5: add virtq creation task for MT management Li Zhang
2022-04-08  7:56 ` [RFC 12/15] vdpa/mlx5: add virtq LM log task Li Zhang
2022-04-08  7:56 ` [RFC 13/15] vdpa/mlx5: add device close task Li Zhang
2022-04-08  7:56 ` [RFC 14/15] vdpa/mlx5: add virtq sub-resources creation Li Zhang
2022-04-08  7:56 ` [RFC 15/15] vdpa/mlx5: prepare virtqueue resource creation Li Zhang
2022-06-06 11:20 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
2022-06-06 11:20   ` [PATCH v1 01/17] vdpa/mlx5: fix usage of capability for max number of virtqs Li Zhang
2022-06-06 11:20   ` [PATCH v1 02/17] eal: add device removal in rte cleanup Li Zhang
2022-06-06 11:20   ` [PATCH 02/16] examples/vdpa: fix vDPA device remove Li Zhang
2022-06-06 11:20   ` [PATCH v1 03/17] examples/vdpa: fix devices cleanup Li Zhang
2022-06-06 11:20   ` [PATCH 03/16] vdpa/mlx5: support pre create virtq resource Li Zhang
2022-06-06 11:20   ` [PATCH 04/16] common/mlx5: add DevX API to move QP to reset state Li Zhang
2022-06-06 11:20   ` [PATCH v1 04/17] vdpa/mlx5: support pre create virtq resource Li Zhang
2022-06-06 11:20   ` [PATCH v1 05/17] common/mlx5: add DevX API to move QP to reset state Li Zhang
2022-06-06 11:20   ` [PATCH 05/16] vdpa/mlx5: support event qp reuse Li Zhang
2022-06-06 11:20   ` [PATCH 06/16] common/mlx5: extend virtq modifiable fields Li Zhang
2022-06-06 11:20   ` [PATCH v1 06/17] vdpa/mlx5: support event qp reuse Li Zhang
2022-06-06 11:20   ` [PATCH v1 07/17] common/mlx5: extend virtq modifiable fields Li Zhang
2022-06-06 11:20   ` [PATCH 07/16] vdpa/mlx5: pre-create virtq in the prob Li Zhang
2022-06-06 11:20   ` [PATCH 08/16] vdpa/mlx5: optimize datapath-control synchronization Li Zhang
2022-06-06 11:20   ` [PATCH v1 08/17] vdpa/mlx5: pre-create virtq in the prob Li Zhang
2022-06-06 11:20   ` [PATCH 09/16] vdpa/mlx5: add multi-thread management for configuration Li Zhang
2022-06-06 11:20   ` [PATCH v1 09/17] vdpa/mlx5: optimize datapath-control synchronization Li Zhang
2022-06-06 11:20   ` [PATCH v1 10/17] vdpa/mlx5: add multi-thread management for configuration Li Zhang
2022-06-06 11:20   ` [PATCH 10/16] vdpa/mlx5: add task ring for MT management Li Zhang
2022-06-06 11:20   ` [PATCH 11/16] vdpa/mlx5: add MT task for VM memory registration Li Zhang
2022-06-06 11:20   ` [PATCH v1 11/17] vdpa/mlx5: add task ring for MT management Li Zhang
2022-06-06 11:20   ` [PATCH v1 12/17] vdpa/mlx5: add MT task for VM memory registration Li Zhang
2022-06-06 11:21   ` [PATCH 12/16] vdpa/mlx5: add virtq creation task for MT management Li Zhang
2022-06-06 11:21   ` [PATCH 13/16] vdpa/mlx5: add virtq LM log task Li Zhang
2022-06-06 11:21   ` [PATCH v1 13/17] vdpa/mlx5: add virtq creation task for MT management Li Zhang
2022-06-06 11:21   ` [PATCH 14/16] vdpa/mlx5: add device close task Li Zhang
2022-06-06 11:21   ` [PATCH v1 14/17] vdpa/mlx5: add virtq LM log task Li Zhang
2022-06-06 11:21   ` [PATCH v1 15/17] vdpa/mlx5: add device close task Li Zhang
2022-06-06 11:21   ` [PATCH 15/16] vdpa/mlx5: add virtq sub-resources creation Li Zhang
2022-06-06 11:21   ` [PATCH v1 16/17] " Li Zhang
2022-06-06 11:21   ` [PATCH 16/16] vdpa/mlx5: prepare virtqueue resource creation Li Zhang
2022-06-06 11:21   ` [PATCH v1 17/17] " Li Zhang
2022-06-06 11:46 ` [PATCH v1 00/17] Add vDPA multi-threads optiomization Li Zhang
2022-06-06 11:46   ` [PATCH v1 01/17] vdpa/mlx5: fix usage of capability for max number of virtqs Li Zhang
2022-06-06 11:46   ` [PATCH v1 02/17] eal: add device removal in rte cleanup Li Zhang
2022-06-06 11:46   ` [PATCH v1 03/17] examples/vdpa: fix devices cleanup Li Zhang
2022-06-06 11:46   ` [PATCH v1 04/17] vdpa/mlx5: support pre create virtq resource Li Zhang
2022-06-06 11:46   ` [PATCH v1 05/17] common/mlx5: add DevX API to move QP to reset state Li Zhang
2022-06-06 11:46   ` [PATCH v1 06/17] vdpa/mlx5: support event qp reuse Li Zhang
2022-06-06 11:46   ` [PATCH v1 07/17] common/mlx5: extend virtq modifiable fields Li Zhang
2022-06-06 11:46   ` [PATCH v1 08/17] vdpa/mlx5: pre-create virtq in the prob Li Zhang
2022-06-06 11:46   ` [PATCH v1 09/17] vdpa/mlx5: optimize datapath-control synchronization Li Zhang
2022-06-06 11:46   ` [PATCH v1 10/17] vdpa/mlx5: add multi-thread management for configuration Li Zhang
2022-06-06 11:46   ` [PATCH v1 11/17] vdpa/mlx5: add task ring for MT management Li Zhang
2022-06-06 11:46   ` [PATCH v1 12/17] vdpa/mlx5: add MT task for VM memory registration Li Zhang
2022-06-06 11:46   ` [PATCH v1 13/17] vdpa/mlx5: add virtq creation task for MT management Li Zhang
2022-06-06 11:46   ` [PATCH v1 14/17] vdpa/mlx5: add virtq LM log task Li Zhang
2022-06-06 11:46   ` [PATCH v1 15/17] vdpa/mlx5: add device close task Li Zhang
2022-06-06 11:46   ` [PATCH v1 16/17] vdpa/mlx5: add virtq sub-resources creation Li Zhang
2022-06-06 11:46   ` [PATCH v1 17/17] vdpa/mlx5: prepare virtqueue resource creation Li Zhang
2022-06-16  2:29 ` [PATCH v2 00/15] mlx5/vdpa: optimize live migration time Li Zhang
2022-06-16  2:29   ` [PATCH v2 01/15] vdpa/mlx5: fix usage of capability for max number of virtqs Li Zhang
2022-06-17 14:27     ` Maxime Coquelin
2022-06-16  2:29   ` [PATCH v2 02/15] vdpa/mlx5: support pre create virtq resource Li Zhang
2022-06-17 15:36     ` Maxime Coquelin
2022-06-18  8:04       ` Li Zhang
2022-06-16  2:30   ` [PATCH v2 03/15] common/mlx5: add DevX API to move QP to reset state Li Zhang
2022-06-17 15:41     ` Maxime Coquelin
2022-06-16  2:30   ` [PATCH v2 04/15] vdpa/mlx5: support event qp reuse Li Zhang
2022-06-16  2:30   ` [PATCH v2 05/15] common/mlx5: extend virtq modifiable fields Li Zhang
2022-06-17 15:45     ` Maxime Coquelin
2022-06-16  2:30   ` [PATCH v2 06/15] vdpa/mlx5: pre-create virtq in the prob Li Zhang
2022-06-17 15:53     ` Maxime Coquelin
2022-06-18  7:54       ` Li Zhang
2022-06-16  2:30   ` [PATCH v2 07/15] vdpa/mlx5: optimize datapath-control synchronization Li Zhang
2022-06-16  2:30   ` [PATCH v2 08/15] vdpa/mlx5: add multi-thread management for configuration Li Zhang
2022-06-16  2:30   ` [PATCH v2 09/15] vdpa/mlx5: add task ring for MT management Li Zhang
2022-06-16  2:30   ` [PATCH v2 10/15] vdpa/mlx5: add MT task for VM memory registration Li Zhang
2022-06-16  2:30   ` [PATCH v2 11/15] vdpa/mlx5: add virtq creation task for MT management Li Zhang
2022-06-16  2:30   ` [PATCH v2 12/15] vdpa/mlx5: add virtq LM log task Li Zhang
2022-06-16  2:30   ` [PATCH v2 13/15] vdpa/mlx5: add device close task Li Zhang
2022-06-16  2:30   ` [PATCH v2 14/15] vdpa/mlx5: add virtq sub-resources creation Li Zhang
2022-06-16  2:30   ` [PATCH v2 15/15] vdpa/mlx5: prepare virtqueue resource creation Li Zhang
2022-06-16  7:24   ` [PATCH v2 00/15] mlx5/vdpa: optimize live migration time Maxime Coquelin
2022-06-16  9:02     ` Maxime Coquelin
2022-06-17  1:49       ` Li Zhang
2022-06-18  8:47 ` [PATCH v3 " Li Zhang
2022-06-18  8:47   ` [PATCH v3 01/15] vdpa/mlx5: fix usage of capability for max number of virtqs Li Zhang
2022-06-18  8:47   ` [PATCH v3 02/15] vdpa/mlx5: support pre create virtq resource Li Zhang
2022-06-18  8:47   ` [PATCH v3 03/15] common/mlx5: add DevX API to move QP to reset state Li Zhang
2022-06-18  8:47   ` [PATCH v3 04/15] vdpa/mlx5: support event qp reuse Li Zhang
2022-06-18  8:47   ` [PATCH v3 05/15] common/mlx5: extend virtq modifiable fields Li Zhang
2022-06-18  8:47   ` [PATCH v3 06/15] vdpa/mlx5: pre-create virtq at probe time Li Zhang
2022-06-18  8:47   ` [PATCH v3 07/15] vdpa/mlx5: optimize datapath-control synchronization Li Zhang
2022-06-18  8:47   ` [PATCH v3 08/15] vdpa/mlx5: add multi-thread management for configuration Li Zhang
2022-06-18  8:47   ` [PATCH v3 09/15] vdpa/mlx5: add task ring for MT management Li Zhang
2022-06-18  8:48   ` [PATCH v3 10/15] vdpa/mlx5: add MT task for VM memory registration Li Zhang
2022-06-18  8:48   ` [PATCH v3 11/15] vdpa/mlx5: add virtq creation task for MT management Li Zhang
2022-06-18  8:48   ` [PATCH v3 12/15] vdpa/mlx5: add virtq LM log task Li Zhang
2022-06-18  8:48   ` [PATCH v3 13/15] vdpa/mlx5: add device close task Li Zhang
2022-06-18  8:48   ` [PATCH v3 14/15] vdpa/mlx5: add virtq sub-resources creation Li Zhang
2022-06-18  8:48   ` [PATCH v3 15/15] vdpa/mlx5: prepare virtqueue resource creation Li Zhang
2022-06-18  9:02 ` [PATCH v4 00/15] mlx5/vdpa: optimize live migration time Li Zhang
2022-06-18  9:02   ` [PATCH v4 01/15] vdpa/mlx5: fix usage of capability for max number of virtqs Li Zhang
2022-06-18  9:02   ` [PATCH v4 02/15] vdpa/mlx5: support pre create virtq resource Li Zhang
2022-06-18  9:02   ` [PATCH v4 03/15] common/mlx5: add DevX API to move QP to reset state Li Zhang
2022-06-18  9:02   ` [PATCH v4 04/15] vdpa/mlx5: support event qp reuse Li Zhang
2022-06-20  8:27     ` Maxime Coquelin
2022-06-18  9:02   ` [PATCH v4 05/15] common/mlx5: extend virtq modifiable fields Li Zhang
2022-06-20  9:01     ` Maxime Coquelin
2022-06-18  9:02   ` [PATCH v4 06/15] vdpa/mlx5: pre-create virtq at probe time Li Zhang
2022-06-18  9:02   ` [PATCH v4 07/15] vdpa/mlx5: optimize datapath-control synchronization Li Zhang
2022-06-20  9:25     ` Maxime Coquelin
2022-06-18  9:02   ` [PATCH v4 08/15] vdpa/mlx5: add multi-thread management for configuration Li Zhang
2022-06-20 10:57     ` Maxime Coquelin
2022-06-18  9:02   ` [PATCH v4 09/15] vdpa/mlx5: add task ring for MT management Li Zhang
2022-06-20 15:05     ` Maxime Coquelin
2022-06-18  9:02   ` [PATCH v4 10/15] vdpa/mlx5: add MT task for VM memory registration Li Zhang
2022-06-20 15:12     ` Maxime Coquelin
2022-06-18  9:02   ` [PATCH v4 11/15] vdpa/mlx5: add virtq creation task for MT management Li Zhang
2022-06-20 15:19     ` Maxime Coquelin
2022-06-18  9:02   ` [PATCH v4 12/15] vdpa/mlx5: add virtq LM log task Li Zhang
2022-06-20 15:42     ` Maxime Coquelin
2022-06-18  9:02   ` [PATCH v4 13/15] vdpa/mlx5: add device close task Li Zhang
2022-06-20 15:54     ` Maxime Coquelin
2022-06-18  9:02   ` [PATCH v4 14/15] vdpa/mlx5: add virtq sub-resources creation Li Zhang
2022-06-20 16:01     ` Maxime Coquelin
2022-06-18  9:02   ` [PATCH v4 15/15] vdpa/mlx5: prepare virtqueue resource creation Li Zhang
2022-06-20 16:30     ` Maxime Coquelin
2022-06-21  9:29   ` [PATCH v4 00/15] mlx5/vdpa: optimize live migration time Maxime Coquelin
