DPDK patches and discussions
* [PATCH 0/7] vdpa/mlx5: improve device shutdown time
@ 2022-02-24 13:28 Xueming Li
  2022-02-24 13:28 ` [PATCH 1/7] vdpa/mlx5: fix interrupt trash that leads to segment fault Xueming Li
                   ` (9 more replies)
  0 siblings, 10 replies; 43+ messages in thread
From: Xueming Li @ 2022-02-24 13:28 UTC (permalink / raw)
  To: dev; +Cc: xuemingl


Xueming Li (7):
  vdpa/mlx5: fix interrupt trash that leads to segment fault
  vdpa/mlx5: fix dead loop when process interrupted
  vdpa/mlx5: no kick handling during shutdown
  vdpa/mlx5: reuse resources in reconfiguration
  vdpa/mlx5: cache and reuse hardware resources
  vdpa/mlx5: support device cleanup callback
  vdpa/mlx5: make statistics counter persistent

 doc/guides/vdpadevs/mlx5.rst        |   6 +
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 229 +++++++++++++++++++++-------
 drivers/vdpa/mlx5/mlx5_vdpa.h       |  31 +++-
 drivers/vdpa/mlx5/mlx5_vdpa_event.c |  23 +--
 drivers/vdpa/mlx5/mlx5_vdpa_mem.c   |  38 +++--
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c |  25 +--
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 189 +++++++++++------------
 7 files changed, 334 insertions(+), 207 deletions(-)

-- 
2.35.1



* [PATCH 1/7] vdpa/mlx5: fix interrupt trash that leads to segment fault
  2022-02-24 13:28 [PATCH 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
@ 2022-02-24 13:28 ` Xueming Li
  2022-02-24 13:28 ` [PATCH 2/7] vdpa/mlx5: fix dead loop when process interrupted Xueming Li
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 43+ messages in thread
From: Xueming Li @ 2022-02-24 13:28 UTC (permalink / raw)
  To: dev
  Cc: xuemingl, matan, stable, Matan Azrad, Viacheslav Ovsiienko,
	Maxime Coquelin

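Remove the retry limit when unregistering the virtq interrupt callback.
With a limit, unregistration could give up while the callback was still
registered, and the interrupt thread would then hit a segmentation fault
on the invalid FD.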
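
As an illustration only, the retry-until-done pattern can be sketched
outside the driver. The names unregister_until_done(), busy_then_ok()
and sleep_us() are hypothetical and are not part of DPDK; the stub
stands in for a call that reports -EAGAIN while the callback is still
in use.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type standing in for an unregister call. */
typedef int (*unregister_fn)(void *ctx);

/* Sleep for the given number of microseconds. */
static void
sleep_us(unsigned int us)
{
	struct timespec ts;

	ts.tv_sec = us / 1000000U;
	ts.tv_nsec = (long)(us % 1000000U) * 1000L;
	nanosleep(&ts, NULL);
}

/*
 * Call fn() until it returns something other than -EAGAIN, sleeping
 * between attempts. There is deliberately no retry limit: the caller
 * must not proceed while the callback is still registered.
 * Returns the final result; *attempts receives the number of calls.
 */
static int
unregister_until_done(unregister_fn fn, void *ctx, unsigned int delay_us,
		      unsigned int *attempts)
{
	unsigned int n = 0;
	int ret;

	assert(fn != NULL);
	do {
		ret = fn(ctx);
		n++;
		if (ret == -EAGAIN)
			sleep_us(delay_us);
	} while (ret == -EAGAIN);
	if (attempts != NULL)
		*attempts = n;
	return ret;
}

/* Hypothetical stub: reports "busy" a fixed number of times, then 1. */
static int
busy_then_ok(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return -EAGAIN;
	}
	return 1;
}
```

With a stub that is busy three times, the helper makes four calls and
returns the stub's final value. The trade-off is that a callback which
never becomes idle would block the caller indefinitely.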

Fixes: 62c813706e41 ("vdpa/mlx5: map doorbell")
Cc: matan@mellanox.com
Cc: stable@dpdk.org

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 3416797d289..de324506cb9 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -17,7 +17,7 @@
 
 
 static void
-mlx5_vdpa_virtq_handler(void *cb_arg)
+mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 {
 	struct mlx5_vdpa_virtq *virtq = cb_arg;
 	struct mlx5_vdpa_priv *priv = virtq->priv;
@@ -59,20 +59,16 @@ static int
 mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 {
 	unsigned int i;
-	int retries = MLX5_VDPA_INTR_RETRIES;
 	int ret = -EAGAIN;
 
-	if (rte_intr_fd_get(virtq->intr_handle) != -1) {
-		while (retries-- && ret == -EAGAIN) {
+	if (rte_intr_fd_get(virtq->intr_handle) >= 0) {
+		while (ret == -EAGAIN) {
 			ret = rte_intr_callback_unregister(virtq->intr_handle,
-							mlx5_vdpa_virtq_handler,
-							virtq);
+					mlx5_vdpa_virtq_kick_handler, virtq);
 			if (ret == -EAGAIN) {
-				DRV_LOG(DEBUG, "Try again to unregister fd %d "
-				"of virtq %d interrupt, retries = %d.",
-				rte_intr_fd_get(virtq->intr_handle),
-				(int)virtq->index, retries);
-
+				DRV_LOG(DEBUG, "Try again to unregister fd %d of virtq %hu interrupt",
+					rte_intr_fd_get(virtq->intr_handle),
+					(int)virtq->index);
 				usleep(MLX5_VDPA_INTR_RETRIES_USEC);
 			}
 		}
@@ -359,7 +355,7 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 			goto error;
 
 		if (rte_intr_callback_register(virtq->intr_handle,
-					       mlx5_vdpa_virtq_handler,
+					       mlx5_vdpa_virtq_kick_handler,
 					       virtq)) {
 			rte_intr_fd_set(virtq->intr_handle, -1);
 			DRV_LOG(ERR, "Failed to register virtq %d interrupt.",
-- 
2.35.1



* [PATCH 2/7] vdpa/mlx5: fix dead loop when process interrupted
  2022-02-24 13:28 [PATCH 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
  2022-02-24 13:28 ` [PATCH 1/7] vdpa/mlx5: fix interrupt trash that leads to segment fault Xueming Li
@ 2022-02-24 13:28 ` Xueming Li
  2022-02-24 13:28 ` [PATCH 3/7] vdpa/mlx5: no kick handling during shutdown Xueming Li
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 43+ messages in thread
From: Xueming Li @ 2022-02-24 13:28 UTC (permalink / raw)
  To: dev; +Cc: xuemingl, stable, Matan Azrad, Viacheslav Ovsiienko, Maxime Coquelin

When handling Ctrl+C, the kick handling thread sometimes gets an endless
EAGAIN error and falls into a dead loop.

Kicks happen frequently in a real system due to busy traffic or the
retry mechanism. This patch bounds the read retries and, if the read
still fails, returns without kicking the firmware or setting the
hardware notifier, because the device may be in error; the notifier can
be set by the next successful kick request.
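
A minimal standalone sketch of a bounded read loop, assuming an
eventfd-style FD that delivers an 8-byte counter. The helper name
read_kick_bounded() and its parameters are hypothetical; only read()
and the errno values come from POSIX.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Read one 8-byte value from a kick FD, retrying at most max_retries
 * times on transient errors. Returns the number of bytes read, or -1
 * if every attempt failed or a non-transient error occurred.
 */
static ssize_t
read_kick_bounded(int fd, int max_retries, uint64_t *value)
{
	ssize_t nbytes = -1;
	int retry;

	assert(value != NULL);
	for (retry = 0; retry < max_retries; ++retry) {
		nbytes = read(fd, value, sizeof(*value));
		if (nbytes < 0 &&
		    (errno == EINTR || errno == EAGAIN ||
		     errno == EWOULDBLOCK))
			continue; /* Transient: retry, but not forever. */
		break;
	}
	return nbytes;
}
```

A caller would treat a negative return as "skip this kick" and rely on
the next kick to make progress, rather than spinning in the handler.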

Fixes: 62c813706e41 ("vdpa/mlx5: map doorbell")
Cc: stable@dpdk.org

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index de324506cb9..e1e05924a40 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -23,11 +23,11 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 	struct mlx5_vdpa_priv *priv = virtq->priv;
 	uint64_t buf;
 	int nbytes;
+	int retry;
 
 	if (rte_intr_fd_get(virtq->intr_handle) < 0)
 		return;
-
-	do {
+	for (retry = 0; retry < 3; ++retry) {
 		nbytes = read(rte_intr_fd_get(virtq->intr_handle), &buf,
 			      8);
 		if (nbytes < 0) {
@@ -39,7 +39,9 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 				virtq->index, strerror(errno));
 		}
 		break;
-	} while (1);
+	}
+	if (nbytes < 0)
+		return;
 	rte_write32(virtq->index, priv->virtq_db_addr);
 	if (virtq->notifier_state == MLX5_VDPA_NOTIFIER_STATE_DISABLED) {
 		if (rte_vhost_host_notifier_ctrl(priv->vid, virtq->index, true))
-- 
2.35.1



* [PATCH 3/7] vdpa/mlx5: no kick handling during shutdown
  2022-02-24 13:28 [PATCH 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
  2022-02-24 13:28 ` [PATCH 1/7] vdpa/mlx5: fix interrupt trash that leads to segment fault Xueming Li
  2022-02-24 13:28 ` [PATCH 2/7] vdpa/mlx5: fix dead loop when process interrupted Xueming Li
@ 2022-02-24 13:28 ` Xueming Li
  2022-02-24 13:28 ` [PATCH 4/7] vdpa/mlx5: reuse resources in reconfiguration Xueming Li
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 43+ messages in thread
From: Xueming Li @ 2022-02-24 13:28 UTC (permalink / raw)
  To: dev; +Cc: xuemingl, Matan Azrad, Viacheslav Ovsiienko

When QEMU suspends a VM, the HW notifier is unmapped while the vCPU
thread may still be active and write the notifier through the kick
socket.

In that case, the PMD kick handler thread tries to install the HW
notifier through the slave socket, which times out and slows down device
close.

This patch skips the HW notifier installation if the VQ or the device is
in the middle of shutdown.
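
The check can be sketched as a pure predicate. The enum and the
function kick_should_skip() below are hypothetical stand-ins that copy
the boolean form used in the diff (device not configured AND queue
disabled); a reader who takes the "VQ or device" wording literally
might expect a logical OR instead, so treat the operator as something
to confirm against the driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the three device states added by the patch. */
enum dev_state {
	DEV_STATE_PROBED = 0,
	DEV_STATE_CONFIGURED,
	DEV_STATE_IN_PROGRESS /* Shutting down. */
};

/*
 * Return true when kick handling should be skipped. Same boolean form
 * as the check in the diff: skip only when the device is not in the
 * configured state and the queue is disabled.
 */
static bool
kick_should_skip(enum dev_state state, bool vq_enabled)
{
	assert(state <= DEV_STATE_IN_PROGRESS);
	return state != DEV_STATE_CONFIGURED && !vq_enabled;
}
```

With this form, a disabled queue on a device that is shutting down is
skipped, while an enabled queue is still handled in every state.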

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 17 ++++++++++-------
 drivers/vdpa/mlx5/mlx5_vdpa.h       |  8 +++++++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 12 +++++++++++-
 3 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 749c9d097cf..48f20d9ecdb 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -252,13 +252,15 @@ mlx5_vdpa_dev_close(int vid)
 	}
 	mlx5_vdpa_err_event_unset(priv);
 	mlx5_vdpa_cqe_event_unset(priv);
-	if (priv->configured)
+	if (priv->state == MLX5_VDPA_STATE_CONFIGURED) {
 		ret |= mlx5_vdpa_lm_log(priv);
+		priv->state = MLX5_VDPA_STATE_IN_PROGRESS;
+	}
 	mlx5_vdpa_steer_unset(priv);
 	mlx5_vdpa_virtqs_release(priv);
 	mlx5_vdpa_event_qp_global_release(priv);
 	mlx5_vdpa_mem_dereg(priv);
-	priv->configured = 0;
+	priv->state = MLX5_VDPA_STATE_PROBED;
 	priv->vid = 0;
 	/* The mutex may stay locked after event thread cancel - initiate it. */
 	pthread_mutex_init(&priv->vq_config_lock, NULL);
@@ -277,7 +279,8 @@ mlx5_vdpa_dev_config(int vid)
 		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
 		return -EINVAL;
 	}
-	if (priv->configured && mlx5_vdpa_dev_close(vid)) {
+	if (priv->state == MLX5_VDPA_STATE_CONFIGURED &&
+	    mlx5_vdpa_dev_close(vid)) {
 		DRV_LOG(ERR, "Failed to reconfigure vid %d.", vid);
 		return -1;
 	}
@@ -291,7 +294,7 @@ mlx5_vdpa_dev_config(int vid)
 		mlx5_vdpa_dev_close(vid);
 		return -1;
 	}
-	priv->configured = 1;
+	priv->state = MLX5_VDPA_STATE_CONFIGURED;
 	DRV_LOG(INFO, "vDPA device %d was configured.", vid);
 	return 0;
 }
@@ -373,7 +376,7 @@ mlx5_vdpa_get_stats(struct rte_vdpa_device *vdev, int qid,
 		DRV_LOG(ERR, "Invalid device: %s.", vdev->device->name);
 		return -ENODEV;
 	}
-	if (!priv->configured) {
+	if (priv->state == MLX5_VDPA_STATE_PROBED) {
 		DRV_LOG(ERR, "Device %s was not configured.",
 				vdev->device->name);
 		return -ENODATA;
@@ -401,7 +404,7 @@ mlx5_vdpa_reset_stats(struct rte_vdpa_device *vdev, int qid)
 		DRV_LOG(ERR, "Invalid device: %s.", vdev->device->name);
 		return -ENODEV;
 	}
-	if (!priv->configured) {
+	if (priv->state == MLX5_VDPA_STATE_PROBED) {
 		DRV_LOG(ERR, "Device %s was not configured.",
 				vdev->device->name);
 		return -ENODATA;
@@ -594,7 +597,7 @@ mlx5_vdpa_dev_remove(struct mlx5_common_device *cdev)
 		TAILQ_REMOVE(&priv_list, priv, next);
 	pthread_mutex_unlock(&priv_list_lock);
 	if (found) {
-		if (priv->configured)
+		if (priv->state == MLX5_VDPA_STATE_CONFIGURED)
 			mlx5_vdpa_dev_close(priv->vid);
 		if (priv->var) {
 			mlx5_glue->dv_free_var(priv->var);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 22617924eac..cc83d7cba3d 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -113,9 +113,15 @@ enum {
 	MLX5_VDPA_EVENT_MODE_ONLY_INTERRUPT
 };
 
+enum mlx5_dev_state {
+	MLX5_VDPA_STATE_PROBED = 0,
+	MLX5_VDPA_STATE_CONFIGURED,
+	MLX5_VDPA_STATE_IN_PROGRESS /* Shutting down. */
+};
+
 struct mlx5_vdpa_priv {
 	TAILQ_ENTRY(mlx5_vdpa_priv) next;
-	uint8_t configured;
+	enum mlx5_dev_state state;
 	pthread_mutex_t vq_config_lock;
 	uint64_t no_traffic_counter;
 	pthread_t timer_tid;
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index e1e05924a40..b1d584ca8b0 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -25,6 +25,11 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 	int nbytes;
 	int retry;
 
+	if (priv->state != MLX5_VDPA_STATE_CONFIGURED && !virtq->enable) {
+		DRV_LOG(ERR,  "device %d queue %d down, skip kick handling",
+			priv->vid, virtq->index);
+		return;
+	}
 	if (rte_intr_fd_get(virtq->intr_handle) < 0)
 		return;
 	for (retry = 0; retry < 3; ++retry) {
@@ -43,6 +48,11 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 	if (nbytes < 0)
 		return;
 	rte_write32(virtq->index, priv->virtq_db_addr);
+	if (priv->state != MLX5_VDPA_STATE_CONFIGURED && !virtq->enable) {
+		DRV_LOG(ERR,  "device %d queue %d down, skip kick handling",
+			priv->vid, virtq->index);
+		return;
+	}
 	if (virtq->notifier_state == MLX5_VDPA_NOTIFIER_STATE_DISABLED) {
 		if (rte_vhost_host_notifier_ctrl(priv->vid, virtq->index, true))
 			virtq->notifier_state = MLX5_VDPA_NOTIFIER_STATE_ERR;
@@ -541,7 +551,7 @@ mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable)
 
 	DRV_LOG(INFO, "Update virtq %d status %sable -> %sable.", index,
 		virtq->enable ? "en" : "dis", enable ? "en" : "dis");
-	if (!priv->configured) {
+	if (priv->state == MLX5_VDPA_STATE_PROBED) {
 		virtq->enable = !!enable;
 		return 0;
 	}
-- 
2.35.1



* [PATCH 4/7] vdpa/mlx5: reuse resources in reconfiguration
  2022-02-24 13:28 [PATCH 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
                   ` (2 preceding siblings ...)
  2022-02-24 13:28 ` [PATCH 3/7] vdpa/mlx5: no kick handling during shutdown Xueming Li
@ 2022-02-24 13:28 ` Xueming Li
  2022-02-24 13:28 ` [PATCH 5/7] vdpa/mlx5: cache and reuse hardware resources Xueming Li
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 43+ messages in thread
From: Xueming Li @ 2022-02-24 13:28 UTC (permalink / raw)
  To: dev; +Cc: xuemingl, Matan Azrad, Viacheslav Ovsiienko

To speed up device resume, create reusable resources at device probe
and release them at device removal. The reused resources include the
TIS, TD, VAR doorbell mmap, error handling event channel and interrupt
handler, UAR, Rx event channel, NULL MR, and steer domain and table.
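
The lifecycle split can be sketched with a toy device. Everything here
(struct demo_dev and the demo_*() functions) is hypothetical and uses
plain malloc() in place of hardware objects; it only illustrates that
probe-time resources survive repeated config/close cycles.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical device with one long-lived and one per-session resource. */
struct demo_dev {
	void *shared;               /* Created at probe, reused by sessions. */
	void *session;              /* Created at config, freed at close. */
	unsigned int shared_allocs; /* Times 'shared' was created. */
};

/* Probe: create the long-lived resource once. */
static int
demo_probe(struct demo_dev *dev)
{
	assert(dev != NULL);
	dev->shared = malloc(64);
	if (dev->shared == NULL)
		return -1;
	dev->shared_allocs++;
	return 0;
}

/* Config: create only the per-session resource. */
static int
demo_config(struct demo_dev *dev)
{
	assert(dev->shared != NULL);
	dev->session = malloc(16);
	return dev->session != NULL ? 0 : -1;
}

/* Close: release the per-session resource, keep the shared one. */
static void
demo_close(struct demo_dev *dev)
{
	free(dev->session);
	dev->session = NULL;
}

/* Remove: release everything, including the shared resource. */
static void
demo_remove(struct demo_dev *dev)
{
	demo_close(dev);
	free(dev->shared);
	dev->shared = NULL;
}

/* Run 'cycles' config/close rounds; return how often 'shared' was made. */
static unsigned int
demo_run(unsigned int cycles)
{
	struct demo_dev dev = {0};
	unsigned int i;

	if (demo_probe(&dev) != 0)
		return 0;
	for (i = 0; i < cycles; i++) {
		if (demo_config(&dev) != 0)
			break;
		demo_close(&dev);
	}
	demo_remove(&dev);
	return dev.shared_allocs;
}
```

However many config/close rounds run, the shared resource is created
once, which is the saving the patch aims for on each resume.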

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 165 +++++++++++++++++++++-------
 drivers/vdpa/mlx5/mlx5_vdpa.h       |   9 ++
 drivers/vdpa/mlx5/mlx5_vdpa_event.c |  23 ++--
 drivers/vdpa/mlx5/mlx5_vdpa_mem.c   |  11 --
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c |  25 +----
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c |  44 --------
 6 files changed, 147 insertions(+), 130 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 48f20d9ecdb..7e57ae715a8 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -5,6 +5,7 @@
 #include <net/if.h>
 #include <sys/socket.h>
 #include <sys/ioctl.h>
+#include <sys/mman.h>
 #include <fcntl.h>
 #include <netinet/in.h>
 
@@ -49,6 +50,8 @@ TAILQ_HEAD(mlx5_vdpa_privs, mlx5_vdpa_priv) priv_list =
 					      TAILQ_HEAD_INITIALIZER(priv_list);
 static pthread_mutex_t priv_list_lock = PTHREAD_MUTEX_INITIALIZER;
 
+static void mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv);
+
 static struct mlx5_vdpa_priv *
 mlx5_vdpa_find_priv_resource_by_vdev(struct rte_vdpa_device *vdev)
 {
@@ -250,7 +253,6 @@ mlx5_vdpa_dev_close(int vid)
 		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
 		return -1;
 	}
-	mlx5_vdpa_err_event_unset(priv);
 	mlx5_vdpa_cqe_event_unset(priv);
 	if (priv->state == MLX5_VDPA_STATE_CONFIGURED) {
 		ret |= mlx5_vdpa_lm_log(priv);
@@ -258,7 +260,6 @@ mlx5_vdpa_dev_close(int vid)
 	}
 	mlx5_vdpa_steer_unset(priv);
 	mlx5_vdpa_virtqs_release(priv);
-	mlx5_vdpa_event_qp_global_release(priv);
 	mlx5_vdpa_mem_dereg(priv);
 	priv->state = MLX5_VDPA_STATE_PROBED;
 	priv->vid = 0;
@@ -288,7 +289,7 @@ mlx5_vdpa_dev_config(int vid)
 	if (mlx5_vdpa_mtu_set(priv))
 		DRV_LOG(WARNING, "MTU cannot be set on device %s.",
 				vdev->device->name);
-	if (mlx5_vdpa_mem_register(priv) || mlx5_vdpa_err_event_setup(priv) ||
+	if (mlx5_vdpa_mem_register(priv) ||
 	    mlx5_vdpa_virtqs_prepare(priv) || mlx5_vdpa_steer_setup(priv) ||
 	    mlx5_vdpa_cqe_event_setup(priv)) {
 		mlx5_vdpa_dev_close(vid);
@@ -507,13 +508,88 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 	DRV_LOG(DEBUG, "no traffic max is %u.", priv->no_traffic_max);
 }
 
+static int
+mlx5_vdpa_create_dev_resources(struct mlx5_vdpa_priv *priv)
+{
+	struct mlx5_devx_tis_attr tis_attr = {0};
+	struct ibv_context *ctx = priv->cdev->ctx;
+	uint32_t i;
+	int retry;
+
+	for (retry = 0; retry < 7; retry++) {
+		priv->var = mlx5_glue->dv_alloc_var(ctx, 0);
+		if (priv->var != NULL)
+			break;
+		DRV_LOG(WARNING, "Failed to allocate VAR, retry %d.", retry);
+		/* Wait Qemu release VAR during vdpa restart, 0.1 sec based. */
+		usleep(100000U << retry);
+	}
+	if (!priv->var) {
+		DRV_LOG(ERR, "Failed to allocate VAR %u.", errno);
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	/* Always map the entire page. */
+	priv->virtq_db_addr = mmap(NULL, priv->var->length, PROT_READ |
+				   PROT_WRITE, MAP_SHARED, ctx->cmd_fd,
+				   priv->var->mmap_off);
+	if (priv->virtq_db_addr == MAP_FAILED) {
+		DRV_LOG(ERR, "Failed to map doorbell page %u.", errno);
+		priv->virtq_db_addr = NULL;
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	DRV_LOG(DEBUG, "VAR address of doorbell mapping is %p.",
+		priv->virtq_db_addr);
+	priv->td = mlx5_devx_cmd_create_td(ctx);
+	if (!priv->td) {
+		DRV_LOG(ERR, "Failed to create transport domain.");
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	tis_attr.transport_domain = priv->td->id;
+	for (i = 0; i < priv->num_lag_ports; i++) {
+		/* 0 is auto affinity, non-zero value to propose port. */
+		tis_attr.lag_tx_port_affinity = i + 1;
+		priv->tiss[i] = mlx5_devx_cmd_create_tis(ctx, &tis_attr);
+		if (!priv->tiss[i]) {
+			DRV_LOG(ERR, "Failed to create TIS %u.", i);
+			return -rte_errno;
+		}
+	}
+	priv->null_mr = mlx5_glue->alloc_null_mr(priv->cdev->pd);
+	if (!priv->null_mr) {
+		DRV_LOG(ERR, "Failed to allocate null MR.");
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	DRV_LOG(DEBUG, "Dump fill Mkey = %u.", priv->null_mr->lkey);
+	priv->steer.domain = mlx5_glue->dr_create_domain(ctx,
+					MLX5DV_DR_DOMAIN_TYPE_NIC_RX);
+	if (!priv->steer.domain) {
+		DRV_LOG(ERR, "Failed to create Rx domain.");
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	priv->steer.tbl = mlx5_glue->dr_create_flow_tbl(priv->steer.domain, 0);
+	if (!priv->steer.tbl) {
+		DRV_LOG(ERR, "Failed to create table 0 with Rx domain.");
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	if (mlx5_vdpa_err_event_setup(priv) != 0)
+		return -rte_errno;
+	if (mlx5_vdpa_event_qp_global_prepare(priv))
+		return -rte_errno;
+	return 0;
+}
+
 static int
 mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 		    struct mlx5_kvargs_ctrl *mkvlist)
 {
 	struct mlx5_vdpa_priv *priv = NULL;
 	struct mlx5_hca_attr *attr = &cdev->config.hca_attr;
-	int retry;
 
 	if (!attr->vdpa.valid || !attr->vdpa.max_num_virtio_queues) {
 		DRV_LOG(ERR, "Not enough capabilities to support vdpa, maybe "
@@ -537,25 +613,10 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 	priv->num_lag_ports = attr->num_lag_ports;
 	if (attr->num_lag_ports == 0)
 		priv->num_lag_ports = 1;
+	pthread_mutex_init(&priv->vq_config_lock, NULL);
 	priv->cdev = cdev;
-	for (retry = 0; retry < 7; retry++) {
-		priv->var = mlx5_glue->dv_alloc_var(priv->cdev->ctx, 0);
-		if (priv->var != NULL)
-			break;
-		DRV_LOG(WARNING, "Failed to allocate VAR, retry %d.\n", retry);
-		/* Wait Qemu release VAR during vdpa restart, 0.1 sec based. */
-		usleep(100000U << retry);
-	}
-	if (!priv->var) {
-		DRV_LOG(ERR, "Failed to allocate VAR %u.", errno);
+	if (mlx5_vdpa_create_dev_resources(priv))
 		goto error;
-	}
-	priv->err_intr_handle =
-		rte_intr_instance_alloc(RTE_INTR_INSTANCE_F_SHARED);
-	if (priv->err_intr_handle == NULL) {
-		DRV_LOG(ERR, "Fail to allocate intr_handle");
-		goto error;
-	}
 	priv->vdev = rte_vdpa_register_device(cdev->dev, &mlx5_vdpa_ops);
 	if (priv->vdev == NULL) {
 		DRV_LOG(ERR, "Failed to register vDPA device.");
@@ -564,19 +625,13 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 	}
 	mlx5_vdpa_config_get(mkvlist, priv);
 	SLIST_INIT(&priv->mr_list);
-	pthread_mutex_init(&priv->vq_config_lock, NULL);
 	pthread_mutex_lock(&priv_list_lock);
 	TAILQ_INSERT_TAIL(&priv_list, priv, next);
 	pthread_mutex_unlock(&priv_list_lock);
 	return 0;
-
 error:
-	if (priv) {
-		if (priv->var)
-			mlx5_glue->dv_free_var(priv->var);
-		rte_intr_instance_free(priv->err_intr_handle);
-		rte_free(priv);
-	}
+	if (priv)
+		mlx5_vdpa_dev_release(priv);
 	return -rte_errno;
 }
 
@@ -596,22 +651,48 @@ mlx5_vdpa_dev_remove(struct mlx5_common_device *cdev)
 	if (found)
 		TAILQ_REMOVE(&priv_list, priv, next);
 	pthread_mutex_unlock(&priv_list_lock);
-	if (found) {
-		if (priv->state == MLX5_VDPA_STATE_CONFIGURED)
-			mlx5_vdpa_dev_close(priv->vid);
-		if (priv->var) {
-			mlx5_glue->dv_free_var(priv->var);
-			priv->var = NULL;
-		}
-		if (priv->vdev)
-			rte_vdpa_unregister_device(priv->vdev);
-		pthread_mutex_destroy(&priv->vq_config_lock);
-		rte_intr_instance_free(priv->err_intr_handle);
-		rte_free(priv);
-	}
+	if (found)
+		mlx5_vdpa_dev_release(priv);
 	return 0;
 }
 
+static void
+mlx5_vdpa_release_dev_resources(struct mlx5_vdpa_priv *priv)
+{
+	uint32_t i;
+
+	mlx5_vdpa_event_qp_global_release(priv);
+	mlx5_vdpa_err_event_unset(priv);
+	if (priv->steer.tbl)
+		claim_zero(mlx5_glue->dr_destroy_flow_tbl(priv->steer.tbl));
+	if (priv->steer.domain)
+		claim_zero(mlx5_glue->dr_destroy_domain(priv->steer.domain));
+	if (priv->null_mr)
+		claim_zero(mlx5_glue->dereg_mr(priv->null_mr));
+	for (i = 0; i < priv->num_lag_ports; i++) {
+		if (priv->tiss[i])
+			claim_zero(mlx5_devx_cmd_destroy(priv->tiss[i]));
+	}
+	if (priv->td)
+		claim_zero(mlx5_devx_cmd_destroy(priv->td));
+	if (priv->virtq_db_addr)
+		claim_zero(munmap(priv->virtq_db_addr, priv->var->length));
+	if (priv->var)
+		mlx5_glue->dv_free_var(priv->var);
+}
+
+static void
+mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv)
+{
+	if (priv->state == MLX5_VDPA_STATE_CONFIGURED)
+		mlx5_vdpa_dev_close(priv->vid);
+	mlx5_vdpa_release_dev_resources(priv);
+	if (priv->vdev)
+		rte_vdpa_unregister_device(priv->vdev);
+	pthread_mutex_destroy(&priv->vq_config_lock);
+	rte_free(priv);
+}
+
 static const struct rte_pci_id mlx5_vdpa_pci_id_map[] = {
 	{
 		RTE_PCI_DEVICE(PCI_VENDOR_ID_MELLANOX,
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index cc83d7cba3d..e0ba20b953c 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -233,6 +233,15 @@ int mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
  */
 void mlx5_vdpa_event_qp_destroy(struct mlx5_vdpa_event_qp *eqp);
 
+/**
+ * Create all the event global resources.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ */
+int
+mlx5_vdpa_event_qp_global_prepare(struct mlx5_vdpa_priv *priv);
+
 /**
  * Release all the event global resources.
  *
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index f8d910b33f8..7167a98db0f 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -40,11 +40,9 @@ mlx5_vdpa_event_qp_global_release(struct mlx5_vdpa_priv *priv)
 }
 
 /* Prepare all the global resources for all the event objects.*/
-static int
+int
 mlx5_vdpa_event_qp_global_prepare(struct mlx5_vdpa_priv *priv)
 {
-	if (priv->eventc)
-		return 0;
 	priv->eventc = mlx5_os_devx_create_event_channel(priv->cdev->ctx,
 			   MLX5DV_DEVX_CREATE_EVENT_CHANNEL_FLAGS_OMIT_EV_DATA);
 	if (!priv->eventc) {
@@ -389,22 +387,30 @@ mlx5_vdpa_err_event_setup(struct mlx5_vdpa_priv *priv)
 	flags = fcntl(priv->err_chnl->fd, F_GETFL);
 	ret = fcntl(priv->err_chnl->fd, F_SETFL, flags | O_NONBLOCK);
 	if (ret) {
+		rte_errno = errno;
 		DRV_LOG(ERR, "Failed to change device event channel FD.");
 		goto error;
 	}
-
+	priv->err_intr_handle =
+		rte_intr_instance_alloc(RTE_INTR_INSTANCE_F_SHARED);
+	if (priv->err_intr_handle == NULL) {
+		DRV_LOG(ERR, "Fail to allocate intr_handle");
+		goto error;
+	}
 	if (rte_intr_fd_set(priv->err_intr_handle, priv->err_chnl->fd))
 		goto error;
 
 	if (rte_intr_type_set(priv->err_intr_handle, RTE_INTR_HANDLE_EXT))
 		goto error;
 
-	if (rte_intr_callback_register(priv->err_intr_handle,
-				       mlx5_vdpa_err_interrupt_handler,
-				       priv)) {
+	ret = rte_intr_callback_register(priv->err_intr_handle,
+					 mlx5_vdpa_err_interrupt_handler,
+					 priv);
+	if (ret != 0) {
 		rte_intr_fd_set(priv->err_intr_handle, 0);
 		DRV_LOG(ERR, "Failed to register error interrupt for device %d.",
 			priv->vid);
+		rte_errno = -ret;
 		goto error;
 	} else {
 		DRV_LOG(DEBUG, "Registered error interrupt for device%d.",
@@ -453,6 +459,7 @@ mlx5_vdpa_err_event_unset(struct mlx5_vdpa_priv *priv)
 		mlx5_glue->devx_destroy_event_channel(priv->err_chnl);
 		priv->err_chnl = NULL;
 	}
+	rte_intr_instance_free(priv->err_intr_handle);
 }
 
 int
@@ -575,8 +582,6 @@ mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 	uint16_t log_desc_n = rte_log2_u32(desc_n);
 	uint32_t ret;
 
-	if (mlx5_vdpa_event_qp_global_prepare(priv))
-		return -1;
 	if (mlx5_vdpa_cq_create(priv, log_desc_n, callfd, &eqp->cq))
 		return -1;
 	attr.pd = priv->cdev->pdn;
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_mem.c b/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
index 599079500b0..62f5530e91d 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
@@ -34,10 +34,6 @@ mlx5_vdpa_mem_dereg(struct mlx5_vdpa_priv *priv)
 	SLIST_INIT(&priv->mr_list);
 	if (priv->lm_mr.addr)
 		mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
-	if (priv->null_mr) {
-		claim_zero(mlx5_glue->dereg_mr(priv->null_mr));
-		priv->null_mr = NULL;
-	}
 	if (priv->vmem) {
 		free(priv->vmem);
 		priv->vmem = NULL;
@@ -196,13 +192,6 @@ mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
 	if (!mem)
 		return -rte_errno;
 	priv->vmem = mem;
-	priv->null_mr = mlx5_glue->alloc_null_mr(priv->cdev->pd);
-	if (!priv->null_mr) {
-		DRV_LOG(ERR, "Failed to allocate null MR.");
-		ret = -errno;
-		goto error;
-	}
-	DRV_LOG(DEBUG, "Dump fill Mkey = %u.", priv->null_mr->lkey);
 	for (i = 0; i < mem->nregions; i++) {
 		reg = &mem->regions[i];
 		entry = rte_zmalloc(__func__, sizeof(*entry), 0);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
index a0fd2776e57..e42868486e7 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
@@ -45,14 +45,6 @@ void
 mlx5_vdpa_steer_unset(struct mlx5_vdpa_priv *priv)
 {
 	mlx5_vdpa_rss_flows_destroy(priv);
-	if (priv->steer.tbl) {
-		claim_zero(mlx5_glue->dr_destroy_flow_tbl(priv->steer.tbl));
-		priv->steer.tbl = NULL;
-	}
-	if (priv->steer.domain) {
-		claim_zero(mlx5_glue->dr_destroy_domain(priv->steer.domain));
-		priv->steer.domain = NULL;
-	}
 	if (priv->steer.rqt) {
 		claim_zero(mlx5_devx_cmd_destroy(priv->steer.rqt));
 		priv->steer.rqt = NULL;
@@ -248,11 +240,7 @@ mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv)
 	int ret = mlx5_vdpa_rqt_prepare(priv);
 
 	if (ret == 0) {
-		mlx5_vdpa_rss_flows_destroy(priv);
-		if (priv->steer.rqt) {
-			claim_zero(mlx5_devx_cmd_destroy(priv->steer.rqt));
-			priv->steer.rqt = NULL;
-		}
+		mlx5_vdpa_steer_unset(priv);
 	} else if (ret < 0) {
 		return ret;
 	} else if (!priv->steer.rss[0].flow) {
@@ -269,17 +257,6 @@ int
 mlx5_vdpa_steer_setup(struct mlx5_vdpa_priv *priv)
 {
 #ifdef HAVE_MLX5DV_DR
-	priv->steer.domain = mlx5_glue->dr_create_domain(priv->cdev->ctx,
-						  MLX5DV_DR_DOMAIN_TYPE_NIC_RX);
-	if (!priv->steer.domain) {
-		DRV_LOG(ERR, "Failed to create Rx domain.");
-		goto error;
-	}
-	priv->steer.tbl = mlx5_glue->dr_create_flow_tbl(priv->steer.domain, 0);
-	if (!priv->steer.tbl) {
-		DRV_LOG(ERR, "Failed to create table 0 with Rx domain.");
-		goto error;
-	}
 	if (mlx5_vdpa_steer_update(priv))
 		goto error;
 	return 0;
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index b1d584ca8b0..6bda9f1814a 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -3,7 +3,6 @@
  */
 #include <string.h>
 #include <unistd.h>
-#include <sys/mman.h>
 #include <sys/eventfd.h>
 
 #include <rte_malloc.h>
@@ -120,20 +119,6 @@ mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv)
 		if (virtq->counters)
 			claim_zero(mlx5_devx_cmd_destroy(virtq->counters));
 	}
-	for (i = 0; i < priv->num_lag_ports; i++) {
-		if (priv->tiss[i]) {
-			claim_zero(mlx5_devx_cmd_destroy(priv->tiss[i]));
-			priv->tiss[i] = NULL;
-		}
-	}
-	if (priv->td) {
-		claim_zero(mlx5_devx_cmd_destroy(priv->td));
-		priv->td = NULL;
-	}
-	if (priv->virtq_db_addr) {
-		claim_zero(munmap(priv->virtq_db_addr, priv->var->length));
-		priv->virtq_db_addr = NULL;
-	}
 	priv->features = 0;
 	memset(priv->virtqs, 0, sizeof(*virtq) * priv->nr_virtqs);
 	priv->nr_virtqs = 0;
@@ -462,8 +447,6 @@ mlx5_vdpa_features_validate(struct mlx5_vdpa_priv *priv)
 int
 mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 {
-	struct mlx5_devx_tis_attr tis_attr = {0};
-	struct ibv_context *ctx = priv->cdev->ctx;
 	uint32_t i;
 	uint16_t nr_vring = rte_vhost_get_vring_num(priv->vid);
 	int ret = rte_vhost_get_negotiated_features(priv->vid, &priv->features);
@@ -485,33 +468,6 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 			(int)nr_vring);
 		return -1;
 	}
-	/* Always map the entire page. */
-	priv->virtq_db_addr = mmap(NULL, priv->var->length, PROT_READ |
-				   PROT_WRITE, MAP_SHARED, ctx->cmd_fd,
-				   priv->var->mmap_off);
-	if (priv->virtq_db_addr == MAP_FAILED) {
-		DRV_LOG(ERR, "Failed to map doorbell page %u.", errno);
-		priv->virtq_db_addr = NULL;
-		goto error;
-	} else {
-		DRV_LOG(DEBUG, "VAR address of doorbell mapping is %p.",
-			priv->virtq_db_addr);
-	}
-	priv->td = mlx5_devx_cmd_create_td(ctx);
-	if (!priv->td) {
-		DRV_LOG(ERR, "Failed to create transport domain.");
-		return -rte_errno;
-	}
-	tis_attr.transport_domain = priv->td->id;
-	for (i = 0; i < priv->num_lag_ports; i++) {
-		/* 0 is auto affinity, non-zero value to propose port. */
-		tis_attr.lag_tx_port_affinity = i + 1;
-		priv->tiss[i] = mlx5_devx_cmd_create_tis(ctx, &tis_attr);
-		if (!priv->tiss[i]) {
-			DRV_LOG(ERR, "Failed to create TIS %u.", i);
-			goto error;
-		}
-	}
 	priv->nr_virtqs = nr_vring;
 	for (i = 0; i < nr_vring; i++)
 		if (priv->virtqs[i].enable && mlx5_vdpa_virtq_setup(priv, i))
-- 
2.35.1



* [PATCH 5/7] vdpa/mlx5: cache and reuse hardware resources
  2022-02-24 13:28 [PATCH 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
                   ` (3 preceding siblings ...)
  2022-02-24 13:28 ` [PATCH 4/7] vdpa/mlx5: reuse resources in reconfiguration Xueming Li
@ 2022-02-24 13:28 ` Xueming Li
  2022-02-24 13:28 ` [PATCH 6/7] vdpa/mlx5: support device cleanup callback Xueming Li
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 43+ messages in thread
From: Xueming Li @ 2022-02-24 13:28 UTC (permalink / raw)
  To: dev; +Cc: xuemingl, Matan Azrad, Viacheslav Ovsiienko

During device suspend and resume, resources normally do not change.
When a VM is allocated large resources, such as a large memory size or
many queues, the time spent releasing and recreating them becomes
significant.

To speed this up, this patch reuses resources such as the VM MR and
VirtQ memory if they have not changed.
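
The "reuse if not changed" decision rests on comparing the cached
memory layout with the new one. The sketch below follows the shape of
mlx5_vdpa_mem_cmp() from this patch but uses hypothetical, simplified
types (struct demo_region, struct demo_mem) instead of the vhost
structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for a guest memory table. */
struct demo_region {
	uint64_t guest_phys_addr;
	uint64_t size;
};

struct demo_mem {
	uint32_t nregions;
	const struct demo_region *regions;
};

/* Return 0 when both tables describe the same layout, -1 otherwise. */
static int
demo_mem_cmp(const struct demo_mem *a, const struct demo_mem *b)
{
	uint32_t i;

	assert(a != NULL && b != NULL);
	if (a->nregions != b->nregions)
		return -1;
	for (i = 0; i < a->nregions; i++) {
		if (a->regions[i].guest_phys_addr !=
		    b->regions[i].guest_phys_addr)
			return -1;
		if (a->regions[i].size != b->regions[i].size)
			return -1;
	}
	return 0;
}
```

A caller would keep the cached registrations when the result is 0 and
fall back to a full deregister and register otherwise.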

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 11 ++++-
 drivers/vdpa/mlx5/mlx5_vdpa.h       | 12 ++++-
 drivers/vdpa/mlx5/mlx5_vdpa_mem.c   | 27 ++++++++++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 73 +++++++++++++++++++++--------
 4 files changed, 99 insertions(+), 24 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 7e57ae715a8..dbaa590d5d1 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -241,6 +241,13 @@ mlx5_vdpa_mtu_set(struct mlx5_vdpa_priv *priv)
 	return kern_mtu == vhost_mtu ? 0 : -1;
 }
 
+static void
+mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv)
+{
+	mlx5_vdpa_virtqs_cleanup(priv);
+	mlx5_vdpa_mem_dereg(priv);
+}
+
 static int
 mlx5_vdpa_dev_close(int vid)
 {
@@ -260,7 +267,8 @@ mlx5_vdpa_dev_close(int vid)
 	}
 	mlx5_vdpa_steer_unset(priv);
 	mlx5_vdpa_virtqs_release(priv);
-	mlx5_vdpa_mem_dereg(priv);
+	if (priv->lm_mr.addr)
+		mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
 	priv->state = MLX5_VDPA_STATE_PROBED;
 	priv->vid = 0;
 	/* The mutex may stay locked after event thread cancel - initiate it. */
@@ -661,6 +669,7 @@ mlx5_vdpa_release_dev_resources(struct mlx5_vdpa_priv *priv)
 {
 	uint32_t i;
 
+	mlx5_vdpa_dev_cache_clean(priv);
 	mlx5_vdpa_event_qp_global_release(priv);
 	mlx5_vdpa_err_event_unset(priv);
 	if (priv->steer.tbl)
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index e0ba20b953c..540bf87a352 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -289,13 +289,21 @@ int mlx5_vdpa_err_event_setup(struct mlx5_vdpa_priv *priv);
 void mlx5_vdpa_err_event_unset(struct mlx5_vdpa_priv *priv);
 
 /**
- * Release a virtq and all its related resources.
+ * Release virtqs and resources except those to be reused.
  *
  * @param[in] priv
  *   The vdpa driver private structure.
  */
 void mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv);
 
+/**
+ * Cleanup cached resources of all virtqs.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ */
+void mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv);
+
 /**
  * Create all the HW virtqs resources and all their related resources.
  *
@@ -323,7 +331,7 @@ int mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv);
 int mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable);
 
 /**
- * Unset steering and release all its related resources- stop traffic.
+ * Unset steering - stop traffic.
  *
  * @param[in] priv
  *   The vdpa driver private structure.
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_mem.c b/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
index 62f5530e91d..d6e3dd664b5 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
@@ -32,8 +32,6 @@ mlx5_vdpa_mem_dereg(struct mlx5_vdpa_priv *priv)
 		entry = next;
 	}
 	SLIST_INIT(&priv->mr_list);
-	if (priv->lm_mr.addr)
-		mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
 	if (priv->vmem) {
 		free(priv->vmem);
 		priv->vmem = NULL;
@@ -149,6 +147,23 @@ mlx5_vdpa_vhost_mem_regions_prepare(int vid, uint8_t *mode, uint64_t *mem_size,
 	return mem;
 }
 
+static int
+mlx5_vdpa_mem_cmp(struct rte_vhost_memory *mem1, struct rte_vhost_memory *mem2)
+{
+	uint32_t i;
+
+	if (mem1->nregions != mem2->nregions)
+		return -1;
+	for (i = 0; i < mem1->nregions; i++) {
+		if (mem1->regions[i].guest_phys_addr !=
+		    mem2->regions[i].guest_phys_addr)
+			return -1;
+		if (mem1->regions[i].size != mem2->regions[i].size)
+			return -1;
+	}
+	return 0;
+}
+
 #define KLM_SIZE_MAX_ALIGN(sz) ((sz) > MLX5_MAX_KLM_BYTE_COUNT ? \
 				MLX5_MAX_KLM_BYTE_COUNT : (sz))
 
@@ -191,6 +206,14 @@ mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
 
 	if (!mem)
 		return -rte_errno;
+	if (priv->vmem != NULL) {
+		if (mlx5_vdpa_mem_cmp(mem, priv->vmem) == 0) {
+			/* VM memory not changed, reuse resources. */
+			free(mem);
+			return 0;
+		}
+		mlx5_vdpa_mem_dereg(priv);
+	}
 	priv->vmem = mem;
 	for (i = 0; i < mem->nregions; i++) {
 		reg = &mem->regions[i];
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 6bda9f1814a..c42846ecb3c 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -66,10 +66,33 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 	DRV_LOG(DEBUG, "Ring virtq %u doorbell.", virtq->index);
 }
 
+/* Release cached VQ resources. */
+void
+mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
+{
+	unsigned int i, j;
+
+	for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
+		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
+
+		for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
+			if (virtq->umems[j].obj) {
+				claim_zero(mlx5_glue->devx_umem_dereg
+							(virtq->umems[j].obj));
+				virtq->umems[j].obj = NULL;
+			}
+			if (virtq->umems[j].buf) {
+				rte_free(virtq->umems[j].buf);
+				virtq->umems[j].buf = NULL;
+			}
+			virtq->umems[j].size = 0;
+		}
+	}
+}
+
 static int
 mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 {
-	unsigned int i;
 	int ret = -EAGAIN;
 
 	if (rte_intr_fd_get(virtq->intr_handle) >= 0) {
@@ -94,13 +117,6 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 		claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
 	}
 	virtq->virtq = NULL;
-	for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
-		if (virtq->umems[i].obj)
-			claim_zero(mlx5_glue->devx_umem_dereg
-							 (virtq->umems[i].obj));
-		rte_free(virtq->umems[i].buf);
-	}
-	memset(&virtq->umems, 0, sizeof(virtq->umems));
 	if (virtq->eqp.fw_qp)
 		mlx5_vdpa_event_qp_destroy(&virtq->eqp);
 	virtq->notifier_state = MLX5_VDPA_NOTIFIER_STATE_DISABLED;
@@ -120,7 +136,6 @@ mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv)
 			claim_zero(mlx5_devx_cmd_destroy(virtq->counters));
 	}
 	priv->features = 0;
-	memset(priv->virtqs, 0, sizeof(*virtq) * priv->nr_virtqs);
 	priv->nr_virtqs = 0;
 }
 
@@ -215,6 +230,8 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 	ret = rte_vhost_get_vhost_vring(priv->vid, index, &vq);
 	if (ret)
 		return -1;
+	if (vq.size == 0)
+		return 0;
 	virtq->index = index;
 	virtq->vq_size = vq.size;
 	attr.tso_ipv4 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO4));
@@ -259,24 +276,42 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 	}
 	/* Setup 3 UMEMs for each virtq. */
 	for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
-		virtq->umems[i].size = priv->caps.umems[i].a * vq.size +
-							  priv->caps.umems[i].b;
-		virtq->umems[i].buf = rte_zmalloc(__func__,
-						  virtq->umems[i].size, 4096);
-		if (!virtq->umems[i].buf) {
+		uint32_t size;
+		void *buf;
+		struct mlx5dv_devx_umem *obj;
+
+		size = priv->caps.umems[i].a * vq.size + priv->caps.umems[i].b;
+		if (virtq->umems[i].size == size &&
+		    virtq->umems[i].obj != NULL) {
+			/* Reuse registered memory. */
+			memset(virtq->umems[i].buf, 0, size);
+			goto reuse;
+		}
+		if (virtq->umems[i].obj)
+			claim_zero(mlx5_glue->devx_umem_dereg
+				   (virtq->umems[i].obj));
+		if (virtq->umems[i].buf)
+			rte_free(virtq->umems[i].buf);
+		virtq->umems[i].size = 0;
+		virtq->umems[i].obj = NULL;
+		virtq->umems[i].buf = NULL;
+		buf = rte_zmalloc(__func__, size, 4096);
+		if (buf == NULL) {
 			DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
 				" %u.", i, index);
 			goto error;
 		}
-		virtq->umems[i].obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx,
-							virtq->umems[i].buf,
-							virtq->umems[i].size,
-							IBV_ACCESS_LOCAL_WRITE);
-		if (!virtq->umems[i].obj) {
+		obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx, buf, size,
+					       IBV_ACCESS_LOCAL_WRITE);
+		if (obj == NULL) {
 			DRV_LOG(ERR, "Failed to register umem %d for virtq %u.",
 				i, index);
 			goto error;
 		}
+		virtq->umems[i].size = size;
+		virtq->umems[i].buf = buf;
+		virtq->umems[i].obj = obj;
+reuse:
 		attr.umems[i].id = virtq->umems[i].obj->umem_id;
 		attr.umems[i].offset = 0;
 		attr.umems[i].size = virtq->umems[i].size;
-- 
2.35.1


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 6/7] vdpa/mlx5: support device cleanup callback
  2022-02-24 13:28 [PATCH 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
                   ` (4 preceding siblings ...)
  2022-02-24 13:28 ` [PATCH 5/7] vdpa/mlx5: cache and reuse hardware resources Xueming Li
@ 2022-02-24 13:28 ` Xueming Li
  2022-02-24 13:28 ` [PATCH 7/7] vdpa/mlx5: make statistics counter persistent Xueming Li
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 43+ messages in thread
From: Xueming Li @ 2022-02-24 13:28 UTC (permalink / raw)
  To: dev; +Cc: xuemingl, Matan Azrad, Viacheslav Ovsiienko

Support the device cleanup callback API, which is called when the
device is disconnected from the VM. Cached resources such as the VM MR
and VQ memory are released.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
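Note: the interaction between close, cleanup and the new connected flag
can be sketched as a small state model. This is only an illustration
under simplifying assumptions: demo_dev and the demo_* functions are
hypothetical, and cache_valid stands in for the cached MR and VQ memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_state {
	DEMO_STATE_PROBED = 0,
	DEMO_STATE_CONFIGURED,
};

/* Hypothetical model of the private device structure. */
struct demo_dev {
	bool connected;   /* A VM is attached to the device. */
	enum demo_state state;
	bool cache_valid; /* Stand-in for cached MR and VQ memory. */
};

/* Configure: the VM is connected and resources get cached. */
void
demo_config(struct demo_dev *dev)
{
	assert(dev != NULL);
	dev->connected = true;
	dev->state = DEMO_STATE_CONFIGURED;
	dev->cache_valid = true;
}

/* Close: keep the cache unless the VM has already disconnected. */
void
demo_close(struct demo_dev *dev)
{
	assert(dev != NULL);
	dev->state = DEMO_STATE_PROBED;
	if (!dev->connected)
		dev->cache_valid = false;
}

/* Cleanup: drop the cache now if the device is already closed. */
void
demo_cleanup(struct demo_dev *dev)
{
	assert(dev != NULL);
	if (dev->state == DEMO_STATE_PROBED)
		dev->cache_valid = false;
	dev->connected = false;
}
```

In this model the cache is dropped by whichever of close and cleanup
runs last, so a close that is followed by a reconfiguration keeps it.
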
 drivers/vdpa/mlx5/mlx5_vdpa.c | 23 +++++++++++++++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa.h |  1 +
 2 files changed, 24 insertions(+)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index dbaa590d5d1..c83b1141482 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -270,6 +270,8 @@ mlx5_vdpa_dev_close(int vid)
 	if (priv->lm_mr.addr)
 		mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
 	priv->state = MLX5_VDPA_STATE_PROBED;
+	if (!priv->connected)
+		mlx5_vdpa_dev_cache_clean(priv);
 	priv->vid = 0;
 	/* The mutex may stay locked after event thread cancel - initiate it. */
 	pthread_mutex_init(&priv->vq_config_lock, NULL);
@@ -294,6 +296,7 @@ mlx5_vdpa_dev_config(int vid)
 		return -1;
 	}
 	priv->vid = vid;
+	priv->connected = true;
 	if (mlx5_vdpa_mtu_set(priv))
 		DRV_LOG(WARNING, "MTU cannot be set on device %s.",
 				vdev->device->name);
@@ -431,12 +434,32 @@ mlx5_vdpa_reset_stats(struct rte_vdpa_device *vdev, int qid)
 	return mlx5_vdpa_virtq_stats_reset(priv, qid);
 }
 
+static int
+mlx5_vdpa_dev_clean(int vid)
+{
+	struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid);
+	struct mlx5_vdpa_priv *priv;
+
+	if (vdev == NULL)
+		return -1;
+	priv = mlx5_vdpa_find_priv_resource_by_vdev(vdev);
+	if (priv == NULL) {
+		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
+		return -1;
+	}
+	if (priv->state == MLX5_VDPA_STATE_PROBED)
+		mlx5_vdpa_dev_cache_clean(priv);
+	priv->connected = false;
+	return 0;
+}
+
 static struct rte_vdpa_dev_ops mlx5_vdpa_ops = {
 	.get_queue_num = mlx5_vdpa_get_queue_num,
 	.get_features = mlx5_vdpa_get_vdpa_features,
 	.get_protocol_features = mlx5_vdpa_get_protocol_features,
 	.dev_conf = mlx5_vdpa_dev_config,
 	.dev_close = mlx5_vdpa_dev_close,
+	.dev_cleanup = mlx5_vdpa_dev_clean,
 	.set_vring_state = mlx5_vdpa_set_vring_state,
 	.set_features = mlx5_vdpa_features_set,
 	.migration_done = NULL,
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 540bf87a352..24bafe85b44 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -121,6 +121,7 @@ enum mlx5_dev_state {
 
 struct mlx5_vdpa_priv {
 	TAILQ_ENTRY(mlx5_vdpa_priv) next;
+	bool connected;
 	enum mlx5_dev_state state;
 	pthread_mutex_t vq_config_lock;
 	uint64_t no_traffic_counter;
-- 
2.35.1


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 7/7] vdpa/mlx5: make statistics counter persistent
  2022-02-24 13:28 [PATCH 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
                   ` (5 preceding siblings ...)
  2022-02-24 13:28 ` [PATCH 6/7] vdpa/mlx5: support device cleanup callback Xueming Li
@ 2022-02-24 13:28 ` Xueming Li
  2022-02-24 14:38 ` [PATCH v1 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 43+ messages in thread
From: Xueming Li @ 2022-02-24 13:28 UTC (permalink / raw)
  To: dev; +Cc: xuemingl, Matan Azrad, Viacheslav Ovsiienko

To shorten device suspend and resume time, make the statistics counters
persistent across reconfiguration until the device is removed.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
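Note: with persistent counters, a reset cannot zero the hardware value,
so the reported value is the raw counter minus a snapshot taken at reset
time. A minimal sketch of that baseline scheme follows. The demo_* names
are hypothetical and the hw field only simulates a free-running hardware
counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the per-queue counters. */
struct demo_counters {
	uint64_t received_desc;
	uint64_t completed_desc;
};

struct demo_vq {
	struct demo_counters hw;    /* Simulated free-running counters. */
	struct demo_counters reset; /* Snapshot taken at the last reset. */
};

/* Reported value: raw counter relative to the reset baseline. */
uint64_t
demo_stat_received(const struct demo_vq *vq)
{
	assert(vq != NULL);
	return vq->hw.received_desc - vq->reset.received_desc;
}

/* Reset: remember the current raw values as the new baseline. */
void
demo_stats_reset(struct demo_vq *vq)
{
	assert(vq != NULL);
	vq->reset = vq->hw;
}
```

Because only the baseline moves, the underlying counter object can stay
alive across suspend and resume without being destroyed and recreated.
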
 doc/guides/vdpadevs/mlx5.rst        |  6 ++++++
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 19 +++++++----------
 drivers/vdpa/mlx5/mlx5_vdpa.h       |  1 +
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 32 +++++++++++------------------
 4 files changed, 26 insertions(+), 32 deletions(-)

diff --git a/doc/guides/vdpadevs/mlx5.rst b/doc/guides/vdpadevs/mlx5.rst
index acb791032ad..3ded142311e 100644
--- a/doc/guides/vdpadevs/mlx5.rst
+++ b/doc/guides/vdpadevs/mlx5.rst
@@ -109,3 +109,9 @@ Upon potential hardware errors, mlx5 PMD try to recover, give up if failed 3
 times in 3 seconds, virtq will be put in disable state. User should check log
 to get error information, or query vdpa statistics counter to know error type
 and count report.
+
+Statistics
+^^^^^^^^^^
+
+The device statistics counter persists in reconfiguration until the device gets
+removed. User can reset counters by calling function rte_vdpa_reset_stats().
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index c83b1141482..92ef7777169 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -388,12 +388,7 @@ mlx5_vdpa_get_stats(struct rte_vdpa_device *vdev, int qid,
 		DRV_LOG(ERR, "Invalid device: %s.", vdev->device->name);
 		return -ENODEV;
 	}
-	if (priv->state == MLX5_VDPA_STATE_PROBED) {
-		DRV_LOG(ERR, "Device %s was not configured.",
-				vdev->device->name);
-		return -ENODATA;
-	}
-	if (qid >= (int)priv->nr_virtqs) {
+	if (qid >= (int)priv->caps.max_num_virtio_queues * 2) {
 		DRV_LOG(ERR, "Too big vring id: %d for device %s.", qid,
 				vdev->device->name);
 		return -E2BIG;
@@ -416,12 +411,7 @@ mlx5_vdpa_reset_stats(struct rte_vdpa_device *vdev, int qid)
 		DRV_LOG(ERR, "Invalid device: %s.", vdev->device->name);
 		return -ENODEV;
 	}
-	if (priv->state == MLX5_VDPA_STATE_PROBED) {
-		DRV_LOG(ERR, "Device %s was not configured.",
-				vdev->device->name);
-		return -ENODATA;
-	}
-	if (qid >= (int)priv->nr_virtqs) {
+	if (qid >= (int)priv->caps.max_num_virtio_queues * 2) {
 		DRV_LOG(ERR, "Too big vring id: %d for device %s.", qid,
 				vdev->device->name);
 		return -E2BIG;
@@ -693,6 +683,11 @@ mlx5_vdpa_release_dev_resources(struct mlx5_vdpa_priv *priv)
 	uint32_t i;
 
 	mlx5_vdpa_dev_cache_clean(priv);
+	for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
+		if (!priv->virtqs[i].counters)
+			continue;
+		claim_zero(mlx5_devx_cmd_destroy(priv->virtqs[i].counters));
+	}
 	mlx5_vdpa_event_qp_global_release(priv);
 	mlx5_vdpa_err_event_unset(priv);
 	if (priv->steer.tbl)
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 24bafe85b44..e7f3319f896 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -92,6 +92,7 @@ struct mlx5_vdpa_virtq {
 	struct rte_intr_handle *intr_handle;
 	uint64_t err_time[3]; /* RDTSC time of recent errors. */
 	uint32_t n_retry;
+	struct mlx5_devx_virtio_q_couners_attr stats;
 	struct mlx5_devx_virtio_q_couners_attr reset;
 };
 
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index c42846ecb3c..d2c91b25db1 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -127,14 +127,9 @@ void
 mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv)
 {
 	int i;
-	struct mlx5_vdpa_virtq *virtq;
 
-	for (i = 0; i < priv->nr_virtqs; i++) {
-		virtq = &priv->virtqs[i];
-		mlx5_vdpa_virtq_unset(virtq);
-		if (virtq->counters)
-			claim_zero(mlx5_devx_cmd_destroy(virtq->counters));
-	}
+	for (i = 0; i < priv->nr_virtqs; i++)
+		mlx5_vdpa_virtq_unset(&priv->virtqs[i]);
 	priv->features = 0;
 	priv->nr_virtqs = 0;
 }
@@ -590,7 +585,7 @@ mlx5_vdpa_virtq_stats_get(struct mlx5_vdpa_priv *priv, int qid,
 			  struct rte_vdpa_stat *stats, unsigned int n)
 {
 	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[qid];
-	struct mlx5_devx_virtio_q_couners_attr attr = {0};
+	struct mlx5_devx_virtio_q_couners_attr *attr = &virtq->stats;
 	int ret;
 
 	if (!virtq->counters) {
@@ -598,7 +593,7 @@ mlx5_vdpa_virtq_stats_get(struct mlx5_vdpa_priv *priv, int qid,
 			"is invalid.", qid);
 		return -EINVAL;
 	}
-	ret = mlx5_devx_cmd_query_virtio_q_counters(virtq->counters, &attr);
+	ret = mlx5_devx_cmd_query_virtio_q_counters(virtq->counters, attr);
 	if (ret) {
 		DRV_LOG(ERR, "Failed to read virtq %d stats from HW.", qid);
 		return ret;
@@ -608,37 +603,37 @@ mlx5_vdpa_virtq_stats_get(struct mlx5_vdpa_priv *priv, int qid,
 		return ret;
 	stats[MLX5_VDPA_STATS_RECEIVED_DESCRIPTORS] = (struct rte_vdpa_stat) {
 		.id = MLX5_VDPA_STATS_RECEIVED_DESCRIPTORS,
-		.value = attr.received_desc - virtq->reset.received_desc,
+		.value = attr->received_desc - virtq->reset.received_desc,
 	};
 	if (ret == MLX5_VDPA_STATS_COMPLETED_DESCRIPTORS)
 		return ret;
 	stats[MLX5_VDPA_STATS_COMPLETED_DESCRIPTORS] = (struct rte_vdpa_stat) {
 		.id = MLX5_VDPA_STATS_COMPLETED_DESCRIPTORS,
-		.value = attr.completed_desc - virtq->reset.completed_desc,
+		.value = attr->completed_desc - virtq->reset.completed_desc,
 	};
 	if (ret == MLX5_VDPA_STATS_BAD_DESCRIPTOR_ERRORS)
 		return ret;
 	stats[MLX5_VDPA_STATS_BAD_DESCRIPTOR_ERRORS] = (struct rte_vdpa_stat) {
 		.id = MLX5_VDPA_STATS_BAD_DESCRIPTOR_ERRORS,
-		.value = attr.bad_desc_errors - virtq->reset.bad_desc_errors,
+		.value = attr->bad_desc_errors - virtq->reset.bad_desc_errors,
 	};
 	if (ret == MLX5_VDPA_STATS_EXCEED_MAX_CHAIN)
 		return ret;
 	stats[MLX5_VDPA_STATS_EXCEED_MAX_CHAIN] = (struct rte_vdpa_stat) {
 		.id = MLX5_VDPA_STATS_EXCEED_MAX_CHAIN,
-		.value = attr.exceed_max_chain - virtq->reset.exceed_max_chain,
+		.value = attr->exceed_max_chain - virtq->reset.exceed_max_chain,
 	};
 	if (ret == MLX5_VDPA_STATS_INVALID_BUFFER)
 		return ret;
 	stats[MLX5_VDPA_STATS_INVALID_BUFFER] = (struct rte_vdpa_stat) {
 		.id = MLX5_VDPA_STATS_INVALID_BUFFER,
-		.value = attr.invalid_buffer - virtq->reset.invalid_buffer,
+		.value = attr->invalid_buffer - virtq->reset.invalid_buffer,
 	};
 	if (ret == MLX5_VDPA_STATS_COMPLETION_ERRORS)
 		return ret;
 	stats[MLX5_VDPA_STATS_COMPLETION_ERRORS] = (struct rte_vdpa_stat) {
 		.id = MLX5_VDPA_STATS_COMPLETION_ERRORS,
-		.value = attr.error_cqes - virtq->reset.error_cqes,
+		.value = attr->error_cqes - virtq->reset.error_cqes,
 	};
 	return ret;
 }
@@ -649,11 +644,8 @@ mlx5_vdpa_virtq_stats_reset(struct mlx5_vdpa_priv *priv, int qid)
 	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[qid];
 	int ret;
 
-	if (!virtq->counters) {
-		DRV_LOG(ERR, "Failed to read virtq %d statistics - virtq "
-			"is invalid.", qid);
-		return -EINVAL;
-	}
+	if (virtq->counters == NULL) /* VQ not enabled. */
+		return 0;
 	ret = mlx5_devx_cmd_query_virtio_q_counters(virtq->counters,
 						    &virtq->reset);
 	if (ret)
-- 
2.35.1


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH v1 0/7] vdpa/mlx5: improve device shutdown time
  2022-02-24 13:28 [PATCH 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
                   ` (6 preceding siblings ...)
  2022-02-24 13:28 ` [PATCH 7/7] vdpa/mlx5: make statistics counter persistent Xueming Li
@ 2022-02-24 14:38 ` Xueming Li
  2022-02-24 14:38   ` [PATCH v1 1/7] vdpa/mlx5: fix interrupt trash that leads to segment fault Xueming Li
                     ` (6 more replies)
  2022-02-24 15:50 ` [PATCH v2 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
  2022-05-08 14:25 ` [PATCH v3 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
  9 siblings, 7 replies; 43+ messages in thread
From: Xueming Li @ 2022-02-24 14:38 UTC (permalink / raw)
  To: dev; +Cc: xuemingl

v1:
 - rebase with latest upstream code
 - fix coverity issues

Xueming Li (7):
  vdpa/mlx5: fix interrupt trash that leads to segment fault
  vdpa/mlx5: fix dead loop when process interrupted
  vdpa/mlx5: no kick handling during shutdown
  vdpa/mlx5: reuse resources in reconfiguration
  vdpa/mlx5: cache and reuse hardware resources
  vdpa/mlx5: support device cleanup callback
  vdpa/mlx5: make statistics counter persistent

 doc/guides/vdpadevs/mlx5.rst        |   6 +
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 229 +++++++++++++++++++++-------
 drivers/vdpa/mlx5/mlx5_vdpa.h       |  31 +++-
 drivers/vdpa/mlx5/mlx5_vdpa_event.c |  23 +--
 drivers/vdpa/mlx5/mlx5_vdpa_mem.c   |  38 +++--
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c |  25 +--
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 189 +++++++++++------------
 7 files changed, 334 insertions(+), 207 deletions(-)

-- 
2.35.1


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH v1 1/7] vdpa/mlx5: fix interrupt trash that leads to segment fault
  2022-02-24 14:38 ` [PATCH v1 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
@ 2022-02-24 14:38   ` Xueming Li
  2022-02-24 14:38   ` [PATCH v1 2/7] vdpa/mlx5: fix dead loop when process interrupted Xueming Li
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 43+ messages in thread
From: Xueming Li @ 2022-02-24 14:38 UTC (permalink / raw)
  To: dev
  Cc: xuemingl, matan, stable, Matan Azrad, Viacheslav Ovsiienko,
	Maxime Coquelin

Remove the retry limit on interrupt callback unregistration, so that
the interrupt thread is never left running with an invalid FD, which
caused a segmentation fault.

Fixes: 62c813706e41 ("vdpa/mlx5: map doorbell")
Cc: matan@mellanox.com
Cc: stable@dpdk.org

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
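Note: the loop shape after this change can be sketched without DPDK. The
sketch assumes an unregister call that returns -EAGAIN while the callback
is still busy and a non-negative value once it is removed. The names
demo_unregister_fn, demo_busy_then_ok() and demo_unregister_until_done()
are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical unregister hook: -EAGAIN while busy, >= 0 when done. */
typedef int (*demo_unregister_fn)(void *ctx);

/* Test double: reports busy *ctx times, then succeeds with 1. */
int
demo_busy_then_ok(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return -EAGAIN;
	}
	return 1;
}

/* Keep retrying while the callback is busy, with no retry limit. */
int
demo_unregister_until_done(demo_unregister_fn fn, void *ctx,
			   unsigned int *attempts)
{
	int ret = -EAGAIN;

	assert(fn != NULL);
	while (ret == -EAGAIN) {
		ret = fn(ctx);
		if (attempts != NULL)
			(*attempts)++;
		/* The driver sleeps between attempts at this point. */
	}
	return ret;
}
```

The trade-off is that the caller may wait indefinitely, in exchange for
never closing the FD while the callback is still registered.
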
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 3416797d289..de324506cb9 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -17,7 +17,7 @@
 
 
 static void
-mlx5_vdpa_virtq_handler(void *cb_arg)
+mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 {
 	struct mlx5_vdpa_virtq *virtq = cb_arg;
 	struct mlx5_vdpa_priv *priv = virtq->priv;
@@ -59,20 +59,16 @@ static int
 mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 {
 	unsigned int i;
-	int retries = MLX5_VDPA_INTR_RETRIES;
 	int ret = -EAGAIN;
 
-	if (rte_intr_fd_get(virtq->intr_handle) != -1) {
-		while (retries-- && ret == -EAGAIN) {
+	if (rte_intr_fd_get(virtq->intr_handle) >= 0) {
+		while (ret == -EAGAIN) {
 			ret = rte_intr_callback_unregister(virtq->intr_handle,
-							mlx5_vdpa_virtq_handler,
-							virtq);
+					mlx5_vdpa_virtq_kick_handler, virtq);
 			if (ret == -EAGAIN) {
-				DRV_LOG(DEBUG, "Try again to unregister fd %d "
-				"of virtq %d interrupt, retries = %d.",
-				rte_intr_fd_get(virtq->intr_handle),
-				(int)virtq->index, retries);
-
+				DRV_LOG(DEBUG, "Try again to unregister fd %d of virtq %hu interrupt",
+					rte_intr_fd_get(virtq->intr_handle),
+					(int)virtq->index);
 				usleep(MLX5_VDPA_INTR_RETRIES_USEC);
 			}
 		}
@@ -359,7 +355,7 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 			goto error;
 
 		if (rte_intr_callback_register(virtq->intr_handle,
-					       mlx5_vdpa_virtq_handler,
+					       mlx5_vdpa_virtq_kick_handler,
 					       virtq)) {
 			rte_intr_fd_set(virtq->intr_handle, -1);
 			DRV_LOG(ERR, "Failed to register virtq %d interrupt.",
-- 
2.35.1


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH v1 2/7] vdpa/mlx5: fix dead loop when process interrupted
  2022-02-24 14:38 ` [PATCH v1 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
  2022-02-24 14:38   ` [PATCH v1 1/7] vdpa/mlx5: fix interrupt trash that leads to segment fault Xueming Li
@ 2022-02-24 14:38   ` Xueming Li
  2022-02-24 14:38   ` [PATCH v1 3/7] vdpa/mlx5: no kick handling during shutdown Xueming Li
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 43+ messages in thread
From: Xueming Li @ 2022-02-24 14:38 UTC (permalink / raw)
  To: dev; +Cc: xuemingl, stable, Matan Azrad, Viacheslav Ovsiienko, Maxime Coquelin

When handling Ctrl+C, the kick handling thread sometimes gets an endless
stream of EAGAIN errors and falls into a dead loop.

Kicks happen frequently in a real system because of busy traffic or the
retry mechanism. This patch bounds the read retries and gives up on the
kick if the read keeps failing, skipping the hardware notifier setup
because of the potential device error. The notifier can be set by the
next successful kick request.

Fixes: 62c813706e41 ("vdpa/mlx5: map doorbell")
Cc: stable@dpdk.org

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
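Note: the bounded retry can be tried outside the driver with a small
sketch. It assumes a non-blocking pipe in place of the kick FD, and the
names demo_open_kick_pipe(), demo_drain_kick() and DEMO_KICK_RETRIES are
hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

#define DEMO_KICK_RETRIES 3

/* Create a pipe whose read end is non-blocking, as a kick FD stand-in. */
int
demo_open_kick_pipe(int fds[2])
{
	if (pipe(fds) != 0)
		return -1;
	return fcntl(fds[0], F_SETFL, O_NONBLOCK) == 0 ? 0 : -1;
}

/*
 * Read one 8-byte kick value with a bounded number of attempts.
 * Returns the byte count on success or -1 if every attempt failed.
 */
ssize_t
demo_drain_kick(int fd, uint64_t *value)
{
	ssize_t nbytes = -1;
	int retry;

	assert(value != NULL);
	for (retry = 0; retry < DEMO_KICK_RETRIES; ++retry) {
		nbytes = read(fd, value, sizeof(*value));
		if (nbytes < 0 && (errno == EINTR ||
				   errno == EWOULDBLOCK ||
				   errno == EAGAIN))
			continue; /* Transient error: try again. */
		break;
	}
	return nbytes;
}
```

A caller that gets a negative result simply returns and waits for the
next kick instead of spinning.
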
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index de324506cb9..e1e05924a40 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -23,11 +23,11 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 	struct mlx5_vdpa_priv *priv = virtq->priv;
 	uint64_t buf;
 	int nbytes;
+	int retry;
 
 	if (rte_intr_fd_get(virtq->intr_handle) < 0)
 		return;
-
-	do {
+	for (retry = 0; retry < 3; ++retry) {
 		nbytes = read(rte_intr_fd_get(virtq->intr_handle), &buf,
 			      8);
 		if (nbytes < 0) {
@@ -39,7 +39,9 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 				virtq->index, strerror(errno));
 		}
 		break;
-	} while (1);
+	}
+	if (nbytes < 0)
+		return;
 	rte_write32(virtq->index, priv->virtq_db_addr);
 	if (virtq->notifier_state == MLX5_VDPA_NOTIFIER_STATE_DISABLED) {
 		if (rte_vhost_host_notifier_ctrl(priv->vid, virtq->index, true))
-- 
2.35.1


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH v1 3/7] vdpa/mlx5: no kick handling during shutdown
  2022-02-24 14:38 ` [PATCH v1 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
  2022-02-24 14:38   ` [PATCH v1 1/7] vdpa/mlx5: fix interrupt trash that leads to segment fault Xueming Li
  2022-02-24 14:38   ` [PATCH v1 2/7] vdpa/mlx5: fix dead loop when process interrupted Xueming Li
@ 2022-02-24 14:38   ` Xueming Li
  2022-02-24 14:38   ` [PATCH v1 4/7] vdpa/mlx5: reuse resources in reconfiguration Xueming Li
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 43+ messages in thread
From: Xueming Li @ 2022-02-24 14:38 UTC (permalink / raw)
  To: dev; +Cc: xuemingl, Matan Azrad, Viacheslav Ovsiienko

When QEMU suspends a VM, the hw notifier is unmapped while the vCPU
thread may still be active and write to the notifier through the kick
socket.

In that case, the PMD kick handler thread tries to install the hw
notifier through the client socket, which times out and slows down
device close.

This patch skips the hw notifier installation if the VQ or the device
is in the middle of shutdown.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
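Note: the guard added to the kick handler can be read as a small
predicate. The sketch below mirrors the condition as written in the
diff, where a kick is skipped when the device is not in the configured
state and the queue is disabled. The demo_* types and
demo_kick_should_skip() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_dev_state {
	DEMO_STATE_PROBED = 0,
	DEMO_STATE_CONFIGURED,
	DEMO_STATE_IN_PROGRESS, /* Shutting down. */
};

struct demo_dev {
	enum demo_dev_state state;
};

struct demo_vq {
	bool enable;
};

/* Return true when the kick handler should return without acting. */
bool
demo_kick_should_skip(const struct demo_dev *dev, const struct demo_vq *vq)
{
	assert(dev != NULL && vq != NULL);
	return dev->state != DEMO_STATE_CONFIGURED && !vq->enable;
}
```

The handler evaluates this check twice, once on entry and once after
ringing the doorbell, so a shutdown that starts in between still avoids
the slow notifier installation.
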
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 17 ++++++++++-------
 drivers/vdpa/mlx5/mlx5_vdpa.h       |  8 +++++++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 12 +++++++++++-
 3 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 8dfaba791dc..a93a9e78f7f 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -252,13 +252,15 @@ mlx5_vdpa_dev_close(int vid)
 	}
 	mlx5_vdpa_err_event_unset(priv);
 	mlx5_vdpa_cqe_event_unset(priv);
-	if (priv->configured)
+	if (priv->state == MLX5_VDPA_STATE_CONFIGURED) {
 		ret |= mlx5_vdpa_lm_log(priv);
+		priv->state = MLX5_VDPA_STATE_IN_PROGRESS;
+	}
 	mlx5_vdpa_steer_unset(priv);
 	mlx5_vdpa_virtqs_release(priv);
 	mlx5_vdpa_event_qp_global_release(priv);
 	mlx5_vdpa_mem_dereg(priv);
-	priv->configured = 0;
+	priv->state = MLX5_VDPA_STATE_PROBED;
 	priv->vid = 0;
 	/* The mutex may stay locked after event thread cancel - initiate it. */
 	pthread_mutex_init(&priv->vq_config_lock, NULL);
@@ -277,7 +279,8 @@ mlx5_vdpa_dev_config(int vid)
 		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
 		return -EINVAL;
 	}
-	if (priv->configured && mlx5_vdpa_dev_close(vid)) {
+	if (priv->state == MLX5_VDPA_STATE_CONFIGURED &&
+	    mlx5_vdpa_dev_close(vid)) {
 		DRV_LOG(ERR, "Failed to reconfigure vid %d.", vid);
 		return -1;
 	}
@@ -291,7 +294,7 @@ mlx5_vdpa_dev_config(int vid)
 		mlx5_vdpa_dev_close(vid);
 		return -1;
 	}
-	priv->configured = 1;
+	priv->state = MLX5_VDPA_STATE_CONFIGURED;
 	DRV_LOG(INFO, "vDPA device %d was configured.", vid);
 	return 0;
 }
@@ -373,7 +376,7 @@ mlx5_vdpa_get_stats(struct rte_vdpa_device *vdev, int qid,
 		DRV_LOG(ERR, "Invalid device: %s.", vdev->device->name);
 		return -ENODEV;
 	}
-	if (!priv->configured) {
+	if (priv->state == MLX5_VDPA_STATE_PROBED) {
 		DRV_LOG(ERR, "Device %s was not configured.",
 				vdev->device->name);
 		return -ENODATA;
@@ -401,7 +404,7 @@ mlx5_vdpa_reset_stats(struct rte_vdpa_device *vdev, int qid)
 		DRV_LOG(ERR, "Invalid device: %s.", vdev->device->name);
 		return -ENODEV;
 	}
-	if (!priv->configured) {
+	if (priv->state == MLX5_VDPA_STATE_PROBED) {
 		DRV_LOG(ERR, "Device %s was not configured.",
 				vdev->device->name);
 		return -ENODATA;
@@ -590,7 +593,7 @@ mlx5_vdpa_dev_remove(struct mlx5_common_device *cdev)
 		TAILQ_REMOVE(&priv_list, priv, next);
 	pthread_mutex_unlock(&priv_list_lock);
 	if (found) {
-		if (priv->configured)
+		if (priv->state == MLX5_VDPA_STATE_CONFIGURED)
 			mlx5_vdpa_dev_close(priv->vid);
 		if (priv->var) {
 			mlx5_glue->dv_free_var(priv->var);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 22617924eac..cc83d7cba3d 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -113,9 +113,15 @@ enum {
 	MLX5_VDPA_EVENT_MODE_ONLY_INTERRUPT
 };
 
+enum mlx5_dev_state {
+	MLX5_VDPA_STATE_PROBED = 0,
+	MLX5_VDPA_STATE_CONFIGURED,
+	MLX5_VDPA_STATE_IN_PROGRESS /* Shutting down. */
+};
+
 struct mlx5_vdpa_priv {
 	TAILQ_ENTRY(mlx5_vdpa_priv) next;
-	uint8_t configured;
+	enum mlx5_dev_state state;
 	pthread_mutex_t vq_config_lock;
 	uint64_t no_traffic_counter;
 	pthread_t timer_tid;
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index e1e05924a40..b1d584ca8b0 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -25,6 +25,11 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 	int nbytes;
 	int retry;
 
+	if (priv->state != MLX5_VDPA_STATE_CONFIGURED && !virtq->enable) {
+		DRV_LOG(ERR,  "device %d queue %d down, skip kick handling",
+			priv->vid, virtq->index);
+		return;
+	}
 	if (rte_intr_fd_get(virtq->intr_handle) < 0)
 		return;
 	for (retry = 0; retry < 3; ++retry) {
@@ -43,6 +48,11 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 	if (nbytes < 0)
 		return;
 	rte_write32(virtq->index, priv->virtq_db_addr);
+	if (priv->state != MLX5_VDPA_STATE_CONFIGURED && !virtq->enable) {
+		DRV_LOG(ERR,  "device %d queue %d down, skip kick handling",
+			priv->vid, virtq->index);
+		return;
+	}
 	if (virtq->notifier_state == MLX5_VDPA_NOTIFIER_STATE_DISABLED) {
 		if (rte_vhost_host_notifier_ctrl(priv->vid, virtq->index, true))
 			virtq->notifier_state = MLX5_VDPA_NOTIFIER_STATE_ERR;
@@ -541,7 +551,7 @@ mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable)
 
 	DRV_LOG(INFO, "Update virtq %d status %sable -> %sable.", index,
 		virtq->enable ? "en" : "dis", enable ? "en" : "dis");
-	if (!priv->configured) {
+	if (priv->state == MLX5_VDPA_STATE_PROBED) {
 		virtq->enable = !!enable;
 		return 0;
 	}
-- 
2.35.1


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH v1 4/7] vdpa/mlx5: reuse resources in reconfiguration
  2022-02-24 14:38 ` [PATCH v1 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
                     ` (2 preceding siblings ...)
  2022-02-24 14:38   ` [PATCH v1 3/7] vdpa/mlx5: no kick handling during shutdown Xueming Li
@ 2022-02-24 14:38   ` Xueming Li
  2022-02-24 14:38   ` [PATCH v1 5/7] vdpa/mlx5: cache and reuse hardware resources Xueming Li
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 43+ messages in thread
From: Xueming Li @ 2022-02-24 14:38 UTC (permalink / raw)
  To: dev; +Cc: xuemingl, Matan Azrad, Viacheslav Ovsiienko

To speed up device resume, create reusable resources during device
probe and release them when the device is removed. The reused resources
include the TIS, TD, VAR doorbell mmap, error handling event channel
and interrupt handler, UAR, Rx event channel, NULL MR, and steer domain
and table.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
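Note: the VAR allocation in this patch retries with a doubling delay,
starting at 0.1 s for up to 7 attempts. The arithmetic can be checked
with a short sketch. The demo_* names are hypothetical, while the base
delay and retry count are taken from the diff.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_BASE_USEC 100000U /* 0.1 s, as in usleep(100000U << retry). */
#define DEMO_MAX_RETRIES 7

/* Delay applied after failed attempt number retry (0-based). */
uint32_t
demo_backoff_usec(unsigned int retry)
{
	assert(retry < DEMO_MAX_RETRIES);
	return DEMO_BASE_USEC << retry;
}

/* Total time slept if the first retries attempts all fail. */
uint64_t
demo_backoff_total_usec(unsigned int retries)
{
	uint64_t total = 0;
	unsigned int i;

	assert(retries <= DEMO_MAX_RETRIES);
	for (i = 0; i < retries; i++)
		total += demo_backoff_usec(i);
	return total;
}
```

With all 7 attempts failing, the delays are 0.1, 0.2, 0.4, 0.8, 1.6, 3.2
and 6.4 seconds, so the worst case adds 12.7 seconds to probe before the
allocation is reported as failed.
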
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 165 +++++++++++++++++++++-------
 drivers/vdpa/mlx5/mlx5_vdpa.h       |   9 ++
 drivers/vdpa/mlx5/mlx5_vdpa_event.c |  23 ++--
 drivers/vdpa/mlx5/mlx5_vdpa_mem.c   |  11 --
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c |  25 +----
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c |  44 --------
 6 files changed, 147 insertions(+), 130 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index a93a9e78f7f..ee35c36624b 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -5,6 +5,7 @@
 #include <net/if.h>
 #include <sys/socket.h>
 #include <sys/ioctl.h>
+#include <sys/mman.h>
 #include <fcntl.h>
 #include <netinet/in.h>
 
@@ -49,6 +50,8 @@ TAILQ_HEAD(mlx5_vdpa_privs, mlx5_vdpa_priv) priv_list =
 					      TAILQ_HEAD_INITIALIZER(priv_list);
 static pthread_mutex_t priv_list_lock = PTHREAD_MUTEX_INITIALIZER;
 
+static void mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv);
+
 static struct mlx5_vdpa_priv *
 mlx5_vdpa_find_priv_resource_by_vdev(struct rte_vdpa_device *vdev)
 {
@@ -250,7 +253,6 @@ mlx5_vdpa_dev_close(int vid)
 		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
 		return -1;
 	}
-	mlx5_vdpa_err_event_unset(priv);
 	mlx5_vdpa_cqe_event_unset(priv);
 	if (priv->state == MLX5_VDPA_STATE_CONFIGURED) {
 		ret |= mlx5_vdpa_lm_log(priv);
@@ -258,7 +260,6 @@ mlx5_vdpa_dev_close(int vid)
 	}
 	mlx5_vdpa_steer_unset(priv);
 	mlx5_vdpa_virtqs_release(priv);
-	mlx5_vdpa_event_qp_global_release(priv);
 	mlx5_vdpa_mem_dereg(priv);
 	priv->state = MLX5_VDPA_STATE_PROBED;
 	priv->vid = 0;
@@ -288,7 +289,7 @@ mlx5_vdpa_dev_config(int vid)
 	if (mlx5_vdpa_mtu_set(priv))
 		DRV_LOG(WARNING, "MTU cannot be set on device %s.",
 				vdev->device->name);
-	if (mlx5_vdpa_mem_register(priv) || mlx5_vdpa_err_event_setup(priv) ||
+	if (mlx5_vdpa_mem_register(priv) ||
 	    mlx5_vdpa_virtqs_prepare(priv) || mlx5_vdpa_steer_setup(priv) ||
 	    mlx5_vdpa_cqe_event_setup(priv)) {
 		mlx5_vdpa_dev_close(vid);
@@ -504,12 +505,87 @@ mlx5_vdpa_config_get(struct rte_devargs *devargs, struct mlx5_vdpa_priv *priv)
 	DRV_LOG(DEBUG, "no traffic max is %u.", priv->no_traffic_max);
 }
 
+static int
+mlx5_vdpa_create_dev_resources(struct mlx5_vdpa_priv *priv)
+{
+	struct mlx5_devx_tis_attr tis_attr = {0};
+	struct ibv_context *ctx = priv->cdev->ctx;
+	uint32_t i;
+	int retry;
+
+	for (retry = 0; retry < 7; retry++) {
+		priv->var = mlx5_glue->dv_alloc_var(ctx, 0);
+		if (priv->var != NULL)
+			break;
+		DRV_LOG(WARNING, "Failed to allocate VAR, retry %d.", retry);
+		/* Wait Qemu release VAR during vdpa restart, 0.1 sec based. */
+		usleep(100000U << retry);
+	}
+	if (!priv->var) {
+		DRV_LOG(ERR, "Failed to allocate VAR %u.", errno);
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	/* Always map the entire page. */
+	priv->virtq_db_addr = mmap(NULL, priv->var->length, PROT_READ |
+				   PROT_WRITE, MAP_SHARED, ctx->cmd_fd,
+				   priv->var->mmap_off);
+	if (priv->virtq_db_addr == MAP_FAILED) {
+		DRV_LOG(ERR, "Failed to map doorbell page %u.", errno);
+		priv->virtq_db_addr = NULL;
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	DRV_LOG(DEBUG, "VAR address of doorbell mapping is %p.",
+		priv->virtq_db_addr);
+	priv->td = mlx5_devx_cmd_create_td(ctx);
+	if (!priv->td) {
+		DRV_LOG(ERR, "Failed to create transport domain.");
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	tis_attr.transport_domain = priv->td->id;
+	for (i = 0; i < priv->num_lag_ports; i++) {
+		/* 0 is auto affinity, non-zero value to propose port. */
+		tis_attr.lag_tx_port_affinity = i + 1;
+		priv->tiss[i] = mlx5_devx_cmd_create_tis(ctx, &tis_attr);
+		if (!priv->tiss[i]) {
+			DRV_LOG(ERR, "Failed to create TIS %u.", i);
+			return -rte_errno;
+		}
+	}
+	priv->null_mr = mlx5_glue->alloc_null_mr(priv->cdev->pd);
+	if (!priv->null_mr) {
+		DRV_LOG(ERR, "Failed to allocate null MR.");
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	DRV_LOG(DEBUG, "Dump fill Mkey = %u.", priv->null_mr->lkey);
+	priv->steer.domain = mlx5_glue->dr_create_domain(ctx,
+					MLX5DV_DR_DOMAIN_TYPE_NIC_RX);
+	if (!priv->steer.domain) {
+		DRV_LOG(ERR, "Failed to create Rx domain.");
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	priv->steer.tbl = mlx5_glue->dr_create_flow_tbl(priv->steer.domain, 0);
+	if (!priv->steer.tbl) {
+		DRV_LOG(ERR, "Failed to create table 0 with Rx domain.");
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	if (mlx5_vdpa_err_event_setup(priv) != 0)
+		return -rte_errno;
+	if (mlx5_vdpa_event_qp_global_prepare(priv))
+		return -rte_errno;
+	return 0;
+}
+
 static int
 mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev)
 {
 	struct mlx5_vdpa_priv *priv = NULL;
 	struct mlx5_hca_attr *attr = &cdev->config.hca_attr;
-	int retry;
 
 	if (!attr->vdpa.valid || !attr->vdpa.max_num_virtio_queues) {
 		DRV_LOG(ERR, "Not enough capabilities to support vdpa, maybe "
@@ -533,25 +609,10 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev)
 	priv->num_lag_ports = attr->num_lag_ports;
 	if (attr->num_lag_ports == 0)
 		priv->num_lag_ports = 1;
+	pthread_mutex_init(&priv->vq_config_lock, NULL);
 	priv->cdev = cdev;
-	for (retry = 0; retry < 7; retry++) {
-		priv->var = mlx5_glue->dv_alloc_var(priv->cdev->ctx, 0);
-		if (priv->var != NULL)
-			break;
-		DRV_LOG(WARNING, "Failed to allocate VAR, retry %d.\n", retry);
-		/* Wait Qemu release VAR during vdpa restart, 0.1 sec based. */
-		usleep(100000U << retry);
-	}
-	if (!priv->var) {
-		DRV_LOG(ERR, "Failed to allocate VAR %u.", errno);
+	if (mlx5_vdpa_create_dev_resources(priv))
 		goto error;
-	}
-	priv->err_intr_handle =
-		rte_intr_instance_alloc(RTE_INTR_INSTANCE_F_SHARED);
-	if (priv->err_intr_handle == NULL) {
-		DRV_LOG(ERR, "Fail to allocate intr_handle");
-		goto error;
-	}
 	priv->vdev = rte_vdpa_register_device(cdev->dev, &mlx5_vdpa_ops);
 	if (priv->vdev == NULL) {
 		DRV_LOG(ERR, "Failed to register vDPA device.");
@@ -560,19 +621,13 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev)
 	}
 	mlx5_vdpa_config_get(cdev->dev->devargs, priv);
 	SLIST_INIT(&priv->mr_list);
-	pthread_mutex_init(&priv->vq_config_lock, NULL);
 	pthread_mutex_lock(&priv_list_lock);
 	TAILQ_INSERT_TAIL(&priv_list, priv, next);
 	pthread_mutex_unlock(&priv_list_lock);
 	return 0;
-
 error:
-	if (priv) {
-		if (priv->var)
-			mlx5_glue->dv_free_var(priv->var);
-		rte_intr_instance_free(priv->err_intr_handle);
-		rte_free(priv);
-	}
+	if (priv)
+		mlx5_vdpa_dev_release(priv);
 	return -rte_errno;
 }
 
@@ -592,22 +647,48 @@ mlx5_vdpa_dev_remove(struct mlx5_common_device *cdev)
 	if (found)
 		TAILQ_REMOVE(&priv_list, priv, next);
 	pthread_mutex_unlock(&priv_list_lock);
-	if (found) {
-		if (priv->state == MLX5_VDPA_STATE_CONFIGURED)
-			mlx5_vdpa_dev_close(priv->vid);
-		if (priv->var) {
-			mlx5_glue->dv_free_var(priv->var);
-			priv->var = NULL;
-		}
-		if (priv->vdev)
-			rte_vdpa_unregister_device(priv->vdev);
-		pthread_mutex_destroy(&priv->vq_config_lock);
-		rte_intr_instance_free(priv->err_intr_handle);
-		rte_free(priv);
-	}
+	if (found)
+		mlx5_vdpa_dev_release(priv);
 	return 0;
 }
 
+static void
+mlx5_vdpa_release_dev_resources(struct mlx5_vdpa_priv *priv)
+{
+	uint32_t i;
+
+	mlx5_vdpa_event_qp_global_release(priv);
+	mlx5_vdpa_err_event_unset(priv);
+	if (priv->steer.tbl)
+		claim_zero(mlx5_glue->dr_destroy_flow_tbl(priv->steer.tbl));
+	if (priv->steer.domain)
+		claim_zero(mlx5_glue->dr_destroy_domain(priv->steer.domain));
+	if (priv->null_mr)
+		claim_zero(mlx5_glue->dereg_mr(priv->null_mr));
+	for (i = 0; i < priv->num_lag_ports; i++) {
+		if (priv->tiss[i])
+			claim_zero(mlx5_devx_cmd_destroy(priv->tiss[i]));
+	}
+	if (priv->td)
+		claim_zero(mlx5_devx_cmd_destroy(priv->td));
+	if (priv->virtq_db_addr)
+		claim_zero(munmap(priv->virtq_db_addr, priv->var->length));
+	if (priv->var)
+		mlx5_glue->dv_free_var(priv->var);
+}
+
+static void
+mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv)
+{
+	if (priv->state == MLX5_VDPA_STATE_CONFIGURED)
+		mlx5_vdpa_dev_close(priv->vid);
+	mlx5_vdpa_release_dev_resources(priv);
+	if (priv->vdev)
+		rte_vdpa_unregister_device(priv->vdev);
+	pthread_mutex_destroy(&priv->vq_config_lock);
+	rte_free(priv);
+}
+
 static const struct rte_pci_id mlx5_vdpa_pci_id_map[] = {
 	{
 		RTE_PCI_DEVICE(PCI_VENDOR_ID_MELLANOX,
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index cc83d7cba3d..e0ba20b953c 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -233,6 +233,15 @@ int mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
  */
 void mlx5_vdpa_event_qp_destroy(struct mlx5_vdpa_event_qp *eqp);
 
+/**
+ * Create all the event global resources.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ */
+int
+mlx5_vdpa_event_qp_global_prepare(struct mlx5_vdpa_priv *priv);
+
 /**
  * Release all the event global resources.
  *
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index f8d910b33f8..7167a98db0f 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -40,11 +40,9 @@ mlx5_vdpa_event_qp_global_release(struct mlx5_vdpa_priv *priv)
 }
 
 /* Prepare all the global resources for all the event objects.*/
-static int
+int
 mlx5_vdpa_event_qp_global_prepare(struct mlx5_vdpa_priv *priv)
 {
-	if (priv->eventc)
-		return 0;
 	priv->eventc = mlx5_os_devx_create_event_channel(priv->cdev->ctx,
 			   MLX5DV_DEVX_CREATE_EVENT_CHANNEL_FLAGS_OMIT_EV_DATA);
 	if (!priv->eventc) {
@@ -389,22 +387,30 @@ mlx5_vdpa_err_event_setup(struct mlx5_vdpa_priv *priv)
 	flags = fcntl(priv->err_chnl->fd, F_GETFL);
 	ret = fcntl(priv->err_chnl->fd, F_SETFL, flags | O_NONBLOCK);
 	if (ret) {
+		rte_errno = errno;
 		DRV_LOG(ERR, "Failed to change device event channel FD.");
 		goto error;
 	}
-
+	priv->err_intr_handle =
+		rte_intr_instance_alloc(RTE_INTR_INSTANCE_F_SHARED);
+	if (priv->err_intr_handle == NULL) {
+		DRV_LOG(ERR, "Fail to allocate intr_handle");
+		goto error;
+	}
 	if (rte_intr_fd_set(priv->err_intr_handle, priv->err_chnl->fd))
 		goto error;
 
 	if (rte_intr_type_set(priv->err_intr_handle, RTE_INTR_HANDLE_EXT))
 		goto error;
 
-	if (rte_intr_callback_register(priv->err_intr_handle,
-				       mlx5_vdpa_err_interrupt_handler,
-				       priv)) {
+	ret = rte_intr_callback_register(priv->err_intr_handle,
+					 mlx5_vdpa_err_interrupt_handler,
+					 priv);
+	if (ret != 0) {
 		rte_intr_fd_set(priv->err_intr_handle, 0);
 		DRV_LOG(ERR, "Failed to register error interrupt for device %d.",
 			priv->vid);
+		rte_errno = -ret;
 		goto error;
 	} else {
 		DRV_LOG(DEBUG, "Registered error interrupt for device%d.",
@@ -453,6 +459,7 @@ mlx5_vdpa_err_event_unset(struct mlx5_vdpa_priv *priv)
 		mlx5_glue->devx_destroy_event_channel(priv->err_chnl);
 		priv->err_chnl = NULL;
 	}
+	rte_intr_instance_free(priv->err_intr_handle);
 }
 
 int
@@ -575,8 +582,6 @@ mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 	uint16_t log_desc_n = rte_log2_u32(desc_n);
 	uint32_t ret;
 
-	if (mlx5_vdpa_event_qp_global_prepare(priv))
-		return -1;
 	if (mlx5_vdpa_cq_create(priv, log_desc_n, callfd, &eqp->cq))
 		return -1;
 	attr.pd = priv->cdev->pdn;
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_mem.c b/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
index 599079500b0..62f5530e91d 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
@@ -34,10 +34,6 @@ mlx5_vdpa_mem_dereg(struct mlx5_vdpa_priv *priv)
 	SLIST_INIT(&priv->mr_list);
 	if (priv->lm_mr.addr)
 		mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
-	if (priv->null_mr) {
-		claim_zero(mlx5_glue->dereg_mr(priv->null_mr));
-		priv->null_mr = NULL;
-	}
 	if (priv->vmem) {
 		free(priv->vmem);
 		priv->vmem = NULL;
@@ -196,13 +192,6 @@ mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
 	if (!mem)
 		return -rte_errno;
 	priv->vmem = mem;
-	priv->null_mr = mlx5_glue->alloc_null_mr(priv->cdev->pd);
-	if (!priv->null_mr) {
-		DRV_LOG(ERR, "Failed to allocate null MR.");
-		ret = -errno;
-		goto error;
-	}
-	DRV_LOG(DEBUG, "Dump fill Mkey = %u.", priv->null_mr->lkey);
 	for (i = 0; i < mem->nregions; i++) {
 		reg = &mem->regions[i];
 		entry = rte_zmalloc(__func__, sizeof(*entry), 0);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
index a0fd2776e57..e42868486e7 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
@@ -45,14 +45,6 @@ void
 mlx5_vdpa_steer_unset(struct mlx5_vdpa_priv *priv)
 {
 	mlx5_vdpa_rss_flows_destroy(priv);
-	if (priv->steer.tbl) {
-		claim_zero(mlx5_glue->dr_destroy_flow_tbl(priv->steer.tbl));
-		priv->steer.tbl = NULL;
-	}
-	if (priv->steer.domain) {
-		claim_zero(mlx5_glue->dr_destroy_domain(priv->steer.domain));
-		priv->steer.domain = NULL;
-	}
 	if (priv->steer.rqt) {
 		claim_zero(mlx5_devx_cmd_destroy(priv->steer.rqt));
 		priv->steer.rqt = NULL;
@@ -248,11 +240,7 @@ mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv)
 	int ret = mlx5_vdpa_rqt_prepare(priv);
 
 	if (ret == 0) {
-		mlx5_vdpa_rss_flows_destroy(priv);
-		if (priv->steer.rqt) {
-			claim_zero(mlx5_devx_cmd_destroy(priv->steer.rqt));
-			priv->steer.rqt = NULL;
-		}
+		mlx5_vdpa_steer_unset(priv);
 	} else if (ret < 0) {
 		return ret;
 	} else if (!priv->steer.rss[0].flow) {
@@ -269,17 +257,6 @@ int
 mlx5_vdpa_steer_setup(struct mlx5_vdpa_priv *priv)
 {
 #ifdef HAVE_MLX5DV_DR
-	priv->steer.domain = mlx5_glue->dr_create_domain(priv->cdev->ctx,
-						  MLX5DV_DR_DOMAIN_TYPE_NIC_RX);
-	if (!priv->steer.domain) {
-		DRV_LOG(ERR, "Failed to create Rx domain.");
-		goto error;
-	}
-	priv->steer.tbl = mlx5_glue->dr_create_flow_tbl(priv->steer.domain, 0);
-	if (!priv->steer.tbl) {
-		DRV_LOG(ERR, "Failed to create table 0 with Rx domain.");
-		goto error;
-	}
 	if (mlx5_vdpa_steer_update(priv))
 		goto error;
 	return 0;
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index b1d584ca8b0..6bda9f1814a 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -3,7 +3,6 @@
  */
 #include <string.h>
 #include <unistd.h>
-#include <sys/mman.h>
 #include <sys/eventfd.h>
 
 #include <rte_malloc.h>
@@ -120,20 +119,6 @@ mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv)
 		if (virtq->counters)
 			claim_zero(mlx5_devx_cmd_destroy(virtq->counters));
 	}
-	for (i = 0; i < priv->num_lag_ports; i++) {
-		if (priv->tiss[i]) {
-			claim_zero(mlx5_devx_cmd_destroy(priv->tiss[i]));
-			priv->tiss[i] = NULL;
-		}
-	}
-	if (priv->td) {
-		claim_zero(mlx5_devx_cmd_destroy(priv->td));
-		priv->td = NULL;
-	}
-	if (priv->virtq_db_addr) {
-		claim_zero(munmap(priv->virtq_db_addr, priv->var->length));
-		priv->virtq_db_addr = NULL;
-	}
 	priv->features = 0;
 	memset(priv->virtqs, 0, sizeof(*virtq) * priv->nr_virtqs);
 	priv->nr_virtqs = 0;
@@ -462,8 +447,6 @@ mlx5_vdpa_features_validate(struct mlx5_vdpa_priv *priv)
 int
 mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 {
-	struct mlx5_devx_tis_attr tis_attr = {0};
-	struct ibv_context *ctx = priv->cdev->ctx;
 	uint32_t i;
 	uint16_t nr_vring = rte_vhost_get_vring_num(priv->vid);
 	int ret = rte_vhost_get_negotiated_features(priv->vid, &priv->features);
@@ -485,33 +468,6 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 			(int)nr_vring);
 		return -1;
 	}
-	/* Always map the entire page. */
-	priv->virtq_db_addr = mmap(NULL, priv->var->length, PROT_READ |
-				   PROT_WRITE, MAP_SHARED, ctx->cmd_fd,
-				   priv->var->mmap_off);
-	if (priv->virtq_db_addr == MAP_FAILED) {
-		DRV_LOG(ERR, "Failed to map doorbell page %u.", errno);
-		priv->virtq_db_addr = NULL;
-		goto error;
-	} else {
-		DRV_LOG(DEBUG, "VAR address of doorbell mapping is %p.",
-			priv->virtq_db_addr);
-	}
-	priv->td = mlx5_devx_cmd_create_td(ctx);
-	if (!priv->td) {
-		DRV_LOG(ERR, "Failed to create transport domain.");
-		return -rte_errno;
-	}
-	tis_attr.transport_domain = priv->td->id;
-	for (i = 0; i < priv->num_lag_ports; i++) {
-		/* 0 is auto affinity, non-zero value to propose port. */
-		tis_attr.lag_tx_port_affinity = i + 1;
-		priv->tiss[i] = mlx5_devx_cmd_create_tis(ctx, &tis_attr);
-		if (!priv->tiss[i]) {
-			DRV_LOG(ERR, "Failed to create TIS %u.", i);
-			goto error;
-		}
-	}
 	priv->nr_virtqs = nr_vring;
 	for (i = 0; i < nr_vring; i++)
 		if (priv->virtqs[i].enable && mlx5_vdpa_virtq_setup(priv, i))
-- 
2.35.1



* [PATCH v1 5/7] vdpa/mlx5: cache and reuse hardware resources
  2022-02-24 14:38 ` [PATCH v1 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
                     ` (3 preceding siblings ...)
  2022-02-24 14:38   ` [PATCH v1 4/7] vdpa/mlx5: reuse resources in reconfiguration Xueming Li
@ 2022-02-24 14:38   ` Xueming Li
  2022-02-24 14:38   ` [PATCH v1 6/7] vdpa/mlx5: support device cleanup callback Xueming Li
  2022-02-24 14:38   ` [PATCH v1 7/7] vdpa/mlx5: make statistics counter persistent Xueming Li
  6 siblings, 0 replies; 43+ messages in thread
From: Xueming Li @ 2022-02-24 14:38 UTC (permalink / raw)
  To: dev; +Cc: xuemingl, Matan Azrad, Viacheslav Ovsiienko

During device suspend and resume, resources normally do not change.
When a large amount of resources is allocated to the VM, such as a huge
memory size or many queues, the time spent releasing and recreating them
becomes significant.

To speed this up, this patch reuses resources such as the VM MR and
VirtQ memory if they have not changed.
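
The reuse check can be pictured with the minimal sketch below. It
compares two memory layouts by region count, guest physical address and
size, and only counts a new registration when the layout differs. The
types mem_region, mem_table and mem_cache and the n_register counter are
assumptions made for this illustration; they are not the vhost or driver
structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified memory layout description. */
struct mem_region {
	uint64_t guest_phys_addr;
	uint64_t size;
};

struct mem_table {
	uint32_t nregions;
	const struct mem_region *regions;
};

/* Return 0 when both layouts describe the same regions, -1 otherwise. */
static int
mem_table_cmp(const struct mem_table *a, const struct mem_table *b)
{
	uint32_t i;

	if (a->nregions != b->nregions)
		return -1;
	for (i = 0; i < a->nregions; i++) {
		if (a->regions[i].guest_phys_addr !=
		    b->regions[i].guest_phys_addr)
			return -1;
		if (a->regions[i].size != b->regions[i].size)
			return -1;
	}
	return 0;
}

/* Hypothetical cache: remembers the layout that is currently registered. */
struct mem_cache {
	const struct mem_table *registered;
	unsigned int n_register; /* Counts costly (re)registrations. */
};

static void
mem_cache_update(struct mem_cache *cache, const struct mem_table *cur)
{
	if (cache->registered != NULL &&
	    mem_table_cmp(cache->registered, cur) == 0)
		return; /* Layout unchanged: keep the existing registration. */
	cache->registered = cur;
	cache->n_register++; /* Stands in for deregister plus register. */
}
```

With an unchanged layout, repeated calls to mem_cache_update() leave
n_register untouched, so the cost is paid only when the guest memory
actually changes.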

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 11 ++++-
 drivers/vdpa/mlx5/mlx5_vdpa.h       | 12 ++++-
 drivers/vdpa/mlx5/mlx5_vdpa_mem.c   | 27 ++++++++++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 73 +++++++++++++++++++++--------
 4 files changed, 99 insertions(+), 24 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index ee35c36624b..f794cb9bd61 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -241,6 +241,13 @@ mlx5_vdpa_mtu_set(struct mlx5_vdpa_priv *priv)
 	return kern_mtu == vhost_mtu ? 0 : -1;
 }
 
+static void
+mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv)
+{
+	mlx5_vdpa_virtqs_cleanup(priv);
+	mlx5_vdpa_mem_dereg(priv);
+}
+
 static int
 mlx5_vdpa_dev_close(int vid)
 {
@@ -260,7 +267,8 @@ mlx5_vdpa_dev_close(int vid)
 	}
 	mlx5_vdpa_steer_unset(priv);
 	mlx5_vdpa_virtqs_release(priv);
-	mlx5_vdpa_mem_dereg(priv);
+	if (priv->lm_mr.addr)
+		mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
 	priv->state = MLX5_VDPA_STATE_PROBED;
 	priv->vid = 0;
 	/* The mutex may stay locked after event thread cancel - initiate it. */
@@ -657,6 +665,7 @@ mlx5_vdpa_release_dev_resources(struct mlx5_vdpa_priv *priv)
 {
 	uint32_t i;
 
+	mlx5_vdpa_dev_cache_clean(priv);
 	mlx5_vdpa_event_qp_global_release(priv);
 	mlx5_vdpa_err_event_unset(priv);
 	if (priv->steer.tbl)
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index e0ba20b953c..540bf87a352 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -289,13 +289,21 @@ int mlx5_vdpa_err_event_setup(struct mlx5_vdpa_priv *priv);
 void mlx5_vdpa_err_event_unset(struct mlx5_vdpa_priv *priv);
 
 /**
- * Release a virtq and all its related resources.
+ * Release virtqs and resources except that to be reused.
  *
  * @param[in] priv
  *   The vdpa driver private structure.
  */
 void mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv);
 
+/**
+ * Cleanup cached resources of all virtqs.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ */
+void mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv);
+
 /**
  * Create all the HW virtqs resources and all their related resources.
  *
@@ -323,7 +331,7 @@ int mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv);
 int mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable);
 
 /**
- * Unset steering and release all its related resources- stop traffic.
+ * Unset steering - stop traffic.
  *
  * @param[in] priv
  *   The vdpa driver private structure.
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_mem.c b/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
index 62f5530e91d..d6e3dd664b5 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
@@ -32,8 +32,6 @@ mlx5_vdpa_mem_dereg(struct mlx5_vdpa_priv *priv)
 		entry = next;
 	}
 	SLIST_INIT(&priv->mr_list);
-	if (priv->lm_mr.addr)
-		mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
 	if (priv->vmem) {
 		free(priv->vmem);
 		priv->vmem = NULL;
@@ -149,6 +147,23 @@ mlx5_vdpa_vhost_mem_regions_prepare(int vid, uint8_t *mode, uint64_t *mem_size,
 	return mem;
 }
 
+static int
+mlx5_vdpa_mem_cmp(struct rte_vhost_memory *mem1, struct rte_vhost_memory *mem2)
+{
+	uint32_t i;
+
+	if (mem1->nregions != mem2->nregions)
+		return -1;
+	for (i = 0; i < mem1->nregions; i++) {
+		if (mem1->regions[i].guest_phys_addr !=
+		    mem2->regions[i].guest_phys_addr)
+			return -1;
+		if (mem1->regions[i].size != mem2->regions[i].size)
+			return -1;
+	}
+	return 0;
+}
+
 #define KLM_SIZE_MAX_ALIGN(sz) ((sz) > MLX5_MAX_KLM_BYTE_COUNT ? \
 				MLX5_MAX_KLM_BYTE_COUNT : (sz))
 
@@ -191,6 +206,14 @@ mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
 
 	if (!mem)
 		return -rte_errno;
+	if (priv->vmem != NULL) {
+		if (mlx5_vdpa_mem_cmp(mem, priv->vmem) == 0) {
+			/* VM memory not changed, reuse resources. */
+			free(mem);
+			return 0;
+		}
+		mlx5_vdpa_mem_dereg(priv);
+	}
 	priv->vmem = mem;
 	for (i = 0; i < mem->nregions; i++) {
 		reg = &mem->regions[i];
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 6bda9f1814a..c42846ecb3c 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -66,10 +66,33 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 	DRV_LOG(DEBUG, "Ring virtq %u doorbell.", virtq->index);
 }
 
+/* Release cached VQ resources. */
+void
+mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
+{
+	unsigned int i, j;
+
+	for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
+		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
+
+		for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
+			if (virtq->umems[j].obj) {
+				claim_zero(mlx5_glue->devx_umem_dereg
+							(virtq->umems[j].obj));
+				virtq->umems[j].obj = NULL;
+			}
+			if (virtq->umems[j].buf) {
+				rte_free(virtq->umems[j].buf);
+				virtq->umems[j].buf = NULL;
+			}
+			virtq->umems[j].size = 0;
+		}
+	}
+}
+
 static int
 mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 {
-	unsigned int i;
 	int ret = -EAGAIN;
 
 	if (rte_intr_fd_get(virtq->intr_handle) >= 0) {
@@ -94,13 +117,6 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 		claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
 	}
 	virtq->virtq = NULL;
-	for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
-		if (virtq->umems[i].obj)
-			claim_zero(mlx5_glue->devx_umem_dereg
-							 (virtq->umems[i].obj));
-		rte_free(virtq->umems[i].buf);
-	}
-	memset(&virtq->umems, 0, sizeof(virtq->umems));
 	if (virtq->eqp.fw_qp)
 		mlx5_vdpa_event_qp_destroy(&virtq->eqp);
 	virtq->notifier_state = MLX5_VDPA_NOTIFIER_STATE_DISABLED;
@@ -120,7 +136,6 @@ mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv)
 			claim_zero(mlx5_devx_cmd_destroy(virtq->counters));
 	}
 	priv->features = 0;
-	memset(priv->virtqs, 0, sizeof(*virtq) * priv->nr_virtqs);
 	priv->nr_virtqs = 0;
 }
 
@@ -215,6 +230,8 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 	ret = rte_vhost_get_vhost_vring(priv->vid, index, &vq);
 	if (ret)
 		return -1;
+	if (vq.size == 0)
+		return 0;
 	virtq->index = index;
 	virtq->vq_size = vq.size;
 	attr.tso_ipv4 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO4));
@@ -259,24 +276,42 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 	}
 	/* Setup 3 UMEMs for each virtq. */
 	for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
-		virtq->umems[i].size = priv->caps.umems[i].a * vq.size +
-							  priv->caps.umems[i].b;
-		virtq->umems[i].buf = rte_zmalloc(__func__,
-						  virtq->umems[i].size, 4096);
-		if (!virtq->umems[i].buf) {
+		uint32_t size;
+		void *buf;
+		struct mlx5dv_devx_umem *obj;
+
+		size = priv->caps.umems[i].a * vq.size + priv->caps.umems[i].b;
+		if (virtq->umems[i].size == size &&
+		    virtq->umems[i].obj != NULL) {
+			/* Reuse registered memory. */
+			memset(virtq->umems[i].buf, 0, size);
+			goto reuse;
+		}
+		if (virtq->umems[i].obj)
+			claim_zero(mlx5_glue->devx_umem_dereg
+				   (virtq->umems[i].obj));
+		if (virtq->umems[i].buf)
+			rte_free(virtq->umems[i].buf);
+		virtq->umems[i].size = 0;
+		virtq->umems[i].obj = NULL;
+		virtq->umems[i].buf = NULL;
+		buf = rte_zmalloc(__func__, size, 4096);
+		if (buf == NULL) {
 			DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
 				" %u.", i, index);
 			goto error;
 		}
-		virtq->umems[i].obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx,
-							virtq->umems[i].buf,
-							virtq->umems[i].size,
-							IBV_ACCESS_LOCAL_WRITE);
-		if (!virtq->umems[i].obj) {
+		obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx, buf, size,
+					       IBV_ACCESS_LOCAL_WRITE);
+		if (obj == NULL) {
 			DRV_LOG(ERR, "Failed to register umem %d for virtq %u.",
 				i, index);
 			goto error;
 		}
+		virtq->umems[i].size = size;
+		virtq->umems[i].buf = buf;
+		virtq->umems[i].obj = obj;
+reuse:
 		attr.umems[i].id = virtq->umems[i].obj->umem_id;
 		attr.umems[i].offset = 0;
 		attr.umems[i].size = virtq->umems[i].size;
-- 
2.35.1



* [PATCH v1 6/7] vdpa/mlx5: support device cleanup callback
  2022-02-24 14:38 ` [PATCH v1 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
                     ` (4 preceding siblings ...)
  2022-02-24 14:38   ` [PATCH v1 5/7] vdpa/mlx5: cache and reuse hardware resources Xueming Li
@ 2022-02-24 14:38   ` Xueming Li
  2022-02-24 14:38   ` [PATCH v1 7/7] vdpa/mlx5: make statistics counter persistent Xueming Li
  6 siblings, 0 replies; 43+ messages in thread
From: Xueming Li @ 2022-02-24 14:38 UTC (permalink / raw)
  To: dev; +Cc: xuemingl, Matan Azrad, Viacheslav Ovsiienko

This patch adds support for the device cleanup callback API, which is
called when the device is disconnected from the VM. Cached resources
such as the VM MR and VQ memory are then released.
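
The interaction between close and cleanup can be modelled as a small
state machine, sketched below under simplifying assumptions. The
dev_model structure and the model_* helpers are hypothetical; the
cache_valid flag only stands in for the cached VM MR and VQ memory.

```c
#include <stdbool.h>

enum dev_state { DEV_PROBED, DEV_CONFIGURED };

/* Hypothetical model of the fields involved in the cleanup decision. */
struct dev_model {
	enum dev_state state;
	bool connected;   /* Set at configure, cleared by cleanup. */
	bool cache_valid; /* Stands in for cached VM MR and VQ memory. */
};

/* Configure: the VM is connected and the cache gets populated. */
static void
model_config(struct dev_model *dev)
{
	dev->state = DEV_CONFIGURED;
	dev->connected = true;
	dev->cache_valid = true;
}

/* Close: keep the cache while the VM is still connected (suspend). */
static void
model_close(struct dev_model *dev)
{
	dev->state = DEV_PROBED;
	if (!dev->connected)
		dev->cache_valid = false;
}

/* Cleanup: drop the cache if the device is already closed. */
static void
model_cleanup(struct dev_model *dev)
{
	if (dev->state == DEV_PROBED)
		dev->cache_valid = false;
	dev->connected = false;
}
```

Either ordering ends with the cache released: close followed by cleanup
drops it in cleanup, while cleanup followed by close drops it in close.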

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c | 23 +++++++++++++++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa.h |  1 +
 2 files changed, 24 insertions(+)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index f794cb9bd61..a64445cd8b5 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -270,6 +270,8 @@ mlx5_vdpa_dev_close(int vid)
 	if (priv->lm_mr.addr)
 		mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
 	priv->state = MLX5_VDPA_STATE_PROBED;
+	if (!priv->connected)
+		mlx5_vdpa_dev_cache_clean(priv);
 	priv->vid = 0;
 	/* The mutex may stay locked after event thread cancel - initiate it. */
 	pthread_mutex_init(&priv->vq_config_lock, NULL);
@@ -294,6 +296,7 @@ mlx5_vdpa_dev_config(int vid)
 		return -1;
 	}
 	priv->vid = vid;
+	priv->connected = true;
 	if (mlx5_vdpa_mtu_set(priv))
 		DRV_LOG(WARNING, "MTU cannot be set on device %s.",
 				vdev->device->name);
@@ -431,12 +434,32 @@ mlx5_vdpa_reset_stats(struct rte_vdpa_device *vdev, int qid)
 	return mlx5_vdpa_virtq_stats_reset(priv, qid);
 }
 
+static int
+mlx5_vdpa_dev_clean(int vid)
+{
+	struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid);
+	struct mlx5_vdpa_priv *priv;
+
+	if (vdev == NULL)
+		return -1;
+	priv = mlx5_vdpa_find_priv_resource_by_vdev(vdev);
+	if (priv == NULL) {
+		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
+		return -1;
+	}
+	if (priv->state == MLX5_VDPA_STATE_PROBED)
+		mlx5_vdpa_dev_cache_clean(priv);
+	priv->connected = false;
+	return 0;
+}
+
 static struct rte_vdpa_dev_ops mlx5_vdpa_ops = {
 	.get_queue_num = mlx5_vdpa_get_queue_num,
 	.get_features = mlx5_vdpa_get_vdpa_features,
 	.get_protocol_features = mlx5_vdpa_get_protocol_features,
 	.dev_conf = mlx5_vdpa_dev_config,
 	.dev_close = mlx5_vdpa_dev_close,
+	.dev_cleanup = mlx5_vdpa_dev_clean,
 	.set_vring_state = mlx5_vdpa_set_vring_state,
 	.set_features = mlx5_vdpa_features_set,
 	.migration_done = NULL,
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 540bf87a352..24bafe85b44 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -121,6 +121,7 @@ enum mlx5_dev_state {
 
 struct mlx5_vdpa_priv {
 	TAILQ_ENTRY(mlx5_vdpa_priv) next;
+	bool connected;
 	enum mlx5_dev_state state;
 	pthread_mutex_t vq_config_lock;
 	uint64_t no_traffic_counter;
-- 
2.35.1



* [PATCH v1 7/7] vdpa/mlx5: make statistics counter persistent
  2022-02-24 14:38 ` [PATCH v1 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
                     ` (5 preceding siblings ...)
  2022-02-24 14:38   ` [PATCH v1 6/7] vdpa/mlx5: support device cleanup callback Xueming Li
@ 2022-02-24 14:38   ` Xueming Li
  6 siblings, 0 replies; 43+ messages in thread
From: Xueming Li @ 2022-02-24 14:38 UTC (permalink / raw)
  To: dev; +Cc: xuemingl, Matan Azrad, Viacheslav Ovsiienko

To reduce device suspend and resume time, make the statistics counters
persistent across reconfiguration until the device is removed.
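
A persistent counter with a software reset can be sketched as below. The
reported value is the raw counter minus a baseline, which matches the
"value - reset" arithmetic used for each statistic in this patch. That
the reset operation records a snapshot of the raw counter is an
assumption of this sketch, and q_counter with its helpers is a
hypothetical type, not the driver API.

```c
#include <stdint.h>

/* Hypothetical model of one persistent counter. */
struct q_counter {
	uint64_t hw;    /* Raw counter, never cleared while the device lives. */
	uint64_t reset; /* Baseline captured at the last reset request. */
};

/* Reported value: raw counter minus the baseline. */
static uint64_t
q_counter_read(const struct q_counter *cnt)
{
	return cnt->hw - cnt->reset;
}

/* Reset: move the baseline instead of destroying the counter. */
static void
q_counter_reset(struct q_counter *cnt)
{
	cnt->reset = cnt->hw;
}
```

Because the raw counter is never recreated, a reconfiguration does not
change the reported value; only an explicit reset moves the baseline.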

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
 doc/guides/vdpadevs/mlx5.rst        |  6 ++++++
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 19 +++++++----------
 drivers/vdpa/mlx5/mlx5_vdpa.h       |  1 +
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 32 +++++++++++------------------
 4 files changed, 26 insertions(+), 32 deletions(-)

diff --git a/doc/guides/vdpadevs/mlx5.rst b/doc/guides/vdpadevs/mlx5.rst
index 30f0b62eb41..070208d3952 100644
--- a/doc/guides/vdpadevs/mlx5.rst
+++ b/doc/guides/vdpadevs/mlx5.rst
@@ -182,3 +182,9 @@ Upon potential hardware errors, mlx5 PMD try to recover, give up if failed 3
 times in 3 seconds, virtq will be put in disable state. User should check log
 to get error information, or query vdpa statistics counter to know error type
 and count report.
+
+Statistics
+^^^^^^^^^^
+
+The device statistics counter persists in reconfiguration until the device gets
+removed. User can reset counters by calling function rte_vdpa_reset_stats().
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index a64445cd8b5..e9038e3904e 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -388,12 +388,7 @@ mlx5_vdpa_get_stats(struct rte_vdpa_device *vdev, int qid,
 		DRV_LOG(ERR, "Invalid device: %s.", vdev->device->name);
 		return -ENODEV;
 	}
-	if (priv->state == MLX5_VDPA_STATE_PROBED) {
-		DRV_LOG(ERR, "Device %s was not configured.",
-				vdev->device->name);
-		return -ENODATA;
-	}
-	if (qid >= (int)priv->nr_virtqs) {
+	if (qid >= (int)priv->caps.max_num_virtio_queues * 2) {
 		DRV_LOG(ERR, "Too big vring id: %d for device %s.", qid,
 				vdev->device->name);
 		return -E2BIG;
@@ -416,12 +411,7 @@ mlx5_vdpa_reset_stats(struct rte_vdpa_device *vdev, int qid)
 		DRV_LOG(ERR, "Invalid device: %s.", vdev->device->name);
 		return -ENODEV;
 	}
-	if (priv->state == MLX5_VDPA_STATE_PROBED) {
-		DRV_LOG(ERR, "Device %s was not configured.",
-				vdev->device->name);
-		return -ENODATA;
-	}
-	if (qid >= (int)priv->nr_virtqs) {
+	if (qid >= (int)priv->caps.max_num_virtio_queues * 2) {
 		DRV_LOG(ERR, "Too big vring id: %d for device %s.", qid,
 				vdev->device->name);
 		return -E2BIG;
@@ -689,6 +679,11 @@ mlx5_vdpa_release_dev_resources(struct mlx5_vdpa_priv *priv)
 	uint32_t i;
 
 	mlx5_vdpa_dev_cache_clean(priv);
+	for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
+		if (!priv->virtqs[i].counters)
+			continue;
+		claim_zero(mlx5_devx_cmd_destroy(priv->virtqs[i].counters));
+	}
 	mlx5_vdpa_event_qp_global_release(priv);
 	mlx5_vdpa_err_event_unset(priv);
 	if (priv->steer.tbl)
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 24bafe85b44..e7f3319f896 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -92,6 +92,7 @@ struct mlx5_vdpa_virtq {
 	struct rte_intr_handle *intr_handle;
 	uint64_t err_time[3]; /* RDTSC time of recent errors. */
 	uint32_t n_retry;
+	struct mlx5_devx_virtio_q_couners_attr stats;
 	struct mlx5_devx_virtio_q_couners_attr reset;
 };
 
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index c42846ecb3c..d2c91b25db1 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -127,14 +127,9 @@ void
 mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv)
 {
 	int i;
-	struct mlx5_vdpa_virtq *virtq;
 
-	for (i = 0; i < priv->nr_virtqs; i++) {
-		virtq = &priv->virtqs[i];
-		mlx5_vdpa_virtq_unset(virtq);
-		if (virtq->counters)
-			claim_zero(mlx5_devx_cmd_destroy(virtq->counters));
-	}
+	for (i = 0; i < priv->nr_virtqs; i++)
+		mlx5_vdpa_virtq_unset(&priv->virtqs[i]);
 	priv->features = 0;
 	priv->nr_virtqs = 0;
 }
@@ -590,7 +585,7 @@ mlx5_vdpa_virtq_stats_get(struct mlx5_vdpa_priv *priv, int qid,
 			  struct rte_vdpa_stat *stats, unsigned int n)
 {
 	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[qid];
-	struct mlx5_devx_virtio_q_couners_attr attr = {0};
+	struct mlx5_devx_virtio_q_couners_attr *attr = &virtq->stats;
 	int ret;
 
 	if (!virtq->counters) {
@@ -598,7 +593,7 @@ mlx5_vdpa_virtq_stats_get(struct mlx5_vdpa_priv *priv, int qid,
 			"is invalid.", qid);
 		return -EINVAL;
 	}
-	ret = mlx5_devx_cmd_query_virtio_q_counters(virtq->counters, &attr);
+	ret = mlx5_devx_cmd_query_virtio_q_counters(virtq->counters, attr);
 	if (ret) {
 		DRV_LOG(ERR, "Failed to read virtq %d stats from HW.", qid);
 		return ret;
@@ -608,37 +603,37 @@ mlx5_vdpa_virtq_stats_get(struct mlx5_vdpa_priv *priv, int qid,
 		return ret;
 	stats[MLX5_VDPA_STATS_RECEIVED_DESCRIPTORS] = (struct rte_vdpa_stat) {
 		.id = MLX5_VDPA_STATS_RECEIVED_DESCRIPTORS,
-		.value = attr.received_desc - virtq->reset.received_desc,
+		.value = attr->received_desc - virtq->reset.received_desc,
 	};
 	if (ret == MLX5_VDPA_STATS_COMPLETED_DESCRIPTORS)
 		return ret;
 	stats[MLX5_VDPA_STATS_COMPLETED_DESCRIPTORS] = (struct rte_vdpa_stat) {
 		.id = MLX5_VDPA_STATS_COMPLETED_DESCRIPTORS,
-		.value = attr.completed_desc - virtq->reset.completed_desc,
+		.value = attr->completed_desc - virtq->reset.completed_desc,
 	};
 	if (ret == MLX5_VDPA_STATS_BAD_DESCRIPTOR_ERRORS)
 		return ret;
 	stats[MLX5_VDPA_STATS_BAD_DESCRIPTOR_ERRORS] = (struct rte_vdpa_stat) {
 		.id = MLX5_VDPA_STATS_BAD_DESCRIPTOR_ERRORS,
-		.value = attr.bad_desc_errors - virtq->reset.bad_desc_errors,
+		.value = attr->bad_desc_errors - virtq->reset.bad_desc_errors,
 	};
 	if (ret == MLX5_VDPA_STATS_EXCEED_MAX_CHAIN)
 		return ret;
 	stats[MLX5_VDPA_STATS_EXCEED_MAX_CHAIN] = (struct rte_vdpa_stat) {
 		.id = MLX5_VDPA_STATS_EXCEED_MAX_CHAIN,
-		.value = attr.exceed_max_chain - virtq->reset.exceed_max_chain,
+		.value = attr->exceed_max_chain - virtq->reset.exceed_max_chain,
 	};
 	if (ret == MLX5_VDPA_STATS_INVALID_BUFFER)
 		return ret;
 	stats[MLX5_VDPA_STATS_INVALID_BUFFER] = (struct rte_vdpa_stat) {
 		.id = MLX5_VDPA_STATS_INVALID_BUFFER,
-		.value = attr.invalid_buffer - virtq->reset.invalid_buffer,
+		.value = attr->invalid_buffer - virtq->reset.invalid_buffer,
 	};
 	if (ret == MLX5_VDPA_STATS_COMPLETION_ERRORS)
 		return ret;
 	stats[MLX5_VDPA_STATS_COMPLETION_ERRORS] = (struct rte_vdpa_stat) {
 		.id = MLX5_VDPA_STATS_COMPLETION_ERRORS,
-		.value = attr.error_cqes - virtq->reset.error_cqes,
+		.value = attr->error_cqes - virtq->reset.error_cqes,
 	};
 	return ret;
 }
@@ -649,11 +644,8 @@ mlx5_vdpa_virtq_stats_reset(struct mlx5_vdpa_priv *priv, int qid)
 	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[qid];
 	int ret;
 
-	if (!virtq->counters) {
-		DRV_LOG(ERR, "Failed to read virtq %d statistics - virtq "
-			"is invalid.", qid);
-		return -EINVAL;
-	}
+	if (virtq->counters == NULL) /* VQ not enabled. */
+		return 0;
 	ret = mlx5_devx_cmd_query_virtio_q_counters(virtq->counters,
 						    &virtq->reset);
 	if (ret)
-- 
2.35.1


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH v2 0/7] vdpa/mlx5: improve device shutdown time
  2022-02-24 13:28 [PATCH 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
                   ` (7 preceding siblings ...)
  2022-02-24 14:38 ` [PATCH v1 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
@ 2022-02-24 15:50 ` Xueming Li
  2022-02-24 15:50   ` [PATCH v2 1/7] vdpa/mlx5: fix interrupt trash that leads to segment fault Xueming Li
                     ` (6 more replies)
  2022-05-08 14:25 ` [PATCH v3 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
  9 siblings, 7 replies; 43+ messages in thread
From: Xueming Li @ 2022-02-24 15:50 UTC (permalink / raw)
  To: dev; +Cc: xuemingl

v1:
 - rebase with latest upstream code
 - fix coverity issues
v2:
 - fix build issue on OS w/o flow DR API

Xueming Li (7):
  vdpa/mlx5: fix interrupt trash that leads to segment fault
  vdpa/mlx5: fix dead loop when process interrupted
  vdpa/mlx5: no kick handling during shutdown
  vdpa/mlx5: reuse resources in reconfiguration
  vdpa/mlx5: cache and reuse hardware resources
  vdpa/mlx5: support device cleanup callback
  vdpa/mlx5: make statistics counter persistent

 doc/guides/vdpadevs/mlx5.rst        |   6 +
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 231 +++++++++++++++++++++-------
 drivers/vdpa/mlx5/mlx5_vdpa.h       |  31 +++-
 drivers/vdpa/mlx5/mlx5_vdpa_event.c |  23 +--
 drivers/vdpa/mlx5/mlx5_vdpa_mem.c   |  38 +++--
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c |  30 +---
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 189 +++++++++++------------
 7 files changed, 336 insertions(+), 212 deletions(-)

-- 
2.35.1


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH v2 1/7] vdpa/mlx5: fix interrupt trash that leads to segment fault
  2022-02-24 15:50 ` [PATCH v2 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
@ 2022-02-24 15:50   ` Xueming Li
  2022-04-20 10:39     ` Maxime Coquelin
  2022-02-24 15:50   ` [PATCH v2 2/7] vdpa/mlx5: fix dead loop when process interrupted Xueming Li
                     ` (5 subsequent siblings)
  6 siblings, 1 reply; 43+ messages in thread
From: Xueming Li @ 2022-02-24 15:50 UTC (permalink / raw)
  To: dev
  Cc: xuemingl, matan, stable, Matan Azrad, Viacheslav Ovsiienko,
	Maxime Coquelin

Remove the retry limit on interrupt callback unregistration to avoid a
segmentation fault in the interrupt thread caused by an invalid FD.
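
For illustration only, the retry pattern can be sketched as below. This
is a standalone sketch, not the driver code: struct fake_intr and
fake_unregister() are hypothetical stand-ins for the interrupt handle
and for the unregister call, and the pending counter simulates a
callback that is still executing. The point is that the FD is only
invalidated once unregistration has actually succeeded.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical stand-in for an interrupt handle with an active callback. */
struct fake_intr {
	int fd;      /* Negative when no callback is registered. */
	int pending; /* Simulated number of calls made while callback is busy. */
};

/* Hypothetical stand-in for the unregister call: -EAGAIN while busy. */
static int
fake_unregister(struct fake_intr *intr)
{
	if (intr->pending > 0) {
		intr->pending--;
		return -EAGAIN;
	}
	return 1; /* One callback removed. */
}

/* Keep retrying until unregister stops returning -EAGAIN. */
static int
unset_intr(struct fake_intr *intr)
{
	int ret = -EAGAIN;

	assert(intr != NULL);
	if (intr->fd < 0)
		return 0; /* Nothing registered. */
	while (ret == -EAGAIN) {
		ret = fake_unregister(intr);
		if (ret == -EAGAIN)
			usleep(10); /* Back off before the next attempt. */
	}
	intr->fd = -1; /* Safe now: the callback can no longer run. */
	return ret;
}
```

With a bounded retry count, the loop could give up while the callback
was still registered, leaving the interrupt thread with a stale FD.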

Fixes: 62c813706e41 ("vdpa/mlx5: map doorbell")
Cc: matan@mellanox.com
Cc: stable@dpdk.org

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 3416797d289..de324506cb9 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -17,7 +17,7 @@
 
 
 static void
-mlx5_vdpa_virtq_handler(void *cb_arg)
+mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 {
 	struct mlx5_vdpa_virtq *virtq = cb_arg;
 	struct mlx5_vdpa_priv *priv = virtq->priv;
@@ -59,20 +59,16 @@ static int
 mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 {
 	unsigned int i;
-	int retries = MLX5_VDPA_INTR_RETRIES;
 	int ret = -EAGAIN;
 
-	if (rte_intr_fd_get(virtq->intr_handle) != -1) {
-		while (retries-- && ret == -EAGAIN) {
+	if (rte_intr_fd_get(virtq->intr_handle) >= 0) {
+		while (ret == -EAGAIN) {
 			ret = rte_intr_callback_unregister(virtq->intr_handle,
-							mlx5_vdpa_virtq_handler,
-							virtq);
+					mlx5_vdpa_virtq_kick_handler, virtq);
 			if (ret == -EAGAIN) {
-				DRV_LOG(DEBUG, "Try again to unregister fd %d "
-				"of virtq %d interrupt, retries = %d.",
-				rte_intr_fd_get(virtq->intr_handle),
-				(int)virtq->index, retries);
-
+				DRV_LOG(DEBUG, "Try again to unregister fd %d of virtq %hu interrupt",
+					rte_intr_fd_get(virtq->intr_handle),
+					(int)virtq->index);
 				usleep(MLX5_VDPA_INTR_RETRIES_USEC);
 			}
 		}
@@ -359,7 +355,7 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 			goto error;
 
 		if (rte_intr_callback_register(virtq->intr_handle,
-					       mlx5_vdpa_virtq_handler,
+					       mlx5_vdpa_virtq_kick_handler,
 					       virtq)) {
 			rte_intr_fd_set(virtq->intr_handle, -1);
 			DRV_LOG(ERR, "Failed to register virtq %d interrupt.",
-- 
2.35.1


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH v2 2/7] vdpa/mlx5: fix dead loop when process interrupted
  2022-02-24 15:50 ` [PATCH v2 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
  2022-02-24 15:50   ` [PATCH v2 1/7] vdpa/mlx5: fix interrupt trash that leads to segment fault Xueming Li
@ 2022-02-24 15:50   ` Xueming Li
  2022-04-20 10:33     ` Maxime Coquelin
  2022-02-24 15:50   ` [PATCH v2 3/7] vdpa/mlx5: no kick handling during shutdown Xueming Li
                     ` (4 subsequent siblings)
  6 siblings, 1 reply; 43+ messages in thread
From: Xueming Li @ 2022-02-24 15:50 UTC (permalink / raw)
  To: dev; +Cc: xuemingl, stable, Matan Azrad, Viacheslav Ovsiienko, Maxime Coquelin

During Ctrl+C handling, the kick handling thread sometimes gets endless
EAGAIN errors and falls into a dead loop.

Kicks happen frequently in a real system due to busy traffic or the
retry mechanism. This patch simplifies the handling to kick firmware
anyway and skip setting the hardware notifier on a potential device
error; the notifier can be set by the next successful kick request.
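
As a rough sketch of the bounded read, assuming an 8-byte counter read
from a file descriptor: kick_read_bounded() and KICK_READ_RETRIES are
names made up for this example and do not exist in the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

#define KICK_READ_RETRIES 3 /* Same bound as the loop in the patch. */

/*
 * Read one 8-byte counter from fd, retrying transient errors only a
 * bounded number of times.
 * Returns the byte count on success or -1 if every attempt failed.
 */
static int
kick_read_bounded(int fd, uint64_t *val)
{
	ssize_t nbytes = -1;
	int retry;

	assert(val != NULL);
	for (retry = 0; retry < KICK_READ_RETRIES; ++retry) {
		nbytes = read(fd, val, sizeof(*val));
		if (nbytes < 0 &&
		    (errno == EINTR || errno == EAGAIN ||
		     errno == EWOULDBLOCK))
			continue; /* Transient error: try again. */
		break; /* Success or a hard error: stop retrying. */
	}
	return nbytes < 0 ? -1 : (int)nbytes;
}
```

The caller can then return early on -1 instead of spinning, which is
what replaces the former do/while (1) loop.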

Fixes: 62c813706e41 ("vdpa/mlx5: map doorbell")
Cc: stable@dpdk.org

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index de324506cb9..e1e05924a40 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -23,11 +23,11 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 	struct mlx5_vdpa_priv *priv = virtq->priv;
 	uint64_t buf;
 	int nbytes;
+	int retry;
 
 	if (rte_intr_fd_get(virtq->intr_handle) < 0)
 		return;
-
-	do {
+	for (retry = 0; retry < 3; ++retry) {
 		nbytes = read(rte_intr_fd_get(virtq->intr_handle), &buf,
 			      8);
 		if (nbytes < 0) {
@@ -39,7 +39,9 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 				virtq->index, strerror(errno));
 		}
 		break;
-	} while (1);
+	}
+	if (nbytes < 0)
+		return;
 	rte_write32(virtq->index, priv->virtq_db_addr);
 	if (virtq->notifier_state == MLX5_VDPA_NOTIFIER_STATE_DISABLED) {
 		if (rte_vhost_host_notifier_ctrl(priv->vid, virtq->index, true))
-- 
2.35.1


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH v2 3/7] vdpa/mlx5: no kick handling during shutdown
  2022-02-24 15:50 ` [PATCH v2 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
  2022-02-24 15:50   ` [PATCH v2 1/7] vdpa/mlx5: fix interrupt trash that leads to segment fault Xueming Li
  2022-02-24 15:50   ` [PATCH v2 2/7] vdpa/mlx5: fix dead loop when process interrupted Xueming Li
@ 2022-02-24 15:50   ` Xueming Li
  2022-04-20 12:37     ` Maxime Coquelin
  2022-02-24 15:50   ` [PATCH v2 4/7] vdpa/mlx5: reuse resources in reconfiguration Xueming Li
                     ` (3 subsequent siblings)
  6 siblings, 1 reply; 43+ messages in thread
From: Xueming Li @ 2022-02-24 15:50 UTC (permalink / raw)
  To: dev; +Cc: xuemingl, Matan Azrad, Viacheslav Ovsiienko

When QEMU suspends a VM, the hw notifier is unmapped while a vCPU thread
may still be active and write to the notifier through the kick socket.

In that case, the PMD kick handler thread tries to install the hw
notifier through the client socket, which times out and slows down
device close.

This patch skips the hw notifier install if the VQ or device is in the
middle of shutdown.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 17 ++++++++++-------
 drivers/vdpa/mlx5/mlx5_vdpa.h       |  8 +++++++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 12 +++++++++++-
 3 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 8dfaba791dc..a93a9e78f7f 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -252,13 +252,15 @@ mlx5_vdpa_dev_close(int vid)
 	}
 	mlx5_vdpa_err_event_unset(priv);
 	mlx5_vdpa_cqe_event_unset(priv);
-	if (priv->configured)
+	if (priv->state == MLX5_VDPA_STATE_CONFIGURED) {
 		ret |= mlx5_vdpa_lm_log(priv);
+		priv->state = MLX5_VDPA_STATE_IN_PROGRESS;
+	}
 	mlx5_vdpa_steer_unset(priv);
 	mlx5_vdpa_virtqs_release(priv);
 	mlx5_vdpa_event_qp_global_release(priv);
 	mlx5_vdpa_mem_dereg(priv);
-	priv->configured = 0;
+	priv->state = MLX5_VDPA_STATE_PROBED;
 	priv->vid = 0;
 	/* The mutex may stay locked after event thread cancel - initiate it. */
 	pthread_mutex_init(&priv->vq_config_lock, NULL);
@@ -277,7 +279,8 @@ mlx5_vdpa_dev_config(int vid)
 		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
 		return -EINVAL;
 	}
-	if (priv->configured && mlx5_vdpa_dev_close(vid)) {
+	if (priv->state == MLX5_VDPA_STATE_CONFIGURED &&
+	    mlx5_vdpa_dev_close(vid)) {
 		DRV_LOG(ERR, "Failed to reconfigure vid %d.", vid);
 		return -1;
 	}
@@ -291,7 +294,7 @@ mlx5_vdpa_dev_config(int vid)
 		mlx5_vdpa_dev_close(vid);
 		return -1;
 	}
-	priv->configured = 1;
+	priv->state = MLX5_VDPA_STATE_CONFIGURED;
 	DRV_LOG(INFO, "vDPA device %d was configured.", vid);
 	return 0;
 }
@@ -373,7 +376,7 @@ mlx5_vdpa_get_stats(struct rte_vdpa_device *vdev, int qid,
 		DRV_LOG(ERR, "Invalid device: %s.", vdev->device->name);
 		return -ENODEV;
 	}
-	if (!priv->configured) {
+	if (priv->state == MLX5_VDPA_STATE_PROBED) {
 		DRV_LOG(ERR, "Device %s was not configured.",
 				vdev->device->name);
 		return -ENODATA;
@@ -401,7 +404,7 @@ mlx5_vdpa_reset_stats(struct rte_vdpa_device *vdev, int qid)
 		DRV_LOG(ERR, "Invalid device: %s.", vdev->device->name);
 		return -ENODEV;
 	}
-	if (!priv->configured) {
+	if (priv->state == MLX5_VDPA_STATE_PROBED) {
 		DRV_LOG(ERR, "Device %s was not configured.",
 				vdev->device->name);
 		return -ENODATA;
@@ -590,7 +593,7 @@ mlx5_vdpa_dev_remove(struct mlx5_common_device *cdev)
 		TAILQ_REMOVE(&priv_list, priv, next);
 	pthread_mutex_unlock(&priv_list_lock);
 	if (found) {
-		if (priv->configured)
+		if (priv->state == MLX5_VDPA_STATE_CONFIGURED)
 			mlx5_vdpa_dev_close(priv->vid);
 		if (priv->var) {
 			mlx5_glue->dv_free_var(priv->var);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 22617924eac..cc83d7cba3d 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -113,9 +113,15 @@ enum {
 	MLX5_VDPA_EVENT_MODE_ONLY_INTERRUPT
 };
 
+enum mlx5_dev_state {
+	MLX5_VDPA_STATE_PROBED = 0,
+	MLX5_VDPA_STATE_CONFIGURED,
+	MLX5_VDPA_STATE_IN_PROGRESS /* Shutting down. */
+};
+
 struct mlx5_vdpa_priv {
 	TAILQ_ENTRY(mlx5_vdpa_priv) next;
-	uint8_t configured;
+	enum mlx5_dev_state state;
 	pthread_mutex_t vq_config_lock;
 	uint64_t no_traffic_counter;
 	pthread_t timer_tid;
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index e1e05924a40..b1d584ca8b0 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -25,6 +25,11 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 	int nbytes;
 	int retry;
 
+	if (priv->state != MLX5_VDPA_STATE_CONFIGURED && !virtq->enable) {
+		DRV_LOG(ERR,  "device %d queue %d down, skip kick handling",
+			priv->vid, virtq->index);
+		return;
+	}
 	if (rte_intr_fd_get(virtq->intr_handle) < 0)
 		return;
 	for (retry = 0; retry < 3; ++retry) {
@@ -43,6 +48,11 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 	if (nbytes < 0)
 		return;
 	rte_write32(virtq->index, priv->virtq_db_addr);
+	if (priv->state != MLX5_VDPA_STATE_CONFIGURED && !virtq->enable) {
+		DRV_LOG(ERR,  "device %d queue %d down, skip kick handling",
+			priv->vid, virtq->index);
+		return;
+	}
 	if (virtq->notifier_state == MLX5_VDPA_NOTIFIER_STATE_DISABLED) {
 		if (rte_vhost_host_notifier_ctrl(priv->vid, virtq->index, true))
 			virtq->notifier_state = MLX5_VDPA_NOTIFIER_STATE_ERR;
@@ -541,7 +551,7 @@ mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable)
 
 	DRV_LOG(INFO, "Update virtq %d status %sable -> %sable.", index,
 		virtq->enable ? "en" : "dis", enable ? "en" : "dis");
-	if (!priv->configured) {
+	if (priv->state == MLX5_VDPA_STATE_PROBED) {
 		virtq->enable = !!enable;
 		return 0;
 	}
-- 
2.35.1


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH v2 4/7] vdpa/mlx5: reuse resources in reconfiguration
  2022-02-24 15:50 ` [PATCH v2 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
                     ` (2 preceding siblings ...)
  2022-02-24 15:50   ` [PATCH v2 3/7] vdpa/mlx5: no kick handling during shutdown Xueming Li
@ 2022-02-24 15:50   ` Xueming Li
  2022-04-20 14:49     ` Maxime Coquelin
  2022-02-24 15:50   ` [PATCH v2 5/7] vdpa/mlx5: cache and reuse hardware resources Xueming Li
                     ` (2 subsequent siblings)
  6 siblings, 1 reply; 43+ messages in thread
From: Xueming Li @ 2022-02-24 15:50 UTC (permalink / raw)
  To: dev; +Cc: xuemingl, Matan Azrad, Viacheslav Ovsiienko

To speed up device resume, create reusable resources during device
probe and release them on device removal. The reused resources include
the TIS, TD, VAR doorbell mmap, error handling event channel and
interrupt handler, UAR, Rx event channel, NULL MR, steer domain and
table.
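
The lifecycle split can be sketched as follows. This is an
illustration with hypothetical names (struct dev_res and the dev_*
functions), and plain malloc() buffers stand in for the hardware
objects: long-lived resources are created once and survive repeated
configure/close cycles, and the release function tolerates a partially
initialized device.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical long-lived resources, created once at probe time. */
struct dev_res {
	void *doorbell;     /* Stands in for the doorbell mapping. */
	void *domain;       /* Stands in for the steering domain. */
	int session_active; /* Per-session state, reset on close. */
	int sessions;       /* Number of configure calls so far. */
};

/* Release everything; safe on a partially initialized structure. */
static void
dev_res_release(struct dev_res *r)
{
	free(r->domain);
	r->domain = NULL;
	free(r->doorbell);
	r->doorbell = NULL;
}

/* Probe-time allocation; undoes partial work on failure. */
static int
dev_res_create(struct dev_res *r)
{
	r->doorbell = malloc(64);
	r->domain = malloc(64);
	if (r->doorbell == NULL || r->domain == NULL) {
		dev_res_release(r);
		return -1;
	}
	return 0;
}

/* Configure reuses the probe-time resources instead of recreating them. */
static int
dev_configure(struct dev_res *r)
{
	if (r->doorbell == NULL || r->domain == NULL)
		return -1;
	r->session_active = 1;
	r->sessions++;
	return 0;
}

/* Close drops only per-session state; probe-time resources stay. */
static void
dev_close(struct dev_res *r)
{
	r->session_active = 0;
}
```

Because release checks each pointer independently, the same function
can serve both the probe error path and device removal.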

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 167 +++++++++++++++++++++-------
 drivers/vdpa/mlx5/mlx5_vdpa.h       |   9 ++
 drivers/vdpa/mlx5/mlx5_vdpa_event.c |  23 ++--
 drivers/vdpa/mlx5/mlx5_vdpa_mem.c   |  11 --
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c |  30 +----
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c |  44 --------
 6 files changed, 149 insertions(+), 135 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index a93a9e78f7f..9862141497b 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -5,6 +5,7 @@
 #include <net/if.h>
 #include <sys/socket.h>
 #include <sys/ioctl.h>
+#include <sys/mman.h>
 #include <fcntl.h>
 #include <netinet/in.h>
 
@@ -49,6 +50,8 @@ TAILQ_HEAD(mlx5_vdpa_privs, mlx5_vdpa_priv) priv_list =
 					      TAILQ_HEAD_INITIALIZER(priv_list);
 static pthread_mutex_t priv_list_lock = PTHREAD_MUTEX_INITIALIZER;
 
+static void mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv);
+
 static struct mlx5_vdpa_priv *
 mlx5_vdpa_find_priv_resource_by_vdev(struct rte_vdpa_device *vdev)
 {
@@ -250,7 +253,6 @@ mlx5_vdpa_dev_close(int vid)
 		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
 		return -1;
 	}
-	mlx5_vdpa_err_event_unset(priv);
 	mlx5_vdpa_cqe_event_unset(priv);
 	if (priv->state == MLX5_VDPA_STATE_CONFIGURED) {
 		ret |= mlx5_vdpa_lm_log(priv);
@@ -258,7 +260,6 @@ mlx5_vdpa_dev_close(int vid)
 	}
 	mlx5_vdpa_steer_unset(priv);
 	mlx5_vdpa_virtqs_release(priv);
-	mlx5_vdpa_event_qp_global_release(priv);
 	mlx5_vdpa_mem_dereg(priv);
 	priv->state = MLX5_VDPA_STATE_PROBED;
 	priv->vid = 0;
@@ -288,7 +289,7 @@ mlx5_vdpa_dev_config(int vid)
 	if (mlx5_vdpa_mtu_set(priv))
 		DRV_LOG(WARNING, "MTU cannot be set on device %s.",
 				vdev->device->name);
-	if (mlx5_vdpa_mem_register(priv) || mlx5_vdpa_err_event_setup(priv) ||
+	if (mlx5_vdpa_mem_register(priv) ||
 	    mlx5_vdpa_virtqs_prepare(priv) || mlx5_vdpa_steer_setup(priv) ||
 	    mlx5_vdpa_cqe_event_setup(priv)) {
 		mlx5_vdpa_dev_close(vid);
@@ -504,12 +505,89 @@ mlx5_vdpa_config_get(struct rte_devargs *devargs, struct mlx5_vdpa_priv *priv)
 	DRV_LOG(DEBUG, "no traffic max is %u.", priv->no_traffic_max);
 }
 
+static int
+mlx5_vdpa_create_dev_resources(struct mlx5_vdpa_priv *priv)
+{
+	struct mlx5_devx_tis_attr tis_attr = {0};
+	struct ibv_context *ctx = priv->cdev->ctx;
+	uint32_t i;
+	int retry;
+
+	for (retry = 0; retry < 7; retry++) {
+		priv->var = mlx5_glue->dv_alloc_var(ctx, 0);
+		if (priv->var != NULL)
+			break;
+		DRV_LOG(WARNING, "Failed to allocate VAR, retry %d.", retry);
+		/* Wait Qemu release VAR during vdpa restart, 0.1 sec based. */
+		usleep(100000U << retry);
+	}
+	if (!priv->var) {
+		DRV_LOG(ERR, "Failed to allocate VAR %u.", errno);
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	/* Always map the entire page. */
+	priv->virtq_db_addr = mmap(NULL, priv->var->length, PROT_READ |
+				   PROT_WRITE, MAP_SHARED, ctx->cmd_fd,
+				   priv->var->mmap_off);
+	if (priv->virtq_db_addr == MAP_FAILED) {
+		DRV_LOG(ERR, "Failed to map doorbell page %u.", errno);
+		priv->virtq_db_addr = NULL;
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	DRV_LOG(DEBUG, "VAR address of doorbell mapping is %p.",
+		priv->virtq_db_addr);
+	priv->td = mlx5_devx_cmd_create_td(ctx);
+	if (!priv->td) {
+		DRV_LOG(ERR, "Failed to create transport domain.");
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	tis_attr.transport_domain = priv->td->id;
+	for (i = 0; i < priv->num_lag_ports; i++) {
+		/* 0 is auto affinity, non-zero value to propose port. */
+		tis_attr.lag_tx_port_affinity = i + 1;
+		priv->tiss[i] = mlx5_devx_cmd_create_tis(ctx, &tis_attr);
+		if (!priv->tiss[i]) {
+			DRV_LOG(ERR, "Failed to create TIS %u.", i);
+			return -rte_errno;
+		}
+	}
+	priv->null_mr = mlx5_glue->alloc_null_mr(priv->cdev->pd);
+	if (!priv->null_mr) {
+		DRV_LOG(ERR, "Failed to allocate null MR.");
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	DRV_LOG(DEBUG, "Dump fill Mkey = %u.", priv->null_mr->lkey);
+#ifdef HAVE_MLX5DV_DR
+	priv->steer.domain = mlx5_glue->dr_create_domain(ctx,
+					MLX5DV_DR_DOMAIN_TYPE_NIC_RX);
+	if (!priv->steer.domain) {
+		DRV_LOG(ERR, "Failed to create Rx domain.");
+		rte_errno = errno;
+		return -rte_errno;
+	}
+#endif
+	priv->steer.tbl = mlx5_glue->dr_create_flow_tbl(priv->steer.domain, 0);
+	if (!priv->steer.tbl) {
+		DRV_LOG(ERR, "Failed to create table 0 with Rx domain.");
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	if (mlx5_vdpa_err_event_setup(priv) != 0)
+		return -rte_errno;
+	if (mlx5_vdpa_event_qp_global_prepare(priv))
+		return -rte_errno;
+	return 0;
+}
+
 static int
 mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev)
 {
 	struct mlx5_vdpa_priv *priv = NULL;
 	struct mlx5_hca_attr *attr = &cdev->config.hca_attr;
-	int retry;
 
 	if (!attr->vdpa.valid || !attr->vdpa.max_num_virtio_queues) {
 		DRV_LOG(ERR, "Not enough capabilities to support vdpa, maybe "
@@ -533,25 +611,10 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev)
 	priv->num_lag_ports = attr->num_lag_ports;
 	if (attr->num_lag_ports == 0)
 		priv->num_lag_ports = 1;
+	pthread_mutex_init(&priv->vq_config_lock, NULL);
 	priv->cdev = cdev;
-	for (retry = 0; retry < 7; retry++) {
-		priv->var = mlx5_glue->dv_alloc_var(priv->cdev->ctx, 0);
-		if (priv->var != NULL)
-			break;
-		DRV_LOG(WARNING, "Failed to allocate VAR, retry %d.\n", retry);
-		/* Wait Qemu release VAR during vdpa restart, 0.1 sec based. */
-		usleep(100000U << retry);
-	}
-	if (!priv->var) {
-		DRV_LOG(ERR, "Failed to allocate VAR %u.", errno);
+	if (mlx5_vdpa_create_dev_resources(priv))
 		goto error;
-	}
-	priv->err_intr_handle =
-		rte_intr_instance_alloc(RTE_INTR_INSTANCE_F_SHARED);
-	if (priv->err_intr_handle == NULL) {
-		DRV_LOG(ERR, "Fail to allocate intr_handle");
-		goto error;
-	}
 	priv->vdev = rte_vdpa_register_device(cdev->dev, &mlx5_vdpa_ops);
 	if (priv->vdev == NULL) {
 		DRV_LOG(ERR, "Failed to register vDPA device.");
@@ -560,19 +623,13 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev)
 	}
 	mlx5_vdpa_config_get(cdev->dev->devargs, priv);
 	SLIST_INIT(&priv->mr_list);
-	pthread_mutex_init(&priv->vq_config_lock, NULL);
 	pthread_mutex_lock(&priv_list_lock);
 	TAILQ_INSERT_TAIL(&priv_list, priv, next);
 	pthread_mutex_unlock(&priv_list_lock);
 	return 0;
-
 error:
-	if (priv) {
-		if (priv->var)
-			mlx5_glue->dv_free_var(priv->var);
-		rte_intr_instance_free(priv->err_intr_handle);
-		rte_free(priv);
-	}
+	if (priv)
+		mlx5_vdpa_dev_release(priv);
 	return -rte_errno;
 }
 
@@ -592,22 +649,48 @@ mlx5_vdpa_dev_remove(struct mlx5_common_device *cdev)
 	if (found)
 		TAILQ_REMOVE(&priv_list, priv, next);
 	pthread_mutex_unlock(&priv_list_lock);
-	if (found) {
-		if (priv->state == MLX5_VDPA_STATE_CONFIGURED)
-			mlx5_vdpa_dev_close(priv->vid);
-		if (priv->var) {
-			mlx5_glue->dv_free_var(priv->var);
-			priv->var = NULL;
-		}
-		if (priv->vdev)
-			rte_vdpa_unregister_device(priv->vdev);
-		pthread_mutex_destroy(&priv->vq_config_lock);
-		rte_intr_instance_free(priv->err_intr_handle);
-		rte_free(priv);
-	}
+	if (found)
+		mlx5_vdpa_dev_release(priv);
 	return 0;
 }
 
+static void
+mlx5_vdpa_release_dev_resources(struct mlx5_vdpa_priv *priv)
+{
+	uint32_t i;
+
+	mlx5_vdpa_event_qp_global_release(priv);
+	mlx5_vdpa_err_event_unset(priv);
+	if (priv->steer.tbl)
+		claim_zero(mlx5_glue->dr_destroy_flow_tbl(priv->steer.tbl));
+	if (priv->steer.domain)
+		claim_zero(mlx5_glue->dr_destroy_domain(priv->steer.domain));
+	if (priv->null_mr)
+		claim_zero(mlx5_glue->dereg_mr(priv->null_mr));
+	for (i = 0; i < priv->num_lag_ports; i++) {
+		if (priv->tiss[i])
+			claim_zero(mlx5_devx_cmd_destroy(priv->tiss[i]));
+	}
+	if (priv->td)
+		claim_zero(mlx5_devx_cmd_destroy(priv->td));
+	if (priv->virtq_db_addr)
+		claim_zero(munmap(priv->virtq_db_addr, priv->var->length));
+	if (priv->var)
+		mlx5_glue->dv_free_var(priv->var);
+}
+
+static void
+mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv)
+{
+	if (priv->state == MLX5_VDPA_STATE_CONFIGURED)
+		mlx5_vdpa_dev_close(priv->vid);
+	mlx5_vdpa_release_dev_resources(priv);
+	if (priv->vdev)
+		rte_vdpa_unregister_device(priv->vdev);
+	pthread_mutex_destroy(&priv->vq_config_lock);
+	rte_free(priv);
+}
+
 static const struct rte_pci_id mlx5_vdpa_pci_id_map[] = {
 	{
 		RTE_PCI_DEVICE(PCI_VENDOR_ID_MELLANOX,
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index cc83d7cba3d..e0ba20b953c 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -233,6 +233,15 @@ int mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
  */
 void mlx5_vdpa_event_qp_destroy(struct mlx5_vdpa_event_qp *eqp);
 
+/**
+ * Create all the event global resources.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ */
+int
+mlx5_vdpa_event_qp_global_prepare(struct mlx5_vdpa_priv *priv);
+
 /**
  * Release all the event global resources.
  *
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index f8d910b33f8..7167a98db0f 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -40,11 +40,9 @@ mlx5_vdpa_event_qp_global_release(struct mlx5_vdpa_priv *priv)
 }
 
 /* Prepare all the global resources for all the event objects.*/
-static int
+int
 mlx5_vdpa_event_qp_global_prepare(struct mlx5_vdpa_priv *priv)
 {
-	if (priv->eventc)
-		return 0;
 	priv->eventc = mlx5_os_devx_create_event_channel(priv->cdev->ctx,
 			   MLX5DV_DEVX_CREATE_EVENT_CHANNEL_FLAGS_OMIT_EV_DATA);
 	if (!priv->eventc) {
@@ -389,22 +387,30 @@ mlx5_vdpa_err_event_setup(struct mlx5_vdpa_priv *priv)
 	flags = fcntl(priv->err_chnl->fd, F_GETFL);
 	ret = fcntl(priv->err_chnl->fd, F_SETFL, flags | O_NONBLOCK);
 	if (ret) {
+		rte_errno = errno;
 		DRV_LOG(ERR, "Failed to change device event channel FD.");
 		goto error;
 	}
-
+	priv->err_intr_handle =
+		rte_intr_instance_alloc(RTE_INTR_INSTANCE_F_SHARED);
+	if (priv->err_intr_handle == NULL) {
+		DRV_LOG(ERR, "Fail to allocate intr_handle");
+		goto error;
+	}
 	if (rte_intr_fd_set(priv->err_intr_handle, priv->err_chnl->fd))
 		goto error;
 
 	if (rte_intr_type_set(priv->err_intr_handle, RTE_INTR_HANDLE_EXT))
 		goto error;
 
-	if (rte_intr_callback_register(priv->err_intr_handle,
-				       mlx5_vdpa_err_interrupt_handler,
-				       priv)) {
+	ret = rte_intr_callback_register(priv->err_intr_handle,
+					 mlx5_vdpa_err_interrupt_handler,
+					 priv);
+	if (ret != 0) {
 		rte_intr_fd_set(priv->err_intr_handle, 0);
 		DRV_LOG(ERR, "Failed to register error interrupt for device %d.",
 			priv->vid);
+		rte_errno = -ret;
 		goto error;
 	} else {
 		DRV_LOG(DEBUG, "Registered error interrupt for device%d.",
@@ -453,6 +459,7 @@ mlx5_vdpa_err_event_unset(struct mlx5_vdpa_priv *priv)
 		mlx5_glue->devx_destroy_event_channel(priv->err_chnl);
 		priv->err_chnl = NULL;
 	}
+	rte_intr_instance_free(priv->err_intr_handle);
 }
 
 int
@@ -575,8 +582,6 @@ mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 	uint16_t log_desc_n = rte_log2_u32(desc_n);
 	uint32_t ret;
 
-	if (mlx5_vdpa_event_qp_global_prepare(priv))
-		return -1;
 	if (mlx5_vdpa_cq_create(priv, log_desc_n, callfd, &eqp->cq))
 		return -1;
 	attr.pd = priv->cdev->pdn;
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_mem.c b/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
index 599079500b0..62f5530e91d 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
@@ -34,10 +34,6 @@ mlx5_vdpa_mem_dereg(struct mlx5_vdpa_priv *priv)
 	SLIST_INIT(&priv->mr_list);
 	if (priv->lm_mr.addr)
 		mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
-	if (priv->null_mr) {
-		claim_zero(mlx5_glue->dereg_mr(priv->null_mr));
-		priv->null_mr = NULL;
-	}
 	if (priv->vmem) {
 		free(priv->vmem);
 		priv->vmem = NULL;
@@ -196,13 +192,6 @@ mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
 	if (!mem)
 		return -rte_errno;
 	priv->vmem = mem;
-	priv->null_mr = mlx5_glue->alloc_null_mr(priv->cdev->pd);
-	if (!priv->null_mr) {
-		DRV_LOG(ERR, "Failed to allocate null MR.");
-		ret = -errno;
-		goto error;
-	}
-	DRV_LOG(DEBUG, "Dump fill Mkey = %u.", priv->null_mr->lkey);
 	for (i = 0; i < mem->nregions; i++) {
 		reg = &mem->regions[i];
 		entry = rte_zmalloc(__func__, sizeof(*entry), 0);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
index a0fd2776e57..d4b4375c886 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
@@ -45,14 +45,6 @@ void
 mlx5_vdpa_steer_unset(struct mlx5_vdpa_priv *priv)
 {
 	mlx5_vdpa_rss_flows_destroy(priv);
-	if (priv->steer.tbl) {
-		claim_zero(mlx5_glue->dr_destroy_flow_tbl(priv->steer.tbl));
-		priv->steer.tbl = NULL;
-	}
-	if (priv->steer.domain) {
-		claim_zero(mlx5_glue->dr_destroy_domain(priv->steer.domain));
-		priv->steer.domain = NULL;
-	}
 	if (priv->steer.rqt) {
 		claim_zero(mlx5_devx_cmd_destroy(priv->steer.rqt));
 		priv->steer.rqt = NULL;
@@ -248,11 +240,7 @@ mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv)
 	int ret = mlx5_vdpa_rqt_prepare(priv);
 
 	if (ret == 0) {
-		mlx5_vdpa_rss_flows_destroy(priv);
-		if (priv->steer.rqt) {
-			claim_zero(mlx5_devx_cmd_destroy(priv->steer.rqt));
-			priv->steer.rqt = NULL;
-		}
+		mlx5_vdpa_steer_unset(priv);
 	} else if (ret < 0) {
 		return ret;
 	} else if (!priv->steer.rss[0].flow) {
@@ -268,26 +256,10 @@ mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv)
 int
 mlx5_vdpa_steer_setup(struct mlx5_vdpa_priv *priv)
 {
-#ifdef HAVE_MLX5DV_DR
-	priv->steer.domain = mlx5_glue->dr_create_domain(priv->cdev->ctx,
-						  MLX5DV_DR_DOMAIN_TYPE_NIC_RX);
-	if (!priv->steer.domain) {
-		DRV_LOG(ERR, "Failed to create Rx domain.");
-		goto error;
-	}
-	priv->steer.tbl = mlx5_glue->dr_create_flow_tbl(priv->steer.domain, 0);
-	if (!priv->steer.tbl) {
-		DRV_LOG(ERR, "Failed to create table 0 with Rx domain.");
-		goto error;
-	}
 	if (mlx5_vdpa_steer_update(priv))
 		goto error;
 	return 0;
 error:
 	mlx5_vdpa_steer_unset(priv);
 	return -1;
-#else
-	(void)priv;
-	return -ENOTSUP;
-#endif /* HAVE_MLX5DV_DR */
 }
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index b1d584ca8b0..6bda9f1814a 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -3,7 +3,6 @@
  */
 #include <string.h>
 #include <unistd.h>
-#include <sys/mman.h>
 #include <sys/eventfd.h>
 
 #include <rte_malloc.h>
@@ -120,20 +119,6 @@ mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv)
 		if (virtq->counters)
 			claim_zero(mlx5_devx_cmd_destroy(virtq->counters));
 	}
-	for (i = 0; i < priv->num_lag_ports; i++) {
-		if (priv->tiss[i]) {
-			claim_zero(mlx5_devx_cmd_destroy(priv->tiss[i]));
-			priv->tiss[i] = NULL;
-		}
-	}
-	if (priv->td) {
-		claim_zero(mlx5_devx_cmd_destroy(priv->td));
-		priv->td = NULL;
-	}
-	if (priv->virtq_db_addr) {
-		claim_zero(munmap(priv->virtq_db_addr, priv->var->length));
-		priv->virtq_db_addr = NULL;
-	}
 	priv->features = 0;
 	memset(priv->virtqs, 0, sizeof(*virtq) * priv->nr_virtqs);
 	priv->nr_virtqs = 0;
@@ -462,8 +447,6 @@ mlx5_vdpa_features_validate(struct mlx5_vdpa_priv *priv)
 int
 mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 {
-	struct mlx5_devx_tis_attr tis_attr = {0};
-	struct ibv_context *ctx = priv->cdev->ctx;
 	uint32_t i;
 	uint16_t nr_vring = rte_vhost_get_vring_num(priv->vid);
 	int ret = rte_vhost_get_negotiated_features(priv->vid, &priv->features);
@@ -485,33 +468,6 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 			(int)nr_vring);
 		return -1;
 	}
-	/* Always map the entire page. */
-	priv->virtq_db_addr = mmap(NULL, priv->var->length, PROT_READ |
-				   PROT_WRITE, MAP_SHARED, ctx->cmd_fd,
-				   priv->var->mmap_off);
-	if (priv->virtq_db_addr == MAP_FAILED) {
-		DRV_LOG(ERR, "Failed to map doorbell page %u.", errno);
-		priv->virtq_db_addr = NULL;
-		goto error;
-	} else {
-		DRV_LOG(DEBUG, "VAR address of doorbell mapping is %p.",
-			priv->virtq_db_addr);
-	}
-	priv->td = mlx5_devx_cmd_create_td(ctx);
-	if (!priv->td) {
-		DRV_LOG(ERR, "Failed to create transport domain.");
-		return -rte_errno;
-	}
-	tis_attr.transport_domain = priv->td->id;
-	for (i = 0; i < priv->num_lag_ports; i++) {
-		/* 0 is auto affinity, non-zero value to propose port. */
-		tis_attr.lag_tx_port_affinity = i + 1;
-		priv->tiss[i] = mlx5_devx_cmd_create_tis(ctx, &tis_attr);
-		if (!priv->tiss[i]) {
-			DRV_LOG(ERR, "Failed to create TIS %u.", i);
-			goto error;
-		}
-	}
 	priv->nr_virtqs = nr_vring;
 	for (i = 0; i < nr_vring; i++)
 		if (priv->virtqs[i].enable && mlx5_vdpa_virtq_setup(priv, i))
-- 
2.35.1



* [PATCH v2 5/7] vdpa/mlx5: cache and reuse hardware resources
  2022-02-24 15:50 ` [PATCH v2 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
                     ` (3 preceding siblings ...)
  2022-02-24 15:50   ` [PATCH v2 4/7] vdpa/mlx5: reuse resources in reconfiguration Xueming Li
@ 2022-02-24 15:50   ` Xueming Li
  2022-04-20 15:03     ` Maxime Coquelin
  2022-02-24 15:51   ` [PATCH v2 6/7] vdpa/mlx5: support device cleanup callback Xueming Li
  2022-02-24 15:51   ` [PATCH v2 7/7] vdpa/mlx5: make statistics counter persistent Xueming Li
  6 siblings, 1 reply; 43+ messages in thread
From: Xueming Li @ 2022-02-24 15:50 UTC (permalink / raw)
  To: dev; +Cc: xuemingl, Matan Azrad, Viacheslav Ovsiienko

During device suspend and resume, resources normally do not change.
When huge resources are allocated to the VM, like a huge memory size or
lots of queues, the time spent on release and recreation becomes
significant.

To speed this up, this patch reuses resources like the VM MR and VirtQ
memory if they have not changed.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 11 ++++-
 drivers/vdpa/mlx5/mlx5_vdpa.h       | 12 ++++-
 drivers/vdpa/mlx5/mlx5_vdpa_mem.c   | 27 ++++++++++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 73 +++++++++++++++++++++--------
 4 files changed, 99 insertions(+), 24 deletions(-)
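
The reuse decision for the VM memory registration can be sketched in
isolation. This is a minimal model, not the driver code: struct region,
struct mem_table and mem_cache_reusable() are hypothetical stand-ins for
the vhost memory table and the check done before re-registering memory,
and the fixed array of 8 regions is an assumption made only to keep the
sketch self-contained.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one guest memory region. */
struct region {
	uint64_t guest_phys_addr;
	uint64_t size;
};

/* Hypothetical, simplified stand-in for the vhost memory table. */
struct mem_table {
	uint32_t nregions;
	struct region regions[8];
};

/* Return 0 when both tables describe the same layout, -1 otherwise. */
static int
mem_table_cmp(const struct mem_table *a, const struct mem_table *b)
{
	uint32_t i;

	if (a->nregions != b->nregions)
		return -1;
	for (i = 0; i < a->nregions; i++) {
		if (a->regions[i].guest_phys_addr !=
		    b->regions[i].guest_phys_addr)
			return -1;
		if (a->regions[i].size != b->regions[i].size)
			return -1;
	}
	return 0;
}

/*
 * Return 1 when a cached registration can be kept as is,
 * 0 when there is no cache or the layout changed and it must be rebuilt.
 */
static int
mem_cache_reusable(const struct mem_table *cached,
		   const struct mem_table *fresh)
{
	return cached != NULL && mem_table_cmp(cached, fresh) == 0;
}
```

Only the guest physical address and the size of each region are
compared, matching the fields checked in this patch. A caller would drop
the freshly built table when the cache is reusable and deregister the
old one otherwise.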

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 9862141497b..38ed45f95f7 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -241,6 +241,13 @@ mlx5_vdpa_mtu_set(struct mlx5_vdpa_priv *priv)
 	return kern_mtu == vhost_mtu ? 0 : -1;
 }
 
+static void
+mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv)
+{
+	mlx5_vdpa_virtqs_cleanup(priv);
+	mlx5_vdpa_mem_dereg(priv);
+}
+
 static int
 mlx5_vdpa_dev_close(int vid)
 {
@@ -260,7 +267,8 @@ mlx5_vdpa_dev_close(int vid)
 	}
 	mlx5_vdpa_steer_unset(priv);
 	mlx5_vdpa_virtqs_release(priv);
-	mlx5_vdpa_mem_dereg(priv);
+	if (priv->lm_mr.addr)
+		mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
 	priv->state = MLX5_VDPA_STATE_PROBED;
 	priv->vid = 0;
 	/* The mutex may stay locked after event thread cancel - initiate it. */
@@ -659,6 +667,7 @@ mlx5_vdpa_release_dev_resources(struct mlx5_vdpa_priv *priv)
 {
 	uint32_t i;
 
+	mlx5_vdpa_dev_cache_clean(priv);
 	mlx5_vdpa_event_qp_global_release(priv);
 	mlx5_vdpa_err_event_unset(priv);
 	if (priv->steer.tbl)
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index e0ba20b953c..540bf87a352 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -289,13 +289,21 @@ int mlx5_vdpa_err_event_setup(struct mlx5_vdpa_priv *priv);
 void mlx5_vdpa_err_event_unset(struct mlx5_vdpa_priv *priv);
 
 /**
- * Release a virtq and all its related resources.
+ * Release virtqs and resources except those to be reused.
  *
  * @param[in] priv
  *   The vdpa driver private structure.
  */
 void mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv);
 
+/**
+ * Cleanup cached resources of all virtqs.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ */
+void mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv);
+
 /**
  * Create all the HW virtqs resources and all their related resources.
  *
@@ -323,7 +331,7 @@ int mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv);
 int mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable);
 
 /**
- * Unset steering and release all its related resources- stop traffic.
+ * Unset steering - stop traffic.
  *
  * @param[in] priv
  *   The vdpa driver private structure.
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_mem.c b/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
index 62f5530e91d..d6e3dd664b5 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
@@ -32,8 +32,6 @@ mlx5_vdpa_mem_dereg(struct mlx5_vdpa_priv *priv)
 		entry = next;
 	}
 	SLIST_INIT(&priv->mr_list);
-	if (priv->lm_mr.addr)
-		mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
 	if (priv->vmem) {
 		free(priv->vmem);
 		priv->vmem = NULL;
@@ -149,6 +147,23 @@ mlx5_vdpa_vhost_mem_regions_prepare(int vid, uint8_t *mode, uint64_t *mem_size,
 	return mem;
 }
 
+static int
+mlx5_vdpa_mem_cmp(struct rte_vhost_memory *mem1, struct rte_vhost_memory *mem2)
+{
+	uint32_t i;
+
+	if (mem1->nregions != mem2->nregions)
+		return -1;
+	for (i = 0; i < mem1->nregions; i++) {
+		if (mem1->regions[i].guest_phys_addr !=
+		    mem2->regions[i].guest_phys_addr)
+			return -1;
+		if (mem1->regions[i].size != mem2->regions[i].size)
+			return -1;
+	}
+	return 0;
+}
+
 #define KLM_SIZE_MAX_ALIGN(sz) ((sz) > MLX5_MAX_KLM_BYTE_COUNT ? \
 				MLX5_MAX_KLM_BYTE_COUNT : (sz))
 
@@ -191,6 +206,14 @@ mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
 
 	if (!mem)
 		return -rte_errno;
+	if (priv->vmem != NULL) {
+		if (mlx5_vdpa_mem_cmp(mem, priv->vmem) == 0) {
+			/* VM memory not changed, reuse resources. */
+			free(mem);
+			return 0;
+		}
+		mlx5_vdpa_mem_dereg(priv);
+	}
 	priv->vmem = mem;
 	for (i = 0; i < mem->nregions; i++) {
 		reg = &mem->regions[i];
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 6bda9f1814a..c42846ecb3c 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -66,10 +66,33 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 	DRV_LOG(DEBUG, "Ring virtq %u doorbell.", virtq->index);
 }
 
+/* Release cached VQ resources. */
+void
+mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
+{
+	unsigned int i, j;
+
+	for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
+		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
+
+		for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
+			if (virtq->umems[j].obj) {
+				claim_zero(mlx5_glue->devx_umem_dereg
+							(virtq->umems[j].obj));
+				virtq->umems[j].obj = NULL;
+			}
+			if (virtq->umems[j].buf) {
+				rte_free(virtq->umems[j].buf);
+				virtq->umems[j].buf = NULL;
+			}
+			virtq->umems[j].size = 0;
+		}
+	}
+}
+
 static int
 mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 {
-	unsigned int i;
 	int ret = -EAGAIN;
 
 	if (rte_intr_fd_get(virtq->intr_handle) >= 0) {
@@ -94,13 +117,6 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 		claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
 	}
 	virtq->virtq = NULL;
-	for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
-		if (virtq->umems[i].obj)
-			claim_zero(mlx5_glue->devx_umem_dereg
-							 (virtq->umems[i].obj));
-		rte_free(virtq->umems[i].buf);
-	}
-	memset(&virtq->umems, 0, sizeof(virtq->umems));
 	if (virtq->eqp.fw_qp)
 		mlx5_vdpa_event_qp_destroy(&virtq->eqp);
 	virtq->notifier_state = MLX5_VDPA_NOTIFIER_STATE_DISABLED;
@@ -120,7 +136,6 @@ mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv)
 			claim_zero(mlx5_devx_cmd_destroy(virtq->counters));
 	}
 	priv->features = 0;
-	memset(priv->virtqs, 0, sizeof(*virtq) * priv->nr_virtqs);
 	priv->nr_virtqs = 0;
 }
 
@@ -215,6 +230,8 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 	ret = rte_vhost_get_vhost_vring(priv->vid, index, &vq);
 	if (ret)
 		return -1;
+	if (vq.size == 0)
+		return 0;
 	virtq->index = index;
 	virtq->vq_size = vq.size;
 	attr.tso_ipv4 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO4));
@@ -259,24 +276,42 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 	}
 	/* Setup 3 UMEMs for each virtq. */
 	for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
-		virtq->umems[i].size = priv->caps.umems[i].a * vq.size +
-							  priv->caps.umems[i].b;
-		virtq->umems[i].buf = rte_zmalloc(__func__,
-						  virtq->umems[i].size, 4096);
-		if (!virtq->umems[i].buf) {
+		uint32_t size;
+		void *buf;
+		struct mlx5dv_devx_umem *obj;
+
+		size = priv->caps.umems[i].a * vq.size + priv->caps.umems[i].b;
+		if (virtq->umems[i].size == size &&
+		    virtq->umems[i].obj != NULL) {
+			/* Reuse registered memory. */
+			memset(virtq->umems[i].buf, 0, size);
+			goto reuse;
+		}
+		if (virtq->umems[i].obj)
+			claim_zero(mlx5_glue->devx_umem_dereg
+				   (virtq->umems[i].obj));
+		if (virtq->umems[i].buf)
+			rte_free(virtq->umems[i].buf);
+		virtq->umems[i].size = 0;
+		virtq->umems[i].obj = NULL;
+		virtq->umems[i].buf = NULL;
+		buf = rte_zmalloc(__func__, size, 4096);
+		if (buf == NULL) {
 			DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
 				" %u.", i, index);
 			goto error;
 		}
-		virtq->umems[i].obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx,
-							virtq->umems[i].buf,
-							virtq->umems[i].size,
-							IBV_ACCESS_LOCAL_WRITE);
-		if (!virtq->umems[i].obj) {
+		obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx, buf, size,
+					       IBV_ACCESS_LOCAL_WRITE);
+		if (obj == NULL) {
 			DRV_LOG(ERR, "Failed to register umem %d for virtq %u.",
 				i, index);
 			goto error;
 		}
+		virtq->umems[i].size = size;
+		virtq->umems[i].buf = buf;
+		virtq->umems[i].obj = obj;
+reuse:
 		attr.umems[i].id = virtq->umems[i].obj->umem_id;
 		attr.umems[i].offset = 0;
 		attr.umems[i].size = virtq->umems[i].size;
-- 
2.35.1



* [PATCH v2 6/7] vdpa/mlx5: support device cleanup callback
  2022-02-24 15:50 ` [PATCH v2 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
                     ` (4 preceding siblings ...)
  2022-02-24 15:50   ` [PATCH v2 5/7] vdpa/mlx5: cache and reuse hardware resources Xueming Li
@ 2022-02-24 15:51   ` Xueming Li
  2022-04-21  8:19     ` Maxime Coquelin
  2022-02-24 15:51   ` [PATCH v2 7/7] vdpa/mlx5: make statistics counter persistent Xueming Li
  6 siblings, 1 reply; 43+ messages in thread
From: Xueming Li @ 2022-02-24 15:51 UTC (permalink / raw)
  To: dev; +Cc: xuemingl, Matan Azrad, Viacheslav Ovsiienko

This patch supports the device cleanup callback API, which is called
when the device is disconnected from the VM. Cached resources like the
VM MR and VQ memory are released.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c | 23 +++++++++++++++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa.h |  1 +
 2 files changed, 24 insertions(+)
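
The interplay between close and cleanup can be modelled with a small
state sketch. It is an illustration under assumptions, not the driver
code: struct dev_model, DEV_STATE_CONFIGURED and the cache_cleans
counter are hypothetical names, and the real callbacks do much more than
flip these two fields.

```c
#include <stdbool.h>

/* Hypothetical two-state model; only PROBED mirrors a name in the patch. */
enum dev_state { DEV_STATE_PROBED, DEV_STATE_CONFIGURED };

struct dev_model {
	enum dev_state state;
	bool connected;
	int cache_cleans; /* How many times cached resources were dropped. */
};

static void
cache_clean(struct dev_model *d)
{
	d->cache_cleans++;
}

/* Device configured by the VM: mark it connected. */
static void
dev_config(struct dev_model *d)
{
	d->connected = true;
	d->state = DEV_STATE_CONFIGURED;
}

/* Close keeps the cache while the VM is still connected. */
static void
dev_close(struct dev_model *d)
{
	d->state = DEV_STATE_PROBED;
	if (!d->connected)
		cache_clean(d);
}

/* Cleanup drops the cache now if already closed, else defers to close. */
static void
dev_cleanup(struct dev_model *d)
{
	if (d->state == DEV_STATE_PROBED)
		cache_clean(d);
	d->connected = false;
}
```

In this model a suspend and resume cycle (config, close, config) never
drops the cache, while a disconnect drops it exactly once whichever of
close or cleanup runs last.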

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 38ed45f95f7..47874c9b1ff 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -270,6 +270,8 @@ mlx5_vdpa_dev_close(int vid)
 	if (priv->lm_mr.addr)
 		mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
 	priv->state = MLX5_VDPA_STATE_PROBED;
+	if (!priv->connected)
+		mlx5_vdpa_dev_cache_clean(priv);
 	priv->vid = 0;
 	/* The mutex may stay locked after event thread cancel - initiate it. */
 	pthread_mutex_init(&priv->vq_config_lock, NULL);
@@ -294,6 +296,7 @@ mlx5_vdpa_dev_config(int vid)
 		return -1;
 	}
 	priv->vid = vid;
+	priv->connected = true;
 	if (mlx5_vdpa_mtu_set(priv))
 		DRV_LOG(WARNING, "MTU cannot be set on device %s.",
 				vdev->device->name);
@@ -431,12 +434,32 @@ mlx5_vdpa_reset_stats(struct rte_vdpa_device *vdev, int qid)
 	return mlx5_vdpa_virtq_stats_reset(priv, qid);
 }
 
+static int
+mlx5_vdpa_dev_clean(int vid)
+{
+	struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid);
+	struct mlx5_vdpa_priv *priv;
+
+	if (vdev == NULL)
+		return -1;
+	priv = mlx5_vdpa_find_priv_resource_by_vdev(vdev);
+	if (priv == NULL) {
+		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
+		return -1;
+	}
+	if (priv->state == MLX5_VDPA_STATE_PROBED)
+		mlx5_vdpa_dev_cache_clean(priv);
+	priv->connected = false;
+	return 0;
+}
+
 static struct rte_vdpa_dev_ops mlx5_vdpa_ops = {
 	.get_queue_num = mlx5_vdpa_get_queue_num,
 	.get_features = mlx5_vdpa_get_vdpa_features,
 	.get_protocol_features = mlx5_vdpa_get_protocol_features,
 	.dev_conf = mlx5_vdpa_dev_config,
 	.dev_close = mlx5_vdpa_dev_close,
+	.dev_cleanup = mlx5_vdpa_dev_clean,
 	.set_vring_state = mlx5_vdpa_set_vring_state,
 	.set_features = mlx5_vdpa_features_set,
 	.migration_done = NULL,
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 540bf87a352..24bafe85b44 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -121,6 +121,7 @@ enum mlx5_dev_state {
 
 struct mlx5_vdpa_priv {
 	TAILQ_ENTRY(mlx5_vdpa_priv) next;
+	bool connected;
 	enum mlx5_dev_state state;
 	pthread_mutex_t vq_config_lock;
 	uint64_t no_traffic_counter;
-- 
2.35.1



* [PATCH v2 7/7] vdpa/mlx5: make statistics counter persistent
  2022-02-24 15:50 ` [PATCH v2 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
                     ` (5 preceding siblings ...)
  2022-02-24 15:51   ` [PATCH v2 6/7] vdpa/mlx5: support device cleanup callback Xueming Li
@ 2022-02-24 15:51   ` Xueming Li
  2022-04-21  8:22     ` Maxime Coquelin
  6 siblings, 1 reply; 43+ messages in thread
From: Xueming Li @ 2022-02-24 15:51 UTC (permalink / raw)
  To: dev; +Cc: xuemingl, Matan Azrad, Viacheslav Ovsiienko

To speed up device suspend and resume, make the statistics counters
persistent across reconfiguration until the device is removed.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
 doc/guides/vdpadevs/mlx5.rst        |  6 ++++++
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 19 +++++++----------
 drivers/vdpa/mlx5/mlx5_vdpa.h       |  1 +
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 32 +++++++++++------------------
 4 files changed, 26 insertions(+), 32 deletions(-)
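
The reporting scheme, where a reset only records a baseline and reads
return the difference, can be sketched as follows. This is a simplified
model under assumptions: struct q_counters and struct vq_stats_model are
hypothetical, only two counters are shown, and the hardware query is
replaced by a plain field.

```c
#include <stdint.h>

/* Hypothetical subset of the per-queue counters. */
struct q_counters {
	uint64_t received_desc;
	uint64_t completed_desc;
};

struct vq_stats_model {
	struct q_counters stats; /* Last values read from the counters. */
	struct q_counters reset; /* Snapshot taken at the last reset. */
};

/* Reset does not clear the counters, it records the current values. */
static void
stats_reset(struct vq_stats_model *vq)
{
	vq->reset = vq->stats;
}

/* Reported value is the distance from the last reset snapshot. */
static uint64_t
stats_received(const struct vq_stats_model *vq)
{
	return vq->stats.received_desc - vq->reset.received_desc;
}

static uint64_t
stats_completed(const struct vq_stats_model *vq)
{
	return vq->stats.completed_desc - vq->reset.completed_desc;
}
```

Because neither field is cleared on reconfiguration in this model, the
reported values keep growing across suspend and resume until a reset is
requested.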

diff --git a/doc/guides/vdpadevs/mlx5.rst b/doc/guides/vdpadevs/mlx5.rst
index 30f0b62eb41..070208d3952 100644
--- a/doc/guides/vdpadevs/mlx5.rst
+++ b/doc/guides/vdpadevs/mlx5.rst
@@ -182,3 +182,9 @@ Upon potential hardware errors, mlx5 PMD try to recover, give up if failed 3
 times in 3 seconds, virtq will be put in disable state. User should check log
 to get error information, or query vdpa statistics counter to know error type
 and count report.
+
+Statistics
+^^^^^^^^^^
+
+The device statistics counters persist across reconfiguration until the device
+is removed. Users can reset the counters by calling rte_vdpa_reset_stats().
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 47874c9b1ff..c695f176c9d 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -388,12 +388,7 @@ mlx5_vdpa_get_stats(struct rte_vdpa_device *vdev, int qid,
 		DRV_LOG(ERR, "Invalid device: %s.", vdev->device->name);
 		return -ENODEV;
 	}
-	if (priv->state == MLX5_VDPA_STATE_PROBED) {
-		DRV_LOG(ERR, "Device %s was not configured.",
-				vdev->device->name);
-		return -ENODATA;
-	}
-	if (qid >= (int)priv->nr_virtqs) {
+	if (qid >= (int)priv->caps.max_num_virtio_queues * 2) {
 		DRV_LOG(ERR, "Too big vring id: %d for device %s.", qid,
 				vdev->device->name);
 		return -E2BIG;
@@ -416,12 +411,7 @@ mlx5_vdpa_reset_stats(struct rte_vdpa_device *vdev, int qid)
 		DRV_LOG(ERR, "Invalid device: %s.", vdev->device->name);
 		return -ENODEV;
 	}
-	if (priv->state == MLX5_VDPA_STATE_PROBED) {
-		DRV_LOG(ERR, "Device %s was not configured.",
-				vdev->device->name);
-		return -ENODATA;
-	}
-	if (qid >= (int)priv->nr_virtqs) {
+	if (qid >= (int)priv->caps.max_num_virtio_queues * 2) {
 		DRV_LOG(ERR, "Too big vring id: %d for device %s.", qid,
 				vdev->device->name);
 		return -E2BIG;
@@ -691,6 +681,11 @@ mlx5_vdpa_release_dev_resources(struct mlx5_vdpa_priv *priv)
 	uint32_t i;
 
 	mlx5_vdpa_dev_cache_clean(priv);
+	for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
+		if (!priv->virtqs[i].counters)
+			continue;
+		claim_zero(mlx5_devx_cmd_destroy(priv->virtqs[i].counters));
+	}
 	mlx5_vdpa_event_qp_global_release(priv);
 	mlx5_vdpa_err_event_unset(priv);
 	if (priv->steer.tbl)
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 24bafe85b44..e7f3319f896 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -92,6 +92,7 @@ struct mlx5_vdpa_virtq {
 	struct rte_intr_handle *intr_handle;
 	uint64_t err_time[3]; /* RDTSC time of recent errors. */
 	uint32_t n_retry;
+	struct mlx5_devx_virtio_q_couners_attr stats;
 	struct mlx5_devx_virtio_q_couners_attr reset;
 };
 
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index c42846ecb3c..d2c91b25db1 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -127,14 +127,9 @@ void
 mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv)
 {
 	int i;
-	struct mlx5_vdpa_virtq *virtq;
 
-	for (i = 0; i < priv->nr_virtqs; i++) {
-		virtq = &priv->virtqs[i];
-		mlx5_vdpa_virtq_unset(virtq);
-		if (virtq->counters)
-			claim_zero(mlx5_devx_cmd_destroy(virtq->counters));
-	}
+	for (i = 0; i < priv->nr_virtqs; i++)
+		mlx5_vdpa_virtq_unset(&priv->virtqs[i]);
 	priv->features = 0;
 	priv->nr_virtqs = 0;
 }
@@ -590,7 +585,7 @@ mlx5_vdpa_virtq_stats_get(struct mlx5_vdpa_priv *priv, int qid,
 			  struct rte_vdpa_stat *stats, unsigned int n)
 {
 	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[qid];
-	struct mlx5_devx_virtio_q_couners_attr attr = {0};
+	struct mlx5_devx_virtio_q_couners_attr *attr = &virtq->stats;
 	int ret;
 
 	if (!virtq->counters) {
@@ -598,7 +593,7 @@ mlx5_vdpa_virtq_stats_get(struct mlx5_vdpa_priv *priv, int qid,
 			"is invalid.", qid);
 		return -EINVAL;
 	}
-	ret = mlx5_devx_cmd_query_virtio_q_counters(virtq->counters, &attr);
+	ret = mlx5_devx_cmd_query_virtio_q_counters(virtq->counters, attr);
 	if (ret) {
 		DRV_LOG(ERR, "Failed to read virtq %d stats from HW.", qid);
 		return ret;
@@ -608,37 +603,37 @@ mlx5_vdpa_virtq_stats_get(struct mlx5_vdpa_priv *priv, int qid,
 		return ret;
 	stats[MLX5_VDPA_STATS_RECEIVED_DESCRIPTORS] = (struct rte_vdpa_stat) {
 		.id = MLX5_VDPA_STATS_RECEIVED_DESCRIPTORS,
-		.value = attr.received_desc - virtq->reset.received_desc,
+		.value = attr->received_desc - virtq->reset.received_desc,
 	};
 	if (ret == MLX5_VDPA_STATS_COMPLETED_DESCRIPTORS)
 		return ret;
 	stats[MLX5_VDPA_STATS_COMPLETED_DESCRIPTORS] = (struct rte_vdpa_stat) {
 		.id = MLX5_VDPA_STATS_COMPLETED_DESCRIPTORS,
-		.value = attr.completed_desc - virtq->reset.completed_desc,
+		.value = attr->completed_desc - virtq->reset.completed_desc,
 	};
 	if (ret == MLX5_VDPA_STATS_BAD_DESCRIPTOR_ERRORS)
 		return ret;
 	stats[MLX5_VDPA_STATS_BAD_DESCRIPTOR_ERRORS] = (struct rte_vdpa_stat) {
 		.id = MLX5_VDPA_STATS_BAD_DESCRIPTOR_ERRORS,
-		.value = attr.bad_desc_errors - virtq->reset.bad_desc_errors,
+		.value = attr->bad_desc_errors - virtq->reset.bad_desc_errors,
 	};
 	if (ret == MLX5_VDPA_STATS_EXCEED_MAX_CHAIN)
 		return ret;
 	stats[MLX5_VDPA_STATS_EXCEED_MAX_CHAIN] = (struct rte_vdpa_stat) {
 		.id = MLX5_VDPA_STATS_EXCEED_MAX_CHAIN,
-		.value = attr.exceed_max_chain - virtq->reset.exceed_max_chain,
+		.value = attr->exceed_max_chain - virtq->reset.exceed_max_chain,
 	};
 	if (ret == MLX5_VDPA_STATS_INVALID_BUFFER)
 		return ret;
 	stats[MLX5_VDPA_STATS_INVALID_BUFFER] = (struct rte_vdpa_stat) {
 		.id = MLX5_VDPA_STATS_INVALID_BUFFER,
-		.value = attr.invalid_buffer - virtq->reset.invalid_buffer,
+		.value = attr->invalid_buffer - virtq->reset.invalid_buffer,
 	};
 	if (ret == MLX5_VDPA_STATS_COMPLETION_ERRORS)
 		return ret;
 	stats[MLX5_VDPA_STATS_COMPLETION_ERRORS] = (struct rte_vdpa_stat) {
 		.id = MLX5_VDPA_STATS_COMPLETION_ERRORS,
-		.value = attr.error_cqes - virtq->reset.error_cqes,
+		.value = attr->error_cqes - virtq->reset.error_cqes,
 	};
 	return ret;
 }
@@ -649,11 +644,8 @@ mlx5_vdpa_virtq_stats_reset(struct mlx5_vdpa_priv *priv, int qid)
 	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[qid];
 	int ret;
 
-	if (!virtq->counters) {
-		DRV_LOG(ERR, "Failed to read virtq %d statistics - virtq "
-			"is invalid.", qid);
-		return -EINVAL;
-	}
+	if (virtq->counters == NULL) /* VQ not enabled. */
+		return 0;
 	ret = mlx5_devx_cmd_query_virtio_q_counters(virtq->counters,
 						    &virtq->reset);
 	if (ret)
-- 
2.35.1



* Re: [PATCH v2 2/7] vdpa/mlx5: fix dead loop when process interrupted
  2022-02-24 15:50   ` [PATCH v2 2/7] vdpa/mlx5: fix dead loop when process interrupted Xueming Li
@ 2022-04-20 10:33     ` Maxime Coquelin
  0 siblings, 0 replies; 43+ messages in thread
From: Maxime Coquelin @ 2022-04-20 10:33 UTC (permalink / raw)
  To: Xueming Li, dev; +Cc: stable, Matan Azrad, Viacheslav Ovsiienko



On 2/24/22 16:50, Xueming Li wrote:
> In Ctrl+C handling, sometimes kick handling thread gets endless EGAIN
> error and fall into dead lock.
> 
> Kick happens frequently in real system due to busy traffic or retry
> mechanism. This patch simplifies kick firmware anyway and skip setting
> hardware notifier due to potential device error, notifier could be set
> in next successful kick request.
> 
> Fixes: 62c813706e41 ("vdpa/mlx5: map doorbell")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Xueming Li <xuemingl@nvidia.com>
> ---
>   drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 8 +++++---
>   1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
> index de324506cb9..e1e05924a40 100644
> --- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
> +++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
> @@ -23,11 +23,11 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
>   	struct mlx5_vdpa_priv *priv = virtq->priv;
>   	uint64_t buf;
>   	int nbytes;
> +	int retry;
>   
>   	if (rte_intr_fd_get(virtq->intr_handle) < 0)
>   		return;
> -
> -	do {
> +	for (retry = 0; retry < 3; ++retry) {
>   		nbytes = read(rte_intr_fd_get(virtq->intr_handle), &buf,
>   			      8);
>   		if (nbytes < 0) {
> @@ -39,7 +39,9 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
>   				virtq->index, strerror(errno));
>   		}
>   		break;
> -	} while (1);
> +	}
> +	if (nbytes < 0)
> +		return;
>   	rte_write32(virtq->index, priv->virtq_db_addr);
>   	if (virtq->notifier_state == MLX5_VDPA_NOTIFIER_STATE_DISABLED) {
>   		if (rte_vhost_host_notifier_ctrl(priv->vid, virtq->index, true))

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime



* Re: [PATCH v2 1/7] vdpa/mlx5: fix interrupt trash that leads to segment fault
  2022-02-24 15:50   ` [PATCH v2 1/7] vdpa/mlx5: fix interrupt trash that leads to segment fault Xueming Li
@ 2022-04-20 10:39     ` Maxime Coquelin
  0 siblings, 0 replies; 43+ messages in thread
From: Maxime Coquelin @ 2022-04-20 10:39 UTC (permalink / raw)
  To: Xueming Li, dev; +Cc: matan, stable, Matan Azrad, Viacheslav Ovsiienko



On 2/24/22 16:50, Xueming Li wrote:
> Disable interrupt unregister timeout to avoid invalid FD caused
> interrupt thread segment fault.
> 
> Fixes: 62c813706e41 ("vdpa/mlx5: map doorbell")
> Cc: matan@mellanox.com
> Cc: stable@dpdk.org
> 
> Signed-off-by: Xueming Li <xuemingl@nvidia.com>
> ---
>   drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 20 ++++++++------------
>   1 file changed, 8 insertions(+), 12 deletions(-)
> 

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime



* Re: [PATCH v2 3/7] vdpa/mlx5: no kick handling during shutdown
  2022-02-24 15:50   ` [PATCH v2 3/7] vdpa/mlx5: no kick handling during shutdown Xueming Li
@ 2022-04-20 12:37     ` Maxime Coquelin
  2022-04-20 13:23       ` Xueming(Steven) Li
  0 siblings, 1 reply; 43+ messages in thread
From: Maxime Coquelin @ 2022-04-20 12:37 UTC (permalink / raw)
  To: Xueming Li, dev; +Cc: Matan Azrad, Viacheslav Ovsiienko



On 2/24/22 16:50, Xueming Li wrote:
> When Qemu suspend a VM, hw notifier is un-mmapped while vCPU thread may
suspends
> still active and write notifier through kick socket.
still be active

> 
> PMD kick handler thread tries to install hw notifier through client
> socket in such case will timeout and slow down device close.
socket. In such case, it will

> 
> This patch skips hw notifier install if VQ or device in middle of
> shutdown.
> 
> Signed-off-by: Xueming Li <xuemingl@nvidia.com>
> ---
>   drivers/vdpa/mlx5/mlx5_vdpa.c       | 17 ++++++++++-------
>   drivers/vdpa/mlx5/mlx5_vdpa.h       |  8 +++++++-
>   drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 12 +++++++++++-
>   3 files changed, 28 insertions(+), 9 deletions(-)
> 

Other than the commit message comments:

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

If you are fine with my suggestions and no other revision needed, I can
fixup while applying.

Thanks,
Maxime



* RE: [PATCH v2 3/7] vdpa/mlx5: no kick handling during shutdown
  2022-04-20 12:37     ` Maxime Coquelin
@ 2022-04-20 13:23       ` Xueming(Steven) Li
  0 siblings, 0 replies; 43+ messages in thread
From: Xueming(Steven) Li @ 2022-04-20 13:23 UTC (permalink / raw)
  To: Maxime Coquelin, dev; +Cc: Matan Azrad, Slava Ovsiienko


> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Wednesday, April 20, 2022 8:38 PM
> To: Xueming(Steven) Li <xuemingl@nvidia.com>; dev@dpdk.org
> Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> Subject: Re: [PATCH v2 3/7] vdpa/mlx5: no kick handling during shutdown
> 
> 
> 
> On 2/24/22 16:50, Xueming Li wrote:
> > When Qemu suspend a VM, hw notifier is un-mmapped while vCPU thread
> > may
> suspends
> > still active and write notifier through kick socket.
> still be active
> 
> >
> > PMD kick handler thread tries to install hw notifier through client
> > socket in such case will timeout and slow down device close.
> socket. In such case, it will
> 
> >
> > This patch skips hw notifier install if VQ or device in middle of
> > shutdown.
> >
> > Signed-off-by: Xueming Li <xuemingl@nvidia.com>
> > ---
> >   drivers/vdpa/mlx5/mlx5_vdpa.c       | 17 ++++++++++-------
> >   drivers/vdpa/mlx5/mlx5_vdpa.h       |  8 +++++++-
> >   drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 12 +++++++++++-
> >   3 files changed, 28 insertions(+), 9 deletions(-)
> >
> 
> Other than the commit messages comments:
> 
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> 
> If you are fine with my suggestions and no other revision needed, I can fixup while applying.

Hi Maxime,

No further changes so far, please continue, thanks for taking care of this series!

> 
> Thanks,
> Maxime



* Re: [PATCH v2 4/7] vdpa/mlx5: reuse resources in reconfiguration
  2022-02-24 15:50   ` [PATCH v2 4/7] vdpa/mlx5: reuse resources in reconfiguration Xueming Li
@ 2022-04-20 14:49     ` Maxime Coquelin
  0 siblings, 0 replies; 43+ messages in thread
From: Maxime Coquelin @ 2022-04-20 14:49 UTC (permalink / raw)
  To: Xueming Li, dev; +Cc: Matan Azrad, Viacheslav Ovsiienko



On 2/24/22 16:50, Xueming Li wrote:
> To speed up device resume, create reuseable resources during device
> probe state, release when device remove. Reused resources includes TIS,

"when device is removed"

> TD, VAR Doorbell mmap, error handling event channel and interrupt
> handler, UAR, Rx event channel, NULL MR, steer domain and table.
> 
> Signed-off-by: Xueming Li <xuemingl@nvidia.com>
> ---
>   drivers/vdpa/mlx5/mlx5_vdpa.c       | 167 +++++++++++++++++++++-------
>   drivers/vdpa/mlx5/mlx5_vdpa.h       |   9 ++
>   drivers/vdpa/mlx5/mlx5_vdpa_event.c |  23 ++--
>   drivers/vdpa/mlx5/mlx5_vdpa_mem.c   |  11 --
>   drivers/vdpa/mlx5/mlx5_vdpa_steer.c |  30 +----
>   drivers/vdpa/mlx5/mlx5_vdpa_virtq.c |  44 --------
>   6 files changed, 149 insertions(+), 135 deletions(-)
> 

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime



* Re: [PATCH v2 5/7] vdpa/mlx5: cache and reuse hardware resources
  2022-02-24 15:50   ` [PATCH v2 5/7] vdpa/mlx5: cache and reuse hardware resources Xueming Li
@ 2022-04-20 15:03     ` Maxime Coquelin
  2022-04-25 13:28       ` Xueming(Steven) Li
  0 siblings, 1 reply; 43+ messages in thread
From: Maxime Coquelin @ 2022-04-20 15:03 UTC (permalink / raw)
  To: Xueming Li, dev; +Cc: Matan Azrad, Viacheslav Ovsiienko



On 2/24/22 16:50, Xueming Li wrote:
> During device suspend and resume, resources are not changed normally.
> When huge resources allocated to VM, like huge memory size or lots of

"When huge resources were allocated"

> queues, time spent on release and recreate became significant.
> 
> To speed up, this patch reuse resoruces like VM MR and VirtQ memory if

"reuses resources"

> not changed.
> 
> Signed-off-by: Xueming Li <xuemingl@nvidia.com>
> ---
>   drivers/vdpa/mlx5/mlx5_vdpa.c       | 11 ++++-
>   drivers/vdpa/mlx5/mlx5_vdpa.h       | 12 ++++-
>   drivers/vdpa/mlx5/mlx5_vdpa_mem.c   | 27 ++++++++++-
>   drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 73 +++++++++++++++++++++--------
>   4 files changed, 99 insertions(+), 24 deletions(-)
> 

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime



* Re: [PATCH v2 6/7] vdpa/mlx5: support device cleanup callback
  2022-02-24 15:51   ` [PATCH v2 6/7] vdpa/mlx5: support device cleanup callback Xueming Li
@ 2022-04-21  8:19     ` Maxime Coquelin
  0 siblings, 0 replies; 43+ messages in thread
From: Maxime Coquelin @ 2022-04-21  8:19 UTC (permalink / raw)
  To: Xueming Li, dev; +Cc: Matan Azrad, Viacheslav Ovsiienko



On 2/24/22 16:51, Xueming Li wrote:
> This patch supports device cleanup callback API which called when device
> disconnected with VM.

"This patch supports device cleanup callback API which is called when
the device is disconnected from the VM."

> Cached resources like VM MR and VQ memory are
> released.
> 
> Signed-off-by: Xueming Li <xuemingl@nvidia.com>
> ---
>   drivers/vdpa/mlx5/mlx5_vdpa.c | 23 +++++++++++++++++++++++
>   drivers/vdpa/mlx5/mlx5_vdpa.h |  1 +
>   2 files changed, 24 insertions(+)
> 
> diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
> index 38ed45f95f7..47874c9b1ff 100644
> --- a/drivers/vdpa/mlx5/mlx5_vdpa.c
> +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
> @@ -270,6 +270,8 @@ mlx5_vdpa_dev_close(int vid)
>   	if (priv->lm_mr.addr)
>   		mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
>   	priv->state = MLX5_VDPA_STATE_PROBED;
> +	if (!priv->connected)
> +		mlx5_vdpa_dev_cache_clean(priv);
>   	priv->vid = 0;
>   	/* The mutex may stay locked after event thread cancel - initiate it. */
>   	pthread_mutex_init(&priv->vq_config_lock, NULL);
> @@ -294,6 +296,7 @@ mlx5_vdpa_dev_config(int vid)
>   		return -1;
>   	}
>   	priv->vid = vid;
> +	priv->connected = true;
>   	if (mlx5_vdpa_mtu_set(priv))
>   		DRV_LOG(WARNING, "MTU cannot be set on device %s.",
>   				vdev->device->name);
> @@ -431,12 +434,32 @@ mlx5_vdpa_reset_stats(struct rte_vdpa_device *vdev, int qid)
>   	return mlx5_vdpa_virtq_stats_reset(priv, qid);
>   }
>   
> +static int
> +mlx5_vdpa_dev_clean(int vid)

mlx5_vdpa_dev_cleanup

> +{
> +	struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid);
> +	struct mlx5_vdpa_priv *priv;
> +
> +	if (vdev == NULL)
> +		return -1;
> +	priv = mlx5_vdpa_find_priv_resource_by_vdev(vdev);
> +	if (priv == NULL) {
> +		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
> +		return -1;
> +	}
> +	if (priv->state == MLX5_VDPA_STATE_PROBED)
> +		mlx5_vdpa_dev_cache_clean(priv);
> +	priv->connected = false;
> +	return 0;
> +}
> +
>   static struct rte_vdpa_dev_ops mlx5_vdpa_ops = {
>   	.get_queue_num = mlx5_vdpa_get_queue_num,
>   	.get_features = mlx5_vdpa_get_vdpa_features,
>   	.get_protocol_features = mlx5_vdpa_get_protocol_features,
>   	.dev_conf = mlx5_vdpa_dev_config,
>   	.dev_close = mlx5_vdpa_dev_close,
> +	.dev_cleanup = mlx5_vdpa_dev_clean,
>   	.set_vring_state = mlx5_vdpa_set_vring_state,
>   	.set_features = mlx5_vdpa_features_set,
>   	.migration_done = NULL,
> diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
> index 540bf87a352..24bafe85b44 100644
> --- a/drivers/vdpa/mlx5/mlx5_vdpa.h
> +++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
> @@ -121,6 +121,7 @@ enum mlx5_dev_state {
>   
>   struct mlx5_vdpa_priv {
>   	TAILQ_ENTRY(mlx5_vdpa_priv) next;
> +	bool connected;
>   	enum mlx5_dev_state state;
>   	pthread_mutex_t vq_config_lock;
>   	uint64_t no_traffic_counter;


Other than that:

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 7/7] vdpa/mlx5: make statistics counter persistent
  2022-02-24 15:51   ` [PATCH v2 7/7] vdpa/mlx5: make statistics counter persistent Xueming Li
@ 2022-04-21  8:22     ` Maxime Coquelin
  0 siblings, 0 replies; 43+ messages in thread
From: Maxime Coquelin @ 2022-04-21  8:22 UTC (permalink / raw)
  To: Xueming Li, dev; +Cc: Matan Azrad, Viacheslav Ovsiienko

"vdpa/mlx5: make statistics counters persistent"

On 2/24/22 16:51, Xueming Li wrote:
> To speed the device suspend and resume time, make counter persitent
> in reconfiguration until the device gets removed.

"In order to speed-up the device suspend and resume, make the statistics
counters persistent in reconfiguration..."

> 
> Signed-off-by: Xueming Li <xuemingl@nvidia.com>
> ---
>   doc/guides/vdpadevs/mlx5.rst        |  6 ++++++
>   drivers/vdpa/mlx5/mlx5_vdpa.c       | 19 +++++++----------
>   drivers/vdpa/mlx5/mlx5_vdpa.h       |  1 +
>   drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 32 +++++++++++------------------
>   4 files changed, 26 insertions(+), 32 deletions(-)
> 

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime


^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: [PATCH v2 5/7] vdpa/mlx5: cache and reuse hardware resources
  2022-04-20 15:03     ` Maxime Coquelin
@ 2022-04-25 13:28       ` Xueming(Steven) Li
  2022-05-05 20:01         ` Maxime Coquelin
  0 siblings, 1 reply; 43+ messages in thread
From: Xueming(Steven) Li @ 2022-04-25 13:28 UTC (permalink / raw)
  To: Maxime Coquelin, dev; +Cc: Matan Azrad, Slava Ovsiienko

Hi Maxime,

Thanks for the suggestion, I'll send out a new version.

Regards,
Xueming Li

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Wednesday, April 20, 2022 11:03 PM
> To: Xueming(Steven) Li <xuemingl@nvidia.com>; dev@dpdk.org
> Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> Subject: Re: [PATCH v2 5/7] vdpa/mlx5: cache and reuse hardware resources
> 
> 
> 
> On 2/24/22 16:50, Xueming Li wrote:
> > During device suspend and resume, resources are not changed normally.
> > When huge resources allocated to VM, like huge memory size or lots of
> 
> "When huge resources were allocated"
> 
> > queues, time spent on release and recreate became significant.
> >
> > To speed up, this patch reuse resoruces like VM MR and VirtQ memory if
> 
> "reuses resources"
> 
> > not changed.
> >
> > Signed-off-by: Xueming Li <xuemingl@nvidia.com>
> > ---
> >   drivers/vdpa/mlx5/mlx5_vdpa.c       | 11 ++++-
> >   drivers/vdpa/mlx5/mlx5_vdpa.h       | 12 ++++-
> >   drivers/vdpa/mlx5/mlx5_vdpa_mem.c   | 27 ++++++++++-
> >   drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 73 +++++++++++++++++++++--------
> >   4 files changed, 99 insertions(+), 24 deletions(-)
> >
> 
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> 
> Thanks,
> Maxime


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 5/7] vdpa/mlx5: cache and reuse hardware resources
  2022-04-25 13:28       ` Xueming(Steven) Li
@ 2022-05-05 20:01         ` Maxime Coquelin
  0 siblings, 0 replies; 43+ messages in thread
From: Maxime Coquelin @ 2022-05-05 20:01 UTC (permalink / raw)
  To: Xueming(Steven) Li, dev; +Cc: Matan Azrad, Slava Ovsiienko

Hi Xueming,

On 4/25/22 15:28, Xueming(Steven) Li wrote:
> Hi Maxime,
> 
> Thanks for the suggestion, I'll send out a new version.

OK, if you send it early next week, it could be part of next week's PR.

Thanks,
Maxime

> Regards,
> Xueming Li
> 
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Sent: Wednesday, April 20, 2022 11:03 PM
>> To: Xueming(Steven) Li <xuemingl@nvidia.com>; dev@dpdk.org
>> Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
>> Subject: Re: [PATCH v2 5/7] vdpa/mlx5: cache and reuse hardware resources
>>
>>
>>
>> On 2/24/22 16:50, Xueming Li wrote:
>>> During device suspend and resume, resources are not changed normally.
>>> When huge resources allocated to VM, like huge memory size or lots of
>>
>> "When huge resources were allocated"
>>
>>> queues, time spent on release and recreate became significant.
>>>
>>> To speed up, this patch reuse resoruces like VM MR and VirtQ memory if
>>
>> "reuses resources"
>>
>>> not changed.
>>>
>>> Signed-off-by: Xueming Li <xuemingl@nvidia.com>
>>> ---
>>>    drivers/vdpa/mlx5/mlx5_vdpa.c       | 11 ++++-
>>>    drivers/vdpa/mlx5/mlx5_vdpa.h       | 12 ++++-
>>>    drivers/vdpa/mlx5/mlx5_vdpa_mem.c   | 27 ++++++++++-
>>>    drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 73 +++++++++++++++++++++--------
>>>    4 files changed, 99 insertions(+), 24 deletions(-)
>>>
>>
>> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>>
>> Thanks,
>> Maxime
> 


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH v3 0/7] vdpa/mlx5: improve device shutdown time
  2022-02-24 13:28 [PATCH 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
                   ` (8 preceding siblings ...)
  2022-02-24 15:50 ` [PATCH v2 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
@ 2022-05-08 14:25 ` Xueming Li
  2022-05-08 14:25   ` [PATCH v3 1/7] vdpa/mlx5: fix interrupt trash that leads to segment fault Xueming Li
                     ` (7 more replies)
  9 siblings, 8 replies; 43+ messages in thread
From: Xueming Li @ 2022-05-08 14:25 UTC (permalink / raw)
  To: dev, Maxime Coquelin; +Cc: xuemingl

v1:
 - rebase with latest upstream code
 - fix coverity issues
v2:
 - fix build issue on OS w/o flow DR API
v3:
 - commit message update, thanks Maxime!


Xueming Li (7):
  vdpa/mlx5: fix interrupt trash that leads to segment fault
  vdpa/mlx5: fix dead loop when process interrupted
  vdpa/mlx5: no kick handling during shutdown
  vdpa/mlx5: reuse resources in reconfiguration
  vdpa/mlx5: cache and reuse hardware resources
  vdpa/mlx5: support device cleanup callback
  vdpa/mlx5: make statistics counter persistent

 doc/guides/vdpadevs/mlx5.rst        |   6 +
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 231 +++++++++++++++++++++-------
 drivers/vdpa/mlx5/mlx5_vdpa.h       |  31 +++-
 drivers/vdpa/mlx5/mlx5_vdpa_event.c |  23 +--
 drivers/vdpa/mlx5/mlx5_vdpa_mem.c   |  38 +++--
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c |  30 +---
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 189 +++++++++++------------
 7 files changed, 336 insertions(+), 212 deletions(-)

-- 
2.35.1


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH v3 1/7] vdpa/mlx5: fix interrupt trash that leads to segment fault
  2022-05-08 14:25 ` [PATCH v3 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
@ 2022-05-08 14:25   ` Xueming Li
  2022-05-08 14:25   ` [PATCH v3 2/7] vdpa/mlx5: fix dead loop when process interrupted Xueming Li
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 43+ messages in thread
From: Xueming Li @ 2022-05-08 14:25 UTC (permalink / raw)
  To: dev, Maxime Coquelin; +Cc: xuemingl, matan, stable

Disable the interrupt unregister timeout to avoid an interrupt thread
segmentation fault caused by an invalid FD.
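
For illustration only, the unbounded retry can be sketched as below. The
names unregister_fn, unregister_until_done and fake_unregister are
hypothetical and not part of the driver; the function pointer stands in
for rte_intr_callback_unregister(), which the loop in this patch assumes
returns -EAGAIN while the callback is still running.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical stand-in for the unregister call: returns -EAGAIN while the
 * callback is still executing, any other value once the attempt is final. */
typedef int (*unregister_fn)(void *ctx);

/* Keep retrying for as long as the callee reports -EAGAIN, sleeping between
 * attempts; there is deliberately no retry limit. */
static int
unregister_until_done(unregister_fn fn, void *ctx, unsigned int sleep_us)
{
	int ret = -EAGAIN;

	while (ret == -EAGAIN) {
		ret = fn(ctx);
		if (ret == -EAGAIN)
			usleep(sleep_us);
	}
	return ret;
}

/* Demo callee: reports "busy" for *ctx calls, then reports one removal. */
static int
fake_unregister(void *ctx)
{
	int *busy = ctx;

	if (*busy > 0) {
		(*busy)--;
		return -EAGAIN;
	}
	return 1;
}
```

The point of dropping the limit is that the FD is never closed while the
interrupt thread may still reference the callback.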

Fixes: 62c813706e41 ("vdpa/mlx5: map doorbell")
Cc: matan@mellanox.com
Cc: stable@dpdk.org

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 3416797d289..2e517beda24 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -17,7 +17,7 @@
 
 
 static void
-mlx5_vdpa_virtq_handler(void *cb_arg)
+mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 {
 	struct mlx5_vdpa_virtq *virtq = cb_arg;
 	struct mlx5_vdpa_priv *priv = virtq->priv;
@@ -59,20 +59,16 @@ static int
 mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 {
 	unsigned int i;
-	int retries = MLX5_VDPA_INTR_RETRIES;
 	int ret = -EAGAIN;
 
-	if (rte_intr_fd_get(virtq->intr_handle) != -1) {
-		while (retries-- && ret == -EAGAIN) {
+	if (rte_intr_fd_get(virtq->intr_handle) >= 0) {
+		while (ret == -EAGAIN) {
 			ret = rte_intr_callback_unregister(virtq->intr_handle,
-							mlx5_vdpa_virtq_handler,
-							virtq);
+					mlx5_vdpa_virtq_kick_handler, virtq);
 			if (ret == -EAGAIN) {
-				DRV_LOG(DEBUG, "Try again to unregister fd %d "
-				"of virtq %d interrupt, retries = %d.",
-				rte_intr_fd_get(virtq->intr_handle),
-				(int)virtq->index, retries);
-
+				DRV_LOG(DEBUG, "Try again to unregister fd %d of virtq %hu interrupt",
+					rte_intr_fd_get(virtq->intr_handle),
+					virtq->index);
 				usleep(MLX5_VDPA_INTR_RETRIES_USEC);
 			}
 		}
@@ -359,7 +355,7 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 			goto error;
 
 		if (rte_intr_callback_register(virtq->intr_handle,
-					       mlx5_vdpa_virtq_handler,
+					       mlx5_vdpa_virtq_kick_handler,
 					       virtq)) {
 			rte_intr_fd_set(virtq->intr_handle, -1);
 			DRV_LOG(ERR, "Failed to register virtq %d interrupt.",
-- 
2.35.1


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH v3 2/7] vdpa/mlx5: fix dead loop when process interrupted
  2022-05-08 14:25 ` [PATCH v3 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
  2022-05-08 14:25   ` [PATCH v3 1/7] vdpa/mlx5: fix interrupt trash that leads to segment fault Xueming Li
@ 2022-05-08 14:25   ` Xueming Li
  2022-05-08 14:25   ` [PATCH v3 3/7] vdpa/mlx5: no kick handling during shutdown Xueming Li
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 43+ messages in thread
From: Xueming Li @ 2022-05-08 14:25 UTC (permalink / raw)
  To: dev, Maxime Coquelin; +Cc: xuemingl, stable

During Ctrl+C handling, the kick handling thread sometimes gets an endless
stream of EAGAIN errors and falls into a dead loop.

Kicks happen frequently in a real system because of busy traffic or the
retry mechanism. This patch simplifies the firmware kick path and skips
setting the hardware notifier after a potential device error; the notifier
can be set by the next successful kick request.
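
As a rough sketch of the bounded retry (the helper name drain_kick and its
max_retry parameter are made up for this example; the patch itself
hard-codes three attempts inside the kick handler):

```c
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Read one 8-byte kick counter from fd. Transient errors (EINTR,
 * EWOULDBLOCK, EAGAIN) are retried at most max_retry times in total;
 * any other error stops the loop at once.
 * Returns 0 if the counter was read, -1 otherwise. */
static int
drain_kick(int fd, int max_retry)
{
	uint64_t buf;
	ssize_t nbytes = -1;
	int retry;

	for (retry = 0; retry < max_retry; ++retry) {
		nbytes = read(fd, &buf, sizeof(buf));
		if (nbytes < 0 &&
		    (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN))
			continue;
		break;
	}
	return nbytes < 0 ? -1 : 0;
}
```

A caller would return early on -1 and leave the notifier setup to a later
kick, which is what keeps the handler from spinning during shutdown.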

Fixes: 62c813706e41 ("vdpa/mlx5: map doorbell")
Cc: stable@dpdk.org

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 2e517beda24..2696d54b412 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -23,11 +23,11 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 	struct mlx5_vdpa_priv *priv = virtq->priv;
 	uint64_t buf;
 	int nbytes;
+	int retry;
 
 	if (rte_intr_fd_get(virtq->intr_handle) < 0)
 		return;
-
-	do {
+	for (retry = 0; retry < 3; ++retry) {
 		nbytes = read(rte_intr_fd_get(virtq->intr_handle), &buf,
 			      8);
 		if (nbytes < 0) {
@@ -39,7 +39,9 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 				virtq->index, strerror(errno));
 		}
 		break;
-	} while (1);
+	}
+	if (nbytes < 0)
+		return;
 	rte_write32(virtq->index, priv->virtq_db_addr);
 	if (virtq->notifier_state == MLX5_VDPA_NOTIFIER_STATE_DISABLED) {
 		if (rte_vhost_host_notifier_ctrl(priv->vid, virtq->index, true))
-- 
2.35.1


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH v3 3/7] vdpa/mlx5: no kick handling during shutdown
  2022-05-08 14:25 ` [PATCH v3 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
  2022-05-08 14:25   ` [PATCH v3 1/7] vdpa/mlx5: fix interrupt trash that leads to segment fault Xueming Li
  2022-05-08 14:25   ` [PATCH v3 2/7] vdpa/mlx5: fix dead loop when process interrupted Xueming Li
@ 2022-05-08 14:25   ` Xueming Li
  2022-05-08 14:25   ` [PATCH v3 4/7] vdpa/mlx5: reuse resources in reconfiguration Xueming Li
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 43+ messages in thread
From: Xueming Li @ 2022-05-08 14:25 UTC (permalink / raw)
  To: dev, Maxime Coquelin; +Cc: xuemingl

When QEMU suspends a VM, the HW notifier is unmapped while a vCPU thread
may still be active and write to the notifier through the kick socket.

The PMD kick handler thread then tries to install the HW notifier through
the client socket. In that case it times out and slows down device close.

This patch skips the HW notifier installation if the VQ or the device is in
the middle of shutdown.
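
A minimal sketch of the skip check follows. The demo_* names are
hypothetical; the enum mirrors the three states added in mlx5_vdpa.h and
the predicate mirrors the condition as written in the diff, which uses a
conjunction of the device state and the queue enable flag.

```c
#include <stdbool.h>

/* Mirrors enum mlx5_dev_state from this patch, under demo names. */
enum demo_dev_state {
	DEMO_STATE_PROBED = 0,
	DEMO_STATE_CONFIGURED,
	DEMO_STATE_IN_PROGRESS /* Shutting down. */
};

/* True when the kick handler should return without touching the notifier:
 * the device is not in the configured state and the queue is disabled. */
static bool
demo_kick_should_skip(enum demo_dev_state state, bool vq_enabled)
{
	return state != DEMO_STATE_CONFIGURED && !vq_enabled;
}
```

With this predicate a disabled queue on a device that is shutting down is
skipped, while an enabled queue is still handled.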

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 17 ++++++++++-------
 drivers/vdpa/mlx5/mlx5_vdpa.h       |  8 +++++++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 12 +++++++++++-
 3 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 749c9d097cf..48f20d9ecdb 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -252,13 +252,15 @@ mlx5_vdpa_dev_close(int vid)
 	}
 	mlx5_vdpa_err_event_unset(priv);
 	mlx5_vdpa_cqe_event_unset(priv);
-	if (priv->configured)
+	if (priv->state == MLX5_VDPA_STATE_CONFIGURED) {
 		ret |= mlx5_vdpa_lm_log(priv);
+		priv->state = MLX5_VDPA_STATE_IN_PROGRESS;
+	}
 	mlx5_vdpa_steer_unset(priv);
 	mlx5_vdpa_virtqs_release(priv);
 	mlx5_vdpa_event_qp_global_release(priv);
 	mlx5_vdpa_mem_dereg(priv);
-	priv->configured = 0;
+	priv->state = MLX5_VDPA_STATE_PROBED;
 	priv->vid = 0;
 	/* The mutex may stay locked after event thread cancel - initiate it. */
 	pthread_mutex_init(&priv->vq_config_lock, NULL);
@@ -277,7 +279,8 @@ mlx5_vdpa_dev_config(int vid)
 		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
 		return -EINVAL;
 	}
-	if (priv->configured && mlx5_vdpa_dev_close(vid)) {
+	if (priv->state == MLX5_VDPA_STATE_CONFIGURED &&
+	    mlx5_vdpa_dev_close(vid)) {
 		DRV_LOG(ERR, "Failed to reconfigure vid %d.", vid);
 		return -1;
 	}
@@ -291,7 +294,7 @@ mlx5_vdpa_dev_config(int vid)
 		mlx5_vdpa_dev_close(vid);
 		return -1;
 	}
-	priv->configured = 1;
+	priv->state = MLX5_VDPA_STATE_CONFIGURED;
 	DRV_LOG(INFO, "vDPA device %d was configured.", vid);
 	return 0;
 }
@@ -373,7 +376,7 @@ mlx5_vdpa_get_stats(struct rte_vdpa_device *vdev, int qid,
 		DRV_LOG(ERR, "Invalid device: %s.", vdev->device->name);
 		return -ENODEV;
 	}
-	if (!priv->configured) {
+	if (priv->state == MLX5_VDPA_STATE_PROBED) {
 		DRV_LOG(ERR, "Device %s was not configured.",
 				vdev->device->name);
 		return -ENODATA;
@@ -401,7 +404,7 @@ mlx5_vdpa_reset_stats(struct rte_vdpa_device *vdev, int qid)
 		DRV_LOG(ERR, "Invalid device: %s.", vdev->device->name);
 		return -ENODEV;
 	}
-	if (!priv->configured) {
+	if (priv->state == MLX5_VDPA_STATE_PROBED) {
 		DRV_LOG(ERR, "Device %s was not configured.",
 				vdev->device->name);
 		return -ENODATA;
@@ -594,7 +597,7 @@ mlx5_vdpa_dev_remove(struct mlx5_common_device *cdev)
 		TAILQ_REMOVE(&priv_list, priv, next);
 	pthread_mutex_unlock(&priv_list_lock);
 	if (found) {
-		if (priv->configured)
+		if (priv->state == MLX5_VDPA_STATE_CONFIGURED)
 			mlx5_vdpa_dev_close(priv->vid);
 		if (priv->var) {
 			mlx5_glue->dv_free_var(priv->var);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 22617924eac..cc83d7cba3d 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -113,9 +113,15 @@ enum {
 	MLX5_VDPA_EVENT_MODE_ONLY_INTERRUPT
 };
 
+enum mlx5_dev_state {
+	MLX5_VDPA_STATE_PROBED = 0,
+	MLX5_VDPA_STATE_CONFIGURED,
+	MLX5_VDPA_STATE_IN_PROGRESS /* Shutting down. */
+};
+
 struct mlx5_vdpa_priv {
 	TAILQ_ENTRY(mlx5_vdpa_priv) next;
-	uint8_t configured;
+	enum mlx5_dev_state state;
 	pthread_mutex_t vq_config_lock;
 	uint64_t no_traffic_counter;
 	pthread_t timer_tid;
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 2696d54b412..4c34983da41 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -25,6 +25,11 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 	int nbytes;
 	int retry;
 
+	if (priv->state != MLX5_VDPA_STATE_CONFIGURED && !virtq->enable) {
+		DRV_LOG(ERR,  "device %d queue %d down, skip kick handling",
+			priv->vid, virtq->index);
+		return;
+	}
 	if (rte_intr_fd_get(virtq->intr_handle) < 0)
 		return;
 	for (retry = 0; retry < 3; ++retry) {
@@ -43,6 +48,11 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 	if (nbytes < 0)
 		return;
 	rte_write32(virtq->index, priv->virtq_db_addr);
+	if (priv->state != MLX5_VDPA_STATE_CONFIGURED && !virtq->enable) {
+		DRV_LOG(ERR,  "device %d queue %d down, skip kick handling",
+			priv->vid, virtq->index);
+		return;
+	}
 	if (virtq->notifier_state == MLX5_VDPA_NOTIFIER_STATE_DISABLED) {
 		if (rte_vhost_host_notifier_ctrl(priv->vid, virtq->index, true))
 			virtq->notifier_state = MLX5_VDPA_NOTIFIER_STATE_ERR;
@@ -541,7 +551,7 @@ mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable)
 
 	DRV_LOG(INFO, "Update virtq %d status %sable -> %sable.", index,
 		virtq->enable ? "en" : "dis", enable ? "en" : "dis");
-	if (!priv->configured) {
+	if (priv->state == MLX5_VDPA_STATE_PROBED) {
 		virtq->enable = !!enable;
 		return 0;
 	}
-- 
2.35.1


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH v3 4/7] vdpa/mlx5: reuse resources in reconfiguration
  2022-05-08 14:25 ` [PATCH v3 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
                     ` (2 preceding siblings ...)
  2022-05-08 14:25   ` [PATCH v3 3/7] vdpa/mlx5: no kick handling during shutdown Xueming Li
@ 2022-05-08 14:25   ` Xueming Li
  2022-05-08 14:25   ` [PATCH v3 5/7] vdpa/mlx5: cache and reuse hardware resources Xueming Li
                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 43+ messages in thread
From: Xueming Li @ 2022-05-08 14:25 UTC (permalink / raw)
  To: dev, Maxime Coquelin; +Cc: xuemingl

To speed up device resume, create reusable resources in the device probe
state and release them when the device is removed. Reused resources include
TIS, TD, VAR doorbell mmap, error handling event channel and interrupt
handler, UAR, Rx event channel, NULL MR, steer domain and table.
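
One of the probe-time steps, the VAR allocation, retries with a sleep of
100000U << retry microseconds after each failed attempt. The helper below
(backoff_total_us, a name invented for this note) only works out the
worst-case wait of such a loop; it is not driver code.

```c
#include <stdint.h>

/* Worst-case total sleep, in microseconds, of a loop that sleeps
 * base_us << retry after each failed attempt, retry = 0 .. retries - 1. */
static uint64_t
backoff_total_us(uint32_t base_us, unsigned int retries)
{
	uint64_t total = 0;
	unsigned int retry;

	for (retry = 0; retry < retries; retry++)
		total += (uint64_t)base_us << retry;
	return total;
}
```

For the values used in the patch (base 100000 us, 7 attempts) this gives
100000 * (2^7 - 1) = 12700000 us, so probe can wait up to about 12.7
seconds before giving up on the VAR.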

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 167 +++++++++++++++++++++-------
 drivers/vdpa/mlx5/mlx5_vdpa.h       |   9 ++
 drivers/vdpa/mlx5/mlx5_vdpa_event.c |  23 ++--
 drivers/vdpa/mlx5/mlx5_vdpa_mem.c   |  11 --
 drivers/vdpa/mlx5/mlx5_vdpa_steer.c |  30 +----
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c |  44 --------
 6 files changed, 149 insertions(+), 135 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 48f20d9ecdb..4408aeccfbd 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -5,6 +5,7 @@
 #include <net/if.h>
 #include <sys/socket.h>
 #include <sys/ioctl.h>
+#include <sys/mman.h>
 #include <fcntl.h>
 #include <netinet/in.h>
 
@@ -49,6 +50,8 @@ TAILQ_HEAD(mlx5_vdpa_privs, mlx5_vdpa_priv) priv_list =
 					      TAILQ_HEAD_INITIALIZER(priv_list);
 static pthread_mutex_t priv_list_lock = PTHREAD_MUTEX_INITIALIZER;
 
+static void mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv);
+
 static struct mlx5_vdpa_priv *
 mlx5_vdpa_find_priv_resource_by_vdev(struct rte_vdpa_device *vdev)
 {
@@ -250,7 +253,6 @@ mlx5_vdpa_dev_close(int vid)
 		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
 		return -1;
 	}
-	mlx5_vdpa_err_event_unset(priv);
 	mlx5_vdpa_cqe_event_unset(priv);
 	if (priv->state == MLX5_VDPA_STATE_CONFIGURED) {
 		ret |= mlx5_vdpa_lm_log(priv);
@@ -258,7 +260,6 @@ mlx5_vdpa_dev_close(int vid)
 	}
 	mlx5_vdpa_steer_unset(priv);
 	mlx5_vdpa_virtqs_release(priv);
-	mlx5_vdpa_event_qp_global_release(priv);
 	mlx5_vdpa_mem_dereg(priv);
 	priv->state = MLX5_VDPA_STATE_PROBED;
 	priv->vid = 0;
@@ -288,7 +289,7 @@ mlx5_vdpa_dev_config(int vid)
 	if (mlx5_vdpa_mtu_set(priv))
 		DRV_LOG(WARNING, "MTU cannot be set on device %s.",
 				vdev->device->name);
-	if (mlx5_vdpa_mem_register(priv) || mlx5_vdpa_err_event_setup(priv) ||
+	if (mlx5_vdpa_mem_register(priv) ||
 	    mlx5_vdpa_virtqs_prepare(priv) || mlx5_vdpa_steer_setup(priv) ||
 	    mlx5_vdpa_cqe_event_setup(priv)) {
 		mlx5_vdpa_dev_close(vid);
@@ -507,13 +508,90 @@ mlx5_vdpa_config_get(struct mlx5_kvargs_ctrl *mkvlist,
 	DRV_LOG(DEBUG, "no traffic max is %u.", priv->no_traffic_max);
 }
 
+static int
+mlx5_vdpa_create_dev_resources(struct mlx5_vdpa_priv *priv)
+{
+	struct mlx5_devx_tis_attr tis_attr = {0};
+	struct ibv_context *ctx = priv->cdev->ctx;
+	uint32_t i;
+	int retry;
+
+	for (retry = 0; retry < 7; retry++) {
+		priv->var = mlx5_glue->dv_alloc_var(ctx, 0);
+		if (priv->var != NULL)
+			break;
+		DRV_LOG(WARNING, "Failed to allocate VAR, retry %d.", retry);
+		/* Wait Qemu release VAR during vdpa restart, 0.1 sec based. */
+		usleep(100000U << retry);
+	}
+	if (!priv->var) {
+		DRV_LOG(ERR, "Failed to allocate VAR %u.", errno);
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	/* Always map the entire page. */
+	priv->virtq_db_addr = mmap(NULL, priv->var->length, PROT_READ |
+				   PROT_WRITE, MAP_SHARED, ctx->cmd_fd,
+				   priv->var->mmap_off);
+	if (priv->virtq_db_addr == MAP_FAILED) {
+		DRV_LOG(ERR, "Failed to map doorbell page %u.", errno);
+		priv->virtq_db_addr = NULL;
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	DRV_LOG(DEBUG, "VAR address of doorbell mapping is %p.",
+		priv->virtq_db_addr);
+	priv->td = mlx5_devx_cmd_create_td(ctx);
+	if (!priv->td) {
+		DRV_LOG(ERR, "Failed to create transport domain.");
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	tis_attr.transport_domain = priv->td->id;
+	for (i = 0; i < priv->num_lag_ports; i++) {
+		/* 0 is auto affinity, non-zero value to propose port. */
+		tis_attr.lag_tx_port_affinity = i + 1;
+		priv->tiss[i] = mlx5_devx_cmd_create_tis(ctx, &tis_attr);
+		if (!priv->tiss[i]) {
+			DRV_LOG(ERR, "Failed to create TIS %u.", i);
+			return -rte_errno;
+		}
+	}
+	priv->null_mr = mlx5_glue->alloc_null_mr(priv->cdev->pd);
+	if (!priv->null_mr) {
+		DRV_LOG(ERR, "Failed to allocate null MR.");
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	DRV_LOG(DEBUG, "Dump fill Mkey = %u.", priv->null_mr->lkey);
+#ifdef HAVE_MLX5DV_DR
+	priv->steer.domain = mlx5_glue->dr_create_domain(ctx,
+					MLX5DV_DR_DOMAIN_TYPE_NIC_RX);
+	if (!priv->steer.domain) {
+		DRV_LOG(ERR, "Failed to create Rx domain.");
+		rte_errno = errno;
+		return -rte_errno;
+	}
+#endif
+	priv->steer.tbl = mlx5_glue->dr_create_flow_tbl(priv->steer.domain, 0);
+	if (!priv->steer.tbl) {
+		DRV_LOG(ERR, "Failed to create table 0 with Rx domain.");
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	if (mlx5_vdpa_err_event_setup(priv) != 0)
+		return -rte_errno;
+	if (mlx5_vdpa_event_qp_global_prepare(priv))
+		return -rte_errno;
+	return 0;
+}
+
 static int
 mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 		    struct mlx5_kvargs_ctrl *mkvlist)
 {
 	struct mlx5_vdpa_priv *priv = NULL;
 	struct mlx5_hca_attr *attr = &cdev->config.hca_attr;
-	int retry;
 
 	if (!attr->vdpa.valid || !attr->vdpa.max_num_virtio_queues) {
 		DRV_LOG(ERR, "Not enough capabilities to support vdpa, maybe "
@@ -537,25 +615,10 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 	priv->num_lag_ports = attr->num_lag_ports;
 	if (attr->num_lag_ports == 0)
 		priv->num_lag_ports = 1;
+	pthread_mutex_init(&priv->vq_config_lock, NULL);
 	priv->cdev = cdev;
-	for (retry = 0; retry < 7; retry++) {
-		priv->var = mlx5_glue->dv_alloc_var(priv->cdev->ctx, 0);
-		if (priv->var != NULL)
-			break;
-		DRV_LOG(WARNING, "Failed to allocate VAR, retry %d.\n", retry);
-		/* Wait Qemu release VAR during vdpa restart, 0.1 sec based. */
-		usleep(100000U << retry);
-	}
-	if (!priv->var) {
-		DRV_LOG(ERR, "Failed to allocate VAR %u.", errno);
+	if (mlx5_vdpa_create_dev_resources(priv))
 		goto error;
-	}
-	priv->err_intr_handle =
-		rte_intr_instance_alloc(RTE_INTR_INSTANCE_F_SHARED);
-	if (priv->err_intr_handle == NULL) {
-		DRV_LOG(ERR, "Fail to allocate intr_handle");
-		goto error;
-	}
 	priv->vdev = rte_vdpa_register_device(cdev->dev, &mlx5_vdpa_ops);
 	if (priv->vdev == NULL) {
 		DRV_LOG(ERR, "Failed to register vDPA device.");
@@ -564,19 +627,13 @@ mlx5_vdpa_dev_probe(struct mlx5_common_device *cdev,
 	}
 	mlx5_vdpa_config_get(mkvlist, priv);
 	SLIST_INIT(&priv->mr_list);
-	pthread_mutex_init(&priv->vq_config_lock, NULL);
 	pthread_mutex_lock(&priv_list_lock);
 	TAILQ_INSERT_TAIL(&priv_list, priv, next);
 	pthread_mutex_unlock(&priv_list_lock);
 	return 0;
-
 error:
-	if (priv) {
-		if (priv->var)
-			mlx5_glue->dv_free_var(priv->var);
-		rte_intr_instance_free(priv->err_intr_handle);
-		rte_free(priv);
-	}
+	if (priv)
+		mlx5_vdpa_dev_release(priv);
 	return -rte_errno;
 }
 
@@ -596,22 +653,48 @@ mlx5_vdpa_dev_remove(struct mlx5_common_device *cdev)
 	if (found)
 		TAILQ_REMOVE(&priv_list, priv, next);
 	pthread_mutex_unlock(&priv_list_lock);
-	if (found) {
-		if (priv->state == MLX5_VDPA_STATE_CONFIGURED)
-			mlx5_vdpa_dev_close(priv->vid);
-		if (priv->var) {
-			mlx5_glue->dv_free_var(priv->var);
-			priv->var = NULL;
-		}
-		if (priv->vdev)
-			rte_vdpa_unregister_device(priv->vdev);
-		pthread_mutex_destroy(&priv->vq_config_lock);
-		rte_intr_instance_free(priv->err_intr_handle);
-		rte_free(priv);
-	}
+	if (found)
+		mlx5_vdpa_dev_release(priv);
 	return 0;
 }
 
+static void
+mlx5_vdpa_release_dev_resources(struct mlx5_vdpa_priv *priv)
+{
+	uint32_t i;
+
+	mlx5_vdpa_event_qp_global_release(priv);
+	mlx5_vdpa_err_event_unset(priv);
+	if (priv->steer.tbl)
+		claim_zero(mlx5_glue->dr_destroy_flow_tbl(priv->steer.tbl));
+	if (priv->steer.domain)
+		claim_zero(mlx5_glue->dr_destroy_domain(priv->steer.domain));
+	if (priv->null_mr)
+		claim_zero(mlx5_glue->dereg_mr(priv->null_mr));
+	for (i = 0; i < priv->num_lag_ports; i++) {
+		if (priv->tiss[i])
+			claim_zero(mlx5_devx_cmd_destroy(priv->tiss[i]));
+	}
+	if (priv->td)
+		claim_zero(mlx5_devx_cmd_destroy(priv->td));
+	if (priv->virtq_db_addr)
+		claim_zero(munmap(priv->virtq_db_addr, priv->var->length));
+	if (priv->var)
+		mlx5_glue->dv_free_var(priv->var);
+}
+
+static void
+mlx5_vdpa_dev_release(struct mlx5_vdpa_priv *priv)
+{
+	if (priv->state == MLX5_VDPA_STATE_CONFIGURED)
+		mlx5_vdpa_dev_close(priv->vid);
+	mlx5_vdpa_release_dev_resources(priv);
+	if (priv->vdev)
+		rte_vdpa_unregister_device(priv->vdev);
+	pthread_mutex_destroy(&priv->vq_config_lock);
+	rte_free(priv);
+}
+
 static const struct rte_pci_id mlx5_vdpa_pci_id_map[] = {
 	{
 		RTE_PCI_DEVICE(PCI_VENDOR_ID_MELLANOX,
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index cc83d7cba3d..e0ba20b953c 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -233,6 +233,15 @@ int mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
  */
 void mlx5_vdpa_event_qp_destroy(struct mlx5_vdpa_event_qp *eqp);
 
+/**
+ * Create all the event global resources.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ */
+int
+mlx5_vdpa_event_qp_global_prepare(struct mlx5_vdpa_priv *priv);
+
 /**
  * Release all the event global resources.
  *
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index f8d910b33f8..7167a98db0f 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -40,11 +40,9 @@ mlx5_vdpa_event_qp_global_release(struct mlx5_vdpa_priv *priv)
 }
 
 /* Prepare all the global resources for all the event objects.*/
-static int
+int
 mlx5_vdpa_event_qp_global_prepare(struct mlx5_vdpa_priv *priv)
 {
-	if (priv->eventc)
-		return 0;
 	priv->eventc = mlx5_os_devx_create_event_channel(priv->cdev->ctx,
 			   MLX5DV_DEVX_CREATE_EVENT_CHANNEL_FLAGS_OMIT_EV_DATA);
 	if (!priv->eventc) {
@@ -389,22 +387,30 @@ mlx5_vdpa_err_event_setup(struct mlx5_vdpa_priv *priv)
 	flags = fcntl(priv->err_chnl->fd, F_GETFL);
 	ret = fcntl(priv->err_chnl->fd, F_SETFL, flags | O_NONBLOCK);
 	if (ret) {
+		rte_errno = errno;
 		DRV_LOG(ERR, "Failed to change device event channel FD.");
 		goto error;
 	}
-
+	priv->err_intr_handle =
+		rte_intr_instance_alloc(RTE_INTR_INSTANCE_F_SHARED);
+	if (priv->err_intr_handle == NULL) {
+		DRV_LOG(ERR, "Fail to allocate intr_handle");
+		goto error;
+	}
 	if (rte_intr_fd_set(priv->err_intr_handle, priv->err_chnl->fd))
 		goto error;
 
 	if (rte_intr_type_set(priv->err_intr_handle, RTE_INTR_HANDLE_EXT))
 		goto error;
 
-	if (rte_intr_callback_register(priv->err_intr_handle,
-				       mlx5_vdpa_err_interrupt_handler,
-				       priv)) {
+	ret = rte_intr_callback_register(priv->err_intr_handle,
+					 mlx5_vdpa_err_interrupt_handler,
+					 priv);
+	if (ret != 0) {
 		rte_intr_fd_set(priv->err_intr_handle, 0);
 		DRV_LOG(ERR, "Failed to register error interrupt for device %d.",
 			priv->vid);
+		rte_errno = -ret;
 		goto error;
 	} else {
 		DRV_LOG(DEBUG, "Registered error interrupt for device%d.",
@@ -453,6 +459,7 @@ mlx5_vdpa_err_event_unset(struct mlx5_vdpa_priv *priv)
 		mlx5_glue->devx_destroy_event_channel(priv->err_chnl);
 		priv->err_chnl = NULL;
 	}
+	rte_intr_instance_free(priv->err_intr_handle);
 }
 
 int
@@ -575,8 +582,6 @@ mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, uint16_t desc_n,
 	uint16_t log_desc_n = rte_log2_u32(desc_n);
 	uint32_t ret;
 
-	if (mlx5_vdpa_event_qp_global_prepare(priv))
-		return -1;
 	if (mlx5_vdpa_cq_create(priv, log_desc_n, callfd, &eqp->cq))
 		return -1;
 	attr.pd = priv->cdev->pdn;
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_mem.c b/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
index 599079500b0..62f5530e91d 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
@@ -34,10 +34,6 @@ mlx5_vdpa_mem_dereg(struct mlx5_vdpa_priv *priv)
 	SLIST_INIT(&priv->mr_list);
 	if (priv->lm_mr.addr)
 		mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
-	if (priv->null_mr) {
-		claim_zero(mlx5_glue->dereg_mr(priv->null_mr));
-		priv->null_mr = NULL;
-	}
 	if (priv->vmem) {
 		free(priv->vmem);
 		priv->vmem = NULL;
@@ -196,13 +192,6 @@ mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
 	if (!mem)
 		return -rte_errno;
 	priv->vmem = mem;
-	priv->null_mr = mlx5_glue->alloc_null_mr(priv->cdev->pd);
-	if (!priv->null_mr) {
-		DRV_LOG(ERR, "Failed to allocate null MR.");
-		ret = -errno;
-		goto error;
-	}
-	DRV_LOG(DEBUG, "Dump fill Mkey = %u.", priv->null_mr->lkey);
 	for (i = 0; i < mem->nregions; i++) {
 		reg = &mem->regions[i];
 		entry = rte_zmalloc(__func__, sizeof(*entry), 0);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
index a0fd2776e57..d4b4375c886 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_steer.c
@@ -45,14 +45,6 @@ void
 mlx5_vdpa_steer_unset(struct mlx5_vdpa_priv *priv)
 {
 	mlx5_vdpa_rss_flows_destroy(priv);
-	if (priv->steer.tbl) {
-		claim_zero(mlx5_glue->dr_destroy_flow_tbl(priv->steer.tbl));
-		priv->steer.tbl = NULL;
-	}
-	if (priv->steer.domain) {
-		claim_zero(mlx5_glue->dr_destroy_domain(priv->steer.domain));
-		priv->steer.domain = NULL;
-	}
 	if (priv->steer.rqt) {
 		claim_zero(mlx5_devx_cmd_destroy(priv->steer.rqt));
 		priv->steer.rqt = NULL;
@@ -248,11 +240,7 @@ mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv)
 	int ret = mlx5_vdpa_rqt_prepare(priv);
 
 	if (ret == 0) {
-		mlx5_vdpa_rss_flows_destroy(priv);
-		if (priv->steer.rqt) {
-			claim_zero(mlx5_devx_cmd_destroy(priv->steer.rqt));
-			priv->steer.rqt = NULL;
-		}
+		mlx5_vdpa_steer_unset(priv);
 	} else if (ret < 0) {
 		return ret;
 	} else if (!priv->steer.rss[0].flow) {
@@ -268,26 +256,10 @@ mlx5_vdpa_steer_update(struct mlx5_vdpa_priv *priv)
 int
 mlx5_vdpa_steer_setup(struct mlx5_vdpa_priv *priv)
 {
-#ifdef HAVE_MLX5DV_DR
-	priv->steer.domain = mlx5_glue->dr_create_domain(priv->cdev->ctx,
-						  MLX5DV_DR_DOMAIN_TYPE_NIC_RX);
-	if (!priv->steer.domain) {
-		DRV_LOG(ERR, "Failed to create Rx domain.");
-		goto error;
-	}
-	priv->steer.tbl = mlx5_glue->dr_create_flow_tbl(priv->steer.domain, 0);
-	if (!priv->steer.tbl) {
-		DRV_LOG(ERR, "Failed to create table 0 with Rx domain.");
-		goto error;
-	}
 	if (mlx5_vdpa_steer_update(priv))
 		goto error;
 	return 0;
 error:
 	mlx5_vdpa_steer_unset(priv);
 	return -1;
-#else
-	(void)priv;
-	return -ENOTSUP;
-#endif /* HAVE_MLX5DV_DR */
 }
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 4c34983da41..5ab63930ce8 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -3,7 +3,6 @@
  */
 #include <string.h>
 #include <unistd.h>
-#include <sys/mman.h>
 #include <sys/eventfd.h>
 
 #include <rte_malloc.h>
@@ -120,20 +119,6 @@ mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv)
 		if (virtq->counters)
 			claim_zero(mlx5_devx_cmd_destroy(virtq->counters));
 	}
-	for (i = 0; i < priv->num_lag_ports; i++) {
-		if (priv->tiss[i]) {
-			claim_zero(mlx5_devx_cmd_destroy(priv->tiss[i]));
-			priv->tiss[i] = NULL;
-		}
-	}
-	if (priv->td) {
-		claim_zero(mlx5_devx_cmd_destroy(priv->td));
-		priv->td = NULL;
-	}
-	if (priv->virtq_db_addr) {
-		claim_zero(munmap(priv->virtq_db_addr, priv->var->length));
-		priv->virtq_db_addr = NULL;
-	}
 	priv->features = 0;
 	memset(priv->virtqs, 0, sizeof(*virtq) * priv->nr_virtqs);
 	priv->nr_virtqs = 0;
@@ -462,8 +447,6 @@ mlx5_vdpa_features_validate(struct mlx5_vdpa_priv *priv)
 int
 mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 {
-	struct mlx5_devx_tis_attr tis_attr = {0};
-	struct ibv_context *ctx = priv->cdev->ctx;
 	uint32_t i;
 	uint16_t nr_vring = rte_vhost_get_vring_num(priv->vid);
 	int ret = rte_vhost_get_negotiated_features(priv->vid, &priv->features);
@@ -485,33 +468,6 @@ mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv)
 			(int)nr_vring);
 		return -1;
 	}
-	/* Always map the entire page. */
-	priv->virtq_db_addr = mmap(NULL, priv->var->length, PROT_READ |
-				   PROT_WRITE, MAP_SHARED, ctx->cmd_fd,
-				   priv->var->mmap_off);
-	if (priv->virtq_db_addr == MAP_FAILED) {
-		DRV_LOG(ERR, "Failed to map doorbell page %u.", errno);
-		priv->virtq_db_addr = NULL;
-		goto error;
-	} else {
-		DRV_LOG(DEBUG, "VAR address of doorbell mapping is %p.",
-			priv->virtq_db_addr);
-	}
-	priv->td = mlx5_devx_cmd_create_td(ctx);
-	if (!priv->td) {
-		DRV_LOG(ERR, "Failed to create transport domain.");
-		return -rte_errno;
-	}
-	tis_attr.transport_domain = priv->td->id;
-	for (i = 0; i < priv->num_lag_ports; i++) {
-		/* 0 is auto affinity, non-zero value to propose port. */
-		tis_attr.lag_tx_port_affinity = i + 1;
-		priv->tiss[i] = mlx5_devx_cmd_create_tis(ctx, &tis_attr);
-		if (!priv->tiss[i]) {
-			DRV_LOG(ERR, "Failed to create TIS %u.", i);
-			goto error;
-		}
-	}
 	priv->nr_virtqs = nr_vring;
 	for (i = 0; i < nr_vring; i++)
 		if (priv->virtqs[i].enable && mlx5_vdpa_virtq_setup(priv, i))
-- 
2.35.1



* [PATCH v3 5/7] vdpa/mlx5: cache and reuse hardware resources
  2022-05-08 14:25 ` [PATCH v3 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
                     ` (3 preceding siblings ...)
  2022-05-08 14:25   ` [PATCH v3 4/7] vdpa/mlx5: reuse resources in reconfiguration Xueming Li
@ 2022-05-08 14:25   ` Xueming Li
  2022-05-08 14:25   ` [PATCH v3 6/7] vdpa/mlx5: support device cleanup callback Xueming Li
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 43+ messages in thread
From: Xueming Li @ 2022-05-08 14:25 UTC (permalink / raw)
  To: dev, Maxime Coquelin; +Cc: xuemingl

During device suspend and resume, resources normally do not change.
When a VM has been allocated large resources, such as a large memory
size or many queues, the time spent releasing and recreating them
becomes significant.

To speed this up, this patch reuses resources such as the VM MR and
virtq memory when they have not changed.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 11 ++++-
 drivers/vdpa/mlx5/mlx5_vdpa.h       | 12 ++++-
 drivers/vdpa/mlx5/mlx5_vdpa_mem.c   | 27 ++++++++++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 73 +++++++++++++++++++++--------
 4 files changed, 99 insertions(+), 24 deletions(-)
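
Reader's note, not part of the patch: the reuse decision in
mlx5_vdpa_mem_register() can be sketched in isolation as below. This is
only an illustration under stated assumptions. struct region,
struct mem_table, struct mem_cache, MAX_REGIONS and mem_cache_update()
are hypothetical stand-ins, not DPDK types or functions, and the
"registrations" counter merely simulates the expensive MR
re-registration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REGIONS 4 /* Hypothetical bound, for the sketch only. */

/* Hypothetical stand-in for one guest memory region. */
struct region {
	uint64_t guest_phys_addr;
	uint64_t size;
};

/* Hypothetical stand-in for the vhost memory table. */
struct mem_table {
	uint32_t nregions;
	struct region regions[MAX_REGIONS];
};

/* Cached layout plus a counter simulating costly re-registrations. */
struct mem_cache {
	int valid;
	struct mem_table table;
	unsigned int registrations;
};

/* Return 0 when both tables describe the same layout, -1 otherwise. */
static int
mem_table_cmp(const struct mem_table *a, const struct mem_table *b)
{
	uint32_t i;

	assert(a != NULL && b != NULL);
	assert(a->nregions <= MAX_REGIONS && b->nregions <= MAX_REGIONS);
	if (a->nregions != b->nregions)
		return -1;
	for (i = 0; i < a->nregions; i++) {
		if (a->regions[i].guest_phys_addr !=
		    b->regions[i].guest_phys_addr)
			return -1;
		if (a->regions[i].size != b->regions[i].size)
			return -1;
	}
	return 0;
}

/* Return 1 if the cached layout was reused, 0 if it was replaced. */
static int
mem_cache_update(struct mem_cache *cache, const struct mem_table *fresh)
{
	if (cache->valid && mem_table_cmp(fresh, &cache->table) == 0)
		return 1; /* Layout unchanged: keep cached registrations. */
	cache->table = *fresh; /* Layout changed: replace the cache. */
	cache->valid = 1;
	cache->registrations++;
	return 0;
}
```

As in the patch, only the region count, guest physical addresses and
sizes are compared, so a layout that keeps those values is treated as
unchanged.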

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 4408aeccfbd..fb5d9276621 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -241,6 +241,13 @@ mlx5_vdpa_mtu_set(struct mlx5_vdpa_priv *priv)
 	return kern_mtu == vhost_mtu ? 0 : -1;
 }
 
+static void
+mlx5_vdpa_dev_cache_clean(struct mlx5_vdpa_priv *priv)
+{
+	mlx5_vdpa_virtqs_cleanup(priv);
+	mlx5_vdpa_mem_dereg(priv);
+}
+
 static int
 mlx5_vdpa_dev_close(int vid)
 {
@@ -260,7 +267,8 @@ mlx5_vdpa_dev_close(int vid)
 	}
 	mlx5_vdpa_steer_unset(priv);
 	mlx5_vdpa_virtqs_release(priv);
-	mlx5_vdpa_mem_dereg(priv);
+	if (priv->lm_mr.addr)
+		mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
 	priv->state = MLX5_VDPA_STATE_PROBED;
 	priv->vid = 0;
 	/* The mutex may stay locked after event thread cancel - initiate it. */
@@ -663,6 +671,7 @@ mlx5_vdpa_release_dev_resources(struct mlx5_vdpa_priv *priv)
 {
 	uint32_t i;
 
+	mlx5_vdpa_dev_cache_clean(priv);
 	mlx5_vdpa_event_qp_global_release(priv);
 	mlx5_vdpa_err_event_unset(priv);
 	if (priv->steer.tbl)
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index e0ba20b953c..540bf87a352 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -289,13 +289,21 @@ int mlx5_vdpa_err_event_setup(struct mlx5_vdpa_priv *priv);
 void mlx5_vdpa_err_event_unset(struct mlx5_vdpa_priv *priv);
 
 /**
- * Release a virtq and all its related resources.
+ * Release virtqs and their resources, except those to be reused.
  *
  * @param[in] priv
  *   The vdpa driver private structure.
  */
 void mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv);
 
+/**
+ * Cleanup cached resources of all virtqs.
+ *
+ * @param[in] priv
+ *   The vdpa driver private structure.
+ */
+void mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv);
+
 /**
  * Create all the HW virtqs resources and all their related resources.
  *
@@ -323,7 +331,7 @@ int mlx5_vdpa_virtqs_prepare(struct mlx5_vdpa_priv *priv);
 int mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable);
 
 /**
- * Unset steering and release all its related resources- stop traffic.
+ * Unset steering - stop traffic.
  *
  * @param[in] priv
  *   The vdpa driver private structure.
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_mem.c b/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
index 62f5530e91d..d6e3dd664b5 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_mem.c
@@ -32,8 +32,6 @@ mlx5_vdpa_mem_dereg(struct mlx5_vdpa_priv *priv)
 		entry = next;
 	}
 	SLIST_INIT(&priv->mr_list);
-	if (priv->lm_mr.addr)
-		mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
 	if (priv->vmem) {
 		free(priv->vmem);
 		priv->vmem = NULL;
@@ -149,6 +147,23 @@ mlx5_vdpa_vhost_mem_regions_prepare(int vid, uint8_t *mode, uint64_t *mem_size,
 	return mem;
 }
 
+static int
+mlx5_vdpa_mem_cmp(struct rte_vhost_memory *mem1, struct rte_vhost_memory *mem2)
+{
+	uint32_t i;
+
+	if (mem1->nregions != mem2->nregions)
+		return -1;
+	for (i = 0; i < mem1->nregions; i++) {
+		if (mem1->regions[i].guest_phys_addr !=
+		    mem2->regions[i].guest_phys_addr)
+			return -1;
+		if (mem1->regions[i].size != mem2->regions[i].size)
+			return -1;
+	}
+	return 0;
+}
+
 #define KLM_SIZE_MAX_ALIGN(sz) ((sz) > MLX5_MAX_KLM_BYTE_COUNT ? \
 				MLX5_MAX_KLM_BYTE_COUNT : (sz))
 
@@ -191,6 +206,14 @@ mlx5_vdpa_mem_register(struct mlx5_vdpa_priv *priv)
 
 	if (!mem)
 		return -rte_errno;
+	if (priv->vmem != NULL) {
+		if (mlx5_vdpa_mem_cmp(mem, priv->vmem) == 0) {
+			/* VM memory not changed, reuse resources. */
+			free(mem);
+			return 0;
+		}
+		mlx5_vdpa_mem_dereg(priv);
+	}
 	priv->vmem = mem;
 	for (i = 0; i < mem->nregions; i++) {
 		reg = &mem->regions[i];
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 5ab63930ce8..0dfeb8fce24 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -66,10 +66,33 @@ mlx5_vdpa_virtq_kick_handler(void *cb_arg)
 	DRV_LOG(DEBUG, "Ring virtq %u doorbell.", virtq->index);
 }
 
+/* Release cached VQ resources. */
+void
+mlx5_vdpa_virtqs_cleanup(struct mlx5_vdpa_priv *priv)
+{
+	unsigned int i, j;
+
+	for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
+		struct mlx5_vdpa_virtq *virtq = &priv->virtqs[i];
+
+		for (j = 0; j < RTE_DIM(virtq->umems); ++j) {
+			if (virtq->umems[j].obj) {
+				claim_zero(mlx5_glue->devx_umem_dereg
+							(virtq->umems[j].obj));
+				virtq->umems[j].obj = NULL;
+			}
+			if (virtq->umems[j].buf) {
+				rte_free(virtq->umems[j].buf);
+				virtq->umems[j].buf = NULL;
+			}
+			virtq->umems[j].size = 0;
+		}
+	}
+}
+
 static int
 mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 {
-	unsigned int i;
 	int ret = -EAGAIN;
 
 	if (rte_intr_fd_get(virtq->intr_handle) >= 0) {
@@ -94,13 +117,6 @@ mlx5_vdpa_virtq_unset(struct mlx5_vdpa_virtq *virtq)
 		claim_zero(mlx5_devx_cmd_destroy(virtq->virtq));
 	}
 	virtq->virtq = NULL;
-	for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
-		if (virtq->umems[i].obj)
-			claim_zero(mlx5_glue->devx_umem_dereg
-							 (virtq->umems[i].obj));
-		rte_free(virtq->umems[i].buf);
-	}
-	memset(&virtq->umems, 0, sizeof(virtq->umems));
 	if (virtq->eqp.fw_qp)
 		mlx5_vdpa_event_qp_destroy(&virtq->eqp);
 	virtq->notifier_state = MLX5_VDPA_NOTIFIER_STATE_DISABLED;
@@ -120,7 +136,6 @@ mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv)
 			claim_zero(mlx5_devx_cmd_destroy(virtq->counters));
 	}
 	priv->features = 0;
-	memset(priv->virtqs, 0, sizeof(*virtq) * priv->nr_virtqs);
 	priv->nr_virtqs = 0;
 }
 
@@ -215,6 +230,8 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 	ret = rte_vhost_get_vhost_vring(priv->vid, index, &vq);
 	if (ret)
 		return -1;
+	if (vq.size == 0)
+		return 0;
 	virtq->index = index;
 	virtq->vq_size = vq.size;
 	attr.tso_ipv4 = !!(priv->features & (1ULL << VIRTIO_NET_F_HOST_TSO4));
@@ -259,24 +276,42 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 	}
 	/* Setup 3 UMEMs for each virtq. */
 	for (i = 0; i < RTE_DIM(virtq->umems); ++i) {
-		virtq->umems[i].size = priv->caps.umems[i].a * vq.size +
-							  priv->caps.umems[i].b;
-		virtq->umems[i].buf = rte_zmalloc(__func__,
-						  virtq->umems[i].size, 4096);
-		if (!virtq->umems[i].buf) {
+		uint32_t size;
+		void *buf;
+		struct mlx5dv_devx_umem *obj;
+
+		size = priv->caps.umems[i].a * vq.size + priv->caps.umems[i].b;
+		if (virtq->umems[i].size == size &&
+		    virtq->umems[i].obj != NULL) {
+			/* Reuse registered memory. */
+			memset(virtq->umems[i].buf, 0, size);
+			goto reuse;
+		}
+		if (virtq->umems[i].obj)
+			claim_zero(mlx5_glue->devx_umem_dereg
+				   (virtq->umems[i].obj));
+		if (virtq->umems[i].buf)
+			rte_free(virtq->umems[i].buf);
+		virtq->umems[i].size = 0;
+		virtq->umems[i].obj = NULL;
+		virtq->umems[i].buf = NULL;
+		buf = rte_zmalloc(__func__, size, 4096);
+		if (buf == NULL) {
 			DRV_LOG(ERR, "Cannot allocate umem %d memory for virtq"
 				" %u.", i, index);
 			goto error;
 		}
-		virtq->umems[i].obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx,
-							virtq->umems[i].buf,
-							virtq->umems[i].size,
-							IBV_ACCESS_LOCAL_WRITE);
-		if (!virtq->umems[i].obj) {
+		obj = mlx5_glue->devx_umem_reg(priv->cdev->ctx, buf, size,
+					       IBV_ACCESS_LOCAL_WRITE);
+		if (obj == NULL) {
 			DRV_LOG(ERR, "Failed to register umem %d for virtq %u.",
 				i, index);
 			goto error;
 		}
+		virtq->umems[i].size = size;
+		virtq->umems[i].buf = buf;
+		virtq->umems[i].obj = obj;
+reuse:
 		attr.umems[i].id = virtq->umems[i].obj->umem_id;
 		attr.umems[i].offset = 0;
 		attr.umems[i].size = virtq->umems[i].size;
-- 
2.35.1



* [PATCH v3 6/7] vdpa/mlx5: support device cleanup callback
  2022-05-08 14:25 ` [PATCH v3 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
                     ` (4 preceding siblings ...)
  2022-05-08 14:25   ` [PATCH v3 5/7] vdpa/mlx5: cache and reuse hardware resources Xueming Li
@ 2022-05-08 14:25   ` Xueming Li
  2022-05-08 14:25   ` [PATCH v3 7/7] vdpa/mlx5: make statistics counter persistent Xueming Li
  2022-05-09 19:38   ` [PATCH v3 0/7] vdpa/mlx5: improve device shutdown time Maxime Coquelin
  7 siblings, 0 replies; 43+ messages in thread
From: Xueming Li @ 2022-05-08 14:25 UTC (permalink / raw)
  To: dev, Maxime Coquelin; +Cc: xuemingl

This patch implements the device cleanup callback API, which is called
when the device is disconnected from the VM. Cached resources such as
the VM MR and VQ memory are released at that point.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c | 23 +++++++++++++++++++++++
 drivers/vdpa/mlx5/mlx5_vdpa.h |  1 +
 2 files changed, 24 insertions(+)
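
Reader's note, not part of the patch: the interplay between the
"connected" flag and the device state can be modelled with the small
sketch below. It is a simplified model under assumptions. struct dev,
the DEV_* states and the dev_*() helpers are hypothetical names, the
"cache_held" flag stands in for the cached VM MR and VQ memory, and the
model assumes that a successful configuration moves the device out of
the probed state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical two-state model of the driver state. */
enum dev_state { DEV_PROBED, DEV_CONFIGURED };

struct dev {
	enum dev_state state;
	bool connected;  /* A VM is attached to the device. */
	bool cache_held; /* Cached resources are still allocated. */
};

/* Drop the cached resources. */
static void
dev_cache_clean(struct dev *d)
{
	d->cache_held = false;
}

/* Configure the device for a connected VM; resources become cached. */
static void
dev_config(struct dev *d)
{
	assert(d != NULL);
	d->connected = true;
	d->state = DEV_CONFIGURED;
	d->cache_held = true;
}

/* Close: keep the cache while a VM is still connected. */
static void
dev_close(struct dev *d)
{
	d->state = DEV_PROBED;
	if (!d->connected)
		dev_cache_clean(d);
}

/* Cleanup on disconnect: clean now if already closed, else defer. */
static void
dev_cleanup(struct dev *d)
{
	if (d->state == DEV_PROBED)
		dev_cache_clean(d);
	d->connected = false;
}
```

Either ordering of close and cleanup ends with the cache released: if
cleanup arrives first, the release is deferred to the following close.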

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index fb5d9276621..b1d5487080d 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -270,6 +270,8 @@ mlx5_vdpa_dev_close(int vid)
 	if (priv->lm_mr.addr)
 		mlx5_os_wrapped_mkey_destroy(&priv->lm_mr);
 	priv->state = MLX5_VDPA_STATE_PROBED;
+	if (!priv->connected)
+		mlx5_vdpa_dev_cache_clean(priv);
 	priv->vid = 0;
 	/* The mutex may stay locked after event thread cancel - initiate it. */
 	pthread_mutex_init(&priv->vq_config_lock, NULL);
@@ -294,6 +296,7 @@ mlx5_vdpa_dev_config(int vid)
 		return -1;
 	}
 	priv->vid = vid;
+	priv->connected = true;
 	if (mlx5_vdpa_mtu_set(priv))
 		DRV_LOG(WARNING, "MTU cannot be set on device %s.",
 				vdev->device->name);
@@ -431,12 +434,32 @@ mlx5_vdpa_reset_stats(struct rte_vdpa_device *vdev, int qid)
 	return mlx5_vdpa_virtq_stats_reset(priv, qid);
 }
 
+static int
+mlx5_vdpa_dev_cleanup(int vid)
+{
+	struct rte_vdpa_device *vdev = rte_vhost_get_vdpa_device(vid);
+	struct mlx5_vdpa_priv *priv;
+
+	if (vdev == NULL)
+		return -1;
+	priv = mlx5_vdpa_find_priv_resource_by_vdev(vdev);
+	if (priv == NULL) {
+		DRV_LOG(ERR, "Invalid vDPA device: %s.", vdev->device->name);
+		return -1;
+	}
+	if (priv->state == MLX5_VDPA_STATE_PROBED)
+		mlx5_vdpa_dev_cache_clean(priv);
+	priv->connected = false;
+	return 0;
+}
+
 static struct rte_vdpa_dev_ops mlx5_vdpa_ops = {
 	.get_queue_num = mlx5_vdpa_get_queue_num,
 	.get_features = mlx5_vdpa_get_vdpa_features,
 	.get_protocol_features = mlx5_vdpa_get_protocol_features,
 	.dev_conf = mlx5_vdpa_dev_config,
 	.dev_close = mlx5_vdpa_dev_close,
+	.dev_cleanup = mlx5_vdpa_dev_cleanup,
 	.set_vring_state = mlx5_vdpa_set_vring_state,
 	.set_features = mlx5_vdpa_features_set,
 	.migration_done = NULL,
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 540bf87a352..24bafe85b44 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -121,6 +121,7 @@ enum mlx5_dev_state {
 
 struct mlx5_vdpa_priv {
 	TAILQ_ENTRY(mlx5_vdpa_priv) next;
+	bool connected;
 	enum mlx5_dev_state state;
 	pthread_mutex_t vq_config_lock;
 	uint64_t no_traffic_counter;
-- 
2.35.1



* [PATCH v3 7/7] vdpa/mlx5: make statistics counter persistent
  2022-05-08 14:25 ` [PATCH v3 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
                     ` (5 preceding siblings ...)
  2022-05-08 14:25   ` [PATCH v3 6/7] vdpa/mlx5: support device cleanup callback Xueming Li
@ 2022-05-08 14:25   ` Xueming Li
  2022-05-09 19:38   ` [PATCH v3 0/7] vdpa/mlx5: improve device shutdown time Maxime Coquelin
  7 siblings, 0 replies; 43+ messages in thread
From: Xueming Li @ 2022-05-08 14:25 UTC (permalink / raw)
  To: dev, Maxime Coquelin; +Cc: xuemingl

To speed up device suspend and resume, make the statistics counters
persist across reconfiguration until the device is removed.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 doc/guides/vdpadevs/mlx5.rst        |  6 ++++++
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 19 +++++++----------
 drivers/vdpa/mlx5/mlx5_vdpa.h       |  1 +
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 32 +++++++++++------------------
 4 files changed, 26 insertions(+), 32 deletions(-)
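
Reader's note, not part of the patch: with persistent counters, the
value reported to the user is the raw counter minus the snapshot taken
at the last reset, as the stats_get hunk below computes. A minimal
sketch of that arithmetic, using hypothetical names (struct counters,
struct vq_stats, vq_received(), vq_stats_reset()) and a plain struct in
place of the hardware counter object:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the per-virtq counters. */
struct counters {
	uint64_t received_desc;
	uint64_t completed_desc;
};

struct vq_stats {
	struct counters hw;    /* Simulated free-running raw counters. */
	struct counters reset; /* Snapshot taken at the last reset. */
};

/* Reported value: raw counter relative to the reset baseline. */
static uint64_t
vq_received(const struct vq_stats *s)
{
	assert(s != NULL);
	return s->hw.received_desc - s->reset.received_desc;
}

/* Reset: record the current raw values as the new baseline. */
static void
vq_stats_reset(struct vq_stats *s)
{
	assert(s != NULL);
	s->reset = s->hw;
}
```

Because the raw counter is never cleared in this scheme, a reset costs
one snapshot and the reported values restart from zero.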

diff --git a/doc/guides/vdpadevs/mlx5.rst b/doc/guides/vdpadevs/mlx5.rst
index acb791032ad..3ded142311e 100644
--- a/doc/guides/vdpadevs/mlx5.rst
+++ b/doc/guides/vdpadevs/mlx5.rst
@@ -109,3 +109,9 @@ Upon potential hardware errors, mlx5 PMD try to recover, give up if failed 3
 times in 3 seconds, virtq will be put in disable state. User should check log
 to get error information, or query vdpa statistics counter to know error type
 and count report.
+
+Statistics
+^^^^^^^^^^
+
+The device statistics counters persist across reconfiguration until the device
+is removed. Users can reset the counters by calling rte_vdpa_reset_stats().
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index b1d5487080d..76fa5d4299e 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -388,12 +388,7 @@ mlx5_vdpa_get_stats(struct rte_vdpa_device *vdev, int qid,
 		DRV_LOG(ERR, "Invalid device: %s.", vdev->device->name);
 		return -ENODEV;
 	}
-	if (priv->state == MLX5_VDPA_STATE_PROBED) {
-		DRV_LOG(ERR, "Device %s was not configured.",
-				vdev->device->name);
-		return -ENODATA;
-	}
-	if (qid >= (int)priv->nr_virtqs) {
+	if (qid >= (int)priv->caps.max_num_virtio_queues * 2) {
 		DRV_LOG(ERR, "Too big vring id: %d for device %s.", qid,
 				vdev->device->name);
 		return -E2BIG;
@@ -416,12 +411,7 @@ mlx5_vdpa_reset_stats(struct rte_vdpa_device *vdev, int qid)
 		DRV_LOG(ERR, "Invalid device: %s.", vdev->device->name);
 		return -ENODEV;
 	}
-	if (priv->state == MLX5_VDPA_STATE_PROBED) {
-		DRV_LOG(ERR, "Device %s was not configured.",
-				vdev->device->name);
-		return -ENODATA;
-	}
-	if (qid >= (int)priv->nr_virtqs) {
+	if (qid >= (int)priv->caps.max_num_virtio_queues * 2) {
 		DRV_LOG(ERR, "Too big vring id: %d for device %s.", qid,
 				vdev->device->name);
 		return -E2BIG;
@@ -695,6 +685,11 @@ mlx5_vdpa_release_dev_resources(struct mlx5_vdpa_priv *priv)
 	uint32_t i;
 
 	mlx5_vdpa_dev_cache_clean(priv);
+	for (i = 0; i < priv->caps.max_num_virtio_queues * 2; i++) {
+		if (!priv->virtqs[i].counters)
+			continue;
+		claim_zero(mlx5_devx_cmd_destroy(priv->virtqs[i].counters));
+	}
 	mlx5_vdpa_event_qp_global_release(priv);
 	mlx5_vdpa_err_event_unset(priv);
 	if (priv->steer.tbl)
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 24bafe85b44..e7f3319f896 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -92,6 +92,7 @@ struct mlx5_vdpa_virtq {
 	struct rte_intr_handle *intr_handle;
 	uint64_t err_time[3]; /* RDTSC time of recent errors. */
 	uint32_t n_retry;
+	struct mlx5_devx_virtio_q_couners_attr stats;
 	struct mlx5_devx_virtio_q_couners_attr reset;
 };
 
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 0dfeb8fce24..e025be47d27 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -127,14 +127,9 @@ void
 mlx5_vdpa_virtqs_release(struct mlx5_vdpa_priv *priv)
 {
 	int i;
-	struct mlx5_vdpa_virtq *virtq;
 
-	for (i = 0; i < priv->nr_virtqs; i++) {
-		virtq = &priv->virtqs[i];
-		mlx5_vdpa_virtq_unset(virtq);
-		if (virtq->counters)
-			claim_zero(mlx5_devx_cmd_destroy(virtq->counters));
-	}
+	for (i = 0; i < priv->nr_virtqs; i++)
+		mlx5_vdpa_virtq_unset(&priv->virtqs[i]);
 	priv->features = 0;
 	priv->nr_virtqs = 0;
 }
@@ -590,7 +585,7 @@ mlx5_vdpa_virtq_stats_get(struct mlx5_vdpa_priv *priv, int qid,
 			  struct rte_vdpa_stat *stats, unsigned int n)
 {
 	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[qid];
-	struct mlx5_devx_virtio_q_couners_attr attr = {0};
+	struct mlx5_devx_virtio_q_couners_attr *attr = &virtq->stats;
 	int ret;
 
 	if (!virtq->counters) {
@@ -598,7 +593,7 @@ mlx5_vdpa_virtq_stats_get(struct mlx5_vdpa_priv *priv, int qid,
 			"is invalid.", qid);
 		return -EINVAL;
 	}
-	ret = mlx5_devx_cmd_query_virtio_q_counters(virtq->counters, &attr);
+	ret = mlx5_devx_cmd_query_virtio_q_counters(virtq->counters, attr);
 	if (ret) {
 		DRV_LOG(ERR, "Failed to read virtq %d stats from HW.", qid);
 		return ret;
@@ -608,37 +603,37 @@ mlx5_vdpa_virtq_stats_get(struct mlx5_vdpa_priv *priv, int qid,
 		return ret;
 	stats[MLX5_VDPA_STATS_RECEIVED_DESCRIPTORS] = (struct rte_vdpa_stat) {
 		.id = MLX5_VDPA_STATS_RECEIVED_DESCRIPTORS,
-		.value = attr.received_desc - virtq->reset.received_desc,
+		.value = attr->received_desc - virtq->reset.received_desc,
 	};
 	if (ret == MLX5_VDPA_STATS_COMPLETED_DESCRIPTORS)
 		return ret;
 	stats[MLX5_VDPA_STATS_COMPLETED_DESCRIPTORS] = (struct rte_vdpa_stat) {
 		.id = MLX5_VDPA_STATS_COMPLETED_DESCRIPTORS,
-		.value = attr.completed_desc - virtq->reset.completed_desc,
+		.value = attr->completed_desc - virtq->reset.completed_desc,
 	};
 	if (ret == MLX5_VDPA_STATS_BAD_DESCRIPTOR_ERRORS)
 		return ret;
 	stats[MLX5_VDPA_STATS_BAD_DESCRIPTOR_ERRORS] = (struct rte_vdpa_stat) {
 		.id = MLX5_VDPA_STATS_BAD_DESCRIPTOR_ERRORS,
-		.value = attr.bad_desc_errors - virtq->reset.bad_desc_errors,
+		.value = attr->bad_desc_errors - virtq->reset.bad_desc_errors,
 	};
 	if (ret == MLX5_VDPA_STATS_EXCEED_MAX_CHAIN)
 		return ret;
 	stats[MLX5_VDPA_STATS_EXCEED_MAX_CHAIN] = (struct rte_vdpa_stat) {
 		.id = MLX5_VDPA_STATS_EXCEED_MAX_CHAIN,
-		.value = attr.exceed_max_chain - virtq->reset.exceed_max_chain,
+		.value = attr->exceed_max_chain - virtq->reset.exceed_max_chain,
 	};
 	if (ret == MLX5_VDPA_STATS_INVALID_BUFFER)
 		return ret;
 	stats[MLX5_VDPA_STATS_INVALID_BUFFER] = (struct rte_vdpa_stat) {
 		.id = MLX5_VDPA_STATS_INVALID_BUFFER,
-		.value = attr.invalid_buffer - virtq->reset.invalid_buffer,
+		.value = attr->invalid_buffer - virtq->reset.invalid_buffer,
 	};
 	if (ret == MLX5_VDPA_STATS_COMPLETION_ERRORS)
 		return ret;
 	stats[MLX5_VDPA_STATS_COMPLETION_ERRORS] = (struct rte_vdpa_stat) {
 		.id = MLX5_VDPA_STATS_COMPLETION_ERRORS,
-		.value = attr.error_cqes - virtq->reset.error_cqes,
+		.value = attr->error_cqes - virtq->reset.error_cqes,
 	};
 	return ret;
 }
@@ -649,11 +644,8 @@ mlx5_vdpa_virtq_stats_reset(struct mlx5_vdpa_priv *priv, int qid)
 	struct mlx5_vdpa_virtq *virtq = &priv->virtqs[qid];
 	int ret;
 
-	if (!virtq->counters) {
-		DRV_LOG(ERR, "Failed to read virtq %d statistics - virtq "
-			"is invalid.", qid);
-		return -EINVAL;
-	}
+	if (virtq->counters == NULL) /* VQ not enabled. */
+		return 0;
 	ret = mlx5_devx_cmd_query_virtio_q_counters(virtq->counters,
 						    &virtq->reset);
 	if (ret)
-- 
2.35.1



* Re: [PATCH v3 0/7] vdpa/mlx5: improve device shutdown time
  2022-05-08 14:25 ` [PATCH v3 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
                     ` (6 preceding siblings ...)
  2022-05-08 14:25   ` [PATCH v3 7/7] vdpa/mlx5: make statistics counter persistent Xueming Li
@ 2022-05-09 19:38   ` Maxime Coquelin
  7 siblings, 0 replies; 43+ messages in thread
From: Maxime Coquelin @ 2022-05-09 19:38 UTC (permalink / raw)
  To: Xueming Li, dev



On 5/8/22 16:25, Xueming Li wrote:
> v1:
>   - rebase with latest upstream code
>   - fix coverity issues
> v2:
>   - fix build issue on OS w/o flow DR API
> v3:
>   - commit message update, thanks Maxime!
> 
> 
> Xueming Li (7):
>    vdpa/mlx5: fix interrupt trash that leads to segment fault
>    vdpa/mlx5: fix dead loop when process interrupted
>    vdpa/mlx5: no kick handling during shutdown
>    vdpa/mlx5: reuse resources in reconfiguration
>    vdpa/mlx5: cache and reuse hardware resources
>    vdpa/mlx5: support device cleanup callback
>    vdpa/mlx5: make statistics counter persistent
> 
>   doc/guides/vdpadevs/mlx5.rst        |   6 +
>   drivers/vdpa/mlx5/mlx5_vdpa.c       | 231 +++++++++++++++++++++-------
>   drivers/vdpa/mlx5/mlx5_vdpa.h       |  31 +++-
>   drivers/vdpa/mlx5/mlx5_vdpa_event.c |  23 +--
>   drivers/vdpa/mlx5/mlx5_vdpa_mem.c   |  38 +++--
>   drivers/vdpa/mlx5/mlx5_vdpa_steer.c |  30 +---
>   drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 189 +++++++++++------------
>   7 files changed, 336 insertions(+), 212 deletions(-)
> 


Applied to dpdk-next-virtio/main.

Thanks,
Maxime




Thread overview: 43+ messages
2022-02-24 13:28 [PATCH 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
2022-02-24 13:28 ` [PATCH 1/7] vdpa/mlx5: fix interrupt trash that leads to segment fault Xueming Li
2022-02-24 13:28 ` [PATCH 2/7] vdpa/mlx5: fix dead loop when process interrupted Xueming Li
2022-02-24 13:28 ` [PATCH 3/7] vdpa/mlx5: no kick handling during shutdown Xueming Li
2022-02-24 13:28 ` [PATCH 4/7] vdpa/mlx5: reuse resources in reconfiguration Xueming Li
2022-02-24 13:28 ` [PATCH 5/7] vdpa/mlx5: cache and reuse hardware resources Xueming Li
2022-02-24 13:28 ` [PATCH 6/7] vdpa/mlx5: support device cleanup callback Xueming Li
2022-02-24 13:28 ` [PATCH 7/7] vdpa/mlx5: make statistics counter persistent Xueming Li
2022-02-24 14:38 ` [PATCH v1 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
2022-02-24 14:38   ` [PATCH v1 1/7] vdpa/mlx5: fix interrupt trash that leads to segment fault Xueming Li
2022-02-24 14:38   ` [PATCH v1 2/7] vdpa/mlx5: fix dead loop when process interrupted Xueming Li
2022-02-24 14:38   ` [PATCH v1 3/7] vdpa/mlx5: no kick handling during shutdown Xueming Li
2022-02-24 14:38   ` [PATCH v1 4/7] vdpa/mlx5: reuse resources in reconfiguration Xueming Li
2022-02-24 14:38   ` [PATCH v1 5/7] vdpa/mlx5: cache and reuse hardware resources Xueming Li
2022-02-24 14:38   ` [PATCH v1 6/7] vdpa/mlx5: support device cleanup callback Xueming Li
2022-02-24 14:38   ` [PATCH v1 7/7] vdpa/mlx5: make statistics counter persistent Xueming Li
2022-02-24 15:50 ` [PATCH v2 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
2022-02-24 15:50   ` [PATCH v2 1/7] vdpa/mlx5: fix interrupt trash that leads to segment fault Xueming Li
2022-04-20 10:39     ` Maxime Coquelin
2022-02-24 15:50   ` [PATCH v2 2/7] vdpa/mlx5: fix dead loop when process interrupted Xueming Li
2022-04-20 10:33     ` Maxime Coquelin
2022-02-24 15:50   ` [PATCH v2 3/7] vdpa/mlx5: no kick handling during shutdown Xueming Li
2022-04-20 12:37     ` Maxime Coquelin
2022-04-20 13:23       ` Xueming(Steven) Li
2022-02-24 15:50   ` [PATCH v2 4/7] vdpa/mlx5: reuse resources in reconfiguration Xueming Li
2022-04-20 14:49     ` Maxime Coquelin
2022-02-24 15:50   ` [PATCH v2 5/7] vdpa/mlx5: cache and reuse hardware resources Xueming Li
2022-04-20 15:03     ` Maxime Coquelin
2022-04-25 13:28       ` Xueming(Steven) Li
2022-05-05 20:01         ` Maxime Coquelin
2022-02-24 15:51   ` [PATCH v2 6/7] vdpa/mlx5: support device cleanup callback Xueming Li
2022-04-21  8:19     ` Maxime Coquelin
2022-02-24 15:51   ` [PATCH v2 7/7] vdpa/mlx5: make statistics counter persistent Xueming Li
2022-04-21  8:22     ` Maxime Coquelin
2022-05-08 14:25 ` [PATCH v3 0/7] vdpa/mlx5: improve device shutdown time Xueming Li
2022-05-08 14:25   ` [PATCH v3 1/7] vdpa/mlx5: fix interrupt trash that leads to segment fault Xueming Li
2022-05-08 14:25   ` [PATCH v3 2/7] vdpa/mlx5: fix dead loop when process interrupted Xueming Li
2022-05-08 14:25   ` [PATCH v3 3/7] vdpa/mlx5: no kick handling during shutdown Xueming Li
2022-05-08 14:25   ` [PATCH v3 4/7] vdpa/mlx5: reuse resources in reconfiguration Xueming Li
2022-05-08 14:25   ` [PATCH v3 5/7] vdpa/mlx5: cache and reuse hardware resources Xueming Li
2022-05-08 14:25   ` [PATCH v3 6/7] vdpa/mlx5: support device cleanup callback Xueming Li
2022-05-08 14:25   ` [PATCH v3 7/7] vdpa/mlx5: make statistics counter persistent Xueming Li
2022-05-09 19:38   ` [PATCH v3 0/7] vdpa/mlx5: improve device shutdown time Maxime Coquelin
